neutral-Stage/remotion-captioneer

🎬 remotion-captioneer

Drop-in animated captions for Remotion.

Feed it audio. Get word-level synced, beautifully animated captions. 14 styles. Zero hassle.



๐Ÿค Works With @remotion/captions

Our types are fully compatible with the official @remotion/captions package. You can convert freely between them:

import { createTikTokStyleCaptions } from "@remotion/captions";
import { toCaptionArray, fromCaptionArray } from "remotion-captioneer";

// Convert our CaptionData → flat Caption[] for @remotion/captions
const flatCaptions = toCaptionArray(myCaptionData);
const { pages } = createTikTokStyleCaptions({
  captions: flatCaptions,
  combineTokensWithinMilliseconds: 1200,
});

// Or go the other way: Caption[] → CaptionData
const captionData = fromCaptionArray(flatCaptions);

| Feature | @remotion/captions (official) | remotion-captioneer (this) |
|---|---|---|
| Caption types | ✅ Caption type | ✅ Compatible, plus CaptionData with segments |
| Page segmentation | ✅ createTikTokStyleCaptions() | ❌ Use the official package |
| Animated components | ❌ Build yourself | ✅ 14 ready-to-use styles |
| STT/transcription | ❌ Separate package | ✅ 5 providers built-in |
| CLI tool | ❌ | ✅ npx captioneer process |
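Conceptually, toCaptionArray() flattens the nested segments → words structure into one flat token per word. The sketch below is a hypothetical re-implementation (toCaptionArraySketch), not the package's code. The Caption fields (text, startMs, endMs, timestampMs, confidence) follow our understanding of the @remotion/captions Caption type; check its docs for the exact definition. Prefixing every word after the first with a space, and using the word midpoint as timestampMs, are assumptions made so that joined page text stays readable.

```typescript
// Local mirror of the CaptionData format documented in this README.
interface CaptionWord {
  word: string;
  startMs: number;
  endMs: number;
  confidence: number;
}

interface CaptionSegment {
  text: string;
  startMs: number;
  endMs: number;
  words: CaptionWord[];
}

interface CaptionData {
  segments: CaptionSegment[];
  language: string;
  durationMs: number;
}

// Flat token shape, as we understand @remotion/captions' Caption type.
interface Caption {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
}

// Hypothetical flatten: one Caption per word, in the order they appear.
function toCaptionArraySketch(data: CaptionData): Caption[] {
  const out: Caption[] = [];
  for (const segment of data.segments) {
    for (const w of segment.words) {
      out.push({
        // Leading space on every word except the very first (assumption).
        text: out.length === 0 ? w.word : ` ${w.word}`,
        startMs: w.startMs,
        endMs: w.endMs,
        // Word midpoint as the representative timestamp (assumption).
        timestampMs: Math.round((w.startMs + w.endMs) / 2),
        confidence: w.confidence,
      });
    }
  }
  return out;
}

const sample: CaptionData = {
  language: "en",
  durationMs: 1200,
  segments: [
    {
      text: "Hello world",
      startMs: 0,
      endMs: 1200,
      words: [
        { word: "Hello", startMs: 0, endMs: 500, confidence: 0.98 },
        { word: "world", startMs: 550, endMs: 1200, confidence: 0.95 },
      ],
    },
  ],
};

console.log(toCaptionArraySketch(sample));
```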

🎥 Caption Styles Preview

Word Highlight

Each word lights up as it's spoken with a scale animation.

"Hello world this is"
  dim  dim  GOLD  dim

Karaoke

Progressive left-to-right color fill, like karaoke.

"Hello world this is"
 RED   red  ░░░░  ░░░

Typewriter

Character-by-character reveal with blinking cursor.

┌──────────────────────┐
│ Hello world th|      │
└──────────────────────┘

Bounce

Active word bounces up with spring physics.

"Hello  world  this  is"
  ↓     ↑      ↓     ↓
       bounce!

👉 See them animated live at the demo page.
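Every style above has to answer the same question on each frame: which word is being spoken right now, and (for karaoke) how far through it are we? A minimal sketch of that lookup, assuming frame time is frame / fps * 1000 ms and words carry startMs/endMs as in the caption data format. findActiveWord and karaokeFill are hypothetical helpers, not package exports; in a real composition, frame and fps would come from Remotion's useCurrentFrame() and useVideoConfig().

```typescript
interface TimedWord {
  word: string;
  startMs: number;
  endMs: number;
}

// Convert a Remotion frame number to milliseconds.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}

// Index of the word being spoken at timeMs, or -1 in a pause between words.
function findActiveWord(words: TimedWord[], timeMs: number): number {
  return words.findIndex((w) => timeMs >= w.startMs && timeMs < w.endMs);
}

// Karaoke-style fill for one word: 0 before it starts, 1 after it ends,
// linear in between.
function karaokeFill(word: TimedWord, timeMs: number): number {
  const span = word.endMs - word.startMs;
  if (span <= 0) return timeMs >= word.endMs ? 1 : 0;
  const t = (timeMs - word.startMs) / span;
  return Math.min(1, Math.max(0, t));
}

const words: TimedWord[] = [
  { word: "Hello", startMs: 0, endMs: 400 },
  { word: "world", startMs: 400, endMs: 900 },
  { word: "this", startMs: 900, endMs: 1200 },
  { word: "is", startMs: 1300, endMs: 1500 },
];

const nowMs = frameToMs(30, 30); // frame 30 at 30 fps = 1000 ms
const active = findActiveWord(words, nowMs);
console.log(active >= 0 ? words[active].word : "(no active word)");
console.log(words.map((w) => karaokeFill(w, nowMs).toFixed(2)));
```

A linear search is fine for a single caption page; for long transcripts, a binary search over startMs would avoid scanning every word on every frame.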


✨ Features

  • 🎙️ 5 STT Providers: Local Whisper, OpenAI, Groq, Deepgram, AssemblyAI
  • 🎨 14 Caption Styles: Word Highlight, Karaoke, Typewriter, Bounce, Wave, Glow, Typewriter Erase, Pill, Flicker, Highlighter, Blur, Rainbow, Scale, Spotlight
  • 🎭 24 Presets: TikTok, Instagram, YouTube, Podcast, Cinematic, Music, Tutorial, Minimal, Gaming, News, Education, Fun
  • 🎵 Audio-Video Sync: beat detection, volume-reactive animations, timeline keyframes
  • 📦 Template System: data-driven video generation from JSON config
  • 🧱 Layout Primitives: Stack, Row, Columns, Grid, Center, FadeIn, SlideUp
  • 📤 6 Export Formats: SRT, VTT, ASS, TXT, word-level SRT & VTT
  • ⚡ Drop-in Components: <AnimatedCaptions> works out of the box
  • 🔧 CLI Tool: process, batch, export, presets, providers, styles
  • Zero Config: sensible defaults, everything customizable
  • 🔷 TypeScript: full type definitions included
  • 🐳 Docker Ready: deploy rendering at scale

🚀 Quick Start

Option 1: Scaffold a Project

npx captioneer init my-video
cd my-video
npm install
npm start

This creates a ready-to-use Remotion project with captions.

Option 2: Add to Existing Project

1. Install

npm install remotion-captioneer

2. Generate Captions from Audio

npx captioneer process my-audio.mp4

This creates my-audio-captions.json with word-level timestamps.

3. Use in Your Remotion Project

import { AbsoluteFill } from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./my-audio-captions.json";

export const MyVideo = () => {
  return (
    <AbsoluteFill style={{ backgroundColor: "#0a0a0a" }}>
      <AnimatedCaptions
        captions={captions}
        style="word-highlight"
        position="bottom"
        highlightColor="#FFD700"
      />
    </AbsoluteFill>
  );
};

That's it. Render with npx remotion render as usual.


🎨 Caption Styles

14 animated styles, each with a unique visual feel:

| Style | Effect | Best for |
|---|---|---|
| word-highlight | Each word lights up with scale animation | Podcasts, interviews |
| karaoke | Progressive left-to-right color fill | Music, singing |
| typewriter | Character-by-character reveal + cursor | Tutorials, code demos |
| bounce | Active word bounces with spring physics | Social media, reels |
| wave | Words animate in a wave pattern | Music, rhythmic content |
| glow | Neon glow pulsing on active word | Cinematic, dramatic |
| typewriter-erase | Types then erases word-by-word | Transitions, reveals |
| pill | Active word in a colored pill/badge | Clean, modern look |
| flicker | Flickers in like a neon sign | Retro, neon aesthetic |
| highlighter | Yellow highlighter behind active word | Study, educational |
| blur | Future words blur, active word sharpens | Dramatic reveals |
| rainbow | Cycling rainbow colors on active word | Fun, playful content |
| scale | Words grow from small to full size | Energetic, bold |
| spotlight | Radial spotlight effect behind active word | Theatrical, stage |

<AnimatedCaptions captions={captions} style="word-highlight" />
<AnimatedCaptions captions={captions} style="karaoke" />
<AnimatedCaptions captions={captions} style="typewriter" />
<AnimatedCaptions captions={captions} style="bounce" />
<AnimatedCaptions captions={captions} style="wave" />
<AnimatedCaptions captions={captions} style="glow" />
<AnimatedCaptions captions={captions} style="typewriter-erase" />
<AnimatedCaptions captions={captions} style="pill" />
<AnimatedCaptions captions={captions} style="flicker" />
<AnimatedCaptions captions={captions} style="highlighter" />
<AnimatedCaptions captions={captions} style="blur" />
<AnimatedCaptions captions={captions} style="rainbow" />
<AnimatedCaptions captions={captions} style="scale" />
<AnimatedCaptions captions={captions} style="spotlight" />
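When the style name comes from data (a JSON config or a CLI flag) rather than a literal in code, it helps to narrow the string before passing it to style. A minimal sketch, assuming the 14 names above are the complete set; CAPTION_STYLES, CaptionStyleName, isCaptionStyle, and resolveStyle are illustrative names, and the package's own CaptionStyle type may be defined differently.

```typescript
// The 14 style names from the table above, as a readonly tuple.
const CAPTION_STYLES = [
  "word-highlight", "karaoke", "typewriter", "bounce", "wave", "glow",
  "typewriter-erase", "pill", "flicker", "highlighter", "blur",
  "rainbow", "scale", "spotlight",
] as const;

// Union type derived from the tuple: "word-highlight" | "karaoke" | ...
type CaptionStyleName = (typeof CAPTION_STYLES)[number];

// Type guard: narrows an arbitrary string to CaptionStyleName.
function isCaptionStyle(value: string): value is CaptionStyleName {
  return (CAPTION_STYLES as readonly string[]).includes(value);
}

// Fall back to the documented default ("word-highlight") for unknown input.
function resolveStyle(input: string | undefined): CaptionStyleName {
  return input !== undefined && isCaptionStyle(input) ? input : "word-highlight";
}

console.log(resolveStyle("karaoke"));
console.log(resolveStyle("sparkle"));
```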

📡 STT Providers

Choose your speech-to-text backend. Five providers are supported out of the box:

| Provider | Env variable | Speed | Offline | Best for |
|---|---|---|---|---|
| Local Whisper | none | ⭐⭐ | ✅ | Privacy, no API costs |
| OpenAI | OPENAI_API_KEY | ⭐⭐⭐ | ❌ | Best accuracy |
| Groq | GROQ_API_KEY | ⭐⭐⭐⭐⭐ | ❌ | Ultra-fast inference |
| Deepgram | DEEPGRAM_API_KEY | ⭐⭐⭐⭐ | ❌ | Real-time capable |
| AssemblyAI | ASSEMBLYAI_API_KEY | ⭐⭐⭐ | ❌ | Rich features |

🎭 Caption Presets

Apply a professional look instantly with one of the built-in presets (npx captioneer presets lists them all):

import { AnimatedCaptions, applyPreset } from "remotion-captioneer";

// Use a preset
<AnimatedCaptions
  captions={captions}
  {...applyPreset("tiktok")}
/>

// Or store the preset's props and spread them later
const cinematicStyle = applyPreset("cinematic-gold");
<AnimatedCaptions captions={captions} {...cinematicStyle} />

Available Presets

| Category | Presets |
|---|---|
| Social Media | tiktok, instagram-reels, youtube-shorts, twitter-clips |
| Podcast | podcast-clean, podcast-bold |
| Cinematic | cinematic-gold, cinematic-white, cinematic-neon |
| Music | music-karaoke, music-wave |
| Tutorial | tutorial-typewriter, tutorial-erase |
| Minimal | minimal-white, minimal-subtle |
| Gaming | gaming-neon, gaming-bold |
| News & Documentary | news-ticker, documentary |
| Education | education-highlighter, education-scale |
| Fun & Creative | fun-rainbow, retro-flicker |

# List presets from CLI
npx captioneer presets
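Because a preset is spread as props, ordering decides who wins: in JSX, attributes written after {...applyPreset("tiktok")} override the preset's values. The plain-object sketch below shows the same rule with object spread. The PresetProps shape, the PRESETS table, and withOverrides are invented for illustration and are not the package's real preset definitions.

```typescript
// Hypothetical subset of <AnimatedCaptions> props that a preset might set.
interface PresetProps {
  style: string;
  fontSize: number;
  highlightColor: string;
  position: "top" | "center" | "bottom";
}

// Illustrative preset table; the values are made up for this example.
const PRESETS: Record<string, PresetProps> = {
  tiktok: { style: "bounce", fontSize: 72, highlightColor: "#00F2EA", position: "center" },
  "cinematic-gold": { style: "glow", fontSize: 56, highlightColor: "#FFD700", position: "bottom" },
};

// Same precedence as JSX props: later spreads override earlier ones.
function withOverrides(name: string, overrides: Partial<PresetProps>): PresetProps {
  const base = PRESETS[name];
  if (!base) throw new Error(`Unknown preset: ${name}`);
  return { ...base, ...overrides };
}

// Mirrors the JSX pattern <AnimatedCaptions {...applyPreset("tiktok")} position="bottom" />
console.log(withOverrides("tiktok", { position: "bottom" }));
```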

📤 Export Formats

Export captions to standard subtitle formats:

import {
  toSRT,
  toVTT,
  toASS,
  toPlainText,
  toWordLevelSRT,
  toWordLevelVTT,
} from "remotion-captioneer";

const srt = toSRT(captionData);       // SubRip (.srt)
const vtt = toVTT(captionData);       // WebVTT (.vtt)
const ass = toASS(captionData);       // SubStation Alpha (.ass)
const txt = toPlainText(captionData); // Plain text

// Word-level exports (for custom timing)
const srtWords = toWordLevelSRT(captionData);
const vttWords = toWordLevelVTT(captionData);

# Export from CLI
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subtitles.vtt
npx captioneer export captions.json --format ass
npx captioneer export captions.json --format srt-words

Formats: srt, vtt, ass, txt, srt-words, vtt-words
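For reference, SRT is a simple format: numbered cues, each with an HH:MM:SS,mmm --> HH:MM:SS,mmm time range followed by the cue text, with cues separated by blank lines. A minimal sketch of a segment-level SRT writer over the segment shape from the Caption Data Format section; toSRTSketch and srtTimestamp are hypothetical stand-ins, and the package's toSRT() may differ in details such as line wrapping.

```typescript
interface Segment {
  text: string;
  startMs: number;
  endMs: number;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  const milli = total % 1000;
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(milli, 3)}`;
}

// One cue per segment, numbered from 1, separated by a blank line.
function toSRTSketch(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTimestamp(seg.startMs)} --> ${srtTimestamp(seg.endMs)}\n${seg.text}\n`,
    )
    .join("\n");
}

console.log(
  toSRTSketch([
    { text: "Hello world", startMs: 0, endMs: 1200 },
    { text: "this is a test", startMs: 1300, endMs: 3725 },
  ]),
);
```

A word-level variant (like srt-words) would apply the same writer to each word instead of each segment.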

Auto-Detection

The CLI auto-detects available providers from environment variables:

# Groq is the fastest; set this first if you have a key
export GROQ_API_KEY="gsk_..."

# Or OpenAI
export OPENAI_API_KEY="sk-..."

# Then just run it; the CLI picks the best available provider
npx captioneer process audio.mp4
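Auto-detection amounts to checking which API-key variables are set and picking one. A minimal sketch under an assumed priority order (Groq, then Deepgram, OpenAI, AssemblyAI, falling back to local Whisper); apart from Groq being recommended first, the order the CLI actually uses is not documented here, and pickProvider is a hypothetical helper, not detectProvider().

```typescript
type ProviderName = "groq" | "deepgram" | "openai" | "assemblyai" | "local";

// Assumed preference order; each entry pairs a provider with its env variable.
const PRIORITY: Array<[Exclude<ProviderName, "local">, string]> = [
  ["groq", "GROQ_API_KEY"],
  ["deepgram", "DEEPGRAM_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["assemblyai", "ASSEMBLYAI_API_KEY"],
];

// Returns the first provider whose key is set and non-blank, else "local".
function pickProvider(env: Record<string, string | undefined>): {
  name: ProviderName;
  apiKey?: string;
} {
  for (const [name, variable] of PRIORITY) {
    const key = env[variable];
    if (key && key.trim() !== "") return { name, apiKey: key };
  }
  return { name: "local" };
}

// In Node you would pass process.env here.
console.log(pickProvider({ OPENAI_API_KEY: "sk-test" }));
```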

Explicit Provider

npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1
npx captioneer process audio.mp4 --provider deepgram --model nova-2
npx captioneer process audio.mp4 --provider assemblyai
npx captioneer process audio.mp4 --provider local --model base

Check Provider Status

npx captioneer providers

📡 Available STT Providers:

  local           ✅ ready
                  models: tiny, base, small, medium, large

  groq            ✅ ready
                  models: whisper-large-v3, whisper-large-v3-turbo, distil-whisper-large-v3-en

  openai          ⚪ not configured
                  models: whisper-1

Programmatic Usage

// Top-level await requires an ES module context
import { GroqProvider, OpenAIProvider, detectProvider } from "remotion-captioneer";

// Groq: ultra-fast
const groq = new GroqProvider("gsk_...");
const groqCaptions = await groq.transcribe("audio.mp4", {
  model: "whisper-large-v3-turbo",
  language: "en",
});

// OpenAI
const openai = new OpenAIProvider("sk-...");
const openaiCaptions = await openai.transcribe("audio.mp4");

// Auto-detect from env
const detected = detectProvider();
if (detected) {
  const captions = await detected.provider.transcribe("audio.mp4");
}

🎵 Audio-Video Sync

Frame-perfect animations synchronized to audio. No more manually timing keyframes.

Pre-analyze Audio

import { analyzeAudio } from "remotion-captioneer";

const analysis = await analyzeAudio("my-audio.mp4");
// Returns: beats, volumeFrames, bpm, energy levels
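One common way to get per-frame volume values like volumeFrames is to split the decoded audio into one window per video frame and take the RMS (root mean square) of each window. The sketch below assumes mono PCM samples in the range -1..1 are already decoded; volumePerFrame and normalize are hypothetical, and analyzeAudio() may compute its values differently (for example with smoothing or a dB scale).

```typescript
// RMS loudness for each video frame's slice of audio.
// samples: mono PCM in [-1, 1]; sampleRate in Hz; fps: video frame rate.
function volumePerFrame(samples: Float32Array, sampleRate: number, fps: number): number[] {
  const samplesPerFrame = sampleRate / fps;
  const frameCount = Math.ceil(samples.length / samplesPerFrame);
  const volumes: number[] = [];
  for (let f = 0; f < frameCount; f++) {
    const start = Math.floor(f * samplesPerFrame);
    const end = Math.min(samples.length, Math.floor((f + 1) * samplesPerFrame));
    let sumSquares = 0;
    for (let i = start; i < end; i++) sumSquares += samples[i] * samples[i];
    const n = end - start;
    volumes.push(n > 0 ? Math.sqrt(sumSquares / n) : 0);
  }
  return volumes;
}

// Scale so the loudest frame is 1, giving the 0-1 range useVolume() reports.
function normalize(values: number[]): number[] {
  const peak = values.reduce((max, v) => Math.max(max, v), 0);
  return peak === 0 ? values.map(() => 0) : values.map((v) => v / peak);
}

// Toy input: 8 samples at 8 Hz, 2 fps -> 4 samples per frame (silence, then a tone).
const toy = new Float32Array([0, 0, 0, 0, 0.5, -0.5, 0.5, -0.5]);
console.log(normalize(volumePerFrame(toy, 8, 2)));
```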

Beat-Reactive Hooks

import {
  AudioSyncProvider,
  useBeatPulse,
  useVolume,
  useEnergy,
} from "remotion-captioneer";

// Wrap your composition (audioAnalysis is the result of analyzeAudio() above)
const MyVideo = () => (
  <AudioSyncProvider analysis={audioAnalysis}>
    <BeatReactiveContent />
  </AudioSyncProvider>
);

// Use in any child component
const BeatReactiveContent = () => {
  const pulse = useBeatPulse();       // 0→1 spring on each beat
  const volume = useVolume();          // Current volume 0-1
  const energy = useEnergy();          // Smoothed energy 0-1

  return (
    <div style={{
      transform: `scale(${1 + pulse * 0.2})`,
      opacity: 0.5 + volume * 0.5,
    }}>
      🎵 Synced to the beat!
    </div>
  );
};
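To make the idea behind useBeatPulse() concrete: the pulse jumps to 1 on a beat and settles back toward 0 until the next one. The sketch below swaps the hook's spring for a simpler exponential decay, and assumes beats are a sorted list of timestamps in ms; beatPulseAt and its decayMs parameter are hypothetical names, not package APIs.

```typescript
// Pulse value in [0, 1]: exactly 1 on a beat, decaying exponentially afterwards.
// beatsMs must be sorted ascending; decayMs is the decay time constant.
function beatPulseAt(beatsMs: number[], timeMs: number, decayMs = 150): number {
  // Binary search for the last beat at or before timeMs.
  let lo = 0;
  let hi = beatsMs.length - 1;
  let last = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (beatsMs[mid] <= timeMs) {
      last = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (last === -1) return 0; // before the first beat
  const elapsed = timeMs - beatsMs[last];
  return Math.exp(-elapsed / decayMs);
}

// 120 BPM: one beat every 500 ms.
const beats = [0, 500, 1000, 1500];
const scale = 1 + beatPulseAt(beats, 650) * 0.2; // same mapping as the example above
console.log(scale.toFixed(3));
```

Exponential decay never overshoots, whereas a spring can briefly go past its target; that overshoot is what gives spring-based pulses their bouncier feel.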

Timeline Keyframes

import { useTimelineValue, fadeInOut } from "remotion-captioneer";

// Map animation to audio timestamps (in ms)
const opacity = useTimelineValue({
  keyframes: [
    { timeMs: 0, value: 0 },
    { timeMs: 1000, value: 1, easing: "easeOut" },
    { timeMs: 5000, value: 1 },
    { timeMs: 6000, value: 0, easing: "easeIn" },
  ],
  defaultValue: 0,
});

// Or use the helper
const fadeOpacity = useTimelineValue(
  fadeInOut(0, 1000, 5000, 6000)
);
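The lookup behind a hook like useTimelineValue() can be written as a pure function: find the two keyframes around the current time and interpolate, applying the easing named on the destination keyframe. A minimal sketch, assuming easeIn and easeOut are quadratic curves, that easing on a keyframe governs the segment leading into it, that times outside the keyframe range hold the nearest end value, and that defaultValue is used only when there are no keyframes. The package's actual curves and rules may differ; timelineValueAt and fadeInOutSketch are hypothetical names.

```typescript
type Easing = "linear" | "easeIn" | "easeOut";

interface Keyframe {
  timeMs: number;
  value: number;
  easing?: Easing; // easing for the segment that ends at this keyframe
}

const EASINGS: Record<Easing, (t: number) => number> = {
  linear: (t) => t,
  easeIn: (t) => t * t, // quadratic (assumed)
  easeOut: (t) => 1 - (1 - t) * (1 - t), // quadratic (assumed)
};

// Keyframes must be sorted by timeMs.
function timelineValueAt(keyframes: Keyframe[], timeMs: number, defaultValue = 0): number {
  if (keyframes.length === 0) return defaultValue;
  if (timeMs <= keyframes[0].timeMs) return keyframes[0].value;
  const lastKf = keyframes[keyframes.length - 1];
  if (timeMs >= lastKf.timeMs) return lastKf.value;
  for (let i = 1; i < keyframes.length; i++) {
    const a = keyframes[i - 1];
    const b = keyframes[i];
    if (timeMs <= b.timeMs) {
      const span = b.timeMs - a.timeMs;
      const t = span === 0 ? 1 : (timeMs - a.timeMs) / span;
      const eased = EASINGS[b.easing ?? "linear"](t);
      return a.value + (b.value - a.value) * eased;
    }
  }
  return lastKf.value; // not reached for sorted input
}

// Fade in over [inStart, inEnd], hold at 1, fade out over [outStart, outEnd].
function fadeInOutSketch(inStart: number, inEnd: number, outStart: number, outEnd: number): Keyframe[] {
  return [
    { timeMs: inStart, value: 0 },
    { timeMs: inEnd, value: 1, easing: "easeOut" },
    { timeMs: outStart, value: 1 },
    { timeMs: outEnd, value: 0, easing: "easeIn" },
  ];
}

const fade = fadeInOutSketch(0, 1000, 5000, 6000);
console.log([0, 500, 3000, 5500, 7000].map((ms) => timelineValueAt(fade, ms)));
```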

Available Hooks

| Hook | Returns | Use for |
|---|---|---|
| useVolume() | number (0-1) | Opacity, scale, size |
| useBeat() | BeatInfo \| null | Flash effects, pulses |
| useBeatPulse() | number (0-1 spring) | Bounce, scale on beat |
| useEnergy() | number (0-1) | Background intensity |
| useIsOnBeat() | boolean | Conditional rendering |
| useTimelineValue() | number | Keyframe animations |
| useTimelineProgress() | number (0-1) | Progress bars |

📦 Template System

Build videos from JSON config. No code needed for simple videos.

Quick Template

import { buildTemplate, TemplateComposition } from "remotion-captioneer";

const template = buildTemplate({
  name: "My Captioned Video",
  intro: {
    title: "Episode 1",
    subtitle: "Getting Started",
    logo: "/logo.png",
  },
  captions: [
    { captions: myCaptions, captionStyle: "word-highlight" },
  ],
  outro: {
    heading: "Thanks for watching!",
    cta: "Subscribe for more",
    logo: "/logo.png",
  },
});

// Use as Remotion composition
<TemplateComposition template={template} />

Preset Scenes

import {
  createIntroScene,
  createCaptionScene,
  createOutroScene,
  createDividerScene,
} from "remotion-captioneer";

const intro = createIntroScene({
  title: "My Video",
  subtitle: "A demo",
  durationSec: 3,
});

const content = createCaptionScene({
  captions: myCaptions,
  captionStyle: "karaoke",
  highlightColor: "#FF6B6B",
});

const outro = createOutroScene({
  heading: "The End",
  cta: "Like & Subscribe",
  logo: "/logo.png",
});
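When scenes are declared with durations in seconds, turning them into a Remotion timeline is a running sum converted to frames. A sketch of that bookkeeping, assuming each scene has a durationSec and durations are rounded to the nearest whole frame; layoutScenes is a hypothetical helper, and how buildTemplate() actually sequences scenes is not documented here. The resulting from/durationInFrames pairs are the values Remotion's <Sequence> accepts.

```typescript
interface SceneSpec {
  id: string;
  durationSec: number;
}

interface PlacedScene {
  id: string;
  from: number; // first frame of the scene
  durationInFrames: number;
}

// Place scenes back to back; each scene starts where the previous one ended.
function layoutScenes(scenes: SceneSpec[], fps: number): { placed: PlacedScene[]; totalFrames: number } {
  let cursor = 0;
  const placed = scenes.map((scene) => {
    const durationInFrames = Math.round(scene.durationSec * fps);
    const entry: PlacedScene = { id: scene.id, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return entry;
  });
  return { placed, totalFrames: cursor };
}

const timeline = layoutScenes(
  [
    { id: "intro", durationSec: 3 },
    { id: "content", durationSec: 24.5 },
    { id: "outro", durationSec: 4 },
  ],
  30,
);
console.log(timeline);
```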

Design Tokens

Customize the entire look with a single config:

const template = buildTemplate({
  name: "Brand Video",
  tokens: {
    colors: {
      primary: "#6366F1",
      accent: "#FFD700",
      background: "#0a0a0a",
      text: "#FFFFFF",
    },
    typography: {
      headingFont: "Poppins, sans-serif",
      bodyFont: "Inter, sans-serif",
    },
  },
  // ...
});
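A token config like this is typically resolved by merging the user's partial tokens over a set of defaults, group by group, so that overriding colors.primary does not wipe out the other colors. A minimal sketch of that merge; DEFAULT_TOKENS and resolveTokens are invented for illustration, and the default values are not the package's real defaults.

```typescript
interface DesignTokens {
  colors: { primary: string; accent: string; background: string; text: string };
  typography: { headingFont: string; bodyFont: string };
}

// Partial at both levels: any group, and any key inside a group, may be omitted.
type TokenOverrides = {
  [G in keyof DesignTokens]?: Partial<DesignTokens[G]>;
};

// Illustrative defaults only.
const DEFAULT_TOKENS: DesignTokens = {
  colors: { primary: "#FFFFFF", accent: "#FFD700", background: "#000000", text: "#FFFFFF" },
  typography: { headingFont: "Inter, sans-serif", bodyFont: "Inter, sans-serif" },
};

// Merge one level deep so each group keeps its unspecified defaults.
function resolveTokens(overrides: TokenOverrides = {}): DesignTokens {
  return {
    colors: { ...DEFAULT_TOKENS.colors, ...overrides.colors },
    typography: { ...DEFAULT_TOKENS.typography, ...overrides.typography },
  };
}

const brand = resolveTokens({
  colors: { primary: "#6366F1" },
  typography: { headingFont: "Poppins, sans-serif" },
});
console.log(brand);
```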

🧱 Layout Primitives

Composable layout building blocks for any Remotion video:

import {
  Stack, Row, Columns, Grid,
  Center, FadeIn, SlideUp,
  GradientBg, Overlay, Positioned,
} from "remotion-captioneer";

// Vertical stack
<Stack gap={24}>
  <FadeIn delayMs={0}>Title</FadeIn>
  <FadeIn delayMs={200}>Subtitle</FadeIn>
</Stack>

// Horizontal columns
<Columns ratios={[2, 1]} gap={32}>
  <div>Main content</div>
  <div>Sidebar</div>
</Columns>

// Grid layout
<Grid columns={3} gap={16}>
  {items.map(item => <Card key={item.id} />)}
</Grid>

// Animated entrance
<SlideUp delayMs={500} durationMs={800}>
  <div>Slides up with delay</div>
</SlideUp>

// Gradient background
<GradientBg from="#0a0a0a" to="#1a1a2e">
  <Center>Content here</Center>
</GradientBg>
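For a layout like <Columns ratios={[2, 1]} gap={32}>, the arithmetic is: subtract the gaps from the container width, then share the rest in proportion to the ratios. A sketch of that calculation, assuming gaps only sit between columns; columnWidths is a hypothetical helper, and the component itself may implement this with CSS flex-grow rather than explicit pixel widths.

```typescript
// Pixel width of each column, given the container width, ratios, and gap.
function columnWidths(containerWidth: number, ratios: number[], gap: number): number[] {
  if (ratios.length === 0) return [];
  const totalRatio = ratios.reduce((sum, r) => sum + r, 0);
  if (totalRatio <= 0) throw new Error("ratios must sum to a positive number");
  // n columns have n - 1 gaps between them.
  const available = containerWidth - gap * (ratios.length - 1);
  return ratios.map((r) => (available * r) / totalRatio);
}

// 1920 px wide, ratios [2, 1], 32 px gap: the remaining 1888 px are split 2:1.
console.log(columnWidths(1920, [2, 1], 32).map((w) => w.toFixed(1)));
```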

๐ŸŽ™๏ธ CLI Reference

Process Audio

# Basic usage (auto-detects provider from env vars)
npx captioneer process audio.mp4

# Specify provider
npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1

# With options
npx captioneer process audio.mp4 --provider groq --language en --output captions.json
npx captioneer process audio.mp4 --provider local --model base

# Pass API key directly
npx captioneer process audio.mp4 --provider groq --api-key gsk_...

Options:

  • -p, --provider <provider>: STT provider (local, openai, groq, deepgram, assemblyai)
  • -m, --model <model>: model name (provider-specific)
  • -k, --api-key <key>: API key (or use env vars)
  • -l, --language <lang>: language code (en, es, fr, de, etc.)
  • -o, --output <path>: output JSON path
  • -v, --verbose: verbose output
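When -o is omitted, the Quick Start shows my-audio.mp4 producing my-audio-captions.json. A sketch of that naming rule, assuming the output lands next to the input and only the last extension is replaced; defaultOutputPath is a hypothetical helper, and it uses plain string operations rather than Node's path module so it runs anywhere.

```typescript
// Derive "<name>-captions.json" next to the input file.
function defaultOutputPath(inputPath: string): string {
  // Split off the directory part (handles both / and \ separators).
  const slash = Math.max(inputPath.lastIndexOf("/"), inputPath.lastIndexOf("\\"));
  const dir = inputPath.slice(0, slash + 1);
  const file = inputPath.slice(slash + 1);
  // Drop only the last extension; a leading dot (".hidden") is not treated as one.
  const dot = file.lastIndexOf(".");
  const stem = dot > 0 ? file.slice(0, dot) : file;
  return `${dir}${stem}-captions.json`;
}

console.log(defaultOutputPath("my-audio.mp4"));
console.log(defaultOutputPath("./audio-files/episode.01.wav"));
```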

Other Commands

# Scaffold a new project
npx captioneer init my-video

# List available providers and their status
npx captioneer providers

# List available caption styles
npx captioneer styles

# List available presets
npx captioneer presets

# Export captions to SRT/VTT/ASS
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subs.vtt

# Batch process a directory of audio files
npx captioneer batch ./audio-files/
npx captioneer batch ./audio-files/ --provider groq --output-dir ./captions/

# Start real-time preview server
npx captioneer preview

# Open Remotion Studio with demos
npx captioneer demo

📖 Caption Data Format

The generated JSON follows this structure:

interface CaptionData {
  segments: Array<{
    text: string;           // Full segment text
    startMs: number;        // Segment start time (ms)
    endMs: number;          // Segment end time (ms)
    words: Array<{
      word: string;         // Word text
      startMs: number;      // Word start time (ms)
      endMs: number;        // Word end time (ms)
      confidence: number;   // Whisper confidence (0-1)
    }>;
  }>;
  language: string;         // Detected language
  durationMs: number;       // Total duration (ms)
}

You can also create caption data manually or from other sources; just match this format.
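Because hand-written or third-party JSON can drift from this shape, a runtime check before rendering catches mistakes early. A minimal sketch of such a check; isCaptionData and its helpers are hypothetical (the package may or may not ship its own validator), and beyond types it only enforces two assumed invariants: startMs <= endMs and confidence within 0-1.

```typescript
interface CaptionWord { word: string; startMs: number; endMs: number; confidence: number }
interface CaptionSegment { text: string; startMs: number; endMs: number; words: CaptionWord[] }
interface CaptionData { segments: CaptionSegment[]; language: string; durationMs: number }

type Json = Record<string, unknown>;

function isObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isFiniteNumber(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v);
}

// Finite startMs/endMs with startMs <= endMs.
function hasValidSpan(o: Json): boolean {
  const start = o.startMs;
  const end = o.endMs;
  return isFiniteNumber(start) && isFiniteNumber(end) && start <= end;
}

function isCaptionWord(v: unknown): v is CaptionWord {
  if (!isObject(v) || typeof v.word !== "string" || !hasValidSpan(v)) return false;
  const c = v.confidence;
  return isFiniteNumber(c) && c >= 0 && c <= 1;
}

function isCaptionSegment(v: unknown): v is CaptionSegment {
  if (!isObject(v) || typeof v.text !== "string" || !hasValidSpan(v)) return false;
  const words = v.words;
  return Array.isArray(words) && words.every(isCaptionWord);
}

function isCaptionData(v: unknown): v is CaptionData {
  if (!isObject(v) || typeof v.language !== "string") return false;
  const segments = v.segments;
  return isFiniteNumber(v.durationMs) && Array.isArray(segments) && segments.every(isCaptionSegment);
}

// Hand-written caption data in the documented format.
const manual: unknown = {
  language: "en",
  durationMs: 900,
  segments: [
    {
      text: "Hi there",
      startMs: 0,
      endMs: 900,
      words: [
        { word: "Hi", startMs: 0, endMs: 300, confidence: 1 },
        { word: "there", startMs: 350, endMs: 900, confidence: 1 },
      ],
    },
  ],
};
console.log(isCaptionData(manual));
```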


โš™๏ธ Configuration

Create a .captioneerrc file in your project root:

{
  "whisperPath": "./whisper.cpp",
  "modelPath": "./whisper.cpp/models/ggml-base.bin",
  "defaultModel": "base",
  "defaultLanguage": "en",
  "defaultStyle": "word-highlight"
}

Or add to your package.json:

{
  "captioneer": {
    "defaultModel": "base",
    "defaultLanguage": "en"
  }
}
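The two configuration sources can be combined with a simple precedence rule. A sketch, assuming explicit CLI flags beat .captioneerrc, which beats the "captioneer" key in package.json, which beats built-in defaults; that precedence, the BUILT_IN values, and resolveConfig are assumptions for illustration, not documented behavior. Reading and parsing the files is left out so the merge logic stands alone.

```typescript
interface CaptioneerConfig {
  whisperPath?: string;
  modelPath?: string;
  defaultModel: string;
  defaultLanguage: string;
  defaultStyle: string;
}

// Illustrative built-in defaults.
const BUILT_IN: CaptioneerConfig = {
  defaultModel: "base",
  defaultLanguage: "en",
  defaultStyle: "word-highlight",
};

// Drop undefined values so they cannot overwrite lower-priority settings.
function defined(o: Partial<CaptioneerConfig> | undefined): Partial<CaptioneerConfig> {
  const out: Partial<CaptioneerConfig> = {};
  if (!o) return out;
  for (const key of Object.keys(o) as Array<keyof CaptioneerConfig>) {
    const value = o[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// Later sources win: defaults < package.json "captioneer" < .captioneerrc < CLI flags.
function resolveConfig(
  packageJson: { captioneer?: Partial<CaptioneerConfig> } | undefined,
  rcFile: Partial<CaptioneerConfig> | undefined,
  cliFlags: Partial<CaptioneerConfig>,
): CaptioneerConfig {
  return {
    ...BUILT_IN,
    ...defined(packageJson?.captioneer),
    ...defined(rcFile),
    ...defined(cliFlags),
  };
}

const config = resolveConfig(
  { captioneer: { defaultModel: "base", defaultLanguage: "en" } },
  { defaultModel: "small", defaultStyle: "karaoke" },
  { defaultLanguage: "es" },
);
console.log(config);
```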

🎬 Full Example

import {
  AbsoluteFill,
  Audio,
  Composition,
  staticFile,
} from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./captions.json";

export const CaptionedVideo = () => (
  <AbsoluteFill
    style={{
      background: "linear-gradient(135deg, #0a0a0a 0%, #1a1a2e 100%)",
    }}
  >
    <Audio src={staticFile("my-audio.mp4")} />
    <AnimatedCaptions
      captions={captions}
      style="karaoke"
      position="bottom"
      highlightColor="#FF6B6B"
      fontSize={64}
      fontFamily="Inter, sans-serif"
    />
  </AbsoluteFill>
);

export const RemotionRoot = () => (
  <Composition
    id="CaptionedVideo"
    component={CaptionedVideo}
    durationInFrames={900} // 30s at 30fps
    fps={30}
    width={1920}
    height={1080}
  />
);
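Rather than hard-coding durationInFrames={900}, the duration can be derived from the caption data's durationMs. A sketch, assuming the video should run for durationMs plus an optional tail, rounded up to a whole frame; framesForCaptions is a hypothetical helper. In Remotion, a value like this can be passed to <Composition> directly or computed dynamically with its calculateMetadata option.

```typescript
// Number of frames needed to show durationMs (plus an optional tail) at fps.
function framesForCaptions(durationMs: number, fps: number, tailMs = 0): number {
  if (fps <= 0) throw new Error("fps must be positive");
  const frames = ((durationMs + tailMs) / 1000) * fps;
  // Round up so the last caption is never cut off; the small epsilon absorbs
  // floating-point noise such as 900.0000000001. Always at least 1 frame.
  return Math.max(1, Math.ceil(frames - 1e-9));
}

console.log(framesForCaptions(30000, 30)); // the 30 s example above
console.log(framesForCaptions(29950, 30, 500));
```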

๐Ÿณ Docker

FROM node:20-slim

# Install whisper.cpp dependencies
RUN apt-get update && apt-get install -y git cmake build-essential

WORKDIR /app
COPY . .
RUN npm install

# The CLI will auto-install whisper.cpp on first run
ENTRYPOINT ["npx", "captioneer"]

๐Ÿ› ๏ธ Component Props

<AnimatedCaptions>

| Prop | Type | Default | Description |
|---|---|---|---|
| captions | CaptionData | required | Caption data object |
| style | CaptionStyle | "word-highlight" | Caption animation style |
| fontFamily | string | "Inter, sans-serif" | Font family |
| fontSize | number | 56 | Font size in px |
| fontColor | string | "rgba(255,255,255,0.5)" | Inactive text color |
| highlightColor | string | "#FFD700" | Active/highlight color |
| position | "top" \| "center" \| "bottom" | "bottom" | Vertical position |

📚 Examples

See the examples/ directory for complete working examples:

| File | What it shows |
|---|---|
| 01-basic.tsx | Simplest captioned video |
| 02-presets.tsx | Using presets (TikTok, Cinematic, Gaming) |
| 03-audio-sync.tsx | Beat-reactive animations |
| 04-template.tsx | Multi-scene template (intro → content → outro) |
| 05-layouts.tsx | Custom layouts with primitives |
| 06-export.ts | Export to SRT, VTT, ASS formats |
| 07-emoji.tsx | Emoji reactions at word timestamps |

๐Ÿ—บ๏ธ Roadmap

โœ… Completed

  • 14 caption styles (word-highlight, karaoke, typewriter, bounce, wave, glow, typewriter-erase, pill, flicker, highlighter, blur, rainbow, scale, spotlight)
  • 24 caption presets across 10 categories (Social, Podcast, Cinematic, Music, Tutorial, Minimal, Gaming, News, Education, Fun)
  • Multi-line auto-wrapping with smart breaks (smartWrap())
  • Word-level emoji reactions (EmojiReactions + autoGenerateReactions())
  • Real-time preview server (npx captioneer preview)
  • Batch processing mode (npx captioneer batch ./audio/)
  • Multi-provider STT (OpenAI, Groq, Deepgram, AssemblyAI, Local Whisper)
  • @remotion/captions compatibility layer
  • Audio-video sync (beat detection, volume hooks, timeline keyframes)
  • Template system for data-driven videos
  • Layout primitives (Stack, Row, Columns, Grid, FadeIn, SlideUp, etc.)
  • Export formats (SRT, VTT, ASS, TXT, word-level SRT & VTT)
  • Project scaffolder (npx captioneer init)
  • 7 working examples covering all features
  • 9 CLI commands (init, process, batch, export, preview, presets, providers, styles, demo)
  • GitHub Pages demo with all 14 styles animated
  • GitHub Actions CI/CD (build, test, release to npm, CodeQL)
  • 0 vulnerabilities in npm audit

🔮 Future

  • Caption style marketplace (community-contributed styles)
  • AI-powered auto-emoji (autoGenerateReactions(): keyword-based emoji generation from 60+ word→emoji mappings)
  • Multi-language caption support with RTL
  • Caption editor with visual timeline (preview server with playback controls, progress bar, beat markers, style selector)
  • Integration with video hosting APIs (YouTube, Vimeo)
  • Real-time caption rendering in browser (npx captioneer preview: live browser-based caption rendering with audio sync)
  • Caption translation utilities
  • Speaker diarization (multi-speaker support)

๐Ÿค Contributing

Contributions welcome! Please open an issue first to discuss what you'd like to change.

  1. Fork the repo
  2. Create your feature branch (git checkout -b feature/amazing)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing)
  5. Open a Pull Request

📄 License

MIT © Shuvo Roy


💡 Why This Exists

Everyone using Remotion for captioned videos ends up rebuilding the same thing:

Get audio → run Whisper → parse output → sync to frames → animate words

This package handles steps 2-5 so you can focus on your content, not plumbing.

โญ Star this repo if it helps you!