🎬 remotion-captioneer

Drop-in animated captions for Remotion.

Feed it audio. Get word-level synced, beautifully animated captions. 14 styles. Zero hassle.

🌐 Live Demo →

🤝 Works With `@remotion/captions`

Our types are fully compatible with the official @remotion/captions package. You can convert freely between them:

import { createTikTokStyleCaptions } from "@remotion/captions";
import { toCaptionArray, fromCaptionArray } from "remotion-captioneer";

// Convert our CaptionData → flat Caption[] for @remotion/captions
const flatCaptions = toCaptionArray(myCaptionData);
const { pages } = createTikTokStyleCaptions({
  captions: flatCaptions,
  combineTokensWithinMilliseconds: 1200,
});

// Or go the other way: Caption[] → CaptionData
const captionData = fromCaptionArray(flatCaptions);

Feature	`@remotion/captions` (official)	`remotion-captioneer` (this package)
Caption types	✅ `Caption` type	✅ Compatible + `CaptionData` with segments
Page segmentation	✅ `createTikTokStyleCaptions()`	❌ Use official package
Animated components	❌ Build yourself	✅ 14 ready-to-use styles
STT/transcription	❌ Separate package	✅ 5 providers built-in
CLI tool	❌	✅ `npx captioneer process`
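
To give a feel for what the conversion above involves, here is a minimal sketch of flattening `CaptionData` into a flat `Caption[]`. It is not the library's implementation of `toCaptionArray`: `flattenToCaptions` is a hypothetical name, and the `Caption` shape is written out by hand to mirror the type documented by the official package. The official package treats whitespace in `text` as significant, so this sketch prefixes every word after the first with a space.

```typescript
// Hand-written mirror of the Caption shape from @remotion/captions (assumption:
// kept in sync manually; in a real project import the type instead).
interface Caption {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
}

interface Word { word: string; startMs: number; endMs: number; confidence: number }
interface Segment { text: string; startMs: number; endMs: number; words: Word[] }
interface CaptionData { segments: Segment[]; language: string; durationMs: number }

// Hypothetical helper: one Caption per word, in playback order.
function flattenToCaptions(data: CaptionData): Caption[] {
  const out: Caption[] = [];
  for (const segment of data.segments) {
    for (const w of segment.words) {
      out.push({
        // Leading space on every token except the first keeps words separable.
        text: out.length === 0 ? w.word : " " + w.word,
        startMs: w.startMs,
        endMs: w.endMs,
        timestampMs: null, // no single token timestamp is available here
        confidence: w.confidence,
      });
    }
  }
  return out;
}
```

Segment boundaries are lost in the flat form, which is why going back to `CaptionData` needs some regrouping rule.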

🎥 Caption Styles Preview

Word Highlight

Each word lights up as it's spoken with a scale animation.

"Hello world this is"
  dim  dim  GOLD  dim
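
The diagram above boils down to one question per frame: which word is being spoken right now? A minimal sketch of that lookup follows. `frameToMs` and `activeWordIndex` are hypothetical names for illustration, not exports of this package.

```typescript
interface TimedWord { word: string; startMs: number; endMs: number }

// Convert a frame number to milliseconds at a given frame rate.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}

// Index of the word whose [startMs, endMs) range contains timeMs, or -1
// when the time falls in a pause between words.
function activeWordIndex(words: TimedWord[], timeMs: number): number {
  for (let i = 0; i < words.length; i++) {
    if (timeMs >= words[i].startMs && timeMs < words[i].endMs) {
      return i;
    }
  }
  return -1;
}
```

In a Remotion component the frame and frame rate would come from `useCurrentFrame()` and `useVideoConfig()`.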

Karaoke

Progressive color fill — left-to-right like karaoke.

"Hello world this is"
 RED   red  ░░░░  ░░░
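
One plausible way to drive the fill is a per-word progress value between 0 and 1. The sketch below assumes a linear fill across the word's duration; `karaokeFill` is a hypothetical helper, and the real component may ease or fill differently.

```typescript
// Fraction of a word that should be filled at timeMs, clamped to [0, 1].
function karaokeFill(timeMs: number, startMs: number, endMs: number): number {
  if (endMs <= startMs) {
    // Zero-length word: it is either fully filled or not filled at all.
    return timeMs >= endMs ? 1 : 0;
  }
  const t = (timeMs - startMs) / (endMs - startMs);
  return Math.min(1, Math.max(0, t));
}
```

The result could feed a CSS gradient stop or a clip width, for example `width: ${karaokeFill(t, s, e) * 100}%`.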

Typewriter

Character-by-character reveal with blinking cursor.

┌─────────────────────┐
│ Hello world th|      │
└─────────────────────┘
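
A simple way to get this effect is to reveal characters in proportion to elapsed time. This sketch assumes an even pace across the whole segment rather than per-word timing; `typewriterText` is a hypothetical helper, not part of the package API.

```typescript
// Portion of `text` visible at timeMs when characters are revealed at an
// even pace between startMs and endMs.
function typewriterText(
  text: string,
  timeMs: number,
  startMs: number,
  endMs: number
): string {
  const span = Math.max(1, endMs - startMs);
  const elapsed = Math.min(span, Math.max(0, timeMs - startMs));
  // Multiply before dividing so whole-character boundaries stay exact.
  const visible = Math.floor((elapsed * text.length) / span);
  return text.slice(0, visible);
}
```

With the text from the diagram, `typewriterText("Hello world this is", 1400, 0, 1900)` returns `"Hello world th"`.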

Bounce

Active word bounces up with spring physics.

"Hello  world  this  is"
  ↓     ↑      ↓     ↓
       bounce!

👉 See them animated live at the demo page.

✨ Features

🎙️ 5 STT Providers — Local Whisper, OpenAI, Groq, Deepgram, AssemblyAI
🎨 14 Caption Styles — Word Highlight, Karaoke, Typewriter, Bounce, Wave, Glow, Erase, Pill, Flicker, Highlighter, Blur, Rainbow, Scale, Spotlight
🎭 24 Presets — TikTok, Instagram, YouTube, Podcast, Cinematic, Music, Tutorial, Minimal, Gaming, News, Education, Fun
🎵 Audio-Video Sync — Beat detection, volume-reactive animations, timeline keyframes
📦 Template System — Data-driven video generation from JSON config
🧱 Layout Primitives — Stack, Row, Columns, Grid, Center, FadeIn, SlideUp
📤 7 Export Formats — SRT, VTT, ASS, TXT, word-level SRT & VTT
⚡ Drop-in Components — <AnimatedCaptions> works out of the box
🔧 CLI Tool — process, batch, export, presets, providers, styles
📐 Zero Config — Works with sensible defaults, customizable everything
🔷 TypeScript — Full type definitions included
🐳 Docker Ready — Deploy rendering at scale

🚀 Quick Start

Option 1: Scaffold a Project

npx captioneer init my-video
cd my-video
npm install
npm start

This creates a ready-to-use Remotion project with captions.

Option 2: Add to Existing Project

1. Install

npm install remotion-captioneer

2. Generate Captions from Audio

npx captioneer process my-audio.mp4

This creates my-audio-captions.json with word-level timestamps.

3. Use in Your Remotion Project

import { AbsoluteFill } from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./my-audio-captions.json";

export const MyVideo = () => {
  return (
    <AbsoluteFill style={{ backgroundColor: "#0a0a0a" }}>
      <AnimatedCaptions
        captions={captions}
        style="word-highlight"
        position="bottom"
        highlightColor="#FFD700"
      />
    </AbsoluteFill>
  );
};

That's it. Render with npx remotion render as usual.

🎨 Caption Styles

14 animated styles, each with a unique visual feel:

Style	Effect	Best For
`word-highlight`	Each word lights up with scale animation	Podcasts, interviews
`karaoke`	Progressive left-to-right color fill	Music, singing
`typewriter`	Character-by-character reveal + cursor	Tutorials, code demos
`bounce`	Active word bounces with spring physics	Social media, reels
`wave`	Words animate in a wave pattern	Music, rhythmic content
`glow`	Neon glow pulsing on active word	Cinematic, dramatic
`typewriter-erase`	Types then erases word-by-word	Transitions, reveals
`pill`	Active word in a colored pill/badge	Clean, modern look
`flicker`	Flickers in like a neon sign	Retro, neon aesthetic
`highlighter`	Yellow highlighter behind active word	Study, educational
`blur`	Future words blur, active word sharpens	Dramatic reveals
`rainbow`	Cycling rainbow colors on active word	Fun, playful content
`scale`	Words grow from small to full size	Energetic, bold
`spotlight`	Radial spotlight effect behind active word	Theatrical, stage

<AnimatedCaptions captions={captions} style="word-highlight" />
<AnimatedCaptions captions={captions} style="karaoke" />
<AnimatedCaptions captions={captions} style="typewriter" />
<AnimatedCaptions captions={captions} style="bounce" />
<AnimatedCaptions captions={captions} style="wave" />
<AnimatedCaptions captions={captions} style="glow" />
<AnimatedCaptions captions={captions} style="typewriter-erase" />
<AnimatedCaptions captions={captions} style="pill" />
<AnimatedCaptions captions={captions} style="flicker" />
<AnimatedCaptions captions={captions} style="highlighter" />
<AnimatedCaptions captions={captions} style="blur" />
<AnimatedCaptions captions={captions} style="rainbow" />
<AnimatedCaptions captions={captions} style="scale" />
<AnimatedCaptions captions={captions} style="spotlight" />

📡 STT Providers

Choose your speech-to-text backend. Five providers are supported out of the box:

Provider	Env Variable	Speed	Offline	Best For
Local Whisper	—	⭐⭐	✅	Privacy, no API costs
OpenAI	`OPENAI_API_KEY`	⭐⭐⭐	❌	Best accuracy
Groq	`GROQ_API_KEY`	⭐⭐⭐⭐⭐	❌	Ultra-fast inference
Deepgram	`DEEPGRAM_API_KEY`	⭐⭐⭐⭐	❌	Real-time capable
AssemblyAI	`ASSEMBLYAI_API_KEY`	⭐⭐⭐	❌	Rich features

🎭 Caption Presets

Apply a professional look instantly with one of the built-in presets:

import { AnimatedCaptions, applyPreset } from "remotion-captioneer";

// Use a preset
<AnimatedCaptions
  captions={captions}
  {...applyPreset("tiktok")}
/>

// Or store the preset's props first, then spread them
const cinematicStyle = applyPreset("cinematic-gold");
<AnimatedCaptions captions={captions} {...cinematicStyle} />

Available Presets

Category	Presets
Social Media	`tiktok`, `instagram-reels`, `youtube-shorts`, `twitter-clips`
Podcast	`podcast-clean`, `podcast-bold`
Cinematic	`cinematic-gold`, `cinematic-white`, `cinematic-neon`
Music	`music-karaoke`, `music-wave`
Tutorial	`tutorial-typewriter`, `tutorial-erase`
Minimal	`minimal-white`, `minimal-subtle`
Gaming	`gaming-neon`, `gaming-bold`
News & Documentary	`news-ticker`, `documentary`
Education	`education-highlighter`, `education-scale`
Fun & Creative	`fun-rainbow`, `retro-flicker`

# List presets from CLI
npx captioneer presets

📤 Export Formats

Export captions to standard subtitle formats:

import {
  toSRT,
  toVTT,
  toASS,
  toPlainText,
  toWordLevelSRT,
  toWordLevelVTT,
} from "remotion-captioneer";

const srt = toSRT(captionData);       // SubRip (.srt)
const vtt = toVTT(captionData);       // WebVTT (.vtt)
const ass = toASS(captionData);       // SubStation Alpha (.ass)
const txt = toPlainText(captionData); // Plain text

// Word-level exports (for custom timing)
const srtWords = toWordLevelSRT(captionData);
const vttWords = toWordLevelVTT(captionData);

# Export from CLI
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subtitles.vtt
npx captioneer export captions.json --format ass
npx captioneer export captions.json --format srt-words

Formats: srt, vtt, ass, txt, srt-words, vtt-words
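
For reference, the SRT format itself is simple: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line between cues. The sketch below shows that layout under the assumption of one cue per segment. `formatSrtTimestamp` and `segmentsToSrt` are hypothetical names, not the package's `toSRT`.

```typescript
interface Cue { text: string; startMs: number; endMs: number }

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Milliseconds -> "HH:MM:SS,mmm" (SRT uses a comma before the milliseconds).
function formatSrtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + "," + pad(millis, 3);
}

// One numbered cue per segment, cues separated by a blank line.
function segmentsToSrt(cues: Cue[]): string {
  const blocks: string[] = [];
  for (let i = 0; i < cues.length; i++) {
    const c = cues[i];
    blocks.push(
      String(i + 1) + "\n" +
      formatSrtTimestamp(c.startMs) + " --> " + formatSrtTimestamp(c.endMs) + "\n" +
      c.text
    );
  }
  return blocks.join("\n\n") + "\n";
}
```

WebVTT timestamps follow the same pattern but use a period instead of a comma before the milliseconds.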

Auto-Detection

The CLI auto-detects available providers from environment variables:

# Groq is fastest — set this first if you have a key
export GROQ_API_KEY="gsk_..."

# Or OpenAI
export OPENAI_API_KEY="sk-..."

# Then just run — it picks the best available
npx captioneer process audio.mp4
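
Conceptually, auto-detection is a priority scan over environment variables. The sketch below uses the variable names from the provider table; the priority order and the fallback to local Whisper are assumptions for illustration, and `pickProvider` is a hypothetical function rather than the CLI's actual logic.

```typescript
type ProviderName = "groq" | "openai" | "deepgram" | "assemblyai" | "local";

// Assumed priority: fastest hosted provider first.
const ENV_KEYS: Array<[ProviderName, string]> = [
  ["groq", "GROQ_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["deepgram", "DEEPGRAM_API_KEY"],
  ["assemblyai", "ASSEMBLYAI_API_KEY"],
];

// Return the first provider whose API key is set and non-empty, falling
// back to local Whisper (assumption) when no key is present.
function pickProvider(env: Record<string, string | undefined>): ProviderName {
  for (let i = 0; i < ENV_KEYS.length; i++) {
    const name = ENV_KEYS[i][0];
    const value = env[ENV_KEYS[i][1]];
    if (value !== undefined && value.trim() !== "") {
      return name;
    }
  }
  return "local";
}
```

In Node you would call it as `pickProvider(process.env)`. Taking the environment as a parameter keeps the function easy to test.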

Explicit Provider

npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1
npx captioneer process audio.mp4 --provider deepgram --model nova-2
npx captioneer process audio.mp4 --provider assemblyai
npx captioneer process audio.mp4 --provider local --model base

Check Provider Status

npx captioneer providers

📡 Available STT Providers:

  local           ✅ ready
                  models: tiny, base, small, medium, large

  groq            ✅ ready
                  models: whisper-large-v3, whisper-large-v3-turbo, distil-whisper-large-v3-en

  openai          ⚪ not configured
                  models: whisper-1

Programmatic Usage

import {
  GroqProvider,
  OpenAIProvider,
  detectProvider,
} from "remotion-captioneer";

// Groq: ultra-fast
const groq = new GroqProvider("gsk_...");
const groqCaptions = await groq.transcribe("audio.mp4", {
  model: "whisper-large-v3-turbo",
  language: "en",
});

// OpenAI
const openai = new OpenAIProvider("sk-...");
const openaiCaptions = await openai.transcribe("audio.mp4");

// Auto-detect from env
const detected = detectProvider();
if (detected) {
  const captions = await detected.provider.transcribe("audio.mp4");
}

🎵 Audio-Video Sync

Frame-perfect animations synchronized to audio. No more manually timing keyframes.

Pre-analyze Audio

import { analyzeAudio } from "remotion-captioneer";

const analysis = await analyzeAudio("my-audio.mp4");
// Returns: beats, volumeFrames, bpm, energy levels

Beat-Reactive Hooks

import {
  AudioSyncProvider,
  useBeatPulse,
  useVolume,
  useEnergy,
} from "remotion-captioneer";

// Wrap your composition. `audioAnalysis` is the object returned by analyzeAudio().
const MyVideo = () => (
  <AudioSyncProvider analysis={audioAnalysis}>
    <BeatReactiveContent />
  </AudioSyncProvider>
);

// Use in any child component
const BeatReactiveContent = () => {
  const pulse = useBeatPulse();       // 0→1 spring on each beat
  const volume = useVolume();          // Current volume 0-1
  const energy = useEnergy();          // Smoothed energy 0-1

  return (
    <div style={{
      transform: `scale(${1 + pulse * 0.2})`,
      opacity: 0.5 + volume * 0.5,
    }}>
      🎵 Synced to the beat!
    </div>
  );
};
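
To build intuition for a beat pulse, here is a deliberately simpler stand-in: instead of a spring, the value jumps to 1 on each beat and decays exponentially. This is a swapped-in technique, not how `useBeatPulse()` is implemented, and `beatPulse` and `decayMs` are hypothetical names.

```typescript
// Pulse in [0, 1]: 1 exactly on a beat, decaying towards 0 afterwards.
// beatsMs holds beat timestamps in milliseconds, in any order.
function beatPulse(timeMs: number, beatsMs: number[], decayMs: number = 200): number {
  let lastBeat = -Infinity;
  for (const b of beatsMs) {
    // Track the most recent beat at or before the current time.
    if (b <= timeMs && b > lastBeat) {
      lastBeat = b;
    }
  }
  if (lastBeat === -Infinity) {
    return 0; // no beat has happened yet
  }
  return Math.exp(-(timeMs - lastBeat) / decayMs);
}
```

A smaller `decayMs` gives a snappier pulse; a larger one lets the effect linger between beats.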

Timeline Keyframes

import { useTimelineValue, fadeInOut } from "remotion-captioneer";

// Map animation to audio timestamps (in ms)
const opacity = useTimelineValue({
  keyframes: [
    { timeMs: 0, value: 0 },
    { timeMs: 1000, value: 1, easing: "easeOut" },
    { timeMs: 5000, value: 1 },
    { timeMs: 6000, value: 0, easing: "easeIn" },
  ],
  defaultValue: 0,
});

// Or use the helper
const fadeOpacity = useTimelineValue(
  fadeInOut(0, 1000, 5000, 6000)
);
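
Under the hood, a timeline value is interpolation between the two keyframes that surround the current time. The sketch below is one way to do it and makes several assumptions: an easing applies to the segment arriving at its keyframe, `easeIn` and `easeOut` are quadratic, and times outside the range clamp to the first or last value. `valueAt` is a hypothetical function, not the hook itself.

```typescript
type Easing = "linear" | "easeIn" | "easeOut";
interface Keyframe { timeMs: number; value: number; easing?: Easing }

// Quadratic easing curves (assumption; the library may use other curves).
function ease(t: number, easing: Easing = "linear"): number {
  switch (easing) {
    case "easeIn": return t * t;
    case "easeOut": return 1 - (1 - t) * (1 - t);
    default: return t;
  }
}

function valueAt(keyframes: Keyframe[], timeMs: number, defaultValue: number = 0): number {
  if (keyframes.length === 0) return defaultValue;
  const sorted = keyframes.slice().sort((a, b) => a.timeMs - b.timeMs);
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  if (timeMs <= first.timeMs) return first.value;
  if (timeMs >= last.timeMs) return last.value;
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const next = sorted[i];
    if (timeMs <= next.timeMs) {
      const span = next.timeMs - prev.timeMs;
      const t = span === 0 ? 1 : (timeMs - prev.timeMs) / span;
      return prev.value + (next.value - prev.value) * ease(t, next.easing);
    }
  }
  return last.value;
}
```

With the keyframes from the example above, the value is 0.75 at 500 ms (halfway through the ease-out), 1 at 3000 ms, and 0.75 again at 5500 ms (halfway through the ease-in).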

Available Hooks

Hook	Returns	Use For
`useVolume()`	`number` (0-1)	Opacity, scale, size
`useBeat()`	`BeatInfo \| null`	Flash effects, pulses
`useBeatPulse()`	`number` (0-1 spring)	Bounce, scale on beat
`useEnergy()`	`number` (0-1)	Background intensity
`useIsOnBeat()`	`boolean`	Conditional rendering
`useTimelineValue()`	`number`	Keyframe animations
`useTimelineProgress()`	`number` (0-1)	Progress bars

📦 Template System

Build videos from JSON config. No code needed for simple videos.

Quick Template

import { buildTemplate, TemplateComposition } from "remotion-captioneer";

const template = buildTemplate({
  name: "My Captioned Video",
  intro: {
    title: "Episode 1",
    subtitle: "Getting Started",
    logo: "/logo.png",
  },
  captions: [
    { captions: myCaptions, captionStyle: "word-highlight" },
  ],
  outro: {
    heading: "Thanks for watching!",
    cta: "Subscribe for more",
    logo: "/logo.png",
  },
});

// Use as Remotion composition
<TemplateComposition template={template} />

Preset Scenes

import {
  createIntroScene,
  createCaptionScene,
  createOutroScene,
  createDividerScene,
} from "remotion-captioneer";

const intro = createIntroScene({
  title: "My Video",
  subtitle: "A demo",
  durationSec: 3,
});

const content = createCaptionScene({
  captions: myCaptions,
  captionStyle: "karaoke",
  highlightColor: "#FF6B6B",
});

const outro = createOutroScene({
  heading: "The End",
  cta: "Like & Subscribe",
  logo: "/logo.png",
});

Design Tokens

Customize the entire look with a single config:

const template = buildTemplate({
  name: "Brand Video",
  tokens: {
    colors: {
      primary: "#6366F1",
      accent: "#FFD700",
      background: "#0a0a0a",
      text: "#FFFFFF",
    },
    typography: {
      headingFont: "Poppins, sans-serif",
      bodyFont: "Inter, sans-serif",
    },
  },
  // ...
});

🧱 Layout Primitives

Composable layout building blocks for any Remotion video:

import {
  Stack, Row, Columns, Grid,
  Center, FadeIn, SlideUp,
  GradientBg, Overlay, Positioned,
} from "remotion-captioneer";

// Vertical stack
<Stack gap={24}>
  <FadeIn delayMs={0}>Title</FadeIn>
  <FadeIn delayMs={200}>Subtitle</FadeIn>
</Stack>

// Horizontal columns
<Columns ratios={[2, 1]} gap={32}>
  <div>Main content</div>
  <div>Sidebar</div>
</Columns>

// Grid layout
<Grid columns={3} gap={16}>
  {items.map(item => <Card key={item.id} />)}
</Grid>

// Animated entrance
<SlideUp delayMs={500} durationMs={800}>
  <div>Slides up with delay</div>
</SlideUp>

// Gradient background
<GradientBg from="#0a0a0a" to="#1a1a2e">
  <Center>Content here</Center>
</GradientBg>
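
As an illustration of what `ratios={[2, 1]}` with a `gap` might resolve to, the sketch below splits the width left after gaps in proportion to the ratios. This is an assumption about the layout math, not the component's source (which may simply rely on CSS flex), and `columnWidths` is a hypothetical helper.

```typescript
// Split totalWidth into columns proportional to ratios, leaving `gap`
// pixels between adjacent columns.
function columnWidths(totalWidth: number, ratios: number[], gap: number): number[] {
  const totalGap = gap * Math.max(0, ratios.length - 1);
  const available = Math.max(0, totalWidth - totalGap);
  let sum = 0;
  for (const r of ratios) sum += r;
  if (sum <= 0) return ratios.map(() => 0);
  return ratios.map((r) => (available * r) / sum);
}
```

For a 1232 px wide container, `columnWidths(1232, [2, 1], 32)` gives `[800, 400]`.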

🎙️ CLI Reference

Process Audio

# Basic usage (auto-detects provider from env vars)
npx captioneer process audio.mp4

# Specify provider
npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1

# With options
npx captioneer process audio.mp4 --provider groq --language en --output captions.json
npx captioneer process audio.mp4 --provider local --model base

# Pass API key directly
npx captioneer process audio.mp4 --provider groq --api-key gsk_...

Options:

-p, --provider <provider> — STT provider: local, openai, groq, deepgram, assemblyai
-m, --model <model> — Model name (provider-specific)
-k, --api-key <key> — API key (or use env vars)
-l, --language <lang> — Language code: en, es, fr, de, etc.
-o, --output <path> — Output JSON path
-v, --verbose — Verbose output

Other Commands

# Scaffold a new project
npx captioneer init my-video

# List available providers and their status
npx captioneer providers

# List available caption styles
npx captioneer styles

# List available presets
npx captioneer presets

# Export captions to SRT/VTT/ASS
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subs.vtt

# Batch process a directory of audio files
npx captioneer batch ./audio-files/
npx captioneer batch ./audio-files/ --provider groq --output-dir ./captions/

# Start real-time preview server
npx captioneer preview

# Open Remotion Studio with demos
npx captioneer demo

📖 Caption Data Format

The generated JSON follows this structure:

interface CaptionData {
  segments: Array<{
    text: string;           // Full segment text
    startMs: number;        // Segment start time (ms)
    endMs: number;          // Segment end time (ms)
    words: Array<{
      word: string;         // Word text
      startMs: number;      // Word start time (ms)
      endMs: number;        // Word end time (ms)
      confidence: number;   // Whisper confidence (0-1)
    }>;
  }>;
  language: string;         // Detected language
  durationMs: number;       // Total duration (ms)
}

You can also create caption data manually or from other sources — just match this format.
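
If your word timings come from another tool, a small helper can assemble them into this shape. The sketch below starts a new segment whenever the pause between two words exceeds a threshold; the 600 ms default and the name `buildCaptionData` are assumptions for illustration, not part of the package.

```typescript
interface Word { word: string; startMs: number; endMs: number; confidence: number }
interface Segment { text: string; startMs: number; endMs: number; words: Word[] }
interface CaptionData { segments: Segment[]; language: string; durationMs: number }

// Group time-ordered words into segments, splitting on long pauses.
function buildCaptionData(
  words: Word[],
  language: string = "en",
  maxGapMs: number = 600
): CaptionData {
  const segments: Segment[] = [];
  let current: Word[] = [];

  const flush = (): void => {
    if (current.length === 0) return;
    segments.push({
      text: current.map((w) => w.word).join(" "),
      startMs: current[0].startMs,
      endMs: current[current.length - 1].endMs,
      words: current,
    });
    current = [];
  };

  for (const w of words) {
    const prev = current.length > 0 ? current[current.length - 1] : undefined;
    if (prev !== undefined && w.startMs - prev.endMs > maxGapMs) {
      flush(); // the pause is long enough to start a new segment
    }
    current.push(w);
  }
  flush();

  const last = segments.length > 0 ? segments[segments.length - 1] : undefined;
  return { segments, language, durationMs: last ? last.endMs : 0 };
}
```

Here `durationMs` is taken from the last word's end time. If the audio runs longer than the speech, set it from the audio file instead.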

⚙️ Configuration

Create a .captioneerrc file in your project root:

{
  "whisperPath": "./whisper.cpp",
  "modelPath": "./whisper.cpp/models/ggml-base.bin",
  "defaultModel": "base",
  "defaultLanguage": "en",
  "defaultStyle": "word-highlight"
}

Or add to your package.json:

{
  "captioneer": {
    "defaultModel": "base",
    "defaultLanguage": "en"
  }
}
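
When both sources exist, something has to decide which one wins. The README does not state the precedence, so the sketch below assumes `.captioneerrc` overrides `package.json`, which in turn overrides defaults. `resolveConfig` and the default values are hypothetical; check the actual behavior before relying on it.

```typescript
interface CaptioneerConfig {
  whisperPath?: string;
  modelPath?: string;
  defaultModel?: string;
  defaultLanguage?: string;
  defaultStyle?: string;
}

// Assumed fallbacks, taken from the example values shown above.
const DEFAULTS: CaptioneerConfig = {
  defaultModel: "base",
  defaultLanguage: "en",
  defaultStyle: "word-highlight",
};

// Merge order (assumption): defaults < package.json "captioneer" < .captioneerrc
function resolveConfig(
  rc?: CaptioneerConfig,
  pkg?: { captioneer?: CaptioneerConfig }
): CaptioneerConfig {
  const fromPkg = pkg && pkg.captioneer ? pkg.captioneer : {};
  return { ...DEFAULTS, ...fromPkg, ...(rc || {}) };
}
```

Both arguments are already-parsed JSON objects, so file reading and error handling stay outside this function.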

🎬 Full Example

import {
  AbsoluteFill,
  Audio,
  Composition,
  staticFile,
} from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./captions.json";

export const CaptionedVideo = () => (
  <AbsoluteFill
    style={{
      background: "linear-gradient(135deg, #0a0a0a 0%, #1a1a2e 100%)",
    }}
  >
    <Audio src={staticFile("my-audio.mp4")} />
    <AnimatedCaptions
      captions={captions}
      style="karaoke"
      position="bottom"
      highlightColor="#FF6B6B"
      fontSize={64}
      fontFamily="Inter, sans-serif"
    />
  </AbsoluteFill>
);

export const RemotionRoot = () => (
  <Composition
    id="CaptionedVideo"
    component={CaptionedVideo}
    durationInFrames={900} // 30s at 30fps
    fps={30}
    width={1920}
    height={1080}
  />
);
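
The hard-coded `durationInFrames={900}` only fits a 30-second clip. Since the caption JSON carries `durationMs`, the frame count can be derived from it. A minimal sketch follows, with `durationInFramesFor` as a hypothetical helper name.

```typescript
// Number of frames needed to cover durationMs at the given frame rate.
// Rounds up so the last word is never cut off, and never returns less than 1.
function durationInFramesFor(durationMs: number, fps: number): number {
  return Math.max(1, Math.ceil((durationMs * fps) / 1000));
}
```

You could pass `durationInFramesFor(captions.durationMs, 30)` to the composition, or compute it in Remotion's `calculateMetadata` if the duration is only known at render time.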

🐳 Docker

FROM node:20-slim

# Install whisper.cpp dependencies
RUN apt-get update && apt-get install -y git cmake build-essential

WORKDIR /app
COPY . .
RUN npm install

# The CLI will auto-install whisper.cpp on first run
ENTRYPOINT ["npx", "captioneer"]

🛠️ Component Props

`<AnimatedCaptions>`

Prop	Type	Default	Description
`captions`	`CaptionData`	required	Caption data object
`style`	`CaptionStyle`	`"word-highlight"`	Caption animation style
`fontFamily`	`string`	`"Inter, sans-serif"`	Font family
`fontSize`	`number`	`56`	Font size in px
`fontColor`	`string`	`"rgba(255,255,255,0.5)"`	Inactive text color
`highlightColor`	`string`	`"#FFD700"`	Active/highlight color
`position`	`"top" \| "center" \| "bottom"`	`"bottom"`	Vertical position

📚 Examples

See the examples/ directory for complete working examples:

File	What it shows
`01-basic.tsx`	Simplest captioned video
`02-presets.tsx`	Using presets (TikTok, Cinematic, Gaming)
`03-audio-sync.tsx`	Beat-reactive animations
`04-template.tsx`	Multi-scene template (intro → content → outro)
`05-layouts.tsx`	Custom layouts with primitives
`06-export.ts`	Export to SRT, VTT, ASS formats
`07-emoji.tsx`	Emoji reactions at word timestamps

🗺️ Roadmap

✅ Completed

🔮 Future

Caption style marketplace (community-contributed styles)
~~AI-powered auto-emoji~~ (autoGenerateReactions() — keyword-based emoji generation from 60+ word→emoji mappings)
Multi-language caption support with RTL
~~Caption editor with visual timeline~~ (Preview server with playback controls, progress bar, beat markers, style selector)
Integration with video hosting APIs (YouTube, Vimeo)
~~Real-time caption rendering in browser~~ (npx captioneer preview — live browser-based caption rendering with audio sync)
Caption translation utilities
Speaker diarization (multi-speaker support)

🤝 Contributing

Contributions welcome! Please open an issue first to discuss what you'd like to change.

1. Fork the repo
2. Create your feature branch (git checkout -b feature/amazing)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing)
5. Open a Pull Request

📄 License

💡 Why This Exists

Everyone using Remotion for captioned videos ends up rebuilding the same thing:

Get audio → run Whisper → parse output → sync to frames → animate words

This package handles steps 2-5 so you can focus on your content, not plumbing.

⭐ Star this repo if it helps you!
