Drop-in animated captions for Remotion.
Feed it audio. Get word-level synced, beautifully animated captions. 14 styles. Zero hassle.
Our types are fully compatible with the official @remotion/captions package. You can convert freely between them:
import { createTikTokStyleCaptions } from "@remotion/captions";
import { toCaptionArray, fromCaptionArray } from "remotion-captioneer";
// Convert our CaptionData → flat Caption[] for @remotion/captions
const flatCaptions = toCaptionArray(myCaptionData);
const { pages } = createTikTokStyleCaptions({
captions: flatCaptions,
combineTokensWithinMilliseconds: 1200,
});
// Or go the other way: Caption[] → CaptionData
const captionData = fromCaptionArray(flatCaptions);

| Feature | @remotion/captions (official) | remotion-captioneer (this) |
|---|---|---|
| Caption types | ✅ `Caption` type | ✅ Compatible + `CaptionData` with segments |
| Page segmentation | ✅ `createTikTokStyleCaptions()` | Use the official package |
| Animated components | ❌ Build yourself | ✅ 14 ready-to-use styles |
| STT/transcription | ❌ Separate package | ✅ 5 providers built-in |
| CLI tool | ❌ | ✅ `npx captioneer process` |
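
As a rough sketch of what the conversion to a flat array involves: the `Caption` shape below mirrors the fields documented for `@remotion/captions`, while `flattenSketch`, the local `CaptionData` types, and the leading-space convention are assumptions made for illustration. This is not the implementation of `toCaptionArray`.

```typescript
// Local copies of the types, following the CaptionData structure in this README.
type Word = { word: string; startMs: number; endMs: number; confidence: number };
type Segment = { text: string; startMs: number; endMs: number; words: Word[] };
type CaptionData = { segments: Segment[]; language: string; durationMs: number };

// Flat caption shape as documented by @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: emit one Caption per word, in order.
function flattenSketch(data: CaptionData): Caption[] {
  const out: Caption[] = [];
  for (const segment of data.segments) {
    segment.words.forEach((w, i) => {
      out.push({
        // Assumption: a leading space marks the word boundary for all but the first word.
        text: i === 0 ? w.word : " " + w.word,
        startMs: w.startMs,
        endMs: w.endMs,
        // Assumption: use the midpoint of the word as its single timestamp.
        timestampMs: Math.round((w.startMs + w.endMs) / 2),
        confidence: w.confidence,
      });
    });
  }
  return out;
}

const sample: CaptionData = {
  segments: [
    {
      text: "Hello world",
      startMs: 0,
      endMs: 900,
      words: [
        { word: "Hello", startMs: 0, endMs: 400, confidence: 0.98 },
        { word: "world", startMs: 450, endMs: 900, confidence: 0.95 },
      ],
    },
  ],
  language: "en",
  durationMs: 900,
};

const flat = flattenSketch(sample);
console.log(flat.map((c) => c.text).join("")); // "Hello world"
```

Going the other way needs a rule for grouping words back into segments (for example, splitting on pauses), which the flat format does not record.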
- word-highlight: Each word lights up as it's spoken with a scale animation.
- karaoke: Progressive color fill, left-to-right like karaoke.
- typewriter: Character-by-character reveal with blinking cursor.
- bounce: Active word bounces up with spring physics.

See them animated live at the demo page.
- 5 STT Providers: Local Whisper, OpenAI, Groq, Deepgram, AssemblyAI
- 14 Caption Styles: Word Highlight, Karaoke, Typewriter, Bounce, Wave, Glow, Erase, Pill, Flicker, Highlighter, Blur, Rainbow, Scale, Spotlight
- 24 Presets: TikTok, Instagram, YouTube, Podcast, Cinematic, Music, Tutorial, Minimal, Gaming, News, Education, Fun
- Audio-Video Sync: Beat detection, volume-reactive animations, timeline keyframes
- Template System: Data-driven video generation from JSON config
- Layout Primitives: Stack, Row, Columns, Grid, Center, FadeIn, SlideUp
- 7 Export Formats: SRT, VTT, ASS, TXT, word-level SRT & VTT
- Drop-in Components: `<AnimatedCaptions>` works out of the box
- CLI Tool: process, batch, export, presets, providers, styles
- Zero Config: Works with sensible defaults, everything customizable
- TypeScript: Full type definitions included
- Docker Ready: Deploy rendering at scale
npx captioneer init my-video
cd my-video
npm install
npm start

This creates a ready-to-use Remotion project with captions.

npm install remotion-captioneer

npx captioneer process my-audio.mp4

This creates my-audio-captions.json with word-level timestamps.
import { AbsoluteFill } from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./my-audio-captions.json";
export const MyVideo = () => {
return (
<AbsoluteFill style={{ backgroundColor: "#0a0a0a" }}>
<AnimatedCaptions
captions={captions}
style="word-highlight"
position="bottom"
highlightColor="#FFD700"
/>
</AbsoluteFill>
);
};

That's it. Render with npx remotion render as usual.
14 animated styles, each with a unique visual feel:
| Style | Effect | Best For |
|---|---|---|
| `word-highlight` | Each word lights up with scale animation | Podcasts, interviews |
| `karaoke` | Progressive left-to-right color fill | Music, singing |
| `typewriter` | Character-by-character reveal + cursor | Tutorials, code demos |
| `bounce` | Active word bounces with spring physics | Social media, reels |
| `wave` | Words animate in a wave pattern | Music, rhythmic content |
| `glow` | Neon glow pulsing on active word | Cinematic, dramatic |
| `typewriter-erase` | Types then erases word-by-word | Transitions, reveals |
| `pill` | Active word in a colored pill/badge | Clean, modern look |
| `flicker` | Flickers in like a neon sign | Retro, neon aesthetic |
| `highlighter` | Yellow highlighter behind active word | Study, educational |
| `blur` | Future words blur, active word sharpens | Dramatic reveals |
| `rainbow` | Cycling rainbow colors on active word | Fun, playful content |
| `scale` | Words grow from small to full size | Energetic, bold |
| `spotlight` | Radial spotlight effect behind active word | Theatrical, stage |
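
Most of the styles above depend on two per-frame questions: which word is active, and how far through it playback has progressed. A minimal sketch of that lookup, assuming word timings in milliseconds as in the generated JSON; `frameToMs`, `activeWordIndex`, and `fillProgress` are hypothetical helper names, not part of the package API.

```typescript
type Word = { word: string; startMs: number; endMs: number };

// Convert a frame number to milliseconds for a given frame rate.
function frameToMs(frame: number, fps: number): number {
  return (frame * 1000) / fps;
}

// Index of the word being spoken at timeMs, or -1 during silence.
function activeWordIndex(words: Word[], timeMs: number): number {
  for (let i = 0; i < words.length; i++) {
    if (timeMs >= words[i].startMs && timeMs < words[i].endMs) return i;
  }
  return -1;
}

// Karaoke-style fill: 0 before the word starts, 1 after it ends, linear in between.
function fillProgress(word: Word, timeMs: number): number {
  const span = word.endMs - word.startMs;
  if (span <= 0) return timeMs >= word.endMs ? 1 : 0;
  return Math.min(1, Math.max(0, (timeMs - word.startMs) / span));
}

const words: Word[] = [
  { word: "Hello", startMs: 0, endMs: 400 },
  { word: "world", startMs: 500, endMs: 1000 },
];

const t = frameToMs(18, 30); // frame 18 at 30 fps is 600 ms
console.log(activeWordIndex(words, t)); // 1
console.log(fillProgress(words[1], t)); // 0.2
```

In a Remotion component the frame would come from `useCurrentFrame()` and the frame rate from `useVideoConfig()`.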
<AnimatedCaptions captions={captions} style="word-highlight" />
<AnimatedCaptions captions={captions} style="karaoke" />
<AnimatedCaptions captions={captions} style="typewriter" />
<AnimatedCaptions captions={captions} style="bounce" />
<AnimatedCaptions captions={captions} style="wave" />
<AnimatedCaptions captions={captions} style="glow" />
<AnimatedCaptions captions={captions} style="typewriter-erase" />
<AnimatedCaptions captions={captions} style="pill" />
<AnimatedCaptions captions={captions} style="flicker" />
<AnimatedCaptions captions={captions} style="highlighter" />
<AnimatedCaptions captions={captions} style="blur" />
<AnimatedCaptions captions={captions} style="rainbow" />
<AnimatedCaptions captions={captions} style="scale" />
<AnimatedCaptions captions={captions} style="spotlight" />

Choose your speech-to-text backend. Supports 5 providers out of the box:
| Provider | Env Variable | Speed | Offline | Best For |
|---|---|---|---|---|
| Local Whisper | (none) | ⭐⭐ | ✅ | Privacy, no API costs |
| OpenAI | `OPENAI_API_KEY` | ⭐⭐⭐ | ❌ | Best accuracy |
| Groq | `GROQ_API_KEY` | ⭐⭐⭐⭐⭐ | ❌ | Ultra-fast inference |
| Deepgram | `DEEPGRAM_API_KEY` | ⭐⭐⭐⭐ | ❌ | Real-time capable |
| AssemblyAI | `ASSEMBLYAI_API_KEY` | ⭐⭐⭐ | ❌ | Rich features |
Apply a professional look instantly with one of the built-in presets:
import { AnimatedCaptions, applyPreset } from "remotion-captioneer";
// Use a preset
<AnimatedCaptions
captions={captions}
{...applyPreset("tiktok")}
/>
// Or store the preset props first, then spread them
const cinematicStyle = applyPreset("cinematic-gold");
<AnimatedCaptions captions={captions} {...cinematicStyle} />

| Category | Presets |
|---|---|
| Social Media | tiktok, instagram-reels, youtube-shorts, twitter-clips |
| Podcast | podcast-clean, podcast-bold |
| Cinematic | cinematic-gold, cinematic-white, cinematic-neon |
| Music | music-karaoke, music-wave |
| Tutorial | tutorial-typewriter, tutorial-erase |
| Minimal | minimal-white, minimal-subtle |
| Gaming | gaming-neon, gaming-bold |
| News & Documentary | news-ticker, documentary |
| Education | education-highlighter, education-scale |
| Fun & Creative | fun-rainbow, retro-flicker |
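
Because a preset is spread into the component as ordinary props, any prop written after the spread overrides the preset's value. The sketch below shows that ordering with plain objects; `applyPresetSketch`, `PRESETS_SKETCH`, and every value in it are made up for illustration and do not reflect the library's real preset contents.

```typescript
type CaptionProps = {
  style: string;
  position: "top" | "center" | "bottom";
  highlightColor: string;
  fontSize: number;
};

// Made-up values for illustration only; not the library's real presets.
const PRESETS_SKETCH: Record<string, CaptionProps> = {
  tiktok: { style: "bounce", position: "center", highlightColor: "#FFD700", fontSize: 72 },
  "podcast-clean": { style: "word-highlight", position: "bottom", highlightColor: "#FFFFFF", fontSize: 56 },
};

// Hypothetical stand-in for applyPreset(): look up a preset and return a copy.
function applyPresetSketch(name: string): CaptionProps {
  const preset = PRESETS_SKETCH[name];
  if (!preset) throw new Error(`Unknown preset: ${name}`);
  return { ...preset }; // copy so callers cannot mutate the table
}

// Later keys win, exactly as a prop written after {...preset} wins in JSX.
const props = { ...applyPresetSketch("tiktok"), highlightColor: "#00FFAA" };
console.log(props.style, props.highlightColor); // bounce #00FFAA
```

In JSX the equivalent is `<AnimatedCaptions captions={captions} {...applyPreset("tiktok")} highlightColor="#00FFAA" />`.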
# List presets from CLI
npx captioneer presets

Export captions to standard subtitle formats:

import {
toSRT, toVTT, toASS, toPlainText,
toWordLevelSRT, toWordLevelVTT,
} from "remotion-captioneer";
const srt = toSRT(captionData); // SubRip (.srt)
const vtt = toVTT(captionData); // WebVTT (.vtt)
const ass = toASS(captionData); // SubStation Alpha (.ass)
const txt = toPlainText(captionData); // Plain text
// Word-level exports (for custom timing)
const srtWords = toWordLevelSRT(captionData);
const vttWords = toWordLevelVTT(captionData);

# Export from CLI
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subtitles.vtt
npx captioneer export captions.json --format ass
npx captioneer export captions.json --format srt-words

Formats: srt, vtt, ass, txt, srt-words, vtt-words
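
For reference, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the text. A minimal sketch of how segment timings map onto that layout; `srtTimestamp` and `segmentsToSrt` are hypothetical helpers, not the package's `toSRT` implementation.

```typescript
type Segment = { text: string; startMs: number; endMs: number };

// Left-pad a number with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  const msPart = total % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(msPart, 3)}`;
}

// One numbered cue per segment, cues separated by a blank line.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTimestamp(seg.startMs)} --> ${srtTimestamp(seg.endMs)}\n${seg.text.trim()}\n`
    )
    .join("\n");
}

const srtText = segmentsToSrt([
  { text: "Hello world", startMs: 0, endMs: 1500 },
  { text: "Second line", startMs: 2000, endMs: 3250 },
]);
console.log(srtText);
```

A word-level export would follow the same pattern with one cue per word instead of one per segment.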
The CLI auto-detects available providers from environment variables:
# Groq is fastest, so set this first if you have a key
export GROQ_API_KEY="gsk_..."
# Or OpenAI
export OPENAI_API_KEY="sk-..."
# Then just run; it picks the best available
npx captioneer process audio.mp4

npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1
npx captioneer process audio.mp4 --provider deepgram --model nova-2
npx captioneer process audio.mp4 --provider assemblyai
npx captioneer process audio.mp4 --provider local --model base

npx captioneer providers

Available STT Providers:
local ✅ ready
models: tiny, base, small, medium, large
groq ✅ ready
models: whisper-large-v3, whisper-large-v3-turbo, distil-whisper-large-v3-en
openai ⚪ not configured
models: whisper-1
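
Auto-detection from environment variables can be pictured as a priority list that is scanned in order. The sketch below is only an illustration: the README states that Groq is fastest but does not document the full order, so the ordering, the fallback to local Whisper, and the name `pickProviderSketch` are all assumptions.

```typescript
type ProviderName = "groq" | "openai" | "deepgram" | "assemblyai" | "local";

// Assumed priority order; only "Groq first" is suggested by the README.
const ENV_KEYS: Array<[ProviderName, string]> = [
  ["groq", "GROQ_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["deepgram", "DEEPGRAM_API_KEY"],
  ["assemblyai", "ASSEMBLYAI_API_KEY"],
];

// Return the first provider whose API key is set and non-empty.
function pickProviderSketch(env: Record<string, string | undefined>): ProviderName {
  for (const [name, key] of ENV_KEYS) {
    const value = env[key];
    if (value !== undefined && value.trim() !== "") return name;
  }
  // Assumption: local Whisper is the fallback because it needs no API key.
  return "local";
}

console.log(pickProviderSketch({ OPENAI_API_KEY: "sk-example" })); // openai
console.log(pickProviderSketch({})); // local
```

Passing `--provider` on the command line bypasses any such detection.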
import { GroqProvider, OpenAIProvider, detectProvider } from "remotion-captioneer";
// Groq: ultra-fast
const groq = new GroqProvider("gsk_...");
const groqCaptions = await groq.transcribe("audio.mp4", {
model: "whisper-large-v3-turbo",
language: "en",
});
// OpenAI
const openai = new OpenAIProvider("sk-...");
const openaiCaptions = await openai.transcribe("audio.mp4");
// Auto-detect from env (returns a falsy value if no provider is configured)
const detected = detectProvider();
if (detected) {
const captions = await detected.provider.transcribe("audio.mp4");
}

Frame-perfect animations synchronized to audio. No more manually timing keyframes.
import { analyzeAudio } from "remotion-captioneer";
const analysis = await analyzeAudio("my-audio.mp4");
// Returns: beats, volumeFrames, bpm, energy levels

import {
AudioSyncProvider,
useBeatPulse,
useVolume,
useEnergy,
} from "remotion-captioneer";
// Wrap your composition
const MyVideo = () => (
<AudioSyncProvider analysis={audioAnalysis}>
<BeatReactiveContent />
</AudioSyncProvider>
);
// Use in any child component
const BeatReactiveContent = () => {
const pulse = useBeatPulse(); // 0→1 spring on each beat
const volume = useVolume(); // Current volume 0-1
const energy = useEnergy(); // Smoothed energy 0-1
return (
<div style={{
transform: `scale(${1 + pulse * 0.2})`,
opacity: 0.5 + volume * 0.5,
}}>
🎵 Synced to the beat!
</div>
);
};

import { useTimelineValue, fadeInOut } from "remotion-captioneer";
// Map animation to audio timestamps (in ms)
const opacity = useTimelineValue({
keyframes: [
{ timeMs: 0, value: 0 },
{ timeMs: 1000, value: 1, easing: "easeOut" },
{ timeMs: 5000, value: 1 },
{ timeMs: 6000, value: 0, easing: "easeIn" },
],
defaultValue: 0,
});
// Or use the helper
const fadeOpacity = useTimelineValue(
fadeInOut(0, 1000, 5000, 6000)
);

| Hook | Returns | Use For |
|---|---|---|
| `useVolume()` | `number` (0-1) | Opacity, scale, size |
| `useBeat()` | `BeatInfo \| null` | Flash effects, pulses |
| `useBeatPulse()` | `number` (0-1 spring) | Bounce, scale on beat |
| `useEnergy()` | `number` (0-1) | Background intensity |
| `useIsOnBeat()` | `boolean` | Conditional rendering |
| `useTimelineValue()` | `number` | Keyframe animations |
| `useTimelineProgress()` | `number` (0-1) | Progress bars |
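
To make the keyframe mapping concrete, here is a minimal sketch of evaluating a keyframe list at a given time. The quadratic easing curves, the rule that a keyframe's easing applies to the transition arriving at it, and the name `valueAt` are assumptions; `useTimelineValue()` may behave differently.

```typescript
type Easing = "linear" | "easeIn" | "easeOut";
type Keyframe = { timeMs: number; value: number; easing?: Easing };

// Assumed easing curves (quadratic); the library's actual curves may differ.
function ease(t: number, easing: Easing): number {
  if (easing === "easeIn") return t * t;
  if (easing === "easeOut") return 1 - (1 - t) * (1 - t);
  return t;
}

// Value at timeMs. Keyframes must be sorted by timeMs.
function valueAt(keyframes: Keyframe[], timeMs: number, defaultValue: number): number {
  if (keyframes.length === 0) return defaultValue;
  if (timeMs <= keyframes[0].timeMs) return keyframes[0].value;
  const last = keyframes[keyframes.length - 1];
  if (timeMs >= last.timeMs) return last.value;
  for (let i = 1; i < keyframes.length; i++) {
    const prev = keyframes[i - 1];
    const next = keyframes[i];
    if (timeMs <= next.timeMs) {
      const span = next.timeMs - prev.timeMs;
      const t = span === 0 ? 1 : (timeMs - prev.timeMs) / span;
      // Interpolate from prev to next, shaped by the easing on the target keyframe.
      return prev.value + (next.value - prev.value) * ease(t, next.easing || "linear");
    }
  }
  return defaultValue;
}

const fade: Keyframe[] = [
  { timeMs: 0, value: 0 },
  { timeMs: 1000, value: 1, easing: "easeOut" },
  { timeMs: 5000, value: 1 },
  { timeMs: 6000, value: 0, easing: "easeIn" },
];

console.log(valueAt(fade, 500, 0)); // 0.75 (halfway through the ease-out fade-in)
console.log(valueAt(fade, 3000, 0)); // 1 (fully visible)
```

Inside a composition, `timeMs` would be derived from the current frame as `(frame * 1000) / fps`.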
Build videos from JSON config. No code needed for simple videos.
import { buildTemplate, TemplateComposition } from "remotion-captioneer";
const template = buildTemplate({
name: "My Captioned Video",
intro: {
title: "Episode 1",
subtitle: "Getting Started",
logo: "/logo.png",
},
captions: [
{ captions: myCaptions, captionStyle: "word-highlight" },
],
outro: {
heading: "Thanks for watching!",
cta: "Subscribe for more",
logo: "/logo.png",
},
});
// Use as Remotion composition
<TemplateComposition template={template} />

import {
createIntroScene,
createCaptionScene,
createOutroScene,
createDividerScene,
} from "remotion-captioneer";
const intro = createIntroScene({
title: "My Video",
subtitle: "A demo",
durationSec: 3,
});
const content = createCaptionScene({
captions: myCaptions,
captionStyle: "karaoke",
highlightColor: "#FF6B6B",
});
const outro = createOutroScene({
heading: "The End",
cta: "Like & Subscribe",
logo: "/logo.png",
});

Customize the entire look with a single config:
const template = buildTemplate({
name: "Brand Video",
tokens: {
colors: {
primary: "#6366F1",
accent: "#FFD700",
background: "#0a0a0a",
text: "#FFFFFF",
},
typography: {
headingFont: "Poppins, sans-serif",
bodyFont: "Inter, sans-serif",
},
},
// ...
});

Composable layout building blocks for any Remotion video:
import {
Stack, Row, Columns, Grid,
Center, FadeIn, SlideUp,
GradientBg, Overlay, Positioned,
} from "remotion-captioneer";
// Vertical stack
<Stack gap={24}>
<FadeIn delayMs={0}>Title</FadeIn>
<FadeIn delayMs={200}>Subtitle</FadeIn>
</Stack>
// Horizontal columns
<Columns ratios={[2, 1]} gap={32}>
<div>Main content</div>
<div>Sidebar</div>
</Columns>
// Grid layout
<Grid columns={3} gap={16}>
{items.map(item => <Card key={item.id} />)}
</Grid>
// Animated entrance
<SlideUp delayMs={500} durationMs={800}>
<div>Slides up with delay</div>
</SlideUp>
// Gradient background
<GradientBg from="#0a0a0a" to="#1a1a2e">
<Center>Content here</Center>
</GradientBg>

# Basic usage (auto-detects provider from env vars)
npx captioneer process audio.mp4
# Specify provider
npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1
# With options
npx captioneer process audio.mp4 --provider groq --language en --output captions.json
npx captioneer process audio.mp4 --provider local --model base
# Pass API key directly
npx captioneer process audio.mp4 --provider groq --api-key gsk_...

Options:

- `-p, --provider <provider>`: STT provider (`local`, `openai`, `groq`, `deepgram`, `assemblyai`)
- `-m, --model <model>`: Model name (provider-specific)
- `-k, --api-key <key>`: API key (or use env vars)
- `-l, --language <lang>`: Language code (`en`, `es`, `fr`, `de`, etc.)
- `-o, --output <path>`: Output JSON path
- `-v, --verbose`: Verbose output

# Scaffold a new project
npx captioneer init my-video
# List available providers and their status
npx captioneer providers
# List available caption styles
npx captioneer styles
# List available presets
npx captioneer presets
# Export captions to SRT/VTT/ASS
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subs.vtt
# Batch process a directory of audio files
npx captioneer batch ./audio-files/
npx captioneer batch ./audio-files/ --provider groq --output-dir ./captions/
# Start real-time preview server
npx captioneer preview
# Open Remotion Studio with demos
npx captioneer demo

The generated JSON follows this structure:
interface CaptionData {
segments: Array<{
text: string; // Full segment text
startMs: number; // Segment start time (ms)
endMs: number; // Segment end time (ms)
words: Array<{
word: string; // Word text
startMs: number; // Word start time (ms)
endMs: number; // Word end time (ms)
confidence: number; // Whisper confidence (0-1)
}>;
}>;
language: string; // Detected language
durationMs: number; // Total duration (ms)
}

You can also create caption data manually or from other sources; just match this format.
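
A minimal sketch of building such an object by hand and sanity-checking it before rendering. The helpers `captionDataFromWords`, `validateCaptionData`, and `durationInFrames` are hypothetical and not exported by the package; setting `durationMs` to the end of the last word is also an assumption, so use the real audio length when you know it.

```typescript
type Word = { word: string; startMs: number; endMs: number; confidence: number };
type Segment = { text: string; startMs: number; endMs: number; words: Word[] };
type CaptionData = { segments: Segment[]; language: string; durationMs: number };

// Build a single-segment CaptionData from a list of timed words.
function captionDataFromWords(words: Word[], language: string): CaptionData {
  if (words.length === 0) {
    return { segments: [], language, durationMs: 0 };
  }
  const startMs = words[0].startMs;
  const endMs = words[words.length - 1].endMs;
  const text = words.map((w) => w.word).join(" ");
  return {
    segments: [{ text, startMs, endMs, words }],
    language,
    // Assumption: no audio after the last word.
    durationMs: endMs,
  };
}

// Collect human-readable problems instead of throwing on the first one.
function validateCaptionData(data: CaptionData): string[] {
  const problems: string[] = [];
  data.segments.forEach((seg, s) => {
    if (seg.endMs < seg.startMs) problems.push(`segment ${s}: endMs is before startMs`);
    if (seg.endMs > data.durationMs) problems.push(`segment ${s}: ends after durationMs`);
    seg.words.forEach((w, i) => {
      if (w.endMs < w.startMs) problems.push(`segment ${s}, word ${i}: endMs is before startMs`);
      if (w.confidence < 0 || w.confidence > 1) {
        problems.push(`segment ${s}, word ${i}: confidence outside 0-1`);
      }
    });
  });
  return problems;
}

// Frames needed to show the whole clip at a given frame rate.
function durationInFrames(data: CaptionData, fps: number): number {
  return Math.ceil((data.durationMs * fps) / 1000);
}

const manual = captionDataFromWords(
  [
    { word: "Hello", startMs: 0, endMs: 400, confidence: 1 },
    { word: "world", startMs: 450, endMs: 900, confidence: 1 },
  ],
  "en"
);
console.log(validateCaptionData(manual).length); // 0
console.log(durationInFrames(manual, 30)); // 27
```

The frame count can be passed as `durationInFrames` on the `<Composition>` so the video ends with the captions.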
Create a .captioneerrc file in your project root:
{
"whisperPath": "./whisper.cpp",
"modelPath": "./whisper.cpp/models/ggml-base.bin",
"defaultModel": "base",
"defaultLanguage": "en",
"defaultStyle": "word-highlight"
}

Or add to your package.json:
{
"captioneer": {
"defaultModel": "base",
"defaultLanguage": "en"
}
}

import {
AbsoluteFill,
Audio,
Composition,
staticFile,
} from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./captions.json";
export const CaptionedVideo = () => (
<AbsoluteFill
style={{
background: "linear-gradient(135deg, #0a0a0a 0%, #1a1a2e 100%)",
}}
>
<Audio src={staticFile("my-audio.mp4")} />
<AnimatedCaptions
captions={captions}
style="karaoke"
position="bottom"
highlightColor="#FF6B6B"
fontSize={64}
fontFamily="Inter, sans-serif"
/>
</AbsoluteFill>
);
export const RemotionRoot = () => (
<Composition
id="CaptionedVideo"
component={CaptionedVideo}
durationInFrames={900} // 30s at 30fps
fps={30}
width={1920}
height={1080}
/>
);

FROM node:20-slim
# Install whisper.cpp dependencies
RUN apt-get update && apt-get install -y git cmake build-essential
WORKDIR /app
COPY . .
RUN npm install
# The CLI will auto-install whisper.cpp on first run
ENTRYPOINT ["npx", "captioneer"]

| Prop | Type | Default | Description |
|---|---|---|---|
| `captions` | `CaptionData` | required | Caption data object |
| `style` | `CaptionStyle` | `"word-highlight"` | Caption animation style |
| `fontFamily` | `string` | `"Inter, sans-serif"` | Font family |
| `fontSize` | `number` | `56` | Font size in px |
| `fontColor` | `string` | `"rgba(255,255,255,0.5)"` | Inactive text color |
| `highlightColor` | `string` | `"#FFD700"` | Active/highlight color |
| `position` | `"top" \| "center" \| "bottom"` | `"bottom"` | Vertical position |

See the examples/ directory for complete working examples:
| File | What it shows |
|---|---|
| `01-basic.tsx` | Simplest captioned video |
| `02-presets.tsx` | Using presets (TikTok, Cinematic, Gaming) |
| `03-audio-sync.tsx` | Beat-reactive animations |
| `04-template.tsx` | Multi-scene template (intro → content → outro) |
| `05-layouts.tsx` | Custom layouts with primitives |
| `06-export.ts` | Export to SRT, VTT, ASS formats |
| `07-emoji.tsx` | Emoji reactions at word timestamps |

- 14 caption styles (word-highlight, karaoke, typewriter, bounce, wave, glow, typewriter-erase, pill, flicker, highlighter, blur, rainbow, scale, spotlight)
- 24 caption presets across 10 categories (Social, Podcast, Cinematic, Music, Tutorial, Minimal, Gaming, News, Education, Fun)
- Multi-line auto-wrapping with smart breaks (`smartWrap()`)
- Word-level emoji reactions (`EmojiReactions` + `autoGenerateReactions()`)
- Real-time preview server (`npx captioneer preview`)
- Batch processing mode (`npx captioneer batch ./audio/`)
- Multi-provider STT (OpenAI, Groq, Deepgram, AssemblyAI, Local Whisper)
- @remotion/captions compatibility layer
- Audio-video sync (beat detection, volume hooks, timeline keyframes)
- Template system for data-driven videos
- Layout primitives (Stack, Row, Columns, Grid, FadeIn, SlideUp, etc.)
- Export formats (SRT, VTT, ASS, TXT, word-level SRT & VTT)
- Project scaffolder (`npx captioneer init`)
- 7 working examples covering all features
- 10 CLI commands (init, process, batch, export, preview, presets, providers, styles, demo)
- GitHub Pages demo with all 14 styles animated
- GitHub Actions CI/CD (build, test, release to npm, CodeQL)
- 0 vulnerabilities in npm audit
- Caption style marketplace (community-contributed styles)
- AI-powered auto-emoji (`autoGenerateReactions()`: keyword-based emoji generation from 60+ word→emoji mappings)
- Multi-language caption support with RTL
- Caption editor with visual timeline (preview server with playback controls, progress bar, beat markers, style selector)
- Integration with video hosting APIs (YouTube, Vimeo)
- Real-time caption rendering in browser (`npx captioneer preview`: live browser-based caption rendering with audio sync)
- Caption translation utilities
- Speaker diarization (multi-speaker support)

Contributions welcome! Please open an issue first to discuss what you'd like to change.
- Fork the repo
- Create your feature branch (`git checkout -b feature/amazing`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing`)
- Open a Pull Request

MIT © Shuvo Roy

Everyone using Remotion for captioned videos ends up rebuilding the same thing:
Get audio → run Whisper → parse output → sync to frames → animate words
This package handles steps 2-5 so you can focus on your content, not plumbing.
⭐ Star this repo if it helps you!