A Rust SDK that turns text, URLs, or search queries into narrated videos — complete with TTS, captions, and stock visuals.
- Build pipeline from text content to rendered video in a single call
- Pluggable providers — swap any stage by implementing a trait
Here's a cringe demo made using this:
output_optimized.mp4
I watch random videos while I code: the news, or just random stuff in the background. I built this for an automated pipeline in another personal app that, in real time, reads from RSS feeds I'm interested in and generates this kind of content to satisfy my ADHD brain.
```toml
[dependencies]
narrate-this = "0.2"
tokio = { version = "1", features = ["full"] }
```

```rust
use narrate_this::{
    ContentPipeline, ContentSource, ElevenLabsConfig, ElevenLabsTts,
    FfmpegRenderer, FirecrawlScraper, FsAudioStorage, OpenAiConfig,
    OpenAiKeywords, PexelsSearch, RenderConfig, StockMediaPlanner,
};

#[tokio::main]
async fn main() -> narrate_this::Result<()> {
    let pipeline = ContentPipeline::builder()
        .content(FirecrawlScraper::new("http://localhost:3002"))
        .tts(ElevenLabsTts::new(ElevenLabsConfig {
            api_key: "your-elevenlabs-key".into(),
            ..Default::default()
        }))
        .media(StockMediaPlanner::new(
            OpenAiKeywords::new(OpenAiConfig {
                api_key: "your-openai-key".into(),
                ..Default::default()
            }),
            PexelsSearch::new("your-pexels-key"),
        ))
        .renderer(FfmpegRenderer::new(), RenderConfig::default())
        .audio_storage(FsAudioStorage::new("./output"))
        .build()?;

    let output = pipeline
        .process(ContentSource::ArticleUrl {
            url: "https://example.com/article".into(),
            title: Some("My Article".into()),
        })
        .await?;

    println!("Video: {}", output.video_path.unwrap());
    Ok(())
}
```

```
Content Source -> Narration -> Text Transforms -> TTS -> Media -> Audio Storage -> Video Render
```
Only TTS is required. Everything else is optional — skip content sourcing if you pass raw text, skip media if you just want audio, skip rendering if you don't need video.
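For instance, a minimal audio-only pipeline — a sketch based on the builder shown above, with no content provider, media planner, or renderer:

```rust
use narrate_this::{
    ContentPipeline, ContentSource, ElevenLabsConfig, ElevenLabsTts, FsAudioStorage,
};

#[tokio::main]
async fn main() -> narrate_this::Result<()> {
    // TTS is the only required stage; audio storage is optional but
    // gives us a file path back.
    let pipeline = ContentPipeline::builder()
        .tts(ElevenLabsTts::new(ElevenLabsConfig {
            api_key: "your-elevenlabs-key".into(),
            ..Default::default()
        }))
        .audio_storage(FsAudioStorage::new("./output"))
        .build()?;

    // Raw text in, MP3 out: no scraping, no media, no video.
    let output = pipeline
        .process(ContentSource::Text("Hello from narrate-this.".into()))
        .await?;
    println!("Audio: {:?}", output.audio_path);
    Ok(())
}
```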
```rust
// Scrape and narrate an article
ContentSource::ArticleUrl {
    url: "https://example.com/article".into(),
    title: Some("Optional title hint".into()),
}

// Search the web and narrate the results
ContentSource::SearchQuery("latest Rust async developments".into())

// Just narrate some text directly (no content provider needed)
ContentSource::Text("Your text to narrate...".into())
```

The builder uses type-state to enforce valid configuration at compile time:
```rust
ContentPipeline::builder()
    // Content provider (optional — skip for raw text)
    .content(FirecrawlScraper::new("http://localhost:3002"))
    // Text transforms (chainable, applied in order)
    .text_transform(OpenAiTransform::new(
        &openai_key,
        "Rewrite in a casual podcast style",
    ))
    // TTS provider (the only required piece)
    .tts(ElevenLabsTts::new(ElevenLabsConfig {
        api_key: tts_key,
        voice_id: "custom-voice-id".into(), // default: "Gr7mLjPA3HhuWxZidxPW"
        speed: 1.2,
        ..Default::default()
    }))
    // Everything below is optional, in any order
    // Media planner — see "Media" section below
    .media(StockMediaPlanner::new(keywords_provider, search_provider))
    .renderer(FfmpegRenderer::new(), RenderConfig {
        output_path: "./output.mp4".into(),
        audio_tracks: vec![
            AudioTrack::new("./background.mp3").volume(0.15),
        ],
        ..Default::default()
    }.pix_fmt("yuv420p"))
    .audio_storage(FsAudioStorage::new("./audio_cache"))
    .cache(my_cache_provider)
    .build()?;
```

The `.media()` builder method takes a `MediaPlanner` — a single trait that owns all media selection logic.
Use StockMediaPlanner for keyword extraction + stock search (e.g. Pexels):
```rust
.media(StockMediaPlanner::new(
    OpenAiKeywords::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    }),
    PexelsSearch::new("your-pexels-key"),
))
```

Use `LlmMediaPlanner` to provide your own images/videos with descriptions. An LLM matches them to narration chunks based on semantic relevance:
```rust
use narrate_this::{LlmMediaPlanner, MediaAsset, MediaFallback, OpenAiConfig};

.media(
    LlmMediaPlanner::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    })
    .assets(vec![
        MediaAsset::image("./hero.jpg", "A rocket launching into space"),
        MediaAsset::video("https://example.com/demo.mp4", "App demo walkthrough"),
        MediaAsset::image_bytes(screenshot_bytes, "Dashboard screenshot"),
    ])
    // Optional: fall back to stock search for unmatched chunks
    .stock_search(
        OpenAiKeywords::new(OpenAiConfig {
            api_key: "your-openai-key".into(),
            ..Default::default()
        }),
        PexelsSearch::new("your-pexels-key"),
    )
    .fallback(MediaFallback::StockSearch) // default
    .allow_reuse(true)  // same asset can appear multiple times
    .max_reuse(Some(2)), // but at most twice
)
```

Media sources can be URLs, local file paths, or raw bytes — the renderer handles all three.
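The compile-time guarantee mentioned earlier (only `.tts()` is required before `.build()`) comes from the type-state builder pattern. A minimal self-contained sketch of that pattern in plain Rust — hypothetical names, independent of this crate:

```rust
use std::marker::PhantomData;

// Marker types encode, at the type level, whether the required
// TTS stage has been set yet.
struct NoTts;
struct HasTts;

struct Builder<State> {
    tts_name: Option<String>,
    _state: PhantomData<State>,
}

impl Builder<NoTts> {
    fn new() -> Self {
        Builder { tts_name: None, _state: PhantomData }
    }
    // Setting the required stage transitions the builder's type.
    fn tts(self, name: &str) -> Builder<HasTts> {
        Builder { tts_name: Some(name.to_string()), _state: PhantomData }
    }
}

impl Builder<HasTts> {
    // build() only exists in the HasTts state, so
    // Builder::new().build() is a compile error, not a runtime one.
    fn build(self) -> String {
        self.tts_name.unwrap()
    }
}

fn main() {
    let pipeline = Builder::new().tts("eleven").build();
    println!("{pipeline}");
}
```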
```rust
// Run the full pipeline
let output = pipeline.process(source).await?;

// With progress callbacks
let output = pipeline.process_with_progress(source, |event| {
    match event {
        PipelineProgress::NarrationStarted => println!("Scraping..."),
        PipelineProgress::TtsComplete { audio_bytes, caption_count } => {
            println!("Audio: {audio_bytes} bytes, {caption_count} captions");
        }
        PipelineProgress::RenderComplete { ref path } => {
            println!("Video saved to {path}");
        }
        _ => {}
    }
}).await?;

// Or just parts of it
let text = pipeline.narrate(source).await?;          // narration only
let tts_result = pipeline.synthesize("Text").await?; // TTS only
```

```rust
pub struct ContentOutput {
    pub narration: String,
    pub audio: Vec<u8>,                    // MP3
    pub captions: Vec<CaptionSegment>,     // word-level timing
    pub media_segments: Vec<MediaSegment>, // source: MediaSource (URL, file path, or bytes)
    pub audio_path: Option<String>,        // if audio storage configured
    pub video_path: Option<String>,        // if renderer configured
}
```

You can control how the LLM writes the narration:
```rust
let scraper = FirecrawlScraper::with_config(FirecrawlConfig {
    base_url: "http://localhost:3002".into(),
    style: NarrationStyle::default()
        .role("podcast host")
        .persona("a friendly tech enthusiast")
        .tone("Casual and upbeat")
        .length("3-5 paragraphs, 60-120 seconds when read aloud")
        .structure("Open with a hook, then dive into details"),
    ..Default::default()
});
```

```rust
let config = RenderConfig {
    output_path: "./output.mp4".into(),
    audio_tracks: vec![
        AudioTrack::new("./music.mp3")
            .volume(0.15)   // 0.0–1.0, default 0.3
            .start_at(2000) // delay start by 2s
            .no_loop(),     // loops by default
    ],
    ..Default::default()
};
```

Customize encoding settings via `RenderConfig`. All fields are optional and fall back to sensible defaults:
```rust
let config = RenderConfig {
    output_path: "./output.mp4".into(),
    ..Default::default()
}
.video_codec("libx265") // default: "libx264"
.audio_codec("libopus") // default: "aac"
.preset("slow")         // default: "fast"
.crf(18)                // default: ffmpeg default (23 for x264)
.pix_fmt("yuv420p")     // broad player compatibility
.subtitle_style("FontSize=32,PrimaryColour=&H00FFFF&,Bold=1")
.extra_output_args(["-movflags", "+faststart"]);
```

| Field | Default | FFmpeg flag |
|---|---|---|
| `video_codec` | `"libx264"` | `-c:v` |
| `audio_codec` | `"aac"` | `-c:a` |
| `preset` | `"fast"` | `-preset` |
| `crf` | none | `-crf` |
| `pix_fmt` | none | `-pix_fmt` |
| `subtitle_style` | `"FontSize=24,PrimaryColour=&HFFFFFF&"` | subtitles `force_style` |
| `extra_output_args` | `[]` | any flags before output path |
`extra_output_args` is a catch-all for anything not covered above (e.g. `-tune`, `-b:v`, `-movflags`).
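As a rough illustration of how those fields map to flags (this is not the renderer's actual internal command, and the input/subtitle file names here are hypothetical), a fully customized config corresponds to an invocation along these lines:

```shell
# Hypothetical inputs; the real command is assembled by FfmpegRenderer.
ffmpeg -i visuals.mp4 -i narration.mp3 \
  -c:v libx265 -c:a libopus -preset slow -crf 18 -pix_fmt yuv420p \
  -vf "subtitles=captions.srt:force_style='FontSize=32,PrimaryColour=&H00FFFF&,Bold=1'" \
  -movflags +faststart output.mp4
```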
Built-in:

| Provider | Service |
|---|---|
| `ElevenLabsTts` | ElevenLabs |
| `FirecrawlScraper` | Firecrawl |
| `OpenAiKeywords` / `OpenAiTransform` | OpenAI (gpt-4o-mini) |
| `PexelsSearch` | Pexels |
| `StockMediaPlanner` | Keywords + stock search |
| `LlmMediaPlanner` | AI asset matching + stock fallback |
| `FfmpegRenderer` | Local FFmpeg |
| `FsAudioStorage` | Local filesystem |
| `PgCache` | PostgreSQL (feature-gated: `pg-cache`) |
You can swap in your own by implementing the matching trait:
```rust
#[async_trait]
impl TtsProvider for MyTtsProvider {
    async fn synthesize(&self, text: &str) -> Result<TtsResult> {
        // ...
    }
}
```

Traits: `TtsProvider`, `ContentProvider`, `MediaPlanner`, `KeywordExtractor`, `MediaSearchProvider`, `TextTransformer`, `AudioStorage`, `CacheProvider`, `VideoRenderer`.
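The pluggable-provider idea itself is plain trait dispatch. A self-contained, synchronous stand-in (hypothetical names, not this crate's API — the real traits are async via `#[async_trait]`):

```rust
// A synchronous stand-in for the real async TtsProvider trait.
trait Tts {
    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String>;
}

// Hypothetical provider that emits silent placeholder "audio".
struct SilentTts;

impl Tts for SilentTts {
    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String> {
        if text.is_empty() {
            return Err("empty input".into());
        }
        Ok(vec![0u8; text.len()]) // one zero byte per character
    }
}

fn main() {
    // The pipeline only sees the trait, so any impl can be swapped in.
    let provider: Box<dyn Tts> = Box::new(SilentTts);
    let audio = provider.synthesize("hello").unwrap();
    println!("{}", audio.len()); // prints 5
}
```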
```toml
[dependencies]
narrate-this = { version = "0.2", features = ["pg-cache"] }
```

```rust
let pool = sqlx::PgPool::connect("postgres://localhost/narrate").await?;
let cache = PgCache::new(pool);

let pipeline = ContentPipeline::builder()
    .tts(my_tts)
    .cache(cache)
    .build()?;
```

- Rust 2024 edition (1.85+)
- FFmpeg on PATH for video rendering
- A Firecrawl instance for URL/search sources
- API keys for whichever providers you use
```shell
cp examples/.env.example examples/.env
# fill in your API keys
cargo run --example basic
cargo run --example local_tts
```

All errors come back as `narrate_this::SdkError` with variants for each stage (`Tts`, `Llm`, `MediaSearch`, `MediaPlanner`, `WebScraper`, etc.). Non-fatal errors (like a media search miss) are logged as warnings via `tracing` and won't stop the pipeline.
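A sketch of stage-specific error handling. The variant names come from the list above, but the variant payloads shown here are assumptions; check the crate's `SdkError` docs for the exact shapes:

```rust
use narrate_this::SdkError;

match pipeline.process(source).await {
    Ok(output) => println!("done: {:?}", output.video_path),
    // Payload shapes below are illustrative assumptions.
    Err(SdkError::Tts(e)) => eprintln!("TTS failed: {e}"),
    Err(SdkError::WebScraper(e)) => eprintln!("scrape failed, is Firecrawl up? {e}"),
    Err(other) => eprintln!("pipeline error: {other}"),
}
```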
MIT