A Rust SDK that turns text, URLs, or search queries into narrated videos — complete with TTS, captions, and stock visuals.
- Build pipeline from text content to rendered video in a single call
- Pluggable providers — swap any stage by implementing a trait
Here's a cringe demo made using this:
output_optimized.mp4
I watch random videos while I code: the news, or just random stuff in the background. I built this for an automated pipeline in another personal app that, in real time, reads from RSS feeds I'm interested in and generates this kind of content to satisfy my ADHD brain.
```toml
[dependencies]
narrate-this = "0.2"
tokio = { version = "1", features = ["full"] }
```

```rust
use narrate_this::{
    ContentPipeline, ContentSource, ElevenLabsConfig, ElevenLabsTts,
    FfmpegRenderer, FirecrawlScraper, FsAudioStorage, OpenAiConfig,
    OpenAiKeywords, PexelsSearch, RenderConfig, StockMediaPlanner,
};

#[tokio::main]
async fn main() -> narrate_this::Result<()> {
    let pipeline = ContentPipeline::builder()
        .content(FirecrawlScraper::new("http://localhost:3002"))
        .tts(ElevenLabsTts::new(ElevenLabsConfig {
            api_key: "your-elevenlabs-key".into(),
            ..Default::default()
        }))
        .media(StockMediaPlanner::new(
            OpenAiKeywords::new(OpenAiConfig {
                api_key: "your-openai-key".into(),
                ..Default::default()
            }),
            PexelsSearch::new("your-pexels-key"),
        ))
        .renderer(FfmpegRenderer::new(), RenderConfig::default())
        .audio_storage(FsAudioStorage::new("./output"))
        .build()?;

    let output = pipeline
        .process(ContentSource::ArticleUrl {
            url: "https://example.com/article".into(),
            title: Some("My Article".into()),
        })
        .await?;

    println!("Video: {}", output.video_path.unwrap());
    Ok(())
}
```

```
Content Source -> Narration -> Text Transforms -> TTS -> Media -> Audio Storage -> Video Render
```
Only TTS is required. Everything else is optional — skip content sourcing if you pass raw text, skip media if you just want audio, skip rendering if you don't need video.
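For instance, a minimal audio-only pipeline — a sketch based on the builder shown above, with no content provider, media planner, or renderer:

```rust
use narrate_this::{
    ContentPipeline, ContentSource, ElevenLabsConfig, ElevenLabsTts, FsAudioStorage,
};

#[tokio::main]
async fn main() -> narrate_this::Result<()> {
    // TTS is the only required stage; audio storage is optional but
    // gives us a file path back.
    let pipeline = ContentPipeline::builder()
        .tts(ElevenLabsTts::new(ElevenLabsConfig {
            api_key: "your-elevenlabs-key".into(),
            ..Default::default()
        }))
        .audio_storage(FsAudioStorage::new("./output"))
        .build()?;

    // Raw text in, MP3 out: no scraping, no media, no video.
    let output = pipeline
        .process(ContentSource::Text("Hello from narrate-this.".into()))
        .await?;
    println!("Audio: {:?}", output.audio_path);
    Ok(())
}
```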
```rust
// Scrape and narrate an article
ContentSource::ArticleUrl {
    url: "https://example.com/article".into(),
    title: Some("Optional title hint".into()),
}

// Search the web and narrate the results
ContentSource::SearchQuery("latest Rust async developments".into())

// Just narrate some text directly (no content provider needed)
ContentSource::Text("Your text to narrate...".into())
```

The builder uses type-state to enforce valid configuration at compile time:
```rust
ContentPipeline::builder()
    // Content provider (optional — skip for raw text)
    .content(FirecrawlScraper::new("http://localhost:3002"))
    // Text transforms (chainable, applied in order)
    .text_transform(OpenAiTransform::new(
        &openai_key,
        "Rewrite in a casual podcast style",
    ))
    // TTS provider (the only required piece)
    .tts(ElevenLabsTts::new(ElevenLabsConfig {
        api_key: tts_key,
        voice_id: "custom-voice-id".into(), // default: "Gr7mLjPA3HhuWxZidxPW"
        speed: 1.2,
        ..Default::default()
    }))
    // Everything below is optional, in any order
    // Media planner — see "Media" section below
    .media(StockMediaPlanner::new(keywords_provider, search_provider))
    .renderer(FfmpegRenderer::new(), RenderConfig {
        output_path: "./output.mp4".into(),
        audio_tracks: vec![
            AudioTrack::new("./background.mp3").volume(0.15),
        ],
        ..Default::default()
    }.pix_fmt("yuv420p"))
    .audio_storage(FsAudioStorage::new("./audio_cache"))
    .cache(my_cache_provider)
    .build()?;
```

The `.media()` builder method takes a `MediaPlanner` — a single trait that owns all media selection logic.
Use StockMediaPlanner for keyword extraction + stock search (e.g. Pexels):
```rust
.media(StockMediaPlanner::new(
    OpenAiKeywords::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    }),
    PexelsSearch::new("your-pexels-key"),
))
```

Use `LlmMediaPlanner` to provide your own images/videos with descriptions. An LLM matches them to narration chunks based on semantic relevance:
```rust
use narrate_this::{LlmMediaPlanner, MediaAsset, MediaFallback, OpenAiConfig};

.media(
    LlmMediaPlanner::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    })
    .assets(vec![
        MediaAsset::image("./hero.jpg", "A rocket launching into space"),
        MediaAsset::video("https://example.com/demo.mp4", "App demo walkthrough"),
        MediaAsset::image_bytes(screenshot_bytes, "Dashboard screenshot"),
    ])
    // Optional: fall back to stock search for unmatched chunks
    .stock_search(
        OpenAiKeywords::new(OpenAiConfig {
            api_key: "your-openai-key".into(),
            ..Default::default()
        }),
        PexelsSearch::new("your-pexels-key"),
    )
    .fallback(MediaFallback::StockSearch) // default
    .allow_reuse(true)  // same asset can appear multiple times
    .max_reuse(Some(2)), // but at most twice
)
```

Media sources can be URLs, local file paths, or raw bytes — the renderer handles all three.
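The compile-time guarantee mentioned earlier (only `.tts()` is required before `.build()`) comes from the type-state builder pattern. A minimal self-contained sketch of that pattern in plain Rust — hypothetical names, independent of this crate:

```rust
use std::marker::PhantomData;

// Marker types encode, at the type level, whether the required
// TTS stage has been set yet.
struct NoTts;
struct HasTts;

struct Builder<State> {
    tts_name: Option<String>,
    _state: PhantomData<State>,
}

impl Builder<NoTts> {
    fn new() -> Self {
        Builder { tts_name: None, _state: PhantomData }
    }
    // Setting the required stage transitions the builder's type.
    fn tts(self, name: &str) -> Builder<HasTts> {
        Builder { tts_name: Some(name.to_string()), _state: PhantomData }
    }
}

impl Builder<HasTts> {
    // build() only exists in the HasTts state, so
    // Builder::new().build() is a compile error, not a runtime one.
    fn build(self) -> String {
        self.tts_name.unwrap()
    }
}

fn main() {
    let pipeline = Builder::new().tts("eleven").build();
    println!("{pipeline}");
}
```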
```rust
// Run the full pipeline
let output = pipeline.process(source).await?;

// With progress callbacks
let output = pipeline.process_with_progress(source, |event| {
    match event {
        PipelineProgress::NarrationStarted => println!("Scraping..."),
        PipelineProgress::TtsComplete { audio_bytes, caption_count } => {
            println!("Audio: {audio_bytes} bytes, {caption_count} captions");
        }
        PipelineProgress::RenderComplete { ref path } => {
            println!("Video saved to {path}");
        }
        _ => {}
    }
}).await?;

// Or just parts of it
let text = pipeline.narrate(source).await?;          // narration only
let tts_result = pipeline.synthesize("Text").await?; // TTS only
```

```rust
pub struct ContentOutput {
    pub narration: String,
    pub audio: Vec<u8>,                    // MP3
    pub captions: Vec<CaptionSegment>,     // word-level timing
    pub media_segments: Vec<MediaSegment>, // source: MediaSource (URL, file path, or bytes)
    pub audio_path: Option<String>,        // if audio storage configured
    pub video_path: Option<String>,        // if renderer configured
}
```

You can control how the LLM writes the narration:
```rust
let scraper = FirecrawlScraper::with_config(FirecrawlConfig {
    base_url: "http://localhost:3002".into(),
    style: NarrationStyle::default()
        .role("podcast host")
        .persona("a friendly tech enthusiast")
        .tone("Casual and upbeat")
        .length("3-5 paragraphs, 60-120 seconds when read aloud")
        .structure("Open with a hook, then dive into details"),
    ..Default::default()
});
```

```rust
let config = RenderConfig {
    output_path: "./output.mp4".into(),
    audio_tracks: vec![
        AudioTrack::new("./music.mp3")
            .volume(0.15)   // 0.0–1.0, default 0.3
            .start_at(2000) // delay start by 2s
            .no_loop(),     // loops by default
    ],
    ..Default::default()
};
```

Customize encoding settings via `RenderConfig`. All fields are optional and fall back to sensible defaults:
```rust
let config = RenderConfig {
    output_path: "./output.mp4".into(),
    ..Default::default()
}
.video_codec("libx265") // default: "libx264"
.audio_codec("libopus") // default: "aac"
.preset("slow")         // default: "fast"
.crf(18)                // default: ffmpeg default (23 for x264)
.pix_fmt("yuv420p")     // broad player compatibility
.subtitle_style("FontSize=32,PrimaryColour=&H00FFFF&,Bold=1")
.extra_output_args(["-movflags", "+faststart"]);
```

| Field | Default | FFmpeg flag |
|---|---|---|
| `video_codec` | `"libx264"` | `-c:v` |
| `audio_codec` | `"aac"` | `-c:a` |
| `preset` | `"fast"` | `-preset` |
| `crf` | none | `-crf` |
| `pix_fmt` | none | `-pix_fmt` |
| `subtitle_style` | `"FontSize=24,PrimaryColour=&HFFFFFF&"` | subtitles `force_style` |
| `extra_output_args` | `[]` | any flags before output path |
`extra_output_args` is a catch-all for anything not covered above (e.g. `-tune`, `-b:v`, `-movflags`).
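As a rough illustration of how those fields map to flags (this is not the renderer's actual internal command, and the input/subtitle file names here are hypothetical), a fully customized config corresponds to an invocation along these lines:

```shell
# Hypothetical inputs; the real command is assembled by FfmpegRenderer.
ffmpeg -i visuals.mp4 -i narration.mp3 \
  -c:v libx265 -c:a libopus -preset slow -crf 18 -pix_fmt yuv420p \
  -vf "subtitles=captions.srt:force_style='FontSize=32,PrimaryColour=&H00FFFF&,Bold=1'" \
  -movflags +faststart output.mp4
```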
Built-in:

| Provider | Service |
|---|---|
| `ElevenLabsTts` | ElevenLabs |
| `FirecrawlScraper` | Firecrawl |
| `OpenAiKeywords` / `OpenAiTransform` | OpenAI (gpt-4o-mini) |
| `PexelsSearch` | Pexels |
| `StockMediaPlanner` | Keywords + stock search |
| `LlmMediaPlanner` | AI asset matching + stock fallback |
| `FfmpegRenderer` | Local FFmpeg |
| `FsAudioStorage` | Local filesystem |
| `PgCache` | PostgreSQL (feature-gated: `pg-cache`) |
You can swap in your own by implementing the matching trait:
```rust
#[async_trait]
impl TtsProvider for MyTtsProvider {
    async fn synthesize(&self, text: &str) -> Result<TtsResult> {
        // ...
    }
}
```

Traits: `TtsProvider`, `ContentProvider`, `MediaPlanner`, `KeywordExtractor`, `MediaSearchProvider`, `TextTransformer`, `AudioStorage`, `CacheProvider`, `VideoRenderer`.
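The pluggable-provider idea itself is plain trait dispatch. A self-contained, synchronous stand-in (hypothetical names, not this crate's API — the real traits are async via `#[async_trait]`):

```rust
// A synchronous stand-in for the real async TtsProvider trait.
trait Tts {
    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String>;
}

// Hypothetical provider that emits silent placeholder "audio".
struct SilentTts;

impl Tts for SilentTts {
    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String> {
        if text.is_empty() {
            return Err("empty input".into());
        }
        Ok(vec![0u8; text.len()]) // one zero byte per character
    }
}

fn main() {
    // The pipeline only sees the trait, so any impl can be swapped in.
    let provider: Box<dyn Tts> = Box::new(SilentTts);
    let audio = provider.synthesize("hello").unwrap();
    println!("{}", audio.len()); // prints 5
}
```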
```toml
[dependencies]
narrate-this = { version = "0.2", features = ["pg-cache"] }
```

```rust
let pool = sqlx::PgPool::connect("postgres://localhost/narrate").await?;
let cache = PgCache::new(pool);

let pipeline = ContentPipeline::builder()
    .tts(my_tts)
    .cache(cache)
    .build()?;
```

- Rust 2024 edition (1.85+)
- FFmpeg on PATH for video rendering
- A Firecrawl instance for URL/search sources
- API keys for whichever providers you use
```shell
cp examples/.env.example examples/.env
# fill in your API keys
cargo run --example basic
cargo run --example local_tts
```

All errors come back as `narrate_this::SdkError` with variants for each stage (`Tts`, `Llm`, `MediaSearch`, `MediaPlanner`, `WebScraper`, etc.). Non-fatal errors (like a media search miss) are logged as warnings via `tracing` and won't stop the pipeline.
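A sketch of stage-specific error handling. The variant names come from the list above, but the variant payloads shown here are assumptions; check the crate's `SdkError` docs for the exact shapes:

```rust
use narrate_this::SdkError;

match pipeline.process(source).await {
    Ok(output) => println!("done: {:?}", output.video_path),
    // Payload shapes below are illustrative assumptions.
    Err(SdkError::Tts(e)) => eprintln!("TTS failed: {e}"),
    Err(SdkError::WebScraper(e)) => eprintln!("scrape failed, is Firecrawl up? {e}"),
    Err(other) => eprintln!("pipeline error: {other}"),
}
```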
MIT