
narrate-this


A Rust SDK that turns text, URLs, or search queries into narrated videos — complete with TTS, captions, and stock visuals.

  • End-to-end pipeline from text content to rendered video in a single call
  • Pluggable providers — swap any stage by implementing a trait

Here's a (cringe) demo made with it:

output_optimized.mp4

I watch random videos while I code: the news, random stuff in the background. I built this for the automated pipeline of another personal app, which reads in real time from RSS feeds I'm interested in and generates this kind of content to satisfy my ADHD brain.

Quick start

[dependencies]
narrate-this = "0.2"
tokio = { version = "1", features = ["full"] }

use narrate_this::{
    ContentPipeline, ContentSource, ElevenLabsConfig, ElevenLabsTts,
    FfmpegRenderer, FirecrawlScraper, FsAudioStorage, OpenAiConfig,
    OpenAiKeywords, PexelsSearch, RenderConfig, StockMediaPlanner,
};

#[tokio::main]
async fn main() -> narrate_this::Result<()> {
    let pipeline = ContentPipeline::builder()
        .content(FirecrawlScraper::new("http://localhost:3002"))
        .tts(ElevenLabsTts::new(ElevenLabsConfig {
            api_key: "your-elevenlabs-key".into(),
            ..Default::default()
        }))
        .media(StockMediaPlanner::new(
            OpenAiKeywords::new(OpenAiConfig {
                api_key: "your-openai-key".into(),
                ..Default::default()
            }),
            PexelsSearch::new("your-pexels-key"),
        ))
        .renderer(FfmpegRenderer::new(), RenderConfig::default())
        .audio_storage(FsAudioStorage::new("./output"))
        .build()?;

    let output = pipeline
        .process(ContentSource::ArticleUrl {
            url: "https://example.com/article".into(),
            title: Some("My Article".into()),
        })
        .await?;

    println!("Video: {}", output.video_path.unwrap());
    Ok(())
}

How the pipeline works

Content Source -> Narration -> Text Transforms -> TTS -> Media -> Audio Storage -> Video Render

Only TTS is required. Everything else is optional — skip content sourcing if you pass raw text, skip media if you just want audio, skip rendering if you don't need video.
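The "everything but TTS is optional" flow can be pictured with a tiny, self-contained sketch. This is not the crate's real code: the function names and string tags here are made up purely to show how optional stages are skipped when unconfigured.

```rust
// Self-contained sketch of the stage flow (not the crate's actual types):
// every stage except TTS is optional, and the pipeline skips whatever
// isn't configured.
fn run(text: &str, transform: Option<fn(&str) -> String>, render: bool) -> String {
    // Optional text transform: skipped when None.
    let narration = transform.map_or_else(|| text.to_string(), |t| t(text));
    // TTS always runs.
    let audio = format!("tts({narration})");
    // Optional render stage.
    if render { format!("video[{audio}]") } else { audio }
}

fn main() {
    // Raw text, no transform, audio only:
    println!("{}", run("hello", None, false)); // tts(hello)
    // With a transform and rendering:
    println!("{}", run("hello", Some(|s: &str| s.to_uppercase()), true)); // video[tts(HELLO)]
}
```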

Content sources

// Scrape and narrate an article
ContentSource::ArticleUrl {
    url: "https://example.com/article".into(),
    title: Some("Optional title hint".into()),
}

// Search the web and narrate the results
ContentSource::SearchQuery("latest Rust async developments".into())

// Just narrate some text directly (no content provider needed)
ContentSource::Text("Your text to narrate...".into())

Builder API

The builder uses type-state to enforce valid configuration at compile time:

ContentPipeline::builder()
    // Content provider (optional — skip for raw text)
    .content(FirecrawlScraper::new("http://localhost:3002"))

    // Text transforms (chainable, applied in order)
    .text_transform(OpenAiTransform::new(
        &openai_key,
        "Rewrite in a casual podcast style",
    ))

    // TTS provider (the only required piece)
    .tts(ElevenLabsTts::new(ElevenLabsConfig {
        api_key: tts_key,
        voice_id: "custom-voice-id".into(),  // default: "Gr7mLjPA3HhuWxZidxPW"
        speed: 1.2,
        ..Default::default()
    }))

    // Everything below is optional, in any order

    // Media planner — see "Media" section below
    .media(StockMediaPlanner::new(keywords_provider, search_provider))

    .renderer(FfmpegRenderer::new(), RenderConfig {
        output_path: "./output.mp4".into(),
        audio_tracks: vec![
            AudioTrack::new("./background.mp3").volume(0.15),
        ],
        ..Default::default()
    }.pix_fmt("yuv420p"))
    .audio_storage(FsAudioStorage::new("./audio_cache"))
    .cache(my_cache_provider)
    .build()?;
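What "type-state" buys you can be shown with a minimal, self-contained sketch. These are not the crate's real types; the `NoTts`/`HasTts` markers below are invented to illustrate the pattern: `build()` only exists once a TTS provider has been supplied, so a misconfigured pipeline fails to compile rather than at runtime.

```rust
// Minimal type-state sketch (illustrative, not the crate's actual builder).
struct NoTts;
struct HasTts(String); // stand-in for a real TtsProvider

struct Builder<T> {
    tts: T,
}

impl Builder<NoTts> {
    fn new() -> Self {
        Builder { tts: NoTts }
    }

    // Supplying TTS moves the builder into the HasTts state.
    fn tts(self, provider: &str) -> Builder<HasTts> {
        Builder { tts: HasTts(provider.to_string()) }
    }
}

// `build()` is only defined for Builder<HasTts>.
impl Builder<HasTts> {
    fn build(self) -> String {
        format!("pipeline with TTS: {}", self.tts.0)
    }
}

fn main() {
    let pipeline = Builder::new().tts("elevenlabs").build();
    println!("{pipeline}");
    // Builder::new().build(); // compile error: no `build` on Builder<NoTts>
}
```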

Media

The .media() builder method takes a MediaPlanner — a single trait that owns all media selection logic.

Stock media only

Use StockMediaPlanner for keyword extraction + stock search (e.g. Pexels):

.media(StockMediaPlanner::new(
    OpenAiKeywords::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    }),
    PexelsSearch::new("your-pexels-key"),
))

User-provided assets with AI matching

Use LlmMediaPlanner to provide your own images/videos with descriptions. An LLM matches them to narration chunks based on semantic relevance:

use narrate_this::{LlmMediaPlanner, MediaAsset, MediaFallback, OpenAiConfig};

.media(
    LlmMediaPlanner::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    })
    .assets(vec![
        MediaAsset::image("./hero.jpg", "A rocket launching into space"),
        MediaAsset::video("https://example.com/demo.mp4", "App demo walkthrough"),
        MediaAsset::image_bytes(screenshot_bytes, "Dashboard screenshot"),
    ])
    // Optional: fall back to stock search for unmatched chunks
    .stock_search(
        OpenAiKeywords::new(OpenAiConfig {
            api_key: "your-openai-key".into(),
            ..Default::default()
        }),
        PexelsSearch::new("your-pexels-key"),
    )
    .fallback(MediaFallback::StockSearch) // default
    .allow_reuse(true)                    // same asset can appear multiple times
    .max_reuse(Some(2)),                  // but at most twice
)

Media sources can be URLs, local file paths, or raw bytes — the renderer handles all three.

Processing

// Run the full pipeline
let output = pipeline.process(source).await?;

// With progress callbacks
let output = pipeline.process_with_progress(source, |event| {
    match event {
        PipelineProgress::NarrationStarted => println!("Scraping..."),
        PipelineProgress::TtsComplete { audio_bytes, caption_count } => {
            println!("Audio: {audio_bytes} bytes, {caption_count} captions");
        }
        PipelineProgress::RenderComplete { ref path } => {
            println!("Video saved to {path}");
        }
        _ => {}
    }
}).await?;

// Or just parts of it
let text = pipeline.narrate(source).await?;           // narration only
let tts_result = pipeline.synthesize("Text").await?;   // TTS only

Output

pub struct ContentOutput {
    pub narration: String,
    pub audio: Vec<u8>,                    // MP3
    pub captions: Vec<CaptionSegment>,     // word-level timing
    pub media_segments: Vec<MediaSegment>, // source: MediaSource (URL, file path, or bytes)
    pub audio_path: Option<String>,        // if audio storage configured
    pub video_path: Option<String>,        // if renderer configured
}

Narration style

You can control how the LLM writes the narration:

let scraper = FirecrawlScraper::with_config(FirecrawlConfig {
    base_url: "http://localhost:3002".into(),
    style: NarrationStyle::default()
        .role("podcast host")
        .persona("a friendly tech enthusiast")
        .tone("Casual and upbeat")
        .length("3-5 paragraphs, 60-120 seconds when read aloud")
        .structure("Open with a hook, then dive into details"),
    ..Default::default()
});

Background audio

let config = RenderConfig {
    output_path: "./output.mp4".into(),
    audio_tracks: vec![
        AudioTrack::new("./music.mp3")
            .volume(0.15)       // 0.0–1.0, default 0.3
            .start_at(2000)     // delay start by 2s
            .no_loop(),         // loops by default
    ],
    ..Default::default()
};

FFmpeg configuration

Customize encoding settings via RenderConfig. All fields are optional and fall back to sensible defaults:

let config = RenderConfig {
    output_path: "./output.mp4".into(),
    ..Default::default()
}
.video_codec("libx265")           // default: "libx264"
.audio_codec("libopus")           // default: "aac"
.preset("slow")                   // default: "fast"
.crf(18)                          // default: ffmpeg default (23 for x264)
.pix_fmt("yuv420p")               // broad player compatibility
.subtitle_style("FontSize=32,PrimaryColour=&H00FFFF&,Bold=1")
.extra_output_args(["-movflags", "+faststart"]);

| Field | Default | FFmpeg flag |
| --- | --- | --- |
| `video_codec` | `"libx264"` | `-c:v` |
| `audio_codec` | `"aac"` | `-c:a` |
| `preset` | `"fast"` | `-preset` |
| `crf` | none | `-crf` |
| `pix_fmt` | none | `-pix_fmt` |
| `subtitle_style` | `"FontSize=24,PrimaryColour=&HFFFFFF&"` | `subtitles` `force_style` |
| `extra_output_args` | `[]` | any flags before the output path |

extra_output_args is a catch-all for anything not covered above (e.g. -tune, -b:v, -movflags).
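To make the flag mapping concrete, here is a hypothetical sketch of how these fields could translate into an ffmpeg argument list. The flag names come from the table above, but the assembly logic is assumed, not the crate's actual code.

```rust
// Hypothetical flag assembly: optional fields are simply omitted when unset,
// and extra args go last, just before the output path.
fn ffmpeg_args(video_codec: &str, crf: Option<u32>, extra: &[&str]) -> Vec<String> {
    let mut args = vec!["-c:v".to_string(), video_codec.to_string()];
    if let Some(crf) = crf {
        args.push("-crf".to_string());
        args.push(crf.to_string());
    }
    args.extend(extra.iter().map(|s| s.to_string()));
    args
}

fn main() {
    let args = ffmpeg_args("libx265", Some(18), &["-movflags", "+faststart"]);
    println!("{}", args.join(" ")); // -c:v libx265 -crf 18 -movflags +faststart
}
```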

Providers

Built-in:

| Provider | Service |
| --- | --- |
| `ElevenLabsTts` | ElevenLabs |
| `FirecrawlScraper` | Firecrawl |
| `OpenAiKeywords` / `OpenAiTransform` | OpenAI (gpt-4o-mini) |
| `PexelsSearch` | Pexels |
| `StockMediaPlanner` | Keywords + stock search |
| `LlmMediaPlanner` | AI asset matching + stock fallback |
| `FfmpegRenderer` | Local FFmpeg |
| `FsAudioStorage` | Local filesystem |
| `PgCache` | PostgreSQL (feature-gated: `pg-cache`) |

You can swap in your own by implementing the matching trait:

struct MyTtsProvider;

#[async_trait]
impl TtsProvider for MyTtsProvider {
    async fn synthesize(&self, text: &str) -> Result<TtsResult> {
        // call your TTS service and return audio + caption timings
        // ...
    }
}

Traits: TtsProvider, ContentProvider, MediaPlanner, KeywordExtractor, MediaSearchProvider, TextTransformer, AudioStorage, CacheProvider, VideoRenderer.

PostgreSQL cache

[dependencies]
narrate-this = { version = "0.2", features = ["pg-cache"] }

let pool = sqlx::PgPool::connect("postgres://localhost/narrate").await?;
let cache = PgCache::new(pool);

let pipeline = ContentPipeline::builder()
    .tts(my_tts)
    .cache(cache)
    .build()?;

Prerequisites

  • Rust 2024 edition (1.85+)
  • FFmpeg on PATH for video rendering
  • A Firecrawl instance for URL/search sources
  • API keys for whichever providers you use

Running the examples

cp examples/.env.example examples/.env
# fill in your API keys
cargo run --example basic
cargo run --example local_tts

Error handling

All errors come back as narrate_this::SdkError with variants for each stage (Tts, Llm, MediaSearch, MediaPlanner, WebScraper, etc.). Non-fatal errors (like a media search miss) are logged as warnings via tracing and won't stop the pipeline.
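A sketch of how you might branch on the stage that failed. The variant names below come from the list above, but their payloads are assumed here for illustration; check the crate's docs for the real `SdkError` shape.

```rust
// Hedged sketch: a stand-in SdkError with assumed String payloads.
#[derive(Debug)]
enum SdkError {
    Tts(String),
    MediaSearch(String),
}

fn describe(e: &SdkError) -> String {
    match e {
        // A TTS failure is fatal: there is no audio to build on.
        SdkError::Tts(msg) => format!("fatal TTS error: {msg}"),
        // A media search miss is non-fatal: the pipeline logs and continues.
        SdkError::MediaSearch(msg) => format!("non-fatal media miss: {msg}"),
    }
}

fn main() {
    println!("{}", describe(&SdkError::MediaSearch("no results".into())));
}
```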

License

MIT
