reelsmith

Turn a folder of photos and videos into a polished highlight reel — with one command.

AI plans the edit; FFmpeg renders locally at full resolution. Your raw media never leaves your machine: only compressed thumbnails and preview clips are sent to Gemini for planning. Rendering happens entirely on your machine, GPU-accelerated where available, at 4K60 if you want. Cost per run depends on --model; see Key Flags for estimates.

Features

  • AI plans, local renders — Gemini only sees 400px thumbnails and 480p 1 fps preview clips. Your original 4K photos and videos stay local. FFmpeg renders the final output from source files at any resolution you choose.
  • Sees and hears everything — despite the compression, Gemini sees every photo and watches every video clip with audio. It selects by visual and aural judgment, not metadata.
  • Per-segment AI music — Lyria RealTime generates mood-matched background tracks, crossfaded into one composite. Dynamic ducking around speech via sidechaincompress.
  • Beat-synced transitions — cuts snap to music beats via BPM detection. Speech segments are preserved without snapping.
  • GPU-accelerated — NVENC (Linux/Windows) and VideoToolbox (macOS) for encoding and decoding. Automatic fallback to CPU.
  • Rich terminal UI — live progress panel with per-stage status, sub-stage bars, cost tracking, and a summary table on completion.
  • Iterate fast: thumbnails and previews are cached. Re-planning is a single Gemini call. Re-rendering at a different resolution needs no further API call.

Quick Start

Prerequisites: Python 3.11+, FFmpeg, Gemini API key

git clone https://github.com/Guoyuer/reelsmith.git && cd reelsmith
python -m venv venv && source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e .
cp .env.example .env   # then add your GEMINI_API_KEY

Run the full pipeline:

reelsmith full -n my-trip -p ./photos --duration 60 --model balanced -r 1080p30

That's it. Output lands in workspace/runs/my-trip/output/.

Iteration Workflow

# 1. Fast draft
reelsmith full -n trip -p ./photos --duration 120 --model fast -r 720p30

# 2. Re-plan with tweaks
reelsmith plan -n trip --duration 90 --model balanced --style cinematic \
  --focus "street food close-ups; temple serenity"

# 3. Final render
reelsmith assemble -n trip -r 4k60

Commands

| Command | What it does |
| --- | --- |
| reelsmith full | End-to-end: prepare → plan → music → assemble |
| reelsmith prepare | Scan media folder + generate thumbnails and previews |
| reelsmith plan | Re-plan with Gemini (reuses cached media) |
| reelsmith assemble | Re-render from existing EDL |
| reelsmith config | Show saved run config |
| reelsmith workspace | Disk usage and cleanup |

Key Flags

| Flag | Required | Description |
| --- | --- | --- |
| -n / --name | yes | Run name (isolates workspace) |
| -p / --path | yes | Path to photos/videos folder |
| --duration | yes | Target length in seconds |
| --model | yes | fast ($0.24), balanced ($0.48), quality ($1.92) |
| -r / --resolution | yes | 4k60, 1080p30, 720p30, or WxHxFPS |
| --style | no | upbeat (default), cinematic, reflective, energetic |
| --trip-type | no | general (default), family, solo, food, adventure, architecture. Recommended: improves narrative quality |
| --focus | no | Creative focus: "family joy; exotic street markets" |
| --instruct | no | Free-form Gemini instructions: "no text overlays" |
| --lang | no | en (default), cn, both (for titles and overlays) |
| --music | no | auto (default), none, or /path/to/track.mp3 |

Run reelsmith full --help for all options.
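
As an illustration of the -r / --resolution values, the sketch below parses either a preset name or a custom WxHxFPS string. It is a hedged example, not the project's parser: `parse_resolution` and `PRESETS` are hypothetical names, and the pixel sizes assigned to each preset (for example 3840x2160 for 4k60) are assumptions based on common usage.

```python
import re

# Assumed preset sizes; the README names the presets but not their dimensions.
PRESETS = {
    "4k60": (3840, 2160, 60),
    "1080p30": (1920, 1080, 30),
    "720p30": (1280, 720, 30),
}


def parse_resolution(value: str) -> tuple[int, int, int]:
    """Return (width, height, fps) for a preset name or a WxHxFPS string."""
    key = value.strip().lower()
    if key in PRESETS:
        return PRESETS[key]
    match = re.fullmatch(r"(\d+)x(\d+)x(\d+)", key)
    if match is None:
        raise ValueError(f"unrecognised resolution: {value!r}")
    width, height, fps = (int(part) for part in match.groups())
    if min(width, height, fps) <= 0:
        raise ValueError(f"resolution values must be positive: {value!r}")
    return width, height, fps


print(parse_resolution("1080p30"))
print(parse_resolution("2560x1440x24"))
```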

Architecture

See docs/architecture.md for the full data flow diagram — inputs, caches, EDL, and render artifact paths across all 4 stages.

How It Works

prepare ──▸ plan ──▸ generate_music ──▸ assemble

prepare
  ├─ scan folder
  ├─ thumbs
  ├─ ffprobe
  └─ preview clips
plan
  └─ Gemini sees all photos + watches videos
generate_music
  └─ Lyria music per segment
assemble
  ├─ per-segment FFmpeg render
  ├─ TS concat (no re-encode)
  ├─ beat sync + music ducking
  └─ validation (6 checks)

Plan stage — Gemini receives photo thumbnails inline + a concatenated video preview (480p, with audio) via Files API. One API call returns a structured EDL (JSON) with narrative arc, item selection, trim points, transitions, effects, text overlays, and music moods. Postprocessing validates paths, clamps trim points, and deduplicates.
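
The postprocessing step described above might look roughly like the following sketch. The EDL schema is not documented here, so the segment fields (`path`, `start`, `end`), the `durations` lookup, and the meaning of "deduplicates" (dropping exact repeats of the same clip range) are all assumptions made for illustration.

```python
def postprocess(segments: list[dict], durations: dict[str, float]) -> list[dict]:
    """Validate paths, clamp trim points, and drop exact duplicate segments.

    `segments` uses a hypothetical schema: {"path": str, "start": s, "end": s}.
    `durations` maps each known media path to its length in seconds.
    """
    cleaned = []
    seen = set()
    for segment in segments:
        path = segment["path"]
        if path not in durations:
            continue  # the planner referenced a file that does not exist
        total = durations[path]
        # Clamp trim points into [0, total] and keep start <= end.
        start = float(min(max(segment.get("start", 0.0), 0.0), total))
        end = float(min(max(segment.get("end", total), start), total))
        if end <= start:
            continue  # nothing left to show after clamping
        key = (path, start, end)
        if key in seen:
            continue  # same clip range already selected
        seen.add(key)
        cleaned.append({**segment, "start": start, "end": end})
    return cleaned


plan = [
    {"path": "a.mp4", "start": -1, "end": 12},
    {"path": "a.mp4", "start": 0, "end": 10},
    {"path": "ghost.mp4", "start": 0, "end": 1},
]
print(postprocess(plan, {"a.mp4": 10.0}))
```

In this example the first segment is clamped to the clip's 10 seconds, the second becomes a duplicate of it, and the third is dropped because its path is unknown.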

Assemble stage — Each segment rendered as a single FFmpeg filter_complex_script. Photos get cosine-eased Ken Burns effects with blurred background fill. Videos are trimmed and speed-ramped per the EDL. Segments concatenated via TS demuxer (no re-encode), then music mixed with sidechaincompress ducking (500ms release).
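
To make "cosine-eased" concrete, here is a minimal sketch of a cosine ease applied to a Ken Burns zoom. The function names and the default zoom range (1.0 to 1.15) are illustrative assumptions; the README does not state the actual parameters or how the values are fed into the FFmpeg filter graph.

```python
import math


def cosine_ease(t: float) -> float:
    """Map linear progress t in [0, 1] to eased progress with zero slope at both ends."""
    t = min(max(t, 0.0), 1.0)
    return 0.5 - 0.5 * math.cos(math.pi * t)


def zoom_at(frame: int, total_frames: int,
            start_zoom: float = 1.0, end_zoom: float = 1.15) -> float:
    """Zoom factor for a frame, easing from start_zoom to end_zoom (assumed defaults)."""
    if total_frames <= 1:
        return start_zoom
    progress = frame / (total_frames - 1)
    return start_zoom + (end_zoom - start_zoom) * cosine_ease(progress)


# A 3-second photo segment at 30 fps: slow start, faster middle, slow finish.
for frame in (0, 22, 45, 67, 89):
    print(frame, round(zoom_at(frame, 90), 4))
```

Compared with a linear ramp, the eased curve avoids the abrupt start and stop that makes a pan or zoom on a still photo look mechanical.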

Requirements

| | macOS | Linux | Windows |
| --- | --- | --- | --- |
| GPU encode | VideoToolbox | NVENC | NVENC |
| GPU decode | VideoToolbox | CUDA | CUDA |
| HEIC photos | native | pillow-heif | pillow-heif |
| CPU fallback | libx264 | libx264 | libx264 |

No local AI models needed — all inference runs via Gemini API.
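
The platform matrix above suggests an encoder choice with a CPU fallback, which could be sketched as below. This is an assumption-laden illustration rather than the project's logic: `PREFERRED`, `list_encoders`, and `pick_encoder` are hypothetical names, and the H.264 encoder names (`h264_videotoolbox`, `h264_nvenc`) are assumed, since the README names only the hardware APIs and libx264.

```python
import platform
import shutil
import subprocess

# Assumed mapping from platform.system() values to FFmpeg hardware encoders.
PREFERRED = {
    "Darwin": "h264_videotoolbox",
    "Linux": "h264_nvenc",
    "Windows": "h264_nvenc",
}


def list_encoders() -> set[str]:
    """Names reported by `ffmpeg -encoders`; empty if ffmpeg is not on PATH."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return set()
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    names = set()
    for line in result.stdout.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])  # second column is the encoder name
    return names


def pick_encoder(system: str, available: set[str]) -> str:
    """Prefer the platform's hardware encoder, otherwise fall back to libx264."""
    preferred = PREFERRED.get(system)
    return preferred if preferred in available else "libx264"


print(pick_encoder(platform.system(), list_encoders()))
```

An encoder being listed by FFmpeg does not guarantee that matching hardware is present, so a real implementation would probably also need a short test encode before committing to the GPU path.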

Development

pip install -e ".[dev]"
pytest                       # unit tests (default)
pytest -m integration        # FFmpeg integration tests

Pre-commit hooks: ruff check --fix, ruff format, pytest.

License

Apache 2.0
