Turn a folder of photos and videos into a polished highlight reel — with one command.
AI plans the edit, FFmpeg renders locally at full resolution. Your raw media never
leaves your machine — only compressed thumbnails and preview clips are sent to Gemini
for planning. Rendering happens entirely on your GPU at 4K60 if you want.
About $0.25/run with --model fast, $0.50 with balanced.
- AI plans, local renders — Gemini only sees 400px thumbnails and 480p 1 fps preview clips. Your original 4K photos and videos stay local. FFmpeg renders the final output from source files at any resolution you choose.
- Sees and hears everything — despite the compression, Gemini sees every photo and watches every video clip with audio. It selects by visual and aural judgment, not metadata.
- Per-segment AI music — Lyria RealTime generates mood-matched background tracks, crossfaded into one composite. Dynamic ducking around speech via sidechaincompress.
- Beat-synced transitions — cuts snap to music beats via BPM detection. Speech segments are preserved without snapping.
- GPU-accelerated — NVENC (Linux/Windows) and VideoToolbox (macOS) for encoding and decoding. Automatic fallback to CPU.
- Rich terminal UI — live progress panel with per-stage status, sub-stage bars, cost tracking, and a summary table on completion.
- Iterate fast — thumbnails and previews are cached. Re-planning is a single Gemini call, and re-rendering at a different resolution needs no API call at all.
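The beat-synced transitions above amount to snapping each non-speech cut time onto the music's beat grid. A minimal illustrative sketch, assuming a fixed BPM and a simple nearest-beat rule (not the project's actual code):

```python
def snap_to_beats(cut_times, bpm, offset=0.0, keep=None):
    """Snap each cut time (seconds) to the nearest beat of a fixed-BPM grid.

    `keep` is an optional set of indices (e.g. speech segments) left unsnapped.
    """
    beat = 60.0 / bpm                     # seconds per beat
    keep = keep or set()
    snapped = []
    for i, t in enumerate(cut_times):
        if i in keep:
            snapped.append(t)             # preserve speech cuts as-is
        else:
            n = round((t - offset) / beat)
            snapped.append(offset + n * beat)
    return snapped
```

At 120 BPM (one beat every 0.5 s), a cut at 1.1 s snaps to 1.0 s unless its index is in `keep`.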
Prerequisites: Python 3.11+, FFmpeg, Gemini API key

```bash
git clone https://github.com/Guoyuer/reelsmith.git && cd reelsmith
python -m venv venv && source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e .
cp .env.example .env  # then add your GEMINI_API_KEY
```

Run the full pipeline:
```bash
reelsmith full -n my-trip -p ./photos --duration 60 --model balanced -r 1080p30
```

That's it. Output lands in workspace/runs/my-trip/output/.
```bash
# 1. Fast draft
reelsmith full -n trip -p ./photos --duration 120 --model fast -r 720p30

# 2. Re-plan with tweaks
reelsmith plan -n trip --duration 90 --model balanced --style cinematic \
  --focus "street food close-ups; temple serenity"

# 3. Final render
reelsmith assemble -n trip -r 4k60
```

| Command | What it does |
|---|---|
| `reelsmith full` | End-to-end: prepare → plan → music → assemble |
| `reelsmith prepare` | Scan media folder + generate thumbnails and previews |
| `reelsmith plan` | Re-plan with Gemini (reuses cached media) |
| `reelsmith assemble` | Re-render from existing EDL |
| `reelsmith config` | Show saved run config |
| `reelsmith workspace` | Disk usage and cleanup |
| Flag | Required | Description |
|---|---|---|
| `-n` / `--name` | yes | Run name (isolates workspace) |
| `-p` / `--path` | yes | Path to photos/videos folder |
| `--duration` | yes | Target length in seconds |
| `--model` | yes | fast ($0.24), balanced ($0.48), quality ($1.92) |
| `-r` / `--resolution` | yes | 4k60, 1080p30, 720p30, or WxHxFPS |
| `--style` | no | upbeat (default), cinematic, reflective, energetic |
| `--trip-type` | no | general (default), family, solo, food, adventure, architecture. Recommended — improves narrative quality |
| `--focus` | no | Creative focus: "family joy; exotic street markets" |
| `--instruct` | no | Free-form Gemini instructions: "no text overlays" |
| `--lang` | no | en (default), cn, both — for titles and overlays |
| `--music` | no | auto (default), none, or /path/to/track.mp3 |
Run reelsmith full --help for all options.
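As the table notes, `-r` accepts either a preset or an explicit WxHxFPS spec. A hypothetical parser illustrating both forms (the preset-to-dimensions mapping is assumed from standard video sizes, not taken from the project's code):

```python
def parse_resolution(spec):
    """Parse a -r value: a preset like '1080p30'/'4k60' or explicit 'WxHxFPS'."""
    presets = {
        "4k60": (3840, 2160, 60),    # assumed preset dimensions
        "1080p30": (1920, 1080, 30),
        "720p30": (1280, 720, 30),
    }
    if spec in presets:
        return presets[spec]
    w, h, fps = spec.split("x")      # explicit form, e.g. '2560x1440x24'
    return int(w), int(h), int(fps)
```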
See docs/architecture.md for the full data flow diagram — inputs, caches, EDL, and render artifact paths across all 4 stages.
```
prepare ──────▸ plan ─────────▸ generate_music ──▸ assemble
├─ scan folder  └─ Gemini sees  └─ Lyria music     ├─ per-segment FFmpeg render
├─ ffprobe         all photos +    per segment     ├─ TS concat (no re-encode)
├─ thumbs          watches                         ├─ beat sync + music ducking
└─ preview         videos                          └─ validation (6 checks)
   clips
```
Plan stage — Gemini receives photo thumbnails inline + a concatenated video preview (480p, with audio) via Files API. One API call returns a structured EDL (JSON) with narrative arc, item selection, trim points, transitions, effects, text overlays, and music moods. Postprocessing validates paths, clamps trim points, and deduplicates.
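The trim-clamping step described above might look roughly like the following. Field names (`start`, `end`, `duration`) are illustrative, not the real EDL schema:

```python
def clamp_trims(segments):
    """Clamp each segment's trim window to [0, source duration], drop empties."""
    cleaned = []
    for seg in segments:
        start = max(0.0, seg["start"])
        end = min(seg["duration"], seg["end"])
        if end > start:                  # drop zero/negative-length trims
            cleaned.append({**seg, "start": start, "end": end})
    return cleaned
```

A segment trimmed to (-1, 99) on a 10-second clip becomes (0, 10); a zero-length trim is removed entirely.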
Assemble stage — Each segment rendered as a single FFmpeg filter_complex_script.
Photos get cosine-eased Ken Burns effects with blurred background fill. Videos are
trimmed and speed-ramped per the EDL. Segments concatenated via TS demuxer (no
re-encode), then music mixed with sidechaincompress ducking (500ms release).
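Cosine easing, mentioned above for the Ken Burns effect, maps linear progress t ∈ [0, 1] onto a smooth ease-in-out curve. A minimal sketch of the interpolation (the zoom range and the actual FFmpeg filter expression are assumptions, not the project's code):

```python
import math

def cosine_ease(t):
    """Map linear progress t in [0, 1] to a smooth ease-in-out curve."""
    return (1.0 - math.cos(math.pi * t)) / 2.0

def ken_burns_zoom(t, z_start=1.0, z_end=1.2):
    """Zoom factor at progress t, interpolated with cosine easing."""
    return z_start + (z_end - z_start) * cosine_ease(t)
```

The curve starts and ends with zero velocity, which is what makes the pan/zoom feel gentle rather than linear.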
| | macOS | Linux | Windows |
|---|---|---|---|
| GPU encode | VideoToolbox | NVENC | NVENC |
| GPU decode | VideoToolbox | CUDA | CUDA |
| HEIC photos | native | pillow-heif | pillow-heif |
| CPU fallback | libx264 | libx264 | libx264 |
No local AI models needed — all inference runs via Gemini API.
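Encoder selection with CPU fallback, as in the table above, can be sketched by probing `ffmpeg -encoders` for the platform's preferred GPU encoder. The encoder names are standard FFmpeg identifiers; the selection logic itself is an illustrative assumption:

```python
import platform
import subprocess

def pick_encoder():
    """Pick a GPU H.264 encoder for the current OS, falling back to libx264."""
    preferred = {"Darwin": "h264_videotoolbox",
                 "Linux": "h264_nvenc",
                 "Windows": "h264_nvenc"}.get(platform.system())
    try:
        out = subprocess.run(["ffmpeg", "-hide_banner", "-encoders"],
                             capture_output=True, text=True, check=True).stdout
        if preferred and preferred in out:
            return preferred
    except (OSError, subprocess.CalledProcessError):
        pass                              # ffmpeg missing or probe failed
    return "libx264"                      # CPU fallback
```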
```bash
pip install -e ".[dev]"
pytest                   # unit tests (default)
pytest -m integration    # FFmpeg integration tests
```

Pre-commit hooks: ruff check --fix, ruff format, pytest.
