AI-powered short-form video production pipeline — generates YouTube Shorts & Instagram Reels from a title or script using autonomous agents, React/Remotion, and open AI APIs.
Given a topic or full script, the pipeline automatically:
- Writes the script — using personality-driven short-form writing styles (punchy, outscal, etc.)
- Generates scene images — AI image generation via Pollinations API (Flux, nanobanana, DALL-E, etc.)
- Produces narration audio — per-scene TTS with word-level SRT timing via edge-tts
- Downloads BGM / SFX — royalty-free audio from Pixabay
- Compiles to MP4 — smooth Ken Burns motion + scene transitions + burned TikTok-style captions via Remotion
Output: vertical MP4 (1080×1920, 30fps, H.264) ready for TikTok, YouTube Shorts, and Instagram Reels.
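Scene images come from the Pollinations image endpoint, which returns a JPEG for a prompt encoded into the URL. A minimal sketch of building such a request URL for a vertical scene image (endpoint and parameter names follow the public Pollinations API; verify against your setup):

```python
from urllib.parse import quote

def pollinations_url(prompt: str, width: int = 1080, height: int = 1920,
                     model: str = "flux") -> str:
    """Build a Pollinations image-generation URL for a vertical scene image.

    The endpoint and query parameters follow the public Pollinations API;
    adjust them if your deployment differs.
    """
    base = "https://image.pollinations.ai/prompt/"
    return f"{base}{quote(prompt)}?width={width}&height={height}&model={model}"

# A plain GET on this URL returns the JPEG binary for the scene image.
url = pollinations_url("red desert on Mars, cinematic, wide shot")
```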
| Scene 1 | Scene 4 | Scene 6 |
|---|---|---|
| … | … | … |
Example output in `output/20260411_funfact-planet-mars/`
```
.github/agents/                  ← Agent orchestrators
  image-video-gen.agent.md       ← 7-phase: script → images → audio → metadata
  remotion-compilation.agent.md  ← compile output folder → final MP4
.agents/skills/                  ← Domain skill packages (reusable across agents)
  pollinations-python/           ← Image generation (Pollinations API + Python)
  edge-tts/                      ← TTS narration + word-level SRT
  pixabay-audio/                 ← BGM / SFX search & download
  ffmpeg/                        ← Video/audio conversion helpers
  remotion-best-practices/       ← Remotion animation & timing rules
  shorts-script-personality/     ← Viral short-form script styles
  content-creation/              ← Marketing copy & angle research
  social-content/                ← Social media optimization
  youtube-scriptwriting/         ← Hook → body → payoff structure
  ai-image-prompts-skill/        ← Curated 10k+ image prompt library
src/                             ← Shared Remotion template (never modified per-video)
  Root.tsx                       ← Loads scene-config.json dynamically
  Main.tsx                       ← TransitionSeries + per-scene audio
  SceneImage.tsx                 ← Animated image (zoom / pan / Ken Burns)
  CaptionOverlay.tsx             ← TikTok-style word-highlighted captions
output/{date}_{slug}/            ← Per-video assets (git-ignored)
  scene_N/                       ← image, audio, prompt, SRT per scene
  scene-config.json              ← Motion + timing config (read at render time)
  subtitles.srt                  ← Full-video merged captions
  metadata.json                  ← Title, hashtags, thumbnail_prompt
```
Shared template model: `src/` is one shared Remotion project. Every video only needs its own `scene-config.json` + asset files in the output folder. `REMOTION_PUBLIC_DIR` switches which video is loaded at render time.
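For illustration, a `scene-config.json` for a one-scene video might look like the following. The exact schema is defined by the template in `src/`, so treat the field names here as a hypothetical shape, not the authoritative format:

```json
{
  "fps": 30,
  "width": 1080,
  "height": 1920,
  "scenes": [
    {
      "index": 1,
      "image": "scene_1/image_1.jpeg",
      "audio": "scene_1/audio_1.mp3",
      "srt": "scene_1/subtitle_1.srt",
      "effect": "ken_burns",
      "durationInFrames": 150
    }
  ]
}
```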
- Node.js 18+ and npm
- Python 3.10+
- ffmpeg in PATH (for audio concatenation)
- API keys: Pollinations, Pixabay (optional)
```bash
git clone https://github.com/myusp/image-video-gen-agent.git
cd image-video-gen-agent

# Node deps (Remotion)
npm install

# Python deps
pip install edge-tts requests python-dotenv pillow openai

cp .env.example .env
```

Edit `.env`:
```bash
POLLINATIONS_API_KEY="your-key-here"   # from https://pollinations.ai
PIXABAY_API_KEY="your-key-here"        # optional, for BGM/SFX
EDGE_TTS_NAME="id-ID-ArdiNeural"       # TTS voice (see edge-tts --list-voices)
OUTPUT="./output"
WORD_BREAK_SUBTITLE=4
```

The pipeline runs via GitHub Copilot Agent Mode in VS Code. Open the project in VS Code, then in a chat window:
@Image Video Generator Fun fact about Planet Mars
The agent will:
- Auto-generate a script (or accept your script)
- Generate scene images, TTS audio, BGM
- Save everything to `output/20260411_funfact-planet-mars/`
@Remotion Compilation Agent output/20260411_funfact-planet-mars
Or run directly from the terminal:
```bash
REMOTION_PUBLIC_DIR=./output/20260411_funfact-planet-mars \
npm run render:captions -- \
  --output ./output/20260411_funfact-planet-mars/final_video.mp4
```

For a live preview:

```bash
REMOTION_PUBLIC_DIR=./output/20260411_funfact-planet-mars npm run dev
```

| Effect | Description | Auto-selected when prompt contains |
|---|---|---|
| `zoom_in` | Scale 1.0 → 1.5 | close-up, detail, macro, face |
| `zoom_out` | Scale 1.5 → 1.0 | wide shot, landscape, crowd |
| `pan_left_right` | ±10% X @ 1.5× scale | action, movement, chase, speed |
| `pan_right_left` | Reverse X pan | same |
| `pan_up_down` | ±10% Y @ 1.5× scale | falling, rain, descending |
| `pan_down_up` | Reverse Y pan | rising, flying, sky, hope |
| `ken_burns` | Scale 1.0 → 1.4 + drift | cinematic, dramatic, portrait, reveal |
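The keyword mapping above can be sketched as a small lookup table. This is a hypothetical reimplementation of the agent's selection logic for illustration, not the actual code:

```python
# Keyword → motion-effect mapping, mirroring the table above.
# (pan_right_left shares the pan_left_right keywords, so it is omitted here.)
EFFECT_KEYWORDS = {
    "zoom_in": ["close-up", "detail", "macro", "face"],
    "zoom_out": ["wide shot", "landscape", "crowd"],
    "pan_left_right": ["action", "movement", "chase", "speed"],
    "pan_up_down": ["falling", "rain", "descending"],
    "pan_down_up": ["rising", "flying", "sky", "hope"],
    "ken_burns": ["cinematic", "dramatic", "portrait", "reveal"],
}

def select_effect(prompt: str, default: str = "ken_burns") -> str:
    """Pick a motion effect based on keywords found in the image prompt."""
    text = prompt.lower()
    for effect, keywords in EFFECT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return effect
    return default
```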
Three Remotion compositions are registered:
| Composition ID | Caption Style | Use |
|---|---|---|
| `main` | TikTok-style word highlight (yellow #FFE600) | Default for Shorts |
| `main-plain-captions` | Plain white text, no highlight | Clean look |
| `main-no-captions` | No captions | B-roll, manual sub overlay |
Captions include a −150ms timing offset to compensate for edge-tts word-boundary delay.
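The offset and the word-level timings can be handled with a few small helpers. This is an illustrative sketch (function names are ours), assuming word boundaries expressed in milliseconds:

```python
CAPTION_OFFSET_MS = -150  # compensate for edge-tts word-boundary delay

def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = max(ms, 0)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def shift(ms: int) -> int:
    """Apply the caption offset, clamping at the start of the video."""
    return max(ms + CAPTION_OFFSET_MS, 0)

def group_words(words: list, n: int = 4) -> list:
    """Chunk word entries into n-word caption groups (cf. WORD_BREAK_SUBTITLE)."""
    return [words[i:i + n] for i in range(0, len(words), n)]
```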
```
output/
└── 20260411_funfact-planet-mars/
    ├── scene_1/
    │   ├── image_1.jpeg          ← AI-generated scene image
    │   ├── audio_1.mp3           ← TTS narration
    │   ├── prompt_1.txt          ← Image prompt (used for motion selection)
    │   ├── subtitles_1.txt       ← Scene text
    │   └── subtitle_1.srt        ← Word-level SRT
    ├── scene_2/ … scene_N/
    ├── audio/
    │   ├── bgm_cinematic.mp3     ← Background music (Pixabay)
    │   └── audio_manifest.json
    ├── scene-config.json         ← Motion + timing per scene
    ├── subtitles.srt             ← Full-video captions
    ├── subtitles.txt             ← Full narration text
    ├── audio_full.mp3            ← Concatenated narration
    └── metadata.json             ← Title, description, hashtags, thumbnail_prompt
```
Skills are modular knowledge packages used by agents. Each skill under .agents/skills/ contains:
- `SKILL.md` — Instructions and workflow
- `references/` — Domain reference docs
- `scripts/` — Ready-to-run Python scripts
See each skill's SKILL.md for usage details.
| Constraint | Reason |
|---|---|
| No em dash — / en dash – in subtitles | Causes rendering artifacts in burned captions |
| Images must be `.jpeg` | Pollinations API returns JPEG binary; other formats break rendering |
| No symlinks in output folder | Remotion HTTP bundler can't resolve symlinks → 404 |
| Never edit `src/` per-video | Shared template; config goes only in `scene-config.json` |
| `REMOTION_PUBLIC_DIR` must be set | Without it, Remotion can't find `scene-config.json` |
| Pan translate < (scale − 1) / (2 × scale) | Exceeding this shows black edges at frame borders |
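The pan-translate constraint follows from geometry: at scale s, the image overhangs the frame by (s − 1)/2 of the frame size on each side, which is (s − 1)/(2s) as a fraction of the scaled image. A quick check that the ±10% pans stay within the limit at the 1.5× scale used above:

```python
def max_pan_fraction(scale: float) -> float:
    """Maximum safe translate, as a fraction of the scaled image,
    before a frame edge becomes visible: (scale - 1) / (2 * scale)."""
    return (scale - 1) / (2 * scale)

# At 1.5x the limit is 0.5 / 3 ≈ 0.1667, so a ±10% pan is safely inside it.
limit = max_pan_fraction(1.5)
```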
Contributions welcome! Please open an issue before submitting large PRs.
- Fork the repo
- Create a feature branch (`git checkout -b feat/my-feature`)
- Commit your changes
- Open a PR against `main`
MIT © 2026 Mohamad Yusup