Clone viral videos with your product swapped in
Category: Content Creation Runtime: Claude Code AIOS Rating: 8/10
Give it any viral video and your product images — it analyzes the video scene-by-scene using the SEALCaM framework, then recreates each scene with your product swapped in. Generates images, animates them into video clips, adds AI-generated music, and combines everything into a final video.
The pipeline has 4 mandatory human checkpoints so you approve every stage before spending money.
Reference Video + Product Images (inputs/ folder) → Gemini AI analyzes each scene using SEALCaM framework → Scene Breakdown (YAML)
CHECKPOINT 1: "Does this scene breakdown look right?"
Agent writes image + video prompts per scene → Prompts logged to Airtable (Scenes table) → Cost Estimate Shown
CHECKPOINT 2: "Ready to generate images? Cost: $X"
NanoBanana Pro generates start image per scene → Images logged to Airtable (start_image field) → Images Shown to User
CHECKPOINT 3: "Ready to generate videos? Cost: $X"
Kling 2.6 animates each image into 5s video clip → Videos logged to Airtable (scene_video field) → Videos Shown to User
CHECKPOINT 4: "Ready to add music and combine?"
Suno V4 generates background music via Kie.ai → FFmpeg concatenates all clips + adds music with fade-out → Final Video in outputs/
| Platform | Role | Cost |
|---|---|---|
| Kie.ai | Core engine — NanoBanana Pro (images), Kling 2.6 (video), Suno V4 (music) | Per-generation |
| Google Gemini | Video analysis — SEALCaM scene-by-scene breakdown | Free tier |
| Airtable | Scene logging — prompts, images, videos per scene | Free tier |
| FFmpeg | Post-production — concatenate clips + overlay music | Free (local) |
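The SEALCaM breakdown (Subject, Environment, Action, Lighting, Camera, Metatokens) lends itself to a small per-scene record. The sketch below is one possible shape, not the schema the pipeline actually writes to YAML: the class name, the `scene_number` field, and the `missing_fields` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class SceneBreakdown:
    """One scene of the reference video, described along the six SEALCaM axes."""
    scene_number: int  # hypothetical: position of the scene in the video
    subject: str
    environment: str
    action: str
    lighting: str
    camera: str
    metatokens: str


def missing_fields(scene: SceneBreakdown) -> list[str]:
    """Return the names of text fields left empty, so gaps are easy to spot on review."""
    return [
        name
        for name, value in asdict(scene).items()
        if isinstance(value, str) and not value.strip()
    ]


# Illustrative values only; a real breakdown would come from the video analysis.
example = SceneBreakdown(
    scene_number=1,
    subject="product held in hand",
    environment="kitchen counter",
    action="slow pour",
    lighting="",
    camera="close-up, static",
    metatokens="9:16, cinematic",
)
print(missing_fields(example))
```

A check like this could run just before the first checkpoint, so an incomplete scene is flagged before anyone approves it.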
- Upload video to Gemini 2.0 Flash
- AI analyzes using SEALCaM (Subject, Environment, Action, Lighting, Camera, Metatokens)
- CHECKPOINT: Show scene breakdown, get approval
- Agent writes image + video motion prompt per scene with your product
- Create records in Airtable Scenes table
- CHECKPOINT: Show prompts + cost estimate
- Upload reference images to Kie.ai (temporary 3-day URLs)
- NanoBanana Pro generates start image per scene (9:16, 2K)
- CHECKPOINT: Show images, get approval
- Each start image animated into 5s video via Kling 2.6
- CHECKPOINT: Show videos, get approval
- Suno V4 generates instrumental background music
- FFmpeg concatenates all clips + overlays music with 2-second fade-out
- Final video saved to outputs/
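The last step (concatenate clips, overlay music, 2-second fade-out) can be sketched as a single FFmpeg invocation. This is a minimal sketch, not the contents of `combine_all.py`: the function names and file names are hypothetical, and it assumes all clips share the same codec settings so the video stream can be copied without re-encoding.

```python
from pathlib import Path


def concat_list_text(clips: list[str]) -> str:
    """Body of the list file read by FFmpeg's concat demuxer, one clip per line.

    Relative paths are resolved against the list file's own directory.
    Assumes clip paths contain no single quotes.
    """
    return "".join(f"file '{Path(c).as_posix()}'\n" for c in clips)


def build_ffmpeg_command(
    list_file: str,
    music: str,
    output: str,
    total_seconds: float,
    fade_seconds: float = 2.0,
) -> list[str]:
    """Build the argument list: join clips, add music, fade the music out at the end."""
    fade_start = max(total_seconds - fade_seconds, 0.0)
    # afade with t=out fades the audio out from `st` over `d` seconds.
    audio_filter = f"[1:a]afade=t=out:st={fade_start:g}:d={fade_seconds:g}[music]"
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_file,  # input 0: joined clips
        "-i", music,                                    # input 1: music track
        "-filter_complex", audio_filter,
        "-map", "0:v", "-map", "[music]",
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",                                    # stop when the video ends
        output,
    ]


# Four 5-second clips give a 20-second video, so the fade starts at 18 seconds.
print(build_ffmpeg_command("clips.txt", "music.mp3", "outputs/final.mp4", 20.0))
```

Building the command as a list keeps it easy to show to the user before running it, for example with `subprocess.run(cmd, check=True)` once the final checkpoint is approved.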
creative-cloner/
├── Getting_Started.md
├── .agent/
│ ├── .env
│ ├── requirements.txt
│ └── skills/creative-cloner/SKILL.md (266 lines — brain)
├── tools/
│ ├── analyze_video.py
│ ├── generate_images.py
│ ├── generate_videos.py
│ ├── generate_music.py
│ └── combine_all.py
├── inputs/
│ ├── project1 - animals/ (animals.mp4 + tea.png)
│ └── project2 - surf/ (surf.mp4 + redboots.jpg + logo.png)
└── outputs/
| Variable | Source | Purpose |
|---|---|---|
| KIE_API_KEY | kie.ai dashboard | Image + Video + Music gen + File upload |
| GEMINI_API_KEY | aistudio.google.com/apikey | Video analysis with Gemini 2.0 Flash |
| AIRTABLE_TOKEN | airtable.com/create/tokens | Read/write scene records |
| AIRTABLE_BASE_ID | Airtable URL | Target base |
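Since every stage depends on these four variables, it may help to check them up front. The sketch below assumes the values from `.agent/.env` have already been loaded into the process environment by some other means; the helper name `missing_env_vars` is hypothetical.

```python
import os

# The four variables listed in the table above.
REQUIRED_VARS = ("KIE_API_KEY", "GEMINI_API_KEY", "AIRTABLE_TOKEN", "AIRTABLE_BASE_ID")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing configuration:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to try without touching real credentials.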
| Generation | Model | Cost | Time |
|---|---|---|---|
| Image (per scene) | NanoBanana Pro | $0.09 | ~30-60s |
| Video 5s (per scene) | Kling 2.6 | $0.28 | ~2-4 min |
| Video 10s (per scene) | Kling 2.6 | $0.56 | ~3-6 min |
| Music (per project) | Suno V4 | ~$0.10 | ~1-3 min |
Example: 4 scenes with 5s videos ≈ $1.58 (4 images $0.36 + 4 videos $1.12 + 1 music ~$0.10)
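The estimate shown at each checkpoint is simple arithmetic over the per-unit prices in the table. A minimal sketch, with the prices copied from the table, the music cost treated as a flat $0.10, and the function name `estimate_cost` made up for illustration:

```python
IMAGE_COST = 0.09                  # NanoBanana Pro, per scene
VIDEO_COST = {5: 0.28, 10: 0.56}   # Kling 2.6, per scene, keyed by clip length in seconds
MUSIC_COST = 0.10                  # Suno V4, per project (approximate)


def estimate_cost(scenes: int, clip_seconds: int = 5) -> dict:
    """Return the per-category and total cost in USD for one project."""
    images = scenes * IMAGE_COST
    videos = scenes * VIDEO_COST[clip_seconds]
    return {
        "images": round(images, 2),
        "videos": round(videos, 2),
        "music": MUSIC_COST,
        "total": round(images + videos + MUSIC_COST, 2),
    }


print(estimate_cost(4))  # the 4-scene, 5s example
```

For 4 scenes with 5s clips this reproduces the $1.58 total from the example.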
- Stop at Every Checkpoint — NEVER auto-proceed between phases
- Show Costs Before Generating — Always show per-unit + total before any generation
- One Script at a Time — Run sequentially, never in parallel
- Log Everything to Airtable — Every prompt, image, and video gets logged
- NEVER Auto-Retry Errors — On failure, STOP and give user 3 options
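These rules amount to a sequential loop with an approval gate before each phase and a hard stop on failure. The sketch below shows one way to express that. The `run_phases` function and its tuple format are assumptions, and the three options offered to the user after a failure are left to the caller, since they are not spelled out here.

```python
from typing import Callable, Iterable, Tuple

Phase = Tuple[str, float, Callable[[], None]]  # (name, estimated cost in USD, action)


def run_phases(
    phases: Iterable[Phase],
    approve: Callable[[str, float], bool],
) -> Tuple[list, str]:
    """Run phases one at a time, asking for approval (with cost) before each.

    Stops at the first declined checkpoint or the first error and never retries.
    Returns the names of completed phases and a status message.
    """
    completed = []
    for name, cost_usd, action in phases:
        if not approve(name, cost_usd):
            return completed, f"stopped: '{name}' not approved"
        try:
            action()
        except Exception as exc:
            # No automatic retry: hand control back to the user.
            return completed, f"stopped: '{name}' failed ({exc}); no automatic retry"
        completed.append(name)
    return completed, "finished"


def ask_user(name: str, cost_usd: float) -> bool:
    """Console approval prompt that shows the cost before anything is generated."""
    answer = input(f"Ready to run '{name}'? Cost: ${cost_usd:.2f} [y/N] ")
    return answer.strip().lower() == "y"
```

Because later phases only run after earlier ones return, a failed image generation can never trigger paid video generation behind the user's back.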
| Problem | Cause | Fix |
|---|---|---|
| Kie.ai error but generation succeeded | Timeout/network issues | Check kie.ai dashboard before retrying |
| Bad scene breakdown | Low-quality reference video | Use clear 5-15 second reference videos |
| Airtable logging fails | Wrong column names | Column names are case-sensitive |
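Case mismatches in column names are easy to catch before a record is sent. The helper below is a hypothetical local check, not part of the Airtable API: it compares the field names you intend to write against the column names you expect in the Scenes table and reports any that differ only by case.

```python
def find_case_mismatches(record_fields: dict, table_columns: list) -> dict:
    """Map each field name that matches a column only case-insensitively to the exact column name."""
    by_lower = {column.lower(): column for column in table_columns}
    mismatches = {}
    for name in record_fields:
        if name not in table_columns and name.lower() in by_lower:
            mismatches[name] = by_lower[name.lower()]
    return mismatches


# "Start_Image" differs from the real column "start_image" only by case.
print(find_case_mismatches(
    {"Start_Image": "https://example.com/img.png", "scene_video": "https://example.com/v.mp4"},
    ["start_image", "scene_video"],
))
```

Field names with no counterpart at all are not reported here; those point to a missing column rather than a casing problem.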
Built by Copyweb — architecting how marketing teams operate in an AI-first world.