Generate cinematic AI videos from text, images, or audio directly from your OpenClaw bot. Powered by LTX-2 by Lightricks.
Tell your bot to make a video. It handles the rest — prompt enhancement, API calls, file saving.
You: "Make me a cinematic video of a samurai walking through rain in Tokyo at night"
Bot: *enhances your prompt, generates video, saves to Desktop*
Supports:
- Text-to-Video — Describe a scene, get a video
- Image-to-Video — Animate a still image
- Audio-to-Video — Generate video synced to audio (lip sync, music videos)
- Camera presets — dolly, pan, crane, handheld, orbit, tracking
- AI audio — synchronized sound effects, dialog, ambient
- Prompt enhancement — turns basic ideas into cinematic prompts
- Install the skill from https://github.com/babakarto/cineclaw:

  ```bash
  npx clawdhub@latest install cineclaw
  ```

  or copy it into your OpenClaw workspace manually:

  ```bash
  cp -r cineclaw ~/.openclaw/workspace/skills/cineclaw
  ```
- Get an API key at console.ltx.video
- Set it as an environment variable:
  ```bash
  echo 'export LTX_API_KEY="your_key_here"' >> ~/.bashrc
  source ~/.bashrc
  ```

- Done. Start generating.
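If the bot later reports an authentication error, the usual cause is that `LTX_API_KEY` is not visible to the shell that launched it. A minimal sanity check, not part of the skill itself:

```python
import os

# Quick probe: the skill expects LTX_API_KEY in the environment of the process
# that runs your bot. This snippet is a standalone check, not part of cineclaw.
key = os.environ.get("LTX_API_KEY")
if key:
    print(f"LTX_API_KEY is set ({len(key)} characters).")
else:
    raise SystemExit("LTX_API_KEY is missing; add it to ~/.bashrc and re-source it.")
```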
"Generate a quick video of ocean waves at sunset"
"Make a 10-second cinematic video of a cyberpunk street"
"Create a pro quality 4K video of a coffee being poured in slow motion"
"Animate this photo with a slow camera dolly forward"
"Generate a video synced to this audio track"
For each request, the bot will:
- Enhance your prompt (adding camera, lighting, audio cues)
- Show you the enhanced prompt for approval
- Estimate the cost before generating
- Generate and save the video
- Ask if you want to iterate or upscale
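In code terms, the flow looks roughly like the sketch below. The helper names are hypothetical and are not the actual functions in `scripts/ltx_generate.py`; they only mirror the steps above.

```python
# Illustrative sketch of the request flow; names are hypothetical.
def enhance_prompt(idea: str) -> str:
    # In the skill, the bot performs this step using references/prompting-guide.md.
    return f"{idea}. Slow dolly forward. 35mm film, shallow depth of field, cinematic lighting."

def handle_request(idea: str, model: str = "ltx-2-fast", seconds: int = 6) -> None:
    enhanced = enhance_prompt(idea)
    print("Enhanced prompt (shown for approval):", enhanced)
    print(f"Estimated cost at 1080p: ~${0.02 * seconds:.2f}")  # see the pricing table below
    # On approval: generate via scripts/ltx_generate.py, save the file,
    # then offer to iterate or upscale.

handle_request("a samurai walking through rain in Tokyo at night")
```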
| Model | Speed | Quality | Best For |
|---|---|---|---|
| ltx-2-fast | ~5-15s | Good | Drafts, iteration, previews |
| ltx-2-pro | ~30-90s | Cinematic | Final output, client work, A2V |
Per second of generated video:
| Model | 1080p | 1440p | 4K |
|---|---|---|---|
| fast | ~$0.02 | ~$0.04 | ~$0.08 |
| pro | ~$0.05 | ~$0.10 | ~$0.20 |
A typical 6-second fast preview at 1080p costs about $0.12. The bot always estimates the cost before generating.
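Cost scales linearly with duration, so an estimate is just rate × seconds. A minimal sketch using the approximate rates from the table above:

```python
# Approximate USD per second of output, copied from the pricing table above.
RATES = {
    "ltx-2-fast": {"1080p": 0.02, "1440p": 0.04, "4K": 0.08},
    "ltx-2-pro":  {"1080p": 0.05, "1440p": 0.10, "4K": 0.20},
}

def estimate_cost(model: str, resolution: str, seconds: float) -> float:
    return RATES[model][resolution] * seconds

print(f"${estimate_cost('ltx-2-fast', '1080p', 6):.2f}")   # $0.12 -> the 6-second preview above
print(f"${estimate_cost('ltx-2-pro', '4K', 10):.2f}")      # $2.00 -> a 10-second 4K pro render
```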
```
cineclaw/
├── SKILL.md                           # Main skill (OpenClaw reads this)
├── references/
│   ├── ltx-api.md                     # API endpoints, params, errors
│   ├── prompting-guide.md             # How to write great video prompts
│   └── ltx2-prompt-guide-advanced.md  # Deep research from X/Twitter community
├── scripts/
│   └── ltx_generate.py                # Python script for all API calls
├── README.md                          # This file
└── LICENSE                            # MIT
```
Structure: [Scene]. [Subject]. [Action]. [Camera]. [Style].
Always start with the scene — prevents morphing and inconsistencies.
Key style keywords: 35mm film, halation, shallow depth of field, golden hour, cinematic lighting
I2V tip: Always add "Static camera. No camera movement." to prevent unwanted dolly/zoom.
A2V tip: Match audio emotion to prompt emotion. Isolate vocals for music videos.
See references/prompting-guide.md for the full guide with examples.
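To see how the structure composes, here is a small sketch that assembles a prompt from the five slots and applies the I2V tip. The helper is illustrative only; in the skill, the bot does this enhancement using the guides above.

```python
# Illustrative only: builds a prompt following the Scene. Subject. Action. Camera. Style. structure.
def build_prompt(scene: str, subject: str, action: str, camera: str, style: str,
                 image_to_video: bool = False) -> str:
    parts = [scene, subject, action, camera, style]
    if image_to_video:
        # I2V tip from above: lock the camera to avoid unwanted dolly/zoom.
        parts[3] = "Static camera. No camera movement"
    return " ".join(p.rstrip(".") + "." for p in parts)

print(build_prompt(
    scene="A rain-soaked Tokyo street at night, neon reflections on wet asphalt",
    subject="a lone samurai in a dark kimono",
    action="walks slowly toward the camera",
    camera="slow dolly forward",
    style="35mm film, halation, shallow depth of field, cinematic lighting",
))
```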
- Batch generation (multiple videos from a script)
- Video-to-video (style transfer, retakes)
- LoRA support for custom styles
- Storyboard mode (multi-shot sequences)
- Multi-provider support (Runway, Kling, Sora)
- Telegram inline preview (send video directly in chat)
PRs welcome. If you add support for another video API, keep the same interface pattern — the bot shouldn't need to learn a new workflow.
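One way to keep that interface pattern stable is a thin protocol that every backend implements. The sketch below is hypothetical, not the current code; the idea is that any provider satisfying the same shape can sit behind the existing workflow without the bot needing new instructions.

```python
from typing import Protocol

class VideoProvider(Protocol):
    """Hypothetical interface sketch: each backend (LTX-2, Runway, Kling, ...)
    would expose the same operations so the bot's workflow never changes."""

    def enhance(self, prompt: str) -> str: ...
    def estimate_cost(self, seconds: float, resolution: str) -> float: ...
    def generate(self, prompt: str, seconds: float, resolution: str, out_path: str) -> str: ...
```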
MIT
- LTX-2 by Lightricks — the video generation model
- OpenClaw — the AI agent platform
- Prompting guide based on official LTX docs, community research (700+ tweets analyzed), and walterlow/ltx-2-prompt