
Add AI video generation using Wan2.1/2.2 models#11

Open
Asad-Ismail wants to merge 4 commits into main from add-ai-video-generation

Conversation

@Asad-Ismail
Owner

Summary

Replaces stock footage dependency with AI-generated video clips. Instead of searching Pexels/Pixabay for generic clips, this generates custom video that actually matches the script content.

Uses Wan2.1/2.2 text-to-video models via HuggingFace diffusers. Both are Apache 2.0 licensed.

Two model options:

Model         Resolution     VRAM   GPU         Gen time/clip
Wan2.1 1.3B   480p / 15fps   8GB    RTX 3060+   ~4 min
Wan2.2 5B     720p / 24fps   24GB   RTX 4090    ~9 min
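The table above maps naturally onto a small model registry keyed by the UI's model choice. A minimal sketch, assuming dictionary-based settings; the HuggingFace repo IDs follow the Wan-AI naming convention but are assumptions here, not taken from the PR diff:

```python
# Hypothetical registry mapping the selected model to generation settings.
# Repo IDs are assumed names in the Wan-AI HuggingFace style, not from the PR.
VIDEO_GEN_MODELS = {
    "wan2.1-1.3b": {
        "repo_id": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # assumption
        "resolution": (832, 480),   # 480p
        "fps": 15,
        "min_vram_gb": 8,
    },
    "wan2.2-5b": {
        "repo_id": "Wan-AI/Wan2.2-TI2V-5B-Diffusers",   # assumption
        "resolution": (1280, 704),  # ~720p
        "fps": 24,
        "min_vram_gb": 24,
    },
}

def resolve_model(name: str) -> dict:
    """Return generation settings for a model key, failing loudly on typos."""
    try:
        return VIDEO_GEN_MODELS[name]
    except KeyError:
        raise ValueError(f"unknown video_gen_model: {name!r}") from None
```

Failing loudly on an unknown key keeps a misspelled `video_gen_model` config value from silently falling back to a default the user did not choose.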

How it works

  1. Script gets split into sentence-level prompts
  2. Each prompt generates a 5-second video clip
  3. Clips are saved to disk as .mp4 files
  4. Existing combine_videos pipeline handles the rest (no changes needed downstream)
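Step 1 of the pipeline above can be sketched as plain sentence splitting; function and parameter names here are illustrative, not the actual code in app/services/video_gen.py:

```python
import re

def script_to_prompts(script: str, max_prompts: int = 10) -> list[str]:
    """Split a narration script into sentence-level prompts (step 1).

    Each sentence becomes one text-to-video prompt; in the real service a
    prompt-enhancement step could append style hints before generation.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    return sentences[:max_prompts]
```

Capping the prompt count bounds total generation time, since each clip costs minutes of GPU work.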

Falls back gracefully: if there is no CUDA GPU or diffusers is not installed, the UI warns the user. Existing Pexels/Pixabay sources work exactly as before.

Changes

  • New app/services/video_gen.py — generation service with model loading, prompt enhancement, clip generation
  • Updated app/services/task.py — ai_generated branch in get_video_materials()
  • Updated webui/Main.py — new video source option + model selector
  • Updated config.example.toml — video_gen_model setting
  • Updated requirements.txt — diffusers + accelerate
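The new setting might look like this in config.example.toml; the key name comes from the PR, while the section name and default value shown here are assumptions:

```toml
[app]
# Which Wan model to use when the video source is "ai_generated".
# Options: "wan2.1-1.3b" (8GB VRAM) or "wan2.2-5b" (24GB VRAM).
video_gen_model = "wan2.1-1.3b"
```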

Test plan

  • Without GPU: select AI Generated, verify warning shows in UI
  • With CUDA GPU: pip install diffusers accelerate
  • Select "AI Generated (Wan2.1/2.2)" as video source
  • Choose Wan2.1 1.3B model
  • Generate a short video, verify clips are script-relevant
  • Verify Pexels/Pixabay still work as before

Commits

Two model options:
- wan2.1-1.3b: 480p/15fps, 8GB VRAM (RTX 3060+)
- wan2.2-5b: 720p/24fps, 24GB VRAM (RTX 4090)

Both Apache 2.0, full HuggingFace diffusers integration. Generates per-segment clips from the script text instead of searching stock footage sites.

- Add ai_generated branch in get_video_materials()
- Skip search term generation for AI source (uses script directly)
- Pass video_script to get_video_materials for prompt splitting

- Add "AI Generated (Wan2.1/2.2)" to video source dropdown
- Show model selector (1.3B vs 5B) when AI source selected
- Warn if CUDA/diffusers not available

- Add video_gen_model config to config.example.toml

- Add diffusers + accelerate to requirements.txt (avoids merge conflict with other PRs that add deps at the end of the file)
