AI-powered anime dubbing pipeline — automated voice cloning and translation using existing subtitles, open-source models, and GPU acceleration.
Tdarr-style library automation for Plex/media servers.
- Scans your media library for anime with Japanese audio but no EN/DE audio track
- Uses existing subtitle files (.srt/.ass) as the translation source — no machine translation needed
- Separates vocals from music/effects in the Japanese audio
- Identifies speakers per subtitle line using diarization + Whisper cross-reference
- Clones each character's voice and synthesizes translated speech
- Remixes the new vocal track with original music/effects
- Muxes the new audio track into the media file
- Notifies Plex to refresh
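Because the translation comes from existing subtitle files rather than machine translation, the text side of the pipeline starts with plain subtitle parsing. A minimal sketch for `.srt` input (illustrative only; the real pipeline would also need to handle `.ass`):

```python
import re

def parse_srt(text: str) -> list[tuple[str, str, str]]:
    """Parse .srt content into (start, end, dialogue) tuples.

    Each tuple later gets a speaker label from diarization and is fed
    to TTS as the translated line for that time span.
    """
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3 or " --> " not in lines[1]:
            continue  # skip malformed cue blocks
        start, _, end = lines[1].partition(" --> ")
        cues.append((start.strip(), end.strip(), " ".join(lines[2:])))
    return cues
```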
| Component | Model | License |
|---|---|---|
| Source Separation | Demucs / HTDemucs (Meta) | MIT |
| Speaker Diarization | pyannote 3.1 | MIT |
| Speech-to-Text | Whisper large-v3-turbo (OpenAI) | MIT |
| Voice Clone + TTS | CosyVoice3 0.5B (Alibaba) | Apache 2.0 |
| Audio Processing | FFmpeg | LGPL/GPL |
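The remix and mux steps lean on FFmpeg. One way to sketch the mux step is to build a command that copies every existing stream and appends the synthesized track; the function name and stream layout here are assumptions (one original audio stream, so the dub lands at index `a:1`):

```python
def ffmpeg_mux_cmd(video: str, dub_audio: str, out: str,
                   lang: str = "eng") -> list[str]:
    """Build an FFmpeg command that appends the dub as a second audio track.

    Assumes the source has exactly one audio stream; with more streams the
    output audio index (a:1) would need to be computed from a probe.
    """
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", dub_audio,
        "-map", "0",            # keep every stream from the source file
        "-map", "1:a:0",        # append the synthesized dub track
        "-c", "copy",           # no re-encode of existing streams
        "-c:a:1", "aac",        # encode only the new track
        "-metadata:s:a:1", f"language={lang}",
        out,
    ]
```

Usage would be along the lines of `subprocess.run(ffmpeg_mux_cmd("ep01.mkv", "dub.flac", "ep01.dubbed.mkv"), check=True)`.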
All models are open-source and run locally on your GPU.
- Minimum: 8 GB VRAM (sequential model loading)
- Recommended: 16+ GB VRAM (parallel stages possible)
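The 8 GB floor works because the stages can run strictly one model at a time, with VRAM freed between them. A hedged sketch of that load/run/unload pattern (names are illustrative; the `free_vram` hook would be e.g. `torch.cuda.empty_cache` in a PyTorch setup):

```python
import gc

def run_stage(load_model, run, inputs, free_vram=None):
    """Load one model, run its stage, then drop it before the next load."""
    model = load_model()
    try:
        return run(model, inputs)
    finally:
        del model          # drop the only reference to the weights
        gc.collect()       # collect now so the memory is reclaimable
        if free_vram is not None:
            free_vram()    # e.g. torch.cuda.empty_cache()
```

On a 16 GB card, independent stages (say, source separation and diarization) could instead stay resident side by side.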
See docs/ROADMAP.md for full technical specification.
```shell
docker compose up -d
```

Then open http://localhost:29100 for the Web GUI.
See config.example.yaml for all configuration options.
🚧 Early development — Pipeline architecture defined, implementation in progress.
MIT