
Add Kokoro TTS as a voice provider #7

Open
Asad-Ismail wants to merge 2 commits into main from add-kokoro-tts

@Asad-Ismail
Owner

Summary

  • Adds Kokoro TTS (82M params, Apache 2.0 license) as a new voice provider
  • Near-ElevenLabs quality, runs on CPU, supports 8 languages and 30+ voices
  • Simple pip install: pip install kokoro soundfile (+ espeak-ng system dep)
  • Word-level timestamp estimation for subtitle sync
  • Available in the TTS server dropdown alongside Edge TTS, Chatterbox, etc.

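The summary mentions word-level timestamp estimation for subtitle sync, but the PR text does not say how kokoro_tts() computes it. The following is a minimal sketch assuming one common heuristic: spread the clip's total duration across the words in proportion to their character length. The name `estimate_word_timestamps` is hypothetical and this is not the PR's implementation.

```python
from typing import List, Tuple


def estimate_word_timestamps(text: str, duration_s: float) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: split `duration_s` seconds across the words of
    `text`, weighting each word by its character count.

    Returns (word, start_s, end_s) tuples that cover [0, duration_s] with
    no gaps between consecutive words."""
    words = text.split()
    if not words or duration_s <= 0:
        return []
    total_chars = sum(len(w) for w in words)
    timestamps = []
    cursor = 0.0
    for i, word in enumerate(words):
        share = duration_s * len(word) / total_chars
        # Pin the last word's end to the exact clip length to avoid float drift.
        end = duration_s if i == len(words) - 1 else cursor + share
        timestamps.append((word, cursor, end))
        cursor = end
    return timestamps


# Usage: a 2-second clip containing three words.
for word, start, end in estimate_word_timestamps("Hello Kokoro world", 2.0):
    print(f"{word:<8} {start:.3f}-{end:.3f}s")
```

Character-length weighting is only an approximation: it ignores pauses at punctuation and differences in speaking rate between words, which is why it is an estimate rather than a forced alignment.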
Voices included

American English (6F/2M), British English (2F/2M), Japanese (2F/1M), Mandarin Chinese (3F/2M), French, Hindi, Italian, Brazilian Portuguese, Spanish (F/M = number of female/male voices)
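Kokoro voice IDs such as af_heart and jf_alpha (both used in the test plan below) follow a naming pattern: the first letter is the language and the second is the voice gender (f/m). Assuming that convention, a helper like the hypothetical `describe_voice` could turn the IDs returned by get_kokoro_voices() into readable dropdown labels. The letter-to-language table is an assumption based on Kokoro's published language codes and should be checked against the installed kokoro version.

```python
from typing import Tuple

# Assumed mapping from the first letter of a Kokoro voice ID to its language.
KOKORO_LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split an ID like 'af_heart' into
    (language, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2:
        raise ValueError(f"unexpected Kokoro voice id: {voice_id!r}")
    lang_letter, gender_letter = prefix[0], prefix[1]
    try:
        return KOKORO_LANGUAGES[lang_letter], GENDERS[gender_letter], name
    except KeyError as exc:
        raise ValueError(f"unknown prefix in voice id: {voice_id!r}") from exc


def dropdown_label(voice_id: str) -> str:
    """Format a human-readable label for a UI dropdown entry."""
    language, gender, name = describe_voice(voice_id)
    return f"{name.capitalize()} ({language}, {gender})"


print(dropdown_label("af_heart"))  # Heart (American English, Female)
print(dropdown_label("jf_alpha"))  # Alpha (Japanese, Female)
```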

Test plan

  • Install deps: pip install kokoro soundfile and brew install espeak (macOS) or apt install espeak-ng (Linux)
  • Select "Kokoro TTS" in the TTS server dropdown
  • Generate a video with an English voice (e.g. af_heart)
  • Verify audio quality and subtitle timing
  • Test with a non-English voice (e.g. Japanese jf_alpha)
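To check audio quality outside the app, a standalone script modeled on the quick-start example in Kokoro's README may help. It assumes kokoro and soundfile are installed (with espeak-ng on the system), that `KPipeline(lang_code=...)` yields `(graphemes, phonemes, audio)` chunks as in that example, and that output is 24 kHz. `synthesize` and `lang_code_for` are hypothetical names; this is not the PR's kokoro_tts().

```python
def lang_code_for(voice: str) -> str:
    """Kokoro pipelines take a one-letter language code, which (by Kokoro's
    voice naming convention) is the first letter of the voice ID."""
    return voice[0]


def synthesize(text: str, voice: str = "af_heart", out_path: str = "kokoro_test.wav") -> float:
    """Hypothetical helper: render `text` with a Kokoro voice to a WAV file
    and return the clip length in seconds."""
    # Imported lazily so this module loads even without the audio stack installed.
    import numpy as np
    import soundfile as sf
    from kokoro import KPipeline

    sample_rate = 24000  # Kokoro's output sample rate (assumed, per its README)
    pipeline = KPipeline(lang_code=lang_code_for(voice))
    # Each yielded chunk unpacks to (graphemes, phonemes, audio); keep the audio.
    chunks = [
        np.asarray(audio, dtype=np.float32)
        for _, _, audio in pipeline(text, voice=voice)
        if audio is not None
    ]
    if not chunks:
        raise RuntimeError("Kokoro returned no audio")
    audio = np.concatenate(chunks)
    sf.write(out_path, audio, sample_rate)
    return len(audio) / sample_rate
```

For example, `synthesize("Hello from Kokoro.", voice="af_heart")` writes kokoro_test.wav and returns its duration. Non-English voices such as jf_alpha may need extra misaki language packages (Kokoro's README lists `misaki[ja]` for Japanese and `misaki[zh]` for Mandarin).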

Kokoro is an 82M-parameter open-source TTS model (Apache 2.0) that
consistently ranks near ElevenLabs in quality benchmarks. It runs on
CPU, supports 8 languages and 30+ voices, and is pip-installable.

- Add kokoro_tts() with word-level timestamp estimation
- Add get_kokoro_voices() with all available voices
- Add Kokoro to TTS server dropdown in webui
- Add kokoro + soundfile to requirements.txt

soundfile (libsndfile) can't be relied on to encode MP3 (support
depends on the installed libsndfile version). Save as .wav and set
_actual_audio_file so task.py picks up the right path, the same
pattern as Chatterbox TTS.