Minimal Kokoro TTS API with CUDA support for NVIDIA GPUs (Ada Lovelace and newer).
Source: https://github.com/cipherdolls/kokoro-cuda
- Single FastAPI endpoint for text-to-speech
- Streaming audio (WAV/PCM) and encoded formats (MP3, Opus, FLAC)
- 49 built-in voices (American/British English, Spanish, French, Hindi, Italian, Japanese, Portuguese, Chinese)
- Configurable voice, speed, and bitrate
- Web UI at `/` — type text, pick a voice, and hear audio instantly
- Swagger UI at `/docs`
- Automatic model + voice download on first start (persisted via Docker volumes)
Build and run with Docker:

```bash
docker build -t kokoro-cuda .
docker run --gpus all -p 8880:8880 \
  -v kokoro-models:/app/models \
  -v kokoro-voices:/app/voices \
  kokoro-cuda
```

Example request body for `POST /v1/audio/speech`:

```json
{
  "input": "Hello, this is a test.",
  "voice": "af_heart",
  "speed": 1.0,
  "response_format": "wav",
  "bitrate": "192k"
}
```

| Parameter | Default | Options |
|---|---|---|
| input | (required) | Any text |
| voice | af_heart | 49 voices — see GET /v1/voices |
| speed | 1.0 | 0.5 - 2.0 |
| response_format | wav | wav, mp3, opus, flac, pcm |
| bitrate | 192k | 128k, 192k, 320k |
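The request body can also be built and validated programmatically before sending it. A minimal sketch in Python, mirroring the defaults and ranges in the table above (the `build_speech_request` helper is illustrative, not part of the API):

```python
def build_speech_request(
    input_text: str,
    voice: str = "af_heart",
    speed: float = 1.0,
    response_format: str = "wav",
    bitrate: str = "192k",
) -> dict:
    """Build a JSON-serializable body for POST /v1/audio/speech,
    enforcing the ranges documented in the parameter table."""
    if not input_text:
        raise ValueError("input is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if response_format not in {"wav", "mp3", "opus", "flac", "pcm"}:
        raise ValueError(f"unsupported response_format: {response_format}")
    if bitrate not in {"128k", "192k", "320k"}:
        raise ValueError(f"unsupported bitrate: {bitrate}")
    return {
        "input": input_text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
        "bitrate": bitrate,
    }
```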
`GET /v1/voices` returns the available voice packs. A status endpoint returns whether the model is loaded.
```bash
# WAV
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world"}' -o hello.wav

# MP3 at 320k
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "response_format": "mp3", "bitrate": "320k"}' -o hello.mp3
```

Open http://localhost:8880/ in your browser. Select a voice, adjust speed, type text, and click Speak to hear the audio immediately.
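The same endpoint can be called from Python. A sketch using only the standard library; the `synthesize` helper is illustrative, and the duration math assumes 24 kHz 16-bit mono PCM output (verify the sample rate against your build):

```python
import json
import urllib.request


def synthesize(text: str, fmt: str = "pcm",
               url: str = "http://localhost:8880/v1/audio/speech") -> bytes:
    """POST text to the speech endpoint and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"input": text, "response_format": fmt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def pcm_duration_seconds(n_bytes: int, sample_rate: int = 24000,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Playable duration of a raw PCM buffer (16-bit mono at 24 kHz by default)."""
    return n_bytes / (sample_rate * sample_width * channels)
```

Because `pcm` streams raw samples with no container, the duration calculation above is how you would convert received byte counts into seconds of audio.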
49 voice packs are downloaded automatically on first start. Naming convention: `{lang}{gender}_{name}`
| Prefix | Language | Voices |
|---|---|---|
| `af_` / `am_` | American English | 20 |
| `bf_` / `bm_` | British English | 8 |
| `ef_` / `em_` | Spanish | 3 |
| `ff_` | French | 1 |
| `hf_` / `hm_` | Hindi | 4 |
| `if_` / `im_` | Italian | 2 |
| `jf_` / `jm_` | Japanese | 5 |
| `pf_` / `pm_` | Portuguese | 3 |
| `zf_` | Chinese | 4 |
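The naming convention above can be decoded mechanically. A sketch in Python; the prefix map is transcribed from the table, the `f` = female / `m` = male reading is an assumption inferred from the convention, and `parse_voice` is an illustrative helper rather than part of the API:

```python
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Chinese",
}
GENDERS = {"f": "female", "m": "male"}  # assumed reading of the gender letter


def parse_voice(voice_id: str) -> dict:
    """Split a voice id like 'af_heart' into language, gender, and name."""
    prefix, _, name = voice_id.partition("_")
    if (len(prefix) != 2 or prefix[0] not in LANGUAGES
            or prefix[1] not in GENDERS or not name):
        raise ValueError(f"unrecognized voice id: {voice_id}")
    return {
        "language": LANGUAGES[prefix[0]],
        "gender": GENDERS[prefix[1]],
        "name": name,
    }
```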
Run the full benchmark suite (Kokoro + Whisper validation):
```bash
cd benchmark
docker compose up --build --abort-on-container-exit --exit-code-from benchmark
```

Results are saved to `benchmark/output/<GPU_NAME>/report.md`. See latest results.
- `main.py` — FastAPI app, model loading, streaming TTS endpoint
- `download_model.py` — downloads model + voice pack on first start
- `entrypoint.sh` — runs download then starts uvicorn
- `Dockerfile` — CUDA 12.8 runtime image (Ada Lovelace + Blackwell)
- `benchmark/` — benchmark suite with Whisper validation
- NVIDIA GPU with CUDA 12.8+ (RTX 4090 / RTX 5090 and similar)
- Docker with NVIDIA Container Toolkit