Minimal Kokoro TTS API with CUDA support for NVIDIA GPUs (Ada Lovelace and newer).
Source: https://github.com/cipherdolls/kokoro-cuda
- Single FastAPI endpoint for text-to-speech
- Streaming audio (WAV/PCM) and encoded formats (MP3, Opus, FLAC)
- 49 built-in voices (American/British English, Spanish, French, Hindi, Italian, Japanese, Portuguese, Chinese)
- Configurable voice, speed, and bitrate
- Web UI at `/` — type text, pick a voice, and hear audio instantly
- Swagger UI at `/docs`
- Automatic model + voice download on first start (persisted via Docker volumes)
Build and run with Docker:

```bash
docker build -t kokoro-cuda .
docker run --gpus all -p 8880:8880 \
  -v kokoro-models:/app/models \
  -v kokoro-voices:/app/voices \
  kokoro-cuda
```

Example request body for `POST /v1/audio/speech`:

```json
{
  "input": "Hello, this is a test.",
  "voice": "af_heart",
  "speed": 1.0,
  "response_format": "wav",
  "bitrate": "192k"
}
```

| Parameter | Default | Options |
|---|---|---|
| input | (required) | Any text |
| voice | af_heart | 49 voices — see GET /v1/voices |
| speed | 1.0 | 0.5 - 2.0 |
| response_format | wav | wav, mp3, opus, flac, pcm |
| bitrate | 192k | 128k, 192k, 320k |
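The request body can also be built and validated programmatically before sending it. A minimal sketch in Python, mirroring the defaults and ranges in the table above (the `build_speech_request` helper is illustrative, not part of the API):

```python
def build_speech_request(
    input_text: str,
    voice: str = "af_heart",
    speed: float = 1.0,
    response_format: str = "wav",
    bitrate: str = "192k",
) -> dict:
    """Build a JSON-serializable body for POST /v1/audio/speech,
    enforcing the ranges documented in the parameter table."""
    if not input_text:
        raise ValueError("input is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if response_format not in {"wav", "mp3", "opus", "flac", "pcm"}:
        raise ValueError(f"unsupported response_format: {response_format}")
    if bitrate not in {"128k", "192k", "320k"}:
        raise ValueError(f"unsupported bitrate: {bitrate}")
    return {
        "input": input_text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
        "bitrate": bitrate,
    }
```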
`GET /v1/voices` returns the available voice packs. A status endpoint returns whether the model is loaded.
```bash
# WAV
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world"}' -o hello.wav

# MP3 at 320k
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "response_format": "mp3", "bitrate": "320k"}' -o hello.mp3
```

Open http://localhost:8880/ in your browser. Select a voice, adjust speed, type text, and click Speak to hear the audio immediately.
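The same endpoint can be called from Python. A sketch using only the standard library; the `synthesize` helper is illustrative, and the duration math assumes 24 kHz 16-bit mono PCM output (verify the sample rate against your build):

```python
import json
import urllib.request


def synthesize(text: str, fmt: str = "pcm",
               url: str = "http://localhost:8880/v1/audio/speech") -> bytes:
    """POST text to the speech endpoint and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"input": text, "response_format": fmt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def pcm_duration_seconds(n_bytes: int, sample_rate: int = 24000,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Playable duration of a raw PCM buffer (16-bit mono at 24 kHz by default)."""
    return n_bytes / (sample_rate * sample_width * channels)
```

Because `pcm` streams raw samples with no container, the duration calculation above is how you would convert received byte counts into seconds of audio.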
49 voice packs are downloaded automatically on first start. Naming convention: `{lang}{gender}_{name}`
| Prefix | Language | Voices |
|---|---|---|
| `af_` / `am_` | American English | 20 |
| `bf_` / `bm_` | British English | 8 |
| `ef_` / `em_` | Spanish | 3 |
| `ff_` | French | 1 |
| `hf_` / `hm_` | Hindi | 4 |
| `if_` / `im_` | Italian | 2 |
| `jf_` / `jm_` | Japanese | 5 |
| `pf_` / `pm_` | Portuguese | 3 |
| `zf_` | Chinese | 4 |
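The naming convention above can be decoded mechanically. A sketch in Python; the prefix map is transcribed from the table, the `f` = female / `m` = male reading is an assumption inferred from the convention, and `parse_voice` is an illustrative helper rather than part of the API:

```python
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Chinese",
}
GENDERS = {"f": "female", "m": "male"}  # assumed reading of the gender letter


def parse_voice(voice_id: str) -> dict:
    """Split a voice id like 'af_heart' into language, gender, and name."""
    prefix, _, name = voice_id.partition("_")
    if (len(prefix) != 2 or prefix[0] not in LANGUAGES
            or prefix[1] not in GENDERS or not name):
        raise ValueError(f"unrecognized voice id: {voice_id}")
    return {
        "language": LANGUAGES[prefix[0]],
        "gender": GENDERS[prefix[1]],
        "name": name,
    }
```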
Run the full benchmark suite (Kokoro + Whisper validation):
```bash
cd benchmark
docker compose up --build --abort-on-container-exit --exit-code-from benchmark
```

Results are saved to `benchmark/output/<GPU_NAME>/report.md`. See latest results.
- `main.py` — FastAPI app, model loading, streaming TTS endpoint
- `download_model.py` — downloads model + voice pack on first start
- `entrypoint.sh` — runs download then starts uvicorn
- `Dockerfile` — CUDA 12.8 runtime image (Ada Lovelace + Blackwell)
- `benchmark/` — benchmark suite with Whisper validation
- NVIDIA GPU with CUDA 12.8+ (RTX 4090 / RTX 5090 and similar)
- Docker with NVIDIA Container Toolkit