OpenVox

A self-hosted developer podcast studio powered by Pocket-TTS. Turn technical documentation, markdown files, and web pages into listenable podcast-like episodes — all running locally on CPU.

Built on top of an OpenAI-compatible TTS API server. Any OpenAI TTS client can still use this as a drop-in replacement while the studio UI provides a full workflow for long-form content.

Tested and working fully with WingmanAI by Shipbit. Due to low resource use, can be used for real time local text to speech even while playing intensive video games (even in VR!) with WingmanAI.

Screenshots

Desktop

Add Content	Library	Settings	Player

Mobile

Add Content	Library	Settings	Player

Features

OpenVox (Web UI at /)

Import content from files (.md, .txt), URLs, pasted text, or git repos
Automatic text cleaning and normalization for TTS
Configurable chunking strategies (paragraph, sentence, heading, max chars)
Background audio generation with progress tracking
Full-screen player with karaoke-style word-by-word subtitles
Chunked progress bar with tap-to-jump chapter segments
Playback speed control (0.5x–3x), sleep timer, skip forward/back
Persistent library with folders, tags, and playback position
Mini-player bar with tap-to-expand fullscreen
Dark theme, keyboard shortcuts, mobile-first responsive design

OpenAI-Compatible API

POST /v1/audio/speech — generate speech (streaming + file)
GET /v1/voices — list available voices
GET /health — health check for containers
Works with any OpenAI TTS client

Infrastructure

CPU optimized — no GPU required
Docker ready with persistent data volumes
150+ community voices included
Voice cloning from short audio samples

Quick Start

Docker (Recommended)

git clone https://github.com/theepicsaxguy/OpenVox.git
cd OpenVox

docker compose up -d

Open http://localhost:49112 — the OpenVox UI loads at /.

Data (library database, uploaded sources, generated audio) persists in ./data/.

Python (from source)

git clone https://github.com/theepicsaxguy/OpenVox.git
cd OpenVox

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt
python server.py

Studio Workflow

Add content — upload a file, paste a URL, clone a git repo, or type/paste text
Preview — see raw vs cleaned text side-by-side
Chunk — choose strategy and preview chunk boundaries
Generate — select voice and format, queue for background generation
Listen — full-screen player with karaoke subtitles and chunked scrubber
Organize — folders, tags, playback progress in the library

API Usage

The OpenAI-compatible API works exactly as before:

curl http://localhost:49112/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Hello world!", "voice": "alba"}' \
  --output speech.mp3

from openai import OpenAI

client = OpenAI(base_url="http://localhost:49112/v1", api_key="not-needed")

response = client.audio.speech.create(
    model="tts-1",
    voice="alba",
    input="Hello world!"
)
response.stream_to_file("output.mp3")

API Reference

Endpoint	Method	Description
`/`	GET	OpenVox web UI
`/health`	GET	Health check
`/v1/voices`	GET	List voices
`/v1/audio/speech`	POST	Generate speech (OpenAI-compatible)
`/api/studio/*`	Various	Studio API (sources, episodes, library, playback, settings)

Speech Parameters

Parameter	Type	Required	Default	Description
`model`	string	No	-	Ignored (for OpenAI compatibility)
`input`	string	Yes	-	Text to synthesize
`voice`	string	No	`alba`	Voice ID (see `/v1/voices`)
`response_format`	string	No	`mp3`	`mp3`, `wav`, `pcm`, `opus`, `flac`
`stream`	boolean	No	`false`	Enable streaming response

Custom Voices

Place audio files (.wav, .mp3, .flac) in a voices directory

Configure the server:

# Docker
POCKET_TTS_VOICES_DIR=/path/to/voices docker compose up -d

# Python
python server.py --voices-dir /path/to/voices

Use by filename: {"voice": "my_voice.wav", "input": "Hello!"}

Built-in voices: alba, marius, javert, jean, fantine, cosette, eponine, azelma

The voices/ directory includes 150+ community-contributed voices.

Voice File Guidelines

Duration: 3-10 seconds of clear speech
Quality: Clean audio without background noise
Format: WAV, MP3, or FLAC

Configuration

Environment Variables

Variable	Default	Description
`POCKET_TTS_HOST`	`0.0.0.0`	Server bind address
`POCKET_TTS_PORT`	`49112`	Server port
`POCKET_TTS_VOICES_DIR`	`./voices`	Custom voices directory
`POCKET_TTS_DATA_DIR`	`./data`	Studio data directory (DB, sources, audio)
`POCKET_TTS_MODEL_PATH`	-	Custom model path
`POCKET_TTS_STREAM_DEFAULT`	`true`	Enable streaming by default
`POCKET_TTS_LOG_LEVEL`	`INFO`	Log level: DEBUG, INFO, WARNING, ERROR
`POCKET_TTS_LOG_DIR`	`./logs`	Log files directory
`HF_TOKEN`	-	Hugging Face token (for voice cloning)

Data Directory

The studio stores all persistent data in POCKET_TTS_DATA_DIR (default: ./data/):

data/
  podcast_studio.db     # SQLite database (library, settings, playback state)
  sources/              # Uploaded source files
  audio/                # Generated audio (per-episode subdirectories)

Back up by copying this single directory.

Project Structure

OpenVox/
├── app/
│   ├── __init__.py              # Flask app factory
│   ├── config.py                # Configuration from environment
│   ├── logging_config.py        # Logging with file rotation
│   ├── routes.py                # OpenAI-compatible API endpoints
│   ├── services/
│   │   ├── audio.py             # Audio format conversion, streaming
│   │   └── tts.py               # TTS model service, voice cache
│   └── studio/                  # Podcast Studio module
│       ├── __init__.py          # Blueprint registration
│       ├── db.py                # SQLite schema and connections
│       ├── repositories.py      # DB query abstraction layer
│       ├── schemas.py           # Marshmallow schemas for request bodies
│       ├── sources_routes.py    # Source CRUD endpoints
│       ├── episodes_routes.py   # Episode CRUD endpoints
│       ├── folders_routes.py    # Folder CRUD endpoints
│       ├── tags_routes.py       # Tag management endpoints
│       ├── playback_routes.py   # Playback state endpoints
│       ├── settings_routes.py   # Settings endpoints
│       ├── library_routes.py    # Library tree, generation status
│       ├── ingestion.py         # File/URL/text import
│       ├── git_ingestion.py     # Git repository import
│       ├── normalizer.py        # Text cleaning for TTS
│       ├── chunking.py          # Text chunking strategies
│       ├── breathing.py         # Natural pause insertion
│       ├── generation.py        # Background generation queue
│       └── audio_assembly.py    # Chunk merging for export
├── static/
│   ├── css/
│   │   ├── studio.css           # Studio styles (dark theme)
│   │   └── style.css            # Base styles
│   └── js/studio/               # Vanilla JS ES modules
│       ├── main.js              # Entry point, router, toast/confirm
│       ├── utils.js             # Shared helpers (escapeHtml, formatTime, etc.)
│       ├── dom.js               # Safe DOM manipulation helpers
│       ├── state.js             # Pub/sub state management
│       ├── library.js           # Library view, folder tree
│       ├── editor.js            # Import/source/episode views
│       ├── settings.js          # Settings panel
│       ├── player.js            # Player coordinator
│       ├── player-state.js      # Player state management
│       ├── player-controls.js   # Play/pause, seek, volume
│       ├── player-queue.js      # Queue management, shuffle, repeat
│       ├── player-waveform.js   # Waveform visualization
│       ├── player-render.js     # Mini/full player, karaoke subtitles
│       ├── player-chunk.js      # Chunk loading and playback
│       ├── api.js               # Fetch-based API client (import from here)
│       ├── api.ts               # TypeScript API wrapper (build-step only)
│       └── client.ts            # Generated TypeScript client (DO NOT EDIT)
├── scripts/
│   ├── generate_openapi.py      # Generates OpenAPI spec from Flask routes
│   └── validate_openapi.py      # Validates OpenAPI spec completeness
├── e2e/
│   └── studio.spec.js           # Playwright E2E tests
├── templates/studio.html        # Studio UI shell
├── data/                        # Persistent data (Docker volume)
├── voices/                      # Voice files (150+ included)
├── server.py                    # Entry point, CLI, Waitress server
├── Dockerfile
├── docker-compose.yml
├── docker-entrypoint.sh         # Container entrypoint
├── openapi.yaml                 # Generated OpenAPI spec
├── orval.config.mjs             # Orval client generation config
├── pyproject.toml               # Python project config, ruff settings
├── requirements.txt             # Python dependencies
└── package.json                 # Node.js dependencies and scripts

Development

pip install -r requirements.txt
python server.py --log-level DEBUG

API Client Generation

After modifying any backend route, regenerate the TypeScript API client:

pnpm install
pnpm run client:generate

This runs scripts/generate_openapi.py to produce openapi.yaml, then Orval generates static/js/studio/client.ts. CI validates that generated files are up to date.

Linting

pip install ruff
ruff check .
ruff format .

# JavaScript
pnpm run lint

Troubleshooting

Model Loading Takes Long

First run downloads the model (~500MB). Subsequent runs use the cached model. Docker persists the cache in a named volume.

Voice Cloning Requires HF Token

Get a token from https://huggingface.co/settings/tokens and set HF_TOKEN.

Port Already in Use

python server.py --port 8080
# or
POCKET_TTS_PORT=8080 docker compose up -d

Credits

Pocket-TTS by Kyutai Labs
pocket-tts-openai_streaming_server by teddybear082 (original project this builds upon)
Community voice contributors (see voices/credits.txt)

License

This project is licensed under the MIT License - see LICENSE for details.

Pocket-TTS is subject to its own license terms.

Name		Name	Last commit message	Last commit date
Latest commit History 116 Commits
.github/workflows		.github/workflows
.opencode/skills/frontend-design		.opencode/skills/frontend-design
.vscode		.vscode
app		app
data/sources		data/sources
docs/screenshots		docs/screenshots
e2e		e2e
logs		logs
scripts		scripts
static		static
templates		templates
voices		voices
.dockerignore		.dockerignore
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
docker-compose.yml		docker-compose.yml
docker-entrypoint.sh		docker-entrypoint.sh
eslint.config.js		eslint.config.js
openapi.yaml		openapi.yaml
orval.config.mjs		orval.config.mjs
package.json		package.json
playwright.config.js		playwright.config.js
pocket-tts-logo.ico		pocket-tts-logo.ico
pyproject.toml		pyproject.toml
renovate.json		renovate.json
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
server.py		server.py
tsconfig.json		tsconfig.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenVox

Screenshots

Desktop

Mobile

Features

Quick Start

Docker (Recommended)

Python (from source)

Studio Workflow

API Usage

API Reference

Speech Parameters

Custom Voices

Voice File Guidelines

Configuration

Environment Variables

Data Directory

Project Structure

Development

API Client Generation

Linting

Troubleshooting

Model Loading Takes Long

Voice Cloning Requires HF Token

Port Already in Use

Credits

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenVox

Screenshots

Desktop

Mobile

Features

Quick Start

Docker (Recommended)

Python (from source)

Studio Workflow

API Usage

API Reference

Speech Parameters

Custom Voices

Voice File Guidelines

Configuration

Environment Variables

Data Directory

Project Structure

Development

API Client Generation

Linting

Troubleshooting

Model Loading Takes Long

Voice Cloning Requires HF Token

Port Already in Use

Credits

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages