# TTS Proxy

A multi-service Text-to-Speech proxy with intelligent voice routing, supporting multiple TTS engines and providing a unified REST API for speech synthesis.
## Features
- Multi-Service Architecture: Intelligent routing between different TTS engines
- Smart Voice Selection: Automatic service selection based on voice preferences
- High-Quality German Voices: Thorsten (male) and CSS10 (female) voices
- Docker Support: Complete containerized setup optimized for CPU
- RESTful API: Clean, simple API compatible with various frontends
- Authentication: API key-based security
- Health Monitoring: Built-in health check endpoints
- CPU Optimized: Efficient processing on standard hardware
## Prerequisites
- Docker and Docker Compose
- Multi-core CPU recommended for optimal performance
- Node.js 18+ (for local development)
## Quick Start

1. Clone the repository:

   ```bash
   git clone https://github.com/loonylabs-dev/tts-proxy.git
   cd tts-proxy
   ```

2. Set up environment:

   ```bash
   cp .env.example .env
   # Edit .env with your API key
   ```

3. Start the services:

   ```bash
   docker compose up -d
   ```

4. Test the service:

   ```bash
   curl -X POST "http://localhost:3000/api/tts" \
     -H "x-api-key: your_api_key_here" \
     -H "Content-Type: application/json" \
     -d '{"text": "Hello World", "voice": "thorsten_male"}' \
     --output test.wav
   ```
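The same request can be made programmatically. The following is a minimal sketch for Node.js 18+ (which ships a global `fetch`); the `buildTtsRequest` and `synthesize` helper names and the `TTS_PROXY_URL` environment variable are illustrative, not part of the proxy itself:

```typescript
// Minimal Node 18+ client sketch for the proxy's POST /api/tts endpoint.
// Helper names and TTS_PROXY_URL are assumptions; adjust for your setup.
import { writeFile } from "node:fs/promises";

const BASE_URL = process.env.TTS_PROXY_URL ?? "http://localhost:3000";

// Build the URL and fetch options for a synthesis request.
export function buildTtsRequest(text: string, voice: string) {
  return {
    url: `${BASE_URL}/api/tts`,
    init: {
      method: "POST",
      headers: {
        "x-api-key": process.env.API_KEY ?? "your_api_key_here",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text, voice }),
    },
  };
}

// Perform the request and write the WAV response to disk.
export async function synthesize(text: string, voice: string, outFile: string) {
  const { url, init } = buildTtsRequest(text, voice);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  await writeFile(outFile, Buffer.from(await res.arrayBuffer()));
}
```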
## Local Development
1. Install dependencies:

   ```bash
   cd proxy
   npm install
   ```

2. Set up environment:

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

3. Start TTS services (Docker):

   ```bash
   docker compose up -d thorsten css10
   ```

4. Start the proxy:

   ```bash
   npm run dev                   # Development mode
   # or
   npm run build && npm start    # Production mode
   ```
## API Reference

The proxy provides a RESTful API for text-to-speech conversion with intelligent voice routing.

### POST /api/tts

Generate speech from text with automatic service selection.
Request Parameters:
{
"text": "Text to be spoken",
"voice": "voice_id", // See voice options below
"language": "de|en", // Optional, auto-detected
"speed": 1.0, // Optional, default 1.0
"pitch": 1.0 // Optional, default 1.0
}Voice Selection Options:
- **Structured Voice IDs (Recommended):**
  - `"thorsten_male"` - Thorsten German male voice
  - `"css10_female"` - CSS10 German female voice
- **Gender Keywords:**
  - `"male"` / `"männlich"` - Routes to Thorsten
  - `"female"` / `"weiblich"` - Routes to CSS10
- **Service Names:**
  - `"thorsten"` - Direct routing to the Thorsten service
  - `"css10"` - Direct routing to the CSS10 service
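The routing rules above can be sketched as a small lookup. This is a simplified model of the behavior, not the proxy's actual code in `proxy/src/index.ts`; the function name is illustrative:

```typescript
// Simplified sketch of the voice-to-service routing described above.
// The real logic lives in proxy/src/index.ts; names here are illustrative.
type Service = "thorsten" | "css10";

export function resolveService(voice: string): Service {
  const v = voice.toLowerCase();
  // Structured voice IDs ("thorsten_male", "css10_female") and service names
  if (v.startsWith("thorsten")) return "thorsten";
  if (v.startsWith("css10")) return "css10";
  // Gender keywords, English and German
  if (v === "male" || v === "männlich") return "thorsten";
  if (v === "female" || v === "weiblich") return "css10";
  throw new Error(`Unknown voice: ${voice}`);
}
```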
**Examples:**

```bash
# German male voice
curl -X POST "http://localhost:3000/api/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Guten Tag, wie geht es Ihnen?", "voice": "thorsten_male"}' \
  --output german_male.wav

# German female voice
curl -X POST "http://localhost:3000/api/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hallo, ich bin eine weibliche Stimme", "voice": "css10_female"}' \
  --output german_female.wav

# Auto-detection based on gender
curl -X POST "http://localhost:3000/api/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello World", "voice": "male"}' \
  --output english_male.wav
```

**Response:** WAV audio file
### GET /api/tts/voices

List available voices and their capabilities.

**Response:**

```json
{
  "voices": [
    {
      "id": "thorsten_male",
      "name": "Thorsten (Male)",
      "description": "Warm German male voice, very natural",
      "gender": "male",
      "quality": "high",
      "languages": ["de", "en"],
      "service": "thorsten",
      "model": "tts_models/de/thorsten/vits"
    },
    {
      "id": "css10_female",
      "name": "CSS10 (Female)",
      "description": "Clear German female voice with good pronunciation",
      "gender": "female",
      "quality": "high",
      "languages": ["de"],
      "service": "css10",
      "model": "tts_models/de/css10/vits-neon"
    }
  ]
}
```

### Other Endpoints

- `GET /health` - Health check for all services
- `GET /debug/coqui` - Debug endpoint for service testing
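A client can use this response to pick a voice at runtime. The sketch below is a hypothetical helper, not part of the proxy; the `Voice` interface mirrors only the documented fields that the selection needs:

```typescript
// Hypothetical client-side helper over the GET /api/tts/voices response.
// The Voice shape mirrors the documented JSON fields used for selection.
interface Voice {
  id: string;
  gender: "male" | "female";
  languages: string[];
}

// Pick the first voice matching the requested gender and language.
export function pickVoice(
  voices: Voice[],
  gender: "male" | "female",
  lang = "de"
): string | undefined {
  return voices.find((v) => v.gender === gender && v.languages.includes(lang))?.id;
}
```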
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `API_KEY` | *Required* | Authentication key for API access |
| `TTS_TYPE` | `coqui` | TTS engine type |
| `TTS_THORSTEN_URL` | `http://thorsten:5002` | Thorsten service URL |
| `TTS_CSS10_URL` | `http://css10:5003` | CSS10 service URL |
| `PORT` | `3000` | Proxy server port (local dev only) |
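Configuration loading with these defaults might look like the sketch below. This is an illustration of the table, not the proxy's actual implementation; the `Config` shape and `loadConfig` name are assumptions:

```typescript
// Sketch of config resolution using the defaults from the table above.
// Names are illustrative; the proxy's actual code may differ.
export interface Config {
  apiKey: string;
  ttsType: string;
  thorstenUrl: string;
  css10Url: string;
  port: number;
}

export function loadConfig(env: Record<string, string | undefined>): Config {
  const apiKey = env.API_KEY;
  if (!apiKey) throw new Error("API_KEY is required"); // no default: auth is mandatory
  return {
    apiKey,
    ttsType: env.TTS_TYPE ?? "coqui",
    thorstenUrl: env.TTS_THORSTEN_URL ?? "http://thorsten:5002",
    css10Url: env.TTS_CSS10_URL ?? "http://css10:5003",
    port: Number(env.PORT ?? 3000),
  };
}
```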
## Docker Configuration
The Docker setup includes:
- Thorsten container: German male voice (Coqui TTS)
- CSS10 container: German female voice (Coqui TTS)
- Proxy container: API proxy with intelligent routing
- Optional Cloudflare tunnel: For external access
## CPU Configuration
The setup is optimized for CPU processing:
- 6GB memory limit per TTS service
- Multi-core CPU utilization
- No GPU dependencies required
- Runs on standard Docker setups
**Note:** For GPU acceleration, see `docker-compose.gpu-backup.yml` for a reference configuration.
## Project Structure

```
tts-proxy/
├── proxy/                  # TypeScript proxy server
│   ├── src/index.ts        # Main proxy logic
│   ├── dist/               # Compiled JavaScript
│   └── package.json        # Dependencies
├── f5-tts-service/         # Thorsten TTS service
│   └── Dockerfile          # Coqui TTS container
├── css10/                  # CSS10 TTS service
│   └── Dockerfile          # Coqui TTS container
├── docker-compose.yml      # Service orchestration
├── .env.example            # Environment template
└── README.md               # This file
```
## Security

- Keep your API key secure and never commit it to version control
- Use the provided `.env.example` as a template for configuration
- Cloudflare tunnel credentials are excluded from git tracking
- The proxy requires valid API keys for all requests (except health checks)
- Internal communication uses Docker networks for security
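The API-key rule can be expressed as a small, framework-agnostic check. This is a sketch of the behavior described above, not the proxy's actual middleware; the function and parameter names are illustrative:

```typescript
// Sketch of the API-key rule above: every request must carry a valid
// x-api-key header, except health checks. Names are illustrative.
export function isAuthorized(
  path: string,
  headers: Record<string, string | undefined>,
  expectedKey: string
): boolean {
  if (path === "/health") return true; // health checks skip authentication
  return headers["x-api-key"] === expectedKey;
}
```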
## Troubleshooting

**401 Unauthorized:**
- Check `API_KEY` in the `.env` file
- Ensure the `x-api-key` header is correct in requests
- The health endpoint (`/health`) doesn't require authentication

**TTS Service Unreachable:**
- Check Docker containers: `docker compose ps`
- Check service logs: `docker compose logs thorsten`
- Test the service directly: `curl http://localhost:5002/`

**Model Not Found:**
- Check available voices: `GET /api/tts/voices`
- Check TTS service logs: `docker compose logs css10`
- Ensure containers are fully started (the first startup takes longer)

**Audio Quality Issues:**
- Adjust the speed/pitch parameters
- Try different voice options
- Check that the input text language matches the voice's capabilities

**High CPU Usage:**
- TTS processing is CPU-intensive by design
- Monitor CPU usage: `docker stats`
- Consider a CPU with more cores for better performance

**Slow Response Times:**
- The first request per container is slower (model loading)
- Consider keeping containers warm with health checks
- Monitor memory usage: `docker stats`

**Container Won't Start:**
- Check logs: `docker compose logs`
- Verify the `.env` file exists and contains `API_KEY`
- Ensure sufficient disk space for Docker images

**Port Conflicts:**
- The setup uses internal Docker networking by default
- Modify `docker-compose.yml` if external access is needed
- Check that no other services use ports 5002 and 5003
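The keep-warm tip above can be implemented with a periodic health ping. This is a hedged sketch assuming the proxy's `/health` endpoint; the helper names and the 60-second interval are assumptions:

```typescript
// Sketch of a keep-warm helper: ping /health on an interval so the TTS
// containers stay loaded. URL, interval, and names are assumptions.
export async function pingHealth(
  url: string,
  fetchFn: (url: string) => Promise<{ ok: boolean }> = fetch
): Promise<boolean> {
  try {
    return (await fetchFn(url)).ok;
  } catch {
    return false; // a failed ping should not crash the keep-warm loop
  }
}

export function startKeepWarm(
  url = "http://localhost:3000/health",
  intervalMs = 60_000
): ReturnType<typeof setInterval> {
  return setInterval(() => void pingHealth(url), intervalMs);
}
```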
## License

MIT License - see the LICENSE file for details.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

For questions or support, please open an issue on GitHub.
**Commercial Use:** All TTS models used are commercially licensed (CC0/Apache 2.0). See CLAUDE.md for detailed licensing information.