TTS Proxy

A multi-service Text-to-Speech proxy with intelligent voice routing, supporting multiple TTS engines and providing a unified REST API for speech synthesis.

✨ Features

  • Multi-Service Architecture: Intelligent routing between different TTS engines
  • Smart Voice Selection: Automatic service selection based on voice preferences
  • High-Quality German Voices: Thorsten (male) and CSS10 (female) voices
  • Docker Support: Complete containerized setup optimized for CPU
  • RESTful API: Clean, simple API compatible with various frontends
  • Authentication: API key-based security
  • Health Monitoring: Built-in health check endpoints
  • CPU Optimized: Efficient processing on standard hardware

🚀 Quick Start

📋 Prerequisites
  • Docker and Docker Compose
  • Multi-core CPU recommended for optimal performance
  • Node.js 18+ (for local development)

🐳 Docker Deployment (Recommended)

  1. Clone the repository:

    git clone https://github.com/loonylabs-dev/tts-proxy.git
    cd tts-proxy
  2. Set up environment:

    cp .env.example .env
    # Edit .env with your API key
  3. Start the services:

    docker compose up -d
  4. Test the service:

    curl -X POST "http://localhost:3000/api/tts" \
      -H "x-api-key: your_api_key_here" \
      -H "Content-Type: application/json" \
      -d '{"text": "Hello World", "voice": "thorsten_male"}' \
      --output test.wav

💻 Local Development
  1. Install dependencies:

    cd proxy
    npm install
  2. Set up environment:

    cp .env.example .env
    # Edit .env with your configuration
  3. Start TTS services (Docker):

    docker compose up -d thorsten css10
  4. Start the proxy:

    npm run dev          # Development mode
    # or
    npm run build && npm start  # Production mode

🔌 API Usage

The proxy provides a RESTful API for text-to-speech conversion with intelligent voice routing.

POST /api/tts

Generate speech from text with automatic service selection.

Request Parameters:

{
  "text": "Text to be spoken",
  "voice": "voice_id",         // See voice options below
  "language": "de|en",         // Optional, auto-detected
  "speed": 1.0,               // Optional, default 1.0
  "pitch": 1.0                // Optional, default 1.0
}
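
The request shape above can be written down as a type with a small validator. This is a minimal sketch, not the proxy's actual code: the names `TtsRequest`, `validateTtsRequest`, and `withDefaults` are hypothetical, and the specific checks (non-empty `text` and `voice`, positive `speed` and `pitch`) are assumptions rather than documented behavior.

```typescript
// Shape of the POST /api/tts body as documented above.
interface TtsRequest {
  text: string;
  voice: string;
  language?: "de" | "en";
  speed?: number;
  pitch?: number;
}

// Hypothetical validator: returns a list of problems, empty when the body looks usable.
// Which fields are mandatory and which ranges are allowed are assumptions.
function validateTtsRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const b = body as Record<string, unknown>;
  const errors: string[] = [];
  const text = b["text"];
  if (typeof text !== "string" || text.trim() === "") {
    errors.push("text must be a non-empty string");
  }
  const voice = b["voice"];
  if (typeof voice !== "string" || voice === "") {
    errors.push("voice must be a non-empty string");
  }
  const language = b["language"];
  if (language !== undefined && language !== "de" && language !== "en") {
    errors.push('language must be "de" or "en"');
  }
  for (const key of ["speed", "pitch"]) {
    const value = b[key];
    if (value !== undefined && (typeof value !== "number" || !Number.isFinite(value) || value <= 0)) {
      errors.push(`${key} must be a positive number`);
    }
  }
  return errors;
}

// Apply the documented defaults (speed 1.0, pitch 1.0).
function withDefaults(req: TtsRequest): TtsRequest & { speed: number; pitch: number } {
  return { ...req, speed: req.speed ?? 1.0, pitch: req.pitch ?? 1.0 };
}
```

Returning a list of messages instead of throwing lets a caller report every problem in one response.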

Voice Selection Options:

  1. Structured Voice IDs (Recommended):

    • "thorsten_male" - Thorsten German male voice
    • "css10_female" - CSS10 German female voice
  2. Gender Keywords:

    • "male" / "männlich" - Routes to Thorsten
    • "female" / "weiblich" - Routes to CSS10
  3. Service Names:

    • "thorsten" - Direct routing to Thorsten service
    • "css10" - Direct routing to CSS10 service
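
The three forms of voice selection above amount to a lookup from the `voice` string to a backend service. The sketch below illustrates that mapping. `resolveService` and `VOICE_ROUTES` are hypothetical names, and the trimming and case-insensitive matching are assumptions. The actual routing logic in `proxy/src/index.ts` may differ.

```typescript
type Service = "thorsten" | "css10";

// Lookup table built from the documented voice options.
// A Map is used so that strings like "constructor" cannot match inherited object keys.
const VOICE_ROUTES = new Map<string, Service>([
  // Structured voice IDs
  ["thorsten_male", "thorsten"],
  ["css10_female", "css10"],
  // Gender keywords (English and German)
  ["male", "thorsten"],
  ["männlich", "thorsten"],
  ["female", "css10"],
  ["weiblich", "css10"],
  // Service names
  ["thorsten", "thorsten"],
  ["css10", "css10"],
]);

// Returns the target service, or undefined when the voice is not recognized.
// Normalizing with trim() and toLowerCase() is an assumption, not documented behavior.
function resolveService(voice: string): Service | undefined {
  return VOICE_ROUTES.get(voice.trim().toLowerCase());
}
```

A caller would typically turn an `undefined` result into a client error and point to `GET /api/tts/voices` for the valid options.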

Examples:

# German male voice
curl -X POST "http://localhost:3000/api/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Guten Tag, wie geht es Ihnen?", "voice": "thorsten_male"}' \
  --output german_male.wav

# German female voice
curl -X POST "http://localhost:3000/api/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hallo, ich bin eine weibliche Stimme", "voice": "css10_female"}' \
  --output german_female.wav

# Gender keyword: "male" routes to the Thorsten voice
curl -X POST "http://localhost:3000/api/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello World", "voice": "male"}' \
  --output english_male.wav

Response: WAV audio file
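
For callers using TypeScript instead of curl, the same request can be assembled as a plain object. This is a sketch under assumptions: `buildTtsRequest` and `TtsHttpRequest` are hypothetical helpers, not part of the proxy, and the base URL and API key are the placeholders used in the curl examples.

```typescript
// Plain description of the HTTP call; it can be passed to fetch as the init object.
interface TtsHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper mirroring the curl examples above.
function buildTtsRequest(baseUrl: string, apiKey: string, text: string, voice: string): TtsHttpRequest {
  return {
    // Strip trailing slashes so the path is not doubled.
    url: `${baseUrl.replace(/\/+$/, "")}/api/tts`,
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice }),
  };
}

// Usage sketch (Node 18+, which ships a global fetch; writeFile is from "node:fs/promises"):
//   const req = buildTtsRequest("http://localhost:3000", "your_api_key_here", "Guten Tag", "thorsten_male");
//   const res = await fetch(req.url, req);
//   if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
//   await writeFile("german_male.wav", Buffer.from(await res.arrayBuffer()));
```

Keeping the request construction separate from the network call makes it easy to check without a running proxy.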

GET /api/tts/voices

List available voices and their capabilities.

Response:

{
  "voices": [
    {
      "id": "thorsten_male",
      "name": "Thorsten (Male)",
      "description": "Warm German male voice, very natural",
      "gender": "male",
      "quality": "high",
      "languages": ["de", "en"],
      "service": "thorsten",
      "model": "tts_models/de/thorsten/vits"
    },
    {
      "id": "css10_female",
      "name": "CSS10 (Female)",
      "description": "Clear German female voice with good pronunciation",
      "gender": "female",
      "quality": "high",
      "languages": ["de"],
      "service": "css10",
      "model": "tts_models/de/css10/vits-neon"
    }
  ]
}
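
A client can type this response and filter it, for example to find voices that list a given language. The sketch below uses field names taken from the response above. The `Voice` and `VoicesResponse` type names and the `voicesForLanguage` helper are hypothetical.

```typescript
// Types mirroring the documented /api/tts/voices response.
interface Voice {
  id: string;
  name: string;
  description: string;
  gender: string;
  quality: string;
  languages: string[];
  service: string;
  model: string;
}

interface VoicesResponse {
  voices: Voice[];
}

// Hypothetical helper: IDs of all voices that list the given language code.
function voicesForLanguage(res: VoicesResponse, language: string): string[] {
  return res.voices.filter((v) => v.languages.includes(language)).map((v) => v.id);
}

// Sample data copied from the response shown above.
const SAMPLE_RESPONSE: VoicesResponse = {
  voices: [
    {
      id: "thorsten_male", name: "Thorsten (Male)",
      description: "Warm German male voice, very natural",
      gender: "male", quality: "high", languages: ["de", "en"],
      service: "thorsten", model: "tts_models/de/thorsten/vits",
    },
    {
      id: "css10_female", name: "CSS10 (Female)",
      description: "Clear German female voice with good pronunciation",
      gender: "female", quality: "high", languages: ["de"],
      service: "css10", model: "tts_models/de/css10/vits-neon",
    },
  ],
};

console.log(voicesForLanguage(SAMPLE_RESPONSE, "en")); // [ 'thorsten_male' ]
```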

Other Endpoints

  • GET /health - Health check for all services
  • GET /debug/coqui - Debug endpoint for service testing

⚙️ Configuration

🔧 Environment Variables

Variable          Default               Description
API_KEY           Required              Authentication key for API access
TTS_TYPE          coqui                 TTS engine type
TTS_THORSTEN_URL  http://thorsten:5002  Thorsten service URL
TTS_CSS10_URL     http://css10:5003     CSS10 service URL
PORT              3000                  Proxy server port (local dev only)
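
The variables and defaults listed above could be read into a typed configuration object along these lines. This is an illustrative sketch: `loadConfig` and `ProxyConfig` are hypothetical names, and failing fast on a missing `API_KEY` or an invalid `PORT` is an assumption about sensible behavior, not a description of the proxy's code.

```typescript
interface ProxyConfig {
  apiKey: string;
  ttsType: string;
  thorstenUrl: string;
  css10Url: string;
  port: number;
}

// Hypothetical loader: takes an environment map and applies the documented defaults.
function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  const apiKey = env["API_KEY"];
  if (!apiKey) {
    // API_KEY has no default, so refuse to start without it.
    throw new Error("API_KEY is required");
  }
  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env["PORT"]}"`);
  }
  return {
    apiKey,
    ttsType: env["TTS_TYPE"] ?? "coqui",
    thorstenUrl: env["TTS_THORSTEN_URL"] ?? "http://thorsten:5002",
    css10Url: env["TTS_CSS10_URL"] ?? "http://css10:5003",
    port,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the map as a parameter keeps the function easy to exercise with hand-written values.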

🐳 Docker Configuration

The Docker setup includes:

  • Thorsten container: German male voice (Coqui TTS)
  • CSS10 container: German female voice (Coqui TTS)
  • Proxy container: API proxy with intelligent routing
  • Optional Cloudflare tunnel: For external access

🔧 CPU Configuration

The setup is optimized for CPU processing:

  • 6GB memory limit per TTS service
  • Multi-core CPU utilization
  • No GPU dependencies required
  • Runs on standard Docker setups

Note: For GPU acceleration, see docker-compose.gpu-backup.yml for reference configuration.

📂 Project Structure

tts-proxy/
├── proxy/                       # TypeScript proxy server
│   ├── src/index.ts             # Main proxy logic
│   ├── dist/                    # Compiled JavaScript
│   └── package.json             # Dependencies
├── f5-tts-service/              # Thorsten TTS service
│   └── Dockerfile               # Coqui TTS container
├── css10/                       # CSS10 TTS service
│   └── Dockerfile               # Coqui TTS container
├── docker-compose.yml           # Service orchestration
├── .env.example                 # Environment template
└── README.md                    # This file

🔒 Security Notes

  • Keep your API key secure and never commit it to version control
  • Use the provided .env.example as a template for configuration
  • Cloudflare tunnel credentials are excluded from git tracking
  • The proxy requires valid API keys for all requests (except health checks)
  • Internal communication uses Docker networks for security
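
The API key rule in the notes above (every request needs a valid key, except health checks) can be sketched as a small predicate. `isAuthorized` and `constantTimeEqual` are hypothetical names. Treating only `/health` as exempt, and therefore protecting `/debug/coqui`, is an assumption. The hand-written comparison is for illustration only; in Node.js, `crypto.timingSafeEqual` is the usual choice.

```typescript
// Compare two strings without returning early at the first mismatch,
// so response time reveals less about how much of the key was correct.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of a string; "|| 0" maps that to 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Hypothetical check: headers are expected with lower-cased names,
// which is how Node.js exposes incoming request headers.
function isAuthorized(
  path: string,
  headers: Record<string, string | undefined>,
  apiKey: string,
): boolean {
  if (path === "/health") {
    return true; // Health checks are documented as not requiring a key.
  }
  const provided = headers["x-api-key"];
  return typeof provided === "string" && provided.length > 0 && constantTimeEqual(provided, apiKey);
}
```

A request that fails this check would map to the 401 Unauthorized case described under Troubleshooting.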

🔧 Troubleshooting

API Connection Issues

401 Unauthorized:

  • Check API_KEY in .env file
  • Ensure x-api-key header is correct in requests
  • Health endpoint (/health) doesn't require authentication

TTS Service Unreachable:

  • Check Docker containers: docker compose ps
  • Check the TTS service logs: docker compose logs thorsten
  • Test direct service: curl http://localhost:5002/ (only reachable from the host if the port is published in docker-compose.yml)

Voice Issues

Model Not Found:

  • Check available voices: GET /api/tts/voices
  • Verify TTS service logs: docker compose logs css10
  • Ensure containers are fully started (first startup takes longer)

Audio Quality Issues:

  • Adjust speed/pitch parameters
  • Try different voice options
  • Check input text language matches voice capabilities

Performance Issues

High CPU Usage:

  • TTS processing is CPU-intensive by design
  • Monitor CPU usage: docker stats
  • Consider a CPU with more cores for better performance

Slow Response Times:

  • First request per container is slower (model loading)
  • Consider keeping containers warm with health checks
  • Monitor container memory usage against the 6GB limit: docker stats

Docker Issues

Container Won't Start:

  • Check logs: docker compose logs
  • Verify .env file exists and contains API_KEY
  • Ensure sufficient disk space for Docker images

Port Conflicts:

  • Setup uses internal Docker networking by default
  • Modify docker-compose.yml if external access needed
  • Check no other services use ports 5002, 5003

📄 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

For questions or support, please open an issue on GitHub.


Commercial Use: All TTS models used are released under licenses that permit commercial use (CC0/Apache 2.0). See CLAUDE.md for detailed licensing information.
