feat: Add WebSocket support for real-time TTS streaming with multi-user capability #356
base: master
…improve client feedback
…proved voice retrieval
…cket support for concurrent users
Hi @KoljaB What's new:
Perfect for LLM integrations - each user gets their own queue so multiple conversations can run simultaneously. Works with OpenAI, Kokoro, Azure, and ElevenLabs engines. Everything's backward compatible.

Hi @KoljaB,

Thank you so much. This looks like a great PR. I'm in Spain for the next two months, so can't really look into it for a while.

Hi @KoljaB, please take your time. I was testing and found a few bugs related to handling different audio formats from different providers. They are fixed in the latest commit #57bbfcb
Kyutai Labs' Pocket TTS - lightweight 100M parameter model with:
- CPU-optimized inference (~6x real-time performance)
- Voice cloning via WAV files
- ~200ms latency to first audio chunk
- 8 built-in voices
Install with: pip install pocket-tts
Summary
Implements a WebSocket endpoint for real-time text-to-speech streaming, enabling bidirectional communication and support for multiple concurrent users. Includes a complete demo client and enhanced web UI with mode switching.
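The multi-user behaviour described above can be pictured as one queue per connection, each drained by its own worker. The following is only a minimal sketch under stated assumptions: it uses the standard-library `asyncio` module, and the names `SessionRegistry`, `session_worker`, and `fake_synthesize` are hypothetical and do not come from this PR. A real server would call a TTS engine and send the audio over the socket where this sketch appends to a list.

```python
import asyncio


class SessionRegistry:
    """Hypothetical registry holding one queue of pending text per user."""

    def __init__(self):
        self._queues = {}

    def connect(self, user_id):
        # Each new connection gets its own queue, isolated from other users.
        queue = asyncio.Queue()
        self._queues[user_id] = queue
        return queue

    def enqueue(self, user_id, text):
        # Route incoming text to the queue that belongs to this user only.
        self._queues[user_id].put_nowait(text)

    def disconnect(self, user_id):
        self._queues.pop(user_id, None)


async def fake_synthesize(text):
    # Stand-in for an engine call (assumption): yields control, returns bytes.
    await asyncio.sleep(0)
    return text.encode("utf-8")


async def session_worker(queue, outbox):
    # Drain one user's queue in order; None is the shutdown sentinel.
    while True:
        text = await queue.get()
        if text is None:
            break
        outbox.append(await fake_synthesize(text))


async def demo():
    registry = SessionRegistry()
    delivered = {}
    tasks = []
    for user_id in ("alice", "bob"):
        queue = registry.connect(user_id)
        delivered[user_id] = []
        tasks.append(asyncio.create_task(session_worker(queue, delivered[user_id])))

    # Interleaved requests from two users stay separated per user.
    registry.enqueue("alice", "Hello")
    registry.enqueue("bob", "Hola")
    registry.enqueue("alice", "world")
    for user_id in ("alice", "bob"):
        registry.enqueue(user_id, None)

    await asyncio.gather(*tasks)
    return delivered
```

Running `asyncio.run(demo())` returns a dictionary keyed by user, with each user's audio chunks in the order that user submitted them. Keeping a separate queue per connection means a slow synthesis for one user does not reorder or block text submitted by another.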
Key Features
WebSocket Endpoint (/ws)
Enhanced Web Interface
WebSocket Client Demo (websocket_client.py)
Engine Improvements
Technical Details
Engine Compatibility
✅ WebSocket-compatible: OpenAI, Kokoro, Azure, ElevenLabs
❌ Not compatible: System engine (pyttsx3) - displays clear error message
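A compatibility gate of this kind might look roughly like the sketch below. It is an illustration only: the engine name strings, the `WEBSOCKET_ENGINES` set, and the `check_websocket_support` function are assumptions made for this example, and the error wording is not the message the server actually displays.

```python
# Engine identifiers are assumed for illustration; the PR may use other names.
WEBSOCKET_ENGINES = {"openai", "kokoro", "azure", "elevenlabs"}


def check_websocket_support(engine_name):
    """Return (supported, error_message) for a given engine name."""
    name = engine_name.strip().lower()
    if name in WEBSOCKET_ENGINES:
        return True, ""
    supported = ", ".join(sorted(WEBSOCKET_ENGINES))
    return False, (
        f"Engine '{name}' does not support WebSocket streaming. "
        f"Supported engines: {supported}."
    )
```

A handler could call `check_websocket_support` before accepting text and, when the first element is `False`, send the message back to the client instead of attempting synthesis.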
Dependencies
websockets - WebSocket client/server
pyaudio - Audio playback in demo client
Files Changed
async_server.py - WebSocket endpoint, engine tracking, UI enhancements
static/tts.js - WebSocket client logic, mode switching, auto-send
websocket_client.py - Python demo client (new)
README.md - Updated documentation
Breaking Changes
None - all changes are additive and backward compatible.
Testing
Tested with OpenAI and Kokoro engines. WebSocket mode successfully handles: