Real-time multi-language audio translator with live transcription and translation. Polyglot captures system audio (games, videos, streams) and translates speech into multiple target languages simultaneously using AI.
Real-time translations in 4 languages with live audio visualization and debug panel
- 🎧 System Audio Capture - Translates any audio playing on your computer via WASAPI loopback (Windows)
- 🗣️ Auto Language Detection - Automatically detects the spoken language using Whisper
- 🚀 GPU Accelerated - Runs on CUDA-enabled GPUs (RTX 5080 tested with PyTorch nightly)
- ⚡ Parallel Translation - Translates to multiple languages simultaneously using threading
- 🎯 Smart Sentence Detection - Configurable silence detection for natural sentence boundaries
- 📊 Real-time Dashboard - Live visual feedback with audio level graphs and debug metrics
- 💾 Auto-Save - Continuous transcript logging to transcript.txt
- ⚙️ Live Configuration - Adjust detection thresholds in real-time via web UI
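The parallel-translation feature can be sketched with a thread pool. Here `translate_one` is a hypothetical stand-in for the real per-language M2M100 call, not the app's actual function:

```python
from concurrent.futures import ThreadPoolExecutor

def translate_one(text: str, lang: str) -> str:
    """Hypothetical stand-in for the real M2M100 translation call."""
    return f"[{lang}] {text}"

def translate_all(text, languages):
    """Translate one sentence into every target language concurrently."""
    with ThreadPoolExecutor(max_workers=len(languages)) as pool:
        # Submit one translation job per language, then collect results.
        futures = {lang: pool.submit(translate_one, text, lang) for lang in languages}
        return {lang: fut.result() for lang, fut in futures.items()}

results = translate_all("Hello world", ["en", "de", "fr", "it"])
```

Because each model call releases the GIL while the GPU works, plain threads are enough here; no process pool is needed.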
- Transcription: OpenAI Whisper large-v3 (multilingual speech recognition)
- Translation: Meta M2M100 1.2B (supports 100 languages)
- Backend: Flask + Flask-SocketIO (WebSocket real-time communication)
- Frontend: Vanilla JavaScript with real-time updates
- Audio: PyAudioWPatch (WASAPI loopback for Windows system audio)
- GPU: PyTorch with CUDA 12.9 (RTX 5080 Blackwell support)
- GPU: CUDA-enabled GPU (RTX 5080 or compatible)
- RAM: 16GB+ recommended
- VRAM: 10GB+ for Whisper large-v3 + M2M100 1.2B
- OS: Windows (for WASAPI loopback audio)
- Python 3.8+
- CUDA Toolkit 12.9+
- PyTorch 2.10+ (nightly build for RTX 5080)
```shell
git clone https://github.com/yourusername/polyglot.git
cd polyglot
python -m venv venv
venv\Scripts\activate  # Windows
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu129
pip install flask flask-socketio transformers
pip install pyaudiowpatch soundfile resampy
pip install langdetect numpy
```

Models will be automatically downloaded on first run:
- Whisper large-v3: ~3GB
- M2M100 1.2B: ~5GB
```shell
python app.py
```

You should see:

```
============================================================
Polyglot 🌍 - Real-time Audio Translator
============================================================
Open your browser to: http://localhost:5000
Device: cuda
Whisper Model: large-v3
Translation Model: facebook/m2m100_1.2B
Target languages: English, German, French, Italian
Press Ctrl+C to stop
```
- Navigate to http://localhost:5000 in your browser
- Click ⚙️ Settings to:
  - Select target languages for translation
  - Adjust audio detection thresholds
  - Fine-tune sentence detection sensitivity
- Click ▶️ Start Listening
- Play any audio on your computer (YouTube, games, etc.)
- Watch real-time transcriptions and translations appear!
Click 📊 Debug to see:
- Real-time audio level graphs
- Silence detection progress
- Buffer status and processing state
Edit config.py to customize:

```python
class Config:
    # Model selection
    WHISPER_MODEL = "large-v3"  # tiny, base, small, medium, large, large-v2, large-v3
    TRANSLATION_MODEL = "facebook/m2m100_1.2B"  # or m2m100_418M for faster/lighter

    # Target languages
    TARGET_LANGUAGES = [
        {"code": "en", "name": "English"},
        {"code": "de", "name": "German"},
        {"code": "fr", "name": "French"},
        {"code": "it", "name": "Italian"},
    ]

    # Audio detection thresholds
    MIN_AUDIO_LENGTH = 3.0    # Minimum seconds before processing
    MAX_AUDIO_LENGTH = 30.0   # Maximum seconds (force processing)
    SILENCE_THRESHOLD = 0.01  # Volume level considered silence
    SILENCE_CHUNKS = 15       # Consecutive silent chunks to trigger sentence end
```

Polyglot supports 100+ languages via M2M100:
Major Languages: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Turkish, Vietnamese, Thai, Indonesian, Hebrew, Greek, Swedish, Danish, Norwegian, Finnish, Czech, Romanian, Ukrainian, Persian, Bengali, Tamil, Telugu, Urdu, and many more!
See M2M100 documentation for full language list.
Adjust audio detection parameters in real-time:
- Min Audio Length: Increase to avoid tiny fragments, decrease for shorter sentences
- Max Audio Length: Increase for longer sentences, decrease if cutting mid-speech
- Silence Threshold: Decrease if breaking at small pauses, increase if missing breaks
- Silence Duration: Increase if breaking too often, decrease if running together
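How these knobs interact can be sketched as an RMS level check feeding a consecutive-silence counter. The constants mirror the config.py fields, but the logic here is illustrative, not the app's actual code:

```python
import math
from typing import List, Optional

SILENCE_THRESHOLD = 0.01  # volume level treated as silence (mirrors config.py)
SILENCE_CHUNKS = 15       # consecutive silent chunks that end a sentence

def rms(chunk: List[float]) -> float:
    """Root-mean-square level of one audio chunk."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))

def find_sentence_end(chunks: List[List[float]]) -> Optional[int]:
    """Index of the chunk where the sentence boundary fires, or None."""
    silent_run = 0
    for i, chunk in enumerate(chunks):
        if rms(chunk) < SILENCE_THRESHOLD:
            silent_run += 1
            if silent_run >= SILENCE_CHUNKS:
                return i
        else:
            silent_run = 0  # any loud chunk resets the silence counter
    return None
```

Raising SILENCE_THRESHOLD makes quiet audio count as silence sooner (more breaks); raising SILENCE_CHUNKS demands a longer pause before a break fires.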
Monitor audio processing in real-time:
- Audio Level Graph: Green bar shows current audio level vs threshold (orange line)
- Silence Counter Graph: Blue bar shows progress toward sentence detection
- Processing Stats: Buffer size, processing state, chunk limits
All transcriptions are automatically saved to transcript.txt with timestamps.
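A minimal sketch of such timestamped appending; the exact line format the app writes is an assumption here:

```python
from datetime import datetime

def append_transcript(path: str, text: str) -> str:
    """Append one timestamped line to the transcript file and return it."""
    line = f"[{datetime.now():%Y-%m-%d %H:%M:%S}] {text}"
    # Open in append mode so each sentence is flushed as it arrives.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line
```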
Issue: the startup banner shows Device: cpu instead of cuda
Solution: Install PyTorch with CUDA support:
```shell
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu129
```

Solution: Check that audio is playing and verify the WASAPI loopback device in the logs.
Solution: Increase SILENCE_CHUNKS in Settings or config.py
Solution: Decrease SILENCE_CHUNKS and/or SILENCE_THRESHOLD
Solution: Already handled via MIN_AUDIO_LEVEL check. Increase if needed.
On RTX 5080:
- Whisper large-v3: ~1-3 seconds per sentence
- M2M100 1.2B: ~0.3-0.5 seconds per language (parallel)
- Total latency: ~2-4 seconds from speech to translation
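As a sanity check, these numbers compose: translation runs in parallel across languages, so its per-language cost adds roughly once, and the gap to the observed 2-4 s total is presumably capture and buffering overhead:

```python
# Latency figures quoted above (seconds); translation is parallel,
# so the per-language cost is paid once, not once per language.
whisper_s = (1.0, 3.0)    # Whisper large-v3, per sentence
translate_s = (0.3, 0.5)  # M2M100 1.2B, per language

best = whisper_s[0] + translate_s[0]   # 1.3 s of model time
worst = whisper_s[1] + translate_s[1]  # 3.5 s of model time
# Audio buffering and I/O push the end-to-end total toward ~2-4 s.
```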
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper - Speech recognition
- Meta M2M100 - Translation model
- PyAudioWPatch - WASAPI audio capture
- Flask-SocketIO - Real-time communication
Project Link: https://github.com/yourusername/polyglot
Made with ❤️ for the multilingual community
