VoiceBridge offers a streamlined reference implementation for low-overhead speech recognition and lifelike speech synthesis in Python. It bundles curated scripts, reproducible workflows, and documentation so you can quickly evaluate bidirectional audio-text conversion on your own machines.
- Real-time ready speech-to-text transcription using `SpeechRecognition`.
- Natural-sounding text-to-speech playback via `gTTS`.
- Small, dependency-light Python scripts that are easy to adapt.
- Clear separation between input assets, transcription output, and generated audio.
- `Speech To Text.py`: helper script that transcribes WAV audio into a UTF-8 text file.
- `Text To Speech.py`: companion script that vocalizes text and exports an audio file.
- `README.md`: this guide.
- Create a fresh Python 3.9+ environment (virtualenv, venv, or conda).
- Install the required libraries:

      pip install SpeechRecognition gTTS pydub

  Windows users may also need to install FFmpeg and ensure it is on the `PATH` for media encoding.
- Prepare your assets:
  - Speech-to-text expects a mono WAV file (`.wav`, 16-bit) at `input/audio.wav` by default.
  - Text-to-speech expects a UTF-8 text file at `input/prompts.txt`.
Feel free to adjust paths or output filenames inside each script.
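
Before running either script, it can help to confirm that the input audio matches the expected format and that FFmpeg is reachable. The following is a minimal sketch using only the Python standard library; `check_wav` and `ffmpeg_available` are hypothetical helper names and are not part of the bundled scripts.

```python
import shutil
import wave
from pathlib import Path


def check_wav(path):
    """Return a list of format problems for a WAV file (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        if channels != 1:
            problems.append(f"expected mono, found {channels} channels")
        # getsampwidth() reports bytes per sample, so 2 bytes means 16-bit.
        width = wav.getsampwidth()
        if width != 2:
            problems.append(f"expected 16-bit samples, found {8 * width}-bit")
    return problems


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    audio = Path("input/audio.wav")  # default location named in this guide
    if audio.exists():
        for problem in check_wav(audio):
            print("audio.wav:", problem)
    else:
        print("missing input file:", audio)
    print("ffmpeg on PATH:", ffmpeg_available())
```

An empty list from `check_wav` means the file is mono and 16-bit; anything else describes what to convert before transcription.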
python "Speech To Text.py" --audio input/audio.wav --output output/transcript.txt
The script normalizes audio, submits it to the recognizer, and writes the recognized transcript to the target file. For noisy recordings, experiment with different recognizer engines (google, sphinx, etc.) or tweak pause thresholds in the script.
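
The recognition and file-writing steps described above can be sketched roughly as follows. This is an illustrative outline, not the contents of `Speech To Text.py`: the function names are hypothetical, the audio normalization step is omitted, and the Google Web Speech backend is assumed as the engine. The `recognize` parameter is an assumption added so that another engine can be swapped in.

```python
from pathlib import Path


def recognize_with_google(audio_path, language="en-US"):
    """Transcribe a WAV file via SpeechRecognition's Google Web Speech backend."""
    # Imported lazily so the module loads even if the library is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(audio_path)) as source:
        audio = recognizer.record(source)  # read the entire file
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # the engine could not make out any speech
    # sr.RequestError (network or API failure) is left to propagate.


def transcribe(audio_path, output_path, recognize=recognize_with_google):
    """Run a recognizer on audio_path and write the transcript as UTF-8."""
    text = recognize(audio_path)
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return text
```

Calling `transcribe("input/audio.wav", "output/transcript.txt")` mirrors the command above. Passing a different callable as `recognize` is one way to try another engine, for example one that wraps `recognizer.recognize_sphinx`.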
python "Text To Speech.py" --text input/prompts.txt --voice en --slow false --output output/speech.mp3
The script loads the provided text, requests synthesis from Google Text-to-Speech, and saves an MP3. Switch the --voice argument to any language code supported by gTTS.
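
A minimal sketch of how such a command line could be wired to gTTS is shown below. It is an assumption about structure rather than the actual contents of `Text To Speech.py`: the helper names and default values are hypothetical, with the defaults taken from the example command. Because `--slow` is given an explicit `true`/`false` value, the sketch parses that string itself rather than relying on `bool`, which would treat any non-empty string as true.

```python
import argparse
from pathlib import Path


def str_to_bool(value):
    """Convert 'true'/'false' style strings into a bool for argparse."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    """Build a parser for the flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Convert text to speech.")
    parser.add_argument("--text", default="input/prompts.txt")
    parser.add_argument("--voice", default="en", help="gTTS language code")
    parser.add_argument("--slow", type=str_to_bool, default=False)
    parser.add_argument("--output", default="output/speech.mp3")
    return parser


def synthesize(text, voice, slow, output_path):
    """Request speech from Google Text-to-Speech and save it as an MP3."""
    from gtts import gTTS  # imported lazily; requires network access to run

    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    gTTS(text=text, lang=voice, slow=slow).save(str(out))


def main(argv=None):
    args = build_parser().parse_args(argv)
    text = Path(args.text).read_text(encoding="utf-8")
    synthesize(text, args.voice, args.slow, args.output)
```

Calling `main()` from a script entry point would then accept the same flags as the command above.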
- Add CLI flags for batch transcription or multi-lingual speech synthesis.
- Integrate with an async message bus to process audio uploads automatically.
- Plug in different backends such as Whisper or Coqui TTS for offline workflows.
This repository is distributed under the MIT License. Consult LICENSE for the full text.