VoiceBridge delivers seamless, high-fidelity speech-to-text transcription and expressive text-to-speech synthesis with a unified, easy-to-integrate workflow.

VectorPioneer/bidirectional-speech-text-conversion

 
 

VoiceBridge: Speech ↔ Text Conversion Suite

VoiceBridge offers a streamlined reference implementation for low-overhead speech recognition and lifelike speech synthesis in Python. It bundles curated scripts, reproducible workflows, and documentation so you can quickly evaluate bidirectional audio-text conversion on your own machines.

Key Capabilities

  • Speech-to-text transcription of WAV recordings using the SpeechRecognition library.
  • Natural-sounding text-to-speech playback via gTTS.
  • Small, dependency-light Python scripts that are easy to adapt.
  • Clear separation between input assets, transcription output, and generated audio.

Repository Structure

  • Speech To Text.py — helper script that transcribes WAV audio into a UTF-8 text file.
  • Text To Speech.py — companion script that vocalizes text and exports an audio file.
  • README.md — this guide.

Getting Started

  1. Create a fresh Python 3.9+ environment (virtualenv, venv, or conda).

  2. Install the required libraries:

    pip install SpeechRecognition gTTS pydub
    

    pydub relies on FFmpeg to read and write formats other than WAV, so install FFmpeg and make sure it is on your PATH. Windows users usually need to do this manually.

  3. Prepare your assets:

    • Speech-to-text expects a mono WAV file (.wav, 16-bit) at input/audio.wav by default.
    • Text-to-speech expects a UTF-8 text file at input/prompts.txt.

Feel free to adjust paths or output filenames inside each script.
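
Before running the speech-to-text script, it can help to confirm that the input really is a mono, 16-bit WAV file. The sketch below is one way to check this with the standard-library wave module, plus an optional pydub conversion. The function names (describe_wav, is_mono_16bit, convert_to_mono_16bit) are hypothetical helpers for illustration and are not part of the repository's scripts.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read channel count, sample width, and sample rate from a WAV header."""
    with wave.open(path, "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": wav_file.getframerate(),
        }


def is_mono_16bit(path: str) -> bool:
    """Return True when the file has one channel and 2-byte (16-bit) samples."""
    info = describe_wav(path)
    return info["channels"] == 1 and info["sample_width_bytes"] == 2


def convert_to_mono_16bit(src_path: str, dst_path: str) -> None:
    """Convert any audio file pydub can read into a mono, 16-bit WAV.

    pydub is imported lazily so the checks above work without it.
    Non-WAV sources require FFmpeg on the PATH.
    """
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src_path)
    audio = audio.set_channels(1).set_sample_width(2)
    audio.export(dst_path, format="wav")
```

A typical use would be to call is_mono_16bit("input/audio.wav") and, if it returns False, run convert_to_mono_16bit on the original recording first.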

Usage

Convert Speech to Text

python "Speech To Text.py" --audio input/audio.wav --output output/transcript.txt

The script normalizes the audio, submits it to the recognizer, and writes the recognized transcript to the target file. For noisy recordings, experiment with a different recognizer engine (for example google or sphinx; the Sphinx engine runs offline and requires the pocketsphinx package) or tune the recognizer's energy and pause thresholds in the script.
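
As a rough illustration of the flow described above, the following sketch reads a WAV file with SpeechRecognition, sends it to a recognizer, and writes the transcript as UTF-8. It is an assumption about how such a script could be organized, not the actual contents of Speech To Text.py. The --engine flag, the calibrate_seconds parameter, and all function names are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Transcribe a WAV file to text.")
    parser.add_argument("--audio", default="input/audio.wav")
    parser.add_argument("--output", default="output/transcript.txt")
    # Hypothetical flag: the real script may select its engine differently.
    parser.add_argument("--engine", choices=["google", "sphinx"], default="google")
    return parser


def transcribe(audio_path: str, engine: str = "google",
               calibrate_seconds: float = 0.0) -> str:
    """Return the recognized text, or an empty string if nothing was understood."""
    import speech_recognition as sr  # imported lazily; pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        if calibrate_seconds > 0:
            # Consumes the start of the recording to estimate the noise floor.
            recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
        audio = recognizer.record(source)
    try:
        if engine == "sphinx":
            return recognizer.recognize_sphinx(audio)  # offline, needs pocketsphinx
        return recognizer.recognize_google(audio)  # online, needs network access
    except sr.UnknownValueError:
        return ""


def write_transcript(text: str, output_path: str) -> None:
    """Write the transcript as UTF-8, creating the output folder if needed."""
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    write_transcript(transcribe(args.audio, args.engine), args.output)
```

Calling main() with no arguments would use the default paths. Network failures surface as sr.RequestError, which this sketch deliberately leaves uncaught so they are visible.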

Convert Text to Speech

python "Text To Speech.py" --text input/prompts.txt --voice en --slow false --output output/speech.mp3

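The script loads the provided text, requests synthesis from Google Translate's text-to-speech service through gTTS (an internet connection is required), and saves the result as an MP3. Set the --voice argument to any language code supported by gTTS, such as en, fr, or de.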
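
The synthesis step can be sketched in a few lines with gTTS. The example below assumes the flags shown in the command above and is not a copy of Text To Speech.py. The parse_bool helper and the other function names are hypothetical, added so that a value such as "--slow false" is read as a boolean rather than a non-empty string.

```python
import argparse
from pathlib import Path


def parse_bool(value: str) -> bool:
    """Turn strings such as 'true' or 'false' into real booleans for argparse."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Synthesize speech from a text file.")
    parser.add_argument("--text", default="input/prompts.txt")
    parser.add_argument("--voice", default="en")
    parser.add_argument("--slow", type=parse_bool, default=False)
    parser.add_argument("--output", default="output/speech.mp3")
    return parser


def synthesize(text: str, voice: str, slow: bool, output_path: str) -> None:
    """Request speech from gTTS and save it as an MP3 (needs network access)."""
    from gtts import gTTS  # imported lazily; pip install gTTS

    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    gTTS(text=text, lang=voice, slow=slow).save(str(out))


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    text = Path(args.text).read_text(encoding="utf-8").strip()
    if not text:
        raise SystemExit(f"No text found in {args.text}")
    synthesize(text, args.voice, args.slow, args.output)
```

Parsing the flag explicitly matters because argparse would otherwise treat the string "false" as truthy.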

Customization Ideas

  • Add CLI flags for batch transcription or multi-lingual speech synthesis.
  • Integrate with an async message bus to process audio uploads automatically.
  • Plug in different backends such as Whisper or Coqui TTS for offline workflows.
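
For the batch transcription idea, one possible starting point is a small planning helper that pairs every WAV file in an input folder with a transcript path in an output folder. This is only a sketch: plan_batch and its parameters are hypothetical and do not exist in the repository.

```python
from pathlib import Path


def plan_batch(input_dir: str, output_dir: str) -> list:
    """Pair each .wav under input_dir with a .txt path under output_dir.

    The folder layout is mirrored, so input/a/b.wav maps to output/a/b.txt.
    """
    in_root = Path(input_dir)
    out_root = Path(output_dir)
    pairs = []
    for wav_path in sorted(in_root.rglob("*.wav")):
        relative = wav_path.relative_to(in_root)
        pairs.append((wav_path, out_root / relative.with_suffix(".txt")))
    return pairs
```

Each resulting pair can then be passed to whatever transcription function the script exposes, which keeps file discovery separate from the recognition step.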

License

This repository is distributed under the MIT License. Consult LICENSE for the full text.
