The goal of this project is to create a real-time transcription tool that transcribes audio from a microphone and outputs it as text.
- Transcribe audio from the microphone and print the text to the terminal.
- ASR (Automatic Speech Recognition) using Google Speech API, Whisper, or AssemblyAI.
- Python 3.6 or higher
- All packages listed in requirements.txt
If pip could not build wheels for pyaudio, install PortAudio first.
On Mac:
    brew install portaudio
On Linux:
    sudo apt-get install portaudio19-dev
Then install the requirements again.
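To confirm that the pyaudio fix worked, a quick check like the one below can help. It is a minimal sketch and not part of this repository: the function names `pyaudio_available` and `install_hint` are hypothetical, and the only assumption about the package is that it imports under the module name `pyaudio`.

```python
import importlib.util
import sys


def pyaudio_available() -> bool:
    """Return True if the import system can locate the pyaudio module."""
    return importlib.util.find_spec("pyaudio") is not None


def install_hint(platform_name: str) -> str:
    """Map a sys.platform value to the PortAudio install command from this README."""
    if platform_name == "darwin":
        return "brew install portaudio"
    if platform_name.startswith("linux"):
        return "sudo apt-get install portaudio19-dev"
    # Other platforms are not covered by this README.
    return "install PortAudio with your system's package manager"


if __name__ == "__main__":
    if pyaudio_available():
        print("pyaudio is installed")
    else:
        print("pyaudio is missing; try:", install_hint(sys.platform))
```

Running the script inside the virtual environment reports whether pyaudio is visible to that interpreter, which is the one RTT.py will use.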
- RTT.py
  - Main file that runs the program.
- RTT_spectrogram.py
  - Records audio from the microphone and outputs to text using a spectrogram.
  - Taken from: https://python-sounddevice.readthedocs.io/en/0.4.1/examples.html#recording-with-arbitrary-duration
- system_record.py
  - Records audio from the system using loopback and outputs it as text.
- util.py
  - Utility functions for the program.
- file_converter.py
  - Converts audio files to any supported format. Requires ffmpeg.
- ASR
  - Folder containing ASR modules. Currently supports Google Speech API, Whisper, and AssemblyAI.
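Since the ASR folder holds several interchangeable services, one way to picture how a program could switch between them is a small name-to-function registry. The sketch below is an illustration only, not the layout of the ASR modules in this repository: `register`, `get_backend`, and the `dummy` backend are hypothetical names, and no real service is called.

```python
from typing import Callable, Dict

# A transcriber takes raw audio bytes and returns the recognized text.
Transcriber = Callable[[bytes], str]

# Hypothetical registry mapping a backend name to its transcriber function.
_BACKENDS: Dict[str, Transcriber] = {}


def register(name: str) -> Callable[[Transcriber], Transcriber]:
    """Decorator that stores a transcriber under a lowercase backend name."""
    def decorator(func: Transcriber) -> Transcriber:
        _BACKENDS[name.lower()] = func
        return func
    return decorator


def get_backend(name: str) -> Transcriber:
    """Look up a registered transcriber, ignoring case."""
    try:
        return _BACKENDS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_BACKENDS)) or "none"
        raise ValueError(f"unknown ASR backend {name!r}; registered: {known}") from None


@register("dummy")
def dummy_transcribe(audio: bytes) -> str:
    """Placeholder backend: reports the audio size instead of calling a service."""
    return f"<{len(audio)} bytes of audio>"


if __name__ == "__main__":
    transcribe = get_backend("dummy")
    print(transcribe(b"\x00" * 320))  # prints "<320 bytes of audio>"
```

With this shape, adding a service means registering one more function, and the recording code only needs to know the `bytes -> str` signature.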
- If you don’t have Python installed, install it first.
- Clone this repository.
- Navigate into the project directory:
    cd RTT
- Create a new virtual environment and activate it:
    python -m venv venv
    . venv/bin/activate
- Install the requirements:
    pip install -r requirements.txt
- Make a copy of the example environment variables file:
    cp .env.example .env
- Add your API key to the newly created .env file.
- Run the app using python or python3 depending on your system:
    python RTT.py
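If the app cannot find your API key, it may help to check what the .env file actually contains. The sketch below parses simple KEY=VALUE lines using only the standard library. It is an assumption-laden illustration: this README does not say how RTT.py reads the file, `parse_env` and `load_env_file` are hypothetical helpers, and `EXAMPLE_API_KEY` is a made-up name (use the variable names from .env.example).

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines, # comments, and lines without '='."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and any quote characters at the ends.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read a .env file and copy its entries into os.environ without overwriting existing variables."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a hypothetical name used only for this demonstration.
    sample = "# comment\nEXAMPLE_API_KEY='abc123'\n"
    print(parse_env(sample))  # prints {'EXAMPLE_API_KEY': 'abc123'}
```

Calling `load_env_file()` from the project directory returns the parsed entries, so an empty or misnamed key shows up immediately.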