real_time_transcribe

Use PyAudio to capture the speakers' audio from Stereo Mix, then use Whisper to transcribe the speech to text. You can also pass a YouTube URL and get a transcription of the video.

"rtt stream" opens a tkinter UI and shows the real-time transcription.
"rtt yt -u {url}" transcribes the YouTube video at the given URL into a .docx file.

Install with GPU usage

To use the GPU, first install a CUDA-compatible build of torch: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

git clone https://github.com/tea9296/real_time_transcribe.git
cd real_time_transcribe
pip install -r requirements_gpu.txt --extra-index-url https://download.pytorch.org/whl/cu126
pip install .
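
After installing, it can be worth checking that PyTorch actually sees the GPU before starting a long transcription. The snippet below is a minimal sketch and not part of rtt: cuda_status is a hypothetical helper name, and it relies only on torch.cuda.is_available() and torch.cuda.get_device_name().

```python
import importlib.util


def cuda_status():
    """Return a short, human-readable description of the torch/CUDA setup."""
    # Hypothetical helper for illustration; it is not provided by rtt.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check above can report a missing install

    if torch.cuda.is_available():
        return "CUDA is available: " + torch.cuda.get_device_name(0)
    return "torch is installed, but CUDA is not available (CPU only)"


print(cuda_status())
```

If this reports CPU only, the installed torch wheel is probably not a CUDA build, and reinstalling from the cu126 index above may help.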

Install

Make sure your computer has a Stereo Mix device and that it is enabled under the recording devices.
Run pip install git+https://github.com/tea9296/real_time_transcribe.git to install the rtt package.
Run "rtt stream -l {language} -d {delay_time}" to open a tkinter UI and show the real-time transcription.
Run "rtt yt -u {url} -o {output_file_name}" to transcribe the YouTube video at the given URL into a .docx file.
Run "rtt wav -i {wav file path} -o {output_file_name}" to transcribe a .wav file.
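
To confirm that a Stereo Mix input is visible to PyAudio, something like the following sketch may help. It is not taken from rtt: find_stereo_mix and list_audio_devices are hypothetical helper names, and the "stereo mix" keyword is an assumption, since the device name depends on the driver and the system language.

```python
def find_stereo_mix(devices, keyword="stereo mix"):
    """Return the input-capable devices whose name contains `keyword`."""
    # `devices` is a list of dicts shaped like PyAudio's device info
    # (keys used here: "name", "maxInputChannels").
    return [
        dev for dev in devices
        if dev.get("maxInputChannels", 0) > 0
        and keyword in dev.get("name", "").lower()
    ]


def list_audio_devices():
    """Collect the device info dicts of every audio device PyAudio reports."""
    import pyaudio  # imported lazily so find_stereo_mix works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i)
                for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


# Usage on a machine with PyAudio installed:
#   for dev in find_stereo_mix(list_audio_devices()):
#       print(dev["index"], dev["name"])
```

An empty result usually means the device is disabled in the sound settings or is listed under a different name.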

Example

rtt stream -l Japanese -d 5

-l or --language: the spoken language, for example English, Japanese or Chinese.
-d or --delay_time: a number less than 30 that sets the length, in seconds, of each audio clip. Each clip is sent to the Whisper model, which outputs the text.
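
The effect of the delay time can be pictured as cutting the incoming audio into fixed-length clips. The sketch below is an illustration under assumptions, not rtt's actual code: split_into_clips is a hypothetical name, and the samples are assumed to be a flat sequence of 16 kHz mono values. The 30-second limit matches the 30-second window that Whisper models process at a time.

```python
def split_into_clips(samples, sample_rate, delay_time):
    """Cut `samples` into consecutive clips of `delay_time` seconds each."""
    # Hypothetical helper for illustration; the last clip may be shorter.
    if not 0 < delay_time < 30:
        raise ValueError("delay_time must be greater than 0 and less than 30")
    clip_len = max(1, int(sample_rate * delay_time))
    return [samples[i:i + clip_len] for i in range(0, len(samples), clip_len)]


# 12 seconds of silence at 16 kHz, cut into 5-second clips.
audio = [0] * (16000 * 12)
clips = split_into_clips(audio, sample_rate=16000, delay_time=5)
print([len(clip) / 16000 for clip in clips])  # [5.0, 5.0, 2.0]
```

A shorter delay time gives lower latency but leaves the model less context per clip, and a cut can land in the middle of a sentence.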

rtt yt -u https://www.youtube.com/watch?.... -o tsp.docx
rtt yt -u https://www.youtube.com/watch?.... -o tsp.srt -l Chinese

-u or --url: the YouTube video URL.
-o or --output_file: the output file name; the default is the title of the YouTube video, saved as a .docx file. If the extension is .srt or .txt, the transcription is saved as a text file with timestamps.
-l or --language: the spoken language, for example Chinese, English or Japanese.
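
For reference, timestamped subtitle output can be built from Whisper-style segments, which carry "start", "end" and "text" fields. The following is a minimal sketch of the standard SubRip (.srt) layout, not necessarily the exact format rtt writes; format_timestamp and segments_to_srt are hypothetical names.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        time_line = (format_timestamp(seg["start"]) + " --> "
                     + format_timestamp(seg["end"]))
        blocks.append(f"{number}\n{time_line}\n{seg['text'].strip()}\n")
    # Subtitle blocks are separated by one blank line.
    return "\n".join(blocks)


example = [
    {"start": 0.0, "end": 2.5, "text": " Hello"},
    {"start": 2.5, "end": 4.0, "text": " world"},
]
print(segments_to_srt(example))
```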

Future work

Separate the audio data by loudness, to avoid splitting in the middle of a sentence.
Fix the issue where buttons do not work when the tkinter user interface is transparent, so that the program can be started and stopped, and its config changed, from the user interface.
Add some config options, such as the Whisper model (currently medium), the recording device, and the font color and size.
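
One possible approach to the loudness-based splitting mentioned above, sketched here purely as an assumption about how it could work, is to look for the quietest short frame near the planned cut point and cut there instead. quietest_cut and rms are hypothetical names, and the window sizes are arbitrary defaults.

```python
import math


def rms(frame):
    """Root-mean-square level of a sequence of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def quietest_cut(samples, sample_rate, target_s, search_s=1.0, frame_s=0.02):
    """Return the sample index of the quietest frame near `target_s` seconds.

    Only frames that start within `search_s` seconds of the target are
    considered, so the clip length stays close to the requested delay time.
    """
    frame_len = max(1, round(sample_rate * frame_s))
    target = round(sample_rate * target_s)
    search = round(sample_rate * search_s)
    lo = max(0, target - search)
    hi = min(len(samples), target + search)
    best_pos, best_level = min(target, len(samples)), float("inf")
    for start in range(lo, max(lo, hi - frame_len) + 1, frame_len):
        level = rms(samples[start:start + frame_len])
        if level < best_level:
            # Cut in the middle of the quietest frame seen so far.
            best_pos, best_level = start + frame_len // 2, level
    return best_pos
```

With a 5-second delay time and the default 1-second search window, clips would vary between roughly 4 and 6 seconds, which stays well under the 30-second limit.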
