Transcribe video or audio files (anything ffmpeg can decode) to timestamped text locally (no cloud APIs).
- macOS, Windows, or Linux
- Python 3.10+ (3.11 recommended)
- ffmpeg installed and available on your PATH
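The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of this repo: the function name `check_prerequisites` is hypothetical, and it only verifies the Python version and that an `ffmpeg` executable is discoverable on PATH.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; it is not shipped with the transcriber package.
    """
    problems = []

    # The README asks for Python 3.10 or newer.
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ is required, found {sys.version.split()[0]}")

    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")

    return problems


if __name__ == "__main__":
    found = check_prerequisites()
    if found:
        for problem in found:
            print("Problem:", problem)
    else:
        print("All prerequisites look fine.")
```

If the ffmpeg check fails, the platform-specific install commands that follow should resolve it.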
Install ffmpeg (macOS):
brew install ffmpeg
Install ffmpeg (Windows):
winget install Gyan.FFmpeg
or
choco install ffmpeg
Install ffmpeg (Linux, Debian/Ubuntu):
sudo apt-get update
sudo apt-get install -y ffmpeg
From this repo folder:
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
python -m pip install -U pip
pip install -r requirements.txt
On Windows (PowerShell):
py -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -r requirements.txt
On Windows (cmd.exe):
py -m venv .venv
.\.venv\Scripts\activate.bat
python -m pip install -U pip
pip install -r requirements.txt
Transcribe a media file to timestamped text + JSON segments:
python -m transcriber.cli --input /path/to/video.mkv --model small --outdir ./output
Outputs:
- output/<video>.timestamps.txt
- output/<video>.segments.json
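For scripting, the command and output names above can be sketched in Python. This is an illustration under stated assumptions: the helper names `build_command` and `expected_outputs` are hypothetical, only the flags shown in the command above are used, and `<video>` is assumed to be the input file name without its extension.

```python
import sys
from pathlib import Path


def build_command(input_path, model="small", outdir="./output"):
    """Build the CLI invocation as an argument list (hypothetical helper)."""
    return [
        sys.executable,  # reuse the interpreter of the active virtual environment
        "-m", "transcriber.cli",
        "--input", str(input_path),
        "--model", model,
        "--outdir", str(outdir),
    ]


def expected_outputs(input_path, outdir="./output"):
    """Return the (timestamps, segments) paths the run is expected to write.

    Assumption: <video> is the input file name without its extension.
    """
    stem = Path(input_path).stem
    out = Path(outdir)
    return out / f"{stem}.timestamps.txt", out / f"{stem}.segments.json"


if __name__ == "__main__":
    media = "/path/to/video.mkv"
    print(" ".join(build_command(media)))
    for path in expected_outputs(media):
        print(path)
```

The argument list can be passed to `subprocess.run(..., check=True)` to transcribe several files in a loop.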
Launch the GUI:
python -m transcriber.gui
- The first run will download the selected Whisper model weights into your local cache (still running on-device).
- All transcription happens locally on your machine.
- --model: Whisper model name (e.g. tiny, base, small, medium, large-v3)
- --language: defaults to en
- --device: defaults to auto (tries GPU backends first, falls back to CPU)
- --compute-type: if omitted, defaults based on device (GPU: float16, CPU: int8)
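The defaulting rule for --compute-type can be illustrated with a small sketch. It is not the repo's actual code: the function name `resolve_compute_type`, the `gpu_available` flag, and the choice to treat every device other than "cpu" as a GPU backend are assumptions made for illustration.

```python
def resolve_compute_type(device, compute_type=None, gpu_available=False):
    """Pick a compute type following the documented defaults (illustrative sketch).

    Assumptions: "auto" resolves to a GPU when gpu_available is True and to the
    CPU otherwise; any device string other than "cpu" is treated as a GPU backend.
    """
    # An explicit --compute-type always wins over the defaults.
    if compute_type is not None:
        return compute_type

    # "auto" tries GPU backends first and falls back to CPU.
    if device == "auto":
        device = "gpu" if gpu_available else "cpu"

    # Documented defaults: GPU -> float16, CPU -> int8.
    return "int8" if device == "cpu" else "float16"


if __name__ == "__main__":
    print(resolve_compute_type("auto"))
    print(resolve_compute_type("auto", gpu_available=True))
    print(resolve_compute_type("cpu", compute_type="float32"))
```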