STT Benchmark

A TypeScript + Python web UI for benchmarking open-source speech-to-text models on CPU.

| Model | Variants | Notes |
| --- | --- | --- |
| faster-whisper | tiny · base · small · medium | CTranslate2-optimised Whisper |
| whisper.cpp | base.en (configurable) | Pure C++ inference |
| Vosk | small-en (configurable) | Kaldi-based, fully offline |
| WhisperX | base (configurable) | Whisper + forced alignment |

Prerequisites

| Tool | Minimum version | Check |
| --- | --- | --- |
| Node.js | 18 LTS | node -v |
| Python | 3.10 | python3 --version |
| ffmpeg | any recent | ffmpeg -version |
| cmake | any recent | cmake --version |
| whisper.cpp binary | see §4 | n/a |

Install system dependencies

Ubuntu / Debian

sudo apt update && sudo apt install -y ffmpeg python3-pip python3-venv cmake

macOS (Homebrew)

brew install ffmpeg node python cmake

Windows
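
One option is to install the same tools with the Chocolatey package manager:

choco install -y ffmpeg nodejs python cmake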


1. Clone & install Node dependencies

git clone <repo-url> stt-benchmark
cd stt-benchmark
npm install

2. Install Python dependencies

It is strongly recommended to use a virtual environment.

python3 -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate

# CPU-only PyTorch (saves ~2 GB vs the CUDA wheel)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

# All other deps
pip install -r requirements.txt

Note: whisperx has a transitive dependency on ctranslate2. If the pip install fails, try upgrading pip first: pip install --upgrade pip.


3. Install Vosk model

mkdir -p models
cd models

# Small English model (~40 MB) — fastest
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

# Full English model (~1.8 GB) — more accurate
# wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
# unzip vosk-model-en-us-0.22.zip

By default the server looks for:

models/vosk-model-small-en-us-0.15/

Override with an environment variable:

export VOSK_MODEL_PATH=/absolute/path/to/vosk-model

4. Install whisper.cpp

Option A — build from source (recommended)

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build && cmake --build build --config Release -j$(nproc)

# Download a GGML model
bash models/download-ggml-model.sh base.en

# Copy binary and model into the project
cp build/bin/whisper-cli ../stt-benchmark/whisper-cli
cp models/ggml-base.en.bin ../stt-benchmark/models/
cd ../stt-benchmark

Option B — pre-built binary (Linux x86_64)

wget https://github.com/ggerganov/whisper.cpp/releases/latest/download/whisper.cpp-linux-x64.tar.gz
tar -xzf whisper.cpp-linux-x64.tar.gz

Environment variables:

export WHISPER_CPP_BIN=/path/to/whisper-cli      # default: "whisper-cli" in project root
export WHISPER_CPP_MODEL=/path/to/ggml-base.en.bin   # default: models/ggml-base.en.bin

The binary produced by recent whisper.cpp builds is named whisper-cli, not main.
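
As a quick sanity check, you can run the binary directly (-m selects the GGML model, -f the input audio):

./whisper-cli -m models/ggml-base.en.bin -f sample.wav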


5. Configure credentials

Copy the example env file and set your credentials:

cp .env.example .env

Edit .env:

AUTH_USER=your_username
AUTH_PASS=your_password
PORT=3001

The server requires HTTP Basic Auth on all API routes. The frontend login screen reads the same credentials. .env is gitignored and never committed.


6. Build the TypeScript frontend

# Compile app.ts → frontend/dist/app.js, then copy next to index.html
npm run build:frontend
cp frontend/dist/app.js frontend/app.js

7. Start and stop the server

Use the provided scripts (they handle venv activation, port cleanup, and health checking):

# Start (waits up to 20 s for the server to be ready)
./start.sh

# Stop
./stop.sh

Or run manually:

source .venv/bin/activate
npm run dev

The server starts on http://localhost:3001 (or the PORT set in .env).

When accessed over a network (e.g. a VM's public IP), use the machine's IP directly — e.g. http://34.x.x.x:3001. All API calls use relative URLs so they follow the page origin.


8. Test each model individually

All scripts accept an audio file path and output JSON. You can test them directly:

faster-whisper

python3 backend/scripts/run_faster_whisper.py sample.wav tiny
python3 backend/scripts/run_faster_whisper.py sample.wav base
python3 backend/scripts/run_faster_whisper.py sample.wav small
python3 backend/scripts/run_faster_whisper.py sample.wav medium
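
Each script prints a single JSON result to stdout, so you can also drive one from Python. A minimal sketch:

import json
import subprocess

# Run a backend script and pretty-print whatever JSON it emits.
out = subprocess.run(
    ["python3", "backend/scripts/run_faster_whisper.py", "sample.wav", "base"],
    capture_output=True, text=True, check=True,
).stdout
print(json.dumps(json.loads(out), indent=2))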

WhisperX

python3 backend/scripts/run_whisperx.py sample.wav

Vosk

python3 backend/scripts/vosk_transcribe.py sample.wav models/vosk-model-small-en-us-0.15

whisper.cpp (via curl with auth)

curl -u your_username:your_password -X POST http://localhost:3001/transcribe/whisper-cpp \
  -F "audio=@sample.wav"

All at once via the API

AUTH="-u your_username:your_password"

# List available models
curl $AUTH http://localhost:3001/models

# Transcribe with faster-whisper base
curl $AUTH -X POST http://localhost:3001/transcribe/faster-whisper-base \
  -F "audio=@sample.wav"

# View saved benchmark results
curl $AUTH http://localhost:3001/benchmark

API Reference

All endpoints require HTTP Basic Auth (Authorization: Basic <base64>) or the X-API-Key: <base64(user:pass)> header.
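
For example, you can build the header by hand with Python's standard library. A minimal sketch (your_username / your_password stand in for whatever is set in .env):

import base64
import urllib.request

# X-API-Key carries base64("user:pass"), the same value Basic Auth sends.
key = base64.b64encode(b"your_username:your_password").decode()
req = urllib.request.Request("http://localhost:3001/models",
                             headers={"X-API-Key": key})
print(urllib.request.urlopen(req).read().decode())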

Single-file benchmarking

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /models | List all available model IDs |
| POST | /transcribe/:modelId | Upload audio (multipart audio field) and transcribe |
| GET | /benchmark | Return all saved single-file results |
| DELETE | /benchmark | Clear all single-file results |
| DELETE | /benchmark/:id | Delete one result by ID |

Batch / ZIP benchmarking

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /benchmark-batch | Upload a ZIP containing audio files + mapping.json |
| GET | /benchmark-batch | Return all past batch analyses |
| DELETE | /benchmark-batch | Clear all batch results |
| DELETE | /benchmark-batch/:id | Delete one batch result by ID |

ZIP structure (flexible — files may be at root or inside a single folder):

my-test.zip
├── mapping.json          ← required: maps filename → reference transcript
├── audio1.wav
└── audio2.wav

mapping.json format:

{
  "audio1.wav": "The reference transcript for audio one.",
  "audio2.wav": "Another reference transcript."
}
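
To assemble such a ZIP programmatically, here is a minimal Python sketch (the file names match the example above and are assumed to exist on disk):

import json
import zipfile

mapping = {
    "audio1.wav": "The reference transcript for audio one.",
    "audio2.wav": "Another reference transcript.",
}

with zipfile.ZipFile("my-test.zip", "w") as zf:
    # mapping.json at the ZIP root, audio files alongside it
    zf.writestr("mapping.json", json.dumps(mapping, indent=2))
    for name in mapping:
        zf.write(name)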

The batch endpoint runs every selected model against every audio file and returns per-file WER (Word Error Rate) and CER (Character Error Rate) alongside transcription time and audio duration.
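
For reference, WER is the word-level Levenshtein (edit) distance between hypothesis and reference divided by the reference length, and CER is the same computation over characters. A sketch of the standard formula (not necessarily the backend's exact implementation):

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)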

Audio duration backfill

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /backfill-durations/scan | Scan uploads/ and backfill durations for existing results |
| POST | /backfill-durations/zip | Upload a ZIP to backfill durations for results that lost their audio |
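
For example, to rescan existing uploads (same auth as in the examples above):

curl $AUTH http://localhost:3001/backfill-durations/scan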

Model IDs

| ID | Description |
| --- | --- |
| faster-whisper-tiny | faster-whisper, tiny variant |
| faster-whisper-base | faster-whisper, base variant |
| faster-whisper-small | faster-whisper, small variant |
| faster-whisper-medium | faster-whisper, medium variant |
| whisper-cpp | whisper.cpp, ggml-base.en |
| vosk | Vosk small-en |
| whisperx | WhisperX base |

Single-file response shape

{
  "id": "uuid",
  "model": "faster-whisper-base",
  "variant": "base",
  "transcription": "Hello world this is a test.",
  "timeTakenMs": 1842,
  "cpuPercent": 98.5,
  "audioFile": "test.wav",
  "audioDurationMs": 4200,
  "timestamp": "2026-04-16T12:34:56.000Z",
  "error": null
}
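
A useful metric derived from these fields is the real-time factor, i.e. processing time over audio duration (a hypothetical helper, not part of the API response):

def real_time_factor(result: dict) -> float:
    # < 1.0 means the model transcribes faster than real time
    return result["timeTakenMs"] / result["audioDurationMs"]

real_time_factor({"timeTakenMs": 1842, "audioDurationMs": 4200})  # ~0.44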

Project Structure

stt-benchmark/
├── backend/
│   ├── server.ts                  ← Express REST API (ts-node)
│   └── scripts/
│       ├── run_faster_whisper.py  ← faster-whisper inference
│       ├── run_whisperx.py        ← WhisperX inference
│       └── vosk_transcribe.py     ← Vosk inference
├── frontend/
│   ├── index.html                 ← Single-page UI + login overlay
│   ├── app.ts                     ← TypeScript source
│   └── app.js                     ← Compiled output (copy from dist/ after build)
├── models/                        ← Vosk model dir + GGML .bin (not committed)
├── uploads/                       ← Temporary audio uploads (runtime, not committed)
├── .env                           ← Credentials & port (not committed — see .env.example)
├── .env.example                   ← Template for .env
├── benchmark_results.json         ← Persistent single-file results (runtime)
├── batch_results.json             ← Persistent batch results (runtime)
├── whisper-cli                    ← whisper.cpp binary (not committed)
├── start.sh                       ← Start server with health check
├── stop.sh                        ← Stop server cleanly
├── package.json
├── tsconfig.json                  ← Backend TS config
├── tsconfig.frontend.json         ← Frontend TS config
└── requirements.txt

Troubleshooting

Login screen shown / 401 errors — ensure .env exists with correct AUTH_USER/AUTH_PASS. The frontend uses HTTP Basic Auth; credentials are stored in sessionStorage for the browser session only.

CORS / private network error in browser — if accessing via a VM's public IP, open the app at http://<public-ip>:3001 directly, not via a proxy that rewrites the origin to localhost.

faster-whisper first run is slow — it downloads model weights on first use (~150 MB for tiny). Subsequent runs use the local cache.

whisperx ImportError on ctranslate2 — try pip install ctranslate2==4.1.0 explicitly.

Vosk "model not found" — ensure VOSK_MODEL_PATH points to the extracted directory (not the zip).

whisper.cpp "command not found" — set WHISPER_CPP_BIN to the full path of the whisper-cli binary, or place it in the project root.

Audio format errors — the backend accepts WAV, MP3, FLAC, OGG, M4A. Vosk converts internally via ffmpeg; ensure ffmpeg is in PATH.

High memory on medium model — faster-whisper medium requires ~1.5 GB RAM. Use tiny or base for constrained environments.

EADDRINUSE on start — start.sh uses fuser to kill any existing process on the port before starting. If fuser is not installed: sudo apt install psmisc.
