Audio Transcriber

Overview

Web-first experience: A SvelteKit frontend that accepts uploaded audio and displays AI-generated transcripts with timestamps and metadata.
FastAPI backend: Hosts the transcription queue, exposes REST endpoints, and processes uploads via OpenAI Whisper.
Browser focus: The project ships as a documented web stack with no desktop runtime, so anyone can clone it, run it, and contribute to it from their browser.
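To make "transcripts with timestamps" concrete, here is a minimal sketch of how Whisper output could be turned into timestamped lines. It is not taken from this repository: `format_timestamp`, `format_segments`, and `transcribe_file` are hypothetical names, and the sketch assumes the `openai-whisper` package, whose `transcribe` result contains a `segments` list with `start`, `end`, and `text` fields.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: Iterable[Mapping]) -> list[str]:
    """Turn Whisper-style segments into '[start -> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path: str, model_name: str = "medium") -> list[str]:
    """Hypothetical helper: needs the openai-whisper package and ffmpeg."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return format_segments(result["segments"])
```

The formatting helpers are pure functions, so they can be exercised without downloading a model; only `transcribe_file` touches Whisper.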

Stack

Frontend: SvelteKit + Vite, TypeScript, scoped CSS.
Backend: FastAPI + Whisper/PyTorch, Python 3.12.
Tooling: start.sh boots both services together (frontend on 5173, backend on 8000).

Getting started

1. Prerequisites

Python 3.12 (any 3.12.x release) with pip, which is normally bundled with Python.
Node.js (tested on v22.x via Volta or nvm) and npm.
ffmpeg (required for audio format processing; on macOS, install it with brew install ffmpeg).
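The prerequisites above can be checked before running any setup step. The following is a small sketch using only the standard library; `check_prerequisites` is a hypothetical helper, not part of this repository, and it only tests whether each tool is on PATH, not which version is installed.

```python
import shutil
import sys


def check_prerequisites(tools=("ffmpeg", "node", "npm")) -> dict[str, bool]:
    """Report whether Python 3.12 is running and each tool is on PATH."""
    report = {"python3.12": sys.version_info[:2] == (3, 12)}
    for tool in tools:
        # shutil.which returns the executable's path, or None if not found
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the interpreter you plan to use for the backend, since the Python check reflects whichever interpreter executes the script.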

System Requirements

This project uses Whisper's medium model (769M parameters, roughly a 1.5 GB download) for transcription. Recommended hardware:

RAM: 8GB minimum, 16GB recommended (model needs ~2-3GB RAM when loaded)
CPU: 4+ cores recommended for reasonable transcription speed
GPU: Optional but significantly faster
- NVIDIA: CUDA-compatible GPU with 4GB+ VRAM
- Apple Silicon (M1/M2/M3): Automatic Metal acceleration
- CPU-only: Works but slower (expect roughly twice as fast as realtime for the medium model)
Storage: 2GB+ free space (for model download and temporary files)
OS: macOS 10.15+, Linux, or Windows 10+

Performance Notes:

CPU-only transcription runs at about 2x realtime, so a 10-minute audio file takes ~5 minutes to transcribe.
GPU acceleration can reach 10-30x realtime depending on hardware.
The first run downloads the model (roughly 1.5 GB), which may take a few minutes.
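The realtime figures translate into wall-clock time by simple division. As a rough sketch (the function name `estimated_minutes` is made up for illustration, and real speed varies with hardware and audio content):

```python
def estimated_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Estimate transcription time: audio length divided by the speed factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_minutes / realtime_factor


if __name__ == "__main__":
    # A 10-minute file on CPU (~2x realtime) vs. a GPU at the low end (~10x)
    print(estimated_minutes(10, 2))   # 5.0
    print(estimated_minutes(10, 10))  # 1.0
```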

2. Backend setup (one-time)

cd backend
python3.12 -m venv venv  # Creates virtual environment (venv is built into Python)
source venv/bin/activate
pip install -r requirements.txt  # pip is typically bundled with Python

Once started, the backend listens on http://localhost:8000 and serves its OpenAPI documentation at /docs.
Uploads from the frontend stream through transcriptionStore.uploadFile, which reports progress via callbacks.
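Since the backend serves /docs, a quick way to confirm it is running is to request that page. This is a minimal sketch with the standard library; `backend_is_up` is a hypothetical helper, and the default URL simply mirrors the address stated above.

```python
import urllib.request


def backend_is_up(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/docs answers with HTTP 200."""
    # An empty ProxyHandler keeps local checks from going through a system proxy
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/docs", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses)
        return False


if __name__ == "__main__":
    print("backend reachable" if backend_is_up() else "backend not reachable")
```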

3. Frontend setup (one-time)

cd frontend
npm install

The SvelteKit app runs on http://localhost:5173 and targets the backend API.

4. Run both together

After completing the one-time setup above, you can start both services with:

./start.sh

start.sh activates the backend venv, boots main.py, waits a few seconds, then runs npm run dev.
Press Ctrl+C once to stop both servers.
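For readers who cannot run a shell script (for example on Windows), the sequence described above can be approximated in Python. This is a sketch of the same idea, not a port of start.sh: `start_services`, `stop_services`, and `main` are hypothetical names, the 3-second pause is an assumed stand-in for "a few seconds", and the sketch assumes it is run from the repository root with the backend venv's interpreter.

```python
import subprocess
import sys
import time


def stop_services(procs, grace_seconds=5.0):
    """Terminate every process, escalating to kill if one does not exit."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def start_services(backend_cmd, frontend_cmd, wait_seconds=3.0,
                   backend_cwd=None, frontend_cwd=None):
    """Start the backend, pause, then start the frontend; return both."""
    backend = subprocess.Popen(backend_cmd, cwd=backend_cwd)
    time.sleep(wait_seconds)  # crude head start for the backend
    try:
        frontend = subprocess.Popen(frontend_cmd, cwd=frontend_cwd)
    except Exception:
        stop_services([backend])  # do not leave the backend orphaned
        raise
    return [backend, frontend]


def main():
    """Assumed layout: backend/main.py and a frontend/ npm project."""
    procs = start_services(
        [sys.executable, "main.py"], ["npm", "run", "dev"],
        backend_cwd="backend", frontend_cwd="frontend",
    )
    try:
        # Keep running until either service exits or Ctrl+C is pressed
        while all(p.poll() is None for p in procs):
            time.sleep(0.5)
    except KeyboardInterrupt:
        pass
    finally:
        stop_services(procs)
```

Calling `main()` blocks until one service exits or Ctrl+C is pressed, then stops whatever is still running, which matches the single Ctrl+C behavior described above.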

Troubleshooting

Permission errors when running npm install usually mean you need a clean Volta/Node.js install.
If uploads fail, check the backend logs (the output of backend/main.py) for Whisper errors and verify that the frontend points to the correct backend endpoint in frontend/src/lib/api.ts.
