- Web-first experience: A SvelteKit frontend that accepts uploaded audio and displays AI-generated transcripts with timestamps and metadata.
- FastAPI backend: Hosts the transcription queue, exposes REST endpoints, and processes uploads via OpenAI Whisper.
- Browser focus: The project ships as a documented web stack without desktop runtimes so anyone can clone, run, and contribute via their browser.
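The transcription queue at the heart of the backend can be illustrated with a minimal stdlib sketch. This is an assumption-laden toy, not the repository's code: the names `jobs`, `results`, and `worker` are hypothetical, and the real service pairs the queue with FastAPI endpoints and runs Whisper inside the worker.

```python
import queue
import threading

# Hypothetical in-process job queue; the real backend runs Whisper
# inside the worker and exposes the results over REST.
jobs: queue.Queue = queue.Queue()
results: dict[str, str] = {}

def transcribe(path: str) -> str:
    # Stand-in for Whisper transcription of the uploaded file.
    return f"transcript of {path}"

def worker() -> None:
    while True:
        job_id, path = jobs.get()     # blocks until a job is enqueued
        results[job_id] = transcribe(path)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
jobs.put(("job-1", "upload.mp3"))     # an upload becomes a queued job
jobs.join()                           # wait until the queue drains
```

The daemon worker thread means in-flight jobs are processed sequentially while HTTP handlers stay responsive; a production queue would add persistence and error handling.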
- Frontend: SvelteKit + Vite, TypeScript, scoped CSS.
- Backend: FastAPI + Whisper/PyTorch, Python 3.12.
- Tooling: `start.sh` boots both services together (frontend on 5173, backend on 8000).

Prerequisites:

- `python3.12` (or a compatible 3.12.x environment) with `pip` (pip is typically bundled with Python).
- Node.js (tested on v22.x via Volta or nvm) and `npm`.
- `ffmpeg` (required for audio format processing; install via `brew install ffmpeg` on macOS).
This project uses Whisper's medium model (~769 MB) for transcription. Recommended hardware:
- RAM: 8GB minimum, 16GB recommended (model needs ~2-3GB RAM when loaded)
- CPU: 4+ cores recommended for reasonable transcription speed
- GPU: Optional but significantly faster
- NVIDIA: CUDA-compatible GPU with 4GB+ VRAM
- Apple Silicon (M1/M2/M3): Automatic Metal acceleration
- CPU-only: Works but slower (expect ~2x realtime for medium model)
- Storage: 2GB+ free space (for model download and temporary files)
- OS: macOS 10.15+, Linux, or Windows 10+
Performance Notes:
- CPU-only transcription speed: ~2x realtime (a 10-minute audio file takes ~5 minutes to transcribe)
- GPU acceleration can achieve 10-30x realtime depending on hardware
- First run will download the model (~769 MB) which may take a few minutes
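The CUDA → Metal → CPU fallback described above follows PyTorch's device availability checks. The helper below is a hypothetical sketch, written with explicit flags so it is easy to test; in the real backend the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the fastest available device string for Whisper/PyTorch."""
    if cuda_available:
        return "cuda"   # NVIDIA GPU with CUDA (4GB+ VRAM recommended)
    if mps_available:
        return "mps"    # Apple Silicon Metal acceleration
    return "cpu"        # works everywhere, ~2x realtime for the medium model
```

Preferring CUDA over Metal matches the performance ordering above: discrete NVIDIA GPUs generally transcribe fastest, Apple Silicon next, CPU last.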
```bash
cd backend
python -m venv venv              # create the virtual environment (venv is built into Python)
source venv/bin/activate
pip install -r requirements.txt  # pip is typically bundled with Python
```

- The backend listens on `http://localhost:8000` and exposes `/docs` for OpenAPI.
- Uploads via the frontend stream through `transcriptionStore.uploadFile` with progress callbacks.
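The timestamps shown in the frontend can be derived from Whisper's per-segment output. The formatting helpers below are a sketch: `whisper.load_model` and `model.transcribe` (shown only in the comment) are the real openai-whisper calls, while the helper names are hypothetical.

```python
# With openai-whisper installed, segments come from:
#   import whisper
#   model = whisper.load_model("medium")
#   segments = model.transcribe("upload.mp3")["segments"]

def format_timestamp(seconds: float) -> str:
    """Render a segment offset as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def format_segments(segments: list[dict]) -> list[str]:
    """Turn Whisper segments into timestamped transcript lines."""
    return [f"[{format_timestamp(s['start'])}] {s['text'].strip()}"
            for s in segments]
```

For example, a segment starting at 65.2 seconds is rendered as `[01:05]`.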
```bash
cd frontend
npm install
```

- The SvelteKit app runs on `http://localhost:5173` and targets the backend API.
After completing the one-time setup above, you can start both services with:
```bash
./start.sh
```

- `start.sh` activates the backend venv, boots `main.py`, waits a few seconds, then runs `npm run dev`.
- Press `Ctrl+C` once to stop both servers.
- Permissions errors when running `npm install` usually mean you need a clean Volta/Node install.
- If uploads fail, check the `backend/main.py` logs for `whisper` errors and verify the frontend is pointing to the correct backend endpoint in `frontend/src/lib/api.ts`.