Like SuperWhisper, but free. Like Wispr Flow, but local.
Hold a key, speak, release — your words appear wherever your cursor is. Built for vibe coding and conversations with AI agents.
- Free & open source — no subscription, no cloud dependency
- Runs locally on Apple Silicon via MLX Whisper or Parakeet
- Or use cloud (Groq) when you prefer
- One-command install — `uv tool install git+https://github.com/bokan/stt.git`
- Global hotkey — works in any application, configurable trigger key
- Hold-to-record — no start/stop buttons, just hold and speak
- Auto-type — transcribed text is typed directly into the active field
- Shift+record — automatically sends Enter after typing (great for chat interfaces)
- Audio feedback — subtle system sounds confirm recording state (can be disabled)
- Silence detection — automatically skips transcription when no speech detected
- Slash commands — say "slash help" to type `/help`
- Context prompts — improve accuracy with domain-specific vocabulary
- Auto-updates — notifies when a new version is available
- macOS with Apple Silicon (M1/M2/M3/M4)
- UV package manager (see the install note below)
- For cloud mode (optional): Groq API key
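If UV isn't already installed, the standard Astral installer script (or Homebrew) sets it up; this is the generic UV install, not anything stt-specific:

```bash
# Official Astral installer script for UV
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or via Homebrew
brew install uv
```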
```bash
uv tool install git+https://github.com/bokan/stt.git
```

On first run, a setup wizard will guide you through configuration.
To update:
```bash
uv tool install --reinstall git+https://github.com/bokan/stt.git
```

STT needs macOS permissions to capture the global hotkey and type text into other apps.
Grant these to your terminal app (iTerm2, Terminal, Warp, etc.) — not "stt":
- Accessibility — System Settings → Privacy & Security → Accessibility
- Input Monitoring — System Settings → Privacy & Security → Input Monitoring
Then run:

```bash
stt
```

| Action | Keys |
|---|---|
| Record | Hold Right Command (default) |
| Record + Enter | Hold Shift while recording |
| Cancel recording / stuck transcription | ESC |
| Quit | Ctrl+C |
Settings are stored in `~/.config/stt/.env`. Run `stt --config` to reconfigure, or edit directly:

```
# Transcription provider: "mlx" (default), "whisper-cpp-http", "parakeet", or "groq"
PROVIDER=mlx
# Local HTTP server URL (default: http://localhost:8080)
WHISPER_CPP_HTTP_URL=http://localhost:8080
# Required for cloud mode only
GROQ_API_KEY=gsk_...
# Audio device (saved automatically after first selection; device name, not index)
AUDIO_DEVICE=MacBook Pro Microphone
# Language code for transcription
LANGUAGE=en
# Trigger key: cmd_r, cmd_l, alt_r, alt_l, ctrl_r, ctrl_l, shift_r
HOTKEY=cmd_r
# Context prompt to improve accuracy for specific terms
PROMPT=Claude, Anthropic, TypeScript, React, Python
# Audio feedback sounds (set to false to disable)
SOUND_ENABLED=true
```

STT includes a prompt overlay (triggered by Right Option by default) for quickly pasting common prompts.
Prompts live in `~/.config/stt/prompts/*.md`.
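Each `*.md` file there presumably holds one pasteable prompt. As a rough sketch (the filename and text below are just placeholders):

```bash
mkdir -p ~/.config/stt/prompts
cat > ~/.config/stt/prompts/code-review.md <<'EOF'
Review this diff for correctness, edge cases, and naming.
Keep the feedback concise and actionable.
EOF
```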
Local transcription uses Apple Silicon GPU acceleration via MLX. On first run, the Whisper large-v3 model (~3GB) will be downloaded and cached. Subsequent runs load from cache.
Runs completely offline — no API key required. Supports 99 languages and context prompts.
The parakeet provider runs NVIDIA's Parakeet model via MLX. It is faster than Whisper (~3000x realtime factor) with comparable accuracy.
```
PROVIDER=parakeet
```

On first run, the model (~2.5GB) will be downloaded and cached.
Limitations:
- English only
Phonetic correction: Parakeet doesn't support Whisper-style prompts, but it uses the `PROMPT` setting for phonetic post-processing. Terms listed there (e.g., Claude Code, WezTerm) are used to correct sound-alike ASR errors ("cloud code" → "Claude Code", "Vez term" → "WezTerm").
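Putting that together, a Parakeet setup in `~/.config/stt/.env` might look like this (the term list is only an example):

```
PROVIDER=parakeet
# With Parakeet, PROMPT is used for phonetic post-processing, not as a Whisper-style prompt
PROMPT=Claude Code, WezTerm, Anthropic
```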
To use cloud transcription instead:
```
PROVIDER=groq
GROQ_API_KEY=gsk_...
```

Requires a Groq API key (free tier available).
Run a local HTTP server with Whisper transcription. Useful for performance or custom integration.
```
PROVIDER=whisper-cpp-http
WHISPER_CPP_HTTP_URL=http://localhost:8080
```

Start the server:
```bash
# Terminal 1: Start the whisper.cpp server
./whisper-server -m models/ggml-large-v3.bin -f

# Or run in background with a custom port
./whisper-server -m models/ggml-large-v3.bin -f -t 4 -ngl 32 -p 8080
```

The server provides a whisper.cpp-compatible endpoint:
```bash
curl -X POST http://localhost:8080/inference \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "language=en"
```

Benefits:
- Fast HTTP API for integrating with other services (see the sketch after this list)
- Reuse whisper.cpp model across multiple applications
- Hardware acceleration on CPU or NVIDIA GPUs
- Configurable temperature, model, and decoding options
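As a sketch of that integration, another service could post audio and pull out just the transcript; this assumes the server's default JSON response (with a `text` field) and that `jq` is installed:

```bash
# Transcribe a WAV file and print only the recognized text
curl -s -X POST http://localhost:8080/inference \
  -F "file=@audio.wav" \
  -F "language=en" | jq -r '.text'
```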
The `PROMPT` setting helps Whisper recognize domain-specific terms:
```
# Programming
PROMPT=TypeScript, React, useState, async await, API endpoint

# AI tools
PROMPT=Claude, Anthropic, OpenAI, Groq, LLM, GPT
```

To run from source:

```bash
git clone https://github.com/bokan/stt.git
cd stt
uv sync
uv run stt
```

License: MIT
