Speech-to-text and AI text processing for Windows. Transcribe audio using on-device AI models or cloud APIs via extensions, then process the result with custom LLM prompts. Your voice data stays on your PC with local models — or use cloud APIs for faster processing.
- On-device models — Parakeet TDT 0.6B (25+ languages, fast) and Canary 180M Flash (EN/DE/FR/ES with translation), running on CPU via SherpaOnnx with int8 quantization — no GPU required
- Cloud transcription — Groq Whisper, OpenAI Whisper, AssemblyAI, Deepgram, and any OpenAI-compatible server (Ollama, LM Studio, vLLM). API keys encrypted at rest via DPAPI
- Streaming preview — Silero VAD detects speech segments during recording and transcribes them in real time, showing partial results in the overlay before recording stops
- Live transcription — Floating window with real-time continuous transcription (via LiveTranscript plugin)
- File transcription — Drag-and-drop audio/video files. Supports WAV, MP3, M4A, AAC, OGG, FLAC, WMA, MP4, MKV, AVI, MOV, WebM
- Subtitle export — Export transcriptions as TXT, SRT, or WebVTT
- System-wide — Three independent hotkeys: Hybrid (short press toggles, long press is push-to-talk), Toggle-only, and Hold-only. Per-profile hotkeys for direct profile activation. Auto-pastes into any app
- Non-blocking pipeline — Multiple recordings can be queued while transcription runs in the background
- Sound feedback — Audio cues for recording start and stop
- Silence detection — Automatically stops recording after configurable silence period
- Whisper mode — Boosted microphone gain for quiet speech
- Audio normalization — Automatic gain control for consistent input levels
- Media pause — Automatically pauses media playback during recording
- Audio ducking — Reduces system volume while recording
- Custom prompts — Process transcriptions (or any text) with LLM prompts. Standalone Prompt Palette via global hotkey — a floating panel for AI text processing independent of dictation
- LLM providers — Groq, OpenAI, Google Gemini, and any OpenAI-compatible server (Ollama, LM Studio, vLLM) — all via plugins
- Translation — Cloud LLM translation with local Marian ONNX model fallback. 20 target languages: EN, DE, FR, ES, IT, NL, PL, SV, DA, FI, CS, RU, UK, HU, JA, ZH, AR, HI, VI, ID
- Profiles — Per-app and per-website overrides for language, task, transcription model, and whisper mode. Match by process name and/or URL pattern with wildcard support. Automatically activates when dictating in a matched application or website
- Dictionary — Custom term corrections applied after transcription (e.g., fix names, jargon, or recurring misrecognitions). Regex support. 13 built-in term packs: Web Dev, .NET/C#, DevOps, Data & AI, Design, Game Dev, Mobile, Security, Databases, Medical, Legal, Finance, Music Production
- Snippets — Text shortcuts with trigger→replacement. Placeholders:
{date},{time},{datetime},{clipboard},{day},{year}. Date/time support custom formats (e.g.{date:dd.MM.yyyy}) - History — Searchable transcription history with raw/final text tracking, app context, inline editing, and export (TXT, CSV, Markdown, JSON)
- Plugin system — Extensible plugin architecture with SDK and marketplace. Create custom plugins for transcription engines, LLM providers, post-processors, or action plugins
- Plugin marketplace — Browse, install, update, and uninstall plugins directly from Settings. Auto-installs recommended extensions on first run
- Action plugins — Linear (create issues), Obsidian (create notes), Script (run custom scripts), Webhook (HTTP notifications)
- HTTP API — Local REST server for integration with external tools (status, models, transcription, history, profiles, dictation control)
- Model auto-unload — Automatically unloads models after configurable idle timeout to save memory
- Fluent Design — WPF-UI with Mica backdrop, native title bar, and Fluent controls
- Dynamic Island overlay — Configurable widgets (LED, timer, waveform) with multi-monitor support and real-time microphone level meter
- Home dashboard — Usage statistics (words, WPM, apps, time saved) with activity chart
- Welcome wizard — Guided 4-step onboarding: extension installation & model download, microphone test, hotkey setup. Re-accessible from the dashboard
- Auto-update — Built-in updates via Velopack
- Windows autostart — Optional start with Windows (via registry)
- System tray — Minimizes to tray with quick access
- Localization — English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Swedish, Danish, Finnish
- Windows 10 or later (x64 or ARM64)
- 8 GB RAM minimum, 16 GB+ recommended for larger models
- ~700 MB disk space for the Parakeet model, ~200 MB for Canary
| Use Case | Model | Size |
|---|---|---|
| General transcription (25+ languages) | Parakeet TDT 0.6B | 670 MB, int8 |
| Multilingual with translation (EN/DE/FR/ES) | Canary 180M Flash | 198 MB, int8 |
Both models run on CPU with int8 quantization — no GPU required. Local models are provided by the SherpaOnnx plugin, installed via the built-in marketplace.
| Provider | Model | Notes |
|---|---|---|
| Groq | whisper-large-v3 | Fast cloud transcription, supports translation |
| Groq | whisper-large-v3-turbo | Fastest, no translation |
| OpenAI | gpt-4o-transcribe | Highest accuracy |
| OpenAI | gpt-4o-mini-transcribe | Lower cost, good quality |
| OpenAI | whisper-1 | Classic, supports translation |
| AssemblyAI | various | Real-time and batch transcription |
| Deepgram | nova-3 | Fast, accurate cloud ASR |
| OpenAI Compatible | Any model | Local LLM servers (Ollama, LM Studio, vLLM) |
Cloud providers are available as plugins and can be configured in Settings > Extensions.
-
Clone the repository:
git clone https://github.com/TypeWhisper/typewhisper-win.git cd typewhisper-win -
Build with .NET 10:
dotnet build
-
Run the app:
dotnet run --project src/TypeWhisper.Windows
-
The app appears in the system tray — the welcome wizard guides you through extension installation, model download, and setup.
TypeWhisper can run a local HTTP server (default port 9876, configurable in Settings) for integration with external tools.
| Endpoint | Method | Description |
|---|---|---|
/v1/status |
GET | App status and active model |
/v1/models |
GET | List all available models (local + cloud) |
/v1/transcribe |
POST | Transcribe audio from request body. Query params: language, task (transcribe/translate) |
/v1/history |
GET | Search history with pagination. Query params: q, limit, offset |
/v1/history |
DELETE | Delete history entries by ID |
/v1/profiles |
GET | List all profiles |
/v1/profiles/toggle |
PUT | Toggle a profile on/off by ID |
/v1/dictation/start |
POST | Start recording |
/v1/dictation/stop |
POST | Stop recording |
/v1/dictation/status |
GET | Check current dictation state |
Profiles let you configure transcription settings per application or website. For example:
- Outlook — German language
- Slack — English language
- Terminal — Whisper mode always on
- github.com — English language (matches in any browser)
- docs.google.com — German language, translate to English
Create profiles in Settings > Profiles. Assign process names and/or URL patterns, set language/task/model overrides, and adjust priority. URL patterns support wildcard matching — e.g. *.github.com matches gist.github.com.
When you start dictating, TypeWhisper matches the active window and browser URL against your profiles with the following priority:
- Process + URL match — highest specificity (e.g. chrome.exe + github.com)
- URL-only match — cross-browser profiles (e.g. github.com in any browser)
- Process-only match — generic app profiles (e.g. all of Chrome)
The active profile is shown in the recording overlay.
TypeWhisper supports plugins for adding custom transcription engines, LLM providers, post-processors, and action plugins. Plugins are .NET class libraries with a manifest.json, installed to %LocalAppData%/TypeWhisper/Plugins/.
The built-in marketplace provides the following plugins:
| Plugin | Type | Description |
|---|---|---|
| SherpaOnnx | Transcription | Local ASR with Parakeet and Canary models |
| OpenAI | Transcription + LLM | OpenAI Whisper and GPT models |
| Groq | Transcription + LLM | Groq Whisper and Llama models |
| Google Gemini | LLM | Gemini 2.5 Flash, Pro, and Flash Lite |
| AssemblyAI | Transcription | AssemblyAI speech-to-text |
| Deepgram | Transcription | Deepgram Nova ASR |
| OpenAI Compatible | Transcription + LLM | Any OpenAI-compatible server (Ollama, LM Studio, vLLM) |
| Linear | Action | Create Linear issues from transcriptions |
| Obsidian | Action | Create Obsidian notes from transcriptions |
| Script | Action | Run custom scripts with transcription data |
| LiveTranscript | Action | Floating live transcription window |
| Webhook | Event | HTTP notifications for transcription events |
| Interface | Purpose |
|---|---|
ITranscriptionEnginePlugin |
Cloud/custom transcription (e.g., Whisper API) |
ILlmProviderPlugin |
LLM chat completions (e.g., translation, course correction) |
IPostProcessorPlugin |
Post-processing pipeline (text cleanup, formatting) |
IActionPlugin |
Custom actions triggered by transcription events |
ITypeWhisperPlugin |
Event observer (e.g., webhook, logging) |
The SDK includes helpers for OpenAI-compatible APIs:
OpenAiTranscriptionHelper— multipart/form-data upload for Whisper-compatible endpointsOpenAiChatHelper— chat completion requestsOpenAiApiHelper— shared HTTP error handling
typewhisper-win/
├── src/
│ ├── TypeWhisper.Core/ # Core logic (net10.0)
│ │ ├── Data/ # SQLite database with migrations
│ │ ├── Interfaces/ # Service contracts
│ │ ├── Models/ # Profile, AppSettings, ModelInfo, TermPack, etc.
│ │ └── Translation/ # MarianTokenizer, MarianConfig (local ONNX translation)
│ ├── TypeWhisper.PluginSDK/ # Plugin SDK (net10.0-windows)
│ │ ├── Helpers/ # OpenAiChatHelper, OpenAiTranscriptionHelper
│ │ └── Models/ # PluginManifest, PluginEvents, etc.
│ └── TypeWhisper.Windows/ # WPF UI layer (net10.0-windows, WPF-UI Fluent Design)
│ ├── Controls/ # HotkeyRecorderControl, MicLevelMeter
│ ├── Native/ # P/Invoke, KeyboardHook
│ ├── Services/
│ │ ├── Plugins/ # PluginLoader, PluginManager, PluginEventBus, PluginRegistryService
│ │ └── Providers/ # LocalProviderBase
│ ├── ViewModels/ # MVVM view models
│ ├── Views/ # MainWindow, SettingsWindow, WelcomeWindow, sections
│ ├── Resources/ # Icons, sounds, silero_vad.onnx
│ └── App.xaml.cs # Composition root
├── plugins/ # Plugin source (distributed via marketplace)
│ ├── TypeWhisper.Plugin.OpenAi/ # OpenAI transcription + LLM
│ ├── TypeWhisper.Plugin.Groq/ # Groq transcription + LLM
│ ├── TypeWhisper.Plugin.Gemini/ # Google Gemini LLM
│ ├── TypeWhisper.Plugin.OpenAiCompatible/ # OpenAI-compatible servers
│ ├── TypeWhisper.Plugin.SherpaOnnx/ # Local ASR (Parakeet, Canary)
│ ├── TypeWhisper.Plugin.AssemblyAi/ # AssemblyAI transcription
│ ├── TypeWhisper.Plugin.Deepgram/ # Deepgram transcription
│ ├── TypeWhisper.Plugin.Linear/ # Linear issue creation
│ ├── TypeWhisper.Plugin.Obsidian/ # Obsidian note creation
│ ├── TypeWhisper.Plugin.Script/ # Custom script execution
│ ├── TypeWhisper.Plugin.LiveTranscript/ # Live transcription window
│ └── TypeWhisper.Plugin.Webhook/ # Webhook event notifications
└── tests/
├── TypeWhisper.Core.Tests/ # xUnit + Moq, in-memory SQLite
└── TypeWhisper.PluginSystem.Tests/ # Plugin infrastructure tests
Patterns: MVVM with CommunityToolkit.Mvvm. App.xaml.cs is the composition root. SQLite for persistence with a custom migration pattern. Plugin system with AssemblyLoadContext isolation and manifest-based discovery. Post-processing pipeline with priority-based ordering.
Key dependencies: NAudio (audio), NHotkey.Wpf (hotkeys), WPF-UI (Fluent Design), Microsoft.ML.OnnxRuntime (local translation), Velopack (updates), H.NotifyIcon.Wpf (system tray).
GPLv3 — see LICENSE for details. Commercial licensing available — see LICENSE-COMMERCIAL.md. Trademark policy — see TRADEMARK.md.