TypeWhisper for Windows

Speech-to-text and AI text processing for Windows. Transcribe audio using on-device AI models or cloud APIs via extensions, then process the result with custom LLM prompts. Your voice data stays on your PC with local models — or use cloud APIs for faster processing.

Features

Transcription

On-device models — Parakeet TDT 0.6B (25+ languages, fast) and Canary 180M Flash (EN/DE/FR/ES with translation), running on CPU via SherpaOnnx with int8 quantization — no GPU required
Cloud transcription — Groq Whisper, OpenAI Whisper, AssemblyAI, Deepgram, and any OpenAI-compatible server (Ollama, LM Studio, vLLM). API keys encrypted at rest via DPAPI
Streaming preview — Silero VAD detects speech segments during recording and transcribes them in real time, showing partial results in the overlay before recording stops
Live transcription — Floating window with real-time continuous transcription (via LiveTranscript plugin)
File transcription — Drag-and-drop audio/video files. Supports WAV, MP3, M4A, AAC, OGG, FLAC, WMA, MP4, MKV, AVI, MOV, WebM
Subtitle export — Export transcriptions as TXT, SRT, or WebVTT

Dictation

System-wide — Three independent hotkeys: Hybrid (short press toggles, long press is push-to-talk), Toggle-only, and Hold-only. Per-profile hotkeys for direct profile activation. Auto-pastes into any app
Non-blocking pipeline — Multiple recordings can be queued while transcription runs in the background
Sound feedback — Audio cues for recording start and stop
Silence detection — Automatically stops recording after configurable silence period
Whisper mode — Boosted microphone gain for quiet speech
Audio normalization — Automatic gain control for consistent input levels
Media pause — Automatically pauses media playback during recording
Audio ducking — Reduces system volume while recording

AI Processing

Custom prompts — Process transcriptions (or any text) with LLM prompts. Standalone Prompt Palette via global hotkey — a floating panel for AI text processing independent of dictation
LLM providers — Groq, OpenAI, Google Gemini, and any OpenAI-compatible server (Ollama, LM Studio, vLLM) — all via plugins
Translation — Cloud LLM translation with local Marian ONNX model fallback. 20 target languages: EN, DE, FR, ES, IT, NL, PL, SV, DA, FI, CS, RU, UK, HU, JA, ZH, AR, HI, VI, ID

Personalization

Profiles — Per-app and per-website overrides for language, task, transcription model, and whisper mode. Match by process name and/or URL pattern with wildcard support. Automatically activates when dictating in a matched application or website
Dictionary — Custom term corrections applied after transcription (e.g., fix names, jargon, or recurring misrecognitions). Regex support. 13 built-in term packs: Web Dev, .NET/C#, DevOps, Data & AI, Design, Game Dev, Mobile, Security, Databases, Medical, Legal, Finance, Music Production
Snippets — Text shortcuts with trigger→replacement. Placeholders: {date}, {time}, {datetime}, {clipboard}, {day}, {year}. Date/time support custom formats (e.g. {date:dd.MM.yyyy})
History — Searchable transcription history with raw/final text tracking, app context, inline editing, and export (TXT, CSV, Markdown, JSON)

Integration & Extensibility

Plugin system — Extensible plugin architecture with SDK and marketplace. Create custom plugins for transcription engines, LLM providers, post-processors, or action plugins
Plugin marketplace — Browse, install, update, and uninstall plugins directly from Settings. Auto-installs recommended extensions on first run
Action plugins — Linear (create issues), Obsidian (create notes), Script (run custom scripts), Webhook (HTTP notifications)
HTTP API — Local REST server for integration with external tools (status, models, transcription, history, profiles, dictation control)
Model auto-unload — Automatically unloads models after configurable idle timeout to save memory

General

Fluent Design — WPF-UI with Mica backdrop, native title bar, and Fluent controls
Dynamic Island overlay — Configurable widgets (LED, timer, waveform) with multi-monitor support and real-time microphone level meter
Home dashboard — Usage statistics (words, WPM, apps, time saved) with activity chart
Welcome wizard — Guided 4-step onboarding: extension installation & model download, microphone test, hotkey setup. Re-accessible from the dashboard
Auto-update — Built-in updates via Velopack
Windows autostart — Optional start with Windows (via registry)
System tray — Minimizes to tray with quick access
Localization — English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Swedish, Danish, Finnish

System Requirements

Windows 10 or later (x64 or ARM64)
8 GB RAM minimum, 16 GB+ recommended for larger models
~700 MB disk space for the Parakeet model, ~200 MB for Canary

Models

Local Models

Use Case	Model	Size
General transcription (25+ languages)	Parakeet TDT 0.6B	670 MB, int8
Multilingual with translation (EN/DE/FR/ES)	Canary 180M Flash	198 MB, int8

Both models run on CPU with int8 quantization — no GPU required. Local models are provided by the SherpaOnnx plugin, installed via the built-in marketplace.

Cloud Models (optional)

Provider	Model	Notes
Groq	whisper-large-v3	Fast cloud transcription, supports translation
Groq	whisper-large-v3-turbo	Fastest, no translation
OpenAI	gpt-4o-transcribe	Highest accuracy
OpenAI	gpt-4o-mini-transcribe	Lower cost, good quality
OpenAI	whisper-1	Classic, supports translation
AssemblyAI	various	Real-time and batch transcription
Deepgram	nova-3	Fast, accurate cloud ASR
OpenAI Compatible	Any model	Local LLM servers (Ollama, LM Studio, vLLM)

Cloud providers are available as plugins and can be configured in Settings > Extensions.

Build

Clone the repository:

git clone https://github.com/TypeWhisper/typewhisper-win.git
cd typewhisper-win

Build with .NET 10:
```
dotnet build
```

Run the app:

dotnet run --project src/TypeWhisper.Windows

The app appears in the system tray — the welcome wizard guides you through extension installation, model download, and setup.

HTTP API

TypeWhisper can run a local HTTP server (default port 9876, configurable in Settings) for integration with external tools.

Endpoint	Method	Description
`/v1/status`	GET	App status and active model
`/v1/models`	GET	List all available models (local + cloud)
`/v1/transcribe`	POST	Transcribe audio from request body. Query params: `language`, `task` (transcribe/translate)
`/v1/history`	GET	Search history with pagination. Query params: `q`, `limit`, `offset`
`/v1/history`	DELETE	Delete history entries by ID
`/v1/profiles`	GET	List all profiles
`/v1/profiles/toggle`	PUT	Toggle a profile on/off by ID
`/v1/dictation/start`	POST	Start recording
`/v1/dictation/stop`	POST	Stop recording
`/v1/dictation/status`	GET	Check current dictation state

Profiles

Profiles let you configure transcription settings per application or website. For example:

Outlook — German language
Slack — English language
Terminal — Whisper mode always on
github.com — English language (matches in any browser)
docs.google.com — German language, translate to English

Create profiles in Settings > Profiles. Assign process names and/or URL patterns, set language/task/model overrides, and adjust priority. URL patterns support wildcard matching — e.g. *.github.com matches gist.github.com.

When you start dictating, TypeWhisper matches the active window and browser URL against your profiles with the following priority:

Process + URL match — highest specificity (e.g. chrome.exe + github.com)
URL-only match — cross-browser profiles (e.g. github.com in any browser)
Process-only match — generic app profiles (e.g. all of Chrome)

The active profile is shown in the recording overlay.

Plugins

TypeWhisper supports plugins for adding custom transcription engines, LLM providers, post-processors, and action plugins. Plugins are .NET class libraries with a manifest.json, installed to %LocalAppData%/TypeWhisper/Plugins/.

The built-in marketplace provides the following plugins:

Plugin	Type	Description
SherpaOnnx	Transcription	Local ASR with Parakeet and Canary models
OpenAI	Transcription + LLM	OpenAI Whisper and GPT models
Groq	Transcription + LLM	Groq Whisper and Llama models
Google Gemini	LLM	Gemini 2.5 Flash, Pro, and Flash Lite
AssemblyAI	Transcription	AssemblyAI speech-to-text
Deepgram	Transcription	Deepgram Nova ASR
OpenAI Compatible	Transcription + LLM	Any OpenAI-compatible server (Ollama, LM Studio, vLLM)
Linear	Action	Create Linear issues from transcriptions
Obsidian	Action	Create Obsidian notes from transcriptions
Script	Action	Run custom scripts with transcription data
LiveTranscript	Action	Floating live transcription window
Webhook	Event	HTTP notifications for transcription events

Plugin Types

Interface	Purpose
`ITranscriptionEnginePlugin`	Cloud/custom transcription (e.g., Whisper API)
`ILlmProviderPlugin`	LLM chat completions (e.g., translation, course correction)
`IPostProcessorPlugin`	Post-processing pipeline (text cleanup, formatting)
`IActionPlugin`	Custom actions triggered by transcription events
`ITypeWhisperPlugin`	Event observer (e.g., webhook, logging)

SDK Helpers

The SDK includes helpers for OpenAI-compatible APIs:

OpenAiTranscriptionHelper — multipart/form-data upload for Whisper-compatible endpoints
OpenAiChatHelper — chat completion requests
OpenAiApiHelper — shared HTTP error handling

Architecture

typewhisper-win/
├── src/
│   ├── TypeWhisper.Core/           # Core logic (net10.0)
│   │   ├── Data/                   # SQLite database with migrations
│   │   ├── Interfaces/             # Service contracts
│   │   ├── Models/                 # Profile, AppSettings, ModelInfo, TermPack, etc.
│   │   └── Translation/            # MarianTokenizer, MarianConfig (local ONNX translation)
│   ├── TypeWhisper.PluginSDK/      # Plugin SDK (net10.0-windows)
│   │   ├── Helpers/                # OpenAiChatHelper, OpenAiTranscriptionHelper
│   │   └── Models/                 # PluginManifest, PluginEvents, etc.
│   └── TypeWhisper.Windows/        # WPF UI layer (net10.0-windows, WPF-UI Fluent Design)
│       ├── Controls/               # HotkeyRecorderControl, MicLevelMeter
│       ├── Native/                 # P/Invoke, KeyboardHook
│       ├── Services/
│       │   ├── Plugins/            # PluginLoader, PluginManager, PluginEventBus, PluginRegistryService
│       │   └── Providers/          # LocalProviderBase
│       ├── ViewModels/             # MVVM view models
│       ├── Views/                  # MainWindow, SettingsWindow, WelcomeWindow, sections
│       ├── Resources/              # Icons, sounds, silero_vad.onnx
│       └── App.xaml.cs             # Composition root
├── plugins/                        # Plugin source (distributed via marketplace)
│   ├── TypeWhisper.Plugin.OpenAi/           # OpenAI transcription + LLM
│   ├── TypeWhisper.Plugin.Groq/             # Groq transcription + LLM
│   ├── TypeWhisper.Plugin.Gemini/           # Google Gemini LLM
│   ├── TypeWhisper.Plugin.OpenAiCompatible/ # OpenAI-compatible servers
│   ├── TypeWhisper.Plugin.SherpaOnnx/       # Local ASR (Parakeet, Canary)
│   ├── TypeWhisper.Plugin.AssemblyAi/       # AssemblyAI transcription
│   ├── TypeWhisper.Plugin.Deepgram/         # Deepgram transcription
│   ├── TypeWhisper.Plugin.Linear/           # Linear issue creation
│   ├── TypeWhisper.Plugin.Obsidian/         # Obsidian note creation
│   ├── TypeWhisper.Plugin.Script/           # Custom script execution
│   ├── TypeWhisper.Plugin.LiveTranscript/   # Live transcription window
│   └── TypeWhisper.Plugin.Webhook/          # Webhook event notifications
└── tests/
    ├── TypeWhisper.Core.Tests/           # xUnit + Moq, in-memory SQLite
    └── TypeWhisper.PluginSystem.Tests/   # Plugin infrastructure tests

Patterns: MVVM with CommunityToolkit.Mvvm. App.xaml.cs is the composition root. SQLite for persistence with a custom migration pattern. Plugin system with AssemblyLoadContext isolation and manifest-based discovery. Post-processing pipeline with priority-based ordering.

Key dependencies: NAudio (audio), NHotkey.Wpf (hotkeys), WPF-UI (Fluent Design), Microsoft.ML.OnnxRuntime (local translation), Velopack (updates), H.NotifyIcon.Wpf (system tray).

License

GPLv3 — see LICENSE for details. Commercial licensing available — see LICENSE-COMMERCIAL.md. Trademark policy — see TRADEMARK.md.

Name		Name	Last commit message	Last commit date
Latest commit History 96 Commits
.github		.github
docs/plans		docs/plans
plugins		plugins
src		src
tests		tests
.gitignore		.gitignore
Directory.Build.props		Directory.Build.props
LICENSE		LICENSE
LICENSE-COMMERCIAL.md		LICENSE-COMMERCIAL.md
README.md		README.md
TESTING_GUIDE.md		TESTING_GUIDE.md
TRADEMARK.md		TRADEMARK.md
TypeWhisper.slnx		TypeWhisper.slnx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TypeWhisper for Windows

Features

Transcription

Dictation

AI Processing

Personalization

Integration & Extensibility

General

System Requirements

Models

Local Models

Cloud Models (optional)

Build

HTTP API

Profiles

Plugins

Plugin Types

SDK Helpers

Architecture

License

About

Licenses found

Uh oh!

Releases 27

Sponsor this project

Uh oh!

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

TypeWhisper for Windows

Features

Transcription

Dictation

AI Processing

Personalization

Integration & Extensibility

General

System Requirements

Models

Local Models

Cloud Models (optional)

Build

HTTP API

Profiles

Plugins

Plugin Types

SDK Helpers

Architecture

License

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 27

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages