Fully-local voice transcription and text-to-speech.
Speak, transcribe, polish, listen back — all on-device. Zero network. Zero telemetry.
Download · Features · Quick Start · Security · Audit · MIT License
IronMic captures your voice, transcribes it with Whisper, and copies the raw transcript to your clipboard or routes it to the active page — dictate, search, listen, notes, or timeline. Optionally polish text with a local LLM on-demand, or hear it read back with a built-in TTS engine. Everything runs entirely on your machine — no audio or text ever leaves the device.
- Voice-to-clipboard — Press a global hotkey, speak, press again. Raw transcript lands in your clipboard instantly, ready to paste anywhere.
- Page-aware recording — Voice input routes to the active page: dictate page inserts into the editor, search page fills the search bar, listen page queues for TTS playback, notes page appends to the active note.
- Progressive feedback — See your transcript appear in the timeline immediately after recording stops, with a live progress indicator. No more waiting for the full pipeline to complete.
- Whisper large-v3-turbo — State-of-the-art local speech recognition with GPU acceleration (Metal on macOS).
- On-demand LLM text cleanup — Click "Polish now" on any timeline entry to clean up text with a local Mistral 7B model. Removes filler words, fixes grammar, preserves your meaning. Raw text is always shown first — you control when cleanup runs.
- Custom dictionary — Add domain-specific terms, names, and jargon to improve transcription accuracy.
- Kokoro 82M TTS — Hear your dictations read back through a local neural voice engine.
- 15 English voices — American and British accents, male and female. Preview and switch in Settings.
- Speed control — 0.5x to 2.0x playback speed.
- Word-level highlighting — Words highlight in sync as they're spoken (karaoke-style).
- Auto read-back — Optionally read text aloud automatically after dictation completes.
- Built-in AI chat — Wrapper around GitHub Copilot CLI and Claude Code CLI.
- Context-aware — Ask questions, refine text, brainstorm — powered by your existing AI subscriptions.
- Streaming responses — Real-time token-by-token output.
- Privacy-first — The AI feature is off by default. When enabled, it uses your own CLI tools and credentials.
IronMic includes 5 TensorFlow.js-powered ML features that learn from your usage patterns — entirely on-device.
- Voice Activity Detection — Filters silence and noise before Whisper, making transcription faster. Configurable sensitivity.
- Turn Detection — Detects when you're done speaking. Three modes: push-to-talk, auto-detect (default 3s silence), and always-listening. Enables hands-free conversation loops with the AI assistant.
- Voice Commands — Say "search for meeting notes", "open analytics", or "summarize today" — IronMic classifies the intent and executes the action. LLM fallback for complex commands. Voice correction: "no, I meant...".
- Context-Aware Routing — Voice input automatically routes to dictation, AI conversation, or command classification based on your active screen. No manual mode switching.
- Ambient Meeting Mode — Leave IronMic running during a meeting. It transcribes with speaker awareness, detects when the meeting ends, and generates a summary with action items.
- Semantic Search — Search by meaning, not just keywords. "Find everything about the auth redesign" returns semantically related entries across all content.
- Smart Notifications — Learns which notifications matter to you and ranks them by predicted importance. Starts with heuristics, improves over time.
- Workflow Discovery — Detects repeating action patterns (e.g., "You check GitHub PRs, then update Jira, then dictate a status update every Thursday at 2 PM") and suggests automations.
All ML models are under 50MB total. All training data stays in SQLite on your machine. Toggle each feature independently in Settings > Voice AI.
- Unified search — Search across all content types from one place: dictation entries (FTS5), meeting notes, AI chat conversations, and written notes. Filter by category, see highlighted matches.
- Timeline view — Scrollable card feed of all dictations, newest first. Entries show raw text by default with on-demand polishing.
- Full-text search — Instant search across all transcriptions (SQLite FTS5) and meeting sessions (name, summary, transcript, action items).
- Semantic search — AI-powered meaning-based search across all content (TF.js Universal Sentence Encoder).
- Tags — Categorize entries with custom tags.
- Pin & archive — Pin important entries to the top, archive old ones.
- Raw vs. polished toggle — Switch between original transcript and LLM-cleaned version on any entry.
- Auto-cleanup — Configure automatic deletion of entries older than N days.
- Dashboard — Daily word counts, recording time, words per minute, streaks.
- Topic classification — LLM-powered topic tagging with trend charts.
- Vocabulary richness — Type-Token Ratio and unique word tracking.
- Productivity comparison — Period-over-period metrics.
- In-app downloads — Download Whisper, LLM, and TTS models directly from Settings.
- Manual model import — If downloads are blocked by a corporate proxy, download models in your browser and import them into IronMic with a file picker. The app validates and copies them to the right location.
- HTTP proxy support — Configure a proxy in Settings > Security for corporate networks (HTTP/HTTPS/SOCKS5).
- Multiple Whisper sizes — Switch between tiny, base, small, medium, and large models.
- GPU acceleration — Detect and enable Metal (macOS) or CUDA for faster inference.
- Progress tracking — Download progress bars with percentage indicators.
- Dark / Light / System theme — Full theme support with auto mode that follows your OS preference.
- Configurable hotkey — Visual key recorder with conflict detection.
- Rich text editor — Bold, italic, headings, lists, blockquotes, code, highlights, and more.
- Enterprise design system — Clean, professional UI built on Inter font with the IronMic blue accent.
Electron UI ← IPC (contextBridge) → Rust Core (napi-rs)
├── React 18 + Zustand ├── Audio capture (cpal)
├── TF.js ML Web Worker ├── Whisper.cpp (speech-to-text)
│ ├── VAD (Silero) ├── llama.cpp (text cleanup)
│ ├── Intent classifier ├── Kokoro ONNX (text-to-speech)
│ ├── Semantic search (USE) ├── Audio playback (cpal)
│ ├── Notification ranker ├── SQLite (storage + FTS5)
│ └── Workflow predictor └── Clipboard (arboard)
└── Web Audio API (AudioWorklet)
| Layer | Tech | Purpose |
|---|---|---|
| UI | Electron + React 18 + Tailwind CSS + Zustand | Desktop shell, component UI, state management |
| Editor | TipTap (ProseMirror) | Rich text editing |
| Bridge | napi-rs (N-API) | Typed Rust ↔ Node.js communication |
| Audio | cpal + Web Audio API | Cross-platform mic input, real-time VAD frames |
| STT | whisper-rs (whisper.cpp) | Local speech-to-text |
| LLM | llama-cpp-rs (llama.cpp) | Local text cleanup |
| TTS | ort (ONNX Runtime) + Kokoro 82M | Local neural text-to-speech |
| ML | TensorFlow.js (Web Worker) | VAD, intent, search, notifications, workflows |
| Storage | rusqlite (SQLite + FTS5) | Entries, settings, ML data, embeddings |
| Clipboard | arboard | Cross-platform clipboard |
- Rust stable (1.88+) + cargo
- Node.js 20+ + npm
- CMake (for whisper.cpp / llama.cpp compilation)
- espeak-ng (for TTS phonemization):
brew install espeak-ng - Platform-specific:
- macOS: Xcode Command Line Tools
- Windows: Visual Studio Build Tools (C++ workload)
- Linux:
build-essential,libasound2-dev,libsqlite3-dev
Pre-built packages are available on the Releases page.
| Platform | Artifact |
|---|---|
| macOS (Apple Silicon) | .dmg |
| Windows | .exe installer |
| Linux | .AppImage |
macOS note: The app is not code-signed. After downloading, run this in Terminal before opening:
xattr -cr ~/Downloads/IronMic-*.dmgThen open the .dmg and drag IronMic to Applications. On first launch you may need to right-click → Open → Open.
After installing, open IronMic and go to Settings > Models to download the speech recognition model (~1.5 GB). The app will walk you through setup on first launch.
Models run entirely on your machine. Nothing is sent externally.
# Clone
git clone https://github.com/greenpioneersolutions/IronMic.git
cd IronMic
# One-command dev mode (builds Rust, installs deps, launches everything)
./scripts/dev.shThe dev script:
- Builds the Rust native addon with Metal GPU + TTS support
- Installs npm dependencies (if needed)
- Compiles Electron main & preload TypeScript
- Launches Vite dev server + Electron concurrently
# Build Rust core
cd rust-core
cargo build --release --features metal,tts
cp target/release/libironmic_core.dylib ironmic-core.node
cd ..
# Install and run frontend
cd electron-app
npm install
npx concurrently "npx vite" "sleep 3 && npx electron ."Models are downloaded through the Settings UI inside the app:
- Whisper large-v3-turbo (~1.5 GB) — speech recognition
- Mistral 7B Instruct Q4 (~4.4 GB) — text cleanup
- Kokoro 82M (~163 MB + ~7.5 MB voices) — text-to-speech
# Rust tests (225 tests)
cd rust-core
cargo test --no-default-features
# Frontend build verification
cd electron-app
npx vite buildThese are hard architectural constraints, not policies:
- No network calls. All outbound requests blocked. Model downloads are the only exception, triggered explicitly by you.
- Audio never hits disk. Mic input lives in memory only. Buffers explicitly zeroed after use.
- No telemetry. No analytics, crash reporting, or usage tracking.
- Local-only storage. Single SQLite file in your app data directory.
- Sandboxed renderer.
contextIsolation: true,nodeIntegration: false,sandbox: true. - Model download integrity. SHA-256 verified, HTTPS-only, domain-validated.
For the full security model, threat analysis, and configuration guide, see SECURITY.md.
IronMic/
├── rust-core/ # Rust native addon (napi-rs)
│ ├── src/
│ │ ├── audio/ # Mic capture + resampling
│ │ ├── transcription/ # Whisper integration + dictionary
│ │ ├── llm/ # LLM text cleanup + chat + prompts
│ │ ├── tts/ # Kokoro TTS + playback + timestamps
│ │ ├── storage/ # SQLite CRUD + FTS5 + migrations (v3)
│ │ │ ├── entries.rs # Dictation entry CRUD
│ │ │ ├── analytics.rs # Analytics snapshots + topics
│ │ │ ├── notifications.rs # ML notification system
│ │ │ ├── actions.rs # Action log for workflow discovery
│ │ │ ├── workflows.rs # Discovered workflow patterns
│ │ │ ├── embeddings.rs # Semantic search embeddings
│ │ │ ├── ml_models.rs # ML model weight persistence
│ │ │ ├── vad.rs # VAD training samples
│ │ │ ├── intents.rs # Intent + voice routing logs
│ │ │ └── meetings.rs # Meeting session management
│ │ ├── clipboard/ # Clipboard management
│ │ ├── hotkey/ # Pipeline state machine
│ │ ├── error.rs # Unified error types
│ │ └── lib.rs # N-API exports (~100 functions)
│ └── models/ # Downloaded model weights (gitignored)
├── electron-app/ # Electron + React frontend
│ └── src/
│ ├── main/ # Electron main process + IPC handlers
│ │ └── ai/ # AI chat adapter (Copilot/Claude CLI)
│ ├── preload/ # Typed contextBridge API
│ └── renderer/ # React UI
│ ├── components/ # 30+ components
│ ├── stores/ # Zustand state (11 stores)
│ ├── services/tfjs/ # TensorFlow.js ML services
│ │ ├── VADService.ts # Voice activity detection
│ │ ├── TurnDetector.ts # End-of-turn detection
│ │ ├── IntentClassifier.ts # Voice command classification
│ │ ├── VoiceRouter.ts # Context-aware routing
│ │ ├── MeetingDetector.ts # Ambient meeting mode
│ │ ├── SemanticSearch.ts # USE embeddings + similarity
│ │ ├── NotificationRanker.ts # ML notification ranking
│ │ ├── WorkflowMiner.ts # Action pattern mining
│ │ ├── AudioBridge.ts # Web Audio API integration
│ │ └── TFJSRuntime.ts # TF.js initialization
│ ├── workers/ # ML Web Worker (TF.js inference)
│ └── hooks/ # Custom hooks (bridge, theme, search)
├── scripts/
│ └── dev.sh # One-command dev launcher
└── CLAUDE.md # Architecture documentation
The Rust core uses Cargo feature flags to conditionally compile heavy dependencies:
| Feature | Dependencies | Purpose |
|---|---|---|
napi-export (default) |
— | Enables N-API function exports |
whisper |
whisper-rs | Speech-to-text |
metal |
whisper + whisper-rs/metal | GPU acceleration on macOS |
llm |
llama_cpp_rs | LLM text cleanup |
tts |
ort, ndarray | Kokoro TTS inference |
Tests run with --no-default-features to avoid N-API linker dependencies.
We publish a comprehensive, code-referenced self-audit that verifies every security claim we make. It points to exact files, line numbers, and code snippets — so you can confirm everything yourself.
The audit covers: network isolation, audio zero-on-drop, model download integrity (SHA-256 + HTTPS + domain validation), HuggingFace source verification, Electron sandbox configuration, IPC validation, environment variable scoping, SQL injection protection, XSS prevention, and more.
We encourage you to verify our claims. If you don't trust our self-audit, please engage an independent security professional to review the codebase. We welcome it.
MIT — see LICENSE.
