Your Mac's notch just learned to listen.
Hold ⌥ Option in any text field · Speak · Release · Your words appear ✨
╔══════════════════════════════════════════════════════════════════╗
║ ║
║ You're typing an email. Mid-thought, you pause. ║
║ ║
║ You hold ⌥ Option. ║
║ ║
║ The notch awakens — a pill of obsidian glass expands, ║
║ pulsing with a cyan-to-purple gradient. ║
║ 28 bars of waveform dance to your voice in real-time. ║
║ ║
║ "Schedule the meeting for Thursday at 3 PM" ║
║ ║
║ You release the key. A brief shimmer of orange-gold. ║
║ The text materializes in your email — as if you typed it. ║
║ ║
║ The notch fades back to sleep. ║
║ ║
║ No internet. No cloud. No latency. ║
║ Just your voice, your Mac, and a Whisper. ║
║ ║
╚══════════════════════════════════════════════════════════════════╝
WhisperNotch is a macOS-native speech-to-text utility that transforms your MacBook's notch into an intelligent voice interface. Built entirely in Swift, it captures your voice, runs OpenAI's Whisper model locally on-device via Apple's Core ML, and injects the transcribed text directly into whatever app you're using.
No accounts. No API keys. No data leaving your machine. Ever.
The Philosophy: The best interface is the one that disappears. WhisperNotch has no window, no dock icon, no friction. It lives in the notch — the one part of your screen you never use — and activates with a single key hold.
WhisperNotch follows a clean Service-Oriented Architecture with SwiftUI reactive state management. Each layer has a single responsibility, and the AppDelegate acts as the central orchestrator binding everything together.
┌─────────────────────────────────────────────────────────────────────────┐
│ WhisperNotch Architecture │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ 🎨 PRESENTATION LAYER │ │
│ │ │ │
│ │ ┌─────────────┐ ┌──────────────────┐ ┌────────────────┐ │ │
│ │ │ NotchPanel │ │ NotchContentView │ │ SettingsView │ │ │
│ │ │ (NSPanel) │◄──│ (SwiftUI) │ │ (SwiftUI) │ │ │
│ │ │ floating │ │ waveform, text │ │ model picker │ │ │
│ │ │ window │ │ states, glow │ │ permissions │ │ │
│ │ └──────────────┘ └────────┬─────────┘ └────────────────┘ │ │
│ │ │ @ObservedObject │ │
│ └───────────────────────────────┼───────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────────┼───────────────────────────────────┐ │
│ │ 📊 STATE LAYER │ │
│ │ │ │ │
│ │ ┌──────────▼──────────┐ │ │
│ │ │ NotchState │ │ │
│ │ │ ───────────────── │ │ │
│ │ │ isListening │ │ │
│ │ │ isTranscribing │ │ │
│ │ │ transcribedText │ │ │
│ │ │ audioLevels[28] │ │ │
│ │ │ statusMessage │ │ │
│ │ └──────────▲──────────┘ │ │
│ │ │ @Published │ │
│ └───────────────────────────────┼───────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────────┼───────────────────────────────────┐ │
│ │ 🎯 ORCHESTRATION LAYER │ │
│ │ │ │ │
│ │ ┌──────────▼──────────┐ │ │
│ │ │ AppDelegate │ │ │
│ │ │ ───────────────── │ │ │
│ │ │ Binds hotkey events │ │ │
│ │ │ Manages audio flow │ │ │
│ │ │ Coordinates AI │ │ │
│ │ │ Controls UI panel │ │ │
│ │ │ Runs interim timer │ │ │
│ │ └──┬───┬───┬───┬──────┘ │ │
│ │ │ │ │ │ │ │
│ └───────────────────────┼───┼───┼───┼────────────────────────────────┘ │
│ │ │ │ │ │
│ ┌───────────────────────┼───┼───┼───┼────────────────────────────────┐ │
│ │ ⚙️ SERVICE LAYER │ │
│ │ │ │ │ │ │ │
│ │ ┌───────────────────▼┐ │ │ ┌▼──────────────────┐ │ │
│ │ │ HotkeyManager │ │ │ │ TextInjector │ │ │
│ │ │ ──────────────── │ │ │ │ ─────────────── │ │ │
│ │ │ NSEvent monitor │ │ │ │ NSPasteboard │ │ │
│ │ │ CGEvent tap │ │ │ │ CGEvent ⌘V │ │ │
│ │ │ ⌥ Option detect │ │ │ │ 150ms debounce │ │ │
│ │ └────────────────────┘ │ │ └───────────────────┘ │ │
│ │ │ │ │ │
│ │ ┌──────────────────────▼┐ ┌▼──────────────────────┐ │ │
│ │ │ AudioEngine │ │ WhisperService │ │ │
│ │ │ ────────────────────│ │ ──────────────────── │ │ │
│ │ │ AVAudioEngine │ │ WhisperKit wrapper │ │ │
│ │ │ 16kHz mono capture │ │ Core ML inference │ │ │
│ │ │ RMS+Peak metering │ │ 6 model variants │ │ │
│ │ │ Accelerate.framework │ │ Lazy model loading │ │ │
│ │ └───────────────────────┘ └───────────────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ 🍎 SYSTEM LAYER │ │
│ │ │ │
│ │ AVFoundation · Accelerate · AppKit · Carbon.HIToolbox │ │
│ │ CoreGraphics · Core ML · SwiftUI · NSAccessibility │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
This is the complete flow of what happens when you hold ⌥ Option and speak.
```mermaid
flowchart TD
    A["🖐️ User holds ⌥ Option"] --> B{"HotkeyManager\ndetects flagsChanged"}
    B -->|Option only\nno Cmd/Ctrl/Shift| C["onHotkeyDown()"]
    B -->|Other modifiers held| X["❌ Ignored\n(Cmd+⌥, etc.)"]
    C --> D["AppDelegate orchestrates"]
    D --> E["NotchPanel.show()\nfade-in animation 0.2s"]
    D --> F["AudioEngine.startRecording()\n16kHz mono capture begins"]
    D --> G["Start interim timer\nevery 1.5 seconds"]
    F --> H["🎤 User speaks"]
    H --> I["Audio tap fires\n~30 times/second"]
    I --> J["Compute audio level\nRMS×0.4 + Peak×0.6"]
    J --> K["Power curve normalize\nmin(pow(blend × 3.5, 0.7), 1.0)"]
    K --> L["NotchState.pushAudioLevel()"]
    L --> M["🌊 Waveform animates\n28 bars with spring physics"]
    I --> N["Append to audio buffer\nthread-safe via NSLock"]
    G --> O{"1.5s tick"}
    O --> P["Peek audio buffer\n(non-destructive read)"]
    P --> Q{">= 8000 samples?\n(0.5 seconds)"}
    Q -->|Yes| R["WhisperService\n.interimTranscribe()"]
    R --> S["💬 Live text preview\nupdates in notch UI"]
    Q -->|No| O
    H --> T["🖐️ User releases ⌥ Option"]
    T --> U["onHotkeyUp()"]
    U --> V["AudioEngine.stopRecording()"]
    U --> W["Stop interim timer"]
    V --> Y["Get full audio buffer"]
    Y --> Z{">= 1600 samples?\n(0.1 seconds)"}
    Z -->|No| AA["⚠️ 'Too short —\nhold ⌥ longer'"]
    Z -->|Yes| AB["NotchState.isTranscribing = true\nOrange-gold gradient appears"]
    AB --> AC["WhisperService.transcribe()\nFull Whisper inference on-device"]
    AC --> AD["Final text returned"]
    AD --> AE["TextInjector.injectText()"]
    AE --> AF["1. Clear NSPasteboard"]
    AF --> AG["2. Write text to clipboard"]
    AG --> AH["3. Wait 150ms"]
    AH --> AI["4. Post CGEvent ⌘V\n(key down + key up)"]
    AI --> AJ["✅ 'Inserted' confirmation\ngreen checkmark in notch"]
    AJ --> AK["Auto-hide panel\nafter 1.8 seconds"]
    AK --> AL["🖤 Notch returns to sleep"]
    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style H fill:#1a1a2e,stroke:#00d2ff,color:#fff
    style T fill:#1a1a2e,stroke:#e94560,color:#fff
    style M fill:#0f3460,stroke:#00d2ff,color:#fff
    style S fill:#0f3460,stroke:#a855f7,color:#fff
    style AC fill:#1a1a2e,stroke:#f59e0b,color:#fff
    style AJ fill:#0f3460,stroke:#00C853,color:#fff
    style AL fill:#000000,stroke:#333,color:#888
    style X fill:#2d2d2d,stroke:#666,color:#999
```
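The interim-preview loop in the flow above — tick every 1.5 s, peek the buffer, transcribe only once at least 0.5 s of audio has accumulated — can be sketched as follows. This is an illustrative sketch, not the app's actual `AppDelegate` code; the closure parameters stand in for the real service calls.

```swift
import Foundation

// Sketch of the 1.5 s interim-preview timer described in the flowchart.
final class InterimPreviewLoop {
    static let minimumSamples = 8_000            // 0.5 s at 16 kHz
    private var timer: Timer?

    /// True once the peeked buffer holds at least 0.5 s of audio.
    static func hasEnoughForPreview(_ samples: [Float]) -> Bool {
        samples.count >= minimumSamples
    }

    func start(peek: @escaping () -> [Float],
               transcribe: @escaping ([Float]) -> Void) {
        // Fires every 1.5 s; peeks without clearing so the final pass
        // on key release still sees the complete recording.
        timer = Timer.scheduledTimer(withTimeInterval: 1.5, repeats: true) { _ in
            let samples = peek()
            guard Self.hasEnoughForPreview(samples) else { return }
            transcribe(samples)
        }
    }

    func stop() {
        timer?.invalidate()
        timer = nil
    }
}
```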
The audio processing pipeline is where physics meets perception. Here's how raw microphone data becomes a dancing waveform:
┌──────────────────────────────────┐
│ MICROPHONE INPUT │
│ (native hardware format) │
└──────────────┬───────────────────┘
│
▼
┌──────────────────────────────────┐
│ AVAudioEngine Tap │
│ 1024-byte buffer chunks │
│ fires ~30 times per second │
└──────────────┬───────────────────┘
│
┌─────────┴─────────┐
▼ ▼
┌─────────────────┐ ┌──────────────────┐
│ FORMAT CONVERT │ │ LEVEL METERING │
│ │ │ │
│ Native format │ │ ┌──────────────┐ │
│ ↓ │ │ │ vDSP_rmsqv │ │
│ AVAudioConvert │ │ │ (RMS energy) │ │
│ ↓ │ │ └──────┬───────┘ │
│ 16 kHz mono │ │ │ │
│ Float32 │ │ ┌──────▼───────┐ │
│ │ │ │ vDSP_maxmgv │ │
│ │ │ │ (peak level) │ │
│ │ │ └──────┬───────┘ │
└────────┬────────┘ │ │ │
│ │ ┌──────▼───────┐ │
│ │ │ BLEND │ │
│ │ │ rms×0.4 │ │
│ │ │ + peak×0.6 │ │
│ │ └──────┬───────┘ │
│ │ │ │
│ │ ┌──────▼───────┐ │
│ │ │ NORMALIZE │ │
│ │ │ pow(x×3.5, │ │
│ │ │ 0.7) │ │
│ │ │ capped @1.0 │ │
│ │ └──────┬───────┘ │
│ └─────────┼────────┘
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────┐
│ AUDIO BUFFER │ │ WAVEFORM UI │
│ ───────────── │ │ ────────────── │
│ Thread-safe │ │ 28-bar rolling │
│ NSLock guard │ │ buffer display │
│ Continuous │ │ Spring-animated │
│ append │ │ Color-reactive │
└────────┬────────┘ └──────────────────┘
│
▼
┌─────────────────┐
│ WHISPER MODEL │
│ ───────────── │
│ Core ML │
│ On-device │
│ inference │
└─────────────────┘
| Method | Pros | Cons |
|---|---|---|
| RMS only | Smooth, energy-accurate | Sluggish, misses transients |
| Peak only | Responsive, catches plosives | Jittery, visually noisy |
| Blended (40/60) | Best of both — smooth yet responsive | — |
The power curve `pow(x × 3.5, 0.7)` serves two purposes:
- Amplification (`× 3.5`): quiet speech becomes visible
- Compression (`^0.7`): loud sounds don't clip — dynamic range is preserved
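As a concrete sketch, the blend-and-normalize step looks like this (function and parameter names here are illustrative, not the app's actual identifiers):

```swift
import Foundation

/// Maps a raw level (RMS×0.4 + Peak×0.6) to a 0…1 display value.
/// The ×3.5 gain lifts quiet speech into view; the 0.7 exponent
/// compresses loud peaks so bars never slam against the cap.
func displayLevel(rms: Float, peak: Float) -> Float {
    let blend = rms * 0.4 + peak * 0.6
    return min(pow(blend * 3.5, 0.7), 1.0)
}
```

A quiet blend of 0.05 maps to roughly 0.3 of full bar height, while any blend at or above ~0.28 saturates at 1.0 — visible at a whisper, stable at a shout.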
WhisperNotch/
│
├── 📦 Package.swift # SPM manifest — WhisperKit ≥ 0.9.0
├── 🔧 project.yml # XcodeGen project definition
├── 📀 build-dmg.sh # DMG installer builder
├── ⚡ setup.sh # Project bootstrap script
│
└── WhisperNotch/
│
├── Sources/
│ │
│ ├── App/ # ── Application Core ──
│ │ ├── WhisperNotchApp.swift # @main entry, SwiftUI lifecycle
│ │ └── AppDelegate.swift # 🧠 The brain — orchestrates everything
│ │
│ ├── Models/ # ── State Management ──
│ │ └── NotchState.swift # Observable state container
│ │
│ ├── Services/ # ── Business Logic ──
│ │ ├── AudioEngine.swift # 🎤 Mic → 16kHz Float32 pipeline
│ │ ├── WhisperService.swift # 🤖 WhisperKit AI wrapper
│ │ ├── HotkeyManager.swift # ⌨️ Global ⌥ key detection
│ │ └── TextInjector.swift # 💉 Clipboard + ⌘V injection
│ │
│ └── Views/ # ── User Interface ──
│ ├── NotchPanel.swift # 🖤 Floating NSPanel at notch
│ ├── NotchContentView.swift # 🌊 SwiftUI waveform + states
│ └── SettingsView.swift # ⚙️ Model picker & permissions
│
├── Info.plist # Bundle metadata (LSUIElement: true)
├── WhisperNotch.entitlements # Audio input + network client
└── Assets.xcassets/ # App icon & asset catalog
| File | Lines | Role | Key Insight |
|---|---|---|---|
| `AppDelegate.swift` | 272 | Orchestrator | Binds all services, manages lifecycle, runs interim timer |
| `NotchContentView.swift` | 237 | Main UI | Custom shape, gradient borders, waveform, 4 state views |
| `SettingsView.swift` | 186 | Settings | Model picker with size/speed/accuracy descriptions |
| `HotkeyManager.swift` | 146 | Input | NSEvent monitor + CGEvent tap with Accessibility polling |
| `AudioEngine.swift` | 114 | Capture | Accelerate-powered metering, format conversion |
| `NotchPanel.swift` | 97 | Window | Screen-saver level panel, fade animations |
| `WhisperService.swift` | 90 | AI | Lazy model loading, interim + final transcription |
| `TextInjector.swift` | 38 | Output | Thread-safe clipboard write + synthetic paste |
| `NotchState.swift` | 27 | State | Observable properties + 28-sample rolling buffer |
| `WhisperNotchApp.swift` | 15 | Entry | `@main`, delegates to AppDelegate |
Total: ~1,232 lines of Swift — lean, focused, no bloat.
⌨️ HotkeyManager — The Sentinel
The HotkeyManager uses a two-tier detection system:
Tier 1 — NSEvent Global Monitor (always works):
```swift
NSEvent.addGlobalMonitorForEvents(matching: .flagsChanged) { event in
    let optionHeld = event.modifierFlags.contains(.option)
    // "Any other modifier held?" — intersection, not contains([...]),
    // which would require ALL of Cmd+Ctrl+Shift at once.
    let hasOthers = !event.modifierFlags
        .intersection([.command, .control, .shift]).isEmpty
    if optionHeld && !hasOthers { onHotkeyDown() }
    if !optionHeld { onHotkeyUp() }
}
```
Tier 2 — CGEvent Tap (requires Accessibility):
```swift
CGEvent.tapCreate(tap: .cgAnnotatedSessionEventTap, ...)
```
- Required for posting synthetic ⌘V events
- Polls `AXIsProcessTrusted()` every 2 seconds until granted
- Once trusted, creates tap and enables auto-paste
Why two tiers? The app works immediately after launch (Tier 1), even before the user grants Accessibility. Auto-paste unlocks when they do (Tier 2).
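The Tier 2 pattern — poll for trust, then post the synthetic ⌘V — can be sketched like this. The function names are illustrative, not the app's actual API; only the system calls (`AXIsProcessTrusted`, `CGEvent`) are real.

```swift
import AppKit
import ApplicationServices

// Poll every 2 s until the user grants Accessibility, then unlock auto-paste.
func pollForAccessibility(onTrusted: @escaping () -> Void) {
    Timer.scheduledTimer(withTimeInterval: 2.0, repeats: true) { timer in
        if AXIsProcessTrusted() {
            timer.invalidate()
            onTrusted()                     // e.g. create the CGEvent tap
        }
    }
}

// Post a synthetic ⌘V — this is the part that requires Accessibility trust.
func postCommandV() {
    let source = CGEventSource(stateID: .combinedSessionState)
    let vKey: CGKeyCode = 9                 // kVK_ANSI_V
    let keyDown = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: true)
    let keyUp = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: false)
    keyDown?.flags = .maskCommand
    keyUp?.flags = .maskCommand
    keyDown?.post(tap: .cgAnnotatedSessionEventTap)
    keyUp?.post(tap: .cgAnnotatedSessionEventTap)
}
```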
🎤 AudioEngine — The Ear
The AudioEngine maintains a thread-safe recording buffer with two read modes:
| Method | Behavior | Used By |
|---|---|---|
| `getRecordedAudio()` | Returns the buffer and clears it | Final transcription |
| `peekRecordedAudio()` | Returns the buffer without clearing | Interim transcription |
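A minimal sketch of the two read modes, assuming the `NSLock`-guarded `[Float]` buffer described here (type and method names match the table; the real class is larger):

```swift
import Foundation

final class RecordingBuffer {
    private let lock = NSLock()
    private var samples: [Float] = []

    /// Called from the audio tap thread — always append under the lock.
    func append(_ chunk: [Float]) {
        lock.lock(); defer { lock.unlock() }
        samples.append(contentsOf: chunk)
    }

    /// Destructive read: returns everything and clears (final transcription).
    func getRecordedAudio() -> [Float] {
        lock.lock(); defer { lock.unlock() }
        let out = samples
        samples.removeAll()
        return out
    }

    /// Non-destructive read: returns a copy, keeps the buffer (interim preview).
    func peekRecordedAudio() -> [Float] {
        lock.lock(); defer { lock.unlock() }
        return samples
    }
}
```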
Audio Format Pipeline:
Microphone (native: 44.1/48 kHz stereo)
↓ AVAudioConverter
Whisper-ready (16 kHz mono Float32)
↓ NSLock-guarded append
Thread-safe audio buffer
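The conversion step can be sketched with `AVAudioConverter` (a sketch under assumptions — real code must handle partial output buffers and converter reuse):

```swift
import AVFoundation

/// Builds a converter from the mic's native format to Whisper's
/// expected 16 kHz mono Float32.
func makeWhisperConverter(from native: AVAudioFormat) -> AVAudioConverter? {
    guard let target = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: 16_000,
                                     channels: 1,
                                     interleaved: false) else { return nil }
    return AVAudioConverter(from: native, to: target)
}

/// Converts one tap buffer; feeds the input exactly once.
func convert(_ buffer: AVAudioPCMBuffer,
             with converter: AVAudioConverter) -> AVAudioPCMBuffer? {
    let ratio = 16_000.0 / buffer.format.sampleRate
    let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
    guard let out = AVAudioPCMBuffer(pcmFormat: converter.outputFormat,
                                     frameCapacity: capacity) else { return nil }
    var consumed = false
    var error: NSError?
    converter.convert(to: out, error: &error) { _, status in
        if consumed { status.pointee = .noDataNow; return nil }
        consumed = true
        status.pointee = .haveData
        return buffer
    }
    return error == nil ? out : nil
}
```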
Level Metering uses Apple's Accelerate framework (SIMD-optimized):
- `vDSP_rmsqv` — Root Mean Square (energy envelope)
- `vDSP_maxmgv` — Maximum magnitude (transient detection)
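Both calls are one-liners over the raw sample chunk (an illustrative sketch; the wrapper function is an assumption, the vDSP routines are the real API):

```swift
import Accelerate

/// Returns (rms, peak) for one tap callback's worth of samples.
func meter(_ samples: [Float]) -> (rms: Float, peak: Float) {
    guard !samples.isEmpty else { return (0, 0) }
    var rms: Float = 0
    var peak: Float = 0
    vDSP_rmsqv(samples, 1, &rms, vDSP_Length(samples.count))    // RMS energy
    vDSP_maxmgv(samples, 1, &peak, vDSP_Length(samples.count))  // peak magnitude
    return (rms, peak)
}
```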
🤖 WhisperService — The Brain
Wraps WhisperKit for seamless Core ML inference.
Available Models:
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `tiny` | ~39 MB | ⚡⚡⚡ | ★★☆ | Quick notes, commands |
| `tiny.en` | ~39 MB | ⚡⚡⚡ | ★★★ | English-only, fastest |
| `base` | ~74 MB | ⚡⚡ | ★★★ | General dictation (default) |
| `base.en` | ~74 MB | ⚡⚡ | ★★★★ | English dictation |
| `small` | ~244 MB | ⚡ | ★★★★ | Multilingual, complex speech |
| `small.en` | ~244 MB | ⚡ | ★★★★★ | Best English accuracy |
Two Transcription Modes:
- `interimTranscribe()` — Best-effort on partial audio (≥ 0.5 s). Used for live preview.
- `transcribe()` — Full accuracy on the complete audio buffer. Used on key release.
Models are lazy-loaded on first launch and cached by WhisperKit for instant subsequent starts.
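The lazy-load pattern looks roughly like this, assuming WhisperKit's async `init(model:)` and `transcribe(audioArray:)` entry points (exact signatures vary between WhisperKit releases — treat this as a sketch, not the app's actual `WhisperService`):

```swift
import WhisperKit

final class WhisperServiceSketch {
    private var whisperKit: WhisperKit?

    /// Loads the model on first use; WhisperKit caches the download,
    /// so subsequent launches skip the network entirely.
    private func loadedKit(model: String) async throws -> WhisperKit {
        if let kit = whisperKit { return kit }
        let kit = try await WhisperKit(model: model)
        whisperKit = kit
        return kit
    }

    /// Full-accuracy pass on a complete 16 kHz Float32 buffer.
    func transcribe(_ samples: [Float]) async throws -> String {
        let kit = try await loadedKit(model: "base")
        let results = try await kit.transcribe(audioArray: samples)
        return results.map(\.text).joined()
    }
}
```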
🖤 NotchPanel — The Stage
A carefully configured NSPanel that integrates with the hardware notch:
NSPanel Configuration:
├── Style: borderless, non-activating
├── Level: .screenSaver (above everything)
├── Size: 520 × 120 points
├── Position: centered at screen top (notch area)
├── Background: transparent (SwiftUI renders the shape)
├── Behavior: canJoinAllSpaces, stationary, fullScreenAuxiliary
└── Interaction: ignoresMouseEvents (click-through)
Animations:
- Show: `alphaValue` 0 → 1 over 0.2 s (ease-in)
- Hide: `alphaValue` 1 → 0 over 0.25 s (ease-out), then `orderOut`
The panel is completely non-interactive — clicks pass through to whatever is behind it. It's a heads-up display, not a window.
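The configuration tree above translates roughly to the following AppKit calls (a sketch assembled from the listed settings; the factory function name is an assumption):

```swift
import AppKit

func makeNotchPanel() -> NSPanel {
    let panel = NSPanel(contentRect: NSRect(x: 0, y: 0, width: 520, height: 120),
                        styleMask: [.borderless, .nonactivatingPanel],
                        backing: .buffered,
                        defer: false)
    panel.level = .screenSaver                  // above everything
    panel.backgroundColor = .clear              // SwiftUI draws the shape
    panel.isOpaque = false
    panel.ignoresMouseEvents = true             // click-through HUD
    panel.collectionBehavior = [.canJoinAllSpaces, .stationary, .fullScreenAuxiliary]
    return panel
}
```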
🌊 NotchContentView — The Canvas
The SwiftUI view renders four distinct states:
┌─────────────────────────────────────────────┐
│ STATE MACHINE │
│ │
│ ┌──────────┐ isListening ┌───────────┐ │
│ │ IDLE ├────────────────►│ LISTENING │ │
│ │ │ │ 🌊 waveform│ │
│ └──────────┘ │ 💬 interim │ │
│ ▲ └─────┬──────┘ │
│ │ │ release │
│ │ ┌─────▼──────┐ │
│ │ │TRANSCRIBING│ │
│ │ │ ⏳ spinner │ │
│ │ │ orange glow│ │
│ │ └─────┬──────┘ │
│ │ │ done │
│ │ 1.8s ┌─────▼──────┐ │
│ └───────────────────────│ DONE │ │
│ │ ✅ text │ │
│ │ "Inserted" │ │
│ └────────────┘ │
└─────────────────────────────────────────────┘
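The four states in the diagram can be derived from `NotchState`'s published flags with a small enum (an illustrative sketch — the actual view may branch on the flags directly):

```swift
/// Display state derived from the observable flags on NotchState.
enum DisplayState {
    case idle, listening, transcribing, done

    init(isListening: Bool, isTranscribing: Bool, hasFinalText: Bool) {
        if isListening { self = .listening }            // ⌥ held, waveform live
        else if isTranscribing { self = .transcribing } // key released, spinner
        else if hasFinalText { self = .done }           // "Inserted" for 1.8 s
        else { self = .idle }
    }
}
```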
Visual Design Language:
- Background: Solid black `NotchShape` (pill with 10pt top / 20pt bottom corners)
- Border (listening): Cyan ↔ Purple linear gradient, pulsing glow
- Border (transcribing): Orange ↔ Gold linear gradient
- Shadow: Black 0.6 opacity, 16pt blur, 6pt Y offset
- Typography: SF Rounded, semibold/medium weights
| Requirement | Details |
|---|---|
| macOS | 14.0 Sonoma or later |
| Chip | Apple Silicon (M1/M2/M3/M4) recommended |
| Xcode | 15.0+ (for building from source) |
| Disk Space | ~75 MB for base model (downloaded on first launch) |
```bash
# 1. Clone and open
open WhisperNotch.xcodeproj
# 2. Xcode auto-resolves WhisperKit package
# 3. Select scheme: WhisperNotch → My Mac
# 4. ⌘R to build and run
```

```bash
swift build
swift run WhisperNotch
```

```bash
chmod +x build-dmg.sh
./build-dmg.sh
# → Creates WhisperNotch-Installer.dmg
```

1. 🖱️ Right-click app → Open (bypasses Gatekeeper on first run)
2. 🎤 Grant Microphone permission when prompted
3. ♿ Grant Accessibility in System Settings → Privacy & Security
4. ⏳ Wait for Whisper model to download (~74 MB, one-time)
5. ⌥ Hold Option in any text field and speak!
WhisperNotch requires two system permissions — both for good reason:
| Permission | Why It's Needed | What Happens Without It |
|---|---|---|
| 🎤 Microphone | Capture your voice for transcription | App cannot function at all |
| ♿ Accessibility | Detect global ⌥ hotkey + auto-paste text | Hotkey works, but no auto-paste — text goes to clipboard for manual ⌘V |
Why no sandbox? The app sandbox prevents global event monitoring and Accessibility API access — both essential for WhisperNotch's core workflow. The app compensates by being 100% local with no network calls (except the one-time model download).
Click the waveform icon in your menu bar → Settings:
- Model Picker — Switch between 6 Whisper variants (tiny → small, with/without `.en`)
- Permission Status — See if Microphone and Accessibility are granted
- Quick Links — Jump directly to System Settings to fix permissions
┌───────────────────────────────────────────────────────┐
│ TECH STACK │
│ │
│ Language Swift 5.9 │
│ UI SwiftUI + AppKit (NSPanel, NSHosting) │
│ Audio AVFoundation + Accelerate (vDSP) │
│ AI Engine WhisperKit 0.9+ (Core ML backend) │
│ Events Carbon.HIToolbox + CoreGraphics │
│ Build Swift Package Manager + XcodeGen │
│ Target macOS 14.0+ (Sonoma) │
│ Architecture arm64 (Apple Silicon), x86_64 compat │
│ │
└───────────────────────────────────────────────────────┘
"Failed to create event tap"
Grant Accessibility permission:
- Open System Settings → Privacy & Security → Accessibility
- Add WhisperNotch (or toggle it off and on)
- You may need to restart the app
No audio captured / empty transcription
- Check System Settings → Privacy & Security → Microphone
- Ensure your mic isn't muted or exclusively used by another app (Zoom, etc.)
- Test your mic in System Settings → Sound → Input
Model download is slow or fails
- The model downloads once on first launch (~74 MB for base)
- Ensure you have internet connectivity for this initial download
- Subsequent launches use the cached model — no internet needed
Text not auto-pasting into the app
- Auto-paste requires Accessibility permission
- Without it, text is still copied to your clipboard — press ⌘V manually
- The notch shows "Inserted ✓" either way (text is on clipboard)
"Too short — hold ⌥ longer"
- Minimum recording length is 0.1 seconds (1,600 samples at 16 kHz)
- Hold the Option key for at least half a second while speaking
| Decision | Why |
|---|---|
| No Dock icon | LSUIElement: true — it's a utility, not an app you switch to |
| No sandbox | The sandbox blocks global event monitoring and the Accessibility API |
| ⌥ Option key | Least-used modifier; doesn't conflict with system shortcuts |
| 28 waveform bars | Fills the notch width perfectly at 3px × 2.5px spacing |
| 1.5s interim interval | Fast enough for live preview, slow enough to not overload CPU |
| 150ms paste delay | Ensures clipboard is committed before synthetic ⌘V fires |
| RMS×0.4 + Peak×0.6 | Responsive but smooth — pure RMS is sluggish, pure peak is jittery |
| NSPanel not NSWindow | No app activation, no taskbar entry, no focus stealing |
| Click-through UI | The notch display should never interrupt your workflow |
| base model default | Best balance of speed and accuracy for most users |
Built with obsessive attention to detail.
Because your Mac's notch deserved a purpose.
MIT License — Use it, fork it, make it yours.