Feat/multilingual whisper by andyhtran · Pull Request #1 · andyhtran/MiniWhisper

andyhtran · 2026-03-24T22:36:55Z

Summary

This PR adds a new multilingual transcription mode powered by whisper.cpp (large-v3-turbo) while
keeping the existing fast English-only Parakeet path.

It also fixes a critical Whisper integration bug where transcriptions returned empty text even when
valid audio was recorded.

What Changed

Added multilingual model support using whisper.cpp:
- New transcription mode selection in the app
- Whisper model download/init lifecycle
- Whisper inference path for recorded WAV audio
Embedded and signed whisper.framework in app packaging/signing scripts so the app launches
reliably outside local build folders
Updated the default recording shortcut and the display formatting of shortcut text
Added/updated tests to match shortcut display behavior and cover Whisper config defaults
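
The shortcut display change above (spelled-out modifier names instead of symbols) could look roughly like the sketch below. The `Modifiers` type, the `displayString` function, and the Control, Option, Shift, Command ordering are all assumptions for illustration; the app's actual shortcut type is not shown in this PR.

```swift
// Hypothetical modifier set; not the app's real shortcut type.
struct Modifiers: OptionSet {
    let rawValue: Int
    static let control = Modifiers(rawValue: 1 << 0)
    static let option  = Modifiers(rawValue: 1 << 1)
    static let shift   = Modifiers(rawValue: 1 << 2)
    static let command = Modifiers(rawValue: 1 << 3)
}

/// Renders a shortcut with spelled-out modifier names joined by "+".
/// The modifier ordering used here is an assumption.
func displayString(modifiers: Modifiers, key: String) -> String {
    let names: [(Modifiers, String)] = [
        (.control, "Control"), (.option, "Option"),
        (.shift, "Shift"), (.command, "Command"),
    ]
    // Keep only the modifiers that are set, in the fixed order above.
    let parts = names.filter { modifiers.contains($0.0) }.map { $0.1 }
    return (parts + [key.uppercased()]).joined(separator: "+")
}

print(displayString(modifiers: [.option, .shift], key: "r")) // Option+Shift+R
```

A test for the display behavior would then compare this string against the expected text for the new default shortcut.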

Bug Fix Included

Empty transcription with Whisper

Root cause:

whisper_full was called with detect_language = true, which can trigger language-detection-only
behavior (returns success but no segments).

Fix:

Keep auto language behavior via params.language = nil
Disable detection-only path by setting detect_language = false
Ensure converter callbacks correctly signal end-of-stream in audio conversion/resampling
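
To make the first two fix items concrete, here is a small model of the behavior described above. It does not call whisper.cpp: `DecodeParams`, `DecodeResult`, and `simulateFull` are hypothetical stand-ins that mirror only the two fields named in this PR (`language` and `detect_language`) and the reported symptom (success with zero segments).

```swift
// Stand-in for the two whisper_full_params fields named in this PR.
// This is a model for illustration, not the whisper.cpp binding itself.
struct DecodeParams {
    var language: String? = nil       // nil requests automatic language selection
    var detectLanguage: Bool = false  // true requests detection only
}

struct DecodeResult {
    let returnCode: Int
    let segments: [String]
}

/// Models the behavior described above: with detectLanguage set, the call
/// reports success but stops after detection, so no segments are produced.
func simulateFull(_ params: DecodeParams, decoded: [String]) -> DecodeResult {
    if params.detectLanguage {
        return DecodeResult(returnCode: 0, segments: [])
    }
    return DecodeResult(returnCode: 0, segments: decoded)
}

let buggy = simulateFull(DecodeParams(language: nil, detectLanguage: true), decoded: ["hello"])
let fixed = simulateFull(DecodeParams(language: nil, detectLanguage: false), decoded: ["hello"])
print(buggy.segments.count, fixed.segments.count) // 0 1
```

Both configurations return code 0, which is why the bug surfaced as empty text rather than as an error.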

Result:

Whisper now returns non-empty segments for valid recorded WAV input.
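
The end-of-stream item in the fix can be sketched with a pull-style input callback. This is a simplified model under stated assumptions: `InputStatus`, `makeOneShotInput`, and `drain` are hypothetical names, and the PR does not show which converter API the app uses. The idea is that the callback hands over the recorded samples once and reports end-of-stream afterwards, so the consuming loop can finish.

```swift
// Hypothetical status type modeled on pull-style converter callbacks.
enum InputStatus { case haveData, endOfStream }

/// Returns a callback that hands over the recorded samples once, then
/// reports end-of-stream on every later call.
func makeOneShotInput(_ samples: [Float]) -> () -> (InputStatus, [Float]) {
    var consumed = false
    return {
        if consumed { return (.endOfStream, []) }
        consumed = true
        return (.haveData, samples)
    }
}

/// Minimal stand-in for a converter loop: pulls input until end-of-stream.
func drain(_ input: () -> (InputStatus, [Float])) -> [Float] {
    var output: [Float] = []
    while true {
        let (status, chunk) = input()
        if status == .endOfStream { break }
        output.append(contentsOf: chunk)
    }
    return output
}

print(drain(makeOneShotInput([0.1, 0.2, 0.3])).count) // 3
```

In this model, a callback that kept returning `.haveData` with the same buffer would make `drain` loop forever or duplicate audio, which is the kind of failure correct end-of-stream signaling avoids.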

Verification

swift build passes
swift test passes (35 tests)
Direct Whisper harness run against saved recording now returns:
- rc=0
- segments=1
- non-empty transcript text

Notes

Existing Parakeet (English-only) flow remains intact
Multilingual mode is fully on-device

- New TranscriptionMode enum (.english / .multilingual) persisted in UserDefaults
- WhisperProvider: downloads 547 MB model on demand, resamples audio to 16 kHz, runs whisper_full with Metal GPU; auto-detects language via params.language = nil (detect_language = false to avoid detection-only mode with no segments)
- ParakeetProvider: add unload() to free memory on mode switch
- AppState: switchTranscriptionMode(), updated preloadModel/startRecording/transcribe to dispatch to active engine; forwards isModelDownloading/modelDownloadProgress
- ModelPickerView: footer popover with English Only and Multilingual rows
- MenuBarView: download progress bar + percentage in header; globe icon in footer
- Shortcut default changed to Option+Shift+R; display uses words (Option+, Shift+) instead of symbols
- build-app.sh: embed whisper.framework with proper versioned symlink structure
- sign-and-notarize.sh: sign embedded frameworks before app
- justfile: DEV_CODESIGN_IDENTITY for stable dev signing (no accessibility resets)
- All 35 tests pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
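
As an illustration of the TranscriptionMode persistence listed in the commit message, a minimal sketch might look like this. The case names come from the commit message; the String raw values, the `transcriptionMode` key, the helper functions, the example suite name, and the `.english` fallback are assumptions, not the app's actual code.

```swift
import Foundation

// Case names follow the PR description; raw values and the key are assumptions.
enum TranscriptionMode: String {
    case english
    case multilingual
}

let modeKey = "transcriptionMode" // hypothetical UserDefaults key

func saveMode(_ mode: TranscriptionMode, to defaults: UserDefaults) {
    defaults.set(mode.rawValue, forKey: modeKey)
}

/// Falls back to .english when nothing valid is stored (assumed default).
func loadMode(from defaults: UserDefaults) -> TranscriptionMode {
    guard let raw = defaults.string(forKey: modeKey),
          let mode = TranscriptionMode(rawValue: raw) else { return .english }
    return mode
}

// A separate suite keeps the example out of the standard defaults domain.
let defaults = UserDefaults(suiteName: "MiniWhisperExample") ?? .standard
saveMode(.multilingual, to: defaults)
print(loadMode(from: defaults)) // multilingual
```

Storing the raw string rather than an index keeps the stored value readable and stable if cases are reordered later.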

andyhtran and others added 3 commits March 24, 2026 05:30

add svg for icon

150cc4c

update readme

5cefbe8

andyhtran merged commit d7136d2 into main Mar 24, 2026
1 check passed

andyhtran merged 3 commits into main from feat/multilingual-whisper
