On-device speech-to-text on your Mac. Transcribe audio files, generate subtitles, stream from the microphone — all locally, no cloud.
No API keys. No network. No subscriptions. The speech recognition is already on your computer — ohr lets you use it.
Every Mac with Apple Silicon has a built-in speech recognizer — Apple's on-device SpeechAnalyzer, shipped as part of the Speech framework (macOS 26+). ohr wraps it in a CLI and an OpenAI-compatible HTTP server — so you can actually use it. All inference runs on-device, no network calls.
- UNIX tool —
ohr meeting.m4a— file in, text out. Pipe-friendly, multiple output formats, proper exit codes - Subtitle generator —
ohr -o srt lecture.wav > lecture.srt— SRT and VTT with precise timestamps - Live transcription —
ohr --listen— real-time microphone input - OpenAI-compatible server —
ohr --serve— drop-in replacement forPOST /v1/audio/transcriptions - Zero cost — no API keys, no cloud, no subscriptions, 30 languages supported
- Apple Silicon Mac, macOS 26 Tahoe or newer
- Building from source requires Command Line Tools with macOS 26 SDK (ships Swift 6.3). No Xcode required.
Homebrew (recommended):
brew tap Arthur-Ficial/tap
brew install Arthur-Ficial/tap/ohrBuild from source:
git clone https://github.com/Arthur-Ficial/ohr.git
cd ohr
make installohr meeting.m4a# SRT format
ohr -o srt lecture.wav > lecture.srt
# WebVTT format
ohr --vtt interview.m4a > interview.vttohr -o json recording.m4a | jq .{
"model": "apple-speechanalyzer",
"text": "Hello, this is a test of the speech to text system.",
"segments": [
{ "id": 0, "start": 0.0, "end": 1.86, "text": "Hello, this is a test" },
{ "id": 1, "start": 1.86, "end": 4.44, "text": "of the speech to text system." }
],
"duration": 4.44,
"language": "en",
"metadata": { "on_device": true, "version": "0.1.3" }
}ohr --timestamps meeting.m4a[00:00:00,000] Hello, this is a test
[00:00:01,860] of the speech to text system.
cat recording.wav | ohrohr meeting.m4a | apfel "summarize this meeting"ohr --listen # plain text with timestamps
ohr --listen --json # JSONL stream
ohr --listen --srt # SRT as you speakohr -l de-DE meeting.m4a # German
ohr -l fr-FR interview.m4a # French
ohr -l ja-JP recording.m4a # Japanese# Start server
ohr --serve
# In another terminal:
curl -X POST http://localhost:11434/v1/audio/transcriptions \
-F file=@meeting.m4a \
-F model=apple-speechanalyzer{"text": "Hello, this is a test of the speech to text system."}All five response formats:
# JSON (default)
curl -X POST http://localhost:11434/v1/audio/transcriptions \
-F file=@audio.m4a -F response_format=json
# Verbose JSON (with segments, timestamps, duration)
curl -X POST http://localhost:11434/v1/audio/transcriptions \
-F file=@audio.m4a -F response_format=verbose_json
# Plain text
curl -X POST http://localhost:11434/v1/audio/transcriptions \
-F file=@audio.m4a -F response_format=text
# SRT subtitles
curl -X POST http://localhost:11434/v1/audio/transcriptions \
-F file=@audio.m4a -F response_format=srt
# WebVTT subtitles
curl -X POST http://localhost:11434/v1/audio/transcriptions \
-F file=@audio.m4a -F response_format=vttSee demo/ for real-world shell scripts powered by ohr.
subtitle — generate subtitles:
demo/subtitle lecture.m4a --save # saves lecture.srt next to fileaudio-grep — search inside audio files:
demo/audio-grep "budget" meetings/*.m4a # find mentions with timestamps
demo/audio-grep -c "deadline" *.m4a # count matches per fileminutes — meeting to minutes (ohr + apfel):
demo/minutes standup.m4a -o markdown > standup.mdbatch-transcribe — transcribe a whole folder:
demo/batch-transcribe ~/recordings/ -o srtwhisper-compat — drop-in Whisper CLI replacement:
demo/whisper-compat audio.m4a --output_format srt --language enAlso in demo/:
- dictate — speak into a text file via microphone
- live-caption — real-time captions in the terminal
- voice-search — search spoken content across files
- translate-audio — transcribe then translate (ohr + apfel)
- action-items — extract to-dos from meetings (ohr + apfel)
- podcast-chapters — timestamped chapter markers (ohr + apfel)
- voice-note — record, transcribe, and summarize (ohr + apfel)
Base URL: http://localhost:11434/v1
| Feature | Status | Notes |
|---|---|---|
POST /v1/audio/transcriptions |
Supported | All 5 response formats |
GET /v1/models |
Supported | Returns apple-speechanalyzer |
GET /health |
Supported | Model availability, formats, languages |
GET /v1/logs |
Debug only | Available with --debug |
GET /v1/logs/stats |
Debug only | Available with --debug |
response_format |
Supported | json, verbose_json, text, srt, vtt |
language |
Supported | BCP-47 language code |
model |
Accepted | Ignored (only one model) |
prompt |
Accepted | Ignored (SpeechAnalyzer doesn't support prompting) |
temperature |
Accepted | Validated (0.0–1.0) |
| CORS | Supported | Enable with --cors |
| Token auth | Supported | --token <secret> or --token-auto |
POST /v1/chat/completions |
501 | Use apfel |
POST /v1/embeddings |
501 | Use kern |
| Format | Extensions |
|---|---|
| Apple M4A | .m4a |
| WAV | .wav, .wave |
| MP3 | .mp3 |
| MPEG-4 | .mp4 |
| Core Audio | .caf |
| AIFF | .aiff, .aif |
| FLAC | .flac |
30 languages including English, German, Spanish, French, Italian, Japanese, Korean, Portuguese, Chinese (Simplified, Traditional, Cantonese).
ohr --model-info # full listTested on Apple M2 with synthetic speech. Real-world performance may vary.
| Audio Length | Transcribe Time | Speed |
|---|---|---|
| 5 seconds | 300ms | 8x real-time |
| 30 seconds | 600ms | 39x real-time |
| 1 minute | 1.5s | 46x real-time |
| 3 minutes | 2.5s | 58x real-time |
| 10 minutes | 4.7s | 57x real-time |
10 minutes of audio transcribes in under 5 seconds. No upper limit found.
| Constraint | Detail |
|---|---|
| Platform | macOS 26+, Apple Silicon only |
| Model | One model (apple-speechanalyzer), not configurable |
| Accuracy | ~90-95% on clear synthetic speech. Lower on real speech with noise, accents, or multiple speakers |
| Numbers | Spoken numbers sometimes confused with ordinals ("five second" → "52nd") |
| Languages | 30 languages supported, but accuracy tested only for English. Other languages need the -l flag |
| No diarization | Cannot distinguish between different speakers |
| Audio formats | m4a, wav, mp3, mp4, caf, aiff, flac. No OGG, OPUS, or WebM |
| apfel integration | When piping to apfel, transcripts longer than ~3000 words may exceed apfel's 4096-token context window |
| Testing gap | All testing used synthetic say command speech, not real human recordings |
See docs/testing.md for the full QA report with methodology and detailed results.
ohr <file> Transcribe audio file
ohr -o srt <file> Generate SRT subtitles
ohr -o vtt <file> Generate VTT subtitles
ohr -o json <file> JSON output with segments
ohr --listen Live microphone transcription
ohr --serve Start OpenAI-compatible server
cat audio.wav | ohr Transcribe from stdin
Output options:
| Flag | Description |
|---|---|
-o, --output <fmt> |
Output format: plain (default), json, srt, vtt |
--json |
Shorthand for -o json |
--srt |
Shorthand for -o srt |
--vtt |
Shorthand for -o vtt |
--timestamps |
Show timestamps in plain text output |
-l, --language <code> |
Language code (e.g. en-US, de-DE) |
-q, --quiet |
Suppress headers and chrome |
--no-color |
Disable ANSI colors |
Server options (--serve):
| Flag | Description |
|---|---|
--port <n> |
Server port (default: 11434) |
--host <addr> |
Bind address (default: 127.0.0.1) |
--cors |
Enable CORS headers for browser clients |
--allowed-origins <origins> |
Add comma-separated allowed origins |
--no-origin-check |
Disable origin checking |
--token <secret> |
Require Bearer token authentication |
--token-auto |
Generate and print a random Bearer token |
--public-health |
Keep /health unauthenticated on non-loopback |
--footgun |
Disable all protections |
--max-concurrent <n> |
Max concurrent requests (default: 5) |
--debug |
Verbose logging and enable /v1/logs endpoints |
Info options:
| Flag | Description |
|---|---|
-v, --version |
Print version |
-h, --help |
Show help |
--release |
Show detailed build info |
--model-info |
Show model capabilities and languages |
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Runtime error |
| 2 | Usage error (bad flags) |
| 3 | Unsupported audio format |
| 4 | File not found |
| 5 | Transcription failed |
| 6 | Rate limited |
| Variable | Description |
|---|---|
OHR_PORT |
Server port (default: 11434) |
OHR_HOST |
Server bind address (default: 127.0.0.1) |
OHR_TOKEN |
Bearer token for server authentication |
OHR_LANGUAGE |
Default language code |
NO_COLOR |
Disable colors (no-color.org) |
CLI (file/stdin/mic) ──┐
├──→ Speech.SpeechAnalyzer (file transcription)
├──→ Speech.SpeechTranscriber (live microphone)
HTTP Server (/v1/*) ───┘ (100% on-device, zero network)
Built with Swift 6.3 strict concurrency. Single Package.swift, three targets:
OhrCore— pure logic library (no Speech framework dependency, unit-testable)ohr— executable (CLI + server)ohr-tests— 109 unit tests
No Xcode required. Builds and tests with Command Line Tools only.
# Build + install (auto-bumps patch version each time)
make install # build release + install to /usr/local/bin
make build # build release only (no install)
# Version management (zero manual editing)
make version # print current version
make release-minor # bump minor: 0.1.x -> 0.2.0
make release-major # bump major: 0.x.y -> 1.0.0
# Debug build (no version bump, uses swift directly)
swift build # quick debug build
# Unit tests
swift run ohr-tests # 109 pure Swift unit tests (no XCTest needed)
# Integration tests (requires server running)
ohr --serve --token test --debug & # start server
OHR_TEST_TOKEN=test python3 -m pytest Tests/integration/ -v # 42 integration testsEvery make build/make install automatically:
- Bumps the patch version (
.versionfile is the single source of truth) - Generates build metadata (commit, date, Swift version) viewable via
ohr --release
| Tool | What | Apple Framework | Repo |
|---|---|---|---|
| apfel | LLM (text generation) | FoundationModels | golden example |
| ohr | Speech-to-text | SpeechAnalyzer | you are here |
| kern | Text embeddings | NLContextualEmbedding | sister project |
| auge | Vision / OCR | Vision | sister project |
Meta-repo: apfel-ecosystem