
narrate

Make your AI agents and scripts speak. One command, six TTS providers, zero lock-in.

60-second quickstart (macOS)

Hear narrate speak in three commands — no API keys, no signup:

brew install felores/narrate/narrate
brew services start narrate
narrate "Hello, narrate"

That's it. Uses your built-in macOS voice. Want studio-quality voices? Add an API key — it's optional.

Linux? curl -fsSL https://raw.githubusercontent.com/felores/narrate/main/install.sh | bash, then sudo apt install espeak-ng, then narrate-server & and narrate "hello".



Add an API key

Optional. The default macOS voice works fine for notifications, but premium providers sound dramatically better. Pick one (or several):

| Provider | Where to get the key | Cost |
| --- | --- | --- |
| ElevenLabs | elevenlabs.io | free tier, premium voices |
| OpenAI | platform.openai.com/api-keys | pay-per-use, very cheap |
| Google Gemini | aistudio.google.com/apikey | free tier |
| xAI | console.x.ai | pay-per-use |

Then add the key(s) to ~/.env and switch the default provider:

echo 'OPENAI_API_KEY=sk-...' >> ~/.env       # any subset works
echo 'ELEVENLABS_API_KEY=...' >> ~/.env

mkdir -p ~/.config/narrate
echo '{"default_provider":"openai","default_voice":"nova"}' > ~/.config/narrate/config.json

brew services restart narrate
narrate "Now I sound much better"

narrate verify shows you which providers are configured. See Provider setup detail for per-provider voice IDs.

Why ~/.env, not ~/.zshrc? Background services (brew services, LaunchAgent, systemd) don't run shell init. ~/.env is the only path that works for both CLI and the server-as-service.
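
To see why a plain `KEY=VALUE` file works where shell init doesn't, here's a minimal sketch of how any process (shell or not) can load such a file line by line. This illustrates the idea only; narrate's actual loader may differ.

```shell
# Sketch: load a KEY=VALUE file without running shell init.
# Uses a temp file as a stand-in for ~/.env in this demo.
ENV_FILE="$(mktemp)"
printf 'OPENAI_API_KEY=sk-demo\n# a comment\n' > "$ENV_FILE"

while IFS='=' read -r key value; do
  case "$key" in
    ''|\#*) continue ;;            # skip blank lines and comments
  esac
  export "$key=$value"             # plain KEY=VALUE lines only
done < "$ENV_FILE"

echo "$OPENAI_API_KEY"             # the process now sees the key
rm -f "$ENV_FILE"
```

Because the file is just data, the CLI, a LaunchAgent, and a systemd unit can all read it the same way — no shell required.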


Why narrate

Every coding harness reinvents voice. ElevenLabs has a UI, OpenAI has an API, Cartesia has another API, Voicebox has its own MCP server — and each agent (Claude Code, OpenCode, Pi, Cursor, Cline) has its own way of plugging in. The result: shell scripts that hardcode one provider, hooks that break when you change agents, no shared concept of "voice".

narrate collapses the matrix:

  • One server, one set of API keys, one set of voice presets.
  • Three interfaces: HTTP for anything, CLI for shells, MCP for agents that speak the protocol.
  • Six providers behind a uniform Provider interface — including a proxy to Voicebox for fully local voice cloning.
  • Voice presets that abstract over providers (narrate --voice researcher works whether researcher is OpenAI Nova or Voicebox Morgan).
  • Drop into any harness: hook scripts, plugins, MCP — pick whichever your tool supports.

Providers

| Provider | Type | Auth | Notes |
| --- | --- | --- | --- |
| ElevenLabs | Cloud | ELEVENLABS_API_KEY | High quality, premium voices |
| OpenAI TTS | Cloud | OPENAI_API_KEY | alloy, echo, fable, onyx, nova, shimmer |
| Google Gemini TTS | Cloud | GEMINI_API_KEY | Multilingual, requires ffmpeg for PCM→WAV |
| xAI Grok TTS | Cloud | XAI_API_KEY | eve, ara, rex, sal, leo |
| Voicebox | Local proxy | none | Auto-detects on :17493 — voice cloning, 7 local engines, 23 languages |
| System (say / espeak) | Local | none | Zero-dep fallback, works offline |

Add any subset. narrate uses what you've configured and reports the rest as ⚪ not configured in narrate verify.

Install

macOS — Homebrew (recommended, one command)

brew install felores/narrate/narrate
brew services start narrate          # auto-start at login

That's everything. Bun is pulled in as a dependency. After this you can run narrate "hello" and you'll hear it.

Linux / macOS — curl install

Requires bun first (curl -fsSL https://bun.sh/install | bash).

curl -fsSL https://raw.githubusercontent.com/felores/narrate/main/install.sh -o /tmp/narrate-install.sh
bash /tmp/narrate-install.sh
"$HOME/.local/share/narrate/service/launchd/install.sh"   # macOS
"$HOME/.local/share/narrate/service/systemd/install.sh"   # Linux

Clones to ~/.local/share/narrate, writes wrappers to ~/.local/bin/{narrate,narrate-server}, then installs the auto-start service. Override paths via NARRATE_DIR, BIN_DIR, NARRATE_REF.

Development — git clone

git clone https://github.com/felores/narrate.git ~/Documents/GitHub/narrate
cd ~/Documents/GitHub/narrate
bun install
bun run src/server.ts &
bun run src/cli.ts verify

Where things live

Once installed, the repo + scripts are at one of these paths depending on the method you used:

| Install method | $NARRATE_DIR | Logs |
| --- | --- | --- |
| Homebrew | $(brew --prefix narrate)/libexec | $NARRATE_DIR/logs/narrate.log |
| curl install | ~/.local/share/narrate | $NARRATE_DIR/logs/narrate.log |
| git clone (dev) | wherever you cloned (e.g. ~/Documents/GitHub/narrate) | $NARRATE_DIR/logs/narrate.log |

Set it once in your shell init so the recipes below work copy-paste:

# pick the line that matches how you installed
export NARRATE_DIR="$(brew --prefix narrate)/libexec"   # brew
export NARRATE_DIR="$HOME/.local/share/narrate"         # curl
export NARRATE_DIR="$HOME/Documents/GitHub/narrate"     # git clone

The running server reports its own location at GET /health (repo_dir, logs_dir) — useful for plugins and tooling that need to self-locate.
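
A tool can self-locate by parsing that response. The sketch below uses a canned JSON string in place of a live `curl -s localhost:8888/health` call; the `repo_dir`/`logs_dir` field names come from this section, but the path values are hypothetical examples.

```shell
# Canned stand-in for: curl -s localhost:8888/health
HEALTH_JSON='{"status":"healthy","repo_dir":"/opt/homebrew/opt/narrate/libexec","logs_dir":"/opt/homebrew/opt/narrate/libexec/logs"}'

# Extract the two self-location fields with python3 (no jq needed)
REPO_DIR="$(printf '%s' "$HEALTH_JSON" | python3 -c 'import sys,json;print(json.load(sys.stdin)["repo_dir"])')"
LOGS_DIR="$(printf '%s' "$HEALTH_JSON" | python3 -c 'import sys,json;print(json.load(sys.stdin)["logs_dir"])')"

echo "repo: $REPO_DIR"
echo "logs: $LOGS_DIR"
```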

Configure

You can skip this entirely if the Add an API key section above covered your needs. This section is for named voice presets and per-provider tweaks.

Voice presets (voices.json)

Map a friendly name to a (provider, voice_id) pair so you can swap providers without touching agent code:

mkdir -p ~/.config/narrate
cp "$NARRATE_DIR/voices.json.example" ~/.config/narrate/voices.json
narrate --voice researcher "Findings ready"   # uses the preset from voices.json

Edit ~/.config/narrate/voices.json to add your own presets. Full schema in voices.json — voice presets.

Custom defaults (config.json)

cat > ~/.config/narrate/config.json <<EOF
{
  "default_provider": "openai",
  "default_voice": "researcher",
  "port": 8888
}
EOF
brew services restart narrate

See Configuration precedence for the full resolution chain.

Quickstart by interface

narrate exposes three interfaces. Pick whichever your tool supports.

CLI — narrate "..."

Best for shells, hooks, scripts, cron, terminal one-offs.

narrate "Build complete"
narrate --voice engineer "Tests passed"
narrate --provider system --id Samantha "Local fallback"
echo "Long output" | narrate --quiet
narrate verify              # doctor-style health snapshot
narrate verify --test       # also play one sample per configured provider (1 API call each)

HTTP — POST localhost:8888/notify

Best for plugin code, webhooks, anything that can fetch.

curl -X POST http://localhost:8888/notify \
  -H 'Content-Type: application/json' \
  -H 'X-Narrate-Client-Id: my-app' \
  -d '{"message":"Build green","voice":"engineer"}'

MCP — narrate.speak(...)

Best for AI agents with native tool calling. The agent itself decides when to speak.

# Claude Code one-liner
claude mcp add narrate \
  --transport http \
  --url http://localhost:8888/mcp \
  --header "X-Narrate-Client-Id: claude-code"

Or via .mcp.json in any HTTP MCP client (Cursor, Windsurf, VS Code, Cline):

{
  "mcpServers": {
    "narrate": {
      "url": "http://localhost:8888/mcp",
      "headers": { "X-Narrate-Client-Id": "cursor" }
    }
  }
}

The agent now sees narrate.speak, narrate.list_voices, and narrate.list_providers as tools.

Use it from each harness

Per-harness recipes live under integrations/. Summary:

| Harness | Method | Recipe |
| --- | --- | --- |
| Claude Code | MCP (recommended) or Stop hook | integrations/claude-code/ |
| Cursor / Windsurf / Cline | MCP | integrations/cursor/ |
| OpenCode | Plugin (@opencode-ai/plugin) | integrations/opencode/ |
| Pi (pi-mono) | agent.subscribe('turn_end') | integrations/pi/ |
| ChatGPT Codex CLI | Wrapper script | integrations/codex/ |
| Shell scripts / cron / CI | Direct CLI | integrations/shell/ |
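
For the shell/CI row, a wrapper that degrades gracefully when the server is down is a common pattern. This sketch uses the documented `/notify` endpoint and client-id header; the fallback-to-log behavior and function name are this example's own choices, and the inline JSON quoting is naive (fine for simple messages only).

```shell
# Hedged sketch: speak via narrate if reachable, otherwise log a line.
speak_or_log() {
  msg="$1"
  # note: naive JSON quoting — avoid quotes/backslashes in $msg
  if curl -sf --max-time 2 -X POST http://localhost:8888/notify \
       -H 'Content-Type: application/json' \
       -H 'X-Narrate-Client-Id: ci' \
       -d "{\"message\":\"$msg\"}" > /dev/null 2>&1; then
    echo "spoken: $msg"
  else
    echo "[narrate offline] $msg"
  fi
}

speak_or_log "Build green"
```

Because the CLI and HTTP interface are the same server, this works identically from cron, CI, or a git hook.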

Provider setup detail

ElevenLabs

  1. Sign up at elevenlabs.io → API Keys → create a key.
  2. echo 'ELEVENLABS_API_KEY=your_key' >> ~/.env
  3. Voice IDs: find them at elevenlabs.io/voice-lab (each voice's URL ends in its ID).
  4. Add to voices.json:
    "rachel": { "provider": "elevenlabs", "voice_id": "21m00Tcm4TlvDq8ikWAM" }

OpenAI TTS

  1. Get a key at platform.openai.com/api-keys.
  2. echo 'OPENAI_API_KEY=sk-...' >> ~/.env
  3. Six built-in voices (no IDs to look up): alloy, echo, fable, onyx, nova, shimmer.
  4. Optional providerConfig: { "model": "tts-1-hd", "speed": 1.2 } for higher quality / faster speech.
    "narrator": {
      "provider": "openai",
      "voice_id": "fable",
      "providerConfig": { "model": "tts-1-hd" }
    }

Google Gemini TTS

  1. Get a key at aistudio.google.com/apikey.
  2. echo 'GEMINI_API_KEY=...' >> ~/.env
  3. Install ffmpeg (Gemini returns raw PCM that we convert to WAV):
    brew install ffmpeg                     # macOS
    sudo apt install ffmpeg                 # Linux
  4. Voice names: Kore, Puck, Charon, Fenrir, Aoede (and others — see Gemini docs).

xAI Grok TTS

  1. Get a key at console.x.ai.
  2. echo 'XAI_API_KEY=...' >> ~/.env
  3. Voice IDs: eve, ara, rex, sal, leo.
  4. Optional: XAI_LANGUAGE=auto (default), XAI_VOICE_ID=ara set as default voice.

Voicebox (local)

See Voicebox deep dive. TLDR:

"$NARRATE_DIR/examples/voicebox-install-macos.sh"
open /Applications/Voicebox.app
# wait for Kokoro model download via Settings → Engines (or another engine)
"$NARRATE_DIR/examples/voicebox-create-profile.sh"     # creates "Bella" profile
narrate --provider voicebox --id Bella "Local voice"

System (say / espeak)

Zero config on macOS — say is built in. On Linux, install espeak-ng:

sudo apt install espeak-ng     # Debian/Ubuntu
sudo dnf install espeak-ng     # Fedora

Voice names: any voice your system speaks. On macOS:

say -v '?'                      # list all installed voices
narrate --provider system --id Samantha "macOS Samantha"
narrate --provider system --id "Daniel" "British Daniel"

Voicebox deep dive

Voicebox is a local-first desktop app that runs TTS engines on your GPU. narrate uses it as a provider — your agent calls narrate.speak, narrate proxies to voicebox, voicebox plays the audio.

Install

"$NARRATE_DIR/examples/voicebox-install-macos.sh"

(Or download manually from voicebox.sh and drag to /Applications.)

Engine vs profile (gotcha)

Voicebox has two concepts:

  • Engine = the underlying TTS model (Kokoro, Qwen, Chatterbox, TADA, LuxTTS). Each engine ships preset voices.
  • Profile = a usable voice instance, either created from a preset or cloned from audio.

/speak only accepts profile names — preset voices have to be promoted to profiles first. Do it via UI, or with the helper:

"$NARRATE_DIR/examples/voicebox-create-profile.sh"                          # creates "Bella" from kokoro/af_bella
"$NARRATE_DIR/examples/voicebox-create-profile.sh" Adam kokoro am_adam en
"$NARRATE_DIR/examples/voicebox-create-profile.sh" Dora kokoro ef_dora es
"$NARRATE_DIR/examples/voicebox-create-profile.sh" George kokoro bm_george en

Multi-language behavior

Kokoro voices are flexible: the same profile can speak any of Kokoro's 8 languages depending on what language you pass to /speak. Voices are style vectors at the model level — they describe a timbre, not a language. Pointing them at a different language is supported.

  • A kokoro/ef_dora-backed profile created with language: "es" speaks natural Spanish.
  • The same Dora profile asked to speak language: "en" speaks English with a Spanish accent (her trained timbre + English phonetics).
  • A kokoro/af_bella-backed profile (en-trained) asked to speak language: "es" speaks Spanish with Bella's American voice timbre but proper Spanish phonetics — this is the way to make Bella speak Spanish naturally.
  • narrate's voicebox provider resolves profile.language automatically (cached 60s) as the default. Override per-call with --language es (CLI), providerConfig.language: "es" (POST body or voices.json), or pin a preset:
"bella_es": {
  "provider": "voicebox",
  "voice_id": "Bella",
  "providerConfig": { "language": "es" }
}

Available Kokoro presets at a glance

50 presets total. Some highlights:

| Preset | Name | Language / accent |
| --- | --- | --- |
| af_bella, af_nova, af_sky, af_nicole | various | en-female (US) |
| am_adam, am_onyx, am_echo | Adam, Onyx, Echo | en-male (US) |
| bf_emma, bf_alice | Emma, Alice | en-female (UK) |
| bm_george, bm_daniel | George, Daniel | en-male (UK) |
| ef_dora, em_alex | Dora, Alex | es female / male |
| ff_siwis | Siwis | fr female |
| hf_alpha, hm_omega | various | hi female / male |
| jf_alpha, jm_kumo | various | ja female / male |
| zf_xiaoxiao, others | various | zh female |

Full list: curl http://127.0.0.1:17493/profiles/presets/kokoro.
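
The preset ids above follow a visible naming convention: the first letter encodes language/accent (a = US English, b = British English, e = Spanish, f = French, h = Hindi, j = Japanese, z = Chinese) and the second encodes gender (f/m). The decoder below is inferred from the highlights table — treat it as a convention, not an official Voicebox spec.

```shell
# Decode a Kokoro preset id from its two-letter prefix (inferred mapping).
kokoro_decode() {
  id="$1"
  prefix="${id%%_*}"                 # e.g. "af" from "af_bella"
  case "${prefix%?}" in              # first letter: language/accent
    a) lang="en-US" ;;
    b) lang="en-GB" ;;
    e) lang="es" ;;
    f) lang="fr" ;;
    h) lang="hi" ;;
    j) lang="ja" ;;
    z) lang="zh" ;;
    *) lang="unknown" ;;
  esac
  case "$prefix" in                  # second letter: gender
    ?f) gender="female" ;;
    ?m) gender="male" ;;
    *)  gender="unknown" ;;
  esac
  echo "$lang $gender"
}

kokoro_decode af_bella    # en-US female
kokoro_decode bm_george   # en-GB male
```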

voices.json — voice presets

Map a friendly name to a (provider, voice_id, options) triple so you can swap providers without touching agent code.

v2 schema (current)

{
  "default_voice": "fred",
  "default_rate": 175,
  "voices": {
    "fred":      { "provider": "elevenlabs", "voice_id": "s3TPKV1kjDlVtZbl4Ksh" },
    "researcher":{ "provider": "openai",     "voice_id": "nova"     },
    "engineer":  { "provider": "openai",     "voice_id": "alloy"    },
    "narrator":  { "provider": "openai",     "voice_id": "fable",
                   "providerConfig": { "model": "tts-1-hd" } },
    "ara":       { "provider": "xai",        "voice_id": "ara"      },
    "kore":      { "provider": "gemini",     "voice_id": "Kore"     },
    "bella":     { "provider": "voicebox",   "voice_id": "Bella"    },
    "dora":      { "provider": "voicebox",   "voice_id": "Dora"     },
    "samantha":  { "provider": "system",     "voice_id": "Samantha" }
  }
}

Use it with the preset name: narrate --voice dora "Hola".

v1 backward-compat

If your voices.json only has voice_name per entry (no provider field), narrate auto-assumes provider: "system" (the v1 schema was for macOS say). You'll see a one-line warning at startup.
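
If you'd rather upgrade the file than rely on the compat shim, a one-shot conversion mirrors what narrate assumes: any entry without a provider field becomes provider: "system". The v1 shape here (entries keyed by name with a voice_name field) is taken from the paragraph above; any other v1 fields are an assumption of this sketch.

```shell
# Sketch: convert a v1 voices.json to the v2 shape (assumes v1 entries
# carry only voice_name; narrate's own compat path may handle more).
V1="$(mktemp)"; V2="$(mktemp)"
cat > "$V1" <<'EOF'
{ "voices": { "samantha": { "voice_name": "Samantha" } } }
EOF

python3 - "$V1" "$V2" <<'EOF'
import json, sys
src, dst = sys.argv[1], sys.argv[2]
data = json.load(open(src))
for name, entry in data.get("voices", {}).items():
    if "provider" not in entry:
        entry["provider"] = "system"                  # v1 was macOS `say`
        entry["voice_id"] = entry.pop("voice_name", name)
json.dump(data, open(dst, "w"), indent=2)
EOF

cat "$V2"
```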

Per-preset providerConfig

Each provider accepts extra options under providerConfig:

| Provider | Useful keys |
| --- | --- |
| ElevenLabs | model_id, voice_settings: {stability, similarity_boost, style, use_speaker_boost} |
| OpenAI | model (tts-1 / tts-1-hd), speed (0.25–4.0) |
| Gemini | model |
| xAI | language, sample_rate, bit_rate, codec |
| Voicebox | language, instruct (Qwen CustomVoice natural-language delivery), personality (boolean), return_audio (use /generate instead of /speak) |
| System | rate |

CLI reference

narrate [options] "text to speak"
narrate verify [--test]
echo "text" | narrate [options]

Options:
  -v, --voice NAME      Voice preset from voices.json (e.g. fred, researcher)
  -i, --id ID           Raw provider voice id (bypasses preset registry)
  -p, --provider NAME   elevenlabs | openai | gemini | xai | voicebox | system
  -l, --language LANG   Force generation language (e.g. es, en, ja, fr).
                        Useful with cross-language voices: a Kokoro Bella
                        (en-trained) speaks proper Spanish phonetics with
                        --language es, since Kokoro is multilingual at the
                        model level.
  --instruct TEXT       Natural-language delivery hint (Qwen CustomVoice
                        only). E.g. "warm conversational tone",
                        "broadcast news quality", "speak slowly with
                        emphasis". Other engines ignore this flag.
  -u, --url URL         Server URL (default http://localhost:8888)
  -q, --quiet           Suppress output
  -h, --help            Show help

Subcommands:
  verify                Health snapshot — server status, provider matrix, voices
  verify --test         Also play one sample per configured provider (1 API call each)

Env:
  NARRATE_URL           Override default server URL
  NARRATE_VOICE         Default preset (fallback for omitted --voice)

--language and --instruct forward as providerConfig.{language,instruct} and override both preset providerConfig and the voicebox provider's auto-resolved profile defaults.

# Bella is en-trained, but Kokoro can aim her at Spanish phonetics:
narrate --provider voicebox --id Bella --language es "Hola, soy Bella en español"

# Qwen Ryan with delivery direction:
narrate --provider voicebox --id Ryan --instruct "broadcast news quality" "Headlines tonight"

HTTP API reference

POST /notify

Speak text. Returns immediately; audio plays asynchronously.

Body:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| message | string | yes | Up to 5000 chars, no control characters |
| voice | string | no | Preset name from voices.json |
| voice_id | string | no | Raw provider voice id (bypasses presets) |
| voice_name | string | no | Legacy alias for voice_id |
| provider | string | no | Override default provider |
| voice_enabled | boolean | no (default true) | If false, returns {status: "ok", message: "voice_enabled=false; nothing to do"} |
| providerConfig | object | no | Per-provider passthrough config (see provider table above) |

Headers:

| Header | Purpose |
| --- | --- |
| X-Narrate-Client-Id | Client identifier (logged + future per-client routing) |

Response (200):

{ "status": "success", "provider": "openai", "voice": "alloy", "format": "mp3", "delegated": false }

delegated: true means the provider played the audio itself (voicebox, system) and narrate skipped local playback.

POST /pai

Legacy alias for /notify (PAI Voice compatibility).

GET /health

Server + provider snapshot.

{
  "status": "healthy",
  "port": 8888,
  "default_provider": "xai",
  "default_voice": "ara",
  "voices_path": "/Users/you/.config/narrate/voices.json",
  "voices": ["fred", "researcher", "engineer", ...],
  "providers": {
    "elevenlabs": { "configured": true },
    "openai": { "configured": true },
    "gemini": { "configured": true },
    "xai": { "configured": true },
    "voicebox": { "configured": true },
    "system": { "configured": true }
  }
}

GET /voices

Full voices.json contents.

{
  "default_voice": "fred",
  "default_rate": 175,
  "voices": { "fred": { ... }, "researcher": { ... } }
}

POST /mcp

MCP Streamable HTTP endpoint. JSON-RPC 2.0. See MCP tools reference.

MCP tools reference

Three tools available via the MCP server at /mcp:

speak

narrate.speak({
  text: string,                  // required, max 5000
  voice?: string,                // preset name from voices.json
  voice_id?: string,             // raw provider voice id
  provider?: "elevenlabs" | "openai" | "gemini" | "xai" | "voicebox" | "system"
}) -> "Spoken via <provider> (voice=<voice>, format=<fmt>, delegated playback)"

list_voices

narrate.list_voices() -> Array<{ name, provider, voice_id, description }>

Returns all voice presets from voices.json.

list_providers

narrate.list_providers() -> Array<{ name, label, configured, reason? }>

Returns the provider health matrix — same data as GET /health's providers field.

Discover via JSON-RPC

# tools/list
curl -X POST http://localhost:8888/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'

# tools/call
curl -X POST http://localhost:8888/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"speak","arguments":{"text":"Hello","voice":"researcher"}}}'

Configuration precedence

Higher rows win. narrate reads each layer at startup; mid-flight changes need a server restart.

| # | Layer | Used for |
| --- | --- | --- |
| 1 | CLI flags / POST body / MCP tool args | per-call provider, voice, providerConfig |
| 2 | ~/.config/narrate/config.json | default_provider, default_voice, port, voices_path |
| 3 | NARRATE_* env vars | NARRATE_PORT, NARRATE_PROVIDER, NARRATE_VOICE, NARRATE_VOICES_PATH, NARRATE_URL (CLI only) |
| 4 | ~/.claude/settings.json (legacy compat) | TTS_PROVIDER and DA_VOICE_ID/NARRATE_VOICE_ID are read for backward-compat |
| 5 | ~/.env | API keys (ELEVENLABS_API_KEY, etc.), auto-loaded if present |
| 6 | Built-in defaults | port: 8888, default_provider: "elevenlabs", default_rate: 175 |

API keys come from process.env (loaded from your shell or auto-loaded from ~/.env). Never put them in config.json or voices.json.
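
The precedence chain boils down to "first non-empty value wins, top to bottom". An illustrative resolver for default_provider — not narrate's actual code, just the table's logic made concrete:

```shell
# First non-empty of: per-call flag, config.json, NARRATE_PROVIDER env,
# built-in default ("elevenlabs", per the table's bottom row).
resolve_provider() {
  flag="$1" config="$2" envvar="$3"
  for candidate in "$flag" "$config" "$envvar" "elevenlabs"; do
    if [ -n "$candidate" ]; then
      echo "$candidate"
      return
    fi
  done
}

resolve_provider ""       "openai" "xai"   # config.json beats env var
resolve_provider "system" "openai" "xai"   # per-call flag beats everything
```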

Run as a service

macOS (launchd)

brew services start narrate              # if installed via Homebrew
"$NARRATE_DIR/service/launchd/install.sh" # if installed via curl/git

The installer:

  1. Renders com.narrate.server.plist from a template ($HOME and $NARRATE_DIR substituted at install time, with a static PATH of <bun_dir>:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin).
  2. Drops it at ~/Library/LaunchAgents/.
  3. Loads it with launchctl.
  4. Verifies it's running.

To remove:

brew services stop narrate
"$NARRATE_DIR/service/launchd/uninstall.sh"

Linux (systemd)

"$NARRATE_DIR/service/systemd/install.sh"

Installs as a user service (~/.config/systemd/user/narrate.service) and runs systemctl --user enable --now.

To remove:

"$NARRATE_DIR/service/systemd/uninstall.sh"

Logging and observability

Live logs

| File | What |
| --- | --- |
| logs/narrate.log | All requests, with timestamp, provider, voice, latency, client id |
| logs/narrate-error.log | Errors |
| logs/launchd-stdout.log | Pre-init startup output (small, only grows on crashes) |
| logs/launchd-stderr.log | Same for stderr |

# follow live request log (resolve the path via /health if you don't know it)
LOGS_DIR="$(curl -s localhost:8888/health | python3 -c 'import sys,json;print(json.load(sys.stdin)["logs_dir"])')"
tail -f "$LOGS_DIR/narrate.log"

# or if you set $NARRATE_DIR per "Where things live":
tail -f "$NARRATE_DIR/logs/narrate.log"

# example lines
2026-04-27T23:44:36.733Z [/notify] → provider=voicebox voice=Dora bytes=42 from=localhost client=- ua=Bun/1.2.10
2026-04-27T23:44:36.755Z [/notify] ✅ 25ms provider=voicebox voice=Dora format=mp3 delegated=true
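
Those completed-request lines carry the latency as a bare `Nms` token, which makes them easy to mine. The sketch below runs against a copy of the two sample lines; the log line shape is taken from the example above, so adjust the match if your format differs.

```shell
# Write the sample lines to a scratch file (stand-in for narrate.log)
cat <<'EOF' > /tmp/narrate-sample.log
2026-04-27T23:44:36.733Z [/notify] → provider=voicebox voice=Dora bytes=42 from=localhost client=- ua=Bun/1.2.10
2026-04-27T23:44:36.755Z [/notify] ✅ 25ms provider=voicebox voice=Dora format=mp3 delegated=true
EOF

# Pull the Nms token from completed-request (✅) lines
LAT="$(awk '/✅/ { for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+ms$/) print $i }' /tmp/narrate-sample.log)"
echo "latency: $LAT"
```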

Log rotation

In-process rotation. Defaults: 10 MiB per file, keep the last 5 (narrate.log → narrate.log.1 → ... → narrate.log.5).

# tune via env (read once at server start)
NARRATE_LOG_MAX_BYTES=20971520 NARRATE_LOG_KEEP=10 narrate-server

# disable entirely (use raw stdout/stderr — useful for `bun run` dev mode)
NARRATE_LOG_DISABLED=1 narrate-server

narrate verify (doctor)

narrate verify
narrate verify --test    # also play 1 sample per configured provider

Prints server health, default provider/voice, voices file path, preset list, and per-provider configured/reason status.

Architecture

┌────────────────────────────────────────────────────────────┐
│                      narrate (Bun process)                 │
│                                                            │
│   HTTP server (port 8888)                                  │
│   ├─ POST /notify    POST /pai (legacy)                    │
│   ├─ GET  /health    GET  /voices                          │
│   └─ POST /mcp       (MCP Streamable HTTP)                 │
│                                                            │
│            │                                               │
│            ▼                                               │
│   handleNotify()                                           │
│            │                                               │
│            ▼                                               │
│   Provider registry  (ALL_PROVIDERS)                       │
│   ┌──────────────┬──────────────┬────────────┐             │
│   │ ElevenLabs   │ OpenAI       │ Gemini     │  cloud      │
│   ├──────────────┼──────────────┼────────────┤             │
│   │ xAI          │ Voicebox     │ System     │  cloud/local│
│   └──────────────┴──────────────┴────────────┘             │
│            │                                               │
│            ▼                                               │
│   ArrayBuffer  (or delegated=true)                         │
│            │                                               │
│            ▼                                               │
│   playback.ts → afplay (macOS) / ffplay (Linux)            │
└────────────────────────────────────────────────────────────┘

Each Provider (in src/providers/) implements a small interface:

interface Provider {
  name: string;
  label: string;
  health(): Promise<ProviderHealth>;
  generateSpeech(text: string, voice: string, opts?: ProviderOptions): Promise<AudioResult>;
  listVoices?(): Promise<VoiceInfo[]>;
}

Provider implementations talk to their respective APIs (or local services like voicebox :17493). The result is either an ArrayBuffer (cloud — narrate plays it locally via playback.ts) or delegated: true (voicebox, system — they handled playback themselves).

The MCP server is a thin wrapper: it registers narrate.speak, narrate.list_voices, narrate.list_providers as tools, and the speak tool calls the same handleNotify function as the HTTP handler. One code path, three interfaces.

Project layout

narrate/
├── src/
│   ├── providers/
│   │   ├── base.ts              # Provider interface, types
│   │   ├── elevenlabs.ts
│   │   ├── openai.ts
│   │   ├── gemini.ts
│   │   ├── xai.ts
│   │   ├── voicebox.ts
│   │   ├── system.ts
│   │   └── index.ts             # registry
│   ├── voices.ts                # voices.json loader (v1 → v2 compat)
│   ├── config.ts                # XDG config + env vars + ~/.claude/settings.json shim
│   ├── playback.ts              # afplay / ffplay
│   ├── logger.ts                # rotating file logger
│   ├── mcp.ts                   # MCP server (Streamable HTTP)
│   ├── server.ts                # HTTP server
│   └── cli.ts                   # narrate CLI
├── integrations/                # one folder per harness with real refs
│   ├── claude-code/
│   ├── opencode/
│   ├── pi/
│   ├── codex/
│   ├── cursor/
│   └── shell/
├── service/
│   ├── launchd/                 # macOS install + plist template
│   └── systemd/                 # Linux install + unit template
├── examples/
│   ├── config.example.json
│   ├── voicebox-install-macos.sh
│   └── voicebox-create-profile.sh
├── voices.json.example
├── install.sh                   # curl install entry point
├── package.json
├── tsconfig.json
├── README.md
├── CHANGELOG.md
├── LICENSE
└── .github/workflows/           # CI (TBD)

narrate vs voicebox

Voicebox is a full local-first TTS studio with on-device inference, voice cloning, dictation, MCP server, and 7 local engines. It's a desktop app.

narrate is a thin gateway. They compose — voicebox is one of narrate's providers.

| | narrate | voicebox |
| --- | --- | --- |
| Form factor | CLI + HTTP server + MCP | Desktop app (Tauri) |
| Engines | Cloud + voicebox proxy + system | 7 local engines (MLX/CUDA) |
| Voice cloning | No (uses provider voices) | Yes (zero-shot) |
| Dictation (STT) | No | Yes (Whisper hotkey) |
| MCP server | Yes (/mcp) | Yes (/mcp on :17493) |
| Footprint | < 1 MB + bun | GB of models |
| Best for | Drop into any agent or shell | Privacy-first studio workflows |

Use narrate when you want one command that any harness or shell can call, mixing cloud and local providers. Use voicebox when you want fully local, GPU-accelerated voice. Use both when you want voicebox's quality plus narrate's harness-agnostic gateway.

Roadmap

| Status | Item |
| --- | --- |
| ✅ v0.1.0 | 6 providers, CLI, HTTP server, voices.json v2, launchd + systemd |
| ✅ v0.2.0 | Per-request observability, narrate verify, real OpenCode + Pi integrations, voicebox install helper |
| ✅ v0.3.0 | MCP server (/mcp), curl install script, Homebrew tap, voicebox profile helper, multi-language fix |
| ✅ v0.3.1 | In-process log rotation |
| ✅ v0.3.2 | Voicebox instruct passthrough (Qwen natural-language delivery) |
| ✅ v0.3.3 | CLI --language and --instruct flags |
| ✅ v0.3.4 | SwiftBar / xbar menubar plugin |
| ✅ v0.3.5 | Portability fixes — /health exposes repo_dir/logs_dir, plugin auto-locates, SwiftBar Login Items autostart, plist drops $PATH snapshot |
| ✅ v0.3.6 | First-run UX: default provider is system so fresh installs work without API keys. README rewritten for non-technical users with a 3-command quickstart at the top. |
| Planned v0.4 | Pre-built single-binary releases (bun build --compile per platform) |
| Planned v0.5 | More providers (Cartesia, Hume EVI, Azure TTS) |
| Planned v0.6 | --direct CLI mode (skip server, call providers directly) |
| Planned v0.7 | Streaming TTS over WebSocket |
| Planned v0.8 | Auth tokens for /notify and /mcp (currently localhost-only) |
| Planned v1.0 | Test suite, GitHub Actions CI, npm publish |

Troubleshooting

narrate verify says provider X is ⚪ not configured

  • Cloud provider: API key env var not set. cat ~/.env | grep <PROVIDER>_API_KEY. Restart the server after adding (brew services restart narrate or relaunch LaunchAgent).
  • Voicebox: app not running, or running on a non-default port. Open /Applications/Voicebox.app. If on a different port, set VOICEBOX_URL=http://127.0.0.1:NNNNN.
  • System on Linux: install espeak-ng.

Server logs show [xai] 404 Voice 'Samantha' not found

The default provider is whatever ~/.claude/settings.json says (or default_provider in config.json). When you pass --id Samantha without --provider system, narrate uses the default provider — which doesn't know about Samantha. Either:

  • narrate --provider system --id Samantha "..." (explicit provider)
  • narrate --voice samantha "..." (preset that bundles provider + voice id)

Voicebox profile speaks the wrong language

Solved in v0.3.0 (aede995): voicebox's /speak doesn't auto-pull language from the profile, it defaults to "en". narrate now resolves and passes profile.language automatically. If still wrong, force it via providerConfig.language:

"dora_es": {
  "provider": "voicebox", "voice_id": "Dora",
  "providerConfig": { "language": "es" }
}

Two narrate binaries on PATH

If you both brew install narrate AND ran the curl install, you have /opt/homebrew/bin/narrate and ~/.local/bin/narrate. Both work; PATH order decides which wins. Pick one and remove the other.

Logs are massive

Tune rotation:

# in your shell init or LaunchAgent EnvironmentVariables
NARRATE_LOG_MAX_BYTES=2097152    # 2 MiB
NARRATE_LOG_KEEP=3

Or disable entirely:

NARRATE_LOG_DISABLED=1

"Stateless transport cannot be reused" on /mcp

Already fixed in v0.3.0 (a5aaa14). If you see this, your local install is pre-fix — pull main and reload.

Contributing

git clone https://github.com/felores/narrate.git
cd narrate
bun install
bun run --watch src/server.ts                      # hot-reload dev mode
./node_modules/.bin/tsc --noEmit                   # typecheck

To add a new TTS provider:

  1. Create src/providers/<name>.ts implementing the Provider interface from src/providers/base.ts.
  2. Register it in src/providers/index.ts.
  3. Add an integration test in narrate verify --test (the sampleVoiceFor map).
  4. Document it in this README's Provider setup detail.

PRs welcome. Issues: https://github.com/felores/narrate/issues

License

MIT — see LICENSE.

About

Provider-agnostic TTS gateway and CLI for AI coding harnesses — ElevenLabs, OpenAI, Gemini, xAI, Voicebox, system TTS. Works with Claude Code, OpenCode, ChatGPT Codex, Cursor, Windsurf, Cline.
