punt-vox

Voice for your AI coding assistant.

When Claude Code finishes a task, hits an error, or needs your approval --- you hear it. No need to watch the terminal. Keep working; your assistant will tell you what happened.

Platforms: macOS, Linux

Hear It

Real samples generated by vox with ElevenLabs v3. The first three are the same recap with different /vibe moods --- expressive tags change how the voice sounds without changing the words.

| Sample | Vibe | Voice |
| --- | --- | --- |
| Task recap | neutral | sarah |
| Same recap | [excited] | sarah |
| Same recap | [weary] [sighs] | sarah |
| Task complete | neutral | matilda |

Quick Start

curl -fsSL https://raw.githubusercontent.com/punt-labs/vox/3837b9c/install.sh | sh

Restart Claude Code, then:

/vox y        # hear when tasks complete or need input
/recap        # spoken summary of what just happened

Manual install (if you already have uv)

uv tool install punt-vox
vox install
vox doctor

Verify before running

curl -fsSL https://raw.githubusercontent.com/punt-labs/vox/3837b9c/install.sh -o install.sh
shasum -a 256 install.sh
cat install.sh
sh install.sh

Features

  • Notification layer --- spoken summaries when tasks finish, chimes when Claude needs input
  • Session vibe --- /vibe sets the mood for all speech. Auto-mode reads session signals (test results, lint, git ops) and adapts the voice. Manual mode lets you set it yourself. ElevenLabs expressive tags ([weary], [excited], [sighs]) color every utterance.
  • Five providers --- ElevenLabs, OpenAI, AWS Polly, macOS say, and Linux espeak-ng. The full experience (natural voice, expressive tags, /vibe) requires ElevenLabs.
  • Opt-in only --- no audio until you enable it, no surprises
  • Voice or chime --- /mute switches to audio tones, no TTS API calls
  • Graceful absence --- if punt-vox isn't installed, Claude Code works exactly as before
  • MCP-native --- runs as a Claude Code plugin with slash commands and hooks
  • Daemon mode --- optional single-process daemon (vox serve) fronted by mcp-proxy. Eliminates per-session overhead, deduplicates audio across sessions, and drops hook latency from ~500ms to ~15ms

What It Looks Like

Enable notifications

> /vox y

Vox enabled. You'll hear when tasks finish or need approval.
Pick a voice with /unmute @<name>.

Get a recap

> /recap

Speaking: "I refactored the authentication module into three files, added
comprehensive tests for the token refresh flow, and fixed a race condition
in the session middleware. All 47 tests pass."

Set the vibe

> /vibe banging my head against the wall

Vibe: banging my head against the wall → [frustrated] [sighs] [manual]

Auto-mode (default) reads session signals and adapts automatically --- after a string of test failures the voice sounds [weary], after a successful release it sounds [excited].
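
As a rough illustration of how auto-mode could turn session signals into a mood, the sketch below keeps a short window of recent signals and picks a tag from them. The signal names, the window size, the thresholds, and the `VibeAccumulator` class are all assumptions made for illustration; vox's actual heuristics are not described in this README.

```python
from collections import deque

# Hypothetical signal names; vox's real signal vocabulary may differ.
NEGATIVE = {"tests_fail", "lint_fail", "merge_conflict"}
POSITIVE = {"tests_pass", "lint_pass", "git_push"}


class VibeAccumulator:
    """Keep a sliding window of recent signals and derive a mood tag."""

    def __init__(self, window: int = 5) -> None:
        # Old signals fall off the left once the window is full.
        self.signals = deque(maxlen=window)

    def record(self, signal: str) -> None:
        self.signals.append(signal)

    def mood(self) -> str:
        bad = sum(s in NEGATIVE for s in self.signals)
        good = sum(s in POSITIVE for s in self.signals)
        # Thresholds are arbitrary choices for this sketch.
        if bad >= 3 and bad > good:
            return "[weary]"
        if good >= 3 and good > bad:
            return "[excited]"
        return "neutral"
```

Because the window is bounded, an old run of failures ages out once enough newer signals arrive, so the mood can recover without an explicit reset.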

Switch to chime-only

> /mute

Muted — chimes only.

Chimes are mood-aware: when a vibe is active, chimes pitch-shift to match (bright for happy sessions, dark for frustrated ones). Eight distinct signals (tests pass/fail, lint pass/fail, git push, merge conflict, done, prompt) × three mood variants = 24 chime assets.
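
The asset count is simply the product of signals and mood variants. A minimal sketch of that enumeration follows; the identifier spellings, the mood names (`bright`, `neutral`, `dark`), and the file-naming scheme are assumptions, not vox's actual asset layout.

```python
from itertools import product

# The eight signals listed above, spelled as illustrative identifiers.
SIGNALS = [
    "tests_pass", "tests_fail",
    "lint_pass", "lint_fail",
    "git_push", "merge_conflict",
    "done", "prompt",
]
# Three mood variants; the names are assumed for this sketch.
MOODS = ["bright", "neutral", "dark"]


def chime_assets() -> list[str]:
    """Return one hypothetical asset name per (signal, mood) pair."""
    return [f"{signal}_{mood}.wav" for signal, mood in product(SIGNALS, MOODS)]
```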

Commands

| Command | Purpose |
| --- | --- |
| /vox y | Enable vox (chime notifications) |
| /vox n | Disable vox |
| /vox c | Continuous mode (spoken summaries on task completion) |
| /unmute | Enable voice mode (spoken notifications) |
| /unmute @matilda | Set session voice + enable voice |
| /unmute @ | Browse voice roster |
| /mute | Chimes only --- no voice |
| /recap | Spoken summary of Claude's last response |
| /vibe <mood> | Set session mood --- voice adapts to match |
| /vibe auto | Auto-detect mood from session signals (default) |
| /vibe off | Disable vibe --- neutral voice |

Providers

The full experience --- natural voice with expressive tags that respond to /vibe --- requires ElevenLabs. The other providers are fallbacks for environments where ElevenLabs isn't available.

| Provider | API Key | Default Voice | Best For |
| --- | --- | --- | --- |
| ElevenLabs | ELEVENLABS_API_KEY | matilda | Recommended. Natural voice, expressive tags via /vibe |
| OpenAI | OPENAI_API_KEY | nova | Fast notifications, low latency |
| AWS Polly | AWS credentials | joanna | Natural voice, cost-effective |
| macOS say | (none) | samantha | Zero-config on macOS, offline |
| espeak-ng | (none) | en | Zero-config on Linux, offline |

Auto-detection order: ElevenLabs > OpenAI > Polly (if AWS credentials valid) > say (macOS) / espeak (Linux).
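
The detection order above can be sketched as a simple first-match chain. This is an illustration under assumptions, not vox's implementation: the function name, the returned provider identifiers, and the `aws_credentials_valid` callback (standing in for however Polly credentials are actually validated) are hypothetical.

```python
import os
import sys
from typing import Callable, Mapping, Optional


def detect_provider(
    env: Mapping[str, str] = os.environ,
    platform: str = sys.platform,
    aws_credentials_valid: Callable[[], bool] = lambda: False,
) -> Optional[str]:
    """Return the first available provider in the documented order."""
    if env.get("ELEVENLABS_API_KEY"):
        return "elevenlabs"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # Hypothetical hook: the real credential check is not shown in this README.
    if aws_credentials_valid():
        return "polly"
    if platform == "darwin":
        return "say"
    if platform.startswith("linux"):
        return "espeak"
    return None
```

Passing `env` and `platform` as parameters keeps the chain easy to exercise without touching the real environment, for example `detect_provider({"OPENAI_API_KEY": "x"}, "linux")`.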

CLI

punt-vox is also a standalone TTS tool, independent of Claude Code.

vox unmute "Hello world"                       # Synthesize + play
vox record "Hello world" -o hello.mp3          # Synthesize + save
vox record --from segments.json                # From JSON segments file
vox vibe excited                               # Set session mood
vox notify y                                   # Enable notifications
vox notify c                                   # Continuous spoken mode
vox speak n                                    # Chimes only
vox voice matilda                              # Set session voice
vox status                                     # Current state
vox version                                    # Print version
vox doctor                                     # Check setup
vox install                                    # Install Claude Code plugin
vox mcp                                        # Start MCP server (stdio)
vox serve                                      # Start daemon (HTTP + WebSocket)
vox daemon install                             # Register as system service
vox daemon status                              # Check if daemon is running

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| TTS_PROVIDER | Force a specific provider | auto-detect |
| TTS_MODEL | Model override | provider default |
| VOX_OUTPUT_DIR | Output directory | ~/vox-output |
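
For scripts that wrap vox, the variables and defaults in this table could be resolved along these lines. The `VoxSettings` dataclass and `load_settings` function are hypothetical helpers written for this example; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoxSettings:
    provider: Optional[str]  # None means auto-detect
    model: Optional[str]     # None means the provider's default model
    output_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> VoxSettings:
    """Read the documented variables, treating unset or empty as the default."""
    return VoxSettings(
        provider=env.get("TTS_PROVIDER") or None,
        model=env.get("TTS_MODEL") or None,
        output_dir=Path(env.get("VOX_OUTPUT_DIR") or "~/vox-output").expanduser(),
    )
```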

Roadmap

Shipped

  • Mic API: unified unmute/record/vibe/who MCP tools with segment-based input
  • Notification layer: /vox y|n|c, /mute, /unmute, /recap, Stop + Notification hooks
  • Multi-provider TTS engine: ElevenLabs, AWS Polly, OpenAI, macOS say, Linux espeak-ng
  • Claude Code plugin: marketplace install, MCP server, slash commands
  • CLI: unmute, record, vibe, on/off, mute, version, status, doctor
  • Ephemeral output mode (.vox/ in cwd)
  • Two-channel display: panel summaries with voice/provider context
  • Audio playback serialization via flock --- concurrent utterances queue instead of overlapping
  • ElevenLabs streaming API for lower time-to-first-audio
  • /vibe with auto, manual, and off modes --- ElevenLabs expressive tags color every utterance
  • Auto-vibe signal accumulator: test pass/fail, lint, git ops feed mood detection
  • Per-signal chime assets and vibe-driven chimes with mood-aware pitch shifting
  • Daemon mode: single vox serve process with mcp-proxy, audio deduplication, launchd/systemd service management
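
To illustrate the flock-based playback serialization mentioned in the list above, here is a minimal sketch using Python's standard `fcntl.flock` (available on macOS and Linux). The lock-file path and the `playback_lock` and `play_serialized` helpers are assumptions for this example; how vox itself names its lock file or invokes playback is not shown here.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def playback_lock(path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until any earlier holder releases, so callers queue up.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def play_serialized(lock_path: str, play: Callable[[], None]) -> None:
    """Run `play` while holding the lock so utterances never overlap."""
    with playback_lock(lock_path):
        play()
```

Because `LOCK_EX` blocks rather than fails, a second process that calls `play_serialized` with the same lock path waits for the first utterance to finish instead of talking over it.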

Coming Soon

| Feature | What It Does |
| --- | --- |
| Per-session voices | Each Claude Code session gets its own voice from a pool --- no more five matildas talking at once. /voice to audition and pick. |

Documentation

Architecture (PDF) | Design Log | Testing | Changelog

Development

uv sync --all-extras    # Install dependencies
make check              # Run all quality gates

License

MIT
