feat(voice): add Voxtral TTS provider (local + cloud) #34

Open
RollandMELET wants to merge 1 commit into earlyaidopters:main from RollandMELET:feat/voxtral-tts

@RollandMELET
Contributor

Summary

  • Adds Voxtral (Mistral AI) as a new TTS provider with two modes:
    • Local: mlx-audio server on Apple Silicon (zero cost, ~2.5GB VRAM, high-quality French/English voices)
    • Cloud: Mistral API, OpenAI-compatible endpoint ($16/M chars)
  • Adds a toOggOpus() helper that converts any audio format to OGG Opus via ffmpeg, using magic-bytes detection to skip the conversion when the input is already OGG. This also fixes the ElevenLabs (returns MP3) and Gradium (returns raw Opus) providers, whose output was not being converted before.
  • Updates the TTS cascade to: Voxtral local -> Voxtral API -> ElevenLabs -> Gradium -> macOS say
  • Documents all config vars in .env.example
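
For reviewers, the magic-bytes check behind toOggOpus() can be sketched roughly as below. This is an illustration only, not the code in this PR: `isOggContainer` and `ffmpegOpusArgs` are hypothetical names, and the ffmpeg arguments are one plausible way to transcode stdin to Opus-in-OGG on stdout.

```typescript
// Hypothetical helpers for illustration; the PR's actual toOggOpus() may differ.

// Every Ogg page begins with the 4-byte capture pattern "OggS".
const OGG_MAGIC = [0x4f, 0x67, 0x67, 0x53];

// Returns true when the buffer starts with the Ogg capture pattern.
function isOggContainer(audio: Uint8Array): boolean {
  if (audio.length < OGG_MAGIC.length) return false;
  return OGG_MAGIC.every((byte, i) => audio[i] === byte);
}

// Arguments for an ffmpeg process that reads audio from stdin and
// writes Opus in an Ogg container to stdout (assumed invocation).
function ffmpegOpusArgs(): string[] {
  return [
    "-hide_banner",
    "-loglevel", "error",
    "-i", "pipe:0",    // input from stdin, format auto-detected
    "-c:a", "libopus", // encode audio with the Opus codec
    "-f", "ogg",       // force the Ogg container
    "pipe:1",          // output to stdout
  ];
}
```

Note that the "OggS" check only identifies the container. An OGG file carrying a non-Opus codec such as Vorbis would also pass it, so a stricter helper might inspect the codec header as well.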

Motivation

Running TTS locally eliminates API costs and network round-trip latency. Voxtral 4B is Mistral's open-weight TTS model with native French support, which makes it a good fit for ClaudeClaw users who self-host on Apple Silicon Macs. The cloud fallback uses the Mistral API for users without local hardware.

Configuration

# Local (Apple Silicon)
pip install mlx-audio
mlx_audio.server --port 8881
# .env
VOXTRAL_LOCAL_URL=http://localhost:8881

# Cloud
# .env
MISTRAL_API_KEY=your-key

# Optional
VOXTRAL_VOICE=fr_male          # or fr_female, en_male, en_female
VOXTRAL_LOCAL_MODEL=mlx-community/Voxtral-4B-TTS-2603-mlx-4bit
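
As a rough sketch of how these variables could map to the two modes: the snippet below reads the four variables listed above and reports which Voxtral modes are enabled. `readVoxtralConfig`, `enabledVoxtralModes`, and the mode labels are hypothetical names, not code from this PR. The sketch also assumes that a blank value counts as unset.

```typescript
// Hypothetical config reader for illustration; names are not from the PR.
type Env = Record<string, string | undefined>;

interface VoxtralConfig {
  localUrl: string | undefined;   // VOXTRAL_LOCAL_URL enables local mode
  apiKey: string | undefined;     // MISTRAL_API_KEY enables cloud mode
  voice: string | undefined;      // VOXTRAL_VOICE (optional)
  localModel: string | undefined; // VOXTRAL_LOCAL_MODEL (optional)
}

// Treat missing or whitespace-only values as unset (an assumption).
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function readVoxtralConfig(env: Env): VoxtralConfig {
  return {
    localUrl: nonEmpty(env.VOXTRAL_LOCAL_URL),
    apiKey: nonEmpty(env.MISTRAL_API_KEY),
    voice: nonEmpty(env.VOXTRAL_VOICE),
    localModel: nonEmpty(env.VOXTRAL_LOCAL_MODEL),
  };
}

// Local mode is listed first, matching the cascade order in the summary.
function enabledVoxtralModes(config: VoxtralConfig): string[] {
  const modes: string[] = [];
  if (config.localUrl) modes.push("voxtral-local");
  if (config.apiKey) modes.push("voxtral-api");
  return modes;
}
```

With neither variable set, the list is empty and the existing providers are the only candidates, which is consistent with the backwards-compatibility claim below.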

Test plan

  • Local mode: set VOXTRAL_LOCAL_URL, send a voice note, verify OGG Opus response
  • Cloud mode: set MISTRAL_API_KEY only, verify cloud API fallback works
  • Cascade: disable Voxtral vars, verify ElevenLabs/Gradium/macOS say still work
  • toOggOpus(): verify MP3 (ElevenLabs) and raw Opus (Gradium) are properly converted
  • No Voxtral config: verify existing behavior is unchanged
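
To make the cascade checks above concrete, here is a minimal sketch of a fallback loop. It assumes the cascade skips unconfigured providers and falls through to the next one when a provider throws; the PR description does not spell out the error-handling behavior. `TtsProvider` and `runCascade` are hypothetical names.

```typescript
// Hypothetical cascade runner for illustration; not the PR's implementation.
interface TtsProvider {
  name: string;
  enabled: boolean; // false when the provider's env vars are missing
  synthesize: (text: string) => Promise<Uint8Array>;
}

interface TtsResult {
  provider: string;
  audio: Uint8Array;
}

// Tries providers in order and returns the first successful result.
async function runCascade(providers: TtsProvider[], text: string): Promise<TtsResult> {
  const failures: string[] = [];
  for (const provider of providers) {
    if (!provider.enabled) continue; // unconfigured: skip silently
    try {
      const audio = await provider.synthesize(text);
      return { provider: provider.name, audio };
    } catch (err) {
      // Assumed behavior: record the failure and fall through.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`No TTS provider succeeded (${failures.join("; ")})`);
}
```

Under this sketch, the order Voxtral local -> Voxtral API -> ElevenLabs -> Gradium -> macOS say is just the order of the `providers` array, so the "disable Voxtral vars" test amounts to marking the first two entries as not enabled.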

Breaking changes

None. Fully backwards-compatible - existing setups without Voxtral vars work exactly as before.

Add Voxtral (Mistral AI) as a TTS provider with two modes:
- Local: mlx-audio server on Apple Silicon (zero cost, ~2.5GB VRAM)
- Cloud: Mistral API ($16/M chars, OpenAI-compatible)

Also adds:
- toOggOpus() helper for reliable audio format conversion
  (Telegram requires OGG Opus; detects magic bytes to skip if already OGG)
- Updated TTS cascade: Voxtral local -> Voxtral API -> ElevenLabs -> Gradium -> macOS say
- .env.example documentation for all Voxtral config vars

Local setup (Apple Silicon Mac):
  pip install mlx-audio
  mlx_audio.server --port 8881
  # .env: VOXTRAL_LOCAL_URL=http://localhost:8881

Cloud setup:
  # .env: MISTRAL_API_KEY=your-key

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>