
tts: add Gemini native TTS provider #30

Open
chantskevin wants to merge 2 commits into calesthio:main from chantskevin:feat/gemini-tts-provider


@chantskevin

Summary

  • Adds gemini_tts provider tool using Gemini's generateContent API with response_modalities=["AUDIO"]
  • Supports 30 prebuilt voices, 3 models (flash/pro), automatic language detection, and prompt-directed delivery (e.g. "say cheerfully:")
  • Auto-discovered by the TTS selector; no changes needed elsewhere
  • Uses the same GOOGLE_API_KEY / GEMINI_API_KEY as the existing Google TTS tool
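For reviewers unfamiliar with audio output from generateContent, here is a minimal Python sketch of the request such a provider might build. It assumes the public v1beta REST endpoint and camelCase JSON field names (responseModalities is the REST spelling of response_modalities). resolve_api_key and build_tts_request are hypothetical helpers, not code from this PR, and the key lookup order is an assumption.

```python
import json
import os

# Assumed public REST endpoint (v1beta); the PR does not show the URL it uses.
API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def resolve_api_key(env=os.environ):
    """Return GOOGLE_API_KEY if set, else GEMINI_API_KEY (lookup order is assumed)."""
    key = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY or GEMINI_API_KEY")
    return key


def build_tts_request(text, voice, model, api_key=None):
    """Build (url, headers, json_body) for a generateContent call that returns audio."""
    url = f"{API_BASE}/{model}:generateContent"
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": api_key or resolve_api_key(),
    }
    body = {
        # Delivery cues such as "Say cheerfully:" are simply part of the prompt text.
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }
    return url, headers, json.dumps(body)


url, headers, body = build_tts_request(
    "Say cheerfully: Have a wonderful day!", "Kore",
    "gemini-2.5-pro-preview-tts", api_key="test-key",
)
```

The body could then be POSTed with any HTTP client, e.g. requests.post(url, headers=headers, data=body, timeout=60).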

Test plan

  • Tool registers and is discovered by the registry
  • TTS selector routes to it via preferred_provider: "gemini"
  • Smoke test: generated 3.97s WAV audio successfully
  • Test with different voices (Aoede, Kore, Puck)
  • Test with pro model (gemini-2.5-pro-preview-tts)
  • Test that an invalid voice name returns a clear error
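The last test plan item could be covered by validating the voice before any network call. This is a minimal sketch assuming a hypothetical KNOWN_VOICES tuple, truncated here to the three voices the PR names (the tool is described as supporting 30), with case-insensitive matching as a design choice.

```python
import difflib

# Hypothetical, truncated voice set: only the voices named in this PR's test plan.
KNOWN_VOICES = ("Aoede", "Kore", "Puck")


def validate_voice(name, known=KNOWN_VOICES):
    """Return the canonical voice name, or raise ValueError with a helpful message."""
    # Accept any casing/whitespace but send the API the canonical spelling.
    by_lower = {v.lower(): v for v in known}
    canonical = by_lower.get(name.strip().lower())
    if canonical is not None:
        return canonical
    # Suggest the closest known voice, if any is reasonably similar.
    hint = difflib.get_close_matches(name, known, n=1)
    suggestion = f" Did you mean {hint[0]!r}?" if hint else ""
    raise ValueError(
        f"Unknown Gemini TTS voice {name!r}.{suggestion} "
        f"Valid voices: {', '.join(sorted(known))}"
    )
```

Failing locally keeps the error message under the tool's control instead of depending on whatever the API returns for an unknown voice.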

🤖 Generated with Claude Code

Adds a new TTS provider that uses Gemini's generateContent API with
response_modalities=["AUDIO"] for expressive, context-aware speech.
Supports 30 voices, 3 models (flash/pro), automatic language detection,
and prompt-directed delivery. Auto-discovered by the TTS selector.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chantskevin requested a review from calesthio as a code owner on April 17, 2026 02:24
- Guard against missing/blocked candidates and safety filter rejections
- Extract API error body instead of losing it via raise_for_status()
- Reject empty PCM audio instead of writing a silent 0-second WAV
- Remove redundant ext variable (audio_format already constrained by enum)
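The hardening items above can be sketched as pure functions over the parsed JSON response. This assumes the REST response shape (promptFeedback.blockReason, candidates[].finishReason, base64 audio in inlineData.data) and raw 16-bit mono PCM at 24 kHz. GeminiTTSError, describe_http_error, extract_pcm, pcm_to_wav and wav_duration are hypothetical names, not the PR's actual code.

```python
import base64
import io
import json
import wave

# Assumed output format: raw 16-bit little-endian mono PCM at 24 kHz.
SAMPLE_RATE = 24_000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


class GeminiTTSError(RuntimeError):
    """Hypothetical error type used by this sketch."""


def describe_http_error(status_code, body_text):
    """Keep the API's own error message instead of a bare raise_for_status()."""
    try:
        message = json.loads(body_text)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        message = body_text[:500]  # not the expected JSON shape: show raw text
    return f"Gemini TTS request failed ({status_code}): {message}"


def extract_pcm(response):
    """Pull decoded PCM bytes out of a parsed generateContent response."""
    block = (response.get("promptFeedback") or {}).get("blockReason")
    if block:
        raise GeminiTTSError(f"Prompt blocked: {block}")
    candidates = response.get("candidates") or []
    if not candidates:
        raise GeminiTTSError("Response contained no candidates")
    candidate = candidates[0]
    if candidate.get("finishReason") == "SAFETY":
        raise GeminiTTSError("Audio withheld by safety filters")
    for part in (candidate.get("content") or {}).get("parts") or []:
        data = (part.get("inlineData") or {}).get("data")
        if data:
            return base64.b64decode(data)
    raise GeminiTTSError("Response contained no audio data")


def pcm_to_wav(pcm):
    """Wrap PCM in a WAV container, refusing to produce a 0-second file."""
    if len(pcm) < SAMPLE_WIDTH * CHANNELS:
        raise GeminiTTSError("Empty PCM audio; refusing to write a silent WAV")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_duration(wav_bytes):
    """Duration in seconds, e.g. to sanity-check a smoke-test clip."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Checking the result with wav_duration is one way to confirm a smoke-test clip such as the 3.97s WAV mentioned in the test plan is non-empty and plausibly sized.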

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chantskevin added a commit to chantskevin/OpenMontage that referenced this pull request Apr 19, 2026
tts: add Gemini native TTS provider

Adds gemini_tts provider using Gemini's generateContent API with
response_modalities=["AUDIO"]. 30 voices, flash/pro models,
auto-discovered by tts_selector. Same GOOGLE_API_KEY/GEMINI_API_KEY
as existing Google TTS tool.

Includes hardening: guards against missing/blocked candidates and
safety-filter rejections, extracts API error body, rejects empty PCM.

Merged locally onto fork ahead of upstream PR calesthio#30.