Feat/new tts engines #31

Open
willwade wants to merge 8 commits into main from feat/new-tts-engines

willwade commented Apr 8, 2026

No description provided.

willwade added 6 commits April 8, 2026 12:47
…o, Mistral, Murf, Unreal Speech, Resemble)

- Cartesia: sonic-3/sonic-2 with emotion-to-SSML mapping
- Deepgram: aura-2 with static voice list and streaming
- Hume: octave-2/octave-1 with version mapping and streaming
- xAI: grok-tts with native audio tag passthrough
- Fish Audio: s2-pro with model-as-header pattern
- Mistral: voxtral-mini-tts-2603 with SSE streaming
- Murf: GEN2/FALCON with dual model endpoints
- Unreal Speech: two-step URI + direct streaming
- Resemble: base64 JSON + direct streaming
- ElevenLabs: added v3 audio tag processing
- 166 new tests across 12 test files
- All engines registered in factory, types, and exports
- Add engine table entries for Cartesia, Deepgram, Hume, xAI, Fish Audio,
  Mistral, Murf, Unreal Speech, Resemble
- Add engine-specific examples with usage notes
- Update timing table, SSML table, and Speech Markdown table
- Update factory pattern engine list
- Move Mistral/Murf/Unreal/Resemble to Completed in BACKLOG
- Replace Buffer.from() in ElevenLabs with cross-environment base64ToUint8Array
- Replace atob() in Mistral/Murf/Resemble with shared utility
- Add shared src/utils/base64-utils.ts for cross-environment base64 decoding
- Fix Fish Audio checkCredentials to use GET /v1/model (no quota consumed)
- Fix Resemble checkCredentials to use GET /v2/voices (no quota consumed)
- Add voice discovery: Hume (16 static voices), Mistral (27 static voices)
- Add Resemble voice listing via API
- Fix xAI: remove dead processAudioTags no-op code
- Fix pre-existing lint issues in google.ts, polly.ts, playht.ts, openai.ts,
  sherpaonnx.ts, abstract-tts.ts, azure.ts
- Add shared language-utils.ts with toIso639_3(), toLanguageDisplay()
- Fix bcp47: use full BCP-47 codes (en-US, not en) in static voice lists
- Fix iso639_3: use proper 3-letter ISO 639-3 codes (eng, not en)
- Fix display: use human-readable names (English (US), not en-US)
- Applies to: cartesia, deepgram, hume, xai, fishaudio, mistral,
  murf, unrealspeech, resemble
- Add ModelFeature type: streaming, audio-tags, inline-voice-cloning,
  open-source, word-boundary-events, character-boundary-events, ssml
- Add FEATURES constants and ModelInfo interface to types.ts
- Add getModels(), hasFeature(), _getCurrentModelId() to AbstractTTSClient
- Define accurate _models metadata for all 23 engines based on code audit:
  - Real word boundaries: elevenlabs, google (beta), polly, azure, watson, sherpaonnx
  - Character boundaries: elevenlabs (all models)
  - Native SSML: google, polly, azure, watson, witai, sapi
  - Audio tags: elevenlabs v3, openai gpt-4o-mini-tts, cartesia sonic-3, xai grok-tts, fish audio s2-pro
  - Voice cloning: elevenlabs v3, cartesia sonic-3, hume octave-2, fish audio s2-pro, mistral, resemble
  - Open source: sherpaonnx, sherpaonnx-wasm, espeak-ng, mistral, resemble
- Pipe response.body directly when not using timestamps (avoids buffering)
- Only buffer when useTimestamps=true (needs JSON response for alignment)
- Only buffer when format=wav (needs mp3-to-wav conversion)
- Add streaming feature to all ElevenLabs models
- Update BACKLOG: mark streaming improvements as done for Cartesia,
  Deepgram, ElevenLabs, Polly; note Google Cloud TTS SDK limitation
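The shared `src/utils/base64-utils.ts` replaces `Buffer.from()` (Node-only) in ElevenLabs and `atob()` in Mistral/Murf/Resemble. Its contents are not shown in this PR description, so the following is only a minimal sketch, assuming a `base64ToUint8Array(b64: string): Uint8Array` signature (the function name comes from the commit message; the signature and the hand-rolled RFC 4648 decoding loop are assumptions). Because it uses neither global, it behaves the same in Node, browsers and workers.

```typescript
// Hypothetical sketch of a cross-environment base64 decoder.
// Avoids Buffer (Node-only) and atob by decoding the RFC 4648 alphabet by hand.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Reverse lookup: character code -> 6-bit value, or -1 for invalid characters.
const LOOKUP: Int16Array = (() => {
  const table = new Int16Array(256).fill(-1);
  for (let i = 0; i < ALPHABET.length; i++) {
    table[ALPHABET.charCodeAt(i)] = i;
  }
  // Also accept the URL-safe variants of characters 62 and 63.
  table["-".charCodeAt(0)] = 62;
  table["_".charCodeAt(0)] = 63;
  return table;
})();

export function base64ToUint8Array(b64: string): Uint8Array {
  // Drop whitespace and trailing padding; output length follows from input length.
  const clean = b64.replace(/\s+/g, "").replace(/=+$/, "");
  if (clean.length % 4 === 1) {
    throw new Error("Invalid base64 input length");
  }
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let buffer = 0; // pending bits, right-aligned
  let bits = 0; // number of pending bits in `buffer`
  let pos = 0;
  for (let i = 0; i < clean.length; i++) {
    const code = clean.charCodeAt(i);
    const value = code < 256 ? LOOKUP[code] : -1;
    if (value < 0) {
      throw new Error(`Invalid base64 character at index ${i}`);
    }
    // Append 6 bits; at most 14 bits are ever pending, so masking to 16 is safe.
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      // Emit the oldest complete byte.
      bits -= 8;
      out[pos++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}
```

Where `atob` is reliably available, `Uint8Array.from(atob(b64), (c) => c.charCodeAt(0))` is a shorter alternative; the explicit loop simply removes that dependency.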
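The new `language-utils.ts` provides `toIso639_3()` and `toLanguageDisplay()` so that every static voice list carries a full BCP-47 tag (`en-US`), a three-letter ISO 639-3 code (`eng`) and a readable name (`English (US)`). A minimal sketch, assuming both functions take a BCP-47 tag and fall back to the raw subtag when it is unknown; the lookup tables below are small illustrative excerpts, not the PR's actual data.

```typescript
// Hypothetical sketch; the tables are illustrative excerpts only.
const ISO639_1_TO_3: Record<string, string> = {
  en: "eng", fr: "fra", de: "deu", es: "spa", it: "ita", ja: "jpn", zh: "zho",
};

const LANGUAGE_NAMES: Record<string, string> = {
  en: "English", fr: "French", de: "German", es: "Spanish",
  it: "Italian", ja: "Japanese", zh: "Chinese",
};

// Region subtag -> short label used in display names, e.g. "English (US)".
const REGION_LABELS: Record<string, string> = {
  US: "US", GB: "UK", AU: "Australia", CA: "Canada",
  FR: "France", DE: "Germany", ES: "Spain", MX: "Mexico",
};

// Split a tag such as "en-US", "en_us" or "zh-Hans-CN" into language and region.
function parseTag(tag: string): { language: string; region?: string } {
  const parts = tag.replace(/_/g, "-").split("-");
  const language = parts[0].toLowerCase();
  // BCP-47 regions are 2 letters or 3 digits; 4-letter script subtags are skipped.
  const region = parts.slice(1).find((p) => /^([A-Za-z]{2}|\d{3})$/.test(p));
  return { language, region: region?.toUpperCase() };
}

export function toIso639_3(tag: string): string {
  const { language } = parseTag(tag);
  // Already a 3-letter code: return unchanged.
  if (language.length === 3) return language;
  return ISO639_1_TO_3[language] ?? language;
}

export function toLanguageDisplay(tag: string): string {
  const { language, region } = parseTag(tag);
  const name = LANGUAGE_NAMES[language] ?? language;
  if (!region) return name;
  return `${name} (${REGION_LABELS[region] ?? region})`;
}
```

`Intl.DisplayNames` could generate names at runtime instead, but for `en-US` it yields "American English" by default rather than the short "English (US)" form described in the PR, so an explicit table gives tighter control over the output.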
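The PR adds a `ModelFeature` type, a `FEATURES` constant, a `ModelInfo` interface, and `getModels()`, `hasFeature()` and `_getCurrentModelId()` on `AbstractTTSClient`. Their exact shapes are not given here; the sketch below is one plausible arrangement, assuming `ModelInfo` has `id` and `features` fields, `_models` is a protected array, and `hasFeature()` checks the currently selected model (falling back to the first declared one). The class is a heavily reduced, hypothetical stand-in for the real client, and `DemoClient` with its model IDs is invented purely for illustration.

```typescript
// Feature names as listed in the PR description.
const FEATURES = {
  STREAMING: "streaming",
  AUDIO_TAGS: "audio-tags",
  INLINE_VOICE_CLONING: "inline-voice-cloning",
  OPEN_SOURCE: "open-source",
  WORD_BOUNDARY_EVENTS: "word-boundary-events",
  CHARACTER_BOUNDARY_EVENTS: "character-boundary-events",
  SSML: "ssml",
} as const;

type ModelFeature = (typeof FEATURES)[keyof typeof FEATURES];

// Assumed shape of per-model metadata.
interface ModelInfo {
  id: string;
  features: ModelFeature[];
}

// Reduced, hypothetical version of AbstractTTSClient showing only model metadata.
abstract class AbstractTTSClient {
  protected _models: ModelInfo[] = [];
  protected model?: string;

  // Return a copy so callers cannot mutate the engine's metadata.
  getModels(): ModelInfo[] {
    return [...this._models];
  }

  // Selected model, or the first declared model when none was chosen.
  protected _getCurrentModelId(): string | undefined {
    return this.model ?? this._models[0]?.id;
  }

  hasFeature(feature: ModelFeature): boolean {
    const id = this._getCurrentModelId();
    const info = this._models.find((m) => m.id === id);
    return info?.features.includes(feature) ?? false;
  }
}

// Hypothetical engine used only to exercise the sketch.
class DemoClient extends AbstractTTSClient {
  constructor(model?: string) {
    super();
    this.model = model;
    this._models = [
      { id: "demo-v2", features: [FEATURES.STREAMING] },
      { id: "demo-v3", features: [FEATURES.STREAMING, FEATURES.AUDIO_TAGS] },
    ];
  }
}
```

Keying features per model rather than per engine matters for cases the PR calls out, such as audio tags being available on ElevenLabs v3 but not on the other ElevenLabs models.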
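The streaming change reduces to one branch: hand back `response.body` untouched unless the whole payload is needed first, either because `useTimestamps=true` (alignment data arrives in a JSON response) or because `format=wav` (the MP3 must be converted). A minimal sketch of that branch using the standard Fetch `Response` and `ReadableStream` types; the `StreamOptions` shape, `needsBuffering`, `toAudioStream` and the `postProcess` callback are hypothetical names, not the engine's actual code.

```typescript
// Hypothetical options; field names mirror the flags mentioned in the PR.
interface StreamOptions {
  useTimestamps?: boolean;
  format?: string; // e.g. "mp3" or "wav"
}

// True when the full response must be read before anything can be returned.
function needsBuffering(opts: StreamOptions): boolean {
  return opts.useTimestamps === true || opts.format === "wav";
}

// Either pass the network stream through, or buffer it for post-processing.
async function toAudioStream(
  response: Response,
  opts: StreamOptions,
  postProcess: (bytes: Uint8Array) => Uint8Array,
): Promise<ReadableStream<Uint8Array>> {
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  if (!needsBuffering(opts) && response.body) {
    // No buffering: audio chunks reach the caller as soon as they arrive.
    return response.body;
  }
  // Buffer the whole payload (e.g. JSON with alignment, or MP3 to convert to WAV).
  const bytes = postProcess(new Uint8Array(await response.arrayBuffer()));
  return new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(bytes);
      controller.close();
    },
  });
}
```

Returning a `ReadableStream` on both paths keeps calling code identical; only the time to first audio byte differs.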