Summary
ElevenLabs released eleven_v3, their most expressive model. It uses inline audio tags ([excited], [whispers], [laughs]) instead of SSML. The wrapper already accepts modelId: "eleven_v3" but has no awareness of the v3 markup format.
Changes Needed
1. SSML → audio tag translation in prepareText()
When the model is eleven_v3 and the input is SSML, translate key tags to v3 audio tags instead of stripping them. Best-effort mapping:
| SSML | v3 audio tag |
| --- | --- |
| <emphasis level="strong"> | [excited] |
| <emphasis level="reduced"> | [whispers] |
| <break time="Xs"/> | [pause] |
| <prosody rate="..."> | strip (no direct mapping) |
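The mapping above could be sketched roughly as follows. This is a minimal illustration, not the wrapper's actual prepareText() code: the names translateSsmlToV3Tags and prepareTextSketch are hypothetical, SSML detection is assumed to be a simple check for a <speak> root, and handling for non-v3 models (where the wrapper strips SSML) is left out.

```typescript
// Hypothetical sketch; names and SSML detection are assumptions, not the wrapper's API.
const V3_MODEL_ID = "eleven_v3";

function translateSsmlToV3Tags(ssml: string): string {
  return ssml
    // <emphasis level="strong"> becomes a leading [excited] tag
    .replace(/<emphasis\s+level="strong"\s*>/gi, "[excited] ")
    // <emphasis level="reduced"> becomes a leading [whispers] tag
    .replace(/<emphasis\s+level="reduced"\s*>/gi, "[whispers] ")
    // Any <break .../> becomes [pause]; the duration is dropped
    .replace(/<break\b[^>]*\/?>/gi, " [pause] ")
    // Everything else (<speak>, <prosody>, closing tags) is stripped
    .replace(/<[^>]+>/g, "")
    // Collapse the whitespace left behind by removed tags
    .replace(/\s+/g, " ")
    .trim();
}

function prepareTextSketch(text: string, modelId: string): string {
  // Assumption: input counts as SSML only if it has a <speak> root element.
  const looksLikeSsml = /<speak[\s>]/i.test(text);
  if (modelId !== V3_MODEL_ID || !looksLikeSsml) {
    // Plain text, including text that already has audio tags, passes through.
    return text;
  }
  return translateSsmlToV3Tags(text);
}
```

With this sketch, the input `<speak>Hello <emphasis level="strong">world</emphasis><break time="1s"/> bye</speak>` and modelId "eleven_v3" yields `Hello [excited] world [pause] bye`. A regex pass keeps the example short; a real implementation may prefer an XML parser so that entities and nested tags are handled correctly.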
2. Expose new v3 request parameters
Add to ElevenLabsTTSOptions: seed, languageCode, previousText, nextText, applyTextNormalization
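One possible shape for these options is sketched below. The camelCase names come from the list above; the interface name V3RequestOptions, the helper buildV3RequestFields, the field types, and the snake_case request keys are assumptions based on a reading of the ElevenLabs text-to-speech API and should be checked against its reference before use.

```typescript
// Hypothetical sketch; in the wrapper these fields would be added to ElevenLabsTTSOptions.
interface V3RequestOptions {
  seed?: number;
  languageCode?: string;
  previousText?: string;
  nextText?: string;
  // Assumed value set; verify against the API reference.
  applyTextNormalization?: "auto" | "on" | "off";
}

// Maps the camelCase options to assumed snake_case request body fields.
function buildV3RequestFields(options: V3RequestOptions): Record<string, unknown> {
  const candidates: Record<string, unknown> = {
    seed: options.seed,
    language_code: options.languageCode,
    previous_text: options.previousText,
    next_text: options.nextText,
    apply_text_normalization: options.applyTextNormalization,
  };
  const body: Record<string, unknown> = {};
  for (const key of Object.keys(candidates)) {
    // Leave unset options out so the API's own defaults apply.
    if (candidates[key] !== undefined) {
      body[key] = candidates[key];
    }
  }
  return body;
}
```

Keeping every field optional means existing callers are unaffected, and omitting unset keys avoids sending explicit undefined or null values in the request.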
3. Tests
- Synthesise with modelId: "eleven_v3" (real API call)
- SSML input to v3 client — verify translation + no crash
- Plain text + audio tags pass through unmodified
Out of Scope
- Other models (eleven_flash_v2_5, eleven_turbo_v2_5): already work
- Azure or other engine changes