feat: Qwen3-TTS 1.7B VoiceDesign ONNX backend #3
Merged
wavekat-eason merged 15 commits into `main` on Apr 7, 2026
Switch qwen3-tts backend from 0.6B/elbruno to the WaveKat 1.7B VoiceDesign ONNX repo with INT4 by default.

- download: replace ureq HTTP with hf-hub client; downloads int4/ ONNX + embeddings/ + tokenizer/ from wavekat/Qwen3-TTS-1.7B-VoiceDesign-ONNX (pinned revision)
- model: HIDDEN_DIM 1024→2048, MAX_NEW_TOKENS 2048→8192, sampling updated to config.json values (top_k=50, temp=0.9), non-streaming prefill (all text in prefill, matches generate_onnx.py), fix last-position logits extraction bug, min_new_tokens=2, VoiceDesign codec prefix (4 tokens, no speaker slot), int4/ and embeddings/ subdirectory paths
- sampler: replace top_p with top_k
- tokenizer: load from tokenizer/ subdirectory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
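To illustrate the sampler change (top_p replaced by top_k, with top_k=50 and temp=0.9 taken from config.json), here is a minimal sketch of temperature-scaled top-k sampling. It is not the code from this PR: the function name `sample_top_k` and the choice to pass the uniform random draw `u` in from the caller (to keep the example dependency-free) are assumptions made for illustration.

```rust
/// Hypothetical helper: pick a token id from `logits` using temperature
/// scaling followed by top-k filtering. `u` is a uniform random number in
/// [0, 1) supplied by the caller (an assumption, so no RNG crate is needed).
fn sample_top_k(logits: &[f32], top_k: usize, temperature: f32, u: f32) -> Option<usize> {
    if logits.is_empty() || top_k == 0 || temperature <= 0.0 {
        return None;
    }

    // Order token ids by logit, highest first, and keep only the k best.
    let mut ids: Vec<usize> = (0..logits.len()).collect();
    ids.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    ids.truncate(top_k.min(logits.len()));

    // Unnormalized softmax weights over the kept logits. Subtracting the
    // maximum logit keeps `exp` numerically stable.
    let max = logits[ids[0]];
    let weights: Vec<f32> = ids
        .iter()
        .map(|&i| ((logits[i] - max) / temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();

    // Inverse-CDF sampling: walk the cumulative weights until `target` is passed.
    let target = u.clamp(0.0, 1.0) * total;
    let mut acc = 0.0;
    for (&id, &w) in ids.iter().zip(&weights) {
        acc += w;
        if target < acc {
            return Some(id);
        }
    }
    // Fallback for rounding at the upper edge (e.g. u == 1.0).
    ids.last().copied()
}
```

With the values named in the commit, a call would look like `sample_top_k(&logits, 50, 0.9, u)`, where `logits` is the last-position logits row and `u` comes from whatever random source the backend uses.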
- Add `instruction` field to `SynthesizeRequest` with `with_instruction()`
- Qwen3-TTS backend builds user-turn prefix from instruction tokens; warns on stderr when no instruction is provided
- Upgrade hf-hub 0.3 → 0.5 (fixes relative-Location redirect bug)
- synthesize example: default instruction, /lang, /langs, /instruct, /status, /help commands for live session control

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
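The `with_instruction()` addition follows the usual consuming-builder pattern. The sketch below shows one way such a request type could look; only the names `SynthesizeRequest`, `instruction`, and `with_instruction` come from the commit message. The `text` field, the `new` constructor, and the use of owned `String`s are assumptions, and the real type in `wavekat-tts` may differ.

```rust
/// Hypothetical request type. Only `instruction` and `with_instruction`
/// are named in the commit; everything else here is assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct SynthesizeRequest {
    /// Text to synthesize (assumed field).
    pub text: String,
    /// Optional VoiceDesign style prompt.
    pub instruction: Option<String>,
}

impl SynthesizeRequest {
    /// Assumed constructor: a request with no instruction set.
    pub fn new(text: impl Into<String>) -> Self {
        Self {
            text: text.into(),
            instruction: None,
        }
    }

    /// Consuming builder method: returns the request with the instruction set.
    pub fn with_instruction(mut self, instruction: impl Into<String>) -> Self {
        self.instruction = Some(instruction.into());
        self
    }
}
```

Under these assumptions, a caller would write `SynthesizeRequest::new("Hello").with_instruction("a calm, low-pitched voice")`, and a backend can check `instruction.is_none()` to decide whether to warn.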
Introduce `ModelConfig` (precision + execution_provider + model_dir) and replace `new_with_precision`/`from_dir` with a single `from_config` constructor. Model dir resolution order: config field → `WAVEKAT_MODEL_DIR` → HF Hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
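The resolution order described above (config field, then `WAVEKAT_MODEL_DIR`, then HF Hub) can be sketched as a small pure function. This is an illustration only: `ModelSource` and `resolve_model_source` are hypothetical names, and treating an empty environment value as unset is an assumption the commit does not state.

```rust
use std::path::PathBuf;

/// Hypothetical result type: where the model files should come from.
#[derive(Debug, PartialEq)]
enum ModelSource {
    /// Load from a local directory.
    Local(PathBuf),
    /// Fall back to downloading from the Hugging Face Hub.
    HfHub,
}

/// Resolve the model source in priority order:
/// 1. the `model_dir` field from the config,
/// 2. the value of the `WAVEKAT_MODEL_DIR` environment variable,
/// 3. the HF Hub download.
/// The environment value is passed in as an argument so the function stays
/// pure and easy to test.
fn resolve_model_source(config_dir: Option<PathBuf>, env_dir: Option<String>) -> ModelSource {
    if let Some(dir) = config_dir {
        return ModelSource::Local(dir);
    }
    match env_dir {
        // Assumption: an empty variable is treated the same as an unset one.
        Some(dir) if !dir.is_empty() => ModelSource::Local(PathBuf::from(dir)),
        _ => ModelSource::HfHub,
    }
}
```

A caller would typically pass `std::env::var("WAVEKAT_MODEL_DIR").ok()` as the second argument.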
- Wrap no-instruction `eprintln!` in `Once` so it fires at most once per process rather than on every `synthesize()` call
- Remove double blank line and empty `[dev-dependencies]` in Cargo.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
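For readers unfamiliar with `std::sync::Once`, the warn-once behaviour can be sketched as follows. The function name, the warning text, and the `WARNINGS_EMITTED` counter are all hypothetical; the counter exists only to make the behaviour observable in this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

/// Guards the warning so it is printed at most once per process.
static NO_INSTRUCTION_WARNING: Once = Once::new();

/// Illustration-only counter, used to observe how often the warning fired.
static WARNINGS_EMITTED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical helper: warn on stderr the first time a request arrives
/// without an instruction, and stay silent on every later call.
fn warn_if_no_instruction(instruction: Option<&str>) {
    if instruction.is_none() {
        NO_INSTRUCTION_WARNING.call_once(|| {
            WARNINGS_EMITTED.fetch_add(1, Ordering::Relaxed);
            // Placeholder message; the real wording is not given in the PR.
            eprintln!("warning: no instruction provided for VoiceDesign synthesis");
        });
    }
}
```

`Once::call_once` runs its closure only on the first call, including when several threads race, which is why repeated `synthesize()` calls no longer repeat the message.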
wavekat-eason pushed a commit that referenced this pull request on Apr 7, 2026
## 🤖 New release

* `wavekat-tts`: 0.0.1

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.0.1](https://github.com/wavekat/wavekat-tts/releases/tag/v0.0.1) - 2026-04-07

### Added

- CUDA/TRT/CoreML execution providers + RTF benchmarks
- Qwen3-TTS 1.7B VoiceDesign ONNX backend ([#3](#3))
- Qwen3-TTS backend and ONNX export tooling ([#1](#1))

</blockquote>

</p></details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
- Qwen3-TTS backend now targets the WaveKat 1.7B VoiceDesign ONNX model (`wavekat/Qwen3-TTS-1.7B-VoiceDesign-ONNX`)
- `ModelConfig` builder API replaces positional constructors: it controls precision (INT4/FP32), execution provider (CPU/CUDA/CoreML), and local model dir
- `SynthesizeRequest::with_instruction` for VoiceDesign style prompts; warns once if omitted
- `synthesize` example gains `/lang`, `/instruct`, `/status` commands
- Downloads switched from `ureq` to `hf-hub` for caching and auth (`HF_TOKEN`)
- `wavekat-core` bumped to 0.0.5 with `wav` feature; WAV I/O via `AudioFrame::write_wav`

Test plan

- `make check` passes
- `cargo run --example synthesize --features qwen3-tts -- "Hello"` downloads INT4 and produces audio
- `cargo run --example synthesize --features qwen3-tts -- --precision fp32 "Hello"` downloads FP32
- `cargo run --example synthesize --features qwen3-tts -- -i` enters interactive mode
- `WAVEKAT_MODEL_DIR=<path> cargo run --example synthesize --features qwen3-tts -- "Hello"` loads from local dir

🤖 Generated with Claude Code