feat: Qwen3-TTS 1.7B VoiceDesign ONNX backend #3

Merged
wavekat-eason merged 15 commits into main from feat/model-config
Apr 7, 2026

wavekat-eason commented Apr 7, 2026

Summary

  • Adds Qwen3-TTS-12Hz-1.7B-VoiceDesign ONNX backend with auto-download from HF Hub (wavekat/Qwen3-TTS-1.7B-VoiceDesign-ONNX)
  • ModelConfig builder API replaces positional constructors; it controls precision (INT4/FP32), execution provider (CPU/CUDA/CoreML), and the local model dir
  • SynthesizeRequest::with_instruction for VoiceDesign style prompts; warns once if omitted
  • Interactive CLI example with /lang, /instruct, /status commands
  • Switches from manual HTTP download + ureq to hf-hub for caching and auth (HF_TOKEN)
  • wavekat-core bumped to 0.0.5 with wav feature; WAV I/O via AudioFrame::write_wav
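
To illustrate the builder-style configuration described in the summary, here is a minimal, standalone sketch. It is not the crate's actual code: the enum names, the `with_*` method names, and the CPU default are assumptions made for illustration. Only the three controlled settings (precision, execution provider, local model dir) and the INT4 default come from this PR.

```rust
use std::path::PathBuf;

/// Hypothetical precision options; the PR names INT4 (default) and FP32.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Precision {
    #[default]
    Int4,
    Fp32,
}

/// Hypothetical execution providers; defaulting to CPU is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ExecutionProvider {
    #[default]
    Cpu,
    Cuda,
    CoreMl,
}

/// Illustrative stand-in for the `ModelConfig` described in the summary.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ModelConfig {
    pub precision: Precision,
    pub execution_provider: ExecutionProvider,
    pub model_dir: Option<PathBuf>,
}

impl ModelConfig {
    /// Start from the defaults (INT4, CPU, no local model dir).
    pub fn new() -> Self {
        Self::default()
    }

    /// Select the model precision.
    pub fn with_precision(mut self, precision: Precision) -> Self {
        self.precision = precision;
        self
    }

    /// Select the execution provider.
    pub fn with_execution_provider(mut self, provider: ExecutionProvider) -> Self {
        self.execution_provider = provider;
        self
    }

    /// Point at a local model directory.
    pub fn with_model_dir(mut self, dir: impl Into<PathBuf>) -> Self {
        self.model_dir = Some(dir.into());
        self
    }
}
```

A builder like this lets callers set only the options they care about, which is the usual motivation for replacing positional constructors.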

Test plan

  • make check passes
  • cargo run --example synthesize --features qwen3-tts -- "Hello" downloads INT4 and produces audio
  • cargo run --example synthesize --features qwen3-tts -- --precision fp32 "Hello" downloads FP32
  • cargo run --example synthesize --features qwen3-tts -- -i enters interactive mode
  • WAVEKAT_MODEL_DIR=<path> cargo run --example synthesize --features qwen3-tts -- "Hello" loads from local dir

🤖 Generated with Claude Code

wavekat-eason and others added 15 commits April 6, 2026 20:19
Switch qwen3-tts backend from 0.6B/elbruno to the WaveKat
1.7B VoiceDesign ONNX repo with INT4 by default.

- download: replace ureq HTTP with hf-hub client; downloads
  int4/ ONNX + embeddings/ + tokenizer/ from
  wavekat/Qwen3-TTS-1.7B-VoiceDesign-ONNX (pinned revision)
- model: HIDDEN_DIM 1024→2048, MAX_NEW_TOKENS 2048→8192,
  sampling updated to config.json values (top_k=50, temp=0.9),
  non-streaming prefill (all text in prefill, matches
  generate_onnx.py), fix last-position logits extraction bug,
  min_new_tokens=2, VoiceDesign codec prefix (4 tokens, no
  speaker slot), int4/ and embeddings/ subdirectory paths
- sampler: replace top_p with top_k
- tokenizer: load from tokenizer/ subdirectory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
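
The commit above replaces top_p with top_k sampling (top_k=50, temp=0.9). As a rough sketch of what top-k sampling with temperature generally looks like, and not the sampler shipped in this PR, the hypothetical function below keeps the `top_k` highest logits, applies a temperature-scaled softmax, and picks a token by inverse CDF. The function name and the caller-supplied uniform number `u` are assumptions chosen to keep the example dependency-free.

```rust
/// Sample a token index from `logits` using top-k filtering and temperature.
///
/// `u` is a uniform random number in [0, 1) supplied by the caller, so this
/// sketch needs no RNG crate. Returns `None` for empty input, `top_k == 0`,
/// or a non-positive temperature.
pub fn sample_top_k(logits: &[f32], top_k: usize, temperature: f32, u: f32) -> Option<usize> {
    if logits.is_empty() || top_k == 0 || temperature <= 0.0 {
        return None;
    }

    // Order token indices by logit, highest first, and keep the best `top_k`.
    let mut indices: Vec<usize> = (0..logits.len()).collect();
    indices.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    indices.truncate(top_k.min(logits.len()));

    // Temperature-scaled softmax weights over the kept tokens. Subtracting
    // the maximum logit keeps `exp` numerically stable.
    let max_logit = logits[indices[0]];
    let weights: Vec<f32> = indices
        .iter()
        .map(|&i| ((logits[i] - max_logit) / temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();

    // Inverse-CDF selection: walk the cumulative weights until `u * total`
    // falls inside a token's interval.
    let target = u * total;
    let mut cumulative = 0.0;
    for (weight, &index) in weights.iter().zip(&indices) {
        cumulative += weight;
        if target < cumulative {
            return Some(index);
        }
    }
    // Fallback for rounding at the upper edge.
    indices.last().copied()
}
```

With `top_k = 1` this reduces to greedy decoding. Larger values admit more candidates, and the temperature controls how evenly probability is spread among them.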
- Add `instruction` field to `SynthesizeRequest` with `with_instruction()`
- Qwen3-TTS backend builds user-turn prefix from instruction tokens;
  warns on stderr when no instruction is provided
- Upgrade hf-hub 0.3 → 0.5 (fixes relative-Location redirect bug)
- synthesize example: default instruction, /lang, /langs, /instruct,
  /status, /help commands for live session control

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
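
For readers unfamiliar with the pattern, the `instruction` field and `with_instruction()` method from the commit above could look roughly like the following. This is a standalone sketch, not the crate's definition: the `text` field, the `new` constructor, and the use of `Option<String>` are assumptions.

```rust
/// Illustrative stand-in for `SynthesizeRequest`; field layout is assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SynthesizeRequest {
    /// Text to synthesize.
    pub text: String,
    /// Optional VoiceDesign style prompt.
    pub instruction: Option<String>,
}

impl SynthesizeRequest {
    /// Create a request with no instruction set.
    pub fn new(text: impl Into<String>) -> Self {
        Self {
            text: text.into(),
            instruction: None,
        }
    }

    /// Attach a style instruction, consuming and returning the request so
    /// calls can be chained.
    pub fn with_instruction(mut self, instruction: impl Into<String>) -> Self {
        self.instruction = Some(instruction.into());
        self
    }
}
```

Keeping the instruction optional lets a backend detect its absence and warn, as the commit describes.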
Introduce ModelConfig (precision + execution_provider + model_dir) and
replace new_with_precision/from_dir with a single from_config constructor.
Model dir resolution order: config field → WAVEKAT_MODEL_DIR → HF Hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
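
The resolution order stated above (config field, then WAVEKAT_MODEL_DIR, then HF Hub) can be sketched as a small pure function. The `ModelSource` enum and both function names are hypothetical, and treating an empty environment value as unset is an assumption, not something the PR states.

```rust
use std::path::PathBuf;

/// Where the model files come from (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum ModelSource {
    /// Load from a local directory.
    Local(PathBuf),
    /// Fall back to downloading from the Hugging Face Hub.
    HfHub,
}

/// Apply the precedence: explicit config field, then the environment value,
/// then the Hub. The environment value is passed in so the logic stays pure
/// and easy to test.
pub fn resolve_model_source(config_dir: Option<PathBuf>, env_dir: Option<String>) -> ModelSource {
    if let Some(dir) = config_dir {
        return ModelSource::Local(dir);
    }
    match env_dir {
        // Assumption: an empty variable is treated the same as an unset one.
        Some(dir) if !dir.is_empty() => ModelSource::Local(PathBuf::from(dir)),
        _ => ModelSource::HfHub,
    }
}

/// Convenience wrapper that reads `WAVEKAT_MODEL_DIR` from the process
/// environment.
pub fn resolve_from_env(config_dir: Option<PathBuf>) -> ModelSource {
    resolve_model_source(config_dir, std::env::var("WAVEKAT_MODEL_DIR").ok())
}
```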
- Wrap no-instruction eprintln! in Once so it fires at most once per
  process rather than on every synthesize() call
- Remove double blank line and empty [dev-dependencies] in Cargo.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
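
The warn-once behaviour described in the commit above relies on `std::sync::Once`. A minimal sketch of the pattern follows. The function name, the warning text, and the counter (present only so the behaviour can be observed) are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

/// Guards the warning so it is printed at most once per process.
static NO_INSTRUCTION_WARNING: Once = Once::new();

/// Counts emitted warnings; exists only to make the behaviour observable.
static WARNINGS_EMITTED: AtomicUsize = AtomicUsize::new(0);

/// Print a warning to stderr the first time a request arrives without an
/// instruction; later calls are silent.
pub fn warn_if_no_instruction(instruction: Option<&str>) {
    if instruction.is_none() {
        NO_INSTRUCTION_WARNING.call_once(|| {
            eprintln!("warning: no instruction provided for VoiceDesign synthesis");
            WARNINGS_EMITTED.fetch_add(1, Ordering::Relaxed);
        });
    }
}

/// Number of warnings printed so far (0 or 1).
pub fn warnings_emitted() -> usize {
    WARNINGS_EMITTED.load(Ordering::Relaxed)
}
```

`Once::call_once` is thread-safe, so concurrent `synthesize()` calls would still produce a single warning.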
wavekat-eason merged commit c27da36 into main Apr 7, 2026
5 checks passed
wavekat-eason deleted the feat/model-config branch April 7, 2026 02:51
github-actions bot mentioned this pull request Apr 7, 2026
wavekat-eason pushed a commit that referenced this pull request Apr 7, 2026
## 🤖 New release

* `wavekat-tts`: 0.0.1

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.0.1](https://github.com/wavekat/wavekat-tts/releases/tag/v0.0.1) - 2026-04-07

### Added

- CUDA/TRT/CoreML execution providers + RTF benchmarks
- Qwen3-TTS 1.7B VoiceDesign ONNX backend ([#3](#3))
- Qwen3-TTS backend and ONNX export tooling ([#1](#1))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>