feat: Qwen3-TTS 1.7B VoiceDesign ONNX backend by wavekat-eason · Pull Request #3 · wavekat/wavekat-tts

wavekat-eason · 2026-04-07T02:09:30Z

Summary

- Adds Qwen3-TTS-12Hz-1.7B-VoiceDesign ONNX backend with auto-download from HF Hub (wavekat/Qwen3-TTS-1.7B-VoiceDesign-ONNX)
- `ModelConfig` builder API replaces positional constructors; it controls precision (INT4/FP32), execution provider (CPU/CUDA/CoreML), and local model dir
- `SynthesizeRequest::with_instruction` for VoiceDesign style prompts; warns once if omitted
- Interactive CLI example with /lang, /instruct, /status commands
- Switches from manual HTTP download with ureq to hf-hub for caching and auth (HF_TOKEN)
- wavekat-core bumped to 0.0.5 with the wav feature; WAV I/O via `AudioFrame::write_wav`
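
For readers unfamiliar with the builder pattern mentioned above, here is a minimal, self-contained sketch of what a `ModelConfig`-style builder could look like. The enum names, method names, and the INT4/CPU defaults shown here are assumptions for illustration only; the actual types in wavekat-tts may differ.

```rust
use std::path::PathBuf;

/// Hypothetical precision options, named after the INT4/FP32 choices in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Precision {
    #[default]
    Int4,
    Fp32,
}

/// Hypothetical execution providers, named after the CPU/CUDA/CoreML choices.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ExecutionProvider {
    #[default]
    Cpu,
    Cuda,
    CoreMl,
}

/// Illustrative config struct; field names follow the PR text, the rest is assumed.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ModelConfig {
    pub precision: Precision,
    pub execution_provider: ExecutionProvider,
    pub model_dir: Option<PathBuf>,
}

impl ModelConfig {
    /// Start from the defaults (INT4 on CPU, no local model dir in this sketch).
    pub fn new() -> Self {
        Self::default()
    }

    /// Each builder method takes `self` by value and returns it, so calls chain.
    pub fn with_precision(mut self, precision: Precision) -> Self {
        self.precision = precision;
        self
    }

    pub fn with_execution_provider(mut self, provider: ExecutionProvider) -> Self {
        self.execution_provider = provider;
        self
    }

    pub fn with_model_dir(mut self, dir: impl Into<PathBuf>) -> Self {
        self.model_dir = Some(dir.into());
        self
    }
}
```

With this shape, a caller would write something like `ModelConfig::new().with_precision(Precision::Fp32).with_execution_provider(ExecutionProvider::Cuda)` and only name the options that differ from the defaults, which is the main advantage over positional constructors.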

Test plan

- `make check` passes
- `cargo run --example synthesize --features qwen3-tts -- "Hello"` downloads INT4 and produces audio
- `cargo run --example synthesize --features qwen3-tts -- --precision fp32 "Hello"` downloads FP32
- `cargo run --example synthesize --features qwen3-tts -- -i` enters interactive mode
- `WAVEKAT_MODEL_DIR=<path> cargo run --example synthesize --features qwen3-tts -- "Hello"` loads from local dir

🤖 Generated with Claude Code

Switch qwen3-tts backend from 0.6B/elbruno to the WaveKat 1.7B VoiceDesign ONNX repo with INT4 by default.

- download: replace ureq HTTP with hf-hub client; downloads int4/ ONNX + embeddings/ + tokenizer/ from wavekat/Qwen3-TTS-1.7B-VoiceDesign-ONNX (pinned revision)
- model: HIDDEN_DIM 1024→2048, MAX_NEW_TOKENS 2048→8192, sampling updated to config.json values (top_k=50, temp=0.9), non-streaming prefill (all text in prefill, matches generate_onnx.py), fix last-position logits extraction bug, min_new_tokens=2, VoiceDesign codec prefix (4 tokens, no speaker slot), int4/ and embeddings/ subdirectory paths
- sampler: replace top_p with top_k
- tokenizer: load from tokenizer/ subdirectory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
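
To make the sampler change concrete, the following is a minimal sketch of top-k sampling with temperature, using only the standard library. The function names and the choice to pass the random number in as a parameter are assumptions made for illustration; this is not the sampler code from this PR. The commit's values would correspond to `k = 50` and `temperature = 0.9`.

```rust
/// Keep the `k` highest logits, apply temperature scaling and a softmax over
/// the survivors, and return `(token_id, probability)` pairs sorted by
/// descending probability. `temperature` is assumed to be > 0.
pub fn top_k_probs(logits: &[f32], k: usize, temperature: f32) -> Vec<(usize, f32)> {
    let k = k.min(logits.len());
    if k == 0 {
        return Vec::new();
    }

    // Pair each logit with its token id, then sort by logit, highest first.
    let mut indexed: Vec<(usize, f32)> = logits.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);

    // Subtract the max logit before exponentiating for numerical stability.
    let max = indexed[0].1;
    let mut sum = 0.0f32;
    for (_, value) in indexed.iter_mut() {
        *value = ((*value - max) / temperature).exp();
        sum += *value;
    }
    for (_, value) in indexed.iter_mut() {
        *value /= sum;
    }
    indexed
}

/// Pick a token by walking the cumulative distribution. `u` is a uniform
/// random number in [0, 1) supplied by the caller (any RNG will do).
pub fn sample_from(probs: &[(usize, f32)], u: f32) -> Option<usize> {
    let mut cumulative = 0.0f32;
    for &(token_id, p) in probs {
        cumulative += p;
        if u < cumulative {
            return Some(token_id);
        }
    }
    // Guard against rounding: fall back to the last candidate, if any.
    probs.last().map(|&(token_id, _)| token_id)
}
```

Unlike top_p, which keeps a variable number of candidates until a cumulative probability threshold is reached, top_k always keeps a fixed number of candidates, so the per-step cost is bounded regardless of how flat the distribution is.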

- Add `instruction` field to `SynthesizeRequest` with `with_instruction()`
- Qwen3-TTS backend builds user-turn prefix from instruction tokens; warns on stderr when no instruction is provided
- Upgrade hf-hub 0.3 → 0.5 (fixes relative-Location redirect bug)
- synthesize example: default instruction, /lang, /langs, /instruct, /status, /help commands for live session control

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
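
As a rough illustration of the optional `instruction` field described in this commit, a request type with a chaining setter might look like the sketch below. Only the names `SynthesizeRequest`, `instruction`, and `with_instruction` come from the PR; the `text` field, the `new` constructor, and the owned `String` types are assumptions, and the real struct may carry other fields or borrow its data.

```rust
/// Illustrative request type: `text` is what gets spoken, `instruction` is an
/// optional VoiceDesign style prompt describing how it should sound.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SynthesizeRequest {
    pub text: String,
    pub instruction: Option<String>,
}

impl SynthesizeRequest {
    /// Create a request with no instruction set (hypothetical constructor).
    pub fn new(text: impl Into<String>) -> Self {
        Self {
            text: text.into(),
            instruction: None,
        }
    }

    /// Attach a style instruction and return the request for further chaining.
    pub fn with_instruction(mut self, instruction: impl Into<String>) -> Self {
        self.instruction = Some(instruction.into());
        self
    }
}
```

Keeping the field as an `Option` lets a backend distinguish "no instruction given" from an empty string, which is what makes a warning on omission possible.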

Introduce ModelConfig (precision + execution_provider + model_dir) and replace new_with_precision/from_dir with a single from_config constructor. Model dir resolution order: config field → WAVEKAT_MODEL_DIR → HF Hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
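
The resolution order stated above (config field, then `WAVEKAT_MODEL_DIR`, then HF Hub) can be sketched as a small pure function. The `ModelSource` enum and the function name are hypothetical, and treating an empty environment value as unset is an assumption of this sketch, not something the PR states.

```rust
use std::path::PathBuf;

/// Where the model files should come from (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum ModelSource {
    /// Load from a directory on disk.
    Local(PathBuf),
    /// Fall back to downloading (or reusing the cache) via HF Hub.
    HfHub,
}

/// Resolve the model source in priority order:
/// 1. the explicit config field, 2. the environment value, 3. HF Hub.
///
/// The environment value is passed in rather than read here so the function
/// stays easy to test; a caller would pass
/// `std::env::var("WAVEKAT_MODEL_DIR").ok()`.
pub fn resolve_model_source(config_dir: Option<PathBuf>, env_dir: Option<String>) -> ModelSource {
    if let Some(dir) = config_dir {
        return ModelSource::Local(dir);
    }
    match env_dir {
        // Assumption: an empty variable is treated the same as an unset one.
        Some(dir) if !dir.is_empty() => ModelSource::Local(PathBuf::from(dir)),
        _ => ModelSource::HfHub,
    }
}
```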

- Wrap no-instruction eprintln! in Once so it fires at most once per process rather than on every synthesize() call
- Remove double blank line and empty [dev-dependencies] in Cargo.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
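
The once-per-process warning described in this commit can be sketched with `std::sync::Once`. The function name, the warning text, and the `WARNINGS_EMITTED` counter are hypothetical; the counter exists only so the behavior can be observed in a test and would not be needed in real code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

/// Guards the warning so its closure runs at most once per process.
static NO_INSTRUCTION_WARNING: Once = Once::new();

/// Counts how many times the warning was actually printed (illustration only).
pub static WARNINGS_EMITTED: AtomicUsize = AtomicUsize::new(0);

/// Print a warning to stderr the first time a request arrives without an
/// instruction; later calls without an instruction stay silent.
pub fn warn_if_no_instruction(instruction: Option<&str>) {
    if instruction.is_none() {
        NO_INSTRUCTION_WARNING.call_once(|| {
            eprintln!("warning: no instruction provided; output voice may be unpredictable");
            WARNINGS_EMITTED.fetch_add(1, Ordering::Relaxed);
        });
    }
}
```

`Once::call_once` is thread-safe, so this also holds if several threads synthesize concurrently.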

## 🤖 New release

* `wavekat-tts`: 0.0.1

<details><summary>Changelog</summary>
<blockquote>

## [0.0.1](https://github.com/wavekat/wavekat-tts/releases/tag/v0.0.1) - 2026-04-07

### Added

- CUDA/TRT/CoreML execution providers + RTF benchmarks
- Qwen3-TTS 1.7B VoiceDesign ONNX backend ([#3](#3))
- Qwen3-TTS backend and ONNX export tooling ([#1](#1))

</blockquote>
</details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

wavekat-eason and others added 15 commits April 6, 2026 20:19

docs: add 1.7B VoiceDesign migration notes

e031f0b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: show model loading progress after download

3db024f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add FP32 precision option to Qwen3-TTS backend

dfa4423

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update README for 1.7B VoiceDesign + FP32 precision

9fb5ba7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: mark CosyVoice backend as planned

3ab1b92

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: bump wavekat-core to 0.0.4, use AudioFrame::from_vec

f5da9a3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: propose AudioFrame WAV I/O feature for wavekat-core

f3c4fd6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: use AudioFrame::write_wav from wavekat-core 0.0.5

2203380

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: show write_wav in quick start and update example commands

3b39e0d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: remove redundant cargo add wavekat-core from quick start

5ce743e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

style: fix rustfmt formatting in qwen3_tts backend

f286d3a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wavekat-eason merged commit c27da36 into main Apr 7, 2026
5 checks passed

wavekat-eason deleted the feat/model-config branch April 7, 2026 02:51

github-actions bot mentioned this pull request Apr 7, 2026

chore: release v0.0.1 #8

Merged

wavekat-eason merged 15 commits into main from feat/model-config
