feat: add FireRedVAD backend by wavekat-eason · Pull Request #38 · wavekat/wavekat-vad

wavekat-eason · 2026-03-25T05:40:18Z

Summary

Add FireRedVAD backend with pure Rust FBank feature extraction and CMVN normalization, using ONNX Runtime only for inference
Include accuracy tests, benchmarks, CI integration, upstream validation tests, and a standalone example
Add FireRedVAD support to vad-lab with per-stage process timings, RTF tooltip, and improved result card layout

Details

Library crate (wavekat-vad):

New firered feature flag with backends::firered::FireRedVad implementation
Pure Rust preprocessing pipeline: 80-band Mel filterbank, Kaldi mel scale, CMVN normalization
Zero-copy ONNX inference with DFSMN cache tensors
ONNX model and CMVN stats embedded at compile time via build.rs
Upstream parity validation against Python reference implementation
Added to accuracy baseline, benchmarks, and CI workflows
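
As a rough illustration of two preprocessing steps named above, here is a minimal sketch of the Kaldi mel-scale conversion and a per-dimension CMVN step. The function names and the choice to pass a precomputed mean and inverse standard deviation are assumptions for this example, not the crate's actual API.

```rust
/// Kaldi-style mel scale: mel = 1127 * ln(1 + hz / 700).
fn hz_to_mel(hz: f32) -> f32 {
    1127.0 * (1.0 + hz / 700.0).ln()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f32) -> f32 {
    700.0 * ((mel / 1127.0).exp() - 1.0)
}

/// Normalize one feature frame in place: x = (x - mean) * inv_std.
/// `mean` and `inv_std` are assumed to be precomputed per dimension
/// (hypothetical layout; the real CMVN stats file is parsed elsewhere).
fn apply_cmvn(frame: &mut [f32], mean: &[f32], inv_std: &[f32]) {
    assert_eq!(frame.len(), mean.len());
    assert_eq!(frame.len(), inv_std.len());
    for ((x, m), s) in frame.iter_mut().zip(mean).zip(inv_std) {
        *x = (*x - m) * s;
    }
}
```

With an 80-band filterbank, each frame passed to `apply_cmvn` would hold 80 values, one per mel band.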

vad-lab tool:

FireRedVAD available as a backend option
Per-stage process timings (preprocess, inference, postprocess) streamed to frontend
RTF info tooltip explaining timing breakdown
Improved VAD result card layout and defaults
Fixed stale Select param values in saved configs

Docs:

Implementation plan, video script, experiment notes
README updates with acknowledgements section
Reference Python scripts for validation

Test plan

cargo test --workspace passes
cargo test --features firered — FireRedVAD unit tests and upstream validation
Accuracy tests pass against baseline thresholds
vad-lab displays FireRedVAD results with timing breakdown

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…at/firered-vad

Implement FireRedVAD as a new VAD backend behind the `firered` feature flag. Includes pure Rust FBank feature extraction (matching kaldi_native_fbank), Kaldi binary CMVN parser, and streaming ONNX inference with DFSMN cache management.

Validated against Python reference pipeline with max probability diff of 0.000012 across 98 frames, and against upstream fireredvad pip package on real speech audio with max diff of 0.0005 across 1150 frames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wire the firered feature into accuracy.rs, vad_comparison.rs bench, Makefile targets, and all GitHub Actions workflows so FireRedVAD is tested alongside the other backends.

Baseline: P=0.950 R=0.879 F1=0.913

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add ref_upstream_probs.json from official fireredvad pip package
- Add probabilities_match_upstream_fireredvad test comparing Rust output directly against upstream PyTorch probabilities
- Use TensorRef::from_array_view for zero-copy tensor creation (eliminates 19,456-element cache clone per frame)
- Remove dead extract_frame(&[i16]) method, use f32 path directly
- Add --save-upstream flag to validate_upstream.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add `ProcessTimings` to the `VoiceActivityDetector` trait so each backend reports a named breakdown of where time is spent (e.g. fbank → cmvn → onnx). Instrument all four backends (WebRTC, Silero, TEN-VAD, FireRedVAD) and surface the data in:

- accuracy tests: per-stage µs/frame in both inline output and the markdown report
- vad-lab: stage deltas piped through the pipeline → WebSocket → React UI, displayed next to RTF in each timeline header

Also integrates FireRedVAD into vad-lab (backend creation, available backends list, 16 kHz resampling gate).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
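
To make the idea of a named per-stage breakdown concrete, the following is a minimal sketch of how such timings could be accumulated and reported as µs/frame. The `StageTimings` type and its methods are hypothetical and are not the `ProcessTimings` definition from this PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical accumulator of named stage durations (not the crate's type).
#[derive(Debug, Default)]
struct StageTimings {
    stages: Vec<(&'static str, Duration)>,
}

impl StageTimings {
    /// Run `f`, add its elapsed time to the stage called `name`, return its result.
    fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.add(name, start.elapsed());
        out
    }

    /// Add `elapsed` to an existing stage, or append a new stage in call order.
    fn add(&mut self, name: &'static str, elapsed: Duration) {
        if let Some(entry) = self.stages.iter_mut().find(|entry| entry.0 == name) {
            entry.1 += elapsed;
        } else {
            self.stages.push((name, elapsed));
        }
    }

    /// Sum of all stage durations.
    fn total(&self) -> Duration {
        self.stages.iter().map(|(_, d)| *d).sum()
    }

    /// Average microseconds per frame for each stage.
    fn micros_per_frame(&self, frames: u32) -> Vec<(&'static str, f64)> {
        self.stages
            .iter()
            .map(|(name, d)| (*name, d.as_secs_f64() * 1e6 / f64::from(frames)))
            .collect()
    }
}
```

A backend could wrap each step, for example `timings.time("fbank", || extract(frame))`, where `extract` stands in for whatever the backend actually calls.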

Split label/config and RTF/stage timings into two-line layouts for better vertical alignment, add border to VAD canvases matching waveform/spectrogram style, and include FireRedVAD as a default config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

SelectOption now has distinct value (machine identifier) and label (display string) instead of encoding both in a single string. This removes the prefix-stripping hack in mode parsing and sets the default WebRTC VAD mode to very_aggressive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When Select option values change (e.g. the value/label refactor), configs persisted in localStorage may hold values that no longer match any valid option, silently breaking detector creation. The backfill logic now validates Select params against available options and resets invalid values to the default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
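
The validation described here can be sketched as follows. This is an illustration in Rust only; the field layout of `SelectOption` follows the value/label split described in the earlier commit, while `resolve_select` is a hypothetical helper, and the real backfill logic may live in different code and look different.

```rust
/// A select option with a machine identifier and a separate display string.
#[derive(Debug, Clone, PartialEq)]
struct SelectOption {
    value: String,
    label: String,
}

/// Return `saved` if it matches an available option's `value`,
/// otherwise fall back to `default`. (Hypothetical helper.)
fn resolve_select<'a>(saved: &'a str, options: &[SelectOption], default: &'a str) -> &'a str {
    if options.iter().any(|o| o.value == saved) {
        saved
    } else {
        default
    }
}
```

A value saved under an older encoding, such as a string that mixed identifier and label, no longer matches any `value` and so resolves to the default instead of failing later during detector creation.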

Add an info icon next to the RTF value that shows a tooltip with the frame duration, per-stage timing, and how RTF is computed (processing time / audio duration). Uses shadcn tooltip component.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
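
The RTF formula quoted in this commit (processing time / audio duration) can be written out as a small sketch. The function names and the zero-duration guard are assumptions for illustration, not code from vad-lab.

```rust
use std::time::Duration;

/// Duration of one audio frame, given its length in samples and the sample rate.
fn frame_duration(samples_per_frame: usize, sample_rate_hz: u32) -> Duration {
    Duration::from_secs_f64(samples_per_frame as f64 / f64::from(sample_rate_hz))
}

/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean processing is faster than real time.
/// Returns `None` when there is no audio, to avoid dividing by zero.
fn real_time_factor(processing: Duration, audio: Duration) -> Option<f64> {
    if audio.is_zero() {
        None
    } else {
        Some(processing.as_secs_f64() / audio.as_secs_f64())
    }
}
```

For example, a frame of 160 samples at 16 kHz lasts 10 ms, so spending 1 ms processing it corresponds to an RTF of about 0.1.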

Local (Apple Silicon) max diff is ~0.00068 but CI (Linux x86_64) hits ~0.00114 due to different SIMD/FMA paths in FFT computation. Bump tolerance from 1e-3 to 2e-3 to accommodate both environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
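
A parity check of this kind boils down to a maximum absolute difference compared against a tolerance. The sketch below uses hypothetical helper names and is not the test code from this PR; it only shows why a diff near 0.00114 fails at 1e-3 but passes at 2e-3.

```rust
/// Largest absolute element-wise difference between two probability sequences.
/// Returns `None` if the lengths differ or the input is empty.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0_f32, f32::max),
    )
}

/// True when both sequences have the same length and differ by at most `tol`.
fn within_tolerance(a: &[f32], b: &[f32], tol: f32) -> bool {
    matches!(max_abs_diff(a, b), Some(d) if d <= tol)
}
```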

## 🤖 New release

* `wavekat-vad`: 0.1.8 -> 0.1.9 (✓ API compatible changes)

<details><summary>Changelog</summary>
<blockquote>

## [0.1.9](v0.1.8...v0.1.9) - 2026-03-25

### Added

- add FireRedVAD backend ([#38](#38))

</blockquote>
</details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

wavekat-eason and others added 16 commits March 24, 2026 21:55

docs: add FireRedVAD implementation plan

4bd9586

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'main' of github.com-wavekat:wavekat/wavekat-vad into fe…

8fcfc39

…at/firered-vad

docs: add FireRedVAD to ONNX model downloads section

db9ad91

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add Acknowledgements section to README

49053f0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add FireRedVAD to README

f8e8076

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add FireRedVAD video script

741d33c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add FireRedVAD example and update video script

0969728

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wavekat-eason merged commit 1afdc5d into main Mar 25, 2026
9 checks passed

wavekat-eason deleted the feat/firered-vad branch March 25, 2026 05:55

github-actions bot mentioned this pull request Mar 25, 2026

chore(wavekat-vad): release v0.1.9 #39

Merged

wavekat-eason merged 16 commits into main from feat/firered-vad
