feat: add FireRedVAD backend #38

Merged
wavekat-eason merged 16 commits into main from feat/firered-vad
Mar 25, 2026

Conversation

@wavekat-eason
Contributor

Summary

  • Add FireRedVAD backend with pure Rust FBank feature extraction and CMVN normalization, using ONNX Runtime only for inference
  • Include accuracy tests, benchmarks, CI integration, upstream validation tests, and a standalone example
  • Add FireRedVAD support to vad-lab with per-stage process timings, RTF tooltip, and improved result card layout

Details

Library crate (wavekat-vad):

  • New firered feature flag with backends::firered::FireRedVad implementation
  • Pure Rust preprocessing pipeline: 80-band Mel filterbank, Kaldi mel scale, CMVN normalization
  • Zero-copy ONNX inference with DFSMN cache tensors
  • ONNX model and CMVN stats embedded at compile time via build.rs
  • Upstream parity validation against Python reference implementation
  • Added to accuracy baseline, benchmarks, and CI workflows
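
As an illustration of the preprocessing bullets above, here is a minimal sketch of the Kaldi mel scale and a per-dimension CMVN step. The function names are hypothetical and not taken from the crate. The sketch also assumes CMVN applies both mean and inverse standard deviation per feature dimension, which the PR does not spell out.

```rust
/// Kaldi-style mel scale: mel = 1127 * ln(1 + hz / 700).
fn hz_to_mel(hz: f32) -> f32 {
    1127.0 * (1.0 + hz / 700.0).ln()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f32) -> f32 {
    700.0 * ((mel / 1127.0).exp() - 1.0)
}

/// Normalize one feature frame in place: x = (x - mean) * inv_std.
/// `mean` and `inv_std` are assumed to come from precomputed CMVN stats
/// with one entry per feature dimension (80 for an 80-band filterbank).
fn apply_cmvn(frame: &mut [f32], mean: &[f32], inv_std: &[f32]) {
    assert_eq!(frame.len(), mean.len());
    assert_eq!(frame.len(), inv_std.len());
    for ((x, m), s) in frame.iter_mut().zip(mean).zip(inv_std) {
        *x = (*x - m) * s;
    }
}
```

Storing the inverse standard deviation turns the per-frame division into a multiplication. Whether the crate does this is not stated here.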

vad-lab tool:

  • FireRedVAD available as a backend option
  • Per-stage process timings (preprocess, inference, postprocess) streamed to frontend
  • RTF info tooltip explaining timing breakdown
  • Improved VAD result card layout and defaults
  • Fixed stale Select param values in saved configs

Docs:

  • Implementation plan, video script, experiment notes
  • README updates with acknowledgements section
  • Reference Python scripts for validation

Test plan

  • cargo test --workspace passes
  • cargo test --features firered — FireRedVAD unit tests and upstream validation
  • Accuracy tests pass against baseline thresholds
  • vad-lab displays FireRedVAD results with timing breakdown

🤖 Generated with Claude Code

wavekat-eason and others added 16 commits March 24, 2026 21:55
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement FireRedVAD as a new VAD backend behind the `firered` feature
flag. Includes pure Rust FBank feature extraction (matching
kaldi_native_fbank), Kaldi binary CMVN parser, and streaming ONNX
inference with DFSMN cache management.

Validated against Python reference pipeline with max probability diff
of 0.000012 across 98 frames, and against upstream fireredvad pip
package on real speech audio with max diff of 0.0005 across 1150
frames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire the firered feature into accuracy.rs, vad_comparison.rs bench,
Makefile targets, and all GitHub Actions workflows so FireRedVAD is
tested alongside the other backends.

Baseline: P=0.950 R=0.879 F1=0.913
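
The baseline figures are related by the usual F1 formula, the harmonic mean of precision and recall. A small sketch of that arithmetic follows; `f1_score` is an illustrative helper, not a function from accuracy.rs.

```rust
/// F1 is the harmonic mean of precision and recall: 2PR / (P + R).
/// Returns 0.0 when both inputs are zero to avoid dividing by zero.
fn f1_score(precision: f64, recall: f64) -> f64 {
    let sum = precision + recall;
    if sum == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / sum
    }
}
```

With P=0.950 and R=0.879 this gives roughly 0.9131, consistent with the reported F1=0.913.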

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ref_upstream_probs.json from official fireredvad pip package
- Add probabilities_match_upstream_fireredvad test comparing Rust
  output directly against upstream PyTorch probabilities
- Use TensorRef::from_array_view for zero-copy tensor creation
  (eliminates 19,456-element cache clone per frame)
- Remove dead extract_frame(&[i16]) method, use f32 path directly
- Add --save-upstream flag to validate_upstream.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `ProcessTimings` to the `VoiceActivityDetector` trait so each
backend reports a named breakdown of where time is spent (e.g.
fbank → cmvn → onnx). Instrument all four backends (WebRTC, Silero,
TEN-VAD, FireRedVAD) and surface the data in:

- accuracy tests: per-stage µs/frame in both inline output and the
  markdown report
- vad-lab: stage deltas piped through the pipeline → WebSocket →
  React UI, displayed next to RTF in each timeline header

Also integrates FireRedVAD into vad-lab (backend creation, available
backends list, 16 kHz resampling gate).
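
To make the idea of a named per-stage breakdown concrete, here is a rough sketch of a timing accumulator. `StageTimings` and its methods are hypothetical. The actual `ProcessTimings` type and how it attaches to the `VoiceActivityDetector` trait may look quite different.

```rust
use std::time::{Duration, Instant};

/// Hypothetical accumulator: one (stage name, total time) entry per stage,
/// kept in first-seen order (e.g. fbank, cmvn, onnx).
#[derive(Debug, Default)]
struct StageTimings {
    stages: Vec<(&'static str, Duration)>,
}

impl StageTimings {
    /// Run `f`, add its wall-clock time to the named stage, return its result.
    fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        let elapsed = start.elapsed();
        let idx = self.stages.iter().position(|(n, _)| *n == name);
        match idx {
            Some(i) => self.stages[i].1 += elapsed,
            None => self.stages.push((name, elapsed)),
        }
        out
    }

    /// Sum of all stage totals.
    fn total(&self) -> Duration {
        self.stages.iter().map(|(_, d)| *d).sum()
    }
}
```

A backend could wrap each step, for example `timings.time("fbank", || extract(frame))`, and a caller could divide each stage total by the frame count to get µs/frame. Here `extract` stands in for whatever the backend actually calls.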

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split label/config and RTF/stage timings into two-line layouts for
better vertical alignment, add border to VAD canvases matching
waveform/spectrogram style, and include FireRedVAD as a default config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SelectOption now has distinct value (machine identifier) and label
(display string) instead of encoding both in a single string.
This removes the prefix-stripping hack in mode parsing and sets
the default WebRTC VAD mode to very_aggressive.
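
A sketch of what separating value from label can look like. Only `very_aggressive` is named in this commit. The other identifiers, the labels, and the `WebRtcMode` enum are assumptions for illustration.

```rust
/// A select option with a machine identifier and a separate display string.
#[derive(Debug, Clone, PartialEq)]
struct SelectOption {
    value: &'static str,
    label: &'static str,
}

/// Hypothetical mode enum; the variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WebRtcMode {
    Quality,
    LowBitrate,
    Aggressive,
    VeryAggressive,
}

/// Illustrative option table; labels can change without affecting parsing.
static WEBRTC_MODES: [SelectOption; 4] = [
    SelectOption { value: "quality", label: "Quality" },
    SelectOption { value: "low_bitrate", label: "Low bitrate" },
    SelectOption { value: "aggressive", label: "Aggressive" },
    SelectOption { value: "very_aggressive", label: "Very aggressive" },
];

/// Default mode value, as stated in the commit message.
const DEFAULT_MODE: &str = "very_aggressive";

/// Parse the machine identifier directly; no prefix stripping is needed
/// because the display label is never part of the value.
fn parse_mode(value: &str) -> Option<WebRtcMode> {
    match value {
        "quality" => Some(WebRtcMode::Quality),
        "low_bitrate" => Some(WebRtcMode::LowBitrate),
        "aggressive" => Some(WebRtcMode::Aggressive),
        "very_aggressive" => Some(WebRtcMode::VeryAggressive),
        _ => None,
    }
}
```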

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Select option values change (e.g. the value/label refactor),
configs persisted in localStorage may hold values that no longer
match any valid option, silently breaking detector creation.
The backfill logic now validates Select params against available
options and resets invalid values to the default.
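
The validation rule described above can be sketched as a single function. This only illustrates the logic and is written in Rust for consistency with the rest of the PR. The real backfill works on configs persisted in localStorage and may be structured differently, and `backfill_select` is a hypothetical name.

```rust
/// Keep a saved Select value only if it is still a valid option;
/// otherwise (missing or stale) fall back to the default.
fn backfill_select(saved: Option<&str>, options: &[&str], default: &str) -> String {
    match saved {
        Some(v) if options.contains(&v) => v.to_string(),
        _ => default.to_string(),
    }
}
```

For example, a saved value written before the value/label refactor would no longer match any option and would be reset, rather than silently breaking detector creation.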

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add an info icon next to the RTF value that shows a tooltip with
the frame duration, per-stage timing, and how RTF is computed
(processing time / audio duration). Uses shadcn tooltip component.
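
The RTF formula in the tooltip amounts to the following. This is a minimal sketch, and `real_time_factor` is an illustrative name rather than a function from vad-lab.

```rust
use std::time::Duration;

/// Real-time factor: processing time divided by the duration of the audio
/// processed. Values below 1.0 mean faster than real time.
/// Returns None for zero-length audio, where the ratio is undefined.
fn real_time_factor(processing: Duration, audio: Duration) -> Option<f64> {
    if audio.is_zero() {
        return None;
    }
    Some(processing.as_secs_f64() / audio.as_secs_f64())
}
```

As a made-up example, spending 500 µs on a 10 ms frame gives an RTF of 0.05.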

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Local (Apple Silicon) max diff is ~0.00068 but CI (Linux x86_64) hits
~0.00114 due to different SIMD/FMA paths in FFT computation. Bump
tolerance from 1e-3 to 2e-3 to accommodate both environments.
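
A parity check of this kind can be sketched as a max-absolute-difference comparison against the reference probabilities. The helper names below are hypothetical and the actual test in the crate may be organized differently. Only the 2e-3 tolerance comes from this commit.

```rust
/// Tolerance after this commit (previously 1e-3).
const TOLERANCE: f32 = 2e-3;

/// Largest absolute per-frame difference between two probability sequences.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "sequences must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f32::max)
}

/// True when every frame is within `TOLERANCE` of the reference.
fn matches_reference(rust_probs: &[f32], reference: &[f32]) -> bool {
    max_abs_diff(rust_probs, reference) < TOLERANCE
}
```

A difference of about 0.0011, like the one seen on CI, fails a 1e-3 bound but passes at 2e-3.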

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wavekat-eason merged commit 1afdc5d into main on Mar 25, 2026
9 checks passed
wavekat-eason deleted the feat/firered-vad branch on March 25, 2026 05:55
wavekat-eason pushed a commit that referenced this pull request on Mar 25, 2026
## 🤖 New release

* `wavekat-vad`: 0.1.8 -> 0.1.9 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.1.9](v0.1.8...v0.1.9) - 2026-03-25

### Added

- add FireRedVAD backend ([#38](#38))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>