Merged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement FireRedVAD as a new VAD backend behind the `firered` feature flag. Includes pure Rust FBank feature extraction (matching kaldi_native_fbank), Kaldi binary CMVN parser, and streaming ONNX inference with DFSMN cache management. Validated against Python reference pipeline with max probability diff of 0.000012 across 98 frames, and against upstream fireredvad pip package on real speech audio with max diff of 0.0005 across 1150 frames. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
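The CMVN step mentioned above can be sketched as follows. This is a minimal illustration, not the crate's code: `CmvnStats`, `mean_and_inv_std`, and `apply_cmvn` are hypothetical names, parsing of the Kaldi binary file is omitted, and it assumes plain per-dimension mean/variance normalization from accumulated statistics (Kaldi stores these as a 2 x (dim+1) matrix: sums plus frame count in the first row, sums of squares in the second).

```rust
/// Hypothetical container for accumulated CMVN statistics
/// (not the crate's actual type).
pub struct CmvnStats {
    /// Number of frames the statistics were accumulated over.
    pub count: f64,
    /// Per-dimension sum of feature values.
    pub sum: Vec<f64>,
    /// Per-dimension sum of squared feature values.
    pub sum_sq: Vec<f64>,
}

impl CmvnStats {
    /// Returns per-dimension (mean, 1/stddev), computed once up front
    /// so the per-frame work is a subtract and a multiply.
    pub fn mean_and_inv_std(&self) -> (Vec<f32>, Vec<f32>) {
        let mut means = Vec::with_capacity(self.sum.len());
        let mut inv_stds = Vec::with_capacity(self.sum.len());
        for (s, sq) in self.sum.iter().zip(&self.sum_sq) {
            let mean = s / self.count;
            // Floor the variance to avoid dividing by zero.
            let var = (sq / self.count - mean * mean).max(1e-20);
            means.push(mean as f32);
            inv_stds.push((1.0 / var.sqrt()) as f32);
        }
        (means, inv_stds)
    }
}

/// Normalizes one feature frame in place: x <- (x - mean) / stddev.
pub fn apply_cmvn(frame: &mut [f32], means: &[f32], inv_stds: &[f32]) {
    for ((x, m), s) in frame.iter_mut().zip(means).zip(inv_stds) {
        *x = (*x - m) * s;
    }
}
```

Precomputing the inverse standard deviation keeps the streaming path free of divisions and square roots; whether the real backend does this is an assumption.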
Wire the firered feature into accuracy.rs, vad_comparison.rs bench, Makefile targets, and all GitHub Actions workflows so FireRedVAD is tested alongside the other backends.

Baseline: P=0.950 R=0.879 F1=0.913

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ref_upstream_probs.json from official fireredvad pip package
- Add probabilities_match_upstream_fireredvad test comparing Rust output directly against upstream PyTorch probabilities
- Use TensorRef::from_array_view for zero-copy tensor creation (eliminates 19,456-element cache clone per frame)
- Remove dead extract_frame(&[i16]) method, use f32 path directly
- Add --save-upstream flag to validate_upstream.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `ProcessTimings` to the `VoiceActivityDetector` trait so each backend reports a named breakdown of where time is spent (e.g. fbank → cmvn → onnx). Instrument all four backends (WebRTC, Silero, TEN-VAD, FireRedVAD) and surface the data in:
- accuracy tests: per-stage µs/frame in both inline output and the markdown report
- vad-lab: stage deltas piped through the pipeline → WebSocket → React UI, displayed next to RTF in each timeline header

Also integrates FireRedVAD into vad-lab (backend creation, available backends list, 16 kHz resampling gate).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
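For readers who have not opened the diff, a per-stage timing accumulator of this kind might look roughly like the sketch below. The field layout and the `record`, `time`, `total`, and `micros_per_frame` methods are assumptions made for illustration; only the `ProcessTimings` name and the stage names come from the commit message.

```rust
use std::time::{Duration, Instant};

/// Hypothetical shape for a named per-stage timing breakdown.
/// The real `ProcessTimings` in the crate may differ.
#[derive(Debug, Default, Clone)]
pub struct ProcessTimings {
    /// Stages in first-seen order, e.g. fbank, cmvn, onnx.
    pub stages: Vec<(&'static str, Duration)>,
}

impl ProcessTimings {
    /// Adds `elapsed` to the named stage, creating it on first use.
    pub fn record(&mut self, name: &'static str, elapsed: Duration) {
        if let Some(entry) = self.stages.iter_mut().find(|entry| entry.0 == name) {
            entry.1 += elapsed;
        } else {
            self.stages.push((name, elapsed));
        }
    }

    /// Runs `f`, charges its wall-clock time to `name`, and returns its result.
    pub fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.record(name, start.elapsed());
        out
    }

    /// Total time across all stages.
    pub fn total(&self) -> Duration {
        self.stages.iter().map(|(_, d)| *d).sum()
    }

    /// Average microseconds per frame for each stage.
    /// A frame count of zero is treated as one to avoid dividing by zero.
    pub fn micros_per_frame(&self, frames: u64) -> Vec<(&'static str, f64)> {
        let n = frames.max(1) as f64;
        self.stages
            .iter()
            .map(|(name, d)| (*name, d.as_secs_f64() * 1e6 / n))
            .collect()
    }
}
```

A backend could wrap each step as `timings.time("fbank", || ...)` and report stage deltas by diffing two snapshots; that wiring is likewise an assumption here.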
Split label/config and RTF/stage timings into two-line layouts for better vertical alignment, add border to VAD canvases matching waveform/spectrogram style, and include FireRedVAD as a default config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SelectOption now has distinct value (machine identifier) and label (display string) instead of encoding both in a single string. This removes the prefix-stripping hack in mode parsing and sets the default WebRTC VAD mode to very_aggressive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Select option values change (e.g. the value/label refactor), configs persisted in localStorage may hold values that no longer match any valid option, silently breaking detector creation. The backfill logic now validates Select params against available options and resets invalid values to the default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
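The validation rule described above can be sketched in a few lines. This is an illustration of the logic only: `SelectParam` and `backfill_select` are hypothetical names, the option labels are invented, and the real check may well live in the frontend rather than in Rust.

```rust
/// An option with a machine identifier and a separate display string.
#[derive(Debug, Clone, PartialEq)]
pub struct SelectOption {
    pub value: String,
    pub label: String,
}

/// Hypothetical description of a Select parameter.
pub struct SelectParam {
    pub options: Vec<SelectOption>,
    /// Must be the `value` of one of `options`.
    pub default: String,
}

/// Returns the stored value if it still names a valid option,
/// otherwise falls back to the parameter's default.
pub fn backfill_select(stored: Option<&str>, param: &SelectParam) -> String {
    match stored {
        Some(v) if param.options.iter().any(|o| o.value == v) => v.to_string(),
        // Missing or stale value (e.g. persisted before a rename).
        _ => param.default.clone(),
    }
}
```

With this shape, a persisted value written before the value/label split no longer matches any `value` and is reset to the default instead of reaching detector creation.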
Add an info icon next to the RTF value that shows a tooltip with the frame duration, per-stage timing, and how RTF is computed (processing time / audio duration). Uses shadcn tooltip component. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
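As a worked example of the formula in the tooltip (processing time / audio duration), the sketch below computes RTF for a single frame. The function names are hypothetical, and the 160-sample frame at 16 kHz used in the usage note is purely illustrative, not a value taken from this PR.

```rust
use std::time::Duration;

/// Audio duration covered by one frame of `frame_samples` samples.
pub fn frame_duration(frame_samples: usize, sample_rate_hz: u32) -> Duration {
    Duration::from_secs_f64(frame_samples as f64 / sample_rate_hz as f64)
}

/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean the detector runs faster than real time.
pub fn real_time_factor(processing: Duration, audio: Duration) -> f64 {
    processing.as_secs_f64() / audio.as_secs_f64()
}
```

For instance, a 160-sample frame at 16 kHz covers 10 ms of audio, so spending 1 ms on it gives an RTF of about 0.1.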
Local (Apple Silicon) max diff is ~0.00068 but CI (Linux x86_64) hits ~0.00114 due to different SIMD/FMA paths in FFT computation. Bump tolerance from 1e-3 to 2e-3 to accommodate both environments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
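The kind of check this tolerance applies to can be sketched as a maximum absolute difference between two probability sequences. `max_abs_diff` is a hypothetical helper, not necessarily what the test in this PR uses.

```rust
/// Largest absolute element-wise difference between two equal-length slices.
pub fn max_abs_diff(actual: &[f32], expected: &[f32]) -> f32 {
    assert_eq!(actual.len(), expected.len(), "length mismatch");
    actual
        .iter()
        .zip(expected)
        .map(|(a, e)| (a - e).abs())
        .fold(0.0_f32, f32::max)
}
```

A test would then assert `max_abs_diff(&rust_probs, &upstream_probs) < 2e-3`, which accepts both the ~0.00068 and ~0.00114 figures quoted above while the old 1e-3 bound rejects the latter.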
wavekat-eason pushed a commit that referenced this pull request Mar 25, 2026
## 🤖 New release

* `wavekat-vad`: 0.1.8 -> 0.1.9 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.1.9](v0.1.8...v0.1.9) - 2026-03-25

### Added

- add FireRedVAD backend ([#38](#38))
</blockquote>

</p></details>

---
This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary
Details
Library crate (`wavekat-vad`):
- `firered` feature flag with `backends::firered::FireRedVad` implementation
- `build.rs`

vad-lab tool:

Docs:

Test plan
- `cargo test --workspace` passes
- `cargo test --features firered`: FireRedVAD unit tests and upstream validation

🤖 Generated with Claude Code