feat: accuracy cross-validation + fix reflect padding by wavekat-eason · Pull Request #11 · wavekat/wavekat-turn

wavekat-eason · 2026-03-29T00:54:52Z

Summary

Add end-to-end cross-validation of the Rust pipeline against the Python (Pipecat) reference
Fix a silent preprocessing bug: STFT was using zero-padding instead of reflect-padding

What's in this PR

Cross-validation infrastructure

scripts/gen_reference.py — generates reference.json and *.mel.npy fixtures from the Python pipeline
tests/fixtures/ — committed WAV clips, reference.json, and mel tensors
tests/accuracy.rs — regression tests asserting Rust probability is within ±0.02 of Python
make accuracy — prints a per-backend markdown table of probability diffs
make mel — compares [80×800] mel tensors element-wise, reporting max/mean diff and outlier locations
.github/workflows/turn-accuracy.yml — manual workflow that runs the report and opens a PR to update the README table
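
For readers unfamiliar with this kind of check, the ±0.02 regression assertion and the markdown report could look roughly like the sketch below. The `ClipResult` type, its field names, and the table columns are hypothetical and are not taken from tests/accuracy.rs; only the 0.02 tolerance comes from this PR.

```rust
/// Tolerance quoted in the PR description: Rust must be within ±0.02 of Python.
const TOLERANCE: f32 = 0.02;

/// One report row. Hypothetical shape, not the crate's actual type.
struct ClipResult {
    name: String,
    python_prob: f32,
    rust_prob: f32,
}

impl ClipResult {
    /// Absolute difference between the two pipelines' probabilities.
    fn diff(&self) -> f32 {
        (self.rust_prob - self.python_prob).abs()
    }

    /// True when the Rust probability is within the tolerance.
    fn passes(&self) -> bool {
        self.diff() <= TOLERANCE
    }
}

/// Render the rows as a markdown table (column layout is an assumption).
fn markdown_table(rows: &[ClipResult]) -> String {
    let mut out = String::from("| Clip | Python | Rust | Diff | Status |\n");
    out.push_str("|---|---|---|---|---|\n");
    for r in rows {
        let status = if r.passes() { "PASS" } else { "FAIL" };
        out.push_str(&format!(
            "| {} | {:.4} | {:.4} | {:.4} | {} |\n",
            r.name,
            r.python_prob,
            r.rust_prob,
            r.diff(),
            status
        ));
    }
    out
}
```

A regression test would then build one `ClipResult` per fixture clip and assert `passes()` on each.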

Bug fix: reflect padding
STFT center-padding was using zeros; WhisperFeatureExtractor uses np.pad(mode="reflect"). The fix drops mel max diff from 0.087 → 0.00001 (noise floor only) and probability diffs from 0.0017/0.0138 → 0.0000/0.0051.
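
To make the difference concrete, here is a minimal sketch of the two padding modes on a plain slice. It is not the code changed in this PR; the function names are illustrative, and the sketch assumes the pad width is smaller than the signal length. Reflect mode mirrors the signal about its edge samples without repeating them, which is what `np.pad(mode="reflect")` does.

```rust
/// Pad `signal` by `pad` samples on each side by mirroring about the edge
/// samples (the edge sample itself is not repeated), as NumPy's
/// `np.pad(mode="reflect")` does. This sketch requires `pad < signal.len()`.
fn reflect_pad(signal: &[f32], pad: usize) -> Vec<f32> {
    assert!(pad < signal.len(), "reflect padding needs pad < signal length");
    let n = signal.len();
    let mut out = Vec::with_capacity(n + 2 * pad);
    // Left side: signal[pad], signal[pad - 1], ..., signal[1]
    out.extend((1..=pad).rev().map(|i| signal[i]));
    out.extend_from_slice(signal);
    // Right side: signal[n - 2], signal[n - 3], ..., signal[n - 1 - pad]
    out.extend((1..=pad).map(|i| signal[n - 1 - i]));
    out
}

/// Zero padding, the behaviour this PR describes as the bug.
fn zero_pad(signal: &[f32], pad: usize) -> Vec<f32> {
    let mut out = vec![0.0; pad];
    out.extend_from_slice(signal);
    out.extend(std::iter::repeat(0.0).take(pad));
    out
}
```

For the input `[1, 2, 3, 4, 5]` with `pad = 2`, reflect mode yields `[3, 2, 1, 2, 3, 4, 5, 4, 3]` while zero mode yields `[0, 0, 1, 2, 3, 4, 5, 0, 0]`. Only the frames that overlap the padded edges see different samples, which is consistent with the mel differences being localized rather than spread across the whole tensor.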

Test plan

cargo test --features pipecat — all tests pass
make accuracy — all clips PASS within ±0.02
make mel — max diff ≤ 0.00002 for all clips
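
The element-wise comparison behind the max diff figure can be sketched as follows. This assumes both mel tensors have already been flattened to `f32` slices of equal length; the `DiffStats` struct and the `outlier_threshold` parameter are hypothetical and do not come from the mel_report test itself.

```rust
/// Summary of an element-wise comparison (hypothetical shape).
struct DiffStats {
    max: f32,
    mean: f32,
    /// Fraction of elements whose absolute diff exceeds the threshold.
    outlier_fraction: f32,
}

/// Compare two flattened tensors element by element.
/// `outlier_threshold` is an illustrative parameter, not a value from the PR.
fn diff_stats(a: &[f32], b: &[f32], outlier_threshold: f32) -> DiffStats {
    assert_eq!(a.len(), b.len(), "tensors must have the same number of elements");
    assert!(!a.is_empty(), "tensors must not be empty");

    let mut max = 0.0f32;
    let mut sum = 0.0f64; // accumulate in f64 to limit rounding error
    let mut outliers = 0usize;

    for (x, y) in a.iter().zip(b) {
        let d = (x - y).abs();
        max = max.max(d);
        sum += d as f64;
        if d > outlier_threshold {
            outliers += 1;
        }
    }

    let n = a.len() as f64;
    DiffStats {
        max,
        mean: (sum / n) as f32,
        outlier_fraction: (outliers as f64 / n) as f32,
    }
}
```

A check in the spirit of the test plan would then assert that `max` stays at or below 0.00002 for every clip.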

🤖 Generated with Claude Code

- Record three fixture WAV clips (silence, finished, mid-utterance)
- Add scripts/gen_reference.py to generate Python-side probabilities
- Commit tests/fixtures/reference.json as the baseline
- Add tests/accuracy.rs with per-clip regression tests and a multi-backend accuracy_report table (make accuracy)
- Add .github/workflows/turn-accuracy.yml to auto-update README table
- Add Accuracy section to README with benchmark-table placeholders

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix STFT center-padding: was using zeros, now uses reflect mode to match WhisperFeatureExtractor (np.pad mode="reflect"). This drops mel max diff from 0.087 → 0.00001 (FFT noise floor) and probability diffs from [0.0017, 0.0138] → [0.0000, 0.0051]
- Save per-clip mel tensors as .npy in gen_reference.py
- Add mel_report unit test (make mel) comparing [80×800] tensors element-wise with max/mean diff and outlier fraction
- Fix i16→f32 normalization: 32767 → 32768 to match soundfile
- Add ndarray-npy dev dependency for .npy loading in tests
- Add make mel target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
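
The normalization change in this commit is small but easy to get wrong, so a sketch may help. The function names below are illustrative and are not the crate's API. Dividing by 32768 maps the full `i16` range into [-1.0, 1.0), whereas dividing by 32767 sends `i16::MIN` slightly below -1.0 and scales every other sample by a marginally different factor than a 32768-based reference.

```rust
/// Convert 16-bit PCM to f32 using the 32768 divisor adopted in this commit.
fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

/// The previous divisor, kept here only to show the discrepancy.
fn i16_to_f32_old(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32767.0).collect()
}
```

With the new divisor `i16::MIN` maps to exactly -1.0 and `i16::MAX` to 32767/32768; with the old divisor `i16::MIN` maps to a value just below -1.0.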

Mel tensors are 768 KB of opaque binary that can't be diffed. make mel is a dev diagnostic tool; the .npy files are regenerated locally by scripts/gen_reference.py alongside reference.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Gate fixtures_dir, RefEntry, load_reference under #[cfg(any(feature = "pipecat"))] to avoid dead-code warnings in no-feature builds
- Move reference_prob and load_wav_f32 into pipecat mod (only used there)
- Suppress unused_mut on the rows accumulator in accuracy_report
- Run cargo fmt

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wavekat-eason and others added 5 commits March 29, 2026 13:43

Merge remote-tracking branch 'origin/main' into feat/accuracy-cross-v…

c9d3641

…alidation

wavekat-eason merged commit cf2e498 into main Mar 29, 2026
5 checks passed

wavekat-eason deleted the feat/accuracy-cross-validation branch March 29, 2026 02:47

wavekat-eason merged 5 commits into main from feat/accuracy-cross-validation