-
Notifications
You must be signed in to change notification settings - Fork 367
am finalize message #1378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
am finalize message #1378
Conversation
📝 WalkthroughWalkthroughExposes VAD types via public re-exports. Adds control message channeling and VAD-driven finalize signaling in the listener FSM, multiplexing control with audio. Updates transcript manager to persist and deduplicate final words across calls, emitting only newly finalized words per append while pruning partials accordingly. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Mic
actor Speaker
participant FSM as Listener FSM
participant VAD as VadSession
participant Ctrl as control_tx/control_rx
participant Stream as Mixed stream (audio + control)
participant Listener as Downstream Listener
Mic->>FSM: mic chunk
Speaker->>FSM: speaker chunk
FSM->>FSM: mix(mic, speaker) -> mixed
FSM->>VAD: process(mixed)
alt SpeechEnd detected
VAD-->>FSM: Transition::SpeechEnd
FSM->>Ctrl: send ControlMessage::Finalize
end
Ctrl-->>FSM: ControlMessage
FSM->>Stream: select(audio_stream, control_stream)
Stream->>Listener: MixedMessage::{Audio|Control}
note over Listener: Receives audio frames and finalize controls multiplexed
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45–60 minutes Possibly related PRs
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues found across 3 files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/listener/src/fsm.rs (1)
492-496
: Server-side Control messages aren’t handledAlthough
ListenClient::from_realtime_audio
correctly serializesMixedMessage::Control
(and the interface defines theControl
variant) and your listener plugin merges audio + control streams, the server code never matches on or forwards incoming control frames. Both in
owhisper-server/src/server.rs
(around line 426)owhisper-server/src/commands/run/realtime.rs
(around line 167)you only build the input stream as audio chunks:
- let input = audio.map(|chunk| owhisper_interface::MixedMessage::Audio(chunk)); + let control_stream = control_rx.into_stream().map(|ctrl| owhisper_interface::MixedMessage::Control(ctrl)); + let combined_input = futures_util::stream::select( + audio.map(|chunk| owhisper_interface::MixedMessage::Audio(chunk)), + control_stream, + ); + let input = combined_input;Please update the server’s real-time input pipelines to handle
MixedMessage::Control
(e.g.Finalize
,KeepAlive
,CloseStream
) so that control frames are respected end-to-end.
🧹 Nitpick comments (4)
plugins/listener/src/fsm.rs (3)
86-87
: Avoid magic number for control channel capacityLift the
8
into a named constant to document intent and ease tuning.fn new() -> Self { - const CHUNK_BUFFER_SIZE: usize = 64; + const CHUNK_BUFFER_SIZE: usize = 64; + const CONTROL_BUFFER_SIZE: usize = 8; @@ - let (control_tx, control_rx) = flume::bounded::<owhisper_interface::ControlMessage>(8); + let (control_tx, control_rx) = + flume::bounded::<owhisper_interface::ControlMessage>(CONTROL_BUFFER_SIZE);
342-347
: Mixed computation duplicated later — DRY and minor perf winYou recompute the same
mixed
vector again for recording. Reuse the earlier result.- if record { - let mixed: Vec<f32> = mic_chunk - .iter() - .zip(speaker_chunk.iter()) - .map(|(mic, speaker)| (mic + speaker).clamp(-1.0, 1.0)) - .collect(); - if save_mixed_tx.send_async(mixed).await.is_err() { + if record { + if save_mixed_tx.send_async(mixed.clone()).await.is_err() { tracing::error!("save_mixed_tx_send_error"); } }
348-356
: Finalize on SpeechEnd — validate semantics and backpressure
- Triggering Finalize on any
SpeechEnd
from the mixed stream means speaker-only silence may finalize mic hypotheses too. If the intent is “phrase boundary regardless of source,” this is fine; otherwise consider gating on mic-dominant segments.send_async
on a bounded(8) channel can block this hot loop under receiver backpressure. If Finalize events can ever burst, considertry_send
with best-effort semantics or raising the bound.Would you like a small debounce/in-speech guard to ensure a single Finalize per contiguous speech segment?
plugins/listener/src/manager.rs (1)
98-128
: Finalization diff logic — solid; consider epsilon constant and stricter partial pruningThe start/end-based dedup is clear and effective. Two small improvements:
- Lift
0.001
into a named epsilon to document units (seconds) and ease tuning.- When pruning partials after new finals, consider an epsilon-inclusive cutoff to avoid edge residue on exact boundaries.
+const TIME_EPSILON: f32 = 0.001; // seconds; ~1 ms tolerance for alignment @@ - let new_final_words: Vec<owhisper_interface::Word> = words + let new_final_words: Vec<owhisper_interface::Word> = words .into_iter() .filter(|new_word| { !self.final_words.iter().any(|existing_word| { - (existing_word.start - new_word.start).abs() < 0.001 - && (existing_word.end - new_word.end).abs() < 0.001 + (existing_word.start - new_word.start).abs() < TIME_EPSILON + && (existing_word.end - new_word.end).abs() < TIME_EPSILON }) }) .collect(); @@ - if let Some(last_final_word) = new_final_words.last() { + if let Some(last_final_word) = new_final_words.last() { self.partial_words = self .partial_words .iter() - .filter(|w| w.start > last_final_word.end) + .filter(|w| w.start >= last_final_word.end - TIME_EPSILON) .cloned() .collect::<Vec<_>>(); }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
crates/vad/src/continuous.rs
(1 hunks)plugins/listener/src/fsm.rs
(6 hunks)plugins/listener/src/manager.rs
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit configuration file
**/*.{js,ts,tsx,rs}
: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
crates/vad/src/continuous.rs
plugins/listener/src/fsm.rs
plugins/listener/src/manager.rs
🔇 Additional comments (11)
crates/vad/src/continuous.rs (1)
10-10
: Public re-export of VAD types looks goodRe-exporting
VadConfig
,VadSession
, andVadTransition
from here simplifies downstream imports and aligns with their usage elsewhere in the PR. No runtime impact and no unused items introduced.plugins/listener/src/fsm.rs (7)
71-73
: Add control channel to AudioChannels — appropriate extensionIntroducing a dedicated control channel for out-of-band signals (e.g., Finalize) is a clean separation from the audio paths. Fields are consumed later, so no unused members.
117-119
: Initialize new control fields — consistentStruct initialization updated correctly; no omissions.
302-303
: Clone control_tx into the processing task — good scope captureThis is the right place to source control messages from the audio loop. No leaks and no unused clones.
474-479
: Audio stream mapping into MixedMessage::Audio — goodZipping mic/speaker and mapping into the transport type is clear and minimal.
480-484
: Control stream mapping — goodSimple lift into
MixedMessage::Control
. No extraneous allocations.
485-485
: Stream select to multiplex audio + control — appropriate
select
provides fair interleaving so control messages don’t starve. Good choice for simplicity.
310-316
: Drop thesample_rate
pinning suggestion—VadConfig
has no such fieldI searched the entire workspace (including
crates/vad
) and found nostruct VadConfig
definition orsample_rate
field available for configuration—so addingsample_rate
would not compile. You can safely ignore the optional refactor to pinsample_rate
.Likely an incorrect or invalid review comment.
plugins/listener/src/manager.rs (3)
5-6
: Persisted final_words state — enables cross-call dedupKeeping accumulated finals here aligns with the PR goal (emit only deltas). No API surface changes.
13-14
: Default impl updated for final_words — OKProperly initialized to empty; no unintended allocations.
126-128
: Return only newly observed finals — matches PR intent
Diff.final_words
carries deltas;self.final_words
retains the history. Good separation.
Summary by cubic
Auto-finalizes messages at end-of-speech and fixes duplicate final words in transcripts. Adds a control channel for finalize events and re-exports VAD types for plugin use.
New Features
Bug Fixes