Dynamic VAD Thresholding to Prevent False Barge-ins During Playback #51


Currently, while the Feros agent is playing back audio (TTS or Gemini Live output), the user's microphone can pick up background noise (keyboard typing, room noise) or acoustic echo from the speakers. Either can cause the local Silero VAD to emit a false-positive SpeechStarted event.

In the Standard Reactor pipeline, the false event triggers a barge-in that cuts off the bot mid-sentence.
In the Gemini Live (Native Multimodal) pipeline, the false event triggers a local barge-in that sends a hard backend.interrupt() to Google's servers, abruptly killing Gemini's output stream. This occurs because the VAD in this pipeline runs on raw, undenoised audio.

Solution
Implemented dynamic VAD thresholding (barge-in sensitivity reduction), which automatically lowers VAD sensitivity only while the bot is broadcasting audio and restores normal sensitivity once playback ends.

As a result, the user must speak louder and more deliberately to interrupt the bot during playback, which filters out background noise and minor acoustic echo.
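
The idea can be sketched roughly as follows. This is a minimal illustration, not the Feros implementation: the class name `DynamicVadGate`, its methods, and both threshold values are hypothetical. It assumes the VAD yields a per-frame speech probability between 0 and 1, and that lowering sensitivity means requiring a higher probability before a frame counts as speech.

```python
from dataclasses import dataclass


@dataclass
class DynamicVadGate:
    """Hypothetical gate that picks a VAD threshold from playback state."""

    idle_threshold: float = 0.5      # illustrative value, not taken from the issue
    playback_threshold: float = 0.8  # illustrative value, not taken from the issue
    bot_is_playing: bool = False

    def set_playback(self, playing: bool) -> None:
        # Called when bot audio output starts (True) or stops (False).
        self.bot_is_playing = playing

    @property
    def threshold(self) -> float:
        # Stricter threshold only while the bot is broadcasting audio.
        if self.bot_is_playing:
            return self.playback_threshold
        return self.idle_threshold

    def is_speech(self, speech_prob: float) -> bool:
        # A frame counts as speech only if it clears the active threshold.
        return speech_prob >= self.threshold


gate = DynamicVadGate()
print(gate.is_speech(0.6))  # True: bot is silent, normal sensitivity

gate.set_playback(True)
print(gate.is_speech(0.6))  # False: same frame is ignored during playback
print(gate.is_speech(0.9))  # True: loud, deliberate speech still barges in

gate.set_playback(False)
print(gate.is_speech(0.6))  # True: normal sensitivity restored
```

In this sketch a frame with probability 0.6 would start a barge-in while the bot is silent but is ignored during playback, whereas a frame at 0.9 interrupts in both states. Real threshold values would need tuning against the actual noise and echo levels.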


