Currently, when the Feros agent is playing back audio (TTS or Gemini Live output), the user's microphone can pick up background noise (keyboard typing, room noise) or acoustic echo from the speakers. This can cause the local Silero VAD to emit a false-positive SpeechStarted event.
In the Standard Reactor pipeline, this triggers a barge-in that cuts off the bot mid-sentence.
In the Gemini Live (Native Multimodal) pipeline, this triggers a local barge-in that sends a hard backend.interrupt() to Google's servers, abruptly killing Gemini's output stream. This happens because, in this pipeline, the VAD runs on raw audio that has not been denoised.
Solution
Implemented dynamic VAD thresholding (barge-in sensitivity reduction), which automatically lowers VAD sensitivity only while the bot is playing audio.
As a result, the user must speak louder and more deliberately to interrupt the bot, while background noise and minor acoustic echo are filtered out instead of triggering a barge-in.
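A minimal sketch of the idea, assuming the VAD yields a per-frame speech probability (as Silero VAD does) that is compared against a threshold, so "lowering sensitivity" means raising that threshold during playback. The class `DynamicVadGate`, the `set_bot_speaking` hook, and the 0.5 / 0.8 threshold values are hypothetical illustrations, not the actual Feros implementation.

```python
from dataclasses import dataclass


@dataclass
class DynamicVadGate:
    """Hypothetical sketch: gates per-frame VAD speech probabilities with a
    threshold that is raised (less sensitive) while the bot plays audio."""

    idle_threshold: float = 0.5      # assumed value while the bot is silent
    playback_threshold: float = 0.8  # assumed stricter value during bot playback
    bot_speaking: bool = False
    in_speech: bool = False

    def set_bot_speaking(self, speaking: bool) -> None:
        # Hypothetical hook: call when TTS / Gemini Live playback starts or stops.
        self.bot_speaking = speaking

    @property
    def threshold(self) -> float:
        # Pick the active threshold based on whether the bot is talking.
        return self.playback_threshold if self.bot_speaking else self.idle_threshold

    def process(self, speech_prob: float) -> bool:
        """Feed one frame's speech probability (0.0 to 1.0).

        Returns True only on the frame where speech starts, i.e. where a
        SpeechStarted event (and therefore a barge-in) would be emitted.
        """
        is_speech = speech_prob >= self.threshold
        started = is_speech and not self.in_speech
        self.in_speech = is_speech
        return started


if __name__ == "__main__":
    gate = DynamicVadGate()
    gate.set_bot_speaking(True)
    print(gate.process(0.6))  # False: 0.6 < 0.8, so noise/echo is ignored during playback
    print(gate.process(0.9))  # True: loud, deliberate speech still triggers a barge-in
```

This sketch flips speech state on single frames for brevity; a real VAD pipeline would typically also enforce minimum speech and silence durations before emitting SpeechStarted or SpeechEnded.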