Dynamic VAD Thresholding to Prevent False Barge-ins During Playback #51

@jjleng

Description

Currently, while the Feros agent is playing back audio (TTS or Gemini Live output), the user's microphone can pick up background noise (keyboard typing, room noise) or acoustic echo from the speakers. This can cause the local Silero VAD to emit a false-positive SpeechStarted event.

- In the Standard Reactor pipeline, this triggers a barge-in that cuts off the bot mid-sentence.
- In the Gemini Live (Native Multimodal) pipeline, this triggers a local barge-in, which sends a hard backend.interrupt() to Google's servers and abruptly kills Gemini's output stream. This occurs because the VAD runs on raw, undenoised audio in this pipeline.

Solution

Implemented dynamic VAD thresholding (barge-in sensitivity reduction), which automatically lowers VAD sensitivity (that is, raises the speech-detection threshold) only while the bot is playing audio.

As a result, the user must speak louder and more deliberately to interrupt the bot during playback, which filters out background noise and minor acoustic echo.
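
For illustration only, the idea can be sketched as a small gate that sits between the VAD's per-frame speech probability and the SpeechStarted decision. This is not the Feros implementation: the class name `DynamicVadGate`, the threshold values (0.5 and 0.85), and the three-frame minimum are all assumptions chosen for the example, and the `bot_speaking` flag is assumed to be supplied by whatever component tracks playback.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicVadGate:
    """Hypothetical gate: per-frame speech probability -> SpeechStarted decision."""

    idle_threshold: float = 0.5       # assumed threshold while the bot is silent
    playback_threshold: float = 0.85  # assumed stricter threshold during playback
    min_speech_frames: int = 3        # assumed consecutive frames needed to fire
    _run: int = field(default=0, init=False, repr=False)

    def threshold(self, bot_speaking: bool) -> float:
        # Raise the bar only while the bot is broadcasting audio.
        return self.playback_threshold if bot_speaking else self.idle_threshold

    def update(self, speech_prob: float, bot_speaking: bool) -> bool:
        """Return True on the frame where speech is considered started."""
        if speech_prob >= self.threshold(bot_speaking):
            self._run += 1
        else:
            self._run = 0  # any frame below the threshold resets the streak
        return self._run == self.min_speech_frames


# Moderate-probability frames, e.g. keyboard noise or faint echo.
frames = [0.6, 0.7, 0.65, 0.6]

gate = DynamicVadGate()
print(any(gate.update(p, bot_speaking=True) for p in frames))   # False: below 0.85
print(any(gate.update(p, bot_speaking=False) for p in frames))  # True: above 0.5
```

With these assumed values, the same moderate-probability frames start a turn when the bot is silent but are ignored during playback, so only sustained, high-confidence speech would reach the barge-in path (and, in the Gemini Live pipeline, the backend.interrupt() call).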

Labels

enhancement: New feature or request