Core architecture rework for agent framework by DeepBhupatkar · Pull Request #169 · videosdk-live/agents

DeepBhupatkar · 2026-01-15T05:28:12Z

Overview

This PR focuses on reworking the agent framework core architecture and improving modularization.

1. refactor(room): modularize room responsibilities and stream handling

Room logic previously handled multiple responsibilities, including connection lifecycle, SIP participant management, recording orchestration, and input stream handling.

This change modularizes the room implementation by introducing dedicated managers:

InputStreamManager: handles incoming participant audio/video streams
SIPManager: manages SIP operations, call info fetching, and transfers
RecordingManager: orchestrates participant-level recording and merging

Additionally, output-side custom audio track implementations have been moved from audio_stream.py to output_stream.py to clearly separate input and output stream responsibilities.
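To illustrate the split described above, here is a minimal sketch of a room that delegates to the three managers. Only the manager class names come from this PR; every method, attribute, and the `Room` dataclass layout are hypothetical and are not taken from the actual implementation.

```python
from dataclasses import dataclass, field


class InputStreamManager:
    """Tracks incoming participant streams (hypothetical interface)."""

    def __init__(self) -> None:
        self.streams: dict[str, list[str]] = {}

    def add_stream(self, participant_id: str, kind: str) -> None:
        self.streams.setdefault(participant_id, []).append(kind)


class SIPManager:
    """Records SIP transfers (hypothetical interface)."""

    def __init__(self) -> None:
        self.transfers: list[tuple[str, str]] = []

    def transfer(self, participant_id: str, target: str) -> None:
        self.transfers.append((participant_id, target))


class RecordingManager:
    """Tracks which participants are being recorded (hypothetical interface)."""

    def __init__(self) -> None:
        self.recording: set[str] = set()

    def start(self, participant_id: str) -> None:
        self.recording.add(participant_id)


@dataclass
class Room:
    # The room keeps the connection lifecycle and delegates everything else.
    input_streams: InputStreamManager = field(default_factory=InputStreamManager)
    sip: SIPManager = field(default_factory=SIPManager)
    recordings: RecordingManager = field(default_factory=RecordingManager)

    def on_participant_joined(self, participant_id: str) -> None:
        # Each concern is handled by its own manager.
        self.input_streams.add_stream(participant_id, "audio")
        self.recordings.start(participant_id)
```

With this shape, a change to SIP transfer behaviour touches only `SIPManager`, which is the kind of isolation the refactor is aiming for.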

- Refactored `pipeline.py` - Single Pipeline class for all configurations

Add Core Modules:
- `speech_understanding.py` - VAD, STT, Turn Detection
- `content_generation.py` - LLM processing, tool calling, KB integration
- `speech_generation.py` - TTS synthesis and audio playback
- `pipeline_orchestrator.py` - Component orchestration and event routing
- `realtime_llm_adapter.py` - Realtime model adapter

- Removed `ConversationFlow` - functionality absorbed into PipelineOrchestrator
- Removed `CascadingPipeline` and `RealTimePipeline` - replaced by unified Pipeline
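The module layout above suggests an orchestrator that routes one user turn through three stages. The following is a rough sketch under that assumption. The class names mirror the module names in this PR, but every method name and the stand-in logic inside each stage are hypothetical.

```python
import asyncio


class SpeechUnderstanding:
    async def transcribe(self, audio: bytes) -> str:
        # Stand-in for VAD + STT + turn detection.
        return audio.decode("utf-8")


class ContentGeneration:
    async def generate(self, transcript: str) -> str:
        # Stand-in for LLM processing and tool calling.
        return f"You said: {transcript}"


class SpeechGeneration:
    async def synthesize(self, text: str) -> bytes:
        # Stand-in for TTS synthesis.
        return text.encode("utf-8")


class PipelineOrchestrator:
    """Routes a single turn through the three stages in order."""

    def __init__(
        self,
        understanding: SpeechUnderstanding,
        generation: ContentGeneration,
        speech: SpeechGeneration,
    ) -> None:
        self.understanding = understanding
        self.generation = generation
        self.speech = speech

    async def handle_turn(self, audio: bytes) -> bytes:
        transcript = await self.understanding.transcribe(audio)
        reply = await self.generation.generate(transcript)
        return await self.speech.synthesize(reply)


if __name__ == "__main__":
    orchestrator = PipelineOrchestrator(
        SpeechUnderstanding(), ContentGeneration(), SpeechGeneration()
    )
    print(asyncio.run(orchestrator.handle_turn(b"hello")))
```

Because each stage sits behind its own object, a single orchestrator can cover configurations that previously needed separate pipeline classes.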

Implement decorator-based hooks (@pipeline.on("event_name")) for intercepting and modifying pipeline data at key stages:
- Audio streaming hooks (speech_in, speech_out) for real-time audio processing
- Vision hook (vision_frame) for video frame processing
- STT hook for cleaning, normalizing, redacting, or enriching the transcript
- LLM control (llm hook with yield-based bypass, agent_response for output)
- Lifecycle hooks (user_turn_start/end, agent_turn_start/end)
- Rename RealtimeLLMWrapper -> RealtimeLLMAdapter
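As a sketch of how a decorator-based hook registry of this kind can work, the example below implements a small stand-alone `HookRegistry` with an `on(event)` decorator. It is not the framework's `Pipeline` class: `HookRegistry`, its `run` method, and the rule that a hook returning `None` leaves the payload unchanged are all assumptions made for illustration. Only the `@pipeline.on("...")` decorator form and the event names `stt` and `agent_response` come from this PR.

```python
import asyncio
from typing import Any, Awaitable, Callable

Hook = Callable[[Any], Awaitable[Any]]


class HookRegistry:
    """Hypothetical stand-in for the hook side of a pipeline."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = {}

    def on(self, event: str) -> Callable[[Hook], Hook]:
        def decorator(fn: Hook) -> Hook:
            self._hooks.setdefault(event, []).append(fn)
            return fn

        return decorator

    async def run(self, event: str, payload: Any) -> Any:
        # Apply hooks in registration order; each may replace the payload.
        for hook in self._hooks.get(event, []):
            result = await hook(payload)
            if result is not None:
                payload = result
        return payload


pipeline = HookRegistry()


@pipeline.on("stt")
async def redact_digits(transcript: str) -> str:
    # Example of transcript redaction before the text reaches the LLM.
    return "".join("#" if ch.isdigit() else ch for ch in transcript)


@pipeline.on("agent_response")
async def trim_response(text: str) -> str:
    return text.strip()
```

Running `asyncio.run(pipeline.run("stt", "my pin is 1234"))` passes the transcript through `redact_digits`; events with no registered hooks return the payload untouched.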

- Allows using an external STT provider and Knowledge Base before passing text to a Realtime model for LLM+TTS.
- Allows using a Realtime model for STT+LLM while intercepting text to use an external TTS provider.
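One way to reason about these hybrid setups is as a per-stage assignment: an external provider takes a stage if one is configured, and the Realtime model covers whatever is left. The helper below sketches that rule. The function `assign_stages`, its parameters, and the selection rule itself are assumptions for illustration; the PR does not describe how the framework resolves components internally.

```python
from typing import Optional


def assign_stages(
    realtime: Optional[str] = None,
    stt: Optional[str] = None,
    llm: Optional[str] = None,
    tts: Optional[str] = None,
) -> dict[str, str]:
    """Return which provider handles each stage (hypothetical rule)."""
    external = {"stt": stt, "llm": llm, "tts": tts}
    assignment: dict[str, str] = {}
    for stage, provider in external.items():
        if provider is not None:
            # An explicitly configured provider wins for its stage.
            assignment[stage] = provider
        elif realtime is not None:
            # Otherwise the realtime model covers the stage.
            assignment[stage] = realtime
        else:
            raise ValueError(f"no provider configured for stage '{stage}'")
    return assignment


# External STT feeding a realtime model for LLM+TTS.
print(assign_stages(realtime="realtime-model", stt="external-stt"))
# Realtime model for STT+LLM with an external TTS provider.
print(assign_stages(realtime="realtime-model", tts="external-tts"))
```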

Introduce common stream hooks: @pipeline.on("stt") for audio → transcript events and @pipeline.on("tts") for text → audio events. This enables unified pre- and post-processing in a single location.

DeepBhupatkar and others added 25 commits January 15, 2026 10:51

Merge remote-tracking branch 'origin/main' into rework/agent-framework-core

63609aa

disable metrics collector and tracing

7b7ff87

refactor: unify pipelines under PipelineOrchestrator

b81187b

disable eval feature for now

fbf9a32

update the examples

c823f52

Knowledgebase removed from generate method

e9e1b98

remove unwanted file

937440b

speech hook fix

0eab2e3

threshold value update

0321556

feat: additional stt tts option with realtime models

ae1ee38

update plugins to support external stt & tts

ba6cd54

cleanup

4308491

fixed the interruption issue while using external tts with realtime

eaf8362

add content_generated hook

0e12c8f

update example

0c8bf55

refactor(hooks): unify STT/TTS stream hooks

63a3a53

Add agent response hook to modify LLM output only

f3011c8

update the examples

4c3d3e0

update the example

30ca663

graph related changes

e1b7917

Added usage metrics for stt

f50d2ad

Add centralized pipeline configuration with PipelineMode and PipelineComponent enums

07c08df

DeepBhupatkar wants to merge 25 commits into main from rework/agent-framework-core
