Core architecture rework for agent framework #169

Draft
DeepBhupatkar wants to merge 25 commits into `main` from `rework/agent-framework-core`

DeepBhupatkar (Collaborator) commented Jan 15, 2026

Overview

This PR focuses on reworking the agent framework core architecture and improving modularization.


1. refactor(room): modularize room responsibilities and stream handling

Room logic previously handled multiple responsibilities, including connection lifecycle, SIP participant management, recording orchestration, and input stream handling.

This change modularizes the room implementation by introducing dedicated managers:

  • InputStreamManager: handles incoming participant audio/video streams
  • SIPManager: manages SIP operations, call info fetching, and transfers
  • RecordingManager: orchestrates participant-level recording and merging

Additionally, output-side custom audio track implementations have been moved from audio_stream.py to output_stream.py to clearly separate input and output stream responsibilities.
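The split described above can be pictured with a minimal sketch. The three manager class names come from this PR; every method and attribute name below is hypothetical and chosen only for illustration, so this should not be read as the actual implementation in the branch.

```python
from dataclasses import dataclass, field


@dataclass
class InputStreamManager:
    """Tracks incoming participant audio/video streams (hypothetical API)."""
    streams: dict = field(default_factory=dict)

    def add_stream(self, participant_id: str, kind: str) -> None:
        self.streams.setdefault(participant_id, set()).add(kind)

    def remove_participant(self, participant_id: str) -> None:
        self.streams.pop(participant_id, None)


@dataclass
class SIPManager:
    """Records SIP transfers; real SIP signalling is out of scope here."""
    transfers: list = field(default_factory=list)

    def transfer(self, participant_id: str, target: str) -> None:
        self.transfers.append((participant_id, target))


@dataclass
class RecordingManager:
    """Tracks which participants are currently being recorded."""
    active: set = field(default_factory=set)

    def start(self, participant_id: str) -> None:
        self.active.add(participant_id)

    def stop(self, participant_id: str) -> None:
        self.active.discard(participant_id)


class Room:
    """Keeps only the connection lifecycle and delegates the rest."""

    def __init__(self) -> None:
        self.connected = False
        self.input_streams = InputStreamManager()
        self.sip = SIPManager()
        self.recording = RecordingManager()

    def connect(self) -> None:
        self.connected = True

    def on_participant_joined(self, participant_id: str, record: bool = False) -> None:
        self.input_streams.add_stream(participant_id, "audio")
        if record:
            self.recording.start(participant_id)

    def on_participant_left(self, participant_id: str) -> None:
        self.input_streams.remove_participant(participant_id)
        self.recording.stop(participant_id)
```

With this shape, each manager can be tested in isolation, and the room itself shrinks to lifecycle handling plus a few delegation calls.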

DeepBhupatkar and others added 25 commits January 15, 2026 10:51
- Refactored `pipeline.py` - Single Pipeline class for all configurations

Add Core Modules:

- `speech_understanding.py` - VAD, STT, Turn Detection
- `content_generation.py` - LLM processing, tool calling, KB integration
- `speech_generation.py` - TTS synthesis and audio playback
- `pipeline_orchestrator.py` - Component orchestration and event routing
- `realtime_llm_adapter.py` - Realtime model adapter
- Removed `ConversationFlow` - functionality absorbed into PipelineOrchestrator
- Removed `CascadingPipeline` and `RealTimePipeline` - replaced by unified Pipeline

Implement decorator-based hooks (`@pipeline.on("event_name")`) for intercepting and modifying pipeline data at key stages:

- Audio streaming hooks (speech_in, speech_out) for real-time audio processing
- Vision hook (vision_frame) for video frame processing
- STT hook for cleaning, normalizing, redacting, or enriching the transcript
- LLM control (llm hook with yield-based bypass, agent_response for output)
- Lifecycle hooks (user_turn_start/end, agent_turn_start/end)
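To make the decorator-based registration concrete, here is a minimal sketch of how an `on("event_name")` registry could work. Only the decorator form and the event names (`stt`, `user_turn_start`) come from this PR; the `HookablePipeline` class, its `emit` method, and the rule that a non-`None` return value replaces the data are assumptions made for illustration, and the real hooks are likely asynchronous.

```python
from collections import defaultdict


class HookablePipeline:
    """Hypothetical stand-in for Pipeline; models only the hook plumbing."""

    def __init__(self):
        # event name -> handlers, kept in registration order
        self._hooks = defaultdict(list)

    def on(self, event):
        """Return a decorator that registers a handler for `event`."""
        def register(handler):
            self._hooks[event].append(handler)
            return handler  # the function stays usable on its own
        return register

    def emit(self, event, data=None):
        """Run handlers in order; a non-None return value replaces the data."""
        for handler in self._hooks[event]:
            result = handler(data)
            if result is not None:
                data = result
        return data


pipeline = HookablePipeline()
turns = []


@pipeline.on("stt")
def redact_digits(transcript):
    # Redaction: mask every digit in the transcript.
    return "".join("#" if ch.isdigit() else ch for ch in transcript)


@pipeline.on("stt")
def normalize(transcript):
    # Normalization: collapse whitespace and lowercase.
    return " ".join(transcript.split()).lower()


@pipeline.on("user_turn_start")
def log_turn(_):
    # Lifecycle hook: observes only, so it returns None and changes nothing.
    turns.append("user_turn_start")


cleaned = pipeline.emit("stt", "My  PIN is 1234")
pipeline.emit("user_turn_start")
```

In this sketch, handlers for the same event form a chain, which is one simple way to get both "intercept" and "modify" behavior from a single registration mechanism.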

- Rename RealtimeLLMWrapper -> RealtimeLLMAdapter
- Allows using an external STT provider and Knowledge Base before passing text to a Realtime model for LLM+TTS.
- Allows using a Realtime model for STT+LLM while intercepting text to use an external TTS provider.

Introduce common stream hooks: `@pipeline.on("stt")` for audio → transcript events and `@pipeline.on("tts")` for text → audio events. These hooks enable unified pre- and post-processing in a single location.
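One way to read "pre- and post-processing in a single location" is that a stream hook wraps the whole stage: it filters the incoming stream, runs the stage, and rewrites the outgoing events. The sketch below illustrates that reading with async generators. `default_stt`, `stt_hook`, and `microphone` are hypothetical stand-ins, and how the framework actually hands the default stage to a hook is not shown in this PR.

```python
import asyncio
from typing import AsyncIterator


async def default_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in STT stage: pretends each audio chunk decodes to one word."""
    async for chunk in audio:
        yield chunk.decode()


async def stt_hook(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Hypothetical body of a handler registered with @pipeline.on("stt")."""

    async def without_silence() -> AsyncIterator[bytes]:
        # Pre-processing: drop empty chunks before they reach STT.
        async for chunk in audio:
            if chunk:
                yield chunk

    async for text in default_stt(without_silence()):
        # Post-processing: normalize each transcript event.
        yield text.strip().lower()


async def microphone() -> AsyncIterator[bytes]:
    """Fake audio source with one empty (silent) chunk."""
    for chunk in (b" Hello ", b"", b"WORLD"):
        yield chunk


async def collect() -> list:
    return [text async for text in stt_hook(microphone())]


transcripts = asyncio.run(collect())
```

A `tts` hook would mirror this shape with text in and audio chunks out, so both directions of clean-up live in one function per stage.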