fix: align voice WebSocket with reference architecture accept-first pattern #159
Merged
philmerrell merged 2 commits into develop on Apr 13, 2026
Rewrites voice_stream to match the sample-strands-agent-with-agentcore reference architecture:
- Accept WebSocket immediately (AgentCore validates auth at proxy layer)
- Extract params via helper functions: custom header → query param → config message
- Config message always read to supplement missing params in cloud mode
- /voice/stream as main route, /ws as alias for AgentCore Runtime
- Frontend uses /voice/stream for local dev, /ws for AgentCore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
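The parameter lookup order in the description (custom header, then query param, then config message) could be sketched roughly as below. This is only an illustration under assumptions: `resolve_param`, the `x-voice-` header prefix, and the parameter names are hypothetical and are not taken from the actual voice_routes.py helpers.

```python
from typing import Mapping, Optional

# Hypothetical header prefix; the PR does not state the real header names.
HEADER_PREFIX = "x-voice-"


def resolve_param(
    name: str,
    headers: Mapping[str, str],
    query_params: Mapping[str, str],
    config_message: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty value, checking sources in priority order:
    custom header, then query param, then config message."""
    header_key = HEADER_PREFIX + name.replace("_", "-")
    candidates = (
        headers.get(header_key),
        query_params.get(name),
        (config_message or {}).get(name),
    )
    for value in candidates:
        if value:  # skip None and empty strings
            return value
    return None


if __name__ == "__main__":
    headers = {"x-voice-session-id": "from-header"}
    query = {"session_id": "from-query", "user_id": "from-query"}
    config = {"user_id": "from-config", "voice": "from-config"}
    for param in ("session_id", "user_id", "voice"):
        print(param, "->", resolve_param(param, headers, query, config))
```

Because the config message is consulted last, it only supplements values that the header and query string did not provide, which is the behavior the description gives for cloud mode.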
ofilson added a commit that referenced this pull request Apr 16, 2026
* test(session): update compaction config defaults and fix integration test skipping
  - Enable compaction by default in CompactionConfig
  - Increase protected_turns default from 2 to 3
  - Add pytest marker to skip integration tests when AGENTCORE_MEMORY_ID is not set
  - Fix import path for get_metadata_storage in cache savings tests from metadata_storage.get_metadata_storage to storage.get_metadata_storage
  - Ensures integration tests only run in appropriate environments with required AWS credentials
* test(session): enhance session manager fixture with initialize() mock and cleanup
  - Mock AgentCoreMemorySessionManager.initialize() to simulate SDK behavior
  - Add _mock_sdk_initialize shim that loads messages and validates agent uniqueness
  - Track active patches in fixture scope for proper cleanup on teardown
  - Update fixture docstring to document initialize() mocking and message control
  - Convert fixture to generator with yield to enable patch cleanup
  - Allow tests to control loaded messages via mgr.read_agent and mgr.list_messages
* Release 1.0.0-beta.22: Cognito-native auth, CORS unification, RBAC consolidation, Trivy supply chain fix (#137)

  ⚠️ BREAKING CHANGE: Authentication replaced with AWS Cognito. The legacy generic OIDC implementation has been removed with no backward compatibility layer. Existing deployments must re-bootstrap.

  Cognito First-Boot Authentication:
  - Cognito User Pool, App Client, and Domain provisioned in Infrastructure stack
  - CognitoJWTValidator replaces GenericOIDCJWTValidator
  - New system/ module for first-boot setup, Cognito user/group management
  - New cognito_idp_service for federated identity provider CRUD via Cognito IdP APIs
  - First-boot page with admin account creation (race-condition-safe DynamoDB writes)
  - Frontend auth flow rewritten for Cognito OAuth 2.0 + PKCE
  - Runtime-provisioner and runtime-updater Lambda functions removed (2,800+ lines)
  - Backend OIDC service, token exchange, and discovery endpoints removed (1,318 lines)
  - 2,057 lines of new Cognito test coverage (IdP service, JWT validator, first-boot, system)

  RBAC Consolidation:
  - Single require_app_roles dependency replaces 6 role-checking functions/decorators
  - User roles enriched from stored DynamoDB profile during token processing
  - Profile cache invalidation on sync for immediate role updates
  - JSON array parsing for custom:roles claim (Entra ID compatibility)
  - jwt_role_mappings updates allowed on system_admin role

  CORS Unification:
  - buildCorsOrigins() shared helper across all 6 CDK stacks
  - S3 CORS made conditional, ExposedHeaders→ExposeHeaders fix
  - Python APIs read CORS_ORIGINS env var (replaces allow_origins=['*'])

  Security:
  - Trivy action upgraded v0.28.0→v0.35.0 (old SHA was compromised in March 2026 supply chain attack, GHSA-69fq-xp46-6x23)

  CI/CD:
  - CDK_DOMAIN_NAME and CDK_CORS_ORIGINS added to all workflow jobs
  - App API synth-cdk actually skipped on PRs (guard was missing despite beta.20 docs)
  - SSM StringParameter creation guarded against empty values

  Bootstrap:
  - seed_bootstrap_data.py sole owner of RBAC role seeding (removed from app startup)
  - system_admin role seeded with jwt_role_mappings=['system_admin']
  - Additive JWT mapping seeding for existing deployments

  Documentation:
  - 54,665 lines of outdated specs and AI artifacts purged (121 files)

  Dependencies:
  - Python: fastapi 0.135.3, uvicorn 0.44.0, boto3 1.42.83, strands-agents 1.34.1, bedrock-agentcore 1.6.0, google-genai 1.70.0, ruff 0.15.9, mypy 1.20.0
  - Frontend: Angular 21.2.7, katex 0.16.45, mermaid 11.14.0, Analog.js alpha.26
  - Infrastructure: aws-cdk-lib 2.248.0, aws-cdk 2.1117.0, ts-jest 29.4.9
* refactor: centralize env vars and magic strings into config/constants.py (#139)

  Create agents/main_agent/config/constants.py with EnvVars, Defaults, and Prefixes classes. Update all 13 modules to import from the centralized constants instead of using inline os.getenv() with hardcoded strings. This eliminates scattered magic strings and provides a single reference for all configuration.

  Zero behavior change: all values are identical. 543/543 tests passing.

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: extract BaseAgent ABC and ChatAgent from MainAgent (#140)
  * refactor: centralize env vars and magic strings into config/constants.py

    Create agents/main_agent/config/constants.py with EnvVars, Defaults, and Prefixes classes. Update all 13 modules to import from the centralized constants instead of using inline os.getenv() with hardcoded strings. This eliminates scattered magic strings and provides a single reference for all configuration.

    Zero behavior change: all values are identical. 543/543 tests passing.

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * refactor: extract BaseAgent ABC and ChatAgent from MainAgent

    Split MainAgent into a three-tier hierarchy:
    - BaseAgent (ABC): shared init for model config, tools, session, streaming
    - ChatAgent(BaseAgent): Strands Agent creation and text streaming
    - MainAgent(ChatAgent): backward-compatible alias (pass-through)

    All existing callers continue to import and use MainAgent unchanged. The _build_filtered_tools() helper is extracted from _create_agent() for reuse by future agent types (SkillAgent, VoiceAgent).

    543/543 tests passing, zero behavior change.
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add agent type registry and create_agent() factory (#141)

  Introduce agent_types.py with a pluggable registry pattern:
  - create_agent(agent_type, **kwargs) → BaseAgent subclass
  - register_agent_type(name, cls) for dynamic registration
  - ChatAgent registered as "chat" by default

  Future agent types (skill, voice) will register themselves here. Existing code is unchanged: MainAgent still works as before.

  552/552 tests passing (9 new factory tests).

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add progressive skill disclosure system with SkillAgent (#142)

  Implement three-level skill architecture adapted from sample-strands-agent:
  - Level 1: Lightweight skill catalog injected into system prompt
  - Level 2: SKILL.md instructions loaded on-demand via skill_dispatcher
  - Level 3: Tool execution via skill_executor

  New modules:
  - skills/skill_registry.py: Discovers SKILL.md files, binds tools, serves catalog
  - skills/skill_tools.py: skill_dispatcher + skill_executor Strands @tool functions
  - skills/decorators.py: @Skill() decorator and register_skill() for tool tagging
  - skill_agent.py: SkillAgent(ChatAgent) with progressive disclosure override
  - skills/definitions/web-search/SKILL.md: Example skill definition

  SkillAgent registered as "skill" in agent_types factory. Existing behavior completely unchanged: SkillAgent is additive only.

  590/590 tests passing (38 new skill tests).
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add VoiceAgent with BidiAgent for speech-to-speech interaction (#143)

  Implement VoiceAgent(BaseAgent) for bidirectional voice using Nova Sonic 2:
  - BidiNovaSonicModel with configurable voice, sample rate, and model
  - Voice-text continuity via _load_text_history() from text session
  - Separate agent_id ("voice") to prevent session state conflicts
  - Voice-optimized system prompt with conversational guidelines
  - PyAudio mock for server-side (browser uses Web Audio API)
  - Conditional registration: only available with strands-agents[bidi]

  Add voice-related constants to config/constants.py (EnvVars + Defaults). Register "voice" type in agent_types factory.

  606/606 tests passing (16 new voice tests).

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add approval hooks for gating dangerous tool operations (#144)

  Implement three approval hook categories following the sample-strands-agent pattern, all using Strands BeforeToolCallEvent:
  - EmailApprovalHook: Gates send_email, delete_emails, forward_email, etc.
  - ExternalWriteApprovalHook: Gates create_pull_request, deploy, push_code, etc.
  - DangerousToolApprovalHook: Gates delete_file, drop_table, execute_sql, etc.

  Hooks set _approval_required/_approval_message on the tool_use dict for the streaming layer to surface to the client for user confirmation.

  All hooks registered in BaseAgent._create_hooks() and inherited by all agent types (ChatAgent, SkillAgent, VoiceAgent).

  618/618 tests passing (12 new approval hook tests).
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add WebSocket voice route and bidi dependency for VoiceAgent (#145)
  * feat: add bidi dependency, WebSocket voice route, and test client

    Wire up the VoiceAgent for end-to-end testing:
    - Add strands-agents[bidi] optional dependency group to pyproject.toml
    - Fix BidiAgent/BidiNovaSonicModel import paths (strands.experimental.bidi)
    - Create voice_routes.py with WebSocket endpoint at /voice/stream
      - JWT auth from query params (trusted decode, same as invocations)
      - Bidirectional protocol: audio/text input, agent event streaming
      - Debug endpoints: GET /voice/sessions, DELETE /voice/sessions/{id}
    - Register voice router in inference API main.py
    - Add test_voice_client.py script for manual WebSocket testing

    632/632 tests passing (14 new voice route tests).

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * fix: handle CancelledError in VoiceAgent.stop() during teardown

    The BidiAgent's Nova Sonic stream teardown can raise CancelledError when pending AWS SDK futures are cancelled during shutdown. This is expected behavior, not an error.
    - VoiceAgent.stop(): catch CancelledError and Exception from BidiAgent
    - voice_routes.py finally block: catch BaseException (CancelledError is a BaseException in Python 3.12, escaping except Exception)

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * fix: pass session_id and agent_id to list_messages in voice history

    AgentCoreMemorySessionManager.list_messages() requires session_id and agent_id positional args. Pass session_id=self.session_id and agent_id="default" to read the text chat agent's history for voice-text continuity. Use the SDK's limit param instead of post-slicing.

    Update tests to verify the correct call signature.

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * fix: use BidiAgent.receive() for voice event streaming

    BidiAgent uses receive() as its event source, not stream_async(). Audio/text input is sent via send_audio()/send_text() separately, and receive() yields typed events (BidiAudioStreamEvent, BidiTranscriptStreamEvent, etc.) asynchronously.
    - VoiceAgent.stream_async(): iterate BidiAgent.receive(), yield event.as_dict() for JSON-serializable dicts
    - voice_routes._send_to_client(): simplified to handle dicts directly since stream_async now yields dicts, not strings

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * feat: add Angular voice components for Nova Sonic bidirectional audio

    Frontend voice support with three-layer architecture:

    New services (frontend/ai.client/src/app/session/services/voice/):
    - pcm-utils.ts: Pure PCM encoding/decoding (Float32↔Int16↔base64)
    - AudioRecorderService: Mic capture via Web Audio API → 16kHz PCM chunks
    - AudioPlayerService: Gapless base64 PCM playback with interruption support
    - VoiceChatService: WebSocket orchestration + state machine (idle → connecting → listening → speaking)

    Modified components:
    - chat-input: Voice toggle button with animated state indicators (pulsing red = listening, bouncing green = speaking, spinner = connecting)
    - chat-input template: Live transcript overlay during voice mode
    - session.page.ts: Wire voice response completions to message list
    - MessageMapService: addVoiceMessage() for finalized voice transcripts

    TypeScript compiles cleanly (tsc --noEmit). Angular build requires Node 20.19+ (current machine has 20.18.1).

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * fix: convert SessionMessage to dict for BidiAgent and fix TS2774

    Backend: _load_text_history() now calls .to_dict() on SessionMessage objects before passing to BidiAgent. Nova Sonic expects plain dicts with {"role": "...", "content": [...]}, not SessionMessage objects.

    Frontend: Fix TS2774 in AudioRecorderService by using a typeof check instead of a truthiness check for getUserMedia function detection.
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * fix: use to_message() instead of to_dict() for BidiAgent history

    SessionMessage.to_dict() wraps the message in metadata: {"message": {"role": ..., "content": [...]}, "message_id": 0, ...}

    SessionMessage.to_message() returns the plain message dict: {"role": "user", "content": [...]}

    Nova Sonic's _get_message_history_events expects the plain format.

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * fix: use BidiAgent.send() and receive() APIs correctly

    BidiAgent has send(dict) and receive(), not send_audio()/send_text() or stream_async(). Align VoiceAgent methods with the actual SDK:
    - send_audio(): calls self._bidi_agent.send({"type": "bidi_audio_input", ...})
    - send_text(): calls self._bidi_agent.send({"type": "bidi_text_input", ...})
    - receive_events(): wraps self._bidi_agent.receive() with as_dict() conversion
    - stream_async(): now a no-op stub (voice uses receive_events() instead)

    Update voice_routes._send_to_client to call receive_events() not stream_async().

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  * Implement feature X to enhance user experience and optimize performance
  * feat: add voice overlay component for voice interactions
    - Implemented VoiceOverlayComponent with HTML, CSS, and TypeScript files.
    - Added styles for visualizer orb and status badges using Tailwind CSS.
    - Integrated voice status management and session handling in the component.
    - Enhanced voice chat service to support transcript entries and reveal logic.
    - Updated session page to handle voice overlay closure and persist transcripts as messages.
    - Introduced configuration constants for voice processing parameters.
  * feat: enhance voice agent with real-time cost calculation and metadata handling
  * fix: refine token usage handling and improve message processing in voice components
  * fix: sanitize user-provided values in log statements to prevent log injection

    Addresses CodeQL alert #567 (py/log-injection). All user-provided values (session_id, user_id, msg_type, enabled_tools) are now passed through _sanitize_log() which strips newline and carriage return characters before being interpolated into log messages.

    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: WebSocket voice streaming with AgentCore auth support (#155)
  * feat: update WebSocket voice streaming endpoint for AgentCore compatibility
  * fix: ensure config message is required for WebSocket voice stream authentication
* feat: WebSocket voice streaming with AgentCore auth and protocol config (#156)
  * feat: update WebSocket voice streaming endpoint for AgentCore compatibility
  * fix: ensure config message is required for WebSocket voice stream authentication
  * feat: add protocol configuration for HTTP support in InferenceApiStack
* fix: include bidi dependency in uv sync commands for Inference API Dockerfile (#157)
* fix: improve AgentCore connection detection in voice stream handling (#158)
* fix: align voice WebSocket with reference architecture accept-first pattern (#159)
  * fix: align voice WebSocket with reference architecture accept-first pattern

    Rewrites voice_stream to match the sample-strands-agent-with-agentcore reference architecture:
    - Accept WebSocket immediately (AgentCore validates auth at proxy layer)
    - Extract params via helper functions: custom header → query param → config message
    - Config message always read to supplement missing params in cloud mode
    - /voice/stream as main route, /ws as alias for AgentCore Runtime
    - Frontend uses /voice/stream for local dev, /ws for AgentCore

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  * fix: add missing try block in voice_stream causing IndentationError

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add Voice Mode to Key Features in README (#160)

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
Co-authored-by: Colin Smith <7762103+colinmxs@users.noreply.github.com>
Co-authored-by: Phil Merrell <philmerrell@boisestate.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
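The teardown fix in the commit message above relies on CancelledError not being caught by `except Exception`. A minimal sketch of that behavior follows, assuming Python 3.8 or later, where `asyncio.CancelledError` derives from `BaseException`. The `stop_voice_agent` and `teardown` names are hypothetical stand-ins, not the project's functions.

```python
import asyncio


async def stop_voice_agent() -> None:
    """Hypothetical stand-in for a teardown call whose pending work was cancelled."""
    raise asyncio.CancelledError()


async def teardown() -> str:
    """Report which handler ends up catching the teardown failure."""
    try:
        await stop_voice_agent()
    except Exception:
        # Not reached: CancelledError is not a subclass of Exception on 3.8+.
        return "caught by except Exception"
    except BaseException as exc:
        # Reached: a BaseException handler is needed to see CancelledError.
        return f"caught by except BaseException: {type(exc).__name__}"
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(teardown()))
```

Swallowing CancelledError is usually only reasonable in a final cleanup path like the one the commit describes. Elsewhere it is generally re-raised so that task cancellation still propagates.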
ofilson added a commit that referenced this pull request Apr 21, 2026
* basic e2e testing (not hooked up to nightly) * get rid of warnings * add all home page tests. * settings and assistants page tests * Rebase e2e testing branch (#164) * test(session): update compaction config defaults and fix integration test skipping - Enable compaction by default in CompactionConfig - Increase protected_turns default from 2 to 3 - Add pytest marker to skip integration tests when AGENTCORE_MEMORY_ID is not set - Fix import path for get_metadata_storage in cache savings tests from metadata_storage.get_metadata_storage to storage.get_metadata_storage - Ensures integration tests only run in appropriate environments with required AWS credentials * test(session): enhance session manager fixture with initialize() mock and cleanup - Mock AgentCoreMemorySessionManager.initialize() to simulate SDK behavior - Add _mock_sdk_initialize shim that loads messages and validates agent uniqueness - Track active patches in fixture scope for proper cleanup on teardown - Update fixture docstring to document initialize() mocking and message control - Convert fixture to generator with yield to enable patch cleanup - Allow tests to control loaded messages via mgr.read_agent and mgr.list_messages * Release 1.0.0-beta.22: Cognito-native auth, CORS unification, RBAC consolidation, Trivy supply chain fix (#137)⚠️ BREAKING CHANGE: Authentication replaced with AWS Cognito. The legacy generic OIDC implementation has been removed with no backward compatibility layer. Existing deployments must re-bootstrap. 
Cognito First-Boot Authentication: - Cognito User Pool, App Client, and Domain provisioned in Infrastructure stack - CognitoJWTValidator replaces GenericOIDCJWTValidator - New system/ module for first-boot setup, Cognito user/group management - New cognito_idp_service for federated identity provider CRUD via Cognito IdP APIs - First-boot page with admin account creation (race-condition-safe DynamoDB writes) - Frontend auth flow rewritten for Cognito OAuth 2.0 + PKCE - Runtime-provisioner and runtime-updater Lambda functions removed (2,800+ lines) - Backend OIDC service, token exchange, and discovery endpoints removed (1,318 lines) - 2,057 lines of new Cognito test coverage (IdP service, JWT validator, first-boot, system) RBAC Consolidation: - Single require_app_roles dependency replaces 6 role-checking functions/decorators - User roles enriched from stored DynamoDB profile during token processing - Profile cache invalidation on sync for immediate role updates - JSON array parsing for custom:roles claim (Entra ID compatibility) - jwt_role_mappings updates allowed on system_admin role CORS Unification: - buildCorsOrigins() shared helper across all 6 CDK stacks - S3 CORS made conditional, ExposedHeaders→ExposeHeaders fix - Python APIs read CORS_ORIGINS env var (replaces allow_origins=['*']) Security: - Trivy action upgraded v0.28.0→v0.35.0 — old SHA was compromised in March 2026 supply chain attack (GHSA-69fq-xp46-6x23) CI/CD: - CDK_DOMAIN_NAME and CDK_CORS_ORIGINS added to all workflow jobs - App API synth-cdk actually skipped on PRs (guard was missing despite beta.20 docs) - SSM StringParameter creation guarded against empty values Bootstrap: - seed_bootstrap_data.py sole owner of RBAC role seeding (removed from app startup) - system_admin role seeded with jwt_role_mappings=['system_admin'] - Additive JWT mapping seeding for existing deployments Documentation: - 54,665 lines of outdated specs and AI artifacts purged (121 files) Dependencies: - Python: fastapi 
0.135.3, uvicorn 0.44.0, boto3 1.42.83, strands-agents 1.34.1, bedrock-agentcore 1.6.0, google-genai 1.70.0, ruff 0.15.9, mypy 1.20.0 - Frontend: Angular 21.2.7, katex 0.16.45, mermaid 11.14.0, Analog.js alpha.26 - Infrastructure: aws-cdk-lib 2.248.0, aws-cdk 2.1117.0, ts-jest 29.4.9 * refactor: centralize env vars and magic strings into config/constants.py (#139) Create agents/main_agent/config/constants.py with EnvVars, Defaults, and Prefixes classes. Update all 13 modules to import from the centralized constants instead of using inline os.getenv() with hardcoded strings. This eliminates scattered magic strings and provides a single reference for all configuration. Zero behavior change — all values are identical. 543/543 tests passing. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: extract BaseAgent ABC and ChatAgent from MainAgent (#140) * refactor: centralize env vars and magic strings into config/constants.py Create agents/main_agent/config/constants.py with EnvVars, Defaults, and Prefixes classes. Update all 13 modules to import from the centralized constants instead of using inline os.getenv() with hardcoded strings. This eliminates scattered magic strings and provides a single reference for all configuration. Zero behavior change — all values are identical. 543/543 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: extract BaseAgent ABC and ChatAgent from MainAgent Split MainAgent into a three-tier hierarchy: - BaseAgent (ABC): shared init for model config, tools, session, streaming - ChatAgent(BaseAgent): Strands Agent creation and text streaming - MainAgent(ChatAgent): backward-compatible alias (pass-through) All existing callers continue to import and use MainAgent unchanged. The _build_filtered_tools() helper is extracted from _create_agent() for reuse by future agent types (SkillAgent, VoiceAgent). 543/543 tests passing — zero behavior change. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add agent type registry and create_agent() factory (#141) Introduce agent_types.py with a pluggable registry pattern: - create_agent(agent_type, **kwargs) → BaseAgent subclass - register_agent_type(name, cls) for dynamic registration - ChatAgent registered as "chat" by default Future agent types (skill, voice) will register themselves here. Existing code is unchanged — MainAgent still works as before. 552/552 tests passing (9 new factory tests). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add progressive skill disclosure system with SkillAgent (#142) Implement three-level skill architecture adapted from sample-strands-agent: - Level 1: Lightweight skill catalog injected into system prompt - Level 2: SKILL.md instructions loaded on-demand via skill_dispatcher - Level 3: Tool execution via skill_executor New modules: - skills/skill_registry.py: Discovers SKILL.md files, binds tools, serves catalog - skills/skill_tools.py: skill_dispatcher + skill_executor Strands @tool functions - skills/decorators.py: @Skill() decorator and register_skill() for tool tagging - skill_agent.py: SkillAgent(ChatAgent) with progressive disclosure override - skills/definitions/web-search/SKILL.md: Example skill definition SkillAgent registered as "skill" in agent_types factory. Existing behavior completely unchanged — SkillAgent is additive only. 590/590 tests passing (38 new skill tests). 
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add VoiceAgent with BidiAgent for speech-to-speech interaction (#143) Implement VoiceAgent(BaseAgent) for bidirectional voice using Nova Sonic 2: - BidiNovaSonicModel with configurable voice, sample rate, and model - Voice-text continuity via _load_text_history() from text session - Separate agent_id ("voice") to prevent session state conflicts - Voice-optimized system prompt with conversational guidelines - PyAudio mock for server-side (browser uses Web Audio API) - Conditional registration — only available with strands-agents[bidi] Add voice-related constants to config/constants.py (EnvVars + Defaults). Register "voice" type in agent_types factory. 606/606 tests passing (16 new voice tests). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add approval hooks for gating dangerous tool operations (#144) Implement three approval hook categories following the sample-strands-agent pattern, all using Strands BeforeToolCallEvent: - EmailApprovalHook: Gates send_email, delete_emails, forward_email, etc. - ExternalWriteApprovalHook: Gates create_pull_request, deploy, push_code, etc. - DangerousToolApprovalHook: Gates delete_file, drop_table, execute_sql, etc. Hooks set _approval_required/_approval_message on the tool_use dict for the streaming layer to surface to the client for user confirmation. All hooks registered in BaseAgent._create_hooks() — inherited by all agent types (ChatAgent, SkillAgent, VoiceAgent). 618/618 tests passing (12 new approval hook tests). 
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add WebSocket voice route and bidi dependency for VoiceAgent (#145) * feat: add bidi dependency, WebSocket voice route, and test client Wire up the VoiceAgent for end-to-end testing: - Add strands-agents[bidi] optional dependency group to pyproject.toml - Fix BidiAgent/BidiNovaSonicModel import paths (strands.experimental.bidi) - Create voice_routes.py with WebSocket endpoint at /voice/stream - JWT auth from query params (trusted decode, same as invocations) - Bidirectional protocol: audio/text input, agent event streaming - Debug endpoints: GET /voice/sessions, DELETE /voice/sessions/{id} - Register voice router in inference API main.py - Add test_voice_client.py script for manual WebSocket testing 632/632 tests passing (14 new voice route tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: handle CancelledError in VoiceAgent.stop() during teardown The BidiAgent's Nova Sonic stream teardown can raise CancelledError when pending AWS SDK futures are cancelled during shutdown. This is expected behavior, not an error. - VoiceAgent.stop(): catch CancelledError and Exception from BidiAgent - voice_routes.py finally block: catch BaseException (CancelledError is a BaseException in Python 3.12, escaping except Exception) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pass session_id and agent_id to list_messages in voice history AgentCoreMemorySessionManager.list_messages() requires session_id and agent_id positional args. Pass session_id=self.session_id and agent_id="default" to read the text chat agent's history for voice-text continuity. Use the SDK's limit param instead of post-slicing. Update tests to verify the correct call signature. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use BidiAgent.receive() for voice event streaming BidiAgent uses receive() as its event source, not stream_async(). 
Audio/text input is sent via send_audio()/send_text() separately, and receive() yields typed events (BidiAudioStreamEvent, BidiTranscriptStreamEvent, etc.) asynchronously. - VoiceAgent.stream_async(): iterate BidiAgent.receive(), yield event.as_dict() for JSON-serializable dicts - voice_routes._send_to_client(): simplified to handle dicts directly since stream_async now yields dicts, not strings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Angular voice components for Nova Sonic bidirectional audio Frontend voice support with three-layer architecture: New services (frontend/ai.client/src/app/session/services/voice/): - pcm-utils.ts: Pure PCM encoding/decoding (Float32↔Int16↔base64) - AudioRecorderService: Mic capture via Web Audio API → 16kHz PCM chunks - AudioPlayerService: Gapless base64 PCM playback with interruption support - VoiceChatService: WebSocket orchestration + state machine (idle → connecting → listening → speaking) Modified components: - chat-input: Voice toggle button with animated state indicators (pulsing red = listening, bouncing green = speaking, spinner = connecting) - chat-input template: Live transcript overlay during voice mode - session.page.ts: Wire voice response completions to message list - MessageMapService: addVoiceMessage() for finalized voice transcripts TypeScript compiles cleanly (tsc --noEmit). Angular build requires Node 20.19+ (current machine has 20.18.1). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: convert SessionMessage to dict for BidiAgent and fix TS2774 Backend: _load_text_history() now calls .to_dict() on SessionMessage objects before passing to BidiAgent. Nova Sonic expects plain dicts with {"role": "...", "content": [...]}, not SessionMessage objects. Frontend: Fix TS2774 in AudioRecorderService — use typeof check instead of truthiness check for getUserMedia function detection. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use to_message() instead of to_dict() for BidiAgent history

  SessionMessage.to_dict() wraps the message in metadata:
  {"message": {"role": ..., "content": [...]}, "message_id": 0, ...}
  SessionMessage.to_message() returns the plain message dict:
  {"role": "user", "content": [...]}
  Nova Sonic's _get_message_history_events expects the plain format.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use BidiAgent.send() and receive() APIs correctly

  BidiAgent has send(dict) and receive(), not send_audio()/send_text() or stream_async(). Align VoiceAgent methods with the actual SDK:
  - send_audio(): calls self._bidi_agent.send({"type": "bidi_audio_input", ...})
  - send_text(): calls self._bidi_agent.send({"type": "bidi_text_input", ...})
  - receive_events(): wraps self._bidi_agent.receive() with as_dict() conversion
  - stream_async(): now a no-op stub (voice uses receive_events() instead)

  Update voice_routes._send_to_client to call receive_events() not stream_async().

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Implement feature X to enhance user experience and optimize performance

* feat: add voice overlay component for voice interactions

  - Implemented VoiceOverlayComponent with HTML, CSS, and TypeScript files.
  - Added styles for visualizer orb and status badges using Tailwind CSS.
  - Integrated voice status management and session handling in the component.
  - Enhanced voice chat service to support transcript entries and reveal logic.
  - Updated session page to handle voice overlay closure and persist transcripts as messages.
  - Introduced configuration constants for voice processing parameters.

* feat: enhance voice agent with real-time cost calculation and metadata handling

* fix: refine token usage handling and improve message processing in voice components

* fix: sanitize user-provided values in log statements to prevent log injection

  Addresses CodeQL alert #567 (py/log-injection). All user-provided values (session_id, user_id, msg_type, enabled_tools) are now passed through _sanitize_log() which strips newline and carriage return characters before being interpolated into log messages.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: WebSocket voice streaming with AgentCore auth support (#155)
* feat: update WebSocket voice streaming endpoint for AgentCore compatibility
* fix: ensure config message is required for WebSocket voice stream authentication
* feat: WebSocket voice streaming with AgentCore auth and protocol config (#156)
* feat: update WebSocket voice streaming endpoint for AgentCore compatibility
* fix: ensure config message is required for WebSocket voice stream authentication
* feat: add protocol configuration for HTTP support in InferenceApiStack
* fix: include bidi dependency in uv sync commands for Inference API Dockerfile (#157)
* fix: improve AgentCore connection detection in voice stream handling (#158)
* fix: align voice WebSocket with reference architecture accept-first pattern (#159)
* fix: align voice WebSocket with reference architecture accept-first pattern

  Rewrites voice_stream to match the sample-strands-agent-with-agentcore reference architecture:
  - Accept WebSocket immediately (AgentCore validates auth at proxy layer)
  - Extract params via helper functions: custom header → query param → config message
  - Config message always read to supplement missing params in cloud mode
  - /voice/stream as main route, /ws as alias for AgentCore Runtime
  - Frontend uses /voice/stream for local dev, /ws for AgentCore

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing try block in voice_stream causing IndentationError

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add Voice Mode to Key Features in README (#160)

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
Co-authored-by: Colin Smith <7762103+colinmxs@users.noreply.github.com>
Co-authored-by: Phil Merrell <philmerrell@boisestate.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* small testing fix
* add e2e to nightly process
* fix tests / warnings

---------

Co-authored-by: Oscar Filson <OSCARFILSON@boisestate.edu>
Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
Co-authored-by: Colin Smith <7762103+colinmxs@users.noreply.github.com>
Co-authored-by: Phil Merrell <philmerrell@boisestate.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
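For readers unfamiliar with the log-injection fix mentioned in the commit message above, the following is a minimal sketch of the idea, not the repository's actual _sanitize_log(). The name sanitize_log and its signature are assumptions for illustration; the only behavior taken from the commit message is that newline and carriage return characters are stripped before a value is interpolated into a log line.

```python
import logging

logger = logging.getLogger("voice_routes_sketch")  # hypothetical logger name


def sanitize_log(value: object) -> str:
    """Strip CR and LF so a user-supplied value cannot start a forged log line.

    Hypothetical stand-in for the _sanitize_log() helper named in the commit
    message; the real helper's signature is not shown in this PR page.
    """
    return str(value).replace("\r", "").replace("\n", "")


# Usage: sanitize each user-provided value before it reaches the log message.
session_id = "abc123\r\nINFO forged entry"
logger.info("voice stream opened for session %s", sanitize_log(session_id))
```

Without the sanitizing step, the embedded line break would let the client make the log appear to contain a second, unrelated entry.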
Summary
- auth_token, session_id, user_id, and enabled_tools arrive via the first client message, supplementing any headers or query params
- _get_param_from_request / _get_enabled_tools_from_request for clean AgentCore custom header → query param fallback
- /voice/stream as main route, /ws as alias for AgentCore Runtime compatibility
- Frontend uses /voice/stream for local dev, /ws for AgentCore

Test plan
- … (/ws)
- … /voice/stream with query-param token

🤖 Generated with Claude Code
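The lookup order described in the summary (custom header, then query param, then the first client config message) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the PR's code: resolve_param, its signature, and the CUSTOM_HEADER_PREFIX value are hypothetical, and plain mappings stand in for the WebSocket's headers and query parameters so the example runs without a web framework.

```python
from typing import Any, Mapping, Optional

# Assumption: a placeholder prefix for custom headers. The actual header
# names used by the PR's helpers are not shown on this page.
CUSTOM_HEADER_PREFIX = "x-custom-"


def resolve_param(
    name: str,
    headers: Mapping[str, str],
    query_params: Mapping[str, str],
    config_message: Optional[Mapping[str, Any]] = None,
) -> Optional[str]:
    """Return the first non-empty value: header, then query param, then config."""
    # 1. Custom header, e.g. "session_id" -> "x-custom-session-id".
    header_value = headers.get(CUSTOM_HEADER_PREFIX + name.replace("_", "-"))
    if header_value:
        return header_value

    # 2. Query parameter with the same name as the param.
    query_value = query_params.get(name)
    if query_value:
        return query_value

    # 3. First client config message, which supplements anything still missing.
    if config_message is not None:
        config_value = config_message.get(name)
        if isinstance(config_value, str) and config_value:
            return config_value

    return None


# Usage: the query param wins here because no custom header is present.
session_id = resolve_param(
    "session_id",
    headers={},
    query_params={"session_id": "from-query"},
    config_message={"session_id": "from-config"},
)
```

Resolving each parameter independently means a connection can mix sources, for example a token from a header and enabled tools from the config message, which matches the "supplementing" wording in the summary.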