
Python: Bug: Streaming replies error – final ResponseOutputMessage missing .delta on large runs #12296


Closed
ltwlf opened this issue May 28, 2025 · 1 comment · Fixed by #12301
Labels: agents · bug (Something isn't working) · python (Pull requests for the Python Semantic Kernel)

Comments

@ltwlf
Contributor

ltwlf commented May 28, 2025

Describe the bug
Invoking an AzureResponsesAgent that returns a function‑calling answer whose total context grows to many tokens (exact threshold still unclear) causes the run to terminate with:

AttributeError: 'ResponseOutputMessage' object has no attribute 'delta'

To Reproduce

  1. Use the latest main branch of Semantic Kernel.
  2. Configure an Azure OpenAI GPT‑4 model that supports tool/function calling (e.g. gpt‑4o).
  3. Create an AzureResponsesAgent with at least one function/tool; keep all defaults.
  4. Prompt the agent so that the response invokes the tool and the combined context becomes very large (e.g. ask for a deep recursive JSON structure).
  5. Observe the run end with the AttributeError above (a minimal repro sketch follows this list).
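
A repro sketch along the lines of these steps is below. The constructor and invocation details (`create_client`, `ai_model_id`, `plugins`, `invoke_stream(messages=...)`) follow current SK samples as best I recall and should be treated as assumptions rather than a verified API surface:

```python
import asyncio
import json

from semantic_kernel.agents import AzureResponsesAgent
from semantic_kernel.functions import kernel_function


class JsonPlugin:
    """Toy plugin whose large output is meant to push the streamed reply into the failing path."""

    @kernel_function(description="Return a deeply nested JSON structure.")
    def deep_json(self, depth: int) -> str:
        node: dict = {"leaf": "x" * 200}
        for _ in range(depth):
            node = {"child": node, "payload": "y" * 200}
        return json.dumps(node)


async def main() -> None:
    # create_client() and the constructor arguments below are assumptions based on SK samples;
    # wire up the Azure OpenAI client however your application already does.
    client = AzureResponsesAgent.create_client()
    agent = AzureResponsesAgent(
        ai_model_id="gpt-4o",
        client=client,
        name="ReproAgent",
        instructions="Always call the tool and echo its full output.",
        plugins=[JsonPlugin()],
    )
    async for chunk in agent.invoke_stream(messages="Call deep_json with depth=50 and show the result."):
        print(chunk, end="", flush=True)  # the AttributeError surfaces near the end of the stream


asyncio.run(main())
```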

Expected behavior
The agent run should finish cleanly and deliver the complete assistant message (including tool_calls) without raising an exception, regardless of response length.

Screenshots
N/A – console traceback available upon request.

Platform

  • Language: Python 3.13.3
  • Source: main branch
  • AI model: Azure OpenAI gpt‑4.1
  • IDE: VS Code
  • OS: Windows 11 

Additional context

  • The bug has surfaced in two separate apps under active development.
  • Root cause seems related to OpenAI’s switch from streaming …MessageDeltaEvent objects to a final ResponseOutputMessage (which lacks .delta) for very large tool‑calling completions—but the precise size boundary still needs investigation.
@ltwlf ltwlf added the bug Something isn't working label May 28, 2025
@markwallace-microsoft markwallace-microsoft added python Pull requests for the Python Semantic Kernel triage labels May 28, 2025
@ltwlf
Contributor Author

ltwlf commented May 28, 2025

Digging a bit deeper

Root cause – Since March 2025 the OpenAI/Azure Responses API appends a terminal object of type ResponseOutputMessage to every streamed run.

This object exposes its payload in .content and deliberately has no .delta.

By contrast, the incremental events SK was written for (ResponseTextDeltaEvent, ResponseMessageDeltaEvent, etc.) all carry their chunk under .delta.

Why we hit it mostly with tool calls – Whenever the model returns a tool_calls array, the SDK is required (per the Assistants spec) to emit the consolidated ResponseOutputMessage so you get a valid, schema-complete assistant message — see the “Final message” note in the official Assistants streaming docs. Plain-text replies may skip that step when the answer is short, which is why non-tool completions often survive.

Trigger pattern – Any long response (many tokens) or any function-calling response, because both reliably produce the final ResponseOutputMessage.

Effect in SK – In responses_agent_thread_actions.py the stream loop still assumes every event owns .delta; when the last object doesn’t, Python raises
AttributeError: 'ResponseOutputMessage' object has no attribute 'delta'.
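
The fix that eventually landed filters inside `_get_tool_calls_from_output` (see the commit and PR below); a generic guard for a stream loop of the shape described here could look like the following sketch. It is illustrative, not SK's actual code, and assumes `ResponseOutputMessage` is importable from `openai.types.responses` with text parts exposing `.text`:

```python
from openai.types.responses import ResponseOutputMessage


async def drain_stream(stream) -> str:
    """Consume a Responses event stream without assuming every event carries .delta (illustrative)."""
    text_parts: list[str] = []
    async for event in stream:
        delta = getattr(event, "delta", None)
        if delta is not None:
            # Incremental event (ResponseTextDeltaEvent, etc.): accumulate the streamed fragment.
            text_parts.append(delta)
        elif isinstance(event, ResponseOutputMessage):
            # Terminal consolidated message: the full payload lives on .content, not .delta,
            # so prefer it over whatever fragments were collected so far.
            text_parts = [part.text for part in event.content if hasattr(part, "text")]
        # Other event types (tool-call items, lifecycle events, ...) are ignored in this sketch.
    return "".join(text_parts)
```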

Impact – Hard stop: the completed assistant message (and any tool_calls) never reaches callers, breaking both user UX and planner logic.

I’m putting together a concise fix and will open a PR shortly. Just wanted to document the investigation so everyone sees it’s a general streaming issue rather than something unique to function calls.

@ltwlf ltwlf changed the title Python: Bug: Large function‑calling responses with many tokens crash with AttributeError: ResponseOutputMessage has no delta Python: Bug: Streaming replies error – final ResponseOutputMessage missing .delta on large runs May 28, 2025
ltwlf added a commit to ltwlf/semantic-kernel that referenced this issue May 28, 2025
…alls_from_output

- Updated type annotation from list[ResponseFunctionToolCall] to list[ResponseOutputItem | ResponseOutputMessage] to accurately reflect actual input types
- Improved filtering to only process ResponseFunctionToolCall objects, safely ignoring other types like ResponseOutputMessage
- Removed problematic .delta access and dangerous cast operations that caused AttributeError: 'ResponseOutputMessage' object has no attribute 'call_id'

Fixes microsoft#12296

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@moonbox3 moonbox3 added agents and removed triage labels May 28, 2025
github-merge-queue bot pushed a commit that referenced this issue May 29, 2025
…alls_from_output (#12301)

## Summary
Fixes AttributeError: 'ResponseOutputMessage' object has no attribute
'call_id' in `ResponsesAgentThreadActions._get_tool_calls_from_output`

- Updated type annotation from `list[ResponseFunctionToolCall]` to `list[ResponseOutputItem | ResponseOutputMessage]` to accurately reflect actual input types
- Improved filtering to only process `ResponseFunctionToolCall` objects,
safely ignoring other types like `ResponseOutputMessage`
- Removed problematic `.delta` access and dangerous cast operations that
caused the AttributeError
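
For reference, the filtering described in the bullets above amounts to something like this sketch; it mirrors the summary, not the literal code merged in #12301, and assumes the listed types are importable from `openai.types.responses`:

```python
from __future__ import annotations

from openai.types.responses import (
    ResponseFunctionToolCall,
    ResponseOutputItem,
    ResponseOutputMessage,
)


def get_tool_calls_from_output(
    output: list[ResponseOutputItem | ResponseOutputMessage],
) -> list[ResponseFunctionToolCall]:
    """Keep only function tool calls; ResponseOutputMessage and other item types are skipped."""
    return [item for item in output if isinstance(item, ResponseFunctionToolCall)]
```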

## Test plan
- [x] All existing OpenAI responses agent tests pass (34/34)
- [x] Ruff formatting and linting passes
- [x] Pre-commit hooks pass

Fixes #12296

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>