
feat(chat): MCP tool call loop in built-in chat UI#377

Open
rayone wants to merge 4 commits into jundot:main from rayone:feat/chat-mcp-tool-call-loop

Conversation

@rayone

@rayone rayone commented Mar 25, 2026

Problem

The built-in chat UI (/admin/chat) silently drops responses when a model decides to call an MCP tool. streamResponse only handled delta.content and delta.reasoning_content; when the model returns finish_reason: tool_calls, there is no delta.content, so streamingContent stays empty and nothing is pushed to the message list.

Solution

  • Accumulate delta.tool_calls chunks during streaming (OpenAI chunked format, keyed by index)
  • On finish_reason: tool_calls: push a hidden assistant message with tool_calls into history, execute all tools in parallel via POST /v1/mcp/execute, push hidden tool result messages, then recurse back into streamResponse so the model receives full tool context and streams a final answer
  • Show a per-tool wrench indicator (toolName…) in the UI while execution is in flight
  • Hide bookkeeping assistant/tool protocol messages from display using _ui: false / x-show="msg._ui !== false"
  • Simplify messagesForApi to a single filter+map pass (was two passes with a redundant filter)
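The first bullet above can be sketched as a small accumulator. This is a minimal illustration, not the PR's actual code: accumulateToolCallDelta is a hypothetical name, but the shape follows the OpenAI chunked format, where tool calls arrive as partial deltas keyed by `index` and `function.arguments` is streamed as string fragments to concatenate.

```javascript
// Accumulate streamed delta.tool_calls chunks into complete tool calls,
// keyed by the OpenAI-style `index` field (missing index defaults to 0).
function accumulateToolCallDelta(acc, deltaToolCalls) {
  for (const tc of deltaToolCalls || []) {
    const i = tc.index ?? 0;
    if (!acc[i]) {
      acc[i] = { id: '', type: 'function', function: { name: '', arguments: '' } };
    }
    if (tc.id) acc[i].id = tc.id;                       // id arrives on the first chunk
    if (tc.function?.name) acc[i].function.name += tc.function.name;
    if (tc.function?.arguments) acc[i].function.arguments += tc.function.arguments;
  }
  return acc;
}
```

On finish_reason: tool_calls, the accumulated object holds one complete call per index, ready to be pushed into history and executed.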

Tested with

Tavily MCP + Qwen3.5-27B on macOS (Apple Silicon)

The chat UI's streamResponse function did not handle finish_reason:
tool_calls — when a model returned a tool call the response was silently
dropped, leaving the user with no output.

Changes:
- Accumulate delta.tool_calls chunks during streaming
- On finish_reason tool_calls: push hidden assistant message, execute
  all tools in parallel via /v1/mcp/execute, push hidden tool result
  messages, then recurse into streamResponse so the model can stream
  its final answer with full tool context
- Show a per-tool wrench indicator while execution is in flight
- Hide bookkeeping assistant/tool messages from the UI (msg._ui flag)
- Simplify messagesForApi to a single filter+map pass
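The execute-then-recurse step above can be sketched as follows. This is a minimal sketch, not the PR's implementation: runToolLoop is a hypothetical name, and the tool executor is injected so the control flow can be shown without the real POST /v1/mcp/execute endpoint.

```javascript
// Execute all tool calls in parallel, record hidden protocol messages,
// then hand control back to the streaming function for the final answer.
async function runToolLoop(messages, toolCalls, executeTool, streamResponse, depth = 0) {
  // Hidden assistant message carrying the tool_calls the model emitted.
  messages.push({ role: 'assistant', content: '', tool_calls: toolCalls, _ui: false });

  // All tools run in parallel.
  const results = await Promise.all(
    toolCalls.map(tc => executeTool(tc.function.name, tc.function.arguments))
  );

  // Hidden tool result messages, matched to their calls by tool_call_id.
  toolCalls.forEach((tc, i) => {
    messages.push({ role: 'tool', tool_call_id: tc.id, content: String(results[i]), _ui: false });
  });

  // Recurse so the model streams its final answer with full tool context.
  return streamResponse(messages, depth + 1);
}
```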
@rayone
Author

rayone commented Mar 25, 2026

Note re: #286

This PR and #286 are orthogonal:

  • MCP supports Streamable HTTP. #286 adds streamable_http transport to the Python MCP client — so oMLX can connect directly to remote MCP servers (like Tavily at https://mcp.tavily.com/mcp/) without needing mcp-remote as a stdio bridge.
  • This PR (feat(chat): MCP tool call loop in built-in chat UI #377) fixes the chat UI tool call loop — the built-in chat at /admin/chat was silently dropping responses whenever a model returned finish_reason: tool_calls.

Both are needed. If #286 merges first it would also allow simplifying mcp.json configs that currently work around the missing streamable HTTP transport using npx mcp-remote.
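For illustration, a direct connection after #286 might look roughly like this; the exact key names ("transport", "url") are assumptions, since #286's config schema is not shown here:

```json
{
  "mcpServers": {
    "tavily": {
      "transport": "streamable_http",
      "url": "https://mcp.tavily.com/mcp/"
    }
  }
}
```

Today the same server has to be reached through a stdio bridge by spawning npx mcp-remote with that URL as an argument.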

…lation

Tests the two core logic contracts:
- messagesForApi filtering: excludes tool_call indicators, preserves
  tool_calls/tool_call_id fields, leaves normal conversation unchanged
- Streaming tool_call chunk accumulation: single call, parallel calls,
  no calls, missing index defaults to 0
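The filtering contract tested above can be sketched as a single filter+map pass. Hypothetical names: `_toolIndicator` stands in for however the UI marks wrench indicators; the real function lives in the chat component.

```javascript
// Build the API-facing message list in one pass: drop UI-only tool
// indicators, keep role/content plus the tool protocol fields.
function messagesForApi(messages) {
  return messages
    .filter(m => !m._toolIndicator)
    .map(({ role, content, tool_calls, tool_call_id }) => ({
      role,
      content,
      ...(tool_calls ? { tool_calls } : {}),
      ...(tool_call_id ? { tool_call_id } : {}),
    }));
}
```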
Owner

@jundot jundot left a comment


Thanks for this, really useful addition. The tool call loop was definitely missing from the chat UI.

Two things before merging:

1. Recursion depth limit

streamResponse() recurses on finish_reason: tool_calls with no max depth. If the model keeps returning tool calls, it loops forever. Adding a depth parameter with a cap (10 or so) would fix it.

2. Abort guard before recursion

If the user clicks Stop during tool execution, errors get caught but the code still recurses into a new stream. A quick if (this.abortController?.signal.aborted) return; before the recursive call would respect the user's intent.
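Both requests reduce to small guards ahead of the recursive call. A minimal sketch, with hypothetical names (shouldRecurse does not exist in the PR; the checks would sit inline in streamResponse):

```javascript
// Guard the tool-call recursion: cap the depth and respect a user abort.
const MAX_TOOL_DEPTH = 10;

function shouldRecurse(depth, abortController) {
  if (depth >= MAX_TOOL_DEPTH) return { ok: false, reason: 'depth' };
  if (abortController?.signal.aborted) return { ok: false, reason: 'aborted' };
  return { ok: true };
}
```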

@jundot jundot force-pushed the main branch 2 times, most recently from 187e87b to dfc5b20 Compare March 29, 2026 10:44
rayone added 2 commits March 31, 2026 10:35
…ling

Addresses both review requests from jundot:

1. Recursion depth limit: add depth param to streamResponse() with
   MAX_TOOL_DEPTH = 10 constant. Exceeding the limit pushes an error
   message and fully resets streaming state (isStreaming, streamingContent,
   abortController) before returning — the early return is before the
   try block so finally does not run.

2. Abort guard before recursion: check abortController.signal.aborted
   after tool execution completes and bail out before the recursive call.

Additional improvements beyond the review requests:

- 30s per-tool timeout via AbortSignal.timeout() combined with the user
  abort signal using AbortSignal.any(), with distinct TimeoutError message
- HTTP error detection on /v1/mcp/execute responses (non-ok status)
- Error indicators: failed tools show red wrench + 'toolName failed' for
  2s before being removed, so the user sees what went wrong
- MAX_TOOL_DEPTH and TOOL_TIMEOUT_MS moved to top-level chatApp() data
  properties for discoverability and easier tuning

Tests: extend TestChatToolCallSafety with 11 new cases covering depth
boundary conditions, result format contracts (success/error/timeout/HTTP),
error indicator format, API filtering of error indicators, and abort guard
logic (including None controller safety).
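The timeout/abort combination from the first improvement can be sketched as below. TOOL_TIMEOUT_MS comes from the commit message; toolAbortSignal is a hypothetical helper, and AbortSignal.any requires Node 20+ or a modern browser:

```javascript
// Combine a per-tool timeout with the user's Stop button: the resulting
// signal aborts as soon as either source aborts, and can be passed to fetch.
const TOOL_TIMEOUT_MS = 30_000;

function toolAbortSignal(userSignal, timeoutMs = TOOL_TIMEOUT_MS) {
  return AbortSignal.any([userSignal, AbortSignal.timeout(timeoutMs)]);
}
```

A timeout-triggered abort rejects the fetch with a TimeoutError, which is what allows the distinct timeout message mentioned above.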
…vity

Alpine 3 only tracks properties that exist on an object at creation time.
Adding _error to an indicator after push (indicator._error = true) would
not trigger a re-render because Alpine never set up a dependency for it.

Fix: include _error: false in the initial push so Alpine tracks the
property from creation and correctly re-renders when it flips to true.
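A minimal sketch of the pattern (the indicator shape and pushToolIndicator are hypothetical; the point is only that _error exists at creation time):

```javascript
// Push a wrench indicator with _error seeded to false so that Alpine's
// reactivity tracks the property from creation and re-renders when the
// tool later fails and the flag flips to true.
function pushToolIndicator(indicators, toolName) {
  indicators.push({
    _toolIndicator: true,
    name: toolName,
    _error: false, // must exist now; adding it after push would not re-render
  });
  return indicators[indicators.length - 1];
}
```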
