Stop generation on consecutive duplicate tool calls#1027

Draft
chopchop-jiahao wants to merge 2 commits into ml-explore:main from chopchop-jiahao:feat/tool-call-dedup

@chopchop-jiahao

Summary

When using tool calling with certain models (e.g. Qwen3.5-9B), the model sometimes generates 43–64 identical tool calls in a single response until max_tokens is exhausted. This wastes compute and confuses the agent loop.

This PR adds a lightweight ToolCallDedup class that compares each completed tool call with the previous one. When a call is identical to the one immediately before it, generation stops early with finish_reason=tool_calls.

Changes:

  • New mlx_lm/tool_call_dedup.py — simple consecutive duplicate detector (exact string comparison, no JSON normalization)
  • Integration in server.py — one instance per request, checked at each tool_call_end token
  • Unit tests covering: first call, different calls, consecutive/non-consecutive duplicates, whitespace sensitivity, logging, state management
  • Integration tests through the full HTTP pipeline, including a 43-duplicate reproduction of the real-world scenario, both streaming and non-streaming
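
The detector described in the list above could look roughly like the following. This is a minimal sketch, not the code from this PR: only the class name `ToolCallDedup` and the exact-string, consecutive-only behavior come from the description, while the method names `is_duplicate` and `reset`, the logger name, and the sample payload are assumptions made for illustration.

```python
import logging
from typing import List, Optional

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("tool_call_dedup_sketch")


class ToolCallDedup:
    """Sketch of a consecutive-duplicate detector (one instance per request)."""

    def __init__(self) -> None:
        # Raw text of the most recently completed tool call, if any.
        self._previous: Optional[str] = None

    def is_duplicate(self, tool_call_text: str) -> bool:  # hypothetical name
        """Return True if this call is byte-for-byte equal to the previous one."""
        duplicate = self._previous is not None and tool_call_text == self._previous
        if duplicate:
            logger.warning("Consecutive duplicate tool call detected")
        # Always remember the latest call so only adjacent repeats match.
        self._previous = tool_call_text
        return duplicate

    def reset(self) -> None:  # hypothetical name
        """Forget the previous call, e.g. when a new request starts."""
        self._previous = None


# Usage: stop collecting tool calls at the first consecutive repeat.
dedup = ToolCallDedup()
calls = ['{"name": "ls", "arguments": {}}'] * 3  # made-up payload
kept: List[str] = []
for call in calls:
    if dedup.is_duplicate(call):
        break  # a server would end generation here with finish_reason=tool_calls
    kept.append(call)
```

In this sketch the first call is kept and the loop exits on the second, identical one, so the remaining repeats are never processed.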

Design decisions:

  • Only detects consecutive duplicates (A-A), not alternating patterns (A-B-A-B) — sufficient for observed behavior and avoids false positives
  • Exact text comparison — conservative, no risk of normalizing away meaningful differences
  • No configuration flag — the behavior is strictly beneficial (stops degenerate loops)
  • Per-request instance — no thread safety concerns
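
To make these trade-offs concrete, the sketch below runs the same consecutive-only, exact-text rule over a few made-up call sequences. The helper `collect_tool_calls` and the sample strings are hypothetical and exist only for illustration; they are not taken from the PR.

```python
from typing import Iterable, List, Optional, Tuple


def collect_tool_calls(completed_calls: Iterable[str]) -> Tuple[List[str], bool]:
    """Hypothetical helper: keep calls until one exactly repeats its predecessor.

    Returns the calls that were kept and whether the loop stopped early.
    """
    kept: List[str] = []
    previous: Optional[str] = None  # state lives only inside this one request
    for call in completed_calls:
        if previous is not None and call == previous:
            return kept, True  # consecutive duplicate: stop early
        kept.append(call)
        previous = call
    return kept, False


# A-A: the second call repeats the first, so generation would stop.
aa_kept, aa_stopped = collect_tool_calls(["A", "A", "A"])

# A-B-A-B: no two adjacent calls match, so nothing is cut off.
abab_kept, abab_stopped = collect_tool_calls(["A", "B", "A", "B"])

# Exact text comparison: differing whitespace means "not a duplicate".
ws_kept, ws_stopped = collect_tool_calls(['{"q":1}', '{"q": 1}'])
```

The alternating and whitespace cases pass through untouched, which is the conservative behavior the list above argues for: a missed duplicate costs some tokens, while a false positive would truncate a legitimate response.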

Closes #613

chopchop-jiahao and others added 2 commits March 20, 2026 08:23
Models like Qwen3.5-9B generate 43-64 identical tool calls in a single
response until max_tokens, wasting compute and confusing agent loops.

Add ToolCallDedup: compares each completed tool call with the previous
one during generation. On consecutive duplicate, stop with
finish_reason=tool_calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chopchop-jiahao chopchop-jiahao marked this pull request as draft March 20, 2026 08:37

Development

Successfully merging this pull request may close these issues.

Tool / Function call issue with gpt-oss-20b-MXFP4-Q4