Qwen3.5-35B-A3B-4bit can emit malformed tool-call output around 20k prompt tokens. #1061

@Christ9038

Description

I am seeing malformed tool-call output with Qwen3.5-35B-A3B-4bit.

The key point is that this is reproducible with a direct mlx_lm.stream_generate(...) call, with no OpenAI-compatible server wrapper and no downstream parser involved.

Environment

  • mlx_lm==0.31.1
  • model: mlx-community/Qwen3.5-35B-A3B-4bit
  • prompt length in this repro: 20043 tokens
  • tool calling enabled
  • sampling used in the repro:
    • temperature=0.7
    • top_p=0.95
    • top_k=0
    • min_p=0.0
    • repetition_penalty=1.3
    • presence_penalty=0.0
    • frequency_penalty=0.0
    • max_tokens=32768
    • seed=3
  • chat template kwargs:
    • {"enable_thinking": false}

What I attached

  • request.json
    • direct replay bundle for mlx_lm
    • contains normalized messages, tools, sampling, and the fully rendered prompt
  • test_script.py
    • minimal script
    • just loads the model and streams the raw output
    • no parsing, no post-processing, no OpenAI server layer

Reproduction

Edit the model path on line 7 of request.json.

Run:

python test_script.py request.json

The script simply does:

  • load(model_path)
  • mx.random.seed(seed)
  • stream_generate(...)
  • print raw text to stdout
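The full script is attached; in outline it is roughly the following (the request.json field names here are assumptions for illustration, the attached bundle defines the actual schema):

```python
import json
import sys

def load_request(path):
    """Read the replay bundle: model path, rendered prompt, sampling params."""
    with open(path) as f:
        return json.load(f)

def main(path):
    req = load_request(path)
    # Heavy imports are deferred so the helper above stays usable without MLX.
    import mlx.core as mx
    from mlx_lm import load, stream_generate
    from mlx_lm.sample_utils import make_logits_processors, make_sampler

    model, tokenizer = load(req["model"])
    mx.random.seed(req["seed"])                # seed=3 in this repro
    sampler = make_sampler(
        temp=req["temperature"],               # 0.7
        top_p=req["top_p"],                    # 0.95
        min_p=req["min_p"],                    # 0.0
        top_k=req["top_k"],                    # 0
    )
    logits_processors = make_logits_processors(
        repetition_penalty=req["repetition_penalty"],  # 1.3
    )
    # Stream the raw text to stdout with no parsing or post-processing.
    for chunk in stream_generate(
        model,
        tokenizer,
        req["prompt"],                         # the fully rendered prompt
        max_tokens=req["max_tokens"],          # 32768
        sampler=sampler,
        logits_processors=logits_processors,
    ):
        print(chunk.text, end="", flush=True)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```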

Expected behavior

The model should either:

  • emit a valid, fully closed tool-call block, or
  • answer in plain text

It should not emit an incomplete / malformed tool-call block.

Actual behavior

With the attached prompt and seed=3, the direct raw generation produces malformed output like this:

好嘞!我这就用 agent-browser 来查: ("Sure! I'll use agent-browser to check:")

<tool_call>
<function=browser>

This is a direct mlx_lm replay, so this is not caused by an external OpenAI-compatible server wrapper or a client-side parser.

Additional observation

This specific attached request is the last clean request before malformed tool-call content started polluting later history.

After one malformed tool-call appears in history, later turns become much more likely to produce even worse corrupted blocks, for example:

  • unclosed <tool_call>
  • unclosed <function=...>
  • stray tags like </arg_key>
  • malformed parameter sections such as binaryNameFromEntryPointMap
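A quick mechanical way to spot these corrupted blocks is to count opening versus closing tags in the raw output. The tag set below is inferred from the observed corruption, not from any official spec:

```python
import re

# Open/close tag pairs seen in the corrupted outputs (inferred, may be incomplete).
TAG_PAIRS = [
    (r"<tool_call>", r"</tool_call>"),
    (r"<function=[^>]*>", r"</function>"),
    (r"<arg_key>", r"</arg_key>"),
    (r"<arg_value>", r"</arg_value>"),
]

def unbalanced_tags(text):
    """Return (open_pattern, open_count, close_count) for each mismatched pair."""
    problems = []
    for open_pat, close_pat in TAG_PAIRS:
        n_open = len(re.findall(open_pat, text))
        n_close = len(re.findall(close_pat, text))
        if n_open != n_close:
            problems.append((open_pat, n_open, n_close))
    return problems
```

Running this over the output shown above flags both the unclosed `<tool_call>` and the unclosed `<function=browser>`.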

So there seem to be two separate issues:

  1. the first malformed tool-call can already happen from a clean context
  2. once it happens, feeding that malformed output back into history causes strong cascading contamination

As a comparison, I tested the GGUF version, and the exact same issue does not occur there.

Secondary note about mlx_lm.server

I also tested the same context through mlx_lm.server in streaming mode.

In that path, the client only received the leading assistant text, then the stream terminated without [DONE], and the server raised:

ValueError: No function provided.

inside the tool parser.
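A minimal sketch of why a streaming tool parser can end up raising here (the real mlx_lm parser differs; this only illustrates that a truncated or name-less `<function=...>` tag leaves nothing to dispatch):

```python
import re

def parse_function_name(block):
    """Hypothetical parser step: extract the function name from a tool-call block.

    Raises if the <function=...> tag is missing, truncated, or has an empty name,
    mirroring the kind of failure seen in the server's tool parser.
    """
    m = re.search(r"<function=([^>\s]*)>", block)
    if m is None or not m.group(1):
        raise ValueError("No function provided.")
    return m.group(1)
```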

That server behavior is secondary, though. The main issue here is that the direct raw mlx_lm generation already produces malformed tool-call output.
