Mistral Tool Calling: Premature termination on empty tokens #1069

@vitaly-d

Issue:

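In server.py, the tool-calling logic ends a tool call prematurely when gen.text is an empty string and the model's tool_call_end is also "". The truncated text is then passed to the Mistral tool parser, which fails with a ValueError: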

2026-03-27 20:47:58,659 - INFO - KV Caches: 1 seq, 7.06 GB, latest user cache 0 tokens
127.0.0.1 - - [27/Mar/2026 20:47:58] "POST /v1/chat/completions HTTP/1.1" 200 -
2026-03-27 20:47:58,680 - DEBUG - Starting stream:
2026-03-27 20:48:00,370 - INFO - Prompt processing progress: 95/96
2026-03-27 20:48:00,905 - DEBUG - [TOOL_CALLS]
2026-03-27 20:48:01,117 - DEBUG - write
2026-03-27 20:48:01,339 - DEBUG - [ARGS]
2026-03-27 20:48:01,571 - DEBUG - {"
2026-03-27 20:48:01,810 - DEBUG - file
2026-03-27 20:48:02,052 - DEBUG - Path
2026-03-27 20:48:02,299 - DEBUG - ":
2026-03-27 20:48:02,545 - DEBUG -  "/
...
2026-03-27 20:48:09,790 - DEBUG - content
2026-03-27 20:48:10,040 - DEBUG - ":
2026-03-27 20:48:10,290 - DEBUG -  "#
2026-03-27 20:48:10,538 - DEBUG -  Document
2026-03-27 20:48:10,787 - DEBUG -  Loader
2026-03-27 20:48:11,035 - DEBUG -  Module
2026-03-27 20:48:11,283 - DEBUG - \n
2026-03-27 20:48:11,533 - DEBUG - #
...
2026-03-27 20:48:25,284 - DEBUG - import
2026-03-27 20:48:25,547 - DEBUG -  torch
2026-03-27 20:48:25,808 - DEBUG - \n
2026-03-27 20:48:26,075 - DEBUG - \n
2026-03-27 20:48:26,345 - DEBUG - #
2026-03-27 20:48:26,610 - DEBUG -  ---
2026-03-27 20:48:26,871 - DEBUG -
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 56974)
Traceback (most recent call last):
...
  packages/mlx_lm/tool_parsers/mistral.py", line 17, in parse_tool_call
    raise ValueError(f"Could not parse tool call from: {text}")
ValueError: Could not parse tool call from: write[ARGS]{"filePath": "/Volumes/Code/transformation-playground/maia-retrieval-layer/statements_retriever/doc_loader.py", "content": "# Document Loader Module\n# This module handles document loading and caching functionality\n\nfrom collections import defaultdict\nfrom dataclasses import dataclass, field\nimport hashlib\nfrom pathlib import Path\nfrom typing import Any, Callable, Dict, List, Protocol, Tuple\nimport torch\n\n# ---

$ pip show mlx-lm | grep "Version"
Version: 0.31.1

Possible cause:

Streaming detokenizers yield gen.text = "" while they buffer tokens, for example during punctuation cleanup or while completing a multi-byte UTF-8 sequence. When the end marker is itself an empty string, the server's equality check, if gen.text == ctx.tool_call_end:, cannot distinguish an intermediate buffered token from an actual end of call.
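
To make the suspected cause concrete, here is a minimal sketch of the failure mode. It is not the actual server.py or mlx_lm detokenizer code: collect_unguarded is a hypothetical, simplified stand-in for the streaming loop, and Python's standard incremental UTF-8 decoder is used only as an analogy for a detokenizer that emits "" while a multi-byte character is incomplete.

```python
import codecs

def collect_unguarded(segments, tool_call_start="[TOOL_CALLS]", tool_call_end=""):
    """Hypothetical, simplified model of the streaming loop (not the real server.py)."""
    tool_calls, tool_text, in_tool_call = [], "", False
    for text in segments:
        if not in_tool_call:
            if text == tool_call_start:
                in_tool_call = True
        elif text == tool_call_end:  # unguarded: "" == "" is True
            tool_calls.append(tool_text)
            tool_text = ""
            in_tool_call = False
        else:
            tool_text += text
    return tool_calls, tool_text, in_tool_call

# Stand-in for a streaming detokenizer: an incremental decoder returns ""
# when it has only seen the first byte of a two-byte character.
e_acute = "é".encode("utf-8")  # two bytes
chunks = [b"[TOOL_CALLS]", b"write", b"[ARGS]", b'{"a": "',
          e_acute[:1], e_acute[1:], b'"}']
decoder = codecs.getincrementaldecoder("utf-8")()
segments = [decoder.decode(chunk) for chunk in chunks]

calls, leftover, still_open = collect_unguarded(segments)
print(segments)  # the fifth segment is the empty string
print(calls)     # the call is cut off at the empty segment
```

In this sketch the empty segment closes the tool call in the middle of the JSON arguments, and everything after it is ignored because the loop is no longer inside a tool call.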

Models:

I tried the following:

$ ls -l ~/.cache/huggingface/hub | grep Devstral
drwxr-xr-x@ 5 vitaly  staff  160 Mar 26 13:22 models--mlx-community--Devstral-2-123B-Instruct-2512-4bit
drwxr-xr-x@ 5 vitaly  staff  160 Mar 28 01:16 models--mlx-community--mistralai_Devstral-Small-2-24B-Instruct-2512-MLX-BF16
drwxr-xr-x@ 5 vitaly  staff  160 Mar 28 00:31 models--mlx-community--mistralai_Devstral-Small-2-24B-Instruct-2512-MLX-MXFP4

The models were called from mistral-vibe with this configuration:

active_model = "devstral-local"

[[providers]]
name = "local_mlx"
api_style = "openai"
api_base = "http://127.0.0.1:8080/v1"
api_key = "dummy_key"
# backend = "generic"

[[models]]
alias = "devstral-local"
provider = "local_mlx"
# name = "mlx-community/Devstral-2-123B-Instruct-2512-4bit"
# name = "mlx-community/mistralai_Devstral-Small-2-24B-Instruct-2512-MLX-MXFP4"
name = "mlx-community/mistralai_Devstral-Small-2-24B-Instruct-2512-MLX-BF16"

Proposed Fix:

Ensure tool_call_end is not "" before performing the equality check in server.py:

            elif in_tool_call:
                # if gen.text == ctx.tool_call_end:  <-- leads to parsing an incomplete tool call
                if ctx.tool_call_end and gen.text == ctx.tool_call_end:
                    tool_calls.append(tool_text)
                    tool_text = ""
                    in_tool_call = False
                else:
                    tool_text += gen.text

This allows intermediate empty strings to correctly fall through to the else block, where they are appended to the tool_text buffer without terminating the call.
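
The effect of the guard can be checked in isolation with a small sketch. collect_guarded is a hypothetical, simplified stand-in for the loop above, not the real server.py code, and the marker strings and segments are illustrative. How the server finalizes a call whose end marker is empty is not covered by this issue, so the sketch simply returns the buffered text.

```python
def collect_guarded(segments, tool_call_start="[TOOL_CALLS]", tool_call_end=""):
    """Hypothetical, simplified model of the loop with the proposed guard."""
    tool_calls, tool_text, in_tool_call = [], "", False
    for text in segments:
        if not in_tool_call:
            if text == tool_call_start:
                in_tool_call = True
        elif tool_call_end and text == tool_call_end:
            # Only reachable when the end marker is a non-empty string.
            tool_calls.append(tool_text)
            tool_text = ""
            in_tool_call = False
        else:
            # Empty segments land here and leave tool_text unchanged.
            tool_text += text
    return tool_calls, tool_text, in_tool_call

# Empty end marker: the empty segment no longer terminates the call.
segments = ["[TOOL_CALLS]", "write", "[ARGS]", '{"', "a", "", '":', " 1", "}"]
calls, pending, still_open = collect_guarded(segments)
print(calls, pending, still_open)

# Non-empty end marker (made-up markers): behavior is unchanged by the guard.
print(collect_guarded(["<tc>", "x", "", "y", "</tc>"], "<tc>", "</tc>"))
```

With an empty end marker the full text stays buffered and the call remains open at the end of the stream, while a model with a non-empty end marker still closes its call exactly as before.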
