fix: harden streaming reasoning/tool parsing across engines#1

Open
krystophny wants to merge 1 commit into Thump604:fix/streaming-reasoning-tool-coexistence-v2 from computor-org:fix/pr177-streaming-tool-hardening
Conversation

@krystophny

This builds on waybarrios#177 and adds the remaining hardening that surfaced under broader local and live validation.

What this adds on top of the branch:

  • preserve raw tool-call text for non-streaming chat in both SimpleEngine and BatchedEngine
  • parse Qwen function/parameter XML in addition to the JSON and bracket formats
  • make Qwen/Harmony streaming completion checks use accumulated text instead of depending on a lucky final delta
  • route GPT-OSS streaming through the split-token Harmony state machine
  • dedupe repeated identical Harmony commentary calls and emit only newly completed calls in streaming
  • treat atomic tool-call parsers as terminal in streaming, so completed tool calls finish with finish_reason="tool_calls"
  • replace the format-specific end-of-stream tool fallback with a validated final parse over accumulated tool text
  • add direct streaming and engine-preservation coverage for these cases
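To illustrate the streaming dedupe and accumulated-text approach from the list above, here is a minimal sketch. It is not the code in this branch; the parser, function names, and the `<tool_call>` wire format are hypothetical stand-ins. The idea it demonstrates is the one described: re-parse the full accumulated text on each delta instead of relying on a lucky final chunk, and emit only tool calls not already seen, so identical repeated calls are deduplicated.

```python
import json


def emit_new_tool_calls(accumulated_text, parse_calls, already_emitted):
    """Parse all complete tool calls from the accumulated stream text and
    return only those not emitted before, deduplicating identical repeats.

    parse_calls: callable returning a list of (name, canonical_args) tuples
    already_emitted: set of (name, canonical_args) keys, mutated in place
    """
    new_calls = []
    for name, args in parse_calls(accumulated_text):
        key = (name, args)
        if key in already_emitted:
            continue  # identical repeated call: emit only once
        already_emitted.add(key)
        new_calls.append({"name": name, "arguments": args})
    return new_calls


def parse_bracket_calls(text):
    """Hypothetical toy parser: extract complete
    <tool_call>{...}</tool_call> blocks; ignore an incomplete tail."""
    calls = []
    start = 0
    while True:
        i = text.find("<tool_call>", start)
        if i == -1:
            break
        j = text.find("</tool_call>", i)
        if j == -1:
            break  # incomplete tail: wait for more deltas
        payload = json.loads(text[i + len("<tool_call>"):j])
        # Canonicalize arguments so repeated identical calls dedupe cleanly.
        calls.append((payload["name"],
                      json.dumps(payload["arguments"], sort_keys=True)))
        start = j + len("</tool_call>")
    return calls
```

A streaming loop would accumulate deltas into one buffer, call `emit_new_tool_calls` after each delta, and set `finish_reason="tool_calls"` once a completed call has been emitted by a terminal parser.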

Validation on current main before splitting this out:

  • pytest -q tests -x --timeout=120 -m 'not slow'
  • result: 1090 passed, 11 skipped, 18 deselected
  • live E2E passes:
    • Qwen3.5-2B simple + batched
    • Qwen3.5-9B simple + batched
    • Qwen3-Coder-30B simple
    • GPT-OSS-20B with gpt_oss reasoning parser, simple + batched
    • GPT-OSS-20B with harmony reasoning parser, simple
    • Devstral-Small-2 simple + batched with forced named tool choice

Validation on this branch:

  • python -m pytest -q tests/test_harmony_parsers.py tests/test_tool_parsers.py tests/test_simple_engine.py tests/test_engine_tool_output_preservation.py
  • python -m py_compile vllm_mlx/server.py vllm_mlx/engine/simple.py vllm_mlx/engine/batched.py vllm_mlx/reasoning/gpt_oss_parser.py vllm_mlx/reasoning/harmony_parser.py vllm_mlx/tool_parsers/harmony_tool_parser.py vllm_mlx/tool_parsers/qwen_tool_parser.py

Note: importing server.py on the base branch still hits pre-existing Responses-API type references that are unrelated to this streaming/tool fix, so the branch-local rerun is kept focused on the directly touched parser/engine coverage.
