fix: harden streaming reasoning/tool parsing across engines#1

Open
krystophny wants to merge 1 commit into Thump604:fix/streaming-reasoning-tool-coexistence-v2 from computor-org:fix/pr177-streaming-tool-hardening
Conversation

@krystophny

This builds on waybarrios#177 and adds the remaining hardening that surfaced under broader local and live validation.

What this adds on top of the branch:

  • preserve raw tool-call text for non-streaming chat in both SimpleEngine and BatchedEngine
  • parse Qwen function/parameter XML in addition to the JSON and bracket formats
  • make Qwen/Harmony streaming completion checks use accumulated text instead of depending on a lucky final delta
  • route GPT-OSS streaming through the split-token Harmony state machine
  • dedupe repeated identical Harmony commentary calls and emit only newly completed calls in streaming
  • treat atomic tool-call parsers as terminal in streaming, so completed tool calls finish with finish_reason="tool_calls"
  • replace the format-specific end-of-stream tool fallback with a validated final parse over accumulated tool text
  • add direct streaming and engine-preservation coverage for these cases
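To illustrate the streaming dedupe and accumulated-text approach from the list above, here is a minimal sketch. It is not the code in this branch; the parser, function names, and the `<tool_call>` wire format are hypothetical stand-ins. The idea it demonstrates is the one described: re-parse the full accumulated text on each delta instead of relying on a lucky final chunk, and emit only tool calls not already seen, so identical repeated calls are deduplicated.

```python
import json


def emit_new_tool_calls(accumulated_text, parse_calls, already_emitted):
    """Parse all complete tool calls from the accumulated stream text and
    return only those not emitted before, deduplicating identical repeats.

    parse_calls: callable returning a list of (name, canonical_args) tuples
    already_emitted: set of (name, canonical_args) keys, mutated in place
    """
    new_calls = []
    for name, args in parse_calls(accumulated_text):
        key = (name, args)
        if key in already_emitted:
            continue  # identical repeated call: emit only once
        already_emitted.add(key)
        new_calls.append({"name": name, "arguments": args})
    return new_calls


def parse_bracket_calls(text):
    """Hypothetical toy parser: extract complete
    <tool_call>{...}</tool_call> blocks; ignore an incomplete tail."""
    calls = []
    start = 0
    while True:
        i = text.find("<tool_call>", start)
        if i == -1:
            break
        j = text.find("</tool_call>", i)
        if j == -1:
            break  # incomplete tail: wait for more deltas
        payload = json.loads(text[i + len("<tool_call>"):j])
        # Canonicalize arguments so repeated identical calls dedupe cleanly.
        calls.append((payload["name"],
                      json.dumps(payload["arguments"], sort_keys=True)))
        start = j + len("</tool_call>")
    return calls
```

A streaming loop would accumulate deltas into one buffer, call `emit_new_tool_calls` after each delta, and set `finish_reason="tool_calls"` once a completed call has been emitted by a terminal parser.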

Validation on current main before splitting this out:

  • pytest -q tests -x --timeout=120 -m 'not slow'
  • result: 1090 passed, 11 skipped, 18 deselected
  • live E2E passes:
    • Qwen3.5-2B simple + batched
    • Qwen3.5-9B simple + batched
    • Qwen3-Coder-30B simple
    • GPT-OSS-20B with gpt_oss reasoning parser, simple + batched
    • GPT-OSS-20B with harmony reasoning parser, simple
    • Devstral-Small-2 simple + batched with forced named tool choice

Validation on this branch:

  • python -m pytest -q tests/test_harmony_parsers.py tests/test_tool_parsers.py tests/test_simple_engine.py tests/test_engine_tool_output_preservation.py
  • python -m py_compile vllm_mlx/server.py vllm_mlx/engine/simple.py vllm_mlx/engine/batched.py vllm_mlx/reasoning/gpt_oss_parser.py vllm_mlx/reasoning/harmony_parser.py vllm_mlx/tool_parsers/harmony_tool_parser.py vllm_mlx/tool_parsers/qwen_tool_parser.py

Note: importing server.py on the base branch still hits pre-existing Responses-API type references that are unrelated to this streaming/tool fix, so the branch-local rerun is kept focused on the directly touched parser/engine coverage.
