[Bug]: Responses API: Streaming returns ResponseTextDeltaEvent instead of ResponseFunctionCallArgumentsDeltaEvent for tool calls while using non-harmony models #29725

@sumitaryal

Description

Your current environment

The output of python collect_env.py
Collecting environment information...
==============================
        System Info
==============================
OS                           : macOS 26.1 (arm64)
GCC version                  : Could not collect
Clang version                : 17.0.0 (clang-1700.4.4.1)
CMake version                : version 3.31.5
Libc version                 : N/A

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.9 (main, Mar 11 2025, 17:41:32) [Clang 20.1.0 ] (64-bit runtime)
Python platform              : macOS-26.1-arm64-arm-64bit

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Apple M1

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0
[pip3] torchaudio==2.9.0
[pip3] torchvision==0.24.0
[pip3] transformers==4.57.3
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  h0a44026_0    pytorch
[conda] mkl                       2021.4.0           hecd8cb5_637  
[conda] mkl-service               2.4.0            py39h9ed2024_0  
[conda] mkl_fft                   1.3.1            py39h4ab4a9b_0  
[conda] mkl_random                1.2.2            py39hb2f4e1b_0  
[conda] numpy                     1.19.3                   pypi_0    pypi
[conda] numpydoc                  1.4.0            py39hecd8cb5_0  
[conda] pytorch                   1.13.1                  py3.9_0    pytorch
[conda] pyzmq                     23.2.0           py39he9d5cce_0  
[conda] torchvision               0.14.1                 py39_cpu    pytorch
[conda] transformers              4.29.2                   pypi_0    pypi

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.1.dev11724+gea228b449.d20251127 (git sha: ea228b449, date: 20251127)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

When using the Responses API with streaming enabled and a tool call is expected from a non-harmony model (e.g. Qwen3), the stream emits ResponseTextDeltaEvent chunks instead of ResponseFunctionCallArgumentsDeltaEvent. As a result, the client cannot reliably detect or parse tool-call arguments from the stream: the payload arrives as plain-text deltas rather than as structured function-call argument events.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM server URL
    api_key="not-used",
)

tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, for example Paris, France",
                },
            },
            "required": ["location"],
            "additionalProperties": False,
        },
    }
]

stream = client.responses.create(
    model="Qwen/Qwen3-4B-Instruct-2507",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)

for event in stream:
    print(event)
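A client-side dispatcher illustrates why the event type matters. The sketch below uses stand-in `SimpleNamespace` objects rather than real SDK event classes (no server needed); the `type` strings follow the OpenAI Responses API streaming event names. When argument deltas are mislabeled as text deltas, they land in the text bucket and the tool call is never assembled:

```python
from types import SimpleNamespace

def route_events(events):
    """Route Responses API stream events by their `type` field."""
    tool_args = {}   # item_id -> accumulated function-call argument JSON
    text_parts = []  # plain assistant text deltas
    for event in events:
        if event.type == "response.function_call_arguments.delta":
            tool_args[event.item_id] = tool_args.get(event.item_id, "") + event.delta
        elif event.type == "response.output_text.delta":
            text_parts.append(event.delta)
    return tool_args, "".join(text_parts)

# Expected stream: argument deltas arrive as function_call_arguments events
# and are assembled into the tool-call arguments for item "fc_1".
expected = [
    SimpleNamespace(type="response.function_call_arguments.delta",
                    item_id="fc_1", delta='{"location": '),
    SimpleNamespace(type="response.function_call_arguments.delta",
                    item_id="fc_1", delta='"Paris, France"}'),
]
print(route_events(expected))

# Buggy stream (this issue): the same payload arrives as output_text deltas,
# so the client sees prose and never detects a tool call.
buggy = [
    SimpleNamespace(type="response.output_text.delta",
                    delta='{"location": "Paris, France"}'),
]
print(route_events(buggy))
```

With the buggy stream, `tool_args` stays empty and the JSON arguments end up in the text channel, which is exactly the failure mode the repro above shows.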


Metadata

Labels: bug (Something isn't working)