[Bug]: Responses API: Streaming returns ResponseTextDeltaEvent instead of ResponseFunctionCallArgumentsDeltaEvent for tool calls while using non-harmony models #29725

@sumitaryal

Description

Your current environment

The output of python collect_env.py
Collecting environment information...
==============================
        System Info
==============================
OS                           : macOS 26.1 (arm64)
GCC version                  : Could not collect
Clang version                : 17.0.0 (clang-1700.4.4.1)
CMake version                : version 3.31.5
Libc version                 : N/A

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.9 (main, Mar 11 2025, 17:41:32) [Clang 20.1.0 ] (64-bit runtime)
Python platform              : macOS-26.1-arm64-arm-64bit

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Apple M1

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0
[pip3] torchaudio==2.9.0
[pip3] torchvision==0.24.0
[pip3] transformers==4.57.3
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  h0a44026_0    pytorch
[conda] mkl                       2021.4.0           hecd8cb5_637  
[conda] mkl-service               2.4.0            py39h9ed2024_0  
[conda] mkl_fft                   1.3.1            py39h4ab4a9b_0  
[conda] mkl_random                1.2.2            py39hb2f4e1b_0  
[conda] numpy                     1.19.3                   pypi_0    pypi
[conda] numpydoc                  1.4.0            py39hecd8cb5_0  
[conda] pytorch                   1.13.1                  py3.9_0    pytorch
[conda] pyzmq                     23.2.0           py39he9d5cce_0  
[conda] torchvision               0.14.1                 py39_cpu    pytorch
[conda] transformers              4.29.2                   pypi_0    pypi

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.1.dev11724+gea228b449.d20251127 (git sha: ea228b449, date: 20251127)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

When using the Responses API with streaming enabled and a tool call is expected from a non-harmony model (e.g. Qwen3), the stream emits ResponseTextDeltaEvent chunks instead of ResponseFunctionCallArgumentsDeltaEvent. As a result, the client cannot reliably detect or parse tool-call arguments from the stream: the payload arrives as plain-text deltas rather than as structured function-call argument events.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM server URL
    api_key="not-used",
)

tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, for example Paris, France",
                },
            },
            "required": ["location"],
            "additionalProperties": False,
        },
    }
]

stream = client.responses.create(
    model="Qwen/Qwen3-4B-Instruct-2507",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)

for event in stream:
    print(event)
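A client-side dispatcher illustrates why the event type matters. The sketch below uses stand-in `SimpleNamespace` objects rather than real SDK event classes (no server needed); the `type` strings follow the OpenAI Responses API streaming event names. When argument deltas are mislabeled as text deltas, they land in the text bucket and the tool call is never assembled:

```python
from types import SimpleNamespace

def route_events(events):
    """Route Responses API stream events by their `type` field."""
    tool_args = {}   # item_id -> accumulated function-call argument JSON
    text_parts = []  # plain assistant text deltas
    for event in events:
        if event.type == "response.function_call_arguments.delta":
            tool_args[event.item_id] = tool_args.get(event.item_id, "") + event.delta
        elif event.type == "response.output_text.delta":
            text_parts.append(event.delta)
    return tool_args, "".join(text_parts)

# Expected stream: argument deltas arrive as function_call_arguments events
# and are assembled into the tool-call arguments for item "fc_1".
expected = [
    SimpleNamespace(type="response.function_call_arguments.delta",
                    item_id="fc_1", delta='{"location": '),
    SimpleNamespace(type="response.function_call_arguments.delta",
                    item_id="fc_1", delta='"Paris, France"}'),
]
print(route_events(expected))

# Buggy stream (this issue): the same payload arrives as output_text deltas,
# so the client sees prose and never detects a tool call.
buggy = [
    SimpleNamespace(type="response.output_text.delta",
                    delta='{"location": "Paris, France"}'),
]
print(route_events(buggy))
```

With the buggy stream, `tool_args` stays empty and the JSON arguments end up in the text channel, which is exactly the failure mode the repro above shows.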


Metadata

Labels: bug (Something isn't working)