Your current environment
The output of python collect_env.py
Collecting environment information...
==============================
System Info
==============================
OS : macOS 26.1 (arm64)
GCC version : Could not collect
Clang version : 17.0.0 (clang-1700.4.4.1)
CMake version : version 3.31.5
Libc version : N/A
==============================
PyTorch Info
==============================
PyTorch version : 2.9.0
Is debug build : False
CUDA used to build PyTorch : None
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.12.9 (main, Mar 11 2025, 17:41:32) [Clang 20.1.0 ] (64-bit runtime)
Python platform : macOS-26.1-arm64-arm-64bit
==============================
CUDA / GPU Info
==============================
Is CUDA available : False
CUDA runtime version : No CUDA
CUDA_MODULE_LOADING set to : N/A
GPU models and configuration : No CUDA
Nvidia driver version : No CUDA
cuDNN version : No CUDA
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Apple M1
==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0
[pip3] torchaudio==2.9.0
[pip3] torchvision==0.24.0
[pip3] transformers==4.57.3
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 h0a44026_0 pytorch
[conda] mkl 2021.4.0 hecd8cb5_637
[conda] mkl-service 2.4.0 py39h9ed2024_0
[conda] mkl_fft 1.3.1 py39h4ab4a9b_0
[conda] mkl_random 1.2.2 py39hb2f4e1b_0
[conda] numpy 1.19.3 pypi_0 pypi
[conda] numpydoc 1.4.0 py39hecd8cb5_0
[conda] pytorch 1.13.1 py3.9_0 pytorch
[conda] pyzmq 23.2.0 py39he9d5cce_0
[conda] torchvision 0.14.1 py39_cpu pytorch
[conda] transformers 4.29.2 pypi_0 pypi
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.1.dev11724+gea228b449.d20251127 (git sha: ea228b449, date: 20251127)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
Could not collect
==============================
Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
🐛 Describe the bug
When the Responses API is used with streaming enabled and a tool call is expected from a non-harmony model (e.g. Qwen3), the stream emits ResponseTextDeltaEvent chunks instead of ResponseFunctionCallArgumentsDeltaEvent chunks. As a result, the client cannot reliably detect or parse the tool call arguments from the stream, since the payload arrives as plain text deltas rather than as structured function-call argument events.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM server URL
    api_key="not-used",
)

tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, for example Paris, France",
                },
            },
            "required": ["location"],
            "additionalProperties": False,
        },
    }
]

stream = client.responses.create(
    model="Qwen/Qwen3-4B-Instruct-2507",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True,
)

for event in stream:
    print(event)
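For reference, a minimal sketch of the kind of consumer this breaks, replacing the final print loop above. It assumes the event classes exported by recent openai-python releases under openai.types.responses; the isinstance dispatch and the accumulation logic are illustrative, not part of the repro. With the current behavior, only the text-delta branch ever fires for the tool call:

from openai.types.responses import (
    ResponseFunctionCallArgumentsDeltaEvent,
    ResponseTextDeltaEvent,
)

# Accumulate structured function-call argument deltas as they stream in.
args_buffer = ""
for event in stream:
    if isinstance(event, ResponseFunctionCallArgumentsDeltaEvent):
        # Expected path for a tool call.
        args_buffer += event.delta
    elif isinstance(event, ResponseTextDeltaEvent):
        # Observed behavior: the tool-call JSON arrives here as plain text,
        # indistinguishable from ordinary model output.
        print("text delta:", event.delta)

print("accumulated tool-call arguments:", args_buffer)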