Eval bug: Runtime failure when using ChatBox with tools enabled and GPT-OSS-20B #15170

@mancubus77

Description

Name and Version

Environment (compiled from master)

root@77c821627b43:/app# ./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /app/libggml-cuda.so
load_backend: loaded CPU backend from /app/libggml-cpu-alderlake.so
version: 6118 (6c7e9a54)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Issue

Runtime runs in a container (note the root@77c821627b43 shell prompt in the environment output above).

Operating systems

Linux

GGML backends

CUDA

Hardware

2x CUDA GPUs: NVIDIA GeForce RTX 4070 (compute capability 8.9) and NVIDIA GeForce RTX 3090 (compute capability 8.6)

Models

gpt-oss-20b-BF16.gguf

Problem description & steps to reproduce

Issue:

The server fails at runtime when attempting to use ChatBox with tools enabled. Everything builds fine from master, and the server starts up without issues. However, once I initiate a session using the ChatBox frontend with tools turned on, the process crashes with an uncaught std::runtime_error ("Unexpected content at end of input").

Expected behavior:

The server should operate normally with tools enabled in ChatBox.

Steps to reproduce:

1. Build llama-server from the latest master
2. Start the server
3. Connect with ChatBox
4. Enable tools (MCP server)
5. Attempt to start a chat
6. Runtime fails with the error below (a minimal curl sketch that may reproduce it follows the log)

srv  log_server_r: request: POST /v1/chat/completions 192.168.1.248 200
slot      release: id  0 | task 2 | stop processing: n_past = 419, truncated = 0
slot print_timing: id  0 | task 2 |
prompt eval time =     109.57 ms /   327 tokens (    0.34 ms per token,  2984.31 tokens per second)
       eval time =     291.01 ms /    34 tokens (    8.56 ms per token,   116.83 tokens per second)
      total time =     400.59 ms /   361 tokens
libggml-base.so(+0x16d4b)[0x7f5eb1ed3d4b]
libggml-base.so(ggml_print_backtrace+0x21f)[0x7f5eb1ed41af]
libggml-base.so(+0x28aaf)[0x7f5eb1ee5aaf]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f5eb1d3d20c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277)[0x7f5eb1d3d277]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8)[0x7f5eb1d3d4d8]
/app/llama-server(+0x44fb2)[0x5593b03aefb2]
/app/llama-server(+0x158ce8)[0x5593b04c2ce8]
/app/llama-server(+0xb0f14)[0x5593b041af14]
/app/llama-server(+0xb321c)[0x5593b041d21c]
/app/llama-server(+0xdf406)[0x5593b0449406]
/app/llama-server(+0x856fd)[0x5593b03ef6fd]
/app/llama-server(+0x4d5e5)[0x5593b03b75e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f5eb1988d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f5eb1988e40]
/app/llama-server(+0x4f035)[0x5593b03b9035]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unexpected content at end of input
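
For reference, a request along these lines may reproduce the failure without ChatBox. This is a minimal sketch: the get_weather tool definition is a hypothetical stand-in for whatever tool ChatBox actually sends, and it assumes the server is listening on the default port 8080.

# Hypothetical minimal tool-enabled request against the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "What is the weather in Sydney?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'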

Additional context:

Both CUDA and CPU backends load successfully.
No errors during build or initial startup.
ChatBox works fine without tools.
The failure happens only when tools are enabled.
Let me know if additional logs or stack traces are needed.
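
For a symbolized backtrace, a debug build run under gdb should work; a sketch, assuming a CMake build from the repository root:

# Rebuild llama-server with debug symbols (CUDA enabled)
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DGGML_CUDA=ON
cmake --build build --target llama-server -j

# Run under gdb and capture the backtrace at the crash
gdb --args ./build/bin/llama-server -m gpt-oss-20b-BF16.gguf --jinja
(gdb) run
(gdb) bt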

Runtime configuration

Configuration:
  GPU Layers: 99
  Threads: -1
  Context Size: 16384
  Temperature: 1.0
  Top-p: 1.0
  Top-k: 0
  Jinja: true
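
The equivalent server invocation, as a sketch (flag names per llama-server --help; host, port, and everything else left at defaults):

./llama-server -m gpt-oss-20b-BF16.gguf \
  --n-gpu-layers 99 \
  --threads -1 \
  --ctx-size 16384 \
  --temp 1.0 \
  --top-p 1.0 \
  --top-k 0 \
  --jinja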

First Bad Commit

No response

Relevant log output

(Same as the log shown under "Problem description & steps to reproduce" above.)
