
[Bug] With thinking enabled on the Aime2025 dataset, the same configuration reproduces the accuracy score under lm_eval, but the AISBench tool with thinking enabled cannot reproduce it #138

@Sfeching

Description


Operating System and Version

dsv3_2 support_chat

Python Environment for Tool Installation

Python virtual environment created with Anaconda/Miniconda

Python Version

3.11

AISBench Tool Version

20251229

AISBench Execution Command

ais_bench --models vllm_api_stream_chat --datasets aime2025_gen --debug
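To check whether the thinking toggle actually reaches the server independently of AISBench, the equivalent raw request can be built by hand and sent to the vLLM endpoint. This is a minimal sketch: the model name "serve", port 8007, and sampling parameters mirror the configuration given below in this report, while the placement of chat_template_kwargs directly in the request body is an assumption based on vLLM's OpenAI-compatible chat API.

```python
import json

# Hand-built request body for the OpenAI-compatible /v1/chat/completions
# endpoint. Values mirror the issue's configuration; whether the server
# honors chat_template_kwargs can be verified by inspecting the response
# for a reasoning trace.
payload = {
    "model": "serve",
    "messages": [{"role": "user", "content": "Solve: 2 + 2 = ?"}],
    "temperature": 1,
    "top_p": 0.95,
    "stream": True,
    "max_tokens": 73000,
    "chat_template_kwargs": {"enable_thinking": True},
}

body = json.dumps(payload)
```

POSTing this body to http://localhost:8007/v1/chat/completions (for example with curl) and comparing the output against an identical request with enable_thinking set to False shows whether the toggle changes the generation at all.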

Model Configuration File or Custom Configuration File Content

dict(
    attr="service",
    type=VLLMCustomAPIChat,
    abbr="vllm-api-stream-chat",
    path="/DeepSeek-V3.2-Exp-W8A8",
    model="serve",
    stream=True,
    request_rate=0,
    retry=2,
    api_key="",
    host_ip="localhost",
    host_port=8007,
    url="",
    max_out_len=73000,
    batch_size=30,
    trust_remote_code=False,
    generation_kwargs=dict(
        top_p=0.95,
        chat_template_kwargs={
            "enable_thinking": True
        },
        temperature=1,
        ignore_eos=False,
    ),
    pred_postprocessor=dict(type=extract_non_reasoning_content),
)
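One plausible source of the score divergence is the post-processing step: pred_postprocessor uses extract_non_reasoning_content, which presumably drops the reasoning trace before answer extraction. A minimal sketch of such a postprocessor follows; the "</think>" delimiter and the function's exact behavior are assumptions, and AISBench's actual implementation may differ.

```python
def extract_non_reasoning_content(prediction: str) -> str:
    """Strip the reasoning trace, keeping only the final answer text.

    Assumes the model wraps its chain of thought in <think>...</think>;
    everything after the last closing tag is treated as the real answer.
    If no closing tag is found, the prediction is returned unchanged.
    """
    marker = "</think>"
    idx = prediction.rfind(marker)
    if idx == -1:
        return prediction
    return prediction[idx + len(marker):].strip()
```

If the model's output uses a different delimiter than the postprocessor expects (or the reasoning trace is never closed because max_out_len is exhausted), the answer extractor sees the wrong text, which would depress the score relative to lm_eval.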

Expected Behavior

No response

Actual Behavior

NA

Pre-Checks

  • I have read the Quick Start section in the homepage documentation and cannot resolve the issue
  • I have searched the FAQ and found no duplicate issues
  • I have searched existing Issues and found no duplicate issues
  • I have updated to the latest version, and the issue still persists

Metadata

    Labels

    bug (Something isn't working), content_check_failed (issue content check failed)
