
[Bug] With thinking enabled on the Aime2025 dataset, the same configuration reproduces the accuracy score under lm_eval, but the AISBench tool with thinking enabled cannot reproduce it #138

@Sfeching

Description


Operating System and Version

dsv3_2 support_chat

Python Environment for Tool Installation

Python virtual environment created with Anaconda/Miniconda

Python Version

3.11

AISBench Tool Version

20251229

AISBench Execution Command

ais_bench --models vllm_api_stream_chat --datasets aime2025_gen --debug
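To check whether the thinking toggle actually reaches the server independently of AISBench, the equivalent raw request can be built by hand and sent to the vLLM endpoint. This is a minimal sketch: the model name "serve", port 8007, and sampling parameters mirror the configuration given below in this report, while the placement of chat_template_kwargs directly in the request body is an assumption based on vLLM's OpenAI-compatible chat API.

```python
import json

# Hand-built request body for the OpenAI-compatible /v1/chat/completions
# endpoint. Values mirror the issue's configuration; whether the server
# honors chat_template_kwargs can be verified by inspecting the response
# for a reasoning trace.
payload = {
    "model": "serve",
    "messages": [{"role": "user", "content": "Solve: 2 + 2 = ?"}],
    "temperature": 1,
    "top_p": 0.95,
    "stream": True,
    "max_tokens": 73000,
    "chat_template_kwargs": {"enable_thinking": True},
}

body = json.dumps(payload)
```

POSTing this body to http://localhost:8007/v1/chat/completions (for example with curl) and comparing the output against an identical request with enable_thinking set to False shows whether the toggle changes the generation at all.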

Model Configuration File or Custom Configuration File Content

dict(
    attr="service",
    type=VLLMCustomAPIChat,
    abbr="vllm-api-stream-chat",
    path="/DeepSeek-V3.2-Exp-W8A8",
    model="serve",
    stream=True,
    request_rate=0,
    retry=2,
    api_key="",
    host_ip="localhost",
    host_port=8007,
    url="",
    max_out_len=73000,
    batch_size=30,
    trust_remote_code=False,
    generation_kwargs=dict(
        top_p=0.95,
        chat_template_kwargs={
            "enable_thinking": True
        },
        temperature=1,
        ignore_eos=False,
    ),
    pred_postprocessor=dict(type=extract_non_reasoning_content),
)
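One plausible source of the score divergence is the post-processing step: pred_postprocessor uses extract_non_reasoning_content, which presumably drops the reasoning trace before answer extraction. A minimal sketch of such a postprocessor follows; the "</think>" delimiter and the function's exact behavior are assumptions, and AISBench's actual implementation may differ.

```python
def extract_non_reasoning_content(prediction: str) -> str:
    """Strip the reasoning trace, keeping only the final answer text.

    Assumes the model wraps its chain of thought in <think>...</think>;
    everything after the last closing tag is treated as the real answer.
    If no closing tag is found, the prediction is returned unchanged.
    """
    marker = "</think>"
    idx = prediction.rfind(marker)
    if idx == -1:
        return prediction
    return prediction[idx + len(marker):].strip()
```

If the model's output uses a different delimiter than the postprocessor expects (or the reasoning trace is never closed because max_out_len is exhausted), the answer extractor sees the wrong text, which would depress the score relative to lm_eval.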

Expected Behavior

No response

Actual Behavior

NA

Pre-Checks

  • I have read the Quick Start section in the homepage documentation and cannot resolve the issue
  • I have searched the FAQ and found no duplicate issues
  • I have searched existing Issues and found no duplicate issues
  • I have updated to the latest version, and the issue still persists

Metadata

    Labels

    bug (Something isn't working), content_check_failed (issue content check failed)
