[Bug] Evaluation accuracy on the mmlu_pro dataset is always 0

### Operating system and version

Ubuntu 22.04.5 LTS

### Python environment used to install the tool

A Python virtual environment created with anaconda/miniconda

### Python version

3.11

### AISBench tool version

3.11.10

### AISBench command executed

ais_bench --models vllm_api_general_chat --datasets mmlu_pro_gen_5_shot_str --debug

### Model configuration file or custom configuration file contents

root@glmTester:/home# cat /home/benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py
from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr='vllm-api-general-chat',
        path="/data/Qwen3-32B",
        model="qwen3",
        request_rate = 0,
        retry = 2,
        host_ip = "100.100.*.**",
        host_port = 8011,
        max_out_len = 8000,
        batch_size=16,
        trust_remote_code=False,
        generation_kwargs = dict(
            temperature = 0,
            ignore_eos = False
        )
    )
]

### Expected behavior

The evaluation produces normal accuracy results.

### Actual behavior

All results are 0.
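
One thing that may be worth ruling out (an assumption, not a confirmed diagnosis): Qwen3 can emit `<think>...</think>` reasoning before its final answer. If that text is not stripped, or the output is cut off at `max_out_len` before the answer appears, answer extraction could fail for every sample. The sketch below is not AISBench code. The helper names `strip_reasoning` and `extract_choice` are hypothetical, and it assumes the final answer follows an "answer is (X)" pattern with options A to J. It can be run over a few raw predictions to see whether a letter is recoverable at all.

```python
import re
from typing import Optional

# Hypothetical diagnostic helpers; these are NOT part of AISBench.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Assumed answer format: "answer is (X)" or "answer is X", with X in A..J.
ANSWER_PATTERN = re.compile(r"[Aa]nswer is \(?([A-J])(?![A-Za-z])")


def strip_reasoning(text: str) -> str:
    """Remove closed <think>...</think> blocks. If a block was never closed
    (for example, the output was truncated), drop everything from the
    opening tag onward."""
    text = THINK_BLOCK.sub("", text)
    open_idx = text.find("<think>")
    if open_idx != -1:
        text = text[:open_idx]
    return text.strip()


def extract_choice(text: str) -> Optional[str]:
    """Return the last answer letter found outside the reasoning text,
    or None if no answer can be recovered."""
    matches = ANSWER_PATTERN.findall(strip_reasoning(text))
    return matches[-1] if matches else None


if __name__ == "__main__":
    samples = [
        "<think>maybe the answer is (B)</think>The answer is (C).",
        "<think>still reasoning, the answer is (B)",  # truncated output
    ]
    for sample in samples:
        print(repr(extract_choice(sample)))
```

If most raw predictions return `None` here, the zero score is more likely an extraction or truncation problem than a model-quality problem.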

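### Pre-submission checks

- [x] I have read the quick start in the main documentation and it did not resolve the problem
- [x] I have searched the FAQ and found no duplicate question
- [x] I have searched existing issues and found no duplicate
- [x] I have updated to the latest version and the problem persists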
