[Bug] Qwen3-Coder-Next fails with an error when running the humanevalx dataset #139

### Operating system and version

oe2203sp4.aarch64

### Python environment where the tool is installed

Python environment inside a Docker container

### Python version

3.11

### AISBench tool version

3.1.20260119

### AISBench command executed

ais_bench --models vllm_api_general_chat --datasets humanevalx_gen_0_shot --mode all --max-num-workers=4 --debug

### Model configuration file or custom configuration file content

from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="/data/Qwen3-Coder-Next/",
        model="/data/Qwen3-Coder-Next/",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="localhost",
        host_port=8000,
        url="",
        max_out_len=20000,
        batch_size=1,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0.01,
            ignore_eos=False,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]


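### Expected behavior

The accuracy test completes and returns the test results.

### Actual behavior

1. The service runs inference normally and the progress bar keeps advancing.
2. An error is raised when inference is about to finish. The stack trace is: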
[2026-02-09 17:32:10,901] [ais_bench] [INFO] Debug mode, print progress directly
[2026-02-09 17:32:10,951] [ais_bench] [INFO] Running 1-th replica of evaluation
[LOG_WARNING] can not create directory, directory: /home/ubuntu/ascend/log, possible reason: No such file or directory.path string is NULLpath string is NULLTraceback (most recent call last):
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 521, in <module>
    raise e
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 518, in <module>
    evaluator.run()
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 98, in run
    self._score()
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 283, in _score
    result = icl_evaluator.evaluate(k, n, copy.deepcopy(test_set), **preds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/openicl/icl_evaluator/icl_base_evaluator.py", line 284, in evaluate
    results = self.score(**current_params)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/datasets/humanevalx/humanevalx.py", line 144, in score
    result = evaluate_functional_correctness(config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/datasets/humanevalx/humaneval_x_eval.py", line 135, in evaluate_functional_correctness
    problems = read_dataset(config.problem_file,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/datasets/humanevalx/humaneval_x_utils.py", line 234, in read_dataset
    dataset = {task["task_id"]: task for task in stream_jsonl(data_file)}
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/datasets/humanevalx/humaneval_x_utils.py", line 234, in <dictcomp>
    dataset = {task["task_id"]: task for task in stream_jsonl(data_file)}
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/cc/home/xty/benchmark/ais_bench/benchmark/datasets/humanevalx/humaneval_x_utils.py", line 212, in stream_jsonl
    with gzip.open(filename, 'rt', encoding='utf-8') as fp:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.10/lib/python3.11/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.10/lib/python3.11/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/data/cc/home/xty/benchmark/benchmark/ais_bench/datasets/humanevalx/humanevalx_js.jsonl.gz'
[ERROR] 2026-02-09-17:32:10 (PID:120083, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
[2026-02-09 17:32:12,440] [ais_bench] [INFO] Evaluation tasks completed.
[2026-02-09 17:32:12,440] [ais_bench] [INFO] Summarizing evaluation results...
dataset            version    metric    mode    vllm-api-general-chat
-----------------  ---------  --------  ------  -----------------------
humanevalx-python  -          -         -       -
humanevalx-cpp     -          -         -       -
humanevalx-go      -          -         -       -
humanevalx-java    -          -         -       -
humanevalx-js      -          -         -       -
[2026-02-09 17:32:12,444] [ais_bench] [INFO] write summary to /data/cc/home/xty/benchmark/outputs/default/20260209_153309/summary/summary_20260209_153309.txt
[2026-02-09 17:32:12,444] [ais_bench] [INFO] write csv to /data/cc/home/xty/benchmark/outputs/default/20260209_153309/summary/summary_20260209_153309.csv

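My ais_bench path is /data/cc/home/xty/benchmark/ais_bench, but the path printed on the last line of the stack trace, where No such file or directory is reported, is /data/cc/home/xty/benchmark/benchmark/ais_bench, which contains an extra benchmark directory level.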
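
To make the mismatch easier to check, here is a minimal diagnostic sketch. It only compares the two paths quoted above and tests whether the file exists on disk. It makes no claim about how AISBench resolves dataset paths internally. The helper name `diagnose_dataset_path` is hypothetical and not part of AISBench.

```python
from pathlib import Path


def diagnose_dataset_path(package_dir: str, failing_path: str) -> dict:
    """Compare the ais_bench package location with the path from the traceback.

    Hypothetical helper for illustration; it is not part of AISBench.
    """
    package = Path(package_dir)
    failing = Path(failing_path)
    repo_root = package.parent  # directory that contains the ais_bench package

    try:
        relative = failing.relative_to(repo_root)
    except ValueError:
        relative = None  # the failing path is not under the repository root

    extra_segments = None
    if relative is not None and package.name in relative.parts:
        # Directory levels that appear between the repo root and "ais_bench"
        extra_segments = list(relative.parts[: relative.parts.index(package.name)])

    return {
        "repo_root": repo_root.as_posix(),
        "relative_to_repo_root": relative.as_posix() if relative is not None else None,
        "extra_segments": extra_segments,
        "failing_path_exists": failing.exists(),
        "parent_dir_exists": failing.parent.exists(),
    }


report = diagnose_dataset_path(
    "/data/cc/home/xty/benchmark/ais_bench",
    "/data/cc/home/xty/benchmark/benchmark/ais_bench/datasets/humanevalx/humanevalx_js.jsonl.gz",
)
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this inside the same container shows whether the extra directory level is the only difference and whether the dataset directory exists at all at the reported location.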

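### Pre-submission checks

- [x] I have read the quick start guide in the main documentation and it did not resolve the problem
- [x] I have searched the FAQ and found no duplicate question
- [x] I have searched existing issues and found no duplicate
- [x] I have updated to the latest version and the problem persists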
