Conversation
Summary of Changes

Hello @muqing-li, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the AISBench evaluation tool so that it can directly load and evaluate model weights produced by the MindFormers training framework. By introducing a unified wrapper for MindFormers models and adapting to their distributed launch mechanism, AISBench can now run offline evaluation of MindFormers models with a simple command-line invocation, extending its supported model ecosystem.

Highlights
Code Review
This pull request adds support for evaluating MindFormers models within AISBench. The changes include a new MindFormerModel wrapper, a corresponding configuration file, and modifications to the task runner to use msrun for distributed execution. While the core feature of running MindFormers models seems well implemented, I've found several critical issues in the utility methods for calculating PPL and log-likelihood, which appear to incorrectly mix PyTorch and MindSpore APIs and will lead to runtime errors. There are also bugs in the multi-node command generation for msrun. I've provided detailed comments and suggestions to address these issues.
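For illustration only (not code from the PR): once logits have been converted to numpy, PPL can be computed without touching either framework's tensor API, which sidesteps the torch/MindSpore mixing entirely. The function name and shapes below are assumptions.

```python
import numpy as np

def ppl_from_logits(logits: np.ndarray, labels: np.ndarray) -> float:
    """Perplexity of `labels` (shape [T]) given next-token `logits` (shape [T, V])."""
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-likelihood of each observed token; PPL is exp of the mean NLL.
    token_ll = log_probs[np.arange(labels.shape[0]), labels]
    return float(np.exp(-token_ll.mean()))
```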
Force-pushed from 9377abd to 3e27b24
/gemini review
Code Review
This PR successfully adds MindFormers model support to AISBench, including a new model wrapper, configuration, and adaptation for the msrun launcher. The code structure is clear and achieves the stated goal.
I have a few suggestions for improvement, mainly in the following areas:

- Robustness: in `mindformers_model.py`, check for a possibly-`None` `yaml_cfg_file`, and raise a more appropriate `ValueError` instead of using `assert` to handle invalid input.
- Maintainability: remove unused imports, avoid hard-coded values (such as the fastchat template), and refactor the duplicated msrun command-building logic in `openicl_infer.py`.
- Clarity: add comments for the special-case handling in `build.py`, and fix misleading comments in the config file.
- Performance: optimize the stop-string post-processing in the `generate` method to avoid repeatedly rebuilding lists (see the sketch after this list).
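As referenced in the performance bullet above, here is a minimal sketch of stop-string truncation that builds no intermediate lists per candidate; the function name is illustrative, not code from the PR.

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    # Cut the generated text at the earliest occurrence of any stop string.
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Example: truncate_at_stop("Answer: 42\nQuestion:", ["\nQuestion:", "</s>"]) -> "Answer: 42"
```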
These changes will improve the code's robustness, maintainability, and performance. Overall, this is a great contribution.
/gemini review
Code Review
Thank you for your contribution. This PR adds MindFormers model support to AISBench, which is a great feature. The overall code structure is clear and successfully integrates msrun to launch MindFormers models.
I found a few areas for improvement, mainly the following:

- Multi-node execution bug: in `openicl_infer.py`, the `worker_num` argument is set incorrectly when building the multi-node msrun command, which may cause multi-node jobs to fail.
- Robustness: `mindformers_model.py` contains issues that can lead to runtime errors or unexpected behavior, such as mutating a mutable default argument (see the sketch after this list), unsafe dictionary access, and `assert` statements that should be `ValueError`s.
- Maintainability: the special-case handling added for MindFormers models in `build.py` reduces long-term maintainability.
- Performance: the string post-processing logic in `mindformers_model.py` could be more efficient.
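To illustrate the mutable-default pitfall mentioned in the robustness bullet (a generic, runnable sketch, not the PR's actual signatures):

```python
EOS_TOKEN = "</s>"  # illustrative value

# Buggy: the default list is created once at definition time and shared
# across calls, so tokens appended here leak into later requests.
def generate_buggy(prompts, stopping_criteria=[]):
    stopping_criteria.append(EOS_TOKEN)
    return stopping_criteria

# Fixed: default to None and build a fresh list per call.
def generate_fixed(prompts, stopping_criteria=None):
    stopping_criteria = list(stopping_criteria or [])
    stopping_criteria.append(EOS_TOKEN)
    return stopping_criteria

assert generate_buggy([]) == ["</s>"]
assert generate_buggy([]) == ["</s>", "</s>"]    # surprise: state leaked
assert generate_fixed([]) == generate_fixed([])  # always fresh
```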
Please see my inline comments for specific suggestions. With these issues fixed, the code will be more robust and easier to maintain.
/gemini review
/gemini review
Code Review
Thank you for your contribution. This PR adds MindFormers model support to ais_bench, which is a great feature. The implementation is comprehensive, covering model loading, inference, and multi-device execution.
My suggestions mainly concern maintainability, robustness, and style, and should help further improve code quality:

- Refactoring: in `openicl_infer.py`, the logic for generating msrun and torchrun commands is largely duplicated; refactoring it would improve readability and maintainability (one possible shape is sketched after this list).
- Correctness: in `mindformers_model.py`, the post-processing of generated text may misbehave when multiple stop tokens are present; I have suggested a more robust implementation.
- Code quality: a few other improvements, such as removing unused imports, promoting in-method constants to class constants, and using `ValueError` instead of `assert` for argument validation.
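One possible shape for the deduplication referenced in the refactoring bullet: a single context dict feeding per-launcher command templates. All names below are illustrative; the msrun flag names follow MindSpore's launcher, but this is a sketch rather than the PR's code.

```python
def _torchrun_cmd(ctx: dict) -> str:
    return (f"torchrun --master_port={ctx['port']} "
            f"--nproc_per_node {ctx['nproc_per_node']} "
            f"{ctx['script_path']} {ctx['cfg_path']}")

def _msrun_cmd(ctx: dict) -> str:
    return (f"msrun --worker_num={ctx['worker_num']} "
            f"--local_worker_num={ctx['local_worker_num']} "
            f"--master_port={ctx['port']} "
            f"{ctx['script_path']} {ctx['cfg_path']}")

LAUNCHER_CMD_BUILDERS = {"torchrun": _torchrun_cmd, "msrun": _msrun_cmd}

# Usage: pick the builder once instead of duplicating the f-strings inline.
# command = LAUNCHER_CMD_BUILDERS.get(launcher, _torchrun_cmd)(ctx)
```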
Overall, this is a high-quality submission. I look forward to seeing the tool improve once these changes are merged.
Force-pushed from 2837d12 to 0692edb
Force-pushed from 48bef55 to 766fdbe
Pull request overview
This PR adds first-class support for running offline local evaluation with MindSpore/MindFormers models in AISBench, including adapting the local inference runner to launch distributed jobs with msrun when required.
Changes:
- Add `MindFormerModel` local model wrapper (MindFormers build + checkpoint load + generate).
- Add a MindFormers model config template (`mf_model`) to enable `ais_bench --models mf_model ...`.
- Extend config/task utilities to better handle `model_cfg.type` being a class and to select `msrun` vs `torchrun` via a model class `launcher` attribute.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `ais_bench/benchmark/utils/config/build.py` | Adds helpers to derive the model type name / class from config (supports `type` as a class). |
| `ais_bench/benchmark/tasks/openicl_infer.py` | Adds launcher-aware command building (torchrun vs msrun) for local inference subprocesses. |
| `ais_bench/benchmark/models/local_models/mindformers_model.py` | New MindFormers local model wrapper implementing load + generate. |
| `ais_bench/benchmark/models/local_models/base.py` | Adds a default `launcher = "torchrun"` class attribute for local models. |
| `ais_bench/benchmark/configs/models/mf_models/mf_model.py` | Adds an example config to run MindFormers models via AISBench. |
| `ais_bench/benchmark/models/__init__.py` | Minor formatting/indent fix. |
```python
worker_num=self.num_gpus,
local_worker_num=self.num_gpus,
```
In msrun mode, worker_num / local_worker_num are derived from num_gpus, but the code elsewhere distinguishes num_gpus (resource binding) vs num_procs (process count). If run_cfg.num_procs != run_cfg.num_gpus, this will launch the wrong number of msrun workers. Consider basing these on self.num_procs (or validating they match) to keep semantics consistent with the torchrun branch.
Suggested change:

```diff
-worker_num=self.num_gpus,
-local_worker_num=self.num_gpus,
+worker_num=self.num_procs,
+local_worker_num=self.num_procs,
```
```diff
+model_cls = get_model_cls_from_cfg(self.model_cfg)
+launcher = getattr(model_cls, 'launcher', 'torchrun') if model_cls else 'torchrun'
 if self.num_gpus > 1 and not use_backend and self.nnodes == 1:
-    port = random.randint(12000, 32000)
-    command = (f'torchrun --master_port={port} '
-               f'--nproc_per_node {self.num_procs} '
-               f'{script_path} {cfg_path}')
+    ctx = dict(
+        port=random.randint(12000, 32000),
+        script_path=script_path,
+        cfg_path=cfg_path,
+        worker_num=self.num_gpus,
+        local_worker_num=self.num_gpus,
+        nproc_per_node=self.num_procs,
+        nnodes=1,
+        master_addr=None,
+        node_rank=None,
+    )
+    builder = LAUNCHER_CMD_BUILDERS.get(launcher, LAUNCHER_CMD_BUILDERS["torchrun"])
+    command = builder(ctx)
 elif self.nnodes > 1:
```
The new launcher selection + msrun command builder logic isn't covered by existing unit tests for OpenICLInferTask.get_command (tests currently only assert torchrun behavior). Please add a test case that uses a registered model class (or a stub with launcher = 'msrun') and asserts the generated command contains msrun and correct worker/local_worker counts.
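A sketch of such a test; construction details, attribute names, and the `get_command` signature are assumptions to adapt to the existing torchrun test fixtures:

```python
from ais_bench.benchmark.tasks.openicl_infer import OpenICLInferTask

class _MsrunStub:
    """Stand-in model class carrying the launcher attribute."""
    launcher = "msrun"

def test_get_command_uses_msrun_launcher():
    # Skip full task init for the sketch; set only the fields get_command reads.
    task = OpenICLInferTask.__new__(OpenICLInferTask)
    task.model_cfg = dict(type=_MsrunStub)
    task.num_gpus = 2
    task.num_procs = 2
    task.nnodes = 1
    cmd = task.get_command('cfg.py', '{task_cmd}')  # signature assumed from existing tests
    assert 'msrun' in cmd
    assert '--worker_num=2' in cmd and '--local_worker_num=2' in cmd
```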
```python
import transformers

from ais_bench.benchmark.models.local_models.base import BaseModel
from ais_bench.benchmark.models import APITemplateParser
```
APITemplateParser is imported but not used. If MindFormerModel should use a template parser, wire it up explicitly; otherwise remove the import to avoid confusion.
Suggested change:

```diff
-from ais_bench.benchmark.models import APITemplateParser
```
```python
if isinstance(outputs, dict):
    outputs = outputs.get('sequences', outputs)
    if outputs is None:
        raise ValueError("Model output dictionary is missing 'sequence' key.")
```
The error message says the output dict is missing 'sequence', but the code actually looks for the 'sequences' key. This mismatch will mislead debugging; update the message (or the key) so they are consistent.
Suggested change:

```diff
-raise ValueError("Model output dictionary is missing 'sequence' key.")
+raise ValueError("Model output dictionary is missing 'sequences' key.")
```
```python
models = [
    dict(
        attr="local",  # local or service
        type=MindFormerModel,  # for transformers < 4.33.0; tries AutoModelForCausalLM.from_pretrained first, then falls back to AutoModel.from_pretrained
```
The inline comment for type=MindFormerModel mentions Transformers (<4.33.0) and HuggingFace loading behavior, which is unrelated to MindFormers and can confuse users. Please update the comment to describe MindFormers-specific behavior/requirements (e.g., that it uses yaml_cfg_file + MindFormers checkpoint loading).
Suggested change:

```diff
-        type=MindFormerModel,  # for transformers < 4.33.0; tries AutoModelForCausalLM.from_pretrained first, then falls back to AutoModel.from_pretrained
+        type=MindFormerModel,  # MindFormers model type; loads the model from the yaml_cfg_file config plus a MindFormers checkpoint
```
```python
        seed=None,
        repetition_penalty=1.03,
    ),
    run_cfg=dict(num_gpus=1, num_procs=1),  # multi-device / multi-node settings; the task is launched with torchrun
```
run_cfg comment says the task is launched with torchrun, but MindFormerModel.launcher is set to msrun and OpenICLInferTask now supports msrun. Please update this comment to match the actual launcher to avoid misconfiguration.
Suggested change:

```diff
-    run_cfg=dict(num_gpus=1, num_procs=1),  # multi-device / multi-node settings; the task is launched with torchrun
+    run_cfg=dict(num_gpus=1, num_procs=1),  # multi-device / multi-node settings; the task is launched with msrun
```
```python
import os, sys
from typing import Dict, List, Optional, Union

import numpy as np
import torch
```
os, sys, and torch are imported but never used in this module, which adds noise and can introduce unnecessary dependency constraints. Please remove unused imports to keep the wrapper minimal.
Suggested change:

```diff
-import os, sys
 from typing import Dict, List, Optional, Union
 
 import numpy as np
-import torch
```
```python
assert len(messages) == 1
tokens = self.tokenizer(messages, padding=False, truncation=False, return_tensors='np')
input_ids = tokens['input_ids']
if input_ids.shape[-1] > self.max_seq_len:
    input_ids = np.concatenate([input_ids[:, : self.max_seq_len // 2], input_ids[:, - self.max_seq_len // 2:]], axis=-1)
```
Using assert len(messages) == 1 for input validation is fragile because assertions can be disabled with Python optimizations, and it produces an unhelpful failure for users. Please raise a regular exception (e.g., ValueError/ParameterValueError) with a clear message when mode == 'mid' is used with batch inputs.
Suggested change:

```diff
-assert len(messages) == 1
+if len(messages) != 1:
+    raise ValueError(
+        f"mode 'mid' does not support batch inputs: expected 1 message, got {len(messages)}."
+    )
 tokens = self.tokenizer(messages, padding=False, truncation=False, return_tensors='np')
 input_ids = tokens['input_ids']
 if input_ids.shape[-1] > self.max_seq_len:
-    input_ids = np.concatenate([input_ids[:, : self.max_seq_len // 2], input_ids[:, - self.max_seq_len // 2:]], axis=-1)
+    input_ids = np.concatenate(
+        [input_ids[:, : self.max_seq_len // 2], input_ids[:, - self.max_seq_len // 2:]],
+        axis=-1,
+    )
```
```python
def get_model_type_name(model_cfg: ConfigDict) -> str:
    """Derive a stable model type name from model_cfg (used for branching).

    Supports `type` being either a class or a string.
    """
    t = model_cfg.get("type", "")
    if t is None or t == "":
        return ""
    if hasattr(t, "__name__"):
        return t.__name__
    if isinstance(t, str):
        return t.split(".")[-1] if "." in t else t
    return ""
```
get_model_type_name() is introduced to support model_cfg['type'] being a class, but the existing unit tests for build_model_from_cfg only cover string types. Please add a test that passes a class object in type to ensure logging/validation paths don’t crash (regression protection for this change).
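A sketch of the requested regression test, assuming `get_model_type_name` is importable from the build utilities and accepts any mapping with a `.get` method:

```python
from ais_bench.benchmark.utils.config.build import get_model_type_name

class FakeModel:
    """Stand-in class; only __name__ matters for the type-name derivation."""

def test_type_as_class():
    assert get_model_type_name({"type": FakeModel}) == "FakeModel"

def test_type_as_dotted_string():
    assert get_model_type_name({"type": "pkg.mod.FakeModel"}) == "FakeModel"

def test_type_missing():
    assert get_model_type_name({}) == ""
```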
```diff
@@ -0,0 +1,316 @@
+import os, sys
+from typing import Dict, List, Optional, Union
```
Import of 'Union' is not used.
Suggested change:

```diff
-from typing import Dict, List, Optional, Union
+from typing import Dict, List, Optional
```
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
PR Type
Related Issue
Fixes #(issue ID) / Relates to #(issue ID)
🔍 Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
AISBench currently cannot directly load model weights produced by MindFormers training for offline evaluation, so a unified wrapper for MindFormers/MindSpore models is needed on the AISBench side to make them compatible with evaluation.
Goal: create a model-loading script for the MindFormers framework and a corresponding AISBench config file, so that AISBench can evaluate MindFormers model weights with a command such as `ais_bench --models mf_model --datasets <evaluation task>`.

📝 Modification
Please briefly describe what modification is made in this PR.
1) Add the config file `ais_bench/benchmark/configs/models/mf_models/mf_model.py`. In the AISBench tool's config, set `type=MindFormerModel` to select a MindFormers model. The differences from the HF offline model config are sketched below. Note: `yaml_cfg_path` is a new field that points to the MindFormers YAML config.
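A sketch of the config difference described above; apart from `type` and `yaml_cfg_path`, the field names and the import path are assumed to follow the existing HF template:

```python
from ais_bench.benchmark.models.local_models.mindformers_model import MindFormerModel  # assumed import path

models = [
    dict(
        attr="local",                        # local or service
        type=MindFormerModel,                # selects the MindFormers wrapper instead of the HF one
        yaml_cfg_path="predict_qwen3.yaml",  # new field: path to the MindFormers YAML config
        path="/path/to/checkpoint",          # assumed: checkpoint location, as in HF configs
        max_out_len=512,
        batch_size=1,
        run_cfg=dict(num_gpus=1, num_procs=1),  # launched with msrun for this model type
    )
]
```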
2) Add the model wrapper file `ais_bench/benchmark/models/local_models/mindformers_model.py`, which mainly contains: model build (`load_tokenizer`, `load_model`, `load_checkpoint`); the generate interface, which integrates the MindFormers `text_generate` call plus some post-processing; and `@MODELS.register_module()`, which registers the MindFormers model wrapper class (see the sketch after this item).

3) Modify the AISBench runner logic to adapt the subprocess command for inference tasks: when running `MindFormerModel`, use msrun instead of torchrun.
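A sketch of the registration and launcher override described in 2) and 3); the registry import path is an assumption, and the generate body is elided:

```python
from ais_bench.benchmark.models.local_models.base import BaseModel
from ais_bench.benchmark.registry import MODELS  # assumed import path for the registry

@MODELS.register_module()
class MindFormerModel(BaseModel):
    # Overrides BaseModel's default launcher ("torchrun") so the task runner
    # builds an msrun command for this model type.
    launcher = "msrun"

    def generate(self, inputs, max_out_len):
        # Integrates MindFormers text generation plus stop-string post-processing
        # (body elided in this sketch).
        ...
```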
📐 Associated Test Results
Please provide links to the related test results, such as CI pipelines, test reports, etc.
AISBench demo run results

Black-box validation: HuggingFace weights were converted to MindFormers weights; loading the converted weights with MindFormers yields the same scores as loading the original weights through the HuggingFace flow.

Evaluating the same weights with HuggingFaceModel and MindFormerModel produces results within a 1% margin of error:

Qwen3-0.6B CEval score (single device)

Qwen3-30B-A3B MMLU score (multi-device)
Does the modification introduce changes that break the backward compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
If the modification introduces performance degradation, please describe the impact of the performance degradation and the expected performance improvement.
🌟 Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
✅ Checklist
Before PR:
After PR:
👥 Collaboration Info
🌟 Useful CI Commands

- /gemini review
- /gemini summary
- /gemini help
- /readthedocs build