Conversation
Summary of Changes

Hello @muqing-li, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the AISBench evaluation tool so that it can directly load and evaluate model weights produced by the MindFormers training framework. By introducing a unified wrapper for MindFormers models and adapting to their distributed launch mechanism, AISBench can now run offline evaluation of MindFormers models with a simple command-line invocation, extending its supported model ecosystem.

Highlights
Code Review
This pull request adds support for evaluating MindFormers models within AISBench. The changes include a new MindFormerModel wrapper, a corresponding configuration file, and modifications to the task runner to use msrun for distributed execution. While the core feature of running MindFormers models seems well implemented, I've found several critical issues in the utility methods for calculating PPL and log-likelihood, which appear to incorrectly mix PyTorch and MindSpore APIs and will lead to runtime errors. There are also bugs in the multi-node command generation for msrun. I've provided detailed comments and suggestions to address these issues.
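For illustration only (not code from the PR): once logits have been converted to numpy, PPL can be computed without touching either framework's tensor API, which sidesteps the torch/MindSpore mixing entirely. The function name and shapes below are assumptions.

```python
import numpy as np

def ppl_from_logits(logits: np.ndarray, labels: np.ndarray) -> float:
    """Perplexity of `labels` (shape [T]) given next-token `logits` (shape [T, V])."""
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-likelihood of each observed token; PPL is exp of the mean NLL.
    token_ll = log_probs[np.arange(labels.shape[0]), labels]
    return float(np.exp(-token_ll.mean()))
```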
Force-pushed from 9377abd to 3e27b24
/gemini review
Code Review
This PR successfully adds MindFormers model support to AISBench, including a new model wrapper, configuration, and adaptation for the msrun launcher. The code structure is clear and achieves the stated goal.
I have a few suggestions for improvement, mainly in the following areas:

- Robustness: in `mindformers_model.py`, check for a possibly-`None` `yaml_cfg_file`, and raise a more appropriate `ValueError` instead of using `assert` to handle invalid input.
- Maintainability: remove unused imports, avoid hard-coded values (such as the fastchat template), and refactor the duplicated msrun command-building logic in `openicl_infer.py`.
- Clarity: add comments for the special-case handling in `build.py`, and fix misleading comments in the config file.
- Performance: optimize the stop-string post-processing in the `generate` method to avoid repeatedly rebuilding lists (see the sketch after this list).
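As referenced in the performance bullet above, here is a minimal sketch of stop-string truncation that builds no intermediate lists per candidate; the function name is illustrative, not code from the PR.

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    # Cut the generated text at the earliest occurrence of any stop string.
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Example: truncate_at_stop("Answer: 42\nQuestion:", ["\nQuestion:", "</s>"]) -> "Answer: 42"
```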
These changes will improve the code's robustness, maintainability, and performance. Overall, this is a great contribution.
/gemini review
Code Review
Thank you for your contribution. This PR adds MindFormers model support to AISBench, which is a great feature. The overall code structure is clear and successfully integrates msrun to launch MindFormers models.
I found a few areas for improvement, mainly the following:

- Multi-node execution bug: in `openicl_infer.py`, the `worker_num` argument is set incorrectly when building the multi-node msrun command, which may cause multi-node jobs to fail.
- Robustness: `mindformers_model.py` contains issues that can lead to runtime errors or unexpected behavior, such as mutating a mutable default argument (see the sketch after this list), unsafe dictionary access, and `assert` statements that should be `ValueError`s.
- Maintainability: the special-case handling added for MindFormers models in `build.py` reduces long-term maintainability.
- Performance: the string post-processing logic in `mindformers_model.py` could be more efficient.
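To illustrate the mutable-default pitfall mentioned in the robustness bullet (a generic, runnable sketch, not the PR's actual signatures):

```python
EOS_TOKEN = "</s>"  # illustrative value

# Buggy: the default list is created once at definition time and shared
# across calls, so tokens appended here leak into later requests.
def generate_buggy(prompts, stopping_criteria=[]):
    stopping_criteria.append(EOS_TOKEN)
    return stopping_criteria

# Fixed: default to None and build a fresh list per call.
def generate_fixed(prompts, stopping_criteria=None):
    stopping_criteria = list(stopping_criteria or [])
    stopping_criteria.append(EOS_TOKEN)
    return stopping_criteria

assert generate_buggy([]) == ["</s>"]
assert generate_buggy([]) == ["</s>", "</s>"]    # surprise: state leaked
assert generate_fixed([]) == generate_fixed([])  # always fresh
```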
Please see my inline comments for specific suggestions. With these issues fixed, the code will be more robust and easier to maintain.
/gemini review
/gemini review
Code Review
Thank you for your contribution. This PR adds MindFormers model support to ais_bench, which is a great feature. The implementation is comprehensive, covering model loading, inference, and multi-device execution.
My suggestions mainly concern maintainability, robustness, and style, and should help further improve code quality:

- Refactoring: in `openicl_infer.py`, the logic for generating msrun and torchrun commands is largely duplicated; refactoring it would improve readability and maintainability (one possible shape is sketched after this list).
- Correctness: in `mindformers_model.py`, the post-processing of generated text may misbehave when multiple stop tokens are present; I have suggested a more robust implementation.
- Code quality: a few other improvements, such as removing unused imports, promoting in-method constants to class constants, and using `ValueError` instead of `assert` for argument validation.
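One possible shape for the deduplication referenced in the refactoring bullet: a single context dict feeding per-launcher command templates. All names below are illustrative; the msrun flag names follow MindSpore's launcher, but this is a sketch rather than the PR's code.

```python
def _torchrun_cmd(ctx: dict) -> str:
    return (f"torchrun --master_port={ctx['port']} "
            f"--nproc_per_node {ctx['nproc_per_node']} "
            f"{ctx['script_path']} {ctx['cfg_path']}")

def _msrun_cmd(ctx: dict) -> str:
    return (f"msrun --worker_num={ctx['worker_num']} "
            f"--local_worker_num={ctx['local_worker_num']} "
            f"--master_port={ctx['port']} "
            f"{ctx['script_path']} {ctx['cfg_path']}")

LAUNCHER_CMD_BUILDERS = {"torchrun": _torchrun_cmd, "msrun": _msrun_cmd}

# Usage: pick the builder once instead of duplicating the f-strings inline.
# command = LAUNCHER_CMD_BUILDERS.get(launcher, _torchrun_cmd)(ctx)
```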
Overall, this is a high-quality submission. I look forward to seeing the tool improve once these changes are merged.
Force-pushed from 2837d12 to 0692edb
Force-pushed from 48bef55 to 766fdbe
Pull request overview
This PR adds first-class support for running offline local evaluation with MindSpore/MindFormers models in AISBench, including adapting the local inference runner to launch distributed jobs with msrun when required.
Changes:
- Add `MindFormerModel` local model wrapper (MindFormers build + checkpoint load + generate).
- Add a MindFormers model config template (`mf_model`) to enable `ais_bench --models mf_model ...`.
- Extend config/task utilities to better handle `model_cfg.type` being a class and to select `msrun` vs `torchrun` via a model class `launcher` attribute.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `ais_bench/benchmark/utils/config/build.py` | Adds helpers to derive the model type name / class from config (supports `type` as a class). |
| `ais_bench/benchmark/tasks/openicl_infer.py` | Adds launcher-aware command building (torchrun vs msrun) for local inference subprocesses. |
| `ais_bench/benchmark/models/local_models/mindformers_model.py` | New MindFormers local model wrapper implementing load + generate. |
| `ais_bench/benchmark/models/local_models/base.py` | Adds a default `launcher = "torchrun"` class attribute for local models. |
| `ais_bench/benchmark/configs/models/mf_models/mf_model.py` | Adds an example config to run MindFormers models via AISBench. |
| `ais_bench/benchmark/models/__init__.py` | Minor formatting/indent fix. |
```python
worker_num=self.num_gpus,
local_worker_num=self.num_gpus,
```
In msrun mode, worker_num / local_worker_num are derived from num_gpus, but the code elsewhere distinguishes num_gpus (resource binding) vs num_procs (process count). If run_cfg.num_procs != run_cfg.num_gpus, this will launch the wrong number of msrun workers. Consider basing these on self.num_procs (or validating they match) to keep semantics consistent with the torchrun branch.
Suggested change:

```diff
-worker_num=self.num_gpus,
-local_worker_num=self.num_gpus,
+worker_num=self.num_procs,
+local_worker_num=self.num_procs,
```
```diff
+model_cls = get_model_cls_from_cfg(self.model_cfg)
+launcher = getattr(model_cls, 'launcher', 'torchrun') if model_cls else 'torchrun'
 if self.num_gpus > 1 and not use_backend and self.nnodes == 1:
-    port = random.randint(12000, 32000)
-    command = (f'torchrun --master_port={port} '
-               f'--nproc_per_node {self.num_procs} '
-               f'{script_path} {cfg_path}')
+    ctx = dict(
+        port=random.randint(12000, 32000),
+        script_path=script_path,
+        cfg_path=cfg_path,
+        worker_num=self.num_gpus,
+        local_worker_num=self.num_gpus,
+        nproc_per_node=self.num_procs,
+        nnodes=1,
+        master_addr=None,
+        node_rank=None,
+    )
+    builder = LAUNCHER_CMD_BUILDERS.get(launcher, LAUNCHER_CMD_BUILDERS["torchrun"])
+    command = builder(ctx)
 elif self.nnodes > 1:
```
The new launcher selection + msrun command builder logic isn't covered by existing unit tests for OpenICLInferTask.get_command (tests currently only assert torchrun behavior). Please add a test case that uses a registered model class (or a stub with launcher = 'msrun') and asserts the generated command contains msrun and correct worker/local_worker counts.
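A sketch of such a test; construction details, attribute names, and the `get_command` signature are assumptions to adapt to the existing torchrun test fixtures:

```python
from ais_bench.benchmark.tasks.openicl_infer import OpenICLInferTask

class _MsrunStub:
    """Stand-in model class carrying the launcher attribute."""
    launcher = "msrun"

def test_get_command_uses_msrun_launcher():
    # Skip full task init for the sketch; set only the fields get_command reads.
    task = OpenICLInferTask.__new__(OpenICLInferTask)
    task.model_cfg = dict(type=_MsrunStub)
    task.num_gpus = 2
    task.num_procs = 2
    task.nnodes = 1
    cmd = task.get_command('cfg.py', '{task_cmd}')  # signature assumed from existing tests
    assert 'msrun' in cmd
    assert '--worker_num=2' in cmd and '--local_worker_num=2' in cmd
```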
```python
import transformers

from ais_bench.benchmark.models.local_models.base import BaseModel
from ais_bench.benchmark.models import APITemplateParser
```
APITemplateParser is imported but not used. If MindFormerModel should use a template parser, wire it up explicitly; otherwise remove the import to avoid confusion.
Suggested change:

```diff
-from ais_bench.benchmark.models import APITemplateParser
```
```python
if isinstance(outputs, dict):
    outputs = outputs.get('sequences', outputs)
    if outputs is None:
        raise ValueError("Model output dictionary is missing 'sequence' key.")
```
The error message says the output dict is missing 'sequence', but the code actually looks for the 'sequences' key. This mismatch will mislead debugging; update the message (or the key) so they are consistent.
Suggested change:

```diff
-raise ValueError("Model output dictionary is missing 'sequence' key.")
+raise ValueError("Model output dictionary is missing 'sequences' key.")
```
```python
models = [
    dict(
        attr="local",  # local or service
        type=MindFormerModel,  # for transformers < 4.33.0; tries AutoModelForCausalLM.from_pretrained first, then falls back to AutoModel.from_pretrained
```
The inline comment for type=MindFormerModel mentions Transformers (<4.33.0) and HuggingFace loading behavior, which is unrelated to MindFormers and can confuse users. Please update the comment to describe MindFormers-specific behavior/requirements (e.g., that it uses yaml_cfg_file + MindFormers checkpoint loading).
Suggested change:

```diff
-        type=MindFormerModel,  # for transformers < 4.33.0; tries AutoModelForCausalLM.from_pretrained first, then falls back to AutoModel.from_pretrained
+        type=MindFormerModel,  # MindFormers model type; loads the model from the yaml_cfg_file config plus a MindFormers checkpoint
```
```python
        seed=None,
        repetition_penalty=1.03,
    ),
    run_cfg=dict(num_gpus=1, num_procs=1),  # multi-device / multi-node settings; the task is launched with torchrun
```
run_cfg comment says the task is launched with torchrun, but MindFormerModel.launcher is set to msrun and OpenICLInferTask now supports msrun. Please update this comment to match the actual launcher to avoid misconfiguration.
Suggested change:

```diff
-    run_cfg=dict(num_gpus=1, num_procs=1),  # multi-device / multi-node settings; the task is launched with torchrun
+    run_cfg=dict(num_gpus=1, num_procs=1),  # multi-device / multi-node settings; the task is launched with msrun
```
```python
import os, sys
from typing import Dict, List, Optional, Union

import numpy as np
import torch
```
os, sys, and torch are imported but never used in this module, which adds noise and can introduce unnecessary dependency constraints. Please remove unused imports to keep the wrapper minimal.
Suggested change:

```diff
-import os, sys
 from typing import Dict, List, Optional, Union
 
 import numpy as np
-import torch
```
```python
assert len(messages) == 1
tokens = self.tokenizer(messages, padding=False, truncation=False, return_tensors='np')
input_ids = tokens['input_ids']
if input_ids.shape[-1] > self.max_seq_len:
    input_ids = np.concatenate([input_ids[:, : self.max_seq_len // 2], input_ids[:, - self.max_seq_len // 2:]], axis=-1)
```
Using assert len(messages) == 1 for input validation is fragile because assertions can be disabled with Python optimizations, and it produces an unhelpful failure for users. Please raise a regular exception (e.g., ValueError/ParameterValueError) with a clear message when mode == 'mid' is used with batch inputs.
Suggested change:

```diff
-assert len(messages) == 1
+if len(messages) != 1:
+    raise ValueError(
+        f"mode 'mid' does not support batch inputs: expected 1 message, got {len(messages)}."
+    )
 tokens = self.tokenizer(messages, padding=False, truncation=False, return_tensors='np')
 input_ids = tokens['input_ids']
 if input_ids.shape[-1] > self.max_seq_len:
-    input_ids = np.concatenate([input_ids[:, : self.max_seq_len // 2], input_ids[:, - self.max_seq_len // 2:]], axis=-1)
+    input_ids = np.concatenate(
+        [input_ids[:, : self.max_seq_len // 2], input_ids[:, - self.max_seq_len // 2:]],
+        axis=-1,
+    )
```
```python
def get_model_type_name(model_cfg: ConfigDict) -> str:
    """Derive a stable model type name from model_cfg (used for branching).

    Supports `type` being either a class or a string.
    """
    t = model_cfg.get("type", "")
    if t is None or t == "":
        return ""
    if hasattr(t, "__name__"):
        return t.__name__
    if isinstance(t, str):
        return t.split(".")[-1] if "." in t else t
    return ""
```
get_model_type_name() is introduced to support model_cfg['type'] being a class, but the existing unit tests for build_model_from_cfg only cover string types. Please add a test that passes a class object in type to ensure logging/validation paths don’t crash (regression protection for this change).
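A sketch of the requested regression test, assuming `get_model_type_name` is importable from the build utilities and accepts any mapping with a `.get` method:

```python
from ais_bench.benchmark.utils.config.build import get_model_type_name

class FakeModel:
    """Stand-in class; only __name__ matters for the type-name derivation."""

def test_type_as_class():
    assert get_model_type_name({"type": FakeModel}) == "FakeModel"

def test_type_as_dotted_string():
    assert get_model_type_name({"type": "pkg.mod.FakeModel"}) == "FakeModel"

def test_type_missing():
    assert get_model_type_name({}) == ""
```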
```diff
@@ -0,0 +1,316 @@
+import os, sys
+from typing import Dict, List, Optional, Union
```
Import of 'Union' is not used.
Suggested change:

```diff
-from typing import Dict, List, Optional, Union
+from typing import Dict, List, Optional
```
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
PR Type
Related Issue
Fixes #(issue ID) / Relates to #(issue ID)
🔍 Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
AISBench currently cannot directly load model weights produced by MindFormers training for offline evaluation, so a unified wrapper for MindFormers/MindSpore models is needed on the AISBench side to make them compatible with evaluation.
Goal: create a model-loading script for the MindFormers framework and a corresponding AISBench config file, so that AISBench can evaluate MindFormers model weights with a command such as `ais_bench --models mf_model --datasets <evaluation task>`.

📝 Modification
Please briefly describe what modification is made in this PR.
1) Add the config file `ais_bench/benchmark/configs/models/mf_models/mf_model.py`. In the AISBench tool's config, set `type=MindFormerModel` to select a MindFormers model. The differences from the HF offline model config are sketched below. Note: `yaml_cfg_path` is a new field that points to the MindFormers YAML config.
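A sketch of the config difference described above; apart from `type` and `yaml_cfg_path`, the field names and the import path are assumed to follow the existing HF template:

```python
from ais_bench.benchmark.models.local_models.mindformers_model import MindFormerModel  # assumed import path

models = [
    dict(
        attr="local",                        # local or service
        type=MindFormerModel,                # selects the MindFormers wrapper instead of the HF one
        yaml_cfg_path="predict_qwen3.yaml",  # new field: path to the MindFormers YAML config
        path="/path/to/checkpoint",          # assumed: checkpoint location, as in HF configs
        max_out_len=512,
        batch_size=1,
        run_cfg=dict(num_gpus=1, num_procs=1),  # launched with msrun for this model type
    )
]
```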
2) Add the model wrapper file `ais_bench/benchmark/models/local_models/mindformers_model.py`, which mainly contains: model build (`load_tokenizer`, `load_model`, `load_checkpoint`); the generate interface, which integrates the MindFormers `text_generate` call plus some post-processing; and `@MODELS.register_module()`, which registers the MindFormers model wrapper class (see the sketch after this item).

3) Modify the AISBench runner logic to adapt the subprocess command for inference tasks: when running `MindFormerModel`, use msrun instead of torchrun.
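A sketch of the registration and launcher override described in 2) and 3); the registry import path is an assumption, and the generate body is elided:

```python
from ais_bench.benchmark.models.local_models.base import BaseModel
from ais_bench.benchmark.registry import MODELS  # assumed import path for the registry

@MODELS.register_module()
class MindFormerModel(BaseModel):
    # Overrides BaseModel's default launcher ("torchrun") so the task runner
    # builds an msrun command for this model type.
    launcher = "msrun"

    def generate(self, inputs, max_out_len):
        # Integrates MindFormers text generation plus stop-string post-processing
        # (body elided in this sketch).
        ...
```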
📐 Associated Test Results
Please provide links to the related test results, such as CI pipelines, test reports, etc.
AISBench demo run results

Black-box validation: HuggingFace weights were converted to MindFormers weights; loading the converted weights with MindFormers yields the same scores as loading the original weights through the HuggingFace flow.

Evaluating the same weights with HuggingFaceModel and MindFormerModel produces results within a 1% margin of error:

Qwen3-0.6B CEval score (single device)

Qwen3-30B-A3B MMLU score (multi-device)
Does the modification introduce changes that break the backward compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
If the modification introduces performance degradation, please describe the impact of the performance degradation and the expected performance improvement.
🌟 Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
✅ Checklist
Before PR:
After PR:
👥 Collaboration Info
🌟 Useful CI Commands

- /gemini review
- /gemini summary
- /gemini help
- /readthedocs build