[TEST] Add smoke test cases for the math and agieval datasets #145

Open
GaoHuaZhang wants to merge 1 commit into AISBench:master from GaoHuaZhang:smoke_add

Collaborator

@GaoHuaZhang GaoHuaZhang commented Feb 11, 2026

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

  • Feature
  • Bugfix
  • Docs
  • CI/CD
  • Refactor
  • Perf
  • Dependency
  • Test-Cases
  • Other

Related Issue
Fixes #(issue ID) / Relates to #(issue ID)

🔍 Motivation

Add smoke tests for the agieval and math accuracy-evaluation scenarios so that CI can quickly regress the run_server_accuracy / llm_datasets_main flow and verify that the configuration and environment are correct.

📝 Modification

  • Add 2 smoke test cases:
    • accuracy_agieval: agieval dataset (the smoke run covers only the agieval-gaokao-chinese subset), vllm-api-general-chat.
    • accuracy_math: math dataset (subset math_prm800k_500), vllm-api-general-chat.
  • Each case contains case.yml, run.sh, and clean.sh, plus datasets and models (vllm_api) configs under ais_bench_configs/. The case directories are accuracy_agieval/ and accuracy_math/ under smoke_tests/test-case/run_server_accuracy/llm_datasets_main/.
  • Config source: the configs in each case are copied from the existing ones under ais_bench/benchmark/configs/datasets/ and ais_bench/benchmark/configs/models/. This PR only copies them into the smoke case directories and wires up run/clean; existing business logic is unchanged.
  • 10 files changed, +214 lines.
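
The per-case layout described above can be checked mechanically. The following is a minimal sketch, not part of this PR: the function name check_case_layout and its messages are hypothetical, and it only assumes the file and directory names listed in the description.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): verify that a smoke case directory
# contains the files and config directories named in the description.
check_case_layout() {
    local case_dir="$1"
    local missing=0
    local name

    # Every case is described as shipping these three files.
    for name in case.yml run.sh clean.sh; do
        if [ ! -f "${case_dir}/${name}" ]; then
            echo "missing file: ${case_dir}/${name}"
            missing=1
        fi
    done

    # Dataset and model configs are described as living under ais_bench_configs/.
    for name in datasets models; do
        if [ ! -d "${case_dir}/ais_bench_configs/${name}" ]; then
            echo "missing directory: ${case_dir}/ais_bench_configs/${name}"
            missing=1
        fi
    done

    return "$missing"
}
```

For example, `check_case_layout smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_math` would return 0 only if all five entries are present.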

📐 Associated Test Results

To be added after the CI run.

⚠️ BC-breaking (Optional)

None. This PR only adds smoke test cases and has no downstream compatibility impact.

⚠️ Performance degradation (Optional)

None.

🌟 Use cases (Optional)

image

✅ Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • The CLA has been signed, and all committers in this PR have signed it.

👥 Collaboration Info

  • Suggested Reviewers: @xxx
  • Relevant Module Owners: @xxx
  • Other Collaboration Notes:

🌟 Useful CI Command

| Command | Introduction |
| --- | --- |
| /gemini review | Performs a code review for the current pull request in its current state by Gemini. |
| /gemini summary | Provides a summary of the current pull request in its current state by Gemini. |
| /gemini help | Displays a list of available Gemini commands. |
| /readthedocs build | Triggers a documentation build for the current pull request in its current state by Read the Docs. |

@gemini-code-assist

Summary of Changes

Hello @GaoHuaZhang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

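This pull request aims to improve the project's test coverage and stability by introducing new smoke test cases for the math and agieval datasets. The cases are configured to use limited data samples and specific model settings so that validation stays fast, while confirming that the integration and basic functionality of key datasets and models within the ais_bench framework work correctly.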

Highlights

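  • New AgiEval smoke test case: adds a smoke test case for the AgiEval dataset to quickly verify its basic functionality and integration.
  • New Math smoke test case: adds a smoke test case for the Math dataset to confirm that its core functionality works in a quick check.
  • Dataset configuration: for the AgiEval and Math smoke tests, the dataset configs are restricted to a single subset or a small sample range ([0:10]) to speed up the run.
  • Model configuration: the AgiEval and Math smoke tests use a vllm_api model config, both with the 'qwen' model and specific max_out_len and batch_size settings.
  • Test scripts and config files: each new smoke test adds case.yml, run.sh, and clean.sh, which define the test case, run the test flow, and clean up the test environment.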

Changelog
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_agieval/ais_bench_configs/datasets/agieval/accuracy_agieval.py
    • Configures the dataset for the AgiEval smoke test, restricted to a single subset and a small test range.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_agieval/ais_bench_configs/models/vllm_api/accuracy_agieval.py
    • Defines the model config for the AgiEval smoke test, using the 'qwen' model and setting max_out_len and batch_size.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_agieval/case.yml
    • Adds the case definition file for the accuracy_agieval smoke test.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_agieval/clean.sh
    • Adds the cleanup script for the AgiEval smoke test, which deletes the temporary log file.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_agieval/run.sh
    • Implements the run script for the AgiEval accuracy smoke test, including environment setup and result verification.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_math/ais_bench_configs/datasets/math/accuracy_math.py
    • Configures the dataset for the Math smoke test, restricted to a small test range.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_math/ais_bench_configs/models/vllm_api/accuracy_math.py
    • Defines the model config for the Math smoke test, using the 'qwen' model and setting max_out_len and batch_size.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_math/case.yml
    • Adds the case definition file for the accuracy_math smoke test.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_math/clean.sh
    • Adds the cleanup script for the Math smoke test, which deletes the temporary log file.
  • smoke_tests/test-case/run_server_accuracy/llm_datasets_main/accuracy_math/run.sh
    • Implements the run script for the Math accuracy smoke test, including environment setup and result verification.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

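This PR adds smoke test cases for the math and agieval datasets, which helps safeguard code quality. While reviewing, I noticed that the run.sh and clean.sh scripts in the accuracy_agieval and accuracy_math cases contain a large amount of duplicated code. To improve maintainability, I strongly recommend extracting the common logic into shared scripts. For example, you could create a generic run.sh that each case's run.sh invokes after setting its case-specific variables. This reduces redundancy and makes future changes easier. Beyond that, I have left specific suggestions on the repeated code blocks in run.sh and on simplifying clean.sh.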
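
One possible shape for the shared script is sketched below. It is only an illustration under assumptions: the file name smoke_common.sh and the function names smoke_case_dir and smoke_clean are hypothetical and do not exist in the repository; the only detail taken from the PR is that cleanup removes tmplog.txt from the case directory.

```shell
#!/bin/bash
# Hypothetical shared library, e.g. saved as smoke_common.sh (name is an assumption).

# Print the directory that contains the given script path.
# Quoting both expansions keeps paths with spaces intact.
smoke_case_dir() {
    dirname "$(readlink -f "$1")"
}

# Remove the temporary log a case leaves next to its scripts.
# rm -f succeeds whether or not the file exists.
smoke_clean() {
    local case_dir="$1"
    rm -f "${case_dir}/tmplog.txt"
}

# A per-case clean.sh could then shrink to roughly:
#   source "<path to>/smoke_common.sh"
#   smoke_clean "$(smoke_case_dir "$0")"
```

With this split, a change to the cleanup rule is made once in the shared file instead of once per case.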

@@ -0,0 +1,4 @@
#!/bin/bash
CUR_DIR=$(dirname $(readlink -f $0))
[ -f "${CUR_DIR}/tmplog.txt" ] && rm -f "${CUR_DIR}/tmplog.txt"

medium

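This line can be simplified. rm -f does not report an error when the file does not exist, so the preceding [ -f ... ] check is redundant.

In addition, this clean.sh is identical to the one in the accuracy_math test case. To avoid duplication, consider managing it in one place, for example by creating a common cleanup script that all test cases use.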

Suggested change
- [ -f "${CUR_DIR}/tmplog.txt" ] && rm -f "${CUR_DIR}/tmplog.txt"
+ rm -f "${CUR_DIR}/tmplog.txt"
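
The two forms also differ in exit status when the file is absent, which matters if the guarded line is the last command in clean.sh and the caller inspects the script's status (whether the smoke harness does so is not stated in this PR). A small self-contained demonstration, using a temporary directory created only for the example:

```shell
#!/bin/bash
# Compare the exit status of the guarded and unguarded forms when the file is absent.
demo_dir=$(mktemp -d)
target="${demo_dir}/tmplog.txt"   # deliberately never created

[ -f "$target" ] && rm -f "$target"
guarded_status=$?                 # 1: the test fails, so the && list reports failure

rm -f "$target"
plain_status=$?                   # 0: rm -f ignores nonexistent operands

echo "guarded=${guarded_status} plain=${plain_status}"   # prints "guarded=1 plain=0"
rm -rf "$demo_dir"
```

So the suggested one-liner is shorter and also leaves a zero exit status after a cleanup that had nothing to delete.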

Comment on lines +51 to +70
if [ ! -f "$LOG_EVAL_OUTPUT_PATH" ];then
    echo "Can't find $LOG_EVAL_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$LOG_INFER_OUTPUT_PATH" ];then
    echo "Can't find $LOG_INFER_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$PREDICTIONS_OUTPUT_PATH" ];then
    echo "Can't find $PREDICTIONS_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$RESULTS_OUTPUT_PATH" ];then
    echo "Can't find $RESULTS_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$SUMMARY_OUTPUT_PATH" ];then
    echo "Can't find $SUMMARY_OUTPUT_PATH"
    exit $ret_failed
fi

medium

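This block repeats the same if statement many times to check whether each file exists. To improve readability and maintainability, consider replacing the repeated checks with a single loop.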

Suggested change
- if [ ! -f "$LOG_EVAL_OUTPUT_PATH" ];then
-     echo "Can't find $LOG_EVAL_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$LOG_INFER_OUTPUT_PATH" ];then
-     echo "Can't find $LOG_INFER_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$PREDICTIONS_OUTPUT_PATH" ];then
-     echo "Can't find $PREDICTIONS_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$RESULTS_OUTPUT_PATH" ];then
-     echo "Can't find $RESULTS_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$SUMMARY_OUTPUT_PATH" ];then
-     echo "Can't find $SUMMARY_OUTPUT_PATH"
-     exit $ret_failed
- fi
+ for output_file in \
+     "$LOG_EVAL_OUTPUT_PATH" \
+     "$LOG_INFER_OUTPUT_PATH" \
+     "$PREDICTIONS_OUTPUT_PATH" \
+     "$RESULTS_OUTPUT_PATH" \
+     "$SUMMARY_OUTPUT_PATH"
+ do
+     if [ ! -f "$output_file" ]; then
+         echo "Can't find $output_file"
+         exit $ret_failed
+     fi
+ done
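
A variant of the loop above, offered only as a sketch: it lists every missing output before failing, which can save a CI round trip when several files are absent at once. The function name report_missing_outputs is hypothetical; the message text and the idea of passing the five output paths come from the suggestion above.

```shell
#!/bin/bash
# Hypothetical variant of the suggested loop: report every missing output file,
# then fail once at the end instead of exiting at the first miss.
report_missing_outputs() {
    local missing=0
    local output_file

    for output_file in "$@"; do
        if [ ! -f "$output_file" ]; then
            echo "Can't find $output_file"
            missing=$((missing + 1))
        fi
    done

    # Succeed only when nothing was missing.
    [ "$missing" -eq 0 ]
}
```

In run.sh this would be called as `report_missing_outputs "$LOG_EVAL_OUTPUT_PATH" "$LOG_INFER_OUTPUT_PATH" "$PREDICTIONS_OUTPUT_PATH" "$RESULTS_OUTPUT_PATH" "$SUMMARY_OUTPUT_PATH" || exit $ret_failed`, keeping the existing exit code.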

@@ -0,0 +1,4 @@
#!/bin/bash
CUR_DIR=$(dirname $(readlink -f $0))
[ -f "${CUR_DIR}/tmplog.txt" ] && rm -f "${CUR_DIR}/tmplog.txt"

medium

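This line can be simplified. rm -f does not report an error when the file does not exist, so the preceding [ -f ... ] check is redundant.

In addition, this clean.sh is identical to the one in the accuracy_agieval test case. To avoid duplication, consider managing it in one place, for example by creating a common cleanup script that all test cases use.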

Suggested change
- [ -f "${CUR_DIR}/tmplog.txt" ] && rm -f "${CUR_DIR}/tmplog.txt"
+ rm -f "${CUR_DIR}/tmplog.txt"

Comment on lines +50 to +69
if [ ! -f "$LOG_EVAL_OUTPUT_PATH" ];then
    echo "Can't find $LOG_EVAL_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$LOG_INFER_OUTPUT_PATH" ];then
    echo "Can't find $LOG_INFER_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$PREDICTIONS_OUTPUT_PATH" ];then
    echo "Can't find $PREDICTIONS_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$RESULTS_OUTPUT_PATH" ];then
    echo "Can't find $RESULTS_OUTPUT_PATH"
    exit $ret_failed
fi
if [ ! -f "$SUMMARY_OUTPUT_PATH" ];then
    echo "Can't find $SUMMARY_OUTPUT_PATH"
    exit $ret_failed
fi

medium

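This block repeats the same if statement many times to check whether each file exists. To improve readability and maintainability, consider replacing the repeated checks with a single loop. The same issue exists in accuracy_agieval/run.sh.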

Suggested change
- if [ ! -f "$LOG_EVAL_OUTPUT_PATH" ];then
-     echo "Can't find $LOG_EVAL_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$LOG_INFER_OUTPUT_PATH" ];then
-     echo "Can't find $LOG_INFER_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$PREDICTIONS_OUTPUT_PATH" ];then
-     echo "Can't find $PREDICTIONS_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$RESULTS_OUTPUT_PATH" ];then
-     echo "Can't find $RESULTS_OUTPUT_PATH"
-     exit $ret_failed
- fi
- if [ ! -f "$SUMMARY_OUTPUT_PATH" ];then
-     echo "Can't find $SUMMARY_OUTPUT_PATH"
-     exit $ret_failed
- fi
+ for output_file in \
+     "$LOG_EVAL_OUTPUT_PATH" \
+     "$LOG_INFER_OUTPUT_PATH" \
+     "$PREDICTIONS_OUTPUT_PATH" \
+     "$RESULTS_OUTPUT_PATH" \
+     "$SUMMARY_OUTPUT_PATH"
+ do
+     if [ ! -f "$output_file" ]; then
+         echo "Can't find $output_file"
+         exit $ret_failed
+     fi
+ done
