[CI] Reorg test file and fix moonlight oom#593
Conversation
Summary of ChangesHello @guapisolo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on refining the continuous integration (CI) test infrastructure by eliminating a significant number of redundant test files and updating configurations for existing tests. The changes aim to improve the efficiency and relevance of the CI pipeline, ensuring that tests reflect current model and hardware setups, and reducing unnecessary overhead. This cleanup is part of an ongoing effort to maintain a robust and optimized testing environment. Highlights
Changelog
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request primarily reorganizes e2e test files and addresses an out-of-memory (OOM) issue with the moonlight model tests. The changes to fix the OOM, such as increasing tensor parallelism and adjusting sglang's static memory fraction, are appropriate. The GPU reallocation in the qwen async test is also a reasonable configuration adjustment. However, a large number of test files have been removed without corresponding additions in this PR. While this is labeled as a reorganization, the absence of the new files raises concerns about a potential loss of test coverage. It's important to ensure these tests are properly relocated or their functionality is otherwise preserved. I've also left a couple of comments regarding a potentially confusing variable name used for GPU memory configuration.
|
|
||
| sglang_args = ( | ||
| "--rollout-num-gpus-per-engine 2 " "--sglang-mem-fraction-static 0.8 " "--sglang-max-running-requests 512 " | ||
| "--rollout-num-gpus-per-engine 2 " f"--sglang-mem-fraction-static {0.7 if TIGHT_HOST_MEMORY else 0.8} " "--sglang-max-running-requests 512 " |
There was a problem hiding this comment.
The variable TIGHT_HOST_MEMORY is used to configure --sglang-mem-fraction-static, which controls GPU memory allocation. This is confusing because the variable name implies it's related to host (CPU) memory, while the parameter affects device (GPU) memory. Other tests in this repository use a variable named TIGHT_DEVICE_MEMORY for this purpose. For clarity and consistency, it would be better to use a variable that accurately reflects its purpose, like TIGHT_DEVICE_MEMORY. This would involve changing the variable definition on line 5 as well.
|
|
||
| sglang_args = ( | ||
| "--rollout-num-gpus-per-engine 2 " "--sglang-mem-fraction-static 0.8 " "--sglang-max-running-requests 512 " | ||
| "--rollout-num-gpus-per-engine 2 " f"--sglang-mem-fraction-static {0.7 if TIGHT_HOST_MEMORY else 0.8} " "--sglang-max-running-requests 512 " |
There was a problem hiding this comment.
The variable TIGHT_HOST_MEMORY is used to configure --sglang-mem-fraction-static, which controls GPU memory allocation. This is confusing because the variable name implies it's related to host (CPU) memory, while the parameter affects device (GPU) memory. Other tests in this repository use a variable named TIGHT_DEVICE_MEMORY for this purpose. For clarity and consistency, it would be better to use a variable that accurately reflects its purpose, like TIGHT_DEVICE_MEMORY. This would involve changing the variable definition on line 5 as well.
There was a problem hiding this comment.
It looks good, but I have some advice.
Comments on deleting CI under image tags:
For the CI under image tags, I planned to run all tests under those images to validate the dev Docker, even though this may leave duplicate CI in different tags.
If we adopt the PR above:
We may need to update/add tags in the Docker release workflow (it seems this has not been added yet).
I think it’s better to add/update these tags every time we release a new Docker image.
There was a problem hiding this comment.
It looks good, but I have some advice.
Comments on deleting CI under image tags:
For the CI under image tags, I planned to run all tests under those images to validate the dev Docker, even though this may leave duplicate CI in different tags.
If we adopt the PR above:
We may need to update/add tags in the Docker release workflow (it seems this has not been added yet - #573).
I think it’s better to add/update these tags every time we release a new Docker image.
TODO: Replace moonlight-16B-A3B test by GLM-4.7-flash after new sglang docker bump and #566 merge. It's a 30b model with mla and mtp.
Upd: new dev docker has been bumped. wip #566