Summary of Changes

Hello @guapisolo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces foundational support for GPT-OSS models within the system, specifically integrating them for SGLang inference and Megatron training. A key enhancement is the introduction of a configurable pad_token_id.
Code Review
This pull request introduces support for gpt-oss models by making pad_token_id configurable and adding new scripts for gpt-oss-20b. A high-severity security vulnerability was found in scripts/run-gpt-oss-20b.sh due to the exposure of the WandB API key as a command-line argument; it is strongly recommended to use environment variables for secrets. Furthermore, the script scripts/run-gpt-oss-20b.sh requires improvements in process cleanup and handling of undefined variables for robustness.
scripts/run-gpt-oss-20b.sh
```shell
   --use-wandb
   --wandb-project miles-mgt-oss
   --wandb-group "20b-bf16"
   --wandb-key ${WANDB_API_KEY}
```
The script passes the WandB API key as a command-line argument: --wandb-key ${WANDB_API_KEY}. This is a high-risk practice because command-line arguments are often visible to other users on the same system via process listing commands (e.g., ps aux). Exposing the API key in this manner could lead to unauthorized access to the WandB project, potentially allowing an attacker to view or manipulate experiment data. The WandB API key should be passed through an environment variable, which is more secure than a command-line argument. The wandb library automatically reads the WANDB_API_KEY environment variable. Remove the --wandb-key argument from the script and ensure the WANDB_API_KEY environment variable is set in the execution environment.
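A minimal sketch of the recommended pattern (the key value below is a placeholder, not a real credential): export the variable so child processes inherit it without it ever appearing in their argv.

```shell
# Sketch, not from the PR: export the key instead of passing --wandb-key.
# wandb reads WANDB_API_KEY from the environment automatically, and
# environment variables are not visible in `ps aux` the way argv is.
export WANDB_API_KEY="dummy-key-for-demo"   # in practice, load from a secrets store

# Child processes inherit the variable without it appearing in their argv:
sh -c 'echo "key length: ${#WANDB_API_KEY}"'   # prints "key length: 18"
```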
```shell
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python
```
The process cleanup logic at the start of the script is overly aggressive and contains redundancies. Using pkill -9 python is particularly risky as it can terminate unrelated Python processes on the system. It's safer to rely on ray stop --force for Ray processes and be more specific with other pkill commands. The repeated commands are also unnecessary.
Suggested replacement for the cleanup block:

```shell
# for rerun the task
ray stop --force
pkill -f sglang
sleep 3
pkill -9 -f sglang
```
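To make the suggestion concrete, targeted cleanup could be factored into a small helper; the function name and escalation delay below are illustrative, not from the PR.

```shell
# Illustrative helper: stop only processes whose full command line matches
# a specific pattern (-f), trying SIGTERM before escalating to SIGKILL.
cleanup() {
    pattern="$1"
    pkill -f "$pattern" 2>/dev/null || true     # polite stop first
    sleep 1
    pkill -9 -f "$pattern" 2>/dev/null || true  # force-kill stragglers
}

# Usage: cleanup "sglang" targets only matching command lines, unlike
# `pkill -9 python`, which would kill every Python process on the machine.
```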
```shell
   # Must use --qkv-format bshd for the fused backend to work with this model's attention pattern.
   --qkv-format bshd
   --attention-backend fused
)
```
The EVAL_ARGS array is used in the ray job submit command but is not defined in the script. This will cause evaluation-related arguments to be missed. You should define EVAL_ARGS, even if it's empty, to prevent potential errors and improve script clarity.
Suggested replacement:

```shell
)

EVAL_ARGS=(
    # Add evaluation arguments here, e.g.:
    # --eval-interval 100
)
```
```shell
set -ex

# will prevent ray from buffering stdout/stderr
export PYTHONBUFFERED=16
```
The environment variable PYTHONBUFFERED is not a standard way to control Python's output buffering. The correct variable for this purpose is PYTHONUNBUFFERED. Setting it to 1 will disable output buffering, which is generally desired for logging in distributed environments like Ray.
Suggested replacement:

```shell
export PYTHONUNBUFFERED=1
```
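One way to check the effect: any non-empty value of PYTHONUNBUFFERED behaves like `python -u`, putting the stdout text layer into write-through mode so log lines are flushed immediately.

```shell
# With PYTHONUNBUFFERED set, sys.stdout writes through immediately,
# so worker log lines are not held in a buffer.
PYTHONUNBUFFERED=1 python3 -c 'import sys; print(sys.stdout.write_through)'   # prints True
```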
```shell
        \"env_vars\": {
            \"PYTHONPATH\": \"/root/Megatron-LM/\",
            \"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
            \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
```
The HAS_NVLINK variable is used without a default value. If this variable is not set in the execution environment, it will expand to an empty string, potentially causing an invalid configuration for NCCL_NVLS_ENABLE. It's safer to provide a default value.
Suggested replacement (keeping the backslash escapes, since this line sits inside the script's quoted JSON string):

```shell
            \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK:-0}\"
```
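The `${var:-default}` form is standard POSIX parameter expansion: it substitutes the default only when the variable is unset or empty, which keeps NCCL_NVLS_ENABLE valid in both cases.

```shell
unset HAS_NVLINK
echo "NCCL_NVLS_ENABLE=${HAS_NVLINK:-0}"   # prints NCCL_NVLS_ENABLE=0
HAS_NVLINK=1
echo "NCCL_NVLS_ENABLE=${HAS_NVLINK:-0}"   # prints NCCL_NVLS_ENABLE=1
```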
Exists:
Steps:
Will not cover in this PR: