
[feat] gpt-oss support v1 #585

Draft
guapisolo wants to merge 5 commits into radixark:main from guapisolo:feat/gpt_oss_fp4

Conversation


guapisolo (Collaborator) commented Feb 11, 2026

Exists:

  • SGLang full gpt-oss inference support
  • Megatron gpt-oss training support (bf16, mxfp4)

Not covered in this PR:

  • fp4 ckpt conversion, which affects:
    • Ckpt load/save. The current mbridge does not support native low-precision ckpt conversion, and Megatron-Bridge cannot do the ckpt conversion directly either.
    • KL loss.
  • Megatron-Bridge needs further investigation.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @guapisolo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces foundational support for GPT-OSS models within the system, specifically integrating them for SGLang inference and Megatron training. A key enhancement is the introduction of a configurable pad_token_id, which provides greater flexibility in handling token padding. Additionally, new scripts have been added to streamline the setup and execution of the gpt-oss-20b model, facilitating its use in various training and evaluation scenarios.

Highlights

  • GPT-OSS Support: Introduced initial support for GPT-OSS models, enabling both SGLang inference and Megatron training capabilities.
  • Configurable Padding Token ID: Implemented a configurable pad_token_id argument, allowing users to specify the padding token ID during training and inference processes.
  • New Model Scripts: Added dedicated shell scripts for configuring and running the gpt-oss-20b model, including specific architecture and training parameters.
Changelog
  • miles/backends/fsdp_utils/actor.py
    • Updated calls to get_batch within _compute_log_prob and _train_core to pass the newly introduced pad_token_id argument.
  • miles/backends/megatron_utils/model.py
    • Updated calls to get_batch within forward_step to pass the new pad_token_id argument.
  • miles/backends/training_utils/data.py
    • Modified the get_batch function to accept pad_token_id as a parameter.
    • Removed the hardcoded pad_token_id = 0 assignment within the get_batch function.
  • miles/utils/arguments.py
    • Added a new command-line argument --pad-token-id to allow configuration of the padding token ID.
  • scripts/models/gpt-oss-20b.sh
    • Added a new script defining the model architecture arguments for the gpt-oss-20b model.
  • scripts/run-gpt-oss-20b.sh
    • Added a new script to orchestrate the training run for the gpt-oss-20b model, including various performance, optimizer, and SGLang arguments.
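
A usage sketch for the new flag follows; everything here other than --pad-token-id itself, and the old default of 0 removed from get_batch, is an assumed script shape, not code from this PR:

# Hypothetical run-script excerpt: pass the pad token id explicitly.
# 0 reproduces the value that get_batch previously hardcoded.
TRAIN_ARGS=(
   --pad-token-id 0   # set to the actual pad token id of the model's tokenizer
)
python3 train.py "${TRAIN_ARGS[@]}"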
Activity
  • No human activity has occurred on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for gpt-oss models by making pad_token_id configurable and adding new scripts for gpt-oss-20b. A high-severity security vulnerability was found in scripts/run-gpt-oss-20b.sh due to the exposure of the WandB API key as a command-line argument; it is strongly recommended to use environment variables for secrets. Furthermore, the script scripts/run-gpt-oss-20b.sh requires improvements in process cleanup and handling of undefined variables for robustness.

--use-wandb
--wandb-project miles-mgt-oss
--wandb-group "20b-bf16"
--wandb-key ${WANDB_API_KEY}

Severity: high (security)

The script passes the WandB API key as a command-line argument: --wandb-key ${WANDB_API_KEY}. This is a high-risk practice because command-line arguments are often visible to other users on the same system via process listing commands (e.g., ps aux). Exposing the API key in this manner could lead to unauthorized access to the WandB project, potentially allowing an attacker to view or manipulate experiment data. The WandB API key should be passed through an environment variable, which is more secure than a command-line argument. The wandb library automatically reads the WANDB_API_KEY environment variable. Remove the --wandb-key argument from the script and ensure the WANDB_API_KEY environment variable is set in the execution environment.
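
A minimal sketch of the recommended fix, assuming a plain bash launcher (the surrounding lines are illustrative, not the script's actual contents):

# Provide the key via the environment, e.g. from a CI secret store:
export WANDB_API_KEY="<set outside the repo; never commit it>"

WANDB_ARGS=(
   --use-wandb
   --wandb-project miles-mgt-oss
   --wandb-group "20b-bf16"
   # --wandb-key removed: the wandb client reads WANDB_API_KEY on its own
)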

Comment on lines +4 to +11
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python

Severity: high

The process cleanup logic at the start of the script is overly aggressive and contains redundancies. Using pkill -9 python is particularly risky as it can terminate unrelated Python processes on the system. It's safer to rely on ray stop --force for Ray processes and be more specific with other pkill commands. The repeated commands are also unnecessary.

Suggested change:

- pkill -9 sglang
- sleep 3
- ray stop --force
- pkill -9 ray
- pkill -9 python
- sleep 3
- pkill -9 ray
- pkill -9 python
+ # for rerun the task
+ ray stop --force
+ pkill -f sglang
+ sleep 3
+ pkill -9 -f sglang

# Must use --qkv-format bshd for the fused backend to work with this model's attention pattern.
--qkv-format bshd
--attention-backend fused
)

Severity: high

The EVAL_ARGS array is used in the ray job submit command but is not defined in the script. This will cause evaluation-related arguments to be missed. You should define EVAL_ARGS, even if it's empty, to prevent potential errors and improve script clarity.

Suggested change:

- )
+ )
+ EVAL_ARGS=(
+     # Add evaluation arguments here, e.g.:
+     # --eval-interval 100
+ )
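
For context on the failure mode (the command shape below is an assumption about how the script consumes the array): in bash, expanding an undefined array is not an error unless set -u is active; it simply contributes nothing, so evaluation arguments would be dropped silently rather than loudly.

unset EVAL_ARGS
echo ray job submit -- python3 train.py "${EVAL_ARGS[@]}"   # array silently expands to nothing
EVAL_ARGS=(--eval-interval 100)
echo ray job submit -- python3 train.py "${EVAL_ARGS[@]}"   # now the flag is forwarded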

set -ex

# will prevent ray from buffering stdout/stderr
export PYTHONBUFFERED=16

Severity: medium

The environment variable PYTHONBUFFERED is not a standard way to control Python's output buffering. The correct variable for this purpose is PYTHONUNBUFFERED. Setting it to 1 will disable output buffering, which is generally desired for logging in distributed environments like Ray.

Suggested change:

- export PYTHONBUFFERED=16
+ export PYTHONUNBUFFERED=1
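
A quick way to observe the difference (an illustrative one-liner, not part of the PR): when stdout is a pipe, Python block-buffers, so the first command's lines all appear at once at exit, while with PYTHONUNBUFFERED=1 they stream one per second.

python3 -c 'import time
for i in range(3): print(i); time.sleep(1)' | cat

PYTHONUNBUFFERED=1 python3 -c 'import time
for i in range(3): print(i); time.sleep(1)' | cat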

\"env_vars\": {
\"PYTHONPATH\": \"/root/Megatron-LM/\",
\"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
\"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"

Severity: medium

The HAS_NVLINK variable is used without a default value. If this variable is not set in the execution environment, it will expand to an empty string, potentially causing an invalid configuration for NCCL_NVLS_ENABLE. It's safer to provide a default value.

Suggested change:

- \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
+ \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK:-0}\"
