[AMD] Unify run-qwen3-4B.sh to support both AMD and NVIDIA GPUs #597

Open

lizamd wants to merge 3 commits into radixark:main from lizamd:unify-qwen3-4b-amd-nvidia

Conversation

@lizamd
Contributor

lizamd commented Feb 13, 2026

Auto-detect GPU vendor (/dev/kfd or torch.version.hip for AMD, nvidia-smi for NVIDIA) and conditionally apply platform-specific settings:

  • AMD: HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, --no-gradient-accumulation-fusion, --no-offload-train/rollout
  • NVIDIA: NVLink detection, NCCL_NVLS_ENABLE
  • Both: dynamic Megatron-LM path detection, configurable MODEL_DIR/DATA_DIR

This eliminates the need for a separate run-qwen3-4B-amd.sh script.
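
For illustration, here is a minimal sketch of the auto-detection described above; the variable names and the HIP_VISIBLE_DEVICES default are assumptions, not necessarily the script's exact code:

# AMD is detected via the KFD device node or a ROCm build of torch;
# NVIDIA via a usable nvidia-smi.
if [ -e /dev/kfd ] || python3 -c 'import torch, sys; sys.exit(0 if torch.version.hip else 1)' 2>/dev/null; then
  GPU_VENDOR=amd
  export RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES=1
  export HIP_VISIBLE_DEVICES=${HIP_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7}  # assumed default
elif command -v nvidia-smi >/dev/null 2>&1; then
  GPU_VENDOR=nvidia
else
  echo "No supported GPU detected" >&2
  exit 1
fi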

@gemini-code-assist
Contributor

Summary of Changes

Hello @lizamd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the run-qwen3-4B.sh script by introducing robust GPU vendor detection and applying tailored configurations for both AMD and NVIDIA environments. The changes streamline the workflow for users operating on different hardware, making the script more versatile and user-friendly. It also improves path management by making model and data directories configurable and dynamically locating the Megatron-LM installation.

Highlights

  • Unified GPU Support: The run-qwen3-4B.sh script now automatically detects the GPU vendor (AMD or NVIDIA) and applies platform-specific configurations, eliminating the need for separate scripts.
  • Platform-Specific Optimizations: For AMD GPUs, the script sets HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, and disables gradient accumulation fusion and offloading. For NVIDIA, it continues to detect NVLink and configure NCCL_NVLS_ENABLE.
  • Configurable Paths: Introduced MODEL_DIR and DATA_DIR environment variables, allowing users to easily configure the paths for model checkpoints and prompt data.
  • Dynamic Megatron-LM Path Detection: The script now dynamically detects the installation path of Megatron-LM, improving flexibility and reducing hardcoded dependencies.
Changelog
  • scripts/run-qwen3-4B.sh
    • Implemented GPU vendor detection logic (AMD via /dev/kfd or torch.version.hip, NVIDIA via nvidia-smi).
    • Introduced and utilized MODEL_DIR and DATA_DIR environment variables for configurable paths.
    • Added AMD-specific environment variables (RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES) and training arguments (--no-gradient-accumulation-fusion, --no-offload-train, --no-offload-rollout).
    • Updated ray start and ray job submit commands to use dynamically determined GPU counts (NUM_GPUS).
    • Modified PYTHONPATH in the Ray runtime environment to dynamically detect the Megatron-LM installation path (see the sketch after this changelog).
    • Updated checkpoint, rollout, and evaluation arguments to reference the new MODEL_DIR and DATA_DIR variables.
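
For reference, a hedged sketch of the NUM_GPUS-based Ray startup and the dynamic Megatron-LM lookup listed above; the import-based path detection shown here is an assumption, not necessarily the PR's exact approach:

# Locate Megatron-LM by importing it and walking up from megatron/__init__.py.
MEGATRON_PATH=$(python3 -c 'import megatron, os; print(os.path.dirname(os.path.dirname(megatron.__file__)))' 2>/dev/null)
export PYTHONPATH="${MEGATRON_PATH}${PYTHONPATH:+:$PYTHONPATH}"

# Pass the detected GPU count to Ray instead of a hardcoded value.
ray start --head --num-gpus "$NUM_GPUS"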

gemini-code-assist bot left a comment

Code Review

This pull request does a great job of unifying the run script to support both AMD and NVIDIA GPUs, which improves maintainability by removing a duplicate script. The introduction of platform detection, configurable paths, and dynamic discovery of Megatron-LM is a solid improvement.

I have a couple of suggestions to make the script even more robust and readable:

  • The logic for determining NUM_GPUS for NVIDIA is currently hardcoded, unlike the dynamic approach for AMD. I've suggested a change to determine this dynamically, which would make the script more flexible across different NVIDIA hardware setups.
  • I've also suggested a minor formatting change to an if statement to improve readability.

Overall, these are excellent changes that make the script more generic and easier to use.

  HAS_NVLINK=0
else
  NVLINK_COUNT=$(nvidia-smi topo -m 2>/dev/null | grep -o 'NV[0-9][0-9]*' | wc -l)
  if [ "$NVLINK_COUNT" -gt 0 ]; then HAS_NVLINK=1; else HAS_NVLINK=0; fi
Contributor

medium

For better readability and maintainability, it's recommended to expand this compact if-then-else statement into a multi-line block. This makes the logic clearer at a glance.

Suggested change
if [ "$NVLINK_COUNT" -gt 0 ]; then HAS_NVLINK=1; else HAS_NVLINK=0; fi
if [ "$NVLINK_COUNT" -gt 0 ]; then
HAS_NVLINK=1
else
HAS_NVLINK=0
fi

NVLINK_COUNT=$(nvidia-smi topo -m 2>/dev/null | grep -o 'NV[0-9][0-9]*' | wc -l)
if [ "$NVLINK_COUNT" -gt 0 ]; then HAS_NVLINK=1; else HAS_NVLINK=0; fi
echo "HAS_NVLINK: $HAS_NVLINK (detected $NVLINK_COUNT NVLink references)"
NUM_GPUS=8
Contributor

medium

Hardcoding NUM_GPUS=8 for NVIDIA is less flexible and inconsistent with the dynamic calculation for AMD GPUs. It's better to determine the number of GPUs dynamically for NVIDIA as well. This can be done by checking the CUDA_VISIBLE_DEVICES environment variable or using nvidia-smi. This makes the script more robust and adaptable to different environments.

Suggested change
NUM_GPUS=8
if [ -n "${CUDA_VISIBLE_DEVICES-}" ]; then
  NUM_GPUS=$(echo "${CUDA_VISIBLE_DEVICES}" | tr ',' '\n' | wc -l)
else
  # Fall back to nvidia-smi if CUDA_VISIBLE_DEVICES is not set. --query-gpu=count
  # prints the count once per GPU, so take only the first line, with a final fallback to 8.
  NUM_GPUS=$(nvidia-smi --query-gpu=count --format=csv,noheader 2>/dev/null | head -n 1)
  NUM_GPUS=${NUM_GPUS:-8}
fi
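
An even simpler alternative, which a later commit in this PR adopts (see the commit notes below), is to count the lines of nvidia-smi -L; a minimal sketch:

NUM_GPUS=$(nvidia-smi -L 2>/dev/null | wc -l)
if [ "$NUM_GPUS" -eq 0 ]; then
  NUM_GPUS=8  # conservative fallback when nvidia-smi reports no GPUs
fi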

@yushengsu-thu yushengsu-thu self-assigned this Feb 13, 2026
@yushengsu-thu yushengsu-thu self-requested a review February 13, 2026 19:47
PLATFORM_TRAIN_ARGS=()
if [ "$GPU_VENDOR" = "amd" ]; then
  # Apex not available on ROCm
  MISC_ARGS+=(--no-gradient-accumulation-fusion)
Collaborator

We do not need this: MISC_ARGS+=(--no-gradient-accumulation-fusion)
The Megatron build and related dependencies inside the AMD Docker image already support gradient accumulation fusion.

cc. @zyzshishui to confirm this.

Contributor

yes, no need

# Apex not available on ROCm
MISC_ARGS+=(--no-gradient-accumulation-fusion)
# Disable offloading (torch_memory_saver may not support ROCm; MI300X has 192GB HBM)
PLATFORM_TRAIN_ARGS+=(--no-offload-train --no-offload-rollout)
Collaborator

We do not need this: PLATFORM_TRAIN_ARGS+=(--no-offload-train --no-offload-rollout)
torch_memory_saver has already resolved this issue, and the AMD Docker image already supports it.

cc. @zyzshishui to confirm this.

Contributor

correct, can be removed

lizamd and others added 2 commits February 16, 2026 19:03
- Use dynamic NVIDIA GPU count via nvidia-smi -L instead of hardcoded 8
- Remove --no-gradient-accumulation-fusion (AMD Docker now supports it)
- Remove --no-offload-train/rollout (torch_memory_saver resolved for ROCm)
- Expand compact if/else to multi-line for readability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevent driver-level deadlocks when offload is enabled on AMD GPUs,
consistent with PR radixark#588 changes to run-qwen3-4B-amd.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>