[AMD] Unify run-qwen3-4B.sh to support both AMD and NVIDIA GPUs #597
lizamd wants to merge 3 commits into radixark:main
Conversation
Auto-detect GPU vendor (/dev/kfd or torch.version.hip for AMD, nvidia-smi for NVIDIA) and conditionally apply platform-specific settings:

- AMD: HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, --no-gradient-accumulation-fusion, --no-offload-train/rollout
- NVIDIA: NVLink detection, NCCL_NVLS_ENABLE
- Both: dynamic Megatron-LM path detection, configurable MODEL_DIR/DATA_DIR

This eliminates the need for a separate run-qwen3-4B-amd.sh script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
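For reviewers skimming the description, here is a minimal sketch of the vendor auto-detection outlined above. The function name, fallback ordering, and echoed values are illustrative assumptions, not necessarily the exact code added in this PR.

```bash
# Illustrative sketch only -- not the exact code in this PR.
# AMD is detected via /dev/kfd or torch.version.hip; NVIDIA via a working nvidia-smi.
detect_gpu_vendor() {
    if [ -e /dev/kfd ] || \
       python3 -c "import torch, sys; sys.exit(0 if torch.version.hip else 1)" 2>/dev/null; then
        echo "amd"
    elif nvidia-smi >/dev/null 2>&1; then
        echo "nvidia"
    else
        echo "unknown"
    fi
}

GPU_VENDOR=$(detect_gpu_vendor)
echo "GPU_VENDOR: $GPU_VENDOR"
```

The rest of the script can then branch on GPU_VENDOR to export the AMD- or NVIDIA-specific environment variables listed above.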
Code Review
This pull request does a great job of unifying the run script to support both AMD and NVIDIA GPUs, which improves maintainability by removing a duplicate script. The introduction of platform detection, configurable paths, and dynamic discovery of Megatron-LM is a solid improvement.
I have a couple of suggestions to make the script even more robust and readable:
- The logic for determining `NUM_GPUS` for NVIDIA is currently hardcoded, unlike the dynamic approach for AMD. I've suggested a change to determine this dynamically, which would make the script more flexible across different NVIDIA hardware setups.
- I've also suggested a minor formatting change to an `if` statement to improve readability.
Overall, these are excellent changes that make the script more generic and easier to use.
scripts/run-qwen3-4B.sh
Outdated
    HAS_NVLINK=0
else
    NVLINK_COUNT=$(nvidia-smi topo -m 2>/dev/null | grep -o 'NV[0-9][0-9]*' | wc -l)
    if [ "$NVLINK_COUNT" -gt 0 ]; then HAS_NVLINK=1; else HAS_NVLINK=0; fi
There was a problem hiding this comment.
For better readability and maintainability, it's recommended to expand this compact if-then-else statement into a multi-line block. This makes the logic clearer at a glance.
| if [ "$NVLINK_COUNT" -gt 0 ]; then HAS_NVLINK=1; else HAS_NVLINK=0; fi | |
| if [ "$NVLINK_COUNT" -gt 0 ]; then | |
| HAS_NVLINK=1 | |
| else | |
| HAS_NVLINK=0 | |
| fi |
scripts/run-qwen3-4B.sh
Outdated
    NVLINK_COUNT=$(nvidia-smi topo -m 2>/dev/null | grep -o 'NV[0-9][0-9]*' | wc -l)
    if [ "$NVLINK_COUNT" -gt 0 ]; then HAS_NVLINK=1; else HAS_NVLINK=0; fi
    echo "HAS_NVLINK: $HAS_NVLINK (detected $NVLINK_COUNT NVLink references)"
    NUM_GPUS=8
Hardcoding NUM_GPUS=8 for NVIDIA is less flexible and inconsistent with the dynamic calculation for AMD GPUs. It's better to determine the number of GPUs dynamically for NVIDIA as well. This can be done by checking the CUDA_VISIBLE_DEVICES environment variable or using nvidia-smi. This makes the script more robust and adaptable to different environments.
Suggested change:
-   NUM_GPUS=8
+   if [ -n "${CUDA_VISIBLE_DEVICES-}" ]; then
+       NUM_GPUS=$(echo "${CUDA_VISIBLE_DEVICES}" | tr ',' '\n' | wc -l)
+   else
+       # Fallback to nvidia-smi if CUDA_VISIBLE_DEVICES is not set, with a final fallback to 8.
+       NUM_GPUS=$(nvidia-smi --query-gpu=count --format=csv,noheader 2>/dev/null || echo 8)
+   fi
scripts/run-qwen3-4B.sh
Outdated
    PLATFORM_TRAIN_ARGS=()
    if [ "$GPU_VENDOR" = "amd" ]; then
        # Apex not available on ROCm
        MISC_ARGS+=(--no-gradient-accumulation-fusion)
We do not need this: MISC_ARGS+=(--no-gradient-accumulation-fusion)
Megatron and the related components in the AMD Docker image already support this.
cc @zyzshishui to confirm.
scripts/run-qwen3-4B.sh
Outdated
    # Apex not available on ROCm
    MISC_ARGS+=(--no-gradient-accumulation-fusion)
    # Disable offloading (torch_memory_saver may not support ROCm; MI300X has 192GB HBM)
    PLATFORM_TRAIN_ARGS+=(--no-offload-train --no-offload-rollout)
We do not need this: PLATFORM_TRAIN_ARGS+=(--no-offload-train --no-offload-rollout)
torch_memory_saver has already resolved this issue, and the AMD Docker image already supports it.
cc @zyzshishui to confirm.
correct, can be removed
- Use dynamic NVIDIA GPU count via nvidia-smi -L instead of hardcoded 8
- Remove --no-gradient-accumulation-fusion (AMD Docker now supports it)
- Remove --no-offload-train/rollout (torch_memory_saver resolved for ROCm)
- Expand compact if/else to multi-line for readability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
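As a rough idea of what the first item in this commit looks like, here is a hedged sketch of counting NVIDIA GPUs with nvidia-smi -L; the exact line that landed in run-qwen3-4B.sh may differ.

```bash
# Sketch of dynamic NVIDIA GPU counting via `nvidia-smi -L` (illustrative; the
# committed script may differ). `nvidia-smi -L` prints one "GPU N: ..." line per device.
NUM_GPUS=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU' || true)
if [ "${NUM_GPUS:-0}" -eq 0 ]; then
    NUM_GPUS=8   # conservative fallback when detection fails
fi
echo "NUM_GPUS: $NUM_GPUS"
```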
Prevent driver-level deadlocks when offload is enabled on AMD GPUs, consistent with the changes PR radixark#588 made to run-qwen3-4B-amd.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>