
Unify qwen3-4b.sh script for AMD and NVIDIA #1

Open

lizamd wants to merge 77 commits into main from unify-qwen3-4b-amd-nvidia

Conversation

@lizamd (Owner) commented Feb 13, 2026

Summary

  • Auto-detect GPU vendor (/dev/kfd or torch.version.hip for AMD, nvidia-smi for NVIDIA)
  • Conditionally apply platform-specific args:
    • AMD: HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, --no-gradient-accumulation-fusion, --no-offload-train/rollout
    • NVIDIA: NVLink detection, NCCL_NVLS_ENABLE
  • Dynamic Megatron-LM path detection for both platforms (from radixark/miles#506, "Fix PYTHONPATH for AMD container Megatron-LM location")
  • Configurable MODEL_DIR/DATA_DIR env vars with /root defaults
  • Dynamic NUM_GPUS detection instead of hardcoded 8

This eliminates the need for a separate run-qwen3-4B-amd.sh.
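
A minimal bash sketch of the vendor auto-detection described above. The `detect_gpu_vendor` helper and the `GPU_VENDOR` variable are illustrative names, not necessarily the script's actual identifiers; only the probes (/dev/kfd, torch.version.hip, nvidia-smi) come from this PR description.

```bash
# Sketch only: helper and variable names are assumptions.
detect_gpu_vendor() {
    # AMD: the ROCm kernel driver exposes /dev/kfd, and ROCm PyTorch builds
    # set torch.version.hip to a non-None value.
    if [ -e /dev/kfd ] || python3 -c "import sys, torch; sys.exit(0 if torch.version.hip else 1)" 2>/dev/null; then
        echo "amd"
    # NVIDIA: nvidia-smi is present on systems with the NVIDIA driver stack.
    elif command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia"
    else
        echo "unknown"
    fi
}

GPU_VENDOR=$(detect_gpu_vendor)
echo "Detected GPU vendor: $GPU_VENDOR"
```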

fzyzcjy and others added 30 commits January 22, 2026 11:52
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
fzyzcjy and others added 30 commits January 23, 2026 18:20
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Yusheng Su <radixark@ac-h200-user-3.tail134ba0.ts.net>
Co-authored-by: root <root@mi300x8-008.atl1.do.cpe.ice.amd.com>
Co-authored-by: Banghua Zhu <banghuazhu@radixar.ai>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
Auto-detect GPU vendor (/dev/kfd or torch.version.hip for AMD,
nvidia-smi for NVIDIA) and conditionally apply platform-specific
settings:
- AMD: HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES,
  --no-gradient-accumulation-fusion, --no-offload-train/rollout
- NVIDIA: NVLink detection, NCCL_NVLS_ENABLE
- Both: dynamic Megatron-LM path detection, configurable MODEL_DIR/DATA_DIR

This eliminates the need for a separate run-qwen3-4B-amd.sh script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
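
A hedged sketch of the conditional platform-specific settings this commit describes. The device list default, the NVLink probe, and the fallback value are assumptions; only the variable names HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, and NCCL_NVLS_ENABLE are taken from the commit message.

```bash
# Sketch only: defaults and the NVLink check are assumptions.
if [ "$GPU_VENDOR" = "amd" ]; then
    # Keep Ray and the training processes agreeing on ROCm device visibility.
    export HIP_VISIBLE_DEVICES=${HIP_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7}
    export RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES=1
else
    # Enable NCCL NVLink SHARP (NVLS) collectives only when NVLink links
    # are actually reported; grep heuristic is an assumption.
    if nvidia-smi nvlink --status 2>/dev/null | grep -q "GB/s"; then
        export NCCL_NVLS_ENABLE=1
    else
        export NCCL_NVLS_ENABLE=0
    fi
fi
```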
- Use dynamic NVIDIA GPU count via nvidia-smi -L instead of hardcoded 8
- Remove --no-gradient-accumulation-fusion (AMD Docker now supports it)
- Remove --no-offload-train/rollout (torch_memory_saver resolved for ROCm)
- Expand compact if/else to multi-line for readability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
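
A sketch of the dynamic GPU count from this commit. Only `nvidia-smi -L` is named in the message, so the AMD branch (counting amdgpu render nodes) is an assumption.

```bash
# NVIDIA: nvidia-smi -L prints one line per GPU.
# AMD branch is an assumption; render nodes may overcount on systems with iGPUs.
if [ "$GPU_VENDOR" = "nvidia" ]; then
    NUM_GPUS=$(nvidia-smi -L | wc -l)
else
    NUM_GPUS=$(ls /dev/dri/renderD* 2>/dev/null | wc -l)
fi
echo "Using $NUM_GPUS GPUs"
```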
Prevent driver-level deadlocks when offload is enabled on AMD GPUs,
consistent with PR radixark#588 changes to run-qwen3-4B-amd.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
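
A sketch of how the offload opt-out from this commit could be wired in. The EXTRA_ARGS variable is illustrative, and the flag spellings are expanded from the "--no-offload-train/rollout" shorthand used earlier in this PR, so treat them as assumptions.

```bash
EXTRA_ARGS=""
if [ "$GPU_VENDOR" = "amd" ]; then
    # Keep train/rollout state resident on AMD GPUs to avoid the
    # driver-level deadlocks this commit describes.
    EXTRA_ARGS="$EXTRA_ARGS --no-offload-train --no-offload-rollout"
fi
# Later: append $EXTRA_ARGS to the training launch command.
```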