
Unify qwen3-4b.sh script for AMD and NVIDIA #1

Open

lizamd wants to merge 77 commits into main from unify-qwen3-4b-amd-nvidia

Conversation

@lizamd (Owner) commented Feb 13, 2026

Summary

  • Auto-detect GPU vendor (/dev/kfd or torch.version.hip for AMD, nvidia-smi for NVIDIA)
  • Conditionally apply platform-specific args:
    • AMD: HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, --no-gradient-accumulation-fusion, --no-offload-train/rollout
    • NVIDIA: NVLink detection, NCCL_NVLS_ENABLE
  • Dynamic Megatron-LM path detection for both platforms (from radixark/miles#506, "Fix PYTHONPATH for AMD container Megatron-LM location")
  • Configurable MODEL_DIR/DATA_DIR env vars with /root defaults
  • Dynamic NUM_GPUS detection instead of hardcoded 8

This eliminates the need for a separate run-qwen3-4B-amd.sh.
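
A minimal bash sketch of the vendor auto-detection described above. The `detect_gpu_vendor` helper and the `GPU_VENDOR` variable are illustrative names, not necessarily the script's actual identifiers; only the probes (/dev/kfd, torch.version.hip, nvidia-smi) come from this PR description.

```bash
# Sketch only: helper and variable names are assumptions.
detect_gpu_vendor() {
    # AMD: the ROCm kernel driver exposes /dev/kfd, and ROCm PyTorch builds
    # set torch.version.hip to a non-None value.
    if [ -e /dev/kfd ] || python3 -c "import sys, torch; sys.exit(0 if torch.version.hip else 1)" 2>/dev/null; then
        echo "amd"
    # NVIDIA: nvidia-smi is present on systems with the NVIDIA driver stack.
    elif command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia"
    else
        echo "unknown"
    fi
}

GPU_VENDOR=$(detect_gpu_vendor)
echo "Detected GPU vendor: $GPU_VENDOR"
```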

fzyzcjy and others added 30 commits January 22, 2026 11:52
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
fzyzcjy and others added 30 commits January 23, 2026 18:20
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Yusheng Su <radixark@ac-h200-user-3.tail134ba0.ts.net>
Co-authored-by: root <root@mi300x8-008.atl1.do.cpe.ice.amd.com>
Co-authored-by: Banghua Zhu <banghuazhu@radixar.ai>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
Auto-detect GPU vendor (/dev/kfd or torch.version.hip for AMD,
nvidia-smi for NVIDIA) and conditionally apply platform-specific
settings:
- AMD: HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES,
  --no-gradient-accumulation-fusion, --no-offload-train/rollout
- NVIDIA: NVLink detection, NCCL_NVLS_ENABLE
- Both: dynamic Megatron-LM path detection, configurable MODEL_DIR/DATA_DIR

This eliminates the need for a separate run-qwen3-4B-amd.sh script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
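
A hedged sketch of the conditional platform-specific settings this commit describes. The device list default, the NVLink probe, and the fallback value are assumptions; only the variable names HIP_VISIBLE_DEVICES, RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES, and NCCL_NVLS_ENABLE are taken from the commit message.

```bash
# Sketch only: defaults and the NVLink check are assumptions.
if [ "$GPU_VENDOR" = "amd" ]; then
    # Keep Ray and the training processes agreeing on ROCm device visibility.
    export HIP_VISIBLE_DEVICES=${HIP_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7}
    export RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES=1
else
    # Enable NCCL NVLink SHARP (NVLS) collectives only when NVLink links
    # are actually reported; grep heuristic is an assumption.
    if nvidia-smi nvlink --status 2>/dev/null | grep -q "GB/s"; then
        export NCCL_NVLS_ENABLE=1
    else
        export NCCL_NVLS_ENABLE=0
    fi
fi
```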
- Use dynamic NVIDIA GPU count via nvidia-smi -L instead of hardcoded 8
- Remove --no-gradient-accumulation-fusion (AMD Docker now supports it)
- Remove --no-offload-train/rollout (torch_memory_saver resolved for ROCm)
- Expand compact if/else to multi-line for readability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
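
A sketch of the dynamic GPU count from this commit. Only `nvidia-smi -L` is named in the message, so the AMD branch (counting amdgpu render nodes) is an assumption.

```bash
# NVIDIA: nvidia-smi -L prints one line per GPU.
# AMD branch is an assumption; render nodes may overcount on systems with iGPUs.
if [ "$GPU_VENDOR" = "nvidia" ]; then
    NUM_GPUS=$(nvidia-smi -L | wc -l)
else
    NUM_GPUS=$(ls /dev/dri/renderD* 2>/dev/null | wc -l)
fi
echo "Using $NUM_GPUS GPUs"
```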
Prevent driver-level deadlocks when offload is enabled on AMD GPUs,
consistent with PR radixark#588 changes to run-qwen3-4B-amd.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
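
A sketch of how the offload opt-out from this commit could be wired in. The EXTRA_ARGS variable is illustrative, and the flag spellings are expanded from the "--no-offload-train/rollout" shorthand used earlier in this PR, so treat them as assumptions.

```bash
EXTRA_ARGS=""
if [ "$GPU_VENDOR" = "amd" ]; then
    # Keep train/rollout state resident on AMD GPUs to avoid the
    # driver-level deadlocks this commit describes.
    EXTRA_ARGS="$EXTRA_ARGS --no-offload-train --no-offload-rollout"
fi
# Later: append $EXTRA_ARGS to the training launch command.
```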