Megatron-LM specific release v2.8 rocm cherrypicks by sudhu2k · Pull Request #498 · ROCm/TransformerEngine

sudhu2k · 2026-03-19T20:38:12Z

Description

These cherry-picks fix some issues seen in Megatron-LM unit test runs.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

* Update max_fp8 value based on is_fp8_fnuz check in utils.py
* Fixed and added test_cast_master_weights_to_fp8 to ci
  Addressed Reviews
* Update copyright information.
---------
Co-authored-by: Veera Gopu <veerarajasekharreddy.gopu@amd.com>
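For readers unfamiliar with the first commit, here is a minimal sketch of what selecting the FP8 maximum from an FNUZ check can look like. It is not the code in utils.py: the names `Fp8Format` and `max_fp8` are hypothetical, and only the numeric limits are taken from the standard FP8 format definitions (E4M3 tops out at 448 in the OCP `e4m3fn` encoding and at 240 in the FNUZ encoding, while E5M2 tops out at 57344 in both).

```python
from enum import Enum


class Fp8Format(Enum):
    """Hypothetical enum for the two FP8 layouts discussed here."""

    E4M3 = "e4m3"
    E5M2 = "e5m2"


# Largest finite value per (format, is_fnuz) pair.
# These match torch.finfo(...).max for float8_e4m3fn, float8_e4m3fnuz,
# float8_e5m2 and float8_e5m2fnuz respectively.
_MAX_FP8 = {
    (Fp8Format.E4M3, False): 448.0,
    (Fp8Format.E4M3, True): 240.0,
    (Fp8Format.E5M2, False): 57344.0,
    (Fp8Format.E5M2, True): 57344.0,
}


def max_fp8(fmt: Fp8Format, is_fp8_fnuz: bool) -> float:
    """Return the largest representable value for the given FP8 layout."""
    return _MAX_FP8[(fmt, bool(is_fp8_fnuz))]
```

Using the OCP limit of 448 on hardware that uses the FNUZ encoding would compute scales against a range the dtype cannot represent, which is the kind of mismatch a check like this is meant to avoid.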

* Update permutation.py
* Update permutation.py
* Update transformer_engine/pytorch/triton/permutation.py
* Update transformer_engine/pytorch/triton/permutation.py
---------
Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

ipanfilo · 2026-03-20T04:14:48Z

ci/pytorch.sh

    configure_omp_threads 8
    run_default_fa 1 test_fused_optimizer.py
    run_default_fa 3 test_sanity_import.py
+    run_default_fa 3 distributed/test_cast_master_weights_to_fp8.py


This requires the test hotfix, doesn't it?

This one didn't require the hotfix.

TE2.10 required the hotfix:

TransformerEngine/tests/pytorch/distributed/test_cast_master_weights_to_fp8.py

Lines 722 to 724 in 3344b85

    # ROCm: Use executable as-is; do not resolve() or a venv symlink may point to system
    # Python which does not have torch/site-packages.
    python_exe = pathlib.Path(sys.executable)
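To illustrate the point of that hotfix comment, here is a minimal sketch of building a launch command from the interpreter path without resolving it. The helper name `build_launch_cmd` and the use of `python -m torch.distributed.run` are assumptions made for illustration; the excerpt above does not show how the TE2.10 test actually assembles its command.

```python
import pathlib
import sys


def build_launch_cmd(script: pathlib.Path, nproc: int) -> list[str]:
    """Build a launcher command line (hypothetical helper).

    The interpreter path is used as-is. Calling .resolve() would follow a
    venv's bin/python symlink to the base interpreter, and a child process
    started from that path would not see the venv's site-packages.
    """
    python_exe = pathlib.Path(sys.executable)  # deliberately not resolved
    return [
        str(python_exe),
        "-m",
        "torch.distributed.run",  # assumed launcher module
        f"--nproc_per_node={nproc}",
        str(script),
    ]
```

A call such as `build_launch_cmd(pathlib.Path("run_cast_master_weights_to_fp8.py"), 2)` only assembles the argument list; it does not import torch or start any process.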

TE2.8 implements the test differently:

TransformerEngine/tests/pytorch/distributed/test_cast_master_weights_to_fp8.py

Lines 27 to 40 in a25a454

    def _run_test(quantization):
        test_path = TEST_ROOT / "run_cast_master_weights_to_fp8.py"
        test_cmd = LAUNCH_CMD + [str(test_path)] + ["--quantization", quantization]
        result = subprocess.run(test_cmd, env=os.environ, check=False)
        assert result.returncode == 0


    @pytest.mark.parametrize("quantization", ["fp8", "fp8_cs", "fp8_block"])
    def test_cast_master_weights_to_fp8(quantization):
        if quantization in ("fp8", "fp8_cs") and not fp8_available:
            pytest.skip(reason_for_no_fp8)
        if quantization == "fp8_block" and not fp8_block_scaling_available:
            pytest.skip(reason_for_no_fp8_block_scaling)
        _run_test(quantization)

ipanfilo

Run CI level 3

sudhu2k · 2026-03-20T17:12:06Z

Run CI level 3

MGPU/SGPU tests passed on level 3.
The example test failed on MI325 with a flaky Hugging Face error:

File "/opt/venv/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 3010, in dataset_info
   hf_raise_for_status(r)
 File "/opt/venv/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 880, in hf_raise_for_status
   raise _format(HfHubHTTPError, message, response) from e
huggingface_hub.errors.HfHubHTTPError: (Request ID: Root=1-69bcf50c-102ad0297e9cea061658671b;35076563-ba85-4f92-a0a7-3f465e541e0a)

429 Too Many Requests: you have reached your 'api' rate limit.

https://github.com/ROCm/TransformerEngine/actions/runs/23328907485/job/67856033151

sudhu2k and others added 2 commits March 17, 2026 18:30

sudhu2k requested review from ipanfilo, wangye805 and wenchenvincent as code owners March 19, 2026 20:38

ipanfilo reviewed Mar 20, 2026

sudhu2k requested a review from ipanfilo March 20, 2026 20:03

sudhu2k wants to merge 2 commits into release_v2.8_rocm from sudhu/release_v2.8_rocm_cherrypicks

3 participants
