
Rocm pixi env #175

Draft
Emrys-Merlin wants to merge 3 commits into aqlaboratory:pixi-beta from sdvillal:rocm-pixi-env

Conversation

@Emrys-Merlin

Summary

This PR introduces a ROCm pixi environment called openfold3-rocm7, in line with the cpu/cuda12/cuda13 environments. This unifies the usage pattern of openfold3 after the migration to the pixi package manager.

Changes

  • Added a pytorch-rocm pixi feature, which pulls pytorch and triton with ROCm support from the PyTorch PyPI mirror. (Please note that we cannot pull pytorch-rocm dependencies from conda-forge yet.)
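For readers unfamiliar with pixi features, the layout might look roughly like this in pixi.toml. This is a hedged sketch only: the index URL, version specs, feature name, and environment composition below are illustrative assumptions, not the actual manifest of this PR.

```toml
# Hypothetical sketch of a ROCm feature in pixi.toml (illustrative only).
# ROCm wheels come from the PyTorch index instead of conda-forge.
[feature.pytorch-rocm.pypi-dependencies]
torch = { version = "*", index = "https://download.pytorch.org/whl/rocm7.2" }
pytorch-triton-rocm = { version = "*", index = "https://download.pytorch.org/whl/rocm7.2" }

# The environment composes the ROCm feature with the shared base features.
[environments]
openfold3-rocm7 = { features = ["pytorch-rocm"] }
```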

Related Issues

I tried to build the environment on our HPC cluster, but our proxy interfered with the resolution of the pytorch dependency. @sdvillal thankfully already opened an issue about this with the pixi developers, so hopefully it will be resolved soon. I spun up an AWS EC2 instance, where resolution worked without any issues.

Testing

I could only test that the environment resolves, as I do not have access to an AMD accelerator. @singagan, if you could help me out here, that would be highly appreciated :-)

The current output of the validate-openfold3-rocm command is as follows:

$ pixi run -e openfold3-rocm7 validate-openfold3-rocm
OpenFold3 ROCm environment check

  [PASS] PyTorch installed: 2.11.0+rocm7.2
  [PASS] PyTorch built with ROCm (HIP): 7.2.26015
  [FAIL] ROCm GPU visible: none
  [PASS] Triton installed: 3.6.0
  [FAIL] Triton backend is HIP: 0 active drivers ([]). There should only be one.
  [PASS] Triton evoformer kernel loaded

One or more checks failed. See above for details.
Installation instructions: https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/Installation.md
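For context, the kind of probing such a validation task performs can be sketched in a few lines of Python. These are hypothetical helpers in the spirit of the real script, not its actual code; a real ROCm check would additionally inspect torch.version.hip (set only on ROCm builds of PyTorch) and torch.cuda.is_available() (which maps to HIP devices on ROCm).

```python
"""Minimal sketch of a ROCm environment check, in the spirit of
validate-openfold3-rocm (hypothetical helpers, not the actual script)."""
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def check(label: str, ok: bool, detail: str = "") -> bool:
    """Print one PASS/FAIL line, mirroring the output format above."""
    status = "PASS" if ok else "FAIL"
    suffix = f": {detail}" if detail else ""
    print(f"  [{status}] {label}{suffix}")
    return ok


def run_checks() -> bool:
    """Run the basic presence checks and report whether all passed.

    A real script would go further: import torch lazily, verify that
    torch.version.hip is not None, and count visible HIP devices.
    """
    results = [
        check("PyTorch installed", module_available("torch")),
        check("Triton installed", module_available("triton")),
    ]
    return all(results)
```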

Other Notes
Note that, since we need to pull pytorch from PyPI, we pull almost all dependencies from PyPI rather than from conda-forge. This is necessary because if any one of our dependencies pulled pytorch from conda-forge, that would supersede our PyPI pytorch request and we would end up with a pytorch build without ROCm support. This is a known pixi limitation. If it gets resolved, we could consider pulling more of the dependencies from conda-forge, but this is optional and not a blocker.

@sdvillal, I would love to get your feedback. The environment setup is rather complex and I'm not completely convinced I assembled the ROCm environment correctly (or whether I pulled in unnecessary features).

@jnwei @jandom As discussed in #166, this is the draft to enable ROCm in the pixi setup.

@jandom
Collaborator

jandom commented Apr 14, 2026

@Emrys-Merlin great contribution Tim :-)

@jandom
Collaborator

jandom commented Apr 16, 2026

Getting some test failures with this env on AMD:

FAILED openfold3/tests/test_triangular_attention.py::test_shape[cuda-True] - AssertionError: Values are not sufficiently close.
FAILED openfold3/tests/test_triangular_attention.py::test_shape[cuda-False] - AssertionError: Values are not sufficiently close.
FAILED openfold3/tests/test_triangular_multiplicative_update.py::test_shape[cuda] - AssertionError: Values are not sufficiently close.

It could all be expected numerics; unclear. This is the chip:

(openfold3:openfold3-rocm7) [jandom@k006-004-v2 openfold-3]$ amd-smi 
+------------------------------------------------------------------------------+
| AMD-SMI 26.2.1+fc0010cf6a    amdgpu version: 6.16.13  ROCm version: 7.2.0    |
| VBIOS version: 613661                                                        |
| Platform: Linux Guest (Passthrough)                                          |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:0c:00.0     AMD Instinct MI210 | 0 %      51 °C   0            43/300 W |
|   0       0     N/A             N/A | 0 %        N/A             10/65520 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+

update

Looking more closely, the test_triangular_multiplicative_update.py failure seems fine / minimal drift:

E       
E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 27 / 123904 (0.0%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     1.8238788470625877e-06
E           Mean:    1.1477516181912506e-06
E           Median:  1.0848743841052055e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     0.003565334714949131
E           Mean:    0.0020918985828757286
E           Median:  0.001880077994428575
E         Individual errors:

The other two (both for test_triangular_attention.py) look more severe:

E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 57574 / 123904 (46.5%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     6.344435678329319e-05
E           Mean:    9.54591541812988e-06
E           Median:  8.05101626610849e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     2778.166259765625
E           Mean:    1.3626092672348022
E           Median:  0.3236933946609497

E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 47210 / 123904 (38.1%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     5.022007826482877e-05
E           Mean:    1.0010324331233278e-05
E           Median:  8.413369869231246e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     57885.04296875
E           Mean:    3.7973642349243164
E           Median:  0.34220370650291443
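One thing worth noting when reading the Max values above: a tiny absolute error divided by an expected value close to zero produces an enormous relative error, so Max values in the thousands can coexist with absolute errors in the 1e-5 range. Whether that is what happens here is a hypothesis, not a conclusion; the numbers below are made up to illustrate the mechanism.

```python
# How a ~5e-5 absolute error can yield a relative error in the tens of
# thousands when the expected value is near zero (illustrative numbers).

def rel_err(obtained: float, expected: float) -> float:
    """abs(obtained - expected) / abs(expected), as in the pytest report."""
    return abs(obtained - expected) / abs(expected)

# Same order of absolute error (~5e-5) in both cases:
moderate = rel_err(1.00005, 1.0)        # expected O(1)   -> tiny relative error
near_zero = rel_err(5.086e-5, 8.6e-10)  # expected ~1e-9  -> huge relative error

print(f"{moderate:.1e}")   # ~5.0e-05
print(f"{near_zero:.0f}")  # tens of thousands
```

This would suggest checking where the large relative errors occur: if they cluster on near-zero expected values while the absolute errors stay small, the drift may be benign ROCm numerics rather than a kernel bug.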

@Emrys-Merlin
Author

Thanks a lot for testing this @jandom! I really appreciate it :-)

I think I count it as a win that the tests ran at all :-D

I agree that some of the numerical differences warrant deeper inspection. I'm happy to help here, but I'm a bit handicapped without access to AMD GPUs. If it is easy for you to share limited access with me for debugging, that could speed things up. I will continue looking for an internal solution.

I will be on vacation next week, so I won't be very responsive. If we don't find a solution before Barcelona, I'm happy to chat there :-)

@jandom
Collaborator

jandom commented Apr 20, 2026

No worries, I've shared this ticket with Gagan already – he might be able to come in and help
