
Rocm pixi env #175

Draft
Emrys-Merlin wants to merge 3 commits into aqlaboratory:pixi-beta from sdvillal:rocm-pixi-env

Conversation

@Emrys-Merlin

Summary

This PR introduces a ROCm pixi environment called openfold3-rocm7, in line with the cpu/cuda12/cuda13 environments. This unifies the usage pattern of openfold3 after the migration to the pixi package manager.

Changes

  • Added a pytorch-rocm pixi feature, which pulls pytorch and triton with ROCm support from the PyTorch PyPI mirror. (Please note that we cannot pull pytorch-rocm dependencies from conda-forge yet.)
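For readers unfamiliar with pixi features, the layout might look roughly like this in pixi.toml. This is a hedged sketch only: the index URL, version specs, feature name, and environment composition below are illustrative assumptions, not the actual manifest of this PR.

```toml
# Hypothetical sketch of a ROCm feature in pixi.toml (illustrative only).
# ROCm wheels come from the PyTorch index instead of conda-forge.
[feature.pytorch-rocm.pypi-dependencies]
torch = { version = "*", index = "https://download.pytorch.org/whl/rocm7.2" }
pytorch-triton-rocm = { version = "*", index = "https://download.pytorch.org/whl/rocm7.2" }

# The environment composes the ROCm feature with the shared base features.
[environments]
openfold3-rocm7 = { features = ["pytorch-rocm"] }
```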

Related Issues

I tried to build the environment on our HPC cluster, but our proxy interfered with the resolution of the pytorch dependency. @sdvillal thankfully already opened an issue about this with the pixi developers, so hopefully it will be resolved soon. I spun up an AWS EC2 instance, where resolution worked without any issues.

Testing

I could only test that the environment resolves, as I do not have access to an AMD accelerator. @singagan, if you could help me out here, that would be highly appreciated :-)

The current output of the validate-openfold3-rocm command is as follows:

$ pixi run -e openfold3-rocm7 validate-openfold3-rocm
OpenFold3 ROCm environment check

  [PASS] PyTorch installed: 2.11.0+rocm7.2
  [PASS] PyTorch built with ROCm (HIP): 7.2.26015
  [FAIL] ROCm GPU visible: none
  [PASS] Triton installed: 3.6.0
  [FAIL] Triton backend is HIP: 0 active drivers ([]). There should only be one.
  [PASS] Triton evoformer kernel loaded

One or more checks failed. See above for details.
Installation instructions: https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/Installation.md
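For context, the kind of probing such a validation task performs can be sketched in a few lines of Python. These are hypothetical helpers in the spirit of the real script, not its actual code; a real ROCm check would additionally inspect torch.version.hip (set only on ROCm builds of PyTorch) and torch.cuda.is_available() (which maps to HIP devices on ROCm).

```python
"""Minimal sketch of a ROCm environment check, in the spirit of
validate-openfold3-rocm (hypothetical helpers, not the actual script)."""
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def check(label: str, ok: bool, detail: str = "") -> bool:
    """Print one PASS/FAIL line, mirroring the output format above."""
    status = "PASS" if ok else "FAIL"
    suffix = f": {detail}" if detail else ""
    print(f"  [{status}] {label}{suffix}")
    return ok


def run_checks() -> bool:
    """Run the basic presence checks and report whether all passed.

    A real script would go further: import torch lazily, verify that
    torch.version.hip is not None, and count visible HIP devices.
    """
    results = [
        check("PyTorch installed", module_available("torch")),
        check("Triton installed", module_available("triton")),
    ]
    return all(results)
```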

Other Notes
Note that, since we need to pull pytorch from PyPI, we pull almost all dependencies from PyPI rather than from conda-forge. This is necessary because if any one of our dependencies pulled pytorch from conda-forge, that would supersede our PyPI pytorch request and we would end up with a pytorch build without ROCm support. This is a known pixi limitation. If it gets resolved, we could consider pulling more of the dependencies from conda-forge, but this is optional and not a blocker.

@sdvillal, I would love to get your feedback. The environment setup is rather complex and I'm not completely convinced I assembled the ROCm environment correctly (or whether I pulled in unnecessary features).

@jnwei @jandom As discussed in #166, this is the draft to enable ROCm in the pixi setup.

@jandom
Collaborator

jandom commented Apr 14, 2026

@Emrys-Merlin great contribution Tim :-)

@jandom
Collaborator

jandom commented Apr 16, 2026

Getting some test failures with this env on AMD:

FAILED openfold3/tests/test_triangular_attention.py::test_shape[cuda-True] - AssertionError: Values are not sufficiently close.
FAILED openfold3/tests/test_triangular_attention.py::test_shape[cuda-False] - AssertionError: Values are not sufficiently close.
FAILED openfold3/tests/test_triangular_multiplicative_update.py::test_shape[cuda] - AssertionError: Values are not sufficiently close.

It could all be expected numerics; unclear. This is the chip:

(openfold3:openfold3-rocm7) [jandom@k006-004-v2 openfold-3]$ amd-smi 
+------------------------------------------------------------------------------+
| AMD-SMI 26.2.1+fc0010cf6a    amdgpu version: 6.16.13  ROCm version: 7.2.0    |
| VBIOS version: 613661                                                        |
| Platform: Linux Guest (Passthrough)                                          |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:0c:00.0     AMD Instinct MI210 | 0 %      51 °C   0            43/300 W |
|   0       0     N/A             N/A | 0 %        N/A             10/65520 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+

update

Looking more closely, the test_triangular_multiplicative_update.py failure seems fine / minimal drift:

E       
E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 27 / 123904 (0.0%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     1.8238788470625877e-06
E           Mean:    1.1477516181912506e-06
E           Median:  1.0848743841052055e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     0.003565334714949131
E           Mean:    0.0020918985828757286
E           Median:  0.001880077994428575
E         Individual errors:

The other two (both for test_triangular_attention.py) look more severe:

E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 57574 / 123904 (46.5%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     6.344435678329319e-05
E           Mean:    9.54591541812988e-06
E           Median:  8.05101626610849e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     2778.166259765625
E           Mean:    1.3626092672348022
E           Median:  0.3236933946609497

E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 47210 / 123904 (38.1%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     5.022007826482877e-05
E           Mean:    1.0010324331233278e-05
E           Median:  8.413369869231246e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     57885.04296875
E           Mean:    3.7973642349243164
E           Median:  0.34220370650291443
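One thing worth noting when reading the Max values above: a tiny absolute error divided by an expected value close to zero produces an enormous relative error, so Max values in the thousands can coexist with absolute errors in the 1e-5 range. Whether that is what happens here is a hypothesis, not a conclusion; the numbers below are made up to illustrate the mechanism.

```python
# How a ~5e-5 absolute error can yield a relative error in the tens of
# thousands when the expected value is near zero (illustrative numbers).

def rel_err(obtained: float, expected: float) -> float:
    """abs(obtained - expected) / abs(expected), as in the pytest report."""
    return abs(obtained - expected) / abs(expected)

# Same order of absolute error (~5e-5) in both cases:
moderate = rel_err(1.00005, 1.0)        # expected O(1)   -> tiny relative error
near_zero = rel_err(5.086e-5, 8.6e-10)  # expected ~1e-9  -> huge relative error

print(f"{moderate:.1e}")   # ~5.0e-05
print(f"{near_zero:.0f}")  # tens of thousands
```

This would suggest checking where the large relative errors occur: if they cluster on near-zero expected values while the absolute errors stay small, the drift may be benign ROCm numerics rather than a kernel bug.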

@Emrys-Merlin
Author

Thanks a lot for testing this @jandom! I really appreciate it :-)

I think I count it as a win that the tests ran at all :-D

I agree that some of the numerical differences warrant deeper inspection. I'm happy to help here, but I'm a bit handicapped without access to AMD GPUs. If it is easy for you to share limited access with me for debugging, that could speed things up. I will continue looking for an internal solution.

I will be on vacation next week, so I won't be very responsive. If we don't find a solution before Barcelona, I'm happy to chat there :-)

@jandom
Collaborator

jandom commented Apr 20, 2026

No worries, I've shared this ticket with Gagan already – he might be able to come in and help
