feature: pixi-beta by jandom · Pull Request #143 · aqlaboratory/openfold-3

jandom · 2026-03-23T11:34:31Z

Completing the work started in #34

Remaining tasks

fix random ruff problems
update the docker build to use both pixi and conda
confirm that inference can be done
confirm that training can be done (decision: deferred)

* Add initial pixi environment: all tests pass, predictions seem to be correct; corresponds to a modernized conda environment following best practices
* Reorder dependencies for easier read
* Add openfold3 as an editable dependency
* Sync cuda-python pin between pypi package and the conda environment
* Comments Comments Overcommenting issues
* Add explicitly a conda yml version of the pixi environment
* Improve some wordings
* Update pixi lockfile
* Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026
* Swap ninja verification with pytorch's
* Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026
* Use vendored deepspeed evoformer builder; use vendored deepspeed in the attention primitives
* Add symlink to vendored deepspeed as in upstream
* Vendor also op_builder.__init__ from deepspeed
* Import explicitly EvoformerAttnBuilder, avoiding broken introspection magic
* Add an ignore mechanism for cutlass detection in vendored deepspeed
* Apply cutlass detection workaround and remove all nvidia-cutlass tricks from pixi environment
* Remove nvidia-cutlass from openfold-3 dependencies (fix later)
* Remove pypi ninja dependency in pixi workspace
* No need for cutlass hacks
* Add pixi config to .gitattributes
* Remove deepspeed hacks for good
* Update pixi lockfile
* Update pixi conda environment
* Remove MKL from pypi dependencies, as it is unused
* Remove aria2 from pypi dependencies, unused and not so much of a convenience
* Update lockfile
* Re-enable pure PyPI install
* Disable hack when conda is active
* More comments on cutlass python API deprecation and pytorch
* Make pixi environments (CPU, CUDA12, CUDA13, for all major platforms)
* Increase LMDB map size to make test pass in osx-arm64
* Better comments of TODOs in pixi.toml
* Pin cuequivariance until test failure is investigated
* Move deepspeed to optional dependency also in pyproject
* Pyproject: extend python version support
* Pyproject: move dependencies table together with optional-dependencies
* Pyproject: document future decision on dependency-groups
* Pyproject: reformat to consolidate indent to 4 spaces
* Pyproject: reorder dependencies for easier read
* Pixi: add scipy
* Pixi: add comment on CUDA13
* Pixi: make cuequivariance CUDA generic for its conda packages
* Pixi: add reminder about devel install
* Pyproject: fix and improve readability, add URLs
* pixi.toml: make more readable by showing first envs, then base, then variants
* pixi.toml: pin deepspeed to 0.18.3, first one with ninja detection fixed
* pixi.toml: fully enable aarch64 and cuda13, revamp docs
* pixi.lock: update
* pixi.toml: add triton to cuequivariance dependencies for CUDA13
* pixi.lock: update
* pixi.toml: include pip to allow users to play
* pixi.toml: formatting for better readability
* pixi.toml: restrict cuequivariance-cu13 to linux-64 until we unpin to >=0.8
* pixi.toml: formatting for better readability
* pixi.toml: make pytorch-gpu an isolated environment feature; in this way we can more easily express when a package is not ready yet in CF
* pixi.toml: add environments that combine mostly pypi-based deps with CUDA from conda
* pixi.toml: add openfold3-editable-full and account for lack of cuequivariance for python=3.14
* pixi.toml: brief documentation of the pypi-dominant environments
* pixi.toml: add also the dev optional dependency group to openfold3-full
* pyproject.toml: pin cuequivariance to <0.8 until we adapt tests
* pixi.toml: add kalign to required non-pypi dependencies
* pixi.toml: add more bioinformatics tools to non-pypi
* pixi.toml: make env setup be part of the deepspeed-build feature
* pixi.toml: simplify management of pypi features
* pixi.lock: update, all tests pass A100,B300 x CUDA12,CUDA13
* pixi.toml: add table of what works and what needs test
* pixi.toml: add tasks for exporting to regular conda environment yamls
* conda environments: delete outdated modernized conda env, use new tasks instead
* pixi.toml: bump min pixi version
* pixi.toml: remove unnecessary comments
* pixi.toml: remove unnecessary envvar definition for isolating extension builds
* pixi.toml: better definition of maintenance environment
* pixi.toml: add simple task to run test and save results to an environment-specific dir
* of3: enable pickling regardless of forking strategy and platform
* of3: enable multiple data loader workers in osx mps backed
* Vendor improved deepspeed builder from upstream PR. See: deepspeedai/DeepSpeed#7760
* pixi.lock: update
* pixi.toml: remove some comment noise
* of3: fix multiprocessing configuration corner case in osx
* docker: move outdated example dockerfiles to docker/pixi-examples
* examples: add example runner for osx inference
* pixi.toml: ensure we get the right pytorch from pypi; something similar should actually be supported in pyproject.toml
* pixi.lock: update, fixed torch cuda mismatch in pypi environments
* pixi.toml: fix lock export + make default environment be maintenance
* pixi.toml: use a more consistent name for environment arg
* pixi.lock: update
* pixi.toml: workaround for no-default-feature breaking the test task (pixi bug)
* pixi.toml: issue with pixi pypi resolution seems solved
* Revert "pixi.toml: issue with pixi pypi resolution seems solved". This reverts commit ded3482.
* pixi.toml: better document problem and workaround
* pixi.toml: make the test task present in all relevant environments; this I feel makes its use less surprising, as opposed to passing the environment as an arg to a dependent task
* pixi.toml: let CUDA13 flow freely
* pixi.lock: update for initial pytorch 2.10, cuda 13.1 support
* pixi.toml: add safe cuda environments (no accelerators)
* of3: remove deepspeed hacks; note that there are still some in __init__.py
* of3: unvendor deepspeed
* pixi.toml: simplify deepspeed dependency after our changes made it to CF/pypi
* pixi.toml: remove safe environments as we are not maintaining them
* pixi.toml: enable pytorch-coda in cuda 13 env after 2.10 release
* pyproject.toml: pin deepspeed to >0.18.5, improved evoformer compilation
* Add awscrt to dependencies, missing from recent PR
* pixi.toml: setup correctly path to PTXAS_BLACKWELL for triton >=3.6.0
* pixi.toml: add -safe environments, at the moment just without cuequivariance; these are also conda-pure environments
* pixi.lock: update after consolidation (no vendor, pytorch 2.10 + CF cuda13)
* pixi.toml: update outdated comments
* updates with GB10 tests (#2)
* updates with GB10 tests
* cleanup
* harmonize
* linting data_module.py
* speculative changes
* pixi.toml: remove safe environments
* pixi.lock: update after removal of safe environments
* Remove pixi docker examples, to rework
* Comment-out workaround for hard to reproduce ABI mismatch problem
* pixi.toml: bump pixi, improve conda export by including all env variables
* pixi.toml: unpin biotite
* pixi.toml: python has its own feature
* pixi.toml: bump deepspeed
* pyproject.toml: bump deepspeed to version without Evoformer build bug
* pixi.toml: detail on workaround
* pixi.lock: update
* pixi.toml: add example task to update safely the lockfile
* pixi.toml: remove kalign2
* tests: fix test depending on unspecified glob return order
* pixi.toml: better metadata
* docs: wip
* pixi.lock: update
* Allow to configure multiprocessing start and set safe defaults. We would still need to document this for users
* Fix capitalization error
* Fix capitalization error
* Fix typo
* pixi.lock: update

---------

Co-authored-by: Tim Adler <tim.adler@bayer.com>
Co-authored-by: Jan Domański <jan.domanski@omsf.io>

jandom · 2026-04-07T08:51:04Z

@sdvillal I tried to get something for the docs that gives people an overview of this feature – what do you think about this?

jandom · 2026-04-08T11:04:44Z

+        lmdb_env.close()
+        del lmdb_env


I'm not in love with this; I might open a side-car PR to clean it up. The LMDB roundtrip test fails for me without this hack. We just need a better way to clean up these resources.
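
One possible shape for that cleanup, as a minimal sketch: wrap the environment in `contextlib.closing` so that `close()` runs even when a write raises. `FakeEnv` and `write_entries` are hypothetical stand-ins made up for illustration, not names from this PR, and the sketch deliberately does not import LMDB itself.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for an LMDB environment; it only tracks open/closed state."""

    def __init__(self):
        self.closed = False
        self.data = {}

    def put(self, key, value):
        if self.closed:
            raise RuntimeError("environment is closed")
        self.data[key] = value

    def close(self):
        self.closed = True


def write_entries(env_factory, entries):
    """Write all entries, closing the environment even if a write fails."""
    with closing(env_factory()) as env:
        for key, value in entries.items():
            env.put(key, value)
    # Returned only so the caller can inspect the closed state.
    return env


env = write_entries(FakeEnv, {b"a": b"1", b"b": b"2"})
print(env.closed)  # True
```

The point of the pattern is that the release of the resource is tied to a scope rather than to an explicit `close()` plus `del` at the end of the test.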

jandom · 2026-04-08T11:05:16Z

                transaction.put(key_bytes, val_bytes)

        lmdb_env.close()
+        del lmdb_env


Same LMDB hack here

jandom · 2026-04-08T11:06:09Z

        test_lmdb_dir = tmp_path / "test_lmdb"
-        map_size = 20 * 1024
+        map_size = 200 * 1024


We bumped into this while working on another problem. The map size should be tied to the OS page size; otherwise we're hacking around with arbitrary numbers.
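
To illustrate the page-size point, here is a small sketch using only the standard library. It assumes the test only needs a map size that is a whole number of OS pages; the helper names and the 64-page figure are made up for illustration and are not taken from this PR.

```python
import mmap

# OS page size: commonly 4 KiB on x86-64 Linux and 16 KiB on Apple Silicon macOS.
PAGE_SIZE = mmap.PAGESIZE


def map_size_for_pages(n_pages, page_size=PAGE_SIZE):
    """Return a map size, in bytes, that spans exactly n_pages pages."""
    if n_pages <= 0:
        raise ValueError("n_pages must be positive")
    return n_pages * page_size


def pages_covered(map_size, page_size=PAGE_SIZE):
    """Return how many whole pages fit in map_size bytes."""
    return map_size // page_size


print(pages_covered(20 * 1024, page_size=4096))   # 5
print(pages_covered(20 * 1024, page_size=16384))  # 1
print(map_size_for_pages(64))  # platform dependent
```

With 4 KiB pages the old `20 * 1024` value covers five pages, but with 16 KiB pages it covers only one, which may be why the osx-arm64 run needed a larger map.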

jandom · 2026-04-08T11:07:02Z

@sdvillal can you take a look here? I've messed around a bit

I manually triggered the test workflow from the branch; otherwise it picks up the workflow from main, which points to files that no longer exist.

https://github.com/aqlaboratory/openfold-3/actions/runs/24134205657

Update: all the above tests passed, so I think this is ready to go! LMDB is the only loose end, but it's already being addressed in other PRs.

Another batch of tests: https://github.com/aqlaboratory/openfold-3/actions/runs/24185282518 🛑 failed because of a missing dependency.

https://github.com/aqlaboratory/openfold-3/actions/runs/24186689907 🟢

jnwei

This looks great! Thank you for further polishing this, @jandom, and of course @sdvillal and @Emrys-Merlin for the initial contribution in #34. We might need more tweaks to documentation/workflows later on, but this is a great starting point.

One question regarding the pixi environments and the diagram. While the diagram is a great way to show the different groups of dependencies, I worry a bit that people will see openfold3-cpu and assume that we support a cpu only version for running OpenFold3. It might be good to have a footnote or other note explaining the limitations of that environment.

jnwei · 2026-04-10T06:43:12Z

+- **`Dockerfile.pixi`** (recommended) — uses [pixi](https://pixi.sh) to manage all dependencies including CUDA toolkit, cuDNN, CUTLASS, and build tools from conda-forge. No `nvidia/cuda` base image needed.
+- **`Dockerfile.conda`** (legacy) — uses conda/mamba with an `nvidia/cuda` base image. Will be deprecated in Q3 2026.
+
+## Pixi-based builds


Should we also add instructions for building a docker environment with cuequivariance? Currently that's only present in the conda path, which is scheduled to be deprecated.

I think the pixi environments already come with cueq installed, so there is no need to install it optionally; it's on by default.

jnwei · 2026-04-10T07:23:52Z

+#[feature.kalign.pypi-dependencies]
+#kalign-python = "*"


Is this still being used? Maybe it's redundant with the pyproject.toml dependencies?

Yeah, this is not used at the moment, but it points to a real pain point I've encountered.

By default, anything in pyproject.toml will resolve to a conda package. For kalign-python, the PyPI package is more up to date, so we may have to bring this back if and when needed.

We can remove it as I won't forget.

Note that the kalign-python PyPI packages vendor the OpenMP runtime on both macOS and Linux, which could be problematic. It definitely is on macOS, where we need to work around a big fat warning (I should document this), and I would not be too confident on Linux either, even if its dynamic linker should be better behaved. I opened an issue upstream about it.

I also created a PR to have kalign-python conda packages (there are none at the moment). This would create environments free of the OpenMP risk, because in the conda ecosystem these dependencies are coordinated and shared by all packages within an environment. Unfortunately, it is taking too long to come to a resolution. If and when that is accepted, we will add it to the conda dependencies; I would go as far as getting kalign-python from conda-forge for the pypi-centered environments too.

jandom · 2026-04-15T12:24:09Z

Your spidey-sense was correct @jnwei: the pixi envs include cueq, but the tests are getting skipped (locally, on a DGX; on the AWS builders we don't have a GPU).

openfold3/tests/test_kernels.py::TestKernels::test_cueq_backward_bf16 SKIPPED (Requires CU-Equivaraince to be installed)           [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_backward_fp32 SKIPPED (Requires CU-Equivaraince to be installed)           [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_forward_bf16 SKIPPED (Requires CU-Equivaraince to be installed)            [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_forward_fp32 SKIPPED (Requires CU-Equivaraince to be installed)            [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_tri_mult_bwd SKIPPED (Requires CU-Equivaraince to be installed)            [ 72%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_tri_mult_fwd SKIPPED (Requires CU-Equivaraince to be installed)

Otherwise the branch is all green https://github.com/aqlaboratory/openfold-3/actions/runs/24453803143 🟢
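
A quick way to tell "not installed" from "installed but not detected" is to check whether the modules are discoverable at all. The sketch below assumes the import names `cuequivariance` and `cuequivariance_torch`; the actual check in `test_kernels.py` may key on something else, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
import unittest

# Assumed import names; the project's real check may look for different modules.
CUEQ_MODULES = ("cuequivariance", "cuequivariance_torch")


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


MISSING = missing_modules(CUEQ_MODULES)


class TestCueqAvailable(unittest.TestCase):
    # Skips with an explicit list of what is missing instead of a generic reason.
    @unittest.skipIf(MISSING, f"missing modules: {MISSING}")
    def test_import(self):
        for name in CUEQ_MODULES:
            importlib.import_module(name)


print(missing_modules(["json", "surely_not_a_real_module_xyz"]))
# ['surely_not_a_real_module_xyz']
```

Note that `find_spec` only reports whether a module can be located; the import itself can still fail, for example when a shared library is missing, so a skip keyed on a failed import does not always mean the package is absent.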

sdvillal · 2026-04-15T16:22:11Z

IIRC, we set things up so that cueq is not installed on ARM systems. For me, the cueq tests ran successfully on both A100s and B300; we needed to pin it for the tests to pass. I will take a look later.

sdvillal · 2026-04-15T16:33:10Z

I worry a bit that people will see openfold3-cpu and assume that we support a cpu only version for running OpenFold3. It might be good to have a footnote or other note explaining the limitations of that environment.

Inference works like a charm on CPU only, in my limited experience at least. What kind of limitations are you thinking about?

sdvillal · 2026-04-15T16:39:44Z

@sdvillal I tried to get something for the docs that gives people an overview of this feature – what do you think about this?

This looks cool and is already good enough as is; I would maybe simplify by:

- flagging default as a "maintenance" environment, not including openfold itself
- I think a table linking to systems might be more informative (e.g., cuequivariance is not available in ARM)
- people will need a good look to understand how colors are informative (they are, congrats for that!)
- some rows are not consistent (e.g., deepspeed and not-in-pypi); then openfold-3-full also pulls deepspeed, so these envs also come with batteries included

Fine for me to merge as is and I can give a hand with more accurate docs later, if needed.

As a note, one of the limitations (or design choices) of pixi makes it difficult to have more modular environments (e.g., being able to opt out of deepspeed with a command line option). I discussed this with the pixi folks and they seem to have some ideas on how to bring this modularity back without fully compromising locking. It would be a great addition for our complex setup.

jandom · 2026-04-15T19:37:20Z

@sdvillal that's good feedback thanks!!

I think a table linking to systems might be more informative (e.g., cuequivariance is not available in ARM)

Well, about that :P #181: it's available for 0.8 and above, but it required the tests to be fixed. Inference works for me in this branch too.

some rows are not consistent (e.g., deepspeed and not-in-pypi); then openfold-3-full also pulls deepspeed, so these envs also come with batteries included

sorry it's late – maybe you can unpack this for me? BTW there is a source .excalidraw in the repo, and a vscode extension if you want to add your chef's kiss!

jandom mentioned this pull request Mar 23, 2026

build: fix the blackwell dockerfile #84

Closed

jandom self-assigned this Mar 23, 2026

jandom added 2 commits March 23, 2026 05:03

fix linter problems

ce70ebb

add pre-commit

9b1749c

jandom added the safe-to-test label (internal-only label used to indicate PRs that are ready for automated CI testing) Mar 23, 2026

christinaflo mentioned this pull request Mar 26, 2026

Fixes for forkserver/spawn serialization and fix for LMDB upgrade issues #148

Open

jandom added 2 commits March 26, 2026 12:25

Merge branch 'main' into pixi-beta

6fbef74

Merge branch 'public-main' into pixi-beta

57736cd

jandom changed the title from "feature: pixi" to "feature: pixi-beta" Mar 31, 2026

Merge branch 'public-main' into pixi-beta

784502b

jandom added 7 commits April 7, 2026 02:02

add pixi.excalidraw to docs

f696d6c

Merge branch 'public-main' into pixi-beta

4ce0467

remove blackwell build instructions (obsolete)

c101a86

update docs to recommend pixi

cf4bfb6

better docs on pixi

4e36782

update pixi.lock

48e06e3

docker build and tests for pixi

8a4a26b

jandom added and removed the safe-to-test label Apr 8, 2026

jandom commented Apr 8, 2026

Merge branch 'main' into pixi-beta

3f9ed35

jandom requested a review from jnwei April 8, 2026 13:44

set a sensible 2mb default

888f070

jnwei mentioned this pull request Apr 8, 2026

Enable native AMD ROCm inference via Triton kernels #166

Merged

more context manager plus dirty dataclass

39ddce9

jandom added 2 commits April 9, 2026 03:03

unit tests

07d3454

Merge branch 'public-main' into pixi-beta

5e42337

jandom added and removed the safe-to-test label Apr 9, 2026

more linting

8a745e9

jandom added and removed the safe-to-test label Apr 9, 2026

jandom added 3 commits April 9, 2026 04:00

missed a dep: regenerate pixi.lock

ba24b95

Merge branch 'main' into pixi-beta

68e743e

remove duplicate projects

feeaacd

jnwei approved these changes Apr 10, 2026

jandom added 3 commits April 14, 2026 22:03

Merge branch 'main' into pixi-beta

a6df9f7

review: comments from Jennifer

ba83d3e

Merge branch 'public-main' into pixi-beta

bc7badc

jandom requested a review from jnwei April 15, 2026 12:03

jandom added and removed the safe-to-test label Apr 15, 2026

update pixi.lock

299e596
