
feature: pixi-beta #143

Open
jandom wants to merge 26 commits into main from pixi-beta

Conversation

Collaborator

@jandom jandom commented Mar 23, 2026

Completing the work started in #34

Remaining tasks

  • fix random ruff problems
  • update the docker build to use both pixi and conda
  • confirm that inference can be done
  • confirm that training can be done (decision: deferred)

* Add initial pixi environment

all tests pass, predictions seem to be correct
corresponds to a modernized conda environment following best practices

* Reorder dependencies for easier read

* Add openfold3 as an editable dependency

* Sync cuda-python pin between pypi package and the conda environment

* Comments

Comments

Overcommenting issues

* Add explicitly a conda yml version of the pixi environment

* Improve some wordings

* Update pixi lockfile

* Vendoring pieces of deepspeed

incomplete, we might not need the native sources
from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026

* Swap ninja verification with pytorch's

* Vendoring pieces of deepspeed

incomplete, we might not need the native sources
from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026

* Use vendored deepspeed evoformer builder

Use vendored deepspeed in the attention primitives

* Add symlink to vendored deepspeed as in upstream

* Vendor also op_builder.__init__ from deepspeed

* Import explicitly EvoformerAttnBuilder, avoiding broken introspection magic

* Add a ignore mechanism for cutlass detection in vendored deepspeed

* Apply cutlass detection workaround and remove all nvidia-cutlass tricks from pixi environment

* Remove nvidia-cutlass from openfold-3 dependencies (fix later)

* Remove pypi ninja dependency in pixi workspace

* No need for cutlass hacks

* Add pixi config to .gitattributes

* Remove deepspeed hacks for good

* Update pixi lockfile

* Update pixi conda environment

* Remove MKL from pypi dependencies, as it is unused

* Remove aria2 from pypi dependencies, unused and not so much of a convenience

* Update lockfile

Update lockfile

* Re-enable pure PyPI install

* Disable hack when conda is active

* More comments on cutlass python API deprecation and pytorch

* Make pixi environments (CPU, CUDA12, CUDA13, for all major platforms)

* Increase LMDB map size to make test pass in osx-arm64

* Better comments of TODOs in pixi.toml

Better comments of TODOs in pixi.toml

Better comments of TODOs in pixi.toml

* Pin cuequivariance until test failure is investigated

* Move deepspeed to optional dependency also in pyproject

* Pyproject: extend python version support

* Pyproject: move dependencies table together with optional-dependencies

* Pyproject: document future decision on dependency-groups

* Pyproject: reformat to consolidate indent to 4 spaces

* Pyproject: reorder dependencies for easier read

* Pixi: add scipy

* Pixi: add comment on CUDA13

* Pixi: make cuequivariance CUDA generic for its conda packages

* Pixi: add reminder about devel install

* Pyproject: fix and improve readability, add URLs

* pixi.toml: make more readable by showing first envs, then base, then variants

* pixi.toml: pin deepspeed to 0.18.3, first one with ninja detection fixed

* pixi.toml: fully enable aarch64 and cuda13, revamp docs

* pixi.lock: update

* pixi.toml: add triton to cuequivariance dependencies for CUDA13

* pixi.lock: update

* pixi.toml: include pip to allow users to play

* pixi.toml: formatting for better readability

* pixi.toml: restrict cuequivariance-cu13 to linux-64 until we unpin to >=0.8

* pixi.toml: formatting for better readability

* pixi.toml: make pytorch-gpu an isolated environment feature

in this way we can more easily express when a package is not ready yet in CF

* pixi.toml: add environments that combine mostly pypi-based deps with CUDA from conda

* pixi.toml: add openfold3-editable-full and account for lack of cuequivariance for python=3.14

* pixi.toml: brief documentation of the pypi-dominant environments

* pixi.toml: add also the dev optional dependency group to openfold3-full

* pyproject.toml: pin cuequivariance to <0.8 until we adapt tests

* pixi.toml: add kalign to required non-pypi dependencies

* pixi.toml: add more bioinformatics tools to non-pypi

* pixi.toml: make env setup be part of the deepspeed-build feature

* pixi.toml: simplify management of pypi features

* pixi.lock: update, all tests pass A100,B300 x CUDA12,CUDA13

* pixi.toml: add table of what works and what needs test

* pixi.toml: add tasks for exporting to regular conda environment yamls

* conda environments: delete outdated modernized conda env, use new tasks instead

* pixi.toml: bump min pixi version

* pixi.toml: remove unnecessary comments

* pixi.toml: remove unnecessary envvar definition for isolating extension builds

* pixi.toml: better definition of maintenance environment

pixi.toml: better definition of maintenance environment

pixi.toml: better definition of maintenance environment

* pixi.toml: add simple task to run tests and save results to an environment-specific dir

* of3: enable pickling regardless of forking strategy and platform

* of3: enable multiple data loader workers in osx mps backend

* Vendor improved deepspeed builder from upstream PR

See: deepspeedai/DeepSpeed#7760

* pixi.lock: update

* pixi.toml: remove some comment noise

* of3: fix multiprocessing configuration corner case in osx

* docker: move outdated example dockerfiles to docker/pixi-examples

* examples: add example runner for osx inference

* pixi.toml: ensure we get the right pytorch from pypi

something similar should actually be supported in pyproject.toml

* pixi.lock: update, fixed torch cuda mismatch in pypi environments

* pixi.toml: fix lock export + make default environment be maintenance

* pixi.toml: use a more consistent name for environment arg

* pixi.lock: update

* pixi.toml: workaround for no-default-feature breaking the test task (pixi bug)

* pixi.toml: issue with pixi pypi resolution seems solved

* Revert "pixi.toml: issue with pixi pypi resolution seems solved"

This reverts commit ded3482.

* pixi.toml: better document problem and workaround

* pixi.toml: make the test task present in all relevant environments

this I feel makes less surprising its use, as opposed to passing the environment as an arg to a dependent task

* pixi.toml: let CUDA13 flow freely

* pixi.lock: update for initial pytorch 2.10, cuda 13.1 support

* pixi.toml: add safe cuda environments (no accelerators)

* of3: remove deepspeed hacks

note that there are still some in __init__.py

* of3: unvendor deepspeed

* pixi.toml: simplify deepspeed dependency after our changes made it to CF/pypi

* pixi.toml: remove safe environments as we are not maintaining them

* pixi.toml: enable pytorch-coda in cuda 13 env after 2.10 release

* pyproject.toml: pin deepspeed to >0.18.5, improved evoformer compilation

* Add awscrt to dependencies, missing from recent PR

* pixi.toml: setup correctly path to PTXAS_BLACKWELL for triton >=3.6.0

* pixi.toml: add -safe environments, at the moment just without cuequivariance

these are also conda-pure environments

* pixi.lock: update after consolidation (no vendor, pytorch 2.10 + CF cuda13)

* pixi.toml: update outdated comments

* updates with GB10 tests (#2)

* updates with GB10 tests

* cleanup

* harmonize

* linting data_module.py

* speculative changes

* pixi.toml: remove safe environments

* pixi.lock: update after removal of safe environments

* Remove pixi docker examples, to rework

* Comment-out workaround for hard to reproduce ABI mismatch problem

* pixi.toml: bump pixi, improve conda export by including all env variables

* pixi.toml: unpin biotite

* pixi.toml: python has its own feature

* pixi.toml: bump deepspeed

* pyproject.toml: bump deepspeed to version without Evoformer build bug

* pixi.toml: detail on workaround

* pixi.lock: update

* pixi.toml: add example task to update safely the lockfile

* pixi.toml: remove kalign2

* tests: fix test depending on unspecified glob return order

* pixi.toml: better metadata

* docs: wip

* pixi.lock: update

* Allow to configure multiprocessing start and set safe defaults

We would still need to document this for users

* Fix capitalization error

* Fix capitalization error

* Fix typo

* pixi.lock: update

---------

Co-authored-by: Tim Adler <tim.adler@bayer.com>
Co-authored-by: Jan Domański <jan.domanski@omsf.io>
@jandom jandom self-assigned this Mar 23, 2026
@jandom jandom added the safe-to-test label (internal-only label used to indicate PRs that are ready for automated CI testing) Mar 23, 2026
@jandom jandom changed the title feature: pixi feature: pixi-beta Mar 31, 2026
Collaborator Author

jandom commented Apr 7, 2026

[Image: Screenshot 2026-04-07 at 10 50 37]

@sdvillal I tried to get something for the docs that gives people an overview of this feature – what do you think about this?

@jandom jandom added and removed the safe-to-test label Apr 8, 2026
Comment thread openfold3/core/data/io/dataset_cache.py Outdated
Comment on lines +197 to +198
lmdb_env.close()
del lmdb_env
Collaborator Author

I'm not in love with this; I might open a side-car PR to clean it up. The LMDB roundtrip test fails for me without this hack. We just need a better way to clean up these resources.
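One possible shape for that cleanup, as a minimal sketch rather than the project's actual code: wrap the environment in `contextlib.closing` so that `close()` runs even when a write fails, instead of relying on a manual `close()` plus `del`. The helper name `write_items` and the `open_env` factory argument are hypothetical and exist only for illustration.

```python
from contextlib import closing


def write_items(open_env, items):
    """Write (key, value) byte pairs, closing the environment even on error.

    `open_env` is a hypothetical zero-argument factory that returns an
    LMDB-like environment exposing `begin(write=True)` and `close()`.
    """
    with closing(open_env()) as env:
        # The transaction is assumed to be a context manager, as in py-lmdb,
        # where leaving the block without an exception commits the write.
        with env.begin(write=True) as txn:
            for key, value in items:
                txn.put(key, value)
```

With py-lmdb, `open_env` could be something like `lambda: lmdb.open(str(path), map_size=map_size)`. Whether this alone makes the roundtrip test pass without the `del` is untested here.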

transaction.put(key_bytes, val_bytes)

lmdb_env.close()
del lmdb_env
Collaborator Author

Same LMDB hack here

Comment thread openfold3/tests/test_lmdb.py Outdated
Comment on lines +68 to +69
  test_lmdb_dir = tmp_path / "test_lmdb"
- map_size = 20 * 1024
+ map_size = 200 * 1024
Collaborator Author

We bumped into this while working on another problem. The map size should just be tied to the page size of the OS; otherwise we're hacking around with random numbers.
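A minimal sketch of what tying the value to the page size could look like, assuming the test only needs a small map that is a whole number of OS pages. The helper name `page_aligned_map_size` and the 16-page floor are assumptions for illustration, not values taken from this PR. For context, Apple Silicon uses 16 KiB pages while most x86-64 Linux systems use 4 KiB, so a fixed `20 * 1024` covers far fewer pages on osx-arm64.

```python
import mmap


def page_aligned_map_size(min_bytes: int, min_pages: int = 16) -> int:
    """Return a map size that is a whole number of OS pages.

    `min_pages` is a hypothetical safety floor, not a documented LMDB minimum.
    """
    page = mmap.PAGESIZE
    pages_needed = -(-min_bytes // page)  # ceiling division
    return max(pages_needed, min_pages) * page


# Replaces a hard-coded constant such as `20 * 1024` in the test.
map_size = page_aligned_map_size(20 * 1024)
```

The result could then be passed as `map_size=` when opening the environment, so the same test requests the same number of pages on every platform.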

Collaborator Author

jandom commented Apr 8, 2026

@sdvillal can you take a look here? I've messed around a bit.

I manually triggered the test workflow from the branch; otherwise it picks up the workflow from main, which points to files that no longer exist.

https://github.com/aqlaboratory/openfold-3/actions/runs/24134205657

Update: all the above tests passed, I think this is ready to go! LMDB is the only loose end, but it's already being addressed in other PRs.

Another batch of tests: https://github.com/aqlaboratory/openfold-3/actions/runs/24185282518 🛑 failed, missing dependency

https://github.com/aqlaboratory/openfold-3/actions/runs/24186689907 🟢

@jandom jandom requested a review from jnwei April 8, 2026 13:44
@jandom jandom added and removed the safe-to-test label Apr 9, 2026
@jandom jandom added and removed the safe-to-test label Apr 9, 2026
Contributor

@jnwei jnwei left a comment

This looks great! Thank you for further polishing this, @jandom, and of course @sdvillal and @Emrys-Merlin for the initial contribution in #34. We might need more tweaks to documentation/workflows later on, but this is a great starting point.

One question regarding the pixi environments and the diagram: while the diagram is a great way to show the different groups of dependencies, I worry a bit that people will see openfold3-cpu and assume that we support a CPU-only version for running OpenFold3. It might be good to have a footnote or other note explaining the limitations of that environment.

Comment thread .github/workflows/integration-test.yml Outdated
Comment thread docker/DOCKER.md
- **`Dockerfile.pixi`** (recommended) — uses [pixi](https://pixi.sh) to manage all dependencies including CUDA toolkit, cuDNN, CUTLASS, and build tools from conda-forge. No `nvidia/cuda` base image needed.
- **`Dockerfile.conda`** (legacy) — uses conda/mamba with an `nvidia/cuda` base image. Will be deprecated in Q3 2026.

## Pixi-based builds
Contributor

Should we also add instructions for building a Docker environment with cuequivariance? Currently that's only present in the conda path, which is scheduled to be deprecated.

Collaborator Author

I think the pixi environments already come with cueq installed, so there is no need to make it optional. It's on by default.

Comment thread docs/source/modern-conda-environments-with-pixi.md
Comment thread docs/source/modern-conda-environments-with-pixi.md Outdated
Comment thread examples/example_runner_yamls/osx.yaml Outdated
Comment thread pixi.toml Outdated
Comment on lines +156 to +157
#[feature.kalign.pypi-dependencies]
#kalign-python = "*"
Contributor

Is this still being used? Maybe it's redundant with the pyproject.toml dependencies?

Collaborator Author

Yeah, this is not used at the moment, but it indicates a real pain point I've encountered.

By default, anything in pyproject.toml will default to a conda package. For kalign-python, the PyPI package is more up to date, so we may have to "bring this back" if and when needed.

@sdvillal sdvillal Apr 15, 2026

We can remove it, as I won't forget.

Note that the kalign-python PyPI packages vendor OpenMP's runtime on both macOS and Linux, which could be problematic. It definitely is on macOS, where we need to work around a big fat warning (I should document this), and I would not be too confident on Linux either, even if its dynamic linker should be better behaved. I opened an issue upstream about it.

I also created a PR to add kalign-python conda packages (there are none at the moment). This would create OpenMP-risk-free environments, because in the conda ecosystem these dependencies are coordinated and shared by all packages within an environment. Unfortunately, it is taking too long to come to a resolution. If and when that is accepted, we will add it to the conda dependencies; I would go as far as getting kalign-python from conda-forge for the PyPI-centered environments too.

Comment thread openfold3/tests/core/data/primitives/caches/test_format.py Outdated
Comment thread openfold3/tests/core/data/primitives/caches/test_lmdb.py Outdated
Comment thread openfold3/tests/core/data/primitives/caches/test_read_datacache.py
Comment thread openfold3/tests/core/data/primitives/caches/test_format.py
@jandom jandom requested a review from jnwei April 15, 2026 12:03
@jandom jandom added and removed the safe-to-test label Apr 15, 2026
Collaborator Author

jandom commented Apr 15, 2026

Your spidey-sense was correct, @jnwei: the pixi envs include cueq, but the tests are getting skipped (locally on a DGX; on the AWS builders we don't have a GPU).

openfold3/tests/test_kernels.py::TestKernels::test_cueq_backward_bf16 SKIPPED (Requires CU-Equivaraince to be installed)           [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_backward_fp32 SKIPPED (Requires CU-Equivaraince to be installed)           [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_forward_bf16 SKIPPED (Requires CU-Equivaraince to be installed)            [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_forward_fp32 SKIPPED (Requires CU-Equivaraince to be installed)            [ 71%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_tri_mult_bwd SKIPPED (Requires CU-Equivaraince to be installed)            [ 72%]
openfold3/tests/test_kernels.py::TestKernels::test_cueq_tri_mult_fwd SKIPPED (Requires CU-Equivaraince to be installed) 

Otherwise the branch is all green https://github.com/aqlaboratory/openfold-3/actions/runs/24453803143 🟢
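A quick way to narrow down skips like the ones above is to check whether the package can be found at all from inside the pixi environment. This is a minimal sketch: the module names `cuequivariance` and `cuequivariance_torch` are assumptions about the import names, and the condition the test suite really uses to decide on a skip is not shown in this thread.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed import names; adjust to whatever the skip condition actually checks.
for name in ("cuequivariance", "cuequivariance_torch"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If both are reported as found, the skip is more likely caused by the test's own guard (for example a platform or GPU check) than by a missing install.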

@sdvillal

> Your spidy-sense was correct @jnwei – the pixi envs include cueq but the tests are getting skipped (locally, DGX – on the AWS builders we don't have a GPU).
>
> openfold3/tests/test_kernels.py::TestKernels::test_cueq_backward_bf16 SKIPPED (Requires CU-Equivaraince to be installed)           [ 71%]
> openfold3/tests/test_kernels.py::TestKernels::test_cueq_backward_fp32 SKIPPED (Requires CU-Equivaraince to be installed)           [ 71%]
> openfold3/tests/test_kernels.py::TestKernels::test_cueq_forward_bf16 SKIPPED (Requires CU-Equivaraince to be installed)            [ 71%]
> openfold3/tests/test_kernels.py::TestKernels::test_cueq_forward_fp32 SKIPPED (Requires CU-Equivaraince to be installed)            [ 71%]
> openfold3/tests/test_kernels.py::TestKernels::test_cueq_tri_mult_bwd SKIPPED (Requires CU-Equivaraince to be installed)            [ 72%]
> openfold3/tests/test_kernels.py::TestKernels::test_cueq_tri_mult_fwd SKIPPED (Requires CU-Equivaraince to be installed)
>
> Otherwise the branch is all green https://github.com/aqlaboratory/openfold-3/actions/runs/24453803143 🟢

IIRC, we set things up so that cueq is not installed on ARM systems. For me, the cueq tests ran successfully on both A100s and a B300, although we needed to pin it for the tests to pass. I will take a look later.

@sdvillal

> I worry a bit that people will see openfold3-cpu and assume that we support a cpu only version for running OpenFold3. It might be good to have a footnote or other note explaining the limitations of that environment.

Inference works like a charm on CPU only, in my limited experience at least. What kind of limitations are you thinking about?


sdvillal commented Apr 15, 2026

> [Image: Screenshot 2026-04-07 at 10 50 37]
> @sdvillal I tried to get something for the docs that gives people an overview of this feature – what do you think about this?

This looks cool and is already good enough as is; I would maybe simplify by:

  • flagging default as a "maintenance" environment that does not include openfold itself
  • I think a table linking to systems might be more informative (e.g., cuequivariance is not available on ARM)
  • people will need a good look to understand how the colors are informative (they are, congrats on that!)
  • some rows are not consistent (e.g., deepspeed and not-in-pypi); also, openfold-3-full pulls deepspeed too, so these envs also come with batteries included

Fine for me to merge as is, and I can give a hand with more accurate docs later if needed.

As a note, one of the limitations (or design choices) of pixi makes it difficult to have more modular environments (e.g., ones where you can opt out of deepspeed with a command-line option). I discussed this with the pixi folks and they seem to have some ideas on how to bring this modularity back without fully compromising locking. It would be a great addition for our complex setup.

Collaborator Author

jandom commented Apr 15, 2026

@sdvillal that's good feedback, thanks!!

> I think a table linking to systems might be more informative (e.g., cuequivariance is not available in ARM)

Well, about that :P #181: it's available for 0.8 and above, but it required the tests to be fixed. Inference works for me in this branch too.

> some rows are not consistent (e.g., deepspeed and not-in-pypi); then openfold-3-full also pulls deepspeed, so these envs also come with batteries included

Sorry, it's late; maybe you can unpack this for me? BTW there is a source .excalidraw in the repo, and a VS Code extension if you want to add your chef's kiss!
