Conversation
* Add initial pixi environment all tests pass, predictions seem to be correct corresponds to a modernized conda environment following best practices * Reorder dependencies for easier read * Add openfold3 as an editable dependency * Sync cuda-python pin between pypi package and the conda environment * Comments Comments Overcommenting issues * Add explicitly a conda yml version of the pixi environment * Improve some wordings * Update pixi lockfile * Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026 * Swap ninja verification with pytorch's * Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026 * Use vendored deepspeed evoformer builder Use vendored deepspeed in the attention primitives * Add symlink to vendored deepspeed as in upstream * Vendor also op_builder.__init__ from deepspeed * Import explicitly EvoformerAttnBuilder, avoiding broken introspection magic * Add a ignore mechanism for cutlass detection in vendored deepspeed * Apply cutlass detection workaround and remove all nvidia-cutlass tricks from pixi environment * Remove nvidia-cutlass from openfold-3 dependencies (fix later) * Remove pypi ninja dependency in pixi workspace * No need for cutlass hacks * Add pixi config to .gitattributes * Remove deepspeed hacks for good * Update pixi lockfile * Update pixi conda environment * Remove MKL from pypi dependencies, as it is unused * Remove aria2 from pypi dependencies, unused and not so much of a convenience * Update lockfile Update lockfile * Re-enable pure PyPI install * Disable hack when conda is active * More comments on cutlass python API deprecation and pytorch * Make pixi environments (CPU, CUDA12, CUDA13, for all major platforms) * Increase LMDB map size to make test pass in osx-arm64 * Better comments of TODOs in pixi.toml Better comments of TODOs in pixi.toml Better comments of 
TODOs in pixi.toml * Pin cuequivariance until test failure is investigated * Move deepspeed to optional dependency also in pyproject * Pyproject: extend python version support * Pyproject: move dependencies table together with optional-dependencies * Pyproject: document future decision on dependency-groups * Pyproject: reformat to consolidate indent to 4 spaces * Pyproject: reorder dependencies for easier read * Pixi: add scipy * Pixi: add comment on CUDA13 * Pixi: make cuequivariance CUDA generic for its conda packages * Pixi: add reminder about devel install * Pyproject: fix and improve readability, add URLs * pixi.toml: make more readable by showing first envs, then base, then variants * pixi.toml: pin deepspeed to 0.18.3, first one with ninja detection fixed * pixi.toml: fully enable aarch64 and cuda13, revamp docs * pixi.lock: update * pixi.toml: add triton to cuequivariance dependencies for CUDA13 * pixi.lock: update * pixi.toml: include pip to allow users to play * pixi.toml: formatting for better readability * pixi.toml: restrict cuequivariance-cu13 to linux-64 until we unpin to >=0.8 * pixi.toml: formatting for better readability * pixi.toml: make pytorch-gpu an isolated environment feature in this way we can more easily express when a package is not ready yet in CF * pixi.toml: add environments that combine mostly pypi-based deps with CUDA from conda * pixi.toml: add openfold3-editable-full and account for lack of cuequivariance for python=3.14 * pixi.toml: brief documentation of the pypi-dominant environments * pixi.toml: add also the dev optional dependency group to openfold3-full * pyproject.toml: pin cuequivariance to <0.8 until we adapt tests * pixi.toml: add kalign to required non-pypi dependencies * pixi.toml: add more bioinformatics tools to non-pypi * pixi.toml: make env setup be part of the deepspeed-build feature * pixi.toml: simplify management of pypi features * pixi.lock: update, all tests pass A100,B300 x CUDA12,CUDA13 * pixi.toml: add 
table of what works and what needs test * pixi.toml: add tasks for exporting to regular conda environment yamls * conda environments: delete outdated modernized conda env, use new tasks instead * pixi.toml: bump min pixi version * pixi.toml: remove unnecessary comments * pixi.toml: remove unnecessary envvar definition for isolating extension builds * pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment * pixi.toml: add simple task to run test and save rsults to an environment-specific dir * of3: enable pickling regardless of forking strategy and platform * of3: enable multiple data loader workers in osx mps backed * Vendor improved deepspeed builder from upstream PR See: deepspeedai/DeepSpeed#7760 * pixi.lock: update * pixi.toml: remove some comment noise * of3: fix multiprocessing configuration corner case in osx * docker: move outdated example dockerfiles to docker/pixi-examples * examples: add example runner for osx inference * pixi.toml: ensure we get the right pytorch from pypi something smilar should actually be supported in pyproject.toml * pixi.lock: update, fixed torch cuda missmatch in pypi environments * pixi.toml: fix lock export + make default environment be maintenance * pixi.toml: use a more consitent name for environment arg * pixi.lock: update * pixi.toml: workaround for no-default-feature breaking the test task (pixi bug) * pixi.toml: issue with pixi pypi resolution seems solved * Revert "pixi.toml: issue with pixi pypi resolution seems solved" This reverts commit ded3482. 
* pixi.toml: better document problem and workaround * pixi.toml: make the test task present in all relevant environments this I feel makes less surprising its use, as opposed to passing the environment as an arg to a dependent task * pixi.toml: let CUDA13 flow freely * pixi.lock: update for initial pytorch 2.10, cuda 13.1 support * pixi.toml: add safe cuda environments (no accelerators) * of3: remove deepspeed hacks note that there are still some in __init__.py * of3: unvendor deepspeed * pixi.toml: simplify deepspeed dependency after our changes made it to CF/pypi * pixi.toml: remove safe environments as we are not maintaining them * pixi.toml: enable pytorch-coda in cuda 13 env after 2.10 release * pyproject.toml: pin deepspeed to >0.18.5, improved evoformer compilation * Add awscrt to dependencies, missing from recent PR * pixi.toml: setup correctly path to PTXAS_BLACKWELL for triton >=3.6.0 * pixi.toml: add -safe environments, at the moment just without cuequivariance these are also conda-pure environments * pixi.lock: update after consolidation (no vendor, pytorch 2.10 + CF cuda13) * pixi.toml: update outdated comments * updates with GB10 tests (#2) * updates with GB10 tests * cleanup * harmonize * linting data_module.py * speculative changes * pixi.toml: remove safe environments * pixi.lock: update after removal of safe environments * Remove pixi docker examples, to rework * Comment-out workaround for hard to reproduce ABI mismatch problem * pixi.toml: bump pixi, improve conda export by including all env variables * pixi.toml: unpin biotite * pixi.toml: python has its own feature * pixi.toml: bump deepspeed * pyproject.toml: bump deepspeed to version without Evoformer build bug * pixi.toml: detail on workaround * pixi.lock: update * pixi.toml: add example task to update safely the lockfile * pixi.toml: remove kalign2 * tests: fix test depending on unspecified glob return order * pixi.toml: better metadata * docs: wip * pixi.lock: update * Allow to configure 
multiprocessing start and set safe defaults We would still need to document this for users * Fix capitalization error * Fix capitalization error * Fix typo * pixi.lock: update --------- Co-authored-by: Tim Adler <tim.adler@bayer.com> Co-authored-by: Jan Domański <jan.domanski@omsf.io>
@sdvillal I tried to get something for the docs that gives people an overview of this feature – what do you think about this?
lmdb_env.close()
del lmdb_env
I'm not in love with this – might open a side-car PR to clean this up. The LMDB roundtrip test fails for me without this hack. We just need a better way to clean up these resources.
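One possible direction for the cleanup, as a minimal sketch rather than the change proposed in this PR: wrap anything exposing `close()` in `contextlib.closing`, so the close happens even when a write fails and no manual `close()` / `del` pair is needed. `FakeEnv` and `write_items` are hypothetical stand-ins invented for illustration; the real test would pass the LMDB environment instead.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for any object exposing close(), e.g. an LMDB environment."""

    def __init__(self):
        self.closed = False
        self.store = {}

    def put(self, key: bytes, value: bytes) -> None:
        if self.closed:
            raise RuntimeError("environment is closed")
        self.store[key] = value

    def close(self) -> None:
        self.closed = True


def write_items(env_factory, items):
    """Write all items; closing() guarantees close() runs even if a put fails."""
    with closing(env_factory()) as env:
        for key, value in items.items():
            env.put(key, value)
    return env


env = write_items(FakeEnv, {b"key": b"value"})
print(env.closed)
```

Whether this is enough to replace the explicit `del` in the roundtrip test is an open question; it only removes the need to remember the `close()` call.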
transaction.put(key_bytes, val_bytes)

lmdb_env.close()
del lmdb_env
test_lmdb_dir = tmp_path / "test_lmdb"
- map_size = 20 * 1024
+ map_size = 200 * 1024
We bumped into this while working on another problem. This should just be tied to the page size of the OS – otherwise we're hacking around with random numbers.
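A minimal sketch of what tying the value to the page size could look like, using the standard library's `mmap.PAGESIZE`. The helper name `lmdb_map_size` and the page count of 50 are assumptions made for illustration, not values from this PR. For context, `20 * 1024` bytes is five 4 KiB pages but barely more than one 16 KiB page (the size typically used on Apple Silicon), which may be why the test needed the bump on osx-arm64.

```python
import mmap


def lmdb_map_size(n_pages: int, page_size: int = mmap.PAGESIZE) -> int:
    """Return a map size in bytes that is a whole number of OS pages."""
    if n_pages <= 0:
        raise ValueError("n_pages must be positive")
    return n_pages * page_size


# Hypothetical page budget for a small test database.
map_size = lmdb_map_size(50)
print(map_size, "bytes =", map_size // mmap.PAGESIZE, "pages")
```

The resulting `map_size` could then be passed wherever the test currently hard-codes `200 * 1024`.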
@sdvillal can you take a look here? I've messed around a bit.
Manually triggered the test workflow from the branch, otherwise it picks up the workflow from main (which points to files that no longer exist): https://github.com/aqlaboratory/openfold-3/actions/runs/24134205657
Update: all the above tests passed, I think this is ready to go! LMDB is the only loose end, but it's being addressed in other PRs already.
Another batch of tests:
https://github.com/aqlaboratory/openfold-3/actions/runs/24185282518 🛑 failed, missing dep
https://github.com/aqlaboratory/openfold-3/actions/runs/24186689907 🟢
jnwei left a comment
This looks great! Thank you for further polishing this @jandom, and of course @sdvillal and @Emrys-Merlin for the initial contribution in #34. We might need more tweaks to documentation/workflows later on, but this is a great starting point.
One question regarding the pixi environments and the diagram. While the diagram is a great way to show the different groups of dependencies, I worry a bit that people will see openfold3-cpu and assume that we support a CPU-only version for running OpenFold3. It might be good to have a footnote or other note explaining the limitations of that environment.
- **`Dockerfile.pixi`** (recommended): uses [pixi](https://pixi.sh) to manage all dependencies including CUDA toolkit, cuDNN, CUTLASS, and build tools from conda-forge. No `nvidia/cuda` base image needed.
- **`Dockerfile.conda`** (legacy): uses conda/mamba with an `nvidia/cuda` base image. Will be deprecated in Q3 2026.

## Pixi-based builds
Should we also add instructions for building a Docker environment with cuequivariance? Currently that's only present in the conda path, which is scheduled to be deprecated.
I think the pixi environments already come with cueq installed, so there is no need to install it optionally. It's on by default.
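A quick way to confirm this from inside an activated environment, sketched with the standard library only. The module names `cuequivariance` and `cuequivariance_torch` are assumptions about the import names; adjust them if the packages expose different ones.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


# Assumed import names for the cuEquivariance packages.
for name in ("cuequivariance", "cuequivariance_torch"):
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

This only checks that the packages are present; it says nothing about whether a GPU is visible or whether the tests that use them will run.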
#[feature.kalign.pypi-dependencies]
#kalign-python = "*"
Is this still being used? Maybe it's redundant with the pyproject.toml dependencies?
Yeah, this is not used atm, but it indicates a real pain point I've encountered.
By default, anything in pyproject.toml will default to a conda package. For kalign-python, the PyPI package is more up to date, so we may have to "bring this back" if and when needed.
We can remove it as I won't forget.
Note that the kalign-python PyPI packages vendor OpenMP's runtime on both macOS and Linux, which could be problematic. It definitely is on macOS, where we need to work around a big fat warning (I should document this), but I would not be too confident on Linux either, even if its dynamic linker should be better behaved. I opened an issue upstream about it.
I also created a PR to have kalign-python conda packages (there are none at the moment). This would create OpenMP-risk-free environments (because in the conda ecosystem these dependencies are coordinated and shared by all packages within an environment). Unfortunately, it is taking too long to come to a resolution. If and when that is accepted, we will add it to the conda dependencies - I would go as far as getting kalign-python from conda-forge also for the PyPI-centered environments.
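To make the OpenMP concern concrete, here is a rough, Linux-only diagnostic sketch that lists OpenMP-looking runtimes mapped into the current process; seeing more than one after importing the relevant packages would hint at a vendored copy. The file-name patterns (`libomp`, `libgomp`, `libiomp5`) and both helper names are assumptions for illustration, and the check is a heuristic, not a substitute for a proper fix.

```python
import re
from pathlib import Path

# Assumed file-name stems for the LLVM, GNU and Intel OpenMP runtimes.
_OPENMP_NAME = re.compile(
    r"(libomp|libgomp|libiomp5)([.-][^/\s]*)?\.(so|dylib)(\.[^/\s]*)?"
)


def openmp_libraries(mapped_paths):
    """Return the distinct paths whose file name looks like an OpenMP runtime."""
    found = set()
    for path in mapped_paths:
        name = path.rsplit("/", 1)[-1]
        if _OPENMP_NAME.fullmatch(name):
            found.add(path)
    return sorted(found)


def loaded_library_paths(maps_file="/proc/self/maps"):
    """List files mapped into this process (Linux only; empty list elsewhere)."""
    maps = Path(maps_file)
    if not maps.exists():
        return []
    paths = set()
    for line in maps.read_text().splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5])
    return sorted(paths)


runtimes = openmp_libraries(loaded_library_paths())
print(f"{len(runtimes)} OpenMP runtime(s) mapped:", runtimes)
```

Run it after importing the packages of interest, since shared libraries only show up once they are loaded.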
Your spidey-sense was correct @jnwei – the pixi envs include cueq but the tests are getting skipped (locally, DGX – on the AWS builders we don't have a GPU). Otherwise the branch is all green https://github.com/aqlaboratory/openfold-3/actions/runs/24453803143 🟢
IIRC, we set things up so that cueq is not installed on ARM systems. For me, the cueq tests ran successfully on both A100s and B300 - we needed to pin it down for the tests to pass. Will take a look later.
Inference works like a charm on CPU only - in my limited experience at least. What kind of limitations are you thinking about?
This looks cool; I would maybe simplify by (it is already good enough as is):
Fine for me to merge as is, and I can give a hand with more accurate docs later, if needed. As a note, one of the limitations (or design choices) of pixi makes it difficult to have more modular environments (e.g., ones where you can opt out of deepspeed with a command-line option). I discussed this with the pixi folks and they seem to have some ideas on how to bring this modularity back without fully compromising locking. It would be a great addition for our complex setup.
@sdvillal that's good feedback thanks!!
well about that :P #181 – it's available for 0.8 and above, but it required the tests to be fixed. inference works for me in this branch too.
sorry it's late – maybe you can unpack this for me? BTW there is a source .excalidraw in the repo, and a vscode extension if you want to add your chef's kiss!


Completing the work started in #34
Remaining tasks