27 changes: 20 additions & 7 deletions docker/Build_instructions_blackwell.md
@@ -12,21 +12,34 @@ This will create a Docker image named `openfold-3-blackwell` with the `latest` tag
## Test PyTorch and CUDA

```bash
docker run \
  --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  openfold-3-blackwell:latest \
  python -c "import torch; print('CUDA:', torch.version.cuda); print('PyTorch:', torch.__version__)"
```

Should print something like:

```
CUDA: 13.1
PyTorch: 2.10.0a0+b4e4ee81d3.nv25.12
```

Collaborator Author: This is important: with CUDA 12.9+ we get sm121 support out of the box.
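The review note above is essentially a version gate: CUDA 12.9 and later ship sm_121 (Blackwell) support natively. A minimal sketch of that check (the function name and the `(12, 9)` cutoff are illustrative, derived from the note rather than from any OpenFold3 API):

```python
# Decide whether a CUDA toolkit version string is new enough to support
# sm_121 (Blackwell) out of the box, per the review note (CUDA 12.9+).
def supports_sm121(cuda_version: str) -> bool:
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= (12, 9)

print(supports_sm121("12.8"))  # older toolkit: needed workarounds
print(supports_sm121("13.1"))  # toolkit in this image: native sm_121
```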

## Test run_openfold inference example

```bash
docker run \
  --gpus all -it \
  --ipc=host \
  --ulimit memlock=-1 \
  -v /home/jandom/.openfold3:/root/.openfold3 \
  -v $(pwd)/output:/output \
  -w /output openfold-3-blackwell:latest \
  run_openfold predict \
  --query_json=examples/example_inference_inputs/query_ubiquitin.json \
  --num_diffusion_samples=1 \
  --num_model_seeds=1 \
  --use_templates=false
```
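Both bind-mount sources must exist on the host before the container starts. A minimal preparation sketch, assuming (as the `-v` flags suggest) that `~/.openfold3` holds OpenFold3 caches/weights and `./output` receives predictions; adjust the paths to your setup:

```shell
# Create the host-side directories backing the bind mounts above.
# ~/.openfold3 is assumed to be the OpenFold3 cache/weights directory;
# ./output is where predictions land via -v $(pwd)/output:/output.
set -eu
OPENFOLD_CACHE="${HOME}/.openfold3"
mkdir -p "${OPENFOLD_CACHE}" output
echo "cache:  ${OPENFOLD_CACHE}"
echo "output: $(pwd)/output"
```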
56 changes: 22 additions & 34 deletions docker/Dockerfile.blackwell
@@ -1,5 +1,5 @@
# Simple OpenFold3 Dockerfile using NVIDIA PyTorch container
FROM nvcr.io/nvidia/pytorch:25.12-py3

# Install system dependencies
RUN apt-get update && apt-get install -y \
@@ -13,15 +13,21 @@ RUN apt-get update && apt-get install -y \
libxft2 \
&& rm -rf /var/lib/apt/lists/*

# Clone OpenFold3 source and modify environment file
# Install CUTLASS for DeepSpeed Evoformer attention kernel
# We need only the headers for DeepSpeed JIT, don't need the pip package with bindings
WORKDIR /opt
RUN git clone https://github.com/aqlaboratory/openfold-3.git && \
cd openfold-3 && \
cp -p environments/production-linux-64.yml environments/production.yml.backup && \
grep -v "pytorch::pytorch" environments/production.yml > environments/production.yml.tmp && \
mv environments/production.yml.tmp environments/production.yml
Comment on lines -18 to -22 (Collaborator Author): This was completely unused: everything is installed via the system python+pip.

RUN git clone https://github.com/NVIDIA/cutlass --branch v3.6.0 --depth 1

# Pre-compile DeepSpeed operations for Blackwell GPUs to avoid runtime compilation
# Create necessary cache directories
RUN python3 -c "import os; os.makedirs('/root/.triton/autotune', exist_ok=True)"
Collaborator Author: This is empirically needed in my tests, which is a bit odd.


WORKDIR /opt/openfold-3
# Set environment variables including CUDA architecture for Blackwell
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
KMP_AFFINITY=none \
CUTLASS_PATH=/opt/cutlass \
TORCH_CUDA_ARCH_LIST="12.1"
Comment on lines +25 to +30 (Collaborator Author): I think we can still remove some of these – all of those could be provided at runtime, and are quite specific to the use case here.


# Install Python dependencies
RUN pip install --no-cache-dir \
@@ -46,36 +52,18 @@ RUN pip install --no-cache-dir \
awscli \
memory_profiler \
func_timeout \
biotite==1.2.0 \
"nvidia-cutlass<4" \
"cuda-python<12.9.1"
Comment on lines -50 to -51 (Collaborator Author):

- We get cuda-python with the image, no need to duplicate that
- We also only need the cutlass headers, no need to install the package

biotite==1.2.0

# Install CUTLASS for DeepSpeed Evoformer attention kernel
WORKDIR /opt
RUN git clone https://github.com/NVIDIA/cutlass --branch v3.6.0 --depth 1
COPY pyproject.toml /opt/openfold3/
COPY openfold3/__init__.py /opt/openfold3/openfold3/
COPY scripts/ /opt/openfold3/scripts/

# Install OpenFold3 package itself (provides run_openfold command)
WORKDIR /opt/openfold-3
RUN python3 -m pip install --editable --no-deps .

# Set environment variables including CUDA architecture for Blackwell
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
KMP_AFFINITY=none \
CUTLASS_PATH=/opt/cutlass \
TORCH_CUDA_ARCH_LIST="12.0"

# Pre-compile DeepSpeed operations for Blackwell GPUs to avoid runtime compilation
# Create necessary cache directories
RUN python3 -c "import os; os.makedirs('/root/.triton/autotune', exist_ok=True)"
WORKDIR /opt/openfold3
RUN python3 -m pip install --no-deps --editable .

# Create a Python sitecustomize.py to set TORCH_CUDA_ARCH_LIST before any imports
# This ensures the variable is set before PyTorch's cpp_extension checks it
RUN mkdir -p /usr/local/lib/python3.12/site-packages && \
echo 'import os' > /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
echo 'os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "12.0")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
echo 'os.environ.setdefault("CUTLASS_PATH", "/opt/cutlass")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
echo 'os.environ.setdefault("KMP_AFFINITY", "none")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py
Comment on lines -74 to -78 (Collaborator Author): All of this can be removed.
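For context on what the removed block did: `sitecustomize` is a standard CPython hook that the interpreter imports automatically at startup, so environment defaults set there are visible before `torch.utils.cpp_extension` reads `TORCH_CUDA_ARCH_LIST`. A standalone sketch of the mechanism (not tied to this image):

```python
# sitecustomize.py-style defaulting: setdefault only fills a variable in
# when the environment does not already provide it, so runtime overrides
# (e.g. docker run -e TORCH_CUDA_ARCH_LIST=...) still take precedence.
import os

os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "12.0")
os.environ.setdefault("CUTLASS_PATH", "/opt/cutlass")
print(os.environ["TORCH_CUDA_ARCH_LIST"], os.environ["CUTLASS_PATH"])
```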

# Copy the entire source tree directly (at the very end for optimal caching)
COPY . /opt/openfold3

# Default command
CMD ["/bin/bash"]
54 changes: 54 additions & 0 deletions environments/production-linux-aarch64.yml
@@ -0,0 +1,54 @@
# Blackwell (sm_121) on aarch64 environment
name: openfold3-env
variables:
CUDA_HOME: /usr/local/cuda
PATH: /usr/local/cuda/bin:${PATH}
LD_LIBRARY_PATH: /usr/local/cuda/lib64:${LD_LIBRARY_PATH}
  # Triton bundles its own ptxas which does not support sm_121
  # This forces Triton to use the system ptxas compiler, aware of sm_121
TRITON_PTXAS_PATH: /usr/local/cuda/bin/ptxas
# Requires: git clone https://github.com/NVIDIA/cutlass --branch v3.6.0 --depth 1 ~/workspace/cutlass
CUTLASS_PATH: /home/jandom/workspace/cutlass
# Note: OMP_NUM_THREADS=1 is required to avoid threading conflicts
OMP_NUM_THREADS: "1"
Comment on lines +4 to +13 (Collaborator Author): This is the really ugly part, especially the hard-coded paths specific to my box or $HOME – all of this gets taken care of when using the docker image from nvidia with torch pre-installed.


channels:
- conda-forge
- bioconda
- nvidia
dependencies:
- python
- awscli
- setuptools
- pip
- conda-forge::uv
- pytorch-lightning
- biopython
- numpy
- pandas
- PyYAML
- requests
- scipy
- tqdm
- typing-extensions
- wandb
- modelcif
- ml-collections
- rdkit=2025.09.3
- mmseqs2
- bioconda::hmmer
- bioconda::hhsuite
- bioconda::kalign2
- bioconda::snakemake
- memory_profiler
- func_timeout
- boto3
- conda-forge::python-lmdb=1.6
- conda-forge::ijson
- pip:
# PyTorch stable cu130 for aarch64 - works on Blackwell via PTX JIT
- --extra-index-url https://download.pytorch.org/whl/cu130
- torch>=2.9.0
Comment on lines +50 to +51 (Collaborator Author): This is important to get a sufficiently high version of torch. A couple of things got removed or moved:

- biotite conda package only exists for linux-64 but the pip package does better
- mkl removed
- pytorch-cuda, again only for linux-64

- biotite==1.2.0
- deepspeed
- pdbeccdutils