- Anaconda (required)
- Python 3.8
- CUDA-enabled GPU (recommended for training)

Create and activate the conda environment:

    conda create --name SE3Bind python=3.8
    conda activate SE3Bind

Option A: Run the setup script

    bash setup.sh

Option B: Install manually

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    pip install e3nn
    pip install biopython
    pip install pandas
    pip install matplotlib
    pip install tqdm
    pip install plotly
    pip install mrcfile

Repository layout:

    SE3Bind/
    ├── src/                        # Source code
    │   ├── train_T0.py             # Task 0 (Re-Docking) training
    │   ├── train_T1.py             # Task 1 (Binding affinity) training
    │   ├── TrainerT0.py            # Re-docking model trainer class
    │   ├── TrainerT1.py            # Binding affinity model trainer class
    │   ├── TrainerWrapper.py       # Wrapper for both trainers
    │   ├── TorchDockingFFT.py      # FFT-based docking
    │   ├── ProcessCoords.py        # Coordinate processing
    │   ├── Rotations.py            # Rotation handling
    │   ├── UtilityFunctions.py     # Utility functions
    │   ├── PlotterT0.py            # Task 0 (Re-Docking) visualization
    │   ├── PlotterT1.py            # Task 1 (Binding affinity) visualization
    │   ├── configT0.txt            # Task 0 (Re-Docking) configuration
    │   └── configT1.txt            # Task 1 (Binding affinity) configuration
    ├── models/                     # Model architectures
    │   ├── model_sampling.py       # Sampling model
    │   ├── model_docking.py        # Docking model
    │   └── model_interaction.py    # Interaction/affinity model
    ├── data/                       # Dataset scripts and data
    ├── Figs/                       # Output visualizations
    └── Log/                        # Training logs and saved models
Create a CSV file (e.g., `mappings_example.csv`) that lists your PDB files and their chain mappings:

    filename,antibody_chains,antigen_chains
    5mev.pdb,H;L,A
    3iu3.pdb,A;B,K
    1s78.pdb,H;L,A;B

- `filename`: Name of your PDB file
- `antibody_chains`: Semicolon-separated chain IDs for the antibody (e.g., `H;L`)
- `antigen_chains`: Semicolon-separated chain IDs for the antigen (e.g., `A` or `A;B`)
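As a rough illustration of how such a mapping file can be read, the sketch below parses the example rows with Python's standard `csv` module and splits each chain field on `;`, the separator used in the example rows. The helper name `parse_chain_mapping` is hypothetical and is not part of SE3Bind; the repository's own script may parse the file differently.

```python
import csv
import io


def parse_chain_mapping(csv_text):
    """Parse mapping CSV text into dicts with chain IDs split into lists.

    Hypothetical helper for illustration only.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "filename": row["filename"].strip(),
            # Chain IDs are separated by ';' in the example rows.
            "antibody_chains": [c.strip() for c in row["antibody_chains"].split(";") if c.strip()],
            "antigen_chains": [c.strip() for c in row["antigen_chains"].split(";") if c.strip()],
        })
    return rows


example = """filename,antibody_chains,antigen_chains
5mev.pdb,H;L,A
3iu3.pdb,A;B,K
1s78.pdb,H;L,A;B
"""

mappings = parse_chain_mapping(example)
for m in mappings:
    print(m["filename"], m["antibody_chains"], m["antigen_chains"])
```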
Run the dataset generation script:

    cd SE3Bind/data/
    python Inference_dataset_generation.py <csv_file> <pdb_directory> <output_path> <output_name>

Or use the provided bash script:

    cd SE3Bind/data/
    bash run_inference_dataset_gen.sh

Example:

    python Inference_dataset_generation.py mappings_example.csv ./pdb_files/ ./datasets/ inference_dataset.pkl

Arguments:

- `csv_file`: Path to your CSV mapping file
- `pdb_directory`: Directory containing your PDB files
- `output_path`: Where to save the output dataset
- `output_name`: Name for the output pickle file

This will:

- Read your CSV file
- Split each PDB into antibody and antigen chains
- Generate voxelized representations (75³ grid at 2 Å resolution)
- Save as a pickle file ready for inference
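The splitting and voxelization steps can be pictured with a small standard-library sketch. It is not the code in `Inference_dataset_generation.py`: the helper names are hypothetical, the grid stores plain atom counts rather than whatever features the real script computes, and centring the box on the centroid is an assumption. It relies only on the fixed-column PDB layout (chain ID in column 22, coordinates in columns 31-54) and on the 75-voxel, 2 Å grid quoted above, which spans 75 × 2 = 150 Å per axis.

```python
GRID_DIM = 75      # voxels per axis, as quoted in this README
RESOLUTION = 2.0   # Å per voxel


def make_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal PDB ATOM record using the fixed-column layout."""
    return (f"{'ATOM':<6s}{serial:>5d} {name:<4s} {resname:>3s} {chain}{resseq:>4d}{'':4s}"
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def split_by_chain(pdb_lines, antibody_chains, antigen_chains):
    """Collect (x, y, z) tuples per partner, keyed on the chain ID in column 22."""
    antibody, antigen = [], []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        chain = line[21]
        if chain in antibody_chains:
            antibody.append(xyz)
        elif chain in antigen_chains:
            antigen.append(xyz)
    return antibody, antigen


def voxelize(coords, grid_dim=GRID_DIM, resolution=RESOLUTION):
    """Return {(i, j, k): atom_count} on a cubic grid centred on the centroid.

    Assumption for illustration: atoms falling outside the box are skipped.
    """
    n = len(coords)
    center = [sum(c[a] for c in coords) / n for a in range(3)]
    half = grid_dim * resolution / 2.0
    grid = {}
    for c in coords:
        idx = tuple(int((c[a] - center[a] + half) // resolution) for a in range(3))
        if all(0 <= i < grid_dim for i in idx):
            grid[idx] = grid.get(idx, 0) + 1
    return grid


# Three made-up atoms: two on antibody chains H and L, one on antigen chain A.
lines = [
    make_atom_line(1, "CA", "ALA", "H", 1, 0.0, 0.0, 0.0),
    make_atom_line(2, "CA", "GLY", "L", 1, 4.0, 0.0, 0.0),
    make_atom_line(3, "CA", "SER", "A", 1, 10.0, 10.0, 10.0),
]
antibody, antigen = split_by_chain(lines, {"H", "L"}, {"A"})
ab_grid = voxelize(antibody)
print(len(antibody), len(antigen), sorted(ab_grid))
```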
Configure your inference settings in `src/inference_config.txt`:

    # Key settings to modify:
    testset = ../data/datasets/inference_dataset.pkl  # Your generated dataset
    resume_epoch = 1000                               # Epoch of trained model to load

Run inference using the command-line interface:

    cd SE3Bind/src/
    python train_T1.py --mode evaluate --config inference_config.txt

Command-line options:

- `--mode`: Operation mode (`train`, `evaluate`, or `resume`)
- `--config`: Path to config file (default: `configT1.txt`)
- `--testset`: Path to dataset (overrides config file)
- `--epoch`: Epoch number for evaluation (overrides config file)

Examples:

    # Evaluate with config file settings
    python train_T1.py --mode evaluate --config inference_config.txt

    # Evaluate with CLI overrides
    python train_T1.py --mode evaluate --config inference_config.txt \
        --experiment my_custom_run --testset ../data/datasets/my_data.pkl --epoch 1000

The predictions, including the predicted binding affinity (ΔG), are saved in `Log/losses/<experiment_name>/`.
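The "overrides config file" behaviour can be sketched with `argparse`. This is an illustration under assumptions, not the argument handling in `train_T1.py`: the option names come from the options and example commands above, while `merge_settings`, the requirement that `--mode` be given, and the mapping of `--epoch` onto the `resume_epoch` config key are guesses made for the example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    # Assumption: --mode is treated as required; its real default is not stated.
    parser.add_argument("--mode", choices=["train", "evaluate", "resume"], required=True)
    parser.add_argument("--config", default="configT1.txt")
    parser.add_argument("--experiment", default=None)  # appears in the example command
    parser.add_argument("--testset", default=None)
    parser.add_argument("--epoch", type=int, default=None)
    return parser


def merge_settings(file_settings, args):
    """Return config-file settings with any CLI overrides applied on top."""
    merged = dict(file_settings)
    if args.testset is not None:
        merged["testset"] = args.testset
    if args.epoch is not None:
        # Assumption: --epoch overrides the resume_epoch key.
        merged["resume_epoch"] = args.epoch
    return merged


# Values as they might be read from inference_config.txt.
file_settings = {
    "testset": "../data/datasets/inference_dataset.pkl",
    "resume_epoch": 1000,
}
args = build_parser().parse_args([
    "--mode", "evaluate",
    "--config", "inference_config.txt",
    "--testset", "../data/datasets/my_data.pkl",
])
settings = merge_settings(file_settings, args)
print(settings)
```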
Task 0 (Re-Docking) predicts the re-docking pose of antibody-antigen complexes.

To train (not recommended without a GPU):

    python train_T0.py

Configuration: Edit `src/configT0.txt` to set model parameters.

Configuration for Task 1 (Binding affinity): Edit `src/configT1.txt` to set model parameters.
### Model Configuration
Key parameters in config files:
- `box_dim`: Base grid dimension (default: 50)
- `padded_dim`: Padded grid dimension (default: 100)
- `resolution_in_angstroms`: Voxel resolution (default: 2.0)
- `learning_rate`: Optimizer learning rate
- `train_epochs`: Number of training epochs
- `eval_freq`: Evaluation frequency
- `docked_complex`: Use docked complex features (True/False)
- `zero_feature`: Use zero-feature ablation (True/False)
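To make the grid parameters concrete, the sketch below reads `key = value` settings and converts the default grid sizes to physical extents: 50 voxels × 2.0 Å = 100 Å for the base box and 100 × 2.0 Å = 200 Å for the padded box. The `key = value  # comment` syntax is inferred from the `inference_config.txt` excerpt earlier in this README; `parse_config` and `coerce` are hypothetical helpers, the `docked_complex` and `zero_feature` values shown are arbitrary, and the repository's own config loader may behave differently.

```python
def coerce(value):
    """Convert a raw string to bool, int, or float where possible."""
    if value in ("True", "False"):
        return value == "True"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_config(text):
    """Parse 'key = value  # comment' lines into a dict (assumed syntax)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = coerce(value)
    return settings


example_config = """
box_dim = 50                    # Base grid dimension
padded_dim = 100                # Padded grid dimension
resolution_in_angstroms = 2.0   # Voxel resolution
docked_complex = False          # arbitrary value for the example
zero_feature = False            # arbitrary value for the example
"""

cfg = parse_config(example_config)
box_extent = cfg["box_dim"] * cfg["resolution_in_angstroms"]
padded_extent = cfg["padded_dim"] * cfg["resolution_in_angstroms"]
print(f"base box: {box_extent} Å per axis, padded box: {padded_extent} Å per axis")
```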
### Monitoring Training
Training logs are saved in:
- `src/slurm_log/` (cluster jobs)
- `Log/losses/` (loss values)
- `Log/saved_models/` (model checkpoints)
---
## Inputs
### Data Format
- **Training/Testing datasets:** `.pkl` files containing:
  - Receptor (antibody) volumes
  - Ligand (antigen) volumes
  - Ground truth rotations and translations
  - Atomic coordinates
  - Binding affinity values
  - Structure IDs and cluster information

Paths are configured in `configT0.txt` and `configT1.txt`.
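A quick way to check a dataset file before pointing a config at it is to load it and look at its top-level type and length. The sketch below does only that, because the exact record layout is not documented here; the stand-in record and its field names (`structure_id`, `binding_affinity`) are made up so the example runs without a real dataset. Note that `pickle.load` can execute arbitrary code, so only open files from sources you trust, and a real dataset may need the libraries used to create it (for example PyTorch) to be importable.

```python
import pickle
import tempfile
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and return its top-level type name and length."""
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    size = len(data) if hasattr(data, "__len__") else None
    return type(data).__name__, size


# Stand-in file so the sketch runs anywhere; replace with a real dataset path.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_dataset.pkl"
    with open(demo_path, "wb") as handle:
        # Hypothetical record; the real field names may differ.
        pickle.dump([{"structure_id": "demo", "binding_affinity": -10.0}], handle)
    kind, size = summarize_pickle(demo_path)
    print(kind, size)
```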
---
## Outputs
### Generated Directories
#### `Figs/`
Visualization outputs:
- **`correlation_plots/`** - ΔF vs ΔG correlation plots
- **`loss_plots/`** - Training loss curves
- **`RMSD_distribution_plots/`** - RMSD distributions
- **`Coordinate_RMSD/`** - 3D docking pose visualizations (HTML)
- **`CorrelationFFTvolumes/`** - Energy grid visualizations
- **`Feature_volumes/`** - Feature map visualizations (.html and .map files)
- **`Input_volumes/`** - Input volume visualizations
#### `Log/`
Training artifacts:
- **`losses/`** - Loss and RMSD log files (.txt)
- **`saved_models/`** - Model checkpoints (.th)
### Output File Types
- `.html` - Interactive 3D plots (open in web browser)
- `.map` - MRC density maps (open in PyMOL/ChimeraX)
- `.txt` - Training logs and metrics
- `.th` - PyTorch model checkpoints
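When browsing results, it can help to group the generated files by type. The sketch below walks an output directory and counts files per extension, using the descriptions from the list above; `summarize_outputs` is a hypothetical helper and the demo files are empty placeholders created in a temporary directory.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Extension descriptions taken from the list in this README.
FILE_TYPES = {
    ".html": "Interactive 3D plots",
    ".map": "MRC density maps",
    ".txt": "Training logs and metrics",
    ".th": "PyTorch model checkpoints",
}


def summarize_outputs(root):
    """Count files under root for each known output extension."""
    counts = Counter(
        path.suffix
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in FILE_TYPES
    )
    return dict(counts)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file names; only the directory names come from this README.
    for name in ("Figs/loss_plots/a.html", "Log/losses/run.txt", "Log/saved_models/m.th"):
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    summary = summarize_outputs(tmp)
    for suffix, count in sorted(summary.items()):
        print(f"{suffix}: {count} file(s), {FILE_TYPES[suffix]}")
```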
---