Developed at Artificial Intelligence Protein Design Lab
Figure: Optimization results across 32 cycles for protein-ligand complexes, showing improvements in MPNN scores and binding energies (DDG).
An iterative protein design pipeline that combines LigandMPNN sequence generation with PyRosetta FastRelax optimization. Through repeated cycles of sequence design and structural relaxation, this method can improve protein backbone geometry and binding affinity.
# Create conda environment with required dependencies
conda create -n ligmpnn-fr -y \
-c nvidia -c pytorch -c conda-forge \
python=3.12 \
pytorch pytorch-cuda=12.4 \
numpy scipy pandas \
openbabel \
biopython prody ml-collections dm-tree
conda activate ligmpnn-fr

# 1. LigandMPNN
# 2. PyRosetta
# 3. This repository

Input: Protein-ligand complex PDB + ligand parameters
↓
┌─→ [Iteration Cycle] ←─┐
│ │ │
│ ├── 1. LigandMPNN Sequence Design
│ │ └── Generate new amino acid sequence
│ │
│ ├── 2. Side Chain Packing (Optional)
│ │ └── Optimize side chain conformations
│ │
│ ├── 3. PyRosetta FastRelax
│ │ ├── Full backbone + side chain optimization
│ │ ├── Apply distance constraints (optional)
│ │ └── Energy minimization
│ │
│ ├── 4. Structure Evaluation
│ │ ├── Calculate Rosetta energy
│ │ └── Evaluate MPNN score
│ │
│ └── 5. Select Best Structure → Next Cycle
└──────────────────────────────────────┘
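The cycle above can be sketched as a small driver loop. This is a minimal illustration, not the code in `ligandmpnn_fastrelax_complete.py`: the `Candidate` class, `run_cycles`, and the two `toy_*` stand-ins are hypothetical names, and the sketch assumes the best structure of each cycle is chosen only among that cycle's candidates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """One designed sequence with its scores (hypothetical container)."""
    sequence: str
    mpnn_score: float  # lower is better
    ddg: float         # binding energy in REU, lower is better


def run_cycles(
    start: Candidate,
    n_cycles: int,
    num_seq_per_target: int,
    design: Callable[[Candidate, int], List[Candidate]],
    relax: Callable[[Candidate], Candidate],
    metric: Callable[[Candidate], float] = lambda c: c.ddg,
) -> List[Candidate]:
    """Return the best candidate of each cycle; each one seeds the next cycle."""
    best = start
    history = []
    for _ in range(n_cycles):
        designed = design(best, num_seq_per_target)  # step 1: sequence design
        relaxed = [relax(c) for c in designed]       # steps 2-4: pack, relax, score
        best = min(relaxed, key=metric)              # step 5: keep the lowest metric
        history.append(best)
    return history


def toy_design(parent: Candidate, n: int) -> List[Candidate]:
    # Toy stand-in for LigandMPNN: candidate i gets a DDG lowered by i REU.
    return [Candidate(parent.sequence, parent.mpnn_score, parent.ddg - i) for i in range(n)]


def toy_relax(candidate: Candidate) -> Candidate:
    # Toy stand-in for FastRelax: returns the candidate unchanged.
    return candidate
```

In a real run, `design` would wrap LigandMPNN and `relax` would wrap PyRosetta FastRelax. For a metric where higher is better, pass a negated `metric` so that `min` still selects the best candidate.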
python ligandmpnn_fastrelax_complete.py \
--pdb_path protein_ligand_complex.pdb \
--ligand_params_path ligand.params \
--out_folder output_directory \
--n_cycles 5

See example/ directory for sample input files and analysis scripts:
- `example/lmpnn_fr.py` - Example optimization workflow
- `example/fastrelax_scores.svg` - Visualization of optimization results
# Standard optimization with 8 cycles
python ligandmpnn_fastrelax_complete.py \
--pdb_path complex.pdb \
--ligand_params_path ligand.params \
--out_folder results \
--n_cycles 8 \
--temperature 0.1 \
--num_seq_per_target 4 \
--save_stats

# High-throughput optimization with parallel processing
python ligandmpnn_fastrelax_complete.py \
--pdb_path complex.pdb \
--ligand_params_path ligand.params \
--out_folder results \
--n_cycles 16 \
--num_seq_per_target 8 \
--num_processes 8 \
--pyrosetta_threads 4 \
--pack_side_chains \
--temperature 0.15 \
--target_atm_for_cst "O1,N1,N2" \
--hb_atoms "O1,O2,O3" \
--save_stats

| Argument | Description |
|---|---|
| `--pdb_path` | Input protein-ligand complex PDB file |
| `--ligand_params_path` | Rosetta ligand parameters file (.params) |
| `--out_folder` | Output directory for results |

| Argument | Default | Description |
|---|---|---|
| `--n_cycles` | 3 | Number of design-optimization cycles |
| `--temperature` | 0.1 | LigandMPNN sampling temperature (0.05-0.3) |
| `--num_seq_per_target` | 1 | Sequences generated per cycle |

| Argument | Default | Description |
|---|---|---|
| `--pack_side_chains` | False | Enable side chain packing optimization |
| `--target_atm_for_cst` | "" | Ligand atoms for distance constraints (e.g., "O1,N1,N2") |
| `--selection_metric` | "ddg" | Metric for structure selection (ddg/totalscore/cms) |

| Argument | Default | Description |
|---|---|---|
| `--num_processes` | 1 | Parallel processes for relaxation |
| `--pyrosetta_threads` | 1 | Threads per PyRosetta process |

| Argument | Default | Description |
|---|---|---|
| `--redesigned_residues` | [] | Specific residues to redesign (e.g., "A45 A46 A48") |
| `--fixed_residues` | [] | Residues to keep unchanged |
| `--omit_AAs` | "X" | Amino acids to exclude from design |
| `--save_stats` | False | Save detailed statistics |
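The documented arguments and defaults could be declared with `argparse` roughly as follows. This is a sketch under stated assumptions, not the script's actual parser: the argument types, the `nargs="*"` handling of residue lists, the empty default for `--hb_atoms` (its default is not documented above), and the helpers `build_parser`, `split_csv`, and `split_residues` are all hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="LigandMPNN + FastRelax cycles (sketch)")
    # Required inputs
    p.add_argument("--pdb_path", required=True)
    p.add_argument("--ligand_params_path", required=True)
    p.add_argument("--out_folder", required=True)
    # Design parameters
    p.add_argument("--n_cycles", type=int, default=3)
    p.add_argument("--temperature", type=float, default=0.1)
    p.add_argument("--num_seq_per_target", type=int, default=1)
    # Packing, constraints, selection
    p.add_argument("--pack_side_chains", action="store_true")
    p.add_argument("--target_atm_for_cst", default="")
    p.add_argument("--hb_atoms", default="")  # assumed default, not documented
    p.add_argument("--selection_metric", default="ddg",
                   choices=["ddg", "totalscore", "cms"])
    # Parallelism
    p.add_argument("--num_processes", type=int, default=1)
    p.add_argument("--pyrosetta_threads", type=int, default=1)
    # Residue control and output
    p.add_argument("--redesigned_residues", nargs="*", default=[])
    p.add_argument("--fixed_residues", nargs="*", default=[])
    p.add_argument("--omit_AAs", default="X")
    p.add_argument("--save_stats", action="store_true")
    return p


def split_csv(value: str) -> List[str]:
    """Turn "O1,N1,N2" into ["O1", "N1", "N2"]; an empty string gives []."""
    return [item.strip() for item in value.split(",") if item.strip()]


def split_residues(values: List[str]) -> List[str]:
    """Accept both `A45 A46` and the quoted form "A45 A46" as residue lists."""
    return [token for value in values for token in value.split()]
```

For example, `split_csv(args.target_atm_for_cst)` yields the ligand atom names to constrain, and `split_residues(args.redesigned_residues)` gives the same list whether or not the residues were quoted on the command line.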
output_directory/
├── seqs/ # Generated sequences
│ ├── input_cycle_1.fa # FASTA format with MPNN scores
│ └── input_cycle_N.fa
├── backbones/ # Intermediate structures
│ ├── input_cycle_1_threaded.pdb # Post-threading structures
│ └── input_cycle_N_threaded.pdb
├── relaxed/ # Optimized structures
│ ├── input_cycle_1_relaxed.pdb # Final relaxed structures
│ └── input_cycle_N_relaxed.pdb
└── stats/ (optional) # Performance metrics
├── input_cycle_1.json # Energy/score statistics
└── input_cycle_N.json
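When `--save_stats` is enabled, the per-cycle JSON files can be collected in cycle order with a few lines of Python. The sketch below relies only on the `stats/<name>_cycle_N.json` naming shown in the layout above. It makes no assumption about the keys inside each file, and it assumes one file per cycle. The function name `load_cycle_stats` is hypothetical.

```python
import json
import re
from pathlib import Path

# Matches the trailing "_cycle_<N>.json" part of a stats file name.
_CYCLE_RE = re.compile(r"_cycle_(\d+)\.json$")


def load_cycle_stats(out_folder):
    """Map cycle number -> parsed JSON content, sorted by cycle number."""
    stats = {}
    for path in Path(out_folder, "stats").glob("*_cycle_*.json"):
        match = _CYCLE_RE.search(path.name)
        if match:
            with path.open() as handle:
                stats[int(match.group(1))] = json.load(handle)
    # Sort numerically so that cycle 10 comes after cycle 2.
    return dict(sorted(stats.items()))
```

Calling `load_cycle_stats("output_directory")` returns a dictionary whose keys are the cycle numbers. Inspect one entry first to see which fields the script actually writes.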
Distance constraints between ligand atoms and nearby protein residues maintain binding geometry:
from Bio.PDB import PDBParser

def extract_dist_cst_from_pdb(pdb_in, lig_tr_atms, bsite_res=''):
    parser = PDBParser(QUIET=True)
    # Vectorized distance calculations with NumPy
    # Generates: AtomPair O1 147 CA 45 HARMONIC 3.2 0.5

- Parallel Processing: Multiple structures relaxed simultaneously using multiprocessing
- Multithreading: Configurable thread allocation per PyRosetta process (known issue, still needs to be fixed)
- GPU Acceleration: LigandMPNN inference on CUDA
- Vectorized Operations: NumPy-based calculations
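The vectorized distance calculation behind the distance constraints can be illustrated with NumPy broadcasting. This is a sketch, not the body of `extract_dist_cst_from_pdb`: it takes coordinates directly instead of parsing a PDB file, and the function name `atom_pair_constraints`, the 5.0 Å cutoff, and the 0.5 standard deviation default are assumptions chosen to reproduce the constraint line format shown above.

```python
import numpy as np


def atom_pair_constraints(lig_atoms, lig_resnum, prot_atoms, cutoff=5.0, sd=0.5):
    """Build AtomPair HARMONIC lines for ligand/protein atom pairs within cutoff.

    lig_atoms:  list of (atom_name, (x, y, z))
    prot_atoms: list of (atom_name, residue_number, (x, y, z))
    """
    lig_xyz = np.array([xyz for _, xyz in lig_atoms], dtype=float).reshape(-1, 3)
    prot_xyz = np.array([xyz for _, _, xyz in prot_atoms], dtype=float).reshape(-1, 3)

    # Broadcasting (L, 1, 3) against (1, P, 3) yields all L x P distances at once.
    dist = np.linalg.norm(lig_xyz[:, None, :] - prot_xyz[None, :, :], axis=-1)

    lines = []
    for i, j in zip(*np.nonzero(dist <= cutoff)):
        lig_name = lig_atoms[i][0]
        prot_name, prot_res, _ = prot_atoms[j]
        lines.append(
            f"AtomPair {lig_name} {lig_resnum} {prot_name} {prot_res} "
            f"HARMONIC {dist[i, j]:.1f} {sd}"
        )
    return lines
```

Each returned line restrains one ligand-protein atom pair to its current distance, so FastRelax is penalized for moving the pair apart or together.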
Optimization metrics from runs with protein-ligand complexes:
- MPNN Score: ~0.85 → ~0.55 (lower is better)
- DDG: ~-25 → ~-35 REU (binding energy improvement)
- Residue Total Score: Stabilized around -2.4 to -3.2 REU
- CMS: Maintained ~240-270 (confidence levels)
Most improvements occur within the first 10-15 cycles.
Common Issues:
- Memory errors: Reduce `--num_processes` or `--pyrosetta_threads`
- Version of PyRosetta: Multithreading issues
- Missing ligand: Verify `.params` file path and format
- Poor optimization: Try different `--temperature` values (0.05-0.3)
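To see why `--temperature` affects the optimization, it helps to look at how a sampling temperature reshapes a probability distribution. The sketch below is a generic temperature-scaled softmax over made-up logits, not LigandMPNN's own sampling code. The function name `softmax_with_temperature` is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

At a low temperature such as 0.05, almost all probability mass goes to the top-scoring amino acid, which gives conservative, repeatable sequences. At 0.3 the alternatives keep more weight, which increases sequence diversity across cycles.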
Tips:
- Use `--pack_side_chains` for better optimization
- Set `--target_atm_for_cst` for key ligand interactions
- Monitor with `--save_stats`
This implementation is based on the LigandMPNN-FR concept by Gyu Rie Lee (2022). The original script is available in the original_script/ directory for reference. For detailed comparison with the original implementation, see original_script/README.md.
Related Work:
- LigandMPNN: Dauparas et al. (2022)
- PyRosetta: Chaudhury et al. (2010)