Adding functionality for unconstrained ligand minimization #15
Draft
- Add Dockerfile using condaforge/miniforge3 for conda dependencies
- Add docker-build.yml workflow triggered by releases, tags, or manual dispatch
- Add .dockerignore to exclude build artifacts
- Update README with Docker usage instructions
Image published to ghcr.io/delalamo/graphrelax
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ature/docker-workflow
…mization
Detects missing residues in protein chains by checking residue numbering discontinuities and C-N bond distances. Chains are split at gaps before OpenMM minimization to prevent the creation of unrealistic peptide bonds across gaps. Original chain IDs are restored after minimization.
- Add chain_gaps.py module with detect_chain_gaps, split_chains_at_gaps, and restore_chain_ids functions
- Integrate gap detection into relaxer.py relax() method
- Add split_chains_at_gaps config option (enabled by default)
- Add --no-split-gaps CLI flag to disable the feature
- Add comprehensive tests for chain gap detection
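For reviewers, a minimal sketch of the gap check described in the commit above. It is not the code in chain_gaps.py: the tuple layout, the function name `find_gaps`, the 2.0 Å cutoff, and the choice to flag a gap when either test fails are all assumptions made for illustration.

```python
import math

# Assumed cutoff: a peptide C-N bond is about 1.33 A, so a C-N distance well
# beyond that is treated as a break. The value used in the PR is not stated.
MAX_CN_DISTANCE = 2.0


def find_gaps(residues, max_cn=MAX_CN_DISTANCE):
    """Return (chain_id, prev_resnum, next_resnum) for each suspected gap.

    residues: list of (chain_id, resnum, c_xyz, n_xyz) in file order
    (a hypothetical layout chosen for this sketch).
    """
    gaps = []
    for prev, curr in zip(residues, residues[1:]):
        if prev[0] != curr[0]:
            continue  # a chain boundary is not a gap inside a chain
        numbering_break = curr[1] - prev[1] > 1
        # Distance from the previous residue's C to the current residue's N.
        distance_break = math.dist(prev[2], curr[3]) > max_cn
        if numbering_break or distance_break:
            gaps.append((prev[0], prev[1], curr[1]))
    return gaps


example = [
    ("A", 1, (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
    ("A", 2, (3.3, 0.0, 0.0), (2.3, 0.0, 0.0)),
    ("A", 5, (11.0, 0.0, 0.0), (10.0, 0.0, 0.0)),  # residues 3-4 missing
]
print(find_gaps(example))
```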
…atom
The bug was assigning a new chain ID for every atom at a gap start residue instead of just once when entering a new segment. Added tracking of processed gap starts to prevent duplicate chain assignments.
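The fix described above can be illustrated with a small sketch. The function name, the per-atom `(chain_id, resnum)` input, and the use of spare uppercase letters for new chain IDs are assumptions for this example, not the PR's actual implementation.

```python
from string import ascii_uppercase


def assign_segment_chains(atoms, gap_starts):
    """Give each post-gap segment a fresh chain ID, once per segment.

    atoms: list of (chain_id, resnum), one entry per atom, in file order.
    gap_starts: set of (chain_id, resnum) residues that open a new segment.
    """
    used = {chain for chain, _ in atoms}
    spare = (c for c in ascii_uppercase if c not in used)
    current = {}       # original chain -> chain ID of the active segment
    processed = set()  # gap starts that already received a new chain ID
    result = []
    for chain, resnum in atoms:
        key = (chain, resnum)
        # Without the `processed` check, every atom of the gap-start
        # residue would consume a new chain ID (the bug described above).
        if key in gap_starts and key not in processed:
            current[chain] = next(spare)
            processed.add(key)
        result.append(current.get(chain, chain))
    return result


atoms = [("A", 1), ("A", 1), ("A", 5), ("A", 5), ("A", 6)]
print(assign_segment_chains(atoms, {("A", 5)}))
```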
- Free up disk space by removing unused .NET, GHC, and Boost packages
- Install CPU-only PyTorch to avoid large CUDA dependencies
- Use --no-cache-dir to minimize pip cache usage
The GitHub Actions runner was running out of disk space when installing PyTorch with CUDA dependencies (~5-7GB) alongside conda packages.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable unconstrained minimization for protein-ligand complexes using
openmmforcefields for small molecule parameterization.
Changes:
- Add include_ligands, ligand_forcefield, and ligand_smiles config options
- Create ligand_utils.py module for ligand extraction and parameterization
- Add _relax_unconstrained_with_ligands method using SystemGenerator
- Update CLI with --include-ligands, --ligand-forcefield, --ligand-smiles
- Add [ligands] optional dependency group to pyproject.toml
- Add unit tests for ligand utilities
- Add ligand_utils.py to pylint exclude (optional dependency imports)
The implementation:
1. Separates protein (ATOM) and ligands (HETATM) from PDB
2. Processes protein with pdbfixer (avoiding terminal detection issues)
3. Parameterizes ligands with OpenFF Sage 2.0 (default) or GAFF2/Espaloma
4. Combines topologies and minimizes together
Usage:
graphrelax -i complex.pdb -o minimized.pdb --include-ligands
graphrelax -i complex.pdb -o minimized.pdb --include-ligands \
--ligand-forcefield gaff-2.11 --ligand-smiles 'LIG:c1ccccc1'
Requires: pip install graphrelax[ligands]
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
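Step 1 of the implementation listed in this commit (separating ATOM and HETATM records) could look roughly like the sketch below. It relies only on the fixed-column PDB record name in columns 1-6; the function name is hypothetical, and the real code in ligand_utils.py may treat waters or modified residues differently.

```python
def split_protein_and_ligands(pdb_text):
    """Split PDB text into ATOM lines (protein) and HETATM lines (ligands)."""
    protein, ligands = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM":
            ligands.append(line)
        # Other records (TER, END, REMARK, ...) are ignored in this sketch.
    return protein, ligands


sample = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM    2  C1  LIG A 101      12.000   7.000  -5.000  1.00  0.00           C",
    "END",
])
protein_lines, ligand_lines = split_protein_and_ligands(sample)
print(len(protein_lines), len(ligand_lines))
```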
…nary
Replace the large hardcoded SMILES dictionary with dynamic extraction from PDB coordinates using RDKit bond perception:
- Rename get_common_ligand_smiles() to get_ion_smiles() (only ions need explicit SMILES since they're single atoms without bond info)
- Add is_single_atom_ligand() helper function
- Update create_openff_molecule() to try RDKit parsing first, then fall back to OpenFF PDB parsing, using ion lookup only for single atoms
- Update relaxer.py to use the new approach
- Update tests to reflect the refactored functions
This removes ~70 lines of hardcoded SMILES while making the code more robust: it can now handle any ligand that RDKit can perceive bonds for.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
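The routing this commit describes (ion lookup for single atoms, bond perception for everything else) is sketched below without RDKit or OpenFF. The table contents, the function name `choose_smiles_route`, and raising `KeyError` for an unknown ion are assumptions; the PR's get_ion_smiles() and is_single_atom_ligand() are not reproduced here.

```python
# Illustrative subset only; the PR's actual ion table is not shown on this page.
ION_SMILES = {"ZN": "[Zn+2]", "MG": "[Mg+2]", "NA": "[Na+]", "CL": "[Cl-]"}


def choose_smiles_route(resname, atom_names):
    """Decide how a ligand would obtain its chemistry.

    Returns ("ion", smiles) for a known single-atom ligand, or
    ("perceive", None) when bonds must be perceived from coordinates.
    """
    if len(atom_names) == 1:
        key = resname.strip().upper()
        if key not in ION_SMILES:
            # Assumed behaviour: fail loudly rather than guess a charge.
            raise KeyError(f"No SMILES known for single-atom ligand {key!r}")
        return ("ion", ION_SMILES[key])
    return ("perceive", None)


print(choose_smiles_route("ZN", ["ZN"]))
print(choose_smiles_route("LIG", ["C1", "C2", "O1"]))
```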
The ligand libraries (openmmforcefields, openff-toolkit, rdkit) are only available via conda-forge, not PyPI. Instead of lazy imports, use top-level try-except blocks with clear ImportError messages that tell users exactly how to install the missing dependencies:
- ligand_utils.py: Check for openff-toolkit and rdkit at import time
- relaxer.py: Check for openmmforcefields at import time
- Both provide clear conda install commands with version numbers
- pyproject.toml: Updated comment with full conda install command
- cli.py: Removed optional dependency check (now handled at runtime)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
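A sketch of the import-guard pattern, written as a small helper so it runs without the conda-only packages installed. The commit itself uses top-level try-except blocks; the helper name `require` and the exact message wording are assumptions, and no version pins are shown because the page does not state them.

```python
import importlib


def require(module_name, conda_package):
    """Import a module or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for ligand support but is not "
            f"installed. It is distributed on conda-forge, not PyPI: "
            f"conda install -c conda-forge {conda_package}"
        ) from exc


# Usage at module top level would look like (commented out so the sketch
# runs anywhere):
# rdkit = require("rdkit", "rdkit")
json_module = require("json", "json")  # stdlib stand-in to show the success path
print(json_module.__name__)
```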
This was linked to issues Jan 13, 2026
- Add artifacts.py with comprehensive artifact detection for buffers, cryoprotectants, detergents, lipids, reducing agents, and halide ions
- Auto-remove artifacts by default, preserve biologically relevant ions
- Add --keep-all-ligands and --keep-ligand flags to whitelist residues
- Update README with mamba/micromamba installation instructions
- Add tests for artifact detection and removal
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
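How the default removal and the two whitelist flags might interact, as a hedged sketch. The residue set is a tiny illustrative subset, and `select_artifacts` with its `keep` and `keep_all` parameters is a hypothetical stand-in for whatever artifacts.py actually exposes.

```python
# Illustrative subset only; the full list lives in artifacts.py.
ARTIFACT_RESIDUES = {
    "GOL",  # glycerol (cryoprotectant)
    "EDO",  # ethylene glycol (cryoprotectant)
    "TRS",  # Tris (buffer)
    "BME",  # beta-mercaptoethanol (reducing agent)
}


def select_artifacts(resnames, keep=(), keep_all=False):
    """Return the residue names that would be stripped before relaxation."""
    if keep_all:  # corresponds to --keep-all-ligands
        return set()
    whitelist = {name.upper() for name in keep}  # corresponds to --keep-ligand
    return {
        name for name in resnames
        if name in ARTIFACT_RESIDUES and name not in whitelist
    }


print(sorted(select_artifacts(["GOL", "ZN", "TRS"])))
print(sorted(select_artifacts(["GOL", "ZN", "TRS"], keep=["gol"])))
```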
Resolved conflicts:
- README.md: Combined mamba installation docs with pre-idealize feature
- cli.py: Added both artifact removal flags and pre-idealize flags
- relaxer.py: Combined ligand_utils and idealize imports
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove action="append" in favor of single comma-separated string
- Fix parsing logic to handle single string instead of list
- Add unit tests for --keep-ligand CLI argument parsing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
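A sketch of the comma-separated form using argparse. Only the `--keep-ligand` flag name comes from the commit; the helper `parse_keep_ligand`, the upper-casing, and the tolerance for stray spaces and trailing commas are assumptions.

```python
import argparse


def parse_keep_ligand(value):
    """Turn 'GOL, so4' into {'GOL', 'SO4'}; None or '' gives an empty set."""
    if not value:
        return set()
    return {name.strip().upper() for name in value.split(",") if name.strip()}


parser = argparse.ArgumentParser()
# A single string option replaces the earlier action="append" list.
parser.add_argument("--keep-ligand", default=None,
                    help="comma-separated residue names to keep")

args = parser.parse_args(["--keep-ligand", "GOL, so4,"])
print(sorted(parse_keep_ligand(args.keep_ligand)))
```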
Code cleanup:
- Consolidate WATER_RESIDUES to artifacts.py (was defined in 3 places)
- Create shared check_gpu_available() in utils.py (was duplicated)
- Remove unused add_ter_records_at_gaps() from chain_gaps.py
- Remove unused _relax_direct() method from relaxer.py (~100 lines)
- Remove corresponding test class TestAddTerRecordsAtGaps
README update:
- Update --keep-ligand documentation to show comma-separated syntax
Total: ~180 lines of redundant code removed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove requirements.txt (dependencies in pyproject.toml)
- Update test_relaxer_integration.py to use public API:
  - Use check_gpu_available() from utils instead of removed method
  - Use relax() with constrained=False instead of removed _relax_direct()
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Make it clear that openmmforcefields, openff-toolkit, and rdkit are conda-forge only (like pdbfixer)
- Add ligand support installation command to PyPI section
- Reorganize dependencies table to show required vs optional
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- pdbfixer, openmmforcefields, openff-toolkit, and rdkit are now all required for installation
- Simplified installation instructions
- Removed "optional" labeling from dependencies table
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change idealization from opt-in (--pre-idealize) to opt-out (--no-idealize)
- IdealizeConfig.enabled now defaults to True
- Add --ignore-missing-residues flag to skip adding residues from SEQRES
- Add --overwrite flag to allow overwriting output files
- Preserve residue numbering with keepIds=True in PDB output
- Update README to reflect new default behavior
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add renumber_residues_sequential() to idealize.py to ensure sequential residue numbering (1, 2, 3...) per chain after pdbfixer adds missing residues. This fixes false chain gap detection caused by non-sequential numbers from pdbfixer.
- Add CLI warning when using resfile with idealization enabled, since residue numbers in the resfile must match the idealized structure
- Update README to document the residue renumbering behavior
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
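The per-chain renumbering can be sketched as a mapping from old to new residue numbers, which also shows why a resfile written against the original numbering stops matching. The function name and the `(chain_id, resnum)` input are assumptions; this is not the body of renumber_residues_sequential().

```python
def sequential_numbering(residues):
    """Map (chain_id, old_resnum) -> new_resnum, counting 1, 2, 3... per chain.

    residues: list of (chain_id, resnum), one entry per residue, in file order.
    """
    counters = {}
    mapping = {}
    for chain, resnum in residues:
        counters[chain] = counters.get(chain, 0) + 1
        mapping[(chain, resnum)] = counters[chain]
    return mapping


residues = [("A", 5), ("A", 6), ("A", 10), ("B", 1), ("B", 3)]
mapping = sequential_numbering(residues)
# A resfile entry written for residue 10 of chain A would need to target 3.
print(mapping[("A", 10)])
```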
Fix bug where ligands were extracted before the ligand-presence check, causing the ligand-aware minimization path to never be triggered.
Now for unconstrained minimization with ligands:
- Ligands are detected before any extraction
- Full PDB (with ligands) is passed to _relax_unconstrained()
- Ligands are parameterized via openmmforcefields and minimized together with the protein
For constrained minimization, ligands are still extracted and restored unchanged since AmberRelaxation cannot handle arbitrary ligands.
This fixes protein-ligand clashes that occurred when the protein moved during minimization while ligands stayed in their original positions.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update idealization pipeline to properly handle ligands:
1. Add missing residues and atoms (protein only)
2. Idealize bond lengths and angles
3. Minimize protein with constraints (without ligands)
4. Reintroduce ligands and minimize protein+ligand complex together
This ensures ligands move with the protein during idealization rather than staying at their original coordinates while the protein moves.
New function minimize_with_ligands() uses openmmforcefields to parameterize ligands and perform constrained minimization on the full protein-ligand complex.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When ligands containing transition metals (heme, Fe-S clusters, chlorophylls, etc.) are detected, raise a clear error explaining the options instead of attempting to parameterize them.
- Add UNPARAMETERIZABLE_COFACTORS set in ligand_utils.py
- Add is_unparameterizable_cofactor() check function
- Update relaxer and idealize to fail early with helpful message
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
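A fail-early check in the spirit of this commit, as a sketch. The four residue codes are an illustrative subset rather than the contents of UNPARAMETERIZABLE_COFACTORS, the function name is hypothetical, and the real error message (which lists the user's options) is not reproduced.

```python
# Illustrative subset: heme, two Fe-S clusters, chlorophyll A.
UNPARAMETERIZABLE = {"HEM", "SF4", "FES", "CLA"}


def check_parameterizable(resnames):
    """Raise ValueError before any parameterization is attempted."""
    blocked = sorted(set(resnames) & UNPARAMETERIZABLE)
    if blocked:
        raise ValueError(
            "Cannot parameterize metal-containing cofactor(s): "
            + ", ".join(blocked)
        )


check_parameterizable(["GOL", "LIG"])  # passes silently
try:
    check_parameterizable(["HEM", "LIG", "SF4"])
except ValueError as exc:
    print(exc)
```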
Don't silently catch and continue when ligand parameterization fails during idealization. Let it fail with a clear error message instead of failing later during relaxation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
If create_openff_molecule() fails, let the error propagate rather than silently skipping the ligand and then failing later.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tion
- Remove openmmforcefields dependency for ligand handling
- Add ligand exclusion zone approach in relaxer.py: ligand atoms are added as massless dummy particles with LJ repulsion, preventing protein from moving into ligand space during minimization
- Simplify idealize.py: ligands are extracted before protein minimization and restored afterward at original positions
- Fix PyTorch deprecation warning in tensor_utils.py (use tuple for multidimensional indexing)
- Add PDBe SMILES fetching for ligand identification (ligand_utils.py)
- Remove complex ligand parameterization code that was failing due to PDB files lacking bond information for HETATM records
This approach is more robust because:
1. No need for ligand force field parameters
2. Works with any ligand without SMILES
3. Protein minimizes fully while respecting ligand positions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
What a god damn mess. I need to stop vibe coding and go outside
No description provided.