A Python tool to find and fill protein cavities with water molecules using KVFinder, Monte Carlo sampling, and RDKit-based explicit water generation.

- Cavity Selection: Select cavities to fill with user-defined water counts
- Monte Carlo Sampling: Places water molecules using Monte Carlo sampling with clash detection
- Explicit Waters: Builds full H-O-H waters with RDKit (including hydrogens)
- CLI Interface: Easy-to-use command-line interface built with Typer
- Python: Python 3.8 or higher
You can install this via pip by running:
pip install cavefiller
Or install from source to get the newest version:
pip install git+https://github.com/Desperadus/CaveFiller
Alternatively:
# Clone the repository
git clone https://github.com/Desperadus/CaveFiller.git
cd CaveFiller
# Install the package
pip install -e .
Install with OpenMM support:
pip install -e ".[openmm]"
Using uv (recommended for lock-managed environments):
uv sync --extra openmm

cavefiller protein.pdb
This will:
- Detect cavities in protein.pdb
- Display a list of found cavities with their volumes and areas
- Prompt you to select which cavities to fill
- Prompt you for the number of water molecules per cavity
- Place waters using Monte Carlo sampling with clash detection
- Build explicit RDKit H-O-H waters and export a combined PDB
- Save the output to ./output/protein_filled.pdb

cavefiller [PROTEIN_FILE] [OPTIONS]
Arguments:
- PROTEIN_FILE: Path to the protein PDB file (required)
Options:
- --output-dir PATH: Directory to save output files (default: ./output)
- --grid-step FLOAT: Grid spacing for cavity detection in Ångströms (default: 0.6)
- --probe-in FLOAT: Probe In radius for cavity detection in Ångströms (default: 1.4)
- --probe-out FLOAT: Probe Out radius for cavity detection in Ångströms (default: 4.0)
- --exterior-trim-distance FLOAT: Exterior trim distance in Ångströms (default: 2.4)
- --volume-cutoff FLOAT: Minimum cavity volume to consider in Å³ (default: 5.0)
- --auto-select: Automatically select all cavities without user interaction
- --cavity-ids TEXT: Comma-separated list of cavity IDs to fill (e.g., '1,2,3')
- --waters-per-cavity TEXT: Comma-separated list of water counts (e.g., '10,15,20'); must match the order of --cavity-ids
- --optimize-mmff94 / --no-optimize-mmff94: Enable/disable MMFF94 optimization with the protein fixed; note that this is very slow on larger complexes (default: enabled)
- --mmff-max-iterations INTEGER: Max MMFF94 iterations (default: 300)
- --optimize-openmm / --no-optimize-openmm: Enable/disable OpenMM minimization with AMBER14/TIP3P and standard energy minimization; protein atoms are fixed by setting their mass to zero (takes precedence over MMFF94 when active)
- --cuda / --no-cuda: When OpenMM is enabled, require the CUDA platform (fails if unavailable) (default: disabled)
- --openmm-max-iterations INTEGER: Max OpenMM minimization iterations (default: 500)
- --keep-all: Keep all optimized waters (skip post-optimization clash-based dropping) (default: disabled)
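The requirement that --waters-per-cavity match the order of --cavity-ids is easy to get wrong when runs are scripted. The following is a minimal sketch of a wrapper that checks the two lists before building the command line. The helper name `build_cavefiller_command` is hypothetical and not part of CaveFiller; only the flags documented above are used.

```python
from typing import List, Sequence


def build_cavefiller_command(
    protein_file: str,
    cavity_ids: Sequence[int],
    waters_per_cavity: Sequence[int],
    output_dir: str = "./output",
) -> List[str]:
    """Build an argv list for a non-interactive cavefiller run (hypothetical helper)."""
    # Each cavity ID needs exactly one water count, in the same order.
    if len(cavity_ids) != len(waters_per_cavity):
        raise ValueError("waters_per_cavity must have one entry per cavity ID")
    if any(n < 0 for n in waters_per_cavity):
        raise ValueError("water counts must be non-negative")
    return [
        "cavefiller",
        protein_file,
        "--output-dir", output_dir,
        "--cavity-ids", ",".join(str(i) for i in cavity_ids),
        "--waters-per-cavity", ",".join(str(n) for n in waters_per_cavity),
    ]


command = build_cavefiller_command("protein.pdb", [1, 3, 5], [10, 15, 20])
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once CaveFiller is installed.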
Recommended usage:
- Prefer interactive/manual cavity and water-count selection over --auto-select. Auto-selection often overfills cavities with too many waters.
- If OpenMM is installed and MMFF94 optimization is enabled, CaveFiller will prefer OpenMM automatically.
- Use --optimize-openmm if OpenMM is installed and you want stronger relaxation of residual clashes.
- Otherwise keep --optimize-mmff94 enabled to refine water placement after Monte Carlo sampling.
- Use --keep-all if you want to keep all waters after OpenMM/MMFF94.
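When choosing water counts manually, a rough upper bound can be derived from the cavity volume that CaveFiller reports. The sketch below assumes bulk liquid water density (about 29.9 Å³ per molecule). This is not CaveFiller's own estimation method, and the function name and `packing_fraction` parameter are illustrative only.

```python
import math

# Volume per molecule in liquid water at roughly 1 g/cm^3, in cubic Ångströms.
BULK_WATER_VOLUME_A3 = 29.9


def estimate_max_waters(cavity_volume_a3: float, packing_fraction: float = 1.0) -> int:
    """Rough upper bound on waters that fit in a cavity (illustrative assumption)."""
    if cavity_volume_a3 < 0:
        raise ValueError("cavity volume must be non-negative")
    if not 0 < packing_fraction <= 1:
        raise ValueError("packing_fraction must be in (0, 1]")
    # Round down: a partially available water volume cannot hold a molecule.
    return math.floor(cavity_volume_a3 * packing_fraction / BULK_WATER_VOLUME_A3)


for volume in (5.0, 120.0, 300.0):
    print(volume, estimate_max_waters(volume))
```

Buried cavities are often less densely hydrated than bulk water, so a `packing_fraction` below 1 gives a more conservative count.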
Interactive cavity and water selection:
cavefiller protein.pdb --output-dir results
Auto-select all cavities with default water counts (not generally recommended):
cavefiller protein.pdb --auto-select
Fill specific cavities with specific water counts:
cavefiller protein.pdb --cavity-ids "1,3,5" --waters-per-cavity "10,15,20"
Custom cavity detection parameters:
cavefiller protein.pdb --grid-step 0.6 --probe-in 1.4 --probe-out 4.0 --exterior-trim-distance 2.4 --volume-cutoff 5.0

- Cavity Detection: The tool uses pyKVFinder to detect cavities in the input protein structure
- Cavity Analysis: Displays information about detected cavities (ID, volume, surface area)
- Cavity Selection:
  - Interactive mode: User selects cavities and specifies water counts
  - Auto mode: All cavities are selected with automatic water count estimation
  - Command-line mode: Specific cavities and water counts are pre-selected
- Water Placement:
  - Monte Carlo sampling places waters randomly in the cavity
  - Clash detection validates each position against protein atoms and other waters
  - Uses Van der Waals radii for distance calculations
- RDKit Water Construction:
  - Explicit H-O-H waters are generated with ideal geometry
  - Waters include hydrogens and proper HOH residue records in the output PDB
- Samples around cavity grid points with small local jitter
- Validates position stays near cavity voxels (< 0.7 Å from a grid point)
- Checks for clashes with protein atoms (minimum distance based on VDW radii)
- Checks for clashes with other waters (minimum 2.7 Å separation)
- Attempts up to 500 placements per water molecule
- Uses Van der Waals radii for different atom types (H, C, N, O, S, P)
- Minimum water-protein distance: 2.35 Å
- Minimum water-water distance: 2.7 Å
- Tolerance of 0.5 Å for VDW overlap
- Creates proper H-O-H geometry for each water
- Writes explicit HOH residues (O, H1, H2) into output PDB
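The sampling and clash rules listed above can be sketched as a simple rejection loop. This is a minimal illustration using only the standard library, not CaveFiller's actual implementation. The function name, the jitter size of 0.3 Å, and the fixed distance cutoffs (used here in place of per-element Van der Waals radii) are assumptions made for the example.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def place_waters(
    grid_points: Sequence[Point],
    protein_atoms: Sequence[Point],
    n_waters: int,
    jitter: float = 0.3,               # assumed jitter size, in Å
    max_grid_distance: float = 0.7,    # must stay this close to a cavity voxel
    min_protein_distance: float = 2.35,
    min_water_distance: float = 2.7,
    max_attempts: int = 500,
    seed: int = 0,
) -> List[Point]:
    """Place water oxygens by rejection sampling around cavity grid points."""
    rng = random.Random(seed)
    placed: List[Point] = []
    for _ in range(n_waters):
        for _ in range(max_attempts):
            # Pick a random cavity voxel and perturb it slightly.
            base = rng.choice(grid_points)
            candidate = tuple(c + rng.uniform(-jitter, jitter) for c in base)
            # Reject positions that drift away from the cavity grid.
            if min(math.dist(candidate, g) for g in grid_points) >= max_grid_distance:
                continue
            # Reject clashes with protein atoms.
            if any(math.dist(candidate, a) < min_protein_distance for a in protein_atoms):
                continue
            # Reject clashes with waters that were already placed.
            if any(math.dist(candidate, w) < min_water_distance for w in placed):
                continue
            placed.append(candidate)
            break
        # If every attempt failed, this water is skipped.
    return placed


# Toy cavity: a 10 x 10 x 10 grid with 0.6 Å spacing and two nearby "protein" atoms.
grid = [(0.6 * i, 0.6 * j, 0.6 * k) for i in range(10) for j in range(10) for k in range(10)]
protein = [(-1.0, -1.0, -1.0), (6.4, 6.4, 6.4)]
waters = place_waters(grid, protein, n_waters=5)
print(len(waters), "waters placed")
```

Because a water that cannot be placed within `max_attempts` is skipped, the function may return fewer positions than requested, which matches the idea that a cavity can be too small for the requested count.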
The tool generates the following files in the output directory:
- protein_filled.pdb: Protein structure with explicit water molecules in selected cavities
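To illustrate what an explicit HOH residue in the output looks like, the sketch below builds H-O-H coordinates around an oxygen position and formats them as fixed-column PDB HETATM records. It uses plain Python rather than RDKit, assumes TIP3P geometry (O-H 0.9572 Å, H-O-H 104.52°), places the hydrogens in a fixed orientation, and uses a chain ID of "W". All of these are assumptions for the example, not confirmed details of CaveFiller's output.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

OH_BOND = 0.9572    # Å, TIP3P O-H bond length (assumed geometry)
HOH_ANGLE = 104.52  # degrees, TIP3P H-O-H angle (assumed geometry)


def water_atoms(oxygen: Point) -> List[Tuple[str, Point]]:
    """Return O, H1, H2 positions with both hydrogens in the local xy plane."""
    half = math.radians(HOH_ANGLE) / 2
    ox, oy, oz = oxygen
    dx = OH_BOND * math.sin(half)
    dy = OH_BOND * math.cos(half)
    return [
        ("O", oxygen),
        ("H1", (ox + dx, oy + dy, oz)),
        ("H2", (ox - dx, oy + dy, oz)),
    ]


def hoh_records(oxygen: Point, serial: int, resseq: int, chain: str = "W") -> List[str]:
    """Format one water as three 78-column PDB HETATM lines."""
    lines = []
    for offset, (name, (x, y, z)) in enumerate(water_atoms(oxygen)):
        element = name[0]
        lines.append(
            f"HETATM{serial + offset:5d}  {name:<3s} HOH {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.00:6.2f}{0.00:6.2f}          {element:>2s}"
        )
    return lines


records = hoh_records((12.345, 6.789, -1.5), serial=1001, resseq=1)
print("\n".join(records))
```

A real tool would typically randomize or optimize the hydrogen orientation; the fixed orientation here only keeps the example short.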
- typer: CLI framework
- pyKVFinder: Cavity detection
- rdkit: Molecular manipulation and explicit water generation
- numpy: Numerical operations
- biopython: PDB file handling
- openmm (optional extra): OpenMM minimization with optional pdbfixer fallback for topology repair
uv sync --group dev
uv run pytest

See LICENSE file for details.
If you use CaveFiller in your research, please cite:
- pyKVFinder: Guerra et al. (2020) BMC Bioinformatics
- RDKit: RDKit: Open-source cheminformatics; http://www.rdkit.org
Contributions are welcome! Please feel free to submit a Pull Request.