Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 83 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ ensemble.to_xyz("output.xyz")

### Named Presets

Four use-case presets are available out of the box:
Five use-case presets are available out of the box:

```python
from openconf import generate_conformers
Expand All @@ -51,6 +51,8 @@ ensemble = generate_conformers(mol, preset="spectroscopic") # NMR / IR / VCD
ensemble = generate_conformers(mol, preset="docking") # docking pose recovery
```

For FEP-style analogue generation from a fixed pose, see [`generate_conformers_from_pose`](#5-analogue--fep-style-r-group-exploration) below.

### Custom Configuration

For full control, pass a `ConformerConfig` directly. You can also use
Expand Down Expand Up @@ -106,7 +108,7 @@ config.max_out = 200 # override a single field
ensemble = generate_conformers(mol, config=config)
```

Available presets: `"rapid"`, `"ensemble"`, `"spectroscopic"`, `"docking"`.
Available presets: `"rapid"`, `"ensemble"`, `"spectroscopic"`, `"docking"`, `"analogue"`.

Below are representative wall-clock timings measured on a single CPU core
(Apple M2 Pro), mean over 3 runs.
Expand Down Expand Up @@ -325,18 +327,85 @@ ensemble.to_sdf("docking_input.sdf")

---

### 5. Analogue / FEP-style R-group exploration

Generate conformers for an MCS-aligned analogue while keeping the core scaffold
exactly fixed at the input pose. The correct entry point here is
`generate_conformers_from_pose` rather than `generate_conformers`.

- Starts from the **supplied conformer** — no ETKDG seeding
- Only **free terminal rotors** are explored (those whose moving fragment is
entirely outside the constrained core)
- MMFF minimization uses stiff **position restraints** on all constrained atoms,
then snaps them to exact starting coordinates so there is zero drift
- **Global shake** is suppressed to avoid thrashing the starting pose

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from openconf import generate_conformers_from_pose

# Suppose we have an MCS-aligned analogue with a propyl substituent
# replacing the butyl chain on a benzene scaffold.
mol = Chem.MolFromSmiles("CCCc1ccccc1")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())

# Ring heavy-atom indices — these must not move
ring_atoms = [3, 4, 5, 6, 7, 8]

ensemble = generate_conformers_from_pose(mol, constrained_atoms=ring_atoms)
ensemble.to_sdf("analogues.sdf")
```

The default preset (`"analogue"`) returns up to 50 conformers. Pass `preset=` or
`config=` to override:

```python
from openconf import ConformerConfig, generate_conformers_from_pose

# Fewer conformers, faster turnaround
config = ConformerConfig(max_out=10, n_steps=60, pool_max=200)
ensemble = generate_conformers_from_pose(mol, constrained_atoms=ring_atoms, config=config)
```

<details><summary>Full analogue preset equivalent</summary>

```python
from openconf import ConformerConfig, PrismConfig, generate_conformers_from_pose

config = ConformerConfig(
max_out=50,
pool_max=500,
n_steps=150,
energy_window_kcal=10.0,
do_final_refine=True,
minimize_batch_size=8,
parent_strategy="softmax",
final_select="diverse",
prism_config=PrismConfig(energy_window_kcal=10.0),
)
ensemble = generate_conformers_from_pose(mol, constrained_atoms=ring_atoms, config=config)
```

</details>

---

### Configuration Quick Reference

| Parameter | Rapid | Ensemble | Spectroscopic | Docking |
|---|---|---|---|---|
| `max_out` | 5 | 50 | 100 | 250 |
| `n_steps` | 30 | 200 | 400 | 500 |
| `energy_window_kcal` | 20 | 10 | 5 | 18 |
| `seed_n_per_rotor` | 2 | 3 | 5 | 4 |
| `seed_prune_rms_thresh` | 1.5 | 1.0 | 0.5 | 0.8 |
| `do_final_refine` | False | True | True | False |
| `parent_strategy` | softmax | softmax | softmax | uniform |
| `final_select` | diverse | diverse | energy | diverse |
| Parameter | Rapid | Ensemble | Spectroscopic | Docking | Analogue |
|---|---|---|---|---|---|
| `max_out` | 5 | 50 | 100 | 250 | 50 |
| `n_steps` | 30 | 200 | 400 | 500 | 150 |
| `energy_window_kcal` | 20 | 10 | 5 | 18 | 10 |
| `seed_n_per_rotor` | 2 | 3 | 5 | 4 | — |
| `seed_prune_rms_thresh` | 1.5 | 1.0 | 0.5 | 0.8 | — |
| `do_final_refine` | False | True | True | False | True |
| `parent_strategy` | softmax | softmax | softmax | uniform | softmax |
| `final_select` | diverse | diverse | energy | diverse | diverse |

*Analogue mode uses `generate_conformers_from_pose`; seeding parameters are unused because ETKDG is skipped.*

## How It Works

Expand Down Expand Up @@ -380,11 +449,13 @@ openconf is not recommended for macrocycles (ring size ≥ 12). On macrocyclic r

- `generate_conformers(mol, method="hybrid", config=None)` - Main entry point
- `generate_conformers_from_smiles(smiles, ...)` - Convenience wrapper
- `generate_conformers_from_pose(mol, constrained_atoms, config=None)` - FEP-style analogue generation from an aligned pose

### Configuration Classes

- `ConformerConfig` - Main configuration
- `PrismConfig` - PRISM Pruner settings
- `ConstraintSpec` - Positional constraints for pose-locked generation

### Data Classes

Expand Down
32 changes: 32 additions & 0 deletions SCIENCE.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,38 @@ Seed count is computed automatically from the molecular topology rather than set

The key to making this efficient is aggressive deduplication. Without it, the search quickly fills with near-identical structures that differ only in insignificant ways. openconf uses PRISM Pruner, which implements a cached divide-and-conquer algorithm for comparing conformers by RMSD and moment of inertia. Rather than doing O(N²) all-to-all comparisons, PRISM sorts conformers by energy and recursively partitions them, exploiting the fact that similar structures tend to cluster. This lets openconf maintain large internal pools (thousands of conformers) while keeping only the truly unique ones. The final selection step returns the lowest-energy conformers after PRISM deduplication, ensuring the output ensemble contains distinct, low-strain geometries.

## Pose-Constrained Generation (FEP / Analogue Mode)

A common need in lead optimization is generating conformers for an analogue where the core scaffold is already placed — for example, the result of an MCS alignment from a co-crystal structure. Standard ETKDG seeding is not appropriate here: it randomizes the entire molecule and cannot be guaranteed to recover the aligned pose. Instead, openconf supports constrained generation via `generate_conformers_from_pose`.

The idea is to identify a set of *constrained atoms* (the MCS core) whose Cartesian positions must remain fixed at the input geometry, and to explore only the remaining degrees of freedom — the *free rotors*, i.e., bonds whose entire moving fragment lies outside the constrained set. The algorithm adapts in three ways:

**Seeding.** ETKDG is skipped entirely. The single input conformer is used directly as the seed, immediately fast-minimized with position restraints to relax any bond-length or angle strain in the free fragment before exploration begins.

**Move set.** Only free rotors (and ring flips of entirely free rings) are sampled. The global shake move is suppressed: it would randomize 50–80% of all rotors, some of which pass through the core, defeating the constraint. Move probability that would have gone to global shake is redistributed to `single_rotor` moves.

**Minimization.** Every MMFF minimization call applies `MMFFAddPositionConstraint` to each constrained atom with a stiff harmonic force constant (default 1000 kcal/mol/Ų). After minimization converges, constrained atom coordinates are snapped back to the exact reference values, eliminating any residual drift. Final refinement applies the same treatment.

The rotor filtering logic uses a BFS traversal: for each bond (i, j) in the rotor list, we compute the set of atoms reachable from j without crossing (i, j). If this moving fragment contains no constrained atoms, the rotor is free. If the moving fragment *does* contain constrained atoms but the opposite side (reachable from i) does not, the rotor is flipped so that the free atoms become the moving side. Rotors where constrained atoms appear on both sides are excluded entirely — they cannot be rotated without moving the core.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from openconf import generate_conformers_from_pose

# MCS-aligned analogue: ethyl group added to a benzene scaffold
mol = Chem.MolFromSmiles("CCc1ccccc1")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, AllChem.ETKDGv3()) # or use your aligned pose

# Constrain the benzene ring (heavy-atom indices 2–7)
ensemble = generate_conformers_from_pose(mol, constrained_atoms=list(range(2, 8)))
```

The `"analogue"` preset (50 conformers, 150 steps, softmax parent strategy, full refinement) is the default. Because the free rotor space is much smaller than the full conformational space, 150 steps is usually sufficient for thorough coverage of simple R-groups. For larger substituents with many free rotors, increase `n_steps` accordingly.

**Atom index convention.** `Chem.AddHs` appends new H atoms after all existing atoms, preserving all prior indices. So whether you pass a heavy-atom-only mol (indices 0…N−1 for heavy atoms) or an H-added mol (same heavy-atom indices plus new H indices), the constrained atom indices remain valid after the internal `AddHs` call.

## Tuning Guide

For the most common workflows, named presets are the easiest starting point:
Expand Down
8 changes: 6 additions & 2 deletions openconf/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,12 +18,13 @@
ConformerEnsemble,
ConformerRecord,
generate_conformers,
generate_conformers_from_pose,
generate_conformers_from_smiles,
)
from .config import ConformerConfig, ConformerPreset, PrismConfig, preset_config
from .config import ConformerConfig, ConformerPreset, ConstraintSpec, PrismConfig, preset_config
from .dedupe import prism_dedupe
from .io import mol_to_smiles, read_sdf, smiles_to_mol, write_sdf, write_xyz
from .perceive import Rotor, RotorModel, build_rotor_model, prepare_molecule
from .perceive import Rotor, RotorModel, build_rotor_model, filter_constrained_rotors, prepare_molecule
from .relax import RDKitMMFFMinimizer, get_minimizer
from .torsionlib import TorsionLibrary, TorsionRule

Expand All @@ -32,14 +33,17 @@
"ConformerEnsemble",
"ConformerPreset",
"ConformerRecord",
"ConstraintSpec",
"PrismConfig",
"RDKitMMFFMinimizer",
"Rotor",
"RotorModel",
"TorsionLibrary",
"TorsionRule",
"build_rotor_model",
"filter_constrained_rotors",
"generate_conformers",
"generate_conformers_from_pose",
"generate_conformers_from_smiles",
"get_minimizer",
"mol_to_smiles",
Expand Down
87 changes: 86 additions & 1 deletion openconf/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

from rdkit import Chem

from .config import ConformerConfig, ConformerPreset, preset_config
from .config import ConformerConfig, ConformerPreset, ConstraintSpec, preset_config
from .perceive import build_rotor_model, prepare_molecule
from .pool import ConformerRecord
from .propose.hybrid import run_hybrid_generation
Expand Down Expand Up @@ -189,6 +189,91 @@ def generate_conformers(
return ConformerEnsemble(mol=mol, records=records)


def generate_conformers_from_pose(
mol: Chem.Mol,
constrained_atoms: list[int] | frozenset[int],
config: ConformerConfig | None = None,
preset: ConformerPreset | None = None,
) -> ConformerEnsemble:
"""Generate conformers for an MCS-aligned pose, keeping core atoms fixed.

Designed for FEP-style analogue generation: you supply a molecule with an
existing conformer (e.g. the result of MCS alignment) and the indices of
the atoms that must remain fixed (the MCS core / scaffold). Only the free
terminal rotors are explored, so the search is faster and stays consistent
with the bound pose.

**Atom index convention:** pass indices as they appear in *mol*. If *mol*
already has explicit hydrogens the indices are used as-is. If *mol* has only
heavy atoms, ``Chem.AddHs`` is called internally — it appends new H atoms at
the end and leaves all existing indices unchanged, so heavy-atom indices
remain valid after H addition.

Args:
mol: RDKit molecule with at least one conformer (the MCS-aligned pose).
If multiple conformers are present, the first is used as the seed.
constrained_atoms: Atom indices of the core scaffold that must not move.
These are indices into *mol* as supplied (see note above).
config: Configuration options. Defaults to the ``"analogue"`` preset.
preset: Named preset. Defaults to ``"analogue"`` when neither *config*
nor *preset* is given. Mutually exclusive with *config*.

Returns:
ConformerEnsemble with terminal-group conformational diversity while
preserving the input core geometry.

Raises:
ValueError: If mol has no conformers, if both *config* and *preset* are
supplied, or if the method is unknown.

Examples:
>>> from rdkit import Chem
>>> from openconf import generate_conformers_from_pose
>>> mol = Chem.MolFromSmiles("CCCc1ccccc1")
>>> # (assume mol already has an aligned conformer)
>>> ensemble = generate_conformers_from_pose( # doctest: +SKIP
... mol, constrained_atoms=[4, 5, 6, 7, 8, 9]
... )
"""
if config is not None and preset is not None:
raise ValueError("Specify at most one of 'config' or 'preset', not both.")

if mol.GetNumConformers() == 0:
raise ValueError(
"mol must have at least one conformer for pose-constrained generation. "
"Supply the MCS-aligned pose as conformer 0."
)

# Resolve config — default to "analogue" preset for this entry point.
if config is not None:
resolved_config = ConformerConfig(**{k: v for k, v in config.__dict__.items() if k != "constraint_spec"})
elif preset is not None:
resolved_config = preset_config(preset)
else:
resolved_config = preset_config("analogue")

# Attach the constraint spec (overrides any constraint_spec already in config).
resolved_config.constraint_spec = ConstraintSpec(constrained_atoms=frozenset(constrained_atoms))

# Prepare molecule — AddHs is a no-op if Hs are already present.
prepped_mol = prepare_molecule(mol, add_hs=True)

rotor_model = build_rotor_model(prepped_mol)

prepped_mol, conf_ids, energies = run_hybrid_generation(prepped_mol, rotor_model, resolved_config)

records = [
ConformerRecord(
conf_id=cid,
energy_kcal=energy,
source="analogue",
)
for cid, energy in zip(conf_ids, energies, strict=True)
]

return ConformerEnsemble(mol=prepped_mol, records=records)


def generate_conformers_from_smiles(
smiles: str,
method: str = "hybrid",
Expand Down
Loading
Loading