SIGSEGV crash in ligand SMILES parsing due to missing implicit valence check #649

@volgin

Description

Bug

AllChem.AddHs() in schema.py (line ~1243) can crash the process with SIGSEGV (exit -11) or SIGILL (exit -4) on certain valid-looking SMILES inputs.
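
The negative exit codes quoted above follow the convention of Python's subprocess module on POSIX, where a child killed by a signal reports the negated signal number. The following is a minimal sketch of how such a crash could be observed from a parent process. It uses only the standard library, the helper name describe_returncode is hypothetical, and the child simply sends itself SIGSEGV as a stand-in for the crashing AddHs() call:

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable label (hypothetical helper)."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        return signal.Signals(-returncode).name
    return f"exit {returncode}"


# Stand-in for the crashing parse: the child process sends itself SIGSEGV.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
result = subprocess.run([sys.executable, "-c", child_code])
print(describe_returncode(result.returncode))
```

Running the risky call in a child process like this keeps the parent alive, which is useful while reproducing the bug, since a SIGSEGV cannot be caught with try/except.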

Root cause

AddHs() internally calls getNumImplicitHs(), which requires calcImplicitValence() to have run successfully for all atoms. While MolFromSmiles() with default sanitization catches most problems, some molecules with problematic valence states (hypervalent atoms, certain organometallics) pass sanitization but still crash in AddHs().

Location

# src/boltz/data/parse/schema.py, ligand SMILES processing
mol = AllChem.MolFromSmiles(seq)
mol = AllChem.AddHs(mol)  # <-- can SIGSEGV here

Fix

Add UpdatePropertyCache(strict=True) between MolFromSmiles() and AddHs(). This runs the implicit valence calculation up front and raises a clean ValueError for molecules that would otherwise crash inside AddHs():

mol = AllChem.MolFromSmiles(seq)
if mol is None:
    raise ValueError(f"Failed to parse SMILES: {seq}")
mol.UpdatePropertyCache(strict=True)
mol = AllChem.AddHs(mol)
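
For reference, the same pattern can be sketched as a standalone helper. This is an illustration rather than the code in schema.py: the function name smiles_to_mol_with_hs is hypothetical, it imports Chem instead of AllChem, and wrapping whatever UpdatePropertyCache() raises in a ValueError is an assumption about how callers would want to handle the failure.

```python
from rdkit import Chem


def smiles_to_mol_with_hs(smiles: str) -> Chem.Mol:
    """Parse a SMILES string and add explicit hydrogens (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Failed to parse SMILES: {smiles}")
    try:
        # Force the valence calculation so problems surface as Python exceptions
        # before AddHs() is reached.
        mol.UpdatePropertyCache(strict=True)
    except Exception as exc:
        # Assumption: normalize any failure here to ValueError for callers.
        raise ValueError(f"Invalid valence in SMILES: {smiles}") from exc
    return Chem.AddHs(mol)


ethanol = smiles_to_mol_with_hs("CCO")
print(ethanol.GetNumAtoms())
```

Callers can then treat unparseable and valence-invalid ligands the same way, by catching ValueError and skipping or reporting the offending input.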

We encountered this in production and fixed it in volgin/boltz-community@b555625.
