
Invalid Smiles not working in scikit-fingerprints #5

@julianpollmann

Some SMILES are rejected as invalid by scikit-fingerprints but appear to be valid on PubChem, e.g. C1=ClC=ClC([Cl-]1)OS(=O)(=O)Cl

Passing such a SMILES as input raises a TypeError:

File "/xx/chemap/chemap/fingerprint_computation.py", line 531, in _compute_sklearn
  X = fp.transform(smiles)
File "/xx/envs/chemap/lib/python3.13/site-packages/sklearn/utils/_set_output.py", line 316, in wrapped
  data_to_wrap = f(self, X, *args, **kwargs)
File "/xx/envs/chemap/lib/python3.13/site-packages/skfp/fingerprints/map.py", line 180, in transform
  return super().transform(X, copy=copy)
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/xx/envs/chemap/lib/python3.13/site-packages/sklearn/utils/_set_output.py", line 316, in wrapped
  data_to_wrap = f(self, X, *args, **kwargs)
File "/xx/envs/chemap/lib/python3.13/site-packages/skfp/bases/base_fp_transformer.py", line 216, in transform
  results = self._calculate_fingerprint(X)
File "/xx/envs/chemap/lib/python3.13/site-packages/skfp/fingerprints/map.py", line 183, in _calculate_fingerprint
  X = ensure_mols(X)
File "/xx/envs/chemap/lib/python3.13/site-packages/skfp/utils/validators.py", line 26, in ensure_mols
  raise TypeError(f"Could not parse '{X[idx]}' at index {idx} as molecule")
TypeError: Could not parse 'C1=ClC=ClC([Cl-]1)OS(=O)(=O)Cl' at index 24143 as molecule

Possible fixes: clean the SMILES beforehand, pass Mols (built with RDKit or MolsTransformer) instead of strings, or catch the exception.
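
A minimal sketch of the first two options (cleaning beforehand and passing Mols), not a proposed patch. The helper name split_valid_smiles and its parse parameter are hypothetical and not part of chemap or scikit-fingerprints. It assumes RDKit's Chem.MolFromSmiles, which returns None when a string cannot be parsed, and it has not been checked against the SMILES from the traceback.

```python
from typing import Callable, Optional, Sequence


def split_valid_smiles(
    smiles: Sequence[str],
    parse: Optional[Callable[[str], object]] = None,
) -> tuple[list[object], list[int], list[int]]:
    """Parse SMILES and separate the ones that fail.

    Returns (mols, kept_indices, failed_indices). `parse` must return None
    for an unparsable string; it defaults to RDKit's Chem.MolFromSmiles.
    """
    if parse is None:
        # Imported lazily so the helper can also be used with a stub parser.
        from rdkit import Chem

        parse = Chem.MolFromSmiles

    mols, kept, failed = [], [], []
    for idx, smi in enumerate(smiles):
        mol = parse(smi)
        if mol is None:
            # Record the position instead of raising, so one bad entry
            # does not abort the whole batch.
            failed.append(idx)
        else:
            mols.append(mol)
            kept.append(idx)
    return mols, kept, failed
```

The returned mols could then be handed to fp.transform(mols) in place of the raw strings, with kept mapping each fingerprint row back to its position in the original input and failed listing the entries to report or drop.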
