Skip to content

Atomic properties and adjacency matrix for each target mol are used#155

Open
DaniDelHoyo wants to merge 4 commits intoRMeli:developfrom
DaniDelHoyo:develop
Open

Atomic properties and adjacency matrix for each target mol are used#155
DaniDelHoyo wants to merge 4 commits intoRMeli:developfrom
DaniDelHoyo:develop

Conversation

@DaniDelHoyo
Copy link

@DaniDelHoyo DaniDelHoyo commented Feb 26, 2026

Description

This PR is meant to solve an issue when calculating the RMSD on a list of target molecules. At least when using the command line, def symmrmsd iterates over the target molecules coordinates and send them to _rmsd_isomorphic_core for the calculation.
However, the atomic properties and adjacency matrix sent is the one of the first target molecule (lines 369 and 371):

mols[0].atomicnums
mols[0].adjacency_matrix

This assumes that the properties and adjacency matrix of all the target molecules are all the same, which was not my case, and leads to incorrect RMSD calculations. As we have that information for each of the molecules, it is not difficult to just pass the respective atomic properties and adjacency matrix for each of the molecules.

I have not tested in depth if my changes break anything else though, so please take it into account first.

Checklist

  • Tests
  • Documentation
  • Changelog

@RMeli
Copy link
Owner

RMeli commented Mar 2, 2026

Hi @DaniDelHoyo, thank you for rising the problem and for your contribution! The fact that all molecules have the same atomic properties and adjacency matrix in symmermsd was a design decision since I was focused on docking (i.e. different poses of the same molecule). Indeed, the symmrmds function was not accepting a list and the tests are failing with your changes.

However, I understand this might be an issue if it fails with incorrect results. Could you please share a bit more about the calculation that was failing for you (i.e. a simple reproducer), so that we can decide how to move forward?

@DaniDelHoyo
Copy link
Author

Hi @RMeli , thank you for your answer. I did notice there was some failing tests sorry, I didn't have the time to check them.
So the problem actually arises when trying to compare docking poses that come from different origins (different software) or even those with the original crystal structure. I am guessing that at some point in the pipeline conversions the order of the atoms varies and makes the adjacency matrix be different. I attach some example molecule files where the problem can be checked.
Mol1 and Mol2 are sdf files (had to rename them as txt so GitHub let me upload them) originally coming from pdbqt files docked with Vina.
Crystal is the actual pdb position of the ligand of a complex.

mol2.txt
mol1.txt
crystal.txt

When executing spyrmsd such as:
python -m spyrmsd mol1.sdf mol2.sdf crystal.pdb
Output:
5.59979
7.44949

However, if we skip the mol2.sdf (and therefore, the adjacency matrix of crystal.pdb is used in the RMSD calculation), it's RMSD is quite different:
python -m spyrmsd mol1.sdf crystal.pdb
Output:
1.21230

I have noticed the same behavior in some other cases but this was the clearest.
Hope this helps, let me know if I could do anything else.
Dani

@DaniDelHoyo
Copy link
Author

I have just noticed that my changes would only make an effect if the parameter "cache" is deactivated (True by default), because with cache=True, the isomorphism from the first molecule will be used even if the whole list of adj. matrix and atom properties are passed. So it would be necessary to control this cache parameter too, somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants