Skip to content

Conversation

@rietmann-nv
Copy link

When unpickling into an existing molecule with an atom with additional ring
data, the previous ringinfo data would get reset when adding the bonds. We now
save the ringinfo, add the bonds, and restore the ringinfo.

When unpickling into an existing molecule with an atom with additional ring
data, the previous ringinfo data would get reset when adding the bonds. We now
save the ringinfo, add the bonds, and restore the ringinfo.
When unpickling into an existing molecule, it happens that when you add bonds it
destroys the previous ringinfo. Instead of trying to save it when this happens,
we re-compute the ringinfo after adding the bonds.
rietmann-nv and others added 24 commits November 5, 2025 14:35
In the line:
existing = addPerElementProp(name, scope, T(), computed, false);
when T = ExplicitBitVect, the default contstructor left dp_bits=nullptr. Later
in the addPerElementProp routine a copy constructor was called, which assumed a
non-null dp_bits pointer. By making the dp_bits non-null, we avoid the segfault.
All tests in `graphmoltestPickler` now pass.

Property pickle support uses new property iterator
We removed the ringInfo reset after the addBonds when pickling. The reference
resets the also resets the property cache, but we don't do this.
…rrectly handled

The pickle writer was outputting a "RDVal" instead of a string, and there wasn't
any method to serialize the RDVal. By correctly getting the string, it was able
to be serialized properly, fixing the pickle, which fixed the unpickling, which
threw an error.

We also added a check for pickleFlags, but this wasn't necessary for the
catch_pickle tests, but maybe elsewhere it will be relevant.
Re-getting ringInfo after ringInfo was generated internally, solved a problem
where the compat interface was stale.

We added an extra function that was thought to help, but actually didn't help.
It seems useful anyway, so keeping it.
Old proplist always had computedPropName as a prop. This meant that it always
was counted and existed. This is difficult to support using new backend and we
are dropping this, unless someone desperately wants it.
Properties with unsigned int and int were converted to ANY type, so we added
some extra conversion tooling to help ensure that we don't end up in ANY type
for int - unsigned int. That said, we also added a fix for any time just in case.
Don't try to convert float32 -> int32 in property storage
The internal functions use a certain amount of compatibility routines so it's
far from a perfect implementation. It's also unclear if any checks/modifications
are missing, hopefully further testing will check for that.
Storing and recalling PropTokens wasn't handled correctly. We ensure we store a
PropToken at the parser side, and then RDMol now handles PropTokens who were
stored
Adjust pickling to ensure that PropTokens are converted to strings before pickling
This resets the bond query to avoid any use-after-free query invocations
Some notes, from a debug program I wrote to help fix this:

```
Creating mol from: c12ccc(CCCCCCCc5ccc(C2)cc5)cc1
Creating query from: c1cc2ccc1CCCCCCCc1ccc(cc1)C2

Ring info for mol:
  numRings: 5
  Ring atoms: [1,2,3,18,19,0,] [12,13,14,16,17,11,] [4,5,6,7,8,9,10,11,12,13,14,15,0,1,2,3,] [4,5,6,7,8,9,10,11,17,16,14,15,0,1,2,3,] [18,19,0,15,14,13,12,11,10,9,8,7,6,5,4,3,]

Ring info for query:
  numRings: 6
  Ring atoms: [0,5,4,3,2,1,] [14,15,16,17,18,13,] [6,7,8,9,10,11,12,13,14,15,16,19,2,3,4,5,] [6,7,8,9,10,11,12,13,18,17,16,19,2,3,4,5,] [19,2,1,0,5,6,7,8,9,10,11,12,13,14,15,16,] [17,18,13,12,11,10,9,8,7,6,5,0,1,2,19,16,]
Mol canonical SMILES:   c1cc2ccc1CCCCCCCc1ccc(cc1)C2
Query canonical SMILES: c1cc2ccc1CCCCCCCc1ccc(cc1)C2

Attempting SubstructMatch(*mol, *query)...
Result: FAILED  <-- master branch says is SUCCESS

Attempting SubstructMatch(*query, *mol)...
Result: SUCCESS
Match size: 20
```

In the fixed minimal_rdmol branch, I had to comment out an if checking the number of atom rings, which diverges.

```
if (res) {
    // if (hasOwningMol() && what->hasOwningMol() &&
        // this->getOwningMol().getRingInfo()->isInitialized() &&
        // what->getOwningMol().getRingInfo()->isInitialized() &&
        // this->getOwningMol().getRingInfo()->numAtomRings(d_index) >
            // what->getOwningMol().getRingInfo()->numAtomRings(what->d_index)) {
      // res = false;
  // } else if (!this->getAtomicNum()) {
  if (!this->getAtomicNum()) {
...
```

The master branch doesn't have this atomRing check:

```
if (res) {
  if (!this->getAtomicNum()) {
```
Templated over atom & bond to allow property testing using existing function.
The replaceSubstruct tried to replace the same bond twice due to iterator
invalidation. Now we collect indices first and iterator over those instead.
Error: what():  getImplicitValence() called without preceding call to calcImplicitValence()

Similar to a previous bug, need to calculate implicitvalence before using it.
Adjusted code to better match master reference, despite the slightly confusing
way master is written.
rietmann and others added 15 commits December 31, 2025 12:13
ringInfo compat interface needs to be re-called when using the compatibility
interface to ensure that the data is properly synced
setOwningMol for bonds was prevented from happening more than once. This only
sets a pointer, so I don't know why this would be prevented, so I removed the
check. Obviously one could have issues downstream with indexing and such, but
that is a problem regardless of how many times it is set.
Moved the bond additions after the loop to avoid invalidating the iterator
…ct()"

New backend doesn't have GetDict(), so we had to replace this using
getPropList() and a sequence of 'try-catch' to get the correct type. This would
be better if we could determine the type of the property before throwing an
exception, but getting the templated type correct for the variety of objects
that use GetPropsAsDict was too complicated so I opted for the simpler approach
using try-catch.
Avoids an overflow when the property is at uint_max
But also have to wrap that in a try-catch to avoid other errors...
…tQuery(). Required for QueryAtom::setQuery(nullptr) and QueryBond::setQuery(nullptr), where the PRECONDITION additionally needs to start with `what == nullptr ||`. This fixes a segfault in molInterchangeCatchTest
Slightly more involved handling of property to python conversion. This handles
overflow scenarios more correctly... the try-catch approach has some faults but
is the best I can come up with at the moment.

Also remove __computedProps check in test, which is deprecated.
Added a sync between stereo groups compat interface and the native backend,
because python often uses the compat interface. This required a few non-trivial
changes to pipe through an owning mol to stereo groups and adding a sync
variable to stereo groups in RDMol.

Also fixed another bug with RingInfo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants