forked from rdkit/rdkit
-
Notifications
You must be signed in to change notification settings - Fork 1
Fixing pickle bug: Number of rings now maintained when unpickling into existing molecules #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
rietmann-nv
wants to merge
44
commits into
ndickson-nvidia:minimal_rdmol
Choose a base branch
from
rietmann-nv:minimal_rdmol
base: minimal_rdmol
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Fixing pickle bug: Number of rings now maintained when unpickling into existing molecules #2
rietmann-nv
wants to merge
44
commits into
ndickson-nvidia:minimal_rdmol
from
rietmann-nv:minimal_rdmol
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When unpickling into an existing molecule with an atom with additional ring data, the previous ringinfo data would get reset when adding the bonds. We now save the ringinfo, add the bonds, and restore the ringinfo.
When unpickling into an existing molecule, it happens that when you add bonds it destroys the previous ringinfo. Instead of trying to save it when this happens, we re-compute the ringinfo after adding the bonds.
dc2e0d5 to
9d2df3c
Compare
In the line: existing = addPerElementProp(name, scope, T(), computed, false); when T = ExplicitBitVect, the default contstructor left dp_bits=nullptr. Later in the addPerElementProp routine a copy constructor was called, which assumed a non-null dp_bits pointer. By making the dp_bits non-null, we avoid the segfault.
All tests in `graphmoltestPickler` now pass. Property pickle support uses new property iterator
We removed the ringInfo reset after the addBonds when pickling. The reference resets the also resets the property cache, but we don't do this.
…rrectly handled The pickle writer was outputting a "RDVal" instead of a string, and there wasn't any method to serialize the RDVal. By correctly getting the string, it was able to be serialized properly, fixing the pickle, which fixed the unpickling, which threw an error. We also added a check for pickleFlags, but this wasn't necessary for the catch_pickle tests, but maybe elsewhere it will be relevant.
Re-getting ringInfo after ringInfo was generated internally, solved a problem where the compat interface was stale. We added an extra function that was thought to help, but actually didn't help. It seems useful anyway, so keeping it.
Old proplist always had computedPropName as a prop. This meant that it always was counted and existed. This is difficult to support using new backend and we are dropping this, unless someone desperately wants it.
Properties with unsigned int and int were converted to ANY type, so we added some extra conversion tooling to help ensure that we don't end up in ANY type for int - unsigned int. That said, we also added a fix for any time just in case.
Don't try to convert float32 -> int32 in property storage
The internal functions use a certain amount of compatibility routines so it's far from a perfect implementation. It's also unclear if any checks/modifications are missing, hopefully further testing will check for that.
Storing and recalling PropTokens wasn't handled correctly. We ensure we store a PropToken at the parser side, and then RDMol now handles PropTokens who were stored
Adjust pickling to ensure that PropTokens are converted to strings before pickling
This resets the bond query to avoid any use-after-free query invocations
Some notes, from a debug program I wrote to help fix this:
```
Creating mol from: c12ccc(CCCCCCCc5ccc(C2)cc5)cc1
Creating query from: c1cc2ccc1CCCCCCCc1ccc(cc1)C2
Ring info for mol:
numRings: 5
Ring atoms: [1,2,3,18,19,0,] [12,13,14,16,17,11,] [4,5,6,7,8,9,10,11,12,13,14,15,0,1,2,3,] [4,5,6,7,8,9,10,11,17,16,14,15,0,1,2,3,] [18,19,0,15,14,13,12,11,10,9,8,7,6,5,4,3,]
Ring info for query:
numRings: 6
Ring atoms: [0,5,4,3,2,1,] [14,15,16,17,18,13,] [6,7,8,9,10,11,12,13,14,15,16,19,2,3,4,5,] [6,7,8,9,10,11,12,13,18,17,16,19,2,3,4,5,] [19,2,1,0,5,6,7,8,9,10,11,12,13,14,15,16,] [17,18,13,12,11,10,9,8,7,6,5,0,1,2,19,16,]
Mol canonical SMILES: c1cc2ccc1CCCCCCCc1ccc(cc1)C2
Query canonical SMILES: c1cc2ccc1CCCCCCCc1ccc(cc1)C2
Attempting SubstructMatch(*mol, *query)...
Result: FAILED <-- master branch says is SUCCESS
Attempting SubstructMatch(*query, *mol)...
Result: SUCCESS
Match size: 20
```
In the fixed minimal_rdmol branch, I had to comment out an if checking the number of atom rings, which diverges.
```
if (res) {
// if (hasOwningMol() && what->hasOwningMol() &&
// this->getOwningMol().getRingInfo()->isInitialized() &&
// what->getOwningMol().getRingInfo()->isInitialized() &&
// this->getOwningMol().getRingInfo()->numAtomRings(d_index) >
// what->getOwningMol().getRingInfo()->numAtomRings(what->d_index)) {
// res = false;
// } else if (!this->getAtomicNum()) {
if (!this->getAtomicNum()) {
...
```
The master branch doesn't have this atomRing check:
```
if (res) {
if (!this->getAtomicNum()) {
```
Also: It shouldn't need to exist
Templated over atom & bond to allow property testing using existing function.
The replaceSubstruct tried to replace the same bond twice due to iterator invalidation. Now we collect indices first and iterator over those instead.
Error: what(): getImplicitValence() called without preceding call to calcImplicitValence() Similar to a previous bug, need to calculate implicitvalence before using it.
Adjusted code to better match master reference, despite the slightly confusing way master is written.
ringInfo compat interface needs to be re-called when using the compatibility interface to ensure that the data is properly synced
setOwningMol for bonds was prevented from happening more than once. This only sets a pointer, so I don't know why this would be prevented, so I removed the check. Obviously one could have issues downstream with indexing and such, but that is a problem regardless of how many times it is set.
…exists Using getProp instead
Moved the bond additions after the loop to avoid invalidating the iterator
…ct()" New backend doesn't have GetDict(), so we had to replace this using getPropList() and a sequence of 'try-catch' to get the correct type. This would be better if we could determine the type of the property before throwing an exception, but getting the templated type correct for the variety of objects that use GetPropsAsDict was too complicated so I opted for the simpler approach using try-catch.
…version It was converting slightly wrong...
Avoids an overflow when the property is at uint_max
But also have to wrap that in a try-catch to avoid other errors...
…tQuery(). Required for QueryAtom::setQuery(nullptr) and QueryBond::setQuery(nullptr), where the PRECONDITION additionally needs to start with `what == nullptr ||`. This fixes a segfault in molInterchangeCatchTest
Slightly more involved handling of property to python conversion. This handles overflow scenarios more correctly... the try-catch approach has some faults but is the best I can come up with at the moment. Also remove __computedProps check in test, which is deprecated.
Added a sync between stereo groups compat interface and the native backend, because python often uses the compat interface. This required a few non-trivial changes to pipe through an owning mol to stereo groups and adding a sync variable to stereo groups in RDMol. Also fixed another bug with RingInfo
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When unpickling into an existing molecule with an atom with additional ring
data, the previous ringinfo data would get reset when adding the bonds. We now
save the ringinfo, add the bonds, and restore the ringinfo.