|
For completeness, adding my comments from Slack here... We've lost the desired high-level API from my implementation in #288, i.e., one can test the implementation against the whole dataset using the following:

    import sys
    from pathlib import Path

    from qcelemental.exceptions import MoleculeFormatError
    from qcelemental.models import Molecule

    if __name__ == "__main__":
        # Directory of dataset files, passed as the first CLI argument
        path = Path(sys.argv[1])
        failures = []
        for i, p in enumerate(path.iterdir()):
            full_path = p.resolve()
            try:
                Molecule.from_file(full_path)
            except MoleculeFormatError:
                # Record every file the parser rejects
                print(full_path.name)
                failures.append(full_path.name)
            if i % 1000 == 0:
                print(i)  # progress indicator
        print(failures)
        print(f"Total Failures: {len(failures)}")

The changed test implementation, from

    unprocessed, processed = _filter_xyz(string, strict=True)

to

    final = qcelemental.molparse.from_string(string, return_processed=False, dtype="gdb")

is what makes this PR still "pass" the tests I wrote, but we've lost the |
|
Sorry, saw this after Slack, so I'll repeat here :-) A near-high-level API should work now as For anyone following along, the key difference is that this PR parses gdb as a separate dtype, whereas #288 parses gdb under "xyz" dtype with some regex relaxations. Maybe that's ok, as gdb is a correct superset of xyz, but I do worry about less guidance/errors being returned to the user. e.g., the below could pass, when it probably wasn't the user's intended geometry. |
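To make the trade-off above concrete, here is a minimal sketch of how relaxing an atom-line regex can silently accept input that a strict parser would flag. The patterns and the `parse_atom` helper are hypothetical and written only for illustration; they are not the regexes used in qcelemental, in this PR, or in #288.

```python
import re

# Hypothetical patterns for illustration only; NOT qcelemental's actual regexes.
NUMBER = r"[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?"
ATOM = rf"^\s*([A-Za-z]{{1,2}})\s+({NUMBER})\s+({NUMBER})\s+({NUMBER})"

# Strict: exactly a symbol and three coordinates on the line.
STRICT_ATOM = re.compile(ATOM + r"\s*$")
# Relaxed: same, but any number of trailing columns is tolerated and ignored.
RELAXED_ATOM = re.compile(ATOM + r"(?:\s+\S+)*\s*$")


def parse_atom(line, relaxed=False):
    """Return (symbol, (x, y, z)) or raise ValueError if the line does not match."""
    pattern = RELAXED_ATOM if relaxed else STRICT_ATOM
    match = pattern.match(line)
    if match is None:
        raise ValueError(f"unparseable atom line: {line!r}")
    symbol, x, y, z = match.groups()
    return symbol, (float(x), float(y), float(z))


plain = "H 0.0 0.0 0.74"
extra = "H 0.0 0.0 0.74 1.0"  # stray extra number, possibly a user typo

print(parse_atom(plain))                # accepted by the strict pattern
print(parse_atom(extra, relaxed=True))  # accepted; the extra column is dropped
try:
    parse_atom(extra)                   # the strict pattern rejects the line
except ValueError as err:
    print(err)
```

In this sketch the relaxed pattern returns the same geometry for both lines, so the stray column never reaches the user as an error. That is the kind of lost guidance the comment above is worried about, and a separate dtype would confine the relaxed behavior to callers who ask for it.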
|
Cool! Thanks for the update :) I worry about the alternative case, i.e., end users see all the Is there a reason you prefer requiring the extra |
|
Also, I still see many more failures with the current code. Better than before, but I get 613 failures on the |
|
Ideal scenario for this PR:
Can you help me to understand this scenario you are concerned about? Would this be a format we expect users to encounter in regular use or more a hypothetical that concerns you? Thanks for your time on this. I'm happy to help finish the implementation if you can point out the concerns you have with #288 that may have undesired behavior. I found the |
|
This pull request introduces 1 alert and fixes 1 when merging 508817f into cb04079 - view on LGTM.com
|
|
This pull request introduces 1 alert and fixes 1 when merging 829dd44 into cb04079 - view on LGTM.com
|
See description and purpose and proposed tests at #288. This is a separate implementation of the parsing.
Status