-
Notifications
You must be signed in to change notification settings - Fork 4
Support setting techniques from ExPaNDS ontology label #321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
YooSunYoung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few minor comments but seems okay to me in general.
| The first element of the list is the primary label. | ||
| All labels are lowercase and contain no leading or trailing whitespace. | ||
| """ | ||
| return _load_ontology("expands_techniques") # type: ignore[return-value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this function is for notebooks, maybe we can wrap it in html repr, otherwise ignore me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mean t as a 'backend' implementation. I don't think it is very useful to show this list in a notebook directly. The link to the actual ontology seems more useful.
src/scitacean/ontology/__init__.py
Outdated
| for iri, labels in expands_techniques().items(): | ||
| if label in labels: | ||
| return Technique(pid=iri, name=labels[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There won't be any overlapping labels between techniques, right...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, there are a few. E.g.
diffraction ['http://purl.org/pan-science/PaNET/PaNET01022', 'http://purl.org/pan-science/PaNET/PaNET2020018']
I'll update this to raise in that case. Users will just have to specify the exact IRI.
| type="raw", | ||
| contact_email="jan-lukas.wynen@ess.eu", | ||
| creation_time="2142-04-02T16:44:56", | ||
| owner="Jan-Lukas Wynen", | ||
| owner_group="ess", | ||
| principal_investigator="jan-lukas.wynen@ess.eu", | ||
| source_folder=RemotePath("/hex/source62"), | ||
| techniques=[technique], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to use real emails for testings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I shouldn't. I guess I was just lazy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can maybe add a test if the ontology file is up to date in the CI or will it not be ever changed...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will change. I wouldn't want that check in the regular tests, but I can try and set it up in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure how to do this. The JSONLD file does not encode the version. I would have to parse that out of the HTML, TTL, or N Triples file. Or clone the github repo. But then I don't know if the deployed version is the same as the repo.
Do you think this is worth pursuing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply check if the binary files that we make from the jsonld files are exactly the same?
But we can skip it I guess, if we ever want it to be updated, we will probably be asked to do so by instrument scientists or SIMS team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we download and parse the whole ontology every time we use it, then we don't even need to store it in the package. I wanted to avoid that so that we don't need an internet connection to make local datasets. Admittedly, that is a niche use case.
Fixes #311
The label argument must exactly match the label in the ontology. This can be a little cumbersome because users have to look up those labels. I considered implemented a fuzzy matcher. But I don't want to do that in a
__setitem__function. So I'll leave that as a later addition if we want it. In the meantime, I have a prototype using RapidFuzz and a widget.I chose to encrypt the json file because that makes it 10x smaller and adds almost no complication except that we can't read it on GitHub.