Skip to content

Conversation

@jl-wynen
Copy link
Collaborator

Fixes #311

The label argument must exactly match the label in the ontology. This can be a little cumbersome because users have to look up those labels. I considered implemented a fuzzy matcher. But I don't want to do that in a __setitem__ function. So I'll leave that as a later addition if we want it. In the meantime, I have a prototype using RapidFuzz and a widget.

I chose to encrypt the json file because that makes it 10x smaller and adds almost no complication except that we can't read it on GitHub.

@jl-wynen jl-wynen requested a review from YooSunYoung December 12, 2025 15:47
Copy link
Collaborator

@YooSunYoung YooSunYoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few minor comments but seems okay to me in general.

The first element of the list is the primary label.
All labels are lowercase and contain no leading or trailing whitespace.
"""
return _load_ontology("expands_techniques") # type: ignore[return-value]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function is for notebooks, maybe we can wrap it in html repr, otherwise ignore me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mean t as a 'backend' implementation. I don't think it is very useful to show this list in a notebook directly. The link to the actual ontology seems more useful.

Comment on lines 78 to 80
for iri, labels in expands_techniques().items():
if label in labels:
return Technique(pid=iri, name=labels[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There won't be any overlapping labels between techniques, right...?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, there are a few. E.g.

diffraction ['http://purl.org/pan-science/PaNET/PaNET01022', 'http://purl.org/pan-science/PaNET/PaNET2020018']

I'll update this to raise in that case. Users will just have to specify the exact IRI.

Comment on lines 473 to 480
type="raw",
contact_email="jan-lukas.wynen@ess.eu",
creation_time="2142-04-02T16:44:56",
owner="Jan-Lukas Wynen",
owner_group="ess",
principal_investigator="jan-lukas.wynen@ess.eu",
source_folder=RemotePath("/hex/source62"),
techniques=[technique],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to use real emails for testings?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I shouldn't. I guess I was just lazy

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can maybe add a test if the ontology file is up to date in the CI or will it not be ever changed...?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will change. I wouldn't want that check in the regular tests, but I can try and set it up in CI.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure how to do this. The JSONLD file does not encode the version. I would have to parse that out of the HTML, TTL, or N Triples file. Or clone the github repo. But then I don't know if the deployed version is the same as the repo.
Do you think this is worth pursuing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simply check if the binary files that we make from the jsonld files are exactly the same?

But we can skip it I guess, if we ever want it to be updated, we will probably be asked to do so by instrument scientists or SIMS team.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we download and parse the whole ontology every time we use it, then we don't even need to store it in the package. I wanted to avoid that so that we don't need an internet connection to make local datasets. Admittedly, that is a niche use case.

@jl-wynen jl-wynen merged commit 2f16a83 into main Dec 15, 2025
13 checks passed
@jl-wynen jl-wynen deleted the expands-ontologies branch December 15, 2025 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add helper for specifying technique IDs from ExPaNDS onthology

3 participants