enh: allow creation of dandiset dois (contrasted to a version doi) #297
https://github.com/dandi/dandi-archive/pull/2350/files#r2054748163
If version_doi is False, we should point to the DLP instead of a specific version, which will prevent needing additional updates.
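A minimal sketch of that suggestion, assuming "DLP" means the dandiset landing page. The function name `doi_target_url`, its parameters, and the base URL are hypothetical and only illustrate the idea; they are not taken from this PR.

```python
def doi_target_url(
    dandiset_id: str,
    version: str,
    version_doi: bool,
    base: str = "https://dandiarchive.org",  # assumed instance URL
) -> str:
    """Return the URL a DOI should resolve to.

    A version DOI points at one specific published version. A dandiset DOI
    points at the landing page, so it stays correct when new versions are
    published and needs no further updates.
    """
    landing_page = f"{base}/dandiset/{dandiset_id}"
    if version_doi:
        return f"{landing_page}/{version}"
    return landing_page


# Hypothetical identifiers, for illustration only.
print(doi_target_url("000123", "0.210812.1448", version_doi=True))
print(doi_target_url("000123", "0.210812.1448", version_doi=False))
```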
dandischema/models.py
Outdated
DANDI_ID_PATTERN = r"\d{6}"
VERSION_PATTERN = rf"{DANDI_ID_PATTERN}/\d+\.\d+\.\d+"
DANDI_DOI_WITH_VERSION = rf"^10.(48324|80507)/dandi\.{VERSION_PATTERN}"
DANDI_DOI_NO_VERSION = r"^10\.(48324|80507)/dandi\.\d{6}"
DANDI_DOI_PATTERN = rf"{DANDI_DOI_WITH_VERSION}|{DANDI_DOI_NO_VERSION}"
DANDI_PUBID_PATTERN = rf"^DANDI:{VERSION_PATTERN}"

PUBLISHED_DANDISET_URL_PATTERN = (
    rf"^{DANDI_INSTANCE_URL_PATTERN}/dandiset/{DANDI_ID_PATTERN}"
)
PUBLISHED_VERSION_URL_PATTERN = (
    rf"^{DANDI_INSTANCE_URL_PATTERN}/dandiset/{VERSION_PATTERN}$"
)
PUBLISHED_URL_PATTERN = (
    rf"{PUBLISHED_VERSION_URL_PATTERN}|{PUBLISHED_DANDISET_URL_PATTERN}"
)
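For reference, here is a rough sketch of how the two DOI forms in this hunk could be told apart. It reuses `DANDI_ID_PATTERN` and `VERSION_PATTERN` from the hunk; `DOI_PREFIX`, `doi_kind`, and the sample DOIs are hypothetical. It also illustrates two properties of the patterns as written: the dot after `10` in `DANDI_DOI_WITH_VERSION` is unescaped, and `DANDI_DOI_NO_VERSION` has no trailing `$`, so a prefix match also accepts a versioned DOI.

```python
import re

DANDI_ID_PATTERN = r"\d{6}"
VERSION_PATTERN = rf"{DANDI_ID_PATTERN}/\d+\.\d+\.\d+"
# Hypothetical shared prefix, with the dot after "10" escaped.
DOI_PREFIX = r"10\.(48324|80507)/dandi\."

WITH_VERSION = re.compile(rf"{DOI_PREFIX}{VERSION_PATTERN}")
NO_VERSION = re.compile(rf"{DOI_PREFIX}{DANDI_ID_PATTERN}")


def doi_kind(doi: str) -> str:
    """Classify a DOI as 'version', 'dandiset', or 'invalid'."""
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    if WITH_VERSION.fullmatch(doi):
        return "version"
    if NO_VERSION.fullmatch(doi):
        return "dandiset"
    return "invalid"


# Made-up DOIs, for illustration only.
print(doi_kind("10.48324/dandi.000123/0.210812.1448"))
print(doi_kind("10.48324/dandi.000123"))

# The no-version pattern from the hunk, used as a prefix match, also
# accepts a versioned DOI because it is not anchored at the end.
prefix_only = r"^10\.(48324|80507)/dandi\.\d{6}"
print(bool(re.match(prefix_only, "10.48324/dandi.000123/0.210812.1448")))

# An unescaped dot matches any character, not only a literal ".".
print(bool(re.match(r"^10.(48324|80507)/dandi\.", "10x48324/dandi.")))
```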
TODO: this has gotten really messy and can be cleaned up (I just hacked it out to work for now).
Some of those will be outdated soon. We are modifying some of them in #294. The goal for that PR is not to clean the mess though.
@candleindark thanks for lookin out, nice catch!
I wonder if the model changes are still necessary at all? I've gone through several implementations that tried to pass all the data through Pydantic validation first (currently we go with either PublishedDandiset or unvalidated), and separately we've also decided to store the dandiset-wide DOI on the draft version. If we aren't going to soften the Dandiset Pydantic model so we can use it for a draft dandiset, I'm not sure the draft could ever pass PublishedDandiset validation. So there isn't really a need to change the regex to accept that other DOI format...
edd18b6 to e6ebaa2
yarikoptic
left a comment
avoid creating new classes if feasible
120c277 to d6f4848
For
This also means that we will get unvalidated data. Is it possible to think of another approach? I am aware that there are already existing uses of If the goal is to create a DOI for a dandiset, encompassing all versions, you may need less information than what is contained in a
@candleindark I don't think it's as bad to be unvalidated here as it might seem, but it does seem like we are bypassing the value of using the Pydantic models for this. Normal validation is still going on under the hood for Dandi itself; this code runs after the data has been saved to the db and only validates the data we are sending to Datacite. If that doesn't conform to our models, that's OK, and if it doesn't conform to their spec, we will just log and move on.
My concern is that once Assuming that you are calling the
from pydantic import BaseModel

class Bar(BaseModel):
    s: str

class Foo(BaseModel):
    bar: Bar = Bar(s="default string")
    x: int

# `model_construct()` is called to return an intermediate result so that the result
# is never treated as a `Foo` instance by other code. Calling dict() with result of
# `model_construct()` returns the raw field values of the result (including the
# default values and excluding the extra values).
f_dict: dict = dict(Foo.model_construct(x=42, y=3))
print(f_dict)
"""
{'bar': Bar(s='default string'), 'x': 42}
"""
In this example, there is no (potentially) invalid object. I think you can use this method in your
- deprecate to_datacite(publish) in favor of event
- If PublishedDandiset validation fails, fall back to unvalidated Dandiset
67fb5ef to 942f616
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #297 +/- ##
==========================================
- Coverage 97.88% 94.13% -3.75%
==========================================
Files 16 16
Lines 1983 2080 +97
==========================================
+ Hits 1941 1958 +17
- Misses 42 122 +80
2465bbd to 278f7c9
construct_unvalidated_dandiset was reorganized to use a dict for the inner portions *prior* to initializing as a model. This allows mypy to understand the expected types (otherwise we need to type ignore most of it)
278f7c9 to 53adc4b
except ValidationError:
    # mypy can't track that meta is still dict after failed PublishedDandiset(**meta)
    assert isinstance(meta, dict)
    if meta.get("version") == "draft":
If the version is a draft, it won't have fields like datePublished. However, this can also happen when we are creating a Dandiset DOI from a published version. In that case the metadata is the published version, but the doi and url fields won't pass validation (neither includes the version).
Previously I modified PublishedDandiset to accept either format for url and doi, but I don't think that really makes sense. Those values aren't valid for a published dandiset, and we wouldn't want our output schema to reflect flexibility that's not really there.
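As a rough illustration of the fallback described here, the sketch below tries a strict model first and only builds an unvalidated object when validation fails. `StrictVersion` and `LooseDandiset` are hypothetical stand-ins for PublishedDandiset and Dandiset, with made-up fields; the real models are much larger, and `to_model` is not a function from this PR.

```python
from pydantic import BaseModel, Field, ValidationError


class StrictVersion(BaseModel):
    """Hypothetical stand-in for PublishedDandiset: the DOI must be versioned."""

    identifier: str
    version: str
    doi: str = Field(pattern=r"^10\.(48324|80507)/dandi\.\d{6}/\d+\.\d+\.\d+$")


class LooseDandiset(BaseModel):
    """Hypothetical stand-in for Dandiset, used only as the fallback."""

    identifier: str
    version: str
    doi: str = ""


def to_model(meta: dict) -> BaseModel:
    """Prefer the strict model; fall back to an unvalidated loose one."""
    try:
        return StrictVersion(**meta)
    except ValidationError:
        # model_construct() skips validation, so a dandiset-wide DOI
        # (no version component) is accepted as-is.
        return LooseDandiset.model_construct(**meta)


# Made-up metadata, for illustration only.
versioned = {"identifier": "DANDI:000123", "version": "0.1.2",
             "doi": "10.48324/dandi.000123/0.1.2"}
dandiset_wide = {"identifier": "DANDI:000123", "version": "0.1.2",
                 "doi": "10.48324/dandi.000123"}

print(type(to_model(versioned)).__name__)
print(type(to_model(dandiset_wide)).__name__)
```

Keeping the strict pattern on the published model means the output schema still describes only versioned DOIs, which is the concern raised in the comment above.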