-
Notifications
You must be signed in to change notification settings - Fork 15
Introduce ContentType as Blob vs Zarr and rename blobDateModified to contentDateModified #220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…contentDateModified This is just an initial attempt open for discussion. I ran into "blobDateModified" in a zarr metadata and it raised my eyebrow since that is not really appropriate and confusing. Hence I decided to look into generalization. I also thought that it would be valuable to make "type" of the content Asset points to explicit, although that could lead to inconsistencies since information is somewhat redundant with encodingFormat and potentially could also be deduced from contenUrl since we have different end points on S3, etc. Nevertheless I think it might be better to make it explicit. Or at least we have to rename blobDateModified. - ContenType name is quite suboptimal since there is a standard HTTP header https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type and thus we could potential confusion. But we should keep it a "Type" (not e.g. a Class) to be consistent with other type definitions among models. So the other part we could try to vary is "Content". Possible alternatives are "Object", "Data", "Resource" - ATM we call all Zarrs just Zarr but it is a "ZarrFolder" really. I wonder if it would be time to start to introduce differentiation here by making it "ZarrFolder", as later we might get "ZarrHDF5" or alike
6d3b5ff to
ddd77da
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #220 +/- ##
==========================================
- Coverage 97.66% 97.61% -0.05%
==========================================
Files 18 18
Lines 1798 1806 +8
==========================================
+ Hits 1756 1763 +7
- Misses 42 43 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
i'm fine with something like this being in the database, but not sure about changing the metadata model. perhaps we can brainstorm when we meet. |
satra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have both encodingFormat and dataType - do we need yet another field or a clearer encodingFormat.
(I see no hits among jsonld manifests ATM)
|
| if "schemaKey" not in obj: | ||
| obj["schemaKey"] = "Dandiset" | ||
| obj["schemaVersion"] = to_version | ||
| if version2tuple(schema_version) < version2tuple("0.7.0"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here will need to become some future version now that 0.7.0 is out... we might want to introduce next branch alike to next in linux development, or more specifically next-0.8.0 and position this PR against it so we could absorb meanwhile non-breaking changes
| if version2tuple(schema_version) < version2tuple("0.7.0"): | |
| if version2tuple(schema_version) < version2tuple("0.8.0"): |
| or obj.get("path", "").endswith(".zarr") | ||
| ) | ||
| else models.ContentType.Blob | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not yet 100% certain since we have only 2 cases -- it is either zarr or not (blob), but this field and code would be the "centralization" of the logic for DANDI-specific behavior of zarr-vs-blob. Otherwise, if we add some other "contentType" later on (e.g. remoteLink 🤷) such comparisons would need to be duplicated in every piece of code... we would need likely to add validation for that logic also at pydantic or linkml level.
This is just an initial attempt open for discussion. @satra please chime in
I ran into "blobDateModified" in a zarr metadata and it raised my eyebrow since that is not really appropriate and confusing. Hence I decided to look into generalization. I also thought that it would be valuable to make "type" of the content Asset points to explicit, although that could lead to inconsistencies since information is somewhat redundant with encodingFormat and potentially could also be deduced from contenUrl since we have different end points on S3, etc.
Nevertheless I think it might be better to make it explicit. Or at least we have to rename blobDateModified.
ContenType name is quite suboptimal since there is a standard HTTP header https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type and thus we could potential confusion.
But we should keep it a "Type" (not e.g. a Class) to be consistent with other type definitions among models.
So the other part we could try to vary is "Content". Possible alternatives are "Object", "Data", "Resource"
ATM we call all Zarrs just Zarr but it is a "ZarrFolder" really. I wonder if it would be time to start to introduce differentiation here by making it "ZarrFolder", as later we might get "ZarrHDF5" or alike