Prerequisites for supporting domain detection classes #564
Changes from all commits
```diff
@@ -1 +1 @@
-__version__ = '0.13.1'
+__version__ = '0.14.0'
```
```diff
@@ -1,7 +1,21 @@
 from typing_extensions import Protocol, runtime_checkable
+from os import SEEK_SET
 
 
 @runtime_checkable
 class Reader(Protocol):
     def read(self, size: int = -1, /) -> bytes:
         """EOF if empty b''."""
+
+
+@runtime_checkable
+class Seeker(Protocol):
+    def seek(self, offset: int, whence: int = SEEK_SET) -> int:
+        """ Change the position to the given offset, returning the absolute position. """
+
+
+@runtime_checkable
+class ReadSeeker(Reader, Seeker, Protocol):
+    """
+    A :class:`Reader` capable of changing the position from which it is reading.
+    """
```
```diff
@@ -42,6 +42,10 @@ class StandardDatasetIndex(str, PydanticEnum):
     """ Index for the name of a data file within a dataset. """
     COMPOSITE_SOURCE_ID = (9, str, "COMPOSITE_SOURCE_ID")
     """ Index for DATA_ID values of source dataset(s) when dataset is composite format and derives from others. """
+    HYDROFABRIC_VERSION = (10, str, "HYDROFABRIC_VERSION")
+    """ Version string for version of the hydrofabric to use (e.g., 2.0.1). """
+    HYDROFABRIC_REGION = (11, str, "HYDROFABRIC_REGION")
+    """ Region string (e.g., conus, vpu01) for the applicable region of the hydrofabric. """
 
     def __new__(cls, index: int, ty: type, name: str):
         o = str.__new__(cls, name)
```

**Review discussion:**

**Member:** Out of scope, but related: we will need something for the hydrofabric model attributes as well. We should probably chat with the HF team about how they intend to version those.

**Author (contributor):** I've opened #569 for tracking this in the future.

**Member:** Thanks for opening something to track this!
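The context lines above show that each `StandardDatasetIndex` member is built via a custom `__new__` from an `(index, type, name)` tuple, producing a `str` subclass. A hypothetical minimal reproduction of that pattern follows; everything after the `str.__new__` call, including the attribute names `idx` and `ty`, is an assumption for illustration, not code from DMOD:

```python
# Hypothetical reconstruction of the (index, type, name) enum-member pattern.
from enum import Enum


class IndexSketch(str, Enum):
    HYDROFABRIC_VERSION = (10, str, "HYDROFABRIC_VERSION")
    HYDROFABRIC_REGION = (11, str, "HYDROFABRIC_REGION")

    def __new__(cls, index: int, ty: type, name: str):
        o = str.__new__(cls, name)  # as in the diff's context lines
        o._value_ = name            # assumed: the member's value is its name
        o.idx = index               # assumed attribute names (illustrative only)
        o.ty = ty
        return o


# Members compare equal to plain strings because of the str mixin.
print(IndexSketch.HYDROFABRIC_REGION == "HYDROFABRIC_REGION")  # True
print(IndexSketch.HYDROFABRIC_VERSION.idx)                     # 10
```

The `str` mixin means index names serialize naturally and can be compared against raw strings pulled from serialized domains, while the extra tuple slots carry the numeric index and the expected value type.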
```diff
@@ -90,8 +94,8 @@ class DataFormat(PydanticEnum):
     index that can be used to distinguish the collections, so that the right data can be identified.
     """
     AORC_CSV = (0,
-                {StandardDatasetIndex.CATCHMENT_ID: None, StandardDatasetIndex.TIME: ""},
-                {"": datetime, "APCP_surface": float, "DLWRF_surface": float, "DSWRF_surface": float,
+                {StandardDatasetIndex.CATCHMENT_ID: None, StandardDatasetIndex.TIME: "Time"},
+                {"Time": datetime, "APCP_surface": float, "DLWRF_surface": float, "DSWRF_surface": float,
                 "PRES_surface": float, "SPFH_2maboveground": float, "TMP_2maboveground": float,
                 "UGRD_10maboveground": float, "VGRD_10maboveground": float, "precip_rate": float},
                True
```
```diff
@@ -184,6 +188,38 @@
     T_ROUTE_CONFIG = (13, {StandardDatasetIndex.DATA_ID: None, StandardDatasetIndex.HYDROFABRIC_ID: None}, None, False)
     """ Format for t-route application configuration. """
 
+    NGEN_GEOPACKAGE_HYDROFABRIC_V2 = (14,
+                                      {StandardDatasetIndex.CATCHMENT_ID: "divide_id",
+                                       StandardDatasetIndex.HYDROFABRIC_ID: None,
+                                       StandardDatasetIndex.HYDROFABRIC_REGION: None,
+                                       StandardDatasetIndex.HYDROFABRIC_VERSION: None},
+                                      {"fid": int, "divide_id": str, "geom": Any, "toid": str, "type": str,
+                                       "ds_id": float, "areasqkm": float, "id": str, "lengthkm": float,
+                                       "tot_drainage_areasqkm": float, "has_flowline": bool},
+                                      )
+    """ GeoPackage hydrofabric format v2 used by NextGen (id is catchment id). """
+
+    EMPTY = (15, {}, None, False)
+    """
+    "Format" for an empty dataset that, having no data (yet), doesn't have (or need) an applicable defined structure.
+
+    The intent of this is for simplicity when creating a dataset. This format represents a type of dataset that doesn't,
+    and importantly, **cannot** yet truly have a more specific format that matches its contents. A key implication is
+    an expectation that the domain of the dataset (including the format) **must** be changed as soon as any data is
+    added to the dataset.
+    """
+
+    GENERIC = (16, {}, None, False)
+    """
+    Format without any indications or restrictions on the defined structure of contained data.
+
+    This value is very much like ``EMPTY`` except that it is applicable to non-empty datasets. It represents absolutely
+    nothing about the structure of any contents, and thus that absolutely anything can be contained or added. In
+    practice, the main intended difference from ``EMPTY`` is that datasets in this format will not be required to update
+    their data domain at the time new data is added (while not applicable to ``EMPTY``, the same is true when any data
+    is removed).
+    """
+
     @classmethod
     def can_format_fulfill(cls, needed: 'DataFormat', alternate: 'DataFormat') -> bool:
         """
```

**Review discussion (on `EMPTY`):**

**Member:** How do these interact with the …

**Author (contributor):** Hmm ... I don't think there's any coupling, strictly speaking, between categories and either formats or domains. For this in particular, I think that's fine: it's reasonable to have a dataset in any category for which the data coverage is defined to be either empty or unknown. More generally, I can already see things with categories that need tweaking (the notion of an …). Did you have something specific in mind when you noted this, or just a general concern?

**Member:** I've always thought there was an implied unidirectional relationship from a …

**Author (contributor):** There definitely is conceptually, just not one that's explicitly defined somewhere. I haven't tested that scenario, but I expect it would allow the …
```diff
@@ -325,7 +361,9 @@ class ContinuousRestriction(Serializable):
 
     variable: StandardDatasetIndex
     begin: datetime
+    """ An inclusive beginning value. """
     end: datetime
+    """ An exclusive end value. """
     datetime_pattern: Optional[str]
     subclass: str = None
     """
```
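The new docstrings pin down half-open interval semantics for `ContinuousRestriction`: `begin` is inclusive, `end` is exclusive. A tiny illustrative check of that convention (the `in_restriction` helper is hypothetical, not DMOD API):

```python
# Half-open interval membership: begin inclusive, end exclusive.
from datetime import datetime


def in_restriction(t: datetime, begin: datetime, end: datetime) -> bool:
    # hypothetical helper illustrating the documented semantics
    return begin <= t < end


begin = datetime(2024, 1, 1)
end = datetime(2024, 1, 2)
print(in_restriction(begin, begin, end))  # True  (inclusive begin)
print(in_restriction(end, begin, end))    # False (exclusive end)
```

Half-open ranges compose cleanly: adjacent restrictions `[a, b)` and `[b, c)` tile a timeline without overlap or gap, which matters when domains for time-sliced forcing data are compared.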
```diff
@@ -439,9 +477,6 @@ def convert_truncated_serial_form(cls, truncated_json_obj: dict, datetime_format
 
         return json_copy
 
-    def __hash__(self) -> int:
-        return hash((self.variable.name, self.begin, self.end))
-
     def contains(self, other: 'ContinuousRestriction') -> bool:
         """
         Whether this object contains all the values of the given object and the two are of the same index.
```
```diff
@@ -492,6 +527,11 @@ def __init__(
         if allow_reorder:
             self.values.sort()
 
+    def __eq__(self, other):
+        if not isinstance(other, DiscreteRestriction):
+            return False
+        return self.variable == other.variable and sorted(self.values) == sorted(other.values)
+
     def __hash__(self) -> int:
         return hash((self.variable.name, *self.values))
 
```
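The added `__eq__` makes equality order-insensitive over `values`. A stand-in sketch of the behavior (class name and constructor here are illustrative, not DMOD imports); note the caveat in the comments, which follows directly from the diff: `__hash__` uses the values in stored order, so equal instances are only guaranteed equal hashes when their values are stored sorted (which the `__init__` shown above does when `allow_reorder` is set):

```python
# Order-insensitive equality for discrete restrictions, as added in the diff.
class DiscreteRestrictionSketch:
    def __init__(self, variable: str, values: list):
        self.variable = variable
        self.values = values

    def __eq__(self, other):
        if not isinstance(other, DiscreteRestrictionSketch):
            return False
        return self.variable == other.variable and sorted(self.values) == sorted(other.values)

    def __hash__(self) -> int:
        # mirrors the diff: hashing uses stored order, so two equal instances
        # hash alike only if their values lists are ordered identically
        return hash((self.variable, *self.values))


a = DiscreteRestrictionSketch("CATCHMENT_ID", ["cat-1", "cat-2"])
b = DiscreteRestrictionSketch("CATCHMENT_ID", ["cat-2", "cat-1"])
print(a == b)  # True: equality ignores value ordering
```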
```diff
@@ -551,7 +591,22 @@ def is_all_possible_values(self) -> bool:
 
 class DataDomain(Serializable):
     """
-    A domain for a dataset, with domain-defining values contained by one or more discrete and/or continuous components.
+    A domain for some collection of data, with defining values contained by discrete and/or continuous components.
+
+    A definition for the domain of some kind of collection of data. The collection may be something more concrete, like
+    a ::class:`Dataset` instance, or more abstract, like forcing data sufficient to run a requested model execution.
+
+    The definition consists of details on the structure and content of the data within the collection. Structure is
+    represented by a ::class:`DataFormat` attribute, and contents are represented by collections of
+    ::class:`ContinuousRestriction` and ::class:`DiscreteRestriction` objects.
+
+    While a domain may have any number of continuous or discrete restrictions individually, combined it must have at
+    least one, or validation will fail.
+
+    There is a notion of whether a domain "contains" certain described data. This described data can be a simple
+    description of some data index and values of it, fundamentally the definition of ::class:`ContinuousRestriction`
+    and ::class:`DiscreteRestriction` objects. The described data can also be more complex, like another fully defined
+    domain. A function is provided by the type for performing such tests.
     """
     data_format: DataFormat = Field(
         description="The format for the data in this domain, which contains details like the indices and other data fields."
```
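The "contains" notion described in the new docstring can be illustrated with plain sets standing in for restriction objects; `domain_contains` here is a hypothetical sketch, not the DMOD function the docstring alludes to:

```python
# Plain-set illustration of domain containment for discrete restrictions:
# every described index must exist in the domain with all of its values covered.
def domain_contains(domain: dict, described: dict) -> bool:
    return all(idx in domain and set(vals) <= set(domain[idx])
               for idx, vals in described.items())


domain = {"CATCHMENT_ID": ["cat-1", "cat-2", "cat-3"]}
print(domain_contains(domain, {"CATCHMENT_ID": ["cat-2"]}))   # True
print(domain_contains(domain, {"CATCHMENT_ID": ["cat-99"]}))  # False
```

Real DMOD domains are richer (continuous ranges, formats, datetime patterns), but the subset test captures the core idea: a domain fulfills a requirement when every requested index value falls within what the domain declares.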
```diff
@@ -622,12 +677,17 @@ def handle_type_map(t):
 
     @root_validator()
     def validate_sufficient_restrictions(cls, values):
         data_format = values.get("data_format")
+        if data_format == DataFormat.EMPTY or data_format == DataFormat.GENERIC:
+            return values
         continuous_restrictions = values.get("continuous_restrictions", {})
         discrete_restrictions = values.get("discrete_restrictions", {})
-        if len(continuous_restrictions) + len(discrete_restrictions) == 0:
-            msg = "Cannot create {} without at least one finite continuous or discrete restriction"
-            raise RuntimeError(msg.format(cls.__name__))
-        return values
+        if len(continuous_restrictions) + len(discrete_restrictions) > 0:
+            return values
+        raise RuntimeError(f"Cannot create {cls.__name__} without at least one finite continuous or discrete "
+                           f"restriction, except when data format is {DataFormat.GENERIC.name} or "
+                           f"{DataFormat.EMPTY.name} (provided value was: "
+                           f"{'None' if data_format is None else data_format.name})")
 
     @classmethod
     def factory_init_from_restriction_collections(cls, data_format: DataFormat, **kwargs) -> 'DataDomain':
```
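A dependency-free sketch of the revised validator logic: `EMPTY` and `GENERIC` are exempt from the at-least-one-restriction rule, while every other format still requires one. The enum members and function below are stand-ins for DMOD's `DataFormat` and the pydantic `root_validator`, kept self-contained for illustration:

```python
# Stand-in enum and plain function mirroring the validator's control flow.
from enum import Enum, auto


class FormatSketch(Enum):
    EMPTY = auto()
    GENERIC = auto()
    AORC_CSV = auto()


def validate_sufficient_restrictions(data_format, continuous: dict, discrete: dict) -> bool:
    # EMPTY and GENERIC domains are allowed to have no restrictions at all
    if data_format in (FormatSketch.EMPTY, FormatSketch.GENERIC):
        return True
    if len(continuous) + len(discrete) > 0:
        return True
    raise RuntimeError(
        f"Cannot create DataDomain without at least one finite continuous or "
        f"discrete restriction, except when data format is GENERIC or EMPTY "
        f"(provided value was: {'None' if data_format is None else data_format.name})")


print(validate_sufficient_restrictions(FormatSketch.EMPTY, {}, {}))  # True
print(validate_sufficient_restrictions(FormatSketch.AORC_CSV, {}, {"CATCHMENT_ID": ["cat-1"]}))  # True
```

Early-returning on the exempt formats before touching the restriction collections keeps the success path short and reserves the single `raise` for the one genuinely invalid combination.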
```diff
@@ -836,6 +896,7 @@ def dict(
         return serial
 
 
+
 class DataCategory(PydanticEnum):
     """
     The general category values for different data.
```
```diff
@@ -1 +1 @@
-__version__ = '0.9.5'
+__version__ = '0.10.0'
```
**Review discussion:**

**Comment:** Thoughts on making this a generic `VERSION`?

**Reply:** I think we should keep it separate, at least for now (does any other data format use any versioning at the moment?). A lot of - maybe all - the other data DMOD works with is only fully valid if we assume it's applied within some hydrofabric. I suspect before too long we will want or need a hydrofabric version index in constraints defining the domain of regridded forcings or BMI init config datasets, to be able to tell if the `cat-1156` involved is actually the `cat-1156` we are interested in. And that would be a more flexible way to constrain things than aligning the specific hydrofabric id.