Support non-ndarray computations, cache unit calls, and add slots to DataArray #47
stubbiali wants to merge 6 commits into mcgibbon:master
| for name, value in array_dict.items(): | ||
| if not isinstance(value, np.ndarray): | ||
| array_dict[name] = np.asarray(value) | ||
| pass |
This is a temporary and dirty solution. We could think of a mechanism to control whether arrays must be coerced or not.
We can't merge this change as-is. What problem is being solved here, and what other solutions are available for it?
NumPy's asarray function coerces the array-like input value into an ndarray. This can break, for example, the data layout and the memory alignment of value. In the specific case of Tasmania, value could be a GT4Py storage, whose low-level details are tailored to the target computing architecture and must therefore be preserved. The problem can be circumvented by monkey-patching NumPy via the function gt4py.storage.prepare_numpy(), but this is very GT4Py-specific. We could think of a more organic solution, or just pass value to DataArray as it is and let DataArray perform all type checks (and raise exceptions where appropriate).
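One possible shape for a mechanism that controls whether arrays must be coerced is sketched below. It is only an illustration: the helper name `coerce_unless_array_like`, the `FakeStorage` class, and the attribute check (`shape` and `dtype`) are assumptions, not code from this PR or from GT4Py.

```python
import numpy as np


def coerce_unless_array_like(value):
    """Return value untouched if it already looks like an array, else coerce it.

    Hypothetical helper: "looks like an array" is assumed here to mean
    exposing both `shape` and `dtype`.
    """
    if isinstance(value, np.ndarray):
        return value
    if hasattr(value, "shape") and hasattr(value, "dtype"):
        # Duck-typed storage: keep it as-is to preserve layout and alignment.
        return value
    # Plain Python containers and scalars are still converted to ndarray.
    return np.asarray(value)


class FakeStorage:
    """Stand-in for a non-ndarray storage (not a real GT4Py class)."""

    def __init__(self, shape, dtype=float):
        self.shape = shape
        self.dtype = np.dtype(dtype)


array_dict = {"a": [1.0, 2.0], "b": FakeStorage((2, 3))}
# Rebuild the dict, coercing each value only when needed.
array_dict = {name: coerce_unless_array_like(v) for name, v in array_dict.items()}
```

With a check like this, lists are still converted, while anything that already exposes an array interface reaches DataArray unchanged, so DataArray remains the place where unsupported types are rejected.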
| class DataArray(xr.DataArray): | ||
| __slots__ = [] |
What are the implications of setting this? What warning is it suppressing, and what behavior does it cause when you set this to an empty list?
This is meant to suppress the warning "FutureWarning: xarray subclass DataArray should explicitly define __slots__". Here is a nice explanation of how __slots__ works.
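As a rough illustration of what an empty `__slots__` does, plain Python classes are used below as stand-ins; `Base`, `WithoutSlots`, and `WithEmptySlots` are hypothetical and not xarray code. When the parent class defines `__slots__`, a subclass that omits it regains a per-instance `__dict__`, while a subclass that declares an empty `__slots__` adds no new attributes and keeps its instances free of `__dict__`.

```python
class Base:
    # The parent restricts instances to a fixed set of attributes.
    __slots__ = ("data",)

    def __init__(self, data):
        self.data = data


class WithoutSlots(Base):
    # No __slots__ here, so instances get a __dict__ again.
    pass


class WithEmptySlots(Base):
    # Empty __slots__: no new attributes, and still no __dict__.
    __slots__ = ()


loose = WithoutSlots(1)
loose.extra = "allowed"  # stored in loose.__dict__

strict = WithEmptySlots(1)
try:
    strict.extra = "rejected"
    assignment_failed = False
except AttributeError:
    assignment_failed = True
```

An empty list, as used in the diff, behaves the same as the empty tuple here, since `__slots__` accepts any iterable of names. The practical implication is that code which sets ad hoc attributes on instances of the subclass would start raising `AttributeError`.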
| """ | ||
| if len(data_array.values.shape) == 0 and len(out_dims) == 0: | ||
| return data_array.values # special case, 0-dimensional scalar array | ||
| if len(data_array.data.shape) == 0 and len(out_dims) == 0: |
This change alters the behavior of this function, which is OK, but the variable names, function name, file name, and docstring need to be updated. For example, I would suggest naming the function something like get_underlying_data.
The tests in test_get_restore_numpy_array.py should also be updated to cover cases where data is not a numpy array (and have similar re-namings).
That's correct. I will rename the function and update the tests.
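A test for the non-NumPy case could look roughly like the sketch below. All names are hypothetical: `get_underlying_data` is only the name suggested above, `DuckArray` is a toy storage, and `Wrapper` is a stand-in that imitates how `DataArray.values` converts to an ndarray while `DataArray.data` returns the wrapped object.

```python
import numpy as np


class DuckArray:
    """Toy non-ndarray storage exposing a minimal array interface."""

    def __init__(self, array):
        self._array = np.asarray(array)
        self.shape = self._array.shape
        self.dtype = self._array.dtype

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray() convert this object when coercion is requested.
        return self._array if dtype is None else self._array.astype(dtype)


class Wrapper:
    """Stand-in for DataArray: .data keeps the type, .values coerces."""

    def __init__(self, data):
        self.data = data

    @property
    def values(self):
        return np.asarray(self.data)


def get_underlying_data(wrapper):
    # Hypothetical accessor: return the storage without coercing it.
    return wrapper.data


def test_non_numpy_data_is_returned_unchanged():
    storage = DuckArray([1.0, 2.0])
    wrapped = Wrapper(storage)
    assert get_underlying_data(wrapped) is storage
    assert not isinstance(get_underlying_data(wrapped), np.ndarray)
    # Going through .values would have produced a plain ndarray instead.
    assert isinstance(wrapped.values, np.ndarray)
```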
Could you also please update the PR name with a brief description of what these changes do? e.g. "support non-ndarray computations, cache unit calls, add slots to DataArray"? If the name is too long, these can be put into separate PRs. The unit caching for example is mergeable as-is.
I think the new name is not too long ;)
A few disparate minor changes.
- Support storages that expose the ndarray API (in the specific case: GT4Py) by (i) avoiding axes transposition unless necessary, (ii) accessing the data rather than the values attribute of DataArray, and (iii) avoiding explicit coercion.
- Cache calls to UnitRegistry to improve performance.
- Add the __slots__ class attribute to DataArray to suppress a warning.
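The unit-call caching itself is not shown in this conversation, so the following is only a generic sketch of the idea using `functools.lru_cache`. The function `conversion_factor` and its lookup table are hypothetical stand-ins for an expensive unit-registry query, not the code in this PR.

```python
from functools import lru_cache

# Hypothetical table standing in for an expensive unit-registry lookup.
_TO_SI = {"m": 1.0, "km": 1000.0, "cm": 0.01}

lookup_count = 0  # counts how often the "expensive" body actually runs


@lru_cache(maxsize=None)
def conversion_factor(from_units, to_units):
    """Return the factor that converts from_units into to_units."""
    global lookup_count
    lookup_count += 1
    return _TO_SI[from_units] / _TO_SI[to_units]


# Repeated calls with the same arguments are served from the cache.
for _ in range(3):
    factor = conversion_factor("km", "m")
```

Because unit strings are hashable and a given pair always yields the same factor, memoizing on the argument pair is safe in this sketch; the function body runs once and the two later calls are cache hits.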