-
Notifications
You must be signed in to change notification settings - Fork 33
Use UVParameter forms in UVData.__add__ and UVData.fast_concat to prevent missing parameters. #1606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e0ef552 to
829fab9
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1606 +/- ##
==========================================
- Coverage 99.93% 99.93% -0.01%
==========================================
Files 67 67
Lines 22688 22670 -18
==========================================
- Hits 22674 22656 -18
Misses 14 14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@kartographer this is a slightly different approach than you took in the |
f6b1427 to
33b50d9
Compare
89f562f to
c604805
Compare
945b3a6 to
34449d2
Compare
a1f87dc to
26ed06f
Compare
steven-murray
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bhazelton and sorry for the very slow review! I have a few comments. Mostly I think we can make some of these functions a bit more clear in what they are doing. They seem like they will become quite central in how UVxxx objects operate, so we should make them as clear as possible for our own future reference.
It might also be useful to do a quick profiling for these functions, as I expect they might be bottleneck functions in some applications (especially in terms of memory)
src/pyuvdata/utils/tools.py
Outdated
| return np.asarray(inds) | ||
|
|
||
|
|
||
| def flt_ind_str_arr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be flt_int_str_arr? And can we just make it float_int_to_str_array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| flt_first: bool = True, | ||
| ) -> StrArray: | ||
| """ | ||
| Create a string array built from float and integer arrays for matching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have an example like "E.g. for float 3.7 and int 3 this will create an entry in the output array as '3.7_3'"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
src/pyuvdata/uvbase.py
Outdated
|
|
||
| def _get_param_axis(self, axis_name: str, single_named_axis: bool = False): | ||
| """ | ||
| Get a mapping of parameters that have a given axis to the axis number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use "attributes" instead of "parameters" here? I was confused what this function was doing until I realized you were talking about attributes.
Probably this will be even more easily cleared up by including a single simple example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a problem here in that the attributes are UVParameter objects. I'll work on the wording and an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I updated the name and the docstring. Let me know if it's more clear.
src/pyuvdata/uvbase.py
Outdated
| self, | ||
| other, | ||
| axis_name: str, | ||
| other_inds: IntArray, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this also have a default of None, corresponding to including all entries from the other object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, that is supported by get_from_form which this is passed to, and I'm willing to add it. But at least at the moment, that's not something that would actually get used because this gets called after we have checked to see if there are duplicates so we know which indices we want. It could possibly be a performance improvement path in the case that there's no down selection, so I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| new_array = np.concatenate( | ||
| [ | ||
| getattr(self, param), | ||
| getattr(other, "_" + param).get_from_form(other_form_dict), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we couldn't just use np.take(getattr(other, param), other_inds, axis) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. I didn't write get_from_form, it looks like it tries to do a slicing if possible and if it can't it just uses np.take, so I think it's already somewhat optimized? But probably worth doing some performance testing on.
| if param not in multi_axis_params: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not also padding single-axis objects? This is confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is confusing! Among other things, this is handling the case where we have data divided into chunks along more than one axis and we're trying to combine it. If you combine first along one axis and then start adding in the first chunk along the next axis you have to pad out the multi-dimensional arrays with zeros (and flags) for the corners you don't have actual information yet. That doesn't come up for single axis parameters.
Imagine a 2D array split into quadrants. The first two (along any axis) add fine. When you add in the third, you need to pad the fourth quadrant with zeros (and flags). Then when we add in the fourth, if all the data are zero and flagged we allow it to overwrite that quadrant. Does that make sense? Happy to add some more comments or docstrings if you think it'd be helpful.
| order_dict : dict | ||
| dict giving the final sort indices for each axis (keys are axes, values | ||
| are index arrays for sorting). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this sorting occurs in multiple functions, might it be better to just have a standalone "arbitrary axis sorting helper"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, possibly. I think the reason I did it this way is that each UVParmeter is only affected by one of these functions, so the sorting is currently only done only once per parameter. It's just done in a different function depending on whether it's a single axis or multiaxis array. For the single axis array, the sorting can be done at the same time as the indexing with the np.take call, so I think that's why I did it this way. But open to refactoring if it's not a performance hit.
26ed06f to
347cfdd
Compare
Description
This is following some of the ideas in UVBase._select_along_axis. This could probably be generalized to UVBase, but I wanted to get UVData's version done to work out the ideas before moving it up UVBase. And I wanted to at least get a draft up for @rlbyrne to try out.
Motivation and Context
fixes #1595
Types of changes
Checklist:
Bug fix checklist: