feat!: Implementation project for decoupling resampling parameters and symmetry detection fix #4
base: main
Conversation
Note: benchmark comparisons added to the PR description.

Hi, I'm going to refrain from reviewing until the PR is made to In general, multiple PRs are easier to review and merge, so to the question of multiple PRs vs one PR, I'd prefer multiple PRs.

UPDATE: After discussions with the rest of the research team, I'm going to find a better way to address PR#2 that involves modifying the GSG to handle empty hypothesis spaces instead of setting a minimum hypothesis space size. I will mark this as draft.

@vkakerbeck, I've updated the PR description now. You already approved the work from PR#5. It would be great if you could take a look at PR#1 and PR#3, to review the research aspects, when you get a chance. No rush since we've deprioritized integrating this work in favor of preparation for the mini focus week.

NOTE: I'm working on resolving some merge conflicts with

lmk if you need help resolving any of these conflicts (since I merged that PR)

@vkakerbeck, the PR is now ready for your review.
vkakerbeck left a comment:
Nice! Overall looks great. I just left a few clarifying comments. A few thoughts on a high level:
- When you check whether all still works in the hierarchical setup, make sure to also run the infer_comp_lvl1_with_comp_models_and_resampling experiment. This one is not part of the benchmark table reported, but it's the only hierarchical experiment that actually uses the resampling.
- I am happy with the solution you implemented here, but to simplify things in the future it might be worth considering whether we can use hypothesis space size + hypothesis age to check for symmetry instead of recalculating the indices. Especially since sampling new hypotheses towards the end would also throw off the hypothesis check that we currently do.
- Do you plan on making this the new setup for benchmarks? The accuracy gains are impressive, but with the large runtime increases it may not be practical for now, until we actually keep the hypothesis space smaller.
    self.last_possible_hypotheses = (
        self.hypotheses_updater.remap_hypotheses_ids_to_present(
            self.last_possible_hypotheses
        )
    )
I'm not sure I understand why this needs to be assigned to self.last_possible_hypotheses. It somehow looks wrong to set self.last_possible_hypotheses twice in this if branch. Would it make more sense to do last_possible_hypotheses_remapped = self.hypotheses_updater.remap_hypotheses_ids_to_present(self.last_possible_hypotheses) and then pass last_possible_hypotheses_remapped into the _check_for_symmetry function?
Yeah, sounds good. The arguments are updated in a12335710366 and type hinting for the function is added in 65119deed73e
    )
    if increment_evidence:
        previous_hyps = set(self.last_possible_hypotheses)
    if increment_evidence and self.last_possible_hypotheses.graph_id == object_id:
It seems like this check should be happening earlier. If the last mlh is not the same as the current one we shouldn't even go down the symmetry check path and call this function.
I think the earliest we could do this check is in get_unique_pose_if_available:

    # Check for symmetry
    last_possible_hypotheses_remapped = (
        self.hypotheses_updater.remap_hypotheses_ids_to_present(
            self.last_possible_hypotheses
        )
    )
    if last_possible_hypotheses_remapped.graph_id == object_id:
        symmetry_detected = self._check_for_symmetry(
            object_id=object_id,
            last_possible_object_hypotheses=last_possible_hypotheses_remapped,
            possible_object_hypotheses_ids=possible_object_hypotheses_ids,
            # Don't increment symmetry counter if LM didn't process observation
            increment_evidence=self.buffer.get_last_obs_processed(),
        )
    else:
        symmetry_detected = False
    self.last_possible_hypotheses = ConsistentHypothesesIds(
        hypotheses_ids=possible_object_hypotheses_ids,
        channel_sizes=self.channel_hypothesis_mapping[object_id].channel_sizes,
        graph_id=object_id,
    )

But this would be equivalent to just returning early at the beginning of the _check_for_symmetry function:

    if (
        last_possible_object_hypotheses is None
        or last_possible_object_hypotheses.graph_id != object_id
    ):
        return False

I updated this in da1dff650e7f, let me know if this is not what you meant.
I was thinking of checking if self.previous_mlh["graph_id"] == self.current_mlh["graph_id"] in get_unique_pose_if_available before calling self._check_for_symmetry. That way we don't have to pass object_id to the _check_for_symmetry function and it seems clearer to read. But maybe I am missing something.
Anyways, this is more of an implementation note and not a functional change.
    resampling_multiplier: Determines the number of the hypotheses to resample
        as a multiplier of the object graph nodes. Value of 0.0 results in no
        resampling. Value can be greater than 1 but not to exceed the
        `num_hyps_per_node` of the current step. Defaults to 0.1.
I'm not sure I understand the last part. How is num_hyps_per_node determined and why does it vary on a step by step basis (+how do you then make sure your resampling_multiplier is always below it?)
The num_hyps_per_node is a variable calculated in this function. If the sampling is informed by the observation, it can be either 2 (when pose is defined) or umbilical_num_poses (when pose is undefined). So it can vary based on the current observation on a step-by-step basis. We don't want the resampling_multiplier to exceed this value because that is the total number of informed hypotheses available to us at any step. This limit is set here.
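For illustration, a minimal sketch of that limit under assumed names; pose_defined, the default for umbilical_num_poses, and clamp_resampling_multiplier are stand-ins, and the real calculation and clamping live in the updater code linked above:

```python
# Hypothetical sketch, not the PR's implementation.
def num_hyps_per_node(pose_defined: bool, umbilical_num_poses: int = 8) -> int:
    # Two informed rotations when the observation fully defines the pose,
    # otherwise one hypothesis per pose sampled around the undefined axis.
    return 2 if pose_defined else umbilical_num_poses


def clamp_resampling_multiplier(multiplier: float, hyps_per_node: int) -> float:
    # The multiplier may exceed 1 but never the number of informed hypotheses
    # available per node at the current step.
    return min(multiplier, float(hyps_per_node))


print(clamp_resampling_multiplier(3.0, num_hyps_per_node(pose_defined=True)))   # 2.0
print(clamp_resampling_multiplier(3.0, num_hyps_per_node(pose_defined=False)))  # 3.0
```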
        resampling. Value can be greater than 1 but not to exceed the
        `num_hyps_per_node` of the current step. Defaults to 0.1.
    evidence_slope_threshold: Hypotheses below this threshold are deleted.
        Defaults to 0.0.
It could be helpful to add an expected range here as well
Added in ab2c18fbf27a
Wouldn't the default range be [-1,2] in this case?
Yes, you're right. Fixed in 3d1fcee85dc5
    for input_channel in input_channels_to_use:
        # Calculate sample count for each type
        existing_count, informed_count = self._sample_count(
        hypotheses_selection, informed_count = self._sample_count(
What is hypotheses_selection?
See comment here.
    # Should we remove this now that we are resampling? We can sample the
    # same number of hypotheses during initialization as in every other step.
That's a good point, I think it could be worth trying (and would also move us more towards an "episode-free" world)
Ok nice, yeah I'm testing this in the follow-up intelligent resizing work.
    )
    # Calculate the total number of informed hypotheses to be resampled
    new_informed = round(graph_num_points * resampling_multiplier)
    new_informed -= new_informed % num_hyps_per_node
Does this line just make sure new_informed is divisible by num_hyps_per_node?
Yes, exactly. This is useful for the sample_informed function. To sample efficiently, we first get the nodes with the highest evidence; the number of nodes sampled will be the requested number of hypotheses divided by num_hyps_per_node. After getting the nodes, we then tile to get num_hyps_per_node rotations for each node.
So making sure new_informed is divisible by num_hyps_per_node just makes sure we get exactly the number of hyps we ask for.
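A toy walk-through of that rounding with made-up numbers (none of these values come from the PR):

```python
# Round the requested count down to a multiple of num_hyps_per_node so that
# sampling whole nodes and tiling num_hyps_per_node rotations per node yields
# exactly the requested number of hypotheses. All values below are invented.
graph_num_points = 123        # hypothetical size of the object graph
resampling_multiplier = 0.1   # default from the docstring above
num_hyps_per_node = 8         # e.g. an undefined pose with 8 umbilical poses

new_informed = round(graph_num_points * resampling_multiplier)  # 12
new_informed -= new_informed % num_hyps_per_node                # 8
nodes_to_sample = new_informed // num_hyps_per_node             # 1 node, 8 rotations each
print(new_informed, nodes_to_sample)
```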
okay cool, thanks! For readability it may help to add a short comment here; it took me a while to parse this line
Sure, added in 97a92fac867b
    # Returns a selection of hypotheses to maintain/delete
    hypotheses_selection = tracker.select_hypotheses(
        slope_threshold=self.evidence_slope_threshold, channel=input_channel
    )
I'm still a bit confused by this hypotheses_selection. Is this a list of indices? What does the comment mean? Is it hypotheses to maintain or to delete? Could we call this hypothesis_ids_to_delete? Or add a bit more explanation in the comment on what this variable contains?
See comment here.
    def __init__(self, window_size: int = 3, min_age: int = 5) -> None:
    def __init__(self, window_size: int = 10, min_age: int = 5) -> None:
Just curious about the reason for this. Did the window size of 3 not work well?
Yeah, I found this out empirically a while back. A window size of 3 is too small and ends up deleting good hypotheses in experiments with noise. A value of 10 or 12 works much better for deletion.
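As an aside, a purely illustrative (and assumed) picture of why a short window is fragile; this is not the EvidenceSlopeTracker code, just a least-squares slope fit over noisy, slowly rising evidence values:

```python
import numpy as np

# Slowly rising evidence with additive noise (all values invented).
rng = np.random.default_rng(0)
evidence = np.cumsum(np.full(20, 0.05)) + rng.normal(0.0, 0.1, 20)


def slope(values: np.ndarray) -> float:
    # Least-squares slope of the values against the step index.
    return float(np.polyfit(np.arange(len(values)), values, 1)[0])


print(slope(evidence[-3:]))   # slope over only the last 3 steps: heavily affected by noise
print(slope(evidence[-10:]))  # slope over the last 10 steps: averages out more of the noise
```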
        return HypothesesSelection(maintain_mask)

    class HypothesesSelection:
okay reading this now I understand what happens in the other parts of the code. This may just be personal preference, so feel free to leave it as is and discuss with engineering when integrating, but to me this is a lot more confusing than just explicitly defining masks in the lines where you need them.
It seemed useful to have one object that can provide us with maintained or deleted hypotheses since we need both at different parts of the code and I didn't want to keep passing two masks and sets of ids. But it could be overkill that affects the readability of the code. I'll bring this up during the IP.
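For illustration only, a minimal sketch of what such a container could look like; the attribute and method names below are assumptions, not the PR's actual HypothesesSelection API:

```python
import numpy as np


class HypothesesSelection:
    # Hypothetical sketch: wrap the maintain mask once so both the maintained
    # and the deleted side can be queried without passing two masks and two
    # sets of ids around.
    def __init__(self, maintain_mask: np.ndarray) -> None:
        self.maintain_mask = np.asarray(maintain_mask, dtype=bool)

    @property
    def delete_mask(self) -> np.ndarray:
        return ~self.maintain_mask

    @property
    def maintained_ids(self) -> np.ndarray:
        return np.flatnonzero(self.maintain_mask)

    @property
    def deleted_ids(self) -> np.ndarray:
        return np.flatnonzero(self.delete_mask)


selection = HypothesesSelection(np.array([True, False, True, True]))
print(selection.maintained_ids)  # [0 2 3]
print(selection.deleted_ids)     # [1]
```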
okay sounds good. I found it a bit confusing to read and would find mask and not mask easier to follow. But like you said, something you can figure out during the IP
@vkakerbeck, thanks for the review! I've addressed your comments.

I ran all the compositional benchmarks and added the comparisons to tables in this PR description. All of them remained the same, except for

I'm not sure I understand what you mean, maybe we can discuss this more in our 1:1.

Yes, I agree. We won't make it the new default just yet. In general, I think
Great! Some nice improvements on the compositional + resampling experiment :) Re. using hyp space size + age: I was just wondering if our definition of symmetry could change with this new resampling approach. I haven't thought it all the way through yet and am not sure we are quite there yet, but here is what I was thinking: Right now we define symmetry as having (roughly) the same set of possible hypotheses for a certain number of steps. Currently, this is measured by looking at the hypothesis IDs of the hypotheses above a threshold. However, with the resampling we will not have the threshold anymore and instead all hypotheses that exist are considered possible (if I understand it right? I think you set x_percent_threshold to all now). So that means that if the hypothesis space size remains the same for several steps (i.e. no hypotheses added or deleted), we have a symmetric hyp space. It gets a bit more complicated since we could be adding the same amount as we delete, so we would need to look at the age in addition. So something like checking that the hypothesis space size stays constant and that the minimum hypothesis age exceeds the required number of symmetry steps. Not sure I am thinking of all the edge cases though. Happy to talk more about this in our meeting.
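A rough sketch of that idea, purely for illustration; every name here (space_sizes, hypothesis_ages, required_symmetry_steps) is invented and not taken from the codebase:

```python
# Hypothetical symmetry check based on hypothesis-space size and hypothesis age
# rather than recomputed hypothesis ids. Illustrative only.
def hypothesis_space_is_symmetric(
    space_sizes: list[int],       # hypothesis-space size recorded at each step
    hypothesis_ages: list[int],   # age (in steps) of every current hypothesis
    required_symmetry_steps: int = 5,
) -> bool:
    if len(space_sizes) < required_symmetry_steps or not hypothesis_ages:
        return False
    recent_sizes = space_sizes[-required_symmetry_steps:]
    size_unchanged = len(set(recent_sizes)) == 1
    # Guard against add-and-delete churn that keeps the size constant: every
    # hypothesis must predate the window, i.e. nothing was resampled recently.
    all_old_enough = min(hypothesis_ages) >= required_symmetry_steps
    return size_unchanged and all_old_enough


print(hypothesis_space_is_symmetric([40, 40, 40, 40, 40], [7, 9, 12]))  # True
print(hypothesis_space_is_symmetric([40, 40, 40, 40, 40], [2, 9, 12]))  # False
```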
Thanks @vkakerbeck, yes, I think this would be much simpler! NOTE: while investigating why prediction error wasn't being reported, I found a bug that led to errors in many episodes in the resampling updater with the compositional resampling experiment. So, only a subset of the episodes ran, and that's why we had this very high accuracy. In a82dd2ae549b, I had changed the condition to initialize the hypothesis space from infer_comp_lvl1_with_comp_models_and_resampling
Good catch!
This implementation project bundles three focused changes that implement decoupling resampling parameters, maintaining a minimum hypothesis space size, and fixing symmetry detection for terminal states. For easier reviewing, I suggest taking a look at the separate PRs merged into dev first:

PR#1: Decoupling resampling parameters
- Adds resampling_multiplier and evidence_slope_threshold for controlling the sampling and deletion behavior.
- Adds a HypothesesSelection container class and EvidenceSlopeTracker.select_hypotheses that returns a selection of hypotheses.
- Updates ResamplingHypothesesUpdater._sample_count to return a HypothesesSelection.

PR#2: Minimum maintained hypotheses
- Extends EvidenceSlopeTracker.select_hypotheses with the argument min_maintained_hyps.
- Ensures we never drop below a requested minimum during deletion, as required by GSG.
- Adds a new unit test for this change.

PR#3: Symmetry detection fix
- Adds HypothesesUpdater.remap_hypotheses_ids_to_present to the protocol.
- ConsistentHypothesesIds contains the ids to be mapped across timesteps.

All 3 PRs have been approved by @nielsleadholm as part of the research prototype. The next step is for @vkakerbeck (FYI @scottcanoe / @hlee9212) to take a look at the research aspects.

Afterwards I will work with the engineering team to merge this into tbp.monty. @tristanls-tbp or @jeremyshoemaker, feel free to start taking a look to decide if you want to split this up into multiple IP PRs or just a single one with all the changes. I've kept this dev branch in sync with tbp.monty:main for a smoother merge.

Benchmarks

All runs reported below are on wandb tagged under "feat.dynamic_resizing:PR#4".

The benchmarks below are generated by changing the default_evidence_lm_config to use:
- hypotheses_updater_class = ResamplingHypothesesUpdater
- evidence_threshold_config = "all"

base_config_10distinctobj_dist_agent
base_config_10distinctobj_surf_agent
randrot_noise_10distinctobj_dist_agent
randrot_noise_10distinctobj_dist_on_distm
randrot_noise_10distinctobj_surf_agent
randrot_10distinctobj_surf_agent
randrot_noise_10distinctobj_5lms_dist_agent
base_10simobj_surf_agent
randrot_noise_10simobj_dist_agent
randrot_noise_10simobj_surf_agent
randomrot_rawnoise_10distinctobj_surf_agent
base_10multi_distinctobj_dist_agent
base_77obj_dist_agent
base_77obj_surf_agent
randrot_noise_77obj_dist_agent
randrot_noise_77obj_surf_agent
randrot_noise_77obj_5lms_dist_agent
unsupervised_inference_distinctobj_dist_agent
unsupervised_inference_distinctobj_surf_agent
infer_comp_lvl1_with_monolithic_models
infer_comp_lvl1_with_comp_models
infer_comp_lvl2_with_comp_models
infer_comp_lvl3_with_comp_models
infer_comp_lvl4_with_comp_models
infer_comp_lvl1_with_comp_models_and_resampling