feat!: Implementation project for decoupling resampling parameters and symmetry detection fix #4
base: main
Conversation
Note: benchmark comparisons added to the PR description.

Hi, I'm going to refrain from reviewing until the PR is made to In general, multiple PRs are easier to review and merge, so to the question of multiple PRs vs one PR, I'd prefer multiple PRs.

UPDATE: After discussions with the rest of the research team, I'm going to find a better way to address PR#2 that involves modifying the GSG to handle empty hypothesis spaces instead of setting a minimum hypothesis space size. I will mark this as draft.

@vkakerbeck, I've updated the PR description now. You already approved the work from PR#5. It would be great if you could take a look at PR#1 and PR#3, to review the research aspects, when you get a chance. No rush since we've deprioritized integrating this work in favor of preparation for the mini focus week.

NOTE: I'm working on resolving some merge conflicts with

lmk if you need help resolving any of these conflicts (since I merged that PR)

@vkakerbeck, the PR is now ready for your review.
vkakerbeck left a comment:
Nice! Overall looks great. I just left a few clarifying comments. A few thoughts on a high level:
- When you check whether all still works in the hierarchical setup, make sure to also run the infer_comp_lvl1_with_comp_models_and_resampling experiment. This one is not part of the benchmark table reported, but it's the only hierarchical experiment that actually uses the resampling.
- I am happy with the solution you implemented here, but to simplify things in the future it might be worth considering whether we can use hypothesis space size + hypothesis age to check for symmetry instead of recalculating the indices. Especially since sampling new hypotheses towards the end would also throw off the hypothesis check that we currently do.
- Do you plan on making this the new setup for benchmarks? The accuracy gains are impressive, but with the large runtime increases it may not be practical for now, until we actually keep the hypothesis space smaller.
    self.last_possible_hypotheses = (
        self.hypotheses_updater.remap_hypotheses_ids_to_present(
            self.last_possible_hypotheses
        )
    )
I'm not sure I understand why this needs to be assigned to self.last_possible_hypotheses. It somehow looks wrong to set self.last_possible_hypotheses twice in this if branch. Would it make more sense to do last_possible_hypotheses_remapped = self.hypotheses_updater.remap_hypotheses_ids_to_present(self.last_possible_hypotheses) and then pass last_possible_hypotheses_remapped into the _check_for_symmetry function?
Yeah, sounds good. The arguments are updated in a12335710366 and type hinting for the function is added in 65119deed73e
    )
    if increment_evidence:
        previous_hyps = set(self.last_possible_hypotheses)
    if increment_evidence and self.last_possible_hypotheses.graph_id == object_id:
It seems like this check should be happening earlier. If the last mlh is not the same as the current one we shouldn't even go down the symmetry check path and call this function.
I think the earliest we could do this check is in get_unique_pose_if_available:

    # Check for symmetry
    last_possible_hypotheses_remapped = (
        self.hypotheses_updater.remap_hypotheses_ids_to_present(
            self.last_possible_hypotheses
        )
    )
    if last_possible_hypotheses_remapped.graph_id == object_id:
        symmetry_detected = self._check_for_symmetry(
            object_id=object_id,
            last_possible_object_hypotheses=last_possible_hypotheses_remapped,
            possible_object_hypotheses_ids=possible_object_hypotheses_ids,
            # Don't increment symmetry counter if LM didn't process observation
            increment_evidence=self.buffer.get_last_obs_processed(),
        )
    else:
        symmetry_detected = False
    self.last_possible_hypotheses = ConsistentHypothesesIds(
        hypotheses_ids=possible_object_hypotheses_ids,
        channel_sizes=self.channel_hypothesis_mapping[object_id].channel_sizes,
        graph_id=object_id,
    )

But this would be equivalent to just returning early at the beginning of the _check_for_symmetry function:

    if (
        last_possible_object_hypotheses is None
        or last_possible_object_hypotheses.graph_id != object_id
    ):
        return False

I updated this in da1dff650e7f, let me know if this is not what you meant.
I was thinking of checking if self.previous_mlh["graph_id"] == self.current_mlh["graph_id"] in get_unique_pose_if_available before calling self._check_for_symmetry. That way we don't have to pass object_id to the _check_for_symmetry function and it seems clearer to read. But maybe I am missing something.
Anyways, this is more of an implementation note and not a functional change.
    resampling_multiplier: Determines the number of the hypotheses to resample
        as a multiplier of the object graph nodes. Value of 0.0 results in no
        resampling. Value can be greater than 1 but not to exceed the
        `num_hyps_per_node` of the current step. Defaults to 0.1.
I'm not sure I understand the last part. How is num_hyps_per_node determined and why does it vary on a step by step basis (+how do you then make sure your resampling_multiplier is always below it?)
The num_hyps_per_node is a variable calculated in this function. If the sampling is informed by the observation, it can be either 2 (when pose is defined) or umbilical_num_poses (when pose is undefined). So it can vary based on the current observation on a step-by-step basis. We don't want the resampling_multiplier to exceed this value because that is the total number of informed hypotheses available to us at any step. This limit is set here.
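For illustration, a minimal sketch of that limit under assumed names; pose_defined, the default for umbilical_num_poses, and clamp_resampling_multiplier are stand-ins, and the real calculation and clamping live in the updater code linked above:

```python
# Hypothetical sketch, not the PR's implementation.
def num_hyps_per_node(pose_defined: bool, umbilical_num_poses: int = 8) -> int:
    # Two informed rotations when the observation fully defines the pose,
    # otherwise one hypothesis per pose sampled around the undefined axis.
    return 2 if pose_defined else umbilical_num_poses


def clamp_resampling_multiplier(multiplier: float, hyps_per_node: int) -> float:
    # The multiplier may exceed 1 but never the number of informed hypotheses
    # available per node at the current step.
    return min(multiplier, float(hyps_per_node))


print(clamp_resampling_multiplier(3.0, num_hyps_per_node(pose_defined=True)))   # 2.0
print(clamp_resampling_multiplier(3.0, num_hyps_per_node(pose_defined=False)))  # 3.0
```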
        resampling. Value can be greater than 1 but not to exceed the
        `num_hyps_per_node` of the current step. Defaults to 0.1.
    evidence_slope_threshold: Hypotheses below this threshold are deleted.
        Defaults to 0.0.
It could be helpful to add an expected range here as well
Added in ab2c18fbf27a
Wouldn't the default range be [-1,2] in this case?
Yes, you're right. Fixed in 3d1fcee85dc5
    for input_channel in input_channels_to_use:
        # Calculate sample count for each type
        existing_count, informed_count = self._sample_count(
        hypotheses_selection, informed_count = self._sample_count(
What is hypotheses_selection?
See comment here.
    # Should we remove this now that we are resampling? We can sample the
    # same number of hypotheses during initialization as in every other step.
That's a good point, I think it could be worth trying (and would also move us more towards an "episode-free" world)
Ok nice, yeah I'm testing this in the follow-up intelligent resizing work.
    )
    # Calculate the total number of informed hypotheses to be resampled
    new_informed = round(graph_num_points * resampling_multiplier)
    new_informed -= new_informed % num_hyps_per_node
Does this line just make sure new_informed is divisible by num_hyps_per_node?
Yes, exactly. This is useful for the sample_informed function. To sample efficiently, we first get the nodes with the highest evidence; the number of nodes sampled will be the requested number of hypotheses divided by num_hyps_per_node. After getting the nodes, we then tile to get num_hyps_per_node rotations for each node.
So making sure new_informed is divisible by num_hyps_per_node just makes sure we get exactly the number of hyps we ask for.
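A toy walk-through of that rounding with made-up numbers (none of these values come from the PR):

```python
# Round the requested count down to a multiple of num_hyps_per_node so that
# sampling whole nodes and tiling num_hyps_per_node rotations per node yields
# exactly the requested number of hypotheses. All values below are invented.
graph_num_points = 123        # hypothetical size of the object graph
resampling_multiplier = 0.1   # default from the docstring above
num_hyps_per_node = 8         # e.g. an undefined pose with 8 umbilical poses

new_informed = round(graph_num_points * resampling_multiplier)  # 12
new_informed -= new_informed % num_hyps_per_node                # 8
nodes_to_sample = new_informed // num_hyps_per_node             # 1 node, 8 rotations each
print(new_informed, nodes_to_sample)
```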
okay cool, thanks! For readability it may help to add a short comment here; it took me a while to parse this line
Sure, added in 97a92fac867b
    # Returns a selection of hypotheses to maintain/delete
    hypotheses_selection = tracker.select_hypotheses(
        slope_threshold=self.evidence_slope_threshold, channel=input_channel
    )
I'm still a bit confused by this hypotheses_selection. Is this a list of indices? What does the comment mean? Is it hypotheses to maintain or to delete? Could we call this hypothesis_ids_to_delete? Or add a bit more explanation in the comment on what this variable contains?
See comment here.
    def __init__(self, window_size: int = 3, min_age: int = 5) -> None:
    def __init__(self, window_size: int = 10, min_age: int = 5) -> None:
Just curious about the reason for this. Did the window size of 3 not work well?
Yeah, I found this out empirically a while back. A window size of 3 is too small and ends up deleting good hypotheses in experiments with noise. A value of 10 or 12 works much better for deletion.
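As an aside, a purely illustrative (and assumed) picture of why a short window is fragile; this is not the EvidenceSlopeTracker code, just a least-squares slope fit over noisy, slowly rising evidence values:

```python
import numpy as np

# Slowly rising evidence with additive noise (all values invented).
rng = np.random.default_rng(0)
evidence = np.cumsum(np.full(20, 0.05)) + rng.normal(0.0, 0.1, 20)


def slope(values: np.ndarray) -> float:
    # Least-squares slope of the values against the step index.
    return float(np.polyfit(np.arange(len(values)), values, 1)[0])


print(slope(evidence[-3:]))   # slope over only the last 3 steps: heavily affected by noise
print(slope(evidence[-10:]))  # slope over the last 10 steps: averages out more of the noise
```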
        return HypothesesSelection(maintain_mask)

    class HypothesesSelection:
okay reading this now I understand what happens in the other parts of the code. This may just be personal preference, so feel free to leave it as is and discuss with engineering when integrating, but to me this is a lot more confusing than just explicitly defining masks in the lines where you need them.
It seemed useful to have one object that can provide us with maintained or deleted hypotheses since we need both at different parts of the code and I didn't want to keep passing two masks and sets of ids. But it could be overkill that affects the readability of the code. I'll bring this up during the IP.
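For illustration only, a minimal sketch of what such a container could look like; the attribute and method names below are assumptions, not the PR's actual HypothesesSelection API:

```python
import numpy as np


class HypothesesSelection:
    # Hypothetical sketch: wrap the maintain mask once so both the maintained
    # and the deleted side can be queried without passing two masks and two
    # sets of ids around.
    def __init__(self, maintain_mask: np.ndarray) -> None:
        self.maintain_mask = np.asarray(maintain_mask, dtype=bool)

    @property
    def delete_mask(self) -> np.ndarray:
        return ~self.maintain_mask

    @property
    def maintained_ids(self) -> np.ndarray:
        return np.flatnonzero(self.maintain_mask)

    @property
    def deleted_ids(self) -> np.ndarray:
        return np.flatnonzero(self.delete_mask)


selection = HypothesesSelection(np.array([True, False, True, True]))
print(selection.maintained_ids)  # [0 2 3]
print(selection.deleted_ids)     # [1]
```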
okay sounds good. I found it a bit confusing to read and would find mask and not mask easier to follow. But like you said, something you can figure out during the IP
@vkakerbeck, thanks for the review! I've addressed your comments.

I ran all the compositional benchmarks and added the comparisons to tables in this PR description. All of them remained the same, except for

I'm not sure I understand what you mean, maybe we can discuss this more in our 1:1.

Yes, I agree. We won't make it the new default just yet. In general, I think
Great! Some nice improvements on the compositional + resampling experiment :) Re. using hyp space size + age: I was just wondering if our definition of symmetry could change with this new resampling approach. I haven't thought it all the way through yet and am not sure we are quite there yet, but here is what I was thinking: Right now we define symmetry as having (roughly) the same set of possible hypotheses for a certain number of steps. Currently, this is measured by looking at the hypothesis IDs of the hypotheses above a threshold. However, with the resampling we will not have the threshold anymore and instead all hypotheses that exist are considered possible (if I understand it right? I think you set x_percent_threshold to all now). So that means that if the hypothesis space size remains the same for several steps (i.e. no hypotheses added or deleted), we have a symmetric hyp space. It gets a bit more complicated since we could be adding the same amount as we delete, so we would need to look at the age in addition. So something like checking that the hypothesis space size stays constant and that the minimum hypothesis age exceeds the required number of symmetry steps. Not sure I am thinking of all the edge cases though. Happy to talk more about this in our meeting.
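A rough sketch of that idea, purely for illustration; every name here (space_sizes, hypothesis_ages, required_symmetry_steps) is invented and not taken from the codebase:

```python
# Hypothetical symmetry check based on hypothesis-space size and hypothesis age
# rather than recomputed hypothesis ids. Illustrative only.
def hypothesis_space_is_symmetric(
    space_sizes: list[int],       # hypothesis-space size recorded at each step
    hypothesis_ages: list[int],   # age (in steps) of every current hypothesis
    required_symmetry_steps: int = 5,
) -> bool:
    if len(space_sizes) < required_symmetry_steps or not hypothesis_ages:
        return False
    recent_sizes = space_sizes[-required_symmetry_steps:]
    size_unchanged = len(set(recent_sizes)) == 1
    # Guard against add-and-delete churn that keeps the size constant: every
    # hypothesis must predate the window, i.e. nothing was resampled recently.
    all_old_enough = min(hypothesis_ages) >= required_symmetry_steps
    return size_unchanged and all_old_enough


print(hypothesis_space_is_symmetric([40, 40, 40, 40, 40], [7, 9, 12]))  # True
print(hypothesis_space_is_symmetric([40, 40, 40, 40, 40], [2, 9, 12]))  # False
```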
Thanks @vkakerbeck, yes, I think this would be much simpler! NOTE: while investigating why prediction error wasn't being reported, I found a bug that led to errors in many episodes in the resampling updater with the compositional resampling experiment. So, only a subset of the episodes ran, and that's why we had this very high accuracy. In a82dd2ae549b, I had changed the condition to initialize the hypothesis space from infer_comp_lvl1_with_comp_models_and_resampling
Good catch!
This implementation project bundles three focused changes that implement decoupling resampling parameters, maintaining a minimum hypothesis space size, and fixing symmetry detection for terminal states. For easier reviewing, I suggest taking a look at the separate PRs merged into dev first:

PR#1: Decoupling resampling parameters
- Adds resampling_multiplier and evidence_slope_threshold for controlling the sampling and deletion behavior.
- Adds a HypothesesSelection container class and EvidenceSlopeTracker.select_hypotheses that returns a selection of hypotheses.
- Updates ResamplingHypothesesUpdater._sample_count to return a HypothesesSelection.

PR#2: Minimum maintained hypotheses
- Extends EvidenceSlopeTracker.select_hypotheses with the argument min_maintained_hyps.
- Ensures we never drop below a requested minimum during deletion, as required by GSG.
- Adds a new unit test for this change.

PR#3: Symmetry detection fix
- Adds HypothesesUpdater.remap_hypotheses_ids_to_present to the protocol.
- ConsistentHypothesesIds contains the ids to be mapped across timesteps.

All 3 PRs have been approved by @nielsleadholm as part of the research prototype. The next step is for @vkakerbeck (FYI @scottcanoe / @hlee9212) to take a look at the research aspects.

Afterwards I will work with the engineering team to merge this into tbp.monty. @tristanls-tbp or @jeremyshoemaker, feel free to start taking a look to decide if you want to split this up into multiple IP PRs or just a single one with all the changes. I've kept this dev branch in sync with tbp.monty:main for a smoother merge.

Benchmarks

All runs reported below are on wandb tagged under "feat.dynamic_resizing:PR#4".

The benchmarks below are generated by changing the default_evidence_lm_config to use:
- hypotheses_updater_class = ResamplingHypothesesUpdater
- evidence_threshold_config = "all"

base_config_10distinctobj_dist_agent
base_config_10distinctobj_surf_agent
randrot_noise_10distinctobj_dist_agent
randrot_noise_10distinctobj_dist_on_distm
randrot_noise_10distinctobj_surf_agent
randrot_10distinctobj_surf_agent
randrot_noise_10distinctobj_5lms_dist_agent
base_10simobj_surf_agent
randrot_noise_10simobj_dist_agent
randrot_noise_10simobj_surf_agent
randomrot_rawnoise_10distinctobj_surf_agent
base_10multi_distinctobj_dist_agent
base_77obj_dist_agent
base_77obj_surf_agent
randrot_noise_77obj_dist_agent
randrot_noise_77obj_surf_agent
randrot_noise_77obj_5lms_dist_agent
unsupervised_inference_distinctobj_dist_agent
unsupervised_inference_distinctobj_surf_agent
infer_comp_lvl1_with_monolithic_models
infer_comp_lvl1_with_comp_models
infer_comp_lvl2_with_comp_models
infer_comp_lvl3_with_comp_models
infer_comp_lvl4_with_comp_models
infer_comp_lvl1_with_comp_models_and_resampling