ooptimize IoU matching: sparse strtree output + 1:1 pre-filtering#1361
ooptimize IoU matching: sparse strtree output + 1:1 pre-filtering#1361musaqlain wants to merge 1 commit intoweecology:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses the performance/memory bottleneck in polygon IoU matching by keeping STRtree overlap results sparse and reducing the size of the assignment problem before running Hungarian matching.
Changes:
- Change
_overlap_all()to return sparse parallel arrays (overlap indices + intersection/union areas) instead of dense(n_truth × n_pred)matrices. - Update
match_polygons()to pre-resolve unambiguous 1:1 overlaps and runlinear_sum_assignment()only on the remaining ambiguous subset. - Compute union areas via
area(A) + area(B) - area(intersection)instead ofshapely.union().
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7932be5 to
5afa786
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1361 +/- ##
==========================================
- Coverage 87.35% 86.82% -0.54%
==========================================
Files 24 24
Lines 2981 3202 +221
==========================================
+ Hits 2604 2780 +176
- Misses 377 422 +45
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Traced through the diff the two-stage matching logic checks out. If a truth has exactly one overlapping pred and that pred has exactly one overlapping truth (bincount==1 both sides) with positive intersection, pulling them out can't affect the optimal assignment for what remains. The inter_areas > 0 guard matters too STRtree "intersects" includes boundary-touching pairs with zero-area overlap that you'd never want to lock in as a match. Couple of things I noticed: The early return adds a new path that didn't exist before when truths and preds both exist but nothing overlaps, the original ran linear_sum_assignment on an all-zeros matrix and assigned some truths to predictions with IoU=0. The new code short-circuits all truths to unmatched (prediction_id=None, IoU=0). More sensible, but it is a behavioral change a test for truths-present-zero-overlaps would pin down the contract. setdefault in Stage 2 is technically safe since Stage 1 and Stage 2 touch disjoint truth indices by construction, but a direct assignment would make that invariant visible instead of silently swallowing a violation if it ever breaks. Since #1345 had the tracemalloc script, running it on this branch |
Resolves #1345
_overlap_all() used the STRtree to find overlapping pairs but then discarded that sparse result by filling dense (n_truth × n_pred) matrices. These were passed directly to
linear_sum_assignment(), which runs inO(n²m).Following @jveitchmichaelis's suggestion, this PR:
_overlap_all() returns sparse parallel arrays directly from the STRtree.
match_polygons() first identify unambiguous 1:1 matches {using
np.bincounton the STRtree indices} and resolves them immediately. Only the remaining ambiguous pairs go tolinear_sum_assignment()via a reduced sub-matrix.1 improvement I made: union areas are computed arithmetically (
area(A) + area(B) - area(intersection)) instead of callingshapely.union(), this is more efficient as per my findings.Existing tests pass as-is.....
#AI disclosure