optimize IoU matching: sparse strtree output + 1:1 pre-filtering #1361

Open
musaqlain wants to merge 1 commit into weecology:main from musaqlain:IoU_performance

@musaqlain musaqlain commented Mar 25, 2026

Resolves #1345

_overlap_all() used the STRtree to find overlapping pairs but then discarded that sparse result by filling dense (n_truth × n_pred) matrices. These were passed directly to linear_sum_assignment(), which runs in O(n²m).

Following @jveitchmichaelis's suggestion, this PR:

  1. _overlap_all() returns sparse parallel arrays directly from the STRtree.

  2. match_polygons() first identifies unambiguous 1:1 matches (using np.bincount on the STRtree indices) and resolves them immediately. Only the remaining ambiguous pairs go to linear_sum_assignment() via a reduced sub-matrix.

  3. One additional improvement: union areas are computed arithmetically (area(A) + area(B) - area(intersection)) instead of calling shapely.union(). According to my findings, this is more efficient.
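
The arithmetic in item 3 is inclusion-exclusion applied per candidate pair. Below is a minimal sketch of the idea, not the PR's code: it assumes axis-aligned boxes stored as (xmin, ymin, xmax, ymax) rows in place of shapely polygons, and the helper name `pair_iou` is hypothetical. The `truth_idx` and `pred_idx` arrays stand in for the sparse parallel arrays an STRtree query would return.

```python
import numpy as np


def pair_iou(truth_boxes, pred_boxes, truth_idx, pred_idx):
    """Intersection, union and IoU for candidate pairs only (hypothetical helper).

    Boxes are rows of (xmin, ymin, xmax, ymax); this is a simplification of
    the polygon case, chosen so the area arithmetic is easy to follow.
    """
    truth_idx = np.asarray(truth_idx, dtype=np.intp)
    pred_idx = np.asarray(pred_idx, dtype=np.intp)
    t = np.asarray(truth_boxes, dtype=float).reshape(-1, 4)[truth_idx]
    p = np.asarray(pred_boxes, dtype=float).reshape(-1, 4)[pred_idx]

    # Overlap extent along each axis, clipped at zero for disjoint pairs.
    width = np.clip(np.minimum(t[:, 2], p[:, 2]) - np.maximum(t[:, 0], p[:, 0]), 0.0, None)
    height = np.clip(np.minimum(t[:, 3], p[:, 3]) - np.maximum(t[:, 1], p[:, 1]), 0.0, None)
    inter = width * height

    area_t = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])

    # Inclusion-exclusion: no geometric union operation is needed.
    union = area_t + area_p - inter

    # Guard against degenerate zero-area pairs.
    iou = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    return inter, union, iou
```

Every output array has one entry per candidate pair, so memory scales with the number of overlapping pairs rather than with n_truth × n_pred.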

Existing tests pass as-is.

AI disclosure

  • AI was used for final improvements. Snippets were generated by Copilot alongside my own coding.
  • AI was used to gather background knowledge.

Copilot AI review requested due to automatic review settings March 25, 2026 19:10

Copilot AI left a comment

Pull request overview

This PR addresses the performance/memory bottleneck in polygon IoU matching by keeping STRtree overlap results sparse and reducing the size of the assignment problem before running Hungarian matching.

Changes:

  • Change _overlap_all() to return sparse parallel arrays (overlap indices + intersection/union areas) instead of dense (n_truth × n_pred) matrices.
  • Update match_polygons() to pre-resolve unambiguous 1:1 overlaps and run linear_sum_assignment() only on the remaining ambiguous subset.
  • Compute union areas via area(A) + area(B) - area(intersection) instead of shapely.union().

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.82%. Comparing base (884502e) to head (5afa786).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/deepforest/IoU.py | 85.71% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1361      +/-   ##
==========================================
- Coverage   87.35%   86.82%   -0.54%     
==========================================
  Files          24       24              
  Lines        2981     3202     +221     
==========================================
+ Hits         2604     2780     +176     
- Misses        377      422      +45     
| Flag | Coverage Δ |
|---|---|
| unittests | 86.82% <85.71%> (-0.54%) ⬇️ |

@vickysharma-prog
Contributor

Traced through the diff, and the two-stage matching logic checks out. If a truth has exactly one overlapping pred and that pred has exactly one overlapping truth (bincount==1 on both sides) with positive intersection, pulling them out can't affect the optimal assignment for what remains. The inter_areas > 0 guard matters too: the STRtree "intersects" predicate includes boundary-touching pairs with zero-area overlap, which you'd never want to lock in as a match.
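
The bincount check described above can be sketched as follows. This is an illustration under assumptions, not the code in the diff: the function name `one_to_one_mask` and its argument names are hypothetical, and the inputs are the sparse parallel arrays of candidate pairs.

```python
import numpy as np


def one_to_one_mask(truth_idx, pred_idx, inter_areas, n_truth, n_pred):
    """Boolean mask over candidate pairs that are unambiguous 1:1 matches.

    Hypothetical helper: a pair qualifies when its truth appears in exactly
    one candidate pair, its prediction appears in exactly one candidate
    pair, and the intersection area is strictly positive.
    """
    truth_idx = np.asarray(truth_idx, dtype=np.intp)
    pred_idx = np.asarray(pred_idx, dtype=np.intp)
    inter_areas = np.asarray(inter_areas, dtype=float)

    # Number of candidate pairs each truth / prediction takes part in.
    truth_counts = np.bincount(truth_idx, minlength=n_truth)
    pred_counts = np.bincount(pred_idx, minlength=n_pred)

    return (
        (truth_counts[truth_idx] == 1)
        & (pred_counts[pred_idx] == 1)
        & (inter_areas > 0)  # excludes boundary-touching, zero-area pairs
    )


# Four candidate pairs: (0,0), (1,1), (1,2), (2,3).
# Truth 1 overlaps two predictions, and pair (2,3) only touches at a boundary.
mask = one_to_one_mask([0, 1, 1, 2], [0, 1, 2, 3], [1.0, 0.5, 0.3, 0.0], 3, 4)
```

Only the first pair is flagged here; the others would remain in the ambiguous set that goes on to the assignment solver.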

Couple of things I noticed:

The early return adds a path that didn't exist before. When truths and preds both exist but nothing overlaps, the original ran linear_sum_assignment on an all-zeros matrix and assigned some truths to predictions with IoU=0. The new code short-circuits all truths to unmatched (prediction_id=None, IoU=0). That is more sensible, but it is a behavioral change; a test for the truths-present, zero-overlaps case would pin down the contract.
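
To make the difference concrete, here is a small sketch of the two behaviors. It assumes the dense path maximized IoU with scipy's linear_sum_assignment; the function names `dense_assign` and `short_circuit` and the tuple layout are hypothetical, not taken from the diff.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def dense_assign(iou):
    """Assignment on a dense IoU matrix (assumed shape of the old path)."""
    rows, cols = linear_sum_assignment(iou, maximize=True)
    return [(int(r), int(c), float(iou[r, c])) for r, c in zip(rows, cols)]


def short_circuit(n_truth):
    """Early-return path: every truth is unmatched when nothing overlaps."""
    return [(t, None, 0.0) for t in range(n_truth)]


# Two truths, three predictions, no overlaps at all.
old_style = dense_assign(np.zeros((2, 3)))
new_style = short_circuit(2)
```

With an all-zeros matrix the solver still returns one pairing per truth, each with IoU 0, whereas the short-circuit reports no prediction at all. A regression test for this case could assert on the `None` prediction ids directly.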

setdefault in Stage 2 is technically safe since Stage 1 and Stage 2 touch disjoint truth indices by construction, but a direct assignment would make that invariant visible instead of silently swallowing a violation if it ever breaks.
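
A sketch of the difference, using hypothetical dictionaries that map truth index to prediction index. The explicit check goes one step further than the plain assignment suggested above, and is only one possible way to surface the invariant.

```python
def merge_with_setdefault(stage1, stage2):
    """Stage 2 never overwrites Stage 1; a collision passes unnoticed."""
    matches = dict(stage1)
    for truth_id, pred_id in stage2.items():
        matches.setdefault(truth_id, pred_id)
    return matches


def merge_strict(stage1, stage2):
    """Direct assignment, with the disjointness invariant checked explicitly."""
    matches = dict(stage1)
    for truth_id, pred_id in stage2.items():
        if truth_id in matches:
            raise ValueError(f"truth {truth_id} resolved in both stages")
        matches[truth_id] = pred_id
    return matches


# Disjoint stages: both functions agree.
ok = merge_strict({0: 5}, {1: 7})

# Overlapping stages: setdefault keeps the Stage 1 value without complaint.
hidden = merge_with_setdefault({0: 5}, {0: 9})
```

When the stages are disjoint by construction, both versions give the same result; they differ only in what happens if that assumption ever stops holding.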

Since #1345 had the tracemalloc script, running it on this branch would give concrete before/after numbers.
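
The script from #1345 is not reproduced here. As a hypothetical stand-in, the sketch below shows the general shape such a measurement could take: it compares peak traced memory for a dense (n_truth × n_pred) layout against sparse parallel arrays. The function names and sizes are illustrative assumptions, and the allocations are placeholders rather than the real matching code.

```python
import tracemalloc

import numpy as np


def dense_layout(n_truth, n_pred):
    """Placeholder for the old layout: two dense float matrices."""
    inter = np.zeros((n_truth, n_pred))
    union = np.zeros((n_truth, n_pred))
    return inter, union


def sparse_layout(n_pairs):
    """Placeholder for the new layout: parallel arrays, one entry per pair."""
    truth_idx = np.zeros(n_pairs, dtype=np.intp)
    pred_idx = np.zeros(n_pairs, dtype=np.intp)
    inter = np.zeros(n_pairs)
    union = np.zeros(n_pairs)
    return truth_idx, pred_idx, inter, union


def peak_bytes(fn, *args):
    """Peak memory traced by tracemalloc while fn(*args) runs."""
    tracemalloc.start()
    try:
        result = fn(*args)  # keep a reference so nothing is freed early
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del result
    return peak


dense_peak = peak_bytes(dense_layout, 500, 500)
sparse_peak = peak_bytes(sparse_layout, 500)
```

On the actual branch, the same wrapper could be applied to the real matching call on main and on this PR to obtain the before/after figures.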

Development

Successfully merging this pull request may close these issues.

Performance: Dense O(n×m) IoU matrices cause excessive memory usage for large-scale evaluations
