
Performance: Dense O(n×m) IoU matrices cause excessive memory usage for large-scale evaluations #1345

@musaqlain

Description

_overlap_all() allocates two massive, dense (n_truth × n_pred) NumPy arrays, even though the STRtree query immediately before it proves that only a tiny fraction of pairs actually intersect.

https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L54-L58
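For context, shapely 2.x's STRtree already returns only the candidate pairs, so the sparsity is known before any dense allocation happens. A minimal sketch (not DeepForest's exact code, just the general STRtree bulk-query pattern):

```python
import numpy as np
from shapely import box, STRtree

truth_geoms = [box(0, 0, 10, 10), box(100, 100, 110, 110)]
pred_geoms = [box(5, 5, 15, 15), box(500, 500, 510, 510)]

tree = STRtree(pred_geoms)
# 2 x K array: row 0 = indices into truth_geoms, row 1 = indices into pred_geoms
pairs = tree.query(truth_geoms, predicate="intersects")
print(pairs)  # only the one truly intersecting pair survives
```

Everything not in `pairs` is guaranteed to have zero intersection, which is exactly what the dense matrices end up storing.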

These dense matrices are then passed to linear_sum_assignment() and used to compute a third dense iou_mat, plus a fourth zero-filled array of the same shape, doubling the memory footprint again.

https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L115

https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L122-L128

Impact

For a realistic large-tile evaluation with 5,000 ground truth boxes and 10,000 predictions, this allocates:

  • intersections: 5,000 × 10,000 × 8 bytes = ~381 MB
  • unions: same = ~381 MB
  • iou_mat (line 123): same = ~381 MB
  • np.zeros_like (line 126): same = ~381 MB
  • Total: ~1.5 GB for a single image evaluation
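The arithmetic above can be checked directly (assuming float64 arrays, as allocated by np.zeros):

```python
import numpy as np

n_truth, n_pred = 5_000, 10_000
# One dense (n_truth x n_pred) float64 matrix
bytes_per_matrix = n_truth * n_pred * np.dtype(np.float64).itemsize
mb = bytes_per_matrix / 1024 / 1024
print(f"Per dense matrix: {mb:.0f} MB")            # ~381 MB
print(f"Four matrices:    {4 * mb / 1024:.2f} GB")  # ~1.49 GB
```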

In reality, only a tiny fraction of these pairs actually intersect (typically <1%), so >99% of that memory stores zeros. _overlap_all is called per image via evaluate_boxes → evaluate_image_boxes → compute_IoU, so this becomes a bottleneck when evaluating large NEON plots or aerial survey tiles containing thousands of trees.

Steps to Reproduce

Here is a standalone script that measures the memory waste using tracemalloc:

import time
import tracemalloc
import numpy as np
import geopandas as gpd
from shapely.geometry import box
from deepforest.IoU import _overlap_all

def make_dummy_data(n, img_size=10000, box_size=50):
    xs = np.random.uniform(0, img_size - box_size, n)
    ys = np.random.uniform(0, img_size - box_size, n)
    geoms = [box(x, y, x + box_size, y + box_size) for x, y in zip(xs, ys)]
    return gpd.GeoDataFrame({"score": np.random.uniform(0.5, 1.0, n)}, geometry=geoms)

np.random.seed(42)
truth = make_dummy_data(5000)
preds = make_dummy_data(10000)

tracemalloc.start()
t0 = time.time()

inter, _, _, _ = _overlap_all(preds, truth)

t1 = time.time()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

sparsity = np.count_nonzero(inter) / inter.size
print(f"Shape: {inter.shape}")
print(f"Sparsity: {sparsity:.4f}")
print(f"Peak memory: {peak / 1024 / 1024:.1f} MB")
print(f"Time: {t1-t0:.2f}s")

Proposed Fix

  • Use scipy.sparse.csr_matrix instead of np.zeros() to store the intersections and unions, reducing the space complexity from O(n×m) to O(k), where k is the number of actually overlapping pairs.
  • For linear_sum_assignment(), which requires a dense matrix, extract only the dense sub-matrix of rows and columns that have at least one intersection, then map the assignment indices back to the original arrays. This greatly reduces the problem size for the cubic-time Hungarian algorithm.
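A minimal sketch of both ideas combined. The function name `sparse_assignment` and the `rows`/`cols`/`ious` inputs are illustrative, not DeepForest's actual API; the inputs stand in for the intersecting pairs the STRtree query already produces:

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.optimize import linear_sum_assignment

def sparse_assignment(rows, cols, ious, n_truth, n_pred):
    """Hungarian matching on a sparse IoU matrix.

    rows/cols/ious describe only the intersecting (truth, pred) pairs;
    all other entries are implicitly zero and never materialized.
    """
    iou = coo_matrix((ious, (rows, cols)), shape=(n_truth, n_pred)).tocsr()

    # Restrict the assignment problem to rows/cols with at least one overlap.
    active_rows = np.unique(rows)
    active_cols = np.unique(cols)
    sub = iou[active_rows][:, active_cols].toarray()

    # Maximize total IoU on the small dense sub-matrix only.
    r, c = linear_sum_assignment(sub, maximize=True)

    # Map sub-matrix indices back to the original truth/pred indices.
    return active_rows[r], active_cols[c]
```

With k overlapping pairs, the dense sub-matrix is at most k×k instead of n_truth×n_pred, so both memory and the Hungarian algorithm's cubic runtime shrink accordingly.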
