Description
_overlap_all() allocates two massive, dense (n_truth × n_pred) NumPy arrays, even though the STRtree query performed immediately beforehand already identifies the small set of candidate pairs that can actually intersect.
https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L54-L58
These dense matrices are then passed to linear_sum_assignment() and used to compute a third dense iou_mat, further tripling the memory cost.
https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L115
https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L122-L128
Impact
For a realistic large-tile evaluation with 5,000 ground truth boxes and 10,000 predictions, this allocates:
- intersections: 5,000 × 10,000 × 8 bytes = ~381 MB
- unions: same = ~381 MB
- iou_mat (line 123): same = ~381 MB
- np.zeros_like (line 126): same = ~381 MB
- Total: ~1.5 GB for a single image evaluation
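The arithmetic above can be sanity-checked with a few lines of NumPy (sizes only, nothing is allocated):

```python
import numpy as np

n_truth, n_pred = 5_000, 10_000

# Bytes for one dense float64 matrix of shape (n_truth, n_pred)
bytes_per_matrix = n_truth * n_pred * np.dtype(np.float64).itemsize
mb = bytes_per_matrix / 1024 / 1024

print(f"One dense matrix: {mb:.0f} MB")            # ~381 MB
print(f"Four matrices:    {4 * mb / 1024:.2f} GB")  # ~1.49 GB
```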
In reality, only a tiny fraction of these pairs actually intersect (typically <1%), so >99% of that memory stores zeros. _overlap_all is called per image via evaluate_boxes → evaluate_image_boxes → compute_IoU, so this becomes a bottleneck when evaluating large NEON plots or aerial survey tiles containing thousands of trees.
Steps to Reproduce
Here is a standalone script that measures the memory waste using tracemalloc:
```python
import time
import tracemalloc

import numpy as np
import geopandas as gpd
from shapely.geometry import box

from deepforest.IoU import _overlap_all


def make_dummy_data(n, img_size=10000, box_size=50):
    xs = np.random.uniform(0, img_size - box_size, n)
    ys = np.random.uniform(0, img_size - box_size, n)
    geoms = [box(x, y, x + box_size, y + box_size) for x, y in zip(xs, ys)]
    return gpd.GeoDataFrame({"score": np.random.uniform(0.5, 1.0, n)}, geometry=geoms)


np.random.seed(42)
truth = make_dummy_data(5000)
preds = make_dummy_data(10000)

tracemalloc.start()
t0 = time.time()
inter, _, _, _ = _overlap_all(preds, truth)
t1 = time.time()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

sparsity = np.count_nonzero(inter) / inter.size
print(f"Shape: {inter.shape}")
print(f"Sparsity: {sparsity:.4f}")
print(f"Peak memory: {peak / 1024 / 1024:.1f} MB")
print(f"Time: {t1 - t0:.2f}s")
```
Proposed Fix
- Use scipy.sparse.csr_matrix instead of np.zeros() to store the intersections and unions, reducing the space complexity from O(N × M) to O(k), where k is the number of actually overlapping pairs.
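A minimal sketch of the sparse construction, assuming the STRtree candidate pairs are used to fill a COO triplet list (the geometries and variable names here are illustrative, not DeepForest's internals):

```python
from scipy.sparse import csr_matrix
from shapely import STRtree
from shapely.geometry import box

# Hypothetical data: 100 truth boxes, 120 prediction boxes on a grid.
truth_geoms = [box(i * 10, 0, i * 10 + 8, 8) for i in range(100)]
pred_geoms = [box(i * 10 + 2, 1, i * 10 + 9, 9) for i in range(120)]

# Query the STRtree for candidate pairs and record only nonzero overlaps.
tree = STRtree(pred_geoms)
rows, cols, inter_vals, union_vals = [], [], [], []
for ti, t in enumerate(truth_geoms):
    for pi in tree.query(t):  # candidate predictions only
        inter = t.intersection(pred_geoms[pi]).area
        if inter > 0:
            rows.append(ti)
            cols.append(pi)
            inter_vals.append(inter)
            union_vals.append(t.union(pred_geoms[pi]).area)

# Sparse matrices store only the k overlapping pairs, not N × M zeros.
shape = (len(truth_geoms), len(pred_geoms))
intersections = csr_matrix((inter_vals, (rows, cols)), shape=shape)
unions = csr_matrix((union_vals, (rows, cols)), shape=shape)
print(intersections.nnz, "nonzero pairs out of", shape[0] * shape[1])
```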
- For linear_sum_assignment, which requires a dense matrix, extract only the dense sub-matrix formed by rows and columns with at least one intersection, then map the assignment indices back to the original arrays. This greatly reduces the problem size for the cubic-time Hungarian algorithm.
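A sketch of the sub-matrix trick, using a randomly generated sparse IoU matrix as a stand-in for the real one (all names here are hypothetical):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.sparse import csr_matrix

# Stand-in sparse IoU matrix: 50 random nonzero entries in a 200 x 300 grid.
rng = np.random.default_rng(0)
n_truth, n_pred = 200, 300
rows = rng.integers(0, n_truth, 50)
cols = rng.integers(0, n_pred, 50)
vals = rng.uniform(0.1, 1.0, 50)
iou = csr_matrix((vals, (rows, cols)), shape=(n_truth, n_pred))

# Keep only rows/columns that contain at least one nonzero IoU.
active_rows = np.unique(iou.nonzero()[0])
active_cols = np.unique(iou.nonzero()[1])
sub = iou[active_rows][:, active_cols].toarray()

# Run the Hungarian algorithm on the much smaller dense sub-matrix
# (maximize total IoU), then map indices back to the full arrays.
r, c = linear_sum_assignment(sub, maximize=True)
matches = [(active_rows[i], active_cols[j]) for i, j in zip(r, c) if sub[i, j] > 0]
print(len(matches), "matched pairs")
```

Because `sub` has at most one row/column per overlapping pair, the O(n³) assignment runs on a matrix of side ≤ k rather than max(N, M).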