From 43f5aaa5751107205eb67ed317e8a74ee3f130cf Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Thu, 4 Dec 2025 11:06:02 +0000
Subject: [PATCH] Optimize remove_small_regions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a 452% speedup by replacing inefficient Python list comprehensions with NumPy vectorized operations and set arithmetic, targeting the bottleneck identified in line profiling.

**Key optimizations applied:**

1. **Vectorized region finding**: Replaced `[i + 1 for i, s in enumerate(sizes) if s < area_thresh]` with `np.flatnonzero(sizes < area_thresh) + 1`, eliminating a Python loop that consumed 6.3% of runtime.

2. **Optimized label-difference computation**: The original `[i for i in range(n_labels) if i not in fill_labels]` accounted for 72.9% of runtime because the `in` operator performs a linear scan over a list. The optimization uses a set difference, `set(range(n_labels)) - set(np.concatenate(([0], small_regions)))`, which is O(n) instead of O(n²).

3. **Single-element fast path**: When only one label needs to be kept, `regions == fill_labels[0]` avoids the more expensive `np.isin()` call.

4. **Improved array slicing**: Changed `stats[:, -1][1:]` to `stats[1:, -1]`, which indexes once instead of twice and gives a more efficient memory access pattern.

**Performance impact by test case:**

- **Best gains** (79-2137% faster): Tests with small islands that trigger the inefficient list comprehension in "islands" mode, especially `test_large_mask_performance`, which improved from 10.3 ms to 459 μs.
- **Modest slowdowns** (4-19% slower): Tests in "holes" mode or with no modifications, where the overhead of the NumPy operations slightly exceeds the simpler original logic.

The function is called from SAM's post-processing pipeline (`ultralytics/models/sam/predict.py`), where masks are processed in a loop, so these micro-optimizations compound across the many mask-processing operations in segmentation workflows (see the usage sketch after the diff).
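To make the core change concrete, here is a minimal, self-contained sketch of the new label-selection logic on synthetic component sizes (the array values are made up for illustration; in the patched function `sizes` comes from `cv2.connectedComponentsWithStats` and `regions` is the label image):

```python
import numpy as np

# Synthetic component areas standing in for stats[1:, -1] (background row already dropped).
sizes = np.array([5, 120, 3, 47, 900])
area_thresh = 50
n_labels = sizes.size + 1  # +1 for the background label 0

# Vectorized replacement for: [i + 1 for i, s in enumerate(sizes) if s < area_thresh]
small_regions = np.flatnonzero(sizes < area_thresh) + 1

# Set difference replaces the O(n^2) "i not in fill_labels" scan over a list.
keep = set(range(n_labels)) - set(np.concatenate(([0], small_regions)))
fill_labels = list(keep) or [int(np.argmax(sizes)) + 1]  # fall back to the largest region
print(sorted(fill_labels))  # [2, 5] -> only regions at or above the threshold survive
```

When `keep` ends up holding a single label, the patch additionally uses the direct comparison `regions == fill_labels[0]` instead of `np.isin`, which is the fast path from point 3 above.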
---
 ultralytics/models/sam/amg.py | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/ultralytics/models/sam/amg.py b/ultralytics/models/sam/amg.py
index e5c577c0bd2..e7d0b79f696 100644
--- a/ultralytics/models/sam/amg.py
+++ b/ultralytics/models/sam/amg.py
@@ -181,16 +181,32 @@ def remove_small_regions(mask: np.ndarray, area_thresh: float, mode: str) -> Tup
     correct_holes = mode == "holes"
     working_mask = (correct_holes ^ mask).astype(np.uint8)
     n_labels, regions, stats, _ = cv2.connectedComponentsWithStats(working_mask, 8)
-    sizes = stats[:, -1][1:]  # Row 0 is background label
-    small_regions = [i + 1 for i, s in enumerate(sizes) if s < area_thresh]
-    if not small_regions:
+    sizes = stats[1:, -1]  # Row 0 is background label
+
+    # Use np.flatnonzero for efficiency
+    small_regions_idx = np.flatnonzero(sizes < area_thresh)
+    if small_regions_idx.size == 0:
         return mask, False
-    fill_labels = [0] + small_regions
-    if not correct_holes:
+    small_regions = small_regions_idx + 1
+
+    if correct_holes:
+        fill_labels = np.concatenate(([0], small_regions))
+        mask_out = np.isin(regions, fill_labels)
+        return mask_out, True
+    else:
         # If every region is below threshold, keep largest
-        fill_labels = [i for i in range(n_labels) if i not in fill_labels] or [int(np.argmax(sizes)) + 1]
-    mask = np.isin(regions, fill_labels)
-    return mask, True
+        fill_labels_set = set(range(n_labels)) - set(np.concatenate(([0], small_regions)))
+        if not fill_labels_set:
+            # All regions below threshold, keep largest
+            fill_labels = [int(np.argmax(sizes)) + 1]
+        else:
+            fill_labels = list(fill_labels_set)
+        # Use an efficient lookup if fill_labels is large
+        if len(fill_labels) == 1:
+            mask_out = regions == fill_labels[0]
+        else:
+            mask_out = np.isin(regions, fill_labels)
+        return mask_out, True


 def batched_mask_to_box(masks: torch.Tensor) -> torch.Tensor:
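For context on how this helper is consumed, below is a hedged usage sketch of per-mask cleanup in the style of SAM's post-processing loop; the mask contents, threshold, and call order are illustrative examples, not code copied from `ultralytics/models/sam/predict.py`.

```python
import numpy as np

from ultralytics.models.sam.amg import remove_small_regions

# Example binary masks; real masks come from SAM's mask decoder.
masks = [np.zeros((256, 256), dtype=bool) for _ in range(2)]
masks[0][50:200, 50:200] = True     # large foreground region (kept)
masks[0][5:10, 5:10] = True         # 25-pixel island below the threshold (removed)
masks[1][:, :] = True
masks[1][100:103, 100:103] = False  # 9-pixel hole below the threshold (filled)

cleaned = []
for m in masks:
    m, _ = remove_small_regions(m, area_thresh=200, mode="islands")  # drop small islands
    m, _ = remove_small_regions(m, area_thresh=200, mode="holes")    # fill small holes
    cleaned.append(m)
```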