@codeflash-ai codeflash-ai bot commented Dec 4, 2025

📄 452% (4.52x) speedup for remove_small_regions in ultralytics/models/sam/amg.py

⏱️ Runtime : 12.1 milliseconds → 2.19 milliseconds (best of 250 runs)
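The headline percentage can be reproduced from the two runtimes. A minimal sketch of the arithmetic, assuming the figure is computed as time saved divided by the optimized runtime (this matches the reported numbers, but the formula and the function name `speedup_percent` are assumptions, not taken from Codeflash):

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent: time saved divided by the optimized runtime."""
    return (original - optimized) / optimized * 100.0


# Rounded runtimes from the report, both in milliseconds.
pct = speedup_percent(12.1, 2.19)
print(f"{pct:.1f}% faster")
```

Because the inputs are the rounded runtimes, the result can differ from the reported value in the last digit.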

📝 Explanation and details

The optimized code achieves a 452% speedup by replacing inefficient Python list comprehensions and list membership tests with NumPy vectorized operations and set operations, targeting the bottleneck identified in line profiling.

Key optimizations applied:

  1. Vectorized region finding: Replaced [i + 1 for i, s in enumerate(sizes) if s < area_thresh] with np.flatnonzero(sizes < area_thresh) + 1, eliminating the Python loop that was consuming 6.3% of runtime.

  2. Optimized list difference computation: The original code's [i for i in range(n_labels) if i not in fill_labels] was extremely inefficient (72.9% of runtime) due to the in operator on lists. The optimization uses set operations: set(range(n_labels)) - set(np.concatenate(([0], small_regions))) which is O(n) instead of O(n²).

  3. Single-element optimization: Added a fast path regions == fill_labels[0] when only one label needs to be kept, avoiding the more expensive np.isin() call.

  4. Simplified array slicing: Changed stats[:, -1][1:] to stats[1:, -1], which selects the component areas in one indexing step instead of two.
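The first three optimizations can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code from the PR: the helper names `select_fill_labels` and `apply_labels` are hypothetical, and `sizes`, `n_labels`, and `regions` stand in for the outputs of a connected-components labelling step (label 0 is background, and `sizes[i]` is the area of label `i + 1`).

```python
import numpy as np


def select_fill_labels(sizes, n_labels, area_thresh, correct_holes):
    """Return the labels that make up the output mask, or None if nothing is small."""
    sizes = np.asarray(sizes)
    # Vectorized search for components below the threshold (labels start at 1).
    small_regions = np.flatnonzero(sizes < area_thresh) + 1
    if small_regions.size == 0:
        return None
    if correct_holes:
        # "holes" mode: label 0 plus every small hole.
        return [0, *small_regions.tolist()]
    # "islands" mode: every label that is neither label 0 nor a small island.
    keep = set(range(n_labels)) - set(np.concatenate(([0], small_regions)).tolist())
    # If every island is below the threshold, fall back to the largest one.
    return sorted(keep) or [int(np.argmax(sizes)) + 1]


def apply_labels(regions, fill_labels):
    """Build the boolean mask for the selected labels."""
    # Fast path: a single label needs only an equality test, not np.isin.
    if len(fill_labels) == 1:
        return regions == fill_labels[0]
    return np.isin(regions, fill_labels)


# Hypothetical label image with three components of areas 2, 1 and 3.
regions = np.array([[0, 1, 1], [0, 0, 2], [3, 3, 3]])
labels = select_fill_labels([2, 1, 3], n_labels=4, area_thresh=2, correct_holes=False)
print(labels, apply_labels(regions, labels).sum())
```

Splitting label selection from mask construction keeps the single-label fast path separate from the set arithmetic, so each piece can be checked on its own.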

Performance impact by test case:

  • Best gains (70-2137% faster): Tests in "islands" mode where small islands are actually removed, which is the path that runs the inefficient list comprehension; test_large_mask_performance shows the largest improvement, from 10.3ms to 459μs
  • Modest slowdowns (3-20% slower): Tests in "holes" mode or with no modifications, where the overhead of the NumPy operations slightly exceeds the simpler original logic
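The gap between the two groups comes from how the kept labels are computed. A small sketch of the difference, using made-up label counts (`n_labels` and `fill_labels` here are illustrative values, not measurements from the PR):

```python
import timeit

n_labels = 2000
# Hypothetical case: every second label is marked as small.
fill_labels = list(range(0, n_labels, 2))


def keep_with_list():
    # Each `in` test scans the list, so the whole pass is quadratic in the label count.
    return [i for i in range(n_labels) if i not in fill_labels]


def keep_with_set():
    # Set difference hashes each label once, so the pass is linear on average.
    return sorted(set(range(n_labels)) - set(fill_labels))


# Both approaches select the same labels; only the cost differs.
assert keep_with_list() == keep_with_set()
print("list scan:", timeit.timeit(keep_with_list, number=5))
print("set diff: ", timeit.timeit(keep_with_set, number=5))
```

When no region is small, neither path runs, which is consistent with those tests showing only the fixed overhead of the extra NumPy calls.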

The function is called from SAM's post-processing pipeline (ultralytics/models/sam/predict.py) where masks are processed in a loop, making these micro-optimizations particularly valuable since they compound across multiple mask processing operations in segmentation workflows.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 50 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np

# imports
import pytest
from ultralytics.models.sam.amg import remove_small_regions

# unit tests

# --- Basic Test Cases ---


def test_no_small_regions_islands():
    # No regions smaller than threshold, mode 'islands'
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:8, 2:8] = True  # One big region
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 15.3μs -> 17.8μs (14.0% slower)


def test_no_small_regions_holes():
    # No holes smaller than threshold, mode 'holes'
    mask = np.ones((10, 10), dtype=bool)
    mask[4:6, 4:6] = False  # One hole of size 4
    out, modified = remove_small_regions(mask, area_thresh=2, mode="holes")  # 14.5μs -> 16.0μs (8.90% slower)


def test_remove_small_island():
    # Remove a small island (region)
    mask = np.zeros((10, 10), dtype=bool)
    mask[1:4, 1:4] = True  # Small island (size 9)
    mask[5:9, 5:9] = True  # Large island (size 16)
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 65.0μs -> 31.0μs (110% faster)
    # Only large island should remain
    expected = np.zeros((10, 10), dtype=bool)
    expected[5:9, 5:9] = True


def test_remove_small_hole():
    # Remove a small hole
    mask = np.ones((10, 10), dtype=bool)
    mask[2:4, 2:4] = False  # Small hole (size 4)
    mask[6:9, 6:9] = False  # Large hole (size 9)
    out, modified = remove_small_regions(mask, area_thresh=5, mode="holes")  # 53.7μs -> 65.8μs (18.4% slower)
    # Only large hole should remain
    expected = np.ones((10, 10), dtype=bool)
    expected[6:9, 6:9] = False


def test_multiple_small_islands():
    # Remove multiple small islands
    mask = np.zeros((10, 10), dtype=bool)
    mask[0:2, 0:2] = True  # Small island (size 4)
    mask[8:10, 8:10] = True  # Small island (size 4)
    mask[4:9, 4:9] = True  # Large island (size 25)
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 53.9μs -> 30.1μs (79.4% faster)
    expected = np.zeros((10, 10), dtype=bool)
    expected[4:9, 4:9] = True


def test_multiple_small_holes():
    # Remove multiple small holes
    mask = np.ones((10, 10), dtype=bool)
    mask[0:2, 0:2] = False  # Small hole (size 4)
    mask[8:10, 8:10] = False  # Small hole (size 4)
    mask[4:9, 4:9] = False  # Large hole (size 25)
    out, modified = remove_small_regions(mask, area_thresh=10, mode="holes")  # 52.0μs -> 60.2μs (13.6% slower)
    expected = np.ones((10, 10), dtype=bool)
    expected[4:9, 4:9] = False


def test_area_threshold_edge():
    # Test area threshold exactly equal to region size
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:7, 2:7] = True  # Region size 25
    out, modified = remove_small_regions(mask, area_thresh=25, mode="islands")  # 15.8μs -> 18.1μs (12.6% slower)


# --- Edge Test Cases ---


def test_empty_mask():
    # All zeros mask, nothing to remove
    mask = np.zeros((10, 10), dtype=bool)
    out, modified = remove_small_regions(mask, area_thresh=5, mode="islands")  # 14.9μs -> 17.3μs (13.7% slower)


def test_full_mask():
    # All ones mask, nothing to remove
    mask = np.ones((10, 10), dtype=bool)
    out, modified = remove_small_regions(mask, area_thresh=5, mode="islands")  # 15.4μs -> 16.2μs (4.48% slower)


def test_all_regions_below_threshold_islands():
    # All islands are below threshold, should keep largest only
    mask = np.zeros((10, 10), dtype=bool)
    mask[0:2, 0:2] = True  # 4
    mask[4:6, 4:6] = True  # 4
    mask[8:10, 8:10] = True  # 4
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 64.7μs -> 33.9μs (90.9% faster)


def test_all_holes_below_threshold_holes():
    # All holes are below threshold, should fill all holes
    mask = np.ones((10, 10), dtype=bool)
    mask[0:2, 0:2] = False  # 4
    mask[4:6, 4:6] = False  # 4
    mask[8:10, 8:10] = False  # 4
    out, modified = remove_small_regions(mask, area_thresh=10, mode="holes")  # 53.8μs -> 63.5μs (15.3% slower)


def test_invalid_mode():
    # Should raise assertion error for invalid mode
    mask = np.ones((5, 5), dtype=bool)
    with pytest.raises(AssertionError):
        remove_small_regions(mask, area_thresh=5, mode="invalid_mode")  # 1.71μs -> 1.63μs (4.78% faster)


def test_non_square_mask():
    # Mask with non-square shape
    mask = np.zeros((5, 10), dtype=bool)
    mask[1:4, 2:8] = True
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 19.6μs -> 23.3μs (16.0% slower)


def test_border_touching_island():
    # Island touching border should be treated as a normal region
    mask = np.zeros((10, 10), dtype=bool)
    mask[0:3, 0:3] = True  # Touches top left
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 65.3μs -> 35.1μs (85.8% faster)


def test_border_touching_hole():
    # Hole touching border should be treated as a normal hole
    mask = np.ones((10, 10), dtype=bool)
    mask[0:3, 0:3] = False  # Touches top left
    out, modified = remove_small_regions(mask, area_thresh=10, mode="holes")  # 51.6μs -> 64.4μs (19.8% slower)


def test_mask_dtype_uint8():
    # uint8 mask, converted to bool before the call
    mask = np.zeros((10, 10), dtype=np.uint8)
    mask[2:8, 2:8] = 1
    out, modified = remove_small_regions(
        mask.astype(bool), area_thresh=10, mode="islands"
    )  # 15.5μs -> 18.0μs (14.1% slower)


def test_mask_dtype_bool():
    # Accepts bool mask
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:8, 2:8] = True
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 14.5μs -> 17.0μs (15.0% slower)


# --- Large Scale Test Cases ---


def test_large_mask_many_small_islands():
    # Large mask with 50 diagonal pixels and one large block
    mask = np.zeros((100, 100), dtype=bool)
    # Place 50 pixels on the main diagonal. Assuming 8-connected labelling, diagonal
    # neighbours join up, so these form one 50-pixel island rather than 50 islands of size 1
    for i in range(50):
        mask[i, i] = True
    # Place one large island
    mask[60:90, 60:90] = True
    out, modified = remove_small_regions(mask, area_thresh=10, mode="islands")  # 43.6μs -> 46.7μs (6.57% slower)
    # Under that assumption no island is below the threshold, so the mask is unchanged
    expected = mask.copy()


def test_large_mask_many_small_holes():
    # Large mask with 50 diagonal hole pixels and one large hole
    mask = np.ones((100, 100), dtype=bool)
    # Assuming 8-connected labelling, the diagonal pixels form one 50-pixel hole
    for i in range(50):
        mask[i, i] = False
    # Place one large hole
    mask[60:90, 60:90] = False
    out, modified = remove_small_regions(mask, area_thresh=10, mode="holes")  # 42.9μs -> 44.8μs (4.28% slower)
    # Under that assumption no hole is below the threshold, so the mask is unchanged
    expected = mask.copy()


def test_large_mask_no_small_regions():
    # Large mask, no regions below threshold
    mask = np.zeros((100, 100), dtype=bool)
    mask[10:90, 10:90] = True  # Large region
    out, modified = remove_small_regions(mask, area_thresh=50, mode="islands")  # 45.5μs -> 47.8μs (4.72% slower)


def test_large_mask_all_small_regions():
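    # Large mask with ten 10x10 blocks along the diagonal. Their corners touch diagonally,
    # so 8-connected labelling (assumed here) would merge them into one 1000-pixel region
    mask = np.zeros((100, 100), dtype=bool)
    for i in range(10):
        mask[i * 10 : (i + 1) * 10, i * 10 : (i + 1) * 10] = True  # 10 blocks of 100 pixels each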
    out, modified = remove_small_regions(mask, area_thresh=200, mode="islands")  # 43.4μs -> 46.4μs (6.44% slower)


def test_large_mask_performance():
    # Large mask with random islands and holes, performance test
    np.random.seed(42)
    mask = np.random.choice([False, True], size=(200, 200), p=[0.9, 0.1])
    out, modified = remove_small_regions(mask, area_thresh=20, mode="islands")  # 10.3ms -> 459μs (2137% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np

# imports
import pytest  # used for our unit tests
from ultralytics.models.sam.amg import remove_small_regions

# unit tests

# ---- BASIC TEST CASES ----


def test_no_small_holes_to_remove():
    """Test: No small holes present, so mask should remain unchanged in 'holes' mode."""
    mask = np.ones((10, 10), dtype=bool)
    area_thresh = 5
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 21.3μs -> 25.4μs (16.0% slower)


def test_no_small_islands_to_remove():
    """Test: No small islands present, so mask should remain unchanged in 'islands' mode."""
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:8, 2:8] = True  # Large island
    area_thresh = 5
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 17.5μs -> 20.1μs (13.3% slower)


def test_remove_small_hole():
    """Test: Remove a small hole in 'holes' mode."""
    mask = np.ones((10, 10), dtype=bool)
    mask[5, 5] = False  # Small hole
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 63.4μs -> 67.2μs (5.69% slower)


def test_remove_small_island():
    """Test: Remove a small island in 'islands' mode."""
    mask = np.zeros((10, 10), dtype=bool)
    mask[5, 5] = True  # Small island
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 59.5μs -> 34.8μs (71.3% faster)


def test_keep_large_hole():
    """Test: Large hole should not be removed in 'holes' mode."""
    mask = np.ones((10, 10), dtype=bool)
    mask[2:8, 2:8] = False  # Large hole
    area_thresh = 5
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 15.6μs -> 16.7μs (7.02% slower)


def test_keep_large_island():
    """Test: Large island should not be removed in 'islands' mode."""
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:8, 2:8] = True  # Large island
    area_thresh = 5
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 14.2μs -> 16.7μs (15.3% slower)


def test_multiple_small_holes():
    """Test: Multiple small holes are all filled in 'holes' mode."""
    mask = np.ones((10, 10), dtype=bool)
    mask[1, 1] = False
    mask[8, 8] = False
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 57.3μs -> 62.7μs (8.68% slower)


def test_multiple_small_islands():
    """Test: Multiple small islands are all removed in 'islands' mode."""
    mask = np.zeros((10, 10), dtype=bool)
    mask[1, 1] = True
    mask[8, 8] = True
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 58.6μs -> 33.7μs (74.0% faster)


def test_hole_exactly_at_threshold():
    """Test: Hole with area exactly at threshold should not be removed."""
    mask = np.ones((10, 10), dtype=bool)
    mask[2:4, 2:4] = False  # 4 pixels
    area_thresh = 4
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 14.8μs -> 16.3μs (9.22% slower)


def test_island_exactly_at_threshold():
    """Test: Island with area exactly at threshold should not be removed."""
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:4, 2:4] = True  # 4 pixels
    area_thresh = 4
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 14.0μs -> 16.3μs (14.2% slower)


# ---- EDGE TEST CASES ----


def test_empty_mask():
    """Test: Empty mask should be handled gracefully."""
    mask = np.zeros((10, 10), dtype=bool)
    area_thresh = 1
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 14.0μs -> 16.8μs (16.4% slower)


def test_full_mask():
    """Test: Full mask should be handled gracefully."""
    mask = np.ones((10, 10), dtype=bool)
    area_thresh = 1
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 13.3μs -> 15.3μs (12.8% slower)


def test_single_pixel_hole():
    """Test: Single pixel hole should be filled if below threshold."""
    mask = np.ones((5, 5), dtype=bool)
    mask[2, 2] = False
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 60.5μs -> 65.6μs (7.73% slower)


def test_single_pixel_island():
    """Test: Single pixel island should be removed if below threshold."""
    mask = np.zeros((5, 5), dtype=bool)
    mask[2, 2] = True
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 57.4μs -> 33.2μs (72.8% faster)


def test_area_thresh_zero():
    """Test: Area threshold zero should not remove any regions."""
    mask = np.ones((5, 5), dtype=bool)
    mask[2, 2] = False
    processed, modified = remove_small_regions(mask, 0, "holes")  # 14.7μs -> 16.2μs (9.35% slower)


def test_area_thresh_large():
    """Test: Area threshold larger than any region removes all islands except the largest."""
    mask = np.zeros((5, 5), dtype=bool)
    mask[1, 1] = True
    mask[3, 3] = True
    area_thresh = 10
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 60.7μs -> 31.7μs (91.7% faster)


def test_invalid_mode():
    """Test: Invalid mode should raise assertion error."""
    mask = np.ones((5, 5), dtype=bool)
    with pytest.raises(AssertionError):
        remove_small_regions(mask, 1, "invalid_mode")  # 1.47μs -> 1.47μs (0.614% faster)


def test_non_bool_mask():
    """Test: Non-bool mask should still work (converted internally)."""
    mask = np.ones((5, 5), dtype=int)
    mask[2, 2] = 0
    processed, modified = remove_small_regions(mask, 2, "holes")  # 61.7μs -> 69.9μs (11.8% slower)


def test_non_square_mask():
    """Test: Non-square mask should be handled."""
    mask = np.ones((5, 10), dtype=bool)
    mask[2, 2] = False
    processed, modified = remove_small_regions(mask, 2, "holes")  # 53.0μs -> 58.1μs (8.84% slower)


def test_touching_border_hole():
    """Test: Hole touching border should be filled if below threshold."""
    mask = np.ones((5, 5), dtype=bool)
    mask[0, 0] = False
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 49.0μs -> 54.9μs (10.7% slower)


def test_touching_border_island():
    """Test: Island touching border should be removed if below threshold."""
    mask = np.zeros((5, 5), dtype=bool)
    mask[0, 0] = True
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 55.3μs -> 32.6μs (69.6% faster)


# ---- LARGE SCALE TEST CASES ----


def test_large_mask_many_small_holes():
    """Test: Large mask with 50 diagonal hole pixels (a single 50-pixel hole if labelling is 8-connected, so nothing falls below the threshold)."""
    mask = np.ones((100, 100), dtype=bool)
    # Create 50 hole pixels along the diagonal
    for i in range(50):
        mask[i, i] = False
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 43.0μs -> 45.3μs (5.00% slower)


def test_large_mask_many_small_islands():
    """Test: Large mask with 50 diagonal pixels (a single 50-pixel island if labelling is 8-connected, so nothing falls below the threshold)."""
    mask = np.zeros((100, 100), dtype=bool)
    # Create 50 island pixels along the diagonal
    for i in range(50):
        mask[i, i] = True
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 42.4μs -> 44.5μs (4.84% slower)


def test_large_mask_large_island_remains():
    """Test: Large mask with one large island plus diagonal pixels (which touch the large island if labelling is 8-connected)."""
    mask = np.zeros((100, 100), dtype=bool)
    mask[10:90, 10:90] = True  # Large island
    for i in range(50):
        mask[i, i] = True  # Diagonal pixels; those with i >= 10 lie inside the large island
    area_thresh = 50
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 45.1μs -> 47.8μs (5.63% slower)


def test_large_mask_large_hole_remains():
    """Test: Large mask with one large hole plus diagonal hole pixels (which touch the large hole if labelling is 8-connected)."""
    mask = np.ones((100, 100), dtype=bool)
    mask[10:90, 10:90] = False  # Large hole
    for i in range(50):
        mask[i, i] = False  # Diagonal pixels; those with i >= 10 lie inside the large hole
    area_thresh = 50
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 44.2μs -> 46.6μs (5.22% slower)


def test_large_mask_no_modification():
    """Test: Large mask with no small regions, nothing should be changed."""
    mask = np.ones((100, 100), dtype=bool)
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 40.9μs -> 43.7μs (6.48% slower)


def test_large_mask_all_small_islands():
    """Test: Large mask with 10 diagonal pixels (a single 10-pixel island if labelling is 8-connected, which is not below the threshold)."""
    mask = np.zeros((100, 100), dtype=bool)
    # Create 10 island pixels along the diagonal
    for i in range(10):
        mask[i, i] = True
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "islands")  # 41.7μs -> 43.8μs (4.89% slower)


def test_large_mask_all_small_holes():
    """Test: Large mask with 10 diagonal hole pixels (a single 10-pixel hole if labelling is 8-connected, which is not below the threshold)."""
    mask = np.ones((100, 100), dtype=bool)
    # Create 10 hole pixels along the diagonal
    for i in range(10):
        mask[i, i] = False
    area_thresh = 2
    processed, modified = remove_small_regions(mask, area_thresh, "holes")  # 41.3μs -> 42.7μs (3.17% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-remove_small_regions-mirbye2t and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 4, 2025 11:06
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 4, 2025