@codeflash-ai codeflash-ai bot commented Dec 4, 2025

📄 71% (0.71x) speedup for generate_crop_boxes in ultralytics/models/sam/amg.py

⏱️ Runtime : 1.16 milliseconds → 679 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 71% speedup through several key micro-optimizations that reduce overhead in the inner loops:

Key Optimizations:

  1. Eliminated repeated attribute lookups: Stored crop_boxes.append and layer_idxs.append in local variables (add, add_idx) to avoid repeated dot notation lookups in the tight inner loop that executes 4,764 times.

  2. Precomputed bounds checking: Moved min() operations outside the inner loop by precomputing x1 = x0 + crop_w and using conditional assignment x1_clipped = x1 if x1 <= min_add_crop_w else min_add_crop_w, replacing expensive function calls with simple comparisons.

  3. Reduced list comprehension overhead: Replaced the list comprehensions for crop_box_x0 and crop_box_y0 with tuples built from generator expressions, avoiding temporary list allocations while keeping the same iteration order.

  4. Hoisted invariant computations: Moved loop-invariant calculations like box_layer = i_layer + 1 outside the inner loops to avoid redundant arithmetic.
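
A minimal sketch of how the four changes above might fit together. This is not the code from the PR: both functions are simplified stand-ins written for illustration, the names crop_boxes_baseline and crop_boxes_optimized are hypothetical, and clipping against im_w / im_h is an assumption about what the PR's min_add_crop_w refers to.

```python
import math
from itertools import product


def crop_boxes_baseline(im_size, n_layers, overlap_ratio):
    """Illustrative baseline: min() calls and attribute lookups inside the inner loop."""
    crop_boxes, layer_idxs = [], []
    im_h, im_w = im_size
    short_side = min(im_h, im_w)
    crop_boxes.append([0, 0, im_w, im_h])  # layer 0: the full image as [x0, y0, x1, y1]
    layer_idxs.append(0)
    for i_layer in range(n_layers):
        n = 2 ** (i_layer + 1)  # crops per side in this layer
        overlap = int(overlap_ratio * short_side * (2 / n))
        crop_w = math.ceil((overlap * (n - 1) + im_w) / n)
        crop_h = math.ceil((overlap * (n - 1) + im_h) / n)
        xs = [int((crop_w - overlap) * i) for i in range(n)]
        ys = [int((crop_h - overlap) * i) for i in range(n)]
        for x0, y0 in product(xs, ys):
            crop_boxes.append([x0, y0, min(x0 + crop_w, im_w), min(y0 + crop_h, im_h)])
            layer_idxs.append(i_layer + 1)
    return crop_boxes, layer_idxs


def crop_boxes_optimized(im_size, n_layers, overlap_ratio):
    """Same output as the baseline, with the four micro-optimizations applied."""
    crop_boxes, layer_idxs = [], []
    add, add_idx = crop_boxes.append, layer_idxs.append  # (1) bound methods looked up once
    im_h, im_w = im_size
    short_side = min(im_h, im_w)
    add([0, 0, im_w, im_h])
    add_idx(0)
    for i_layer in range(n_layers):
        box_layer = i_layer + 1  # (4) invariant hoisted out of the inner loops
        n = 2 ** box_layer
        overlap = int(overlap_ratio * short_side * (2 / n))
        crop_w = math.ceil((overlap * (n - 1) + im_w) / n)
        crop_h = math.ceil((overlap * (n - 1) + im_h) / n)
        # (3) tuples built from generator expressions instead of list comprehensions
        xs = tuple(int((crop_w - overlap) * i) for i in range(n))
        ys = tuple(int((crop_h - overlap) * i) for i in range(n))
        # (2) clip with a comparison instead of min(); y-extents are clipped once per layer
        y_spans = tuple((y0, y0 + crop_h if y0 + crop_h <= im_h else im_h) for y0 in ys)
        for x0 in xs:
            x1 = x0 + crop_w
            x1_clipped = x1 if x1 <= im_w else im_w
            for y0, y1_clipped in y_spans:
                add([x0, y0, x1_clipped, y1_clipped])
                add_idx(box_layer)
    return crop_boxes, layer_idxs


# Both versions should agree on every input, including the edge cases.
for args in [((100, 100), 1, 0.0), ((80, 120), 2, 0.25), ((1, 1), 3, 0.5)]:
    assert crop_boxes_baseline(*args) == crop_boxes_optimized(*args)
```

The nested loops keep x0 as the outer variable so the box order matches itertools.product(xs, ys); changing that order would alter the output even though the set of boxes stays the same.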

Performance Impact:
The optimizations are most effective for large-scale test cases where the inner loop executes many times:

  • Large images with multiple layers show 87-131% speedup (e.g., 512x512 with 5 layers)
  • Medium complexity cases show 45-50% speedup
  • Simple cases with few iterations show modest 8-17% gains, or run 2-32% slower because fixed setup costs dominate
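
To make the scaling concrete, here is a small sketch of how the box count grows with the number of layers. It assumes, as the test comments suggest, that layer k uses a 2^k x 2^k grid and that the full-image box is appended before the loop; the helper names are hypothetical.

```python
def boxes_per_layer(layer_idx):
    """Layer k uses a 2**k x 2**k grid, so it holds 4**k boxes (layer 0 is the full image)."""
    return 4 ** layer_idx


def total_boxes(n_layers):
    """Total boxes across layers 0..n_layers; a negative count behaves like zero."""
    return sum(boxes_per_layer(k) for k in range(max(n_layers, 0) + 1))


def inner_loop_iterations(n_layers):
    """Inner-loop iterations, excluding the full-image box added before the loop."""
    return total_boxes(n_layers) - 1


for n in range(6):
    print(n, total_boxes(n), inner_loop_iterations(n))
```

With 0 or 1 layers the inner loop runs at most 4 times, so the fixed cost of the extra local bindings outweighs the savings; with 5 layers it runs 1,364 times, which is where the largest speedups in the generated tests appear.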

Context Relevance:
Based on the function reference, generate_crop_boxes is called in SAM's generate() method for image segmentation, where it produces the crop regions for each input image. The optimization reduces the overhead of generating potentially hundreds of crop boxes per image on this path; in the largest generated test the saving is roughly 150 microseconds per call.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 38 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
# imports
from ultralytics.models.sam.amg import generate_crop_boxes

# unit tests

# -------- BASIC TEST CASES --------


def test_single_layer_no_overlap():
    # Basic test: 1 layer, no overlap, square image
    im_size = (100, 100)
    n_layers = 1
    overlap_ratio = 0.0
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 5.80μs -> 6.30μs (7.93% slower)
    # After the full-image box, the next 4 boxes should form a non-overlapping 2x2 grid
    expected_boxes = [
        [0, 0, 50, 50],
        [0, 50, 50, 100],
        [50, 0, 100, 50],
        [50, 50, 100, 100],
    ]
    for i, box in enumerate(expected_boxes, start=1):
        pass


def test_multiple_layers_with_overlap():
    # Basic test: 2 layers, overlap, rectangular image
    im_size = (80, 120)
    n_layers = 2
    overlap_ratio = 0.25
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 11.1μs -> 9.95μs (11.6% faster)
    # Check that all boxes are within image bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_zero_layers():
    # Basic test: 0 layers, should return only the original image box
    im_size = (50, 50)
    n_layers = 0
    overlap_ratio = 0.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 1.35μs -> 1.95μs (30.7% slower)


def test_non_square_image():
    # Basic test: non-square image, 1 layer, some overlap
    im_size = (60, 100)
    n_layers = 1
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 5.66μs -> 6.41μs (11.8% slower)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_layer_indices_are_correct():
    # Check that layer indices are assigned correctly
    im_size = (32, 32)
    n_layers = 2
    overlap_ratio = 0.1
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 11.3μs -> 9.60μs (17.2% faster)


# -------- EDGE TEST CASES --------


def test_minimal_image_size():
    # Edge: 1x1 image, multiple layers
    im_size = (1, 1)
    n_layers = 3
    overlap_ratio = 0.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 26.0μs -> 18.0μs (44.4% faster)
    # All boxes should stay within the 1x1 image; crops starting at x0 == 1 or y0 == 1 are degenerate
    for box in crop_boxes:
        pass


def test_large_overlap_ratio():
    # Edge: overlap_ratio > 1
    im_size = (100, 100)
    n_layers = 1
    overlap_ratio = 1.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 6.09μs -> 6.49μs (6.13% slower)
    # Should not error; when the overlap exceeds the crop size, x0/y0 can go negative
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_negative_overlap_ratio():
    # Edge: negative overlap_ratio, should still work (no assertion in function)
    im_size = (100, 100)
    n_layers = 1
    overlap_ratio = -0.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 5.67μs -> 6.25μs (9.30% slower)
    # Boxes should be non-overlapping or possibly with gaps
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_zero_overlap_ratio():
    # Edge: overlap_ratio = 0, should produce non-overlapping grid
    im_size = (64, 64)
    n_layers = 2
    overlap_ratio = 0.0
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 11.2μs -> 9.95μs (12.3% faster)


def test_negative_layer_count():
    # Edge: negative n_layers, should produce only the original box
    im_size = (100, 100)
    n_layers = -1
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 2.03μs -> 2.67μs (24.0% slower)


def test_zero_image_size():
    # Edge: zero-dimension image
    im_size = (0, 100)
    n_layers = 1
    overlap_ratio = 0.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 7.93μs -> 8.09μs (1.99% slower)
    # All boxes should have y0 == y1 == 0
    for box in crop_boxes:
        pass


# -------- LARGE SCALE TEST CASES --------


def test_large_image_and_layers():
    # Large scale: moderately large image and layers
    im_size = (256, 256)
    n_layers = 3  # 1 + 4 + 16 + 64 = 85 boxes
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 28.3μs -> 19.3μs (46.1% faster)
    # Check that all boxes are within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_maximum_reasonable_layers():
    # Large scale: test maximum layers that keeps box count < 1000
    im_size = (128, 128)
    n_layers = 4  # 1 + 4 + 16 + 64 + 256 = 341 boxes
    overlap_ratio = 0.1
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 75.1μs -> 40.1μs (87.3% faster)
    # Check uniqueness of boxes
    unique_boxes = set(tuple(box) for box in crop_boxes)


def test_large_non_square_image():
    # Large scale: non-square, large image
    im_size = (300, 500)
    n_layers = 2
    overlap_ratio = 0.3
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 12.1μs -> 10.9μs (10.6% faster)


def test_performance_large_number_of_boxes():
    # Large scale: ensure function doesn't take too long for a few hundred boxes (341 here)
    im_size = (512, 512)
    n_layers = 4  # 1 + 4 + 16 + 64 + 256 = 341 boxes
    overlap_ratio = 0.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 77.8μs -> 41.0μs (89.6% faster)


def test_all_boxes_cover_image():
    # Large scale: ensure at least one box covers the full image
    im_size = (400, 600)
    n_layers = 3
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 27.1μs -> 18.6μs (45.8% faster)


# -------- ADDITIONAL EDGE CASES --------


def test_one_pixel_overlap():
    # Edge: overlap_ratio so small that overlap rounds to 0
    im_size = (32, 32)
    n_layers = 1
    overlap_ratio = 0.0001
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 6.02μs -> 6.36μs (5.36% slower)


def test_highly_rectangular_image():
    # Edge: very tall image
    im_size = (1000, 10)
    n_layers = 2
    overlap_ratio = 0.15
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 11.7μs -> 10.8μs (8.44% faster)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_layer_zero_only():
    # Edge: n_layers = 0, only original image box
    im_size = (77, 99)
    n_layers = 0
    overlap_ratio = 0.3
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 1.41μs -> 1.98μs (28.6% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
# imports
from ultralytics.models.sam.amg import generate_crop_boxes

# unit tests

# ---------------------------
# BASIC TEST CASES
# ---------------------------


def test_basic_single_layer_no_overlap():
    # Basic test: 1 layer, no overlap
    im_size = (100, 200)  # height, width
    n_layers = 1
    overlap_ratio = 0.0
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 6.68μs -> 7.31μs (8.60% slower)
    # Check that all crops are within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_basic_two_layers_with_overlap():
    # Basic test: 2 layers, some overlap
    im_size = (120, 80)
    n_layers = 2
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 12.2μs -> 10.5μs (16.3% faster)
    # Check all boxes are within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_basic_square_image():
    # Basic test: square image, 1 layer
    im_size = (100, 100)
    n_layers = 1
    overlap_ratio = 0.1
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 5.82μs -> 6.25μs (6.89% slower)
    # Check that all crops are within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_basic_zero_layers():
    # Edge: zero layers, only the original image
    im_size = (50, 50)
    n_layers = 0
    overlap_ratio = 0.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 1.32μs -> 1.93μs (31.6% slower)


# ---------------------------
# EDGE TEST CASES
# ---------------------------


def test_edge_minimum_image_size():
    # Edge: minimum possible image size
    im_size = (1, 1)
    n_layers = 1
    overlap_ratio = 0.5
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 6.20μs -> 6.43μs (3.58% slower)
    # Should not produce out-of-bounds boxes
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_edge_high_overlap_ratio():
    # Edge: overlap ratio > 1
    im_size = (100, 100)
    n_layers = 1
    overlap_ratio = 2.0
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 6.13μs -> 6.56μs (6.55% slower)
    # Should still produce boxes without error; x0/y0 can go negative when the overlap exceeds the crop size
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_edge_non_integer_image_size():
    # Edge: float image sizes (should work if cast to int)
    im_size = (100.9, 99.1)
    n_layers = 1
    overlap_ratio = 0.25
    # Should handle float input gracefully
    crop_boxes, layer_idxs = generate_crop_boxes(
        (int(im_size[0]), int(im_size[1])), n_layers, overlap_ratio
    )  # 5.89μs -> 6.28μs (6.19% slower)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_edge_zero_overlap_ratio():
    # Edge: zero overlap
    im_size = (60, 40)
    n_layers = 2
    overlap_ratio = 0.0
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 11.3μs -> 10.2μs (10.8% faster)
    # Check for non-overlapping (no crop should have x0 == x1 or y0 == y1 except possibly at the edge)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_edge_large_overlap_ratio():
    # Edge: overlap ratio close to 1
    im_size = (100, 100)
    n_layers = 1
    overlap_ratio = 0.99
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 5.68μs -> 6.14μs (7.45% slower)
    # Should still produce valid boxes
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_edge_negative_overlap_ratio():
    # Edge: negative overlap ratio (should not crash, but overlap becomes negative)
    im_size = (100, 100)
    n_layers = 1
    overlap_ratio = -0.1
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 5.75μs -> 6.33μs (9.15% slower)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_edge_large_n_layers():
    # Edge: large number of layers, but not exceeding reasonable bounds
    im_size = (64, 64)
    n_layers = 5  # 2^5 = 32 crops per side at layer 5
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 257μs -> 115μs (123% faster)
    # All boxes within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_edge_non_standard_aspect_ratio():
    # Edge: extremely wide image
    im_size = (10, 1000)
    n_layers = 2
    overlap_ratio = 0.1
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 12.7μs -> 11.0μs (15.1% faster)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


# ---------------------------
# LARGE SCALE TEST CASES
# ---------------------------


def test_large_scale_many_layers():
    # Large scale: reasonably large image and number of layers
    im_size = (500, 500)
    n_layers = 4  # 2^4 = 16 crops per side at layer 4
    overlap_ratio = 0.3
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 78.7μs -> 41.2μs (91.0% faster)
    # Check that all boxes are within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_large_scale_maximum_elements():
    # Large scale: largest layer count that keeps the box count below 1000
    im_size = (256, 256)
    n_layers = 4  # 1 + 4 + 16 + 64 + 256 = 341 boxes
    overlap_ratio = 0.25
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 74.5μs -> 38.8μs (92.1% faster)
    # All boxes within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_large_scale_rectangular_image():
    # Large scale: non-square image
    im_size = (300, 700)
    n_layers = 3
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 27.3μs -> 18.1μs (50.7% faster)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


def test_large_scale_performance():
    # Large scale: check that function runs efficiently
    im_size = (512, 512)
    n_layers = 5  # 1 + 4 + 16 + 64 + 256 + 1024 = 1365 boxes
    overlap_ratio = 0.1
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 264μs -> 114μs (131% faster)
    # Check that all boxes are within bounds
    for box in crop_boxes:
        x0, y0, x1, y1 = box


# ---------------------------
# FUNCTIONALITY/INTEGRITY TESTS
# ---------------------------


def test_integrity_no_duplicate_boxes():
    # Integrity: no duplicate crop boxes
    im_size = (128, 128)
    n_layers = 3
    overlap_ratio = 0.25
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 26.5μs -> 18.2μs (45.8% faster)
    seen = set()
    for box in crop_boxes:
        tup = tuple(box)
        seen.add(tup)


def test_integrity_layer_indices():
    # Integrity: layer indices match expected counts
    im_size = (64, 64)
    n_layers = 2
    overlap_ratio = 0.2
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 10.9μs -> 9.67μs (12.8% faster)


def test_integrity_box_format():
    # Integrity: all boxes are [x0, y0, x1, y1] and x1 > x0, y1 > y0
    im_size = (100, 100)
    n_layers = 2
    overlap_ratio = 0.1
    crop_boxes, layer_idxs = generate_crop_boxes(im_size, n_layers, overlap_ratio)  # 10.7μs -> 9.74μs (9.79% faster)
    for box in crop_boxes:
        x0, y0, x1, y1 = box


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
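
The generated tests above unpack each box, but no assertions appear in this listing. A minimal sketch of the kind of checks their comments describe could look like the following; the helper names are hypothetical, and the sample data is hand-written to match the expected boxes in test_single_layer_no_overlap rather than taken from a real run.

```python
def assert_boxes_within_bounds(crop_boxes, im_size):
    """Check that every [x0, y0, x1, y1] box lies inside an image of size (height, width)."""
    im_h, im_w = im_size
    for x0, y0, x1, y1 in crop_boxes:
        assert 0 <= x0 <= x1 <= im_w, f"x-range out of bounds: {(x0, x1)}"
        assert 0 <= y0 <= y1 <= im_h, f"y-range out of bounds: {(y0, y1)}"


def assert_layer_counts(layer_idxs, n_layers):
    """Check that layer k contributes 4**k entries (layer 0 is the full image)."""
    for k in range(max(n_layers, 0) + 1):
        assert layer_idxs.count(k) == 4 ** k, f"unexpected box count for layer {k}"


# Hand-written sample: full image plus a non-overlapping 2x2 grid on a 100x100 image.
sample_boxes = [
    [0, 0, 100, 100],
    [0, 0, 50, 50],
    [0, 50, 50, 100],
    [50, 0, 100, 50],
    [50, 50, 100, 100],
]
sample_idxs = [0, 1, 1, 1, 1]

assert_boxes_within_bounds(sample_boxes, (100, 100))
assert_layer_counts(sample_idxs, 1)
```

In a real test these helpers would be applied to the crop_boxes and layer_idxs returned by generate_crop_boxes. The bounds check is deliberately strict, so it would be expected to fail for inputs such as overlap_ratio > 1 if those produce negative x0 or y0 values.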

To edit these changes git checkout codeflash/optimize-generate_crop_boxes-mirbh0uv and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 4, 2025 10:52
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash labels Dec 4, 2025