
Conversation


@codeflash-ai codeflash-ai bot commented Dec 4, 2025

📄 92% (0.92x) speedup for min_index in ultralytics/data/converter.py

⏱️ Runtime: 149 milliseconds → 77.5 milliseconds (best of 73 runs)

📝 Explanation and details

The optimization replaces the original squared-distance calculation with a more efficient np.einsum approach that provides a 92% speedup.

Key Changes:

  1. Split computation into two steps: the difference calculation (`arr1[:, None, :] - arr2[None, :, :]`) is assigned to an intermediate `diff` before the sum-of-squares reduction
  2. Replace `(diff ** 2).sum(-1)` with `np.einsum('ijk,ijk->ij', diff, diff, optimize=True)`: this computes the squared distances without materializing a second squared (N×M×2) intermediate
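Concretely, the two variants can be sketched as follows (a minimal reconstruction from the description above; `min_index_original` and `min_index_einsum` are illustrative names, not functions in the codebase):

```python
import numpy as np

def min_index_original(arr1, arr2):
    """Original: builds an (N, M, 2) diff array, squares it (a second
    (N, M, 2) temporary), then sums over the last axis."""
    dis = ((arr1[:, None, :] - arr2[None, :, :]) ** 2).sum(-1)
    return np.unravel_index(np.argmin(dis, axis=None), dis.shape)

def min_index_einsum(arr1, arr2):
    """Optimized: the diff is computed once, then einsum fuses the
    square-and-sum into a single pass writing directly into (N, M)."""
    diff = arr1[:, None, :] - arr2[None, :, :]
    dis = np.einsum('ijk,ijk->ij', diff, diff, optimize=True)
    return np.unravel_index(np.argmin(dis, axis=None), dis.shape)
```

Both return the same `(i, j)` tuple of indices of the closest point pair; only the reduction strategy differs.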

Why This Is Faster:

  • Memory efficiency: The original approach creates a large (N×M×2) intermediate array, squares all elements, then sums. The einsum version performs the sum-of-squares as a single vectorized operation directly into the final (N×M) result
  • Optimized computation path: `einsum` with `optimize=True` lets NumPy choose an efficient contraction order and use its fast multiply-and-reduce inner loops
  • Reduced memory bandwidth: Less data movement between CPU and memory, which is often the bottleneck for array operations
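The memory-bandwidth effect can be observed directly. The following is a rough micro-benchmark sketch (timings vary by machine; iteration counts are arbitrary):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
arr1 = rng.random((1000, 2))
arr2 = rng.random((1000, 2))

# The (1000, 1000, 2) float64 diff array is ~16 MB on its own.
diff = arr1[:, None, :] - arr2[None, :, :]

# Original reduction: `diff ** 2` materializes a second ~16 MB temporary
# before .sum(-1) collapses it to (1000, 1000).
t_pow = timeit(lambda: (diff ** 2).sum(-1), number=20)

# einsum reduction: square-and-sum fused into one pass, writing straight
# into the (1000, 1000) result with no squared temporary.
t_einsum = timeit(lambda: np.einsum('ijk,ijk->ij', diff, diff, optimize=True),
                  number=20)

print(f"pow+sum: {t_pow:.3f}s  einsum: {t_einsum:.3f}s")
```

On small inputs the fixed `einsum` dispatch cost dominates, which matches the small-array slowdowns reported in the test annotations below.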

Performance Profile:

  • Small arrays (test cases): ~60% slower due to einsum overhead, but this is negligible in absolute terms (microseconds)
  • Large arrays (1000+ points): 87-100% faster where the optimization really matters
  • The function is called in a hot path within merge_multi_segment() which processes segmentation data, making this optimization valuable for computer vision workloads

Impact on Workloads:
Based on the function reference, min_index is used in merge_multi_segment() for connecting COCO segmentation coordinates. This optimization will significantly improve performance when processing large segmentation datasets or real-time computer vision applications where many segments need merging.
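For illustration, a self-contained sketch of the call pattern (the real `min_index` lives in `ultralytics/data/converter.py`; the segment coordinates here are made up, and this inline copy of the function mirrors the optimized version described above):

```python
import numpy as np

def min_index(arr1, arr2):
    """Find indices of the closest point pair between two (N, 2) arrays."""
    diff = arr1[:, None, :] - arr2[None, :, :]
    dis = np.einsum('ijk,ijk->ij', diff, diff, optimize=True)
    return np.unravel_index(np.argmin(dis, axis=None), dis.shape)

# Two pieces of a split COCO mask; merge_multi_segment() stitches them
# together at the closest pair of points.
seg_a = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0]])
seg_b = np.array([[10.0, 10.0], [4.1, 3.1]])

i, j = min_index(seg_a, seg_b)
print(i, j)  # closest pair: seg_a[2] = [4, 3] and seg_b[1] = [4.1, 3.1]
```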

Correctness verification report:

Test                            Status
⚙️ Existing Unit Tests          🔘 None Found
🌀 Generated Regression Tests   38 Passed
⏪ Replay Tests                 🔘 None Found
🔎 Concolic Coverage Tests      🔘 None Found
📊 Tests Coverage               100.0%
🌀 Generated Regression Tests and Runtime
import numpy as np

# imports
import pytest  # used for our unit tests
from ultralytics.data.converter import min_index

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------


def test_single_point_each():
    # Both arrays have a single identical point
    arr1 = np.array([[0, 0]])
    arr2 = np.array([[0, 0]])
    codeflash_output = min_index(arr1, arr2)  # 26.6μs -> 59.8μs (55.6% slower)


def test_single_point_each_different():
    # Both arrays have a single, different point
    arr1 = np.array([[1, 2]])
    arr2 = np.array([[3, 4]])
    codeflash_output = min_index(arr1, arr2)  # 19.4μs -> 49.2μs (60.7% slower)


def test_two_points_each_closest_pair():
    # Closest pair is not at (0,0)
    arr1 = np.array([[0, 0], [10, 10]])
    arr2 = np.array([[9, 9], [100, 100]])
    # (1,0) is the closest: [10,10] <-> [9,9]
    codeflash_output = min_index(arr1, arr2)  # 22.1μs -> 51.3μs (57.0% slower)


def test_multiple_points_simple():
    # Multiple points, obvious closest pair
    arr1 = np.array([[0, 0], [5, 5], [10, 10]])
    arr2 = np.array([[2, 2], [8, 8]])
    # Closest: [0,0] <-> [2,2] (distance sqrt(8))
    codeflash_output = min_index(arr1, arr2)  # 21.0μs -> 49.9μs (57.9% slower)


def test_multiple_points_multiple_minima():
    # Multiple pairs with the same minimal distance
    arr1 = np.array([[0, 0], [1, 1]])
    arr2 = np.array([[1, 1], [0, 0]])
    # Both (0,1) and (1,0) have distance 0, but np.argmin returns first in C order
    codeflash_output = min_index(arr1, arr2)  # 20.2μs -> 48.6μs (58.5% slower)


# ------------------------
# Edge Test Cases
# ------------------------


def test_empty_first_array():
    # arr1 is empty
    arr1 = np.empty((0, 2))
    arr2 = np.array([[1, 2], [3, 4]])
    with pytest.raises(ValueError):
        min_index(arr1, arr2)  # 16.5μs -> 44.1μs (62.6% slower)


def test_empty_second_array():
    # arr2 is empty
    arr1 = np.array([[1, 2], [3, 4]])
    arr2 = np.empty((0, 2))
    with pytest.raises(ValueError):
        min_index(arr1, arr2)  # 15.4μs -> 43.5μs (64.5% slower)


def test_both_arrays_empty():
    # Both arrays are empty
    arr1 = np.empty((0, 2))
    arr2 = np.empty((0, 2))
    with pytest.raises(ValueError):
        min_index(arr1, arr2)  # 14.5μs -> 43.4μs (66.6% slower)


def test_identical_points_multiple():
    # All points are identical, minimal distance is always zero
    arr1 = np.array([[5, 5], [5, 5], [5, 5]])
    arr2 = np.array([[5, 5], [5, 5]])
    # First occurrence is (0,0)
    codeflash_output = min_index(arr1, arr2)  # 22.7μs -> 52.1μs (56.4% slower)


def test_negative_coordinates():
    # Points with negative coordinates
    arr1 = np.array([[-1, -1], [-2, -2]])
    arr2 = np.array([[0, 0], [-2, -2]])
    # Closest: (-2,-2) <-> (-2,-2)
    codeflash_output = min_index(arr1, arr2)  # 20.7μs -> 47.8μs (56.7% slower)


def test_large_coordinates():
    # Points with large coordinate values
    arr1 = np.array([[1e9, 1e9], [1e9 + 1, 1e9 + 1]])
    arr2 = np.array([[1e9 + 2, 1e9 + 2], [1e9 - 1, 1e9 - 1]])
    # Closest: (0,1): [1e9,1e9] <-> [1e9-1,1e9-1]
    codeflash_output = min_index(arr1, arr2)  # 20.0μs -> 48.2μs (58.5% slower)


def test_non_integer_coordinates():
    # Points with floating point coordinates
    arr1 = np.array([[0.1, 0.2], [0.3, 0.4]])
    arr2 = np.array([[0.15, 0.25], [0.5, 0.5]])
    # Closest: [0.1,0.2] <-> [0.15,0.25]
    codeflash_output = min_index(arr1, arr2)  # 19.3μs -> 47.6μs (59.5% slower)


def test_minimum_at_end():
    # Closest pair is at the last indices
    arr1 = np.array([[0, 0], [1, 1], [2, 2]])
    arr2 = np.array([[10, 10], [2, 2]])
    # Closest: [2,2] <-> [2,2]
    codeflash_output = min_index(arr1, arr2)  # 19.7μs -> 47.4μs (58.4% slower)


def test_duplicate_points():
    # Duplicate points in one array
    arr1 = np.array([[0, 0], [0, 0], [1, 1]])
    arr2 = np.array([[1, 1]])
    # Closest: [1,1] <-> [1,1]
    codeflash_output = min_index(arr1, arr2)  # 20.0μs -> 48.0μs (58.3% slower)


# ------------------------
# Large Scale Test Cases
# ------------------------


def test_large_arrays_minimum_at_start():
    # Large arrays, minimum at (0,0)
    arr1 = np.zeros((1000, 2))
    arr2 = np.ones((1000, 2)) * 100
    arr2[0] = [0, 0]
    codeflash_output = min_index(arr1, arr2)  # 19.1ms -> 10.1ms (89.2% faster)


def test_large_arrays_minimum_at_end():
    # Large arrays; arr2 is offset so the only zero-distance pair is at the last indices
    # (if both arrays were filled with [100, 100], every leading pair would also be
    # zero-distance and argmin would return (0, 0))
    arr1 = np.ones((1000, 2)) * 100
    arr1[-1] = [42, 42]
    arr2 = np.ones((1000, 2)) * 200
    arr2[-1] = [42, 42]
    codeflash_output = min_index(arr1, arr2)  # 18.9ms -> 10.1ms (87.7% faster)


def test_large_arrays_random_minimum():
    # Large arrays, random minimum
    rng = np.random.default_rng(123)
    arr1 = rng.integers(-1000, 1000, size=(1000, 2))
    arr2 = rng.integers(-1000, 1000, size=(1000, 2))
    # Insert a guaranteed minimum at (123, 456)
    arr1[123] = [555, 555]
    arr2[456] = [555, 555]
    codeflash_output = min_index(arr1, arr2)  # 19.4ms -> 10.2ms (90.7% faster)


def test_large_arrays_all_same_point():
    # All points are identical in both arrays
    arr1 = np.full((1000, 2), [7, 7])
    arr2 = np.full((1000, 2), [7, 7])
    codeflash_output = min_index(arr1, arr2)  # 21.1ms -> 10.0ms (111% faster)


def test_large_arrays_many_minima():
    # Multiple minima in large arrays
    arr1 = np.zeros((1000, 2))
    arr2 = np.zeros((1000, 2))
    codeflash_output = min_index(arr1, arr2)  # 20.9ms -> 10.1ms (106% faster)


# ------------------------
# Additional Robustness Tests
# ------------------------


def test_minimum_distance_not_zero():
    # Ensure function finds minimum even if all distances are nonzero
    arr1 = np.array([[0, 0], [10, 10]])
    arr2 = np.array([[5, 5], [9, 9]])
    # Closest: [10,10] <-> [9,9]
    codeflash_output = min_index(arr1, arr2)  # 33.0μs -> 68.2μs (51.6% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np

# imports
import pytest
from ultralytics.data.converter import min_index

# unit tests

# -------------------- Basic Test Cases --------------------


def test_single_point_each():
    # Both arrays have a single identical point
    arr1 = np.array([[0, 0]])
    arr2 = np.array([[0, 0]])
    codeflash_output = min_index(arr1, arr2)  # 25.0μs -> 58.2μs (57.1% slower)


def test_single_point_each_different():
    # Both arrays have a single different point
    arr1 = np.array([[1, 2]])
    arr2 = np.array([[3, 4]])
    codeflash_output = min_index(arr1, arr2)  # 20.3μs -> 52.9μs (61.7% slower)


def test_two_points_each():
    # Both arrays have two points, closest are (1,1) and (1,2)
    arr1 = np.array([[0, 0], [1, 1]])
    arr2 = np.array([[2, 2], [1, 2]])
    codeflash_output = min_index(arr1, arr2)  # 22.9μs -> 52.6μs (56.4% slower)


def test_different_lengths():
    # arr1 has 3, arr2 has 2 points
    arr1 = np.array([[0, 0], [1, 1], [2, 2]])
    arr2 = np.array([[3, 3], [1, 2]])
    codeflash_output = min_index(arr1, arr2)  # 21.1μs -> 50.7μs (58.3% slower)


def test_multiple_minimums():
    # Multiple pairs have the same minimum distance
    arr1 = np.array([[0, 0], [1, 1]])
    arr2 = np.array([[0, 0], [1, 1]])
    # The first minimum found should be (0,0) to (0,0)
    codeflash_output = min_index(arr1, arr2)  # 20.2μs -> 49.5μs (59.2% slower)


# -------------------- Edge Test Cases --------------------


def test_empty_arr1():
    # arr1 is empty; np.argmin on the empty distance matrix raises ValueError
    arr1 = np.empty((0, 2))
    arr2 = np.array([[1, 2]])
    with pytest.raises(ValueError):
        min_index(arr1, arr2)  # 16.9μs -> 45.9μs (63.1% slower)


def test_empty_arr2():
    # arr2 is empty; np.argmin on the empty distance matrix raises ValueError
    arr1 = np.array([[1, 2]])
    arr2 = np.empty((0, 2))
    with pytest.raises(ValueError):
        min_index(arr1, arr2)  # 16.1μs -> 45.3μs (64.4% slower)


def test_both_empty():
    # Both arrays are empty
    arr1 = np.empty((0, 2))
    arr2 = np.empty((0, 2))
    with pytest.raises(ValueError):
        min_index(arr1, arr2)  # 15.5μs -> 43.0μs (63.9% slower)


def test_identical_points():
    # All points are identical, so any pair is a valid minimum
    arr1 = np.array([[5, 5], [5, 5]])
    arr2 = np.array([[5, 5], [5, 5]])
    # Should return the first minimum found
    codeflash_output = min_index(arr1, arr2)  # 22.5μs -> 52.9μs (57.4% slower)


def test_negative_coordinates():
    # Points with negative coordinates
    arr1 = np.array([[-1, -2], [3, 4]])
    arr2 = np.array([[0, 0], [-1, -2]])
    # Closest are (-1,-2) and (-1,-2)
    codeflash_output = min_index(arr1, arr2)  # 20.8μs -> 49.6μs (58.1% slower)


def test_large_coordinates():
    # Points with large magnitude coordinates
    arr1 = np.array([[1e6, 1e6], [2e6, 2e6]])
    arr2 = np.array([[1e6 + 1, 1e6 + 1], [0, 0]])
    # Closest are (1e6,1e6) and (1e6+1,1e6+1)
    codeflash_output = min_index(arr1, arr2)  # 20.7μs -> 49.8μs (58.4% slower)


def test_non_integer_coordinates():
    # Points with float coordinates
    arr1 = np.array([[0.5, 1.2], [3.1, 4.7]])
    arr2 = np.array([[0.6, 1.1], [3.0, 4.8]])
    # Closest are (0.5,1.2) and (0.6,1.1)
    codeflash_output = min_index(arr1, arr2)  # 19.3μs -> 47.7μs (59.5% slower)


def test_single_row_arr1():
    # arr1 has a single point, arr2 has several
    arr1 = np.array([[0, 0]])
    arr2 = np.array([[1, 1], [2, 2], [-1, -1]])
    # Closest is (0,0) and (-1,-1)
    codeflash_output = min_index(arr1, arr2)  # 20.0μs -> 49.4μs (59.4% slower)


def test_single_row_arr2():
    # arr2 has a single point, arr1 has several
    arr1 = np.array([[1, 1], [2, 2], [-1, -1]])
    arr2 = np.array([[0, 0]])
    # Closest is (-1,-1) and (0,0)
    codeflash_output = min_index(arr1, arr2)  # 19.9μs -> 48.0μs (58.6% slower)


def test_large_arrays_random():
    # Large arrays with random points
    rng = np.random.default_rng(42)
    arr1 = rng.integers(-1000, 1000, size=(500, 2))
    arr2 = rng.integers(-1000, 1000, size=(500, 2))
    # Manually set a known closest pair
    arr1[123] = [10000, 10000]
    arr2[456] = [10000, 10000]
    codeflash_output = min_index(arr1, arr2)  # 4.55ms -> 2.57ms (77.0% faster)


def test_large_arrays_identical():
    # Large arrays where all points are identical
    arr1 = np.full((1000, 2), 7)
    arr2 = np.full((1000, 2), 7)
    # The first minimum found should be (0,0)
    codeflash_output = min_index(arr1, arr2)  # 19.0ms -> 10.1ms (87.1% faster)


def test_large_arrays_offset():
    # Large arrays, arr2 is arr1 shifted by a constant
    arr1 = np.arange(1000).reshape(-1, 2)
    arr2 = arr1 + 5
    # Closest pair should be (0,0)
    codeflash_output = min_index(arr1, arr2)  # 5.01ms -> 2.56ms (95.6% faster)


def test_large_arrays_min_at_end():
    # Closest pair is at the last indices; the leading points differ between the
    # two arrays so no earlier pair has zero distance
    arr1 = np.zeros((999, 2))
    arr2 = np.ones((999, 2))
    arr1 = np.vstack([arr1, [12345, 67890]])
    arr2 = np.vstack([arr2, [12345, 67890]])
    codeflash_output = min_index(arr1, arr2)  # 20.5ms -> 10.2ms (100% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-min_index-mircu5gm` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 4, 2025 11:30
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 4, 2025