⚡️ Speed up function min_index by 92%
#49
📄 92% (0.92x) speedup for `min_index` in `ultralytics/data/converter.py`
⏱️ Runtime: 149 milliseconds → 77.5 milliseconds (best of 73 runs)
📝 Explanation and details
The optimization replaces the original squared-distance calculation with an `np.einsum` approach that provides a 92% speedup.

Key Changes:
- The difference calculation (`arr1[:, None, :] - arr2[None, :, :]`) is separated from the sum-of-squares operation.
- The `(... ** 2).sum(-1)` reduction is replaced with `np.einsum('ijk,ijk->ij', diff, diff, optimize=True)`: this computes the squared distances more efficiently by avoiding an intermediate memory allocation for the squared differences.

Why This Is Faster:
- The `einsum` version performs the sum-of-squares as a single vectorized operation directly into the final (N×M) result.
- `einsum` with `optimize=True` lets NumPy pick the computation order and, where applicable, use optimized BLAS operations.

Performance Profile:
- Small inputs may pay some `einsum` overhead, but this is negligible in absolute terms (microseconds).
- The function is called from `merge_multi_segment()`, which processes segmentation data, making this optimization valuable for computer vision workloads.

Impact on Workloads:
Based on the function reference, `min_index` is used in `merge_multi_segment()` for connecting COCO segmentation coordinates. This optimization will significantly improve performance when processing large segmentation datasets or real-time computer vision applications where many segments need merging.

✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-min_index-mircu5gm` and push.
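
For readers who want to see the change described in the explanation in isolation, here is a minimal sketch of the two formulations side by side. It is not the PR's actual diff: the function names `min_index_original` and `min_index_einsum` are hypothetical, and the sketch assumes `min_index` takes two (N, D) and (M, D) point arrays and returns the index pair of the closest points.

```python
import numpy as np


def min_index_original(arr1, arr2):
    """Hypothetical stand-in for the pre-optimization version."""
    # Broadcasting yields an (N, M, D) difference array; squaring it
    # allocates a second (N, M, D) temporary before the reduction.
    dis = ((arr1[:, None, :] - arr2[None, :, :]) ** 2).sum(-1)
    return np.unravel_index(np.argmin(dis, axis=None), dis.shape)


def min_index_einsum(arr1, arr2):
    """Hypothetical stand-in for the einsum-based version."""
    # The difference is still materialized once as (N, M, D).
    diff = arr1[:, None, :] - arr2[None, :, :]
    # Multiply and sum over the last axis in one call, writing straight
    # into the (N, M) result without a squared temporary.
    dis = np.einsum("ijk,ijk->ij", diff, diff, optimize=True)
    return np.unravel_index(np.argmin(dis, axis=None), dis.shape)


if __name__ == "__main__":
    a = np.array([[0.0, 0.0], [10.0, 10.0]])
    b = np.array([[5.0, 5.0], [9.0, 9.0], [20.0, 20.0]])
    # The closest pair is a[1] and b[1] (squared distance 2).
    print(tuple(int(i) for i in min_index_einsum(a, b)))  # (1, 1)
```

Both versions build the same (N, M) squared-distance matrix, so they should pick the same index pair except where floating-point rounding breaks an exact tie differently.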