⚡️ Speed up function remove_small_regions by 452%
#48
📄 452% (4.52x) speedup for `remove_small_regions` in `ultralytics/models/sam/amg.py`

⏱️ Runtime: 12.1 milliseconds → 2.19 milliseconds (best of 250 runs)

📝 Explanation and details
The optimized code achieves a 452% speedup by replacing inefficient Python list comprehensions and list-membership checks with NumPy vectorized operations and set arithmetic, targeting the bottlenecks identified by line profiling.
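The label-selection logic behind these changes can be illustrated with a small NumPy-only sketch. It assumes a label image `regions` and per-label areas `sizes` have already been computed (in the original function these come from `stats`, presumably produced by a connected-components call such as OpenCV's `cv2.connectedComponentsWithStats`). The function names and the `keep_large` flag are hypothetical stand-ins, and this is not the PR's actual diff; it only shows that the vectorized selection can pick the same labels as the list-based one.

```python
import numpy as np


def select_fill_labels_list(sizes, n_labels, area_thresh, keep_large):
    """List-based label selection, in the style of the original comprehensions."""
    small_regions = [i + 1 for i, s in enumerate(sizes) if s < area_thresh]
    if not small_regions:
        return None  # no region below the threshold: nothing to change
    fill_labels = [0] + small_regions  # label 0 is the background
    if keep_large:
        # `i not in fill_labels` scans a list, so this is O(n_labels * len(fill_labels)).
        fill_labels = [i for i in range(n_labels) if i not in fill_labels] or [int(np.argmax(sizes)) + 1]
    return fill_labels


def select_fill_labels_vectorized(sizes, n_labels, area_thresh, keep_large):
    """Same selection using np.flatnonzero and a set difference."""
    # Indices of small entries in `sizes`, shifted by 1 to become label ids.
    small_regions = np.flatnonzero(sizes < area_thresh) + 1
    if small_regions.size == 0:
        return None
    if not keep_large:
        return [0] + small_regions.tolist()
    # Building and subtracting sets is roughly linear instead of O(n_labels * k).
    dropped = set(np.concatenate(([0], small_regions)).tolist())
    kept = sorted(set(range(n_labels)) - dropped)  # sorted only to match the list version
    return kept or [int(np.argmax(sizes)) + 1]


def apply_labels(regions, fill_labels):
    """Build the output mask, with a fast path when only one label is selected."""
    if len(fill_labels) == 1:
        return regions == fill_labels[0]  # one comparison instead of np.isin
    return np.isin(regions, fill_labels)


# Toy label image: labels 1, 2, 3 cover 4, 2 and 1 pixels respectively.
regions = np.array([[0, 1, 1, 0],
                    [0, 1, 1, 2],
                    [3, 0, 0, 2]])
sizes = np.array([4, 2, 1])  # areas of labels 1..3 (background excluded)
labels = select_fill_labels_vectorized(sizes, 4, area_thresh=3, keep_large=True)
print(labels)  # [1]
print(apply_labels(regions, labels).sum())  # 4
```

Sorting the set difference is there only so the two versions can be compared directly; `np.isin` does not depend on label order, so an implementation could skip it.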
Key optimizations applied:
- Vectorized region finding: replaced `[i + 1 for i, s in enumerate(sizes) if s < area_thresh]` with `np.flatnonzero(sizes < area_thresh) + 1`, eliminating the Python loop that was consuming 6.3% of runtime.
- Optimized list difference computation: the original code's `[i for i in range(n_labels) if i not in fill_labels]` was extremely inefficient (72.9% of runtime) because the `in` operator scans the `fill_labels` list linearly for every label. The optimization uses set operations, `set(range(n_labels)) - set(np.concatenate(([0], small_regions)))`, which is roughly O(n) instead of O(n·k) for k fill labels (quadratic when most regions are small).
- Single-element optimization: added a fast path, `regions == fill_labels[0]`, for when only one label needs to be kept, avoiding the more expensive `np.isin()` call.
- Improved array slicing: changed `stats[:, -1][1:]` to `stats[1:, -1]`, which produces the same view with a single indexing step instead of two.

Performance impact by test case:
- `test_large_mask_performance` showed a dramatic improvement, from 10.3 ms to 459 μs.

The function is called from SAM's post-processing pipeline (`ultralytics/models/sam/predict.py`), where masks are processed in a loop, making these micro-optimizations particularly valuable since they compound across multiple mask-processing operations in segmentation workflows.

✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-remove_small_regions-mirbye2t` and push.