⚡️ Speed up function _merge_tiles_elements by 45%
#786
📄 45% (0.45x) speedup for `_merge_tiles_elements` in `inference/core/utils/drawing.py`
⏱️ Runtime: 2.36 milliseconds → 1.63 milliseconds (best of 48 runs)
📝 Explanation and details
The optimized code achieves a 45% speedup through several key performance improvements:
Main Optimizations:
- Replaced `np.ones() * color` with `np.full()`: The original code used `np.ones(shape) * color`, which allocates an array of ones (float64 by default) and then multiplies it by the color. The optimized version uses `np.full(shape, color, dtype=np.uint8)`, which creates the array directly with the desired values and dtype, eliminating the multiplication step.
- Eliminated expensive itertools operations: The original code used `itertools.chain.from_iterable(zip(row, [vertical_padding] * grid_size[1]))`, which builds several intermediate objects before flattening them. The optimized version interleaves tiles and padding with slice assignment (`row_with_paddings[::2] = row` and `row_with_paddings[1::2] = vertical_padding_row[:-1]`).
- Replaced `np.concatenate` with `np.vstack`: Rows are now stacked along the first axis with `np.vstack` instead of `np.concatenate(axis=0)`. Because `np.vstack` itself delegates to `np.concatenate` after promoting its inputs to at least 2-D, any gain from this change is likely small compared with the other items.
- Improved list construction: Instead of repeatedly appending to lists in loops, the optimized version preallocates lists of known length where possible and fills them by index or slice.
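The first two optimizations above can be illustrated with a minimal sketch. The helper names below (`padding_via_ones`, `padding_via_full`, `interleave_with_chain`, `interleave_with_slices`) are hypothetical and are not taken from the PR; the sketch also assumes the trailing padding element is dropped after interleaving, which the description implies but does not show.

```python
import itertools

import numpy as np


def padding_via_ones(shape, color):
    # Pattern described as the original: a float64 array of ones,
    # then an element-wise multiply by the color.
    return np.ones(shape) * color


def padding_via_full(shape, color):
    # Pattern described as the optimized one: fill directly, as uint8.
    return np.full(shape, color, dtype=np.uint8)


def interleave_with_chain(row, padding):
    # zip pairs every tile with a padding element, chain flattens the pairs;
    # the final padding element is dropped (assumption, see lead-in).
    pairs = zip(row, [padding] * len(row))
    return list(itertools.chain.from_iterable(pairs))[:-1]


def interleave_with_slices(row, padding):
    # Preallocate the result, then fill even slots with tiles and
    # odd slots with padding via extended slice assignment.
    out = [None] * (2 * len(row) - 1)
    out[::2] = row
    out[1::2] = [padding] * (len(row) - 1)
    return out
```

Both padding helpers produce the same pixel values, but only the `np.full` variant yields a `uint8` array without a later cast. Both interleaving helpers return the same sequence for a non-empty row.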
Performance Impact by Test Cases:
Function Usage Context:
Based on the function reference, `_merge_tiles_elements` is called from `_generate_tiles`, which appears to be part of an image tiling/visualization pipeline. This suggests the function is likely used for creating composite images from multiple smaller images, potentially in batch processing scenarios where the performance gains would compound across multiple calls.

The optimizations are particularly valuable for computer vision workflows where large numbers of images need to be arranged in grids with margins, as the improvements scale with both grid size and individual tile size.
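To make the grid-with-margins use case concrete, here is a rough sketch of how tiles might be merged using the techniques described in this PR. It is not the actual `_merge_tiles_elements` implementation: the function name `merge_grid_sketch`, its parameters, and the assumption that all tiles are same-sized 3-channel `uint8` images are illustrative only.

```python
import numpy as np


def merge_grid_sketch(rows, margin, color):
    """Merge a grid of equally sized HxWx3 uint8 tiles, separated by margins.

    Hypothetical helper: `rows` is a non-empty list of rows, each a
    non-empty list of tiles with identical shapes.
    """
    tile_h = rows[0][0].shape[0]
    n_cols = len(rows[0])

    # Vertical strip placed between neighbouring tiles in a row.
    v_pad = np.full((tile_h, margin, 3), color, dtype=np.uint8)

    merged_rows = []
    for row in rows:
        # Interleave tiles and vertical padding with slice assignment.
        parts = [None] * (2 * n_cols - 1)
        parts[::2] = row
        parts[1::2] = [v_pad] * (n_cols - 1)
        merged_rows.append(np.hstack(parts))

    # Horizontal strip placed between neighbouring rows.
    row_w = merged_rows[0].shape[1]
    h_pad = np.full((margin, row_w, 3), color, dtype=np.uint8)

    stacked = [None] * (2 * len(merged_rows) - 1)
    stacked[::2] = merged_rows
    stacked[1::2] = [h_pad] * (len(merged_rows) - 1)
    return np.vstack(stacked)
```

For a 2x3 grid of 4x5 tiles with a 2-pixel margin, this sketch yields an image of shape `(10, 19, 3)`: two tile rows plus one horizontal margin, and three tile columns plus two vertical margins.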
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-_merge_tiles_elements-miqlt5zc` and push.