⚡️ Speed up function compute_conv_output_shape by 94%
#160
📄 94% (0.94x) speedup for `compute_conv_output_shape` in `keras/src/ops/operation_utils.py`
⏱️ Runtime: 946 microseconds → 487 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 94% speedup by eliminating expensive NumPy array operations and replacing them with efficient Python list comprehensions and native arithmetic.
Key optimizations:
- **Eliminated unnecessary NumPy array conversions:** The original code converted `spatial_shape`, `kernel_shape[:-2]`, and `dilation_rate` to NumPy arrays even when only basic indexing and arithmetic were needed. The optimized version keeps these as native Python tuples/lists.
- **Replaced vectorized NumPy operations with list comprehensions:** The most expensive operations were the NumPy vectorized calculations for `output_spatial_shape`. These are now computed element-wise using list comprehensions with explicit indexing, avoiding NumPy's overhead for small arrays (typically 1–3 dimensions).
- **Streamlined `None` dimension handling:** Instead of mutating a NumPy array in a loop to handle `None` dimensions, the optimized version uses a single list comprehension to identify `None` positions and a tuple comprehension to create the calculation-ready spatial shape.
- **Eliminated redundant array operations:** Removed the final `[int(i) for i in output_spatial_shape]` conversion, since the list comprehensions already produce integers directly.

**Why this works:** For small arrays (1–3D convolutions are the most common case), NumPy's vectorization overhead outweighs its benefits. The function references show this is called from convolutional layer constructors during model building, where the 94% speedup significantly improves model initialization time. The optimization is particularly effective for the common test cases with valid/same padding, showing 70–100% improvements across different input configurations.
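To illustrate the list-comprehension approach described above, here is a minimal sketch of computing conv output shapes with native Python arithmetic instead of NumPy arrays. This is a hypothetical simplified helper for illustration, not the actual Keras implementation or its signature; the formulas are the standard ones (`valid`: `floor((i - d*(k-1) - 1)/s) + 1`, `same`: `ceil(i/s)`).

```python
import math

def conv_output_shape(spatial_shape, kernel_shape, strides,
                      padding="valid", dilation_rate=1):
    """Hypothetical sketch: per-dimension output sizes via list/tuple
    comprehensions and native ints, with no NumPy involved."""
    ndim = len(spatial_shape)
    # Normalize scalar strides/dilations to per-dimension tuples.
    if isinstance(strides, int):
        strides = (strides,) * ndim
    if isinstance(dilation_rate, int):
        dilation_rate = (dilation_rate,) * ndim
    # Effective kernel size after dilation: d * (k - 1) + 1.
    effective = [dilation_rate[i] * (kernel_shape[i] - 1) + 1
                 for i in range(ndim)]
    if padding == "valid":
        # floor((input - effective_kernel) / stride) + 1;
        # None (unknown) dimensions pass through unchanged.
        return tuple(
            None if spatial_shape[i] is None
            else (spatial_shape[i] - effective[i]) // strides[i] + 1
            for i in range(ndim)
        )
    elif padding == "same":
        # ceil(input / stride), independent of kernel size.
        return tuple(
            None if spatial_shape[i] is None
            else math.ceil(spatial_shape[i] / strides[i])
            for i in range(ndim)
        )
    raise ValueError(f"Unsupported padding: {padding!r}")
```

For example, `conv_output_shape((28, 28), (3, 3), 1)` gives `(26, 26)`, and a `None` batch-unknown dimension survives untouched, matching the `None`-handling behavior described above. Because everything stays in plain Python ints, there is no array-construction overhead for these tiny 1–3 element sequences.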
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-compute_conv_output_shape-mirf96kr` and push.