⚡️ Speed up function bincount by 21%
#148
Open
📄 21% (0.21x) speedup for `bincount` in `keras/src/backend/openvino/numpy.py`

⏱️ Runtime: 20.2 milliseconds → 16.7 milliseconds (best of 30 runs)

📝 Explanation and details
The optimization achieves a 21% speedup by introducing LRU caching for frequently created OpenVINO constants in the `bincount` function, which is the primary performance bottleneck.

Key optimizations applied:

- Constant Caching with LRU Cache: Added three cached helper functions (`_ov_const`, `_ov_const_notype`, `_ov_const_empty`) that cache the results of `ov_opset.constant()` calls. This eliminates repeated creation of identical OpenVINO constant tensors for scalar values like -1, 0, 1, and empty shapes.
- Combined Type Checking in `get_ov_output`: Merged the separate `isinstance(x, float)` and `isinstance(x, int)` checks into a single `isinstance(x, (float, int))` check, reducing redundant type checking overhead.

Why this leads to speedup:
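As a rough illustration of why a cached constant helper saves work, the sketch below wraps a constant factory in `functools.lru_cache` and counts how many objects are actually built. It does not use OpenVINO: `FakeConstant`, `make_constant`, and `cached_constant` are hypothetical stand-ins invented for this example, not the helpers added in this PR.

```python
from functools import lru_cache


class FakeConstant:
    """Hypothetical stand-in for an expensive-to-build constant node."""

    created = 0  # counts how many instances have been constructed

    def __init__(self, value, dtype):
        FakeConstant.created += 1
        self.value = value
        self.dtype = dtype


def make_constant(value, dtype):
    # Assumed to play the role of an uncached constant factory call.
    return FakeConstant(value, dtype)


# typed=True keeps 1 and 1.0 as separate cache entries; without it,
# lru_cache treats them as the same key because they compare equal.
@lru_cache(maxsize=128)
def _untyped_example(value, dtype):
    return make_constant(value, dtype)


@lru_cache(maxsize=128, typed=True)
def cached_constant(value, dtype):
    # Arguments must be hashable, so a list such as [] cannot be passed here.
    return make_constant(value, dtype)


first = cached_constant(-1, "i32")   # cache miss: builds a new object
second = cached_constant(-1, "i32")  # cache hit: returns the same object
other = cached_constant(-1, "i64")   # different dtype: separate entry

print(first is second)               # True
print(FakeConstant.created)          # 2
print(cached_constant.cache_info())  # hits=1, misses=2
```

Because `lru_cache` requires hashable arguments, an empty-shape constant built from `[]` cannot go through a helper like `cached_constant` directly. A dedicated helper for that case is one way around this, though whether that is the reason for `_ov_const_empty` in this PR is an assumption.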
The line profiler reveals that constant creation (`ov_opset.constant().output(0)`) was consuming significant time in the original code:

- `scalar_shape = ov_opset.constant([], x_type).output(0)` took 11.5% of total time
- `const_minus_one = ov_opset.constant(-1, x_type).output(0)` took 5.4% of total time
- `const_one` and `const_zero` creation

With caching, these expensive constant creation operations are reduced from multiple calls per function invocation to just cache lookups after the first creation. The optimized version shows these operations now take significantly less time (0.9% for `scalar_shape`, 1.1% for `const_minus_one`).

Test case performance benefits:
The optimization particularly benefits test cases that involve multiple calls to `bincount` or operations with repeated scalar constants, as evidenced by the 6-20% improvements in various edge case tests. The caching is most effective when the same data types are used repeatedly across function calls, which is common in machine learning workloads where tensors often share consistent dtypes.

This optimization is especially valuable in ML pipelines where `bincount` may be called frequently with similar input types, maximizing the benefit of the cached constants.

✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-bincount-mir4sqtm` and push.