⚡️ Speed up function tri by 22%
#152
Open
📄 22% (0.22x) speedup for `tri` in `keras/src/backend/openvino/numpy.py`

⏱️ Runtime: 15.1 milliseconds → 12.4 milliseconds (best of 36 runs)

📝 Explanation and details
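As a sanity check on the headline number, the reported speedup follows directly from the two runtimes (a quick worked calculation, not part of the PR itself):

```python
# Speedup implied by the reported runtimes: (old - new) / new.
old_ms, new_ms = 15.1, 12.4
speedup = (old_ms - new_ms) / new_ms
print(f"{speedup:.2f}x")  # -> 0.22x, i.e. ~22% faster
```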
The optimization achieves a 21% speedup by introducing a constant caching mechanism that eliminates redundant OpenVINO constant creation operations.
Key optimizations applied:

- Constant caching: added a `_const_cache` dictionary to store frequently used `ov_opset.constant` objects, avoiding repeated Python→C++ conversions for the same values (`0`, `1`, `[0]`, `[1]`).
- Fast path for common types: optimized the `ensure_constant` function with separate handling for integers and floats, the most common input types, reducing branching overhead.
- Reused constant arrays: cached the common constant arrays `[0]` and `[1]` used in `unsqueeze` operations, eliminating repeated allocations of identical constants.

Why this leads to speedup:

- Each `ov_opset.constant()` call involves expensive Python→C++ marshalling; caching eliminates the duplicate calls.
- The `ensure_constant` function is called multiple times per invocation, so optimizing it has multiplicative benefits.

Impact on workloads:
The `tri` function is called by the `tril` and `triu` functions (as shown in the function references), which are commonly used matrix operations. Since these functions may be called in loops or repeatedly during model operations, the 21% speedup compounds significantly.

Test case performance:
The optimization particularly benefits larger matrices (17.6% faster on 100×100) and error-handling paths (35.1% faster for invalid inputs), showing broad improvements across different usage patterns.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-tri-mirbrff0` and push.