⚡️ Speed up function tril by 81%
#153
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 81% (0.81x) speedup for
trilinkeras/src/backend/openvino/numpy.py⏱️ Runtime :
13.5 milliseconds→7.47 milliseconds(best of52runs)📝 Explanation and details
The optimized code achieves an 81% speedup through two key optimization strategies:
1. Constant Caching in OpenVINO Operations
The most significant improvement comes from caching frequently-used OpenVINO constants at module level:
In the original code, functions like
tri()repeatedly created the same constants (e.g.,ov_opset.constant(0, Type.i32)) on every call. The profiler shows these constant creation calls taking significant time - for example, theov_opset.constant(0, Type.i32)calls inrow_rangeandcol_rangecreation consumed ~13% of total execution time.By pre-computing and reusing these constants, the optimized version eliminates redundant OpenVINO graph node creation, reducing the
tri()function time from 39.6ms to 13.4ms (~66% improvement).2. Streamlined Control Flow in get_ov_output
The optimized version uses direct returns and simplified conditional expressions:
This eliminates intermediate variable assignments and reduces function call overhead.
Performance Impact Analysis:
tril()function shows the most dramatic improvement (65ms → 26ms), as it heavily callstri()which benefits from constant cachingThe cached constants approach is especially valuable in OpenVINO's graph-based computation model where each constant creation involves graph node allocation overhead.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-tril-mirc9edvand push.