⚡️ Speed up method StatelessScope.get_current_value by 19%
#169
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 19% (0.19x) speedup for
StatelessScope.get_current_valueinkeras/src/backend/common/stateless_scope.py⏱️ Runtime :
1.15 microseconds→970 nanoseconds(best of35runs)📝 Explanation and details
The optimized code achieves an 18% speedup primarily through attribute lookup caching and minor initialization optimizations in the
__init__method.Key optimizations applied:
Cached attribute lookups: Instead of repeatedly resolving
backend.cast,backend.convert_to_tensor, andVariableon each loop iteration, these are cached once as local variables (backend_cast,backend_convert_to_tensor,VariableType). This eliminates multiple dictionary lookups in Python's module namespace during the loop.Empty sequence optimization: Changed default from
state_mapping or {}tostate_mapping or (), avoiding unnecessary dict construction when the parameter is None, since the code iterates over it as a sequence anyway.Why this leads to speedup:
()is a singleton and requires no memory allocation, unlike{}Impact on workloads:
Based on the test cases, this optimization is most beneficial when
StatelessScopeis instantiated with non-emptystate_mappingparameters, as the cached lookups reduce overhead proportional to the mapping size. The optimization maintains identical behavior and error handling - all validation logic, shape checking, and exception messages remain unchanged.The
get_current_valuemethod shows minimal improvement (265ns reduction) as it was already near-optimal with a simple dictionary lookup.✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-StatelessScope.get_current_value-mirlutkoand push.