⚡️ Speed up function _dummy_copy by 7%
#49
📄 7% (0.07x) speedup for `_dummy_copy` in `xarray/core/groupby.py`
⏱️ Runtime: 696 microseconds → 652 microseconds (best of 5 runs)
📝 Explanation and details
The optimization introduces LRU caching to the `get_fill_value` function, which eliminates redundant computations of the expensive `maybe_promote(dtype)` call.

What changed:
- Added `@functools.lru_cache(maxsize=128)` on a new `_get_fill_value_cached` function that wraps the original logic
- Changed `get_fill_value` to delegate to the cached version (see the sketch below)

Why this speeds up the code:
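A minimal sketch of the described change (not the exact diff from this PR), assuming `maybe_promote` returns a `(promoted_dtype, fill_value)` pair as in `xarray.core.dtypes`:

```python
import functools

from xarray.core.dtypes import maybe_promote  # existing xarray helper


@functools.lru_cache(maxsize=128)
def _get_fill_value_cached(dtype):
    # NumPy dtype objects are hashable and immutable, so they are valid cache
    # keys; the expensive maybe_promote call now runs once per distinct dtype.
    _, fill_value = maybe_promote(dtype)
    return fill_value


def get_fill_value(dtype):
    # The public entry point keeps its original signature and simply
    # delegates to the cached helper.
    return _get_fill_value_cached(dtype)
```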
The profiler shows `maybe_promote(dtype)` consuming 98.6% of `get_fill_value`'s runtime (67,850 ns out of 68,839 ns total). Since dtypes are immutable and fill values are deterministic, caching eliminates this repeated work. With caching, the optimized version shows `get_fill_value` taking only 39,964 ns total, a 42% reduction in this function's execution time.

Impact on workloads:
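The caching is safe because equal NumPy dtypes compare and hash equally, which is exactly what `functools.lru_cache` relies on. A small illustration (not from the PR):

```python
import numpy as np

# Two spellings of the same dtype produce equal, identically hashing objects,
# so an lru_cache keyed on the dtype can reuse a previously computed fill value.
a = np.dtype("float64")
b = np.dtype(np.float64)
assert a == b and hash(a) == hash(b)
```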
The function references show that `_dummy_copy` is called from `_iter_over_selections` in `computation.py`, which processes multiple selections over datasets/arrays. This creates a hot path where the same dtypes appear repeatedly, making the cache highly effective. The 6% overall speedup demonstrates the cumulative benefit when `get_fill_value` is called multiple times with the same dtype values.

Test case performance:
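A hedged illustration of the hot-path effect, reusing the names defined in the sketch above: only the first call per dtype pays for `maybe_promote`; every later call is a cache hit.

```python
import numpy as np

# Simulate a selection loop that keeps asking for fill values of the same dtypes.
for _ in range(1000):
    get_fill_value(np.dtype("float64"))
    get_fill_value(np.dtype("int32"))

# All but the first call per dtype are cache hits.
print(_get_fill_value_cached.cache_info())  # e.g. CacheInfo(hits=1998, misses=2, ...)
```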
The annotated tests show 7-11% improvements in simple test cases, indicating the optimization is particularly effective for workloads with repeated dtype operations - exactly what the LRU cache is designed to accelerate.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
⏪ Replay Tests and Runtime
test_pytest_xarrayteststest_concat_py_xarrayteststest_computation_py_xarrayteststest_formatting_py_xarray__replay_test_0.py::test_xarray_core_groupby__dummy_copy

To edit these changes, run `git checkout codeflash/optimize-_dummy_copy-mij01uin` and push.