⚡ Optimize RichardsGlu::compute_gradients by removing unnecessary clones #28
This PR optimizes `RichardsGlu::compute_gradients` by eliminating unnecessary `Array2` cloning operations when using cached values.

💡 What:

- Refactored `src/richards/richards_glu.rs` to use a conditional borrowing pattern. Instead of cloning the cached `Option<Array2<f32>>`, the code now borrows the cached reference if available, or creates a new owned array (stored in a local variable) and borrows it if not (see the sketch below).
- Downstream code now works with `&Array2<f32>` references.
- Added the `richards_glu_bench` benchmark to verify performance.

🎯 Why:

The previous code used `.cloned().unwrap_or_else(...)`, which forced a full matrix copy even when the cache was present, causing significant allocation and memory-copy overhead during training.

📈 Measured Improvement:

Approximately 7.4% faster `compute_gradients` (~31.46 ms → ~29.13 ms; see the summary below).

PR created automatically by Jules for task 4074737496829598444, started by @ryancinsight.
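A minimal sketch of the borrowing pattern, assuming `ndarray` and a hypothetical `expensive_forward` helper (the names here are illustrative, not the actual code in `src/richards/richards_glu.rs`):

```rust
use ndarray::Array2;

// Hypothetical stand-in for the real forward computation.
fn expensive_forward(input: &Array2<f32>) -> Array2<f32> {
    input.mapv(|x| x.max(0.0) * x)
}

fn compute_gradients(cache: Option<&Array2<f32>>, input: &Array2<f32>) -> Array2<f32> {
    // Before: `cache.cloned().unwrap_or_else(|| expensive_forward(input))`
    // copied the full matrix even on a cache hit.

    // After: borrow the cached matrix when present; allocate only on a miss.
    let fallback;
    let activations: &Array2<f32> = match cache {
        Some(cached) => cached,
        None => {
            fallback = expensive_forward(input);
            &fallback
        }
    };

    // Downstream code works with `&Array2<f32>` references throughout.
    activations.mapv(|g| g * 2.0)
}
```

Declaring `fallback` before the borrow is what lets the owned array outlive the reference; this is the "stored in a local variable" part of the change.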
High-level PR Summary
This PR optimizes the `compute_gradients` method in `RichardsGlu` by replacing expensive cloning operations with conditional borrowing. Instead of cloning cached `Array2<f32>` matrices using `cloned().unwrap_or_else(...)`, the code now uses a pattern that borrows cached references when available and creates owned values in local variables on cache misses. This eliminates unnecessary memory allocations and copies during the training forward pass, achieving approximately a 7.4% performance improvement (from ~31.46 ms to ~29.13 ms). A new benchmark is included to measure the optimization's impact; a sketch of such a benchmark follows the review order below.

⏱️ Estimated Review Time: 5-15 minutes
💡 Review Order Suggestion
1. Cargo.toml
2. benches/richards_glu_bench.rs
3. src/richards/richards_glu.rs
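For context, a Criterion benchmark for this kind of change typically looks like the sketch below. This is an illustration under stated assumptions, not the actual contents of `benches/richards_glu_bench.rs`; the input shape and the stub under test are placeholders:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use ndarray::Array2;

// Hypothetical stand-in for RichardsGlu::compute_gradients.
fn compute_gradients_stub(input: &Array2<f32>) -> Array2<f32> {
    input.mapv(|x| x * 0.5)
}

fn bench_compute_gradients(c: &mut Criterion) {
    let input = Array2::<f32>::ones((512, 512));
    c.bench_function("compute_gradients", |b| {
        b.iter(|| compute_gradients_stub(black_box(&input)))
    });
}

criterion_group!(benches, bench_compute_gradients);
criterion_main!(benches);
```

Run with `cargo bench --bench richards_glu_bench` to reproduce the before/after numbers.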