Left matmul 100x performance improvements #44

Merged: dance858 merged 13 commits into main from profile-left-matmul on Feb 13, 2026
@dance858 (Collaborator) commented Feb 12, 2026

This PR accelerates left matrix multiplication by 100x (!) by avoiding explicit Kronecker product construction. Instead of treating the operation as a generic sparse matrix–matrix multiply, we use specialized logic that exploits the block/Kronecker structure. The initialization that took 15 seconds on one of Max's problems now takes 0.17 seconds.
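
For readers unfamiliar with the trick, here is a minimal NumPy sketch of the identity it relies on, assuming column-major vectorization: vec(AX) = (I_n ⊗ A) vec(X). This is only an illustration and not the code in this PR; the function names and problem sizes are made up.

```python
import numpy as np

def left_matmul_kron(A, x, n):
    """Baseline: build I_n (x) A explicitly, then multiply vec(X)."""
    K = np.kron(np.eye(n), A)  # shape (m*n, k*n), mostly zeros
    return K @ x

def left_matmul_structured(A, x, n):
    """Same result without ever forming the Kronecker product."""
    k = A.shape[1]
    X = x.reshape((k, n), order="F")        # undo the column-major vec
    return (A @ X).reshape(-1, order="F")   # re-vectorize A @ X

# Illustrative sizes only (assumed, not taken from the PR).
rng = np.random.default_rng(0)
m, k, n = 4, 3, 5
A = rng.standard_normal((m, k))
x = rng.standard_normal(k * n)

y_kron = left_matmul_kron(A, x, n)
y_fast = left_matmul_structured(A, x, n)
```

The structured version touches only the m-by-k matrix A, whereas the baseline allocates an (m·n)-by-(k·n) matrix that is mostly zeros. That is the kind of overhead the specialized logic avoids.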

This will make the parameter code for left matmul much simpler, since you only need to update A when refreshing a parameter @Transurgeon.

I have not yet checked that our Python tests in DNLP pass with this code, so let's hold off on merging until I've done so.

We should also do this refactor for right matmul, but that's for another day. I wonder if Claude can code it up by mimicking my implementation of left_matmul? @Transurgeon

@dance858 (Collaborator, Author) commented

I verified that it works on the Python side and added a few extra tests there. I also addressed your comments, @Transurgeon, so I'm merging this.

dance858 merged commit 0e804a0 into main on Feb 13, 2026
11 checks passed
dance858 deleted the profile-left-matmul branch on February 13, 2026 20:58
Transurgeon added a commit that referenced this pull request Feb 13, 2026
Sync parameter-support with main's left matmul 100x perf improvements
(PR #44) and right matmul refactor (PR #46). Simplify param matmul to
store only the small A matrix instead of block-diagonal — block_left_multiply_*
functions handle the rest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
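
To illustrate the "store only the small A matrix" idea from the commit message above, a rough sketch follows. The class and method names are hypothetical and do not correspond to the actual block_left_multiply_* functions, and a dense NumPy array stands in for whatever sparse Jacobian format the library uses.

```python
import numpy as np

class ParamLeftMatmul:
    """Hypothetical holder for the small parameter matrix A (not the repo's type)."""

    def __init__(self, A, n_blocks):
        self.A = np.asarray(A, dtype=float)
        self.n_blocks = n_blocks

    def refresh(self, A_new):
        # Only the small matrix is overwritten; no block-diagonal rebuild.
        self.A = np.asarray(A_new, dtype=float)

    def block_left_multiply(self, J):
        """Compute (I_n kron A) @ J by applying A to each row block of J."""
        m, k = self.A.shape
        blocks = J.reshape(self.n_blocks, k, -1)  # n row blocks of height k
        return (self.A @ blocks).reshape(self.n_blocks * m, -1)

# Illustrative sizes only (assumed).
rng = np.random.default_rng(1)
m, k, n, p = 3, 2, 4, 6
op = ParamLeftMatmul(rng.standard_normal((m, k)), n_blocks=n)
J = rng.standard_normal((n * k, p))

A_new = rng.standard_normal((m, k))
op.refresh(A_new)                       # parameter update: just swap A
out = op.block_left_multiply(J)
reference = np.kron(np.eye(n), A_new) @ J
```

Under these assumptions, refreshing a parameter costs one m-by-k copy, and the block-diagonal matrix is only ever formed here to build `reference` for comparison.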