
[WIP] Matrix abstraction #49

Merged
Transurgeon merged 11 commits into main from matrix-abstraction
Mar 11, 2026

Conversation

@dance858 (Collaborator) commented Mar 7, 2026

Uncommented Windows, remember to add later.

@dance858 dance858 changed the title Matrix abstraction [WIP] Matrix abstraction Mar 7, 2026

@Transurgeon Transurgeon left a comment


Really nice first iteration at cleaning this up with the Matrix class.
One thing that could be added in this PR is a constructor for right matmul (dense); what do you think?
The speedups from batched matrix-vector products can be added later.

Comment on lines +150 to +152
cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, 1.0, dm->x, n, j_dense, 1,
0.0, C->x + i, 1);
}
Collaborator


Claude suggests that we could use batched matrix-vector products (via dgemm) with a batch size of 256.
This seems to enable the major speedups, but we can do that in another PR.

Collaborator Author


Have you actually tried it to see that it's significantly faster? It is certainly possible to do, but it would probably require us to collect blocks of J first, which would likely lead to much more complicated code.

Collaborator


Yeah, I did; it makes a big difference. But let's revisit this in another PR.

Collaborator Author


Awesome! Let's do that in a new PR.

Comment on lines +43 to +44
static CSC_Matrix *dense_block_left_mult_sparsity(const Matrix *A,
const CSC_Matrix *J, int p)
Collaborator

@Transurgeon Transurgeon Mar 9, 2026


Claude suggests we could do this to avoid dynamic allocation with iVec:

The sparsity function uses iVec (a growable array) to accumulate row indices. Since a dense A means every non-empty block contributes exactly m rows, we know the nnz per column upfront. A two-pass approach (count, then fill) avoids dynamic reallocation:

// Pass 1: count nnz per column
for (j = 0; j < k; j++) {
    col_nnz = 0;
    for each block with entries: col_nnz += m;
    Cp[j+1] = Cp[j] + col_nnz;
}
// Pass 2: fill row indices directly into C->i


@Transurgeon Transurgeon left a comment


I think we can merge this!
There seem to be some code formatting issues, though.

@Transurgeon Transurgeon merged commit 9480159 into main Mar 11, 2026
11 checks passed
2 participants