Skip to content

GraphBLAS backend#42

Open
Transurgeon wants to merge 18 commits intomasterfrom
graphblas-backend
Open

GraphBLAS backend#42
Transurgeon wants to merge 18 commits intomasterfrom
graphblas-backend

Conversation

@Transurgeon
Copy link
Owner

This PR attempts to add a new backend written in python-graphBLAS. This will be an optional dependency.
The next steps for this to be ready will be to:

  1. refactor the operations so that they return a gb.Matrix every time.
  2. add some extra test cases for backend specific.
  3. verify all unit-tests are passing and update any additional changes.
  4. Whenever the matrix order feature is added to graphBLAS, we can update the non-parametrised operations to stay in row major order.

@eriknw
Copy link

eriknw commented Oct 16, 2023

👋 👀

@Transurgeon
Copy link
Owner Author

I have tested the performance of this backend versus the SciPy one on the following example (taken from the benchmark suite, which had a 13x slowdown)

import cvxpy as cp
import numpy as np
import scipy.stats as st
from cvxpy.settings import GRAPHBLAS_CANON_BACKEND

rs = np.random.RandomState(123)
N = 50
T = 350
cov = rs.rand(N, N) * 1.5 - 0.5
cov = cov @ cov.T / 1000 + np.diag(rs.rand(N) * 0.7 + 0.3) / 1000
mean = np.zeros(N) + 1 / 1000
returns = st.multivariate_normal.rvs(mean=mean, cov=cov, size=T, random_state=rs)

d = cp.Variable((int(T * (T - 1) / 2), 1))
w = cp.Variable((N, 1))
constraints = []
ret_w = cp.Variable((T, 1))
constraints.append(ret_w == returns @ w)
mat = np.zeros((d.shape[0], T))
"""
We need to create a vector that has the following entries:
    ret_w[i] - ret_w[j]
for j in range(T), for i in range(j+1, T).
We do this by building a numpy array of mostly 0's and 1's.
(It would be better to use SciPy sparse matrix objects.)
"""
ell = 0
for j in range(T):
    for i in range(j + 1, T):
        # write to mat so that (mat @ ret_w)[ell] == var_i - var_j
        mat[ell, i] = 1
        mat[ell, j] = -1
        ell += 1
all_pairs_ret_diff = mat @ ret_w
constraints += [d >= all_pairs_ret_diff,
                d >= -all_pairs_ret_diff,
                w >= 0,
                cp.sum(w) == 1]
risk = cp.sum(d) / ((T - 1) * T)
objective = cp.Minimize(risk * 1000)
problem = cp.Problem(objective, constraints)
#print(problem.get_problem_data(solver=cp.CLARABEL, canon_backend=GRAPHBLAS_CANON_BACKEND))
#print(problem.get_problem_data(solver=cp.CLARABEL, canon_backend=cp.SCIPY_CANON_BACKEND))
print(problem.solve(solver=cp.CLARABEL, canon_backend=GRAPHBLAS_CANON_BACKEND))
#print(problem.solve(solver=cp.CLARABEL, canon_backend=cp.SCIPY_CANON_BACKEND))

@PTNobel
Copy link

PTNobel commented Jan 3, 2024

Okay, after my exploration it looks like the issue is the following: when constructing the constant_data lin_op of a NumPy matrix that is actually sparse, the SciPy backend is correctly filtering out the zero entries, but the graphblas backend is recording them as actual values.

Here's a script that shows the difference in behavior between the two libraries:

import graphblas as gb
import scipy.sparse as sp
import numpy as np

A = np.zeros((100, 100))

print('scipy:', sp.csc_matrix(A).nnz)
print('graphblas:', gb.Matrix.from_dense(A).nvals)

@Transurgeon
Copy link
Owner Author

@eriknw good news! We have fixed one of the major slowdowns for the graphBLAS backend.
It was due to my mishap as I forgot to include the parameter missing_value=0 in the gb.Matrix.from_dense function call. This create a huge 20m+ entry "sparse" matrix. Thanks to Parth, we were able to trace back to the root of the issue.
There are still 2-3 more benchmarks that are slower with graphBLAS, so I will try to debug into those this upcoming week.
See the image below for the new benchmark comparison vs SciPy
image
I think we might be able to release this backend as an optional dependency in a future version of cvxpy!

P.S. Im just curious, why did you guys choose to add the additional parameter missing_value? Is there a use case where it would be something other than 0? I feel like it could potentially be better to have a default value set to 0, for the people who don't read the docs carefully like I did :) . Let me know your thoughts on this.

@eriknw
Copy link

eriknw commented Jan 10, 2024

That's great! I'm eager to get my hands dirty with this. Let me know if there's anything specific you'd like me to look at.

P.S. Im just curious, why did you guys choose to add the additional parameter missing_value? Is there a use case where it would be something other than 0? I feel like it could potentially be better to have a default value set to 0, for the people who don't read the docs carefully like I did :) . Let me know your thoughts on this.

That missing values are not assumed to be zero is a fundamental principle of GraphBLAS. Also, "dense" is in the name ;). How can we update the docstring to make it clearer?

Here's the PR where we added this... and specifically where missing_value is used:
https://github.com/python-graphblas/python-graphblas/pull/382/files#diff-030428fe7505f6ad25207ca3e12b2fdc197af1dd4cf219ca6e743712a3f6340eR1452-R1453

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants