I have tested the performance of this backend against the SciPy one on the following example (taken from the benchmark suite), which showed a 13x slowdown:

```python
import cvxpy as cp
import numpy as np
import scipy.stats as st
from cvxpy.settings import GRAPHBLAS_CANON_BACKEND

rs = np.random.RandomState(123)
N = 50
T = 350
cov = rs.rand(N, N) * 1.5 - 0.5
cov = cov @ cov.T / 1000 + np.diag(rs.rand(N) * 0.7 + 0.3) / 1000
mean = np.zeros(N) + 1 / 1000
returns = st.multivariate_normal.rvs(mean=mean, cov=cov, size=T, random_state=rs)

d = cp.Variable((int(T * (T - 1) / 2), 1))
w = cp.Variable((N, 1))
constraints = []
ret_w = cp.Variable((T, 1))
constraints.append(ret_w == returns @ w)

mat = np.zeros((d.shape[0], T))
"""
We need to create a vector that has the following entries:
    ret_w[i] - ret_w[j]
for j in range(T), for i in range(j+1, T).
We do this by building a numpy array of mostly 0's and 1's.
(It would be better to use SciPy sparse matrix objects.)
"""
ell = 0
for j in range(T):
    for i in range(j + 1, T):
        # write to mat so that (mat @ ret_w)[ell] == ret_w[i] - ret_w[j]
        mat[ell, i] = 1
        mat[ell, j] = -1
        ell += 1

all_pairs_ret_diff = mat @ ret_w
constraints += [d >= all_pairs_ret_diff,
                d >= -all_pairs_ret_diff,
                w >= 0,
                cp.sum(w) == 1]
risk = cp.sum(d) / ((T - 1) * T)
objective = cp.Minimize(risk * 1000)
problem = cp.Problem(objective, constraints)
# print(problem.get_problem_data(solver=cp.CLARABEL, canon_backend=GRAPHBLAS_CANON_BACKEND))
# print(problem.get_problem_data(solver=cp.CLARABEL, canon_backend=cp.SCIPY_CANON_BACKEND))
print(problem.solve(solver=cp.CLARABEL, canon_backend=GRAPHBLAS_CANON_BACKEND))
# print(problem.solve(solver=cp.CLARABEL, canon_backend=cp.SCIPY_CANON_BACKEND))
```
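As the inline comment in the benchmark notes, `mat` would be better built as a SciPy sparse matrix. A minimal sketch of that construction (the helper name `pairwise_diff_matrix` is mine, not from the PR):

```python
import numpy as np
import scipy.sparse as sp

def pairwise_diff_matrix(T: int) -> sp.csr_matrix:
    """Sparse (T*(T-1)/2, T) matrix M with (M @ x)[ell] == x[i] - x[j]
    for j in range(T), i in range(j+1, T), in the same order as the loop."""
    rows, cols, vals = [], [], []
    ell = 0
    for j in range(T):
        for i in range(j + 1, T):
            rows += [ell, ell]
            cols += [i, j]
            vals += [1.0, -1.0]
            ell += 1
    shape = (T * (T - 1) // 2, T)
    return sp.coo_matrix((vals, (rows, cols)), shape=shape).tocsr()

# Sanity check against the dense loop for a small T.
T = 6
dense = np.zeros((T * (T - 1) // 2, T))
ell = 0
for j in range(T):
    for i in range(j + 1, T):
        dense[ell, i] = 1
        dense[ell, j] = -1
        ell += 1
assert np.allclose(pairwise_diff_matrix(T).toarray(), dense)
```

This stores only 2 nonzeros per row instead of T entries, so the constant passed to the canonicalization backend is genuinely sparse.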
Okay, after my exploration it looks like the issue is the following: when constructing the constant_data lin_op of a NumPy matrix that is actually sparse, the SciPy backend correctly filters out the zero entries, but the GraphBLAS backend records them as stored values. Here's a script that shows the difference in behavior between the two libraries:

```python
import graphblas as gb
import numpy as np
import scipy.sparse as sp

A = np.zeros((100, 100))
print('scipy:', sp.csc_matrix(A).nnz)               # 0: zeros are dropped
print('graphblas:', gb.Matrix.from_dense(A).nvals)  # 10000: every entry is stored
```
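For context on the SciPy side: `csc_matrix` drops zeros when converting from a dense array, but explicit zeros can still end up stored when a matrix is built from raw `(data, (row, col))` triples; `eliminate_zeros()` prunes them. A small sketch of this distinction:

```python
import numpy as np
import scipy.sparse as sp

# Converting from a dense array drops zero entries up front.
dense = np.eye(3)
assert sp.csc_matrix(dense).nnz == 3  # only the three ones are stored

# Constructing from (data, (row, col)) keeps explicit zeros as stored values.
data = np.array([1.0, 0.0, 2.0])
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 2])
A = sp.csc_matrix((data, (rows, cols)), shape=(3, 3))
assert A.nnz == 3  # the explicit 0.0 at (1, 1) counts as a stored entry

# eliminate_zeros() removes explicitly stored zeros in place.
A.eliminate_zeros()
assert A.nnz == 2
```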
@eriknw good news! We have fixed one of the major slowdowns for the GraphBLAS backend. P.S. I'm just curious: why did you choose to add the additional `missing_value` parameter? Is there a use case where it would be something other than 0? I feel it could be better to default it to 0, for people who, like me, don't read the docs carefully :) . Let me know your thoughts on this.
That's great! I'm eager to get my hands dirty with this. Let me know if there's anything specific you'd like me to look at.

That missing values are not assumed to be zero is a fundamental principle of GraphBLAS. Also, "dense" is in the name ;). How can we update the docstring to make it clearer? Here's the PR where we added this... and specifically where
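The principle that a missing entry is not the same thing as a stored zero can be illustrated without GraphBLAS at all. A toy sketch (a plain dict standing in for a sparse vector; this is not the GraphBLAS API):

```python
# Toy sparse vector: dict from index -> stored value. Absent keys are
# "missing", which is NOT the same as a stored 0.
v = {0: 2.0, 3: 0.0}  # index 3 holds an explicit zero; indices 1 and 2 are missing

nvals = len(v)                 # number of stored entries: 2
plus_reduce = sum(v.values())  # reductions see only stored entries: 2.0

# Treating missing as 0 is an extra choice the caller makes when
# densifying -- analogous to passing missing_value=0:
dense = [v.get(i, 0.0) for i in range(4)]  # [2.0, 0.0, 0.0, 0.0]
```

Under these semantics, `from_dense` cannot silently drop zeros: it has no way to know whether a 0 in the input means "no value here" or "the value is 0" unless the caller says so via `missing_value`.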

This PR attempts to add a new backend written with python-graphblas. This will be an optional dependency.
The next steps for this to be ready will be to: