Stable randomized least-squares


This repository contains code for the paper Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers by Ethan N. Epperly, Maike Meier, and Yuji Nakatsukasa. It also contains code for the earlier paper Fast and forward stable randomized algorithms for linear least-squares problems by Ethan N. Epperly; see also that paper's repo.

Background: Fast, randomized least-squares methods

The overdetermined linear least-squares problem $$x = \text{argmin}_{x \in \mathbb{R}^n} \lVert b - Ax\rVert_2$$ is one of the most fundamental problems in machine learning, statistics, and computational mathematics. To solve least-squares problems accurately, the standard approach is to use QR factorization; QR-based least-squares solvers require $O(mn^2)$ operations for an $m\times n$ matrix $A$.
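
As a point of reference, the classical QR approach can be sketched in a few lines of NumPy/SciPy (an illustrative Python translation; the code in this repository is MATLAB):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def ls_via_qr(A, b):
    # Economy-size QR factorization A = QR, then x = R^{-1} (Q^T b).
    Q, R = qr(A, mode="economic")
    return solve_triangular(R, Q.T @ b)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8))
b = rng.standard_normal(100)
x = ls_via_qr(A, b)
```

The $O(mn^2)$ cost comes from the QR factorization itself; the randomized methods below avoid factorizing $A$ directly.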

In 2008, Rokhlin and Tygert introduced the sketch-and-precondition method, a faster least-squares solver that uses randomized dimensionality reduction to precondition an iterative least-squares algorithm. Sketch-and-precondition solves a least-squares problem in roughly $O(mn + n^3)$ operations, a significant speedup over QR for highly overdetermined least-squares problems $m\gg n\gg 1$. Another class of randomized least-squares solvers, known as iterative sketching methods (introduced by Pilanci and Wainwright, 2016), achieve the same $O(mn + n^3)$ operation count as sketch-and-precondition and can have benefits in some situations (e.g., for parallel computing).

Until recently, the numerical stability of randomized least-squares solvers was poorly understood. How do these algorithms perform when the matrix $A$ is stored on a computer using double-, single-, or even half-precision floating point numbers? The gold standard stability property for numerical algorithms is backward stability, a property possessed by the classical QR method for least-squares. Unfortunately, both sketch-and-precondition and iterative sketching are not backward stable, though good implementations satisfy a weaker stability property known as forward stability. Forward stability is enough for many applications, but—to truly use randomized least-squares solvers as a drop-in replacement for the QR method—it is desirable to have a fast randomized least-squares solver that is backward stable.

The paper Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers introduces two fast, backward stable randomized least-squares solvers:

  • FOSSILS, a backward stable version of iterative sketching
  • Sketch-and-precondition with iterative refinement (SPIR), a backward stable version of sketch-and-precondition

These methods are modified versions of iterative sketching and sketch-and-precondition that perform one step of iterative refinement. Fortunately, this one refinement step is enough to obtain backward stable algorithms.

Code to reproduce the numerical experiments from the paper Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers is found in paper2_experiments/.

  • Figure 1 (paper2_experiments/test_comparison.m): Computes forward and backward errors for different randomized least-squares solvers during the course of iteration.
  • Table 1 (paper2_experiments/test_residual_orthogonality.m): Tests the value of $\lVert A^\top (b-Ax)\rVert$ for different forward and backward stable least-squares solvers.
  • Figure 2 (paper2_experiments/test_comparison_2.m): Computes the final forward and backward errors for different randomized least-squares solvers for problems of increasing difficulty.
  • Figures 3 and 5 (paper2_experiments/test_cond_res.m and paper2_experiments/test_sizes.m): Compares the backward error and iteration counts for FOSSILS for least-squares problems of different sizes, conditioning, and residual norm.
  • Figure 4 (paper2_experiments/test_kernel.m and paper2_experiments/test_prony.m): Compares the runtime of FOSSILS, iterative sketching with momentum, and MATLAB's mldivide on a kernel regression problem and an application of Prony's method to quantum eigenvalue estimation.
  • Table 2 (paper2_experiments/test_sparse.m): Compares FOSSILS to mldivide for sparse least-squares problems from the SuiteSparse matrix collection.
  • Figure 6 (paper2_experiments/test_spir.m): Numerical evaluation of the sketch-and-precondition with iterative refinement (SPIR) method.

Code to reproduce the numerical experiments from the earlier paper Fast and forward stable randomized algorithms for linear least-squares problems is found in paper1_experiments/.

  • Figure 1 (paper1_experiments/test_iterative_sketching.m): Computes the forward, residual, and backward errors for iterative sketching for different condition numbers and residual norms.
  • Figure 2 (paper1_experiments/test_variants.m): Compares the performance of iterative sketching (including versions with damping and momentum) to sketch-and-precondition (with both the zero and sketch-and-solve initializations).
  • Figure 3 (paper1_experiments/test_bad_iterative_sketching.m): Compares the stable implementation of iterative sketching (in code/iterative_sketching.m) to three "bad" implementations.
  • Figure 4 (paper1_experiments/test_timing.m): Compares the runtime of iterative sketching (including versions with damping and momentum) to MATLAB's mldivide on a simplified kernel regression task.
  • Figure 5 (paper1_experiments/test_sparse.m): Compares the runtime of iterative sketching to MATLAB's mldivide for solving random sparse least-squares problems with three nonzeros per row.

Randomized least-squares solvers

This repository contains code for iterative sketching and sketch-and-precondition. Code for these methods is found in the code/ directory.

Sparse sign embeddings

The core ingredient for randomized least-squares solvers is a fast random embedding. Based on the empirical comparison in this paper (see Fig. 2) and our own testing, our preferred embedding is the sparse sign embedding.

To generate a sparse sign embedding in our code, first build the mex file using the following command:

mex sparsesign.c

Then, to generate a d by m sparse sign embedding with zeta nonzeros per column, run

S = sparsesign(d, m, zeta);
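
The repository's sparsesign.c builds this embedding as a sparse matrix in compiled C. For illustration only, here is a hypothetical dense NumPy sketch of the same construction: each column receives zeta entries equal to $\pm 1/\sqrt{\zeta}$ at distinct random rows.

```python
import numpy as np

def sparse_sign(d, m, zeta, rng):
    # Dense stand-in for a sparse sign embedding: each column has zeta
    # entries equal to +-1/sqrt(zeta), placed at zeta distinct random rows.
    S = np.zeros((d, m))
    for j in range(m):
        rows = rng.choice(d, size=zeta, replace=False)
        S[rows, j] = rng.choice([-1.0, 1.0], size=zeta) / np.sqrt(zeta)
    return S

rng = np.random.default_rng(0)
S = sparse_sign(100, 1000, 8, rng)
v = rng.standard_normal(1000)
distortion = np.linalg.norm(S @ v) / np.linalg.norm(v)  # should be close to 1
```

A production implementation stores $S$ in a sparse format so that applying it costs $O(\zeta m)$ operations rather than $O(dm)$.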

FOSSILS

FOSSILS (Fast Optimal Stable Sketchy Iterative Least Squares) is a backward stable randomized least-squares solver with a fast (roughly $O(mn + n^3)$) runtime. The first step of the algorithm is to collect a sketch $SA$ of the matrix $A$ and compute its QR factorization $SA = QR$. The core engine for FOSSILS is the following outer solver: On input $A,b$,

  1. Compute $c = R^{-\top} (A^\top b)$.
  2. Solve the equation $(R^{-\top} A^\top A R^{-1}) y = c$ using the Polyak heavy ball method.
  3. Output $x = R^{-1}y$.

On its own, the FOSSILS outer solver is not backward stable, but it can be upgraded to backward stability by using iterative refinement.

  1. Set $x_0 = R^{-1} (Q^\top (Sb))$. This is the "sketch and solve" initialization.
  2. Update $x_1 = x_0 + \mathrm{FossilsOuterSolver}(b-Ax_0)$.
  3. Output $x_2 = x_1 + \mathrm{FossilsOuterSolver}(b-Ax_1)$.

This simple two-step refinement procedure leads to a backward stable algorithm.
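The steps above can be sketched in NumPy (an illustrative translation, not the repository's fossils.m; for brevity it uses a Gaussian stand-in embedding and the simplified heavy-ball parameters $\alpha = 1$, $\beta = n/d$ in place of the paper's exact tuning):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def fossils_sketch(A, b, S, iters=30):
    # Illustrative FOSSILS: sketch, QR-factorize, heavy-ball inner solver,
    # and two steps of iterative refinement.
    m, n = A.shape
    Q, R = qr(S @ A, mode="economic")
    beta = n / S.shape[0]  # heavy-ball momentum ~ eps^2 with eps ~ sqrt(n/d)

    def outer_solver(rhs):
        # Solve (R^{-T} A^T A R^{-1}) y = R^{-T} A^T rhs by Polyak heavy ball,
        # then map back through the preconditioner: output R^{-1} y.
        c = solve_triangular(R, A.T @ rhs, trans="T")
        y = np.zeros(n)
        y_prev = y
        for _ in range(iters):
            My = solve_triangular(R, A.T @ (A @ solve_triangular(R, y)), trans="T")
            y, y_prev = y + (c - My) + beta * (y - y_prev), y
        return solve_triangular(R, y)

    x = solve_triangular(R, Q.T @ (S @ b))  # sketch-and-solve initialization
    for _ in range(2):                      # two refinement steps
        x = x + outer_solver(b - A @ x)
    return x

rng = np.random.default_rng(1)
m, n = 400, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
S = rng.standard_normal((12 * n, m)) / np.sqrt(12 * n)  # Gaussian stand-in embedding
x = fossils_sketch(A, b, S)
```

Because $AR^{-1}$ is well-conditioned after sketching, the heavy ball iteration converges geometrically at a rate of roughly $\sqrt{n/d}$ per step.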

To run FOSSILS using our code, the command is

[x, stats] = fossils(A, b, [d, iterations, summary, verbose, reproducible])

All but the first two inputs are optional. The optional inputs are as follows:

  • d: sketching dimension. (Default value: 12*size(A,2))
  • iterations: number of iterations for each refinement step (e.g., [50,50] to set fifty steps for each refinement step). Set to 'adaptive' to automatically set the number of refinement steps. (Default value: 'adaptive')
  • summary: a function of x to be recorded at every iteration. The results will be outputted in the optional output argument stats. (Default value: None)
  • verbose: if true, output at every iteration. (Default value: false)
  • reproducible: if true, use a slow, but reproducible implementation of sparse sign embeddings. (Default value: false)

Inputting a value of [] for an optional argument results in the default value.

Sketch-and-precondition with iterative refinement

Sketch-and-precondition with iterative refinement (SPIR) is a fast, backward stable randomized least-squares solver. It is similar to FOSSILS, except that it uses a Krylov method in place of the Polyak heavy ball method.

To run SPIR using our code, the command is

[x, stats] = spir(A, b, [d, q, summary, verbose, opts, reproducible])

All but the first two inputs are optional. The optional inputs are as follows:

  • d: sketching dimension. (Default value: 2*size(A,2))
  • q: number of iterations for each refinement step. (Default value: [50,50])
  • summary: a function of x to be recorded at every iteration. The results will be outputted in the optional output argument stats. (Default value: None)
  • verbose: if true, output at every iteration. (Default value: false)
  • opts: specifies the initial iterate $x_0$ and iterative method (LSQR, symmetric conjugate gradient, or preconditioned conjugate gradient). If 'cold' is a substring of opts, then the initial iterate is chosen to be $x_0 = 0$. Otherwise, we use a warm start and choose $x_0$ to be the sketch-and-solve solution. If cgne is a substring of opts, then we solve $Ax = b$ using CGNE, and if sym is a substring of opts, then we perform conjugate gradient on the symmetrically preconditioned system $(R^{-\top} A^\top A R^{-1}) y = c$; otherwise, we use LSQR. We recommend using the LSQR or symmetric conjugate gradient implementations in practice. (Default value: '')
  • reproducible: if true, use a slow, but reproducible implementation of sparse sign embeddings. (Default value: false)

Inputting a value of [] for an optional argument results in the default value.
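
A minimal NumPy/SciPy sketch of this structure (illustrative only, not the repository's spir.m; it uses SciPy's LSQR as the Krylov method, a Gaussian stand-in embedding, and a fixed iteration count):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

def spir_sketch(A, b, S, iters=50):
    # Illustrative SPIR: run LSQR on the preconditioned system A R^{-1},
    # warm-started at the sketch-and-solve solution, with iterative refinement.
    m, n = A.shape
    Q, R = qr(S @ A, mode="economic")
    AR_inv = LinearOperator(
        (m, n),
        matvec=lambda y: A @ solve_triangular(R, y),
        rmatvec=lambda z: solve_triangular(R, A.T @ z, trans="T"),
    )

    def precond_lsqr(rhs):
        # Solve min_y ||rhs - (A R^{-1}) y|| with LSQR; map back via x = R^{-1} y.
        y = lsqr(AR_inv, rhs, iter_lim=iters, atol=0.0, btol=0.0)[0]
        return solve_triangular(R, y)

    x = solve_triangular(R, Q.T @ (S @ b))  # warm start: sketch-and-solve
    for _ in range(2):                      # iterative refinement
        x = x + precond_lsqr(b - A @ x)
    return x

rng = np.random.default_rng(2)
m, n = 400, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
S = rng.standard_normal((2 * n, m)) / np.sqrt(2 * n)  # Gaussian stand-in embedding
x = spir_sketch(A, b, S)
```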

Iterative sketching

Iterative sketching is a fast, forward stable randomized least-squares solver. As with FOSSILS, begin by sketching and QR-factorizing $SA = QR$; we use a sparse sign embedding for the embedding matrix $S$. Iterative sketching then produces a sequence of increasingly accurate least-squares solutions using the iteration

$$ x_{i+1} = x_i + R^{-1} ( R^{-\top} (A^\top(b - Ax_i))). $$

The main result of Fast and forward stable randomized algorithms for linear least-squares problems is that iterative sketching is forward stable: If you run it for sufficiently many iterations, the forward error $\lVert x - x_i \rVert$ and residual error $\lVert (b-Ax) - (b-Ax_i) \rVert$ are roughly as small as for a standard direct method like (Householder) QR factorization.
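
The iteration above can be sketched in NumPy (an illustrative translation of the formula, not the repository's iterative_sketching.m; a Gaussian embedding stands in for the sparse sign embedding):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def iterative_sketching(A, b, S, iters=100):
    # Iterate x_{i+1} = x_i + R^{-1} (R^{-T} (A^T (b - A x_i))), where S A = Q R.
    _, R = qr(S @ A, mode="economic")
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = solve_triangular(R, A.T @ (b - A @ x), trans="T")
        x = x + solve_triangular(R, g)
    return x

rng = np.random.default_rng(3)
m, n = 400, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
S = rng.standard_normal((12 * n, m)) / np.sqrt(12 * n)  # Gaussian stand-in embedding
x = iterative_sketching(A, b, S)
```

Each step costs one multiplication by $A$ and $A^\top$ plus two triangular solves, so the per-iteration cost is $O(mn)$ after the initial sketch and QR factorization.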

Iterative sketching can optionally be accelerated by incorporating damping and momentum, resulting in a modified iteration

$$ x_{i+1} = x_i + \alpha R^{-1} ( R^{-\top} (A^\top(b - Ax_i))) + \beta (x_i - x_{i-1}). $$

We call $\alpha$ and $\beta$ the damping parameter and momentum parameter, respectively. The optimal damping and momentum parameters were computed in the papers above.

To run iterative sketching using our code, the command is

[x, stats] = iterative_sketching(A, b, [d, q, summary, verbose, damping, momentum, reproducible])

All but the first two inputs are optional. The optional inputs are as follows:

  • d: sketching dimension. (Default value: see paper)
  • q: number of iterations (if q>=1) or tolerance (if q<1). If q>=1, iterative sketching will be run for q iterations. Otherwise, iterative sketching is run for an adaptive number of steps until the norm change in residual is less than q*(Anorm * norm(x) + 0.01*Acond*norm(r)). Here, Anorm and Acond are estimates of the norm and condition number of A. (Default value: eps)
  • summary: a function of x to be recorded at every iteration. The results will be outputted in the optional output argument stats. (Default value: None)
  • verbose: if true, output at every iteration. (Default value: false)
  • damping: damping coefficient. To use the optimal value, set damping to 'optimal'. (Default value: 1, i.e., no damping).
  • momentum: momentum coefficient. To use the optimal value, set both damping and momentum to 'optimal'. (Default value: 0, i.e., no momentum).
  • reproducible: if true, use a slow, but reproducible implementation of sparse sign embeddings. (Default value: false)

Inputting a value of [] for an optional argument results in the default value.

Sketch-and-precondition

Sketch-and-precondition is also based on a QR factorization $SA = QR$ of a sketch of the matrix $A$. It then uses $R$ as a preconditioner for solving $Ax = b$ using the LSQR or CGNE method. To call sketch-and-precondition, use the following command

[x, stats] = sketch_and_precondition(A, b, [d, q, summary, verbose, opts, reproducible])

All but the first two inputs are optional. The optional inputs are as follows:

  • d: sketching dimension. (Default value: 2*size(A,2))
  • q: number of iterations. (Default value: 100)
  • summary: a function of x to be recorded at every iteration. The results will be outputted in the optional output argument stats. (Default value: None)
  • verbose: if true, output at every iteration. (Default value: false)
  • opts: specifies the initial iterate $x_0$ and iterative method (LSQR, symmetric conjugate gradient, or preconditioned conjugate gradient). If 'cold' is a substring of opts, then the initial iterate is chosen to be $x_0 = 0$. Otherwise, we use a warm start and choose $x_0$ to be the sketch-and-solve solution. If cgne is a substring of opts, then we solve $Ax = b$ using CGNE, and if sym is a substring of opts, then we perform conjugate gradient on the symmetrically preconditioned system $(R^{-\top} A^\top A R^{-1}) y = c$; otherwise, we use LSQR. We recommend using the LSQR or symmetric conjugate gradient implementations in practice. (Default value: '')
  • reproducible: if true, use a slow, but reproducible implementation of sparse sign embeddings. (Default value: false)

Inputting a value of [] for an optional argument results in the default value.
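
A minimal NumPy/SciPy sketch of sketch-and-precondition with a warm start (illustrative only, not the repository's implementation; it uses SciPy's LSQR and a Gaussian stand-in embedding):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

def sketch_and_precondition(A, b, S, iters=100):
    # QR-factorize the sketch S A = Q R, then run LSQR on the preconditioned
    # problem min_y ||b - (A R^{-1}) y||, warm-started at sketch-and-solve.
    m, n = A.shape
    Q, R = qr(S @ A, mode="economic")
    AR_inv = LinearOperator(
        (m, n),
        matvec=lambda y: A @ solve_triangular(R, y),
        rmatvec=lambda z: solve_triangular(R, A.T @ z, trans="T"),
    )
    y0 = Q.T @ (S @ b)  # sketch-and-solve solution in the y-coordinates
    dy = lsqr(AR_inv, b - AR_inv.matvec(y0), iter_lim=iters, atol=0.0, btol=0.0)[0]
    return solve_triangular(R, y0 + dy)

rng = np.random.default_rng(5)
m, n = 400, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
S = rng.standard_normal((2 * n, m)) / np.sqrt(2 * n)  # Gaussian stand-in embedding
x = sketch_and_precondition(A, b, S)
```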

Computing or estimating the backward error

Our code also provides functions to compute or estimate the backward error, defined to be $$\mathrm{BE}_\theta(\hat{x}) = \min \left\{ \lVert [\Delta A,\theta\cdot\Delta b]\rVert_F : \hat{x} = \mathrm{argmin}_y \lVert (b+\Delta b) - (A+\Delta A)y \rVert \right\}.$$ The backward error is a quantitative measure of the size of the perturbations to $A$ and $b$ needed to make the numerically computed solution $\hat{x}$ the exact solution to the least-squares problem. The parameter $\theta \in [0,\infty]$ sets the relative importance of perturbations to $A$ and $b$. If $\lVert A \rVert = \lVert b \rVert = 1$, a method is backward stable if and only if $\mathrm{BE}_1(\hat{x})$ is at most a small multiple of the machine precision ($\approx 10^{-16}$ in double precision). See our paper for details on the backward error and descriptions of the methods below for estimating it.

The backward error can be computed in $O(m^3)$ operations using the Waldén–Karlson–Sun–Higham formula. In our code, the command is

backward_error_ls(A,b,xhat,[theta])

The default value of theta is infinity (Inf).

To obtain a cheaper estimate of the backward error, one can use the Karlson–Waldén estimate, which is guaranteed to be within a factor of $\sqrt{2}$ of the true backward error (neglecting rounding errors incurred in evaluating the formulas). The Karlson–Waldén estimate can be called in our code using the command

kw_estimate(A,b,xhat,[theta,S,V])

The default value of theta is infinity (Inf). The Karlson–Waldén estimate runs in $O(mn^2)$ operations. If one has access to an SVD of A (i.e., [U,S,V] = svd(A,"econ")), then the S and V matrices can be provided to kw_estimate as optional arguments, reducing the runtime to $O(mn)$.

Finally, for an even faster estimate of the backward error, one can use the sketched Karlson–Waldén estimate. The sketched Karlson–Waldén estimate is also within a small constant factor of the true backward error, and runs in a faster (roughly) $O(mn + n^3)$ operations. The sketched Karlson–Waldén estimate can be called as

kw_estimate(A,b,xhat,theta,"sketched",[d])

The parameter d sets the embedding dimension for the sketch, defaulting to 2*size(A,2).
