Page 1:

Introduction · Blendenpik · BGQ Implementation · Evaluation · Future Work

Randomized Sketching for Large-Scale Sparse Ridge Regression Problems

Chander Iyer¹, Christopher Carothers¹, Petros Drineas²

¹Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY
²Department of Computer Science, Purdue University, West Lafayette, IN

Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems

November 13, 2016

Randomized Sketching for Large-Scale Sparse Ridge Regression Problems 1/21

Page 2:

Outline

Introduction.
    “Big Data” in real time.
    Randomization: an HPC perspective.

Sparse least squares regression.
    Regularized least squares solvers: exact vs. randomized.
    Blendenpik: a randomized iterative least squares solver.

Implementing Blendenpik on the Blue Gene/Q.
    Distributed Blendenpik for large-scale sparse matrices.
    Batchwise Blendenpik.

Evaluation and Results.
    Datasets.
    Scalability analysis.
    Performance analysis.
    Numerical stability analysis.

Summary and Future work.

Page 3:

“Big Data” in real time (Arjun Shankar, SOS17 Conference)

Social medium                       Data generation rate
                                    400M / day
Images                              30B / month
Mails                               419B / day
Videos                              76PB / year

Table: Social media data generation rate

Sensor                              Data generation rate
Ion mobility spectroscopy           10TB / day
Boeing flight recorder              240TB / trip
Astrophysics data                   10PB / year
Square kilometer telescope array    480PB / day

Table: Sensor data generation rate

Page 4:

Randomization : An HPC perspective

Numerical Algorithms and Libraries at Exascale, Dongarra et al., 2015, HPCwire

“. . . one of the most interesting developments in HPC math libraries is taking place at the intersection of numerical linear algebra and data analytics, where a new class of randomized algorithms is emerging . . . ”

“. . . powerful tools for solving both least squares and low-rank approximation problems, which are ubiquitous in large-scale data analytics and scientific computing.”

“these algorithms are playing a major role in the processing of the information that has previously lain fallow, or even been discarded, because meaningful analysis of it was simply infeasible; this is the so-called ‘Dark Data’ phenomenon.”

Randomized algorithms (random sampling / random projections):

Can be scaled to modern HPC architectures with relative ease(!) compared to traditional solvers.

Numerically robust due to implicit regularization.

Page 5:

Least squares solvers

Regularized least squares regression

y* = argmin ‖y‖₂ subject to y ∈ argmin_x ‖Ax − b‖₂² + λ‖x‖₂²,

where A ∈ R^(m×n), nnz(A) ≪ m·n, m ≫ n, and x ∈ Rⁿ.

Traditional ridge regression solvers are based on solving in the dual space or on kernelized ridge regression, and run in O(mn²) time.
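The ridge problem above is equivalent to an ordinary least squares problem on the augmented matrix A′ = [A; √λ·I] with right-hand side b′ = [b; 0]. A minimal NumPy sketch of that equivalence (an illustration, not the distributed solver discussed in these slides):

```python
import numpy as np

def ridge_via_augmentation(A, b, lam):
    """Solve min_x ||Ax - b||_2^2 + lam * ||x||_2^2 by augmenting
    A with sqrt(lam) * I and calling an ordinary LS solver."""
    m, n = A.shape
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])   # A' = [A; sqrt(lam) I]
    b_aug = np.concatenate([b, np.zeros(n)])           # b' = [b; 0]
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x
```

The result agrees with the normal-equations solution (AᵀA + λI)x = Aᵀb, which is what makes the augmented formulation usable inside any least squares solver.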

Randomized least squares solvers (existing approaches)

Sample rows after preprocessing A, then apply QR to the sampled matrix. (Drineas, Mahoney, Muthukrishnan & Sarlos, Numer. Math., 2011)

Construct a preconditioner from A, then iteratively solve the preconditioned system. (Rokhlin & Tygert, PNAS, 2008)

Blendenpik (Avron, Maymounkov & Toledo, SISC, 2010)

Combines both approaches and runs in O(mn log m) time: preprocess A with a unitary transform, sample rows from the transformed matrix and apply QR to construct a preconditioner, then iteratively solve the preconditioned system to obtain an approximate solution.

Page 6:

The Blendenpik algorithm

Input:
    A′ ∈ R^((m+n)×n), where A′ = [A; √λ·I], m ≫ n, and rank(A) = n.
    b′ ∈ R^(m+n), where b′ = [b; 0].
    F ∈ R^((m+n)×(m+n)), a random unitary transform matrix.
    Regularization parameter λ > 0; sampling factor γ ≥ 1.
Output: x, the solution of min_x ‖A′x − b′‖₂.

while output not returned do
    M = F·A′                                        (random unitary transformation)
    Let S ∈ R^((m+n)×(m+n)) be a random diagonal matrix with
        S_ii = 1 with probability γn/(m+n), and 0 with probability 1 − γn/(m+n)
    M_s = S·M                                       (sampling)
    M_s = Q_s·R_s                                   (thin QR preconditioning)
    κ = κ_estimate(R_s)
    if κ⁻¹ > 5·ε_machine then
        y = argmin_z ‖A′ R_s⁻¹ z − b′‖₂             (preconditioned iterative solve)
        Solve R_s·x = y
        return x
    else if # iterations > 3 then
        solve using baseline least squares and return
    end if
end while
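The same pipeline (sketch, thin QR, preconditioned iterative solve) can be sketched on a single node with NumPy/SciPy. This illustration substitutes SciPy's Clarkson–Woodruff transform for the unitary transform F plus the sampling matrix S, so it shows the structure of the algorithm rather than the Blue Gene/Q implementation:

```python
import numpy as np
from scipy.linalg import clarkson_woodruff_transform, solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

def sketch_and_precondition_ridge(A, b, lam, gamma=4, seed=0):
    """Illustrative sketch-and-precondition ridge solver: sketch the
    augmented matrix, QR the sketch, run LSQR on the right-preconditioned
    system A' R^{-1} z = b', and return x = R^{-1} z."""
    m, n = A.shape
    Ap = np.vstack([A, np.sqrt(lam) * np.eye(n)])   # A' = [A; sqrt(lam) I]
    bp = np.concatenate([b, np.zeros(n)])           # b' = [b; 0]
    # Sketch down to gamma*n rows (Clarkson-Woodruff stands in for F and S).
    Ms = clarkson_woodruff_transform(Ap, int(gamma * n), seed=seed)
    R = np.linalg.qr(Ms, mode='r')                  # thin QR: keep R only
    # Preconditioned operator A' R^{-1} and its adjoint R^{-T} A'^T.
    op = LinearOperator(
        (m + n, n),
        matvec=lambda z: Ap @ solve_triangular(R, z),
        rmatvec=lambda y: solve_triangular(R, Ap.T @ y, trans='T'))
    z = lsqr(op, bp, atol=1e-14, btol=1e-14, iter_lim=200)[0]
    return solve_triangular(R, z)
```

Because the preconditioned system is well-conditioned with high probability, LSQR converges in a small number of iterations regardless of the conditioning of A.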

Page 7:

Distributed Blendenpik for large-scale sparse matrices

The sparse unitary transformation is implemented using the Randomized Sparsity Preserving Transform (RSPT) proposed by Clarkson and Woodruff (runs in O(nnz(A)) time), and a combination of RSPT with the 1-D Discrete Cosine Transform (DCT) routines of the FFTW library.

Distributed Blendenpik is implemented on top of Elemental. Elemental partitions the input matrices into rectangular process grids in a 2D cyclic distribution. The 2D input distribution format is locally non-contiguous, while the 1-D unitary transform needs locally contiguous columns of the input matrix, so the input is redistributed with an MPI_Alltoall collective operation.

Challenges

Memory constraints: The number of elements in a column is limited by the RAM available to the process assigned to that column. Moreover, a process may share its buffer among several columns at once.

MPI framework constraints: The number of elements that can be redistributed in a collective operation is limited to INT_MAX (2³¹ − 1).
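The INT_MAX limit can be respected by capping how many columns each collective moves. A hypothetical helper (the names are ours, not Elemental's) sketching the batch-size computation:

```python
INT_MAX = 2**31 - 1

def column_batches(n_cols, rows_per_col, max_elems=INT_MAX):
    """Partition n_cols columns into contiguous batches such that each
    batch moves at most max_elems elements in one collective call.
    Returns a list of half-open (start, end) column ranges."""
    cols_per_batch = max(1, max_elems // rows_per_col)
    return [(start, min(start + cols_per_batch, n_cols))
            for start in range(0, n_cols, cols_per_batch)]
```

Each returned range is then redistributed and transformed independently, which is exactly the batchwise scheme of the next slide.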

Page 8:

Batchwise Blendenpik

Solution: batchwise redistribution and transformation.

[Figure: Batchwise redistribution and transformation. The input matrix A[Mc, Mr], stored 2D-cyclically on a 2×2 processor grid, is redistributed to the column-wise layout A[*, Vr]; each batch of columns is transformed batchwise to M[*, Vr], then sampled and redistributed back to the 2D-cyclic layout Ms[Mc, Mr].]

Page 9:

Datasets

Matrix name       # of rows    # of columns    # of entries (M)    Condition number
ns3Da-8           163,312      20,414          16.77               7.07E+02
mesh_deform-4     936,092      9,393           12.2                1.17E+03
memplus-32        568,256      17,758          13.26               1.29E+05
sls-1             1,748,122    62,729          116.462             8.67E+07
rma10-8           374,680      46,835          36.18               7.98E+10
c-41-32           312,608      9,769           6.3                 4.78E+12

Table: Matrices used in our evaluations.

Page 10:

Evaluation metrics

Let A′ ∈ R^((m+n)×n) be the input matrix, b′ ∈ R^(m+n) the right-hand-side vector, and let:

    x      ← the min-norm solution obtained from batchwise Blendenpik
    x*     ← the exact solution
    r      ← the residual error, defined as b′ − A′x
    t_run  ← the running time of Blendenpik
    t*_run ← the running time of the baseline (Elemental)

We evaluate the Blendenpik algorithm using the following metrics.

Speedup: t*_run / t_run.

Accuracy: the relative error of the min-norm solution x, given by ‖A′x − A′x*‖₂ / ‖A′x*‖₂, and the backward error, given by ‖A′ᵀ r‖₂.
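The metrics above translate directly into NumPy; here x, x*, and the two timings are assumed given:

```python
import numpy as np

def evaluation_metrics(Ap, bp, x, x_star, t_run, t_base):
    """Speedup, relative error, and backward error as defined above."""
    r = bp - Ap @ x                                    # residual b' - A'x
    speedup = t_base / t_run                           # t*_run / t_run
    rel_err = (np.linalg.norm(Ap @ (x - x_star))
               / np.linalg.norm(Ap @ x_star))          # ||A'x - A'x*|| / ||A'x*||
    backward_err = np.linalg.norm(Ap.T @ r)            # ||A'^T r||
    return speedup, rel_err, backward_err
```

Note that for an exact least squares solution the residual is orthogonal to the column space of A′, so the backward error ‖A′ᵀr‖₂ vanishes.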

Page 11:

Speedup analysis for well-conditioned matrices for increasing regularization values.

[Plot: speedup vs. regularization, λ from 10⁻¹² to 10², for ns3Da, mesh_deform, and memplus, each with the RSPT and RSPT-RDCT transforms; speedup axis from 0 to 10.]

Page 12:

Speedup analysis for ill-conditioned matrices for increasing regularization values.

[Plot: speedup vs. regularization, λ from 10⁻¹² to 10², for rma10 and c-41 with the RSPT and RSPT-RDCT transforms; speedup axis from 0 to 14.]

Page 13:

Relative Error as a function of λ.

[Plots: relative error vs. regularization, λ from 10⁻¹² to 10². Left panel: ns3Da, mesh_deform, and memplus (RSPT and RSPT-RDCT); relative error from 10⁻¹⁴ to 10⁻⁴. Right panel: rma10 and c-41; relative error from 10⁻¹⁰ to 10⁰.]

Page 14:

Backward Error as a function of λ.

[Plots: backward error vs. regularization, λ from 10⁻¹² to 10². Left panel: ns3Da, mesh_deform, memplus (Elemental baseline, RSPT, RSPT-RDCT) and sls (RSPT, RSPT-RDCT); backward error from 10⁻¹⁴ to 10⁻². Right panel: rma10 and c-41 (Elemental baseline, RSPT, RSPT-RDCT); backward error from 10⁻¹⁰ to 10⁴.]

Page 15:

Strong Scaling as a function of increasing Blue Gene/Q nodes at optimal regularization value λ∗.

[Plot: speedup vs. number of Blue Gene/Q nodes (16 to 256) for all five matrices with the RSPT and RSPT-RDCT transforms; speedup axis from 0 to 14.]

Page 16:

Speedup as a function of increasing oversampling factors at optimal regularization value λ∗.

[Plot: speedup vs. oversampling factor (2 to 6) for all five matrices with the RSPT and RSPT-RDCT transforms; speedup axis from 0 to 14.]

Page 17:

Relative Error as a function of increasing oversampling factors at optimal regularization value λ∗.

[Plot: relative error vs. oversampling factor (2 to 6) for all five matrices with the RSPT and RSPT-RDCT transforms; relative error from 10⁻¹⁴ to 10⁻⁸.]

Page 18:

Backward Error as a function of increasing oversampling factors at optimal regularization value λ*.

[Plot: backward error vs. oversampling factor (2 to 6) for all matrices, including the Elemental baselines and sls; backward error from 10⁻¹⁰ to 10⁰.]

Page 19:

Summary

The speedup achieved by RSPT is always better than that of the RSPT-RDCT transform, for two reasons. First, the Blendenpik algorithm spends considerable time computing the RDCT transform. Second, RSPT produces a better preconditioner than the RSPT-RDCT transform, which leads to faster convergence of the LSQR stage.

As the regularization values increase, the speedup increases until it peaks at a certain regularization value and then decreases again, for all matrices except the rma10-8 matrix.

The relative error decreases with increasing values of the regularization parameter until it reaches its minimum at λ = λ*, which we choose as the optimal regularizer for our evaluations. As the condition numbers of the matrices increase, λ* for each matrix also increases.

The sparse randomized transforms demonstrate significant strong scaling for all matrices at λ*, with the exception of the rma10-8 matrix.

The Blendenpik solver demonstrates excellent speedup and numerical stability in terms of the relative error at λ* for increasing oversampling factors. The backward error is somewhat worse, yet comparable to the backward error achieved by the baseline sparse Elemental solver at λ*.

Page 20:

Future Work

Sparse QR preconditioning: The most prohibitive stage of the Blendenpik algorithm is the dense QR preconditioning stage, which for a sketched matrix Ms ∈ R^(γn×n) runs in O(n³) time. For sparse, approximately square matrices, applying a sparse random transform results in a sparse sketch, which makes a dense QR preconditioner an unsuitable choice. In such a scenario, sparse QR based on multifrontal QR factorization is a suitable choice for this stage due to its O(n²) runtime.

Restarted LSQR: A common problem with matrices that have heavy-tailed singular spectra is that, even though the constructed preconditioner is well-conditioned, the LSQR stage converges extremely slowly and stagnates. One way to resolve the stagnation is a restarted LSQR solver, similar to the bidiagonal block Lanczos approach.

Ridge regression variants: Another key improvement we envision for the sparse ridge regression problem is extending the Blendenpik algorithm framework to other ridge regression variants, such as dual ridge regression and kernel ridge regression.
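For context, the dual formulation mentioned above solves an m×m system instead of an n×n one, which pays off when m < n (and underlies kernel ridge regression). A small NumPy comparison of the two forms (illustrative only):

```python
import numpy as np

def ridge_primal(A, b, lam):
    """Primal ridge: solve the n x n system (A^T A + lam I) x = A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def ridge_dual(A, b, lam):
    """Dual ridge: x = A^T (A A^T + lam I)^{-1} b, an m x m system."""
    m = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + lam * np.eye(m), b)
```

The two forms are algebraically identical via the push-through identity (AᵀA + λI)⁻¹Aᵀ = Aᵀ(AAᵀ + λI)⁻¹; kernel ridge regression replaces AAᵀ with a kernel matrix.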

Page 21:

Thank you !!!
