Introduction | Blendenpik | BGQ Implementation | Evaluation | Future Work
Randomized Sketching for Large-Scale Sparse Ridge Regression Problems
Chander Iyer 1 Christopher Carothers 1 Petros Drineas 2
1 Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY
2 Department of Computer Science, Purdue University, West Lafayette, IN

Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
November 13, 2016
Randomized Sketching for Large-Scale Sparse Ridge Regression Problems 1/21
Outline
Introduction.
    "Big Data" in real time.
    Randomization: an HPC perspective.
Sparse least squares regression.
    Regularized least squares solvers: exact vs. randomized.
    Blendenpik: a randomized iterative least squares solver.
Implementing Blendenpik on the Blue Gene/Q.
    Distributed Blendenpik for large-scale sparse matrices.
    Batchwise Blendenpik.
Evaluation and Results.
    Datasets. Scalability analysis. Performance analysis. Numerical stability analysis.
Summary and Future work.
“Big Data” in real time (Arjun Shankar, SOS17 Conference)
Social medium                       Data generation rate
                                    400M / day
Images                              30B / month
Mails                               419B / day
Videos                              76PB / year

Table: Social media data generation rate

Sensor                              Data generation rate
Ion mobility spectroscopy           10TB / day
Boeing flight recorder              240TB / trip
Astrophysics data                   10PB / year
Square kilometer telescope array    480PB / day

Table: Sensor data generation rate
Randomization : An HPC perspective
Numerical Algorithms and Libraries at Exascale, Dongarra et al., 2015, HPCwire

"... one of the most interesting developments in HPC math libraries is taking place at the intersection of numerical linear algebra and data analytics, where a new class of randomized algorithms is emerging ..."

"... powerful tools for solving both least squares and low-rank approximation problems, which are ubiquitous in large-scale data analytics and scientific computing."

"these algorithms are playing a major role in the processing of the information that has previously lain fallow, or even been discarded, because meaningful analysis of it was simply infeasible - this is the so-called 'Dark Data phenomenon'."
Randomized Algorithms (random sampling / random projections)
Can be scaled to modern HPC architectures with relative ease compared to traditional solvers.
Numerically robust due to implicit regularization.
Least squares solvers
Regularized least squares regression

y∗ = argmin ‖y‖2  subject to  y ∈ argmin_x ‖Ax − b‖2² + λ‖x‖2²,

where A ∈ R^(m×n), nnz(A) ≪ m·n, m ≫ n, and x ∈ R^n.

Traditional ridge regression solvers work in the dual space or use kernelized ridge regression, and run in O(mn²) time.
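The ridge problem above is equivalent to an ordinary least squares problem on an augmented matrix: stack √λ·I under A and zeros under b. A minimal NumPy sketch of this equivalence (the function name and sizes are illustrative, not from the paper):

```python
import numpy as np

def ridge_via_augmented_lstsq(A, b, lam):
    """Solve min_x ||Ax - b||_2^2 + lam * ||x||_2^2 by forming the
    augmented system A' = [A; sqrt(lam) I], b' = [b; 0] and calling
    an ordinary least squares solver on it."""
    m, n = A.shape
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))   # m >> n, as in the problem setup
b = rng.standard_normal(200)
x_aug = ridge_via_augmented_lstsq(A, b, lam=1e-2)
# Closed-form ridge solution (A^T A + lam I)^{-1} A^T b for comparison.
x_exact = np.linalg.solve(A.T @ A + 1e-2 * np.eye(10), A.T @ b)
```

The two solutions agree to machine precision, which is why a least squares solver applied to the augmented matrix also solves the ridge problem.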
Randomized least squares solvers (existing approaches)

Sample rows after preprocessing A, then apply QR to the sampled matrix.
Drineas, Mahoney, Muthukrishnan & Sarlós, Numer. Math., 2011

Construct a preconditioner from A, then iteratively solve the preconditioned system.
Rokhlin & Tygert, PNAS, 2008

Blendenpik (Avron, Maymounkov & Toledo, SISC, 2010)

Combines both approaches and runs in O(mn log m) time.

Preprocess A by applying a unitary transform, sample rows from this transform, and apply QR to construct a preconditioner; then iteratively solve the preconditioned system to obtain an approximate solution.
The Blendenpik algorithm
Input:  A′ ∈ R^((m+n)×n), where A′ = (A ; √λ·I), m ≫ n and rank(A) = n.
        b′ ∈ R^(m+n), where b′ = (b ; 0).
        F ∈ R^((m+n)×(m+n)), a random unitary transform matrix.
        Regularization parameter λ > 0; sampling factor γ ≥ 1.
Output: x = solution of min_x ‖A′x − b′‖2.

while output not returned do
    M = F·A′                                          ⊲ random unitary transformation
    Let S ∈ R^((m+n)×(m+n)) be a random diagonal matrix with
        S_ii = 1 with probability γn/(m+n) and 0 otherwise    ⊲ sampling
    M_s = S·M
    M_s = Q_s·R_s                                     ⊲ thin QR preconditioning
    κ̃ = κ_estimate(R_s)
    if κ̃⁻¹ > 5·ε_machine then
        y = argmin_z ‖A′R_s⁻¹·z − b′‖2                ⊲ preconditioned iterative solve
        Solve R_s·x = y
        return x
    else if # of iterations > 3 then
        solve using the baseline least squares solver and return
    end if
end while
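The loop above can be sketched in NumPy. This toy version uses a Gaussian sketch in place of the unitary transform F and row sampling, and plain CGLS in place of LSQR; all names and sizes are illustrative, not the paper's implementation:

```python
import numpy as np

def cgls(A_op, b, n, iters=50, tol=1e-12):
    """Plain CGLS (conjugate gradients on the normal equations),
    standing in for the LSQR stage; A_op(v, trans) applies the
    preconditioned operator or its transpose."""
    x = np.zeros(n)
    r = b.copy()
    s = A_op(r, True)
    p = s.copy()
    gamma = s @ s
    for _ in range(iters):
        q = A_op(p, False)
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A_op(r, True)
        gamma_new = s @ s
        if gamma_new < tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

def blendenpik_toy(A, b, lam=1e-3, gamma=4.0, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Augmented ridge system: A' = [A; sqrt(lam) I], b' = [b; 0].
    Ap = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    bp = np.concatenate([b, np.zeros(n)])
    # Sketch A' down to ~gamma*n rows (Gaussian sketch here, in
    # place of the unitary transform F and the sampling matrix S).
    k = int(gamma * n)
    S = rng.standard_normal((k, m + n)) / np.sqrt(k)
    Ms = S @ Ap
    # Thin QR of the sketch gives the preconditioner R_s.
    _, Rs = np.linalg.qr(Ms)
    # Iteratively solve min_z ||A' R_s^{-1} z - b'||_2, then R_s x = z.
    def A_op(v, trans):
        if trans:
            return np.linalg.solve(Rs.T, Ap.T @ v)
        return Ap @ np.linalg.solve(Rs, v)
    z = cgls(A_op, bp, n)
    return np.linalg.solve(Rs, z)
```

Because the sketched QR factor makes A′R_s⁻¹ nearly orthonormal, the iterative stage converges in a handful of iterations, which is the point of the algorithm.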
Distributed Blendenpik for large-scale sparse matrices
The sparse unitary transformation is implemented using the Randomized Sparsity Preserving Transform (RSPT) proposed by Clarkson and Woodruff (runs in O(nnz(A)) time), and a combination of RSPT with the 1-D Discrete Cosine Transform (DCT) routines of the FFTW library.
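A minimal sketch of a Clarkson–Woodruff-style sparsity-preserving transform, assuming the standard CountSketch construction (each input row is hashed to one output row with a random sign); the helper name and sizes are ours, not from the implementation:

```python
import numpy as np
import scipy.sparse as sp

def count_sketch(k, m, seed=0):
    """Build a k x m CountSketch matrix S: each of the m input rows
    is hashed to one of k output rows with a random sign, so S has
    exactly one nonzero per column and S @ A costs O(nnz(A))."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, k, size=m)          # hash bucket per input row
    signs = rng.choice([-1.0, 1.0], size=m)    # random sign per input row
    return sp.csr_matrix((signs, (rows, np.arange(m))), shape=(k, m))

# Sketch a sparse tall matrix down to ~gamma*n rows.
m, n, gamma = 10_000, 50, 4
A = sp.random(m, n, density=1e-3, random_state=1, format="csr")
S = count_sketch(int(gamma * n), m, seed=2)
Ms = S @ A    # sketched matrix, still sparse
```

Since S touches each row of A exactly once, applying it preserves sparsity, which is what makes it attractive for the matrices considered here.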
Distributed Blendenpik is implemented on top of Elemental. Elemental partitions the input matrices over rectangular process grids in a 2-D cyclic distribution. The 2-D input distribution is locally non-contiguous, while the 1-D unitary transform needs locally contiguous columns of the input matrix, so the matrix must be redistributed to a 1-D column layout. This redistribution is done with an MPI_Alltoall collective operation.
Challenges
Memory constraints: The number of elements in a column is limited by the RAM available to the process assigned to that column. Moreover, a process may share its buffer among several columns at once.

MPI framework constraints: The number of elements that can be redistributed in a single collective operation is limited to INT_MAX (2³¹ − 1).
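One way to respect the second constraint is to batch the redistribution so that each collective moves at most INT_MAX elements. A hypothetical helper illustrating the batching arithmetic (not the actual BG/Q code):

```python
def column_batches(m, n_cols, cap=2**31 - 1):
    """Split n_cols columns of m rows each into contiguous batches of
    whole columns so that no single collective operation moves more
    than `cap` elements (e.g. INT_MAX for MPI counts)."""
    cols_per_batch = max(1, cap // m)   # whole columns per collective
    batches = []
    start = 0
    while start < n_cols:
        end = min(start + cols_per_batch, n_cols)
        batches.append((start, end))
        start = end
    return batches
```

Each (start, end) pair would then drive one redistribution-plus-transform step, which is exactly the batchwise scheme described on the next slide.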
Batchwise Blendenpik
Solution: batchwise redistribution and transformation.
[Figure: Batchwise redistribution and transformation on a 2×2 process grid. The input A[Mc, Mr] (2-D cyclic distribution) is redistributed to A[*, Vr] (1-D column layout), transformed batchwise into M[*, Vr], and then sampled and redistributed back to the sketch Ms[Mc, Mr].]
Datasets
Matrix name      # of rows    # of columns   # of entries (millions)   Condition number
ns3Da-8          163,312      20,414         16.77                     7.07E+02
mesh_deform-4    936,092      9,393          12.2                      1.17E+03
memplus-32       568,256      17,758         13.26                     1.29E+05
sls-1            1,748,122    62,729         116.462                   8.67E+07
rma10-8          374,680      46,835         36.18                     7.98E+10
c-41-32          312,608      9,769          6.3                       4.78E+12

Table: Matrices used in our evaluations.
Evaluation metrics
Let A′ ∈ R^((m+n)×n) be the input matrix and b′ ∈ R^(m+n) the right-hand-side vector, and let:

    x       ← the min-norm solution obtained from batchwise Blendenpik
    x∗      ← the exact solution
    r       ← the residual error, defined as b′ − A′x
    t_run   ← running time of Blendenpik
    t∗_run  ← running time of the baseline (Elemental)

We evaluate the Blendenpik algorithm using the following metrics.

    Speedup: given by t∗_run / t_run.
    Accuracy: defined in terms of the relative error for the min-norm solution x, given by ‖A′x − A′x∗‖2 / ‖A′x∗‖2, and the backward error, given by ‖A′ᵀr‖2.
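These metrics are straightforward to compute directly; a small NumPy helper whose names are ours, not from the paper's code:

```python
import numpy as np

def evaluation_metrics(A_aug, b_aug, x, x_exact, t_run, t_base):
    """Speedup, relative error, and backward error as defined above,
    for the augmented system A' (A_aug) and b' (b_aug)."""
    speedup = t_base / t_run
    rel_err = (np.linalg.norm(A_aug @ x - A_aug @ x_exact)
               / np.linalg.norm(A_aug @ x_exact))
    r = b_aug - A_aug @ x                       # residual b' - A'x
    backward_err = np.linalg.norm(A_aug.T @ r)  # ||A'^T r||_2
    return speedup, rel_err, backward_err
```

Note that the backward error ‖A′ᵀr‖2 vanishes exactly at the least squares solution, so it measures how far x is from satisfying the normal equations.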
Speedup analysis for well-conditioned matrices for increasing regularization values.
[Plot: speedup vs. regularization λ ∈ [10⁻¹², 10²] for ns3Da, mesh, and memplus, each with the RSPT and RSPT-RDCT transforms; y-axis: speedup, 0–10.]
Speedup analysis for ill-conditioned matrices for increasing regularization values.
[Plot: speedup vs. regularization λ ∈ [10⁻¹², 10²] for rma10 and c41, each with the RSPT and RSPT-RDCT transforms; y-axis: speedup, 0–14.]
Relative Error as a function of λ.
[Plots: relative error vs. regularization λ ∈ [10⁻¹², 10²]. Left: ns3Da, mesh, and memplus (RSPT and RSPT-RDCT), with relative errors between 10⁻¹⁴ and 10⁻⁴. Right: rma10 and c41 (RSPT and RSPT-RDCT), with relative errors between 10⁻¹⁰ and 10⁰.]
Backward Error as a function of λ.
[Plots: backward error vs. regularization λ ∈ [10⁻¹², 10²]. Left: ns3Da, mesh, memplus, and sls (RSPT and RSPT-RDCT, with Elemental baselines for all but sls), with backward errors between 10⁻¹⁴ and 10⁻². Right: rma10 and c41 (Elemental baseline, RSPT, and RSPT-RDCT), with backward errors between 10⁻¹⁰ and 10⁴.]
Strong Scaling as a function of increasing Blue Gene/Q nodes at optimal regularization value λ∗.
[Plot: speedup vs. number of Blue Gene/Q nodes (16, 32, 64, 128, 256) for ns3Da, mesh, memplus, rma10, and c41, each with the RSPT and RSPT-RDCT transforms; y-axis: speedup, 0–14.]
Speedup as a function of increasing oversampling factors at optimal regularization value λ∗.
[Plot: speedup vs. oversampling factor (2–6) for ns3Da, mesh, memplus, rma10, and c41, each with the RSPT and RSPT-RDCT transforms; y-axis: speedup, 0–14.]
Relative Error as a function of increasing oversampling factors at optimal regularization value λ∗.
[Plot: relative error vs. oversampling factor (2–6) for ns3Da, mesh, memplus, rma10, and c41, each with the RSPT and RSPT-RDCT transforms; relative errors between 10⁻¹⁴ and 10⁻⁸.]
Backward Error as a function of increasing oversampling factors at optimal regularization value λ∗.

[Plot: backward error vs. oversampling factor (2–6) for all six matrices with the RSPT and RSPT-RDCT transforms, with Elemental baselines for all but sls; backward errors between 10⁻¹⁰ and 10⁰.]
Summary
The speedup achieved by RSPT is always better than that of the RSPT-RDCT transform, for two reasons. First, the Blendenpik algorithm spends a non-trivial amount of time computing the RDCT transform. Second, RSPT produces a better preconditioner than the RSPT-RDCT transform, which leads to faster convergence in the LSQR stage.

As the regularization value increases, the speedup increases until it peaks at a certain regularization value and then falls again, for all matrices except rma10-8.

The relative error decreases with increasing values of the regularization parameter until it reaches its minimum at λ = λ∗, which we choose as the optimal regularizer for our evaluations. As the condition numbers of the matrices increase, λ∗ for each matrix also increases.

The sparse randomized transforms demonstrate significant strong scaling at λ∗ for all matrices except rma10-8.

The Blendenpik solver demonstrates excellent speedup and numerical stability in terms of the relative error at λ∗ for increasing oversampling factors. The backward error is somewhat worse than, yet comparable to, the backward error achieved by the baseline sparse Elemental solver at λ∗.
Future Work
Sparse QR preconditioning: The most expensive stage of the Blendenpik algorithm is the dense QR preconditioning stage, which for a sketched matrix Ms ∈ R^(γn×n) runs in O(n³) time. For sparse, approximately square matrices, applying a sparse random transform produces a sparse sketch, which makes a dense QR preconditioner a poor choice. In that scenario, sparse QR based on multifrontal QR factorization is a better fit for this stage due to its O(n²) runtime.
Restarted LSQR: A common problem with matrices that have heavy-tailed singular spectra is that, even though the preconditioner constructed is well-conditioned, the LSQR stage converges extremely slowly and can stagnate. One way to resolve this stagnation is a restarted LSQR solver, similar to the Bidiagonal Block Lanczos approach.
Ridge regression variants: Another key improvement we envision for the sparse ridge regression problem is extending the Blendenpik framework to other ridge regression variants, such as dual ridge regression and kernel ridge regression.
Thank you !!!