Zheng Qu, University of Edinburgh. Optimization & Big Data Workshop, Edinburgh, 6th to 8th May, 2015. Randomized dual coordinate ascent with arbitrary sampling. Joint work with Peter Richtárik (Edinburgh) & Tong Zhang (Rutgers & Baidu)


Zheng Qu, University of Edinburgh

Optimization & Big Data Workshop, Edinburgh, 6th to 8th May, 2015

Randomized dual coordinate ascent with arbitrary sampling

Joint work with Peter Richtárik (Edinburgh) & Tong Zhang (Rutgers & Baidu)


Supervised Statistical Learning

Data Algorithm Predictor


Predicted label True label

Input Label


Empirical Risk Minimization

Data Algorithm Predictor

Input Label

ERM problem: empirical loss + regularization

n = # samples (big!)
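For reference, the ERM problem is commonly written in the form below. The notation ($w$, $A_i$, $\phi_i$, $g$, $\lambda$) is an assumption borrowed from the accompanying paper (arXiv:1411.5873), not from the slide text.

```latex
\min_{w \in \mathbb{R}^d} \; P(w) :=
  \underbrace{\frac{1}{n} \sum_{i=1}^{n} \phi_i\!\left(A_i^\top w\right)}_{\text{empirical loss}}
  \; + \;
  \underbrace{\lambda\, g(w)}_{\text{regularization}}
```

Here $\phi_i$ is the convex loss attached to the $i$-th example $A_i$, $g$ is a convex regularizer, and $\lambda > 0$ is the regularization parameter.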


Algorithm: QUARTZ

Z. Q., P. Richtárik (UoE) and T. Zhang (Rutgers & Baidu Big Data Lab, Beijing). Randomized dual coordinate ascent with arbitrary sampling. arXiv:1411.5873, 2014


Primal-Dual Formulation

Fenchel conjugates:

ERM problem

Dual problem
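A sketch of the three objects named on this slide, written in the notation of the accompanying paper (arXiv:1411.5873). The symbols $\phi_i$, $g$, $A_i$, $\alpha_i$ and $\lambda$ are assumptions taken from that paper.

```latex
\begin{align*}
\text{Fenchel conjugates:}\quad
  & \phi_i^*(u) := \sup_{z}\,\{\langle u, z\rangle - \phi_i(z)\},
  \qquad
  g^*(s) := \sup_{w}\,\{\langle s, w\rangle - g(w)\} \\[4pt]
\text{ERM problem:}\quad
  & \min_{w \in \mathbb{R}^d} \; P(w) := \frac{1}{n}\sum_{i=1}^n \phi_i(A_i^\top w) + \lambda\, g(w) \\[4pt]
\text{Dual problem:}\quad
  & \max_{\alpha} \; D(\alpha) := -\lambda\, g^*\!\Big(\frac{1}{\lambda n}\sum_{i=1}^n A_i \alpha_i\Big)
    - \frac{1}{n}\sum_{i=1}^n \phi_i^*(-\alpha_i)
\end{align*}
```

There is one dual variable $\alpha_i$ per example, so the dual has $n$ blocks of coordinates.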


Intuition behind QUARTZ

Fenchel’s inequality

weak duality

Optimality conditions
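The chain from Fenchel's inequality to weak duality and the optimality conditions can be worked out as follows. This is a sketch in the paper's notation (an assumption), with $\bar{\alpha} := \frac{1}{\lambda n}\sum_i A_i \alpha_i$.

```latex
\begin{align*}
P(w) - D(\alpha)
  &= \lambda\big(g(w) + g^*(\bar{\alpha})\big)
     + \frac{1}{n}\sum_{i=1}^n \big(\phi_i(A_i^\top w) + \phi_i^*(-\alpha_i)\big) \\
  &\ge \lambda \langle w, \bar{\alpha}\rangle
     - \frac{1}{n}\sum_{i=1}^n \langle A_i^\top w, \alpha_i\rangle
     && \text{(Fenchel's inequality, applied to each term)} \\
  &= 0
     && \text{(since } \lambda\bar{\alpha} = \tfrac{1}{n}\textstyle\sum_i A_i\alpha_i\text{)}
\end{align*}
```

Both inequalities are tight exactly when $w = \nabla g^*(\bar{\alpha})$ and $\alpha_i = -\nabla \phi_i(A_i^\top w)$ for all $i$ (assuming differentiability), which gives the optimality conditions that the primal and dual updates aim at.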


The Primal-Dual Update

STEP 1: PRIMAL UPDATE

STEP 2: DUAL UPDATE

Optimality conditions


STEP 1: Primal update

STEP 2: Dual update

Just maintaining
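The two steps can be sketched in Python for one concrete instance. Everything specific here is an assumption made for illustration: ridge regression (squared loss, so $\gamma = 1$ and $\nabla\phi_i(z) = z - y_i$; $g(w) = \tfrac12\|w\|^2$, so $\nabla g^*(\bar\alpha) = \bar\alpha$), uniform serial sampling, and the function names `quartz_ridge` and `duality_gap`. The update rules follow the algorithm as stated in arXiv:1411.5873, to the best of my reading.

```python
import numpy as np

def duality_gap(A, y, lam, w, alpha):
    """P(w) - D(alpha) for ridge regression; A is d x n (examples as columns)."""
    n = A.shape[1]
    primal = 0.5 * np.mean((A.T @ w - y) ** 2) + 0.5 * lam * (w @ w)
    alpha_bar = A @ alpha / (lam * n)
    # phi_i^*(-a) = a^2 / 2 - a * y_i for the squared loss
    dual = -0.5 * lam * (alpha_bar @ alpha_bar) - np.mean(0.5 * alpha ** 2 - alpha * y)
    return primal - dual

def quartz_ridge(A, y, lam, iters, seed=0):
    """Primal-dual sketch with uniform serial sampling (one index per step)."""
    d, n = A.shape
    rng = np.random.default_rng(seed)
    gamma = 1.0                      # squared loss is 1-smooth
    p = np.full(n, 1.0 / n)          # uniform serial sampling
    v = (A ** 2).sum(axis=0)         # ESO parameters for serial sampling
    theta = np.min(p * lam * gamma * n / (v + lam * gamma * n))
    w = np.zeros(d)
    alpha = np.zeros(n)
    alpha_bar = A @ alpha / (lam * n)
    for _ in range(iters):
        # STEP 1: primal update, a convex combination with grad g^*(alpha_bar)
        w = (1.0 - theta) * w + theta * alpha_bar
        # STEP 2: dual update on the sampled coordinate
        i = rng.integers(n)
        step = theta / p[i]
        new_alpha_i = (1.0 - step) * alpha[i] - step * (A[:, i] @ w - y[i])
        # Just maintaining alpha_bar = (1 / (lam * n)) * A @ alpha
        alpha_bar = alpha_bar + A[:, i] * (new_alpha_i - alpha[i]) / (lam * n)
        alpha[i] = new_alpha_i
    return w, alpha
```

Calling `duality_gap` before and after `quartz_ridge` on a small random problem is a quick way to watch the gap contract.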


Randomized Primal-Dual Methods

SDCA: SS. Shwartz & T. Zhang, 09/2012
mSDCA: M. Takáč, A. Bijral, P. Richtárik & N. Srebro, 03/2013
ASDCA: SS. Shwartz & T. Zhang, 05/2013
AccProx-SDCA: SS. Shwartz & T. Zhang, 10/2013
DisDCA: TB. Yang, 2013
Iprox-SDCA: PL. Zhao & T. Zhang, 01/2014
APCG: QH. Lin, Z. Lu & L. Xiao, 07/2014
SPDC: Y. Zhang & L. Xiao, 09/2014
QUARTZ: Z. Q., P. Richtárik & T. Zhang, 11/2014


Convergence Theorem

ESO Assumption: Expected Separable Overapproximation

Convex combination constant
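A sketch of the two ingredients named on this slide, as they appear in arXiv:1411.5873 to the best of my recollection. The assumptions behind the symbols: $\hat{S}$ is the random sampling with $p_i = \mathbf{P}(i \in \hat{S}) > 0$, each $\phi_i$ is $(1/\gamma)$-smooth, $g$ is 1-strongly convex, and $v = (v_1, \dots, v_n)$ are the ESO parameters.

```latex
\begin{align*}
\text{ESO assumption:}\quad
  & \mathbf{E}\Big[\Big\|\sum_{i \in \hat{S}} A_i h_i\Big\|^2\Big]
    \;\le\; \sum_{i=1}^n p_i\, v_i\, \|h_i\|^2
    \qquad \text{for all } h_1, \dots, h_n \\[4pt]
\text{Convex combination constant:}\quad
  & \theta := \min_{i} \frac{p_i\, \lambda \gamma n}{v_i + \lambda \gamma n} \\[4pt]
\text{Convergence:}\quad
  & \mathbf{E}\big[P(w^t) - D(\alpha^t)\big]
    \;\le\; (1 - \theta)^t \big(P(w^0) - D(\alpha^0)\big)
\end{align*}
```

The sampling enters only through $p$ and $v$, which is what allows the analysis to cover arbitrary samplings.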


Iteration Complexity Result

(*)
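The step from a linear rate to an iteration count can be worked out as follows. This is a sketch assuming the rate $\mathbf{E}[P(w^t) - D(\alpha^t)] \le (1-\theta)^t (P(w^0) - D(\alpha^0))$ with $\theta = \min_i \frac{p_i \lambda\gamma n}{v_i + \lambda\gamma n}$, in the notation of arXiv:1411.5873.

```latex
\begin{align*}
\frac{1}{\theta}
  &= \max_i \frac{v_i + \lambda\gamma n}{p_i\, \lambda\gamma n}
   = \max_i \Big(\frac{1}{p_i} + \frac{v_i}{p_i\, \lambda\gamma n}\Big), \\[4pt]
(1-\theta)^T &\le e^{-\theta T}, \\[4pt]
T \;\ge\; \max_i \Big(\frac{1}{p_i} + \frac{v_i}{p_i\, \lambda\gamma n}\Big)
  \log\!\Big(\frac{P(w^0) - D(\alpha^0)}{\epsilon}\Big)
  \;&\Longrightarrow\;
  \mathbf{E}\big[P(w^T) - D(\alpha^T)\big] \le \epsilon .
\end{align*}
```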


Complexity Results for Serial Sampling
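For a serial sampling (one coordinate per iteration) the leading complexity term $\max_i \big(\tfrac{1}{p_i} + \tfrac{v_i}{p_i \lambda\gamma n}\big)$ can be compared across probability choices. The sketch below assumes scalar dual variables with $v_i = \|A_i\|^2$, and the function names are hypothetical. Minimizing the maximum over the simplex gives $p_i \propto v_i + \lambda\gamma n$, which equalizes all terms.

```python
import numpy as np

def leading_term(p, v, lam, gamma):
    """max_i (1/p_i + v_i / (p_i * lam * gamma * n)), the factor in front of log(1/eps)."""
    n = len(p)
    return np.max(1.0 / p + v / (p * lam * gamma * n))

def uniform_probs(n):
    """Uniform serial sampling: p_i = 1/n."""
    return np.full(n, 1.0 / n)

def optimal_serial_probs(v, lam, gamma):
    """Importance sampling p_i proportional to v_i + lam * gamma * n."""
    n = len(v)
    weights = v + lam * gamma * n
    return weights / weights.sum()

# Illustrative, made-up values of v_i = ||A_i||^2
v = np.array([1.0, 4.0, 9.0, 100.0])
lam, gamma = 0.1, 1.0
uniform = leading_term(uniform_probs(len(v)), v, lam, gamma)            # n + max(v) / (lam * gamma)
optimal = leading_term(optimal_serial_probs(v, lam, gamma), v, lam, gamma)  # n + mean(v) / (lam * gamma)
```

Uniform sampling pays for the largest $v_i$, while the optimal serial sampling pays only for the average, so the gap between the two grows with how unevenly the example norms are spread.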


Experiment: Quartz vs SDCA, uniform vs optimal sampling


QUARTZ with Standard Mini-Batching


Data Sparsity

A normalized measure of average sparsity of the data

“Fully sparse data” “Fully dense data”


Iteration Complexity Results


Theoretical Speedup Factor

Linear speedup up to a certain data-independent mini-batch size:

Further data-dependent speedup:
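A small sketch of how such a speedup factor can be evaluated. The closed form used here is an assumption: it is the ratio $T(1)/T(\tau)$ obtained from the mini-batch bound $T(\tau) = \tfrac{n}{\tau} + \tfrac{1}{\lambda\gamma\tau}\big(1 + \tfrac{(\tilde\omega - 1)(\tau - 1)}{n - 1}\big)$, as I recall it from arXiv:1411.5873 under the normalization $\max_i \|A_i\|^2 = 1$, where $\tilde\omega \in [1, n]$ is the normalized sparsity measure. The function name is hypothetical.

```python
def speedup_factor(tau, n, lam_gamma, omega_tilde):
    """Theoretical speedup T(1) / T(tau) of tau-nice mini-batching (assumed closed form).

    tau         -- mini-batch size, 1 <= tau <= n
    n           -- number of examples (dual variables)
    lam_gamma   -- the product lambda * gamma
    omega_tilde -- normalized sparsity measure, 1 (fully sparse) to n (fully dense)
    """
    penalty = (omega_tilde - 1.0) * (tau - 1.0) / ((n - 1.0) * (1.0 + lam_gamma * n))
    return tau / (1.0 + penalty)

n, lam_gamma = 1000, 0.01
sparse = speedup_factor(12, n, lam_gamma, omega_tilde=1)   # fully sparse data
dense = speedup_factor(12, n, lam_gamma, omega_tilde=n)    # fully dense data
```

Because $\tilde\omega \le n$, the penalty term is at most $(\tau - 1)/(1 + \lambda\gamma n)$, so for $\tau \le 2 + \lambda\gamma n$ the speedup is at least $\tau/2$ regardless of the data. Sparser data (smaller $\tilde\omega$) shrinks the penalty further, which is the data-dependent part.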


Plots of Theoretical Speedup Factor


Theoretical vs Practical Speedup

astro_ph: sparsity 0.08%; n = 29,882

cov1: sparsity 22.22%; n = 522,911


Comparison with Accelerated Mini-Batch P-D Methods


Distribution of Data

n = # dual variables

Data matrix


Distributed Sampling

Random set of dual variables
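One way to generate such a random set is sketched below. The construction is an assumption in the spirit of the distributed coordinate descent papers cited in this deck: the $n$ dual variables are split into $c$ equal contiguous partitions (one per node), and each node independently picks $\tau$ of its own coordinates uniformly at random. The function name and the contiguous partitioning are hypothetical choices.

```python
import numpy as np

def distributed_sampling(n, c, tau, rng):
    """Return a sorted array of c * tau indices: tau uniform picks from each of c partitions."""
    if n % c != 0:
        raise ValueError("n must be divisible by the number of nodes c")
    size = n // c                      # dual variables owned by each node
    if not 1 <= tau <= size:
        raise ValueError("tau must be between 1 and n / c")
    picks = [
        node * size + rng.choice(size, size=tau, replace=False)
        for node in range(c)
    ]
    return np.sort(np.concatenate(picks))

rng = np.random.default_rng(0)
S = distributed_sampling(n=12, c=3, tau=2, rng=rng)  # 6 indices, 2 per node
```

Every coordinate is selected with the same probability $p_i = \tau c / n$, so the sampling is uniform even though the nodes never coordinate their choices.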


Distributed Sampling & Distributed Coordinate Descent

Previously studied (not in the primal-dual setup):

Peter Richtárik and Martin Takáč. Distributed coordinate descent for learning with big data. arXiv:1310.2059, 2013

Olivier Fercoq, Z. Q., Peter Richtárik and Martin Takáč. Fast distributed coordinate descent for minimizing non strongly convex losses. 2014 IEEE Int Workshop on Machine Learning for Signal Processing, 2014

Jakub Marecek, Peter Richtárik and Martin Takáč. Fast distributed coordinate descent for minimizing partially separable functions. arXiv:1406.0238, 2014

2

strongly convex & smooth

convex & smooth


Complexity of Distributed QUARTZ


Reallocating Load: Theoretical Speedup


Theoretical vs Practical Speedup


More on ESO

ESO:

second order / curvature information

local second order / curvature information

lost

get


Computation of ESO Parameters

Lemma (QR’14b)

Sampling Data
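As an illustration of how ESO parameters depend on both the sampling and the data, the sketch below computes $v$ for the $\tau$-nice sampling with scalar dual variables. The formula $v_i = \sum_j \big(1 + \tfrac{(\omega_j - 1)(\tau - 1)}{n - 1}\big) A_{ji}^2$, with $\omega_j$ the number of nonzeros in row $j$ of the data matrix, is stated from memory of the QR'14b lemma and should be treated as an assumption. The function name is hypothetical.

```python
import numpy as np

def tau_nice_eso(A, tau):
    """ESO parameters v (length n) for tau-nice sampling; A is d x n, examples as columns."""
    d, n = A.shape
    omega = np.count_nonzero(A, axis=1)                 # nonzeros per feature (row)
    scale = 1.0 + (omega - 1.0) * (tau - 1.0) / max(n - 1, 1)
    return scale @ (A ** 2)                             # v_i = sum_j scale_j * A[j, i]^2

rng = np.random.default_rng(0)
A_dense = rng.standard_normal((3, 5))
col_norms_sq = (A_dense ** 2).sum(axis=0)
v_serial = tau_nice_eso(A_dense, tau=1)   # serial sampling: v_i = ||A_i||^2
v_full = tau_nice_eso(A_dense, tau=5)     # dense data, full batch: v_i = n * ||A_i||^2
v_sparse = tau_nice_eso(np.eye(4), tau=3) # each row has one nonzero: no penalty
```

The two extremes match the intuition from the sparsity slide: with one nonzero per row the mini-batch costs nothing extra, while with fully dense rows the parameters grow linearly in $\tau$.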


Conclusion

QUARTZ (Randomized coordinate ascent method with arbitrary sampling)

o Direct primal-dual analysis (for arbitrary sampling)
    optimal serial sampling
    tau-nice sampling (mini-batch)
    distributed sampling

o Theoretical speedup factor which
    is a very good predictor of the practical speedup factor
    depends on both the sparsity and the condition number
    shows a weak dependence on how data is distributed

Accelerated QUARTZ? Randomized fixed point algorithm with relaxation? …?
