
Computation and Minimax Risk

• The most challenging topic…
• Some recent progress:

– tradeoffs between time and accuracy via convex relaxations (Chandrasekaran & Jordan, 2013)

– constraints on computation via optimization oracles (Duchi, McMahan & Jordan, 2014)

– parallelization via optimistic concurrency control (Pan, et al., 2014)

Concurrency Control for Distributed Machine Learning

Michael I. Jordan

University of California, Berkeley

(with Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick and Joseph Bradley)


Distributed Computing Meets Large-Scale Statistical Inference

• In many areas of statistics, parallel/distributed approaches are increasingly essential (e.g., to provide time/sample tradeoffs)

• Many methods, either optimization-based or integration-based, involve exploring models having variable structure

• Leading to a core problem: how to ensure that statistical consistency and coherence are maintained when multiple processors are making structural changes to a model?

Serial Inference

[Figure: Data, Model State]

Coordination-Free Parallel Inference

[Figure: Data, Model State, Processor 1, Processor 2]

Keep Calm and Carry On.

[Figure: accuracy (low to high) plotted against scalability (low to high), with points labeled Serial, Coordination-free, and Concurrency Control]

Concurrency Control

• Database mechanisms
– Guarantee correctness
– Maximize concurrency
• Mutual exclusion
• Optimistic CC

Mutual Exclusion Through Locking

[Figure: Data, Model State, Processor 1, Processor 2]

• Introduce locking (scheduling) protocols to identify potential conflicts.
• Enforce serialization of computation that could conflict.
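The locking idea can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the talk: the shared counter stands in for the model state, and the single global lock and the function name `run_with_locking` are assumptions made for the example.

```python
import threading

def run_with_locking(num_workers=4, increments=1000):
    """Each worker updates the shared state only while holding the lock."""
    state = {"count": 0}      # hypothetical stand-in for the shared model state
    lock = threading.Lock()   # assumed: one global lock guards every update

    def worker():
        for _ in range(increments):
            with lock:        # updates that could conflict are serialized here
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

total = run_with_locking()
```

Because every update waits for the lock, the result is always correct, but workers block one another whenever they contend for the same state.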

Optimistic Concurrency Control

[Figure: Data, Model State, Processor 1, Processor 2]

• Allow computation to proceed without blocking.
• Validate potential conflicts.
– Valid outcome (✔)
– Invalid outcome (✗): take a compensating action, either amend the value or roll back and redo.

Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.

Optimistic Concurrency Control

[Figure: Data, Model State, Processor 1, Processor 2; Rollback and Redo]

Requirements:

• Non-blocking computation (concurrency)
• Validation: identify errors (accuracy); must be fast
• Resolution: correct errors (accuracy); must be infrequent
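The three ingredients (non-blocking computation, validation, resolution) can be sketched generically as follows. This is a hedged illustration of the optimistic pattern, not code from the talk: `VersionedCell`, `try_commit`, and `optimistic_update` are hypothetical names, and a version counter is only one possible validation rule.

```python
import threading

class VersionedCell:
    """Shared value plus a version counter that validation checks."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # held only for short read/commit steps

    def read(self):
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, new_value, seen_version):
        """Validation: succeed only if nobody committed since our read."""
        with self._commit_lock:
            if self.version != seen_version:
                return False
            self.value = new_value
            self.version += 1
            return True

def optimistic_update(cell, fn):
    """Compute on a snapshot without blocking; on failed validation, redo."""
    retries = 0
    while True:
        value, version = cell.read()
        proposal = fn(value)                  # non-blocking computation
        if cell.try_commit(proposal, version):
            return retries
        retries += 1                          # resolution: roll back and redo

cell = VersionedCell(0)

def work():
    for _ in range(500):
        optimistic_update(cell, lambda v: v + 1)

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The expensive step (`fn`) runs outside any lock, validation is a single comparison, and the scheme only pays off when failed validations are rare, which matches the "fast" and "infrequent" requirements.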

Concurrency Control

Coordination Free:

Provably fast, and correct under key assumptions.

Concurrency Control:

Provably correct, and fast under key assumptions.

Systems Ideas to Improve Efficiency

Examples

• Clustering: DP-means
• Submodularity: Double Greedy
• Bayesian Nonparametrics: Chinese Restaurant Process

[Figures: keywords A–H and queries 1–8 with costs and values; parameters θ1–θ6 and ϕ1–ϕ4]


Clustering with DP-means


Bayesian Nonparametrics Meets Optimization

• A methodology whereby optimization functionals arise when “small-variance asymptotics” are applied to Bayesian models based on combinatorial stochastic process priors


• Inspiration: the venerable, scalable K-means algorithm can be derived as the limit of an Expectation-Maximization algorithm for fitting a mixture model


• We do something similar in spirit, taking limits of various Bayesian nonparametric models:
– Dirichlet process mixtures
– hierarchical Dirichlet process mixtures
– beta processes and hierarchical beta processes
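The K-means inspiration can be written out as a short derivation. The notation is an assumption made for this sketch, not taken from the slides: a Gaussian mixture with mixing weights \pi_k > 0, means \mu_k, and shared covariance \sigma^2 I, with a unique nearest mean for each point.

```latex
% E-step responsibilities for a mixture with shared covariance \sigma^2 I
\gamma_{ik}
  = \frac{\pi_k \exp\!\left(-\lVert x_i - \mu_k \rVert^2 / (2\sigma^2)\right)}
         {\sum_{j} \pi_j \exp\!\left(-\lVert x_i - \mu_j \rVert^2 / (2\sigma^2)\right)}
  \;\xrightarrow{\;\sigma^2 \to 0\;}\;
  \begin{cases}
    1 & \text{if } k = \arg\min_j \lVert x_i - \mu_j \rVert^2, \\
    0 & \text{otherwise.}
  \end{cases}

% M-step: with hard assignments, each mean is the average of its points
\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}}
```

In the limit the soft responsibilities become hard nearest-center assignments and the M-step becomes the cluster mean, which is exactly the K-means iteration.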

DP-Means Algorithm [Kulis and Jordan, ICML’12]

• Computing cluster membership (figure labeled with the parameter λ)
• Updating cluster centers
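The slide's formulas did not survive extraction, so the two steps are sketched below in plain Python. This follows the usual description of DP-means, in which a point whose squared distance to every center exceeds λ starts a new cluster; the helper names, the single-cluster initialization, and the stopping rule are assumptions of this sketch rather than details confirmed by the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def dp_means(X, lam, max_iters=100):
    """Serial DP-means sketch; lam is the squared-distance threshold (assumed)."""
    centers = [mean(X)]                    # assumed start: one global cluster
    labels = [0] * len(X)
    for _ in range(max_iters):
        changed = False
        # Membership step: nearest center, or a new cluster if all are too far.
        for i, x in enumerate(X):
            d2 = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=d2.__getitem__)
            if d2[k] > lam:
                centers.append(list(x))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Center step: each non-empty cluster's center becomes its members' mean.
        for k in range(len(centers)):
            members = [x for x, z in zip(X, labels) if z == k]
            if members:
                centers[k] = mean(members)
        if not changed:
            break
    return centers, labels

X = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
centers, labels = dp_means(X, lam=1.0)
```

Clusters that end up empty (here, the initial global cluster) are left in `centers` for simplicity; a fuller implementation would likely prune them.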

DP-Means Parallel Execution

Computing cluster membership in parallel:

[Figure: CPU 1, CPU 2]

Cannot introduce overlapping clusters in parallel.

Optimistic Concurrency Control for Parallel DP-Means

• Optimistic assumption: no new cluster created nearby
• Validation: verify that new clusters don’t overlap
• Resolution: assign new cluster center to existing cluster

[Figure: CPU 1, CPU 2]
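One epoch of the optimistic scheme can be sketched as follows. This is an illustrative reading of the three steps, not the authors' implementation: workers are simulated sequentially rather than run on separate CPUs, and the function names, the sharding rule, and the use of λ as a squared-distance threshold are assumptions of the sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def worker_pass(points, snapshot, lam):
    """Optimistic phase: assign against a snapshot; far points become proposals."""
    assigned, proposals = [], []
    for x in points:
        d2 = [sq_dist(x, c) for c in snapshot]
        if d2 and min(d2) <= lam:
            assigned.append((x, d2.index(min(d2))))
        else:
            proposals.append(x)   # optimistic: assume no one else proposes nearby
    return assigned, proposals

def validate(proposals, centers, first_new, lam):
    """Serial validation against the centers accepted earlier in this epoch."""
    resolved, rejected = [], 0
    for x in proposals:
        d2 = [sq_dist(x, c) for c in centers[first_new:]]
        if d2 and min(d2) <= lam:
            # Resolution: assign the rejected proposal to the existing cluster.
            resolved.append((x, first_new + d2.index(min(d2))))
            rejected += 1
        else:
            centers.append(list(x))
            resolved.append((x, len(centers) - 1))
    return resolved, rejected

def occ_epoch(batch, centers, lam, num_workers=2):
    """Run one epoch; centers is extended in place with accepted proposals."""
    snapshot = [list(c) for c in centers]
    first_new = len(centers)
    assignments, proposals = [], []
    for w in range(num_workers):           # simulated workers, run in sequence
        a, p = worker_pass(batch[w::num_workers], snapshot, lam)
        assignments += a
        proposals += p
    resolved, rejected = validate(proposals, centers, first_new, lam)
    return assignments + resolved, rejected

centers = []
batch = [[0, 0], [0.1, 0], [10, 10], [10, 10.1]]
out, rejected = occ_epoch(batch, centers, lam=1.0, num_workers=2)
```

In this small run both workers propose a center near the origin and one near (10, 10); validation accepts the first of each pair and folds the second into it, so `rejected` counts the overhead that the theorems on the next slide bound.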

Concurrency Control for DP-means

Correctness

Theorem: OCC DP-means is serializable, i.e. equivalent to some sequential execution.

Corollary: OCC DP-means preserves theoretical properties of DP-means.

Concurrency

Theorem: Assuming well-spaced clusters, the expected overhead of OCC DP-means, in terms of the number of rejected proposals, does not depend on the size of the data set.

Empirical Validation Failure Rate

[Figure: OCC overhead (points failing validation) versus dataset size for λ-separable clusters, with 2, 4, 8, 16, and 32 processors]

Independence of dataset size

[Figure: OCC overhead (points failing validation) versus dataset size for overlapping clusters, with 2, 4, 8, 16, and 32 processors]

Weak dependence on dataset size

Distributed Evaluation (Amazon EC2)

[Figure: runtime in seconds per complete pass over the data (axis 0 to 3500) versus number of machines (1 to 8); series: OCC DP-means runtime, projected linear scaling]

2x #machines ≈ ½x runtime

~140 million data points; 1, 2, 4, 8 machines

Summary

Approach              Accuracy                           Scalability
Sequential            Appealing theoretical properties   Little
Coordination-free     Approximate, under assumptions     Always fast
Concurrency Control   Always correct                     Good, under assumptions

• Coordination-free approach guarantees speed, and analysis focuses on showing accuracy under assumptions.
• Our approach guarantees accuracy, and analysis focuses on showing speed under assumptions.


Conclusions

• Many conceptual and mathematical challenges arising in taking seriously the problem of “Big Data”

• Facing these challenges will require a rapprochement between computer science and statistics, bringing them together at the level of their foundations – thus reshaping both disciplines