Transcript
Page 1: A New Parallel Framework for  Machine Learning
Page 2: A New Parallel Framework for  Machine Learning

Carnegie Mellon

Joseph Gonzalez

Joint work with:

Yucheng Low

Aapo Kyrola

Danny Bickson

Carlos Guestrin

Guy Blelloch

Joe Hellerstein

David O'Hallaron

Alex Smola

A New Parallel Framework for Machine Learning

Page 3: A New Parallel Framework for  Machine Learning

In ML we face BIG problems

48 Hours a Minute: YouTube

24 Million Wikipedia Pages

750 Million Facebook Users

6 Billion Flickr Photos

Page 4: A New Parallel Framework for  Machine Learning

Massive data provides opportunities for rich probabilistic structure …


Page 5: A New Parallel Framework for  Machine Learning

[Figure: a social network linking Shopper 1 (interested in Cameras) and Shopper 2 (interested in Cooking)]

Page 6: A New Parallel Framework for  Machine Learning

What are the tools for massive data?


Page 7: A New Parallel Framework for  Machine Learning

Parallelism: Hope & Challenges

Wide array of different parallel architectures:

New Challenges for Designing Machine Learning Algorithms: race conditions and deadlocks; managing distributed model state

New Challenges for Implementing Machine Learning Algorithms: parallel debugging and profiling; hardware-specific APIs


GPUs, Multicore, Clusters, Mini Clouds, Clouds

Page 8: A New Parallel Framework for  Machine Learning


Massive Structured Problems

Advances in Parallel Hardware

Thesis:

“Parallel Learning and Inference in Probabilistic Graphical Models”

Page 9: A New Parallel Framework for  Machine Learning


Massive Structured Problems

Advances in Parallel Hardware

“Parallel Learning and Inference in Probabilistic Graphical Models”

GraphLab

Parallel Algorithms for Probabilistic Learning and Inference

Probabilistic Graphical Models

Page 10: A New Parallel Framework for  Machine Learning


GraphLab

Massive Structured Problems

Advances in Parallel Hardware

Parallel Algorithms for Probabilistic Learning and Inference

Probabilistic Graphical Models

Page 11: A New Parallel Framework for  Machine Learning


Massive Structured Problems

Advances in Parallel Hardware

GraphLab

Parallel Algorithms for Probabilistic Learning and Inference

Probabilistic Graphical Models

Page 12: A New Parallel Framework for  Machine Learning

How will we design and implement parallel learning systems?

Page 13: A New Parallel Framework for  Machine Learning

Threads, Locks, & Messages

“low level parallel primitives”

We could use ….

Page 14: A New Parallel Framework for  Machine Learning

Threads, Locks, and Messages

ML experts repeatedly solve the same parallel design challenges:

Implement and debug a complex parallel system
Tune for a specific parallel platform
Two months later the conference paper contains:

“We implemented ______ in parallel.”

The resulting code:

is difficult to maintain
is difficult to extend
couples learning model to parallel implementation


Graduate students

Page 15: A New Parallel Framework for  Machine Learning

Map-Reduce / Hadoop

Build learning algorithms on top of

high-level parallel abstractions

... a better answer:

Page 16: A New Parallel Framework for  Machine Learning

MapReduce – Map Phase

Embarrassingly parallel independent computation; no communication needed.

[Figure: CPU 1–4 each independently map an image to a value, e.g. 12.9, 42.3, 21.3, 25.8]

Page 17: A New Parallel Framework for  Machine Learning

MapReduce – Map Phase

Image Features

[Figure: CPU 1–4 continue mapping images to feature values, e.g. 24.1, 84.3, 18.4, 84.4]

Page 18: A New Parallel Framework for  Machine Learning

MapReduce – Map Phase

Embarrassingly parallel independent computation; no communication needed.

[Figure: CPU 1–4 map the remaining images, e.g. 17.5, 67.5, 14.9, 34.3]

Page 19: A New Parallel Framework for  Machine Learning

MapReduce – Reduce Phase

Image Features

[Figure: CPU 1 and CPU 2 reduce the mapped image features into aggregate statistics; each image is labeled A (attractive) or N (not attractive)]

Attractive Face Statistics / Not Attractive Face Statistics
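To make the two phases concrete, here is a minimal sketch of the pattern in plain C++ (not Hadoop code; the Example record and the A/N labels are illustrative assumptions):

#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one image reduced to a single feature value plus a
// label, 'A' (attractive) or 'N' (not attractive).
struct Example { double feature; char label; };

// Map phase: every example is processed independently (embarrassingly
// parallel, no communication needed) and emitted as a (label, value) pair.
std::pair<char, double> map_example(const Example& e) {
  return {e.label, e.feature};
}

// Reduce phase: all values sharing a label are aggregated into one statistic
// (here, the mean feature value per class).
std::map<char, double> reduce_mean(const std::vector<std::pair<char, double>>& pairs) {
  std::map<char, double> sum, count;
  for (const auto& p : pairs) { sum[p.first] += p.second; count[p.first] += 1.0; }
  std::map<char, double> mean;
  for (const auto& kv : sum) mean[kv.first] = kv.second / count[kv.first];
  return mean;
}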

Page 20: A New Parallel Framework for  Machine Learning

Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Is there more to Machine Learning?

Page 21: A New Parallel Framework for  Machine Learning

Concrete Example: Label Propagation

Page 22: A New Parallel Framework for  Machine Learning

Label Propagation Algorithm

Social Arithmetic (example from my profile):
50% × what I list on my profile (50% Cameras, 50% Biking)
40% × what Sue Ann likes (80% Cameras, 20% Biking)
10% × what Carlos likes (30% Cameras, 70% Biking)
= I Like: 60% Cameras, 40% Biking

Recurrence Algorithm:
Likes[i] ← Σ_{j ∈ Friends[i]} W_ij × Likes[j]
(a weighted average over my own profile and my friends' interests)
iterate until convergence

Parallelism: compute all Likes[i] in parallel
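A minimal, framework-free sketch of this recurrence (the graph layout and the convergence test are illustrative assumptions; the vertex's own profile can be modeled as a self-edge carrying its weight):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Each vertex holds a distribution over interests, e.g. {Cameras, Biking}.
using Interests = std::vector<double>;

struct Edge { int neighbor; double weight; };  // W_ij (include a self-edge for the profile term)

// One synchronous sweep: every Likes[i] is recomputed as the weighted average
// of its neighbors' (and its own profile's) interests. All vertices can be
// updated in parallel because each update reads only the previous iteration.
double label_prop_sweep(const std::vector<std::vector<Edge>>& graph,
                        const std::vector<Interests>& old_likes,
                        std::vector<Interests>& new_likes) {
  double max_change = 0.0;
  for (std::size_t i = 0; i < graph.size(); ++i) {
    Interests acc(old_likes[i].size(), 0.0);
    for (const Edge& e : graph[i])
      for (std::size_t k = 0; k < acc.size(); ++k)
        acc[k] += e.weight * old_likes[e.neighbor][k];
    for (std::size_t k = 0; k < acc.size(); ++k)
      max_change = std::max(max_change, std::fabs(acc[k] - old_likes[i][k]));
    new_likes[i] = acc;
  }
  return max_change;  // iterate sweeps until this falls below a tolerance
}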

Page 23: A New Parallel Framework for  Machine Learning

Properties of Graph Parallel Algorithms

Dependency Graph

Factored Computation (what I like depends on what my friends like)

Iterative Computation

Page 24: A New Parallel Framework for  Machine Learning

Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Map Reduce?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Page 25: A New Parallel Framework for  Machine Learning

Why not use Map-Reduce for Graph Parallel Algorithms?

Page 26: A New Parallel Framework for  Machine Learning

Data Dependencies

Map-Reduce does not efficiently express dependent data:

User must code substantial data transformations; costly data replication.

[Figure: independent data rows]

Page 27: A New Parallel Framework for  Machine Learning

Iterative Algorithms

Map-Reduce does not efficiently express iterative algorithms:

[Figure: every iteration re-loads the data, runs map tasks on CPU 1–3, and synchronizes at a barrier; a single slow processor holds up each iteration]

Page 28: A New Parallel Framework for  Machine Learning

MapAbuse: Iterative MapReduce

Only a subset of data needs computation:

[Figure: the same barrier-synchronized iterations run over all the data, even though only a few data items change each round]

Page 29: A New Parallel Framework for  Machine Learning

MapAbuse: Iterative MapReduce

System is not optimized for iteration:

[Figure: each iteration pays a startup penalty and a disk penalty, reloading and re-writing all data between barriers]

Page 30: A New Parallel Framework for  Machine Learning

Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Map Reduce? Pregel (Giraph)?): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Page 31: A New Parallel Framework for  Machine Learning

Pregel (Giraph)

Bulk Synchronous Parallel Model: Compute, Communicate, Barrier

Page 32: A New Parallel Framework for  Machine Learning

Problem: Bulk synchronous computation can be highly inefficient.

Example: Loopy Belief Propagation

Page 33: A New Parallel Framework for  Machine Learning

Loopy Belief Propagation (Loopy BP)

• Iteratively estimate the “beliefs” about vertices
  – Read in messages
  – Update marginal estimate (belief)
  – Send updated out messages
• Repeat for all variables until convergence
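For reference, the standard sum-product updates each vertex applies (written here in generic pairwise-MRF notation; the slide itself does not spell out the equations):

m_{i \to j}(x_j) \propto \sum_{x_i} \psi_{ij}(x_i, x_j)\, \psi_i(x_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i)

b_i(x_i) \propto \psi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i)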


Page 34: A New Parallel Framework for  Machine Learning

Bulk Synchronous Loopy BP

• Often considered embarrassingly parallel
  – Associate a processor with each vertex
  – Receive all messages
  – Update all beliefs
  – Send all messages
• Proposed by:
  – Brunton et al. CRV'06
  – Mendiburu et al. GECC'07
  – Kang et al. LDMTA'10
  – …

Page 35: A New Parallel Framework for  Machine Learning

Sequential Computational Structure


Page 36: A New Parallel Framework for  Machine Learning

Hidden Sequential Structure


Page 37: A New Parallel Framework for  Machine Learning

Hidden Sequential Structure

• Running Time = (time for a single parallel iteration) × (number of iterations)

[Figure: a chain with evidence at both ends; information must propagate across the whole chain, so the number of iterations grows with the chain length]

Page 38: A New Parallel Framework for  Machine Learning

Optimal Sequential Algorithm

Running time on a chain of length n with p processors:
• Bulk Synchronous: 2n²/p (requires p ≤ 2n)
• Forward-Backward (sequential, p = 1): 2n
• Optimal Parallel (p = 2): n

There is a gap between the bulk synchronous running time and the optimal parallel algorithm.

Page 39: A New Parallel Framework for  Machine Learning

The Splash Operation

• Generalize the optimal chain algorithm to arbitrary cyclic graphs:
  1) Grow a BFS spanning tree with fixed size
  2) Forward pass computing all messages at each vertex
  3) Backward pass computing all messages at each vertex
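A sketch of step 1 under simple assumptions (adjacency-list graph, plain BFS with a size cap); this illustrates the idea rather than the actual GraphLab/SplashBP implementation:

#include <cstddef>
#include <deque>
#include <unordered_set>
#include <vector>

// Grow a breadth-first spanning tree of at most max_size vertices rooted at
// root. The returned visit order is traversed forward (step 2) and in reverse
// (step 3) to compute messages.
std::vector<int> grow_splash(const std::vector<std::vector<int>>& adj,
                             int root, std::size_t max_size) {
  std::vector<int> order;
  std::unordered_set<int> visited{root};
  std::deque<int> frontier{root};
  while (!frontier.empty() && order.size() < max_size) {
    int v = frontier.front();
    frontier.pop_front();
    order.push_back(v);
    for (int u : adj[v])
      if (visited.insert(u).second) frontier.push_back(u);
  }
  return order;  // forward pass: iterate order; backward pass: iterate in reverse
}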

Page 40: A New Parallel Framework for  Machine Learning

Data-Parallel Algorithms can be Inefficient

The limitations of the Map-Reduce abstraction can lead to inefficient parallel algorithms.

[Figure: runtime in seconds vs. number of CPUs (1–8), comparing "Optimized in Memory Bulk Synchronous" against "Asynchronous Splash BP"]

Page 41: A New Parallel Framework for  Machine Learning

The Need for a New Abstraction

Map-Reduce is not well suited for Graph-Parallelism.

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Pregel (Giraph)): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Page 42: A New Parallel Framework for  Machine Learning

What is GraphLab?

Page 43: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 44: A New Parallel Framework for  Machine Learning

Data Graph

A graph with arbitrary data (C++ objects) associated with each vertex and edge.

Vertex Data:
• User profile text
• Current interest estimates

Edge Data:
• Similarity weights

Graph:
• Social Network
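As a small illustration, the vertex and edge data described above written as plain C++ structs (the field names are assumptions for this example; a real program would attach such types to the GraphLab graph object):

#include <string>
#include <vector>

// Illustrative vertex data for the social-network example: the profile text
// and the current estimate of the user's interests (e.g. {Cameras, Biking}).
struct VertexData {
  std::string profile_text;
  std::vector<double> interest_estimates;
};

// Illustrative edge data: the similarity weight W_ij between two users.
struct EdgeData {
  double similarity_weight;
};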

Page 45: A New Parallel Framework for  Machine Learning

Implementing the Data Graph

Multicore setting (in memory)
Challenge: fast lookup, low overhead
Solution: dense data structures; fixed Vdata & Edata types; immutable graph structure

Cluster setting (in memory)
Partition the graph: ParMETIS or random cuts
Cached ghosting

[Figure: vertices A, B, C, D partitioned across Node 1 and Node 2, with ghost copies of boundary vertices cached on each node]

Page 46: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 47: A New Parallel Framework for  Machine Learning

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], W_ij, Likes[j]) ← scope;

  // Update the vertex data
  Likes[i] ← Σ_{j ∈ Friends[i]} W_ij × Likes[j];

  // Reschedule neighbors if needed
  if Likes[i] changes then
    reschedule_neighbors_of(i);
}

Update Functions


An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

Page 48: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 49: A New Parallel Framework for  Machine Learning

The Scheduler

The scheduler determines the order that vertices are updated.

[Figure: CPU 1 and CPU 2 each pop a vertex (a–k) from a shared scheduler queue, apply the update function, and may push neighbors back onto the scheduler]

The process repeats until the scheduler is empty.

Page 50: A New Parallel Framework for  Machine Learning

Choosing a Schedule

GraphLab provides several different schedulers:
• Round Robin: vertices are updated in a fixed order
• FIFO: vertices are updated in the order they are added
• Priority: vertices are updated in priority order

The choice of schedule affects the correctness and parallel performance of the algorithm.

Obtain different algorithms by simply changing a flag!
  --scheduler=roundrobin
  --scheduler=fifo
  --scheduler=priority
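To make the execution model concrete, here is a framework-agnostic, single-threaded sketch of the schedule-update loop (the Scheduler and UpdateFn names are illustrative, not the GraphLab API):

#include <deque>
#include <functional>
#include <unordered_set>

// Pop a vertex from the scheduler, apply the user-defined update function to
// it, and let the update function reschedule neighbors as needed.
struct Scheduler {
  std::deque<int> fifo;             // FIFO order; a priority queue would give the priority schedule
  std::unordered_set<int> queued;   // avoid scheduling the same vertex twice
  void push(int v) { if (queued.insert(v).second) fifo.push_back(v); }
  bool empty() const { return fifo.empty(); }
  int pop() { int v = fifo.front(); fifo.pop_front(); queued.erase(v); return v; }
};

// The update function receives the vertex id and the scheduler so that it can
// reschedule neighbors (as label_prop does when Likes[i] changes).
using UpdateFn = std::function<void(int vertex, Scheduler& sched)>;

void run(Scheduler& sched, const UpdateFn& update) {
  while (!sched.empty()) {          // the process repeats until the scheduler is empty
    int v = sched.pop();
    update(v, sched);               // reads/writes data in v's scope
  }
}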

Page 51: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 52: A New Parallel Framework for  Machine Learning

Ensuring Race-Free Code

How much can computation overlap?

Page 53: A New Parallel Framework for  Machine Learning

Common Problem: Write-Write Race

Processors running adjacent update functions simultaneously modify shared data:

[Figure: CPU 1 and CPU 2 both write to data on a shared edge; only one of the writes survives as the final value]

Page 54: A New Parallel Framework for  Machine Learning

Importance of Consistency

Alternating Least Squares

Page 55: A New Parallel Framework for  Machine Learning

Importance of Consistency

[Figure: the model development loop — Build, Test, Debug, Tweak Model]

Page 56: A New Parallel Framework for  Machine Learning

GraphLab Ensures Sequential Consistency

For each parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: a parallel execution on CPU 1 and CPU 2 alongside the equivalent sequential execution on a single CPU, laid out over time]

Page 57: A New Parallel Framework for  Machine Learning

Consistency Rules

Guaranteed sequential consistency for all update functions.

Page 58: A New Parallel Framework for  Machine Learning

Full Consistency


Page 59: A New Parallel Framework for  Machine Learning

Obtaining More Parallelism


Page 60: A New Parallel Framework for  Machine Learning

Edge Consistency

[Figure: under edge consistency, CPU 1 and CPU 2 simultaneously update two non-adjacent vertices; the read of the shared vertex between them remains safe]

Page 61: A New Parallel Framework for  Machine Learning

Consistency Through R/W Locks

Read/write locks:
• Full Consistency: write-lock the center vertex and all adjacent vertices (Write, Write, Write)
• Edge Consistency: write-lock the center vertex, read-lock adjacent vertices (Read, Write, Read)

Locks are acquired in a canonical ordering to avoid deadlock.
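A minimal sketch of the edge-consistency locking discipline under these assumptions (one shared_mutex per vertex, locks acquired in ascending vertex-id order); it illustrates the canonical ordering, not GraphLab's actual lock manager:

#include <algorithm>
#include <shared_mutex>
#include <vector>

// Write-lock the center vertex, read-lock its neighbors, always acquiring
// locks in ascending vertex-id order (the canonical ordering) so two
// overlapping scopes cannot deadlock. Releasing the locks (not shown) can be
// done in any order.
void lock_edge_scope(std::vector<std::shared_mutex>& vertex_locks,
                     int center, std::vector<int> neighbors) {
  neighbors.push_back(center);
  std::sort(neighbors.begin(), neighbors.end());   // canonical lock ordering
  for (int v : neighbors) {
    if (v == center)
      vertex_locks[v].lock();         // exclusive: the update writes the center vertex
    else
      vertex_locks[v].lock_shared();  // shared: adjacent data is only read
  }
}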

Page 62: A New Parallel Framework for  Machine Learning

Consistency Through R/W Locks

Multicore setting: pthread R/W locks
Distributed setting: distributed locking
• Prefetch locks and data
• Allow computation to proceed while locks/data are requested (lock pipeline)

[Figure: the data graph partitioned across Node 1 and Node 2, each running a lock pipeline]

Page 63: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 64: A New Parallel Framework for  Machine Learning

Anatomy of a GraphLab Program:

1) #include <graphlab.hpp>
2) Define C++ update function
3) Build data graph using the C++ graph object
4) Set engine parameters:
   a) Scheduler type
   b) Consistency model
5) Add initial vertices to the scheduler
6) Run the engine on the graph [blocking call]
7) Final answer is stored in the graph

Page 65: A New Parallel Framework for  Machine Learning

Algorithms Implemented

• PageRank
• Loopy Belief Propagation
• Gibbs Sampling
• CoEM
• Graphical Model Parameter Learning
• Probabilistic Matrix/Tensor Factorization
• Alternating Least Squares
• Lasso with Sparse Features
• Support Vector Machines with Sparse Features
• Label Propagation
• …

Page 66: A New Parallel Framework for  Machine Learning

Shared Memory Experiments

Shared memory setting: 16-core workstation

Page 67: A New Parallel Framework for  Machine Learning

Loopy Belief Propagation

3D retinal image denoising
Data Graph — Vertices: 1 million; Edges: 3 million
Update Function: Loopy BP update equation
Scheduler: SplashBP
Consistency Model: Edge Consistency

Page 68: A New Parallel Framework for  Machine Learning

Loopy Belief Propagation

[Figure: speedup vs. number of CPUs (1–16) for SplashBP, compared to the optimal linear speedup]

15.5x speedup

Page 69: A New Parallel Framework for  Machine Learning

Gibbs Sampling

Protein-protein interaction networks [Elidan et al. 2006]
Provably correct parallelization via edge consistency
Discrete MRF: 14K vertices, 100K edges

[Figure: protein backbone and side-chain interactions modeled as an MRF]

Page 70: A New Parallel Framework for  Machine Learning

Gibbs Sampling

[Figure: speedup vs. number of CPUs (1–16) for the Chromatic Gibbs Sampler, compared to the optimal linear speedup]

Page 71: A New Parallel Framework for  Machine Learning

Carnegie Mellon

Splash Gibbs Sampler

An asynchronous Gibbs sampler that adaptively addresses strong dependencies.

Page 72: A New Parallel Framework for  Machine Learning

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel (the Splashes are conditionally independent)

Page 73: A New Parallel Framework for  Machine Learning

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel (conditionally independent; tree-width = 1)

Page 74: A New Parallel Framework for  Machine Learning

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel (conditionally independent; tree-width = 2)

Page 75: A New Parallel Framework for  Machine Learning

Splash Gibbs Sampler

Step 2: Calibrate the trees in parallel

Page 76: A New Parallel Framework for  Machine Learning

Splash Gibbs Sampler

Step 3: Sample trees in parallel

Page 77: A New Parallel Framework for  Machine Learning

Experimental Results

The Splash sampler outperforms the Chromatic sampler on models with strong dependencies.

• Markov logic network with strong dependencies: 10K variables, 28K factors

[Figure: three panels comparing Splash vs. Chromatic — likelihood of the final sample, "mixing", and speedup in sample generation]

Page 78: A New Parallel Framework for  Machine Learning

CoEM (Rosie Jones, 2005)

Named Entity Recognition task: is "Dog" an animal? Is "Catalina" a place?

[Figure: a bipartite graph between noun phrases ("the dog", "Australia", "Catalina Island") and contexts ("<X> ran quickly", "travelled to <X>", "<X> is pleasant")]

Vertices: 2 million; Edges: 200 million

Hadoop: 95 cores, 7.5 hrs

Page 79: A New Parallel Framework for  Machine Learning

CoEM (Rosie Jones, 2005)

[Figure: speedup vs. number of CPUs (1–16) for GraphLab CoEM, compared to the optimal linear speedup]

Hadoop: 95 cores, 7.5 hrs
GraphLab: 16 cores, 30 min

15x faster! 6x fewer CPUs!

Page 80: A New Parallel Framework for  Machine Learning

Lasso: Regularized Linear Model

[Figure: a bipartite graph between weight vertices and observation vertices built from the data matrix (n × d), the weights (d × 1), and the observations (n × 1); the toy example has 5 features and 4 examples, plus a regularization term]

Shooting Algorithm [Coordinate Descent]
• Updates on weight vertices modify losses on observation vertices.
• Requires the Full Consistency model.

Financial prediction dataset from Kogan et al. [2009].
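For context, one shooting update in its simplest dense, sequential form; this is a sketch of the coordinate-descent step, not the graph-based GraphLab version, and the variable names and dense matrix layout are illustrative:

#include <cstddef>
#include <vector>

// Shooting (coordinate descent) for the lasso objective
//   (1/2) * ||y - X w||^2 + lambda * ||w||_1
double soft_threshold(double c, double lambda) {
  if (c >  lambda) return c - lambda;
  if (c < -lambda) return c + lambda;
  return 0.0;
}

// Update a single weight w[j], holding all other weights fixed.
void shoot(const std::vector<std::vector<double>>& X,  // n x d data matrix
           const std::vector<double>& y,               // n observations
           std::vector<double>& w,                     // d weights (updated in place)
           std::size_t j, double lambda) {
  double a = 0.0, c = 0.0;
  for (std::size_t i = 0; i < X.size(); ++i) {
    double pred_minus_j = 0.0;                         // prediction excluding feature j
    for (std::size_t k = 0; k < w.size(); ++k)
      if (k != j) pred_minus_j += X[i][k] * w[k];
    a += X[i][j] * X[i][j];
    c += X[i][j] * (y[i] - pred_minus_j);
  }
  w[j] = (a > 0.0) ? soft_threshold(c, lambda) / a : 0.0;
}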

Page 81: A New Parallel Framework for  Machine Learning

Full Consistency

[Figure: speedup vs. number of CPUs (1–16) under full consistency, for dense and sparse datasets, compared to the optimal linear speedup]

Page 82: A New Parallel Framework for  Machine Learning

Relaxing Consistency

[Figure: speedup vs. number of CPUs (1–16) with relaxed consistency, for dense and sparse datasets, compared to the optimal linear speedup]

Why does this work? (See the Shotgun ICML paper.)

Page 83: A New Parallel Framework for  Machine Learning

Experiments: Amazon EC2

High-performance nodes

Page 84: A New Parallel Framework for  Machine Learning

Video Cosegmentation

Segments that mean the same thing are linked across frames

Model: 10.5 million nodes, 31 million edges

Gaussian EM clustering + BP on 3D grid

Page 85: A New Parallel Framework for  Machine Learning

Video Coseg. Speedups

Page 86: A New Parallel Framework for  Machine Learning

Matrix Factorization: Netflix Collaborative Filtering

Alternating Least Squares matrix factorization
Model: 0.5 million nodes, 99 million edges

[Figure: the Netflix ratings matrix factored into a Users × d matrix and a d × Movies matrix; equivalently, a bipartite user–movie graph]
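For reference, the regularized objective this factorization minimizes and the alternating least-squares update for a user factor (standard ALS notation; the symbols are mine, not from the slide):

\min_{U,V} \sum_{(u,m)\,\text{rated}} \left( r_{um} - U_u^\top V_m \right)^2 + \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)

U_u \leftarrow \left( \sum_{m \in N(u)} V_m V_m^\top + \lambda I \right)^{-1} \sum_{m \in N(u)} r_{um} V_m

The movie factors V_m are updated symmetrically, alternating until convergence.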

Page 87: A New Parallel Framework for  Machine Learning

Netflix speedup as the size of the matrix factorization (d) increases

Page 88: A New Parallel Framework for  Machine Learning

The Cost of Hadoop

Page 89: A New Parallel Framework for  Machine Learning

Summary

An abstraction tailored to machine learning:
• Targets graph-parallel algorithms
• Naturally expresses data/computational dependencies and dynamic iterative computation
• Simplifies parallel algorithm design
• Automatically ensures data consistency
• Achieves state-of-the-art parallel performance on a variety of problems

Page 90: A New Parallel Framework for  Machine Learning

Carnegie Mellon

Check out GraphLab

http://graphlab.org

Documentation… Code… Tutorials…

Questions & Feedback

[email protected]

Page 91: A New Parallel Framework for  Machine Learning

Current/Future Work

• Out-of-core storage
• Hadoop/HDFS integration: graph construction, graph storage, launching GraphLab from Hadoop, fault tolerance through HDFS checkpoints
• Sub-scope parallelism: address the challenge of very high degree nodes
• Improved graph partitioning
• Support for dynamic graph structure

