A New Parallel Framework for Machine Learning
Page 1: A New Parallel Framework for  Machine Learning
Page 2: A New Parallel Framework for  Machine Learning

Carnegie Mellon

A New Parallel Framework for Machine Learning

Joseph Gonzalez

Joint work with: Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, Alex Smola

Page 3: A New Parallel Framework for  Machine Learning

In ML we face BIG problems

48 Hours a Minute: YouTube

24 Million Wikipedia Pages

750 Million Facebook Users

6 Billion Flickr Photos

Page 4: A New Parallel Framework for  Machine Learning

Massive data provides opportunities for rich probabilistic structure …

4

Page 5: A New Parallel Framework for  Machine Learning

Shopper 1 Shopper 2

Cameras Cooking

5

Social Network

Page 6: A New Parallel Framework for  Machine Learning

What are the tools for massive data?

6

Page 7: A New Parallel Framework for  Machine Learning

Parallelism: Hope & Challenges

Wide array of different parallel architectures: GPUs, Multicore, Clusters, Mini Clouds, Clouds

New challenges for designing machine learning algorithms:
- Race conditions and deadlocks
- Managing distributed model state

New challenges for implementing machine learning algorithms:
- Parallel debugging and profiling
- Hardware-specific APIs

Page 8: A New Parallel Framework for  Machine Learning

8

Massive Structured Problems  ?  Advances Parallel Hardware

Thesis: “Parallel Learning and Inference in Probabilistic Graphical Models”

Page 9: A New Parallel Framework for  Machine Learning

9

Massive Structured Problems

Advances Parallel Hardware

“Parallel Learning and Inference in Probabilistic Graphical Models”

GraphLab

Parallel Algorithms for Probabilistic Learning and Inference

Probabilistic Graphical Models

Page 10: A New Parallel Framework for  Machine Learning

10

GraphLab

Massive Structured Problems

Advances Parallel Hardware

Parallel Algorithms for Probabilistic Learning and Inference

Probabilistic Graphical Models

Page 11: A New Parallel Framework for  Machine Learning

11

Massive Structured Problems

Advances Parallel Hardware

GraphLab

Parallel Algorithms for Probabilistic Learning and Inference

Probabilistic Graphical Models

Page 12: A New Parallel Framework for  Machine Learning

How will we design and implement

parallel learning systems?

Page 13: A New Parallel Framework for  Machine Learning

We could use ….

Threads, Locks, & Messages

“low-level parallel primitives”

Page 14: A New Parallel Framework for  Machine Learning

Threads, Locks, and Messages

ML experts (graduate students) repeatedly solve the same parallel design challenges:
- Implement and debug a complex parallel system
- Tune for a specific parallel platform
- Two months later the conference paper contains: “We implemented ______ in parallel.”

The resulting code:
- is difficult to maintain
- is difficult to extend
- couples the learning model to the parallel implementation

Page 15: A New Parallel Framework for  Machine Learning

... a better answer:

Map-Reduce / Hadoop

Build learning algorithms on top of high-level parallel abstractions

Page 16: A New Parallel Framework for  Machine Learning

MapReduce – Map Phase

Embarrassingly parallel, independent computation; no communication needed.

[Figure: CPUs 1–4 each independently map one data item to a value (12.9, 42.3, 21.3, 25.8)]

Page 17: A New Parallel Framework for  Machine Learning

MapReduce – Map Phase

[Figure: CPUs 1–4 continue mapping image features independently, producing one value per image]

Page 18: A New Parallel Framework for  Machine Learning

MapReduce – Map Phase

Embarrassingly parallel, independent computation; no communication needed.

[Figure: the map phase completes with every image mapped to its feature value]

Page 19: A New Parallel Framework for  Machine Learning

MapReduce – Reduce Phase

[Figure: two reducer CPUs fold the mapped image features into aggregate statistics, one for attractive faces (2226.26) and one for not-attractive faces (1726.31), according to each image’s A/N label]
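The two phases in the figure can be sketched in ordinary C++ (a toy illustration of the pattern only; the Image type, the feature computation, and the numbers are made up for this example): the map step runs independently per image with no communication, and the reduce step folds the mapped values into per-class statistics.

#include <cstddef>
#include <iostream>
#include <vector>

struct Image { bool attractive; double pixel_sum; };     // toy stand-in for an image

// Map: embarrassingly parallel, one output per image, no communication needed.
double extract_feature(const Image& img) {
    return img.pixel_sum * 0.5;                          // placeholder feature computation
}

int main() {
    std::vector<Image> images = {{true, 25.8}, {false, 84.3}, {true, 42.6}, {false, 34.9}};

    // Map phase: could be split across CPUs 1-4 with no coordination at all.
    std::vector<double> features;
    for (const Image& img : images) features.push_back(extract_feature(img));

    // Reduce phase: fold the features into attractive / not-attractive statistics.
    double attractive_sum = 0.0, not_attractive_sum = 0.0;
    for (std::size_t i = 0; i < images.size(); ++i)
        (images[i].attractive ? attractive_sum : not_attractive_sum) += features[i];

    std::cout << "attractive: " << attractive_sum
              << "  not attractive: " << not_attractive_sum << "\n";
}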

Page 20: A New Parallel Framework for  Machine Learning

Map-Reduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Is there more to Machine Learning?

Page 21: A New Parallel Framework for  Machine Learning

Concrete Example: Label Propagation

Page 22: A New Parallel Framework for  Machine Learning

Label Propagation Algorithm

Social Arithmetic:
  50% × what I list on my profile (50% Cameras, 50% Biking)
+ 40% × what Sue Ann likes (80% Cameras, 20% Biking)
+ 10% × what Carlos likes (30% Cameras, 70% Biking)
= I Like: 60% Cameras, 40% Biking

Recurrence Algorithm: iterate until convergence.

Parallelism: compute all Likes[i] in parallel.
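The social arithmetic above corresponds to the standard label-propagation recurrence; the exact notation on the slide did not survive extraction, so this is a hedged reconstruction, with the weights W summing to one over the vertex itself and its friends:

\text{Likes}[i] \;\leftarrow\; \sum_{j \,\in\, \{i\}\cup\text{Friends}(i)} W_{ij}\,\text{Likes}[j], \qquad \sum_j W_{ij} = 1.

Plugging in the example weights: Cameras: 0.5(0.5) + 0.4(0.8) + 0.1(0.3) = 0.60; Biking: 0.5(0.5) + 0.4(0.2) + 0.1(0.7) = 0.40.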

Page 23: A New Parallel Framework for  Machine Learning

Properties of Graph-Parallel Algorithms

- Dependency Graph
- Factored Computation (What I Like depends on What My Friends Like)
- Iterative Computation

Page 24: A New Parallel Framework for  Machine Learning

Map-Reduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Map Reduce?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Page 25: A New Parallel Framework for  Machine Learning

Why not use Map-Reduce for

Graph-Parallel Algorithms?

Page 26: A New Parallel Framework for  Machine Learning

Data Dependencies

Map-Reduce does not efficiently express dependent data:
- The user must code substantial data transformations
- Costly data replication

[Figure: Map-Reduce treats the input as independent data rows]

Page 27: A New Parallel Framework for  Machine Learning

Iterative Algorithms

Map-Reduce does not efficiently express iterative algorithms:

[Figure: each iteration reprocesses all of the data on CPUs 1–3, with a barrier after every iteration; a single slow processor stalls the entire system at each barrier]

Page 28: A New Parallel Framework for  Machine Learning

MapAbuse: Iterative MapReduce

Only a subset of the data needs computation:

[Figure: each iteration still streams all of the data through CPUs 1–3 and a barrier, even when only a few items have changed]

Page 29: A New Parallel Framework for  Machine Learning

MapAbuse: Iterative MapReduce

The system is not optimized for iteration:

[Figure: every iteration pays a startup penalty and a disk penalty in addition to the barrier]

Page 30: A New Parallel Framework for  Machine Learning

Map-Reduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Map Reduce? Pregel (Giraph)?): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Page 31: A New Parallel Framework for  Machine Learning

Pregel (Giraph)

Bulk Synchronous Parallel Model: Compute → Communicate → Barrier
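A minimal sketch of the Bulk Synchronous Parallel pattern in plain C++ (illustrative only; this is not the Pregel or Giraph API, and the vertex program here is a placeholder): each superstep computes new vertex values from the previous snapshot, the thread join acts as the barrier, and swapping buffers plays the role of the communicate step.

#include <cstddef>
#include <thread>
#include <vector>

int main() {
    std::vector<double> value = {1.0, 2.0, 3.0, 4.0};      // current vertex values
    std::vector<double> next(value.size());

    const int supersteps = 3, nthreads = 2;
    for (int step = 0; step < supersteps; ++step) {
        // Compute: each thread updates a strided block of vertices from the old snapshot.
        std::vector<std::thread> workers;
        for (int t = 0; t < nthreads; ++t)
            workers.emplace_back([&, t] {
                for (std::size_t v = t; v < value.size(); v += nthreads)
                    next[v] = 0.5 * value[v];              // placeholder vertex program
            });
        for (std::thread& w : workers) w.join();           // barrier: all vertices done

        // Communicate: publish the new values for the next superstep.
        value.swap(next);
    }
}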

Page 32: A New Parallel Framework for  Machine Learning

Problem: Bulk synchronous computation can be highly inefficient.

Example: Loopy Belief Propagation

Page 33: A New Parallel Framework for  Machine Learning

Loopy Belief Propagation (Loopy BP)

• Iteratively estimate the “beliefs” about vertices
  – Read in messages
  – Update the marginal estimate (belief)
  – Send updated messages out
• Repeat for all variables until convergence

33

Page 34: A New Parallel Framework for  Machine Learning

Bulk Synchronous Loopy BP

• Often considered embarrassingly parallel
  – Associate a processor with each vertex
  – Receive all messages
  – Update all beliefs
  – Send all messages
• Proposed by:
  – Brunton et al. CRV’06
  – Mendiburu et al. GECC’07
  – Kang et al. LDMTA’10
  – …

34

Page 35: A New Parallel Framework for  Machine Learning

Sequential Computational Structure

35

Page 36: A New Parallel Framework for  Machine Learning

Hidden Sequential Structure

36

Page 37: A New Parallel Framework for  Machine Learning

Hidden Sequential Structure

• Running Time = (time for a single parallel iteration) × (number of iterations)

[Figure: a chain graph with evidence at both ends]

37

Page 38: A New Parallel Framework for  Machine Learning

Optimal Sequential Algorithm

Running Time:
- Forward-Backward (sequential, p = 1): 2n
- Bulk Synchronous: 2n²/p, for p ≤ 2n
- Optimal Parallel (p = 2): n

A gap remains between the bulk synchronous and optimal parallel running times.
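Reading the figure with the previous slide's decomposition (runtime = per-iteration cost × number of iterations), a hedged reconstruction of the comparison on a length-n chain is:

T_{\text{forward-backward}} = 2n \quad (p = 1), \qquad
T_{\text{bulk synchronous}} \approx \underbrace{\tfrac{2n}{p}}_{\text{per iteration}} \times \underbrace{n}_{\text{iterations}} = \frac{2n^2}{p} \quad (p \le 2n), \qquad
T_{\text{optimal parallel}} = n \quad (p = 2).

The factor of n iterations reflects that, under synchronous updates, information must traverse the entire chain one hop per iteration, while two processors sweeping forward and backward simultaneously finish in n steps.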

Page 39: A New Parallel Framework for  Machine Learning

39

The Splash Operation

• Generalize the optimal chain algorithm to arbitrary cyclic graphs (a sketch of the traversal follows below):
  1) Grow a BFS spanning tree of fixed size
  2) Forward pass: compute all messages at each vertex
  3) Backward pass: compute all messages at each vertex
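As a minimal sketch of the Splash traversal order on a generic adjacency list (illustrative only: the message computation is elided behind a placeholder, and this is not the SplashBP implementation), grow a bounded BFS tree from a root, then sweep it leaves-to-root and root-to-leaves.

#include <cstddef>
#include <cstdio>
#include <queue>
#include <vector>

// Placeholder for recomputing all BP messages at a vertex (not real SplashBP code).
void update_messages(int v) { std::printf("update vertex %d\n", v); }

// One Splash rooted at `root`, limited to `max_size` vertices.
void splash(const std::vector<std::vector<int>>& adj, int root, std::size_t max_size) {
    // 1) Grow a bounded BFS spanning tree, recording the visit order (root first).
    std::vector<int> order;
    std::vector<bool> reached(adj.size(), false);
    std::queue<int> frontier;
    frontier.push(root);
    reached[root] = true;
    while (!frontier.empty() && order.size() < max_size) {
        int v = frontier.front(); frontier.pop();
        order.push_back(v);
        for (int u : adj[v])
            if (!reached[u]) { reached[u] = true; frontier.push(u); }
    }
    // 2) Forward pass: sweep from the leaves of the tree toward the root.
    for (auto it = order.rbegin(); it != order.rend(); ++it) update_messages(*it);
    // 3) Backward pass: sweep from the root back out to the leaves.
    for (int v : order) update_messages(v);
}

int main() {
    // A 4-vertex chain: 0 - 1 - 2 - 3.
    std::vector<std::vector<int>> chain = {{1}, {0, 2}, {1, 3}, {2}};
    splash(chain, 0, 4);
}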

Page 40: A New Parallel Framework for  Machine Learning

Data-Parallel Algorithms can be Inefficient

The limitations of the Map-Reduce abstraction can lead to inefficient parallel algorithms.

[Figure: runtime in seconds vs. number of CPUs (1–8); Asynchronous Splash BP runs far faster than Optimized in-Memory Bulk Synchronous BP]

Page 41: A New Parallel Framework for  Machine Learning

The Need for a New Abstraction

Map-Reduce is not well suited for Graph-Parallelism.

Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Pregel (Giraph)): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Page 42: A New Parallel Framework for  Machine Learning

What is GraphLab?

Page 43: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Scheduler
Consistency Model
Graph-Based Data Representation
Update Functions (User Computation)

43

Page 44: A New Parallel Framework for  Machine Learning

Data Graph

44

A graph with arbitrary data (C++ Objects) associated with each vertex and edge.

Graph: social network
Vertex Data: user profile text, current interest estimates
Edge Data: similarity weights

Page 45: A New Parallel Framework for  Machine Learning

Implementing the Data Graph

Multicore setting (in memory):
- Challenge: fast lookup, low overhead
- Solution: dense data structures, fixed Vdata & Edata types, immutable graph structure

Cluster setting (in memory):
- Partition the graph with ParMETIS or random cuts
- Cached ghosting

[Figure: the graph is split between Node 1 and Node 2, each holding its partition plus cached ghost copies of boundary vertices]

Page 46: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Scheduler
Consistency Model
Graph-Based Data Representation
Update Functions (User Computation)

46

Page 47: A New Parallel Framework for  Machine Learning

Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], W[i,j], Likes[j]) ← scope;

  // Update the vertex data
  Likes[i] ← Σ_j W[i,j] × Likes[j];

  // Reschedule neighbors if needed
  if Likes[i] changed then
    reschedule_neighbors_of(i);
}
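As a concrete illustration of the pseudocode above (not the GraphLab API: the graph representation and helper names below are invented for the example), the same label-propagation update on a plain adjacency list might look like this in C++.

#include <cmath>
#include <vector>

struct Edge { int neighbor; double weight; };        // W[i][j] for edge i -> j

std::vector<std::vector<Edge>> graph;                // adjacency list (the "scope" data)
std::vector<double> likes;                           // Likes[i] for each vertex

// Returns true if the vertex changed enough that its neighbors should be rescheduled.
bool label_prop_update(int i, double tolerance = 1e-3) {
    // Read the neighborhood data (the vertex's scope).
    double new_value = 0.0;
    for (const Edge& e : graph[i])
        new_value += e.weight * likes[e.neighbor];

    // Update the vertex data.
    bool changed = std::fabs(new_value - likes[i]) > tolerance;
    likes[i] = new_value;

    // The caller would reschedule neighbors here, mirroring reschedule_neighbors_of(i).
    return changed;
}

int main() {
    // The slide's example weights: 50% self, 40% Sue Ann, 10% Carlos (Cameras shown).
    graph = {{{0, 0.5}, {1, 0.4}, {2, 0.1}}, {{1, 1.0}}, {{2, 1.0}}};
    likes = {0.5, 0.8, 0.3};
    label_prop_update(0);                            // Likes[0] becomes 0.60
}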

Page 48: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Scheduler
Consistency Model
Graph-Based Data Representation
Update Functions (User Computation)

48

Page 49: A New Parallel Framework for  Machine Learning

The Scheduler

The scheduler determines the order that vertices are updated.

[Figure: CPU 1 and CPU 2 pull vertices (a, b, h, i, …) from a shared scheduler queue; executing an update can push new vertices (e.g. e, f, j) back onto the queue]

The process repeats until the scheduler is empty.

Page 50: A New Parallel Framework for  Machine Learning

Choosing a Schedule

GraphLab provides several different schedulers:
- Round Robin: vertices are updated in a fixed order
- FIFO: vertices are updated in the order they are added
- Priority: vertices are updated in priority order

The choice of schedule affects the correctness and parallel performance of the algorithm.

Obtain different algorithms by simply changing a flag!
  --scheduler=roundrobin
  --scheduler=fifo
  --scheduler=priority

Page 51: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Scheduler
Consistency Model
Graph-Based Data Representation
Update Functions (User Computation)

52

Page 52: A New Parallel Framework for  Machine Learning

Ensuring Race-Free Code

How much can computation overlap?

Page 53: A New Parallel Framework for  Machine Learning

Common Problem: Write-Write Race

Processors running adjacent update functions simultaneously modify shared data:

[Figure: CPU 1 and CPU 2 both write to the same shared value; the final value keeps only one of the two writes]
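A minimal, self-contained illustration of the race (not GraphLab code): two threads increment the same counter, and without the lock shown below, updates get lost because read-modify-write on shared data is not atomic.

#include <iostream>
#include <mutex>
#include <thread>

int shared_value = 0;
std::mutex m;

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> guard(m);   // remove this lock and updates get lost
        ++shared_value;                         // read-modify-write on shared data
    }
}

int main() {
    std::thread cpu1(add_many, 100000), cpu2(add_many, 100000);
    cpu1.join(); cpu2.join();
    std::cout << shared_value << "\n";          // 200000 with the lock; often less without
}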

Page 54: A New Parallel Framework for  Machine Learning

Importance of Consistency

[Figure: Alternating Least Squares results with and without consistency]

Page 55: A New Parallel Framework for  Machine Learning

Importance of Consistency

[Figure: the Build → Test → Debug → Tweak Model development loop]

Page 56: A New Parallel Framework for  Machine Learning

GraphLab Ensures Sequential Consistency

57

For each parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: over time, a parallel execution on CPU 1 and CPU 2 is equivalent to some sequential execution on a single CPU]

Page 57: A New Parallel Framework for  Machine Learning

Consistency Rules

58

Guaranteed sequential consistency for all update functions


Page 58: A New Parallel Framework for  Machine Learning

Full Consistency

59

Page 59: A New Parallel Framework for  Machine Learning

Obtaining More Parallelism

60

Page 60: A New Parallel Framework for  Machine Learning

Edge Consistency

61

[Figure: CPU 1 and CPU 2 update two different vertices under edge consistency; the vertex data they both touch is only read, which is safe]

Page 61: A New Parallel Framework for  Machine Learning

Consistency Through R/W Locks

Read/Write locks:
- Full Consistency: write-lock the vertex and all of its neighbors (Write, Write, Write)
- Edge Consistency: write-lock the vertex, read-lock its neighbors (Read, Write, Read)
- Locks are acquired in a canonical ordering to avoid deadlock
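One way to realize the edge-consistency rule with standard C++ primitives (a sketch under those assumptions, not GraphLab's implementation): take a write lock on the center vertex and read locks on its neighbors, always acquiring locks in a canonical ascending-id order so two overlapping updates cannot deadlock.

#include <algorithm>
#include <shared_mutex>
#include <vector>

std::vector<std::shared_mutex> vertex_locks(100);     // one reader/writer lock per vertex

// Acquire an edge-consistent scope: write-lock `center`, read-lock its neighbors,
// always locking in ascending vertex-id order (the canonical ordering).
void lock_edge_scope(int center, std::vector<int> scope) {
    scope.push_back(center);
    std::sort(scope.begin(), scope.end());
    for (int v : scope) {
        if (v == center) vertex_locks[v].lock();          // exclusive (write) lock
        else             vertex_locks[v].lock_shared();   // shared (read) lock
    }
}

void unlock_edge_scope(int center, const std::vector<int>& neighbors) {
    for (int v : neighbors) vertex_locks[v].unlock_shared();
    vertex_locks[center].unlock();
}

int main() {
    std::vector<int> neighbors = {2, 7};
    lock_edge_scope(5, neighbors);    // write-lock vertex 5, read-lock vertices 2 and 7
    unlock_edge_scope(5, neighbors);
}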

Page 62: A New Parallel Framework for  Machine Learning

Consistency Through R/W Locks

Multicore setting: Pthread R/W locks
Distributed setting: distributed locking
- Prefetch locks and data
- Allow computation to proceed while locks/data are requested (lock pipeline)

[Figure: the data graph is partitioned across Node 1 and Node 2; lock and data requests are pipelined between them]

Page 63: A New Parallel Framework for  Machine Learning

The GraphLab Framework

Scheduler
Consistency Model
Graph-Based Data Representation
Update Functions (User Computation)

65

Page 64: A New Parallel Framework for  Machine Learning

Anatomy of a GraphLab Program:

1) #include <graphlab.hpp>
2) Define a C++ update function
3) Build the data graph using the C++ graph object
4) Set engine parameters:
   a) Scheduler type
   b) Consistency model
5) Add initial vertices to the scheduler
6) Run the engine on the graph [blocking call]
7) The final answer is stored in the graph

(A hedged C++ skeleton of these steps follows below.)
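The skeleton below maps one-to-one onto the seven steps. It is written from memory of the GraphLab 1.x C++ interface, so every identifier other than <graphlab.hpp> itself (graph, types, core, set_scheduler_type, set_scope_type, add_task_to_all, start, vertex_data) should be treated as an assumption and checked against the documentation at http://graphlab.org.

#include <graphlab.hpp>                                      // 1) include GraphLab

struct vertex_data { double likes; };                        // per-vertex state
struct edge_data   { double weight; };                       // per-edge state

typedef graphlab::graph<vertex_data, edge_data> graph_type;
typedef graphlab::types<graph_type> gl;

// 2) Define the C++ update function (body elided; see the pseudocode earlier).
void label_prop_update(gl::iscope& scope, gl::icallback& scheduler) {
    // read neighbors through `scope`, write the vertex data,
    // and use `scheduler` to add neighbor tasks if the vertex changed
}

int main(int argc, char** argv) {
    gl::core core;

    // 3) Build the data graph using the C++ graph object.
    core.graph().add_vertex(vertex_data{0.5});
    core.graph().add_vertex(vertex_data{0.8});
    core.graph().add_edge(0, 1, edge_data{0.4});

    // 4) Set engine parameters: scheduler type and consistency model.
    core.set_scheduler_type("fifo");
    core.set_scope_type("edge");

    // 5) Add initial vertices to the scheduler.
    core.add_task_to_all(label_prop_update, 1.0);

    // 6) Run the engine on the graph (blocking call).
    core.start();

    // 7) The final answer is stored in the graph.
    double answer = core.graph().vertex_data(0).likes;
    (void)answer;
    return 0;
}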

Page 65: A New Parallel Framework for  Machine Learning

Algorithms Implemented

- PageRank
- Loopy Belief Propagation
- Gibbs Sampling
- CoEM
- Graphical Model Parameter Learning
- Probabilistic Matrix/Tensor Factorization
- Alternating Least Squares
- Lasso with Sparse Features
- Support Vector Machines with Sparse Features
- Label Propagation
- …

Page 66: A New Parallel Framework for  Machine Learning

Shared Memory Experiments

Shared memory setting: 16-core workstation

68

Page 67: A New Parallel Framework for  Machine Learning

Loopy Belief Propagation

69

3D retinal image denoising

Data Graph: 1 million vertices, 3 million edges
Update Function: Loopy BP update equation
Scheduler: SplashBP
Consistency Model: edge consistency

Page 68: A New Parallel Framework for  Machine Learning

Loopy Belief Propagation

70

[Figure: speedup vs. number of CPUs (1–16); SplashBP achieves a 15.5x speedup on 16 cores, close to optimal]

Page 69: A New Parallel Framework for  Machine Learning

Gibbs Sampling

Protein-protein interaction networks [Elidan et al. 2006]
- Provably correct parallelization
- Edge consistency
- Discrete MRF: 14K vertices, 100K edges

[Figure: protein backbone and side-chain variables linked by interaction factors]

Page 70: A New Parallel Framework for  Machine Learning

Gibbs Sampling

72

[Figure: speedup vs. number of CPUs (1–16) for the Chromatic Gibbs sampler, compared against optimal]

Page 71: A New Parallel Framework for  Machine Learning

Carnegie Mellon

Splash Gibbs Sampler

An asynchronous Gibbs sampler that adaptively addresses strong dependencies.

73

Page 72: A New Parallel Framework for  Machine Learning

74

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel:

Conditionally Independent

Page 73: A New Parallel Framework for  Machine Learning

75

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel:

Conditionally Independent

Tree-width = 1

Page 74: A New Parallel Framework for  Machine Learning

76

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel:

Conditionally Independent

Tree-width = 2

Page 75: A New Parallel Framework for  Machine Learning

77

Splash Gibbs Sampler

Step 2: Calibrate the trees in parallel

Page 76: A New Parallel Framework for  Machine Learning

78

Splash Gibbs Sampler

Step 3: Sample trees in parallel

Page 77: A New Parallel Framework for  Machine Learning

79

Experimental Results

The Splash sampler outperforms the Chromatic sampler on models with strong dependencies

• Markov logic network with strong dependencies: 10K variables, 28K factors

[Figure: three panels comparing the Splash and Chromatic samplers on likelihood of the final sample, “mixing”, and speedup in sample generation]

Page 78: A New Parallel Framework for  Machine Learning

CoEM (Rosie Jones, 2005)

Named Entity Recognition task: Is “Dog” an animal? Is “Catalina” a place?

[Figure: a bipartite graph connecting noun phrases (“the dog”, “Australia”, “Catalina Island”) to the contexts they occur in (“<X> ran quickly”, “travelled to <X>”, “<X> is pleasant”)]

Vertices: 2 million; Edges: 200 million

Hadoop: 95 cores, 7.5 hrs

Page 79: A New Parallel Framework for  Machine Learning

CoEM (Rosie Jones, 2005)

[Figure: speedup vs. number of CPUs (1–16) for GraphLab CoEM, compared against optimal]

Hadoop: 95 cores, 7.5 hrs
GraphLab: 16 cores, 30 min

15x faster! 6x fewer CPUs!

Page 80: A New Parallel Framework for  Machine Learning

Lasso: Regularized Linear Model

[Figure: a bipartite graph between the weight vertices (d × 1) and the observation vertices (n × 1), connected through the nonzeros of the n × d data matrix (here 5 features, 4 examples), with regularization attached to the weights]

Shooting Algorithm [Coordinate Descent]
• Updates on weight vertices modify the losses on observation vertices
• Requires the Full Consistency model

Financial prediction dataset from Kogan et al. [2009].
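For reference, the textbook Lasso objective and the shooting (coordinate-descent) update the slide refers to; this is the standard formulation, not a transcription of the talk's own notation:

\min_{w \in \mathbb{R}^d} \; \tfrac{1}{2}\,\lVert Xw - y \rVert_2^2 + \lambda \lVert w \rVert_1, \qquad X \in \mathbb{R}^{n \times d},\; y \in \mathbb{R}^{n}.

Shooting updates one coordinate at a time by soft-thresholding against the partial residual:

w_j \leftarrow \frac{S\!\big(x_j^{\top} r^{(-j)},\, \lambda\big)}{x_j^{\top} x_j}, \qquad r^{(-j)} = y - \sum_{k \neq j} w_k x_k, \qquad S(a,\lambda) = \operatorname{sign}(a)\,\max(|a| - \lambda,\, 0).

Each coordinate update touches only the observations where x_{ij} ≠ 0, which is why an update on a weight vertex modifies only its adjacent observation (loss) vertices in the data graph.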

Page 81: A New Parallel Framework for  Machine Learning

Full Consistency

83

[Figure: speedup vs. number of CPUs (1–16) under full consistency, for a dense and a sparse dataset, compared against optimal]

Page 82: A New Parallel Framework for  Machine Learning

Relaxing Consistency

[Figure: speedup vs. number of CPUs (1–16) with relaxed consistency, for the dense and sparse datasets, compared against optimal]

Why does this work? (See the Shotgun ICML paper.)

Page 83: A New Parallel Framework for  Machine Learning

Experiments: Amazon EC2

High-Performance Nodes

85

Page 84: A New Parallel Framework for  Machine Learning

Video Cosegmentation

Segments that mean the same thing

Model: 10.5 million nodes, 31 million edges

Gaussian EM clustering + BP on 3D grid

Page 85: A New Parallel Framework for  Machine Learning

Video Coseg. Speedups

Page 86: A New Parallel Framework for  Machine Learning

Matrix Factorization

Netflix Collaborative Filtering
- Alternating Least Squares matrix factorization
- Model: 0.5 million nodes, 99 million edges

[Figure: the Netflix ratings matrix (Users × Movies) factored into a Users × d matrix and a d × Movies matrix]
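For context, the standard alternating least squares objective for this kind of factorization (a textbook statement; the talk's exact regularization may differ):

\min_{U,V} \sum_{(u,m) \in \Omega} \big(r_{um} - u_u^{\top} v_m\big)^2 + \lambda\Big(\sum_u \lVert u_u\rVert^2 + \sum_m \lVert v_m\rVert^2\Big), \qquad u_u,\, v_m \in \mathbb{R}^d,

where Ω is the set of observed (user, movie) ratings. Holding V fixed, each user vector u_u has a closed-form ridge-regression solution over that user's rated movies (and symmetrically for each v_m), so in the bipartite data graph an update touches only a vertex and its adjacent edges.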

Page 87: A New Parallel Framework for  Machine Learning

Netflix Speedup: increasing the size (d) of the matrix factorization

Page 88: A New Parallel Framework for  Machine Learning

The Cost of Hadoop

Page 89: A New Parallel Framework for  Machine Learning

Summary

An abstraction tailored to Machine Learning:
- Targets graph-parallel algorithms
- Naturally expresses data/computational dependencies and dynamic iterative computation
- Simplifies parallel algorithm design
- Automatically ensures data consistency
- Achieves state-of-the-art parallel performance on a variety of problems

92

Page 90: A New Parallel Framework for  Machine Learning

Carnegie Mellon

Check out GraphLab

http://graphlab.org

93

Documentation… Code… Tutorials…

Questions & Feedback

[email protected]

Page 91: A New Parallel Framework for  Machine Learning

Current/Future Work

- Out-of-core storage
- Hadoop/HDFS integration: graph construction, graph storage, launching GraphLab from Hadoop, fault tolerance through HDFS checkpoints
- Sub-scope parallelism: address the challenge of very high-degree nodes
- Improved graph partitioning
- Support for dynamic graph structure