
Page 1: Distributed Machine Learning and Graph Processing with Sparse

Distributed Machine Learning and Graph Processing with Sparse Matrices

Shivaram Venkataraman*, Erik Bodzsar#

Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+

*UC Berkeley, #U Chicago, +HP Labs

Page 2: Distributed Machine Learning and Graph Processing with Sparse

Big Data, Complex Algorithms

PageRank (Dominant eigenvector)

Recommendations (Matrix factorization)

Anomaly detection (Top-K eigenvalues)

User Importance (Vertex centrality)


Machine learning + Graph algorithms

Page 3: Distributed Machine Learning and Graph Processing with Sparse

Large-Scale Processing Frameworks

Data-parallel frameworks – MapReduce/Dryad (2004)

– Process each record in parallel

– Use case: computing sufficient statistics, analytics queries

Graph-centric frameworks – Pregel/GraphLab (2010)

– Process each vertex in parallel

– Use case: graphical models

Array-based frameworks – MadLINQ (2012)

– Process blocks of an array in parallel

– Use case: linear algebra operations

Page 4: Distributed Machine Learning and Graph Processing with Sparse

PageRank using Matrices

Power Method

Dominant eigenvector

[Diagram: p = M × p, iterated until convergence]

M = web graph matrix; p = PageRank vector

Simplified algorithm: repeat { p = M * p }

Linear algebra operations on sparse matrices
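A minimal single-node sketch of this power iteration in plain R (illustrative only; it assumes M is a column-stochastic web-graph matrix, and Presto's distributed version follows on the next slides):

# Single-node power method; p converges to the dominant eigenvector.
power_method <- function(M, iters = 100) {
  p <- rep(1 / nrow(M), nrow(M))   # start from the uniform vector
  for (k in seq_len(iters)) {
    p <- M %*% p                   # one power-method step: p = M * p
    p <- p / sum(p)                # renormalize to keep a probability vector
  }
  p
}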

Page 5: Distributed Machine Learning and Graph Processing with Sparse

Presto

Large-scale machine learning and graph processing on sparse matrices

Extend R – make it scalable, distributed

Page 6: Distributed Machine Learning and Graph Processing with Sparse

Challenge 1 – Sparse Matrices


Page 7: Distributed Machine Learning and Graph Processing with Sparse

Challenge 1 – Sparse Matrices

[Plot: normalized block density (log scale, 1–10000) vs. block ID for LiveJournal, Netflix, and ClueWeb-1B]

Some blocks hold 1000x more data than others, causing computation imbalance.

Page 8: Distributed Machine Learning and Graph Processing with Sparse

Challenge 2 – Data Sharing

Sharing data through pipes/network

Time-inefficient (sending copies); space-inefficient (extra copies)

[Diagram: processes on Server 1 and Server 2 each keep local copies of the data and exchange further copies over the network]

Two challenges: sparse matrices and communication overhead.

Page 9: Distributed Machine Learning and Graph Processing with Sparse

Outline

• Motivation

• Programming model

• Design

• Applications and Results


Page 10: Distributed Machine Learning and Graph Processing with Sparse

darray


Page 11: Distributed Machine Learning and Graph Processing with Sparse

foreach f(x)


Page 12: Distributed Machine Learning and Graph Processing with Sparse

PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(...) {                    # until convergence
  foreach(i, 1:len,             # len = number of row blocks (N/s)
    calculate(m=splits(M,i),
              x=splits(P), p=splits(P,i)) {
      p <- m %*% x
    }
  )
}

Create Distributed Array

[Diagram: M partitioned into row blocks; p partitioned into P1, P2, …, PN/s]

Page 13: Distributed Machine Learning and Graph Processing with Sparse

PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(...) {                    # until convergence
  foreach(i, 1:len,             # len = number of row blocks (N/s)
    calculate(m=splits(M,i),
              x=splits(P), p=splits(P,i)) {
      p <- m %*% x
    }
  )
}

Execute the function in the cluster

Pass array partitions: splits(M,i) passes the i-th partition of M; splits(P) with no index passes all of P

[Diagram: M partitioned into row blocks; p partitioned into P1, P2, …, PN/s]

Page 14: Distributed Machine Learning and Graph Processing with Sparse

Presto Architecture

[Diagram: a master coordinating worker nodes; each worker runs several R instances that share data through DRAM]

Page 15: Distributed Machine Learning and Graph Processing with Sparse

Repartitioning Matrices

Profile execution

Repartition

Repartition if max(t) / median(t) > δ, where t is the vector of per-partition execution times.
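A small R sketch of this trigger, assuming t holds the profiled per-partition execution times and delta is the imbalance threshold (both names are illustrative, not Presto API):

# Hypothetical repartitioning trigger; t and delta are illustrative names.
needs_repartition <- function(t, delta) {
  max(t) / median(t) > delta   # skew of the slowest partition vs. the median
}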

Page 16: Distributed Machine Learning and Graph Processing with Sparse

Maintaining Size Invariants

invariant(mat, vec, type=ROW)

Declares that mat and vec must stay partitioned identically along rows, so repartitioning one repartitions the other the same way.
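A hedged usage sketch combining the darray and invariant calls from these slides (argument shapes follow the PageRank example; N and s are as on the earlier slides; illustrative, not a verified API):

# Illustrative only: keeps the row partitions of M and P aligned.
M <- darray(dim=c(N, N), blocks=c(s, N))  # N x N matrix in s-row blocks
P <- darray(dim=c(N, 1), blocks=c(s, 1))  # vector in matching s-row blocks
invariant(M, P, type=ROW)                 # if M's rows are repartitioned,
                                          # P's rows follow identically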

Page 17: Distributed Machine Learning and Graph Processing with Sparse

Sharing Distributed Arrays

Versioned distributed arrays


Goal: Zero-copy sharing across cores

Immutable partitions → safe sharing
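A toy R sketch of the versioning idea (conceptual model only; Presto's actual mechanism operates on shared-memory partitions):

# Conceptual model: an update appends a new version of a partition, so
# readers of older versions are never disturbed (safe, zero-copy reads).
update_partition <- function(versions, new_value) {
  c(versions, list(new_value))   # append; old versions stay immutable
}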

Page 18: Distributed Machine Learning and Graph Processing with Sparse

Data Sharing Challenges

1. Garbage collection

[Diagram: two R instances sharing one R object, which consists of a header and a data part]

2. Header conflicts


Page 19: Distributed Machine Learning and Graph Processing with Sparse

Overriding R’s allocator

Allocate process-local headers; map the data part into shared memory.

[Diagram: a local R object header on a process-private page, pointing to a shared data part aligned to page boundaries]

Page 20: Distributed Machine Learning and Graph Processing with Sparse

Outline

• Motivation

• Programming model

• Design

• Applications and Results


Page 21: Distributed Machine Learning and Graph Processing with Sparse

demo

5-node cluster, 8 cores per node

PageRank on a 1.5B-edge Twitter graph



Page 23: Distributed Machine Learning and Graph Processing with Sparse

Applications Implemented in Presto

Application             Algorithm                 Presto LOC
PageRank                Eigenvector calculation   41
Triangle counting       Top-K eigenvalues         121
Netflix recommendation  Matrix factorization      130
Centrality measure      Graph algorithm           132
k-path connectivity     Graph algorithm           30
k-means                 Clustering                71
Sequence alignment      Smith-Waterman            64

Fewer than 140 lines of code


Page 24: Distributed Machine Learning and Graph Processing with Sparse

Evaluation Overview

Evaluation setup – 25-machine cluster

– Each machine: 24 cores, 96 GB RAM, 10 Gbps network

Data-sharing benefits – 1.5B-edge Twitter graph

Repartitioning analysis – 6B-edge web graph

Faster than Spark and Hadoop using in-memory data

Collaborative Filtering using Netflix dataset


Page 25: Distributed Machine Learning and Graph Processing with Sparse

Data sharing benefits

[Bar charts: per-iteration time (s), split into Compute and Transfer, at 10, 20, and 40 cores, with and without sharing. Compute time drops as cores increase in both cases; transfer time stays near 0.7 s with sharing but grows to over 4 s at 40 cores without sharing.]

Page 26: Distributed Machine Learning and Graph Processing with Sparse

Repartitioning Progress

[Plot: split size (GB, 0–30) vs. iteration count over 15 iterations]

Page 27: Distributed Machine Learning and Graph Processing with Sparse

Repartitioning benefits

[Plots: per-worker Transfer and Compute times (0–160 s), without repartitioning vs. with repartitioning]

Page 28: Distributed Machine Learning and Graph Processing with Sparse

Related Work

Large scale data processing frameworks

– MapReduce, Dryad, Spark, GraphLab

Matrix Computations – Ricardo, MadLINQ

HPC systems – ARPACK, Combinatorial BLAS

Multi-core R packages – doMC, snow, Rmpi


Page 29: Distributed Machine Learning and Graph Processing with Sparse

Presto

Co-partitioning matrices

Locality-based scheduling

Caching partitions

Page 30: Distributed Machine Learning and Graph Processing with Sparse

Conclusion

Presto: a large-scale, array-based framework that extends R

Addresses the challenges of sparse matrices

Key techniques: repartitioning and sharing versioned arrays

Page 31: Distributed Machine Learning and Graph Processing with Sparse

Backup Slides


Page 32: Distributed Machine Learning and Graph Processing with Sparse

Netflix Collaborative Filtering

[Bar chart: time (seconds) vs. number of cores, split into Load, t(R)×R, and R×t(R)×R phases: 755.1 s at 8 cores, 381.0 at 16, 256.1 at 24, 234.2 at 32, 202.7 at 40, and 155.3 at 48]

Page 33: Distributed Machine Learning and Graph Processing with Sparse

Repartitioning benefits

[Plot: time to convergence (s, 2000–8000) and cumulative partitioning time (s, 0–400) vs. number of repartitions (0–20)]