Piccolo: Building fast distributed programs with partitioned tables
Russell Power, Jinyang Li
New York University


Page 1

Piccolo: Building fast distributed programs with partitioned tables

Russell Power, Jinyang Li
New York University

Page 2

Motivating Example: PageRank

for each node X in graph:
    for each edge X → Z:
        next[Z] += curr[X]

Repeat until convergence

[Figure: input graph (A → B,C,D; B → E; C → D; …) alongside Curr and Next rank tables; successive iterations show ranks converging, e.g. A: 0.12 → 0.2 → 0.25, B: 0.15 → 0.16 → 0.17, C: 0.2 → 0.21 → 0.22]

Fits in memory!
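The loop above is ordinary power iteration; a minimal single-machine sketch in Python (the toy graph and iteration count are illustrative, and each node's contribution is divided by its out-degree, as the later pr_kernel does):

```python
# Toy single-machine PageRank following the slide's loop structure.
# The three-node graph is a made-up example; real inputs are web-scale.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

curr = {node: 1.0 / len(graph) for node in graph}    # uniform initial ranks
for _ in range(50):                                  # "repeat until convergence"
    next_rank = {node: 0.0 for node in graph}
    for x, out_edges in graph.items():
        for z in out_edges:                          # each edge X -> Z
            next_rank[z] += curr[x] / len(out_edges) # X spreads its rank
    curr = next_rank
```

For this toy graph the ranks settle near A: 0.4, B: 0.2, C: 0.4, and the total mass stays at 1.0.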

Page 3

PageRank in MapReduce

[Figure: three workers over Distributed Storage; each reads a graph stream (A->B,C, B->D, …) and a rank stream (A:0.1, B:0.2, …) and writes a new rank stream back]

Data flow models do not expose global state.
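To make the slide's point concrete: in a data-flow model the rank state has no home. Every iteration re-materializes it as a stream that is joined against the graph stream and shuffled. A rough single-process sketch (pagerank_step and the stream shapes are illustrative stand-ins, not Hadoop's API):

```python
from collections import defaultdict

def pagerank_step(graph_stream, rank_stream):
    """One MapReduce-style iteration: join, shuffle contributions, reduce."""
    ranks = dict(rank_stream)            # the join of graph and rank streams
    contribs = defaultdict(float)        # shuffle + reduce(sum) by target
    for node, out_edges in graph_stream:
        for target in out_edges:
            contribs[target] += ranks[node] / len(out_edges)
    return sorted(contribs.items())      # the entire next rank stream

graph = [("A", ["B", "C"]), ("B", ["C"]), ("C", ["A"])]
ranks = [("A", 0.4), ("B", 0.3), ("C", 0.3)]
ranks = pagerank_step(graph, ranks)      # state rewritten wholesale each pass
```

Note that every pass rewrites the whole rank table through the storage layer; a mutable global table would let workers update ranks in place.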

Page 4


Page 5

PageRank With MPI/RPC

[Figure: three workers over Distributed Storage, each holding a graph partition (A->B,C…; B->D; C->E,F…) and a ranks partition (A: 0…; B: 0…; C: 0…), exchanging messages directly]

User explicitly programs communication

Page 6

Piccolo’s Goal: Distributed Shared State

[Figure: workers 1, 2, 3 issue read/write operations against shared distributed in-memory state: a Graph table (A->B,C; B->D) and a Ranks table (A: 0; B: 0) in distributed storage]

Page 7

Piccolo’s Goal: Distributed Shared State

[Figure: the same tables partitioned across workers 1, 2, 3 (Graph: A->B,C…; B->D; C->E,F…; Ranks: A: 0…; B: 0…; C: 0…)]

Piccolo runtime handles communication

Page 8

Ease of use
Performance

Page 9

Page 10

Talk outline
• Motivation
• Piccolo's Programming Model
• Runtime Scheduling
• Evaluation

Page 11

Programming Model

[Figure: workers 1, 2, 3 accessing partitioned Graph (A→B,C; B→D; …) and Ranks (A: 0; B: 0) tables; raw read/write is replaced by get/put and update/iterate operations]

Implemented as a library for C++ and Python
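As a rough illustration of that interface, here is a toy in-process Table (a hypothetical stand-in: the partition count, hashing, and accumulate hook mirror the slides' API, not Piccolo's actual C++ implementation):

```python
class Table:
    """Toy partitioned key-value table mimicking the slides' interface."""
    def __init__(self, partitions=1, accumulate=None):
        self.accumulate = accumulate            # e.g. a sum-style combiner
        self.parts = [dict() for _ in range(partitions)]

    def _part(self, key):
        return self.parts[hash(key) % len(self.parts)]

    def get(self, key):
        return self._part(key)[key]

    def put(self, key, value):                  # blind overwrite
        self._part(key)[key] = value

    def update(self, key, delta):               # resolved by the accumulator
        part = self._part(key)
        if key in part and self.accumulate:
            part[key] = self.accumulate(part[key], delta)
        else:
            part[key] = delta

    def iterate(self, partition):               # scan one partition
        return iter(self.parts[partition].items())

ranks = Table(partitions=4, accumulate=lambda old, new: old + new)
ranks.update("A", 0.2)
ranks.update("A", 0.3)   # accumulated, not overwritten
```

The key design point is that update routes through the user-supplied accumulator rather than overwriting, which is what makes concurrent writers safe later in the talk.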

Page 12

Naïve PageRank with Piccolo

    curr = Table(key=PageID, value=double)
    next = Table(key=PageID, value=double)

    def pr_kernel(graph, curr, next):          # jobs run by many machines
        i = my_instance
        n = len(graph)/NUM_MACHINES
        for s in graph[(i-1)*n:i*n]:
            for t in s.out:
                next[t] += curr[s.id] / len(s.out)

    def main():                                # run by a single controller
        for i in range(50):
            launch_jobs(NUM_MACHINES, pr_kernel,   # launches jobs in parallel
                        graph, curr, next)
            swap(curr, next)
            next.clear()

Page 13

Naïve PageRank is Slow

[Figure: workers 1, 2, 3, each holding Graph (A->B,C; B->D; C->E,F…) and Ranks (A: 0…; B: 0…; C: 0…) partitions; get and put operations cross the network between workers]

Page 14

PageRank: Exploiting Locality

    curr = Table(…,partitions=100,partition_by=site)   # control table partitioning
    next = Table(…,partitions=100,partition_by=site)
    group_tables(curr,next,graph)                      # co-locate tables

    def pr_kernel(graph, curr, next):
        for s in graph.get_iterator(my_instance):
            for t in s.out:
                next[t] += curr[s.id] / len(s.out)

    def main():
        for i in range(50):
            launch_jobs(curr.num_partitions, pr_kernel,
                        graph, curr, next,
                        locality=curr)                 # co-locate execution with table
            swap(curr, next)
            next.clear()

Page 15

Exploiting Locality

[Figure: workers 1, 2, 3 with co-located Graph and Ranks partitions; get and put operations shown per worker]

Page 16


Page 17

Synchronization

[Figure: two workers concurrently write conflicting values for the same key: put(a=0.3) and put(a=0.2)]

How to handle synchronization?

Page 18

Synchronization Primitives

• Avoid write conflicts with accumulation functions:
  NewValue = Accum(OldValue, Update)
  e.g. sum, product, min, max
• Global barriers are sufficient
• Tables provide release consistency
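Because accumulators like sum are commutative and associative, the runtime may apply racing updates in any arrival order and still reach one answer. A small sketch illustrating that order-independence (apply_updates is a hypothetical helper, not part of Piccolo):

```python
import itertools

def apply_updates(initial, updates, accum):
    """Fold a sequence of updates into a value with the table's accumulator."""
    value = initial
    for u in updates:
        value = accum(value, u)
    return value

updates = [0.2, 0.3, 0.1]          # racing updates to the same key
results = {
    round(apply_updates(0.0, perm, lambda old, u: old + u), 9)
    for perm in itertools.permutations(updates)
}
# Every arrival order produces the same final value.
```

A non-commutative "accumulator" (say, plain overwrite) would make the result depend on message timing, which is exactly the conflict the previous slide shows.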

Page 19

PageRank: Efficient Synchronization

    curr = Table(…,partition_by=site,accumulate=sum)   # accumulation via sum
    next = Table(…,partition_by=site,accumulate=sum)
    group_tables(curr,next,graph)

    def pr_kernel(graph, curr, next):
        for s in graph.get_iterator(my_instance):
            for t in s.out:
                next.update(t, curr.get(s.id)/len(s.out))  # update invokes the accumulation function

    def main():
        for i in range(50):
            handle = launch_jobs(curr.num_partitions, pr_kernel,
                                 graph, curr, next, locality=curr)
            barrier(handle)                    # explicitly wait between iterations
            swap(curr, next)
            next.clear()

Page 20

Efficient Synchronization

[Figure: instead of conflicting put(a=0.3)/put(a=0.2), workers send update(a, 0.2) and update(a, 0.3); workers buffer updates locally (release consistency) and the runtime computes the sum]
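The buffering described above can be sketched as a per-worker combiner: updates to a remote key accumulate locally, and a single combined delta is sent at a release point. WorkerBuffer and OwnerTable below are illustrative toys, not Piccolo's runtime:

```python
from collections import defaultdict

class WorkerBuffer:
    """Buffers updates locally; one combined update per key on flush."""
    def __init__(self, accum):
        self.accum = accum
        self.pending = {}

    def update(self, key, delta):
        if key in self.pending:
            self.pending[key] = self.accum(self.pending[key], delta)
        else:
            self.pending[key] = delta

    def flush(self, table):          # called at a release point (e.g. barrier)
        for key, delta in self.pending.items():
            table.update(key, delta) # one network message per key
        self.pending.clear()

class OwnerTable:
    """The partition owner; applies remote deltas with the accumulator."""
    def __init__(self, accum):
        self.accum = accum
        self.data = defaultdict(float)
        self.messages = 0            # count of remote updates received

    def update(self, key, delta):
        self.messages += 1
        self.data[key] = self.accum(self.data[key], delta)

add = lambda old, new: old + new
owner = OwnerTable(add)
buf = WorkerBuffer(add)
for contrib in (0.1, 0.2, 0.3):     # three local updates to key "a"
    buf.update("a", contrib)
buf.flush(owner)                    # only one combined message crosses the network
```

Release consistency is what makes this legal: remote readers only need to see the combined value after the next barrier, not each intermediate delta.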

Page 21

Table Consistency

[Figure: the same scenario; concurrent update(a, 0.2) and update(a, 0.3) against the partitioned Graph/Ranks tables are both applied]

Page 22

PageRank with Checkpointing

    curr = Table(…,partition_by=site,accumulate=sum)
    next = Table(…,partition_by=site,accumulate=sum)
    group_tables(curr,next)

    def pr_kernel(graph, curr, next):
        for s in graph.get_iterator(my_instance):
            for t in s.out:
                next.update(t, curr.get(s.id)/len(s.out))

    def main():
        curr, userdata = restore()             # restore previous computation
        last = userdata.get('iter', 0)
        for i in range(last, 50):
            handle = launch_jobs(curr.num_partitions, pr_kernel,
                                 graph, curr, next, locality=curr)
            cp_barrier(handle, tables=(next),  # user decides which tables to
                       userdata={'iter': i})   # checkpoint and when
            swap(curr, next)
            next.clear()

Page 23

Recovery via Checkpointing

[Figure: workers 1, 2, 3 with partitioned Graph (A->B,C; B->D; C->E,F…) and Ranks (A: 0…; B: 0…; C: 0…) tables checkpointed to Distributed Storage]

Runtime uses the Chandy-Lamport protocol

Page 24

Talk Outline
• Motivation
• Piccolo's Programming Model
• Runtime Scheduling
• Evaluation

Page 25

Load Balancing

[Figure: master and workers 1, 2, 3; worker 1 holds partitions P1, P2 and jobs J1, J2; worker 2 holds P3, P4 and J3, J4; worker 3 holds P5, P6 and J5, J6; before migrating partition P6 the master warns "Other workers are updating P6!" and pauses updates]

Master coordinates work-stealing
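The coordination on this slide can be caricatured in a few lines: before migrating a partition to an idle worker, the master pauses updates to it so in-flight writes can drain. The Master class, assignment map, and worker names are all hypothetical, and the drain step is elided:

```python
class Master:
    """Toy work-stealing coordinator for partition-affine jobs."""
    def __init__(self, assignment):
        self.assignment = dict(assignment)   # partition -> owning worker
        self.paused = set()

    def steal(self, partition, idle_worker):
        # Step 1: pause updates so no worker writes the moving partition.
        self.paused.add(partition)
        # Step 2 (elided here): wait for buffered updates to drain.
        # Step 3: migrate ownership to the idle worker and resume.
        self.assignment[partition] = idle_worker
        self.paused.discard(partition)
        return self.assignment[partition]

master = Master({"P5": "w3", "P6": "w3"})    # worker w3 is overloaded
master.steal("P6", "w1")                     # idle w1 takes over P6
```

The pause is the crucial bit: because any worker may be sending accumulated updates to P6, ownership cannot move until those writes are quiesced.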

Page 26

Talk Outline
• Motivation
• Piccolo's Programming Model
• System Design
• Evaluation

Page 27

Piccolo is Fast

NYU cluster, 12 nodes, 64 cores
100M-page graph

Main Hadoop overheads:
• Sorting
• HDFS
• Serialization

[Figure: PageRank iteration time (seconds), Hadoop vs. Piccolo]

Page 28

Piccolo Scales Well

EC2 cluster, linearly scaled input graph
1 billion page graph

[Figure: PageRank iteration time (seconds) vs. cluster size, compared against ideal scaling]

Page 29

Other Applications

• Iterative applications: N-Body Simulation, Matrix Multiply
• Asynchronous applications: Distributed web crawler
  (no straightforward Hadoop implementation)

Page 30

Related Work

• Data flow: MapReduce, Dryad
• Tuple spaces: Linda, JavaSpaces
• Distributed shared memory: CRL, TreadMarks, Munin, Ivy; UPC, Titanium

Page 31

Conclusion

Distributed shared table model

User-specified policies provide for:
• Effective use of locality
• Efficient synchronization
• Robust failure recovery

Page 32

Gratuitous Cat Picture

I can haz kwestions?

Try it out: piccolo.news.cs.nyu.edu