
Leopard: Lightweight Edge-Oriented Partitioning and Replication for Dynamic Graphs

presented by Qixuan Li


What is Graph Partitioning?

[Figure: a graph split into Partition 1 and Partition 2, with vertices A and B labeled]

Why Graph Partitioning?


Social Network


Why Graph Partitioning?

• Large graphs are becoming increasingly prevalent

• Large graph datasets are too large to manage on a single machine


Ideal Graph Partitioning Algorithm

• As few edge cuts as possible

• As balanced as possible

These two goals conflict!

Dynamic Graph Partitioning

• The graph is always changing

• Finding an optimal partitioning is NP-hard

Dynamic Graph Partitioning Example

[Figure: a changing graph with vertices A, B, C, D and E split between Partition 1 and Partition 2]

Is A still in Partition 1?


Leopard

• Algorithm Overview

Vertex Assignment

Vertex Reassignment

Computation Skipping

• Integration with Replication


Algorithm Overview

• Incrementally maintains a high-quality graph partitioning, adjusting dynamically as new vertices and edges are added to the graph

• Integrates consideration of replication with partitioning


Objective Function

FENNEL Scoring Function:

N(v): the set of neighbors of vertex v
P_i: partition i
k: number of partitions

Leopard

• Algorithm Overview

Vertex Assignment

Vertex Reassignment

Computation Skipping

• Integration with Replication


Vertex Assignment

• Given a vertex u:

compute the best partition for u using the objective function and put u into that partition
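A minimal Python sketch of this step. It does not reproduce the FENNEL formula; it assumes the simplified objective used in the deck's later worked example, # neighbors * (1 - # vertices / capacity). The names `score` and `assign_vertex` and the data layout are hypothetical.

```python
def score(neighbors_in_partition, partition_size, capacity):
    """Simplified objective: reward co-located neighbors, penalize full partitions."""
    return neighbors_in_partition * (1 - partition_size / capacity)


def assign_vertex(u, adjacency, partitions, capacity):
    """Place u in the highest-scoring partition and return that partition's index.

    adjacency: dict mapping a vertex to the set of its neighbors
    partitions: list of sets of vertices, one set per partition
    """
    best_index, best_score = None, float("-inf")
    for i, members in enumerate(partitions):
        co_located = len(adjacency.get(u, set()) & members)
        s = score(co_located, len(members), capacity)
        if s > best_score:  # strict comparison: ties go to the lowest index
            best_index, best_score = i, s
    partitions[best_index].add(u)
    return best_index
```

With this multiplicative form, a vertex with no placed neighbors scores 0 everywhere, so the sketch falls back to the first partition. A real implementation would need a better tie-breaking rule.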


Observation: Vertex Assignment

• Graph is changing

• Need to consider adds/deletes

Leopard

• Algorithm Overview

Vertex Assignment

Vertex Reassignment

Computation Skipping

• Integration with Replication


Vertex Reassignment v1

• When an edge (u, v) is added/deleted:

go through every vertex and examine it for reassignment

Slow!

Observation: Vertex Reassignment v1

• Most vertices are minimally affected

• u may move to v’s partition or vice versa

• A move can have a ripple effect on the moved vertex’s neighbors

Vertex Reassignment v2

• When an edge (u, v) is added/deleted:

u and v are chosen as the initial candidates for examination of potential reassignment

if either one of them is reassigned, add the immediate neighbors of the moved vertex to the candidate set

check for reassignment of every vertex in the candidate set

Still too slow!


Observation: Vertex Reassignment v2

• As a vertex’s number of neighbors increases, the influence of a new neighbor decreases

• Accumulate the changes until they exceed the threshold

Leopard

• Algorithm Overview

Vertex Assignment

Vertex Reassignment

Computation Skipping

• Integration with Replication


Computation Skipping

[Figure: vertex A and its neighbors across Partition 1 and Partition 2]

threshold = 0.5
ratio = # accumulated changes / # neighbors

(1) 1 edge is added to A: 1 / 3 = 0.33 < 0.5. Don’t recompute.
(2) When 1 more new edge is added to A: 2 / 4 = 0.5 = 0.5. Recompute the partition for A.
(3) Reset # accumulated changes to 0.
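The counter logic in this example can be sketched as follows. The class name `SkipCounter` is hypothetical. The sketch triggers a recompute when the ratio reaches the threshold, since the worked example recomputes at 0.5 = 0.5.

```python
class SkipCounter:
    """Tracks accumulated edge changes for one vertex (hypothetical helper)."""

    def __init__(self, degree, threshold=0.5):
        self.degree = degree        # current number of neighbors
        self.changes = 0            # changes accumulated since the last recompute
        self.threshold = threshold

    def add_edge(self):
        """Record one added edge; return True if the partition should be recomputed."""
        self.degree += 1
        self.changes += 1
        if self.changes / self.degree >= self.threshold:
            self.changes = 0        # reset once a recompute is triggered
            return True
        return False
```

Starting from a vertex with 2 neighbors, the first `add_edge()` gives 1 / 3 and is skipped; the second gives 2 / 4 and triggers the recompute.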


Vertex Reassignment (Final)

• When an edge (u, v) is added/deleted:

u and v are chosen as the initial candidates for examination of potential reassignment

if either one of them is reassigned, add the immediate neighbors of the moved vertex to the candidate set

examine only the vertices in the candidate set whose accumulated changes exceed the threshold
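A rough Python sketch of these three steps for an edge insertion. It makes several assumptions not stated on the slides: the simplified objective # neighbors * (1 - # vertices / capacity) stands in for the FENNEL score, a neighbor's move counts as one accumulated change, each vertex is examined at most once per edge event so the loop always terminates, and a ratio equal to the threshold triggers examination. All names are hypothetical.

```python
from collections import deque


def best_partition(v, adjacency, assignment, sizes, capacity):
    """Return the partition index with the highest simplified score for v."""
    def partition_score(p):
        here = sum(1 for n in adjacency[v] if assignment.get(n) == p)
        return here * (1 - sizes[p] / capacity)
    return max(range(len(sizes)), key=partition_score)  # ties: lowest index


def on_edge_added(u, v, adjacency, assignment, sizes, changes,
                  capacity, threshold=0.5):
    """Insert edge (u, v) and return the list of vertices that moved."""
    adjacency[u].add(v)
    adjacency[v].add(u)
    changes[u] = changes.get(u, 0) + 1
    changes[v] = changes.get(v, 0) + 1
    candidates = deque([u, v])          # initial candidate set
    examined, moved = set(), []
    while candidates:
        w = candidates.popleft()
        degree = len(adjacency[w])
        if w in examined or degree == 0:
            continue
        # Computation skipping: too few accumulated changes, do not examine.
        if changes.get(w, 0) / degree < threshold:
            continue
        examined.add(w)
        changes[w] = 0
        target = best_partition(w, adjacency, assignment, sizes, capacity)
        if target != assignment[w]:
            sizes[assignment[w]] -= 1
            sizes[target] += 1
            assignment[w] = target
            moved.append(w)
            # Assumption: a move counts as a change for every neighbor,
            # and the neighbors join the candidate set.
            for n in adjacency[w]:
                changes[n] = changes.get(n, 0) + 1
                candidates.append(n)
    return moved
```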


Vertex Reassignment (Final)

[Figure: graph across Partition 1 and Partition 2 with vertices A and B labeled]

Vertex Reassignment (Final)

[Figure: graph across Partition 1 and Partition 2 with vertices A and B labeled]

Candidate Set: {A, B}

Vertex Reassignment (Final)

[Figure: graph across Partition 1 and Partition 2 with vertices A and B labeled]

Goals: (1) few cuts and (2) balanced

Objective Function: # neighbors * (1 - # vertices / capacity)

Partition Capacity: 6

neighbors: 1 vertices: 5 score: 1 * (1 - 5/6) = 0.17

neighbors: 3 vertices: 3 score: 3 * (1 - 3/6) = 1.5

Vertex Reassignment (Final)

[Figure: graph across Partition 1 and Partition 2 with vertices A and B labeled]

Goals: (1) few cuts and (2) balanced

Objective Function: # neighbors * (1 - # vertices / capacity)

Partition Capacity: 6

neighbors: 1 vertices: 4 score: 1 * (1 - 4/6) = 0.33

neighbors: 2 vertices: 4 score: 2 * (1 - 4/6) = 0.67

Vertex Reassignment (Final)

[Figure: graph across Partition 1 and Partition 2 with vertices A, B, C and D labeled]

Vertex Reassignment (Final)

[Figure: graph across Partition 1 and Partition 2 with vertices A, B, C and D labeled]

Candidate Set: {C, D}

Leopard

• Algorithm Overview

Vertex Assignment

Vertex Reassignment

Computation Skipping

• Integration with Replication


Replication

• Fault tolerance

• Access locality


Minimum-Average Replication

• Takes two parameters: the minimum and the average number of copies

• Minimum: fault tolerance

• Average: additional access locality

• Decides the number and location of copies

Minimum-Average Replication

Min: 2 avg: 2.5

# copies    vertices
2           A, C, D, E, H, J, K, L
3           F, I
4           B, G

[Figure legend: primary and secondary copies]

example taken from the paper
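As a quick check of the stated average, assuming the table lists all 12 vertices of the example:

```latex
\[
\frac{8 \cdot 2 + 2 \cdot 3 + 2 \cdot 4}{12} = \frac{30}{12} = 2.5
\]
```

Every vertex has at least 2 copies, which matches the minimum.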


Minimum-Average Replication

Min: 2 avg: 2.5

# copies    vertices
2           A, C, D, E, H, J, K, L
3           F, I
4           B, G

only fault tolerance

[Figure legend: primary and secondary copies]

Recall: Objective Function

FENNEL Scoring Function:

Modified Vertex Assignment

• def: primary copy of v = p(v), secondary copy of v = s(v)

• Score for p(v): a neighbor u of v counts if s(u) or p(u) is in the partition

• Score for s(v): a neighbor u of v counts only if p(u) is in the partition

• Assign two scores for each partition, one for the p(v) and one for s(v)
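The two neighbor-counting rules can be sketched as below. The balance term reuses the simplified objective from the earlier worked example in place of the FENNEL formula, which is an assumption. The function name `copy_scores` and the dictionary layout are hypothetical.

```python
def copy_scores(v, adjacency, primary, secondaries, partition, size, capacity):
    """Return (primary_score, secondary_score) of vertex v for one partition.

    primary: dict vertex -> partition holding its primary copy
    secondaries: dict vertex -> set of partitions holding secondary copies
    """
    balance = 1 - size / capacity
    # For p(v), a neighbor counts if any copy of it lives in the partition.
    any_copy = sum(
        1 for u in adjacency[v]
        if primary.get(u) == partition or partition in secondaries.get(u, set())
    )
    # For s(v), only neighbors whose primary copy lives in the partition count.
    primary_only = sum(1 for u in adjacency[v] if primary.get(u) == partition)
    return any_copy * balance, primary_only * balance
```

For a given partition, the primary score is never lower than the secondary score, because every neighbor counted for s(v) is also counted for p(v).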


Minimum-Average Replication

min: 2 avg: 3

Scores for vertex A (primary / secondary):

Partition 1: 0.15 / 0.1
Partition 2: 0.45 / 0.4
Partition 3: 0.25 / 0.2
Partition 4: 0.35 / 0.3
Partition 5: 0.55 / X


Minimum-Average Replication

[Figure: recently computed scores sorted from high to low: 0.9, 0.87, 0.4, 0.22, 0.2, 0.11, 0.1, with further scores elided; new score: 0.3]

min: 2 avg: 3

cutoff: top (avg - 1) / (k - 1) = 2 / 4 = 50% of scores, where k is the number of partitions

keep the last n computed scores, sorted

Minimum-Average Replication

[Figure: recently computed scores sorted from high to low: 0.9, 0.87, 0.4, 0.22, 0.2, 0.11, 0.1, with further scores elided; new score: 0.3]

min: 2 avg: 3

cutoff: 50th highest score

keep the last n computed scores, sorted

Minimum-Average Replication

[Figure: 0.3 inserted into the sorted scores: 0.9, 0.87, 0.4, 0.3, 0.22, 0.2, 0.11, 0.1, with further scores elided; markers show the 50th and 51st positions]

min: 2 avg: 3

cutoff: 50th highest score

keep the last n computed scores, sorted

Minimum-Average Replication

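[Figure: sorted scores 0.9, 0.87, 0.4, 0.3, 0.22, 0.11, 0.1, with further scores elided; markers show the 50th and 51st positions; the score 0.2 is shown apart from the list]

min: 2 avg: 3

cutoff: 50th highest score

keep the last n computed scores, sorted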
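The replica decision walked through on these slides can be sketched as follows, as an interpretation rather than the paper's exact procedure. It assumes the primary copy goes to the partition with the highest primary score, the best min - 1 secondary scores are always replicated, and any remaining secondary score is replicated only if it falls in the top (avg - 1) / (k - 1) fraction of a window of recent scores. The function name, the window contents, and the omission of window updates are all simplifications of this sketch.

```python
def choose_replicas(primary_scores, secondary_scores, recent_sorted,
                    minimum, average):
    """Return (primary_partition, required_secondaries, extra_secondaries).

    primary_scores, secondary_scores: one score per partition
    recent_sorted: recently computed scores, sorted in ascending order
    """
    k = len(primary_scores)
    primary = max(range(k), key=lambda i: primary_scores[i])
    # Rank the other partitions by secondary score, best first.
    others = sorted((i for i in range(k) if i != primary),
                    key=lambda i: secondary_scores[i], reverse=True)
    required = others[:minimum - 1]          # copies needed for fault tolerance
    fraction = (average - 1) / (k - 1)       # e.g. (3 - 1) / (5 - 1) = 0.5
    top_count = int(len(recent_sorted) * fraction)
    # Cutoff: lowest score still inside the top `fraction` of recent scores.
    cutoff = recent_sorted[-top_count] if top_count > 0 else float("inf")
    extra = [i for i in others[minimum - 1:] if secondary_scores[i] >= cutoff]
    return primary, required, extra
```

Using vertex A's scores from the earlier slide and a hypothetical six-score window, the sketch places the primary in Partition 5 (index 4), one required secondary in Partition 2 (index 1), and one extra secondary in Partition 4 (index 3, score 0.3), for 3 copies in total:

```python
result = choose_replicas(
    [0.15, 0.45, 0.25, 0.35, 0.55],
    [0.1, 0.4, 0.2, 0.3, None],          # None: no secondary on the primary's partition
    [0.1, 0.11, 0.2, 0.22, 0.87, 0.9],   # hypothetical window of recent scores
    minimum=2, average=3,
)
```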


Evaluation

• Machine: 4th Generation Intel Core i5 and 16 GB memory

• Comparison Points:

Leopard (FENNEL as objective function)

One-pass FENNEL (no vertex reassignment)

METIS (static graphs)

ParMETIS (repartitioning for dynamic graphs)

Hash Partitioning


Evaluation

• Partition graphs into 40 partitions

• Parameters: $\gamma = 1.5$ and $\alpha = \sqrt{k}\,|E| / |V|^{1.5}$

• Cut ratio: # edge cuts / # edges
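Both quantities on this slide are simple to compute. A small sketch, with hypothetical function names and an edge list plus a vertex-to-partition dictionary as the assumed input format:

```python
import math


def fennel_alpha(num_partitions, num_edges, num_vertices):
    """alpha = sqrt(k) * |E| / |V|^1.5, the parameter setting given on the slide."""
    return math.sqrt(num_partitions) * num_edges / num_vertices ** 1.5


def cut_ratio(edges, assignment):
    """Fraction of edges whose two endpoints lie in different partitions."""
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)
```

A lower cut ratio means fewer edges cross partition boundaries.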


Dataset


Cut Ratio


Cut Ratio

METIS: upper bound

Cut Ratio

Web graphs

Computation Skipping


Computation Skipping


Computation Skipping


Computation Skipping

[Figure labels: always examine, never examine]

Computation Skipping

[Figure labels: Sparse, Dense]

Summary

• core idea: a vertex is assigned to the partition with most of its neighbors; large partitions are penalized to prevent them from becoming too large

• focuses on read-only workloads

• first to integrate replication with partitioning

• performs poorly on web graphs

Discussion

• How is the presented approach different from heuristic methods, and how well does it maintain accuracy?

• The experiments mainly analyze theoretical metrics such as edge-cut ratio. What about the running time?

• How does the presented solution (mapping vertices to machines in a distributed hash table) solve the fundamental problem with hash partitioning stated in the introduction?

• How are frequent updates to the skipped computation counter propagated in the presence of replication?