43
HDRF: Stream-Based Partitioning for Power-Law Graphs. travel grant from Fabio Petroni, L. Querzoni, K. Daudjee, S. Kamali, G. Iacoboni

HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

HDRF: Stream-Based Partitioning for Power-Law Graphs.

travel grant from

Fabio Petroni, L. Querzoni, K. Daudjee, S. Kamali, G. Iacoboni

Page 2: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Graphs

I large amount of real data can be represented as a graph.

VerticesEdges

G(V,E)I the problem of optimally partitioning a graph whilemaintaining load balance is important in several contexts.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 2 of 22

Page 3: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Distributed Graph Computing Frameworks

I support e�cient computation on really large graphs.B graph analytics; data mining and machine learning tasks.

I input data partitioning can have a significant impact on theperformance of the graph computation.B a↵ects network usage, memory occupation, synchronization.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 3 of 22

Page 4: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Balanced Graph Partitioning

vertex-cutedge-cutI partition G into smaller components of (ideally) equal size

edge-cutvertex disjoint partitions

vertex-cutedge disjoint partitions

I a vertex can be cut in multiple ways and span severalpartitions while a cut edge connects only two partitions.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22

Page 5: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Balanced Graph Partitioning

vertex-cutedge-cutI partition G into smaller components of (ideally) equal size

edge-cutvertex disjoint partitions

vertex-cutedge disjoint partitions

I a vertex can be cut in multiple ways and span severalpartitions while a cut edge connects only two partitions.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22

Page 6: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Vertex-Cut

computation steps are associated with edges

v-cut perform better on power law graphs

create less storage and network overhead

(Gonzalez et al., 2012)

I modern distributed graph computingframeworks use vertex-cut.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 5 of 22

Page 7: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Power-Law GraphsI characteristic of real graphs: power-law degree distribution.

B most vertices have few connections while a few have many.

100

101

102

103

104

105

106

100 101 102 103 104 105

nu

mb

er

of

vert

ice

s

degree

crawlTwitter

connection 2010

log-log scale

I the probability that a vertex has degree d is P(d) / d

�↵ .

I ↵ controls the “skewness” of the degree distribution.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 6 of 22

Page 8: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Balanced Vertex-Cut Graph PartitioningIv 2 V vertex; e 2 E edge; p 2 P partition.

IA(v) set of partitions where vertex v is replicated.

I � � 1 tolerance to load imbalance.I the size |p| of partition p is its edge cardinality.

minimize replicasreduce (1) bandwidth, (2) memory

usage and (3) synchronization

balance the loadefficient usage of available

computing resources

min1

|V |X

v2V

|A(v)| s.t. maxp2P

|p| < �|E ||P |

I the objective function is the replication factor (RF).B average number of replicas per vertex.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 7 of 22

Page 9: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Streaming SettingI input data is a list of edges, consumed in streaming fashion,requiring only a single pass.

partitioningalgorithm

stream of edges

graph

3 handle graphs that don’t fit in the main memory.

3 impose minimum overhead in time.

3 scalable, easy parallel implementations.

7 assignment decision taken cannot be later changed.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 8 of 22

Page 10: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Algorithms Taxonomy

Grid PDS HDRFDBH Greedy

hashing

stream-based vertex-cut graph

partitioning

Hashing

constrained partitioning

(Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014)

history aware

(Jain et al., 2013)(Gonzalez et al., 2012)

history agnostic

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22

Page 11: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Algorithms Taxonomy

Grid PDS HDRFDBH Greedy

hashing

stream-based vertex-cut graph

partitioning

Hashing

constrained partitioning

(Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014)

history aware

(Jain et al., 2013)(Gonzalez et al., 2012)

history agnostic

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22

Page 12: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

HDRF: High Degree are Replicated First

I favor the replication of high-degree vertices.

I the number of high-degree vertices in power-law graphs isvery low.

I overall reduction of the replication factor.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22

Page 13: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

HDRF: High Degree are Replicated First

I in the context of robustness to network failure.

I if few high-degree vertices are removed from a power-lawgraph then it is turned into a set of isolated clusters.

I focus on the locality of low-degree vertices.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22

Page 14: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

The HDRF Algorithm

vertex without replicas

vertex with replicas

case 1vertices not assigned to partitions

incoming edge

e

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22

Page 15: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

The HDRF Algorithm

vertex without replicas

vertex with replicas

case 1place e in the least loaded partition

incoming edge

e

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22

Page 16: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

The HDRF Algorithm

case 2only one vertex has been assigned

vertex without replicas

vertex with replicas

case 1place e in the least loaded partition

incoming edge

e

e

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22

Page 17: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

The HDRF Algorithm

case 2place e in the partition

vertex without replicas

vertex with replicas

case 1place e in the least loaded partition

incoming edge

e

e

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22

Page 18: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

The HDRF Algorithm

case 2place e in the partition

vertex without replicas

vertex with replicas

case 1place e in the least loaded partition

incoming edge

e

e

case 3vertices assigned, common partition

e

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22

Page 19: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

The HDRF Algorithm

case 2place e in the partition

vertex without replicas

vertex with replicas

case 1place e in the least loaded partition

incoming edge

e

e

case 3place e in the intersection

e

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22

Page 20: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Create Replicas

case 4empty intersection

e

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22

Page 21: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Create Replicas

case 4least loaded partition in the union

case 4empty intersection

e

estandard Greedy solution

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22

Page 22: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Create Replicas

case 4least loaded partition in the union

case 4empty intersection

e

case 4replicate vertex with highest degree

eδ(v1) > δ(v2)

estandard Greedy solution

HDRF

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22

Page 23: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Equivalent Formulation: Maximize The Score

I computing a score C (e, p) for all partitions p 2 P .

I assigns e to the partition that maximizes C (e, p).

C (e, p) = CREP(e, p) + CBAL(p)

I the balance term breaks ties in replication term .

I this may not be enough to ensure load balance.

� control the importance of the balance term.

balanced partitions even when classical greedy fails.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22

Page 24: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Equivalent Formulation: Maximize The Score

I computing a score C (e, p) for all partitions p 2 P .

I assigns e to the partition that maximizes C (e, p).

C (e, p) = CREP(e, p) + � · CBAL(p)

the balance term breaks ties in replication term.

this may not be enough to ensure load balance.

I � controls the importance of the balance term .

I balanced partitions even when classical greedy fails.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22

Page 25: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Ordered Stream Of Edges

I without �

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22

Page 26: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Ordered Stream Of Edges

I without �

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22

Page 27: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Ordered Stream Of Edges

I without �

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22

Page 28: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Ordered Stream Of Edges

I without �

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22

Page 29: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Ordered Stream Of Edges

I without �

single partitionall the graph in a

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22

Page 30: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Ordered Stream Of Edges

I with �

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22

Page 31: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Experiments - Settings

I standalone partitioner.B VGP, a software package for one-pass vertex-cut balanced

graph partitioning.

B measure the performance: replication and balancing.

I GraphLab.B HDRF has been integrated in GraphLab PowerGraph 2.2.

B measure the impact on the execution time of graphcomputation in a distributed graph computing frameworks.

I stream of edges in random order.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 15 of 22

Page 32: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Experiments - Datasets

I real-wordgraphs.

I syntheticgraphs.

1M vertices

60M to 3M edges

Dataset |V | |E |MovieLens 10M 80.6K 10M

Netflix 497.9K 100.4MTencent Weibo 1.4M 140M

twitter-2010 41.7M 1.47B

100

101

102

103

104

105

101 102 103 104 105

nu

mb

er

of

vert

ice

s

degree

α = 1.8α = 2.2α = 2.6α = 3.0α = 3.4α = 4.0

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 16 of 22

Page 33: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Synthetic Graphs Replication FactorI 128 partitions

4

8

1.8 2.0 2.4 2.8 3.2 3.6 4.0

rep

lica

tion

fa

cto

r

alpha

HDRFPDS |P|=133

grid |P|=121

greedyDBH

skewed homogeneousCIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22

Page 34: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Synthetic Graphs Replication FactorI 128 partitions

4

8

1.8 2.0 2.4 2.8 3.2 3.6 4.0

rep

lica

tion

fa

cto

r

alpha

HDRFPDS |P|=133

grid |P|=121

greedyDBH

skewed homogeneous

less denseless edges

easyer to partition

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22

Page 35: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Synthetic Graphs Replication FactorI 128 partitions

4

8

1.8 2.0 2.4 2.8 3.2 3.6 4.0

rep

lica

tion

fa

cto

r

alpha

HDRFPDS |P|=133

grid |P|=121

greedyDBH

skewed homogeneous

less denseless edges

easier to partition

more densemore edges

difficult to partition

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22

Page 36: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Synthetic Graphs Replication FactorI 128 partitions

4

8

1.8 2.0 2.4 2.8 3.2 3.6 4.0

rep

lica

tion

fa

cto

r

alpha

HDRFPDS |P|=133

grid |P|=121

greedyDBH

skewed homogeneous

less denseless edges

easier to partition

more densemore edges

difficult to partition

exploitpower-law

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22

Page 37: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Synthetic Graphs Replication FactorI 128 partitions

4

8

1.8 2.0 2.4 2.8 3.2 3.6 4.0

rep

lica

tion

fa

cto

r

alpha

HDRFPDS |P|=133

grid |P|=121

greedyDBH

skewed homogeneousCIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22

Page 38: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Real-Word Graphs Replication FactorI 133 partitions

1

2

4

8

16

Tencent Weibo Netflix MovieLens 10M twitter-2010

replic

atio

n fact

or

PDS

7.9

11.3 11.5

8.0

greedy

2.8

8.1

10.7

5.9

DBH

1.5

7.1

12.9

6.8

HDRF

1.3

4.9

9.1

4.8

1

2

4

8

16

Tencent Weibo Netflix MovieLens 10M twitter-2010

replic

atio

n fact

or

PDSgreedy

DBHHDRF

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 18 of 22

Page 39: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Real-Word Graphs Replication FactorI 133 partitions

1

2

4

8

16

Tencent Weibo Netflix MovieLens 10M twitter-2010

replic

atio

n fact

or

PDS

7.9

11.3 11.5

8.0

greedy

2.8

8.1

10.7

5.9

DBH

1.5

7.1

12.9

6.8

HDRF

1.3

4.9

9.1

4.8

1

2

4

8

16

Tencent Weibo Netflix MovieLens 10M twitter-2010

replic

atio

n fact

or

PDSgreedy

DBHHDRF

HDRF achieves a replication factor about 40% smaller than DBH,

more than 50% smaller than Greedy, almost 3× smaller than PDS,

more than 4× smaller than Grid and almost 14× smaller than Hashing.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 18 of 22

Page 40: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Load Relative Standard DeviationI MovieLens 10M

0

1

2

3

4

5

6

7

8 16 32 64 128 256

loa

d r

ela

tive

sta

nd

ard

de

via

tion

(%

)

partitions

PDSgridgreedyDBHHDRFhashing

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 19 of 22

Page 41: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Results - Graph Algorithm Runtime SpeedupISGD algorithm for collaborative filtering on Tencent Weibo.

1

1.5

2

2.5

3

32 64 128

spe

ed

up

(×)

partitions

greedyPDS

1

1.5

2

2.5

3

32 64 128

spe

ed

up

(×)

partitions

greedyPDS |P|=31,57,133

I the speedup is proportional to both:B the advantage in replication factor.B the actual network usage of the algorithm.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 20 of 22

Page 42: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

ConclusionI HDRF is a one-pass vertex-cut graph partitioning algorithm.

I based on a greedy vertex-cut approach that leveragesinformation on vertex degrees.

I we provide a theoretical analysis of HDRF with anaverage-case upper bound for the vertex replication factor.

I experimental study shows that:

B HDRF provides the smallest replication factor with close tooptimal load balance.

B HDRF significantly reduces the time needed to performcomputation on graphs.

I the stand-alone software package for one-pass v-cut balancedpartitioning at https://github.com/fabiopetroni/VGP.

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 21 of 22

Page 43: HDRF: Stream-Based Partitioning for Power-Law Graphs.€¦ · Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning

Thank you!

Questions?

Fabio PetroniSapienza University of Rome, Italy

Current position:

PhD Student in Engineering in Computer Science

Research Interests:

data mining, machine learning, big data

[email protected]

CIKM 2015. October 19-23, 2015. Melbourne, Australia. 22 of 22