55
Speeding Up Algorithms on Compressed Web Graphs Chinmay Karande (Georgia Institute of Technology) Kumar Chellapilla (Microsoft Live Labs) Reid Andersen (Microsoft Live Labs) WSDM 2009 C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 1 / 41

Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Speeding Up Algorithms on Compressed Web Graphs

Chinmay Karande (Georgia Institute of Technology)Kumar Chellapilla (Microsoft Live Labs)

Reid Andersen (Microsoft Live Labs)

WSDM 2009

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 1 / 41

Page 2: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Outline

1 Fast Algorithms for Compressed GraphsGraph CompressionAdjacency Matrix Multiplication on Compressed GraphsAdapting PageRank Markov Chain to Compressed GraphsOther algorithmsImplementation Results

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 2 / 41

Page 3: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

WWW Graph

A gold-mine of important information.

Webpages are nodes, hyperlinks are edges.

HUGE dataset: ∼ 1 Trillion pages

Graph Algorithms:I Importance metrics: PageRank, HITS, SALSA...I Finding pathsI Clustering

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 3 / 41

Page 4: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

WWW Graph

A gold-mine of important information.

Webpages are nodes, hyperlinks are edges.

HUGE dataset: ∼ 1 Trillion pages

Graph Algorithms:I Importance metrics: PageRank, HITS, SALSA...I Finding pathsI Clustering

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 3 / 41

Page 5: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

WWW Graph

A gold-mine of important information.

Webpages are nodes, hyperlinks are edges.

HUGE dataset: ∼ 1 Trillion pages

Graph Algorithms:I Importance metrics: PageRank, HITS, SALSA...I Finding pathsI Clustering

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 3 / 41

Page 6: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Structural Graph Compression

Replace a dense subgraph by a sparse one, such that:I Maintain connectivityI DecompressibleI Maintain ‘structure’

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 4 / 41

Page 7: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Clique-Star Compression

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 5 / 41

Page 8: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Clique-Star Compression: Terminology

Real Node

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 6 / 41

Page 9: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Clique-Star Compression: Terminology

Real Node

Virtual Node

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 7 / 41

Page 10: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Clique-Star Compression: Terminology

Real Node

Virtual Node

Virtual Edge

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 8 / 41

Page 11: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Clique-Star Compression

Compression performed in phases: In each phase compressedge-disjoint cliques.

In each phase, virtual edges may become longer by one.

Diminishing returns on number of phases: ∼ 6 to 8 phases yield 10fold compression. [G. Buehrer, K. Chellapilla]

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 9 / 41

Page 12: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Problem Statement

Problem: Given G ′ compressed from G , how do we perform computationson G ′ so as to infer properties of G?

How to determine metrics like:I PageRankI HITSI SALSA

of G without decompressing G ′?

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 10 / 41

Page 13: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Example: PageRank

PageRank can be viewed as:

Repeated multiplication by adjacency matrix (with adjustments)I We need a Black-Box procedure to multiply a vector by adjacency

matrix of G given only G ′.

Steady state of a Markov ChainI We need a Markov Chain on G ′ that ‘mimics’ the PageRank MC on G .

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 11 / 41

Page 14: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Example: PageRank

PageRank can be viewed as:

Repeated multiplication by adjacency matrix (with adjustments)I We need a Black-Box procedure to multiply a vector by adjacency

matrix of G given only G ′.

Steady state of a Markov ChainI We need a Markov Chain on G ′ that ‘mimics’ the PageRank MC on G .

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 11 / 41

Page 15: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Outline

1 Fast Algorithms for Compressed GraphsGraph CompressionAdjacency Matrix Multiplication on Compressed GraphsAdapting PageRank Markov Chain to Compressed GraphsOther algorithmsImplementation Results

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 12 / 41

Page 16: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Adjacency Matrix Multiplication: Nuts and Bolts

y = ET · x x ∈ Rn

yv = xu1 + xu2 + xu3 + xu4 + xu5

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 13 / 41

Page 17: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Adjacency Matrix Multiplication on Compressed Graph

y = ET · x x ∈ Rn

yv = xu1 + xu2 + xu3 + xu4 + xu5

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 14 / 41

Page 18: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Adjacency Matrix Multiplication on Compressed Graph

y = ET · x x ∈ Rn

yv = xu1 + xu2 + yw

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 15 / 41

Page 19: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Adjacency Matrix Multiplication on Compressed Graph:Dependencies

Consider a virtual edge u → w1 → ... → wk → v :

yv depends upon ywk

ywkdepends upon ywk−1

...

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 16 / 41

Page 20: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Acyclic Dependencies on Virtual Nodes

Subgraph induced by edges incident on virtual nodes is a forest. [G.Buehrer, K. Chellapilla]⇒ There exists a way to resolve dependencies.

while y is undefined on some virtual nodes doPick virtual node w such that y is defined on all virtual predecessorsof w .Compute and define yw .

end while

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 17 / 41

Page 21: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Acyclic Dependencies on Virtual Nodes

Subgraph induced by edges incident on virtual nodes is a forest. [G.Buehrer, K. Chellapilla]⇒ There exists a way to resolve dependencies.

while y is undefined on some virtual nodes doPick virtual node w such that y is defined on all virtual predecessorsof w .Compute and define yw .

end while

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 17 / 41

Page 22: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Solution

Permute virtual nodes in the order of dependencies

Practical ConsiderationsI Sequential File AccessI Synchronous algorithmI For SALSA: Inverted adjacency required for virtual nodes.I Speed-up almost matches the storage reduction ratio.!

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 18 / 41

Page 23: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Solution

Permute virtual nodes in the order of dependencies

Practical ConsiderationsI Sequential File AccessI Synchronous algorithmI For SALSA: Inverted adjacency required for virtual nodes.

I Speed-up almost matches the storage reduction ratio.!

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 18 / 41

Page 24: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Solution

Permute virtual nodes in the order of dependencies

Practical ConsiderationsI Sequential File AccessI Synchronous algorithmI For SALSA: Inverted adjacency required for virtual nodes.I Speed-up almost matches the storage reduction ratio.!

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 18 / 41

Page 25: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Outline

1 Fast Algorithms for Compressed GraphsGraph CompressionAdjacency Matrix Multiplication on Compressed GraphsAdapting PageRank Markov Chain to Compressed GraphsOther algorithmsImplementation Results

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 19 / 41

Page 26: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank Scheme

PageRank is a Markov Chain:

With probability α, perform a uniform ‘jump’.

Pr [u → v ] = (1− α)1

|δout(u)|

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 20 / 41

Page 27: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank on Compressed Graph

u1

u2

u3

v1

v2

u1

u2

u3

v1

v2

w

Pr [Xt = ui ] = pi

Pr [Xt+1 = vi | p1, p2, p3] =1

|δ(u1)|+

1

|δ(u2)|+

1

|δ(u3)|

=1

|δ(w)|∑

i

|δ(w)||δ(ui )|

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 21 / 41

Page 28: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank on Compressed Graph

u1

u2

u3

v1

v2

u1

u2

u3

v1

v2

w

Pr [Xt = ui ] = pi

Pr [Xt+1 = vi | p1, p2, p3] =1

|δ(u1)|+

1

|δ(u2)|+

1

|δ(u3)|

=1

|δ(w)|∑

i

|δ(w)||δ(ui )|

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 21 / 41

Page 29: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Defining the ‘reach’ of a node

∆(u) =

1 If u is real∑uv∈E ′

∆(v) If u is virtual

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 22 / 41

Page 30: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Illustration of ∆ function

∆(u) = 1

∆(v) = 5

∆(w) = 3

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 23 / 41

Page 31: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Defining the true out-degree of a node

Γ(u) =∑

uv∈E ′

∆(v)

If G ′ is compressed from G then:

For real u, Γ(u) is the out-degree of u in G .

For virtual u, Γ(u) = ∆(u).

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 24 / 41

Page 32: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Illustration of Γ function

Γ(u) = 7

Γ(v) = 5

Γ(w) = 3

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 25 / 41

Page 33: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank on Compressed Graph

With probability α, perform a uniform ‘jump’ but don’t jump to andfrom virtual nodes.

Pr [u → v ] =

(1− α)

∆(v)

Γ(u)If u is real

∆(v)

Γ(u)If u is virtual

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 26 / 41

Page 34: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank on Compressed Graph

With probability α, perform a uniform ‘jump’ but don’t jump to andfrom virtual nodes.

Pr [u → v ] =

(1− α)

∆(v)

Γ(u)If u is real

∆(v)

Γ(u)If u is virtual

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 26 / 41

Page 35: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Correctness theorem

Theorem

If G ′ is compressed from G and p′, p are respective PageRank vectors,then for every real node u, p′(u) = εp(u).

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 27 / 41

Page 36: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Proof

Proof.

Split the compression from G to G ′ in phases:

G = G0 � G1 � ... � Gk = G ′

Let pi be the steady state of (modified) PageRank on Gi .

Conclusion: For u ∈ V (Gi ), pi(u)’s and pi+1(u)’s satisfy the sameequations ⇒ pi+1(u) = εi+1pi(u)

ε = ε1 · ε2 · ... · εk

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 28 / 41

Page 37: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Proof

Proof.

Split the compression from G to G ′ in phases:

G = G0 � G1 � ... � Gk = G ′

Let pi be the steady state of (modified) PageRank on Gi .

Conclusion: For u ∈ V (Gi ), pi(u)’s and pi+1(u)’s satisfy the sameequations ⇒ pi+1(u) = εi+1pi(u)

ε = ε1 · ε2 · ... · εk

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 28 / 41

Page 38: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Proof

Proof.

Split the compression from G to G ′ in phases:

G = G0 � G1 � ... � Gk = G ′

Let pi be the steady state of (modified) PageRank on Gi .

Conclusion: For u ∈ V (Gi ), pi(u)’s and pi+1(u)’s satisfy the sameequations ⇒ pi+1(u) = εi+1pi(u)

ε = ε1 · ε2 · ... · εk

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 28 / 41

Page 39: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Proof

Proof.

Split the compression from G to G ′ in phases:

G = G0 � G1 � ... � Gk = G ′

Let pi be the steady state of (modified) PageRank on Gi .

Conclusion: For u ∈ V (Gi ), pi(u)’s and pi+1(u)’s satisfy the sameequations ⇒ pi+1(u) = εi+1pi(u)

ε = ε1 · ε2 · ... · εk

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 28 / 41

Page 40: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Solution

Run (modified) PageRank on compressed graph, and normalize the valueson real nodes to unit norm..

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 29 / 41

Page 41: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Precision Theorem

Theorem

ε ≥ 2−k

where k is the length of the longest virtual edge.

Proof.

Split the compression from G to G ′ in phases:

G = G0 � G1 � ... � Gk = G ′

Let pi be the steady state of (modified) PageRank on Gi .

To prove: εi ≥ 1/2.

Follows from the fact that∑u∈V (Gi )

pi (u) +∑u∈Q

pi (u) = 1

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 30 / 41

Page 42: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Precision Theorem

Theorem

ε ≥ 2−k

where k is the length of the longest virtual edge.

Proof.

Split the compression from G to G ′ in phases:

G = G0 � G1 � ... � Gk = G ′

Let pi be the steady state of (modified) PageRank on Gi .

To prove: εi ≥ 1/2.

Follows from the fact that∑u∈V (Gi )

pi (u) +∑u∈Q

pi (u) = 1

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 30 / 41

Page 43: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Precision Theorem

Theorem

ε ≥ 2−k

where k is the length of the longest virtual edge.

Proof.

Split the compression from G to G ′ in phases:

G = G0 � G1 � ... � Gk = G ′

Let pi be the steady state of (modified) PageRank on Gi .

To prove: εi ≥ 1/2.

Follows from the fact that∑u∈V (Gi )

pi (u) +∑u∈Q

pi (u) = 1

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 30 / 41

Page 44: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Solution

Run (modified) PageRank on compressed graph, and normalize the valueson real nodes to unit norm..

Practical Considerations:I Modified only the weights - Can run any existing PageRank

implementation almost unchanged.I Sequential File AccessI Asynchronous: Distributed computing feasible.I Convergence may be slower due to longer path lengths.

I Speed-up per iteration almost matches the storage reductionratio.!

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 31 / 41

Page 45: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Solution

Run (modified) PageRank on compressed graph, and normalize the valueson real nodes to unit norm..

Practical Considerations:I Modified only the weights - Can run any existing PageRank

implementation almost unchanged.I Sequential File AccessI Asynchronous: Distributed computing feasible.I Convergence may be slower due to longer path lengths.I Speed-up per iteration almost matches the storage reduction

ratio.!

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 31 / 41

Page 46: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Outline

1 Fast Algorithms for Compressed GraphsGraph CompressionAdjacency Matrix Multiplication on Compressed GraphsAdapting PageRank Markov Chain to Compressed GraphsOther algorithmsImplementation Results

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 32 / 41

Page 47: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

SALSA: Stochastic Approach to Link-Structure Analysis

Both the Synchronous and Asynchronous methods can be adapted forSALSA.

In-link counterparts of ∆ and Γ required.

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 33 / 41

Page 48: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Shortest Paths: BFS

Simply define edge weights as:

w(u, v) =

{1 If v is real0 If v is virtual

Use a Deque.

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 34 / 41

Page 49: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Outline

1 Fast Algorithms for Compressed GraphsGraph CompressionAdjacency Matrix Multiplication on Compressed GraphsAdapting PageRank Markov Chain to Compressed GraphsOther algorithmsImplementation Results

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 35 / 41

Page 50: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Experiments: Proof of Concept

If β is the reduction ratio in the number of edges, we cannot hope for theprograms to run β times faster.

O(|V |) operations such as:

Allocating variables

Copying and zeroing values between iterations

bring down the speed-up to a small extent.

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 36 / 41

Page 51: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank on eu-2005

Uncompressed graph

No. of nodes: 862,664No. of edges: 19,235,140

Compressed graph has β = 4.34

No. of nodes: 1,196,536No. of edges: 4,429,375

Uncompressed Synchronous Asynchronous

Time/iteration (sec) 5.37 1.58 1.50

No. of iterations 19 19 50

Speed-up 1 3.40 1.36

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 37 / 41

Page 52: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank on uk-2005

Uncompressed graph

No. of nodes: 39,459,925No. of edges: 936,364,282

Compressed graph has β = 6.18

No. of nodes: 47,482,140No. of edges: 151,456,024

Uncompressed Synchronous Asynchronous

Time/iteration (sec) 264.40 59.52 59.15

No. of iterations 21 21 53

Speed-up 1 4.44 2.53

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 38 / 41

Page 53: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

SALSA on eu-2005

Uncompressed graph

No. of nodes: 862,664No. of edges: 19,235,140

Compressed graph has β = 4.34

No. of nodes: 1,196,536No. of edges: 4,429,375

Uncompressed Synchronous Asynchronous

Time/iteration (sec) 5.48 2.37 1.97

No. of iterations 91 91 100

Speed-up 1 2.31 2.70

Storage Reduction 1 2.36 3.21

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 39 / 41

Page 54: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

PageRank on uk-2005

Uncompressed graph

No. of nodes: 39,459,925No. of edges: 936,364,282

Compressed graph has β = 6.18

No. of nodes: 47,482,140No. of edges: 151,456,024

Uncompressed Synchronous Asynchronous

Time/iteration (sec) 276.09 72.93 88.69

No. of iterations 104 104 124

Speed-up 1 3.11 3.18

Storage Reduction 1 3.47 4.54

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 40 / 41

Page 55: Speeding Up Algorithms on Compressed Web Graphstranslectures.videolectures.net/.../tag=37587/wsdm09_karande_suac… · C. Karande, K. Chellapilla, R. Andersen Speeding Up Algorithms

Thank you.!

C. Karande, K. Chellapilla, R. Andersen () Speeding Up Algorithms on Compressed Web Graphs WSDM 2009 41 / 41