Transcript

Efficient PageRank Tracking

in Evolving Networks

Naoto Ohsaka (UTokyo)

Takanori Maehara (Shizuoka University)

Ken-ichi Kawarabayashi (NII)

KDDโ€™15: Social and Graphs

ERATO Kawarabayashi Large Graph Project

Background

Personalized PageRank Tracking0.21

2

๐’• = ๐Ÿ ๐’• = ๐Ÿ ๐’• = ๐Ÿ‘

0.25 0.23

ApplicationsSearch engine [Brin-Page. โ€™98]

Spam detection[Chung-Toyoda-Kitsuregawa. โ€™09, โ€™10]

User recommendation[Gupta-Goel-Lin-Sharma-Wang-Zadeh. โ€™13]

Size Speed

WWW 60T 600K pages / s

Twitter 300M 5K tweets / s

Google+ 700M +19 users / s

Growth of real networks

Background

Existing Work for PageRank Tracking

3

๐’Ž random

edge insertions

ScalabilityUpdate time < 0.1s

Error โ‰ˆ 10-9

Aggregation/Disaggregation[Chien et al. โ€™04]

๐’ช ๐’Ž ๐‘บ log 1/๐œ– 68M edges

Monte-Carlo[Bahmani et al. โ€™10]

๐’ช(๐‘š + log๐‘š /๐๐Ÿ) 68M edges

Power methodnaive method

๐’ช ๐’Ž๐Ÿ log 1/๐œ– 11M edges

Our Contribution

๐’Ž random

edge insertions

ScalabilityUpdate time < 0.1s

Error โ‰ˆ 10-9

This workAve.

๐“ž ๐’Ž+ ๐šซ ๐ฅ๐จ๐ ๐’Ž/๐3,700M edges

Aggregation/Disaggregation[Chien et al. โ€™04]

๐’ช ๐’Ž ๐‘บ log 1/๐œ– 68M edges

Monte-Carlo[Bahmani et al. โ€™10]

๐’ช(๐‘š + log๐‘š /๐๐Ÿ) 68M edges

Power methodnaive method

๐’ช ๐’Ž๐Ÿ log 1/๐œ– 11M edges

Propose a simple, efficient, & accurate method for

Personalized PageRank tracking in evolving networks

โ‡ฉ Max. out-deg

4

Preliminaries

Definition of Personalized PageRank[Brin-Page. Comput. Networks ISDN Syst.โ€™98] [Jeh-Widom. WWWโ€™03]

Linear equation

A solution ๐‘ฅ of ๐’™ = ๐œถ๐‘ท๐’™ + ๐Ÿ โˆ’ ๐œถ ๐’ƒ

Random walk modeling web browsing

Moves to a random out-neighbor w.p. ๐›ผJumps to a random vertex w.p. 1 โˆ’ ๐›ผ(biased by ๐‘)

Transition matrix

Preference vector

Decay factor = 0.85

Random-walk

interpretation

Stationary

distribution

๐’–

๐’—๐’˜

๐’‚

5

Preliminaries

Computing PageRank in Static Graphs

Solving eq. ๐‘ฅ = ๐›ผ๐‘ƒ๐‘ฅ + 1 โˆ’ ๐›ผ ๐‘

Power method ๐‘ฅ(๐œˆ) = ๐›ผ๐‘ƒ๐‘ฅ ๐œˆโˆ’1 + 1 โˆ’ ๐›ผ ๐‘

Gauss-Seidel [Del Corso-Gullรญ-Romani. Internet Math.โ€™05]

LU/QR factorization [Fujiwara-Nakatsuji-Onizuka-Kitsuregawa. VLDBโ€™12]

Krylov subspace method [Maehara-Akiba-Iwata-Kawarabayashi. VLDBโ€™14]

Estimating the frequency ๐‘ฅ๐‘ฃ of visiting ๐‘ฃ

Monte-Carlo simulation

6

Preliminaries

Tracking PageRank in Evolving Graphs

Aggregation/disaggregation[Chien-Dwork-Kumar-Simon-Sivakumar. Internet Math.โ€™04]

Apply the power method to a subgraph

Monte-Carlo[Bahmani-Chowdhury. VLDBโ€™10]

Maintain & update random-walk segments

7

Still large

๐›€ ๐Ÿ/๐๐Ÿ Too many

๐‘ฎ(๐’•)(๐‘ฝ, ๐‘ฌ(๐’•))

Proposed Method

Problem Setting

8

Given at time ๐‘ก:Edges inserted to๏ผdeleted fromGiven ๐บ 0 , ๐›ผ, ๐‘ at time 0

Problem at time 0Compute approx.

PPR ๐‘ฅ 0 of ๐บ(0)

Problem at time ๐‘กCompute approx.

PPR ๐‘ฅ ๐‘ก of ๐บ(๐‘ก)

๐’™ ๐ŸŽ โˆ’ ๐’™โˆ— ๐ŸŽ โˆž < ๐

๐‘ฎ ๐ŸŽ(๐‘ฝ, ๐‘ฌ(๐ŸŽ))

โ€ฆ ๐‘ฎ(๐’• โˆ’ ๐Ÿ)(๐‘ฝ, ๐‘ฌ(๐’• โˆ’ ๐Ÿ))

Solving eq. ๐‘ฅ ๐‘ก = ๐›ผ๐‘ƒ ๐‘ก ๐‘ฅ ๐‘ก + 1 โˆ’ ๐›ผ ๐‘1. ๐‘ฅ(๐‘ก โˆ’ 1) is a GOOD initial solution for ๐‘ฅ(๐‘ก)2. Improving approximate solution locally

Proposed Method

The Idea

9

We use the Gauss-Southwell method

a.k.a. Bookmark coloring algorithm [Berkhin. Internet Math.โ€™06]

Local algorithm[Spielman-Teng. SIAM J. Comput.โ€™13] [Andersen-Chung-Lang. FOCSโ€™06]

[Southwell. โ€™40,โ€™46]

error < ๐œ–

error โ‰ฅ ๐œ–Insert

๐œˆ = 1,2,3, โ€ฆ

๐‘– โ†a vertex with largest ๐‘Ÿ๐‘–๐œˆโˆ’1

If ๐‘Ÿ๐‘–๐œˆโˆ’1

< ๐œ– terminate

Update ๐‘ฅ(๐œˆโˆ’1) & ๐‘Ÿ(๐œˆโˆ’1) locally so that ๐‘Ÿ๐‘–๐œˆ= 0

๐œˆth solution ๐‘ฅ ๐œˆ

๐œˆth residual ๐‘Ÿ(๐œˆ) = 1 โˆ’ ๐›ผ ๐‘ โˆ’ ๐ผ โˆ’ ๐›ผ๐‘ƒ ๐‘ฅ ๐œˆ

Goal: ๐‘Ÿ ๐œˆ โ†’ ๐ŸŽ

Proposed Method

Gauss-Southwell Method [Southwell. โ€™40,โ€™46]

10

# iterations

Stops within ๐‘Ÿ 0

1โˆ’๐›ผ ๐œ–iter.

โœ” ๐‘Ÿ ๐œˆ โ‰ค ๐‘Ÿ ๐œˆโˆ’1 โˆ’ 1 โˆ’ ๐›ผ ๐œ–

๐’Š

๐’—

๐’–

๐’˜

+๐›ผ๐‘Ÿ๐‘–(๐œˆโˆ’1)

deg

+๐›ผ๐‘Ÿ๐‘–(๐œˆโˆ’1)

deg+๐›ผ

๐‘Ÿ๐‘–(๐œˆโˆ’1)

deg

Accuracy

๐’™โˆ— โˆ’ ๐’™ ๐‚โˆžโ‰ค

๐

๐Ÿโˆ’๐œถ

โœ” ๐‘Ÿ๐‘–(๐œˆ)

< ๐œ–

Proposed Method

Overview

11

At time ๐‘ก:

๐‘ฅ ๐‘ก (0) = ๐‘ฅ ๐‘ก โˆ’ 1

๐‘Ÿ ๐‘ก 0 = ๐‘Ÿ ๐‘ก โˆ’ 1 + ๐›ผ ๐‘ƒ ๐‘ก โˆ’ ๐‘ƒ ๐‘ก โˆ’ 1 ๐‘ฅ ๐‘ก โˆ’ 1

Apply the Gauss-Southwell method

depends on 1

At time ๐‘ก:

๐‘ฅ ๐‘ก (0) = ๐‘ฅ ๐‘ก โˆ’ 1

๐‘Ÿ ๐‘ก 0 = ๐‘Ÿ ๐‘ก โˆ’ 1 + ๐›ผ ๐‘ƒ ๐‘ก โˆ’ ๐‘ƒ ๐‘ก โˆ’ 1 ๐‘ฅ ๐‘ก โˆ’ 1

Apply the Gauss-Southwell method

Proposed Method

Performance Analysis

12

Increase of ๐’“ ๐Ÿ

Computation time of = ๐’ช ฮ” ร— #iters.Max. out-deg โ‡ง

๐’Š

๐’—

๐’–

๐’˜

+๐›ผ๐‘Ÿ๐‘–(๐œˆโˆ’1)

deg

+๐›ผ๐‘Ÿ๐‘–(๐œˆโˆ’1)

deg+๐›ผ

๐‘Ÿ๐‘–(๐œˆโˆ’1)

degHow small?

Proposed Method

Performance Analysis: Any Change

13

๐‘ฎ(๐’• โˆ’ ๐Ÿ) ๐‘ฎ(๐’•)

Consider any change including full construction

increase of ๐‘Ÿ 1 โ‰ค ๐Ÿ๐œถ

Same as static computation

Proposed Method

Performance Analysis: Random Edge Insertion

14

Monte-Carlo[Bahmani-Chowdhury. VLDBโ€™10]

๐„ # updated seg. = ๐‘‚ ๐‘… log๐‘š

๐‘… = ฮฉ(1/๐œ–2) is total # seg.

Our method

๐„ increase of ๐‘Ÿ 1 โ‰ค ๐Ÿ๐œถ/๐’•

A single-edge is randomly inserted for each time

๐‘ฎ(๐ŸŽ)

๐ธ 0 = โˆ… Insert an edge ๐ธ ๐‘š = ๐‘š

โ€ฆ ๐‘ฎ(๐’Ž)๐‘ฎ(๐Ÿ)

Lemma 3 in [Bahmani-Chowdhury. VLDBโ€™10]

Proposed Method

Performance Analysis: Results

15

Our result for random edge insertion (Prop. 6 in the paper)

If ๐‘š edges are randomly and sequentially inserted,

expected total #iter. is ๐“ž ๐ฅ๐จ๐ ๐’Ž/๐

โ‡จ expected total time is ๐“ž(๐’Ž+ ๐šซ ๐ฅ๐จ๐ ๐’Ž/๐)

Our result for any change (Prop. 5 in the paper)

#iter. for any change is amortized ๐“ž(๐Ÿ/๐)

โ‡จ Time is amortized ๐“ž(๐šซ/๐)

Experiments

Setting: Single-edge Insertion

Parameter settings

๐›ผ = 0.85

๐‘ has 100 non-zero elements

๐œ– = 10โˆ’9

16

๐‘ฎ(๐ŸŽ) ๐‘ฎ(๐Ÿ) ๐‘ฎ(๐Ÿ๐ŸŽ๐Ÿ“)

Except the last

100,000 edgesAdd an edge The whole graph

โ€ฆ

chronologicallyor

randomly

Experiments

Efficiency Comparison: Time for an Edge Insertionweb-Google

[SNAP]

๐‘‰ =1M

๐ธ =5M

Wikipedia[KONECT]

๐‘‰ = 2M

๐ธ =40M

twitter-2010[LAW]

๐‘‰ = 142M

๐ธ =1,500M

uk-2007-05[LAW]

๐‘‰ = 105M

๐ธ =3,700M

This work 7 ฮผs 77 ฮผs 29,383 ฮผs 2 ฮผs

Aggregation/Disaggregation[Chien et al. โ€™04]

320 ฮผs 40,336 ฮผs >100,000 ฮผs >100,000 ฮผs

Monte-Carlo[Bahmani et al. โ€™10]

444 ฮผs 9,196 ฮผs >100,000 ฮผs >100,000 ฮผs

Warm start power method 80,994 ฮผs >100,000 ฮผs >100,000 ฮผs >100,000 ฮผs

From scratch power method >100,000 ฮผs >100,000 ฮผs >100,000 ฮผs >100,000 ฮผs

17Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory

[KONECT] The Koblenz Network Collection http://konect.uni-koblenz.de/networks/

[LAW] Laboratory for Web Algorithmics http://law.di.unimi.it/datasets.php

[SNAP] Stanford Large Network Dataset Collection http://snap.stanford.edu/data/

Experiments

Accuracy Comparison: Transition of Ave. ๐ฟ1 Error

18

This work

Aggregation/Disaggregation [Chien et al.โ€™04]

Monte-Carlo [Bahmani et al.โ€™10]

Warm start (power method)

From scratch (power method)

Wikipedia [KONECT]

๐‘‰ =2M ๐ธ =40M

Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory

10-4

10-8

10-10

10-6

10-12

10-14

soc-Epinions1 [SNAP]

๐‘‰ =76K ๐ธ =509K

10-4

10-8

10-10

10-6

10-12

10-14

Comparable (~10-9) to naive methods

Dataset [Source] ๐‘ฝ ๐‘ฌ Max. Out-deg ๐šซ Ave. Time Ave. #Iter.

wiki-Talk [SNAP] 2M 5M 100,022 589.6 ฮผs 2.3

web-Google [SNAP] 1M 5M 3,444 7.2 ฮผs 22.6

as-Skitter [SNAP] 2M 11M 35,387 288.4 ฮผs 0.8

FlickrTime [KONECT] 2M 33M 26,367 95.3 ฮผs 16.2

WikipediaTime [KONECT] 2M 40M 6,975 76.8 ฮผs 46.0

soc-LiveJournal1 [SNAP] 5M 68M 20,292 17.9 ฮผs 7.6

twitter-2010 [LAW] 42M 1,500M 2,997,469 29,382.8 ฮผs 0.7

uk-2007-05 [LAW] 105M 3,700M 15,402 2.3 ฮผs 0.0

Experiments

Evaluation: Time & #Iter. for a Single-edge Insertion

[KONECT] The Koblenz Network Collection http://konect.uni-koblenz.de/networks/

[LAW] Laboratory for Web Algorithmics http://law.di.unimi.it/datasets.php

[SNAP] Stanford Large Network Dataset Collection http://snap.stanford.edu/data/

Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory 19

Experiments

Evaluation: Relationship between ๐‘ฌ & #Iter.

20

๐’ช ๐ธ โˆ’1

We plotted for 15 datasets

Matches our theoretical result

๐ธ

Ave. #iter. for

edge insertion

Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory

Dataset [Source] ๐‘ฝ ๐‘ฌ Max. Out-deg ๐šซ Ave. Time Ave. #Iter.

wiki-Talk [SNAP] 2M 5M 100,022 589.6 ฮผs 2.3

web-Google [SNAP] 1M 5M 3,444 7.2 ฮผs 22.6

as-Skitter [SNAP] 2M 11M 35,387 288.4 ฮผs 0.8

FlickrTime [KONECT] 2M 33M 26,367 95.3 ฮผs 16.2

WikipediaTime [KONECT] 2M 40M 6,975 76.8 ฮผs 46.0

soc-LiveJournal1 [SNAP] 5M 68M 20,292 17.9 ฮผs 7.6

twitter-2010 [LAW] 42M 1,500M 2,997,469 29,382.8 ฮผs 0.7

uk-2007-05 [LAW] 105M 3,700M 15,402 2.3 ฮผs 0.0

[KONECT] The Koblenz Network Collection http://konect.uni-koblenz.de/networks/

[LAW] Laboratory for Web Algorithmics http://law.di.unimi.it/datasets.php

[SNAP] Stanford Large Network Dataset Collection http://snap.stanford.edu/data/

Experiments

Evaluation: Time & #Iter. for a Single-edge Insertion

What

Happens !?21Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory

Experiments

Discussion: What is the Difference?twitter-2010 ๐‘ข, ๐‘ฃ says โ€œ๐‘ฃ follows ๐‘ขโ€

uk-2007-05 ๐‘ข, ๐‘ฃ says โ€œPage ๐‘ข links to ๐‘ฃโ€

๐’Œ

# vertices s.t.

out-deg โ‰ฅ ๐’Œ

Celebrities cause the performance degradation !!22

Summary

Proposed an efficient & accurate method for

Personalized PageRank tracking in evolving networks

23

Experimentally

Scales to a graph w/ 3.7B edges

Further Speed-up based on our observation

Handle dangling nodes

Theoretically

Ave. ๐“ž(๐’Ž+ ๐šซ ๐ฅ๐จ๐ ๐’Ž/๐)for ๐’Ž edge insertions

Future Work


Recommended