Efficient PageRank Tracking
in Evolving Networks
Naoto Ohsaka (UTokyo)
Takanori Maehara (Shizuoka University)
Ken-ichi Kawarabayashi (NII)
KDDโ15: Social and Graphs
ERATO Kawarabayashi Large Graph Project
Background
Personalized PageRank Tracking0.21
2
๐ = ๐ ๐ = ๐ ๐ = ๐
0.25 0.23
ApplicationsSearch engine [Brin-Page. โ98]
Spam detection[Chung-Toyoda-Kitsuregawa. โ09, โ10]
User recommendation[Gupta-Goel-Lin-Sharma-Wang-Zadeh. โ13]
Size Speed
WWW 60T 600K pages / s
Twitter 300M 5K tweets / s
Google+ 700M +19 users / s
Growth of real networks
Background
Existing Work for PageRank Tracking
3
๐ random
edge insertions
ScalabilityUpdate time < 0.1s
Error โ 10-9
Aggregation/Disaggregation[Chien et al. โ04]
๐ช ๐ ๐บ log 1/๐ 68M edges
Monte-Carlo[Bahmani et al. โ10]
๐ช(๐ + log๐ /๐๐) 68M edges
Power methodnaive method
๐ช ๐๐ log 1/๐ 11M edges
Our Contribution
๐ random
edge insertions
ScalabilityUpdate time < 0.1s
Error โ 10-9
This workAve.
๐ ๐+ ๐ซ ๐ฅ๐จ๐ ๐/๐3,700M edges
Aggregation/Disaggregation[Chien et al. โ04]
๐ช ๐ ๐บ log 1/๐ 68M edges
Monte-Carlo[Bahmani et al. โ10]
๐ช(๐ + log๐ /๐๐) 68M edges
Power methodnaive method
๐ช ๐๐ log 1/๐ 11M edges
Propose a simple, efficient, & accurate method for
Personalized PageRank tracking in evolving networks
โฉ Max. out-deg
4
Preliminaries
Definition of Personalized PageRank[Brin-Page. Comput. Networks ISDN Syst.โ98] [Jeh-Widom. WWWโ03]
Linear equation
A solution ๐ฅ of ๐ = ๐ถ๐ท๐ + ๐ โ ๐ถ ๐
Random walk modeling web browsing
Moves to a random out-neighbor w.p. ๐ผJumps to a random vertex w.p. 1 โ ๐ผ(biased by ๐)
Transition matrix
Preference vector
Decay factor = 0.85
Random-walk
interpretation
Stationary
distribution
๐
๐๐
๐
5
Preliminaries
Computing PageRank in Static Graphs
Solving eq. ๐ฅ = ๐ผ๐๐ฅ + 1 โ ๐ผ ๐
Power method ๐ฅ(๐) = ๐ผ๐๐ฅ ๐โ1 + 1 โ ๐ผ ๐
Gauss-Seidel [Del Corso-Gullรญ-Romani. Internet Math.โ05]
LU/QR factorization [Fujiwara-Nakatsuji-Onizuka-Kitsuregawa. VLDBโ12]
Krylov subspace method [Maehara-Akiba-Iwata-Kawarabayashi. VLDBโ14]
Estimating the frequency ๐ฅ๐ฃ of visiting ๐ฃ
Monte-Carlo simulation
6
Preliminaries
Tracking PageRank in Evolving Graphs
Aggregation/disaggregation[Chien-Dwork-Kumar-Simon-Sivakumar. Internet Math.โ04]
Apply the power method to a subgraph
Monte-Carlo[Bahmani-Chowdhury. VLDBโ10]
Maintain & update random-walk segments
7
Still large
๐ ๐/๐๐ Too many
๐ฎ(๐)(๐ฝ, ๐ฌ(๐))
Proposed Method
Problem Setting
8
Given at time ๐ก:Edges inserted to๏ผdeleted fromGiven ๐บ 0 , ๐ผ, ๐ at time 0
Problem at time 0Compute approx.
PPR ๐ฅ 0 of ๐บ(0)
Problem at time ๐กCompute approx.
PPR ๐ฅ ๐ก of ๐บ(๐ก)
๐ ๐ โ ๐โ ๐ โ < ๐
๐ฎ ๐(๐ฝ, ๐ฌ(๐))
โฆ ๐ฎ(๐ โ ๐)(๐ฝ, ๐ฌ(๐ โ ๐))
Solving eq. ๐ฅ ๐ก = ๐ผ๐ ๐ก ๐ฅ ๐ก + 1 โ ๐ผ ๐1. ๐ฅ(๐ก โ 1) is a GOOD initial solution for ๐ฅ(๐ก)2. Improving approximate solution locally
Proposed Method
The Idea
9
We use the Gauss-Southwell method
a.k.a. Bookmark coloring algorithm [Berkhin. Internet Math.โ06]
Local algorithm[Spielman-Teng. SIAM J. Comput.โ13] [Andersen-Chung-Lang. FOCSโ06]
[Southwell. โ40,โ46]
error < ๐
error โฅ ๐Insert
๐ = 1,2,3, โฆ
๐ โa vertex with largest ๐๐๐โ1
If ๐๐๐โ1
< ๐ terminate
Update ๐ฅ(๐โ1) & ๐(๐โ1) locally so that ๐๐๐= 0
๐th solution ๐ฅ ๐
๐th residual ๐(๐) = 1 โ ๐ผ ๐ โ ๐ผ โ ๐ผ๐ ๐ฅ ๐
Goal: ๐ ๐ โ ๐
Proposed Method
Gauss-Southwell Method [Southwell. โ40,โ46]
10
# iterations
Stops within ๐ 0
1โ๐ผ ๐iter.
โ ๐ ๐ โค ๐ ๐โ1 โ 1 โ ๐ผ ๐
๐
๐
๐
๐
+๐ผ๐๐(๐โ1)
deg
+๐ผ๐๐(๐โ1)
deg+๐ผ
๐๐(๐โ1)
deg
Accuracy
๐โ โ ๐ ๐โโค
๐
๐โ๐ถ
โ ๐๐(๐)
< ๐
Proposed Method
Overview
11
At time ๐ก:
๐ฅ ๐ก (0) = ๐ฅ ๐ก โ 1
๐ ๐ก 0 = ๐ ๐ก โ 1 + ๐ผ ๐ ๐ก โ ๐ ๐ก โ 1 ๐ฅ ๐ก โ 1
Apply the Gauss-Southwell method
depends on 1
At time ๐ก:
๐ฅ ๐ก (0) = ๐ฅ ๐ก โ 1
๐ ๐ก 0 = ๐ ๐ก โ 1 + ๐ผ ๐ ๐ก โ ๐ ๐ก โ 1 ๐ฅ ๐ก โ 1
Apply the Gauss-Southwell method
Proposed Method
Performance Analysis
12
Increase of ๐ ๐
Computation time of = ๐ช ฮ ร #iters.Max. out-deg โง
๐
๐
๐
๐
+๐ผ๐๐(๐โ1)
deg
+๐ผ๐๐(๐โ1)
deg+๐ผ
๐๐(๐โ1)
degHow small?
Proposed Method
Performance Analysis: Any Change
13
๐ฎ(๐ โ ๐) ๐ฎ(๐)
Consider any change including full construction
increase of ๐ 1 โค ๐๐ถ
Same as static computation
Proposed Method
Performance Analysis: Random Edge Insertion
14
Monte-Carlo[Bahmani-Chowdhury. VLDBโ10]
๐ # updated seg. = ๐ ๐ log๐
๐ = ฮฉ(1/๐2) is total # seg.
Our method
๐ increase of ๐ 1 โค ๐๐ถ/๐
A single-edge is randomly inserted for each time
๐ฎ(๐)
๐ธ 0 = โ Insert an edge ๐ธ ๐ = ๐
โฆ ๐ฎ(๐)๐ฎ(๐)
Lemma 3 in [Bahmani-Chowdhury. VLDBโ10]
Proposed Method
Performance Analysis: Results
15
Our result for random edge insertion (Prop. 6 in the paper)
If ๐ edges are randomly and sequentially inserted,
expected total #iter. is ๐ ๐ฅ๐จ๐ ๐/๐
โจ expected total time is ๐(๐+ ๐ซ ๐ฅ๐จ๐ ๐/๐)
Our result for any change (Prop. 5 in the paper)
#iter. for any change is amortized ๐(๐/๐)
โจ Time is amortized ๐(๐ซ/๐)
Experiments
Setting: Single-edge Insertion
Parameter settings
๐ผ = 0.85
๐ has 100 non-zero elements
๐ = 10โ9
16
๐ฎ(๐) ๐ฎ(๐) ๐ฎ(๐๐๐)
Except the last
100,000 edgesAdd an edge The whole graph
โฆ
chronologicallyor
randomly
Experiments
Efficiency Comparison: Time for an Edge Insertionweb-Google
[SNAP]
๐ =1M
๐ธ =5M
Wikipedia[KONECT]
๐ = 2M
๐ธ =40M
twitter-2010[LAW]
๐ = 142M
๐ธ =1,500M
uk-2007-05[LAW]
๐ = 105M
๐ธ =3,700M
This work 7 ฮผs 77 ฮผs 29,383 ฮผs 2 ฮผs
Aggregation/Disaggregation[Chien et al. โ04]
320 ฮผs 40,336 ฮผs >100,000 ฮผs >100,000 ฮผs
Monte-Carlo[Bahmani et al. โ10]
444 ฮผs 9,196 ฮผs >100,000 ฮผs >100,000 ฮผs
Warm start power method 80,994 ฮผs >100,000 ฮผs >100,000 ฮผs >100,000 ฮผs
From scratch power method >100,000 ฮผs >100,000 ฮผs >100,000 ฮผs >100,000 ฮผs
17Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory
[KONECT] The Koblenz Network Collection http://konect.uni-koblenz.de/networks/
[LAW] Laboratory for Web Algorithmics http://law.di.unimi.it/datasets.php
[SNAP] Stanford Large Network Dataset Collection http://snap.stanford.edu/data/
Experiments
Accuracy Comparison: Transition of Ave. ๐ฟ1 Error
18
This work
Aggregation/Disaggregation [Chien et al.โ04]
Monte-Carlo [Bahmani et al.โ10]
Warm start (power method)
From scratch (power method)
Wikipedia [KONECT]
๐ =2M ๐ธ =40M
Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory
10-4
10-8
10-10
10-6
10-12
10-14
soc-Epinions1 [SNAP]
๐ =76K ๐ธ =509K
10-4
10-8
10-10
10-6
10-12
10-14
Comparable (~10-9) to naive methods
Dataset [Source] ๐ฝ ๐ฌ Max. Out-deg ๐ซ Ave. Time Ave. #Iter.
wiki-Talk [SNAP] 2M 5M 100,022 589.6 ฮผs 2.3
web-Google [SNAP] 1M 5M 3,444 7.2 ฮผs 22.6
as-Skitter [SNAP] 2M 11M 35,387 288.4 ฮผs 0.8
FlickrTime [KONECT] 2M 33M 26,367 95.3 ฮผs 16.2
WikipediaTime [KONECT] 2M 40M 6,975 76.8 ฮผs 46.0
soc-LiveJournal1 [SNAP] 5M 68M 20,292 17.9 ฮผs 7.6
twitter-2010 [LAW] 42M 1,500M 2,997,469 29,382.8 ฮผs 0.7
uk-2007-05 [LAW] 105M 3,700M 15,402 2.3 ฮผs 0.0
Experiments
Evaluation: Time & #Iter. for a Single-edge Insertion
[KONECT] The Koblenz Network Collection http://konect.uni-koblenz.de/networks/
[LAW] Laboratory for Web Algorithmics http://law.di.unimi.it/datasets.php
[SNAP] Stanford Large Network Dataset Collection http://snap.stanford.edu/data/
Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory 19
Experiments
Evaluation: Relationship between ๐ฌ & #Iter.
20
๐ช ๐ธ โ1
We plotted for 15 datasets
Matches our theoretical result
๐ธ
Ave. #iter. for
edge insertion
Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory
Dataset [Source] ๐ฝ ๐ฌ Max. Out-deg ๐ซ Ave. Time Ave. #Iter.
wiki-Talk [SNAP] 2M 5M 100,022 589.6 ฮผs 2.3
web-Google [SNAP] 1M 5M 3,444 7.2 ฮผs 22.6
as-Skitter [SNAP] 2M 11M 35,387 288.4 ฮผs 0.8
FlickrTime [KONECT] 2M 33M 26,367 95.3 ฮผs 16.2
WikipediaTime [KONECT] 2M 40M 6,975 76.8 ฮผs 46.0
soc-LiveJournal1 [SNAP] 5M 68M 20,292 17.9 ฮผs 7.6
twitter-2010 [LAW] 42M 1,500M 2,997,469 29,382.8 ฮผs 0.7
uk-2007-05 [LAW] 105M 3,700M 15,402 2.3 ฮผs 0.0
[KONECT] The Koblenz Network Collection http://konect.uni-koblenz.de/networks/
[LAW] Laboratory for Web Algorithmics http://law.di.unimi.it/datasets.php
[SNAP] Stanford Large Network Dataset Collection http://snap.stanford.edu/data/
Experiments
Evaluation: Time & #Iter. for a Single-edge Insertion
What
Happens !?21Environment: Intel Xeon E5-2690 2.90GHz CPU with 256GB memory
Experiments
Discussion: What is the Difference?twitter-2010 ๐ข, ๐ฃ says โ๐ฃ follows ๐ขโ
uk-2007-05 ๐ข, ๐ฃ says โPage ๐ข links to ๐ฃโ
๐
# vertices s.t.
out-deg โฅ ๐
Celebrities cause the performance degradation !!22
Summary
Proposed an efficient & accurate method for
Personalized PageRank tracking in evolving networks
23
Experimentally
Scales to a graph w/ 3.7B edges
Further Speed-up based on our observation
Handle dangling nodes
Theoretically
Ave. ๐(๐+ ๐ซ ๐ฅ๐จ๐ ๐/๐)for ๐ edge insertions
Future Work