A dynamical system for PageRank with
time-dependent teleportation
David F. Gleich, Computer Science, Purdue University
Paper http://arxiv.org/abs/1211.4266 Code https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im
Ryan A. Rossi, Computer Science, Purdue University
1 David Gleich · Purdue ANL Seminar
1. Perspectives on PageRank
2. PageRank as a dynamical system and time-dependent teleportation
3. Predicting using PageRank
4. Applications to the power-grid?
Given a graph, what are the most important nodes?
The random surfer model
At a node:
1. follow edges with probability α;
2. do something else with probability (1 − α).
Google's PageRank is one possible answer.
PageRank by Google
[Figure: example directed graph on nodes 1 to 6]
The model:
1. follow edges uniformly with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely.
The important pages are the places we are most likely to find the random surfer.
The most important page on the web.
PageRank details
[Figure: example directed graph on nodes 1 to 6]
\[
P = \begin{bmatrix}
1/6 & 1/2 & 0 & 0 & 0 & 0 \\
1/6 & 0 & 0 & 1/3 & 0 & 0 \\
1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\
1/6 & 0 & 1/2 & 0 & 0 & 0 \\
1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\
1/6 & 0 & 0 & 0 & 1 & 0
\end{bmatrix},
\qquad P_{ij} \ge 0, \quad e^T P = e^T
\]
"Jump" vector: $v = [\tfrac{1}{n} \; \dots \; \tfrac{1}{n}]^T$, with $v_i \ge 0$, $e^T v = 1$. v is the jump vector.
Markov chain: $[\alpha P + (1 - \alpha) v e^T] x = x$; unique $x \Rightarrow x_j \ge 0$, $e^T x = 1$.
Linear system: $(I - \alpha P) x = (1 - \alpha) v$.
Ignored: dangling nodes patched back to v; algorithms later.
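The Markov-chain form and the linear-system form of PageRank differ by one step that the slide leaves implicit. A short derivation, using only the normalization $e^T x = 1$ stated on the slide:

```latex
\[
[\alpha P + (1 - \alpha) v e^T] x = x
\;\Longrightarrow\;
\alpha P x + (1 - \alpha) v \,(e^T x) = x
\;\Longrightarrow\;
(I - \alpha P) x = (1 - \alpha) v
\quad \text{since } e^T x = 1 .
\]
```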
My definition of PageRank
A PageRank vector x is the solution of the linear system
\[ (I - \alpha P) x = (1 - \alpha) v \]
where P is a column stochastic matrix, 0 ≤ α < 1, and v is a probability vector.
Just three ingredients!
P: a column stochastic matrix
v: $v_i \ge 0$, $e^T v = 1$
α: usually 0.5 to 0.99
This definition applies to a remarkable variety of problems
1. GeneRank 2. ProteinRank 3. FoodRank 4. SportsRank 5. HostRank 6. TrustRank 7. BadRank 8. IsoRank 9. SimRank 10. ObjectRank 11. ItemRank 12. ArticleRank 13. BookRank 14. FutureRank
15. TimedPageRank 16. SocialPageRank 17. DiffusionRank 18. ImpressionRank 19. TweetRank 20. TwitterRank 21. ReversePageRank 22. PageTrust 23. PopRank 24. CiteRank 25. FactRank 26. InvestorRank 27. ImageRank 28. VisualRank
29. QueryRank 30. BookmarkRan 31. StoryRank 32. PerturbationRank 33. ChemicalRank 34. RoadRank 35. PaperRank 36. Etc…
Richardson is a robust, simple algorithm to compute PageRank
Given α, P, v:
\[ (I - \alpha P) x = (1 - \alpha) v \quad \Rightarrow \quad \text{Richardson:} \quad x^{(k+1)} = \alpha P x^{(k)} + (1 - \alpha) v \]
\[ \text{error} = \| x^{(k)} - x \|_1 \le 2 \alpha^k \]
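A minimal sketch of the Richardson iteration in plain Python, assuming the 6-node column-stochastic example matrix P from the PageRank details slide, a uniform jump vector v, and α = 0.85. The function name `richardson_pagerank`, the stopping tolerance, and the choice to start from v are illustrative assumptions, not taken from the talk.

```python
# Column-stochastic matrix P of the 6-node example (each column sums to 1).
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def richardson_pagerank(P, alpha, v, tol=1e-12, max_iter=10_000):
    """Iterate x <- alpha*P*x + (1-alpha)*v until the 1-norm update is below tol.

    Returns the approximate PageRank vector and the number of iterations used.
    """
    x = list(v)  # starting point: the jump vector (an arbitrary choice)
    for k in range(1, max_iter + 1):
        Px = matvec(P, x)
        x_new = [alpha * p + (1 - alpha) * vi for p, vi in zip(Px, v)]
        change = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, k
    return x, max_iter


alpha = 0.85
n = len(P)
v = [1 / n] * n
x, iters = richardson_pagerank(P, alpha, v)

# Residual of the linear system (I - alpha P) x = (1 - alpha) v in the 1-norm.
residual = sum(
    abs(xi - alpha * pxi - (1 - alpha) * vi)
    for xi, pxi, vi in zip(x, matvec(P, x), v)
)
print(iters, [round(xi, 4) for xi in x], residual)
```

Because P is column stochastic and v sums to one, every iterate keeps summing to one, which is why no renormalization appears in the loop.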
The teleportation distribution v models where surfers "restart".
What if this changes with time?
First idea
Re-solve PageRank when v changes
+ PageRank is fast to solve!
+ Easy to understand
– Need another model to incorporate the past
– PageRank isn't that fast to solve.
Is there anything better?
Let's look at how PageRank evolves with iterations
\[
\begin{aligned}
\Delta x^{(k)} &= x^{(k+1)} - x^{(k)} \\
&= \alpha P x^{(k)} + (1 - \alpha) v - x^{(k)} \\
&= (1 - \alpha) v - (I - \alpha P) x^{(k)}
\end{aligned}
\]
\[ x'(t) = (1 - \alpha) v - (I - \alpha P) x(t) \]
PageRank is the steady-state solution of the ODE.
A dynamical system for time-dependent teleportation
\[ x'(t) = (1 - \alpha) v(t) - (I - \alpha P) x(t) \]
+ Easy to integrate
+ Easy to understand
+ Possible to treat analytically!
– Need to "model time" (not dimensionless)
– Still useful to have a data assimilation model
Need a self-stabilized ODE
We use a standard RK integrator (ode45 in Matlab).
We use the following formulation to maintain x(t) as a probability distribution:
\[ x'(t) = (1 - \alpha) v(t) - (\gamma I - \alpha P) x(t), \qquad \gamma = (1 - \alpha) e^T v(t) + \alpha e^T x(t) \]
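A minimal sketch of integrating the self-stabilized ODE, under stated assumptions: a fixed-step classical fourth-order Runge-Kutta method stands in for Matlab's adaptive ode45, and the 3-node matrix `P3`, the step size, the switching teleportation `v_switching`, and all function names are hypothetical. It illustrates two points from the slides: with a static v the state settles at the PageRank vector, and with a changing v(t) the state stays a probability distribution.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rhs(P, alpha, v_of_t, t, x):
    """(1-alpha) v(t) - (gamma I - alpha P) x, gamma = (1-alpha) e^T v(t) + alpha e^T x."""
    v = v_of_t(t)
    gamma = (1 - alpha) * sum(v) + alpha * sum(x)
    Px = matvec(P, x)
    return [(1 - alpha) * vi - gamma * xi + alpha * pxi
            for vi, xi, pxi in zip(v, x, Px)]


def rk4(P, alpha, v_of_t, x0, t_end, h):
    """Classical fixed-step RK4 from t = 0 to t_end (not the adaptive ode45)."""
    x, t = list(x0), 0.0
    for _ in range(round(t_end / h)):
        k1 = rhs(P, alpha, v_of_t, t, x)
        k2 = rhs(P, alpha, v_of_t, t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k1)])
        k3 = rhs(P, alpha, v_of_t, t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k2)])
        k4 = rhs(P, alpha, v_of_t, t + h, [xi + h * k for xi, k in zip(x, k3)])
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x


# Hypothetical 3-node column-stochastic matrix (each column sums to 1).
P3 = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
alpha = 0.85


def v_static(t):
    return [1 / 3, 1 / 3, 1 / 3]


def v_switching(t):
    # All teleportation mass on node 1 in even unit intervals, node 2 in odd ones.
    return [1.0, 0.0, 0.0] if int(t) % 2 == 0 else [0.0, 1.0, 0.0]


x0 = [1.0, 0.0, 0.0]
x_static = rk4(P3, alpha, v_static, x0, t_end=200.0, h=0.1)
x_dyn = rk4(P3, alpha, v_switching, x0, t_end=20.0, h=0.1)

# Steady state check: residual of (I - alpha P) x = (1 - alpha) v.
static_residual = max(
    abs((1 - alpha) * vi - xi + alpha * pxi)
    for vi, xi, pxi in zip(v_static(0.0), x_static, matvec(P3, x_static))
)
print(static_residual, sum(x_dyn))
```

A fixed step keeps the sketch short; an adaptive method would handle the jumps in a piecewise-constant v(t) more carefully.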
Where is this model realistic?
On Wikipedia, we have hourly visit data that provides a coarse measure of outside interest
Now PageRank values are time-series, not static scores
[Figure: grid of time-dependent PageRank time series (importance versus time) for the top 100 Wikipedia pages, from 1 MainPage to 100 Scotland. Callouts: the Main page panel, and the Earthquake panel with the annotation "Australian earthquake occurs!"]
Some quick theory
\[ x'(t) = (1 - \alpha) v(t) - (I - \alpha P) x(t) \]
For general v(t):
\[ x(t) = \exp[-(I - \alpha P) t]\, x(0) + (1 - \alpha) \int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v(\tau) \, d\tau . \]
For static v(t) = v:
\[ \int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v(\tau) \, d\tau = (I - \alpha P)^{-1} v - \exp[-(I - \alpha P) t]\, (I - \alpha P)^{-1} v \]
\[ x(t) = \exp[-(I - \alpha P) t]\, (x(0) - x) + x \]
where x is the original PageRank vector.
Thus we recover the original PageRank vector if interest stops changing.
Cyclical behavior in the time-dependent PageRank scores
[Figure: 4-node example graph. Plot of Dynamic PageRank versus time (0 to 20) for Pages 1 to 4; plot of Time-dependent teleportation versus time (0 to 80) for Pages 1 to 4.]
Modeling cyclical behavior
Cyclically switch between teleportation vectors $v_j$:
\[ v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left( t + (j - 1) \frac{2\pi}{k} \right) + 1 \right) \]
[Figure: Time-dependent teleportation versus time (0 to 80) for Pages 1 to 4, alternating between v1 and v2.]
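Modeling cyclical behavior
Cyclically switch between teleportation vectors $v_j$:
\[ v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left( t + (j - 1) \frac{2\pi}{k} \right) + 1 \right) \]
Then the eventual solution is
\[ x(t) = x + \operatorname{Re}\{ s \exp(\imath t) \} \]
PageRank vector with average teleportation:
\[ (I - \alpha P) x = (1 - \alpha) \frac{1}{k} V e \]
PageRank with complex teleportation:
\[ \left( I - \frac{\alpha}{1 + \imath} P \right) s = (1 - \alpha) \frac{1}{k (1 + \imath)} V \exp(\imath f) \]
where $V = [v_1, \dots, v_k]$ and $f$ is the vector of phases $f_j = (j - 1) \frac{2\pi}{k}$.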
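A minimal sketch that checks the eventual solution for cyclical teleportation numerically. It solves the real system for x and the complex system for s with a simple fixed-point iteration, then confirms that x(t) = x + Re{s exp(ıt)} satisfies x'(t) = (1 − α)v(t) − (I − αP)x(t). The 4-node matrix, the two teleportation vectors, the phase vector f_j = (j − 1)2π/k, and every function name are assumptions made for illustration; the graph used in the talk is not given in the text.

```python
import cmath
import math


def matvec(A, x):
    """Dense matrix-vector product; works for real or complex entries."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve_fixed_point(P, beta, b, iters=500):
    """Solve (I - beta P) y = b by iterating y <- beta P y + b.

    Converges for column-stochastic P whenever |beta| < 1.
    """
    y = list(b)
    for _ in range(iters):
        y = [beta * p + bi for p, bi in zip(matvec(P, y), b)]
    return y


alpha = 0.85
# Hypothetical 4-node column-stochastic matrix (each column sums to 1).
P = [
    [0.0, 0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
]
vs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # v_1, v_2 (columns of V)
k, n = len(vs), len(P)
f = [2 * math.pi * j / k for j in range(k)]  # phases f_j = (j-1) 2 pi / k

Ve = [sum(vj[i] for vj in vs) for i in range(n)]
Vexp = [sum(vj[i] * cmath.exp(1j * fj) for vj, fj in zip(vs, f)) for i in range(n)]

# PageRank with average teleportation, and with complex teleportation.
x_bar = solve_fixed_point(P, alpha, [(1 - alpha) / k * u for u in Ve])
s = solve_fixed_point(P, alpha / (1 + 1j),
                      [(1 - alpha) / (k * (1 + 1j)) * u for u in Vexp])


def v_of_t(t):
    return [sum(vj[i] * (math.cos(t + fj) + 1) for vj, fj in zip(vs, f)) / k
            for i in range(n)]


def x_of_t(t):
    return [xb + (si * cmath.exp(1j * t)).real for xb, si in zip(x_bar, s)]


def ode_residual(t):
    """Max-norm gap between x'(t) and (1-alpha) v(t) - (I - alpha P) x(t)."""
    x = x_of_t(t)
    dx = [(1j * si * cmath.exp(1j * t)).real for si in s]
    rhs = [(1 - alpha) * vi - xi + alpha * pxi
           for vi, xi, pxi in zip(v_of_t(t), x, matvec(P, x))]
    return max(abs(a - b) for a, b in zip(dx, rhs))


worst = max(ode_residual(0.1 * i) for i in range(200))
amplitude = [abs(si) for si in s]  # size of the oscillation at each node
print(worst, amplitude)
```

The modulus of each entry of s is the oscillation amplitude of that node's score, which is the quantity the complex-teleportation system makes available without integrating the ODE.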
Thus we can determine the size of the oscillation for the case of cyclical teleportation.
Is it useful? Let’s try and predict retweets on Twitter
We crawled Twitter and gathered a graph of who follows whom and how active each user is in a month.
This yields a graph and 6 vectors v.
Our goal is to predict how many tweets you'll send next month based on the current month!
First, how do we model time?
\[ v_1, \dots, v_k \;\to\; V = [v_1, \dots, v_k] \]
\[ v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1} \qquad \text{($t = 1$ is one month)} \]
Rescaling time ($t = s$ is one month):
\[ v_s(t) = V e_{\lfloor t/s \rfloor + 1} = v_{\lfloor t/s \rfloor + 1} \]
The samples $x(sj)$, $j = 0, 1, \dots$, are the same time points.
$s = \infty$ yields a recomputed PageRank at each step!
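A minimal sketch of the piecewise-constant time model and its rescaling. The function name `teleport`, the representation of V as a list of vectors, and the choice to hold the last vector once the data runs out are assumptions for illustration.

```python
import math


def teleport(V, t, s=1.0):
    """Return v_s(t) = v_{floor(t/s)+1}, with vectors numbered from 1 as on the slide.

    V is a list of teleportation vectors (the columns of the matrix V).
    Assumption: after the last month of data, the final vector is held.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    j = min(math.floor(t / s), len(V) - 1)  # 0-based index of the active vector
    return V[j]


# Three hypothetical monthly vectors on two nodes.
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# With timescale s, the vector changes at t = s, 2s, ..., so the samples
# x(s*j) line up with the same months for every s.
for s in (1, 2, 6):
    print(s, [teleport(V, s * j, s) for j in range(len(V))])
```

A larger s gives the ODE more time to relax between changes in v, which is consistent with the slide's remark that s = ∞ amounts to recomputing PageRank at each step.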
The effect of s on PageRank of one node is considerable
[Figure: PageRank x_1(t) of one node versus time for timescales s = 1, 2, 6. Gray involves just recomputing PageRank at each change. Data from Wikipedia.]
Figure 3 – The evolution of PageRank values for one node due to the dynamical teleportation. Panels: (a) timescale s (s = 1, 2, 6); (b) smoothing θ (θ = 0.1, 1, 10); (c) damping parameter α (α = 0.5, 0.85, 0.99). The horizontal axis is time [0, 20], and the vertical axis runs between [0.01, 0.014]. In figure (a), α = 0.85, and we vary the time-scale parameter (section 2.5) with no smoothing. The solid dark line corresponds to the step function of solving PageRank exactly at each change in the teleportation vector. All samples are taken from the same effective time-points as discussed in the section. In figure (b), we vary the smoothing (section 2.6) of the teleportation vectors with s = 2, and α = 0.85. In figure (c), we vary α with s = 2 and no smoothing. We used the ode45 function in Matlab, a Runge-Kutta method, to evolve the system.
2.7 Choosing the teleportation factor
Picking α even for static PageRank problems is challenging; see Gleich et al. [2010] and Constantine and Gleich [2010] for some discussion. In this manuscript, we do not perform any systematic study of the effects of α beyond Figure 3(c). This simple experiment shows one surprising feature. Common wisdom for choosing α in the static case suggests that as α approaches 1, the vector becomes more sensitive. For the dynamic teleportation setting, however, the opposite is true. Small values of α produce solutions that more closely reflect the teleportation vector – the quantity that is changing – whereas large ...
Second, can we make it smooth?
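\[ v_1, \dots, v_k \;\to\; V = [v_1, \dots, v_k], \qquad v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1} \qquad \text{($t = 1$ is one month)} \]
Forward Euler interpretation:
\[ \bar{v}(t; \theta) = \underbrace{\lambda v(t)}_{\text{new data}} + \underbrace{(1 - \lambda)\, \bar{v}(t - h; \theta)}_{\text{old data}} \]
Full ODE:
\[ \bar{v}'(t; \theta) = \theta v(t) - \theta \bar{v}(t; \theta) \]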
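A minimal sketch of the smoothing step, assuming forward Euler with step h applied to the smoothing ODE, so that each update is a weighted average of new data and old data with weight h·θ on the new data. That identification of the weight, the function name `smooth_teleport`, and the test values are assumptions for illustration; evaluating v at the start of each step is one of several reasonable conventions.

```python
import math


def smooth_teleport(v_of_t, v0, theta, h, t_end):
    """Forward Euler for vbar'(t) = theta * (v(t) - vbar(t)), starting from v0.

    Each step is vbar <- lam * v + (1 - lam) * vbar with lam = h * theta,
    i.e. an exponentially weighted average of new and old data.
    """
    lam = h * theta
    if not 0.0 <= lam <= 1.0:
        raise ValueError("need 0 <= h * theta <= 1 for a convex combination")
    vbar, t = list(v0), 0.0
    for _ in range(round(t_end / h)):
        v = v_of_t(t)
        vbar = [lam * vi + (1 - lam) * bi for vi, bi in zip(v, vbar)]
        t += h
    return vbar


# Hypothetical jump in interest: all mass moves from node 1 to node 2 at t = 0.
def v_jump(t):
    return [0.0, 1.0]


theta = 1.0
vbar = smooth_teleport(v_jump, [1.0, 0.0], theta, h=0.001, t_end=2.0)

# For constant v the ODE has the closed form v + (vbar(0) - v) * exp(-theta * t).
exact_first = math.exp(-theta * 2.0)
print(vbar, exact_first)
```

Larger θ tracks new data faster, and because each update is a convex combination of probability vectors, the smoothed vector remains a probability vector.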
The effect of θ on PageRank of one node is moderate
[Figure: PageRank x_1(t) of one node versus time for smoothing θ = 0.1, 1, 10. Data from Wikipedia.]
Only matters if there is a big jump.
Parameters of the prediction
α – PageRank modeling parameter
s – time-scale
θ – smoothing
The prediction model
Linear, one-step ahead prediction
\[ \begin{bmatrix} \bar{f}(t-1) & \bar{f}(t-2) & \cdots & \bar{f}(t-w) \end{bmatrix} b \approx p(t) \]
is evaluated using
\[ \text{sMAPE} = \frac{1}{|T|} \sum_{t=1}^{|T|} \frac{|p_t - \hat{p}_t|}{(p_t + \hat{p}_t)/2} \]
averaged over nodes.
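A minimal sketch of the sMAPE metric as written on the slide, for a single node's series. The function name `smape`, the treatment of a 0/0 term as zero error, and the sample numbers are assumptions for illustration; the averaging over nodes and the fitting of b are not shown.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error:
    mean over t of |p_t - phat_t| / ((p_t + phat_t) / 2).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(actual, predicted):
        denom = (p + q) / 2
        # Assumption: when both values are zero the term counts as zero error.
        total += abs(p - q) / denom if denom != 0 else 0.0
    return total / len(actual)


# Hypothetical monthly tweet counts and two sets of predictions.
p = [10.0, 20.0, 40.0]
p_hat_baseline = [10.0, 30.0, 40.0]
p_hat_with_pr = [10.0, 25.0, 40.0]

# Ratio below 1 means the second predictor has lower error than the baseline.
ratio = smape(p, p_hat_with_pr) / smape(p, p_hat_baseline)
print(smape(p, p_hat_baseline), smape(p, p_hat_with_pr), ratio)
```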
The results
Error ratio by smoothing θ (rows) and timescale s (columns)

Dataset   Type            θ      s = 1   s = 2   s = 6   s = ∞
TWITTER   stationary      0.01   0.635   0.929   0.913   0.996
TWITTER   stationary      0.50   0.636   0.735   0.854   0.939
TWITTER   stationary      1.00   0.522   0.562   0.710   0.963
TWITTER   non-stationary  0.01   0.461   0.841   1.001   0.992
TWITTER   non-stationary  0.50   0.261   0.608   0.585   0.929
TWITTER   non-stationary  1.00   0.137   0.605   0.617   0.918

Err Ratio = sMAPE(tweets + time-dependent PR) / sMAPE(tweets only)
If this ratio < 1, then using time-dependent PR helps.
Stationary nodes are those with small maximum change in scores.
Non-stationary nodes are those with large maximum change in scores.
We tried the same experiment with Wikipedia, but there was no meaningful change in the prediction error.
Using Granger Causality to study link relationships on Wikipedia
[Figure: grid of time series for the top 100 Wikipedia pages, from 1 MainPage to 100 Scotland, highlighting the Earthquake and Richter Mag. panels.]
Earthquake causes Richter Mag.?
Of course! We build this into the model.
But, the question is, which of these are preserved after incorporating the effects of page view data?
Using Granger Causality to find the important links on Wikipedia
Earthquake Granger causes   p-value
Seismic hazard              0.003535
Extensional tectonics       0.003033
Landslide dam               0.002406
Earthquake preparedness     0.001157
Richter magnitude scale     0.000584
Fault (geology)             0.000437
Aseismic creep              0.000419
Seismometer                 0.000284
Epicenter                   0.000020
Seismology                  0.000001
Thus, these links “fit” our model, whereas the other links on the page do not.
Application to the power grid
Prior work (Kim, Obah, 2007; Jin et al., 2010; Adolf et al., 2011; Halappanavar et al., 2012) has found that graph properties have important correlations with power-grid vulnerabilities and contingency analysis.
Each edge has a power flow that satisfies some non-linear power flow equation.
We use average daily flows to study time-dependent PageRank on the line graph of the underlying network.
Lines with high variance may be problematic?
My questions
Sample data to test this idea? Too simplistic?
Time-dependent betweenness centrality with cyclical teleportation?
Other power-grid problems where similar ideas may be able to help?