Asymptotic behaviour of ranking algorithms in directed random networks
Nelly Litvak
University of Twente, The Netherlands
joint work with Mariana Olvera-Cravioto and Ningyuan Chen
Workshop on Extremal Graph Theory, Moscow, 06-06-2014
Power law of PageRank
Pandurangan, Raghavan, Upfal, 2002.
[ Nelly Litvak, SOR group ] 2/25
Power laws in complex networks
- Power laws: Internet, WWW, social networks, biological networks, etc.
- Degree of a node = number of (in-/out-) links
- $p_k$ = fraction of nodes with degree at least $k$
- Power law: $p_k \approx \text{const} \cdot k^{-\alpha}$, $\alpha > 0$
- Power law is the model for high variability: some nodes (hubs) have extremely many connections
- $\log p_k = \log(\text{const}) - \alpha \log k$
- Straight line on the log-log scale
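The straight-line picture can be illustrated with a minimal Python sketch (our own illustration, not from the talk): it draws a hypothetical power-law degree sample and recovers $\alpha$ as minus the slope of a least-squares line through $(\log k, \log p_k)$. The sampler `pareto_degrees` and the cutoff `k_max` are assumptions; a naive log-log fit is a rough diagnostic, not a reliable estimator for real network data.

```python
import math
import random


def pareto_degrees(n, alpha, seed=0):
    """Hypothetical degree sample: floor(X) with P(X > x) = x**(-alpha) for x >= 1,
    so the fraction with degree at least k is exactly k**(-alpha) in expectation."""
    rng = random.Random(seed)
    # Inverse-transform sampling: X = (1 - U) ** (-1 / alpha), U uniform on [0, 1).
    return [int((1.0 - rng.random()) ** (-1.0 / alpha)) for _ in range(n)]


def fit_tail_exponent(degrees, k_max):
    """Least-squares slope of log p_k against log k for k = 1..k_max,
    where p_k is the fraction of nodes with degree at least k."""
    n = len(degrees)
    xs, ys = [], []
    for k in range(1, k_max + 1):
        p_k = sum(1 for d in degrees if d >= k) / n
        if p_k > 0:
            xs.append(math.log(k))
            ys.append(math.log(p_k))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return -slope  # log p_k = log(const) - alpha * log k


degrees = pareto_degrees(100_000, alpha=2.0)
alpha_hat = fit_tail_exponent(degrees, k_max=30)
print(f"estimated alpha: {alpha_hat:.2f}")
```

The cutoff keeps only values of $k$ with enough nodes above them; for larger $k$ the empirical $p_k$ becomes noisy and bends the fitted line.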
Regular variation
- $X$ is a regularly varying random variable with index $\alpha$ if
$$P(X > x) = L(x)\, x^{-\alpha}, \quad x > 0$$
- $L(x)$ is slowly varying: for every $t > 0$, $L(tx)/L(x) \to 1$ as $x \to \infty$
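A quick numerical look at the definition of slow variation, as a minimal Python sketch: $L(x) = \log x$ is slowly varying, so $L(2x)/L(x)$ creeps towards 1, while $x^{0.1}$ is not, because its ratio stays at $2^{0.1}$ for every $x$. The helper names `ratio` and `power` are our own.

```python
import math


def ratio(L, x, t):
    """Return L(t * x) / L(x); for a slowly varying L this tends to 1 as x -> infinity."""
    return L(t * x) / L(x)


def power(x):
    """x ** 0.1 is NOT slowly varying: L(t x) / L(x) = t ** 0.1 for every x."""
    return x ** 0.1


for x in (1e2, 1e6, 1e12):
    # math.log is slowly varying: its ratio decreases towards 1 as x grows
    print(f"x = {x:.0e}: log ratio = {ratio(math.log, x, 2.0):.4f}, "
          f"power ratio = {ratio(power, x, 2.0):.4f}")
```

The convergence for $\log x$ is slow (the ratio is $1 + \log 2/\log x$), which is one reason slowly varying factors are hard to detect in data.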
Google PageRank
- S. Brin, L. Page, The anatomy of a large-scale hypertextual Web search engine (1998)
- PageRank $R_i$ of page $i = 1, \dots, n$ is defined as the stationary distribution of a random walk with jumps:
$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$
- $d_j$ = number of out-links of page $j$
- $c \in (0, 1)$, originally 0.85; $1 - c$ is the probability of a random jump
- $b_i$: probability of jumping to page $i$; originally $b_i = 1/n$
- Personalized PageRank: $b_i \neq 1/n$
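The defining equation can be solved by simple fixed-point (power) iteration. A minimal Python sketch, assuming every page has at least one out-link (no dangling pages), so that the ranks sum to 1; the 4-page graph is hypothetical, and passing a non-uniform `b` gives personalized PageRank.

```python
def pagerank(out_links, c=0.85, b=None, tol=1e-12, max_iter=1000):
    """Iterate R_i = sum_{j -> i} (c / d_j) R_j + (1 - c) b_i until convergence.

    out_links[j] lists the pages that page j links to. This sketch assumes every
    page has at least one out-link (no dangling pages)."""
    n = len(out_links)
    if b is None:
        b = [1.0 / n] * n  # original choice b_i = 1/n
    R = list(b)
    for _ in range(max_iter):
        new_R = [(1 - c) * b_i for b_i in b]
        for j, targets in enumerate(out_links):
            share = c * R[j] / len(targets)  # (c / d_j) * R_j goes to each target
            for i in targets:
                new_R[i] += share
        converged = max(abs(x - y) for x, y in zip(new_R, R)) < tol
        R = new_R
        if converged:
            break
    return R


# Hypothetical 4-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
ranks = pagerank([[1, 2], [2], [0], [2]])
print([round(r, 4) for r in ranks])
```

Page 3 has no in-links, so its rank is just the jump term $(1 - c)/n$; the iteration converges geometrically with rate $c$.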
Examples of applications
$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$
- Topic-sensitive search (Haveliwala, 2002);
- Spam detection (Gyongyi et al., 2004);
- Finding related entities (Chakrabarti, 2007);
- Link prediction (Liben-Nowell and Kleinberg, 2003; Voevodski, Teng, Xia, 2009);
- Finding local cuts (Andersen, Chung, Lang, 2006);
- Graph clustering (Tsiatas, Chung, 2010);
- Person name disambiguation (Smirnova, Avrachenkov, Trousse, 2010);
- Finding most influential people in Wikipedia (Shepelyansky et al., 2010, 2013)
Stochastic model for PageRank
- Rescale: $R_i \to n R_i$, $b_i \to n b_i$:
$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$
- Stochastic equation:
$$R \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j + c\, p_0 + (1 - c)\, B$$
- $N$: in-degree of a randomly chosen page
- $D$: out-degree of a page that links to the randomly chosen page
- $p_0$: fraction of pages with out-degree zero
- $R_j$ is distributed as $R$; $N$, $D$, $R_j$ are independent; $N$ and $B$ can be dependent
- We can denote $Q = c\, p_0 + (1 - c)\, B$, $C_j = c / D_j$.
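To get a feel for the stochastic equation, one can approximate the law of $R$ numerically. The sketch below uses population dynamics, a standard sampling heuristic for distributional fixed-point equations that we swap in here (it is not a method from the talk): a pool of samples stands in for the law of $R$, and each round rebuilds the pool from $R = \sum_{j=1}^{N} C_j R_j + Q$. All distributional choices are hypothetical: $N \sim \text{Poisson}(2)$, $D \equiv 2$ (so $C_j = c/2$), $p_0 = 0$ and $B \equiv 1$, so $Q = 1 - c$; $N$ and $B$ are drawn independently, although the model allows dependence. Taking expectations gives $E[R] = E[Q]/(1 - E[N]E[C]) = 0.15/(1 - 0.85) = 1$, consistent with the rescaling $R_i \to nR_i$.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def population_dynamics(sample_n, sample_c, sample_q, pool_size=10_000, iterations=25, seed=1):
    """Approximate the law of R in R = sum_{j=1}^N C_j R_j + Q by a pool of samples.

    Each round builds a new pool: draw N, C_1..C_N and Q, and take the R_j as
    independent picks from the previous pool (which stands in for the law of R)."""
    rng = random.Random(seed)
    pool = [1.0] * pool_size
    for _ in range(iterations):
        new_pool = []
        for _ in range(pool_size):
            total = sample_q(rng)
            for _ in range(sample_n(rng)):
                total += sample_c(rng) * rng.choice(pool)
            new_pool.append(total)
        pool = new_pool
    return pool


c = 0.85
pool = population_dynamics(
    sample_n=lambda rng: sample_poisson(rng, 2.0),  # hypothetical in-degree law N ~ Poisson(2)
    sample_c=lambda rng: c / 2,                     # hypothetical D = 2 always, so C_j = c / D_j
    sample_q=lambda rng: (1 - c) * 1.0,             # Q = c*p0 + (1-c)*B with p0 = 0, B = 1
)
# Should be close to E[R] = 0.15 / (1 - 0.85) = 1
print(sum(pool) / len(pool))
```

The pool also gives an empirical picture of the tail of $R$, although a pool of fixed size cannot resolve probabilities much smaller than `1 / pool_size`.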
Results for stochastic recursion
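$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$
Theorem (Volkovich & L, 2010)
If $P(B > x) = o(P(N > x))$, then the following are equivalent:
- $P(N > x) \sim x^{-\alpha_N} L_N(x)$ as $x \to \infty$,
- $P(R > x) \sim c_N\, x^{-\alpha_N} L_N(x)$ as $x \to \infty$,

where $c_N = (E(c/D))^{\alpha_N} \left[1 - E(N)\, E(C^{\alpha_N})\right]^{-1}$.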
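The constant $c_N$ is straightforward to evaluate once the law of $D$ is fixed. A minimal Python sketch that evaluates the formula as written on the slide, assuming a finitely supported out-degree law passed as a dictionary `d_pmf` (a hypothetical input format) and $C = c/D$ as on the previous slide; the formula only makes sense when $E(N)\,E(C^{\alpha_N}) < 1$.

```python
def tail_constant(c, alpha_n, mean_n, d_pmf):
    """Evaluate c_N = (E[c/D])**alpha_N / (1 - E[N] * E[C**alpha_N]) with C = c/D.

    d_pmf maps each possible out-degree d >= 1 to P(D = d) (a hypothetical input)."""
    e_c = sum(p * c / d for d, p in d_pmf.items())
    e_c_alpha = sum(p * (c / d) ** alpha_n for d, p in d_pmf.items())
    denom = 1.0 - mean_n * e_c_alpha
    if denom <= 0:
        raise ValueError("need E[N] * E[C**alpha_N] < 1")
    return e_c ** alpha_n / denom


# Hypothetical example: c = 0.85, alpha_N = 2.5, E[N] = 2, D = 2 with probability 1.
print(tail_constant(0.85, 2.5, 2.0, {2: 1.0}))
```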
Power Law behaviour of PageRank
- Data for Web, Wikipedia and Preferential Attachment graph
Results for stochastic recursion
$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$
- A series of papers (Olvera-Cravioto & Jelenkovic, 2010, 2012; Olvera-Cravioto, 2012) analyzed the recursion in detail using sample-path large deviations and implicit renewal theory.
- The tail behaviour of $R$ is obtained under the most general assumptions on the $C_j$'s.
- $R$ can be heavy-tailed even when $N$ is light-tailed.
Recursion on a graph
- So far we have, in fact, considered the recursion on a tree.
- Will similar results hold on a particular graph structure?
- Some graphs are tree-like (Thorny Branching Process, TBP).
Directed configuration model
- Directed graph on $n$ nodes $V = \{v_1, \dots, v_n\}$.
- In-degree and out-degree:
  - $m_i$ = in-degree of node $v_i$ = number of edges pointing to $v_i$;
  - $d_i$ = out-degree of node $v_i$ = number of edges pointing from $v_i$.
- $(\mathbf{m}, \mathbf{d}) = (\{m_i\}, \{d_i\})$ is called a bi-degree-sequence.
- Target distributions:
  - In-degree: $F = (f_k : k = 0, 1, 2, \dots)$, and
  - Out-degree: $G = (g_k : k = 0, 1, 2, \dots)$.
Assumptions on the target distributions
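- Suppose further that for some $\alpha, \beta > 2$,
$$\bar F(x) = \sum_{k > x} f_k \le x^{-\alpha} L_F(x) \quad \text{and} \quad \bar G(x) = \sum_{k > x} g_k \le x^{-\beta} L_G(x)$$
for all $x > 0$, where $L_F(\cdot)$ and $L_G(\cdot)$ are slowly varying.
- Assume both $F$ and $G$ have finite variance.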
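The bi-degree sequence (Chen & Olvera-Cravioto, 2012)
1. Fix $0 < \delta_0 < 1 - \theta$, $\theta = \max\{\alpha^{-1}, \beta^{-1}, 1/2\}$.
2. Sample $\{\gamma_1, \dots, \gamma_n\}$ i.i.d. from $F$; let $\Gamma_n = \sum_{i=1}^{n} \gamma_i$.
3. Sample $\{\xi_1, \dots, \xi_n\}$ i.i.d. from $G$; let $\Xi_n = \sum_{i=1}^{n} \xi_i$.
4. Let $\Delta_n = \Gamma_n - \Xi_n$. If $|\Delta_n| \le n^{\theta + \delta_0}$, go to step 5; otherwise go to step 2.
5. Choose randomly $|\Delta_n|$ nodes $S = \{i_1, i_2, \dots, i_{|\Delta_n|}\}$ without replacement and let
$$N_i = \gamma_i + \tau_i, \quad D_i = \xi_i + \chi_i, \quad i = 1, 2, \dots, n,$$
where
$$\chi_i = \begin{cases} 1 & \text{if } \Delta_n > 0 \text{ and } i \in S,\\ 0 & \text{otherwise,} \end{cases} \qquad \tau_i = \begin{cases} 1 & \text{if } \Delta_n < 0 \text{ and } i \in S,\\ 0 & \text{otherwise.} \end{cases}$$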
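Steps 1 to 5 translate almost line by line into Python. A minimal sketch under stated assumptions: the target law `pareto_degree` (floor of a Pareto variable) is hypothetical, `max_tries` is our own guard against endless resampling, and the rejection step only has a realistic chance of accepting when $F$ and $G$ have equal means, which is why the example uses the same law for both.

```python
import random


def pareto_degree(rng, alpha):
    """Hypothetical target law: floor(X) with P(X > x) = x**(-alpha), x >= 1."""
    return int((1.0 - rng.random()) ** (-1.0 / alpha))


def bi_degree_sequence(n, sample_f, sample_g, alpha, beta, delta0=0.25, seed=0, max_tries=1000):
    """Steps 1-5: i.i.d. in/out-degrees, resampled until the totals nearly agree,
    then |Delta_n| randomly chosen nodes get one extra stub."""
    theta = max(1.0 / alpha, 1.0 / beta, 0.5)             # step 1
    if not 0 < delta0 < 1 - theta:
        raise ValueError("need 0 < delta0 < 1 - theta")
    rng = random.Random(seed)
    for _ in range(max_tries):
        gamma = [sample_f(rng) for _ in range(n)]         # step 2: in-degrees from F
        xi = [sample_g(rng) for _ in range(n)]            # step 3: out-degrees from G
        delta = sum(gamma) - sum(xi)                      # step 4: Delta_n = Gamma_n - Xi_n
        if abs(delta) <= n ** (theta + delta0):
            break
    else:
        raise RuntimeError("no sample accepted within max_tries")
    chosen = set(rng.sample(range(n), abs(delta)))        # step 5: S, without replacement
    N = [g + (1 if delta < 0 and i in chosen else 0) for i, g in enumerate(gamma)]  # + tau_i
    D = [x + (1 if delta > 0 and i in chosen else 0) for i, x in enumerate(xi)]     # + chi_i
    return N, D


N, D = bi_degree_sequence(
    1000,
    sample_f=lambda rng: pareto_degree(rng, 2.5),
    sample_g=lambda rng: pareto_degree(rng, 2.5),
    alpha=2.5,
    beta=2.5,
)
print(sum(N), sum(D))  # equal by construction
```

Because $\theta + \delta_0 < 1$, the number of corrected nodes $|\Delta_n|$ is always at most $n$, so the sampling without replacement in step 5 is well defined.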
Constructing the graph
- Using the bi-degree-sequence $(N, D)$ for the in- and out-degrees:
  - assign to each node $v_i$ a number $m_i$ of inbound stubs and a number $d_i$ of outbound stubs;
  - pair outbound stubs to inbound stubs to form directed edges by matching to each inbound stub an outbound stub chosen uniformly at random from the set of unpaired outbound stubs;
  - proceed in the same way for all remaining unpaired inbound stubs, i.e., choose uniformly from the set of unpaired outbound stubs and draw the corresponding directed edge.
- The result is a multigraph (i.e., it may contain self-loops and multiple edges in the same direction) on nodes $\{v_1, \dots, v_n\}$.
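The pairing procedure can be sketched in Python as follows. The function name and the (tail, head) edge format are our own choices; each inbound stub in turn picks an outbound stub uniformly from those still unpaired, removed by swap-and-pop so that each draw takes constant time. The small bi-degree-sequence in the example is hypothetical.

```python
import random
from collections import Counter


def directed_configuration_model(N, D, seed=0):
    """Pair inbound and outbound stubs; return directed edges (tail, head),
    possibly including self-loops and multiple edges."""
    if sum(N) != sum(D):
        raise ValueError("total in-degree must equal total out-degree")
    rng = random.Random(seed)
    inbound = [v for v, m in enumerate(N) for _ in range(m)]    # one entry per inbound stub
    outbound = [v for v, d in enumerate(D) for _ in range(d)]   # unpaired outbound stubs
    edges = []
    for head in inbound:
        # Pick an unpaired outbound stub uniformly at random, then remove it (swap-and-pop).
        k = rng.randrange(len(outbound))
        outbound[k], outbound[-1] = outbound[-1], outbound[k]
        tail = outbound.pop()
        edges.append((tail, head))
    return edges


edges = directed_configuration_model(N=[1, 2, 0, 1], D=[2, 0, 1, 1])
print(edges)
# The realized degrees always match the prescribed bi-degree-sequence:
print(Counter(head for _, head in edges), Counter(tail for tail, _ in edges))
```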
PageRank in directed configuration model
- $C_i = \zeta_i / D_i$, where $\{\zeta_i\}$ is a sequence of i.i.d. random variables independent of $(N, D)$ ($\zeta_i = c$ in the classical case)
- $M = M(n) \in \mathbb{R}^{n \times n}$ is related to the adjacency matrix of the graph:
$$M_{i,j} = \begin{cases} s_{ij} C_i, & \text{if there are } s_{ij} \text{ edges from } i \text{ to } j,\\ 0, & \text{otherwise.} \end{cases}$$
- $Q \in \mathbb{R}^n$ is a personalization vector
- We are interested in one coordinate, $R_1$, of the vector $R \in \mathbb{R}^n$ defined by
$$R = RM + Q$$
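A small Python sketch of this setup, under hypothetical choices: $\zeta_i = c = 0.85$ for all $i$ (the classical case), $Q_i = 1 - c$ for all $i$, and a 4-node edge list. It builds $M$ and solves the row-vector equation $R = RM + Q$ by fixed-point iteration, which converges because every row of $M$ sums to $\zeta_i < 1$ (or to 0 for a node without out-edges).

```python
def build_m(n, edges, zeta, D):
    """M[i][j] = s_ij * C_i with C_i = zeta_i / D_i, where s_ij counts edges i -> j."""
    M = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        M[i][j] += zeta[i] / D[i]  # each parallel edge i -> j adds one more C_i
    return M


def solve_rank(M, Q, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for the row vector R = R M + Q."""
    n = len(Q)
    R = list(Q)
    for _ in range(max_iter):
        new_R = [Q[j] + sum(R[i] * M[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_R, R)) < tol:
            return new_R
        R = new_R
    raise RuntimeError("iteration did not converge")


c = 0.85
edges = [(0, 1), (0, 3), (2, 1), (3, 0)]   # hypothetical graph, (tail, head) pairs
D = [2, 0, 1, 1]                           # out-degrees implied by the edge list
zeta = [c] * 4                             # classical case: zeta_i = c
Q = [1 - c] * 4                            # hypothetical personalization vector
M = build_m(4, edges, zeta, D)
R = solve_rank(M, Q)
print("R_1 =", R[0])  # the coordinate of interest (node 1 = index 0)
```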
Matrix iterations
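$$\begin{aligned}
R^{(n,0)} &= B,\\
R^{(n,1)} &= R^{(n,0)} M + Q = BM + Q,\\
R^{(n,2)} &= R^{(n,1)} M + Q = BM^2 + QM + Q,\\
R^{(n,3)} &= R^{(n,2)} M + Q = BM^3 + QM^2 + QM + Q,\\
&\;\;\vdots\\
R^{(n,k)} &= \sum_{i=0}^{k-1} Q M^i + B M^k, \quad k \ge 1.
\end{aligned}$$
We are interested in analyzing $P(R_1^{(n,\infty)} > x)$, $x \to \infty$.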
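The recursion and its closed form can be checked against each other numerically. In this Python sketch $M$, $B$ and $Q$ form a hypothetical 3-node example (rows of $M$ sum to at most 0.85); `iterate` applies $R^{(n,k)} = R^{(n,k-1)}M + Q$ step by step, while `closed_form` evaluates $\sum_{i=0}^{k-1} QM^i + BM^k$ directly.

```python
def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def iterate(B, Q, M, k):
    """R^(n,k) via R^(n,j) = R^(n,j-1) M + Q, starting from R^(n,0) = B."""
    R = list(B)
    for _ in range(k):
        R = [x + q for x, q in zip(vec_mat(R, M), Q)]
    return R


def closed_form(B, Q, M, k):
    """R^(n,k) = sum_{i=0}^{k-1} Q M^i + B M^k."""
    total = [0.0] * len(Q)
    q_mi = list(Q)   # holds Q M^i, starting from i = 0
    b_mk = list(B)   # holds B M^i, reaching B M^k after k steps
    for _ in range(k):
        total = [t + x for t, x in zip(total, q_mi)]
        q_mi = vec_mat(q_mi, M)
        b_mk = vec_mat(b_mk, M)
    return [t + x for t, x in zip(total, b_mk)]


M = [[0.0, 0.425, 0.425],   # hypothetical: node 0 has two out-edges, C_0 = 0.85 / 2
     [0.0, 0.0, 0.85],      # node 1 -> node 2
     [0.85, 0.0, 0.0]]      # node 2 -> node 0
B = [1.0, 1.0, 1.0]
Q = [0.15, 0.15, 0.15]
for k in (1, 5, 50):
    print(k, iterate(B, Q, M, k), closed_form(B, Q, M, k))
```

Since the rows of $M$ sum to at most $c$, the term $BM^k$ shrinks like $c^k$, which is why $R^{(n,k)}$ approaches $R^{(n,\infty)}$ geometrically fast.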
Idea of the analysis
- $\hat R_1^{(n,k)}$: PageRank on a perfect branching tree
- $R$: solution of the equation
$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$
- We will try to prove the following: for any fixed $t \in \mathbb{R}$ and a randomly chosen node $v$,
$$P(R_1^{(n,\infty)} \le t) \approx P(R_1^{(n,k)} \le t) \approx P(\hat R_1^{(n,k)} \le t) \approx P(R \le t)$$
for large enough $n, k$.
Idea of the analysis
If we prove that for some $k = k(n) \to \infty$ and any $\varepsilon > 0$,
(Matrix iterations) $P\big(|R_1^{(n,\infty)} - R_1^{(n,k)}| > \varepsilon\big) \to 0$, (1)
(Coupling with branching tree) $P\big(|R_1^{(n,k)} - \hat R_1^{(n,k)}| > \varepsilon\big) \to 0$, (2)
(Limiting solution) $P\big(|\hat R_1^{(n,k)} - R| > \varepsilon\big) \to 0$, (3)
as $n \to \infty$, then it will follow, by Slutsky's lemma, that
$$R_1^{(n,\infty)} \Rightarrow R$$
as $n \to \infty$, where $\Rightarrow$ denotes convergence in distribution.
Coupling with branching tree
- We start with a random node (node 1) and explore its neighbours, labeling the stubs that we have already seen
- $\tau$: the number of generations of the WBP completed before the coupling breaks
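A minimal Python sketch of this exploration, under our own reading of the slides: the PageRank of node 1 depends on the pages that link to it, so we follow inbound stubs backwards, pair each with a uniformly chosen unpaired outbound stub (as in the construction of the graph), and stop as soon as the drawn outbound stub belongs to a node already discovered. The function name `coupling_generations` and the 2-regular example are hypothetical illustrations, not the authors' code.

```python
import random


def coupling_generations(N, D, start=0, seed=0):
    """Explore in-neighbours of `start` generation by generation, pairing each
    inbound stub with a uniformly chosen unpaired outbound stub on the fly.
    Returns the number of generations completed before an outbound stub of an
    already-discovered node is drawn (our stand-in for tau)."""
    if sum(N) != sum(D):
        raise ValueError("total in-degree must equal total out-degree")
    rng = random.Random(seed)
    outbound = [v for v, d in enumerate(D) for _ in range(d)]   # unpaired outbound stubs
    seen = {start}
    frontier = [start]
    generations = 0
    while frontier:
        next_frontier = []
        for v in frontier:
            for _ in range(N[v]):                # one draw per inbound stub of v
                k = rng.randrange(len(outbound))
                outbound[k], outbound[-1] = outbound[-1], outbound[k]
                u = outbound.pop()               # u -> v becomes an edge
                if u in seen:
                    return generations           # coupling with the tree breaks here
                seen.add(u)
                next_frontier.append(u)
        frontier = next_frontier
        generations += 1
    return generations                           # exploration died out without a collision


n = 10_000
tau = coupling_generations(N=[2] * n, D=[2] * n)
print(tau)
```

In the 2-regular example generation $g$ contains $2^g$ nodes as long as no collision occurs, so the exploration can never go much beyond $\log_2 n$ generations.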
Coupling with branching tree
Lemma
Let $\tau$ be the number of generations of the TBP that we are able to complete before we draw the first stub that has already been observed. Then, for any $0 < \varepsilon < 1/2$ and $a = (1/2 - \varepsilon)/\log m$, where $m = E[N]$,
$$P(\tau \le a \log n) = O\big(n^{-\varepsilon/2}\big) \quad \text{as } n \to \infty.$$
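The depth $a \log n$ in the Lemma is simple arithmetic. A small Python check with hypothetical values $n = 10^6$, $m = 2$, $\varepsilon = 0.1$. Since $m^{a \log n} = n^{1/2 - \varepsilon}$, a branching tree with mean offspring $m$ has, in expectation, about $n^{1/2-\varepsilon}$ nodes in generation $a \log n$; heuristically (a birthday-paradox argument, our gloss), this stays below the $\sqrt{n}$ scale at which repeated stubs start to appear.

```python
import math


def coupling_depth(n, m, eps):
    """a * log n with a = (1/2 - eps) / log m, the generation count in the Lemma."""
    if not (0 < eps < 0.5 and m > 1):
        raise ValueError("need 0 < eps < 1/2 and m > 1")
    a = (0.5 - eps) / math.log(m)
    return a * math.log(n)


depth = coupling_depth(n=10**6, m=2.0, eps=0.1)
# m ** depth equals n ** (1/2 - eps), up to floating-point rounding
print(depth, 2.0 ** depth, (10**6) ** 0.4)
```

The base of the logarithm does not matter, since only the ratio $\log n / \log m$ enters.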
Combining with matrix iteration
- $P\big(|R_1^{(n,\infty)} - R_1^{(n,k)}| > c^k K n\big) = o(1)$
- We need $c^k n = o(1)$ for some $k < \tau$
- Combining this with the coupling Lemma, we get the main result
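One back-of-the-envelope way to see how these bullets combine (our reading, not a step shown on the slides): take $k \approx a \log n$ from the Lemma, assume $m = E[N] > 1$, and ignore the constant $K$ and integer rounding.

$$\begin{aligned}
k &= a \log n, \qquad a = \frac{1/2 - \varepsilon}{\log m},\\
c^k n &= \exp\big(k \log c + \log n\big) = n^{\,1 + (1/2 - \varepsilon)\log c / \log m},\\
c^k n \to 0 \;&\iff\; \frac{(1/2 - \varepsilon)\log c}{\log m} < -1 \;\iff\; c < m^{-1/(1/2 - \varepsilon)} = m^{-2/(1 - 2\varepsilon)}.
\end{aligned}$$

Since $m^{-2/(1-2\varepsilon)} \uparrow m^{-2}$ as $\varepsilon \downarrow 0$, every $c < 1/(E[N])^2$ leaves room for some admissible $\varepsilon$, while the Lemma gives $k < \tau$ with probability tending to 1. This is consistent with the condition on $c$ in the main theorem.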
Main result
- Let $n$ be the number of nodes in the random graph, and let $N$ and $D$ be random variables having the in-degree and effective out-degree distributions, respectively.
- Let $R^{(n)}$ be the rank vector computed on the graph with $n$ nodes.
- Theorem (Chen, L, Olvera-Cravioto, 2014): Suppose $0 < c < 1/(E[N])^2$. Then
$$R_1^{(n)} \Rightarrow R, \quad n \to \infty,$$
where $R$ is the solution to the fixed-point equation
$$R \stackrel{d}{=} q + c \sum_{i=1}^{N} \frac{R_i}{D_i}.$$
Work in progress
- Relaxing conditions on $c$: better bounds for $\tau$ and the matrix iterations
- So far, we need the finite variance assumption
- The result probably will not hold for all $c \in (0, 1)$.
- The PageRank must converge for all $c < 1$. Will we obtain the same power law but with a different factor?
Thank you!