
PageRank

The web can be seen as a directed graph in which the arrows are links. In our example:

Page 1 links to Page 2 and Page 3

Page 2 links to Page 1

Page 3 links to Page 2


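The graph can be encoded in an adjacency matrix $H = [h_{ij}]$, where $h_{ij} = 1$ if Page $i$ links to Page $j$ and $h_{ij} = 0$ otherwise. For the three-page example:

$$H = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Let $w_i$ be the "importance" of Page $i$.

PageRank adopts the following condition: the importance of a page is distributed (uniformly) to the linked pages.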


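In our example:

$$w_1 = w_2, \qquad w_2 = \frac{w_1}{2} + w_3, \qquad w_3 = \frac{w_1}{2}$$

In general, given $H = [h_{ij}]_{i,j}$ of size $N \times N$:

$$\forall i, \quad d_i = \sum_{j=1}^{N} h_{ij}$$

is the number of pages linked from Page $i$, and

$$\forall j, \quad w_j = \sum_{i=1}^{N} \frac{h_{ij}}{d_i}\, w_i$$

that is, the importance of Page $j$ is set according to the condition stated above.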
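For the three-page example the system is small enough to solve by hand. The following worked step is added here as an illustration; the name $\alpha$ for the free scalar is a choice made for this sketch.

```latex
\begin{aligned}
w_3 &= \tfrac{1}{2}\, w_1, \\
w_2 &= \tfrac{1}{2}\, w_1 + w_3 = w_1, \\
w   &= \alpha\, (2,\ 2,\ 1)^T, \qquad \alpha \in \mathbb{R}.
\end{aligned}
```

The remaining equation $w_1 = w_2$ is then automatically satisfied: Pages 1 and 2 are equally important, and Page 3 has half of their importance.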


In matrix form:

$$\forall i, \quad d_i = \sum_{j=1}^{N} h_{ij} \quad \Longleftrightarrow \quad d = H \mathbf{1}, \qquad \mathbf{1} = (1, \dots, 1)^T$$

$$\forall j, \quad w_j = \sum_{i=1}^{N} \frac{h_{ij}}{d_i}\, w_i \quad \Longleftrightarrow \quad w^T = w^T D^{-1} H, \qquad D = \operatorname{diag}(d)$$
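As a quick numerical check of the matrix form, the following MATLAB sketch builds $H$ for the three-page example and verifies that $w = (2, 2, 1)^T / 5$ satisfies $w^T = w^T D^{-1} H$. The variable names and the normalization of $w$ to unit sum are choices made here for illustration, not part of the notes.

```matlab
% Three-page example: page 1 -> 2,3; page 2 -> 1; page 3 -> 2
H = [0 1 1;
     1 0 0;
     0 1 0];
d = sum(H, 2);                  % d(i) = number of pages linked from page i
D = diag(d);                    % D = diag(d); invertible here, no d(i) is 0
w = [2; 2; 1] / 5;              % candidate importance vector (sums to 1)
residual = w' - w' * (D \ H);   % w^T - w^T D^{-1} H
disp(norm(residual));           % zero up to rounding
```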


But this initial version has some issues, and we need to modify it.

The first issue is: what happens if $\exists\, i : d_i = 0$?


Dealing with dangling nodes ($d_i = 0$)

This is the case of a page $i$ that has no outgoing links; such a page is called a dangling node.

Dangling nodes correspond to rows of $H$ that have only zero entries. The PageRank idea is to replace such rows with rows full of ones.

[Figure: example graph with four pages, with its matrix $H$, the modified matrix $\hat{H}$ and the vector $u$]

Meaning: if a page has no outgoing links, then its importance is equally distributed among all pages.

In matrix form:

$$\hat{H} = H + u\, \mathbf{1}^T$$

where $u$ has ones in the positions of the dangling nodes and zeros elsewhere.


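We can then rewrite the problem as follows:

$u$ = vector of dangling nodes

$$\hat{H} = H + u\, \mathbf{1}^T$$

$$\hat{D} = \operatorname{diag}(\hat{d}), \qquad \text{where } \hat{d} = \hat{H} \mathbf{1} = d + N u$$

PageRank version 2

Find $w \in \mathbb{R}^N$ such that $w \neq 0$ and

$$w^T = w^T M, \qquad M = \hat{D}^{-1} \hat{H}$$

that is, $w$ is a left eigenvector of $M$ associated to the eigenvalue 1.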
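The construction of $\hat{H}$, $\hat{d}$ and $M$ can be sketched in MATLAB as follows. The four-page graph used here is hypothetical (it is not claimed to be the one drawn in the notes), and the variable names `Hhat` and `dhat` are assumptions of this sketch.

```matlab
% Hypothetical four-page graph: 1 -> 2,3; 2 -> 1; 3 -> 2,4; page 4 is dangling
H = [0 1 1 0;
     1 0 0 0;
     0 1 0 1;
     0 0 0 0];
N = size(H, 1);
d = sum(H, 2);                 % out-degrees; d(4) = 0
u = double(d == 0);            % ones in the positions of the dangling nodes
Hhat = H + u * ones(1, N);     % dangling rows replaced by rows of ones
dhat = Hhat * ones(N, 1);      % row sums of Hhat
assert(isequal(dhat, d + N * u));
M = diag(dhat) \ Hhat;         % M = Dhat^{-1} * Hhat
disp(M * ones(N, 1));          % every row of M sums to 1
```

The last line illustrates numerically that $M \mathbf{1} = \mathbf{1}$, that is, 1 is an eigenvalue of $M$ with right eigenvector $\mathbf{1}$.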


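What about well-posedness of our previous PageRank problem?

Existence of a solution.

Theorem. A non-null solution $w$ exists.

Proof. The thesis

$$\exists\, w \in \mathbb{R}^N : \quad w \neq 0 \ \text{ and } \ w^T = w^T M$$

is equivalent to

$$\exists\, w \in \mathbb{R}^N : \quad w \neq 0 \ \text{ and } \ w^T (M - \mathrm{Id}) = 0$$

which is equivalent to

$$\exists\, w \in \mathbb{R}^N : \quad w \neq 0 \ \text{ and } \ (M^T - \mathrm{Id})\, w = 0$$

which is then equivalent to $\det(M^T - \mathrm{Id}) = 0$.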


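Since the determinant of a matrix is equal to the determinant of its transpose, the problem is equivalent to

$$\det(M - \mathrm{Id}) = 0$$

This is finally equivalent to the existence of a vector $v \neq 0$ such that $(M - \mathrm{Id})\, v = 0$.

This happens for $v = \mathbf{1} = (1, \dots, 1)^T$. Indeed, recalling that $M = \hat{D}^{-1} \hat{H}$, we have

$$\hat{D}^{-1} \hat{H} \mathbf{1} = \hat{D}^{-1} \hat{d} = \mathbf{1}, \qquad \text{then} \qquad (\hat{D}^{-1} \hat{H} - \mathrm{Id})\, \mathbf{1} = 0 \qquad \blacksquare$$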


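Exercise: reasoning as in the previous proof, prove that the "left eigenvalues" are the same as the "right eigenvalues".

The left eigenvalues of a matrix $A \in \mathbb{R}^{N \times N}$ are the $\lambda \in \mathbb{R}$ such that $\exists\, w \in \mathbb{R}^N$, $w \neq 0$, with $w^T A = \lambda w^T$.

The right eigenvalues of a matrix $A \in \mathbb{R}^{N \times N}$ are the $\lambda \in \mathbb{R}$ such that $\exists\, v \in \mathbb{R}^N$, $v \neq 0$, with $A v = \lambda v$.

NOTE: The PageRank problem is equivalent to finding a left eigenvector of $M = \hat{D}^{-1} \hat{H}$.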


Uniqueness of the solution.

The solution $w$ is not unique: if $w$ is a solution, any nonzero scalar multiple of $w$ is also a solution.

The question is then: is the solution unique up to scalar multiplication? That is: if $w$ is a solution and $\tilde{w}$ is a solution, can we say that there is an $\alpha \in \mathbb{R}$ such that $w = \alpha \tilde{w}$?


The answer to the previous question can be found in the Perron-Frobenius theorem.

Roughly speaking, sufficient conditions for a unique solution (up to scalar multiples) $w \neq 0$ of $w^T = w^T A$ are:

- $A$ is irreducible

- $A$ has strictly positive elements

But our matrix $M$ does not fulfill such conditions. Then, we further modify the problem.


If $M = \hat{D}^{-1} \hat{H}$ is the matrix of the original PageRank problem, we now introduce the modified matrix $A$ such that

$$A = \gamma M + (1 - \gamma)\, \mathbf{1} v^T$$

where:

- $\gamma$ is a parameter in $[0, 1)$, e.g. $\gamma = 0.85$

- $v \in \mathbb{R}^N$ is given, with $v_i \geq 0$ for all $i$ and $\mathbf{1}^T v = 1$

PageRank version 3

The problem becomes to find $w \neq 0$ such that $w^T = w^T A$.

Indeed, $A$ fulfils the conditions of the P.F. theorem, so the solution $w$ exists and is unique (up to scalar multiples), and one can also prove that in such a case it can be taken with $w_i > 0$ for all $i$.
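A minimal MATLAB sketch of the modified matrix, for the three-page example of these notes. The uniform choice of $v$ is an assumption made here for illustration, and $\gamma = 0.85$ is the value suggested above.

```matlab
% Three-page example; it has no dangling nodes, so Hhat = H and dhat = d
H = [0 1 1; 1 0 0; 0 1 0];
N = size(H, 1);
M = diag(sum(H, 2)) \ H;          % M = D^{-1} * H
gamma = 0.85;                     % example value from the notes
v = ones(N, 1) / N;               % assumed uniform v: v >= 0 and sum(v) = 1
A = gamma * M + (1 - gamma) * ones(N, 1) * v';
disp(min(A(:)));                  % smallest entry of A, strictly positive
disp(A * ones(N, 1));             % rows of A still sum to 1
```

With this choice every zero entry of $M$ becomes $(1 - \gamma)/N$ in $A$, so $A$ has strictly positive elements while its rows still sum to 1.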


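What is the interpretation of the modified PageRank problem of finding $0 \neq w \in \mathbb{R}^N$ such that

$$w^T = w^T A, \qquad \text{with } A = \gamma M + (1 - \gamma)\, \mathbf{1} v^T \ ?$$

Answer: the importance of the pages is given in part by the previous idea (the term $\gamma\, w^T M$) and in part according to $v^T$ (this is the term $(1 - \gamma)(w^T \mathbf{1})\, v^T$).

A common choice is $v = \frac{1}{N} \mathbf{1}$, for which every entry of $A$ is at least $(1 - \gamma)/N > 0$.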
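The two contributions can be made explicit by expanding $w^T A$. The normalization $w^T \mathbf{1} = 1$ used in the second line is an assumption added here for readability; it is always possible because solutions are defined up to a scalar multiple.

```latex
\begin{aligned}
w^T A &= \gamma\, w^T M + (1 - \gamma)\, (w^T \mathbf{1})\, v^T, \\
w^T   &= \gamma\, w^T M + (1 - \gamma)\, v^T
        \qquad \text{if } w^T \mathbf{1} = 1 .
\end{aligned}
```

So a fraction $\gamma$ of the importance flows along the links as before, while the remaining fraction $1 - \gamma$ is redistributed among the pages according to $v$.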


Computation of the solution.

Under the assumptions of the P.F. theorem, it can be shown that 1 is the eigenvalue of $A$ with maximum absolute value.

The matrix $A$ is not symmetric, but the eigenvector can be efficiently computed by a "power method".

The following algorithm takes $H$, $v$, $\gamma$, maxit and returns $y$, an approximation of $w$, after maxit power iterations.


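function y = pagerank(H, v, gamma, maxit)
% Returns y, an approximation of w, after maxit power iterations.
% v is a column vector with nonnegative entries that sum to 1.
N = size(H, 1);
% look for the dangling nodes and construct Hhat
d = sum(H, 2);
u = double(d == 0);
Hhat = H + u * ones(1, N);
% construct dhat and Dhat = diag(dhat)
dhat = d + N * u;
Dhat = diag(dhat);
% construct A = gamma * Dhat^{-1} * Hhat + (1 - gamma) * 1 * v^T
A = gamma * (Dhat \ Hhat) + (1 - gamma) * ones(N, 1) * v';
y = rand(N, 1);
y = y / sum(y);    % since y has positive entries, norm(y, 1) = sum(y)
for it = 1:maxit
    y = A' * y;    % y^T = y^T * A, written for a column vector
    % y = y / norm(y, 1) is not needed, since y already has norm(y, 1) = 1
end
end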
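To see the iteration at work, the following MATLAB sketch runs it inline on the three-page example, so that it can be executed on its own without the function above. The deterministic starting vector, the uniform $v$ and the choice of 100 iterations are assumptions of this sketch.

```matlab
% Power iterations on the three-page example (no dangling nodes)
H = [0 1 1; 1 0 0; 0 1 0];
N = size(H, 1);
gamma = 0.85;
v = ones(N, 1) / N;                    % assumed uniform v
A = gamma * (diag(sum(H, 2)) \ H) + (1 - gamma) * ones(N, 1) * v';
y = ones(N, 1) / N;                    % fixed start instead of rand, for reproducibility
for it = 1:100
    y = A' * y;                        % y^T = y^T * A
end
residual = norm(y' - y' * A, 1);       % how far y is from being a left eigenvector
fprintf('sum(y) = %g, residual = %g\n', sum(y), residual);
```

The printed sum stays at 1 up to rounding because the rows of $A$ sum to 1, which is why no renormalization is done inside the loop.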


There are some details that make the code more efficient (in MATLAB, but not only):

- store H as a sparse matrix

- find the dangling nodes:

d = H * ones(N, 1);   (equivalent to d = sum(H, 2))

dangling = (d == 0);

- represent $\hat{d}$ and $\hat{H}$ as:

dhat = d + N * dangling;   ← not expensive to compute

Hhat = H + dangling * ones(1, N);   ← may be expensive to construct, but this is not needed

We only need to compute (for $x^T = y^T \hat{D}^{-1}$)

$$x^T \hat{H} = x^T H + (x^T\, \mathrm{dangling})\, \mathbf{1}^T$$

where $\mathbf{1}^T$ is ones(1, N).
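These remarks can be combined into a power iteration that never forms $\hat{H}$ or $A$. The MATLAB sketch below is one possible arrangement, not the notes' own code: the four-page sparse graph is hypothetical, $v$ is assumed uniform, and the dense matrix built at the end serves only as a check for small $N$.

```matlab
% Hypothetical sparse graph: 1 -> 2,3; 2 -> 1; 3 -> 2,4; page 4 is dangling
H = sparse([1 1 2 3 3], [2 3 1 2 4], 1, 4, 4);
N = size(H, 1);
gamma = 0.85;
v = ones(N, 1) / N;                      % assumed uniform v
d = full(sum(H, 2));
dangling = double(d == 0);
dhat = d + N * dangling;                 % cheap: Hhat is never formed
y = ones(N, 1) / N;
for it = 1:100
    x = y ./ dhat;                       % x^T = y^T * Dhat^{-1}
    xH = (H' * x)' + (x' * dangling) * ones(1, N);    % x^T * Hhat
    y = (gamma * xH + (1 - gamma) * sum(y) * v')';    % y^T * A
end
% Dense check, only sensible for small N
Hhat = full(H) + dangling * ones(1, N);
A = gamma * (diag(dhat) \ Hhat) + (1 - gamma) * ones(N, 1) * v';
disp(norm(y' - y' * A, 1));
```

Each iteration costs one sparse matrix-vector product plus a few vector operations, instead of a product with the full $N \times N$ matrix $A$.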


References

These notes follow:

Dario A. Bini, "Il problema del PageRank"

An interesting presentation of PageRank from the probabilistic point of view, with many applications, is:

David F. Gleich, "PageRank beyond the Web", https://arxiv.org/abs/1407.5107
