PageRank

The web can be seen as a graph with orientation (a directed graph): the nodes are the pages and the arrows are the links. For example:

- Page 1 links to Page 2 and Page 3
- Page 2 links to Page 1
- Page 3 links to Page 2

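The graph can be encoded in an adjacency matrix:

[Figure: the directed graph with nodes 1, 2, 3.]

$$
H = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
h_{ij} = \begin{cases} 1 & \text{if page } i \text{ links to page } j, \\ 0 & \text{otherwise.} \end{cases}
$$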
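As a minimal sketch in MATLAB (the variable names `links`, `N` and `H` are illustrative choices, not taken from the notes), the adjacency matrix of the three-page example can be assembled from its list of links, with row i describing the pages linked from page i:

```matlab
% Links of the three-page example, one row per link: [from, to].
links = [1 2; 1 3; 2 1; 3 2];
N = 3;

% h_ij = 1 if page i links to page j, 0 otherwise.
H = zeros(N);
for k = 1:size(links, 1)
    H(links(k, 1), links(k, 2)) = 1;
end

disp(H)
```

Reading the result row by row gives back the three statements "Page 1 links to Pages 2 and 3", "Page 2 links to Page 1", "Page 3 links to Page 2".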

Let $w_i$ be the "importance" of page $i$.

PageRank adopts the following condition: "the importance of a page is distributed (uniformly) to the linked pages".

In our example:

$$
w_1 = w_2, \qquad w_2 = \frac{w_1}{2} + w_3, \qquad w_3 = \frac{w_1}{2}.
$$
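As a quick worked check (added for illustration, not part of the original notes), the conditions of the three-page example, $w_1 = w_2$, $w_2 = w_1/2 + w_3$ and $w_3 = w_1/2$, can be solved by hand. Substituting the third condition into the second gives back the first, so one unknown stays free:

```latex
\[
\begin{aligned}
w_3 &= \tfrac{1}{2} w_1, \\
w_2 &= \tfrac{1}{2} w_1 + w_3 = \tfrac{1}{2} w_1 + \tfrac{1}{2} w_1 = w_1, \\
w   &= \alpha\, (2,\ 2,\ 1)^T \quad \text{for any } \alpha \neq 0 .
\end{aligned}
\]
```

Pages 1 and 2 are therefore equally important and page 3 has half their importance; the factor $\alpha$ only fixes the scale.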

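In general, given $H = [h_{ij}]_{i,j}$ of size $N \times N$:

$$
\forall i, \quad d_i = \sum_{j=1}^{N} h_{ij} \qquad \text{(number of pages linked from page } i\text{)},
$$

$$
\forall j, \quad w_j = \sum_{i=1}^{N} \frac{h_{ij}}{d_i}\, w_i \qquad \text{(the importance of page } j \text{ is set according to the condition stated above).}
$$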

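In matrix form, with $\mathbf{1} = (1, \dots, 1)^T \in \mathbb{R}^N$:

$$
\forall i, \quad d_i = \sum_{j=1}^{N} h_{ij} \quad \Longleftrightarrow \quad d = H \mathbf{1},
$$

$$
\forall j, \quad w_j = \sum_{i=1}^{N} \frac{h_{ij}}{d_i}\, w_i \quad \Longleftrightarrow \quad w^T = w^T D^{-1} H, \qquad D = \operatorname{diag}(d).
$$

A first version of the PageRank problem is therefore: find $w \neq 0$ such that $w^T = w^T D^{-1} H$.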
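The matrix form $w^T = w^T D^{-1} H$ with $D = \operatorname{diag}(d)$ and $d = H\mathbf{1}$ can be checked numerically on the three-page example. The sketch below is in MATLAB; the candidate vector `w = [2; 2; 1]` is the hand-computed solution of the example (up to scaling), and the variable names are illustrative:

```matlab
% Three-page example: row i of H lists the pages linked from page i.
H = [0 1 1; 1 0 0; 0 1 0];

d = H * ones(3, 1);      % d_i = number of pages linked from page i
D = diag(d);

% Candidate solution of the example, defined up to a scalar factor.
w = [2; 2; 1];

% Residual of the matrix equation w' = w' * D^{-1} * H.
residual = w' - (w' / D) * H;
disp(residual)
```

A zero residual confirms that the componentwise conditions and the matrix equation describe the same problem.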

However, this initial version has some issues, and we need to modify it.

The first issue is: what happens if $\exists\, i : d_i = 0$?

Dealing with dangling nodes ($d_i = 0$)

This is the case of a page $i$ that has no outgoing links; such a page is called a dangling node.

In the case of dangling nodes, which correspond to rows of $H$ that have only zero entries, the PageRank idea is to replace such rows with rows full of ones:

[Figure: example graph with four pages (1, 2, 3, 4) including a dangling node; in its adjacency matrix the zero row is replaced by the row $(1\ 1\ 1\ 1)$.]

Meaning: if a page has no outgoing links, then its importance is equally distributed among all pages.

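In matrix terms, the modified adjacency matrix is

$$
\hat{H} = H + u \mathbf{1}^T,
$$

where $u \in \mathbb{R}^N$ has ones in the positions of the dangling nodes and zeros elsewhere (in our example, $u$ is the corresponding indicator vector), and $\mathbf{1} = (1, \dots, 1)^T$.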
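The replacement of dangling rows by rows of ones can be sketched in MATLAB as follows. The four-page graph used here is hypothetical (it is not the graph drawn in the notes), and the names `u` and `Hhat` are illustrative:

```matlab
% Hypothetical four-page graph: page 4 has no outgoing links,
% so row 4 of H is zero.
H = [0 1 1 0;
     0 0 1 1;
     1 0 0 1;
     0 0 0 0];
N = size(H, 1);

d = sum(H, 2);              % number of outgoing links of each page
u = double(d == 0);         % ones in the positions of the dangling nodes

Hhat = H + u * ones(1, N);  % dangling rows become rows of ones
disp(Hhat)
```

Only the dangling row changes; every other row of `Hhat` coincides with the corresponding row of `H`.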

We can then rewrite the problem as follows:

- $u$ = vector of dangling nodes
- $\hat{H} = H + u \mathbf{1}^T$
- $\hat{D} = \operatorname{diag}(\hat{d})$, where $\hat{d} = \hat{H} \mathbf{1}$
- $M = \hat{D}^{-1} \hat{H}$

PageRank version 2

Find $w \in \mathbb{R}^N$ such that $w \neq 0$ and

$$
w^T = w^T M,
$$

that is, $w$ is a left eigenvector of $M$ associated to the eigenvalue 1.
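A minimal MATLAB sketch of this version of the problem on a small graph. The four-page graph is hypothetical (not the one in the notes), and calling `eig` on the transpose is just one convenient way to obtain a left eigenvector of a tiny dense matrix; it is not the method the notes propose for large problems:

```matlab
% Hypothetical graph: page 4 is dangling (row of zeros).
H = [0 1 1 0;
     0 0 1 1;
     1 0 0 1;
     0 0 0 0];
N = size(H, 1);

u = double(sum(H, 2) == 0);      % vector of dangling nodes
Hhat = H + u * ones(1, N);       % Hhat = H + u*1'
dhat = Hhat * ones(N, 1);        % dhat = Hhat*1
M = diag(1 ./ dhat) * Hhat;      % M = Dhat^{-1} * Hhat

% Left eigenvectors of M are right eigenvectors of M'.
[V, L] = eig(M');
[~, k] = min(abs(diag(L) - 1));  % pick the eigenvalue closest to 1
w = real(V(:, k));
w = w / sum(w);                  % fix the scale so the entries sum to 1

disp(w')
```

Dividing by `sum(w)` selects one representative among the scalar multiples of the eigenvector.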

What about the well-posedness of the previous PageRank problem?

Existence of a solution.

Theorem. A non-null solution $w$ exists.

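Proof. The thesis

$$
\exists\, w \in \mathbb{R}^N : \quad w \neq 0 \ \text{ and } \ w^T = w^T M
$$

is equivalent to

$$
\exists\, w \in \mathbb{R}^N : \quad w \neq 0 \ \text{ and } \ w^T (M - \mathrm{Id}) = 0,
$$

which is equivalent to

$$
\exists\, w \in \mathbb{R}^N : \quad w \neq 0 \ \text{ and } \ (M^T - \mathrm{Id})\, w = 0,
$$

which is in turn equivalent to $\det(M^T - \mathrm{Id}) = 0$.

Since the determinant of a matrix is equal to the determinant of its transpose, the problem is equivalent to

$$
\det(M - \mathrm{Id}) = 0.
$$

This is finally equivalent to the existence of a vector $v \neq 0$ such that $(M - \mathrm{Id})\, v = 0$.

This happens for $v = \mathbf{1} = (1, \dots, 1)^T$. Indeed, recalling that $M = \hat{D}^{-1} \hat{H}$, we have

$$
\hat{D}^{-1} \hat{H} \mathbf{1} = \hat{D}^{-1} \hat{d} = \mathbf{1},
\qquad \text{then} \qquad
(\hat{D}^{-1} \hat{H} - \mathrm{Id})\, \mathbf{1} = 0. \qquad \blacksquare
$$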
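To make the last step of the proof concrete, here is the computation for the three-page example (a worked check added for illustration; that example has no dangling nodes, so $\hat{H} = H$ and $\hat{d} = d = (2, 1, 1)^T$):

```latex
\[
M = \hat{D}^{-1}\hat{H}
  = \begin{pmatrix} 0 & \tfrac12 & \tfrac12 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
M \mathbf{1} = \begin{pmatrix} \tfrac12 + \tfrac12 \\ 1 \\ 1 \end{pmatrix} = \mathbf{1},
\]
\[
\det(M - \mathrm{Id})
  = \det \begin{pmatrix} -1 & \tfrac12 & \tfrac12 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}
  = -1 + \tfrac12 + \tfrac12 = 0 .
\]
```

Each row of $M$ sums to 1, which is exactly the statement $M\mathbf{1} = \mathbf{1}$, and the determinant vanishes as the proof predicts.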

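Exercise: reasoning as in the previous proof, prove that the "left eigenvalues" are the same as the "right eigenvalues".

The left eigenvalues of a matrix $A \in \mathbb{R}^{N \times N}$ are the $\lambda \in \mathbb{R}$ such that $\exists\, w \in \mathbb{R}^N$, $w \neq 0$, with $w^T A = \lambda w^T$.

The right eigenvalues of a matrix $A \in \mathbb{R}^{N \times N}$ are the $\lambda \in \mathbb{R}$ such that $\exists\, v \in \mathbb{R}^N$, $v \neq 0$, with $A v = \lambda v$.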

NOTE: The PageRank problem is equivalent to finding a left eigenvector of $M = \hat{D}^{-1} \hat{H}$.

Uniqueness of the solution.

The solution $w$ is not unique: if $w$ is a solution, any nonzero scalar multiple of $w$ is also a solution.

The question is then: is the solution unique up to scalar multiplication? That is: if $w$ is a solution and $\tilde{w}$ is a solution, can we say that there is an $\alpha \in \mathbb{R}$ such that $w = \alpha \tilde{w}$?

The answer to the previous question can be found in the Perron-Frobenius theorem.

Roughly speaking, sufficient conditions for a unique solution (up to scalar multiples) $w \neq 0$ of $w^T = w^T A$ are:

- $A$ is irreducible
- $A$ has strictly positive elements

But our matrix $M$ does not, in general, fulfil such conditions.
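A small illustration of how uniqueness can fail (a hypothetical graph, not taken from the notes): take four pages where pages 1 and 2 link only to each other, and pages 3 and 4 link only to each other. There are no dangling nodes, every page has exactly one outgoing link, and:

```latex
\[
M = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},
\qquad
w^{(1)} = (1, 1, 0, 0)^T, \qquad w^{(2)} = (0, 0, 1, 1)^T ,
\]
\[
(w^{(1)})^T M = (w^{(1)})^T, \qquad (w^{(2)})^T M = (w^{(2)})^T .
\]
```

Both vectors solve $w^T = w^T M$, yet neither is a scalar multiple of the other. This $M$ is reducible and has zero entries, so neither of the two conditions holds.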

Then, we further modify the problem.

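If $M = \hat{D}^{-1} \hat{H}$ is the matrix of the original PageRank problem, we now introduce the modified matrix $A$ such that

$$
A = \gamma M + (1 - \gamma)\, \mathbf{1} v^T,
$$

where:

- $\gamma$ is a parameter in $(0, 1)$, e.g. $\gamma = 0.85$;
- $v \in \mathbb{R}^N$ is given, with $v_i > 0$ for all $i$ and $v^T \mathbf{1} = 1$.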
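As a sketch in MATLAB, the modified matrix can be formed explicitly for the three-page example. The uniform vector `v` below is one possible choice of $v$ with positive entries summing to 1, and forming `A` as a dense matrix is only sensible for tiny graphs like this one:

```matlab
H = [0 1 1; 1 0 0; 0 1 0];       % three-page example, no dangling nodes
N = size(H, 1);
gamma = 0.85;                    % example value of the parameter
v = ones(N, 1) / N;              % uniform v (an assumed choice)

M = diag(1 ./ sum(H, 2)) * H;    % here Hhat = H, so M = D^{-1} * H
A = gamma * M + (1 - gamma) * ones(N, 1) * v';

disp(A)
disp(A * ones(N, 1))             % row sums of A
```

Every entry of `A` is strictly positive, including those where `M` has a zero, and each row still sums to 1.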

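PageRank version 3

The problem becomes: find $w \neq 0$ such that $w^T = w^T A$.

Indeed, $A$ fulfils the conditions of the Perron-Frobenius (P.F.) theorem, so there exists a unique solution $w$ (up to scalar multiples), and one can also prove that in such a case the solution can be taken with $w_i > 0$ for all $i$.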
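A short verification, added here as a sketch: assuming that every $v_i > 0$, that $v^T \mathbf{1} = 1$, and that $M$ has non-negative entries with $M \mathbf{1} = \mathbf{1}$, the matrix $A = \gamma M + (1 - \gamma)\,\mathbf{1} v^T$ is strictly positive and still has $\mathbf{1}$ as a right eigenvector for the eigenvalue 1:

```latex
\[
a_{ij} = \gamma\, m_{ij} + (1 - \gamma)\, v_j \;\ge\; (1 - \gamma)\, v_j \;>\; 0,
\]
\[
A \mathbf{1} = \gamma\, M \mathbf{1} + (1 - \gamma)\, \mathbf{1}\, (v^T \mathbf{1})
             = \gamma\, \mathbf{1} + (1 - \gamma)\, \mathbf{1} = \mathbf{1}.
\]
```

The first line is where $\gamma < 1$ and the positivity of $v$ are used; the second shows that the existence argument given for $M$ carries over to $A$.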

What is the interpretation of the modified PageRank problem of finding $0 \neq w \in \mathbb{R}^N$ such that $w^T = w^T A$, with $A = \gamma M + (1 - \gamma)\, \mathbf{1} v^T$?

Answer: the importance of the pages is given in part by the previous idea (the term $w^T \gamma M$) and in part according to $v^T$ (this is the part $(1 - \gamma)(w^T \mathbf{1})\, v^T$).

A common choice is $v = \frac{1}{N} \mathbf{1}$.
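Written out (a worked expansion added for illustration), the equation $w^T = w^T A$ splits into the two contributions. The componentwise form below additionally assumes that the solution is normalized so that $w^T \mathbf{1} = 1$ and that $v = \frac{1}{N}\mathbf{1}$:

```latex
\[
w^T = w^T A = \gamma\, w^T M + (1 - \gamma)\,(w^T \mathbf{1})\, v^T ,
\]
\[
w_j = \gamma \sum_{i=1}^{N} m_{ij}\, w_i + \frac{1 - \gamma}{N}
\qquad \text{when } w^T \mathbf{1} = 1 \text{ and } v = \tfrac{1}{N} \mathbf{1}.
\]
```

Under these assumptions each page receives a fraction $\gamma$ of its importance through the links pointing to it, plus a fixed share $(1 - \gamma)/N$ that does not depend on the links.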

Computation of the solution.

Under the assumptions of the P.F. theorem, it can be shown that 1 is the eigenvalue of $A$ with maximum absolute value.

The matrix $A$ is non-symmetric, but the eigenvector can be efficiently computed by a "power method".

The following algorithm takes $H$, $v$, $\gamma$ and maxit, and returns $y$, an approximation of $w$, after maxit power iterations.

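function y = pagerank(H, v, gamma, maxit)
% PAGERANK  Approximate the PageRank vector w by maxit power iterations.
%   H     : N-by-N adjacency matrix (h_ij = 1 if page i links to page j)
%   v     : N-by-1 vector with positive entries and sum(v) = 1
%   gamma : parameter in (0,1), e.g. 0.85
N = size(H, 1);

% look for the dangling nodes and construct Hhat = H + u*1'
u = double(sum(H, 2) == 0);
Hhat = H + u * ones(1, N);

% construct dhat = Hhat*1 and Dhat = diag(dhat)
dhat = sum(Hhat, 2);
Dhat = diag(dhat);

% construct A = gamma * Dhat^{-1} * Hhat + (1 - gamma) * 1 * v'
A = gamma * (Dhat \ Hhat) + (1 - gamma) * ones(N, 1) * v';

y = rand(N, 1);
y = y / norm(y, 1);      % since y has positive entries, norm(y,1) = sum(y)

for it = 1:maxit
    y = (y' * A)';       % y' = y' * A
    % y = y / norm(y, 1);  % this step is not needed: y already has norm(y,1) = 1
end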
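The power iteration can be tried out as a self-contained MATLAB script on the three-page example. The values `gamma = 0.85`, the uniform `v`, and `maxit = 100` are illustrative assumptions, and the script prints the quantities that matter: the sum of the entries of `y`, which stays at 1 without renormalizing, and the residual of $y^T = y^T A$:

```matlab
% Three-page example with illustrative parameter choices.
H = [0 1 1; 1 0 0; 0 1 0];
N = size(H, 1);
gamma = 0.85;
v = ones(N, 1) / N;
maxit = 100;

% No dangling nodes here, so Hhat = H and M = D^{-1} * H.
M = diag(1 ./ sum(H, 2)) * H;
A = gamma * M + (1 - gamma) * ones(N, 1) * v';

y = rand(N, 1);
y = y / sum(y);
for it = 1:maxit
    y = A' * y;                    % y' = y' * A, without renormalizing
end

fprintf('sum(y)   = %.15f\n', sum(y));
fprintf('residual = %.2e\n', norm(y' - y' * A));
```

The sum is preserved because $A\mathbf{1} = \mathbf{1}$ implies $y^T A \mathbf{1} = y^T \mathbf{1}$, so only rounding errors can move it away from 1.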

There are some details that make the code more efficient (in MATLAB, but not only):

- store H as a sparse matrix
- find the dangling nodes:
      d = H * ones(N, 1)      (equivalent to d = sum(H, 2))
      dangling = (d == 0)
- represent $\hat{d}$ and $\hat{H}$ as
      dhat = d + N * dangling             ← not expensive to compute
      Hhat = H + dangling * ones(1, N)    ← may be expensive to construct, but this is not needed

We only need to compute (for $x^T = y^T \hat{D}^{-1}$)

      x' * Hhat = x' * H + (x' * dangling) * ones(1, N)
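Putting these remarks together, a power iteration that never forms $\hat{H}$ or $A$ might look like the MATLAB sketch below. The sparse four-page graph is hypothetical, and `gamma`, `v` and `maxit` are illustrative values; the update uses $y^T A = \gamma\, x^T \hat{H} + (1 - \gamma)(y^T \mathbf{1})\, v^T$ with $x^T = y^T \hat{D}^{-1}$:

```matlab
% Hypothetical sparse graph with N = 4 pages; page 4 is dangling.
rows = [1 1 2 2 3 3];
cols = [2 3 3 4 1 4];
N = 4;
H = sparse(rows, cols, 1, N, N);

gamma = 0.85;
v = ones(N, 1) / N;
maxit = 200;

d = full(sum(H, 2));                      % same as H * ones(N, 1)
dangling = double(d == 0);
dhat = d + N * dangling;                  % dhat = Hhat*1, Hhat never formed

y = v;                                    % positive start with sum(y) = 1
for it = 1:maxit
    x = y ./ dhat;                        % x' = y' * Dhat^{-1}
    xH = full(H' * x)' + (x' * dangling) * ones(1, N);   % x' * Hhat
    y = gamma * xH' + (1 - gamma) * sum(y) * v;          % (y' * A)'
end

disp(y')
```

Each iteration costs one sparse matrix-vector product plus a few vector operations, so the dense rows of ones introduced for dangling nodes are never stored.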

References

These notes follow:

Dario A. Bini, "Il problema del PageRank".

An interesting presentation of PageRank from the probabilistic point of view, with many applications, is:

David F. Gleich, "PageRank beyond the Web", https://arxiv.org/abs/1407.5107