1 KAIST Knowledge Service Engineering Data Mining Lab. Page Rank Algorithm Jung Hoon Kim N5, Room 2239 E-mail: [email protected] 2014.01.14

Page rank algorithm

Page 1: Page rank algorithm

KAIST Knowledge Service Engineering

Data Mining Lab.

Page Rank Algorithm

Jung Hoon Kim

N5, Room 2239 E-mail: [email protected]

2014.01.14

Page 2: Page rank algorithm

Introduction

PageRank was first introduced by Sergey Brin and Larry Page in 1998.

The original ranking algorithms were not suitable for the web of 1996.

The number of web pages grew rapidly.

In 1996, the query “classification technique” returned 10 million relevant pages.

Content-similarity methods are easily spammed, so they are vulnerable to spam pages.

Page 3: Page rank algorithm

Basic

The PageRank algorithm rests on two principles.

A hyperlink from a page pointing to another page is an implicit conveyance of authority to the target page. Thus, the more in-links a page i receives, the more prestige page i has.

Pages that point to page i also have their own prestige scores. A page with a higher prestige score pointing to i is more important than a page with a lower prestige score pointing to i.

Page 4: Page rank algorithm

Principle

Hyperlink trick

A node with many incoming links is more important.
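
The hyperlink trick can be sketched in a few lines of Python. This is only an illustration under assumptions: the link graph and the page names below are hypothetical and do not come from the slides. Pages are ranked purely by how many incoming links they have.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "A": ["B"],
    "B": ["C"],
    "C": ["B"],
    "D": ["B", "C"],
}

def inlink_counts(graph):
    """Count the incoming links of every page in the graph."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in targets:
            counts[target] += 1
    return dict(counts)

counts = inlink_counts(links)
# Pages with more incoming links come first.
ranking = sorted(counts, key=counts.get, reverse=True)
print(counts)
print(ranking)
```

Counting links alone ignores who is linking, which is the gap the authority idea on the next slide addresses.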

Page 5: Page rank algorithm

Authority

What a person with more authority says carries more weight.

John is a computer scientist.

Alice is a cook.

Page 6: Page rank algorithm

Big picture

A famous person corresponds to a node with many incoming edges.

Page 7: Page rank algorithm

Cyclic problem

On the web, there are many cycles like this.

This matrix has the cycle A->B->E, which means the scores keep increasing without bound.
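
A small Python sketch can show why a cycle is a problem. It rests on two assumptions that are not stated on the slide: the cycle closes from E back to A, and authority is propagated with a naive rule in which each page's new score is 1 plus the scores of the pages linking to it. Under that hypothetical rule the scores never settle.

```python
# Assumed cycle: A -> B -> E -> A (the closing link E -> A is an assumption).
links = {"A": ["B"], "B": ["E"], "E": ["A"]}

def propagate(graph, rounds):
    """Naive authority propagation (illustrative rule only):
    new score of a page = 1 + sum of the scores of pages linking to it."""
    scores = {page: 1 for page in graph}
    for _ in range(rounds):
        new_scores = {page: 1 for page in graph}
        for source, targets in graph.items():
            for target in targets:
                new_scores[target] += scores[source]
        scores = new_scores
    return scores

# The scores grow with every additional round instead of converging.
for rounds in (1, 10, 100):
    print(rounds, propagate(links, rounds))
```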

Page 8: Page rank algorithm

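Random surfer trick

To avoid this and other problems, they adopted the random surfer model.

From each node, the surfer can move to any other node.

This solves the cycle problem.

A node with many incoming links can still have a high rank.

The parameter that controls the random jumps is often called the damping factor (d).

In Google's initial model, the surfer jumps to a random page with probability 0.15, which corresponds to d = 0.85 in the usual convention.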
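
The random surfer can be simulated directly. The following Python sketch is an illustration under assumptions: the link graph is hypothetical, `jump_prob` is the 0.15 value from the slide interpreted as the probability of jumping to a random page, and the fixed seed only makes runs repeatable.

```python
import random

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "A": ["B"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
}

def random_surfer(graph, steps, jump_prob=0.15, seed=0):
    """Simulate a random surfer and return the fraction of visits per page."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        out_links = graph[current]
        # Jump to a random page with probability jump_prob, or when the
        # current page has no out-links; otherwise follow a random out-link.
        if not out_links or rng.random() < jump_prob:
            current = rng.choice(pages)
        else:
            current = rng.choice(out_links)
    return {page: count / steps for page, count in visits.items()}

ranks = random_surfer(links, steps=100_000)
for page, share in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {share:.1%}")
```

Because the surfer can always restart somewhere else, the visit fractions stay bounded and sum to 1 even though A and B form a cycle.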

Page 9: Page rank algorithm

Test

After 1,000 test runs, the result is nearly correct: A has a high rank.

A has only one incoming link.

Expressing the ranks as percentages makes them easy to compare.

Page 10: Page rank algorithm

Example

Page 11: Page rank algorithm

Solve cycle problem

Page 12: Page rank algorithm

Formula

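$$P(i) = \sum_{j \to i} \frac{P(j)}{O_j}$$

P(i) = score of page i

O_j = number of out-links of page j

The sum runs over all pages j that link to page i.

[Figure: pages a, b, and c, labeled 1, 3, and 2]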

Page 13: Page rank algorithm

Formula

Mathematically, we have a system of n linear equations with n unknowns. Let P = (P1, P2, P3, …, Pn) be the vector of PageRank values.

A is the adjacency matrix of the graph, so the system can be written in matrix form.
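
The matrix form itself did not survive in the transcript. A plausible reconstruction, assuming the usual convention that A is the adjacency matrix with each row divided by the page's out-link count, is given below. The symbols O_i (number of out-links of page i) and E (set of hyperlinks) are notation assumed here.

```latex
% Assumed notation: O_i = number of out-links of page i, E = set of hyperlinks.
A_{ij} =
\begin{cases}
  \dfrac{1}{O_i} & \text{if } (i, j) \in E \\
  0              & \text{otherwise}
\end{cases}
\qquad
P = A^{T} P
```

Row i of the equation reads P(i) = sum over j of A_ji P(j), which is the per-page score formula written for all n pages at once.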

Page 14: Page rank algorithm

Example

Page 15: Page rank algorithm

Linear Algebra

formula

P is an eigenvector with the corresponding eigenvalue of 1.

1 is the largest eigenvalue, and the PageRank vector P is the principal eigenvector.

To calculate P, we can use the power iteration algorithm.

Page 16: Page rank algorithm

Condition

However, this requires that A is a stochastic matrix and that it is irreducible and aperiodic.

We can view the graph as a Markov chain: each web page is a node (state) and each hyperlink is a transition.

A is not a stochastic matrix, because it has a zero row (row 5). A zero row means the page has no out-links. We fix the problem by adding a complete set of outgoing links from each such page i to all the pages on the Web.
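
The zero-row fix can be sketched in Python as follows. The 3-page matrix is a hypothetical example rather than the matrix from the slides, and `fix_dangling_rows` and `is_stochastic` are helper names introduced only for this illustration.

```python
def fix_dangling_rows(matrix):
    """Replace every all-zero row with a uniform row of 1/n,
    i.e. add links from a page without out-links to all pages."""
    n = len(matrix)
    fixed = []
    for row in matrix:
        if all(value == 0 for value in row):
            fixed.append([1.0 / n] * n)
        else:
            fixed.append(list(row))
    return fixed

def is_stochastic(matrix, tol=1e-9):
    """True if all entries are non-negative and every row sums to 1."""
    return all(
        all(value >= 0 for value in row) and abs(sum(row) - 1.0) < tol
        for row in matrix
    )

# Hypothetical transition matrix; the last page has no out-links.
A = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
A_fixed = fix_dangling_rows(A)
print(is_stochastic(A), is_stochastic(A_fixed))
```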

Page 17: Page rank algorithm

Modified version

Page 18: Page rank algorithm

Irreducible

If there is some pair of nodes u and v with no path from u to v, A is not irreducible. If there is a path from u to v for every pair of nodes, A is irreducible.

A state i is periodic with period k > 1 if k is the largest number such that all paths leading from state i back to state i have a length that is a multiple of k. If a state is not periodic, it is aperiodic. A Markov chain is aperiodic if all states are aperiodic.
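
Both conditions can be checked on a small graph. The Python sketch below is an illustration under assumptions: the example graphs are hypothetical, irreducibility is tested as strong connectivity with a breadth-first search from every node, and the period is computed as the greatest common divisor of all cycle lengths, which is only meaningful when the graph is irreducible.

```python
from collections import deque
from math import gcd

def bfs_levels(graph, start):
    """Shortest distances (in links) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def is_irreducible(graph):
    """True if every node can reach every other node."""
    return all(len(bfs_levels(graph, node)) == len(graph) for node in graph)

def period(graph):
    """Period of an irreducible graph: gcd of all cycle lengths.
    Uses the fact that each edge u -> v contributes dist[u] + 1 - dist[v]."""
    start = next(iter(graph))
    dist = bfs_levels(graph, start)
    g = 0
    for node, targets in graph.items():
        for nxt in targets:
            g = gcd(g, dist[node] + 1 - dist[nxt])
    return g

cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}       # single cycle of length 3
mixed = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # cycles of length 2 and 3
chain = {"A": ["B"], "B": []}                      # B cannot reach A

print(is_irreducible(cycle), period(cycle))
print(is_irreducible(mixed), period(mixed))
print(is_irreducible(chain))
```

A period of 1 means the graph is aperiodic; the pure 3-cycle is irreducible but periodic.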

Page 19: Page rank algorithm

Page Rank

It is easy to deal with the above two problems with a single strategy.

We add a link from each page to every page and give each link a small transition probability controlled by a parameter d.
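
Written out, the modified model can be sketched as below. This is one common normalization and not necessarily the exact form used on the slide: it assumes n pages, a column vector e of ones, the entries of P summing to 1, a matrix A whose zero rows have already been fixed, and d taken as the probability of following a real link.

```latex
% Assumed notation: n = number of pages, e = column vector of n ones,
% d = damping factor (probability of following a real link).
P = \left( (1 - d)\,\frac{e\,e^{T}}{n} + d\,A^{T} \right) P
\qquad\Longleftrightarrow\qquad
P(i) = \frac{1 - d}{n} + d \sum_{j=1}^{n} A_{ji}\, P(j)
```

Every entry of the combined matrix is positive, so the resulting chain is both irreducible and aperiodic.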

Page 20: Page rank algorithm

Page Rank

The PageRank values of the Web pages can be computed using the power iteration method, which produces the principal eigenvector with an eigenvalue of 1.

The iteration ends when the PageRank values converge, that is, when they no longer change much between iterations.
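
A minimal power iteration sketch in Python, under stated assumptions: the link graph is hypothetical, d = 0.85 is taken as the probability of following a link (a 0.15 chance of a random jump), the ranks are normalized to sum to 1, and the stopping rule is a total change below `tol`. It is not presented as the implementation used for the slides.

```python
def pagerank(graph, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dict {page: [linked pages]}."""
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not graph[p])
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page passes its rank, scaled by d, equally to its out-links.
        for source in pages:
            for target in graph[source]:
                new_rank[target] += d * rank[source] / len(graph[source])
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        # Stop once the values no longer change much.
        if change < tol:
            break
    return rank

# Hypothetical link graph: each page maps to the pages it links to.
links = {"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.1%}")
```

Pages C and D have no incoming links, so they keep only the random-jump share (1 - d) / n.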

Page 21: Page rank algorithm

Real Page rank

Dealing with web spam is the most important issue.

Giving every page the same random surfer constant and calculating the rank of every page takes a long time.

Currently, Google uses more than 200 factors to calculate rankings on the web.

Page 22: Page rank algorithm

Thank you