The Google Pagerank algorithm - How does it work?

QAB Term 1

GUSTAVO ARGUELLO

KUNDAN BHADURI

VERITY NOBLE

IMBA NOV 2010 N1

IE BUSINESS SCHOOL

MARIA DE MOLINA 11

MADRID 28002 SPAIN

Markov Chains and Google Inc.

QAB Term 1 Project: Markov Chains and Google Inc.

IMBA NOV 2010 N1: Gustavo Arguello | Kundan Bhaduri | Verity Noble

Table of Contents

Implementing Markov Chains with Google PageRank

Issues to be addressed

Techniques that may be used to overcome the problem of solving such a large system

Exhibit 1: A sample 4-state Markov chain with transition probabilities

Exhibit 2: Sample 4X4 transition Matrix

Exhibit 3: Explaining the basis of Markov’s chain

Exhibit 4: Demonstrating the stable state values using simple matrix multiplication

Exhibit 5: Calculating the steady state eigen values πA and πE

Exhibit 6: The improved Google PageRank algorithm

Exhibit 7: PageRank of the search string ‘Techbend blog’

Exhibit 8: The correlation between a webpage and the rest of the web

Exhibit 9: KundanBhaduri.com and its links to other sites

Exhibit 10: Applying Markov Chain method to calculate the PageRank for ‘TechBend blog’

Exhibit 11: Computing a small Eigen value with Power Method



Implementing Markov Chains with Google PageRank

In its most basic form, a homogeneous Markov chain (Exhibit 1) is a series of events or states that follow one another, in which the transition from one state to the next is memoryless: it depends only on the current state and not on how that state was reached. More formally, a Markov chain is a collection of random variables {Xt} with the property that, given the current state, the future is conditionally independent of the past.1 The probabilities of moving between states are collected in a square matrix known as the Transition Matrix. A problem can therefore be treated with the theory of Markov chains if it has the following characteristics:

a) At any point in time, any of the objects should be in one and exactly one defined state. At the end of the period,

the object can move to a new state or remain in its original state 2.

b) The objects move between states based on the transition probabilities (Exhibit 2) that depend on only the

current state. The sum of all probabilities of moving to all possible states should be one.

c) The transition probabilities (of going from A to B) remain constant over time.
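Conditions (a) to (c) can be checked mechanically for a candidate transition matrix. The sketch below is only illustrative: the function name `is_row_stochastic`, the tolerance and the sample matrices are assumptions made here, not part of the original report.

```python
def is_row_stochastic(matrix, tol=1e-9):
    """Return True if every entry is a probability and every row sums to 1."""
    for row in matrix:
        # Each entry must be a valid probability.
        if any(p < 0 or p > 1 for p in row):
            return False
        # The probabilities of moving to all possible states must sum to one.
        if abs(sum(row) - 1.0) > tol:
            return False
    return True


# Illustrative matrices: rows are "from" states, columns are "to" states.
valid = [[0.3, 0.7],
         [0.4, 0.6]]
invalid = [[0.5, 0.6],
           [0.4, 0.6]]

print(is_row_stochastic(valid))
print(is_row_stochastic(invalid))
```

The check covers condition (b) directly; conditions (a) and (c) are modelling assumptions that the matrix representation takes for granted.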

To see how a Markov chain is solved, assume that the simple 2-state chain in Exhibit 3 describes a simple website. From the homepage (E), a user clicks a link that leads her to page (A) 70% of the time; the remaining 30% of the time she clicks a link that keeps her on the same page (E). Similarly, once the user is on page (A), 40% of the time she clicks a link back to (E), and the remaining 60% of the time she clicks a link that keeps her on page (A). The Markov chain lets us find the probability that a random user is on either page after any number of iterations of this chain. The website administrator might use this information to decide which page to focus on to maximise ad revenue. Please note that Google’s implementation of the Markov chain is that of a non-absorbing Markov chain.

To solve this problem, we start by using the tree method to calculate the second-level probabilities $P_{ij}^{(2)}$, i.e. the probability of going from node i to node j in two iterations, where i and j are each E or A, as shown in Exhibit 4. Here we observe that the probability of landing on page A after two iterations is 63% if the user started at E and 64% if she started at A. If we continue in this way for 7 iterations, the probability values reach a steady state (to six decimal places) and no longer change.
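The repeated multiplication described above can be reproduced with a few lines of code. This is a minimal sketch using plain Python lists; the helper names `mat_mul` and `mat_power` are illustrative choices, and the state order (E, A) is assumed from the percentages given in the text.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(p, steps):
    """Return the matrix p raised to the power `steps` (steps >= 1)."""
    result = p
    for _ in range(steps - 1):
        result = mat_mul(result, p)
    return result


# State order is (E, A); row = current page, column = next page.
P = [[0.3, 0.7],
     [0.4, 0.6]]

for n in (2, 3, 7):
    rounded = [[round(x, 6) for x in row] for row in mat_power(P, n)]
    print(n, rounded)
```

By the seventh power both rows agree to six decimal places, which is the sense in which the chain has reached its steady state.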

To find the steady-state probabilities of the two webpages, we use the steady-state equation π = π*P and solve it as shown in Exhibit 5. This gives πA ≈ 0.636 and πE ≈ 0.364, the entries of the steady-state vector π (the left eigenvector of P for the eigenvalue 1), which this report refers to as Eigen values. We can therefore recommend spending advertising effort on page A, since in the long run it is about 1.75 times as likely to attract clicks as page E. As we turn to how Google ranks pages according to their relevance, it is worth noting that these Eigen values play a significant part.
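For a two-state chain the steady state can also be written in closed form, which gives a quick check on the figures. The sketch below uses exact fractions; the function name `two_state_stationary` and its parameter names are illustrative assumptions.

```python
from fractions import Fraction


def two_state_stationary(p_ea, p_ae):
    """Stationary distribution (pi_E, pi_A) of a 2-state chain.

    p_ea is the probability of moving E -> A and p_ae of moving A -> E.
    Solving pi = pi * P together with pi_E + pi_A = 1 gives
    pi_E = p_ae / (p_ea + p_ae) and pi_A = p_ea / (p_ea + p_ae).
    """
    total = p_ea + p_ae
    return p_ae / total, p_ea / total


# E -> A with probability 0.7 and A -> E with probability 0.4, as in the text.
pi_e, pi_a = two_state_stationary(Fraction(7, 10), Fraction(4, 10))
print(pi_e, pi_a)                # exact fractions
print(float(pi_e), float(pi_a))  # decimal approximations
```

The exact values are 4/11 for E and 7/11 for A, which match the converged rows of the matrix powers.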

Markov chains are widely used in industrial research, organizational behaviour, financial market analysis, human resource planning, marketing forecasting and other fields. A particularly interesting use has been in the music industry: as early as the 1950s, composers used Markov chains to study the pattern of notes in popular songs3 and thereby create new musical sequences based on the notes studied.

The example of linked webpages discussed above can now be extrapolated to calculate the probability of arriving at any webpage for a given search criterion, if the entire World Wide Web is considered as one large, connected, memoryless chain. Based on the relevance criterion, we can estimate the highest relevance factor, and therefore any page’s utility rank for a search string. This is the rationale behind Google’s patented PageRank algorithm.

1 Weisstein, Eric W. "Markov Chain." From MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/MarkovChain.html

2 Tamara Lynn Anthony, Rice University: Markov Chains

3 Verbeurgt Karsten, Dinolfo Michael, Fayer Mikhail: Extracting Patterns in Music for Composition via Markov Chains



Google’s PageRank algorithm4 is a stochastic algorithm that determines the significance of a page relative to a search string. It is not the only factor that Google uses to rank pages, but it is an important one. For Google (or for a web administrator), the PageRank of a page denotes the probability that a random web surfer reaches that page after clicking on many links. The PageRanks form a probability distribution over web pages, which is why the PageRanks of all pages sum to 1. Refer to Exhibit 6 for a mathematical representation of the PageRank algorithm. Essentially, the Google PageRank method ranks a page higher (i.e. as more important) when other highly ranked or important pages link to it.

Let us explain the algorithm with a real-life example. One of the co-authors of this report is an active technology blogger and writes a blog called “The TechBend” at www.KundanBhaduri.com. Exhibit 7 shows that www.KundanBhaduri.com ranks highest for the search string “Techbend blog” and thus appears at the top of Google’s search results. Interestingly, other professional sites and blogs with domain names such as www.TechBend.com do not appear anywhere close to the top of the search results on Google. Let us explore how this was achieved using Markov chains.

Holistically, the internet as we know it is a connected graph of interlinked webpages (Exhibit 8). It therefore has an extremely large transition probability matrix. Exhibit 9 tells us that for the homepage of The Techbend to rank high on Google’s PageRank, its steady-state probability (referred to here as its Eigen value) has to be higher than that of all competing webpages with the same context. More specifically, the page needs high transition probabilities from those nodes (webpages) in the matrix that themselves have high Eigen values. In other words, the probability of reaching our target page is high when it is reached from another high-probability page. We tested this logic with Exhibits 3 and 5, where A achieved a higher Eigen value because it was more probable to arrive at A from E or to remain on A itself. This logic is at the core of Google’s PageRank.

In our example, www.KundanBhaduri.com does achieve a higher PageRank by linking itself with other highly

prominent websites such as Techcrunch, Engadget and TED. Since these sites enjoy a higher PageRank, by linking

themselves back to The Techbend Blog, the overall probability of a random surfer arriving at www.KundanBhaduri.com

is higher than it is for www.TechBend.com. This is explained by a higher Eigen Value (Exhibit 10) and therefore a

higher PageRank for The Techbend. An important point to emphasize here is that what matters is not just the number of links that a webpage exchanges with others, but the relative importance of those links in the universe of all such links.

Issues to be addressed

However, since the internet is an extremely large set of nodes (over 1 trillion)5, some issues need to be addressed to make the Markov chain model workable for Google PageRank. Firstly, calculating the eigenvector of such a large (and growing) matrix is non-trivial. We address this issue in the second part of the report. Beyond that, handling dangling nodes (i.e. pages with no outbound links) and choosing an appropriate damping factor are significant issues. The damping factor is the probability that the random user keeps following links rather than abruptly ending the session (by either exiting the browser or typing a new URL). To avoid creating an absorbing Markov chain, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are therefore divided evenly amongst all other pages.

Calculating the preliminary transition matrix of the web is also a significant challenge given the massive size of the World Wide Web. A workaround is to ‘guess’ the transition matrix and then progressively correct its values. Since Google recalculates the PageRanks every time it crawls the web, the approximation error decreases with each iteration.

4 Hwai-Hui Fu , Dennis K. J. Lin and Hsien-Tang Tsai (Dept. of Bus. Administration, Shu-Te University): Applied Stochastic models in Business and Industry

5 Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011.



Techniques that may be used to overcome the problem of solving such a large system

Now that we understand how Google applied a form of Markov chain modelling to create its PageRank system, we address one of the most significant problems it faced: solving the system π = π P. For a small matrix, exact solutions can be found quickly. When the web was much smaller, Google could compute the steady-state vector of 26 million pages in about 2 hours6. The result would then be used for a fixed period of time. However, because of the sheer size of the World Wide Web, which Google asserts now contains over 1 trillion web pages7, the resulting stochastic matrix now has over a trillion rows and columns.

Additionally, given the dynamics of Web 2.0, it would no longer be efficient for Google to use the stale data from these

computations for a fixed time interval. “Today, Google downloads the web continuously, collecting updated page

information and re-processing the entire web-link graph several times per day”8. In sum, the ever changing, and ever

expanding nature of the World Wide Web and its content, coupled with the search engine’s commitment to provide the

best information available, only serves to multiply exponentially the problem of solving the aforementioned system.

The resulting matrix of the web, with its more than a trillion columns and rows, is composed mostly of zeroes, since most webpages link to only a tiny number of other web pages. In fact, a 2004 study shows that the average number of out-links from a given webpage is just 52, so on average only about 52 of the roughly one trillion elements in a row are non-zero.9 This means that the web matrix is very sparse.
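A rough calculation makes the degree of sparsity concrete. It assumes, purely for illustration, exactly n = 10^12 pages and exactly 52 out-links on every page; both are round figures taken from the text rather than measured values.

```latex
\begin{aligned}
\text{total entries} &= n^2 = (10^{12})^2 = 10^{24} \\
\text{non-zero entries} &\approx 52\,n = 5.2 \times 10^{13} \\
\text{fraction non-zero} &\approx \frac{52\,n}{n^2} = \frac{52}{10^{12}} = 5.2 \times 10^{-11}
\end{aligned}
```

Under these assumptions, storing only the non-zero entries reduces the problem from about 10^24 numbers to about 5 × 10^13.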

To solve this problem, one of the main tools that can be used (or a variation of which Google appears to have implemented) is “The Power Method” or “Power Iteration”. Applied to the Google matrix, this method converges to the PageRank vector; in other words, it ultimately helps us define the weighting or importance of each webpage relative to the entire matrix. The power method is an iterative process for approximating eigenvalues; we will use it to find our dominant eigenvalue and eigenvector. “Eigenvectors of a square matrix are the non-zero vectors which, after being multiplied by the matrix, remain proportional to the original vector”.10 To implement this method, we must assume that our matrix, which we will now refer to as matrix A, has a dominant eigenvalue with corresponding dominant eigenvectors. A dominant eigenvector of a matrix is an eigenvector corresponding to the eigenvalue of largest magnitude of that matrix. To approximate a dominant eigenvector, we choose an initial approximation of one of the dominant eigenvectors of A, which we will call π0. We can then form the following sequence11:

$$
\begin{aligned}
\pi_1 &= A\pi_0 \\
\pi_2 &= A\pi_1 = A(A\pi_0) = A^2\pi_0 \\
\pi_3 &= A\pi_2 = A(A^2\pi_0) = A^3\pi_0 \\
&\;\;\vdots \\
\pi_k &= A\pi_{k-1} = A(A^{k-1}\pi_0) = A^k\pi_0
\end{aligned}
$$

For large powers of k, this method provides a good approximation of the dominant eigenvector of matrix A. The iterations are repeated until some convergence criterion is satisfied. With our dominant eigenvector, we can find our dominant eigenvalue using the Rayleigh quotient, as follows12:

6 Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011. <http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html>.

7 Ibid.

8 Ibid.

9 Anuj Nanavati, Arindam Chakraborty, David Deangelis, Hasrat Godil, and Thomas D’Silva, An investigation of documents on the World Wide Web, http://www.iit.edu/~dsiltho/Investigation.pdf, December 2004.

10 "Eigenvalues and Eigenvectors." Wikipedia, the Free Encyclopedia. 27 Sept. 2010. Web. 10 Feb. 2011. http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors.

11 Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.

12 Ibid.



$$\lambda = \frac{A\pi \cdot \pi}{\pi \cdot \pi}$$

“In cases for which the power method generates a good approximation of a dominant eigenvector, the Rayleigh quotient provides a correspondingly good approximation of the dominant eigenvalue”13.
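The power method and the Rayleigh quotient can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not Google's implementation: the 2x2 matrix is a made-up example, the function names are hypothetical, and each iterate is rescaled by its largest entry (a common variant) so that the numbers stay bounded.

```python
def mat_vec(a, x):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(x, y):
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def power_method(a, x0, tol=1e-10, max_iter=1000):
    """Approximate the dominant eigenvalue and eigenvector of a."""
    x = x0
    for _ in range(max_iter):
        y = mat_vec(a, x)
        # Rescale so the largest entry has magnitude 1 (assumed variant).
        scale = max(abs(v) for v in y)
        y = [v / scale for v in y]
        change = max(abs(u - v) for u, v in zip(x, y))
        x = y
        if change < tol:
            break
    # Rayleigh quotient: (A x . x) / (x . x)
    eigenvalue = dot(mat_vec(a, x), x) / dot(x, x)
    return eigenvalue, x


# Illustrative matrix (not from the report); its eigenvalues are 3 and 1.
A = [[2.0, 1.0],
     [1.0, 2.0]]
lam, vec = power_method(A, [1.0, 0.0])
print(lam, vec)
```

For this example the iterates approach the direction (1, 1) and the Rayleigh quotient approaches 3, the eigenvalue of largest magnitude.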

One of the unique features of the Google matrix, as we briefly mentioned before, is that the number of nonzero elements in any given row is quite small (owing to the small number of hyperlinks that a given webpage contains) (Exhibit 11). Since all our computations involve this sparse matrix multiplied by vectors, an iteration of the power method is considered very cheap14.
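To illustrate why one iteration is cheap, the sketch below stores only the non-zero entries of a transition matrix and multiplies a row vector by it. The dictionary-of-dictionaries layout, the function name `sparse_vec_mat` and the four-page example are assumptions made for illustration; production systems use more compact sparse formats.

```python
def sparse_vec_mat(pi, rows):
    """Compute pi * P where P is stored as {row: {col: value}}.

    Only non-zero entries are stored and visited, so the work grows with
    the number of links rather than with n * n.
    """
    result = [0.0] * len(pi)
    for i, row in rows.items():
        for j, p_ij in row.items():
            result[j] += pi[i] * p_ij
    return result


# Hypothetical 4-page web: each page links to one or two others.
P_sparse = {
    0: {1: 0.5, 2: 0.5},
    1: {2: 1.0},
    2: {0: 1.0},
    3: {0: 0.5, 2: 0.5},
}
pi = [0.25, 0.25, 0.25, 0.25]
print(sparse_vec_mat(pi, P_sparse))
```

Here the full matrix has 16 entries, but the multiplication touches only the 6 stored links.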

Another necessary technique Google implemented to make this system solvable was the fix to the dangling node problem. What happens when a user arrives at a webpage that does not link out to any other webpage? Is our random surfer absorbed by this webpage, never to leave? This is the dangling node problem: our Markov chain would treat these nodes as absorbing states unless we do something to correct the situation. Suppose the Google Matrix is called Matrix H. To correct for this, we could create a new matrix S = H + dw, where d is a column vector that identifies dangling nodes, with entry 1 if the node is dangling and 0 otherwise, and w is a row vector (w1, w2, …, wn) that determines where our random surfer goes in order not to become absorbed. One way of assigning values to this row vector is to say that our surfer is equally likely to land on any of the n webpages that exist, so that w = (1/n, 1/n, …, 1/n). Whilst there are other ways to assign w, this is the most common, and is sufficient for our purposes.
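The correction S = H + dw can be sketched directly from its definition. The function name `fix_dangling` and the three-page matrix are hypothetical; the uniform choice of w follows the description above.

```python
def fix_dangling(h):
    """Return S = H + d*w for a hyperlink matrix H given as a list of rows.

    d[i] is 1 when row i of H is all zeros (a dangling page) and 0 otherwise;
    w is the uniform row vector (1/n, ..., 1/n).
    """
    n = len(h)
    w = [1.0 / n] * n
    d = [1 if all(v == 0 for v in row) else 0 for row in h]
    return [[h[i][j] + d[i] * w[j] for j in range(n)] for i in range(n)]


# Hypothetical 3-page web in which the last page has no outbound links.
H = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
S = fix_dangling(H)
for row in S:
    print(row)
```

Rows that already contain links are left unchanged, while the all-zero row becomes a uniform distribution, so every row of S sums to 1.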

Another important technique that Google may use to help solve the system is the inclusion of a damping factor. The damping factor accounts for the possibility that a web surfer may at any time choose not to follow the links available on a given webpage and instead type in a URL to go to a page that is outside the current chain. In fact, Brin and Page reference the damping factor in their original paper on Google (submitted while at Stanford): “The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85”15.
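One common way to write the effect of the damping factor on the transition matrix is sketched below. It assumes the dangling-node-corrected matrix S described earlier, n pages, and a jump that is equally likely to land on any page. The symbol G and the all-ones column vector are notation introduced here for illustration, and d denotes the scalar damping factor from the Brin and Page quote, not the dangling-node vector.

```latex
G = d\,S + (1 - d)\,\frac{1}{n}\,\mathbf{1}\mathbf{1}^{T}
```

With probability d the surfer follows a link according to S, and with probability 1 − d jumps to a page chosen uniformly at random. Each row of G still sums to d + (1 − d) = 1, so G remains a valid transition matrix.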

While the damping factor is intended to model the behaviour of a random web surfer, it also serves the additional purpose of speeding up convergence of the power method. This is because the ratio of the two eigenvalues of largest magnitude determines how quickly the method converges16. It has been proven that the second largest eigenvalue of the Google matrix is less than or equal to the damping factor used17. The power method therefore converges more quickly the further the damping factor is below 1. According to Rebecca Wills18, only 29 iterations are required for the difference between iterates to become less than $10^{-2}$ when using a damping factor of 0.85; the number of iterations goes up to 44 when the damping factor is raised to 0.90. Hence, a smaller damping factor makes this complex system faster to solve by reducing the number of iterations needed to compute the PageRank vector.
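The iteration counts quoted above are consistent with a simple estimate. The sketch assumes that the error is multiplied by roughly the damping factor at each iteration, which is what the bound on the second eigenvalue suggests; the function name `iterations_needed` is an illustrative choice.

```python
import math


def iterations_needed(damping, tolerance):
    """Smallest whole number k with damping**k <= tolerance.

    Assumes the error shrinks by a factor of about `damping` per iteration.
    """
    return math.ceil(math.log(tolerance) / math.log(damping))


for d in (0.85, 0.90):
    print(d, iterations_needed(d, 1e-2))
```

Under this assumption the estimate gives 29 iterations for a damping factor of 0.85 and 44 for 0.90, matching the figures cited from Wills.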

Solving this enormous system is certainly no easy task for Google, especially at the speed it requires. It has nevertheless been able to overcome these significant obstacles through the unique application of certain existing mathematical algorithms.

13 Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.

14 Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.

15 Brin, S., and L. Page. "The Anatomy of a Large-scale Hypertextual Web Search Engine." Computer Networks and ISDN Systems 30.1-7 (1998): 107-17. Print.

16 Gene H. Golub and Charles F. Van Loan, Matrix computations, 3rd ed., The Johns Hopkins University Press, 1996.

17 Taher H. Haveliwala and Sepandar D. Kamvar, The second eigenvalue of the Google matrix, Tech. report, Stanford University, 2003.

18 Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.



Exhibit 1: A sample 4-state Markov chain with transition probabilities

[Figure: four states labelled 1, 2, 3 and 4, with transitions labelled P11, P12, P23, P24, P34 and P41]

Exhibit 2: Sample 4X4 transition Matrix

Exhibit 3: Explaining the basis of Markov’s chain19

19 Image taken from http://en.wikipedia.org/wiki/Markov_chain



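Exhibit 4: Demonstrating the stable state values using simple matrix multiplication

$$P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}, \qquad P_{ij}^{(2)} = [P^2]_{ij}$$

$$P^2 = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.37 & 0.63 \\ 0.36 & 0.64 \end{pmatrix}$$

Power   First row                Second row
P3      0.363      0.637         0.364      0.636
P4      0.3637     0.6363        0.3636     0.6364
P5      0.36363    0.63637       0.36364    0.63636
P6      0.363637   0.636363      0.363636   0.636364
P7      0.363636   0.636364      0.363636   0.636364   (Stable State Values)
P8      0.363636   0.636364      0.363636   0.636364   (Stable State Values)
P9      0.363636   0.636364      0.363636   0.636364   (Stable State Values)
P10     0.363636   0.636364      0.363636   0.636364   (Stable State Values)
P11     0.363636   0.636364      0.363636   0.636364   (Stable State Values)
P12     0.363636   0.636364      0.363636   0.636364   (Stable State Values)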



Exhibit 5: Calculating the steady state eigen values πA and πE

π = π*P

Therefore, $(\pi_E \;\; \pi_A) = (\pi_E \;\; \pi_A) \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}$

Solving these two equations:

1. πE = 0.3*πE + 0.4*πA

2. πA = 0.7*πE + 0.6*πA

Also, we know that:

3. πE + πA = 1

Since equations 1 & 2 are equivalent, solving equations 2 and 3 together:

πA = 0.7*(1 − πA) + 0.6*πA

Or, πA = 7/11 ≈ 0.636

And, πE = 4/11 ≈ 0.364

Exhibit 6: The improved Google PageRank algorithm

$$PR(A) = \frac{1-d}{N} + d\left(\frac{PR(T_1)}{C(T_1)} + \frac{PR(T_2)}{C(T_2)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$

Where:

• PR(A) is the PageRank of page A

• PR(Ti) is the PageRank of pages Ti that link to page A

• C(Ti) is the number of outbound links on page Ti

• n is the total number of all pages that link to page A

• N is the total number of all pages on the web.

• d is the damping factor

It is noteworthy that there is an adjusting damping factor involved in the calculation. The above equation represents the final version of the PageRank algorithm, with the damping factor incorporated within the first term on the RHS of the equation.
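The widely published form of the PageRank formula, PR(A) = (1 − d)/N + d × (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)), can be applied iteratively on a small example. The sketch below is illustrative only: the three-page web, the page names and the fixed iteration count are assumptions, and it presumes that every page has at least one outbound link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d)/N + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to. Every page is
    assumed to have at least one outbound link (no dangling nodes).
    """
    pages = list(links)
    n = len(pages)
    pr = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the (1 - d)/N share.
        new_pr = {page: (1.0 - d) / n for page in pages}
        for source, targets in links.items():
            # A page passes d * PR / C to each page it links to.
            share = d * pr[source] / len(targets)
            for target in targets:
                new_pr[target] += share
        pr = new_pr
    return pr


# Hypothetical three-page web (page names are placeholders).
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print(ranks, sum(ranks.values()))
```

In this toy web, page C is linked from both other pages and ends up with the highest rank, and the ranks sum to 1 as expected of a probability distribution.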



Exhibit 7: PageRank of the search string ‘Techbend blog’

Exhibit 8: The correlation between a webpage and the rest of the web20

[Figure annotation: The importance of these links determines the overall importance of your webpage to the PageRank algorithm]

20 Laure Ninove, Cristobald de Kerchove, Paul Van Dooren: Université Catholique de Louvain, http://www.esat.kuleuven.be/scd/golub/presentations/Gene_PVD.pdf



Exhibit 9: KundanBhaduri.com and its links to other sites

[Figure: The homepage of KundanBhaduri.com, which hosts the blog The TechBend, linked with Engadget (very high PageRank), TechCrunch (very high PageRank), TED (very high PageRank) and the rest of the Internet]



Exhibit 10: Applying Markov Chain method to calculate the PageRank for

‘TechBend blog’

The probability matrix at the end of this exhibit shows the likelihood of a user clicking on a page to arrive at the homepage of another website when she is searching for the string “TechBend blog”. All site names here refer to their respective homepages, for the purpose of Markov chain analysis.

For the stable-state matrix: π = π*P    (1)

We assume:

Webpage              Eigen Value
KundanBhaduri.com    πA
TechBend.com         πB
Engadget.com         πC
TED.com              πD
TechCrunch.com       πE

Therefore using (1), we get:

πA = πA*0.6 + πB*0.42 + πC*0.65 + πD*0.54 + πE*0.64 + …*0.59 + …    (2)

πB = πA*0.3 + πB*0.1 + πC*0.02 + πD*0.22 + πE*0.17 + …*0.31 + …    (3)

It is clear from equations (2) and (3) that πA >> πB, provided that there are no other webpages on the internet that are more important (i.e. have a higher probability rank) than the pages described in the above table.

Therefore, we conclude that KundanBhaduri.com will have a higher PageRank than TechBend.com for the search term ‘TechBend blog’.

Transition Probabilities of KundanBhaduri.com and TechBend.com

Current page \ Next page   KundanBhaduri.com   TechBend.com   Engadget.com   TED.com   TechCrunch.com   …   Rest of the Internet
KundanBhaduri.com          0.6                 0.3            0.01           0.03      0.01             …   …
TechBend.com               0.42                0.1            0.12           0.01      0.11             …   …
Engadget.com               0.65                0.02           0.1            0.21      0.01             …   …
TED.com                    0.54                0.22           0.1            0         0.09             …   …
TechCrunch.com             0.64                0.17           0.13           0.01      0                …   …
…                          0.59                0.31           0.02           0.04      0.01             …   …
Rest of the Internet       …                   …              …              …         …                …   …



Exhibit 11: Computing a small Eigen value with Power Method

We know that: π = π*P

For a hypothetical P of the order 20X20, notice that most of the entries are zero. This considerably reduces the total cost of computing the π*P value, since every term that multiplies a zero entry of P contributes nothing to the sum.

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 4 0 0 0 4 0 0 4 0 0 9 0 0 7 0 0 0 1

0 9 0 0 6 0 0 12 0 8 0 0 8 0 0 5 0 0 2 0

8 0 7 0 0 8 0 0 4 0 2 0 2 0 5 0 0 6 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 5 3

0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 4 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 6 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0

0 0 5 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0

0 0 0 0 0 0 8 0 8 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0

0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 5 0 7 8 0 6 0 6 0 8 1 0

0 0 0 0 0 0 0 8 0 0 0 0 9 0 0 0 2 0 1 0

0 0 0 0 0 8 0 0 0 0 0 0 0 0 7 0 0 0 0 0

0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 5 5 0 0 0 0 0 8 0 0 7

Therefore $\pi_A = \sum_k \pi_k \, P_{kj}$ for j = 1, where k ranges over all the states.

Since most of the terms above are zero, we only need to account for rows 1 and 4 of the table above. Therefore, πA = 1 * πA + 8 * πD

This helps us work with a large Markov transition probability matrix at very little cost.
