
Matrices with Application to Page Rank


Anil Maheshwari

[email protected]
School of Computer Science
Carleton University
Canada


Introduction

Matrices

1. A Rectangular Array

2. Operations: Addition; Multiplication; Diagonalization; Transpose; Inverse; Determinant

3. Row Operations; Linear Equations; Gaussian Elimination

4. Types: Identity; Symmetric; Diagonal; Upper/Lower Triangular; Orthogonal; Orthonormal

5. Transformations - Eigenvalues and Eigenvectors

6. Rank; Column and Row Space; Null Space

7. Applications: Page Rank, Dimensionality Reduction, Recommender Systems, ...


Utility Matrix M

A matrix $M$ where rows represent users, columns represent items, and the entries of $M$ represent the ratings.

$$
M =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0\\
3 & 3 & 3 & 0 & 0\\
4 & 4 & 4 & 0 & 0\\
5 & 5 & 5 & 0 & 0\\
0 & 2 & 0 & 4 & 4\\
0 & 0 & 0 & 5 & 5\\
0 & 1 & 0 & 2 & 2
\end{bmatrix}
=
\begin{bmatrix}
.13 & -.02 & .01\\
.41 & -.07 & .03\\
.55 & -.10 & .04\\
.68 & -.11 & .05\\
.15 & .59 & -.65\\
.07 & .73 & .67\\
.07 & .29 & -.32
\end{bmatrix}
\begin{bmatrix}
12.5 & 0 & 0\\
0 & 9.5 & 0\\
0 & 0 & 1.35
\end{bmatrix}
\begin{bmatrix}
.56 & .59 & .56 & .09 & .09\\
-.12 & .02 & -.12 & .69 & .69\\
.40 & -.80 & .40 & .09 & .09
\end{bmatrix}
$$

Questions: How to guess missing entries? How to guess ratings for a new user? ...
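The factorization above has the form of a singular value decomposition $M = U\Sigma V^T$. Here is a minimal numpy sketch that computes it and uses a low-rank truncation to produce guesses for the missing (zero) entries; the matrix values are from the slide, everything else is illustrative.

```python
import numpy as np

# Utility matrix from the slide: rows = users, columns = items.
M = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

# Thin SVD: M = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(np.round(s, 2))  # first three are about 12.5, 9.5, 1.35, as on the slide

# Keep only the top-2 singular values; the rank-2 reconstruction smooths
# the matrix and yields plausible values where the original entries were 0.
k = 2
M2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(M2, 2))
```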


Matrix-Vector Product

Matrix-vector product $Ax = b$:

$$
\begin{bmatrix} 2 & 1\\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 4\\ -2 \end{bmatrix}
=
\begin{bmatrix} 2 \times 4 + 1 \times (-2)\\ 3 \times 4 + 4 \times (-2) \end{bmatrix}
=
\begin{bmatrix} 6\\ 4 \end{bmatrix}
$$


Matrix-Vector Product

$Ax = b$ as a linear combination of the columns:

$$
\begin{bmatrix} 2 & 1\\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 4\\ -2 \end{bmatrix}
= 4 \begin{bmatrix} 2\\ 3 \end{bmatrix}
- 2 \begin{bmatrix} 1\\ 4 \end{bmatrix}
$$
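A minimal numpy check of both views, using the numbers from the example above (the code itself is illustrative):

```python
import numpy as np

A = np.array([[2, 1], [3, 4]])
x = np.array([4, -2])

# Row view: the ordinary matrix-vector product.
b = A @ x                          # array([6, 4])

# Column view: the same b as a linear combination of A's columns.
b_cols = 4 * A[:, 0] + (-2) * A[:, 1]
assert np.array_equal(b, b_cols)
print(b)
```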


Matrix-Matrix Product

• Matrix-matrix product $A = BC$:

$$
\begin{bmatrix} 2 & 0\\ 3 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 4\\ 0 & 4 \end{bmatrix}
=
\begin{bmatrix} 4 & 8\\ 6 & 16 \end{bmatrix}
$$

• $A = BC$ as a sum of rank-1 matrices:

$$
\begin{bmatrix} 2 & 0\\ 3 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 4\\ 0 & 4 \end{bmatrix}
=
\begin{bmatrix} 2\\ 3 \end{bmatrix} \begin{bmatrix} 2 & 4 \end{bmatrix}
+
\begin{bmatrix} 0\\ 1 \end{bmatrix} \begin{bmatrix} 0 & 4 \end{bmatrix}
=
\begin{bmatrix} 4 & 8\\ 6 & 12 \end{bmatrix}
+
\begin{bmatrix} 0 & 0\\ 0 & 4 \end{bmatrix}
=
\begin{bmatrix} 4 & 8\\ 6 & 16 \end{bmatrix}
$$
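The same sum-of-outer-products view in numpy, a small sketch with the matrices above:

```python
import numpy as np

B = np.array([[2, 0], [3, 1]])
C = np.array([[2, 4], [0, 4]])

# A = BC as a sum of rank-1 (outer-product) terms:
# column i of B times row i of C.
A = sum(np.outer(B[:, i], C[i, :]) for i in range(B.shape[1]))
assert np.array_equal(A, B @ C)
print(A)  # [[ 4  8] [ 6 16]]
```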


Row Reduced Echelon Form

Let $A = \begin{bmatrix} 2 & 2 & 0\\ 2 & 4 & 8\\ 10 & 16 & 24 \end{bmatrix}$

1st pivot: replace $r_2$ by $r_2 - r_1$, and $r_3$ by $r_3 - 5r_1$:

$$\begin{bmatrix} 2 & 2 & 0\\ 0 & 2 & 8\\ 0 & 6 & 24 \end{bmatrix}$$

2nd pivot: replace $r_3$ by $r_3 - 3r_2$:

$$\begin{bmatrix} 2 & 2 & 0\\ 0 & 2 & 8\\ 0 & 0 & 0 \end{bmatrix}$$


RREF contd.

Divide the first row by 2 and the second row by 2:

$$\begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 4\\ 0 & 0 & 0 \end{bmatrix}$$

Replace $r_1$ by $r_1 - r_2$:

$$R = \begin{bmatrix} 1 & 0 & -4\\ 0 & 1 & 4\\ 0 & 0 & 0 \end{bmatrix}$$
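The same reduction can be checked with sympy (a sketch; `rref()` is sympy's built-in row reduction):

```python
from sympy import Matrix

A = Matrix([[2, 2, 0],
            [2, 4, 8],
            [10, 16, 24]])

R, pivot_cols = A.rref()
print(R)           # Matrix([[1, 0, -4], [0, 1, 4], [0, 0, 0]])
print(pivot_cols)  # (0, 1): two non-zero pivots, so rank(A) = 2
```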


Rank

$$
A = \begin{bmatrix} 2 & 2 & 0\\ 2 & 4 & 8\\ 10 & 16 & 24 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & -4\\ 0 & 1 & 4\\ 0 & 0 & 0 \end{bmatrix}
= R
$$

Definitions:

• Rank = number of non-zero pivots = 2

• Basis vectors of the row space = rows of $R$ corresponding to the non-zero pivots:
$v_1 = \begin{bmatrix} 1\\ 0\\ -4 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 0\\ 1\\ 4 \end{bmatrix}$

• Basis vectors of the column space = columns of $A$ corresponding to the non-zero pivots of $R$:
$u_1 = \begin{bmatrix} 2\\ 2\\ 10 \end{bmatrix}$ and $u_2 = \begin{bmatrix} 2\\ 4\\ 16 \end{bmatrix}$

• $A$ as a sum of rank-1 matrices:

$$
A = u_1 v_1^T + u_2 v_2^T
= \begin{bmatrix} 2\\ 2\\ 10 \end{bmatrix} \begin{bmatrix} 1 & 0 & -4 \end{bmatrix}
+ \begin{bmatrix} 2\\ 4\\ 16 \end{bmatrix} \begin{bmatrix} 0 & 1 & 4 \end{bmatrix}
$$


Null Space

Null space of $A$ = all vectors $x$ such that $Ax = 0$. This includes the zero vector $(0, 0, 0)$.

Is there a vector $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ such that

$$
Ax = x_1 \begin{bmatrix} 2\\ 2\\ 10 \end{bmatrix}
+ x_2 \begin{bmatrix} 2\\ 4\\ 16 \end{bmatrix}
+ x_3 \begin{bmatrix} 0\\ 8\\ 24 \end{bmatrix}
= \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \; ?
$$

$x = (1, -1, 1/4)$, or any of its scalar multiples, satisfies $Ax = 0$.

Dimension of the null space of $A$ = number of columns of $A$ $-$ rank$(A)$ = $3 - 2 = 1$.
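A quick sympy check of the null space (sketch; `nullspace()` returns basis vectors of the kernel):

```python
from sympy import Matrix

A = Matrix([[2, 2, 0],
            [2, 4, 8],
            [10, 16, 24]])

basis = A.nullspace()     # one basis vector, since 3 - rank(A) = 1
print(basis[0].T)         # Matrix([[4, -4, 1]]), a multiple of (1, -1, 1/4)
assert A * basis[0] == Matrix([0, 0, 0])
```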


Spaces for A

Let $A$ be an $m \times n$ matrix with real entries, and let $R$ be the RREF of $A$, with $r \le \min\{m, n\}$ non-zero pivots.

1. rank$(A) = r$

2. The column space is a subspace of $\mathbb{R}^m$ of dimension $r$; its basis vectors are the columns of $A$ corresponding to the non-zero pivots in $R$.

3. The row space is a subspace of $\mathbb{R}^n$ of dimension $r$; its basis vectors are the rows of $R$ corresponding to the non-zero pivots.

4. The null space of $A$ consists of all the vectors $x \in \mathbb{R}^n$ satisfying $Ax = 0$. They form a subspace of dimension $n - r$.


Eigenvalues

Eigenvalues and Eigenvectors

Given an $n \times n$ matrix $A$, a non-zero vector $v$ is an eigenvector of $A$ if $Av = \lambda v$ for some scalar $\lambda$; $\lambda$ is the eigenvalue corresponding to the vector $v$.

Example

Let $A = \begin{bmatrix} 2 & 1\\ 3 & 4 \end{bmatrix}$. Observe that

$$
\begin{bmatrix} 2 & 1\\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 1\\ 3 \end{bmatrix}
= 5 \begin{bmatrix} 1\\ 3 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 2 & 1\\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 1\\ -1 \end{bmatrix}
= 1 \begin{bmatrix} 1\\ -1 \end{bmatrix}
$$

Thus, $\lambda_1 = 5$ and $\lambda_2 = 1$ are the eigenvalues of $A$. The corresponding eigenvectors are $v_1 = [1, 3]$ and $v_2 = [1, -1]$, as $Av_1 = \lambda_1 v_1$ and $Av_2 = \lambda_2 v_2$.
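The same computation with numpy (an illustrative sketch; `eig` returns the eigenvalues together with unit-length eigenvectors as columns):

```python
import numpy as np

A = np.array([[2, 1], [3, 4]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)   # eigenvalues 5 and 1 (the order may differ)

# Columns of eigvecs are unit eigenvectors, i.e. scalar multiples
# of [1, 3] and [1, -1] from the example.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```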


Matrices with distinct eigenvalues

Property
Let $A$ be an $n \times n$ real matrix with $n$ distinct eigenvalues. Then the corresponding eigenvectors are linearly independent.

Proof: By contradiction. Let $\lambda_1, \ldots, \lambda_n$ be the distinct eigenvalues and $v_1, \ldots, v_n$ the corresponding eigenvectors, and suppose they are linearly dependent.

Assume $v_1, \ldots, v_{n-1}$ are linearly independent (otherwise work with a smaller set).

Dependence $\implies \alpha_1 v_1 + \ldots + \alpha_{n-1} v_{n-1} + \alpha_n v_n = 0$, where $\alpha_n \neq 0$. Note that not all of $\alpha_1, \ldots, \alpha_{n-1}$ can be zero, since otherwise $v_n = 0$, and eigenvectors are non-zero.

$$\implies v_n = -\frac{\alpha_1}{\alpha_n} v_1 + \ldots + -\frac{\alpha_{n-1}}{\alpha_n} v_{n-1}$$

Multiply by $A$: $\; Av_n = \lambda_n v_n = -\frac{\alpha_1}{\alpha_n} \lambda_1 v_1 + \ldots + -\frac{\alpha_{n-1}}{\alpha_n} \lambda_{n-1} v_{n-1}$

Multiply by $\lambda_n$: $\; \lambda_n v_n = -\frac{\alpha_1}{\alpha_n} \lambda_n v_1 + \ldots + -\frac{\alpha_{n-1}}{\alpha_n} \lambda_n v_{n-1}$

Subtract the last two equations:
$$0 = -\frac{\alpha_1}{\alpha_n} (\lambda_n - \lambda_1) v_1 + \ldots + -\frac{\alpha_{n-1}}{\alpha_n} (\lambda_n - \lambda_{n-1}) v_{n-1}$$

Since $\lambda_n - \lambda_i \neq 0$ and not all the $\alpha_i$ vanish, the vectors $v_1, \ldots, v_{n-1}$ are linearly dependent. A contradiction. $\square$

Matrices with distinct eigenvalues

Let $A$ be an $n \times n$ real matrix with $n$ distinct eigenvalues. Let $\lambda_1, \ldots, \lambda_n$ be the distinct eigenvalues and let $x_1, \ldots, x_n$ be the corresponding eigenvectors, respectively. Let each $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$.

Define the eigenvector matrix $X$, whose $i$-th column is $x_i$:

$$X = \begin{bmatrix} x_{11} & x_{21} & \ldots & x_{n1}\\ \vdots & \vdots & & \vdots\\ x_{1n} & x_{2n} & \ldots & x_{nn} \end{bmatrix}$$

Define a diagonal $n \times n$ matrix

$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \ldots & 0\\ 0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \ldots & \lambda_n \end{bmatrix}$$

Consider the matrix product $AX$:

$$AX = A \begin{bmatrix} x_1 & \ldots & x_n \end{bmatrix} = \begin{bmatrix} \lambda_1 x_1 & \ldots & \lambda_n x_n \end{bmatrix} = X\Lambda$$


Matrices with distinct eigenvalues

Since the eigenvectors are linearly independent, we know that $X^{-1}$ exists.

Multiplying $AX = X\Lambda$ by $X^{-1}$ on the left gives

$$X^{-1}AX = X^{-1}X\Lambda = \Lambda \tag{1}$$

and multiplying by $X^{-1}$ on the right gives

$$AXX^{-1} = A = X \Lambda X^{-1} \tag{2}$$

An Application of Diagonalization $A = X \Lambda X^{-1}$

Consider $A^2 = (X \Lambda X^{-1})(X \Lambda X^{-1}) = X \Lambda (X^{-1}X) \Lambda X^{-1} = X \Lambda^2 X^{-1}$.

$\implies A^2$ has the same eigenvectors as $A$, but its eigenvalues are squared.

Similarly, $A^k = X \Lambda^k X^{-1}$: the eigenvectors of $A^k$ are the same as those of $A$, and its eigenvalues are raised to the power $k$.
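A small numpy sketch of this identity, reusing the 2x2 example above (illustrative only):

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 4.0]])
lam, X = np.linalg.eig(A)          # A has distinct eigenvalues 5 and 1

k = 3
Ak_direct = np.linalg.matrix_power(A, k)
Ak_diag = X @ np.diag(lam**k) @ np.linalg.inv(X)   # X Λ^k X^{-1}
assert np.allclose(Ak_direct, Ak_diag)
print(np.round(Ak_diag, 6))
```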


Eigenvalues of $A^k$

Let $Av_i = \lambda_i v_i$.

Consider: $A^2 v_i = A(Av_i) = A(\lambda_i v_i) = \lambda_i (Av_i) = \lambda_i (\lambda_i v_i) = \lambda_i^2 v_i$

$\implies A^2 v_i = \lambda_i^2 v_i$

Eigenvalues of $A^k$
For an integer $k > 0$, $A^k$ has the same eigenvectors as $A$, but the eigenvalues are $\lambda_i^k$.


Symmetric Matrices

Example
Consider the symmetric matrix $S = \begin{bmatrix} 3 & 1\\ 1 & 3 \end{bmatrix}$.
Its eigenvalues are $\lambda_1 = 4$ and $\lambda_2 = 2$, and the corresponding eigenvectors are $q_1 = (1/\sqrt{2}, 1/\sqrt{2})$ and $q_2 = (1/\sqrt{2}, -1/\sqrt{2})$, respectively.
Note that the eigenvalues are real and the eigenvectors are orthonormal.

$$
S = \begin{bmatrix} 3 & 1\\ 1 & 3 \end{bmatrix}
= \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2}\\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}
\begin{bmatrix} 4 & 0\\ 0 & 2 \end{bmatrix}
\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2}\\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}
$$

Eigenvalues of Symmetric Matrices
All the eigenvalues of a real symmetric matrix $S$ are real. Moreover, all the components of the eigenvectors of a real symmetric matrix $S$ are real.


Symmetric Matrices (contd.)

Property
Any pair of eigenvectors of a real symmetric matrix $S$ corresponding to two different eigenvalues are orthogonal.

Proof: Let $q_1$ and $q_2$ be eigenvectors corresponding to $\lambda_1 \neq \lambda_2$, respectively. We have $Sq_1 = \lambda_1 q_1$ and $Sq_2 = \lambda_2 q_2$.
Now $(Sq_1)^T = q_1^T S^T = q_1^T S = \lambda_1 q_1^T$, as $S$ is symmetric.
Multiplying by $q_2$ on the right gives $\lambda_1 q_1^T q_2 = q_1^T S q_2 = q_1^T \lambda_2 q_2$.
Since $\lambda_1 \neq \lambda_2$ and $\lambda_1 q_1^T q_2 = \lambda_2 q_1^T q_2$, it follows that $q_1^T q_2 = 0$, and thus the eigenvectors $q_1$ and $q_2$ are orthogonal. $\square$


Symmetric Matrices (contd.)

Symmetric matrices with distinct eigenvalues
Let $S$ be an $n \times n$ symmetric matrix with $n$ distinct eigenvalues and let $q_1, \ldots, q_n$ be the corresponding orthonormal eigenvectors. Let $Q$ be the $n \times n$ matrix consisting of $q_1, \ldots, q_n$ as its columns. Then $S = Q \Lambda Q^{-1} = Q \Lambda Q^T$. Furthermore,

$$S = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T + \cdots + \lambda_n q_n q_n^T$$

$$
S = \begin{bmatrix} 3 & 1\\ 1 & 3 \end{bmatrix}
= 4 \begin{bmatrix} 1/\sqrt{2}\\ 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
+ 2 \begin{bmatrix} 1/\sqrt{2}\\ -1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}
$$


Summary for Symmetric Matrices

Theorem
For a real symmetric $n \times n$ matrix $S$:

1. All eigenvalues of $S$ are real.

2. $S$ can be expressed as $S = Q \Lambda Q^T$, where the columns of $Q$ form an orthonormal basis of $\mathbb{R}^n$ consisting of $n$ eigenvectors of $S$, and $\Lambda$ is a diagonal matrix of the $n$ eigenvalues of $S$.

3. $S$ can be expressed as a sum of rank-1 matrices:
$$S = \lambda_1 q_1 q_1^T + \ldots + \lambda_n q_n q_n^T$$

Note: Since the columns of $Q$ form a basis of $\mathbb{R}^n$, any vector $x$ can be expressed as a linear combination $x = \alpha_1 q_1 + \ldots + \alpha_n q_n$. Consider $x \cdot q_i = (\alpha_1 q_1 + \ldots + \alpha_n q_n) \cdot q_i = \alpha_i$.
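A numpy sketch of the spectral decomposition (illustrative; `eigh` is numpy's eigensolver for symmetric matrices and returns orthonormal eigenvectors):

```python
import numpy as np

S = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, Q = np.linalg.eigh(S)     # eigh: for symmetric/Hermitian matrices

# S = Q Λ Q^T
assert np.allclose(S, Q @ np.diag(lam) @ Q.T)

# S as a sum of rank-1 terms λ_i q_i q_i^T
S_sum = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))
assert np.allclose(S, S_sum)
print(lam)  # [2. 4.]
```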


Inverse of Symmetric Matrices

Claim
If $S = Q \Lambda Q^T$, then $S^{-1} = \frac{1}{\lambda_1} q_1 q_1^T + \ldots + \frac{1}{\lambda_n} q_n q_n^T$.

Proof sketch: $S = Q \Lambda Q^T = \lambda_1 q_1 q_1^T + \ldots + \lambda_n q_n q_n^T$, so

$$SS^{-1} = (\lambda_1 q_1 q_1^T + \ldots + \lambda_n q_n q_n^T)\left(\frac{1}{\lambda_1} q_1 q_1^T + \ldots + \frac{1}{\lambda_n} q_n q_n^T\right) = I,$$

since $q_1, \ldots, q_n$ are orthonormal: the cross terms vanish because $q_i^T q_j = 0$ for $i \neq j$, and the remaining terms sum to $q_1 q_1^T + \ldots + q_n q_n^T = QQ^T = I$. $\square$


Positive Definite Matrices

Positive Definite Matrices
A symmetric matrix $S$ is positive definite if all its eigenvalues are $> 0$. It is positive semi-definite if all its eigenvalues are $\geq 0$.

An Alternate Characterization
Let $S$ be an $n \times n$ real symmetric matrix. If $x^T S x > 0$ for all non-zero vectors $x \in \mathbb{R}^n$, then all the eigenvalues of $S$ are $> 0$.

Proof: Let $\lambda_i$ be an eigenvalue of $S$, and let the corresponding unit eigenvector be $q_i$; note that $q_i^T q_i = 1$. Since $S$ is symmetric, we know that $\lambda_i$ is real. Now $\lambda_i = \lambda_i q_i^T q_i = q_i^T \lambda_i q_i = q_i^T S q_i$. But $q_i^T S q_i > 0$, hence $\lambda_i > 0$. $\square$


Eigenvalue Identities

Trace
Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of an $n \times n$ real matrix $A$. Then

$$\mathrm{trace}(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} \lambda_i$$

Determinant

$$\det(A) = \prod_{i=1}^{n} \lambda_i$$


Markov Matrices

[Diagram: a three-state Markov chain on states P, Q, R, with the transition probabilities tabulated below]

Transition matrix, where entry (i, j) is the probability of moving from state j to state i:

        P     Q     R
  P     0    1/3   1/3
  Q    1/2    0    2/3
  R    1/2   2/3    0


Markov Chain

- Let $X_0, X_1, \ldots$ be a sequence of random variables that evolve over time: at time 0 we have $X_0$, followed by $X_1$ at time 1, and so on.
- Assume each $X_i$ takes a value from the set $\{1, \ldots, n\}$ that represents the set of states.
- This sequence is a Markov chain if the probability that $X_{m+1}$ equals a particular state $\alpha_{m+1} \in \{1, \ldots, n\}$ depends only on the state of $X_m$, and is completely independent of the states of $X_0, \ldots, X_{m-1}$.

Memoryless property:
$$P[X_{m+1} = \alpha_{m+1} \mid X_m = \alpha_m, X_{m-1} = \alpha_{m-1}, \ldots, X_0 = \alpha_0] = P[X_{m+1} = \alpha_{m+1} \mid X_m = \alpha_m],$$
where $\alpha_0, \ldots, \alpha_{m+1} \in \{1, \ldots, n\}$.

25

Memoryless Property

(The three-state chain on P, Q, R shown above illustrates this: the next state depends only on the current state.)



What is a Markov Matrix?

A square matrix $A$ is a Markov matrix if

1. $A[i, j]$ = probability of transition from state $j$ to state $i$.

2. The values within any column sum to 1 (the total probability of leaving a state for any of the possible states).


State Transitions

Start in an initial state and in each successive step make a transition from thecurrent state to the next state respecting the probabilities.

1. What is the probability of reaching state $j$ after taking $n$ steps starting from state $i$?

2. Given an initial probability vector representing the probabilities of starting in various states, what is the steady state? After traversing the chain for a large number of steps, what is the probability of landing in the various states?



Types of States

Recurrent State: A state $i$ is recurrent if, starting from state $i$, with probability 1 we return to state $i$ after making finitely many transitions.

Transient State: A state $i$ is transient if there is a non-zero probability of never returning to state $i$.

[Figure 1: a six-state chain; recurrent states = {1, 2, 3}, transient states = {4, 5, 6}]


Irreducible Markov Chains

A Markov chain is irreducible if it is possible to go between any pair of states in a finite number of steps. Otherwise it is called reducible.

Observation: If the underlying graph is strongly connected, then the chain is irreducible.



Aperiodic Markov Chains

Period of a state
The period of a state $i$ is the greatest common divisor (GCD) of all possible numbers of steps in which the chain can return to state $i$ starting from $i$.

Note: If there is no way to return to $i$ starting from $i$, then its period is undefined.

Aperiodic Markov Chain
A Markov chain is aperiodic if the period of each of its states is 1.


Eigenvalues of Markov Matrices

$$A = \begin{bmatrix} 0 & 1/3 & 1/3\\ 1/2 & 0 & 2/3\\ 1/2 & 2/3 & 0 \end{bmatrix}$$

The eigenvalues of $A$ are the roots of $\det(A - \lambda I) = 0$:

  Eigenvalue        Eigenvector
  λ1 = 1            v1 = (2/3, 1, 1)
  λ2 = −2/3         v2 = (0, −1, 1)
  λ3 = −1/3         v3 = (−2, 1, 1)

Observe: the largest (principal) eigenvalue is 1, and the corresponding (principal) eigenvector is $(2/3, 1, 1)$. Note that $Av_i = \lambda_i v_i$ for $i = 1, \ldots, 3$.

Any vector can be converted to a unit vector: $\frac{v}{\|v\|}$. Example: $v_1 = (\frac{2}{3}, 1, 1) \to \frac{3}{\sqrt{22}} (\frac{2}{3}, 1, 1)$.


Principal Eigenvalue of Markov Matrices

Principal Eigenvalue
The largest eigenvalue of a Markov matrix is 1.

See Notes on Algorithm Design for the proof.

Idea: Let $B = A^T$. The all-ones vector $\vec{1}$ is an eigenvector of $B$, as $B\vec{1} = 1 \cdot \vec{1}$ (each row of $B$ is a column of $A$ and sums to 1).
$\implies$ 1 is an eigenvalue of $A$, since $A$ and $A^T$ have the same eigenvalues.
Using contradiction, show that $B$ cannot have any eigenvalue $> 1$.


Eigenvalues of Powers of A

$$A = \begin{bmatrix} 0 & 1/3 & 1/3\\ 1/2 & 0 & 2/3\\ 1/2 & 2/3 & 0 \end{bmatrix}$$

Note that all the entries of $A^2$ are $> 0$ and the entries within each column still add to 1:

$$A^2 = \begin{bmatrix} 1/3 & 2/9 & 2/9\\ 1/3 & 11/18 & 1/6\\ 1/3 & 1/6 & 11/18 \end{bmatrix}$$

$A^k$ is Markovian
If the entries within each column of $A$ add to 1, then the entries within each column of $A^k$, for any integer $k > 0$, also add to 1.


Random Surfer Model

Initial: the surfer starts with probability vector $u_0 = (1/3, 1/3, 1/3)$.

$$u_1 = Au_0 = \begin{bmatrix} 0 & 1/3 & 1/3\\ 1/2 & 0 & 2/3\\ 1/2 & 2/3 & 0 \end{bmatrix} \begin{bmatrix} 1/3\\ 1/3\\ 1/3 \end{bmatrix} = \begin{bmatrix} 4/18\\ 7/18\\ 7/18 \end{bmatrix}$$

$$u_2 = Au_1 = \begin{bmatrix} 0 & 1/3 & 1/3\\ 1/2 & 0 & 2/3\\ 1/2 & 2/3 & 0 \end{bmatrix} \begin{bmatrix} 4/18\\ 7/18\\ 7/18 \end{bmatrix} = \begin{bmatrix} 7/27\\ 10/27\\ 10/27 \end{bmatrix}$$

Likewise, we compute
$u_3 = Au_2 = [20/81, 61/162, 61/162]$,
$u_4 = Au_3 = [61/243, 91/243, 91/243]$,
$u_5 = Au_4 = [182/729, 547/1458, 547/1458]$, ...

$u_\infty = [0.25, 0.375, 0.375] = [2/8, 3/8, 3/8]$
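A power-iteration sketch of the random surfer; the matrix and the limit $[2/8, 3/8, 3/8]$ are from the slides, while the code and the iteration count are illustrative:

```python
import numpy as np

A = np.array([[0, 1/3, 1/3],
              [1/2, 0, 2/3],
              [1/2, 2/3, 0]])

u = np.array([1/3, 1/3, 1/3])   # initial distribution
for _ in range(100):            # repeatedly apply A
    u = A @ u
print(u)                        # -> approximately [0.25, 0.375, 0.375]
```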


Linear Combination of Eigenvectors

$$u_0 = \begin{bmatrix} 1/3\\ 1/3\\ 1/3 \end{bmatrix} = c_1 \begin{bmatrix} 2/3\\ 1\\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix} + c_3 \begin{bmatrix} -2\\ 1\\ 1 \end{bmatrix}$$

$$u_1 = Au_0 = c_1 Av_1 + c_2 Av_2 + c_3 Av_3 = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + c_3 \lambda_3 v_3 \quad (\text{as } Av_i = \lambda_i v_i)$$

Thus,

$$u_1 = A \begin{bmatrix} 1/3\\ 1/3\\ 1/3 \end{bmatrix} = c_1 \lambda_1 \begin{bmatrix} 2/3\\ 1\\ 1 \end{bmatrix} + c_2 \lambda_2 \begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix} + c_3 \lambda_3 \begin{bmatrix} -2\\ 1\\ 1 \end{bmatrix}$$


Linear Combination of Eigenvectors (contd.)

$$u_2 = Au_1 = A^2 u_0 = c_1 \lambda_1^2 \begin{bmatrix} 2/3\\ 1\\ 1 \end{bmatrix} + c_2 \lambda_2^2 \begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix} + c_3 \lambda_3^2 \begin{bmatrix} -2\\ 1\\ 1 \end{bmatrix}$$

In general, for integer $k > 0$, $u_k = A^k u_0 = c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + c_3 \lambda_3^k v_3$, i.e.

$$u_k = A^k \begin{bmatrix} 1/3\\ 1/3\\ 1/3 \end{bmatrix} = c_1 \lambda_1^k \begin{bmatrix} 2/3\\ 1\\ 1 \end{bmatrix} + c_2 \lambda_2^k \begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix} + c_3 \lambda_3^k \begin{bmatrix} -2\\ 1\\ 1 \end{bmatrix}$$

and that equals

$$u_k = c_1 1^k \begin{bmatrix} 2/3\\ 1\\ 1 \end{bmatrix} + c_2 \left(-\frac{2}{3}\right)^k \begin{bmatrix} 0\\ -1\\ 1 \end{bmatrix} + c_3 \left(-\frac{1}{3}\right)^k \begin{bmatrix} -2\\ 1\\ 1 \end{bmatrix}$$


Linear Combination of Eigenvectors (contd.)

For large values of $k$, $(\frac{2}{3})^k \to 0$ and $(\frac{1}{3})^k \to 0$, so the above expression reduces to

$$u_k \approx c_1 \begin{bmatrix} 2/3\\ 1\\ 1 \end{bmatrix} = \frac{3}{8} \begin{bmatrix} 2/3\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} 2/8\\ 3/8\\ 3/8 \end{bmatrix}$$

Note that the value of $c_1$ is obtained by solving the equation $u_0 = c_1 v_1 + c_2 v_2 + c_3 v_3$ for $u_0 = [1/3, 1/3, 1/3]$.
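A sketch of that last step, solving $u_0 = c_1 v_1 + c_2 v_2 + c_3 v_3$ as a linear system (the vectors are from the slides):

```python
import numpy as np

# Eigenvectors as the columns of V, in the order v1, v2, v3.
V = np.array([[2/3, 0, -2],
              [1, -1, 1],
              [1, 1, 1]])
u0 = np.array([1/3, 1/3, 1/3])

c = np.linalg.solve(V, u0)
print(c)   # c1 = 3/8 = 0.375, so u_k -> c1 * v1 = [2/8, 3/8, 3/8]
```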


Linear Combination of Eigenvectors (contd.)

Suppose u0 = [1/4, 1/4, 1/2]

u1 = Au0 = [1/4, 11/24, 7/24]

u2 = Au1 = [1/4, 23/72, 31/72]

u3 = Au2 = [1/4, 89/216, 73/216]

. . .

u∞ = [2/8, 3/8, 3/8]


Convergence?

Entries in $A^k$
Assume that all the entries of a Markov matrix $A$, or of some finite power $A^k$ for some integer $k > 0$, are strictly $> 0$. Then $A$ corresponds to an irreducible aperiodic Markov chain.

Irreducible: for any pair of states $i$ and $j$, it is always possible to go from state $i$ to state $j$ in a finite number of steps with positive probability.

Period of a state $i$: the GCD of all possible numbers of steps in which the chain can return to state $i$ starting from $i$.

Aperiodic: the chain is aperiodic if the period of each of its states is 1.


Properties of Markov Matrix $A$, when $A^k > 0$

[Diagram: a three-state chain with transitions 1 → 2 (prob. 1), 2 → 2 (prob. 1/2), 2 → 3 (prob. 1/2), and 3 → 1 (prob. 1)]

$$A = \begin{bmatrix} 0 & 0 & 1\\ 1 & 1/2 & 0\\ 0 & 1/2 & 0 \end{bmatrix}$$

$$A^2 = \begin{bmatrix} 0 & 0 & 1\\ 1 & 1/2 & 0\\ 0 & 1/2 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1\\ 1 & 1/2 & 0\\ 0 & 1/2 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1/2 & 0\\ 1/2 & 1/4 & 1\\ 1/2 & 1/4 & 0 \end{bmatrix}$$

$$A^3 = \begin{bmatrix} 1/2 & 1/4 & 0\\ 1/4 & 5/8 & 1/2\\ 1/4 & 1/8 & 1/2 \end{bmatrix} \qquad A^4 = \begin{bmatrix} 1/4 & 1/8 & 1/2\\ 5/8 & 9/16 & 1/4\\ 1/8 & 5/16 & 1/4 \end{bmatrix}$$

$A^4 > 0$, and for all $k \geq 4$, $A^k > 0$. Hence $A$ corresponds to an irreducible aperiodic Markov chain.
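A short check of this claim (sketch; the matrix is from the slide):

```python
import numpy as np

A = np.array([[0, 0, 1],
              [1, 1/2, 0],
              [0, 1/2, 0]])

for k in range(1, 6):
    Ak = np.linalg.matrix_power(A, k)
    print(k, bool((Ak > 0).all()))          # first True at k = 4
    assert np.allclose(Ak.sum(axis=0), 1)   # columns still sum to 1
```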


Perron-Frobenius Theorem

Assume A corresponds to an irreducible aperiodic Markov chain M .

Perron-Frobenius Theorem from linear algebra states that

1. The largest eigenvalue of $A$ is 1, and it is unique.

2. All other eigenvalues of $A$ have magnitude strictly smaller than 1.

3. All the coordinates of the eigenvector $v_1$ corresponding to the eigenvalue 1 are $> 0$.

4. The steady state corresponds to the eigenvector $v_1$.


PageRank

PageRank Algorithm

Problem: How to rank the web-pages?

Ranking assigns a real number to each web-page: the higher the number, the more important the page. The ranking needs to be automated, as the web is extremely large.

We will study the PageRank algorithm.

Source: Page, Brin, Motwani, Winograd, The PageRank Citation Ranking: Bringing Order to the Web, technical report, 1998.


Web as a Graph

- $G = (V, E)$ is a positively weighted directed graph.
- Each web-page is a vertex of $G$.
- If web-page $u$ points (links) to web-page $v$, there is a directed edge from $u$ to $v$.
- The weight of the edge $uv$ is $\frac{1}{\text{out-degree}(u)}$.

Assume $V = \{v_1, \ldots, v_n\}$. The $n \times n$ adjacency matrix $M$ of $G$ is:

$$M(i, j) = \begin{cases} \frac{1}{\text{out-degree}(v_j)}, & \text{if } v_j v_i \in E\\ 0, & \text{otherwise} \end{cases}$$

Assumption: a surfer makes a random transition from a web-page to one of the pages it points to.


An Example

[Diagram: a five-page web graph on $v_1, \ldots, v_5$; page $v_5$ has no outgoing links]

$$M = \begin{bmatrix} 0 & 0 & 1/2 & 1/3 & 0\\ 1/2 & 0 & 0 & 0 & 0\\ 1/2 & 1/2 & 0 & 0 & 0\\ 0 & 1/2 & 1/2 & 1/3 & 0\\ 0 & 0 & 0 & 1/3 & 0 \end{bmatrix}$$


Remarks

1. Assumes users will visit useful pages rather than useless pages.

2. Random Surfer Model: assume that initially a web-surfer is equally likely to be at any node of $G$, given by the vector $v_0 = (1/|V|, \ldots, 1/|V|)$.

3. In each step the surfer makes a transition: $v_1 = Mv_0$, $v_2 = Mv_1 = M^2 v_0$, ..., $v_k = Mv_{k-1} = M^k v_0$.

4. We need to worry about sink nodes (dead ends), about circling within the same set of nodes, and about whether we will reach a steady state.


Abstract representation of a web graph

[Diagram: in-component → strongly connected component → out-component]

• In-Component: nodes that can reach the strongly connected component.

• Out-Component: nodes that can be reached from the strongly connected component.

• There may be multiple copies of the above configuration.


Avoiding Sink Nodes

Idea: make sink nodes point to all the nodes. Here the sink $v_5$'s column is replaced by a uniform column:

$$M = \begin{bmatrix} 0 & 0 & 1/2 & 1/3 & 0\\ 1/2 & 0 & 0 & 0 & 0\\ 1/2 & 1/2 & 0 & 0 & 0\\ 0 & 1/2 & 1/2 & 1/3 & 0\\ 0 & 0 & 0 & 1/3 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 0 & 1/2 & 1/3 & 1/5\\ 1/2 & 0 & 0 & 0 & 1/5\\ 1/2 & 1/2 & 0 & 0 & 1/5\\ 0 & 1/2 & 1/2 & 1/3 & 1/5\\ 0 & 0 & 0 & 1/3 & 1/5 \end{bmatrix} = Q$$


Teleportation - Key Idea

Define $K = \alpha Q + \frac{1-\alpha}{n} E$

Teleportation parameter: $0 < \alpha < 1$, e.g. $\alpha = 0.9$.

$E$ is an $n \times n$ matrix of all 1s.

Observations on $K$:

1. Each entry of $K$ is $> 0$.

2. The entries within each column sum to 1.

3. $K$ satisfies the requirements of an irreducible aperiodic Markov chain.

4. Its largest eigenvalue is 1.

5. By the Perron-Frobenius Theorem, the steady state (= the page ranks) corresponds to the principal eigenvector, as illustrated in the sketch below.
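A minimal end-to-end sketch using the five-page matrix $Q$ and $\alpha = 0.9$ from the slides; the use of power iteration and the iteration count are illustrative choices, not part of the original slides:

```python
import numpy as np

# Q: the link matrix above, with the sink column v5 replaced by 1/5's.
Q = np.array([[0, 0, 1/2, 1/3, 1/5],
              [1/2, 0, 0, 0, 1/5],
              [1/2, 1/2, 0, 0, 1/5],
              [0, 1/2, 1/2, 1/3, 1/5],
              [0, 0, 0, 1/3, 1/5]])

n = Q.shape[0]
alpha = 0.9
K = alpha * Q + (1 - alpha) / n * np.ones((n, n))   # K = αQ + (1-α)/n E

# Power iteration: the steady state is the principal eigenvector of K,
# and its entries are the page ranks.
r = np.full(n, 1 / n)
for _ in range(100):
    r = K @ r
print(np.round(r, 4))   # page ranks of v1, ..., v5
```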


Conclusions

Computational issues: $K = \alpha Q + \frac{1-\alpha}{n} E$, where $Q$ is sparse and $E$ is special.

Favoring pages: teleport to specific pages, or to topic-sensitive pages (Sports, Business, Science, News, ...) based on the profile of the user.

Caution: the real story is not that simple.


References

1. Link Analysis chapter in mmds.org (Mining of Massive Datasets).

2. Chapter on Matrices in CS in my notes on algorithm design.

3. Page, Brin, Motwani, Winograd, The PageRank Citation Ranking: Bringing Order to the Web, technical report, 1998.

4. Brin and Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks 56(18): 3825-3833, reprinted in 2012.
