Matrices with Application to Page Rank
Anil Maheshwari
[email protected]
School of Computer Science
Carleton University
Canada
Matrices
1. A Rectangular Array
2. Operations: Addition; Multiplication; Diagonalization; Transpose; Inverse; Determinant
3. Row Operations; Linear Equations; Gaussian Elimination
4. Types: Identity; Symmetric; Diagonal; Upper/Lower Triangular; Orthogonal; Orthonormal
5. Transformations - Eigenvalues and Eigenvectors
6. Rank; Column and Row Space; Null Space
7. Applications: Page Rank, Dimensionality Reduction, Recommender Systems, . . .
Utility Matrix M
A matrix M where rows represent users, columns represent items, and the entries of M represent the ratings.

M =
[ 1 1 1 0 0 ]
[ 3 3 3 0 0 ]
[ 4 4 4 0 0 ]
[ 5 5 5 0 0 ]
[ 0 2 0 4 4 ]
[ 0 0 0 5 5 ]
[ 0 1 0 2 2 ]

which factors as a product of three matrices (its singular value decomposition):

[ .13 −.02  .01 ]
[ .41 −.07  .03 ]
[ .55 −.10  .04 ]   [ 12.5  0    0    ]   [  .56  .59  .56 .09 .09 ]
[ .68 −.11  .05 ]   [  0    9.5  0    ]   [ −.12  .02 −.12 .69 .69 ]
[ .15  .59 −.65 ]   [  0    0    1.35 ]   [  .40 −.80  .40 .09 .09 ]
[ .07  .73  .67 ]
[ .07  .29 −.32 ]

Questions: How to guess the missing entries? How to guess the ratings for a new user? . . .
Matrix Vector Product
Matrix-vector product: Ax = b

[ 2 1 ] [  4 ]   [ 2 × 4 + 1 × (−2) ]   [ 6 ]
[ 3 4 ] [ −2 ] = [ 3 × 4 + 4 × (−2) ] = [ 4 ]
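The same product can be checked numerically; a minimal numpy sketch of the 2 × 2 example above:

```python
import numpy as np

# The 2x2 example from this slide: b = Ax, computed row by row.
A = np.array([[2, 1],
              [3, 4]])
x = np.array([4, -2])

b = A @ x  # entry i is the dot product of row i of A with x
print(b)   # [6 4]
```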
Matrix Vector Product
Ax = b as a linear combination of the columns of A:

[ 2 1 ] [  4 ]     [ 2 ]     [ 1 ]
[ 3 4 ] [ −2 ] = 4 [ 3 ] − 2 [ 4 ]
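The column view can be verified directly; a small numpy check:

```python
import numpy as np

# Ax equals the combination of A's columns weighted by the entries of x.
A = np.array([[2, 1],
              [3, 4]])
x = np.array([4, -2])

b = 4 * A[:, 0] + (-2) * A[:, 1]
assert np.array_equal(b, A @ x)
print(b)  # [6 4]
```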
Matrix-Matrix Product
• Matrix-matrix product A = BC:

[ 2 0 ] [ 2 4 ]   [ 4  8 ]
[ 3 1 ] [ 0 4 ] = [ 6 16 ]

• A = BC as a sum of rank 1 matrices (i-th column of B times i-th row of C):

[ 2 0 ] [ 2 4 ]   [ 2 ]           [ 0 ]
[ 3 1 ] [ 0 4 ] = [ 3 ] [ 2 4 ] + [ 1 ] [ 0 4 ]

                = [ 4  8 ]   [ 0 0 ]   [ 4  8 ]
                  [ 6 12 ] + [ 0 4 ] = [ 6 16 ]
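Both views of the product can be computed in numpy; a short sketch of the example above:

```python
import numpy as np

# A = BC two ways: the ordinary product, and the sum of rank-1
# matrices (outer product of column i of B with row i of C).
B = np.array([[2, 0],
              [3, 1]])
C = np.array([[2, 4],
              [0, 4]])

A = B @ C
rank1_sum = np.outer(B[:, 0], C[0, :]) + np.outer(B[:, 1], C[1, :])

assert np.array_equal(A, rank1_sum)
print(A)
```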
Row Reduced Echelon Form
Let A =
[  2  2  0 ]
[  2  4  8 ]
[ 10 16 24 ]

1st Pivot: Replace r2 by r2 − r1, and r3 by r3 − 5r1:
[ 2 2  0 ]
[ 0 2  8 ]
[ 0 6 24 ]

2nd Pivot: Replace r3 by r3 − 3r2:
[ 2 2 0 ]
[ 0 2 8 ]
[ 0 0 0 ]
RREF contd.
Divide the first row by 2 and the second row by 2:
[ 1 1 0 ]
[ 0 1 4 ]
[ 0 0 0 ]

Replace r1 by r1 − r2:

R =
[ 1 0 −4 ]
[ 0 1  4 ]
[ 0 0  0 ]
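The elimination steps above can be sketched as a small routine using exact rational arithmetic (a minimal illustration, not an industrial-strength implementation):

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix (list of rows) to reduced row echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(m), len(m[0])
    pivot_row = 0
    for col in range(ncols):
        # Find a row at or below pivot_row with a non-zero entry here.
        pivot = next((r for r in range(pivot_row, nrows) if m[r][col]), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        # Scale so the pivot entry becomes 1.
        scale = m[pivot_row][col]
        m[pivot_row] = [x / scale for x in m[pivot_row]]
        # Eliminate this column from every other row.
        for r in range(nrows):
            if r != pivot_row and m[r][col]:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

R = rref([[2, 2, 0], [2, 4, 8], [10, 16, 24]])
assert R == [[1, 0, -4], [0, 1, 4], [0, 0, 0]]
```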
Rank
A =
[  2  2  0 ]
[  2  4  8 ]
[ 10 16 24 ]

--RREF-->

R =
[ 1 0 −4 ]
[ 0 1  4 ]
[ 0 0  0 ]

Definitions:
• Rank = number of non-zero pivots = 2
• Basis vectors of the row space = rows of R corresponding to the non-zero pivots:
  v1 = [1, 0, −4] and v2 = [0, 1, 4]
• Basis vectors of the column space = columns of A corresponding to the non-zero pivots of R:
  u1 = [2, 2, 10] and u2 = [2, 4, 16]
• A as a sum of rank 1 matrices:

A = u1 v1^T + u2 v2^T =
[  2 ]               [  2 ]
[  2 ] [ 1 0 −4 ] +  [  4 ] [ 0 1 4 ]
[ 10 ]               [ 16 ]
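The rank-1 decomposition above can be verified in numpy:

```python
import numpy as np

A = np.array([[2, 2, 0],
              [2, 4, 8],
              [10, 16, 24]])

# Pivot columns of A, and pivot rows of its RREF (from this slide).
u1, u2 = A[:, 0], A[:, 1]
v1 = np.array([1, 0, -4])
v2 = np.array([0, 1, 4])

# A equals the sum of the two rank-1 outer products u_i v_i^T.
assert np.array_equal(A, np.outer(u1, v1) + np.outer(u2, v2))
print(np.linalg.matrix_rank(A))  # 2
```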
Null Space
Null space of A = all vectors x such that Ax = 0. This includes the 0 vector.

Is there a non-zero vector x = (x1, x2, x3) ∈ R^3 such that

Ax = x1 [2, 2, 10]^T + x2 [2, 4, 16]^T + x3 [0, 8, 24]^T = [0, 0, 0]^T ?

x = (1, −1, 1/4), or any of its scalar multiples, satisfies Ax = 0.

Dimension of the null space of A = number of columns of A − rank(A) = 3 − 2 = 1.
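A quick numerical check that the vector above lies in the null space:

```python
import numpy as np

A = np.array([[2, 2, 0],
              [2, 4, 8],
              [10, 16, 24]], dtype=float)

# x = (1, -1, 1/4) from this slide satisfies Ax = 0.
x = np.array([1.0, -1.0, 0.25])
assert np.allclose(A @ x, 0)
```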
Spaces for A
Let A be an m × n matrix with real entries, and let R be the RREF of A with r ≤ min{m, n} non-zero pivots.

1. rank(A) = r
2. The column space is a subspace of R^m of dimension r; its basis vectors are the columns of A corresponding to the non-zero pivots in R.
3. The row space is a subspace of R^n of dimension r; its basis vectors are the rows of R corresponding to the non-zero pivots.
4. The null space of A consists of all vectors x ∈ R^n satisfying Ax = 0. These vectors form a subspace of dimension n − r.
Eigenvalues and Eigenvectors
Given an n × n matrix A, a non-zero vector v is an eigenvector of A if Av = λv for some scalar λ; λ is the eigenvalue corresponding to v.
Example
Let A =
[ 2 1 ]
[ 3 4 ]

Observe that

[ 2 1 ] [ 1 ]     [ 1 ]        [ 2 1 ] [  1 ]     [  1 ]
[ 3 4 ] [ 3 ] = 5 [ 3 ]  and   [ 3 4 ] [ −1 ] = 1 [ −1 ]

Thus λ1 = 5 and λ2 = 1 are the eigenvalues of A. The corresponding eigenvectors are v1 = [1, 3] and v2 = [1, −1], as Av1 = λ1v1 and Av2 = λ2v2.
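The eigenvalues and eigenvectors of this example can be recovered with numpy:

```python
import numpy as np

A = np.array([[2, 1],
              [3, 4]])

# Columns of vecs are (unit-length) eigenvectors of A.
vals, vecs = np.linalg.eig(A)
assert np.allclose(np.sort(vals), [1, 5])

# Check the defining property A v = lambda v for each pair.
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)
```

Note that `eig` returns normalized eigenvectors, so they are scalar multiples of the [1, 3] and [1, −1] on the slide.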
Matrices with distinct eigenvalues
Property
Let A be an n × n real matrix with n distinct eigenvalues. Then the corresponding eigenvectors are linearly independent.

Proof: By contradiction. Let λ1, . . . , λn be the distinct eigenvalues and v1, . . . , vn the corresponding eigenvectors, and suppose the eigenvectors are linearly dependent.

Assume v1, . . . , vn−1 are linearly independent (otherwise work with a smaller set).

Dependence ⟹ α1v1 + . . . + αn−1vn−1 + αnvn = 0, where αn ≠ 0.

⟹ vn = (−α1/αn) v1 + . . . + (−αn−1/αn) vn−1

Multiply by A: Avn = λnvn = (−α1/αn) λ1 v1 + . . . + (−αn−1/αn) λn−1 vn−1

Multiply the expression for vn by λn: λnvn = (−α1/αn) λn v1 + . . . + (−αn−1/αn) λn vn−1

Subtract the last two equations:
0 = (−α1/αn)(λn − λ1) v1 + . . . + (−αn−1/αn)(λn − λn−1) vn−1

Since λn − λi ≠ 0 for all i < n, this implies that the vectors v1, . . . , vn−1 are linearly dependent. A contradiction. □
Matrices with distinct eigenvalues
Let A be an n × n real matrix with n distinct eigenvalues λ1, . . . , λn, and let x1, . . . , xn be the corresponding eigenvectors, where xi = [xi1, xi2, . . . , xin].

Define the eigenvector matrix X whose i-th column is xi:

X =
[ x11 x21 . . . xn1 ]
[  ·   ·         ·  ]
[ x1n x2n . . . xnn ]

Define a diagonal n × n matrix Λ of the eigenvalues:

Λ =
[ λ1  0  . . .  0 ]
[  0 λ2  . . .  0 ]
[  ·  ·         · ]
[  0  0  . . . λn ]

Consider the matrix product AX:

AX = A [ x1 . . . xn ] = [ λ1x1 . . . λnxn ] = XΛ
Matrices with distinct eigenvalues
Since the eigenvectors are linearly independent, we know that X−1 exists.

Multiply AX = XΛ by X−1 on the left:

X−1AX = X−1XΛ = Λ (1)

and multiply by X−1 on the right:

AXX−1 = A = XΛX−1 (2)

An Application of Diagonalization A = XΛX−1

Consider A² = (XΛX−1)(XΛX−1) = XΛ(X−1X)ΛX−1 = XΛ²X−1

⟹ A² has the same eigenvectors as A, but its eigenvalues are squared.

Similarly, A^k = XΛ^kX−1: the eigenvectors of A^k are the same as those of A, and its eigenvalues are raised to the power k.
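The identity A^k = XΛ^kX−1 can be verified numerically:

```python
import numpy as np

A = np.array([[2, 1],
              [3, 4]], dtype=float)

vals, X = np.linalg.eig(A)       # A = X diag(vals) X^{-1}
k = 5
Lambda_k = np.diag(vals ** k)    # eigenvalues raised to the power k

# A^k via diagonalization matches repeated multiplication.
Ak = X @ Lambda_k @ np.linalg.inv(X)
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```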
Eigenvalues of A^k

Let Avi = λivi.

Consider: A²vi = A(Avi) = A(λivi) = λi(Avi) = λi(λivi) = λi²vi

⟹ A²vi = λi²vi

Eigenvalues of A^k
For an integer k > 0, A^k has the same eigenvectors as A, but the eigenvalues are λi^k.
Symmetric Matrices
Example
Consider the symmetric matrix S = [ 3 1 ; 1 3 ]. Its eigenvalues are λ1 = 4 and λ2 = 2, and the corresponding eigenvectors are q1 = (1/√2, 1/√2) and q2 = (1/√2, −1/√2), respectively. Note that the eigenvalues are real and the eigenvectors are orthonormal.

S =
[ 3 1 ]   [ 1/√2  1/√2 ] [ 4 0 ] [ 1/√2  1/√2 ]
[ 1 3 ] = [ 1/√2 −1/√2 ] [ 0 2 ] [ 1/√2 −1/√2 ]

Eigenvalues of Symmetric Matrices
All the eigenvalues of a real symmetric matrix S are real. Moreover, all components of the eigenvectors of a real symmetric matrix S are real.
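For symmetric matrices, numpy offers `eigh`, which exploits exactly these properties; a quick check of the factorization above:

```python
import numpy as np

S = np.array([[3, 1],
              [1, 3]])

# eigh is specialized to symmetric matrices: real eigenvalues in
# ascending order, orthonormal eigenvectors as the columns of Q.
vals, Q = np.linalg.eigh(S)
assert np.allclose(vals, [2, 4])
assert np.allclose(Q @ np.diag(vals) @ Q.T, S)   # S = Q Lambda Q^T
assert np.allclose(Q.T @ Q, np.eye(2))           # columns orthonormal
```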
Symmetric Matrices (contd.)
Property
Any pair of eigenvectors of a real symmetric matrix S corresponding to two different eigenvalues are orthogonal.

Proof: Let q1 and q2 be eigenvectors corresponding to λ1 ≠ λ2, respectively. We have Sq1 = λ1q1 and Sq2 = λ2q2.

Now (Sq1)^T = q1^T S^T = q1^T S = λ1 q1^T, as S is symmetric.

Multiply by q2 on the right: λ1 q1^T q2 = q1^T S q2 = q1^T λ2 q2 = λ2 q1^T q2.

Since λ1 ≠ λ2, this implies that q1^T q2 = 0, and thus the eigenvectors q1 and q2 are orthogonal. □
Symmetric Matrices (contd.)
Symmetric matrices with distinct eigenvalues
Let S be an n × n symmetric matrix with n distinct eigenvalues, and let q1, . . . , qn be the corresponding orthonormal eigenvectors. Let Q be the n × n matrix consisting of q1, . . . , qn as its columns. Then S = QΛQ−1 = QΛQ^T. Furthermore,

S = λ1 q1 q1^T + λ2 q2 q2^T + · · · + λn qn qn^T

S =
[ 3 1 ]     [ 1/√2 ]                    [  1/√2 ]
[ 1 3 ] = 4 [ 1/√2 ] [ 1/√2  1/√2 ] + 2 [ −1/√2 ] [ 1/√2  −1/√2 ]
Summary for Symmetric Matrices
Theorem
For a real symmetric n × n matrix S:

1. All eigenvalues of S are real.
2. S can be expressed as S = QΛQ^T, where the columns of Q form an orthonormal basis of R^n consisting of n eigenvectors of S, and Λ is a diagonal matrix of the n eigenvalues of S.
3. S can be expressed as a sum of rank 1 matrices:

S = λ1 q1 q1^T + . . . + λn qn qn^T

Note: Since the columns of Q form a basis of R^n, any vector x can be expressed as a linear combination x = α1q1 + . . . + αnqn.

Consider x · qi = (α1q1 + . . . + αnqn) · qi = αi, by orthonormality.
Inverse of Symmetric Matrices
Claim
If S = QΛQ^T (with all λi ≠ 0), then S−1 = (1/λ1) q1 q1^T + . . . + (1/λn) qn qn^T.

Proof sketch: S = QΛQ^T = λ1 q1 q1^T + . . . + λn qn qn^T, so

S S−1 = (λ1 q1 q1^T + . . . + λn qn qn^T)((1/λ1) q1 q1^T + . . . + (1/λn) qn qn^T) = I,

since q1, . . . , qn are orthonormal: the cross terms contain qi^T qj = 0 for i ≠ j, and qi^T qi = 1. □
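The spectral form of the inverse is easy to verify numerically:

```python
import numpy as np

S = np.array([[3, 1],
              [1, 3]], dtype=float)
vals, Q = np.linalg.eigh(S)

# S^{-1} as the spectral sum of (1/lambda_i) q_i q_i^T over all i.
S_inv = sum(np.outer(q, q) / lam for lam, q in zip(vals, Q.T))
assert np.allclose(S_inv, np.linalg.inv(S))
```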
Positive Definite Matrices
Positive Definite Matrices
A symmetric matrix S is positive definite if all its eigenvalues are > 0. It is positive semi-definite if all its eigenvalues are ≥ 0.

An Alternate Characterization
Let S be an n × n real symmetric matrix. If x^T S x > 0 for all non-zero vectors x ∈ R^n, then all the eigenvalues of S are > 0.

Proof: Let λi be an eigenvalue of S and let qi be the corresponding unit eigenvector, so qi^T qi = 1. Since S is symmetric, λi is real. Now

λi = λi qi^T qi = qi^T λi qi = qi^T S qi > 0. □
Eigenvalue Identities
Trace
Let λ1, . . . , λn be the eigenvalues of an n × n real matrix A. Then

trace(A) = a11 + a22 + . . . + ann = λ1 + λ2 + . . . + λn

Determinant

det(A) = λ1 λ2 · · · λn
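Both identities can be checked on the 2 × 2 example used earlier (eigenvalues 5 and 1):

```python
import numpy as np

A = np.array([[2, 1],
              [3, 4]], dtype=float)
vals = np.linalg.eigvals(A)   # eigenvalues 5 and 1 (earlier slide)

assert np.isclose(np.trace(A), vals.sum())        # 6 = 5 + 1
assert np.isclose(np.linalg.det(A), vals.prod())  # 5 = 5 * 1
```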
Markov Chain
- Let X0, X1, . . . be a sequence of random variables that evolve over time: at time 0 we have X0, followed by X1 at time 1, and so on.
- Assume each Xi takes a value from the set {1, . . . , n} that represents the set of states.
- This sequence is a Markov chain if the probability that Xm+1 equals a particular state αm+1 ∈ {1, . . . , n} depends only on the state of Xm, and is completely independent of the states of X0, . . . , Xm−1.

Memoryless property:
P[Xm+1 = αm+1 | Xm = αm, Xm−1 = αm−1, . . . , X0 = α0] = P[Xm+1 = αm+1 | Xm = αm], where α0, . . . , αm+1 ∈ {1, . . . , n}
Markov Matrices
What is a Markov Matrix?
A square matrix A is a Markov matrix if

1. A[i, j] = probability of a transition from state j to state i.
2. The values within any column sum to 1 (the total probability of leaving a state for any of the possible states).
State Transitions
Start in an initial state, and in each successive step make a transition from the current state to the next state respecting the probabilities.

1. What is the probability of reaching state j after taking n steps starting from state i?
2. Given an initial probability vector representing the probabilities of starting in various states, what is the steady state? After traversing the chain for a large number of steps, what is the probability of landing in the various states?

(Figure: a 3-state chain on states P, Q, R with transition probabilities 1/3, 2/3, 1/2, 1/2, 1/3, 2/3 on its edges.)
Types of States
Recurrent State: A state i is recurrent if, starting from state i, the chain returns to state i after finitely many transitions with probability 1.

Transient State: A state i is transient if there is a non-zero probability of never returning to state i.

(Figure: a 6-state transition graph on states 1, . . . , 6.)

Figure 1: Recurrent States = {1, 2, 3}. Transient States = {4, 5, 6}
Irreducible Markov Chains
A Markov chain is irreducible if it is possible to go between any pair of states in a finite number of steps. Otherwise it is called reducible.

Observation: If the transition graph is strongly connected, then the chain is irreducible.

(Figure: the 6-state transition graph from Figure 1.)
Aperiodic Markov Chains
Period of a state
The period of a state i is the greatest common divisor (GCD) of all possible numbers of steps in which the chain can return to state i starting from i.

Note: If there is no way to return to i starting from i, then its period is undefined.

Aperiodic Markov Chain
A Markov chain is aperiodic if the period of each of its states is 1.
Eigenvalues of Markov Matrices
A =
[  0  1/3 1/3 ]
[ 1/2  0  2/3 ]
[ 1/2 2/3  0  ]

The eigenvalues of A are the roots of det(A − λI) = 0:

Eigenvalue      Eigenvector
λ1 = 1          v1 = (2/3, 1, 1)
λ2 = −2/3       v2 = (0, −1, 1)
λ3 = −1/3       v3 = (−2, 1, 1)

Observe: the largest (principal) eigenvalue is 1, and the corresponding (principal) eigenvector is (2/3, 1, 1). Note that Avi = λivi for i = 1, . . . , 3.

Any vector can be converted to a unit vector: v/||v||.
Example: v1 = (2/3, 1, 1) → (3/√22)(2/3, 1, 1)
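The eigenvalue table above can be reproduced with numpy:

```python
import numpy as np

A = np.array([[0, 1/3, 1/3],
              [1/2, 0, 2/3],
              [1/2, 2/3, 0]])

vals, vecs = np.linalg.eig(A)
# Eigenvalues of this Markov matrix: 1, -2/3, -1/3.
assert np.allclose(np.sort(vals), [-2/3, -1/3, 1])

# The principal eigenvector, rescaled so its last entry is 1,
# is proportional to (2/3, 1, 1).
v1 = vecs[:, np.argmax(vals)]
assert np.allclose(v1 / v1[-1], [2/3, 1, 1])
```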
Principal Eigenvalue of Markov Matrices
Principal Eigenvalue
The largest eigenvalue of a Markov matrix is 1.

See Notes on Algorithm Design for the proof.

Idea: Let B = A^T. The all-ones vector 1 is an eigenvector of B, as B · 1 = 1 · 1 (each row of B sums to 1). ⟹ 1 is an eigenvalue of A, since A and A^T have the same eigenvalues.

Using contradiction, show that B cannot have any eigenvalue > 1.
Eigenvalues of Powers of A
A =
[  0  1/3 1/3 ]
[ 1/2  0  2/3 ]
[ 1/2 2/3  0  ]

Note that all the entries in A² are > 0, and the entries within each column still add to 1:

A² =
[ 1/3  2/9   2/9  ]
[ 1/3 11/18  1/6  ]
[ 1/3  1/6  11/18 ]

A^k is Markovian
If the entries within each column of A add to 1, then the entries within each column of A^k, for any integer k > 0, will add to 1.
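A quick numerical check of these two facts about A²:

```python
import numpy as np

A = np.array([[0, 1/3, 1/3],
              [1/2, 0, 2/3],
              [1/2, 2/3, 0]])

A2 = A @ A
assert (A2 > 0).all()                  # every entry of A^2 is positive
assert np.allclose(A2.sum(axis=0), 1)  # each column still sums to 1
assert np.isclose(A2[1, 1], 11/18)     # e.g. the middle entry is 11/18
```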
Random Surfer Model
Initial: surfer with probability vector u0 = (1/3, 1/3, 1/3)

u1 = A u0 =
[  0  1/3 1/3 ] [ 1/3 ]   [ 4/18 ]
[ 1/2  0  2/3 ] [ 1/3 ] = [ 7/18 ]
[ 1/2 2/3  0  ] [ 1/3 ]   [ 7/18 ]

u2 = A u1 =
[  0  1/3 1/3 ] [ 4/18 ]   [  7/27 ]
[ 1/2  0  2/3 ] [ 7/18 ] = [ 10/27 ]
[ 1/2 2/3  0  ] [ 7/18 ]   [ 10/27 ]

Likewise, we compute
u3 = A u2 = [20/81, 61/162, 61/162],
u4 = A u3 = [61/243, 91/243, 91/243],
u5 = A u4 = [182/729, 547/1458, 547/1458], . . .

u∞ = [0.25, 0.375, 0.375] = [2/8, 3/8, 3/8]
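The iteration above is a power iteration; a short numpy sketch that reproduces the limit:

```python
import numpy as np

A = np.array([[0, 1/3, 1/3],
              [1/2, 0, 2/3],
              [1/2, 2/3, 0]])

# Repeatedly apply A to the uniform start vector; the iterates
# approach the steady state (2/8, 3/8, 3/8).
u = np.full(3, 1/3)
for _ in range(100):
    u = A @ u
assert np.allclose(u, [0.25, 0.375, 0.375])
```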
Linear Combination of Eigenvectors
u0 =
[ 1/3 ]      [ 2/3 ]      [  0 ]      [ −2 ]
[ 1/3 ] = c1 [  1  ] + c2 [ −1 ] + c3 [  1 ]
[ 1/3 ]      [  1  ]      [  1 ]      [  1 ]

u1 = A u0
   = c1 A v1 + c2 A v2 + c3 A v3
   = c1 λ1 v1 + c2 λ2 v2 + c3 λ3 v3   (as A vi = λi vi)

Thus,

u1 = A [1/3, 1/3, 1/3]^T = c1 λ1 [2/3, 1, 1]^T + c2 λ2 [0, −1, 1]^T + c3 λ3 [−2, 1, 1]^T
Linear Combination of Eigenvectors (contd.)

u2 = A u1 = A² u0 = c1 λ1² [2/3, 1, 1]^T + c2 λ2² [0, −1, 1]^T + c3 λ3² [−2, 1, 1]^T

In general, for integer k > 0, uk = A^k u0 = c1 λ1^k v1 + c2 λ2^k v2 + c3 λ3^k v3, i.e.

uk = A^k [1/3, 1/3, 1/3]^T = c1 λ1^k [2/3, 1, 1]^T + c2 λ2^k [0, −1, 1]^T + c3 λ3^k [−2, 1, 1]^T

and that equals

uk = c1 (1)^k [2/3, 1, 1]^T + c2 (−2/3)^k [0, −1, 1]^T + c3 (−1/3)^k [−2, 1, 1]^T
Linear Combination of Eigenvectors (contd.)

For large values of k, (−2/3)^k → 0 and (−1/3)^k → 0, so the above expression reduces to

uk ≈ c1 [2/3, 1, 1]^T = (3/8) [2/3, 1, 1]^T = [2/8, 3/8, 3/8]^T

Note that the value of c1 is derived by solving the equation u0 = c1v1 + c2v2 + c3v3 for u0 = [1/3, 1/3, 1/3].
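Solving for the coefficients c1, c2, c3 is a small linear system; a numpy sketch confirming c1 = 3/8:

```python
import numpy as np

# Columns are the eigenvectors v1, v2, v3 of A from the earlier slide.
V = np.array([[2/3, 0, -2],
              [1, -1, 1],
              [1, 1, 1]], dtype=float)
u0 = np.full(3, 1/3)

# Solve u0 = c1 v1 + c2 v2 + c3 v3 for the coefficients c.
c = np.linalg.solve(V, u0)
assert np.isclose(c[0], 3/8)   # c1 = 3/8, as used on the slide
assert np.allclose(V @ c, u0)
```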
Linear Combination of Eigenvectors (contd.)

Suppose u0 = [1/4, 1/4, 1/2]. Then

u1 = A u0 = [1/4, 11/24, 7/24]
u2 = A u1 = [1/4, 23/72, 31/72]
u3 = A u2 = [1/4, 89/216, 73/216]
. . .
u∞ = [2/8, 3/8, 3/8]
Convergence?
Entries in A^k
Assume that all the entries of a Markov matrix A, or of some finite power A^k of it for some integer k > 0, are strictly > 0. Then A corresponds to an irreducible aperiodic Markov chain.

Irreducible: for any pair of states i and j, it is always possible to go from state i to state j in a finite number of steps with positive probability.

Period of a state i: the GCD of all possible numbers of steps in which the chain can return to state i starting from i.

Aperiodic: M is aperiodic if the period of each of its states is 1.
Properties of Markov Matrix A, when A^k > 0

(Figure: a 3-state chain on states 1, 2, 3.)

A =
[ 0  0  1 ]
[ 1 1/2 0 ]
[ 0 1/2 0 ]

A² =
[ 0  0  1 ] [ 0  0  1 ]   [  0  1/2  0 ]
[ 1 1/2 0 ] [ 1 1/2 0 ] = [ 1/2 1/4  1 ]
[ 0 1/2 0 ] [ 0 1/2 0 ]   [ 1/2 1/4  0 ]

A³ =
[ 1/2 1/4  0  ]
[ 1/4 5/8 1/2 ]
[ 1/4 1/8 1/2 ]

A⁴ =
[ 1/4 1/8  1/2 ]
[ 5/8 9/16 1/4 ]
[ 1/8 5/16 1/4 ]

A⁴ > 0, and for k ≥ 4, A^k > 0.

A corresponds to an irreducible aperiodic Markov chain.
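The smallest power with all-positive entries can be found by direct computation:

```python
import numpy as np

A = np.array([[0, 0, 1],
              [1, 1/2, 0],
              [0, 1/2, 0]])

# Smallest k for which every entry of A^k is strictly positive.
Ak = A.copy()
k = 1
while not (Ak > 0).all():
    Ak = Ak @ A
    k += 1
print(k)  # 4
```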
Perron-Frobenius Theorem
Assume A corresponds to an irreducible aperiodic Markov chain M.

The Perron-Frobenius theorem from linear algebra states that:

1. The largest eigenvalue of A, namely 1, is unique
2. All other eigenvalues of A have magnitude strictly smaller than 1
3. All the coordinates of the eigenvector v1 corresponding to the eigenvalue 1 are > 0
4. The steady state corresponds to the eigenvector v1
PageRank Algorithm

Problem: How to rank the web pages?

A ranking assigns a real number to each web page: the higher the number, the more important the page. Ranking needs to be automated, as the web is extremely large.

We will study the PageRank algorithm.

Source: Page, Brin, Motwani, Winograd, The PageRank citation ranking: Bringing order to the Web, published as a technical report in 1998.
Web as a Graph
- G = (V, E) is a positively weighted directed graph
- Each web page is a vertex of G
- If web page u points (links) to web page v, there is a directed edge from u to v
- The weight of the edge uv is 1/out-degree(u)

Assume V = {v1, . . . , vn}. The n × n adjacency matrix M of G is:

M(i, j) = 1/out-degree(vj)  if vjvi ∈ E
        = 0                 otherwise

Assumption: a surfer makes a random transition from a web page to one of the pages it points to.
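Building M from a link graph follows directly from this definition; a sketch on a hypothetical tiny graph (the `links` dictionary is made up for illustration):

```python
import numpy as np

# A hypothetical tiny link graph: links[u] lists the pages u points to.
links = {0: [1, 2], 1: [2], 2: [0, 1, 3], 3: [0]}
n = len(links)

# Column j of M spreads page j's probability over its out-links.
M = np.zeros((n, n))
for j, targets in links.items():
    for i in targets:
        M[i, j] = 1 / len(targets)

assert np.allclose(M.sum(axis=0), 1)  # every column sums to 1
```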
Remarks
1. Assumes users will visit useful pages rather than useless pages.
2. Random Surfer Model: assume the web surfer is initially equally likely to be at any node of G, given by the vector v0 = (1/|V |, . . . , 1/|V |).
3. In each step it makes a transition: v1 = Mv0, v2 = Mv1 = M²v0, . . ., vk = Mvk−1 = M^k v0.
4. Need to worry about sink nodes/dead ends, circling within the same set of nodes, and whether we will reach a steady state.
Abstract representation of a web graph
(Figure: the "bowtie" view of the web graph: an In Component with edges into a Strongly Connected Component, which has edges out to an Out Component.)

• In-Component: nodes that can reach the strongly connected component
• Out-Component: nodes that can be reached from the strongly connected component
• Possibly multiple copies of the above configuration
Avoiding Sink Nodes
Idea: Make sink nodes point to all other nodes.
M =
[  0   0  1/2 1/3 0 ]      [  0   0  1/2 1/3 1/5 ]
[ 1/2  0   0   0  0 ]      [ 1/2  0   0   0  1/5 ]
[ 1/2 1/2  0   0  0 ]  →   [ 1/2 1/2  0   0  1/5 ] = Q
[  0  1/2 1/2 1/3 0 ]      [  0  1/2 1/2 1/3 1/5 ]
[  0   0   0  1/3 0 ]      [  0   0   0  1/3 1/5 ]

(Figure: the 5-vertex web graph on v1, . . . , v5, where v5 is a sink.)
Teleportation - Key Idea
Define K = αQ + ((1 − α)/n) E

Teleportation parameter: 0 < α < 1, e.g. α = 0.9

E is an n × n matrix of all 1s.

Observations on K:

1. Each entry of K is > 0
2. The entries within each column sum to 1
3. K satisfies the requirements of an irreducible aperiodic Markov chain
4. Its largest eigenvalue is 1
5. By the Perron-Frobenius theorem, the steady state (= page ranks) corresponds to the principal eigenvector
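Putting the pieces together, a minimal sketch of the whole computation on the 5-page example, using the sink-fixed matrix Q from the previous slide:

```python
import numpy as np

# Q: the sink-fixed transition matrix from the previous slide.
Q = np.array([[0, 0, 1/2, 1/3, 1/5],
              [1/2, 0, 0, 0, 1/5],
              [1/2, 1/2, 0, 0, 1/5],
              [0, 1/2, 1/2, 1/3, 1/5],
              [0, 0, 0, 1/3, 1/5]])
n = Q.shape[0]

alpha = 0.9
K = alpha * Q + (1 - alpha) / n * np.ones((n, n))

# Power iteration: the iterates approach the principal eigenvector
# of K, i.e. the page-rank vector.
r = np.full(n, 1 / n)
for _ in range(100):
    r = K @ r
assert np.isclose(r.sum(), 1)  # still a probability distribution
assert (r > 0).all()
```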
Conclusions
Computational Issues: K = αQ + ((1 − α)/n) E. Q is sparse and E (the all-ones matrix) is special, so Kv can be computed without storing K explicitly.

Flavors: teleport to specific pages, or teleport to topic-sensitive pages (Sports, Business, Science, News, ...) based on the profile of the user.

Caution: the real story is not that simple.
References
1. Link Analysis chapter in mmds.org
2. Chapter on Matrices in CS in my notes on algorithm design
3. Page, Brin, Motwani, Winograd, The PageRank citation ranking: Bringing order to the Web, technical report, 1998.
4. Brin and Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks 56(18): 3825-3833, reprinted 2012.