
Math20F Slides: Summary of Chapters 5.1-3, 6.1-5, and 7.1+4 of [Lay]

P. Oswald

Mathematics

UCSD Fall 2015

Outline

1 Eigenvalue Problem and Diagonalization (Chapters 5.1-3)

2 Orthogonality (Chapters 6.1-5)

3 More Factorizations (Chapters 7.1-4)

Definition

The eigenvector problem can be formulated for any linear map T : V → V mapping a vector space into itself. However, we will do this only for V = R^n, where T = T_A is given by matrix-vector multiplication with a square matrix A.

Definition. A number λ ∈ R is called a (real) eigenvalue of A ∈ M_{n×n} if there is a nonzero vector 0 ≠ v ∈ R^n (called an eigenvector associated with λ) such that

Av = λv ⇐⇒ (A − λI_n)v = 0.

Any pair (λ, v) with the above properties is called an eigenpair of A. The first formulation tells us that eigenvectors are the vectors in V that are left unchanged under the action of T_A up to a stretching factor λ; the second leads to finding eigenpairs in practice.

Simple examples: Triangular matrices: λ is an eigenvalue if and only if λ = a_ii for one of the elements on the diagonal (the corresponding eigenvectors can be found by solving the respective homogeneous system (A − a_ii I_n)v = 0).
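
As a quick numerical sanity check (a minimal NumPy sketch, not part of the original slides), one can verify the eigenpair condition Av = λv for a triangular example:

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [0.0, 2.0, 5.0],
                  [0.0, 0.0, 1.0]])   # upper triangular: eigenvalues are 4, 2, 1

    lam, V = np.linalg.eig(A)         # columns of V are eigenvectors
    print(lam)                        # [4. 2. 1.] (ordering not guaranteed)

    for l, v in zip(lam, V.T):        # verify Av = lambda*v for each eigenpair
        assert np.allclose(A @ v, l * v)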

Examples

A = [3 0; 0 −1],   B = [2 0; 1 2],   C = [1 −1; 1 1],   D = vu^T   (rows separated by semicolons).

A:  λ_1 = 3, v_1 = e_1;  λ_2 = −1, v_2 = e_2.

B:  λ = 2, v_1 = e_2.

C:  No real eigenvalues (rotation by α = π/4, followed by stretching with factor √2)!

D is a rank-one matrix in M_{n×n} if 0 ≠ v, u ∈ R^n, with two real eigenvalues: λ_1 = u^T v (associated eigenvector v_1 = v), and λ_2 = 0 (n − 1 linearly independent associated eigenvectors v_2, . . . , v_n).
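
A minimal NumPy check of these examples (not in the original slides; the vectors u, v for the rank-one case are arbitrary illustrative choices):

    import numpy as np

    A = np.array([[3.0, 0.0], [0.0, -1.0]])
    B = np.array([[2.0, 0.0], [1.0, 2.0]])
    C = np.array([[1.0, -1.0], [1.0, 1.0]])

    print(np.linalg.eigvals(A))   # [ 3. -1.]
    print(np.linalg.eigvals(B))   # [2. 2.]  (only one eigenvector direction)
    print(np.linalg.eigvals(C))   # complex pair 1+1j, 1-1j: no real eigenvalues

    # Rank-one matrix D = v u^T: eigenvalues u^T v and 0 (multiplicity n-1).
    v = np.array([1.0, 2.0, 3.0])
    u = np.array([0.5, 0.0, 1.0])
    D = np.outer(v, u)
    print(np.linalg.eigvals(D))   # approximately [3.5, 0, 0]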

Motivation

What are the potential benefits of knowing the eigenvalues/vectors?

As "directions" in V that are invariant (unchanged) under multiplication by A (or, in terms of linear transformations, invariant under the action of T = T_A), they might have special meaning for the application at hand (ground states in physics, resonances in mechanical systems, ...).

Example: If T_A is a rotation about an axis with direction vector v ≠ 0, then T_A has the eigenpair (1, v).

If A ∈ M_{n×n} possesses an eigenbasis (i.e., a basis consisting of eigenvectors v_j associated with eigenvalues λ_j ∈ R, j = 1, . . . , n), then many operations simplify if one switches the coordinate representation to this basis:

v = ∑_{j=1}^n x_j v_j   =⇒   A^k v = ∑_{j=1}^n λ_j^k x_j v_j,

or

(A − λI_n)^{−1} v = ∑_{j=1}^n x_j/(λ_j − λ) v_j,   λ ≠ λ_j for all j.
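
A small NumPy sketch (illustrative, not from the slides) of the first identity: A^k v computed through the eigen-expansion agrees with repeated multiplication.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])          # symmetric, so an eigenbasis exists
    v = np.array([1.0, -3.0])
    k = 8

    lam, V = np.linalg.eig(A)           # columns of V form an eigenbasis
    x = np.linalg.solve(V, v)           # coordinates of v in the eigenbasis

    via_eigen = V @ (lam**k * x)        # sum_j lam_j^k * x_j * v_j
    direct = np.linalg.matrix_power(A, k) @ v
    print(np.allclose(via_eigen, direct))   # True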

Linear Independence of Eigenvectors

Theorem. If the system {v_1, . . . , v_k} consists of eigenvectors associated with different eigenvalues (i.e., λ_i ≠ λ_j for all possible pairs i ≠ j), then it is linearly independent.

Proof. Since this is Theorem 2 of Chapter 5.1, you can read the proof on page 272. Alternative: Assume that the system is not linearly independent. Then, for some j = 2, . . . , k, {v_1, . . . , v_{j−1}} is linearly independent but

v_j = ∑_{i=1}^{j−1} c_i v_i   (c_i ≠ 0).

Apply A^k to both sides:

λ_j^k v_j = A^k v_j = ∑_{i=1}^{j−1} c_i A^k v_i = ∑_{i=1}^{j−1} c_i λ_i^k v_i.

Now consider the three cases max_{i<j} |λ_i| > |λ_j|, < |λ_j|, or = |λ_j|, and (after multiplying by λ_j^{−k}) look at the limits for k → ∞ on both sides. On the left, the expression is constant and ≠ 0; on the right, the limit is ∞ in the first case, 0 in the second, and not defined in the third.

Characteristic Equation

Finding eigenvalues and eigenvectors analytically is facilitated by the following obvious consequence of the properties of the determinant function:

Theorem. λ is an eigenvalue of A ∈ M_{n×n} if and only if it solves the characteristic equation

0 = det(A − λI_n) =: p_A(λ) = (−1)^n (λ^n − (a_{11} + . . . + a_{nn}) λ^{n−1} + . . . + det(A)).

Once a root λ′ ∈ R of the characteristic polynomial p_A(λ) has been found, a maximal set of linearly independent eigenvectors associated with this eigenvalue λ′ can be found by finding a basis of Nul(A − λ′I_n) (e.g., by finding its REF).

Examples (details in class): For the 2 × 2 matrices on previous slides, we confirm what we guessed before:

p_A(λ) = (λ − 3)(λ + 1),   p_B(λ) = (λ − 2)^2,   p_C(λ) = (λ − 1)^2 + 1 (> 0).

Example with a 3 × 3 matrix A demonstrated in class.
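
An illustrative NumPy sketch (not part of the slides): np.poly returns the coefficients of the characteristic polynomial det(λI − A), whose roots are the eigenvalues; an eigenspace basis can be read off the null space of A − λ′I_n.

    import numpy as np

    A = np.array([[3.0, 0.0],
                  [0.0, -1.0]])

    coeffs = np.poly(A)            # monic char. polynomial: lambda^2 - 2*lambda - 3
    print(coeffs)                  # [ 1. -2. -3.]
    print(np.roots(coeffs))        # [ 3. -1.]  -- the eigenvalues

    # Eigenvectors for lambda' = 3: a basis of Nul(A - 3*I), read off via the SVD.
    lam_prime = 3.0
    U, s, Vt = np.linalg.svd(A - lam_prime * np.eye(2))
    print(Vt[s < 1e-10])           # rows span the eigenspace (here +/- e1)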

Observations

p_{A^T}(λ) = p_A(λ); thus, A and A^T have the same eigenvalues.

Since p_A(λ) is a polynomial of degree exactly n, it has exactly n (real and complex) roots (counting multiplicities). So we cannot have more than n eigenvalues. Examples show that we may have all kinds of situations between no and n real eigenvalues (it is a pity that we don't want to deal with complex eigenvalues, see Chapter 5.5). However, if n is odd, we must have at least one real eigenvalue (why?).

Suppose that we have n real eigenvalues. If they are all different (a happy case), then each of them has exactly one associated eigenvector (up to scaling), thus altogether n linearly independent eigenvectors (i.e., these n eigenvectors form a basis for R^n).

If they are not all different, then there are multiple eigenvalues (the multiplicity n(λ) of such an eigenvalue as a root of p_A is called its algebraic multiplicity). The number of linearly independent eigenvectors m(λ) := dim Nul(A − λI_n) (called the geometric multiplicity of λ) is at least 1 and at most n(λ). Nothing more can be said!
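
A quick NumPy illustration (not from the slides) of algebraic vs. geometric multiplicity, using the matrix B from the earlier example: λ = 2 has algebraic multiplicity 2 but geometric multiplicity 1.

    import numpy as np

    B = np.array([[2.0, 0.0],
                  [1.0, 2.0]])

    # Characteristic polynomial (lambda - 2)^2: algebraic multiplicity of 2 is 2.
    print(np.poly(B))                      # [ 1. -4.  4.]

    # Geometric multiplicity = dim Nul(B - 2I) = 2 - rank(B - 2I) = 1.
    M = B - 2.0 * np.eye(2)
    print(2 - np.linalg.matrix_rank(M))    # 1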

Diagonalization

Theorem. If A ∈ M_{n×n} possesses a basis of eigenvectors B = {v_1, . . . , v_n} associated with its n real eigenvalues λ_1, . . . , λ_n (multiple eigenvalues are listed n(λ) times), then we have

AV = VD,   V = [v_1, . . . , v_n],   D diagonal with d_{ii} = λ_i.

Since V (a matrix of basis vectors!) is invertible, this means D = V^{−1}AV; in other words, the matrix A can be diagonalized (transformed into a diagonal matrix) by changing to the basis B.

Conversely, if, for some invertible V ∈ M_{n×n}, the matrix D = V^{−1}AV is diagonal, then the diagonal elements d_{ii} are the eigenvalues of A, and the columns of V are the associated eigenvectors!

The proof is obvious from the definition of eigenvalues and eigenvectors. Whether a given A is diagonalizable or not is, in general, not easy to recognize from A itself; you have to do the dirty work. However, an important sufficient condition for diagonalizability is the symmetry of A (see 7.1).
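
A minimal NumPy sketch (illustrative only) of AV = VD and D = V^{−1}AV for a diagonalizable matrix:

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])            # eigenvalues 5 and 2

    lam, V = np.linalg.eig(A)             # columns of V: eigenvectors
    D = np.diag(lam)

    print(np.allclose(A @ V, V @ D))                   # True: AV = VD
    print(np.allclose(np.linalg.inv(V) @ A @ V, D))    # True: D = V^{-1} A V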

Things to Practice

Check if a given v is an eigenvector of A (and find λ), or find (a maximal set of linearly independent) eigenvectors v for a given λ. Bonus: Do this for a linear map T : V → V, where V is some finite-dimensional vector space, e.g., P_n.

Recognize matrices with known eigenvalues (e.g., triangular matrices).

Find the characteristic polynomial p_A(λ) and its roots (the eigenvalues).

Find all eigenvalues and a maximal set of linearly independent associated eigenvectors.

Decide (using the previous task or otherwise) if a matrix A is diagonalizable.

Find D and V from given information on the eigenvalues/eigenvectors of A, and use them to simplify calculations such as A^n, A^{−k} = (A^{−1})^k, and (bonus) linear combinations of powers of A and their inverses.

If A is diagonalizable, how are det(A) and the eigenvalues related?

2 Orthogonality (Chapters 6.1-5)

Inner Product, Norm, Orthogonality

Given limited time, we define these notions only for V = R^n.

Definition. a) The inner product u · v of two vectors u, v ∈ R^n is defined by

u · v := u^T v = v^T u = ∑_{i=1}^n u_i v_i.

b) The norm (or length) ‖v‖ of a vector v ∈ R^n is given by

‖v‖ := √(v · v) = (∑_{i=1}^n v_i^2)^{1/2}.

c) Two vectors u, v ∈ R^n are called orthogonal (denoted u ⊥ v) if u · v = 0.

These notions are in line with the intuitive understanding for vectors in the plane or space. Using this analogy, we define distances and angles between vectors in R^n:

dist(u, v) = ‖u − v‖,   cos(∠(u, v)) = (u · v)/(‖u‖ ‖v‖).

Study the properties of u · v (Theorem 1 on page 333).
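
A short NumPy illustration (not part of the slides) of these definitions:

    import numpy as np

    u = np.array([1.0, 2.0, 2.0])
    v = np.array([2.0, 0.0, 1.0])

    inner = u @ v                          # u . v = u^T v = 4
    norm_u = np.linalg.norm(u)             # ||u|| = sqrt(u . u) = 3
    dist = np.linalg.norm(u - v)           # dist(u, v) = ||u - v||
    cos_angle = inner / (np.linalg.norm(u) * np.linalg.norm(v))

    print(inner, norm_u, dist, cos_angle)
    print(np.isclose(u @ v, 0.0))          # orthogonality test u ⊥ v (False here)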

Properties to Remember

Matrix-vector and matrix-matrix multiplication can be reinterpreted using the inner product (recall the "row times column" rule to compute entries).

Triangle inequalities (recall these by looking at the parallelogram spanned by u and v):

|‖u‖ − ‖v‖| ≤ ‖u ± v‖ ≤ ‖u‖ + ‖v‖.

Cauchy-Schwarz inequality (implicitly used in the definition of angles):

|u · v| ≤ ‖u‖ · ‖v‖.

Given any subspace W ⊂ R^n, the set of all vectors orthogonal to all vectors in W is also a subspace, denoted by

W⊥ := {u ∈ R^n : u · w = 0 for all w ∈ W}.

The dimensions of W and W⊥ always add up to n = dim R^n!
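
A numerical sketch (illustrative, random data) of the Cauchy-Schwarz inequality and of dim W + dim W⊥ = n, computing W⊥ as the null space of the matrix whose rows span W (here W generically has dimension 2):

    import numpy as np

    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(5), rng.standard_normal(5)
    print(abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v))   # True

    # W = span of two random vectors in R^5; W⊥ = Nul(M), rows of M span W.
    M = rng.standard_normal((2, 5))
    _, s, Vt = np.linalg.svd(M)
    dim_W = np.linalg.matrix_rank(M)
    W_perp_basis = Vt[dim_W:]              # rows orthogonal to every row of M
    print(dim_W + W_perp_basis.shape[0])   # 5 = dim W + dim W⊥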

Orthogonal and Orthonormal Sets and Bases

Definition. A set W = {w_1, . . . , w_k} ⊂ W of non-zero vectors in a subspace W of R^n (W = R^n is allowed) is called orthogonal if they are mutually orthogonal, i.e., w_i ⊥ w_j for all possible index pairs i ≠ j, and orthonormal if, in addition, all w_i have unit norm ‖w_i‖ = 1.

Orthogonal systems are automatically linearly independent, for

c_1 w_1 + . . . + c_k w_k = 0   =⇒   c_i ‖w_i‖^2 = 0

(take the inner product of both sides with w_i), i.e., c_i = 0 for all i = 1, . . . , k.

Consequently, if k = dim W or if W = span{w_1, . . . , w_k}, then the system is automatically a basis of W, and the coordinate map can be computed explicitly for any w ∈ W:

[w]_W = ( (w · w_1)/‖w_1‖^2 , . . . , (w · w_k)/‖w_k‖^2 )^T.
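
A small NumPy check (illustrative) of the coordinate formula for an orthogonal basis:

    import numpy as np

    # An orthogonal (not orthonormal) basis of a 2-dimensional subspace W of R^3.
    w1 = np.array([1.0, 1.0, 0.0])
    w2 = np.array([1.0, -1.0, 2.0])
    assert np.isclose(w1 @ w2, 0.0)

    w = 2.0 * w1 - 0.5 * w2                 # a vector of W with known coordinates

    coords = np.array([w @ w1 / (w1 @ w1),  # (w . w_i) / ||w_i||^2
                       w @ w2 / (w2 @ w2)])
    print(coords)                            # [ 2.  -0.5]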

Projections/Distance onto/from Subspaces

Theorem. Let W = {w_1, . . . , w_k} be an orthogonal basis of a subspace W. Then the orthogonal projection P_W : R^n → W, defined for every v ∈ R^n by the condition

(v − P_W v) · w = 0 for all w ∈ W,

can be computed by the formula

P_W(v) = ∑_{i=1}^k (v · w_i)/‖w_i‖^2 w_i.

Moreover, P_W v is the point in W closest to v, i.e.,

‖v − P_W(v)‖ = min_{w∈W} ‖v − w‖.

Examples done in class.
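
A NumPy sketch (illustrative) of the projection formula and the best-approximation property:

    import numpy as np

    # Orthogonal basis of a plane W in R^3.
    w1 = np.array([1.0, 1.0, 0.0])
    w2 = np.array([1.0, -1.0, 2.0])
    v = np.array([3.0, 0.0, 1.0])

    proj = (v @ w1) / (w1 @ w1) * w1 + (v @ w2) / (w2 @ w2) * w2

    # The residual v - P_W(v) is orthogonal to W ...
    print(np.isclose((v - proj) @ w1, 0.0), np.isclose((v - proj) @ w2, 0.0))

    # ... and proj is at least as close to v as any other point of W (spot check).
    other = 0.3 * w1 - 1.2 * w2
    print(np.linalg.norm(v - proj) <= np.linalg.norm(v - other))   # True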

How to Construct Orthogonal Systems

An orthogonal basis U = {u_1, . . . , u_k} of W = span{v_1, . . . , v_k} can be constructed from the v_i by Gram-Schmidt orthogonalization (assume that the v_i are linearly independent):

u_1 = v_1,
u_2 = v_2 − (v_2 · u_1)/‖u_1‖^2 u_1,
u_3 = v_3 − (v_3 · u_1)/‖u_1‖^2 u_1 − (v_3 · u_2)/‖u_2‖^2 u_2,
. . .
u_k = v_k − ∑_{i=1}^{k−1} (v_k · u_i)/‖u_i‖^2 u_i.

Compare with the formula for the orthogonal projection to understand why u_j is orthogonal to all previous u_i, i < j.

After having constructed U, an orthonormal basis of W is obtained by normalization:

{‖u_1‖^{−1} u_1, . . . , ‖u_k‖^{−1} u_k}.
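
A straightforward Gram-Schmidt implementation in NumPy (a minimal sketch assuming linearly independent input vectors, with no re-orthogonalization for numerical robustness):

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthogonal basis of span(vectors); the inputs are assumed
        linearly independent."""
        basis = []
        for v in vectors:
            u = v.astype(float)
            for b in basis:
                u = u - (v @ b) / (b @ b) * b   # subtract projection onto b
            basis.append(u)
        return basis

    vs = [np.array([1.0, 1.0, 0.0]),
          np.array([1.0, 0.0, 1.0]),
          np.array([0.0, 1.0, 1.0])]
    us = gram_schmidt(vs)
    print(np.round([[ui @ uj for uj in us] for ui in us], 10))  # diagonal Gram matrix

    # Normalize to get an orthonormal basis.
    ons = [u / np.linalg.norm(u) for u in us]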

Orthogonal Matrices and QR Factorization

What we have done so far can be expressed in matrix language:

A system U = {u_1, . . . , u_k} of vectors in R^n is orthogonal resp. orthonormal if U^T U = D is a diagonal matrix resp. U^T U = I_k, where U = [u_1, . . . , u_k] is the n × k matrix associated with U.

U is an orthonormal basis of R^n if and only if k = n and U^T U = UU^T = I_n. Any n × n matrix U with this property is called orthogonal. Equivalently, U is orthogonal if U^T = U^{−1}.

Moreover, the inner product is unchanged under orthogonal transformations T_U (multiplication by an orthogonal matrix U):

(Uu) · (Uv) = (Uu)^T (Uv) = u^T U^T U v = u^T v = u · v.

Applying the Gram-Schmidt orthogonalization process to the columns of an n × k matrix A leads to a so-called QR-factorization of A,

A = QR,   Q ∈ M_{n×r} with Q^T Q = I_r,   R ∈ M_{r×k} upper triangular,

where r = rank(A) ≤ min(n, k). Knowing QR details is bonus!
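
An illustrative NumPy sketch (not from the slides): np.linalg.qr computes a QR factorization; for a full-column-rank A its default "reduced" mode returns Q with orthonormal columns and an upper-triangular R.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])

    Q, R = np.linalg.qr(A)                  # reduced QR: Q is 3x2, R is 2x2
    print(np.allclose(Q.T @ Q, np.eye(2)))  # True: orthonormal columns
    print(np.allclose(Q @ R, A))            # True: A = QR
    print(np.allclose(np.triu(R), R))       # True: R is upper triangular

    # An orthogonal matrix preserves inner products: (Uu).(Uv) = u.v.
    U, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((4, 4)))
    u, v = np.arange(4.0), np.ones(4)
    print(np.isclose((U @ u) @ (U @ v), u @ v))   # True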

Least-Squares Problem

The motivating example for the least-squares problem is the fitting of a curve to noisy measurements (in statistics, this is called linear regression): One looks for a function of the form

f(x) = c_1 f_1(x) + c_2 f_2(x) + . . . + c_n f_n(x)

and wants to find the coefficients c_1, . . . , c_n such that the graph of f(x) matches a data set (x_i, y_i), i = 1, . . . , m, as well as possible. Since often m ≫ n and y_i = f(x_i) + ε_i is considered a "noisy measurement" (with noise value ε_i) of the actual function value at x_i, enforcing

y_i = f(x_i) = c_1 f_1(x_i) + c_2 f_2(x_i) + . . . + c_n f_n(x_i),   i = 1, . . . , m,

does not make too much sense, since this leads to an over-determined m × n system

Ac = y,   (A)_{ij} = f_j(x_i),   i = 1, . . . , m,   j = 1, . . . , n,

for the unknown coefficients c_j, j = 1, 2, . . . , n, which is most probably inconsistent; and even if it had a solution, the resulting f(x) would pick up the noise, e.g., have lots of oscillations.

Remedy: Instead of solving Ac = y, one solves the residual minimization problem

‖Ac − y‖^2 = ∑_{i=1}^m (y_i − ∑_{j=1}^n c_j f_j(x_i))^2 → min.

Geometrically, the solution point ŷ := Ac is the projection of y onto the subspace Col(A). Thus, finding ŷ is a problem we have already solved. If the columns of A are linearly independent, then the vector c (called the least-squares solution of Ac = y, and denoted c_LS) is uniquely determined from ŷ by solving the consistent system Ac = ŷ.

A shorter way of finding c is provided by

Theorem. If A ∈ M_{m×n} has linearly independent columns, i.e., rank(A) = n, then the least-squares solution c_LS of the system Ac = y is unique, and can be found from the n × n system (the so-called normal equation)

A^T A c = A^T y   (A^T A is invertible).

If A is given by its QR-factorization, then A^T A = R^T R, and the previous system can be solved in a more economical way by forward and backward substitution (the product R^T R is similar to an LU-factorization).
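
A NumPy sketch (illustrative) of solving a small least-squares problem via the normal equations, compared with the library routine np.linalg.lstsq:

    import numpy as np

    # Fit f(x) = c1 + c2*x to noisy data: columns of A are f1(x_i)=1 and f2(x_i)=x_i.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(x.size)

    A = np.column_stack([np.ones_like(x), x])

    c_normal = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations
    c_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]     # library least-squares
    print(np.allclose(c_normal, c_lstsq))              # True
    print(c_normal)                                     # approximately [1, 2]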

Example

About 50 data points (black dots) created from the polynomial p(t) = (3t − 2)^2 (red line) by adding noise |ε_i| ≤ 0.3. The figure shows graphs of least-squares fitting polynomials of degree 1 (green line), degree 2 (blue line), and degree 10 (black dashed line). Draw your conclusions.
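
A hedged sketch (not the original figure's code; the sample locations and the uniform noise are illustrative assumptions matching the description above) that reproduces the experiment numerically with np.polyfit:

    import numpy as np

    rng = np.random.default_rng(42)
    t = np.linspace(0.0, 1.0, 50)
    y = (3.0 * t - 2.0) ** 2 + rng.uniform(-0.3, 0.3, t.size)   # noisy samples

    for deg in (1, 2, 10):
        coeffs = np.polyfit(t, y, deg)                 # least-squares fit
        resid = np.linalg.norm(np.polyval(coeffs, t) - y)
        print(deg, round(resid, 3))
    # Degree 2 already captures the trend; degree 10 mostly chases the noise.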

Things to Practice

Know the definition and properties of inner products and norms. Use them to find angles, lengths, and distances in Rⁿ.

Orthogonal/orthonormal sets and their linear independence. Coefficient map for orthogonal bases.

Find projections onto and distances from subspaces spanned by such sets. The orthogonal complement W⊥.

Be able to use the Gram-Schmidt orthogonalization process to find an orthonormal basis.

Find the least-squares solution of Ac = y from the normal equation AᵀAc = Aᵀy. Know when c_LS is unique.

Connection of the least-squares problem with orthogonal projection onto Col(A).

(Bonus) What is a QR factorization, how do you find it, and how might you put it to good use?


Outline

1 Eigenvalue Problem and Diagonalization (Chapters 5.1-3)

2 Orthogonality (Chapters 6.1-5)

3 More Factorizations (Chapters 7.1-4)


Diagonalization of Symmetric Matrices

The set of symmetric n × n matrices has remarkable properties; one of the most important concerns the eigenvalue problem.

Theorem. If A ∈ M_{n×n} is symmetric (Aᵀ = A) then
a) all its eigenvalues λ_1, . . . , λ_n are real,
b) it has an orthonormal basis of eigenvectors {u_1, . . . , u_n},
c) it can be orthogonally diagonalized, i.e.,

D = diag(λ_1, λ_2, . . . , λ_n) = UᵀAU,

where U = [u_1 . . . u_n] is the orthogonal matrix whose columns are these eigenvectors.

This is spelled out in Theorems 1-3 of Chapter 7.1. The orthogonality of eigenvectors for different eigenvalues is the easiest part (done in class); the rest needs more preparation than we have.


Example

In class, the diagonalization of the 4 × 4 matrix

A = [ 4 3 1 1 ]
    [ 3 4 1 1 ]
    [ 1 1 4 3 ]
    [ 1 1 3 4 ]

was demonstrated in all detail.

This A had one double eigenvalue λ_{1/2} = 1, for which choosing the two associated orthonormal eigenvectors was easy. In general, for a multiple eigenvalue λ of a symmetric A one needs to first find a basis of Nul(A − λI_n), and then orthonormalize it by Gram-Schmidt orthogonalization.
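For readers who want to check this example numerically, here is a small NumPy verification (not part of the original slides):

    import numpy as np

    A = np.array([[4.0, 3.0, 1.0, 1.0],
                  [3.0, 4.0, 1.0, 1.0],
                  [1.0, 1.0, 4.0, 3.0],
                  [1.0, 1.0, 3.0, 4.0]])

    w, U = np.linalg.eigh(A)          # eigenvalues in ascending order, U orthogonal
    print(w)                          # approx [1, 1, 5, 9]: the double eigenvalue is 1
    print(np.allclose(U.T @ A @ U, np.diag(w)))   # UᵀAU = D
    print(np.allclose(U.T @ U, np.eye(4)))        # columns of U are orthonormal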


Compression Idea

One of the central ideas of data analysis, and of modelling in general, is compression. Roughly speaking, given a large amount of data represented by K parameters (e.g., bits or real numbers), we aim at rewriting these data in another format requiring k ≪ K parameters. There is exact compression (the original K parameters can be recovered by decompression exactly) and lossy compression (the decompression recovers the original only within a small tolerance).

Examples related to Linear Algebra:

If a matrix A or a vector x has a lot of small entries, then thresholding with a small positive parameter ε,

b_ij = { a_ij, if |a_ij| ≥ ε;  0, if |a_ij| < ε },      y_j = { x_j, if |x_j| ≥ ε;  0, if |x_j| < ε },

leads to so-called sparse matrices and sparse vectors whose many zero entries need not be stored.
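A minimal NumPy sketch of this thresholding idea; the matrix size and the tolerance ε are invented for the demo.

    import numpy as np

    def threshold(M, eps):
        """Lossy compression: zero out all entries with magnitude below eps."""
        out = M.copy()
        out[np.abs(out) < eps] = 0.0
        return out

    rng = np.random.default_rng(2)
    A = 1e-3 * rng.standard_normal((200, 200))   # mostly tiny entries...
    A[:5, :5] += rng.standard_normal((5, 5))     # ...plus a small block of large ones

    B = threshold(A, 1e-2)
    print(np.count_nonzero(B), "of", A.size, "entries remain")   # roughly 25 of 40000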


More Examples

If A ∈ M_{m×n} happens to have low rank r, then writing it as A = QR replaces it by two much smaller matrices (of sizes m × r and r × n). E.g., if r = 1, A is the product of a column vector with a row vector, and can be stored with m + n real numbers instead of mn numbers.

If a symmetric matrix A has lots of small eigenvalues λ_i, then its diagonalization

A = UDUᵀ = ∑_{i=1}^{n} λ_i u_i u_iᵀ ≈ ∑_{i: |λ_i| > ε} λ_i u_i u_iᵀ

can be shortened to a more economic low-rank approximation.

The latter idea carries over to general m × n matrices and leads to the singular value decomposition (SVD), a generalization of the orthogonal diagonalization of symmetric matrices.
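Here is a hedged NumPy sketch of the low-rank truncation just described; the test matrix (with rapidly decaying eigenvalues) and the tolerance are invented for the demo.

    import numpy as np

    rng = np.random.default_rng(3)
    # Build a symmetric test matrix A = Q diag(lam) Qᵀ with rapidly decaying eigenvalues.
    Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
    lam = 2.0 ** -np.arange(50)
    A = (Q * lam) @ Q.T

    # Keep only eigenpairs with |lambda_i| > eps.
    eps = 1e-3
    w, U = np.linalg.eigh(A)                       # ascending eigenvalues
    keep = np.abs(w) > eps
    A_lowrank = (U[:, keep] * w[keep]) @ U[:, keep].T

    print(np.count_nonzero(keep), "terms kept, error:",
          np.linalg.norm(A - A_lowrank, 2))        # error ≈ largest discarded eigenvalue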


Singular Value Decomposition

Theorem. For any A ∈ M_{m×n}, there exist orthogonal matrices U ∈ M_{m×m}, V ∈ M_{n×n}, and a diagonal m × n matrix Σ with non-negative diagonal elements Σ_ii := σ_i, called the singular values of A, ordered by decreasing value:

σ_1 ≥ σ_2 ≥ . . . ≥ σ_{min(m,n)} ≥ 0,

such that

A = UΣVᵀ   ⇐⇒   A = ∑_{i=1}^{min(m,n)} σ_i u_i v_iᵀ.

In contrast to the orthogonal diagonalization of symmetric A, we now need two orthonormal bases U (in Rᵐ) and V (in Rⁿ), but we can diagonalize any matrix! Neglecting terms with small σ_i underlies one of the most popular methods in data analysis, PCA (Principal Component Analysis), see Chapter 7.5.
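A quick NumPy illustration of the statement (the matrix size and data are chosen arbitrarily for the demo):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 3))

    # Full SVD: U is 5×5 orthogonal, Vt holds Vᵀ (3×3), s the min(m,n) = 3 singular values.
    U, s, Vt = np.linalg.svd(A)
    print(np.all(s[:-1] >= s[1:]))     # singular values come ordered decreasingly

    # A as a sum of rank-one matrices σ_i u_i v_iᵀ.
    A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
    print(np.allclose(A, A_rebuilt))   # True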


Finding Singular Values and SVD

Suppose m ≥ n (overdetermined case).

Step 1. Form B = AᵀA, and solve for the orthogonal diagonalization of the symmetric n × n matrix: B = VDVᵀ.

Step 2. Order the eigenpairs (λ, v) of B such that λ_1 ≥ . . . ≥ λ_n ≥ 0, and find the singular values by σ_i = √λ_i, i = 1, . . . , n. This defines the matrices Σ and V of the SVD.

Step 3. To find U, consider

AV = (UΣVᵀ)V = UΣ = [σ_1 u_1, . . . , σ_n u_n].

This shows that u_i is uniquely determined from the i-th column of AV (by multiplying with 1/σ_i) whenever σ_i > 0. Because of the agreed-upon ordering of the singular values, this gives the first r columns of U, where r is the number of non-zero singular values and equals the rank of A. The remaining m − r columns can be found by Gram-Schmidt.


Examples

We demonstrate the procedure on two examples (first 7.4.11):

A = [ −3   1 ]
    [  6  −2 ]      →      B = AᵀA = [  81  −27 ]      →      σ_1 = 3√10,  σ_2 = 0,
    [  6  −2 ]                       [ −27    9 ]

V = [  3/√10   1/√10 ]
    [ −1/√10   3/√10 ]

Thus rank(A) = 1, u_1 = [AV]_1 / σ_1 = (1/3)·(−1, 2, 2)ᵀ, and

A = σ_1 u_1 v_1ᵀ = 3√10 · [ −1/3 ]
                          [  2/3 ] · ( 3/√10  −1/√10 )
                          [  2/3 ]

is the economic SVD (the part corresponding to the non-zero σ_i), and

A = [ −1/3    0      4/√18 ]   [ 3√10  0 ]
    [  2/3   1/√2    1/√18 ] · [   0   0 ] · [ 3/√10  −1/√10 ]
    [  2/3  −1/√2    1/√18 ]   [   0   0 ]   [ 1/√10   3/√10 ]

is the full (but not "economic") SVD of A.
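A quick numerical check of this example (not from the slides); note that an SVD routine may flip the signs of corresponding columns of U and V:

    import numpy as np

    A = np.array([[-3.0, 1.0], [6.0, -2.0], [6.0, -2.0]])
    U, s, Vt = np.linalg.svd(A)
    print(np.allclose(s, [3 * np.sqrt(10), 0.0]))   # singular values 3√10 and 0
    print(U[:, 0], Vt[0])   # proportional to (−1, 2, 2)/3 and (3, −1)/√10, up to sign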


Examples

The second example is an underdetermined case (7.4.13). Even though one could do it the same way, it is less work to form B = AAᵀ, find Σ and U from it, and then obtain V from AᵀU = VΣᵀ:

A = [ 3  2   2 ]      →      B = AAᵀ = [ 17   8 ]      →      σ_1 = 5,  σ_2 = 3,
    [ 2  3  −2 ]                       [  8  17 ]

U = [ 1/√2  −1/√2 ]
    [ 1/√2   1/√2 ]

Thus rank(A) = 2 and v_i = [AᵀU]_i / σ_i, i = 1, 2, leading to the "economic" SVD

A = [ 1/√2  −1/√2 ] · [ 5  0 ] · [  1/√2    1/√2      0    ]
    [ 1/√2   1/√2 ]   [ 0  3 ]   [ −1/√18   1/√18   −4/√18 ]

of A. For the full SVD, add a zero column to Σ and (−2/3, 2/3, 1/3)ᵀ as a third column to V (note that Σ and A must have the same size, whereas U and V are square matrices of size m and n, respectively).


Uses of SVD and Diagonalization

Orthogonal diagonalization (= spectral decomposition) can be used to compute powers, inverses, etc., the same way as diagonalization (Chapter 5.3) was used.

The (economic) SVD A = UΣVᵀ of A ∈ M_{m×n} (with U ∈ M_{m×r}, V ∈ M_{n×r} having orthonormal column vectors, and Σ ∈ M_{r×r} a diagonal matrix carrying the r non-zero singular values σ_i on the diagonal) allows us to define a so-called pseudo-inverse A† = VΣ⁻¹Uᵀ of size n × m. Incidentally, under the conditions of Chapter 6.5, we have

c_LS = A†y

for the least-squares solution of an overdetermined linear system Ac = y.

The SVDs of Aᵀ can be obtained from their counterparts for A: Aᵀ = VΣᵀUᵀ (and similarly for the economic version).

While the diagonal part Σ of the SVD is unique, the orthogonal matrices U, V are not.
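A hedged NumPy sketch of the pseudo-inverse built from the economic SVD (the sizes and data are made up), checked against the normal-equation solution and NumPy's own pinv:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((8, 3))   # overdetermined, full column rank (almost surely)
    y = rng.standard_normal(8)

    # Pseudo-inverse from the economic SVD: A† = V diag(1/σ) Uᵀ.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_dagger = Vt.T @ np.diag(1.0 / s) @ U.T

    c_ls = A_dagger @ y
    print(np.allclose(c_ls, np.linalg.solve(A.T @ A, A.T @ y)))  # same as normal equations
    print(np.allclose(A_dagger, np.linalg.pinv(A)))              # same as NumPy's pinv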


Time Permitting

Finding the SVD by best approximation with rank-one matrices.

PCA demonstrated.

Infinite sequences as solutions of difference equations (Chapter 4.8) and linear dynamical systems (Chapter 5.6).

The matrix exponential and linear ODE systems (Chapter 5.7).

Not part of the Final Exam.


Things to Practice

Know what symmetric and orthogonal matrices are (be able to check).

Know the properties of eigenvalues/vectors of symmetric A.

Find the orthogonal diagonalization of small-size symmetric A.

Know that a symmetric n × n matrix A possesses an orthonormal basis of eigenvectors, and that this can be used to compute with A in this basis.

Definition of singular values, finding them.

Structure of the SVD; find the SVD for small-size A.

Know what rank-one matrices are, and how to write the SVD of a general A (and the spectral decomposition of a symmetric A) as a linear combination of such rank-one matrices.

Find the least-squares solution of Ac = y using the economic SVD.

What does the SVD of a symmetric matrix look like?