
Chapter 2

Invariant Subspaces

Reminder: Unless explicitly stated we are talking about finite dimensional vector spaces, and linear transformations and operators between finite dimensional vector spaces. However, we will from time to time explicitly discuss the infinite dimensional case. Similarly, although much of the theory holds for vector spaces over some field, we focus (for practical purposes) on vector spaces over the real or complex field.

Definition 2.1. Let V be a vector space (real or complex) and L : V → V be a linear operator over V. We say a subspace W ⊆ V is an invariant subspace of L if for every w ∈ W, Lw ∈ W (we also write LW ⊆ W).

Note that V, {0} (the set containing only the zero vector in V), Null(L), and Range(L) are all invariant subspaces of L.

Exercise 2.1. Prove the statement above.

Theorem 2.2. Let V and L be as before, and let W1, W2, W3 be invariant subspaces of L. Then (1) W1 + W2 is an invariant subspace of L, (2) (W1 + W2) + W3 = W1 + (W2 + W3), (3) W1 + {0} = {0} + W1.

Exercise 2.2. Prove Theorem 2.2. (The set of all invariant subspaces of a linear operator with the binary operation of the sum of two subspaces is a semigroup and a monoid.)

Exercise 2.3. Prove that the sum of invariant subspaces is commutative.

If an invariant subspace of a linear operator, L, is one-dimensional, we can say a bit more about it, and hence we have a special name for nonzero vectors in such a space.

Definition 2.3. We call a nonzero vector v ∈ V an eigenvector if Lv = λv for some scalar λ. The scalar λ is called an eigenvalue. The (ordered) pair (v, λ) is called an eigenpair. (Note: this is the generalization from Chapter 1 to cover any linear operator.)

Exercise 2.4. Given a one-dimensional invariant subspace, prove that any nonzero vector in that space is an eigenvector and all such eigenvectors have the same eigenvalue.

Conversely, the span of an eigenvector is an invariant subspace. From Theorem 2.2 it then follows that the span of a set of eigenvectors, which is the sum of the invariant subspaces associated with each eigenvector, is an invariant subspace.

Example 2.1. As mentioned before, the matrix A ∈ Rn×n defines a linear operator over Rn. Consider the real matrix A = [1 2 3; 4 5 6; 7 8 9] and the vector v = [1, −2, 1]^T. Then Av = 0 = 0·v. Hence v is an eigenvector of A and 0 is an eigenvalue of A. The pair (v, 0) is an eigenpair. Note that a matrix with an eigenvalue 0 is singular.
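A quick numerical check of this example, assuming numpy is available (the check itself is not part of the original notes):

```python
import numpy as np

# Verify the eigenpair (v, 0) from Example 2.1: A v = 0 * v, so A is singular.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
v = np.array([1.0, -2.0, 1.0])

print(A @ v)                      # ~ [0, 0, 0]
print(np.linalg.eigvals(A))       # one eigenvalue is (numerically) zero
print(np.linalg.matrix_rank(A))   # rank 2 < 3, confirming A is singular
```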

Example 2.2. The following is an example with an infinite dimensional vector space. Let C∞[a, b] be the set of all infinitely differentiable real functions on the (closed) interval [a, b]. We define the addition of two functions f, g ∈ C∞[a, b] by h = f + g, where h(x) ≡ (f + g)(x) = f(x) + g(x) (for x ∈ [a, b]), and for all α ∈ R and f ∈ C∞[a, b] we define h = αf by h(x) ≡ (αf)(x) = αf(x). Then C∞[a, b] with this definition of scalar multiplication and vector addition is a vector space (show this).

Let L be defined by Lu = uxx for u ∈ C∞[a, b]. Then L is a linear operator over C∞[a, b] and (sin ωx, −ω²) is an eigenpair for any ω ∈ R.

We have Lv = λv ⇔ Lv − λv = 0. We can rewrite the last expression as (L − λI)v = 0. Since v ≠ 0, the operator (L − λI) is singular (i.e., not invertible).

Although in some cases the eigenvalues and eigenvectors of a linear operator are clear by inspection, in general we need some procedure to find them. As all linear operators over finite dimensional vector spaces can be represented as matrices, all we need is a systematic procedure to find the eigenvalues and eigenvectors of matrices (we get our answer for the original operator by the corresponding linear combinations of basis vectors). Remember that the function that maps linear transformations between finite dimensional vector spaces (given bases for the spaces) to matrices is an isomorphism (i.e., an invertible linear transformation).

Given a basis B, let A = [L]B. Using the linearity of the standard transformation from operator to matrix (given a basis), we also have A − λI = [L − λI]B, and the matrix A − λI must be singular. Hence, det(A − λI) = 0.

Definition 2.4. The polynomial det(A − λI) is called the characteristic polynomial of A. The (polynomial) equation det(A − λI) = 0 in λ is called the characteristic equation for A (and for L!). The eigenvalues of A (and of L) are the roots of this characteristic equation. The multiplicity of an eigenvalue as a root of this equation is called the algebraic multiplicity of that eigenvalue.

IMPORTANT!! It may seem from the above that eigenvalues depend on the choice of basis, but this is not the case! To see this, we need only pick two different bases for V and show that the corresponding matrix of the transformation for one basis is similar to the matrix of the transformation for the other basis. Let A = [L]B. Let C also be a basis for V and define B = [L]C. Furthermore, let X = [I]C←B. Then we have A = X−1BX (recall that this is called a similarity transformation between A and B).

Theorem 2.5. A and B as defined above have the same eigenvalues.

Proof. From A = X−1BX we see that A − λI = X−1BX − λI = X−1(B − λI)X. Hence det(A − λI) = det(X−1) det(B − λI) det(X) and det(A − λI) = 0 ⇔ det(B − λI) = 0.

So, it is indeed fine to define the eigenvalues of an operator over a vector space using any basis for the vector space and its corresponding matrix. In fact, as the characteristic polynomial of an n × n matrix has leading term (−1)^n λ^n (check this), the characteristic polynomials of A and B are equal. So, we can call this polynomial the characteristic polynomial of L without confusion. Note that the proof above does not rely on the fact that A and B are representations of the (same) linear operator L, only that A is obtained from a similarity transformation of B. So, this is a general result for similar matrices.

We say that similarity transformations preserve eigenvalues. The standard methods for computing eigenvalues of any but the smallest matrices are in fact based on sequences of cleverly chosen similarity transformations the limit of which is an upper triangular (general case) or diagonal matrix (Hermitian or real symmetric case). Take a course in Numerical Linear Algebra for more details!

Eigenvectors of matrices are not preserved under similarity transformations, but they change in a straightforward way. Let A and B be as above and let Av = λv. Then X−1BXv = λv ⇔ B(Xv) = λ(Xv), so Xv is an eigenvector of B corresponding to λ.
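A small numerical sketch of these two facts, assuming numpy (the matrices and the random similarity X below are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))            # almost surely invertible
B = X @ A @ np.linalg.inv(X)               # then A = X^{-1} B X

# Similar matrices have the same eigenvalues (up to ordering and roundoff).
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(B)))

# If A v = lambda v, then X v is an eigenvector of B for the same lambda.
lam, V = np.linalg.eig(A)
v = V[:, 0]
print(np.allclose(B @ (X @ v), lam[0] * (X @ v)))
```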

If we are interested in computing eigenvectors of an operator L, then again the choice of basis is irrelevant (at least in theory; in practice, it can matter a lot). Let A and B be representations of L with respect to the bases B = {v1, v2, . . . , vn} and C = {w1, w2, . . . , wn} as above, with the change of coordinate matrix X = [I]C←B. By some procedure we obtain Au = λu, which corresponds to B(Xu) = λ(Xu). Define y = ∑_{i=1}^n ui vi (so that u = [y]B); then we have [Ly]B = [λy]B, which implies (by the standard isomorphism) that Ly = λy. However, we also have


[y]C = [I]C←B [y]B = Xu. This gives [Ly]C = [λy]C ⇔ B(Xu) = λ(Xu). So, computing eigenvectors of B leads to the same eigenvectors for L as using A.

Theorem 2.6. A linear operator, L, is diagonalizable if and only if there is a basis for the vector space with each basis vector an eigenvector of L. An n × n matrix A is diagonalizable if and only if there is a basis for Rn (respectively Cn) that consists of eigenvectors^5 of A.

^5 Here again, if the matrix is real, we must be careful to specify whether or not we are considering the matrix transformation as a map on Rn or on Cn. If the former and the eigenpairs are not all real, then we are forced to conclude it is not diagonalizable with respect to Rn, even though it may be if we take it with respect to Cn.

We often say A is diagonalizable if there exists an invertible matrix U such that U−1AU is diagonal. But clearly, if we set D = U−1AU and rearrange, we get

AU = UD ⇒ [Au1, . . . , Aun] = [d11u1, d22u2, . . . , dnnun] ⇒ Aui = dii ui, i = 1, . . . , n,

and since the ui cannot be zero (since U was assumed invertible), the columns of U must be eigenvectors and the elements of D must be eigenvalues.

Furthermore, you should make sure you are able to show that if L(x) = Ax and A is diagonalizable, then if you use the eigenvectors for the (only) basis on the right and left of the linear transformation/matrix transformation picture, you find that the matrix of the transformation is precisely the diagonal matrix containing the eigenvalues.
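A minimal numerical sketch of this statement, assuming numpy; the 2 × 2 matrix is only an illustration:

```python
import numpy as np

# Represent the map x -> A x in the eigenvector basis: U^{-1} A U is diagonal,
# with the eigenvalues of A on the diagonal.
A = np.array([[4.0, 1.0],
              [1.0, 4.0]])
lam, U = np.linalg.eig(A)              # columns of U are eigenvectors of A
D = np.linalg.inv(U) @ A @ U
print(np.round(D, 12))                 # diagonal matrix with eigenvalues 5 and 3
print(lam)
```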

As mentioned in the previous chapter, a matrix may not be diagonalizable. We now consider similarity transformations to block diagonal form as an alternative. (Refer also to the definition of block diagonal in Chapter 1.)

Definition 2.7. Let A be a complex or real n × n matrix and let the numbers m1, m2, . . . , ms be given such that 1 ≤ mi ≤ n for i = 1, . . . , s and m1 + m2 + · · · + ms = n. Furthermore, for i = 1, . . . , s let fi = 1 + ∑_{j=1}^{i−1} mj (where f1 = 1), ℓi = ∑_{j=1}^{i} mj, and Pi = {fi, . . . , ℓi}. We say that A is a (complex or real) block diagonal matrix with s diagonal blocks of sizes m1, m2, . . . , ms, if its coefficients satisfy ar,c = 0 if r and c are not elements of the same index set Pi. (See note below.)

Note that A is a block diagonal matrix if the coefficients outside the diagonal blocks are all zero. The first diagonal block is m1 × m1, the second block is m2 × m2, and so on. The first coefficient of block i has index fi; the last coefficient of block i has index ℓi.
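As a concrete sketch (assuming numpy and scipy, with made-up blocks), a block diagonal matrix maps a vector supported on one index set Pi to a vector supported on the same Pi:

```python
import numpy as np
from scipy.linalg import block_diag

# Two diagonal blocks of sizes m1 = 2 and m2 = 3, so P1 = {1, 2}, P2 = {3, 4, 5}.
A11 = np.array([[2.0, 1.0],
                [0.0, 2.0]])
A22 = np.array([[1.0, 3.0, 0.0],
                [0.0, 1.0, 3.0],
                [0.0, 0.0, 4.0]])
A = block_diag(A11, A22)

x = np.array([1.0, -1.0, 0.0, 0.0, 0.0])   # supported on the first block only
print(A @ x)                                # the last three coefficients stay zero
```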

Theorem 2.8. Let L be a linear operator over V, with dim(V) = n, and with invariant subspaces V1, . . . , Vs, such that V1 ⊕ V2 ⊕ · · · ⊕ Vs = V. Further, let there be bases v^{(i)}_1, . . . , v^{(i)}_{m_i} for each Vi (for i = 1, . . . , s), and define the ordered set B = {v^{(1)}_1, . . . , v^{(1)}_{m_1}, v^{(2)}_1, . . . , v^{(2)}_{m_2}, . . . , v^{(s)}_{m_s}} (i.e., B is a basis for V). Then A = [L]B is block diagonal.

Proof. (This is a sketch of the proof.) As each Vi is an invariant subspace, Lv^{(i)}_j ∈ Vi. Hence, Lv^{(i)}_j = ∑_{k=1}^{m_i} αk v^{(i)}_k. These coefficients correspond to columns fi, fi + 1, . . . , ℓi and the same rows. So, only the coefficients in the diagonal blocks of A, A1,1 (of size m1 × m1), . . . , As,s (of size ms × ms), can be nonzero.

Exercise 2.5. Write the proof in detail.

Note that a block diagonal matrix (as a linear operator over Rn or Cn) reveals invariant subspace information quite trivially. Vectors with zero coefficients except corresponding to one diagonal block obviously lie within an invariant subspace. Moreover, bases for the invariant subspaces can be trivially found. Finally, finding eigenvalues and eigenvectors (generalized eigenvectors) for block diagonal matrices is greatly simplified. For this reason we proceed by working with matrices. However, the standard isomorphism between the n-dimensional vector space V and Cn (or Rn) given a basis B for V guarantees that the invariant subspaces we find for the matrix [L]B correspond to invariant subspaces for L, as we observe in the following theorem.

Theorem 2.9. Let A ∈ Cn×n be a block diagonal matrix with s diagonal blocks of sizes m1, m2, . . . , ms. Define the integers fi = 1 + ∑_{j=1}^{i−1} mj (where f1 = 1) and ℓi = ∑_{j=1}^{i} mj for i = 1, . . . , s. Then the subspaces (of Cn)

Vi = {x ∈ Cn | xj = 0 for all j < fi and j > ℓi} = Span(e_{f_i}, e_{f_i+1}, . . . , e_{ℓ_i})

for i = 1, . . . , s are invariant subspaces of A.

Proof. The proof is left as an exercise.

Exercise 2.6. Prove Theorem 2.9.

Using the previous theorem we can also make a statement about invariant subspaces of L if the matrix representing L with respect to a particular basis is block diagonal.

Theorem 2.10. Let L be a linear operator over V, with dim(V) = n, and let the ordered set B = {v1, v2, . . . , vn} be a basis for V. Furthermore, let A = [L]B be block diagonal with block sizes m1, m2, . . . , ms (in that order). Let fi and ℓi for i = 1, . . . , s be as defined in Definition 2.7. Then the subspaces (of V) V1, . . . , Vs, defined by Vi = Span(v_{f_i}, v_{f_i+1}, . . . , v_{ℓ_i}), are invariant subspaces of L and V1 ⊕ V2 ⊕ · · · ⊕ Vs = V.

Proof. The proof is left as an exercise.


Exercise 2.7. Prove Theorem 2.10.

Procedure to find invariant subspaces for L

1. Get the matrix representation of L first. That is, pick a basis B for V and let A = [L]B.

2. Find an invertible matrix S such that F := S−1AS is block diagonal with blocks satisfying certain nice properties (nice is TBD).

3. The columns of S will span the invariant subspaces of A (group the columns according to the block sizes of F, as we’ve been doing in the preceding discussion).

4. Use S and B to compose the invariant subspaces (in V ) for L.

Now, Step 2 is non-trivial, but we’ll put this off and just assume it can be done. What remains is HOW do we finish Step 4? The following theorem addresses this issue.

Theorem 2.11. Let L be a linear operator over a complex n-dimensional vector space V, and let B = {b1, b2, . . . , bn} be an (arbitrary) basis for V. Let A = [L]B and let F = S−1AS, for any invertible matrix S ∈ Cn×n, be block diagonal with block sizes m1, m2, . . . , ms. Furthermore, let fi = 1 + ∑_{j=1}^{i−1} mj (f1 = 1) and ℓi = ∑_{j=1}^{i} mj, and let the ordered basis C = {c1, c2, . . . , cn} be defined by ci = ∑_{j=1}^{n} bj sj,i for i = 1, . . . , n. Then the spaces Vi = Span(c_{f_i}, . . . , c_{ℓ_i}) are invariant subspaces of L and V = V1 ⊕ V2 ⊕ · · · ⊕ Vs.

Proof. The proof is left as an exercise. Hint: consider I ◦ L ◦ I, and note that with this definition of the C basis, S = [I]C←B.

We will now consider in some more detail a set of particularly useful and revealing invariant subspaces that span the vector space. Hence, we consider block diagonal matrices of a fundamental type. Most of the following results hold only for complex vector spaces, which are, from a practical point of view, the most important ones.

Next we provide some links between the number of distinct eigenvalues, the number of eigenvectors, and the number of invariant subspaces of a matrix A ∈ Cn×n (working over the complex field).

Theorem 2.12. Let λ be an eigenvalue of A (of arbitrary algebraic multiplicity). There is at least one eigenvector v of A corresponding to λ.

Proof. Since A − λI is singular, there is at least one nonzero vector v such that (A − λI)v = 0.

Theorem 2.13. Let λ1, λ2, . . . , λk be distinct eigenvalues and let v^{(i)}_1, . . . , v^{(i)}_{m_i} be independent eigenvectors associated with λi, for i = 1, . . . , k. Then

{v^{(1)}_1, . . . , v^{(1)}_{m_1}, v^{(2)}_1, . . . , v^{(2)}_{m_2}, . . . , v^{(k)}_1, . . . , v^{(k)}_{m_k}}

is an independent set.

Proof. Left as an exercise for the reader (it’s in most linear algebra textbooks).

2.1 Toward a Direct Sum Decomposition

In general the above set of eigenvectors does not always give a direct sum decomposition for the vector space V. That is, it is not uncommon that we will not have a complete set of n linearly independent eigenvectors. So we need to think of another way to get what we’re after. A complete set of independent eigenvectors for an n-dimensional vector space (n independent eigenvectors for an n-dimensional vector space) would give n 1-dimensional invariant subspaces, each the span of a single eigenvector. These 1-dimensional subspaces form a direct sum decomposition of the vector space V. Hence the representation of the linear operator in this basis of eigenvectors is a block diagonal matrix with each block size equal to one, that is, a diagonal matrix. In the following we try to get as close as possible to such a block diagonal matrix and hence to such a direct sum decomposition. We will aim for the following two properties. First, we want the diagonal blocks to be as small as possible and we want each diagonal block to correspond to a single eigenvector. The latter means that the invariant subspace corresponding to a diagonal block contains a single eigenvector. Second, we want to make the blocks as close to diagonal as possible. It turns out that bidiagonal, with a single nonzero diagonal right above (or below) the main diagonal (picture?), is the best we can do.

In the following discussion, polynomials of matrices or linear operators play an important role. Note that for a matrix A ∈ Cn×n the matrices A² ∈ Cn×n, A³ ∈ Cn×n, etc. are well-defined and that Cn×n (over the complex field) is itself a vector space.

Hence, polynomials (of finite degree) in A, expressed as α0I + α1A + · · · + αnA^n, are well-defined as well and, for fixed A, are elements of Cn×n. Note there is a difference between a polynomial as a function of a free variable of a certain type and the evaluation of a polynomial for a particular choice of that variable.

Similarly, for linear operators over a vector space, L : V → V, composition of (or product of) the operator with itself, one or more (but finite) times, results in another linear operator over the same space, (L)^m ≡ L ◦ (L^{m−1}) and (L)^m : V → V. Indeed, the set of all linear operators over a vector space V (often expressed as L(V)) is itself a vector space (over the same field). (Exercise: Prove this!)

A nice property of polynomials in a fixed matrix (or linear operator) is that they commute (in contrast to two general matrices).

Exercise 2.8. Prove this for linear matrix polynomials A − λI and A − µI and linear operator polynomials L − λI and L − µI.

We will discuss this in greater detail later, but for the moment this is all we need.

Remember that in the following we consider the linear operator L over the (finite) n-dimensional vector space V with ordered basis B = {b1, b2, . . . , bn} and A = [L]B. We start by using a matrix decomposition that is quite revealing. We show that every matrix is similar to an upper triangular matrix with the eigenvalues on the diagonal in any desired order.

Theorem 2.14. Let A ∈ Cn×n. There exists a nonsingular S such that T = S−1AS is upper triangular. T has the same eigenvalues as A with the same multiplicity, and the eigenvalues of T (and A) can appear on the diagonal in any order.

Proof. Note that T = S−1AS is a similarity transformation, which proves that T and A have the same eigenvalues with the same multiplicities. It remains to prove that S exists with the desired properties. We prove this theorem by induction on n, the size of the matrix. We start with n = 2 to make the order of the eigenvalues meaningful.

Let A ∈ C2×2. Since the characteristic polynomial is of degree two, A has two eigenvalues, and for each distinct eigenvalue A has at least one eigenvector. Let λ1 be the eigenvalue we want at position (1, 1) on the diagonal and let x1 be a corresponding eigenvector, so that Ax1 = λ1x1. Now let y2 be any vector such that the matrix (x1 y2) is invertible. The existence of y2 follows from the fact that any set of independent vectors in a vector space can be extended to a basis for that vector space. Consider

(x1 y2)^{−1} A (x1 y2) = (x1 y2)^{−1} (λ1x1  Ay2) = (x1 y2)^{−1} (x1 y2) [λ1 α; 0 A1] = [λ1 α; 0 A1] = T.   (2.1)

So, for S = (x1 y2) we get the desired similarity transformation with the desired eigenvalue λ1 in the leading position. For the 2 × 2 case the remaining submatrix A1 is the second eigenvalue. If A1 is distinct from λ1, we can replace x1 with an eigenvector corresponding to A1 and get A1 in the leading position of T.

Now, assume that the theorem holds for matrices of dimensions 2 to n − 1 (the induction hypothesis). Next, we prove that, in that case, it also holds for matrices of dimension n.

Let A ∈ Cn×n and let λ1 be an eigenvalue of A, and we want λ1 in position (1, 1) of T. Let x1 be an eigenvector corresponding to λ1. Then there exist vectors y2, . . . , yn such that {x1, y2, . . . , yn} are independent. Define the matrix Y = (y2 y3 . . . yn) ∈ Cn×(n−1). Then the matrix (x1 Y) is invertible, and we proceed


as we did for the 2 × 2 case,

(x1 Y)^{−1} A (x1 Y) = (x1 Y)^{−1} (λ1x1  AY) = (x1 Y)^{−1} (x1 Y) [λ1 α^T; 0 A_{n−1}] = [λ1 α^T; 0 A_{n−1}],   (2.2)

where A_{n−1} is an (n − 1) × (n − 1) (sub)matrix. Note that λ(A) = {λ1} ∪ λ(A_{n−1}), where λ(B) indicates the set of eigenvalues (or spectrum) of a matrix B. By the induction hypothesis, there exists a matrix Z_{n−1} ∈ C(n−1)×(n−1) such that

Z_{n−1}^{−1} A_{n−1} Z_{n−1} = T_{n−1}.

Now let

Z = [1 0; 0 Z_{n−1}],

then

Z^{−1} (x1 Y)^{−1} A (x1 Y) Z = T_n.   (2.3)

Note that the matrix S in the theorem is given by S = (x1 Y)Z. Both matrices, (x1 Y) and Z, are invertible by construction, and the product of two invertible matrices is invertible again.

Corollary 2.15. Let A ∈ Cn×n have distinct eigenvalues λ1 with multiplicity m1, λ2 with multiplicity m2, . . . , and λs with multiplicity ms (where m1 + m2 + · · · + ms = n). There exists a similarity transformation S−1AS = T such that T is upper triangular and equal eigenvalues of A are grouped together in diagonal blocks, with the first diagonal block of size m1 × m1 with λ1 on the diagonal, the second diagonal block of size m2 × m2 with λ2 on the diagonal, and so on:

T = [ T1,1  T1,2  · · ·  T1,s
      O     T2,2  · · ·  T2,s
      ⋮            ⋱     ⋮
      O     O     · · ·  Ts,s ],   (2.4)

where Ti,i ∈ Cmi×mi is upper triangular and its diagonal is λiImi.

Exercise 2.9. Prove that an upper triangular matrix T ∈ Cn×n has invariant subspaces Span(e1), Span(e1, e2), . . . , Span(e1, . . . , en).
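A numerical sketch of Theorem 2.14 and Exercise 2.9, assuming numpy/scipy: the (complex) Schur decomposition computes a unitary, hence invertible, S with S−1AS upper triangular, and the spans of its leading columns are invariant subspaces of A. (The random test matrix is illustrative only.)

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

T, Q = schur(A, output='complex')       # A = Q T Q^*, Q unitary, T upper triangular
print(np.allclose(Q @ T @ Q.conj().T, A))
print(np.allclose(np.tril(T, -1), 0))   # T is upper triangular
print(np.sort_complex(np.diag(T)))      # eigenvalues of A on the diagonal of T
print(np.sort_complex(np.linalg.eigvals(A)))

# Span(q1, ..., qk) is an invariant subspace of A for each k (cf. Exercise 2.9).
k = 2
Qk = Q[:, :k]
print(np.allclose(A @ Qk, Qk @ (Qk.conj().T @ A @ Qk)))
```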


Next we derive a sequence of similarity transformations that will show that the matrix T is similar to a block diagonal matrix with the mi × mi upper triangular matrices Ti,i for i = 1, . . . , s on the diagonal.

Theorem 2.16. Let A ∈ Cn×n be the upper triangular matrix

A = [L H; O M],

with (upper triangular) diagonal blocks L ∈ Cm×m (1 ≤ m < n) and M ∈ C(n−m)×(n−m) such that λ(L) ∩ λ(M) = ∅. Then there exists a similarity transformation S−1AS such that

S−1AS = [L O; O M].

Proof. Since we want to preserve the blocks L and M, we consider S of the form

S = [Im Q; O In−m],

where Im ∈ Cm×m and In−m ∈ C(n−m)×(n−m). Then

S−1AS = [Im −Q; O In−m] [L H; O M] [Im Q; O In−m] = [L  LQ − QM + H; O  M],

where the reader should verify that S−1 has the form used here. This matrix has the desired property if LQ − QM + H = O. It turns out that for L and M with the properties given (both upper triangular and their spectra disjoint), for any H, we can pick (solve for) a Q such that this equation is satisfied. Write the equation in the following form

LQ − QM = −H,

and, exploiting the fact that L and M are upper triangular, solve this equation column by column. For the first column we have

Lq1 − q1m1,1 = −h1 ⇔ (L − m1,1Im)q1 = −h1.

Since m1,1 ∈ λ(M) ⇒ m1,1 ∉ λ(L), (L − m1,1Im) is nonsingular (and upper triangular) and we can solve the equation for q1. However, once q1 is known, we can consider the second column

Lq2 − q2m2,2 = −h2 + q1m1,2 ⇔ (L − m2,2Im)q2 = −h2 + q1m1,2.


Since m2,2 ∈ λ(M) ⇒ m2,2 ∉ λ(L), (L − m2,2Im) is nonsingular (and upper triangular) and we can solve the equation for q2. Now, assuming we have solved for columns 1 to k − 1, we proceed for the k-th column as follows:

Lqk − qk mk,k = −hk + ∑_{j=1}^{k−1} qj mj,k ⇔ (L − mk,kIm)qk = −hk + ∑_{j=1}^{k−1} qj mj,k.

Again since the matrix on the left-hand side is nonsingular, we can solve the equation for k = 3, 4, . . . , n − m. Hence, we can always obtain Q such that LQ − QM + H = O.

Note that it does not matter whether L or M is singular, as long as λ(L) ∩ λ(M) = ∅.
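A small numerical sketch of this construction, assuming numpy/scipy (the blocks below are made up): solving the Sylvester equation LQ − QM = −H and applying S = [I Q; O I] removes the off-diagonal block, exactly as the column-by-column argument above does.

```python
import numpy as np
from scipy.linalg import solve_sylvester, block_diag

L = np.array([[1.0, 2.0],
              [0.0, 1.0]])
M = np.array([[3.0, 1.0],
              [0.0, 4.0]])                 # lambda(L) = {1}, lambda(M) = {3, 4}: disjoint
H = np.array([[5.0, 6.0],
              [7.0, 8.0]])
A = np.block([[L, H], [np.zeros((2, 2)), M]])

Q = solve_sylvester(L, -M, -H)             # solves L Q + Q (-M) = -H, i.e. L Q - Q M = -H
S = np.block([[np.eye(2), Q], [np.zeros((2, 2)), np.eye(2)]])
S_inv = np.block([[np.eye(2), -Q], [np.zeros((2, 2)), np.eye(2)]])

print(np.allclose(S_inv @ A @ S, block_diag(L, M)))   # True: the H block is gone
```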

Corollary 2.17. Let A ∈ Cn×n have distinct eigenvalues λ1 with multiplicity m1, λ2 with multiplicity m2, . . . , and λs with multiplicity ms (where m1 + m2 + · · · + ms = n). There exists a similarity transformation S−1AS = D such that D is block diagonal and upper triangular, with the first diagonal block of size m1 × m1 with λ1I on the diagonal, the second diagonal block of size m2 × m2 with λ2I on the diagonal, and so on:

D = [ T1,1  O     · · ·  O
      O     T2,2  · · ·  O
      ⋮            ⋱     ⋮
      O     O     · · ·  Ts,s ],

where Ti,i ∈ Cmi×mi is upper triangular and its diagonal is λiImi.

Proof. According to Corollary 2.15, we can use a similarity transformation to find an upper triangular matrix T similar to A with equal eigenvalues grouped in diagonal blocks as given in (2.4). We now apply Theorem 2.16 with L = T1,1, H = (T1,2 . . . T1,s), and

M = [ T2,2  · · ·  T2,s
      ⋮            ⋮
      Ts,2  · · ·  Ts,s ],

and compute Q to zero the block row corresponding to H. Next, we apply Theorem 2.16 with

L = [T1,1 O; O T2,2],   H = [ T1,3  · · ·  T1,s
                              T2,3  · · ·  T2,s ],

and

M = [ T3,3  · · ·  T3,s
      ⋮            ⋮
      Ts,3  · · ·  Ts,s ],


and so on.

This decomposition reveals a number of important properties of the matrix A and special invariant subspaces, as well as of any linear operator over a finite dimensional vector space.

Definition 2.18. Let L be a linear operator over the vector space V, and let U be an invariant subspace of L. We define the restriction of L to U as the linear operator LU : U → U such that for all u ∈ U, LU u = Lu.

Exercise 2.10. Let L be a linear operator over V and let B = {v1, v2, . . . , vn} be a basis for V such that A = [L]B is block diagonal with diagonal blocks A1,1 of size m1, A2,2 of size m2, . . . , As,s of size ms (in that order), and let fi and ℓi be defined as in Definition 2.7 for i = 1, . . . , s. Let Bi = {v_{f_i}, v_{f_i+1}, . . . , v_{ℓ_i}} and let Vi = Span(Bi) for i = 1, . . . , s. Show that Ai,i = [L_{V_i}]_{B_i}.

Corollary 2.17 shows that every matrix has a decomposition A = SDS−1 that reveals (unique) invariant subspaces of dimension mi = aλi (the algebraic multiplicity) associated with each distinct eigenvalue. As every linear operator over a finite dimensional vector space with a given basis is uniquely associated with a matrix, we have the same conclusion about linear operators.

We next discuss a few additional properties of matrices and linear operators that follow (almost) directly from this decomposition or the block diagonal form.

To each distinct eigenvalue λi of A (L) there corresponds an invariant subspace, Vλi, of dimension aλi such that the restriction of A (L) to that invariant subspace has only the eigenvalue λi. There is no invariant subspace of larger dimension with this property. There may be invariant subspaces of smaller dimension with this property, and finding those and characterizing them is the topic of the remainder of this chapter.

Note that each invariant subspace of dimension aλi associated with a distinct eigenvalue has at least one eigenvector. The same holds for any linear operator that, with respect to some given basis, has the matrix A as its representation.

These invariant subspaces, Vi, associated with the distinct eigenvalues λi form a direct sum decomposition of the vector space Cn (or V for L). So, we have achieved part of our quest, a direct sum decomposition of the vector space over which L (or A) is defined into invariant subspaces that are associated with the eigenvalues.

The next two examples show that this may or may not be the best we can do. (Perhaps we should explicitly identify A with the linear operator LA over Rn or Cn, or treat them separately; I prefer the first.) It shows that with every distinct eigenvalue λ ∈ λ(L) is associated an invariant subspace of dimension aλ. The matrix A in the corollary above has invariant

prove general possibility to turn block diagonal matrix with distinct spectra for each block into block diagonal matrix

Let L be a linear operator over the complex vector space V, and let U be a finite dimensional invariant subspace of L with dim(U) ≥ 1. Furthermore, let x ∈ U and let m be the smallest integer such that the set of vectors {x, Lx, . . . , L^m x} is dependent. So, {x, Lx, . . . , L^{m−1} x} is independent. Since U is finite dimensional such an m always exists and obviously m ≤ dim(U).

Since U is an invariant subspace and x ∈ U, Lx ∈ U and (by induction) (L)^k x ∈ U for any k ∈ N. The following two questions relate to the U, x and L in this and the previous paragraph.

Exercise 2.11. Prove that for any polynomial p(L) in L, and x in the U defined above, we have p(L)x ∈ U.

Exercise 2.12. Prove that for any polynomial pm−1(L) of degree at most m − 1, except the zero polynomial, pm−1(L)x ≠ 0.

However, from the dependence of {x, Lx, . . . , L^m x} it follows that there is a nontrivial linear combination of these vectors that sums to zero: α0x + α1Lx + · · · + αmL^m x = 0. This is equivalent to (α0I + α1L + · · · + αmL^m)x = 0.

Theorem 2.19. Let L and U be as above. Then, there is a vector u ∈ U such that Lu = λu (for some λ). (Every invariant subspace of a complex linear operator contains at least one eigenvector.)

Proof. Let x ∈ U and let m be the smallest integer such that the set of vectors {x, Lx, . . . , L^m x} is dependent. Hence, there is a nontrivial linear combination such that ∑_{k=0}^m αk L^k x = 0 ⇔ (∑_{k=0}^m αk L^k)x = 0. By the fundamental theorem of algebra^6 this is equivalent to

αm(L − rmI)(L − rm−1I) · · · (L − r1I)x = 0

(for r1, . . . , rm such that αm(L − rmI)(L − rm−1I) · · · (L − r1I) = ∑_{k=0}^m αk L^k). From Exercise 2.12 above we know that (L − rm−1I) · · · (L − r1I)x ≠ 0, and therefore (L − rm−1I) · · · (L − r1I)x is an eigenvector of L with eigenvalue rm.

^6 Each complex polynomial can be factored over the complex field.

In fact, this proof tells us a little more. Since the factors (L − riI) commute, we also have

αm(L − rm−1I)(L − rmI)(L − rm−2I) · · · (L − r1I)x = 0,

so that (L − rmI)(L − rm−2I) · · · (L − r1I)x must be an eigenvector of L with eigenvalue rm−1. Of course, this construction can be repeated to bring each factor (L − riI) to the left to show that (∏_{k≠i}(L − rkI))x is an eigenvector of L with eigenvalue ri. As we have shown above, eigenvectors corresponding to distinct eigenvalues are independent. So, we have the following useful theorem.

Theorem 2.20. Let L and U be as above, and let x ∈ U and m be the smallest integer such that the set of vectors {x, Lx, . . . , L^m x} is dependent. Then there exist αm and r1, . . . , rm such that

αm(L − rm−1I)(L − rmI)(L − rm−2I) · · · (L − r1I)x = 0.

Furthermore, each ri is an eigenvalue of L and for each distinct ri there exists ui ∈ U such that Lui = riui.

Proof. (see discussion above)

As we have seen above (actually this needs to be discussed still) the number of independent eigenvectors of L is the sum of the geometric multiplicities of each (distinct) eigenvalue. We have also seen that each invariant subspace contains at least one eigenvector. We would like to derive a set of invariant subspaces such that

1. their direct sum is the vector space V , so they can only intersect trivially,

2. they are fundamental (better term!) in that none can be split further into the direct sum of invariant subspaces.

To this end we consider invariant subspaces U such that U contains exactly one independent eigenvector. We are still assuming complex vector spaces.

Theorem 2.21. Let L be a linear operator over the complex vector space V. Let U be a finite dimensional invariant subspace of L with dim(U) = s and the property that for any two eigenvectors of L, u1, u2 ∈ U, {u1, u2} is dependent. Then there exists a vector x ∈ U and a scalar λ such that (L − λI)^s x = 0 and (L − λI)^{s−1} x ≠ 0.

The vector (L − λI)^{s−1} x is the unique independent eigenvector of L in U with eigenvalue λ.

Proof. The first part of the proof follows almost directly from the previous discussion. For any vector y ∈ U, there is a smallest m ≤ s such that y, Ly, . . . , L^m y is dependent and a nontrivial linear combination α0y + α1Ly + · · · + αmL^m y = 0. We can again factor the polynomial in L to obtain

αm(L − rmI)(L − rm−1I)(L − rm−2I) · · · (L − r1I)y = 0.

However, according to Theorem 2.20 every distinct ri yields an independent eigenvector ui ∈ U. As we have by assumption a single independent eigenvector in U, the polynomial can only have one distinct root λ (with multiplicity m). So the polynomial equation must be

αm(L − λI)^m y = 0,

and as αm ≠ 0 (why?), we have (L − λI)^m y = 0. As proved in Exercise 2.12 we also have that (L − λI)^{m−1} y ≠ 0.

In the second part of the proof we show that there is a vector x ∈ U such that m = s. Assume m < s (otherwise, we are done). As dim(U) = s > m, there must be a vector z such that {y, Ly, . . . , L^{m−1} y, z} is independent. Furthermore, there must be a smallest positive integer ℓ such that {z, Lz, . . . , L^ℓ z} is dependent. As for y and m we have that (L − λI)^ℓ z = 0 and (L − λI)^{ℓ−1} z ≠ 0. Both (L − λI)^{ℓ−1} z and (L − λI)^{m−1} y are eigenvectors and therefore (by assumption) (L − λI)^{ℓ−1} z = α(L − λI)^{m−1} y.

Now assume (first) that ℓ ≤ m. From the equation above we have that

(L − λI)^{ℓ−2} z = α(L − λI)^{m−2} y + β1(L − λI)^{m−1} y,
(L − λI)^{ℓ−3} z = α(L − λI)^{m−3} y + β1(L − λI)^{m−2} y + β2(L − λI)^{m−1} y,
⋮
z = α(L − λI)^{m−ℓ} y + β1(L − λI)^{m−ℓ+1} y + β2(L − λI)^{m−ℓ+2} y + · · · + β_{ℓ−1}(L − λI)^{m−1} y.

But this yields a contradiction, since {y, Ly, . . . , L^{m−1} y, z} is independent. Hence ℓ > m must hold. If ℓ < s there must again be a vector z̃ ∈ U such that {z, Lz, . . . , L^{ℓ−1} z, z̃} is independent, and we repeat the construction. As s is finite, this repeated construction leads to the desired vector x.

Now let the linear operator L over a complex vector space V have m independent eigenvectors u1, . . . , um. Then there are invariant subspaces Ui of L such that each Ui contains ui and this is the only independent eigenvector in Ui. Furthermore, dim(Ui) = si and si is maximal.

Theorem 2.22. U1 ⊕ U2 ⊕ · · · ⊕ Um = V.

Proof. TBD

• applications: consider matrices from mass-spring systems, matrices from finite differences, matrices from finite elements, other simple examples, (partial) differential equations

2.2 Invariant Subspace Examples

We now discuss a number of more extensive examples. The first involves simple differential equations.

2.2.1 Differential Equations

Consider again the space C∞[t0, te] of infinitely differentiable (real) functions on [t0, te], the function y ∈ C∞[t0, te], and the simple differential equation ẏ = λy for some real λ. If we define L : C∞[t0, te] → C∞[t0, te] by Ly ≡ ∂y/∂t ≡ ẏ (verify that L is a linear operator over C∞[t0, te]), then the differential equation can be written as Ly = λy. So, solutions to this differential equation are eigenvectors of L with eigenvalue λ. The function u(t) = e^{λt} with t ∈ [t0, te] is a solution, and so is ae^{λt} for any a ∈ R. Note that, as always with eigenvectors, the length is not relevant (except that it cannot be 0). Alternatively, e^{λt} is a nontrivial solution to the singular linear system (L − λI)y = 0, called the homogeneous equation in ODE terms. As always holds for solutions to homogeneous (linear) equations, any function ae^{λt}, for any a ∈ R, is also a solution. Note that these properties follow directly from the linear algebra structure of the problem: (1) the (nonzero) length of eigenvectors is irrelevant and (2) any scalar multiple of a solution to a homogeneous linear equation is also a solution.

To identify a unique solution we typically assume an initial condition (but a condition at another point in the interval would work just as well). Given an initial condition y(t0) = y0, we use this condition to fix the choice of a in the one-dimensional space of solutions ae^{λt}.

Next, we show that more complicated (coupled) systems of linear differential equations take a very simple and insightful form if we express the problem in the eigenvector basis. Consider the coupled system of linear differential equations

ẏ = [4 1; 1 4] y,   for t ∈ (0, T ],   with y(0) = [2; 0].

In this case the linear differential equation is more difficult to solve as the equations for the components of y(t), y1(t) and y2(t), are coupled. We now show that in the eigenvector basis the equations become uncoupled and the solution for each component (in this basis) can be given directly, as above. The matrix has the eigenvalues λ1 = 5, λ2 = 3 and corresponding eigenvectors v1 = [1 1]^T and v2 = [1 −1]^T. Hence we have the eigendecomposition of this matrix:

[4 1; 1 4] = [1 1; 1 −1] [5 0; 0 3] [1/2 1/2; 1/2 −1/2],   where   [1/2 1/2; 1/2 −1/2] = [1 1; 1 −1]^{−1}.

Let the component vector of y(t) in the basis [v1 v2] be [α(t) β(t)]^T. Then the component vector of ẏ in the eigenvector basis is [α̇ β̇]^T, and the differential equation above becomes

[1 1; 1 −1] [α̇; β̇] = [1 1; 1 −1] [5 0; 0 3] [1 1; 1 −1]^{−1} [1 1; 1 −1] [α; β]
⇔ [1 1; 1 −1] [α̇; β̇] = [1 1; 1 −1] [5 0; 0 3] [α; β]
⇔ [α̇; β̇] = [5 0; 0 3] [α; β],

and we’re back to two (uncoupled) equations that can be solved independently. Note that this is just the same differential equation written in the basis defined by the eigenvectors. Computing the decomposition of y(0) along the eigenvectors gives the initial conditions for α and β.

[α(0); β(0)] = [1/2 1/2; 1/2 −1/2] [2; 0] = [1; 1].


So, the solution for α is given by α(t) = e^{5t} and the solution for β is given by β(t) = e^{3t}. Multiplying each by the corresponding eigenvector gives the solution in the original basis (the canonical basis for R2).

y(t) = [1 1; 1 −1] [α(t); β(t)] = [e^{5t} + e^{3t}; e^{5t} − e^{3t}].

Note that the eigenvalue with largest real part gives the dominant component or mode of the solution.
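A short numerical sketch of this example, assuming numpy: changing to the eigenvector basis decouples the system, and the result agrees with the closed-form solution above.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 4.0]])
y0 = np.array([2.0, 0.0])

lam, V = np.linalg.eig(A)              # eigenvalues 5 and 3 (in some order)
eta0 = np.linalg.solve(V, y0)          # initial condition in the eigenvector basis

t = 0.7
y_t = V @ (np.exp(lam * t) * eta0)     # y(t) = V diag(e^{lambda_i t}) V^{-1} y0
print(y_t)
print(np.exp(5 * t) + np.exp(3 * t),   # closed-form solution from the text
      np.exp(5 * t) - np.exp(3 * t))
```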

We now generalize this example for a diagonalizable matrix (we discuss the nondiagonalizable case in the next chapter). Consider the differential equation ẏ = Ay for t ∈ (0, T ] and y(0) = y0, for y : [0, T ] → Rn and A ∈ Rn×n. Note that although A is a real matrix, we may need to factor the characteristic equation of A over the complex field resulting (potentially) in complex eigenvalues and eigenvectors.

Let the eigendecomposition of A be A = VΛV−1, where V is the matrix with the (typically normalized) eigenvectors as columns and Λ is a diagonal matrix with the eigenvalues on the diagonal (in the order corresponding to the order of the eigenvectors in V). Let y(t) in the eigenvector basis (the columns of V) be represented by η, that is, y(t) = Vη(t). Again substituting this and the eigendecomposition of A into the differential equation gives

Vη̇ = VΛV−1Vη = VΛη ⇔ η̇ = Λη.

This equation has componentwise solutions ηi(t) = η0,i e^{λi t}, where the initial values η0,i are obtained from

y0 = y(0) = Vη(0) = Vη0 ⇔ η0 = V−1y0.

The solution in the eigenvector basis V is now given by η(t) = diag(e^{λi t}) η0.^7 We obtain the solution in the standard basis again by multiplying by the matrix of basis vectors (here, the eigenvectors) V: y(t) = V diag(e^{λi t}) V−1 y0.

^7 We use diag(x) or diag(xi) to indicate a diagonal matrix with the coefficients of x on the diagonal.

The matrix V diag(e^{λi t}) V−1 is often denoted as e^{At} (called the matrix exponential). This notation is inspired by the simple form that polynomials in diagonalizable matrices take. If A = VΛV−1, then A^k = VΛ^kV−1. Consider the polynomial p(t) = ∑_{k=0}^n pk t^k applied to A. We have

p(A) = p0I + p1A + p2A² + · · · + pnA^n   (2.5)
     = p0VV−1 + p1VΛV−1 + p2VΛ²V−1 + · · · + pnVΛ^nV−1   (2.6)
     = V(p0I + p1Λ + p2Λ² + · · · + pnΛ^n)V−1   (2.7)
     = V diag(p(λk)) V−1.   (2.8)

Note that the eigenvectors of p(A) are the eigenvectors of A and the polynomial is simply applied to each eigenvalue individually. Noting that

e^{λk t} = 1 + λk t + (1/2)(λk t)² + (1/6)(λk t)³ + · · · ,

we see that

V diag(e^{λi t}) V−1 = V (I + tΛ + (1/2)(tΛ)² + · · ·) V−1   (2.9)
                     = I + tA + (1/2)(tA)² + · · · ,   (2.10)

which is the series that generates the exponential function (note that this series converges for any value of λk t). So, it’s the exponential function applied to tA, hence the notation e^{At}.
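A small numerical check of (2.8) and of the matrix exponential notation, assuming numpy and scipy (the 2 × 2 matrix is just an illustration):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4.0, 1.0],
              [1.0, 4.0]])
lam, V = np.linalg.eig(A)
V_inv = np.linalg.inv(V)

# p(A) = V diag(p(lambda_k)) V^{-1} for the polynomial p(t) = 2 + 3 t + t^2.
p_A = 2 * np.eye(2) + 3 * A + A @ A
print(np.allclose(p_A, V @ np.diag(2 + 3 * lam + lam**2) @ V_inv))

# e^{At} computed via the eigendecomposition agrees with scipy's expm.
t = 0.5
print(np.allclose(V @ np.diag(np.exp(lam * t)) @ V_inv, expm(A * t)))
```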

We will discuss polynomials of matrices and applications of these in more detail later.

In many cases, we are not so much interested in the details of the solution y(t) given above, but only in the simple question (simple to ask) whether y(t) → 0 for t → ∞, or whether components of y may grow without bound or remain bounded while not going to zero. This addresses the question of stability of a dynamical system. We will discuss some aspects here briefly and come back to the issue in more detail after defining how to measure magnitudes or lengths of vectors in a vector space.

In the example given above of a system of two coupled ordinary differential equations we see that both solution components blow up for increasing t, y = [1; 1] e^{5t} + [1; −1] e^{3t}. However, for large t it is the component [1 1]^T e^{5t} that dominates. This is easily seen by writing the solution as

y(t) = e^{5t} [1 + e^{−2t}; 1 − e^{−2t}].

This is generally the case, assuming that the initial solution has a component in the direction of the eigenvector corresponding to the eigenvalue with the largest real part.

Consider the autonomous dynamical system ẏ(t) = f(y(t)), with y(0) = y0, where y : [0, ∞) → Rn and f : Rn → Rn. If f(y0) = 0, there is no change, ẏ = 0, and the system is in steady (or stationary) state, also called in equilibrium. Stability addresses what happens if we perturb the system slightly from this steady state. If we perturb the state (sufficiently) slightly in an arbitrary direction, will the system return to the equilibrium (the equilibrium is asymptotically stable), will the system stay close to the equilibrium but not necessarily return to it (the equilibrium is stable), or will the system move further and further away from the equilibrium (the equilibrium is unstable)?

Consider a perturbation from y0 and let the perturbed state of the system be represented by ȳ(t) = y(t) + ε(t) = y0 + ε(t). Then, assuming the Jacobian of f is Lipschitz continuous (details later), we have

dȳ/dt = ε̇ = f(y0 + ε(t)) = f(y0) + J(y0)ε(t) + O(∑_{i=1}^n εi²),   (2.11)

where

J(y) = [ ∂f1(y)/∂y1  ∂f1(y)/∂y2  · · ·  ∂f1(y)/∂yn
         ∂f2(y)/∂y1  ∂f2(y)/∂y2  · · ·  ∂f2(y)/∂yn
         ⋮
         ∂fn(y)/∂y1  ∂fn(y)/∂y2  · · ·  ∂fn(y)/∂yn ].

We can discuss the nonlinear term O(∑_{i=1}^n εi²) in more detail in Chapter 4, after we introduce norms. However, it should be clear that the analysis of stability depends first of all on the properties of the Jacobian J (at the equilibrium), as for small ε the nonlinear term will be very small. Assuming, in addition, that the Jacobian is diagonalizable, we consider the system of first order equations ε̇ = Jε (so, we assume the nonlinear term is negligible for small enough ε) with initial condition ε(0) = ε0, and proceed as above. Let J = VΛV−1 be the eigendecomposition of the Jacobian, and let the representation or coordinate vector of ε in the basis V be given by η, ε = Vη. Then the solution components for η are given by ηk(t) = e^{λk t} η0,k. Since we are interested in (in)stability under arbitrary perturbations, we assume all components of η0 are nonzero. Hence, for all components of η to go to zero (asymptotic stability), we need that Re(λk) < 0 for all k = 1, . . . , n. For stability, we need that no components grow while some (possibly all, possibly none) decay. Hence, we need that Re(λk) ≤ 0 for all k = 1, . . . , n. Components with zero real part of the eigenvalue λk are balancing on the boundary of instability, and in this case the effect of the nonlinear term in (2.11) will play a part. Finally, for instability we need that at least one component grows without bound, which means there is at least one k such that Re(λk) > 0. We will show later that for systems that satisfy the conditions given for this autonomous system, the analysis of the linear component is actually sufficient for both asymptotically stable and unstable equilibria. For the linearly stable but not asymptotically stable case, analysis of the nonlinear term is required.
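A small sketch of this linearized stability test, assuming numpy and using a hypothetical right-hand side f (a damped pendulum; not an example from these notes):

```python
import numpy as np

# f(y) = (y2, -sin(y1) - c*y2) has an equilibrium at y0 = (0, 0); stability is
# read off from the eigenvalues of the Jacobian J(y0).
c = 0.5
def jacobian(y):
    return np.array([[0.0, 1.0],
                     [-np.cos(y[0]), -c]])

lam = np.linalg.eigvals(jacobian(np.array([0.0, 0.0])))
print(lam)
if np.all(lam.real < 0):
    print("asymptotically stable equilibrium")
elif np.any(lam.real > 0):
    print("unstable equilibrium")
else:
    print("linear analysis inconclusive: some Re(lambda_k) = 0")
```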

It is important to realize that in many problems of practical importance the ‘finite’ dimension n can be very, very large, and computing the complete eigendecomposition would be a hopeless task (and when still doable, often extremely expensive). However, our analysis shows that this is not necessary. All we need to know are the eigenvalues with real part near zero or (much) larger than zero. In practice, this means that we need only compute a modest number of eigenvalues of a very large matrix. That turns out to be a manageable task. We will discuss this briefly in the section on Krylov methods for linear systems and eigenvalue problems. For detailed information, see books on numerical analysis or large-scale eigenvalue problems. Should we collect references somewhere?

Discretizing a Differential Equation

(We could also put this earlier.) Next we consider a numerical approach, finite difference discretization, to analyze or solve differential equations that is much more generally applicable than symbolic approaches. In order to emphasize similarities and differences with symbolic approaches we consider, at first, a very simple equation. Discretization of PDEs is one of the major sources of practical matrix problems, and we can use this to provide us with meaningful (higher dimensional) linear operators and (larger) matrices to demonstrate the theory introduced and show how this theory is used in solving very important problems in science and engineering. After introducing inner products and related theory (linear and bilinear forms), we will discuss another approach to discretization, finite elements, that relies even more on the linear algebra.

In deriving effective discretizations and numerical methods, various concerns regarding convergence and error bounds are important, but they are more in the domain of analysis, and we do not consider them here. An advanced numerical analysis course would answer these questions.

First we prove a result that will help us put finite difference discretization in perspective for these notes.

Theorem 2.23. The composition of two linear transformations, L : U → V and M : V → W, (M ◦ L) : U → W, defined as (M ◦ L)u = M(Lu) (for u ∈ U), is a linear transformation. Analogously, for two matrices B ∈ Cm×n and A ∈ Cℓ×m, the product AB ∈ Cℓ×n defines a linear transformation.

Proof. The proof is left as an exercise.

Exercise 2.13. Prove Theorem 2.23. Prove that the composition of a sequence of linear transformations Ln ◦ Ln−1 ◦ · · · ◦ L1, where Li : Vi → Vi+1, is again a linear transformation.

Consider the partial differential equation ut = uxx + f(x) for t ∈ (0, T ] and x ∈ [0, 1] (the heat equation) or, slightly more generally if the rate of heat conduction is not constant, ut = (a(x)ux)x + f(x), and in two spatial dimensions plus time, ut = ∇ · (a(x, y)∇u) + f(x, y), where (x, y) ∈ [0, 1] × [0, 1]. We can rewrite these equations in the form of linear operator equations. For the first equation, we get (∂t − ∂x²)u = f.

To discretize the first equation we consider the unknown values of the function u(x, t) at the set of grid points (xj, tn) defined by xj = j∆x for j = 0, . . . , J and ∆x = 1/J, and tn = n∆t for n = 0, . . . , N and ∆t = T/N.

Define the vector un = (u(x0, n∆t), u(x1, n∆t), . . . , u(xJ , n∆t)).

Exercise 2.14. Show that the mapping L(u) = un (for given n) is a linear mapping.

This process is called sampling (especially in terms of signals and pictures). The same holds for sampling at a set of arbitrary (x, t) points (an irregular grid/sampling). Occasionally we will also consider a matrix of u values in space and time with entries uj,n = u(j∆x, n∆t), or the vector uk = u(j∆x, n∆t) with k = (n − 1)·J + j (just a linear reordering of the matrix above).

The next step in finite difference discretization is to replace all derivatives by linear combinations of (unknown) function values on the regular grid. The point being that after this only the unknown function values are unknowns in the discrete problem, and we do not need the values of the various derivatives of u on the grid.

We define a number of finite difference operators, and the reader should verify that (1) these finite difference operators are linear operators and that (2) other (more elaborate) finite difference operators can be defined similarly. Given ∆t and ∆x, we define the following forward and backward differences in space (x) and time (t):

∆+x u(x, t) = u(x + ∆x, t) − u(x, t),   (2.12)
∆−x u(x, t) = u(x, t) − u(x − ∆x, t),   (2.13)
∆+t u(x, t) = u(x, t + ∆t) − u(x, t),   (2.14)
∆−t u(x, t) = u(x, t) − u(x, t − ∆t),   (2.15)

and the following central differences in space and time

δx u(x, t) = u(x + (1/2)∆x, t) − u(x − (1/2)∆x, t),   (2.17)
∆0x u(x, t) = u(x + ∆x, t) − u(x − ∆x, t),   (2.18)
δt u(x, t) = u(x, t + (1/2)∆t) − u(x, t − (1/2)∆t),   (2.19)
∆0t u(x, t) = u(x, t + ∆t) − u(x, t − ∆t).   (2.20)

We can compose further difference operators from these. An important one is

δ²x u(x, t) = δx(δx u) = δx(u(x + (1/2)∆x, t) − u(x − (1/2)∆x, t))
            = (u(x + ∆x, t) − u(x, t)) − (u(x, t) − u(x − ∆x, t))
            = u(x + ∆x, t) − 2u(x, t) + u(x − ∆x, t).   (2.22)

Using Taylor approximation (assuming that u has enough derivatives) we can show that these difference operators (with a proper weighting) provide approximations to derivatives. For ∆x^{−1} ∆+x u(x, t) we have

∆x^{−1} ∆+x u(x, t) = ∆x^{−1} (u(x + ∆x, t) − u(x, t))   (2.23)
                    = ∆x^{−1} (u(x, t) + ∆x ux(x, t) + (1/2)∆x² uxx(x̄, t) − u(x, t))   (2.24)
                    = ux(x, t) + (1/2)∆x uxx(x̄, t),   (2.25)

for some x̄ ∈ (x, x + ∆x).


The weighted finite difference operator (1/∆x²) δ²x u(x, t) approximates the second derivative in x (below u, ux, etc. are evaluated at (x, t)):

δ²x u(x, t) = u(x − ∆x, t) − 2u(x, t) + u(x + ∆x, t)
            = (u − ∆x ux + (∆x²/2) uxx − (∆x³/6) uxxx + (∆x⁴/24) uxxxx(x̄1, t)
               + u + ∆x ux + (∆x²/2) uxx + (∆x³/6) uxxx + (∆x⁴/24) uxxxx(x̄2, t) − 2u)
            = ∆x² uxx + O(∆x⁴).

Dividing by ∆x² gives

(1/∆x²) δ²x u(x, t) = uxx + O(∆x²).   (2.26)

Evaluating these finite differences at the grid points results in equations involving only the unknown values of u(x, t) at the grid points, u(j∆x, n∆t).
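A quick numerical check of (2.26), assuming numpy: the error of the weighted central difference drops by roughly a factor of four when ∆x is halved, consistent with O(∆x²).

```python
import numpy as np

u = np.sin                       # test function with u_xx(x) = -sin(x)
x = 1.0
for dx in (0.1, 0.05, 0.025):
    approx = (u(x - dx) - 2 * u(x) + u(x + dx)) / dx**2
    print(dx, abs(approx - (-np.sin(x))))   # error ~ O(dx^2)
```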

Consider now the simple equation ut = uxx for t ∈ (0, T ] and x ∈ (0, 1), with boundary conditions u(x, 0) = u0(x) (initial condition) and u(0, t) = 0 and u(1, t) = 0. We discretize as discussed above, with the values uj,0 ≡ u(j∆x, 0) = u0(j∆x) given and u0,n = u(0, n∆t) = 0 and uJ,n = u(1, n∆t) = 0. We use a forward finite difference to approximate ut and the second order central difference ((1/∆x²)δ²x u) to approximate uxx. We evaluate the expression ut = uxx at the point (j∆x, n∆t) and write out the approximations:

(u(j∆x, n∆t + ∆t) − u(j∆x, n∆t)) / ∆t = (1/∆x²) (u(j∆x − ∆x, n∆t) − 2u(j∆x, n∆t) + u(j∆x + ∆x, n∆t)) ⇔

(uj,n+1 − uj,n) / ∆t = ∆x^{−2} (uj−1,n − 2uj,n + uj+1,n) ⇔   (2.27)

uj,n+1 = uj,n + (∆t/∆x²) (uj−1,n − 2uj,n + uj+1,n).   (2.28)

Since uj,0 is given, we have a simple update formula that lets us compute the values of uj,n+1 for j = 1, . . . , J − 1 from those of uj,n for j = 1, . . . , J − 1.

We can also write this as a matrix iteration. We have

un+1 = un + (∆t/∆x²) A un,   (2.29)

where un contains the values uj,n for j = 1, . . . , J − 1 (so excluding the end points where the solution is given by the boundary conditions), and A is the tridiagonal matrix containing the coefficients of (2.28) in each row (empty spaces indicate zeros):

A = [ −2   1
       1  −2   1
           1  −2   1
               ⋱   ⋱   ⋱
                   1  −2   1
                       1  −2 ].
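A minimal runnable sketch of the scheme (2.28)–(2.29), assuming numpy; the initial condition sin(πx) is chosen for illustration because the exact solution e^{−π²t} sin(πx) is then known, and the time step satisfies the usual stability restriction ∆t/∆x² ≤ 1/2 for this explicit method.

```python
import numpy as np

J, T, N = 20, 0.1, 400                       # dt/dx^2 = 0.1 <= 1/2
dx, dt = 1.0 / J, T / N
x = np.linspace(0.0, 1.0, J + 1)

# Tridiagonal matrix A acting on the interior grid values u_1, ..., u_{J-1}.
A = (np.diag(-2.0 * np.ones(J - 1))
     + np.diag(np.ones(J - 2), 1)
     + np.diag(np.ones(J - 2), -1))

u = np.sin(np.pi * x[1:-1])                  # initial condition at the interior points
for n in range(N):                           # u_{n+1} = u_n + (dt/dx^2) A u_n
    u = u + (dt / dx**2) * (A @ u)

exact = np.exp(-np.pi**2 * T) * np.sin(np.pi * x[1:-1])
print(np.max(np.abs(u - exact)))             # small discretization error
```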