Introduction to Applied Linear Algebra
Definitions, Theorems and Problems
SUBRATA PAUL
FALL - 2014
Contents
1. Vector Space
2. Finite-Dimensional Vector Spaces
3. Exercise on Finite Dimensional Vector Space
4. Linear Map
5. Exercise on Linear Map
6. Eigenvalues
6.1. Some comments
7. Problems on Eigenvalues
8. Operations on Complex Vector Spaces
8.1. Decomposition of an Operator
8.2. Characteristic and Minimal Polynomials
8.3. Jordan Form
9. Trace and Determinant
9.1. Trace
9.2. Determinant
10. Inner Product Space
10.1. Proof of some theorems
11. Exercise of Chapter 6
12. Some important results
13. Operations on Inner Product Spaces
14. Problems on Operations in inner product spaces
15. Scrambled Ideas
15.1. On Square root of a matrix
15.2. On the Matrix Decompositions
16. On SPD
17. On Row space, Column space, rank and nullity
18. On Diagonalizability
19. On Orthogonal Projection
19.1. Projection in a Nutshell
20. On minimization problems
21. Matrix Differentiation
22. Miscellaneous
23. List of Important Problems
1. Vector Space
Definition 1.1 (Vector Space). A vector space is a set V along with an addition and a
scalar multiplication on V such that the following properties hold:
(1) commutativity: u+ v = v + u for all u, v ∈ V ;
(2) associativity: (u+ v) + w = u+ (v + w) and (ab)v = a(bv) for all u, v, w ∈ V and
all a, b ∈ F;
(3) additive identity: there exists an element 0 ∈ V such that v + 0 = v = 0 + v for
all v ∈ V .
(4) additive inverse: for every v ∈ V , there exists w ∈ V such that v + w = 0;
(5) multiplicative identity: 1v = v for all v ∈ V ;
(6) distributive properties: a(u+ v) = au+ av and (a+ b)u = au+ bu for all a, b ∈ F
and all u, v ∈ V .
Definition 1.2 (Subspace). A subset U of V is called a subspace of V if U is also a
vector space.
Theorem 1.1. A subset U of V is a subspace of V if and only if U satisfies the following
two conditions:
(1) U is nonempty (equivalently, 0 ∈ U);
(2) αu+ βv ∈ U for all u, v ∈ U and α, β ∈ F.
Definition 1.3 (Sum of subsets). Suppose U1, U2, . . . , Um are subsets of V . The sum
of U1, . . . , Um, denoted U1 + · · ·+ Um, is the set of all possible sums of elements of
U1, . . . , Um. More precisely,
U1 + · · ·+ Um = {u1 + · · ·+ um : u1 ∈ U1, . . . , um ∈ Um}.
Theorem 1.2. Suppose U1, . . . , Um are subspaces of V . Then U1 + U2 + · · ·+ Um
is the smallest subspace of V containing U1, . . . , Um.
Definition 1.4 (Direct Sum). Suppose U1, . . . , Um are subspaces of V .
The sum U1 + · · ·+ Um is called a direct sum if each element of U1 + · · ·+ Um
can be written in only one way as a sum u1 + · · ·+ um, where each uj ∈ Uj.
Theorem 1.3 (Condition for a direct sum). Suppose U1, . . . , Um are subspaces of V .
Then U1 + · · ·+ Um is a direct sum if and only if the only way to write 0 as a sum
u1 + · · ·+ um, where each uj ∈ Uj, is by taking each uj = 0.
Theorem 1.4. Suppose U and W are subspaces of V . Then U + W is a direct sum if
and only if U ∩W = {0}.
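For subspaces of R^n presented by bases, the criterion of Theorem 1.4 can be checked numerically. A minimal NumPy sketch (the helper name is_direct_sum and the example vectors are ours, not from the text): when the columns of U and W are bases of the two subspaces, the sum is direct exactly when the combined columns remain linearly independent.

```python
import numpy as np

def is_direct_sum(U, W):
    """U, W: matrices whose columns are bases of two subspaces of R^n.
    The sum is direct iff the combined columns stay linearly independent,
    which happens exactly when U ∩ W = {0}."""
    combined = np.hstack([U, W])
    return np.linalg.matrix_rank(combined) == U.shape[1] + W.shape[1]

U = np.array([[1.0], [0.0]])   # x-axis in R^2
W = np.array([[0.0], [1.0]])   # y-axis in R^2
print(is_direct_sum(U, W))     # True: R^2 = U ⊕ W

W2 = np.array([[2.0], [0.0]])  # the same line as U
print(is_direct_sum(U, W2))    # False: U ∩ W2 ≠ {0}
```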
2. Finite-Dimensional Vector Spaces
Definition 2.1 (Linear Combination). A linear combination of a list v1, . . . , vm of
vectors in V is a vector of the form
a1v1 + · · ·+ amvm
where a1, . . . , am ∈ F.
Definition 2.2 (Span). The set of all linear combinations of a list of vectors v1, . . . , vm
in V is called the span of v1, . . . , vm, denoted span(v1, . . . , vm). In other words
span(v1, . . . , vm) = {a1v1 + · · ·+ amvm : a1, . . . , am ∈ F}.
The span of the empty list () is defined to be {0}.
Theorem 2.1. The span of a list of vectors in V is the smallest subspace of V containing
all the vectors in the list.
Definition 2.3. A vector space is called finite-dimensional if some list of vectors in it
spans the space.
Definition 2.4 (Polynomial). A function p : F→ F is called a polynomial with coefficients
in F if there exist a0, . . . , am ∈ F such that p(z) = a0 + a1z + a2z^2 + · · ·+ amz^m for all
z ∈ F. The set of all polynomials with coefficients in F is denoted by P(F).
Definition 2.5. A polynomial p ∈ P(F) is said to have degree m if there exist scalars
a0, a1, . . . , am with am ≠ 0 such that
p(z) = a0 + a1z + · · ·+ amz^m
for all z ∈ F. The polynomial that is identically 0 is said to have degree −∞.
Definition 2.6. For m a nonnegative integer, Pm(F) denotes the set of all polynomials
with coefficients in F and degree at most m.
Definition 2.7. A vector space is called infinite-dimensional if it is not finite -dimensional.
Definition 2.8 (Linearly Independent). A list v1, . . . , vm of vectors in V is called
linearly independent if the only choice of a1, . . . , am ∈ F that makes a1v1 + · · ·+ amvm
equal to 0 is a1 = · · · = am = 0. The empty list ( ) is also declared to be linearly
independent.
Definition 2.9 (Linearly Dependent). A list of vectors in V is called linearly dependent if
it is not linearly independent. In other words, a list v1, . . . , vm of vectors in V is linearly
dependent if there exist a1, . . . , am ∈ F, not all 0, such that a1v1 + · · ·+ amvm = 0.
Lemma 1 (Linear Dependence Lemma). Suppose v1, . . . , vm is a linearly dependent
list in V . Then there exists j ∈ {1, 2, . . . ,m} such that the following hold:
(a) vj ∈ span(v1, . . . , vj−1);
(b) if the jth term is removed from v1, . . . , vm, the span of the remaining list equals
span(v1, . . . , vm).
Theorem 2.2. In a finite-dimensional vector space, the length of every linearly indepen-
dent list of vectors is less than or equal to the length of every spanning list of vectors.
Theorem 2.3. Every subspace of a finite-dimensional vector space is finite-dimensional.
Definition 2.10 (Basis). A basis of V is a list of vectors in V that is linearly independent
and spans V .
Theorem 2.4. A list v1, . . . , vn of vectors in V is a basis of V if and only if every v ∈ V
can be written uniquely in the form
v = a1v1 + · · ·+ anvn,
where a1, . . . , an ∈ F.
Theorem 2.5. Every spanning list in a vector space can be reduced to a basis of the
vector space.
Theorem 2.6. Every finite-dimensional vector space has a basis.
Theorem 2.7. Every linearly independent list of vectors in a finite-dimensional vector
space can be extended to a basis of the vector space.
Theorem 2.8. Suppose V is finite-dimensional and U is a subspace of V . Then there is
a subspace W of V such that V = U ⊕W .
Theorem 2.9. Any two bases of a finite-dimensional vector space have the same length.
Definition 2.11 (Dimension). The dimension of a finite-dimensional vector space is the
length of any basis of the vector space. The dimension of V is denoted by dimV .
Theorem 2.10. If V is finite-dimensional and U is a subspace of V , then dimU ≤ dimV.
Theorem 2.11. Suppose V is finite-dimensional. Then every linearly independent list
of vectors in V with length dimV is a basis of V .
Theorem 2.12. Suppose V is finite-dimensional. Then every spanning list of vectors in
V with length dimV is a basis of V .
Theorem 2.13 (Dimension of a sum). If U1 and U2 are subspaces of a finite-dimensional
vector space, then
dim(U1 + U2) = dimU1 + dimU2 − dim(U1 ∩ U2).
Remark 1. • One way to show that a sum is a direct sum is to set u1 + · · ·+um = 0
and show that it implies u1 = · · · = um = 0.
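The dimension formula of Theorem 2.13 can be illustrated numerically. A small NumPy sketch (the example subspaces are our choice): two coordinate planes in R^3 whose intersection is a line, with each dimension on the right-hand side obtained as a matrix rank.

```python
import numpy as np

# U = span{e1, e2} and W = span{e2, e3} in R^3; by construction U ∩ W = span{e2}.
U = np.eye(3)[:, :2]
W = np.eye(3)[:, 1:]

dim_u = np.linalg.matrix_rank(U)                    # 2
dim_w = np.linalg.matrix_rank(W)                    # 2
dim_sum = np.linalg.matrix_rank(np.hstack([U, W]))  # dim(U + W)
dim_int = 1                                         # dim(U ∩ W), known by construction

print(dim_sum == dim_u + dim_w - dim_int)           # True: 3 = 2 + 2 − 1
```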
3. Exercise on Finite Dimensional Vector Space
Problem 3.1 (2A.7). Prove or give a counterexample: If v1, v2, . . . , vm is a linearly
independent list of vectors in V , then
5v1 − 4v2, v2, v3, . . . , vm
is linearly independent.
Solution. Since the list v1, v2, . . . , vm is linearly independent, obviously v2, . . . , vm is
linearly independent. For the sake of contradiction assume that the list 5v1 − 4v2, v2, v3, . . . , vm
is not linearly independent; since v2, . . . , vm is independent, this forces 5v1 − 4v2 ∈ span(v2, . . . , vm).
Then there exist constants c2, c3, . . . , cm in the field such that
5v1 − 4v2 = c2v2 + · · ·+ cmvm
which implies
−5v1 + (c2 + 4)v2 + c3v3 + · · ·+ cmvm = 0,
a contradiction to the linear independence of v1, . . . , vm because
−5 ≠ 0.
Problem 3.2 (2A.8). Prove or give a counterexample: If v1, v2, . . . , vm is a linearly
independent list of vectors in V and λ ∈ F with λ ≠ 0, then λv1, λv2, . . . , λvm is
linearly independent.
Solution. Observe that
c1λv1 + c2λv2 + · · ·+ cmλvm = 0
implies, after dividing by λ ≠ 0,
c1v1 + c2v2 + · · ·+ cmvm = 0.
Since v1, v2, . . . , vm is linearly independent we have c1 = c2 = · · · = cm = 0.
Hence λv1, λv2, . . . , λvm is linearly independent.
Problem 3.3 (2A.9). Prove or give a counterexample: If v1, . . . , vm and w1, . . . , wm are
linearly independent lists of vectors in V , then v1 + w1, v2 + w2, . . . , vm + wm is linearly
independent.
Solution. Take the space R2 and the vectors as, v1 = (1, 0), v2 = (0, 1), w1 = (0, 1)
and w2 = (1, 0). It is obvious that (v1, v2) and (w1, w2) are linearly independent lists of
vectors. But v1 + w1 = (1, 1) = v2 + w2 and hence (v1 + w1, v2 + w2) are not linearly
independent.
Problem 3.4 (2A.10). Suppose v1, . . . , vm is linearly independent in V and w ∈ V .
Prove that if v1 + w, . . . , vm + w is linearly dependent, then w ∈ span(v1, . . . , vm).
Solution. Since v1 + w, . . . , vm + w is linearly dependent, there are scalars c1, c2, . . . , cm,
not all zero, such that
c1(v1 + w) + · · ·+ cm(vm + w) = 0.
Write c = c1 + c2 + · · ·+ cm, so that c1v1 + · · ·+ cmvm = −cw. If c = 0 this would be a
nontrivial dependence among v1, . . . , vm, contradicting their independence; hence c ≠ 0 and
w = (−c1/c)v1 + (−c2/c)v2 + · · ·+ (−cm/c)vm,
showing that w ∈ span(v1, . . . , vm).
Problem 3.5 (2A.11). Suppose v1, . . . , vm is linearly independent in V and w ∈ V . Show
that v1, . . . , vm, w is linearly independent if and only if w /∈ span(v1, . . . , vm).
Solution. If w ∈ span(v1, v2, . . . , vm) then v1, . . . , vm, w is not linearly independent;
by the contrapositive, if v1, . . . , vm, w is linearly independent then w /∈ span(v1, . . . , vm).
Conversely, since v1, . . . , vm is linearly independent, the only way the list v1, . . . , vm, w
can be linearly dependent is if w ∈ span(v1, . . . , vm). So by the contrapositive of this
argument, w /∈ span(v1, . . . , vm) implies v1, . . . , vm, w is linearly independent.
Problem 3.6 (2A.12). Explain why there does not exist a list of six polynomials that
is linearly independent in P4(F).
Solution. A polynomial in P4(F) has the form
p(x) = ax^4 + bx^3 + cx^2 + dx+ e,
which is uniquely determined by its coefficients, so we can identify it with the vector
(a, b, c, d, e) in F5. Since F5 has dimension 5, any list of six such vectors is linearly
dependent; hence no list of six polynomials in P4(F) is linearly independent.
Problem 3.7 (2A.13). Explain why no list of four polynomials spans P4(F).
Solution. The coefficients of a polynomial p ∈ P4(F) can be uniquely represented
as a vector in F5; this correspondence between polynomials and vectors in F5 is a
bijection. Since F5 cannot be spanned by four vectors, a list of four polynomials
cannot span P4(F).
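The coefficient-vector correspondence used in the last two solutions is easy to make concrete; a short NumPy sketch (random coefficients, seed arbitrary) represents each polynomial in P4(R) by its row of five coefficients and bounds the rank of four such rows.

```python
import numpy as np

# Identify p(x) = a + bx + cx^2 + dx^3 + ex^4 in P4(R) with (a, b, c, d, e) in R^5.
# Four polynomials give a 4×5 coefficient matrix whose rank is at most 4,
# so their span misses some direction of R^5 and cannot be all of P4(R).
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((4, 5))
print(np.linalg.matrix_rank(coeffs))  # at most 4 < 5 = dim P4(R)
```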
Problem 3.8 (2A.14). Prove that V is infinite-dimensional if and only if there is a
sequence v1, v2, . . . of vectors in V such that v1, . . . , vm is linearly independent for
every positive integer m.
Solution. First suppose such a sequence exists. If V were finite-dimensional with
dimension n, then any list of n+ 1 vectors would be linearly dependent, contradicting
the independence of v1, . . . , vn+1. Hence V is infinite-dimensional.
Conversely, suppose V is infinite-dimensional. Choose v1 ≠ 0. Given linearly independent
v1, . . . , vm, the span of v1, . . . , vm is not all of V (otherwise V would be finite-dimensional),
so we may choose vm+1 /∈ span(v1, . . . , vm); by Problem 3.5 the extended list is again
linearly independent. This produces the required sequence.
Problem 3.9 (2A.17). Suppose p0, p1, . . . , pm are polynomials in Pm(F) such that
pj(2) = 0 for each j. Prove that p0, p1, . . . , pm is not linearly independent in Pm(F).
Solution. Since pj(2) = 0, (x− 2) is a factor of pj for j = 0, 1, . . . ,m. Then,
pj(x) = (x− 2)qj(x)
where qj is a polynomial in Pm−1(F). But a list of (m+ 1) polynomials q0, . . . , qm
cannot be linearly independent in Pm−1(F), which has dimension m, so there are
coefficients c0, . . . , cm, not all zero, such that
c0q0 + · · ·+ cmqm = 0,
which implies (x− 2)[c0q0 + · · ·+ cmqm] = 0 and therefore
c0p0 + · · ·+ cmpm = 0
with cj ≠ 0 for at least one j ∈ {0, 1, . . . ,m}. Therefore p0, . . . , pm is not linearly independent.
Problem 3.10 (2B.8). Suppose U and W are subspaces of V such that V = U ⊕W .
Suppose also that u1, . . . , um is a basis of U and w1, w2, . . . , wn is a basis of W . Prove
that
u1, . . . , um, w1, . . . , wn
is a basis of V .
Solution. First we claim that u1, . . . , um, w1, . . . , wn spans V . Let v be an arbitrary
vector in V . Since V = U ⊕W there exists u ∈ U and w ∈ W such that v = u+w. But
since u1, . . . , um spans U and w1, . . . , wn spans W , we have,
v = u+ w = c1u1 + · · ·+ cmum + d1w1 + · · ·+ dnwn,
for some c1, . . . , cm, d1, . . . , dn ∈ F. Therefore, the list u1, . . . , um, w1, . . . , wn spans V .
Now we claim that u1, . . . , um, w1, . . . , wn is linearly independent. Let
a1u1 + · · ·+ amum + b1w1 + · · ·+ bnwn = 0,
which implies a1u1 + · · ·+ amum = −(b1w1 + · · ·+ bnwn). The left hand side belongs to
U and the right hand side belongs to W . Since V = U ⊕W , U ∩W = {0} and hence
both sides are equal to zero. The fact that u1, . . . , um and w1, . . . , wn are linearly
independent implies
a1 = · · · = am = b1 = · · · = bn = 0.
Hence the list u1, . . . , um, w1, . . . , wn is linearly independent.
Therefore we conclude that u1, . . . , um, w1, . . . , wn is a basis of V .
Problem 3.11 (2C.1). Suppose V is finite-dimensional and U is a subspace of V such
that dimU = dimV . Prove that U = V
Solution. Let dimU = dimV = m and let v1, . . . , vm be a basis of U . Then
v1, . . . , vm is linearly independent in V and therefore can be extended to a basis of V . But
since dimV = m the extension is trivial, and hence v1, . . . , vm is a basis for V . Therefore
U = V .
Problem 3.12 (2C.2). Show that the subspaces of R2 are precisely {0}, R2, and all lines
in R2 through the origin.
Solution. A subspace of R2 has dimension 0, 1, or 2. The only subspace of dimension
0 is {0} and the only subspace of dimension 2 is R2 itself. The subspaces of dimension 1
are the lines passing through the origin.
Problem 3.13 (2C.4). For the space P4(F) of polynomials of degree less than or equal
to 4,
(a) Let U = {p ∈ P4(F) : p(6) = 0}. Find a basis of U .
(b) Extend this basis in part (a) to a basis of P4(F).
(c) Find a subspace W of P4(F) such that P4(F) = U ⊕W .
Solution. (a) A basis for U is (x^4 − 6^4, x^3 − 6^3, x^2 − 6^2, x− 6); each of these
vanishes at 6, they are linearly independent since their degrees are distinct, and dimU = 4.
(b) Appending the constant polynomial 1, which does not vanish at 6, extends this to a
basis (x^4 − 6^4, x^3 − 6^3, x^2 − 6^2, x− 6, 1) of P4(F).
(c) W = span(1), the subspace of constant polynomials, satisfies P4(F) = U ⊕W .
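The basis claimed in part (a) can be sanity-checked numerically; a NumPy sketch (coefficient rows listed highest degree first, our own representation):

```python
import numpy as np

# Coefficient rows (highest degree first) for x^4 − 6^4, x^3 − 6^3, x^2 − 6^2, x − 6.
basis = np.array([
    [1, 0, 0, 0, -6**4],
    [0, 1, 0, 0, -6**3],
    [0, 0, 1, 0, -6**2],
    [0, 0, 0, 1, -6],
], dtype=float)

# Each basis polynomial vanishes at 6, so all four lie in U ...
print([float(np.polyval(p, 6)) for p in basis])  # [0.0, 0.0, 0.0, 0.0]
# ... and the coefficient rows are linearly independent.
print(np.linalg.matrix_rank(basis))              # 4
```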
Problem 3.14 (2C.5). P4(F) is the space of polynomials.
(a) Let U = {p ∈ P4(R) : p′′(6) = 0}. Find a basis for U .
(b) Extend the basis in part (a) to a basis of P4(R).
(c) Find a subspace W of P4(R) such that P4(R) = U ⊕W .
Problem 3.15 (2C.6). Let U = {p ∈ P4(F) : p(2) = p(5)}.
(a) Find a basis of U .
(b) Extend the basis of part (a) to a basis of P4(F).
(c) Find a subspace W of P4(F) such that P4(F) = U ⊕W .
Problem 3.16 (2C.7). Let U = {p ∈ P4(F) : p(2) = p(5) = p(6)}.
(a) Find a basis of U .
(b) Extend the basis of part (a) to a basis of P4(F).
(c) Find a subspace W of P4(F) such that P4(F) = U ⊕W .
Problem 3.17 (2C.8). Let U = {p ∈ P4(R) : ∫_{−1}^{1} p = 0}.
(a) Find a basis of U .
(b) Extend the basis of part (a) to a basis of P4(R).
(c) Find a subspace W of P4(R) such that P4(R) = U ⊕W .
Problem 3.18 (2C.9). Suppose v1, . . . , vm is linearly independent in V and w ∈ V .
Prove that
dim span(v1 + w, . . . , vm + w) ≥ m− 1
Solution. For j = 1, . . . ,m− 1, the vector (vj + w)− (vm + w) = vj − vm lies in
span(v1 + w, . . . , vm + w). The m− 1 vectors v1 − vm, . . . , vm−1 − vm are linearly
independent, since a dependence a1(v1 − vm) + · · ·+ am−1(vm−1 − vm) = 0 with some
aj ≠ 0 would give a nontrivial dependence among v1, . . . , vm. Therefore
dim span(v1 + w, . . . , vm + w) ≥ m− 1.
Problem 3.19 (2C.10). Suppose p0, . . . , pm ∈ P(F) are such that each pj has degree
j. Prove that p0, p1, . . . , pm is a basis of Pm(F).
Solution. Since each pj has degree j, pi is not a linear combination of p0, . . . , pi−1 for any
i = 1, . . . ,m. Therefore p0, . . . , pm is a list of m+ 1 linearly independent vectors in the
space Pm(F), which has dimension m+ 1, and hence p0, . . . , pm is a basis.
Problem 3.20 (2C.11). Suppose that U and W are subspaces of R8 such that dimU = 3,
dimW = 5, and U +W = R8. Prove that R8 = U ⊕W .
Solution.
dim(U ∩W ) = dimU + dimW − dim(U +W ) = 3 + 5− 8 = 0.
Therefore, U ∩W = {0} and hence R8 = U ⊕W .
Problem 3.21 (2C.12). Suppose U and W are both five-dimensional subspaces of R9.
Prove that U ∩W 6= {0}.
Solution. Since dim(U +W ) ≤ dimR9 = 9,
dim(U ∩W ) = dimU + dimW − dim(U +W ) ≥ 5 + 5− 9 = 1.
But dim{0} = 0, so U ∩W ≠ {0}.
Problem 3.22 (2C.13). Suppose U and W are both 4-dimensional subspaces of C6.
Prove that there exists two vectors in U ∩W such that neither of these vectors is a scalar
multiple of the other.
Solution. Since dim(U +W ) ≤ dimC6 = 6,
dim(U ∩W ) = dimU + dimW − dim(U +W ) ≥ 4 + 4− 6 = 2.
So U ∩W contains two linearly independent vectors, and neither of these is a scalar
multiple of the other.
Problem 3.23 (2C.14). Suppose U1, . . . , Um are finite-dimensional subspaces of V . Prove
that U1 + · · ·+ Um is finite dimensional and
dim(U1 + · · ·+ Um) ≤ dimU1 + · · ·+ dimUm.
Solution. The union of (finite) spanning lists of U1, . . . , Um spans U1 + · · ·+ Um, so the
sum is finite dimensional. The inequality then follows by induction from Theorem 2.13,
since dim(U1 + U2) = dimU1 + dimU2 − dim(U1 ∩ U2) ≤ dimU1 + dimU2.
Problem 3.24 (2C.15). Suppose V is finite-dimensional, with dimV = n ≥ 1. Prove
that there exist 1-dimensional subspaces U1, . . . , Un of V such that
V = U1 ⊕ U2 ⊕ · · · ⊕ Un.
Problem 3.25 (2C.16). Suppose U1, . . . , Um are finite-dimensional subspaces of V such
that U1 + · · ·+ Um is a direct sum. Prove that U1 ⊕ · · · ⊕ Um is finite dimensional and
dim(U1 ⊕ · · · ⊕ Um) = dimU1 + · · ·+ dimUm.
Problem 3.26. Let F be a commutative field, let (V,+, .) be a finite dimensional vector
space over F, let U and W be two subspace of V . Show that there exists S, a subspace
of V , such that V = S ⊕ U and V = S ⊕W if and only if dimU = dimW .
Solution. First let us assume that there exists S ⊂ V such that V = S ⊕ U and
V = S ⊕W . We want to show that dimU = dimW .
V = S ⊕ U ⇒ dimV = dimS + dimU
V = S ⊕W ⇒ dimV = dimS + dimW
The above two equations imply
dimU = dimW.
Now for the other direction we assume that dimU = dimW . We use the following
lemma:
Lemma 2. Let V be a finite dimensional vector space and U is a subspace of V then
there is a subspace W of V such that V = U ⊕W .
Proof. Let u1, . . . , um be a basis of U . We can extend it to a basis of V by adding
w1, . . . , wn−m, where n is the dimension of V . Let W = span(w1, . . . , wn−m). Then
V = U +W because any v ∈ V can be written
v = a1u1 + · · ·+ amum + am+1w1 + · · ·+ anwn−m
with a1, . . . , an ∈ F, where a1u1 + · · ·+ amum ∈ U and
am+1w1 + · · ·+ anwn−m ∈ W . Let x ∈ U ∩W ; then there are a1, . . . , am, b1, . . . , bn−m ∈ F
such that x = a1u1 + · · ·+ amum = b1w1 + · · ·+ bn−mwn−m, which implies
a1u1 + · · ·+ amum − b1w1 − · · · − bn−mwn−m = 0.
Since u1, . . . , um, w1, . . . , wn−m is linearly independent, a1 = · · · = am = b1 = · · · =
bn−m = 0 and hence x = 0. Therefore
V = U ⊕W.
Since U ∩W is a subspace of both U and W , by the above lemma there exist U ′ and W ′
such that
U = (U ∩W )⊕ U ′; W = (U ∩W )⊕W ′.
Note that U +W = (U ∩W )⊕U ′⊕W ′ ⊂ V . So, by the lemma there exists H ⊂ V such
that
V = (U +W )⊕H = (U ∩W )⊕ U ′ ⊕W ′ ⊕H.
Let dimU = dimW . Since U = (U ∩W )⊕ U ′ and W = (U ∩W )⊕W ′, the subspaces
U ′ and W ′ have the same dimension; call it p, and suppose u′1, . . . , u′p is a basis of U ′
and w′1, . . . , w′p is a basis of W ′. Define,
G = span(u′1 + w′1, . . . , u′p + w′p).
We require another lemma to continue.
Lemma 3. U, V,W,U ′,W ′ and G are as defined above. Then
(U ∩W )⊕ U ′ ⊕W ′ = (U ∩W )⊕ U ′ ⊕G
(U ∩W )⊕ U ′ ⊕W ′ = (U ∩W )⊕W ′ ⊕G
Proof. Let x ∈ (U ∩W )⊕ U ′ ⊕W ′. Then x = x1 + x2 + x3 where x1 ∈ U ∩W , x2 ∈ U ′
and x3 ∈ W ′. Therefore, there are a1, . . . , ap, b1, . . . , bp ∈ F such that
x2 = a1u′1 + · · ·+ apu′p,
x3 = b1w′1 + · · ·+ bpw′p.
So,
x = x1 + a1u′1 + · · ·+ apu′p + b1w′1 + · · ·+ bpw′p
  = x1 + (a1 − b1)u′1 + · · ·+ (ap − bp)u′p + b1(u′1 + w′1) + · · ·+ bp(u′p + w′p)
  = x1 + xU + xG
where xU ∈ U ′ and xG ∈ G. Hence (U ∩W )⊕ U ′ ⊕W ′ ⊆ (U ∩W ) + U ′ +G; conversely
each u′j + w′j ∈ U ′ ⊕W ′, so the two sums contain the same vectors.
It remains to show that the sum (U ∩W ) + U ′ +G is direct. Let v1, . . . , vd be a basis
for U ∩W and write
0 = xi + xU + xG
where xi = a1v1 + · · ·+ advd ∈ U ∩W , xU = b1u′1 + · · ·+ bpu′p ∈ U ′ and
xG = c1(u′1 + w′1) + · · ·+ cp(u′p + w′p) ∈ G. Note that v1, . . . , vd, u′1, . . . , u′p, w′1, . . . , w′p
is linearly independent. So,
0 = a1v1 + · · ·+ advd + b1u′1 + · · ·+ bpu′p + c1(u′1 + w′1) + · · ·+ cp(u′p + w′p)
  = a1v1 + · · ·+ advd + (b1 + c1)u′1 + · · ·+ (bp + cp)u′p + c1w′1 + · · ·+ cpw′p.
By this independence we get a1 = · · · = ad = c1 = · · · = cp = b1 + c1 = · · · = bp + cp = 0,
which implies
a1 = · · · = ad = c1 = · · · = cp = b1 = · · · = bp = 0.
Therefore, xi = xU = xG = 0 and hence
(U ∩W )⊕ U ′ ⊕W ′ = (U ∩W )⊕ U ′ ⊕G.
Similarly we can prove the other equation.
Using the above lemma we can write V in two ways:
V = ((U ∩W )⊕ U ′)⊕G⊕H = U ⊕G⊕H = U ⊕ S
V = ((U ∩W )⊕W ′)⊕G⊕H = W ⊕G⊕H = W ⊕ S
where S = G⊕H.
4. Linear Map
Definition 4.1. A linear map from V to W is a function T : V → W with the following
properties:
• Additivity:
T (u+ v) = Tu+ Tv, ∀u, v ∈ V ;
• Homogeneity:
T (λv) = λ(Tv), ∀λ ∈ F, ∀v ∈ V.
Definition 4.2. The set of all linear maps from V to W is denoted L(V,W ).
Definition 4.3. Suppose v1, . . . , vn is a basis of V and w1, . . . , wn ∈ W . Then
there exists a unique linear map T : V → W such that
Tvj = wj
for each j = 1, . . . , n.
Definition 4.4. Suppose S, T ∈ L(V,W ) and λ ∈ F. The sum S + T and the product
λT are the linear maps from V to W defined by
(S + T )(v) = Sv + Tv, and (λT )(v) = λ(Tv).
for all v ∈ V .
Definition 4.5. With the operations of addition and scalar multiplication as defined
above L(V,W ) is a vector space.
Definition 4.6. If T ∈ L(U, V ) and S ∈ L(V,W ), then the product ST ∈ L(U,W ) is
defined by
(ST )(u) = S(Tu)
for u ∈ U .
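In coordinates the product ST is just matrix multiplication; a quick NumPy check (the dimensions and seed below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((2, 3))  # S : R^3 → R^2
T = rng.standard_normal((3, 4))  # T : R^4 → R^3
u = rng.standard_normal(4)

# (ST)(u) = S(Tu): composing the maps agrees with multiplying the matrices.
print(np.allclose((S @ T) @ u, S @ (T @ u)))  # True
```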
Theorem 4.1 (Algebraic properties of products of linear maps).
• associativity
(T1T2)T3 = T1(T2T3);
• identity:
TI = IT = T ;
• distributive properties:
(S1 + S2)T = S1T + S2T, and S(T1 + T2) = ST1 + ST2.
Theorem 4.2. Suppose T is a linear map from V to W . Then T (0) = 0.
Definition 4.7 (Null Space). For T ∈ L(V,W ), the null space of T , denoted by nullT ,
is the subset of V consisting of those vectors that T maps to 0:
nullT = {v ∈ V : Tv = 0}.
Theorem 4.3. Suppose T ∈ L(V,W ). Then nullT is a subspace of V .
Definition 4.8. A function T : V → W is called injective if Tu = Tv implies u = v.
Definition 4.9. Let T ∈ L(V,W ). Then T is injective if and only if nullT = {0}.
Definition 4.10 (Range). For T a function from V to W , the range of T is a subset of
W consisting of those vectors that are of the form Tv for some v ∈ V ;
rangeT = {Tv : v ∈ V }
Theorem 4.4. If T ∈ L(V,W ), then rangeT is a subspace of W .
Definition 4.11. A function T : V → W is called surjective if its range equals W .
Theorem 4.5 (Fundamental Theorem of Linear Maps). Suppose V is finite-dimensional
and T ∈ L(V,W ). Then rangeT is finite-dimensional and
dimV = dim nullT + dim rangeT.
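The fundamental theorem can be spot-checked for a matrix map; a NumPy sketch (random matrix and seed chosen by us) that counts dim nullT directly from the singular values rather than from the formula itself:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))  # a linear map T : R^5 → R^3

rank = np.linalg.matrix_rank(A)  # dim range T
# dim null T, counted directly: input directions minus the number of
# nonzero singular values (the SVD returns min(3, 5) = 3 of them).
s = np.linalg.svd(A, compute_uv=False)
nullity = int((s < 1e-10).sum()) + (A.shape[1] - len(s))
print(rank + nullity == A.shape[1])  # True: dim V = dim null T + dim range T
```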
Theorem 4.6. Suppose V and W are finite-dimensional vector spaces such that dimV >
dimW . Then no linear map from V to W is injective.
Theorem 4.7. Suppose V and W are finite-dimensional vector spaces such that dimV <
dimW . Then no linear map from V to W is surjective.
Definition 4.12. A homogeneous system of linear equations with more variables than
equations has nonzero solutions.
Definition 4.13. An inhomogeneous system of linear equations with more equations
than variables has no solution for some choice of the constant terms.
5. Exercise on Linear Map
Problem 5.1 (3.A.4). Suppose T ∈ L(V,W ) and v1, . . . , vm is a list of vectors in V such
that Tv1, . . . , T vm is a linearly independent list in W . Prove that v1, . . . , vm is linearly
independent.
Solution. Let
c1v1 + · · ·+ cmvm = 0,
for some c1, . . . , cm ∈ F. Then,
c1Tv1 + · · ·+ cmTvm = T (c1v1 + · · ·+ cmvm) = T (0) = 0.
Since Tv1, . . . , T vm is linearly independent, c1 = · · · = cm = 0 and hence v1, . . . , vm is
linearly independent.
Problem 5.2 (3A.7). Show that every linear map from a 1-dimensional vector space to
itself is multiplication by some scalar.
Solution. Let v ∈ V ; then Tv ∈ V . Since any two vectors v, Tv in a 1-dimensional vector
space are linearly dependent, there exists λ, possibly dependent on v, such that Tv = λv.
Now we show that λ does not depend on v.
Let v1 ≠ 0, Tv1 = λ1v1 and Tv2 = λ2v2 with v2 ≠ 0. Since dimV = 1, the pair v1, v2 is
linearly dependent, so there exists β ∈ F such that v1 = βv2. Then,
Tv1 = T (βv2)⇒ λ1v1 = βTv2 = βλ2v2 = λ2v1 ⇒ (λ1 − λ2)v1 = 0.
But since v1 ≠ 0, λ1 = λ2 and hence Tv = λv for all v ∈ V with a single λ ∈ F.
Problem 5.3 (3A.8). Give an example of a function φ : R2 → R such that
φ(av) = aφ(v)
for all a ∈ R and all v ∈ R2 but φ is not linear.
Solution. Define φ(x, y) = (x^3 + y^3)^{1/3}. For every a ∈ R,
φ(a(x, y)) = (a^3x^3 + a^3y^3)^{1/3} = aφ(x, y), but φ((1, 0) + (0, 1)) = 2^{1/3} while
φ(1, 0) + φ(0, 1) = 2, so φ is not additive and hence not linear.
Problem 5.4 (Prelim & 3B20). Suppose that W is finite dimensional and T ∈ L(V,W ).
Prove that T is injective if and only if there exists S ∈ L(W,V ) such that ST is the
identity map on V .
Solution. Suppose T is injective. Since W is finite dimensional the subspace range(T )
of W is also finite dimensional. Let w1, . . . , wm be a basis of range(T ). Since w1, . . . , wm
belong to range(T ), there exist v1, . . . , vm in V such that w1 = Tv1, . . . , wm = Tvm.
Let
c1v1 + · · ·+ cmvm = 0
for some c1, . . . , cm ∈ F. Applying T we get,
c1Tv1 + · · ·+ cmTvm = c1w1 + · · ·+ cmwm = 0.
Since w1, . . . , wm is linearly independent, we have c1 = · · · = cm = 0 and hence, v1, . . . , vm
is linearly independent as well.
Now let x ∈ V . Since w1, . . . , wm spans range(T ), there exist scalars α1, . . . , αm
such that Tx = α1w1 + · · ·+ αmwm = α1Tv1 + · · ·+ αmTvm, which implies
T (x− α1v1 − · · · − αmvm) = 0.
Now T is injective, so we get that x− α1v1 − · · · − αmvm = 0 and hence,
x = α1v1 + · · ·+ αmvm.
So, x ∈ span(v1, . . . , vm). This proves that v1, . . . , vm spans V .
We conclude that v1, . . . , vm is a basis of V .
Let us extend the linearly independent list w1, . . . , wm to a basis w1, . . . , wn of W by
adding wm+1, . . . , wn. Now define S : W → V such that
Sw1 = v1, . . . , Swm = vm, Swm+1 = · · · = Swn = 0.
It is clear that S ∈ L(W,V ) and that ST is the identity map on V .
Now suppose that there exists S ∈ L(W,V ) such that ST is the identity map on V . Let
x, y ∈ V with Tx = Ty. Applying S gives x = STx = STy = y, so T is injective.
Problem 5.5. Suppose V is finite-dimensional and T ∈ L(V,W ). Prove that T is
surjective if and only if there exists S ∈ L(W,V ) such that TS is the identity map on W .
Solution. Since V is finite dimensional and T is surjective, dimW ≤ dimV and hence
W is also finite dimensional. Let w1, . . . , wm be a basis of W . Since T is surjective there
exist v1, . . . , vm ∈ V such that
Tv1 = w1, . . . , T vm = wm.
Define S ∈ L(W,V ) by
Sw1 = v1, . . . , Swm = vm;
then clearly TS = IW . Now assume that there exists S ∈ L(W,V ) such that TS is the
identity map on W . Then for any w ∈ W , TSw = w, so for any w ∈ W there is
Sw ∈ V with T (Sw) = w. Therefore, T is surjective.
Problem 5.6. Suppose U and V are finite-dimensional vector spaces and S ∈ L(V,W )
and T ∈ L(U, V ). Prove that
dim nullST ≤ dim nullS + dim nullT.
Solution. Let T ′ be the restriction of T to nullST . Since U is finite dimensional,
nullST ⊆ U is also finite dimensional and so,
dim nullST = dim nullT ′ + dim rangeT ′.
We want to show two things:
• nullT ′ ⊆ nullT
• rangeT ′ ⊆ nullS
For the first claim let u ∈ nullT ′. Then Tu = T ′u = 0 and hence
u ∈ nullT . Therefore, nullT ′ ⊆ nullT .
For the second, rangeT ′ = {Tu : u ∈ nullST}. Let u ∈ nullST be arbitrary. Then
S(T ′u) = S(Tu) = 0 and hence T ′u ∈ nullS. Therefore rangeT ′ ⊆ nullS. Combining,
dim nullST ≤ dim nullT + dim nullS.
Problem 5.7. Suppose U and V are finite-dimensional vector spaces and S ∈ L(V,W )
and T ∈ L(U, V ). Prove that
dim rangeST ≤ min{dim rangeS, dim rangeT}.
Solution. Since U is finite dimensional, T ∈ L(U, V ) and ST ∈ L(U,W ), the funda-
mental theorem of linear maps implies,
dim nullST + dim rangeST = dimU = dim nullT + dim rangeT
which implies that,
dim rangeST = dim rangeT + dim nullT − dim nullST.
Let u ∈ nullT . Then Tu = 0 which implies STu = 0 and hence u ∈ nullST . Therefore
nullT ⊂ nullST . So, dim nullT − dim nullST ≤ 0. Hence,
dim rangeST ≤ dim rangeT.
Let w ∈ rangeST . Then there is u ∈ U such that w = STu. Which implies for any
w ∈ rangeST there is Tu ∈ V such that S(Tu) = w and hence w ∈ rangeS. Therefore,
rangeST ⊂ rangeS which implies,
dim rangeST ≤ dim rangeS.
Therefore,
dim rangeST ≤ min{dim rangeS, dim rangeT}.
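Both bounds proved above can be spot-checked for matrices; a NumPy sketch (the rank-2 map T below is our arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.standard_normal((4, 6))  # S : R^6 → R^4, full rank 4 with probability 1
T = np.zeros((6, 5))             # T : R^5 → R^6 built to have rank 2
T[:2, :2] = np.eye(2)

r_st = np.linalg.matrix_rank(S @ T)
r_s = np.linalg.matrix_rank(S)
r_t = np.linalg.matrix_rank(T)
print(r_t, r_st <= min(r_s, r_t))  # rank T = 2 bounds rank ST
```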
6. Eigenvalues
Definition 6.1. Suppose T ∈ L(V ). A subspace U of V is called invariant under T if
u ∈ U implies Tu ∈ U .
Definition 6.2. Suppose T ∈ L(V ). A number λ ∈ F is called an eigenvalue of T if
there exists v ∈ V such that v 6= 0 and Tv = λv.
Definition 6.3. Let λ be an eigenvalue of an operator T ∈ L(V ). The set of all eigen-
vectors corresponding to λ along with the zero vector is a subspace, called the eigenspace
Eλ of T corresponding to eigenvalue λ.
Theorem 6.1. Suppose V is finite-dimensional, T ∈ L(V ), and λ ∈ F. Then the following
are equivalent:
(a) λ is an eigenvalue of T ;
(b) T − λI is not injective;
(c) T − λI is not surjective;
(d) T − λI is not invertible.
Definition 6.4. Suppose T ∈ L(V ) and λ ∈ F is an eigenvalue of T . A vector v ∈ V is
called an eigenvector of T corresponding to λ if v 6= 0 and Tv = λv.
Theorem 6.2. Let T ∈ L(V ). Suppose λ1, . . . , λm are distinct eigenvalues of T and
v1, . . . , vm are corresponding eigenvectors. Then v1, . . . , vm is linearly independent.
Theorem 6.3. Suppose V is finite-dimensional. Then each operator on V has at most
dimV distinct eigenvalues.
Definition 6.5. Suppose T ∈ L(V ) and U is a subspace of V invariant under T .
• The restriction operator T |U ∈ L(U) is defined by
T |U(u) = Tu
for u ∈ U .
• The quotient operator T/U ∈ L(V/U) is defined by
(T/U)(v + U) = Tv + U
for v ∈ V .
Theorem 6.4 (Multiplicative properties of Matrix polynomial). Suppose p, q ∈ P(F)
and T ∈ L(V ). Then
(a) (pq)(T ) = p(T )q(T )
(b) p(T )q(T ) = q(T )p(T )
Theorem 6.5. Every square complex matrix has an eigenvalue.
or,
Every operator on a finite-dimensional, nonzero, complex vector space has an eigenvalue.
Theorem 6.6. Suppose T ∈ L(V ) and v1, . . . , vn is a basis of V . Then the following are
equivalent:
(a) the matrix of T with respect to v1, . . . , vn is upper triangular;
(b) Tvj ∈ span(v1, . . . , vj) for each j = 1, . . . , n;
(c) span(v1, . . . , vj) is invariant under T for each j = 1, . . . , n.
Theorem 6.7. Suppose V is a finite-dimensional complex vector space and T ∈ L(V ).
Then T has an upper-triangular matrix with respect to some basis of V .
or
Let A ∈ Mn×n(C). Then there exist an invertible matrix V ∈ Mn×n(C) and an upper-
triangular matrix T such that
A = V TV^{−1}.
Definition 6.6. Two matrices A and B are called similar if there exists an invertible
matrix V such that A = V BV^{−1}. Intuitively, similar matrices represent the same
operator in two different bases.
Theorem 6.8. If A and B are similar then ΛA = ΛB where ΛA is the spectrum of A
which is the set of all eigenvalues of A.
Proof. Let (λ, v) be an eigenpair of A, so that Av = λv and v ≠ 0. Since A is similar to
B there is an invertible matrix V such that A = V BV^{−1} and therefore,
V BV^{−1}v = λv ⇒ B(V^{−1}v) = λ(V^{−1}v)
with V^{−1}v ≠ 0, showing that λ is an eigenvalue of B. By symmetry every eigenvalue
of B is also an eigenvalue of A, so ΛA = ΛB.
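Theorem 6.8 can be spot-checked numerically; a NumPy sketch (random matrices with a fixed seed, our own example) compares the sorted spectra of B and V BV^{−1}:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
V = rng.standard_normal((4, 4))  # invertible with probability 1
A = V @ B @ np.linalg.inv(V)     # A is similar to B

# Similar matrices have the same spectrum, up to ordering and round-off.
eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_B = np.sort_complex(np.linalg.eigvals(B))
print(np.allclose(eig_A, eig_B, atol=1e-8))  # True
```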
Theorem 6.9. Suppose T ∈ L(V ) has an upper-triangular matrix with respect to some
basis of V . Then the eigenvalues of T are precisely the entries on the diagonal of that
upper-triangular matrix.
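Theorem 6.9 is easy to illustrate; a minimal NumPy sketch (the matrix entries are arbitrary):

```python
import numpy as np

T = np.array([[2.0, 7.0, 1.0],
              [0.0, 5.0, 3.0],
              [0.0, 0.0, 2.0]])  # upper triangular

# The eigenvalues are exactly the diagonal entries, with multiplicity.
print(np.sort(np.linalg.eigvals(T).real))  # [2. 2. 5.]
```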
Theorem 6.10 (Conditions equivalent to diagonalizability). Suppose V is finite-dimensional
and T ∈ L(V ). Let λ1, . . . , λm denote the distinct eigenvalues of T . Then the following
are equivalent:
(a) T is diagonalizable.
(b) V has a basis consisting of eigenvectors of T ;
(c) there exist 1-dimensional subspaces U1, . . . , Un of V , each invariant under T ,
such that
V = U1 ⊕ · · · ⊕ Un;
(d) V = E(λ1, T )⊕ · · · ⊕ E(λm, T );
(e) dimV = dimE(λ1, T ) + · · ·+ dimE(λm, T ).
6.1. Some comments
• If U is an invariant subspace of V under T ∈ L(V ) with dimU = 1, then each
nonzero vector of U is an eigenvector of T . To show this, let u ∈ U be nonzero; then Tu ∈ U,
but U is one-dimensional, so Tu = λu for some λ ∈ F and hence u is an
eigenvector.
7. Problems on Eigenvalues
Problem 7.1 (5A.33). Suppose T ∈ L(V ). Prove that T/(rangeT ) = 0.
Solution. Let v + rangeT ∈ V/(rangeT ). Then
T/(rangeT )(v + rangeT ) = Tv + rangeT = rangeT,
since Tv ∈ rangeT . So T/(rangeT ) sends every element of V/(rangeT ) to the zero element of the quotient, that is, T/(rangeT ) = 0.
Problem 7.2 (5A.34). Suppose T ∈ L(V ). Prove that T/(nullT ) is injective if and only
if (nullT ) ∩ (rangeT ) = {0}.
Solution. Let v ∈ (nullT ) ∩ (rangeT ). Then Tv = 0 and there is u ∈ V such that
Tu = v.
T/(nullT )(u+ nullT ) = Tu+ nullT = v + nullT = nullT
Therefore, u + nullT ∈ null(T/(nullT )). Since T/(nullT ) is injective, u + nullT = nullT, so
u ∈ nullT and hence v = Tu = 0. Therefore (nullT ) ∩ (rangeT ) = {0}.
Now assume that (nullT ) ∩ (rangeT ) = {0}. Let v + nullT ∈ null(T/(nullT )). Then,
T/(nullT )(v + nullT ) = nullT ⇒ Tv + nullT = nullT ⇒ Tv ∈ nullT.
But Tv ∈ rangeT and so Tv ∈ (nullT ) ∩ (rangeT ), which implies Tv = 0. Hence
v ∈ nullT , that is, v + nullT = nullT, and therefore T/(nullT ) is injective.
Problem 7.3 (5A.25). Suppose T ∈ L(V ) and u, v are eigenvectors of T such that u+ v
is also an eigenvector of T . Prove that u and v are eigenvectors of T corresponding to
the same eigenvalue.
Solution. Let λ1, λ2, λ3 be the eigenvalues corresponding to the eigenvectors u, v, u + v
respectively. Then,
Tu = λ1u, Tv = λ2v, T (u+ v) = λ3(u+ v).
If u, v are linearly dependent, then there is c ∈ F such that v = cu. In this case,
λ2v = Tv = T (cu) = cTu = cλ1u = λ1v ⇒ (λ2 − λ1)v = 0⇒ λ1 = λ2.
Now suppose u, v are linearly independent. In this case,
λ1u+ λ2v = Tu+ Tv = T (u+ v) = λ3(u+ v)⇒ (λ1 − λ3)u+ (λ2 − λ3)v = 0.
But linear independence of u, v implies λ1 − λ3 = 0 = λ2 − λ3 and hence λ1 = λ2.
Problem 7.4 (5A.26). Suppose T ∈ L(V ) is such that every nonzero vector in V is an
eigenvector of T . Prove that T is a scalar multiple of the identity operator.
Solution. Let u, v ∈ V be any two vectors. Since u, v are eigenvectors and also u + v
is an eigenvector of T , by the previous problem u and v correspond to the same eigenvalue.
This is true for any pair, and hence we can conclude that all nonzero vectors in V correspond
to the same eigenvalue, say λ. Then,
(T − λI)v = 0, ∀v ∈ V ⇒ T − λI = 0⇒ T = λI.
Problem 7.5. Let V be a complex vector space. Let S and T be two operators on V
such that TS = ST .
(a) Prove that if λ is an eigenvalue of S then the eigenspace of S associated with the
eigenvalue λ, Eλ, is invariant under T .
(b) Prove that S and T have (at least) one common eigenvector.
Solution. Part a: Let v ∈ Eλ. Then Sv = λv.
S(Tv) = STv = T (Sv) = Tλv = λTv.
Therefore, Tv ∈ Eλ and hence Eλ is invariant under T .
Part b: Restrict T to the invariant subspace Eλ. Eλ is a finite-dimensional, nonzero
complex vector space and T |Eλ is an operator on it. Hence by the existence theorem for
eigenvalues, there is a vector v ∈ Eλ which is an eigenvector of T . Since v ∈ Eλ, v is also an
eigenvector of S.
8. Operations on Complex Vector Spaces
Theorem 8.1. Suppose T ∈ L(V ). Then
{0} = nullT^0 ⊆ nullT^1 ⊆ · · · ⊆ nullT^k ⊆ nullT^{k+1} ⊆ · · ·
Theorem 8.2. Suppose T ∈ L(V ). Suppose m is a nonnegative integer such that
nullT^m = nullT^{m+1}. Then
nullT^m = nullT^{m+1} = nullT^{m+2} = · · ·
Theorem 8.3. Suppose T ∈ L(V ). Let n = dimV . Then
nullT^n = nullT^{n+1} = nullT^{n+2} = · · ·
Theorem 8.4. Suppose T ∈ L(V ). Let n = dimV . Then
V = nullT^n ⊕ rangeT^n
Definition 8.1. Suppose T ∈ L(V ) and λ is an eigenvalue of T . A vector v ∈ V is called
a generalized eigenvector of T corresponding to λ if v ≠ 0 and
(T − λI)^j v = 0
for some positive integer j.
Definition 8.2. Suppose T ∈ L(V ) and λ ∈ F. The generalized eigenspace of T corre-
sponding to λ, denoted G(λ, T ), is defined to be the set of all generalized eigenvectors of
T corresponding to λ, along with the 0 vector.
Definition 8.3. Suppose T ∈ L(V ) and λ ∈ F. Then G(λ, T ) = null(T − λI)^{dimV}.
Theorem 8.5. Let T ∈ L(V ). Suppose λ1, . . . , λm are distinct eigenvalues of T and
v1, . . . , vm are corresponding generalized eigenvectors. Then v1, . . . , vm are linearly
independent.
Definition 8.4. An operator is called nilpotent if some power of it equals 0.
Remark 2. If N is nilpotent then there is j such that N^j = 0, and therefore (N − 0I)^j v = 0
for all v ∈ V . Hence G(0, N ) = null(N − 0I)^j = V .
Theorem 8.6. Suppose N ∈ L(V ) is nilpotent. Then N^{dimV} = 0.
Remark 3. Suppose N is a nilpotent operator on V . Then there is a basis of V with
respect to which the matrix of N has the form
( 0   ∗ )
(   ⋱   )
( 0   0 )
that is, all entries on and below the diagonal are 0's.
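A minimal numpy illustration of Theorem 8.6 and Remark 3 (the particular strictly upper-triangular N is an arbitrary example, not from the text):

```python
import numpy as np

n = 5
# zeros on and below the diagonal, as in Remark 3
N = np.triu(np.arange(1.0, 1.0 + n * n).reshape(n, n), k=1)

assert np.allclose(np.linalg.matrix_power(N, n), 0)          # N^(dim V) = 0
assert not np.allclose(np.linalg.matrix_power(N, n - 1), 0)  # smaller powers need not vanish
```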
8.1. Decomposition of an Operator
Theorem 8.7. Suppose T ∈ L(V ) and p ∈ P(F). Then null p(T ) and range p(T ) are
invariant under T.
Remark 4. So G(λ, T ), which is the null space of (T − λI)^{dimV}, is invariant under T .
Theorem 8.8. Suppose V is a complex vector space and T ∈ L(V ). Let λ1, . . . , λm be
the distinct eigenvalues of T . Then
(a) V = G(λ1, T )⊕ · · · ⊕G(λm, T );
(b) each G(λj, T ) is invariant under T ;
(c) each (T − λjI)|G(λj ,T ) is nilpotent.
Theorem 8.9. Suppose V is a complex vector space and T ∈ L(V ). Then there is a
basis of V consisting of generalized eigenvectors of T .
Definition 8.5. Multiplicity
• Suppose T ∈ L(V ). The multiplicity of an eigenvalue λ of T is defined to be the
dimension of the corresponding generalized eigenspace G(λ, T ).
• In other words, the multiplicity of an eigenvalue λ of T equals dim null(T − λI)^{dimV}.
8.2. Characteristic and Minimal Polynomials
Definition 8.6 (Characteristic Polynomial). Suppose V is a complex vector space and
T ∈ L(V ). Let λ1, . . . , λm denote the distinct eigenvalues of T , with multiplicities
d1, . . . , dm. The polynomial
(z − λ1)^{d1} · · · (z − λm)^{dm}
is called the characteristic polynomial of T .
Theorem 8.10. Suppose V is a complex vector space and T ∈ L(V ). Then
• the characteristic polynomial of T has degree dimV .
• the zeros of the characteristic polynomial of T are the eigenvalues of T .
Theorem 8.11 (Cayley-Hamilton Theorem). Suppose V is a complex vector space and
T ∈ L(V ). Let q denote the characteristic polynomial of T . Then q(T ) = 0.
Definition 8.7. Suppose T ∈ L(V ). Then there is a unique monic polynomial p of
smallest degree such that p(T ) = 0. The polynomial p is called the minimal polynomial.
Theorem 8.12. Suppose T ∈ L(V ) and q ∈ P(F). Then q(T ) = 0 if and only if q is
a polynomial multiple of the minimal polynomial of T . In particular, the characteristic
polynomial is a multiple of the minimal polynomial of T .
Theorem 8.13. Let T ∈ L(V ). Then the zeros of the minimal polynomial of T are
precisely the eigenvalues of T .
8.3. Jordan Form
Theorem 8.14. Suppose N ∈ L(V ) is nilpotent. Then there exist vectors v1, . . . , vn ∈ V
and nonnegative integers m1, . . . ,mn such that
• N^{m1}v1, . . . , Nv1, v1, . . . , N^{mn}vn, . . . , Nvn, vn is a basis of V ;
• N^{m1+1}v1 = · · · = N^{mn+1}vn = 0.
9. Trace and Determinant
9.1. Trace
Definition 9.1. Suppose T ∈ L(V ).
• If F = C, then the trace of T is the sum of the eigenvalues of T , with each
eigenvalue repeated according to its multiplicity.
• If F = R, then the trace of T is the sum of the eigenvalues of TC , with each
eigenvalue repeated according to its multiplicity.
• The trace of a square matrix A is the sum of the diagonal entries of A.
Theorem 9.1.
traceAB = traceBA
Proof. Using the diagonal-entries definition:
traceAB = ∑_{j=1}^{n} ∑_{k=1}^{n} Aj,kBk,j = ∑_{k=1}^{n} ∑_{j=1}^{n} Bk,jAj,k = traceBA.
Using the sum-of-eigenvalues definition: First we show that every eigenvalue of AB is also an
eigenvalue of BA. Let λ be an eigenvalue of AB. There is v ≠ 0 such that
ABv = λv ⇒ BA(Bv) = λ(Bv).
If Bv ≠ 0, then λ is an eigenvalue of BA. If Bv = 0, then ABv = 0, so λ = 0 (because we
assumed v ≠ 0), and 0 is an eigenvalue of AB. In that case,
det(BA) = det(B) det(A) = det(A) det(B) = det(AB) = 0,
which implies 0 is an eigenvalue of BA as well. By symmetry, AB and BA have the same
eigenvalues (in fact the same characteristic polynomial, so the multiplicities agree). So traceAB = traceBA.
Theorem 9.2. For any square matrix T and invertible matrix V of the same size,
traceT = trace(V TV −1).
Proof.
trace(V TV −1) = trace(TV V −1) = trace(TI) = traceT
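The trace identities above can be checked numerically (numpy sketch; the random matrices are arbitrary examples, and a generic random V is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
V = rng.standard_normal((4, 4))      # generic, hence invertible

assert np.isclose(np.trace(A @ B), np.trace(B @ A))                 # trace AB = trace BA
assert np.isclose(np.trace(V @ A @ np.linalg.inv(V)), np.trace(A))  # similarity invariance
assert np.isclose(np.linalg.eigvals(A).sum().real, np.trace(A))     # trace = sum of eigenvalues
```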
9.2. Determinant
Definition 9.2. Suppose T ∈ L(V )
• If F = C, then the determinant of T is the product of the eigenvalues of T , with
each eigenvalue repeated according to its multiplicity.
• If F = R, then the determinant of T is the product of the eigenvalues of TC , with
each eigenvalue repeated according to its multiplicity.
Remark 5. If T ∈ L(V ) with F = R, then det(T ) ∈ R.
Theorem 9.3.
detT = (−1)^n pT (0)
where pT is the characteristic polynomial and n = dimV .
Theorem 9.4. An operator on V is invertible if and only if its determinant is nonzero.
Theorem 9.5.
pT (z) = det(zI − T )
Proof. The eigenvalues of zI − T are z − λi, where the λi are the eigenvalues of T (with
multiplicity). Hence
pT (z) = ∏ (z − λi) = det(zI − T ).
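Theorems 9.3 and 9.5 can be verified numerically; numpy's `poly` returns the coefficients of det(zI − T) (a sketch on an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
T = rng.standard_normal((n, n))

p = np.poly(T)            # coefficients of det(zI - T), highest degree first
assert len(p) == n + 1    # the characteristic polynomial has degree dim V

# the zeros of the characteristic polynomial are the eigenvalues of T
assert np.allclose(np.polyval(p, np.linalg.eigvals(T)), 0.0, atol=1e-6)

# det T = (-1)^n * p_T(0)
assert np.isclose((-1) ** n * np.polyval(p, 0.0), np.linalg.det(T))
```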
10. Inner Product Space
Definition 10.1. An inner product on V is a function that takes each ordered pair (u, v)
of elements of V to a number 〈u, v〉 ∈ F and has the following properties:
• positivity: 〈v, v〉 ≥ 0 for all v ∈ V ;
• definiteness: 〈v, v〉 = 0 if and only if v = 0;
• additivity in first slot:
〈u+ v, w〉 = 〈u,w〉+ 〈v, w〉, ∀u, v, w ∈ V ;
• homogeneity in first slot:
〈λu, v〉 = λ〈u, v〉, ∀λ ∈ F, ∀u, v ∈ V ;
• conjugate symmetry:
〈u, v〉 equals the complex conjugate of 〈v, u〉, for all u, v ∈ V.
Definition 10.2. The Euclidean inner product on Fn is defined by
〈(w1, . . . , wn), (z1, . . . , zn)〉 = w1z̄1 + · · ·+ wnz̄n.
Example 1. An inner product can be defined on the vector space of continuous real-
valued functions on the interval [−1, 1] by
〈f, g〉 = ∫_{−1}^{1} f(x)g(x) dx.
Theorem 10.1. Properties of Inner product:
(1) For a fixed u ∈ V , the function φ : V → F defined by φ(v) = 〈v, u〉 is a linear
map from V to F.
(2) 〈0, v〉 = 0 for all v ∈ V .
(3) 〈u, 0〉 = 0 for all u ∈ V .
(4) 〈u, v + w〉 = 〈u, v〉+ 〈u,w〉 for all u, v, w ∈ V .
(5) 〈u, λv〉 = λ̄〈u, v〉 for all λ ∈ F and u, v ∈ V .
Definition 10.3. For v ∈ V , the norm of v , denoted by ‖v‖, is defined by
‖v‖ =√〈v, v〉.
Theorem 10.2 (Basic properties of norm). Suppose v ∈ V . Then
• ‖v‖ = 0 if and only if v = 0;
• ‖λv‖ = |λ|‖v‖ for all λ ∈ F.
Definition 10.4. Two vectors u, v ∈ V are called orthogonal if 〈u, v〉 = 0.
Theorem 10.3 (Pythagorean Theorem). Suppose u and v are orthogonal vectors in V .
Then
‖u+ v‖2 = ‖u‖2 + ‖v‖2
Theorem 10.4 (Orthogonal Decomposition). Suppose u, v ∈ V , with v ≠ 0. Set c = 〈u, v〉/‖v‖2
and w = u− cv. Then
〈w, v〉 = 0 and u = cv + w.
Theorem 10.5 (Cauchy-Schwarz inequality). Suppose u, v ∈ V . Then
|〈u, v〉| ≤ ‖u‖‖v‖.
The inequality is an equality if and only if one of u, v is a scalar multiple of the other.
Proof. (This argument is written for a real inner product space; the complex case is handled in §10.1.) By the properties of the inner product,
0 ≤ 〈‖u‖v − ‖v‖u, ‖u‖v − ‖v‖u〉
= ‖u‖2〈v, v〉 − 2‖u‖‖v‖〈u, v〉+ ‖v‖2〈u, u〉
= 2‖u‖2‖v‖2 − 2‖u‖‖v‖〈u, v〉.
Therefore,
2‖u‖‖v‖〈u, v〉 ≤ 2‖u‖2‖v‖2 ⇒ 〈u, v〉 ≤ ‖u‖‖v‖.
Applying the same reasoning to −u in place of u gives −〈u, v〉 ≤ ‖u‖‖v‖, and together these give the Cauchy-Schwarz inequality.
Theorem 10.6. Suppose u, v ∈ V . Then
‖u+ v‖ ≤ ‖u‖+ ‖v‖.
This inequality is an equality if and only if one of u, v is a nonnegative multiple of the
other.
Theorem 10.7 (Parallelogram Equality). Suppose u, v ∈ V . Then
‖u+ v‖2 + ‖u− v‖2 = 2(‖u‖2 + ‖v‖2).
Definition 10.5. A list e1, . . . , em of vectors in V is orthonormal if
〈ej, ek〉 = 1 if j = k, and 〈ej, ek〉 = 0 if j ≠ k.
Theorem 10.8. If e1, . . . , em is an orthonormal list of vectors in V , then
‖a1e1 + · · ·+ amem‖2 = |a1|2 + · · ·+ |am|2
for all a1, . . . , am ∈ F.
Theorem 10.9. Every orthonormal list of vectors is linearly independent.
Theorem 10.10. Suppose e1, . . . , en is an orthonormal basis of V and v ∈ V . Then
v = 〈v, e1〉e1 + · · ·+ 〈v, en〉en
and
‖v‖2 = |〈v, e1〉|2 + · · ·+ |〈v, en〉|2.
10.1. Proof of some theorems
Theorem 10.11 (Pythagorean Theorem). Suppose u and v are orthogonal vectors in
V . Then
‖u+ v‖2 = ‖u‖2 + ‖v‖2
Proof. We have
‖u+ v‖2 = 〈u+ v, u+ v〉
= 〈u, u〉+ 〈u, v〉+ 〈v, u〉+ 〈v, v〉
= ‖u‖2 + ‖v‖2,
since 〈u, v〉 = 〈v, u〉 = 0 by orthogonality.
Theorem 10.12 (Cauchy-Schwarz Inequality). Suppose u, v ∈ V . Then,
|〈u, v〉| ≤ ‖u‖‖v‖.
The inequality is an equality if and only if one of u , v is a scalar multiple of the other.
Proof. If v = 0, then both sides of the desired inequality equal 0. Thus we assume that
v 6= 0. Consider the orthogonal decomposition
u = (〈u, v〉/‖v‖2)v + w,
where w is orthogonal to v. By the Pythagorean theorem,
‖u‖2 = ‖(〈u, v〉/‖v‖2)v‖2 + ‖w‖2
= |〈u, v〉|2/‖v‖2 + ‖w‖2
≥ |〈u, v〉|2/‖v‖2.
Multiplying both sides of this inequality by ‖v‖2 and then taking square roots gives the
desired inequality. Equality holds if and only if ‖u‖2 = |〈u, v〉|2/‖v‖2, which happens if and
only if w = 0. But w = 0 if and only if u is a scalar multiple of v.
Proof (Alternative). (This argument is for a real inner product space.) By the properties of the inner product,
0 ≤ 〈‖u‖v − ‖v‖u, ‖u‖v − ‖v‖u〉
= ‖u‖2〈v, v〉 − 2‖u‖‖v‖〈u, v〉+ ‖v‖2〈u, u〉
= 2‖u‖2‖v‖2 − 2‖u‖‖v‖〈u, v〉.
Therefore,
〈u, v〉 ≤ ‖u‖‖v‖.
Applying the same reasoning to −u in place of u gives −〈u, v〉 ≤ ‖u‖‖v‖, and together these give the Cauchy-Schwarz inequality.
Theorem 10.13 (Triangle Inequality). Suppose u, v ∈ V . Then,
‖u+ v‖ ≤ ‖u‖+ ‖v‖.
This inequality is an equality if and only if one of u , v is a nonnegative multiple of the
other.
Proof. We have
‖u+ v‖2 = 〈u+ v, u+ v〉
= 〈u, u〉+ 〈v, u〉+ 〈u, v〉+ 〈v, v〉
= ‖u‖2 + ‖v‖2 + 2Re〈u, v〉
≤ ‖u‖2 + ‖v‖2 + 2|〈u, v〉|
≤ ‖u‖2 + ‖v‖2 + 2‖u‖‖v‖
= (‖u‖+ ‖v‖)2.
Taking square roots of both sides of the inequality above gives the desired inequality.
Theorem 10.14 (Parallelogram Equality). Suppose u, v ∈ V . Then
‖u+ v‖2 + ‖u− v‖2 = 2(‖u‖2 + ‖v‖2).
Proof. We have,
‖u+ v‖2 + ‖u− v‖2 =〈u+ v, u+ v〉+ 〈u− v, u− v〉
=‖u‖2 + ‖v‖2 + 〈u, v〉+ 〈v, u〉+ ‖u‖2 + ‖v‖2 − 〈u, v〉 − 〈v, u〉
=2(‖u‖2 + ‖v‖2).
Problem 10.1. Suppose V is a real inner product space. Then for all u, v ∈ V ,
〈u, v〉 = (‖u+ v‖2 − ‖u− v‖2)/4.
Problem 10.2. Suppose V is a complex inner product space. Then, for all u, v ∈ V ,
〈u, v〉 = (‖u+ v‖2 − ‖u− v‖2 + i‖u+ iv‖2 − i‖u− iv‖2)/4.
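The complex polarization identity of Problem 10.2 can be checked numerically (numpy sketch; the vectors are arbitrary complex examples):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

def inner(a, b):
    # Euclidean inner product, linear in the first slot
    return np.dot(a, b.conj())

def norm2(a):
    # squared norm <a, a>
    return inner(a, a).real

rhs = (norm2(u + v) - norm2(u - v)
       + 1j * norm2(u + 1j * v) - 1j * norm2(u - 1j * v)) / 4
assert np.isclose(inner(u, v), rhs)
```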
11. Exercise of Chapter 6
Problem 11.1 (6A.5). Suppose T ∈ L(V ) is such that ‖Tv‖ ≤ ‖v‖ for every v ∈ V .
Prove that T − √2 I is invertible.
Solution. Let v ∈ null(T − √2 I). Then,
(T − √2 I)v = 0
⇒ Tv = √2 v
⇒ ‖Tv‖ = √2 ‖v‖
⇒ ‖v‖ ≥ √2 ‖v‖
⇒ (√2 − 1)‖v‖ ≤ 0
⇒ ‖v‖ = 0 ⇒ v = 0.
Hence null(T − √2 I) = {0}, so T − √2 I is injective and therefore, being an operator on a
finite-dimensional space, invertible.
Problem 11.2. Suppose u, v ∈ V and ‖u‖ = ‖v‖ = 1 and 〈u, v〉 = 1. Prove that u = v.
Solution.
‖u− v‖2 = 〈u− v, u− v〉
= 〈u, u〉 − 〈u, v〉 − 〈v, u〉+ 〈v, v〉
= 1− 1− 1 + 1
= 0.
Therefore u− v = 0, which implies u = v.
12. Some important results
Problem 12.1. If A is of full column rank then ATA is invertible.
Solution. Let A be an m × n matrix with rankA = n. Note that ATA is a square
matrix of dimension n × n. The matrix ATA is invertible if and only if nullATA = {0}.
Let v ∈ nullATA. Then,
ATAv = 0⇒ vTATAv = 0⇒ (Av)TAv = 0⇒ ‖Av‖2 = 0⇒ ‖Av‖ = 0⇒ Av = 0.
Therefore, v ∈ nullA. Since A is of full column rank, nullA = {0} and hence v = 0.
Hence,
nullATA = {0}.
Therefore, ATA is invertible. This is the same as saying that ATA has rank n, or that ATA is positive
definite. If A is not known to be of full column rank, then ATA is still positive semidefinite.
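A numerical illustration of this result (numpy sketch; the 7×3 matrix is an arbitrary example, full column rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 3))             # generically of full column rank

G = A.T @ A                                 # the 3x3 Gram matrix
assert np.linalg.matrix_rank(A) == 3
assert np.linalg.matrix_rank(G) == 3        # A^T A is invertible
assert np.all(np.linalg.eigvalsh(G) > 0)    # in fact positive definite

# with a rank-deficient A, A^T A is only positive semidefinite
B = np.hstack([A[:, :1], A[:, :1], A[:, 1:2]])   # a repeated column drops the rank to 2
assert np.linalg.matrix_rank(B.T @ B) == 2
```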
13. Operations on Inner Product Spaces
Definition 13.1. Suppose T ∈ L(V,W ). The adjoint of T is the function T ∗ : W → V
such that
〈Tv, w〉 = 〈v, T ∗w〉
for every v ∈ V and every w ∈ W .
Remark 6. The Riesz representation theorem states that:
Suppose V is finite-dimensional and φ is a linear functional on V . Then there is a
unique vector u ∈ V such that
φ(v) = 〈v, u〉 for all v ∈ V.
Now fix w ∈ W and consider the linear functional v ↦ 〈Tv, w〉 on V . By the Riesz
representation theorem, there is a unique vector u ∈ V such that
〈Tv, w〉 = 〈v, u〉 for all v ∈ V ;
we define T ∗w to be this unique vector u, which gives 〈Tv, w〉 = 〈v, T ∗w〉.
Theorem 13.1. Properties of adjoint:
• (S + T )∗ = S∗ + T ∗
• (λT )∗ = λ̄T ∗
• (T ∗)∗ = T
• I∗ = I
• (ST )∗ = T ∗S∗
Theorem 13.2. Suppose T ∈ L(V,W ). Then,
(a) nullT ∗ = (rangeT )⊥
(b) rangeT ∗ = (nullT )⊥
(c) nullT = (rangeT ∗)⊥
(d) rangeT = (nullT ∗)⊥
Proof. First we prove (a). Let w ∈ W . Then,
w ∈ nullT ∗ ⇔ T ∗w = 0
⇔ 〈v, T ∗w〉 = 0, ∀v ∈ V
⇔ 〈Tv, w〉 = 0, ∀v ∈ V
⇔ w ∈ (rangeT )⊥.
Part (c) is (a) applied to T ∗, using (T ∗)∗ = T ; parts (b) and (d) follow by taking orthogonal
complements in (c) and (a) respectively, since (U⊥)⊥ = U in finite dimensions.
Definition 13.2 (Self-Adjoint). An operator T ∈ L(V ) is called self-adjoint if T = T ∗.
In other words, T ∈ L(V ) is self adjoint if and only if
〈Tv, w〉 = 〈v, Tw〉
for all v, w ∈ V .
Theorem 13.3. Every eigenvalue of a self-adjoint operator is real.
Theorem 13.4. Suppose V is a complex inner product space and T ∈ L(V ). Suppose
〈Tv, v〉 = 0
for all v ∈ V . Then T = 0.
Remark 7. This theorem is not true in a real space. The following counterexample shows
this. Let
T =
( 0   1 )
(−1   0 )
≠ 0.
Take v = (x, y) and consider the Euclidean inner product.
〈Tv, v〉 = 〈(y,−x), (x, y)〉 = xy − xy = 0
Theorem 13.5. Suppose T is a self-adjoint operator on V such that
〈Tv, v〉 = 0
for all v ∈ V . Then T = 0.
Proof. For a complex space this is true by the previous theorem. Now suppose V is a real
inner product space. First we prove that if 〈Tv, v〉 = 0 for all v ∈ V , then T ∗ = −T . For all v, w ∈ V ,
〈T (v + w), v + w〉 = 0
⇒ 〈v + w, T ∗(v + w)〉 = 0
⇒ 〈v + w, T ∗v + T ∗w〉 = 0
⇒ 〈v, T ∗v〉+ 〈v, T ∗w〉+ 〈w, T ∗v〉+ 〈w, T ∗w〉 = 0
⇒ 〈v, T ∗w〉+ 〈w, T ∗v〉 = 0
⇒ 〈v, T ∗w〉 = −〈w, T ∗v〉 = −〈Tw, v〉 = 〈v,−Tw〉.
Since this holds for all v, w ∈ V , we get T ∗ = −T . Therefore, for a self-adjoint operator,
T = T ∗ = −T ⇒ T = 0.
Theorem 13.6. Suppose V is a complex inner product space and T ∈ L(V ). Then T is
self-adjoint if and only if 〈Tv, v〉 ∈ R for every v ∈ V .
Definition 13.3.
T ∈ L(V ) is normal if TT ∗ = T ∗T
Theorem 13.7. An operator T ∈ L(V ) is normal if and only if
‖Tv‖ = ‖T ∗v‖
for all v ∈ V .
Remark 8. The above theorem implies that nullT = nullT ∗ if T is normal. Using this
we can also prove that rangeT = rangeT ∗ as,
rangeT = (nullT ∗)⊥ = (nullT )⊥ = rangeT ∗
Theorem 13.8. Suppose T ∈ L(V ) is normal and v ∈ V is an eigenvector of T with
eigenvalue λ. Then v is also an eigenvector of T ∗ with eigenvalue λ.
Theorem 13.9. Suppose T ∈ L(V ) is normal. Then eigenvectors of T corresponding to
distinct eigenvalues are orthogonal.
Theorem 13.10 (Complex Spectral Theorem). Suppose F = C and T ∈ L(V ). Then
the following are equivalent:
(1) T is normal
(2) V has an orthonormal basis consisting of eigenvectors of T .
(3) T has a diagonal matrix with respect to some orthonormal basis of V .
Theorem 13.11 (Real Spectral Theorem). Suppose F = R and T ∈ L(V ). Then the
following are equivalent:
(1) T is self-adjoint
(2) V has an orthonormal basis consisting of eigenvectors of T .
(3) T has a diagonal matrix with respect to some orthonormal basis of V .
Theorem 13.12. Suppose T ∈ L(V ) is self-adjoint and U is a subspace of V that is invariant
under T . Then
(1) U⊥ is invariant under T .
(2) T |U ∈ L(U) is self-adjoint;
(3) T |U⊥ ∈ L(U⊥) is self-adjoint.
Remark 9. About Normal Operator
(1) rangeT = rangeT ∗
(2) In a complex inner product space every normal operator has a square root: by the
complex spectral theorem T = UDU∗ with U unitary and D diagonal, so take S = UD^{1/2}U∗ for any choice of square roots of the diagonal entries.
Remark 10. About self-adjoint operators:
• A projection is orthogonal if and only if it is self-adjoint.
• The eigenvalues are real.
Here we state some important results.
Lemma 4. If 〈Ax, x〉 = 0 for all x ∈ V over the complex field, then A = 0.
Proof.
0 = 〈A(x+ y), x+ y〉 = 〈Ax, x〉+ 〈Ax, y〉+ 〈Ay, x〉+ 〈Ay, y〉 = 〈Ax, y〉+ 〈Ay, x〉.
Now use x+ iy in place of x+ y to get,
0 = −i〈Ax, y〉+ i〈Ay, x〉 ⇒ 〈Ax, y〉 − 〈Ay, x〉 = 0.
Hence,
〈Ax, y〉 = 0, ∀x, y ∈ V.
Now use y = Ax to get
〈Ax,Ax〉 = ‖Ax‖2 = 0
which implies Ax = 0 and is true for all x ∈ V and hence A = 0.
Lemma 5. A is normal if and only if ‖Ax‖ = ‖A∗x‖ for all x.
Proof. If A is normal, then AA∗ = A∗A and hence (A∗A − AA∗)x = 0 for every x, which gives the
following:
0 = 〈(A∗A−AA∗)x, x〉 = 〈A∗Ax, x〉−〈AA∗x, x〉 = 〈Ax,Ax〉−〈A∗x,A∗x〉 = ‖Ax‖2−‖A∗x‖2.
Conversely, if ‖Ax‖ = ‖A∗x‖ for all x, the same identity gives 〈(A∗A − AA∗)x, x〉 = 0 for
all x; since A∗A − AA∗ is self-adjoint, it follows that A∗A − AA∗ = 0.
Lemma 6. If A is normal then nullA = nullA∗.
Proof.
x ∈ nullA⇔ Ax = 0⇔ ‖Ax‖ = 0⇔ ‖A∗x‖ = 0⇔ A∗x = 0⇔ x ∈ nullA∗
Hence
nullA = nullA∗
Lemma 7. For any operator A on a finite-dimensional complex inner product space,
(rangeA)⊥ = nullA∗.
Proof. Let w ∈ rangeA and v ∈ nullA∗. Then there is u ∈ V such that w = Au, and
A∗v = 0. Now
〈v, w〉 = 〈v,Au〉 = 〈A∗v, u〉 = 〈0, u〉 = 0.
Since w and v were arbitrary, nullA∗ ⊆ (rangeA)⊥. Conversely, if v ∈ (rangeA)⊥, then
〈A∗v, u〉 = 〈v,Au〉 = 0 for all u ∈ V , so A∗v = 0. Hence the two subspaces are equal.
Lemma 8. For any projection P ,
nullP = range(I − P ).
Proof. Let v ∈ nullP . Then Pv = 0, which implies (I − P )v = v, and hence v ∈ range(I − P ).
Now let v ∈ range(I − P ). Then there is u such that (I − P )u = v and so
Pv = P (I − P )u = Pu− P 2u = Pu− Pu = 0
Hence
nullP = range(I − P ).
Theorem 13.13. If P is normal and a projection then P is self-adjoint, and hence an
orthogonal projection.
Proof. Suppose P is normal, that is P ∗P = PP ∗, and P is a projection, that is P 2 = P . Then
(rangeP )⊥ = nullP ∗ = nullP = range(I − P ).
In particular x− Px ∈ range(I − P ) is orthogonal to Px ∈ rangeP . So,
〈x, (P − P ∗P )x〉 = 〈x, (I − P ∗)Px〉 = 〈x, (I − P )∗Px〉 = 〈(I − P )x, Px〉 = 0.
Therefore,
P − P ∗P = 0⇒ P = P ∗P
Taking adjoint
P ∗ = P ∗P.
Hence P ∗ = P , that is, P is self-adjoint. Therefore the projection is an orthogonal
projection.
14. Problems on Operations in inner product spaces
Problem 14.1 (7A.4). Suppose T ∈ L(V,W ). Prove that
(1) T is injective if and only if T ∗ is surjective.
(2) T is surjective if and only if T ∗ is injective.
Solution.
T is injective ⇔ nullT = {0} ⇔ (nullT )⊥ = V ⇔ rangeT ∗ = V ⇔ T ∗ is surjective.
T is surjective ⇔ rangeT = V ⇔ (rangeT )⊥ = {0} ⇔ nullT ∗ = {0} ⇔ T ∗ is injective
Problem 14.2. Suppose P ∈ L(V ) is such that P 2 = P .
Prove that P is an orthogonal projection if and only if P is self-adjoint.
Solution. First we suppose that P is an orthogonal projection. Thus there is a subspace
U of V such that P = PU . Suppose v1, v2 ∈ V . Write
v1 = u1 + w1, v2 = u2 + w2,
where u1, u2 ∈ U and w1, w2 ∈ U⊥. Now,
〈Pv1, v2〉 = 〈u1, u2 + w2〉
= 〈u1, u2〉+ 〈u1, w2〉
= 〈u1, u2〉
= 〈u1, u2〉+ 〈w1, u2〉
= 〈u1 + w1, u2〉
= 〈v1, Pv2〉.
Thus P = P ∗, and hence P is self-adjoint. To prove the implication in the other direction,
now suppose that P is self-adjoint. Let v ∈ V . Because P (v − Pv) = Pv − P 2v = 0, we
have
v − Pv ∈ nullP = (rangeP ∗)⊥ = (rangeP )⊥.
Writing
v = Pv + (v − Pv),
we have Pv ∈ rangeP and (v − Pv) ∈ (rangeP )⊥. Thus
Pv = PrangeP v.
Because this holds for all v ∈ V , we have P = PrangeP , which shows that P is an
orthogonal projection.
Problem 14.3. Prove that if T ∈ L(V ) is normal, then
nullT k = nullT and rangeT k = rangeT
for every positive integer k.
Solution. Suppose T ∈ L(V ) is normal and that k is a positive integer. Obviously we
can assume that k ≥ 2. It is obvious that nullT ⊆ nullT k. We only have to prove that
nullT k ⊆ nullT . Assume that v ∈ nullT k. Then
〈T ∗T k−1v, T ∗T k−1v〉 =〈TT ∗T k−1v, T k−1v〉
=〈T ∗T kv, T k−1v〉
=〈0, T k−1v〉
=0
Therefore, T ∗T k−1v = 0. Thus,
〈T k−1v, T k−1v〉 = 〈T ∗T k−1v, T k−2v〉 = 〈0, T k−2v〉 = 0,
which implies that T k−1v = 0. In other words, v ∈ nullT k−1. The same argument, with
k replaced with k − 1, shows that v ∈ nullT k−2. Continuing this process, we reach
the conclusion that v ∈ nullT . Hence nullT k ⊆ nullT , and therefore nullT = nullT k.
To show rangeT = rangeT k, note that T k = T (T k−1) and so rangeT k ⊆ rangeT . Also,
dim rangeT k = dimV − dim nullT k = dimV − dim nullT = dim rangeT.
Hence rangeT k = rangeT , because one is a subset of the other and they have the same dimension.
Problem 14.4. Prove that a normal operator on a complex inner-product space is self-
adjoint if and only if all its eigenvalues are real.
Solution. If T is self-adjoint, all its eigenvalues are real. Conversely, suppose that all
the eigenvalues of T are real. By the complex spectral theorem, since T is normal, T =
V DV H where D is diagonal with the eigenvalues on the diagonal and V is unitary.
Therefore, T ∗ = V DHV H = V DV H = T , because DH = D (D is diagonal with real entries).
Problem 14.5. Suppose V is a complex inner-product space and T ∈ L(V ) is a normal
operator such that T 9 = T 8. Prove that T is self adjoint and T 2 = T .
Solution. By the complex spectral theorem, there is an orthonormal basis (e1, . . . , en) of
V consisting of eigenvectors of T . Let λ1, . . . , λn be the corresponding eigenvalues. Thus
Tej = λjej
for j = 1, . . . , n.
λj^9 ej = T^9 ej = T^8 ej = λj^8 ej ⇒ λj^9 = λj^8
which implies that λj equals 0 or 1. In particular, all the eigenvalues of T are real. This
implies that T is self-adjoint. Also,
T^2 ej = λj^2 ej = λj ej = T ej,
where the second equality holds because λj = 0 or 1. Because T^2 and T agree on a basis,
they must be equal.
Problem 14.6. Suppose T ∈ L(V ) is self-adjoint, λ ∈ F , and ε > 0. Prove that if there
exists v ∈ V such that ‖v‖ = 1 and
‖Tv − λv‖ < ε,
then T has an eigenvalue λ′ such that |λ− λ′| < ε
Solution. By the spectral theorem there is an orthonormal basis (e1, . . . , en) of V con-
sisting of eigenvectors of T . Let λ1, . . . , λn be the corresponding eigenvalues. Suppose
v ∈ V is such that ‖v‖ = 1 and ‖Tv − λv‖ < ε. Then we have
v = 〈v, e1〉e1 + · · ·+ 〈v, en〉en
and so
Tv = λ1〈v, e1〉e1 + · · ·+ λn〈v, en〉en.
Then,
ε2 > ‖Tv − λv‖2
= ‖(λ1 − λ)〈v, e1〉e1 + · · ·+ (λn − λ)〈v, en〉en‖2
= |λ1 − λ|2|〈v, e1〉|2 + · · ·+ |λn − λ|2|〈v, en〉|2
≥ min{|λ1 − λ|2, . . . , |λn − λ|2}(|〈v, e1〉|2 + · · ·+ |〈v, en〉|2)
= min{|λ1 − λ|2, . . . , |λn − λ|2},
using ‖v‖ = 1 in the last step. Thus ε > |λj − λ| for some j, and λ′ = λj works.
Problem 14.7. Let V be an inner-product space with inner product 〈·, ·〉V . Let W be
an inner product space with inner product 〈·, ·〉W . Let T ∈ L(V,W ). Define the adjoint
T ∗ of T . Under what sufficient condition does the adjoint exist? Under what sufficient
condition is the adjoint unique?
Solution. Let T ∈ L(V,W ). We define T ∗, the adjoint of T , as the function
T ∗ : W → V, w ↦ T ∗w,
where T ∗w is the vector such that ∀v ∈ V ,
〈Tv, w〉W = 〈v, T ∗w〉V .
T ∗ is linear because for all w1, w2 ∈ W and α, β ∈ F,
〈v, T ∗(αw1 + βw2)〉V = 〈Tv, αw1 + βw2〉W = ᾱ〈Tv, w1〉W + β̄〈Tv, w2〉W
= ᾱ〈v, T ∗w1〉V + β̄〈v, T ∗w2〉V = 〈v, αT ∗w1 + βT ∗w2〉V
for all v ∈ V , and hence T ∗(αw1 + βw2) = αT ∗w1 + βT ∗w2.
T ∗ exists when V is finite-dimensional, by the Riesz representation theorem, and whenever
it exists it is unique.
Problem 14.8. Let V be a complex inner-product space. We consider T in L(V ). Give
necessary and sufficient condition on V and D for the following statements.
(1) T is self-adjoint if and only if there exist V and D such that T = V DV H and D
is diagonal and . . .
(2) T is normal if and only if there exists V and D such that T = V DV H and D
diagonal and ...
(3) T is an isometry if and only if there exist V and D such that T = V DV H and D
is diagonal and ...
(4) T is positive if and only if there exist V and D such that T = V DV H and D is
diagonal and ...
Solution. (1) V is unitary and the eigenvalues are real, that is, V HV = V V H = I
and ∀i = 1, . . . , n, dii ∈ R.
(2) V is unitary, that is, V HV = V V H = I.
(3) V is unitary and the eigenvalues have modulus 1, that is, V HV = V V H = I and
∀i = 1, . . . , n, |dii| = 1.
(4) V is unitary and the eigenvalues are real and nonnegative, that is, V HV = V V H = I
and ∀i = 1, . . . , n, dii ≥ 0.
Problem 14.9. We consider a complex inner product space. Prove that every eigenvalue
of a self-adjoint operator is real.
Solution. Let (V,+, ·, 〈·, ·〉) be a complex inner product space. Let T ∈ L(V ) be self-
adjoint, so T = T ∗. Let λ be an eigenvalue of T and x ≠ 0 an associated eigenvector,
so Tx = λx. Then,
λ〈x, x〉 = 〈λx, x〉 = 〈Tx, x〉 = 〈x, T ∗x〉 = 〈x, Tx〉 = 〈x, λx〉 = λ̄〈x, x〉.
Since x ≠ 0 we have 〈x, x〉 ≠ 0, hence λ = λ̄ and therefore λ is real.
Problem 14.10. We consider a complex inner product space. Prove that eigenvectors
of a self-adjoint operator corresponding to distinct eigenvalues are orthogonal.
Solution. Let (V,+, ·, 〈·, ·〉) be a complex inner product space. Let T ∈ L(V ) be self-
adjoint, so T = T ∗. Let λ1 and λ2 be distinct eigenvalues of T with associated eigenvectors
x1 ≠ 0 and x2 ≠ 0 respectively. Then,
λ1〈x1, x2〉 = 〈λ1x1, x2〉 = 〈Tx1, x2〉 = 〈x1, T ∗x2〉 = 〈x1, Tx2〉 = 〈x1, λ2x2〉 = λ̄2〈x1, x2〉.
Since T is self-adjoint the eigenvalues are real, so λ̄2 = λ2. Therefore,
λ1〈x1, x2〉 = λ2〈x1, x2〉 ⇒ (λ1 − λ2)〈x1, x2〉 = 0⇒ 〈x1, x2〉 = 0,
since λ1 ≠ λ2. Hence x1 and x2 are orthogonal.
Problem 14.11. Let V be an inner-product space with inner product 〈·, ·〉V . Let W be
an inner-product space with inner product 〈·, ·〉W . Let T ∈ L(V,W ). We assume T has
an adjoint. Prove that the adjoint of T is unique.
Solution. Let T ∗1 and T ∗2 both be adjoints of T . Let v ∈ V and w ∈ W . Since T ∗1 is
an adjoint of T ,
〈Tv, w〉W = 〈v, T ∗1w〉V .
Since T ∗2 is an adjoint of T ,
〈Tv, w〉W = 〈v, T ∗2w〉V .
We get 〈v, T ∗1w〉V = 〈v, T ∗2w〉V and so,
〈v, (T ∗1 − T ∗2 )w〉V = 0.
Since v was arbitrary, (T ∗1 − T ∗2 )w is orthogonal to every vector in V , in particular to
itself. Therefore,
(T ∗1 − T ∗2 )w = 0,
which is true for all w ∈ W , and hence T ∗1 − T ∗2 = 0, showing that
T ∗1 = T ∗2 .
Hence the adjoint is unique.
Problem 14.12. We consider the vector space of n-by-n complex matrices. Prove that
〈A,B〉 = trace(AHB) defines an inner product.
Solution. Positivity and definiteness: trace(AHA) = ∑i,j |aij|2 ≥ 0, with equality if and
only if A = 0. Additivity and homogeneity follow from the linearity of the trace, and
conjugate symmetry from trace(BHA) = trace((AHB)H), which is the complex conjugate
of trace(AHB). (Note that trace(AHB) is linear in the second slot; for linearity in the
first slot use trace(ABH).)
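The inner product axioms for trace(A^H B) can be checked numerically (numpy sketch; the matrices and the scalar lam are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def ip(X, Y):
    # <X, Y> = trace(X^H Y); this formula is linear in the second slot
    return np.trace(X.conj().T @ Y)

assert np.isclose(ip(A, A).real, np.sum(np.abs(A) ** 2))   # positivity: sum of |a_ij|^2
lam = 2.0 - 3.0j
assert np.isclose(ip(A + B, C), ip(A, C) + ip(B, C))       # additivity
assert np.isclose(ip(A, lam * B), lam * ip(A, B))          # homogeneity (second slot)
assert np.isclose(ip(A, B), np.conj(ip(B, A)))             # conjugate symmetry
```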
Problem 14.13. Let A be an n × n complex matrix. Define H = (1/2)(A + A∗) and
S = (1/2)(A − A∗). Prove that A is normal if every eigenvector of H is also an eigenvector
of S.
Solution. Let vi be an eigenvector of H corresponding to the eigenvalue λi. By the
assumption, vi is also an eigenvector of S, corresponding to µi (say). Then,
(1/2)(A+ A∗)vi = λivi, (1/2)(A− A∗)vi = µivi,
which implies,
Avi = (λi + µi)vi, A∗vi = (λi − µi)vi.
Therefore,
AA∗vi = (λi − µi)Avi = (λi − µi)(λi + µi)vi = (λi + µi)(λi − µi)vi = (λi + µi)A∗vi = A∗Avi.
Since H is Hermitian (because H∗ = H), there is a basis of Cn consisting of eigenvectors
of H. Let v1, . . . , vn be such an eigenbasis of Cn. Let v ∈ Cn be arbitrary. Then for
some a1, . . . , an ∈ C, v = a1v1 + · · ·+ anvn. Then,
AA∗v = a1AA∗v1 + · · ·+ anAA∗vn = a1A∗Av1 + · · ·+ anA∗Avn = A∗Av.
Since v was arbitrary, we conclude that AA∗ = A∗A and hence A is normal.
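A numerical illustration of the H, S decomposition (numpy sketch; the normal matrix A = Q D Q^H is built from an arbitrary unitary Q and complex diagonal D, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
# build a normal (but non-Hermitian) A = Q D Q^H with Q unitary, D complex diagonal
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(M)                      # unitary factor
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = Q @ np.diag(d) @ Q.conj().T

H = (A + A.conj().T) / 2                    # Hermitian part, eigenvalues Re(d)
S = (A - A.conj().T) / 2                    # skew-Hermitian part, eigenvalues i*Im(d)

assert np.allclose(A, H + S)
assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A is normal
assert np.allclose(H @ S, S @ H)                     # H and S share the eigenvectors Q
```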
15. Scrambled Ideas
Remark 11. On Similarity Transformation:
(1) Two matrices A and B are called similar if there exists an invertible matrix P such
that A = PBP−1.
(2) Similar matrices have the same determinant:
detA = det(PBP−1) = det(P ) det(B) det(P−1) = det(P ) det(B)(1/ det(P )) = detB.
(3) Similar matrices have the same eigenvalues because
det(A−λI) = det(PBP−1−λI) = det(PBP−1−PλIP−1) = det(P (B−λI)P−1) = det(B−λI)
(4) To compute A^k it is helpful to have a similarity transformation A = V DV −1 with D
diagonal and V invertible, since then A^k = V D^k V −1.
Remark 12. On Left and right eigenvectors
(1) If A is diagonalizable, then A = V DV −1 with respect to the eigenbasis, that is,
the eigenvectors are the columns of V .
(2) Setting W = V −T we get A = W−TDW T , which implies W TA = DW T , that is,
wi^T A = λi wi^T for each column wi of W . Here wi^T is a left eigenvector of A, or
equivalently wi is a right eigenvector of AT .
(3) The eigenvalues of A and AT are same.
Remark 13. On the Symmetric matrices
(1) Symmetric matrix has real eigenvalues.
(2) Symmetric matrix has an orthonormal eigen-basis.
(3) We can diagonalize it as A = QΛQT
(4) A symmetric matrix is a combination of mutually perpendicular projection matrices:
A = QΛQT = λ1q1q1^T + λ2q2q2^T + · · ·
(5) Signs of pivots are same as signs of eigenvalues. That is the number of positive
pivots is equal to that of positive eigenvalues and same for negative.
(6) A symmetric matrix can be factored as
A = LDLT .
This is because any matrix (without row exchanges) can be written as A = LDU , and since A = AT we must
have LDU = UTDLT . Since the factorization is unique, we must have U = LT ,
and therefore A = LDLT .
(7) Since symmetric matrices are diagonalizable, and so have enough eigenvectors to
make an eigenbasis, there is no defective eigenvalue; that is, the minimal polynomial
has all linear factors.
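The spectral decomposition in item (4) above can be checked numerically (numpy sketch; the symmetric matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                           # an arbitrary symmetric matrix

lam, Q = np.linalg.eigh(A)                  # real eigenvalues, orthonormal eigenbasis
assert np.allclose(Q.T @ Q, np.eye(n))      # Q is orthogonal
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)

# A as a sum of mutually perpendicular rank-one projections
A_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(n))
assert np.allclose(A, A_rebuilt)
```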
Remark 14. On the SPD
(1) Each of the following tests is a necessary and sufficient condition for a real symmetric
matrix A to be positive definite:
• xTAx > 0 for all nonzero real vectors x.
• All the eigenvalues of A satisfy λi > 0.
• All the upper-left submatrices Ak have positive determinants.
• All the pivots (without row exchanges) satisfy dk > 0.
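The tests above can be compared numerically on one example (numpy sketch; A is an arbitrary SPD matrix built as B^T B + nI):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
B = rng.standard_normal((n, n))
A = B.T @ B + n * np.eye(n)                 # symmetric positive definite by construction

assert np.all(np.linalg.eigvalsh(A) > 0)                          # eigenvalue test
assert all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1)) # leading-minor test
L = np.linalg.cholesky(A)                   # Cholesky succeeds exactly for SPD matrices
assert np.allclose(L @ L.T, A)
```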
15.1. On Square root of a matrix
Definition 15.1. Square root of a matrix A is a matrix S such that A = S2.
Theorem 15.1. Suppose A is a square matrix. There is a positive semidefinite matrix
S such that A = S2 if and only if A is positive semidefinite.
Proof. Suppose A is positive semidefinite.
A positive semidefinite ⇒ A is Hermitian ⇒ A is normal.
So there is a unitary matrix U and a diagonal matrix D, whose diagonal entries are
the eigenvalues of A, such that D = U∗AU . The eigenvalues of A are all nonnegative,
which allows us to define a diagonal matrix E whose diagonal entries are the nonnegative
square roots of the eigenvalues of A, in the same order as they appear in D. That is,
E is the diagonal matrix with nonnegative diagonal entries such that E2 = D. Set
S = UEU∗; then,
S2 = UEU∗UEU∗ = UEInEU∗ = UE2U∗ = UDU∗ = A
Next we verify that S is Hermitian,
S∗ = (UEU∗)∗ = UE∗U∗ = UETU∗ = UEU∗ = S.
Since E is diagonal with real entries, E is Hermitian. Also the eigenvalues of E are the diagonal entries and so non-negative. Hence E is positive semi-definite. Let x ∈ V be any vector; then,
x∗Sx = x∗UEU∗x = (U∗x)∗E(U∗x) ≥ 0
Hence, S is positive semi-definite.
Now we assume A = S2, with S positive semi-definite. Then S is Hermitian, and we
check that A is Hermitian.
A∗ = (SS)∗ = S∗S∗ = SS = A.
Let x ∈ V be any vector, then
x∗Ax = x∗SSx = x∗S∗Sx = (Sx)∗Sx = ‖Sx‖2 ≥ 0.
Hence A is positive semi-definite.
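The construction S = UEU∗ from the proof can be sketched numerically (numpy; the example matrix is chosen arbitrarily for illustration):

```python
import numpy as np

# Arbitrarily chosen symmetric positive semi-definite example.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eigh(A)      # D = U* A U, here D = diag(lam)
E = np.diag(np.sqrt(lam))       # E^2 = D, non-negative diagonal entries
S = U @ E @ U.T                 # S = U E U*

assert np.allclose(S @ S, A)                    # S^2 = A
assert np.allclose(S, S.T)                      # S is Hermitian
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)  # S is positive semi-definite
```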
15.2. On the Matrix Decompositions
Let A be an m × n matrix with rank n. We can use Gram-Schmidt orthogonalization to factorize A into two matrices: an m × n matrix Q with orthonormal columns and an n × n upper triangular matrix R. This factorization is known as the reduced QR factorization. A full QR factorization is obtained by appending m − n additional orthonormal columns to Q so that it becomes an m × m unitary matrix.
Theorem 15.2. Every A ∈ Cm×n (m ≥ n) has a full QR factorization, hence also a
reduced QR factorization.
Theorem 15.3. Let p ≥ q and let A be a real p × q matrix with rank q. Then the QR-decomposition A = QR, with Q of size p × q and R of size q × q, is unique if R is forced to have positive entries on its main diagonal.
Proof. Assume that A = Q1R1 and A = Q2R2 with R1, R2 upper triangular with positive entries on the diagonal and Q1TQ1 = Iq and Q2TQ2 = Iq.
We first note that since A is full rank, R1 and R2 are invertible. We have
(15.1) Q1R1 = Q2R2.
Multiplying equation (15.1) by Q1T on the left and by R2−1 on the right gives
R1R2−1 = Q1TQ2.
Since R1R2−1 is upper triangular, this means that Q1TQ2 is upper triangular. Now multiplying equation (15.1) by Q2T on the left and R1−1 on the right gives
R2R1−1 = Q2TQ1.
This means that Q2TQ1 is upper triangular, so its transpose Q1TQ2 is lower triangular. That is, Q1TQ2 is both upper and lower triangular, so it is diagonal and also invertible.
Let us call D = Q1TQ2. Then
R1 = DR2.
From equation (15.1) we get
Q1DR2 = Q2R2 ⇒ Q1D = Q2 ⇒ Q1 = Q2D−1.
Therefore, Q1TQ1 = I and Q2TQ2 = I give D2 = I, so D has ±1 on the diagonal.
Since R1 = DR2, the diagonal entries of R1 are given by (R1)ii = Dii(R2)ii. But the positivity of both (R1)ii and (R2)ii along with Dii = ±1 implies Dii = 1. Finally D = I and so
Q1 = Q2 and R1 = R2.
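numpy's qr routine does not enforce the sign convention of Theorem 15.3, but the unique factor with positive diagonal is obtained by absorbing a ±1 diagonal matrix D, exactly as in the proof above. A sketch (the helper name qr_positive is mine):

```python
import numpy as np

def qr_positive(A):
    """Reduced QR with the sign convention diag(R) > 0 (the unique one)."""
    Q, R = np.linalg.qr(A)           # numpy does not fix the signs
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0                  # only relevant for rank-deficient A
    return Q * d, d[:, None] * R     # (Q D) and (D R) with D = diag(d), D^2 = I

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))      # full rank with probability 1
Q, R = qr_positive(A)

assert np.allclose(Q @ R, A)                 # still a factorization of A
assert np.allclose(Q.T @ Q, np.eye(3))       # orthonormal columns
assert np.all(np.diag(R) > 0)                # the normalized sign convention
```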
Problem 15.1. Let A be an m × n matrix (m ≥ n), and let A = QR be a reduced QR factorization.
(a) Show that A has rank n if and only if all the diagonal entries of R are nonzero.
(b) Suppose R has k nonzero diagonal entries for some k with 0 ≤ k < n. What does this imply about the rank of A? Exactly k? At least k? At most k? Give a precise answer, and prove it.
Solution. content...
16. On SPD
If A is symmetric positive definite then A1/2 exists and is well defined. For a symmetric
matrix A, the inverse A−1 is also symmetric, because
(A−1)T = (A−1)TAA−1 = (A−1)TATA−1 = (AA−1)TA−1 = A−1.
Also for any nonsingular square matrix we can switch the inverse and transpose because
(A−1)TAT = (AA−1)T = I ⇒ (AT )−1 = (A−1)T
Every SPD matrix has a Cholesky factorization: if A is a symmetric positive definite matrix then
A = CCT
where C is lower triangular with positive elements on the diagonal (so C is invertible).
Problem 16.1. Let A, B, and C represent three real n × n matrices, where A and B are symmetric positive definite and C is invertible. Prove that each of the following is spd.
(a) A−1
(b) A+B
(c) CTAC
(d) A−1 − (A+B)−1
Solution. We use the definition and the property that a real symmetric matrix A is positive definite if and only if xTAx > 0 for all real n-dimensional vectors x ≠ 0, or equivalently, if all its eigenvalues are real and positive.
(a) Since A is symmetric, we have,
(A−1)T = (A−1)TAA−1 = (A−1)TATA−1 = (AA−1)TA−1 = A−1
and thus A−1 is symmetric. Moreover, if A is positive definite, then all its eigenvalues are positive and real. Let λ be an eigenvalue of A−1; then A−1v = λv where v is the corresponding eigenvector. But A−1v = λv implies Av = (1/λ)v, so 1/λ is an eigenvalue of A and is real and positive, and hence λ is positive. Hence A−1 is positive definite.
(b) A+B is symmetric because,
(A+B)T = AT +BT = A+B.
Also, for any x ≠ 0,
xT (A+B)x = xTAx+ xTBx > 0.
Hence A+B is spd.
(c) CTAC is symmetric because
(CTAC)T = CTAT(CT)T = CTAC.
Also for any x ≠ 0,
xTCTACx = (Cx)TA(Cx).
If Cx = 0 then, since C is invertible, x = 0. Therefore for x ≠ 0 we have Cx ≠ 0, and since A is positive definite,
xTCTACx = (Cx)TA(Cx) > 0.
Hence CTAC is spd.
(d) We have
A−1 − (A+B)−1 = (A−1(A+B) − I)(A+B)−1 = A−1B(A+B)−1
= [(A+B)(B−1A)]−1 = [AB−1A + A]−1.
By (a), B−1 is spd. By (c), AB−1A = ATB−1A is spd. By (b), AB−1A + A is spd. By (a) again, [AB−1A + A]−1 is spd. Hence A−1 − (A+B)−1 is spd.
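The four results can be spot-checked numerically. A sketch with randomly generated SPD matrices (the helper is_spd and the construction XXT + I are mine, used only for illustration):

```python
import numpy as np

def is_spd(M, tol=1e-10):
    # symmetric and all eigenvalues strictly positive
    return np.allclose(M, M.T) and np.all(np.linalg.eigvalsh(M) > tol)

rng = np.random.default_rng(2)
# X X^T + I is SPD for any real X, giving easy random SPD test matrices.
X, Y = rng.standard_normal((2, 4, 4))
A = X @ X.T + np.eye(4)
B = Y @ Y.T + np.eye(4)
C = rng.standard_normal((4, 4))          # invertible with probability 1

assert is_spd(np.linalg.inv(A))                         # (a)
assert is_spd(A + B)                                    # (b)
assert is_spd(C.T @ A @ C)                              # (c)
assert is_spd(np.linalg.inv(A) - np.linalg.inv(A + B))  # (d)
```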
Problem 16.2. Suppose A is a symmetric positive definite real n × n matrix and B is a real m × n matrix such that BBT is positive definite. Prove that the matrix BT(BA−1BT)−1B is symmetric positive definite.
Solution. First note that BT(BA−1BT)−1B is symmetric: since A is spd, by the previous problem A−1 is spd, so BA−1BT is symmetric, and hence so are (BA−1BT)−1 and BT(BA−1BT)−1B. For x ≠ 0, we have BTx ≠ 0 because otherwise xTBBTx = 0, which contradicts the fact that BBT is spd. Therefore for x ≠ 0,
xTBA−1BTx = (BTx)TA−1(BTx) > 0
since A−1 is spd. Hence BA−1BT is spd. Again, since the inverse of an spd matrix is spd, (BA−1BT)−1 is spd. Since BBT is spd, we can show that BTB is also spd. Obviously BTB is symmetric. Let λ be an eigenvalue of BTB and v a corresponding eigenvector. Then BTBv = λv implies BBT(Bv) = λ(Bv); if Bv ≠ 0 this shows that λ is also an eigenvalue of BBT, and hence λ > 0. (If Bv = 0 then λv = BTBv = 0 gives λ = 0, so this step implicitly requires B to have full column rank.) Therefore BTB is spd, and hence for x ≠ 0, Bx ≠ 0. So, for x ≠ 0,
xTBT(BA−1BT)−1Bx = (Bx)T(BA−1BT)−1(Bx) > 0
since (BA−1BT)−1 is spd. Hence BT(BA−1BT)−1B is spd.
Problem 16.3. Suppose A is a positive definite symmetric square real matrix and B is
a symmetric square real matrix. Show that there exists a square real matrix C such that
CTAC is the identity matrix and CTBC is a diagonal matrix.
Solution. Let C1 = A1/2. Then C1−1AC1−1 is the identity matrix and C1−1BC1−1 is symmetric. We can write C1−1BC1−1 = PDPT, where D is diagonal and P is orthogonal. Then D = (PTC1−1)B(C1−1P) and (PTC1−1)A(C1−1P) = PT(C1−1AC1−1)P = PTP is the identity matrix. Thus, one can take C = C1−1P.
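The construction C = C1−1P can be carried out numerically. A sketch (numpy; the random test matrices are mine, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))
A = X @ X.T + np.eye(3)        # symmetric positive definite
B = Y + Y.T                    # symmetric

# C1^{-1} = A^{-1/2} via the eigendecomposition of A.
lam, U = np.linalg.eigh(A)
C1inv = U @ np.diag(1.0 / np.sqrt(lam)) @ U.T

# C1^{-1} B C1^{-1} is symmetric; diagonalize it with an orthogonal P.
_, P = np.linalg.eigh(C1inv @ B @ C1inv)
C = C1inv @ P

D = C.T @ B @ C
assert np.allclose(C.T @ A @ C, np.eye(3))   # C^T A C = I
assert np.allclose(D, np.diag(np.diag(D)))   # C^T B C is diagonal
```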
Problem 16.4. Assume the following general definition for a real positive semidefinite
matrix: an n × n real matrix A is said to be positive semidefinite if and only if, for all
vector x in Rn, xTAx ≥ 0. In particular, this definition allows real matrices which are
not symmetric to be positive semidefinite.
(a) Prove that if A and B are real symmetric positive semidefinite matrices and matrix
A is nonsingular, then AB has only real nonnegative eigenvalues.
(b) Provide a counterexample showing that the requirement that the matrices are
symmetric cannot be dropped.
Solution. (a) Since A is symmetric positive semidefinite and nonsingular, it is positive definite, so A1/2 and A−1/2 are well defined. The matrix AB has the same eigenvalues as the matrix A−1/2(AB)A1/2 = A1/2BA1/2. The latter matrix is self-adjoint and positive semidefinite, so it has real nonnegative eigenvalues.
Alternative: Let λ be an eigenvalue of AB and v a corresponding eigenvector. Then
ABv = λv ⇒ BABv = λBv ⇒ vTBABv = λvTBv ⇒ (Bv)TA(Bv) = λvTBv.
Since A is nonsingular and positive semidefinite, all its eigenvalues are positive, and so A is positive definite. If Bv = 0 then λv = ABv = 0, so λ = 0. If Bv ≠ 0, the left hand side (Bv)TA(Bv) is positive, and since vTBv ≥ 0 this forces λ > 0. Hence λ ≥ 0.
(b) We need a nonsymmetric matrix A. To create a positive semidefinite matrix A, one can take a symmetric positive semidefinite matrix H and add an antisymmetric matrix S; then A = H + S is positive semidefinite, since xTSx = 0 for all x. In our case, we take
A =
[0 1
−1 0]
and B =
[1 0
0 1].
In this case A is positive semidefinite and nonsingular, B is positive semidefinite, and AB = A has eigenvalues ±i, so AB does not have real nonnegative eigenvalues.
Problem 16.5. Let A be an n × n real symmetric positive semidefinite matrix. Let B
be an n× n real symmetric positive definite matrix.
(a) Prove that AB have real nonnegative eigenvalues.
(b) Prove that
det(A) det(B) ≤ (trace(AB)/n)n
Solution. (a) Since B is a symmetric positive definite matrix, it has a Cholesky factorization B = CCT, where C is lower triangular with positive elements on the
diagonal. Now we have,
AB = (C−TCT )A(CCT ) = (CT )−1(CTAC)CT .
Therefore AB is similar to CTAC, so AB and CTAC have the same eigenvalues.
Since A is symmetric, CTAC is symmetric as well, so CTAC has real eigenvalues.
Moreover, since C is invertible and A is positive semidefinite, CTAC is positive
semidefinite as well. Therefore CTAC is real symmetric positive semidefinite, so
it has real nonnegative eigenvalues. We conclude that AB has real nonnegative
eigenvalues.
(b) Let λi, i = 1, 2, . . . , n, be the n eigenvalues of AB (where we repeat the eigenvalues according to their algebraic multiplicities). We note that
det(A) det(B) = det(AB) = λ1λ2 · · ·λn.
On the other hand,
trace(AB) = λ1 + λ2 + · · ·+ λn.
Since λi ≥ 0, by the arithmetic-geometric mean inequality we get
(λ1λ2 · · ·λn)1/n ≤ (λ1 + λ2 + · · ·+ λn)/n,
which leads us to the result.
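A quick numerical spot-check of the inequality (numpy; random psd/pd matrices of my choosing, with a small tolerance to guard against rounding):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
X, Y = rng.standard_normal((2, n, n))
A = X @ X.T                # symmetric positive semidefinite
B = Y @ Y.T + np.eye(n)    # symmetric positive definite

lhs = np.linalg.det(A) * np.linalg.det(B)
rhs = (np.trace(A @ B) / n) ** n
# det(A) det(B) <= (trace(AB)/n)^n
assert lhs <= rhs + 1e-9
```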
Problem 16.6. We consider two real n × n matrices A and B such that A is symmetric positive definite and B is anti-symmetric. Prove that A + B is invertible.
Solution. Since B is anti-symmetric (which means by definition BT = −B), for every vector x of size n, we have
xTBx = (xTBx)T = xTBTx = −xTBx
which implies xTBx = 0. Now let x be an n × 1 vector such that
(A+B)x = 0.
Then, multiplying on the left by xT , this implies
xT (A+B)x = xTAx+ xTBx = xTAx = 0.
Since A is positive definite, xTAx = 0 implies x = 0. Hence A+B has trivial null space
and therefore is invertible.
Problem 16.7. (a) Let A be a complex Hermitian matrix. Prove that A is positive
definite if and only if all the eigenvalues of A are positive.
(b) Let A =
[2 0 0
0 3 −1
0 −1 3].
Let V = R3. We define the map ∗ : V × V → R by u ∗ v = uTAv for all u, v ∈ V. Prove that ∗ is an inner product on V.
(c) Use the inner product from above and the Gram-Schmidt orthogonalization pro-
cess to find an orthonormal basis for V .
Solution. (a) Let A be Hermitian positive definite. This means that, for all x ≠ 0, xHAx is real and positive. Let λ be an eigenvalue of A, and let v be an eigenvector of A associated with the eigenvalue λ such that vHv = 1. Now we see that vHAv = vHλv = λvHv = λ. So λ is real and positive.
Conversely, let A be Hermitian with all eigenvalues positive. Since A is Hermitian, A is diagonalizable in an orthonormal basis, so there exists a unitary V such that A = VDVH. Let x be a nonzero vector of size n. Then
xHAx = xH(VDVH)x = xHVD1/2D1/2VHx = (D1/2VHx)H(D1/2VHx) = ‖D1/2VHx‖2 > 0,
since D1/2VHx ≠ 0 (both D1/2 and VH are invertible and x ≠ 0). This implies that A is positive definite.
(b) A is symmetric and the eigenvalues of A are 2,2 and 4. So the eigenvalues of A
are all positive, so A is symmetric positive definite. Therefore uTAv defines an
inner product. (Theorem used: uTAv defines an inner product if and only if A is
symmetric positive definite.)
(c) We apply the Gram-Schmidt process to the basis e1, e2, e3 in order to obtain an
orthonormal basis for V .
eT1Ae1 = 2 ⇒ ‖e1‖ = √2 ⇒ q1 = [√2/2, 0, 0].
qT1Ae2 = 0; qT1Ae3 = 0.
eT2Ae2 = 3 ⇒ ‖e2‖ = √3 ⇒ q2 = [0, √3/3, 0].
qT2Ae3 = −√3/3 ⇒ w = e3 + (√3/3)q2 = [0, 1/3, 1].
wTAw = 8/3 ⇒ ‖w‖ = 2√6/3 ⇒ q3 = [0, √6/12, √6/4].
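The computation above can be reproduced with a small Gram-Schmidt routine under the inner product u ∗ v = uTAv. A sketch (numpy; the helper ip is mine):

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, -1.0],
              [0.0, -1.0, 3.0]])

def ip(u, v):                  # the inner product u * v = u^T A v
    return u @ A @ v

# Gram-Schmidt on the standard basis e1, e2, e3 under this inner product.
qs = []
for e in np.eye(3):
    w = e - sum(ip(e, q) * q for q in qs)   # subtract components along earlier q's
    qs.append(w / np.sqrt(ip(w, w)))        # normalize in the A-norm

# The result is orthonormal with respect to <u, v> = u^T A v.
G = np.array([[ip(qi, qj) for qj in qs] for qi in qs])
assert np.allclose(G, np.eye(3))
# Matches the hand computation: q3 = [0, sqrt(6)/12, sqrt(6)/4].
assert np.allclose(qs[2], [0.0, np.sqrt(6) / 12, np.sqrt(6) / 4])
```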
17. On Row space, Column space, rank and nulity
The column space of a matrix A of dimension m× n is the subspace of Fm containing
all the linear combination of the columns of A. The solution to Ax = 0 form a vector
space that is called the null space of A which is a subspace of Fn. If the matrix A has
linearly independent columns (in case of square matrix we can say if it is invertible) then
the null space contains only the zero vector. Let xn be a vector in the null space and let xp be a particular solution to the system Ax = b. Then we can write a general solution or complete solution of Ax = b as,
x = xp + xn
because
Ax = A(xp + xn) = Axp + Axn = b+ 0 = b
From this point of view we can infer that a system Ax = b has a unique solution if the matrix A has zero null space and b is in the column space of A. The requirement that b be in the column space is needed for the existence of a solution, and null(A) = {0} is required for the uniqueness of the solution. If the null space is not the zero space then it contains infinitely many vectors, and so in that case, if a solution exists, there are infinitely many solutions to the system.
Now we will introduce two other important subspaces. An important term in the discussion is the rank of a matrix: the rank of a matrix is the dimension of the column space, that is, the number of independent columns. The row space is the subspace of Fn containing all the linear combinations of the rows of the matrix. The row space can also be defined in terms of a column space: the row space of a matrix A is the column space of AT.
The fourth fundamental subspace generated by a matrix A is the null space of AT which
is also called the left null space. The left null space is the subspace of Fm. So, the list of
four fundamental subspaces are,
• The column space of A is denoted by C(A). Its dimension is the rank r.
• The null space of A is denoted by null(A). Its dimension is n − r.
• The row space of A is the column space of AT. Its dimension is r.
• The left null space of A is the null space of AT. Its dimension is m − r.
Row operations preserve the row space but do change the column space. The dimension of the null space is also known as the nullity. We state two very important results which are known as the fundamental theorem of orthogonality.
Theorem 17.1. The row space is orthogonal to the nullspace. The column space is
orthogonal to the left nullspace that is the null space of AT .
[Figure omitted: Gilbert Strang's picture of the four fundamental subspaces.]
Every b in the column space is a combination Ax of the columns. In fact, b is Axr,
with xr in the row space, since the nullspace component gives Axn = 0. If another vector
x′r in the row space gives Ax′r = b, then A(xr − x′r) = b− b = 0. This puts xr − x′r in the
null space and the row space, which makes it orthogonal to itself. Therefore it is zero and
thus xr = x′r. Therefore, exactly one vector in the row space is carried to b. A matrix
transforms its row space onto its column space.
Now we state some theorems about the rank and nullity which is the dimension of the
null space.
• Elementary row operations do not change the row space of a matrix.
• If a matrix A is in row echelon form, then the nonzero rows of A are linearly
independent.
• The rank of a matrix is equal to the number of nonzero rows in its row echelon
form.
• The row space is the orthogonal complement of the null space in Fn. The left null space is the orthogonal complement of the column space in Fm.
• Ax = b is solvable if and only if yT b = 0 whenever yTA = 0 or equivalently
ATy = 0.
• rank(A+B) ≤ rank(A) + rank(B).
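The dimensions of the four subspaces can be illustrated on a small example (numpy; the particular rank-one matrix and the spanning vectors of its null spaces are chosen by me for illustration):

```python
import numpy as np
from numpy.linalg import matrix_rank

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # m = 2, n = 3, rank r = 1

m, n = A.shape
r = matrix_rank(A)
assert r == 1
assert matrix_rank(A.T) == r        # the row space also has dimension r

# dim null(A) = n - r = 2: here null(A) is spanned by (2, -1, 0), (3, 0, -1).
assert np.allclose(A @ np.array([2.0, -1.0, 0.0]), 0)
assert np.allclose(A @ np.array([3.0, 0.0, -1.0]), 0)
# dim null(A^T) = m - r = 1: (2, -1) works since row 2 = 2 * row 1.
assert np.allclose(A.T @ np.array([2.0, -1.0]), 0)
```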
18. On Diagonalizability
A matrix A ∈ Fn×n is said to be diagonalizable if it is similar to a diagonal matrix, that is, if there exists an invertible matrix U ∈ Fn×n and a diagonal matrix D ∈ Fn×n such that
A = UDU−1
An n× n matrix A is said to be diagonalizable
• if and only if there exists a basis of Rn made of eigenvectors of A.
• if and only if A has n linearly independent eigenvectors.
• if and only if there exists a diagonal matrix D and an invertible matrix V such
that A = V DV −1.
Suppose A is diagonalizable. Then the formula A = UDU−1 implies that
A− λIn = UDU−1 − λUInU−1 = U(D − λIn)U−1
and hence that
dim null(A− λIn) = dim null(D − λIn)
for every point λ ∈ F because these matrices are similar. In particular, if λ = λj is an
eigenvalue of A, then
γj = dim null(A− λjIn) = dim null(D − λjIn)
is equal to the number of times the number λj is repeated in the diagonal matrix D. Thus,
γ1 + · · · + γk = n. Also γj represents the dimension of the eigenspace corresponding to the eigenvalue λj. So the sum of the dimensions of the eigenspaces is n, and therefore there are n linearly independent eigenvectors, because eigenvectors corresponding to distinct eigenvalues are linearly independent. A sufficient condition for diagonalizability: if a square matrix of size n has n distinct eigenvalues, then it is diagonalizable.
Let A be a Hermitian matrix, that is, A = A∗. Then A is diagonalizable in an orthonormal basis; therefore there exist Q, an n × n unitary matrix, and D, an n × n diagonal matrix, such that A = QDQ∗. Since A is Hermitian the eigenvalues are real and hence all the entries of D are real.
A skew-Hermitian matrix B, that is, B∗ = −B, is diagonalizable in an orthonormal basis and all its eigenvalues are purely imaginary.
Problem 18.1. Let A and B be n×n matrices. Prove or disprove each of the following.
(a) If A and B are diagonalizable, then so is A+B
(b) If A and B are diagonalizable, then so is AB
(c) If A2 = A, then A is diagonalizable.
(d) If A2 is diagonalizable, then so is A.
Solution. (a) Consider the following matrices:
A =
[1 1
0 0]
and B =
[−1 0
0 0]
The matrix A is upper triangular, so its eigenvalues are its diagonal entries, namely 1 and 0. Since it has two distinct eigenvalues, A is diagonalizable. The matrix B is diagonalizable because it is already diagonal. But the sum of the matrices,
A + B =
[0 1
0 0]
is not diagonalizable because it is a Jordan block of size 2 associated with eigenvalue 0. Hence, A and B are both diagonalizable, while A + B is not diagonalizable.
(b) The statement is not true. A counterexample is
A =
[1 1
0 0]
, B =
[0 0
0 1]
, AB =
[0 1
0 0].
(c) The statement is true. If A2 = A then A(A − I) = 0. It follows that the minimal polynomial of A is either x (if A = 0), or x − 1 (if A = I), or x(x − 1). In any case, the minimal polynomial µA has no repeated roots, and thus A is diagonalizable.
(d) The statement is not true. A counterexample is
A =
[0 1
0 0]
, A2 =
[0 0
0 0].
As we explained in part (a), the matrix A is not diagonalizable but A2 is diagonalizable.
Problem 18.2. Suppose that A is an m×n matrix and B is an n×m matrix, and write
Im for the m×m identity matrix. Show that if Im−AB is invertible, then so is In−BA.
Solution. Method 1: Let x ∈ null(In−BA). Then x−BAx = 0. It follows that BAx =
x, so AB(Ax) = Ax, which implies (Im − AB)Ax = 0. Therefore Ax ∈ null(Im − AB).
Since Im−AB is invertible, the nullspace is trivial, so that Ax = 0 and thus x = BAx = 0.
Hence nullspace of In −BA is trivial and so In −BA is invertible.
Method 2: One can verify directly that In + B(Im − AB)−1A is the inverse of In − BA. Indeed,
(In − BA)(In + B(Im − AB)−1A) = In − BA + B(Im − AB)−1A − BAB(Im − AB)−1A
= In − BA + B(Im − AB)(Im − AB)−1A = In − BA + BA = In.
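The closed-form inverse (In − BA)−1 = In + B(Im − AB)−1A used in Method 2 can be spot-checked numerically (numpy; random A and B of my choosing, for which Im − AB is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

Im, In = np.eye(m), np.eye(n)
# Candidate inverse of In - BA, built from the inverse of Im - AB.
inv_candidate = In + B @ np.linalg.inv(Im - A @ B) @ A

assert np.allclose((In - B @ A) @ inv_candidate, In)
assert np.allclose(inv_candidate @ (In - B @ A), In)
```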
Problem 18.3. Let A and B be n × n complex matrices such that AB = BA. Show
that if A has n distinct eigenvalues, then A , B and AB are all diagonalizable.
Solution. Let λ1, . . . , λn be the n distinct eigenvalues of A with corresponding (nonzero) eigenvectors v1, . . . , vn. We know that a list of eigenvectors belonging to distinct eigenvalues must be linearly independent. Hence B = (v1, . . . , vn) is a basis of Cn consisting of eigenvectors of A, so that A is similar to the diagonal matrix diag(λ1, . . . , λn).
For each i = 1, . . . , n, we have
ABvi = BAvi = Bλivi = λiBvi
which implies that Bvi is either zero or an eigenvector of A corresponding to the eigenvalue λi. Since A has n distinct eigenvalues, all the eigenspaces are one dimensional, and hence Bvi = ψivi for some ψi ∈ C, for all i = 1, . . . , n. Hence the basis B is also a basis of eigenvectors of B, where vi is associated with the eigenvalue ψi; in particular B is diagonalizable.
Now let V be the matrix with the eigenvector vi as its ith column. Then AV = VΛ and BV = VΨ, where Λ = diag(λ1, . . . , λn) and Ψ = diag(ψ1, . . . , ψn). Then,
ABV = A(VΨ) = (AV)Ψ = VΛΨ = VΦ
where Φ = ΛΨ = diag(λ1ψ1, . . . , λnψn). Hence AB is diagonalizable as well.
19. On Orthogonal Projection
An orthonormal basis of a vector space V is very important in many situations. A list of vectors in V that is orthonormal and also forms a basis is called an orthonormal basis of V. For an n dimensional space, a list of n orthonormal vectors forms an orthonormal basis, because an orthonormal list of vectors is linearly independent. One of the advantages of having an orthonormal basis is that for any vector v ∈ V the coefficients of the linear combination are readily known. Suppose e1, . . . , en is an orthonormal basis of V and v ∈ V. Then
v = 〈v, e1〉e1 + · · ·+ 〈v, en〉en
and
‖v‖2 = |〈v, e1〉|2 + · · ·+ |〈v, en〉|2.
A good thing is that every finite-dimensional inner product space has an orthonormal
basis.
Definition 19.1. A linear transformation P of a vector space U over F into itself is said
to be a projection if P is idempotent that is if P 2 = P .
Any transformation maps a vector from its row space to its column space. If the transformation is an operator then we can say that it maps a vector into its column space. Then what is special about a projection? For a non-projection operator T, if you take a vector u from the column space U, then T maps it into U, but not necessarily to the same vector u. On the contrary, let the transformation be a projection P, and let u ∈ U, where U is the column space of P. Then there is y ∈ V such that Py = u. So, Pu = P2y = Py = u.
Let P be a projection of V onto U, where U is a finite-dimensional subspace of V. Then U = CP = {Px : x ∈ V} and NP = {x ∈ V : Px = 0} is the null space of P. We want to show that V = U ⊕ NP.
Let x ∈ V . Then
x = Px+ (I − P )x
and Px ∈ U . Moreover (I − P )x ∈ NP , since
P (I − P )x = (P − P 2)x = (P − P )x = 0.
Thus
V = U +NP .
To prove that the sum is a direct sum, let y ∈ U ∩ NP. Then,
y ∈ U ⇔ y = Py
y ∈ NP ⇔ Py = 0
which implies y = 0. Hence
V = U ⊕NP
Definition 19.2. A linear transformation P of an inner product space U over F into itself
is said to be an orthogonal projection if P is idempotent and self adjoint with respect to
the given inner product that is if
P 2 = P, and 〈Pu, v〉 = 〈u, Pv〉
for every pair of vectors u, v ∈ U.
Another way to define or express the orthogonal projection is the following:
Definition 19.3. Suppose U is a finite-dimensional subspace of V . The orthogonal
projection of V onto U is the operator PU ∈ L(V ) defined as follows: For v ∈ V , write
v = u+ w, where u ∈ U and w ∈ U⊥. Then PUv = u.
Simply put, the above definition says that the column space of P, which is U, and the null space are orthogonal complements. Below we will show that these two definitions are equivalent.
Let P be an orthogonal projection of V onto the finite-dimensional subspace U according
to the former definition. That is P 2 = P and 〈Pu, v〉 = 〈u, Pv〉. Let u ∈ U be arbitrary
and w ∈ U⊥. Then,
〈Pw, u〉 = 〈w,Pu〉 = 〈w, u〉 = 0
Since the above equality is true for all u ∈ U we can conclude that Pw ∈ U⊥. But
Pw ∈ U because P is the projection onto U . Therefore Pw ∈ U ∩U⊥ and hence Pw = 0.
Hence, writing any v ∈ V as v = u + w with u ∈ U and w ∈ U⊥,
Pv = Pu + Pw = Pu = u,
which is exactly the second definition.
Now we take the second definition and want to show that 〈Pu, v〉 = 〈u, Pv〉 for all
u, v ∈ V . Let u = u1 + u2 and v = v1 + v2 where u1, v1 ∈ U and u2, v2 ∈ U⊥.
〈Pu, v〉 = 〈u1, v1 + v2〉 = 〈u1, v1〉+ 〈u1, v2〉 = 〈u1, Pv〉 = 〈u1, Pv〉+ 〈u2, Pv〉 = 〈u, Pv〉
The importance of an orthonormal basis of a subspace is that if we know the projections of a vector along the basis vectors, then we can simply add them together to get the projection onto the subspace. The following problem illustrates this property.
19.1. Projection in a Nutshell
(1) A projector is a square matrix P that satisfies P2 = P. Another name is idempotent.
(2) Two types of projections are orthogonal projection and oblique projection.
(3) The projector matrix P projects a vector v to the range of P along the null-space
of P .
(4) If P is a projector then I −P is also a projector and is called the complementary
projector to P .57
(5) range(P) = null(I − P) and null(P) = range(I − P).
(6) range(P) ∩ null(P) = {0}.
(7) Orthogonal Projection:
(a) If P ∈ Cm×m is an orthogonal projector then
range(P ) ⊥ null(P )
(b) A projector P is orthogonal if and only if P = P ∗.
(c) If Q has orthonormal columns then QQ∗ is an orthogonal projector onto the column space of Q.
(d) The rank-one orthogonal projector in a single direction q can be written as Pq = qq∗/(q∗q); when q is normalized this is simply Pq = qq∗. So the rank-one projector isolates the component of a vector in a single direction. The complement of the rank-one projector, I − Pq, is the rank m − 1 orthogonal projector that eliminates the component in the direction of q.
(e) Let a subspace be generated by the vectors a1, . . . , an and let A be the m × n matrix whose jth column is aj. Then the projector onto the subspace, which is also the column space or range of A, is given by
P = A(A∗A)−1A∗.
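Item (e) is easy to verify numerically. A sketch building P = A(ATA)−1AT for a random tall real matrix (numpy; the random example is mine):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2))          # full column rank with probability 1

# Orthogonal projector onto the column space of A.
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P @ P, P)             # idempotent: P^2 = P
assert np.allclose(P, P.T)               # self-adjoint: P = P*
assert np.allclose(P @ A, A)             # fixes vectors already in range(A)

# I - P is the complementary projector; its range is orthogonal to range(A).
Q = np.eye(5) - P
assert np.allclose(Q @ Q, Q)
assert np.allclose(A.T @ (Q @ rng.standard_normal(5)), 0)
```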
Problem 19.1. Let P2[0, 2] represent the set of polynomials with real coefficients and of
degree less than or equal to 2, defined on [0, 2]. For p = (p(t)) ∈ P2 and q = (q(t)) ∈ P2,
define
〈p, q〉 := p(0)q(0) + p(1)q(1) + p(2)q(2).
(a) Let T represent the linear transformation that maps an element p ∈ P2 to the
closest element of the span of the polynomial 1 and t in the sense of the norm
associated with the inner product. Find the matrix A of T in the standard basis
of P2.
(b) Is A symmetric? Is T self-adjoint? Do these facts contradict each other?
(c) Find the minimal polynomial of T
Solution. (a) We understand that T is the orthogonal projection onto the subspace
spanned by 1 and t. To find the matrix of T in the standard basis, let us apply
T to 1, t and t2. It is clear that T (1) = 1 and T (t) = t. Now we need to compute
T (t2). So we need to compute the orthogonal projection of t2 onto the subspace
spanned by 1 and t. To do that we need the orthogonal basis for the subspace.
Using the Gram-Schmidt algorithm, we get e1(t) = 1 and
e2(t) = t − (〈t, 1〉/〈1, 1〉) 1 = t − 1.
Hence, {e1, e2} is an orthogonal basis for the subspace spanned by 1 and t. Using
this orthogonal basis, we can now perform the orthogonal projection of t2 onto 1 and t − 1 to get the orthogonal projection onto the subspace spanned by 1 and t:
T(t2) = (〈t2, 1〉/〈1, 1〉) 1 + (〈t2, t − 1〉/〈t − 1, t − 1〉)(t − 1) = 5/3 + 2(t − 1) = −1/3 + 2t.
Thus, the standard basis {1, t, t2} is mapped to {1, t, −1/3 + 2t}. In coordinate vectors, (1,0,0) is mapped to (1,0,0), (0,1,0) is mapped to (0,1,0) and (0,0,1) is mapped to (−1/3, 2, 0). So the transformation matrix is
A =
[1 0 −1/3
0 1 2
0 0 0]
(b) No, the matrix is not symmetric. The transformation is self-adjoint being an
orthogonal projection. Since q − Tq is orthogonal to the plane U spanned by
1 and t and Tp is on the subspace U , we have 〈Tp, q − Tq〉 = 0. Similarly,
〈Tq, p − Tp〉 = 0. Hence we have,
〈Tp, q〉 = 〈Tp, q−Tq〉+〈Tp, Tq〉 = 〈Tp, Tq〉 = 〈p+Tp−p, Tq〉 = 〈p, Tq〉+〈Tp−p, Tq〉 = 〈p, Tq〉.
The matrix A of the transformation T is given in the basis {1, t, t2}, which is not an orthogonal basis, so the facts that xTAy ≠ xTATy (the matrix is not symmetric) and that 〈Tp, q〉 = 〈p, Tq〉 (the operator is self-adjoint) do not contradict each other.
(c) Since T is a projection, we know that T2 − T = 0. Moreover, T ≠ I (so the minimal polynomial is not x − 1) and T ≠ 0 (so the minimal polynomial is not x); thus the minimal polynomial is µT = x2 − x.
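The computation of T(t2) can be checked directly with the discrete inner product. A sketch (Python; the coefficient-array representation and the helper ip are mine):

```python
import numpy as np

ts = np.array([0.0, 1.0, 2.0])     # <p, q> = p(0)q(0) + p(1)q(1) + p(2)q(2)

def ip(p, q):                      # p, q as coefficient arrays [a0, a1, a2]
    return np.polyval(p[::-1], ts) @ np.polyval(q[::-1], ts)

one = np.array([1.0, 0.0, 0.0])    # the polynomial 1
t   = np.array([0.0, 1.0, 0.0])    # the polynomial t
t2  = np.array([0.0, 0.0, 1.0])    # the polynomial t^2

# Gram-Schmidt: e2 = t - (<t,1>/<1,1>) 1 = t - 1.
e2 = t - (ip(t, one) / ip(one, one)) * one

# Orthogonal projection of t^2 onto span{1, t}.
Tt2 = (ip(t2, one) / ip(one, one)) * one + (ip(t2, e2) / ip(e2, e2)) * e2
assert np.allclose(Tt2, [-1/3, 2.0, 0.0])    # T(t^2) = -1/3 + 2t
```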
Problem 19.2. Let Pn represent the real vector space of polynomials in x of degree less than or equal to n defined on [0, 1]. Given a real number a, we define Qn(a) as the subset of Pn of polynomials that have the real number a as a root.
(a) Let a be a real number. Show that Qn(a) is a subspace of Pn. Determine the
dimension of that subspace and exhibit a basis.
(b) Let the inner product in Pn be defined by 〈p, q〉 = ∫₀¹ p(x)q(x) dx. Determine the orthogonal complement of the subspace Q2(1) of P2.
Solution. (a) The polynomials in Qn(a) can be written as p(x) = (x − a)q(x) where q(x) is a polynomial of degree less than or equal to n − 1. Let p1(x) = (x − a)q1(x) and p2(x) = (x − a)q2(x) be in Qn(a) and let α, β ∈ R. Then
αp1(x) + βp2(x) = (x − a)(αq1(x) + βq2(x)).
Note that a is a root of αp1(x) + βp2(x), and αq1(x) + βq2(x) has degree less than or equal to n − 1. Hence αp1(x) + βp2(x) ∈ Qn(a). Therefore, Qn(a) is indeed a subspace. Since Qn(a) is isomorphic to Pn−1, its dimension is n, and
{(x − a), (x − a)2, . . . , (x − a)n}
is a basis.
(b) We can write a polynomial in P2 as a0 + a1(x − 1) + a2(x − 1)2. We need a polynomial orthogonal to x − 1 and (x − 1)2, so
∫₀¹ (a0 + a1(x − 1) + a2(x − 1)2)(x − 1) dx = 0,
∫₀¹ (a0 + a1(x − 1) + a2(x − 1)2)(x − 1)2 dx = 0,
which yields
−a0/2 + a1/3 − a2/4 = 0,
a0/3 − a1/4 + a2/5 = 0,
so
(a0, a1, a2) = a2 (3/10, 6/5, 1).
Thus,
Q2(1)⊥ = {3a2 + 12a2(x − 1) + 10a2(x − 1)2 : a2 ∈ R}
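The orthogonality of 3 + 12(x − 1) + 10(x − 1)2 to both x − 1 and (x − 1)2 can be verified exactly with rational arithmetic (Python stdlib sketch):

```python
from fractions import Fraction as F

# Moments m_k = integral over [0,1] of (x-1)^k dx = (-1)^k / (k+1).
m = [F((-1) ** k, k + 1) for k in range(5)]

# Candidate spanning vector of Q2(1)-perp, written in powers of (x-1):
# p(x) = 3 + 12(x-1) + 10(x-1)^2, coefficients (3, 12, 10).
p = [F(3), F(12), F(10)]

# <p, (x-1)> and <p, (x-1)^2> must both vanish.
ip1 = sum(p[j] * m[j + 1] for j in range(3))
ip2 = sum(p[j] * m[j + 2] for j in range(3))
assert ip1 == 0 and ip2 == 0
```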
Problem 19.3. A complex n × n matrix P is idempotent if P 2 = P . Show that every
idempotent matrix is diagonalizable.
Solution. Let P be a complex n × n idempotent matrix. The relation P2 = P reads as well P(P − I) = 0. Therefore the eigenvalues of P are either 0 or 1. We consider the two eigenspaces E0 (the eigenspace associated with the eigenvalue 0) and E1 (the eigenspace associated with the eigenvalue 1). Our goal is to prove that E0 ⊕ E1 = Cn. This will prove that P is diagonalizable.
Note that E0 is null(P).
Note as well that E1 is range(P). This is less obvious. On the one hand E1 ⊂ range(P), since, if x ∈ E1, x = Px so x ∈ range(P) (in other words the eigenspace is always in the range). On the other hand, if y ∈ range(P), there exists x such that y = Px, and so
Py = P2x = Px = y
so that y ∈ E1, so range(P) ⊂ E1.
We now need to prove that null(P) ⊕ range(P) = Cn. Let y ∈ Cn; we can write y = (Py) + (y − Py). The first term (Py) belongs to range(P). The second term (y − Py) belongs to null(P) since P(y − Py) = Py − P2y = Py − Py = 0. Moreover, if x ∈ null(P) ∩ range(P) then x = Px = 0, so the sum is direct.
Therefore null(P) ⊕ range(P) = Cn. So E0 ⊕ E1 = Cn. This proves that P is diagonalizable.
20. On minimization problems
We start with the importance of orthogonal projection for the minimization problem
stating the following theorem.
Theorem 20.1. Suppose U is a finite-dimensional subspace of V , v ∈ V and u ∈ U .
Then
‖v − PUv‖ ≤ ‖v − u‖.
Furthermore, the inequality above is an equality if and only if u = PUv.
The above theorem simply says that PUv is the point in the subspace U closest to v: over all u ∈ U, the distance ‖v − u‖ is smallest when u = PUv.
Consider the least squares problem, which is stated as: given A ∈ Cm×n, m ≥ n, and b ∈ Cm, find x ∈ Cn such that the residual ‖b − Ax‖2 is minimized. Since m ≥ n, generically the vector b does not lie in the column space of A, and so an exact solution of the system Ax = b does not exist. We therefore seek the best vector y in the column space of A, so that we can solve the system Ax = y; the solution is close in the sense that y is the best approximation, namely the projection of b onto the column space of A. By the above theorem, the orthogonal projection gives the vector y for which the residual b − y = b − Ax is smallest.
Let P be the orthogonal projection onto the subspace U of V . Then V = U ⊕ U⊥. So,
the vector b ∈ V can be written as b = y+ r where y ∈ U and r ∈ U⊥ which is known as
the residual. Now in the context of solving the least square problem Ax = b, the range
of A is U and to minimize the residual we require the residual to be in U⊥. Therefore r
is orthogonal to the range of A, that is, it is orthogonal to every column of A. That is,
(col 1)T r = 0, . . . , (col n)T r = 0
which implies that
AT r = 0.
Therefore, AT(b − Ax) = 0 ⇒ ATAx = ATb. The equation
ATAx = ATb
is known as the normal equation, and the matrix ATA is nonsingular if and only if A has full rank. Therefore the solution x is unique if and only if A has full rank.
The solution can be written as x = (ATA)−1ATb. Here we can define the projection matrix in a very nice way. Recall that y was the orthogonal projection of b onto the range of A. Therefore the orthogonal projection of b is given by
y = Ax = A(ATA)−1ATb.
The matrix P = A(ATA)−1AT is the orthogonal projection onto the range of A.
Definition 20.1. If A has full rank, then the solution x to the least squares problem is unique and is given by x = (A^T A)^{-1} A^T b. The matrix (A^T A)^{-1} A^T is known as the pseudoinverse of A, denoted by A†:
A† = (A^T A)^{-1} A^T.
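A small NumPy sketch (the matrix and vector are random stand-ins) confirming that the normal-equation solution matches a dedicated least-squares solver, and that (A^T A)^{-1} A^T agrees with the Moore–Penrose pseudoinverse when A has full rank:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))      # full column rank with probability 1
b = rng.normal(size=8)

# Solve the normal equation A^T A x = A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - Ax is orthogonal to the range of A: A^T (b - Ax) = 0.
assert np.allclose(A.T @ (b - A @ x), 0)

# Same answer as a dedicated least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)

# For full-rank A, (A^T A)^{-1} A^T is the pseudoinverse A-dagger.
A_dagger = np.linalg.solve(A.T @ A, A.T)
assert np.allclose(A_dagger, np.linalg.pinv(A))
```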
To solve the normal equation we use various factorizations of a matrix, such as the QR, SVD, or Cholesky factorization. First we discuss the Singular Value Decomposition (SVD). In the SVD of a matrix A ∈ Cm×n we seek a decomposition
A = UΣV^H,
where U ∈ Fm×m and V ∈ Fn×n are unitary matrices and Σ is a diagonal matrix. We can write the above equation as
AV = UΣ,
which says that we are looking for an orthonormal set of vectors in the row space (the columns of V) that are mapped to scalar multiples of another orthonormal set of vectors in the column space, or range, of A (the columns of U). The null space is accommodated by introducing zeros on the diagonal of Σ. So the bottom line is that in the SVD we seek an orthonormal basis of the row space and an orthonormal basis of the column space with respect to which the matrix is diagonal.
If a matrix A is symmetric positive definite then we can use the same orthonormal basis for both the row and the column space. In that case we can decompose the matrix as
A = QΛQ^H,
where Q is unitary and Λ is the diagonal matrix with the eigenvalues of A on the diagonal. In the general case we are not that lucky.
Fortunately, we can still exploit this fact about symmetric positive definite matrices. Since it is difficult to find the two orthonormal bases together, we try to eliminate one of them. To do this, compute
A^H A = V Σ^H U^H U Σ V^H = V Σ² V^H,
where Σ² is simply the diagonal matrix whose entries are the squares of the entries of Σ. So the columns of V are simply the eigenvectors of A^H A, and the diagonal entries of Σ² are the eigenvalues. That is, an eigenvalue decomposition of A^H A gives the orthonormal basis V, and the diagonal entries of Σ are the positive square roots of the eigenvalues of A^H A. Now consider
AA^H = U Σ² U^H,
and so we can find the orthonormal basis U using AA^H. If the rank of A is r, then we obtain an orthonormal basis v1, . . . , vr of the row space and an orthonormal basis u1, . . . , ur of the column space. We then complete these bases by attaching bases of the null spaces of A and of A^H, obtaining bases for the whole spaces and hence U and V completely. The result is known as the full SVD.
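The construction just described can be sketched numerically (assuming A has full column rank, so every σ is positive; in floating point one would normally call an SVD routine rather than form A^H A):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))      # real, full column rank with probability 1

# Eigen-decomposition of A^T A yields V and the squared singular values.
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]              # sort in descending order
sigma = np.sqrt(evals[order])
V = V[:, order]

# Recover the left singular vectors from A V = U Sigma.
U = (A @ V) / sigma

assert np.allclose(U.T @ U, np.eye(3))       # columns of U are orthonormal
assert np.allclose((U * sigma) @ V.T, A)     # reduced SVD: A = U Sigma V^T
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
```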
The existence and uniqueness of the SVD are given by the following theorem.
Theorem 20.2. Every matrix A ∈ Cm×n has a singular value decomposition. Further-
more, the singular values {σj} are uniquely determined, and, if A is square and the σj
are distinct, the left and right singular vectors {uj} and {vj} are uniquely determined up
to complex signs (i.e., complex scalar factors of absolute value 1).
Problem 20.1. Let A be a full column rank n × k matrix (so k ≤ n) and let b be a column vector of size n. We want to minimize the squared Euclidean norm L(x) = ‖Ax − b‖₂² with respect to x.
(a) Prove that, if rank(A) = k, then A^T A is invertible.
(b) Compute the gradient of L(x).
(c) Directly derive the normal equations by minimizing L(x), and then provide the closed-form expression for the x that minimizes L(x).
(d) We consider a QR factorization of A where Q is n × k and R is k × k. Show that an equivalent solution for x is x = R^{-1} Q^T b.
Solution.
(a) For the sake of contradiction, assume that A^T A is singular. Then there is x ≠ 0 such that A^T Ax = 0. Then x^T A^T Ax = 0, so ‖Ax‖² = 0, which implies, by the properties of the norm, Ax = 0. So x ∈ null A, which contradicts the fact that A has full rank, because for a full-rank matrix null A = {0}. We have proved that A^T Ax = 0 ⇒ x = 0; since A^T A is square, this means that A^T A is invertible.
(b) We can write L(x) as
L(x) = (Ax − b)^T(Ax − b) = (x^T A^T − b^T)(Ax − b) = x^T A^T Ax − x^T A^T b − b^T Ax + b^T b.
Since x^T A^T b is a scalar, x^T A^T b = (x^T A^T b)^T = b^T Ax, and hence
L(x) = x^T A^T Ax − 2b^T Ax + b^T b.
We use the following two propositions:
Proposition 1. Let the scalar α be defined by
α = y^T Ax,
where y is m × 1, x is n × 1, A is m × n, and A does not depend on x and y. Then
∂α/∂x = y^T A and ∂α/∂y = x^T A^T.
Proof. Define w^T = y^T A and note that α = w^T x. Hence
∂α/∂x = w^T = y^T A.
Since α is a scalar, we can write
α = α^T = x^T A^T y,
hence
∂α/∂y = x^T A^T.
Proposition 2. For the special case in which the scalar α is given by the quadratic form
α = x^T Ax,
where x is n × 1, A is n × n, and A does not depend on x, then
∂α/∂x = x^T(A + A^T).
Proof. By definition,
α = Σ_{j=1}^n Σ_{i=1}^n a_{ij} x_i x_j.
Differentiating with respect to the kth element of x we have
∂α/∂x_k = Σ_{j=1}^n a_{kj} x_j + Σ_{i=1}^n a_{ik} x_i
for all k = 1, . . . , n, and consequently
∂α/∂x = x^T A^T + x^T A = x^T(A^T + A).
Therefore,
∇L(x) = 2x^T A^T A − 2b^T A.
(c) Setting the gradient to zero gives x^T A^T A = b^T A, and transposing both sides yields the normal equation
A^T Ax = A^T b.
Since A^T A is invertible, the unique solution of the normal equations is
x = (A^T A)^{-1} A^T b.
(d) The QR factorization of A satisfies A = QR with Q^T Q = I. We claim that, since A has full rank, R is invertible: if not, there would exist some nonzero x with Rx = 0, which would imply Ax = QRx = 0, so dim null A > 0 and rank A < k, contradicting the fact that A has full rank. Since R is invertible, so is R^T, and hence
x = (R^T Q^T QR)^{-1} R^T Q^T b = (R^T R)^{-1} R^T Q^T b = R^{-1}(R^T)^{-1} R^T Q^T b = R^{-1} Q^T b.
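A brief NumPy sketch of part (d) (random A and b as stand-ins); solving via QR avoids forming A^T A, whose condition number is the square of that of A:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)

# Reduced QR factorization: A = QR, Q is 10x4 with Q^T Q = I, R is 4x4 upper triangular.
Q, R = np.linalg.qr(A)

# x = R^{-1} Q^T b solves the least-squares problem.
x_qr = np.linalg.solve(R, Q.T @ b)

# Agrees with the normal-equation solution x = (A^T A)^{-1} A^T b.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x_qr, x_ne)
```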
Problem 20.2. In this problem, R is the field of real numbers. Let (u1, . . . , um) be an orthonormal basis for a subspace W ≠ {0} of the vector space V = R^{n×1} (under the standard inner product), let U be the n × m matrix defined by U = [u1, . . . , um], and let P be the n × n matrix defined by P = UU^T.
(a) Prove that if v is any given member of V, then among all the vectors w in W, the one which minimizes ‖v − w‖ is given by
w = 〈v, u1〉u1 + · · ·+ 〈v, um〉um.
(This vector w is called the projection of v onto W.)
(b) Prove: for any vector x ∈ R^{n×1}, the projection w of x onto W is given by w = Px.
(c) Prove: P is a projection matrix. (Recall that a matrix P is called a projection
matrix if and only if P is symmetric and idempotent).
(d) If V = R3×1, and W = span[(1, 2, 2)T , (1, 0, 1)T ], find the projection matrix de-
scribed above and use it to find the projection of (2, 2, 2)T onto W .
Solution. (a)
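The written solution stops here, but part (d) can at least be verified numerically (a sketch; QR is used to orthonormalize the spanning set in place of hand Gram–Schmidt):

```python
import numpy as np

# Columns are the spanning vectors (1,2,2)^T and (1,0,1)^T of W.
A = np.array([[1.0, 1.0],
              [2.0, 0.0],
              [2.0, 1.0]])

# Orthonormal basis of W via reduced QR; P = U U^T is then the projection matrix.
U, _ = np.linalg.qr(A)
P = U @ U.T

assert np.allclose(P, P.T)       # P is symmetric
assert np.allclose(P @ P, P)     # P is idempotent

# Projection of (2,2,2)^T onto W.
x = np.array([2.0, 2.0, 2.0])
w = P @ x
assert np.allclose(w, np.array([14.0, 16.0, 22.0]) / 9)
```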
21. Matrix Differentiation
Proposition 3. Let
y = Ax,
where y is m × 1, x is n × 1, A is m × n, and A does not depend on x. Then
∂y/∂x = A.
Proof. Since the ith element of y is given by
y_i = Σ_{k=1}^n a_{ik} x_k,
it follows that
∂y_i/∂x_j = a_{ij}
for all i = 1, . . . ,m and j = 1, . . . , n. Hence
∂y/∂x = A.
Proposition 4. Let the scalar α be defined by
α = y^T Ax,
where y is m × 1, x is n × 1, A is m × n, and A does not depend on x and y. Then
∂α/∂x = y^T A and ∂α/∂y = x^T A^T.
Proof. Define w^T = y^T A and note that α = w^T x. Hence
∂α/∂x = w^T = y^T A.
Since α is a scalar, we can write
α = α^T = x^T A^T y,
hence
∂α/∂y = x^T A^T.
Proposition 5. For the special case in which the scalar α is given by the quadratic form
α = x^T Ax,
where x is n × 1, A is n × n, and A does not depend on x, then
∂α/∂x = x^T(A + A^T).
Proof. By definition,
α = Σ_{j=1}^n Σ_{i=1}^n a_{ij} x_i x_j.
Differentiating with respect to the kth element of x we have
∂α/∂x_k = Σ_{j=1}^n a_{kj} x_j + Σ_{i=1}^n a_{ik} x_i
for all k = 1, . . . , n, and consequently
∂α/∂x = x^T A^T + x^T A = x^T(A^T + A).
Proposition 6. For the special case where A is a symmetric matrix and
α = x^T Ax,
where x is n × 1, A is n × n, and A does not depend on x, then
∂α/∂x = 2x^T A.
Proposition 7. Let the scalar α be defined by
α = y^T x,
where y is n × 1, x is n × 1, and both y and x are functions of the vector z. Then
∂α/∂z = x^T (∂y/∂z) + y^T (∂x/∂z).
Proposition 8. Let the scalar α be defined by
α = x^T x,
where x is n × 1 and x is a function of the vector z. Then
∂α/∂z = 2x^T (∂x/∂z).
22. Miscellaneous
Problem 22.1. Show that if A = A^T with A a real matrix (that is, A is real symmetric, viewed over the field of complex numbers), then all the eigenvalues of A are real numbers.
Solution. Note that the eigenvalues of a real matrix may be complex. Let v be an eigenvector of A corresponding to the eigenvalue λ, normalized so that v∗v = 1. By v∗ we denote the conjugate transpose. Since A is symmetric and its entries are real, A∗ = A. Then
Av = λv and v∗A = λ̄v∗,
where the second equality is obtained by taking the conjugate transpose of the first and using A = A∗. Then
λ̄ = λ̄v∗v = (v∗A)v = v∗(Av) = v∗λv = λv∗v = λ.
Since λ̄ = λ, λ is real.
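A quick numerical confirmation (a random symmetric matrix as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(4, 4))
A = B + B.T                      # real symmetric

# Even the general (complex-capable) eigensolver returns eigenvalues
# with no imaginary part for a real symmetric matrix.
evals = np.linalg.eig(A)[0]
assert np.allclose(np.imag(evals), 0)
```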
Problem 22.2. Show that if BT = −B (i.e. B is skew symmetric), then all the eigen-
values of B are pure imaginary or zero. (B is a matrix with real coefficients.)
Solution. Let (λ, x) be an eigenpair of the skew-symmetric matrix B. Then
(22.1) Bx = λx.
If we multiply on the left by x^H, we get
(22.2) x^H Bx = λ x^H x.
Now we take the conjugate transpose of equation (22.1) and get x^H B^H = λ̄ x^H. Using the fact that B^H = B^T (since B is real) and B^T = −B (since B is skew-symmetric), this becomes x^H(−B) = λ̄ x^H; multiplying both sides by x on the right and rearranging gives
(22.3) x^H Bx = −λ̄ x^H x.
Since x is not zero, (22.2) and (22.3) imply that λ = −λ̄. Therefore λ is purely imaginary or zero.
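As with the symmetric case, this is easy to confirm numerically (a random skew-symmetric matrix as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(6)
C = rng.normal(size=(4, 4))
B = C - C.T                      # real skew-symmetric

# All eigenvalues are purely imaginary or zero: the real parts vanish.
evals = np.linalg.eig(B)[0]
assert np.allclose(np.real(evals), 0)
```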
Problem 22.3. Let A be a real matrix. A generalized inverse of a matrix A is any matrix
G such that AGA = A. Prove each of the following:
(a) If A is invertible, the unique generalized inverse of A is A−1.
(b) If G is a generalized inverse of (XTX), then
XGXTX = X.
(c) For any real symmetric matrix A, there exists a generalized inverse of A.
Solution. (a) Since
AA^{-1}A = IA = A,
A^{-1} is a generalized inverse of A. Now, if AGA = A, then
AG = AGAA^{-1} = AA^{-1} = I,
so G = A^{-1}, and hence the generalized inverse is unique.
(b) For an arbitrary vector v, we can write v = u + w, where u ∈ null X^T and w = Xλ lies in the column space of X (such a decomposition exists because the column space of X and null X^T are orthogonal complements). So X^T u = 0 ⇒ u^T X = 0. Then, using (X^T X)G(X^T X) = X^T X,
v^T XGX^T X = (u^T + λ^T X^T)XGX^T X = λ^T X^T XGX^T X = λ^T X^T X = w^T X = v^T X.
Since v is arbitrary, XGX^T X = X.
(c) Since A is real symmetric, it is diagonalizable: A = PΛP^T, where P is orthogonal and Λ is a real diagonal matrix with the eigenvalues λ1, . . . , λn on the diagonal. Let γ = (γ1, . . . , γn), where γ_i = 1/λ_i if λ_i ≠ 0 and γ_i = 0 if λ_i = 0. Let Γ be the diagonal matrix with γ along the diagonal, and let G = PΓP^T. Since P is orthogonal, P^T P = I. Thus
AGA = PΛP^T PΓP^T PΛP^T = PΛΓΛP^T = PΛP^T = A.
Thus G is a generalized inverse of A.
If A is not symmetric, we can use the SVD to find a generalized inverse, which here is also the pseudoinverse. Since every matrix has a singular value decomposition, let
A = UΣV^T,
where U and V have orthonormal columns, that is, V^T V = U^T U = I, and Σ = diag(σ1, . . . , σn). Let γ = (γ1, . . . , γn), where γ_i = 1/σ_i if σ_i ≠ 0 and γ_i = 0 if σ_i = 0. Let Γ be the diagonal matrix with γ along the diagonal, and define G = VΓU^T. Thus
AGA = UΣV^T VΓU^T UΣV^T = UΣΓΣV^T = UΣV^T = A.
Hence G is a generalized inverse of A.
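A sketch of this SVD construction (the rank-deficient matrix A is a random stand-in); the result coincides with NumPy's Moore–Penrose pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(7)
# A 4x3 matrix of rank 2, so A is genuinely non-invertible.
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# gamma_i = 1/sigma_i if sigma_i != 0 (here: above a round-off tolerance), else 0.
tol = 1e-10
gamma = np.array([1.0 / s if s > tol else 0.0 for s in sigma])

G = Vt.T @ np.diag(gamma) @ U.T   # G = V Gamma U^T

assert np.allclose(A @ G @ A, A)          # generalized-inverse property
assert np.allclose(G, np.linalg.pinv(A))  # matches the Moore-Penrose pseudoinverse
```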
23. List of Important Problems
Problem 1: Let F be a commutative field, let (V, +, ·) be a finite-dimensional vector space over F, and let U and W be two subspaces of V. Show that there exists a subspace S of V such that V = S ⊕ U and V = S ⊕ W if and only if dimU = dimW.
(1) rank(A+B) ≤ rankA + rankB.
(2) If a matrix A is symmetric, then it has real eigenvalues and the eigenvectors of A form a basis of the vector space; in other words, there is an orthonormal basis of V with respect to which A is diagonal. Since A has n linearly independent eigenvectors, there are no defective eigenvalues, that is, no generalized eigenvectors, and hence every factor in the minimal polynomial appears to the first power.