Introduction to Applied Linear Algebra
Definitions, Theorems and Problems
SUBRATA PAUL
FALL - 2014
Contents
1. Vector Space
2. Finite-Dimensional Vector Spaces
3. Exercise on Finite Dimensional Vector Space
4. Linear Map
5. Exercise on Linear Map
6. Eigenvalues
6.1. Some comments
7. Problems on Eigenvalues
8. Operations on Complex Vector Spaces
8.1. Decomposition of an Operator
8.2. Characteristic and Minimal Polynomials
8.3. Jordan Form
9. Trace and Determinant
9.1. Trace
9.2. Determinant
10. Inner Product Space
10.1. Proof of some theorems
11. Exercise of Chapter 6
12. Some important results
13. Operations on Inner Product Spaces
14. Problems on Operations in inner product spaces
15. Scrambled Ideas
15.1. On Square root of a matrix
15.2. On the Matrix Decompositions
16. On SPD
17. On Row space, Column space, rank and nullity
18. On Diagonalizability
19. On Orthogonal Projection
19.1. Projection in a Nutshell
20. On minimization problems
21. Matrix Differentiation
22. Miscellaneous
23. List of Important Problems
1. Vector Space
Definition 1.1 (Vector Space). A vector space is a set V along with an addition and a
scalar multiplication on V such that the following properties hold:
(1) commutativity: u+ v = v + u for all u, v ∈ V ;
(2) associativity: (u+ v) + w = u+ (v + w) and (ab)v = a(bv) for all u, v, w ∈ V and
all a, b ∈ F;
(3) additive identity: there exists an element 0 ∈ V such that v + 0 = v = 0 + v for
all v ∈ V .
(4) additive inverse: for every v ∈ V , there exists w ∈ V such that v + w = 0;
(5) multiplicative identity: 1v = v for all v ∈ V ;
(6) distributive properties: a(u+ v) = au+ av and (a+ b)u = au+ bu for all a, b ∈ F
and all u, v ∈ V .
Definition 1.2 (Subspace). A subset U of V is called a subspace of V if U is also a
vector space.
Theorem 1.1. A subset U of V is a subspace of V if and only if U satisfies the following
two conditions:
(1) U is nonempty (equivalently, 0 ∈ U);
(2) αu+ βv ∈ U for all u, v ∈ U and α, β ∈ F.
Definition 1.3 (Sum of subsets). Suppose U1, U2, . . . , Um are subsets of V . The sum
of U1, . . . , Um, denoted U1 + · · ·+ Um, is the set of all possible sums of elements of
U1, . . . , Um. More precisely,
U1 + · · ·+ Um = {u1 + · · ·+ um : u1 ∈ U1, . . . , um ∈ Um}.
Theorem 1.2. Suppose U1, . . . , Um are subspaces of V . Then U1 + U2 + · · ·+ Um
is the smallest subspace of V containing U1, . . . , Um.
Definition 1.4 (Direct Sum). Suppose U1, . . . , Um are subspaces of V .
The sum U1 + · · ·+ Um is called a direct sum if each element of U1 + · · ·+ Um
can be written in only one way as a sum u1 + · · ·+ um, where each uj ∈ Uj.
Theorem 1.3 (Condition for a direct sum). Suppose U1, . . . , Um are subspaces of V .
Then U1 + · · ·+ Um is a direct sum if and only if the only way to write 0 as a sum
u1 + · · ·+ um, where each uj ∈ Uj, is by taking each uj = 0.
Theorem 1.4. Suppose U and W are subspaces of V . Then U + W is a direct sum if
and only if U ∩W = {0}.
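For subspaces of R^n presented by bases, the criterion of Theorem 1.4 can be checked numerically. A minimal NumPy sketch (the helper name is_direct_sum and the example vectors are ours, not from the text): when the columns of U and W are bases of the two subspaces, the sum is direct exactly when the combined columns remain linearly independent.

```python
import numpy as np

def is_direct_sum(U, W):
    """U, W: matrices whose columns are bases of two subspaces of R^n.
    The sum is direct iff the combined columns stay linearly independent,
    which happens exactly when U ∩ W = {0}."""
    combined = np.hstack([U, W])
    return np.linalg.matrix_rank(combined) == U.shape[1] + W.shape[1]

U = np.array([[1.0], [0.0]])   # x-axis in R^2
W = np.array([[0.0], [1.0]])   # y-axis in R^2
print(is_direct_sum(U, W))     # True: R^2 = U ⊕ W

W2 = np.array([[2.0], [0.0]])  # the same line as U
print(is_direct_sum(U, W2))    # False: U ∩ W2 ≠ {0}
```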
2. Finite-Dimensional Vector Spaces
Definition 2.1 (Linear Combination). A linear combination of a list v1, . . . , vm of
vectors in V is a vector of the form
a1v1 + · · ·+ amvm
where a1, . . . , am ∈ F.
Definition 2.2 (Span). The set of all linear combinations of a list of vectors v1, . . . , vm
in V is called the span of v1, . . . , vm, denoted span(v1, . . . , vm). In other words
span(v1, . . . , vm) = {a1v1 + · · ·+ amvm : a1, . . . , am ∈ F}.
The span of the empty list () is defined to be {0}.
Theorem 2.1. The span of a list of vectors in V is the smallest subspace of V containing
all the vectors in the list.
Definition 2.3. A vector space is called finite-dimensional if some list of vectors in it
spans the space.
Definition 2.4 (Polynomial). A function p : F→ F is called a polynomial with coefficients
in F if there exist a0, . . . , am ∈ F such that p(z) = a0 + a1z + a2z^2 + · · ·+ amz^m for all
z ∈ F. The set of all polynomials with coefficients in F is denoted by P(F).
Definition 2.5. A polynomial p ∈ P(F) is said to have degree m if there exist scalars
a0, a1, . . . , am with am ≠ 0 such that
p(z) = a0 + a1z + · · ·+ amz^m
for all z ∈ F. The polynomial that is identically 0 is said to have degree −∞.
Definition 2.6. For m a nonnegative integer, Pm(F) denotes the set of all polynomials
with coefficients in F and degree at most m.
Definition 2.7. A vector space is called infinite-dimensional if it is not finite -dimensional.
Definition 2.8 (Linearly Independent). A list v1, . . . , vm of vectors in V is called
linearly independent if the only choice of a1, . . . , am ∈ F that makes a1v1 + · · ·+ amvm
equal to 0 is a1 = · · · = am = 0. The empty list ( ) is also declared to be linearly
independent.
Definition 2.9 (Linearly Dependent). A list of vectors in V is called linearly dependent if
it is not linearly independent. In other words, a list v1, . . . , vm of vectors in V is linearly
dependent if there exist a1, . . . , am ∈ F, not all 0, such that a1v1 + · · ·+ amvm = 0.
Lemma 1 (Linear Dependence Lemma). Suppose v1, . . . , vm is a linearly dependent
list in V . Then there exists j ∈ {1, 2, . . . ,m} such that the following hold:
(a) vj ∈ span(v1, . . . , vj−1);
(b) if the jth term is removed from v1, . . . , vm, the span of the remaining list equals
span(v1, . . . , vm).
Theorem 2.2. In a finite-dimensional vector space, the length of every linearly indepen-
dent list of vectors is less than or equal to the length of every spanning list of vectors.
Theorem 2.3. Every subspace of a finite-dimensional vector space is finite-dimensional.
Definition 2.10 (Basis). A basis of V is a list of vectors in V that is linearly independent
and spans V .
Theorem 2.4. A list v1, . . . , vn of vectors in V is a basis of V if and only if every v ∈ V
can be written uniquely in the form
v = a1v1 + · · ·+ anvn,
where a1, . . . , an ∈ F.
Theorem 2.5. Every spanning list in a vector space can be reduced to a basis of the
vector space.
Theorem 2.6. Every finite-dimensional vector space has a basis.
Theorem 2.7. Every linearly independent list of vectors in a finite-dimensional vector
space can be extended to a basis of the vector space.
Theorem 2.8. Suppose V is finite-dimensional and U is a subspace of V . Then there is
a subspace W of V such that V = U ⊕W .
Theorem 2.9. Any two bases of a finite-dimensional vector space have the same length.
Definition 2.11 (Dimension). The dimension of a finite-dimensional vector space is the
length of any basis of the vector space. The dimension of V is denoted by dimV .
Theorem 2.10. If V is finite-dimensional and U is a subspace of V , then dimU ≤ dimV.
Theorem 2.11. Suppose V is finite-dimensional. Then every linearly independent list
of vectors in V with length dimV is a basis of V .
Theorem 2.12. Suppose V is finite-dimensional. Then every spanning list of vectors in
V with length dimV is a basis of V .
Theorem 2.13 (Dimension of a sum). If U1 and U2 are subspaces of a finite-dimensional
vector space, then
dim(U1 + U2) = dimU1 + dimU2 − dim(U1 ∩ U2).
Remark 1. • One way to show that a sum is a direct sum is to set u1 + · · ·+um = 0
and show that it implies u1 = · · · = um = 0.
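The dimension formula of Theorem 2.13 can be illustrated numerically. A small NumPy sketch (the example subspaces are our choice): two coordinate planes in R^3 whose intersection is a line, with each dimension on the right-hand side obtained as a matrix rank.

```python
import numpy as np

# U = span{e1, e2} and W = span{e2, e3} in R^3; by construction U ∩ W = span{e2}.
U = np.eye(3)[:, :2]
W = np.eye(3)[:, 1:]

dim_u = np.linalg.matrix_rank(U)                    # 2
dim_w = np.linalg.matrix_rank(W)                    # 2
dim_sum = np.linalg.matrix_rank(np.hstack([U, W]))  # dim(U + W)
dim_int = 1                                         # dim(U ∩ W), known by construction

print(dim_sum == dim_u + dim_w - dim_int)           # True: 3 = 2 + 2 − 1
```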
3. Exercise on Finite Dimensional Vector Space
Problem 3.1 (2A.7). Prove or give a counterexample: If v1, v2, . . . , vm is a linearly
independent list of vectors in V , then
5v1 − 4v2, v2, v3, . . . , vm
is linearly independent.
Solution. Since the list v1, v2, . . . , vm is linearly independent, obviously v2, . . . , vm is
linearly independent. For the sake of contradiction assume that the list 5v1 − 4v2, v2, v3, . . . , vm
is not linearly independent; since v2, . . . , vm is independent, this forces 5v1 − 4v2 ∈ span(v2, . . . , vm).
Then there exist constants c2, c3, . . . , cm in the field such that
5v1 − 4v2 = c2v2 + · · ·+ cmvm
which implies
−5v1 + (c2 + 4)v2 + c3v3 + · · ·+ cmvm = 0,
a contradiction to the linear independence of v1, . . . , vm because
−5 ≠ 0.
Problem 3.2 (2A.8). Prove or give a counterexample: If v1, v2, . . . , vm is a linearly
independent list of vectors in V and λ ∈ F with λ ≠ 0, then λv1, λv2, . . . , λvm is
linearly independent.
Solution. Observe that
c1λv1 + c2λv2 + · · ·+ cmλvm = 0
implies, after dividing by λ ≠ 0,
c1v1 + c2v2 + · · ·+ cmvm = 0.
Since v1, v2, . . . , vm is linearly independent we have c1 = c2 = · · · = cm = 0.
Hence λv1, λv2, . . . , λvm is linearly independent.
Problem 3.3 (2A.9). Prove or give a counterexample: If v1, . . . , vm and w1, . . . , wm are
linearly independent lists of vectors in V , then v1 + w1, v2 + w2, . . . , vm + wm is linearly
independent.
Solution. Take the space R2 and the vectors as, v1 = (1, 0), v2 = (0, 1), w1 = (0, 1)
and w2 = (1, 0). It is obvious that (v1, v2) and (w1, w2) are linearly independent lists of
vectors. But v1 + w1 = (1, 1) = v2 + w2 and hence (v1 + w1, v2 + w2) are not linearly
independent.
Problem 3.4 (2A.10). Suppose v1, . . . , vm is linearly independent in V and w ∈ V .
Prove that if v1 + w, . . . , vm + w is linearly dependent, then w ∈ span(v1, . . . , vm).
Solution. Since v1 + w, . . . , vm + w is linearly dependent, there are scalars c1, c2, . . . , cm,
not all zero, such that
c1(v1 + w) + · · ·+ cm(vm + w) = 0.
Write c = c1 + c2 + · · ·+ cm, so that c1v1 + · · ·+ cmvm = −cw. If c = 0 this would be a
nontrivial dependence among v1, . . . , vm, contradicting their independence; hence c ≠ 0 and
w = (−c1/c)v1 + (−c2/c)v2 + · · ·+ (−cm/c)vm,
showing that w ∈ span(v1, . . . , vm).
Problem 3.5 (2A.11). Suppose v1, . . . , vm is linearly independent in V and w ∈ V . Show
that v1, . . . , vm, w is linearly independent if and only if w /∈ span(v1, . . . , vm).
Solution. If w ∈ span(v1, v2, . . . , vm) then v1, . . . , vm, w is not linearly independent;
by the contrapositive, if v1, . . . , vm, w is linearly independent then w /∈ span(v1, . . . , vm).
Conversely, since v1, . . . , vm is linearly independent, the only way the list v1, . . . , vm, w
can be linearly dependent is if w ∈ span(v1, . . . , vm). So by the contrapositive of this
argument, w /∈ span(v1, . . . , vm) implies v1, . . . , vm, w is linearly independent.
Problem 3.6 (2A.12). Explain why there does not exist a list of six polynomials that
is linearly independent in P4(F).
Solution. A polynomial in P4(F) has the form
p(x) = ax^4 + bx^3 + cx^2 + dx+ e,
which is uniquely determined by its coefficients, so we can identify it with the vector
(a, b, c, d, e) in F5. Since F5 has dimension 5, any list of six such vectors is linearly
dependent; hence no list of six polynomials in P4(F) is linearly independent.
Problem 3.7 (2A.13). Explain why no list of four polynomials spans P4(F).
Solution. The coefficients of a polynomial p ∈ P4(F) can be uniquely represented
as a vector in F5; this correspondence between polynomials and vectors in F5 is a
bijection. Since F5 cannot be spanned by four vectors, a list of four polynomials
cannot span P4(F).
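The coefficient-vector correspondence used in the last two solutions is easy to make concrete; a short NumPy sketch (random coefficients, seed arbitrary) represents each polynomial in P4(R) by its row of five coefficients and bounds the rank of four such rows.

```python
import numpy as np

# Identify p(x) = a + bx + cx^2 + dx^3 + ex^4 in P4(R) with (a, b, c, d, e) in R^5.
# Four polynomials give a 4×5 coefficient matrix whose rank is at most 4,
# so their span misses some direction of R^5 and cannot be all of P4(R).
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((4, 5))
print(np.linalg.matrix_rank(coeffs))  # at most 4 < 5 = dim P4(R)
```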
Problem 3.8 (2A.14). Prove that V is infinite-dimensional if and only if there is a
sequence v1, v2, . . . of vectors in V such that v1, . . . , vm is linearly independent for
every positive integer m.
Solution. First suppose such a sequence exists. If V were finite-dimensional with
dimension n, then any list of n+ 1 vectors would be linearly dependent, contradicting
the independence of v1, . . . , vn+1. Hence V is infinite-dimensional.
Conversely, suppose V is infinite-dimensional. Choose v1 ≠ 0. Given linearly independent
v1, . . . , vm, the span of v1, . . . , vm is not all of V (otherwise V would be finite-dimensional),
so we may choose vm+1 /∈ span(v1, . . . , vm); by Problem 3.5 the extended list is again
linearly independent. This produces the required sequence.
Problem 3.9 (2A.17). Suppose p0, p1, . . . , pm are polynomials in Pm(F) such that
pj(2) = 0 for each j. Prove that p0, p1, . . . , pm is not linearly independent in Pm(F).
Solution. Since pj(2) = 0, (x− 2) is a factor of pj for j = 0, 1, . . . ,m. Then,
pj(x) = (x− 2)qj(x)
where qj is a polynomial in Pm−1(F). But a list of (m+ 1) polynomials q0, . . . , qm
cannot be linearly independent in Pm−1(F), which has dimension m, so there are
coefficients c0, . . . , cm, not all zero, such that
c0q0 + · · ·+ cmqm = 0,
which implies (x− 2)[c0q0 + · · ·+ cmqm] = 0 and therefore
c0p0 + · · ·+ cmpm = 0
with cj ≠ 0 for at least one j ∈ {0, 1, . . . ,m}. Therefore p0, . . . , pm is not linearly independent.
Problem 3.10 (2B.8). Suppose U and W are subspaces of V such that V = U ⊕W .
Suppose also that u1, . . . , um is a basis of U and w1, w2, . . . , wn is a basis of W . Prove
that
u1, . . . , um, w1, . . . , wn
is a basis of V .
Solution. First we claim that u1, . . . , um, w1, . . . , wn spans V . Let v be an arbitrary
vector in V . Since V = U ⊕W there exists u ∈ U and w ∈ W such that v = u+w. But
since u1, . . . , um spans U and w1, . . . , wn spans W , we have,
v = u+ w = c1u1 + · · ·+ cmum + d1w1 + · · ·+ dnwn,
for some c1, . . . , cm, d1, . . . , dn ∈ F. Therefore, the list u1, . . . , um, w1, . . . , wn spans V .
Now we claim that u1, . . . , um, w1, . . . , wn is linearly independent. Let
a1u1 + · · ·+ amum + b1w1 + · · ·+ bnwn = 0,
which implies a1u1 + · · ·+ amum = −(b1w1 + · · ·+ bnwn). The left hand side belongs to
U and the right hand side belongs to W . Since V = U ⊕W , U ∩W = {0} and hence
both sides are equal to zero. The fact that u1, . . . , um and w1, . . . , wn are linearly
independent implies
a1 = · · · = am = b1 = · · · = bn = 0.
Hence the list u1, . . . , um, w1, . . . , wn is linearly independent.
Therefore we conclude that u1, . . . , um, w1, . . . , wn is a basis of V .
Problem 3.11 (2C.1). Suppose V is finite-dimensional and U is a subspace of V such
that dimU = dimV . Prove that U = V
Solution. Let dimU = dimV = m and let v1, . . . , vm be a basis of U . Then
v1, . . . , vm is linearly independent in V and therefore can be extended to a basis of V . But
since dimV = m the extension is trivial, and hence v1, . . . , vm is a basis for V . Therefore
U = V .
Problem 3.12 (2C.2). Show that the subspaces of R2 are precisely {0}, R2, and all lines
in R2 through the origin.
Solution. A subspace of R2 has dimension 0, 1, or 2. The only subspace of dimension
0 is {0} and the only subspace of dimension 2 is R2 itself. The subspaces of dimension 1
are the lines passing through the origin.
Problem 3.13 (2C.4). For the space P4(F) of polynomials of degree less than or equal
to 4,
(a) Let U = {p ∈ P4(F) : p(6) = 0}. Find a basis of U .
(b) Extend this basis in part (a) to a basis of P4(F).
(c) Find a subspace W of P4(F) such that P4(F) = U ⊕W .
Solution. (a) A basis for U is (x^4 − 6^4, x^3 − 6^3, x^2 − 6^2, x− 6); each of these
vanishes at 6, they are linearly independent since their degrees are distinct, and dimU = 4.
(b) Appending the constant polynomial 1, which does not vanish at 6, extends this to a
basis (x^4 − 6^4, x^3 − 6^3, x^2 − 6^2, x− 6, 1) of P4(F).
(c) W = span(1), the subspace of constant polynomials, satisfies P4(F) = U ⊕W .
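The basis claimed in part (a) can be sanity-checked numerically; a NumPy sketch (coefficient rows listed highest degree first, our own representation):

```python
import numpy as np

# Coefficient rows (highest degree first) for x^4 − 6^4, x^3 − 6^3, x^2 − 6^2, x − 6.
basis = np.array([
    [1, 0, 0, 0, -6**4],
    [0, 1, 0, 0, -6**3],
    [0, 0, 1, 0, -6**2],
    [0, 0, 0, 1, -6],
], dtype=float)

# Each basis polynomial vanishes at 6, so all four lie in U ...
print([float(np.polyval(p, 6)) for p in basis])  # [0.0, 0.0, 0.0, 0.0]
# ... and the coefficient rows are linearly independent.
print(np.linalg.matrix_rank(basis))              # 4
```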
Problem 3.14 (2C.5). P4(F) is the space of polynomials.
(a) Let U = {p ∈ P4(R) : p′′(6) = 0}. Find a basis for U .
(b) Extend the basis in part (a) to a basis of P4(R).
(c) Find a subspace W of P4(R) such that P4(R) = U ⊕W .
Problem 3.15 (2C.6). Let U = {p ∈ P4(F) : p(2) = p(5)}.
(a) Find a basis of U .
(b) Extend the basis of part (a) to a basis of P4(F).
(c) Find a subspace W of P4(F) such that P4(F) = U ⊕W .
Problem 3.16 (2C.7). Let U = {p ∈ P4(F) : p(2) = p(5) = p(6)}.
(a) Find a basis of U .
(b) Extend the basis of part (a) to a basis of P4(F).
(c) Find a subspace W of P4(F) such that P4(F) = U ⊕W .
Problem 3.17 (2C.8). Let U = {p ∈ P4(R) : ∫_{−1}^{1} p = 0}.
(a) Find a basis of U .
(b) Extend the basis of part (a) to a basis of P4(R).
(c) Find a subspace W of P4(R) such that P4(R) = U ⊕W .
Problem 3.18 (2C.9). Suppose v1, . . . , vm is linearly independent in V and w ∈ V .
Prove that
dim span(v1 + w, . . . , vm + w) ≥ m− 1
Solution. For j = 1, . . . ,m− 1, the vector (vj + w)− (vm + w) = vj − vm lies in
span(v1 + w, . . . , vm + w). The m− 1 vectors v1 − vm, . . . , vm−1 − vm are linearly
independent, since a dependence a1(v1 − vm) + · · ·+ am−1(vm−1 − vm) = 0 with some
aj ≠ 0 would give a nontrivial dependence among v1, . . . , vm. Therefore
dim span(v1 + w, . . . , vm + w) ≥ m− 1.
Problem 3.19 (2C.10). Suppose p0, . . . , pm ∈ P(F) are such that each pj has degree
j. Prove that p0, p1, . . . , pm is a basis of Pm(F).
Solution. Since each pj has degree j, pi is not a linear combination of p0, . . . , pi−1 for any
i = 1, . . . ,m. Therefore p0, . . . , pm is a list of m+ 1 linearly independent vectors in the
space Pm(F), which has dimension m+ 1, and hence p0, . . . , pm is a basis.
Problem 3.20 (2C.11). Suppose that U and W are subspaces of R8 such that dimU = 3,
dimW = 5, and U +W = R8. Prove that R8 = U ⊕W .
Solution.
dim(U ∩W ) = dimU + dimW − dim(U +W ) = 3 + 5− 8 = 0.
Therefore, U ∩W = {0} and hence R8 = U ⊕W .
Problem 3.21 (2C.12). Suppose U and W are both five-dimensional subspaces of R9.
Prove that U ∩W 6= {0}.
Solution. Since dim(U +W ) ≤ dimR9 = 9,
dim(U ∩W ) = dimU + dimW − dim(U +W ) ≥ 5 + 5− 9 = 1.
But dim{0} = 0, so U ∩W ≠ {0}.
Problem 3.22 (2C.13). Suppose U and W are both 4-dimensional subspaces of C6.
Prove that there exists two vectors in U ∩W such that neither of these vectors is a scalar
multiple of the other.
Solution. Since dim(U +W ) ≤ dimC6 = 6,
dim(U ∩W ) = dimU + dimW − dim(U +W ) ≥ 4 + 4− 6 = 2.
So U ∩W contains two linearly independent vectors, and neither of these is a scalar
multiple of the other.
Problem 3.23 (2C.14). Suppose U1, . . . , Um are finite-dimensional subspaces of V . Prove
that U1 + · · ·+ Um is finite dimensional and
dim(U1 + · · ·+ Um) ≤ dimU1 + · · ·+ dimUm.
Solution. The union of (finite) spanning lists of U1, . . . , Um spans U1 + · · ·+ Um, so the
sum is finite dimensional. The inequality then follows by induction from Theorem 2.13,
since dim(U1 + U2) = dimU1 + dimU2 − dim(U1 ∩ U2) ≤ dimU1 + dimU2.
Problem 3.24 (2C.15). Suppose V is finite-dimensional, with dimV = n ≥ 1. Prove
that there exist 1-dimensional subspaces U1, . . . , Un of V such that
V = U1 ⊕ U2 ⊕ · · · ⊕ Un.
Problem 3.25 (2C.16). Suppose U1, . . . , Um are finite-dimensional subspaces of V such
that U1 + · · ·+ Um is a direct sum. Prove that U1 ⊕ · · · ⊕ Um is finite dimensional and
dim(U1 ⊕ · · · ⊕ Um) = dimU1 + · · ·+ dimUm.
Problem 3.26. Let F be a commutative field, let (V,+, .) be a finite dimensional vector
space over F, let U and W be two subspace of V . Show that there exists S, a subspace
of V , such that V = S ⊕ U and V = S ⊕W if and only if dimU = dimW .
Solution. First let us assume that there exists S ⊂ V such that V = S ⊕ U and
V = S ⊕W . We want to show that dimU = dimW .
V = S ⊕ U ⇒ dimV = dimS + dimU
V = S ⊕W ⇒ dimV = dimS + dimW
The above two equations imply
dimU = dimW.
Now for the other direction we assume that dimU = dimW . We use the following
lemma:
Lemma 2. Let V be a finite dimensional vector space and U is a subspace of V then
there is a subspace W of V such that V = U ⊕W .
Proof. Let u1, . . . , um be a basis of U . We can extend it to a basis of V by adding
w1, . . . , wn−m, where n is the dimension of V . Let W = span(w1, . . . , wn−m). Then
V = U +W because any v ∈ V can be written
v = a1u1 + · · ·+ amum + am+1w1 + · · ·+ anwn−m
with a1, . . . , an ∈ F, where a1u1 + · · ·+ amum ∈ U and
am+1w1 + · · ·+ anwn−m ∈ W . Let x ∈ U ∩W ; then there are a1, . . . , am, b1, . . . , bn−m ∈ F
such that x = a1u1 + · · ·+ amum = b1w1 + · · ·+ bn−mwn−m, which implies
a1u1 + · · ·+ amum − b1w1 − · · · − bn−mwn−m = 0.
Since u1, . . . , um, w1, . . . , wn−m is linearly independent, a1 = · · · = am = b1 = · · · =
bn−m = 0 and hence x = 0. Therefore
V = U ⊕W.
Since U ∩W is a subspace of both U and W , by the above lemma there exist U ′ and W ′
such that
U = (U ∩W )⊕ U ′; W = (U ∩W )⊕W ′.
Note that U +W = (U ∩W )⊕U ′⊕W ′ ⊂ V . So, by the lemma there exists H ⊂ V such
that
V = (U +W )⊕H = (U ∩W )⊕ U ′ ⊕W ′ ⊕H.
Let dimU = dimW . Since U = (U ∩W )⊕ U ′ and W = (U ∩W )⊕W ′, the subspaces
U ′ and W ′ have the same dimension; call it p, and suppose u′1, . . . , u′p is a basis of U ′
and w′1, . . . , w′p is a basis of W ′. Define,
G = span(u′1 + w′1, . . . , u′p + w′p).
We require another lemma to continue.
Lemma 3. U, V,W,U ′,W ′ and G are as defined above. Then
(U ∩W )⊕ U ′ ⊕W ′ = (U ∩W )⊕ U ′ ⊕G
(U ∩W )⊕ U ′ ⊕W ′ = (U ∩W )⊕W ′ ⊕G
Proof. Let x ∈ (U ∩W )⊕ U ′ ⊕W ′. Then x = x1 + x2 + x3 where x1 ∈ U ∩W , x2 ∈ U ′
and x3 ∈ W ′. Therefore, there are a1, . . . , ap, b1, . . . , bp ∈ F such that
x2 = a1u′1 + · · ·+ apu′p,
x3 = b1w′1 + · · ·+ bpw′p.
So,
x = x1 + a1u′1 + · · ·+ apu′p + b1w′1 + · · ·+ bpw′p
  = x1 + (a1 − b1)u′1 + · · ·+ (ap − bp)u′p + b1(u′1 + w′1) + · · ·+ bp(u′p + w′p)
  = x1 + xU + xG
where xU ∈ U ′ and xG ∈ G. Hence (U ∩W )⊕ U ′ ⊕W ′ ⊆ (U ∩W ) + U ′ +G; conversely
each u′j + w′j ∈ U ′ ⊕W ′, so the two sums contain the same vectors.
It remains to show that the sum (U ∩W ) + U ′ +G is direct. Let v1, . . . , vd be a basis
for U ∩W and write
0 = xi + xU + xG
where xi = a1v1 + · · ·+ advd ∈ U ∩W , xU = b1u′1 + · · ·+ bpu′p ∈ U ′ and
xG = c1(u′1 + w′1) + · · ·+ cp(u′p + w′p) ∈ G. Note that v1, . . . , vd, u′1, . . . , u′p, w′1, . . . , w′p
is linearly independent. So,
0 = a1v1 + · · ·+ advd + b1u′1 + · · ·+ bpu′p + c1(u′1 + w′1) + · · ·+ cp(u′p + w′p)
  = a1v1 + · · ·+ advd + (b1 + c1)u′1 + · · ·+ (bp + cp)u′p + c1w′1 + · · ·+ cpw′p.
By this independence we get a1 = · · · = ad = c1 = · · · = cp = b1 + c1 = · · · = bp + cp = 0,
which implies
a1 = · · · = ad = c1 = · · · = cp = b1 = · · · = bp = 0.
Therefore, xi = xU = xG = 0 and hence
(U ∩W )⊕ U ′ ⊕W ′ = (U ∩W )⊕ U ′ ⊕G.
Similarly we can prove the other equation.
Using the above lemma we can write V in two ways:
V = ((U ∩W )⊕ U ′)⊕G⊕H = U ⊕G⊕H = U ⊕ S
V = ((U ∩W )⊕W ′)⊕G⊕H = W ⊕G⊕H = W ⊕ S
where S = G⊕H.
4. Linear Map
Definition 4.1. A linear map from V to W is a function T : V → W with the following
properties:
• Additivity:
T (u+ v) = Tu+ Tv, ∀u, v ∈ V ;
• Homogeneity:
T (λv) = λ(Tv), ∀λ ∈ F, ∀v ∈ V.
Definition 4.2. The set of all linear maps from V to W is denoted L(V,W ).
Definition 4.3. Suppose v1, . . . , vn is a basis of V and w1, . . . , wn ∈ W . Then
there exists a unique linear map T : V → W such that
Tvj = wj
for each j = 1, . . . , n.
Definition 4.4. Suppose S, T ∈ L(V,W ) and λ ∈ F. The sum S + T and the product
λT are the linear maps from V to W defined by
(S + T )(v) = Sv + Tv, and (λT )(v) = λ(Tv).
for all v ∈ V .
Definition 4.5. With the operations of addition and scalar multiplication as defined
above L(V,W ) is a vector space.
Definition 4.6. If T ∈ L(U, V ) and S ∈ L(V,W ), then the product ST ∈ L(U,W ) is
defined by
(ST )(u) = S(Tu)
for u ∈ U .
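In coordinates the product ST is just matrix multiplication; a quick NumPy check (the dimensions and seed below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((2, 3))  # S : R^3 → R^2
T = rng.standard_normal((3, 4))  # T : R^4 → R^3
u = rng.standard_normal(4)

# (ST)(u) = S(Tu): composing the maps agrees with multiplying the matrices.
print(np.allclose((S @ T) @ u, S @ (T @ u)))  # True
```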
Theorem 4.1 (Algebraic properties of products of linear maps).
• associativity
(T1T2)T3 = T1(T2T3);
• identity:
TI = IT = T ;
• distributive properties:
(S1 + S2)T = S1T + S2T, and S(T1 + T2) = ST1 + ST2.
Theorem 4.2. Suppose T is a linear map from V to W . Then T (0) = 0.
Definition 4.7 (Null Space). For T ∈ L(V,W ), the null space of T , denoted by nullT ,
is the subset of V consisting of those vectors that T maps to 0:
nullT = {v ∈ V : Tv = 0}.
Theorem 4.3. Suppose T ∈ L(V,W ). Then nullT is a subspace of V .
Definition 4.8. A function T : V → W is called injective if Tu = Tv implies u = v.
Definition 4.9. Let T ∈ L(V,W ). Then T is injective if and only if nullT = {0}.
Definition 4.10 (Range). For T a function from V to W , the range of T is a subset of
W consisting of those vectors that are of the form Tv for some v ∈ V ;
rangeT = {Tv : v ∈ V }
Theorem 4.4. If T ∈ L(V,W ), then rangeT is a subspace of W .
Definition 4.11. A function T : V → W is called surjective if its range equals W .
Theorem 4.5 (Fundamental Theorem of Linear Maps). Suppose V is finite-dimensional
and T ∈ L(V,W ). Then rangeT is finite-dimensional and
dimV = dim nullT + dim rangeT.
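The fundamental theorem can be spot-checked for a matrix map; a NumPy sketch (random matrix and seed chosen by us) that counts dim nullT directly from the singular values rather than from the formula itself:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))  # a linear map T : R^5 → R^3

rank = np.linalg.matrix_rank(A)  # dim range T
# dim null T, counted directly: input directions minus the number of
# nonzero singular values (the SVD returns min(3, 5) = 3 of them).
s = np.linalg.svd(A, compute_uv=False)
nullity = int((s < 1e-10).sum()) + (A.shape[1] - len(s))
print(rank + nullity == A.shape[1])  # True: dim V = dim null T + dim range T
```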
Theorem 4.6. Suppose V and W are finite-dimensional vector spaces such that dimV >
dimW . Then no linear map from V to W is injective.
Theorem 4.7. Suppose V and W are finite-dimensional vector spaces such that dimV <
dimW . Then no linear map from V to W is surjective.
Definition 4.12. A homogeneous system of linear equations with more variables than
equations has nonzero solutions.
Definition 4.13. An inhomogeneous system of linear equations with more equations
than variables has no solution for some choice of the constant terms.
5. Exercise on Linear Map
Problem 5.1 (3.A.4). Suppose T ∈ L(V,W ) and v1, . . . , vm is a list of vectors in V such
that Tv1, . . . , T vm is a linearly independent list in W . Prove that v1, . . . , vm is linearly
independent.
Solution. Let
c1v1 + · · ·+ cmvm = 0,
for some c1, . . . , cm ∈ F. Then,
c1Tv1 + · · ·+ cmTvm = T (c1v1 + · · ·+ cmvm) = T (0) = 0.
Since Tv1, . . . , T vm is linearly independent, c1 = · · · = cm = 0 and hence v1, . . . , vm is
linearly independent.
Problem 5.2 (3A.7). Show that every linear map from a 1-dimensional vector space to
itself is multiplication by some scalar.
Solution. Let v ∈ V ; then Tv ∈ V . Since any two vectors v, Tv in a 1-dimensional vector
space are linearly dependent, there exists λ, possibly dependent on v, such that Tv = λv.
Now we show that λ does not depend on v.
Let v1 ≠ 0, Tv1 = λ1v1 and Tv2 = λ2v2 with v2 ≠ 0. Since dimV = 1, the pair v1, v2 is
linearly dependent, so there exists β ∈ F such that v1 = βv2. Then,
Tv1 = T (βv2)⇒ λ1v1 = βTv2 = βλ2v2 = λ2v1 ⇒ (λ1 − λ2)v1 = 0.
But since v1 ≠ 0, λ1 = λ2 and hence Tv = λv for all v ∈ V with a single λ ∈ F.
Problem 5.3 (3A.8). Give an example of a function φ : R2 → R such that
φ(av) = aφ(v)
for all a ∈ R and all v ∈ R2 but φ is not linear.
Solution. Define φ(x, y) = (x^3 + y^3)^{1/3}. For every a ∈ R,
φ(a(x, y)) = (a^3x^3 + a^3y^3)^{1/3} = aφ(x, y), but φ((1, 0) + (0, 1)) = 2^{1/3} while
φ(1, 0) + φ(0, 1) = 2, so φ is not additive and hence not linear.
Problem 5.4 (Prelim & 3B20). Suppose that W is finite dimensional and T ∈ L(V,W ).
Prove that T is injective if and only if there exists S ∈ L(W,V ) such that ST is the
identity map on V .
Solution. Suppose T is injective. Since W is finite dimensional the subspace range(T )
of W is also finite dimensional. Let w1, . . . , wm be a basis of range(T ). Since w1, . . . , wm
belong to range(T ), there exist v1, . . . , vm in V such that w1 = Tv1, . . . , wm = Tvm.
Let
c1v1 + · · ·+ cmvm = 0
for some c1, . . . , cm ∈ F. Applying T we get,
c1Tv1 + · · ·+ cmTvm = c1w1 + · · ·+ cmwm = 0.
Since w1, . . . , wm is linearly independent, we have c1 = · · · = cm = 0 and hence, v1, . . . , vm
is linearly independent as well.
Now let x ∈ V . Since w1, . . . , wm spans range(T ), there exist scalars α1, . . . , αm
such that Tx = α1w1 + · · ·+ αmwm = α1Tv1 + · · ·+ αmTvm, which implies
T (x− α1v1 − · · · − αmvm) = 0.
Now T is injective, so we get that x− α1v1 − · · · − αmvm = 0 and hence,
x = α1v1 + · · ·+ αmvm.
So, x ∈ span(v1, . . . , vm). This proves that v1, . . . , vm spans V .
We conclude that v1, . . . , vm is a basis of V .
Let us extend the linearly independent list w1, . . . , wm to a basis w1, . . . , wn of W by
adding wm+1, . . . , wn. Now define S : W → V such that
Sw1 = v1, . . . , Swm = vm, Swm+1 = · · · = Swn = 0.
It is clear that S ∈ L(W,V ) and that ST is the identity map on V .
Now suppose that there exists S ∈ L(W,V ) such that ST is the identity map on V . Let
x, y ∈ V with Tx = Ty. Applying S gives x = STx = STy = y, so T is injective.
Problem 5.5. Suppose V is finite-dimensional and T ∈ L(V,W ). Prove that T is
surjective if and only if there exists S ∈ L(W,V ) such that TS is the identity map on W .
Solution. Since V is finite dimensional and T is surjective, dimW ≤ dimV and hence
W is also finite dimensional. Let w1, . . . , wm be a basis of W . Since T is surjective there
exist v1, . . . , vm ∈ V such that
Tv1 = w1, . . . , T vm = wm.
Define S ∈ L(W,V ) by
Sw1 = v1, . . . , Swm = vm;
then clearly TS = IW . Now assume that there exists S ∈ L(W,V ) such that TS is the
identity map on W . Then for any w ∈ W , TSw = w, so for any w ∈ W there is
Sw ∈ V with T (Sw) = w. Therefore, T is surjective.
Problem 5.6. Suppose U and V are finite-dimensional vector spaces and S ∈ L(V,W )
and T ∈ L(U, V ). Prove that
dim nullST ≤ dim nullS + dim nullT.
Solution. Let T ′ be the restriction of T to nullST . Since U is finite dimensional,
nullST ⊆ U is also finite dimensional and so,
dim nullST = dim nullT ′ + dim rangeT ′.
We want to show two things:
• nullT ′ ⊆ nullT
• rangeT ′ ⊆ nullS
For the first claim let u ∈ nullT ′. Then Tu = T ′u = 0 and hence
u ∈ nullT . Therefore, nullT ′ ⊆ nullT .
For the second, rangeT ′ = {Tu : u ∈ nullST}. Let u ∈ nullST be arbitrary. Then
S(T ′u) = S(Tu) = 0 and hence T ′u ∈ nullS. Therefore rangeT ′ ⊆ nullS. Combining,
dim nullST ≤ dim nullT + dim nullS.
Problem 5.7. Suppose U and V are finite-dimensional vector spaces and S ∈ L(V,W )
and T ∈ L(U, V ). Prove that
dim rangeST ≤ min{dim rangeS, dim rangeT}.
Solution. Since U is finite dimensional, T ∈ L(U, V ) and ST ∈ L(U,W ), the funda-
mental theorem of linear maps implies,
dim nullST + dim rangeST = dimU = dim nullT + dim rangeT
which implies that,
dim rangeST = dim rangeT + dim nullT − dim nullST.
Let u ∈ nullT . Then Tu = 0 which implies STu = 0 and hence u ∈ nullST . Therefore
nullT ⊂ nullST . So, dim nullT − dim nullST ≤ 0. Hence,
dim rangeST ≤ dim rangeT.
Let w ∈ rangeST . Then there is u ∈ U such that w = STu. Which implies for any
w ∈ rangeST there is Tu ∈ V such that S(Tu) = w and hence w ∈ rangeS. Therefore,
rangeST ⊂ rangeS which implies,
dim rangeST ≤ dim rangeS.
Therefore,
dim rangeST ≤ min{dim rangeS, dim rangeT}.
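Both bounds proved above can be spot-checked for matrices; a NumPy sketch (the rank-2 map T below is our arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.standard_normal((4, 6))  # S : R^6 → R^4, full rank 4 with probability 1
T = np.zeros((6, 5))             # T : R^5 → R^6 built to have rank 2
T[:2, :2] = np.eye(2)

r_st = np.linalg.matrix_rank(S @ T)
r_s = np.linalg.matrix_rank(S)
r_t = np.linalg.matrix_rank(T)
print(r_t, r_st <= min(r_s, r_t))  # rank T = 2 bounds rank ST
```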
6. Eigenvalues
Definition 6.1. Suppose T ∈ L(V ). A subspace U of V is called invariant under T if
u ∈ U implies Tu ∈ U .
Definition 6.2. Suppose T ∈ L(V ). A number λ ∈ F is called an eigenvalue of T if
there exists v ∈ V such that v 6= 0 and Tv = λv.
Definition 6.3. Let λ be an eigenvalue of an operator T ∈ L(V ). The set of all eigen-
vectors corresponding to λ along with the zero vector is a subspace, called the eigenspace
Eλ of T corresponding to eigenvalue λ.
Theorem 6.1. Suppose V is finite-dimensional, T ∈ L(V ), and λ ∈ F. Then the following
are equivalent:
(a) λ is an eigenvalue of T ;
(b) T − λI is not injective;
(c) T − λI is not surjective;
(d) T − λI is not invertible.
Definition 6.4. Suppose T ∈ L(V ) and λ ∈ F is an eigenvalue of T . A vector v ∈ V is
called an eigenvector of T corresponding to λ if v 6= 0 and Tv = λv.
Theorem 6.2. Let T ∈ L(V ). Suppose λ1, . . . , λm are distinct eigenvalues of T and
v1, . . . , vm are corresponding eigenvectors. Then v1, . . . , vm is linearly independent.
Theorem 6.3. Suppose V is finite-dimensional. Then each operator on V has at most
dimV distinct eigenvalues.
Definition 6.5. Suppose T ∈ L(V ) and U is a subspace of V invariant under T .
• The restriction operator T |U ∈ L(U) is defined by
T |U(u) = Tu
for u ∈ U .
• The quotient operator T/U ∈ L(V/U) is defined by
(T/U)(v + U) = Tv + U
for v ∈ V .
Theorem 6.4 (Multiplicative properties of Matrix polynomial). Suppose p, q ∈ P(F)
and T ∈ L(V ). Then
(a) (pq)(T ) = p(T )q(T )
(b) p(T )q(T ) = q(T )p(T )
Theorem 6.5. Every square complex matrix has an eigenvalue.
or,
Every operator on a finite-dimensional, nonzero, complex vector space has an eigenvalue.
Theorem 6.6. Suppose T ∈ L(V ) and v1, . . . , vn is a basis of V . Then the following are
equivalent:
(a) the matrix of T with respect to v1, . . . , vn is upper triangular;
(b) Tvj ∈ span(v1, . . . , vj) for each j = 1, . . . , n;
(c) span(v1, . . . , vj) is invariant under T for each j = 1, . . . , n.
Theorem 6.7. Suppose V is a finite-dimensional complex vector space and T ∈ L(V ).
Then T has an upper-triangular matrix with respect to some basis of V .
or
Let A ∈ Mn×n(C). Then there exist an invertible matrix V ∈ Mn×n(C) and an upper-
triangular matrix T such that
A = V TV^{−1}.
Definition 6.6. Two matrices A and B are called similar if there exists an invertible
matrix V such that A = V BV^{−1}. Intuitively, similar matrices represent the same
operator in two different bases.
Theorem 6.8. If A and B are similar then ΛA = ΛB where ΛA is the spectrum of A
which is the set of all eigenvalues of A.
Proof. Let (λ, v) be an eigenpair of A, so that Av = λv and v ≠ 0. Since A is similar to
B there is an invertible matrix V such that A = V BV^{−1} and therefore,
V BV^{−1}v = λv ⇒ B(V^{−1}v) = λ(V^{−1}v)
with V^{−1}v ≠ 0, showing that λ is an eigenvalue of B. By symmetry every eigenvalue
of B is also an eigenvalue of A, so ΛA = ΛB.
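Theorem 6.8 can be spot-checked numerically; a NumPy sketch (random matrices with a fixed seed, our own example) compares the sorted spectra of B and V BV^{−1}:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
V = rng.standard_normal((4, 4))  # invertible with probability 1
A = V @ B @ np.linalg.inv(V)     # A is similar to B

# Similar matrices have the same spectrum, up to ordering and round-off.
eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_B = np.sort_complex(np.linalg.eigvals(B))
print(np.allclose(eig_A, eig_B, atol=1e-8))  # True
```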
Theorem 6.9. Suppose T ∈ L(V ) has an upper-triangular matrix with respect to some
basis of V . Then the eigenvalues of T are precisely the entries on the diagonal of that
upper-triangular matrix.
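Theorem 6.9 is easy to illustrate; a minimal NumPy sketch (the matrix entries are arbitrary):

```python
import numpy as np

T = np.array([[2.0, 7.0, 1.0],
              [0.0, 5.0, 3.0],
              [0.0, 0.0, 2.0]])  # upper triangular

# The eigenvalues are exactly the diagonal entries, with multiplicity.
print(np.sort(np.linalg.eigvals(T).real))  # [2. 2. 5.]
```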
Theorem 6.10 (Conditions equivalent to diagonalizability). Suppose V is finite-dimensional
and T ∈ L(V ). Let λ1, . . . , λm denote the distinct eigenvalues of T . Then the following
are equivalent:
(a) T is diagonalizable.
(b) V has a basis consisting of eigenvectors of T ;
(c) there exist 1-dimensional subspaces U1, . . . , Un of V , each invariant under T ,
such that
V = U1 ⊕ · · · ⊕ Un;
(d) V = E(λ1, T )⊕ · · · ⊕ E(λm, T );
(e) dimV = dimE(λ1, T ) + · · ·+ dimE(λm, T ).
6.1. Some comments
• If U is an invariant subspace of V under T ∈ L(V ) with dimU = 1, then each
nonzero vector of U is an eigenvector of T . To show this, let u ∈ U be nonzero; then Tu ∈ U,
but U is one-dimensional, so Tu = λu for some λ ∈ F and hence u is an
eigenvector.
7. Problems on Eigenvalues
Problem 7.1 (5A.33). Suppose T ∈ L(V ). Prove that T/(rangeT ) = 0.
Solution. Let v + rangeT ∈ V/(rangeT ). Then
T/(rangeT )(v + rangeT ) = Tv + rangeT = rangeT,
since Tv ∈ rangeT . So T/(rangeT ) sends every element of V/(rangeT ) to the zero element of the quotient, that is, T/(rangeT ) = 0.
Problem 7.2 (5A.34). Suppose T ∈ L(V ). Prove that T/(nullT ) is injective if and only
if (nullT ) ∩ (rangeT ) = {0}.
Solution. Let v ∈ (nullT ) ∩ (rangeT ). Then Tv = 0 and there is u ∈ V such that
Tu = v.
T/(nullT )(u+ nullT ) = Tu+ nullT = v + nullT = nullT
Therefore, u + nullT ∈ null(T/(nullT )). Since T/(nullT ) is injective, u + nullT = nullT, so
u ∈ nullT and hence v = Tu = 0. Therefore (nullT ) ∩ (rangeT ) = {0}.
Now assume that (nullT ) ∩ (rangeT ) = {0}. Let v + nullT ∈ null(T/(nullT )). Then,
T/(nullT )(v + nullT ) = nullT ⇒ Tv + nullT = nullT ⇒ Tv ∈ nullT.
But Tv ∈ rangeT and so Tv ∈ (nullT ) ∩ (rangeT ), which implies Tv = 0. Hence
v ∈ nullT , that is, v + nullT = nullT, and therefore T/(nullT ) is injective.
Problem 7.3 (5A.25). Suppose T ∈ L(V ) and u, v are eigenvectors of T such that u+ v
is also an eigenvector of T . Prove that u and v are eigenvectors of T corresponding to
the same eigenvalue.
Solution. Let λ1, λ2, λ3 be the eigenvalues corresponding to the eigenvectors u, v, u + v
respectively. Then,
Tu = λ1u, Tv = λ2v, T (u+ v) = λ3(u+ v).
If u, v are linearly dependent, then there is c ∈ F such that v = cu. In this case,
λ2v = Tv = T (cu) = cTu = cλ1u = λ1v ⇒ (λ2 − λ1)v = 0⇒ λ1 = λ2.
Now suppose u, v are linearly independent. In this case,
λ1u+ λ2v = Tu+ Tv = T (u+ v) = λ3(u+ v)⇒ (λ1 − λ3)u+ (λ2 − λ3)v = 0.
But linear independence of u, v implies λ1 − λ3 = 0 = λ2 − λ3 and hence λ1 = λ2.
Problem 7.4 (5A.26). Suppose T ∈ L(V ) is such that every nonzero vector in V is an
eigenvector of T . Prove that T is a scalar multiple of the identity operator.
Solution. Let u, v ∈ V be any two vectors. Since u, v are eigenvectors and also u + v
is an eigenvector of T , by the previous problem u and v correspond to the same eigenvalue.
This is true for any pair, and hence we can conclude that all nonzero vectors in V correspond
to the same eigenvalue, say λ. Then,
(T − λI)v = 0, ∀v ∈ V ⇒ T − λI = 0⇒ T = λI.
Problem 7.5. Let V be a complex vector space. Let S and T be two operators on V
such that TS = ST .
(a) Prove that if λ is an eigenvalue of S then the eigenspace of S associated with the
eigenvalue λ, Eλ, is invariant under T .
(b) Prove that S and T have (at least) one common eigenvector.
Solution. Part a: Let v ∈ Eλ. Then Sv = λv.
S(Tv) = STv = T (Sv) = Tλv = λTv.
Therefore, Tv ∈ Eλ and hence Eλ is invariant under T .
Part b: Restrict T to the invariant subspace Eλ. Eλ is a finite-dimensional, nonzero
complex vector space and T |Eλ is an operator on it. Hence by the existence theorem for
eigenvalues, there is a vector v ∈ Eλ which is an eigenvector of T . Since v ∈ Eλ, v is also an
eigenvector of S.
8. Operations on Complex Vector Spaces
Theorem 8.1. Suppose T ∈ L(V ). Then
{0} = nullT^0 ⊆ nullT^1 ⊆ · · · ⊆ nullT^k ⊆ nullT^{k+1} ⊆ · · ·
Theorem 8.2. Suppose T ∈ L(V ). Suppose m is a nonnegative integer such that
nullT^m = nullT^{m+1}. Then
nullT^m = nullT^{m+1} = nullT^{m+2} = · · ·
Theorem 8.3. Suppose T ∈ L(V ). Let n = dimV . Then
nullT^n = nullT^{n+1} = nullT^{n+2} = · · ·
Theorem 8.4. Suppose T ∈ L(V ). Let n = dimV . Then
V = nullT^n ⊕ rangeT^n
Definition 8.1. Suppose T ∈ L(V ) and λ is an eigenvalue of T . A vector v ∈ V is called
a generalized eigenvector of T corresponding to λ if v ≠ 0 and
(T − λI)^j v = 0
for some positive integer j.
Definition 8.2. Suppose T ∈ L(V ) and λ ∈ F. The generalized eigenspace of T corre-
sponding to λ, denoted G(λ, T ), is defined to be the set of all generalized eigenvectors of
T corresponding to λ, along with the 0 vector.
Definition 8.3. Suppose T ∈ L(V ) and λ ∈ F. Then G(λ, T ) = null(T − λI)^{dimV}.
Theorem 8.5. Let T ∈ L(V ). Suppose λ1, . . . , λm are distinct eigenvalues of T and
v1, . . . , vm are corresponding generalized eigenvectors. Then v1, . . . , vm are linearly
independent.
Definition 8.4. An operator is called nilpotent if some power of it equals 0.
Remark 2. If N is nilpotent then there is j such that N^j = 0, and therefore (N − 0I)^j v = 0
for all v ∈ V . Hence G(0, N ) = null(N − 0I)^j = V .
Theorem 8.6. Suppose N ∈ L(V ) is nilpotent. Then N^{dimV} = 0.
Remark 3. Suppose N is a nilpotent operator on V . Then there is a basis of V with
respect to which the matrix of N has the form
( 0   ∗ )
(   ⋱   )
( 0   0 )
that is, all entries on and below the diagonal are 0's.
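A minimal numpy illustration of Theorem 8.6 and Remark 3 (the particular strictly upper-triangular N is an arbitrary example, not from the text):

```python
import numpy as np

n = 5
# zeros on and below the diagonal, as in Remark 3
N = np.triu(np.arange(1.0, 1.0 + n * n).reshape(n, n), k=1)

assert np.allclose(np.linalg.matrix_power(N, n), 0)          # N^(dim V) = 0
assert not np.allclose(np.linalg.matrix_power(N, n - 1), 0)  # smaller powers need not vanish
```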
8.1. Decomposition of an Operator
Theorem 8.7. Suppose T ∈ L(V ) and p ∈ P(F). Then null p(T ) and range p(T ) are
invariant under T.
Remark 4. So G(λ, T ), which is the null space of (T − λI)^{dimV}, is invariant under T .
Theorem 8.8. Suppose V is a complex vector space and T ∈ L(V ). Let λ1, . . . , λm be
the distinct eigenvalues of T . Then
(a) V = G(λ1, T )⊕ · · · ⊕G(λm, T );
(b) each G(λj, T ) is invariant under T ;
(c) each (T − λjI)|G(λj ,T ) is nilpotent.
Theorem 8.9. Suppose V is a complex vector space and T ∈ L(V ). Then there is a
basis of V consisting of generalized eigenvectors of T .
Definition 8.5. Multiplicity
• Suppose T ∈ L(V ). The multiplicity of an eigenvalue λ of T is defined to be the
dimension of the corresponding generalized eigenspace G(λ, T ).
• In other words, the multiplicity of an eigenvalue λ of T equals dim null(T − λI)^{dimV}.
8.2. Characteristic and Minimal Polynomials
Definition 8.6 (Characteristic Polynomial). Suppose V is a complex vector space and
T ∈ L(V ). Let λ1, . . . , λm denote the distinct eigenvalues of T , with multiplicities
d1, . . . , dm. The polynomial
(z − λ1)^{d1} · · · (z − λm)^{dm}
is called the characteristic polynomial of T .
Theorem 8.10. Suppose V is a complex vector space and T ∈ L(V ). Then
• the characteristic polynomial of T has degree dimV .
• the zeros of the characteristic polynomial of T are the eigenvalues of T .
Theorem 8.11 (Cayley-Hamilton Theorem). Suppose V is a complex vector space and
T ∈ L(V ). Let q denote the characteristic polynomial of T . Then q(T ) = 0.
Definition 8.7. Suppose T ∈ L(V ). Then there is a unique monic polynomial p of
smallest degree such that p(T ) = 0. The polynomial p is called the minimal polynomial.
Theorem 8.12. Suppose T ∈ L(V ) and q ∈ P(F). Then q(T ) = 0 if and only if q is
a polynomial multiple of the minimal polynomial of T . In particular, the characteristic
polynomial is a multiple of the minimal polynomial of T .
Theorem 8.13. Let T ∈ L(V ). Then the zeros of the minimal polynomial of T are
precisely the eigenvalues of T .
8.3. Jordan Form
Theorem 8.14. Suppose N ∈ L(V ) is nilpotent. Then there exist vectors v1, . . . , vn ∈ V
and nonnegative integers m1, . . . ,mn such that
• N^{m1}v1, . . . , Nv1, v1, . . . , N^{mn}vn, . . . , Nvn, vn is a basis of V ;
• N^{m1+1}v1 = · · · = N^{mn+1}vn = 0.
9. Trace and Determinant
9.1. Trace
Definition 9.1. Suppose T ∈ L(V ).
• If F = C, then the trace of T is the sum of the eigenvalues of T , with each
eigenvalue repeated according to its multiplicity.
• If F = R, then the trace of T is the sum of the eigenvalues of TC , with each
eigenvalue repeated according to its multiplicity.
• The trace of a square matrix A is the sum of the diagonal entries of A.
Theorem 9.1.
traceAB = traceBA
Proof. Using the diagonal-entries definition:
traceAB = ∑_{j=1}^{n} ∑_{k=1}^{n} Aj,kBk,j = ∑_{k=1}^{n} ∑_{j=1}^{n} Bk,jAj,k = traceBA.
Using the sum-of-eigenvalues definition: First we show that every eigenvalue of AB is also an
eigenvalue of BA. Let λ be an eigenvalue of AB. There is v ≠ 0 such that
ABv = λv ⇒ BA(Bv) = λ(Bv).
If Bv ≠ 0, then λ is an eigenvalue of BA. If Bv = 0, then ABv = 0, so λ = 0 (because we
assumed v ≠ 0), and 0 is an eigenvalue of AB. In that case,
det(BA) = det(B) det(A) = det(A) det(B) = det(AB) = 0,
which implies 0 is an eigenvalue of BA as well. By symmetry, AB and BA have the same
eigenvalues (in fact the same characteristic polynomial, so the multiplicities agree). So traceAB = traceBA.
Theorem 9.2. For any square matrix T and invertible matrix V of the same size,
traceT = trace(V TV −1).
Proof.
trace(V TV −1) = trace(TV V −1) = trace(TI) = traceT
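The trace identities above can be checked numerically (numpy sketch; the random matrices are arbitrary examples, and a generic random V is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
V = rng.standard_normal((4, 4))      # generic, hence invertible

assert np.isclose(np.trace(A @ B), np.trace(B @ A))                 # trace AB = trace BA
assert np.isclose(np.trace(V @ A @ np.linalg.inv(V)), np.trace(A))  # similarity invariance
assert np.isclose(np.linalg.eigvals(A).sum().real, np.trace(A))     # trace = sum of eigenvalues
```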
9.2. Determinant
Definition 9.2. Suppose T ∈ L(V )
• If F = C, then the determinant of T is the product of the eigenvalues of T , with
each eigenvalue repeated according to its multiplicity.
• If F = R, then the determinant of T is the product of the eigenvalues of TC , with
each eigenvalue repeated according to its multiplicity.
Remark 5. If T ∈ L(V ) with F = R, then det(T ) ∈ R.
Theorem 9.3.
detT = (−1)^n pT (0)
where pT is the characteristic polynomial and n = dimV .
Theorem 9.4. An operator on V is invertible if and only if its determinant is nonzero.
Theorem 9.5.
pT (z) = det(zI − T )
Proof. The eigenvalues of zI − T are z − λi, where the λi are the eigenvalues of T (with
multiplicity). Hence
pT (z) = ∏ (z − λi) = det(zI − T ).
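Theorems 9.3 and 9.5 can be verified numerically; numpy's `poly` returns the coefficients of det(zI − T) (a sketch on an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
T = rng.standard_normal((n, n))

p = np.poly(T)            # coefficients of det(zI - T), highest degree first
assert len(p) == n + 1    # the characteristic polynomial has degree dim V

# the zeros of the characteristic polynomial are the eigenvalues of T
assert np.allclose(np.polyval(p, np.linalg.eigvals(T)), 0.0, atol=1e-6)

# det T = (-1)^n * p_T(0)
assert np.isclose((-1) ** n * np.polyval(p, 0.0), np.linalg.det(T))
```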
10. Inner Product Space
Definition 10.1. An inner product on V is a function that takes each ordered pair (u, v)
of elements of V to a number 〈u, v〉 ∈ F and has the following properties:
• positivity: 〈v, v〉 ≥ 0 for all v ∈ V ;
• definiteness: 〈v, v〉 = 0 if and only if v = 0;
• additivity in first slot:
〈u+ v, w〉 = 〈u,w〉+ 〈v, w〉, ∀u, v, w ∈ V ;
• homogeneity in first slot:
〈λu, v〉 = λ〈u, v〉, ∀λ ∈ F, ∀u, v ∈ V ;
• conjugate symmetry:
〈u, v〉 equals the complex conjugate of 〈v, u〉, for all u, v ∈ V.
Definition 10.2. The Euclidean inner product on Fn is defined by
〈(w1, . . . , wn), (z1, . . . , zn)〉 = w1z̄1 + · · ·+ wnz̄n.
Example 1. An inner product can be defined on the vector space of continuous real-
valued functions on the interval [−1, 1] by
〈f, g〉 = ∫_{−1}^{1} f(x)g(x) dx.
Theorem 10.1. Properties of Inner product:
(1) For a fixed u ∈ V , the function φ : V → F defined by φ(v) = 〈v, u〉 is a linear
map from V to F.
(2) 〈0, v〉 = 0 for all v ∈ V .
(3) 〈u, 0〉 = 0 for all u ∈ V .
(4) 〈u, v + w〉 = 〈u, v〉+ 〈u,w〉 for all u, v, w ∈ V .
(5) 〈u, λv〉 = λ̄〈u, v〉 for all λ ∈ F and u, v ∈ V .
Definition 10.3. For v ∈ V , the norm of v , denoted by ‖v‖, is defined by
‖v‖ =√〈v, v〉.
Theorem 10.2 (Basic properties of norm). Suppose v ∈ V . Then
• ‖v‖ = 0 if and only if v = 0;
• ‖λv‖ = |λ|‖v‖ for all λ ∈ F.
Definition 10.4. Two vectors u, v ∈ V are called orthogonal if 〈u, v〉 = 0.
Theorem 10.3 (Pythagorean Theorem). Suppose u and v are orthogonal vectors in V .
Then
‖u+ v‖2 = ‖u‖2 + ‖v‖2
Theorem 10.4 (Orthogonal Decomposition). Suppose u, v ∈ V , with v ≠ 0. Set c = 〈u, v〉/‖v‖2
and w = u− cv. Then
〈w, v〉 = 0 and u = cv + w.
Theorem 10.5 (Cauchy-Schwarz inequality). Suppose u, v ∈ V . Then
|〈u, v〉| ≤ ‖u‖‖v‖.
The inequality is an equality if and only if one of u, v is a scalar multiple of the other.
Proof. (This argument is written for a real inner product space; the complex case is handled in §10.1.) By the properties of the inner product,
0 ≤ 〈‖u‖v − ‖v‖u, ‖u‖v − ‖v‖u〉
= ‖u‖2〈v, v〉 − 2‖u‖‖v‖〈u, v〉+ ‖v‖2〈u, u〉
= 2‖u‖2‖v‖2 − 2‖u‖‖v‖〈u, v〉.
Therefore,
2‖u‖‖v‖〈u, v〉 ≤ 2‖u‖2‖v‖2 ⇒ 〈u, v〉 ≤ ‖u‖‖v‖.
Applying the same reasoning to −u in place of u gives −〈u, v〉 ≤ ‖u‖‖v‖, and together these give the Cauchy-Schwarz inequality.
Theorem 10.6. Suppose u, v ∈ V . Then
‖u+ v‖ ≤ ‖u‖+ ‖v‖.
This inequality is an equality if and only if one of u, v is a nonnegative multiple of the
other.
Theorem 10.7 (Parallelogram Equality). Suppose u, v ∈ V . Then
‖u+ v‖2 + ‖u− v‖2 = 2(‖u‖2 + ‖v‖2).
Definition 10.5. A list e1, . . . , em of vectors in V is orthonormal if
〈ej, ek〉 = 1 if j = k, and 〈ej, ek〉 = 0 if j ≠ k.
Theorem 10.8. If e1, . . . , em is an orthonormal list of vectors in V , then
‖a1e1 + · · ·+ amem‖2 = |a1|2 + · · ·+ |am|2
for all a1, . . . , am ∈ F.
Theorem 10.9. Every orthonormal list of vectors is linearly independent.
Theorem 10.10. Suppose e1, . . . , en is an orthonormal basis of V and v ∈ V . Then
v = 〈v, e1〉e1 + · · ·+ 〈v, en〉en
and
‖v‖2 = |〈v, e1〉|2 + · · ·+ |〈v, en〉|2.
10.1. Proof of some theorems
Theorem 10.11 (Pythagorean Theorem). Suppose u and v are orthogonal vectors in
V . Then
‖u+ v‖2 = ‖u‖2 + ‖v‖2
Proof. We have
‖u+ v‖2 = 〈u+ v, u+ v〉
= 〈u, u〉+ 〈u, v〉+ 〈v, u〉+ 〈v, v〉
= ‖u‖2 + ‖v‖2,
since 〈u, v〉 = 〈v, u〉 = 0 by orthogonality.
Theorem 10.12 (Cauchy-Schwarz Inequality). Suppose u, v ∈ V . Then,
|〈u, v〉| ≤ ‖u‖‖v‖.
The inequality is an equality if and only if one of u , v is a scalar multiple of the other.
Proof. If v = 0, then both sides of the desired inequality equal 0. Thus we assume that
v 6= 0. Consider the orthogonal decomposition
u = (〈u, v〉/‖v‖2)v + w,
where w is orthogonal to v. By the Pythagorean theorem,
‖u‖2 = ‖(〈u, v〉/‖v‖2)v‖2 + ‖w‖2
= |〈u, v〉|2/‖v‖2 + ‖w‖2
≥ |〈u, v〉|2/‖v‖2.
Multiplying both sides of this inequality by ‖v‖2 and then taking square roots gives the
desired inequality. Equality holds if and only if ‖u‖2 = |〈u, v〉|2/‖v‖2, which happens if and
only if w = 0. But w = 0 if and only if u is a scalar multiple of v.
Proof (Alternative). (This argument is for a real inner product space.) By the properties of the inner product,
0 ≤ 〈‖u‖v − ‖v‖u, ‖u‖v − ‖v‖u〉
= ‖u‖2〈v, v〉 − 2‖u‖‖v‖〈u, v〉+ ‖v‖2〈u, u〉
= 2‖u‖2‖v‖2 − 2‖u‖‖v‖〈u, v〉.
Therefore,
〈u, v〉 ≤ ‖u‖‖v‖.
Applying the same reasoning to −u in place of u gives −〈u, v〉 ≤ ‖u‖‖v‖, and together these give the Cauchy-Schwarz inequality.
Theorem 10.13 (Triangle Inequality). Suppose u, v ∈ V . Then,
‖u+ v‖ ≤ ‖u‖+ ‖v‖.
This inequality is an equality if and only if one of u , v is a nonnegative multiple of the
other.
Proof. We have
‖u+ v‖2 = 〈u+ v, u+ v〉
= 〈u, u〉+ 〈v, u〉+ 〈u, v〉+ 〈v, v〉
= ‖u‖2 + ‖v‖2 + 2Re〈u, v〉
≤ ‖u‖2 + ‖v‖2 + 2|〈u, v〉|
≤ ‖u‖2 + ‖v‖2 + 2‖u‖‖v‖
= (‖u‖+ ‖v‖)2.
Taking square roots of both sides of the inequality above gives the desired inequality.
Theorem 10.14 (Parallelogram Equality). Suppose u, v ∈ V . Then
‖u+ v‖2 + ‖u− v‖2 = 2(‖u‖2 + ‖v‖2).
Proof. We have,
‖u+ v‖2 + ‖u− v‖2 =〈u+ v, u+ v〉+ 〈u− v, u− v〉
=‖u‖2 + ‖v‖2 + 〈u, v〉+ 〈v, u〉+ ‖u‖2 + ‖v‖2 − 〈u, v〉 − 〈v, u〉
=2(‖u‖2 + ‖v‖2).
Problem 10.1. Suppose V is a real inner product space. Then for all u, v ∈ V ,
〈u, v〉 = (‖u+ v‖2 − ‖u− v‖2)/4.
Problem 10.2. Suppose V is a complex inner product space. Then, for all u, v ∈ V ,
〈u, v〉 = (‖u+ v‖2 − ‖u− v‖2 + i‖u+ iv‖2 − i‖u− iv‖2)/4.
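The complex polarization identity of Problem 10.2 can be checked numerically (numpy sketch; the vectors are arbitrary complex examples):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

def inner(a, b):
    # Euclidean inner product, linear in the first slot
    return np.dot(a, b.conj())

def norm2(a):
    # squared norm <a, a>
    return inner(a, a).real

rhs = (norm2(u + v) - norm2(u - v)
       + 1j * norm2(u + 1j * v) - 1j * norm2(u - 1j * v)) / 4
assert np.isclose(inner(u, v), rhs)
```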
11. Exercise of Chapter 6
Problem 11.1 (6A.5). Suppose T ∈ L(V ) is such that ‖Tv‖ ≤ ‖v‖ for every v ∈ V .
Prove that T − √2 I is invertible.
Solution. Let v ∈ null(T − √2 I). Then,
(T − √2 I)v = 0
⇒ Tv = √2 v
⇒ ‖Tv‖ = √2 ‖v‖
⇒ ‖v‖ ≥ √2 ‖v‖
⇒ (√2 − 1)‖v‖ ≤ 0
⇒ ‖v‖ = 0 ⇒ v = 0.
Hence null(T − √2 I) = {0}, so T − √2 I is injective and therefore, being an operator on a
finite-dimensional space, invertible.
Problem 11.2. Suppose u, v ∈ V and ‖u‖ = ‖v‖ = 1 and 〈u, v〉 = 1. Prove that u = v.
Solution.
‖u− v‖2 = 〈u− v, u− v〉
= 〈u, u〉 − 〈u, v〉 − 〈v, u〉+ 〈v, v〉
= 1− 1− 1 + 1
= 0.
Therefore u− v = 0, which implies u = v.
12. Some important results
Problem 12.1. If A is of full column rank then ATA is invertible.
Solution. Let A be an m × n matrix with rankA = n. Note that ATA is a square
matrix of dimension n × n. The matrix ATA is invertible if and only if nullATA = {0}.
Let v ∈ nullATA. Then,
ATAv = 0⇒ vTATAv = 0⇒ (Av)TAv = 0⇒ ‖Av‖2 = 0⇒ ‖Av‖ = 0⇒ Av = 0.
Therefore, v ∈ nullA. Since A is of full column rank, nullA = {0} and hence v = 0.
Hence,
nullATA = {0}.
Therefore, ATA is invertible. This is the same as saying that ATA has rank n, or that ATA is positive
definite. If A is not known to be of full column rank, then ATA is still positive semidefinite.
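A numerical illustration of this result (numpy sketch; the 7×3 matrix is an arbitrary example, full column rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 3))             # generically of full column rank

G = A.T @ A                                 # the 3x3 Gram matrix
assert np.linalg.matrix_rank(A) == 3
assert np.linalg.matrix_rank(G) == 3        # A^T A is invertible
assert np.all(np.linalg.eigvalsh(G) > 0)    # in fact positive definite

# with a rank-deficient A, A^T A is only positive semidefinite
B = np.hstack([A[:, :1], A[:, :1], A[:, 1:2]])   # a repeated column drops the rank to 2
assert np.linalg.matrix_rank(B.T @ B) == 2
```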
13. Operations on Inner Product Spaces
Definition 13.1. Suppose T ∈ L(V,W ). The adjoint of T is the function T ∗ : W → V
such that
〈Tv, w〉 = 〈v, T ∗w〉
for every v ∈ V and every w ∈ W .
Remark 6. The Riesz representation theorem states that:
Suppose V is finite-dimensional and φ is a linear functional on V . Then there is a
unique vector u ∈ V such that
φ(v) = 〈v, u〉 for all v ∈ V.
Now fix w ∈ W and consider the linear functional v ↦ 〈Tv, w〉 on V . By the Riesz
representation theorem, there is a unique vector u ∈ V such that
〈Tv, w〉 = 〈v, u〉 for all v ∈ V ;
we define T ∗w to be this unique vector u, which gives 〈Tv, w〉 = 〈v, T ∗w〉.
Theorem 13.1. Properties of adjoint:
• (S + T )∗ = S∗ + T ∗
• (λT )∗ = λ̄T ∗
• (T ∗)∗ = T
• I∗ = I
• (ST )∗ = T ∗S∗
Theorem 13.2. Suppose T ∈ L(V,W ). Then,
(a) nullT ∗ = (rangeT )⊥
(b) rangeT ∗ = (nullT )⊥
(c) nullT = (rangeT ∗)⊥
(d) rangeT = (nullT ∗)⊥
Proof. First we prove (a). Let w ∈ W . Then,
w ∈ nullT ∗ ⇔ T ∗w = 0
⇔ 〈v, T ∗w〉 = 0, ∀v ∈ V
⇔ 〈Tv, w〉 = 0, ∀v ∈ V
⇔ w ∈ (rangeT )⊥.
Part (c) is (a) applied to T ∗, using (T ∗)∗ = T ; parts (b) and (d) follow by taking orthogonal
complements in (c) and (a) respectively, since (U⊥)⊥ = U in finite dimensions.
Definition 13.2 (Self-Adjoint). An operator T ∈ L(V ) is called self-adjoint if T = T ∗.
In other words, T ∈ L(V ) is self adjoint if and only if
〈Tv, w〉 = 〈v, Tw〉
for all v, w ∈ V .
Theorem 13.3. Every eigenvalue of a self-adjoint operator is real.
Theorem 13.4. Suppose V is a complex inner product space and T ∈ L(V ). Suppose
〈Tv, v〉 = 0
for all v ∈ V . Then T = 0.
Remark 7. This theorem is not true in a real space. The following counterexample shows
this. Let
T =
( 0   1 )
(−1   0 )
≠ 0.
Take v = (x, y) and consider the Euclidean inner product.
〈Tv, v〉 = 〈(y,−x), (x, y)〉 = xy − xy = 0
Theorem 13.5. Suppose T is a self-adjoint operator on V such that
〈Tv, v〉 = 0
for all v ∈ V . Then T = 0.
Proof. For a complex space this is true by the previous theorem. Now suppose V is a real
inner product space. First we prove that if 〈Tv, v〉 = 0 for all v ∈ V , then T ∗ = −T . For all v, w ∈ V ,
〈T (v + w), v + w〉 = 0
⇒ 〈v + w, T ∗(v + w)〉 = 0
⇒ 〈v + w, T ∗v + T ∗w〉 = 0
⇒ 〈v, T ∗v〉+ 〈v, T ∗w〉+ 〈w, T ∗v〉+ 〈w, T ∗w〉 = 0
⇒ 〈v, T ∗w〉+ 〈w, T ∗v〉 = 0
⇒ 〈v, T ∗w〉 = −〈w, T ∗v〉 = −〈Tw, v〉 = 〈v,−Tw〉.
Since this holds for all v, w ∈ V , we get T ∗ = −T . Therefore, for a self-adjoint operator,
T = T ∗ = −T ⇒ T = 0.
Theorem 13.6. Suppose V is a complex inner product space and T ∈ L(V ). Then T is
self-adjoint if and only if 〈Tv, v〉 ∈ R for every v ∈ V .
Definition 13.3.
T ∈ L(V ) is normal if TT ∗ = T ∗T
Theorem 13.7. An operator T ∈ L(V ) is normal if and only if
‖Tv‖ = ‖T ∗v‖
for all v ∈ V .
Remark 8. The above theorem implies that nullT = nullT ∗ if T is normal. Using this
we can also prove that rangeT = rangeT ∗ as,
rangeT = (nullT ∗)⊥ = (nullT )⊥ = rangeT ∗
Theorem 13.8. Suppose T ∈ L(V ) is normal and v ∈ V is an eigenvector of T with
eigenvalue λ. Then v is also an eigenvector of T ∗ with eigenvalue λ.
Theorem 13.9. Suppose T ∈ L(V ) is normal. Then eigenvectors of T corresponding to
distinct eigenvalues are orthogonal.
Theorem 13.10 (Complex Spectral Theorem). Suppose F = C and T ∈ L(V ). Then
the following are equivalent:
(1) T is normal
(2) V has an orthonormal basis consisting of eigenvectors of T .
(3) T has a diagonal matrix with respect to some orthonormal basis of V .
Theorem 13.11 (Real Spectral Theorem). Suppose F = R and T ∈ L(V ). Then the
following are equivalent:
(1) T is self-adjoint
(2) V has an orthonormal basis consisting of eigenvectors of T .
(3) T has a diagonal matrix with respect to some orthonormal basis of V .
Theorem 13.12. Suppose T ∈ L(V ) is self-adjoint and U is a subspace of V that is invariant
under T . Then
(1) U⊥ is invariant under T .
(2) T |U ∈ L(U) is self-adjoint;
(3) T |U⊥ ∈ L(U⊥) is self-adjoint.
Remark 9. About Normal Operator
(1) rangeT = rangeT ∗
(2) In a complex inner product space every normal operator has a square root: by the
complex spectral theorem T = UDU∗ with U unitary and D diagonal, so take S = UD^{1/2}U∗ for any choice of square roots of the diagonal entries.
Remark 10. About self-adjoint operators:
• A projection is orthogonal if and only if it is self-adjoint.
• The eigenvalues are real.
Here we state some important results.
Lemma 4. If 〈Ax, x〉 = 0 for all x ∈ V over the complex field, then A = 0.
Proof.
0 = 〈A(x+ y), x+ y〉 = 〈Ax, x〉+ 〈Ax, y〉+ 〈Ay, x〉+ 〈Ay, y〉 = 〈Ax, y〉+ 〈Ay, x〉.
Now use x+ iy in place of x+ y to get,
0 = −i〈Ax, y〉+ i〈Ay, x〉 ⇒ 〈Ax, y〉 − 〈Ay, x〉 = 0.
Hence,
〈Ax, y〉 = 0, ∀x, y ∈ V.
Now use y = Ax to get
〈Ax,Ax〉 = ‖Ax‖2 = 0
which implies Ax = 0 and is true for all x ∈ V and hence A = 0.
Lemma 5. A is normal if and only if ‖Ax‖ = ‖A∗x‖ for all x.
Proof. If A is normal, then AA∗ = A∗A and hence (A∗A − AA∗)x = 0 for every x, which gives the
following:
0 = 〈(A∗A−AA∗)x, x〉 = 〈A∗Ax, x〉−〈AA∗x, x〉 = 〈Ax,Ax〉−〈A∗x,A∗x〉 = ‖Ax‖2−‖A∗x‖2.
Conversely, if ‖Ax‖ = ‖A∗x‖ for all x, the same identity gives 〈(A∗A − AA∗)x, x〉 = 0 for
all x; since A∗A − AA∗ is self-adjoint, it follows that A∗A − AA∗ = 0.
Lemma 6. If A is normal then nullA = nullA∗.
Proof.
x ∈ nullA⇔ Ax = 0⇔ ‖Ax‖ = 0⇔ ‖A∗x‖ = 0⇔ A∗x = 0⇔ x ∈ nullA∗
Hence
nullA = nullA∗
Lemma 7. For any operator A on a finite-dimensional complex inner product space,
(rangeA)⊥ = nullA∗.
Proof. Let w ∈ rangeA and v ∈ nullA∗. Then there is u ∈ V such that w = Au, and
A∗v = 0. Now
〈v, w〉 = 〈v,Au〉 = 〈A∗v, u〉 = 〈0, u〉 = 0.
Since w and v were arbitrary, nullA∗ ⊆ (rangeA)⊥. Conversely, if v ∈ (rangeA)⊥, then
〈A∗v, u〉 = 〈v,Au〉 = 0 for all u ∈ V , so A∗v = 0. Hence the two subspaces are equal.
Lemma 8. For any projection P ,
nullP = range(I − P ).
Proof. Let v ∈ nullP . Then Pv = 0, which implies (I − P )v = v, and hence v ∈ range(I − P ).
Now let v ∈ range(I − P ). Then there is u such that (I − P )u = v and so
Pv = P (I − P )u = Pu− P 2u = Pu− Pu = 0
Hence
nullP = range(I − P ).
Theorem 13.13. If P is normal and a projection then P is self-adjoint, and hence an
orthogonal projection.
Proof. Suppose P is normal, that is P ∗P = PP ∗, and P is a projection, that is P 2 = P . Then
(rangeP )⊥ = nullP ∗ = nullP = range(I − P ).
In particular x− Px ∈ range(I − P ) is orthogonal to Px ∈ rangeP . So,
〈x, (P − P ∗P )x〉 = 〈x, (I − P ∗)Px〉 = 〈x, (I − P )∗Px〉 = 〈(I − P )x, Px〉 = 0.
Therefore,
P − P ∗P = 0⇒ P = P ∗P
Taking adjoint
P ∗ = P ∗P.
Hence P ∗ = P , that is, P is self-adjoint. Therefore the projection is an orthogonal
projection.
14. Problems on Operations in inner product spaces
Problem 14.1 (7A.4). Suppose T ∈ L(V,W ). Prove that
(1) T is injective if and only if T ∗ is surjective.
(2) T is surjective if and only if T ∗ is injective.
Solution.
T is injective ⇔ nullT = {0} ⇔ (nullT )⊥ = V ⇔ rangeT ∗ = V ⇔ T ∗ is surjective.
T is surjective ⇔ rangeT = V ⇔ (rangeT )⊥ = {0} ⇔ nullT ∗ = {0} ⇔ T ∗ is injective
Problem 14.2. Suppose P ∈ L(V ) is such that P 2 = P .
Prove that P is an orthogonal projection if and only if P is self-adjoint.
Solution. First we suppose that P is an orthogonal projection. Thus there is a subspace
U of V such that P = PU . Suppose v1, v2 ∈ V . Write
v1 = u1 + w1, v2 = u2 + w2,
where u1, u2 ∈ U and w1, w2 ∈ U⊥. Now,
〈Pv1, v2〉 = 〈u1, u2 + w2〉
= 〈u1, u2〉+ 〈u1, w2〉
= 〈u1, u2〉
= 〈u1, u2〉+ 〈w1, u2〉
= 〈u1 + w1, u2〉
= 〈v1, Pv2〉.
Thus P = P ∗, and hence P is self-adjoint. To prove the implication in the other direction,
now suppose that P is self-adjoint. Let v ∈ V . Because P (v − Pv) = Pv − P 2v = 0, we
have
v − Pv ∈ nullP = (rangeP ∗)⊥ = (rangeP )⊥.
Writing
v = Pv + (v − Pv),
we have Pv ∈ rangeP and (v − Pv) ∈ (rangeP )⊥. Thus
Pv = PrangeP v.
Because this holds for all v ∈ V , we have P = PrangeP , which shows that P is an
orthogonal projection.
Problem 14.3. Prove that if T ∈ L(V ) is normal, then
nullT k = nullT and rangeT k = rangeT
for every positive integer k.
Solution. Suppose T ∈ L(V ) is normal and that k is a positive integer. Obviously we
can assume that k ≥ 2. It is obvious that nullT ⊆ nullT k. We only have to prove that
nullT k ⊆ nullT . Assume that v ∈ nullT k. Then
〈T ∗T k−1v, T ∗T k−1v〉 =〈TT ∗T k−1v, T k−1v〉
=〈T ∗T kv, T k−1v〉
=〈0, T k−1v〉
=0
Therefore, T ∗T k−1v = 0. Thus,
〈T k−1v, T k−1v〉 = 〈T ∗T k−1v, T k−2v〉 = 〈0, T k−2v〉 = 0,
which implies that T k−1v = 0. In other words, v ∈ nullT k−1. The same argument, with
k replaced with k − 1, shows that v ∈ nullT k−2. Continuing this process, we reach
the conclusion that v ∈ nullT . Hence nullT k ⊆ nullT , and therefore nullT = nullT k.
To show rangeT = rangeT k, note that T k = T (T k−1) and so rangeT k ⊆ rangeT . Also,
dim rangeT k = dimV − dim nullT k = dimV − dim nullT = dim rangeT.
Hence rangeT k = rangeT , because one is a subset of the other and they have the same dimension.
Problem 14.4. Prove that a normal operator on a complex inner-product space is self-
adjoint if and only if all its eigenvalues are real.
Solution. If T is self-adjoint, all its eigenvalues are real. Conversely, suppose that all
the eigenvalues of T are real. By the complex spectral theorem, since T is normal, T =
V DV H where D is diagonal with the eigenvalues on the diagonal and V is unitary.
Therefore, T ∗ = V DHV H = V DV H = T , because DH = D (D is diagonal with real entries).
Problem 14.5. Suppose V is a complex inner-product space and T ∈ L(V ) is a normal
operator such that T 9 = T 8. Prove that T is self adjoint and T 2 = T .
Solution. By the complex spectral theorem, there is an orthonormal basis (e1, . . . , en) of
V consisting of eigenvectors of T . Let λ1, . . . , λn be the corresponding eigenvalues. Thus
Tej = λjej
for j = 1, . . . , n.
λj^9 ej = T^9 ej = T^8 ej = λj^8 ej ⇒ λj^9 = λj^8
which implies that λj equals 0 or 1. In particular, all the eigenvalues of T are real. This
implies that T is self-adjoint. Also,
T^2 ej = λj^2 ej = λj ej = T ej,
where the second equality holds because λj = 0 or 1. Because T^2 and T agree on a basis,
they must be equal.
Problem 14.6. Suppose T ∈ L(V ) is self-adjoint, λ ∈ F , and ε > 0. Prove that if there
exists v ∈ V such that ‖v‖ = 1 and
‖Tv − λv‖ < ε,
then T has an eigenvalue λ′ such that |λ− λ′| < ε
Solution. By the spectral theorem there is an orthonormal basis (e1, . . . , en) of V con-
sisting of eigenvectors of T . Let λ1, . . . , λn be the corresponding eigenvalues. Suppose
v ∈ V is such that ‖v‖ = 1 and ‖Tv − λv‖ < ε. Then we have
v = 〈v, e1〉e1 + · · ·+ 〈v, en〉en
and so
Tv = λ1〈v, e1〉e1 + · · ·+ λn〈v, en〉en.
Then,
ε2 > ‖Tv − λv‖2
= ‖(λ1 − λ)〈v, e1〉e1 + · · ·+ (λn − λ)〈v, en〉en‖2
= |λ1 − λ|2|〈v, e1〉|2 + · · ·+ |λn − λ|2|〈v, en〉|2
≥ min{|λ1 − λ|2, . . . , |λn − λ|2}(|〈v, e1〉|2 + · · ·+ |〈v, en〉|2)
= min{|λ1 − λ|2, . . . , |λn − λ|2},
using ‖v‖ = 1 in the last step. Thus ε > |λj − λ| for some j, and λ′ = λj works.
Problem 14.7. Let V be an inner-product space with inner product 〈·, ·〉V . Let W be
an inner product space with inner product 〈·, ·〉W . Let T ∈ L(V,W ). Define the adjoint
T ∗ of T . Under what sufficient condition does the adjoint exist? Under what sufficient
condition is the adjoint unique?
Solution. Let T ∈ L(V,W ). We define T ∗, the adjoint of T , as the function
T ∗ : W → V, w ↦ T ∗w,
where T ∗w is the vector such that ∀v ∈ V ,
〈Tv, w〉W = 〈v, T ∗w〉V .
T ∗ is linear because for all w1, w2 ∈ W and α, β ∈ F,
〈v, T ∗(αw1 + βw2)〉V = 〈Tv, αw1 + βw2〉W = ᾱ〈Tv, w1〉W + β̄〈Tv, w2〉W
= ᾱ〈v, T ∗w1〉V + β̄〈v, T ∗w2〉V = 〈v, αT ∗w1 + βT ∗w2〉V
for all v ∈ V , and hence T ∗(αw1 + βw2) = αT ∗w1 + βT ∗w2.
T ∗ exists when V is finite-dimensional, by the Riesz representation theorem, and whenever
it exists it is unique.
Problem 14.8. Let V be a complex inner-product space. We consider T in L(V ). Give
necessary and sufficient condition on V and D for the following statements.
(1) T is self-adjoint if and only if there exist V and D such that T = V DV H and D
is diagonal and . . .
(2) T is normal if and only if there exists V and D such that T = V DV H and D
diagonal and ...
(3) T is an isometry if and only if there exist V and D such that T = V DV H and D
is diagonal and ...
(4) T is positive if and only if there exist V and D such that T = V DV H and D is
diagonal and ...
Solution. (1) V is unitary and the eigenvalues are real, that is, V HV = V V H = I
and ∀i = 1, . . . , n, dii ∈ R.
(2) V is unitary, that is, V HV = V V H = I.
(3) V is unitary and the eigenvalues have modulus 1, that is, V HV = V V H = I and
∀i = 1, . . . , n, |dii| = 1.
(4) V is unitary and the eigenvalues are real and nonnegative, that is, V HV = V V H = I
and ∀i = 1, . . . , n, dii ≥ 0.
Problem 14.9. We consider a complex inner product space. Prove that every eigenvalue
of a self-adjoint operator is real.
Solution. Let (V,+, ·, 〈·, ·〉) be a complex inner product space. Let T ∈ L(V ) be self-
adjoint, so T = T ∗. Let λ be an eigenvalue of T and x ≠ 0 an associated eigenvector,
so Tx = λx. Then,
λ〈x, x〉 = 〈λx, x〉 = 〈Tx, x〉 = 〈x, T ∗x〉 = 〈x, Tx〉 = 〈x, λx〉 = λ̄〈x, x〉.
Since x ≠ 0 we have 〈x, x〉 ≠ 0, hence λ = λ̄ and therefore λ is real.
Problem 14.10. We consider a complex inner product space. Prove that eigenvectors
of a self-adjoint operator corresponding to distinct eigenvalues are orthogonal.
Solution. Let (V,+, ·, 〈·, ·〉) be a complex inner product space. Let T ∈ L(V ) be self-
adjoint, so T = T ∗. Let λ1 and λ2 be distinct eigenvalues of T with associated eigenvectors
x1 ≠ 0 and x2 ≠ 0 respectively. Then,
λ1〈x1, x2〉 = 〈λ1x1, x2〉 = 〈Tx1, x2〉 = 〈x1, T ∗x2〉 = 〈x1, Tx2〉 = 〈x1, λ2x2〉 = λ̄2〈x1, x2〉.
Since T is self-adjoint the eigenvalues are real, so λ̄2 = λ2. Therefore,
λ1〈x1, x2〉 = λ2〈x1, x2〉 ⇒ (λ1 − λ2)〈x1, x2〉 = 0⇒ 〈x1, x2〉 = 0,
since λ1 ≠ λ2. Hence x1 and x2 are orthogonal.
Problem 14.11. Let V be an inner-product space with inner product 〈·, ·〉V . Let W be
an inner-product space with inner product 〈·, ·〉W . Let T ∈ L(V,W ). We assume T has
an adjoint. Prove that the adjoint of T is unique.
Solution. Let T ∗1 and T ∗2 both be adjoints of T . Let v ∈ V and w ∈ W . Since T ∗1 is
an adjoint of T ,
〈Tv, w〉W = 〈v, T ∗1w〉V .
Since T ∗2 is an adjoint of T ,
〈Tv, w〉W = 〈v, T ∗2w〉V .
We get 〈v, T ∗1w〉V = 〈v, T ∗2w〉V and so,
〈v, (T ∗1 − T ∗2 )w〉V = 0.
Since v was arbitrary, (T ∗1 − T ∗2 )w is orthogonal to every vector in V , in particular to
itself. Therefore,
(T ∗1 − T ∗2 )w = 0,
which is true for all w ∈ W , and hence T ∗1 − T ∗2 = 0, showing that
T ∗1 = T ∗2 .
Hence the adjoint is unique.
Problem 14.12. We consider the vector space of n-by-n complex matrices. Prove that
〈A,B〉 = trace(AHB) defines an inner product.
Solution. Positivity and definiteness: trace(AHA) = ∑i,j |aij|2 ≥ 0, with equality if and
only if A = 0. Additivity and homogeneity follow from the linearity of the trace, and
conjugate symmetry from trace(BHA) = trace((AHB)H), which is the complex conjugate
of trace(AHB). (Note that trace(AHB) is linear in the second slot; for linearity in the
first slot use trace(ABH).)
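The inner product axioms for trace(A^H B) can be checked numerically (numpy sketch; the matrices and the scalar lam are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def ip(X, Y):
    # <X, Y> = trace(X^H Y); this formula is linear in the second slot
    return np.trace(X.conj().T @ Y)

assert np.isclose(ip(A, A).real, np.sum(np.abs(A) ** 2))   # positivity: sum of |a_ij|^2
lam = 2.0 - 3.0j
assert np.isclose(ip(A + B, C), ip(A, C) + ip(B, C))       # additivity
assert np.isclose(ip(A, lam * B), lam * ip(A, B))          # homogeneity (second slot)
assert np.isclose(ip(A, B), np.conj(ip(B, A)))             # conjugate symmetry
```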
Problem 14.13. Let A be an n × n complex matrix. Define H = (1/2)(A + A∗) and
S = (1/2)(A − A∗). Prove that A is normal if every eigenvector of H is also an eigenvector
of S.
Solution. Let vi be an eigenvector of H corresponding to the eigenvalue λi. By the
assumption, vi is also an eigenvector of S, corresponding to µi (say). Then,
(1/2)(A+ A∗)vi = λivi, (1/2)(A− A∗)vi = µivi,
which implies,
Avi = (λi + µi)vi, A∗vi = (λi − µi)vi.
Therefore,
AA∗vi = (λi − µi)Avi = (λi − µi)(λi + µi)vi = (λi + µi)(λi − µi)vi = (λi + µi)A∗vi = A∗Avi.
Since H is Hermitian (because H∗ = H), there is a basis of Cn consisting of eigenvectors
of H. Let v1, . . . , vn be such an eigenbasis of Cn. Let v ∈ Cn be arbitrary. Then for
some a1, . . . , an ∈ C, v = a1v1 + · · ·+ anvn. Then,
AA∗v = a1AA∗v1 + · · ·+ anAA∗vn = a1A∗Av1 + · · ·+ anA∗Avn = A∗Av.
Since v was arbitrary, we conclude that AA∗ = A∗A and hence A is normal.
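A numerical illustration of the H, S decomposition (numpy sketch; the normal matrix A = Q D Q^H is built from an arbitrary unitary Q and complex diagonal D, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
# build a normal (but non-Hermitian) A = Q D Q^H with Q unitary, D complex diagonal
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(M)                      # unitary factor
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = Q @ np.diag(d) @ Q.conj().T

H = (A + A.conj().T) / 2                    # Hermitian part, eigenvalues Re(d)
S = (A - A.conj().T) / 2                    # skew-Hermitian part, eigenvalues i*Im(d)

assert np.allclose(A, H + S)
assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A is normal
assert np.allclose(H @ S, S @ H)                     # H and S share the eigenvectors Q
```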
15. Scrambled Ideas
Remark 11. On Similarity Transformation:
(1) Two matrices A and B are called similar if there exists an invertible matrix P such
that A = PBP−1.
(2) Similar matrices have the same determinant:
detA = det(PBP−1) = det(P ) det(B) det(P−1) = det(P ) det(B)(1/ det(P )) = detB.
(3) Similar matrices have the same eigenvalues because
det(A−λI) = det(PBP−1−λI) = det(PBP−1−PλIP−1) = det(P (B−λI)P−1) = det(B−λI)
(4) To compute A^k it is helpful to have a similarity transformation A = V DV −1 with D
diagonal and V invertible, since then A^k = V D^k V −1.
Remark 12. On Left and right eigenvectors
(1) If A is diagonalizable, then A = V DV −1 with respect to the eigenbasis, that is,
the eigenvectors are the columns of V .
(2) Setting W = V −T we get A = W−TDW T , which implies W TA = DW T , that is,
wi^T A = λi wi^T for each column wi of W . Here wi^T is a left eigenvector of A, or
equivalently wi is a right eigenvector of AT .
(3) The eigenvalues of A and AT are same.
Remark 13. On the Symmetric matrices
(1) Symmetric matrix has real eigenvalues.
(2) Symmetric matrix has an orthonormal eigen-basis.
(3) We can diagonalize it as A = QΛQT
(4) A symmetric matrix is a combination of mutually perpendicular projection matrices:
A = QΛQT = λ1q1q1^T + λ2q2q2^T + · · ·
(5) Signs of pivots are same as signs of eigenvalues. That is the number of positive
pivots is equal to that of positive eigenvalues and same for negative.
(6) A symmetric matrix can be factored as
A = LDLT .
This is because any matrix (without row exchanges) can be written as A = LDU , and since A = AT we must
have LDU = UTDLT . Since the factorization is unique, we must have U = LT ,
and therefore A = LDLT .
(7) Since symmetric matrices are diagonalizable, and so have enough eigenvectors to
make an eigenbasis, there is no defective eigenvalue; that is, the minimal polynomial
has all linear factors.
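The spectral decomposition in item (4) above can be checked numerically (numpy sketch; the symmetric matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                           # an arbitrary symmetric matrix

lam, Q = np.linalg.eigh(A)                  # real eigenvalues, orthonormal eigenbasis
assert np.allclose(Q.T @ Q, np.eye(n))      # Q is orthogonal
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)

# A as a sum of mutually perpendicular rank-one projections
A_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(n))
assert np.allclose(A, A_rebuilt)
```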
Remark 14. On the SPD
(1) Each of the following tests is a necessary and sufficient condition for a real symmetric
matrix A to be positive definite:
• xTAx > 0 for all nonzero real vectors x.
• All the eigenvalues of A satisfy λi > 0.
• All the upper-left submatrices Ak have positive determinants.
• All the pivots (without row exchanges) satisfy dk > 0.
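The tests above can be compared numerically on one example (numpy sketch; A is an arbitrary SPD matrix built as B^T B + nI):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
B = rng.standard_normal((n, n))
A = B.T @ B + n * np.eye(n)                 # symmetric positive definite by construction

assert np.all(np.linalg.eigvalsh(A) > 0)                          # eigenvalue test
assert all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1)) # leading-minor test
L = np.linalg.cholesky(A)                   # Cholesky succeeds exactly for SPD matrices
assert np.allclose(L @ L.T, A)
```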
15.1. On Square root of a matrix
Definition 15.1. Square root of a matrix A is a matrix S such that A = S2.
Theorem 15.1. Suppose A is a square matrix. There is a positive semidefinite matrix
S such that A = S2 if and only if A is positive semidefinite.
Proof. Suppose A is positive semidefinite.
A positive semidefinite ⇒ A is Hermitian ⇒ A is normal.
So there is a unitary matrix U and a diagonal matrix D, whose diagonal entries are
the eigenvalues of A, such that D = U∗AU . The eigenvalues of A are all nonnegative,
which allows us to define a diagonal matrix E whose diagonal entries are the nonnegative
square roots of the eigenvalues of A, in the same order as they appear in D. That is,
E is the diagonal matrix with nonnegative diagonal entries such that E2 = D. Set
S = UEU∗; then,
S2 = UEU∗UEU∗ = UEInEU∗ = UE2U∗ = UDU∗ = A
Next we verify that S is Hermitian,
S∗ = (UEU∗)∗ = UE∗U∗ = UETU∗ = UEU∗ = S.
Since E is diagonal with real entries, E is Hermitian. Also the eigenvalues of E are the diagonal entries and so non-negative. Hence E is positive semi-definite. Let x ∈ V be any vector; then,
x∗Sx = x∗UEU∗x = (U∗x)∗E(U∗x) ≥ 0
Hence, S is positive semi-definite.
Now we assume A = S2, with S positive semi-definite. Then S is Hermitian, and we
check that A is Hermitian.
A∗ = (SS)∗ = S∗S∗ = SS = A.
Let x ∈ V be any vector, then
x∗Ax = x∗SSx = x∗S∗Sx = (Sx)∗Sx = ‖Sx‖2 ≥ 0.
Hence A is positive semi-definite.
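The construction S = UEU∗ from the proof can be sketched numerically (numpy; the example matrix is chosen arbitrarily for illustration):

```python
import numpy as np

# Arbitrarily chosen symmetric positive semi-definite example.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eigh(A)      # D = U* A U, here D = diag(lam)
E = np.diag(np.sqrt(lam))       # E^2 = D, non-negative diagonal entries
S = U @ E @ U.T                 # S = U E U*

assert np.allclose(S @ S, A)                    # S^2 = A
assert np.allclose(S, S.T)                      # S is Hermitian
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)  # S is positive semi-definite
```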
15.2. On the Matrix Decompositions
Let A be an m × n matrix with rank n. We can use Gram-Schmidt orthogonalization to factorize A into two matrices: an m × n matrix Q with orthonormal columns and an n × n upper triangular matrix R. This factorization is known as the reduced QR factorization. A full QR factorization is obtained by appending m − n additional orthonormal columns to Q so that it becomes an m × m unitary matrix.
Theorem 15.2. Every A ∈ Cm×n (m ≥ n) has a full QR factorization, hence also a
reduced QR factorization.
Theorem 15.3. Let p ≥ q and let A be a real p × q matrix with rank q. Then the QR-decomposition A = QR, with Q of size p × q and R of size q × q, is unique if R is forced to have positive entries on its main diagonal.
Proof. Assume that A = Q1R1 and A = Q2R2 with R1, R2 upper triangular with positive entries on the diagonal and Q1TQ1 = Iq and Q2TQ2 = Iq.
We first note that since A is full rank, R1 and R2 are invertible. We have
(15.1) Q1R1 = Q2R2.
Multiplying equation (15.1) by Q1T on the left and by R2−1 on the right gives
R1R2−1 = Q1TQ2.
Since R1R2−1 is upper triangular, this means that Q1TQ2 is upper triangular. Now multiplying equation (15.1) by Q2T on the left and R1−1 on the right gives
R2R1−1 = Q2TQ1.
This means that Q2TQ1 is upper triangular, so its transpose Q1TQ2 is lower triangular. That is, Q1TQ2 is both upper and lower triangular, so it is diagonal and also invertible.
Let us call D = Q1TQ2. Then
R1 = DR2.
From equation (15.1) we get
Q1DR2 = Q2R2 ⇒ Q1D = Q2 ⇒ Q1 = Q2D−1.
Therefore, Q1TQ1 = I and Q2TQ2 = I give D2 = I, so D has ±1 on the diagonal.
Since R1 = DR2, the diagonal entries of R1 are given by (R1)ii = Dii(R2)ii. But the positivity of both (R1)ii and (R2)ii along with Dii = ±1 implies Dii = 1. Finally D = I and so
Q1 = Q2 and R1 = R2.
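numpy's qr routine does not enforce the sign convention of Theorem 15.3, but the unique factor with positive diagonal is obtained by absorbing a ±1 diagonal matrix D, exactly as in the proof above. A sketch (the helper name qr_positive is mine):

```python
import numpy as np

def qr_positive(A):
    """Reduced QR with the sign convention diag(R) > 0 (the unique one)."""
    Q, R = np.linalg.qr(A)           # numpy does not fix the signs
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0                  # only relevant for rank-deficient A
    return Q * d, d[:, None] * R     # (Q D) and (D R) with D = diag(d), D^2 = I

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))      # full rank with probability 1
Q, R = qr_positive(A)

assert np.allclose(Q @ R, A)                 # still a factorization of A
assert np.allclose(Q.T @ Q, np.eye(3))       # orthonormal columns
assert np.all(np.diag(R) > 0)                # the normalized sign convention
```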
Problem 15.1. Let A be an m × n matrix (m ≥ n), and let A = QR be a reduced QR factorization.
(a) Show that A has rank n if and only if all the diagonal entries of R are nonzero.
(b) Suppose R has k nonzero diagonal entries for some k with 0 ≤ k < n. What does this imply about the rank of A? Exactly k? At least k? At most k? Give a precise answer, and prove it.
Solution. content...
16. On SPD
If A is symmetric positive definite then A1/2 exists and is well defined. For a symmetric
matrix A, the inverse A−1 is also symmetric, because
(A−1)T = (A−1)TAA−1 = (A−1)TATA−1 = (AA−1)TA−1 = A−1.
Also for any nonsingular square matrix we can switch the inverse and transpose because
(A−1)TAT = (AA−1)T = I ⇒ (AT )−1 = (A−1)T
Every SPD matrix has a Cholesky factorization: if A is a symmetric positive definite matrix then
A = CCT
where C is lower triangular with positive elements on the diagonal (so C is invertible).
Problem 16.1. Let A, B, and C represent three real n × n matrices, where A and B are symmetric positive definite and C is invertible. Prove that each of the following is spd.
(a) A−1
(b) A+B
(c) CTAC
(d) A−1 − (A+B)−1
Solution. We use the definition and the property that a real symmetric matrix A is positive definite if and only if xTAx > 0 for all real n-dimensional vectors x ≠ 0, or equivalently, if all its eigenvalues are real and positive.
(a) Since A is symmetric, we have,
(A−1)T = (A−1)TAA−1 = (A−1)TATA−1 = (AA−1)TA−1 = A−1
and thus A−1 is symmetric. Moreover, if A is positive definite, then all its eigenvalues are positive and real. Let λ be an eigenvalue of A−1; then A−1v = λv where v is the corresponding eigenvector. But A−1v = λv implies Av = (1/λ)v, so 1/λ is an eigenvalue of A and is real and positive, and hence λ is positive. Hence A−1 is positive definite.
(b) A+B is symmetric because,
(A+B)T = AT +BT = A+B.
Also, for any x ≠ 0,
xT (A+B)x = xTAx+ xTBx > 0.
Hence A+B is spd.
(c) CTAC is symmetric because
(CTAC)T = CTAT(CT)T = CTAC.
Also for any x ≠ 0,
xTCTACx = (Cx)TA(Cx).
If Cx = 0 then, since C is invertible, x = 0. Therefore for x ≠ 0 we have Cx ≠ 0, and since A is positive definite,
xTCTACx = (Cx)TA(Cx) > 0.
Hence CTAC is spd.
(d) We have
A−1 − (A+B)−1 = (A−1(A+B) − I)(A+B)−1 = A−1B(A+B)−1
= [(A+B)(B−1A)]−1 = [AB−1A + A]−1.
By (a), B−1 is spd. By (c), AB−1A = ATB−1A is spd. By (b), AB−1A + A is spd. By (a) again, [AB−1A + A]−1 is spd. Hence A−1 − (A+B)−1 is spd.
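The four results can be spot-checked numerically. A sketch with randomly generated SPD matrices (the helper is_spd and the construction XXT + I are mine, used only for illustration):

```python
import numpy as np

def is_spd(M, tol=1e-10):
    # symmetric and all eigenvalues strictly positive
    return np.allclose(M, M.T) and np.all(np.linalg.eigvalsh(M) > tol)

rng = np.random.default_rng(2)
# X X^T + I is SPD for any real X, giving easy random SPD test matrices.
X, Y = rng.standard_normal((2, 4, 4))
A = X @ X.T + np.eye(4)
B = Y @ Y.T + np.eye(4)
C = rng.standard_normal((4, 4))          # invertible with probability 1

assert is_spd(np.linalg.inv(A))                         # (a)
assert is_spd(A + B)                                    # (b)
assert is_spd(C.T @ A @ C)                              # (c)
assert is_spd(np.linalg.inv(A) - np.linalg.inv(A + B))  # (d)
```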
Problem 16.2. Suppose A is a symmetric positive definite real n × n matrix and B is a real m × n matrix such that BBT is positive definite. Prove that the matrix BT(BA−1BT)−1B is symmetric positive definite.
Solution. First note that BT(BA−1BT)−1B is symmetric: since A is spd, by the previous problem A−1 is spd, so BA−1BT is symmetric, and hence so are (BA−1BT)−1 and BT(BA−1BT)−1B. For x ≠ 0, we have BTx ≠ 0 because otherwise xTBBTx = 0, which contradicts the fact that BBT is spd. Therefore for x ≠ 0,
xTBA−1BTx = (BTx)TA−1(BTx) > 0
since A−1 is spd. Hence BA−1BT is spd. Again, since the inverse of an spd matrix is spd, (BA−1BT)−1 is spd. Since BBT is spd, we can show that BTB is also spd. Obviously BTB is symmetric. Let λ be an eigenvalue of BTB and v a corresponding eigenvector. Then BTBv = λv implies BBT(Bv) = λ(Bv); if Bv ≠ 0 this shows that λ is also an eigenvalue of BBT, and hence λ > 0. (If Bv = 0 then λv = BTBv = 0 gives λ = 0, so this step implicitly requires B to have full column rank.) Therefore BTB is spd, and hence for x ≠ 0, Bx ≠ 0. So, for x ≠ 0,
xTBT(BA−1BT)−1Bx = (Bx)T(BA−1BT)−1(Bx) > 0
since (BA−1BT)−1 is spd. Hence BT(BA−1BT)−1B is spd.
Problem 16.3. Suppose A is a positive definite symmetric square real matrix and B is
a symmetric square real matrix. Show that there exists a square real matrix C such that
CTAC is the identity matrix and CTBC is a diagonal matrix.
Solution. Let C1 = A1/2. Then C1−1AC1−1 is the identity matrix and C1−1BC1−1 is symmetric. We can write C1−1BC1−1 = PDPT, where D is diagonal and P is orthogonal. Then D = (PTC1−1)B(C1−1P) and (PTC1−1)A(C1−1P) = PT(C1−1AC1−1)P = PTP is the identity matrix. Thus, one can take C = C1−1P.
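The construction C = C1−1P can be carried out numerically. A sketch (numpy; the random test matrices are mine, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))
A = X @ X.T + np.eye(3)        # symmetric positive definite
B = Y + Y.T                    # symmetric

# C1^{-1} = A^{-1/2} via the eigendecomposition of A.
lam, U = np.linalg.eigh(A)
C1inv = U @ np.diag(1.0 / np.sqrt(lam)) @ U.T

# C1^{-1} B C1^{-1} is symmetric; diagonalize it with an orthogonal P.
_, P = np.linalg.eigh(C1inv @ B @ C1inv)
C = C1inv @ P

D = C.T @ B @ C
assert np.allclose(C.T @ A @ C, np.eye(3))   # C^T A C = I
assert np.allclose(D, np.diag(np.diag(D)))   # C^T B C is diagonal
```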
Problem 16.4. Assume the following general definition for a real positive semidefinite
matrix: an n × n real matrix A is said to be positive semidefinite if and only if, for all
vector x in Rn, xTAx ≥ 0. In particular, this definition allows real matrices which are
not symmetric to be positive semidefinite.
(a) Prove that if A and B are real symmetric positive semidefinite matrices and matrix
A is nonsingular, then AB has only real nonnegative eigenvalues.
(b) Provide a counterexample showing that the requirement that the matrices are
symmetric cannot be dropped.
Solution. (a) Since A is symmetric positive semidefinite and nonsingular, it is positive definite, so A1/2 and A−1/2 are well defined. The matrix AB has the same eigenvalues as the matrix A−1/2(AB)A1/2 = A1/2BA1/2. The latter matrix is self-adjoint and positive semidefinite, so it has real nonnegative eigenvalues.
Alternative: Let λ be an eigenvalue of AB and v a corresponding eigenvector. Then
ABv = λv ⇒ BABv = λBv ⇒ vTBABv = λvTBv ⇒ (Bv)TA(Bv) = λvTBv.
Since A is nonsingular and positive semidefinite, all its eigenvalues are positive, and so A is positive definite. If Bv = 0 then λv = ABv = 0, so λ = 0. If Bv ≠ 0, the left hand side (Bv)TA(Bv) is positive, and since vTBv ≥ 0 this forces λ > 0. Hence λ ≥ 0.
(b) We need a nonsymmetric matrix A. To create a positive semidefinite matrix A, one can take a symmetric positive semidefinite matrix H and add an antisymmetric matrix S; then A = H + S is positive semidefinite, since xTSx = 0 for all x. In our case, we take
A =
[0 1
−1 0]
and B =
[1 0
0 1].
In this case A is positive semidefinite and nonsingular, B is positive semidefinite, and AB = A has eigenvalues ±i, so AB does not have real nonnegative eigenvalues.
Problem 16.5. Let A be an n × n real symmetric positive semidefinite matrix. Let B
be an n× n real symmetric positive definite matrix.
(a) Prove that AB have real nonnegative eigenvalues.
(b) Prove that
det(A) det(B) ≤ (trace(AB)/n)n
Solution. (a) Since B is a symmetric positive definite matrix, it has a Cholesky factorization B = CCT, where C is lower triangular with positive elements on the
diagonal. Now we have,
AB = (C−TCT )A(CCT ) = (CT )−1(CTAC)CT .
Therefore AB is similar to CTAC, so AB and CTAC have the same eigenvalues.
Since A is symmetric, CTAC is symmetric as well, so CTAC has real eigenvalues.
Moreover, since C is invertible and A is positive semidefinite, CTAC is positive
semidefinite as well. Therefore CTAC is real symmetric positive semidefinite, so
it has real nonnegative eigenvalues. We conclude that AB has real nonnegative
eigenvalues.
(b) Let λi, i = 1, 2, . . . , n, be the n eigenvalues of AB (where we repeat the eigenvalues according to their algebraic multiplicities). We note that
det(A) det(B) = det(AB) = λ1λ2 · · ·λn.
On the other hand,
trace(AB) = λ1 + λ2 + · · ·+ λn.
Since λi ≥ 0, by the arithmetic-geometric mean inequality we get
(λ1λ2 · · ·λn)1/n ≤ (λ1 + λ2 + · · ·+ λn)/n,
which leads us to the result.
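A quick numerical spot-check of the inequality (numpy; random psd/pd matrices of my choosing, with a small tolerance to guard against rounding):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
X, Y = rng.standard_normal((2, n, n))
A = X @ X.T                # symmetric positive semidefinite
B = Y @ Y.T + np.eye(n)    # symmetric positive definite

lhs = np.linalg.det(A) * np.linalg.det(B)
rhs = (np.trace(A @ B) / n) ** n
# det(A) det(B) <= (trace(AB)/n)^n
assert lhs <= rhs + 1e-9
```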
Problem 16.6. We consider two real n × n matrices A and B such that A is symmetric positive definite and B is anti-symmetric. Prove that A + B is invertible.
Solution. Since B is anti-symmetric (which means by definition BT = −B), for every vector x of size n, we have
xTBx = (xTBx)T = xTBTx = −xTBx
which implies xTBx = 0. Now let x be an n × 1 vector such that
(A+B)x = 0.
Then, multiplying on the left by xT , this implies
xT (A+B)x = xTAx+ xTBx = xTAx = 0.
Since A is positive definite, xTAx = 0 implies x = 0. Hence A+B has trivial null space
and therefore is invertible.
Problem 16.7. (a) Let A be a complex Hermitian matrix. Prove that A is positive
definite if and only if all the eigenvalues of A are positive.
(b) Let A =
[2 0 0
0 3 −1
0 −1 3].
Let V = R3. We define the map ∗ : V × V → R by u ∗ v = uTAv for all u, v ∈ V. Prove that ∗ is an inner product on V.
(c) Use the inner product from above and the Gram-Schmidt orthogonalization pro-
cess to find an orthonormal basis for V .
Solution. (a) Let A be Hermitian positive definite. This means that, for all x ≠ 0, xHAx is real and positive. Let λ be an eigenvalue of A, and let v be an eigenvector of A associated with the eigenvalue λ such that vHv = 1. Now we see that vHAv = vHλv = λvHv = λ. So λ is real and positive.
Conversely, let A be Hermitian with all eigenvalues positive. Since A is Hermitian, A is diagonalizable in an orthonormal basis, so there exists a unitary V such that A = VDVH. Let x be a nonzero vector of size n. Then
xHAx = xH(VDVH)x = xHVD1/2D1/2VHx = (D1/2VHx)H(D1/2VHx) = ‖D1/2VHx‖2 > 0,
since D1/2VHx ≠ 0 (both D1/2 and VH are invertible and x ≠ 0). This implies that A is positive definite.
(b) A is symmetric and the eigenvalues of A are 2,2 and 4. So the eigenvalues of A
are all positive, so A is symmetric positive definite. Therefore uTAv defines an
inner product. (Theorem used: uTAv defines an inner product if and only if A is
symmetric positive definite.)
(c) We apply the Gram-Schmidt process to the basis e1, e2, e3 in order to obtain an
orthonormal basis for V .
eT1Ae1 = 2 ⇒ ‖e1‖ = √2 ⇒ q1 = [√2/2, 0, 0].
qT1Ae2 = 0; qT1Ae3 = 0.
eT2Ae2 = 3 ⇒ ‖e2‖ = √3 ⇒ q2 = [0, √3/3, 0].
qT2Ae3 = −√3/3 ⇒ w = e3 + (√3/3)q2 = [0, 1/3, 1].
wTAw = 8/3 ⇒ ‖w‖ = 2√6/3 ⇒ q3 = [0, √6/12, √6/4].
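The computation above can be reproduced with a small Gram-Schmidt routine under the inner product u ∗ v = uTAv. A sketch (numpy; the helper ip is mine):

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, -1.0],
              [0.0, -1.0, 3.0]])

def ip(u, v):                  # the inner product u * v = u^T A v
    return u @ A @ v

# Gram-Schmidt on the standard basis e1, e2, e3 under this inner product.
qs = []
for e in np.eye(3):
    w = e - sum(ip(e, q) * q for q in qs)   # subtract components along earlier q's
    qs.append(w / np.sqrt(ip(w, w)))        # normalize in the A-norm

# The result is orthonormal with respect to <u, v> = u^T A v.
G = np.array([[ip(qi, qj) for qj in qs] for qi in qs])
assert np.allclose(G, np.eye(3))
# Matches the hand computation: q3 = [0, sqrt(6)/12, sqrt(6)/4].
assert np.allclose(qs[2], [0.0, np.sqrt(6) / 12, np.sqrt(6) / 4])
```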
17. On Row space, Column space, rank and nulity
The column space of a matrix A of dimension m× n is the subspace of Fm containing
all the linear combination of the columns of A. The solution to Ax = 0 form a vector
space that is called the null space of A which is a subspace of Fn. If the matrix A has
linearly independent columns (in case of square matrix we can say if it is invertible) then
the null space contains only the zero vector. Let xn be a vector in the null space and let xp be a particular solution to the system Ax = b. Then we can write a general solution or complete solution of Ax = b as,
x = xp + xn
because
Ax = A(xp + xn) = Axp + Axn = b+ 0 = b
From this point of view we can infer that a system Ax = b has a unique solution if the matrix A has zero null space and b is in the column space of A. The requirement that b be in the column space is needed for the existence of a solution, and null(A) = {0} is required for the uniqueness of the solution. If the null space is not the zero space then it contains infinitely many vectors, and so in that case, if a solution exists, there are infinitely many solutions to the system.
Now we will introduce two other important subspaces. An important term in the discussion is the rank of a matrix: the rank of a matrix is the dimension of the column space, that is, the number of independent columns. The row space is the subspace of Fn containing all the linear combinations of the rows of the matrix. The row space can also be defined in terms of a column space: the row space of a matrix A is the column space of AT.
The fourth fundamental subspace generated by a matrix A is the null space of AT which
is also called the left null space. The left null space is the subspace of Fm. So, the list of
four fundamental subspaces are,
• The column space of A is denoted by C(A). Its dimension is the rank r.
• The null space of A is denoted by null(A). Its dimension is n − r.
• The row space of A is the column space of AT. Its dimension is r.
• The left null space of A is the null space of AT. Its dimension is m − r.
Row operations preserve the row space but do change the column space. The dimension of the null space is also known as the nullity. We state two very important results which are known as the fundamental theorem of orthogonality.
Theorem 17.1. The row space is orthogonal to the nullspace. The column space is
orthogonal to the left nullspace that is the null space of AT .
[Figure omitted: Gilbert Strang's picture of the four fundamental subspaces.]
Every b in the column space is a combination Ax of the columns. In fact, b is Axr,
with xr in the row space, since the nullspace component gives Axn = 0. If another vector
x′r in the row space gives Ax′r = b, then A(xr − x′r) = b− b = 0. This puts xr − x′r in the
null space and the row space, which makes it orthogonal to itself. Therefore it is zero and
thus xr = x′r. Therefore, exactly one vector in the row space is carried to b. A matrix
transforms its row space onto its column space.
Now we state some theorems about the rank and nullity which is the dimension of the
null space.
• Elementary row operations do not change the row space of a matrix.
• If a matrix A is in row echelon form, then the nonzero rows of A are linearly
independent.
• The rank of a matrix is equal to the number of nonzero rows in its row echelon
form.
• The row space is the orthogonal complement of the null space in Fn. The left null space is the orthogonal complement of the column space in Fm.
• Ax = b is solvable if and only if yT b = 0 whenever yTA = 0 or equivalently
ATy = 0.
• rank(A+B) ≤ rank(A) + rank(B).
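The dimensions of the four subspaces can be illustrated on a small example (numpy; the particular rank-one matrix and the spanning vectors of its null spaces are chosen by me for illustration):

```python
import numpy as np
from numpy.linalg import matrix_rank

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # m = 2, n = 3, rank r = 1

m, n = A.shape
r = matrix_rank(A)
assert r == 1
assert matrix_rank(A.T) == r        # the row space also has dimension r

# dim null(A) = n - r = 2: here null(A) is spanned by (2, -1, 0), (3, 0, -1).
assert np.allclose(A @ np.array([2.0, -1.0, 0.0]), 0)
assert np.allclose(A @ np.array([3.0, 0.0, -1.0]), 0)
# dim null(A^T) = m - r = 1: (2, -1) works since row 2 = 2 * row 1.
assert np.allclose(A.T @ np.array([2.0, -1.0]), 0)
```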
18. On Diagonalizability
A matrix A ∈ Fn×n is said to be diagonalizable if it is similar to a diagonal matrix, that is, if there exists an invertible matrix U ∈ Fn×n and a diagonal matrix D ∈ Fn×n such that
A = UDU−1
An n× n matrix A is said to be diagonalizable
• if and only if there exists a basis of Rn made of eigenvectors of A.
• if and only if A has n linearly independent eigenvectors.
• if and only if there exists a diagonal matrix D and an invertible matrix V such
that A = V DV −1.
Suppose A is diagonalizable. Then the formula A = UDU−1 implies that
A− λIn = UDU−1 − λUInU−1 = U(D − λIn)U−1
and hence that
dim null(A− λIn) = dim null(D − λIn)
for every point λ ∈ F because these matrices are similar. In particular, if λ = λj is an
eigenvalue of A, then
γj = dim null(A− λjIn) = dim null(D − λjIn)
is equal to the number of times the number λj is repeated in the diagonal matrix D. Thus,
γ1 + · · · + γk = n. Also γj represents the dimension of the eigenspace corresponding to the eigenvalue λj. So the sum of the dimensions of the eigenspaces is n, and therefore there are n linearly independent eigenvectors, because eigenvectors corresponding to distinct eigenvalues are linearly independent. A sufficient condition for diagonalizability: if a square matrix of size n has n distinct eigenvalues, then it is diagonalizable.
Let A be a Hermitian matrix, that is, A = A∗. Then A is diagonalizable in an orthonormal basis; therefore there exist Q, an n × n unitary matrix, and D, an n × n diagonal matrix, such that A = QDQ∗. Since A is Hermitian the eigenvalues are real and hence all the entries of D are real.
A skew-Hermitian matrix B, that is, B∗ = −B, is diagonalizable in an orthonormal basis and all its eigenvalues are purely imaginary.
Problem 18.1. Let A and B be n×n matrices. Prove or disprove each of the following.
(a) If A and B are diagonalizable, then so is A+B
(b) If A and B are diagonalizable, then so is AB
(c) If A2 = A, then A is diagonalizable.
(d) If A2 is diagonalizable, then so is A.
Solution. (a) Consider the following matrices:
A =
[1 1
0 0]
and B =
[−1 0
0 0]
The matrix A is upper triangular, so its eigenvalues are its diagonal entries, namely 1 and 0. Since it has two distinct eigenvalues, A is diagonalizable. The matrix B is diagonalizable because it is already diagonal. But the sum of the matrices,
A + B =
[0 1
0 0]
is not diagonalizable because it is a Jordan block of size 2 associated with eigenvalue 0. Hence, A and B are both diagonalizable, while A + B is not diagonalizable.
(b) The statement is not true. A counterexample is
A =
[1 1
0 0]
, B =
[0 0
0 1]
, AB =
[0 1
0 0].
(c) The statement is true. If A2 = A then A(A − I) = 0. It follows that the minimal polynomial of A is either x (if A = 0), or x − 1 (if A = I), or x(x − 1). In any case, the minimal polynomial µA has no repeated roots, and thus A is diagonalizable.
(d) The statement is not true. A counterexample is
A =
[0 1
0 0]
, A2 =
[0 0
0 0].
As we explained in part (a), the matrix A is not diagonalizable but A2 is diagonalizable.
Problem 18.2. Suppose that A is an m×n matrix and B is an n×m matrix, and write
Im for the m×m identity matrix. Show that if Im−AB is invertible, then so is In−BA.
Solution. Method 1: Let x ∈ null(In−BA). Then x−BAx = 0. It follows that BAx =
x, so AB(Ax) = Ax, which implies (Im − AB)Ax = 0. Therefore Ax ∈ null(Im − AB).
Since Im−AB is invertible, the nullspace is trivial, so that Ax = 0 and thus x = BAx = 0.
Hence nullspace of In −BA is trivial and so In −BA is invertible.
Method 2: One can verify directly that In + B(Im − AB)−1A is the inverse of In − BA. Indeed,
(In − BA)(In + B(Im − AB)−1A) = In − BA + B(Im − AB)−1A − BAB(Im − AB)−1A
= In − BA + B(Im − AB)(Im − AB)−1A = In − BA + BA = In.
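The closed-form inverse (In − BA)−1 = In + B(Im − AB)−1A used in Method 2 can be spot-checked numerically (numpy; random A and B of my choosing, for which Im − AB is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

Im, In = np.eye(m), np.eye(n)
# Candidate inverse of In - BA, built from the inverse of Im - AB.
inv_candidate = In + B @ np.linalg.inv(Im - A @ B) @ A

assert np.allclose((In - B @ A) @ inv_candidate, In)
assert np.allclose(inv_candidate @ (In - B @ A), In)
```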
Problem 18.3. Let A and B be n × n complex matrices such that AB = BA. Show
that if A has n distinct eigenvalues, then A , B and AB are all diagonalizable.
Solution. Let λ1, . . . , λn be the n distinct eigenvalues of A with corresponding (nonzero) eigenvectors v1, . . . , vn. We know that a list of eigenvectors belonging to distinct eigenvalues must be linearly independent. Hence B = (v1, . . . , vn) is a basis of Cn consisting of eigenvectors of A, so that A is similar to the diagonal matrix diag(λ1, . . . , λn).
For each i = 1, . . . , n, we have
ABvi = BAvi = Bλivi = λiBvi
which implies that Bvi is either zero or an eigenvector of A corresponding to the eigenvalue λi. Since A has n distinct eigenvalues, all the eigenspaces are one dimensional, and hence Bvi = ψivi for some ψi ∈ C, for all i = 1, . . . , n. Hence the basis B is also a basis of eigenvectors of B, where vi is associated with the eigenvalue ψi; in particular B is diagonalizable.
Now let V be the matrix with the eigenvector vi as its ith column. Then AV = VΛ and BV = VΨ, where Λ = diag(λ1, . . . , λn) and Ψ = diag(ψ1, . . . , ψn). Then,
ABV = A(VΨ) = (AV)Ψ = VΛΨ = VΦ
where Φ = ΛΨ = diag(λ1ψ1, . . . , λnψn). Hence AB is diagonalizable as well.
19. On Orthogonal Projection
An orthonormal basis of a vector space V is very important in many situations. A list of vectors in V that is orthonormal and also forms a basis is called an orthonormal basis of V. For an n dimensional space, a list of n orthonormal vectors forms an orthonormal basis, because an orthonormal list of vectors is linearly independent. One of the advantages of having an orthonormal basis is that for any vector v ∈ V the coefficients of the linear combination are readily known. Suppose e1, . . . , en is an orthonormal basis of V and v ∈ V. Then
v = 〈v, e1〉e1 + · · ·+ 〈v, en〉en
and
‖v‖2 = |〈v, e1〉|2 + · · ·+ |〈v, en〉|2.
A good thing is that every finite-dimensional inner product space has an orthonormal
basis.
Definition 19.1. A linear transformation P of a vector space U over F into itself is said
to be a projection if P is idempotent that is if P 2 = P .
Any transformation maps a vector from its row space to its column space. If the transformation is an operator then we can say that it maps a vector into its column space. Then what is special about a projection? For a non-projection operator T, if you take a vector u from the column space U, then T maps it into U, but not necessarily to the same vector u. On the contrary, let the transformation be a projection P, and let u ∈ U, where U is the column space of P. Then there is y ∈ V such that Py = u. So, Pu = P2y = Py = u.
Let P be a projection of V onto U, where U is a finite-dimensional subspace of V. Then U = CP = {Px : x ∈ V} and NP = {x ∈ V : Px = 0} is the null space of P. We want to show that V = U ⊕ NP.
Let x ∈ V . Then
x = Px+ (I − P )x
and Px ∈ U . Moreover (I − P )x ∈ NP , since
P (I − P )x = (P − P 2)x = (P − P )x = 0.
Thus
V = U +NP .
To prove that the sum is a direct sum, let y ∈ U ∩ NP. Then,
y ∈ U ⇔ y = Py
y ∈ NP ⇔ Py = 0
which implies y = 0. Hence
V = U ⊕NP
Definition 19.2. A linear transformation P of an inner product space U over F into itself
is said to be an orthogonal projection if P is idempotent and self adjoint with respect to
the given inner product that is if
P 2 = P, and 〈Pu, v〉 = 〈u, Pv〉
for every pair of vectors u, v ∈ U.
Another way to define or express the orthogonal projection is the following:
Definition 19.3. Suppose U is a finite-dimensional subspace of V . The orthogonal
projection of V onto U is the operator PU ∈ L(V ) defined as follows: For v ∈ V , write
v = u+ w, where u ∈ U and w ∈ U⊥. Then PUv = u.
Simply put, the above definition says that the column space of P, which is U, and the null space are orthogonal complements. Below we will show that these two definitions are equivalent.
Let P be an orthogonal projection of V onto the finite-dimensional subspace U according
to the former definition. That is P 2 = P and 〈Pu, v〉 = 〈u, Pv〉. Let u ∈ U be arbitrary
and w ∈ U⊥. Then,
〈Pw, u〉 = 〈w,Pu〉 = 〈w, u〉 = 0
Since the above equality is true for all u ∈ U we can conclude that Pw ∈ U⊥. But
Pw ∈ U because P is the projection onto U . Therefore Pw ∈ U ∩U⊥ and hence Pw = 0.
Hence, writing any v ∈ V as v = u + w with u ∈ U and w ∈ U⊥,
Pv = Pu + Pw = Pu = u,
which is exactly the second definition.
Now we take the second definition and want to show that 〈Pu, v〉 = 〈u, Pv〉 for all
u, v ∈ V . Let u = u1 + u2 and v = v1 + v2 where u1, v1 ∈ U and u2, v2 ∈ U⊥.
〈Pu, v〉 = 〈u1, v1 + v2〉 = 〈u1, v1〉+ 〈u1, v2〉 = 〈u1, Pv〉 = 〈u1, Pv〉+ 〈u2, Pv〉 = 〈u, Pv〉
The importance of an orthonormal basis of a subspace is that if we know the projections of a vector along the basis vectors, then we can simply add them together to get the projection onto the subspace. The following problem illustrates this property.
19.1. Projection in a Nutshell
(1) A projector is a square matrix P that satisfies P2 = P. Another name is idempotent.
(2) Two types of projections are orthogonal projection and oblique projection.
(3) The projector matrix P projects a vector v to the range of P along the null-space
of P .
(4) If P is a projector then I −P is also a projector and is called the complementary
projector to P .57
(5) range(P) = null(I − P) and null(P) = range(I − P).
(6) range(P) ∩ null(P) = {0}.
(7) Orthogonal Projection:
(a) If P ∈ Cm×m is an orthogonal projector then
range(P ) ⊥ null(P )
(b) A projector P is orthogonal if and only if P = P ∗.
(c) If Q has orthonormal columns then QQ∗ is an orthogonal projector onto the column space of Q.
(d) The rank-one orthogonal projector in a single direction q can be written as Pq = qq∗/(q∗q); when q is normalized this is simply Pq = qq∗. So the rank-one projector isolates the component of a vector in a single direction. The complement of the rank-one projector, I − Pq, is the rank m − 1 orthogonal projector that eliminates the component in the direction of q.
(e) Let a subspace be generated by the vectors a1, . . . , an and let A be the m × n matrix whose jth column is aj. Then the projector onto the subspace, which is also the column space or range of A, is given by
P = A(A∗A)−1A∗.
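Item (e) is easy to verify numerically. A sketch building P = A(ATA)−1AT for a random tall real matrix (numpy; the random example is mine):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2))          # full column rank with probability 1

# Orthogonal projector onto the column space of A.
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P @ P, P)             # idempotent: P^2 = P
assert np.allclose(P, P.T)               # self-adjoint: P = P*
assert np.allclose(P @ A, A)             # fixes vectors already in range(A)

# I - P is the complementary projector; its range is orthogonal to range(A).
Q = np.eye(5) - P
assert np.allclose(Q @ Q, Q)
assert np.allclose(A.T @ (Q @ rng.standard_normal(5)), 0)
```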
Problem 19.1. Let P2[0, 2] represent the set of polynomials with real coefficients and of
degree less than or equal to 2, defined on [0, 2]. For p = (p(t)) ∈ P2 and q = (q(t)) ∈ P2,
define
〈p, q〉 := p(0)q(0) + p(1)q(1) + p(2)q(2).
(a) Let T represent the linear transformation that maps an element p ∈ P2 to the
closest element of the span of the polynomial 1 and t in the sense of the norm
associated with the inner product. Find the matrix A of T in the standard basis
of P2.
(b) Is A symmetric? Is T self-adjoint? Do these facts contradict each other?
(c) Find the minimal polynomial of T
Solution. (a) We understand that T is the orthogonal projection onto the subspace
spanned by 1 and t. To find the matrix of T in the standard basis, let us apply
T to 1, t and t2. It is clear that T (1) = 1 and T (t) = t. Now we need to compute
T (t2). So we need to compute the orthogonal projection of t2 onto the subspace
spanned by 1 and t. To do that we need the orthogonal basis for the subspace.
Using the Gram-Schmidt algorithm, we get e1(t) = 1 and
e2(t) = t − (〈t, 1〉/〈1, 1〉) 1 = t − 1.
Hence, {e1, e2} is an orthogonal basis for the subspace spanned by 1 and t. Using
this orthogonal basis, we can now perform the orthogonal projection of t2 onto 1 and t − 1 to get the orthogonal projection onto the subspace spanned by 1 and t:
T(t2) = (〈t2, 1〉/〈1, 1〉) 1 + (〈t2, t − 1〉/〈t − 1, t − 1〉)(t − 1) = 5/3 + 2(t − 1) = −1/3 + 2t.
Thus, the standard basis {1, t, t2} is mapped to {1, t, −1/3 + 2t}. In coordinate vectors, (1,0,0) is mapped to (1,0,0), (0,1,0) is mapped to (0,1,0) and (0,0,1) is mapped to (−1/3, 2, 0). So the transformation matrix is
A =
[1 0 −1/3
0 1 2
0 0 0]
(b) No, the matrix is not symmetric. The transformation is self-adjoint being an
orthogonal projection. Since q − Tq is orthogonal to the plane U spanned by
1 and t and Tp is on the subspace U , we have 〈Tp, q − Tq〉 = 0. Similarly,
〈Tq, p − Tp〉 = 0. Hence we have,
〈Tp, q〉 = 〈Tp, q−Tq〉+〈Tp, Tq〉 = 〈Tp, Tq〉 = 〈p+Tp−p, Tq〉 = 〈p, Tq〉+〈Tp−p, Tq〉 = 〈p, Tq〉.
The matrix A of the transformation T is given in the basis {1, t, t2}, which is not an orthogonal basis, so the facts that xTAy ≠ xTATy (the matrix is not symmetric) and that 〈Tp, q〉 = 〈p, Tq〉 (the operator is self-adjoint) do not contradict each other.
(c) Since T is a projection, we know that T2 − T = 0. Moreover, T ≠ I (so the minimal polynomial is not x − 1) and T ≠ 0 (so the minimal polynomial is not x); thus the minimal polynomial is µT = x2 − x.
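The computation of T(t2) can be checked directly with the discrete inner product. A sketch (Python; the coefficient-array representation and the helper ip are mine):

```python
import numpy as np

ts = np.array([0.0, 1.0, 2.0])     # <p, q> = p(0)q(0) + p(1)q(1) + p(2)q(2)

def ip(p, q):                      # p, q as coefficient arrays [a0, a1, a2]
    return np.polyval(p[::-1], ts) @ np.polyval(q[::-1], ts)

one = np.array([1.0, 0.0, 0.0])    # the polynomial 1
t   = np.array([0.0, 1.0, 0.0])    # the polynomial t
t2  = np.array([0.0, 0.0, 1.0])    # the polynomial t^2

# Gram-Schmidt: e2 = t - (<t,1>/<1,1>) 1 = t - 1.
e2 = t - (ip(t, one) / ip(one, one)) * one

# Orthogonal projection of t^2 onto span{1, t}.
Tt2 = (ip(t2, one) / ip(one, one)) * one + (ip(t2, e2) / ip(e2, e2)) * e2
assert np.allclose(Tt2, [-1/3, 2.0, 0.0])    # T(t^2) = -1/3 + 2t
```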
Problem 19.2. Let Pn represent the real vector space of polynomials in x of degree less than or equal to n defined on [0, 1]. Given a real number a, we define Qn(a) as the subset of Pn of polynomials that have the real number a as a root.
(a) Let a be a real number. Show that Qn(a) is a subspace of Pn. Determine the
dimension of that subspace and exhibit a basis.
(b) Let the inner product in Pn be defined by 〈p, q〉 = ∫₀¹ p(x)q(x) dx. Determine the orthogonal complement of the subspace Q2(1) of P2.
Solution. (a) The polynomials in Qn(a) can be written as p(x) = (x − a)q(x) where q(x) is a polynomial of degree less than or equal to n − 1. Let p1(x) = (x − a)q1(x) and p2(x) = (x − a)q2(x) be in Qn(a) and let α, β ∈ R. Then
αp1(x) + βp2(x) = (x − a)(αq1(x) + βq2(x)).
Note that a is a root of αp1(x) + βp2(x), and αq1(x) + βq2(x) has degree less than or equal to n − 1. Hence αp1(x) + βp2(x) ∈ Qn(a). Therefore, Qn(a) is indeed a subspace. Since Qn(a) is isomorphic to Pn−1, its dimension is n, and
{(x − a), (x − a)2, . . . , (x − a)n}
is a basis.
(b) We can write a polynomial in P2 as a0 + a1(x − 1) + a2(x − 1)2. We need a polynomial orthogonal to x − 1 and (x − 1)2, so
∫₀¹ (a0 + a1(x − 1) + a2(x − 1)2)(x − 1) dx = 0,
∫₀¹ (a0 + a1(x − 1) + a2(x − 1)2)(x − 1)2 dx = 0,
which yields
−a0/2 + a1/3 − a2/4 = 0,
a0/3 − a1/4 + a2/5 = 0,
so
(a0, a1, a2) = a2 (3/10, 6/5, 1).
Thus,
Q2(1)⊥ = {3a2 + 12a2(x − 1) + 10a2(x − 1)2 : a2 ∈ R}
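The orthogonality of 3 + 12(x − 1) + 10(x − 1)2 to both x − 1 and (x − 1)2 can be verified exactly with rational arithmetic (Python stdlib sketch):

```python
from fractions import Fraction as F

# Moments m_k = integral over [0,1] of (x-1)^k dx = (-1)^k / (k+1).
m = [F((-1) ** k, k + 1) for k in range(5)]

# Candidate spanning vector of Q2(1)-perp, written in powers of (x-1):
# p(x) = 3 + 12(x-1) + 10(x-1)^2, coefficients (3, 12, 10).
p = [F(3), F(12), F(10)]

# <p, (x-1)> and <p, (x-1)^2> must both vanish.
ip1 = sum(p[j] * m[j + 1] for j in range(3))
ip2 = sum(p[j] * m[j + 2] for j in range(3))
assert ip1 == 0 and ip2 == 0
```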
Problem 19.3. A complex n × n matrix P is idempotent if P 2 = P . Show that every
idempotent matrix is diagonalizable.
Solution. Let P be a complex n × n idempotent matrix. The relation P2 = P reads as well P(P − I) = 0. Therefore the eigenvalues of P are either 0 or 1. We consider the two eigenspaces E0 (the eigenspace associated with the eigenvalue 0) and E1 (the eigenspace associated with the eigenvalue 1). Our goal is to prove that E0 ⊕ E1 = Cn. This will prove that P is diagonalizable.
Note that E0 is null(P).
Note as well that E1 is range(P). This is less obvious. On the one hand E1 ⊂ range(P), since, if x ∈ E1, x = Px so x ∈ range(P) (in other words the eigenspace is always in the range). On the other hand, if y ∈ range(P), there exists x such that y = Px, and so
Py = P2x = Px = y
so that y ∈ E1, so range(P) ⊂ E1.
We now need to prove that null(P) ⊕ range(P) = Cn. Let y ∈ Cn; we can write y = (Py) + (y − Py). The first term (Py) belongs to range(P). The second term (y − Py) belongs to null(P) since P(y − Py) = Py − P2y = Py − Py = 0. Moreover, if x ∈ null(P) ∩ range(P) then x = Px = 0, so the sum is direct.
Therefore null(P) ⊕ range(P) = Cn. So E0 ⊕ E1 = Cn. This proves that P is diagonalizable.
20. On minimization problems
We start with the importance of orthogonal projection for the minimization problem
stating the following theorem.
Theorem 20.1. Suppose U is a finite-dimensional subspace of V , v ∈ V and u ∈ U .
Then
‖v − PUv‖ ≤ ‖v − u‖.
Furthermore, the inequality above is an equality if and only if u = PUv.
The above theorem simply says that PUv is the point in the subspace U closest to v: over all u ∈ U, the distance ‖v − u‖ is smallest when u = PUv.
Consider the least squares problem, which is stated as: given A ∈ Cm×n, m ≥ n, and b ∈ Cm, find x ∈ Cn such that the residual ‖b − Ax‖2 is minimized. Since m ≥ n, generically the vector b does not lie in the column space of A, and so an exact solution of the system Ax = b does not exist. We therefore seek the best vector y in the column space of A, so that we can solve the system Ax = y; the solution is close in the sense that y is the best approximation, namely the projection of b onto the column space of A. By the above theorem, the orthogonal projection gives the vector y for which the residual b − y = b − Ax is smallest.
Let P be the orthogonal projection onto the subspace U of V . Then V = U ⊕ U⊥. So,
the vector b ∈ V can be written as b = y+ r where y ∈ U and r ∈ U⊥ which is known as
the residual. Now in the context of solving the least square problem Ax = b, the range
of A is U and to minimize the residual we require the residual to be in U⊥. Therefore r
is orthogonal to the range of A, that is, it is orthogonal to every column of A. That is,
(col 1)T r = 0, . . . , (col n)T r = 0
which implies that
AT r = 0.
Therefore, AT(b − Ax) = 0 ⇒ ATAx = ATb. The equation
ATAx = ATb
is known as the normal equation, and the matrix ATA is nonsingular if and only if A has full rank. Therefore the solution x is unique if and only if A has full rank.
The solution can be written as x = (ATA)−1ATb. Here we can define the projection matrix in a very nice way. Recall that y was the orthogonal projection of b onto the range of A. Therefore the orthogonal projection of b is given by
y = Ax = A(ATA)−1ATb.
The matrix P = A(ATA)−1AT is the orthogonal projection onto the range of A.
Definition 20.1. If A has full rank, then the solution x to the least squares problem is unique and is given by x = (A^T A)^{-1} A^T b. The matrix (A^T A)^{-1} A^T is known as the pseudoinverse of A, denoted by A†:
A† = (A^T A)^{-1} A^T.
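A small NumPy sketch (the matrix and vector are random stand-ins) confirming that the normal-equation solution matches a dedicated least-squares solver, and that (A^T A)^{-1} A^T agrees with the Moore–Penrose pseudoinverse when A has full rank:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))      # full column rank with probability 1
b = rng.normal(size=8)

# Solve the normal equation A^T A x = A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - Ax is orthogonal to the range of A: A^T (b - Ax) = 0.
assert np.allclose(A.T @ (b - A @ x), 0)

# Same answer as a dedicated least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)

# For full-rank A, (A^T A)^{-1} A^T is the pseudoinverse A-dagger.
A_dagger = np.linalg.solve(A.T @ A, A.T)
assert np.allclose(A_dagger, np.linalg.pinv(A))
```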
To solve the normal equation we use various factorizations of a matrix, such as the QR, SVD, or Cholesky factorization. First we discuss the Singular Value Decomposition (SVD). In the SVD of a matrix A ∈ Cm×n we seek a decomposition
A = UΣV^H,
where U ∈ Fm×m and V ∈ Fn×n are unitary matrices and Σ is a diagonal matrix. We can write the above equation as
AV = UΣ,
which says that we are looking for an orthonormal set of vectors in the row space (the columns of V) that are mapped to scalar multiples of another orthonormal set of vectors in the column space, or range, of A (the columns of U). The null space is accommodated by introducing zeros on the diagonal of Σ. So the bottom line is that in the SVD we seek an orthonormal basis of the row space and an orthonormal basis of the column space with respect to which the matrix is diagonal.
If a matrix A is symmetric positive definite then we can use the same orthonormal basis for both the row and the column space. In that case we can decompose the matrix as
A = QΛQ^H,
where Q is unitary and Λ is the diagonal matrix with the eigenvalues of A on the diagonal. In the general case we are not that lucky.
Fortunately, we can still exploit this fact about symmetric positive definite matrices. Since it is difficult to find the two orthonormal bases together, we try to eliminate one of them. To do this, compute
A^H A = V Σ^H U^H U Σ V^H = V Σ² V^H,
where Σ² is simply the diagonal matrix whose entries are the squares of the entries of Σ. So the columns of V are simply the eigenvectors of A^H A, and the diagonal entries of Σ² are the eigenvalues. That is, an eigenvalue decomposition of A^H A gives the orthonormal basis V, and the diagonal entries of Σ are the positive square roots of the eigenvalues of A^H A. Now consider
AA^H = U Σ² U^H,
and so we can find the orthonormal basis U using AA^H. If the rank of A is r, then we obtain an orthonormal basis v1, . . . , vr of the row space and an orthonormal basis u1, . . . , ur of the column space. We then complete these bases by attaching bases of the null spaces of A and of A^H, obtaining bases for the whole spaces and hence U and V completely. The result is known as the full SVD.
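The construction just described can be sketched numerically (assuming A has full column rank, so every σ is positive; in floating point one would normally call an SVD routine rather than form A^H A):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))      # real, full column rank with probability 1

# Eigen-decomposition of A^T A yields V and the squared singular values.
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]              # sort in descending order
sigma = np.sqrt(evals[order])
V = V[:, order]

# Recover the left singular vectors from A V = U Sigma.
U = (A @ V) / sigma

assert np.allclose(U.T @ U, np.eye(3))       # columns of U are orthonormal
assert np.allclose((U * sigma) @ V.T, A)     # reduced SVD: A = U Sigma V^T
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
```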
The existence and uniqueness of the SVD are given by the following theorem.
Theorem 20.2. Every matrix A ∈ Cm×n has a singular value decomposition. Further-
more, the singular values {σj} are uniquely determined, and, if A is square and the σj
are distinct, the left and right singular vectors {uj} and {vj} are uniquely determined up
to complex signs (i.e., complex scalar factors of absolute value 1).
Problem 20.1. Let A be a full column rank n × k matrix (so k ≤ n) and let b be a column vector of size n. We want to minimize the squared Euclidean norm L(x) = ‖Ax − b‖₂² with respect to x.
(a) Prove that, if rank(A) = k, then A^T A is invertible.
(b) Compute the gradient of L(x).
(c) Directly derive the normal equations by minimizing L(x), and then provide the closed-form expression for the x that minimizes L(x).
(d) We consider a QR factorization of A where Q is n × k and R is k × k. Show that an equivalent solution for x is x = R^{-1} Q^T b.
Solution.
(a) For the sake of contradiction, assume that A^T A is singular. Then there is x ≠ 0 such that A^T Ax = 0. Then x^T A^T Ax = 0, so ‖Ax‖² = 0, which implies, by the properties of the norm, Ax = 0. So x ∈ null A, which contradicts the fact that A has full rank, because for a full-rank matrix null A = {0}. We have proved that A^T Ax = 0 ⇒ x = 0; since A^T A is square, this means that A^T A is invertible.
(b) We can write L(x) as
L(x) = (Ax − b)^T(Ax − b) = (x^T A^T − b^T)(Ax − b) = x^T A^T Ax − x^T A^T b − b^T Ax + b^T b.
Since x^T A^T b is a scalar, x^T A^T b = (x^T A^T b)^T = b^T Ax, and hence
L(x) = x^T A^T Ax − 2b^T Ax + b^T b.
We use the following two propositions:
Proposition 1. Let the scalar α be defined by
α = y^T Ax,
where y is m × 1, x is n × 1, A is m × n, and A does not depend on x and y. Then
∂α/∂x = y^T A and ∂α/∂y = x^T A^T.
Proof. Define w^T = y^T A and note that α = w^T x. Hence
∂α/∂x = w^T = y^T A.
Since α is a scalar, we can write
α = α^T = x^T A^T y,
hence
∂α/∂y = x^T A^T.
Proposition 2. For the special case in which the scalar α is given by the quadratic form
α = x^T Ax,
where x is n × 1, A is n × n, and A does not depend on x, then
∂α/∂x = x^T(A + A^T).
Proof. By definition,
α = Σ_{j=1}^n Σ_{i=1}^n a_{ij} x_i x_j.
Differentiating with respect to the kth element of x we have
∂α/∂x_k = Σ_{j=1}^n a_{kj} x_j + Σ_{i=1}^n a_{ik} x_i
for all k = 1, . . . , n, and consequently
∂α/∂x = x^T A^T + x^T A = x^T(A^T + A).
Therefore,
∇L(x) = 2x^T A^T A − 2b^T A.
(c) Setting the gradient to zero gives x^T A^T A = b^T A, and transposing both sides yields the normal equation
A^T Ax = A^T b.
Since A^T A is invertible, the unique solution of the normal equations is
x = (A^T A)^{-1} A^T b.
(d) The QR factorization of A satisfies A = QR with Q^T Q = I. We claim that, since A has full rank, R is invertible: if not, there would exist some nonzero x with Rx = 0, which would imply Ax = QRx = 0, so dim null A > 0 and rank A < k, contradicting the fact that A has full rank. Since R is invertible, so is R^T, and hence
x = (R^T Q^T QR)^{-1} R^T Q^T b = (R^T R)^{-1} R^T Q^T b = R^{-1}(R^T)^{-1} R^T Q^T b = R^{-1} Q^T b.
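A brief NumPy sketch of part (d) (random A and b as stand-ins); solving via QR avoids forming A^T A, whose condition number is the square of that of A:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)

# Reduced QR factorization: A = QR, Q is 10x4 with Q^T Q = I, R is 4x4 upper triangular.
Q, R = np.linalg.qr(A)

# x = R^{-1} Q^T b solves the least-squares problem.
x_qr = np.linalg.solve(R, Q.T @ b)

# Agrees with the normal-equation solution x = (A^T A)^{-1} A^T b.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x_qr, x_ne)
```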
Problem 20.2. In this problem, R is the field of real numbers. Let (u1, . . . , um) be an orthonormal basis for a subspace W ≠ {0} of the vector space V = R^{n×1} (under the standard inner product), let U be the n × m matrix defined by U = [u1, . . . , um], and let P be the n × n matrix defined by P = UU^T.
(a) Prove that if v is any given member of V, then among all the vectors w in W, the one which minimizes ‖v − w‖ is given by
w = 〈v, u1〉u1 + · · ·+ 〈v, um〉um.
(This vector w is called the projection of v onto W.)
(b) Prove: for any vector x ∈ R^{n×1}, the projection w of x onto W is given by w = Px.
(c) Prove: P is a projection matrix. (Recall that a matrix P is called a projection
matrix if and only if P is symmetric and idempotent).
(d) If V = R3×1, and W = span[(1, 2, 2)T , (1, 0, 1)T ], find the projection matrix de-
scribed above and use it to find the projection of (2, 2, 2)T onto W .
Solution. (a)
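The written solution stops here, but part (d) can at least be verified numerically (a sketch; QR is used to orthonormalize the spanning set in place of hand Gram–Schmidt):

```python
import numpy as np

# Columns are the spanning vectors (1,2,2)^T and (1,0,1)^T of W.
A = np.array([[1.0, 1.0],
              [2.0, 0.0],
              [2.0, 1.0]])

# Orthonormal basis of W via reduced QR; P = U U^T is then the projection matrix.
U, _ = np.linalg.qr(A)
P = U @ U.T

assert np.allclose(P, P.T)       # P is symmetric
assert np.allclose(P @ P, P)     # P is idempotent

# Projection of (2,2,2)^T onto W.
x = np.array([2.0, 2.0, 2.0])
w = P @ x
assert np.allclose(w, np.array([14.0, 16.0, 22.0]) / 9)
```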
21. Matrix Differentiation
Proposition 3. Let
y = Ax,
where y is m × 1, x is n × 1, A is m × n, and A does not depend on x. Then
∂y/∂x = A.
Proof. Since the ith element of y is given by
y_i = Σ_{k=1}^n a_{ik} x_k,
it follows that
∂y_i/∂x_j = a_{ij}
for all i = 1, . . . ,m and j = 1, . . . , n. Hence
∂y/∂x = A.
Proposition 4. Let the scalar α be defined by
α = y^T Ax,
where y is m × 1, x is n × 1, A is m × n, and A does not depend on x and y. Then
∂α/∂x = y^T A and ∂α/∂y = x^T A^T.
Proof. Define w^T = y^T A and note that α = w^T x. Hence
∂α/∂x = w^T = y^T A.
Since α is a scalar, we can write
α = α^T = x^T A^T y,
hence
∂α/∂y = x^T A^T.
Proposition 5. For the special case in which the scalar α is given by the quadratic form
α = x^T Ax,
where x is n × 1, A is n × n, and A does not depend on x, then
∂α/∂x = x^T(A + A^T).
Proof. By definition,
α = Σ_{j=1}^n Σ_{i=1}^n a_{ij} x_i x_j.
Differentiating with respect to the kth element of x we have
∂α/∂x_k = Σ_{j=1}^n a_{kj} x_j + Σ_{i=1}^n a_{ik} x_i
for all k = 1, . . . , n, and consequently
∂α/∂x = x^T A^T + x^T A = x^T(A^T + A).
Proposition 6. For the special case where A is a symmetric matrix and
α = x^T Ax,
where x is n × 1, A is n × n, and A does not depend on x, then
∂α/∂x = 2x^T A.
Proposition 7. Let the scalar α be defined by
α = y^T x,
where y is n × 1, x is n × 1, and both y and x are functions of the vector z. Then
∂α/∂z = x^T (∂y/∂z) + y^T (∂x/∂z).
Proposition 8. Let the scalar α be defined by
α = x^T x,
where x is n × 1 and x is a function of the vector z. Then
∂α/∂z = 2x^T (∂x/∂z).
22. Miscellaneous
Problem 22.1. Show that if A = A^T with A a real matrix (that is, A is real symmetric, viewed over the field of complex numbers), then all the eigenvalues of A are real numbers.
Solution. Note that the eigenvalues of a real matrix may be complex. Let v be an eigenvector of A corresponding to the eigenvalue λ, normalized so that v∗v = 1. By v∗ we denote the conjugate transpose. Since A is symmetric and its entries are real, A∗ = A. Then
Av = λv and v∗A = λ̄v∗,
where the second equality is obtained by taking the conjugate transpose of the first and using A = A∗. Then
λ̄ = λ̄v∗v = (v∗A)v = v∗(Av) = v∗λv = λv∗v = λ.
Since λ̄ = λ, λ is real.
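A quick numerical confirmation (a random symmetric matrix as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(4, 4))
A = B + B.T                      # real symmetric

# Even the general (complex-capable) eigensolver returns eigenvalues
# with no imaginary part for a real symmetric matrix.
evals = np.linalg.eig(A)[0]
assert np.allclose(np.imag(evals), 0)
```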
Problem 22.2. Show that if BT = −B (i.e. B is skew symmetric), then all the eigen-
values of B are pure imaginary or zero. (B is a matrix with real coefficients.)
Solution. Let (λ, x) be an eigenpair of the skew-symmetric matrix B. Then
(22.1) Bx = λx.
If we multiply on the left by x^H, we get
(22.2) x^H Bx = λ x^H x.
Now we take the conjugate transpose of equation (22.1) and get x^H B^H = λ̄ x^H. Using the fact that B^H = B^T (since B is real) and B^T = −B (since B is skew-symmetric), this becomes x^H(−B) = λ̄ x^H; multiplying both sides by x on the right and rearranging gives
(22.3) x^H Bx = −λ̄ x^H x.
Since x is not zero, (22.2) and (22.3) imply that λ = −λ̄. Therefore λ is purely imaginary or zero.
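As with the symmetric case, this is easy to confirm numerically (a random skew-symmetric matrix as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(6)
C = rng.normal(size=(4, 4))
B = C - C.T                      # real skew-symmetric

# All eigenvalues are purely imaginary or zero: the real parts vanish.
evals = np.linalg.eig(B)[0]
assert np.allclose(np.real(evals), 0)
```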
Problem 22.3. Let A be a real matrix. A generalized inverse of a matrix A is any matrix
G such that AGA = A. Prove each of the following:
(a) If A is invertible, the unique generalized inverse of A is A−1.
(b) If G is a generalized inverse of (XTX), then
XGXTX = X.
(c) For any real symmetric matrix A, there exists a generalized inverse of A.
Solution. (a) Since
AA^{-1}A = IA = A,
A^{-1} is a generalized inverse of A. Now, if AGA = A, then
AG = AGAA^{-1} = AA^{-1} = I,
so G = A^{-1}, and hence the generalized inverse is unique.
(b) For an arbitrary vector v, we can write v = u + w, where u ∈ null X^T and w = Xλ lies in the column space of X (such a decomposition exists because the column space of X and null X^T are orthogonal complements). So X^T u = 0 ⇒ u^T X = 0. Then, using (X^T X)G(X^T X) = X^T X,
v^T XGX^T X = (u^T + λ^T X^T)XGX^T X = λ^T X^T XGX^T X = λ^T X^T X = w^T X = v^T X.
Since v is arbitrary, XGX^T X = X.
(c) Since A is real symmetric, it is diagonalizable: A = PΛP^T, where P is orthogonal and Λ is a real diagonal matrix with the eigenvalues λ1, . . . , λn on the diagonal. Let γ = (γ1, . . . , γn), where γ_i = 1/λ_i if λ_i ≠ 0 and γ_i = 0 if λ_i = 0. Let Γ be the diagonal matrix with γ along the diagonal, and let G = PΓP^T. Since P is orthogonal, P^T P = I. Thus
AGA = PΛP^T PΓP^T PΛP^T = PΛΓΛP^T = PΛP^T = A.
Thus G is a generalized inverse of A.
If A is not symmetric, we can use the SVD to find a generalized inverse, which here is also the pseudoinverse. Since every matrix has a singular value decomposition, let
A = UΣV^T,
where U and V have orthonormal columns, that is, V^T V = U^T U = I, and Σ = diag(σ1, . . . , σn). Let γ = (γ1, . . . , γn), where γ_i = 1/σ_i if σ_i ≠ 0 and γ_i = 0 if σ_i = 0. Let Γ be the diagonal matrix with γ along the diagonal, and define G = VΓU^T. Thus
AGA = UΣV^T VΓU^T UΣV^T = UΣΓΣV^T = UΣV^T = A.
Hence G is a generalized inverse of A.
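A sketch of this SVD construction (the rank-deficient matrix A is a random stand-in); the result coincides with NumPy's Moore–Penrose pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(7)
# A 4x3 matrix of rank 2, so A is genuinely non-invertible.
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# gamma_i = 1/sigma_i if sigma_i != 0 (here: above a round-off tolerance), else 0.
tol = 1e-10
gamma = np.array([1.0 / s if s > tol else 0.0 for s in sigma])

G = Vt.T @ np.diag(gamma) @ U.T   # G = V Gamma U^T

assert np.allclose(A @ G @ A, A)          # generalized-inverse property
assert np.allclose(G, np.linalg.pinv(A))  # matches the Moore-Penrose pseudoinverse
```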
23. List of Important Problems
Problem 1: Let F be a commutative field, let (V, +, ·) be a finite-dimensional vector space over F, and let U and W be two subspaces of V. Show that there exists a subspace S of V such that V = S ⊕ U and V = S ⊕ W if and only if dimU = dimW.
(1) rank(A+B) ≤ rankA + rankB.
(2) If a matrix A is symmetric, then it has real eigenvalues and the eigenvectors of A form a basis of the vector space; in other words, there is an orthonormal basis of V with respect to which A is diagonal. Since A has n linearly independent eigenvectors, there are no defective eigenvalues, that is, no generalized eigenvectors, and hence every factor in the minimal polynomial appears to the first power.