
General Analysis

Wen Jia Liu

April 4, 2019

Contents

Introduction

1 Normed vector spaces
  1.1 Norms and inner products
  1.2 Series and nets
  1.3 Continuous linear maps
  1.4 Products, quotients and duality
  1.5 Hilbert spaces
  1.6 Exterior algebra
  1.7 Normed algebras
  1.8 The regulated integral

2 Differentiation
  2.1 The derivative
  2.2 Higher derivatives
  2.3 Partial derivatives
  2.4 Inverse and implicit functions
  2.5 Maxima and minima
  2.6 Special functions
  2.7 Line integrals

3 Complex analysis
  3.1 Complex differentiation and integration
  3.2 Cauchy’s theorem and meromorphic functions
  3.3 Formal power series

4 Integration
  4.1 Measurable spaces and measures
  4.2 Measurable maps
  4.3 The integral
  4.4 The Lp spaces
  4.5 Duality
  4.6 Product measures
  4.7 The Cp, Cc, C0 spaces
  4.8 Radon measures
  4.9 Topological groups
  4.10 Haar measure
  4.11 Lebesgue measure

5 Commutative Banach algebras
  5.1 The spectrum
  5.2 Gelfand representation
  5.3 Commutative C*-algebras
  5.4 Positive operator-valued measures
  5.5 The spectral theorem
  5.6 Operators on real Hilbert spaces
  5.7 Compact operators
  5.8 Schatten class operators
  5.9 Fourier analysis
  5.10 Fourier analysis in vector spaces

6 Probability theory
  6.1 Probability, expectation and independence
  6.2 Linear models

Bibliography

Introduction

TODO

1. Goal

2. Assumed knowledge/prerequisites

3. Notation


1 Normed vector spaces

In this chapter, K denotes either the real or complex field. We write span(S) for the subspace generated by a set S, and in particular we write span(v1, . . . , vn) for the subspace generated by vectors v1, . . . , vn.

Notes

For more material on exterior algebra, which is discussed in Section 1.6, see [16] and [3].

1.1 Norms and inner products

Norms

Definition 1.1. Let V be a vector space over K. A norm on V is a map ‖·‖ : V → [0,∞) satisfying the following properties:

1. ‖v‖ ≥ 0 for all v ∈ V .

2. ‖v‖ = 0 if and only if v = 0.

3. ‖rv‖ = |r| ‖v‖ for all v ∈ V and r ∈ K.

4. ‖u+ v‖ ≤ ‖u‖+ ‖v‖ for all u, v ∈ V .

A seminorm on V is a map ‖·‖ : V → [0,∞) that satisfies properties (1), (3) and (4) of Definition 1.1, but not necessarily (2). A real or complex vector space V with a norm is called a normed vector space. Any subspace W of a normed vector space is also a normed vector space, with the norm restricted to W . It is easy to see that d(x, y) = ‖x − y‖ defines a metric (called the norm metric) on V , so any normed vector space is a metric space with this induced metric. Note that ‖·‖ is a continuous function under the norm metric. The associated topology on V is called the norm topology. A Banach space is a normed vector space that is complete with respect to the norm metric. If V is a normed vector space and W ⊆ V is a complete subspace, then W is closed in V . If V is a Banach space, then a subspace of V is a Banach space if and only if it is closed.

Lemma 1.2 (Reverse triangle inequality). Let ‖·‖ be a seminorm on V . For all u, v ∈ V ,

|‖u‖ − ‖v‖| ≤ ‖u − v‖ .

Proof. Since

‖u‖ = ‖v + (u − v)‖ ≤ ‖v‖ + ‖u − v‖

and

‖v‖ = ‖u + (v − u)‖ ≤ ‖u‖ + ‖u − v‖ ,

we have

−‖u − v‖ ≤ ‖u‖ − ‖v‖ ≤ ‖u − v‖ .

Theorem 1.3. Let V be a vector space over K and let ‖·‖ be a seminorm on V .

1. The set S = {v ∈ V : ‖v‖ = 0} is a subspace of V .

2. V/S is a normed vector space with the norm ‖v + S‖ = ‖v‖.

Proof. S is nonempty since ‖0‖ = ‖0 · 0‖ = 0 · ‖0‖ = 0. If u, v ∈ S and r, s ∈ K then

‖ru + sv‖ ≤ |r| ‖u‖ + |s| ‖v‖ = 0,

so ru + sv ∈ S. This proves (1). For (2), we can define a norm ‖·‖ : V/S → [0,∞) by setting ‖v + S‖ = ‖v‖. This is well-defined since u − v ∈ S implies that

|‖u+ S‖ − ‖v + S‖| = |‖u‖ − ‖v‖| ≤ ‖u− v‖ = 0.

It is easy to check that ‖·‖ is indeed a norm.

A linear map f : V → W between normed vector spaces is said to be an isometry if ‖fv‖ = ‖v‖ for all v ∈ V . It is clear that every isometry is injective, since fu = fv implies that

0 = ‖f(u − v)‖ = ‖u − v‖ .

If f is a bijection, then it is easy to see that f⁻¹ is also an isometry.



Inner products

Definition 1.4. Let V be a vector space over K. An inner product on V is a map 〈·, ·〉 : V × V → K satisfying the following properties:

1. 〈v, v〉 ≥ 0 for all v ∈ V and 〈v, v〉 = 0 if and only if v = 0.

2. (If K = C) $\langle u, v \rangle = \overline{\langle v, u \rangle}$ for all u, v ∈ V .

3. (If K = R) 〈u, v〉 = 〈v, u〉 for all u, v ∈ V .

4. 〈ru+ sv, w〉 = r 〈u,w〉+ s 〈v, w〉 for all u, v, w ∈ V and r, s ∈ K.

A real (or complex) vector space V along with an inner product on V is called a real (or complex) inner product space. Any subspace W of an inner product space is also an inner product space, with the inner product restricted to W × W . If K = R then properties (3) and (4) imply that the inner product is linear in its second argument, and therefore bilinear. If K = C then the inner product is conjugate linear in its second argument:

\[ \langle w, ru + sv \rangle = \overline{\langle ru + sv, w \rangle} = \overline{r \langle u, w \rangle + s \langle v, w \rangle} = \bar{r} \langle w, u \rangle + \bar{s} \langle w, v \rangle . \]

Since the inner product is linear in its first argument and conjugate linear in its second argument, we say that it is sesquilinear. A Hilbert space is a complete inner product space.

Theorem 1.5. Let V be an inner product space over K and let f : V → V be a linear map.

1. If 〈fv, w〉 = 0 for all v, w ∈ V , then f = 0.

2. If K = C and 〈fv, v〉 = 0 for all v ∈ V , then f = 0.

3. If 〈fv, w〉 = 〈v, fw〉 and 〈fv, v〉 = 0 for all v, w ∈ V , then f = 0. (Note that f satisfies the first condition if it is a self-adjoint operator on a Hilbert space. See Definition 1.102.)

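Proof. For (1), taking w = fv gives

〈fv, fv〉 = 0

for all v ∈ V , so fv = 0. For (2), let r ∈ C. Then

\[
\begin{aligned}
0 &= \langle f(rv + fv), rv + fv \rangle \\
  &= |r|^2 \langle fv, v \rangle + \langle f^2 v, fv \rangle + r \langle fv, fv \rangle + \bar{r} \langle f^2 v, v \rangle \\
  &= r \langle fv, fv \rangle + \bar{r} \langle f^2 v, v \rangle .
\end{aligned}
\]

Setting r = 1 gives

\[ \langle fv, fv \rangle + \langle f^2 v, v \rangle = 0 \]

and setting r = i gives

\[ i \langle fv, fv \rangle - i \langle f^2 v, v \rangle = 0. \]

Therefore 〈fv, fv〉 = 0. For (3), a similar argument with real r, using 〈f²v, v〉 = 〈fv, fv〉, shows that

\[ 0 = r \langle fv, fv \rangle + r \langle f^2 v, v \rangle = 2r \langle fv, fv \rangle , \]

so setting r = 1 gives 〈fv, fv〉 = 0.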
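The hypothesis K = C in part (2) cannot be dropped. As a small sanity check (our own illustration, not part of the original text), the sketch below uses rotation by 90 degrees on R² with the standard inner product: it satisfies 〈fv, v〉 = 0 for every v but is not the zero map, and it is not covered by part (3) because it is not symmetric. The helper names `inner` and `rotate90` are hypothetical.

```python
def inner(u, v):
    """Standard real inner product on R^2."""
    return u[0] * v[0] + u[1] * v[1]


def rotate90(v):
    """Rotation by 90 degrees: a nonzero linear map with <fv, v> = 0 for all v."""
    return (-v[1], v[0])


samples = [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0), (0.5, 4.0)]

# <fv, v> vanishes on every sample vector ...
quadratic_form_values = [inner(rotate90(v), v) for v in samples]

# ... but f is not symmetric, so part (3) does not apply:
e1, e2 = (1.0, 0.0), (0.0, 1.0)
left = inner(rotate90(e1), e2)   # <f e1, e2>
right = inner(e1, rotate90(e2))  # <e1, f e2>
```

Here `left` and `right` differ in sign, which is exactly the failure of the condition 〈fv, w〉 = 〈v, fw〉.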

An inner product on V induces a norm

\[ \|v\| = \sqrt{\langle v, v \rangle} \]

on V , so any inner product space is also a normed vector space (and therefore a metric space). We say that ‖v‖ is the norm or length of v. If ‖v‖ = 1, then we say that v is a unit vector. The following theorem shows that ‖·‖ is indeed a norm.

Theorem 1.6. Let V be an inner product space over K.

1. (Cauchy-Schwarz inequality). For all u, v ∈ V ,

|〈u, v〉| ≤ ‖u‖ ‖v‖ ,

with equality if and only if one of u and v is a scalar multiple of the other.

2. (Triangle inequality). For all u, v ∈ V ,

‖u+ v‖ ≤ ‖u‖+ ‖v‖ ,

with equality if and only if one of u and v is a nonnegative scalar multiple of the other.

3. (Parallelogram law). For all u, v ∈ V ,

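\[ \|u + v\|^2 + \|u - v\|^2 = 2 \|u\|^2 + 2 \|v\|^2 . \]

4. (Polarization identities). If K = R, then

\[ \langle u, v \rangle = \frac{1}{4} \left( \|u + v\|^2 - \|u - v\|^2 \right) . \]

If K = C, then

\[ \langle u, v \rangle = \frac{1}{4} \left( \|u + v\|^2 - \|u - v\|^2 + i \|u + iv\|^2 - i \|u - iv\|^2 \right) . \]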


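Proof. For (1), we can assume that v ≠ 0, for otherwise the result follows immediately. For all r ∈ K,

\[ 0 \le \langle u - rv, u - rv \rangle = \langle u, u \rangle - \bar{r} \langle u, v \rangle - r \left( \langle v, u \rangle - \bar{r} \langle v, v \rangle \right) . \]

If we set $\bar{r} = \langle v, u \rangle / \langle v, v \rangle$ then

\[ 0 \le \langle u, u \rangle - \frac{\langle v, u \rangle \langle u, v \rangle}{\langle v, v \rangle} = \|u\|^2 - \frac{|\langle u, v \rangle|^2}{\|v\|^2} . \]

Equality holds if and only if 〈u − rv, u − rv〉 = 0, which implies that u = rv. For (2), we have

\[
\begin{aligned}
\|u + v\|^2 &= \langle u + v, u + v \rangle \\
            &= \|u\|^2 + \langle u, v \rangle + \langle v, u \rangle + \|v\|^2 \\
            &\le \|u\|^2 + 2 \|u\| \|v\| + \|v\|^2 \\
            &= (\|u\| + \|v\|)^2 .
\end{aligned}
\]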
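The statements of Theorem 1.6 are easy to test numerically. The following sketch (our own illustration, with hypothetical helper names `inner`, `norm` and `add`) checks the Cauchy-Schwarz inequality, the parallelogram law and the complex polarization identity for two vectors in C³, using an inner product that is linear in the first argument and conjugate linear in the second, as in Definition 1.4.

```python
import math


def inner(u, v):
    """Inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def norm(u):
    """Norm induced by the inner product."""
    return math.sqrt(inner(u, u).real)


def add(u, v, scale=1):
    """Return u + scale * v."""
    return [a + scale * b for a, b in zip(u, v)]


u = [1 + 2j, -1j, 3 + 0j]
v = [2 - 1j, 4 + 0j, 1 + 1j]

# Cauchy-Schwarz: ||u|| ||v|| - |<u, v>| should be nonnegative.
cauchy_schwarz_gap = norm(u) * norm(v) - abs(inner(u, v))

# Parallelogram law: the difference of both sides should vanish.
parallelogram_gap = (norm(add(u, v)) ** 2 + norm(add(u, v, -1)) ** 2
                     - 2 * norm(u) ** 2 - 2 * norm(v) ** 2)

# Complex polarization identity: recovers <u, v> from norms alone.
polarization = (norm(add(u, v)) ** 2 - norm(add(u, v, -1)) ** 2
                + 1j * norm(add(u, v, 1j)) ** 2
                - 1j * norm(add(u, v, -1j)) ** 2) / 4
```

Up to floating-point rounding, `polarization` agrees with `inner(u, v)`.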

Corollary 1.7. Let V be an inner product space. The function 〈·, ·〉 : V × V → K is continuous.

Proof. This follows from the fact that

〈x, y〉 − 〈x0, y0〉 = 〈x − x0, y − y0〉 + 〈x − x0, y0〉 + 〈x0, y − y0〉

and

|〈x, y〉 − 〈x0, y0〉| ≤ |〈x − x0, y − y0〉| + |〈x − x0, y0〉| + |〈x0, y − y0〉|
≤ ‖x − x0‖ ‖y − y0‖ + ‖x − x0‖ ‖y0‖ + ‖x0‖ ‖y − y0‖ .

Corollary 1.8. Let V and W be inner product spaces. A linear map f : V → W is an isometry if and only if 〈fu, fv〉 = 〈u, v〉 for all u, v ∈ V .

Proof. This follows from the polarization identities.



Not all norms are induced by an inner product, since (for example) there are norms that do not satisfy the parallelogram law

\[ \|u + v\|^2 + \|u - v\|^2 = 2 \|u\|^2 + 2 \|v\|^2 . \]

It is a remarkable fact that the converse holds:

Theorem 1.9. Let V be a normed vector space. If the parallelogram law holds in V , then the appropriate polarization identity (in Theorem 1.6) defines an inner product that induces the norm on V .
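To see concretely that some norms fail the parallelogram law, the sketch below (an illustration of ours, not taken from the original text) evaluates the defect ‖u + v‖² + ‖u − v‖² − 2‖u‖² − 2‖v‖² for the standard basis vectors of R² under three norms. The function names are hypothetical.

```python
import math


def norm_1(x):
    """Sum of absolute values."""
    return sum(abs(t) for t in x)


def norm_2(x):
    """Euclidean norm, induced by the standard inner product."""
    return math.sqrt(sum(t * t for t in x))


def norm_sup(x):
    """Maximum of absolute values."""
    return max(abs(t) for t in x)


def parallelogram_defect(norm, u, v):
    """||u+v||^2 + ||u-v||^2 - 2||u||^2 - 2||v||^2; zero iff the law holds for u, v."""
    s = [a + b for a, b in zip(u, v)]
    d = [a - b for a, b in zip(u, v)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(u) ** 2 - 2 * norm(v) ** 2


u, v = (1.0, 0.0), (0.0, 1.0)
defect_1 = parallelogram_defect(norm_1, u, v)      # 4 + 4 - 2 - 2
defect_2 = parallelogram_defect(norm_2, u, v)      # vanishes up to rounding
defect_sup = parallelogram_defect(norm_sup, u, v)  # 1 + 1 - 2 - 2
```

Only the Euclidean norm gives a vanishing defect, so neither the 1-norm nor the sup norm on R² is induced by an inner product.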

Orthogonality

Definition 1.10. Let V be an inner product space.

1. Two vectors u, v ∈ V are orthogonal, written u ⊥ v, if 〈u, v〉 = 0.

2. Two sets X, Y ⊆ V are orthogonal, written X ⊥ Y , if 〈x, y〉 = 0 for all x ∈ X and y ∈ Y . We will write v ⊥ X instead of {v} ⊥ X.

3. The orthogonal complement of a set X ⊆ V is the subspace

X⊥ = {v ∈ V : v ⊥ X} .

Note that X ∩ X⊥ ⊆ {0} for any set X ⊆ V , and if X ⊆ Y then X⊥ ⊇ Y⊥.

4. A nonempty set O = {vα}α∈A ⊆ V is said to be an orthogonal set if vα ⊥ vβ for all α ≠ β. If each vα is also a unit vector, then we say that O is an orthonormal set.

Theorem 1.11. Any orthogonal set of nonzero vectors is linearly independent.

Proof. Let O be an orthogonal set of nonzero vectors. Suppose that

r1v1 + · · ·+ rnvn = 0

where v1, . . . , vn are distinct elements of O. Then

0 = r1 〈v1, vi〉+ · · ·+ rn 〈vn, vi〉 = ri 〈vi, vi〉

and ri = 0 for each i.

Theorem 1.12 (Gram-Schmidt process). Let v1, v2, . . . be a sequence of vectors in an inner product space. Define a sequence O = {u1, u2, . . . } by

\[ u_n = v_n - \sum_{i=1}^{n-1} r_{n,i} u_i \]

where u1 = v1 and

\[
r_{n,i} =
\begin{cases}
0, & u_i = 0, \\
\dfrac{\langle v_n, u_i \rangle}{\langle u_i, u_i \rangle}, & u_i \ne 0.
\end{cases}
\]

Then O is an orthogonal set with the property that

span(u1, . . . , un) = span(v1, . . . , vn)

for all n, and un = 0 if and only if vn ∈ span(v1, . . . , vn−1).
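The recursion in Theorem 1.12 translates directly into code. The following is a minimal sketch for vectors in Qⁿ with the standard inner product, using exact rational arithmetic so that the case un = 0 is detected reliably; the function names `inner` and `gram_schmidt` are our own and not part of the text.

```python
from fractions import Fraction


def inner(u, v):
    """Standard inner product on Q^n."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Return u_1, u_2, ... with u_n = v_n - sum_{i<n} r_{n,i} u_i."""
    us = []
    for v in vectors:
        u = list(v)
        for w in us:
            ww = inner(w, w)
            if ww != 0:  # r_{n,i} = 0 when u_i = 0, so zero vectors are skipped
                r = inner(v, w) / ww
                u = [a - r * b for a, b in zip(u, w)]
        us.append(u)
    return us


rows = ([1, 1, 0], [1, 0, 1], [2, 1, 1], [0, 1, 1])
vs = [[Fraction(x) for x in row] for row in rows]
us = gram_schmidt(vs)
```

In this example v3 = v1 + v2 lies in span(v1, v2), so u3 is the zero vector, while u1, u2, u4 are nonzero and pairwise orthogonal.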

Lemma 1.13. Let V be an inner product space over K, let v ∈ V , and let S be a subspace of V . If there is a C ≥ 0 such that

Re 〈v, s〉 ≤ C 〈s, s〉

for all s ∈ S, then v ⊥ S. In particular, if Re 〈v, u〉 ≤ C 〈u, u〉 for all u ∈ V then v = 0.

Proof. We have Re 〈v, cs〉 ≤ C |c|² ‖s‖² for all c ∈ K. Setting c = r and then c = −r for r > 0 gives

|Re 〈v, s〉| ≤ rC ‖s‖² → 0

as r → 0, so Re 〈v, s〉 = 0. If K = C then setting c = ir and then c = −ir for r > 0 gives

|Im 〈v, s〉| ≤ rC ‖s‖² → 0

as r → 0, so Im 〈v, s〉 = 0.

If V = S ⊕ T (i.e. S + T = V and S ∩ T = {0}) and S ⊥ T , then we say that V is the orthogonal direct sum of S and T and we write V = S ⊙ T .

Theorem 1.14 (Uniqueness of orthogonal direct sums). Let V be an inner product space and let S, T, T′ be subspaces of V .

1. If V = S ⊙ T then T = S⊥.

2. If S ⊙ T = S ⊙ T′ then T = T′.

Proof. If V = S ⊙ T then S ⊥ T , so T ⊆ S⊥. If x ∈ S⊥ then x = s + t for some s ∈ S and t ∈ T , so

〈s, s〉 = 〈s, s〉 + 〈s, t〉 = 〈s, x〉 = 0.

Hence s = 0 and x = t ∈ T . Therefore S⊥ ⊆ T .



1.2 Series and nets

Let V be a normed vector space and let {a_k} be a sequence in V . The nth partial sum of {a_k} is defined as

\[ s_n = \sum_{k=1}^{n} a_k . \]

If the sequence {s_n} of partial sums converges to a vector s ∈ V , then we say that the series ∑ a_k converges to s and we write

\[ \sum_{k=1}^{\infty} a_k = s. \]

We say that ∑ a_k diverges if it does not converge. If the series

\[ \sum_{k=1}^{\infty} \|a_k\| \]

converges, then we say that ∑ a_k is absolutely convergent.

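Theorem 1.15. Let V be a normed vector space and let ∑ a_n be a series in V .

1. If ∑ a_n converges, then for every ε > 0 there exists an integer N such that

\[ \left\| \sum_{k=m}^{n} a_k \right\| < \varepsilon \]

for all n ≥ m ≥ N . If V is complete, then the converse holds.

2. If ∑ a_n converges, then ‖a_n‖ → 0 as n → ∞.

Proof. The sequence {s_n} of partial sums converges, so it is a Cauchy sequence. This gives the first claim in (1), and (2) follows by taking m = n. If V is complete and the condition in (1) holds, then {s_n} is a Cauchy sequence and therefore converges.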

Theorem 1.16. Let V be a normed vector space. Then V is complete (i.e. a Banach space) if and only if every absolutely convergent series is convergent.

Proof. Suppose that V is complete and let ∑ a_k be an absolutely convergent series. The sequence {s_n} of partial sums is a Cauchy sequence since

\[ \|s_m - s_n\| = \left\| \sum_{k=m+1}^{n} a_k \right\| \le \sum_{k=m+1}^{n} \|a_k\| \]

whenever n ≥ m, so {s_n} converges to some s ∈ V . Conversely, suppose that every absolutely convergent series is convergent and let {x_n} be a Cauchy sequence in V . Choose n_1 < n_2 < · · · such that ‖x_m − x_n‖ < 2^{−j} for m, n ≥ n_j . Then

\[ x_{n_1} + \sum_{j=1}^{k} (x_{n_{j+1}} - x_{n_j}) = x_{n_{k+1}} \]

and

\[ \sum_{j=1}^{\infty} \left\| x_{n_{j+1}} - x_{n_j} \right\| \le \sum_{j=1}^{\infty} 2^{-j} < \infty, \]

so $\sum_{j=1}^{\infty} (x_{n_{j+1}} - x_{n_j})$ converges. Therefore the subsequence {x_{n_k}} converges to some x ∈ V , and x_n → x since {x_n} is a Cauchy sequence.

Using absolute convergence, we have a number of convergence tests for infinite series in Banach spaces.

Theorem 1.17 (Comparison test). Let V be a Banach space, let {a_n} be a sequence in V , and let {c_n} be a sequence of real numbers. If ‖a_n‖ ≤ c_n for all n and ∑ c_n converges, then ∑ a_n converges.

Proof. The sequence {s_n} of partial sums of ∑ a_n is a Cauchy sequence since

\[ \|s_m - s_n\| = \left\| \sum_{k=m+1}^{n} a_k \right\| \le \sum_{k=m+1}^{n} c_k \]

whenever n ≥ m.

Theorem 1.18 (Root test). Let V be a Banach space and let {a_n} be a sequence in V . Let

\[ \alpha = \limsup_{n \to \infty} \sqrt[n]{\|a_n\|} . \]

1. If α < 1, then ∑ a_n converges.

2. If α > 1, then ∑ a_n diverges.

Proof. If α < 1, choose β with α < β < 1 and choose an integer N such that $\sqrt[n]{\|a_n\|} < \beta$ for all n ≥ N . Then ‖a_n‖ < β^n for all n ≥ N , and the comparison test shows that ∑ a_n converges. If α > 1 then ‖a_n‖ > 1 for infinitely many values of n, so ‖a_n‖ does not converge to 0 and ∑ a_n cannot converge.
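As a quick illustration of the root test (our own example, with V = R), consider a_n = n/2ⁿ. The nth roots of |a_n| tend to 1/2 < 1, so the series converges; its sum is in fact 2. The sketch below evaluates one late nth root and a partial sum. The name `a` is just the sequence of the example.

```python
def a(n):
    """Terms of the example series: a_n = n / 2^n."""
    return n / 2 ** n


# The nth root of |a_n| for a large n; it approaches 1/2 from above.
nth_root = a(1000) ** (1 / 1000)

# A partial sum; the full series sums to 2.
partial_sum = sum(a(n) for n in range(1, 61))
```

The value of `nth_root` is only an approximation of the lim sup, so it supports but does not prove α = 1/2.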

Theorem 1.19 (Limit test). Let V be a Banach space and let {a_n} be a sequence in V with a_n ≠ 0 for all n.

1. If lim sup_{n→∞} ‖a_{n+1}‖ / ‖a_n‖ < 1, then ∑ a_n converges.

2. If there is an integer N such that ‖a_{n+1}‖ / ‖a_n‖ ≥ 1 for all n ≥ N , then ∑ a_n diverges.

Proof. Suppose (1) holds. Choose some β < 1 and an integer N such that

\[ \frac{\|a_{n+1}\|}{\|a_n\|} < \beta \]

for all n ≥ N . Then

\[ \|a_n\| < \beta \|a_{n-1}\| < \cdots < \beta^{n-N} \|a_N\| \]

for all n > N , so ∑ a_n converges by the comparison test. If (2) holds then ‖a_{n+1}‖ ≥ ‖a_n‖ for all n ≥ N , so ‖a_n‖ ≥ ‖a_N‖ > 0 for all n ≥ N . Therefore ‖a_n‖ does not converge to 0 and ∑ a_n cannot converge.

Rearrangements

If a′_n = a_{σ(n)} for some bijection σ : {1, 2, . . . } → {1, 2, . . . }, we say that ∑ a′_n is a rearrangement of ∑ a_n. If every rearrangement of ∑ a_n converges to the same value, we say that ∑ a_n is unconditionally convergent.

Theorem 1.20. Let ∑ a_n be a convergent series in a normed vector space. If ∑ a_n is absolutely convergent, then it is unconditionally convergent.

Proof. Let ∑ a_{σ(n)} be a rearrangement of ∑ a_n. For any ε > 0, there is an integer N such that

\[ \sum_{k=n}^{\infty} \|a_k\| < \varepsilon \]

for all n ≥ N . Choose M such that {1, . . . , N} ⊆ σ({1, . . . , M}). Then for all n > M we have

\[ \left\| \sum_{k=1}^{n} a_k - \sum_{k=1}^{n} a_{\sigma(k)} \right\| < \varepsilon \]

since (at least) the terms a_1, . . . , a_N cancel. Therefore ∑ a_{σ(n)} converges to the same value as ∑ a_n.

Theorem 1.21. Every unconditionally convergent series in a finite-dimensional normed vector space V is absolutely convergent.

Proof. The case V = R follows from the Riemann series theorem, and the general case follows easily.
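The alternating harmonic series is the standard example of a real series that converges but not absolutely, and hence (by Theorem 1.21) not unconditionally. The sketch below, which is our own illustration, compares its usual partial sums with those of a specific rearrangement that takes two positive terms followed by one negative term; this rearrangement is known to converge to (3/2) log 2 instead of log 2. The function names are hypothetical.

```python
import math


def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def rearranged(n_blocks):
    """Partial sum of the rearrangement 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ..."""
    total = 0.0
    for j in range(1, n_blocks + 1):
        # Block j: the two next odd reciprocals, then the next even reciprocal.
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total


usual_value = alternating_harmonic(200_000)
rearranged_value = rearranged(100_000)
```

The two values differ by roughly (1/2) log 2, so the sum genuinely depends on the order of the terms.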



Uniform convergence

Recall that a sequence of functions {f_n} from a topological space X to a metric space Y is said to converge uniformly to a function f : X → Y if for every ε > 0 there exists an integer N such that d(f_n(x), f(x)) < ε for all x ∈ X and n ≥ N . Equivalently, f_n → f uniformly if and only if

\[ \sup_{x \in X} d(f_n(x), f(x)) \to 0 \]

as n → ∞. We say that {f_n} is uniformly Cauchy if for every ε > 0 there exists an integer N such that d(f_m(x), f_n(x)) < ε for all x ∈ X and m, n ≥ N . Equivalently, {f_n} is uniformly Cauchy if and only if

\[ \sup_{x \in X} d(f_m(x), f_n(x)) \to 0 \]

as m, n → ∞. We have two related definitions:

1. We say that {f_n} converges locally uniformly to a function f : X → Y if for every x ∈ X there exists a neighborhood U ⊆ X of x on which f_n → f uniformly.

2. We say that {f_n} converges compactly (or uniformly on compact sets) to a function f : X → Y if f_n → f uniformly on K for every compact K ⊆ X.

If each f_n maps into a normed vector space, then we say that a series ∑ f_n converges uniformly on X if the sequence {s_n} of partial sums defined by

\[ s_n(x) = \sum_{k=1}^{n} f_k(x) \]

converges uniformly on X.

Theorem 1.22 (Uniform limit theorem). Let {f_n} be a sequence of continuous functions on X. If f_n → f uniformly, then f is continuous.

Proof. Let x0 ∈ X and let ε > 0. Choose an integer n such that d(f_n(x), f(x)) < ε for all x ∈ X. Since f_n is continuous, there exists a neighborhood U of x0 such that d(f_n(x), f_n(x0)) < ε for all x ∈ U . Then

d(f(x), f(x0)) ≤ d(f(x), f_n(x)) + d(f_n(x), f_n(x0)) + d(f_n(x0), f(x0)) < 3ε

for all x ∈ U .



Theorem 1.23 (Cauchy criterion for uniform convergence). A sequence of functions {f_n} into a complete metric space converges uniformly on X if and only if for every ε > 0 there exists an integer N such that

d(f_m(x), f_n(x)) < ε (*)

for all x ∈ X and m, n ≥ N .

Proof. If f_n → f uniformly, then there is an integer N such that d(f_n(x), f(x)) < ε for all x ∈ X and n ≥ N , so

d(f_m(x), f_n(x)) ≤ d(f_m(x), f(x)) + d(f(x), f_n(x)) < 2ε

for all x ∈ X and m, n ≥ N . Conversely, suppose that the condition in (*) holds. Since {f_n(x)} is a Cauchy sequence for every x ∈ X, the sequence {f_n} converges pointwise to a function f . Let ε > 0 and choose N so that (*) holds. Fix x ∈ X and n ≥ N , and let δ > 0. Since f_m → f pointwise, we can choose an integer M ≥ N such that d(f_m(x), f(x)) < δ for all m ≥ M . Then, for any m ≥ M ,

d(f_n(x), f(x)) ≤ d(f_n(x), f_m(x)) + d(f_m(x), f(x)) < ε + δ.

Since δ was arbitrary, we have d(f_n(x), f(x)) ≤ ε. Therefore f_n → f uniformly.

Corollary 1.24 (Weierstrass M-test). Let {f_n} be a sequence of functions into a Banach space, and let {M_n} be a sequence of real numbers. If ∑ M_n converges and

‖f_n(x)‖ ≤ M_n

for all x ∈ X and all n, then ∑ f_n converges uniformly.

Proof. Let ε > 0 and choose an integer N such that

\[ \left\| \sum_{k=m}^{n} f_k(x) \right\| \le \sum_{k=m}^{n} M_k < \varepsilon \]

for all x ∈ X and n ≥ m ≥ N . By Theorem 1.23, ∑ f_n converges uniformly.
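A concrete instance of the M-test (our own example, not from the text) is the series ∑ sin(kx)/k² on R, with M_k = 1/k². The sketch below checks on a finite grid that the difference between two partial sums is bounded by the corresponding tail of ∑ M_k, which is the estimate used in the proof. The grid only samples X = [0, 2π], so this is a sanity check, not a proof of uniform convergence.

```python
import math


def partial_sum(x, n):
    """s_n(x) = sum_{k=1}^n sin(kx) / k^2."""
    return sum(math.sin(k * x) / k ** 2 for k in range(1, n + 1))


def tail_bound(m, n):
    """sum_{k=m+1}^n M_k with M_k = 1 / k^2."""
    return sum(1 / k ** 2 for k in range(m + 1, n + 1))


grid = [2 * math.pi * i / 200 for i in range(201)]
m, n = 50, 400

# Largest observed value of |s_n(x) - s_m(x)| over the sample points.
sup_difference = max(abs(partial_sum(x, n) - partial_sum(x, m)) for x in grid)
bound = tail_bound(m, n)
```

The bound does not depend on x, which is what makes the convergence uniform.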

Theorem 1.25. Let {f_n} be a sequence of functions on X. If X is locally compact, then f_n → f locally uniformly if and only if f_n → f compactly.

Proof. If f_n → f locally uniformly and K ⊆ X is compact, then we can cover K by a finite number of open sets on which f_n → f uniformly. Conversely, if f_n → f compactly and x ∈ X, then we can choose a relatively compact neighborhood U of x, so that f_n → f uniformly on the compact set Ū (the closure of U) and hence on U .


Convergence of nets

A directed set is a nonempty set A along with a reflexive and transitive relation ≥ such that for any α, β ∈ A, there exists some γ ∈ A such that γ ≥ α and γ ≥ β. For example, N = {1, 2, . . . } is a directed set with the usual relation ≥, since for any a, b ∈ N we have c ≥ a and c ≥ b if c = max(a, b) ∈ N. Let X be a topological space, let x ∈ X, and let 𝒰 be the set of all neighborhoods of x. Then 𝒰 is a directed set with the relation ⊆, since for any U, V ∈ 𝒰 we have W ⊆ U and W ⊆ V if W = U ∩ V ∈ 𝒰.

A net is a function from a directed set A to a topological space X, usually written (xα) or (xα)α∈A where α ∈ A and xα ∈ X. If Y ⊆ X, we say that (xα) is eventually in Y if there exists some α ∈ A such that xβ ∈ Y for all β ≥ α. If x ∈ X and (xα) is eventually in U whenever U is a neighborhood of x, we say that (xα) converges to x or that the limit of (xα) is x, and we write xα → x.

Theorem 1.26 (Properties of limits). Let X,Y be topological spaces.

1. A map f : X → Y is continuous if and only if for every x ∈ X we have f(xα) → f(x) whenever (xα) is a net with xα → x.

2. If X is Hausdorff, then limits are unique: if xα → x and xα → x′, then x = x′.

3. If E ⊆ X, then x ∈ Ē (the closure of E) if and only if there is a net (xα) in E (i.e. xα ∈ E for all α) such that xα → x.

4. If E ⊆ X, then E is closed if and only if xα → x implies x ∈ E for any net (xα) in E.

5. If X is a normed vector space, then xα → x if and only if for every ε > 0 there exists some α such that ‖xβ − x‖ < ε for all β ≥ α.

Let V be a normed vector space and let A be a set. A series is an expression ∑_{α∈A} xα where xα ∈ V . Let P0(A) be the set of all finite subsets of A. This is a directed set under the relation ⊇, since for any S, T ∈ P0(A) we have U ⊇ S and U ⊇ T if U = S ∪ T . The net (s_S) of partial sums is defined by

\[ s_S = \sum_{\alpha \in S} x_\alpha . \]

If (s_S) converges to a vector s ∈ V , then we say that the series ∑_{α∈A} xα converges to s and we write

\[ \sum_{\alpha \in A} x_\alpha = s. \]


Alternatively, ∑_{α∈A} xα converges to s if and only if for every ε > 0 there exists a finite set S ⊆ A such that

\[ \left\| \sum_{\alpha \in T} x_\alpha - s \right\| < \varepsilon \]

for all finite sets T with S ⊆ T ⊆ A. We have analogs of Theorem 1.15 and Theorem 1.16:

Theorem 1.27. Let V be a normed vector space and let ∑_{α∈A} xα be a series in V . If ∑_{α∈A} xα converges, then for every ε > 0 there exists a finite set S ⊆ A such that

\[ \left\| \sum_{\alpha \in T} x_\alpha \right\| < \varepsilon \]

whenever T ⊆ A is finite and S ∩ T = ∅. If V is complete, then the converse holds.

A series ∑_{α∈A} xα is absolutely convergent if ∑_{α∈A} ‖xα‖ converges.

Corollary 1.28. Let V be a normed vector space. Then V is complete (i.e. a Banach space) if and only if every absolutely convergent series is convergent.

Corollary 1.29. Let V be a normed vector space and let ∑_{α∈A} xα be a convergent series in V . Then xα ≠ 0 for only countably many α.

Proof. For each n ≥ 1, choose a finite set S_n ⊆ A such that

\[ \left\| \sum_{\alpha \in T} x_\alpha \right\| < \frac{1}{n} \]

whenever T ⊆ A is finite and S_n ∩ T = ∅. Let $S = \bigcup_{n=1}^{\infty} S_n$, which is countable. If α ∉ S then S_n ∩ {α} = ∅ and ‖xα‖ < 1/n for all n, i.e. xα = 0.

The preceding result relies on the fact that V is first countable; it does not hold when V is an arbitrary abelian topological group.

Theorem 1.30. Let V be a Banach space and let {x_n} be a sequence in V . The following are equivalent:

1. ∑_{n∈N} x_n converges (as a net) to x.

2. $\sum_{n=1}^{\infty} x_n$ converges unconditionally to x.

Theorem 1.31. Let {xα}α∈A be a set of nonnegative real numbers. Then

\[ \sum_{\alpha \in A} x_\alpha = \sup_{S} \sum_{\alpha \in S} x_\alpha \]

if either expression is finite, where the sup is taken over all finite sets S ⊆ A.


1.3 Continuous linear maps

From now on, we write |x| instead of ‖x‖ for the norm of a vector x. Let E and F be normed vector spaces over K.

A linear map f : E → F is called bounded if there exists some C such that |fx| ≤ C |x| for all x ∈ E. The number C is called a bound for f . Clearly, every linear isometry is bounded (with 1 as a bound).

Theorem 1.32. Let f : E → F be a linear map. The following are equivalent:

1. f is bounded.

2. f is continuous.

3. f is continuous at some point x0 ∈ E.

Proof. If C is a bound for f then |fx − fy| = |f(x − y)| ≤ C |x − y|, so f is continuous. This proves (1) ⇒ (2). The implication (2) ⇒ (3) is obvious. Suppose that f is continuous at some x0 ∈ E. Choose some δ > 0 such that |fx − fx0| ≤ 1 whenever |x − x0| ≤ δ. Then for all x ≠ 0,

\[ |fx| = \frac{|x|}{\delta} \left| f\left( \frac{\delta x}{|x|} + x_0 \right) - f x_0 \right| \le \frac{|x|}{\delta} . \]

This proves (3) ⇒ (1).

We define the norm of a bounded linear map by

|f | = inf {C ∈ R : |fx| ≤ C |x| for all x ∈ E} .

This norm is often called the operator norm. It is easy to verify the following equalities:

\[ |f| = \sup_{x \ne 0} \frac{|fx|}{|x|} = \sup_{|x| = 1} |fx| = \sup_{|x| \le 1} |fx| . \]

Theorem 1.33. Let E, F, G be normed vector spaces, and let f, g : E → F and h : F → G be continuous linear maps.


1. |f | = 0 if and only if f = 0.

2. |rf | = |r| |f | for all r ∈ K.

3. |f + g| ≤ |f |+ |g|.

4. |h ∘ g| ≤ |h| |g|.

The preceding result shows that the operator norm is a norm on the space L(E,F ) of continuous linear maps from E to F , so L(E,F ) is a normed vector space.

Theorem 1.34. If F is complete, then L(E,F ) is complete.

Proof. Let {f_n} be a Cauchy sequence in L(E,F ) and define a map f : E → F as follows: if x ∈ E and ε > 0 then there exists some N such that

|f_m(x) − f_n(x)| = |(f_m − f_n)(x)| ≤ |f_m − f_n| |x| < ε |x|

for all m, n ≥ N , so {f_n(x)} is a Cauchy sequence in F and we can define f(x) to be the value that f_n(x) converges to. It is clear that f is linear. If we can show that f_n → f uniformly on the unit sphere of E, then f is bounded (and therefore continuous) since each f_n is. Let ε > 0 and choose N so that |f_m − f_n| < ε for all m, n ≥ N . For any x ∈ E with |x| = 1 there exists some m ≥ N such that |f(x) − f_m(x)| < ε, so

|(f − f_n)(x)| ≤ |f(x) − f_m(x)| + |f_m(x) − f_n(x)| < 2ε

for all n ≥ N . This implies that |f − f_n| ≤ 2ε.

If f : E → F is both a homeomorphism and a (linear) isomorphism, then we say that f is a linear homeomorphism. If f is both an isometry and an isomorphism, then we say that f is an isometric isomorphism; it is clear that every isometric isomorphism is also a linear homeomorphism.

Theorem 1.35. If E and F are linearly homeomorphic normed vector spaces, then E is complete if and only if F is complete.

Proof. Assume that F is complete. Let ϕ : E → F be a linear homeomorphism and let {x_n} be a Cauchy sequence in E. Then

|ϕx_m − ϕx_n| = |ϕ(x_m − x_n)| ≤ |ϕ| |x_m − x_n| ,

so {ϕx_n} is a Cauchy sequence in F and converges to some y ∈ F since F is complete. But

|x_n − ϕ⁻¹y| = |ϕ⁻¹(ϕx_n − y)| ≤ |ϕ⁻¹| |ϕx_n − y| ,

so x_n → ϕ⁻¹y. The converse follows by applying the same argument to ϕ⁻¹.


Corollary 1.36. If E is complete and f : E → F is a linear isometry, then f(E) is closed in F .

Proof. If we restrict the codomain of f to f(E) then f is an isometric isomorphism, so f(E) is complete and therefore closed.

We can repeat the above construction for multilinear maps. If E1, . . . , Ek, F are normed vector spaces, then a multilinear map f : E1 × · · · × Ek → F is called bounded if there exists a C > 0 such that

|f(x1, . . . , xk)| ≤ C |x1| · · · |xk|

for all xi ∈ Ei. Let L(E1, . . . , Ek;F ) be the space of all continuous multilinear maps from E1 × · · · × Ek to F . (We give E1 × · · · × Ek the product topology.)

Theorem 1.37. Let E1, . . . , Ek, F be normed vector spaces.

1. A multilinear map f : E1 × · · · × Ek → F is continuous if and only if it is bounded. In that case, we define the norm of f by

\[
\begin{aligned}
|f| &= \inf \{ C \in \mathbb{R} : |f(x_1, \dots, x_k)| \le C |x_1| \cdots |x_k| \text{ for all } x_i \in E_i \} \\
    &= \sup_{x_1, \dots, x_k \ne 0} \frac{|f(x_1, \dots, x_k)|}{|x_1| \cdots |x_k|} \\
    &= \sup_{|x_i| = 1} |f(x_1, \dots, x_k)| .
\end{aligned}
\]

2. L(E1, . . . , Ek;F ) is a normed vector space under the norm defined in (1).

3. If F is complete, then L(E1, . . . , Ek;F ) is complete.

4. The map ϕ : L(E1, . . . , Ek;F )→ L(E1, L(. . . , L(Ek, F )) . . . ) given by

(ϕf)(x1) · · · (xk) = f(x1, . . . , xk)

is an isometric isomorphism.

Finite-dimensional spaces

Recall that two metrics d1 and d2 on a set X are said to be equivalent if there exist c, C > 0 such that

c d2(x, y) ≤ d1(x, y) ≤ C d2(x, y)

for all x, y ∈ X. In that case, d1 and d2 induce the same topology on X. We say that two norms |·|1 and |·|2 on E are equivalent if there exist c, C > 0 such that

c |x|2 ≤ |x|1 ≤ C |x|2

for all x ∈ E. It is obvious that equivalent norms induce equivalent metrics. Furthermore, a Cauchy sequence under one norm is also a Cauchy sequence under an equivalent norm.

Theorem 1.38. Let E be a vector space, and let |·|1 and |·|2 be two norms on E. The following are equivalent:

1. |·|1 and |·|2 are equivalent.

2. |·|1 and |·|2 induce equivalent metrics on E.

3. |·|1 and |·|2 induce the same topology on E.

Proof. (1) ⇒ (2) and (2) ⇒ (3) are clear. Suppose that (3) holds. Write E1 for E equipped with |·|1, and E2 for E equipped with |·|2. Since E1 and E2 have the same topology, the identity maps ι1 : E1 → E2 and ι2 : E2 → E1 are continuous. Then

|x|2 = |ι1x|2 ≤ |ι1| |x|1 and |x|1 = |ι2x|1 ≤ |ι2| |x|2 ,

so |·|1 and |·|2 are equivalent. This proves (3)⇒ (1).

The following theorem shows that the algebraic structure of a finite-dimensional vector space uniquely determines a natural topology.

Theorem 1.39. If E is a finite-dimensional vector space, then all norms on E are equivalent.

Proof. Let e1, . . . , en be a basis for E as a real vector space; we can write any x ∈ E in the form x = a1e1 + · · · + anen where a1, . . . , an ∈ R. If |·|∗ is a norm on E then

\[
\begin{aligned}
|x|_* &\le |a_1| |e_1|_* + \cdots + |a_n| |e_n|_* \\
      &\le \max \{ |e_1|_*, \dots, |e_n|_* \} \, (|a_1| + \cdots + |a_n|) \\
      &\le \sqrt{n} \, \max \{ |e_1|_*, \dots, |e_n|_* \} \sqrt{a_1^2 + \cdots + a_n^2},
\end{aligned}
\]

so the map

(a1, . . . , an) ↦ |a1e1 + · · · + anen|∗

is continuous on R^n. Let |·|1 and |·|2 be norms on E. Then the map f : R^n \ {0} → R given by

\[ (a_1, \dots, a_n) \mapsto \frac{|a_1 e_1 + \cdots + a_n e_n|_1}{|a_1 e_1 + \cdots + a_n e_n|_2} \]

is continuous, and since the unit sphere S = {x ∈ R^n : |x| = 1} is compact, f attains a minimum c and a maximum C on S, with c > 0 because f takes only positive values. Since f(ra) = f(a) for all r ≠ 0, it follows that

c |x|2 ≤ |x|1 ≤ C |x|2

for all x ∈ E. Therefore |·|1 and |·|2 are equivalent.
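For two familiar norms on Rⁿ the constants in Theorem 1.39 are known explicitly: the Euclidean norm |x|₂ and the 1-norm |x|₁ satisfy |x|₂ ≤ |x|₁ ≤ √n |x|₂. The sketch below, which is our own illustration with hypothetical function names, samples random vectors in R⁵ and checks that the ratio |x|₁ / |x|₂ stays between 1 and √5.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 5


def norm_1(x):
    """Sum of absolute values."""
    return sum(abs(t) for t in x)


def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))


# Ratios |x|_1 / |x|_2 for random nonzero vectors; by homogeneity it would
# suffice to sample the unit sphere, as in the proof above.
ratios = []
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(n)]
    ratios.append(norm_1(x) / norm_2(x))

smallest, largest = min(ratios), max(ratios)
```

Random sampling does not find the extreme values exactly; it only confirms that the observed ratios lie inside the interval [1, √n] given by the exact constants.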

Corollary 1.40. Let E, F be normed vector spaces and let f : E → F be a linear map. If E is finite-dimensional, then f is continuous.

Proof. Let e1, . . . , en be a basis for E and let

M = max {|f(e1)| , . . . , |f(en)|} .

Define a norm |·|1 on E by

|a1e1 + · · · + anen|1 = |a1| + · · · + |an| .

By Theorem 1.39, there exists some C > 0 such that |x|1 ≤ C |x| for all x ∈ E. Then for x = a1e1 + · · · + anen,

\[ |f(x)| = |a_1 f(e_1) + \cdots + a_n f(e_n)| \le M |x|_1 \le CM |x| , \]

so f is bounded.

Corollary 1.41. Let E be an n-dimensional real normed vector space. Then E is linearly homeomorphic to R^n, and E is complete.

Proof. Let e1, . . . , en be a basis for E. The isomorphism ϕ : R^n → E given by (a1, . . . , an) ↦ a1e1 + · · · + anen is a homeomorphism since E and R^n are both finite-dimensional. Completeness follows from Theorem 1.35.

Corollary 1.42 (Heine-Borel property). If E is finite-dimensional, then every closed and bounded subset of E is compact. In particular, the unit sphere S = {x ∈ E : |x| = 1} is compact.

Theorem 1.43. Let E be a normed vector space over K and let F be a subspace of E.


1. If E is finite-dimensional, then F is closed in E.

2. If F is finite-dimensional, then F is closed in E.

Proof. For (1), Corollary 1.41 shows that F is complete, and therefore F is closed in E. For (2), let {x_n} be a sequence in F that converges to some x ∈ E. Then F is closed in F + Kx by part (1), so x ∈ F .

Theorem 1.44. Let E be a normed vector space. If E is locally compact, then E is finite-dimensional.

Proof. Choose a relatively compact neighborhood U of 0 ∈ E, and write Ū for the closure of U . Since Ū is compact, there are finitely many x1, . . . , xn ∈ E such that

\[ \overline{U} \subseteq (x_1 + \tfrac{1}{2} U) \cup \cdots \cup (x_n + \tfrac{1}{2} U). \]

Let F = span(x1, . . . , xn), which is finite-dimensional and closed by Theorem 1.43. Since $\overline{U} \subseteq F + \tfrac{1}{2} U$ and $\tfrac{1}{2} F = F$, we have $\tfrac{1}{2} U \subseteq F + \tfrac{1}{4} U$ and $\overline{U} \subseteq F + F + \tfrac{1}{4} U = F + \tfrac{1}{4} U$. Continuing this process gives

\[ \overline{U} \subseteq \bigcap_{n=1}^{\infty} (F + 2^{-n} U). \]

If x ∉ F then we must have x ∉ F + 2^{−n}U for some n because F is closed and U is a bounded set containing 0. This implies that x ∉ Ū, and therefore Ū ⊆ F . If x ∈ E then rx ∈ U ⊆ F for some r > 0, so x = r⁻¹(rx) ∈ F . This shows that E = F .

Extensions of linear maps

Theorem 1.45 (Linear extension theorem). Let $F$ be a subspace of a normed vector space $E$, let $G$ be a Banach space, and let $f : F \to G$ be a continuous linear map. The closure $\overline{F}$ of $F$ in $E$ is also a subspace of $E$, and there is a unique extension of $f$ to a continuous linear map $\bar{f} : \overline{F} \to G$ with the same norm as $f$.

Proof. If $x, y \in \overline{F}$ then there are sequences $\{x_n\}$, $\{y_n\}$ in $F$ such that $x_n \to x$ and $y_n \to y$. Then $x_n + y_n \to x + y$ and $rx_n \to rx$ for all $r \in K$, so $x + y \in \overline{F}$ and $rx \in \overline{F}$. This proves that $\overline{F}$ is a subspace of $E$. The uniqueness of $\bar{f}$ is clear since it is continuous, and it remains to show that $\bar{f}$ exists. Let $x \in \overline{F}$ and choose a sequence $\{x_n\}$ in $F$ such that $x_n \to x$. Then

\[ |fx_m - fx_n| = |f(x_m - x_n)| \le |f|\,|x_m - x_n|, \]

so $\{fx_n\}$ is a Cauchy sequence. Since $G$ is complete, we can define $\bar{f}x$ to be the value that $\{fx_n\}$ converges to. This is well-defined, for if $\{x'_n\}$ is another sequence in $F$ that converges to $x$ then

\[
\begin{aligned}
|fx'_n - \bar{f}x| &\le |fx'_n - fx_n| + |fx_n - \bar{f}x| \\
&\le |f|\,(|x'_n - x| + |x - x_n|) + |fx_n - \bar{f}x| \\
&\to 0
\end{aligned}
\]

as $n \to \infty$. It is easy to verify that $\bar{f}$ is linear and agrees with $f$ on $F$. Since $|fx_n| \le |f|\,|x_n|$, we have

\[ |\bar{f}x| = \Bigl|\lim_{n\to\infty} fx_n\Bigr| = \lim_{n\to\infty} |fx_n| \le |f| \lim_{n\to\infty} |x_n| = |f|\,|x|. \]

Therefore $|\bar{f}| = |f|$.

Let V be a vector space and let p : V → R be any function. If p(x + y) ≤ p(x) + p(y) for all x, y ∈ V, we say that p is subadditive. If p(rx) = rp(x) for all x ∈ V and r ≥ 0, we say that p is positively homogeneous. If p is both subadditive and positively homogeneous, we say that p is sublinear. In particular, every seminorm (and every norm) on V is sublinear. Also note that every sublinear function is convex.

Theorem 1.46 (Hahn-Banach theorem). Let $F$ be a subspace of a real vector space $E$ and let $f : F \to \mathbb{R}$ be a linear map. If $p : E \to \mathbb{R}$ is sublinear and $f(x) \le p(x)$ for all $x \in F$, then there exists a linear map $\tilde{f} : E \to \mathbb{R}$ such that $\tilde{f} = f$ on $F$ and $\tilde{f}(x) \le p(x)$ for all $x \in E$.

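Proof. We first show that if $v \in E \setminus F$, then we can extend $f$ to $F + \mathbb{R}v$ while preserving the condition that $f(x) \le p(x)$ for all $x \in F + \mathbb{R}v$. For all $x, y \in F$ we have

\[ fx - fy \le p(x - y) = p(x + v - v - y) \le p(x + v) + p(-y - v), \]

so

\[ -fy - p(-y - v) \le -fx + p(x + v). \]

Choose some $a \in \mathbb{R}$ such that

\[ \sup_{y \in F} \bigl(-fy - p(-y - v)\bigr) \le a \le \inf_{x \in F} \bigl(-fx + p(x + v)\bigr); \]

then setting $y = x$ gives

\[ fx + a \le p(x + v) \quad\text{and}\quad -fx - a \le p(-x - v) \]

for all $x \in F$. Define $\tilde{f} : F + \mathbb{R}v \to \mathbb{R}$ by $\tilde{f}(x + tv) = fx + at$. If $t > 0$ then

\[ fx + at = t\Bigl(f\Bigl(\frac{x}{t}\Bigr) + a\Bigr) \le t\,p\Bigl(\frac{x}{t} + v\Bigr) = p(x + tv), \]

and if $t < 0$ then

\[ fx + at = -t\Bigl(-f\Bigl(\frac{x}{t}\Bigr) - a\Bigr) \le -t\,p\Bigl(-\frac{x}{t} - v\Bigr) = p(x + tv), \]

so $\tilde{f}(x + tv) = fx + at \le p(x + tv)$ for all $x \in F$ and $t \in \mathbb{R}$.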

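Consider the set $\mathcal{B}$ of pairs $(G, \tilde{f})$ where $G$ is a subspace of $E$ containing $F$ and $\tilde{f}$ is an extension of $f$ to $G$ such that $\tilde{f}(x) \le p(x)$ for all $x \in G$. This is a partially ordered set if we declare that $(G_1, f_1) \le (G_2, f_2)$ if and only if $G_1 \subseteq G_2$ and $f_2$ extends $f_1$. This set is also nonempty, since $(F, f) \in \mathcal{B}$. Given a totally ordered subset $\mathcal{C} = \{(G_\alpha, f_\alpha)\}_{\alpha \in A}$ of $\mathcal{B}$, we can define $\tilde{f}$ on $G = \bigcup_{\alpha \in A} G_\alpha$ by setting $\tilde{f}(x) = f_\alpha(x)$ for any $\alpha \in A$ with $x \in G_\alpha$; this does not depend on the choice of $\alpha$ because $\mathcal{C}$ is totally ordered. Then $(G, \tilde{f})$ is an upper bound of $\mathcal{C}$. By Zorn's lemma, there is some maximal element $(G, \tilde{f})$ of $\mathcal{B}$. If $G \ne E$ then we can choose some $v \in E \setminus G$ and use the previous construction to extend $\tilde{f}$ to $G + \mathbb{R}v$, contradicting the maximality of $(G, \tilde{f})$. Therefore $G = E$, and $\tilde{f}$ is the desired extension of $f$.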

The space L(E,K) is called the dual space of E, and is often denoted by E∗. Elements of E∗ are called linear functionals. We will sometimes refer to linear maps from E to K as linear functionals even if they are not continuous, but these cases will be explicitly mentioned.

Lemma 1.47.

1. Let f : E → C be a complex linear functional and let f1 = Re f . Then

f(x) = f1(x)− if1(ix) (*)

and |f | = |f1|.

2. If f1 : E → R is a real linear functional, then (*) defines a complex linear functional.

Proof. Write $f = f_1 + i f_2$. Then $f(x) = -i f(ix) = -i f_1(ix) + f_2(ix)$, so comparing imaginary parts gives $f_2(x) = -f_1(ix)$. Since $|f(x)| \ge |f_1(x)|$, we have $|f| \ge |f_1|$. Also, if $f(x) \ne 0$ then

\[ |f(x)| = f\Bigl(\frac{|f(x)|}{f(x)}\,x\Bigr) = f_1\Bigl(\frac{|f(x)|}{f(x)}\,x\Bigr) \le |f_1|\,|x|, \]

so $|f| \le |f_1|$.

Theorem 1.48 (Hahn-Banach extension theorem). Let $F$ be a subspace of a normed vector space $E$ and let $f : F \to K$ be a linear functional. There exists a linear functional $\tilde{f} : E \to K$ that extends $f$ and has the same norm as $f$.

Proof. First suppose that $K = \mathbb{R}$. Define a sublinear function $p : E \to \mathbb{R}$ by $p(x) = |f|\,|x|$. Then $f(x) \le |f(x)| \le |f|\,|x| = p(x)$ for all $x \in F$, so Theorem 1.46 shows that there is a linear map $\tilde{f} : E \to \mathbb{R}$ such that $\tilde{f}(x) \le |f|\,|x|$ for all $x \in E$. Clearly $|\tilde{f}| = |f|$. Now suppose that $K = \mathbb{C}$. By Lemma 1.47, we can write $f(x) = f_1(x) - i f_1(ix)$ where $f_1 : F \to \mathbb{R}$ is a real linear functional. Considering $E$ and $F$ as normed vector spaces over $\mathbb{R}$, we can find an extension $\tilde{f}_1$ of $f_1$ to $E$ such that $|\tilde{f}_1| = |f_1|$. Define $\tilde{f} : E \to \mathbb{C}$ by $\tilde{f}(x) = \tilde{f}_1(x) - i\tilde{f}_1(ix)$. By Lemma 1.47,

\[ |\tilde{f}| = |\tilde{f}_1| = |f_1| = |f|. \]

Due to its importance, we often refer to Theorem 1.48 as the Hahn-Banach theorem (instead of Theorem 1.46).

Corollary 1.49. If v ∈ E is a nonzero vector, then there exists a linear functional f on E such that fv ≠ 0.

Separation theorems

If $S$ is a nonempty subset of a normed vector space $E$, we define the distance from $x \in E$ to $S$ by

\[ d(x, S) = \inf_{s \in S} |x - s|. \]

Clearly, $d(x, S) = 0$ if and only if $x \in \overline{S}$. Also, $x \mapsto d(x, S)$ is continuous.

Theorem 1.50. Let $F$ be a closed subspace of a normed vector space $E$ and let $v \in E \setminus F$. There exists a linear functional $f : E \to K$ such that $f = 0$ on $F$, $fv = d(v, F) > 0$, and $|f| = 1$.

Proof. Let $F' = F + Kv$ and define a linear map $\lambda : F' \to K$ by

\[ \lambda(x + rv) = r\,d(v, F). \]

Then

\[ |\lambda(x + rv)| = |r|\,d(v, F) \le |r|\,\bigl|v - (-r^{-1}x)\bigr| = |x + rv| \]

for $r \ne 0$, so $|\lambda| \le 1$. By Theorem 1.48, we can extend $\lambda$ to a linear functional $f : E \to K$ such that $|f| \le 1$. Let $\varepsilon > 0$ and choose $x \in F$ such that $|v - x| < d(v, F) + \varepsilon$. Then

\[ \frac{|f(x - v)|}{|x - v|} > \frac{d(v, F)}{d(v, F) + \varepsilon}, \]

so taking $\varepsilon \to 0$ shows that $|f| = 1$.

If S ⊆ E, the Minkowski functional of S is

\[ \mu_S(x) = \inf\{\lambda > 0 : x \in \lambda S\}, \]

assuming that the inf exists.
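The definition can be made concrete with a small numerical experiment. The sketch below approximates the Minkowski functional by bisection, assuming that S is convex with 0 in its interior (so that x ∈ λS for every λ above the infimum and for no λ below it). The ellipse used as S, and the names `minkowski`, `in_ellipse` and `gauge`, are hypothetical choices for illustration only.

```python
import math

def minkowski(contains, x, iters=60):
    """Approximate mu_S(x) = inf{lam > 0 : x in lam*S} by bisection.

    `contains` is a membership test for S. Assumes S is convex with 0 in its
    interior, so x in lam*S holds for every lam > mu_S(x) and fails for
    every lam < mu_S(x).
    """
    def in_scaled(lam):
        return contains(tuple(t / lam for t in x))

    hi = 1.0
    while not in_scaled(hi):      # grow hi until x lies in hi*S
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):        # invariant: x lies in hi*S
        mid = (lo + hi) / 2.0
        if in_scaled(mid):
            hi = mid
        else:
            lo = mid
    return hi

def in_ellipse(p):
    """Hypothetical example set S = {(a, b) : a^2/4 + b^2 <= 1}."""
    a, b = p
    return a * a / 4.0 + b * b <= 1.0

def gauge(p):
    """Closed form of mu_S for the ellipse above, used as a reference."""
    a, b = p
    return math.sqrt(a * a / 4.0 + b * b)

x, y = (3.0, 4.0), (-1.0, 0.5)
s = (x[0] + y[0], x[1] + y[1])
print(minkowski(in_ellipse, x), gauge(x))
# Subadditivity on one sample pair:
print(minkowski(in_ellipse, s) <= minkowski(in_ellipse, x) + minkowski(in_ellipse, y))
```

For this particular S the functional is a norm, since the ellipse is symmetric and bounded; for a general convex S containing 0 in its interior one only expects a sublinear function.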

Lemma 1.51. Let $S$ be a convex subset of a normed vector space $E$ such that $0 \in \operatorname{Int} S$.

1. The Minkowski functional $\mu_S$ of $S$ is sublinear. Also, $\mu_S^{-1}([0, r)) \subseteq rS$.

2. $0 \le \mu_S(x) \le C|x|$ for some constant $C > 0$, so $\mu_S$ is continuous.

3. $\operatorname{Int} S = \mu_S^{-1}([0, 1))$ and $\overline{S} = \mu_S^{-1}([0, 1])$.

Proof. Choose $c > 0$ such that the open ball $B$ of radius $c$ around $0$ is contained in $S$. It is then easy to see that $\mu_S$ is well-defined, and that $\mu_S(rx) = r\mu_S(x)$ for all $r \ge 0$. Let $r > 0$ and let $x \in \mu_S^{-1}([0, r))$. By definition, there is some $s$ with $\mu_S(x) \le s < r$ such that $x \in sS$, i.e. $x = sy$ for some $y \in S$. Then $(s/r)y \in S$ since $S$ is convex, and $x = r(s/r)y \in rS$. This shows that $\mu_S^{-1}([0, r)) \subseteq rS$. Now let $x, y \in E$ and let $r, s$ be such that $\mu_S(x) < r$ and $\mu_S(y) < s$. Then $x + y \in rS + sS$, so

\[ x + y \in (r + s)\Bigl(\frac{r}{r + s}S + \frac{s}{r + s}S\Bigr) \subseteq (r + s)S \]

since $S$ is convex. Therefore $\mu_S(x + y) \le r + s$. Since $r$ and $s$ were arbitrary, we have $\mu_S(x + y) \le \mu_S(x) + \mu_S(y)$. This proves (1). If $x \ne 0$ then

\[ \frac{cx}{2|x|} \in B \subseteq S, \]

so

\[ \mu_S(x) = \frac{2|x|}{c}\,\mu_S\Bigl(\frac{cx}{2|x|}\Bigr) \le \frac{2}{c}\,|x|. \]

This proves (2). Since $\mu_S$ is continuous, $\mu_S^{-1}([0, 1))$ is an open subset of $S$ and $\mu_S^{-1}([0, 1)) \subseteq \operatorname{Int} S$. Similarly, $\mu_S^{-1}((1, \infty))$ is an open subset of $E \setminus S$, so $\mu_S^{-1}((1, \infty)) \subseteq \operatorname{Int}(E \setminus S)$. If $\mu_S(x) = 1$ and $0 < r < 1 < s$ then $rx \in \operatorname{Int} S$ and $sx \notin S$, so $\mu_S^{-1}(\{1\}) \subseteq \partial S$. This proves (3).

Theorem 1.52. Let $S$ be a closed convex subset of a normed vector space $E$ and let $v \in E \setminus S$. There exists a linear functional $f : E \to K$ such that

\[ \sup_{x \in S} \operatorname{Re} f(x) \le 1 \quad\text{and}\quad \operatorname{Re} f(v) > 1. \]

Proof. We can assume that $0 \in S$. Let $\delta = d(v, S) > 0$ and define

\[ T = \{x \in E : d(x, S) < \delta/2\}. \]

If $x, y \in T$ then we can choose $x', y' \in S$ such that $|x - x'| < \delta/2$ and $|y - y'| < \delta/2$, so for $t \in [0, 1]$,

\[ |tx + (1 - t)y - (tx' + (1 - t)y')| \le t\,|x - x'| + (1 - t)\,|y - y'| < \delta/2. \]

This shows that $T$ is convex. The closure of a convex set is also convex, so $\overline{T}$ is convex. Since $x \mapsto d(x, S)$ is continuous, $0 \in \operatorname{Int} \overline{T}$. Furthermore, $S \subseteq \overline{T}$ and $v \notin \overline{T}$. Lemma 1.51 shows that the Minkowski functional $\mu_{\overline{T}}$ of $\overline{T}$ is well-defined and sublinear, there is some $C > 0$ such that $\mu_{\overline{T}}(x) \le C|x|$ for all $x \in E$, and $\mu_{\overline{T}}(v) > 1$.

First suppose that $K = \mathbb{R}$. Define $f : \mathbb{R}v \to \mathbb{R}$ by $f(rv) = r\mu_{\overline{T}}(v)$. By Theorem 1.46, there is an extension $\tilde{f}$ of $f$ to $E$ such that $\tilde{f}(x) \le \mu_{\overline{T}}(x) \le C|x|$ for all $x \in E$. Clearly $\tilde{f}$ is continuous. Since $S \subseteq \overline{T}$, we have $\tilde{f}(x) \le \mu_{\overline{T}}(x) \le 1$ for all $x \in S$. Also, $\tilde{f}(v) = f(v) = \mu_{\overline{T}}(v) > 1$. Now suppose that $K = \mathbb{C}$. Considering $E$ as a normed vector space over $\mathbb{R}$, we can find a linear functional $f_1 : E \to \mathbb{R}$ as before. By Lemma 1.47, the linear functional $f : E \to \mathbb{C}$ given by $f(x) = f_1(x) - i f_1(ix)$ has the desired properties.

Completion of a normed vector space

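A completion of a normed vector space $E$ is a pair $(\widehat{E}, \iota)$ where $\widehat{E}$ is a Banach space and $\iota : E \to \widehat{E}$ is a linear isometry such that $\iota(E)$ is dense in $\widehat{E}$.

Theorem 1.53. The completion of $E$, if it exists, is unique: if $(F, \iota')$ is another completion of $E$, then there is a unique isometric isomorphism $\varphi : \widehat{E} \to F$ such that $\iota' = \varphi \circ \iota$.

Proof. Since $\iota' \circ \iota^{-1} : \iota(E) \to F$ is an isometry, Theorem 1.45 shows that there is an extension of $\iota' \circ \iota^{-1}$ to a linear isometry $\varphi : \widehat{E} \to F$, which satisfies $\iota' = \varphi \circ \iota$. Similarly, there is an extension of $\iota \circ (\iota')^{-1} : \iota'(E) \to \widehat{E}$ to a linear isometry $\psi : F \to \widehat{E}$, which satisfies $\iota = \psi \circ \iota'$. But

\[ (\psi \circ \varphi)|_{\iota(E)} = \psi \circ \varphi|_{\iota(E)} = \psi \circ \iota' \circ \iota^{-1} = \iota \circ \iota^{-1} = \mathrm{Id}_{\iota(E)}, \]

so $\psi \circ \varphi = \mathrm{Id}_{\widehat{E}}$ by continuity. Similarly, $\varphi \circ \psi = \mathrm{Id}_F$.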

Lemma 1.54. The linear map τ : E → E∗∗ given by

(τx)f = fx

is an isometry.

Proof. We have $|(\tau x)f| = |fx| \le |x|\,|f|$, so $|\tau x| \le |x|$. Conversely, if $x \in E$ then by Theorem 1.48 we can choose $f \in E^*$ with $|f| = 1$ and $fx = |x|$. Since $|(\tau x)f| = |fx| = |x|$, we have $|\tau x| \ge |x|$ and therefore $|\tau x| = |x|$.

If the isometry $\tau : E \to E^{**}$ is surjective, then $E$ is naturally isomorphic to $E^{**}$ and we say that $E$ is reflexive.

Theorem 1.55. Every normed vector space has a completion.

Proof. Let $E$ be a normed vector space over $K$ and let $\tau : E \to E^{**}$ be the injection defined in Lemma 1.54. Since $E^{**} = L(E^*, K)$ and $K$ is complete, Theorem 1.34 shows that $E^{**}$ is complete. Then $(\overline{\tau(E)}, \tau)$ is a completion of $E$, where the closure is taken in $E^{**}$.

1.4 Products, quotients and duality

The open mapping theorem

Theorem 1.56 (Baire category theorem). Let X be a complete metric space.

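1. If $\{U_n\}$ is a sequence of open dense subsets of $X$, then $\bigcap_{n=1}^{\infty} U_n$ is dense in $X$.

2. If $\{S_n\}$ is a sequence of closed subsets of $X$ and $X = \bigcup_{n=1}^{\infty} S_n$, then some $S_n$ contains a nonempty open set.

Proof. Let $x_0 \in X$ and $\varepsilon_0 > 0$; we need to show that there is some $x \in \bigcap_{n=1}^{\infty} U_n$ such that $d(x_0, x) < \varepsilon_0$. Construct a sequence $x_1, x_2, \dots$ as follows: having chosen $x_n$ and $\varepsilon_n$ (for $n \ge 0$), we choose $x_{n+1} \in U_{n+1}$ and $\varepsilon_{n+1} \in (0, \varepsilon_n/2)$ so that $d(x_n, x_{n+1}) < \varepsilon_n/2$ and $B_{\varepsilon_{n+1}}(x_{n+1}) \subseteq U_{n+1}$. For all integers $n > m \ge 1$ we have

\[ d(x_m, x_n) < \varepsilon_m\Bigl(\frac{1}{2} + \cdots + \frac{1}{2^{n-m}}\Bigr) < \varepsilon_m < \frac{\varepsilon_0}{2^m}, \tag{*} \]

so $\{x_n\}$ is a Cauchy sequence, and $x_n \to x$ for some $x \in X$ since $X$ is complete. Furthermore, taking $n \to \infty$ in (*) gives $d(x_m, x) \le \varepsilon_m$, so

\[ d(x_m, x) \le d(x_m, x_{m+1}) + d(x_{m+1}, x) < \frac{\varepsilon_m}{2} + \varepsilon_{m+1} < \varepsilon_m \]

for all $m$. This shows that $x \in \bigcap_{n=1}^{\infty} U_n$, and proves (1). Part (2) follows by taking the complement of each $S_n$.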

Theorem 1.57 (Open mapping theorem). Let $E, F$ be Banach spaces. If $f : E \to F$ is a surjective continuous linear map, then $f$ is open.

Proof. Let $U_r$ be the open ball of radius $r$ around $0$ in $E$, and let $V_r$ be the open ball of radius $r$ around $0$ in $F$. It suffices to show that $f(U_k)$ contains an open ball around $0$, for some positive integer $k$. Since $E = \bigcup_{k=1}^{\infty} U_{k/2}$ and $f$ is surjective, we have $F = \bigcup_{k=1}^{\infty} \overline{f(U_{k/2})}$. By Theorem 1.56, there is some $k \ge 1$ and an open ball $B_{2r}(z)$ in $F$ such that $B_{2r}(z) \subseteq \overline{f(U_{k/2})}$. Choose $z' \in f(U_{k/2})$ and $x' \in U_{k/2}$ such that $|z' - z| < r$ and $z' = fx'$; then $B_r(z') \subseteq B_{2r}(z) \subseteq \overline{f(U_{k/2})}$, and

\[ y = fx' + (y - z') \in \overline{f(x' + U_{k/2})} \subseteq \overline{f(U_k)} \]

for all $y \in V_r$. Note that if $y \in V_{r/2^n}$, then

\[ y = \frac{1}{2^n}(2^n y) \in \frac{1}{2^n}\,\overline{f(U_k)} = \overline{f(U_{k/2^n})}. \]

Let $y \in V_{r/2}$ and construct a sequence $\{x_n\}$ as follows: choose some $x_1 \in U_{k/2}$ such that $y - fx_1 \in V_{r/4}$. Having chosen $x_1, \dots, x_{n-1}$ such that $y - \sum_{i=1}^{n-1} fx_i \in V_{r/2^n}$, choose some $x_n \in U_{k/2^n}$ such that $y - \sum_{i=1}^{n} fx_i \in V_{r/2^{n+1}}$. Since

\[ \sum_{n=1}^{\infty} |x_n| < \sum_{n=1}^{\infty} \frac{k}{2^n} = k, \]

the series $\sum_{n=1}^{\infty} x_n$ converges to some $x \in U_k$, and $y = fx$ by continuity. This shows that $V_{r/2} \subseteq f(U_k)$.

Corollary 1.58. Let $E, F$ be Banach spaces. If $f : E \to F$ is a continuous isomorphism, then $f$ is a linear homeomorphism.

Recall that the graph of a map $f : X \to Y$ is the set $\{(x, f(x)) : x \in X\}$ in $X \times Y$. If $f : E \to F$ is a continuous linear map, then the graph of $f$ is closed (since $F$ is Hausdorff). The converse holds:

Theorem 1.59 (Closed graph theorem). Let $E, F$ be Banach spaces and let $f : E \to F$ be a linear map. If the graph of $f$ is closed, then $f$ is continuous.

Proof. Let $G$ be the graph of $f$, which is a closed subspace of $E \times F$. The projection $\pi_E : G \to E$ given by $(x, y) \mapsto x$ is a continuous isomorphism, so it is a linear homeomorphism by Corollary 1.58. Let $\pi_F : G \to F$ be the projection onto $F$; then $f = \pi_F \circ \pi_E^{-1}$ is continuous.

Theorem 1.60 (Uniform boundedness principle). Let $\{f_\alpha\}_{\alpha \in A}$ be a collection of maps in $L(E, F)$, where $E$ is a Banach space, such that $\{f_\alpha x\}_{\alpha \in A}$ is bounded for all $x \in E$. If $B \subseteq E$ is bounded, then $\bigcup_{\alpha \in A} f_\alpha(B)$ is bounded.

Proof. For $n = 1, 2, \dots$, let

\[ A_n = \{x \in E : |f_\alpha x| \le n \text{ for all } \alpha \in A\}, \]

which is closed because every $f_\alpha$ is continuous. Then $E = \bigcup_{n=1}^{\infty} A_n$, so Theorem 1.56 implies that some $A_n$ contains an open ball $B_r(x_0)$, where $r > 0$ and $x_0 \in E$. For all $x \in B_r(0)$ we have

\[ |f_\alpha x| = |f_\alpha(x_0 + x - x_0)| \le |f_\alpha(x_0 + x)| + |f_\alpha x_0| \le 2n, \]

and the result follows.

Compare the next theorem with Corollary 1.130.

Theorem 1.61. Let $E, F$ be Banach spaces. The set of surjective continuous linear maps is open in $L(E, F)$.

Proof. Let $f \in L(E, F)$ be surjective. By the open mapping theorem, there is a constant $C > 1$ such that for all $y \in B_1(0) \subseteq F$, there exists some $x \in E$ such that $fx = y$ and $|x| \le C|y|$. Choose some $0 < r < C^{-1}$, and let $g \in L(E, F)$ with $|f - g| < r$. To prove that $g$ is surjective, it suffices to show that

\[ g(B_{C(1 - rC)^{-1}}(0)) \supseteq B_1(0). \]

Let $y_0 \in B_1(0) \subseteq F$; then there is some $x_0 \in E$ such that $fx_0 = y_0$ and $|x_0| \le C|y_0|$. Let $y_1 = fx_0 - gx_0$, so that

\[ |y_1| \le |f - g|\,|x_0| \le rC. \]

Since $rC < 1$, there is some $x_1 \in E$ such that $fx_1 = y_1$ and $|x_1| \le C|y_1|$. Let $y_2 = fx_1 - gx_1$, so that

\[ |y_2| \le |f - g|\,|x_1| \le (rC)^2. \]

Continuing this process, we have sequences $\{x_n\}$ and $\{y_n\}$ such that $fx_n = y_n$, $y_{n+1} = fx_n - gx_n$, $|y_n| \le (rC)^n$, and $|x_n| \le C(rC)^n$. By induction, we have

\[ y_0 = gx_0 + \cdots + gx_n + y_{n+1} \]

for all $n$. Since $rC < 1$, the series

\[ x = \sum_{n=0}^{\infty} x_n \]

converges absolutely, with $|x| \le C(1 - rC)^{-1}$. It is easy to see that $gx = y_0$, and this completes the proof.
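The correction scheme in this proof is effectively an algorithm: solve with f, measure the error under g, and repeat. The sketch below runs it in a finite-dimensional toy case, assuming E = F = R^2 with f invertible (so that the preimage of y_n is simply f^{-1} y_n). The matrices `f_inv` and `g`, the vector `y0` and the helper names are hypothetical values chosen for illustration.

```python
def apply(m, v):
    """Apply a 2x2 matrix (tuple of rows) to a vector in R^2."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def solve_by_correction(f_inv, g, y0, steps=60):
    """Accumulate x = x_0 + x_1 + ... with f x_n = y_n and y_{n+1} = f x_n - g x_n."""
    x = (0.0, 0.0)
    y = y0
    for _ in range(steps):
        xn = apply(f_inv, y)                  # solve f x_n = y_n
        x = (x[0] + xn[0], x[1] + xn[1])      # partial sum x_0 + ... + x_n
        gx = apply(g, xn)
        y = (y[0] - gx[0], y[1] - gx[1])      # y_{n+1} = y_n - g x_n
    return x

f_inv = ((0.5, 0.0), (0.0, 1.0))      # inverse of f = diag(2, 1)
g = ((2.1, 0.1), (-0.1, 1.05))        # a small perturbation of f
y0 = (0.3, -0.4)
x = solve_by_correction(f_inv, g, y0)
print(x, apply(g, x))                 # g x should be close to y0
```

In this toy case g happens to be invertible, so the result could be obtained directly; the point of the sketch is only that the partial sums converge geometrically because each residual y_{n+1} is smaller than y_n by a fixed factor.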

Finite product spaces

Let E1, . . . , Ek be normed vector spaces. The product norm on E1 × · · · × Ek with respect to a norm ‖·‖ on Rk is defined by

|(x1, . . . , xk)| = ‖(|x1| , . . . , |xk|)‖ ;

it is easy to check that this is indeed a norm on E1 × · · · × Ek.

Theorem 1.62. Let ‖·‖ and ‖·‖∗ be norms on Rk, and let |·| and |·|∗ be the associated product norms on E1 × · · · × Ek.

1. |·| and |·|∗ are equivalent.

2. Any product norm induces the product topology on E1 × · · · × Ek.

3. If E1, . . . , Ek are Banach spaces, then E1 × · · · × Ek is a Banach space under any product norm.

Proof. Theorem 1.39 shows that ‖·‖ and ‖·‖∗ are equivalent, and (1) follows immediately. Parts (2) and (3) follow from the fact that a product norm induces a product metric on E1 × · · · × Ek.

Since any two product norms are equivalent, we usually do not specify which product norm we are using if we are only interested in the topology of the product space.

Theorem 1.63. Let $F$ be a normed vector space, let $f : F \to E_1 \times \cdots \times E_k$, and let $f_i = \pi_i \circ f$ be the component functions of $f$, where $\pi_i : E_1 \times \cdots \times E_k \to E_i$ is the canonical projection. Then $f$ is a continuous linear map if and only if every $f_i$ is a continuous linear map.

A product of normed vector spaces satisfies the following universal property:

Theorem 1.64 (Universal property of product spaces). If $F$ is a normed vector space and $f_i : F \to E_i$ are continuous linear maps, then there is a unique continuous linear map $f : F \to E_1 \times \cdots \times E_k$ such that $f_i = \pi_i \circ f$ for each $i$, where $\pi_i : E_1 \times \cdots \times E_k \to E_i$ is the canonical projection.

Proof. This follows immediately from the corresponding universal properties for vector spaces and topological spaces.

A finite product of Banach spaces is not only a product, but also a coproduct.

Theorem 1.65. Let $F$ be a normed vector space, let $f : E_1 \times \cdots \times E_k \to F$, and let $f_i = f \circ \iota_i$ be the component functions of $f$, where $\iota_i : E_i \to E_1 \times \cdots \times E_k$ is the canonical injection. Then $f$ is a continuous linear map if and only if every $f_i$ is a continuous linear map.

Theorem 1.66 (Universal property of coproduct spaces). If $F$ is a normed vector space and $f_i : E_i \to F$ are continuous linear maps, then there is a unique continuous linear map $f : E_1 \times \cdots \times E_k \to F$ such that $f_i = f \circ \iota_i$ for each $i$, where $\iota_i : E_i \to E_1 \times \cdots \times E_k$ is the canonical injection.

Theorem 1.67 (Internal direct products). Let $E$ be a Banach space and let $F, G$ be closed subspaces of $E$ such that $F + G = E$ and $F \cap G = \{0\}$. Then the map $(x, y) \mapsto x + y$ from $F \times G$ to $E$ is a linear homeomorphism.

Proof. The map is a continuous isomorphism, so it is a linear homeomorphism by Corollary 1.58.

In Theorem 1.67, we say that $E$ is the (internal) direct product or direct sum of $F$ and $G$. We also say that $F$ and $G$ are (closed) complements of each other in $E$. If $F$ is a closed subspace of $E$ that has a closed complement, we say that $F$ splits $E$.

The fact that the notions of direct sum (coproduct) and direct product (product) coincide for a finite collection of vector spaces allows us to generalize the concept of block matrices. Suppose $E_1, \dots, E_m$ and $F_1, \dots, F_n$ are vector spaces, and $\lambda : E_1 \times \cdots \times E_m \to F_1 \times \cdots \times F_n$ is a linear map. We have unique linear maps $\lambda_{i,j} : E_j \to F_i$ such that $\lambda_{i,j} = \pi_i \circ \lambda \circ \iota_j$, where $\pi_i : F_1 \times \cdots \times F_n \to F_i$ is the canonical projection and $\iota_j : E_j \to E_1 \times \cdots \times E_m$ is the canonical injection. In this case we write

\[
\lambda = \begin{bmatrix}
\lambda_{1,1} & \cdots & \lambda_{1,m} \\
\vdots & \ddots & \vdots \\
\lambda_{n,1} & \cdots & \lambda_{n,m}
\end{bmatrix}
\]

and say that the matrix represents $\lambda$, and that the maps $\lambda_{i,j}$ are the components of $\lambda$. If $\tau : F_1 \times \cdots \times F_n \to G_1 \times \cdots \times G_p$ is another linear map, then it is easy to verify that $\tau \circ \lambda$ is represented by the product of the matrix representing $\tau$ with the matrix representing $\lambda$, with the usual matrix multiplication formula. On the other hand, if we start with linear maps $\lambda_{i,j}$, then there is a unique linear map $\lambda$ that has the components $\lambda_{i,j}$. The situation is entirely analogous to block matrices in $\mathbb{R}$ or $\mathbb{C}$.

Let $E$ be a normed vector space and let $f : E \to E$ be a continuous linear map. We say that a subspace $F \subseteq E$ is an invariant subspace of $f$ if $f(F) \subseteq F$. Now let $F, G$ be subspaces of $E$ such that $E = F \oplus G$. If $F$ and $G$ are both invariant subspaces of $f$, then we can decompose $f$ into continuous linear maps $f|_F : F \to F$ and $f|_G : G \to G$ such that $f = \iota_F \circ f|_F \circ \pi_F + \iota_G \circ f|_G \circ \pi_G$, where $\pi_F : E \to F$ and $\pi_G : E \to G$ are the canonical projections and $\iota_F : F \to E$ and $\iota_G : G \to E$ are the canonical injections. In block matrix notation,

\[
f = \begin{bmatrix} f|_F & 0 \\ 0 & f|_G \end{bmatrix}.
\]

Quotient spaces

Let $E$ be a normed vector space over $K$ and let $F$ be a subspace of $E$. The quotient norm on $E/F$ is defined by

\[ |x + F| = \inf_{f \in F} |x + f|. \]

Note that |x+ F | ≤ |x| for all x ∈ E.
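As a concrete instance of the definition, the sketch below approximates the quotient norm when E = R^2 and F = span{d} is a line, so that |x + F| is the infimum of |x + td| over real t. It uses ternary search on the convex function t ↦ |x + td|; the function name `quotient_norm`, the choice of d = (1, 1) and the two sample norms are assumptions made for illustration.

```python
import math

def quotient_norm(x, d, norm, iters=200):
    """Approximate |x + F| = inf_t |x + t d| for F = span{d} by ternary search.

    The map t -> |x + t d| is convex, and a minimiser satisfies
    |t| <= 2|x| / |d|, which gives the initial bracket.
    """
    def g(t):
        return norm(tuple(a + t * b for a, b in zip(x, d)))

    r = 2.0 * norm(x) / norm(d)
    lo, hi = -r, r
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2          # a minimiser lies in [lo, m2]
        else:
            lo = m1          # a minimiser lies in [m1, hi]
    return g((lo + hi) / 2.0)

def euclid(v):
    return math.sqrt(sum(t * t for t in v))

def norm1(v):
    return sum(abs(t) for t in v)

x, d = (3.0, 1.0), (1.0, 1.0)
# Euclidean norm: distance from x to the line, |x1 - x2| / sqrt(2).
print(quotient_norm(x, d, euclid), abs(x[0] - x[1]) / math.sqrt(2))
# l^1 norm: the infimum of |x1 + t| + |x2 + t| is |x1 - x2|.
print(quotient_norm(x, d, norm1), abs(x[0] - x[1]))
```

The same vector x has different quotient norms under the two norms on E, while in both cases |x + F| ≤ |x|, in line with the remark above.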

Theorem 1.68. The quotient norm is a seminorm on $E/F$, and $|x + F| = 0$ if and only if $x \in \overline{F}$. Thus, the quotient norm is a norm if and only if $F$ is closed in $E$.

Proof. If $r \in K$ is nonzero and $x \in E$ then

\[ |rx + F| = \inf_{f \in F} |rx + f| = |r| \inf_{f \in F} |x + f/r| = |r| \inf_{f \in F} |x + f| = |r|\,|x + F|. \]

If $x, y \in E$ then

\[
\begin{aligned}
|x + y + F| &= \inf_{f \in F} |x + y + f| \\
&= \inf_{f, g \in F} |x + f + y + g| \\
&\le \inf_{f, g \in F} (|x + f| + |y + g|) \\
&= |x + F| + |y + F|.
\end{aligned}
\]

This shows that the quotient norm is a seminorm. Suppose $x \in \overline{F}$ and let $\varepsilon > 0$. Choose some $f \in F$ such that $|x - f| < \varepsilon$; then

\[ |x + F| = |x - f + F| \le |x - f| < \varepsilon. \]

Therefore $|x + F| = 0$. Conversely, suppose that $|x + F| = 0$ and let $\varepsilon > 0$. Since

\[ \inf_{f \in F} |x - f| = 0, \]

we can choose some $f \in F$ such that $|x - f| < \varepsilon$. Therefore $x \in \overline{F}$.

Unless otherwise specified, we always equip the quotient space with the quotient norm. The quotient map $\pi : E \to E/F$ is a continuous linear map, since $|\pi x| = |x + F| \le |x|$. In fact, it induces the quotient topology on $E/F$:

Theorem 1.69. Let $E$ be a normed vector space and let $F$ be a closed subspace of $E$. The quotient map $\pi : E \to E/F$ is a topological quotient map.

Proof. The quotient map is surjective and continuous, so it suffices to show that it is an open map. Let $U \subseteq E$ be an open set and let $x + F \in \pi(U)$ where $x \in U$. Choose $r > 0$ such that $y \in U$ whenever $|y - x| < r$. Suppose $y + F \in E/F$ with $|(y + F) - (x + F)| < r$. By the definition of the quotient norm, we can choose some $f \in F$ such that

\[ |y - x + f| < |y - x + F| + (r - |y - x + F|) = r. \]

Then $y + f \in U$ and $y + F = \pi y = \pi(y + f) \in \pi(U)$. This shows that the open ball of radius $r$ around $x + F$ is contained in $\pi(U)$.

Corollary 1.70. Let $E$ be a normed vector space and let $F$ be a closed subspace of $E$. If $G$ is a normed vector space and $f : E/F \to G$, then $f$ is a continuous linear map if and only if $f \circ \pi$ is a continuous linear map.

We have the following universal property:

Theorem 1.71 (Universal property of quotient spaces). Let $E$ be a normed vector space and let $F$ be a closed subspace of $E$. If $G$ is a normed vector space and $f : E \to G$ is a continuous linear map with $F \subseteq \ker f$, then there is a unique continuous linear map $\tilde{f} : E/F \to G$ such that $f = \tilde{f} \circ \pi$.

Proof. The existence of $\tilde{f}$ follows from the corresponding universal property for quotient vector spaces, and $\tilde{f}$ is continuous by Corollary 1.70.

Theorem 1.72. If $E$ is a Banach space and $F$ is a closed subspace of $E$, then $E/F$ is also a Banach space.

Proof. By Theorem 1.16, it suffices to show that absolute convergence implies convergence in $E/F$. Let $\sum_{n=1}^{\infty} (x_n + F)$ be an absolutely convergent series. For each $n$, we can choose some $f_n \in F$ such that $|x_n + f_n| < |x_n + F| + 2^{-n}$. Then

\[ \sum_{n=1}^{\infty} |x_n + f_n| < \sum_{n=1}^{\infty} |x_n + F| + \sum_{n=1}^{\infty} \frac{1}{2^n} < \infty, \]

so $\sum_{n=1}^{\infty} (x_n + f_n)$ converges to some $x \in E$. We have

\[
\begin{aligned}
\Bigl|\sum_{n=1}^{N} (x_n + F) - (x + F)\Bigr| &= \Bigl|\sum_{n=1}^{N} x_n - x + F\Bigr| \\
&= \Bigl|\sum_{n=1}^{N} (x_n + f_n) - x + F\Bigr| \\
&\le \Bigl|\sum_{n=1}^{N} (x_n + f_n) - x\Bigr| \\
&\to 0
\end{aligned}
\]

as $N \to \infty$, so $\sum_{n=1}^{\infty} (x_n + F)$ converges to $x + F$.

Note that if $E, F$ are Banach spaces and $f : E \to F$ is a continuous linear map with $\operatorname{im} f$ closed then $E/\ker f$ is linearly homeomorphic to $\operatorname{im} f$.

Let $f : E \to F$ be a linear map. Given any subset $S \subseteq E$, we say that $f : E \to F$ is bounded below on $S$ if there exists some $C > 0$ such that $|fx| \ge C|x|$ for all $x \in S$. We say that $f$ is bounded below if it is bounded below on $E$.

Lemma 1.73. Let $E, F$ be Banach spaces and let $f : E \to F$ be a continuous linear map. Then $f$ is bounded below if and only if $f$ is injective and $\operatorname{im} f$ is closed.

Proof. If $f$ is injective and $\operatorname{im} f$ is closed then $\operatorname{im} f$ is a Banach space and $f : E \to \operatorname{im} f$ is a linear homeomorphism. Since $|f^{-1}x| \le |f^{-1}|\,|x|$, we have $|fx| \ge |f^{-1}|^{-1}\,|x|$ for all $x \in E$. Conversely, if $|fx| \ge C|x|$ then $f$ is injective and $|f^{-1}x| \le C^{-1}|ff^{-1}x| = C^{-1}|x|$, so $f^{-1} : \operatorname{im} f \to E$ is continuous and therefore $f : E \to \operatorname{im} f$ is a linear homeomorphism. This implies that $\operatorname{im} f$ is closed in $F$.

Recall that for a subspace $W$ of a vector space $V$, the codimension of $W$ is $\dim(V/W)$.

Theorem 1.74. Let $E$ be a Banach space over $K$ and let $F$ be a closed subspace of $E$. If $F$ has finite dimension or finite codimension, then $F$ splits $E$.

Proof. We use Theorem 1.67. Suppose that $n = \dim F < \infty$, let $v_1, \dots, v_n$ be a basis for $F$, and let $f_1, \dots, f_n$ be the dual basis for $F^*$. By the Hahn-Banach theorem, we can extend each $f_i$ to a continuous linear functional $\tilde{f}_i$ on $E$. Define $\varphi : E \to K^n$ by $x \mapsto (\tilde{f}_1 x, \dots, \tilde{f}_n x)$; then $G = \ker \varphi$ is closed and it is easy to see that $F \cap G = \{0\}$. If $x \in E$ and $y = \tilde{f}_1(x)v_1 + \cdots + \tilde{f}_n(x)v_n$ then $\varphi(x - y) = 0$, so $x = y + (x - y) \in F + G$. This proves the finite-dimensional case. Now assume that $n = \dim(E/F) < \infty$. Let $x_1 + F, \dots, x_n + F$ be a basis for $E/F$ and let $G = \operatorname{span}(x_1, \dots, x_n) \subseteq E$. Then $G$ is finite-dimensional and closed by Theorem 1.43, and it is easy to see that $F + G = E$ and $F \cap G = \{0\}$. This proves the case of finite codimension.

Duality

Let E be a normed vector space. The annihilator of a subset S ⊆ E is

\[ S^\perp = \{f \in E^* : fx = 0 \text{ for all } x \in S\}. \]

The pre-annihilator of a subset $T \subseteq E^*$ is

\[ {}^\perp T = \{x \in E : fx = 0 \text{ for all } f \in T\}. \]

(These are not to be confused with orthogonal complements in Hilbert spaces. However, the two concepts are closely related.) If $f \in E^*$ and $x \in E$, we write $\langle x, f \rangle = f(x)$. Note that the map $\langle \cdot, \cdot \rangle : E \times E^* \to K$ is continuous and bilinear.

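Lemma 1.75. Let $S, S' \subseteq E$ and $T, T' \subseteq E^*$.

1. If $S \subseteq S'$, then $S^\perp \supseteq (S')^\perp$. If $T \subseteq T'$, then ${}^\perp T \supseteq {}^\perp T'$.

2. $S^\perp$ is a closed subspace of $E^*$, and ${}^\perp T$ is a closed subspace of $E$.

3. $S^\perp = (\overline{\operatorname{span}}(S))^\perp$ and $\overline{\operatorname{span}}(S) = {}^\perp(S^\perp)$. If $S$ is a closed subspace of $E$, then $S = {}^\perp(S^\perp)$.

Proof. If $x \in E$ then $\{x\}^\perp = \{f \in E^* : fx = 0\}$ is closed by the continuity of the map $f \mapsto fx$ (see Lemma 1.54). Therefore $S^\perp = \bigcap_{x \in S} \{x\}^\perp$ is closed. Similarly, if $f \in T$ then ${}^\perp\{f\} = \{x \in E : fx = 0\}$ is closed by the continuity of $f$. Therefore ${}^\perp T = \bigcap_{f \in T} {}^\perp\{f\}$ is closed. This proves (2). Since $S \subseteq \overline{\operatorname{span}}(S)$, we have $S^\perp \supseteq (\overline{\operatorname{span}}(S))^\perp$. Conversely, let $f \in S^\perp$ and $x \in \overline{\operatorname{span}}(S)$. Choose a sequence $\{x_n\}$ in $\operatorname{span}(S)$ such that $x_n \to x$. Then $fx_n = 0$ for each $n$, so $fx = 0$ by continuity. Therefore $f \in (\overline{\operatorname{span}}(S))^\perp$. Suppose that $S$ is a closed subspace of $E$. It is clear that $S \subseteq {}^\perp(S^\perp)$. Conversely, suppose that $x \in ({}^\perp(S^\perp)) \setminus S$ and choose a linear functional $f \in E^*$ such that $f = 0$ on $S$ and $fx \ne 0$ (using Theorem 1.50). Then $f \in S^\perp$ and $x \notin {}^\perp(S^\perp)$, which is a contradiction. This proves (3).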

Theorem 1.76. Let $E, F$ be normed vector spaces and let $f : E \to F$ be a continuous linear map. There is a unique continuous linear map $f^* : F^* \to E^*$, called the transpose of $f$, satisfying

\[ \langle fx, y \rangle = \langle x, f^*y \rangle \]

for all $x \in E$ and $y \in F^*$.

Proof. Define $(f^*y)x = (y \circ f)x$. Since $|(f^*y)x| = |(y \circ f)x| \le |y|\,|f|\,|x|$, each $f^*y$ is continuous and $|f^*y| \le |f|\,|y|$. Therefore $f^*$ is continuous. Uniqueness is obvious.

Theorem 1.77 (Properties of the transpose). Let $E, F, G$ be normed vector spaces over $K$, and let $f, g \in L(E, F)$ and $h \in L(F, G)$.

1. $0^* = 0$ and $\mathrm{Id}_E^* = \mathrm{Id}_{E^*}$.

2. $(f + g)^* = f^* + g^*$.

3. $(rf)^* = rf^*$ for all $r \in K$.

4. $f^{**} \circ \tau_E = \tau_F \circ f$, where $\tau_E : E \to E^{**}$ and $\tau_F : F \to F^{**}$ are as in Lemma 1.54.

5. $|f^*| = |f|$.

6. $(h \circ g)^* = g^* \circ h^*$.

7. If $f$ is a linear homeomorphism, then $(f^*)^{-1} = (f^{-1})^*$.

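Proof. For (4), we have

\[ \langle y, f^{**}\tau_E x \rangle = \langle f^*y, \tau_E x \rangle = \langle x, f^*y \rangle = \langle fx, y \rangle = \langle y, \tau_F fx \rangle \]

for all $x \in E$ and $y \in F^*$. For (5), Theorem 1.76 shows that $|f^*| \le |f|$ and $|f^{**}| \le |f^*|$. But

\[
\begin{aligned}
|f| = \sup_{|x|=1} |fx| &= \sup_{|x|=1} |\tau_F(fx)| \\
&= \sup_{|x|=1} |f^{**}(\tau_E x)| \\
&\le \sup_{|x|=1} |f^{**}|\,|\tau_E|\,|x| \\
&= |f^{**}|,
\end{aligned}
\]

so

\[ |f^*| \le |f| \le |f^{**}| \le |f^*|. \]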

Lemma 1.78. Let $E, F$ be normed vector spaces and let $f \in L(E, F)$. If $f^*$ is injective and $\operatorname{im} f^*$ is closed, then $f$ is surjective.

Proof. Let $U$ be the closed unit ball around $0$ in $E$; it suffices to show that $\overline{f(U)}$ contains an open ball around $0$ in $F$. By Lemma 1.73, there is a $C > 0$ such that $|f^*z| \ge C|z|$ for all $z \in F^*$. Let $V$ be the open ball of radius $C$ around $0$ in $F$. Suppose that $y \in V \setminus \overline{f(U)}$. Since $\overline{f(U)}$ is a closed convex set, there is some $z \in F^*$ such that $|zfx| \le 1$ for all $x \in U$ and $zy > 1$ (see Theorem 1.52). Then $|z| > C^{-1}$ and

\[ |f^*z| = \sup_{x \in U} |(f^*z)x| = \sup_{x \in U} |zfx| \le 1 < C|z|, \]

which is a contradiction.
which is a contradiction.

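Theorem 1.79 (Closed range theorem). Let $E, F$ be normed vector spaces and let $f \in L(E, F)$. Then $\ker f^* = (\operatorname{im} f)^\perp$ and $\ker f = {}^\perp(\operatorname{im} f^*)$. If $E$ and $F$ are Banach spaces, then the following are equivalent:

1. $\operatorname{im} f$ is closed.

2. $\operatorname{im} f = {}^\perp(\ker f^*)$.

3. $\operatorname{im} f^* = (\ker f)^\perp$.

4. $\operatorname{im} f^*$ is closed.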

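Proof. We have

\[
\begin{aligned}
x \in \ker f^* &\Leftrightarrow f^*x = 0 \\
&\Leftrightarrow \langle y, f^*x \rangle = 0 \text{ for all } y \in E \\
&\Leftrightarrow \langle fy, x \rangle = 0 \text{ for all } y \in E \\
&\Leftrightarrow x \in (\operatorname{im} f)^\perp,
\end{aligned}
\]

so $\ker f^* = (\operatorname{im} f)^\perp$. A similar argument shows that $\ker f = {}^\perp(\operatorname{im} f^*)$. We have ${}^\perp(\ker f^*) = {}^\perp((\operatorname{im} f)^\perp)$, and Lemma 1.75 shows that $\operatorname{im} f = {}^\perp((\operatorname{im} f)^\perp)$ if and only if $\operatorname{im} f$ is closed. This proves (1) ⇔ (2). If $y \in \operatorname{im} f^*$ and $x \in \ker f$ then $y = f^*y'$ for some $y' \in F^*$, so

\[ \langle x, y \rangle = \langle x, f^*y' \rangle = \langle fx, y' \rangle = 0. \]

Therefore $\operatorname{im} f^* \subseteq (\ker f)^\perp$. Suppose that $\operatorname{im} f$ is closed; we want to show that $(\ker f)^\perp \subseteq \operatorname{im} f^*$. Let $\pi : E \to E/\ker f$ be the quotient map. By Theorem 1.71, we have a linear homeomorphism $\varphi : E/\ker f \to \operatorname{im} f$ such that $f = \varphi \circ \pi$. If $y \in (\ker f)^\perp$ then $\ker f \subseteq \ker y$, so there is a continuous linear map $\tilde{y} : E/\ker f \to K$ such that $y = \tilde{y} \circ \pi$. Then

\[ y = \tilde{y} \circ \pi = \tilde{y} \circ \varphi^{-1} \circ f, \]

so $y = f^*\hat{y} \in \operatorname{im} f^*$ if $\hat{y}$ is any extension of $\tilde{y} \circ \varphi^{-1}$ to $F$ (using Theorem 1.48). This proves (1) ⇒ (3), and (3) ⇒ (4) is obvious.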

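Suppose that $\operatorname{im} f^*$ is closed. Let $g : E \to \overline{\operatorname{im} f}$ be $f$ with its codomain restricted to $\overline{\operatorname{im} f}$, so that $f = \iota \circ g$ where $\iota : \overline{\operatorname{im} f} \to F$ is the canonical injection. Then $g^* : (\overline{\operatorname{im} f})^* \to E^*$ satisfies $g^* \circ \iota^* = f^*$. In order to apply Lemma 1.78 to $g$, we need to check that $g^*$ is injective and $\operatorname{im} g^*$ is closed. The kernel of $\iota^* : F^* \to (\overline{\operatorname{im} f})^*$ is

\[ \ker \iota^* = (\operatorname{im} \iota)^\perp = (\overline{\operatorname{im} f})^\perp = (\operatorname{im} f)^\perp, \]

so by Theorem 1.71 there is a linear homeomorphism $\varphi : F^*/(\operatorname{im} f)^\perp \to (\overline{\operatorname{im} f})^*$ satisfying $\iota^* = \varphi \circ \pi$ where $\pi : F^* \to F^*/(\operatorname{im} f)^\perp$ is the projection. Consider $f^* : F^* \to E^*$; since $\ker f^* = (\operatorname{im} f)^\perp$, Theorem 1.71 shows that there is a continuous injective linear map $\tilde{f} : F^*/(\operatorname{im} f)^\perp \to E^*$ with $\operatorname{im} \tilde{f} = \operatorname{im} f^*$ and $f^* = \tilde{f} \circ \pi$. Now

\[ g^* \circ \varphi \circ \pi = g^* \circ \iota^* = f^* = \tilde{f} \circ \pi, \]

so $g^* \circ \varphi = \tilde{f}$ (by uniqueness in Theorem 1.71). Then $g^*$ is injective and $\operatorname{im} g^* = \operatorname{im} \tilde{f} = \operatorname{im} f^*$ is closed, so Lemma 1.78 shows that $g$ is surjective. Therefore $\operatorname{im} f = \operatorname{im} g = \overline{\operatorname{im} f}$ is closed.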

1.5 Hilbert spaces

Projections

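Let $V$ be a vector space and let $\rho : V \to V$ be a linear map. We say that $\rho$ is a projection (onto $\operatorname{im} \rho$ along $\ker \rho$) if $\rho|_{\operatorname{im} \rho} = \mathrm{Id}_{\operatorname{im} \rho}$ and $V = \operatorname{im} \rho \oplus \ker \rho$ (i.e. $\operatorname{im} \rho + \ker \rho = V$ and $\operatorname{im} \rho \cap \ker \rho = \{0\}$). We say that $\rho$ is idempotent if $\rho^2 = \rho$.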

Theorem 1.80 (Properties of projections). Let V be a vector space and let ρ :V → V be a linear map.

1. ρ is a projection if and only if it is idempotent.

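2. If $\rho$ is a projection (onto $\operatorname{im} \rho$ along $\ker \rho$), then $\mathrm{Id}_V - \rho$ is a projection (onto $\ker \rho$ along $\operatorname{im} \rho$).

3. If $S, T$ are subspaces of $V$ such that $V = S \oplus T$, then there exists a projection $\rho_{S,T} : V \to V$ onto $S$ along $T$, defined by $\rho_{S,T}(s + t) = s$ where $s \in S$ and $t \in T$.

Proof. We only prove (1). If $\rho$ is a projection, $s \in \operatorname{im} \rho$ and $t \in \ker \rho$, then $\rho^2(s + t) = s = \rho(s + t)$. Conversely, suppose that $\rho^2 = \rho$. If $x \in \operatorname{im} \rho \cap \ker \rho$ then $x = \rho y$ for some $y \in V$, so $x = \rho y = \rho^2 y = \rho x = 0$. If $x \in V$ then $x = \rho x + (x - \rho x)$, where $\rho x \in \operatorname{im} \rho$ and $x - \rho x \in \ker \rho$ because $\rho(x - \rho x) = \rho x - \rho^2 x = 0$. This shows that $V = \operatorname{im} \rho \oplus \ker \rho$. Finally, if $x \in \operatorname{im} \rho$ then $x = \rho y$ for some $y \in V$, so $\rho x = \rho^2 y = \rho y = x$.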
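For a concrete instance of parts (1) to (3), the sketch below works in V = R^2 with the hypothetical choice S = span{(1, 0)} and T = span{(1, 1)}. Writing (a, b) = (a − b)(1, 0) + b(1, 1) shows that the projection onto S along T sends (a, b) to (a − b, 0); the matrix `rho` and the helper names are assumptions made for illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(m, v):
    """Apply a 2x2 matrix to a vector in R^2."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

identity = ((1, 0), (0, 1))
rho = ((1, -1), (0, 0))   # projection onto S = span{(1,0)} along T = span{(1,1)}
comp = tuple(             # Id - rho, the projection onto T along S
    tuple(identity[i][j] - rho[i][j] for j in range(2)) for i in range(2)
)

print(matmul(rho, rho) == rho)      # rho is idempotent
print(matmul(comp, comp) == comp)   # so is Id - rho
print(apply(rho, (1, 0)), apply(rho, (1, 1)))   # rho fixes S and kills T
```

Integer entries are used so that the equality checks are exact rather than subject to floating-point rounding.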

Theorem 1.81 (Continuity of projections). Let $V$ be a normed vector space and let $\rho : V \to V$ be a projection.

1. If $\rho$ is continuous then $\operatorname{im} \rho$ and $\ker \rho$ are both closed, and $V = \operatorname{im} \rho \times \ker \rho$ in the sense of Theorem 1.67.

2. If $V$ is a Banach space, and $\operatorname{im} \rho$ and $\ker \rho$ are both closed, then $\rho$ is continuous.

Proof. For (1), $\ker \rho = \rho^{-1}(0)$ is closed because $\rho$ is continuous. But $\mathrm{Id}_V - \rho$ is also continuous and is the projection onto $\ker \rho$ along $\operatorname{im} \rho$, so the same argument shows that $\operatorname{im} \rho$ is closed. For (2), we use Theorem 1.59. Let $\{x_n\}$ be a sequence in $V$ such that $(x_n, \rho x_n) \to (x, y)$ for some $x, y \in V$. It remains to show that $y = \rho x$. Since $\rho x_n \in \operatorname{im} \rho$ and $\operatorname{im} \rho$ is closed, we have $y \in \operatorname{im} \rho$. Since $(\mathrm{Id}_V - \rho)x_n \in \ker \rho$ and $\ker \rho$ is closed, we have $x - y \in \ker \rho$. Therefore $y = \rho y = \rho y + \rho(x - y) = \rho x$.

Not all projections on a Banach space $V$ are continuous, and it is not always possible to construct a continuous projection onto a subspace $S \subseteq V$, even when $S$ is closed. This is related to the fact that a closed complement does not always exist for a closed subspace $S$ of $V$ (if it does, then Theorem 1.81 implies that the projection onto $S$ along its closed complement is continuous). However, we will show that closed complements do always exist for closed subspaces if $V$ is a Hilbert space.

Now let E be an inner product space. A projection ρ : E → E is orthogonal ifim ρ ⊥ ker ρ, i.e. E = im ρ ker ρ.

Theorem 1.82. A linear map ρ : E → E is an orthogonal projection if and only if it is a projection and |ρx| ≤ |x| for all x ∈ E. In that case, |ρx| = |x| if and only if x ∈ im ρ.

Proof. Suppose that ρ is an orthogonal projection. If s ∈ im ρ and t ∈ ker ρ, then

|ρ(s + t)|² = |s|² ≤ |s|² + |t|² = |s + t|²

since s ⊥ t. Equality holds if and only if t = 0, i.e. if and only if s + t ∈ im ρ. Conversely, suppose that ρ is a projection and |ρx| ≤ |x| for all x ∈ E; we want to show that im ρ ⊥ ker ρ. If s ∈ im ρ and t ∈ ker ρ then

|s|² = |ρ(s + t)|² ≤ |s + t|² = |s|² + ⟨s, t⟩ + ⟨t, s⟩ + |t|²,

so −2 Re(⟨s, t⟩) ≤ |t|² and s ⊥ ker ρ by Lemma 1.13.

The inequality |ρx| ≤ |x| is known as Bessel's inequality, and it implies that every orthogonal projection is continuous.

Existence of orthogonal projections

We will now show that if S is a closed subspace of a Hilbert space E, then there exists an orthogonal projection onto S. Thus, Theorem 1.81 shows that E can be decomposed into a direct product of closed subspaces S and S⊥.

Theorem 1.83. Let E be an inner product space and let S be a nonempty complete convex subset of E. There exists a unique element v ∈ S such that

$$|v| = \inf_{u \in S} |u|.$$

Proof. Let $d = \inf_{u \in S} |u|$ and choose a sequence $u_n$ in S such that $|u_n| \to d$. By the parallelogram law,

$$|u_m - u_n|^2 = 2|u_m|^2 + 2|u_n|^2 - 4\left|(u_m + u_n)/2\right|^2 \le 2|u_m|^2 + 2|u_n|^2 - 4d^2,$$

so $u_n$ is Cauchy and converges to some v ∈ S. (Note that $(u_m + u_n)/2 \in S$ since S is convex.) Since the norm is continuous, we have $|v| = \lim_{n \to \infty} |u_n| = d$. If v, v′ ∈ S and |v| = |v′| = d then

$$|v - v'|^2 = 2|v|^2 + 2|v'|^2 - 4\left|(v + v')/2\right|^2 \le 2d^2 + 2d^2 - 4d^2 = 0,$$

so v is unique.



Corollary 1.84. Let E be an inner product space, let x ∈ E, and let S be a nonempty complete convex subset of E. There exists a unique v ∈ S such that

$$|x - v| = \inf_{u \in S} |x - u|.$$

We call v the best approximation to x in S.

Proof. Apply Theorem 1.83 to the set x − S = {x − u : u ∈ S}, which is again nonempty, complete and convex.

Theorem 1.85. Let E be a Hilbert space over K, let x ∈ E, and let S be a closed subspace of E. The best approximation to x in S is the unique vector v in S for which x − v ⊥ S.

Proof. If x − v ⊥ S then for any u ∈ S we have x − v ⊥ v − u, so

|x − u|² = |x − v|² + |v − u|² ≥ |x − v|².

Therefore v is the best approximation to x in S. Conversely, suppose that v is the best approximation to x in S. For nonzero u ∈ S and r ∈ K we compute

$$\begin{aligned}
|x - v - ru|^2 &= |x - v|^2 - \bar{r}\,\langle x - v, u\rangle - r\,\langle u, x - v\rangle + r\bar{r}\,|u|^2 \\
&= |x - v|^2 + |u|^2 \left( r\bar{r} - \bar{r}\,\frac{\langle x - v, u\rangle}{|u|^2} - r\,\frac{\overline{\langle x - v, u\rangle}}{|u|^2} \right) \\
&= |x - v|^2 + |u|^2 \left| r - \frac{\langle x - v, u\rangle}{|u|^2} \right|^2 - \frac{|\langle x - v, u\rangle|^2}{|u|^2}.
\end{aligned}$$

Put $r = \langle x - v, u\rangle / |u|^2$ to get

$$\left| x - v - \frac{\langle x - v, u\rangle}{|u|^2}\,u \right|^2 = |x - v|^2 - \frac{|\langle x - v, u\rangle|^2}{|u|^2}.$$

Since v is the best approximation to x in S and v + ru ∈ S, we have

$$|x - v|^2 - \frac{|\langle x - v, u\rangle|^2}{|u|^2} \ge |x - v|^2,$$

which implies that ⟨x − v, u⟩ = 0. Therefore x − v ⊥ S.

Alternatively, we can use Theorem 2.49. Define f : S → R by f(y) = |x − y|² = ⟨x − y, x − y⟩. Then

$$Df(v)u = \langle -u, x - v\rangle + \langle x - v, -u\rangle = -2\operatorname{Re}(\langle x - v, u\rangle) = 0$$

for all u ∈ S since f has a global minimum at v, so Lemma 1.13 shows that x − v ⊥ S.

Theorem 1.86 (Existence of orthogonal projections). Let E be a Hilbert space and let S be a closed subspace of E. There exists an orthogonal projection onto S.

Proof. Let ρ : E → E be the map that sends each x ∈ E to the best approximation to x in S. Theorem 1.85 shows that ρx is the unique vector in S such that x − ρx ⊥ S, so it follows immediately that ρ is linear and ker ρ = S⊥. Therefore im ρ ⊥ ker ρ. If x ∈ E then x = ρx + (x − ρx) ∈ S + S⊥, so E = S ⊕ S⊥ = im ρ ⊕ ker ρ. This shows that ρ is an orthogonal projection.
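As a finite-dimensional illustration of Theorems 1.85 and 1.86, the following sketch builds the orthogonal projection onto a subspace of R⁵ and checks the characterisation x − ρx ⊥ S numerically. The matrix `A`, the helper `orthogonal_projection` and the use of a QR factorisation are assumptions made for the example; they are not part of the text.

```python
import numpy as np

def orthogonal_projection(A):
    """Matrix of the orthogonal projection onto the column span of A.

    Assumes the columns of A are linearly independent (illustrative helper).
    """
    Q, _ = np.linalg.qr(A)           # columns of Q: orthonormal basis of span(A)
    return Q @ Q.conj().T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))      # S = column span of A, a subspace of R^5
x = rng.standard_normal(5)

P = orthogonal_projection(A)
v = P @ x                            # candidate best approximation to x in S

# x - v should be orthogonal to every spanning vector of S
residual_dots = A.T @ (x - v)

# any other point u of S should be at least as far from x as v is
u = A @ rng.standard_normal(2)
dist_v = np.linalg.norm(x - v)
dist_u = np.linalg.norm(x - u)
```

Here `P` is idempotent and symmetric, which matches the description of orthogonal projections as self-adjoint idempotents given later in Theorem 1.107.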

Fourier expansions

Let O = {u₁, …, uₙ} be a finite orthonormal subset of a Hilbert space E. If x ∈ span(O) then

x = a₁u₁ + ⋯ + aₙuₙ

for some a₁, …, aₙ ∈ K. Taking inner products, we have

⟨x, uᵢ⟩ = a₁⟨u₁, uᵢ⟩ + ⋯ + aₙ⟨uₙ, uᵢ⟩ = aᵢ

and therefore

$$x = \sum_{i=1}^{n} \langle x, u_i\rangle u_i.$$

Each ⟨x, uᵢ⟩ is called a Fourier coefficient, and the above sum is called the Fourier expansion of x. If x ∈ E then

$$\left\langle x - \sum_{i=1}^{n} \langle x, u_i\rangle u_i,\; u_j \right\rangle = \langle x, u_j\rangle - \sum_{i=1}^{n} \langle x, u_i\rangle \langle u_i, u_j\rangle = 0,$$

so using Theorem 1.85 we have an explicit expression for the best approximation ρx to x in span(O):

$$\rho x = \sum_{i=1}^{n} \langle x, u_i\rangle u_i.$$

(Here, ρ is the orthogonal projection onto span(O).) Note that Bessel's inequality now takes the form

$$\sum_{i=1}^{n} |\langle x, u_i\rangle|^2 = \left| \sum_{i=1}^{n} \langle x, u_i\rangle u_i \right|^2 \le |x|^2,$$

with equality if and only if x ∈ span(O). We would like to obtain a similar expression for ρx when O is an infinite orthonormal subset of E.
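The finite Fourier expansion can be checked numerically. The sketch below works in C⁴ with a randomly generated orthonormal pair; the helper `inner` (linear in the first argument, conjugate linear in the second, as in the text) and all variable names are assumptions made for the example.

```python
import numpy as np

def inner(x, y):
    """<x, y>, linear in x and conjugate linear in y."""
    return np.vdot(y, x)             # np.vdot conjugates its first argument

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
Q, _ = np.linalg.qr(M)
O = [Q[:, 0], Q[:, 1]]               # an orthonormal set {u1, u2} in C^4

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
coeffs = [inner(x, u) for u in O]    # Fourier coefficients <x, u_i>
rho_x = sum(c * u for c, u in zip(coeffs, O))   # best approximation in span(O)

bessel_lhs = sum(abs(c) ** 2 for c in coeffs)
bessel_rhs = np.linalg.norm(x) ** 2

# a vector already in span(O) is recovered from its own coefficients
y = 2.0 * O[0] - 1j * O[1]
y_coeffs = [inner(y, u) for u in O]
```

For a generic `x` the inequality is strict, while for `y` in span(O) the coefficients are exactly the ones used to build it.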



Theorem 1.87. Let $\{u_\alpha\}_{\alpha \in A}$ be an orthonormal subset of a Hilbert space E. A series $\sum_{\alpha \in A} r_\alpha u_\alpha$ (where $r_\alpha \in K$) converges if and only if $\sum_{\alpha \in A} |r_\alpha|^2$ converges. In that case,

$$\left| \sum_{\alpha \in A} r_\alpha u_\alpha \right|^2 = \sum_{\alpha \in A} |r_\alpha|^2.$$

Proof. We use Theorem 1.27. The first series converges if and only if for every ε > 0 there exists a finite set S ⊆ A such that

$$\left| \sum_{\alpha \in T} r_\alpha u_\alpha \right| < \varepsilon$$

whenever T ⊆ A is finite and S ∩ T = ∅. But

$$\left| \sum_{\alpha \in T} r_\alpha u_\alpha \right|^2 = \sum_{\alpha \in T} |r_\alpha|^2,$$

so the second series converges if and only if the first series converges. The second statement follows from the continuity of the norm (see Theorem 1.26).

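Lemma 1.88. Let E be an inner product space and let S be a subset of E.

1. S⊥ is closed.

2. $S^\perp = (\overline{\operatorname{span}(S)})^\perp$.

3. $\overline{\operatorname{span}(S)} = S^{\perp\perp}$. If S is a closed subspace of E, then $S = S^{\perp\perp}$.

Proof. If x ∈ E then x⊥ = {v ∈ E : ⟨x, v⟩ = 0} is closed by the continuity of the inner product. Therefore $S^\perp = \bigcap_{x \in S} x^\perp$ is closed. This proves (1). Since $S \subseteq \overline{\operatorname{span}(S)}$, we have $S^\perp \supseteq (\overline{\operatorname{span}(S)})^\perp$. Conversely, let $x \in S^\perp$ and $y \in \overline{\operatorname{span}(S)}$. Choose a sequence $y_n$ in span(S) such that $y_n \to y$. Then $\langle x, y_n\rangle = 0$ for each n, so ⟨x, y⟩ = 0 by continuity. This proves (2). For (3), Theorem 1.86 shows that

$$E = \overline{\operatorname{span}(S)} \oplus (\overline{\operatorname{span}(S)})^\perp = \overline{\operatorname{span}(S)} \oplus S^\perp,$$

and since $S^\perp$ is closed, $E = S^\perp \oplus S^{\perp\perp}$. Therefore $S^{\perp\perp} = \overline{\operatorname{span}(S)}$ by Theorem 1.14.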

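Now let $O = \{u_\alpha\}_{\alpha \in A}$ be an arbitrary orthonormal subset of E, and let x ∈ E. From the case of finite O, we have

$$\sup_{S} \sum_{\alpha \in S} |\langle x, u_\alpha\rangle|^2 \le |x|^2$$

where the sup is taken over all finite sets S ⊆ A. By Theorem 1.31, the series

$$\sum_{\alpha \in A} |\langle x, u_\alpha\rangle|^2$$

converges, and Theorem 1.87 shows that

$$\sum_{\alpha \in A} \langle x, u_\alpha\rangle u_\alpha$$

converges to some element $v \in \overline{\operatorname{span}(O)}$. By continuity we have

$$\langle x - v, u_\beta\rangle = 0$$

for every β ∈ A, so $x - v \in \operatorname{span}(O)^\perp = (\overline{\operatorname{span}(O)})^\perp$ using Lemma 1.88. Therefore Theorem 1.85 shows that

$$\rho x = v = \sum_{\alpha \in A} \langle x, u_\alpha\rangle u_\alpha$$

is the best approximation to x in $\overline{\operatorname{span}(O)}$.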

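Theorem 1.89 (Fourier expansion formula). Let $O = \{u_\alpha\}_{\alpha \in A}$ be an orthonormal subset of E and let x ∈ E. The Fourier expansion

$$\sum_{\alpha \in A} \langle x, u_\alpha\rangle u_\alpha$$

converges, and is the best approximation to x in $\overline{\operatorname{span}(O)}$. Bessel's inequality

$$\sum_{\alpha \in A} |\langle x, u_\alpha\rangle|^2 = \left| \sum_{\alpha \in A} \langle x, u_\alpha\rangle u_\alpha \right|^2 \le |x|^2$$

holds, with equality if and only if $x \in \overline{\operatorname{span}(O)}$.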

Products of Hilbert spaces

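Let $(E_\alpha)_{\alpha \in A}$ be a collection of Hilbert spaces over K. The (Hilbert space) product $E = \prod_{\alpha \in A} E_\alpha$ is the vector space of all functions $f : A \to \bigcup_{\alpha \in A} E_\alpha$ where $f(\alpha) \in E_\alpha$ for all α ∈ A and $\sum_{\alpha \in A} |f(\alpha)|^2$ converges. We usually write an element of E as $(x_\alpha)_{\alpha \in A}$, where $x_\alpha \in E_\alpha$ and $\sum_{\alpha \in A} |x_\alpha|^2 < \infty$. If $(x_\alpha), (y_\alpha) \in E$ then

$$\sum_{\alpha \in S} |x_\alpha + y_\alpha|^2 \le 2\left( \sum_{\alpha \in S} |x_\alpha|^2 + \sum_{\alpha \in S} |y_\alpha|^2 \right) \le 2\left( \sum_{\alpha \in A} |x_\alpha|^2 + \sum_{\alpha \in A} |y_\alpha|^2 \right)$$

for all finite S ⊆ A, so Theorem 1.31 shows that $\sum_{\alpha \in A} |x_\alpha + y_\alpha|^2$ converges. We can define an inner product on E by

$$\langle (x_\alpha), (y_\alpha)\rangle = \sum_{\alpha \in A} \langle x_\alpha, y_\alpha\rangle;$$

this sum is absolutely convergent since

$$\sum_{\alpha \in S} |\langle x_\alpha, y_\alpha\rangle| \le \sum_{\alpha \in S} |x_\alpha|\,|y_\alpha| \le \frac{1}{2}\left( \sum_{\alpha \in S} |x_\alpha|^2 + \sum_{\alpha \in S} |y_\alpha|^2 \right) \le \frac{1}{2}\left( \sum_{\alpha \in A} |x_\alpha|^2 + \sum_{\alpha \in A} |y_\alpha|^2 \right)$$

for all finite S ⊆ A. Finally, we prove that E is complete. Let $(x^{(n)})$ be a Cauchy sequence in E, where $x^{(n)} = (x^{(n)}_\alpha)$. For any ε > 0 there exists an integer N such that

$$|x^{(m)} - x^{(n)}|^2 = \sum_{\alpha \in A} |x^{(m)}_\alpha - x^{(n)}_\alpha|^2 < \varepsilon \qquad (*)$$

for all m, n ≥ N. For each α ∈ A we have $|x^{(m)}_\alpha - x^{(n)}_\alpha|^2 < \varepsilon$, so $x^{(n)}_\alpha$ is a Cauchy sequence in $E_\alpha$ and converges to some $x_\alpha$. If S ⊆ A is finite and n ≥ N then

$$\sum_{\alpha \in S} |x^{(n)}_\alpha - x_\alpha|^2 \le \varepsilon$$

using (*) and the continuity of the norm, so Theorem 1.31 shows that

$$\sum_{\alpha \in A} |x^{(n)}_\alpha - x_\alpha|^2 \le \varepsilon.$$

Therefore $(x^{(n)}_\alpha - x_\alpha) \in E$, $(x_\alpha) \in E$, and $x^{(n)} \to (x_\alpha)$. This shows that E is a Hilbert space.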



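For each α ∈ A, there is a canonical injection $\iota_\alpha : E_\alpha \to E$ given by $\iota_\alpha(x) = \delta_\alpha x$, where $\delta_\alpha : A \to \{0, 1\}$ is defined by

$$\delta_\alpha(\beta) = \begin{cases} 1, & \alpha = \beta, \\ 0, & \alpha \ne \beta. \end{cases}$$

This map is a linear homeomorphism onto a closed subspace of E. It is clear that $\iota_\alpha(E_\alpha) \perp \iota_\beta(E_\beta)$ for all α ≠ β. Furthermore, if $(x_\alpha) \in E$ then the series $\sum_{\alpha \in A} \iota_\alpha(x_\alpha)$ converges to $(x_\alpha)$ since

$$\left| \sum_{\alpha \in S} \iota_\alpha(x_\alpha) \right|^2 = \sum_{\alpha \in S} |x_\alpha|^2$$

for all finite S ⊆ A. Therefore the (algebraic) direct sum $\bigoplus_{\alpha \in A} E_\alpha$ (all $(x_\alpha)_{\alpha \in A}$ where $x_\alpha \ne 0$ for only finitely many α) is dense in $E = \prod_{\alpha \in A} E_\alpha$. The converse holds as well (cf. Theorem 1.67):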

Theorem 1.90 (Internal Hilbert space products). Let E be a Hilbert space and let $(E_\alpha)_{\alpha \in A}$ be a collection of closed subspaces of E such that $E_\alpha \perp E_\beta$ whenever α ≠ β, and $\bigoplus_{\alpha \in A} E_\alpha$ is dense in E. There exists a unique isometric isomorphism $\varphi : E \to \prod_{\alpha \in A} E_\alpha$ such that $\varphi|_{E_\alpha} = \iota_\alpha$ for all α ∈ A.

Proof. We have a natural injection

$$f : \bigoplus_{\alpha \in A} E_\alpha \to \prod_{\alpha \in A} E_\alpha.$$

This map is a linear isometry:

$$|f((x_\alpha))|^2 = \sum_{\alpha \in A} |x_\alpha|^2 = \left| \sum_{\alpha \in A} x_\alpha \right|^2 = |(x_\alpha)|^2$$

since $E_\alpha \perp E_\beta$ for α ≠ β. (The above sum is finite, since $(x_\alpha) \in \bigoplus_{\alpha \in A} E_\alpha$.) By Theorem 1.45, there is a unique extension of f to a linear isometry φ defined on $\overline{\bigoplus_{\alpha \in A} E_\alpha} = E$.

It is easy to see that this map is surjective, so it is an isometric isomorphism.

The product of a finite number of Hilbert spaces is both a product and a coproduct, as in Theorem 1.64 and Theorem 1.66. However, an infinite product is not necessarily a product or a coproduct.



Hilbert bases

A Hilbert basis for a Hilbert space E is a maximal orthonormal subset of E. From now on, an ordinary vector space basis (a maximal linearly independent subset of E) will be referred to as a Hamel basis. Clearly, every orthonormal Hamel basis is also a Hilbert basis. If E is finite-dimensional, then the two concepts are equivalent.

Theorem 1.91 (Existence of Hilbert bases). Let E be a Hilbert space and let O be an orthonormal subset of E. There exists a Hilbert basis B for E that contains O.

Proof. Let $\mathcal{A}$ be the set of all orthonormal sets in E containing O. Then $\mathcal{A}$ is a nonempty partially ordered set under set inclusion. If $C = \{O_i\}_{i \in I}$ is a totally ordered subset of $\mathcal{A}$, then $\bigcup_{i \in I} O_i$ is an upper bound of C. By Zorn's lemma, there is a maximal element B of $\mathcal{A}$. By definition, B is a Hilbert basis for E.

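Theorem 1.92 (Properties of Hilbert bases). Let $O = \{u_\alpha\}_{\alpha \in A}$ be an orthonormal subset of E. The following are equivalent:

1. O is a Hilbert basis for E.

2. $O^\perp = \{0\}$.

3. $\overline{\operatorname{span}(O)} = E$.

4. ρx = x for all x ∈ E, where ρ is the orthogonal projection onto $\overline{\operatorname{span}(O)}$.

5. Equality holds in Bessel's inequality: |ρx| = |x| for all x ∈ E. Equivalently,

$$|x|^2 = \sum_{\alpha \in A} |\langle x, u_\alpha\rangle|^2$$

for all x ∈ E.

6. Parseval's identity holds:

$$\langle x, y\rangle = \langle \rho x, \rho y\rangle = \left\langle \sum_{\alpha \in A} \langle x, u_\alpha\rangle u_\alpha,\; \sum_{\alpha \in A} \langle y, u_\alpha\rangle u_\alpha \right\rangle = \sum_{\alpha \in A} \langle x, u_\alpha\rangle \overline{\langle y, u_\alpha\rangle}$$

for all x, y ∈ E.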



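Proof. If $O^\perp \ne \{0\}$ then we can choose some unit vector $u \in O^\perp$. The set O ∪ {u} is orthonormal and properly contains O, so O is not maximal. Conversely, if O is not maximal then $O^\perp \ne \{0\}$. This proves (1) ⇔ (2). Lemma 1.88 shows that

$$E = \overline{\operatorname{span}(O)} \oplus (\overline{\operatorname{span}(O)})^\perp = \overline{\operatorname{span}(O)} \oplus O^\perp,$$

so $O^\perp = \{0\}$ if and only if $\overline{\operatorname{span}(O)} = E$, which proves (2) ⇔ (3). The implications (3) ⇒ (4) ⇒ (5) are obvious, and (5) ⇒ (3) follows from Theorem 1.82. Also, (4) ⇒ (6) and (6) ⇒ (5) are clear.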

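Theorem 1.93. Let E be a Hilbert space. All Hilbert bases for E have the same cardinality.

Proof. Let $\mathcal{A} = \{u_\alpha\}_{\alpha \in A}$ and $\mathcal{B} = \{v_\beta\}_{\beta \in B}$ be Hilbert bases for E. If E is finite-dimensional, the result is standard linear algebra, so assume that A and B are infinite. Then

$$u_\alpha = \sum_{\beta \in B} \langle u_\alpha, v_\beta\rangle v_\beta$$

and $\langle u_\alpha, v_\beta\rangle v_\beta \ne 0$ only for β in a countable set $B_\alpha \subseteq B$, by Corollary 1.29. Furthermore, $\bigcup_{\alpha \in A} B_\alpha = B$ since we cannot have $v_\beta \perp \mathcal{A}$ for any β. So

$$|B| = \left| \bigcup_{\alpha \in A} B_\alpha \right| \le \aleph_0 |A| = |A|.$$

Similarly we have |A| ≤ |B|, so |A| = |B|.

We call this cardinality the Hilbert dimension of E. If C is any set and K = R or K = C, we define the space

$$\ell^2(C) = \prod_{c \in C} K.$$

Clearly, $\{\delta_c\}_{c \in C}$ is a Hilbert basis for $\ell^2(C)$, and the Hilbert dimension of $\ell^2(C)$ is |C|. (Recall that $\delta_c(a) = 1$ if a = c and $\delta_c(a) = 0$ if a ≠ c.)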

Theorem 1.94.

1. If $|C_1| = |C_2|$, then $\ell^2(C_1)$ is isometrically isomorphic to $\ell^2(C_2)$.

2. If B is a Hilbert basis for a Hilbert space E, then E is isometrically isomorphic to $\ell^2(B)$.

3. Two Hilbert spaces are isometrically isomorphic if and only if they have the same Hilbert dimension.



Theorem 1.95. Let E be a Hilbert space. Then E is separable if and only if it has a countable Hilbert basis.

Proof. If $\{u_n\}$ is a countable dense subset of E, then applying the Gram-Schmidt process (Theorem 1.12) to $\{u_n\}$ gives a countable Hilbert basis for E. Conversely, if $\{u_n\}$ is a countable Hilbert basis for E, then the span of $\{u_n\}$ with rational coefficients is a countable dense subset of E.

Duality

A map f : E → F between real or complex vector spaces is conjugate linear if f(x + y) = f(x) + f(y) and $f(rx) = \bar{r} f(x)$ for all x, y ∈ E and r ∈ K.

Theorem 1.96 (Riesz representation theorem). Let E be a Hilbert space over K and let f : E → K be a continuous linear functional. There exists a unique $v_f \in E$ such that

$$f(x) = \langle x, v_f\rangle$$

for all x ∈ E, and $|f| = |v_f|$. The map $f \mapsto v_f$ is a conjugate linear homeomorphism.

Proof. We can assume that f ≠ 0, for otherwise we can take v = 0. Let F = ker f, which is closed. Then Theorem 1.86 shows that $E = F \oplus F^\perp$, and $\dim(F^\perp) = 1$ since dim(E/F) = 1. Choose some nonzero $y \in F^\perp$ and let

$$v = \frac{\overline{f(y)}}{|y|^2}\, y.$$

If x ∈ F then f(x) = 0 = ⟨x, v⟩. If $x \in F^\perp$ then x = ry for some r ∈ K, so

$$f(x) = r f(y) = \frac{f(y)}{|y|^2} \langle ry, y\rangle = \langle x, v\rangle.$$

Therefore f(x) = ⟨x, v⟩ for all x ∈ E.
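In Cⁿ the Riesz representative can be written down explicitly, which makes the construction in the proof easy to test. The sketch below uses a hypothetical functional f(x) = a · x on C³; the vector `a`, the helper `inner` and the choice y = 3 v_f are assumptions made for illustration.

```python
import numpy as np

def inner(x, y):
    """<x, y>, linear in x and conjugate linear in y."""
    return np.vdot(y, x)

a = np.array([1 + 2j, -3j, 0.5])

def f(x):
    """A continuous linear functional on C^3 (hypothetical example)."""
    return a @ x

v_f = a.conj()                       # candidate representative: f(x) = <x, v_f>

rng = np.random.default_rng(2)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# construction from the proof: any nonzero y in (ker f)^perp = span(v_f)
y = 3 * v_f
v = np.conj(f(y)) / np.linalg.norm(y) ** 2 * y

# |f| is attained at the unit vector v_f / |v_f|
unit = v_f / np.linalg.norm(v_f)
```

The conjugate on `a` reflects the conjugate linearity of the map f ↦ v_f.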

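Given a Hilbert space E over K, we can define an inner product on its dual E* by

$$\langle f, g\rangle = \langle v_g, v_f\rangle.$$

For example,

$$\langle af + bg, h\rangle = \langle v_h, v_{af+bg}\rangle = \langle v_h, \bar{a} v_f + \bar{b} v_g\rangle = a\langle f, h\rangle + b\langle g, h\rangle.$$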



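Since

$$\langle v_f, v_f\rangle = |v_f|^2 = |f|^2$$

by Theorem 1.96, the norm induced by this inner product agrees with the usual operator norm in E* = L(E, K).

Let τ : E → E** be the injection given by (τx)f = fx (see Lemma 1.54), and let $\varphi_E : E^* \to E$ be the conjugate linear homeomorphism defined in Theorem 1.96. Consider the map $\varphi_E \varphi_{E^*} : E^{**} \to E$; since $\varphi_{E^*}$ and $\varphi_E$ are both conjugate linear homeomorphisms, $\varphi_E \varphi_{E^*}$ is a linear homeomorphism. Furthermore,

$$|\varphi_E \varphi_{E^*} \alpha| = |\varphi_{E^*} \alpha| = |\alpha|,$$

so $\varphi_E \varphi_{E^*}$ is an isometric isomorphism. Note that

$$\langle y, \varphi_E f\rangle = fy = (\tau y)f = \langle f, \varphi_{E^*} \tau y\rangle = \langle \varphi_E \varphi_{E^*} \tau y, \varphi_E f\rangle.$$

Setting $f = \varphi_{E^*} \tau x$ gives

$$\langle y, \varphi_E \varphi_{E^*} \tau x\rangle = \langle \varphi_E \varphi_{E^*} \tau y, \varphi_E \varphi_{E^*} \tau x\rangle = \langle y, x\rangle,$$

so $\varphi_E \varphi_{E^*} \tau x = x$ by Corollary 1.8 and $\tau = (\varphi_E \varphi_{E^*})^{-1}$. This shows that all Hilbert spaces are reflexive, i.e. the injection τ is actually an isometric isomorphism from E to E**.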

Theorem 1.97. Let E, F be Hilbert spaces and let f : E → F be a continuous linear map. There is a unique continuous linear map f* : F → E, called the adjoint of f, satisfying

⟨fx, y⟩ = ⟨x, f*y⟩

for all x ∈ E and y ∈ F.

Proof. If ⟨fx, y⟩ = ⟨x, gy⟩ then ⟨x, (f* − g)y⟩ = 0 for all x ∈ E and y ∈ F, so |(f* − g)y|² = 0 for all y ∈ F. This proves uniqueness. For each y ∈ F, define a linear functional $f_y \in E^*$ by $f_y x = \langle fx, y\rangle$. By Theorem 1.96, there is a vector $v_{f_y} \in E$ such that $f_y x = \langle x, v_{f_y}\rangle$ for all x ∈ E. Define $f^* y = v_{f_y}$; then

$$\langle fx, y\rangle = \langle x, v_{f_y}\rangle = \langle x, f^* y\rangle$$

for all x ∈ E and y ∈ F. Since $y \mapsto f_y$ and $g \mapsto v_g$ are both conjugate linear, f* is linear. We have $|f_y x| \le |f|\,|x|\,|y|$, so

$$|f^* y| = |v_{f_y}| = |f_y| \le |f|\,|y|$$

and therefore f* is continuous.



Theorem 1.98 (Properties of the adjoint). Let E, F, G be Hilbert spaces over K, and let f, g ∈ L(E, F) and h ∈ L(F, G).

1. 0* = 0 and $\mathrm{Id}_E^* = \mathrm{Id}_E$.

2. (f + g)* = f* + g*.

3. $(rf)^* = \bar{r} f^*$ for all r ∈ K.

4. f** = f, and ⟨f*x, y⟩ = ⟨x, fy⟩ for all x ∈ F and y ∈ E.

5. |f*| = |f| and |f*f| = |ff*| = |f|².

6. (hg)* = g*h*.

7. If f is a linear homeomorphism, then (f*)⁻¹ = (f⁻¹)*.

The map f ↦ f* from L(E, F) to L(F, E) is a conjugate linear bijective isometry.

Proof. For (5), Theorem 1.97 shows that |f*| ≤ |f|. But |f| = |f**| ≤ |f*|, so |f*| = |f|. Also,

$$|f^* f|^{1/2} \le (|f^*|\,|f|)^{1/2} = |f|$$

and

$$|fx| = \langle fx, fx\rangle^{1/2} = \langle (f^* f)x, x\rangle^{1/2} \le |f^* f|^{1/2}\,|x|,$$

so |f*f| = |f|².
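For matrices acting on Cⁿ with the standard inner product, the adjoint is the conjugate transpose, so the defining identity and the norm identities in (5) can be checked directly. The following is a sketch under that assumption; the matrix `F`, the helper `inner` and the use of the spectral norm for the operator norm are choices made for the example.

```python
import numpy as np

def inner(x, y):
    """<x, y>, linear in x and conjugate linear in y."""
    return np.vdot(y, x)

def op_norm(T):
    """Operator norm of a matrix with respect to the Euclidean norms."""
    return np.linalg.norm(T, 2)      # largest singular value

rng = np.random.default_rng(3)
F = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))  # f : C^3 -> C^2
F_star = F.conj().T                                                 # f* : C^2 -> C^3

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)

lhs = inner(F @ x, y)                # <fx, y>
rhs = inner(x, F_star @ y)           # <x, f*y>
```

The equality of `lhs` and `rhs` is the defining property of the adjoint; the norm identities follow from the singular value decomposition in this finite-dimensional setting.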

The relationship between the adjoint and the transpose defined in Theorem 1.76is given by the following theorem:

Theorem 1.99. Let E, F be Hilbert spaces and let f ∈ L(E, F). Let $\varphi_E : E^* \to E$ and $\varphi_F : F^* \to F$ be the conjugate linear homeomorphisms defined in Theorem 1.96. If f* : F → E is the adjoint of f and $f^T : F^* \to E^*$ is the transpose of f, then

$$f^* \varphi_F = \varphi_E f^T.$$

Proof. For all x ∈ E and y ∈ F,

$$\left\langle x, \varphi_E f^T \varphi_F^{-1} y \right\rangle = (f^T \varphi_F^{-1} y)x = (\varphi_F^{-1} y)(fx) = \langle fx, y\rangle = \langle x, f^* y\rangle.$$



Theorem 1.100 (Closed range theorem). Let E,F be Hilbert spaces and let f ∈L(E,F ).

1. ker f∗ = (im f)⊥ and ker f = (im f∗)⊥.

2. ker(f∗f) = ker f and ker(ff∗) = ker f∗.

Also, the following are equivalent:

1. im f is closed.

2. im f = (ker f∗)⊥.

3. im f∗ = (ker f)⊥.

4. im f∗ is closed.

In that case, we also have im(f∗f) = im f∗ and im(ff∗) = im f .

Proof. For (1),

$$\begin{aligned}
x \in \ker f^* &\Leftrightarrow f^* x = 0 \\
&\Leftrightarrow \langle f^* x, y\rangle = 0 \text{ for all } y \in E \\
&\Leftrightarrow \langle x, fy\rangle = 0 \text{ for all } y \in E \\
&\Leftrightarrow x \in (\operatorname{im} f)^\perp,
\end{aligned}$$

so ker f* = (im f)⊥. Similarly, ker f = (im f*)⊥. For (2), it is clear that ker f ⊆ ker(f*f). If x ∈ ker(f*f) then ⟨(f*f)x, x⟩ = 0, which implies that ⟨fx, fx⟩ = 0 and x ∈ ker f. Therefore ker(f*f) = ker f. The equivalences follow from Theorem 1.79 and Theorem 1.99.
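The kernel identities in (1) and (2) can be observed numerically for a rank-deficient real matrix, where the adjoint is the transpose. In the sketch below the helper `null_space` (built on the SVD), the tolerance and the random rank-2 matrix are assumptions made for the example.

```python
import numpy as np

def null_space(T, tol=1e-10):
    """Orthonormal basis (as columns) of ker T, read off from the SVD."""
    _, s, Vh = np.linalg.svd(T)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T

rng = np.random.default_rng(4)
F = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank-2 map R^3 -> R^4
F_star = F.T

K_star = null_space(F_star)          # basis of ker f*, a subspace of R^4
K = null_space(F)                    # basis of ker f, a subspace of R^3
K_gram = null_space(F_star @ F)      # basis of ker(f* f)

rank_F = np.linalg.matrix_rank(F)
```

Every column of `K_star` is orthogonal to every column of `F`, and the dimensions add up to 4, which is what ker f* = (im f)⊥ predicts in R⁴.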

Theorem 1.101. Let f ∈ L(E, F) be a linear isometry.

1. f*fx = x for all x ∈ E.

2. f is an isometric isomorphism if and only if it is invertible and f⁻¹ = f*.

Proof. For (1), we have

⟨f*fx, y⟩ = ⟨fx, fy⟩ = ⟨x, y⟩

for all y ∈ E using Corollary 1.8. For (2), if f is an isometric isomorphism then Corollary 1.8 shows that

$$\langle f^{-1} y, x\rangle = \langle f^{-1} y, f^{-1} f x\rangle = \langle y, fx\rangle = \langle f^* y, x\rangle$$

for all y ∈ F and x ∈ E. Conversely, if f is invertible and f⁻¹ = f* then

⟨fx, fy⟩ = ⟨f*fx, y⟩ = ⟨x, y⟩

for all x, y ∈ E.



Operators on Hilbert spaces

Definition 1.102. We say that an operator f ∈ L(E) is

1. normal if ff* = f*f.

2. self-adjoint or Hermitian (or symmetric if K = R) if f* = f.

3. skew-adjoint or skew-Hermitian if f* = −f.

4. unitary (or orthogonal if K = R) if f is invertible and f* = f⁻¹.

5. a projection or idempotent if f² = f (see Theorem 1.80).

Clearly, all self-adjoint, skew-adjoint and unitary operators are normal. If f ∈ L(E) then f*f and ff* are both self-adjoint.

Theorem 1.103 (Properties of normal operators). Let f ∈ L(E). If K = C, then f is normal if and only if |fx| = |f*x| for all x ∈ E. If f is normal, then:

1. For all x, y ∈ E,

⟨fx, fy⟩ = ⟨f*x, f*y⟩ and |fx| = |f*x|.

2. f* is normal. If f is invertible then f⁻¹ is normal. If p(x) ∈ K[x] then p(f) is normal.

3. ker f* = ker f.

4. $\ker f^k = \ker f$ for any positive integer k. In particular, ker f is an invariant subspace of f.

5. If x ∈ E and λ ∈ C, then fx = λx if and only if $f^* x = \bar{\lambda} x$.

6. If λ and µ are distinct eigenvalues of f, then the corresponding eigenspaces $E_\lambda$ and $E_\mu$ are orthogonal.

Proof. If f is normal then

⟨fx, fy⟩ = ⟨f*fx, y⟩ = ⟨ff*x, y⟩ = ⟨f*x, f*y⟩

and |fx|² = |f*x|² for all x, y ∈ E. Conversely, if K = C and |fx| = |f*x| for all x ∈ E then

⟨f*fx, x⟩ = ⟨fx, fx⟩ = |fx|² = |f*x|² = ⟨f*x, f*x⟩ = ⟨ff*x, x⟩

and Theorem 1.5 applies. For (3),

ker f* = ker ff* = ker f*f = ker f

using Theorem 1.100. For (4), it is clear that $\ker f \subseteq \ker f^k$. Let g = f*f; since f commutes with f*, we have $(f^* f)^k = (f^*)^k f^k$ and

$$\ker f^k = \ker (f^k)^* f^k = \ker (f^*)^k f^k = \ker (f^* f)^k = \ker g^k.$$

Let $x \in \ker f^k = \ker g^k$ with k ≥ 2. Since g is self-adjoint,

$$0 = \langle g^k x, g^{k-2} x\rangle = \langle g^{k-1} x, g^{k-1} x\rangle,$$

so $x \in \ker g^{k-1}$. Repeating this process shows that x ∈ ker g = ker f. For (5), since $f - \lambda\mathrm{Id}_E$ is normal,

$$\ker(f - \lambda\mathrm{Id}_E) = \ker(f - \lambda\mathrm{Id}_E)^* = \ker(f^* - \bar{\lambda}\mathrm{Id}_E).$$

For (6), if $x \in E_\lambda$ and $y \in E_\mu$ then

$$\lambda \langle x, y\rangle = \langle \lambda x, y\rangle = \langle fx, y\rangle = \langle x, f^* y\rangle = \langle x, \bar{\mu} y\rangle = \mu \langle x, y\rangle,$$

so ⟨x, y⟩ = 0.
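A quick numerical sanity check of properties (1), (5) and (6) is sketched below. The normal matrix is built as U D U* from a random unitary `U` and a diagonal `D` with distinct complex entries; this construction and all names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
lam = np.array([1 + 1j, 2.0, -1j])   # distinct eigenvalues
N = U @ np.diag(lam) @ U.conj().T    # a normal operator on C^3
N_star = N.conj().T

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
norm_fx = np.linalg.norm(N @ x)
norm_fstar_x = np.linalg.norm(N_star @ x)

u0, u1 = U[:, 0], U[:, 1]            # eigenvectors for lam[0] and lam[1]
```

The eigenvector `u0` of `N` for `lam[0]` is also an eigenvector of `N_star`, with the conjugate eigenvalue, and it is orthogonal to `u1`.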

Corollary 1.104. Let f ∈ L(E) be normal. If im f is closed, then:

1. im f = (ker f)⊥ and E = im f ⊕ ker f.

2. im f = im f*.

3. $\operatorname{im} f^k = \operatorname{im} f$ for any positive integer k. In particular, im f is an invariant subspace of f.

4. $f|_{\operatorname{im} f}$ is a linear homeomorphism onto im f.

Proof. (1) and (2) follow from Theorem 1.100. For (3), we have $\operatorname{im} f^k \subseteq \operatorname{im} f$ for all k ≥ 1. If $x \in \operatorname{im} f^k$ then $x = f^k y$ for some y ∈ E. Write y = u + v for some u ∈ im f and v ∈ ker f. Then u = fz for some z ∈ E, so $x = f^k y = f^k u = f^{k+1} z \in \operatorname{im} f^{k+1}$. This shows that $\operatorname{im} f^k \subseteq \operatorname{im} f^{k+1}$ for all k ≥ 1, and therefore $\operatorname{im} f^k = \operatorname{im} f$ for all k ≥ 1. For (4), $f|_{\operatorname{im} f}$ is injective because ker f ∩ im f = {0}, and it is surjective because f(im f) = im f² = im f. Since im f is closed, Corollary 1.58 shows that $f|_{\operatorname{im} f}$ is a linear homeomorphism.

Theorem 1.105 (Properties of self-adjoint operators). Let f ∈ L(E). If K = C, then f is self-adjoint if and only if ⟨fx, x⟩ is real for all x ∈ E. If f is self-adjoint, then:

1. ⟨fx, x⟩ is real for all x ∈ E.

2. If g ∈ L(E) is self-adjoint then f + g is self-adjoint. If f is invertible then f⁻¹ is self-adjoint. If p(x) ∈ R[x] then p(f) is self-adjoint.

3. All eigenvalues of f are real.

4. If g ∈ L(E) then fg = 0 if and only if im f ⊥ im g.

5. If E is finite-dimensional then f has an eigenvalue.

Proof. If f is self-adjoint then

$$\langle fx, x\rangle = \langle x, fx\rangle = \overline{\langle fx, x\rangle},$$

so ⟨fx, x⟩ is real. Conversely, if K = C and ⟨fx, x⟩ is real for all x ∈ E then

⟨fx, x⟩ = ⟨x, fx⟩ = ⟨f*x, x⟩

and Theorem 1.5 applies. For (3), if fx = λx and x ≠ 0 then

⟨fx, x⟩ = ⟨λx, x⟩ = λ|x|²

is real, so λ is real. For (4),

⟨fx, gy⟩ = ⟨x, fgy⟩

for all x, y ∈ E. For (5), define q : E \ {0} → R by q(x) = ⟨fx, x⟩ / |x|². Then

$$Dq(x)u = \frac{|x|^2 (\langle fu, x\rangle + \langle fx, u\rangle) - (\langle u, x\rangle + \langle x, u\rangle)\langle fx, x\rangle}{|x|^4} = \frac{2|x|^2 \operatorname{Re}(\langle fx, u\rangle) - 2\langle fx, x\rangle \operatorname{Re}(\langle x, u\rangle)}{|x|^4}.$$

Since the unit sphere S = {x ∈ E : |x| = 1} is compact, $q|_S$ attains a maximum at some v ∈ S. But q(rx) = q(x) for all r ∈ R \ {0}, so q has a global maximum at v. Therefore Dq(v) = 0, which implies Re(⟨fv, u⟩) = ⟨fv, v⟩ Re(⟨v, u⟩) and Re(⟨fv − ⟨fv, v⟩v, u⟩) = 0 for all u ∈ E. Apply Lemma 1.13 to get fv = ⟨fv, v⟩v. This shows that ⟨fv, v⟩ is an eigenvalue of f.
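The argument for (5) says that a maximiser of the quotient q(x) = ⟨fx, x⟩ / |x|² is an eigenvector whose eigenvalue is the maximum value. The sketch below checks this for a random real symmetric matrix; the matrix `A`, the sampling of 1000 test points and the use of `numpy.linalg.eigh` to locate the maximiser are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                    # a self-adjoint operator on R^4

def q(x):
    """The quotient <Ax, x> / |x|^2 for nonzero x."""
    return (x @ A @ x) / (x @ x)

eigvals, eigvecs = np.linalg.eigh(A) # eigenvalues in ascending order
v = eigvecs[:, -1]                   # unit eigenvector for the largest eigenvalue

# q never exceeds the largest eigenvalue on randomly sampled vectors
samples = rng.standard_normal((1000, 4))
max_sample = max(q(s) for s in samples)
```

At the maximiser `v` the identity fv = ⟨fv, v⟩v from the proof reads `A @ v == q(v) * v`, and q is unchanged under rescaling of its argument.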

The converses to (1) in Theorem 1.103 and (1) in Theorem 1.105 do not hold when E is a real Hilbert space. An operator τ ∈ L(E) satisfying |τx| = |τ*x| for all x ∈ E is not necessarily normal, and the fact that ⟨τx, x⟩ is real for all x ∈ E does not imply that τ is self-adjoint (⟨τx, x⟩ is always real because E is a real Hilbert space).



Theorem 1.106 (Properties of unitary operators). Let f, g ∈ L(E). Then f is unitary if and only if it is an isometric isomorphism. If f and g are unitary, then:

1. fg and f⁻¹ are unitary. If r ∈ K and |r| = 1 then rf is unitary.

2. If λ is an eigenvalue of f, then |λ| = 1.

Proof. Theorem 1.101 shows that f is unitary if and only if it is an isometric isomorphism. For (2), if fx = λx and x ≠ 0 then

⟨x, x⟩ = ⟨fx, fx⟩ = ⟨λx, λx⟩ = |λ|²⟨x, x⟩,

so |λ| = 1.

Theorem 1.107. Let ρ ∈ L(E) be a projection. The following are equivalent:

1. ρ is an orthogonal projection.

2. |ρx| ≤ |x| for all x ∈ E.

3. ⟨ρx, x⟩ = |ρx|² for all x ∈ E.

4. ρ is self-adjoint.

5. ρ is normal.

Proof. Theorem 1.82 proves (1) ⇔ (2). If (1) holds and x ∈ E then we can write x = s + t where s ∈ im ρ and t ∈ ker ρ, so

⟨ρx, x⟩ = ⟨s, s + t⟩ = |s|² = |ρx|²

since s ⊥ t. Conversely, if (3) holds then

|ρx|² = ⟨ρx, x⟩ ≤ |ρx| |x|

by the Cauchy-Schwarz inequality, so |ρx| ≤ |x| and (2) holds. This proves (1) ⇔ (3). If (1) holds then ker ρ and im ρ are both closed, so ker ρ* = (im ρ)⊥ = ker ρ and im ρ* = (ker ρ)⊥ = im ρ by Theorem 1.100. Also, (ρ*)² = (ρ²)* = ρ*. Therefore ρ* is an orthogonal projection onto im ρ along ker ρ, which implies that ρ* = ρ. This proves (1) ⇒ (4), and it is clear that (4) ⇒ (5). If ρ is normal then ker ρ = ker ρ* = (im ρ)⊥. Since ρ is continuous, $\operatorname{im} \rho = \ker(\mathrm{Id}_E - \rho)$ is closed. Therefore E = im ρ ⊕ ker ρ. This proves (5) ⇒ (1).

We say that an operator f ∈ L(E) is positive (or positive semidefinite) if it is self-adjoint and ⟨fx, x⟩ ≥ 0 for all x ∈ E. We say that f is positive definite if ⟨fx, x⟩ > 0 for all x ≠ 0. The preceding theorem shows that all orthogonal projections are positive. If E is a complex Hilbert space and ⟨fx, x⟩ ≥ 0 for all x ∈ E, then f is positive due to Theorem 1.105.

If f, g ∈ L(E), we write f ≥ g if f − g is positive. In other words, f ≥ g if ⟨fx, x⟩ ≥ ⟨gx, x⟩ for all x ∈ E. This relation is a partial order on the set of all self-adjoint operators. In particular, if f ≥ g and g ≥ f then ⟨fx, x⟩ = ⟨gx, x⟩ for all x ∈ E, and Theorem 1.5 implies that f = g.

Theorem 1.108. Let ρ, σ ∈ L(E) be orthogonal projections. The following are equivalent:

1. im ρ ⊆ imσ.

2. ρσ = ρ.

3. σρ = ρ.

4. σ − ρ is an orthogonal projection.

5. ρ ≤ σ.

6. |ρx| ≤ |σx| for all x ∈ E.

Proof. Suppose that (1) holds and let x ∈ E. Write x = s + t where s ∈ im σ and t ∈ ker σ. Since ker σ ⊆ ker ρ,

ρσx = ρσ(s + t) = ρs = ρs + ρt = ρx.

This proves (1) ⇒ (2). If ρσ = ρ then σρ = (ρσ)* = ρ* = ρ, which proves (2) ⇒ (3). If (3) holds and x ∈ im ρ then σx = σρx = ρx = x, so x ∈ im σ. This proves (3) ⇒ (1). Suppose that (1) holds. Then σ − ρ is self-adjoint and

(σ − ρ)² = σ − σρ − ρσ + ρ = σ − ρ,

so Theorem 1.107 shows that σ − ρ is an orthogonal projection. This proves (1) ⇒ (4). If σ − ρ is an orthogonal projection then σ − ρ ≥ 0, so (4) ⇒ (5) is clear. If ρ ≤ σ then

|ρx|² = ⟨ρx, x⟩ ≤ ⟨σx, x⟩ = |σx|²

by Theorem 1.107, which proves (5) ⇒ (6). If (6) holds and x ∈ im ρ then

|x| = |ρx| ≤ |σx| ≤ |x|,

so |σx| = |x| and x ∈ im σ due to Theorem 1.82. This proves (6) ⇒ (1).
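The equivalent conditions of Theorem 1.108 can be observed for a pair of nested subspaces of R⁵. In the sketch below the orthonormal vectors come from a QR factorisation of a random matrix; the names `rho`, `sigma` and `diff` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # orthonormal columns q1, q2

rho = Q[:, :1] @ Q[:, :1].T          # orthogonal projection onto span(q1)
sigma = Q @ Q.T                      # orthogonal projection onto span(q1, q2)
diff = sigma - rho                   # expected: orthogonal projection onto span(q2)

x = rng.standard_normal(5)
norm_rho_x = np.linalg.norm(rho @ x)
norm_sigma_x = np.linalg.norm(sigma @ x)
```

Since im ρ ⊆ im σ by construction, the products ρσ and σρ both collapse to ρ, and `diff` is again symmetric and idempotent.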

Lemma 1.109. Let E, F be Hilbert spaces. If f ∈ L(E, F) then f*f ≥ 0 and ff* ≥ 0.

Proof. We have ⟨f*fx, x⟩ = ⟨fx, fx⟩ ≥ 0 for all x ∈ E. Similarly, ⟨ff*y, y⟩ = ⟨f*y, f*y⟩ ≥ 0 for all y ∈ F.



1.6 Exterior algebra

In this section, V and W are vector spaces over K (with K = R or K = C). We write L(V) = L(V, V) for the space of linear operators on V.

Exterior algebra and exterior powers

An exterior algebra on V is a unital algebra E over K, along with a linear map ι : V → E (the canonical injection) such that ι(v)ι(v) = 0 for all v ∈ V, satisfying the following universal property: for any unital algebra A and any linear map f : V → A such that f(v)f(v) = 0 for all v ∈ V, there is a unique (unital) algebra homomorphism $\tilde{f} : E \to A$ such that $f = \tilde{f}\iota$.

Theorem 1.110 (Existence and uniqueness of the exterior algebra).

1. If E₁ and E₂ are exterior algebras on V, then E₁ ≅ E₂.

2. There exists an exterior algebra on V.

Proof. We will not prove (2) here; a standard construction is to take the quotient of the tensor algebra of V by the two-sided ideal generated by elements of the form v ⊗ v. (1) is easy to prove from the universal property.

Since there is a unique (up to isomorphism) exterior algebra on V, we will refer to the exterior algebra ΛV on V. We will also use the wedge symbol ∧ to denote the product in this algebra, which is called the wedge product. Since we have a canonical injection ι : V → ΛV, we can consider any v ∈ V as being an element of ΛV (rather than writing ιv every time). Similarly we have an injection κ : K → ΛV given by r ↦ re where e ∈ ΛV is the unit element, so we can also consider any r ∈ K as being an element of ΛV. If u, v ∈ V then

0 = (u + v) ∧ (u + v) = u ∧ u + u ∧ v + v ∧ u + v ∧ v = u ∧ v + v ∧ u,

so

u ∧ v = −v ∧ u.

This shows that the wedge product is anticommutative (on elements of V).

The kth exterior power of V is the vector subspace

$$\Lambda^k V = \operatorname{span}\{v_1 \wedge \cdots \wedge v_k : v_1, \ldots, v_k \in V\},$$

with $\Lambda^0 V = K$ (or more precisely, $\Lambda^0 V = \kappa(K) = \{re : r \in K\}$). Elements of $\Lambda^k V$ are called k-vectors, and those with the form $v_1 \wedge \cdots \wedge v_k$ for some $v_1, \ldots, v_k \in V$ are decomposable. It is a fact that

$$\Lambda V = \bigoplus_{k=0}^{\infty} \Lambda^k V,$$

but we will not prove this. In general, if $\beta \in \Lambda^j V$ and $\gamma \in \Lambda^k V$ then

$$\beta \wedge \gamma = (-1)^{jk}\, \gamma \wedge \beta.$$

We say that a multilinear map $f : V^k \to W$ is alternating if $f(v_1, \ldots, v_k) = 0$ whenever $v_i = v_j$ for some i ≠ j. Clearly, the inclusion $\iota_k : V^k \to \Lambda^k V$ given by $(v_1, \ldots, v_k) \mapsto v_1 \wedge \cdots \wedge v_k$ is an alternating multilinear map. In fact, we have a universal property that characterizes the kth exterior power in this way:

Theorem 1.111 (Universal property of the kth exterior power). For any vector space W and any alternating multilinear map $f : V^k \to W$, there is a unique linear map $\tilde{f} : \Lambda^k V \to W$ such that $f = \tilde{f}\iota_k$.

Theorem 1.112. Let $\{e_\alpha\}_{\alpha \in A}$ be an ordered basis for V. For all k, the set

$$\{e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} : \alpha_1 < \cdots < \alpha_k\}$$

is a basis for $\Lambda^k V$, where $\{\alpha_1, \ldots, \alpha_k\}$ ranges over the subsets of A with k elements. If V is finite-dimensional with n = dim V then

$$\dim(\Lambda^k V) = \binom{n}{k} \quad \text{and} \quad \dim(\Lambda V) = 2^n.$$

In particular, if k > dim V then

$$\Lambda^k V = \{0\}.$$
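The basis described in Theorem 1.112 can be enumerated directly, which gives a quick check of the dimension formulas. The sketch below labels each basis k-vector by its increasing index tuple; the function name `exterior_power_basis` and the choice n = 4 are assumptions made for the example.

```python
from itertools import combinations
from math import comb

def exterior_power_basis(n, k):
    """Index tuples (a1 < ... < ak) labelling the basis vectors
    e_{a1} ^ ... ^ e_{ak} of the kth exterior power of an n-dimensional space."""
    return list(combinations(range(1, n + 1), k))

n = 4
# dimensions of the kth exterior powers for k = 0, ..., n + 1
dims = [len(exterior_power_basis(n, k)) for k in range(n + 2)]
basis_2 = exterior_power_basis(n, 2)
```

For k = 0 the only index tuple is the empty one, matching Λ⁰V = K, and for k = n + 1 there are none, matching ΛᵏV = {0} when k > dim V.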

Corollary 1.113. If $v_1, \ldots, v_k \in V$ then $v_1 \wedge \cdots \wedge v_k = 0$ if and only if $v_1, \ldots, v_k$ are linearly dependent.

Given any k, a linear map f : V → W induces a corresponding linear map $\Lambda^k f : \Lambda^k V \to \Lambda^k W$ as follows: define an alternating multilinear map $g : V^k \to \Lambda^k W$ by $g(v_1, \ldots, v_k) = fv_1 \wedge \cdots \wedge fv_k$. By the universal property of the kth exterior power, we have a unique linear map $\Lambda^k f : \Lambda^k V \to \Lambda^k W$ such that $(\Lambda^k f)\,v_1 \wedge \cdots \wedge v_k = fv_1 \wedge \cdots \wedge fv_k$ for all $v_1, \ldots, v_k \in V$.



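More generally, if V = W then for any j there is a unique linear map $\Lambda^k_j f : \Lambda^k V \to \Lambda^k V$ satisfying

$$(\Lambda^k_j f)\,v_1 \wedge \cdots \wedge v_k = \sum_{i_1 + \cdots + i_k = j} f^{i_1} v_1 \wedge \cdots \wedge f^{i_k} v_k,$$

where $i_1, \ldots, i_k \in \{0, 1\}$. We define $\Lambda^0_0 f = \mathrm{Id}_K$. For example, $\Lambda^k_0 f = \mathrm{Id}_{\Lambda^k V}$,

$$(\Lambda^k_1 f)\,v_1 \wedge \cdots \wedge v_k = \sum_{i=1}^{k} v_1 \wedge \cdots \wedge fv_i \wedge \cdots \wedge v_k,$$

$\Lambda^k_k f = \Lambda^k f$, and $\Lambda^k_j f = 0$ if j > k.

Theorem 1.114. If f ∈ L(V), $\omega \in \Lambda^k V$ and v ∈ V,

$$(\Lambda^{k+1}_j f)(\omega \wedge v) = (\Lambda^k_j f)\omega \wedge v + (\Lambda^k_{j-1} f)\omega \wedge fv.$$

Proof. For all $u_1, \ldots, u_k \in V$, we have

$$\begin{aligned}
(\Lambda^{k+1}_j f)\,u_1 \wedge \cdots \wedge u_k \wedge v
&= \sum_{i_1 + \cdots + i_{k+1} = j} f^{i_1} u_1 \wedge \cdots \wedge f^{i_k} u_k \wedge f^{i_{k+1}} v \\
&= \sum_{i_1 + \cdots + i_k = j} f^{i_1} u_1 \wedge \cdots \wedge f^{i_k} u_k \wedge v + \sum_{i_1 + \cdots + i_k = j-1} f^{i_1} u_1 \wedge \cdots \wedge f^{i_k} u_k \wedge fv \\
&= [(\Lambda^k_j f)\,u_1 \wedge \cdots \wedge u_k] \wedge v + [(\Lambda^k_{j-1} f)\,u_1 \wedge \cdots \wedge u_k] \wedge fv.
\end{aligned}$$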

Determinant and trace

Now suppose that V is n-dimensional. Let f ∈ L(V) and consider the induced map $\Lambda^n f \in L(\Lambda^n V)$. Since $\Lambda^n V$ is one-dimensional, we can define the determinant det(f) to be the unique number satisfying $\Lambda^n f = \det(f)\iota$ where ι is the identity map on $\Lambda^n V$. Similarly, the trace tr f is defined as the unique number satisfying $\Lambda^n_1 f = (\operatorname{tr} f)\iota$.

Recall that f ∈ L(V) is diagonalizable if there is a basis $u_1, \ldots, u_n$ of V such that for all i, we have $fu_i = \lambda_i u_i$ for some $\lambda_i \in K$. In that case, $\lambda_1, \ldots, \lambda_n$ are all of the eigenvalues of f, and we say that f is diagonalizable with respect to $u_1, \ldots, u_n$.



Theorem 1.115 (Properties of determinant and trace). Let f, g ∈ L(V ).

1. det(0) = 0 and det(IdV ) = 1.

2. det(fg) = det(f) det(g).

3. det(rf) = rn det(f) for all r ∈ K.

4. If f is diagonalizable and its eigenvalues are λ₁, …, λₙ (with repetitions), then det(f) = λ₁ ⋯ λₙ.

5. tr : L(V )→ K is a linear map.

6. tr(fg) = tr(gf).

7. If f is diagonalizable and its eigenvalues are λ₁, …, λₙ (with repetitions), then tr f = λ₁ + ⋯ + λₙ.

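Proof. (1) is obvious. For (2), we have

$$f(gv_1) \wedge \cdots \wedge f(gv_n) = \det(f)\,(gv_1 \wedge \cdots \wedge gv_n) = \det(f)\det(g)\,v_1 \wedge \cdots \wedge v_n$$

and also

$$fgv_1 \wedge \cdots \wedge fgv_n = \det(fg)\,v_1 \wedge \cdots \wedge v_n$$

by definition. For (3),

$$rv_1 \wedge \cdots \wedge rv_n = r^n\,v_1 \wedge \cdots \wedge v_n.$$

For (4), if $V = V_1 \oplus \cdots \oplus V_n$ and we choose nonzero $v_i \in V_i$ such that $fv_i = \lambda_i v_i$, then

$$fv_1 \wedge \cdots \wedge fv_n = \lambda_1 v_1 \wedge \cdots \wedge \lambda_n v_n = (\lambda_1 \cdots \lambda_n)\,v_1 \wedge \cdots \wedge v_n.$$

(5) is obvious. For (6), we compute

$$\begin{aligned}
(\operatorname{tr} f)(\operatorname{tr} g)\,v_1 \wedge \cdots \wedge v_n
&= \sum_{k=1}^{n} (\operatorname{tr} f)\,v_1 \wedge \cdots \wedge gv_k \wedge \cdots \wedge v_n \\
&= \sum_{k=1}^{n} v_1 \wedge \cdots \wedge fgv_k \wedge \cdots \wedge v_n + \sum_{k=1}^{n} \sum_{j \ne k} v_1 \wedge \cdots \wedge fv_j \wedge \cdots \wedge gv_k \wedge \cdots \wedge v_n.
\end{aligned}$$

The last sum is symmetric in f and g and so is $(\operatorname{tr} f)(\operatorname{tr} g)\,v_1 \wedge \cdots \wedge v_n$. Therefore

$$\sum_{k=1}^{n} v_1 \wedge \cdots \wedge fgv_k \wedge \cdots \wedge v_n = \sum_{k=1}^{n} v_1 \wedge \cdots \wedge gfv_k \wedge \cdots \wedge v_n,$$

i.e. tr(fg) = tr(gf). (7) is similar to (4).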

Theorem 1.116. If f ∈ L(V) then f is invertible if and only if det(f) ≠ 0.

Proof. If f is an isomorphism then 1 = det(ff⁻¹) = det(f) det(f⁻¹), which implies that det(f) ≠ 0. Conversely, if det(f) ≠ 0 then whenever $v_1, \ldots, v_n \in V$ are linearly independent we have

$$fv_1 \wedge \cdots \wedge fv_n = \det(f)\,v_1 \wedge \cdots \wedge v_n \ne 0,$$

which implies that $\{fv_1, \ldots, fv_n\}$ is linearly independent (see Corollary 1.113). Therefore f is an isomorphism.

If A is an n × n matrix, its determinant is the determinant of the corresponding linear operator on $K^n$, i.e. the map $f_A : K^n \to K^n$ such that $\langle f_A(e_j), e_i\rangle = A_{i,j}$, where $\{e_1, \ldots, e_n\}$ is the standard basis for $K^n$.

Theorem 1.117 (Properties of determinant and trace for matrices). Let A and B be n × n matrices.

1. $\det(I_n) = 1$ and det(O) = 0, where $I_n$ is the n × n identity matrix and O is the zero matrix.

2. det(AB) = det(A) det(B).

3. $\det(rA) = r^n \det(A)$ for all r ∈ K.

4. If $A = (a_{i,j})_{i,j}$ is diagonal, then $\det(A) = a_{1,1} \cdots a_{n,n}$.

5. $\det((a_{i,j})_{i,j}) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}$.

6. The map $(v_1, \ldots, v_n) \mapsto \det\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$ is alternating and multilinear.

7. $\det(A) = \det(A^T)$.

8. tr is a linear map.

9. tr(AB) = tr(BA).

10. $\operatorname{tr}((a_{i,j})_{i,j}) = a_{1,1} + \cdots + a_{n,n}$.
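The permutation formula in (5) and the wedge-product definition of the trace can both be turned into small, deliberately naive programs. The sketch below is exponential in n and is meant only as an illustration; the function names and the test matrix are assumptions made for the example. The trace is computed as the coefficient of e₁ ∧ ⋯ ∧ eₙ in the sum of the products e₁ ∧ ⋯ ∧ Aeₖ ∧ ⋯ ∧ eₙ, each of which is a determinant with one column replaced.

```python
from itertools import permutations
import numpy as np

def sign(perm):
    """Sign of a permutation (tuple of 0-based images), via its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Determinant as a signed sum over all permutations (item 5)."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i, perm[i]]
        total += term
    return total

def trace_via_wedge(A):
    """Sum over k of det[e1, ..., A e_k, ..., en], i.e. the scalar by which
    the 'apply A in exactly one slot' map acts on e1 ^ ... ^ en."""
    n = A.shape[0]
    total = 0.0
    for k in range(n):
        M = np.eye(n)
        M[:, k] = A[:, k]            # replace the kth basis vector by A e_k
        total += det_leibniz(M)
    return total

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 4.0],
              [1.0, 0.0, 5.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [3.0, 0.0, 1.0]])
```

For the matrix `A` above, expanding along the first row gives 2·15 − 1·(−4) = 34, and the diagonal sums to 10.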



Duality for linear operators

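Suppose that $V$ is $n$-dimensional and let $0 \le k \le n$. For any $F \in L(\Lambda^k V)$, we define its exterior transpose $F^\wedge \in L(\Lambda^{n-k} V)$ as the unique linear operator satisfying

\[
(F\beta) \wedge \gamma = \beta \wedge (F^\wedge \gamma)
\]

for all $\beta \in \Lambda^k V$ and $\gamma \in \Lambda^{n-k} V$. The following theorem shows that this definition makes sense.

Theorem 1.118 (Properties of the exterior transpose). Let $0 \le k \le n$.

1. If $F \in L(\Lambda^k V)$, then $F^\wedge \in L(\Lambda^{n-k} V)$ exists and is unique.

2. $(\mathrm{Id}_{\Lambda^k V})^\wedge = \mathrm{Id}_{\Lambda^{n-k} V}$.

3. If $F, G \in L(\Lambda^k V)$ then $(FG)^\wedge = G^\wedge F^\wedge$.

4. If $F \in L(\Lambda^k V)$ then $F^{\wedge\wedge} = F$.

5. The map $F \mapsto F^\wedge$ is an isomorphism.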

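Proof. Choose a basis $e_1, \dots, e_n$ for $V$. By Theorem 1.112, we have corresponding bases for $\Lambda^k V$ and $\Lambda^{n-k} V$. For a given $\gamma \in \Lambda^{n-k} V$, we need to find some $\omega \in \Lambda^{n-k} V$ such that $(F\beta) \wedge \gamma = \beta \wedge \omega$ for all $\beta \in \Lambda^k V$. Write

\[
\omega = \sum_{j_1 < \cdots < j_{n-k}} c_{j_1, \dots, j_{n-k}}\, e_{j_1} \wedge \cdots \wedge e_{j_{n-k}}.
\]

If $j'_1 < \cdots < j'_k$,

\[
e_{j'_1} \wedge \cdots \wedge e_{j'_k} \wedge \omega = \pm c_{j_1, \dots, j_{n-k}}\, e_1 \wedge \cdots \wedge e_n,
\]

where $j_1, \dots, j_{n-k}$ and $j'_1, \dots, j'_k$ are disjoint. Since $\beta$ is a linear combination of the vectors $e_{j'_1} \wedge \cdots \wedge e_{j'_k}$ for $j'_1 < \cdots < j'_k$, $\omega$ is uniquely determined by $F$ and $\gamma$. Define $F^\wedge \gamma = \omega$. The coefficients $c_{j_1, \dots, j_{n-k}}$ are linear in both $F$ and $\gamma$, so $F^\wedge$ and $F \mapsto F^\wedge$ are linear maps. This proves (1).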

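For (3), we compute

\[
\beta \wedge ((FG)^\wedge \gamma) = (FG\beta) \wedge \gamma = (G\beta) \wedge (F^\wedge \gamma) = \beta \wedge (G^\wedge F^\wedge \gamma).
\]

For (4),

\[
\begin{aligned}
(F\beta) \wedge \gamma = \beta \wedge (F^\wedge \gamma) &= (-1)^{k(n-k)} (F^\wedge \gamma) \wedge \beta \\
&= (-1)^{k(n-k)} \gamma \wedge (F^{\wedge\wedge} \beta) = (F^{\wedge\wedge} \beta) \wedge \gamma,
\end{aligned}
\]

and (5) follows directly from (4).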


Using this notation, if $f \in L(V)$ then $\det(f) = (\Lambda^n f)^\wedge$, where we have identified $L(\Lambda^0 V)$ with $K$.

Compare the next theorem with Vandermonde’s identity for binomial coefficients, which states that

\[
\binom{n}{j} = \sum_{i=0}^{j} \binom{n-k}{j-i} \binom{k}{i}.
\]

Theorem 1.119 (Vandermonde’s identity). Let $n = \dim V$ and $0 \le j, k \le n$. For all $f \in L(V)$,

\[
(\Lambda^n_j f)\, \mathrm{Id}_{\Lambda^k V} = \sum_{i=0}^{j} (\Lambda^{n-k}_{j-i} f)^\wedge\, \Lambda^k_i f.
\]

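Proof. For all $v_1, \dots, v_n \in V$,

\[
\begin{aligned}
(\Lambda^n_j f)\, v_1 \wedge \cdots \wedge v_n
&= \sum_{i_1 + \cdots + i_n = j} f^{i_1} v_1 \wedge \cdots \wedge f^{i_n} v_n \\
&= \sum_{i=0}^{j} \sum_{i_1 + \cdots + i_k = i} \sum_{i'_1 + \cdots + i'_{n-k} = j-i} f^{i_1} v_1 \wedge \cdots \wedge f^{i_k} v_k \wedge f^{i'_1} v_{k+1} \wedge \cdots \wedge f^{i'_{n-k}} v_n \\
&= \sum_{i=0}^{j} \sum_{i_1 + \cdots + i_k = i} f^{i_1} v_1 \wedge \cdots \wedge f^{i_k} v_k \wedge \sum_{i'_1 + \cdots + i'_{n-k} = j-i} f^{i'_1} v_{k+1} \wedge \cdots \wedge f^{i'_{n-k}} v_n \\
&= \sum_{i=0}^{j} \sum_{i_1 + \cdots + i_k = i} f^{i_1} v_1 \wedge \cdots \wedge f^{i_k} v_k \wedge [(\Lambda^{n-k}_{j-i} f)\, v_{k+1} \wedge \cdots \wedge v_n] \\
&= \sum_{i=0}^{j} [(\Lambda^k_i f)\, v_1 \wedge \cdots \wedge v_k] \wedge [(\Lambda^{n-k}_{j-i} f)\, v_{k+1} \wedge \cdots \wedge v_n] \\
&= \sum_{i=0}^{j} [(\Lambda^{n-k}_{j-i} f)^\wedge (\Lambda^k_i f)\, v_1 \wedge \cdots \wedge v_k] \wedge v_{k+1} \wedge \cdots \wedge v_n,
\end{aligned}
\]

where $i_1, \dots, i_n \in \{0, 1\}$. This holds for all $v_{k+1}, \dots, v_n$, and the result follows.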

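Let $f \in L(V)$. Setting $j = n$ and $k = 1$ in Theorem 1.119 gives

\[
\begin{aligned}
\det(f)\, \mathrm{Id}_V = (\Lambda^n f)\, \mathrm{Id}_{\Lambda^1 V}
&= \sum_{i=0}^{n} (\Lambda^{n-1}_{n-i} f)^\wedge\, \Lambda^1_i f \\
&= (\Lambda^{n-1}_{n-1} f)^\wedge\, \Lambda^1_1 f = (\Lambda^{n-1}_{n-1} f)^\wedge f.
\end{aligned}
\]

We say that $(\Lambda^{n-1}_{n-1} f)^\wedge \in L(V)$ is the adjugate of $f$. If $f$ is invertible, then

\[
f^{-1} = \det(f)^{-1} (\Lambda^{n-1}_{n-1} f)^\wedge.
\]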
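The identity $\det(f)\,\mathrm{Id}_V = (\Lambda^{n-1}_{n-1} f)^\wedge f$ can be checked numerically in coordinates. The sketch below assumes that, with respect to a basis, the adjugate operator is represented by the classical transposed cofactor matrix; that identification, the function names and the sample matrix are assumptions made for illustration only.

```python
from fractions import Fraction

def det(a):
    """Determinant by Laplace expansion along the first row."""
    if not a:
        return 1
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )

def adjugate(a):
    """Transposed cofactor matrix (assumed coordinate form of the adjugate)."""
    n = len(a)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]
            # Entry (j, i) receives the (i, j) cofactor, hence the transpose.
            adj[j][i] = (-1) ** (i + j) * det(minor)
    return adj

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical sample matrix with determinant -10.
A = [[2, 1, 0], [1, 3, 4], [0, 5, 6]]
d = det(A)
adj = adjugate(A)

# adj(A) A = det(A) I, and A^{-1} = det(A)^{-1} adj(A) when det(A) != 0.
scaled_identity = matmul(adj, A)
inverse = [[Fraction(entry, d) for entry in row] for row in adj]
```

Exact rational arithmetic via `fractions.Fraction` avoids rounding, so the products can be compared with the identity matrix using plain equality.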

Duality for inner product spaces

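Given any vector space $V$ and its algebraic dual $V^*$, we can define a bilinear map $\langle \cdot, \cdot \rangle : \Lambda^k V \times \Lambda^k V^* \to K$ as follows: given $v_1, \dots, v_k \in V$, we can define a map $\varphi_{v_1, \dots, v_k} : (V^*)^k \to K$ by

\[
\varphi_{v_1, \dots, v_k}(v_1^*, \dots, v_k^*) = \det\big((v_i^*(v_j))_{i,j}\big) = \det
\begin{bmatrix}
v_1^*(v_1) & \cdots & v_1^*(v_k) \\
\vdots & \ddots & \vdots \\
v_k^*(v_1) & \cdots & v_k^*(v_k)
\end{bmatrix}. \tag{*}
\]

This map is easily seen to be alternating and multilinear, so it induces a corresponding linear map $\Lambda^k \varphi_{v_1, \dots, v_k} : \Lambda^k V^* \to K$. The map $\psi : V^k \to (\Lambda^k V^*)^*$ given by $\psi(v_1, \dots, v_k) = \varphi_{v_1, \dots, v_k}$ is itself alternating and multilinear, so it also induces a linear map $\Lambda^k \psi : \Lambda^k V \to (\Lambda^k V^*)^*$. The desired bilinear map is then given by $\langle \beta, \gamma \rangle = (\Lambda^k \psi)(\beta)(\gamma)$. By definition, $\langle v_1 \wedge \cdots \wedge v_k, v_1^* \wedge \cdots \wedge v_k^* \rangle$ is equal to (*) whenever $v_1, \dots, v_k \in V$ and $v_1^*, \dots, v_k^* \in V^*$.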

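Theorem 1.120. The linear maps

\[
\lambda : \Lambda^k V \to (\Lambda^k V^*)^*, \qquad v \mapsto \langle v, \cdot \rangle
\]

and

\[
\lambda^* : \Lambda^k V^* \to (\Lambda^k V)^*, \qquad v^* \mapsto \langle \cdot, v^* \rangle
\]

are injective. If $V$ is finite-dimensional then both maps are surjective.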

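Proof. Choose an ordered basis $\{e_\alpha\}_{\alpha \in A}$ for $V$ and let $B$ be the corresponding basis of $\Lambda^k V$ (see Theorem 1.112). Suppose that $\langle \beta, \gamma \rangle = 0$ for all $\gamma \in \Lambda^k V^*$, and write $\beta$ as a linear combination of members of $B$, i.e. $\beta = \sum_{i=1}^{n} c_i\, e_{\alpha_{i,1}} \wedge \cdots \wedge e_{\alpha_{i,k}}$ where $\alpha_{i,1} < \cdots < \alpha_{i,k}$. For each $\alpha \in A$, let $e_\alpha^* \in V^*$ be a linear functional such that $e_\alpha^*(e_\alpha) = 1$ but $e_\alpha^*(e_\beta) = 0$ for all $\beta \ne \alpha$. Then for each $1 \le i, j \le n$, $\langle e_{\alpha_{i,1}} \wedge \cdots \wedge e_{\alpha_{i,k}}, e^*_{\alpha_{j,1}} \wedge \cdots \wedge e^*_{\alpha_{j,k}} \rangle$ is the determinant of a $k \times k$ matrix with at most one nonzero entry in each row. In fact, there can only be $k$ nonzero entries if $i = j$. Therefore

\[
0 = \left\langle \sum_{i=1}^{n} c_i\, e_{\alpha_{i,1}} \wedge \cdots \wedge e_{\alpha_{i,k}},\; e^*_{\alpha_{j,1}} \wedge \cdots \wedge e^*_{\alpha_{j,k}} \right\rangle = c_j
\]

for each $j$, and $\beta = 0$. This shows that $\lambda$ is injective, and the proof for $\lambda^*$ is similar. When $V$ is finite-dimensional, surjectivity follows easily from the fact that the spaces all have the same dimension.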

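If $V$ is an inner product space, then the previous construction defines an inner product on each exterior power satisfying

\[
\langle u_1 \wedge \cdots \wedge u_k, v_1 \wedge \cdots \wedge v_k \rangle = \det\big((\langle u_i, v_j \rangle)_{i,j}\big) = \det
\begin{bmatrix}
\langle u_1, v_1 \rangle & \cdots & \langle u_1, v_k \rangle \\
\vdots & \ddots & \vdots \\
\langle u_k, v_1 \rangle & \cdots & \langle u_k, v_k \rangle
\end{bmatrix}.
\]

When $u_i = v_i$ for $i = 1, \dots, k$, the matrix on the right hand side is called the Gram matrix of $v_1, \dots, v_k$. Note that $|v_1 \wedge \cdots \wedge v_k| = |v_1| \cdots |v_k|$ whenever $v_1, \dots, v_k$ are orthogonal. The following theorem shows that $\langle \cdot, \cdot \rangle$ is in fact an inner product: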

Theorem 1.121. If $v_1, \dots, v_k \in V$ then the Gram matrix

\[
G(v_1, \dots, v_k) = (\langle v_i, v_j \rangle)_{i,j}
\]

is positive semidefinite. It is invertible if and only if $v_1 \wedge \cdots \wedge v_k \ne 0$.

Proof. For all $x = (x_1, \dots, x_k) \in K^k$,

\[
\langle G(v_1, \dots, v_k)x, x \rangle = \sum_{i=1}^{k} \sum_{j=1}^{k} \langle v_i, v_j \rangle x_i x_j = \sum_{i=1}^{k} \sum_{j=1}^{k} \langle x_i v_i, x_j v_j \rangle = \left| \sum_{i=1}^{k} x_i v_i \right|^2 \ge 0. \tag{*}
\]

By Corollary 1.113, $v_1 \wedge \cdots \wedge v_k = 0$ if and only if $v_1, \dots, v_k$ are linearly dependent. If $v_1, \dots, v_k$ are linearly dependent then $G(v_1, \dots, v_k)$ is not invertible because its columns are linearly dependent. Conversely, if $G(v_1, \dots, v_k)$ is not invertible then $G(v_1, \dots, v_k)x = 0$ for some $x \ne 0$. (*) shows that $\sum_{i=1}^{k} x_i v_i = 0$, so $v_1, \dots, v_k$ are linearly dependent.
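Theorem 1.121 is easy to test on small real examples. The sketch below restricts to vectors in $\mathbb{R}^3$ with the standard inner product (so no conjugation is involved); the function names and the sample vectors are illustrative assumptions.

```python
def dot(u, v):
    """Standard real inner product."""
    return sum(a * b for a, b in zip(u, v))

def gram(vectors):
    """Gram matrix (<v_i, v_j>)_{i,j}."""
    return [[dot(u, v) for v in vectors] for u in vectors]

def det(a):
    """Determinant by Laplace expansion along the first row."""
    if not a:
        return 1
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )

def quadratic_form(g, x):
    """<Gx, x> for a real vector x."""
    gx = [dot(row, x) for row in g]
    return dot(gx, x)

# Hypothetical sample data.
independent = [(1, 0, 2), (0, 1, 1)]
dependent = [(1, 2, 3), (2, 4, 6)]   # second vector is twice the first

x = (3, -1)
# The combination x_1 v_1 + x_2 v_2 for the independent pair.
combo = tuple(x[0] * a + x[1] * b for a, b in zip(*independent))

gram_det_independent = det(gram(independent))
gram_det_dependent = det(gram(dependent))
form_value = quadratic_form(gram(independent), x)
```

Here `form_value` agrees with `dot(combo, combo)`, which is the squared norm appearing in (*), and the Gram determinant vanishes exactly for the dependent pair.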



Theorem 1.122 (Hadamard’s inequality). For all v1, . . . , vk ∈ V we have

|v1 ∧ · · · ∧ vk| ≤ |v1| · · · |vk| ,

with equality if and only if v1, . . . , vk are orthogonal or vi = 0 for some i.

Proof. We can assume that $v_1, \dots, v_k \ne 0$. Let $u_i = v_i / |v_i|$ for $i = 1, \dots, k$. Let $\lambda_1, \dots, \lambda_k \ge 0$ be the eigenvalues of $G(u_1, \dots, u_k)$, with repetitions (see Theorem 5.127). Using the AM-GM inequality,

\[
\begin{aligned}
|u_1 \wedge \cdots \wedge u_k|^2 &= \det(G(u_1, \dots, u_k)) = \lambda_1 \cdots \lambda_k \\
&\le \left( \frac{\lambda_1 + \cdots + \lambda_k}{k} \right)^k = \left( \frac{\operatorname{tr} G(u_1, \dots, u_k)}{k} \right)^k = \left( \frac{|u_1|^2 + \cdots + |u_k|^2}{k} \right)^k = 1,
\end{aligned}
\]

with equality if and only if $\lambda_1 = \cdots = \lambda_k$. This is true if and only if $G(u_1, \dots, u_k)$ is the identity matrix, i.e. $u_1, \dots, u_k$ is an orthonormal set.
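Hadamard's inequality can be illustrated with integer vectors, using $|v_1 \wedge \cdots \wedge v_k|^2 = \det G(v_1, \dots, v_k)$ so that no square roots are needed. This is a minimal sketch for real vectors with the standard inner product; the names and sample vectors are assumptions chosen for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det(a):
    """Determinant by Laplace expansion along the first row."""
    if not a:
        return 1
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )

def wedge_norm_sq(vectors):
    """|v_1 ^ ... ^ v_k|^2, computed as the Gram determinant."""
    return det([[dot(u, v) for v in vectors] for u in vectors])

def norm_product_sq(vectors):
    """(|v_1| ... |v_k|)^2."""
    product = 1
    for v in vectors:
        product *= dot(v, v)
    return product

# Hypothetical sample data: a non-orthogonal triple and an orthogonal one.
skew = [(1, 2, 0), (0, 1, 3), (4, 0, 1)]
orthogonal = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]

strict_case = (wedge_norm_sq(skew), norm_product_sq(skew))
equality_case = (wedge_norm_sq(orthogonal), norm_product_sq(orthogonal))
```

For the non-orthogonal triple the inequality is strict, while the orthogonal triple attains equality, matching the equality condition in Theorem 1.122.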

Corollary 1.123. The inclusion $\iota_k : V^k \to \Lambda^k V$ given by $(v_1, \dots, v_k) \mapsto v_1 \wedge \cdots \wedge v_k$ is continuous, and $|\iota_k| = 1$.

Hilbert spaces

If $E$ is an inner product space, then the completion $\overline{\Lambda^k E}$ of $\Lambda^k E$ is a Hilbert space.

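Theorem 1.124. Let $E$ be a Hilbert space and let $\{e_\alpha\}_{\alpha \in A}$ be an ordered Hilbert basis for $E$. If $f \in L(E)$ and $f e_\alpha = \lambda_\alpha e_\alpha$ for all $\alpha \in A$, then for all $v_1, \dots, v_k \in E$,

\[
fv_1 \wedge \cdots \wedge fv_k = \sum_{\alpha_1 < \cdots < \alpha_k} \lambda_{\alpha_1} \cdots \lambda_{\alpha_k} \langle v_1 \wedge \cdots \wedge v_k, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} \rangle\, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k}
\]

and

\[
|fv_1 \wedge \cdots \wedge fv_k|^2 = \sum_{\alpha_1 < \cdots < \alpha_k} |\lambda_{\alpha_1} \cdots \lambda_{\alpha_k}|^2\, |\langle v_1 \wedge \cdots \wedge v_k, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} \rangle|^2.
\]

In particular,

\[
v_1 \wedge \cdots \wedge v_k = \sum_{\alpha_1 < \cdots < \alpha_k} \langle v_1 \wedge \cdots \wedge v_k, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} \rangle\, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k}.
\]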


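Proof. By Theorem 1.89 and Corollary 1.123,

\[
\begin{aligned}
fv_1 \wedge \cdots \wedge fv_k
&= \sum_{\alpha_1 \in A} \langle v_1, e_{\alpha_1} \rangle f e_{\alpha_1} \wedge \cdots \wedge \sum_{\alpha_k \in A} \langle v_k, e_{\alpha_k} \rangle f e_{\alpha_k} \\
&= \sum_{\alpha_1 \in A} \cdots \sum_{\alpha_k \in A} \lambda_{\alpha_1} \cdots \lambda_{\alpha_k} \langle v_1, e_{\alpha_1} \rangle \cdots \langle v_k, e_{\alpha_k} \rangle\, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} \\
&= \sum_{\alpha_1 < \cdots < \alpha_k} \lambda_{\alpha_1} \cdots \lambda_{\alpha_k} \sum_{\sigma \in S_k} \langle v_1, e_{\alpha_{\sigma(1)}} \rangle \cdots \langle v_k, e_{\alpha_{\sigma(k)}} \rangle\, e_{\alpha_{\sigma(1)}} \wedge \cdots \wedge e_{\alpha_{\sigma(k)}} \\
&= \sum_{\alpha_1 < \cdots < \alpha_k} \lambda_{\alpha_1} \cdots \lambda_{\alpha_k} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma) \langle v_1, e_{\alpha_{\sigma(1)}} \rangle \cdots \langle v_k, e_{\alpha_{\sigma(k)}} \rangle\, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} \\
&= \sum_{\alpha_1 < \cdots < \alpha_k} \lambda_{\alpha_1} \cdots \lambda_{\alpha_k} \det\big((\langle v_i, e_{\alpha_j} \rangle)_{i,j}\big)\, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} \\
&= \sum_{\alpha_1 < \cdots < \alpha_k} \lambda_{\alpha_1} \cdots \lambda_{\alpha_k} \langle v_1 \wedge \cdots \wedge v_k, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} \rangle\, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k}.
\end{aligned}
\]

The second formula follows from the fact that the set of all $e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k}$ is orthogonal.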

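Corollary 1.125. Let $E$ be a Hilbert space and let $\{e_\alpha\}_{\alpha \in A}$ be an ordered Hilbert basis for $E$. For all $k$, the set

\[
\{ e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_k} : \alpha_1 < \cdots < \alpha_k \}
\]

is a Hilbert basis for $\overline{\Lambda^k E}$, where $\alpha_1, \dots, \alpha_k$ ranges over the subsets of $A$ with $k$ elements.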

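Theorem 1.126. Let $E, F$ be Hilbert spaces, let $O = \{u_\alpha\}_{\alpha \in A}$ be an ordered orthonormal set in $E$, and let $\{v_\alpha\}_{\alpha \in A}$ be an ordered orthonormal set in $F$. If $f \in L(E)$ with $O^\perp \subseteq \ker f$ and $f u_\alpha = \lambda_\alpha v_\alpha$ for all $\alpha \in A$, then for all $w_1, \dots, w_k \in E$,

\[
(\Lambda^k_j f)\, w_1 \wedge \cdots \wedge w_k = \sum_{\alpha_1 < \cdots < \alpha_k} r_{\alpha_1, \dots, \alpha_k} \langle w_1 \wedge \cdots \wedge w_k, u_{\alpha_1} \wedge \cdots \wedge u_{\alpha_k} \rangle\, v_{\alpha_1} \wedge \cdots \wedge v_{\alpha_k}
\]

where

\[
r_{\alpha_1, \dots, \alpha_k} = \sum_{i_1 + \cdots + i_k = j} \lambda_{\alpha_1}^{i_1} \cdots \lambda_{\alpha_k}^{i_k}
\]

and $i_1, \dots, i_k \in \{0, 1\}$.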

Proof. As in Theorem 1.124.



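Theorem 1.127. Let $E, F, G$ be Hilbert spaces, let $f \in L(E, F)$, and let $g \in L(F, G)$.

1. $\Lambda^k_j (rf) = r^j \Lambda^k_j f$.

2. $\Lambda^k (gf) = \Lambda^k g\, \Lambda^k f$. If $f$ is invertible then $\Lambda^k f$ is invertible and $(\Lambda^k f)^{-1} = \Lambda^k f^{-1}$.

3. $(\Lambda^k f)^* = \Lambda^k f^*$.

4. If $f$ is compact then $\Lambda^k_j f$ can be uniquely extended to a compact linear map $\Lambda^k_j f \in L(\overline{\Lambda^k E}, \overline{\Lambda^k F})$ with $|\Lambda^k_j f| \le \binom{k}{j} |f|^j$ (see Section 5.7). (If $j \ne k$, we assume that $E = F$.)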

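Proof. We have

\[
\begin{aligned}
(\Lambda^k gf)\, v_1 \wedge \cdots \wedge v_k = gfv_1 \wedge \cdots \wedge gfv_k &= (\Lambda^k g)\, fv_1 \wedge \cdots \wedge fv_k \\
&= (\Lambda^k g\, \Lambda^k f)\, v_1 \wedge \cdots \wedge v_k,
\end{aligned}
\]

which proves (2). For all $u_1, \dots, u_k \in E$ and $v_1, \dots, v_k \in F$,

\[
\begin{aligned}
\langle (\Lambda^k f)\, u_1 \wedge \cdots \wedge u_k, v_1 \wedge \cdots \wedge v_k \rangle
&= \langle fu_1 \wedge \cdots \wedge fu_k, v_1 \wedge \cdots \wedge v_k \rangle \\
&= \det\big((\langle fu_i, v_j \rangle)_{i,j}\big) = \det\big((\langle u_i, f^* v_j \rangle)_{i,j}\big) \\
&= \langle u_1 \wedge \cdots \wedge u_k, f^* v_1 \wedge \cdots \wedge f^* v_k \rangle = \langle u_1 \wedge \cdots \wedge u_k, (\Lambda^k f^*)\, v_1 \wedge \cdots \wedge v_k \rangle,
\end{aligned}
\]

and (3) follows by continuity. For (4), let $f = \sum_{m=1}^{\infty} s_m(f)\, v_m u_m^*$ be a singular value decomposition of $f$ (see Theorem 5.142). By Theorem 1.126,

\[
\Lambda^k_j f = \sum_{m_1 < \cdots < m_k} s_{m_1, \dots, m_k}(\Lambda^k_j f)\, v_{m_1} \wedge \cdots \wedge v_{m_k} (u_{m_1} \wedge \cdots \wedge u_{m_k})^*
\]

is a singular value decomposition of $\Lambda^k_j f$, where

\[
s_{m_1, \dots, m_k}(\Lambda^k_j f) = \sum_{i_1 + \cdots + i_k = j} s_{m_1}(f)^{i_1} \cdots s_{m_k}(f)^{i_k}.
\]

Since

\[
|s_{m_1, \dots, m_k}(\Lambda^k_j f)| \le \sum_{i_1 + \cdots + i_k = j} s_1(f)^{i_1} \cdots s_1(f)^{i_k} = \binom{k}{j} |f|^j,
\]

we must have

\[
|\Lambda^k_j f| \le \binom{k}{j} |f|^j.
\]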


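Corollary 1.128. If $E$ is finite-dimensional and $f \in L(E)$, then $\det(f^*) = \overline{\det(f)}$.

Proof. Let $n = \dim E$ and let $\iota$ be the identity map on $\Lambda^n E$. Then

\[
\det(f^*)\,\iota = \Lambda^n f^* = (\Lambda^n f)^* = (\det(f)\,\iota)^* = \overline{\det(f)}\,\iota.
\]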

1.7 Normed algebras

An (associative) algebra $A$ over a field $K$ is a vector space over $K$ along with a bilinear map $\cdot : A \times A \to A$, called multiplication, which is associative: $x \cdot (y \cdot z) = (x \cdot y) \cdot z$ for all $x, y, z \in A$. We write $xy = x \cdot y$ for the product of two vectors $x, y \in A$. If the bilinear map is symmetric (i.e. $xy = yx$) then we say that $A$ is commutative. If there is an element $e \in A$ such that $ex = xe = x$ for all $x \in A$, we call $e$ the unit element and we say that $A$ is unital. (It is easy to check that $e$ is uniquely determined.) A normed algebra is an algebra with a norm on its vector space structure that satisfies $|xy| \le |x|\,|y|$ for all $x, y \in A$. If $A$ is both normed and unital, then we always assume that the unit element is a unit vector (i.e. $|e| = 1$) unless $A = \{0\}$. If $K = \mathbb{R}$, we say that $A$ is a real algebra, and if $K = \mathbb{C}$, we say that $A$ is a complex algebra. A complete normed algebra is called a (real or complex) Banach algebra. Two obvious examples of (unital and commutative) Banach algebras are $\mathbb{R}$ and $\mathbb{C}$. If $E$ is a Banach space, then $L(E, E)$ is a unital Banach algebra if we define multiplication by $fg = f \circ g$.

An (algebra) homomorphism between two algebras $A$ and $B$ is a linear map $f : A \to B$ such that $f(xy) = f(x)f(y)$ for all $x, y \in A$. If $A$ and $B$ are unital then $f$ is unital if $f(e_A) = e_B$, where $e_A$ and $e_B$ are the units in $A$ and $B$ respectively. A subalgebra of $A$ is a (vector) subspace $B$ of $A$ such that $xy \in B$ for all $x, y \in B$. It is easy to see that the closure of any subalgebra in a normed algebra is also a subalgebra. If $A$ is a unital algebra, then a unital subalgebra of $A$ is a subalgebra $B$ of $A$ whose unit is the same as the unit in $A$.

Unitization

The unitization of an algebra $A$ over $K$ is the algebra $A_e = A \times K$, equipped with multiplication defined by

\[
(x, r)(y, s) = (xy + sx + ry, rs).
\]

This process is referred to as adjoining a unit to $A$. Note that $e = (0, 1)$ is the unit in $A_e$, and the map $\iota : A \to A_e$ defined by $\iota(x) = (x, 0)$ is an algebra isomorphism onto the ideal $A \times \{0\}$. If $A$ is normed, we can define a norm on $A_e$ by

\[
|x + re| = |x| + |r|.
\]

If $A$ is a Banach algebra, then $A_e$ is also a Banach algebra with this norm (since the codimension of $A$ in $A_e$ is 1).
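The multiplication rule of the unitization can be written out directly. In the sketch below, elements of $A$ are modeled by ordinary Python numbers under ordinary multiplication, purely as a stand-in for a general algebra; the function names are hypothetical.

```python
def unitized_mul(p, q):
    """(x, r)(y, s) = (xy + sx + ry, rs) in the unitization A x K."""
    (x, r), (y, s) = p, q
    return (x * y + s * x + r * y, r * s)

def embed(x):
    """The map iota: A -> A_e, x |-> (x, 0)."""
    return (x, 0)

def unitized_norm(p):
    """|x + re| = |x| + |r|."""
    x, r = p
    return abs(x) + abs(r)

UNIT = (0, 1)  # the adjoined unit e

# Hypothetical sample elements of A_e.
p, q, w = (1, 2), (3, -1), (-2, 4)

left_unit = unitized_mul(UNIT, q)
right_unit = unitized_mul(q, UNIT)
associative = unitized_mul(unitized_mul(p, q), w) == unitized_mul(p, unitized_mul(q, w))
submultiplicative = unitized_norm(unitized_mul(p, q)) <= unitized_norm(p) * unitized_norm(q)
```

The checks confirm, on these samples only, that $(0, 1)$ acts as a two-sided unit, that the product is associative, and that the norm $|x| + |r|$ is submultiplicative.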

Invertible elements

Let $A$ be a unital algebra. An element $x \in A$ is said to be invertible if there is some $y \in A$ such that $xy = yx = e$. The element $y$ is uniquely determined, for if $xz = zx = e$ then $z = zxy = y$. We call $y$ the inverse of $x$, and we denote it by $x^{-1}$. Note that if $x$ and $y$ are invertible, then $xy$ is also invertible with $(xy)^{-1} = y^{-1}x^{-1}$. A nonzero element $x \in A$ is called a left zero divisor if $xy = 0$ for some $y \ne 0$, and a right zero divisor if $yx = 0$ for some $y \ne 0$. An element that is a left zero divisor or a right zero divisor is called a zero divisor. Clearly, a left or right zero divisor is never invertible.

Theorem 1.129. Let A be a unital Banach algebra.

1. If $x \in A$ and $|e - x| < 1$, then $x$ is invertible, $x^{-1} = \sum_{k=0}^{\infty} (e - x)^k$, and

\[
|x^{-1}| \le \frac{1}{1 - |e - x|}.
\]

2. The set G(A) of invertible elements in A is open.

Proof. The series

\[
\sum_{k=0}^{\infty} (e - x)^k
\]

converges absolutely, so it converges to some element $y$. Since

\[
x \sum_{k=0}^{n} (e - x)^k = (e - (e - x)) \sum_{k=0}^{n} (e - x)^k = e - (e - x)^{n+1},
\]

taking $n \to \infty$ gives $xy = e$. A similar argument shows that $yx = e$. Therefore $x$ is invertible. We have

\[
|x^{-1}| = \left| \sum_{k=0}^{\infty} (e - x)^k \right| \le \sum_{k=0}^{\infty} |e - x|^k = \frac{1}{1 - |e - x|}.
\]

This proves (1). Let $x \in G(A)$ and let $y \in A$ with $|y - x| < |x^{-1}|^{-1}$. Then $|e - x^{-1}y| = |x^{-1}(x - y)| < 1$, so $x^{-1}y$ is invertible and $y = xx^{-1}y$ is invertible. This proves (2).
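The series in part (1) can be summed numerically when $A$ is an algebra of matrices. The following sketch uses $2 \times 2$ real matrices as a stand-in for a unital Banach algebra; the truncation length `terms`, the function names and the sample matrix are illustrative assumptions.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_inverse(x, terms=60):
    """Partial sum of sum_{k>=0} (e - x)^k, truncated after `terms` terms."""
    n = len(x)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [[identity[i][j] - x[i][j] for j in range(n)] for i in range(n)]  # e - x
    power = identity                      # (e - x)^0
    total = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, d)          # next power of (e - x)
    return total

# Hypothetical sample element close to the identity, so that |e - x| < 1.
x = [[1.0, 0.2], [0.1, 0.9]]
y = neumann_inverse(x)

def close_to_identity(m, tol=1e-12):
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(m)) for j in range(len(m)))

two_sided = close_to_identity(matmul(x, y)) and close_to_identity(matmul(y, x))
```

Because the entries of $e - x$ are small, the truncation error after 60 terms is far below floating-point precision, so the partial sum behaves as a two-sided inverse up to rounding.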

It is not hard to show that the inversion map $x \mapsto x^{-1}$ is continuous. We do not give a proof here, because Theorem 2.6 gives the stronger result that the inversion map is differentiable.

Corollary 1.130. Let $E, F$ be Banach spaces. The set $U$ of linear homeomorphisms is open in $L(E, F)$.

Proof. Suppose that $u \in U$ and let $v \in L(E, F)$ with $|v - u| < |u^{-1}|^{-1}$. Then $|e - u^{-1}v| = |u^{-1}(u - v)| < 1$, so $u^{-1}v \in L(E, E)$ is a linear homeomorphism. Similarly we have $|e - vu^{-1}| = |(u - v)u^{-1}| < 1$, so $vu^{-1} \in L(F, F)$ is a linear homeomorphism. Now

\[
((u^{-1}v)^{-1}u^{-1})\,v = (u^{-1}v)^{-1}(u^{-1}v) = \mathrm{Id}_E
\]

and

\[
v\,(u^{-1}(vu^{-1})^{-1}) = (vu^{-1})(vu^{-1})^{-1} = \mathrm{Id}_F.
\]

Thus $v$ has a continuous left inverse and a continuous right inverse, which must coincide, so $v$ is a linear homeomorphism.

Ideals and modular ideals

An ideal of an algebra $A$ is a subalgebra $I$ such that $xy \in I$ and $yx \in I$ whenever $x \in I$ and $y \in A$. Given an ideal $I$ of $A$, we can form the quotient algebra by equipping the quotient vector space $A/I$ with multiplication defined by

(x+ I)(y + I) = xy + I

for x, y ∈ A. To check that this is well-defined, let u, v ∈ I and note that

(x+ u+ I)(y + v + I) = xy + xv + uy + uv + I = xy + I

since $I$ is an ideal. The ideal generated by an element $u \in A$ is $AuA = \{xuy : x, y \in A\}$. It is easy to check that this is an ideal, and that it is the smallest ideal of $A$ containing $u$. If $A$ is commutative then $AuA = Au = uA$. If $A$ is a normed algebra and $I$ is an ideal of $A$, then the closure $\overline{I}$ is also an ideal because multiplication is continuous.

If $I \ne A$ then we say that $I$ is proper. If $I$ is proper and is not contained in any larger proper ideal of $A$, we say that $I$ is maximal. A proper ideal $I$ cannot contain any invertible elements: if $u \in I$ is invertible then $x = (xu^{-1})u \in I$ for all $x \in A$.

Theorem 1.131. Let $A$ be a normed algebra and let $I$ be a closed ideal of $A$. Then $A/I$ is a normed algebra (under the quotient norm). If $A$ is complete, then $A/I$ is also complete (i.e. a Banach algebra).

Proof. Since $I$ is closed, Theorem 1.68 shows that $A/I$ is a normed vector space. Also,

\[
\begin{aligned}
|(x + I)(y + I)| = |xy + I| &= \inf_{u \in I} |xy + u| \\
&\le \inf_{u, v \in I} |(x + u)(y + v)| \\
&\le \inf_{u, v \in I} |x + u|\,|y + v| \\
&= |x + I|\,|y + I|.
\end{aligned}
\]

If $A$ is complete then Theorem 1.72 shows that $A/I$ is complete.

The preceding theorem implies that every closed ideal $I$ of $A$ is the kernel of a continuous algebra homomorphism (the quotient map $\pi : A \to A/I$). Conversely, it is easy to check that the kernel of a continuous algebra homomorphism on $A$ is a closed ideal of $A$.

An ideal $I$ is called modular if $A/I$ is unital, i.e. there is some $u \in A$ such that $x - ux \in I$ and $x - xu \in I$ for all $x \in A$. Note that if $u \in I$ then $x \in I$ for all $x \in A$, so $I = A$. This shows that if $I$ is a proper modular ideal, then $u \notin I$. If $A$ itself is unital then every ideal of $A$ is modular because we can choose $u = e$. If $I$ is a modular ideal, then every ideal containing $I$ is also modular. The set of all maximal modular ideals of $A$ is denoted by $\operatorname{Max}(A)$, and will be important in Chapter 5.

Theorem 1.132 (Existence of a maximal ideal). Let $A$ be an algebra and let $I$ be a proper ideal of $A$.

1. There exists a maximal ideal containing I.

2. If I is modular, then there exists a maximal modular ideal containing I.



Proof. We only prove (2); the proof for (1) is even simpler. Let $u + I$ be the identity in $A/I$ and let $\mathcal{J}$ be the set of all ideals $J$ of $A$ such that $I \subseteq J$ and $u \notin J$. Then $\mathcal{J}$ is a nonempty partially ordered set under set inclusion. If $\mathcal{K} = \{J_\alpha\}_{\alpha \in A}$ is a totally ordered subset of $\mathcal{J}$, then $\bigcup_{\alpha \in A} J_\alpha$ is an upper bound of $\mathcal{K}$. By Zorn’s lemma, there is a maximal element $M$ of $\mathcal{J}$. Note that $M$ is modular because $I \subseteq M$. If $J$ is a proper ideal containing $M$ then $u \notin J$, for otherwise $x = (x - ux) + ux \in M + J = J$ for all $x \in A$. Therefore $J \in \mathcal{J}$ and $J = M$, which shows that $M$ is a maximal ideal.

Corollary 1.133. If $A$ is a unital commutative algebra, then an element $x \in A$ is invertible if and only if $x$ is not contained in any maximal ideal.

Proof. If $x$ is invertible then it is not contained in any proper ideal. Conversely, if $x$ is not invertible then $Ax \ne A$, so Theorem 1.132 shows that there is a maximal ideal $M$ containing $Ax$.

Theorem 1.134. Let $A$ be a Banach algebra and let $I$ be a proper modular ideal of $A$. If $u + I$ is the identity in $A/I$, then

\[
\{x \in A : |x - u| < 1\} \subseteq A \setminus I.
\]

Therefore the closure $\overline{I}$ is proper.

Proof. If $A$ has a unit $e$, then let $A' = A$. Otherwise, let $A' = A_e$ (the unitization of $A$). If $|x - u| < 1$ then Theorem 1.129 shows that $e + (x - u)$ is invertible in $A'$. Write $(e + (x - u))^{-1} = y + re$ for $y \in A$ and $r \in K$; then

\[
\begin{aligned}
e &= (e + x - u)(y + re) \\
&= y + re + xy + rx - uy - ru.
\end{aligned} \tag{*}
\]

Suppose that $x \in I$. If $e \in A$ (i.e. $A' = A$) then

\[
e = (re - (re)u) + (y - uy) + x(y + re) \in I,
\]

which is a contradiction. If $e \notin A$ (i.e. $A' = A_e$) then $(1 - r)e \in A$ from (*), so $r = 1$. This implies that

\[
u = x + xy + (y - uy) \in I,
\]

which is also a contradiction. Therefore $x \notin I$.

Corollary 1.135. If $A$ is a Banach algebra, then every maximal modular ideal of $A$ is closed.


Power series

Let $A$ be a unital Banach algebra over $K$, let $E$ be a normed vector space, let $F$ be a Banach space, and assume that we are given a continuous bilinear map $\cdot : E \times A \to F$ such that

\[
|cx| = |c|\,|x|
\]

for all $c \in E$ and $x \in A$. Let $\{c_n\}$ be a sequence in $E$. The series

\[
f(x) = \sum_{n=0}^{\infty} c_n x^n
\]

for $x \in A$ is called a power series. In practice we usually have $E = K$ and $F = A$ with scalar multiplication, but in Chapter 3 we will be using the case $A = \mathbb{C}$ and $E = F$.

Let $\sum c_n x^n$ be a power series and let

\[
\alpha = \limsup_{n \to \infty} \sqrt[n]{|c_n|}.
\]

If $\alpha = \infty$, let $R = 0$; if $\alpha = 0$, let $R = \infty$; otherwise, let $R = 1/\alpha$. The number $R$ is called the radius of convergence of the power series.
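The quantity $\sqrt[n]{|c_n|}$ can be explored numerically. The sketch below is only a heuristic: a limit superior cannot be computed from finitely many terms, so it evaluates the $n$-th root at a single large index for a coefficient sequence whose roots are known to converge. The coefficients $c_n = n \cdot 3^n$ and all function names are hypothetical choices for illustration.

```python
import math

def root_test_value(log_abs_coeff, n):
    """Return |c_n|**(1/n), computed from log|c_n| to avoid overflow."""
    return math.exp(log_abs_coeff(n) / n)

def estimated_radius(log_abs_coeff, n=10**6):
    """Heuristic estimate of R = 1/alpha from a single large index n."""
    alpha = root_test_value(log_abs_coeff, n)
    return 1.0 / alpha

def log_abs_c(n):
    """log|c_n| for the hypothetical coefficients c_n = n * 3**n."""
    return math.log(n) + n * math.log(3)

radius = estimated_radius(log_abs_c)
```

For these coefficients $\alpha = 3$, so the estimate should be close to $R = 1/3$; the factor $n$ only contributes $n^{1/n} \to 1$ and does not change the radius.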

Theorem 1.136 (Convergence and divergence of power series). Let $\sum c_n x^n$ be a power series with radius of convergence $R$. Then $\sum c_n x^n$ converges if $|x| < R$. If $A$ has the property that $|x^n| = |x|^n$ for all $x \in A$, then $\sum c_n x^n$ diverges if $|x| > R$.

Proof. We have

\[
\limsup_{n \to \infty} \sqrt[n]{|c_n x^n|} = \limsup_{n \to \infty} \sqrt[n]{|c_n|\,|x^n|} \le |x| \limsup_{n \to \infty} \sqrt[n]{|c_n|},
\]

so if $|x| < R$ then $\sum c_n x^n$ converges by the root test. The second statement is proved similarly.

Theorem 1.137. Let $\sum c_n x^n$ be a power series with radius of convergence $R$. Then for all $\varepsilon > 0$, the series converges uniformly on $\{x \in A : |x| \le R - \varepsilon\}$.

Proof. Since

\[
|c_n x^n| \le |c_n|\,|R - \varepsilon|^n,
\]

the result follows from Corollary 1.24.



1.8 The regulated integral

Let $E$ be a Banach space. A function $f : [a, b] \to E$ is called a step map if there is a partition $a = a_0 < \cdots < a_k = b$ of $[a, b]$ such that $f$ takes on a constant value $c_i$ on $(a_{i-1}, a_i)$ for each $i = 1, \dots, k$. We define the (regulated) integral of $f$ by

\[
\int_a^b f = \int_a^b f(t)\,dt = \sum_{i=1}^{k} c_i (a_i - a_{i-1}).
\]

Let $B[a, b]$ be the (Banach) space of bounded functions from $[a, b]$ to $E$ with the supremum norm $\|f\| = \sup_{t \in [a, b]} |f(t)|$, let $S[a, b]$ be the set of all step maps on $[a, b]$, and let $\overline{S}[a, b]$ be the closure of $S[a, b]$ in $B[a, b]$. (Note that $\overline{S}[a, b]$ is complete since $B[a, b]$ is complete.) Elements of $\overline{S}[a, b]$ are called regulated functions. It is easy to check that $S[a, b]$ is a subspace of $B[a, b]$, and that $\int_a^b$ is a linear map from $S[a, b]$ to $E$. Since

\[
\left| \int_a^b f \right| \le \sum_{i=1}^{k} |c_i| (a_i - a_{i-1}) \le (b - a) \|f\|,
\]

the linear map $\int_a^b$ is bounded (and continuous). By Theorem 1.45, there is a unique extension of $\int_a^b$ to $\overline{S}[a, b]$ that satisfies

\[
\left| \int_a^b f \right| \le (b - a) \|f\|
\]

for all $f \in \overline{S}[a, b]$. We define

\[
\int_b^a f = -\int_a^b f.
\]
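For a step map the integral is a finite sum, which makes it easy to compute directly. The sketch below handles scalar-valued step maps given by their partition points and constant values; the representation, the function name and the sample data are assumptions made for illustration.

```python
def step_integral(points, values):
    """Integral of a step map equal to values[i] on (points[i], points[i+1])."""
    if len(points) != len(values) + 1:
        raise ValueError("need exactly one value per subinterval")
    if any(left >= right for left, right in zip(points, points[1:])):
        raise ValueError("partition points must be strictly increasing")
    return sum(c * (right - left)
               for c, left, right in zip(values, points, points[1:]))

# Hypothetical step map on [0, 4]: 2 on (0, 1), -1 on (1, 3), 5 on (3, 4).
points = [0, 1, 3, 4]
values = [2, -1, 5]
integral = step_integral(points, values)

# Bound (b - a) * ||f||, assuming the values taken at the partition points
# themselves do not exceed max |c_i| in absolute value.
bound = (points[-1] - points[0]) * max(abs(c) for c in values)
```

The values of the step map at the partition points do not enter the integral, which is why only the constants on the open subintervals are stored.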

Theorem 1.138 (Properties of the regulated integral). Let $f \in \overline{S}[a, b]$.

1. If $a < c < b$, then $\int_a^b f = \int_a^c f + \int_c^b f$.

2. $\left| \int_a^b f \right| \le \int_a^b |f|$.

3. If $\varphi : E \to F$ is a continuous linear map, then

\[
\int_a^b \varphi \circ f = \varphi \int_a^b f.
\]

4. If $E = E_1 \times \cdots \times E_n$ where $E_1, \dots, E_n$ are Banach spaces, then

\[
\int_a^b f = \left( \int_a^b f_1, \dots, \int_a^b f_n \right)
\]

where $f_i = \pi_i \circ f$ and $\pi_i : E \to E_i$ is the canonical projection.

5. If $E = \mathbb{R}$ and $f \ge 0$, then $\int_a^b f \ge 0$.

6. If $\{f_n\}$ is a sequence in $\overline{S}[a, b]$ converging uniformly to $f$, then

\[
\int_a^b f = \lim_{n \to \infty} \int_a^b f_n.
\]

Proof. Let $\{f_n\}$ be a sequence of step maps converging to $f$. Then $\{f_n|_{[a,c]}\}$ is a sequence of step maps converging to $f|_{[a,c]}$ and $\{f_n|_{[c,b]}\}$ is a sequence of step maps converging to $f|_{[c,b]}$. Since

\[
\int_a^b f_n = \int_a^c f_n|_{[a,c]} + \int_c^b f_n|_{[c,b]}
\]

for all $n$, (1) follows. For (2), $\{|f_n|\}$ is a sequence of step maps converging to $|f|$ since

\[
\big|\, |f_n| - |f| \,\big| \le |f_n - f|.
\]

We have

\[
\left| \int_a^b f_n \right| \le \int_a^b |f_n|
\]

using the triangle inequality, so (2) follows. For (3), we have

\[
\int_a^b \varphi \circ f_n = \varphi \int_a^b f_n
\]

by the linearity of $\varphi$. Part (4) follows by taking $\varphi = \pi_i$ for each $i = 1, \dots, n$. (The existence of the integral on the left is easy to prove.) For (5), $\{\max(f_n, 0)\}$ is also a sequence of step maps converging to $f$, and

\[
\int_a^b \max(f_n, 0) \ge 0
\]

for all $n$. Part (6) is clear since $\int_a^b$ is a continuous linear map.



Theorem 1.139. A function $f : [a, b] \to E$ is regulated if and only if

\[
\lim_{h \to t^-} f(h) \quad \text{and} \quad \lim_{h \to t^+} f(h)
\]

exist for all $t \in [a, b]$ (except for the left hand limit when $t = a$, and the right hand limit when $t = b$).

Proof. Let $t \in [a, b)$, let $\varepsilon > 0$, and let $s : [a, b] \to E$ be a step function such that $\|s - f\| < \varepsilon$. Choose $\delta > 0$ such that $s|_{(t, t+\delta)}$ is constant. Then

\[
|f(x) - f(y)| \le |f(x) - s(x)| + |s(x) - s(y)| + |s(y) - f(y)| < 2\varepsilon
\]

for all $x, y \in (t, t + \delta)$, so $\lim_{h \to t^+} f(h)$ exists. Similarly, $\lim_{h \to t^-} f(h)$ exists for all $t \in (a, b]$. Conversely, suppose that the left and right hand limits exist, and let $\varepsilon > 0$. For each $t \in [a, b]$, we can choose an interval $I_t = (s_t, u_t)$ containing $t$ such that $|f(x) - f(y)| < \varepsilon$ whenever $x, y \in (s_t, t) \cap [a, b]$ or $x, y \in (t, u_t) \cap [a, b]$. Since $[a, b]$ is compact, we can cover $[a, b]$ with finitely many such intervals $I_{t_1}, \dots, I_{t_k}$. Let $P$ be the set containing $a$, $b$, and all the $s_{t_i}$, $t_i$ and $u_{t_i}$. Write $P \cap [a, b] = \{c_0, \dots, c_m\}$ where $c_0 < \cdots < c_m$. It is easy to check that for all $i = 1, \dots, m$ we have $|f(x) - f(y)| < \varepsilon$ for all $x, y \in (c_{i-1}, c_i)$. Define a step function $s : [a, b] \to E$ by setting $s(t) = f(t)$ for $t \in \{c_0, \dots, c_m\}$, and $s(t) = f((c_{i-1} + c_i)/2)$ for $t \in (c_{i-1}, c_i)$. Then $\|s - f\| \le \varepsilon$.

Corollary 1.140.

1. If f : [a, b]→ E is continuous, then f is regulated.

2. If f : [a, b]→ R is monotonic, then f is regulated.


2 Differentiation

In this chapter, $E$, $F$ and $G$ denote Banach spaces (over $\mathbb{R}$ or $\mathbb{C}$). We write $L(E, F)$ for the space of continuous linear maps from $E$ to $F$.

Notes

The basic material in this chapter is based on [9]. Section 2.7 contains some original material, inspired by singular homology for differentiable manifolds. Lemma 2.72 is usually stated for complex-valued holomorphic functions, but we prove the general (Banach space valued) case by a simple application of the Poincaré lemma. This will allow us to develop the theory of complex analysis for Banach space valued functions on $\mathbb{C}$ with no additional effort. (Usually one defines a weakly holomorphic function as a function $f : U \to F$ such that $\lambda \circ f$ is holomorphic for every $\lambda \in F^*$, and then shows that every weakly holomorphic function on an open set is holomorphic.) This will be essential for Chapter 5, where we define the holomorphic functional calculus for unital Banach algebras.

2.1 The derivative

Definition 2.1. Let $A \subseteq E$ be an open set and let $f : A \to F$. A continuous linear map $\lambda : E \to F$ is said to be the Fréchet derivative or the derivative of $f$ at a point $x \in A$ if

\[
\lim_{h \to 0} \frac{f(x + h) - f(x) - \lambda h}{|h|} = 0,
\]

and we write $f'(x) = \lambda$ or $Df(x) = \lambda$. If $f$ has a derivative at a point $x$, we say that $f$ is differentiable at $x$. If $f$ is differentiable at all $x \in A$, then we say that $f$ is differentiable on $A$ or simply differentiable.

Theorem 2.2. Let $A \subseteq E$ be an open set. The derivative of a function $f : A \to F$ at a point $x \in A$, if it exists, is unique.


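Proof. Suppose $\lambda_1$ and $\lambda_2$ are both derivatives of $f$ at $x$. Subtracting gives

\[
\lim_{h \to 0} \frac{(\lambda_2 - \lambda_1)h}{|h|} = 0.
\]

For a fixed nonzero $u \in E$ we have

\[
\lim_{t \to 0^+} \frac{(\lambda_2 - \lambda_1)tu}{|tu|} = 0,
\]

and since the left hand side is independent of $t$ we have $(\lambda_2 - \lambda_1)u = 0$ for all $u \in E$. Therefore $\lambda_1 = \lambda_2$.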

Theorem 2.3. Let $A \subseteq E$ be an open set. If $f : A \to F$ is differentiable at $x \in A$ then $f$ is continuous at $x$.

Proof. For small $h$ we can write

\[
f(x + h) - f(x) = |h| \left( \frac{f(x + h) - f(x) - \lambda h}{|h|} \right) + \lambda h.
\]

The right hand side tends to 0 as $h \to 0$, so $f$ is continuous at $x$.

Suppose $f : A \to F$ is differentiable on $A$. If the map $f' : A \to L(E, F)$ is continuous, then we say that $f$ is continuously differentiable or of class $C^1$. We will have more to say about this in Section 2.2.

Basic properties

Theorem 2.4 (Chain rule). Let $A \subseteq E$ and $B \subseteq F$ be open sets. Let $f : A \to F$ and $g : B \to G$ with $f(A) \subseteq B$. If $f$ is differentiable at $x$ and $g$ is differentiable at $f(x)$, then $g \circ f$ is differentiable at $x$ and

\[
(g \circ f)'(x) = g'(f(x)) \circ f'(x).
\]

Proof. Let $y = f(x)$ and define

\[
\begin{aligned}
\varphi(s) &= f(x + s) - f(x) - f'(x)s, \\
\psi(t) &= g(y + t) - g(y) - g'(y)t, \\
\rho(h) &= g(f(x + h)) - g(y) - g'(y)f'(x)h
\end{aligned}
\]

so that

\[
\lim_{s \to 0} \frac{\varphi(s)}{|s|} = \lim_{t \to 0} \frac{\psi(t)}{|t|} = 0 \tag{*}
\]

since $f$ is differentiable at $x$ and $g$ is differentiable at $y$. We want to show that

\[
\lim_{h \to 0} \frac{\rho(h)}{|h|} = 0.
\]

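For all sufficiently small $h$,

\[
\begin{aligned}
g(f(x + h)) - g(y) &= g(y + f'(x)h + \varphi(h)) - g(y) \\
&= g'(y)(f'(x)h + \varphi(h)) + \psi(f'(x)h + \varphi(h))
\end{aligned}
\]

and

\[
\rho(h) = g'(y)\varphi(h) + \psi(f'(x)h + \varphi(h)).
\]

Since $g'(y)$ is continuous,

\[
\lim_{h \to 0} \frac{g'(y)\varphi(h)}{|h|} = g'(y) \left[ \lim_{h \to 0} \frac{\varphi(h)}{|h|} \right] = 0,
\]

and it remains to show that

\[
\lim_{h \to 0} \frac{\psi(f'(x)h + \varphi(h))}{|h|} = 0.
\]

Let $\varepsilon > 0$. By (*) and the continuity of $f$ at $x$, there exist $\delta_1, \delta_2, \delta_3 > 0$ such that $|\psi(t)| \le \varepsilon |t|$ whenever $|t| \le \delta_1$, $|f'(x)h + \varphi(h)| \le \delta_1$ whenever $|h| \le \delta_2$, and $|\varphi(s)| \le |s|$ whenever $|s| \le \delta_3$. Then for all $0 < |h| \le \min(\delta_2, \delta_3)$,

\[
\frac{|\psi(f'(x)h + \varphi(h))|}{|h|} \le \varepsilon \left( \frac{|f'(x)h|}{|h|} + \frac{|\varphi(h)|}{|h|} \right) \le \varepsilon (|f'(x)| + 1).
\]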

Theorem 2.5 (Properties of the derivative). Let $F_1, \dots, F_m$ be Banach spaces, let $A \subseteq E$ be an open set and let $f : A \to F_1$ and $g : A \to F_2$ be differentiable at $x \in A$.

1. If $f$ is constant then $f'(x) = 0$.

2. If $f(x) = \lambda x$ for some continuous linear map $\lambda$, then $f'(x) = \lambda$.

3. If $F_1 = F_2$ then $(f + g)'(x) = f'(x) + g'(x)$.

4. $(rf)'(x) = rf'(x)$ for all scalars $r$.

5. Suppose there is a continuous bilinear map $\cdot : F_1 \times F_2 \to G$. Then

\[
(fg)'(x) = f'(x)g(x) + f(x)g'(x),
\]

where $f'(x)g(x)$ is the linear map that takes $u$ to $f'(x)u \cdot g(x)$.

6. If $h : F_1 \times \cdots \times F_m \to G$ is a continuous multilinear map, then

\[
h'(x_1, \dots, x_m)(u_1, \dots, u_m) = \sum_{j=1}^{m} h(x_1, \dots, u_j, \dots, x_m).
\]

Proof. We only prove (5), where we can assume that $|f_1 f_2| \le |f_1|\,|f_2|$ for $f_1 \in F_1$ and $f_2 \in F_2$. We have

\[
\begin{aligned}
0 &= \lim_{h \to 0} \frac{[f(x + h) - f(x) - f'(x)h]\,g(x + h) + f(x)\,[g(x + h) - g(x) - g'(x)h]}{|h|} \\
&= \lim_{h \to 0} \frac{(fg)(x + h) - (fg)(x) - [f'(x)g(x + h) + f(x)g'(x)]h}{|h|}.
\end{aligned} \tag{*}
\]

Now

\[
\frac{|f'(x)h\,[g(x + h) - g(x)]|}{|h|} \le |f'(x)|\,|g(x + h) - g(x)| \to 0
\]

as $h \to 0$ since $\cdot$ is continuous and $g$ is continuous at $x$, so

\[
\lim_{h \to 0} \frac{f'(x)h\,[g(x + h) - g(x)]}{|h|} = 0.
\]

Adding this to (*) gives

\[
\lim_{h \to 0} \frac{(fg)(x + h) - (fg)(x) - [f'(x)g(x) + f(x)g'(x)]h}{|h|} = 0.
\]

Theorem 2.6. Let $E$ be a Banach algebra and let $G(E)$ be the open set of its invertible elements. Then the map $x \mapsto x^{-1}$ is differentiable on $G(E)$, and its derivative at a point $x$ is given by

\[
u \mapsto -x^{-1}ux^{-1}.
\]

Proof. For sufficiently small $h$ we have $|e - (e + x^{-1}h)| = |x^{-1}h| < 1/2$, so $e + x^{-1}h$ is invertible (see Theorem 1.129) and

\[
\begin{aligned}
(x + h)^{-1} - x^{-1} + x^{-1}hx^{-1} &= (e + x^{-1}h)^{-1}x^{-1} - x^{-1} + x^{-1}hx^{-1} \\
&= [(e + x^{-1}h)^{-1} - (e - x^{-1}h)]\,x^{-1}.
\end{aligned} \tag{*}
\]

Now

\[
\begin{aligned}
\left| (e + x^{-1}h)^{-1} - (e - x^{-1}h) \right|
&= \left| \sum_{k=0}^{\infty} (-x^{-1}h)^k - (e - x^{-1}h) \right| \\
&= \left| \sum_{k=2}^{\infty} (-x^{-1}h)^k \right| \\
&\le \frac{|x^{-1}h|^2}{1 - |x^{-1}h|} \\
&\le 2\,|x^{-1}|^2\,|h|^2.
\end{aligned}
\]

Combining this with (*) shows that

\[
\frac{(x + h)^{-1} - x^{-1} + x^{-1}hx^{-1}}{|h|} \to 0
\]

as $h \to 0$.
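The quadratic bound on the remainder in this proof can be observed numerically. The sketch below uses $2 \times 2$ real matrices as a stand-in for a Banach algebra and measures size with the maximum absolute entry (all norms are equivalent in finite dimensions, so the rate is unaffected); the sample point `x`, the direction `u` and the function names are illustrative assumptions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_norm(m):
    return max(abs(entry) for row in m for entry in row)

def remainder(x, u, t):
    """Size of (x + h)^-1 - x^-1 + x^-1 h x^-1 for h = t * u."""
    h = [[t * entry for entry in row] for row in u]
    x_inv = inverse(x)
    shifted_inv = inverse([[x[i][j] + h[i][j] for j in range(2)] for i in range(2)])
    linear = matmul(matmul(x_inv, h), x_inv)
    diff = [[shifted_inv[i][j] - x_inv[i][j] + linear[i][j] for j in range(2)]
            for i in range(2)]
    return max_norm(diff)

# Hypothetical invertible point and direction.
x = [[2.0, 1.0], [1.0, 3.0]]
u = [[1.0, 0.0], [2.0, -1.0]]

r_coarse = remainder(x, u, 1e-3)
r_fine = remainder(x, u, 1e-4)
```

Shrinking $h$ by a factor of 10 shrinks the remainder by roughly a factor of 100, consistent with the $|h|^2$ estimate and hence with the remainder divided by $|h|$ tending to 0.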

Corollary 2.7 (Quotient rule). Let $F_1$ be a Banach space, let $F_2$ be a Banach algebra, and let $G(F_2)$ be the open set of the invertible elements in $F_2$. Let $A \subseteq E$ be an open set and let $f : A \to F_1$ and $g : A \to G(F_2)$ be differentiable at $x \in A$. Suppose there is a continuous bilinear map $\cdot : F_1 \times F_2 \to G$. Write $fg^{-1}$ for the map $(fg^{-1})(x) = f(x)g(x)^{-1}$. Then $(fg^{-1})'(x)$ is given by

\[
u \mapsto [f'(x)u]\,g(x)^{-1} - f(x)g(x)^{-1}[g'(x)u]\,g(x)^{-1}.
\]

In particular, if $F_2$ is commutative then

\[
(f/g)'(x) = [f'(x)g(x) - f(x)g'(x)]\,g(x)^{-2}.
\]

Theorem 2.8. Let $A \subseteq E$ be an open set, let $F_1, \dots, F_n$ be Banach spaces, let $f : A \to F_1 \times \cdots \times F_n$ and let $f_i = \pi_i \circ f$ be the component functions of $f$, where $\pi_i : F_1 \times \cdots \times F_n \to F_i$ is the canonical projection. Then $f$ is differentiable at a point $x \in A$ if and only if every $f_i$ is differentiable at $x$. In that case we have $f_i'(x) = \pi_i \circ f'(x)$, i.e.

\[
f'(x) = \begin{bmatrix} f_1'(x) \\ \vdots \\ f_n'(x) \end{bmatrix}.
\]

Proof. If $\lambda$ is a linear map then the $i$th entry of

\[
T(h) = \frac{f(x + h) - f(x) - \lambda h}{|h|}
\]

is simply

\[
T_i(h) = \frac{f_i(x + h) - f_i(x) - \pi_i \lambda h}{|h|}.
\]

Therefore $T(h)$ approaches 0 as $h \to 0$ if and only if every $T_i(h)$ approaches 0 as $h \to 0$. The second statement is clear from the above.

Real and complex differentiability

Let $F$ be a Banach space over $K$ (where $K = \mathbb{R}$ or $K = \mathbb{C}$). Let $U \subseteq K$ be an open set and let $f : U \to F$. If $K = \mathbb{C}$, then we may consider $K$ and $F$ as Banach spaces over $\mathbb{C}$ or as Banach spaces over $\mathbb{R}$ when determining the derivative of $f$. To indicate this difference, we may use the terms $\mathbb{C}$-derivative, $\mathbb{C}$-differentiable, $\mathbb{R}$-derivative or $\mathbb{R}$-differentiable. If $f$ is $\mathbb{C}$-differentiable at $z \in U$, then it is also $\mathbb{R}$-differentiable at $z$ with the same derivative (as an $\mathbb{R}$-linear map rather than a $\mathbb{C}$-linear map). The converse is not true: define $f : \mathbb{C} \to \mathbb{C}$ by $f(x + yi) = x - yi$ (or $f(z) = \overline{z}$), so that the $\mathbb{R}$-derivative is

\[
Df(z) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\]

for all $z \in \mathbb{C}$. If $f$ is also $\mathbb{C}$-differentiable then $Df(z)$ must be $\mathbb{C}$-linear, so $Df(z)i = i\,Df(z)(1) = i$. This cannot be the case since $Df(z)i = -i$ (from the above matrix). Functions that are $\mathbb{C}$-differentiable have special properties, which we will study in Chapter 3.
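The failure of $\mathbb{C}$-differentiability for complex conjugation shows up directly in difference quotients taken along different directions. The following sketch uses Python's built-in complex numbers; the base point `z`, the step size `t` and the function names are arbitrary illustrative choices.

```python
def difference_quotient(f, z, h):
    """(f(z + h) - f(z)) / h for a nonzero complex increment h."""
    return (f(z + h) - f(z)) / h

def conj(z):
    return z.conjugate()

def square(z):
    return z * z

z = 1 + 2j
t = 2.0 ** -20   # small step; a power of two keeps the arithmetic exact here

# Conjugation: the quotient depends on the direction of approach.
conj_along_real = difference_quotient(conj, z, t)
conj_along_imag = difference_quotient(conj, z, t * 1j)

# z -> z^2 for comparison: both directions give approximately 2z.
square_along_real = difference_quotient(square, z, t)
square_along_imag = difference_quotient(square, z, t * 1j)
```

For conjugation the quotient is $1$ along the real axis and $-1$ along the imaginary axis, so no single limit exists, whereas for $z \mapsto z^2$ both quotients approach $2z$.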

The $K$-derivative $f'(z) = Df(z)$ is a $K$-linear map from a one-dimensional vector space (over $K$) to $F$, and

\[
f'(z)w = w f'(z)(1)
\]

for all $w \in K$. We can therefore identify $f'(z) = Df(z)$ with $f'(z)(1) \in F$, and we often write $f'(z)$ instead of $f'(z)(1)$. For example, if $f(x) = x^2$ then $f'(x) : \mathbb{R} \to \mathbb{R}$ is the linear map given by $f'(x)h = 2xh$, but we will often write $f'(x) = f'(x)(1) = 2x$ as in single variable calculus.

Theorem 2.9. Let $U \subseteq K$ be an open set. A function $f : U \to F$ is $K$-differentiable at $z \in U$ if and only if the limit

\[
c = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h}
\]

exists. In that case, $c = f'(z)$.

Proof. We have

\[
\lim_{h \to 0} \frac{f(z + h) - f(z)}{h} = c \iff \lim_{h \to 0} \frac{f(z + h) - f(z) - ch}{|h|} = 0.
\]

The fundamental theorem of calculus

The definition of the derivative extends naturally to closed intervals (with more than one point) in $\mathbb{R}$. Most of our theorems, including the chain rule, still hold for functions defined on closed intervals.

Lemma 2.10. Let $f : [a, b] \to E$ be differentiable. If $f'(x) = 0$ for all $x \in [a, b]$, then $f$ is constant.

Proof. Suppose $f(t) \ne f(a)$ for some $t \in [a, b]$, and choose a linear functional $\lambda$ such that $\lambda(f(t)) \ne \lambda(f(a))$, e.g. by applying the Hahn-Banach theorem (Theorem 1.48). Then $\lambda \circ f$ is differentiable and $(\lambda \circ f)'(x) = 0$ for all $x \in [a, b]$, which implies that $\lambda \circ f$ is constant. This is a contradiction.

Theorem 2.11 (Fundamental theorem of calculus). Let f : [a, b] → E be an integrable function, and suppose that f is continuous at x ∈ [a, b]. Then the map

\[ t \mapsto \int_a^t f \]

is differentiable at x and its derivative is f(x).

Proof. We have

\[ \frac{1}{|h|} \left| \int_a^{x+h} f - \int_a^x f - f(x)h \right| = \frac{1}{|h|} \left| \int_x^{x+h} [f(t) - f(x)] \, dt \right| \le \sup_t |f(t) - f(x)| \to 0 \]

as h → 0, where the sup is taken over all t between x and x + h where f(t) is defined.

Corollary 2.12. Let f : [a, b] → E be continuous, let F : [a, b] → E, and suppose that F′(x) = f(x) for all x ∈ [a, b]. Then

\[ \int_a^b f = F(b) - F(a). \]


Proof. Apply Lemma 2.10 to the map

\[ x \mapsto F(x) - \int_a^x f. \]

Corollary 2.13 (Integration by parts). Let E_1, E_2, F be Banach spaces and suppose there is a continuous bilinear map · : E_1 × E_2 → F. Let f : [a, b] → E_1 and g : [a, b] → E_2 be continuously differentiable functions. Then

\[ \int_a^b f'g + \int_a^b fg' = f(b)g(b) - f(a)g(a). \]

Proof. We have (fg)′ = f′g + fg′ by the product rule, so integrating both sides from a to b and applying Corollary 2.12 produces the result.

Corollary 2.14 (Change of variables formula on an interval). Let f : [a, b] → E be continuous and let g : [c, d] → [a, b] be continuously differentiable. Then

\[ \int_{g(c)}^{g(d)} f = \int_c^d (f \circ g) g'. \]

Proof. Define F : [a, b] → E by x ↦ ∫_a^x f so that F′ = f. Then (F∘g)′ = (F′∘g)g′ = (f∘g)g′, so

\[ \int_{g(c)}^{g(d)} f = F(g(d)) - F(g(c)) = \int_c^d (F \circ g)' = \int_c^d (f \circ g) g'. \]

Mean value theorems

Let α : [a, b] → L(E, F) be a continuous map into the space of linear maps from E to F. If x ∈ [a, b] and y ∈ E then we write α(x)y for the element α(x)(y) ∈ F.

Lemma 2.15. Let α : [a, b] → L(E, F) be a continuous map and let y ∈ E. Then

\[ \int_a^b \alpha(t)y \, dt = \left( \int_a^b \alpha(t) \, dt \right) y. \]

Proof. The map λ ↦ λ(y) is a continuous linear map from L(E, F) to F, so the result follows from Theorem 1.138.


Theorem 2.16 (Mean value theorem). Let A ⊆ E be an open set, let f : A → F be continuously differentiable, let x ∈ A, and let v ∈ E. If the line segment x + tv with 0 ≤ t ≤ 1 is contained in A, then

\[ f(x+v) - f(x) = \int_0^1 f'(x+tv)v \, dt = \left( \int_0^1 f'(x+tv) \, dt \right) v. \]

Proof. Let g(t) = f(x + tv) so that g′(t) = f′(x + tv)v. By the fundamental theorem of calculus (Corollary 2.12), we have

\[ g(1) - g(0) = \int_0^1 g'. \]

Since g(0) = f(x) and g(1) = f(x + v), the first equality follows. The second equality follows from Lemma 2.15.

If x, y ∈ A and the line segment between x and y is contained in A, setting v = y − x in Theorem 2.16 gives

\[ |f(y) - f(x)| = \left| \int_0^1 f'(x + t(y-x))(y-x) \, dt \right| \le |y-x| \, (1-0) \sup_{t \in [0,1]} |f'(x + t(y-x))|. \]

In fact, we have a more general result due to [2].

Theorem 2.17 (Mean value inequality). Let f : [a, b] → F and g : [a, b] → R be continuous. If there is a countable set R such that f and g are differentiable on [a, b] \ R and |f′(t)| ≤ g′(t) for all t ∈ [a, b] \ R, then

\[ |f(b) - f(a)| \le g(b) - g(a). \]

Proof. Write R = {r_1, r_2, …} and let ε > 0. Let J be the set of all t ∈ [a, b] such that

\[ |f(s) - f(a)| \le g(s) - g(a) + \varepsilon \Big( s - a + \sum_{r_n < s} 2^{-n} \Big) \tag{*} \]

for all s ∈ [a, t). Clearly a ∈ J, and if t ∈ J then [a, t] ⊆ J. Let u = sup J. By continuity, (*) holds for s = u.

Suppose that u < b. If u ∉ R then f and g are differentiable at u, so there is some δ > 0 such that [u, u + δ] ⊆ [a, b] and

\[ |f(t) - f(u) - f'(u)(t-u)| \le (\varepsilon/2)(t-u), \]

\[ |g(t) - g(u) - g'(u)(t-u)| \le (\varepsilon/2)(t-u) \]


for all t ∈ [u, u + δ]. For these t,

\[ \begin{aligned} |f(t) - f(u)| &\le |f'(u)|(t-u) + (\varepsilon/2)(t-u) \le g'(u)(t-u) + (\varepsilon/2)(t-u) \\ &\le g(t) - g(u) + \varepsilon(t-u) \end{aligned} \]

and

\[ \begin{aligned} |f(t) - f(a)| &\le |f(t) - f(u)| + |f(u) - f(a)| \\ &\le g(t) - g(a) + \varepsilon \Big( t - a + \sum_{r_n < u} 2^{-n} \Big) \\ &\le g(t) - g(a) + \varepsilon \Big( t - a + \sum_{r_n < t} 2^{-n} \Big) \end{aligned} \]

since (*) holds for s = u. This contradicts the fact that u < b. If u ∈ R then u = r_m for some m. Since f and g are continuous, there is some δ > 0 such that [u, u + δ] ⊆ [a, b] and

\[ |f(t) - f(u)| \le (\varepsilon/2) 2^{-m}, \]

\[ |g(t) - g(u)| \le (\varepsilon/2) 2^{-m} \]

for all t ∈ [u, u + δ]. For these t we have

\[ \begin{aligned} |f(t) - f(a)| &\le |f(t) - f(u)| + |f(u) - f(a)| \\ &\le g(u) - g(a) + \varepsilon \Big( u - a + \sum_{r_n < u} 2^{-n} + 2^{-m-1} \Big) \\ &\le g(t) - g(a) + \varepsilon \Big( u - a + \sum_{r_n < t} 2^{-n} \Big), \end{aligned} \]

which again contradicts the fact that u < b.

This shows that u = b, and since (*) holds for s = b, taking ε → 0 completes the proof.

Corollary 2.18. Let A ⊆ E be an open set, let f : A → F be continuous, and let x, y ∈ A. If the line segment L between x and y is contained in A, and the derivative of f exists and is bounded on L, then

\[ |f(y) - f(x)| \le |y - x| \sup_{z \in L} |f'(z)|. \]

Corollary 2.19. Let A ⊆ E be a connected open set and suppose that the derivative of f : A → F is zero on A. Then f is constant.

Proof. If x ∈ A and B_r(x) is any open ball around x contained in A then Corollary 2.18 shows that f is constant on B_r(x). Since A is connected, f is constant on A.


2.2 Higher derivatives

Recall that for any Banach space F, the space of continuous linear maps L(E, F) is also a Banach space. If A ⊆ E is an open set and f : A → F is differentiable, then Df = f′ : A → L(E, F) is a map between Banach spaces. Therefore we may consider the second derivative

\[ D^2 f = f'' : A \to L(E, L(E, F)) \]

obtained by differentiating f′. Continuing the process, we have higher order derivatives

\[ D^p f = f^{(p)} : A \to L^p(E, F), \]

where L^p(E, F) = L(E, L(E, …, L(E, F) …)). It is clear that D^p(f + g) = D^p f + D^p g and D^p(cf) = cD^p f for all scalars c. We say that f is of class C^p or is p times continuously differentiable if D^p f(x) exists for all x ∈ A and D^p f is continuous on A. Note that if f is of class C^p, then D^k f is automatically continuous for all 0 ≤ k < p as well. We identify L^p(E, F) with the space of multilinear maps L(E, …, E; F), and write D^p f(x_1, …, x_p) for D^p f(x_1)⋯(x_p).

We have the following extension of Theorem 2.8, which follows easily by induction.

Theorem 2.20. Let A ⊆ E be an open set, let F_1, …, F_n be Banach spaces, let f : A → F_1 × ⋯ × F_n and let f_i = π_i∘f be the component functions of f, where π_i : F_1 × ⋯ × F_n → F_i is the canonical projection. Then f is of class C^p if and only if every f_i is of class C^p. In that case we have D^p f_i(x) = π_i∘D^p f(x), i.e.

\[ D^p f(x) = \begin{bmatrix} D^p f_1(x) \\ \vdots \\ D^p f_n(x) \end{bmatrix}. \]

Theorem 2.21. Let A ⊆ E and B ⊆ F be open sets. Let f : A → F and g : B → G be class C^p maps with f(A) ⊆ B. Then g∘f is of class C^p.

Proof. We use induction on p, with Theorem 2.4 proving the case p = 1 (the case p = 0 also holds because a composition of continuous maps is continuous). Assume that the result holds for p − 1 and suppose f and g are of class C^p. By Theorem 2.4 we have

\[ D(g \circ f)(x) = Dg(f(x)) \circ Df(x). \]

As a function of x, the right hand side is a composition of C^{p−1} maps, so the induction hypothesis shows that D(g∘f) is of class C^{p−1} and therefore g∘f is of class C^p.


Symmetry

An important fact is that D^p f(x) is always symmetric (as a multilinear map) if f is of class C^p. To prove this, we start with the case p = 2.

Lemma 2.22. Let ϕ : E × E → F be a bilinear map. If there is a map ψ into F defined for sufficiently small (v, w) ∈ E × E such that

\[ \lim_{(v,w) \to (0,0)} \psi(v, w) = 0 \]

and

\[ |\phi(v, w)| \le |\psi(v, w)| \, |v| \, |w|, \]

then ϕ = 0.

Proof. Let v, w ∈ E. For sufficiently small s > 0 we have

\[ |\phi(sv, sw)| \le |\psi(sv, sw)| \, |sv| \, |sw|, \]

so

\[ s^2 |\phi(v, w)| \le s^2 |\psi(sv, sw)| \, |v| \, |w|. \]

Dividing by s² and taking s → 0 proves the result.

Theorem 2.23 (Symmetry of the second derivative). Let A ⊆ E be an open set and let f : A → F be a class C² map. Then for every x ∈ A, the bilinear map D²f(x) is symmetric. That is,

\[ D^2 f(x)(v, w) = D^2 f(x)(w, v) \]

for all v, w ∈ E.

Proof. Let x ∈ A and choose r > 0 so that the open ball of radius r around x is contained in A. Let v, w ∈ E with |v|, |w| < r/2. Define g(x) = f(x + v) − f(x).


Applying Theorem 2.16, we have

\[ \begin{aligned} g(x+w) - g(x) &= \int_0^1 g'(x+tw)w \, dt \\ &= \int_0^1 [Df(x+v+tw) - Df(x+tw)]w \, dt \\ &= \int_0^1 \left( \int_0^1 D^2 f(x+sv+tw)v \, ds \right) w \, dt \\ &= \int_0^1 \int_0^1 D^2 f(x)(v, w) \, ds \, dt + \int_0^1 \int_0^1 \psi(sv, tw)(v, w) \, ds \, dt \\ &= D^2 f(x)(v, w) + \phi(v, w) \end{aligned} \]

where ψ(α, β) = D²f(x + α + β) − D²f(x) and

\[ \phi(v, w) = \int_0^1 \int_0^1 \psi(sv, tw)(v, w) \, ds \, dt. \]

If we repeat the above process starting with g_1 in place of g, where g_1(x) = f(x + w) − f(x), we obtain that

\[ g_1(x+v) - g_1(x) = D^2 f(x)(w, v) + \phi(w, v). \]

Since g(x + w) − g(x) = g_1(x + v) − g_1(x), we have

\[ D^2 f(x)(w, v) - D^2 f(x)(v, w) = \phi(v, w) - \phi(w, v), \]

and from the definitions of ϕ and ψ we see that

\[ \left| D^2 f(x)(w, v) - D^2 f(x)(v, w) \right| \le 2 \sup_{0 \le s,t \le 1} |\psi(sv, tw)| \, |v| \, |w|. \]

Since D²f is continuous, we can apply Lemma 2.22 to the bilinear map

\[ (v, w) \mapsto D^2 f(x)(w, v) - D^2 f(x)(v, w) \]

to obtain the result.

Theorem 2.24 (Symmetry of higher derivatives). Let A ⊆ E be an open set and let f : A → F be a class C^p map. Then for every x ∈ A, the multilinear map D^p f(x) is symmetric.


Proof. We use induction on p, with Theorem 2.23 proving the case p = 2. Suppose the result holds for 2, …, p − 1. If v_1, …, v_p ∈ E, then

\[ \begin{aligned} D^p f(x)(v_1, \dots, v_p) &= D^2 D^{p-2} f(x)(v_1, v_2)(v_3, \dots, v_p) \\ &= D^2 D^{p-2} f(x)(v_2, v_1)(v_3, \dots, v_p) \\ &= D^p f(x)(v_2, v_1, v_3, \dots, v_p) \end{aligned} \tag{*} \]

by applying Theorem 2.23 to the C² map D^{p−2}f. Also, the induction hypothesis shows that

\[ D^{p-1} f(x)(v_{\sigma(2)}, \dots, v_{\sigma(p)}) = D^{p-1} f(x)(v_2, \dots, v_p) \]

for any permutation σ of {2, …, p}. If ϕ_σ : L^{p−1}(E, F) → F is the linear map given by λ ↦ λ(v_{σ(2)}, …, v_{σ(p)}) then

\[ \begin{aligned} D^p f(x)(v_1, v_{\sigma(2)}, \dots, v_{\sigma(p)}) &= \phi_\sigma(D^p f(x)(v_1)) \\ &= D(\phi_\sigma \circ D^{p-1} f)(x)(v_1) \\ &= D(\phi_e \circ D^{p-1} f)(x)(v_1) \\ &= \phi_e(D^p f(x)(v_1)) \\ &= D^p f(x)(v_1, \dots, v_p) \end{aligned} \tag{**} \]

where e is the identity permutation. Since any permutation of {1, …, p} can be expressed as a composition of the permutations considered in (*) and (**), D^p f(x) is symmetric.

Taylor’s theorem

Theorem 2.25 (Taylor’s theorem). Let A ⊆ E be an open set and let f : A → F be a class C^p map. Let x ∈ A and let v ∈ E. Assume that the line segment x + tv with 0 ≤ t ≤ 1 is contained in A. Write v^{(k)} for the k-tuple (v, …, v). Then

\[ f(x+v) = \sum_{k=0}^{p-1} \frac{D^k f(x) v^{(k)}}{k!} + R_p, \]

where

\[ R_p = \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} D^p f(x+tv) v^{(p)} \, dt. \]

Proof. We use induction on p, with Theorem 2.16 proving the case p = 1. Assume that the result holds for p − 1. Let

\[ g(t) = \frac{(1-t)^{p-1}}{(p-1)!} \quad \text{and} \quad h(t) = D^{p-1} f(x+tv) v^{(p-1)} \]


so that

\[ g'(t) = \frac{-(1-t)^{p-2}}{(p-2)!} \quad \text{and} \quad h'(t) = D^p f(x+tv) v^{(p)}. \]

(Note that for convenience, we are again identifying h′(t) with an element of F.) Applying Corollary 2.13 with the vector space product R × F → F gives

\[ \int_0^1 \frac{-(1-t)^{p-2}}{(p-2)!} D^{p-1} f(x+tv) v^{(p-1)} \, dt + \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} D^p f(x+tv) v^{(p)} \, dt = -\frac{1}{(p-1)!} D^{p-1} f(x) v^{(p-1)}, \]

and the result follows.
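As a sanity check of Theorem 2.25 in the simplest setting E = F = R (an illustration added here, not taken from the notes), the sketch below compares exp(x + v) with the Taylor polynomial of order p − 1 plus the integral remainder R_p. The function names, the choice f = exp, the sample values and the use of composite Simpson quadrature are all assumptions made for the example.

```python
import math


def simpson(func, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def taylor_with_remainder(x, v, p):
    """Return (Taylor polynomial of order p - 1, integral remainder R_p) for f = exp."""
    # Every derivative of exp is exp, so D^k f(x) v^(k) = exp(x) * v**k.
    poly = sum(math.exp(x) * v**k / math.factorial(k) for k in range(p))

    def integrand(t):
        weight = (1 - t) ** (p - 1) / math.factorial(p - 1)
        return weight * math.exp(x + t * v) * v**p

    return poly, simpson(integrand, 0.0, 1.0)


poly, remainder = taylor_with_remainder(0.5, 0.7, 3)
exact = math.exp(0.5 + 0.7)
print(poly + remainder, exact)
```

The agreement is only up to quadrature and rounding error, so the comparison should be made with a tolerance rather than exact equality.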

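Corollary 2.26 (Taylor’s theorem with estimate). In Theorem 2.25, we also have

\[ f(x+v) = \sum_{k=0}^{p} \frac{D^k f(x) v^{(k)}}{k!} + \theta(v) \]

where

\[ |\theta(v)| \le \sup_{0 \le t \le 1} \frac{|D^p f(x+tv) - D^p f(x)|}{p!} |v|^p \]

and

\[ \lim_{v \to 0} \frac{\theta(v)}{|v|^p} = 0. \]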

Proof. Let ψ(α) = D^p f(x + α) − D^p f(x). We can write R_p as

\[ \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} D^p f(x) v^{(p)} \, dt + \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} \psi(tv) v^{(p)} \, dt. \]

The first integral gives the pth term, and the second integral is bounded by

\[ \sup_{0 \le t \le 1} |\psi(tv)| \, |v|^p \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} \, dt = \frac{1}{p!} \sup_{0 \le t \le 1} |\psi(tv)| \, |v|^p. \]

The result follows from the continuity of D^p f at x.

Lemma 2.27. Let f, g : [a, b] → R be continuous functions. Assume that g does not change sign on (a, b). Then

\[ \int_a^b fg = f(c) \int_a^b g \]

for some c ∈ (a, b).


Proof. Assume that g ≥ 0 on (a, b). Since f is continuous on [a, b], it attains a minimum m and a maximum M on [a, b]. Therefore

\[ m \int_a^b g \le \int_a^b fg \le M \int_a^b g. \]

If ∫_a^b g = 0 then the result follows immediately, so we may assume that ∫_a^b g > 0. Then

\[ m \le \frac{\int_a^b fg}{\int_a^b g} \le M, \]

so

\[ f(c) = \frac{\int_a^b fg}{\int_a^b g} \]

for some c ∈ (a, b) by the intermediate value theorem.

Corollary 2.28 (Taylor’s theorem with remainder in Lagrange form). In Theorem 2.25, if F = R then we also have

\[ f(x+v) = \sum_{k=0}^{p-1} \frac{D^k f(x) v^{(k)}}{k!} + \frac{D^p f(z) v^{(p)}}{p!} \]

for some z in the line segment.

Proof. By Lemma 2.27, there is a z in the line segment such that

\[ R_p = D^p f(z) v^{(p)} \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} \, dt = \frac{D^p f(z) v^{(p)}}{p!}. \]

Chain rule for the second derivative

In single variable calculus, we have the formula

\[ (g \circ f)''(x) = g''(f(x)) [f'(x)]^2 + g'(f(x)) f''(x) \]

for the second derivative of a composition of maps. There is a similar formula for maps between Banach spaces that is slightly more difficult to derive.
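The single-variable formula above can be checked numerically. The following sketch is an added illustration under stated assumptions: it takes f = sin and g = exp, and compares a central second difference of g∘f with the right hand side of the formula. The function names, the sample point and the step size are hypothetical choices.

```python
import math


def second_difference(func, x, h=1e-4):
    """Central second difference, an O(h^2) approximation of func''(x)."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / (h * h)


def chain_rule_second(x):
    """g''(f(x)) f'(x)^2 + g'(f(x)) f''(x) for f = sin and g = exp."""
    fx, dfx, d2fx = math.sin(x), math.cos(x), -math.sin(x)
    dg = math.exp(fx)    # g'(f(x))
    d2g = math.exp(fx)   # g''(f(x))
    return d2g * dfx**2 + dg * d2fx


x0 = 0.3
numeric = second_difference(lambda x: math.exp(math.sin(x)), x0)
formula = chain_rule_second(x0)
print(numeric, formula)
```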


Theorem 2.29 (Chain rule for the second derivative). Let A ⊆ E and B ⊆ F be open sets. Let f : A → F and g : B → G with f(A) ⊆ B. If D²f exists on A and D²g exists on B, then D²(g∘f) exists on A and

\[ D^2(g \circ f)(x)(u, v) = D^2 g(f(x))(Df(x)(u), Df(x)(v)) + Dg(f(x))(D^2 f(x)(u, v)). \]

Proof. We can write D(g∘f) = c∘d∘e, where

\[ \begin{aligned} c &: L(F, G) \times L(E, F) \to L(E, G), & (\lambda, \mu) &\mapsto \lambda \circ \mu; \\ d &: E \times E \to L(F, G) \times L(E, F), & (x, y) &\mapsto ((Dg \circ f)(x), Df(y)); \\ e &: E \to E \times E, & x &\mapsto (x, x). \end{aligned} \]

Note that c is a continuous bilinear map and e is a continuous linear map. We compute

\[ \begin{aligned} D^2(g \circ f)(x) &= Dc((d \circ e)(x)) \circ Dd(e(x)) \circ De(x) \\ &= Dc(Dg(f(x)), Df(x)) \circ Dd(x, x) \circ e. \end{aligned} \]

Now

\[ Dd(x, y)(u, v) = (D^2 g(f(x))(Df(x)(u)), D^2 f(y)(v)), \]

so

\[ \begin{aligned} D^2(g \circ f)(x)(u) &= Dc(Dg(f(x)), Df(x))(Dd(x, x)(u, u)) \\ &= Dc(Dg(f(x)), Df(x))(D^2 g(f(x))(Df(x)(u)), D^2 f(x)(u)) \\ &= D^2 g(f(x))(Df(x)(u)) \circ Df(x) + Dg(f(x)) \circ D^2 f(x)(u) \end{aligned} \]

and

\[ D^2(g \circ f)(x)(u)(v) = D^2 g(f(x))(Df(x)(u))(Df(x)(v)) + Dg(f(x))(D^2 f(x)(u)(v)). \]


2.3 Partial derivatives

While Theorem 2.8 and Theorem 2.20 show that differentiable maps may be naturally split into component functions when the codomain is a product of Banach spaces, the situation for the domain is more complicated. Let E_1, …, E_m be Banach spaces and let A ⊆ E_1 × ⋯ × E_m be an open set. If x = (x_1, …, x_m) ∈ A then (from the definition of the product topology) there are open sets A_j ⊆ E_j containing x_j such that x ∈ A_1 × ⋯ × A_m ⊆ A. If f : A → F is any map, we can consider the map

\[ t \mapsto f(x_1, \dots, t, \dots, x_m), \]

which can also be written as f∘ι where ι : A_j → A_1 × ⋯ × A_m is given by t ↦ (x_1, …, t, …, x_m). If this map is differentiable at x_j, we call its derivative the jth partial derivative of f at x and denote it by D_j f(x) or D_{E_j} f(x). From the definition of the derivative, D_j f(x) : E_j → F is the unique continuous linear map such that

\[ \lim_{h \to 0} \frac{f(x_1, \dots, x_j + h, \dots, x_m) - f(x_1, \dots, x_m) - D_j f(x)h}{|h|} = 0. \]

It is not hard to see that all partial derivatives exist at x if f is differentiable at x.

Theorem 2.30. Let A_1 × ⋯ × A_m ⊆ E_1 × ⋯ × E_m where each A_j is open in E_j and let f : A_1 × ⋯ × A_m → F. If f is differentiable at x = (x_1, …, x_m) ∈ A_1 × ⋯ × A_m, then every D_j f(x) exists and we have D_j f(x) = Df(x)∘ι_j where ι_j : E_j → E_1 × ⋯ × E_m is the canonical injection, i.e.

\[ Df(x) = \begin{bmatrix} D_1 f(x) & \cdots & D_m f(x) \end{bmatrix}. \]

Proof. Apply the chain rule to f∘ι where ι : A_j → A_1 × ⋯ × A_m is given by t ↦ (x_1, …, t, …, x_m). Alternatively, restrict h to elements of E_j in the definition of the derivative Df(x).

Definition 2.31. Let E_1, …, E_m and F_1, …, F_n be Banach spaces. Let A_1 × ⋯ × A_m ⊆ E_1 × ⋯ × E_m where each A_j is open in E_j and let f : A_1 × ⋯ × A_m → F_1 × ⋯ × F_n. The matrix

\[ \begin{bmatrix} D_1 f_1(x) & \cdots & D_m f_1(x) \\ \vdots & \ddots & \vdots \\ D_1 f_n(x) & \cdots & D_m f_n(x) \end{bmatrix} \]

is called the Jacobian matrix of f at x ∈ A_1 × ⋯ × A_m, where f_i = π_i∘f and π_i : F_1 × ⋯ × F_n → F_i is the canonical projection.


Theorem 2.32. If f : A_1 × ⋯ × A_m → F_1 × ⋯ × F_n is differentiable at x ∈ A_1 × ⋯ × A_m, then the Jacobian matrix of f at x exists and represents Df(x).

Proof. Apply Theorem 2.8 followed by Theorem 2.30.

As in R^n, it may be the case that every partial derivative D_j f_i(x) exists but f is not differentiable at x. The differentiability of f implies the existence of the Jacobian matrix, but the converse is not true. Thus we do not have a true analog of Theorem 2.8 for partial derivatives. We do however have an analog of Theorem 2.20.

Theorem 2.33. Let A_1 × ⋯ × A_m ⊆ E_1 × ⋯ × E_m where each A_j is open in E_j and let f : A_1 × ⋯ × A_m → F. Then f is of class C^p (with p ≥ 1) if and only if every partial derivative

\[ D_j f : A_1 \times \cdots \times A_m \to L(E_j, F) \]

is of class C^{p−1}. In that case we have D_j f(x) = Df(x)∘ι_j where ι_j : E_j → E_1 × ⋯ × E_m is the canonical injection, i.e.

\[ Df(x) = \begin{bmatrix} D_1 f(x) & \cdots & D_m f(x) \end{bmatrix}. \]

Proof. It is clear from the proof of Theorem 2.30 that every partial derivative is of class C^{p−1} if f is of class C^p. For the converse, we only need to prove that Df exists on A_1 × ⋯ × A_m since

\[ Df(x) = \sum_{j=1}^{m} D_j f(x) \circ \pi_j \]

implies that Df is of class C^{p−1} if every D_j f is of class C^{p−1}, where π_j : E_1 × ⋯ × E_m → E_j is the canonical projection. Let x ∈ A_1 × ⋯ × A_m and let ε > 0. Since every D_j f is continuous, there exists a δ > 0 such that

\[ |D_j f(y) - D_j f(x)| < \frac{\varepsilon}{m} \]

for all j = 1, …, m and y ∈ B_δ(x) ⊆ A_1 × ⋯ × A_m where B_δ(x) is the open ball of radius δ around x. Let h = (h_1, …, h_m) ∈ E_1 × ⋯ × E_m with |h| < δ. For j = 0, …, m, let p_j = h_1 + ⋯ + h_j so that p_0 = 0, p_m = h, and

\[ f(x+h) - f(x) = \sum_{j=1}^{m} [f(x + p_j) - f(x + p_{j-1})]. \]


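For each j = 1, …, m the line segment from x + p_{j−1} to x + p_j = x + p_{j−1} + h_j is contained in B_δ(x), so we have

\[ f(x + p_j) - f(x + p_{j-1}) = \int_0^1 D_j f(x + p_{j-1} + t h_j) h_j \, dt \]

by Theorem 2.16. Then

\[ \begin{aligned} \Big| f(x+h) - f(x) - \sum_{j=1}^{m} D_j f(x) h_j \Big| &\le \Big| \sum_{j=1}^{m} \Big[ \int_0^1 D_j f(x + p_{j-1} + t h_j) h_j \, dt - \int_0^1 D_j f(x) h_j \, dt \Big] \Big| \\ &\le \sum_{j=1}^{m} |h_j| \int_0^1 |D_j f(x + p_{j-1} + t h_j) - D_j f(x)| \, dt \\ &\le \sum_{j=1}^{m} |h_j| \frac{\varepsilon}{m} \\ &\le |h| \varepsilon \end{aligned} \]

for all |h| < δ, which shows that

\[ Df(x) = \sum_{j=1}^{m} D_j f(x) \circ \pi_j. \]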

As in the case of the ordinary derivative Df, we may take higher derivatives of partial derivatives:

\[ D_{j_1} \cdots D_{j_r} f : A_1 \times \cdots \times A_m \to L(E_{j_1}, L(\dots, L(E_{j_r}, F) \dots)). \]

These are sometimes known as mixed partial derivatives. Theorem 2.24 has an important interpretation in terms of the mixed partial derivatives of f.

Theorem 2.34 (Equality of mixed partial derivatives). Let A_1 × ⋯ × A_m ⊆ E_1 × ⋯ × E_m where each A_j is open in E_j and let f : A_1 × ⋯ × A_m → F be of class C². Then

\[ D_j D_k f(x)(u)(v) = D_k D_j f(x)(v)(u) \]

for all 1 ≤ j, k ≤ m, x ∈ A_1 × ⋯ × A_m, u ∈ E_j and v ∈ E_k.


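Proof. For j = 1, …, m, let ι_j : E_j → E_1 × ⋯ × E_m be the canonical injection. We have that D_k f(x) = Df(x)∘ι_k, so D_k f = c∘Df where

\[ c : L(E_1 \times \cdots \times E_m, F) \to L(E_k, F), \qquad \lambda \mapsto \lambda \circ \iota_k. \]

Similarly, D_j D_k f = d∘D(D_k f) where

\[ d : L(E_1 \times \cdots \times E_m, L(E_k, F)) \to L(E_j, L(E_k, F)), \qquad \lambda \mapsto \lambda \circ \iota_j. \]

Since c and d are both linear maps, we have

\[ \begin{aligned} D_j D_k f(x) &= [d \circ D(D_k f)](x) = d(D(c \circ Df)(x)) \\ &= d(Dc(Df(x)) \circ D^2 f(x)) \\ &= d(c \circ D^2 f(x)) = c \circ D^2 f(x) \circ \iota_j \end{aligned} \]

and

\[ \begin{aligned} D_j D_k f(x)(u)(v) &= (c \circ D^2 f(x) \circ \iota_j)(u)(v) \\ &= c(D^2 f(x)(\iota_j(u)))(v) \\ &= D^2 f(x)(\iota_j(u))(\iota_k(v)). \end{aligned} \]

A similar calculation shows that D_k D_j f(x)(v)(u) = D²f(x)(ι_k(v))(ι_j(u)). But

\[ D^2 f(x)(\iota_k(v))(\iota_j(u)) = D^2 f(x)(\iota_j(u))(\iota_k(v)) \]

by Theorem 2.23, so the result follows.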

The proof of Theorem 2.34 contains a result that is useful in itself:

Theorem 2.35. Let A_1 × ⋯ × A_m ⊆ E_1 × ⋯ × E_m where each A_j is open in E_j and let f : A_1 × ⋯ × A_m → F. Suppose that D^p f(x) exists for some x ∈ A_1 × ⋯ × A_m. Then

\[ D_{\tau(1)} \cdots D_{\tau(p)} f(x)(v_1, \dots, v_p) = D^p f(x)(\iota_{\tau(1)}(v_1), \dots, \iota_{\tau(p)}(v_p)) \]

for every map τ from {1, …, p} to {1, …, m}, where ι_j : E_j → E_1 × ⋯ × E_m is the canonical injection.

Often, p times continuous differentiability is defined in terms of the mixed partial derivatives of f. The following theorem shows that this definition is equivalent to ours.


Theorem 2.36. Let A_1 × ⋯ × A_m ⊆ E_1 × ⋯ × E_m where each A_j is open in E_j and let f : A_1 × ⋯ × A_m → F. Then f is of class C^p (with p ≥ 1) if and only if the partial derivative

\[ D_{\tau(1)} \cdots D_{\tau(k)} f \]

exists on A_1 × ⋯ × A_m and is continuous, for every k = 1, …, p and every map τ from {1, …, k} to {1, …, m}. Furthermore,

\[ \begin{aligned} D_{\tau(\sigma(1))} \cdots D_{\tau(\sigma(k))} f(x)(v_{\sigma(1)}, \dots, v_{\sigma(k)}) &= D_{\tau(1)} \cdots D_{\tau(k)} f(x)(v_1, \dots, v_k) \\ &= D^k f(x)(\iota_{\tau(1)}(v_1), \dots, \iota_{\tau(k)}(v_k)) \end{aligned} \]

for all x ∈ A_1 × ⋯ × A_m and any permutation σ of {1, …, k}, where ι_j : E_j → E_1 × ⋯ × E_m is the canonical injection.

Proof. This follows directly from Theorem 2.33 and Theorem 2.34.

Differentiation under the integral sign

Theorem 2.37 (Differentiation under the integral sign). Let A ⊆ E be an open set and let [a, b] be a closed interval with a < b. Let f : [a, b] × A → F be a continuous map such that D_E f exists on [a, b] × A and is continuous. Let g : A → F be given by

\[ g(x) = \int_a^b f(t, x) \, dt. \]

Then g is differentiable on A and

\[ Dg(x) = \int_a^b D_E f(t, x) \, dt. \]

Proof. Let x ∈ A and let

\[ \lambda = \int_a^b D_E f(t, x) \, dt. \]

For sufficiently small h we have

\[ \begin{aligned} g(x+h) - g(x) - \lambda h &= \int_a^b [f(t, x+h) - f(t, x) - D_E f(t, x)h] \, dt \\ &= \int_a^b \left[ \int_0^1 D_E f(t, x+sh)h \, ds - D_E f(t, x)h \right] dt \\ &= \int_a^b \int_0^1 [D_E f(t, x+sh) - D_E f(t, x)]h \, ds \, dt \end{aligned} \]


so that

\[ \begin{aligned} \frac{|g(x+h) - g(x) - \lambda h|}{|h|} &\le \int_a^b \int_0^1 |D_E f(t, x+sh) - D_E f(t, x)| \, ds \, dt \\ &\le (b-a) \sup_{s,t} |D_E f(t, x+sh) - D_E f(t, x)| \end{aligned} \]

where the sup is taken over all 0 ≤ s ≤ 1 and a ≤ t ≤ b.

Let ε > 0. For each t ∈ [a, b] there is a neighborhood B_t × U_t of (t, x) such that |D_E f(u, y) − D_E f(t, x)| < ε whenever (u, y) ∈ B_t × U_t, and such that B_t and U_t are open balls around t and x respectively. Since [a, b] is compact, there are finitely many balls B_{t_1}, …, B_{t_n} that cover [a, b]. Then for sufficiently small h such that x + h ∈ ⋂_{k=1}^{n} U_{t_k} and all 0 ≤ s ≤ 1 and a ≤ t ≤ b we have t ∈ B_{t_k} for some k, so

\[ |D_E f(t, x+sh) - D_E f(t, x)| \le |D_E f(t, x+sh) - D_E f(t_k, x)| + |D_E f(t_k, x) - D_E f(t, x)| < 2\varepsilon. \]

Corollary 2.38 (Fubini’s theorem). If f : [a, b] × [c, d] → F is continuous, then

\[ \int_c^d \int_a^b f(s, t) \, ds \, dt = \int_a^b \int_c^d f(s, t) \, dt \, ds. \]

Proof. Let

\[ g(x) = \int_a^x \int_c^d f(s, t) \, dt \, ds - \int_c^d \int_a^x f(s, t) \, ds \, dt \]

so that

\[ g'(x) = \int_c^d f(x, t) \, dt - \int_c^d f(x, t) \, dt = 0 \]

for x ∈ (a, b) by Theorem 2.11 and Theorem 2.37. Since g(a) = 0, we have g = 0 on [a, b].
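Theorem 2.37 can be illustrated numerically in the case E = F = R. The sketch below is an added example under stated assumptions: it takes f(t, x) = sin(tx) on [0, 1] × R, so that D_E f(t, x) = t cos(tx), and compares a central difference of g with the integral of D_E f. The function names, the sample point and the Simpson quadrature are hypothetical choices for the illustration.

```python
import math


def simpson(func, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def g(x):
    """g(x) = integral over [0, 1] of f(t, x) = sin(t * x) dt."""
    return simpson(lambda t: math.sin(t * x), 0.0, 1.0)


def dg_formula(x):
    """Integral over [0, 1] of the partial derivative t * cos(t * x) dt."""
    return simpson(lambda t: t * math.cos(t * x), 0.0, 1.0)


def dg_numeric(x, h=1e-5):
    """Central difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


x0 = 1.3
print(dg_numeric(x0), dg_formula(x0))
```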

Computing higher derivatives

Theorem 2.39. Let A ⊆ E be an open set and let f : A → F. If D^{p+1} f exists on A and

\[ D^p f(x)(v_1, \dots, v_p) = g(x, v_1, \dots, v_p) \]

for x ∈ A and v_1, …, v_p ∈ E, then

\[ D^{p+1} f(x)(v, v_1, \dots, v_p) = D_1 g(x, v_1, \dots, v_p)(v). \]


Proof. For fixed v_1, …, v_p ∈ E we have g(x, v_1, …, v_p) = (h∘D^p f)(x) where h(λ) = λ(v_1, …, v_p), so

\[ D_1 g(x, v_1, \dots, v_p)(v) = h(D^{p+1} f(x)(v)) = D^{p+1} f(x)(v, v_1, \dots, v_p). \]

For example, let f : E × E × E → F be a symmetric multilinear map and let g(x) = f(x, x, x). Then

\[ Dg(x)w = f(w, x, x) + f(x, w, x) + f(x, x, w) = 3f(w, x, x). \]

To apply the previous result, let g_1(x) = 3f(w, x, x), i.e. g_1 = 3f∘h_1 where h_1(x) = (w, x, x), so that

\[ \begin{aligned} D^2 g(x)(v, w) = Dg_1(x)v &= 3Df(h_1(x)) Dh_1(x)v \\ &= 3Df(w, x, x)(0, v, v) \\ &= 3(f(0, x, x) + f(w, v, x) + f(w, x, v)) \\ &= 6f(v, w, x). \end{aligned} \]

Let g_2(x) = 6f(v, w, x), i.e. g_2 = 6f∘h_2 where h_2(x) = (v, w, x), so that

\[ \begin{aligned} D^3 g(x)(u, v, w) = Dg_2(x)u &= 6Df(h_2(x)) Dh_2(x)u \\ &= 6Df(v, w, x)(0, 0, u) \\ &= 6(f(0, w, x) + f(v, 0, x) + f(v, w, u)) \\ &= 6f(u, v, w). \end{aligned} \]

Higher derivatives as tensors

For the remainder of this section, we assume that E, F are Banach spaces over K and E is finite-dimensional. For the purposes of computation, it is often helpful to think of higher derivatives as tensors. Specifically, let A ⊆ E be an open set, let f : A → F, and assume that D^p f(x) exists for some x ∈ A. Since D^p f(x) is a multilinear map from E^p to F, we say that D^p f(x) is a multilinear form. The space L(E, …, E; F) of such multilinear forms is isomorphic to

\[ L(E \otimes \cdots \otimes E, F), \]

which is in turn isomorphic to

\[ F \otimes E^* \otimes \cdots \otimes E^* = F \otimes (E^*)^{\otimes p} = F \otimes T^p E^*. \]


Therefore we can consider D^p f(x) as a p-tensor, i.e. an element of F ⊗ T^p E*. If y ∈ F and ω_1, …, ω_p ∈ E* then according to the above identification we can regard y ⊗ ω_1 ⊗ ⋯ ⊗ ω_p as a multilinear map where

\[ (y \otimes \omega_1 \otimes \cdots \otimes \omega_p)(v_1, \dots, v_p) = \omega_1(v_1) \cdots \omega_p(v_p) y \]

for all v_1, …, v_p ∈ E. If F = K then

\[ F \otimes T^p E^* = K \otimes T^p E^* = T^p E^*, \]

and D^p f(x) is a multilinear map from E^p to K.

Let {e_1, …, e_m} be a basis for E, and denote the dual basis for E* by {e*_1, …, e*_m}. The tensors

\[ \{ e^*_{j_1} \otimes \cdots \otimes e^*_{j_p} : 1 \le j_1, \dots, j_p \le m \} \]

form a basis for T^p E*. For each j = 1, …, m, let E_j be the subspace generated by e_j so that E = E_1 × ⋯ × E_m.

Theorem 2.40. Let A_1 × ⋯ × A_m ⊆ E where each A_j is open in E_j, let f : A_1 × ⋯ × A_m → F, and suppose that D^p f(x) exists. Then

\[ D^p f(x) = \sum_{j_1, \dots, j_p} D_{j_1} \cdots D_{j_p} f(x)(e_{j_1}, \dots, e_{j_p}) \otimes e^*_{j_1} \otimes \cdots \otimes e^*_{j_p}. \]

Proof. Write

\[ D^p f(x) = \sum_{j_1, \dots, j_p} a_{j_1, \dots, j_p} \otimes e^*_{j_1} \otimes \cdots \otimes e^*_{j_p} \]

where a_{j_1,…,j_p} ∈ F. If we fix indices 1 ≤ i_1, …, i_p ≤ m, then Theorem 2.35 shows that

\[ \begin{aligned} D_{i_1} \cdots D_{i_p} f(x)(e_{i_1}, \dots, e_{i_p}) &= D^p f(x)(e_{i_1}, \dots, e_{i_p}) \\ &= \sum_{j_1, \dots, j_p} (a_{j_1, \dots, j_p} \otimes e^*_{j_1} \otimes \cdots \otimes e^*_{j_p})(e_{i_1}, \dots, e_{i_p}) \\ &= a_{i_1, \dots, i_p}. \end{aligned} \]

We can apply this formula to express Taylor’s theorem (Corollary 2.28) in R^m using classical notation.


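Corollary 2.41 (Taylor’s theorem in R^m). Let A ⊆ R^m be an open set and let f : A → R be a class C^p map. Let x = (x_1, …, x_m) ∈ A and let v = (v_1, …, v_m) ∈ R^m. Assume that the line segment x + tv with 0 ≤ t ≤ 1 is contained in A. Then

\[ \begin{aligned} f(x+v) &= \sum_{k=0}^{p-1} \frac{1}{k!} \sum_{j_1, \dots, j_k} \frac{\partial^k f}{\partial x_{j_1} \cdots \partial x_{j_k}}(x) \, v_{j_1} \cdots v_{j_k} + \frac{1}{p!} \sum_{j_1, \dots, j_p} \frac{\partial^p f}{\partial x_{j_1} \cdots \partial x_{j_p}}(z) \, v_{j_1} \cdots v_{j_p} \\ &= \sum_{k=0}^{p-1} \frac{1}{k!} \sum_{i_1 + \cdots + i_m = k} \binom{k}{i_1, \dots, i_m} \frac{\partial^k f}{\partial x_1^{i_1} \cdots \partial x_m^{i_m}}(x) \, v_1^{i_1} \cdots v_m^{i_m} \\ &\quad + \frac{1}{p!} \sum_{i_1 + \cdots + i_m = p} \binom{p}{i_1, \dots, i_m} \frac{\partial^p f}{\partial x_1^{i_1} \cdots \partial x_m^{i_m}}(z) \, v_1^{i_1} \cdots v_m^{i_m} \end{aligned} \]

for some z in the line segment, where

\[ \binom{k}{i_1, \dots, i_m} = \frac{k!}{i_1! \cdots i_m!} \]

is the multinomial coefficient.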

If A ⊆ E is an open set and f : A → F ⊗ T^p E* is any map, we say that f is a p-tensor field on A. Any function f : A → F is a 0-tensor field, since T^0 E* = K. We write Γ^p_{(r)}(A; F) for the vector space of all C^r p-tensor fields on A. We write Γ^p(A; F) = Γ^p_{(0)}(A; F) for the space of continuous p-tensor fields on A. If F = K, then we write Γ^p_{(r)}(A) instead of Γ^p_{(r)}(A; K).

Let {e_1, …, e_m} be a basis for E and let {e*_1, …, e*_m} be the dual basis for E*. The derivative of each e*_j is a C^∞ 1-tensor field, usually written dx_j, satisfying dx_j(x) = e*_j for all x ∈ E. Given f ∈ Γ^p(A) and g ∈ Γ^q(A) we can define f ⊗ g ∈ Γ^{p+q}(A) by (f ⊗ g)(x) = f(x) ⊗ g(x). Then any tensor field f ∈ Γ^p(A; F) can be written as

\[ f = \sum_{j_1, \dots, j_p} \omega_{j_1, \dots, j_p} \otimes dx_{j_1} \otimes \cdots \otimes dx_{j_p} \]

where ω_{j_1,…,j_p} : A → F, and it is easy to see that f is of class C^r if and only if the functions ω_{j_1,…,j_p} are of class C^r.

Using this notation, the derivative is a linear map D : Γ^p_{(r)}(A; F) → Γ^{p+1}_{(r−1)}(A; F) for r ≥ 1. When p = 0, Theorem 2.40 already provides an explicit formula for D : Γ^0_{(r)}(A; F) → Γ^1_{(r−1)}(A; F). For higher derivatives, we have the following result.


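Theorem 2.42. If ω_{j_1,…,j_p} : A → F are of class C^r (with r ≥ 1), then

\[ D \Big( \sum_{j_1, \dots, j_p} \omega_{j_1, \dots, j_p} \otimes dx_{j_1} \otimes \cdots \otimes dx_{j_p} \Big) = \sum_{j_1, \dots, j_p} \sum_{j=1}^{m} D_j \omega_{j_1, \dots, j_p}(\cdot)(e_j) \otimes dx_j \otimes dx_{j_1} \otimes \cdots \otimes dx_{j_p}, \]

where D_j ω_{j_1,…,j_p}(·)(e_j) is the map x ↦ D_j ω_{j_1,…,j_p}(x)(e_j).

Proof. Fix indices 1 ≤ j_1, …, j_p ≤ m and fix x ∈ A. Since dx_{j_1} ⊗ ⋯ ⊗ dx_{j_p} is a constant map,

\[ \begin{aligned} D(\omega_{j_1, \dots, j_p} \otimes dx_{j_1} \otimes \cdots \otimes dx_{j_p})(x)(u) &= D\omega_{j_1, \dots, j_p}(x)(u) \otimes (dx_{j_1} \otimes \cdots \otimes dx_{j_p})(x) \\ &= D\omega_{j_1, \dots, j_p}(x)(u) \otimes e^*_{j_1} \otimes \cdots \otimes e^*_{j_p} \\ &= \sum_{j=1}^{m} D_j \omega_{j_1, \dots, j_p}(x)(e_j) \otimes e^*_j(u) \, e^*_{j_1} \otimes \cdots \otimes e^*_{j_p} \end{aligned} \]

by Theorem 2.40. Therefore

\[ D(\omega_{j_1, \dots, j_p} \otimes dx_{j_1} \otimes \cdots \otimes dx_{j_p})(x) = \sum_{j=1}^{m} D_j \omega_{j_1, \dots, j_p}(x)(e_j) \otimes e^*_j \otimes e^*_{j_1} \otimes \cdots \otimes e^*_{j_p}, \]

and the result follows.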

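The dx_j notation is especially convenient for calculating higher order derivatives of a function f : R^m → R. For example, let f(x, y) = x³ + x²y² + y³. Then

\[ \begin{aligned} Df &= (3x^2 + 2xy^2) \, dx + (2x^2 y + 3y^2) \, dy; \\ D^2 f &= ((6x + 2y^2) \, dx + 4xy \, dy) \otimes dx + (4xy \, dx + (2x^2 + 6y) \, dy) \otimes dy \\ &= (6x + 2y^2) \, dx^{\otimes 2} + 4xy \, (dx \otimes dy + dy \otimes dx) + (2x^2 + 6y) \, dy^{\otimes 2}; \\ D^3 f &= 6 \, dx^{\otimes 3} + 4y (dx^{\otimes 2} \otimes dy + dx \otimes dy \otimes dx + dy \otimes dx^{\otimes 2}) \\ &\quad + 4x (dx \otimes dy^{\otimes 2} + dy \otimes dx \otimes dy + dy^{\otimes 2} \otimes dx) + 6 \, dy^{\otimes 3}. \end{aligned} \]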
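The coefficients of D²f in the example above can be cross-checked with finite differences. The sketch below is an added illustration: `second_derivative` approximates D²f(p)(u, v) by a mixed central difference, and `d2f_formula` evaluates the tensor expression for D²f on the pair (u, v), using (a ⊗ b)(u, v) = a(u)b(v). The function names, the sample point and the test vectors are assumptions made for the example.

```python
def f(x, y):
    """The example function f(x, y) = x^3 + x^2 y^2 + y^3."""
    return x**3 + x**2 * y**2 + y**3


def second_derivative(func, p, u, v, h=1e-3):
    """Mixed central difference approximating D^2 func(p)(u, v) on R^2."""
    def at(a, b):
        # Evaluate func at p + h * (a * u + b * v).
        return func(p[0] + h * (a * u[0] + b * v[0]),
                    p[1] + h * (a * u[1] + b * v[1]))
    return (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * h * h)


def d2f_formula(p, u, v):
    """Evaluate the tensor expression for D^2 f at p on the vectors (u, v)."""
    x, y = p
    return ((6 * x + 2 * y**2) * u[0] * v[0]          # dx (x) dx term
            + 4 * x * y * (u[0] * v[1] + u[1] * v[0])  # dx (x) dy + dy (x) dx term
            + (2 * x**2 + 6 * y) * u[1] * v[1])        # dy (x) dy term


p0 = (1.5, -0.5)
u0 = (1.0, 2.0)
v0 = (-1.0, 0.5)
print(second_derivative(f, p0, u0, v0), d2f_formula(p0, u0, v0))
```

Swapping `u0` and `v0` gives the same value up to rounding, in line with Theorem 2.23.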


2.4 Inverse and implicit functions

If U ⊆ E and V ⊆ F are open sets, we say that a bijection f : U → V is a C^p diffeomorphism if f and f^{−1} are both of class C^p.

Theorem 2.43 (Contraction principle). Let (X, d) be a complete metric space and let ϕ : X → X be a map satisfying

\[ d(\phi(x), \phi(y)) \le c \, d(x, y) \]

for all x, y ∈ X and some constant c < 1. Then there is exactly one x ∈ X for which ϕ(x) = x.

Proof. Choose any x_0 ∈ X and define x_{n+1} = ϕ(x_n). For all n ≥ 1 we have

\[ d(x_{n+1}, x_n) = d(\phi(x_n), \phi(x_{n-1})) \le c \, d(x_n, x_{n-1}), \]

so d(x_{n+1}, x_n) ≤ c^n d(x_1, x_0) by induction. For all m > n,

\[ \begin{aligned} d(x_n, x_m) &\le d(x_n, x_{n+1}) + \cdots + d(x_{m-1}, x_m) \\ &\le (c^n + \cdots + c^{m-1}) \, d(x_1, x_0) \\ &= \frac{c^n - c^m}{1 - c} \, d(x_1, x_0) \\ &\le c^n (1-c)^{-1} d(x_1, x_0), \end{aligned} \]

which shows that {x_n} is a Cauchy sequence. Since X is complete, x_n → x for some x ∈ X. Furthermore,

\[ x = \lim_{n \to \infty} x_{n+1} = \lim_{n \to \infty} \phi(x_n) = \phi(x) \]

since ϕ is continuous. Uniqueness follows from the fact that if x, y are both fixed points of ϕ then d(x, y) = d(ϕ(x), ϕ(y)) ≤ c d(x, y), which is only true when x = y.
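The iteration x_{n+1} = ϕ(x_n) used in the proof is easy to run. The sketch below is an added illustration under stated assumptions: it takes X = [0, 1] and ϕ = cos, which maps [0, 1] into itself and satisfies |ϕ′| ≤ sin 1 < 1 there, so the hypotheses of Theorem 2.43 hold. The function name `fixed_point`, the tolerance and the starting point are hypothetical choices.

```python
import math


def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = phi(x_n) until successive iterates differ by at most tol."""
    x = x0
    for n in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) <= tol:
            return x_next, n + 1
        x = x_next
    raise RuntimeError("iteration did not converge within max_iter steps")


# cos is a contraction on [0, 1] with constant c = sin(1) < 1.
x_star, steps = fixed_point(math.cos, 0.5)
print(x_star, steps)
```

The stopping rule is a practical stand-in for the Cauchy estimate in the proof: once successive iterates are within tol, the distance to the fixed point is at most c(1 − c)^{−1} tol.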

Theorem 2.44 (Inverse function theorem). Let A ⊆ E be an open set and let f : A → F be of class C^p (with p ≥ 1). Suppose that f′(p) is invertible for some p ∈ A. Then there is a neighborhood U ⊆ A of p such that f(U) is open and f|_U : U → f(U) is a C^p diffeomorphism.

Proof. By replacing f with f′(p)^{−1}∘f, we may assume that E = F and f′(p) = Id_E. Since f′ is continuous at p, there exists an open ball U ⊆ A around p such that |f′(x) − Id_E| < 1/2 for all x ∈ U. For y ∈ E, define the map ϕ_y(x) = x − f(x) + y.


Note that x is a fixed point of ϕ_y if and only if f(x) = y. For every y we have |ϕ′_y(x)| = |f′(x) − Id_E| < 1/2 for all x ∈ U, so by Corollary 2.18 we have

\[ |\phi_y(x_1) - \phi_y(x_2)| \le \tfrac{1}{2} |x_1 - x_2| \tag{*} \]

for all x_1, x_2 ∈ U. Using the uniqueness argument in Theorem 2.43, we conclude that f|_U : U → f(U) is a bijection.

Now let b ∈ f(U) so that b = f(a) for some a ∈ U. Let B be a closed ball with radius r > 0 around a such that B ⊆ U, and let B′ be an open ball of radius r/2 around b. We want to show that B′ ⊆ f(U), thus proving that f(U) is open. Let y ∈ B′. If x ∈ B then

\[ |\phi_y(x) - a| \le |\phi_y(x) - \phi_y(a)| + |\phi_y(a) - a| \le \tfrac{1}{2} |x - a| + |y - b| < r, \]

so ϕ_y(x) ∈ B. This together with (*) shows that ϕ_y|_B : B → B is a contraction mapping, and since B is a closed subset of a Banach space and therefore complete, we can apply Theorem 2.43 to obtain a fixed point x ∈ B of ϕ_y|_B, which implies that f(x) = y and y ∈ f(U).

For the last part of the proof, we denote f|_U by f and (f|_U)^{−1} by f^{−1} for convenience. Let y ∈ f(U) and y + k ∈ f(U) with k ≠ 0; there exist x ∈ U and x + h ∈ U with y = f(x) and y + k = f(x + h), noting that h ≠ 0. In fact we have

\[ |h - k| = |h - f(x+h) + f(x)| = |\phi_y(x+h) - \phi_y(x)| \le \tfrac{1}{2} |h| \]

from (*), so |h| ≤ 2|k|. Then h → 0 as k → 0 and

\[ \begin{aligned} \frac{\left| f^{-1}(y+k) - f^{-1}(y) - f'(x)^{-1} k \right|}{|k|} &= \frac{\left| f'(x)^{-1}(f(x+h) - f(x)) - h \right|}{|k|} \\ &\le \left| f'(x)^{-1} \right| \frac{|f(x+h) - f(x) - f'(x)h|}{|k|} \\ &\le 2 \left| f'(x)^{-1} \right| \frac{|f(x+h) - f(x) - f'(x)h|}{|h|} \\ &\to 0 \end{aligned} \]

as h → 0. This proves that

\[ (f^{-1})'(y) = f'(x)^{-1} = f'(f^{-1}(y))^{-1}, \tag{**} \]


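so f^{−1} is continuous and differentiable on f(U). Furthermore, (**) shows by induction that f^{−1} is of class C^p: if f^{−1} is of class C^k for some k < p, then (f^{−1})′ is a composition of the C^k map f^{−1}, the C^{p−1} map f′ and the map λ ↦ λ^{−1} (operator inversion), which is of class C^∞, so (f^{−1})′ is of class C^k and f^{−1} is of class C^{k+1}.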

Theorem 2.45 (Implicit function theorem). Let A ⊆ E and B ⊆ F be open sets and let f : A × B → G be of class C^p (with p ≥ 1). Suppose (a, b) ∈ A × B such that f(a, b) = 0 and D_F f(a, b) : F → G is invertible. Then there exists a neighborhood U ⊆ A of a and a C^p map g : U → B with the following properties:

1. g(a) = b.

2. f(x, g(x)) = 0 for all x ∈ U.

3. g′(a) = −[D_F f(a, b)]^{−1}∘D_E f(a, b).

Proof. Define

\[ \tilde{f} : A \times B \to E \times G, \qquad (x, y) \mapsto (x, f(x, y)) \]

and compute

\[ \tilde{f}'(a, b) = \begin{bmatrix} \mathrm{Id}_E & 0 \\ D_E f(a, b) & D_F f(a, b) \end{bmatrix}. \]

Then f̃′(a, b) is invertible, with

\[ \tilde{f}'(a, b)^{-1} = \begin{bmatrix} \mathrm{Id}_E & 0 \\ -[D_F f(a, b)]^{-1} \circ D_E f(a, b) & [D_F f(a, b)]^{-1} \end{bmatrix}. \tag{*} \]

By Theorem 2.44, there exist neighborhoods V ⊆ A × B of (a, b) and W ⊆ E × G of (a, 0) such that f̃|_V : V → W is a C^p diffeomorphism. Let U = {x ∈ E : (x, 0) ∈ W}; it is clear that U is a neighborhood of a. Define g : U → B by g = π∘(f̃|_V)^{−1}∘i where π : A × B → B is the canonical projection and i : U → W is given by i(x) = (x, 0). To complete the proof, we check the three required properties. Firstly,

\[ g(a) = \pi((\tilde{f}|_V)^{-1}(a, 0)) = \pi(a, b) = b \]

since f̃(a, b) = (a, 0). If x ∈ U then (x, 0) ∈ W, so (x, f(x, y)) = f̃(x, y) = (x, 0) for a unique y with (x, y) ∈ V, and

\[ f(x, g(x)) = f(x, \pi((\tilde{f}|_V)^{-1}(x, 0))) = f(x, \pi(x, y)) = f(x, y) = 0. \]

Lastly, g′(a) is simply the bottom left entry of (*).
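Property 3 of Theorem 2.45 can be illustrated on the unit circle. The sketch below is an added example under stated assumptions: it takes E = F = G = R, f(x, y) = x² + y² − 1 and (a, b) = (0.6, 0.8), solves f(x, y) = 0 for y near b by Newton's method in y (a stand-in for the implicit function g, not the construction used in the proof), and compares a central difference of g with −[D_F f(a, b)]^{−1} D_E f(a, b). All function names and numerical parameters are hypothetical.

```python
def f(x, y):
    """Defining function of the unit circle."""
    return x * x + y * y - 1.0


def df_dx(x, y):
    """Partial derivative of f in the first variable (D_E f)."""
    return 2.0 * x


def df_dy(x, y):
    """Partial derivative of f in the second variable (D_F f)."""
    return 2.0 * y


def implicit_g(x, y_start, iterations=50):
    """Solve f(x, y) = 0 for y near y_start using Newton's method in y."""
    y = y_start
    for _ in range(iterations):
        y -= f(x, y) / df_dy(x, y)
    return y


a, b = 0.6, 0.8
h = 1e-6
slope_numeric = (implicit_g(a + h, b) - implicit_g(a - h, b)) / (2 * h)
slope_formula = -df_dx(a, b) / df_dy(a, b)
print(slope_numeric, slope_formula)
```

Here the implicit function is g(x) = (1 − x²)^{1/2} near a, so both values should be close to −0.75.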


Theorem 2.46 (Constant rank theorem). Let E, F be finite-dimensional normed vector spaces with m = dim E and n = dim F. Let A ⊆ E be an open set and let f : A → F be of class C^p (with p ≥ 1). Suppose that f′(x) has (constant) rank r for all x ∈ A, and let p ∈ A. Then there exist

1. a neighborhood U ⊆ A of p and an open set V ⊆ F with f(U) ⊆ V,

2. open sets U′ ⊆ E and V′ ⊆ F,

3. C^p diffeomorphisms ϕ : U → U′ and ψ : V → V′,

4. decompositions E = E_1 ⊕ E_2 and F = F_1 ⊕ F_2 with dim E_1 = dim F_1 = r, and

5. an isomorphism τ : E_1 → F_1,

such that

\[ (\psi \circ f \circ \phi^{-1})|_{U' \cap E_1} = \tau|_{U' \cap E_1} \quad \text{and} \quad (\psi \circ f \circ \phi^{-1})|_{U' \cap E_2} = 0. \]

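Proof. Since $f'(p)$ has rank $r$, we can choose subspaces $E_1$ of $E$ and $F_2$ of $F$ such that $E = E_1 \oplus \ker f'(p)$ and $F = \operatorname{im} f'(p) \oplus F_2$. Then $f'(p)|_{E_1}$ is an isomorphism from $E_1$ to $\operatorname{im} f'(p)$. Let $E_2 = \ker f'(p)$, $F_1 = \operatorname{im} f'(p)$, and $\tau = f'(p)|_{E_1}$. Let $\pi_{E_i} : E \to E_i$ and $\pi_{F_i} : F \to F_i$ be the projections. Define

$$g : A \to E, \qquad (x, y) \mapsto ((\tau^{-1} \circ \pi_{F_1} \circ f)(x, y), y).$$

(Note that $x \in E_1$ and $y \in E_2$.) Then

$$g'(p) = \begin{bmatrix} \tau^{-1} \circ \pi_{F_1} \circ D_1 f(p) & \tau^{-1} \circ \pi_{F_1} \circ D_2 f(p) \\ 0 & \mathrm{Id}_{E_2} \end{bmatrix} = \begin{bmatrix} \mathrm{Id}_{E_1} & \tau^{-1} \circ \pi_{F_1} \circ D_2 f(p) \\ 0 & \mathrm{Id}_{E_2} \end{bmatrix},$$

which is invertible. By Theorem 2.44, there exist neighborhoods $U \subseteq A$ of $p$ and $V$ of $g(p)$ such that $g|_U : U \to V$ is a $C^p$ diffeomorphism. Let $\varphi = g|_U$. Since $g \circ \varphi^{-1} = \mathrm{Id}_V$, we have

$$((\tau^{-1} \circ \pi_{F_1} \circ f \circ \varphi^{-1})(x, y), (\pi_{E_2} \circ \varphi^{-1})(x, y)) = (x, y),$$

which implies that $(\pi_{E_2} \circ \varphi^{-1})(x, y) = y$ and

$$(\tau^{-1} \circ \pi_{F_1} \circ f)((\pi_{E_1} \circ \varphi^{-1})(x, y), y) = x.$$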


Therefore

$$(f \circ \varphi^{-1})(x, y) = (\tau(x), h(x, y))$$

for all $(x, y) \in V$, where $h = \pi_{F_2} \circ f \circ \varphi^{-1}$. Since

$$(f \circ \varphi^{-1})'(x, y) = \begin{bmatrix} \tau & 0 \\ D_1 h(x, y) & D_2 h(x, y) \end{bmatrix}$$

has rank $r$ and $\tau$ has rank $r$, we must have $D_2 h(x, y) = 0$ for all $(x, y) \in V$. By shrinking $V$, we may assume that $V = V_1 \times V_2$ where $V_1 \subseteq E_1$ and $V_2 \subseteq E_2$ are connected open sets. Since $V_2$ is connected, $h(x, y)$ is independent of $y$. If we choose any $q \in V_2$, then

$$(f \circ \varphi^{-1})(x, y) = (\tau(x), h(x, q))$$

for all $(x, y) \in V$. Define $\psi : \tau(V_1) \times F_2 \to F$ by $\psi(x, y) = (x, y - h(\tau^{-1}(x), q))$. This map is a diffeomorphism onto its image, since

$$\psi^{-1}(x, y) = (x, y + h(\tau^{-1}(x), q)).$$

Then

$$(\psi \circ f \circ \varphi^{-1})(x, y) = (\tau(x), 0),$$

and the result follows after shrinking the sets $U$, $V$ and $\tau(V_1) \times F_2$.

There is a generalization of Theorem 2.44 for the case when the derivative is surjective. Note that there is an analogous relationship between Theorem 1.61 and Corollary 1.130.

Theorem 2.47 (Surjective mapping theorem). Let $A \subseteq E$ be an open set and let $f : A \to F$ be continuously differentiable. Suppose that $f'(p)$ is surjective for some $p \in A$. Then there is a neighborhood $U \subseteq A$ of $p$ such that $f|_U : U \to F$ is an open map. In particular, $f(U)$ is open.

Proof. TODO

2.5 Maxima and minima

Consequences of Taylor’s theorem

Definition 2.48. Let $f : X \to \mathbb{R}$ be a map defined on a topological space $X$. If there is a neighborhood $U$ of $x \in X$ such that $f(t) \le f(x)$ for all $t \in U$, then we say that $f$ has a local maximum at $x$. Similarly, if $f(t) \ge f(x)$ for all $t \in U$ then we say that $f$ has a local minimum at $x$. If $f$ has a local maximum or local minimum at $x$, then we say that $f$ has an extreme value at $x$. If strict inequality holds for all $t \in U$ with $t \ne x$, then we say that $f$ has a strict local maximum or minimum.

In single variable calculus, a differentiable function $f : \mathbb{R} \to \mathbb{R}$ has a local maximum or minimum at a point $x \in \mathbb{R}$ only if $f'(x) = 0$. It is easy to extend this result to maps defined on Banach spaces.

Theorem 2.49. Let $A \subseteq E$ be an open set and let $f : A \to \mathbb{R}$. If $f$ is differentiable at $x \in A$ and has an extreme value at $x$, then $f'(x) = 0$.

Proof. Let $v \in E$ and let $g(t) = x + tv$. Then $f \circ g$ has an extreme value at $0$, so $0 = (f \circ g)'(0) = f'(g(0)) g'(0) = f'(x) v$. Therefore $f'(x) = 0$.

Also recall that if $f : \mathbb{R} \to \mathbb{R}$ is of class $C^2$ and there is a point $x \in \mathbb{R}$ such that $f'(x) = 0$, then $f$ has a local minimum at $x$ if $f''(x) > 0$ and a local maximum at $x$ if $f''(x) < 0$. There is a similar test for higher derivatives that follows from Taylor's theorem. Again, we can prove analogous statements for maps defined on Banach spaces.

Recall that if $q \in L(E, \dots, E; \mathbb{R})$ is a multilinear map from $E^p$ to $\mathbb{R}$, then we say that $q$ is a multilinear form.

Definition 2.50. Write $h^{(p)}$ for the $p$-tuple $(h, \dots, h)$. We say that a form $q$ is positive semidefinite if $q h^{(p)} \ge 0$ for all $h$ and positive definite if $q h^{(p)} > 0$ for all $h \ne 0$. The terms negative semidefinite and negative definite are defined similarly. If $q h^{(p)}$ takes on both positive and negative values, then we say that $q$ is indefinite.

Theorem 2.51 (Higher derivative test). Let $A \subseteq E$ be an open set and let $f : A \to \mathbb{R}$. Assume that $f$ is $(p-1)$ times continuously differentiable and that $D^p f(x)$ exists for some $p \ge 2$ and $x \in A$. Also assume that $f'(x) = \dots = f^{(p-1)}(x) = 0$ and $f^{(p)}(x) \ne 0$. Write $h^{(p)}$ for the $p$-tuple $(h, \dots, h)$.

1. If $f$ has an extreme value at $x$, then $p$ is even and the form $f^{(p)}(x)$ is semidefinite.

2. If there is a constant $c$ such that $f^{(p)}(x) h^{(p)} \ge c > 0$ for all $|h| = 1$, then $f$ has a strict local minimum at $x$ and (1) applies.

3. If there is a constant $c$ such that $f^{(p)}(x) h^{(p)} \le c < 0$ for all $|h| = 1$, then $f$ has a strict local maximum at $x$ and (1) applies.


Proof. By Corollary 2.26 and the given assumptions, we can write

$$f(x+h) - f(x) = \frac{1}{p!} f^{(p)}(x) h^{(p)} + \theta(h) |h|^p$$

where $\theta(h) \to 0$ as $h \to 0$. First assume that $f$ has an extreme value at $x$. Choose a vector $h_0 \ne 0$ such that $f^{(p)}(x) h_0^{(p)} \ne 0$. Then for sufficiently small $t \in \mathbb{R}$ we have both

$$f(x + t h_0) - f(x) = \left( \frac{1}{p!} f^{(p)}(x) h_0^{(p)} \pm \theta(t h_0) |h_0|^p \right) t^p \tag{*}$$

and

$$|\theta(t h_0)| |h_0|^p < \frac{1}{p!} \left| f^{(p)}(x) h_0^{(p)} \right|.$$

For these $t$, the parenthesized factor in (*) has the same sign as $f^{(p)}(x) h_0^{(p)}$. Since $f$ has an extreme value at $x$, the sign of (*) must remain constant for small $t \ne 0$, which cannot happen unless $p$ is even. Similarly, if $f^{(p)}(x)$ is not semidefinite then there is some vector $h_1 \ne 0$ such that $f^{(p)}(x) h_1^{(p)}$ and $f^{(p)}(x) h_0^{(p)}$ have opposite signs, which contradicts the fact that the sign of (*) is constant for small $t$.

Now suppose that the condition in (2) holds. Then

$$\begin{aligned} f(x+h) - f(x) &= \frac{1}{p!} f^{(p)}(x) h^{(p)} + \theta(h) |h|^p \\ &= \left[ \frac{1}{p!} f^{(p)}(x) \left( \frac{h}{|h|} \right)^{(p)} + \theta(h) \right] |h|^p \\ &\ge \left[ \frac{c}{p!} + \theta(h) \right] |h|^p. \end{aligned}$$

Since $\theta(h) \to 0$ as $h \to 0$, the last term is positive for sufficiently small $h \ne 0$. For these $h$ we have $f(x+h) > f(x)$, so $f$ has a strict local minimum at $x$. The proof for (3) is similar.

Corollary 2.52 (Higher derivative test, finite-dimensional case). In Theorem 2.51, further assume that $E$ is finite-dimensional. Then $h \mapsto f^{(p)}(x) h^{(p)}$ has both a minimum and maximum value on the set $\{h \in E : |h| = 1\}$, and:

1. If the form $f^{(p)}(x)$ is indefinite, then $f$ does not have an extreme value at $x$.

2. If the form $f^{(p)}(x)$ is positive definite, then $f$ has a strict local minimum at $x$.

3. If the form $f^{(p)}(x)$ is negative definite, then $f$ has a strict local maximum at $x$.


Proof. Since $E$ is finite-dimensional, the set $S = \{h \in E : |h| = 1\}$ is compact. Therefore the continuous map $h \mapsto f^{(p)}(x) h^{(p)}$ attains a minimum $c$ and a maximum $C$ on $S$. Part (1) follows directly from part (1) of Theorem 2.51. If $f^{(p)}(x)$ is positive definite then $c > 0$, so part (2) of Theorem 2.51 applies. If $f^{(p)}(x)$ is negative definite then $C < 0$, so part (3) of Theorem 2.51 applies.
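For $p = 2$ and $E = \mathbb{R}^2$ with the Euclidean norm, the minimum $c$ and maximum $C$ in the proof are the smallest and largest eigenvalues of the Hessian matrix. The sketch below applies the corollary in that special case; the function name `classify` and the returned labels are illustrative choices, not notation from the text.

```python
import math

def classify(hxx, hxy, hyy):
    """Classify a critical point of f: R^2 -> R from its Hessian entries."""
    mean = 0.5 * (hxx + hyy)
    radius = math.hypot(0.5 * (hxx - hyy), hxy)
    lo, hi = mean - radius, mean + radius   # min and max of f''(x)(h, h) on |h| = 1
    if lo > 0:
        return "strict local minimum"       # positive definite
    if hi < 0:
        return "strict local maximum"       # negative definite
    if lo < 0 < hi:
        return "no extreme value"           # indefinite
    return "inconclusive"                   # semidefinite but not definite
```

For example, `classify(2, 0, -2)` describes the saddle of $x^2 - y^2$ at the origin. The semidefinite case is reported as inconclusive because the corollary makes no claim there.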

Convexity

Definition 2.53. Let $A \subseteq E$ be a convex open set and let $f : A \to \mathbb{R}$. We say that $f$ is convex if

f(x+ λ(y − x)) ≤ f(x) + λ(f(y)− f(x))

for all x, y ∈ A and λ ∈ (0, 1).

Recall that a twice differentiable function $f : (a, b) \to \mathbb{R}$ is convex if and only if $f''(x) \ge 0$ for all $x \in (a, b)$.

Theorem 2.54. Let $A \subseteq E$ be a convex open set and let $f : A \to \mathbb{R}$. Suppose that $f''$ exists on $A$. Then $f$ is convex if and only if $f''(x)$ is positive semidefinite for all $x \in A$.

Proof. Let $x, y \in A$ and define $\gamma : (-\varepsilon, 1 + \varepsilon) \to A$ by $\gamma(\lambda) = x + \lambda(y - x)$, where $\varepsilon > 0$ is chosen to be small enough. Then $D\gamma(\lambda)(1) = y - x$ and $D^2\gamma(\lambda)(1, 1) = 0$. By Theorem 2.29, we have

$$(f \circ \gamma)''(\lambda) = D^2(f \circ \gamma)(\lambda)(1, 1) = f''(\gamma(\lambda))(y - x, y - x). \tag{*}$$

If $f''(z)$ is positive semidefinite for all $z \in A$ then $(f \circ \gamma)''(\lambda) \ge 0$ for all $\lambda$, so $f \circ \gamma$ is convex. Then for all $\lambda \in (0, 1)$,

$$(f \circ \gamma)(0 + \lambda(1 - 0)) \le (f \circ \gamma)(0) + \lambda[(f \circ \gamma)(1) - (f \circ \gamma)(0)],$$

that is,

$$f(x + \lambda(y - x)) \le f(x) + \lambda(f(y) - f(x)),$$

which proves that $f$ is convex. Conversely, suppose that $f$ is convex. Let $z \in A$ and choose $r > 0$ so that the open ball of radius $r$ around $z$ is contained in $A$. Let $|u| < r$ and set $x = z - u/2$ and $y = z + u/2$ in (*) so that

$$f''(z)(u, u) = f''(\gamma(1/2))(y - x, y - x) = (f \circ \gamma)''(1/2) \ge 0$$

since $f \circ \gamma$ is convex. This shows that $f''(z)$ is positive semidefinite for all $z \in A$.


Suppose $E$ is finite-dimensional, $A \subseteq E$ is a convex open set and $f : A \to \mathbb{R}$. Further suppose that $f''(x)$ is positive semidefinite for all $x \in A$. Then combining Theorem 2.54 with Corollary 2.52, we see that $f$ has a strict global minimum at any $x$ such that $f'(x) = 0$ and $f''(x)$ is positive definite.

Lagrange multipliers

The method of Lagrange multipliers provides a necessary condition for a function $f : A \to \mathbb{R}$ to be maximized or minimized subject to a constraint expressed as a function $g : A \to \mathbb{R}$.

Lemma 2.55. Let $f \in E^*$ and $g \in L(E, F)$. If $\operatorname{im} g$ is closed and $\ker g \subseteq \ker f$, then $f = \lambda \circ g$ for some $\lambda \in F^*$.

Proof. We have $f \in (\ker g)^\perp$, so Theorem 1.79 shows that $f \in \operatorname{im} g^*$. That is, $f = g^* \lambda = \lambda \circ g$ for some $\lambda \in F^*$.

Theorem 2.56 (Method of Lagrange multipliers). Let $A \subseteq E$ be an open set. Let $f : A \to \mathbb{R}$ and $g : A \to \mathbb{R}$ be of class $C^1$, and let $S = g^{-1}(0)$. If $f|_S$ has an extreme value at $x \in S$, $g'(x)$ is surjective, and $\ker g'(x)$ splits $E$, then there is a continuous linear map $\lambda : \mathbb{R} \to \mathbb{R}$ such that $f'(x) = \lambda \circ g'(x)$.

Proof. If we can prove that $\ker g'(x) \subseteq \ker f'(x)$ then the result follows from Lemma 2.55. Let $F = \ker g'(x)$ and choose a closed subspace $G \subseteq E$ such that $E = F \oplus G$. Write $x = (x_1, x_2)$ where $x_1 \in F$ and $x_2 \in G$, and choose neighborhoods $B$ of $x_1$ and $C$ of $x_2$ such that $B \times C \subseteq A$. Since $g'(x)$ is surjective and $D_F g(x) = 0$, $D_G g(x)$ must be invertible; we also have $g(x_1, x_2) = 0$. By Theorem 2.45, there exists a neighborhood $U \subseteq B$ of $x_1$ and a $C^1$ map $h : U \to C$ such that $h(x_1) = x_2$ and $g(t, h(t)) = 0$ for all $t \in U$. Let $k : U \to A$ be given by $t \mapsto (t, h(t))$, so that $k(U) \subseteq S$. Since $h'(x_1) = -[D_G g(x)]^{-1} \circ D_F g(x) = 0$, we have $k'(x_1) v = v$ for all $v \in F$. Since $f|_S$ has an extreme value at $x$ we have $(f \circ k)'(x_1) = 0$ by Theorem 2.49, so $f'(x) \circ k'(x_1) = 0$ by the chain rule. In particular, if $v \in \ker g'(x) = F$ then $f'(x) v = f'(x) k'(x_1) v = 0$, so $v \in \ker f'(x)$.
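The conclusion $f'(x) = \lambda \circ g'(x)$ can be checked numerically on a small example. The sketch below uses the hypothetical choices $f(x, y) = x + y$ and $g(x, y) = x^2 + y^2 - 1$ (not from the text), locates the constrained maximum by sampling the unit circle, and then measures how far $f'$ is from a multiple of $g'$ at that point.

```python
import math

def grad_f(x, y):
    return (1.0, 1.0)              # f(x, y) = x + y

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)      # g(x, y) = x^2 + y^2 - 1

# Locate the maximum of f on S = g^{-1}(0) by sampling the unit circle.
n = 1000
angles = [2.0 * math.pi * i / n for i in range(n)]
theta = max(angles, key=lambda s: math.cos(s) + math.sin(s))
x, y = math.cos(theta), math.sin(theta)

fx, fy = grad_f(x, y)
gx, gy = grad_g(x, y)
lam = (fx * gx + fy * gy) / (gx * gx + gy * gy)      # best multiplier in the least-squares sense
residual = math.hypot(fx - lam * gx, fy - lam * gy)  # |f'(x) - lam g'(x)|
```

The maximum lies at $(1/\sqrt{2}, 1/\sqrt{2})$, where the multiplier is $1/\sqrt{2}$ and the residual vanishes up to rounding.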

2.6 Special functions

Uniform convergence and differentiation

Theorem 2.57. Let $U \subseteq E$ be an open set. Let $f_n$ be a sequence of $C^1$ maps from $U$ to $F$ such that $f_n \to f$ pointwise. If $f_n'$ converges uniformly to a map $g : U \to L(E, F)$, then $f$ is differentiable and $f' = g$.


Proof. We write $\|\cdot\|$ for the sup norm. Let $x \in U$ and let $B \subseteq U$ be an open ball around $x$. By Corollary 2.18, we have

$$|f_m(x+h) - f_n(x+h) - [f_m(x) - f_n(x)]| \le |h| \sup_{t \in B} |f_m'(t) - f_n'(t)| \tag{*}$$

for all $h$ such that $x + h \in B$. Let $\varepsilon > 0$ and choose $N$ such that $\|f_m' - f_n'\| < \varepsilon$ and $\|f_n' - g\| < \varepsilon$ for all $m, n \ge N$. Taking $m \to \infty$ in (*) shows that

$$|f(x+h) - f_n(x+h) - [f(x) - f_n(x)]| \le |h| \varepsilon.$$

Fix some $n \ge N$. Since $f_n'$ is continuous at $x$, applying Corollary 2.18 to the map $y \mapsto f_n(y) - f_n'(x) y$ gives

$$|f_n(x+h) - f_n(x) - f_n'(x) h| \le |h| \varepsilon$$

for all sufficiently small $h$. Then, for these $h$,

$$\begin{aligned} |f(x+h) - f(x) - g(x) h| &\le |f(x+h) - f_n(x+h) - [f(x) - f_n(x)]| \\ &\quad + |f_n(x+h) - f_n(x) - f_n'(x) h| + |f_n'(x) h - g(x) h| \\ &\le 3 |h| \varepsilon. \end{aligned}$$

Uniform convergence and power series

Recall that a power series in a unital Banach algebra $A$ is a series of the form

$$f(x) = \sum_{n=0}^{\infty} c_n x^n$$

where $x \in A$, the $c_n$ are values in some normed vector space $E$, and we are given a continuous bilinear map $\cdot : E \times A \to F$. In order to compute the derivative of $f$, we first need a generalization of the power rule from single variable calculus.

Theorem 2.58 (Power rule). Let $E$ be a unital Banach algebra, let $n \ge 0$, and let $p_n : E \to E$ be the map defined by $p_n(x) = x^n$. Then $Dp_n(x)$ is the linear map given by

$$u \mapsto \sum_{k=0}^{n-1} x^k u x^{n-k-1}.$$

In particular, if $E$ is commutative then $Dp_n(x)$ is given by

$$u \mapsto n x^{n-1} u.$$


Proof. We use induction on $n$. The case $n = 0$ is clear, so suppose that the result holds for $n - 1$. Since $p_n(x) = x p_{n-1}(x)$, the product rule shows that $Dp_n(x)$ maps $u$ to

$$\begin{aligned} u p_{n-1}(x) + x Dp_{n-1}(x) u &= u x^{n-1} + x \sum_{k=0}^{n-2} x^k u x^{n-k-2} \\ &= u x^{n-1} + \sum_{k=1}^{n-1} x^k u x^{n-k-1} \\ &= \sum_{k=0}^{n-1} x^k u x^{n-k-1}. \end{aligned}$$
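The power rule can be tested in the noncommutative algebra of $2 \times 2$ real matrices. The sketch below compares the sum $\sum_k x^k u x^{n-k-1}$ with a central finite difference of $p_n$; the helper names (`matmul`, `d_power`, `finite_difference`) and the sample matrices are illustrative assumptions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matadd(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def power(x, n):
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, x)
    return result

def d_power(x, u, n):
    # D p_n(x) u = sum_{k=0}^{n-1} x^k u x^{n-k-1}
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        total = matadd(total, matmul(matmul(power(x, k), u), power(x, n - k - 1)))
    return total

def finite_difference(x, u, n, eps=1e-5):
    # Central difference (p_n(x + eps u) - p_n(x - eps u)) / (2 eps).
    plus = power(matadd(x, [[eps * v for v in row] for row in u]), n)
    minus = power(matadd(x, [[-eps * v for v in row] for row in u]), n)
    return [[(plus[i][j] - minus[i][j]) / (2.0 * eps) for j in range(2)] for i in range(2)]

x = [[1.0, 2.0], [0.0, 1.0]]
u = [[0.0, 1.0], [1.0, 0.0]]     # x and u do not commute
exact = d_power(x, u, 3)
approx = finite_difference(x, u, 3)
error = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
```

Because `x` and `u` do not commute, the commutative formula $n x^{n-1} u$ gives a different matrix here, which is why the general statement keeps $u$ between the powers of $x$.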

Theorem 2.59. Let

$$f(x) = \sum_{n=0}^{\infty} c_n x^n$$

be a power series with radius of convergence $R$. For all $\varepsilon > 0$, the series converges uniformly on $\{x \in E : |x| \le R - \varepsilon\}$. Furthermore, $f$ is differentiable on $\{x \in E : |x| < R\}$ with $f'(x)$ given by

$$u \mapsto \sum_{n=1}^{\infty} c_n \sum_{k=0}^{n-1} x^k u x^{n-k-1}.$$

In particular, if $E$ is commutative then $f'(x)$ is given by

$$u \mapsto \left( \sum_{n=1}^{\infty} n c_n x^{n-1} \right) u.$$

Proof. See Theorem 1.137 for the first statement. Then apply Theorem 2.58 and Theorem 2.57 for each $\varepsilon > 0$.

The exponential function

Definition 2.60. Let $E$ be a unital Banach algebra over $\mathbb{K}$. If $x \in E$, the exponential of $x$ is

$$\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!},$$

which converges absolutely for all $x$. Thus we have a map $\exp : E \to E$, called the exponential function. We will write $1$ instead of $e$ for the unit in $E$, to avoid confusion with the notation $e^x$.

Theorem 2.61. Let $x, y \in E$.

1. $\exp(0) = 1$, where $1$ is the unit in $E$.

2. If $xy = yx$ then $\exp(x + y) = \exp(x) \exp(y) = \exp(y) \exp(x)$. In particular, $\exp(x)$ is always invertible.

3. If $y$ is invertible then $\exp(y x y^{-1}) = y \exp(x) y^{-1}$.

Proof. These identities follow directly from the power series.
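The commutativity hypothesis in part (2) matters. The following sketch, written for $2 \times 2$ real matrices with a truncated series (the function names and the truncation at 40 terms are assumptions of the sketch), compares $\exp(x + y)$ with $\exp(x)\exp(y)$ for a commuting pair and for a non-commuting pair.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_exp(a, terms=40):
    # Partial sum of the series sum_n a^n / n!.
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def gap(a, b):
    # Largest entry of exp(a + b) - exp(a) exp(b) in absolute value.
    s = [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]
    lhs, rhs = mat_exp(s), matmul(mat_exp(a), mat_exp(b))
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

x = [[0.0, 1.0], [0.0, 0.0]]
commuting_gap = gap(x, [[0.0, 2.0], [0.0, 0.0]])      # x and 2x commute
noncommuting_gap = gap(x, [[0.0, 0.0], [1.0, 0.0]])   # here xy != yx
```

The first gap is zero up to rounding, while the second is of order one, so the identity genuinely fails without $xy = yx$.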

Theorem 2.62.

1. We have

$$D\exp(x) u = \sum_{n=1}^{\infty} \frac{1}{n!} \sum_{k=0}^{n-1} x^k u x^{n-k-1} = \int_0^1 \exp(sx) u \exp((1-s)x) \, ds.$$

If $xu = ux$, then

$$D\exp(x) u = \exp(x) u = u \exp(x).$$

2. If $\gamma : \mathbb{K} \to E$ is of class $C^1$, then

$$(\exp \circ \gamma)'(r) = \int_0^1 \exp(s\gamma(r)) \gamma'(r) \exp((1-s)\gamma(r)) \, ds.$$

(Here we are identifying $\gamma'(r)$ with $\gamma'(r)(1)$.) In particular,

$$\frac{d}{dr} \exp(rx) = \exp(rx) x = x \exp(rx).$$

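Proof. The first equality in (1) follows from Theorem 2.59. To prove the second equality, first note that

$$\int_0^1 s^m (1-s)^n \, ds = \frac{m! \, n!}{(m+n+1)!}.$$

Therefore

$$\begin{aligned} \int_0^1 \exp(sx) u \exp((1-s)x) \, ds &= \int_0^1 \sum_{m=0}^{\infty} \frac{s^m x^m}{m!} u \sum_{n=0}^{\infty} \frac{(1-s)^n x^n}{n!} \, ds \\ &= \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} \frac{x^m u x^n}{m! \, n!} \int_0^1 s^m (1-s)^n \, ds \\ &= \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} \frac{x^m u x^n}{(m+n+1)!}, \end{aligned}$$

which is clearly equal to

$$\sum_{n=1}^{\infty} \sum_{k=0}^{n-1} \frac{x^k u x^{n-k-1}}{n!}.$$

For (2), applying the chain rule gives

$$(\exp \circ \gamma)'(r) = D\exp(\gamma(r)) \gamma'(r) = \int_0^1 \exp(s\gamma(r)) \gamma'(r) \exp((1-s)\gamma(r)) \, ds.$$

In particular,

$$\begin{aligned} \frac{d}{dr} \exp(rx) &= \int_0^1 \exp(srx) x \exp((1-s)rx) \, ds \\ &= \int_0^1 \exp(rx) x \, ds \\ &= \exp(rx) x = x \exp(rx). \end{aligned}$$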
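The integral identity $\int_0^1 s^m (1-s)^n \, ds = m! \, n! / (m+n+1)!$ used in the proof can be confirmed exactly for small exponents. The sketch below expands $(1-s)^n$ with the binomial theorem and integrates term by term using rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def integral(m, n):
    # Expand (1 - s)^n by the binomial theorem and integrate s^(m+k) termwise over [0, 1].
    return sum(Fraction((-1) ** k * comb(n, k), m + k + 1) for k in range(n + 1))

def closed_form(m, n):
    return Fraction(factorial(m) * factorial(n), factorial(m + n + 1))

checked = all(integral(m, n) == closed_form(m, n) for m in range(8) for n in range(8))
```

This only verifies finitely many cases; the general identity follows by induction on $n$ using integration by parts.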

Theorem 2.63 (Linear systems of differential equations). Let $V$ be a Banach space, let $U$ be a connected open subset of $\mathbb{R}$, let $f : U \to V$ be differentiable and suppose that $f'(t) = A(t) f(t) + b(t)$ for all $t \in U$, where $A : U \to L(V)$ and $b : U \to V$ are continuous. Assume that $A(s) A(t) = A(t) A(s)$ for all $s, t \in U$. Choose any $a \in U$. Then there exists some $c \in V$ such that

$$f(t) = \exp(\mathcal{A}(t)) \left( c + \int_a^t \exp(-\mathcal{A}(s)) b(s) \, ds \right),$$

where

$$\mathcal{A}(t) = \int_a^t A(s) \, ds.$$


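Proof. Write $\mathcal{A}(t) = \int_a^t A(s) \, ds$ and let

$$g(t) = \exp(-\mathcal{A}(t)) f(t) - \int_a^t \exp(-\mathcal{A}(s)) b(s) \, ds.$$

It is easy to verify that $A(t) = \mathcal{A}'(t)$ commutes with $\mathcal{A}(t)$ for every $t \in U$, so

$$\begin{aligned} g'(t) &= \exp(-\mathcal{A}(t)) (-\mathcal{A}'(t)) f(t) + \exp(-\mathcal{A}(t)) f'(t) - \exp(-\mathcal{A}(t)) b(t) \\ &= -\exp(-\mathcal{A}(t)) A(t) f(t) + \exp(-\mathcal{A}(t)) (A(t) f(t) + b(t)) - \exp(-\mathcal{A}(t)) b(t) \\ &= 0 \end{aligned}$$

and $g$ is constant by Corollary 2.19.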

Corollary 2.64 (Linear systems of differential equations with constant coefficients). Let $V$ be a Banach space, let $U$ be a connected open subset of $\mathbb{R}$, let $f : U \to V$ be differentiable and suppose that $f'(t) = A f(t) + b(t)$ for all $t \in U$, where $A \in L(V)$ and $b : U \to V$ is continuous. Choose any $a \in U$. Then there exists some $c \in V$ such that

$$f(t) = \exp(tA) \left( c + \int_a^t \exp(-sA) b(s) \, ds \right).$$

In particular, if $b = 0$ then

$$f(t) = \exp(tA) c.$$
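As a concrete instance of the homogeneous case, take $V = \mathbb{R}^2$ and let $A$ be the generator of rotations. The sketch below evaluates $\exp(tA)c$ with a truncated series and compares it with the known solution $(\cos t, \sin t)$; the matrix, the initial vector and the helper names are assumptions made for the example.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_exp(a, terms=40):
    # Partial sum of the series sum_n a^n / n!.
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def solution(t, A, c):
    # f(t) = exp(tA) c, the b = 0 case of Corollary 2.64.
    E = mat_exp([[t * v for v in row] for row in A])
    return [E[0][0] * c[0] + E[0][1] * c[1], E[1][0] * c[0] + E[1][1] * c[1]]

A = [[0.0, -1.0], [1.0, 0.0]]    # f' = A f describes rotation in the plane
c = [1.0, 0.0]
t = 0.7
f_t = solution(t, A, c)           # should be close to (cos t, sin t)
```

A finite-difference derivative of `solution` at `t` can also be compared with `A` applied to `f_t` to confirm that the differential equation holds.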

Definition 2.65. Let $E$ be a unital commutative Banach algebra over $\mathbb{K}$. We define the logarithm of an element $x$ by

$$\log(x) = -\sum_{n=1}^{\infty} \frac{(1 - x)^n}{n},$$

which converges absolutely for all $|1 - x| < 1$. Thus we have a map $\log : B_1(1) \to E$, where $B_1(1)$ is the open ball of radius $1$ around the unit in $E$.

Theorem 2.66.

1. $\log(1) = 0$.

2. $D\log(x) u = x^{-1} u = u x^{-1}$.

3. If $x \in B_1(1)$ we have $\exp(\log(x)) = x$.

4. If $|x| < \log 2$ we have $\log(\exp(x)) = x$.

5. If $x, y \in B_1(1)$ with $xy \in B_1(1)$, we have $\log(xy) = \log(x) + \log(y)$.


Proof. (2) follows directly from Theorem 2.59. For (3), let $f(x) = x \exp(-\log(x))$. Then

$$Df(x) u = u \exp(-\log(x)) - x \exp(-\log(x)) x^{-1} u = 0$$

for all $u \in E$, so $f$ is constant on the connected set $B_1(1)$. Since $f(1) = 1$, we have

$$\exp(\log(x)) = x (x \exp(-\log(x)))^{-1} = x.$$

For (4), let $f(x) = \log(\exp(x)) - x$. This function is defined on $B_{\log 2}(0)$ since

$$|1 - \exp(x)| \le \sum_{n=1}^{\infty} \frac{|x|^n}{n!} = \exp(|x|) - 1 < 1.$$

We have

$$Df(x) u = \exp(x)^{-1} \exp(x) u - u = 0$$

for all $u \in E$ and $f(0) = 0$, so $\log(\exp(x)) = x$. For (5), let $f(x) = \log(xy) - \log(x) - \log(y)$. Then

$$Df(x) u = u y (xy)^{-1} - u x^{-1} = 0$$

for all $u \in E$ and $f(1) = 0$, so $\log(xy) = \log(x) + \log(y)$.
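In the simplest case $E = \mathbb{R}$, the series definition and parts (3) and (5) can be checked numerically. The sketch below truncates the series at 200 terms, which is an arbitrary choice that is more than enough for the sample points used.

```python
import math

def log_series(x, terms=200):
    # Partial sum of log(x) = -sum_{n>=1} (1 - x)^n / n, valid for |1 - x| < 1.
    return -sum((1.0 - x) ** n / n for n in range(1, terms + 1))

inverse_check = math.exp(log_series(0.5))                             # part (3): exp(log(x)) = x
additive_gap = log_series(1.2 * 1.1) - log_series(1.2) - log_series(1.1)  # part (5)
```

Both sample products stay inside $B_1(1)$, as part (5) requires.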

2.7 Line integrals

Let $E, F$ be Banach spaces over $\mathbb{K}$ and let $U \subseteq E$ be an open set. We define a form on $U$ to be a continuous map $\omega : U \to L(E, F)$. A path in $U$ is a continuous map $\gamma : [a, b] \to U$. If $\gamma(a) = \gamma(b)$ then we say that $\gamma$ is a closed path. We say that a continuous map $\gamma : [a, b] \to U$ is a curve (or piecewise $C^1$) if there is a partition $a = a_0 < \dots < a_k = b$ of $[a, b]$ such that $\gamma|_{[a_{i-1}, a_i]}$ is a continuously differentiable path for each $i$. If $\gamma(a) = \gamma(b)$ then we say that $\gamma$ is a closed curve.

If $\gamma : [a, b] \to U$ is a $C^1$ path and $\omega$ is a form on $U$, we define the (line) integral of $\omega$ along $\gamma$ by

$$\int_\gamma \omega = \int_\gamma \omega(x) \, dx = \int_a^b \omega(\gamma(t)) \gamma'(t) \, dt.$$

(Note that $\omega(\gamma(t))$ is a linear map from $E$ to $F$.) If $\gamma$ is a curve with partition $a_0, \dots, a_k$, we define the integral of $\omega$ along $\gamma$ by

$$\int_\gamma \omega = \int_\gamma \omega(x) \, dx = \sum_{i=1}^{k} \int_{\gamma|_{[a_{i-1}, a_i]}} \omega.$$


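Theorem 2.67 (Properties of line integrals). Let $\gamma : [a, b] \to U$ be a curve and let $\omega, \eta$ be forms on $U$.

1. For any $a, b \in \mathbb{K}$,

$$\int_\gamma (a\omega + b\eta) = a \int_\gamma \omega + b \int_\gamma \eta.$$

2. For any Banach space $G$ and any continuous linear map $f : F \to G$,

$$\int_\gamma f\omega = f\left( \int_\gamma \omega \right),$$

where $f\omega : U \to L(E, G)$ is the form defined by $(f\omega)(x) u = f(\omega(x) u)$.

3. If $\gamma$ is constant then

$$\int_\gamma \omega = 0.$$

4. If $\gamma_1 = \gamma|_{[a, c]}$ and $\gamma_2 = \gamma|_{[c, b]}$ where $a < c < b$ then

$$\int_\gamma \omega = \int_{\gamma_1} \omega + \int_{\gamma_2} \omega.$$

5. If $\varphi : [c, d] \to [a, b]$ is a continuously differentiable function with $\varphi(c) = a$ and $\varphi(d) = b$ then

$$\int_{\gamma \circ \varphi} \omega = \int_\gamma \omega.$$

If $\varphi(c) = b$ and $\varphi(d) = a$ (i.e. $\varphi$ reverses the orientation) then

$$\int_{\gamma \circ \varphi} \omega = -\int_\gamma \omega.$$

6. We have

$$\left| \int_\gamma \omega \right| \le L(\gamma) \sup_{t \in [a, b]} |\omega(\gamma(t))|,$$

where $L(\gamma)$ is the length of $\gamma$ defined by

$$L(\gamma) = \sum_{i=1}^{k} \int_{a_{i-1}}^{a_i} |\gamma'(t)| \, dt.$$

7. If $\omega_n$ is a sequence of forms on $U$ converging compactly (i.e. uniformly on compact sets) to a form $\omega$, then

$$\int_\gamma \omega = \lim_{n \to \infty} \int_\gamma \omega_n.$$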

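Proof. By linearity, we can assume that $\gamma$ is a $C^1$ path. For (2), we have

$$\int_\gamma f\omega = \int_a^b f(\omega(\gamma(t)) \gamma'(t)) \, dt = f\left( \int_a^b \omega(\gamma(t)) \gamma'(t) \, dt \right) = f\left( \int_\gamma \omega \right)$$

by Theorem 1.138. For (5), we have

$$\begin{aligned} \int_{\gamma \circ \varphi} \omega &= \int_c^d \omega((\gamma \circ \varphi)(t)) (\gamma \circ \varphi)'(t) \, dt \\ &= \int_c^d \omega(\gamma(\varphi(t))) \gamma'(\varphi(t)) \varphi'(t) \, dt \\ &= \int_a^b \omega(\gamma(u)) \gamma'(u) \, du \\ &= \int_\gamma \omega \end{aligned}$$

by Theorem 2.14. For (6), we have

$$\left| \int_\gamma \omega \right| = \left| \int_a^b \omega(\gamma(t)) \gamma'(t) \, dt \right| \le \int_a^b |\omega(\gamma(t))| |\gamma'(t)| \, dt \le L(\gamma) \sup_{t \in [a, b]} |\omega(\gamma(t))|.$$

For (7), we have

$$\left| \int_\gamma \omega_n - \int_\gamma \omega \right| = \left| \int_\gamma (\omega_n - \omega) \right| \le L(\gamma) \sup_{t \in [a, b]} |(\omega_n - \omega)(\gamma(t))| \to 0$$

as $n \to \infty$, since $\gamma([a, b])$ is compact.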


Exact, conservative and closed forms

We have an important generalization of the fundamental theorem of calculus to line integrals.

Theorem 2.68 (Fundamental theorem for line integrals). Let $f : U \to F$ be continuously differentiable and let $\gamma : [a, b] \to U$ be a curve. Then

$$\int_\gamma Df = f(\gamma(b)) - f(\gamma(a)).$$

Proof. First assume that $\gamma$ is a $C^1$ path. Then

$$\int_\gamma Df = \int_a^b Df(\gamma(t)) \gamma'(t) \, dt = \int_a^b (f \circ \gamma)'(t) \, dt = f(\gamma(b)) - f(\gamma(a))$$

by the fundamental theorem of calculus. If $\gamma$ is a curve with partition $a_0, \dots, a_k$ then

$$\int_\gamma Df = \sum_{i=1}^{k} [f(\gamma(a_i)) - f(\gamma(a_{i-1}))] = f(\gamma(b)) - f(\gamma(a)).$$
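The theorem can be illustrated numerically in $\mathbb{R}^2$. The sketch below approximates $\int_a^b \omega(\gamma(t)) \gamma'(t) \, dt$ with a midpoint rule and applies it to $\omega = Df$ for the hypothetical choices $f(x, y) = x^2 y$ and $\gamma(t) = (1 + t, t^2)$ on $[0, 1]$; none of these names or choices come from the text.

```python
def line_integral(omega, gamma, dgamma, a, b, n=2000):
    # Midpoint rule for the integral of omega(gamma(t)) gamma'(t) over [a, b].
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        wx, wy = omega(*gamma(t))     # omega(gamma(t)) as a row vector
        vx, vy = dgamma(t)            # gamma'(t)
        total += (wx * vx + wy * vy) * h
    return total

f = lambda x, y: x * x * y
Df = lambda x, y: (2.0 * x * y, x * x)       # derivative of f as a form
gamma = lambda t: (1.0 + t, t * t)
dgamma = lambda t: (1.0, 2.0 * t)

integral = line_integral(Df, gamma, dgamma, 0.0, 1.0)
endpoint_difference = f(*gamma(1.0)) - f(*gamma(0.0))   # f(2, 1) - f(1, 0) = 4
```

The two quantities agree up to the discretization error of the midpoint rule.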

Note that in particular we have

$$\int_\gamma Df = 0$$

for any closed curve $\gamma$ in $U$. If $\omega$ is a form on $U$, a function $f : U \to F$ satisfying $\omega = Df$ is called a potential for $\omega$. We say that a form $\omega$ is exact if it has a potential function. Note that if $U$ is connected then Corollary 2.19 implies that $f - g$ is constant if $f, g$ are any two potentials for $\omega$. If the integral of $\omega$ along any closed curve is zero, then we say that $\omega$ is conservative. It is easy to see that a form is conservative if and only if it is path-independent, in the sense that

$$\int_\gamma \omega = \int_{\tilde{\gamma}} \omega$$

whenever $\gamma, \tilde{\gamma}$ are curves with the same starting and ending points.

Theorem 2.69. A form is conservative if and only if it is exact.


Proof. Theorem 2.68 shows that every exact form is conservative, so it remains to show that every conservative form is exact. Let $\omega$ be a conservative form on $U$. We can assume that $U$ is connected, for otherwise we can obtain a potential function $f_\alpha$ on each component $U_\alpha$ of $U$ and define a potential $f : U \to F$ for $\omega$ by setting $f|_{U_\alpha} = f_\alpha$. Since $\omega$ is path-independent, for any two points $x, y \in U$ we can define

$$\int_x^y \omega = \int_\gamma \omega$$

where $\gamma$ is any curve from $x$ to $y$. Choose some $x_0 \in U$ and let

$$f(x) = \int_{x_0}^{x} \omega;$$

we want to show that $\omega = Df$. Let $x \in U$ and choose $r > 0$ so that the open ball of radius $r$ around $x$ is contained in $U$. For all $|h| < r$ the straight line from $x$ to $x + h$ is contained in $U$, so

$$\begin{aligned} \frac{1}{|h|} |f(x+h) - f(x) - \omega(x) h| &= \frac{1}{|h|} \left| \int_{x_0}^{x+h} \omega - \int_{x_0}^{x} \omega - \omega(x) h \right| \\ &= \frac{1}{|h|} \left| \int_{x}^{x+h} \omega - \omega(x) h \right| \\ &= \frac{1}{|h|} \left| \int_0^1 \omega(x + th) h \, dt - \int_0^1 \omega(x) h \, dt \right| \\ &= \frac{1}{|h|} \left| \left( \int_0^1 [\omega(x + th) - \omega(x)] \, dt \right) h \right| \\ &\le \sup_{t \in [0, 1]} |\omega(x + th) - \omega(x)| \\ &\to 0 \end{aligned}$$

as $h \to 0$ since $\omega$ is continuous.

If $\omega$ is a differentiable form on $U$, we say that $\omega$ is closed if $D\omega(x) \in L(E, E; F)$ is symmetric for every $x \in U$. If $\omega = Df$ for some $C^2$ map $f : U \to F$ then $D\omega(x) = D^2 f(x)$ is always symmetric by Theorem 2.23, so we have the following result:

Theorem 2.70. Every exact C1 form is closed.

The converse of Theorem 2.70 holds for certain kinds of sets:


Theorem 2.71 (Poincaré lemma). Let $U \subseteq E$ be a star-shaped open set. Every closed $C^1$ form on $U$ is exact.

Proof. Suppose that $U$ is star-shaped with respect to some $x_0 \in U$, i.e. the line segment from $x_0$ to any $x \in U$ is contained in $U$. By translating $U$, we can assume that $x_0 = 0$. Let $\omega$ be a closed $C^1$ form on $U$. Define

$$f(x) = \int_0^1 \omega(tx) x \, dt,$$

which is simply the integral of $\omega$ along the straight line segment from $0$ to $x$. By Theorem 2.37,

$$\begin{aligned} Df(x) u &= \int_0^1 [t D\omega(tx)(u, x) + \omega(tx) u] \, dt \\ &= \int_0^1 [t D\omega(tx)(x, u) + \omega(tx) u] \, dt \\ &= \left( \int_0^1 [t D\omega(tx) x + \omega(tx)] \, dt \right) u \\ &= \left( \int_0^1 \frac{d}{dt} (t \omega(tx)) \, dt \right) u \\ &= \omega(x) u. \end{aligned}$$
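The formula $f(x) = \int_0^1 \omega(tx) x \, dt$ from the proof can be evaluated numerically. The sketch below does so on $\mathbb{R}^2$ (star-shaped with respect to $0$) for the closed form $\omega(x, y) = (2xy, x^2)$, a hypothetical example chosen because its potential $x^2 y$ is known in closed form.

```python
def potential(omega, x, y, n=1000):
    # Midpoint rule for f(x, y) = integral over [0, 1] of omega(t x, t y) applied to (x, y).
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        wx, wy = omega(t * x, t * y)
        total += (wx * x + wy * y) * h
    return total

omega = lambda x, y: (2.0 * x * y, x * x)    # closed: d/dy (2xy) = 2x = d/dx (x^2)
value = potential(omega, 1.5, -2.0)           # should be close to 1.5^2 * (-2) = -4.5
```

Any other potential differs from this one by a constant, since $\mathbb{R}^2$ is connected.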

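Lemma 2.72. Let $R = [a, b] \times [c, d]$ be a rectangle and let $\omega$ be a closed (differentiable) form defined on an open subset of $\mathbb{R}^2$ containing $R$. Then

$$\int_{\partial R} \omega = 0,$$

where the integral is taken counterclockwise along the boundary of $R$.

Proof. First note that for each $x \in R$, the linear map $D\omega(x)$, regarded as the form $y \mapsto D\omega(x) y$ on $\mathbb{R}^2$, has derivative $D\omega(x)$ at every $y \in \mathbb{R}^2$, and this derivative is symmetric. Therefore Theorem 2.71 shows that $D\omega(x)$ is exact for all $x \in R$. Decompose $R$ into the four rectangles

$$\begin{aligned} R_1 &= [a, \tfrac{a+b}{2}] \times [c, \tfrac{c+d}{2}], \\ R_2 &= [\tfrac{a+b}{2}, b] \times [c, \tfrac{c+d}{2}], \\ R_3 &= [a, \tfrac{a+b}{2}] \times [\tfrac{c+d}{2}, d], \\ R_4 &= [\tfrac{a+b}{2}, b] \times [\tfrac{c+d}{2}, d]. \end{aligned}$$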


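Due to the orientations of $\partial R_1, \dots, \partial R_4$ we have

$$\int_{\partial R} \omega = \sum_{i=1}^{4} \int_{\partial R_i} \omega \quad \text{and} \quad \left| \int_{\partial R} \omega \right| \le \sum_{i=1}^{4} \left| \int_{\partial R_i} \omega \right|,$$

so there is a rectangle $R^{(1)}$ among $R_1, \dots, R_4$ for which

$$\left| \int_{\partial R^{(1)}} \omega \right| \ge \frac{1}{4} \left| \int_{\partial R} \omega \right|.$$

Replacing $R$ with $R^{(1)}$ in the above, we have a rectangle $R^{(2)} \subseteq R^{(1)}$ such that

$$\left| \int_{\partial R^{(2)}} \omega \right| \ge \frac{1}{4} \left| \int_{\partial R^{(1)}} \omega \right|.$$

Repeating this process, we obtain a sequence of rectangles

$$R^{(1)} \supseteq R^{(2)} \supseteq \cdots$$

such that

$$\left| \int_{\partial R^{(n+1)}} \omega \right| \ge \frac{1}{4} \left| \int_{\partial R^{(n)}} \omega \right|$$

for all $n$. Therefore

$$\left| \int_{\partial R^{(n)}} \omega \right| \ge \frac{1}{4^n} \left| \int_{\partial R} \omega \right|.$$

If $L_0$ is the length of $\partial R$ and $L_n$ is the length of $\partial R^{(n)}$, then $L_n = L_0 / 2^n$, and if $\operatorname{diam} R$ is the diameter of $R$, then $\operatorname{diam} R^{(n)} = (\operatorname{diam} R) / 2^n$. Since every $R^{(n)}$ is compact and $\operatorname{diam} R^{(n)} \to 0$ as $n \to \infty$, there is exactly one point

$$x_0 \in \bigcap_{n=1}^{\infty} R^{(n)}.$$

Since $\omega$ is differentiable at $x_0$, there exists a neighborhood $U$ of $x_0$ such that

$$\omega(x) = \omega(x_0) + D\omega(x_0)(x - x_0) + \theta(x - x_0)$$

for every $x \in U$, where $\theta$ is a continuous function into $L(E, F)$ satisfying $\theta(0) = 0$ and

$$\lim_{x \to x_0} \frac{\theta(x - x_0)}{|x - x_0|} = 0. \tag{*}$$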

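(This follows directly from the definition of the derivative.) For sufficiently large $n$ we have $R^{(n)} \subseteq U$ and

$$\begin{aligned} \int_{\partial R^{(n)}} \omega &= \int_{\partial R^{(n)}} \omega(x_0) \, dx + \int_{\partial R^{(n)}} D\omega(x_0)(x - x_0) \, dx + \int_{\partial R^{(n)}} \theta(x - x_0) \, dx \\ &= \int_{\partial R^{(n)}} [\omega(x_0) - D\omega(x_0) x_0] \, dx + \int_{\partial R^{(n)}} D\omega(x_0) x \, dx + \int_{\partial R^{(n)}} \theta(x - x_0) \, dx \\ &= \int_{\partial R^{(n)}} \theta(x - x_0) \, dx \end{aligned}$$

since $x \mapsto [\omega(x_0) - D\omega(x_0) x_0] x$ is a potential for the constant form $\omega(x_0) - D\omega(x_0) x_0$ and $D\omega(x_0)$ is exact. Therefore

$$\begin{aligned} \left| \int_{\partial R} \omega \right| &\le 4^n \left| \int_{\partial R^{(n)}} \omega \right| \\ &\le 4^n L_n \sup_{x \in \partial R^{(n)}} |\theta(x - x_0)| \\ &\le 4^n L_n (\operatorname{diam} R^{(n)}) \sup_{x \in R^{(n)} \setminus \{x_0\}} \frac{|\theta(x - x_0)|}{|x - x_0|} \\ &= L_0 (\operatorname{diam} R) \sup_{x \in R^{(n)} \setminus \{x_0\}} \frac{|\theta(x - x_0)|}{|x - x_0|} \\ &\to 0 \end{aligned}$$

as $n \to \infty$ by (*).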

Theorem 2.73 (Morera's theorem). If $\omega$ is a form on a disc

$$U = \{x \in \mathbb{R}^2 : |x - x_0| < r\}$$

and

$$\int_{\partial R} \omega = 0$$

for every rectangle $R$ contained in $U$, then $\omega$ is exact.

Proof. Define

$$f(x) = \int_{x_0}^{x} \omega,$$

where the integral is taken along the sides of a rectangle whose opposite vertices are $x_0$ and $x$. By considering an appropriate rectangle we have

$$f(x + h) - f(x) = \int_{x}^{x+h} \omega,$$

so we can use the argument in Theorem 2.69 to show that $\omega = Df$.


Corollary 2.74 (Goursat's theorem). If $\omega$ is a closed (differentiable) form on a disc

$$U = \{x \in \mathbb{R}^2 : |x - x_0| < r\},$$

then $\omega$ is exact.

Proof. Apply Lemma 2.72 and Theorem 2.73.

A closed form $\omega$ on $U$ is said to be locally exact if for every $x \in U$ there is a neighborhood $V \subseteq U$ of $x$ on which $\omega$ is exact. Theorem 2.71 shows that every closed $C^1$ form on an open subset of $E$ is locally exact, and Corollary 2.74 shows that every closed form (not necessarily of class $C^1$) on an open subset of $\mathbb{R}^2$ is locally exact.

Locally exact forms

So far we have only defined line integrals along $C^1$ paths and curves. For locally exact forms, we can extend the definition to paths that are not differentiable.

Let $\omega$ be a locally exact form on $U$ and let $\gamma : [a, b] \to U$ be a path. Since $\gamma([a, b])$ is compact, there exists a partition $P = \{a_0, \dots, a_k\}$ of $[a, b]$ and open balls $B_1, \dots, B_k$ such that $\omega$ is exact on $B_i$ and $\gamma([a_{i-1}, a_i]) \subseteq B_i$ for each $i = 1, \dots, k$. We define the integral of $\omega$ along $\gamma$ by

$$\int_\gamma \omega = \sum_{i=1}^{k} [g_i(\gamma(a_i)) - g_i(\gamma(a_{i-1}))],$$

where $g_i$ is any potential for $\omega$ on $B_i$.

Lemma 2.75. The above integral is well-defined.

Proof. Suppose we are given different open balls $\tilde{B}_1, \dots, \tilde{B}_k$ with the above properties, as well as corresponding potential functions $\tilde{g}_i$. For each $i$ the functions $g_i$ and $\tilde{g}_i$ are both potentials for $\omega$ on the connected set $B_i \cap \tilde{B}_i$, so $g_i - \tilde{g}_i$ is constant there and

$$g_i(\gamma(a_i)) - g_i(\gamma(a_{i-1})) = \tilde{g}_i(\gamma(a_i)) - \tilde{g}_i(\gamma(a_{i-1})).$$

Next, we show that choosing a refinement of $P$ does not change the value of the integral. Since any refinement of $P$ can be obtained by adding finitely many points to $P$, it suffices to show that adding a single point $c$ to $P$ does not change the value of the integral. Let

$$\tilde{P} = \{a_0, \dots, a_{j-1}, c, a_j, \dots, a_k\}$$

where $a_{j-1} < c < a_j$. We can use the same open balls and potential functions as before, with $B_j$ being used for the intervals $[a_{j-1}, c]$ and $[c, a_j]$. Then the term

$$g_j(\gamma(a_j)) - g_j(\gamma(a_{j-1}))$$

is replaced by

$$g_j(\gamma(a_j)) - g_j(\gamma(c)) + g_j(\gamma(c)) - g_j(\gamma(a_{j-1})),$$

which does not change the value of the integral. This completes the proof, for if $Q$ is another partition of $[a, b]$ then taking the common refinement $P \cup Q$ does not change the value of the integral.

It is easy to check that Theorem 2.68 still holds when γ is a path.

Let $X, Y$ be topological spaces and let $f, g : X \to Y$ be continuous maps. A homotopy from $f$ to $g$ is a continuous map $H : X \times [0, 1] \to Y$ such that $H(s, 0) = f(s)$ and $H(s, 1) = g(s)$ for all $s \in X$. Let $\gamma_1, \gamma_2 : [a, b] \to X$ be paths (i.e. continuous maps) and let $H$ be a homotopy from $\gamma_1$ to $\gamma_2$. If $t \mapsto H(a, t)$ and $t \mapsto H(b, t)$ are constant, we say that $H$ is a path homotopy. If there is a path homotopy from $\gamma_1$ to $\gamma_2$, we say that $\gamma_1$ is homotopic to $\gamma_2$.

Theorem 2.76 (Line integrals along homotopic paths). Let $\omega$ be a locally exact form on $U$. If $\gamma_1, \gamma_2$ are paths in $U$ that are homotopic, then

$$\int_{\gamma_1} \omega = \int_{\gamma_2} \omega.$$

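Proof. Let $H : [a, b] \times [0, 1] \to U$ be a path homotopy from $\gamma_1$ to $\gamma_2$. Since $[a, b] \times [0, 1]$ is compact, there are partitions

$$a = s_0 \le \dots \le s_m = b, \qquad 0 = t_0 \le \dots \le t_n = 1$$

such that for each rectangle $R_{ij} = [s_{i-1}, s_i] \times [t_{j-1}, t_j]$ there is an open ball $B_{ij} \subseteq U$ with $H(R_{ij}) \subseteq B_{ij}$ on which $\omega$ is exact. For each $j = 0, \dots, n$, let $\gamma^{(j)}(s) = H(s, t_j)$. Since $\gamma^{(0)} = \gamma_1$ and $\gamma^{(n)} = \gamma_2$, it suffices to show that

$$\int_{\gamma^{(j)}} \omega = \int_{\gamma^{(j-1)}} \omega$$

for each $j = 1, \dots, n$. Fix some $j$. For each $i = 1, \dots, m$, let $g_i$ be a potential for $\omega$ on $B_{ij}$, and set $g_0 = g_1$. Since $g_i$ and $g_{i-1}$ are both potentials for $\omega$ on $B_{ij} \cap B_{(i-1)j}$, they differ by a constant on $B_{ij} \cap B_{(i-1)j}$. Therefore

$$g_i(\gamma^{(j)}(s_{i-1})) - g_i(\gamma^{(j-1)}(s_{i-1})) = g_{i-1}(\gamma^{(j)}(s_{i-1})) - g_{i-1}(\gamma^{(j-1)}(s_{i-1}))$$

for each $i = 1, \dots, m$. We have

$$\begin{aligned} \int_{\gamma^{(j)}} \omega - \int_{\gamma^{(j-1)}} \omega &= \sum_{i=1}^{m} [g_i(\gamma^{(j)}(s_i)) - g_i(\gamma^{(j)}(s_{i-1})) - g_i(\gamma^{(j-1)}(s_i)) + g_i(\gamma^{(j-1)}(s_{i-1}))] \\ &= \sum_{i=1}^{m} [g_i(\gamma^{(j)}(s_i)) - g_i(\gamma^{(j-1)}(s_i)) - (g_{i-1}(\gamma^{(j)}(s_{i-1})) - g_{i-1}(\gamma^{(j-1)}(s_{i-1})))] \\ &= g_m(\gamma^{(j)}(b)) - g_m(\gamma^{(j-1)}(b)) - (g_0(\gamma^{(j)}(a)) - g_0(\gamma^{(j-1)}(a))) \\ &= 0 \end{aligned}$$

since $\gamma^{(j)}$ and $\gamma^{(j-1)}$ have the same starting and ending points.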

An open set U is said to be simply connected if it is connected and every closedpath in U is homotopic to a point (i.e. homotopic to a constant path).

Corollary 2.77. Every locally exact form on a simply connected open set is exact.

Proof. Apply Theorem 2.76 and Theorem 2.69.
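Simple connectivity cannot be dropped from Corollary 2.77. A standard example, not discussed in the text, is the form $\omega(x, y) = (-y, x) / (x^2 + y^2)$ on the punctured plane: it is closed and $C^1$, hence locally exact by Theorem 2.71, but its integral around the unit circle is $2\pi$, so by Theorem 2.69 it is not exact. The sketch below evaluates that integral numerically, together with the integral around a circle lying in the simply connected half-plane $x > 0$.

```python
import math

def omega(x, y):
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def circle_integral(cx, cy, radius, n=2000):
    # Midpoint rule for the integral of omega along the circle
    # t -> (cx + radius cos t, cy + radius sin t), 0 <= t <= 2 pi.
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        wx, wy = omega(cx + radius * math.cos(t), cy + radius * math.sin(t))
        total += (wx * (-radius * math.sin(t)) + wy * (radius * math.cos(t))) * h
    return total

around_origin = circle_integral(0.0, 0.0, 1.0)   # circle winding once around the puncture
away = circle_integral(3.0, 0.0, 1.0)            # circle inside the half-plane x > 0
```

The first value is close to $2\pi$ and the second is close to $0$, consistent with $\omega$ being exact on the half-plane but not on the punctured plane.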

Theorem 2.78. Every path in $U$ is homotopic to a curve in $U$.

Proof. Let $\gamma : [a, b] \to U$ be a path. Since $\gamma([a, b])$ is compact, there exists a partition $P = \{a_0, \dots, a_k\}$ of $[a, b]$ and open balls $B_1, \dots, B_k \subseteq U$ such that $\gamma([a_{i-1}, a_i]) \subseteq B_i$ for each $i = 1, \dots, k$. Define $\gamma_i : [0, 1] \to B_i$ by $\gamma_i(s) = \gamma(a_{i-1} + s(a_i - a_{i-1}))$; then

$$\gamma|_{[a_{i-1}, a_i]}(s) = \gamma_i\left( \frac{s - a_{i-1}}{a_i - a_{i-1}} \right).$$

For each $i$ there is a path homotopy $H_i : [0, 1] \times [0, 1] \to B_i$ from $\gamma_i$ to the straight line segment $\eta_i$ from $\gamma(a_{i-1})$ to $\gamma(a_i)$, so we can define a path homotopy $H : [a, b] \times [0, 1] \to U$ by setting

$$H|_{[a_{i-1}, a_i] \times [0, 1]}(s, t) = H_i\left( \frac{s - a_{i-1}}{a_i - a_{i-1}}, t \right).$$

Therefore $\gamma$ is homotopic to the curve $s \mapsto H(s, 1)$.


Singular homology

Given $p + 1$ affinely independent points $v_0, \dots, v_p$ in $\mathbb{R}^n$, the geometric $p$-simplex with vertices $v_0, \dots, v_p$ is the subset of $\mathbb{R}^n$ defined by

$$[v_0, \dots, v_p] = \left\{ \sum_{i=0}^{p} t_i v_i : 0 \le t_i \le 1 \text{ and } \sum_{i=0}^{p} t_i = 1 \right\}.$$

The integer $p$ is called the dimension of the simplex. The simplices whose vertices are nonempty subsets of $\{v_0, \dots, v_p\}$ are called the faces of the simplex. The $(p-1)$-dimensional faces are the boundary faces of the simplex. The standard $p$-simplex is

$$\Delta^p = [e_0, \dots, e_p] \subseteq \mathbb{R}^p,$$

where $e_0 = 0$ and $e_i$ is the $i$th standard basis vector. For each $i = 0, \dots, p$, we define the $i$th face map in $\Delta^p$ to be the unique affine map $F_{i,p} : \Delta^{p-1} \to \Delta^p$ satisfying

$$F_{i,p}(e_0) = e_0, \ \dots, \ F_{i,p}(e_{i-1}) = e_{i-1}, \ F_{i,p}(e_i) = e_{i+1}, \ \dots, \ F_{i,p}(e_{p-1}) = e_p.$$

Let $U \subseteq E$ be an open set. A continuous map $\sigma : \Delta^p \to U$ is called a singular $p$-simplex in $U$. The singular chain group of $U$ in degree $p$, denoted by $C_p(U)$, is the free abelian group generated by all singular $p$-simplices in $U$. An element of $C_p(U)$ is called a singular $p$-chain. The boundary of a singular $p$-simplex $\sigma$ is the singular $(p-1)$-chain defined by

$$\partial\sigma = \sum_{i=0}^{p} (-1)^i \sigma \circ F_{i,p}.$$

For example, if $\sigma$ is the identity map on $\Delta^2$ then $\partial\sigma = \sigma \circ F_{0,2} - \sigma \circ F_{1,2} + \sigma \circ F_{2,2}$ where

$$\begin{aligned} (\sigma \circ F_{0,2})(e_0) &= e_1, & (\sigma \circ F_{0,2})(e_1) &= e_2, \\ (\sigma \circ F_{1,2})(e_0) &= e_0, & (\sigma \circ F_{1,2})(e_1) &= e_2, \\ (\sigma \circ F_{2,2})(e_0) &= e_0, & (\sigma \circ F_{2,2})(e_1) &= e_1. \end{aligned}$$

The map $\partial$ extends uniquely to a group homomorphism $\partial_p : C_p(U) \to C_{p-1}(U)$, called the singular boundary operator. We write $\partial = \partial_p$ when the dimension $p$ is clear.

Theorem 2.79. For any c ∈ Cp(U) we have ∂(∂c) = 0.
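The identity $\partial(\partial c) = 0$ can be observed in a purely combinatorial model, which is an assumption of this sketch and not the general singular setting: an affine simplex is recorded only by its ordered tuple of vertex labels, a chain is a dictionary from such tuples to integer coefficients, and the $i$th face deletes the $i$th vertex, mirroring $\sigma \circ F_{i,p}$.

```python
from collections import Counter

def boundary(chain):
    # chain: dict mapping vertex tuples (ordered simplices) to integer coefficients.
    result = Counter()
    for simplex, coeff in chain.items():
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]     # delete the i-th vertex
            result[face] += (-1) ** i * coeff
    # Drop cancelled terms so that the zero chain is the empty dict.
    return {s: c for s, c in result.items() if c != 0}

triangle = {(0, 1, 2): 1}
edges = boundary(triangle)          # matches the example for the identity map on the 2-simplex
twice = boundary(edges)             # every vertex appears once with each sign and cancels
```

The three edges returned for `triangle` carry the signs $+, -, +$ in the same order as $\sigma \circ F_{0,2} - \sigma \circ F_{1,2} + \sigma \circ F_{2,2}$ above.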


We say that a singular p-chain c is a cycle if ∂c = 0, and we say that c is aboundary if c = ∂b for some b ∈ Cp+1(U). Let Zp(U) be the set of all singular p-cycles, and let Bp(U) be the set of all singular p-boundaries. Then Zp(U) = ker ∂p,and Bp(U) = im ∂p−1. Since ∂p−1 ∂p = 0, we have Bp(U) ⊆ Zp(U). The pthsingular homology group of U is the quotient group

Hp(U) = Zp(U)/Bp(U).

The equivalence class in Hp(U) of a singular p-cycle c is called its homology class,and is denoted by [c]. If [c] = [c′], then we say that c and c′ are homologous.More generally, if c is a singular p-chain (and not necessarily a p-cycle), we willdenote its equivalence class in Cp(U)/Bp(U) by [c] as well.

Since 41 = [0, 1], any singular 1-simplex γ is a path in U . Conversely, any pathγ : [a, b] → U can be considered as a singular 1-simplex in U since we have thereparametrization γ : [0, 1]→ U given by γ(t) = γ(a+ t(b− a)).

Lemma 2.80.

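1. If $\gamma$ is a singular 1-simplex then $[-\gamma] = -[\gamma]$, where $-\gamma$ is the singular 1-simplex defined by

\[ (-\gamma)(t) = \gamma(1 - t). \]

2. If $\gamma_1, \gamma_2$ are singular 1-simplices with $\gamma_1(1) = \gamma_2(0)$, then $[\gamma_1 + \gamma_2] = [\gamma_1 \cdot \gamma_2]$, where $\gamma_1 \cdot \gamma_2$ is the singular 1-simplex defined by

\[ (\gamma_1 \cdot \gamma_2)(t) = \begin{cases} \gamma_1(2t), & 0 \le t \le 1/2, \\ \gamma_2(2t - 1), & 1/2 < t \le 1. \end{cases} \]

3. If $\gamma, \eta$ are singular 1-simplices that are (path) homotopic, then $[\gamma] = [\eta]$.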

A 1-chain $c$ is piecewise $C^1$ if it can be written as a sum of curves (i.e. piecewise $C^1$ paths). From Lemma 2.80, it is easy to see that we can write $[c] = \sum_{i=1}^{k} c_i[\sigma_i]$ where each $\sigma_i$ is a $C^1$ path.

Theorem 2.81. Every 1-cycle $c \in Z_1(U)$ can be written in the form

\[ [c] = \sum_{i=1}^{k} c_i[\gamma_i], \]

where each $\gamma_i : \Delta_1 \to U$ is a closed path.


Proof. Suppose not; then we can write

\[ c = \partial b + \sum_{i=1}^{k} c_i\gamma_i + \sum_{i=1}^{j} \sigma_i \tag{*} \]

where $b$ is a singular 2-chain, $c_i \neq 0$ for all $i$, each $\gamma_i$ is a closed path, and each $\sigma_i$ is a path that is not closed. We can assume that $(j, k)$ is the smallest pair for which $c$ can be written in this form (where we take $(j, k) < (j', k')$ if $j < j'$, or $j = j'$ and $k < k'$), and that $j \geq 1$. Let us denote a singular 0-simplex $\sigma$ with $\sigma(0) = x$ by $P(x)$. (If $c$ is a singular 1-simplex in $U$ and $\partial c = 0$ then by definition we have $P(c(1)) - P(c(0)) = 0$, i.e. $c$ is a closed path.) Then

\[ 0 = \partial c = \sum_{i=1}^{j} [P(\sigma_i(1)) - P(\sigma_i(0))]. \]

Suppose $P(\sigma_i(1)) = P(\sigma_1(0))$ for some $i \neq 1$; then $[\sigma_i] + [\sigma_1] = [\sigma_i \cdot \sigma_1]$, so we can reduce either $j$ or $k$ in (*). Similarly, if $P(\sigma_i(0)) = P(\sigma_1(0))$ for some $i \neq 1$, then $[\sigma_i] + [\sigma_1] = [(-\sigma_i) \cdot \sigma_1]$ and we can reduce $j$ or $k$. Both of these cases contradict the minimality of $(j, k)$, so we must have $P(\sigma_1(0)) = P(\sigma_1(1))$ since the coefficient of $P(\sigma_1(0))$ in the above sum is 0. This contradicts the fact that $\sigma_1$ is not closed.

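Let $\omega$ be a locally exact form on $U$ and let $c \in C_1(U)$ be a 1-chain in $U$. Write $c = \sum_{i=1}^{k} c_i\sigma_i$ where $c_i \in \mathbb{Z}$ and each $\sigma_i$ is a 1-simplex in $U$. We define the integral of $\omega$ over $c$ by

\[ \int_c \omega = \sum_{i=1}^{k} c_i \int_{\sigma_i} \omega. \]

Lemma 2.82. Let $\sigma : \Delta_2 \to U$ be a 2-simplex. Then

\[ \int_{\partial\sigma} \omega = 0 \]

for every locally exact form $\omega$ on $U$.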

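Proof. It is clear that

\[ \int_{\partial\sigma} \omega = \int_{\sigma \circ \gamma} \omega \]

where $\gamma : [a, b] \to \Delta_2$ is a path traversing the boundary of $\Delta_2$ counterclockwise. Since $\Delta_2$ is convex, there is a path homotopy $H : [a, b] \times [0, 1] \to \Delta_2$ from $\gamma$ to a point in $\Delta_2$, so $\sigma \circ H$ is a path homotopy from $\sigma \circ \gamma$ to a point in $\sigma(\Delta_2)$. Therefore

\[ \int_{\sigma \circ \gamma} \omega = 0 \]

by Theorem 2.76.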

Theorem 2.83. If $c$ and $c'$ are homologous 1-cycles in $U$, then

\[ \int_c \omega = \int_{c'} \omega \]

for every locally exact form $\omega$ on $U$. If $\omega$ and $\tilde\omega$ are locally exact forms on $U$ that differ by an exact form, then

\[ \int_c \omega = \int_c \tilde\omega \]

for every 1-cycle $c$ in $U$.

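Proof. If $c$ and $c'$ are homologous, then $c - c' = \partial b$ for some 2-chain $b$. Write $b = \sum_{i=1}^{k} b_i\sigma_i$ where each $\sigma_i$ is a 2-simplex. Then

\[ \int_c \omega - \int_{c'} \omega = \int_{\partial b} \omega = \sum_{i=1}^{k} b_i \int_{\partial\sigma_i} \omega = 0 \]

by Lemma 2.82. Suppose $\omega$ and $\tilde\omega$ are locally exact forms with $\omega - \tilde\omega = Df$ for some function $f : U \to F$, and let $c$ be a 1-cycle. By Theorem 2.81, $c$ is homologous to $\sum_{i=1}^{k} c_i\gamma_i$ where each $\gamma_i$ is a closed path. Then

\[ \int_c \omega - \int_c \tilde\omega = \int_c Df = \sum_{i=1}^{k} c_i \int_{\gamma_i} Df = 0. \]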


3 Complex analysis

3.1 Complex differentiation and integration

Complex numbers

The field $\mathbb{C}$ of complex numbers can be constructed by equipping $\mathbb{R}^2$ with the product

\[ ((a, b), (c, d)) \mapsto (ac - bd, ad + bc), \]

which is a continuous, symmetric bilinear map under the standard norm. Thus we may also consider $\mathbb{C}$ as a unital and commutative (real) Banach algebra. The real numbers can be embedded in $\mathbb{C}$ using the homomorphism $x \mapsto (x, 0)$. The element $(0, 1) \in \mathbb{C}$ is usually denoted by $i$, which has the property that $i^2 = -1$. If $z \in \mathbb{C}$ is a complex number, then $z$ can be written either as $x + yi$ for $x, y \in \mathbb{R}$ or as an element $(x, y) \in \mathbb{R}^2$. We write $\bar{z}$ for $x - yi$, or the element $(x, -y) \in \mathbb{R}^2$. We define the real and imaginary parts of $z = x + yi$ by

\[ \operatorname{Re} z = x \quad \text{and} \quad \operatorname{Im} z = y. \]

Note that we have a homomorphism $\varphi : \mathbb{C} \setminus \{0\} \to GL(2, \mathbb{R})$ given by

\[ x + yi \mapsto \begin{bmatrix} x & -y \\ y & x \end{bmatrix}. \]

We sometimes refer to the matrix $\varphi(x + yi)$ as the matrix representation of $x + yi$.
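As a quick sanity check of the homomorphism property, the following sketch compares the product of two matrix representations with the matrix representation of the product. The helper names `mat`, `matmul` and `det` are hypothetical and the sample values are arbitrary.

```python
def mat(z):
    """Matrix representation [[x, -y], [y, x]] of z = x + yi."""
    z = complex(z)
    return ((z.real, -z.imag), (z.imag, z.real))

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

z, w = 1 + 2j, 3 - 1j
print(matmul(mat(z), mat(w)) == mat(z * w))  # multiplication is preserved
print(det(mat(z)), abs(z) ** 2)              # the determinant equals |z|^2
```

The determinant identity is the reason the image lies in $GL(2, \mathbb{R})$ exactly when $z \neq 0$.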

The open disc $D_r(z_0)$ is the set $\{z \in \mathbb{C} : |z - z_0| < r\}$, and the closed disc $\overline{D}_r(z_0)$ is the set $\{z \in \mathbb{C} : |z - z_0| \le r\}$.

Differentiation

Let $F$ be a complex Banach space, let $U \subseteq \mathbb{C}$ be an open set, and let $f : U \to F$. Recall that the $\mathbb{C}$-derivative of $f$ at $z \in U$, if it exists, is a $\mathbb{C}$-linear map $Df(z) : \mathbb{C} \to F$. We will use the terms complex derivative and complex differentiable instead of $\mathbb{C}$-derivative and $\mathbb{C}$-differentiable. We will also write $f'(z) = Df(z)(1) \in F$, and reserve the notation $Df$ for the $\mathbb{R}$-derivative of $f$. If $f$ is complex differentiable at all $z \in U$, then we say that $f$ is holomorphic on $U$ or simply holomorphic. If $A \subseteq \mathbb{C}$ is any set, we say that $f$ is holomorphic on $A$ if it is holomorphic on an open set containing $A$. If $z \in \mathbb{C}$ and $f$ is holomorphic on a neighborhood of $z$, we say that $f$ is holomorphic at $z$.

Using Theorem 2.4, Theorem 2.5 and Corollary 2.7 we have:

Theorem 3.1. Let $U, V \subseteq \mathbb{C}$ be open sets. Let $f : U \to \mathbb{C}$ and $g : V \to F$ with $f(U) \subseteq V$. If $f$ is complex differentiable at $z$ and $g$ is complex differentiable at $f(z)$, then $g \circ f$ is complex differentiable at $z$ and

\[ (g \circ f)'(z) = g'(f(z)) f'(z). \]

Theorem 3.2. Let U ⊆ C be an open set, let F1, F2 be complex Banach spaces,and let f : U → F1 and g : U → F2 be complex differentiable at z ∈ U .

1. If f is constant then f ′(z) = 0.

2. If F1 = F2 then (f + g)′(z) = f ′(z) + g′(z).

3. (rf)′(z) = rf ′(z) for all r ∈ C.

4. If F1 = C or F2 = C, then (fg)′(z) = f ′(z)g(z) + f(z)g′(z).

5. If $F_2 = \mathbb{C}$ and $g(z) \neq 0$ then $(f/g)'(z) = [f'(z)g(z) - f(z)g'(z)]/g(z)^2$.

Theorem 3.3. Let $U \subseteq \mathbb{C}$ be an open set. A function $f : U \to \mathbb{C}$ is complex differentiable at $z \in U$ if and only if it is $\mathbb{R}$-differentiable at $z$ and the $\mathbb{R}$-derivative $Df(z)$ is represented by a matrix of the form

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix}. \]

In that case, $Df(z)$ is the matrix representation of the complex number $f'(z)$. Furthermore, $\det Df(z) = |f'(z)|^2$.

The preceding theorem shows that a holomorphic function $f : U \to \mathbb{C}$ is simply a real differentiable function with the property that its derivative is a scalar times a rotation matrix at every point of $U$. Suppose that $f(x + yi) = u(x, y) + v(x, y)i$ where $u, v$ are real valued functions. If $f$ is holomorphic then the theorem implies that

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \]

These are known as the Cauchy-Riemann equations.
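The Cauchy-Riemann equations are easy to test numerically. The sketch below estimates the four partial derivatives by central differences; the step size, the tolerance, and the helper names `partials` and `satisfies_cauchy_riemann` are assumptions made for illustration. It compares the holomorphic function $z \mapsto z^3$ with complex conjugation, which is real differentiable but not holomorphic.

```python
def partials(f, x, y, h=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f = u + iv."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag

def satisfies_cauchy_riemann(f, x, y, tol=1e-6):
    ux, uy, vx, vy = partials(f, x, y)
    # u_x = v_y and u_y = -v_x, up to discretization error.
    return abs(ux - vy) < tol and abs(uy + vx) < tol

cube = lambda z: z ** 3
conj = lambda z: z.conjugate()

print(satisfies_cauchy_riemann(cube, 0.7, -0.4))
print(satisfies_cauchy_riemann(conj, 0.7, -0.4))
```

For conjugation we have $u = x$ and $v = -y$, so $\partial u/\partial x = 1$ while $\partial v/\partial y = -1$, and the first equation fails everywhere.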


Integration

We define complex line integrals as in Section 2.7, taking $E = \mathbb{C}$. Any continuous function $f : U \to F$ corresponds to a form $\omega_f : U \to L(\mathbb{C}, F)$ given by $\omega_f(z)h = hf(z)$. If $\gamma$ is a curve in $U$ then the integral of $f$ along $\gamma$ is defined by

\[ \int_\gamma f = \int_\gamma f(z) \, dz = \int_\gamma \omega_f. \]

Note that if $\gamma : [a, b] \to U$ is a curve with partition $a_0, \dots, a_k$, then

\[ \int_\gamma f = \sum_{i=1}^{k} \int_{a_{i-1}}^{a_i} f(\gamma(t))\gamma'(t) \, dt. \]

The usual properties in Theorem 2.67 apply. A holomorphic function $g : U \to F$ satisfying $f = g'$ is called a primitive of $f$. It is easy to check that any potential function for $\omega_f$ is a primitive for $f$.

The key connection with Section 2.7 is the following: suppose $f : U \to F$ is holomorphic. Since $\omega_f = \ell \circ f$ where $\ell : F \to L(\mathbb{C}, F)$ is the linear map given by $\ell(x)h = hx$, we have

\begin{align*}
D\omega_f(z)(u, v) &= (D\ell(f(z)) Df(z)u)(v) \\
&= \ell(Df(z)u)(v) \\
&= uvf'(z).
\end{align*}

Clearly, $D\omega_f(z)$ is symmetric for all $z \in U$. Thus $\omega_f$ is closed, and Corollary 2.74 shows that $\omega_f$ is locally exact. As a consequence, we can define the integral of a holomorphic function $f$ along any path (see Lemma 2.75).

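Example 3.4. Let $n$ be an integer and define a curve $\gamma : [0, 2\pi] \to \mathbb{C}$ by $\gamma(t) = e^{it}$. Then

\[ \int_\gamma z^n \, dz = i \int_0^{2\pi} e^{(n+1)it} \, dt = \begin{cases} 2\pi i, & n = -1, \\ 0, & n \neq -1. \end{cases} \]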
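Example 3.4 can be reproduced numerically from the formula $\int_\gamma f = \int f(\gamma(t))\gamma'(t)\,dt$. The sketch below uses an equally spaced Riemann sum over the parameter interval; the function name `circle_integral` and the number of sample points are assumptions chosen for illustration.

```python
import cmath
import math

def circle_integral(f, center=0j, radius=1.0, steps=2000):
    """Approximate the integral of f over a counterclockwise circle."""
    total = 0j
    for k in range(steps):
        t = 2 * math.pi * k / steps
        z = center + radius * cmath.exp(1j * t)   # gamma(t)
        dz = 1j * radius * cmath.exp(1j * t)      # gamma'(t)
        total += f(z) * dz
    return total * (2 * math.pi / steps)

for n in (-3, -2, -1, 0, 1, 2):
    value = circle_integral(lambda z: z ** n)
    print(n, value)
```

Only $n = -1$ produces a value near $2\pi i$; every other exponent gives a value that vanishes up to rounding error.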

Theorem 3.5 (Cauchy’s theorem, local version). Let $U \subseteq \mathbb{C}$ be an open set, let $\gamma_1, \gamma_2$ be paths in $U$ that are homotopic, and let $f$ be holomorphic on $U$. Then

\[ \int_{\gamma_1} f = \int_{\gamma_2} f. \]

In particular, if $U$ is simply connected then

\[ \int_\gamma f = 0 \]

for any closed path $\gamma$ in $U$.

Proof. Apply Theorem 2.76.

If $C$ is a circle, we write

\[ \int_C f \]

for the integral of $f$ along $C$, taken counterclockwise.

Theorem 3.6 (Cauchy’s integral formula, local version). Let $D$ be a closed disc and let $f$ be holomorphic on $D$. Then

\[ f(z) = \frac{1}{2\pi i} \int_{\partial D} \frac{f(\zeta)}{\zeta - z} \, d\zeta \]

for every $z \in \operatorname{Int} D$.

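Proof. Let $U$ be an open set containing $D$ on which $f$ is holomorphic. For small $r > 0$, the circle $C_r$ of radius $r$ around $z$ is contained in $D$. Note that $C_r$ is homotopic to $\partial D$ in $U \setminus \{z\}$, so Theorem 3.5 shows that

\begin{align*}
\left| \frac{1}{2\pi i} \int_{\partial D} \frac{f(\zeta)}{\zeta - z} \, d\zeta - f(z) \right|
&= \left| \frac{1}{2\pi i} \int_{C_r} \frac{f(\zeta)}{\zeta - z} \, d\zeta - \frac{1}{2\pi i} \int_{C_r} \frac{f(z)}{\zeta - z} \, d\zeta \right| \\
&= \left| \frac{1}{2\pi i} \int_{C_r} \frac{f(\zeta) - f(z)}{\zeta - z} \, d\zeta \right| \\
&\le \frac{1}{2\pi} \, 2\pi r \sup_{\zeta \in C_r} \left| \frac{f(\zeta) - f(z)}{\zeta - z} \right| \to 0
\end{align*}

as $r \to 0$ since $f$ is complex differentiable at $z$.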
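The integral formula can be checked numerically for a concrete function. The sketch below is an illustration under stated assumptions: it takes $f = \exp$, the closed unit disc, and an arbitrary interior point, and approximates the boundary integral with an equally spaced Riemann sum. The names `circle_integral` and `cauchy_formula` are hypothetical.

```python
import cmath
import math

def circle_integral(f, center=0j, radius=1.0, steps=2000):
    """Approximate the integral of f over a counterclockwise circle."""
    total = 0j
    for k in range(steps):
        w = radius * cmath.exp(2j * math.pi * k / steps)
        total += f(center + w) * 1j * w   # f(gamma(t)) * gamma'(t)
    return total * (2 * math.pi / steps)

def cauchy_formula(f, z, center=0j, radius=1.0):
    """Right-hand side of Cauchy's integral formula for the given disc."""
    integrand = lambda zeta: f(zeta) / (zeta - z)
    return circle_integral(integrand, center, radius) / (2j * math.pi)

z = 0.3 + 0.2j
print(cauchy_formula(cmath.exp, z))
print(cmath.exp(z))
```

The two printed values agree to many digits, even though the integral only samples $f$ on the boundary circle.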

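Theorem 3.7. Let $U \subseteq \mathbb{C}$ be an open set, let $\gamma : [a, b] \to U$ be a curve, let $A = \gamma([a, b])$, and let $g : A \to F$ be continuous. Suppose that $f : U \setminus A \to F$ is given by

\[ f(z) = \frac{1}{2\pi i} \int_\gamma \frac{g(\zeta)}{\zeta - z} \, d\zeta. \]

Let $z_0 \in U \setminus A$ and let $D = D_r(z_0)$ be an open disc around $z_0$ contained in $U \setminus A$. Then

\[ f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n \tag{*} \]

for all $z \in D$, where

\[ a_n = \frac{1}{2\pi i} \int_\gamma \frac{g(\zeta)}{(\zeta - z_0)^{n+1}} \, d\zeta. \]

We have

\[ |a_n| \le \frac{1}{r^n} \sup_{\zeta \in A} |g(\zeta)|, \]

so the power series in (*) has a radius of convergence of at least $r$.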

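Proof. Let $0 < s < r$ and let $D' = D_s(z_0)$. For all $z \in D'$ and $\zeta \in A$ we have

\[ \frac{1}{\zeta - z} = \frac{1}{\zeta - z_0} \left( \frac{1}{1 - \frac{z - z_0}{\zeta - z_0}} \right) = \frac{1}{\zeta - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{\zeta - z_0} \right)^n, \]

where the geometric series converges absolutely and uniformly for $\zeta \in A$ since

\[ \left| \frac{z - z_0}{\zeta - z_0} \right| \le \frac{s}{r} < 1. \]

Therefore

\begin{align*}
f(z) &= \frac{1}{2\pi i} \int_\gamma \frac{g(\zeta)}{\zeta - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{\zeta - z_0} \right)^n d\zeta \\
&= \sum_{n=0}^{\infty} \left[ \frac{1}{2\pi i} \int_\gamma \frac{g(\zeta)}{(\zeta - z_0)^{n+1}} \, d\zeta \right] (z - z_0)^n
\end{align*}

for all $z \in D'$.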

Using Theorem 3.6 we can apply Theorem 3.7 to any holomorphic function, taking $\gamma = \partial D$.
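To see the coefficient formula of Theorem 3.7 at work with $\gamma = \partial D$ and $g = f$, the following sketch recovers the first few power series coefficients of $\exp$ at $z_0 = 0$ from integrals over the unit circle. The function name `taylor_coefficient`, the choice of $f$, and the number of sample points are assumptions made for illustration.

```python
import cmath
import math

def taylor_coefficient(f, n, z0=0j, radius=1.0, steps=2000):
    """Approximate (1 / 2 pi i) * integral of f(zeta) / (zeta - z0)^(n+1)."""
    total = 0j
    for k in range(steps):
        w = radius * cmath.exp(2j * math.pi * k / steps)  # w = zeta - z0
        total += f(z0 + w) / w ** (n + 1) * 1j * w        # times gamma'(t)
    return total * (2 * math.pi / steps) / (2j * math.pi)

for n in range(6):
    print(n, taylor_coefficient(cmath.exp, n), 1 / math.factorial(n))
```

The computed values match $1/n!$, and each satisfies the estimate $|a_n| \le r^{-n} \sup |f|$ with $r = 1$ and $\sup_{|\zeta| = 1} |e^\zeta| = e$.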

A function $f : \mathbb{C} \to F$ is called entire if it is holomorphic on $\mathbb{C}$. From Theorem 3.7 we can see that a function is entire if and only if it is represented by a power series with infinite radius of convergence.


Theorem 3.8. Let $f$ be an entire function. If there is a constant $c$ and a positive integer $k$ such that

\[ \sup_{|z| = r} |f(z)| \le cr^k \]

for all $r > 0$, then $f$ is a polynomial of degree $k$ or less (with coefficients in $F$).

Proof. Write $f(z) = \sum_{n=0}^{\infty} a_n z^n$ where $a_n \in F$. By Theorem 3.7, we have

\[ |a_n| \le \frac{1}{r^n} \sup_{|z| = r} |f(z)| \le cr^{k-n}. \]

If $n > k$, we can take $r \to \infty$ to deduce that $a_n = 0$.

Corollary 3.9 (Liouville’s theorem). Any bounded entire function is constant.

Theorem 3.10 (Fundamental theorem of algebra). Every non-constant polynomial with complex coefficients has a root.

Proof. Let $f(z) = a_n z^n + \dots + a_1 z + a_0$ be a non-constant polynomial, with $n \geq 1$ and $a_n \neq 0$, and suppose that $f(z) \neq 0$ for all $z \in \mathbb{C}$. Then $g(z) = 1/f(z)$ is an entire function, and

\[ |g(z)| = |z|^{-n} \left| a_n + a_{n-1} z^{-1} + \dots + a_0 z^{-n} \right|^{-1} \to 0 \]

as $|z| \to \infty$. Choose $R$ so that $|g(z)| < 1$ for all $|z| > R$. Since $g$ is continuous, it is also bounded on the closed disc of radius $R$. Therefore $g$ is a bounded entire function, and Liouville’s theorem implies that $g$ is constant (and in fact zero). This contradicts the fact that $f$ is non-constant.

Elementary functions

We define the exponential and logarithm functions

\[ \exp(z) = e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!} \quad \text{and} \quad \log(z) = -\sum_{n=1}^{\infty} \frac{(1 - z)^n}{n} \]

as in Definition 2.60 and Definition 2.65, with the usual properties:

Theorem 3.11 (Properties of exp and log).

1. $\exp$ is an entire function, and $\log$ is defined on the open disc $D_1(1)$.

2. $\exp(0) = 1$ and $\log(1) = 0$.

3. $\exp(x + y) = \exp(x)\exp(y)$.

4. $\log(xy) = \log(x) + \log(y)$ for $x, y \in D_1(1)$ with $xy \in D_1(1)$.

5. $\exp'(z) = \exp(z)$ and $\log'(z) = 1/z$.

6. $\exp(\log(z)) = z$ for $z \in D_1(1)$.

7. $\log(\exp(z)) = z$ for $|z| < \log 2$.
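Several of these properties can be checked against partial sums of the two series. The sketch below uses the expansion $\log z = -\sum_{n \ge 1} (1 - z)^n / n$, valid for $|1 - z| < 1$; the truncation lengths and the function names `exp_series` and `log_series` are assumptions chosen for illustration, and Python's `cmath` is used only as an independent reference.

```python
import cmath

def exp_series(z, terms=60):
    """Partial sum of sum_{n>=0} z^n / n!."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term          # term equals z^n / n!
        term *= z / (n + 1)
    return total

def log_series(z, terms=200):
    """Partial sum of -sum_{n>=1} (1 - z)^n / n, for |1 - z| < 1."""
    w = 1 - z
    total, power = 0j, 1 + 0j
    for n in range(1, terms + 1):
        power *= w             # power equals (1 - z)^n
        total -= power / n
    return total

z = 1.2 + 0.3j                 # lies in D_1(1), since |1 - z| is about 0.36
print(exp_series(log_series(z)))   # property 6: should reproduce z
print(log_series(z), cmath.log(z))
```

Inside $D_1(1)$ the series agrees with the principal logarithm returned by `cmath.log`.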

We then define the entire functions

\[ \sin(z) = \frac{e^{iz} - e^{-iz}}{2i} \quad \text{and} \quad \cos(z) = \frac{e^{iz} + e^{-iz}}{2}, \]

which are easily seen (from their power series) to agree with the elementary real valued sin and cos functions on $\mathbb{R}$. The usual rules $\sin'(z) = \cos(z)$ and $\cos'(z) = -\sin(z)$ still hold. Note Euler’s formula $e^{ix} = \cos x + i \sin x$, which holds for all $x \in \mathbb{R}$ (and all $x \in \mathbb{C}$). Suppose $e^{ix} = e^{iy}$ for some $x, y \in \mathbb{R}$. Then $\cos x = \cos y$ and $\sin x = \sin y$, so $x - y$ is a multiple of $2\pi$.

Theorem 3.12 (Polar coordinates). Define $\varphi : (0, \infty) \times (-\pi, \pi] \to \mathbb{R}^2 \setminus \{0\}$ by

\[ \varphi(r, \theta) = re^{i\theta} = r\cos\theta + ir\sin\theta. \]

1. $\varphi$ is a $C^\infty$ bijection.

2. The restriction of $\varphi$ to $(0, \infty) \times (-\pi, \pi)$ is a diffeomorphism onto $\mathbb{R}^2 \setminus (-\infty, 0]$.

3. The inverse of $\varphi$ on the right half of the plane (where $x > 0$) is given by

\[ \varphi^{-1}(x, y) = \left( \sqrt{x^2 + y^2}, \arctan(y/x) \right). \]

Proof. (1) is obvious. For (2), we compute

\[ D\varphi(r, \theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}. \]

The determinant of this matrix is $r \neq 0$, so $\varphi|_{(0,\infty) \times (-\pi,\pi)}$ is a diffeomorphism since it is a bijection onto $\mathbb{R}^2 \setminus (-\infty, 0]$.

The preceding theorem shows that every nonzero complex number $z$ can be written in the form $z = re^{i\theta}$ for unique $r > 0$ and $\theta \in (-\pi, \pi]$. We define the argument of $z$ by $\arg z = \theta$. Thus, for all $z \neq 0$ we have $z = |z| e^{i \arg z}$.

We will now examine the logarithm more closely. Since the integral of $1/z$ taken counterclockwise along the unit circle is $2\pi i$, the function $z \mapsto 1/z$ has no primitive on $\mathbb{C} \setminus \{0\}$. However, the set $\mathbb{C} \setminus (-\infty, 0]$ is star-shaped (with respect to 1, for instance) and therefore simply connected, so $z \mapsto 1/z$ has a primitive $g$ on this set. By adding a constant to $g$, we may assume that $g(1) = 0$. Since $(g - \log)' = 0$ on $D_1(1)$ and $g(1) = \log(1) = 0$, we have $g = \log$ on $D_1(1)$.

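Theorem 3.13 (Principal value of log). The unique primitive $g$ of $z \mapsto 1/z$ on $\mathbb{C} \setminus (-\infty, 0]$ satisfying $g(1) = 0$ is given by

\[ g(z) = \log|z| + i \arg z, \]

and $g$ is the inverse of the restriction of $\exp$ to $\mathbb{R} \times (-\pi, \pi)$.

Proof. Let $\tilde{g}(z) = \log|z| + i \arg z$ and define $\varphi : (0, \infty) \times (-\pi, \pi) \to \mathbb{R}^2 \setminus (-\infty, 0]$ as in Theorem 3.12. We compute

\[ D\varphi(r, \theta)^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -r^{-1}\sin\theta & r^{-1}\cos\theta \end{bmatrix}. \]

Let $\pi_1, \pi_2 : \mathbb{R}^2 \to \mathbb{R}$ be the projections onto the $x$ and $y$ components, respectively. Then $(\pi_1 \circ \varphi^{-1})(z) = |z|$ and $(\pi_2 \circ \varphi^{-1})(z) = \arg z$ by definition, so $\tilde{g} = f \circ \varphi^{-1}$ where $f : (0, \infty) \times (-\pi, \pi) \to \mathbb{R} \times (-\pi, \pi)$ is given by $(r, \theta) \mapsto (\log r, \theta)$. Let $z \in \mathbb{C} \setminus (-\infty, 0]$ and write $z = re^{i\theta}$ for some $r > 0$ and $\theta \in (-\pi, \pi)$. Then

\begin{align*}
D\tilde{g}(z) &= Df(r, \theta) D\varphi(r, \theta)^{-1} \\
&= \begin{bmatrix} r^{-1} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -r^{-1}\sin\theta & r^{-1}\cos\theta \end{bmatrix} \\
&= r^{-1} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix},
\end{align*}

which is the matrix representation of $1/z$. Therefore $\tilde{g}$ is holomorphic with $\tilde{g}'(z) = 1/z$. Since $(g - \tilde{g})' = 0$ and $g(1) = \tilde{g}(1) = 0$, we have $g = \tilde{g}$.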

Since $\arg z$ is defined for all $z \neq 0$, we may extend $g$ to $\mathbb{C} \setminus \{0\}$ using the formula in Theorem 3.13. However, $g$ cannot be holomorphic on $\mathbb{C} \setminus \{0\}$ since it is not continuous along the negative real axis. From now on, we will discard our old power series definition of log and use $g : \mathbb{C} \setminus \{0\} \to \mathbb{R} \times (-\pi, \pi]$ instead.

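Theorem 3.14. Let $z \in \mathbb{C}$. Then

\[ \log(\exp(z)) = z + 2\pi i k, \]

where $k$ is chosen so that $\operatorname{Im} z + 2\pi k \in (-\pi, \pi]$.

Proof. We have

\begin{align*}
\log(\exp(z)) &= \log|\exp(z)| + i \arg(\exp(z)) \\
&= \operatorname{Re} z + i \arg(e^{i \operatorname{Im} z}),
\end{align*}

and $\arg(e^{i \operatorname{Im} z}) = \operatorname{Im} z + 2\pi k$ for the stated choice of $k$.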
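The branch correction in Theorem 3.14 can be computed explicitly: the integer $k$ with $\operatorname{Im} z + 2\pi k \in (-\pi, \pi]$ is $\lfloor (\pi - \operatorname{Im} z)/(2\pi) \rfloor$. The sketch below checks the theorem against Python's `cmath.log`, which is assumed here to implement the principal value described above (the sample points avoid the boundary case where the imaginary part is exactly $\pm\pi$). The helper name `branch_index` is hypothetical.

```python
import cmath
import math

def branch_index(z):
    """The integer k with Im z + 2*pi*k in (-pi, pi]."""
    return math.floor((math.pi - z.imag) / (2 * math.pi))

for z in (1 + 10j, 0.5 - 4j, 2 + 1j):
    k = branch_index(z)
    lhs = cmath.log(cmath.exp(z))
    rhs = z + 2j * math.pi * k
    print(z, k, lhs, rhs)
```

Only the last sample point has $k = 0$, which is the case where $\log$ actually inverts $\exp$.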

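Given $z \in \mathbb{C} \setminus \{0\}$ and $\alpha \in \mathbb{C}$, we define $z^\alpha = e^{\alpha \log z}$. If $n$ is an integer, we may also define $z^n$ for any $z \in \mathbb{C}$ by repeated multiplication. These two definitions coincide when $z$ is nonzero.

Theorem 3.15. Let $\alpha, \beta \in \mathbb{C}$.

1. $z^{\alpha + \beta} = z^\alpha z^\beta$ for all $z \neq 0$.

2. $z^{n\alpha} = (z^\alpha)^n$ for all $n \in \mathbb{Z}$.

3. $(z^\alpha)^\beta = z^{\alpha\beta} e^{2\pi i \beta k}$, where $k$ is chosen so that $\operatorname{Im}(\alpha \log z) + 2\pi k \in (-\pi, \pi]$.

Proof of (3). We have

\begin{align*}
(z^\alpha)^\beta &= (e^{\alpha \log z})^\beta \\
&= \exp(\beta \log(e^{\alpha \log z})) \\
&= \exp(\beta(\alpha \log z + 2\pi i k)) \\
&= e^{\alpha\beta \log z} e^{2\pi i \beta k}
\end{align*}

by Theorem 3.14, where $k$ is chosen so that $\operatorname{Im}(\alpha \log z) + 2\pi k \in (-\pi, \pi]$.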
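Part (3) is the reason the naive rule $(z^\alpha)^\beta = z^{\alpha\beta}$ can fail. The sketch below evaluates both sides with powers defined literally as $e^{\alpha \log z}$, using `cmath.log` as the principal logarithm; the helper names `cpow` and `branch_index` and the sample values are assumptions chosen for illustration.

```python
import cmath
import math

def cpow(z, a):
    """z^a defined as exp(a * log z) with the principal logarithm."""
    return cmath.exp(a * cmath.log(z))

def branch_index(w):
    """The integer k with Im w + 2*pi*k in (-pi, pi]."""
    return math.floor((math.pi - w.imag) / (2 * math.pi))

z, alpha, beta = -1 + 1j, 3, 0.5
k = branch_index(alpha * cmath.log(z))
lhs = cpow(cpow(z, alpha), beta)
rhs = cpow(z, alpha * beta) * cmath.exp(2j * math.pi * beta * k)
print(k, lhs, rhs, cpow(z, alpha * beta))
```

Here $k = -1$, so the correction factor is $e^{-\pi i} = -1$ and $(z^3)^{1/2} = -z^{3/2}$ for this $z$.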

Let $n$ be a positive integer, let $|z| = 1$ and suppose that $z^n = 1$. If we write $z = e^{i\theta}$ then $e^{i\theta n} = 1$, which implies that $\theta n = 2\pi k$ for some integer $k$. Therefore $z = e^{2\pi i k/n}$. There are exactly $n$ possible values of $z$, obtained when $k = 0, \dots, n - 1$. These values are called the $n$th roots of unity. Consider the case $n = 2$. The preceding argument shows that for any $z \neq 0$, there are exactly two values $w_1, w_2$ of $w$ for which $w^2 = z$. Furthermore, we have $w_1 = -w_2$. In order to define a square root function, we must make a choice of sign. As with log, this choice cannot be continuous on $\mathbb{C} \setminus \{0\}$.
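The roots of unity and the two square roots are simple to enumerate. The sketch below is illustrative only: `roots_of_unity` and `square_roots` are hypothetical names, and `cmath.sqrt` is used as one particular choice of sign (it returns the root with nonnegative real part).

```python
import cmath
import math

def roots_of_unity(n):
    """The n values exp(2*pi*i*k/n) for k = 0, ..., n - 1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

def square_roots(z):
    """Both solutions w of w^2 = z for nonzero z."""
    w = cmath.sqrt(z)
    return w, -w

for w in roots_of_unity(6):
    print(w, abs(w ** 6 - 1))

w1, w2 = square_roots(-3 + 4j)
print(w1, w2)
```

For $z = -3 + 4i$ the two square roots are $\pm(1 + 2i)$, as can be confirmed by squaring.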

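Theorem 3.16 (Principal value of $\sqrt{z}$). Define a function $\sqrt{\cdot} : \mathbb{C} \to \mathbb{C}$ as follows: set $\sqrt{0} = 0$, and set

\[ \sqrt{z} = z^{1/2} = e^{\frac{1}{2} \log z} \]

for $z \neq 0$.

1. $(\sqrt{z})^2 = z$ for all $z \in \mathbb{C}$.

2. $\sqrt{\cdot}$ is a bijection from $\mathbb{C}$ to $\{(x, y) \in \mathbb{C} : x > 0 \text{ or } (x = 0 \text{ and } y \geq 0)\}$.

3. The restriction of $\sqrt{\cdot}$ to $\mathbb{C} \setminus (-\infty, 0]$ is holomorphic, and is the inverse of the restriction of $z \mapsto z^2$ to $\{(x, y) \in \mathbb{C} : x > 0\}$.

Proof. Since $-\pi < \arg z \le \pi$ we have $-\frac{\pi}{2} < \frac{1}{2} \arg z \le \frac{\pi}{2}$, and (2) follows. For (3), $\sqrt{\cdot}$ is holomorphic on $\mathbb{C} \setminus (-\infty, 0]$ because log is.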

Properties of analytic functions

Let $z_0 \in \mathbb{C}$ and let $f : U \to F$ be a function defined on a neighborhood of $z_0$. We say that $f$ is analytic at $z_0$ if there is an $r > 0$ and coefficients $a_0, a_1, \dots \in F$ such that

\[ f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n \]

for all $z \in D_r(z_0)$. If $f$ is analytic at every point of $U$, then we say that $f$ is analytic on $U$. By Theorem 2.59, it is clear that every analytic function is infinitely complex differentiable and therefore of class $C^\infty$ as a real differentiable function.

Theorem 3.17. If $U \subseteq \mathbb{C}$ is an open set, then a function $f : U \to F$ is analytic if and only if it is holomorphic.

Proof. Every analytic function is infinitely complex differentiable. Conversely, Theorem 3.7 shows that every holomorphic function is analytic.

Theorem 3.18 (Inverse function theorem). Let $U \subseteq \mathbb{C}$ be an open set and let $f : U \to \mathbb{C}$ be holomorphic. Suppose that $f'(z) \neq 0$ for some $z \in U$. Then there is a neighborhood $V \subseteq U$ of $z$ such that $f(V)$ is open, $f|_V : V \to f(V)$ is invertible, and $(f|_V)^{-1}$ is holomorphic.

Proof. Due to Theorem 3.17, $f$ is of class $C^\infty$, and since $\det Df(z) = |f'(z)|^2 \neq 0$, we can apply Theorem 2.44. For $w \in f(V)$ we have $D((f|_V)^{-1})(w) = [Df((f|_V)^{-1}(w))]^{-1}$, which is easily seen to be of the form described in Theorem 3.3.

Lemma 3.19. Let $f(z) = \sum_{n=0}^{\infty} a_n z^n$ be a non-constant power series having a non-zero radius of convergence. If $f(0) = 0$, then there exists $s > 0$ such that $f(z) \neq 0$ for all $0 < |z| < s$.

Proof. Since $f(0) = 0$ and the series is non-constant, we may write $f(z) = a_m z^m[1 + g(z)]$ where $a_m \neq 0$ and $g(z) = b_1 z + b_2 z^2 + \cdots$. Let $R$ be the radius of convergence of $g$, which is non-zero since $f$ has a non-zero radius of convergence. Since $g$ is continuous at 0 and $g(0) = 0$, we may choose a sufficiently small $s$ with $0 < s < R$ so that $|g(z)| < 1$ for all $|z| < s$. Then $f(z) \neq 0$ for all $0 < |z| < s$, which completes the proof.

Theorem 3.20 (Uniqueness of power series). Let $f(z) = \sum_{n=0}^{\infty} a_n z^n$ and $g(z) = \sum_{n=0}^{\infty} b_n z^n$ be convergent in a set $E \subseteq \mathbb{C}$ having 0 as a limit point. If $f(z) = g(z)$ for all $z \in E$, then $a_n = b_n$ for all $n$.

Proof. Let $h = f - g$ so that $h(z) = 0$ for all $z \in E$. From the contrapositive of Lemma 3.19, we conclude that $h$ is constant or $h(0) \neq 0$. But $h(0) \neq 0$ violates the continuity of $h$ at 0 since $h(z) = 0$ for arbitrarily small $|z|$. Therefore $h = 0$ and $f$ is identical to $g$.

Corollary 3.21 (Identity theorem). Let $f, g$ be holomorphic on a connected open set $U$. If $S = \{z \in U : f(z) = g(z)\}$ has a limit point in $U$, then $f = g$.

Proof. Let

\[ E = \left\{ z \in U : f^{(n)}(z) = g^{(n)}(z) \text{ for all } n \geq 0 \right\}. \]

Since $f$ and $g$ are continuous, it is clear that $E$ is closed in $U$. If $z \in E$ then $f$ and $g$ are analytic at $z$, so they are equal in a neighborhood of $z$. Therefore $E$ is open in $U$. If we can show that $E$ is nonempty, then $E = U$ since $U$ is connected. But this is clear from Theorem 3.20.

Theorem 3.22 (Open mapping theorem). Let $U \subseteq \mathbb{C}$ be a connected open set and let $f : U \to \mathbb{C}$ be holomorphic. If $f$ is non-constant, then $f$ is an open map.

Proof. Let $W$ be open in $U$. Let $z_0 \in W$ and write $f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n$ in a neighborhood of $z_0$. If $a_n = 0$ for all $n \geq 1$ then $f$ is constant on a neighborhood of $z_0$, and Corollary 3.21 implies that $f$ is constant. Therefore $a_n \neq 0$ for some $n \geq 1$; let $m$ be the smallest such integer and write

\[ f(z) = f(z_0) + a_m(z - z_0)^m(1 + g(z)) \]

where $g$ is holomorphic on a neighborhood of $z_0$ and has zero constant term. Choose any $a \in \mathbb{C}$ such that $a^m = a_m$. On a neighborhood of $z_0$ we can write

\[ f(z) = f(z_0) + h(z)^m \]

where $h(z) = a(z - z_0)(1 + g(z))^{1/m}$. Since $a \neq 0$ we have $h'(z_0) \neq 0$, so the inverse function theorem shows that $h$ is invertible on some neighborhood $V \subseteq W$ of $z_0$ with $h(V)$ open. Choose $r > 0$ so that the open disc $D_r(0)$ is contained in $h(V)$. The image of $D_r(0)$ under the map $z \mapsto z^m$ is $D_{r^m}(0)$. Therefore $h^{-1}(D_r(0)) \subseteq V$ is a neighborhood of $z_0$ and $f(h^{-1}(D_r(0))) = D_{r^m}(f(z_0))$ is open.

Corollary 3.23 (Maximum modulus principle). Let $U \subseteq \mathbb{C}$ be a connected open set and let $f : U \to \mathbb{C}$ be holomorphic. If there exists a point $z_0 \in U$ such that

\[ |f(z)| \le |f(z_0)| \]

for all $z$ in a neighborhood of $z_0$, then $f$ is constant on $U$.

Proof. If $f$ is non-constant, then Theorem 3.22 shows that $f$ is an open map. If $V \subseteq U$ is a neighborhood of $z_0$ then $f(V)$ is open, so there must be a point $w \in V$ with $|f(w)| > |f(z_0)|$.

Limits of holomorphic functions

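Theorem 3.24. Let $U \subseteq \mathbb{C}$ be an open set. Let $f_n$ be a sequence of holomorphic functions from $U$ to $F$ converging locally uniformly to $f : U \to F$. Then $f$ is holomorphic, and $f_n^{(k)} \to f^{(k)}$ locally uniformly for all $k \geq 1$.

Proof. Let $z_0 \in U$ and let $D = \overline{D}_r(z_0) \subseteq U$ be a closed disc around $z_0$ (with $r > 0$) on which $f_n \to f$ uniformly. If $R$ is a rectangle contained in $\operatorname{Int} D$, then

\[ \int_{\partial R} f = \lim_{n \to \infty} \int_{\partial R} f_n = 0 \]

since each $f_n$ is holomorphic. By Theorem 2.73, the associated form $\omega_f : \operatorname{Int} D \to L(\mathbb{C}, F)$ is exact, and $f = g'$ for some $g$ holomorphic on $\operatorname{Int} D$. This shows that $f$ is holomorphic on $\operatorname{Int} D$.

Let $z \in D_{r/2}(z_0)$ and choose $s > 0$ such that the closed disc $D' = \overline{D}_s(z)$ is contained in $D_{r/2}(z_0)$. Then $\partial D'$ is homotopic to $\partial D$, and

\begin{align*}
\left| f_n^{(k)}(z) - f^{(k)}(z) \right|
&= \left| \frac{k!}{2\pi i} \int_{\partial D'} \frac{f_n(\zeta) - f(\zeta)}{(\zeta - z)^{k+1}} \, d\zeta \right| \\
&= \left| \frac{k!}{2\pi i} \int_{\partial D} \frac{f_n(\zeta) - f(\zeta)}{(\zeta - z)^{k+1}} \, d\zeta \right| \\
&\le \frac{k!}{2\pi} \, 2\pi r \sup_{\zeta \in \partial D} \frac{|f_n(\zeta) - f(\zeta)|}{|\zeta - z|^{k+1}} \\
&\le \frac{k! \, r}{(r/2)^{k+1}} \sup_{\zeta \in \partial D} |f_n(\zeta) - f(\zeta)|
\end{align*}

by Theorem 3.7, since $|\zeta - z| \geq r/2$. Therefore $f_n^{(k)} \to f^{(k)}$ uniformly on $D_{r/2}(z_0)$.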


3.2 Cauchy’s theorem and meromorphic functions

In this section we assume basic knowledge of algebraic topology.

An important consequence of Theorem 2.83 is the following result, which is a global version of Theorem 3.5:

Theorem 3.25 (Cauchy’s theorem). Let $U \subseteq \mathbb{C}$ be an open set, let $\gamma_1, \gamma_2$ be singular 1-cycles in $U$ that are homologous, and let $f$ be holomorphic on $U$. Then

\[ \int_{\gamma_1} f = \int_{\gamma_2} f. \]

Singular homology in C

We first explore the relationship between Theorem 3.25 and the winding number of a path.

For any topological space $X$, we define a loop in $X$ to be a continuous map $\gamma : [0, 1] \to X$ with $\gamma(0) = \gamma(1)$, and we say that $\gamma$ is based at $\gamma(0)$. Let $S^1 = \{z \in \mathbb{C} : |z| = 1\}$ be the circle. Recall that the map $q : \mathbb{R} \to S^1$ given by $s \mapsto e^{2\pi i s}$ is a universal covering of $S^1$. Let $f : [0, 1] \to S^1$ be a loop based at a point $z_0 \in S^1$. We define the winding number of $f$ by $\tilde{f}(1) - \tilde{f}(0)$, where $\tilde{f} : [0, 1] \to \mathbb{R}$ is any lift of $f$. Since any two lifts of $f$ differ by a constant, the winding number is well-defined. Since $\tilde{f}(1)$ and $\tilde{f}(0)$ are both elements of the fiber $q^{-1}(z_0)$, they differ by an integer; thus the winding number of a loop is always an integer.

Theorem 3.26. Let $f, g$ be loops in $S^1$ based at the same point. Then $f$ and $g$ are (path) homotopic if and only if they have the same winding number.

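Now let $z_0 \in \mathbb{C}$ and let $\gamma : [0, 1] \to \mathbb{C} \setminus \{z_0\}$ be a closed path. Define a retraction $r : \mathbb{C} \setminus \{z_0\} \to S^1$ by

\[ r(z) = \frac{z - z_0}{|z - z_0|}. \]

Then $r \circ \gamma$ is a loop in $S^1$, and we can define the winding number of $\gamma$ with respect to $z_0$ to be the winding number of $r \circ \gamma$. We denote this integer by $W(\gamma, z_0)$. If we consider $\gamma$ as a singular 1-cycle in the homology group $H_1(\mathbb{C} \setminus \{z_0\}) \cong \mathbb{Z}$, then $[\gamma] = W(\gamma, z_0)[\alpha]$ where $\alpha$ is the generator of $H_1(\mathbb{C} \setminus \{z_0\})$ defined by $\alpha(s) = z_0 + e^{2\pi i s}$. Therefore, for any 1-cycle $\gamma$ in $\mathbb{C} \setminus \{z_0\}$ we define the winding number of $\gamma$ with respect to $z_0$ to be the unique integer $W(\gamma, z_0)$ such that $[\gamma] = W(\gamma, z_0)[\alpha]$.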


Theorem 3.27.

1. If $\gamma, \eta$ are closed paths in $\mathbb{C} \setminus \{z_0\}$ that are homotopic, then $\gamma$ is homologous to $\eta$ (as 1-cycles).

2. If $\gamma$ is homologous to $\eta$ in $\mathbb{C} \setminus \{z_0\}$, then $W(\gamma, z_0) = W(\eta, z_0)$.

3. If $\gamma_1, \dots, \gamma_k$ are closed paths and $n_1, \dots, n_k$ are integers, then

\[ W(n_1\gamma_1 + \dots + n_k\gamma_k, z_0) = n_1 W(\gamma_1, z_0) + \dots + n_k W(\gamma_k, z_0). \]

Note that we have a convenient expression for the winding number of a 1-cycle as an integral:

Theorem 3.28. For every 1-cycle $\gamma$ in $\mathbb{C} \setminus \{z_0\}$, we have

\[ W(\gamma, z_0) = \frac{1}{2\pi i} \int_\gamma \frac{1}{z - z_0} \, dz. \]

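Proof. We first prove the result for closed paths in $\mathbb{C} \setminus \{z_0\}$. By Theorem 2.78, we may assume that $\gamma$ is a closed curve. By linearity, we may also assume that $\gamma$ is a $C^1$ path. Let $\tilde\gamma : [0, 1] \to \mathbb{R}$ be a lift of $r \circ \gamma$; then $\tilde\gamma$ is $C^1$ and

\[ e^{2\pi i \tilde\gamma(s)} = \frac{\gamma(s) - z_0}{f(s)}, \]

where $f(s) = |\gamma(s) - z_0|$. We compute

\begin{align*}
\frac{1}{2\pi i} \int_\gamma \frac{1}{z - z_0} \, dz
&= \frac{1}{2\pi i} \int_0^1 \frac{\gamma'(s)}{\gamma(s) - z_0} \, ds \\
&= \frac{1}{2\pi i} \int_0^1 \frac{2\pi i f(s) \tilde\gamma'(s) e^{2\pi i \tilde\gamma(s)} + f'(s) e^{2\pi i \tilde\gamma(s)}}{f(s) e^{2\pi i \tilde\gamma(s)}} \, ds \\
&= \frac{1}{2\pi i} \int_0^1 \left( 2\pi i \tilde\gamma'(s) + \frac{f'(s)}{f(s)} \right) ds \\
&= \frac{1}{2\pi i} \Big[ 2\pi i \tilde\gamma(s) + \log f(s) \Big]_0^1 \\
&= \tilde\gamma(1) - \tilde\gamma(0) \\
&= W(\gamma, z_0).
\end{align*}

Now let $\gamma$ be a 1-cycle in $\mathbb{C} \setminus \{z_0\}$. By Theorem 2.81, $\gamma$ is homologous to a sum $\sum_{j=1}^{k} c_j\gamma_j$ where each $\gamma_j$ is a closed path. Then

\[ W(\gamma, z_0) = \sum_{j=1}^{k} c_j W(\gamma_j, z_0) = \sum_{j=1}^{k} c_j \frac{1}{2\pi i} \int_{\gamma_j} \frac{1}{z - z_0} \, dz = \frac{1}{2\pi i} \int_\gamma \frac{1}{z - z_0} \, dz. \]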
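The integral expression lends itself to a simple numerical routine. The sketch below is one possible discretization, not the method of the text: it subdivides the closed path into short pieces and uses the fact that, on a piece short enough that the ratio of its endpoints (measured from $z_0$) stays away from the negative real axis, the integral of $1/(z - z_0)$ equals the principal logarithm of that ratio. The name `winding_number`, the sample paths, and the number of subdivisions are assumptions.

```python
import cmath
import math

def winding_number(gamma, z0, steps=2000):
    """Approximate (1 / 2 pi i) * integral of dz / (z - z0) along a closed path on [0, 1]."""
    total = 0j
    prev = gamma(0.0) - z0
    for k in range(1, steps + 1):
        cur = gamma(k / steps) - z0
        # Integral of 1/(z - z0) over one short piece of the path.
        total += cmath.log(cur / prev)
        prev = cur
    return total / (2j * math.pi)

twice = lambda s: cmath.exp(4j * math.pi * s)          # unit circle, two turns
clockwise = lambda s: 2 + cmath.exp(-2j * math.pi * s)  # circle around 2, reversed

print(winding_number(twice, 0.3))
print(winding_number(twice, 3))
print(winding_number(clockwise, 2))
```

The real parts of the logarithms telescope to zero for a closed path, while the imaginary parts accumulate the change in the lift, in line with the computation in the proof.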


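Lemma 3.29. Let $\gamma : [a, b] \to \mathbb{C}$ be a curve and let $A = \gamma([a, b])$. The function

\[ \alpha \mapsto \int_\gamma \frac{1}{z - \alpha} \, dz \]

is continuous on $\mathbb{C} \setminus A$.

Proof. Let $\alpha_0 \in \mathbb{C} \setminus A$. The function $t \mapsto |\alpha_0 - \gamma(t)|$ is positive and continuous on $[a, b]$, so it attains a minimum $r > 0$. For all $|\alpha - \alpha_0| < r/2$ and $t \in [a, b]$ we have

\[ |\alpha - \gamma(t)| \geq |\alpha_0 - \gamma(t)| - |\alpha - \alpha_0| \geq r/2, \]

so

\begin{align*}
\left| \int_\gamma \left( \frac{1}{z - \alpha} - \frac{1}{z - \alpha_0} \right) dz \right|
&\le L(\gamma) \sup_{t \in [a, b]} \left| \frac{\alpha - \alpha_0}{(\gamma(t) - \alpha)(\gamma(t) - \alpha_0)} \right| \\
&\le L(\gamma) \frac{4}{r^2} |\alpha - \alpha_0| \to 0
\end{align*}

as $\alpha \to \alpha_0$.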

Corollary 3.30. Let $\gamma : [a, b] \to \mathbb{C}$ be a closed curve and let $A = \gamma([a, b])$. If $E$ is a connected subset of $\mathbb{C} \setminus A$, then $z \mapsto W(\gamma, z)$ is constant on $E$. If $E$ is unbounded, then $W(\gamma, z) = 0$ for all $z \in E$.

Proof. The first claim is clear. Let $n$ be the winding number of $\gamma$ with respect to any point of $E$. We have

\[ n = \frac{1}{2\pi i} \int_\gamma \frac{1}{\zeta - z} \, d\zeta \]

for all $z \in E$, so $n = 0$ since

\[ \left| \int_\gamma \frac{1}{\zeta - z} \, d\zeta \right| \to 0 \]

as $|z| \to \infty$.

It is a remarkable fact that for any open set $U \subseteq \mathbb{C}$, the homology class $[\gamma] \in H_1(U)$ of a 1-cycle $\gamma$ is completely determined by the winding number of $\gamma$ with respect to points outside of $U$. We will provide a proof later.

Theorem 3.31 (Boundaries in $\mathbb{C}$). Let $U \subseteq \mathbb{C}$ be an open set and let $\gamma$ be a 1-cycle in $U$. If $W(\gamma, z) = 0$ for all $z \in \mathbb{C} \setminus U$, then $\gamma$ is a boundary, i.e. $\gamma = \partial b$ for some 2-chain $b$.


Corollary 3.32. Let $\gamma, \eta$ be 1-cycles in $U$. Then $\gamma$ and $\eta$ are homologous if and only if $W(\gamma, z) = W(\eta, z)$ for all $z \in \mathbb{C} \setminus U$.

Corollary 3.33 (Cauchy’s theorem with winding numbers). Let $U \subseteq \mathbb{C}$ be an open set, let $\gamma_1, \gamma_2$ be 1-cycles in $U$ such that $W(\gamma_1, z) = W(\gamma_2, z)$ for all $z \in \mathbb{C} \setminus U$, and let $f$ be holomorphic on $U$. Then

\[ \int_{\gamma_1} f = \int_{\gamma_2} f. \]

If $c = \sum_{i=1}^{k} c_i\sigma_i$ is a $p$-chain where $c_i \neq 0$, we define the image of $c$ to be the set $\bigcup_{i=1}^{k} \sigma_i(\Delta_p)$.

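Theorem 3.34 (Cauchy’s integral formula). Let $U \subseteq \mathbb{C}$ be an open set, let $\gamma$ be a 1-cycle in $U$ homologous to 0, and let $f$ be holomorphic on $U$. For all $z \in U$ not in the image of $\gamma$ we have

\[ W(\gamma, z) f(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(\zeta)}{\zeta - z} \, d\zeta. \]

Proof. Write $f(\zeta) = \sum_{n=0}^{\infty} a_n(\zeta - z)^n$ in a neighborhood of $z$. Let $C$ be a small circle centered at $z$, contained in this neighborhood. By Corollary 3.33,

\begin{align*}
\frac{1}{2\pi i} \int_\gamma \frac{f(\zeta)}{\zeta - z} \, d\zeta
&= \frac{1}{2\pi i} \int_{W(\gamma, z)C} \frac{f(\zeta)}{\zeta - z} \, d\zeta \\
&= \frac{1}{2\pi i} \sum_{n=0}^{\infty} \int_{W(\gamma, z)C} a_n(\zeta - z)^{n-1} \, d\zeta \\
&= a_0 \frac{1}{2\pi i} \int_{W(\gamma, z)C} \frac{1}{\zeta - z} \, d\zeta \\
&= W(\gamma, z) f(z).
\end{align*}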

Artin’s proof

If $\gamma : [a, b] \to \mathbb{C}$ is a closed curve and there exists a partition $a_0, \dots, a_k$ of $[a, b]$ such that $\gamma|_{[a_{j-1}, a_j]}$ is a horizontal or vertical line segment for each $j$, then we say that $\gamma$ is rectangular. A rectangular 1-cycle is a 1-cycle that can be written as a sum of rectangular closed curves. A grid is a union of finitely many vertical or horizontal lines in $\mathbb{C}$. Every grid partitions $\mathbb{C}$ into a finite number of rectangular regions, some bounded and some unbounded. It is clear that for any rectangular 1-cycle $\gamma$ there is a grid $G$ for which $\gamma = \sum_{i=1}^{k} c_i\sigma_i$, where each $\sigma_i$ is an edge of a bounded rectangle. We say that $G$ is a grid for $\gamma$.

Lemma 3.35. Let $\gamma$ be a rectangular 1-cycle in $\mathbb{C}$, let $G$ be a grid for $\gamma$, and let $R_1, \dots, R_n$ be the bounded rectangles. For each $i$, choose some $p_i \in \operatorname{Int} R_i$. Then

\[ \gamma = \sum_{i=1}^{n} W(\gamma, p_i) \, \partial R_i. \]

(Each $\partial R_i$ is oriented counterclockwise.)

Proof. Let $\eta = \gamma - \sum_{i=1}^{n} W(\gamma, p_i) \, \partial R_i$; it is clear that $W(\eta, p) = 0$ for any $p$ not on the grid (i.e. not on the boundary of some bounded or unbounded rectangle). Suppose that $\eta \neq 0$ and write $\eta = m\sigma + \eta'$, where $m \neq 0$, $\sigma$ is an edge of a bounded rectangle $R$, and $\eta'$ is some 1-chain not containing $\sigma$. Then $\sigma$ is also an edge of exactly one other rectangle $R'$, which is either bounded or unbounded. Choose $p \in \operatorname{Int} R$ and $p' \in \operatorname{Int} R'$. Then $W(\partial R, p) = 1$ and $W(\partial R, p') = 0$, so

\begin{align*}
W(\eta - m\,\partial R, p) &= W(\eta, p) - mW(\partial R, p) = -m, \\
W(\eta - m\,\partial R, p') &= W(\eta, p') - mW(\partial R, p') = 0.
\end{align*}

But the image $E$ of $\eta - m\,\partial R$ does not contain the edge $\sigma$, so $p$ and $p'$ are in the same connected component of $\mathbb{C} \setminus E$. Therefore

\[ W(\eta - m\,\partial R, p) = W(\eta - m\,\partial R, p') \]

by Corollary 3.30, which is a contradiction.

Proof of Theorem 3.31. By using an argument similar to that of Theorem 2.78, we may assume that $\gamma$ is a rectangular 1-cycle in $U$. Let $G$ be a grid for $\gamma$, let $R_1, \dots, R_n$ be the bounded rectangles, and choose some $p_i \in \operatorname{Int} R_i$ for each $i$. By Lemma 3.35, we have

\[ \gamma = \sum_{i=1}^{n} W(\gamma, p_i) \, \partial R_i. \]

Suppose some $R_i$ contains a point $p \in \mathbb{C} \setminus U$; then $W(\gamma, p) = 0$. If $p \in \operatorname{Int} R_i$, then $W(\gamma, p_i) = W(\gamma, p) = 0$ since $\operatorname{Int} R_i$ is connected. If $p \in \partial R_i$ and $p$ is not in the image of $\gamma$, then again we have $W(\gamma, p_i) = W(\gamma, p) = 0$. Note that $p$ cannot be in the image of $\gamma$. Therefore $R_i \subseteq U$ whenever $W(\gamma, p_i) \neq 0$, and $\gamma$ is the boundary of the 2-chain

\[ \sum_{i=1}^{n} W(\gamma, p_i) R_i. \]


Dixon’s proof

There is also a direct proof of Theorem 3.34 due to Dixon.

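Lemma 3.36. Let $U \subseteq \mathbb{C}$ be an open set, let $f : U \times U \to F$ be continuous, and let $\gamma, \eta$ be piecewise $C^1$ 1-cycles in $U$. Then

\[ \int_\gamma \int_\eta f(z, w) \, dw \, dz = \int_\eta \int_\gamma f(z, w) \, dz \, dw. \]

Proof. By linearity and reparametrization, we can assume that $\gamma$ and $\eta$ are $C^1$ paths defined on $[0, 1]$. By Corollary 2.38,

\begin{align*}
\int_\gamma \int_\eta f(z, w) \, dw \, dz
&= \int_0^1 \left( \int_\eta f(\gamma(t), w) \, dw \right) \gamma'(t) \, dt \\
&= \int_0^1 \int_0^1 f(\gamma(t), \eta(s)) \eta'(s) \gamma'(t) \, ds \, dt \\
&= \int_0^1 \int_0^1 f(\gamma(t), \eta(s)) \gamma'(t) \eta'(s) \, dt \, ds \\
&= \int_0^1 \left( \int_\gamma f(z, \eta(s)) \, dz \right) \eta'(s) \, ds \\
&= \int_\eta \int_\gamma f(z, w) \, dz \, dw.
\end{align*}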

Proof of Theorem 3.34. By Theorem 2.78, we may assume that $\gamma$ is a piecewise $C^1$ 1-cycle in $U$. Define a continuous function $g : U \times U \to F$ by

\[ g(z, w) = \begin{cases} \dfrac{f(w) - f(z)}{w - z}, & w \neq z, \\ f'(z), & w = z. \end{cases} \]

Let $V$ be the open set of all complex numbers $z$ not in the image of $\gamma$ for which $W(\gamma, z) = 0$. It is clear that $\mathbb{C} \setminus U \subseteq V$, so $U \cup V = \mathbb{C}$. Define

\[ h(z) = \begin{cases} \dfrac{1}{2\pi i} \displaystyle\int_\gamma g(z, w) \, dw, & z \in U, \\ \dfrac{1}{2\pi i} \displaystyle\int_\gamma \dfrac{f(w)}{w - z} \, dw, & z \in V. \end{cases} \]

(Note that the two definitions coincide for $z \in U \cap V$.) Since

\[ h(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w - z} \, dw - W(\gamma, z) f(z) \]

for all $z \in U$ not in the image of $\gamma$, it suffices to show that $h = 0$.

It is clear that $h$ is holomorphic on $V$. Since $g$ is uniformly continuous on compact subsets of $U \times U$, $h$ is continuous. Let $R$ be a rectangle contained in $U$. By Lemma 3.36,

\begin{align*}
\int_{\partial R} h(z) \, dz
&= \frac{1}{2\pi i} \int_{\partial R} \int_\gamma g(z, w) \, dw \, dz \\
&= \frac{1}{2\pi i} \int_\gamma \int_{\partial R} g(z, w) \, dz \, dw \\
&= 0
\end{align*}

since $z \mapsto g(z, w)$ is holomorphic for every $w$. By Theorem 2.73, $h$ is holomorphic on $U$. Therefore $h$ is an entire function. The image of $\gamma$ is compact, so for large $|z|$ we have $W(\gamma, z) = 0$ and $z \in V$. Then

\[ h(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w - z} \, dw \to 0 \]

as $|z| \to \infty$, and Corollary 3.9 shows that $h = 0$.

Note that we can recover Corollary 3.33 from Theorem 3.34. Suppose that $f$ is holomorphic on $U$ and $W(\gamma, z) = 0$ for all $z \in \mathbb{C} \setminus U$. Choosing any point $z_0 \in U$ not in the image of $\gamma$ and applying Theorem 3.34 to $z \mapsto (z - z_0) f(z)$ gives

\[ \int_\gamma f = \int_\gamma \frac{(z - z_0) f(z)}{z - z_0} \, dz = 2\pi i \, W(\gamma, z_0)(z_0 - z_0) f(z_0) = 0. \]

Laurent series

A Laurent series is a series of the form

\[ f(z) = \sum_{n=-\infty}^{\infty} a_n(z - z_0)^n \]

where $a_n \in F$. Let

\[ f^+(z) = \sum_{n \geq 0} a_n(z - z_0)^n \quad \text{and} \quad f^-(z) = \sum_{n < 0} a_n(z - z_0)^n; \]

we say that $f$ converges absolutely on a set $E \subseteq \mathbb{C}$ if the series $f^+$ and $f^-$ converge absolutely on $E$. Uniform convergence of $f$ is defined similarly. Thus, a convergent Laurent series is regarded as the sum

\[ f(z) = f^+(z) + f^-(z). \]


We write $A_{r,R}(z_0)$ for the open annulus $\{z \in \mathbb{C} : r < |z - z_0| < R\}$ and $\overline{A}_{r,R}(z_0)$ for the closed annulus $\{z \in \mathbb{C} : r \le |z - z_0| \le R\}$.

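Theorem 3.37 (Existence of a Laurent series expansion). Let $A = \overline{A}_{r,R}(z_0)$ be a closed annulus, let $f$ be holomorphic on $A$, and let $r < s < S < R$. Then

\[ f(z) = \sum_{n=-\infty}^{\infty} a_n(z - z_0)^n \]

for all $s \le |z - z_0| \le S$, and the Laurent series converges absolutely and uniformly for these $z$. The coefficients $a_n$ are given by

\[ a_n = \begin{cases} \dfrac{1}{2\pi i} \displaystyle\int_{C_R} \dfrac{f(\zeta)}{(\zeta - z_0)^{n+1}} \, d\zeta, & n \geq 0, \\ \dfrac{1}{2\pi i} \displaystyle\int_{C_r} \dfrac{f(\zeta)}{(\zeta - z_0)^{n+1}} \, d\zeta, & n < 0, \end{cases} \]

where $C_R = \{z \in \mathbb{C} : |z - z_0| = R\}$ and $C_r = \{z \in \mathbb{C} : |z - z_0| = r\}$.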

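Proof. By definition, $f$ is holomorphic on an open set $U$ containing $A$. Let $z \in \overline{A}_{s,S}(z_0)$. The 1-cycle $C_R - C_r$ is homologous to 0 in $U$, so we have

\[
f(z) = \frac{1}{2\pi i} \int_{C_R} \frac{f(\zeta)}{\zeta - z}\, d\zeta - \frac{1}{2\pi i} \int_{C_r} \frac{f(\zeta)}{\zeta - z}\, d\zeta
\]

by Theorem 3.34. The first integral is handled as in Theorem 3.7, giving

\[
\frac{1}{2\pi i} \int_{C_R} \frac{f(\zeta)}{\zeta - z}\, d\zeta = \sum_{n=0}^{\infty} \left[ \frac{1}{2\pi i} \int_{C_R} \frac{f(\zeta)}{(\zeta - z_0)^{n+1}}\, d\zeta \right] (z - z_0)^n.
\]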

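For all $\zeta \in C_r$ we have

\[
\frac{1}{\zeta - z} = -\frac{1}{z - z_0} \left( \frac{1}{1 - \frac{\zeta - z_0}{z - z_0}} \right) = -\frac{1}{z - z_0} \sum_{n=0}^{\infty} \left( \frac{\zeta - z_0}{z - z_0} \right)^n,
\]

where the geometric series converges absolutely and uniformly for $\zeta \in C_r$ since

\[
\left| \frac{\zeta - z_0}{z - z_0} \right| \leq \frac{r}{s} < 1.
\]

Therefore

\[
\begin{aligned}
-\frac{1}{2\pi i} \int_{C_r} \frac{f(\zeta)}{\zeta - z}\, d\zeta
&= \frac{1}{2\pi i} \int_{C_r} \frac{f(\zeta)}{z - z_0} \sum_{n=0}^{\infty} \left( \frac{\zeta - z_0}{z - z_0} \right)^n d\zeta \\
&= \sum_{n=0}^{\infty} \left[ \frac{1}{2\pi i} \int_{C_r} \frac{f(\zeta)}{(\zeta - z_0)^{-n}}\, d\zeta \right] (z - z_0)^{-n-1}
\end{aligned}
\]

as desired.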
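The coefficient formula of Theorem 3.37 can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the sample function $f(z) = 1/(z(1 - z))$ on the annulus $0 < |z| < 1$, whose Laurent series at 0 is $z^{-1} + 1 + z + z^2 + \cdots$, and it integrates over a single circle of radius 0.5 (the proof of Theorem 3.38 shows that any circle in the annulus gives the same coefficients). The helper name `laurent_coefficient` and the sample count are illustrative choices.

```python
import cmath

def laurent_coefficient(f, n, z0=0j, radius=0.5, samples=2048):
    """Approximate a_n = (1/(2*pi*i)) * integral of f(zeta)/(zeta - z0)**(n+1)
    over the circle |zeta - z0| = radius, using the trapezoidal rule."""
    total = 0j
    for k in range(samples):
        theta = 2 * cmath.pi * k / samples
        offset = radius * cmath.exp(1j * theta)  # offset = zeta - z0
        # d(zeta) = i * offset * d(theta); the factor i cancels against 1/(2*pi*i)
        total += f(z0 + offset) * offset ** (-n)
    return total / samples

def f(z):
    # Sample function (an assumption for illustration), holomorphic on 0 < |z| < 1.
    return 1 / (z * (1 - z))

coeffs = {n: laurent_coefficient(f, n) for n in range(-3, 4)}
for n in sorted(coeffs):
    print(n, round(coeffs[n].real, 6))
```

For this function the computed coefficients should be close to 1 for $n \geq -1$ and close to 0 for $n \leq -2$.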

Theorem 3.38 (Uniqueness of Laurent series). Let

\[
f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \quad \text{and} \quad g(z) = \sum_{n=-\infty}^{\infty} b_n (z - z_0)^n
\]

be two Laurent series converging uniformly on an annulus $A = A_{r,R}(z_0)$. If $f(z) = g(z)$ for all $z \in A$, then $a_n = b_n$ for all $n$.

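Proof. Fix some $k$; then

\[
\sum_{n=-\infty}^{\infty} a_n (z - z_0)^{n-k-1} = \sum_{n=-\infty}^{\infty} b_n (z - z_0)^{n-k-1}
\]

for all $z \in A$. Since the two Laurent series converge uniformly on $A$, we have

\[
2\pi i\, a_k = \sum_{n=-\infty}^{\infty} \int_C a_n (z - z_0)^{n-k-1}\, dz = \sum_{n=-\infty}^{\infty} \int_C b_n (z - z_0)^{n-k-1}\, dz = 2\pi i\, b_k
\]

for any circle $C$ in the annulus (with $W(C, z_0) = 1$).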

Isolated singularities

Let $z_0 \in \mathbb{C}$. A function $f$ holomorphic on the annulus $D_r(z_0) \setminus \{z_0\} = A_{0,r}(z_0)$ for some $r > 0$ is said to have an isolated singularity at $z_0$. For example, an entire function has an isolated singularity at every point of $\mathbb{C}$. By Theorems 3.37 and 3.38, there is a unique Laurent series expansion

\[
f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n
\]

that converges on $A_{0,r}(z_0)$. If there exists an integer $m$ such that $a_m \neq 0$ and $a_n = 0$ for all $n < m$, we say that the order of $f$ at $z_0$ is $m$ and we write $\operatorname{ord}_{z_0} f = m$. If $a_n = 0$ for all $n$ then we write $\operatorname{ord}_{z_0} f = \infty$, and if $a_n \neq 0$ for infinitely many $n < 0$ then we write $\operatorname{ord}_{z_0} f = -\infty$. If $f$ has an isolated singularity at $z_0$ and $-\infty < \operatorname{ord}_{z_0} f < \infty$, we say that $f$ has finite order at $z_0$.

If $\operatorname{ord}_{z_0} f \geq 0$ (including $\operatorname{ord}_{z_0} f = \infty$), we say that $z_0$ is a removable singularity of $f$. In this case, defining $f(z_0) = a_0$ makes $f$ holomorphic at $z_0$. If $\operatorname{ord}_{z_0} f = -m$ for some $m > 0$, we say that $f$ has a pole of order $m$ at $z_0$. A simple pole is a pole of order 1. If $\operatorname{ord}_{z_0} f = -\infty$, we say that $z_0$ is an essential singularity of $f$. Thus, an isolated singularity is either a removable singularity, a pole, or an essential singularity.

We will often say that $f$ has a zero of order $m$ at $z_0$ if $\operatorname{ord}_{z_0} f = m$ for some $m > 0$. A simple zero is a zero of order 1.

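Theorem 3.39. Suppose that $f, g$ map into $\mathbb{C}$ and have finite order at $z_0$. Then $fg$ and $1/f$ have finite order at $z_0$, and:

1. $\operatorname{ord}_{z_0}(fg) = \operatorname{ord}_{z_0} f + \operatorname{ord}_{z_0} g$.

2. $\operatorname{ord}_{z_0}(1/f) = -\operatorname{ord}_{z_0} f$.

Proof. Let $m = \operatorname{ord}_{z_0} f$ and $n = \operatorname{ord}_{z_0} g$. Then $f_1(z) = (z - z_0)^{-m} f(z)$ and $g_1(z) = (z - z_0)^{-n} g(z)$ are holomorphic at $z_0$, with $\operatorname{ord}_{z_0} f_1 = \operatorname{ord}_{z_0} g_1 = 0$. Theorem 3.2 shows that $f_1 g_1$ is holomorphic at $z_0$ and $\operatorname{ord}_{z_0}(f_1 g_1) = 0$. Therefore $(fg)(z) = (z - z_0)^{m+n} f_1(z) g_1(z)$ has finite order at $z_0$ and

\[
\operatorname{ord}_{z_0}(fg) = m + n + \operatorname{ord}_{z_0}(f_1 g_1) = \operatorname{ord}_{z_0} f + \operatorname{ord}_{z_0} g.
\]

Since $f_1 \neq 0$ on a neighborhood of $z_0$, Theorem 3.2 shows that $1/f_1$ is holomorphic at $z_0$. Therefore $(1/f)(z) = (z - z_0)^{-m} (1/f_1)(z)$ has finite order at $z_0$ and

\[
0 = \operatorname{ord}_{z_0}(f/f) = \operatorname{ord}_{z_0} f + \operatorname{ord}_{z_0}(1/f).
\]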

Let $U \subseteq \mathbb{C}$ be an open set and let $S$ be a discrete subset of $U$. If a function $f$ is holomorphic on $U \setminus S$ and has a pole at each point of $S$, then we say that $f$ is meromorphic on $U$. Every meromorphic function is locally a quotient of two holomorphic functions. More precisely, if $f$ has a pole of order $m$ at $z_0 \in S$ then $(z - z_0)^m f(z)$ is holomorphic at $z_0$, so $f(z) = [(z - z_0)^m f(z)]/(z - z_0)^m$ is the quotient of two holomorphic functions in a neighborhood of $z_0$.

Lemma 3.40. Let $z_0$ be an isolated singularity of $f$. If there exists a neighborhood $U$ of $z_0$ such that $f$ is bounded on $U \setminus \{z_0\}$, then $z_0$ is a removable singularity.


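Proof. Write $f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n$ on some annulus $D_r(z_0) \setminus \{z_0\}$. If $m > 0$ and $0 < s < r$ then

\[
|a_{-m}| = \left| \frac{1}{2\pi i} \int_{C_s} f(\zeta) (\zeta - z_0)^{m-1}\, d\zeta \right| \leq s \sup_{\zeta \in C_s} \left| f(\zeta) (\zeta - z_0)^{m-1} \right| \to 0
\]

as $s \to 0$. Therefore $a_n = 0$ for all $n < 0$.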

Theorem 3.41 (Casorati-Weierstrass theorem). Let $z_0$ be an essential singularity of $f$ and suppose that $f$ is holomorphic on some annulus $D_r(z_0) \setminus \{z_0\}$. If $f$ maps into $\mathbb{C}$, then $f(D_r(z_0) \setminus \{z_0\})$ is dense in $\mathbb{C}$.

Proof. Suppose that $f(D_r(z_0) \setminus \{z_0\})$ is not dense in $\mathbb{C}$, i.e. there exists a complex number $\alpha$ and some $\varepsilon > 0$ such that $|f(z) - \alpha| > \varepsilon$ for all $z \in D_r(z_0) \setminus \{z_0\}$. The function

\[
g(z) = \frac{1}{f(z) - \alpha}
\]

has an isolated singularity at $z_0$ and is bounded on $D_r(z_0) \setminus \{z_0\}$, so Lemma 3.40 shows that $g$ has a removable singularity at $z_0$. Since

\[
f(z) = \frac{1}{g(z)} + \alpha,
\]

Theorem 3.39 shows that $f$ has a removable singularity or a pole at $z_0$, contradicting the assumption that $z_0$ is an essential singularity.

Residues

Suppose that a function $f$ has an isolated singularity at a point $z_0$ and let

\[
f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n
\]

be the Laurent series of $f$ at $z_0$. We define the residue of $f$ at $z_0$ to be the value $a_{-1}$, and we write $\operatorname{Res}_{z_0} f = a_{-1}$.

Theorem 3.42. Let $z_0$ be an isolated singularity of $f$ and $g$.

1. If $f$ has a pole of order $m$ at $z_0$, then

\[
\operatorname{Res}_{z_0} f = \frac{1}{(m - 1)!} \lim_{z \to z_0} \frac{d^{m-1}}{dz^{m-1}} (z - z_0)^m f(z).
\]

2. Assume that $f$ or $g$ maps into $\mathbb{C}$. If $f$ has a simple pole at $z_0$ and $g$ is holomorphic at $z_0$, then

\[
\operatorname{Res}_{z_0}(gf) = g(z_0) \operatorname{Res}_{z_0} f.
\]

3. Assume that $f$ maps into $\mathbb{C}$. If $f$ has a simple zero at $z_0$ (i.e. $f(z_0) = 0$ and $f'(z_0) \neq 0$) and $g$ is holomorphic at $z_0$, then

\[
\operatorname{Res}_{z_0}(g/f) = g(z_0)/f'(z_0).
\]

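Proof. (1) is clear from the Laurent series of $f$ at $z_0$. For (2), we have

\[
g(z) f(z) = \left( g(z_0) + g'(z_0)(z - z_0) + \cdots \right) \left( \frac{\operatorname{Res}_{z_0} f}{z - z_0} + \cdots \right) = \frac{g(z_0) \operatorname{Res}_{z_0} f}{z - z_0} + \cdots.
\]

For (3), write $f(z) = f'(z_0)(z - z_0) + \cdots = f'(z_0)(z - z_0)(1 + h(z))$ where $\operatorname{ord}_{z_0} h > 0$. Then

\[
\frac{1}{f(z)} = \frac{1}{f'(z_0)(z - z_0)} \left( 1 - h(z) + h(z)^2 - \cdots \right),
\]

and it is clear that $1/f$ has a simple pole at $z_0$ with $\operatorname{Res}_{z_0}(1/f) = 1/f'(z_0)$. Applying (2) to $g$ and $1/f$ gives (3).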

Theorem 3.43. Suppose that $f$ is holomorphic on $D_r(z_0) \setminus \{z_0\}$ for some $r > 0$. Let $C$ be a circle centered at $z_0$ with radius less than $r$. Then

\[
\int_C f = 2\pi i \operatorname{Res}_{z_0} f.
\]

Proof. Since the Laurent series for $f$ converges uniformly on $C$, we have

\[
\int_C f(z)\, dz = \int_C \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n\, dz = \sum_{n=-\infty}^{\infty} a_n \int_C (z - z_0)^n\, dz = 2\pi i\, a_{-1}.
\]
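Theorem 3.43 lends itself to a quick numerical sanity check. The sketch below is an illustration only: it assumes the sample function $f(z) = e^z / z^2$, whose Laurent series at 0 is $z^{-2} + z^{-1} + \tfrac{1}{2} + \cdots$ (so the residue is 1), and approximates the circle integral with the trapezoidal rule. The helper name `circle_integral` is hypothetical.

```python
import cmath

def circle_integral(f, z0=0j, radius=1.0, samples=4096):
    """Trapezoidal approximation of the integral of f over the circle
    |z - z0| = radius, traversed once counterclockwise."""
    total = 0j
    for k in range(samples):
        theta = 2 * cmath.pi * k / samples
        offset = radius * cmath.exp(1j * theta)
        total += f(z0 + offset) * 1j * offset  # f(z) * dz/dtheta
    return total * (2 * cmath.pi / samples)

def f(z):
    # Sample function (an assumption): pole of order 2 at 0 with residue 1.
    return cmath.exp(z) / z**2

integral = circle_integral(f)
residue = integral / (2j * cmath.pi)
print(residue)
```

The recovered value of `residue` should agree with $a_{-1} = 1$ up to rounding error.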


Theorem 3.44 (Residue theorem). Let $U \subseteq \mathbb{C}$ be an open set, let $z_1, \dots, z_n$ be distinct points in $U$, and let $\gamma$ be a 1-cycle in $U \setminus \{z_1, \dots, z_n\}$. If $\gamma$ is homologous to 0 as a 1-cycle in $U$ and $f$ is holomorphic on $U \setminus \{z_1, \dots, z_n\}$, then

\[
\int_\gamma f = 2\pi i \sum_{j=1}^{n} W(\gamma, z_j) \operatorname{Res}_{z_j} f.
\]

Proof. Choose closed discs $D_1, \dots, D_n \subseteq U$ centered at $z_1, \dots, z_n$ with positive radius such that $D_j \cap D_k = \emptyset$ whenever $j \neq k$. Let

\[
\eta = \gamma - W(\gamma, z_1)\, \partial D_1 - \cdots - W(\gamma, z_n)\, \partial D_n.
\]

If $z \notin U$ then $W(\eta, z) = 0$ since $W(\gamma, z) = 0$ and $W(\partial D_j, z) = 0$ for each $j$. If $z = z_k$ then $W(\partial D_j, z) = 1$ if $j = k$ and $W(\partial D_j, z) = 0$ if $j \neq k$, so $W(\eta, z) = 0$. Therefore Theorem 3.33 shows that

\[
\int_\gamma f = \sum_{j=1}^{n} W(\gamma, z_j) \int_{\partial D_j} f = 2\pi i \sum_{j=1}^{n} W(\gamma, z_j) \operatorname{Res}_{z_j} f.
\]

For the remainder of this section, assume that $f$ and $g$ map into $\mathbb{C}$. Suppose $f$ has finite order $m$ at a point $z_0$, and write

\[
f(z) = a_m (z - z_0)^m + a_{m+1} (z - z_0)^{m+1} + \cdots = a_m (z - z_0)^m (1 + h(z))
\]

in a neighborhood of $z_0$, where $\operatorname{ord}_{z_0} h > 0$. Using the identity

\[
\frac{(fg)'}{fg} = \frac{f'}{f} + \frac{g'}{g},
\]

we have

\[
\frac{f'(z)}{f(z)} = \frac{m}{z - z_0} + \frac{h'(z)}{1 + h(z)}.
\]

But $h'/(1 + h)$ is holomorphic at $z_0$, so $\operatorname{Res}_{z_0}(f'/f) = m$. We have just proved the following:

Theorem 3.45. If $f$ has finite order at $z_0$, then

\[
\operatorname{Res}_{z_0}(f'/f) = \operatorname{ord}_{z_0} f.
\]


Corollary 3.46 (Argument principle). Let $U \subseteq \mathbb{C}$ be an open set and let $\gamma$ be a 1-cycle in $U$ homologous to 0. Let $f$ be meromorphic on $U$ with a finite number of zeros and poles $z_1, \dots, z_n$, none of which are in the image of $\gamma$. Then

\[
\int_\gamma \frac{f'}{f} = 2\pi i \sum_{j=1}^{n} W(\gamma, z_j) \operatorname{ord}_{z_j} f.
\]

If $f$ is meromorphic on a closed disc $D$ and $\{z_1, \dots, z_n\}$ is the set of its zeros and poles in $\operatorname{Int} D$, then

\[
\int_{\partial D} \frac{f'}{f} = 2\pi i \sum_{j=1}^{n} \operatorname{ord}_{z_j} f = 2\pi i\, (\text{number of zeros} - \text{number of poles}),
\]

where each zero and each pole is counted according to its multiplicity.

Theorem 3.47 (Rouché's theorem). Let $U \subseteq \mathbb{C}$ be an open set and let $\gamma : [a, b] \to U$ be a closed $C^1$ path that is homotopic to a point. Assume that $\gamma$ has an interior $V$. Let $f$ and $g$ be holomorphic on $U$ with

\[
|f(z) - g(z)| < |f(z)|
\]

for all $z$ in the image of $\gamma$. Then $f$ and $g$ have the same number of zeros in $V$. (Each zero is counted according to its multiplicity.)

Proof. We have

\[
\left| \frac{g(z)}{f(z)} - 1 \right| < 1
\]

for all $z$ in the image of $\gamma$. Then $(g/f) \circ \gamma$ is a closed path contained in the open disc $D_1(1)$, so

\[
W((g/f) \circ \gamma, 0) = 0
\]

since $0 \notin D_1(1)$. Then

\[
\begin{aligned}
0 = 2\pi i\, W((g/f) \circ \gamma, 0)
&= \int_{(g/f) \circ \gamma} \frac{1}{z}\, dz \\
&= \int_a^b \frac{(g/f)'(\gamma(t))}{(g/f)(\gamma(t))}\, \gamma'(t)\, dt \\
&= \int_\gamma \frac{(g/f)'}{g/f} \\
&= \int_\gamma \frac{g'}{g} - \int_\gamma \frac{f'}{f},
\end{aligned}
\]

so the result follows from Corollary 3.46.
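As an illustration of how Rouché's theorem and the argument principle fit together, the sketch below uses an assumed example that does not appear in the text: $f(z) = 3z$ and $g(z) = z^5 + 3z + 1$ on the unit circle. There $|f(z) - g(z)| = |z^5 + 1| \leq 2 < 3 = |f(z)|$, so $g$ should have exactly one zero in the unit disc, as $f$ does. The code checks the hypothesis on sample points and counts the zeros of $g$ by approximating $\frac{1}{2\pi i} \int g'/g$ with the trapezoidal rule.

```python
import cmath

def count_zeros_in_unit_disc(g, dg, samples=4096):
    """Approximate (1/(2*pi*i)) * integral of g'/g over the unit circle.
    For g holomorphic with no zeros on the circle, this is the number of
    zeros of g inside the disc, counted with multiplicity."""
    total = 0j
    for k in range(samples):
        z = cmath.exp(2j * cmath.pi * k / samples)
        total += dg(z) / g(z) * 1j * z  # (g'/g)(z) * dz/dtheta
    integral = total * (2 * cmath.pi / samples)
    return integral / (2j * cmath.pi)

def f(z):
    return 3 * z

def g(z):
    return z**5 + 3 * z + 1

def dg(z):
    return 5 * z**4 + 3

# Sampled check of the hypothesis |f - g| < |f| on the unit circle.
circle_points = [cmath.exp(2j * cmath.pi * k / 360) for k in range(360)]
hypothesis_holds = all(abs(f(z) - g(z)) < abs(f(z)) for z in circle_points)

zeros_of_g = count_zeros_in_unit_disc(g, dg)
print(hypothesis_holds, round(zeros_of_g.real))
```

A sampled check of the inequality is of course not a proof; here the bound $|z^5 + 1| \leq 2$ on the circle settles it analytically.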

3.3 Formal power series

The algebra of formal power series

Let $K = \mathbb{R}$ or $K = \mathbb{C}$. A formal power series in $K$ is a list $a_0, a_1, \dots$ of numbers in $K$, which we will denote by

\[
f(z) = \sum_{n=0}^{\infty} a_n z^n = a_0 + a_1 z + a_2 z^2 + \cdots.
\]

Here the symbol $z$ has no analytic meaning, and is simply part of the notation. The set $K[[z]]$ of formal power series is a vector space if we define

\[
\sum_{n=0}^{\infty} a_n z^n + \sum_{n=0}^{\infty} b_n z^n = \sum_{n=0}^{\infty} (a_n + b_n) z^n, \qquad r \sum_{n=0}^{\infty} a_n z^n = \sum_{n=0}^{\infty} (r a_n) z^n,
\]

and $K[[z]]$ becomes a unital commutative algebra if we define multiplication on $K[[z]]$ by

\[
\left( \sum_{m=0}^{\infty} a_m z^m \right) \left( \sum_{n=0}^{\infty} b_n z^n \right) = \sum_{n=0}^{\infty} \sum_{k_1 + k_2 = n} a_{k_1} b_{k_2} z^n = \sum_{n=0}^{\infty} \sum_{k=0}^{n} a_{n-k} b_k z^n.
\]

If $f(z) = \sum_{n=0}^{\infty} a_n z^n \in K[[z]]$, we say that $a_n$ is the $n$th coefficient of $f(z)$. It is clear that the map $[z^n] : K[[z]] \to K$ that gives the $n$th coefficient is a linear functional on $K[[z]]$. The set $K[z]$ of all polynomials is a subalgebra of $K[[z]]$.
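The multiplication rule $c_n = \sum_{k=0}^{n} a_{n-k} b_k$ translates directly into code on truncated coefficient lists. The following is a minimal sketch under the assumption that a formal power series is represented by a Python list of its first few coefficients; the function name `multiply` is illustrative.

```python
def multiply(a, b):
    """Cauchy product of two truncated power series given as coefficient lists.
    Only the first min(len(a), len(b)) coefficients of the product are
    determined by the inputs, so only those are returned."""
    n_terms = min(len(a), len(b))
    return [sum(a[n - k] * b[k] for k in range(n + 1)) for n in range(n_terms)]

geometric = [1, 1, 1, 1, 1, 1]        # 1 + z + z^2 + ... (first six coefficients)
one_minus_z = [1, -1, 0, 0, 0, 0]     # 1 - z

product = multiply(one_minus_z, geometric)  # coefficients of (1 - z)(1 + z + ...)
square = multiply(geometric, geometric)     # coefficients of (1 + z + ...)^2
print(product)
print(square)
```

The first product has coefficients $1, 0, 0, \dots$, and the square has $n$th coefficient $n + 1$, as the formula predicts.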

Let $f(z) = \sum_{n=0}^{\infty} a_n z^n \in K[[z]]$. If there exists an integer $m$ such that $a_m \neq 0$ and $a_n = 0$ for all $n < m$, we say that the order of $f(z)$ is $m$ and we write $\operatorname{ord} f(z) = m$. If $f(z) = 0$, then we write $\operatorname{ord} f(z) = \infty$. Note that

\[
\begin{aligned}
\operatorname{ord}(f(z) + g(z)) &\geq \min(\operatorname{ord} f(z), \operatorname{ord} g(z)), \\
\operatorname{ord} f(z) g(z) &= \operatorname{ord} f(z) + \operatorname{ord} g(z), \\
\operatorname{ord} f(z)^n &= n \operatorname{ord} f(z)
\end{aligned}
\]

for all $f(z), g(z) \in K[[z]]$ and $n \geq 0$. We write $K[[z]]_{\geq p}$ for the ideal

\[
\{ f(z) \in K[[z]] : \operatorname{ord} f(z) \geq p \}.
\]

We can define a metric on $K[[z]]$ by

\[
d(f(z), g(z)) = 2^{-\operatorname{ord}(f(z) - g(z))}.
\]

In this metric, a sequence $\{f_k(z)\}$ in $K[[z]]$ converges to $\sum_{n=0}^{\infty} a_n z^n$ if and only if for all $n \geq 0$ there exists some $K$ such that $[z^n] f_k(z) = a_n$ for all $k \geq K$. It is easy to check that addition, scalar multiplication, multiplication and the functional $[z^n]$ are all continuous.

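Let $\{f_k(z)\}$ be a sequence in $K[[z]]$. If the sequence of partial sums $\sum_{k=0}^{K} f_k(z)$ converges to some $f(z) \in K[[z]]$, we say that the infinite series $\sum_{k=0}^{\infty} f_k(z)$ converges and we write $\sum_{k=0}^{\infty} f_k(z) = f(z)$.

Lemma 3.48.

1. $\sum_{k=0}^{\infty} f_k(z)$ in $K[[z]]$ converges if and only if $\lim_{k \to \infty} \operatorname{ord} f_k(z) = \infty$.

2. $K[z]$ is dense in $K[[z]]$.

3. If $\sum_{k=0}^{\infty} f_k(z)$ and $\sum_{k=0}^{\infty} g_k(z)$ converge, then

\[
\sum_{k=0}^{\infty} \sum_{j=0}^{k} f_{k-j}(z) g_j(z) = \sum_{j=0}^{\infty} f_j(z) \sum_{k=0}^{\infty} g_k(z).
\]

Proof. For (3), we first show that the series $\sum_{k=0}^{\infty} \sum_{j=0}^{k} f_{k-j}(z) g_j(z)$ converges. Let $N$ be a positive integer and choose $K$ such that $\operatorname{ord} f_k(z) \geq N$ and $\operatorname{ord} g_k(z) \geq N$ for all $k \geq K$. For all $k \geq 2K$ and $0 \leq j \leq k$ we have $j \geq K$ or $k - j \geq K$, so

\[
\operatorname{ord} \sum_{j=0}^{k} f_{k-j}(z) g_j(z) \geq \min_{0 \leq j \leq k} \operatorname{ord} f_{k-j}(z) g_j(z) \geq N.
\]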


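Write $f_k(z) = \sum_{m=0}^{\infty} a^{(k)}_m z^m$ and $g_k(z) = \sum_{n=0}^{\infty} b^{(k)}_n z^n$. For each fixed lower index, the sums over $j$ and $k$ below have only finitely many nonzero terms, since $\operatorname{ord} f_j(z) \to \infty$ and $\operatorname{ord} g_k(z) \to \infty$. Then

\[
\begin{aligned}
\sum_{k=0}^{\infty} \sum_{j=0}^{k} f_{k-j}(z) g_j(z)
&= \sum_{k=0}^{\infty} \sum_{j=0}^{k} \left( \sum_{m=0}^{\infty} a^{(k-j)}_m z^m \right) \left( \sum_{n=0}^{\infty} b^{(j)}_n z^n \right) \\
&= \sum_{k=0}^{\infty} \sum_{j=0}^{k} \sum_{n=0}^{\infty} \sum_{m=0}^{n} a^{(k-j)}_{n-m} b^{(j)}_m z^n \\
&= \sum_{n=0}^{\infty} \sum_{m=0}^{n} \left( \sum_{k=0}^{\infty} \sum_{j=0}^{k} a^{(k-j)}_{n-m} b^{(j)}_m \right) z^n \\
&= \sum_{n=0}^{\infty} \sum_{m=0}^{n} \left( \sum_{j=0}^{\infty} a^{(j)}_{n-m} \right) \left( \sum_{k=0}^{\infty} b^{(k)}_m \right) z^n \\
&= \left( \sum_{m=0}^{\infty} \sum_{j=0}^{\infty} a^{(j)}_m z^m \right) \left( \sum_{n=0}^{\infty} \sum_{k=0}^{\infty} b^{(k)}_n z^n \right) \\
&= \sum_{j=0}^{\infty} f_j(z) \sum_{k=0}^{\infty} g_k(z).
\end{aligned}
\]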

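If $f(z) = \sum_{n=0}^{\infty} a_n z^n \in K[[z]]$ and $g(z) \in K[[z]]_{\geq 1}$, we define the composition of $f(z)$ and $g(z)$ by

\[
f(z) \circ g(z) = f(g(z)) = \sum_{n=0}^{\infty} a_n g(z)^n.
\]

This is well-defined since $\operatorname{ord} g(z)^n = n \operatorname{ord} g(z) \geq n \to \infty$ as $n \to \infty$ (see Lemma 3.48). For any $g(z) \in K[[z]]_{\geq 1}$, the map $f(z) \mapsto f(z) \circ g(z)$ is an algebra homomorphism. Note that $f(0) = f(z) \circ 0 = a_0$.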

Theorem 3.49 (Invertible elements of $K[[z]]$). Let $f(z), g(z) \in K[[z]]$.

1. $1 - z$ is invertible, and

\[
\frac{1}{1 - z} = \sum_{n=0}^{\infty} z^n.
\]

2. If $f(z)$ is invertible and $g(z) \in K[[z]]_{\geq 1}$, then $f(g(z)) = f(z) \circ g(z)$ is invertible.

3. $f(z)$ is invertible if and only if $\operatorname{ord} f(z) = 0$.


Proof. (1) is easy to check. For (2), suppose that $f(z)$ is invertible and write $f(z) = \sum_{n=0}^{\infty} a_n z^n$ and $f(z)^{-1} = \sum_{n=0}^{\infty} b_n z^n$. Then $f(z) f(z)^{-1} = 1$, so

\[
1 = (f(z) f(z)^{-1}) \circ g(z) = (f(z) \circ g(z)) (f(z)^{-1} \circ g(z))
\]

and therefore $(f(z) \circ g(z))^{-1} = f(z)^{-1} \circ g(z)$. For (3), suppose that $f(z)$ is invertible. Then $f(0) = a_0$ is invertible by (2), so $\operatorname{ord} f(z) = 0$. Conversely, suppose that $\operatorname{ord} f(z) = 0$. Then $a_0 \neq 0$, and

\[
\frac{1}{a_0} f(z) = 1 - \left( 1 - \frac{1}{a_0} f(z) \right)
\]

is invertible by (1) and (2) since $\operatorname{ord} \left( 1 - \frac{1}{a_0} f(z) \right) \geq 1$. Therefore $f(z)$ is invertible.
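Part (3) of Theorem 3.49 can be made concrete by computing the coefficients of $f(z)^{-1}$. The sketch below does not follow the composition argument of the proof; instead it uses the equivalent coefficient recurrence obtained by comparing coefficients in $f(z) f(z)^{-1} = 1$, namely $b_0 = 1/a_0$ and $b_n = -\frac{1}{a_0} \sum_{k=1}^{n} a_k b_{n-k}$. The function name `inverse` and the list representation are assumptions made for illustration.

```python
from fractions import Fraction

def inverse(a, n_terms):
    """First n_terms coefficients of 1/f, where f = sum a[n] z^n and a[0] != 0.
    Coefficients of f beyond len(a) are taken to be zero."""
    if a[0] == 0:
        raise ValueError("f is not invertible: its order is not 0")
    a = [Fraction(x) for x in a] + [Fraction(0)] * n_terms
    b = [1 / a[0]]
    for n in range(1, n_terms):
        # Coefficient of z^n in f * (1/f) must vanish for n >= 1.
        b.append(-sum(a[k] * b[n - k] for k in range(1, n + 1)) / a[0])
    return b

geometric = inverse([1, -1], 5)   # 1/(1 - z)
other = inverse([2, 1], 4)        # 1/(2 + z)
print(geometric)
print(other)
```

The first call reproduces part (1) of the theorem, $1/(1 - z) = \sum z^n$; the second gives $\tfrac{1}{2} - \tfrac{1}{4} z + \tfrac{1}{8} z^2 - \tfrac{1}{16} z^3 + \cdots$.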

The formal derivative of $f(z) = \sum_{n=0}^{\infty} a_n z^n$ is defined by

\[
Df(z) = f'(z) = \sum_{n=1}^{\infty} n a_n z^{n-1} = \sum_{n=0}^{\infty} (n + 1) a_{n+1} z^n.
\]

It is clear that the map $D : K[[z]] \to K[[z]]$ is linear (and continuous), and $[z^n] D = (n + 1) [z^{n+1}]$. In particular, $[z^0] D^p = p!\, [z^p]$.

Theorem 3.50 (Properties of the formal derivative).

1. $D z^n = n z^{n-1}$ for $n \geq 0$.

2. $D$ is the unique linear operator on $K[[z]]$ satisfying $Dz = 1$ and

\[
D(f(z) g(z)) = [Df(z)] g(z) + f(z) [Dg(z)] \tag{*}
\]

for all $f(z), g(z) \in K[[z]]$.

3. $\ker D = K \subseteq K[[z]]$ and $\operatorname{im} D = K[[z]]$.

4. Let $f(z) \in K[[z]]$. If $\operatorname{ord} f(z) \geq 1$ then $\operatorname{ord} Df(z) = \operatorname{ord} f(z) - 1$.

5. If $f(z) \in K[[z]]$ and $g(z) \in K[[z]]_{\geq 1}$, then

\[
D(f(z) \circ g(z)) = [Df(z) \circ g(z)]\, Dg(z) = f'(g(z)) g'(z). \tag{**}
\]

Proof. (1) is clear from the definition. Since both sides of (*) are bilinear in $f(z)$ and $g(z)$, and $K[z]$ is dense in $K[[z]]$, it suffices to prove (2) when $f(z) = z^m$ and $g(z) = z^n$:

\[
\begin{aligned}
D(z^m z^n) &= (m + n) z^{m+n-1} \\
&= (m z^{m-1}) z^n + z^m (n z^{n-1}) \\
&= (D z^m) z^n + z^m (D z^n).
\end{aligned}
\]

Suppose that (*) holds for some linear map $D'$ with $D'z = 1$. Since

\[
D'1 = D'(1 \cdot 1) = [D'1] 1 + 1 [D'1] = 2 D'1,
\]

we have $D'1 = 0$. By induction we have $D'z^n = n z^{n-1}$ for $n \geq 1$: if $D'z^n = n z^{n-1}$ then

\[
D'(z^{n+1}) = D'(z z^n) = [D'z] z^n + z [D'z^n] = z^n + n z^n = (n + 1) z^n.
\]

Since $K[z]$ is dense in $K[[z]]$, we have $D' = D$. Parts (3) and (4) are obvious. Since both sides of (**) are linear in $f(z)$, it suffices to prove (5) when $f(z) = z^m$. We use induction on $m$. It is clear that $D(g(z)^0) = 0$. Assuming that $D(g(z)^m) = m g(z)^{m-1} Dg(z)$, we have

\[
\begin{aligned}
D(g(z)^{m+1}) &= D(g(z)^m g(z)) \\
&= D(g(z)^m) g(z) + g(z)^m Dg(z) \\
&= m g(z)^m Dg(z) + g(z)^m Dg(z) \\
&= (m + 1) g(z)^m Dg(z).
\end{aligned}
\]

Elementary functions

Let $r \in K$. We define the binomial series

\[
(1 + z)^r = \sum_{n=0}^{\infty} \binom{r}{n} z^n,
\]

where

\[
\binom{r}{n} = \frac{r (r - 1) \cdots (r - n + 1)}{n!}
\]

is the usual binomial coefficient.

Lemma 3.51. Let $f(x, y)$ be a polynomial with coefficients in $K$. If $f(m, n) = 0$ for all $m, n \in \mathbb{Z}$, then $f = 0$.

Theorem 3.52 (Properties of the binomial series). Let $r, s \in K$.

1. $(1 + z)^{r+s} = (1 + z)^r (1 + z)^s$.

2. $(1 + z)^{nr} = ((1 + z)^r)^n$ for all $n \in \mathbb{Z}$.

3. $D(1 + z)^r = r (1 + z)^{r-1}$.


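Proof. Note that

\[
\begin{aligned}
(1 + z)^r (1 + z)^s &= \sum_{m=0}^{\infty} \binom{r}{m} z^m \sum_{n=0}^{\infty} \binom{s}{n} z^n = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \binom{r}{n-k} \binom{s}{k} z^n, \\
(1 + z)^{r+s} &= \sum_{n=0}^{\infty} \binom{r+s}{n} z^n.
\end{aligned}
\]

If $r, s \in \mathbb{Z}$ then $(1 + z)^r (1 + z)^s = (1 + z)^{r+s}$ implies that

\[
\sum_{k=0}^{n} \binom{r}{n-k} \binom{s}{k} = \binom{r+s}{n}. \tag{*}
\]

Since both sides are polynomials in $r$ and $s$, Lemma 3.51 shows that (*) holds for all $r, s \in K$. Therefore

\[
(1 + z)^r (1 + z)^s = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \binom{r}{n-k} \binom{s}{k} z^n = \sum_{n=0}^{\infty} \binom{r+s}{n} z^n = (1 + z)^{r+s}
\]

for all $r, s \in K$. This proves (1), and (2) follows immediately. For (3),

\[
\begin{aligned}
D(1 + z)^r &= \sum_{n=0}^{\infty} (n + 1) \binom{r}{n+1} z^n \\
&= \sum_{n=0}^{\infty} (n + 1) \frac{r (r - 1) \cdots (r - n)}{(n + 1)!} z^n \\
&= r \sum_{n=0}^{\infty} \frac{(r - 1) \cdots (r - n)}{n!} z^n \\
&= r (1 + z)^{r-1}.
\end{aligned}
\]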

Corollary 3.53 (Quotient rule). If $f(z), g(z) \in K[[z]]$ and $g(z)$ is invertible, then

\[
D\left( \frac{f(z)}{g(z)} \right) = \frac{[Df(z)] g(z) - f(z) [Dg(z)]}{g(z)^2}.
\]

Proof. We have

\[
D\left( \frac{f(z)}{g(z)} \right) = \frac{Df(z)}{g(z)} + f(z)\, D\left( \frac{1}{g(z)} \right) = \frac{Df(z)}{g(z)} + f(z) \left( -\frac{Dg(z)}{g(z)^2} \right) = \frac{[Df(z)] g(z) - f(z) [Dg(z)]}{g(z)^2}.
\]

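The exponential series is defined by

\[
\exp(z) = e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}.
\]

Theorem 3.54 (Properties of the exponential series).

1. If $f(z), g(z) \in K[[z]]_{\geq 1}$ then

\[
\exp(f(z) + g(z)) = \exp(f(z)) \exp(g(z)).
\]

2. $D \exp(z) = \exp(z)$.

Proof. For (1),

\[
\begin{aligned}
\exp(f(z)) \exp(g(z)) &= \sum_{m=0}^{\infty} \frac{f(z)^m}{m!} \sum_{n=0}^{\infty} \frac{g(z)^n}{n!} \\
&= \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{1}{(n - k)!\, k!} f(z)^{n-k} g(z)^k \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{k=0}^{n} \binom{n}{k} f(z)^{n-k} g(z)^k \\
&= \sum_{n=0}^{\infty} \frac{(f(z) + g(z))^n}{n!} \\
&= \exp(f(z) + g(z)).
\end{aligned}
\]

(2) is clear.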


The logarithm series is defined by

\[
L(z) = -\log(1 - z) = \sum_{n=1}^{\infty} \frac{z^n}{n}.
\]

Theorem 3.55 (Properties of the logarithm series).

1. $DL(z) = (1 - z)^{-1}$.

2. $1 - \exp(-L(z)) = z$.

3. $-L(1 - \exp(z)) = z$.

Proof. (1) is obvious. For (2), let $f(z) = (1 - z) \exp(L(z))$. Then $f(0) = 1$ and

\[
Df(z) = (1 - z) \exp(L(z)) (1 - z)^{-1} - \exp(L(z)) = 0,
\]

so $f(z) = 1$. Hence $\exp(L(z)) = (1 - z)^{-1}$ and $\exp(-L(z)) = 1 - z$. For (3), let $f(z) = z + L(1 - \exp(z))$. Then $f(0) = 0$ and

\[
Df(z) = 1 + \frac{-\exp(z)}{1 - (1 - \exp(z))} = 0,
\]

so $f(z) = 0$.
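The identities of Theorem 3.55 can be verified coefficient by coefficient on truncated series. The sketch below is an illustration under stated assumptions: series are truncated to `N = 8` coefficients (an arbitrary choice), exact rational arithmetic is used, and composition $f(g(z))$ with $\operatorname{ord} g(z) \geq 1$ is evaluated by Horner's rule. Truncation is harmless here because terms of $f$ of degree at least $N$ only contribute to coefficients of degree at least $N$. The helper names `mul` and `compose` are hypothetical.

```python
from fractions import Fraction
from math import factorial

N = 8  # number of coefficients kept (truncation order)

def mul(a, b):
    """Product of two series modulo z^N (both given as lists of length N)."""
    return [sum(a[n - k] * b[k] for k in range(n + 1)) for n in range(N)]

def compose(f, g):
    """f(g(z)) modulo z^N via Horner's rule; requires ord g >= 1."""
    assert g[0] == 0
    result = [Fraction(0)] * N
    for coeff in reversed(f):
        result = mul(result, g)
        result[0] += coeff
    return result

exp_series = [Fraction(1, factorial(n)) for n in range(N)]
log_series = [Fraction(0)] + [Fraction(1, n) for n in range(1, N)]  # L(z)

# Identity (2): 1 - exp(-L(z)) = z
minus_L = [-c for c in log_series]
identity_2 = [-c for c in compose(exp_series, minus_L)]
identity_2[0] += 1

# Identity (3): -L(1 - exp(z)) = z
one_minus_exp = [-c for c in exp_series]
one_minus_exp[0] += 1
identity_3 = [-c for c in compose(log_series, one_minus_exp)]

print(identity_2)
print(identity_3)
```

Both lists should be the coefficients of $z$ itself, that is $0, 1, 0, 0, \dots$.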

Convergence of series

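Let $f(z) = \sum_{n=0}^{\infty} a_n z^n \in K[[z]]$ and let

\[
\alpha = \limsup_{n \to \infty} \sqrt[n]{|a_n|}.
\]

If $\alpha = \infty$, let $R(f(z)) = 0$; if $\alpha = 0$, let $R(f(z)) = \infty$; otherwise, let $R(f(z)) = 1/\alpha$. The number $R(f(z))$ is called the radius of convergence of $f(z)$. For $0 \leq r \leq \infty$, let

\[
K[[z]]_r = \{ f(z) \in K[[z]] : R(f(z)) \geq r \} = \left\{ \sum_{n=0}^{\infty} a_n z^n \in K[[z]] : \limsup_{n \to \infty} \sqrt[n]{|a_n|} \leq \frac{1}{r} \right\}.
\]

Theorem 3.56. $K[[z]]_r$ is a subalgebra of $K[[z]]$. Furthermore, if $f(z) \in K[[z]]_r$ then $Df(z) \in K[[z]]_r$.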


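Proof. We can assume that $r > 0$. Let $f(z), g(z) \in K[[z]]_r$ and write $f(z) = \sum_{n=0}^{\infty} a_n z^n$ and $g(z) = \sum_{n=0}^{\infty} b_n z^n$. If $k \in K$ is nonzero (the case $k = 0$ is trivial) then

\[
\limsup_{n \to \infty} \sqrt[n]{|k a_n|} \leq \left( \lim_{n \to \infty} \sqrt[n]{|k|} \right) \left( \limsup_{n \to \infty} \sqrt[n]{|a_n|} \right) \leq \frac{1}{r}
\]

since $\lim_{n \to \infty} \sqrt[n]{|k|} = 1$. Let $0 < s < r$. Since $1/s > 1/r$, there exists some $N$ such that

\[
|a_n| \leq \frac{1}{s^n} \quad \text{and} \quad |b_n| \leq \frac{1}{s^n}
\]

for all $n \geq N$. Then

\[
\limsup_{n \to \infty} \sqrt[n]{|a_n + b_n|} \leq \limsup_{n \to \infty} \sqrt[n]{|a_n| + |b_n|} \leq \limsup_{n \to \infty} \frac{\sqrt[n]{2}}{s} = \frac{1}{s}.
\]

This holds for any $0 < s < r$, so

\[
\limsup_{n \to \infty} \sqrt[n]{|a_n + b_n|} \leq \frac{1}{r}.
\]

Also, since the $n$th coefficient of $Df(z)$ is $(n + 1) a_{n+1}$,

\[
\limsup_{n \to \infty} \sqrt[n]{|(n + 1) a_{n+1}|} \leq \limsup_{n \to \infty} \sqrt[n]{\frac{n + 1}{s^{n+1}}} = \limsup_{n \to \infty} \frac{\sqrt[n]{n + 1}}{s^{(n+1)/n}} = \frac{1}{s}.
\]

Again this holds for any $0 < s < r$, so

\[
\limsup_{n \to \infty} \sqrt[n]{|(n + 1) a_{n+1}|} \leq \frac{1}{r}
\]

and $Df(z) \in K[[z]]_r$. Now write $f(z) g(z) = \sum_{n=0}^{\infty} c_n z^n$ where $c_n = \sum_{k=0}^{n} a_{n-k} b_k$, and let $0 < s < r$. Since $1/s > 1/r$, there exists some constant $C > 0$ such that

\[
|a_n| \leq \frac{C}{s^n} \quad \text{and} \quad |b_n| \leq \frac{C}{s^n}
\]

for all $n \geq 0$. Then

\[
|c_n| \leq \sum_{k=0}^{n} |a_{n-k}|\, |b_k| \leq \frac{(n + 1) C^2}{s^n}, \tag{*}
\]

so

\[
\limsup_{n \to \infty} \sqrt[n]{|c_n|} \leq \limsup_{n \to \infty} \frac{\sqrt[n]{n + 1}\; C^{2/n}}{s} = \frac{1}{s}.
\]

This holds for any $0 < s < r$, so

\[
\limsup_{n \to \infty} \sqrt[n]{|c_n|} \leq \frac{1}{r}.
\]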

Let $H_r$ be the (unital commutative) algebra of all holomorphic functions on the disc $D_r(0) = \{z \in \mathbb{C} : |z| < r\}$. Let $\Phi : K[[z]]_r \to H_r$ be the map defined by

\[
\Phi\left( \sum_{n=0}^{\infty} a_n z^n \right)(\zeta) = \sum_{n=0}^{\infty} a_n \zeta^n.
\]

Theorem 1.136 shows that the sum on the right converges.

Theorem 3.57. The map $\Phi : K[[z]]_r \to H_r$ is an algebra isomorphism. Furthermore, $\Phi(Df(z)) = \Phi(f(z))'$.

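Proof. It is clear that $\Phi$ is linear. Let $f(z), g(z) \in K[[z]]_r$ and write $f(z) = \sum_{n=0}^{\infty} a_n z^n$ and $g(z) = \sum_{n=0}^{\infty} b_n z^n$. Let $\zeta \in D_r(0)$. From (*) in the proof of Theorem 3.56, the series

\[
\sum_{n=0}^{\infty} \sum_{k=0}^{n} |a_{n-k}|\, |b_k|\, |\zeta|^n
\]

converges. Therefore

\[
\left| \sum_{n=0}^{N} \sum_{k=0}^{n} a_{n-k} b_k \zeta^n - \sum_{m=0}^{N} a_m \zeta^m \sum_{n=0}^{N} b_n \zeta^n \right| \leq \sum_{n=N+1}^{\infty} \sum_{k=0}^{n} |a_{n-k}|\, |b_k|\, |\zeta|^n \to 0
\]

as $N \to \infty$. This shows that

\[
\begin{aligned}
\Phi(f(z) g(z))(\zeta) &= \sum_{n=0}^{\infty} \sum_{k=0}^{n} a_{n-k} b_k \zeta^n = \sum_{m=0}^{\infty} a_m \zeta^m \sum_{n=0}^{\infty} b_n \zeta^n \\
&= \Phi(f(z))(\zeta)\, \Phi(g(z))(\zeta) = (\Phi(f(z)) \Phi(g(z)))(\zeta).
\end{aligned}
\]

If $\Phi(f(z)) = 0$ then $f(z) = 0$ by Theorem 3.20, so $\Phi$ is injective. Now let $\varphi \in H_r$. By Theorem 3.7, we can choose some $f(z) \in K[[z]]_{r/2}$ such that $\varphi(\zeta) = \Phi(f(z))(\zeta)$ for all $\zeta \in D_{r/2}(0)$. Let $0 < s < r$; then Theorem 3.7 shows that there is some $g(z) \in K[[z]]_s$ such that $\varphi(\zeta) = \Phi(g(z))(\zeta)$ for all $\zeta \in D_s(0)$. By Theorem 3.20, $f(z) = g(z)$, so $R(f(z)) \geq s$. This holds for any $0 < s < r$, so $R(f(z)) \geq r$ and $\varphi(\zeta) = \Phi(f(z))(\zeta)$ for all $\zeta \in D_r(0)$. This shows that $\Phi$ is surjective.

Theorem 2.59 shows that $\Phi(Df(z)) = \Phi(f(z))'$.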

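Corollary 3.58. Let $f(z) \in K[[z]]$ be invertible. If $R(f(z)) > 0$, then $R(f(z)^{-1}) > 0$ and $\Phi(f(z)^{-1})(\zeta) = \Phi(f(z))(\zeta)^{-1}$ for sufficiently small $\zeta$.

Theorem 3.59. Let $f(z) \in K[[z]]$ and $g(z) \in K[[z]]_{\geq 1}$. Write $g(z) = \sum_{n=1}^{\infty} b_n z^n$. If $R(f(z)) = r > 0$ and $s > 0$ is chosen so that

\[
\sum_{n=1}^{\infty} |b_n| s^n \leq r,
\]

then $R(f(z) \circ g(z)) \geq s$ and $\Phi(f(z) \circ g(z))(\zeta) = (\Phi(f(z)) \circ \Phi(g(z)))(\zeta)$ for $\zeta \in D_s(0)$.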


4 Integration

From now on, $\mathbf{E}$, $\mathbf{F}$ and $\mathbf{G}$ denote Banach spaces over a field $K$ (where $K = \mathbb{R}$ or $K = \mathbb{C}$). We write $L(\mathbf{E}, \mathbf{F})$ for the space of continuous linear maps from $\mathbf{E}$ to $\mathbf{F}$, and $L(\mathbf{E}) = L(\mathbf{E}, \mathbf{E})$ for the space of linear operators on $\mathbf{E}$.

Notes

Our construction of the Bochner integral is based on [9, 11]. It is possible to start with the integration of nonnegative functions, extend the theory to real and complex functions, and then extend to Banach space valued functions. Instead, the approach used here focuses on the abstract properties of the integral (especially as a continuous linear map on a space of functions), and positivity is important only for showing that real-valued functions can be approximated by simple maps (and are therefore µ-measurable). By taking this approach, many proofs are simplified and work for Banach space valued functions without additional effort. We discuss vector-valued measures alongside positive measures, and these are used to prove Hilbert space versions of the Radon-Nikodym theorem and $L^p$ duality.

The material on Radon measures and Haar measure is based on [5]. Many of the theorems have been extended to take full advantage of the Bochner integral. For example, $L^p$ duality for Hilbert spaces allows us to extend the Riesz representation theorem to Hilbert space valued measures. Since we do not allow integrals to take on the value ∞, the proof of Theorem 4.146 is much longer than in [5]. This theorem is used in the proof of Fubini's theorem for Radon products, which will be essential in Section 5.9 for showing that the convolution of two $L^1$ functions is also $L^1$.

Usually, Lebesgue measure is defined for $\mathbb{R}$ and then extended to $\mathbb{R}^n$ by taking the product measure. This has the major disadvantage of being coordinate dependent. Here we define Lebesgue measure on a finite-dimensional vector space as a Haar measure, and we prove the linear transformation rule $\mu(f(E)) = |\det(f)|\, \mu(E)$ using translation invariance. This proof still requires an arbitrary choice of basis, but the definition of Lebesgue measure does not.


4.1 Measurable spaces and measures

Measurable spaces

A nonempty collection $\mathcal{R}$ of subsets of a nonempty set $X$ is a ring in $X$ if it is closed under unions and relative complements: if $A, B \in \mathcal{R}$ then $A \cup B \in \mathcal{R}$ and $A \setminus B \in \mathcal{R}$. Note that a ring is also closed under intersections, since $A \cap B = A \setminus (A \setminus B)$. Further note that any ring $\mathcal{R}$ is closed under finite unions and finite intersections, and that $\emptyset \in \mathcal{R}$. If $X \in \mathcal{R}$, then $\mathcal{R}$ is called an algebra (or field) in $X$. A σ-ring is a ring $\mathcal{R}$ that is closed under countable unions: if $A_i \in \mathcal{R}$ for $i = 1, 2, \dots$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{R}$. Again, any σ-ring is also closed under countable intersections since if $A = \bigcup_{i=1}^{\infty} A_i$ then $\bigcap_{i=1}^{\infty} A_i = A \setminus \left( \bigcup_{i=1}^{\infty} (A \setminus A_i) \right)$. A σ-ring that is also an algebra in $X$ is called a σ-algebra (or σ-field) in $X$.

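Lemma 4.1. Let $X$ be a set.

1. If $\{\mathcal{R}_\alpha\}_{\alpha \in A}$ is a collection of rings in $X$, then $\bigcap_{\alpha \in A} \mathcal{R}_\alpha$ is a ring in $X$.

2. If $\mathcal{R}$ is a ring in $X$ and $B$ is any subset of $X$, the set

\[
\{ E \cap B : E \in \mathcal{R} \}
\]

is a ring in $X$.

3. Let $\{B_\alpha\}_{\alpha \in A}$ be a partition of $X$. For each $\alpha$, let $\mathcal{R}_\alpha$ be a ring in $B_\alpha$. Then the set

\[
\mathcal{R} = \{ E \subseteq X : E \cap B_\alpha \in \mathcal{R}_\alpha \text{ for all } \alpha \in A \}
\]

is a ring in $X$.

The above statements still hold if "ring" is replaced by algebra, σ-ring, or σ-algebra.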

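Proof. For (3), it is clear that $\mathcal{R}$ is nonempty. If $E, F \in \mathcal{R}$ then for each $\alpha$ we have $(E \setminus F) \cap B_\alpha = (E \cap B_\alpha) \setminus (F \cap B_\alpha) \in \mathcal{R}_\alpha$, so $E \setminus F \in \mathcal{R}$. Let $\{E_1, \dots, E_n\}$ be a collection of subsets in $\mathcal{R}$. Then

\[
\left( \bigcup_{i=1}^{n} E_i \right) \cap B_\alpha = \bigcup_{i=1}^{n} (E_i \cap B_\alpha) \in \mathcal{R}_\alpha
\]

for each $\alpha$, so $\bigcup_{i=1}^{n} E_i \in \mathcal{R}$.

Corollary 4.2. Let $\mathcal{E}$ be any collection of subsets of a set $X$. Then there is a smallest ring (algebra, σ-ring, σ-algebra) $\mathcal{R}$ containing $\mathcal{E}$.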


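Proof. Take $\mathcal{R}$ to be the intersection of all rings containing $\mathcal{E}$, noting that the intersection is not empty since the power set of $X$ is a ring containing $\mathcal{E}$.

The smallest ring (algebra, σ-ring, σ-algebra) containing $\mathcal{E}$ is called the ring (algebra, σ-ring, σ-algebra) generated by $\mathcal{E}$, and is denoted by $\mathcal{M}(\mathcal{E})$. Note that if $\mathcal{E} \subseteq \mathcal{M}(\mathcal{F})$ then $\mathcal{M}(\mathcal{E}) \subseteq \mathcal{M}(\mathcal{F})$.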
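On a finite set $X$, the generated algebra can be computed explicitly, and it coincides with the generated σ-algebra because countable unions reduce to finite ones. The sketch below is an illustration, not the construction used in the proof of Corollary 4.2: rather than intersecting all rings containing $\mathcal{E}$, it closes $\mathcal{E} \cup \{\emptyset, X\}$ under unions and relative complements until nothing new appears. The function name `generated_algebra` and the sample sets are hypothetical.

```python
def generated_algebra(X, E):
    """Smallest algebra of subsets of the finite set X containing every set in E,
    obtained by closing under unions and relative complements."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(s) for s in E}
    changed = True
    while changed:
        changed = False
        for A in list(family):
            for B in list(family):
                for C in (A | B, A - B):
                    if C not in family:
                        family.add(C)
                        changed = True
    return family

algebra = generated_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(s) for s in algebra))
```

For this input the atoms are $\{1\}$, $\{2\}$ and $\{3, 4\}$, so the generated algebra has $2^3 = 8$ members.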

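Definition 4.3. A set $X$ together with a σ-algebra $\mathcal{M}$ is a measurable space, and the members of $\mathcal{M}$ are measurable sets. We often write "$X$ is a measurable space" instead of "$(X, \mathcal{M})$ is a measurable space" when $\mathcal{M}$ is clear from the context.

Let $\{(X_\alpha, \mathcal{M}_\alpha)\}_{\alpha \in A}$ be a collection of measurable spaces, let $X = \prod_{\alpha \in A} X_\alpha$, and let $\pi_\alpha : X \to X_\alpha$ be the canonical projections. The product σ-algebra $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$ on $X$ is the σ-algebra generated by the set

\[
\mathcal{F} = \left\{ \pi_\alpha^{-1}(E_\alpha) : E_\alpha \in \mathcal{M}_\alpha,\ \alpha \in A \right\}.
\]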

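Theorem 4.4 (Properties of product σ-algebras). Let $\{(X_\alpha, \mathcal{M}_\alpha)\}_{\alpha \in A}$ be a collection of measurable spaces.

1. If $A$ is countable then $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$ is generated by

\[
\mathcal{F}_1 = \left\{ \prod_{\alpha \in A} E_\alpha : E_\alpha \in \mathcal{M}_\alpha \right\}.
\]

2. If each $\mathcal{M}_\alpha$ is generated by $\mathcal{E}_\alpha$ then $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$ is generated by

\[
\mathcal{F}_2 = \left\{ \pi_\alpha^{-1}(E_\alpha) : E_\alpha \in \mathcal{E}_\alpha,\ \alpha \in A \right\}.
\]

3. If $A$ is countable in (2) then $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$ is generated by

\[
\mathcal{F}_3 = \left\{ \prod_{\alpha \in A} E_\alpha : E_\alpha \in \mathcal{E}_\alpha \right\}.
\]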
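Proof. Recall that $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha = \mathcal{M}(\mathcal{F})$. For (1), if $E_\alpha \in \mathcal{M}_\alpha$ then $\pi_\alpha^{-1}(E_\alpha) = \prod_{\beta \in A} F_\beta$ where $F_\alpha = E_\alpha$ and $F_\beta = X_\beta$ for $\beta \neq \alpha$, so it is clear that $\mathcal{M}(\mathcal{F}) \subseteq \mathcal{M}(\mathcal{F}_1)$. It also follows that

\[
\prod_{\alpha \in A} E_\alpha = \bigcap_{\alpha \in A} \pi_\alpha^{-1}(E_\alpha) \in \bigotimes_{\alpha \in A} \mathcal{M}_\alpha,
\]

so $\mathcal{M}(\mathcal{F}_1) \subseteq \mathcal{M}(\mathcal{F})$. For (2), since $\mathcal{F}_2 \subseteq \mathcal{F}$ it is clear that $\mathcal{M}(\mathcal{F}_2) \subseteq \mathcal{M}(\mathcal{F})$. For each $\alpha$ the collection $\{ E \subseteq X_\alpha : \pi_\alpha^{-1}(E) \in \mathcal{M}(\mathcal{F}_2) \}$ is a σ-algebra on $X_\alpha$ containing $\mathcal{E}_\alpha$, so it also contains $\mathcal{M}_\alpha$. That is, $\pi_\alpha^{-1}(E) \in \mathcal{M}(\mathcal{F}_2)$ for all $E \in \mathcal{M}_\alpha$ and $\alpha \in A$. Therefore $\mathcal{M}(\mathcal{F}) \subseteq \mathcal{M}(\mathcal{F}_2)$. The proof of (3) is similar to that of (1).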

Definition 4.5. Let $(X, \mathcal{T})$ be a topological space. The Borel σ-algebra $\mathcal{B}_X$ on $X$ is the σ-algebra generated by $\mathcal{T}$. In other words, it is the smallest σ-algebra for which every open set in $X$ is measurable. The sets in the Borel σ-algebra are the Borel sets of $X$. From now on, by default we consider every topological space to be a measurable space with its Borel σ-algebra.

Theorem 4.6 (Product of Borel σ-algebras). Let $X_1, \dots, X_n$ be topological spaces and let $X = \prod_{i=1}^{n} X_i$ with the product topology.

1. $\bigotimes_{i=1}^{n} \mathcal{B}_{X_i} \subseteq \mathcal{B}_X$.

2. If each $X_i$ is second countable, then $\bigotimes_{i=1}^{n} \mathcal{B}_{X_i} = \mathcal{B}_X$.

Proof. Theorem 4.4 shows that $\bigotimes_{i=1}^{n} \mathcal{B}_{X_i}$ is generated by

\[
\mathcal{E} = \left\{ \prod_{i=1}^{n} U_i : U_i \subseteq X_i \text{ is open} \right\},
\]

and (1) follows immediately because these sets are open. (2) follows from the fact that every open set in $X$ can be written as a countable union of sets in $\mathcal{E}$.

Definition 4.7. The extended real number system is created by adding the two elements $+\infty$ (also written $\infty$) and $-\infty$ to $\mathbb{R}$, declaring that $-\infty < x < \infty$ for all real numbers $x$. We denote the ordinary real numbers by $(-\infty, \infty)$ and the extended real numbers by $[-\infty, \infty]$. If $E \subseteq [-\infty, \infty]$ is nonempty then $\inf E$ and $\sup E$ always exist, and may be infinite. Likewise, if $\{x_n\}$ is a sequence in $[-\infty, \infty]$ then $\liminf_{n \to \infty} x_n$ and $\limsup_{n \to \infty} x_n$ always exist.

Measures

Let $(X, \mathcal{M})$ be a measurable space. A positive measure on $\mathcal{M}$ (or more often "on $X$") is a map $\mu : \mathcal{M} \to [0, \infty]$ that is countably additive: whenever $\{A_n\}$ is a disjoint countable collection of sets in $\mathcal{M}$,

\[
\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n). \tag{*}
\]

We also require that $\mu(\emptyset) = 0$. If $A$ is a measurable set then the measure of $A$ is the value $\mu(A)$. If $\mathbf{F}$ is a Banach space, then an $\mathbf{F}$-valued measure (or vector-valued measure) is a countably additive map $\mu : \mathcal{M} \to \mathbf{F}$ such that $\mu(\emptyset) = 0$. A measure is a positive or $\mathbf{F}$-valued measure. If $(X, \mathcal{M})$ is a measurable space with a (positive or $\mathbf{F}$-valued) measure $\mu$, we say that $(X, \mathcal{M}, \mu)$ is a (positive or $\mathbf{F}$-valued) measure space (or simply "$X$ is a measure space"). Note that the condition in (*) implies unconditional convergence, so Theorem 1.21 implies that the series converges absolutely if $\mu$ is a positive measure or if $\mu$ is an $\mathbf{F}$-valued measure where $\mathbf{F}$ is finite-dimensional. If $A \in \mathcal{M}$, we call $\mu(A)$ the ($\mu$-)measure of $A$. If $\mu$ is a positive measure and $\mu(X) < \infty$, we say that $\mu$ is finite.

If $\mathcal{R}$ is a ring in $X$, then a pre-measure on $\mathcal{R}$ (or $X$) is a countably additive map $\mu : \mathcal{R} \to [0, \infty]$ such that $\mu(\emptyset) = 0$. However, we only require countable additivity to hold for a sequence $\{A_n\}$ of sets if $\bigcup_{n=1}^{\infty} A_n \in \mathcal{R}$.

Example 4.8. The following are simple examples of positive measures:

1. If $X$ is a set and $E$ is any subset of $X$, we define $\mu(E) = |E|$ if $E$ is finite and $\mu(E) = \infty$ if $E$ is infinite. This is called the counting measure on $X$.

2. Fix $x_0 \in X$ and define $\mu(E) = 1$ if $x_0 \in E$ and $\mu(E) = 0$ if $x_0 \notin E$. This is called the Dirac measure at $x_0$.

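Theorem 4.9 (Properties of measures). Let $\mu$ be a measure on $\mathcal{M}$.

1. $\mu(A_1 \cup \cdots \cup A_n) = \mu(A_1) + \cdots + \mu(A_n)$ if $A_1, \dots, A_n$ are pairwise disjoint sets in $\mathcal{M}$.

2. If $A_n \in \mathcal{M}$, $A = \bigcup_{n=1}^{\infty} A_n$ and $A_1 \subseteq A_2 \subseteq \cdots$ then $\mu(A_n) \to \mu(A)$.

3. If $A_n \in \mathcal{M}$, $A = \bigcap_{n=1}^{\infty} A_n$, $A_1 \supseteq A_2 \supseteq \cdots$ and $\mu(A_1)$ is finite then $\mu(A_n) \to \mu(A)$.

If $\mu$ is a positive measure, then:

4. $\mu\left( \bigcup_{n=1}^{\infty} A_n \right) \leq \sum_{n=1}^{\infty} \mu(A_n)$ if $\{A_n\}$ is a collection of sets in $\mathcal{M}$, not necessarily disjoint.

5. $A \subseteq B$ implies $\mu(A) \leq \mu(B)$.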

5. A ⊆ B implies µ(A) ≤ µ(B).

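Proof. For (1), take $A_{n+1}, A_{n+2}, \dots = \emptyset$ in the definition of countable additivity. For (2), let $B_1 = A_1$ and $B_n = A_n \setminus A_{n-1}$ for $n \geq 2$. Then

\[
\mu(A_n) = \mu\left( \bigcup_{k=1}^{n} B_k \right) = \sum_{k=1}^{n} \mu(B_k),
\]

so

\[
\mu(A_n) \to \sum_{k=1}^{\infty} \mu(B_k) = \mu\left( \bigcup_{k=1}^{\infty} B_k \right) = \mu(A).
\]

For (3), let $B_n = A_1 \setminus A_n$. Then $B_1 \subseteq B_2 \subseteq \cdots$ and $\bigcup_{n=1}^{\infty} B_n = A_1 \setminus A$, so applying (2) to $\{B_n\}$ shows that

\[
\mu(A_n) = \mu(A_1) - \mu(B_n) \to \mu(A_1) - \mu(A_1 \setminus A) = \mu(A).
\]

For (4), let $B_1 = A_1$ and $B_n = A_n \setminus \bigcup_{k=1}^{n-1} A_k$ for $n \geq 2$. Then $\{B_n\}$ is a disjoint collection of sets in $\mathcal{M}$, so

\[
\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \mu\left( \bigcup_{n=1}^{\infty} B_n \right) = \sum_{n=1}^{\infty} \mu(B_n) \leq \sum_{n=1}^{\infty} \mu(A_n).
\]

For (5), we have

\[
\mu(B) = \mu(A \cup (B \setminus A)) = \mu(A) + \mu(B \setminus A) \geq \mu(A).
\]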

A partition of a set $A$ is a sequence of disjoint measurable sets $\{A_n\}$ such that $A = \bigcup_{n=1}^{\infty} A_n$. Let $\mu$ be a measure. The total variation of $\mu$ is defined by

\[
|\mu|(A) = \sup \sum_{n=1}^{\infty} |\mu(A_n)|
\]

for $A \in \mathcal{M}$, where the sup is taken over all partitions of $A$. (If $\mu$ is a positive measure, then we define $|\infty| = \infty$.)
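For a measure on a small finite set the supremum in the definition of $|\mu|$ can be found by brute force. The sketch below is purely illustrative and rests on several assumptions: the measure is real-valued and given by hypothetical point masses `weights`, every subset is measurable, and only partitions into nonempty blocks are enumerated (empty blocks contribute nothing to the sum). For such a measure the total variation should equal the sum of the absolute values of the point masses.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

# Hypothetical signed point masses defining a real-valued measure.
weights = {"a": 2, "b": -3, "c": 1, "d": -1}

def mu(block):
    return sum(weights[x] for x in block)

def total_variation(A):
    """sup over all partitions of A of the sum of |mu(block)|."""
    return max(sum(abs(mu(block)) for block in p) for p in set_partitions(list(A)))

X = list(weights)
print(abs(mu(X)), total_variation(X))
```

Here $|\mu(X)| = 1$ while $|\mu|(X) = 7$, which illustrates that the inequality $|\mu(A)| \leq |\mu|(A)$ can be strict.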

Theorem 4.10 (Properties of the total variation). Let $\mu$ be a measure.

1. $|\mu(A)| \leq |\mu|(A)$.

2. $A \subseteq B$ implies $|\mu|(A) \leq |\mu|(B)$.

3. $|\mu|$ is a positive measure.

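Proof. For (1), consider the trivial partition $\{A\}$ of $A$. For (2), note that if $\{A_n\}$ is a partition of $A$ then $\{A_n\} \cup \{B \setminus A\}$ is a partition of $B$. For (3) it is clear that $|\mu|(\emptyset) = 0$, so it remains to prove countable additivity. Let $A \in \mathcal{M}$, let $\{A_n\}$ be a partition of $A$, and let $\varepsilon > 0$. For each $n$, let $\{A_{nk}\}$ be a partition of $A_n$ such that

\[
|\mu|(A_n) - \frac{\varepsilon}{2^n} \leq \sum_{k=1}^{\infty} |\mu(A_{nk})|.
\]

Then $\{A_{nk}\}$ for $n, k = 1, 2, \dots$ is a partition of $A$, so

\[
\sum_{n=1}^{\infty} |\mu|(A_n) - \varepsilon \leq \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} |\mu(A_{nk})| \leq |\mu|(A)
\]

and

\[
\sum_{n=1}^{\infty} |\mu|(A_n) \leq |\mu|(A)
\]

since $\varepsilon$ was arbitrary. Conversely, if $\{B_k\}$ is another partition of $A$ then

\[
\sum_{k=1}^{\infty} |\mu(B_k)| = \sum_{k=1}^{\infty} \left| \sum_{n=1}^{\infty} \mu(A_n \cap B_k) \right| \leq \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} |\mu(A_n \cap B_k)| \leq \sum_{n=1}^{\infty} |\mu|(A_n).
\]

This holds for all partitions $\{B_k\}$ of $A$, so

\[
|\mu|(A) \leq \sum_{n=1}^{\infty} |\mu|(A_n).
\]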

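The most important property that characterizes the total variation of a measure is given below.

Theorem 4.11. $|\mu|$ is the smallest positive measure such that $|\mu(A)| \leq |\mu|(A)$ for all $A \in \mathcal{M}$. That is, if $\nu$ is a positive measure such that $|\mu(A)| \leq \nu(A)$ for all $A \in \mathcal{M}$, then $|\mu|(A) \leq \nu(A)$ for all $A \in \mathcal{M}$.

Proof. If $\{A_n\}$ is a partition of $A$, then

\[
\sum_{n=1}^{\infty} |\mu(A_n)| \leq \sum_{n=1}^{\infty} \nu(A_n) = \nu(A).
\]

Taking the supremum over all partitions of $A$ gives $|\mu|(A) \leq \nu(A)$.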


We say that a measure $\mu$ on $X$ is of bounded variation if $|\mu|(X) < \infty$. Note that

\[
|\mu + \nu|(A) \leq |\mu|(A) + |\nu|(A), \qquad |r\mu|(A) = |r|\, |\mu|(A)
\]

for all measures $\mu, \nu$ on $X$, all $A \in \mathcal{M}$, and $r \in K$ (if $\mathbf{F}$ is a Banach space over $K$). Let $M(\mathcal{M}, \mathbf{F})$ be the set of all $\mathbf{F}$-valued measures on $(X, \mathcal{M})$ of bounded variation; then $M(\mathcal{M}, \mathbf{F})$ is a normed vector space if we define

\[
\|\mu\| = |\mu|(X)
\]

for $\mu \in M(\mathcal{M}, \mathbf{F})$. (We use $\|\cdot\|$ to denote the norm of $\mu$ because $|\cdot|$ is used for the total variation of $\mu$.)

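Lemma 4.12. Let $\mu : \mathcal{M} \to \mathbf{F}$ be any map satisfying the following property: if $A_n \in \mathcal{M}$ with $A_1 \supseteq A_2 \supseteq \cdots$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$, then $\mu(A_n) \to 0$. Then $\mu$ is countably additive.

Proof. Let $A \in \mathcal{M}$, let $\{A_n\}$ be a partition of $A$, and let $B_k = \bigcup_{n=k}^{\infty} A_n$ so that $\bigcap_{k=1}^{\infty} B_k = \emptyset$. Then

\[
\left| \mu(A) - \sum_{n=1}^{k-1} \mu(A_n) \right| = |\mu(B_k)| \to 0
\]

as $k \to \infty$.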

Theorem 4.13. The space M(M,F) is complete, i.e. a Banach space.

Proof. Let $\{\mu_n\}$ be a Cauchy sequence in $M(\mathcal{M}, F)$ and define a map $\mu : \mathcal{M} \to F$ as follows: if $A \in \mathcal{M}$ and $\varepsilon > 0$ then there exists some $N$ such that

\[ |\mu_m(A) - \mu_n(A)| = |(\mu_m - \mu_n)(A)| \le |\mu_m - \mu_n|(A) \le |\mu_m - \mu_n|(X) = \|\mu_m - \mu_n\| < \varepsilon \]

for all $m, n \ge N$, so $\{\mu_n(A)\}$ is a Cauchy sequence in $F$ and we can define $\mu(A)$ to be its limit. It is clear that $\mu(\emptyset) = 0$. Let $A_n \in \mathcal{M}$ with $A_1 \supseteq A_2 \supseteq \cdots$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$, and let $\varepsilon > 0$. Choose $M$ such that $\|\mu_m - \mu_n\| < \varepsilon$ for all $m, n \ge M$, and choose $N$ such that $|\mu_M(A_n)| < \varepsilon$ for all $n \ge N$. For $m \ge M$ and $n \ge N$ we have

\[ |\mu_m(A_n)| \le |\mu_M(A_n)| + \|\mu_m - \mu_M\| < 2\varepsilon, \]

so taking $m \to \infty$ shows that $|\mu(A_n)| \le 2\varepsilon$ for $n \ge N$. Therefore $\mu(A_n) \to 0$, and Lemma 4.12 shows that $\mu$ is countably additive.

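Let $\varepsilon > 0$ and choose $N$ such that $\|\mu_m - \mu_n\| < \varepsilon$ for all $m, n \ge N$. Let $\{X_n\}$ be a partition of $X$. For each $n$, choose some $k_n \ge N$ such that $|\mu_{k_n}(X_n) - \mu(X_n)| < \varepsilon/2^n$. Then

\[ \sum_{n=1}^{\infty} |(\mu_m - \mu)(X_n)| \le \sum_{n=1}^{\infty} |(\mu_m - \mu_{k_n})(X_n)| + \sum_{n=1}^{\infty} |(\mu_{k_n} - \mu)(X_n)| \le \|\mu_m - \mu_{k_n}\| + \varepsilon \le 2\varepsilon, \]

so $\|\mu_m - \mu\| \le 2\varepsilon$ for all $m \ge N$. This shows that $\mu_m \to \mu$ as $m \to \infty$. If we choose $\varepsilon = 1$ then

\[ \|\mu\| \le \|\mu - \mu_N\| + \|\mu_N\| < \infty, \]

so in fact $\mu \in M(\mathcal{M}, F)$.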

Theorem 4.14. Let $F, G$ be Banach spaces, let $f : F \to G$ be a continuous linear map, and let $\mu$ be an $F$-valued measure.

1. $f \circ \mu$ is a $G$-valued measure.

2. If $\mu \in M(\mathcal{M}, F)$ then $|f \circ \mu| \le |f|\,|\mu|$. In particular, $\mu \mapsto f \circ \mu$ is a continuous linear map from $M(\mathcal{M}, F)$ to $M(\mathcal{M}, G)$.

Proof. It is clear that $(f \circ \mu)(\emptyset) = 0$. If $A_n \in \mathcal{M}$ with $A_1 \supseteq A_2 \supseteq \cdots$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$ then $\mu(A_n) \to 0$, so $(f \circ \mu)(A_n) \to 0$ because $f$ is continuous. Lemma 4.12 implies that $f \circ \mu$ is countably additive, which proves (1). (2) follows from the fact that

\[ \sum_{n=1}^{\infty} |(f \circ \mu)(A_n)| \le |f| \sum_{n=1}^{\infty} |\mu(A_n)| \le |f|\,|\mu|(A) \]

for all partitions $\{A_n\}$ of $A$.

Theorem 4.15. Let $F$ be a Hilbert space and let $\mu \in M(\mathcal{M}, F)$. If $|\mu(X)| = |\mu|(X)$, then there is some unit vector $u \in F$ such that $\mu(A) = |\mu|(A)\,u$ for all $A \in \mathcal{M}$.


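Proof. We can assume that $\mu(X) \ne 0$; let $u = \mu(X)/|\mu(X)|$. Let $A \in \mathcal{M}$ with $\mu(A) \ne 0$. Then

\[
\begin{aligned}
|\mu|(A)^2 + 2\,|\mu|(A)\,|\mu|(X \setminus A) + |\mu|(X \setminus A)^2
&= \bigl(|\mu|(A) + |\mu|(X \setminus A)\bigr)^2 = |\mu|(X)^2 \\
&= |\mu(X)|^2 = |\mu(A) + \mu(X \setminus A)|^2 \\
&= |\mu(A)|^2 + 2 \operatorname{Re} \langle \mu(A), \mu(X \setminus A) \rangle + |\mu(X \setminus A)|^2 \\
&\le |\mu|(A)^2 + 2 \operatorname{Re} \langle \mu(A), \mu(X \setminus A) \rangle + |\mu|(X \setminus A)^2,
\end{aligned}
\]

and

\[ |\mu(A)|\,|\mu(X \setminus A)| \le |\mu|(A)\,|\mu|(X \setminus A) \le \operatorname{Re} \langle \mu(A), \mu(X \setminus A) \rangle \le |\mu(A)|\,|\mu(X \setminus A)|. \]

Therefore

\[ \langle \mu(A), \mu(X \setminus A) \rangle = |\mu(A)|\,|\mu(X \setminus A)|, \]

and the Cauchy-Schwarz inequality implies that $\mu(X \setminus A) = r(A)\mu(A)$ for some $r(A) \ge 0$. We have

\[ \mu(A) = \frac{1}{1 + r(A)}\,\mu(X) = \frac{|\mu(X)|}{1 + r(A)}\,u, \]

so if we define $\nu : \mathcal{M} \to [0, \infty)$ by

\[ \nu(A) = \frac{|\mu(X)|}{1 + r(A)} \]

when $\mu(A) \ne 0$ and $\nu(A) = 0$ when $\mu(A) = 0$, then $\nu(\emptyset) = 0$ and $\nu$ is countably additive because $\mu$ is. This shows that $\nu$ is a positive measure satisfying $\mu(A) = \nu(A)\,u$ for all $A \in \mathcal{M}$. Theorem 4.11 implies that

\[ |\mu|(A) \le \nu(A) = |\mu(A)| \le |\mu|(A), \]

i.e. $\nu(A) = |\mu|(A)$ for all $A \in \mathcal{M}$.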

Theorem 4.16. If $F$ is finite-dimensional, then every $F$-valued measure is of bounded variation.

Proof. We can assume that $F = \mathbb{R}$; let $\mu$ be an $\mathbb{R}$-valued measure on $X$. Suppose that $A \in \mathcal{M}$ with $|\mu|(A) = \infty$, and $M > 0$. We can choose a partition $\{A_n\}$ of $A$ such that

\[ \sum_{n=1}^{\infty} |\mu(A_n)| > 2M. \]


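Let $I$ be the set of all $i$ for which $\mu(A_i) > 0$, and let $J$ be the set of all $j$ for which $\mu(A_j) < 0$. Then

\[ \sum_{i \in I} \mu(A_i) > M \quad \text{or} \quad \sum_{j \in J} -\mu(A_j) > M. \]

Assume that the first case holds. Then we can choose a finite subset $\{i_1, \dots, i_K\} \subseteq I$ such that

\[ \mu\left( \bigcup_{k=1}^{K} A_{i_k} \right) = \sum_{k=1}^{K} \mu(A_{i_k}) \ge M. \]

This shows that if $|\mu|(A) = \infty$ and $M > 0$, then we can choose a measurable subset $B \subseteq A$ such that $|\mu(B)| \ge M$.

Now suppose that $\mu$ is not of bounded variation, i.e. $|\mu|(X) = \infty$. Let $A_1 = X$. Having chosen $A_1, \dots, A_{n-1}$ with $|\mu|(A_{n-1}) = \infty$, choose a subset $B \subseteq A_{n-1}$ such that $|\mu(B)| \ge |\mu(A_{n-1})| + n$. If $|\mu|(B) = \infty$, then let $A_n = B$, so that $|\mu(A_n)| = |\mu(B)| \ge n$. If $|\mu|(B) < \infty$, then let $A_n = A_{n-1} \setminus B$, noting that

\[ |\mu|(A_n) = |\mu|(A_{n-1}) - |\mu|(B) = \infty \]

and

\[ |\mu(A_n)| = |\mu(A_{n-1}) - \mu(B)| \ge |\mu(B)| - |\mu(A_{n-1})| \ge n. \]

In either case $|\mu|(A_n) = \infty$ and $|\mu(A_n)| \ge n$. We have $A_1 \supseteq A_2 \supseteq \cdots$, so Theorem 4.9 implies that $\lim_{n \to \infty} \mu(A_n)$ exists. This is a contradiction, because $|\mu(A_n)| \ge n$ for each $n \ge 2$.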

Sets of measure zero

The sets of measure zero play an important role in measure theory. Let $(X, \mathcal{M}, \mu)$ be a measure space. A set $E \in \mathcal{M}$ has measure zero (or is $\mu$-null) if $|\mu|(E) = 0$. For a set $E$, we say that a property holds almost everywhere (written a.e.) if there is a set $F$ of measure zero such that the property holds for all elements of $E \setminus F$. We also say that the property holds for almost all $x \in E$. If $|\mu|(E) = 0$ then for every measurable $F \subseteq E$ we have $|\mu|(F) = 0$ by Theorem 4.9. However, we would like this to hold for all subsets of $E$, not just measurable ones. As the following theorem shows, we can always extend $\mu$ so that $|\mu|(F) = 0$ for all subsets $F \subseteq E$:

Theorem 4.17. Let $(X, \mathcal{M}, \mu)$ be a measure space. Let $\overline{\mathcal{M}}$ be the collection of all $E \subseteq X$ for which there are sets $A, B \in \mathcal{M}$ such that $A \subseteq E \subseteq B$ and $|\mu|(B \setminus A) = 0$. In this case, we extend $\mu$ by setting $\overline{\mu}(E) = \mu(A)$. Then $\overline{\mathcal{M}}$ is a σ-algebra in $X$ and $\overline{\mu}$ is a measure on $\overline{\mathcal{M}}$.


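Proof. We first show that $\overline{\mathcal{M}}$ is a σ-algebra. Clearly $X \in \overline{\mathcal{M}}$. If $A \subseteq E \subseteq B$ then $B^c \subseteq E^c \subseteq A^c$ and $A^c \setminus B^c = B \setminus A$, so $E \in \overline{\mathcal{M}}$ implies that $E^c \in \overline{\mathcal{M}}$. Finally, if

\[ A_n \subseteq E_n \subseteq B_n, \quad A = \bigcup_{n=1}^{\infty} A_n, \quad B = \bigcup_{n=1}^{\infty} B_n, \quad E = \bigcup_{n=1}^{\infty} E_n \tag{*} \]

and $|\mu|(B_n \setminus A_n) = 0$ then $A \subseteq E \subseteq B$ and

\[ B \setminus A = \bigcup_{n=1}^{\infty} (B_n \setminus A) \subseteq \bigcup_{n=1}^{\infty} (B_n \setminus A_n), \]

so $|\mu|(B \setminus A) = 0$ by Theorem 4.9. Next, we show that $\overline{\mu}$ is well-defined. Suppose there are sets $A_1, A_2, B_1, B_2 \in \mathcal{M}$ such that $A_1 \subseteq E \subseteq B_1$, $A_2 \subseteq E \subseteq B_2$, $|\mu|(B_1 \setminus A_1) = 0$ and $|\mu|(B_2 \setminus A_2) = 0$. Then $A_1 \setminus A_2 \subseteq E \setminus A_2 \subseteq B_2 \setminus A_2$, so $|\mu|(A_1 \setminus A_2) = 0$. A similar argument shows that $|\mu|(A_2 \setminus A_1) = 0$, so

\[ |\mu(A_1) - \mu(A_2)| = |\mu(A_1 \setminus A_2) - \mu(A_2 \setminus A_1)| \le |\mu|(A_1 \setminus A_2) + |\mu|(A_2 \setminus A_1) = 0. \]

Finally, countable additivity is clear from the fact that if the sets $E_n$ are disjoint in (*) then the sets $A_n$ are also disjoint.

We say that the extended measure is complete, and we call $\overline{\mathcal{M}}$ the $\mu$-completion of $\mathcal{M}$. We will denote the extended measure by $\overline{\mu}$. Note that $|\overline{\mu}| = \overline{|\mu|}$.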
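The construction in Theorem 4.17 can be traced by hand on a small example. The sketch below is only an illustration under stated assumptions: the four-point set `X`, the σ-algebra with atoms {0, 1} and {2, 3}, and the positive measure stored in the dictionary `M` are all hypothetical, and `completion` simply enumerates every $E$ with $A \subseteq E \subseteq B$ and $\mu(B \setminus A) = 0$.

```python
from itertools import combinations

X = frozenset({0, 1, 2, 3})
# Hypothetical measure space: sigma-algebra with atoms {0, 1} and {2, 3},
# stored as {measurable set: measure}. The atom {2, 3} is a null set.
M = {frozenset(): 0.0, frozenset({0, 1}): 1.0, frozenset({2, 3}): 0.0, X: 1.0}

def completion(M, X):
    """Return {E: mu_bar(E)} for every E squeezed between A and B in M with mu(B - A) = 0."""
    subsets = [frozenset(c) for r in range(len(X) + 1)
               for c in combinations(sorted(X), r)]
    completed = {}
    for E in subsets:
        for A in M:
            for B in M:
                # B - A lies in M because M is a sigma-algebra and A is a subset of B.
                if A <= E <= B and M[B - A] == 0.0:
                    completed[E] = M[A]   # mu_bar(E) = mu(A)
    return completed

M_bar = completion(M, X)
```

In this example the completion adds exactly the subsets of the null atom {2, 3} (and their unions with {0, 1}), so `M_bar` has eight members, while {0} is still not measurable.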

σ-finite measures

A measure or pre-measure $\mu$ on a set $X$ is said to be σ-finite if $X$ can be written as a countable union of sets of finite $|\mu|$-measure. A measurable set $E$ is said to be σ-finite if it can be written as a countable union of sets of finite $|\mu|$-measure. Clearly, if $E$ is σ-finite then every measurable subset of $E$ is also σ-finite.

We say that $\mu$ is semifinite if for all $E \in \mathcal{M}$ with $|\mu|(E) = \infty$ there is some measurable set $F \subseteq E$ such that $0 < |\mu|(F) < \infty$. All σ-finite measures are semifinite, because if $X = \bigcup_{n=1}^{\infty} E_n$ where $|\mu|(E_n) < \infty$ then $|\mu|(E) = \infty$ implies $\sum_{n=1}^{\infty} |\mu|(E \cap E_n) = \infty$, and there must be some $n$ such that $|\mu|(E \cap E_n) > 0$.

Theorem 4.18. Let $\mu$ be a semifinite positive measure on $X$ and let $E \in \mathcal{M}$ with $\mu(E) = \infty$. For all $C > 0$, there exists a measurable set $F \subseteq E$ such that $C < \mu(F) < \infty$.


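Proof. Let $\mathcal{R}$ be the collection of all measurable sets $F \subseteq E$ such that $0 < \mu(F) < \infty$. This collection is nonempty because $\mu$ is semifinite. Let $M = \sup_{F \in \mathcal{R}} \mu(F)$ and choose a sequence $\{G_n\}$ in $\mathcal{R}$ such that $\mu(G_n) \to M$. Let $G = \bigcup_{n=1}^{\infty} G_n$. Suppose that $M < \infty$ and $\mu(G) < \infty$; then $\mu(G) \ge M$ because $\mu(G_n) \to M$, and $\mu(E \setminus G) = \infty$ because $\mu(E) = \infty$. Choose a measurable set $H \subseteq E \setminus G$ such that $0 < \mu(H) < \infty$. Then $G \cup H \in \mathcal{R}$, so

\[ M < \mu(G) + \mu(H) = \mu(G \cup H) \le M. \]

This is a contradiction, so either $M = \infty$ or $\mu(G) = \infty$. If $M = \infty$ then there is some $N$ such that $\mu(G_N) > C$. If $\mu(G) = \infty$ then there is some $N$ such that

\[ \mu\left( \bigcup_{n=1}^{N} G_n \right) > C. \]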

Caratheodory’s extension theorem

If $\mathcal{R}$ is a ring consisting of sets of finite measure, then a measurable set $E$ is said to be σ-finite with respect to $\mathcal{R}$ if it can be written as a countable union of sets in $\mathcal{R}$. In this case, it is easy to see that $E$ can be written as a countable union of sets in $\mathcal{R}$ that are pairwise disjoint.

Definition 4.19. Let $\mathcal{M}$ be a σ-algebra in a set $X$. An outer measure on $\mathcal{M}$ is a map $\mu^* : \mathcal{M} \to [0, \infty]$ such that:

1. $\mu^*(\emptyset) = 0$.

2. If $A, B \in \mathcal{M}$ and $A \subseteq B$ then $\mu^*(A) \le \mu^*(B)$.

3. If $A_n \in \mathcal{M}$ then

\[ \mu^*\left( \bigcup_{n=1}^{\infty} A_n \right) \le \sum_{n=1}^{\infty} \mu^*(A_n). \]

The last condition is called countable subadditivity. If $\mu^*$ is an outer measure, a subset $A$ of $X$ is $\mu^*$-measurable if for all subsets $E \subseteq X$ we have

\[ \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c). \]

Theorem 4.20 (Caratheodory's theorem). Let $\mu^*$ be an outer measure on the power set of $X$ and let $\mathcal{M}$ be the collection of all $\mu^*$-measurable subsets of $X$. Then $\mathcal{M}$ is a σ-algebra, and $\mu^*$ is a positive complete measure on $\mathcal{M}$.

Proof. Since the definition of $\mu^*$-measurability is symmetric in $A$ and $A^c$, $\mathcal{M}$ is closed under complements. If $A, B \in \mathcal{M}$ and $E \subseteq X$ then

\[ A \cup B = (A \cap B) \cup (A \cap B^c) \cup (A^c \cap B), \]


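so by subadditivity we have

\[ \mu^*(E \cap A \cap B) + \mu^*(E \cap A \cap B^c) + \mu^*(E \cap A^c \cap B) \ge \mu^*(E \cap (A \cup B)) \]

and

\[
\begin{aligned}
\mu^*(E) &= \mu^*(E \cap A) + \mu^*(E \cap A^c) \\
&= \mu^*(E \cap A \cap B) + \mu^*(E \cap A \cap B^c) + \mu^*(E \cap A^c \cap B) + \mu^*(E \cap A^c \cap B^c) \\
&\ge \mu^*(E \cap (A \cup B)) + \mu^*(E \cap (A \cup B)^c).
\end{aligned}
\]

Since the reverse inequality holds by subadditivity, this shows that $\mathcal{M}$ is closed under finite unions (and intersections). Furthermore, if $A, B \in \mathcal{M}$ and $A \cap B = \emptyset$ then

\[ \mu^*(A \cup B) = \mu^*((A \cup B) \cap A) + \mu^*((A \cup B) \cap A^c) = \mu^*(A) + \mu^*(B), \]

which shows that $\mu^*$ is finitely additive on $\mathcal{M}$. To prove that $\mathcal{M}$ is a σ-algebra, it suffices to show that $\mathcal{M}$ is closed under countable unions of disjoint sets. Let $\{A_n\}$ be a sequence of disjoint sets in $\mathcal{M}$, let $B_n = \bigcup_{k=1}^{n} A_k$ and let $B = \bigcup_{k=1}^{\infty} A_k$. For any $E \subseteq X$ we have

\[ \mu^*(E \cap B_n) = \mu^*(E \cap B_n \cap A_n) + \mu^*(E \cap B_n \cap A_n^c) = \mu^*(E \cap A_n) + \mu^*(E \cap B_{n-1}), \]

so by induction we have $\mu^*(E \cap B_n) = \sum_{k=1}^{n} \mu^*(E \cap A_k)$. Therefore

\[ \mu^*(E) = \mu^*(E \cap B_n) + \mu^*(E \cap B_n^c) \ge \sum_{k=1}^{n} \mu^*(E \cap A_k) + \mu^*(E \cap B^c), \]

so taking $n \to \infty$ gives

\[ \mu^*(E) \ge \sum_{k=1}^{\infty} \mu^*(E \cap A_k) + \mu^*(E \cap B^c) \ge \mu^*(E \cap B) + \mu^*(E \cap B^c) \ge \mu^*(E), \]

where the last two inequalities hold due to subadditivity. Hence equality holds throughout. This proves that $\mathcal{M}$ is a σ-algebra, and taking $E = B$ above shows that $\mu^*$ is countably additive. Finally, if $\mu^*(A) = 0$ and $E \subseteq X$ then

\[ \mu^*(E) \le \mu^*(E \cap A) + \mu^*(E \cap A^c) = \mu^*(E \cap A^c) \le \mu^*(E), \]

so $A \in \mathcal{M}$. This shows that $\mu^*$ is complete.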


Theorem 4.21. Let $\mathcal{R}$ be any collection of subsets of $X$ such that $X$ can be written as a countable union of sets in $\mathcal{R}$, and let $\mu : \mathcal{R} \to [0, \infty]$ be any function such that $\emptyset \in \mathcal{R}$ and $\mu(\emptyset) = 0$.

1. For any $A \subseteq X$, define

\[ \mu^*(A) = \inf \sum_{n=1}^{\infty} \mu(A_n) \]

where the inf is taken over all sequences $\{A_n\}$ in $\mathcal{R}$ whose union contains $A$. Then $\mu^*$ is an outer measure.

2. If $\mathcal{R}$ is a ring and $\mu$ is a pre-measure, then $\mu^*$ extends $\mu$ and every set in $\mathcal{R}$ is $\mu^*$-measurable.

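Proof. For (1) it is clear that $\mu^*(\emptyset) = 0$ and $\mu^*(A) \le \mu^*(B)$ whenever $A \subseteq B$, so it remains to prove that $\mu^*$ is countably subadditive. Let $\{A_n\}$ be a sequence of subsets of $X$; we can assume $\mu^*(A_n) < \infty$ for all $n$. Let $\varepsilon > 0$. For each $n$, let $\{A_k^{(n)}\}_{k \ge 1}$ be a sequence of sets in $\mathcal{R}$ whose union contains $A_n$ and such that $\sum_{k=1}^{\infty} \mu(A_k^{(n)}) \le \mu^*(A_n) + \varepsilon/2^n$. Then $\{A_k^{(n)}\}_{n,k \ge 1}$ is a countable collection of sets whose union contains $\bigcup_{n=1}^{\infty} A_n$, and

\[ \mu^*\left( \bigcup_{n=1}^{\infty} A_n \right) \le \mu^*\left( \bigcup_{n=1}^{\infty} \bigcup_{k=1}^{\infty} A_k^{(n)} \right) \le \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} \mu(A_k^{(n)}) \le \sum_{n=1}^{\infty} \mu^*(A_n) + \varepsilon. \]

Taking $\varepsilon \to 0$ proves (1).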

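For (2), we first show that $\mu^*$ extends $\mu$. Suppose $A \in \mathcal{R}$. Since $A = A \cup \emptyset \cup \emptyset \cup \cdots$, we have $\mu^*(A) \le \mu(A)$. Conversely, if $\varepsilon > 0$ then there is a sequence $\{A_n\}$ of sets in $\mathcal{R}$ whose union contains $A$, and such that $\sum_{n=1}^{\infty} \mu(A_n) \le \mu^*(A) + \varepsilon$. Since $A = \bigcup_{n=1}^{\infty} (A_n \cap A)$ we have

\[ \mu(A) \le \sum_{n=1}^{\infty} \mu(A_n \cap A) \le \sum_{n=1}^{\infty} \mu(A_n) \le \mu^*(A) + \varepsilon, \]

so taking $\varepsilon \to 0$ shows that $\mu(A) \le \mu^*(A)$ and therefore $\mu^*(A) = \mu(A)$.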


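Finally, we show that the sets in $\mathcal{R}$ are $\mu^*$-measurable. Let $A \in \mathcal{R}$ and let $E$ be any subset of $X$. Then $\mu^*(E) \le \mu^*(E \cap A) + \mu^*(E \cap A^c)$ since $\mu^*$ is an outer measure. Let $\varepsilon > 0$. Choose a sequence $\{A_n\}$ of sets in $\mathcal{R}$ whose union contains $E$ and such that $\sum_{n=1}^{\infty} \mu(A_n) \le \mu^*(E) + \varepsilon$. Then $E \cap A \subseteq \bigcup_{n=1}^{\infty} (A_n \cap A)$ and $E \cap A^c \subseteq \bigcup_{n=1}^{\infty} (A_n \cap A^c)$, so

\[ \mu^*(E \cap A) + \mu^*(E \cap A^c) \le \sum_{n=1}^{\infty} \mu(A_n \cap A) + \sum_{n=1}^{\infty} \mu(A_n \cap A^c) = \sum_{n=1}^{\infty} \mu(A_n) \le \mu^*(E) + \varepsilon. \]

Taking $\varepsilon \to 0$ shows that $\mu^*(E \cap A) + \mu^*(E \cap A^c) \le \mu^*(E)$.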
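The covering construction of Theorem 4.21 and Caratheodory's measurability criterion can both be checked by brute force on a finite set. The following is a minimal sketch under assumptions that are not in the text: the set `X`, the ring `R` and its pre-measure values are hypothetical, and because `R` is finite the infimum over countable covers reduces to a minimum over finite subfamilies.

```python
from itertools import combinations

X = frozenset({0, 1, 2, 3})
# Hypothetical ring R with a pre-measure, stored as {set: measure}.
R = {frozenset(): 0, frozenset({0, 1}): 1, frozenset({2, 3}): 2, X: 3}

def powerset(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer(A):
    """mu*(A): cost of the cheapest cover of A by sets of R."""
    best = None
    ring_sets = list(R)
    for r in range(1, len(ring_sets) + 1):
        for family in combinations(ring_sets, r):
            if A <= frozenset().union(*family):
                cost = sum(R[S] for S in family)
                best = cost if best is None else min(best, cost)
    return best

def is_measurable(A):
    """Caratheodory's criterion, tested against every E in the power set of X."""
    return all(outer(E) == outer(E & A) + outer(E - A) for E in powerset(X))

measurable = {A for A in powerset(X) if is_measurable(A)}
```

In this example `outer` agrees with the pre-measure on `R`, and the measurable sets turn out to be exactly the members of `R`; a singleton such as {0} fails the criterion with the test set E = {0, 1}.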

Theorem 4.22 (Caratheodory's extension theorem). Let $\mu$ be a pre-measure on a ring $\mathcal{R}$ in $X$. Assume that $X$ can be written as a countable union of sets in $\mathcal{R}$. Then $\mu$ can be extended to a positive measure on the σ-algebra $\mathcal{M}$ generated by $\mathcal{R}$. If $\mu$ is σ-finite, then the extension is unique. Furthermore, for $A \in \mathcal{M}$ we have

\[ \mu(A) = \inf \sum_{n=1}^{\infty} \mu(A_n) \]

where the inf is taken over all sequences $\{A_n\}$ in $\mathcal{R}$ whose union contains $A$.

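Proof. Existence follows from Theorem 4.21 and Theorem 4.20. To prove uniqueness, write $\mu = \mu^*$ and let $\nu$ be another positive measure on $\mathcal{M}$ that extends $\mu$. Write $X = \bigcup_{m=1}^{\infty} X_m$ where each $X_m \in \mathcal{R}$ has finite $\mu$-measure and the sets $X_m$ are disjoint. It suffices to show that $\nu(E \cap X_m) = \mu(E \cap X_m)$ for all $E \in \mathcal{M}$ and $m \ge 1$. Let $B = E \cap X_m$. We have

\[ \mu(B) = \inf \sum_{n=1}^{\infty} \mu(A_n) = \inf \sum_{n=1}^{\infty} \nu(A_n) \ge \nu(B) \]

where the inf is taken over all sequences $\{A_n\}$ in $\mathcal{R}$ whose union contains $B$. Let $A = \bigcup_{n=1}^{\infty} A_n$; then

\[ \nu(A) = \lim_{n \to \infty} \nu\left( \bigcup_{k=1}^{n} A_k \right) = \lim_{n \to \infty} \mu\left( \bigcup_{k=1}^{n} A_k \right) = \mu(A). \]

For any $\varepsilon > 0$ we can choose $\{A_n\}$ so that $\mu(A) < \mu(B) + \varepsilon$. Therefore $\mu(A \setminus B) < \varepsilon$ and

\[ \mu(B) \le \mu(A) = \nu(A) = \nu(B) + \nu(A \setminus B) \le \nu(B) + \mu(A \setminus B) < \nu(B) + \varepsilon, \]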


which shows that ν(B) = µ(B).

In applications, we often need a simple condition for a set to have measure zero. For example, in most treatments of the multiple Riemann integral, a set $A$ is defined to have measure zero if for every $\varepsilon > 0$ there is a sequence $\{Q_n\}$ of rectangles covering $A$ such that $\sum_{n=1}^{\infty} v(Q_n) < \varepsilon$, where $v(Q_n)$ is the volume of $Q_n$. Thus, we have the following corollary.

Corollary 4.23. Let $(X, \mathcal{M}, \mu)$ be a measure space and let $\mathcal{R} \subseteq \mathcal{M}$ be a ring consisting of sets of finite measure such that $\mathcal{R}$ generates $\mathcal{M}$ and $\mu$ is σ-finite with respect to $\mathcal{R}$. A set $E \in \mathcal{M}$ has measure zero if and only if for every $\varepsilon > 0$ there exists a sequence $\{A_n\}$ in $\mathcal{R}$ whose union contains $E$ and such that

\[ \sum_{n=1}^{\infty} \mu(A_n) < \varepsilon. \]

If $X$ is σ-finite then the ring $\mathcal{R}$ generated by all sets of finite measure clearly generates $\mathcal{M}$, and $\mu$ is σ-finite with respect to $\mathcal{R}$. Therefore Corollary 4.23 implies that a set $E \in \mathcal{M}$ has measure zero if and only if for every $\varepsilon > 0$ there exists a sequence $\{A_n\}$ of sets of finite measure whose union contains $E$ and such that $\sum_{n=1}^{\infty} \mu(A_n) < \varepsilon$.

4.2 Measurable maps

In the following sections, $X$ refers to a measurable space and $\mathbf{E}, \mathbf{F}$ are Banach spaces over a field $K$ (where $K = \mathbb{R}$ or $K = \mathbb{C}$). If $f : X \to \mathbf{E}$ is any map, $|f| : X \to \mathbb{R}$ is the map that sends each $x \in X$ to the norm of $f(x)$.

Definition 4.24. If $(X, \mathcal{M})$ and $(Y, \mathcal{N})$ are measurable spaces, a map $f : X \to Y$ is said to be $(\mathcal{M}, \mathcal{N})$-measurable or just measurable if $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{N}$.

In view of the second part of Lemma 4.1, the restriction of $f$ to any subset $E$ of $X$ can be considered as a map between measurable spaces, and $f|_E$ is measurable if $f$ is measurable. In fact, $f : X \to Y$ is measurable if and only if for every $x \in X$, there is a measurable set $E$ containing $x$ for which $f|_E$ is measurable.

Theorem 4.25 (Properties of measurable maps). Let $(X, \mathcal{M})$ and $(Y, \mathcal{N})$ be measurable spaces.


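1. If $\mathcal{N}$ is generated by $\mathcal{E}$, then $f : X \to Y$ is measurable if and only if $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{E}$.

2. If $f : X \to Y$ and $g : Y \to Z$ are measurable, then $g \circ f$ is measurable.

3. If $\{(Y_\alpha, \mathcal{N}_\alpha)\}_{\alpha \in A}$ are measurable spaces, $Y = \prod_{\alpha \in A} Y_\alpha$, $\mathcal{N} = \bigotimes_{\alpha \in A} \mathcal{N}_\alpha$ and $\pi_\alpha : Y \to Y_\alpha$ are the canonical projections, then $f : X \to Y$ is measurable if and only if every $f_\alpha = \pi_\alpha \circ f$ is measurable.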

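Theorem 4.26 (Image measure). Let $(X, \mathcal{M}, \mu)$ be a measure space and let $(Y, \mathcal{N})$ be a measurable space. If $f : X \to Y$ is measurable, then

\[ (f_*\mu)(E) = \mu(f^{-1}(E)) \]

defines a measure on $Y$, called an image (or pushforward) measure. Furthermore,

\[ |f_*\mu| \le f_*|\mu|. \]

Proof. We have $(f_*\mu)(\emptyset) = \mu(f^{-1}(\emptyset)) = \mu(\emptyset) = 0$ and, for disjoint sets $A_n \in \mathcal{N}$,

\[ (f_*\mu)\left( \bigcup_{n=1}^{\infty} A_n \right) = \mu\left( f^{-1}\left( \bigcup_{n=1}^{\infty} A_n \right) \right) = \mu\left( \bigcup_{n=1}^{\infty} f^{-1}(A_n) \right) = \sum_{n=1}^{\infty} \mu(f^{-1}(A_n)) = \sum_{n=1}^{\infty} (f_*\mu)(A_n). \]

Note that if $\{A_n\}$ is a partition of $A$ then $\{f^{-1}(A_n)\}$ is a partition of $f^{-1}(A)$. Therefore

\[ |f_*\mu|(A) = \sup \sum_{n=1}^{\infty} |\mu(f^{-1}(A_n))| \le \sup \sum_{n=1}^{\infty} |\mu(B_n)| = |\mu|(f^{-1}(A)) = (f_*|\mu|)(A), \]

where the first sup is taken over all partitions $\{A_n\}$ of $A$ and the second sup is taken over all partitions $\{B_n\}$ of $f^{-1}(A)$.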
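For a measure given by finitely many point masses, the image measure simply adds up the masses of all points with the same image, which makes the inequality $|f_*\mu| \le f_*|\mu|$ easy to see numerically. The sketch below is illustrative only; the dictionary representation of a measure, the sample masses, and the names `pushforward`, `total_variation` and `f` are hypothetical choices.

```python
from collections import defaultdict

def pushforward(mu, f):
    """Image measure f_* mu for a measure given by point masses mu[x]."""
    image = defaultdict(float)
    for x, mass in mu.items():
        image[f(x)] += mass          # (f_* mu)({y}) = mu(f^{-1}({y}))
    return dict(image)

def total_variation(mu):
    """|mu| for a point-mass measure: absolute value of each mass."""
    return {x: abs(m) for x, m in mu.items()}

# Hypothetical signed measure on {0, 1, 2} and a map that merges the points 1 and 2.
mu = {0: 1.0, 1: -1.0, 2: 2.0}
f = lambda x: min(x, 1)

lhs = total_variation(pushforward(mu, f))   # |f_* mu|
rhs = pushforward(total_variation(mu), f)   # f_* |mu|
```

Here the masses $-1$ and $2$ partially cancel when the points 1 and 2 are merged, so `lhs[1]` is 1.0 while `rhs[1]` is 3.0; the inequality can be strict.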

Theorem 4.27. Let $f : X \to Y$ be a map into a measurable space. If $f$ is measurable almost everywhere, then $f$ is measurable with respect to $\overline{\mu}$.

Proof. Let $E$ be a set of measure zero such that $f|_{X \setminus E}$ is measurable. If $A \subseteq Y$ is measurable, then

\[ f^{-1}(A) = (f^{-1}(A) \cap E) \cup (f|_{X \setminus E})^{-1}(A) \in \overline{\mathcal{M}} \]

since $f^{-1}(A) \cap E \subseteq E$ is in $\overline{\mathcal{M}}$ and $(f|_{X \setminus E})^{-1}(A)$ is measurable.

Corollary 4.28. Let $f, g : X \to Y$ be maps into a measurable space. If $\mu$ is complete and $f = g$ almost everywhere, then $f$ is measurable if and only if $g$ is measurable.


Borel measurability

Definition 4.29. If $X$ is a measurable space and $Y$ is a topological space, a map $f : X \to Y$ is said to be measurable if $f$ is measurable in the sense of Definition 4.24 when $Y$ is considered as a measurable space equipped with its Borel σ-algebra. It is easy to see that this is equivalent to the condition that $f^{-1}(U)$ is measurable for every open set $U$ in $Y$. Note that any continuous map between topological spaces is measurable.

Example 4.30. Since every open set in $\mathbb{R}$ can be written as a countable union of open intervals, a map $f : X \to \mathbb{R}$ is measurable if and only if $f^{-1}((a, b))$ is measurable whenever $a < b$.

Example 4.31. Let $X$ be a measurable space. Using Theorem 4.25 and the continuity of addition and multiplication, we can derive the following facts:

1. If $f, g : X \to Y$ are measurable maps into a normed vector space over $K$, then $f + g$, $rf$ and $|f|$ are measurable, for any $r \in K$.

2. If $f, g : X \to \mathbb{C}$ are measurable, then $fg$ is measurable.

3. If $f : X \to \mathbb{C}$ is given by $f = u + iv$ where $u, v : X \to \mathbb{R}$, then $f$ is measurable if and only if $u$ and $v$ are measurable.

Measurable functions behave well with respect to limits in a metric space, as the following theorem shows.

Theorem 4.32. Let $X$ be a measurable space and let $Y$ be a metric space. Let $f_n : X \to Y$ be a sequence of measurable maps for $n = 1, 2, \dots$ that converges pointwise to $f$. Then $f$ is measurable.

Proof. We first prove that

\[ f^{-1}(U) \subseteq \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} f_k^{-1}(U) \tag{*} \]

whenever $U$ is open. If $x \in f^{-1}(U)$ then $x \in f_k^{-1}(U)$ for sufficiently large $k$ since $f_k(x) \to f(x)$. Therefore

\[ f^{-1}(U) \subseteq \bigcup_{k=m}^{\infty} f_k^{-1}(U) \]

for all $m$, and (*) follows. Next, we prove that

\[ \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} f_k^{-1}(E) \subseteq f^{-1}(E) \tag{**} \]


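whenever $E$ is closed. If

\[ x \in \bigcup_{k=m}^{\infty} f_k^{-1}(E) \]

for all $m$ then $f_k(x) \in E$ for arbitrarily large $k$, so $f(x) \in E$ since $E$ is closed and $f_k(x) \to f(x)$. Therefore (**) follows.

Now let $U$ be open in $Y$. For each $n = 1, 2, \dots$, let

\[ E_n = \{ y \in Y : d(y, Y \setminus U) \ge 1/n \}, \qquad U_n = \{ y \in Y : d(y, Y \setminus U) > 1/n \}. \]

Then $E_n$ is closed, $U_n$ is open and $U_n \subseteq E_n$ for all $n$, and

\[ U = \bigcup_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} U_n. \]

Therefore we have the inclusions

\[ f^{-1}(U) = \bigcup_{n=1}^{\infty} f^{-1}(E_n) \supseteq \bigcup_{n=1}^{\infty} \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} f_k^{-1}(E_n) \supseteq \bigcup_{n=1}^{\infty} \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} f_k^{-1}(U_n) \]

and

\[ f^{-1}(U) = \bigcup_{n=1}^{\infty} f^{-1}(U_n) \subseteq \bigcup_{n=1}^{\infty} \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} f_k^{-1}(U_n) \]

from (*) and (**). Hence $f^{-1}(U) = \bigcup_{n=1}^{\infty} \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} f_k^{-1}(U_n)$, which is measurable because each $f_k^{-1}(U_n)$ is measurable.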

We can extend the topology of $\mathbb{R}$ to $[-\infty, \infty]$ by declaring $(a, b)$, $[-\infty, a)$ and $(a, \infty]$ to be open sets in $[-\infty, \infty]$. Then we have the following useful result similar to Example 4.30:

Theorem 4.33. Let $X$ be a measurable space. A map $f : X \to [-\infty, \infty]$ is measurable if and only if $f^{-1}((a, \infty])$ is measurable for every $a \in \mathbb{R}$.

We also have a refinement of Theorem 4.32 for maps into $[-\infty, \infty]$:

Theorem 4.34. Let $f_n : X \to [-\infty, \infty]$ be a sequence of measurable maps for $n = 1, 2, \dots$. Then the maps

\[ \inf_{n \ge 1} f_n, \quad \sup_{n \ge 1} f_n, \quad \liminf_{n \to \infty} f_n \quad \text{and} \quad \limsup_{n \to \infty} f_n \]

are measurable. If $f_n \to f$ pointwise, then $f$ is measurable.


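Proof. Let $g = \sup_{n \ge 1} f_n$; then $g^{-1}((a, \infty]) = \bigcup_{n=1}^{\infty} f_n^{-1}((a, \infty])$, so Theorem 4.33 shows that $g$ is measurable. A similar argument shows that $\inf_{n \ge 1} f_n$ is measurable, so

\[ \liminf_{n \to \infty} f_n = \sup_{n \ge 1} \inf_{k \ge n} f_k \quad \text{and} \quad \limsup_{n \to \infty} f_n = \inf_{n \ge 1} \sup_{k \ge n} f_k \]

are measurable. If $f_n \to f$ then $f = \liminf_{n \to \infty} f_n = \limsup_{n \to \infty} f_n$, so $f$ is measurable.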

Simple maps

A map $f : X \to \mathbf{E}$ is said to be simple if it is measurable and $f(X)$ is finite. Any simple map can be written as $f = \sum_{i=1}^{n} f(A_i)\chi_{A_i}$, where the $A_i$ are disjoint measurable sets, $f$ is constant on $A_i$, and $f(A_i)$ denotes the value of $f$ on $A_i$. In that case, we say that $f$ is simple with respect to $\{A_i\}$.

Theorem 4.35 (Properties of simple maps).

1. If $f : X \to \mathbf{E}$ is simple and $A \subseteq X$ is measurable, then $f|_A$ is simple.

2. If $f : X \to \mathbf{E}$ is simple and $g : \mathbf{E} \to \mathbf{F}$ is measurable, then $g \circ f$ is simple.

3. Let $\mathbf{E}_1, \dots, \mathbf{E}_k$ be Banach spaces, let $f : X \to \mathbf{E}_1 \times \cdots \times \mathbf{E}_k$, and let $f_i = \pi_i \circ f$ be the component functions of $f$, where $\pi_i : \mathbf{E}_1 \times \cdots \times \mathbf{E}_k \to \mathbf{E}_i$ is the canonical projection. Then $f$ is simple if and only if every $f_i$ is simple.

From the preceding theorem, it is clear that the set of all simple maps on $X$ forms a vector space. By Theorem 4.32, the pointwise limit of a sequence of simple maps is measurable (if it exists). In fact, the converse is true if $\mathbf{E}$ is finite-dimensional.

Theorem 4.36. If $\mathbf{E}$ is finite-dimensional, then a map $f : X \to \mathbf{E}$ is measurable if and only if it is a pointwise limit of a sequence of simple maps.

Proof. By Theorem 4.25, it suffices to prove the theorem in the case $\mathbf{E} = \mathbb{R}$. For each $n \ge 1$, divide $[-n, n)$ into intervals $J_1, \dots, J_N$, each of length $1/n$, taking $J_k$ to be closed on the left and open on the right. Let $J_{N+1} = \{ y \in \mathbb{R} : y < -n \text{ or } y \ge n \}$. Let $A_k = f^{-1}(J_k)$ for $k = 1, \dots, N + 1$; then each $A_k$ is measurable, the sets $A_k$ are disjoint, and their union is $X$. Define a simple function $f_n : X \to \mathbb{R}$ as follows: for each $k = 1, \dots, N$, define $f_n = \inf_{A_k} f$ on $A_k$, and define $f_n$ on $A_{N+1}$ by setting

\[ f_n(x) = n \text{ if } f(x) \ge n, \qquad f_n(x) = -n \text{ if } f(x) < -n. \]

Then $f_n \to f$ pointwise, since $|f_n(x) - f(x)| \le 1/n$ as soon as $n > |f(x)|$.
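The approximation scheme in this proof is easy to experiment with numerically. The sketch below is a variant rather than the construction itself: instead of the infimum of $f$ over $A_k$ it uses the left endpoint of the interval $J_k$ containing $f(x)$, which also lies within $1/n$ of $f(x)$ and takes finitely many values. The function name `simple_approx` is hypothetical.

```python
import math

def simple_approx(f, n):
    """Return the n-th approximation: f rounded down to the grid of mesh 1/n on [-n, n)."""
    def fn(x):
        y = f(x)
        if y >= n:
            return float(n)       # value on the part of A_{N+1} where f >= n
        if y < -n:
            return float(-n)      # value on the part of A_{N+1} where f < -n
        # Left endpoint of the interval J_k = [j/n, (j+1)/n) that contains y.
        return math.floor(y * n) / n
    return fn

# Usage: approximate f(x) = x**3 and watch the error at a fixed point shrink.
f = lambda x: x ** 3
errors = [abs(simple_approx(f, n)(1.3) - f(1.3)) for n in (3, 10, 100)]
```

Each `fn` takes at most $2n^2 + 2$ distinct values, and at a fixed point the error is at most $1/n$ once $n$ exceeds $|f(x)|$, up to floating-point rounding.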


Corollary 4.37. If $f : X \to [0, \infty]$ is measurable, then $f$ is the pointwise limit of an increasing sequence of simple maps.

Proof. Let $g = f$ but set $g(x) = 0$ whenever $f(x) = \infty$. By Theorem 4.36, there is a sequence $\{g_n\}$ of simple maps converging pointwise to $g$; the sequence constructed in the proof of that theorem also satisfies $g_n \le g$. Define $h_n = \max\{g_1, \dots, g_n\}$, but set $h_n(x) = n$ whenever $f(x) = \infty$. Then $\{h_n\}$ is increasing and converges pointwise to $f$.

µ-measurable maps

Theorem 4.36 proves a useful property of measurable maps into finite-dimensional spaces. Let $\mu$ be a measure on $X$. Since every measurable $f : X \to [0, \infty]$ can be written as the pointwise limit of an increasing sequence $\{f_n\}$ of simple maps, we can define

\[ \int_X f \, d|\mu| = \sup \int_X f_n \, d|\mu|, \]

where the integrals on the right have the usual definition. Thus, the property proved in Theorem 4.36 has a close connection to positivity and the ordering on $\mathbb{R}$. We will not be defining the integral using this approach, but the property in Theorem 4.36 will be important later on.

Since $\mathbf{E}$ may be infinite-dimensional, we have the following definition. A map $f : X \to \mathbf{E}$ is $\mu$-measurable if there exists a sequence $\{f_n\}$ of simple maps such that $f_n \to f$ (pointwise) almost everywhere. It is clear that every $\mu$-measurable map is measurable almost everywhere. If $\mathbf{E}$ is finite-dimensional, then Theorem 4.36 implies that every measurable map is also $\mu$-measurable.

Theorem 4.38 (Properties of µ-measurable maps).

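1. If $f : X \to \mathbf{E}$ is $\mu$-measurable and $A \subseteq X$ is measurable, then $f|_A$ is $\mu$-measurable.

2. If $U \subseteq \mathbf{E}$ is an open set, $f : X \to U$ is $\mu$-measurable and $g : U \to \mathbf{F}$ is continuous, then $g \circ f$ is $\mu$-measurable.

3. Let $\mathbf{E}_1, \dots, \mathbf{E}_k$ be Banach spaces, let $f : X \to \mathbf{E}_1 \times \cdots \times \mathbf{E}_k$, and let $f_i = \pi_i \circ f$ be the component functions of $f$, where $\pi_i : \mathbf{E}_1 \times \cdots \times \mathbf{E}_k \to \mathbf{E}_i$ is the canonical projection. Then $f$ is $\mu$-measurable if and only if every $f_i$ is $\mu$-measurable.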

Proof. For (2), let $\{f_n\}$ be a sequence of simple maps converging to $f$ almost everywhere. Let $g_n(x) = g(f_n(x))$ if $f_n(x) \in U$ and $g_n(x) = 0$ if $f_n(x) \notin U$. Then $\{g_n\}$ is a sequence of simple maps converging to $g \circ f$ almost everywhere. For (3), suppose that every $f_i$ is $\mu$-measurable. For each $i$, let $\{f_{i,n}\}$ be a sequence of simple maps converging to $f_i$ almost everywhere. Theorem 4.35 shows that the map $f_n = (f_{1,n}, \dots, f_{k,n}) : X \to \mathbf{E}_1 \times \cdots \times \mathbf{E}_k$ is simple, and $f_n$ converges to $f$ almost everywhere.

An important property of µ-measurable maps is that the pointwise limit of asequence of µ-measurable maps is still µ-measurable.

Theorem 4.39. Let $f : X \to \mathbf{E}$ be any map. The following are equivalent:

1. $f$ is $\mu$-measurable.

2. There is a set $E$ of measure zero such that $f|_{X \setminus E}$ is measurable and $f(X \setminus E)$ is separable.

Proof. If $f$ is $\mu$-measurable, then there exists a set $E$ of measure zero and a sequence $\{f_n\}$ of simple maps such that $f_n \to f$ pointwise on $X \setminus E$. Theorem 4.32 shows that $f|_{X \setminus E}$ is measurable. Let $S = \bigcup_{n=1}^{\infty} f_n(X)$, which is countable. Then $\overline{S}$ is separable and $f(X \setminus E) \subseteq \overline{S}$, so $f(X \setminus E)$ is separable. This proves (1) ⇒ (2). Conversely, suppose that (2) holds. Let $\{a_n\}$ be a countable dense subset of $f(X \setminus E)$ and let

\[ A_{i,j} = \{ x \in X \setminus E : |f(x) - a_i| < 1/j \} \]

for $i, j \ge 1$. Since $f|_{X \setminus E}$ is measurable, each $A_{i,j}$ is measurable. Let

\[ B_{i,j}^{(n)} = A_{i,j} \setminus \bigcup_{(i,j) < (k,\ell) \le (n,n)} A_{k,\ell} \]

where we order the pairs $(i, j)$ by taking $(i, j) \le (k, \ell)$ if $j < \ell$, or $j = \ell$ and $i \le k$. For each $n$, the sets $B_{i,j}^{(n)}$ are disjoint. Let

\[ f_n = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i \chi_{B_{i,j}^{(n)}}; \]

we want to show that $f_n \to f$ on $X \setminus E$. Let $x \in X \setminus E$ and $\varepsilon > 0$. Choose $j_0$ such that $1/j_0 < \varepsilon$ and choose $i_0$ such that $|f(x) - a_{i_0}| < 1/j_0$, so that $x \in A_{i_0,j_0}$. Let $N = \max(i_0, j_0)$. If $n > N$ then $x \in B_{k,\ell}^{(n)}$, where

\[ (k, \ell) = \max \{ (i, j) : x \in A_{i,j} \text{ and } (i_0, j_0) \le (i, j) \le (n, n) \}. \]

Then $f_n(x) = a_k$ and $|f(x) - f_n(x)| < 1/\ell \le 1/j_0 < \varepsilon$. This shows that $f_n(x) \to f(x)$.


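Corollary 4.40. If $f : X \to \mathbf{E}$ is $\mu$-measurable, then there exists a $\mu$-measurable map $g : X \to \mathbf{E}$ such that $f = g$ almost everywhere and $g(X)$ is contained in a closed, separable subspace of $\mathbf{E}$.

Proof. Let $E$ be as in Theorem 4.39, set $g = f$ on $X \setminus E$, and set $g = 0$ on $E$. Let $S$ be a countable dense subset of $g(X)$ and let $F = \overline{\operatorname{span}(S)}$. Then $g(X) \subseteq F$ and $F$ is a closed and separable subspace of $\mathbf{E}$.

Corollary 4.41. Let $\{f_n\}$ be a sequence of $\mu$-measurable maps converging almost everywhere to a map $f$. Then $f$ is $\mu$-measurable.

Proof. Let $E_0$ be a set of measure zero such that $f_n \to f$ pointwise on $X \setminus E_0$. For each $n \ge 1$, let $E_n$ be a set of measure zero such that $f_n|_{X \setminus E_n}$ is measurable and $f_n(X \setminus E_n)$ is separable. Let $E = \bigcup_{n=0}^{\infty} E_n$. Then $E$ has measure zero and $f_n \to f$ pointwise on $X \setminus E$, so $f|_{X \setminus E}$ is measurable by Theorem 4.32. Let $S = \bigcup_{n=1}^{\infty} f_n(X \setminus E_n)$, which is separable. Then $\overline{S}$ is separable and $f(X \setminus E) \subseteq \overline{S}$, so $f(X \setminus E)$ is separable, and Theorem 4.39 shows that $f$ is $\mu$-measurable.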

Modes of convergence

Definition 4.42. Let $\{f_n\}$ be a sequence of maps from $X$ to $\mathbf{E}$ and let $f : X \to \mathbf{E}$. Let $\mu$ be a positive measure on $X$ and let $\overline{\mu}$ be the completion of $\mu$ as defined in Theorem 4.17.

1. We say that $\{f_n\}$ converges to $f$ (pointwise) almost everywhere (a.e.) if $f_n(x) \to f(x)$ for almost all $x \in X$.

2. We say that $\{f_n\}$ converges to $f$ almost uniformly (a.u.) if for every $\varepsilon > 0$ there exists a measurable set $E \subseteq X$ with $\mu(E) < \varepsilon$ such that $f_n \to f$ uniformly on $X \setminus E$.

3. We say that $\{f_n\}$ converges to $f$ in measure if all $f_n$ and $f$ are measurable almost everywhere and for every $\varepsilon > 0$ we have

\[ \mu(\{ x \in X : |f_n(x) - f(x)| \ge \varepsilon \}) \to 0 \]

as $n \to \infty$.

Also:

1. We say that $\{f_n\}$ is almost uniformly Cauchy if for every $\varepsilon > 0$ there exists a measurable set $E \subseteq X$ with $\mu(E) < \varepsilon$ such that $\{f_n\}$ is uniformly Cauchy on $X \setminus E$. (That is, for all $\delta > 0$ there exists some $N$ such that $|f_m(x) - f_n(x)| < \delta$ for all $x \in X \setminus E$ and $m, n \ge N$.)


2. We say that $\{f_n\}$ is Cauchy in measure if all $f_n$ are measurable almost everywhere and for every $\varepsilon > 0$ we have

\[ \mu(\{ x \in X : |f_m(x) - f_n(x)| \ge \varepsilon \}) \to 0 \]

as $m, n \to \infty$. (That is, for all $\delta > 0$ there exists some $N$ such that

\[ \mu(\{ x \in X : |f_m(x) - f_n(x)| \ge \varepsilon \}) < \delta \]

for all $m, n \ge N$.)

If $\mu$ is a vector-valued measure, we extend the above definitions by replacing $\mu$ with $|\mu|$. From now on, we will assume that $\mu$ is a positive measure.

Theorem 4.43 (Properties of convergence a.e., a.u., and in measure). Suppose that $f_n \to f$ and $g_n \to g$ almost everywhere.

1. $f_n + g_n \to f + g$ almost everywhere. If $r \in K$, then $rf_n \to rf$ almost everywhere.

2. If $h$ is any map equal to $f$ almost everywhere, then $f_n \to h$ almost everywhere.

3. $|f_n| \to |f|$ almost everywhere.

4. If $E \subseteq X$, then $f_n|_E \to f|_E$ almost everywhere.

5. If $g_n$ is nonnegative, $g_n \to 0$ almost everywhere, and $|f_n| \le g_n$ for all $n$, then $f_n \to 0$ almost everywhere.

The above statements still hold if “almost everywhere” is replaced by “almost uniformly” or “in measure”.

Proof. The statements are clear for almost everywhere and almost uniform convergence. Suppose that $f_n \to f$ and $g_n \to g$ in measure, and let $\varepsilon > 0$. Then

\[ \{ x \in X : |(f_n + g_n)(x) - (f + g)(x)| \ge \varepsilon \} \subseteq \{ x \in X : |f_n(x) - f(x)| \ge \varepsilon/2 \} \cup \{ x \in X : |g_n(x) - g(x)| \ge \varepsilon/2 \}, \]

so applying $\mu$ to both sides shows that $f_n + g_n \to f + g$ in measure. Similarly, $rf_n \to rf$ in measure. This proves (1). For (3) and (4), we have

\[ \{ x \in X : \bigl| |f_n(x)| - |f(x)| \bigr| \ge \varepsilon \} \subseteq \{ x \in X : |f_n(x) - f(x)| \ge \varepsilon \}, \]

\[ \{ x \in E : |f_n(x) - f(x)| \ge \varepsilon \} \subseteq \{ x \in X : |f_n(x) - f(x)| \ge \varepsilon \}. \]

For (5), if $g_n \to 0$ in measure and $|f_n| \le g_n$ then

\[ \{ x \in X : |f_n(x)| \ge \varepsilon \} \subseteq \{ x \in X : |g_n(x)| \ge \varepsilon \}, \]

so applying $\mu$ to both sides shows that $f_n \to 0$ in measure.


Theorem 4.44 (Properties of Cauchy sequences).

1. If $f_n \to f$ almost uniformly, then $\{f_n\}$ is almost uniformly Cauchy.

2. If $\{f_n\}$ is almost uniformly Cauchy and $f_{n_k} \to f$ almost uniformly for some subsequence $\{f_{n_k}\}$, then $f_n \to f$ almost uniformly.

The above statements still hold if “almost uniformly” is replaced by “in measure” and “almost uniformly Cauchy” is replaced by “Cauchy in measure”.

Some modes of convergence are stronger than others, as the following theoremsshow.

Theorem 4.45. Suppose that $f_n \to f$ almost uniformly.

1. $f_n \to f$ almost everywhere.

2. If each $f_n$ is measurable almost everywhere, then $f_n \to f$ in measure.

Proof. For each $k = 1, 2, \dots$, there exists some $E_k$ with $\mu(E_k) < 1/k$ such that $f_n \to f$ uniformly on $X \setminus E_k$. Then $\mu\left( \bigcap_{k=1}^{\infty} E_k \right) = 0$, and $f_n \to f$ pointwise on $X \setminus \bigcap_{k=1}^{\infty} E_k = \bigcup_{k=1}^{\infty} (X \setminus E_k)$. Now suppose that each $f_n$ is measurable almost everywhere, and let $\delta, \varepsilon > 0$. Choose $E \subseteq X$ such that $\mu(E) < \delta$ and $f_n \to f$ uniformly on $X \setminus E$. There exists some $N$ such that $|f_n(x) - f(x)| < \varepsilon$ for all $x \in X \setminus E$ and $n \ge N$. Then

\[ \mu(\{ x \in X : |f_n(x) - f(x)| \ge \varepsilon \}) \le \mu(E) < \delta \]

for all $n \ge N$.
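The implications in Theorem 4.45 cannot be reversed in general. A standard illustration, sketched below under the assumption that $X = [0, 1)$ carries Lebesgue measure (which is not constructed at this point in the text), is the sequence of indicator functions of dyadic intervals that sweep repeatedly across $[0, 1)$: it converges to 0 in measure, yet at every point it takes the value 1 infinitely often. The names `interval`, `f` and `measure_of_support` are hypothetical.

```python
from fractions import Fraction

def interval(n):
    """n-th sweeping interval [j/2^k, (j+1)/2^k), where n = 2^k + j and 0 <= j < 2^k."""
    k = n.bit_length() - 1
    j = n - (1 << k)
    return Fraction(j, 1 << k), Fraction(j + 1, 1 << k)

def f(n, x):
    """Indicator function of the n-th interval, evaluated at x in [0, 1)."""
    a, b = interval(n)
    return 1 if a <= x < b else 0

def measure_of_support(n):
    """Length of {x : |f_n(x)| >= eps}, the same for every 0 < eps <= 1."""
    a, b = interval(n)
    return b - a

# The exceptional sets shrink, so f_n -> 0 in measure ...
lengths = [measure_of_support(n) for n in (1, 2, 4, 8, 1024)]
# ... but a fixed point is hit exactly once in every block 2^k <= n < 2^(k+1).
x = Fraction(1, 3)
hits_per_block = [sum(f(n, x) for n in range(2 ** k, 2 ** (k + 1))) for k in range(8)]
```

Since every block of indices contains one $n$ with $f_n(x) = 1$, the sequence $f_n(x)$ does not converge to 0 for any $x$, so convergence in measure here gives neither almost everywhere nor almost uniform convergence of the full sequence.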

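Theorem 4.46 (Egoroff's theorem). If $\mu(X) < \infty$, each $f_n$ is measurable, and $f_n \to f$ almost everywhere, then $f_n \to f$ almost uniformly.

Proof. By removing a set of measure zero, we can assume that $f_n \to f$ everywhere. Let

\[ E_{m,n} = \{ x \in X : |f(x) - f_k(x)| \ge 1/m \text{ for some } k \ge n \} \]

for $m, n \ge 1$. Then each $E_{m,n}$ is measurable, $E_{m,1} \supseteq E_{m,2} \supseteq \cdots$ and $\bigcap_{n=1}^{\infty} E_{m,n} = \emptyset$ since $f_n \to f$ pointwise. Noting that $\mu(E_{m,1}) < \infty$, Theorem 4.9 shows that $\mu(E_{m,n}) \to 0$ as $n \to \infty$. Let $\varepsilon > 0$, and for each $m$ choose some $n_m$ such that $\mu(E_{m,n_m}) < \varepsilon/2^m$. Let $E = \bigcup_{m=1}^{\infty} E_{m,n_m}$. Then

\[ \mu(E) \le \sum_{m=1}^{\infty} \mu(E_{m,n_m}) < \sum_{m=1}^{\infty} \frac{\varepsilon}{2^m} = \varepsilon, \]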


and it remains to show that $f_n \to f$ uniformly on $X \setminus E$. Let $\delta > 0$ and choose $m$ such that $1/m < \delta$. If $x \in X \setminus E$ then $x \notin E_{m,n_m}$, so

\[ |f(x) - f_k(x)| < 1/m < \delta \]

for all $k \ge n_m$.

Theorem 4.47 (Uniqueness of limits). Suppose that $f_n \to f$ almost everywhere, almost uniformly, or in measure. Also suppose that $f_n \to g$ almost everywhere, almost uniformly, or in measure. Then $f = g$ almost everywhere.

Proof. By Theorem 4.45, we only have to consider three cases. If $f_n \to f$ and $f_n \to g$ almost everywhere then $0 \to f - g$ almost everywhere, so it is clear that $f = g$ almost everywhere. Similarly, if $f_n \to f$ and $f_n \to g$ in measure then $0 \to f - g$ in measure, so for every $\varepsilon > 0$ we have

\[ \mu(\{ x \in X : |f(x) - g(x)| \ge \varepsilon \}) = 0. \tag{*} \]

But

\[ \{ x \in X : f(x) \ne g(x) \} = \bigcup_{k=1}^{\infty} \{ x \in X : |f(x) - g(x)| \ge 1/k \}, \]

so $f = g$ almost everywhere. Finally, suppose that $f_n \to f$ almost everywhere and $f_n \to g$ in measure. To prove that $f = g$ almost everywhere, we will show (*). Let $\varepsilon > 0$, let $E = \{ x \in X : |f(x) - g(x)| \ge \varepsilon \}$ and suppose that $\mu(E) > 0$. Let

\[ E_n = \{ x \in E : |f_k(x) - f(x)| \le \varepsilon/2 \text{ for all } k \ge n \}. \]

Then $E_1 \subseteq E_2 \subseteq \cdots$ and $x \in \bigcup_{n=1}^{\infty} E_n$ for almost all $x \in E$ since $f_n \to f$ almost everywhere. Therefore $C = \mu\left( \bigcup_{n=1}^{\infty} E_n \right) > 0$. By Theorem 4.9 we have $\mu(E_n) \to C$ as $n \to \infty$, so $\mu(E_n) > 0$ for some $n$. But then

\[ |f_k(x) - g(x)| = |f_k(x) - f(x) + f(x) - g(x)| \ge |f(x) - g(x)| - |f_k(x) - f(x)| \ge \varepsilon/2 \]

for all $x \in E_n$ and $k \ge n$, so $f_n$ cannot converge to $g$ in measure.

For sequences that are almost uniformly Cauchy or Cauchy in measure, we have completeness results.

Theorem 4.48 (Completeness for almost uniformly Cauchy sequences). Let $\{f_n\}$ be almost uniformly Cauchy. Then there exists some (unique almost everywhere) $f : X \to \mathbf{E}$ such that $f_n \to f$ almost uniformly.


Proof. For each $m = 1, 2, \dots$, choose some $E_m \subseteq X$ such that $\mu(E_m) < 1/m$ and $\{f_n\}$ is uniformly Cauchy on $X \setminus E_m$. Let $E = \bigcap_{m=1}^{\infty} E_m$ so that $\mu(E) = 0$. If $x \in X \setminus E$ then $x \in X \setminus E_m$ for some $m$, so $\{f_n(x)\}$ is a Cauchy sequence. Since $\mathbf{E}$ is complete, we can define $f(x) = \lim_{n \to \infty} f_n(x)$ for each $x \in X \setminus E$. Set $f = 0$ on $E$. Let $\varepsilon > 0$ and choose $m$ such that $1/m < \varepsilon$. Then $\mu(E_m) < 1/m < \varepsilon$, and $\{f_n\}$ is uniformly Cauchy on $X \setminus E_m$. We have $f_n \to f$ pointwise on $X \setminus E_m$, so $\{f_n\}$ converges uniformly to $f$ on $X \setminus E_m$.

Lemma 4.49. If $\{f_n\}$ is Cauchy in measure, then there exists a subsequence $\{f_{n_k}\}$ that is almost uniformly Cauchy.

Proof. Let $n_1 = 1$. Having chosen $n_1, \dots, n_{k-1}$, choose $n_k$ such that $n_k > n_{k-1}$, and such that

\[ \mu(\{ x \in X : |f_m(x) - f_n(x)| \ge 2^{-k} \}) < 2^{-k} \]

for all $m, n \ge n_k$. Write $g_k = f_{n_k}$. Let $\varepsilon > 0$ and choose $K$ such that $\sum_{k=K}^{\infty} 2^{-k} < \varepsilon$. Let

\[ E = \bigcup_{k=K}^{\infty} \{ x \in X : |g_k(x) - g_{k+1}(x)| \ge 2^{-k} \} \]

so that $\mu(E) < \varepsilon$. We will show that $\{g_k\}$ is uniformly Cauchy on $X \setminus E$. Let $\delta > 0$ and choose $N$ such that $N \ge K$ and $2^{-N+1} < \delta$. Then for all $n > m \ge N$ and $x \in X \setminus E$ we have

\[ |g_m(x) - g_n(x)| = \left| \sum_{k=m}^{n-1} (g_k(x) - g_{k+1}(x)) \right| \le \sum_{k=m}^{n-1} |g_k(x) - g_{k+1}(x)| \le \sum_{k=m}^{n-1} 2^{-k} \le \sum_{k=N}^{\infty} 2^{-k} < \delta. \]

Theorem 4.50 (Completeness for Cauchy in measure sequences). Let {f_n} be Cauchy in measure. Then there exists some (unique almost everywhere) f : X → E such that f_n → f in measure.

Proof. By Lemma 4.49, there exists a subsequence {f_{n_k}} that is almost uniformly Cauchy, and Theorem 4.48 shows that there is some f : X → E such that f_{n_k} → f almost uniformly. Theorem 4.45 shows that f_{n_k} → f in measure, and Theorem 4.44 shows that f_n → f in measure.


4.3 The integral

Integration of simple maps

Let X be a measurable space and let µ be a positive measure on X. A simple map f : X → E is integrable if µ({x ∈ X : f(x) ≠ 0}) < ∞. Suppose that f is simple with respect to some collection {A_i} with µ(A_i) < ∞ for all i. We define the integral of f by

\[ \int_X f \, d\mu = \int_{x \in X} f(x) \, d\mu = \sum_{i=1}^{n} f(A_i)\mu(A_i). \]

When the measure µ is fixed, we will write

\[ \int_X f = \int_X f \, d\mu \]

for simplicity. If f is simple with respect to some other collection {B_j} where µ(B_j) < ∞, then f is simple with respect to {A_i ∩ B_j} and

\[ \sum_{j=1}^{m} f(B_j)\mu(B_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} f(A_i \cap B_j)\mu(A_i \cap B_j) = \sum_{i=1}^{n} f(A_i)\mu(A_i). \]

This shows that the integral of f is well-defined. If E ⊆ X is measurable, we write

\[ \int_E f = \int_E f|_E. \]
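As an illustration of the definition and of its independence from the chosen collection, the following minimal sketch computes the integral of a simple map on a four-point space in two ways. The space, the weights `mu`, and the map `f` are hypothetical choices made only for this example; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

# Hypothetical measure space: X = {0, 1, 2, 3} with point masses mu({x}).
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(2)}
# A simple map f : X -> R taking two values.
f = {0: 5, 1: 5, 2: -1, 3: -1}

def measure(block):
    """mu(A) for a finite set A of points."""
    return sum(mu[x] for x in block)

def integral_over(partition):
    """Sum of f(A_i) * mu(A_i); f is constant on each block A_i."""
    return sum(f[next(iter(block))] * measure(block) for block in partition)

A = [{0, 1}, {2, 3}]        # the level sets of f
B = [{0}, {1}, {2}, {3}]    # a finer collection; f is also simple with respect to it

coarse = integral_over(A)
fine = integral_over(B)
```

Both collections give the same value, as the computation with A_i ∩ B_j above predicts.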

Let I(X,µ,E) = I(µ,E) = I(µ) be the vector space of all integrable simple maps on X. If we define

\[ \|f\|_1 = \int_X |f| = \sum_{i=1}^{n} |f(A_i)|\mu(A_i), \]

then ‖·‖_1 is a seminorm on I(µ). We call ‖·‖_1 the L^1-norm. If we let I_0(µ) = {f ∈ I(µ) : ‖f‖_1 = 0}, then I(µ)/I_0(µ) is a normed vector space. We will denote an element of I(µ)/I_0(µ) by [f] = f + I_0(µ), where f ∈ I(µ). Note that if f ∈ I_0(µ) then f = 0 almost everywhere.

It is easy to check that

\[ \int_X (f + g) = \int_X f + \int_X g, \qquad \int_X rf = r\int_X f \]

for all f, g ∈ I(µ) and r ∈ K, so ∫_X : I(µ) → E is linear. Also, I_0(µ) ⊆ ker ∫_X, so the integral descends to a linear map on I(µ)/I_0(µ) which we will also denote by ∫_X. Since

\[ \left|\int_X f\right| = \left|\sum_{i=1}^{n} f(A_i)\mu(A_i)\right| \leq \sum_{i=1}^{n} |f(A_i)|\mu(A_i) = \|f\|_1, \]

the linear map ∫_X : I(µ)/I_0(µ) → E is bounded (and continuous).

Theorem 4.51 (Properties of the integral). Let f, g be integrable and let r ∈ K. Then f + g, rf and |f| are integrable, and:

1. |∫_X f| ≤ ∫_X |f| = ‖f‖_1.

2. ∫_X (f + g) = ∫_X f + ∫_X g and ∫_X rf = r ∫_X f.

3. If A ⊆ X is measurable, then f|_A is integrable and ∫_A f = ∫_A f|_A = ∫_X χ_A f.

4. If X = X_1 ∪ X_2 where X_1, X_2 are disjoint and measurable, then f|_{X_1} and f|_{X_2} are integrable, and

\[ \int_X f = \int_{X_1} f + \int_{X_2} f. \]

5. If E = R and f ≥ 0, then ∫_X f ≥ 0.

6. If µ(X) < ∞ and e ∈ E, then the constant map e : X → E is integrable and ∫_X e dµ = µ(X)e.

7. If ϕ : E → F is a continuous linear map, then ϕ ∘ f is integrable and ∫_X ϕ ∘ f = ϕ(∫_X f).

8. If E = E_1 × ··· × E_n where E_1, ..., E_n are Banach spaces, then

\[ \int_X f = \left(\int_X f_1, \ldots, \int_X f_n\right) \]

where f_i = π_i ∘ f and π_i : E → E_i is the canonical projection. Also, f is integrable if and only if every f_i is integrable.

9. If µ, ν are positive measures on (X, ℳ) and f is integrable with respect to µ + ν, then f is integrable with respect to both µ and ν and

\[ \int_X f \, d(\mu + \nu) = \int_X f \, d\mu + \int_X f \, d\nu. \]

If f is integrable with respect to µ and r ≥ 0, then f is integrable with respect to rµ and

\[ \int_X f \, d(r\mu) = r\int_X f \, d\mu. \]

If ν ≤ µ and f is integrable with respect to µ, then f is integrable with respect to ν and

\[ \int_X |f| \, d\nu \leq \int_X |f| \, d\mu. \]

A sequence {f_n} in I(µ) or a sequence {[f_n]} in I(µ)/I_0(µ) is called L^1-Cauchy if it is a Cauchy sequence (under the L^1-norm). That is, for every ε > 0 there exists some N such that ‖f_m − f_n‖_1 < ε for all m, n ≥ N.

The L1-completion

Let Ī(µ) be the completion of I(µ)/I_0(µ), which exists by Theorem 1.55. We will denote the norm on Ī(µ) by ‖·‖_1. By Theorem 1.45, there is a unique extension of ∫_X to a continuous linear map ∫_X : Ī(µ) → E with

\[ \left|\int_X \phi\right| \leq \|\phi\|_1 \]

for all ϕ ∈ Ī(µ). We will now describe the elements of Ī(µ) more explicitly.

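Lemma 4.52. Let {f_n} in I(µ) be L^1-Cauchy. Then {f_n} is Cauchy in measure.

Proof. Let ε > 0 and let E_{mn} = {x ∈ X : |f_m(x) − f_n(x)| ≥ ε}. Then µ(E_{mn}) < ∞, so χ_{E_{mn}} ∈ I(µ,R) and

\[ \mu(E_{mn}) = \int_X \chi_{E_{mn}} \leq \frac{1}{\varepsilon}\int_X |f_m(x) - f_n(x)| = \frac{1}{\varepsilon}\|f_m - f_n\|_1. \]

The right-hand side tends to 0 as m, n → ∞, so {f_n} is Cauchy in measure.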
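The estimate in this proof is a Chebyshev-type bound: the measure of the set where a map is at least ε in norm is at most its L1-norm divided by ε. The sketch below checks this on a hypothetical four-point space with uniform weights; the map `h` stands in for a difference f_m − f_n and is an arbitrary choice for illustration.

```python
from fractions import Fraction

# Hypothetical measure: each of the points 0, 1, 2, 3 has mass 1/4.
mu = {x: Fraction(1, 4) for x in range(4)}
# h plays the role of f_m - f_n in the proof.
h = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(-3), 3: Fraction(2)}

def l1_norm(func):
    """Integral of |func| with respect to mu."""
    return sum(abs(func[x]) * mu[x] for x in mu)

def measure_where_large(func, eps):
    """mu({x : |func(x)| >= eps})."""
    return sum(mu[x] for x in mu if abs(func[x]) >= eps)

eps = Fraction(1)
lhs = measure_where_large(h, eps)   # mu(E_mn)
rhs = l1_norm(h) / eps              # (1/eps) * ||f_m - f_n||_1
```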

Let ℒ^1(X,µ,E) = ℒ^1(µ,E) = ℒ^1(µ) be the vector space of all maps f : X → E such that there is some L^1-Cauchy sequence of maps in I(µ) converging to f in measure. Let L_0(µ) ⊆ ℒ^1(µ) be the set of all maps f : X → E such that f = 0 almost everywhere. We will denote an element of ℒ^1(µ)/L_0(µ) by [f], where f ∈ ℒ^1(µ). We can define a map Λ : Ī(µ) → ℒ^1(µ)/L_0(µ) as follows: if ϕ ∈ Ī(µ), then there is some sequence {[f_n]} in I(µ)/I_0(µ) that converges to ϕ. This sequence is L^1-Cauchy, so Lemma 4.52 shows that {f_n} is Cauchy in measure and Theorem 4.50 shows that f_n → f in measure for some f : X → E. We define Λ(ϕ) = [f].

To check that Λ is well-defined, suppose that {[g_n]} is another sequence that converges to ϕ, and that g_n → g in measure for some g : X → E. It is easy to check that f_1, g_1, f_2, g_2, ... is L^1-Cauchy, so Lemma 4.52 shows that it is Cauchy in measure. But {f_n} is a subsequence that converges to f in measure, so the whole sequence converges to f in measure. Therefore g_n → f in measure. Since g_n → g in measure as well, Theorem 4.47 shows that f = g almost everywhere, so [f] = [g].

Theorem 4.53. The map Λ : Ī(µ) → ℒ^1(µ)/L_0(µ) is an isomorphism (of vector spaces).

Proof. Let ϕ, ψ ∈ Ī(µ) and let {[f_n]}, {[g_n]} be sequences in I(µ)/I_0(µ) such that [f_n] → ϕ and [g_n] → ψ. Choose f, g : X → E such that f_n → f and g_n → g in measure. Then [f_n] + [g_n] → ϕ + ψ and f_n + g_n → f + g in measure, so Λ(ϕ + ψ) = [f + g] = Λϕ + Λψ. If r ∈ K then [rf_n] → rϕ and rf_n → rf in measure, so Λ(rϕ) = [rf] = rΛϕ. This shows that Λ is linear. Next, we show that Λ is surjective. If [f] ∈ ℒ^1(µ)/L_0(µ), then there exists some L^1-Cauchy sequence of maps {f_n} in I(µ) such that f_n → f in measure. Since {[f_n]} is L^1-Cauchy and Ī(µ) is complete, we have [f_n] → ϕ for some ϕ ∈ Ī(µ). By definition, Λϕ = [f].

Finally, we show that Λ is injective. Let ϕ ∈ ker Λ. That is, there is some sequence {[f_n]} in I(µ)/I_0(µ) such that [f_n] → ϕ and f_n → f in measure for some f : X → E that is zero almost everywhere. By Theorem 4.43, f_n → 0 in measure, and it remains to show that [f_n] → 0 in the L^1-norm (since [f_n] → ϕ, we must then have ϕ = 0). Since {f_n} is Cauchy in measure, Lemma 4.49 shows that there is some subsequence {f_{n_k}} that is almost uniformly Cauchy. But f_{n_k} → 0 in measure, so f_{n_k} → 0 almost uniformly by Theorem 4.47. If we can show that f_{n_k} → 0 in the L^1-norm, then [f_n] → 0 since {[f_n]} is L^1-Cauchy. Let g_k = f_{n_k}.

Let ε > 0 and choose N such that ‖g_m − g_n‖_1 < ε for all m, n ≥ N. Let S = {x ∈ X : g_N(x) ≠ 0}. Then µ(S) < ∞ since g_N is an integrable simple map, and we can choose some C > 0 such that |g_N(x)| ≤ C for all x ∈ X. Since g_k → 0 almost uniformly, we can choose some E ⊆ S with µ(E) < ε/C such that g_k → 0 uniformly on S \ E. Choose some m ≥ N such that sup_{x∈S\E} |g_m(x)| < ε/(µ(S) + 1). Then

\[
\begin{aligned}
\|g_N\|_1 &= \int_E |g_N| + \int_{S \setminus E} |g_N| \\
&\leq \int_E C + \int_{S \setminus E} |g_m| + \int_{S \setminus E} |g_N - g_m| \\
&\leq C\mu(E) + \int_{S \setminus E} \frac{\varepsilon}{\mu(S) + 1} + \|g_N - g_m\|_1 \\
&\leq \varepsilon + \frac{\varepsilon\mu(S)}{\mu(S) + 1} + \varepsilon \\
&< 3\varepsilon,
\end{aligned}
\]

so for all n ≥ N,

\[ \|g_n\|_1 \leq \|g_N\|_1 + \|g_n - g_N\|_1 < 4\varepsilon. \]

This shows that g_k → 0 in the L^1-norm.

Write L^1(X,µ,E) = L^1(µ,E) = L^1(µ) for the quotient space ℒ^1(µ)/L_0(µ). We can now define a norm on L^1(µ) by setting ‖f‖_1 = ‖Λ^{−1}f‖_1 for f ∈ L^1(µ), and we have a corresponding seminorm on ℒ^1(µ) defined by ‖f‖_1 = ‖Λ^{−1}[f]‖_1. By definition, Λ is an isometric isomorphism from Ī(µ) to L^1(µ), and both Ī(µ) and L^1(µ) are Banach spaces (see Theorem 1.35). Recall that we have a continuous linear map ∫_X : Ī(µ) → E. If f ∈ L^1(µ), we define the integral of f by

\[ \int_X f = \int_X \Lambda^{-1} f. \]

Similarly, if f ∈ ℒ^1(µ) then the integral of f is ∫_X f = ∫_X Λ^{−1}[f]. Elements of ℒ^1(µ) are called integrable maps.

We note some basic facts about the integral that are obvious from our definitions. If {f_n} is any L^1-Cauchy sequence of maps in I(µ) converging to f in measure, then

\[ \int_X f = \lim_{n \to \infty} \int_X f_n, \qquad \|f\|_1 = \lim_{n \to \infty} \|f_n\|_1. \]

If f ∈ ℒ^1(µ) and ‖f‖_1 = 0, then f = 0 almost everywhere. Since I(µ)/I_0(µ) is dense in Ī(µ) and Λ[f] = [f] for f ∈ I(µ), the integrable simple maps are dense in ℒ^1(µ) with respect to the L^1-norm.

It is easy to check that the properties in Theorem 4.51 still hold.


Theorem 4.54. If f ∈ ℒ^1(µ) then |f| ∈ ℒ^1(µ,R) and

\[ \|f\|_1 = \int_X |f|. \]

Proof. Let {f_n} be an L^1-Cauchy sequence of maps in I(µ) converging to f in measure. Then |f_n| → |f| in measure, and {|f_n|} is L^1-Cauchy because

\[ \| |f_m| - |f_n| \|_1 = \int_X \bigl| |f_m| - |f_n| \bigr| \leq \int_X |f_m - f_n| = \|f_m - f_n\|_1. \]

Therefore |f| ∈ ℒ^1(µ,R) and

\[ \|f\|_1 = \|\Lambda^{-1}[f]\|_1 = \lim_{n \to \infty} \|f_n\|_1 = \lim_{n \to \infty} \int_X |f_n| = \int_X |f|. \]

Theorem 4.55. If f ∈ ℒ^1(µ) then f vanishes outside a σ-finite set. In particular,

\[ \mu(\{x \in X : |f(x)| \geq c\}) < \infty \]

for all c > 0.

Proof. Let {f_n} be an L^1-Cauchy sequence of maps in I(µ) converging to f in measure. Applying Theorem 4.50 and reindexing if necessary, we can assume that there is a set E with µ(E) < 1 such that f_n → f uniformly on X \ E. There exists some N such that |f_N − f| ≤ c/2 on X \ E. Since f_N is an integrable simple map, the set S = {x ∈ X : f_N(x) ≠ 0} has finite measure, and |f| ≤ c/2 on X \ (E ∪ S). This shows that {x ∈ X : |f(x)| ≥ c} is contained in the set E ∪ S of finite measure. Now take c = 1/n for n = 1, 2, ... to deduce that {x ∈ X : |f(x)| > 0} is σ-finite.

Corollary 4.56. If f : X → E is simple, g ∈ ℒ^1(µ,R), and |f| ≤ g almost everywhere, then f is integrable.

Proof. We can assume that g is measurable. If f(X) = {0}, then f is integrable. Otherwise, choose a ∈ f(X) with the smallest positive norm. By Theorem 4.55, the set S = {x ∈ X : |g(x)| ≥ |a|} has finite measure. Therefore

\[ \{x \in X : f(x) \neq 0\} = \{x \in X : |f(x)| \geq |a|\} \subseteq S \cup E \]

has finite measure, where E is some set of measure zero.


Theorem 4.57. If f ∈ ℒ^1(µ) and ε > 0, then there exists a measurable set E of finite measure such that

\[ \int_{X \setminus E} |f| < \varepsilon. \]

Proof. Since I(µ) is dense in ℒ^1(µ), there is some g ∈ I(µ) such that ‖f − g‖_1 < ε. Let E = {x ∈ X : g(x) ≠ 0} so that µ(E) < ∞. Then

\[ \int_{X \setminus E} |f| = \int_{X \setminus E} |f - g| \leq \|f - g\|_1 < \varepsilon. \]

Convergence in L1

We now have another mode of convergence (see Definition 4.42):

Definition 4.58. Let {f_n} be a sequence in ℒ^1(µ) and let f ∈ ℒ^1(µ).

1. We say that {f_n} converges to f in L^1 if f_n → f in the L^1-norm. That is, ‖f_n − f‖_1 → 0 as n → ∞.

2. We say that {f_n} is L^1-Cauchy if it is a Cauchy sequence under the L^1-norm. That is, for every ε > 0 there exists some N such that ‖f_m − f_n‖_1 < ε for all m, n ≥ N.

Note that if f_n → f in L^1 then ∫_X f_n → ∫_X f, since

\[ \left|\int_X f_n - \int_X f\right| \leq \int_X |f_n - f| = \|f_n - f\|_1. \]

(Alternatively, Λ^{−1}[f_n] → Λ^{−1}[f] and therefore ∫_X Λ^{−1}[f_n] → ∫_X Λ^{−1}[f] since Λ^{−1} and ∫_X are continuous.)

Theorem 4.59. Let {f_n} be a sequence in ℒ^1(µ).

1. If f_n → f in L^1, then f_n → f in measure.

2. If {f_n} is L^1-Cauchy, then {f_n} is Cauchy in measure.

Proof. Let E be a set of measure zero such that {f_n} and f are all measurable on X \ E. Let ε > 0 and let E_n = {x ∈ X : |f_n(x) − f(x)| ≥ ε}. Then µ(E_n ∩ (X \ E)) < ∞ by Theorem 4.55, so χ_{E_n ∩ (X \ E)} ∈ I(µ,R) and

\[ \mu(E_n) = \mu(E_n \cap (X \setminus E)) = \int_X \chi_{E_n \cap (X \setminus E)} \leq \frac{1}{\varepsilon}\int_X |f_n(x) - f(x)| = \frac{1}{\varepsilon}\|f_n - f\|_1. \]

Part (2) is similar.

Theorem 4.60 (Uniqueness of limits, with L^1). Suppose that f_n → f almost everywhere, almost uniformly, in measure, or in L^1. Also suppose that f_n → g almost everywhere, almost uniformly, in measure, or in L^1. Then f = g almost everywhere.

We defined ℒ^1(µ) as the space of all maps f : X → E such that there is some L^1-Cauchy sequence of maps in I(µ) converging to f in measure, but the following theorem shows that we could have used “almost uniformly” or “almost everywhere” instead.

Theorem 4.61 (Equivalence of convergence for L^1-Cauchy sequences). Let f : X → E. The following are equivalent:

1. There exists an L^1-Cauchy sequence {f_n} in ℒ^1(µ) converging to f in measure.

2. There exists an L^1-Cauchy sequence {f_n} in ℒ^1(µ) converging to f almost uniformly.

3. There exists an L^1-Cauchy sequence {f_n} in ℒ^1(µ) converging to f almost everywhere.

If one of the above holds for a sequence {f_n} then the others hold for subsequences of {f_n}. In that case, f ∈ ℒ^1(µ) and f_n → f in L^1.

Proof. (1) ⇒ (2) follows from Lemma 4.49 and (2) ⇒ (3) follows from Theorem 4.45. Suppose that (3) holds. Theorem 4.59 shows that {f_n} is Cauchy in measure, so Lemma 4.49 shows that there is a subsequence {f_{n_k}} which converges to some g : X → E almost uniformly. But f_{n_k} → f almost everywhere, so Theorem 4.60 shows that f = g almost everywhere. Therefore f_{n_k} → f in measure, which proves (1).

Corollary 4.62. If f ∈ ℒ^1(µ), then f is µ-measurable.


The convergence theorems

Note that if f, g ∈ ℒ^1(µ,R) then max(f, g) ∈ ℒ^1(µ,R) and min(f, g) ∈ ℒ^1(µ,R) because

\[ \max(f, g) = \frac{f + g + |f - g|}{2}, \qquad \min(f, g) = \frac{f + g - |f - g|}{2}. \]

Theorem 4.63 (Monotone convergence theorem). Let {f_n} be an increasing (or decreasing) sequence of functions in ℒ^1(µ,R) such that the sequence {∫_X f_n} is bounded. Then {f_n} is both L^1 and almost everywhere convergent to some f ∈ ℒ^1(µ,R).

Proof. Assume that {f_n} is an increasing sequence. Let C = sup_{n≥1} ∫_X f_n, which is finite by assumption. For all n ≥ m we have

\[ \|f_n - f_m\|_1 = \int_X (f_n - f_m) \leq C - \int_X f_m, \]

and the right-hand side tends to 0 as m → ∞ because the integrals ∫_X f_m increase to C. So {f_n} is L^1-Cauchy and converges to some f ∈ ℒ^1(µ) in L^1. By Theorem 4.59, Lemma 4.49 and Theorem 4.60 there is some subsequence of {f_n} that converges to f almost everywhere. Since {f_n} is increasing, f_n → f almost everywhere. The decreasing case follows by considering {−f_n}.

Corollary 4.64. Let {f_n} be a sequence in ℒ^1(µ,R). If there is a nonnegative function g ∈ ℒ^1(µ,R) such that |f_n| ≤ g for all n, then sup_{n≥1} f_n ∈ ℒ^1(µ,R), inf_{n≥1} f_n ∈ ℒ^1(µ,R), and

\[ \sup_{n \geq 1} \int_X f_n \leq \int_X \sup_{n \geq 1} f_n \qquad \text{and} \qquad \int_X \inf_{n \geq 1} f_n \leq \inf_{n \geq 1} \int_X f_n. \]

Proof. The functions sup(f_1, ..., f_n) are in ℒ^1(µ,R) for all n and form an increasing sequence bounded by g, so we can apply Theorem 4.63. The inequality follows from the fact that f_n ≤ sup_{n≥1} f_n and ∫_X f_n ≤ ∫_X sup_{n≥1} f_n for all n. The case for inf is similar.

Corollary 4.65 (Fatou’s lemma). Let {f_n} be a sequence of nonnegative functions in ℒ^1(µ,R). If lim inf_{n→∞} ‖f_n‖_1 < ∞ then lim inf_{n→∞} f_n < ∞ almost everywhere, lim inf_{n→∞} f_n ∈ ℒ^1(µ,R), and

\[ \int_X \liminf_{n \to \infty} f_n \leq \liminf_{n \to \infty} \int_X f_n = \liminf_{n \to \infty} \|f_n\|_1. \]

Proof. For all k, the sequence {g_m} given by g_m = inf(f_k, ..., f_{k+m}) is decreasing, so Theorem 4.63 shows that inf_{n≥k} f_n ∈ ℒ^1(µ,R) exists and

\[ \int_X \inf_{n \geq k} f_n \leq \inf_{n \geq k} \int_X f_n \leq \sup_{k \geq 1} \inf_{n \geq k} \int_X f_n = \liminf_{n \to \infty} \int_X f_n. \tag{*} \]

The sequence {inf_{n≥k} f_n} is increasing in k, so Theorem 4.63 shows that

\[ \inf_{n \geq k} f_n \to \liminf_{n \to \infty} f_n \]

almost everywhere, lim inf_{n→∞} f_n ∈ ℒ^1(µ,R), and

\[ \int_X \liminf_{n \to \infty} f_n = \int_X \lim_{k \to \infty} \inf_{n \geq k} f_n = \lim_{k \to \infty} \int_X \inf_{n \geq k} f_n \leq \liminf_{n \to \infty} \int_X f_n \]

since (*) holds for every k.

Theorem 4.66 (Dominated convergence theorem). Let {f_n} be a sequence in ℒ^1(µ,E), and assume there is a nonnegative function g ∈ ℒ^1(µ,R) such that |f_n| ≤ g for all n. If f_n → f almost everywhere for some map f, then f ∈ ℒ^1(µ,E) and f_n → f in L^1.

Proof. Since |f_m − f_n| ≤ 2g for all m, n, Corollary 4.64 shows that g_k ∈ ℒ^1(µ,R) for all k, where g_k = sup_{m,n≥k} |f_m − f_n|. It is clear that g_k → 0 almost everywhere since {f_n(x)} converges for almost all x. But {g_k} is decreasing, so Theorem 4.63 and Theorem 4.60 show that g_k → 0 in L^1. Then

\[ \sup_{m,n \geq k} \|f_m - f_n\|_1 = \sup_{m,n \geq k} \int_X |f_m - f_n| \leq \int_X \sup_{m,n \geq k} |f_m - f_n| = \|g_k\|_1, \]

which shows that {f_n} is L^1-Cauchy and converges to some h ∈ ℒ^1(µ,E) in L^1. By Theorem 4.60, f = h almost everywhere.
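The domination hypothesis cannot simply be dropped. A standard illustration, sketched below under the assumption that µ is Lebesgue measure on [0, 1] (a measure not constructed in this section), uses the simple maps f_n = n·χ_(0,1/n): they converge to 0 at every point, yet each has integral 1, so they do not converge to 0 in L1. The function names are chosen only for this example.

```python
from fractions import Fraction

def f(n, x):
    """f_n = n * chi_(0, 1/n), a simple map on [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0

def integral(n):
    """Integral of f_n: the value n times the (assumed Lebesgue) measure 1/n of (0, 1/n)."""
    return n * Fraction(1, n)

integrals = [integral(n) for n in range(1, 50)]

# At a fixed point x, f_n(x) = 0 as soon as 1/n <= x.
x = Fraction(1, 7)
early_value = f(3, x)                          # 1/7 lies in (0, 1/3)
tail_values = [f(n, x) for n in range(7, 50)]
```

No integrable g dominates every f_n here, which is consistent with the theorem.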

µ-measurability and integrability

Often we want to prove that a particular µ-measurable map f is integrable. Corollary 4.37 shows that we can approximate |f| by an increasing sequence of simple maps. If we can show that the integrals of these simple maps are bounded, then Theorem 4.63 implies that |f| is integrable. The last step is to relate the integrability of |f| to the integrability of f. The following theorem does this.


Theorem 4.67 (Dominated µ-measurable functions). Let f : X → E be µ-measurable and suppose that there is some g ∈ ℒ^1(µ,R) such that |f| ≤ g almost everywhere. Then f ∈ ℒ^1(µ,E), and there exists a sequence {f_n} in I(µ,E) with |f_n| ≤ 2g converging to f almost everywhere and in L^1. In particular, f ∈ ℒ^1(µ,E) if and only if |f| ∈ ℒ^1(µ,R).

Proof. Let {f_n} be a sequence of simple maps converging to f almost everywhere. Let E_n = {x ∈ X : |f_n(x)| ≤ 2g(x)} and let h_n = χ_{E_n} f_n, which is simple. Then |h_n| ≤ 2g for all n and h_n → f almost everywhere. By Corollary 4.56 each h_n is integrable, so Theorem 4.66 shows that f ∈ ℒ^1(µ,E) and h_n → f in L^1.

Corollary 4.68. Assume that E is finite-dimensional and let f : X → E be measurable. If f is bounded and vanishes outside a set of finite measure, then f is integrable.

Proof. Theorem 4.36 shows that f is µ-measurable. Let A be a set of finite measure outside of which f vanishes, and choose C > 0 such that |f| ≤ C on A. Then |f| ≤ Cχ_A on X and Cχ_A is integrable, so Theorem 4.67 shows that f is integrable.

Theorem 4.69 (Integration under an image measure). Let (X, ℳ, µ) be a positive measure space, let (Y, 𝒩) be a measurable space, let g : X → Y be measurable, and let f : Y → E. Then f ∈ ℒ^1(Y, g_*µ, E) if and only if f is (g_*µ)-measurable and f ∘ g ∈ ℒ^1(X, µ, E). In that case,

\[ \int_X f \circ g \, d\mu = \int_Y f \, d(g_*\mu). \]

Proof. We have

\[ \int_X \chi_E \circ g \, d\mu = \int_X \chi_{g^{-1}(E)} \, d\mu = \mu(g^{-1}(E)) = (g_*\mu)(E) = \int_Y \chi_E \, d(g_*\mu) \]

for all measurable E ⊆ Y, so by linearity the result holds when f is a simple map. Suppose f ∈ ℒ^1(Y, g_*µ, E) and let {f_n} be an L^1-Cauchy sequence of maps in I(g_*µ) converging to f in measure. Then {f_n ∘ g} is an L^1-Cauchy sequence of maps in I(µ) converging to f ∘ g in measure, because

\[ \int_X |f_m \circ g - f_n \circ g| \, d\mu = \int_X |f_m - f_n| \circ g \, d\mu = \int_Y |f_m - f_n| \, d(g_*\mu) \]

and

\[ \mu(\{x \in X : |f_n(g(x)) - f(g(x))| \geq \varepsilon\}) = (g_*\mu)(\{y \in Y : |f_n(y) - f(y)| \geq \varepsilon\}) \to 0 \]

as n → ∞. It follows that f ∘ g ∈ ℒ^1(X, µ, E) and

\[ \int_X f \circ g \, d\mu = \lim_{n \to \infty} \int_X f_n \circ g \, d\mu = \lim_{n \to \infty} \int_Y f_n \, d(g_*\mu) = \int_Y f \, d(g_*\mu). \]

Also, Corollary 4.62 shows that f is (g_*µ)-measurable. Conversely, suppose that f is (g_*µ)-measurable and f ∘ g ∈ ℒ^1(X, µ, E). By Theorem 4.67 it suffices to show that |f| is integrable, so we can assume that f is a nonnegative real function. By Corollary 4.37 there is an increasing sequence {f_n} of positive simple functions that converges to f almost everywhere. Since f ∘ g is integrable and f_n ≤ f, Corollary 4.56 shows that each f_n is integrable. Theorem 4.63 then shows that f is integrable, and the equality follows from the first part of the proof.
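On a finite space the change-of-variables identity can be checked directly. The sketch below builds the image measure of a hypothetical three-point measure under a map g and compares the two integrals; the sets, weights, and the names `mu`, `g`, `f`, and `pushforward` are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical measure on X = {"a", "b", "c"}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
# A map g : X -> Y with Y = {0, 1}, and a map f : Y -> R.
g = {"a": 0, "b": 1, "c": 0}
f = {0: Fraction(3), 1: Fraction(-2)}

# Image measure: (g_* mu)({y}) = mu(g^{-1}({y})).
pushforward = defaultdict(Fraction)
for x, weight in mu.items():
    pushforward[g[x]] += weight

lhs = sum(f[g[x]] * mu[x] for x in mu)                  # integral of f∘g with respect to mu
rhs = sum(f[y] * pushforward[y] for y in pushforward)   # integral of f with respect to g_* mu
```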

Further applications

Theorem 4.70. Let {f_n} be a sequence of maps in ℒ^1(µ,E) such that

\[ \sum_{n=1}^{\infty} \int_X |f_n| \]

converges. Then the series f = ∑_{n=1}^∞ f_n converges almost everywhere, f ∈ ℒ^1(µ,E), and ∑_{n=1}^N f_n → f in L^1 as N → ∞.

Proof. Applying Theorem 4.63 to the partial sums ∑_{n=1}^N |f_n| shows that ∑_{n=1}^∞ |f_n| exists outside a set E of measure zero and is integrable. Therefore f = ∑_{n=1}^∞ f_n exists on X \ E (see Theorem 1.16) and is integrable by Theorem 4.67. Since |∑_{n=1}^N f_n| ≤ ∑_{n=1}^∞ |f_n| on X \ E, Theorem 4.66 shows that ∑_{n=1}^N f_n → f in L^1 on X \ E as N → ∞.

Theorem 4.71. Let f ∈ ℒ^1(µ,E) and let S be a closed subset of E such that for all measurable sets A ⊆ X with 0 < µ(A) < ∞ we have

\[ \frac{1}{\mu(A)} \int_A f \in S. \]

If µ is σ-finite or 0 ∈ S, then f(x) ∈ S for almost all x ∈ X.

Proof. First assume that µ(X) < ∞. By restricting f according to Theorem 4.39, we can assume that f(X) has a dense countable subset. Let a ∈ E \ S and let B_r(a) be a closed ball of radius r > 0 contained in E \ S. Let A = f^{−1}(B_r(a)); we have µ(A) < ∞ since µ(X) < ∞. If µ(A) > 0 then

\[ \left|\frac{1}{\mu(A)} \int_A f - a\right| = \frac{1}{\mu(A)} \left|\int_A f - \int_A a\right| \leq \frac{1}{\mu(A)} \int_A |f - a| \leq r, \]

so the average lies in B_r(a) ⊆ E \ S, which is a contradiction. Therefore µ(A) = 0. But f(X) \ S has a countable dense subset, so by covering f(X) \ S with countably many closed balls of rational radii contained in E \ S, the above shows that µ(f^{−1}(f(X) \ S)) = 0.

For the general case when µ is σ-finite, write X = ⋃_{n=1}^∞ X_n where each X_n has finite measure. Then f|_{X_n}(x) ∈ S for almost all x ∈ X_n, so f(x) ∈ S for almost all x ∈ X. In the case 0 ∈ S, we do not assume that µ(X) < ∞ and instead have µ(A) < ∞ due to Theorem 4.55.

Corollary 4.72. Let f ∈ ℒ^1(µ,E). If c ≥ 0 and |∫_A f| ≤ cµ(A) for every measurable A of finite measure, then |f| ≤ c almost everywhere. In particular, if ∫_A f = 0 for every measurable A of finite measure, then f = 0 almost everywhere.

Proof. Take S = {x ∈ E : |x| ≤ c}.

Corollary 4.73. Let E be a Hilbert space and let f ∈ ℒ^1(µ,E). If c ≥ 0 and |∫_A ⟨f, e⟩| ≤ cµ(A) for every measurable A of finite measure and every unit vector e ∈ E, then |f| ≤ c almost everywhere. In particular, if ∫_A ⟨f, e⟩ = 0 for every measurable A of finite measure and every unit vector e ∈ E, then f = 0 almost everywhere.

Proof. Let A ⊆ X be a measurable set of finite measure. If ∫_A f ≠ 0, then take e = ∫_A f / |∫_A f| so that

\[ \left|\int_A f\right| = \left|\left\langle \int_A f, e \right\rangle\right| = \left|\int_A \langle f, e \rangle\right| \leq c\mu(A). \]

The result now follows from Corollary 4.72.

The following result is a generalization of Theorem 2.37.


Theorem 4.74 (Differentiation under the integral sign). Let U ⊆ E be an open set and let f : X × U → F be a function satisfying the following properties:

1. For all y ∈ U, the map x ↦ f(x, y) is in ℒ^1(µ,F) and the map x ↦ D_E f(x, y) is in ℒ^1(µ,L(E,F)).

2. There exists a function ϕ ∈ ℒ^1(µ,R) such that |D_E f(x, y)| ≤ ϕ(x) for all x ∈ X and y ∈ U.

Let g : U → F be given by

\[ g(y) = \int_{x \in X} f(x, y). \]

Then g is differentiable on U and

\[ Dg(y) = \int_{x \in X} D_E f(x, y). \]

Proof. Let y ∈ U and let

\[ \lambda = \int_{x \in X} D_E f(x, y). \]

For sufficiently small h we have

\[ \frac{g(y + h) - g(y) - \lambda h}{|h|} = \int_{x \in X} \frac{f(x, y + h) - f(x, y) - D_E f(x, y)h}{|h|}, \]

and Corollary 2.18 shows that

\[ \frac{|f(x, y + h) - f(x, y) - D_E f(x, y)h|}{|h|} \leq 2\phi(x) \]

for all x ∈ X. Let {h_n} be a sequence in E with h_n → 0 and h_n ≠ 0 for all n. From the definition of the derivative, we have

\[ \frac{f(x, y + h_n) - f(x, y) - D_E f(x, y)h_n}{|h_n|} \to 0 \]

for all x ∈ X. By Theorem 4.66,

\[ \frac{g(y + h_n) - g(y) - \lambda h_n}{|h_n|} \to 0. \]

This proves that

\[ \frac{g(y + h) - g(y) - \lambda h}{|h|} \to 0 \]

as h → 0.


Simple maps and rings

Let (X, ℳ, µ) be a positive measure space and let ℛ ⊆ ℳ be a ring in X. A simple map f : X → E with partition {A_i} is simple with respect to ℛ if A_i ∈ ℛ for every i. That is, we can write f = ∑_{i=1}^n f(A_i)χ_{A_i} for some disjoint sets A_1, ..., A_n in ℛ. Note that we must have f^{−1}(y) ∈ ℛ for all y ≠ 0, but we can have f^{−1}(0) ∈ ℳ \ ℛ. We write I_ℛ(X,µ,E) = I_ℛ(µ,E) = I_ℛ(µ) for the set of all maps from X to E that are integrable and simple with respect to ℛ. Note that I_ℛ(X,µ) is a subspace of the vector space I(X,µ) and inherits the L^1-norm ‖·‖_1 (actually a seminorm). We already know that the integrable simple maps I(µ) are dense in ℒ^1(µ) and we want to show that under certain conditions, a similar result holds for I_ℛ(µ).

Lemma 4.75. Let A ∈ ℛ and let 𝒩_A be the collection of measurable subsets Y of A for which χ_Y lies in the L^1-closure of I_ℛ(A,µ,R), i.e. given ε > 0, there is an f ∈ I_ℛ(A,µ,R) such that ‖χ_Y − f‖_1 < ε. Then 𝒩_A is a σ-algebra in A.

Proof. Note that the functions here are defined on A, not X. If Y, Z ∈ 𝒩_A then χ_{Y∪Z} = max(χ_Y, χ_Z), χ_{Y∩Z} = min(χ_Y, χ_Z) and χ_{A\Y} = χ_A − χ_Y lie in the L^1-closure of I_ℛ(A,µ,R), so Y ∪ Z, Y ∩ Z and A \ Y are in 𝒩_A. Therefore 𝒩_A is a ring. Let {Y_n} be a sequence of disjoint sets in 𝒩_A, let Y = ⋃_{n=1}^∞ Y_n and let ε > 0. For each n, choose f_n ∈ I_ℛ(A,µ,R) such that

\[ \|\chi_{Y_n} - f_n\|_1 < \frac{\varepsilon}{2^n}. \]

Then for sufficiently large n we have ‖χ_Y − χ_{Y_1∪···∪Y_n}‖_1 < ε and

\[ \left\|\chi_Y - \sum_{k=1}^{n} f_k\right\|_1 \leq \|\chi_Y - \chi_{Y_1 \cup \cdots \cup Y_n}\|_1 + \left\|\chi_{Y_1 \cup \cdots \cup Y_n} - \sum_{k=1}^{n} f_k\right\|_1 < \varepsilon + \sum_{k=1}^{n} \|\chi_{Y_k} - f_k\|_1 < 2\varepsilon. \]

Theorem 4.76. Let ℛ ⊆ ℳ be a ring in X consisting of sets of finite measure such that ℛ generates ℳ and X is σ-finite with respect to ℛ. Then I_ℛ(µ,E) is dense in both I(µ,E) and ℒ^1(µ,E). Furthermore, if {A_n} is a sequence in ℛ whose union is X, then χ_{Y∩A_n} lies in the L^1-closure of I_ℛ(µ,R) for all Y ∈ ℳ and n.

Proof. Note that {A_n} always exists since X is σ-finite with respect to ℛ. For each n, let 𝒩_{A_n} be the σ-algebra of Lemma 4.75. Lemma 4.1 shows that

\[ \mathcal{N} = \{E \subseteq X : E \cap A_n \in \mathcal{N}_{A_n} \text{ for all } n\} \]

is a σ-algebra. Clearly ℛ ⊆ 𝒩, so ℳ = 𝒩 since ℛ generates ℳ. This proves the second assertion (if we consider each f ∈ I_ℛ(A_n,µ,R) as an element of I_ℛ(X,µ,R) by setting f = 0 on X \ A_n).

We now prove that for any Y ∈ ℳ of finite measure and any ε > 0, there exists some f ∈ I_ℛ(µ,R) such that ‖χ_Y − f‖_1 < ε. Taking relative complements, we can assume that the sets {A_n} are disjoint. For each n we have Y ∈ 𝒩, so there is some f_n ∈ I_ℛ(µ,R) such that

\[ \|\chi_{Y \cap A_n} - f_n\|_1 < \frac{\varepsilon}{2^n}. \]

Since Y = ⋃_{n=1}^∞ (Y ∩ A_n), there is some n such that

\[ \mu\left(Y \setminus \bigcup_{k=1}^{n} (Y \cap A_k)\right) < \varepsilon, \]

in which case

\[ \left\|\chi_Y - \sum_{k=1}^{n} \chi_{Y \cap A_k}\right\|_1 < \varepsilon. \]

Then

\[ \left\|\chi_Y - \sum_{k=1}^{n} f_k\right\|_1 \leq \left\|\chi_Y - \sum_{k=1}^{n} \chi_{Y \cap A_k}\right\|_1 + \left\|\sum_{k=1}^{n} \chi_{Y \cap A_k} - \sum_{k=1}^{n} f_k\right\|_1 < \varepsilon + \sum_{k=1}^{n} \|\chi_{Y \cap A_k} - f_k\|_1 < 2\varepsilon. \]

For the general case, let f ∈ I(µ,E) be nonzero and write f = ∑_{i=1}^n a_i χ_{Y_i} where a_i ∈ E, a_i ≠ 0 and µ(Y_i) < ∞ for all i. For each i there exists some f_i ∈ I_ℛ(µ,R) such that

\[ \|\chi_{Y_i} - f_i\|_1 < \frac{\varepsilon}{n|a_i|}, \]

so

\[ \left\|f - \sum_{i=1}^{n} a_i f_i\right\|_1 < \varepsilon. \]

This shows that I_ℛ(µ,E) is dense in I(µ,E); I_ℛ(µ,E) is also dense in ℒ^1(µ,E) because I(µ,E) is dense in ℒ^1(µ,E).


This result allows us to extend Corollary 4.72:

Corollary 4.77. Let ℛ ⊆ ℳ be a ring consisting of sets of finite measure such that ℛ generates ℳ and X is σ-finite with respect to ℛ. If f ∈ ℒ^1(µ) and

\[ \int_A f = 0 \]

for all A ∈ ℛ, then f = 0 almost everywhere.

Proof. By linearity, we have

\[ \int_X \phi f = 0 \]

for all ϕ ∈ I_ℛ(µ,R). Let Y be a set of finite measure. By Theorem 4.76 and Theorem 4.61, there is a sequence {ϕ_n} of maps in I_ℛ(µ,R) that is both L^1 and almost everywhere convergent to χ_Y. Taking min(ϕ_n, 1) and max(ϕ_n, 0) if necessary, we may assume that 0 ≤ ϕ_n ≤ 1. Then |ϕ_n f| ≤ |f| and ϕ_n f → χ_Y f almost everywhere. By Theorem 4.66, we have

\[ 0 = \int_X \phi_n f \to \int_X \chi_Y f, \]

which shows that

\[ \int_Y f = 0. \]

Corollary 4.72 now implies that f = 0 almost everywhere.

Integration with respect to vector-valued measures

Let E, F, G be Banach spaces over K. A compatible product is a continuous bilinear or sesquilinear map · : E × F → G such that |e · f| ≤ |e| |f| for all e ∈ E and f ∈ F (i.e. the norm of · is no greater than 1). (A similar construction was used in the discussion on power series in Section 1.7.) If |e · f| = |e| |f| for all e ∈ E and f ∈ F, we say that · is norm-preserving. Typically the compatible product will be scalar multiplication with either E = K or F = K, or the inner product on a Hilbert space E = F. We will usually write ef instead of e · f.

Assume that we are given some compatible product, and let µ be an F-valued measure on X. Suppose that f ∈ I(|µ|) is simple with respect to some collection {A_i} with |µ|(A_i) < ∞. We define the integral of f by

\[ \int_X f = \int_X f \, d\mu = \int_{x \in X} f(x) \, d\mu = \int_X f(x) \, d\mu(x) = \sum_{i=1}^{n} f(A_i)\mu(A_i) \in G. \]

It is easy to check that the integral of f is well-defined. This gives a continuous linear map ∫_X : I(|µ|) → G, since

\[ \left|\int_X f\right| = \left|\sum_{i=1}^{n} f(A_i)\mu(A_i)\right| \leq \sum_{i=1}^{n} |f(A_i)| \, |\mu|(A_i) = \int_X |f| \, d|\mu| = \|f\|_1. \]

By Theorem 1.45, there is a unique extension of ∫_X to a continuous linear map ∫_X : Ī(|µ|) → G. Recall that we have an isometric isomorphism Λ : Ī(|µ|) → L^1(|µ|). If f ∈ L^1(|µ|), we define the integral of f by

\[ \int_X f = \int_X \Lambda^{-1} f, \]

and if f ∈ ℒ^1(|µ|), we define the integral of f by ∫_X f = ∫_X Λ^{−1}[f]. We call this the integral with respect to the compatible product · : E × F → G.

Theorem 4.78 (Properties of the integral for vector-valued measures). Let f, g ∈ ℒ^1(|µ|) and let r ∈ K.

1. |∫_X f dµ| ≤ ∫_X |f| d|µ| = ‖f‖_1.

2. ∫_X (f + g) = ∫_X f + ∫_X g and ∫_X rf = r ∫_X f.

3. If X = X_1 ∪ X_2 where X_1, X_2 are disjoint and measurable, then ∫_X f = ∫_{X_1} f + ∫_{X_2} f.

4. If |µ|(X) < ∞ and e ∈ E, then ∫_X e dµ = e · µ(X).

5. If µ, ν are F-valued measures on X and f is integrable with respect to |µ| + |ν|, then f is integrable with respect to both |µ| and |ν| and

\[ \int_X f \, d(\mu + \nu) = \int_X f \, d\mu + \int_X f \, d\nu. \]

If f is integrable with respect to |µ|, then f is integrable with respect to |rµ| and

\[ \int_X f \, d(r\mu) = r\int_X f \, d\mu. \]

4.4 The Lp spaces

As before, (X, ℳ, µ) is a positive measure space. Let 1 < p < ∞ be a real number. Let ℒ^p(X,µ,E) = ℒ^p(µ,E) = ℒ^p(µ) be the vector space of all µ-measurable maps f : X → E such that |f|^p ∈ ℒ^1(µ). We define the L^p-norm of a map f ∈ ℒ^p(µ) by

\[ \|f\|_p = \left(\int_X |f|^p\right)^{1/p}. \]

(As with the L^1-norm on ℒ^1(µ), the L^p-norm on ℒ^p(µ) is actually a seminorm.) To prove that the L^p-norm satisfies the triangle inequality, we first need to establish some inequalities.

Lemma 4.79. If f ∈ ℒ^p(µ), then ‖f‖_p = 0 if and only if f = 0 almost everywhere.

Proof. If

\[ \|f\|_p = \left(\int_X |f|^p\right)^{1/p} = 0 \]

then ‖|f|^p‖_1 = 0, so |f|^p = 0 almost everywhere. Therefore f = 0 almost everywhere.

If 1 ≤ p, q ≤ ∞ and

\[ \frac{1}{p} + \frac{1}{q} = 1, \]

we say that p and q are conjugate exponents. Note that 1 and ∞ are conjugate exponents.

Lemma 4.80 (Young’s inequality). Let p, q > 1 be conjugate exponents. For all r, s ≥ 0,

\[ rs \leq \frac{r^p}{p} + \frac{s^q}{q}. \]

Proof. If r = 0 or s = 0 then the result is clear. Otherwise, assume that r, s > 0. Since exp is convex,

\[ rs = \exp\left(\frac{1}{p}\log r^p + \frac{1}{q}\log s^q\right) \leq \frac{1}{p}\exp(\log r^p) + \frac{1}{q}\exp(\log s^q) = \frac{r^p}{p} + \frac{s^q}{q}. \]
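A quick numerical sanity check of Young's inequality is sketched below. The grid of values and the exponents are arbitrary choices for illustration, and a small tolerance absorbs floating-point rounding; equality is expected when r^p = s^q, for instance at r = s = 1.

```python
import itertools

def young_gap(r, s, p):
    """Return r**p/p + s**q/q - r*s for the conjugate exponent q = p/(p - 1)."""
    q = p / (p - 1)
    return r**p / p + s**q / q - r * s

grid = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]   # sample values of r and s
exponents = [1.5, 2.0, 3.0, 10.0]       # sample values of p > 1

# The inequality says every gap is nonnegative.
min_gap = min(young_gap(r, s, p)
              for r, s, p in itertools.product(grid, grid, exponents))

# Equality case r**p == s**q.
equality_gap = young_gap(1.0, 1.0, 3.0)
```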


Theorem 4.81 (Hölder’s inequality). Let p, q > 1 be conjugate exponents. Let · : E_1 × E_2 → E be a continuous map such that |u · v| ≤ C |u| |v| for all u ∈ E_1 and v ∈ E_2. If f ∈ ℒ^p(µ,E_1) and g ∈ ℒ^q(µ,E_2), then f · g ∈ ℒ^1(µ,E) and ‖f · g‖_1 ≤ C ‖f‖_p ‖g‖_q.

Proof. If ‖f‖_p = 0 or ‖g‖_q = 0 then the result is clear (see Lemma 4.79), so we will assume that ‖f‖_p > 0 and ‖g‖_q > 0. Set r = |f| / ‖f‖_p and s = |g| / ‖g‖_q in Lemma 4.80 to get

\[ \frac{|f \cdot g|}{\|f\|_p \|g\|_q} \leq C\frac{|f| \, |g|}{\|f\|_p \|g\|_q} \leq C\left[\frac{1}{p}\frac{|f|^p}{\|f\|_p^p} + \frac{1}{q}\frac{|g|^q}{\|g\|_q^q}\right]. \]

The last expression is integrable, so Theorem 4.67 shows that f · g ∈ ℒ^1(µ). Therefore

\[ \int_X \frac{|f \cdot g|}{\|f\|_p \|g\|_q} \leq C\left(\frac{1}{p}\int_X \frac{|f|^p}{\|f\|_p^p} + \frac{1}{q}\int_X \frac{|g|^q}{\|g\|_q^q}\right) = C, \]

which implies that

\[ \|f \cdot g\|_1 = \int_X |f \cdot g| \leq C \|f\|_p \|g\|_q. \]
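For a concrete instance, take a finite space with positive weights, real-valued maps, and ordinary multiplication as the product (so C = 1). The sketch below, with hypothetical weights and values, compares the two sides of the inequality for p = 3 and q = 3/2.

```python
def lp_norm(values, weights, p):
    """(sum of |v|^p * weight)^(1/p): the L^p-norm for a weighted finite space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

weights = [0.5, 1.0, 2.0, 0.25]   # hypothetical measure on a four-point space
f = [1.0, -2.0, 0.5, 3.0]
g = [-1.5, 0.25, 4.0, 1.0]
p = 3.0
q = p / (p - 1.0)                 # conjugate exponent, here 1.5

lhs = sum(abs(a * b) * w for a, b, w in zip(f, g, weights))   # ||f g||_1
rhs = lp_norm(f, weights, p) * lp_norm(g, weights, q)         # ||f||_p ||g||_q
```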

Theorem 4.82 (Minkowski’s inequality). Let 1 < p < ∞. If f, g ∈ ℒ^p(µ) then f + g ∈ ℒ^p(µ) and ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

Proof. Since

\[ |f + g|^p \leq (|f| + |g|)^p \leq (2\max(|f|, |g|))^p \leq 2^p \max(|f|^p, |g|^p) \leq 2^p(|f|^p + |g|^p) \]

and the last expression is integrable, Theorem 4.67 shows that f + g ∈ ℒ^p(µ). If we let q = p/(p − 1) then 1/p + 1/q = 1 and p − 1 = p/q. Since |f + g|^p ∈ ℒ^1(µ), we have |f + g|^{p/q} ∈ ℒ^q(µ). By Theorem 4.81,

\[
\begin{aligned}
\|f + g\|_p^p &= \int_X |f + g|^p \\
&= \int_X |f + g| \, |f + g|^{p-1} \\
&\leq \int_X |f| \, |f + g|^{p/q} + \int_X |g| \, |f + g|^{p/q} \\
&\leq \|f\|_p \|f + g\|_p^{p/q} + \|g\|_p \|f + g\|_p^{p/q} \\
&= (\|f\|_p + \|g\|_p) \|f + g\|_p^{p-1}.
\end{aligned}
\]

If ‖f + g‖_p = 0 there is nothing to prove; otherwise, dividing by ‖f + g‖_p^{p−1} gives the result.
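The triangle inequality for the Lp-norm can be checked numerically in the same finite setting. The weights and values below are hypothetical, and the tolerance only guards against floating-point rounding.

```python
def lp_norm(values, weights, p):
    """(sum of |v|^p * weight)^(1/p): the L^p-norm for a weighted finite space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

weights = [0.5, 1.0, 2.0, 0.25]   # hypothetical measure on a four-point space
f = [1.0, -2.0, 0.5, 3.0]
g = [-1.5, 0.25, 4.0, 1.0]
total = [a + b for a, b in zip(f, g)]

# For each exponent p, store (||f + g||_p, ||f||_p + ||g||_p).
results = {
    p: (lp_norm(total, weights, p),
        lp_norm(f, weights, p) + lp_norm(g, weights, p))
    for p in (1.5, 2.0, 4.0)
}
```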


From Theorem 4.82, we can see that ‖·‖_p is indeed a seminorm on ℒ^p(µ). Lemma 4.79 shows that {f ∈ ℒ^p(µ) : ‖f‖_p = 0} = L_0(µ), so ℒ^p(µ)/L_0(µ) is a normed vector space. We write L^p(X,µ,E) = L^p(µ,E) = L^p(µ) for this quotient space, and we will denote an element of L^p(µ) by [f], where f ∈ ℒ^p(µ).

Convergence in Lp(µ)

We now add another mode of convergence to our list in Definition 4.42:

Definition 4.83. Let 1 ≤ p < ∞. Let {f_n} be a sequence in ℒ^p(µ) and let f ∈ ℒ^p(µ).

1. We say that {f_n} converges to f in L^p if f_n → f in the L^p-norm. That is, ‖f_n − f‖_p → 0 as n → ∞.

2. We say that {f_n} is L^p-Cauchy if it is a Cauchy sequence under the L^p-norm. That is, for every ε > 0 there exists some N such that ‖f_m − f_n‖_p < ε for all m, n ≥ N.

Theorem 4.84. Let 1 ≤ p < ∞ and let {f_n} be a sequence in ℒ^p(µ).

1. If f_n → f in L^p, then f_n → f in measure.

2. If {f_n} is L^p-Cauchy, then {f_n} is Cauchy in measure.

Proof. Let E be a set of measure zero such that {f_n} and f are all measurable on X \ E. Let ε > 0 and let E_n = {x ∈ X : |f_n(x) − f(x)| ≥ ε}. Then µ(E_n ∩ (X \ E)) < ∞ by Theorem 4.55, so χ_{E_n ∩ (X \ E)} ∈ I(µ,R) and

\[ \mu(E_n) = \mu(E_n \cap (X \setminus E)) = \int_X \chi_{E_n \cap (X \setminus E)} \leq \frac{1}{\varepsilon^p}\int_X |f_n(x) - f(x)|^p = \frac{1}{\varepsilon^p}\|f_n - f\|_p^p. \]

Part (2) is similar.

Theorem 4.85 (Uniqueness of limits, with L^p). Suppose that f_n → f almost everywhere, almost uniformly, in measure, or in L^p (1 ≤ p < ∞). Also suppose that f_n → g almost everywhere, almost uniformly, in measure, or in L^p (1 ≤ p < ∞). Then f = g almost everywhere.



Theorem 4.86. Lp(µ) is complete, i.e. a Banach space.

Proof. Let $f_n$ be an $L^p$-Cauchy sequence in $\mathcal{L}^p(\mu)$. Then $f_n$ is Cauchy in measure by Theorem 4.84, so $f_n \to f$ in measure for some $f : X \to \mathbf{E}$. By Lemma 4.49, there exists a subsequence $f_{n_k}$ such that $f_{n_k} \to f$ almost everywhere. Let $g_k = f_{n_k}$. For each $n$, $|g_m - g_n|^p \to |f - g_n|^p$ almost everywhere as $m \to \infty$. By Corollary 4.65, $|f - g_n|^p \in \mathcal{L}^1(\mu)$ and

\[
\int_X |f - g_n|^p \le \lim_{m \to \infty} \int_X |g_m - g_n|^p = \lim_{m \to \infty} \|g_m - g_n\|_p^p .
\]

This shows that $f = (f - g_1) + g_1 \in \mathcal{L}^p(\mu)$ and $f_{n_k} = g_k \to f$ in $L^p$. Since $f_n$ is $L^p$-Cauchy, $f_n \to f$ in $L^p$.

Theorem 4.87. I(µ) is dense in Lp(µ).

Proof. Let $f \in \mathcal{L}^p(\mu)$ and let $f_n$ be a sequence of simple maps converging to $f$ almost everywhere. Let $E_n = \{x \in X : |f_n(x)| \le 2|f(x)|\}$ and let $h_n = \chi_{E_n} f_n$, which is simple. Then $|h_n| \le 2|f|$ for all $n$ and $h_n \to f$ almost everywhere, which implies that $|h_n - f| \le 3|f|$, $|h_n - f|^p \le 3^p |f|^p$ and $|h_n - f|^p \to 0$ almost everywhere. Since $|f|^p \in \mathcal{L}^1(\mu)$, we can apply Theorem 4.66 to conclude that $h_n \to f$ in $L^p$.

The convergence theorems

Lemma 4.88. Let $1 \le p < \infty$. If $r \ge s \ge 0$, then $(r - s)^p \le r^p - s^p$.

Proof. Let $f(x) = x^p - s^p - (x - s)^p$ so that $f'(x) = p x^{p-1} - p (x - s)^{p-1}$. Then $f'(x) \ge 0$ for $x \ge s$, so $f$ is increasing on $[s, \infty)$. The inequality follows since $f(s) = 0$.
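The inequality of Lemma 4.88 is easy to sanity-check numerically. The sketch below evaluates the gap $(r^p - s^p) - (r - s)^p$ on a small grid with $r \ge s \ge 0$ and $p \ge 1$; the grid values and the function name `power_gap` are assumptions chosen for illustration only.

```python
import itertools


def power_gap(r, s, p):
    """Return (r**p - s**p) - (r - s)**p, which Lemma 4.88 says is >= 0."""
    return (r ** p - s ** p) - (r - s) ** p


values = [0.0, 0.5, 1.0, 2.0, 3.5]   # candidate r and s (assumed grid)
exponents = [1.0, 1.5, 2.0, 3.0]     # candidate p >= 1 (assumed grid)

gaps = [
    power_gap(r, s, p)
    for r, s in itertools.product(values, values)
    if r >= s
    for p in exponents
]
print(min(gaps))
```

Equality occurs at the boundary cases $s = 0$, $r = s$ and $p = 1$, which matches the proof: the auxiliary function vanishes at $x = s$ and increases afterwards.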

Theorem 4.89 (Monotone convergence theorem for $L^p$). Let $f_n$ be an increasing (or decreasing) sequence of functions in $\mathcal{L}^p(\mu,\mathbf{R})$ such that the sequence $\|f_n\|_p$ is bounded. Then $f_n$ is both $L^p$ and almost everywhere convergent to some $f \in \mathcal{L}^p(\mu,\mathbf{R})$.

Proof. Assume that $f_n$ is an increasing sequence. Let $C = \sup_{n \ge 1} \|f_n\|_p$, which is finite by assumption. For all $n \ge m$ we have

\[
\|f_n - f_m\|_p^p = \int_X (f_n - f_m)^p \le \int_X f_n^p - \int_X f_m^p \le C^p - \int_X f_m^p
\]

by Lemma 4.88, so $f_n$ is $L^p$-Cauchy and converges to some $f \in \mathcal{L}^p(\mu)$ in $L^p$ (see Theorem 4.86). By Theorem 4.84, Lemma 4.49 and Theorem 4.60 there is some subsequence of $f_n$ that converges to $f$ almost everywhere. Since $f_n$ is increasing, $f_n \to f$ almost everywhere. The decreasing case follows by considering $-f_n$.

Theorem 4.90 (Dominated convergence theorem for $L^p$). Let $f_n$ be a sequence in $\mathcal{L}^p(\mu,\mathbf{E})$, and assume there is a nonnegative function $g \in \mathcal{L}^p(\mu,\mathbf{R})$ such that $|f_n| \le g$ for all $n$. If $f_n \to f$ almost everywhere for some map $f$, then $f \in \mathcal{L}^p(\mu,\mathbf{E})$ and $f_n \to f$ in $L^p$.

Proof. Since $|f_m - f_n| \le 2g$ and $|f_m - f_n|^p \le 2^p g^p$ for all $m, n$, Corollary 4.64 shows that $g_k \in \mathcal{L}^1(\mu,\mathbf{R})$ for all $k$, where $g_k = \sup_{m,n \ge k} |f_m - f_n|^p$. It is clear that $g_k \to 0$ almost everywhere since $f_n(x)$ converges for almost all $x$. But $g_k$ is decreasing, so Theorem 4.63 and Theorem 4.60 show that $g_k \to 0$ in $L^1$. Then

\[
\sup_{m,n \ge k} \|f_m - f_n\|_p^p = \sup_{m,n \ge k} \int_X |f_m - f_n|^p \le \int_X \sup_{m,n \ge k} |f_m - f_n|^p = \|g_k\|_1 ,
\]

which shows that $f_n$ is $L^p$-Cauchy and converges to some $h \in \mathcal{L}^p(\mu,\mathbf{E})$ in $L^p$. By Theorem 4.85, $f = h$ almost everywhere.

The space L∞(µ)

Let $\|\cdot\|$ be the supremum norm $\|f\| = \sup_{x \in X} |f(x)|$. If $f : X \to \mathbf{E}$ is measurable almost everywhere, we define the essential supremum of $f$ by

\[
\|f\|_\infty = \operatorname{ess\,sup} f = \inf_g \|g\| ,
\]

where the inf is taken over all bounded maps $g$ that are equal to $f$ almost everywhere ($\|f\|_\infty = \infty$ if there are no such maps). Equivalently, if we let $S_c = \{x \in X : |f(x)| \ge c\}$, then

\[
\|f\|_\infty = \inf \{c \ge 0 : \mu(S_c) = 0\} .
\]

Thus $|f| \le \|f\|_\infty$ almost everywhere, and $\|f\|_\infty = 0$ if and only if $f = 0$ almost everywhere. Note that

\[
\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty , \qquad \|rf\|_\infty = |r| \, \|f\|_\infty
\]

for all $f, g$ measurable almost everywhere and all $r \in \mathbf{K}$.
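For a measure made of finitely many point masses, the essential supremum reduces to the largest value of $|f|$ on points of positive mass, since values on null points can be changed freely. The following sketch illustrates this; the function name `ess_sup` and the sample data are assumptions for illustration.

```python
def ess_sup(values, weights):
    """Essential supremum of |f| for a measure given by point masses.

    Points with zero mass form a null set, so they are ignored.
    """
    return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)


f = [1.0, -7.0, 2.5]        # sample values of f (assumed data)
weights = [1.0, 0.0, 3.0]   # the second point is a null set

print(max(abs(v) for v in f))   # supremum norm of f
print(ess_sup(f, weights))      # essential supremum of f
```

Here the supremum norm sees the value at the null point while the essential supremum does not, which is exactly the difference between $\|f\|$ and $\|f\|_\infty$.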


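Let $\mathcal{L}^\infty(X,\mu,\mathbf{E}) = \mathcal{L}^\infty(\mu,\mathbf{E}) = \mathcal{L}^\infty(\mu)$ be the vector space of all $\mu$-measurable maps $f : X \to \mathbf{E}$ such that $\|f\|_\infty < \infty$, and let $L^\infty(X,\mu,\mathbf{E}) = L^\infty(\mu,\mathbf{E}) = L^\infty(\mu)$ be the quotient space $\mathcal{L}^\infty(\mu)/\mathcal{L}^0(\mu)$. Then $\|\cdot\|_\infty$ is a seminorm on $\mathcal{L}^\infty(\mu)$, and $L^\infty(\mu)$ is a normed vector space.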

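Theorem 4.91 (Hölder's inequality, second version). Let $1 \le p, q \le \infty$ be conjugate exponents. Let $\cdot : \mathbf{E}_1 \times \mathbf{E}_2 \to \mathbf{E}$ be a continuous map such that $|u \cdot v| \le C |u| |v|$ for all $u \in \mathbf{E}_1$ and $v \in \mathbf{E}_2$. If $f \in \mathcal{L}^p(\mu,\mathbf{E}_1)$ and $g \in \mathcal{L}^q(\mu,\mathbf{E}_2)$, then $f \cdot g \in \mathcal{L}^1(\mu,\mathbf{E})$ and $\|f \cdot g\|_1 \le C \|f\|_p \|g\|_q$.

Proof. See Theorem 4.81 for the case $p, q > 1$. Suppose that $p = 1$ and $q = \infty$. Note that $f \cdot g$ is $\mu$-measurable by Theorem 4.38. We have $|f \cdot g| \le C |f| |g| \le C |f| \|g\|_\infty$ almost everywhere, so

\[
\int_X |f \cdot g| \le C \|g\|_\infty \int_X |f| = C \|f\|_1 \|g\|_\infty .
\]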

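Corollary 4.92. Suppose that $\mu(X) < \infty$. If $1 \le p \le q \le \infty$, then $\mathcal{L}^q(\mu) \subseteq \mathcal{L}^p(\mu)$.

Proof. Let $f \in \mathcal{L}^q(\mu)$. If $q = \infty$ then $|f| \le \|f\|_\infty < \infty$ almost everywhere, so Theorem 4.67 shows that $f \in \mathcal{L}^p(\mu)$. Otherwise, we can assume that $1 \le p < q < \infty$; choose $r$ such that

\[
\frac{p}{q} + \frac{p}{r} = 1 .
\]

Since $(|f|^p)^{q/p} \in \mathcal{L}^1(\mu)$, applying Theorem 4.91 to $|f|^p$ and $1 : X \to \mathbf{K}$ (with the conjugate exponents $q/p$ and $r/p$) gives $|f|^p \in \mathcal{L}^1(\mu)$ and

\[
\| |f|^p \|_1 \le \| |f|^p \|_{q/p} \, \|1\|_{r/p} = \mu(X)^{p/r} \|f\|_q^p .
\]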
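Taking $p$-th roots in the last estimate gives $\|f\|_p \le \mu(X)^{1/p - 1/q} \|f\|_q$, because $p/r = 1 - p/q$. The sketch below checks this form of the bound on a three-point measure space; the point masses, the sample function and the helper name `lp_norm` are illustrative assumptions.

```python
def lp_norm(values, weights, p):
    """L^p norm of a function on a finite set with the given point masses."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


weights = [0.5, 1.5, 2.0]   # mu({x}); total mass mu(X) = 4 (assumed data)
f = [3.0, -1.0, 0.5]        # sample function (assumed data)
p, q = 1.5, 4.0             # exponents with 1 <= p < q < infinity

total = sum(weights)
bound = total ** (1.0 / p - 1.0 / q) * lp_norm(f, weights, q)
print(lp_norm(f, weights, p), bound)
```

The printed $L^p$ norm is the smaller of the two numbers, as the corollary requires on a space of finite measure.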

Convergence in L∞

We add yet another mode of convergence to our list in Definition 4.42:

Definition 4.93. Let $f_n$ be a sequence in $\mathcal{L}^\infty(\mu)$ and let $f \in \mathcal{L}^\infty(\mu)$.

1. We say that $f_n$ converges to $f$ in $L^\infty$ if $f_n \to f$ in the $L^\infty$-norm. That is, $\|f_n - f\|_\infty \to 0$ as $n \to \infty$.

2. We say that $f_n$ is $L^\infty$-Cauchy if it is a Cauchy sequence under the $L^\infty$-norm. That is, for every $\varepsilon > 0$ there exists some $N$ such that $\|f_m - f_n\|_\infty < \varepsilon$ for all $m, n \ge N$.

Convergence in $L^\infty$ is even stronger than almost uniform convergence: if $f_n \to f$ in $L^\infty$ then we can find a set $E$ of measure zero such that $f_n \to f$ uniformly on $X \setminus E$.

Theorem 4.94. Let fn be a sequence in L∞(µ).

1. If fn → f in L∞, then fn → f almost uniformly (and in measure).

2. If $f_n$ is $L^\infty$-Cauchy, then $f_n$ is almost uniformly Cauchy (and Cauchy in measure).

Theorem 4.95 (Uniqueness of limits, with $L^\infty$). Suppose that $f_n \to f$ almost everywhere, almost uniformly, in measure, or in $L^p$ ($1 \le p \le \infty$). Also suppose that $f_n \to g$ almost everywhere, almost uniformly, in measure, or in $L^p$ ($1 \le p \le \infty$). Then $f = g$ almost everywhere.


Theorem 4.96. L∞(µ) is complete, i.e. a Banach space.

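Proof. Let $f_n$ be an $L^\infty$-Cauchy sequence in $\mathcal{L}^\infty(\mu)$. Let $E$ be the set of all $x \in X$ such that

\[
|f_n(x)| > \|f_n\|_\infty \quad \text{or} \quad |f_m(x) - f_n(x)| > \|f_m - f_n\|_\infty
\]

for some $n$ or some $m, n$. Then $E$ has measure zero, and $f_n$ is uniformly Cauchy on $X \setminus E$. If we define $f(x) = \lim_{n \to \infty} f_n(x)$ for $x \in X \setminus E$, and $f(x) = 0$ for $x \in E$, then $f \in \mathcal{L}^\infty(\mu)$ and $f_n \to f$ in $L^\infty$.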

Theorem 4.97. If $\mathbf{E}$ is finite-dimensional, then the set of simple maps is dense in $L^\infty(\mu,\mathbf{E})$.

Proof. We can assume that $\mathbf{E} = \mathbf{R}$. Let $\varepsilon > 0$ and let $f \in \mathcal{L}^\infty(\mu,\mathbf{R})$. We can assume that $f$ is measurable and bounded. Choose an interval $(a, b]$ that contains $f(X)$, and choose a positive integer $n$ such that $(b - a)/n < \varepsilon$. Let

\[
I_k = \left( a + \frac{k(b - a)}{n}, \; a + \frac{(k + 1)(b - a)}{n} \right]
\]

for $k = 0, \dots, n - 1$. For each $k$, choose any $c_k \in I_k$. Let $g = c_0 \chi_{f^{-1}(I_0)} + \cdots + c_{n-1} \chi_{f^{-1}(I_{n-1})}$, which is simple. If $x \in X$ then $f(x) \in I_k$ for some $k$ and $g(x) = c_k \in I_k$, so $|f(x) - g(x)| < \varepsilon$. This shows that $\|f - g\|_\infty < \varepsilon$.


In general, the set of integrable simple maps is not dense in L∞(µ). However:

Theorem 4.98. Suppose $\mu(X) < \infty$ and $f \in \mathcal{L}^\infty(\mu)$. For all $\delta, \varepsilon > 0$, there exists some $g \in I(\mu)$ and a measurable set $E$ with $\mu(E) < \delta$ such that $|f - g| < \varepsilon$ on $X \setminus E$.

Proof. Since $\mu(X) < \infty$, Corollary 4.92 shows that $f \in \mathcal{L}^1(\mu)$. Theorem 4.61 implies that there is a sequence of integrable simple maps converging to $f$ almost uniformly, and the result follows immediately.

The Hilbert space L2(µ,E)

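Let $\mathbf{E}$ be a Hilbert space over $\mathbf{K}$. If $f, g : X \to \mathbf{E}$, we define $\langle f, g \rangle : X \to \mathbf{K}$ by $\langle f, g \rangle (x) = \langle f(x), g(x) \rangle$.

Lemma 4.99. If $f, g \in \mathcal{L}^2(\mu,\mathbf{E})$, then $\langle f, g \rangle \in \mathcal{L}^1(\mu,\mathbf{K})$.

Proof. Apply Theorem 4.81 using the inner product. Alternatively, since $f$ and $g$ are $\mu$-measurable, $\langle f, g \rangle$ is also $\mu$-measurable. From the Cauchy-Schwarz inequality we have

\[
2 |\langle f, g \rangle| \le 2 |f| |g| \le |f|^2 + |g|^2 ,
\]

so we can apply Theorem 4.67.

Define a positive semidefinite sesquilinear map $\langle \cdot, \cdot \rangle_\mu : \mathcal{L}^2(\mu,\mathbf{E}) \times \mathcal{L}^2(\mu,\mathbf{E}) \to \mathbf{K}$ by

\[
\langle f, g \rangle_\mu = \int_X \langle f, g \rangle .
\]

Clearly

\[
\langle f, f \rangle_\mu = \int_X |f|^2 = \|f\|_2^2 ,
\]

so if $\langle f, f \rangle_\mu = 0$ then $f = 0$ almost everywhere. Thus if we define $\langle \cdot, \cdot \rangle_\mu : L^2(\mu,\mathbf{E}) \times L^2(\mu,\mathbf{E}) \to \mathbf{K}$ by

\[
\langle [f], [g] \rangle_\mu = \langle f, g \rangle_\mu ,
\]

then $\langle \cdot, \cdot \rangle_\mu$ is an inner product on $L^2(\mu,\mathbf{E})$ that induces the $L^2$-norm $\|\cdot\|_2$. With this inner product, $L^2(\mu,\mathbf{E})$ is a Hilbert space.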

Now let $\mu$ be an $\mathbf{E}$-valued measure of bounded variation on $X$, and consider the integral with respect to the inner product on $\mathbf{E}$. This is the integral satisfying

\[
\int_X \sum_{i=1}^n a_i \chi_{A_i} \, d\mu = \sum_{i=1}^n \langle a_i, \mu(A_i) \rangle
\]

for integrable simple maps. Since $|\mu|(X) < \infty$, Corollary 4.92 gives the inclusion $\mathcal{L}^2(|\mu|,\mathbf{E}) \subseteq \mathcal{L}^1(|\mu|,\mathbf{E})$. Thus we have a continuous linear map $\int_X : L^2(|\mu|,\mathbf{E}) \to \mathbf{K}$, and Theorem 1.96 shows that there is a unique $f_\mu \in L^2(|\mu|,\mathbf{E})$ such that

\[
\int_X f \, d\mu = \langle f, f_\mu \rangle_\mu = \int_X \langle f, f_\mu \rangle \, d|\mu|
\]

for all $f \in L^2(|\mu|,\mathbf{E})$.

Theorem 4.100 (Polar decomposition). Let $\mu$ be an $\mathbf{E}$-valued measure of bounded variation on $X$. There exists a unique $[f_\mu] \in L^2(|\mu|,\mathbf{E})$ with $|f_\mu| = 1$ on $X$ such that

\[
\int_X f \, d\mu = \int_X \langle f, f_\mu \rangle \, d|\mu|
\]

for all $f \in L^2(|\mu|,\mathbf{E})$. Furthermore,

\[
\mu(A) = \int_A f_\mu \, d|\mu|
\]

for all measurable $A \subseteq X$.

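Proof. It remains to show that we can choose $f_\mu$ so that $|f_\mu| = 1$ on $X$. We can assume that $f_\mu$ is measurable. Let $S_r = \{x \in X : |f_\mu(x)| < r\}$ and let $A_n$ be a partition of $S_r$. We can assume $\mu(A_n) \ne 0$ for all $n$. Then

\[
\begin{aligned}
\sum_{n=1}^\infty |\mu(A_n)|
&= \sum_{n=1}^\infty \frac{1}{|\mu(A_n)|} \left| \int_X \mu(A_n) \chi_{A_n} \, d\mu \right| \\
&= \sum_{n=1}^\infty \frac{1}{|\mu(A_n)|} \left| \int_X \langle \mu(A_n) \chi_{A_n}, f_\mu \rangle \, d|\mu| \right| \\
&\le \sum_{n=1}^\infty r \, |\mu|(A_n) = r \, |\mu|(S_r),
\end{aligned}
\]

so $|\mu|(S_r) \le r \, |\mu|(S_r)$. If $r < 1$ then $|\mu|(S_r) = 0$, so $|f_\mu| \ge 1$ almost everywhere. Conversely, let $A \subseteq X$ be measurable and let $e \in \mathbf{E}$ be a unit vector. Then

\[
\left| \int_A \langle f_\mu, e \rangle \, d|\mu| \right| = \left| \int_X \langle e \chi_A, f_\mu \rangle \, d|\mu| \right| = \left| \int_X e \chi_A \, d\mu \right| \le |\mu|(A),
\]

so Corollary 4.73 shows that $|f_\mu| \le 1$ almost everywhere. Finally, let $A \subseteq X$ be measurable. For all $e \in \mathbf{E}$ we have

\[
\langle e, \mu(A) \rangle = \int_X e \chi_A \, d\mu = \int_X \langle e \chi_A, f_\mu \rangle \, d|\mu| = \left\langle e, \int_A f_\mu \, d|\mu| \right\rangle ,
\]

so taking $e = \mu(A) - \int_A f_\mu \, d|\mu|$ completes the proof.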


4.5 Duality

Indefinite integrals

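Let $\mu$ be a (positive or $\mathbf{F}$-valued) measure on $(X,\mathcal{M})$ and let $f \in \mathcal{L}^1(|\mu|,\mathbf{E})$. (If $\mu$ is positive, we take $\mathbf{F} = \mathbf{R}$.) Assume that we are given a compatible product $\cdot : \mathbf{E} \times \mathbf{F} \to \mathbf{G}$. Define a map $\mu_f : \mathcal{M} \to \mathbf{G}$, called the indefinite integral of $f$, by

\[
\mu_f(A) = \int_A f \, d\mu .
\]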

Theorem 4.101. µf is a G-valued measure on X.

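Proof. It is clear that $\mu_f(\emptyset) = 0$. Let $A \in \mathcal{M}$ and let $A_n$ be a partition of $A$. Since

\[
\sum_{n=1}^N \int_X \chi_{A_n} |f| \, d|\mu| \le \int_X \chi_A |f| \, d|\mu|
\]

for all $N$, Theorem 4.70 shows that

\[
\sum_{n=1}^\infty \mu_f(A_n) = \sum_{n=1}^\infty \int_X \chi_{A_n} f \, d\mu = \int_X \sum_{n=1}^\infty \chi_{A_n} f \, d\mu = \int_X \chi_A f \, d\mu = \mu_f(A) .
\]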

Let $\mu, \nu$ be measures on $(X,\mathcal{M})$. We say that $\nu$ is absolutely continuous with respect to $\mu$, or $\mu$-continuous, if $|\mu|(A) = 0$ implies $\nu(A) = 0$ for all $A \in \mathcal{M}$. In that case, we write $\nu \ll \mu$. Clearly, $\mu_f$ is always $\mu$-continuous and of bounded variation.

Suppose that $\nu$ is $\mu$-continuous. If a property holds $\mu$-almost everywhere, then there is a set $E$ with $|\mu|(E) = 0$ such that the property holds on $X \setminus E$. But $|\nu|(E) = 0$, so the property also holds $\nu$-almost everywhere. Thus if $f_n \to f$ pointwise $\mu$-almost everywhere then $f_n \to f$ pointwise $\nu$-almost everywhere; if $f$ is $\mu$-measurable then $f$ is $\nu$-measurable; and we have the inclusion $\mathcal{L}^\infty(|\mu|) \subseteq \mathcal{L}^\infty(|\nu|)$.

Theorem 4.102. Let µ, ν be measures on X. The following are equivalent:

1. ν is µ-continuous.

2. |ν| is µ-continuous.


3. ν is |µ|-continuous.

If ν is of bounded variation, then the preceding statements are equivalent to:

4. For all $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $A \in \mathcal{M}$, $|\mu|(A) < \delta$ implies that $|\nu|(A) < \varepsilon$.

Proof. Suppose that (1) holds and $|\mu|(A) = 0$. Then $|\mu|(B) = 0$ and therefore $\nu(B) = 0$ for all measurable $B \subseteq A$, so $|\nu|(A) = 0$. Conversely, if (2) holds and $|\mu|(A) = 0$, then $|\nu(A)| \le |\nu|(A) = 0$. This proves (1) $\Leftrightarrow$ (2). It is clear that (1) $\Leftrightarrow$ (3). Suppose that (2) holds but (4) does not hold. Then for some $\varepsilon > 0$, we can find for each $n$ a set $A_n \in \mathcal{M}$ with $|\mu|(A_n) < 2^{-n}$ such that $|\nu|(A_n) \ge \varepsilon$. Let $B_n = \bigcup_{k=n}^\infty A_k$ and $B = \bigcap_{n=1}^\infty B_n$. Theorem 4.9 shows that $|\mu|(B) = 0$ but $|\nu|(B) = \lim_{n \to \infty} |\nu|(B_n) \ge \varepsilon$, which is a contradiction. This proves (2) $\Rightarrow$ (4). If (4) holds and $|\mu|(A) = 0$, then for each $n$ we have $|\nu|(A) < 1/n$. Therefore $|\nu|(A) = 0$. This proves (4) $\Rightarrow$ (2).

Recall that the set $M(\mathcal{M},\mathbf{F})$ of all $\mathbf{F}$-valued measures on $X$ of bounded variation is a Banach space (see Theorem 4.13). Note that we have a linear map $f \mapsto \mu_f$ from $L^1(|\mu|,\mathbf{E})$ to $M(\mathcal{M},\mathbf{G})$.

Theorem 4.103 (Total variation of an indefinite integral). The map $f \mapsto \mu_f$ is continuous. That is,

\[
|\mu_f|(A) \le \int_A |f| \, d|\mu|
\]

for all $A \in \mathcal{M}$. If the compatible product is norm-preserving, then equality holds and the map $f \mapsto \mu_f$ is an isometry.

Proof. Let $A \in \mathcal{M}$ and let $A_i$ be a partition of $A$. Then

\[
\sum_{i=1}^n |\mu_f(A_i)| = \sum_{i=1}^n \left| \int_{A_i} f \, d\mu \right| \le \sum_{i=1}^n \int_{A_i} |f| \, d|\mu| \le \int_A |f| \, d|\mu|
\]

for all $n$, so

\[
|\mu_f|(A) \le \int_A |f| \, d|\mu| .
\]

This shows that $f \mapsto \mu_f$ is continuous. Now assume that the compatible product is norm-preserving, i.e. $|ef| = |e| |f|$ for all $e \in \mathbf{E}$ and $f \in \mathbf{F}$. Let $f = \sum_{i=1}^n a_i \chi_{A_i}$ be an integrable simple map on $A$, where $A_i$ is a partition of $A$. Then

\[
\int_A |f| \, d|\mu| = \sum_{i=1}^n |a_i| \, |\mu(A_i)| = \sum_{i=1}^n |a_i \mu(A_i)| = \sum_{i=1}^n |\mu_f(A_i)| \le |\mu_f|(A) .
\]

Since the integrable simple maps are dense in $\mathcal{L}^1(|\mu|)$, we must have

\[
\int_A |f| \, d|\mu| \le |\mu_f|(A)
\]

for all $f \in \mathcal{L}^1(|\mu|)$.
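The equality case of Theorem 4.103 can be checked by brute force on a small finite set. The sketch below assumes, as in the proof above, that the total variation of a measure on a set is the supremum of $\sum_i |\nu(A_i)|$ over partitions of that set; it enumerates all partitions of a four-point space for a real-valued density. The data and the helper names `partitions` and `total_variation` are illustrative assumptions.

```python
def partitions(items):
    """Yield every partition of a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def total_variation(point_masses):
    """Supremum of sum |nu(A_i)| over all partitions of the whole set."""
    atoms = list(range(len(point_masses)))
    return max(
        sum(abs(sum(point_masses[x] for x in block)) for block in part)
        for part in partitions(atoms)
    )


mu = [1.0, 2.0, 0.5, 4.0]       # positive measure mu({x}) (assumed data)
f = [3.0, -1.0, 2.0, -0.25]     # real-valued density (assumed data)
mu_f = [fx * mx for fx, mx in zip(f, mu)]   # point masses of mu_f

lhs = total_variation(mu_f)                          # |mu_f|(X)
rhs = sum(abs(fx) * mx for fx, mx in zip(f, mu))     # integral of |f| d(mu)
print(lhs, rhs)
```

The supremum is attained by the partition into single points, where no cancellation between positive and negative values of $f$ can occur.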

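Theorem 4.104 (Integration under an indefinite integral). Let $\mu$ be a positive measure and let $f \in \mathcal{L}^1(\mu,\mathbf{F})$.

1. If $g \in \mathcal{L}^1(|\mu_f|,\mathbf{E})$ then $gf \in \mathcal{L}^1(\mu,\mathbf{G})$ and

\[
\int_X g \, d\mu_f = \int_X gf \, d\mu .
\]

2. If $g \in \mathcal{L}^\infty(|\mu|,\mathbf{E})$ then $g \in \mathcal{L}^1(|\mu_f|,\mathbf{E})$, and (1) applies.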

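Proof. We first show that the equality holds when $g$ is an integrable simple map. Let $g = \sum_{i=1}^n a_i \chi_{A_i}$ be such a map in $I(|\mu_f|,\mathbf{E})$. Then

\[
\int_X g \, d\mu_f = \sum_{i=1}^n a_i \int_X \chi_{A_i} \, d\mu_f = \sum_{i=1}^n a_i \mu_f(A_i) = \sum_{i=1}^n \int_X a_i \chi_{A_i} f \, d\mu = \int_X gf \, d\mu .
\]

Similarly, $\int_X g \, d|\mu_f| = \int_X g |f| \, d\mu$ using Theorem 4.103. For (1), let $g_n$ be an $L^1$-Cauchy sequence of maps in $I(|\mu_f|,\mathbf{E})$ converging to $g$ almost everywhere and in $L^1$. Then $g_n f \to gf$ almost everywhere, and $g_n f$ is $L^1$-Cauchy since

\[
\int_X |g_m f - g_n f| \, d\mu \le \int_X |g_m - g_n| |f| \, d\mu = \int_X |g_m - g_n| \, d|\mu_f| .
\]

Theorem 4.61 shows that $gf \in \mathcal{L}^1(\mu,\mathbf{G})$, $g_n f \to gf$ in $L^1$ (with respect to $\mu$), and

\[
\int_X g \, d\mu_f = \lim_{n \to \infty} \int_X g_n \, d\mu_f = \lim_{n \to \infty} \int_X g_n f \, d\mu = \int_X gf \, d\mu .
\]

For (2), $g \in \mathcal{L}^1(|\mu_f|,\mathbf{E})$ since $|\mu_f|(X) < \infty$ and

\[
\mathcal{L}^\infty(\mu,\mathbf{E}) \subseteq \mathcal{L}^\infty(|\mu_f|,\mathbf{E}) \subseteq \mathcal{L}^1(|\mu_f|,\mathbf{E})
\]

by Corollary 4.92.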

The Radon-Nikodym theorem

We say that a measure $\mu$ on $X$ is concentrated or carried in $A \in \mathcal{M}$ if $|\mu|(X \setminus A) = 0$ (i.e. $\mu(B) = 0$ for all measurable $B \subseteq X \setminus A$). We say that two measures $\mu, \nu$ on $X$ are orthogonal or mutually singular and write $\mu \perp \nu$ if there are disjoint measurable sets $A$ and $B$ such that $A \cup B = X$, $\mu$ is concentrated in $A$, and $\nu$ is concentrated in $B$.

Lemma 4.105. Let µ, ν, λ be (positive or F-valued) measures on X.

1. If µ ⊥ µ, then µ = 0.

2. µ is concentrated in A if and only if |µ| is concentrated in A.

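3. $\mu \perp \nu$ if and only if $|\mu| \perp |\nu|$. If $\mu, \nu$ are of bounded variation, then $|\mu + \nu| = |\mu| + |\nu|$.

4. If $\mu \perp \lambda$ and $\nu \perp \lambda$, then $\mu + \nu \perp \lambda$.

5. If $\mu \ll \lambda$ and $\nu \ll \lambda$, then $\mu + \nu \ll \lambda$.

6. If $\lambda \perp \mu$ and $\nu \ll \mu$, then $\lambda \perp \nu$.

7. If $\lambda \perp \mu$ and $\lambda \ll \mu$, then $\lambda = 0$.

Proof. For (6), let $A$ be a set with $|\mu|(A) = 0$ in which $\lambda$ is concentrated. Then $|\nu|(A) = 0$, so $\nu$ is concentrated in $X \setminus A$.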

Lemma 4.106. If $\mu$ is a $\sigma$-finite positive measure on $X$, then there exists a $w \in \mathcal{L}^1(\mu)$ such that $0 < w < 1$ on $X$.

Proof. Write $X = \bigcup_{n=1}^\infty X_n$ where $\mu(X_n) < \infty$ for all $n$. Set $w_n(x) = 2^{-n}/(1 + \mu(X_n))$ for $x \in X_n$ and $w_n(x) = 0$ for $x \in X \setminus X_n$. Then $w = \sum_{n=1}^\infty w_n$ is a suitable function.

Theorem 4.107 (Radon-Nikodym theorem and Lebesgue decomposition). Let $\mu$ be a $\sigma$-finite positive measure on $(X,\mathcal{M})$ and let $\nu$ be a finite positive measure on $\mathcal{M}$.

1. There exist unique finite positive measures $\nu_a$ and $\nu_s$ on $\mathcal{M}$ with $\nu_a \ll \mu$ and $\nu_s \perp \mu$ such that

\[
\nu = \nu_a + \nu_s .
\]

2. There exists a unique $[f] \in L^1(\mu,\mathbf{R})$ such that $\nu_a = \mu_f$. That is,

\[
\nu_a(A) = \int_A f \, d\mu
\]

for all $A \in \mathcal{M}$. This $f$ is the Radon-Nikodym derivative of $\nu_a$ with respect to $\mu$, and we denote it by $d\nu_a/d\mu$.

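Proof. To prove uniqueness, suppose that $\nu_a'$ and $\nu_s'$ also satisfy (1). Then $\nu_a - \nu_a' = \nu_s' - \nu_s$, $\nu_a - \nu_a' \ll \mu$ and $\nu_s' - \nu_s \perp \mu$, so Lemma 4.105 shows that $\nu_a - \nu_a' = \nu_s' - \nu_s = 0$. For existence, let $w \in \mathcal{L}^1(\mu)$ be as in Lemma 4.106. Let $\lambda = \nu + \mu_w$; then $\lambda$ is a finite positive measure on $\mathcal{M}$. If $f \in \mathcal{L}^2(\lambda)$ then $f \in \mathcal{L}^2(\nu)$ and $f \in \mathcal{L}^1(\nu)$ by Corollary 4.92, so the map

\[
f \mapsto \int_X f \, d\nu
\]

is a continuous linear map from $\mathcal{L}^2(\lambda)$ to $\mathbf{R}$. By Theorem 1.96, there is a $g \in \mathcal{L}^2(\lambda)$, unique $\lambda$-almost everywhere, such that

\[
\int_X f \, d\nu = \int_X fg \, d\lambda \qquad (*)
\]

for all $f \in \mathcal{L}^2(\lambda)$. If $A \in \mathcal{M}$ with $\lambda(A) > 0$, then

\[
0 \le \frac{1}{\lambda(A)} \int_A g \, d\lambda = \frac{1}{\lambda(A)} \int_X \chi_A \, d\nu = \frac{\nu(A)}{\lambda(A)} \le 1 .
\]

By Theorem 4.71 we have $0 \le g \le 1$, $\lambda$-almost everywhere. Therefore we can assume that $0 \le g \le 1$ on $X$. Let

\[
A = \{x \in X : 0 \le g(x) < 1\} , \qquad B = \{x \in X : g(x) = 1\}
\]

and define $\nu_a(E) = \nu(E \cap A)$ and $\nu_s(E) = \nu(E \cap B)$. Rewrite (*) as

\[
\int_X (1 - g) f \, d\nu = \int_X fgw \, d\mu .
\]

If we set $f = \chi_B$ then

\[
\int_B w \, d\mu = 0 ,
\]

so $\mu(B) = 0$ by Theorem 4.71 since $w > 0$ on $X$. Therefore $\nu_s \perp \mu$. Let $E \in \mathcal{M}$. If we set $f = \chi_E (1 + g + \cdots + g^n)$ then

\[
\int_E (1 - g^{n+1}) \, d\nu = \int_E g (1 + g + \cdots + g^n) w \, d\mu = \int_{E \cap A} \frac{g (1 - g^{n+1})}{1 - g} w \, d\mu
\]

since $1 - g = 0$ only on $B$, and we know that $\mu(B) = 0$. Note that

\[
1 - g^{n+1} \to \chi_{E \cap A} , \qquad \frac{g (1 - g^{n+1})}{1 - g} w \to \frac{g}{1 - g} w
\]

as $n \to \infty$, so by Theorem 4.66 we have

\[
\nu_a(E) = \nu(E \cap A) = \int_{E \cap A} \frac{g}{1 - g} w \, d\mu .
\]

Thus

\[
\chi_A \frac{g}{1 - g} w
\]

is our desired function in (2). It is clear that $\nu_a \ll \mu$.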
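The construction in the proof of Theorem 4.107 can be followed step by step when $X$ is a finite set and both measures are given by point masses. The sketch below is a minimal illustration under that assumption: it forms $\lambda = \nu + \mu_w$, the density $g = d\nu/d\lambda$, the sets $A = \{g < 1\}$ and $B = \{g = 1\}$, and the function $\chi_A \, g w/(1 - g)$. The function name `lebesgue_decomposition` and the sample data are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction as F


def lebesgue_decomposition(mu, nu, w):
    """Follow the proof of Theorem 4.107 on a finite set of atoms.

    mu[i] and nu[i] are point masses and 0 < w[i] < 1.
    Returns (density, nu_a, nu_s) as lists indexed by the atoms.
    """
    density, nu_a, nu_s = [], [], []
    for m, n, wi in zip(mu, nu, w):
        lam = n + wi * m                      # lambda = nu + mu_w
        g = n / lam if lam > 0 else F(0)      # g = d(nu)/d(lambda), 0 <= g <= 1
        if g < 1:                             # the atom lies in A = {g < 1}
            density.append(g * wi / (1 - g))  # chi_A * g * w / (1 - g)
            nu_a.append(n)
            nu_s.append(F(0))
        else:                                 # the atom lies in B = {g = 1}
            density.append(F(0))
            nu_a.append(F(0))
            nu_s.append(n)
    return density, nu_a, nu_s


mu = [F(2), F(0), F(1, 2), F(0)]           # assumed point masses of mu
nu = [F(3), F(5), F(0), F(0)]              # assumed point masses of nu
w = [F(1, 4), F(1, 4), F(1, 8), F(1, 8)]   # any weights with 0 < w < 1

density, nu_a, nu_s = lebesgue_decomposition(mu, nu, w)
print(density, nu_a, nu_s)
```

On an atom where $\mu$ has positive mass the computed density simplifies to $\nu(\{x\})/\mu(\{x\})$ regardless of the choice of $w$, and the singular part $\nu_s$ lives only on atoms where $\mu$ vanishes.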

Theorem 4.108 (Radon-Nikodym theorem for Hilbert spaces). Let $\mathbf{E}$ be a Hilbert space, let $\mu$ be a $\sigma$-finite positive measure on $(X,\mathcal{M})$ and let $\nu$ be an $\mathbf{E}$-valued measure of bounded variation on $\mathcal{M}$. If $\nu \ll \mu$, then there exists a unique $[f] \in L^1(\mu,\mathbf{E})$ such that $\nu = \mu_f$.

Proof. Since $|\nu| \ll \mu$, Theorem 4.107 shows that we can write $|\nu| = \mu_g$ for some $[g] \in L^1(\mu,\mathbf{R})$. If $A \subseteq X$ is measurable then

\[
\nu(A) = \int_A f_\nu \, d|\nu| = \int_A f_\nu \, d\mu_g = \int_A g f_\nu \, d\mu = \mu_{g f_\nu}(A)
\]

using Theorem 4.100 and Theorem 4.104. Therefore $\nu = \mu_{g f_\nu}$.


In general, if $\nu(A) = \int_A f \, d\mu$ for all $A \in \mathcal{M}$ then we write $d\nu = f \, d\mu$ or $d\nu(x) = f(x) \, d\mu(x)$. Clearly, if both $d\nu = f \, d\mu$ and $d\nu = g \, d\mu$ then $f = g$ $\mu$-almost everywhere. If $\mu$ and $\nu$ satisfy the conditions of the Radon-Nikodym theorem and $\nu \ll \mu$ then we can write

\[
d\nu = \frac{d\nu}{d\mu} \, d\mu ,
\]

where $d\nu/d\mu$ is unique ($\mu$-almost everywhere).

Note that for any $f \in \mathcal{L}^1(\mu,\mathbf{E})$ the indefinite integral $\mu_f$ satisfies $d\mu_f = f \, d\mu$. If the compatible product is norm-preserving, then Theorem 4.103 shows that $d|\mu_f| = |f| \, d|\mu|$. Theorem 4.100 shows that for any $\mathbf{E}$-valued measure $\mu$ of bounded variation on $X$ we have $d\mu = f_\mu \, d|\mu|$ for a unique $[f_\mu] \in L^2(|\mu|,\mathbf{E})$ such that $|f_\mu| = 1$ on $X$.

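Theorem 4.109. Let $\mathbf{E}$ be a Hilbert space over $\mathbf{K}$, let $\mu, \lambda$ be $\sigma$-finite positive measures on $(X,\mathcal{M})$, and let $\nu, \nu'$ be $\mathbf{E}$-valued measures of bounded variation on $\mathcal{M}$ such that $\nu, \nu' \ll \mu$ and $\mu \ll \lambda$.

1. $d(\nu + \nu')/d\mu = d\nu/d\mu + d\nu'/d\mu$ ($\mu$-almost everywhere).

2. $d(r\nu)/d\mu = r \, d\nu/d\mu$ ($\mu$-almost everywhere) for all $r \in \mathbf{K}$.

3. $\nu \ll \lambda$ and

\[
\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu} \frac{d\mu}{d\lambda}
\]

($\lambda$-almost everywhere).

4. If we also have $\lambda \ll \mu$ then

\[
\frac{d\lambda}{d\mu} \frac{d\mu}{d\lambda} = 1
\]

($\lambda$-almost everywhere).

Proof. For (3), apply the first part of Theorem 4.104 with $g = \chi_A \, d\nu/d\mu$ to get

\[
\nu(A) = \int_A \frac{d\nu}{d\mu} \, d\mu = \int_A \frac{d\nu}{d\mu} \frac{d\mu}{d\lambda} \, d\lambda
\]

for all $A \in \mathcal{M}$.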

Duality for the Lp spaces

Let $\mu$ be a positive measure on $(X,\mathcal{M})$. We say that a function $\gamma : \mathcal{M} \to [0,\infty)$ is $\mu$-continuous if for all $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $A \in \mathcal{M}$, $\mu(A) < \delta$ implies that $\gamma(A) < \varepsilon$. If $\gamma$ is a finite positive measure, then Theorem 4.102 shows that $\gamma$ is $\mu$-continuous in the present sense if and only if it is $\mu$-continuous in the sense that $\mu(A) = 0$ implies $\gamma(A) = 0$. If $L$ is a subspace of $\mathcal{L}^\infty(\mu,\mathbf{E})$, we say that a linear functional $\lambda : L \to \mathbf{K}$ is $\mu$-continuous if there is a $\mu$-continuous function $\gamma : \mathcal{M} \to [0,\infty)$ such that $|\lambda(g \chi_A)| \le \|g\|_\infty \gamma(A)$ for all $g \in L$ and $A \in \mathcal{M}$.

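Let $\mathbf{E}$ be a Hilbert space over $\mathbf{K}$, and let $f \in \mathcal{L}^1(\mu,\mathbf{E})$. For all $g \in \mathcal{L}^\infty(\mu,\mathbf{E})$, Theorem 4.104 shows that

\[
\left| \int_X \langle g \chi_A, f \rangle \, d\mu \right| = \left| \int_A g \, d\mu_f \right| \le \|g\|_\infty \, |\mu_f|(A) .
\]

(Here the compatible product is the inner product on $\mathbf{E}$.) Since $|\mu_f|$ is $\mu$-continuous, this shows that the linear functional

\[
g \mapsto \int_X \langle g, f \rangle \, d\mu
\]

is $\mu$-continuous. In fact, the converse is true.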

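Lemma 4.110. Let $\mu$ be a finite positive measure on $X$. If $\lambda_1, \lambda_2 : \mathcal{L}^\infty(\mu,\mathbf{E}) \to \mathbf{K}$ are $\mu$-continuous linear functionals that agree on $I(\mu,\mathbf{E})$, then $\lambda_1 = \lambda_2$.

Proof. Let $g \in \mathcal{L}^\infty(\mu,\mathbf{E})$ and let $\varepsilon > 0$. Choose $\mu$-continuous functions $\gamma_1, \gamma_2 : \mathcal{M} \to [0,\infty)$ such that

\[
|\lambda_1(g \chi_E)| \le \|g\|_\infty \gamma_1(E) \quad \text{and} \quad |\lambda_2(g \chi_E)| \le \|g\|_\infty \gamma_2(E)
\]

for all $E \in \mathcal{M}$, and choose a $\delta > 0$ such that $\gamma_1(E) < \varepsilon$ and $\gamma_2(E) < \varepsilon$ whenever $\mu(E) < \delta$. By Theorem 4.98, there is some $h \in I(\mu,\mathbf{E})$ and a measurable set $E$ with $\mu(E) < \delta$ such that $|g - h| < \varepsilon$ on $X \setminus E$. We can assume that $\|h\|_\infty \le \|g\|_\infty$. Then

\[
|\lambda_1 g - \lambda_2 g| \le |\lambda_1 g - \lambda_1 h| + |\lambda_1 h - \lambda_2 h| + |\lambda_2 h - \lambda_2 g| = |\lambda_1(g - h)| + |\lambda_2(g - h)| .
\]

Also,

\[
|\lambda_1(g - h)| \le |\lambda_1[(g - h) \chi_E]| + |\lambda_1[(g - h) \chi_{X \setminus E}]| < 2 \|g\|_\infty \varepsilon + \varepsilon \gamma_1(X) ,
\]

and a similar argument shows that $|\lambda_2(g - h)| < 2 \|g\|_\infty \varepsilon + \varepsilon \gamma_2(X)$. Therefore

\[
|\lambda_1 g - \lambda_2 g| < \varepsilon \left( 4 \|g\|_\infty + \gamma_1(X) + \gamma_2(X) \right) ,
\]

and taking $\varepsilon \to 0$ shows that $\lambda_1 g = \lambda_2 g$.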


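Theorem 4.111. Let $\mathbf{E}$ be a Hilbert space over $\mathbf{K}$, let $\mu$ be a finite positive measure on $X$, and let $\lambda : I(\mu,\mathbf{E}) \to \mathbf{K}$ be a $\mu$-continuous linear functional. Then there exists a unique $[f] \in L^1(\mu,\mathbf{E})$ such that

\[
\lambda[g] = \int_X \langle g, f \rangle \, d\mu
\]

for all $[g] \in I(\mu,\mathbf{E})$.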

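Proof. Corollary 4.73 shows that $[f]$ is unique. To prove that $[f]$ exists, define a function $\nu : \mathcal{M} \to \mathbf{E}$ as follows: for any measurable $A \subseteq X$ the map $e \mapsto \lambda(e \chi_A)$ is a (continuous) linear functional on $\mathbf{E}$, so by Theorem 1.96 there exists a unique $\nu(A)$ such that $\lambda(e \chi_A) = \langle e, \nu(A) \rangle$ for all $e \in \mathbf{E}$. Note that $\nu(\emptyset) = 0$, and if $A$ and $B$ are disjoint then $\nu(A \cup B) = \nu(A) + \nu(B)$ since $\chi_{A \cup B} = \chi_A + \chi_B$. If $A = \bigcup_{n=1}^\infty A_n$ for an increasing sequence of sets $A_n$ then (whenever $\nu(A \setminus A_n) \ne 0$)

\[
\begin{aligned}
|\nu(A) - \nu(A_n)|
&= \left\langle \frac{\nu(A \setminus A_n)}{|\nu(A \setminus A_n)|}, \nu(A \setminus A_n) \right\rangle
= \lambda \left( \frac{\nu(A \setminus A_n)}{|\nu(A \setminus A_n)|} \chi_{A \setminus A_n} \right) \\
&\le |\lambda| \, \| \chi_{A \setminus A_n} \|_p = |\lambda| \, \mu(A \setminus A_n)^{1/p} \to 0
\end{aligned}
\]

as $n \to \infty$, which shows that $\nu$ is countably additive. If $A_n$ is a partition of $X$ with $\nu(A_n) \ne 0$ for all $n$ then

\[
\sum_{n=1}^\infty |\nu(A_n)| = \sum_{n=1}^\infty \left\langle \frac{\nu(A_n)}{|\nu(A_n)|}, \nu(A_n) \right\rangle = \lambda \left( \sum_{n=1}^\infty \frac{\nu(A_n)}{|\nu(A_n)|} \chi_{A_n} \right) \le |\lambda| \, \mu(X)^{1/p}
\]

since the sum defines a measurable map with norm 1 on $X$. Furthermore, if $\mu(A) = 0$ then $\|e \chi_A\|_p = 0$, so $\nu(A) = 0$. This proves that $\nu$ is an $\mathbf{E}$-valued measure of bounded variation, and $\nu \ll \mu$.

By Theorem 4.108, there exists some $f \in \mathcal{L}^1(\mu,\mathbf{E})$ such that

\[
\lambda(e \chi_A) = \langle e, \nu(A) \rangle = \left\langle e, \int_A f \, d\mu \right\rangle = \int_X \langle e \chi_A, f \rangle \, d\mu
\]

for all $e \in \mathbf{E}$ and measurable $A \subseteq X$. By linearity we have

\[
\lambda[g] = \int_X \langle g, f \rangle \, d\mu
\]

for all $g \in I(\mu,\mathbf{E})$, and Lemma 4.110 implies that this holds for all $g \in \mathcal{L}^\infty(\mu,\mathbf{E})$.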


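Recall that $L^2(\mu,\mathbf{E})$ is a Hilbert space under the inner product

\[
\langle f, g \rangle_\mu = \int_X \langle f, g \rangle \, d\mu .
\]

Theorem 4.91 shows that the integral is actually well-defined for any conjugate exponents $1 \le p, q \le \infty$ and maps $f \in \mathcal{L}^p(\mu,\mathbf{E})$ and $g \in \mathcal{L}^q(\mu,\mathbf{E})$, giving us a sesquilinear map

\[
\langle \cdot, \cdot \rangle_\mu : L^p(\mu,\mathbf{E}) \times L^q(\mu,\mathbf{E}) \to \mathbf{K} .
\]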

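Theorem 4.112. Let $\mathbf{E}$ be a Hilbert space, let $\mu$ be a positive measure on $X$ and let $1 \le p, q \le \infty$ be conjugate exponents. Let $f : X \to \mathbf{E}$ be a $\mu$-measurable function such that $\langle g, f \rangle \in \mathcal{L}^1(\mu,\mathbf{K})$ for all $g \in I(\mu,\mathbf{E})$, and the set

\[
\left\{ \int_X \langle g, f \rangle \, d\mu : g \in I(\mu,\mathbf{E}), \ \|g\|_p = 1 \right\}
\]

is bounded. If $p = \infty$, also assume that the linear functional $g \mapsto \int_X \langle g, f \rangle \, d\mu$ is $\mu$-continuous on $I(\mu,\mathbf{E})$. If $\mu$ is semifinite or $\{x \in X : f(x) \ne 0\}$ is $\sigma$-finite, then $f \in \mathcal{L}^q(\mu,\mathbf{E})$ and

\[
\|f\|_q = \sup \left\{ \left| \int_X \langle g, f \rangle \, d\mu \right| : g \in I(\mu,\mathbf{E}), \ \|g\|_p = 1 \right\} .
\]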

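Proof. Let $M_q = \sup \{\cdots\}$, which is finite by assumption. Let $E \subseteq X$ be a set of finite measure. If $p < \infty$, $A \subseteq E$ is measurable and $g \in I(E,\mu,\mathbf{E})$ then

\[
\left| \int_E \langle g \chi_A, f \rangle \, d\mu \right| \le M_q \|g \chi_A\|_p \le M_q \|g\|_\infty \, \mu(A)^{1/p} ,
\]

which shows that $g \mapsto \int_E \langle g, f \rangle \, d\mu$ is $\mu$-continuous. If $p = \infty$ then this linear functional is $\mu$-continuous by assumption. By Theorem 4.111, there is some $f_1 \in \mathcal{L}^1(E,\mu,\mathbf{E})$ such that $\int_E \langle g, f \rangle \, d\mu = \int_E \langle g, f_1 \rangle \, d\mu$ for all $g \in I(E,\mu,\mathbf{E})$, so $f \chi_E = f_1$ (as functions on $X$) by uniqueness. This shows that $f \chi_E \in \mathcal{L}^1(\mu,\mathbf{E})$. If $g \in \mathcal{L}^\infty(\mu,\mathbf{E})$ vanishes outside $E$ then $\langle g, f \rangle \in \mathcal{L}^1(\mu,\mathbf{K})$. Since $g$ is $\mu$-measurable, we can choose a sequence $g_n$ of simple functions converging to $g$ almost everywhere such that $|g_n| \le |g|$ on $X$. Then $\langle g_n, f \rangle \to \langle g, f \rangle$ almost everywhere, $|\langle g_n, f \rangle| \le \|g\|_\infty |f| \chi_E$ on $X$, and Theorem 4.66 shows that

\[
\left| \int_X \langle g, f \rangle \, d\mu \right| = \lim_{n \to \infty} \left| \int_X \langle g_n, f \rangle \, d\mu \right| \le M_q \|g\|_p . \qquad (*)
\]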

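Now assume that $q < \infty$ and $S = \{x \in X : f(x) \ne 0\}$ is $\sigma$-finite. (Later we will show that $S$ is always $\sigma$-finite when $\mu$ is semifinite.) Choose an increasing sequence $E_n$ of sets of finite measure such that $S = \bigcup_{n=1}^\infty E_n$. Since $f$ is $\mu$-measurable, we can choose a sequence $f_n$ of simple functions converging to $f$ almost everywhere such that $|f_n| \le |f|$ on $X$. Let $h_n = f_n \chi_{E_n}$. If $\|h_n\|_q \ne 0$, define $\varphi_n : X \to \mathbf{E}$ by

\[
\varphi_n(x) = \frac{|h_n(x)|^{q-1} f(x)}{\|h_n\|_q^{q-1} \, |f(x)|}
\]

whenever $f(x) \ne 0$ and $\varphi_n(x) = |h_n(x)|^{q-1} / \|h_n\|_q^{q-1}$ whenever $f(x) = 0$. We have

\[
\|\varphi_n\|_p^p = \frac{1}{\|h_n\|_q^{p(q-1)}} \int_X |h_n|^{p(q-1)} \, d\mu = \frac{1}{\|h_n\|_q^q} \int_X |h_n|^q \, d\mu = 1 .
\]

If $\|h_n\|_q = 0$, then let $\varphi_n = 0$. Since $\varphi_n$ is in $\mathcal{L}^\infty(\mu,\mathbf{E})$ and vanishes outside $E_n$, (*) shows that

\[
\left| \int_X \langle \varphi_n, f \rangle \, d\mu \right| \le M_q .
\]

Then

\[
\liminf_{n \to \infty} \|h_n\|_q = \liminf_{n \to \infty} \int_X |\varphi_n| |h_n| \, d\mu \le \liminf_{n \to \infty} \int_X |\varphi_n| |f| \, d\mu = \liminf_{n \to \infty} \int_X \langle \varphi_n, f \rangle \, d\mu \le M_q .
\]

By Corollary 4.65, $\liminf_{n \to \infty} |h_n|^q \in \mathcal{L}^1(\mu,\mathbf{R})$ and

\[
\int_X \liminf_{n \to \infty} |h_n|^q \, d\mu \le M_q^q .
\]

But $|h_n|^q \to |f|^q$ almost everywhere, so $|f|^q \in \mathcal{L}^1(\mu,\mathbf{R})$, $f \in \mathcal{L}^q(\mu,\mathbf{E})$ and $\|f\|_q \le M_q$. Theorem 4.91 shows that $M_q \le \|f\|_q$, so $M_q = \|f\|_q$.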

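Suppose that $q < \infty$ and $\mu$ is semifinite. Let $c > 0$, let $S_c = \{x \in X : |f(x)| \ge c\}$, and let $C > 0$. Suppose that $\mu(S_c) = \infty$; Theorem 4.18 shows that there is some measurable set $E \subseteq S_c$ such that $C < \mu(E) < \infty$. Define $\varphi : X \to \mathbf{E}$ by $\varphi = f/|f|$ on $E$ and $\varphi = 0$ on $X \setminus E$. By (*),

\[
c \mu(E) \le \int_E |f| \, d\mu = \left| \int_X \langle \varphi, f \rangle \, d\mu \right| \le M_q \|\varphi\|_p \le M_q \, \mu(E)^{1/p}
\]

and

\[
M_q \ge c \, \mu(E)^{1 - 1/p} > c \, C^{1/q} \to \infty
\]

as $C \to \infty$, which is a contradiction. Therefore $\mu(S_c) < \infty$. This is true for all $c > 0$, so $\{x \in X : f(x) \ne 0\}$ is $\sigma$-finite.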


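Finally, suppose that $q = \infty$. Let $\varepsilon > 0$ and let $S = \{x \in X : |f(x)| \ge M_\infty + \varepsilon\}$. Suppose that $0 < \mu(S) \le \infty$. Since $\mu$ is semifinite or $\{x \in X : f(x) \ne 0\}$ is $\sigma$-finite, we can choose some measurable set $E \subseteq S$ such that $0 < \mu(E) < \infty$. Define $\varphi : X \to \mathbf{E}$ by $\varphi = \mu(E)^{-1} f/|f|$ on $E$ and $\varphi = 0$ on $X \setminus E$, so that $\|\varphi\|_1 = 1$. By (*),

\[
M_\infty + \varepsilon \le \frac{1}{\mu(E)} \int_E |f| \, d\mu = \left| \int_X \langle \varphi, f \rangle \, d\mu \right| \le M_\infty .
\]

This is a contradiction, so $\|f\|_\infty \le M_\infty$. It is clear that $M_\infty \le \|f\|_\infty$.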

Theorem 4.113 (Dual of $L^p$). Let $\mathbf{E}$ be a Hilbert space, let $\mu$ be a positive measure on $X$ and let $1 \le p, q \le \infty$ be conjugate exponents. Consider the conjugate linear map

\[
\Lambda_q : L^q(\mu,\mathbf{E}) \to L^p(\mu,\mathbf{E})^* , \qquad f \mapsto \langle \cdot, f \rangle_\mu .
\]

1. $\Lambda_q$ is an isometry if $1 < p \le \infty$, or $p = 1$ and $\mu$ is semifinite.

2. $\Lambda_q$ is surjective if $1 < p < \infty$, or $p = 1$ and $\mu$ is $\sigma$-finite.

Proof. For (1), let $f \in \mathcal{L}^q(\mu,\mathbf{E})$ and let $\lambda = \Lambda_q[f]$. Theorem 4.91 shows that $|\lambda| \le \|f\|_q$, so it remains to show that $\|f\|_q \le |\lambda|$. We can assume that $[f] \ne 0$.

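If $1 < p < \infty$, then define $g : X \to \mathbf{E}$ by

\[
g(x) = \frac{|f(x)|^{q-2} f(x)}{\|f\|_q^{q-1}}
\]

whenever $f(x) \ne 0$, and $g(x) = 0$ whenever $f(x) = 0$. We have

\[
\|g\|_p^p = \frac{1}{\|f\|_q^{p(q-1)}} \int_X |f|^{p(q-1)} \, d\mu = \frac{1}{\|f\|_q^q} \int_X |f|^q \, d\mu = 1 ,
\]

so

\[
\|f\|_q = \frac{1}{\|f\|_q^{q-1}} \int_X |f|^q \, d\mu = \int_X \langle g, f \rangle \, d\mu \le |\lambda| .
\]

If $p = \infty$, then define $g : X \to \mathbf{E}$ by $g(x) = f(x)/|f(x)|$ whenever $|f(x)| > 0$ and $g(x) = 0$ whenever $|f(x)| = 0$. Then $g$ is $\mu$-measurable and $\|g\|_\infty = 1$, so

\[
\|f\|_1 = \int_X |f| \, d\mu = \int_X \langle g, f \rangle \, d\mu \le |\lambda| .
\]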


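Suppose that $p = 1$ and $\mu$ is semifinite. Let $0 < \varepsilon < \|f\|_\infty$ and let $S = \{x \in X : |f(x)| > \|f\|_\infty - \varepsilon\}$. Since $\mu$ is semifinite, we can choose a measurable set $E \subseteq S$ such that $0 < \mu(E) < \infty$. Define $g : X \to \mathbf{E}$ by $g = \mu(E)^{-1} f/|f|$ on $E$ and $g = 0$ on $X \setminus E$, so that $\|g\|_1 = 1$. Then

\[
\|f\|_\infty - \varepsilon \le \frac{1}{\mu(E)} \int_E |f| \, d\mu = \int_X \langle g, f \rangle \, d\mu \le |\lambda| ,
\]

and taking $\varepsilon \to 0$ shows that $\|f\|_\infty \le |\lambda|$.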

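For (2), let $\lambda \in L^p(\mu,\mathbf{E})^*$ with $1 \le p < \infty$. First assume that $\mu(X) < \infty$. Then $I(\mu,\mathbf{E}) \subseteq L^p(\mu,\mathbf{E})$, so Theorem 4.111 shows that there is some $f \in \mathcal{L}^1(\mu,\mathbf{E})$ such that

\[
\lambda[g] = \int_X \langle g, f \rangle \, d\mu \qquad (*)
\]

for all $g \in I(\mu,\mathbf{E})$. Since $\left| \int_X \langle g, f \rangle \, d\mu \right| \le |\lambda| \|g\|_p$ for all $g \in I(\mu,\mathbf{E})$, $f$ is $\mu$-measurable, and $\{x \in X : f(x) \ne 0\}$ is $\sigma$-finite (see Theorem 4.55), Theorem 4.112 shows that $f \in \mathcal{L}^q(\mu,\mathbf{E})$. Then (*) holds for all $g \in \mathcal{L}^p(\mu,\mathbf{E})$, because Theorem 4.87 shows that $I(\mu,\mathbf{E})$ is dense in $\mathcal{L}^p(\mu,\mathbf{E})$. Therefore $\Lambda_q[f] = \lambda$.

For the case when $\mu(X) = \infty$ but $\mu$ is $\sigma$-finite, choose $w \in \mathcal{L}^1(\mu,\mathbf{R})$ as in Lemma 4.106. Then $d\tilde{\mu} = w \, d\mu$ defines a finite measure $\tilde{\mu}$ on $X$, and $[g] \mapsto [w^{1/p} g]$ is an isometric isomorphism from $L^p(\tilde{\mu},\mathbf{E})$ to $L^p(\mu,\mathbf{E})$ since $w > 0$ on $X$. If we define $\tilde{\lambda} : L^p(\tilde{\mu},\mathbf{E}) \to \mathbf{K}$ by $\tilde{\lambda}[g] = \lambda[w^{1/p} g]$ then $\tilde{\lambda}$ is also a (continuous) linear functional, with $|\tilde{\lambda}| = |\lambda|$. Since $\tilde{\mu}(X) < \infty$, there exists some $[\tilde{f}] \in L^q(\tilde{\mu},\mathbf{E})$ such that

\[
\tilde{\lambda}[g] = \int_X \langle g, \tilde{f} \rangle \, d\tilde{\mu}
\]

for all $[g] \in L^p(\tilde{\mu},\mathbf{E})$. Let $f = w^{1/q} \tilde{f}$ (with $f = \tilde{f}$ if $p = 1$). Then

\[
\lambda[g] = \tilde{\lambda}[w^{-1/p} g] = \int_X \langle w^{-1/p} g, \tilde{f} \rangle \, d\tilde{\mu} = \int_X \langle g, f \rangle \, d\mu
\]

for all $[g] \in L^p(\mu,\mathbf{E})$. Therefore $\Lambda_q[f] = \lambda$.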

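Finally, suppose that $\mu$ is not $\sigma$-finite and $1 < p < \infty$. For every $\sigma$-finite $E \subseteq X$, there is a unique $[f_E] \in L^q(\mu,\mathbf{E})$ such that $f_E = 0$ outside $E$, $\|f_E\|_q \le |\lambda|$, and

\[
\lambda[g] = \int_X \langle g, f_E \rangle \, d\mu
\]

for all $g \in \mathcal{L}^p(\mu,\mathbf{E})$ such that $g = 0$ outside $E$. Let $M = \sup_E \|f_E\|_q$ as $E$ ranges over all $\sigma$-finite subsets of $X$, and choose a sequence $E_n$ such that $\|f_{E_n}\|_q \to M$. Let $F = \bigcup_{n=1}^\infty E_n$, which is $\sigma$-finite. For each $n$ we have $F \supseteq E_n$ and $f_F = f_{E_n}$ almost everywhere on $E_n$, so $\|f_F\|_q \ge \|f_{E_n}\|_q$. Therefore $\|f_F\|_q = M$. If $g \in \mathcal{L}^p(\mu,\mathbf{E})$ then Theorem 4.55 shows that $g = 0$ outside a $\sigma$-finite set $A$. Then

\[
\int_X |f_F|^q + \int_X |f_{A \setminus F}|^q = \int_X |f_A|^q \le M^q = \int_X |f_F|^q ,
\]

so $f_{A \setminus F} = 0$, $[f_A] = [f_F]$ and

\[
\lambda[g] = \int_X \langle g, f_A \rangle \, d\mu = \int_X \langle g, f_F \rangle \, d\mu .
\]

Therefore $\Lambda_q[f_F] = \lambda$.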
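The norming function $g = |f|^{q-2} f / \|f\|_q^{q-1}$ from part (1) of the proof of Theorem 4.113 can be tested numerically in the real-valued case. The sketch below uses a measure given by four point masses and checks that $\|g\|_p = 1$ and that the pairing $\int_X g f \, d\mu$ recovers $\|f\|_q$. The data and the helper names `lp_norm` and `norming_function` are assumptions for illustration.

```python
def lp_norm(values, weights, p):
    """L^p norm of a function on a finite set with the given point masses."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def norming_function(f, weights, q):
    """g = |f|^(q-2) f / ||f||_q^(q-1), with g = 0 wherever f = 0."""
    scale = lp_norm(f, weights, q) ** (q - 1)
    return [abs(v) ** (q - 2) * v / scale if v != 0 else 0.0 for v in f]


weights = [1.0, 0.5, 2.0, 1.0]   # mu({x}) (assumed data)
f = [2.0, -3.0, 0.0, 1.0]        # real-valued sample function (assumed data)
q = 3.0
p = q / (q - 1)                  # conjugate exponent, so 1/p + 1/q = 1

g = norming_function(f, weights, q)
pairing = sum(w * gv * fv for gv, fv, w in zip(g, f, weights))
print(lp_norm(g, weights, p), pairing, lp_norm(f, weights, q))
```

Up to rounding, the first printed value is 1 and the last two agree, which is the equality $\|f\|_q = \int_X \langle g, f \rangle \, d\mu$ used to show that $\Lambda_q$ is an isometry.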

4.6 Product measures

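Let $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ be $\sigma$-finite positive measure spaces. Recall that we have a product $\sigma$-algebra $\mathcal{M} \otimes \mathcal{N}$ in $X \times Y$; our goal is to construct a measure on this $\sigma$-algebra.

Theorem 4.114. Let $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ be measurable spaces. We consider $X \times Y$ as a measurable space with the $\sigma$-algebra $\mathcal{M} \otimes \mathcal{N}$.

1. Let $E \in \mathcal{M} \otimes \mathcal{N}$. For each $x \in X$ the $x$-section

\[
E_x = \{y \in Y : (x, y) \in E\}
\]

is a member of $\mathcal{N}$. Similarly, for each $y \in Y$ the $y$-section

\[
E^y = \{x \in X : (x, y) \in E\}
\]

is a member of $\mathcal{M}$.

2. Let $Z$ be a measurable space and let $f : X \times Y \to Z$ be a measurable map. For each $x \in X$, the $x$-section $f_x : Y \to Z$ defined by $f_x(y) = f(x, y)$ is measurable. Similarly, for each $y \in Y$ the $y$-section $f^y : X \to Z$ defined by $f^y(x) = f(x, y)$ is measurable.

Proof. Let $\mathcal{S}$ be the collection of all $E \in \mathcal{M} \otimes \mathcal{N}$ such that $E_x \in \mathcal{N}$ for all $x$. Then $\mathcal{S}$ contains all sets of the form $A \times B$ with $A \in \mathcal{M}$ and $B \in \mathcal{N}$, so it suffices to show that $\mathcal{S}$ is a $\sigma$-algebra. If $E \in \mathcal{S}$ then $E^c \in \mathcal{S}$ since $(E^c)_x = (E_x)^c$. Similarly, if $E_n$ is a sequence in $\mathcal{S}$ then $\bigcup_{n=1}^\infty E_n \in \mathcal{S}$ since $\left( \bigcup_{n=1}^\infty E_n \right)_x = \bigcup_{n=1}^\infty (E_n)_x$. This proves (1). If $E \subseteq Z$ is measurable then $f_x^{-1}(E) = \left( f^{-1}(E) \right)_x$, so $f_x$ is measurable.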


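Let $\mathcal{A}$ and $\mathcal{B}$ be rings in $\mathcal{M}$ and $\mathcal{N}$ respectively. A rectangle is a set of the form $A \times B$ with $A \in \mathcal{A}$ and $B \in \mathcal{B}$. Let $\mathcal{A} \times \mathcal{B}$ be the set of all finite disjoint unions of rectangles. It is clear that $\emptyset \in \mathcal{A} \times \mathcal{B}$. We have the identities

\[
(A_1 \times B_1) \cap (A_2 \times B_2) = (A_1 \cap A_2) \times (B_1 \cap B_2)
\]

and

\[
(A_1 \times B_1) \setminus (A_2 \times B_2) = [(A_1 \setminus A_2) \times B_1] \cup [(A_1 \cap A_2) \times (B_1 \setminus B_2)] ,
\]

which show that $P \cap Q$ and $P \setminus Q$ are members of $\mathcal{A} \times \mathcal{B}$ whenever $P, Q \in \mathcal{A} \times \mathcal{B}$. Since $P \cup Q = (P \setminus Q) \cup Q$, this shows that $\mathcal{A} \times \mathcal{B}$ is a ring in $X \times Y$.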

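Theorem 4.115. If $\mathcal{A}$ is a ring in $X$, $\mathcal{B}$ is a ring in $Y$ and $\mathcal{A} \times \mathcal{B}$ is the set of all finite disjoint unions of rectangles, then $\mathcal{M}(\mathcal{A} \times \mathcal{B}) = \mathcal{M}(\mathcal{M}(\mathcal{A}) \times \mathcal{M}(\mathcal{B}))$.

Proof. Since

\[
\mathcal{A} \times \mathcal{B} \subseteq \mathcal{M}(\mathcal{A}) \times \mathcal{M}(\mathcal{B}) \subseteq \mathcal{M}(\mathcal{M}(\mathcal{A}) \times \mathcal{M}(\mathcal{B})) ,
\]

it is clear that $\mathcal{M}(\mathcal{A} \times \mathcal{B}) \subseteq \mathcal{M}(\mathcal{M}(\mathcal{A}) \times \mathcal{M}(\mathcal{B}))$. If $B \in \mathcal{B}$, consider the $\sigma$-algebra in $X \times B$ generated by all sets of the form $A \times B$ with $A \in \mathcal{A}$. This $\sigma$-algebra is contained in $\mathcal{M}(\mathcal{A} \times \mathcal{B})$, so $\mathcal{M}(\mathcal{A}) \times \{\emptyset, B\} \subseteq \mathcal{M}(\mathcal{A} \times \mathcal{B})$ for all $B \in \mathcal{B}$. Using this fact, we have $\{\emptyset, A\} \times \mathcal{M}(\mathcal{B}) \subseteq \mathcal{M}(\mathcal{A} \times \mathcal{B})$ for all $A \in \mathcal{M}(\mathcal{A})$. It follows that $\mathcal{M}(\mathcal{A}) \times \mathcal{M}(\mathcal{B}) \subseteq \mathcal{M}(\mathcal{A} \times \mathcal{B})$ and therefore $\mathcal{M}(\mathcal{M}(\mathcal{A}) \times \mathcal{M}(\mathcal{B})) \subseteq \mathcal{M}(\mathcal{A} \times \mathcal{B})$.

Now let $\mathcal{A}$ and $\mathcal{B}$ be the rings of sets of finite measure in $\mathcal{M}$ and $\mathcal{N}$ respectively. Note that $\mathcal{M}(\mathcal{A}) = \mathcal{M}$ and $\mathcal{M}(\mathcal{B}) = \mathcal{N}$ since $X$ and $Y$ are $\sigma$-finite. If we can construct a measure on $\mathcal{M}(\mathcal{A} \times \mathcal{B})$, then Theorem 4.115 shows that we have a measure on $\mathcal{M}(\mathcal{M} \times \mathcal{N}) = \mathcal{M} \otimes \mathcal{N}$.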

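Let $f : X \times Y \to \mathbf{E}$ be simple with respect to $\mathcal{A} \times \mathcal{B}$ and write $f = \sum_{i=1}^n f(A_i \times B_i) \chi_{A_i \times B_i}$ for some disjoint sets $A_i \times B_i$ where $A_i \in \mathcal{A}$ and $B_i \in \mathcal{B}$. For each $x \in X$ we can integrate the $x$-section $f_x$, giving a map

\[
x \mapsto \int_Y f_x \, d\nu = \sum_{i=1}^n f(A_i \times B_i) \chi_{A_i}(x) \nu(B_i) .
\]

This is an integrable simple map on $X$, so we can integrate again to form an iterated integral:

\[
\int_{x \in X} \int_Y f_x \, d\nu \, d\mu = \int_X \int_Y f \, d\nu \, d\mu = \int_{x \in X} \int_{y \in Y} f(x, y) = \sum_{i=1}^n f(A_i \times B_i) \mu(A_i) \nu(B_i) .
\]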


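It is easy to see that the order of integration does not matter, and that the iterated integral is linear on the space of maps that are simple with respect to $\mathcal{A} \times \mathcal{B}$. We can now construct the product measure on $\mathcal{M} \otimes \mathcal{N}$.

Theorem 4.116 (Existence of the product measure). Let $\mu$ and $\nu$ be $\sigma$-finite measures on $X$ and $Y$. There exists a unique positive measure $\mu \otimes \nu$ on $\mathcal{M} \otimes \mathcal{N}$ such that

\[
(\mu \otimes \nu)(A \times B) = \mu(A) \nu(B)
\]

for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$.

Proof. First note that there is a (unique) finitely additive positive function $\mu \times \nu$ on $\mathcal{A} \times \mathcal{B}$ such that

\[
(\mu \times \nu)(A \times B) = \mu(A) \nu(B)
\]

for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$. We want to show that $\mu \times \nu$ is countably additive. Let $E_n$ be an increasing sequence of sets in $\mathcal{A} \times \mathcal{B}$ and let $E = \bigcup_{n=1}^\infty E_n$. We can assume that $(\mu \times \nu)(E) < \infty$, in which case $E \in \mathcal{A} \times \mathcal{B}$. Let $f_n = \chi_{E_n}$ so that $f_n$ is increasing and converges to $f = \chi_E$. Then for each $x$, the sequence $(f_n)_x$ is increasing and converges to $f_x$. Applying Theorem 4.63 shows that

\[
\int_Y (f_n)_x \, d\nu \ \text{is increasing to} \ \int_Y f_x \, d\nu ,
\]

and applying Theorem 4.63 a second time shows that

\[
\int_{x \in X} \int_Y (f_n)_x \, d\nu \, d\mu \ \text{converges to} \ \int_{x \in X} \int_Y f_x \, d\nu \, d\mu .
\]

That is, $(\mu \times \nu)(E_n) \to (\mu \times \nu)(E)$. This shows that $\mu \times \nu$ is a pre-measure on $\mathcal{A} \times \mathcal{B}$, and Theorem 4.22 proves the existence and uniqueness of $\mu \otimes \nu$.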

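Note that $I_{\mathcal{A} \times \mathcal{B}}(X \times Y, \mu \otimes \nu)$ is precisely the set of maps $f : X \times Y \to E$ that are simple with respect to $\mathcal{A} \times \mathcal{B}$. By definition, if $f \in I_{\mathcal{A} \times \mathcal{B}}(X \times Y, \mu \otimes \nu)$ then

\[ \int_{X \times Y} f \, d(\mu \otimes \nu) = \int_X \int_Y f \, d\nu \, d\mu = \int_Y \int_X f \, d\mu \, d\nu. \]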
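The identity for simple maps can be checked by hand on a small example. The following is a minimal sketch, assuming a hypothetical finite setting that is not taken from the text: $X = \{0, 1, 2\}$ and $Y = \{0, 1\}$ carry point masses, and the names `mu`, `nu`, `rectangles` and the three helper functions are illustrative only.

```python
# Hypothetical finite example (not from the text): X = {0, 1, 2}, Y = {0, 1},
# with mu and nu given by point masses.
mu = {0: 0.5, 1: 1.5, 2: 2.0}
nu = {0: 3.0, 1: 0.25}

# A simple map f = sum_i v_i * chi_{A_i x B_i} over pairwise disjoint rectangles.
rectangles = [
    (2.0, {0, 1}, {0}),
    (-1.0, {2}, {0, 1}),
    (4.0, {0}, {1}),
]

def measure(weights, subset):
    """Measure of a finite set as the sum of its point masses."""
    return sum(weights[p] for p in subset)

def f(x, y):
    """Value of the simple map at (x, y); zero outside every rectangle."""
    return sum(v for v, A, B in rectangles if x in A and y in B)

def integral_dnu_dmu():
    """Integrate each x-section over Y, then integrate the result over X."""
    return sum(mu[x] * sum(nu[y] * f(x, y) for y in nu) for x in mu)

def integral_dmu_dnu():
    """Integrate each y-section over X, then integrate the result over Y."""
    return sum(nu[y] * sum(mu[x] * f(x, y) for x in mu) for y in nu)

def rectangle_sum():
    """The closed form: sum of v_i * mu(A_i) * nu(B_i)."""
    return sum(v * measure(mu, A) * measure(nu, B) for v, A, B in rectangles)
```

Both orders of integration agree with the rectangle sum here, which is the finite shadow of the displayed identity.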

Fubini’s theorem

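Theorem 4.117. Let $\mu$ and $\nu$ be σ-finite measures on $X$ and $Y$. If $Z$ is a set of $(\mu \otimes \nu)$-measure zero in $X \times Y$, then $\nu(Z_x) = 0$ for almost all $x \in X$.

Proof. For each $n \ge 1$, let $S_n = \{x \in X : \nu(Z_x) \ge 1/n\}$. It suffices to show that $\bigcup_{n=1}^\infty S_n$ is contained in a set of measure zero. Let $\varepsilon > 0$. By Corollary 4.23, there is a sequence $A_k \times B_k$ of rectangles (where $A_k \in \mathcal{A}$ and $B_k \in \mathcal{B}$) such that

\[ Z_x \subseteq \bigcup_{k=1}^\infty (A_k \times B_k)_x \quad \text{and} \quad \sum_{k=1}^\infty (\mu \times \nu)(A_k \times B_k) < \frac{\varepsilon}{n 2^n}. \]

Let $T_n$ be the set of all $x$ for which

\[ \frac{1}{n} \le \sum_{k=1}^\infty \nu((A_k \times B_k)_x) = \sum_{k=1}^\infty \nu(B_k)\chi_{A_k}(x); \]

then $T_n$ is measurable and $S_n \subseteq T_n$. The partial sums on the right are integrable simple maps with respect to $x$, so by Theorem 4.63 we have

\[ \frac{1}{n}\mu(T_n) \le \sum_{k=1}^\infty \int_X \nu((A_k \times B_k)_x) \, d\mu = \sum_{k=1}^\infty (\mu \times \nu)(A_k \times B_k) < \frac{\varepsilon}{n 2^n}. \]

This shows that $\mu(T_n) < \varepsilon 2^{-n}$, so $\bigcup_{n=1}^\infty S_n \subseteq \bigcup_{n=1}^\infty T_n$ and $\mu\left(\bigcup_{n=1}^\infty T_n\right) \le \varepsilon$. The sets $T_n$ depend on $\varepsilon$, but $\varepsilon$ was arbitrary, so intersecting these covers over $\varepsilon = 1, 1/2, 1/3, \ldots$ gives a set of measure zero that contains $\bigcup_{n=1}^\infty S_n$.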

Corollary 4.118. Let $\mu$ and $\nu$ be σ-finite measures on $X$ and $Y$, and let $f, g : X \times Y \to E$ be $(\mu \otimes \nu)$-measurable maps.

1. The map $f_x$ is $\nu$-measurable for $\mu$-almost all $x \in X$.

2. If $f = g$ $(\mu \otimes \nu)$-almost everywhere, then $f_x = g_x$ $\nu$-almost everywhere for $\mu$-almost all $x \in X$.

Proof. For (1), we apply Theorem 4.39. Let $Z$ be a set of measure zero in $X \times Y$ such that $f|_{(X \times Y) \setminus Z}$ is measurable and $f((X \times Y) \setminus Z)$ has a countable dense subset. By Theorem 4.117, the set $Z_x$ has measure zero for $\mu$-almost all $x \in X$, and by Theorem 4.114 the restriction of $f_x$ to the complement of $Z_x$ in $Y$ is measurable. Theorem 4.39 then shows that $f_x$ is $\nu$-measurable. For (2), let $Z$ be the set of $(\mu \otimes \nu)$-measure zero on which $f$ and $g$ differ. For each $x \in X$, the maps $f_x$ and $g_x$ differ at $y$ only if $(x, y) \in Z$, i.e. if $y \in Z_x$. But Theorem 4.117 shows that $\nu(Z_x) = 0$ for almost all $x \in X$.


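Theorem 4.119 (Fubini's theorem). Let $\mu$ and $\nu$ be σ-finite measures on $X$ and $Y$, and let $f : X \times Y \to E$.

1. If $f \in \mathcal{L}^1(\mu \otimes \nu, E)$ then for almost all $x \in X$, the map $f_x$ is in $\mathcal{L}^1(\nu, E)$, the map given by

\[ x \mapsto \int_Y f_x \, d\nu \]

for almost all $x$ (and defined arbitrarily for other $x$) is in $\mathcal{L}^1(\mu, E)$, and

\[ \int_{X \times Y} f \, d(\mu \otimes \nu) = \int_{x \in X} \int_Y f_x \, d\nu \, d\mu. \]

2. If $f$ is $(\mu \otimes \nu)$-measurable, $f_x \in \mathcal{L}^1(\nu, E)$ for almost all $x \in X$, and the map given by

\[ x \mapsto \int_Y |f_x| \, d\nu \]

(for almost all $x \in X$) is in $\mathcal{L}^1(\mu, \mathbb{R})$, then $f \in \mathcal{L}^1(\mu \otimes \nu, E)$ and (1) applies.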

Proof. For (1), by Theorem 4.76 and Theorem 4.61 there is a sequence $\varphi_n$ of maps in $I_{\mathcal{A} \times \mathcal{B}}(\mu \otimes \nu)$ that is both $L^1$ and almost everywhere convergent to $f$. Let $Z$ be a set of $(\mu \otimes \nu)$-measure zero such that $\varphi_n \to f$ pointwise on $(X \times Y) \setminus Z$. By Theorem 4.117, there is a set $S$ of $\mu$-measure zero in $X$ such that $\nu(Z_x) = 0$ for $x \notin S$. If $x \notin S$, we have $(\varphi_n)_x \to f_x$ pointwise on $Y \setminus Z_x$. For each $n$ we define a map

\[ \Phi_n : X \to I_{\mathcal{B}}(\nu), \qquad x \mapsto (\varphi_n)_x, \]

and since $(a\chi_{A \times B})_x = a\chi_A(x)\chi_B$, each $\Phi_n$ is in $I_{\mathcal{A}}(\mu, I_{\mathcal{B}}(\nu))$. (Recall that $I_{\mathcal{B}}(\nu)$ is a vector space equipped with the $L^1$-seminorm.) We have

\[ \|\Phi_m - \Phi_n\|_1 = \int_X |\Phi_m - \Phi_n| \, d\mu = \int_X \int_Y |\varphi_m - \varphi_n| \, d\nu \, d\mu = \|\varphi_m - \varphi_n\|_1, \]

which shows that $\Phi_n$ is $L^1$-Cauchy. Applying Theorem 4.59, Lemma 4.49, Theorem 4.48, Theorem 4.45 and taking a subsequence if necessary, we can assume that there is a set $T$ of $\mu$-measure zero in $X$ such that $\Phi_n(x)$ is Cauchy whenever $x \notin T$. (Even though $I_{\mathcal{B}}(\nu)$ is only a seminormed vector space, this result still holds because we can embed it in its completion.) In other words, $\Phi_n(x) = (\varphi_n)_x$ is $L^1$-Cauchy in $I(\nu)$. If $x \notin S \cup T$ then $(\varphi_n)_x(y)$ converges to $f_x(y)$ for almost all $y \in Y$, so Theorem 4.61 shows that $f_x \in \mathcal{L}^1(\nu)$ and that $(\varphi_n)_x$ is $L^1$-convergent to $f_x$. Therefore

\[ \int_Y (\varphi_n)_x \, d\nu \to \int_Y f_x \, d\nu \]

for all $x \notin S \cup T$.

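For each $n$ we again define a map

\[ \Psi_n : X \to E, \qquad x \mapsto \int_Y (\varphi_n)_x \, d\nu. \]

Every $\Psi_n$ is in $I_{\mathcal{A}}(\mu)$, and $\Psi_n$ is $L^1$-Cauchy since

\[ \int_X |\Psi_m - \Psi_n| \, d\mu = \int_{x \in X} \left| \int_Y [(\varphi_m)_x - (\varphi_n)_x] \, d\nu \right| d\mu \le \int_X \int_Y |\varphi_m - \varphi_n| \, d\nu \, d\mu = \|\varphi_m - \varphi_n\|_1. \]

Let

\[ \Psi(x) = \int_Y f_x \, d\nu; \]

we already showed that if $x \notin S \cup T$ then $\Psi_n(x) \to \Psi(x)$. Theorem 4.61 implies that $\Psi \in \mathcal{L}^1(\mu)$ and that $\Psi_n$ is $L^1$-convergent to $\Psi$. Therefore

\[ \int_{x \in X} \int_Y (\varphi_n)_x \, d\nu \, d\mu \to \int_{x \in X} \int_Y f_x \, d\nu \, d\mu. \]

But each $\varphi_n$ is simple with respect to $\mathcal{A} \times \mathcal{B}$, so

\[ \int_{x \in X} \int_Y (\varphi_n)_x \, d\nu \, d\mu = \int_{X \times Y} \varphi_n \, d(\mu \otimes \nu). \]

Since $\varphi_n \to f$ in $L^1$, the right-hand side converges to $\int_{X \times Y} f \, d(\mu \otimes \nu)$. This proves (1).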

For (2), by Theorem 4.67 it suffices to show that $|f| \in \mathcal{L}^1(\mu \otimes \nu, \mathbb{R})$, so we can assume that $f$ is a nonnegative real function. By Corollary 4.37 there is a sequence $\varphi_n$ of increasing positive simple functions that converges to $f$ almost everywhere. By modifying $f$ on a set of measure zero (if necessary), we can assume that $\varphi_n \to f$ pointwise everywhere. Using the fact that $X \times Y$ is σ-finite, we may further assume that each $\varphi_n$ vanishes outside a set of finite measure (i.e. $\varphi_n$ is integrable). For each $x$ the sequence $(\varphi_n)_x$ is increasing and converges to $f_x$. Whenever $x$ is such that $f_x \in \mathcal{L}^1(\nu, \mathbb{R})$ and $(\varphi_n)_x$ is $\nu$-measurable (see Corollary 4.118), Theorem 4.67 shows that $(\varphi_n)_x \in \mathcal{L}^1(\nu, \mathbb{R})$, so

\[ \int_Y (\varphi_n)_x \, d\nu \ \text{ is increasing and convergent to } \ \int_Y f_x \, d\nu \tag{*} \]

by Theorem 4.63. Since each $\varphi_n$ is in $I(\mu \otimes \nu)$, it is easy to see by linearity that

\[ \int_{X \times Y} \varphi_n \, d(\mu \otimes \nu) = \int_{x \in X} \int_Y (\varphi_n)_x \, d\nu \, d\mu. \]

Therefore, applying Theorem 4.63 to (*) shows that

\[ \int_{X \times Y} \varphi_n \, d(\mu \otimes \nu) \ \text{ is increasing and convergent to } \ \int_{x \in X} \int_Y f_x \, d\nu \, d\mu. \]

Finally, applying Theorem 4.63 to $\varphi_n$ shows that $f \in \mathcal{L}^1(\mu \otimes \nu, \mathbb{R})$ and

\[ \int_{X \times Y} \varphi_n \, d(\mu \otimes \nu) \to \int_{X \times Y} f \, d(\mu \otimes \nu). \]

This completes the proof.

Note that the first part of Fubini's theorem shows that there is a natural isometric isomorphism

\[ L^1(\mu \otimes \nu, E) \to L^1(\mu, L^1(\nu, E)). \]

4.7 The $C^p$, $C_c$, $C_0$ spaces

Let $X$ be a topological space and let $F$ be a normed vector space. Let $C(X, F) = C(X)$ be the normed vector space of all bounded continuous functions from $X$ to $F$, equipped with the supremum norm

\[ \|f\|_{C^0} = \|f\|_\infty = \sup_{x \in X} |f(x)|. \]

If $F$ is a normed algebra then $C(X, F)$ is also a normed algebra under pointwise multiplication. If $F$ is unital, then $C(X, F)$ is unital.

Theorem 4.120. If F is complete then C(X,F) is complete, i.e. a Banach space.


Proof. Use Theorem 1.23 and Theorem 1.22.

Now let $U$ be an open subset of a Banach space $E$, and assume that $F$ is a Banach space. Let $C^p(U, F) = C^p(U)$ be the normed vector space of all $C^p$ functions $f : U \to F$ such that $f^{(k)} \in C(U)$ for $0 \le k \le p$, equipped with the norm

\[ \|f\|_{C^p} = \sum_{k=0}^p \frac{1}{k!} \|f^{(k)}\|_\infty = \sum_{k=0}^p \frac{1}{k!} \sup_{x \in U} |f^{(k)}(x)|. \]

Theorem 4.121. $C^p(U)$ is complete, i.e. a Banach space.

Proof. Let $f_n$ be a Cauchy sequence in $C^p(U)$; then $f_n, f_n', \ldots, f_n^{(p)}$ are all uniformly Cauchy, so for each $k = 0, 1, \ldots, p$ we have $f_n^{(k)} \to g_k$ uniformly for some $g_k \in C(U, L^k(E, F))$. (Recall that $L^k(E, F)$ is the space of multilinear maps from $E^k$ to $F$. We take $L^0(E, F) = F$.) For each $k = 0, 1, \ldots, p - 1$, Theorem 2.57 shows that $g_k$ is differentiable and $g_k' = g_{k+1}$. Hence $g_0 \in C^p(U)$ with $g_0^{(k)} = g_k$ for $0 \le k \le p$, and $f_n \to g_0$ in $C^p(U)$.

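Theorem 4.122. If $F$ is a Banach algebra, then $C^p(U)$ is a Banach algebra under pointwise multiplication.

Proof. For all $f, g \in C^p(U)$, the product rule (see Theorem 2.5) shows that

\[
\begin{aligned}
\|fg\|_{C^p} &= \sum_{k=0}^p \frac{1}{k!} \|(fg)^{(k)}\|_\infty \\
&\le \sum_{k=0}^p \frac{1}{k!} \sum_{j=0}^k \binom{k}{j} \|f^{(j)}\|_\infty \|g^{(k-j)}\|_\infty \\
&= \sum_{k=0}^p \sum_{j=0}^k \frac{1}{j!(k-j)!} \|f^{(j)}\|_\infty \|g^{(k-j)}\|_\infty \\
&\le \sum_{j=0}^p \sum_{k=0}^p \frac{1}{j!} \|f^{(j)}\|_\infty \frac{1}{k!} \|g^{(k)}\|_\infty \\
&= \|f\|_{C^p} \|g\|_{C^p}.
\end{aligned}
\]

The second inequality holds because the double sum on the third line runs only over pairs of orders whose total is at most $p$, while the fourth line adds the remaining nonnegative terms.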
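The submultiplicative estimate can be sanity-checked numerically. The following is a rough sketch under stated assumptions, not part of the original argument: it takes $E = F = \mathbb{R}$, $U = (-1, 1)$, restricts to polynomials stored as coefficient lists (lowest degree first), and replaces each supremum by a maximum over a finite grid. All function names are hypothetical. The Leibniz estimate is pointwise, so the inequality also holds for grid maxima.

```python
from math import factorial

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_deriv(a):
    """Derivative of a polynomial; the zero polynomial is returned as [0.0]."""
    return [k * a[k] for k in range(1, len(a))] or [0.0]

def poly_eval(a, x):
    """Horner evaluation."""
    acc = 0.0
    for c in reversed(a):
        acc = acc * x + c
    return acc

def cp_norm(a, p, grid):
    """Grid approximation of sum_{k<=p} (1/k!) * sup |a^(k)|."""
    total = 0.0
    for k in range(p + 1):
        total += max(abs(poly_eval(a, x)) for x in grid) / factorial(k)
        a = poly_deriv(a)
    return total

# Midpoints of 200 equal cells in U = (-1, 1).
grid = [-1.0 + 2.0 * (i + 0.5) / 200 for i in range(200)]
f = [1.0, -2.0, 0.0, 3.0]   # 1 - 2x + 3x^3
g = [0.5, 1.0, 1.0]         # 0.5 + x + x^2
p = 2

lhs = cp_norm(poly_mul(f, g), p, grid)
rhs = cp_norm(f, p, grid) * cp_norm(g, p, grid)
```

Here `lhs` plays the role of $\|fg\|_{C^p}$ and `rhs` that of $\|f\|_{C^p}\|g\|_{C^p}$.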

In general, the space $C^\infty(U)$ of bounded smooth functions from $U$ to $F$ is not a Banach space under any norm.

The space $C_c(X)$

Let $X$ be a locally compact Hausdorff (LCH) space and let $C_c(X, F) = C_c(X)$ be the normed vector space of $F$-valued continuous functions on $X$ with compact support, equipped with the supremum norm. (The support of $f : X \to F$, denoted by $\operatorname{supp}(f)$, is defined as the closure of $f^{-1}(F \setminus \{0\})$, i.e. the smallest closed set outside of which $f$ vanishes.)

We first recall some topological results that will be used throughout this chapter.

Theorem 4.123 (Urysohn's lemma). A topological space $X$ is normal if and only if for any two disjoint closed sets $A, B \subseteq X$, there exists a continuous function $f : X \to [0, 1]$ such that $f = 0$ on $A$ and $f = 1$ on $B$.

Corollary 4.124. Let $X$ be an LCH space, let $U \subseteq X$ be open and let $K \subseteq U$ be compact. There exists a continuous function $f : X \to [0, 1]$ such that $f = 1$ on $K$ and $\operatorname{supp}(f) \subseteq U$ is compact.

Let $X$ be a topological space, let $E \subseteq X$ and let $\mathcal{U} = (U_\alpha)_{\alpha \in A}$ be an open cover of $E$. A partition of unity on $E$ subordinate to $\mathcal{U}$ is a collection $\{\psi_\alpha\}_{\alpha \in A}$ of continuous functions $\psi_\alpha : X \to [0, 1]$ with the following properties:

1. $\operatorname{supp}(\psi_\alpha) \subseteq U_\alpha$ for all $\alpha \in A$.

2. The collection of supports $(\operatorname{supp}(\psi_\alpha))_{\alpha \in A}$ is locally finite.

3. $\sum_{\alpha \in A} \psi_\alpha(x) = 1$ for all $x \in E$.

Note that the last sum is finite for each $x \in E$ because of (2). We say that a topological space $X$ admits partitions of unity on $E$ if for every open cover $\mathcal{U}$ of $E$, there exists a partition of unity on $E$ subordinate to $\mathcal{U}$.
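To make the three conditions concrete, here is a small sketch of a partition of unity in a hypothetical setting chosen only for illustration: $X = E = [0, 1]$ with the open cover $U_1 = [0, 0.6)$, $U_2 = (0.4, 1]$ (both open in $X$). The piecewise linear functions `psi1` and `psi2` are an assumption of this example, not a construction from the text.

```python
def psi1(x):
    """Equal to 1 on [0, 0.45], 0 on [0.55, 1], linear in between."""
    return min(1.0, max(0.0, (0.55 - x) / 0.10))

def psi2(x):
    """Complementary bump, so that psi1 + psi2 = 1 on all of [0, 1]."""
    return 1.0 - psi1(x)

# supp(psi1) = [0, 0.55] lies in U1 = [0, 0.6) and
# supp(psi2) = [0.45, 1] lies in U2 = (0.4, 1].
grid = [i / 1000 for i in range(1001)]
sums = [psi1(x) + psi2(x) for x in grid]
```

With only two functions the local finiteness condition (2) is automatic; the grid values in `sums` illustrate condition (3).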

Theorem 4.125. A Hausdorff space is paracompact if and only if it admits partitions of unity on itself.

Corollary 4.126. An LCH space admits partitions of unity on its compact subsets.

A real-valued linear functional $\lambda : C_c(X, \mathbb{R}) \to \mathbb{R}$ is positive if $\lambda f \ge 0$ whenever $f \ge 0$. Note that $\lambda$ does not have to be continuous, but we have the following continuity result when $X$ is an LCH space:

Theorem 4.127. Let $X$ be an LCH space. If $\lambda : C_c(X, \mathbb{R}) \to \mathbb{R}$ is a positive linear functional and $K \subseteq X$ is compact, then there exists a constant $C$ such that $|\lambda f| \le C \|f\|_\infty$ for all $f \in C_c(X, \mathbb{R})$ such that $\operatorname{supp}(f) \subseteq K$. In particular, $\lambda$ is continuous on the subspace of functions supported in $K$.


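Proof. By Urysohn's lemma, we can choose some $\varphi \in C_c(X, \mathbb{R})$ such that $0 \le \varphi \le 1$ on $X$ and $\varphi = 1$ on $K$. Let $f \in C_c(X, \mathbb{R})$ with $\operatorname{supp}(f) \subseteq K$; then $|f| \le \|f\|_\infty \varphi$, so $\|f\|_\infty \varphi - f \ge 0$ and $\|f\|_\infty \varphi + f \ge 0$. Applying $\lambda$ gives $\|f\|_\infty \lambda\varphi - \lambda f \ge 0$ and $\|f\|_\infty \lambda\varphi + \lambda f \ge 0$, so $|\lambda f| \le (\lambda\varphi) \|f\|_\infty$.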

The space $C_0(X)$

Clearly $C_c(X) \subseteq C(X)$, but $C_c(X)$ is not a complete space under the supremum norm. When $X$ is an LCH space, the closure of $C_c(X)$ has a particularly simple description. We say that a continuous function $f : X \to F$ vanishes at infinity if for every $\varepsilon > 0$ the set $\{x \in X : |f(x)| \ge \varepsilon\}$ is compact. Let $C_0(X, F) = C_0(X)$ be the normed vector space of all such functions, equipped with the supremum norm. Each $f \in C_0(X)$ is indeed bounded: if we let $E = \{x \in X : |f(x)| \ge 1\}$, then $f(E)$ is compact and $|f| < 1$ on $X \setminus E$. Every $f \in C_c(X)$ clearly vanishes at infinity, so we have the inclusions

\[ C_c(X) \subseteq C_0(X) \subseteq C(X). \]

Theorem 4.128. If $X$ is an LCH space then the closure of $C_c(X)$ in $C(X)$ is $C_0(X)$. Therefore $C_0(X)$ is complete, i.e. a Banach space.

Proof. Let $f_n$ be a sequence in $C_c(X)$ converging uniformly to some $f \in C(X)$. Let $\varepsilon > 0$ and choose some $n$ such that $\|f_n - f\|_\infty < \varepsilon$. Then $|f| < \varepsilon$ on $X \setminus \operatorname{supp}(f_n)$, so $f \in C_0(X)$. Conversely, if $f \in C_0(X)$ then let $K_n = \{x \in X : |f(x)| \ge n^{-1}\}$ for $n = 1, 2, \ldots$. Each $K_n$ is compact, so Urysohn's lemma shows that there is some $g_n \in C_c(X, \mathbb{R})$ such that $0 \le g_n \le 1$ on $X$ and $g_n = 1$ on $K_n$. Let $f_n = g_n f$. Then $f_n \in C_c(X)$ and $\|f_n - f\|_\infty < n^{-1}$, so $f_n \to f$ uniformly.
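The truncation $f_n = g_n f$ used in the second half of the proof can be written out explicitly in a simple case. The sketch below assumes $X = \mathbb{R}$ and the hypothetical choice $f(x) = 1/(1 + x^2)$, for which $K_n = [-\sqrt{n-1}, \sqrt{n-1}]$; the cutoff `g_n` is one possible Urysohn function (piecewise linear), chosen only for illustration, and suprema are approximated on a finite grid.

```python
import math

def f(x):
    """A function vanishing at infinity on the real line."""
    return 1.0 / (1.0 + x * x)

def radius(n):
    """K_n = {|f| >= 1/n} is the interval [-radius(n), radius(n)]."""
    return math.sqrt(n - 1)

def g_n(x, n):
    """Continuous cutoff: 1 on K_n, 0 outside the 1-neighbourhood of K_n."""
    return min(1.0, max(0.0, radius(n) + 1.0 - abs(x)))

def f_n(x, n):
    """Compactly supported approximation g_n * f."""
    return g_n(x, n) * f(x)

def sup_error(n, half_width=50.0, steps=20000):
    """Grid approximation of sup |f_n - f| over [-half_width, half_width]."""
    xs = (-half_width + 2.0 * half_width * i / steps for i in range(steps + 1))
    return max(abs(f_n(x, n) - f(x)) for x in xs)
```

Outside $K_n$ we have $|f| < 1/n$ and $0 \le 1 - g_n \le 1$, which is exactly why the error stays below $1/n$.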

Theorem 4.129. If $\mu$ is any Borel measure on $X$ then every element of $C_0(X)$ is $\mu$-measurable, and $C_0(X) \subseteq \mathcal{L}^\infty(X, \mu)$. Furthermore:

1. If $\mu$ is finite on compact sets then $C_c(X) \subseteq \mathcal{L}^p(X, \mu)$ for every $1 \le p \le \infty$.

2. If $\mu(X) < \infty$ then $C_0(X) \subseteq \mathcal{L}^p(X, \mu)$ for every $1 \le p \le \infty$.

Proof. Let $f \in C_0(X)$ and let $A_n = \{x \in X : |f(x)| \ge 1/n\}$, which is compact for every $n$. Then each $f(A_n)$ is compact and therefore separable, so

\[ f(X) \subseteq \{0\} \cup \bigcup_{n=1}^\infty f(A_n) \]

is separable. Also, $f$ is measurable because it is continuous. Theorem 4.39 shows that $f$ is $\mu$-measurable. Since $f$ is also bounded, we have $f \in \mathcal{L}^\infty(X, \mu)$. The other inclusions follow from Corollary 4.92, noting that any $f \in C_c(X)$ vanishes outside a set of finite measure.

One-point compactification

The space $C_0(X)$ is closely related to the one-point compactification $X^*$ of $X$, which is formed by taking $X^* = X \cup \{\infty\}$ and declaring that a subset $E \subseteq X^*$ is open if $\infty \notin E$ and $E$ is open in $X$, or if $\infty \in E$ and $X^* \setminus E$ is compact. It is easy to check that $X^*$ is indeed a compact Hausdorff space, and that the inclusion map $\iota : X \to X^*$ is an embedding (an injective continuous map that is a homeomorphism onto its image). Also, $X$ is dense in $X^*$ if and only if $X$ is non-compact.

Lemma 4.130. A map $f : X^* \to F$ is continuous if and only if $f|_X - f(\infty) \in C_0(X)$.

Proof. The set

\[ \{x \in X : |f(x) - f(\infty)| \ge \varepsilon\} = X^* \setminus f^{-1}(\{y \in F : |y - f(\infty)| < \varepsilon\}) \]

is compact for every $\varepsilon > 0$ if and only if $f$ is continuous at $\infty$.

Now assume that $F$ is a unital normed algebra. If $X$ is non-compact, then $C_0(X)$ cannot be a unital algebra. Using the one-point compactification, we can describe explicitly the unitization of $C_0(X)$:

Theorem 4.131. The map $\Phi : C(X^*) \to C_0(X)_e$ defined by

\[ \Phi f = (f|_X - f(\infty)) + f(\infty)e \]

is a unital algebra isomorphism and a homeomorphism.

Proof. $\Phi$ is well-defined due to Lemma 4.130, and is clearly injective, unital and linear. If $f, g \in C(X^*)$ then

\[
\begin{aligned}
(\Phi f)(\Phi g) &= (f|_X - f(\infty))(g|_X - g(\infty)) + g(\infty)(f|_X - f(\infty)) \\
&\quad + f(\infty)(g|_X - g(\infty)) + f(\infty)g(\infty)e \\
&= (f|_X g|_X - f(\infty)g(\infty)) + f(\infty)g(\infty)e \\
&= \Phi(fg),
\end{aligned}
\]

which shows that $\Phi$ is an algebra homomorphism. If $g + re \in C_0(X)_e$ then the map $f : X^* \to F$ defined by $f(x) = g(x) + r$ for $x \in X$ and $f(\infty) = r$ is continuous by Lemma 4.130, and clearly $\Phi f = g + re$. This shows that $\Phi$ is surjective. Finally, we have

\[ |\Phi f| = \|f|_X - f(\infty)\|_\infty + |f(\infty)| \le 3\|f\|_\infty \]

for all $f \in C(X^*)$ and

\[ \|\Phi^{-1}(g + re)\|_\infty = \max\{\|g + r\|_\infty, |r|\} \le \|g\|_\infty + |r| = |g + re| \]

for all $g + re \in C_0(X)_e$, which shows that $\Phi$ is a homeomorphism.

Furthermore, if $X$ is non-compact then $C_0(X)_e$ is just the subspace $C_0(X) + F e_X$ of $C(X)$, where $e_X : X \to F$ is the constant function taking on the value $e$ on $X$.

The Stone-Weierstrass theorem

Theorem 4.132 (Weierstrass approximation theorem). The space of polynomial functions is dense in $C([a, b], \mathbb{R})$.

Proof. We can assume that $[a, b] = [0, 1]$. Let $f \in C([0, 1], \mathbb{R})$; we can assume that $f(0) = f(1) = 0$ since we can always consider

\[ g(x) = f(x) - f(0) - x(f(1) - f(0)) \]

instead. Define $f(x) = 0$ for $x \notin [0, 1]$. For each $n = 1, 2, \ldots$, let $q_n(x) = c_n(1 - x^2)^n$ with $c_n$ chosen so that $\int_{-1}^1 q_n = 1$. (Note that we are using the regulated integral, defined in Section 1.8.) By using the inequality $(1 - x^2)^n \ge 1 - nx^2$, it is easy to see that $c_n < \sqrt{n}$. Define $p_n : [0, 1] \to \mathbb{R}$ by

\[ p_n(x) = \int_{-1}^{1} f(x + t) q_n(t) \, dt = \int_{-1+x}^{1+x} f(t) q_n(t - x) \, dt = \int_0^1 f(t) q_n(t - x) \, dt, \]

which is clearly a polynomial in $x$. Since the extended function $f$ is continuous on $\mathbb{R}$ and has compact support, it is uniformly continuous. For any $\varepsilon > 0$ we can choose $0 < \delta < 1$ such that $|f(x) - f(y)| < \varepsilon/2$ whenever $|x - y| < \delta$. Then for all $0 \le x \le 1$,

\[
\begin{aligned}
|p_n(x) - f(x)| &= \left| \int_{-1}^{1} (f(x + t) - f(x)) q_n(t) \, dt \right| \\
&\le \int_{-1}^{1} |f(x + t) - f(x)| \, q_n(t) \, dt \\
&\le 2 \sup_{x \in \mathbb{R}} |f(x)| \left( \int_{-1}^{-\delta} q_n + \int_{\delta}^{1} q_n \right) + \frac{\varepsilon}{2} \int_{-\delta}^{\delta} q_n \\
&\le 4 \sup_{x \in \mathbb{R}} |f(x)| \sqrt{n} (1 - \delta^2)^n + \frac{\varepsilon}{2},
\end{aligned}
\]

which is less than $\varepsilon$ if we choose $n$ large enough.
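The convolution with the kernel $q_n$ can be explored numerically. The following is a rough sketch, assuming the hypothetical test function $f(x) = \sin(\pi x)$ on $[0, 1]$ (extended by zero) and replacing the regulated integral by a midpoint rule; the helper name `landau_error` and the grid sizes are choices made for this illustration only.

```python
import math

def f(x):
    """Continuous on [0, 1] with f(0) = f(1) = 0, extended by zero elsewhere."""
    return math.sin(math.pi * x) if 0.0 <= x <= 1.0 else 0.0

def landau_error(func, n, t_steps=2000, x_steps=20):
    """Return (c_n, grid sup of |p_n - func|) using a midpoint rule on [-1, 1]."""
    h = 2.0 / t_steps
    ts = [-1.0 + (j + 0.5) * h for j in range(t_steps)]
    kernel = [(1.0 - t * t) ** n for t in ts]
    c_n = 1.0 / (sum(kernel) * h)          # normalise so that q_n integrates to 1
    worst = 0.0
    for i in range(x_steps + 1):
        x = i / x_steps
        # p_n(x) = integral over [-1, 1] of func(x + t) * q_n(t) dt
        p = sum(func(x + t) * k for t, k in zip(ts, kernel)) * h * c_n
        worst = max(worst, abs(p - func(x)))
    return c_n, worst

c4, err4 = landau_error(f, 4)
c100, err100 = landau_error(f, 100)
```

In this sketch the computed constants respect the bound $c_n < \sqrt{n}$, and the grid error for $n = 100$ is smaller than for $n = 4$. The convergence is slow near the endpoints, where the zero extension of $f$ has a corner.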

A subalgebra $A$ of $C(X, \mathbb{R})$ (under pointwise multiplication) separates points if for any $x, y \in X$ with $x \ne y$, there is some $f \in A$ such that $f(x) \ne f(y)$. We say that $A$ strongly separates points if for all $x \in X$ there is some $f \in A$ such that $f(x) \ne 0$. Note that if $A$ separates points and contains the constant functions, then $A$ strongly separates points.

Theorem 4.133 (Stone-Weierstrass theorem). Let $X$ be an LCH space and let $A$ be a closed subalgebra of $C_0(X, \mathbb{R})$ that separates points. If $A$ strongly separates points, then $A = C_0(X, \mathbb{R})$. Otherwise, there is some $z_0 \in X$ such that $A = \{f \in C_0(X, \mathbb{R}) : f(z_0) = 0\}$.

Proof. Suppose that $X$ is compact. We first show that if $f, g \in A$ then $|f| \in A$, $\max(f, g) \in A$ and $\min(f, g) \in A$. Let $\varepsilon > 0$ and choose some polynomial function $p$ such that $||x| - p(x)| < \varepsilon/2$ for $|x| \le 1$ (see Theorem 4.132). Let $q(x) = p(x) - p(0)$; then $||x| - q(x)| < \varepsilon$ for $|x| \le 1$. If $f \ne 0$ then $h = f/\|f\|_\infty$ is a map into $[-1, 1]$, so $\||h| - q \circ h\|_\infty < \varepsilon$. Since $q$ has no constant term and $A$ is an algebra, $q \circ h \in A$. Since $A$ is closed, we must have $|h| \in A$ and therefore $|f| = \|f\|_\infty |h| \in A$. It follows that $\max(f, g) = \frac{1}{2}(f + g + |f - g|) \in A$ and $\min(f, g) = \frac{1}{2}(f + g - |f - g|) \in A$.

We now prove that either the statement below holds for all $f \in C(X, \mathbb{R})$, or there is some $z_0 \in X$ such that the statement holds for all $f \in C(X, \mathbb{R})$ with $f(z_0) = 0$:

For all $x, y \in X$ there exists some $g_{x,y} \in A$ such that

\[ g_{x,y}(x) = f(x) \quad \text{and} \quad g_{x,y}(y) = f(y). \tag{*} \]

For all $x, y \in X$ with $x \ne y$ let $A_{x,y} = \{(g(x), g(y)) : g \in A\} \subseteq \mathbb{R}^2$, which is an algebra under component-wise multiplication. If $A_{x,y} = \mathbb{R}^2$ whenever $x \ne y$, then (*) holds for all $f \in C(X, \mathbb{R})$. Otherwise, there is some $A_{x_0,y_0}$ such that $A_{x_0,y_0} \ne \mathbb{R}^2$. Since $A$ separates points, we cannot have $A_{x_0,y_0} = \{(0, 0)\}$. Therefore $A_{x_0,y_0}$ is one-dimensional, and we can choose some nonzero vector $(a, b)$ that spans $A_{x_0,y_0}$. Since $A$ separates points, $a \ne b$. If $a, b \ne 0$ then $(a, b)$ and $(a^2, b^2)$ are linearly independent, which contradicts the fact that $A_{x_0,y_0} \ne \mathbb{R}^2$. Therefore $a = 0$ or $b = 0$, so $A_{x_0,y_0}$ is spanned by $(1, 0)$ or $(0, 1)$. This implies that there is some $z_0 \in X$ such that $g(z_0) = 0$ for all $g \in A$. There cannot be another $z_1 \in X$ with this property since $A$ separates points, so we have shown that $A_{x,y} = \mathbb{R}^2$ whenever $x \ne y$ and $x, y \ne z_0$, and that $A_{z_0,x}$ is spanned by $(1, 0)$ or $(0, 1)$ whenever $x \ne z_0$. Therefore (*) holds for all $f \in C(X, \mathbb{R})$ such that $f(z_0) = 0$.

Let $f \in C(X, \mathbb{R})$ such that (*) holds and let $\varepsilon > 0$. For all $x, y \in X$, let

\[ U_{x,y} = \{z \in X : f(z) < g_{x,y}(z) + \varepsilon\}, \qquad V_{x,y} = \{z \in X : f(z) > g_{x,y}(z) - \varepsilon\}, \]

which are both open and contain both $x$ and $y$. For each $y \in X$ the collection $\{U_{x,y} : x \in X\}$ covers $X$, so there is a finite subcover $\{U_{x_1,y}, \ldots, U_{x_m,y}\}$. Let $g_y = \max(g_{x_1,y}, \ldots, g_{x_m,y}) \in A$ so that $f < g_y + \varepsilon$ on $X$ and $f > g_y - \varepsilon$ on $V_y = V_{x_1,y} \cap \cdots \cap V_{x_m,y}$. Since $\{V_y : y \in X\}$ is an open cover of $X$, there is a finite subcover $\{V_{y_1}, \ldots, V_{y_n}\}$. Then $g = \min(g_{y_1}, \ldots, g_{y_n}) \in A$ satisfies $\|f - g\|_\infty < \varepsilon$. This shows that $f \in A$.

Now suppose that $X$ is non-compact. If there is some $z_0 \in X$ such that $f(z_0) = 0$ for all $f \in A$ then let $Y = X \setminus \{z_0\}$; otherwise, let $Y = X$. Let $Y^*$ be the one-point compactification of $Y$. Lemma 4.130 shows that every $f \in A$ extends to a continuous function $f^* : Y^* \to \mathbb{R}$ such that $f^*|_Y = f|_Y$ and $f^*(\infty) = 0$. Then $A^* = \{f^* : f \in A\}$ is a closed subalgebra of $C(Y^*, \mathbb{R})$ that separates points, so $A^* = \{g \in C(Y^*, \mathbb{R}) : g(\infty) = 0\}$. This implies that $A = \{f \in C_0(X, \mathbb{R}) : f|_Y \in C_0(Y, \mathbb{R})\}$. If $Y = X$ then we are done; otherwise, $A = \{f \in C_0(X, \mathbb{R}) : f(z_0) = 0\}$ because $f(z_0) = 0$ for all $f \in A$.

Theorem 4.134 (Stone-Weierstrass theorem, complex version). Let $X$ be an LCH space and let $A$ be a closed subalgebra of $C_0(X, \mathbb{C})$ that separates points and is closed under complex conjugation. If $A$ strongly separates points, then $A = C_0(X, \mathbb{C})$. Otherwise, there is some $z_0 \in X$ such that $A = \{f \in C_0(X, \mathbb{C}) : f(z_0) = 0\}$.

Proof. Since $\operatorname{Re} f = (f + \bar{f})/2$ and $\operatorname{Im} f = (f - \bar{f})/(2i)$, we can apply Theorem 4.133 to the subalgebra

\[ A_{\mathbb{R}} = \{\operatorname{Re} f : f \in A\} = \{\operatorname{Im} f : f \in A\} \]

of $C_0(X, \mathbb{R})$. The result follows since $A = \{f + ig : f, g \in A_{\mathbb{R}}\}$.


Theorem 4.135 (Stone-Weierstrass theorem, vector version). Let $X$ be a compact Hausdorff space and let $A$ be a closed subalgebra of $C(X, \mathbb{R})$ that separates points and strongly separates points. Let $F$ be a normed vector space and let $A_F$ be the span of

\[ \{vg : v \in F, \ g \in A\}. \]

Then $A_F$ is dense in $C(X, F)$.

Proof. Let $f \in C(X, F)$ and let $\varepsilon > 0$. For each $x \in X$, choose some neighborhood $U_x$ of $x$ such that $|f(y) - f(x)| < \varepsilon$ for all $y \in U_x$. Since $\{U_x : x \in X\}$ covers $X$, there is a finite subcover $U_1 = U_{x_1}, \ldots, U_n = U_{x_n}$. By Corollary 4.126, there is a partition of unity $\psi_1, \ldots, \psi_n$ on $X$ subordinate to $\{U_1, \ldots, U_n\}$. Let $h = \sum_{i=1}^n f(x_i)\psi_i$ so that for all $x \in X$ we have

\[ |f(x) - h(x)| = \left| \sum_{i=1}^n \psi_i(x) f(x) - \sum_{i=1}^n f(x_i)\psi_i(x) \right| \le \sum_{i=1}^n \psi_i(x) |f(x) - f(x_i)| < \varepsilon, \]

i.e. $\|f - h\|_\infty < \varepsilon$. For each $i$, Theorem 4.133 shows that there is some $\varphi_i \in A$ such that $\|\varphi_i - \psi_i\|_\infty < \varepsilon/n$. Then $g = \sum_{i=1}^n f(x_i)\varphi_i \in A_F$ and

\[ \|f - g\|_\infty \le \|f - h\|_\infty + \|h - g\|_\infty < \varepsilon(1 + \|f\|_\infty). \]

4.8 Radon measures

The theory of integration on locally compact Hausdorff (LCH) spaces provides a number of important results for measures that are "compatible" with the topology of the underlying space. In this section, we let $X$ be an LCH space. We say that $X$ is σ-compact if it can be written as a countable union of compact sets.

Definition 4.136. A Borel measure on $X$ is a measure defined on $\mathcal{B}_X$, the Borel σ-algebra of $X$. Let $\mu$ be a positive Borel measure on $X$. We say that $\mu$ is outer regular on a Borel set $E \subseteq X$ if

\[ \mu(E) = \inf\{\mu(U) : U \supseteq E \text{ and } U \text{ is open}\}, \]

inner regular on $E$ if

\[ \mu(E) = \sup\{\mu(K) : K \subseteq E \text{ and } K \text{ is compact}\}, \]

and regular if it is outer and inner regular on all Borel sets. A positive Radon measure is a positive Borel measure that is finite on compact sets, outer regular on all Borel sets, and inner regular on all open sets.

Note that if $\mu$ is a positive Radon measure on $X$ then $C_c(X, \mathbb{R}) \subseteq \mathcal{L}^1(X, \mu, \mathbb{R})$, so the map $f \mapsto \int_X f \, d\mu$ is a positive linear functional on $C_c(X, \mathbb{R})$. The main result in this section is that the reverse holds: all positive linear functionals give rise to a positive Radon measure.

Riesz representation theorem

If $f \in C_c(X, \mathbb{R})$ and $U$ is open in $X$, we write $f \prec U$ to mean that $0 \le f \le 1$ and $\operatorname{supp}(f) \subseteq U$.

Theorem 4.137 (Riesz representation theorem). Let $\lambda : C_c(X, \mathbb{R}) \to \mathbb{R}$ be a positive linear functional. There exists a unique positive Radon measure $\mu$ on $X$ such that

\[ \lambda f = \int_X f \, d\mu \]

for all $f \in C_c(X, \mathbb{R})$. Furthermore, we have

\[ \mu(U) = \sup\{\lambda f : f \in C_c(X, \mathbb{R}), \ f \prec U\} \tag{*} \]

for all open $U \subseteq X$ and

\[ \mu(K) = \inf\{\lambda f : f \in C_c(X, \mathbb{R}), \ f \ge \chi_K\} \tag{**} \]

for all compact $K \subseteq X$.

Proof. To prove uniqueness, let $\mu$ be a Radon measure on $X$ satisfying $\lambda f = \int_X f \, d\mu$ for all $f \in C_c(X, \mathbb{R})$. Let $U \subseteq X$ be open. Clearly, $\lambda f \le \mu(U)$ whenever $f \prec U$. If $K \subseteq U$ is compact then Urysohn's lemma shows that there is some $f \in C_c(X, \mathbb{R})$ such that $f \prec U$ and $f = 1$ on $K$, so $\mu(K) \le \lambda f$. Since $\mu$ is inner regular on $U$, the condition (*) holds, and uniqueness follows from the fact that $\mu$ is outer regular on Borel sets.

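To prove existence, write

\[ \mu(U) = \sup\{\lambda f : f \in C_c(X, \mathbb{R}), \ f \prec U\} \]

for open $U \subseteq X$ and define

\[ \mu^*(E) = \inf\{\mu(U) : U \supseteq E \text{ and } U \text{ is open}\} \]

for all subsets $E \subseteq X$. We first prove:

$\mu^*$ is an outer measure.

If we can show that $\mu\left(\bigcup_{n=1}^\infty U_n\right) \le \sum_{n=1}^\infty \mu(U_n)$ whenever $U_n$ is a sequence of open sets, then

\[ \mu^*(E) = \inf\left\{ \sum_{n=1}^\infty \mu(U_n) : \bigcup_{n=1}^\infty U_n \supseteq E \text{ and every } U_n \text{ is open} \right\} \]

and the result follows from Theorem 4.21. Let $U_n$ be a sequence of open sets and let $U = \bigcup_{n=1}^\infty U_n$. Let $f \in C_c(X, \mathbb{R})$ with $f \prec U$ and let $K = \operatorname{supp}(f)$. Since $K$ is compact we have $K \subseteq U_{k_1} \cup \cdots \cup U_{k_m}$ for some $k_1, \ldots, k_m$, so Corollary 4.126 shows that there is a partition of unity $\psi_1, \ldots, \psi_m \in C_c(X, \mathbb{R})$ on $K$ subordinate to $\{U_{k_1}, \ldots, U_{k_m}\}$. Then $f = \sum_{i=1}^m \psi_i f$ and $\psi_i f \prec U_{k_i}$ for all $i$, so

\[ \lambda f = \sum_{i=1}^m \lambda(\psi_i f) \le \sum_{i=1}^m \mu(U_{k_i}) \le \sum_{n=1}^\infty \mu(U_n). \]

The result follows since $f$ was arbitrary.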

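Next we prove:

Every open set is $\mu^*$-measurable.

Let $U \subseteq X$ be open and let $E \subseteq X$ with $\mu^*(E) < \infty$; we need to show that

\[ \mu^*(E) \ge \mu^*(E \cap U) + \mu^*(E \cap U^c). \]

(The reverse inequality is already true since $\mu^*$ is an outer measure.) We first assume that $E$ is open. Let $\varepsilon > 0$; since $E \cap U$ is open we can choose $f \in C_c(X, \mathbb{R})$ such that $f \prec E \cap U$ and $\lambda f \ge \mu(E \cap U) - \varepsilon$, and since $E \setminus \operatorname{supp}(f)$ is open we can find $g \in C_c(X, \mathbb{R})$ such that $g \prec E \setminus \operatorname{supp}(f)$ and $\lambda g \ge \mu(E \setminus \operatorname{supp}(f)) - \varepsilon$. Since $f + g \prec E$ we have

\[
\begin{aligned}
\mu(E) &\ge \lambda f + \lambda g \\
&\ge \mu(E \cap U) + \mu(E \setminus \operatorname{supp}(f)) - 2\varepsilon \\
&\ge \mu^*(E \cap U) + \mu^*(E \cap U^c) - 2\varepsilon,
\end{aligned}
\]

and taking $\varepsilon \to 0$ gives the result. For the general case, choose an open set $V \supseteq E$ such that $\mu(V) \le \mu^*(E) + \varepsilon$. Then

\[
\begin{aligned}
\mu^*(E) + \varepsilon &\ge \mu(V) \\
&\ge \mu^*(V \cap U) + \mu^*(V \cap U^c) \\
&\ge \mu^*(E \cap U) + \mu^*(E \cap U^c),
\end{aligned}
\]

so taking $\varepsilon \to 0$ proves the result.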

Since $\mu^*$ is an outer measure, Theorem 4.20 shows that $\mu^*$ is a positive measure on the σ-algebra of $\mu^*$-measurable sets. Every open set is $\mu^*$-measurable, so this σ-algebra contains the Borel σ-algebra. Let $\mu = \mu^*|_{\mathcal{B}_X}$ (this is consistent with our previous definition of $\mu$); then (*) holds by definition, and we have already shown that $\mu$ is outer regular on all Borel sets. We now prove:

$\mu$ satisfies (**).

Let $K \subseteq X$ be compact and let $I$ be the infimum on the right hand side of (**). Let $f \in C_c(X, \mathbb{R})$ with $f \ge \chi_K$ and let $\varepsilon > 0$. Let $U_\varepsilon = f^{-1}((1 - \varepsilon, \infty))$ so that $K \subseteq U_\varepsilon$. For all $g \in C_c(X, \mathbb{R})$ with $g \prec U_\varepsilon$ we have $(1 - \varepsilon)^{-1} f - g \ge 0$ and $\lambda g \le (1 - \varepsilon)^{-1} \lambda f$, so

\[ \mu(K) \le \mu(U_\varepsilon) \le (1 - \varepsilon)^{-1} \lambda f \]

using (*); taking $\varepsilon \to 0$ shows that $\mu(K) \le \lambda f$. Since $f$ was arbitrary, we have $\mu(K) \le I$. If $U \supseteq K$ is open then Urysohn's lemma shows that there is some $f \in C_c(X, \mathbb{R})$ such that $f \prec U$ and $f \ge \chi_K$, so $I \le \lambda f \le \mu(U)$. Since $\mu$ is outer regular on $K$, we have $I \le \mu(K)$. This proves (**).

From (**) it follows easily (using Urysohn's lemma) that $\mu$ is finite on compact sets. We also want to show:

$\mu$ is inner regular on open sets.

Let $U \subseteq X$ be open and let $S = \sup\{\mu(K) : K \subseteq U \text{ and } K \text{ is compact}\}$. Suppose that $\alpha < \mu(U)$. Choose $f \in C_c(X, \mathbb{R})$ such that $f \prec U$ and $\lambda f > \alpha$, and let $K = \operatorname{supp}(f)$. For all $g \in C_c(X, \mathbb{R})$ with $g \ge \chi_K$ we have $g - f \ge 0$, so $\lambda g \ge \lambda f > \alpha$. Using (**) we have $\mu(K) \ge \lambda f > \alpha$, so $S > \alpha$. Since $\alpha < \mu(U)$ was arbitrary, $S \ge \mu(U)$. It is clear that $S \le \mu(U)$, so this proves that $\mu$ is inner regular on $U$.

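Finally we prove:

$\lambda f = \int_X f \, d\mu$ for all $f \in C_c(X, \mathbb{R})$.

Since functions in $C_c(X, \mathbb{R})$ are bounded, we can assume $0 \le f \le 1$ on $X$. Let $N$ be a positive integer, let $K_0 = \operatorname{supp}(f)$, and let $K_i = f^{-1}([i/N, \infty))$ for each $i = 1, \ldots, N$. Also define $f_1, \ldots, f_N \in C_c(X, \mathbb{R})$ by

\[
f_i(x) =
\begin{cases}
0, & x \notin K_{i-1}, \\
f(x) - \frac{i-1}{N}, & x \in K_{i-1} \setminus K_i, \\
\frac{1}{N}, & x \in K_i
\end{cases}
\]

so that $f = \sum_{i=1}^N f_i$. Since $N^{-1}\chi_{K_i} \le f_i \le N^{-1}\chi_{K_{i-1}}$ we have

\[ \frac{1}{N}\mu(K_i) \le \int_X f_i \, d\mu \le \frac{1}{N}\mu(K_{i-1}). \]

If $U \supseteq K_{i-1}$ is open then $N f_i \prec U$ and $\lambda f_i \le N^{-1}\mu(U)$, so (**) for $K_i$ and outer regularity on $K_{i-1}$ imply that

\[ \frac{1}{N}\mu(K_i) \le \lambda f_i \le \frac{1}{N}\mu(K_{i-1}). \]

Using the fact that $f = \sum_{i=1}^N f_i$, we have

\[ \frac{1}{N} \sum_{i=1}^N \mu(K_i) \le \int_X f \, d\mu \le \frac{1}{N} \sum_{i=1}^N \mu(K_{i-1}), \]

\[ \frac{1}{N} \sum_{i=1}^N \mu(K_i) \le \lambda f \le \frac{1}{N} \sum_{i=1}^N \mu(K_{i-1}), \]

and

\[ \left| \lambda f - \int_X f \, d\mu \right| \le \frac{\mu(K_0) - \mu(K_N)}{N} \le \frac{\mu(K_0)}{N}. \]

Since $\mu(K_0) < \infty$, taking $N \to \infty$ completes the proof.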
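The sandwich between the level sets $K_i$ is easy to verify on a finite example. The sketch below is an illustration under assumptions that are not in the text: $X$ is a six-point discrete space (so every subset is compact), $\mu$ is given by hypothetical point masses, and exact rational arithmetic from the standard library is used so that the inequalities can be checked without rounding.

```python
from fractions import Fraction as Fr

# Hypothetical discrete example: X = {0, ..., 5} with point masses.
mu = {0: Fr(2), 1: Fr(1), 2: Fr(3), 3: Fr(1, 2), 4: Fr(2), 5: Fr(1)}
f = {0: Fr(0), 1: Fr(1, 10), 2: Fr(1, 4), 3: Fr(1, 2), 4: Fr(9, 10), 5: Fr(1)}
N = 4

def measure(subset):
    return sum(mu[x] for x in subset)

# K_0 = supp(f), K_i = {f >= i/N} for i = 1, ..., N.
K = [{x for x in f if f[x] != 0}]
K += [{x for x in f if f[x] >= Fr(i, N)} for i in range(1, N + 1)]

integral = sum(f[x] * mu[x] for x in f)
lower = sum(measure(K[i]) for i in range(1, N + 1)) / N
upper = sum(measure(K[i - 1]) for i in range(1, N + 1)) / N
gap = upper - lower
```

The gap between the two bounds telescopes to $(\mu(K_0) - \mu(K_N))/N$, which is the quantity that tends to zero in the last step of the proof.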

Approximation theorems

Theorem 4.138. Every Radon measure is inner regular on all σ-finite sets.

Proof. Let $\mu$ be a Radon measure. We first prove the result for sets of finite measure. Let $E$ be a Borel set with $\mu(E) < \infty$, let $\varepsilon > 0$, choose an open set $U \supseteq E$ such that $\mu(U) < \mu(E) + \varepsilon$, and choose a compact set $F \subseteq U$ such that $\mu(F) > \mu(U) - \varepsilon$. Since $\mu(U \setminus E) < \varepsilon$, we can choose an open set $V \supseteq U \setminus E$ such that $\mu(V) < \varepsilon$. Let $K = F \setminus V$ so that $K$ is compact, $K \subseteq E$, and

\[ \mu(K) = \mu(F) - \mu(V \cap F) > \mu(U) - \varepsilon - \mu(V) > \mu(E) - 2\varepsilon. \]

This shows that $\mu$ is inner regular on $E$. Now let $E$ be σ-finite with $\mu(E) = \infty$. We can write $E = \bigcup_{n=1}^\infty E_n$ where $E_n$ is an increasing sequence of sets of finite measure and $\mu(E_n) \to \infty$. For every $N$ there exists an $n$ such that $\mu(E_n) > N$, and since $\mu$ is inner regular on $E_n$, there is a compact $K \subseteq E_n$ such that $\mu(K) > N$. This proves that $\mu$ is inner regular on $E$.

Corollary 4.139. Every σ-finite Radon measure is regular.

Theorem 4.140. Let $\mu$ be a Radon measure on $X$ and let $E$ be a σ-finite Borel set.

1. For every $\varepsilon > 0$, there is a closed $F \subseteq E$ and an open $U \supseteq E$ such that $\mu(U \setminus F) < \varepsilon$.

2. If $\mu(E) < \infty$ then we can choose $F$ to be compact in (1).

Proof. (2) is clear because we can choose an open set $U \supseteq E$ such that $\mu(U) < \mu(E) + \varepsilon/2$ and a compact set $K \subseteq E$ such that $\mu(K) > \mu(E) - \varepsilon/2$. For (1), we can write $E = \bigcup_{n=1}^\infty E_n$ for some disjoint sequence $E_n$ of sets of finite measure. For each $n$, choose an open set $U_n \supseteq E_n$ with $\mu(U_n) < \mu(E_n) + \varepsilon/2^{n+1}$. Then $U = \bigcup_{n=1}^\infty U_n$ is open and contains $E$, and

\[ \mu(U \setminus E) \le \sum_{n=1}^\infty \mu(U_n \setminus E_n) < \frac{\varepsilon}{2}. \]

Similarly, we can choose an open set $V \supseteq E^c$ such that $\mu(V \setminus E^c) < \varepsilon/2$. Then $F = V^c$ is closed, $F \subseteq E$ and

\[ \mu(U \setminus F) = \mu(U \setminus E) + \mu(E \setminus F) = \mu(U \setminus E) + \mu(V \setminus E^c) < \varepsilon. \]

Theorem 4.141. If every open set in $X$ is σ-compact (e.g. when $X$ is second countable), then every positive Borel measure on $X$ that is finite on compact sets is regular (and is therefore Radon).

Proof. Let $\mu$ be such a measure. By Theorem 4.129, we can define a positive linear functional $\lambda$ on $C_c(X, \mathbb{R})$ by $\lambda f = \int_X f \, d\mu$. By the Riesz representation theorem (Theorem 4.137), there is some Radon measure $\nu$ on $X$ such that $\lambda f = \int_X f \, d\nu$ for all $f \in C_c(X, \mathbb{R})$. Let $U \subseteq X$ be open; since $U$ is σ-compact, we can write $U = \bigcup_{i=1}^\infty K_i$ where each $K_i$ is compact. Define a sequence $f_1, f_2, \ldots$ in $C_c(X, \mathbb{R})$ as follows: for each $n = 1, 2, \ldots$, choose some $f_n \in C_c(X, \mathbb{R})$ such that $f_n = 1$ on $\bigcup_{i=1}^n K_i \cup \bigcup_{i=1}^{n-1} \operatorname{supp}(f_i)$ and $f_n \prec U$. Then $f_1, f_2, \ldots$ increases to $\chi_U$, so Theorem 4.63 shows that

\[ \mu(U) = \lim_{n \to \infty} \int_X f_n \, d\mu = \lim_{n \to \infty} \int_X f_n \, d\nu = \nu(U). \]

Now let $E$ be any Borel set and let $\varepsilon > 0$. By Theorem 4.140, there is a closed $F \subseteq E$ and an open $U \supseteq E$ such that $\nu(U \setminus F) < \varepsilon$. Since $U \setminus F$ is open, we have $\mu(U \setminus F) = \nu(U \setminus F) < \varepsilon$. This implies that $\mu(U) \le \mu(E) + \varepsilon$, which shows that $\mu$ is outer regular on $E$. We also have $\mu(F) \ge \mu(E) - \varepsilon$, and since $F$ is σ-compact (because $X$ is σ-compact and $F$ is closed), we can find a sequence $K_1, K_2, \ldots$ of compact sets with $K_n \subseteq F$ such that $\mu(K_n) \to \mu(F)$ as $n \to \infty$. In particular, if $\mu(E) < \infty$ then there is some compact set $K$ such that $\mu(K) \ge \mu(E) - \varepsilon$, and if $\mu(E) = \infty$ then $\mu(K_n) \to \infty$. This shows that $\mu$ is inner regular on $E$.

Theorem 4.142. Let $\mu$ be a finite positive Borel measure. If $\mu$ is inner regular on all Borel sets, then $\mu$ is regular.

Proof. Let $\varepsilon > 0$ and let $E$ be a Borel set. Choose a compact set $K \subseteq X \setminus E$ such that $\mu(K) > \mu(X \setminus E) - \varepsilon$; then $X \setminus K$ is open and

\[ \mu((X \setminus K) \setminus E) = \mu((X \setminus E) \setminus K) < \varepsilon. \]

Hence $X \setminus K$ is an open set containing $E$ with $\mu(X \setminus K) < \mu(E) + \varepsilon$, so $\mu$ is outer regular on $E$.

Theorem 4.143. If $\mu$ is a Radon measure on $X$ and $1 \le p < \infty$ then $C_c(X, F)$ is dense in $\mathcal{L}^p(X, \mu, F)$.

Proof. Theorem 4.87 shows that $I(\mu)$ is dense in $\mathcal{L}^p(\mu)$, so it suffices to show that the integrable simple maps can be approximated by elements of $C_c(X, F)$. Let $E$ be a Borel set with $\mu(E) < \infty$ and let $\varepsilon > 0$. Theorem 4.140 shows that there is a compact $K \subseteq E$ and an open $U \supseteq E$ such that $\mu(U \setminus K) < \varepsilon$. By Urysohn's lemma, we can choose some $f \in C_c(X, \mathbb{R})$ such that $\chi_K \le f \le \chi_U$, and $\|\chi_E - f\|_p \le \mu(U \setminus K)^{1/p} < \varepsilon^{1/p}$. The result follows from the fact that every integrable simple map is a linear combination of characteristic functions (of sets of finite measure).

Corollary 4.144. If $X$ is second countable, $\mu$ is a Radon measure on $X$, and $E$ is separable, then $L^p(X, \mu, E)$ is separable for $1 \le p < \infty$.


Lower semicontinuous functions

We say that a function $f : X \to (-\infty, \infty]$ is lower semicontinuous (LSC) if $f^{-1}((a, \infty])$ is open for every $a \in \mathbb{R}$. It is clear that every continuous function is LSC, and Theorem 4.33 shows that every LSC function is Borel measurable.

Theorem 4.145 (Properties of LSC functions).

1. If $U \subseteq X$ is an open set then $\chi_U$ is LSC.

2. If $f$ is LSC then $cf$ is LSC for all $c \ge 0$.

3. If $f, g$ are LSC then $f + g$ is LSC.

4. If $\mathcal{F}$ is a collection of LSC functions then $\sup_{f \in \mathcal{F}} f$ is LSC (cf. Theorem 4.34).

5. If $f \ge 0$ is LSC then

\[ f = \sup\{g : g \in C_c(X, \mathbb{R}), \ 0 \le g \le f\}. \]

Proof. (4) follows from the fact that

\[ \left( \sup_{f \in \mathcal{F}} f \right)^{-1}((a, \infty]) = \bigcup_{f \in \mathcal{F}} f^{-1}((a, \infty]) \]

for all $a \in \mathbb{R}$. For (5), it suffices to show that if $f(x) > 0$ and $0 < y < f(x)$ then there exists some $g \in C_c(X, \mathbb{R})$ with $0 \le g \le f$ such that $g(x) \ge y$. Since $U = f^{-1}((y, \infty])$ is open, Urysohn's lemma shows that there is some $h \in C_c(X, \mathbb{R})$ such that $\chi_{\{x\}} \le h \le \chi_U$. Let $g = yh$; then $0 \le g \le y\chi_U \le f$ and $g(x) \ge y$.

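Theorem 4.146 (Monotone convergence theorem for LSC functions). Let $\mu$ be a Radon measure on $X$. Let $\mathcal{F}$ be a directed set of nonnegative LSC functions in $\mathcal{L}^1(\mu, \mathbb{R})$ under the usual relation $\ge$ (so that for any $f, g \in \mathcal{F}$ there exists some $h \in \mathcal{F}$ such that $h \ge f$ and $h \ge g$). If $\left\{ \int_X f \, d\mu : f \in \mathcal{F} \right\}$ is bounded, then $\sup_{f \in \mathcal{F}} f < \infty$ almost everywhere, $\sup_{f \in \mathcal{F}} f \in \mathcal{L}^1(\mu, \mathbb{R})$, and

\[ \int_X \sup_{f \in \mathcal{F}} f \, d\mu = \sup_{f \in \mathcal{F}} \int_X f \, d\mu. \]

Proof. Let $C = \sup\left\{ \int_X f \, d\mu : f \in \mathcal{F} \right\} < \infty$, let $g = \sup_{f \in \mathcal{F}} f$ and let

\[ g_n = 2^{-n} \sum_{k=1}^{2^{2n}} \chi_{U_{n,k}}, \qquad U_{n,k} = g^{-1}\left( (k 2^{-n}, \infty] \right). \]

Then $g_n$ is an increasing sequence of simple maps converging to $g$, because if $x \in X$ and $0 < g(x) < \infty$ then for each $n$ with $2^{-n} < g(x) < 2^n$ there is an integer $m \ge 0$ such that $m 2^{-n} < g(x) \le (m + 1) 2^{-n}$, and

\[ g(x) - 2^{-n} \le 2^{-n} m = 2^{-n} \sum_{k=1}^{2^{2n}} \chi_{U_{n,k}}(x) = g_n(x) < g(x). \]

If $g(x) = 0$ then $g_n(x) = 0$, and if $g(x) = \infty$ then $g_n(x) = 2^n$, so $g_n(x) \to g(x)$ in both cases.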
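The dyadic staircase $g_n$ depends on $g$ only through the value $g(x)$, so it can be tabulated directly. The following is a small sketch with hypothetical names (`staircase`, `samples`) that counts the indices $k$ with $g(x) > k 2^{-n}$, exactly as in the definition of the sets $U_{n,k}$.

```python
import math

def staircase(value, n):
    """g_n at a point where g takes the given value:
    2^-n times the number of k in 1..2^(2n) with value > k * 2^-n."""
    step = 2.0 ** -n
    return step * sum(1 for k in range(1, 4 ** n + 1) if value > k * step)

samples = [0.3, 1.0, 2.75, math.pi]
table = {v: [staircase(v, n) for n in range(2, 7)] for v in samples}
```

For instance, at a point where $g(x) = 1$ and $n = 2$ the count is $3$, so $g_2(x) = 0.75 = g(x) - 2^{-2}$. This boundary case is why the lower bound is not strict.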

Let $n$ be a positive integer. For each $k$, let $0 < M_{n,k} < \mu(U_{n,k})$. Theorem 4.145 shows that $g$ is LSC, so $U_{n,k}$ is open for $k = 1, \dots, 2^{2n}$. Since $\mu$ is inner regular on $U_{n,k}$, there is a compact set $K_k \subseteq U_{n,k}$ such that $\mu(K_k) > M_{n,k}$.

Let $h_n = 2^{-n}\sum_{k=1}^{2^{2n}} \chi_{K_k}$ and $K = \bigcup_{k=1}^{2^{2n}} K_k$. For each $x \in K$ we have $g(x) > g_n(x) \ge h_n(x)$ because $g(x) > 0$ whenever $x \in U_{n,k}$. Choose some $\varphi_x \in \mathcal{F}$ such that $\varphi_x(x) > h_n(x)$. Theorem 4.145 implies that $\varphi_x - h_n$ is LSC, so the set $V_x = \{ y \in X : h_n(y) < \varphi_x(y) \}$ is a neighborhood of $x$. Since $K$ is compact, finitely many $V_{x_1}, \dots, V_{x_m}$ cover $K$, and since $\mathcal{F}$ is a directed set, we can choose some $f_n \in \mathcal{F}$ such that $f_n \ge \varphi_{x_j}$ for $j = 1, \dots, m$. Therefore $f_n \ge h_n$ and

\[ C \ge \int_X f_n\,d\mu \ge \int_X h_n\,d\mu = 2^{-n}\sum_{k=1}^{2^{2n}} \mu(K_k) > 2^{-n}\sum_{k=1}^{2^{2n}} M_{n,k}. \tag{*} \]

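If $\mu(U_{n,k}) = \infty$ for some $n$ and $k$ then taking $M_{n,k} \to \infty$ in (*) produces a contradiction, since $C < \infty$. Therefore $\mu(U_{n,k}) < \infty$ for all $n$ and $k$, which implies that $g_n \in \mathcal{L}^1(\mu,\mathbb{R})$. Taking $M_{n,k} \to \mu(U_{n,k})$ in (*) shows that

\[ C \ge \int_X f_n\,d\mu \ge 2^{-n}\sum_{k=1}^{2^{2n}} \mu(U_{n,k}) = \int_X g_n\,d\mu. \tag{**} \]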

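By Theorem 4.63, $g < \infty$ almost everywhere, $g \in \mathcal{L}^1(\mu,\mathbb{R})$, and $g_n \to g$ in $L^1$.

It is clear that $C \le \int_X g\,d\mu$. Let $\varepsilon > 0$ and choose some $n$ such that $\int_X g_n\,d\mu > \int_X g\,d\mu - \varepsilon$. By (**) we have $C \ge \int_X g\,d\mu - \varepsilon$, so taking $\varepsilon \to 0$ shows that

\[ \sup_{f\in\mathcal{F}} \int_X f\,d\mu = C = \int_X g\,d\mu = \int_X \sup_{f\in\mathcal{F}} f\,d\mu. \]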

Vector-valued Radon measures

Let $F$ be a Banach space over $\mathbb{K}$. We say that an $F$-valued measure $\mu$ is Radon if $|\mu|$ is a Radon measure. Let $M_R(F) = M_R(X,F) \subseteq M(\mathcal{B}_X,F)$ be the space of $F$-valued Radon measures of bounded variation on $(X,\mathcal{B}_X)$.

Theorem 4.147. $M_R(F)$ is a vector subspace of $M(\mathcal{B}_X,F)$.

Proof. Since $|r\mu| = |r|\,|\mu|$, it is clear that $r\mu$ is a Radon measure whenever $r \in \mathbb{K}$ and $\mu$ is a Radon measure. Let $\mu, \nu$ be Radon measures. It is clear that $|\mu+\nu|$ is finite on compact sets. Let $E$ be a Borel set and let $\varepsilon > 0$; we want to show that there is an open set $U \supseteq E$ such that $|\mu+\nu|(U \setminus E) < \varepsilon$. Choose open sets $U, V \supseteq E$ such that $|\mu|(U \setminus E) < \varepsilon/2$ and $|\nu|(V \setminus E) < \varepsilon/2$. Then $U \cap V \supseteq E$ is open and

\[ |\mu+\nu|((U \cap V) \setminus E) \le |\mu|(U \setminus E) + |\nu|(V \setminus E) < \varepsilon. \]

This proves that $|\mu+\nu|$ is outer regular on all Borel sets. Let $U$ be an open set and let $\varepsilon > 0$; we want to show that there is a compact set $E \subseteq U$ such that $|\mu+\nu|(U \setminus E) < \varepsilon$. Choose compact sets $E, F \subseteq U$ such that $|\mu|(U \setminus E) < \varepsilon/2$ and $|\nu|(U \setminus F) < \varepsilon/2$. Then $E \cup F$ is compact and

\[ |\mu+\nu|(U \setminus (E \cup F)) \le |\mu|(U \setminus E) + |\nu|(U \setminus F) < \varepsilon. \]

This proves that $|\mu+\nu|$ is inner regular on all open sets. Therefore $\mu+\nu$ is Radon.

Theorem 4.148. Let $F, G$ be Banach spaces and let $f : F \to G$ be a continuous linear map. If $\mu$ is an $F$-valued Radon measure then $f \circ \mu$ is a $G$-valued Radon measure (cf. Theorem 4.14).

Proof. We have $|f \circ \mu| \le |f|\,|\mu|$ due to Theorem 4.14, so it is clear that $|f \circ \mu|$ is finite on compact sets. We can assume $f \ne 0$, for otherwise $f \circ \mu = 0$ is Radon. Let $E$ be a Borel set and let $\varepsilon > 0$. Choose an open set $U \supseteq E$ such that $|\mu|(U \setminus E) < \varepsilon/|f|$. Then

\[ |f \circ \mu|(U \setminus E) \le |f|\,|\mu|(U \setminus E) < \varepsilon, \]

so $|f \circ \mu|$ is outer regular on all Borel sets. A similar argument shows that $|f \circ \mu|$ is inner regular on all open sets.

Now let $E, G$ be Banach spaces and assume that we are given a compatible product $\cdot : E \times F \to G$. Recall from Theorem 4.103 that the map taking a function $f \in L^1(X,|\mu|,E)$ to its indefinite integral $\mu_f \in M(\mathcal{B}_X,G)$ is continuous and linear. The following theorem shows that if $\mu$ is Radon, then the image of this map is contained in $M_R(G)$.

Theorem 4.149. If $\mu$ is a positive or $F$-valued Radon measure, then for all $f \in \mathcal{L}^1(|\mu|,E)$ the indefinite integral $\mu_f$ is Radon.

Proof. It is clear that $|\mu_f|$ is finite on compact sets. Let $\varepsilon > 0$ and let $E$ be a Borel set. By Theorem 4.102, there is some $\delta > 0$ such that for all Borel sets $F$ we have $|\mu_f|(F) < \varepsilon$ whenever $|\mu|(F) < \delta$. Since $|\mu|$ is Radon, we can choose an open set $U \supseteq E$ such that $|\mu|(U \setminus E) < \delta$. Then $|\mu_f|(U \setminus E) < \varepsilon$. This shows that $|\mu_f|$ is outer regular on all Borel sets. The proof that $|\mu_f|$ is inner regular on open sets is similar.

We will be using the following result in Section 5.9. The proof is due to [15].

Theorem 4.150. Let $X, Y$ be LCH spaces and let $f : X \to Y$ be a continuous map. If $\mu \in M_R(X,F)$ then $f_*\mu \in M_R(Y,F)$.

Proof. Theorem 4.26 shows that $|f_*\mu| \le f_*|\mu|$. Let $\varepsilon > 0$ and let $E \subseteq Y$ be a Borel set. Since $|\mu|$ is Radon, we can choose a compact set $F \subseteq f^{-1}(E)$ such that $|\mu|(f^{-1}(E) \setminus F) < \varepsilon$. Then $f(F)$ is compact, $f(F) \subseteq E$, and

\[
\begin{aligned}
|f_*\mu|(E \setminus f(F)) &\le (f_*|\mu|)(E \setminus f(F)) = |\mu|(f^{-1}(E \setminus f(F))) \\
&= |\mu|(f^{-1}(E) \setminus f^{-1}(f(F))) \le |\mu|(f^{-1}(E) \setminus F) \\
&< \varepsilon.
\end{aligned}
\]

This shows that $|f_*\mu|$ is inner regular on all Borel sets, and the result follows from Theorem 4.142.

Duality

Theorem 4.137 characterizes the positive linear functionals on $C_c(X,\mathbb{R})$. Here we will characterize the (continuous) linear functionals on $C_0(X,F)$. Note that if $\mu$ is a vector-valued Radon measure of bounded variation on $X$ then $C_0(X,F) \subseteq \mathcal{L}^p(X,|\mu|,F)$ for every $1 \le p \le \infty$ due to Theorem 4.129.

Theorem 4.151 (Riesz representation theorem for vector-valued measures). Let $F$ be a Hilbert space over $\mathbb{K}$ and let $\lambda : C_0(X,F) \to \mathbb{K}$ be a (continuous) linear functional. Then there exists a unique $\mu \in M_R(F)$ such that

\[ \lambda f = \int_X f\,d\mu \]

for all $f \in C_0(X,F)$, where the compatible product is the inner product on $F$. Furthermore,

\[ |\lambda| = \|\mu\| = |\mu|(X). \]

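Proof. To prove uniqueness, suppose that $\int_X f\,d\mu = 0$ for all $f \in C_0(X,F)$. Use Theorem 4.100 to write $d\mu = f_\mu\,d|\mu|$ for some $f_\mu \in \mathcal{L}^2(|\mu|,F)$ with $|f_\mu| = 1$ on $X$. Let $\varepsilon > 0$. Theorem 4.143 shows that $C_c(X,F)$ is dense in $\mathcal{L}^1(|\mu|,F)$, so we can choose some $f \in C_c(X,F)$ such that $\|f - f_\mu\|_1 < \varepsilon$. Then using Theorem 4.104,

\[
\begin{aligned}
|\mu|(X) &= |\mu|(X) - \int_X f\,d\mu = \int_X |f_\mu|^2\,d|\mu| - \int_X \langle f, f_\mu\rangle\,d|\mu| \\
&= \int_X \langle f_\mu - f, f_\mu\rangle\,d|\mu| \le \int_X |f_\mu - f|\,d|\mu| \\
&< \varepsilon,
\end{aligned}
\]

so taking $\varepsilon \to 0$ shows that $|\mu|(X) = 0$.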

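To prove existence, we can assume that $|\lambda| = 1$. First, we find a positive linear functional $\varphi : C_c(X,\mathbb{R}) \to \mathbb{R}$ such that

\[ |\lambda f| \le \varphi|f| \le \|f\|_\infty \]

for all $f \in C_c(X,F)$. Let $C_c^+(X,\mathbb{R})$ be the set of all $f \in C_c(X,\mathbb{R})$ such that $f \ge 0$, and define

\[ \varphi f = \sup\{ |\lambda h| : h \in C_c(X,F),\ |h| \le f \} \]

for $f \in C_c^+(X,\mathbb{R})$ so that $f \le g$ implies $\varphi f \le \varphi g$ and $\varphi(cf) = c\varphi f$ for nonnegative $c$. Let $f, g \in C_c^+(X,\mathbb{R})$; we want to show that

\[ \varphi(f+g) = \varphi f + \varphi g. \]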

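Let $\varepsilon > 0$ and choose $h, k \in C_c(X,F)$ such that $|h| \le f$, $|k| \le g$, $\varphi f \le |\lambda h| + \varepsilon$ and $\varphi g \le |\lambda k| + \varepsilon$. We can assume $\lambda h \ne 0$ and $\lambda k \ne 0$. Then

\[
\begin{aligned}
\varphi f + \varphi g &\le |\lambda h| + |\lambda k| + 2\varepsilon \\
&= \lambda\Bigl( \frac{|\lambda h|}{\lambda h}\,h + \frac{|\lambda k|}{\lambda k}\,k \Bigr) + 2\varepsilon \\
&\le \varphi(|h| + |k|) + 2\varepsilon \\
&\le \varphi(f+g) + 2\varepsilon,
\end{aligned}
\]

and taking $\varepsilon \to 0$ shows that $\varphi f + \varphi g \le \varphi(f+g)$. For the reverse inequality, let $h \in C_c(X,F)$ such that $|h| \le f+g$, let $U = \{ x \in X : f(x) + g(x) > 0 \}$, and define

\[ r = \frac{f}{f+g}\,h \ \text{ and } \ s = \frac{g}{f+g}\,h \ \text{ on } U, \qquad r = 0 \ \text{ and } \ s = 0 \ \text{ on } X \setminus U \]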

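so that $h = r + s$, $|r| \le |f|$, $|r| \le |h|$, $|s| \le |g|$ and $|s| \le |h|$. Then $r, s \in C_c(X,F)$, so

\[ |\lambda h| = |\lambda r + \lambda s| \le |\lambda r| + |\lambda s| \le \varphi f + \varphi g. \]

This shows that $\varphi(f+g) \le \varphi f + \varphi g$. Finally, for $f \in C_c(X,\mathbb{R})$ we can write $f = f^+ - f^-$ where

\[ f^+ = \frac{|f| + f}{2} \quad \text{and} \quad f^- = \frac{|f| - f}{2}, \]

and define $\varphi f = \varphi f^+ - \varphi f^-$ since $f^+, f^- \in C_c^+(X,\mathbb{R})$. Then $\varphi$ is a positive linear functional on $C_c(X,\mathbb{R})$.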

By Theorem 4.137, there is a positive Radon measure $\mu$ on $X$ such that

\[ \varphi f = \int_X f\,d\mu \]

for all $f \in C_c(X,\mathbb{R})$ and

\[ \mu(X) = \sup\{ \varphi f : f \in C_c(X,\mathbb{R}),\ 0 \le f \le 1 \}. \]

Since $0 \le f \le 1$ implies $\varphi f \le \|f\|_\infty \le 1$, we have $\mu(X) \le 1$. Also,

\[ |\lambda f| \le \varphi|f| = \int_X |f|\,d\mu = \|f\|_1 \]

for all $f \in C_c(X,F)$, and Theorem 4.143 shows that $C_c(X,F)$ is dense in $\mathcal{L}^1(X,\mu,F)$ with respect to the $L^1$-norm. Therefore Theorem 1.45 shows that there is an extension of $\lambda|_{C_c(X,F)}$ to a (continuous) linear functional $\overline{\lambda} : \mathcal{L}^1(X,\mu,F) \to \mathbb{K}$ such that $|\overline{\lambda}| \le 1$. By Theorem 4.113, there is some $g \in L^\infty(X,\mu,F)$ with $\|g\|_\infty \le 1$ such that

\[ \overline{\lambda} f = \int_X \langle f, g\rangle\,d\mu \]

for all $f \in \mathcal{L}^1(X,\mu,F)$. Theorem 4.149 shows that $\mu_g$ is Radon, and Theorem 4.104 shows that $f \in \mathcal{L}^1(|\mu_g|)$ and

\[ \overline{\lambda} f = \int_X f\,d\mu_g \]

whenever $f \in \mathcal{L}^1(X,\mu,F)$. In particular,

\[ \lambda f = \overline{\lambda} f = \int_X f\,d\mu_g \]

for all $f \in C_c(X,F)$. Both sides are continuous functions on $C_0(X,F)$, so the equality holds for all $f \in C_0(X,F)$ since $C_c(X,F)$ is dense in $C_0(X,F)$ under the supremum norm (see Theorem 4.128). Therefore $\mu_g \in M_R(F)$ is the desired measure.

Finally, we show that $\|\mu_g\| = |\mu_g|(X) = 1 = |\lambda|$. By Theorem 4.104,

\[ |\mu_g|(X) = \int_X |g|\,d\mu \le \mu(X) \le 1 \]

since $\|g\|_\infty \le 1$ and $\mu(X) \le 1$. For the reverse inequality, we have

\[ |\lambda f| = \Bigl| \int_X f\,d\mu_g \Bigr| \le \int_X |f|\,d|\mu_g| \le \|f\|_\infty\,|\mu_g|(X) \]

for all $f \in C_0(X,F)$, so $1 = |\lambda| \le |\mu_g|(X)$.

Corollary 4.152. $M_R(F)$ is isometrically isomorphic to $C_0(X,F)^*$. If $X$ is compact, then $M_R(F)$ is isometrically isomorphic to $C(X,F)^*$.

Corollary 4.153. $M_R(F)$ is complete, i.e. a Banach space.

Proof. $C_0(X,F)^*$ is complete, so Theorem 1.35 implies that $M_R(F)$ is also complete.

Radon products

Theorem 4.154 (Products of Radon measures). Let $X$ and $Y$ be LCH spaces. Let $\mu$ and $\nu$ be σ-finite Radon measures on $X$ and $Y$ respectively.

1. $\mu \otimes \nu$ is finite on compact sets.

2. If $X \times Y$ is second countable, then $\mu \otimes \nu$ is a Radon measure on $X \times Y$.

Proof. Let $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$ be the canonical projections. If $K \subseteq X \times Y$ is compact then $\pi_X(K)$ and $\pi_Y(K)$ are compact, so $\mu(\pi_X(K)) < \infty$ and $\nu(\pi_Y(K)) < \infty$. By definition, $(\mu \otimes \nu)(\pi_X(K) \times \pi_Y(K)) = \mu(\pi_X(K))\,\nu(\pi_Y(K))$. Since $K \subseteq \pi_X(K) \times \pi_Y(K)$, we have

\[ (\mu \otimes \nu)(K) \le \mu(\pi_X(K))\,\nu(\pi_Y(K)) < \infty. \]

This proves (1), and (2) follows from Theorem 4.141.

Theorem 4.6 shows that $\mathcal{B}_X \otimes \mathcal{B}_Y \subseteq \mathcal{B}_{X \times Y}$, with equality when $X$ and $Y$ are second countable. In general, equality does not hold when $X$ or $Y$ is not second countable. It is clear that any $f \in C_c(X \times Y)$ is $\mathcal{B}_{X \times Y}$-measurable, but it is an important fact that $f$ is always $(\mathcal{B}_X \otimes \mathcal{B}_Y)$-measurable as well. We will use this fact to define a Radon measure on $(X \times Y, \mathcal{B}_{X \times Y})$ without assuming second countability. This will allow us to prove a version of Fubini's theorem for functions that are $\mathcal{B}_{X \times Y}$-measurable but not necessarily $(\mathcal{B}_X \otimes \mathcal{B}_Y)$-measurable, and will also allow us to construct Radon measures on infinite products of compact spaces.

If $f \in C_c(X)$ and $g \in C_c(Y)$, the function $f \otimes g \in C_c(X \times Y)$ is defined by $(f \otimes g)(x,y) = f(x)g(y)$.

Theorem 4.155. Let $F$ be a normed vector space and let $S$ be the subspace of $C_c(X \times Y, F)$ spanned by functions of the form $v(g \otimes h)$ for $v \in F$, $g \in C_c(X,\mathbb{R})$ and $h \in C_c(Y,\mathbb{R})$. Then $S$ is dense in $C_c(X \times Y, F)$.

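Proof. Let $f \in C_c(X \times Y, F)$ and $\varepsilon > 0$. Let $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$ be the canonical projections and choose relatively compact open sets $U \subseteq X$ and $V \subseteq Y$ such that $\pi_X(\operatorname{supp}(f)) \subseteq U$ and $\pi_Y(\operatorname{supp}(f)) \subseteq V$. Since $\overline{U} \times \overline{V}$ is a compact Hausdorff space, Theorem 4.135 shows that the span of

\[ \{ v(g \otimes h) : v \in F,\ g \in C(\overline{U},\mathbb{R}),\ h \in C(\overline{V},\mathbb{R}) \} \]

is dense in $C(\overline{U} \times \overline{V}, F)$. Hence there is some $h$ in this span such that $\sup_{(x,y) \in \overline{U} \times \overline{V}} |h(x,y) - f(x,y)| < \varepsilon$. By Urysohn's lemma, we can choose $\varphi \in C_c(U,\mathbb{R})$ and $\psi \in C_c(V,\mathbb{R})$ such that $0 \le \varphi \le 1$ on $U$, $0 \le \psi \le 1$ on $V$, $\varphi = 1$ on $\pi_X(\operatorname{supp}(f))$ and $\psi = 1$ on $\pi_Y(\operatorname{supp}(f))$. Define $g \in C_c(X \times Y, F)$ by setting $g = (\varphi \otimes \psi)h$ on $U \times V$ and $g = 0$ on $(X \times Y) \setminus (U \times V)$; then $g \in S$ and $\|f - g\|_\infty < \varepsilon$.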

Corollary 4.156. Every function in $C_0(X \times Y, F)$ is $(\mathcal{B}_X \otimes \mathcal{B}_Y)$-measurable.

Proof. This follows from Theorem 4.155 and the fact that $C_c(X \times Y, F)$ is dense in $C_0(X \times Y, F)$.

Theorem 4.157. If $\mu$ and $\nu$ are σ-finite Radon measures on $X$ and $Y$ then every element of $C_0(X \times Y, F)$ is $(\mu \otimes \nu)$-measurable, and $C_0(X \times Y, F) \subseteq \mathcal{L}^\infty(X \times Y, \mu \otimes \nu, F)$. Furthermore:

1. $C_c(X \times Y, F) \subseteq \mathcal{L}^p(X \times Y, \mu \otimes \nu, F)$ for every $1 \le p \le \infty$.

2. If $(\mu \otimes \nu)(X \times Y) < \infty$ then $C_0(X \times Y, F) \subseteq \mathcal{L}^p(X \times Y, \mu \otimes \nu, F)$ for every $1 \le p \le \infty$.

Proof. As in Theorem 4.129, except that $f \in C_0(X \times Y, F)$ is $(\mathcal{B}_X \otimes \mathcal{B}_Y)$-measurable due to Corollary 4.156 and $\mu \otimes \nu$ is finite on compact sets due to Theorem 4.154.

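If $\mu$ and $\nu$ are σ-finite, the preceding result implies that the map

\[ f \mapsto \int_{X \times Y} f\,d(\mu \otimes \nu) \]

is a positive linear functional on $C_c(X \times Y,\mathbb{R})$, and Theorem 4.137 shows that there is a corresponding Radon measure $\mu \mathbin{\overline{\otimes}} \nu$ on $X \times Y$, called the Radon product of $\mu$ and $\nu$. That is,

\[ \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu) = \int_{X \times Y} f\,d(\mu \otimes \nu) \]

for all $f \in C_c(X \times Y,\mathbb{R})$. We will show that the Radon product agrees with the usual product measure on $\mathcal{B}_X \otimes \mathcal{B}_Y$.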

Theorem 4.158. Let $X, Y$ be topological spaces and let $Z$ be a measurable space. Using the notation in 4.114:

1. If $E \in \mathcal{B}_{X \times Y}$, then $E_x \in \mathcal{B}_Y$ for every $x \in X$ and $E^y \in \mathcal{B}_X$ for every $y \in Y$.

2. If $f : X \times Y \to Z$ is Borel measurable, then $f_x$ is Borel measurable for every $x \in X$ and $f^y$ is Borel measurable for every $y \in Y$.

Proof. As in Theorem 4.114, except that $S$, the collection of all $E \in \mathcal{B}_{X \times Y}$ such that $E_x \in \mathcal{B}_Y$ for all $x \in X$, contains all open sets because $E_x = \iota_x^{-1}(E)$ is open whenever $E$ is open and $x \in X$, where $\iota_x : Y \to X \times Y$ is the continuous function defined by $\iota_x(y) = (x,y)$.

Lemma 4.159. Let $\mu$ and $\nu$ be Radon measures on $X$ and $Y$, and let $f \in C_c(X \times Y, F)$. Then the functions

\[ x \mapsto \int_Y f_x\,d\nu \quad \text{and} \quad y \mapsto \int_X f^y\,d\mu \]

are continuous.

Proof. Let $\pi_Y : X \times Y \to Y$ be the canonical projection and let $K = \pi_Y(\operatorname{supp}(f))$. Let $x \in X$ and $\varepsilon > 0$. For each $y \in K$, we can choose neighborhoods $U_y$ of $x$ and $V_y$ of $y$ such that $|f(x,y) - f(x',y')| < \varepsilon$ for all $(x',y') \in U_y \times V_y$. Since $K$ is compact, there are finitely many $V_{y_1}, \dots, V_{y_n}$ that cover $K$. Let $U = U_{y_1} \cap \cdots \cap U_{y_n}$, which is a neighborhood of $x$. If $x' \in U$ and $z \in K$ then $z \in V_{y_i}$ for some $i$, so $(x',z) \in U_{y_i} \times V_{y_i}$ and

\[ |f(x,z) - f(x',z)| \le |f(x,z) - f(x,y_i)| + |f(x,y_i) - f(x',z)| < 2\varepsilon. \]

This shows that $\|f_x - f_{x'}\|_\infty < 2\varepsilon$ and therefore

\[ \Bigl| \int_Y f_x\,d\nu - \int_Y f_{x'}\,d\nu \Bigr| \le 2\varepsilon\,\nu(K) \]

for all $x' \in U$.

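Theorem 4.160. Let $\mu$ and $\nu$ be σ-finite Radon measures on $X$ and $Y$.

1. If $U \subseteq X \times Y$ is an open set with $(\mu \mathbin{\overline{\otimes}} \nu)(U) < \infty$ then the function $x \mapsto \nu(U_x)$ is LSC and is in $\mathcal{L}^1(\mu,\mathbb{R})$, and

\[ (\mu \mathbin{\overline{\otimes}} \nu)(U) = \int_{x \in X} \nu(U_x)\,d\mu. \]

2. If $U \subseteq X$ and $V \subseteq Y$ are open sets of finite measure, then

\[ (\mu \mathbin{\overline{\otimes}} \nu)(U \times V) = (\mu \otimes \nu)(U \times V) = \mu(U)\,\nu(V). \]

3. $\mu \mathbin{\overline{\otimes}} \nu$ is σ-finite.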

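Proof. Let $\mathcal{F} = \{ f \in C_c(X \times Y,\mathbb{R}) : 0 \le f \le \chi_U \}$. By Theorem 4.145 we have $\chi_U = \sup_{f\in\mathcal{F}} f$ and therefore $\chi_{U_x} = \sup_{f\in\mathcal{F}} f_x$. Theorem 4.146 shows that

\[ (\mu \mathbin{\overline{\otimes}} \nu)(U) = \int_{X \times Y} \chi_U\,d(\mu \mathbin{\overline{\otimes}} \nu) = \sup_{f\in\mathcal{F}} \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu) \]

and

\[ \nu(U_x) = \int_Y \chi_{U_x}\,d\nu = \sup_{f\in\mathcal{F}} \int_Y f_x\,d\nu. \]

Using Theorem 4.157 and Fubini's theorem, we have

\[ \int_{x \in X} \int_Y f_x\,d\nu\,d\mu = \int_{X \times Y} f\,d(\mu \otimes \nu) = \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu) \]

for all $f \in \mathcal{F}$, noting that $\int_{X \times Y} f\,d(\mu \otimes \nu) = \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu)$ from the definition of $\mu \mathbin{\overline{\otimes}} \nu$. Now

\[ \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu) \le (\mu \mathbin{\overline{\otimes}} \nu)(\operatorname{supp}(f)) \le (\mu \mathbin{\overline{\otimes}} \nu)(U), \]

so

\[ \sup_{f\in\mathcal{F}} \int_{x \in X} \int_Y f_x\,d\nu\,d\mu \le (\mu \mathbin{\overline{\otimes}} \nu)(U) < \infty. \]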

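By Lemma 4.159 and Theorem 4.146, $x \mapsto \nu(U_x)$ is LSC and is in $\mathcal{L}^1(\mu,\mathbb{R})$, and

\[
\begin{aligned}
\int_{x \in X} \nu(U_x)\,d\mu &= \int_{x \in X} \sup_{f\in\mathcal{F}} \int_Y f_x\,d\nu\,d\mu = \sup_{f\in\mathcal{F}} \int_{x \in X} \int_Y f_x\,d\nu\,d\mu \\
&= \sup_{f\in\mathcal{F}} \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu) = (\mu \mathbin{\overline{\otimes}} \nu)(U).
\end{aligned}
\]

This proves (1).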

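For (2), note that in the above (with $U \times V$ in place of $U$) we have

\[ \int_{x \in X} \int_Y f_x\,d\nu\,d\mu = \int_{X \times Y} f\,d(\mu \otimes \nu) \le (\mu \otimes \nu)(U \times V) \]

for all $f \in \mathcal{F}$, so

\[ \sup_{f\in\mathcal{F}} \int_{x \in X} \int_Y f_x\,d\nu\,d\mu \le (\mu \otimes \nu)(U \times V) < \infty. \]

Therefore $x \mapsto \nu((U \times V)_x)$ is in $\mathcal{L}^1(\mu,\mathbb{R})$ and

\[
\begin{aligned}
(\mu \mathbin{\overline{\otimes}} \nu)(U \times V) &= \int_{x \in X} \nu((U \times V)_x)\,d\mu = \int_{X \times Y} \chi_{U \times V}\,d(\mu \otimes \nu) \\
&= (\mu \otimes \nu)(U \times V)
\end{aligned}
\]

using Fubini's theorem.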

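For (3), write $X = \bigcup_{m=1}^\infty U_m$ and $Y = \bigcup_{n=1}^\infty V_n$ where every $U_m \subseteq X$ and $V_n \subseteq Y$ is an open set of finite measure. Then $\mu \mathbin{\overline{\otimes}} \nu$ is σ-finite because $X \times Y = \bigcup_{m,n \ge 1} U_m \times V_n$ and $(\mu \mathbin{\overline{\otimes}} \nu)(U_m \times V_n) = \mu(U_m)\,\nu(V_n) < \infty$ from (2).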

Fubini’s theorem

In order to prove Fubini's theorem for (σ-finite) Radon products, we will construct a ring in $X \times Y$ consisting of well-understood sets and approximate functions in $\mathcal{L}^1(\mu \mathbin{\overline{\otimes}} \nu, F)$ by maps that are simple with respect to this ring. This was the approach used in Section 4.6, where the ring consisted of finite disjoint unions of rectangles. Let $\mathcal{F}(X \times Y)$ be the set of all finite disjoint unions of sets of the form $U \setminus V$ where $U, V$ are open sets of finite $(\mu \mathbin{\overline{\otimes}} \nu)$-measure in $X \times Y$. We have the identities

\[ (U_1 \setminus V_1) \cap (U_2 \setminus V_2) = (U_1 \cap U_2) \setminus (V_1 \cup V_2) \]

and

\[ (U_1 \setminus V_1) \setminus (U_2 \setminus V_2) = ((U_1 \cap V_2) \setminus V_1) \cup (U_1 \setminus (V_1 \cup U_2)), \]

which show that $P \cap Q$ and $P \setminus Q$ are members of $\mathcal{F}(X \times Y)$ whenever $P, Q \in \mathcal{F}(X \times Y)$. Since $P \cup Q = (P \setminus Q) \cup Q$, this shows that $\mathcal{F}(X \times Y)$ is a ring in $X \times Y$.
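The two set identities are purely Boolean, so they can be sanity-checked by brute force on a small finite universe. The sketch below is only such a check in Python, with finite sets standing in for the open sets; the names `subsets` and `identities_hold` are introduced here for illustration.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def identities_hold(u1, v1, u2, v2):
    """Check both identities for one choice of the four sets."""
    inter_ok = ((u1 - v1) & (u2 - v2)) == ((u1 & u2) - (v1 | v2))
    diff_ok = ((u1 - v1) - (u2 - v2)) == (((u1 & v2) - v1) | (u1 - (v1 | u2)))
    return inter_ok and diff_ok

all_sets = subsets(range(3))  # 8 subsets, so 8**4 = 4096 combinations
all_ok = all(
    identities_hold(a, b, c, d)
    for a in all_sets for b in all_sets for c in all_sets for d in all_sets
)
print(all_ok)
```

Because each identity is checked pointwise, a universe in which every membership pattern of a point in the four sets occurs is enough; three points already cover all sixteen patterns.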

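Since $\mu$ and $\nu$ are σ-finite, Theorem 4.160 implies that $X \times Y$ is σ-finite with respect to $\mathcal{F}(X \times Y)$, and it is easy to see that $\mathcal{F}(X \times Y)$ generates $\mathcal{B}_{X \times Y}$. Let $f : X \times Y \to F$ be simple with respect to $\mathcal{F}(X \times Y)$ and write $f = \sum_{i=1}^n f(U_i \setminus V_i)\,\chi_{U_i \setminus V_i}$ for some disjoint sets $U_i \setminus V_i$ where $U_i, V_i$ are open sets of finite $(\mu \mathbin{\overline{\otimes}} \nu)$-measure in $X \times Y$. For each $x \in X$ we can integrate the $x$-section $f_x$, giving a map

\[
\begin{aligned}
x \mapsto \int_Y f_x\,d\nu &= \sum_{i=1}^n f(U_i \setminus V_i)\,\nu((U_i \setminus V_i)_x) \\
&= \sum_{i=1}^n f(U_i \setminus V_i)\,\nu((U_i)_x \setminus (U_i \cap V_i)_x) \\
&= \sum_{i=1}^n f(U_i \setminus V_i)\,[\nu((U_i)_x) - \nu((U_i \cap V_i)_x)].
\end{aligned}
\]

Theorem 4.160 shows that $x \mapsto \nu((U_i)_x)$ and $x \mapsto \nu((U_i \cap V_i)_x)$ are in $\mathcal{L}^1(\mu,\mathbb{R})$, so this map is in $\mathcal{L}^1(\mu,F)$ and we can compute the iterated integral

\[
\begin{aligned}
\int_{x \in X} \int_Y f_x\,d\nu\,d\mu &= \sum_{i=1}^n f(U_i \setminus V_i) \Bigl( \int_{x \in X} \nu((U_i)_x)\,d\mu - \int_{x \in X} \nu((U_i \cap V_i)_x)\,d\mu \Bigr) \\
&= \sum_{i=1}^n f(U_i \setminus V_i)\,[(\mu \mathbin{\overline{\otimes}} \nu)(U_i) - (\mu \mathbin{\overline{\otimes}} \nu)(U_i \cap V_i)] \\
&= \sum_{i=1}^n f(U_i \setminus V_i)\,(\mu \mathbin{\overline{\otimes}} \nu)(U_i \setminus V_i).
\end{aligned}
\]

But $f \in I(\mu \mathbin{\overline{\otimes}} \nu)$, so

\[ \int_{x \in X} \int_Y f_x\,d\nu\,d\mu = \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu) \]

by definition.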

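Theorem 4.161. Let $\mu$ and $\nu$ be σ-finite Radon measures on $X$ and $Y$. If $Z$ is a set of $(\mu \mathbin{\overline{\otimes}} \nu)$-measure zero in $X \times Y$, then $\nu(Z_x) = 0$ for almost all $x \in X$ (cf. Theorem 4.117).

Proof. For each $n \ge 1$, let $S_n = \{ x \in X : \nu(Z_x) \ge 1/n \}$. It suffices to show that $\bigcup_{n=1}^\infty S_n$ is contained in a set of measure zero. Let $\varepsilon > 0$. By Corollary 4.23, there is a sequence of sets $U_k \setminus V_k$, where $U_k, V_k$ are open sets of finite $(\mu \mathbin{\overline{\otimes}} \nu)$-measure in $X \times Y$, such that

\[ Z_x \subseteq \bigcup_{k=1}^\infty (U_k \setminus V_k)_x \quad \text{and} \quad \sum_{k=1}^\infty (\mu \mathbin{\overline{\otimes}} \nu)(U_k \setminus V_k) < \frac{\varepsilon}{n2^n}. \]

Let $T_n$ be the set of all $x$ for which

\[ \frac{1}{n} \le \sum_{k=1}^\infty \nu((U_k \setminus V_k)_x) = \sum_{k=1}^\infty [\nu((U_k)_x) - \nu((U_k \cap V_k)_x)]; \]

then $T_n$ is measurable due to Theorem 4.160, and $S_n \subseteq T_n$. The partial sums on the right are integrable with respect to $x$, so by Theorem 4.63 we have

\[
\begin{aligned}
\frac{1}{n}\,\mu(T_n) &\le \sum_{k=1}^\infty \Bigl( \int_{x \in X} \nu((U_k)_x)\,d\mu - \int_{x \in X} \nu((U_k \cap V_k)_x)\,d\mu \Bigr) \\
&= \sum_{k=1}^\infty (\mu \mathbin{\overline{\otimes}} \nu)(U_k \setminus V_k) \\
&< \frac{\varepsilon}{n2^n}.
\end{aligned}
\]

This shows that $\mu(T_n) < \varepsilon 2^{-n}$, so $\mu\bigl(\bigcup_{n=1}^\infty T_n\bigr) \le \varepsilon$. But $\varepsilon$ was arbitrary, so $\mu\bigl(\bigcup_{n=1}^\infty T_n\bigr) = 0$.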

Corollary 4.162. Let $\mu$ and $\nu$ be σ-finite Radon measures on $X$ and $Y$, and let $f, g : X \times Y \to F$ be $(\mu \mathbin{\overline{\otimes}} \nu)$-measurable maps.

1. The map $f_x$ is $\nu$-measurable for $\mu$-almost all $x \in X$.

2. If $f = g$ $(\mu \mathbin{\overline{\otimes}} \nu)$-almost everywhere, then $f_x = g_x$ $\nu$-almost everywhere for $\mu$-almost all $x \in X$.

Proof. As in Corollary 4.118, except that Theorem 4.161 replaces Theorem 4.117 and Theorem 4.158 replaces Theorem 4.114.

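Theorem 4.163 (Fubini's theorem for Radon products). Let $\mu$ and $\nu$ be σ-finite Radon measures on $X$ and $Y$, and let $f : X \times Y \to F$.

1. If $f \in \mathcal{L}^1(\mu \mathbin{\overline{\otimes}} \nu, F)$ then for almost all $x \in X$, the map $f_x$ is in $\mathcal{L}^1(\nu,F)$, the map given by

\[ x \mapsto \int_Y f_x\,d\nu \]

for almost all $x$ (and defined arbitrarily for other $x$) is in $\mathcal{L}^1(\mu,F)$, and

\[ \int_{X \times Y} f\,d(\mu \mathbin{\overline{\otimes}} \nu) = \int_{x \in X} \int_Y f_x\,d\nu\,d\mu. \]

2. If $f$ is $(\mu \mathbin{\overline{\otimes}} \nu)$-measurable, $f_x \in \mathcal{L}^1(\nu,F)$ for almost all $x \in X$, and the map given by

\[ x \mapsto \int_Y |f_x|\,d\nu \]

(for almost all $x \in X$) is in $\mathcal{L}^1(\mu,\mathbb{R})$, then $f \in \mathcal{L}^1(\mu \mathbin{\overline{\otimes}} \nu, F)$ and (1) applies.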

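Proof. As in Theorem 4.119, except that: $\varphi_n$ is a sequence in $I_{\mathcal{F}(X \times Y)}(\mu \mathbin{\overline{\otimes}} \nu)$; Theorem 4.161 replaces Theorem 4.117; each $\Phi_n$ is in $\mathcal{L}^1(\mu,\mathcal{L}^1(\nu))$; each $\Psi_n$ is in $\mathcal{L}^1(\mu)$;

\[ \int_{x \in X} \int_Y (\varphi_n)_x\,d\nu\,d\mu = \int_{X \times Y} \varphi_n\,d(\mu \mathbin{\overline{\otimes}} \nu) \]

because $\varphi_n$ is simple with respect to $\mathcal{F}(X \times Y)$; Corollary 4.162 replaces Corollary 4.118.

Corollary 4.164. If $\mu$ and $\nu$ are σ-finite Radon measures on $X$ and $Y$, then

\[ (\mu \mathbin{\overline{\otimes}} \nu)|_{\mathcal{B}_X \otimes \mathcal{B}_Y} = \mu \otimes \nu. \]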

Infinite Radon products

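Let $\{X_\alpha\}_{\alpha \in A}$ be a collection of compact Hausdorff spaces. Let $X = \prod_{\alpha \in A} X_\alpha$, which is a compact Hausdorff space due to Tychonoff's theorem, and let $\pi_\alpha : X \to X_\alpha$ be the canonical projections. We write $X_{(\alpha_1,\dots,\alpha_n)} = \prod_{k=1}^n X_{\alpha_k}$ and $\pi_{(\alpha_1,\dots,\alpha_n)}$ for the projection onto $X_{(\alpha_1,\dots,\alpha_n)}$ defined by

\[ \pi_{(\alpha_1,\dots,\alpha_n)}(x) = (\pi_{\alpha_1}(x), \dots, \pi_{\alpha_n}(x)). \]

Suppose that for every sequence $(\alpha_1,\dots,\alpha_n)$ of distinct elements of $A$, we are given a positive Radon measure $\mu_{(\alpha_1,\dots,\alpha_n)}$ on $X_{(\alpha_1,\dots,\alpha_n)}$ such that

\[ \mu_{(\alpha_1,\dots,\alpha_n)}(X_{(\alpha_1,\dots,\alpha_n)}) = 1. \]

We say that such a family of measures is consistent if for all $n \ge 1$:

1. $\mu_{(\alpha_{\sigma(1)},\dots,\alpha_{\sigma(n)})}(E_{\sigma(1)} \times \cdots \times E_{\sigma(n)}) = \mu_{(\alpha_1,\dots,\alpha_n)}(E_1 \times \cdots \times E_n)$ for all permutations $\sigma$ of $\{1,\dots,n\}$ and Borel sets $E_k \subseteq X_{\alpha_k}$.

2. $\mu_{(\alpha_1,\dots,\alpha_m)}(E) = \mu_{(\alpha_1,\dots,\alpha_n)}(E \times X_{(\alpha_{m+1},\dots,\alpha_n)})$ for all $m < n$ and any Borel set $E \subseteq X_{(\alpha_1,\dots,\alpha_m)}$.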

Theorem 4.165 (Kolmogorov extension theorem). Let $\{\mu_{(\alpha_1,\dots,\alpha_n)}\}$ be a consistent family of measures. There exists a unique positive Radon measure $\mu$ on $X$ such that $\mu(X) = 1$ and

\[ (\pi_{(\alpha_1,\dots,\alpha_n)})_*\mu = \mu_{(\alpha_1,\dots,\alpha_n)} \]

for all sequences $(\alpha_1,\dots,\alpha_n)$ of distinct elements of $A$.

Proof. Let $C_F(X)$ be the set of all $f \in C(X,\mathbb{C})$ that depend only on finitely many coordinates. Define a map $\lambda : C_F(X) \to \mathbb{C}$ as follows: if $f \in C_F(X)$ then $f = g \circ \pi_{(\alpha_1,\dots,\alpha_n)}$ for some sequence $(\alpha_1,\dots,\alpha_n)$ and some $g \in C(X_{(\alpha_1,\dots,\alpha_n)},\mathbb{C})$, and we can set

\[ \lambda f = \int_{X_{(\alpha_1,\dots,\alpha_n)}} g\,d\mu_{(\alpha_1,\dots,\alpha_n)}. \]

This is well-defined because the family $\{\mu_{(\alpha_1,\dots,\alpha_n)}\}$ is consistent, and it is easy to check that $\lambda$ is a linear functional with $|\lambda| = 1$. Since $C_F(X)$ is a complex subalgebra of $C(X,\mathbb{C})$ that separates points, contains the constant functions, and is closed under complex conjugation, Theorem 4.134 shows that $C_F(X)$ is dense in $C(X,\mathbb{C})$. If we extend $\lambda$ to a linear functional on $C(X,\mathbb{C})$ with the same norm, then Theorem 4.151 shows that there exists some $\mu \in M_R(X,\mathbb{C})$ with $\|\mu\| = 1$ such that

\[ \lambda f = \int_X f\,d\mu \]

for all $f \in C(X,\mathbb{C})$. We have

\[ \mu(X) = \int_X 1\,d\mu = \lambda 1 = 1 = \|\mu\|, \]

and since $\mu(X) = |\mu|(X)$, Theorem 4.15 implies that $\mu$ is a positive measure with $\mu(X) = 1$.

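Let $(\alpha_1,\dots,\alpha_n)$ be a sequence of distinct elements of $A$. If we can show that $(\pi_{(\alpha_1,\dots,\alpha_n)})_*\mu$ is Radon, then every $f \in C(X_{(\alpha_1,\dots,\alpha_n)},\mathbb{R})$ is in

\[ \mathcal{L}^1(X_{(\alpha_1,\dots,\alpha_n)}, (\pi_{(\alpha_1,\dots,\alpha_n)})_*\mu, \mathbb{R}), \]

and Theorem 4.69 shows that

\[
\begin{aligned}
\int_{X_{(\alpha_1,\dots,\alpha_n)}} f\,d((\pi_{(\alpha_1,\dots,\alpha_n)})_*\mu) &= \int_X f \circ \pi_{(\alpha_1,\dots,\alpha_n)}\,d\mu = \lambda(f \circ \pi_{(\alpha_1,\dots,\alpha_n)}) \\
&= \int_{X_{(\alpha_1,\dots,\alpha_n)}} f\,d\mu_{(\alpha_1,\dots,\alpha_n)}.
\end{aligned}
\]

By uniqueness in Theorem 4.137, $(\pi_{(\alpha_1,\dots,\alpha_n)})_*\mu = \mu_{(\alpha_1,\dots,\alpha_n)}$.

Let $E$ be a Borel set in $X_{(\alpha_1,\dots,\alpha_n)}$ and write $\pi = \pi_{(\alpha_1,\dots,\alpha_n)}$ for convenience. Let $\varepsilon > 0$. Since $\mu$ is regular, we can choose some compact set $K \subseteq \pi^{-1}(E)$ such that $\mu(K) > \mu(\pi^{-1}(E)) - \varepsilon$. Then $K' = \pi(K) \subseteq E$ is compact, and $(\pi_*\mu)(K') = \mu(\pi^{-1}(K')) \ge \mu(K)$ because $\pi^{-1}(K') \supseteq K$. Therefore $(\pi_*\mu)(K') > (\pi_*\mu)(E) - \varepsilon$. This shows that $\pi_*\mu$ is inner regular, and a similar argument shows that it is outer regular, which completes the proof.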

4.9 Topological groups

A topological group is a group $G$ with a topology such that the product map $G \times G \to G$ given by $(x,y) \mapsto xy$ and the inversion map $G \to G$ given by $x \mapsto x^{-1}$ are continuous. We write $e$ for the identity element. Since the inversion map is its own inverse, it is a homeomorphism. If $a \in G$ then the left translation map $\tau_a : G \to G$ defined by $\tau_a(x) = ax$ is continuous. Since $\tau_a^{-1} = \tau_{a^{-1}}$, the left translation map is a homeomorphism. Similarly, the right translation map $x \mapsto xa$ is a homeomorphism. A subset $A \subseteq G$ is symmetric if $A = A^{-1}$.

Theorem 4.166 (Properties of topological groups). Let G be a topological group.

1. If $a \in G$ and $U$ is open then $aU$ and $Ua$ are open.

2. If $U$ is a neighborhood of $e$ then there exists a symmetric neighborhood $V$ of $e$ such that

\[ V \subseteq VV \subseteq U. \]

Furthermore, if $G$ is locally compact Hausdorff then we can also choose $V$ such that $\overline{V}$ and $\overline{V}\,\overline{V}$ are compact, and

\[ V \subseteq VV \subseteq \overline{V}\,\overline{V} \subseteq U. \]

3. If $H$ is a subgroup of $G$ then $\overline{H}$ is a subgroup of $G$ and

\[ \overline{H} = \bigcap_U HU, \]

where the intersection is taken over all neighborhoods $U$ of $e$.

4. Every open subgroup of $G$ is closed.

5. If $K, L$ are compact subsets of $G$ then $KL$ is also compact.


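Proof. For (2), let $W$ be the inverse image of $U$ under the product map. Choose neighborhoods $W_1, W_2$ of $e$ such that $W_1 \times W_2 \subseteq W$ and take $V = W_1 \cap W_1^{-1} \cap W_2 \cap W_2^{-1}$. If $G$ is locally compact then we can choose relatively compact neighborhoods $K_1, K_2$ of $e$ such that $\overline{K_1} \subseteq W_1$ and $\overline{K_2} \subseteq W_2$, and take $V = K_1 \cap K_1^{-1} \cap K_2 \cap K_2^{-1}$. Then $\overline{V}$ is compact and $\overline{V}\,\overline{V}$ is compact due to (5).

For (3), if $h \in H$ then $hH \subseteq \overline{H}$ and therefore $H \subseteq h^{-1}\overline{H}$. Since $h^{-1}\overline{H}$ is closed, $\overline{H} \subseteq h^{-1}\overline{H}$ and therefore $h\overline{H} \subseteq \overline{H}$. This holds for all $h \in H$, so $H\overline{H} \subseteq \overline{H}$. If $x \in \overline{H}$ then $Hx \subseteq \overline{H}$ and therefore $H \subseteq \overline{H}x^{-1}$, so $\overline{H} \subseteq \overline{H}x^{-1}$ and therefore $\overline{H}x \subseteq \overline{H}$. This shows that $\overline{H}\,\overline{H} \subseteq \overline{H}$. Also, $H^{-1} = H \subseteq \overline{H}$, so $H \subseteq \overline{H}^{-1}$ and $\overline{H} \subseteq \overline{H}^{-1}$. This shows that $\overline{H}^{-1} \subseteq \overline{H}$, so $\overline{H}$ is a subgroup. For the equality, let $x \in \overline{H}$. If $U$ is a neighborhood of $e$ then $xU^{-1}$ is a neighborhood of $x$, so there is some $h \in H$ such that $h \in xU^{-1}$. Therefore $x = hu$ for some $u \in U$, and this proves that $\overline{H} \subseteq \bigcap_U HU$. Conversely, if $x \in HU$ for all neighborhoods $U$ of $e$ then $xU^{-1}$ contains an element of $H$ for all $U$, so $x \in \overline{H}$. This shows that $\bigcap_U HU \subseteq \overline{H}$.

For (4), if $H$ is an open subgroup of $G$ then $G \setminus H = \bigcup_{x \notin H} xH$ is open since every $xH$ is open, so $H$ is closed. For (5), $KL$ is compact because it is the image of the compact set $K \times L$ under the product map.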

Let $f$ be a continuous function from $G$ to a Banach space. For any $a \in G$, we define the left $a$-translate and right $a$-translate of $f$ by

\[ L_a f(x) = f(a^{-1}x) \quad \text{and} \quad R_a f(x) = f(xa). \]

Note that $L_a L_b f = L_{ab} f$ and $R_a R_b f = R_{ab} f$. We say that $f$ is left (or right) uniformly continuous if for every $\varepsilon > 0$ there exists a neighborhood $U$ of $e$ such that $\|L_a f - f\|_\infty < \varepsilon$ (or $\|R_a f - f\|_\infty < \varepsilon$) for all $a \in U$.
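The composition rules for translates, and the reason for the inverse in the definition of $L_a$, can be checked on a small non-abelian group. The following Python sketch uses the symmetric group $S_3$ (with the discrete topology) as a stand-in for $G$; the helper names `mul`, `inv`, `L`, `R` and `naive` are introduced here for illustration only.

```python
from itertools import permutations

G = list(permutations(range(3)))  # the symmetric group S_3 as tuples

def mul(p, q):
    """Product pq in S_3: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    out = [0] * 3
    for i, j in enumerate(p):
        out[j] = i
    return tuple(out)

def L(a, f):
    """Left a-translate: (L_a f)(x) = f(a^{-1} x)."""
    return lambda x: f(mul(inv(a), x))

def R(a, f):
    """Right a-translate: (R_a f)(x) = f(x a)."""
    return lambda x: f(mul(x, a))

def naive(a, f):
    """Translate without the inverse: x -> f(a x)."""
    return lambda x: f(mul(a, x))

values = {g: i for i, g in enumerate(G)}
f = lambda x: values[x]  # an injective test function on G

def equal(u, v):
    return all(u(x) == v(x) for x in G)

left_ok = all(equal(L(a, L(b, f)), L(mul(a, b), f)) for a in G for b in G)
right_ok = all(equal(R(a, R(b, f)), R(mul(a, b), f)) for a in G for b in G)
naive_ok = all(equal(naive(a, naive(b, f)), naive(mul(a, b), f)) for a in G for b in G)
print(left_ok, right_ok, naive_ok)
```

The first two checks succeed, while the variant without the inverse fails on $S_3$ because it composes in the reverse order; this is why $L_a$ is defined using $a^{-1}$.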

Theorem 4.167. If $f \in C_0(G)$ then $f$ is left and right uniformly continuous.

Proof. We will prove left uniform continuity only. First assume that $f \in C_c(G)$, and let $\varepsilon > 0$. For every $x \in \operatorname{supp}(f)$, there is a neighborhood $U_x$ of $e$ such that $|f(ax) - f(x)| < \varepsilon$ for all $a \in U_x$. By Theorem 4.166, there is a symmetric neighborhood $V_x$ of $e$ such that $V_x V_x \subseteq U_x$. Since $\operatorname{supp}(f)$ is compact, finitely many sets $V_{x_1}x_1, \dots, V_{x_n}x_n$ cover $\operatorname{supp}(f)$, with $x_1, \dots, x_n \in \operatorname{supp}(f)$. Let $V = V_{x_1} \cap \cdots \cap V_{x_n}$ and let $a \in V$. If $x \in \operatorname{supp}(f)$ then $x \in V_{x_i}x_i$ for some $i$, so $x \in U_{x_i}x_i$ and $a^{-1}x = a^{-1}(xx_i^{-1})x_i \in U_{x_i}x_i$. Therefore

\[ |f(a^{-1}x) - f(x)| \le |f(a^{-1}x) - f(x_i)| + |f(x_i) - f(x)| < 2\varepsilon. \tag{*} \]

If $x \notin \operatorname{supp}(f)$ then $f(x) = 0$, and we have two cases. If $a^{-1}x \notin \operatorname{supp}(f)$ then $f(a^{-1}x) = 0$. If $a^{-1}x \in \operatorname{supp}(f)$ then $a^{-1}x \in V_{x_i}x_i$ for some $i$, so $x \in U_{x_i}x_i$ and $a^{-1}x \in U_{x_i}x_i$. In this case, (*) holds.

Now let $f \in C_0(G)$, and let $\varepsilon > 0$. Since $C_c(G)$ is dense in $C_0(G)$ (see Theorem 4.128), there is some $g \in C_c(G)$ such that $\|g - f\|_\infty < \varepsilon$. Since $g$ is uniformly continuous, there exists a neighborhood $U$ of $e$ such that $\|L_a g - g\|_\infty < \varepsilon$ for all $a \in U$. Then

\[ \|L_a f - f\|_\infty \le \|L_a f - L_a g\|_\infty + \|L_a g - g\|_\infty + \|g - f\|_\infty < 3\varepsilon \]

for all $a \in U$.

A Borel measure $\mu$ on $G$ is left-invariant (or right-invariant) if $\mu(aE) = \mu(E)$ (or $\mu(Ea) = \mu(E)$) for all $a \in G$ and $E \in \mathcal{B}_G$. A linear functional $\lambda$ on $C_c(G)$ is left-invariant (or right-invariant) if $\lambda(L_a f) = \lambda f$ (or $\lambda(R_a f) = \lambda f$) for all $a \in G$ and $f \in C_c(G)$.

Quotient groups

If $G$ is a topological group and $H$ is a subgroup of $G$, we can consider the set of cosets $G/H = \{ xH : x \in G \}$ and the quotient map $\pi : G \to G/H$. We give $G/H$ the quotient topology induced by $\pi$, i.e. the topology where $E \subseteq G/H$ is open if and only if $\pi^{-1}(E)$ is open.

Lemma 4.168. Let $G$ be a topological group and let $H$ be a subgroup of $G$.

1. $G/H$ is a group under the product $(xH, yH) \mapsto (xy)H$ if and only if $H$ is normal.

2. $G/H$ is Hausdorff if and only if $H$ is closed.

Theorem 4.169. If $L$ is a compact subset of $G/H$ then there exists a compact subset $K$ of $G$ such that $L = \pi(K)$.

Proof. Let $U$ be a relatively compact neighborhood of $e$ in $G$. Then $\pi(U)$ is a neighborhood of the coset $H$ in $G/H$, and since $L$ is compact, we can find $x_1, \dots, x_n \in G$ such that $L \subseteq \pi(x_1U) \cup \cdots \cup \pi(x_nU)$. Then $K = \pi^{-1}(L) \cap (x_1\overline{U} \cup \cdots \cup x_n\overline{U})$ is compact and $\pi(K) = L$.

4.10 Haar measure

Let $G$ be a locally compact Hausdorff (LCH) group. A left (or right) Haar measure on $G$ is a nonzero positive Radon measure on $G$ that is left-invariant (or right-invariant). A Haar measure on $G$ is a left Haar measure that is also right-invariant (this is always true when $G$ is abelian). A positive Radon measure is a left (or right) Haar measure if and only if its corresponding positive linear functional (see Theorem 4.137) is nonzero and left-invariant (or right-invariant).

Recall that $C_c^+(G,\mathbb{R}) = C_c^+(G)$ is the set of all $f \in C_c(G,\mathbb{R})$ such that $f \ge 0$ on $G$.

Theorem 4.170 (Properties of Haar measure). Let $G$ be an LCH group and let $\mu$ be a nonzero positive Radon measure on $G$.

1. $\mu$ is a left Haar measure if and only if the measure $\overline{\mu}$ defined by $\overline{\mu}(E) = \mu(E^{-1})$ is a right Haar measure.

2. $\mu$ is a left Haar measure if and only if

\[ \int_G f\,d\mu = \int_G L_a f\,d\mu \]

for all $a \in G$ and $f \in C_c(G)$. Also, if $f \in \mathcal{L}^1(\mu)$ then for all $a \in G$ we have $L_a f \in \mathcal{L}^1(\mu)$ and the above equality holds as well.

3. If $\mu$ is a left Haar measure then:

a) $\mu(U) > 0$ for every nonempty open set $U \subseteq G$.

b) $\int_G f\,d\mu > 0$ for all nonzero $f \in C_c^+(G)$.

Proof. For (2), if $\mu$ is a left Haar measure then $\int_G f\,d\mu = \int_G L_a f\,d\mu$ whenever $f \in I(\mu)$, so the equality holds for all $f \in \mathcal{L}^1(\mu)$ since $I(\mu)$ is dense in $\mathcal{L}^1(\mu)$. The converse follows from (*) in Theorem 4.137. For (3)(a), since $\mu(G) > 0$ and $\mu$ is inner regular on $G$, there is a compact $K \subseteq G$ such that $\mu(K) > 0$. If $U$ is a nonempty open set then $K$ can be covered by finitely many left translates of $U$ (all of which have the same measure), so $\mu(U) > 0$. For (3)(b), let $U = f^{-1}\bigl(\bigl(\tfrac{1}{2}\|f\|_\infty, \infty\bigr)\bigr)$ so that $\int_G f\,d\mu \ge \tfrac{1}{2}\|f\|_\infty\,\mu(U) > 0$.

Theorem 4.171. Let $G$ be an LCH group. Then $G$ has a subgroup $H$ that is open, closed, and σ-compact.

Proof. Choose a symmetric relatively compact neighborhood $U$ of $e$ (see Theorem 4.166). Let $U^n = UU \cdots U$ (with $n$ factors) and let $H = \bigcup_{n=1}^\infty U^n$. Since $U$ is open and $U^n = \bigcup_{a \in U} aU^{n-1}$, Theorem 4.166 implies that every $U^n$ is open (by induction). Therefore $H$ is open (and closed, due to Theorem 4.166). Each $U^n$ is relatively compact, so $H$ is σ-compact.

Theorem 4.172. Let $G$ be an LCH group, let $H$ be a subgroup of $G$ that is open, closed, and σ-compact, and choose a set $B \subseteq G$ such that $\{bH\}_{b \in B}$ is disjoint and $G = \bigcup_{b \in B} bH$. Let $E$ be a Borel set.

1. If there is a countable subset $\{b_n\} \subseteq B$ such that $E \subseteq \bigcup_{n=1}^\infty b_nH$, then $\mu(E) = \sum_{n=1}^\infty \mu(E \cap b_nH)$.

2. If $E \cap bH \ne \emptyset$ for uncountably many $b \in B$, then $\mu(E) = \infty$.

Proof. (1) is obvious. For (2), we can assume that $E$ is open because $\mu$ is outer regular. By Theorem 4.170, $\mu(E \cap bH) > 0$ whenever $E \cap bH \ne \emptyset$. This occurs for uncountably many $b \in B$, so there must be some $\varepsilon > 0$ such that $\mu(E \cap bH) > \varepsilon$ for uncountably many $b \in B$. Therefore $\mu(E) = \infty$.

Theorem 4.173. Let $G$ be an LCH group and let $\mu$ be a left Haar measure on $G$. Then $\mu(G) < \infty$ if and only if $G$ is compact.

Proof. If $G$ is compact then $\mu(G) < \infty$ since $\mu$ is finite on compact sets. Suppose that $G$ is non-compact. Choose a relatively compact neighborhood $U$ of $e$ and define a sequence $\{x_n\}$ as follows: let $x_0 = e$, and for each $n = 1, 2, \dots$ consider $\bigcup_{i=0}^{n-1} x_iU$. This set cannot cover $G$ because its closure is compact, so we can find some $x_n \notin \bigcup_{i=0}^{n-1} x_iU$. By Theorem 4.166, there exists a symmetric neighborhood $V$ of $e$ such that $VV \subseteq U$. If $m < n$ and $x_mV \cap x_nV$ is nonempty then $x_n \in x_mVV \subseteq x_mU$, which contradicts the fact that $x_n \notin \bigcup_{i=0}^{n-1} x_iU$. Therefore $\{x_nV\}$ is a disjoint sequence of sets with $\mu(x_nV) = \mu(V) > 0$, and $\mu(G) \ge \mu\bigl(\bigcup_{n=1}^\infty x_nV\bigr) = \infty$.

Theorem 4.174. Let $G$ be an LCH group and let $\mu$ be a left Haar measure on $G$. The following are equivalent:

1. There exists some $x \in G$ such that $\mu(\{x\}) > 0$.

2. For all $x \in G$ we have $\mu(\{x\}) > 0$.

3. There exists some $c > 0$ such that $\mu(E) = c\,|E|$ for all finite sets $E$ and $\mu(E) = \infty$ for all infinite Borel sets $E$.

4. $G$ is discrete.

Proof. (1) ⇔ (2) is obvious from left-invariance. If $\mu(\{e\}) > 0$ then for any finite set $E$ we have $\mu(E) = \sum_{x \in E} \mu(\{x\}) = \mu(\{e\})\,|E|$. If $E$ is infinite then it contains a countably infinite subset $F$, so $\mu(E) \ge \mu(F) = \sum_{x \in F} \mu(\{x\}) = \infty$. This proves (2) ⇒ (3). For (3) ⇒ (4), choose any relatively compact neighborhood $U$ of $e$ so that $0 < \mu(U) \le \mu(\overline{U}) < \infty$ by Theorem 4.170 and the fact that $\overline{U}$ is compact. This implies that $U$ is finite, and since $U$ is Hausdorff, the set $\{e\}$ is open. This implies that every subset of $G$ is open, i.e. $G$ is discrete. For (4) ⇒ (1), the set $\{x\}$ is nonempty and open, so Theorem 4.170 shows that $\mu(\{x\}) > 0$.
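Condition (3) says that on a discrete group a left Haar measure is a multiple of counting measure. As an illustration only, the Python sketch below checks translation invariance of $c\,|E|$ on the finite cyclic group $\mathbb{Z}/6\mathbb{Z}$; the group, the constant `C` and the helper names are hypothetical choices made for this example.

```python
from itertools import combinations

N = 6    # Z/6Z under addition mod 6, with the discrete topology
C = 2.5  # hypothetical normalization constant c > 0

def mu(E):
    """Candidate Haar measure on the finite discrete group: c times the number of points."""
    return C * len(E)

def translate(a, E):
    """The translate a + E in Z/6Z."""
    return frozenset((a + x) % N for x in E)

all_subsets = [frozenset(s) for r in range(N + 1) for s in combinations(range(N), r)]
invariant = all(mu(translate(a, E)) == mu(E) for a in range(N) for E in all_subsets)
print(invariant, mu(frozenset({0})), mu(frozenset(range(N))))
```

Every singleton has the same positive measure `C`, matching conditions (1) and (2), and the total mass is finite because this group is compact (cf. Theorem 4.173).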

Theorem 4.175 (Uniqueness of Haar measure). Let $G$ be an LCH group. If $\mu$ and $\nu$ are left Haar measures on $G$, then there exists a $c > 0$ such that $\mu = c\nu$.

Proof. We first give a simple proof for the case where $\mu$ is both left-invariant and right-invariant, e.g. when $G$ is abelian. Choose any nonzero $h \in C_c^+(G)$ such that $h(x) = h(x^{-1})$ for all $x \in G$, e.g. by choosing any nonzero $g \in C_c^+(G)$ and setting $h(x) = g(x) + g(x^{-1})$. Then for all $f \in C_c(G)$, we can apply Theorem 4.154 and Fubini's theorem (since $f$ and $h$ vanish outside a set of finite measure) and compute

\[
\begin{aligned}
\int_G h\,d\nu \int_G f\,d\mu
&= \int_{y \in G} \int_{x \in G} f(x)h(y)\,d\mu\,d\nu = \int_{y \in G} \int_{x \in G} f(xy)h(y)\,d\mu\,d\nu \\
&= \int_{x \in G} \int_{y \in G} f(xy)h(y)\,d\nu\,d\mu = \int_{x \in G} \int_{y \in G} f(y)h(x^{-1}y)\,d\nu\,d\mu \\
&= \int_{x \in G} \int_{y \in G} f(y)h(y^{-1}x)\,d\nu\,d\mu = \int_{y \in G} \int_{x \in G} f(y)h(y^{-1}x)\,d\mu\,d\nu \\
&= \int_{y \in G} \int_{x \in G} f(y)h(x)\,d\mu\,d\nu \\
&= \int_G h\,d\mu \int_G f\,d\nu.
\end{aligned}
\]

Therefore $\mu = c\nu$ where

\[ c = \frac{\int_G h\,d\mu}{\int_G h\,d\nu}. \]

For the general case, it suffices to show that there is a constant $c > 0$ such that $\int_G f\,d\mu \big/ \int_G f\,d\nu = c$ for all nonzero $f \in C_c^+(G)$. Let $f, g \in C_c^+(G)$ be nonzero and let $\varepsilon > 0$. By Theorem 4.166, there exists a symmetric, relatively compact neighborhood $U$ of $e$. Let

\[ A = [\operatorname{supp}(f)]\,\overline{U} \cup \overline{U}\,[\operatorname{supp}(f)], \qquad B = [\operatorname{supp}(g)]\,\overline{U} \cup \overline{U}\,[\operatorname{supp}(g)] \]

so that $A$ and $B$ are compact, and for every $y \in U$ the functions

\[ x \mapsto f(xy) - f(yx), \qquad x \mapsto g(xy) - g(yx) \]

are supported in A and B respectively. Theorem 4.167 implies that there is asymmetric, relatively compact neighborhood V of e such that V ⊆ U , and

supx∈A|f(xy)− f(yx)| < ε and sup

x∈B|g(xy)− g(yx)| < ε

for all y ∈ V . Choose any nonzero h ∈ C+c (G) such that supp(h) ⊆ V and

h(x) = h(x−1) for all x ∈ V . Then∫G

h dν

∫G

f dµ =

∫y∈G

∫x∈G

f(x)h(y) dµ dν

=

∫y∈G

∫x∈G

f(yx)h(y) dµ dν

and ∫G

h dµ

∫G

f dν

=

∫y∈G

∫x∈G

f(y)h(x) dµ dν =

∫y∈G

∫x∈G

f(y)h(y−1x) dµ dν

=

∫x∈G

∫y∈G

f(y)h(y−1x) dν dµ =

∫x∈G

∫y∈G

f(y)h(x−1y) dν dµ

=

∫x∈G

∫y∈G

f(xy)h(y) dν dµ =

∫y∈G

∫x∈G

f(xy)h(y) dµ dν,

so ∣∣∣∣∫G

h dν

∫G

f dµ−∫G

h dµ

∫G

f dν

∣∣∣∣ =

∣∣∣∣∫y∈G

∫x∈G

[f(yx)− f(xy)]h(y) dµ dν

∣∣∣∣≤ εµ(A)

∫G

h dν. (*)

Repeating the above with g instead of f , we get∣∣∣∣∫G

h dν

∫G

g dµ−∫G

h dµ

∫G

g dν

∣∣∣∣ ≤ εµ(B)

∫G

h dν. (**)

Dividing (*) by∫Gh dν

∫Gf dν and (**) by

∫Gh dν

∫Gg dν gives∣∣∣∣

∫Gf dµ∫

Gf dν

−∫Gg dµ∫

Gg dν

∣∣∣∣ ≤ ε( µ(A)∫Gf dν

+µ(B)∫Gg dν

),

so taking ε→ 0 completes the proof.


Example 4.176. Let $\mathbf{E}$ be a finite-dimensional vector space, and consider it as an additive group. With the appropriate normalization constant (which will be discussed in Section 4.11), its Haar measure is called the Lebesgue measure, usually denoted by $m$. Some related examples are:

1. Let $L(\mathbf{E})$ be the space of linear operators on $\mathbf{E}$ and let $m$ be the Lebesgue measure on $L(\mathbf{E})$. Then a Haar measure for the multiplicative group of invertible operators in $L(\mathbf{E})$ is
\[
d\mu(\lambda) = \frac{1}{|\det(\lambda)|^{\dim(\mathbf{E})}}\,dm(\lambda).
\]
The proof of this requires the change of variables formula, which will be discussed in Section 4.11.

2. In particular, if $\mathbf{E} = \mathbb{K} = \mathbb{R}$ then a Haar measure for the multiplicative group $\mathbb{R} \setminus \{0\}$ is
\[
d\mu(x) = \frac{1}{|x|}\,dm(x).
\]

3. Let $\mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$ be the circle group and define $\gamma : [0, 1) \to \mathbb{T}$ by $\gamma(t) = e^{2\pi i t}$. A Haar measure for $\mathbb{T}$ is the image measure $\mu = \gamma_* m$ of the Lebesgue measure $m$ on $[0, 1)$, given by
\[
\mu(E) = m(\gamma^{-1}(E)).
\]
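As a sanity check on item 2 of Example 4.176, the invariance of $dx/|x|$ under multiplication can be verified directly. The sketch below fixes $a \neq 0$ and a Borel set $E \subseteq \mathbb{R} \setminus \{0\}$, and assumes only the one-dimensional scaling rule $m(aE) = |a|\,m(E)$ (a special case of the change of variables formula), applied through the substitution $x = ay$.

```latex
\[
\mu(aE) = \int_{aE} \frac{1}{|x|}\,dm(x)
        = \int_{E} \frac{1}{|ay|}\,|a|\,dm(y)
        = \int_{E} \frac{1}{|y|}\,dm(y)
        = \mu(E).
\]
```

For instance, $\mu([1,2]) = \log 2 = \mu([3,6])$, since $[3,6] = 3 \cdot [1,2]$.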

Existence of Haar measure

To prove that a left Haar measure exists for an LCH group, we need a way to measure the size of a set relative to another. Equivalently, we can try to measure the relative size of a compactly supported function. This will give us a nonzero, left-invariant positive linear functional on $C_c(G)$, and Theorem 4.137 gives the corresponding measure on $G$.

Let $f, \varphi \in C_c^+(G)$ with $\varphi \neq 0$. We define the Haar covering number of $f$ with respect to $\varphi$ by
\[
(f : \varphi) = \inf\left\{ \sum_{i=1}^n c_i : f \le \sum_{i=1}^n c_i L_{x_i}\varphi \text{ for some } n \text{ and } x_1, \dots, x_n \in G \right\}.
\]
The above set is nonempty: since we can find a nonempty open set $U$ on which $\varphi > 0$, and since $\operatorname{supp}(f)$ is compact, finitely many translates of $U$ cover $\operatorname{supp}(f)$. Clearly, if $f \neq 0$ then $(f : \varphi) > 0$. The Haar covering number behaves like a ratio, as the following theorem shows:
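To see concretely why $(f : \varphi)$ behaves like a ratio of integrals, consider the following worked example. It assumes $G = \mathbb{Z}$ with the discrete topology and takes $\varphi = \chi_{\{0\}}$; both choices are illustrative and not part of the text above.

```latex
% Assumptions: G = \mathbb{Z} (discrete), \varphi = \chi_{\{0\}}.
% Then (L_x \varphi)(k) = \varphi(k - x) = \chi_{\{x\}}(k).
\[
f \le \sum_{i=1}^{n} c_i L_{x_i}\varphi
\iff
f(k) \le \sum_{i \,:\, x_i = k} c_i \quad\text{for all } k \in \mathbb{Z},
\]
\[
\text{so}\quad
\sum_{i=1}^{n} c_i = \sum_{k \in \mathbb{Z}} \ \sum_{i \,:\, x_i = k} c_i \ \ge\ \sum_{k \in \mathbb{Z}} f(k),
\qquad\text{hence}\qquad
(f : \varphi) = \sum_{k \in \mathbb{Z}} f(k).
\]
```

The infimum is attained by letting the $x_i$ range over $\operatorname{supp}(f)$ with $c_i = f(x_i)$. In this case $(f : \varphi)$ is exactly the integral of $f$ against counting measure, which is a Haar measure on $\mathbb{Z}$ by Theorem 4.174.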

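Theorem 4.177. Let $f, g, \varphi \in C_c^+(G)$ with $\varphi \neq 0$.

1. $(f : \varphi) = (L_a f : \varphi)$ for all $a \in G$.

2. $(cf : \varphi) = c(f : \varphi)$ for all $c \ge 0$.

3. $(f + g : \varphi) \le (f : \varphi) + (g : \varphi)$.

4. If $g \neq 0$ then $(f : \varphi) \le (f : g)(g : \varphi)$.

Proof. (1) and (2) are obvious. (3) follows from the fact that if $f \le \sum_{i=1}^m c_i L_{x_i}\varphi$ and $g \le \sum_{i=m+1}^n c_i L_{x_i}\varphi$ then $f + g \le \sum_{i=1}^n c_i L_{x_i}\varphi$. (4) follows from the fact that if $f \le \sum_{i=1}^m c_i L_{x_i} g$ and $g \le \sum_{j=1}^n d_j L_{y_j}\varphi$ then $f \le \sum_{i=1}^m \sum_{j=1}^n c_i d_j L_{x_i y_j}\varphi$ and $\left(\sum_{i=1}^m c_i\right)\left(\sum_{j=1}^n d_j\right) = \sum_{i=1}^m \sum_{j=1}^n c_i d_j$.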

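Now choose some nonzero $f_0 \in C_c^+(G)$ and define
\[
\lambda_\varphi(f) = \frac{(f : \varphi)}{(f_0 : \varphi)}
\]
for $f, \varphi \in C_c^+(G)$ with $\varphi \neq 0$. Theorem 4.177 shows that $\lambda_\varphi$ is left-invariant and subadditive. In fact, as $\operatorname{supp}(\varphi)$ becomes small, $\lambda_\varphi$ becomes "more" additive in the following sense:

Lemma 4.178. Let $f, g \in C_c^+(G)$. For every $\varepsilon > 0$ there exists a neighborhood $U$ of $e$ such that
\[
\lambda_\varphi(f) + \lambda_\varphi(g) \le \lambda_\varphi(f + g) + \varepsilon
\]
for all nonzero $\varphi \in C_c^+(G)$ such that $\operatorname{supp}(\varphi) \subseteq U$.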

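Proof. Choose some $h \in C_c^+(G)$ such that $h = 1$ on $\operatorname{supp}(f + g)$ and choose $\delta > 0$ small enough so that
\[
2\delta(f + g : f_0) + \delta(1 + 2\delta)(h : f_0) < \varepsilon.
\]
(The reason for this will become clear later.) Let $k = f + g + \delta h$ and define
\[
r = \frac{f}{k} \text{ on } \operatorname{supp}(f) \text{ and } r = 0 \text{ on } G \setminus \operatorname{supp}(f), \qquad
s = \frac{g}{k} \text{ on } \operatorname{supp}(g) \text{ and } s = 0 \text{ on } G \setminus \operatorname{supp}(g)
\]
so that $r, s \in C_c^+(G)$. Theorem 4.167 shows that there exists a neighborhood $U$ of $e$ such that
\[
|r(x) - r(y)| < \delta \quad\text{and}\quad |s(x) - s(y)| < \delta
\]
whenever $y^{-1}x \in U$. Let $\varphi \in C_c^+(G)$ with $\varphi \neq 0$ and $\operatorname{supp}(\varphi) \subseteq U$. If $k \le \sum_{i=1}^n c_i L_{x_i}\varphi$ then $|r(x) - r(x_i)| < \delta$ whenever $x_i^{-1}x \in \operatorname{supp}(\varphi)$, so
\[
f(x) = k(x)r(x) \le \sum_{i=1}^n c_i \varphi(x_i^{-1}x)\, r(x) \le \sum_{i=1}^n c_i \varphi(x_i^{-1}x)\,(r(x_i) + \delta)
\]
for all $x \in G$. This implies that $(f : \varphi) \le \sum_{i=1}^n c_i (r(x_i) + \delta)$, and a similar argument shows that $(g : \varphi) \le \sum_{i=1}^n c_i (s(x_i) + \delta)$. Since $r + s \le 1$, we have
\[
(f : \varphi) + (g : \varphi) \le \sum_{i=1}^n c_i (1 + 2\delta).
\]
This holds for all $n$, $c_1, \dots, c_n$ and $x_1, \dots, x_n$ such that $k \le \sum_{i=1}^n c_i L_{x_i}\varphi$, so
\begin{align*}
(f : \varphi) + (g : \varphi) &\le (1 + 2\delta)(k : \varphi) \\
&\le (1 + 2\delta)[(f + g : \varphi) + \delta(h : \varphi)] \\
&\le (f + g : \varphi) + 2\delta(f + g : \varphi) + \delta(1 + 2\delta)(h : \varphi) \\
&\le (f + g : \varphi) + (f_0 : \varphi)[2\delta(f + g : f_0) + \delta(1 + 2\delta)(h : f_0)] \\
&< (f + g : \varphi) + (f_0 : \varphi)\,\varepsilon
\end{align*}
by Theorem 4.177. Dividing by $(f_0 : \varphi)$ gives the result.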

We will need the following useful characterization of compact spaces. A collection $\{E_\alpha\}_{\alpha \in A}$ of subsets of a topological space $X$ has the finite intersection property if $\bigcap_{\alpha \in B} E_\alpha \neq \emptyset$ whenever $B$ is a finite subset of $A$.

Lemma 4.179. A topological space $X$ is compact if and only if for every collection $\{E_\alpha\}_{\alpha \in A}$ of closed sets with the finite intersection property, we have $\bigcap_{\alpha \in A} E_\alpha \neq \emptyset$.

Proof. Let $U_\alpha = E_\alpha^c$ so that each $U_\alpha$ is open, $\bigcap_{\alpha \in A} E_\alpha = \emptyset$ if and only if $\bigcup_{\alpha \in A} U_\alpha = X$, and $\{E_\alpha\}$ has the finite intersection property if and only if no finite subcollection of $\{U_\alpha\}$ covers $X$.

Theorem 4.180 (Existence of Haar measure). If $G$ is an LCH group, then there exists a left Haar measure on $G$.

Proof. First note that
\[
(f_0 : f)^{-1} \le \lambda_\varphi(f) \le (f : f_0)
\]
by the last part of Theorem 4.177. For each nonzero $f \in C_c^+(G)$, let $X_f$ be the interval $[(f_0 : f)^{-1}, (f : f_0)]$. Tychonoff's theorem shows that $X = \prod_{f \in C_c^+(G) \setminus \{0\}} X_f$ is a compact Hausdorff space, and $\lambda_\varphi \in X$ for all nonzero $\varphi \in C_c^+(G)$. (Here we are considering the points of $X$ as functions on $C_c^+(G) \setminus \{0\}$. We will also restrict each $\lambda_\varphi$ to $C_c^+(G) \setminus \{0\}$.) For each compact neighborhood $E$ of $e$, let $K(E)$ be the closure of $\{\lambda_\varphi : \operatorname{supp}(\varphi) \subseteq E\}$ in $X$, which is compact. Since $\bigcap_{i=1}^n K(E_i) \supseteq K\left(\bigcap_{i=1}^n E_i\right) \neq \emptyset$, Lemma 4.179 shows that there is some $\lambda \in \bigcap_E K(E)$, where the intersection is taken over all compact neighborhoods $E$ of $e$.

We can set $\lambda(0) = 0$, so that $\lambda$ is a function on $C_c^+(G)$. For any compact neighborhood $E$ of $e$, every neighborhood of $\lambda$ in $X$ intersects $\{\lambda_\varphi : \operatorname{supp}(\varphi) \subseteq E\}$. It follows that for all $\varepsilon > 0$, any neighborhood $U$ of $e$ and any $f_1, \dots, f_n \in C_c^+(G)$, there is some nonzero $\varphi \in C_c^+(G)$ such that $\operatorname{supp}(\varphi) \subseteq U$ and $|\lambda(f_i) - \lambda_\varphi(f_i)| < \varepsilon$ for all $i$. Using this fact, we can use Theorem 4.177 and Lemma 4.178 to deduce that $\lambda$ is left-invariant, additive, and $\lambda(cf) = c\lambda(f)$ for all $f \in C_c^+(G)$ and $c \ge 0$. We can extend $\lambda$ to $C_c(G)$ by setting $\lambda(f) = \lambda(f^+) - \lambda(f^-)$ as in the proof of Theorem 4.151, and then Theorem 4.137 completes the proof.

Duality for L1(µ)

Let $G$ be an LCH group and let $\mu$ be a left Haar measure on $G$. In Section 5.9 we will make use of the dual of $L^1(G, \mu)$, which is $L^\infty(G, \mu)$ when $\mu$ is $\sigma$-finite (see Theorem 4.113). When $\mu$ is not $\sigma$-finite, the following result provides an alternative. See [4] for more details.

A set $E \subseteq G$ is locally Borel if $E \cap F$ is Borel whenever $F$ is a Borel set of finite measure, and a locally Borel set $E$ is locally null if $\mu(E \cap F) = 0$ whenever $F$ is a Borel set of finite measure. We say that a property holds locally almost everywhere in $E$ if there is a locally null set $F$ such that the property holds for all elements of $E \setminus F$. A function $f : G \to \mathbf{F}$ is locally measurable if $f|_F$ is measurable whenever $F$ is a Borel set of finite measure (i.e. $f^{-1}(E)$ is locally Borel for every Borel set $E \subseteq \mathbf{F}$), and $f$ is locally $\mu$-measurable if $f|_F$ is $\mu$-measurable whenever $F$ is a Borel set of finite measure.

Theorem 4.181 (Properties of locally Borel sets). Let $G$ be an LCH group, let $H$ be a subgroup of $G$ that is open, closed, and $\sigma$-compact, and choose a set $B \subseteq G$ such that $\{bH\}_{b \in B}$ is disjoint and $G = \bigcup_{b \in B} bH$ (see Theorem 4.171).

1. A set $E$ is locally Borel if and only if $E \cap bH$ is Borel for all $b \in B$.

2. A set $E$ is locally null if and only if $\mu(E \cap bH) = 0$ for all $b \in B$.

3. A function $f : G \to \mathbf{F}$ is locally measurable if and only if $f|_{bH}$ is measurable for all $b \in B$.

4. A function $f : G \to \mathbf{F}$ is locally $\mu$-measurable if and only if $f|_{bH}$ is $\mu$-measurable for all $b \in B$.


5. If E is locally Borel and σ-finite then it is Borel.

6. If E is locally null and σ-finite then it has measure zero.

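If $f : G \to \mathbf{F}$ is locally $\mu$-measurable, we define the essential supremum of $f$ by
\[
\|f\|_\infty = \inf_g \sup_{x \in G} |g(x)|,
\]
where the inf is taken over all bounded maps $g$ such that $f = g$ locally almost everywhere ($\|f\|_\infty = \infty$ if there are no such maps). Let $\mathcal{L}^\infty_{\mathrm{loc}}(G, \mathbf{F}) = \mathcal{L}^\infty_{\mathrm{loc}}(G)$ be the vector space of all maps $f : G \to \mathbf{F}$ that are locally $\mu$-measurable and satisfy $\|f\|_\infty < \infty$. Let $L^\infty_{\mathrm{loc}}(G, \mathbf{F}) = L^\infty_{\mathrm{loc}}(G)$ be the quotient space $\mathcal{L}^\infty_{\mathrm{loc}}(G) / \mathcal{L}^{(0)}_{\mathrm{loc}}$, where $\mathcal{L}^{(0)}_{\mathrm{loc}}$ is the set of all functions that are zero locally almost everywhere. Then $\|\cdot\|_\infty$ is a seminorm on $\mathcal{L}^\infty_{\mathrm{loc}}(G)$, and $L^\infty_{\mathrm{loc}}(G)$ is a normed vector space. If $\{f_n\}$ is a sequence in $\mathcal{L}^\infty_{\mathrm{loc}}(G)$ and $f \in \mathcal{L}^\infty_{\mathrm{loc}}(G)$, we say that $f_n \to f$ in $L^\infty$ if $\|f_n - f\|_\infty \to 0$.

If $\mu$ is $\sigma$-finite, then $\mathcal{L}^\infty_{\mathrm{loc}}(G) = \mathcal{L}^\infty(G)$ and $L^\infty_{\mathrm{loc}}(G) = L^\infty(G)$.

Theorem 4.182. $L^\infty_{\mathrm{loc}}(G)$ is complete, i.e. a Banach space.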


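Let $\mathbf{F}$ be a Hilbert space over $\mathbb{K}$ and let $f \in \mathcal{L}^\infty_{\mathrm{loc}}(G, \mathbf{F})$ and $g \in \mathcal{L}^1(G, \mathbf{F})$. Theorem 4.55 shows that $g = 0$ outside a $\sigma$-finite set $E$. Since $|f| \le \|f\|_\infty$ locally almost everywhere, we must have $|f| \le \|f\|_\infty$ almost everywhere on $E$. Then $\langle g, f \rangle \in \mathcal{L}^1(G, \mathbb{K})$ by Theorem 4.91, and
\[
(g, f) \mapsto \int_G \langle g, f \rangle\,d\mu
\]
defines a sesquilinear map
\[
\langle \cdot, \cdot \rangle_\mu : L^1(G, \mathbf{F}) \times L^\infty_{\mathrm{loc}}(G, \mathbf{F}) \to \mathbb{K}.
\]

Theorem 4.183 (Dual of $L^1$). Let $G$ be an LCH group, let $\mathbf{F}$ be a Hilbert space, and let $\mu$ be a left Haar measure on $G$. The map
\[
\Lambda_\infty : L^\infty_{\mathrm{loc}}(G, \mathbf{F}) \to L^1(G, \mathbf{F})^*, \qquad f \mapsto \langle \cdot, f \rangle_\mu
\]
is a conjugate linear bijective isometry (cf. Theorem 4.113).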


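Proof. Let $H$ be a subgroup of $G$ that is open, closed, and $\sigma$-compact, and choose a set $B \subseteq G$ such that $\{bH\}_{b \in B}$ is disjoint and $G = \bigcup_{b \in B} bH$ (see Theorem 4.171).

To prove that $\Lambda_\infty$ is an isometry, let $f \in L^\infty_{\mathrm{loc}}(G, \mathbf{F})$ and let $\lambda = \Lambda_\infty[f]$. Theorem 4.91 shows that $|\lambda| \le \|f\|_\infty$, so it remains to show that $\|f\|_\infty \le |\lambda|$. For each $b \in B$, Theorem 4.181 implies that $f|_{bH} \in L^\infty(bH, \mathbf{F})$. Define $\lambda_b \in L^1(bH, \mathbf{F})^*$ by $\lambda_b[g] = \int_{bH} \langle g, f|_{bH} \rangle\,d\mu$. Since $bH$ is $\sigma$-finite (and therefore semifinite), we can apply Theorem 4.113 to get $\|f|_{bH}\|_\infty \le |\lambda_b| \le |\lambda|$, i.e. $|f|_{bH}| \le |\lambda|$ almost everywhere. By Theorem 4.181, $|f| \le |\lambda|$ locally almost everywhere. That is, $\|f\|_\infty \le |\lambda|$.

To prove that $\Lambda_\infty$ is surjective, let $\lambda \in L^1(G, \mathbf{F})^*$ and consider the restriction $\lambda|_{L^1(bH, \mathbf{F})}$ for each $b \in B$. Since $bH$ is $\sigma$-finite, Theorem 4.113 shows that there is some $[f_b] \in L^\infty(bH, \mathbf{F})$ such that $\lambda[g] = \int_{bH} \langle g, f_b \rangle\,d\mu$ for all $g \in \mathcal{L}^1(bH, \mathbf{F})$. Define $f : G \to \mathbf{F}$ by setting $f = f_b$ on $bH$ for each $b \in B$; then $f \in \mathcal{L}^\infty_{\mathrm{loc}}(G, \mathbf{F})$. If $g \in \mathcal{L}^1(G, \mathbf{F})$ then $g$ vanishes outside a $\sigma$-finite set $E$. By Theorem 4.172, there is a countable subset $\{b_n\} \subseteq B$ such that $E \subseteq \bigcup_{n=1}^\infty b_n H$. By Theorem 4.70,
\[
\lambda[g] = \lambda\left[ \sum_{n=1}^\infty g\chi_{b_n H} \right] = \sum_{n=1}^\infty \lambda[g\chi_{b_n H}] = \sum_{n=1}^\infty \int_{b_n H} \langle g, f_{b_n} \rangle\,d\mu = \int_G \langle g, f \rangle\,d\mu.
\]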

Finite product groups and quotient groups

If $G_1, \dots, G_n$ are LCH groups, then the product group $G = G_1 \times \cdots \times G_n$ is also LCH. If $\mu_1, \dots, \mu_n$ are left Haar measures for $G_1, \dots, G_n$, then we have an explicit formula for a left Haar measure on the product group $G$.

Theorem 4.184. Let $G_1, \dots, G_n$ be LCH groups, let $\mu_1, \dots, \mu_n$ be corresponding left Haar measures, and let $G = G_1 \times \cdots \times G_n$. Then the map $\lambda : C_c(G, \mathbb{R}) \to \mathbb{R}$ defined by
\[
\lambda f = \int_{x_1 \in G_1} \cdots \int_{x_n \in G_n} f(x_1, \dots, x_n)\,d\mu_n \cdots d\mu_1
\]
is a left-invariant nonzero positive linear functional. The corresponding Radon measure (see Theorem 4.137) is a left Haar measure on $G$.

Proof. $\lambda$ is clearly a nonzero positive linear functional, so it remains to show that it is left-invariant. If $a = (a_1, \dots, a_n) \in G$ then
\begin{align*}
\lambda(L_a f)
&= \int_{x_1 \in G_1} \cdots \int_{x_n \in G_n} f(a_1^{-1}x_1, \dots, a_n^{-1}x_n) \\
&= \int_{x_1 \in G_1} \cdots \int_{x_n \in G_n} f(a_1^{-1}x_1, \dots, a_{n-1}^{-1}x_{n-1}, x_n) \\
&= \int_{x_n \in G_n} \int_{x_1 \in G_1} \cdots \int_{x_{n-1} \in G_{n-1}} f(a_1^{-1}x_1, \dots, a_{n-1}^{-1}x_{n-1}, x_n) \\
&= \int_{x_n \in G_n} \int_{x_1 \in G_1} \cdots \int_{x_{n-1} \in G_{n-1}} f(a_1^{-1}x_1, \dots, a_{n-2}^{-1}x_{n-2}, x_{n-1}, x_n) \\
&= \cdots \\
&= \int_{x_1 \in G_1} \cdots \int_{x_n \in G_n} f(x_1, \dots, x_n),
\end{align*}
where at each step we use the fact that $\mu_i$ is left-invariant and then Theorem 4.157 and Fubini's theorem to change the order of integration (which is justified since all functions involved vanish outside a set of finite measure).

Let $G$ be an LCH group and let $H$ be a closed normal subgroup of $G$. If $\mu$ is a left Haar measure on $H$, then we can give an explicit formula for a left Haar measure on $G$ provided that we have a Haar measure for the quotient group $G/H$.

Lemma 4.185. For any $f \in C_c(G, \mathbb{R})$, there exists some $f_H \in C_c(G/H, \mathbb{R})$ such that
\[
f_H(xH) = \int_{y \in H} f(xy)\,d\mu
\]
for all $x \in G$.

Proof. Let $K = \operatorname{supp}(f)$ and define $\varphi : G \to \mathbb{R}$ by
\[
\varphi(x) = \int_{y \in H} f(xy)\,d\mu.
\]
For any $x_0 \in G$, Theorem 4.167 shows that if $\varepsilon > 0$ then there is a relatively compact neighborhood $U$ of $e$ such that $\|L_a f - f\|_\infty < \varepsilon$ for all $a \in U$, so for all $x \in U^{-1}x_0$ we have
\begin{align*}
\left| \int_{y \in H} f(xy)\,d\mu - \int_{y \in H} f(x_0 y)\,d\mu \right|
&\le \int_{y \in H} \left| f(x x_0^{-1} x_0 y) - f(x_0 y) \right| d\mu \\
&< \varepsilon\,\mu(x_0^{-1} U K).
\end{align*}
This shows that $\varphi$ is continuous. Also, $\varphi$ is constant on the left cosets of $H$ because $\mu$ is left-invariant. Therefore we have a continuous function $f_H : G/H \to \mathbb{R}$ satisfying
\[
f_H(xH) = \varphi(x) = \int_{y \in H} f(xy)\,d\mu
\]
for all $x \in G$. If $f_H(xH) \neq 0$ then $xy \in K$ for some $y \in H$, so $xH = (xy)H \in \pi(K)$ where $\pi : G \to G/H$ is the quotient map. Therefore $\operatorname{supp}(f_H) \subseteq \pi(K)$ is compact, i.e. $f_H \in C_c(G/H, \mathbb{R})$.

Lemma 4.186. The map $f \mapsto f_H$ is linear and surjective.

Proof. Linearity is clear. Choose some nonzero $h \in C_c(G/H, \mathbb{R})$ and let $L = \operatorname{supp}(h)$. By Theorem 4.169, there is a compact set $K \subseteq G$ such that $\pi(K) = L$. Choose some nonnegative $g \in C_c(G, \mathbb{R})$ such that $g > 0$ on $K$; then $g_H > 0$ on $L$. Define $r : G \to \mathbb{R}$ by $r(x) = h(\pi(x))/g_H(\pi(x))$ whenever $g_H(\pi(x)) > 0$ and $r(x) = 0$ whenever $g_H(\pi(x)) = 0$. Then $r$ is continuous and constant on the left cosets of $H$, so $f = rg \in C_c(G, \mathbb{R})$ satisfies $f_H = h$.

Theorem 4.187 (Quotient integral formula). Let $G$ be an LCH group, let $H$ be a closed normal subgroup of $G$, let $\mu$ be a left Haar measure on $H$, and let $\nu$ be a left Haar measure on $G/H$. Then the map $\lambda : C_c(G, \mathbb{R}) \to \mathbb{R}$ defined by
\[
\lambda f = \int_{G/H} f_H\,d\nu = \int_{xH \in G/H} \int_{y \in H} f(xy)\,d\mu\,d\nu
\]
is a left-invariant nonzero positive linear functional. The corresponding Radon measure (see Theorem 4.137) is a left Haar measure on $G$.

Proof. $\lambda$ is clearly a left-invariant positive linear functional. Lemma 4.186 shows that $\lambda$ is nonzero.
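A familiar instance of the quotient integral formula, given here only as an illustrative sketch: take $G = \mathbb{R}$ and $H = \mathbb{Z}$ with counting measure $\mu$, and identify $G/H = \mathbb{R}/\mathbb{Z}$ with $[0,1)$ carrying Lebesgue measure $\nu$. These identifications and normalizations are assumptions of the example. Then $f_H$ is the periodization of $f$, and the formula reads as follows.

```latex
\[
f_H(x + \mathbb{Z}) = \sum_{n \in \mathbb{Z}} f(x + n),
\qquad
\lambda f = \int_{[0,1)} \sum_{n \in \mathbb{Z}} f(x + n)\,dx
          = \sum_{n \in \mathbb{Z}} \int_{[n, n+1)} f(x)\,dx
          = \int_{\mathbb{R}} f\,dm.
\]
```

Because $f$ has compact support, only finitely many terms of the sum are nonzero on $[0,1)$, so exchanging the sum and the integral is unproblematic.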

4.11 Lebesgue measure

In this section, we will examine Haar measures on finite-dimensional vector spaces over $\mathbb{R}$. These spaces are all isomorphic to $\mathbb{R}^n$, and furthermore, there is only one choice of norm up to equivalence (see Theorem 1.39) and one natural topology. However, it is still valuable to state and prove most of the theorems in an abstract setting.

Let $\mathbf{E}$ be an $n$-dimensional vector space over $\mathbb{R}$ with the natural topology (induced by any norm). Haar measures on $\mathbf{E}$ are translation-invariant in the sense that $\mu(x + E) = \mu(E)$ for all $x \in \mathbf{E}$ and any Borel set $E$.


Theorem 4.188. Let $\mu$ be a Haar measure on $\mathbf{E}$. If $\mathbf{E} = \mathbf{E}_1 \oplus \cdots \oplus \mathbf{E}_k$ and $\mu_i$ is a Haar measure on $\mathbf{E}_i$, then there is a constant $c > 0$ such that for all $f \in \mathcal{L}^1(\mu)$ we have
\[
\int_{\mathbf{E}} f\,d\mu = c \int_{x_1 \in \mathbf{E}_1} \cdots \int_{x_k \in \mathbf{E}_k} f(x_1 + \cdots + x_k)\,d\mu_k \cdots d\mu_1.
\]

Proof. See Theorem 4.184 and Theorem 4.163.

Theorem 4.189. Let $\mu$ be a Haar measure on $\mathbf{E}$. If $\mathbf{F} \subset \mathbf{E}$ is a subspace with $\dim(\mathbf{F}) < \dim(\mathbf{E})$, then $\mu(\mathbf{F}) = 0$.

Proof. Choose a subspace $\mathbf{G}$ such that $\mathbf{E} = \mathbf{F} \oplus \mathbf{G}$, and choose Haar measures $\nu$ and $\lambda$ on $\mathbf{F}$ and $\mathbf{G}$ respectively. By Theorem 4.188, there is some constant $c > 0$ such that for all compact $K \subseteq \mathbf{F}$ we have
\[
\mu(K) = \int_{\mathbf{E}} \chi_K\,d\mu = c \int_{x \in \mathbf{G}} \int_{y \in \mathbf{F}} \chi_K(x + y)\,d\nu\,d\lambda \le c \int_{\mathbf{G}} \nu(K)\chi_{\{0\}}\,d\lambda = 0
\]
using the fact that $\lambda(\{0\}) = 0$ because $\mathbf{G}$ is not discrete. Since $\mathbf{F}$ is $\sigma$-compact, $\mu(\mathbf{F}) = 0$.

If $v_1, \dots, v_k \in \mathbf{E}$, the $k$-parallelotope spanned by $v_1, \dots, v_k$ is the compact set
\[
P(v_1, \dots, v_k) = \{ a_1 v_1 + \cdots + a_k v_k : 0 \le a_1, \dots, a_k \le 1 \}.
\]
For any $f \in L(\mathbf{E})$, it is clear by linearity that
\[
f(P(v_1, \dots, v_n)) = P(fv_1, \dots, fv_n).
\]

Theorem 4.190. Let $\mu$ be a Haar measure on $\mathbf{E}$. For any $f \in L(\mathbf{E})$ and any Borel set $E$,
\[
\mu(f(E)) = |\det(f)|\,\mu(E).
\]

Proof. If $f$ is not an isomorphism then $\dim(f(\mathbf{E})) < \dim(\mathbf{E})$, so Theorem 4.189 and Theorem 1.116 show that $\mu(f(E)) = 0 = |\det(f)|\,\mu(E)$ for all Borel sets $E$. Now suppose that $f$ is an isomorphism. The measure $\nu(E) = \mu(f(E))$ is a Haar measure on $\mathbf{E}$. It is translation-invariant because $f$ is linear, and it is nonzero because $f(E)$ is nonempty and open whenever $E$ is nonempty and open. Theorem 4.175 shows that $\nu = c\mu$ for some $c > 0$, so it remains to prove that $c = |\det(f)|$.


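We first prove the theorem for the case when $\mathbf{E}$ is one-dimensional, in which case we can assume that $\mathbf{E} = \mathbb{R}$ and $f(x) = ax$ for some $a > 0$. If $m$ and $n$ are positive integers then by translation invariance we have
\begin{align*}
n\,\mu\!\left(\left[0, \tfrac{m}{n}\right]\right)
&= \sum_{i=1}^n \mu\!\left(\left[\tfrac{(i-1)m}{n}, \tfrac{im}{n}\right]\right) = \mu([0, m]) \\
&= \sum_{i=1}^m \mu([i-1, i]) = m\,\mu([0, 1]),
\end{align*}
so
\[
\mu\!\left(\left[0, \tfrac{m}{n}\right]\right) = \tfrac{m}{n}\,\mu([0, 1]) \quad\text{and}\quad \mu\!\left(\left(0, \tfrac{m}{n}\right)\right) = \tfrac{m}{n}\,\mu([0, 1]).
\]
For sufficiently large integers $k$ we can find rational numbers $q_1, q_2$ such that
\[
0 < a - \tfrac{1}{k} < q_1 < a < q_2 < a + \tfrac{1}{k},
\]
so by inner and outer regularity on $[0, a]$ (see Theorem 4.138),
\[
q_1\,\mu([0, 1]) = \mu([0, q_1]) \le \mu([0, a]) \le \mu\!\left(\left(-\tfrac{1}{k}, q_2 + \tfrac{1}{k}\right)\right) = \left(q_2 + \tfrac{2}{k}\right)\mu([0, 1])
\]
and
\[
\left(a - \tfrac{1}{k}\right)\mu([0, 1]) \le \mu([0, a]) \le \left(a + \tfrac{3}{k}\right)\mu([0, 1]).
\]
Taking $k \to \infty$ shows that $\nu([0, 1]) = \mu(f([0, 1])) = \mu([0, a]) = a\,\mu([0, 1])$, and therefore $c = a = |\det(f)|$.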

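For the general case, suppose that $\mathbf{E}$ is $n$-dimensional and choose any inner product on $\mathbf{E}$. If $f$ is an isometric isomorphism, then $\nu(B) = \mu(f(B)) = \mu(B)$ where $B = \{x \in \mathbf{E} : |x| \le 1\}$, so $c = 1 = |\det(f)|$. If $f$ is diagonalizable, i.e. there is a decomposition $\mathbf{E} = \mathbf{E}_1 \oplus \cdots \oplus \mathbf{E}_n$ such that each $\mathbf{E}_i$ is one-dimensional and $f(\mathbf{E}_i) \subseteq \mathbf{E}_i$, then for each $i$ choose a Haar measure $\mu_i$ on $\mathbf{E}_i$, a nonzero vector $v_i \in \mathbf{E}_i$, and $\lambda_i \in \mathbb{R}$ such that $fv_i = \lambda_i v_i$. Theorem 4.188 shows that there is some constant $c' > 0$ such that
\begin{align*}
\mu(P(u_1, \dots, u_n))
&= \int_{\mathbf{E}} \chi_{P(u_1, \dots, u_n)}\,d\mu \\
&= c' \int_{x_1 \in \mathbf{E}_1} \cdots \int_{x_n \in \mathbf{E}_n} \chi_{P(u_1, \dots, u_n)}(x_1 + \cdots + x_n)\,d\mu_n \cdots d\mu_1 \\
&= c'\,\mu_1(P(u_1)) \cdots \mu_n(P(u_n))
\end{align*}
for all $u_1 \in \mathbf{E}_1, \dots, u_n \in \mathbf{E}_n$, so
\begin{align*}
\nu(P(v_1, \dots, v_n))
&= \mu(P(fv_1, \dots, fv_n)) = \mu(P(\lambda_1 v_1, \dots, \lambda_n v_n)) \\
&= c'\,\mu_1(P(\lambda_1 v_1)) \cdots \mu_n(P(\lambda_n v_n)) = |\lambda_1| \cdots |\lambda_n|\,\mu(P(v_1, \dots, v_n))
\end{align*}
since each $\mathbf{E}_i$ is one-dimensional. Therefore $c = |\lambda_1| \cdots |\lambda_n| = |\det(f)|$. Finally, for any $f \in L(\mathbf{E})$ we can use the singular value decomposition (see Theorem 5.138) to write $f = psq$ where $p, q \in L(\mathbf{E})$ are isometric isomorphisms and $s \in L(\mathbf{E})$ is diagonalizable, so
\[
\nu = |\det(p)|\,|\det(s)|\,|\det(q)|\,\mu = |\det(f)|\,\mu.
\]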

Given a norm on $\Lambda^n\mathbf{E}$, the ($n$-dimensional) Lebesgue measure on $\mathbf{E}$ is the unique Haar measure $m$ on $\mathbf{E}$ such that
\[
m(P(v_1, \dots, v_n)) = |v_1 \wedge \cdots \wedge v_n|
\]
for all $v_1, \dots, v_n \in \mathbf{E}$. (If $n = 0$ then we require that $m(\{0\}) = 1$.) The following theorem shows that this condition makes sense.

Theorem 4.191. Let $\mathbf{E}$ be an $n$-dimensional vector space and let $\mu$ be a Haar measure on $\mathbf{E}$. Assume that there is a norm on $\Lambda^n\mathbf{E}$.

1. $\mu(P(v_1, \dots, v_n)) = 0$ if and only if $v_1, \dots, v_n$ are linearly dependent.

2. If there are vectors $u_1, \dots, u_n$ such that
\[
\mu(P(u_1, \dots, u_n)) = |u_1 \wedge \cdots \wedge u_n| > 0,
\]
then
\[
\mu(P(v_1, \dots, v_n)) = |v_1 \wedge \cdots \wedge v_n|
\]
for all $v_1, \dots, v_n \in \mathbf{E}$.

Proof. For (1), if $v_1, \dots, v_n$ are linearly dependent then they are contained in a subspace $\mathbf{F} \subset \mathbf{E}$ (with $\dim(\mathbf{F}) < n$), and so is $P(v_1, \dots, v_n)$. Theorem 4.189 shows that $\mu(P(v_1, \dots, v_n)) \le \mu(\mathbf{F}) = 0$. If $v_1, \dots, v_n$ are linearly independent then the map $p : \mathbb{R}^n \to \mathbf{E}$ defined by $p(a_1, \dots, a_n) = a_1 v_1 + \cdots + a_n v_n$ is an isomorphism. Since $(0, 1)^n \subseteq \mathbb{R}^n$ is nonempty and open, $p((0, 1)^n)$ is also nonempty and open. But $p((0, 1)^n) \subseteq P(v_1, \dots, v_n)$ and $\mu(p((0, 1)^n)) > 0$ by Theorem 4.170, so $\mu(P(v_1, \dots, v_n)) > 0$.

For (2), if $v_1, \dots, v_n$ are linearly dependent then (1) shows that $\mu(P(v_1, \dots, v_n)) = 0$ and Corollary 1.113 shows that $v_1 \wedge \cdots \wedge v_n = 0$. Suppose that $v_1, \dots, v_n$ are linearly independent. Corollary 1.113 shows that $u_1, \dots, u_n$ are linearly independent, so there is an isomorphism $f \in L(\mathbf{E})$ such that $fu_i = v_i$ for $i = 1, \dots, n$. Then
\begin{align*}
\mu(P(v_1, \dots, v_n))
&= \mu(P(fu_1, \dots, fu_n)) = \mu(f(P(u_1, \dots, u_n))) \\
&= |\det(f)|\,\mu(P(u_1, \dots, u_n)) = |\det(f)\,u_1 \wedge \cdots \wedge u_n| \\
&= |fu_1 \wedge \cdots \wedge fu_n| = |v_1 \wedge \cdots \wedge v_n|.
\end{align*}

The preceding theorem shows that choosing a normalization constant for the Haar measure on $\mathbf{E}$ is equivalent to choosing a norm on $\Lambda^n\mathbf{E}$. Given a norm on $\Lambda^n\mathbf{E}$, there is a corresponding Haar measure. Conversely, if $\mu$ is a Haar measure on $\mathbf{E}$ then we can define a norm on $\Lambda^n\mathbf{E}$ by setting $|v_1 \wedge \cdots \wedge v_n| = \mu(P(v_1, \dots, v_n))$ for all $v_1, \dots, v_n \in \mathbf{E}$.

Any inner product on $\mathbf{E}$ induces a norm on $\Lambda^n\mathbf{E}$ satisfying $|e_1 \wedge \cdots \wedge e_n| = |e_1| \cdots |e_n|$ whenever $e_1, \dots, e_n$ are orthogonal, so $m(P(e_1, \dots, e_n)) = 1$ whenever $\{e_1, \dots, e_n\}$ is an orthonormal basis for $\mathbf{E}$. In $\mathbb{R}^n$, this means that the measure of the unit $n$-cube $[0, 1]^n$ is 1.
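As a small worked example in $\mathbb{R}^2$ with the standard inner product (the vectors below are arbitrary illustrative choices), Theorem 4.190 applied to the linear map $f$ with $fe_1 = v_1$ and $fe_2 = v_2$ recovers the usual determinant formula for the area of a parallelogram.

```latex
\[
v_1 = (2, 1),\quad v_2 = (1, 3),\qquad
m(P(v_1, v_2)) = m\big(f([0,1]^2)\big)
= |\det(f)|\, m([0,1]^2)
= \left| \det \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} \right| \cdot 1 = 5.
\]
```

Here the columns of the matrix are $v_1$ and $v_2$, and $[0,1]^2 = P(e_1, e_2)$ has measure 1 by the normalization just described.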

The regulated integral

In Section 1.8 we defined the regulated integral for functions defined on an interval $[a, b]$. We want to show that the regulated functions are in $\mathcal{L}^1([a, b], m, \mathbf{F})$, and that the regulated integral agrees with the Lebesgue integral for these functions.

Theorem 4.192. Let $m$ be the Lebesgue measure on $\mathbb{R}$. If $f : [a, b] \to \mathbf{F}$ is a regulated function then $f \in \mathcal{L}^1([a, b], m, \mathbf{F})$ and
\[
\int_{[a,b]} f\,dm = \int_a^b f,
\]
where the integral on the right is the regulated integral.

Proof. By definition, there is a sequence $\{f_n\}$ of step maps converging uniformly to $f$. Each $f_n$ is a simple integrable map and
\[
\int_{[a,b]} |f_m - f_n|\,dm \le (b - a)\,\|f_m - f_n\|_\infty \to 0
\]
as $m, n \to \infty$, so $\{f_n\}$ is $L^1$-Cauchy. By Theorem 4.61, $f \in \mathcal{L}^1([a, b], m, \mathbf{F})$ and $\int_{[a,b]} f_n\,dm \to \int_{[a,b]} f\,dm$. It is easy to see from the definition of the regulated integral that $\int_{[a,b]} f_n\,dm = \int_a^b f_n$, so $\int_a^b f_n \to \int_{[a,b]} f\,dm$. But $\int_a^b f_n \to \int_a^b f$, so $\int_{[a,b]} f\,dm = \int_a^b f$.

Change of variables

Let $\mathbf{E}$ be a finite-dimensional normed vector space over $\mathbb{R}$ and let $m$ be any Lebesgue measure on $\mathbf{E}$. The following proof is based on [10].

Theorem 4.193 (Image measure under a diffeomorphism). Let $U, V \subseteq \mathbf{E}$ be open sets and let $g : U \to V$ be a $C^1$ diffeomorphism. Then for all compact $K \subseteq V$, the image measure $g_* m$ satisfies
\[
(g_* m)(E) = \int_E \left| \det(Dg^{-1}) \right| dm
\]
for all Borel sets $E \subseteq K$. That is, $g_* m$ (restricted to $K$) is equal to the indefinite integral $m_{|\det(Dg^{-1})|}$.

Proof. Theorem 4.190 proves the result when $g$ is a linear map, and in this case Theorem 4.69 and Theorem 4.104 show that
\[
\int_{g(K)} f\,dm = \int_K f \circ g\,d(g^{-1}_* m) = \int_K (f \circ g)\,|\det(Dg)|\,dm \tag{*}
\]
whenever $f \in \mathcal{L}^1(g(K), m, \mathbb{R})$ and $K \subseteq U$ is compact.

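Choose any nonnegative function $w \in C_c(\mathbf{E}, \mathbb{R})$ such that $\int_{\mathbf{E}} w\,dm = 1$, let $L = \operatorname{supp}(w)$, and let $w_r(x) = r^{-n} w(x/r)$ where $n = \dim(\mathbf{E})$. Define $\varphi_r : U \to \mathbb{R}$ by
\[
\varphi_r(x) = \int_{v \in V} w_r(g^{-1}(v) - x).
\]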

We first prove that for all compact $K \subseteq U$,
\[
\lim_{r \to 0^+} \varphi_r(x) = |\det(Dg(x))| \tag{**}
\]
for all $x \in K$. Choose $\delta > 0$ small enough so that $K + \delta L = \{x + \delta y : x \in K, y \in L\} \subseteq U$. Since $K + \delta L$ is compact, Corollary 2.18 shows that there is some $M$ such that $|g(x) - g(y)| \le M|x - y|$ for all $x, y \in K + \delta L$. In particular, we can choose a bounded neighborhood $W$ of $0 \in \mathbf{E}$ with $L \subseteq W$ such that $r^{-1}(g(x + rL) - g(x)) \subseteq W$ for all $x \in K$ and $0 < r < \delta$. Using the fact that $w$ vanishes outside $L$ and using (*) with the transformation $v \mapsto rv$, we have
\begin{align*}
\varphi_r(x)
&= \int_{v \in V} r^{-n}\, w\!\left( \frac{g^{-1}(v) - x}{r} \right) \\
&= \int_{v \in g(x + rL) - g(x)} r^{-n}\, w\!\left( \frac{g^{-1}(g(x) + v) - g^{-1}(g(x))}{r} \right) \\
&= \int_{v \in W} w\!\left( \frac{g^{-1}(g(x) + rv) - g^{-1}(g(x))}{r} \right)
\end{align*}
for all $0 < r < \delta$. Since $g^{-1}$ is differentiable at $g(x)$, $w$ is continuous and $W$ has finite measure, Theorem 4.66 shows that
\[
\lim_{r \to 0^+} \varphi_r(x) = \int_{v \in W} w(Dg^{-1}(g(x))v).
\]
Applying (*) again with the transformation $v \mapsto [Dg^{-1}(g(x))]^{-1}v$ gives
\begin{align*}
\lim_{r \to 0^+} \varphi_r(x) &= \left| \det([Dg^{-1}(g(x))]^{-1}) \right| \int_{v \in W} w(v) \\
&= |\det(Dg(x))|.
\end{align*}

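Next, we prove that
\[
\int_U f \circ g = \int_V f \left| \det(Dg^{-1}) \right| \tag{***}
\]
for all $f \in C_c(V, \mathbb{R})$. Let $K = \operatorname{supp}(f)$ and let
\[
I_r = \int_{u \in U} \int_{v \in V} f(g(u)) \left| \det(Dg^{-1}(g(u))) \right| w_r(g^{-1}(v) - u)
\]
for $0 < r < \delta$, where $\delta$ is chosen so that $g^{-1}(K) + \delta L \subseteq U$. By Fubini's theorem,
\[
I_r = \int_{u \in U} f(g(u)) \left| \det(Dg^{-1}(g(u))) \right| \varphi_r(u).
\]
Since $|\varphi_r| \le \|w\|_\infty\, m(W)$ and $u \mapsto f(g(u)) \left| \det(Dg^{-1}(g(u))) \right|$ is continuous, we can apply Theorem 4.66 and (**) to get
\[
\lim_{r \to 0^+} I_r = \int_{u \in U} f(g(u)) \left| \det(Dg^{-1}(g(u))) \right| |\det(Dg(u))| = \int_U f \circ g.
\]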


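On the other hand, if we define $h : \mathbf{E} \to \mathbb{R}$ by $h(u) = f(g(u)) \left| \det(Dg^{-1}(g(u))) \right|$ for $u \in g^{-1}(K)$ and $h(u) = 0$ for $u \notin g^{-1}(K)$ then $h \in C_c(\mathbf{E}, \mathbb{R})$, and Fubini's theorem implies that
\begin{align*}
I_r &= \int_{v \in V} \int_{u \in \mathbf{E}} h(u)\, w_r(g^{-1}(v) - u) \\
&= \int_{v \in V} \int_{u \in \mathbf{E}} h(g^{-1}(v) + u)\, w_r(-u) \\
&= \int_{v \in V} \int_{u \in \mathbf{E}} h(g^{-1}(v) - ru)\, w(u),
\end{align*}
where we have used (*) with the transformation $u \mapsto -ru$. Since $h$ is uniformly continuous,
\begin{align*}
\left| \int_{u \in \mathbf{E}} h(g^{-1}(v) - ru)\, w(u) - h(g^{-1}(v)) \right|
&\le \int_{u \in \mathbf{E}} \left| h(g^{-1}(v) - ru) - h(g^{-1}(v)) \right| w(u) \\
&\le \sup_{u \in L} \left| h(g^{-1}(v) - ru) - h(g^{-1}(v)) \right| \\
&\to 0
\end{align*}
as $r \to 0$. By Theorem 4.66,
\[
\lim_{r \to 0^+} I_r = \int_{v \in V} h(g^{-1}(v)) = \int_V f \left| \det(Dg^{-1}) \right|.
\]

Finally, let $K \subseteq V$ be compact, choose a relatively compact open set $W \supseteq K$ such that $\overline{W} \subseteq V$, and let $M = \sup_{v \in W} \left| \det(Dg^{-1}(v)) \right|$. Let $E \subseteq K$ be a Borel set and let $\varepsilon > 0$. By Theorem 4.140, there is a compact $F \subseteq E$ and an open $Y \supseteq E$ with $Y \subseteq W$ such that $m(Y \setminus F) < \varepsilon$. Similarly, there is a compact $F' \subseteq g^{-1}(E)$ and an open $Y' \supseteq g^{-1}(E)$ with $Y' \subseteq g^{-1}(W)$ such that $m(Y' \setminus F') < \varepsilon$. Let $G = F \cup g(F')$ and $Z = Y \cap g(Y')$. Then $G$ is compact, $Z$ is open, $G \subseteq E \subseteq Z \subseteq W$, $m(Z \setminus G) < \varepsilon$, and $m(g^{-1}(Z \setminus G)) < \varepsilon$. By Urysohn's lemma we can choose some $f \in C_c(V, \mathbb{R})$ such that $\chi_G \le f \le \chi_Z$ and therefore $|f - \chi_E| \le \chi_{Z \setminus G}$. Using (***),
\begin{align*}
\left| m(g^{-1}(E)) - \int_E \left| \det(Dg^{-1}) \right| \right|
&= \left| \int_U \chi_E \circ g - \int_E \left| \det(Dg^{-1}) \right| \right| \\
&= \left| \int_U f \circ g + \int_U (\chi_E - f) \circ g - \int_E \left| \det(Dg^{-1}) \right| \right| \\
&\le \int_V |f - \chi_E| \left| \det(Dg^{-1}) \right| + \int_U |f - \chi_E| \circ g \\
&< m(Z \setminus G)\,M + m(g^{-1}(Z \setminus G)) \\
&< \varepsilon(M + 1).
\end{align*}
This completes the proof.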

Theorem 4.194 (Change of variables formula). Let $U, V \subseteq \mathbf{E}$ be open sets and let $g : U \to V$ be a $C^1$ diffeomorphism. Let $f \in \mathcal{L}^1(V, m, \mathbf{F})$. Then $(f \circ g)\,|\det(Dg)| \in \mathcal{L}^1(U, m, \mathbf{F})$ and
\[
\int_V f = \int_U (f \circ g)\,|\det(Dg)|.
\]

Proof. Theorem 4.69 shows that $f \circ g \in \mathcal{L}^1(U, g^{-1}_* m, \mathbf{F})$ and
\[
\int_V f\,dm = \int_U f \circ g\,d(g^{-1}_* m).
\]
It remains to show that $h\,|\det(Dg)| \in \mathcal{L}^1(U, m, \mathbf{F})$ and
\[
\int_U h\,d(g^{-1}_* m) = \int_U h\,|\det(Dg)|\,dm
\]
for all $h \in \mathcal{L}^1(U, g^{-1}_* m, \mathbf{F})$. This is true for compact subsets of $U$, due to Theorem 4.193 and Theorem 4.104. We can write $U = \bigcup_{n=1}^\infty K_n$ where $\{K_n\}$ is an increasing sequence of compact sets, so we can use Theorem 4.63 on the increasing sequence of functions $\chi_{K_n} |h|\,|\det(Dg)|$ to show that $h\,|\det(Dg)|$ is integrable and then Theorem 4.66 on $\chi_{K_n} h\,|\det(Dg)|$ to prove the equality.
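A standard illustration of Theorem 4.194 is the passage to polar coordinates in $\mathbb{R}^2$. The sets $U$, $V$ and the map $g$ below are the usual choices and are assumptions of this sketch. The half-line removed from $\mathbb{R}^2$ has measure zero (it lies in a one-dimensional subspace, cf. Theorem 4.189), so it does not affect the integral.

```latex
% Assumed setup:
%   U = (0,\infty) \times (0, 2\pi),
%   V = \mathbb{R}^2 \setminus ([0,\infty) \times \{0\}),
%   g(r, \theta) = (r\cos\theta, r\sin\theta), a C^1 diffeomorphism U \to V.
\[
Dg(r,\theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},
\qquad |\det(Dg(r,\theta))| = r,
\]
\[
\int_{\mathbb{R}^2} e^{-|x|^2}\,dm(x)
= \int_{V} e^{-|x|^2}\,dm(x)
= \int_0^{2\pi}\!\!\int_0^\infty e^{-r^2}\, r\,dr\,d\theta
= 2\pi \cdot \tfrac{1}{2} = \pi.
\]
```

The iterated integral on the right uses Fubini's theorem on $U$ in addition to the change of variables formula.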

Sets of measure zero

Let $\mathbf{E}$ and $\mathbf{F}$ be finite-dimensional normed vector spaces with Lebesgue measures $m_{\mathbf{E}}$ and $m_{\mathbf{F}}$. Recall that $\overline{m}_{\mathbf{E}}$ is the completion of $m_{\mathbf{E}}$.

In general, for continuous maps $f : \mathbf{E} \to \mathbf{E}$ it is not true that $f(E)$ has measure zero when $E$ has measure zero. It is easy to construct a counterexample from a space-filling curve, of which there are many. However, if $f$ is differentiable then it is true that $f(E)$ has measure zero when $E$ has measure zero. To be precise, we should say that $f(E)$ has $\overline{m}_{\mathbf{E}}$-measure zero since $f(E)$ might not be a Borel set.

Lemma 4.195. If $E \subseteq \mathbf{E}$ has measure zero, then for all $\delta > 0$ and $\varepsilon > 0$ there is a sequence of open balls $\{B_{r_n}(x_n)\}$ covering $E$ such that $x_n \in E$ and $0 < r_n < \delta$ for each $n$, and $\sum_{n=1}^\infty m_{\mathbf{E}}(B_{r_n}(x_n)) < \varepsilon$.

Proof. Let $d = \dim(\mathbf{E})$ and let $\mathcal{R}$ be the ring consisting of finite disjoint unions of sets of the form $U \setminus V$ where $U, V$ are bounded open sets in $\mathbf{E}$ with $\operatorname{diam} U < \delta$ (see the discussion on Fubini's theorem in Section 4.8 for a similar construction). Using the fact that $\mathbf{E}$ is separable and Corollary 4.23, we have a sequence of open balls $\{B_{r_n}(x_n)\}$ covering $E$ such that $x_n \in \mathbf{E}$ and $0 < r_n < \delta/2$ for each $n$, and $\sum_{n=1}^\infty m_{\mathbf{E}}(B_{r_n}(x_n)) < \varepsilon/2^d$. We can assume that each $B_{r_n}(x_n)$ contains at least one point $p_n$ of $E$. Then $B_{2r_n}(p_n) \supseteq B_{r_n}(x_n)$, so $\{B_{2r_n}(p_n)\}$ covers $E$ and satisfies the desired conditions.

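Theorem 4.196. Let $U \subseteq \mathbf{E}$ be open and let $f : U \to \mathbf{E}$ be differentiable. If $E \subseteq U$ has $\overline{m}_{\mathbf{E}}$-measure zero, then $f(E)$ has $\overline{m}_{\mathbf{E}}$-measure zero.

Proof. Let $d = \dim(\mathbf{E})$. For any positive integers $p$ and $q$, let $F_{p,q}$ be the set of all $x \in E$ such that $|f(y) - f(x)| \le p|y - x|$ for all $y \in B_{1/q}(x) \cap E$. Given $p$ and $q$, let $\varepsilon > 0$ and choose a sequence of open balls $\{B_{r_n}(x_n)\}$ covering $E$ such that $x_n \in E$ and $r_n < 1/q$ for all $n$, and $\sum_{n=1}^\infty m_{\mathbf{E}}(B_{r_n}(x_n)) < \varepsilon$ (using Lemma 4.195). If $x \in F_{p,q} \cap B_{r_n}(x_n)$ then $|x_n - x| < r_n < 1/q$, so $|f(x_n) - f(x)| \le p|x_n - x| < pr_n$. Therefore $f(F_{p,q} \cap B_{r_n}(x_n)) \subseteq B_{pr_n}(f(x_n))$ and $f(F_{p,q}) \subseteq \bigcup_{n=1}^\infty B_{pr_n}(f(x_n))$. But
\[
\overline{m}_{\mathbf{E}}(f(F_{p,q})) \le \sum_{n=1}^\infty m_{\mathbf{E}}(B_{pr_n}(f(x_n))) < p^d \varepsilon,
\]
so taking $\varepsilon \to 0$ shows that $f(F_{p,q})$ has $\overline{m}_{\mathbf{E}}$-measure zero. Since $f$ is differentiable we have $E = \bigcup_{p,q \ge 1} F_{p,q}$, and therefore $f(E)$ has $\overline{m}_{\mathbf{E}}$-measure zero.

Corollary 4.197. Let $U \subseteq \mathbf{E}$ be open and let $f : U \to \mathbf{F}$ be differentiable. If $\dim(\mathbf{E}) < \dim(\mathbf{F})$ then $f(U)$ has $\overline{m}_{\mathbf{F}}$-measure zero.

Proof. Choose any subspace $\mathbf{F}_0$ of $\mathbf{F}$ that is isomorphic to $\mathbf{E}$, and let $\varphi : \mathbf{F}_0 \to \mathbf{E}$ be the isomorphism. Choose a subspace $\mathbf{F}_1$ such that $\mathbf{F} = \mathbf{F}_0 \oplus \mathbf{F}_1$ and let $\pi_0 : \mathbf{F} \to \mathbf{F}_0$ and $\pi_1 : \mathbf{F} \to \mathbf{F}_1$ be the projections. Then $V = \pi_0^{-1}(\varphi^{-1}(U))$ is open in $\mathbf{F}$ and the map $g : V \to \mathbf{F}$ defined by $g(x) = f(\varphi(\pi_0(x)))$ is differentiable. By Theorem 4.189, $E = \{x \in V : \pi_1(x) = 0\}$ has $m_{\mathbf{F}}$-measure zero since it is contained in the subspace $\mathbf{F}_0$, and Theorem 4.196 shows that $f(U) = g(E)$ has $\overline{m}_{\mathbf{F}}$-measure zero.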


5 Commutative Banach algebras

In this chapter, all algebras are assumed to be associative and over the complex numbers.

Notes

The first three sections are based on [8, 13]. While some of the proofs in [13] use Runge's theorem (for approximating a holomorphic function by a sequence of rational functions), we have avoided its use here. For example, the multiplicativity of the holomorphic functional calculus (see Theorem 5.13) is usually proved by approximating holomorphic functions $f$ and $g$ by sequences $\{f_n\}$ and $\{g_n\}$ of rational functions, and noting that multiplicativity holds for $f_n$ and $g_n$. Our proof relies on Lemma 3.36, which is a version of Fubini's theorem for line integrals that is ultimately based on differentiation under the integral sign (Theorem 2.37). As another example, Corollary 5.44 is usually proved by approximating $\sqrt{\cdot} : \mathbb{C} \setminus (-\infty, 0] \to \mathbb{C}$ by polynomials, and then showing that the coefficients can be chosen to be real. Our proof relies on Theorem 5.11, which allows us to exploit the symmetry of $\mathbb{C} \setminus (-\infty, 0]$ about the $x$-axis.

In Section 5.4 we develop the theory of positive operator-valued measures in a similar fashion to [1], and this provides a useful foundation on which to build the different versions of the spectral theorem. In Section 5.6, we state the spectral theorem for a real Hilbert space $\mathbf{E}$ via the complexification of $\mathbf{E}$. This material is fairly original and addresses both self-adjoint and normal operators; a different approach can be found in [6]. Many of the proofs in Section 5.8 are from [14].

Section 5.9 is based on [12], but we first give the dual group $\widehat{G}$ the compact open topology and then show that there is a natural homeomorphism between $\widehat{G}$ and the Gelfand space of $L^1(G)$, as in [8].

5.1 The spectrum

In Section 1.7, we discussed the basic properties of normed algebras. Here we will focus on the spectrum of elements in a complex unital algebra, and we will define the holomorphic functional calculus.

Let $A$ be a unital algebra and let $x \in A$. Let $G(A)$ be the set of all invertible elements in $A$ (which clearly forms a group under multiplication). If $A$ is a unital Banach algebra then $G(A)$ is open in $A$ due to Theorem 1.129. We define the spectrum of $x$ in $A$, denoted by $\sigma_A(x)$, to be the set of numbers $\lambda \in \mathbb{C}$ such that $\lambda e - x$ is not invertible in $A$. That is,
\[
\sigma_A(x) = \{ \lambda \in \mathbb{C} : \lambda e - x \notin G(A) \}.
\]
The resolvent set of $x$ is $\mathbb{C} \setminus \sigma_A(x)$. The spectral radius of $x$ is
\[
\rho_A(x) = \sup_{\lambda \in \sigma_A(x)} |\lambda|.
\]
When it is not confusing to do so, we will write $\sigma(x)$ instead of $\sigma_A(x)$ and $\rho(x)$ instead of $\rho_A(x)$.

Theorem 5.1 (Properties of the spectrum). Let $A$ be a unital algebra and let $x, y \in A$.

1. $\sigma(0) = \{0\}$ and $\sigma(e) = \{1\}$.

2. $\sigma(rx) = r\sigma(x)$ for all $r \in \mathbb{C}$.

3. $\sigma(xy) \cup \{0\} = \sigma(yx) \cup \{0\}$, i.e. $\sigma(xy) \subseteq \sigma(yx) \cup \{0\}$. If $x$ is invertible then $\sigma(xy) = \sigma(yx)$.

Proof. (1) and (2) are obvious. For (3), if $\lambda \neq 0$ and $\lambda e - yx$ is invertible then it is easy to check that $\lambda^{-1}(e + x(\lambda e - yx)^{-1}y)$ is the inverse of $\lambda e - xy$. Therefore $\sigma(xy) \subseteq \sigma(yx) \cup \{0\}$. If $x$ is invertible and $yx$ is invertible then $xy = x(yx)x^{-1}$ must be invertible, so $0 \in \sigma(xy)$ implies that $0 \in \sigma(yx)$. Therefore $\sigma(xy) \subseteq \sigma(yx)$.
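The computation behind part (3) is short enough to write out. Writing $R = (\lambda e - yx)^{-1}$ (a shorthand introduced only for this check), one side of the verification reads:

```latex
\begin{align*}
(\lambda e - xy)\,\lambda^{-1}(e + xRy)
&= \lambda^{-1}\big(\lambda e - xy + \lambda xRy - xyxRy\big) \\
&= \lambda^{-1}\big(\lambda e - xy + x(\lambda e - yx)Ry\big) \\
&= \lambda^{-1}\big(\lambda e - xy + xy\big) = e.
\end{align*}
```

The product in the other order is handled in the same way, using $R(\lambda e - yx) = e$.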

Theorem 5.2 (Spectral mapping theorem for polynomials). Let $x \in A$ and $p(z) \in \mathbb{C}[z]$. Then

\[ \sigma(p(x)) = p(\sigma(x)). \]

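Proof. It is easy to see that

\[ p(\lambda)e - p(x) = (\lambda e - x)q(x) = q(x)(\lambda e - x) \]

for some $q(z) \in \mathbb{C}[z]$, so if $\lambda \in \sigma(x)$ then $\lambda e - x$ is a zero divisor, $p(\lambda)e - p(x)$ is a zero divisor and $p(\lambda) \in \sigma(p(x))$. This shows that $p(\sigma(x)) \subseteq \sigma(p(x))$. Conversely, let $\lambda \in \sigma(p(x))$ and write

\[ p(z) - \lambda = (z - \lambda_1)^{d_1} \cdots (z - \lambda_n)^{d_n}. \]

Then

\[ p(x) - \lambda e = (x - \lambda_1 e)^{d_1} \cdots (x - \lambda_n e)^{d_n}, \]

and since $\lambda e - p(x)$ is not invertible, one of the factors $x - \lambda_i e$ is not invertible. We have $p(\lambda_i) - \lambda = 0$ and $\lambda_i \in \sigma(x)$, so $\lambda = p(\lambda_i) \in p(\sigma(x))$. This shows that $\sigma(p(x)) \subseteq p(\sigma(x))$.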
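As an illustration of Theorem 5.2, the sketch below compares the spectrum of $p(x)$ with the image of the spectrum of $x$ under $p$. It assumes $A$ is the algebra of $2 \times 2$ complex matrices, where the spectrum is the set of eigenvalues; the function names and the sample matrix are hypothetical choices for the example.

```python
import cmath

def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def eigenvalues(a):
    # Roots of the characteristic polynomial z^2 - tr(a) z + det(a).
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]

def poly_scalar(coeffs, z):
    # Horner evaluation; coeffs are listed from the highest degree down.
    acc = 0
    for c in coeffs:
        acc = acc * z + c
    return acc

def poly_matrix(coeffs, x):
    # Horner evaluation in the algebra: constants act as multiples of the unit.
    acc = ((0, 0), (0, 0))
    for c in coeffs:
        acc = mul(acc, x)
        acc = ((acc[0][0] + c, acc[0][1]), (acc[1][0], acc[1][1] + c))
    return acc

def same_set(u, v, tol=1e-9):
    return (all(min(abs(a - b) for b in v) < tol for a in u)
            and all(min(abs(a - b) for a in u) < tol for b in v))

x = ((1, 2), (3, 4))
p = [1, -3, 2]                       # p(z) = z^2 - 3z + 2
spectrum_of_p_x = eigenvalues(poly_matrix(p, x))
p_of_spectrum = [poly_scalar(p, lam) for lam in eigenvalues(x)]
agree = same_set(spectrum_of_p_x, p_of_spectrum)
```

The two lists are compared as sets up to a small tolerance, since the eigenvalues are computed in floating point.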

We have not yet established whether the spectrum of an element is nonempty or bounded. We will do this for Banach algebras later, but first we will examine the finite-dimensional case.

The spectrum of an algebraic element

An element $x \in A$ is called algebraic if there is a nonzero polynomial $p(z) \in \mathbb{C}[z]$ such that $p(x) = 0$. It is clear that $x$ is algebraic if and only if the subalgebra generated by $x$ is finite-dimensional. If $x$ is algebraic, the monic polynomial $m_x(z)$ of smallest degree such that $m_x(x) = 0$ is called the minimal polynomial of $x$. If $A$ is finite-dimensional then any element $x \in A$ is algebraic, since

\[ e, x, x^2, \dots \]

is linearly dependent.

Lemma 5.3. Let x ∈ A be algebraic. The following are equivalent:

1. x is invertible.

2. x is not a zero divisor.

3. The constant term of $m_x(z)$ is nonzero.

Proof. (1) ⇒ (2) is clear. Suppose that (2) holds. If $m_x(z) = zp(z)$ then $xp(x) = m_x(x) = 0$ and $p(x) = 0$ since $x$ is not a zero divisor, which contradicts the minimality of the degree of $m_x(z)$. This proves (2) ⇒ (3). Suppose that $m_x(z) = a_0 + \cdots + a_{n-1}z^{n-1} + z^n$ where $a_0 \neq 0$. Then

\[ e = \left[-a_0^{-1}(a_1 e + \cdots + a_{n-1}x^{n-2} + x^{n-1})\right]x = x\left[-a_0^{-1}(a_1 e + \cdots + a_{n-1}x^{n-2} + x^{n-1})\right], \]

so $x$ is invertible. This proves (3) ⇒ (1).



Theorem 5.4. If $x \in A$ is algebraic, then $\sigma(x) = m_x^{-1}(0)$. That is, $\lambda \in \mathbb{C}$ is in the spectrum of $x$ if and only if it is a root of $m_x(z)$.

Proof. If $\lambda e - x$ is not invertible, then $m_{\lambda e - x}(z) = zp(z)$ for some $p(z) \in \mathbb{C}[z]$. Since $q(z) = m_x(\lambda - z)$ satisfies $q(\lambda e - x) = 0$, the minimal polynomial $m_{\lambda e - x}(z) = zp(z)$ divides $m_x(\lambda - z)$. Setting $z = 0$ gives $m_x(\lambda) = 0$. Conversely, suppose that $m_x(\lambda) = 0$. Then $m_x(z) = (\lambda - z)p(z)$ for some $p(z) \in \mathbb{C}[z]$, so $0 = (\lambda e - x)p(x)$ and $\lambda e - x$ is not invertible since it is a zero divisor.

Corollary 5.5. The spectrum of any algebraic element in $A$ is a nonempty finite set.

Proof. Every non-constant polynomial in $\mathbb{C}[z]$ has at least one root, and there are finitely many of them.

The spectrum in a Banach algebra

Lemma 5.6. Let $A$ be a normed algebra. If $x \in A$, then $\lim_{n\to\infty} |x^n|^{1/n}$ exists and is equal to $\inf_{n \ge 1} |x^n|^{1/n}$.

Proof. We can assume that $x^n \neq 0$ for all $n \ge 1$. Let $c_0 = 1$ and $c_n = |x^n|$ for $n \ge 1$. Then $c_{m+n} \le c_m c_n$ for all $m, n \ge 1$. If $m$ is fixed, we can write $n$ in the form $n = q(n)m + r(n)$ where $0 \le r(n) < m$, so that

\[ c_n^{1/n} \le c_m^{q(n)/n}\, c_{r(n)}^{1/n}. \]

But $\lim_{n\to\infty} q(n)/n = 1/m$ and $r(n) < m$, so

\[ \limsup_{n\to\infty} c_n^{1/n} \le c_m^{1/m}. \]

Since this holds for all $m \ge 1$, we have

\[ \limsup_{n\to\infty} c_n^{1/n} \le \inf_{n \ge 1} c_n^{1/n} \le \liminf_{n\to\infty} c_n^{1/n}. \]

Theorem 5.7 (Properties of the spectrum in a Banach algebra). Let $A$ be a unital Banach algebra and let $x \in A$.

1. $\sigma(x)$ is a closed and bounded set in $\mathbb{C}$. If $\lambda \in \sigma(x)$, then $|\lambda| \le |x|$.

2. $\sigma(x)$ is nonempty (cf. Corollary 5.5).

3. $\rho(x) = \lim_{n\to\infty} |x^n|^{1/n} = \inf_{n \ge 1} |x^n|^{1/n} \le |x|$.

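Proof. For (1), the map $\varphi : \mathbb{C} \to A$ given by $\lambda \mapsto \lambda e - x$ is continuous, so $\varphi^{-1}(G(A))$ is open. Therefore $\sigma(x) = \varphi^{-1}(A \setminus G(A)) = \mathbb{C} \setminus \varphi^{-1}(G(A))$ is closed. If $|\lambda| > |x|$ then $|\lambda^{-1}x| < 1$, so $e - \lambda^{-1}x$ is invertible and $\lambda e - x = \lambda(e - \lambda^{-1}x)$ is also invertible.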

For (2), suppose that $\sigma(x)$ is empty. Then the map $f : \mathbb{C} \to A$ given by $z \mapsto (ze - x)^{-1}$ is entire (see Theorem 2.6). If $|z| > |x|$ then $ze - x = z(e - z^{-1}x)$ is invertible and

\[ \left|(ze - x)^{-1}\right| = |z|^{-1}\left|(e - z^{-1}x)^{-1}\right| \le \frac{|z|^{-1}}{1 - |z^{-1}x|} \to 0 \]

as $|z| \to \infty$ by Theorem 1.129. By Liouville's theorem (Corollary 3.9), $f = 0$. But $f(z) = (ze - x)^{-1} \neq 0$ for any $z \in \mathbb{C}$, which is a contradiction.

For (3), Lemma 5.6 shows that $L = \lim_{n\to\infty} |x^n|^{1/n} = \inf_{n\ge 1} |x^n|^{1/n}$. Let $z \in \mathbb{C}$ and suppose that $|z| > L$. Since $|z^{-1}| < L^{-1}$, Theorem 1.136 shows that the series $\sum_{n=0}^{\infty} (z^{-1}x)^n = (e - z^{-1}x)^{-1}$ converges. Therefore $z \notin \sigma(x)$, which shows that $\rho(x) \le L$. Conversely, suppose that $|z| < \rho(x)^{-1}$. Then $z^{-1} \notin \sigma(x)$, so $e - zx = z(z^{-1}e - x)$ is invertible. Therefore the function $g(z) = (e - zx)^{-1}$ is holomorphic on $\{z \in \mathbb{C} : |z| < \rho(x)^{-1}\}$, and Theorem 3.7 shows that

\[ g(z) = \sum_{n=0}^{\infty} x^n z^n \]

has a radius of convergence of at least $\rho(x)^{-1}$. By Theorem 1.136 we have $L^{-1} \ge \rho(x)^{-1}$, i.e. $L \le \rho(x)$.
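The spectral radius formula in part (3) can be observed numerically. The sketch below assumes $A$ is the algebra of $2 \times 2$ real matrices with the maximum absolute row sum as its (submultiplicative) norm; the matrix and the names `norm` and `root_norms` are illustrative assumptions.

```python
def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def norm(a):
    # Maximum absolute row sum: an algebra norm on 2x2 matrices.
    return max(abs(a[i][0]) + abs(a[i][1]) for i in range(2))

def root_norms(x, n_max):
    """Return the list of |x^n|^(1/n) for n = 1, ..., n_max."""
    out, power = [], x
    for n in range(1, n_max + 1):
        out.append(norm(power) ** (1.0 / n))
        power = mul(power, x)
    return out

# Upper triangular, so the spectrum is {2, 0.5} and the spectral radius is 2,
# while the norm of x itself is 3.
x = ((2.0, 1.0), (0.0, 0.5))
values = root_norms(x, 200)
```

Every entry of `values` stays at or above the spectral radius 2, in line with the infimum characterization, and the entries approach 2 as $n$ grows even though $|x| = 3$.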

Note that the proof of Corollary 5.5 also uses Liouville's theorem indirectly, by invoking the fundamental theorem of algebra (Theorem 3.10).

Corollary 5.8 (Gelfand-Mazur theorem). If $A$ is a unital Banach algebra in which every nonzero element is invertible, then $A$ is isometrically isomorphic to $\mathbb{C}$.

Proof. Let $x \in A$. If $\lambda \in \sigma(x)$ then $\lambda e - x$ is not invertible, so $\lambda e - x = 0$ and $x = \lambda e$. Since $\sigma(x)$ is nonempty, it contains exactly one point $\lambda_x$ satisfying $x = \lambda_x e$. The map $x \mapsto \lambda_x$ is clearly an isometric isomorphism.

Theorem 1.129 shows (for Banach algebras) that if $|e - x| < 1$, then $x$ is invertible and $x^{-1} = \sum_{k=0}^{\infty} (e - x)^k$. If $\rho(e - x) < 1$ then $1 \notin \sigma(e - x)$, so $x = e - (e - x)$ is invertible. In fact, we also have $x^{-1} = \sum_{k=0}^{\infty} (e - x)^k$. Choose some $r$ so that $\rho(e - x) < r < 1$. Theorem 5.7 shows that there exists some $N$ such that $|(e - x)^k|^{1/k} \le r$ and $|(e - x)^k| \le r^k$ for all $k \ge N$. Therefore $\sum_{k=0}^{\infty} (e - x)^k$ converges (see Theorem 1.17), and the argument in Theorem 1.129 shows that it is the inverse of $x$.
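The point of the preceding remark is that the series for $x^{-1}$ can converge even when $|e - x| \ge 1$, provided $\rho(e - x) < 1$. The sketch below illustrates this under the assumption that $A$ is the algebra of $2 \times 2$ real matrices with the maximum row sum norm; the matrix and helper names are hypothetical.

```python
def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def neumann_inverse(x, terms):
    """Partial sum of sum_k (e - x)^k with the given number of terms."""
    n = [[(1.0 if i == j else 0.0) - x[i][j] for j in range(2)] for i in range(2)]
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]          # (e - x)^0
    for _ in range(terms):
        for i in range(2):
            for j in range(2):
                total[i][j] += power[i][j]
        power = mul(power, n)
    return total

# e - x = [[0.5, 5], [0, 0.5]] has row-sum norm 5.5 >= 1 but spectral radius 0.5.
x = [[0.5, -5.0], [0.0, 0.5]]
approx = neumann_inverse(x, 200)
exact = [[2.0, 20.0], [0.0, 2.0]]             # the inverse of x, computed by hand
err = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
```

The early terms of the series grow before they decay, which is why the norm condition $|e - x| < 1$ fails here while the spectral radius condition still guarantees convergence.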

The spectrum in a subalgebra

Theorem 5.9. Let $A$ be a unital Banach algebra and let $B$ be a closed unital subalgebra of $A$. For all $x \in B$,

\[ \sigma_A(x) \subseteq \sigma_B(x) \quad\text{and}\quad \partial(\sigma_B(x)) \subseteq \partial(\sigma_A(x)). \]

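Proof. If $\lambda e - x \in G(B)$ then $\lambda e - x \in G(A)$, so $\sigma_A(x) \subseteq \sigma_B(x)$. For the second inclusion, if we can show that $\partial(\sigma_B(x)) \subseteq \sigma_A(x)$ then

\[ \partial(\sigma_B(x)) \subseteq \sigma_A(x) \cap \overline{\mathbb{C} \setminus \sigma_B(x)} \subseteq \sigma_A(x) \cap \overline{\mathbb{C} \setminus \sigma_A(x)} = \partial(\sigma_A(x)) \]

because $\sigma_A(x)$ and $\sigma_B(x)$ are closed. Let $\lambda \in \partial(\sigma_B(x))$ and let $y = \lambda e - x$ so that $y \notin G(B)$. Choose a sequence $\lambda_n$ in $\mathbb{C} \setminus \sigma_B(x)$ such that $\lambda_n \to \lambda$. Let $y_n = \lambda_n e - x$ so that $y_n \in G(B)$ and $y_n \to y$. We must have $|y_n^{-1}| \to \infty$, for otherwise there is some $M > 0$ such that $|y_n^{-1}| < M$ for infinitely many $n$. Since $y_n \to y$, there is some $m$ for which $|y_m^{-1}| < M$ and $|y_m - y| < 1/M$. But

\[ |e - y_m^{-1}y| = |y_m^{-1}(y_m - y)| \le |y_m^{-1}||y_m - y| < 1, \]

so $y_m^{-1}y \in G(B)$ by Theorem 1.129. This implies that $y = y_m(y_m^{-1}y) \in G(B)$, which is a contradiction. Therefore $|y_n^{-1}| \to \infty$.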

Suppose that $y \in G(A)$, and let $z_n = |y_n^{-1}|^{-1}y_n^{-1}$. Since

\[ |z_n y| \le \left|\frac{y_n^{-1}y}{|y_n^{-1}|} - \frac{y_n^{-1}y_n}{|y_n^{-1}|}\right| + \frac{1}{|y_n^{-1}|} \le \frac{|y_n^{-1}(y - y_n)|}{|y_n^{-1}|} + \frac{1}{|y_n^{-1}|} \le |y - y_n| + \frac{1}{|y_n^{-1}|} \to 0, \]

we have

\[ 1 = |z_n| = \left|(z_n y)y^{-1}\right| \le |z_n y|\,|y^{-1}| \to 0. \]

This is a contradiction, so $y \notin G(A)$ and therefore $\lambda \in \sigma_A(x)$.

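Theorem 5.10. Let $A$ be a unital Banach algebra, let $B$ be a closed unital subalgebra of $A$, and let $x \in B$. If $\mathbb{C} \setminus \sigma_A(x)$ is connected, then $\sigma_B(x) = \sigma_A(x)$.

Proof. Since $\sigma_A(x) \subseteq \sigma_B(x)$, it suffices to show that $\sigma_B(x) \setminus \sigma_A(x) = \emptyset$. Let $U_A = \mathbb{C} \setminus \sigma_A(x)$ and $U_B = \mathbb{C} \setminus \sigma_B(x)$. Suppose that there is some $\lambda \in \sigma_B(x) \setminus \sigma_A(x)$, and choose any $\mu \in U_B \subseteq U_A$. Since $U_A$ is connected and open in $\mathbb{C}$ (which is locally path connected), it is path connected. This implies that there is a continuous function $\gamma : [0, 1] \to U_A$ such that $\gamma(0) = \lambda$ and $\gamma(1) = \mu$. Since $\gamma(0) \in \sigma_B(x)$, the set $\gamma^{-1}(\sigma_B(x))$ is nonempty. Let $s = \sup \gamma^{-1}(\sigma_B(x))$; then $s < 1$ because $\gamma(1) \in U_B$ and $U_B$ is open. Therefore $\gamma((s, 1]) \subseteq U_B$ and $\gamma(s) \in \overline{\gamma((s, 1])} \subseteq \overline{U_B}$. We also have

\[ \gamma(s) \in \gamma([0, 1]) \cap \sigma_B(x) \subseteq \sigma_B(x), \]

because $\sigma_B(x)$ is closed, so $\gamma(s) \in \partial(\sigma_B(x))$. But Theorem 5.9 shows that $\partial(\sigma_B(x)) \subseteq \partial(\sigma_A(x))$, so $\gamma(s) \in \partial(\sigma_A(x)) \subseteq \sigma_A(x)$. This is a contradiction because $\gamma(s) \in U_A$.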

Holomorphic functional calculus

Let $A$ be a unital Banach algebra, let $U \subseteq \mathbb{C}$ be an open set and let $H(U)$ be the unital algebra of all holomorphic functions from $U$ to $\mathbb{C}$.

Let $S$ be a subset of $U$ and let $\gamma$ be a 1-cycle in $U$. We say that $\gamma$ surrounds $S$ if the image of $\gamma$ is contained in $U \setminus S$ and $W(\gamma, z) = 1$ for all $z \in S$ and $W(\gamma, z) = 0$ for all $z \in \mathbb{C} \setminus U$. Note that $\gamma$ is homologous to 0 in $U$ by Corollary 3.32. If $f : U \to A$ is holomorphic, then Theorem 3.34 shows that

\[ f(z) = W(\gamma, z)f(z) = \frac{1}{2\pi i}\int_\gamma \frac{f(\zeta)}{\zeta - z}\,d\zeta \]

for every $z \in S$.

The following theorem provides a method for constructing a 1-cycle that surrounds a compact subset of $U$. If $U$ is symmetric about the x-axis, then the 1-cycle can also be chosen to be symmetric about the x-axis. To be precise, if $\gamma = \sum_{i=1}^{k} c_i\sigma_i$ is a 1-cycle then we define its (complex) conjugate to be the 1-cycle $\overline{\gamma} = \sum_{i=1}^{k} c_i\overline{\sigma_i}$, where $\overline{\sigma_i}(t) = \overline{\sigma_i(t)}$ for $t \in [0, 1]$. We say that a set $S \subseteq \mathbb{C}$ is self-conjugate if $\overline{z} \in S$ for every $z \in S$.

Theorem 5.11. Let $U \subseteq \mathbb{C}$ be an open set and let $S$ be a compact subset of $U$.

1. There exists a piecewise $C^1$ 1-cycle $\gamma$ in $U$ that surrounds $S$.

2. If $U$ is self-conjugate, then $\gamma$ can be chosen so that $[\gamma] = [\eta - \overline{\eta}]$ for some piecewise $C^1$ 1-chain $\eta$ in $U$.

Proof. Since $S$ is compact, we can choose some $\delta > 0$ such that the distance between any point in $S$ and any point in $\mathbb{C} \setminus U$ is greater than $2\delta$. Divide the plane into a grid of horizontal and vertical lines such that the distance between adjacent horizontal and adjacent vertical lines is $\delta$. Let $R_1, \dots, R_n$ be the squares in this grid (with side length $\delta$) which intersect $S$, so that $R_i \subseteq U$ for each $i$; let $\sigma_{i,1}, \dots, \sigma_{i,4}$ be the edges of each boundary $\partial R_i$. Let $E$ be the set of all such edges that are not also the edge of another square intersecting $S$. (That is, all $\sigma_{i,j}$ such that there is no $\sigma_{i',j'}$ with $i' \neq i$, $j' \neq j$, and $\sigma_{i,j}([0, 1]) = \sigma_{i',j'}([0, 1])$.) If the image of some edge $\sigma_{i,j} \in E$ intersects $S$, then both $R_i$ and another square $R_{i'}$ (sharing this edge) intersect $S$. This is impossible, because the definition of $E$ excludes such edges. Therefore $\gamma = \sum_{\sigma_{i,j} \in E} \sigma_{i,j}$ is a piecewise $C^1$ 1-cycle in $U \setminus S$.

Lemma 3.35 shows that $W(\gamma, z) = 1$ if $z \in \operatorname{Int} R_i$ and $W(\gamma, z) = 0$ if $z \in \mathbb{C} \setminus U$. If $z \in S$ then $z$ is a limit point of some $\operatorname{Int} R_i$, so $W(\gamma, z) = 1$. This proves (1).

For (2), use the same procedure with the compact set $T = S \cup \{\overline{z} : z \in S\}$ while aligning the grid so that it contains the real line $\{z \in \mathbb{C} : \operatorname{Im} z = 0\}$. Every $R_i$ has a corresponding square $R_i^*$ with $R_i^* = \{\overline{z} : z \in R_i\}$, and similarly every $\sigma_{i,j} \in E$ has a corresponding $\sigma_{i,j}^*$ such that $\overline{\sigma_{i,j}^*} = -\sigma_{i,j}$, i.e. $\overline{\sigma_{i,j}^*(t)} = \sigma_{i,j}(1 - t)$ for $t \in [0, 1]$. (The orientation is reversed because every $\partial R_i$ is taken counterclockwise.) Note that $\gamma$ does not include any edge whose image lies in the real line, because that edge is also the edge of a different square intersecting $T$. This shows that we can partition $E$ into two sets $E_1$ and $E_2$ of equal size such that $-\overline{\sigma_{i,j}} \in E_2$ for every $\sigma_{i,j} \in E_1$. Let $\eta = \sum_{\sigma_{i,j} \in E_1} \sigma_{i,j}$. Then $[\gamma] = [\eta - \overline{\eta}]$, and $\gamma$ surrounds $S$ because it surrounds $T$.

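Theorem 5.12. Let $U \subseteq \mathbb{C}$ be an open set. The set $A_U = \{x \in A : \sigma(x) \subseteq U\}$ is open.

Proof. Let $x \in A_U$. Since $\lambda \mapsto |(\lambda e - x)^{-1}|$ is continuous on $\mathbb{C} \setminus \sigma(x)$ and $|(\lambda e - x)^{-1}| \to 0$ as $|\lambda| \to \infty$, there exists some $M$ such that $|(\lambda e - x)^{-1}| < M$ for all $\lambda \notin U$. If $y \in A$ with $|y - x| < 1/M$ and $\lambda \notin U$, then $\lambda \notin \sigma(x)$ and

\[ \lambda e - y = \lambda e - x - (y - x) = (\lambda e - x)\left(e - (\lambda e - x)^{-1}(y - x)\right) \]

is invertible since $|(\lambda e - x)^{-1}(y - x)| < 1$. Therefore $\lambda \notin \sigma(y)$.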

Let $\mathcal{H}_U(H(U))$ be the vector space of all functions $\tilde{f} : A_U \to A$ of the form

\[ \tilde{f}(x) = \frac{1}{2\pi i}\int_\gamma f(\lambda)(\lambda e - x)^{-1}\,d\lambda, \]

where $f \in H(U)$ and $\gamma$ is any 1-cycle in $U$ that surrounds $\sigma(x)$ (see Theorem 5.11). The integral is well-defined since any two 1-cycles in $U$ surrounding $\sigma(x)$ are homologous in $U \setminus \sigma(x)$, by Corollary 3.32. Let $\mathcal{H}_U$ be the linear map given by $f \mapsto \tilde{f}$. When it is not confusing to do so, we will write $\mathcal{H}$ instead of $\mathcal{H}_U$.

Theorem 5.13 (Fundamental theorem of holomorphic functional calculus).

1. $\mathcal{H}(H(U))$ is a complex algebra.

2. $\mathcal{H} : H(U) \to \mathcal{H}(H(U))$ is an algebra isomorphism. In particular, $\mathcal{H}(H(U))$ is commutative.

3. If $f_n$ is a sequence in $H(U)$ converging compactly (i.e. uniformly on compact sets) to some $f \in H(U)$, then

\[ \tilde{f}(x) = \lim_{n\to\infty} \tilde{f}_n(x) \]

for all $x \in A_U$.

4. Let $n \ge 0$ and let $p_n : \mathbb{C} \to \mathbb{C}$ be the function given by $z \mapsto z^n$. Then $\tilde{p}_n(x) = x^n$ for all $x \in A$.

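Proof. Firstly, $\mathcal{H}$ is injective since $\tilde{f} = 0$ implies that $f(\mu)e = \tilde{f}(\mu e) = 0$ for all $\mu \in U$. To prove (1) and (2), it suffices to show that

\[ \mathcal{H}(f)\mathcal{H}(g) = \mathcal{H}(fg) \]

for all $f, g \in H(U)$. Let $x \in A_U$. By Theorem 5.11, we can choose a piecewise $C^1$ 1-cycle $\eta$ in $U$ that surrounds $\sigma(x)$, and a piecewise $C^1$ 1-cycle $\gamma$ in $U \setminus S$ that surrounds $\sigma(x)$, where $S$ is the image of $\eta$. Note the formula

\[ (\lambda e - x)^{-1}(\mu e - x)^{-1} = -\frac{(\lambda e - x)^{-1} - (\mu e - x)^{-1}}{\lambda - \mu}. \tag{*} \]

Using (*) and Lemma 3.36, we compute

\begin{align*}
\mathcal{H}(f)(x)\mathcal{H}(g)(x) &= \frac{1}{(2\pi i)^2}\int_\gamma\int_\eta f(\lambda)g(\mu)(\lambda e - x)^{-1}(\mu e - x)^{-1}\,d\mu\,d\lambda \\
&= -\frac{1}{(2\pi i)^2}\int_\gamma\int_\eta f(\lambda)g(\mu)\frac{(\lambda e - x)^{-1} - (\mu e - x)^{-1}}{\lambda - \mu}\,d\mu\,d\lambda \\
&= \frac{1}{(2\pi i)^2}\int_\gamma f(\lambda)(\lambda e - x)^{-1}\left(\int_\eta \frac{g(\mu)}{\mu - \lambda}\,d\mu\right)d\lambda \\
&\quad + \frac{1}{(2\pi i)^2}\int_\eta g(\mu)(\mu e - x)^{-1}\left(\int_\gamma \frac{f(\lambda)}{\lambda - \mu}\,d\lambda\right)d\mu.
\end{align*}

The second integral is equal to zero, since $\lambda \mapsto f(\lambda)/(\lambda - \mu)$ is holomorphic on $U \setminus S$ and $\gamma$ is homologous to 0 in $U \setminus S$. Therefore

\begin{align*}
\mathcal{H}(f)(x)\mathcal{H}(g)(x) &= \frac{1}{2\pi i}\int_\gamma f(\lambda)(\lambda e - x)^{-1}\left(\frac{1}{2\pi i}\int_\eta \frac{g(\mu)}{\mu - \lambda}\,d\mu\right)d\lambda \\
&= \frac{1}{2\pi i}\int_\gamma f(\lambda)g(\lambda)(\lambda e - x)^{-1}\,d\lambda \\
&= \mathcal{H}(fg)(x).
\end{align*}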

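For (3), let $\gamma$ be a piecewise $C^1$ 1-cycle in $U$ that surrounds $\sigma(x)$. By linearity, we can assume that $\gamma$ is a curve defined on $[0, 1]$. Then

\begin{align*}
\left|\tilde{f}(x) - \tilde{f}_n(x)\right| &= \left|\frac{1}{2\pi i}\int_\gamma (f(\lambda) - f_n(\lambda))(\lambda e - x)^{-1}\,d\lambda\right| \\
&\le \left(\frac{L(\gamma)}{2\pi}\sup_{t \in [0,1]}\left|(\gamma(t)e - x)^{-1}\right|\right)\sup_{t \in [0,1]}|f(\gamma(t)) - f_n(\gamma(t))| \\
&\to 0
\end{align*}

since $f_n \to f$ compactly. For (4), choose some $r > |x|$ and let $C$ be the circle of radius $r$ around 0. From Theorem 5.7, it is clear that $C$ surrounds $\sigma(x)$. If $|\lambda| = r$ then

\[ (\lambda e - x)^{-1} = \lambda^{-1}(e - \lambda^{-1}x)^{-1} = \frac{1}{\lambda}\sum_{k=0}^{\infty}\left(\frac{x}{\lambda}\right)^k \]

since $|x/\lambda| < 1$. Therefore

\begin{align*}
\tilde{p}_n(x) &= \frac{1}{2\pi i}\int_C \lambda^n(\lambda e - x)^{-1}\,d\lambda \\
&= \frac{1}{2\pi i}\int_C \sum_{k=0}^{\infty}\frac{x^k}{\lambda^{k+1-n}}\,d\lambda \\
&= \sum_{k=0}^{\infty} x^k\left(\frac{1}{2\pi i}\int_C \frac{1}{\lambda^{k+1-n}}\,d\lambda\right) = x^n
\end{align*}

since the sum $\sum_{k=0}^{\infty} x^k/\lambda^{k+1-n}$ converges uniformly on $C$.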
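The contour integral that defines the functional calculus can be approximated directly. The following sketch assumes $A$ is the algebra of $2 \times 2$ complex matrices and $U = \mathbb{C}$, takes the contour to be a circle of radius larger than the row-sum norm of $x$, and discretizes it with the trapezoidal rule; all function names and parameters are illustrative assumptions.

```python
import cmath
import math

def resolvent(x, lam):
    # Closed-form inverse of the 2x2 matrix lam e - x.
    a, b = lam - x[0][0], -x[0][1]
    c, d = -x[1][0], lam - x[1][1]
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def functional_calculus(f, x, radius, nodes=400):
    """Approximate (1 / (2 pi i)) * integral over |lam| = radius of
    f(lam) (lam e - x)^{-1} d lam with equally spaced nodes."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(nodes):
        lam = radius * cmath.exp(2j * math.pi * k / nodes)
        res = resolvent(x, lam)
        # d lam = i lam d theta, so the 1 / (2 pi i) prefactor leaves lam / nodes.
        weight = f(lam) * lam / nodes
        for i in range(2):
            for j in range(2):
                total[i][j] += weight * res[i][j]
    return total

def exp_series(x, terms=60):
    # Reference value: truncated power series for exp(x).
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[sum(term[i][m] * x[m][j] for m in range(2)) / k for j in range(2)]
                for i in range(2)]
        for i in range(2):
            for j in range(2):
                result[i][j] += term[i][j]
    return result

x = ((1, 2), (3, 4))                       # row-sum norm 7, so radius 8 suffices
square = functional_calculus(lambda z: z * z, x, 8.0)
x_squared = ((7, 10), (15, 22))
err_square = max(abs(square[i][j] - x_squared[i][j]) for i in range(2) for j in range(2))

expo = functional_calculus(cmath.exp, x, 8.0)
reference = exp_series(x)
err_exp = max(abs(expo[i][j] - reference[i][j]) for i in range(2) for j in range(2))
```

Applied to $z \mapsto z^2$ the integral reproduces $x^2$, matching part (4), and applied to the exponential it agrees with the power series for $\exp(x)$. The trapezoidal rule is used because it converges very quickly for periodic analytic integrands; any other quadrature on the circle would serve as well.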

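Lemma 5.14. Let $x \in A_U$ and $f \in H(U)$. Then $\tilde{f}(x)$ is invertible if and only if $f(\lambda) \neq 0$ for all $\lambda \in \sigma(x)$.

Proof. If $f(\lambda) \neq 0$ for all $\lambda \in \sigma(x)$, then $1/f$ is holomorphic on some open set $V \subseteq U$ containing $\sigma(x)$. Therefore

\[ \mathcal{H}_V(f)(x)\,\mathcal{H}_V(1/f)(x) = \mathcal{H}_V(1)(x) = e, \]

which shows that $\tilde{f}(x) = \mathcal{H}_V(f)(x)$ is invertible. Conversely, suppose that $f(\lambda) = 0$ for some $\lambda \in \sigma(x)$ and $\tilde{f}(x)$ is invertible. Write $f(z) = (z - \lambda)g(z)$ where $g \in H(U)$. Then $\tilde{f}(x) = (x - \lambda e)\tilde{g}(x) = \tilde{g}(x)(x - \lambda e)$ and $\tilde{f}(x)\tilde{g}(x) = \tilde{g}(x)\tilde{f}(x)$ imply that

\[ (x - \lambda e)\tilde{g}(x)\tilde{f}(x)^{-1} = e \quad\text{and}\quad \tilde{g}(x)\tilde{f}(x)^{-1}(x - \lambda e) = e, \]

which is a contradiction since $\lambda \in \sigma(x)$.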

Compare the next theorem with Theorem 5.2.

Theorem 5.15 (Spectral mapping theorem). Let $x \in A_U$ and $f \in H(U)$. Then

\[ \sigma(\tilde{f}(x)) = f(\sigma(x)). \]

Proof. Let $\lambda \in \mathbb{C}$ and let $g(z) = \lambda - f(z)$. Then $\lambda \in \sigma(\tilde{f}(x))$ if and only if $\lambda e - \tilde{f}(x) = \tilde{g}(x)$ is not invertible. By Lemma 5.14, $\tilde{g}(x)$ is not invertible if and only if $g(\mu) = 0$ for some $\mu \in \sigma(x)$, i.e. $\lambda = f(\mu) \in f(\sigma(x))$.

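Theorem 5.16 (Composition of functions). Let $U, V \subseteq \mathbb{C}$ be open sets. Let $f \in H(U)$ and let $g \in H(V)$ with $f(U) \subseteq V$. Then $\mathcal{H}_U(f)(A_U) \subseteq A_V$, and

\[ \mathcal{H}_U(g \circ f) = \mathcal{H}_V(g) \circ \mathcal{H}_U(f). \]

That is, $\tilde{f}(A_U) \subseteq A_V$ and $\tilde{h} = \tilde{g} \circ \tilde{f}$ if $h = g \circ f$.

Proof. Let $x \in A_U$. By Theorem 5.15 we have $\sigma(\tilde{f}(x)) = f(\sigma(x)) \subseteq f(U) \subseteq V$, so $\tilde{f}(x) \in A_V$. Let $W \subseteq U$ be a relatively compact open set containing $\sigma(x)$. Since $f$ is continuous, $f(\overline{W})$ is compact. By Theorem 5.11, we can choose a piecewise $C^1$ 1-cycle $\eta$ in $V$ that surrounds $f(\overline{W})$, and a piecewise $C^1$ 1-cycle $\gamma$ in $W$ that surrounds $\sigma(x)$. Using Lemma 3.36,

\begin{align*}
\mathcal{H}_U(g \circ f)(x) &= \frac{1}{2\pi i}\int_\gamma g(f(\lambda))(\lambda e - x)^{-1}\,d\lambda \\
&= \frac{1}{2\pi i}\int_\gamma\left(\frac{1}{2\pi i}\int_\eta \frac{g(\mu)}{\mu - f(\lambda)}\,d\mu\right)(\lambda e - x)^{-1}\,d\lambda \\
&= \frac{1}{2\pi i}\int_\eta g(\mu)\left(\frac{1}{2\pi i}\int_\gamma (\mu - f(\lambda))^{-1}(\lambda e - x)^{-1}\,d\lambda\right)d\mu \\
&= \frac{1}{2\pi i}\int_\eta g(\mu)(\mu e - \mathcal{H}_U(f)(x))^{-1}\,d\mu \\
&= \mathcal{H}_V(g)(\mathcal{H}_U(f)(x))
\end{align*}

since $W(\eta, f(\lambda)) = 1$ and $\lambda \mapsto (\mu - f(\lambda))^{-1}$ is holomorphic on $W$.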

The identity component

The identity component of $G(A)$, denoted by $G_0(A)$, is the connected component of $G(A)$ containing the unit $e$. For convenience, we will write $G$ instead of $G(A)$ and $G_0$ instead of $G_0(A)$. Recall that there is an exponential function $\exp : A \to G$ (see Definition 2.60).

Theorem 5.17 (Properties of the identity component).

1. G0 is an open normal subgroup of G.

2. G0 is the subgroup generated by exp(A).

3. If A is commutative then G0 = exp(A).

4. If $A$ is commutative then $G/G_0$ does not contain any elements of finite order except for $G_0$.

Proof. For (1), $G_0$ is open in $G$ because $G$ is locally connected. If $x \in G_0$ then $x^{-1}G_0$ is a connected neighborhood of $e$, so $x^{-1}G_0 \subseteq G_0$. This shows that $G_0$ is a subgroup of $G$. Similarly, $x^{-1}G_0x$ is a connected neighborhood of $e$ and therefore $x^{-1}G_0x \subseteq G_0$, so $G_0$ is normal in $G$.

For (2), let $H$ be the subgroup generated by $\exp(A)$. Then $H = \bigcup_{n=1}^{\infty} E_n$ where $E_n = \{x_1 \cdots x_n : x_1, \dots, x_n \in \exp(A)\}$. Each $E_n$ is a connected neighborhood of $e$, so $E_n \subseteq G_0$ and therefore $H \subseteq G_0$. If we can show that $H$ is open in $G_0$, then Theorem 4.166 shows that $H$ is also closed in $G_0$ and $H = G_0$ because $G_0$ is connected. To show that $H$ is open in $G_0$, it suffices to show that there is a neighborhood of $e \in G_0$ that is contained in $H$. This follows from Theorem 5.16, Theorem 5.12 and the fact that $\log$ is defined on a neighborhood of $1 \in \mathbb{C}$.

For (3), if $A$ is commutative then Theorem 2.61 shows that $\exp$ is a (group) homomorphism. Therefore $\exp(A)$ is a group and $G_0 = \exp(A)$.

For (4), suppose that $x^n = \exp(a)$ for some $a \in A$. Let $y = \exp(n^{-1}a)$ and $z = xy^{-1}$ so that $z^n = \exp(a)\exp(-a) = e$. If we can show that $z \in G_0$, then $x = yz \in G_0$. Define $f : \mathbb{C} \to A$ by $f(\lambda) = \lambda z - (\lambda - 1)e$ and let $E = f^{-1}(G)$. If $\lambda \notin E$ then $\lambda \neq 0$ and

\[ \lambda z - (\lambda - 1)e = -\lambda\left(\frac{\lambda - 1}{\lambda}e - z\right) \notin G, \]

so $(\lambda - 1)/\lambda \in \sigma(z)$. By Theorem 5.2 we have $(\lambda - 1)^n/\lambda^n \in \sigma(z^n) = \sigma(e) = \{1\}$, so $(\lambda - 1)^n = \lambda^n$. This implies that $\mathbb{C} \setminus E$ is finite, so $E$ is connected and $f(E)$ is a connected set containing $f(0) = e$. Therefore $f(E) \subseteq G_0$, and in particular $f(1) = z \in G_0$.

5.2 Gelfand representation

We will be discussing three main classes of Banach algebras later in this chapter:

1. $C_0(X, \mathbb{C})$ where $X$ is an LCH space, under pointwise multiplication.

2. $L(E)$ where $E$ is a Hilbert space, under operator composition.

3. $L^1(G, \mathbb{C})$ where $G$ is an abelian LCH group, under convolution.

The first of these is particularly well-understood, and the Gelfand representation will allow us to represent certain elements of $L(E)$ and $L^1(G, \mathbb{C})$ by functions in $C_0(X, \mathbb{C})$, for some suitable LCH space $X$.

Weak topologies

Let $X$ be a set and let $\{f_\alpha\}_{\alpha \in A}$ be a collection of functions $f_\alpha : X \to Y_\alpha$, where each $Y_\alpha$ is a topological space. The initial topology on $X$ with respect to $\{f_\alpha\}_{\alpha \in A}$ is the coarsest topology on $X$ that makes every $f_\alpha$ continuous. In other words, it is the topology generated by the sets $\{f_\alpha^{-1}(U) : \alpha \in A,\ U \text{ open in } Y_\alpha\}$. If $E$ is a normed vector space over $K$, the weak topology on $E$ is the initial topology with respect to the functions in the (continuous) dual space $E^*$. Recall that there is an isometry $\tau : E \to E^{**}$ (see Lemma 1.54). The weak* topology on $E^*$ is the initial topology with respect to $\tau(E) \subseteq E^{**}$, i.e. the collection of evaluation maps $\hat{x} : E^* \to K$ where $\hat{x}(f) = f(x)$. Clearly the weak* topology is weaker (coarser) than the weak topology on $E^*$, and they coincide when $E$ is reflexive. A net $(f_\alpha)$ in $E^*$ converges to $f$ in the weak* topology if and only if $f_\alpha \to f$ pointwise.

Theorem 5.18 (Banach-Alaoglu theorem). Let $E$ be a normed vector space over $K$.

1. The weak* topology on $E^*$ is identical to the subspace topology on $E^*$ with respect to the product space $K^E = \prod_{x \in E} K$, and is Hausdorff. (The elements of $K^E$ are functions from $E$ to $K$.)

2. The closed unit ball $B^* = \{f \in E^* : |f| \le 1\}$ in $E^*$ is compact in the weak* topology.

Proof. For (1), the topology on $K^E$ is generated by $\{\tilde{x}^{-1}(U) : x \in E,\ U \text{ open in } K\}$, where $\tilde{x} : K^E \to K$ is the evaluation map defined by $\tilde{x}(f) = f(x)$. Similarly, the weak* topology on $E^*$ is generated by the collection $\{\hat{x}^{-1}(U) : x \in E,\ U \text{ open in } K\}$, where $\hat{x} : E^* \to K$ is the evaluation map. Since $\hat{x}$ is just $\tilde{x}$ restricted to $E^*$, we have $\hat{x}^{-1}(U) = \tilde{x}^{-1}(U) \cap E^*$ whenever $x \in E$ and $U$ is open in $K$. The subspace topology on $E^*$ is generated by sets of this form. It is Hausdorff because $K$ is Hausdorff.

For (2), let $B_x = \{r \in K : |r| \le |x|\}$, which is a compact subset of $K$. Since $B^* \subseteq \prod_{x \in E} B_x$ and $\prod_{x \in E} B_x$ is compact by Tychonoff's theorem, it suffices to show that $B^*$ is closed in $\prod_{x \in E} B_x$. Let $f$ be in the closure of $B^*$, let $x, y \in E$, and let $\varepsilon > 0$. Choose some $g \in B^*$ such that

\[ |f(x) - g(x)| < \varepsilon, \qquad |f(y) - g(y)| < \varepsilon, \qquad |f(x + y) - g(x + y)| < \varepsilon. \]

(The set of all $g$ for which the above holds is a neighborhood of $f$ in $K^E$, from the definition of the product topology.) Then $|f(x + y) - (f(x) + f(y))| < 3\varepsilon$, so taking $\varepsilon \to 0$ gives $f(x + y) = f(x) + f(y)$. If $r \in K$ then setting $y = rx$ shows that $|f(rx) - rf(x)| < \varepsilon(|r| + 1)$, so $f(rx) = rf(x)$. Finally, $|f(x)| \le |f(x) - g(x)| + |g(x)| < |x| + \varepsilon$, so $|f| \le 1$. This shows that $f \in B^*$.

Gelfand space, unital case

A complex homomorphism on an algebra $A$ is an algebra homomorphism from $A$ to $\mathbb{C}$. The set of all nonzero complex homomorphisms on $A$ is denoted by $\Delta(A)$, which is referred to as the Gelfand space, structure space, spectrum, or the maximal ideal space. (The reason for the last two names will become clear later.) Our goal is to give $\Delta(A)$ a topology and then represent the elements of $A$ by continuous functions on $\Delta(A)$, which are well-understood objects.

The following theorem explains the term "maximal ideal space". Recall that $\operatorname{Max}(A)$ is the set of all maximal modular ideals of $A$.

Theorem 5.19. If $A$ is a commutative Banach algebra, then the map $\ker : \Delta(A) \to \operatorname{Max}(A)$ is a bijection. That is, every maximal modular ideal of $A$ is the kernel of some $\varphi \in \Delta(A)$, and the kernel of every $\varphi \in \Delta(A)$ is a maximal modular ideal.

Proof. We first verify that $\ker\varphi \in \operatorname{Max}(A)$ for all $\varphi \in \Delta(A)$. Since $\ker\varphi$ is a closed ideal of $A$ with codimension 1, it is a maximal ideal. Since $\varphi \neq 0$, we can choose some $u \in A$ such that $\varphi(u) = 1$. Then for all $x \in A$, we have $\varphi(x - ux) = 0$ and therefore $x - ux \in \ker\varphi$. This shows that $u + \ker\varphi$ is a unit in $A/\ker\varphi$, so $\ker\varphi$ is a maximal modular ideal.

To prove injectivity, note that if $\varphi, \psi \in \Delta(A)$ and $\ker\varphi = \ker\psi$ then $\varphi = r\psi$ for some $r \in \mathbb{C}$, since $\varphi, \psi$ are nonzero linear functionals (see Lemma 2.55). If $u + \ker\varphi$ is the unit in $A/\ker\varphi$ then $u \notin \ker\varphi$ and $\varphi(u - u^2) = 0$, so $\varphi(u) = 1$. Similarly we have $\psi(u) = 1$, so $r = 1$.

Finally, let $M \in \operatorname{Max}(A)$ and let $u + M$ be the unit in $A/M$. Note that $A/M$ is a (unital) Banach algebra by Theorem 1.72. If $x + M \in A/M$ is nonzero then $x \notin M$, so $M + Ax$ is a modular ideal containing $M$ and we must have $M + Ax = A$ since $M$ is maximal. This implies that $y + ax = u$ for some $y \in M$ and $a \in A$, so $(a + M)(x + M) = u + M$. This shows that every nonzero element in $A/M$ is invertible. By Corollary 5.8, $A/M$ is isometrically isomorphic to $\mathbb{C}$, and we have an algebra homomorphism $\varphi : A \to \mathbb{C}$ with $\ker\varphi = M$. This proves surjectivity.

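The radical of a commutative Banach algebra $A$ is

\[ \operatorname{rad}(A) = \bigcap_{M \in \operatorname{Max}(A)} M = \bigcap_{\varphi \in \Delta(A)} \ker\varphi. \]

We say that $A$ is semisimple if $\operatorname{rad}(A) = \{0\}$.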

We will first define a topology on the Gelfand space of a unital Banach algebra, and then generalize the construction to non-unital Banach algebras by considering the unitization.

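Lemma 5.20. Let $A$ be a unital algebra over $K$ (where $K = \mathbb{R}$ or $K = \mathbb{C}$). If $\varphi : A \to \mathbb{C}$ is linear, $\varphi(e) = 1$, and $\varphi(x^2) = \varphi(x)^2$ for all $x \in A$, then $\varphi$ is a nonzero complex homomorphism.

Proof. For all $x, y \in A$,

\begin{align*}
\varphi(xy + yx) &= \varphi(x^2 + xy + yx + y^2) - \varphi(x^2) - \varphi(y^2) \\
&= \varphi((x + y)^2) - \varphi(x^2) - \varphi(y^2) \\
&= (\varphi(x) + \varphi(y))^2 - \varphi(x^2) - \varphi(y^2) \\
&= 2\varphi(x)\varphi(y).
\end{align*}

Since

\[ (ab - ba)^2 + (ab + ba)^2 = 2(a(bab) + (bab)a), \]

we also have the identity

\begin{align*}
[\varphi(ab - ba)]^2 + [2\varphi(a)\varphi(b)]^2 &= \varphi((ab - ba)^2 + (ab + ba)^2) \\
&= 2\varphi(a(bab) + (bab)a) \\
&= 4\varphi(a)\varphi(bab)
\end{align*}

for all $a, b \in A$. If we take $a = x - \varphi(x)e$ and $b = y$, then $\varphi(a) = 0$ and therefore $\varphi(xy - yx) = \varphi(ab - ba) = 0$. Combined with $\varphi(xy + yx) = 2\varphi(x)\varphi(y)$, this gives $\varphi(xy) = \varphi(x)\varphi(y)$.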
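The algebraic identity $(ab - ba)^2 + (ab + ba)^2 = 2(a(bab) + (bab)a)$ used in the proof of Lemma 5.20 holds in any associative algebra, commutative or not. As a quick sanity check, the sketch below evaluates both sides for a pair of non-commuting $2 \times 2$ integer matrices; the matrices and helper names are arbitrary illustrative choices.

```python
def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def add(a, b):
    return tuple(tuple(a[i][j] + b[i][j] for j in range(2)) for i in range(2))

def sub(a, b):
    return tuple(tuple(a[i][j] - b[i][j] for j in range(2)) for i in range(2))

def square(a):
    return mul(a, a)

a = ((1, 2), (3, 4))
b = ((0, 1), (5, -2))
ab, ba = mul(a, b), mul(b, a)

# Left side: (ab - ba)^2 + (ab + ba)^2.
lhs = add(square(sub(ab, ba)), square(add(ab, ba)))

# Right side: 2 (a (bab) + (bab) a).
bab = mul(mul(b, a), b)
rhs = tuple(tuple(2 * v for v in row) for row in add(mul(a, bab), mul(bab, a)))

identity_holds = (lhs == rhs)   # exact, since all entries are integers
```

Both sides expand to $2(abab + baba)$, which is why the comparison is exact here.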

Theorem 5.21 (Gleason-Kahane-Zelazko theorem). Let $A$ be a unital Banach algebra and let $\varphi : A \to \mathbb{C}$ be linear. The following are equivalent:

1. $\varphi$ is a nonzero complex homomorphism.

2. $\varphi(e) = 1$ and $\varphi(x) \neq 0$ for all $x \in G(A)$.

3. $\varphi(x) \in \sigma(x)$ for all $x \in A$.

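Proof. If (1) holds then $\varphi(e) = \varphi(e)\varphi(e)$, so $\varphi(e) = 1$ because $\varphi(e) = 0$ would imply that $\varphi = 0$. Also, if $x \in G(A)$ then $\varphi(x)\varphi(x^{-1}) = 1$, so $\varphi(x) \neq 0$. This proves (1) ⇒ (2). If (2) holds and $\lambda \notin \sigma(x)$, then $\varphi(\lambda e - x) = \lambda - \varphi(x)$ is nonzero. This proves (2) ⇒ (3). Suppose that (3) holds. Then $\varphi(e) = 1$ because $\sigma(e) = \{1\}$, and by Lemma 5.20 it remains to show that $\varphi(x^2) = \varphi(x)^2$ for all $x \in A$. Let $n \ge 2$ be an integer. Let $p(\lambda) = \varphi((\lambda e - x)^n)$, which is a polynomial in $\lambda$ of degree $n$. Let $\lambda_1, \dots, \lambda_n$ be the roots of $p(\lambda)$. For each $i$, we have $0 = \varphi((\lambda_i e - x)^n) \in \sigma((\lambda_i e - x)^n)$. This implies that $\lambda_i e - x$ is not invertible, so $\lambda_i \in \sigma(x)$ and $|\lambda_i| \le \rho(x)$. But

\[ \prod_{k=1}^{n}(\lambda - \lambda_k) = p(\lambda) = \sum_{k=0}^{n}(-1)^k\binom{n}{k}\varphi(x^k)\lambda^{n-k}, \]

so

\[ \sum_{k=1}^{n}\lambda_k = n\varphi(x) \quad\text{and}\quad \sum_{j<k}\lambda_j\lambda_k = \binom{n}{2}\varphi(x^2). \]

Then

\[ n^2\varphi(x)^2 = \left(\sum_{k=1}^{n}\lambda_k\right)^2 = \sum_{k=1}^{n}\lambda_k^2 + 2\sum_{j<k}\lambda_j\lambda_k = \sum_{k=1}^{n}\lambda_k^2 + n(n - 1)\varphi(x^2), \]

and

\[ \left|\varphi(x)^2 - \varphi(x^2)\right| = \frac{1}{n}\left|\frac{1}{n}\sum_{k=1}^{n}\lambda_k^2 - \varphi(x^2)\right| \le \frac{1}{n}\left(\rho(x)^2 + \left|\varphi(x^2)\right|\right) \to 0 \]

as $n \to \infty$.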

Corollary 5.22. If $A$ is a unital Banach algebra and $\varphi \in \Delta(A)$, then $|\varphi(x)| \le \rho(x) \le |x|$ for all $x \in A$, and $|\varphi| = 1$.

Proof. This follows immediately from the last part of Theorem 5.21, Theorem 5.7, and the fact that $\varphi(e) = 1$.

Recall that the closed unit ball $B$ in $A^*$ is weak* compact (see Theorem 5.18). Clearly $\Delta(A) \subseteq B$ due to Corollary 5.22, so $\Delta(A)$ is a compact Hausdorff space if we can show that it is closed in $B$. The subspace topology on $\Delta(A)$ is called the Gelfand topology.

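Theorem 5.23 (Gelfand topology). Let $A$ be a unital commutative Banach algebra. Then $\Delta(A)$ is a closed subset of the closed unit ball in $A^*$, and is therefore a compact Hausdorff space.

Proof. Let $\varphi \in \overline{\Delta(A)}$, let $x, y \in A$ and let $\varepsilon > 0$. Choose $\psi \in \Delta(A)$ such that $|\psi(a) - \varphi(a)| < \varepsilon$ for $a \in \{e, x, y, xy\}$. Then

\[ |\varphi(e) - 1| = |\varphi(e) - \psi(e)| < \varepsilon \]

and

\begin{align*}
|\varphi(x)\varphi(y) - \varphi(xy)| &\le |\varphi(x)(\varphi(y) - \psi(y))| + |\psi(y)(\varphi(x) - \psi(x))| + |\psi(x)\psi(y) - \varphi(xy)| \\
&< \varepsilon(|x| + |y| + 1).
\end{align*}

Taking $\varepsilon \to 0$ shows that $\varphi(e) = 1$ and $\varphi(xy) = \varphi(x)\varphi(y)$.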


Gelfand space, non-unital case

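Let $A$ be a (unital or non-unital) commutative Banach algebra and consider its unitization $A_e$. If $A$ is non-unital, we define the spectrum $\sigma(x)$ of $x \in A$ to be the spectrum of $x$ in $A_e$, i.e. $\sigma_{A_e}(x)$. In this case we must have $0 \in \sigma(x)$, for otherwise $e = xx^{-1} \in A$. The resolvent set of $x$ is $\mathbb{C} \setminus \sigma_{A_e}(x)$, and the spectral radius of $x$ is $\rho(x) = \sup_{\lambda \in \sigma(x)} |\lambda| = \lim_{n\to\infty} |x^n|^{1/n}$ as usual.

If $\varphi \in \Delta(A)$, we can extend $\varphi$ to a complex homomorphism $\tilde{\varphi} \in \Delta(A_e)$ by setting

\[ \tilde{\varphi}(x + re) = \varphi(x) + r \]

for all $x + re \in A_e$. The extension is unique, because any $\psi \in \Delta(A_e)$ must satisfy $\psi(e) = 1$. There is also a complex homomorphism $\varphi_\infty \in \Delta(A_e)$ given by

\[ \varphi_\infty(x + re) = r, \]

with $\ker\varphi_\infty = A$. In fact, $\Delta(A_e) = \{\tilde{\varphi} : \varphi \in \Delta(A)\} \cup \{\varphi_\infty\}$: if $\psi \in \Delta(A_e)$ and $\psi \neq \varphi_\infty$, then $\psi|_A \in \Delta(A)$ and $\psi = \widetilde{\psi|_A}$. If we identify $\Delta(A)$ with $\{\tilde{\varphi} : \varphi \in \Delta(A)\}$ then we can regard $\Delta(A)$ as a subset of $\Delta(A_e)$, in which case $\Delta(A_e) = \Delta(A) \cup \{\varphi_\infty\}$.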

Theorem 5.24 (Gleason-Kahane-Zelazko theorem, non-unital case). Let $A$ be a Banach algebra and let $\varphi : A \to \mathbb{C}$ be linear. The following are equivalent:

1. $\varphi$ is a nonzero complex homomorphism.

2. $\varphi(x) \in \sigma(x)$ for all $x \in A$.

Proof. We can assume that $A$ is non-unital. If (1) holds then Theorem 5.21 shows that $\tilde{\varphi}(x) \in \sigma_{A_e}(x)$ for all $x \in A_e$. In particular, $\varphi(x) = \tilde{\varphi}(x) \in \sigma_{A_e}(x) = \sigma(x)$ for all $x \in A$. This proves (1) ⇒ (2). If (2) holds then $\tilde{\varphi} : A_e \to \mathbb{C}$ is linear and $\tilde{\varphi}(x + re) \in \sigma_{A_e}(x + re)$ for all $x + re \in A_e$, because $\tilde{\varphi}(x + re) = \varphi(x) + r \in \sigma(x) + r = \sigma_{A_e}(x + re)$. By Theorem 5.21, $\tilde{\varphi}$ is a nonzero complex homomorphism, and so is $\varphi = \tilde{\varphi}|_A$. This proves (2) ⇒ (1).

Corollary 5.25. If $A$ is a Banach algebra and $\varphi \in \Delta(A)$, then $|\varphi(x)| \le \rho(x) \le |x|$ for all $x \in A$ and $|\varphi| \le 1$.

Corollary 5.26. Let $A$ and $B$ be commutative Banach algebras, and let $\varphi : A \to B$ be an algebra homomorphism. If $B$ is semisimple then $\varphi$ is continuous.

Proof. By Theorem 1.59, it suffices to show that if $x_n$ is a sequence in $A$ and $(x_n, \varphi(x_n)) \to (x, y)$ for some $(x, y) \in A \times B$, then $y = \varphi(x)$. If $\psi \in \Delta(B)$ then $\psi \circ \varphi \in \Delta(A) \cup \{0\}$, so Corollary 5.25 shows that $\psi \circ \varphi$ and $\psi$ are continuous. Therefore $\psi(y) = \lim_{n\to\infty} \psi(\varphi(x_n)) = \psi(\varphi(x))$. This holds for all $\psi$, so $y - \varphi(x) \in \operatorname{rad}(B) = \{0\}$, i.e. $y = \varphi(x)$.


Corollary 5.27. Every algebra isomorphism between two semisimple commutative Banach algebras is a homeomorphism.

In particular, the identity map on a semisimple commutative Banach algebra is a homeomorphism, even when the domain and codomain have different norms. Compare this with Theorem 1.39.

Corollary 5.28. If $A$ is a semisimple commutative Banach algebra, then all Banach algebra norms on $A$ are equivalent.

By considering the unitization of $A$, we can generalize Theorem 5.23. Note that a basis for the Gelfand topology on $\Delta(A)$ consists of sets of the form

\[ U_A(\varphi, F, \varepsilon) = \{\psi \in \Delta(A) : |\psi(x) - \varphi(x)| < \varepsilon \text{ for all } x \in F\}, \]

for $\varphi \in \Delta(A)$, finite subsets $F$ of $A$, and $\varepsilon > 0$. (This follows from the definition of the weak* topology.) Clearly the statement holds if we replace $A$ by $A_e$, but we can further assume that $F$ is a subset of $A$ (rather than $A_e$) because every $\varphi \in \Delta(A_e)$ satisfies $\varphi(re) = r$ for $r \in \mathbb{C}$.

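Theorem 5.29 (Gelfand topology, non-unital case). Let $A$ be a commutative Banach algebra. If $A$ is unital, then $\Delta(A)$ is compact. If $A$ is non-unital, then $\Delta(A)$ is an LCH space and $\Delta(A_e) = \Delta(A) \cup \{\varphi_\infty\}$ is the one-point compactification of $\Delta(A)$.

Proof. Theorem 5.23 shows that $\Delta(A_e)$ is a compact Hausdorff space, so $\Delta(A) = \Delta(A_e) \setminus \{\varphi_\infty\}$ is open in $\Delta(A_e)$ and therefore locally compact Hausdorff. Consider a set $U_{A_e}(\varphi, F, \varepsilon)$ where $\varphi \in \Delta(A_e)$, $F$ is a finite subset of $A$, and $\varepsilon > 0$. If $\varphi_\infty \notin U_{A_e}(\varphi, F, \varepsilon)$ then $\varphi \in \Delta(A)$ and $U_{A_e}(\varphi, F, \varepsilon) = U_A(\varphi, F, \varepsilon)$ is open in the Gelfand topology on $\Delta(A)$. If $\varphi_\infty \in U_{A_e}(\varphi, F, \varepsilon)$ and $\varphi \in \Delta(A)$ then

\begin{align*}
\Delta(A_e) \setminus U_{A_e}(\varphi, F, \varepsilon) &= \{\psi \in \Delta(A) : |\psi(x) - \varphi(x)| \ge \varepsilon \text{ for some } x \in F\} \\
&= \bigcup_{x \in F}\{\psi \in \Delta(A) : |\psi(x) - \varphi(x)| \ge \varepsilon\}
\end{align*}

is closed in the Gelfand topology on $\Delta(A)$ and therefore compact. If $\varphi = \varphi_\infty$ then

\[ \Delta(A_e) \setminus U_{A_e}(\varphi, F, \varepsilon) = \bigcup_{x \in F}\{\psi \in \Delta(A) : |\psi(x)| \ge \varepsilon\}, \]

which is again compact. This shows that $\Delta(A_e)$ is the one-point compactification of $\Delta(A)$.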


Gelfand transform

Let $A$ be a commutative Banach algebra. For each $x \in A$ the evaluation map $\hat{x} : \Delta(A) \to \mathbb{C}$ given by $\hat{x}(\varphi) = \varphi(x)$ is continuous, from the definition of the Gelfand topology. The map $\hat{x}$ is called the Gelfand transform of $x$. Since $\Delta(A_e)$ is the one-point compactification of $\Delta(A)$ (see Theorem 5.29) and $\hat{x}(\varphi_\infty) = 0$ for $x \in A$, Lemma 4.130 shows that $\hat{x} \in C_0(\Delta(A), \mathbb{C})$. The Gelfand representation (or Gelfand homomorphism) is the algebra homomorphism $\Gamma : A \to C_0(\Delta(A))$ defined by $\Gamma(x) = \hat{x}$. If $A$ is unital, then the Gelfand representation is a unital algebra homomorphism. It is common to refer to $\Gamma$ (and not just each $\hat{x}$) as the Gelfand transform. It is also common to denote the image $\Gamma(A)$ by $\hat{A}$.

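Lemma 5.30. Let $A$ be a commutative Banach algebra and let $x \in A$.

1. If $A$ is unital, then $\hat{x}(\Delta(A)) = \sigma(x)$.

2. If $A$ is non-unital, then $\hat{x}(\Delta(A)) \cup \{0\} = \sigma(x)$.

In either case, $\|\hat{x}\|_\infty = \rho(x) \le |x|$.

Proof. For (1), Theorem 5.21 shows that $\hat{x}(\Delta(A)) \subseteq \sigma(x)$. Conversely, if $\lambda \in \sigma(x)$ then $A(\lambda e - x)$ is a proper ideal of $A$ (for otherwise $\lambda e - x$ would be invertible). By Theorem 1.132 and Theorem 5.19, there is some $\varphi \in \Delta(A)$ such that $\ker\varphi \supseteq A(\lambda e - x)$. Therefore $\varphi(\lambda e - x) = 0$ and $\varphi(x) = \lambda$, so $\lambda \in \hat{x}(\Delta(A))$. For (2), we have $\hat{x}(\Delta(A_e)) = \sigma_{A_e}(x) = \sigma(x)$ from (1), and $\hat{x}(\Delta(A_e)) = \hat{x}(\Delta(A)) \cup \{0\}$ because $\Delta(A_e) = \Delta(A) \cup \{\varphi_\infty\}$ and $\varphi_\infty(x) = 0$.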
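A concrete instance of the Gelfand transform may help here. The sketch below assumes the (hypothetical, illustrative) algebra $A = \mathbb{C}^n$ with cyclic convolution as multiplication and the $\ell^1$ norm, i.e. the group algebra of $\mathbb{Z}/n$. Its nonzero complex homomorphisms are the $n$ characters $a \mapsto \sum_j a_j e^{-2\pi i jk/n}$, so the Gelfand transform is the discrete Fourier transform. None of these names appear in the text.

```python
import cmath

def convolve(a, b):
    # Cyclic convolution: the multiplication of the assumed algebra.
    n = len(a)
    return [sum(a[j] * b[(m - j) % n] for j in range(n)) for m in range(n)]

def gelfand_transform(a):
    # Value of the k-th character on a, for k = 0, ..., n-1 (the DFT of a).
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

a = [1, 2, 0, -1]
b = [0.5, 0, 1j, 3]

# Gamma is an algebra homomorphism: transform of a product = product of transforms.
lhs = gelfand_transform(convolve(a, b))
rhs = [u * v for u, v in zip(gelfand_transform(a), gelfand_transform(b))]
homomorphism_error = max(abs(u - v) for u, v in zip(lhs, rhs))

# The sup norm of the transform is bounded by the norm of the element.
sup_norm = max(abs(v) for v in gelfand_transform(a))
l1_norm = sum(abs(v) for v in a)
```

In this example the values of the transform are exactly the spectrum of `a` in the convolution algebra, and `sup_norm` is the spectral radius, which is bounded by `l1_norm` as in the lemma.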

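Lemma 5.31. Let $A$ be a commutative Banach algebra and let

\[ r = \inf_{x \in A \setminus \{0\}} \frac{|x^2|}{|x|^2}, \qquad s = \inf_{x \in A \setminus \{0\}} \frac{\|\hat{x}\|_\infty}{|x|}. \]

Then $s^2 \le r \le s$.

Proof. We have $s^2|x|^2 \le \|\hat{x}\|_\infty^2 = \|\widehat{x^2}\|_\infty \le |x^2|$ for all $x \in A$, so $s^2 \le r$. Also, $|x^2| \ge r|x|^2$ and therefore $|x^m| \ge r^{m-1}|x|^m$ when $m = 2^n$ and $n = 1, 2, \dots$. By Theorem 5.7, $\|\hat{x}\|_\infty = \rho(x) \ge r|x|$ for all $x \in A$, so $r \le s$.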

Theorem 5.32 (Properties of the Gelfand representation). Let $A$ be a commutative Banach algebra, and let $\Gamma : A \to C_0(\Delta(A))$ be the Gelfand representation.

1. $\Gamma$ is continuous and $|\Gamma| \le 1$.

2. $\ker\Gamma = \operatorname{rad}(A)$.

3. $\Gamma$ is an isometry if and only if $|x|^2 = |x^2|$ for all $x \in A$.

4. $\Gamma$ is injective (i.e. $A$ is semisimple) and $\Gamma(A)$ is closed in $C_0(\Delta(A))$ if and only if there is a constant $c > 0$ such that $|x|^2 \le c|x^2|$ for all $x \in A$.

5. $\Gamma(A)$ strongly separates points.

Proof. (1) follows from Lemma 5.30. For (2),

\[ \ker\Gamma = \{x \in A : \hat{x} = 0\} = \{x \in A : \varphi(x) = 0 \text{ for all } \varphi \in \Delta(A)\} = \bigcap_{\varphi \in \Delta(A)} \ker\varphi = \operatorname{rad}(A). \]

For (3), $\Gamma$ is an isometry if and only if $s = 1$ in Lemma 5.31, which is true if and only if $r = 1$. For (4), if $\Gamma$ is injective and $\Gamma(A)$ is closed in $C_0(\Delta(A))$ then the open mapping theorem implies that $s > 0$ in Lemma 5.31, so $r > 0$ and $|x|^2 \le r^{-1}|x^2|$ for all $x \in A$. Conversely, if $c > 0$ and $|x|^2 \le c|x^2|$ for all $x \in A$ then $r > 0$ and therefore $s > 0$. This implies that $\Gamma$ is injective and its inverse $\Gamma^{-1} : \Gamma(A) \to A$ is continuous, so $\Gamma(A)$ is complete (and closed) because $A$ is complete (see Theorem 1.35). (5) is clear because $\Gamma(A)$ consists of evaluation maps.

5.3 Commutative C*-algebras

Let A be a complex algebra. A map $* : A \to A$ is called an involution if it satisfies the following properties:

1. ∗ is a conjugate linear map, i.e. $(x + y)^* = x^* + y^*$ and $(rx)^* = \bar{r}x^*$ for all $x, y \in A$ and $r \in \mathbb{C}$.

2. (xy)∗ = y∗x∗ for all x, y ∈ A.

3. (x∗)∗ = x.

A *-algebra is a (complex) algebra equipped with an involution. A normed *-algebra is a *-algebra with a norm such that ∗ is an isometry, i.e. $|x^*| = |x|$ for all $x \in A$. A Banach *-algebra is a complete normed *-algebra. A *-homomorphism between two *-algebras A and B is an algebra homomorphism $f : A \to B$ that satisfies $f(x^*) = f(x)^*$ for all $x \in A$. If A is a *-algebra, then a *-subalgebra of A is a subalgebra B such that $x^* \in B$ for all $x \in B$. A *-ideal of A is an ideal I that is a *-subalgebra of A.

A C*-algebra is a Banach algebra A with an involution satisfying $|xx^*| = |x|^2$ for all $x \in A$ (the C* identity). Since $|xx^*| \le |x||x^*|$, we must have $|x| \le |x^*|$. This implies that $|x^*| \le |x^{**}| = |x|$, so $|x^*| = |x|$. This shows that every C*-algebra is a Banach *-algebra. The simplest example of a C*-algebra is $\mathbb{C}$ itself, with complex conjugation as the involution.

If A is a unital *-algebra, then it is easy to see that

\[ \sigma(x^*) = \overline{\sigma(x)} = \{\bar{z} : z \in \sigma(x)\} \]

for all x ∈ A.

Theorem 5.33. If A is a semisimple commutative Banach algebra, then every involution on A is continuous.

Proof. Define a new norm $|\cdot|_*$ on A by $|x|_* = |x^*|$. It is easy to check that this is an algebra norm, and that A is complete under this norm. By Corollary 5.28, there is some c such that $|x^*| = |x|_* \le c|x|$ for all $x \in A$.

The C*-algebra C0(X)

Before continuing further we will examine the properties of the algebra $C_0(X) = C_0(X, \mathbb{C})$, where X is an LCH space. This algebra is particularly important because the Gelfand transform represents the elements of A by elements of $C_0(\Delta(A))$. It is easy to check that $C_0(X)$ is a C*-algebra under the involution defined by $f^*(x) = \overline{f(x)}$. The following theorem shows that we can completely determine all closed ideals of $C_0(X)$.

Theorem 5.34. Let X be an LCH space. For each subset $E \subseteq X$, let

\[ I_E = \{f \in C_0(X) : f(x) = 0 \text{ for all } x \in E\}. \]

1. The map $E \mapsto I_E$ is a bijection between the set of nonempty closed subsets of X and the set of proper closed ideals of $C_0(X)$.

2. $I_E$ is modular if and only if E is compact.

3. $I_E \in \operatorname{Max}(C_0(X))$ if and only if E is a singleton.

Proof. For (1), $I_E$ is a closed ideal because the evaluation map $\varphi_x : C_0(X) \to \mathbb{C}$, defined by $\varphi_x(f) = f(x)$, is continuous for all $x \in X$. It is proper because for any $x \in E$ we can choose some $f \in C_0(X)$ such that $f(x) \ne 0$. Suppose $I_E = I_F$ where E and F are closed and $E \ne F$. We can assume that there is some $x \in F \setminus E$. By Urysohn's lemma there is some $f \in C_0(X)$ such that $f(x) \ne 0$ and $f = 0$ on E. Then $f \in I_E$ but $f \notin I_F$, which is a contradiction. This shows that the map $E \mapsto I_E$ is injective.

To prove that the map is surjective, let I be a proper closed ideal of $C_0(X)$ and let

\[ E = \{x \in X : f(x) = 0 \text{ for all } f \in I\}, \]

which is clearly a closed set. Then $I \subseteq I_E$, and it remains to show that $I_E \subseteq I$. Let $J = \{f \in C_c(X) : \operatorname{supp}(f) \subseteq X \setminus E\}$, which is an ideal of $C_0(X)$, and let $f \in J$. For each $x \in \operatorname{supp}(f)$, we can choose some $g_x \in I$ such that $g_x \ne 0$ on a neighborhood $U_x$ of x (because $x \notin E$). Since $\operatorname{supp}(f)$ is compact, we have $\operatorname{supp}(f) \subseteq U_{x_1} \cup \cdots \cup U_{x_n}$ for some $x_1, \dots, x_n \in \operatorname{supp}(f)$. Then $g = \sum_{k=1}^n g_{x_k}\overline{g_{x_k}}$ is in I and $g > 0$ on $\operatorname{supp}(f)$. Define $h \in C_c(X)$ by setting $h(x) = f(x)/g(x)$ for $x \in \operatorname{supp}(f)$ and $h(x) = 0$ for $x \in X \setminus \operatorname{supp}(f)$. Then $f = gh \in I$. This shows that $J \subseteq I$.

Note that $E \ne \emptyset$, for otherwise $J = C_c(X)$ and $C_0(X) = \overline{J} \subseteq \overline{I} = I$ (see Theorem 4.128). If we can show that J is dense in $I_E$, then $I_E \subseteq \overline{J} \subseteq \overline{I} = I$ and we are done. Let $f \in I_E$, let $\varepsilon > 0$ and let $K = \{x \in X : |f(x)| \ge \varepsilon\} \subseteq X \setminus E$, which is compact. By Urysohn's lemma, there is some $g \in C_c(X)$ such that $0 \le g \le 1$, $g = 1$ on K and $\operatorname{supp}(g) \subseteq X \setminus E$. Then $fg \in J$ and $\|f - fg\|_\infty \le \varepsilon$. This shows that J is dense in $I_E$.

For (2), if E is compact then we can choose some $g \in C_0(X)$ such that $g = 1$ on E, and $f - gf \in I_E$ for all $f \in C_0(X)$. Conversely, if there is some $g \in C_0(X)$ such that $f(1 - g) = f - gf \in I_E$ for all $f \in C_0(X)$, then $g = 1$ on E and E must be compact. For (3), if $E = \{x\}$ then $I_E = \ker\varphi_x$, so $I_E$ must be maximal. If E contains more than one point, then $I_E$ cannot be maximal because $I_{\{x\}}$ is a proper modular ideal containing $I_E$, for any $x \in E$.

Recall from Theorem 5.19 that the map $\varphi \mapsto \ker\varphi$ is a bijection from $\Delta(C_0(X))$ to $\operatorname{Max}(C_0(X))$. The preceding theorem shows that each $M \in \operatorname{Max}(C_0(X))$ is the kernel of the evaluation map $\varphi_x : C_0(X) \to \mathbb{C}$ for some $x \in X$. Therefore $\Delta(C_0(X)) = \{\varphi_x : x \in X\}$, and we can identify $\Delta(C_0(X))$ with X (as topological spaces) once we show that the bijection $x \mapsto \varphi_x$ is a homeomorphism. With this identification, the Gelfand representation is just the identity map on $C_0(X)$.

Theorem 5.35. The bijection $x \mapsto \varphi_x$ from X to $\Delta(C_0(X))$ is a homeomorphism.

Proof. A basis for the Gelfand topology on $\Delta(C_0(X))$ consists of sets of the form

\begin{align*}
U_{C_0(X)}(\varphi_x, F, \varepsilon) &= \{\varphi_y \in \Delta(C_0(X)) : |\varphi_y(f) - \varphi_x(f)| < \varepsilon \text{ for all } f \in F\} \\
&= \{\varphi_y \in \Delta(C_0(X)) : |f(y) - f(x)| < \varepsilon \text{ for all } f \in F\}
\end{align*}

where F is a finite subset of $C_0(X)$ and $\varepsilon > 0$, as in Theorem 5.29. Then

\[ \{y \in X : |f(y) - f(x)| < \varepsilon \text{ for all } f \in F\} = \bigcap_{f \in F} \{y \in X : |f(y) - f(x)| < \varepsilon\} \]

is open. This shows that $x \mapsto \varphi_x$ is continuous. Now let U be a neighborhood of $x \in X$. By Urysohn's lemma, we can choose some $f \in C_0(X)$ such that $f(x) \ne 0$ and $f = 0$ on $X \setminus U$. Then $\{\varphi_y : y \in U\}$ contains the neighborhood

\[ U_{C_0(X)}(\varphi_x, \{f\}, |f(x)|) = \{\varphi_y \in \Delta(C_0(X)) : |f(y) - f(x)| < |f(x)|\} \]

of $\varphi_x$, since $|f(y) - f(x)| < |f(x)|$ implies that $f(y) \ne 0$ and $y \in U$. This shows that $\varphi_x \mapsto x$ is continuous.

Unitization

If A is a *-algebra then the unitization $A_e$ is a *-algebra with the involution

\[ (x + re)^* = x^* + \bar{r}e. \]

However, if A is a C*-algebra, the norm $|x + re| = |x| + |r|$ does not in general satisfy $|xx^*| = |x|^2$ for $x \in A_e$. The following theorem shows that we can always choose a norm on $A_e$ that does satisfy the C* identity.

Theorem 5.36. Let A be a non-unital C*-algebra with a norm $|\cdot|$. Then there exists a norm $|\cdot|_*$ on $A_e$ satisfying the C* identity, such that $|x|_* = |x|$ for all $x \in A$ and $|e|_* = 1$ (if $A \ne 0$).

Proof. We will also use $|\cdot|$ to denote the norm $|x + re| = |x| + |r|$ on $A_e$. For each $x \in A_e$, define $L_x : A \to A$ by $L_x y = xy$ so that $|L_x| \le |x|$. Define $|x|_* = |L_x|$ for $x \in A_e$. If $x \in A$ then $|L_x x^*| = |xx^*| = |x|^2 = |x||x^*|$, so $|L_x| \ge |x|$ and therefore $|x|_* = |L_x| = |x|$. Also, if we choose any nonzero $x \in A$ then $|L_e x| = |x|$, so $|L_e| \ge 1$ and $|e|_* = 1$. It is clear that $|rx|_* = |r||x|_*$, $|x + y|_* \le |x|_* + |y|_*$, and $|xy|_* \le |x|_*|y|_*$ for $r \in \mathbb{C}$ and $x, y \in A_e$, so it remains to show that $|x|_* = 0$ implies $x = 0$, and that $|\cdot|_*$ satisfies $|x^*|_* = |x|_*$ and the C* identity.

If $|x|_* = 0$ then $xy = 0$ for all $y \in A$. Write $x = a + re$ with $a \in A$ and $r \in \mathbb{C}$; if $r \ne 0$ then $0 = xy = ay + ry$ and therefore $uy = y$ for all $y \in A$, where $u = -(1/r)a$. Also, $u = uu^* = (uu^*)^* = u^*$ and therefore $yu = yu^* = (uy^*)^* = (y^*)^* = y$ for all $y \in A$. This shows that u is a unit in A, which is a contradiction. Therefore $r = 0$, which implies that $x \in A$ and $x = 0$ since $|x| = |x|_* = 0$.

Finally, we check that $|\cdot|_*$ satisfies the C* identity. If $x \in A_e$ and $y \in A$ then

\begin{align*}
|L_x y| = |xy| &= |(xy)^*(xy)|^{1/2} \\
&= |y^* x^* x y|^{1/2} \le |y^*|^{1/2}\,|L_{x^*x}|^{1/2}\,|y|^{1/2} \\
&= |y|\,|x^*x|_*^{1/2},
\end{align*}

so $|x|_* = |L_x| \le |x^*x|_*^{1/2}$ and

\[ |x|_*^2 \le |x^*x|_* \le |x^*|_*\,|x|_*. \]

Therefore $|x|_* \le |x^*|_*$ and $|x^*|_* \le |x^{**}|_* = |x|_*$, so $|x^*|_* = |x|_*$ and $|xx^*|_* = |x^*x|_* = |x|_*^2$.
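
To see concretely why the norm $|x + re| = |x| + |r|$ can fail the C* identity, the following worked example may help. The choice $A = C_0(\mathbb{R})$ with the supremum norm and the function $g(t) = e^{-t^2}$ are assumptions made only for this illustration; they do not appear in the text.

```latex
% Assumed example: A = C_0(\mathbb{R}) with the sup norm, g(t) = e^{-t^2},
% and x = -g + e in A_e. Since g is real-valued, x^* = x.
\begin{align*}
x x^* &= (-g + e)(-g + e) = (g^2 - 2g) + e, \\
|g^2 - 2g| &= \sup_{t \in \mathbb{R}} g(t)\bigl(2 - g(t)\bigr) = 1
  \qquad (\text{attained at } t = 0, \text{ where } g(0) = 1), \\
|x x^*| &= |g^2 - 2g| + 1 = 2, \qquad |x|^2 = \bigl(|g| + 1\bigr)^2 = 4 .
\end{align*}
% With the norm of Theorem 5.36, L_x is multiplication by 1 - g, so
\begin{align*}
|x|_* = \sup_{t} |1 - g(t)| = 1, \qquad |x x^*|_* = \sup_{t} |1 - g(t)|^2 = 1 = |x|_*^2 .
\end{align*}
```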

Gelfand representation

Theorem 5.37. Let A be a commutative C*-algebra. Then the Gelfand representation is an isometric *-isomorphism from A to $C_0(\Delta(A))$.

Proof. Assume that A is unital. First, we will prove that the Gelfand representation $\Gamma : A \to C_0(\Delta(A))$ is a *-homomorphism, i.e. $\widehat{x^*} = \overline{\hat{x}}$ for all $x \in A$. This is equivalent to showing that $\varphi(x^*) = \overline{\varphi(x)}$ for all $\varphi \in \Delta(A)$ and $x \in A$. Write $\varphi(x) = a + bi$ and $\varphi(x^*) = c + di$ for $a, b, c, d \in \mathbb{R}$, and suppose that $b + d \ne 0$. Let

\[ y = (b + d)^{-1}(x + x^* - (a + c)e) \]

so that $y = y^*$ and

\[ \varphi(y) = (b + d)^{-1}(a + bi + c + di - (a + c)) = i. \]

For all $t \in \mathbb{R}$,

\begin{align*}
(t + 1)^2 = |(t + 1)i|^2 &= |\varphi(y + tie)|^2 \\
&\le |y + tie|^2 = |(y + tie)(y + tie)^*| \\
&= |(y + tie)(y - tie)| = |y^2 + t^2 e| \\
&\le |y^2| + t^2
\end{align*}

and therefore $2t + 1 \le |y^2|$. This cannot hold as $t \to \infty$, so we must have $d = -b$. Applying the same argument to $ix$ and $(ix)^*$ shows that $\operatorname{Im}\varphi((ix)^*) = -\operatorname{Im}\varphi(ix)$. But $\varphi(ix) = -b + ai$ and $\varphi((ix)^*) = \varphi(-ix^*) = d - ci$, so $a = c$. This proves that $\varphi(x^*) = \overline{\varphi(x)}$.

Next, we show that Γ is an isometry. Let $x \in A$ and let $y = xx^*$. Then $y = y^*$, so $|y|^2 = |yy^*| = |y^2|$. Therefore $|y|^m = |y^m|$ and $|y| = |y^m|^{1/m}$ when $m = 2^n$ and $n = 1, 2, \dots$. By Theorem 5.7, $|y| = \rho(y) = \|\hat{y}\|_\infty$. Since $\hat{y} = \hat{x}\,\widehat{x^*} = \hat{x}\,\overline{\hat{x}} = |\hat{x}|^2$,

\[ \|\hat{x}\|_\infty = \|\hat{y}\|_\infty^{1/2} = |y|^{1/2} = |xx^*|^{1/2} = |x|. \]

This proves that Γ is an isometry.

By Theorem 1.35, Γ(A) is complete and therefore closed in $C_0(\Delta(A))$. Recall from Theorem 5.32 that Γ(A) strongly separates points. Also, Γ(A) is closed under complex conjugation because Γ is a *-homomorphism. Then Theorem 4.134 shows that $\Gamma(A) = C_0(\Delta(A))$, and this completes the proof that Γ is an isometric *-isomorphism.

Finally, suppose that A is non-unital and consider its unitization $A_e$ with the norm defined in Theorem 5.36. Since $A_e$ is unital, its Gelfand representation $\Gamma : A_e \to C(\Delta(A_e))$ is an isometric *-isomorphism. This implies that $A_e$ is semisimple, so Theorem 5.28 shows that the norm in Theorem 5.36 is equivalent to the usual norm $|x + re| = |x| + |r|$ on $A_e$. By Theorem 5.29 and Theorem 4.131 we can identify $C(\Delta(A_e))$ with $C_0(\Delta(A))_e$, so that $\Gamma|_A : A \to C_0(\Delta(A))$ is the Gelfand representation of A, and is an isometric *-isomorphism.

Normal elements

Let A be a *-algebra. An element $x \in A$ for which $xx^* = x^*x$ is called normal. A set $S \subseteq A$ is normal if it commutes and $x^* \in S$ for all $x \in S$. By definition, any normal subalgebra of A is a *-algebra. If $x^* = x$ then we say that x is Hermitian or self-adjoint. Any $x \in A$ can be written in the form $x = u + iv$ where $u, v \in A$ are Hermitian, because we can take

\[ u = \frac{1}{2}(x + x^*) \quad \text{and} \quad v = \frac{1}{2i}(x - x^*). \]

Occasionally we will use $\operatorname{Re} x$ and $\operatorname{Im} x$ to refer to u and v respectively. Also note that e is Hermitian because $e^* = ee^* = (ee^*)^* = (e^*)^* = e$. An element $x \in A$ is positive if it is Hermitian and $\sigma(x) \subseteq [0, \infty)$, in which case we write $x \ge 0$. Note that the set of all normal elements is closed because the map $x \mapsto xx^* - x^*x$ is continuous, and the set of Hermitian elements is closed because the map $x \mapsto x^* - x$ is continuous.

Lemma 5.38. Let $x \in A$ and write $x = u + iv$ where $u, v \in A$ are Hermitian. Then x is normal if and only if u and v commute.

Proof. Since $x + x^* = 2u$ and $x - x^* = 2iv$, we have $uv = vu$ if and only if $(x + x^*)(x - x^*) = (x - x^*)(x + x^*)$. Expanding both sides, this holds if and only if $x^*x - xx^* = xx^* - x^*x$, i.e. $xx^* = x^*x$.

Theorem 5.39. Let A be a unital Banach *-algebra. For any subset $S \subseteq A$, there exists a maximal normal subset B of A containing S. Furthermore:

1. B is a closed unital commutative *-subalgebra of A.

2. $\sigma_B(x) = \sigma_A(x)$ for all $x \in B$.

Proof. The existence of B is easy to prove using Zorn's lemma. First, we show that if $x \in A$ is normal and $xy = yx$ for all $y \in B$, then $x \in B$. We have

\[ x^*y = (y^*x)^* = (xy^*)^* = yx^* \]

for all $y \in B$, so $x^*$ also commutes with every element of B. Then $B \cup \{x, x^*\}$ is normal, so it is contained in B (since B is maximal) and therefore $x \in B$. This proves the assertion, and (1) follows. For (2), note that if $x \in B$ has an inverse in A then $x^{-1} \in B$ because $x^{-1}$ is normal and

\[ x^{-1}y = x^{-1}yxx^{-1} = x^{-1}xyx^{-1} = yx^{-1} \]

for all $y \in B$.

Corollary 5.40. Let A be a unital Banach *-algebra and let $x, y \in A$. If $xy = yx$, then $\sigma(x + y) \subseteq \sigma(x) + \sigma(y)$ and $\sigma(xy) \subseteq \sigma(x)\sigma(y)$.

Proof. Let B be a maximal normal subset containing x, y and consider the Gelfand representation $\Gamma : B \to C(\Delta(B))$. The result follows from applying Lemma 5.30 to Γ(x) and Γ(y).

Theorem 5.41 (Properties of normal, Hermitian and positive elements). Let A be a unital C*-algebra.

1. If x ∈ A is Hermitian then σ(x) ⊆ R.

2. If x ∈ A is normal then ρ(x) = |x|.

3. If x, y ∈ A and x, y ≥ 0 then x+ y ≥ 0.

4. If x ∈ A then xx∗ ≥ 0 and e+ xx∗ is invertible.

Proof. For (1) and (2), Theorem 5.39 shows that there is a maximal normal subset B containing x, and that B is a unital commutative C*-algebra satisfying $\sigma_B(x) = \sigma_A(x)$. By Theorem 5.37, the Gelfand representation $\Gamma : B \to C(\Delta(B))$ is an isometric *-isomorphism. If x is Hermitian then $\Gamma(x) = \Gamma(x^*) = \overline{\Gamma(x)}$, so Γ(x) is real-valued. Therefore $\sigma_A(x) = \sigma_B(x) = \Gamma(x)(\Delta(B)) \subseteq \mathbb{R}$. If x is normal then $\rho(x) = \|\Gamma(x)\|_\infty = |x|$ since Γ is an isometry. For (3), note that $\sigma(x) \subseteq [0, |x|]$ and therefore $\sigma(|x|e - x) \subseteq [0, |x|]$ by Theorem 5.2. Since $|x|e - x$ is normal, (2) implies that $\bigl||x|e - x\bigr| \le |x|$. Similarly, $\bigl||y|e - y\bigr| \le |y|$ and

\[ \bigl|(|x| + |y|)e - (x + y)\bigr| \le |x| + |y|. \]

Since $x + y$ is Hermitian, (1) implies that $\sigma((|x| + |y|)e - (x + y))$ is real and therefore

\[ \sigma((|x| + |y|)e - (x + y)) \subseteq [-(|x| + |y|), |x| + |y|]. \]

Applying Theorem 5.2 again shows that

\[ \sigma(x + y) \subseteq [0, 2(|x| + |y|)], \]

which proves (3). For (4), let $y = xx^*$ so that y is Hermitian. There is a maximal normal subset B containing y such that $\Gamma : B \to C(\Delta(B))$ is an isometric *-isomorphism, as in the proof of (1) and (2). (1) implies that Γ(y) is real-valued, and it remains to show that $\Gamma(y) \ge 0$ on $\Delta(B)$. Let $z = \Gamma^{-1}(|\Gamma(y)| - \Gamma(y)) \in B$ so that $z^* = \Gamma^{-1}(|\Gamma(y)| - \Gamma(y^*)) = z$. Let $w = zx$ and write $w = u + iv$ where $u, v \in A$ are Hermitian. Then

\[ ww^* = zxx^*z = zyz = z^2y \]

and

\begin{align*}
w^*w &= (u - iv)(u + iv) \\
&= (-u - iv + 2u)(u - iv + 2iv) \\
&= -ww^* - 2i(u + iv)v + 2u(u - iv) + 4iuv \\
&= -z^2y + 2u^2 + 2v^2.
\end{align*}

Since u and v are Hermitian, Theorem 5.2 implies that $u^2, v^2 \ge 0$. Also,

\begin{align*}
\Gamma(-z^2y) &= -(|\Gamma(y)| - \Gamma(y))^2\,\Gamma(y) = -\Gamma(y)^3 + 2|\Gamma(y)|\Gamma(y)^2 - \Gamma(y)^3 \\
&= 2\Gamma(y)^2(|\Gamma(y)| - \Gamma(y)) \ge 0
\end{align*}

on $\Delta(B)$, so $-z^2y \ge 0$. Then (3) implies that $w^*w \ge 0$, and Theorem 5.1 shows that $\sigma(ww^*) \subseteq \sigma(w^*w) \cup \{0\}$, implying that $ww^* \ge 0$. Since $ww^* = z^2y$, we have $\Gamma(z^2y) \ge 0$ on $\Delta(B)$ and therefore $\Gamma(z^2y) = 0$. This implies that $\Gamma(y) = |\Gamma(y)| \ge 0$ on $\Delta(B)$, and (4) follows.

Theorem 5.42. Let A be a unital C*-algebra and let B be a closed unital *-subalgebra of A. Then $\sigma_B(x) = \sigma_A(x)$ for all $x \in B$.

Proof. Let $x \in B$ with an inverse in A. Then $x^*$ and $xx^*$ both have inverses in A, and Theorem 5.41 implies that $\sigma_A(xx^*) \subseteq (0, \infty)$. Theorem 5.10 shows that $\sigma_B(xx^*) = \sigma_A(xx^*)$, so $xx^*$ has an inverse in B. Therefore $x^{-1} = x^*(xx^*)^{-1} \in B$.

If $S \subseteq \mathbb{C}$ is a self-conjugate set, the conjugate of a function $f : S \to \mathbb{C}$ is the function $f^* : S \to \mathbb{C}$ defined by

\[ f^*(z) = \overline{f(\bar{z})}. \]

We say that f is self-conjugate if $f = f^*$, i.e. $f(\bar{z}) = \overline{f(z)}$ for all $z \in S$. Note that f is holomorphic if and only if $f^*$ is holomorphic.

Theorem 5.43. Let A be a unital commutative Banach *-algebra and let $U \subseteq \mathbb{C}$ be a self-conjugate open set.

1. The set $A_U = \{x \in A : \sigma(x) \subseteq U\}$ is self-adjoint, in the sense that $x^* \in A_U$ for all $x \in A_U$.

2. The map H of Theorem 5.13 is self-conjugate, in the sense that $H(f^*)(x) = H(f)(x^*)^*$ for all $f \in H(U)$ and $x \in A_U$.

In particular, if $f \in H(U)$ is self-conjugate then $H(f) : A_U \to A$ is self-adjoint, in the sense that $H(f)(x^*) = H(f)(x)^*$ for all $x \in A_U$. If $f \in H(U)$ is self-conjugate and $x \in A_U$ is Hermitian, then $H(f)(x)$ is also Hermitian.

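Proof. For (1), suppose that $x \in A_U$. Then

\[ \sigma(x^*) = \{\bar{z} : z \in \sigma(x)\} \subseteq \{\bar{z} : z \in U\} = U, \]

so $x^* \in A_U$.

For (2), Theorem 5.11 shows that we can choose some 1-cycle γ in U that surrounds S such that $[\gamma] = [\eta - \bar{\eta}]$ for some piecewise $C^1$ 1-chain η in U. Write $[\eta] = \sum_{i=1}^k c_i[\sigma_i]$ where each $\sigma_i$ is a $C^1$ path. Then

\[ H(f)(x^*)^* = \left( \frac{1}{2\pi i} \int_\gamma f(\lambda)(\lambda e - x^*)^{-1}\,d\lambda \right)^* = -\frac{1}{2\pi i} \sum_{i=1}^k c_i \left( \int_{\sigma_i - \bar{\sigma}_i} f(\lambda)(\lambda e - x^*)^{-1}\,d\lambda \right)^*. \]

Now

\begin{align*}
\left( \int_{\sigma_i} f(\lambda)(\lambda e - x^*)^{-1}\,d\lambda \right)^*
&= \left( \int_0^1 f(\sigma_i(t))(\sigma_i(t)e - x^*)^{-1}\sigma_i'(t)\,dt \right)^* \\
&= \int_0^1 \overline{f(\sigma_i(t))}\,(\overline{\sigma_i(t)}e - x)^{-1}\,\overline{\sigma_i'(t)}\,dt \\
&= \int_0^1 f^*(\bar{\sigma}_i(t))(\bar{\sigma}_i(t)e - x)^{-1}\,\bar{\sigma}_i'(t)\,dt \\
&= \int_{\bar{\sigma}_i} f^*(\lambda)(\lambda e - x)^{-1}\,d\lambda
\end{align*}

because the involution on A is a continuous conjugate linear map. Therefore

\begin{align*}
H(f)(x^*)^* &= -\frac{1}{2\pi i} \sum_{i=1}^k c_i \left( \int_{\sigma_i} f(\lambda)(\lambda e - x^*)^{-1}\,d\lambda - \int_{\bar{\sigma}_i} f(\lambda)(\lambda e - x^*)^{-1}\,d\lambda \right)^* \\
&= -\frac{1}{2\pi i} \sum_{i=1}^k c_i \left( \int_{\bar{\sigma}_i} f^*(\lambda)(\lambda e - x)^{-1}\,d\lambda - \int_{\sigma_i} f^*(\lambda)(\lambda e - x)^{-1}\,d\lambda \right) \\
&= \frac{1}{2\pi i} \sum_{i=1}^k c_i \int_{\sigma_i - \bar{\sigma}_i} f^*(\lambda)(\lambda e - x)^{-1}\,d\lambda \\
&= H(f^*)(x).
\end{align*}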

Corollary 5.44 (Existence of Hermitian logarithms and square roots). Let A be a unital Banach *-algebra and let $x \in A$ be Hermitian. If $\sigma(x) \subseteq \mathbb{C} \setminus (-\infty, 0]$ then:

1. There exists some Hermitian $y \in A$ such that $\exp(y) = x$.

2. There exists some Hermitian $y \in A$ such that $y^2 = x$.

Proof. First assume that A is commutative. Let $g : \mathbb{C} \setminus (-\infty, 0] \to \mathbb{C}$ be the logarithm function defined in Theorem 3.13, which is holomorphic. Let $y = H(g)(x)$ so that $\exp(y) = H(\exp \circ g)(x) = x$ due to Theorem 5.16. Since $g(\bar{z}) = \overline{g(z)}$ for all $z \in \mathbb{C} \setminus (-\infty, 0]$, Theorem 5.43 shows that y is Hermitian. For the general case, apply the preceding result with x as an element of some maximal normal subset containing x (see Theorem 5.39). This proves (1), and (2) is similar.

Theorem 5.45 (Fuglede-Putnam-Rosenblum theorem). Let A be a unital C*-algebra and let $x, y, z \in A$. If x and y are normal and $xz = zy$, then $x^*z = zy^*$.

Proof. We first prove that $|\exp(u - u^*)| = 1$ for all $u \in A$. We have

\begin{align*}
\exp(u - u^*)^* &= \sum_{n=0}^\infty \frac{((u - u^*)^n)^*}{n!} = \sum_{n=0}^\infty \frac{(u^* - u)^n}{n!} \\
&= \exp(u^* - u) = \exp(u - u^*)^{-1},
\end{align*}

so

\[ |\exp(u - u^*)|^2 = |\exp(u - u^*)\exp(u - u^*)^*| = 1. \]

Next, note that $x^n z = z y^n$ for $n = 1, 2, \dots$ by induction. Therefore

\[ \exp(x)z = \sum_{n=0}^\infty \frac{x^n z}{n!} = \sum_{n=0}^\infty \frac{z y^n}{n!} = z\exp(y), \]

and

\[ |\exp(x^*)z\exp(-y^*)| = |\exp(x^* - x)\,z\,\exp(y - y^*)| \le |z|. \]

The above argument holds for $\bar{w}x$ and $\bar{w}y$ in place of x and y when $w \in \mathbb{C}$, so the function

\[ f(w) = \exp(wx^*)\,z\,\exp(-wy^*) \]

is bounded and entire. By Liouville's theorem (Corollary 3.9), f is constant. Then

\[ f'(w) = x^*\exp(wx^*)\,z\,\exp(-wy^*) - \exp(wx^*)\,zy^*\exp(-wy^*) = 0 \]

using the product rule and Theorem 2.62, so

\[ f'(0) = x^*z - zy^* = 0. \]
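
A frequently used special case, stated here only as an illustration of Theorem 5.45, is obtained by taking $y = x$: anything that commutes with a normal element also commutes with its adjoint.

```latex
% Special case y = x of Theorem 5.45, followed by taking adjoints.
\begin{align*}
x \text{ normal},\quad xz = zx \quad &\Longrightarrow \quad x^* z = z x^*, \\
\text{and hence} \quad (x^* z)^* = (z x^*)^* \quad &\Longrightarrow \quad z^* x = x z^* .
\end{align*}
```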

Continuous functional calculus

Theorem 5.46 (Continuous functional calculus). Let A be a unital C*-algebra, let $x \in A$ be normal, and let B be the closed unital subalgebra generated by x and $x^*$. Then $\Gamma(x) : \Delta(B) \to \sigma(x)$ is a homeomorphism, and there exists a unique isometric unital *-homomorphism $C_x : C(\sigma(x)) \to A$ such that $x = C_x(\mathrm{Id}_{\sigma(x)})$.

Proof. Note that B is normal because x is normal, and $\sigma_B(x) = \sigma_A(x)$ by Theorem 5.42. Theorem 5.37 shows that the Gelfand representation $\Gamma : B \to C(\Delta(B))$ is an isometric *-isomorphism. By Lemma 5.30, Γ(x) is a surjection. Suppose that $\Gamma(x)(\varphi) = \Gamma(x)(\psi)$ for some $\varphi, \psi \in \Delta(B)$. Then $\varphi(x) = \psi(x)$, and by Theorem 5.37, $\varphi(x^*) = \psi(x^*)$. If $p(a, b)$ is a polynomial in two variables then $\varphi(p(x, x^*)) = \psi(p(x, x^*))$. The set of all $p(x, x^*)$ is dense in B, so $\varphi = \psi$ by continuity. This shows that Γ(x) is a bijection. Since Γ(x) is continuous and $\Delta(B)$ is compact, Γ(x) is a homeomorphism.

Define $C_x : C(\sigma(x)) \to A$ by $C_x(f) = \Gamma^{-1}(f \circ \Gamma(x))$. It is easy to see that $C_x$ is an isometric unital *-homomorphism such that $C_x(\mathrm{Id}_{\sigma(x)}) = x$. Suppose that $D_x : C(\sigma(x)) \to A$ is another isometric unital *-homomorphism satisfying $x = D_x(\mathrm{Id}_{\sigma(x)})$. If $q : \sigma(x) \to \mathbb{C}$ is a function of the form $q(z) = p(z, \bar{z})$ for some polynomial $p(a, b)$ in two variables (with complex coefficients) then $C_x(q) = p(x, x^*) = D_x(q)$. The set of all such functions is dense in $C(\sigma(x))$ due to Theorem 4.134, so $C_x = D_x$ by continuity.

In contrast to the holomorphic functional calculus (see Theorem 5.13), the continuous functional calculus applies to continuous (rather than holomorphic) functions defined on $\sigma(x)$ for any normal element x in a unital C*-algebra. We will often write $f(x) = C_x(f)$ for convenience.

Theorem 5.47 (Spectral mapping theorem). Let A be a unital C*-algebra, let $x \in A$ be normal, and let $f \in C(\sigma(x))$. Then

\[ \sigma(f(x)) = f(\sigma(x)) \]

(cf. Theorem 5.15).

Proof. Define $B \subseteq A$ and $\Gamma : B \to C(\Delta(B))$ as in Theorem 5.46. Then

\[ \sigma(f(x)) = \Gamma(C_x(f))(\Delta(B)) = f(\Gamma(x)(\Delta(B))) = f(\sigma(x)). \]
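
As a sanity check on Theorems 5.46 and 5.47, consider a diagonal matrix. The setting below, $A = M_n(\mathbb{C})$ with the operator norm and conjugate transpose as involution, and the names $\lambda_1, \dots, \lambda_n$, are assumptions of this sketch rather than part of the text.

```latex
% Assumed example: A = M_n(\mathbb{C}), x = diag(\lambda_1, ..., \lambda_n), which is normal.
\begin{align*}
\sigma(x) &= \{\lambda_1, \dots, \lambda_n\}, \\
D(f) &= \operatorname{diag}\bigl(f(\lambda_1), \dots, f(\lambda_n)\bigr)
  \qquad (f \in C(\sigma(x))), \\
|D(f)| &= \max_{k} |f(\lambda_k)| = \|f\|_\infty, \qquad
D(\bar{f}) = D(f)^*, \qquad D(\mathrm{Id}_{\sigma(x)}) = x .
\end{align*}
% D is an isometric unital *-homomorphism with D(Id) = x, so by the uniqueness
% part of Theorem 5.46 it coincides with C_x, and
\begin{align*}
\sigma(f(x)) = \{f(\lambda_1), \dots, f(\lambda_n)\} = f(\sigma(x)).
\end{align*}
```

For instance, $f(z) = \bar{z}$ recovers $x^*$ and $f(z) = |z|$ gives $\operatorname{diag}(|\lambda_1|, \dots, |\lambda_n|)$, neither of which is available from the holomorphic functional calculus.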

Positive linear functionals

Let A be a unital Banach *-algebra. A linear functional $\lambda : A \to \mathbb{C}$ (not necessarily continuous) is positive if $\lambda(xx^*) \ge 0$ for all $x \in A$.

Theorem 5.48 (Properties of positive linear functionals). Let A be a unital Banach *-algebra and let λ be a positive linear functional on A. Let $x, y \in A$.

1. $\lambda(x^*) = \overline{\lambda(x)}$.

2. $|\lambda(xy^*)|^2 \le \lambda(xx^*)\lambda(yy^*)$.

3. $|\lambda(x)|^2 \le \lambda(e)\lambda(xx^*) \le \lambda(e)^2\rho(xx^*)$.

4. $|\lambda(x)| \le \lambda(e)\rho(x)$ if x is normal.

5. If A is commutative then $|\lambda| = \lambda(e)$. If there is some $c > 0$ such that $|x^*| \le c|x|$ for all $x \in A$, then $|\lambda| \le \sqrt{c}\,\lambda(e)$.

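Proof. For all $r \in \mathbb{C}$,

\[ \lambda((x + ry)(x^* + \bar{r}y^*)) = \lambda(xx^*) + \bar{r}\lambda(xy^*) + r\lambda(yx^*) + |r|^2\lambda(yy^*) \ge 0. \tag{*} \]

Setting $r = 1$ and $r = i$ gives

\begin{align*}
\lambda(xx^*) + \lambda(xy^*) + \lambda(yx^*) + \lambda(yy^*) &\ge 0, \\
\lambda(xx^*) - i\lambda(xy^*) + i\lambda(yx^*) + \lambda(yy^*) &\ge 0,
\end{align*}

so $\lambda(xy^*) + \lambda(yx^*)$ and $i(\lambda(xy^*) - \lambda(yx^*))$ are real. Then

\begin{align*}
\overline{2i\lambda(xy^*)} &= \overline{i(\lambda(xy^*) - \lambda(yx^*))} + \overline{i(\lambda(xy^*) + \lambda(yx^*))} \\
&= i(\lambda(xy^*) - \lambda(yx^*)) - i(\lambda(xy^*) + \lambda(yx^*)) = -2i\lambda(yx^*),
\end{align*}

so $\overline{\lambda(xy^*)} = \lambda(yx^*)$. (1) follows if we set $y = e$. For (2), we can assume that $\lambda(xy^*) \ne 0$. For any $s \in \mathbb{R}$, we can set

\[ r = s\,\frac{\lambda(xy^*)}{|\lambda(xy^*)|} \]

in (*) so that

\[ \lambda(xx^*) + 2s|\lambda(xy^*)| + s^2\lambda(yy^*) \ge 0. \]

The left hand side is a quadratic polynomial in s, so its discriminant cannot be positive. Therefore

\[ 4|\lambda(xy^*)|^2 - 4\lambda(xx^*)\lambda(yy^*) \le 0, \]

which proves (2). For (3), set $y = e$ in (2) to get $|\lambda(x)|^2 \le \lambda(e)\lambda(xx^*)$. Let $s \in \mathbb{R}$ such that $s > \rho(xx^*)$. Theorem 5.2 implies that

\[ \sigma(se - xx^*) \subseteq \{z \in \mathbb{C} : \operatorname{Re} z > 0\}, \]

so Corollary 5.44 shows that there is some Hermitian $u \in A$ such that $u^2 = se - xx^*$. Then

\[ s\lambda(e) - \lambda(xx^*) = \lambda(se - xx^*) = \lambda(u^2) \ge 0, \]

i.e. $\lambda(xx^*) \le s\lambda(e)$. Since this holds for every $s > \rho(xx^*)$, we must have $\lambda(xx^*) \le \lambda(e)\rho(xx^*)$. For (4), Corollary 5.40 shows that $\sigma(xx^*) \subseteq \sigma(x)\sigma(x^*)$, so $\rho(xx^*) \le \rho(x)\rho(x^*) = \rho(x)^2$. For (5), if A is commutative then (4) implies that

\[ |\lambda(x)| \le \lambda(e)\rho(x) \le \lambda(e)|x| \]

for all $x \in A$, so $|\lambda| = \lambda(e)$. If $|x^*| \le c|x|$ for all $x \in A$ then

\[ |\lambda(x)| \le \lambda(e)\sqrt{\rho(xx^*)} \le \lambda(e)\sqrt{|x||x^*|} \le \sqrt{c}\,\lambda(e)|x| \]

for all $x \in A$, so $|\lambda| \le \sqrt{c}\,\lambda(e)$.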
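
A familiar instance of Theorem 5.48 is the trace on matrices. The setting $A = M_n(\mathbb{C})$ with conjugate transpose as involution and $\lambda = \operatorname{tr}$ is an assumption chosen for illustration; it is not discussed in the text.

```latex
% Assumed example: A = M_n(\mathbb{C}), \lambda = tr, so \lambda(e) = n.
\begin{align*}
\lambda(xx^*) &= \operatorname{tr}(xx^*) = \sum_{i,j=1}^n |x_{ij}|^2 \ge 0
  && \text{(positivity)}, \\
|\operatorname{tr}(xy^*)|^2 &\le \operatorname{tr}(xx^*)\,\operatorname{tr}(yy^*)
  && \text{(part (2))}, \\
|\operatorname{tr}(x)|^2 &\le n\,\operatorname{tr}(xx^*) \le n^2\,\rho(xx^*)
  && \text{(part (3))}.
\end{align*}
```

Part (2) is then the Cauchy-Schwarz inequality for the inner product $(x, y) \mapsto \operatorname{tr}(xy^*)$ on matrices.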

Theorem 5.49. Let A be a unital commutative Banach *-algebra such that the Gelfand representation is a *-homomorphism. Let $L \subseteq A^*$ be the convex set of all positive linear functionals λ on A such that $\lambda(e) \le 1$. Let $M \subseteq M_R(\Delta(A), \mathbb{R})$ be the convex set of all positive Radon measures μ on $\Delta(A)$ such that $\mu(\Delta(A)) \le 1$. Then for every $\mu \in M$ the map

\[ \lambda_\mu(x) = \int_{\Delta(A)} \hat{x}\,d\mu \]

is a positive linear functional in L, and the map $\mu \mapsto \lambda_\mu$ is a bijection.

Proof. It is clear that $\lambda_\mu$ is linear. For all $x \in A$,

\[ \lambda_\mu(xx^*) = \int_{\Delta(A)} \Gamma(xx^*)\,d\mu = \int_{\Delta(A)} |\Gamma(x)|^2\,d\mu \ge 0 \]

because Γ is a *-homomorphism. Also, $\lambda_\mu \in L$ because $\lambda_\mu(e) = \mu(\Delta(A))$.

Recall from Theorem 5.32 that Γ(A) strongly separates points. Also, Γ(A) is closed under complex conjugation because Γ is a *-homomorphism. Then Theorem 4.134 shows that Γ(A) is dense in $C(\Delta(A))$. To prove that $\mu \mapsto \lambda_\mu$ is injective, suppose that $\lambda_\mu = \lambda_\nu$ for some $\mu, \nu \in M$. Since

\[ \int_{\Delta(A)} f\,d\mu = \int_{\Delta(A)} f\,d\nu \]

for all $f \in \Gamma(A)$, and Γ(A) is dense in $C(\Delta(A))$, equality must hold for all $f \in C(\Delta(A))$. By uniqueness in Theorem 4.137, $\mu = \nu$. To prove surjectivity, let $\lambda \in L$. Define a linear map $\Lambda : \Gamma(A) \to \mathbb{C}$ by setting $\Lambda(\varphi) = \lambda(x_\varphi)$ where $x_\varphi \in A$ is chosen so that $\varphi = \Gamma(x_\varphi)$. This is well-defined because part (4) of Theorem 5.48 implies that $\lambda = 0$ on $\ker\Gamma = \operatorname{rad}(A)$. Since

\[ |\Lambda(\varphi)| = |\lambda(x_\varphi)| \le \lambda(e)\rho(x_\varphi) = \lambda(e)\|\varphi\|_\infty, \]

the norm of Λ is $\lambda(e)$. Since Γ(A) is dense in $C(\Delta(A))$, we can extend Λ to a linear functional on $C(\Delta(A))$ with the same norm. By Theorem 4.151, there exists some $\mu \in M_R(\Delta(A), \mathbb{C})$ with $\|\mu\| = |\Lambda| = \lambda(e)$ such that

\[ \Lambda(\varphi) = \int_{\Delta(A)} \varphi\,d\mu \]

for all $\varphi \in C(\Delta(A))$. In particular we have

\[ \lambda(x) = \Lambda(\hat{x}) = \int_{\Delta(A)} \hat{x}\,d\mu \]

for all $x \in A$ and

\[ \mu(\Delta(A)) = \int_{\Delta(A)} \hat{e}\,d\mu = \lambda(e) = \|\mu\| \le 1. \]

Since $\mu(\Delta(A)) = |\mu|(\Delta(A))$, Theorem 4.15 implies that μ is a positive measure.



5.4 Positive operator-valued measures

In this section and the next we will examine the Banach algebra $L(E)$, where E is a complex Hilbert space. From Theorem 1.98 it is clear that the map $f \mapsto f^*$ is an involution on $L(E)$ satisfying the C* identity. In other words, $L(E)$ is a unital C*-algebra.

Hermitian forms

A Hermitian form on a complex Hilbert space E is a sesquilinear map $\varphi : E \times E \to \mathbb{C}$ that satisfies $\varphi(x, y) = \overline{\varphi(y, x)}$ for all $x, y \in E$. A quadratic form on E is a map $q : E \to \mathbb{R}$ satisfying $q(rx) = |r|^2 q(x)$ for all $r \in \mathbb{C}$. We say that q is positive semidefinite if $q(x) \ge 0$ for all $x \in E$, and positive definite if $q(x) > 0$ for all nonzero $x \in E$. If φ is a Hermitian form, $q_\varphi(x) = \varphi(x, x)$ defines an associated quadratic form. It is clear that φ is positive semidefinite (or positive definite) if and only if $q_\varphi$ is. If $q_\varphi = 0$ then $\varphi = 0$ due to Theorem 1.5.

Example 5.50. The following are examples of Hermitian forms:

1. Inner products are defined as positive definite Hermitian forms on a (complex) vector space.

2. If $f \in L(E)$ is self-adjoint, then $\varphi(x, y) = \langle fx, y \rangle$ defines a Hermitian form on E because

\[ \varphi(x, y) = \langle fx, y \rangle = \langle x, fy \rangle = \overline{\langle fy, x \rangle} = \overline{\varphi(y, x)}. \]

Furthermore, f is positive (or positive definite) if and only if φ is. Also,

\[ |\varphi| = \sup_{|x| = |y| = 1} |\langle fx, y \rangle| = \sup_{|x| = 1} |fx| = |f|. \]
Theorem 5.51. Let ϕ be a Hermitian form on E and let q be the associatedquadratic form.

1. (Cauchy-Schwarz inequality). If ϕ is positive semidefinite then for all x, y ∈E,

|ϕ(x, y)| ≤ ϕ(x, x)1/2ϕ(y, y)1/2.

If ϕ is positive definite, then equality holds if and only if one of x and y isa scalar multiple of the other.

337


2. (Triangle inequality). If ϕ is positive semidefinite then for all x, y ∈ E,

q(x+ y)1/2 ≤ q(x)1/2 + q(y)1/2.

If ϕ is positive definite, then equality holds if and only if one of x and y isa scalar multiple of the other.

3. (Parallelogram law). For all x, y ∈ E,

q(x+ y) + q(x− y) = 2q(x) + 2q(y).

4. (Polarization identity). For all x, y ∈ E,

ϕ(x, y) =1

4(q(x+ y)− q(x− y) + iq(x+ iy)− iq(x− iy)).
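
The polarization identity can be checked by direct expansion. The sketch below assumes that sesquilinear means linear in the first argument and conjugate linear in the second, which is consistent with the example $\varphi(x, y) = \langle fx, y \rangle$ but is not stated explicitly in this section.

```latex
% Expansion under the assumed convention: \varphi(x, iy) = -i\varphi(x,y),
% \varphi(iy, x) = i\varphi(y,x), and \varphi(y,x) = \overline{\varphi(x,y)}.
\begin{align*}
q(x+y) - q(x-y) &= 2\varphi(x,y) + 2\varphi(y,x) = 4\operatorname{Re}\varphi(x,y), \\
q(x+iy) - q(x-iy) &= 2\varphi(x,iy) + 2\varphi(iy,x)
  = -2i\,\varphi(x,y) + 2i\,\overline{\varphi(x,y)} = 4\operatorname{Im}\varphi(x,y), \\
\tfrac{1}{4}\bigl(4\operatorname{Re}\varphi(x,y) + 4i\operatorname{Im}\varphi(x,y)\bigr)
  &= \varphi(x,y).
\end{align*}
```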


Theorem 5.52. Let φ be a positive semidefinite Hermitian form on E and let q be the associated quadratic form. Then φ is continuous if and only if there exists some $c > 0$ such that $|q(x)| \le c|x|^2$ for all $x \in E$. In that case,

\[ |\varphi| = \sup_{|x| = 1} |q(x)|. \]

Proof. If φ is continuous then $|q(x)| \le |\varphi||x|^2$ for all $x \in E$. Conversely, suppose that $|q(x)| \le c|x|^2$ for all $x \in E$. Let $x, y \in E$ be unit vectors and choose $\theta \in \mathbb{R}$ so that $\operatorname{Im}\varphi(x, e^{i\theta}y) = 0$. Then

\[ q(x + ie^{i\theta}y) - q(x - ie^{i\theta}y) = 2(\varphi(x, ie^{i\theta}y) + \varphi(ie^{i\theta}y, x)) = 0, \]

and

\begin{align*}
|\varphi(x, y)| = |\varphi(x, e^{i\theta}y)| &= \frac{1}{4}\bigl|q(x + e^{i\theta}y) - q(x - e^{i\theta}y)\bigr| \\
&\le \frac{c}{4}\bigl(|x + e^{i\theta}y|^2 + |x - e^{i\theta}y|^2\bigr) = \frac{c}{2}(|x|^2 + |y|^2) \\
&= c
\end{align*}

using the polarization identity and parallelogram law. This shows that $|\varphi| \le c$.

Corollary 5.53. If $f \in L(E)$ is self-adjoint, then

\[ |f| = \sup_{|x| = 1} |\langle fx, x \rangle|. \]

Theorem 5.54. Let q be a quadratic form on E. If the parallelogram law holds for q, then the polarization identity (in Theorem 5.51) defines a Hermitian form on E (cf. Theorem 1.9).

Theorem 5.55. If $\varphi : E \times F \to K$ is a continuous sesquilinear map, then there is a unique operator $f \in L(E, F)$ satisfying

\[ \varphi(x, y) = \langle fx, y \rangle \]

for all $x \in E$ and $y \in F$ (cf. Theorem 1.97). If φ is a Hermitian form, then f is self-adjoint.

Proof. As in Theorem 1.97, but define $f_y x = \varphi(x, y)$.

Strong and weak convergence

In addition to norm topology convergence, we will be using two other modes of convergence in $L(E)$. Let $(f_\alpha)$ be a net in $L(E)$ and let $f \in L(E)$.

1. We say that $(f_\alpha)$ converges to f uniformly if $f_\alpha \to f$ in the (operator) norm topology. That is, $|f_\alpha - f| \to 0$.

2. We say that $(f_\alpha)$ converges to f strongly if $f_\alpha \to f$ pointwise. That is, $f_\alpha x \to fx$ for all $x \in E$.

3. We say that $(f_\alpha)$ converges to f weakly if $\langle f_\alpha x, y \rangle \to \langle fx, y \rangle$ for all $x, y \in E$.

If $f_\alpha \to f$ uniformly then $f_\alpha \to f$ strongly. Since $y \mapsto \langle \cdot, y \rangle$ is continuous, if $f_\alpha \to f$ strongly then $f_\alpha \to f$ weakly. If $(f_n)$ is a sequence in $L(E)$:

1. We say that $(f_n)$ is uniformly Cauchy if $(f_n)$ is Cauchy in the operator norm. That is, $|f_m - f_n| \to 0$ as $m, n \to \infty$.

2. We say that $(f_n)$ is strongly Cauchy if $(f_n x)$ is Cauchy for all $x \in E$.

3. We say that $(f_n)$ is weakly Cauchy if $(\langle f_n x, y \rangle)$ is Cauchy for all $x, y \in E$.

Theorem 5.56 (Uniqueness of limits). Suppose $f_\alpha \to f$ uniformly, strongly, or weakly. Also suppose that $f_\alpha \to g$ uniformly, strongly, or weakly. Then $f = g$.

Proof. Since uniform and strong convergence both imply weak convergence, we can assume that $f_\alpha \to f$ and $f_\alpha \to g$ weakly. Then $\langle fx, y \rangle = \langle gx, y \rangle$ for all $x, y \in E$, and $f = g$ due to Theorem 1.5.

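Theorem 5.57. Let $(f_\alpha)_{\alpha \in A}$ be a directed set of self-adjoint operators in $L(E)$. If $(f_\alpha)_{\alpha \in A}$ has an upper bound (i.e. there is some self-adjoint $g \in L(E)$ such that $f_\alpha \le g$ for all $\alpha \in A$), then $\sup_{\alpha \in A} f_\alpha$ exists (and is self-adjoint). In that case, $f_\alpha \to \sup_{\alpha \in A} f_\alpha$ strongly.

Proof. Choose any $\alpha_0 \in A$. By restricting $(f_\alpha)_{\alpha \in A}$, we can assume that $f_\alpha \ge f_{\alpha_0}$ for all $\alpha \in A$ (without changing the supremum). Let $g \in L(E)$ be an upper bound of $(f_\alpha)_{\alpha \in A}$. For all $x \in E$ the directed set $(\langle f_\alpha x, x \rangle)_{\alpha \in A}$ has an upper bound $\langle gx, x \rangle$, so $\langle f_\alpha x, x \rangle \to c_x$ for some $c_x \in \mathbb{R}$. Let

\[ \varphi(x, y) = \frac{1}{4}(c_{x+y} - c_{x-y} + ic_{x+iy} - ic_{x-iy}). \]

The polarization identity in Theorem 5.51 shows that $\langle f_\alpha x, y \rangle \to \varphi(x, y)$ for all $x, y \in E$. Since $(x, y) \mapsto \langle f_\alpha x, y \rangle$ is a Hermitian form for every $\alpha \in A$, φ is also a Hermitian form. Since

\[ \langle f_{\alpha_0}x, x \rangle \le \langle f_\alpha x, x \rangle \le \langle gx, x \rangle, \]

we must have $|\varphi(x, x)| \le \max(|f_{\alpha_0}|, |g|)|x|^2$ for all $x \in E$. By Theorem 5.55 and Theorem 5.52, there is a self-adjoint operator $h \in L(E)$ such that $\varphi(x, y) = \langle hx, y \rangle$ for all $x, y \in E$. Since $\langle f_\alpha x, x \rangle \le \varphi(x, x)$, we have $f_\alpha \le h$ for all $\alpha \in A$. Also, $h \le g$ if g is any upper bound of $(f_\alpha)_{\alpha \in A}$, which shows that $h = \sup_{\alpha \in A} f_\alpha$. Every $h - f_\alpha$ is positive, so the Cauchy-Schwarz inequality (in Theorem 5.51) implies that

\[ |\langle (h - f_\alpha)x, y \rangle| \le \langle (h - f_\alpha)x, x \rangle^{1/2}\,\langle (h - f_\alpha)y, y \rangle^{1/2} \]

for all $\alpha \in A$ and $x, y \in E$. Setting $y = (h - f_\alpha)x$ and using the fact that $\langle f_\alpha x, x \rangle \to \varphi(x, x) = \langle hx, x \rangle$ gives $f_\alpha x \to hx$ for every $x \in E$.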

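Theorem 5.58. Let $(\rho_\alpha)_{\alpha \in A}$ be a directed set of orthogonal projections in $L(E)$. If $(\rho_\alpha)_{\alpha \in A}$ has an upper bound, then $\sup_{\alpha \in A} \rho_\alpha$ exists and is an orthogonal projection. In that case, $\rho_\alpha \to \sup_{\alpha \in A} \rho_\alpha$ strongly.

Proof. Let $\rho = \sup_{\alpha \in A} \rho_\alpha$, which exists and is self-adjoint due to Theorem 5.57. Since $|\rho_\alpha| \le 1$ for every $\alpha \in A$ and $\rho_\alpha \to \rho$ strongly,

\begin{align*}
|\rho^2 x - \rho x| &\le |\rho(\rho x) - \rho_\alpha(\rho x)| + |\rho_\alpha(\rho x - \rho_\alpha x)| + |\rho_\alpha x - \rho x| \\
&\le |\rho(\rho x) - \rho_\alpha(\rho x)| + 2|\rho_\alpha x - \rho x| \\
&\to 0
\end{align*}

for every $x \in E$. This shows that ρ is a projection.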

Positive operator-valued measures

Let $(X, \mathcal{M})$ be a measurable space. A positive operator-valued measure (or PO-valued measure) on $\mathcal{M}$ (or X) is a countably additive map $\omega : \mathcal{M} \to L(E)$ such that $\omega(A)$ is a positive operator for every $A \in \mathcal{M}$, and $\omega(\emptyset) = 0$. By countably additive we mean: whenever $\{A_k\}$ is a disjoint countable collection of sets in $\mathcal{M}$, $\sup_{n \ge 1} \sum_{k=1}^n \omega(A_k)$ exists and

\[ \omega\left( \bigcup_{k=1}^\infty A_k \right) = \sup_{n \ge 1} \sum_{k=1}^n \omega(A_k), \]

where the sup is taken with respect to the usual partial order on self-adjoint operators ($f \ge g$ if $f - g$ is positive). A projection-valued measure is a PO-valued measure ω such that $\omega(A)$ is an orthogonal projection for every $A \in \mathcal{M}$. If ω is projection-valued and $\omega(X) = \mathrm{Id}_E$, we say that ω is a resolution of the identity.

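Theorem 5.59 (Properties of PO-valued measures). Let ω be a PO-valued measure on $\mathcal{M}$.

1. $\omega(A_1 \cup \cdots \cup A_n) = \omega(A_1) + \cdots + \omega(A_n)$ if $A_1, \dots, A_n$ are pairwise disjoint sets in $\mathcal{M}$.

2. If $A_n \in \mathcal{M}$, $A = \bigcup_{n=1}^\infty A_n$ and $A_1 \subseteq A_2 \subseteq \cdots$ then $\omega(A_n) \to \omega(A)$ strongly.

3. If $A_n \in \mathcal{M}$, $A = \bigcap_{n=1}^\infty A_n$ and $A_1 \supseteq A_2 \supseteq \cdots$ then $\omega(A_n) \to \omega(A)$ strongly.

4. $\omega\left(\bigcup_{k=1}^\infty A_k\right) \le \sup_{n \ge 1} \sum_{k=1}^n \omega(A_k)$ if $\{A_k\}$ is a collection of sets in $\mathcal{M}$, not necessarily disjoint, provided that $\sup_{n \ge 1} \sum_{k=1}^n \omega(A_k)$ exists.

5. $A \subseteq B$ implies $\omega(A) \le \omega(B)$.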


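Theorem 5.60. Let ω be a PO-valued measure on $\mathcal{M}$. For each $x, y \in E$, define $\omega_{x,y} : \mathcal{M} \to \mathbb{C}$ by

\[ \omega_{x,y}(A) = \langle \omega(A)x, y \rangle. \]

For each $x \in E$, write $\omega_x = \omega_{x,x}$.

1. $\omega_{x,y}$ is a $\mathbb{C}$-valued measure and $|\omega_{x,y}|(A) \le 2|\omega(A)||x||y|$ for all $x, y \in E$ and $A \in \mathcal{M}$.

2. $\omega_x$ is a positive measure and $\omega_x(A) \le |\omega(A)||x|^2$ for all $x \in E$ and $A \in \mathcal{M}$.

3. For all $x, y \in E$ and $r \in \mathbb{C}$, we have the identities

\begin{align*}
\omega_{rx} &= |r|^2\omega_x, \\
\omega_{x+y} + \omega_{x-y} &= 2\omega_x + 2\omega_y, \\
\omega_{x,y} &= \frac{1}{4}(\omega_{x+y} - \omega_{x-y} + i\omega_{x+iy} - i\omega_{x-iy}).
\end{align*}

Proof. Let $x, y \in E$. It is clear that $\omega_{x,y}(\emptyset) = 0$. Let $\{A_k\}$ be a disjoint countable collection of sets in $\mathcal{M}$ and let $A = \bigcup_{k=1}^\infty A_k$. Then

\[ \omega(A) = \sup_{n \ge 1} \sum_{k=1}^n \omega(A_k). \]

In particular, Theorem 5.57 (with $\omega(A)$ as an upper bound for the set of partial sums on the right) shows that $\sum_{k=1}^n \omega(A_k) \to \omega(A)$ strongly. Since the map $f \mapsto \langle fx, y \rangle$ is continuous, we have

\[ \sum_{k=1}^n \omega_{x,y}(A_k) = \sum_{k=1}^n \langle \omega(A_k)x, y \rangle \to \langle \omega(A)x, y \rangle = \omega_{x,y}(A), \]

i.e. $\sum_{k=1}^\infty \omega_{x,y}(A_k) = \omega_{x,y}\left(\bigcup_{k=1}^\infty A_k\right)$. This proves that $\omega_{x,y}$ is a $\mathbb{C}$-valued measure, and $\omega_x$ is a positive measure because

\[ \omega_x(A) = \langle \omega(A)x, x \rangle \ge 0. \]

Also,

\[ \omega_x(A) = \langle \omega(A)x, x \rangle \le |\omega(A)||x|^2. \]

(3) follows directly from the parallelogram law and the polarization identity in Theorem 5.51. From this, when $x, y \in E$ are unit vectors we have

\begin{align*}
\sum_{k=1}^\infty |\omega_{x,y}(A_k)| &\le \frac{1}{4} \sum_{k=1}^\infty \bigl(\omega_{x+y}(A_k) + \omega_{x-y}(A_k) + \omega_{x+iy}(A_k) + \omega_{x-iy}(A_k)\bigr) \\
&= \sum_{k=1}^\infty \bigl(\omega_x(A_k) + \omega_y(A_k)\bigr) \\
&\le \omega_x(A) + \omega_y(A) \\
&\le 2|\omega(A)|.
\end{align*}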

Corollary 5.61. Let ω be a PO-valued measure on M and let A ∈ M. Thenω(A) = 0 if and only if ωx(A) = 0 for all x ∈ E.



Theorem 5.62. Let $\omega : \mathcal{M} \to L(E)$ and suppose that $\omega(A)$ is positive for every $A \in \mathcal{M}$. Then $\omega$ is a PO-valued measure if and only if $\omega_x$ is a positive measure for every $x \in E$.

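Proof. If $\omega$ is a PO-valued measure then Theorem 5.60 shows that every $\omega_x$ is a positive measure. Conversely, suppose that $\omega_x$ is a positive measure for every $x \in E$. Since $\langle \omega(\emptyset)x, x \rangle = 0$ for every $x \in E$, Theorem 1.5 implies that $\omega(\emptyset) = 0$. Let $\{A_k\}$ be a disjoint countable collection of sets in $\mathcal{M}$ and let $A = \bigcup_{k=1}^\infty A_k$. Since every term is nonnegative and

\[ \sum_{k=1}^n \langle \omega(A_k)x, x \rangle \to \langle \omega(A)x, x \rangle \]

for every $x \in E$, $\omega(A)$ is an upper bound for the set

\[ \left\{ \sum_{k=1}^n \omega(A_k) : n = 1, 2, \dots \right\}. \]

By Theorem 5.57, $f = \sup_{n \ge 1} \sum_{k=1}^n \omega(A_k)$ exists and $\sum_{k=1}^n \omega(A_k) \to f$ strongly. In particular, for every $x \in E$ we have

\[ \sum_{k=1}^n \langle \omega(A_k)x, x \rangle \to \langle fx, x \rangle \]

and therefore $\langle fx, x \rangle = \langle \omega(A)x, x \rangle$. By Theorem 1.5, $f = \omega(A)$.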

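Theorem 5.63. Let $\omega$ be a PO-valued measure on $\mathcal{M}$. Then $\omega$ is a projection-valued measure if and only if

\[ \omega(A \cap B) = \omega(A)\omega(B) \]

for all $A, B \in \mathcal{M}$.

Proof. If $\omega(A \cap B) = \omega(A)\omega(B)$ for all $A, B \in \mathcal{M}$ then $\omega(A) = \omega(A)^2$, so Theorem 1.107 implies that $\omega(A)$ is an orthogonal projection for all $A \in \mathcal{M}$. Suppose that $\omega$ is projection-valued. If $A, B \in \mathcal{M}$ are disjoint then

\begin{align*}
\omega(A) + \omega(B) &= \omega(A \cup B) = \omega(A \cup B)^2 \\
&= (\omega(A) + \omega(B))^2 = \omega(A) + \omega(A)\omega(B) + \omega(B)\omega(A) + \omega(B),
\end{align*}

so $\omega(A)\omega(B) = -\omega(B)\omega(A)$. But

\begin{align*}
\omega(A)\omega(B) &= \omega(A)\omega(A)\omega(B) = -\omega(A)\omega(B)\omega(A) \\
&= \omega(B)\omega(A)\omega(A) = \omega(B)\omega(A),
\end{align*}

so $\omega(A)\omega(B) = 0$. For the general case, if $A, B \in \mathcal{M}$ then

\begin{align*}
\omega(A \cap B) &= \omega(A \cap B)^2 \\
&= (\omega(A) - \omega(A \setminus B))(\omega(B) - \omega(B \setminus A)) \\
&= \omega(A)\omega(B) - \omega(A)\omega(B \setminus A) - \omega(A \setminus B)\omega(B) + \omega(A \setminus B)\omega(B \setminus A) \\
&= \omega(A)\omega(B).
\end{align*}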

Corollary 5.64. If $\omega$ is a projection-valued measure and $A, B \in \mathcal{M}$ are disjoint, then $\operatorname{im} \omega(A) \perp \operatorname{im} \omega(B)$.

Proof. Since ω(A)ω(B) = ω(A ∩ B) = 0, the result follows from Theorem 1.105.
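As a hedged illustration of Theorem 5.63 and Corollary 5.64, the following sketch builds the simplest projection-valued measure one can write down: $X = \{0, 1, 2\}$ with all subsets measurable, $E = \mathbb{C}^3$, and $\omega(A)$ the diagonal projection onto the coordinates in $A$. This example and the function names are assumptions made for illustration only.

```python
# A toy projection-valued measure on X = {0, 1, 2} acting on C^3.
# Assumption: ω(A) is the diagonal projection onto the coordinates in A.

def omega(subset, n=3):
    """Diagonal 0/1 matrix with ones exactly at the indices in `subset`."""
    return [[1 if (i == j and i in subset) else 0 for j in range(n)]
            for i in range(n)]

def matmul(p, q):
    """Product of two square matrices of the same size."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A, B = {0, 1}, {1, 2}
product = matmul(omega(A), omega(B))          # ω(A)ω(B)
intersection = omega(A & B)                   # ω(A ∩ B)
disjoint_product = matmul(omega({0}), omega({2}))
```

Here `product` coincides with `intersection`, and the product of the projections for the disjoint sets `{0}` and `{2}` is the zero matrix, matching the orthogonality of their images.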

As with positive measures, we say that a set $A \in \mathcal{M}$ has measure zero if $\omega(A) = 0$. For a set $A$, we say that a property holds almost everywhere if there is a set $B$ of measure zero such that the property holds for all elements of $A \setminus B$. We also say that the property holds for almost all $x \in A$.

Integration with respect to PO-valued measures

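Let $\omega$ be a PO-valued measure. Let $\|\cdot\|$ be the supremum norm defined by $\|f\| = \sup_{x \in X} |f(x)|$. If $f : X \to \mathbb{C}$ is measurable almost everywhere, we define the essential supremum of $f$ with respect to $\omega$ by

\[ \|f\|_\infty = \inf_g \|g\|, \]

where the inf is taken over all bounded maps $g$ such that $f = g$ almost everywhere ($\|f\|_\infty = \infty$ if there are no such maps). Note that

\[ \|rf\|_\infty = |r| \|f\|_\infty, \qquad \|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty, \qquad \|fg\|_\infty \le \|f\|_\infty \|g\|_\infty \]

for all $f, g$ measurable almost everywhere and all $r \in \mathbb{C}$. Let $\mathcal{L}^\infty(X, \omega) = \mathcal{L}^\infty(\omega)$ be the vector space of all maps $f : X \to \mathbb{C}$ that are measurable almost everywhere and satisfy $\|f\|_\infty < \infty$. Let $L^\infty(X, \omega) = L^\infty(\omega)$ be the quotient space $\mathcal{L}^\infty(\omega)/\mathcal{L}^0(\omega)$, where $\mathcal{L}^0(\omega)$ is the set of all functions that are zero almost everywhere. Then $\|\cdot\|_\infty$ is a seminorm on $\mathcal{L}^\infty(\omega)$, and $L^\infty(\omega)$ is a unital commutative normed algebra. If $\{f_n\}$ is a sequence in $L^\infty(\omega)$ and $f \in L^\infty(\omega)$, we say that $f_n \to f$ in $L^\infty$ if $\|f_n - f\|_\infty \to 0$.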



Theorem 5.65. L∞(ω) is complete, i.e. a unital commutative Banach algebra.


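Theorem 5.66. If $\omega$ is a PO-valued measure, then

\[ L^\infty(\omega) \subseteq \bigcap_{x \in E} L^\infty(\omega_x, \mathbb{C}). \]

Proof. Let $f \in L^\infty(\omega)$; then $f$ is $\omega_x$-measurable for all $x \in E$ due to Theorem 4.36. Since $\|f\|_\infty < \infty$, we can choose a bounded map $g$ such that $f = g$ outside a set $Z$ with $\omega(Z) = 0$. Corollary 5.61 shows that $\omega_x(Z) = 0$, so $f \in L^\infty(\omega_x, \mathbb{C})$ for all $x \in E$.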

Corollary 5.67. The set of simple maps is dense in L∞(ω).

Proof. See Theorem 4.97.

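Let $S(X) \subseteq L^\infty(\omega)$ be the set of all simple maps from $X$ to $\mathbb{C}$. Suppose that $f \in S(X)$ is simple with respect to some collection $\{A_i\}_{i=1}^n$, and write $f(A_i)$ for the constant value of $f$ on $A_i$. We define the integral of $f$ by

\[ \int_X f \, d\omega = \int_{x \in X} f(x) \, d\omega = \sum_{i=1}^n f(A_i)\omega(A_i) \in L(E). \]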

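It is easy to check that the integral of $f$ is well-defined. Note that

\[ \left(\int_X f \, d\omega\right)^* = \sum_{i=1}^n \overline{f(A_i)}\,\omega(A_i) = \int_X \overline{f} \, d\omega, \]

so $\int_X f \, d\omega$ is self-adjoint if $f$ is real-valued. In that case, if $u \in E$ is a unit vector then

\begin{align*}
\left|\left\langle \left(\int_X f \, d\omega\right)u, u \right\rangle\right| &= \left|\sum_{i=1}^n f(A_i) \langle \omega(A_i)u, u \rangle\right| \le \sum_{i=1}^n |f(A_i)| \langle \omega(A_i)u, u \rangle \\
&\le \|f\|_\infty \sum_{i=1}^n \langle \omega(A_i)u, u \rangle = \|f\|_\infty \, \omega_u\left(\bigcup_{i=1}^n A_i\right) \le \|f\|_\infty \, \omega_u(X) \le |\omega(X)| \|f\|_\infty,
\end{align*}

so Theorem 5.53 implies that $\left|\int_X f \, d\omega\right| \le |\omega(X)| \|f\|_\infty$. In general, if $f \in S(X)$ then

\begin{align*}
\left|\int_X f \, d\omega\right| &= \left|\int_X \operatorname{Re} f \, d\omega + i \int_X \operatorname{Im} f \, d\omega\right| \\
&\le \left|\int_X \operatorname{Re} f \, d\omega\right| + \left|\int_X \operatorname{Im} f \, d\omega\right| \\
&\le 2|\omega(X)| \|f\|_\infty.
\end{align*}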

This shows that $\int_X : S(X) \to L(E)$ is a continuous linear map with $\left|\int_X\right| \le 2|\omega(X)|$. Since $S(X)$ is dense in $L^\infty(\omega)$, there is a unique extension of $\int_X$ to a continuous linear map $\int_X : L^\infty(\omega) \to L(E)$. This map defines the integral of any $f \in L^\infty(\omega)$. If $A \subseteq X$ is measurable, we write

\[ \int_A f \, d\omega = \int_X \chi_A f \, d\omega. \]

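Theorem 5.68 (Properties of the integral for PO-valued measures). Let $f, g \in L^\infty(\omega)$ and let $r \in \mathbb{C}$.

1. $\left|\int_X f \, d\omega\right| \le 2|\omega(X)| \|f\|_\infty$.

2. $\int_X (f + g) \, d\omega = \int_X f \, d\omega + \int_X g \, d\omega$ and $\int_X rf \, d\omega = r \int_X f \, d\omega$.

3. $\int_X \overline{f} \, d\omega = \left(\int_X f \, d\omega\right)^*$.

4. $\int_X 1 \, d\omega = \omega(X)$. If $A \subseteq X$ is measurable then $\int_A 1 \, d\omega = \int_X \chi_A \, d\omega = \omega(A)$.

5. For all $u, v \in E$, $\left\langle \left(\int_X f \, d\omega\right)u, v \right\rangle = \int_X f \, d\omega_{u,v}$.

6. If $f$ is real-valued then $\int_X f \, d\omega$ is self-adjoint and $\left|\int_X f \, d\omega\right| \le |\omega(X)| \|f\|_\infty$.

7. If $f \ge 0$ then $\int_X f \, d\omega \ge 0$.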
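For simple maps, several of these properties reduce to finite sums and can be checked directly. The sketch below assumes a two-point space $X = \{1, 3\}$, $E = \mathbb{C}^2$, and a measure given on singletons by two complementary orthogonal projections; the data and helper names are hypothetical and chosen only for illustration.

```python
# Integral of a simple map against a projection-valued measure on X = {1, 3}.
# Assumption: ω({1}) = P1 and ω({3}) = P3 are complementary projections on C^2.

P1 = [[0.5, -0.5], [-0.5, 0.5]]   # orthogonal projection onto span{(1, -1)}
P3 = [[0.5, 0.5], [0.5, 0.5]]     # orthogonal projection onto span{(1, 1)}
OMEGA = {1: P1, 3: P3}

def integral(f):
    """Integral of f (a dict x -> f(x)): the sum of f(x) * ω({x})."""
    return [[sum(f[x] * OMEGA[x][i][j] for x in OMEGA) for j in range(2)]
            for i in range(2)]

def adjoint(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def inner(u, v):
    """Inner product on C^2, linear in u and conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def apply(m, u):
    """Matrix-vector product."""
    return [sum(m[i][j] * u[j] for j in range(2)) for i in range(2)]

def close(p, q, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(p[i][j] - q[i][j]) < tol for i in range(2) for j in range(2))

f = {1: 2 + 1j, 3: -1j}
f_bar = {x: complex(f[x]).conjugate() for x in f}
u, v = [1, 2j], [3, 1 - 1j]

pairing = inner(apply(integral(f), u), v)                           # <(∫ f dω)u, v>
scalar_integral = sum(f[x] * inner(apply(OMEGA[x], u), v) for x in OMEGA)  # ∫ f dω_{u,v}
```

With these definitions, `integral(f_bar)` agrees with `adjoint(integral(f))` (property 3), the constant map 1 integrates to the identity (property 4), and `pairing` equals `scalar_integral` (property 5).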

From now on we will use the letters $\sigma$ and $\tau$ to denote linear maps, reserving $f$ and $g$ for functions in $L^\infty(\omega)$.

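Lemma 5.69. If $\tau \in L(E)$ commutes with $\omega(A)$ for all $A \in \mathcal{M}$, then for all $f \in L^\infty(\omega)$ and $u, v \in E$ we have

\[ \int_X f \, d\omega_{\tau u, v} = \int_X f \, d\omega_{u, \tau^* v}. \]

Proof. If $f$ is simple with respect to some collection $\{A_i\}$ then

\begin{align*}
\int_X f \, d\omega_{\tau u, v} &= \sum_{i=1}^n f(A_i) \langle \omega(A_i)\tau u, v \rangle = \sum_{i=1}^n f(A_i) \langle \tau\omega(A_i)u, v \rangle \\
&= \sum_{i=1}^n f(A_i) \langle \omega(A_i)u, \tau^* v \rangle = \int_X f \, d\omega_{u, \tau^* v},
\end{align*}

and the result follows from the fact that $S(X)$ is dense in $L^\infty(\omega)$.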

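Theorem 5.70. If $\tau \in L(E)$ commutes with $\omega(A)$ for all $A \in \mathcal{M}$, then $\tau$ commutes with $\int_X f \, d\omega$ for every $f \in L^\infty(\omega)$.

Proof. Let $\sigma = \int_X f \, d\omega$. For all $u, v \in E$,

\begin{align*}
\langle \sigma\tau u, v \rangle &= \int_X f \, d\omega_{\tau u, v} = \int_X f \, d\omega_{u, \tau^* v} \\
&= \langle \sigma u, \tau^* v \rangle = \langle \tau\sigma u, v \rangle
\end{align*}

using Lemma 5.69, so $\sigma\tau = \tau\sigma$.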

We say that a PO-valued measure $\omega$ is commutative if $\omega(A)\omega(B) = \omega(B)\omega(A)$ for all $A, B \in \mathcal{M}$. Theorem 5.63 shows that every projection-valued measure is commutative.

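Theorem 5.71. Let $f, g \in L^\infty(\omega)$. If $\omega$ is commutative then

\[ \left(\int_X f \, d\omega\right)\left(\int_X g \, d\omega\right) = \left(\int_X g \, d\omega\right)\left(\int_X f \, d\omega\right). \]

In particular, $\int_X f \, d\omega$ is normal.

Proof. We apply Theorem 5.70. If $B \in \mathcal{M}$ then $\omega(B)$ commutes with every $\omega(A)$ for $A \in \mathcal{M}$, so $\int_X f \, d\omega$ commutes with $\omega(B)$. This holds for every $B \in \mathcal{M}$, so $\int_X g \, d\omega$ commutes with $\int_X f \, d\omega$. In particular, taking $g = \overline{f}$ and using $\left(\int_X f \, d\omega\right)^* = \int_X \overline{f} \, d\omega$ shows that $\int_X f \, d\omega$ is normal.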

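Theorem 5.72. Let $\omega$ be a projection-valued measure. For all $f, g \in L^\infty(\omega)$,

\[ \int_X fg \, d\omega = \left(\int_X f \, d\omega\right)\left(\int_X g \, d\omega\right). \]

Proof. First assume that $f$ and $g$ are both simple with respect to $\{A_i\}$. Then

\begin{align*}
\left(\int_X f \, d\omega\right)\left(\int_X g \, d\omega\right) &= \left(\sum_{i=1}^n f(A_i)\omega(A_i)\right)\left(\sum_{j=1}^n g(A_j)\omega(A_j)\right) \\
&= \sum_{i=1}^n f(A_i)g(A_i)\omega(A_i) \\
&= \int_X fg \, d\omega
\end{align*}

because Theorem 5.63 shows that $\omega(A_i)\omega(A_j) = 0$ when $i \ne j$. For the general case, choose sequences $\{f_n\}$ and $\{g_n\}$ of simple maps such that $f_n \to f$ and $g_n \to g$ in $L^\infty$. Then $f_n g_n \to fg$ in $L^\infty$. Let $\sigma = \int_X f \, d\omega$, $\sigma_n = \int_X f_n \, d\omega$, $\tau = \int_X g \, d\omega$, $\tau_n = \int_X g_n \, d\omega$, $\varphi = \int_X fg \, d\omega$, and $\varphi_n = \int_X f_n g_n \, d\omega$, noting that $\sigma_n \to \sigma$, $\tau_n \to \tau$ and $\varphi_n \to \varphi$ uniformly. Since $f_n$ and $g_n$ are simple, we have $\sigma_n\tau_n = \varphi_n$. Therefore

\begin{align*}
|\varphi - \sigma\tau| &\le |\varphi - \sigma_n\tau_n| + |\sigma_n(\tau_n - \tau)| + |(\sigma_n - \sigma)\tau| \\
&\le |\varphi - \varphi_n| + |\sigma_n||\tau_n - \tau| + |\sigma_n - \sigma||\tau| \\
&\to 0
\end{align*}

as $n \to \infty$, and $\varphi = \sigma\tau$.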
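The role of the projection-valued hypothesis can be seen in a small numerical contrast. The sketch below assumes $X = \{1, 3\}$ and $E = \mathbb{R}^2$, and compares a projection-valued measure with a PO-valued measure that is not projection-valued (each singleton is sent to half the identity). Both measures and all names are illustrative assumptions.

```python
# Multiplicativity of the integral: projection-valued versus merely PO-valued.
# Assumption: X = {1, 3}; measures are specified by their values on singletons.

PROJ = {1: [[0.5, -0.5], [-0.5, 0.5]],   # complementary orthogonal projections
        3: [[0.5, 0.5], [0.5, 0.5]]}
PO = {1: [[0.5, 0.0], [0.0, 0.5]],       # positive operators summing to Id,
      3: [[0.5, 0.0], [0.0, 0.5]]}       # but not projections

def integral(measure, f):
    """Sum of f(x) * ω({x}) over the points x of X."""
    return [[sum(f[x] * measure[x][i][j] for x in measure) for j in range(2)]
            for i in range(2)]

def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

f = {1: 2.0, 3: -1.0}
g = {1: 4.0, 3: 0.5}
fg = {x: f[x] * g[x] for x in f}

proj_ok = matmul(integral(PROJ, f), integral(PROJ, g)) == integral(PROJ, fg)
po_ok = matmul(integral(PO, f), integral(PO, g)) == integral(PO, fg)
```

For this data `proj_ok` is true while `po_ok` is false, which is consistent with the cross terms $\omega(A_i)\omega(A_j)$ vanishing only in the projection-valued case.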

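Corollary 5.73. Let $\omega$ be a projection-valued measure. For all $f \in L^\infty(\omega)$,

\[ \left(\int_X f \, d\omega\right)\left(\int_X f \, d\omega\right)^* = \int_X |f|^2 \, d\omega \]

and

\[ \left|\int_X f \, d\omega\right| \le \|f\|_\infty. \]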

5.5 The spectral theorem

In the previous section we defined PO-valued and projection-valued measures but did not give any examples of them. Here we will prove the spectral theorem, which shows that every normal operator on a complex Hilbert space $E$ induces a canonical resolution of the identity on $E$. (Recall that a resolution of the identity is a projection-valued measure $\omega$ on $X$ such that $\omega(X) = \operatorname{Id}_E$.)

We say that a PO-valued measure $\omega$ on an LCH space is Radon if $\omega_u$ is Radon for every $u \in E$. If $\omega$ is Radon then the polarization identity in Theorem 5.60 implies that $\omega_{u,v}$ is Radon for all $u, v \in E$.



Theorem 5.74 (Spectral theorem for normal subalgebras). Let $A$ be a closed unital normal subalgebra of $L(E)$. There exists a unique Radon resolution of the identity $\omega$ on $\mathcal{B}_{\Delta(A)}$ such that

\[ \tau = \int_{\Delta(A)} \hat{\tau} \, d\omega \]

for all $\tau \in A$, where $\hat{\tau}$ is the Gelfand transform of $\tau$. Furthermore:

1. $\omega(U) \ne 0$ whenever $U \subseteq \Delta(A)$ is a nonempty open set.

2. If $\sigma \in L(E)$ then $\sigma$ commutes with every $\tau \in A$ if and only if $\sigma$ commutes with $\omega(S)$ for all $S \in \mathcal{B}_{\Delta(A)}$.

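Proof. Since $A$ is a unital commutative C*-algebra, Theorem 5.37 shows that the Gelfand representation $\Gamma : A \to C(\Delta(A))$ is an isometric *-isomorphism. To prove uniqueness, suppose that $\nu$ is another Radon resolution of the identity on $\mathcal{B}_{\Delta(A)}$ satisfying $\tau = \int_{\Delta(A)} \hat{\tau} \, d\nu$ for all $\tau \in A$. We have

\[ \int_{\Delta(A)} f \, d\omega_{u,v} = \langle \Gamma^{-1}(f)u, v \rangle = \int_{\Delta(A)} f \, d\nu_{u,v} \]

for all $u, v \in E$ and all $f \in C(\Delta(A))$, so Theorem 4.151 shows that $\nu = \omega$.

To prove existence, define a map $\lambda_{u,v} : C(\Delta(A)) \to \mathbb{C}$ for each $u, v \in E$ by

\[ \lambda_{u,v} f = \langle \Gamma^{-1}(f)u, v \rangle. \]

Then $\lambda_{u,v}$ is a continuous linear functional with $|\lambda_{u,v}| \le |u||v|$, and Theorem 4.151 shows that there is a unique $\mu_{u,v} \in M_R(\Delta(A), \mathbb{C})$ with $\|\mu_{u,v}\| \le |u||v|$ such that

\[ \int_{\Delta(A)} f \, d\mu_{u,v} = \langle \Gamma^{-1}(f)u, v \rangle \]

for all $f \in C(\Delta(A))$. It is easy to see that for each $S \in \mathcal{B}_{\Delta(A)}$, the map $(u, v) \mapsto \mu_{u,v}(S)$ is sesquilinear. This implies that for all $f \in L^\infty(|\mu_{u,v}|)$, the map

\[ (u, v) \mapsto \int_{\Delta(A)} f \, d\mu_{u,v} \]

is sesquilinear. Since this map is also continuous, Theorem 5.55 shows that there is a unique operator $\tau_f \in L(E)$ such that

\[ \langle \tau_f u, v \rangle = \int_{\Delta(A)} f \, d\mu_{u,v} \]

for all $u, v \in E$. By definition, $\tau_f = \Gamma^{-1}(f)$ if $f \in C(\Delta(A))$.

Define $\omega : \mathcal{B}_{\Delta(A)} \to L(E)$ by $\omega(S) = \tau_{\chi_S}$. If we can show that the map $S \mapsto \langle \omega(S)u, u \rangle$ is a positive measure on $\mathcal{B}_{\Delta(A)}$ for all $u \in E$, then Theorem 5.62 implies that $\omega$ is a PO-valued measure. Since

\[ \langle \omega(S)u, u \rangle = \langle \tau_{\chi_S} u, u \rangle = \int_{\Delta(A)} \chi_S \, d\mu_{u,u} = \mu_{u,u}(S), \]

this is equivalent to showing that $\mu_{u,u}$ is a positive measure. We have

\[ \mu_{u,u}(\Delta(A)) = \int_{\Delta(A)} 1 \, d\mu_{u,u} = \langle \Gamma^{-1}(1)u, u \rangle = |u|^2 \]

and $\|\mu_{u,u}\| \le |u|^2$, so

\[ |u|^2 = \mu_{u,u}(\Delta(A)) \le |\mu_{u,u}|(\Delta(A)) \le |u|^2. \]

By Theorem 4.15, $\mu_{u,u}$ is a positive measure.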

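Next, we prove that $\tau_{fg} = \tau_f\tau_g$ for all $f, g \in L^\infty(\omega)$. Let $u, v \in E$. If $f, g \in C(\Delta(A))$ then

\[ \int_{\Delta(A)} fg \, d\mu_{u,v} = \langle \Gamma^{-1}(fg)u, v \rangle = \langle \Gamma^{-1}(f)\Gamma^{-1}(g)u, v \rangle = \int_{\Delta(A)} f \, d\mu_{\Gamma^{-1}(g)u, v}. \]

Since both sides are continuous in $f$ as functions on

\[ \mathcal{L} = \mathcal{L}^1(|\mu_{u,v}|) \cap \mathcal{L}^1(|\mu_{\Gamma^{-1}(g)u, v}|) \]

and Theorem 4.143 shows that $C(\Delta(A))$ is dense in $\mathcal{L}$, equality holds for all $f \in \mathcal{L}$. In particular, it holds for all $f \in L^\infty(\omega)$. Therefore if $f \in L^\infty(\omega)$ and $g \in C(\Delta(A))$,

\begin{align*}
\int_{\Delta(A)} fg \, d\mu_{u,v} &= \int_{\Delta(A)} f \, d\mu_{\Gamma^{-1}(g)u, v} = \langle \tau_f \Gamma^{-1}(g)u, v \rangle \\
&= \langle \Gamma^{-1}(g)u, \tau_f^* v \rangle = \int_{\Delta(A)} g \, d\mu_{u, \tau_f^* v}.
\end{align*}

Again, both sides are continuous in $g$ as functions on $\mathcal{L}^1(|\mu_{u,v}|) \cap \mathcal{L}^1(|\mu_{u, \tau_f^* v}|)$, so the above equality holds for all $f, g \in L^\infty(\omega)$. Finally, for all $f, g \in L^\infty(\omega)$,

\begin{align*}
\langle \tau_{fg} u, v \rangle &= \int_{\Delta(A)} fg \, d\mu_{u,v} = \int_{\Delta(A)} g \, d\mu_{u, \tau_f^* v} \\
&= \langle \tau_g u, \tau_f^* v \rangle \\
&= \langle \tau_f\tau_g u, v \rangle.
\end{align*}

This proves that $\tau_{fg} = \tau_f\tau_g$.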

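Since

\[ \omega(S)^2 = \tau_{\chi_S}^2 = \tau_{\chi_S} = \omega(S) \]

for all $S \in \mathcal{B}_{\Delta(A)}$, we have shown that $\omega$ is a projection-valued measure. Also, for all $u \in E$ we have

\[ |\omega(\Delta(A))u|^2 = \langle \omega(\Delta(A))u, u \rangle = \mu_{u,u}(\Delta(A)) = |u|^2 \]

due to Theorem 1.107, so $\omega$ is a resolution of the identity. By definition, if $\tau \in A$ then

\[ \left\langle \left(\int_{\Delta(A)} \hat{\tau} \, d\omega\right)u, v \right\rangle = \int_{\Delta(A)} \hat{\tau} \, d\omega_{u,v} = \langle \tau u, v \rangle \]

for all $u, v \in E$, so $\int_{\Delta(A)} \hat{\tau} \, d\omega = \tau$. For (1), choose any $x \in U$. By Urysohn's lemma, there is some $f \in C(\Delta(A))$ such that $\operatorname{supp} f \subseteq U$ and $f(x) = 1$, so

\[ \int_{\Delta(A)} f \, d\omega = \Gamma^{-1}(f) \ne 0 \]

because $\Gamma$ is an isomorphism. If $\omega(U) = 0$ then Corollary 5.61 shows that $\omega_u(U) = 0$ and

\[ \langle \Gamma^{-1}(f)u, u \rangle = \int_{\Delta(A)} f \, d\omega_u = 0 \]

for all $u \in E$. Therefore $\Gamma^{-1}(f) = 0$, which is a contradiction. For (2), Theorem 5.70 shows that if $\sigma$ commutes with $\omega(S)$ for all $S \in \mathcal{B}_{\Delta(A)}$, then $\sigma$ commutes with every $\tau \in A$. Conversely, suppose that $\sigma$ commutes with every $\tau \in A$. For every $f \in C(\Delta(A))$ and all $u, v \in E$ we have

\[ \int_{\Delta(A)} f \, d\mu_{u, \sigma^* v} = \langle \Gamma^{-1}(f)u, \sigma^* v \rangle = \langle \sigma\Gamma^{-1}(f)u, v \rangle = \langle \Gamma^{-1}(f)\sigma u, v \rangle = \int_{\Delta(A)} f \, d\mu_{\sigma u, v}. \]

As before, this implies that the equality holds for all $f \in L^\infty(\omega)$. Therefore

\begin{align*}
\langle \sigma\omega(S)u, v \rangle &= \langle \omega(S)u, \sigma^* v \rangle = \int_{\Delta(A)} \chi_S \, d\mu_{u, \sigma^* v} \\
&= \int_{\Delta(A)} \chi_S \, d\mu_{\sigma u, v} = \langle \omega(S)\sigma u, v \rangle
\end{align*}

for all $u, v \in E$, and $\sigma\omega(S) = \omega(S)\sigma$.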



Theorem 5.75 (Spectral theorem for normal operators). Let $\tau \in L(E)$ be normal. There exists a unique Radon resolution of the identity $\omega$ on $\mathcal{B}_{\sigma(\tau)}$ such that

\[ \tau = \int_{z \in \sigma(\tau)} z \, d\omega. \]

Furthermore, if $\sigma \in L(E)$ commutes with $\tau$ then $\sigma$ commutes with $\omega(S)$ for all $S \in \mathcal{B}_{\sigma(\tau)}$.

Proof. Let $A$ be the closed unital subalgebra of $L(E)$ generated by $\tau$ and $\tau^*$. This subalgebra is normal because $\tau$ is normal. Theorem 5.42 shows that $\sigma_A(\tau) = \sigma_{L(E)}(\tau)$, and Theorem 5.46 shows that $\hat{\tau} : \Delta(A) \to \sigma(\tau)$ is a homeomorphism.

By Theorem 5.74 there is a Radon resolution of the identity $\nu$ on $\mathcal{B}_{\Delta(A)}$ such that $\tau = \int_{\Delta(A)} \hat{\tau} \, d\nu$. Define $\omega : \mathcal{B}_{\sigma(\tau)} \to L(E)$ by $\omega(S) = \nu(\hat{\tau}^{-1}(S))$. It is easy to see that $\omega$ is a Radon resolution of the identity satisfying $\tau = \int_{z \in \sigma(\tau)} z \, d\omega$.

Suppose that $\mu$ is another Radon resolution of the identity on $\sigma(\tau)$ satisfying $\tau = \int_{z \in \sigma(\tau)} z \, d\mu$. If $p(x, y)$ is a polynomial in two variables (with complex coefficients) then for all $u, v \in E$,

\[ \int_{z \in \sigma(\tau)} p(z, \overline{z}) \, d\omega_{u,v} = \langle p(\tau, \tau^*)u, v \rangle = \int_{z \in \sigma(\tau)} p(z, \overline{z}) \, d\mu_{u,v} \]

by Theorem 5.68 and Theorem 5.71. The set of all $z \mapsto p(z, \overline{z})$ is dense in $C(\sigma(\tau))$ due to Theorem 4.134, so $\omega = \mu$ by Theorem 4.151.

If $\sigma \in L(E)$ commutes with $\tau$ then Theorem 5.45 shows that $\sigma$ commutes with $\tau^*$. By Theorem 5.74, $\sigma$ commutes with $\omega(S)$ for all $S \in \mathcal{B}_{\sigma(\tau)}$.

In contrast to the holomorphic functional calculus (see Theorem 5.13) and the continuous functional calculus (see Theorem 5.46), the spectral theorem for normal operators allows us to construct a Borel functional calculus that applies to any bounded Borel function $f$ defined on $\sigma(\tau)$, when $\tau$ is a normal operator. If $\omega$ is the Radon resolution of the identity defined in Theorem 5.75, we write $f(\tau) = \int_{\sigma(\tau)} f \, d\omega$ for convenience. We say that $\omega$ is the spectral decomposition of $\tau$.
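In finite dimensions the spectral decomposition is just the family of eigenprojections, and $f(\tau)$ becomes a finite sum. The sketch below assumes the self-adjoint matrix $\tau = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ on $\mathbb{R}^2$, whose spectrum is $\{1, 3\}$; the eigenprojections are hard-coded rather than computed, and all names are illustrative.

```python
# Borel functional calculus for a 2x2 self-adjoint matrix (illustrative only).
# Assumption: τ = [[2, 1], [1, 2]] with σ(τ) = {1, 3} and eigenprojections
# ω({1}) = P1, ω({3}) = P3 written down by hand.

TAU = [[2.0, 1.0], [1.0, 2.0]]
SPECTRAL = {1.0: [[0.5, -0.5], [-0.5, 0.5]],   # projection onto ker(τ - 1)
            3.0: [[0.5, 0.5], [0.5, 0.5]]}     # projection onto ker(τ - 3)

def calculus(f):
    """f(τ) = sum over z in σ(τ) of f(z) * ω({z})."""
    return [[sum(f(z) * p[i][j] for z, p in SPECTRAL.items()) for j in range(2)]
            for i in range(2)]

def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

identity_of_tau = calculus(lambda z: z)                      # recovers τ
square_of_tau = calculus(lambda z: z * z)                    # equals τ·τ
indicator_of_3 = calculus(lambda z: 1.0 if z == 3.0 else 0.0)  # equals ω({3})
```

The indicator function of $\{3\}$ is mapped to the eigenprojection `SPECTRAL[3.0]`, and $\tau$ acts on its image as multiplication by 3.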

Corollary 5.76 (Properties of the Borel functional calculus). Let $\tau \in L(E)$ be normal and let $\omega$ be the spectral decomposition of $\tau$.

1. The map f 7→ f(τ) is a unital *-homomorphism from L∞(σ(τ), ω) to L(E).

2. Idσ(τ)(τ) = τ .

3. |f(τ)| ≤ ‖f‖∞ for all f ∈ L∞(σ(τ), ω).



4. |f(τ)| = ‖f‖∞ for all f ∈ C(σ(τ)).

5. σ(f(τ)) = f(σ(τ)) for all f ∈ C(σ(τ)).

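6. If $\{f_n\}$ is a sequence in $L^\infty(\sigma(\tau), \omega)$ converging in $L^\infty$ to $f \in L^\infty(\sigma(\tau), \omega)$, then $f_n(\tau) \to f(\tau)$.

7. If $\sigma \in L(E)$ commutes with $\tau$ then $\sigma$ commutes with $f(\tau)$ for all $f \in L^\infty(\sigma(\tau), \omega)$.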

By definition (see Theorem 5.74), the Borel functional calculus agrees with the continuous functional calculus for continuous functions defined on $\sigma(\tau)$, for any normal operator $\tau$. The following theorem shows that the Borel functional calculus also agrees with the holomorphic functional calculus for holomorphic functions defined on (an open set containing) $\sigma(\tau)$.

Theorem 5.77 (Equivalence of Borel and holomorphic functional calculus). Let $\tau \in L(E)$ be normal, let $U \subseteq \mathbb{C}$ be an open set containing $\sigma(\tau)$, and let $f : U \to \mathbb{C}$ be holomorphic. Then the Borel functional calculus $f(\tau)$ agrees with the holomorphic functional calculus of $f$ at $\tau$ (defined as in Theorem 5.13).

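Proof. Let $\omega$ be the spectral decomposition of $\tau$ and choose a piecewise $C^1$ 1-cycle $\gamma$ in $U$ that surrounds $\sigma(\tau)$ (see Theorem 5.11). Since $\sigma(\tau)$ is compact, we can choose some $\delta > 0$ and an open set $V \subseteq U$ containing the image of $\gamma$ such that $|x - y| \ge \delta$ for all $x \in V$ and $y \in \sigma(\tau)$. If $\lambda_1, \lambda_2 \in V$ and $z \in \sigma(\tau)$ then

\begin{align*}
\left|\frac{f(\lambda_1)}{\lambda_1 - z} - \frac{f(\lambda_2)}{\lambda_2 - z}\right| &= \left|\frac{f(\lambda_1)\lambda_2 - f(\lambda_2)\lambda_1 - (f(\lambda_1) - f(\lambda_2))z}{(\lambda_1 - z)(\lambda_2 - z)}\right| \\
&\le \delta^{-2}\left(|f(\lambda_1)\lambda_2 - f(\lambda_2)\lambda_1| + |\tau||f(\lambda_1) - f(\lambda_2)|\right),
\end{align*}

so

\[ \sup_{z \in \sigma(\tau)} \left|\frac{f(\lambda_1)}{\lambda_1 - z} - \frac{f(\lambda_2)}{\lambda_2 - z}\right| \to 0 \]

as $\lambda_1 \to \lambda_2$. This shows that

\[ \lambda \mapsto \left(z \mapsto \frac{f(\lambda)}{\lambda - z}\right) \]

defines a continuous function from $V$ to $L^\infty(\omega)$. Then, starting from the holomorphic functional calculus of $f$ at $\tau$ and ending with the Borel functional calculus $f(\tau)$,

\begin{align*}
\frac{1}{2\pi i} \int_\gamma f(\lambda)(\lambda \operatorname{Id}_E - \tau)^{-1} \, d\lambda &= \frac{1}{2\pi i} \int_\gamma f(\lambda) \int_{z \in \sigma(\tau)} \frac{1}{\lambda - z} \, d\omega \, d\lambda \\
&= \int_{z \in \sigma(\tau)} \frac{1}{2\pi i} \int_\gamma \frac{f(\lambda)}{\lambda - z} \, d\lambda \, d\omega \\
&= \int_{z \in \sigma(\tau)} f(z) \, d\omega \\
&= f(\tau)
\end{align*}

using the fact that $\int_{\sigma(\tau)} : L^\infty(\omega) \to L(E)$ is a continuous linear map, Theorem 2.67, and Theorem 3.34.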

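Theorem 5.78. If $\tau \in L(E)$ is normal, then

\[ |\tau| = \sup_{|u|=1} |\langle \tau u, u \rangle| \]

(cf. Corollary 5.53).

Proof. Let $\omega$ be the spectral decomposition of $\tau$, and let $\varepsilon > 0$. By Theorem 5.37, $\rho(\tau) = \|\hat{\tau}\|_\infty = |\tau|$, so there is some $z_0 \in \sigma(\tau)$ such that $|z_0| = |\tau|$. Let $S = \{z \in \sigma(\tau) : |z - z_0| < \varepsilon\}$, which is a nonempty open set. By Theorem 5.74, $\omega(S) \ne 0$. Since $\omega(S)$ is an orthogonal projection, we can choose some unit vector $v \in E$ such that $\omega(S)v = v$. Define $f : \sigma(\tau) \to \mathbb{C}$ by $f(z) = z - z_0$ for $z \in S$ and $f(z) = 0$ for $z \notin S$. Then $f(\tau) = (\tau - z_0 \operatorname{Id}_E)\omega(S)$ and $f(\tau)v = \tau v - z_0 v$, so

\[ \big||\langle \tau v, v \rangle| - |z_0|\big| \le |\langle \tau v, v \rangle - z_0| = |\langle f(\tau)v, v \rangle| \le |f(\tau)| \le \|f\|_\infty \le \varepsilon. \]

Therefore

\[ \sup_{|u|=1} |\langle \tau u, u \rangle| \ge |\langle \tau v, v \rangle| \ge |z_0| - \varepsilon = |\tau| - \varepsilon. \]

Since $\varepsilon$ was arbitrary,

\[ |\tau| \le \sup_{|u|=1} |\langle \tau u, u \rangle| \le |\tau|. \]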
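The normality hypothesis matters here. As a hedged illustration (the operator $N$ below is a standard finite-dimensional example and is not taken from the text), consider the nilpotent operator on $\mathbb{C}^2$:

```latex
\[
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
NN^* = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \ne
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = N^*N,
\]
so $N$ is not normal. For $u = (u_1, u_2)$ with $|u_1|^2 + |u_2|^2 = 1$ we have
$Nu = (u_2, 0)$, hence
\[
|N| = 1, \qquad
|\langle Nu, u \rangle| = |u_2 \overline{u_1}| = |u_1||u_2| \le \tfrac{1}{2}\left(|u_1|^2 + |u_2|^2\right) = \tfrac{1}{2},
\]
with equality at $u = \tfrac{1}{\sqrt{2}}(1, 1)$. Thus
$\sup_{|u|=1} |\langle Nu, u \rangle| = \tfrac{1}{2} < 1 = |N|$.
```

So the supremum in Theorem 5.78 can be strictly smaller than the operator norm when the operator is not normal.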

Theorem 5.79. Let τ ∈ L(E) be normal.

1. τ is self-adjoint if and only if σ(τ) ⊆ R.



2. $\tau$ is unitary if and only if $\sigma(\tau) \subseteq \mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$.

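Proof. Let $A$ be the closed unital subalgebra of $L(E)$ generated by $\tau$ and $\tau^*$, and let $\omega$ be as in Theorem 5.74. Then

\[ \tau^* = \int_{\Delta(A)} \widehat{\tau^*} \, d\omega = \int_{\Delta(A)} \overline{\hat{\tau}} \, d\omega. \]

This implies that $\tau = \tau^*$ if and only if $\hat{\tau} = \overline{\hat{\tau}}$, which is true if and only if $\hat{\tau}(\Delta(A)) \subseteq \mathbb{R}$. Similarly, $\tau\tau^* = \operatorname{Id}_E$ if and only if $\hat{\tau}\,\overline{\hat{\tau}} = 1$, which is true if and only if $\hat{\tau}(\Delta(A)) \subseteq \mathbb{T}$. Lemma 5.30 and Theorem 5.10 show that $\sigma(\tau) = \sigma_A(\tau) = \hat{\tau}(\Delta(A))$.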

Invariant subspaces and eigenvalues

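If $\tau \in L(E)$ is normal, its spectral decomposition $\omega$ allows us to decompose $E$ into closed invariant subspaces of $\tau$. Let $S \in \mathcal{B}_{\sigma(\tau)}$. If $x \in \operatorname{im} \omega(S)$ then $\tau x = \tau\omega(S)x = \omega(S)\tau x \in \operatorname{im} \omega(S)$, so $\operatorname{im} \omega(S)$ is a closed invariant subspace of $\tau$. Similarly, $(\operatorname{im} \omega(S))^\perp = \operatorname{im} \omega(\sigma(\tau) \setminus S)$ is a closed invariant subspace of $\tau$, and we have $E = \operatorname{im} \omega(S) \oplus \operatorname{im} \omega(\sigma(\tau) \setminus S)$. In fact, the following result implies that $\tau(\operatorname{im} \omega(S)) = \operatorname{im} \omega(S)$ under certain conditions.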

Theorem 5.80. Let $\tau \in L(E)$ be normal, let $\omega$ be the spectral decomposition of $\tau$, and let $f \in C(\sigma(\tau))$.

1. $\ker f(\tau) = \operatorname{im} \omega(f^{-1}(0))$.

2. $\operatorname{im} f(\tau)$ is closed if and only if $f^{-1}(0)$ is open in $\sigma(\tau)$.

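Proof. Let $S = f^{-1}(0)$. For (1), we have $f\chi_S = 0$, so $f(\tau)\omega(S) = f(\tau)\chi_S(\tau) = 0$ and $\operatorname{im} \omega(S) \subseteq \ker f(\tau)$. Since $S$ is compact, there is a sequence of Borel sets $\{T_n\}$ such that $\sigma(\tau) \setminus S = \bigcup_{n=1}^\infty T_n$ and every $T_n$ has positive distance from $S$. Define $f_n : \sigma(\tau) \to \mathbb{C}$ by $f_n(z) = 1/f(z)$ for $z \in T_n$ and $f_n(z) = 0$ for $z \notin T_n$. We have $f_n \in L^\infty(\omega)$ because $T_n$ has positive distance from $S$ and $f$ is continuous, so $f_n f = \chi_{T_n}$ and $f_n(\tau)f(\tau) = \omega(T_n)$. If $f(\tau)u = 0$ then $\omega(T_n)u = 0$ for all $n$, and since $\omega$ is countably additive, we have $\omega(\sigma(\tau) \setminus S)u = 0$. Therefore $\omega(S)u = \omega(\sigma(\tau))u = u$. This shows that $\ker f(\tau) \subseteq \operatorname{im} \omega(S)$.

For (2), suppose that $\sigma(\tau) \setminus S$ is closed in $\sigma(\tau)$. We have $\chi_{\sigma(\tau) \setminus S} f = f$, so $\omega(\sigma(\tau) \setminus S)f(\tau) = \chi_{\sigma(\tau) \setminus S}(\tau)f(\tau) = f(\tau)$ and $\operatorname{im} f(\tau) \subseteq \operatorname{im} \omega(\sigma(\tau) \setminus S)$. Define $g : \sigma(\tau) \to \mathbb{C}$ by $g(z) = 1/f(z)$ for $z \in \sigma(\tau) \setminus S$ and $g(z) = 0$ for $z \in S$. Since both $S$ and $\sigma(\tau) \setminus S$ are compact, $\sigma(\tau) \setminus S$ has positive distance from $S$. The continuity of $f$ implies that $g \in L^\infty(\omega)$, so $fg = \chi_{\sigma(\tau) \setminus S}$ and $f(\tau)g(\tau) = \omega(\sigma(\tau) \setminus S)$. This shows that $\operatorname{im} \omega(\sigma(\tau) \setminus S) \subseteq \operatorname{im} f(\tau)$, and therefore $\operatorname{im} f(\tau) = \operatorname{im} \omega(\sigma(\tau) \setminus S)$ is closed.

Now suppose that $\operatorname{im} f(\tau)$ is closed. By Theorem 1.104 and Lemma 1.73, we can choose some $r > 0$ such that $|f(\tau)x| \ge 2r|x|$ for all $x \in \operatorname{im} f(\tau)$. Write $B_d \subseteq \mathbb{C}$ for the open ball of radius $d$ around 0. Let $\varepsilon > 0$, let $\overline{B}_\varepsilon \subseteq \mathbb{C}$ be the closed ball of radius $\varepsilon$ around 0, and let $U_\varepsilon = f^{-1}(B_r \setminus \overline{B}_\varepsilon)$. Define $g : \sigma(\tau) \to \mathbb{C}$ by $g(z) = 1/f(z)$ for $z \in U_\varepsilon$ and $g(z) = 0$ for $z \notin U_\varepsilon$, so that $fg = \chi_{U_\varepsilon}$ and $f(\tau)g(\tau) = \omega(U_\varepsilon)$. Let $x \in \operatorname{im} \omega(U_\varepsilon)$. Then

\[ |f(\tau)x| = |f(\tau)\omega(U_\varepsilon)x| = \left|\left(\int_{U_\varepsilon} f \, d\omega\right)x\right| \le \left|\int_{U_\varepsilon} f \, d\omega\right| |x| \le r|x|. \]

But $x = \omega(U_\varepsilon)x = f(\tau)g(\tau)x \in \operatorname{im} f(\tau)$, so $|f(\tau)x| \ge 2r|x|$ and therefore $x = 0$. This shows that $\omega(U_\varepsilon) = 0$, and since $U_\varepsilon$ is open in $\sigma(\tau)$, Theorem 5.74 implies that $U_\varepsilon$ is empty. Taking $\varepsilon \to 0$ shows that $f^{-1}(B_r \setminus \{0\})$ is empty, and therefore $f^{-1}(0)$ is open in $\sigma(\tau)$.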

Corollary 5.81. Let $\tau \in L(E)$ be normal and let $\omega$ be the spectral decomposition of $\tau$.

1. If $S \subseteq \sigma(\tau)$ is both open and closed in $\sigma(\tau)$, then $\tau(\operatorname{im} \omega(S))$ is closed if and only if 0 is not a limit point of $S$. In that case, $\tau(\operatorname{im} \omega(S)) = \operatorname{im} \omega(S)$.

2. $\operatorname{im} \tau$ is closed if and only if 0 is not a limit point of $\sigma(\tau)$.

Proof. Apply Theorem 5.80 with $f = \operatorname{Id}_{\sigma(\tau)}\chi_S$ (this function is continuous because $S$ is both open and closed). If $\tau(\operatorname{im} \omega(S))$ is closed, then Theorem 1.104 shows that $\tau(\operatorname{im} \omega(S)) = \operatorname{im} \omega(S)$.

Corollary 5.82 (Eigenvalues of normal operators). Let $\tau \in L(E)$ be normal and let $\omega$ be the spectral decomposition of $\tau$. Let $\lambda \in \sigma(\tau)$ and $\rho = \omega(\{\lambda\})$.

1. $\ker(\lambda \operatorname{Id}_E - \tau) = \operatorname{im} \rho$.

2. $\lambda$ is an eigenvalue of $\tau$ if and only if $\rho \ne 0$.

3. Every isolated point of $\sigma(\tau)$ is an eigenvalue of $\tau$.

Proof. (1) follows from Theorem 5.80 with $f(z) = \lambda - z$. (2) is clear from (1). For (3), if $\lambda$ is an isolated point of $\sigma(\tau)$ then $\{\lambda\}$ is a nonempty open subset of $\sigma(\tau)$, and Theorem 5.74 implies that $\omega(\{\lambda\}) \ne 0$.



Positive operators

In Section 5.3 we defined a positive element of a Banach algebra $A$ to be a Hermitian element $x \in A$ such that $\sigma(x) \subseteq [0, \infty)$. The following theorem shows that an operator $\tau$ is positive (in the sense that $\langle \tau v, v \rangle \ge 0$ for all $v \in E$) if and only if it is a positive element of $L(E)$.

Theorem 5.83. Let τ ∈ L(E). The following are equivalent:

1. τ is a positive operator.

2. τ is self-adjoint and σ(τ) ⊆ [0,∞).

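Proof. If (1) holds then Theorem 5.79 implies that $\sigma(\tau) \subseteq \mathbb{R}$. If $z > 0$ then

\[ z|v| = |v|^{-1} \langle zv, v \rangle \le |v|^{-1} \langle (z\operatorname{Id}_E + \tau)v, v \rangle \le |(z\operatorname{Id}_E + \tau)v| \]

for all nonzero $v \in E$. Therefore $z\operatorname{Id}_E + \tau$ is injective, and Lemma 1.73 shows that $\operatorname{im}(z\operatorname{Id}_E + \tau)$ is closed. By Theorem 1.100,

\[ \operatorname{im}(z\operatorname{Id}_E + \tau) = (\ker(z\operatorname{Id}_E + \tau)^*)^\perp = (\ker(z\operatorname{Id}_E + \tau))^\perp = E. \]

This implies that $z\operatorname{Id}_E + \tau$ is invertible, and $-z \notin \sigma(\tau)$. Conversely, if (2) holds then

\[ \langle \tau v, v \rangle = \int_{\sigma(\tau)} \operatorname{Id}_{\sigma(\tau)} \, d\omega_v \ge 0 \]

for all $v \in E$ because $\omega_v$ is a positive measure and $\operatorname{Id}_{\sigma(\tau)} \ge 0$ on $\sigma(\tau)$.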

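Theorem 5.84 (Existence of positive square roots). Every positive $\tau \in L(E)$ has a unique positive square root $\tau^{1/2} \in L(E)$. Furthermore:

1. $|\tau^{1/2}| = |\tau|^{1/2}$.

2. $\ker \tau^{1/2} = \ker \tau$.

3. $\operatorname{im} \tau^{1/2}$ is closed if and only if $\operatorname{im} \tau$ is closed. In that case, $\operatorname{im} \tau^{1/2} = \operatorname{im} \tau$.

4. If $\sigma \in L(E)$ is positive and commutes with $\tau$, then $\sigma\tau$ is positive and $\sigma^{1/2}\tau^{1/2} = (\sigma\tau)^{1/2}$.

5. If $\tau$ is invertible then $(\tau^{1/2})^{-1} = (\tau^{-1})^{1/2}$.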

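Proof. Suppose that $A$ is a closed unital normal subalgebra of $L(E)$ containing $\tau$. By Theorem 5.37 and Theorem 5.74, there is a resolution of the identity $\omega$ on $\mathcal{B}_{\Delta(A)}$ such that $\Gamma^{-1}(f) = \int_{\Delta(A)} f \, d\omega$ for all $f \in C(\Delta(A))$. By Theorem 5.83 and Lemma 5.30 we have $\hat{\tau} \ge 0$, so $\sqrt{\hat{\tau}} \in C(\Delta(A))$. Let $\sigma = \int_{\Delta(A)} \sqrt{\hat{\tau}} \, d\omega$; Theorem 5.68 shows that $\sigma \ge 0$, and Theorem 5.71 implies that $\sigma^2 = \tau$.

Now let $A_0$ be the closed unital subalgebra of $L(E)$ generated by $\tau$, let $\omega_0$ be the associated resolution of the identity, and define $\tau^{1/2} = \int_{\Delta(A_0)} \sqrt{\hat{\tau}} \, d\omega_0 \in A_0$.

Suppose that $\sigma_1$ is another positive square root of $\tau$. Let $A_1$ be the unital subalgebra of $L(E)$ generated by $\sigma_1$, and let $\omega_1$ be the associated resolution of the identity. Then $\tau = \sigma_1^2 \in A_1$, so $A_0 \subseteq A_1$ and $\tau^{1/2} \in A_1$. Since $\Gamma(\tau) = \Gamma(\sigma_1)^2$ and $\Gamma(\tau) = \Gamma(\tau^{1/2})^2$, we have

\[ \sigma_1 = \int_{\Delta(A_1)} \sqrt{\hat{\tau}} \, d\omega_1 = \tau^{1/2}. \]

For (1),

\[ |\tau^{1/2}|^2 = |\tau^{1/2}\tau^{1/2}| = |\tau| \]

using Theorem 1.98. (2) follows from Theorem 1.100 since $\tau = \tau^{1/2}\tau^{1/2}$. (3) follows from Corollary 5.81 and the fact that $\sigma(\tau^{1/2}) = \sqrt{\sigma(\tau)}$ (see Theorem 5.47).

For (4), Corollary 5.76 implies that $\sigma$ commutes with $\tau^{1/2}$, so $\tau^{1/2}$ commutes with $\sigma^{1/2}$. For all $u \in E$,

\[ \langle \sigma\tau u, u \rangle = \langle (\sigma^{1/2}\tau^{1/2})^2 u, u \rangle = |\sigma^{1/2}\tau^{1/2}u|^2 \ge 0, \]

so $\sigma\tau \ge 0$. Since $(\sigma^{1/2}\tau^{1/2})^2 = \sigma\tau$, we have $(\sigma\tau)^{1/2} = \sigma^{1/2}\tau^{1/2}$ by uniqueness. (5) follows from (2) and (4).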
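In finite dimensions the construction in the proof amounts to taking square roots of the eigenvalues. The sketch below assumes the positive matrix $\tau = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ with spectrum $\{1, 3\}$ and hand-written eigenprojections; the names are illustrative, and floating-point comparison uses a small tolerance.

```python
# Positive square root of a 2x2 positive matrix via its spectral decomposition.
# Assumption: τ = [[2, 1], [1, 2]], σ(τ) = {1, 3}, eigenprojections P1 and P3.
import math

TAU = [[2.0, 1.0], [1.0, 2.0]]
P1 = [[0.5, -0.5], [-0.5, 0.5]]
P3 = [[0.5, 0.5], [0.5, 0.5]]

def combine(a, p, b, q):
    """Return the matrix a*p + b*q."""
    return [[a * p[i][j] + b * q[i][j] for j in range(2)] for i in range(2)]

def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(p, q, tol=1e-12):
    """Entrywise comparison up to a tolerance."""
    return all(abs(p[i][j] - q[i][j]) < tol for i in range(2) for j in range(2))

# τ^{1/2} = sqrt(1) * P1 + sqrt(3) * P3
SQRT_TAU = combine(math.sqrt(1.0), P1, math.sqrt(3.0), P3)
squares_back = close(matmul(SQRT_TAU, SQRT_TAU), TAU)
```

The eigenvalues of `SQRT_TAU` are $1$ and $\sqrt{3}$, so its norm is $\sqrt{3} = |\tau|^{1/2}$, in line with part (1).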

5.6 Operators on real Hilbert spaces

Complexification

Let $V$ be a real vector space. The complexification of $V$ is the complex vector space $V^{\mathbb{C}} = V \times V$ with addition and scalar multiplication defined by

\[ (u_1, u_2) + (v_1, v_2) = (u_1 + v_1, u_2 + v_2), \]

\[ (a + bi)(u_1, u_2) = (au_1 - bu_2, au_2 + bu_1). \]

For convenience, we will write $u_1 + u_2 i$ for the vector $(u_1, u_2) \in V^{\mathbb{C}}$, so that

\[ (u_1 + u_2 i) + (v_1 + v_2 i) = (u_1 + v_1) + (u_2 + v_2)i, \]

\[ (a + bi)(u_1 + u_2 i) = (au_1 - bu_2) + (au_2 + bu_1)i. \]
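The definition above can be written out directly as operations on pairs. The following sketch assumes $V = \mathbb{R}^2$ with vectors stored as tuples; the function names are hypothetical and serve only to illustrate that the scalar multiplication is compatible with multiplication in $\mathbb{C}$.

```python
# Complexification of V = R^2 as pairs (u1, u2), following the formulas above.
# Assumption: real vectors are tuples of floats; names are illustrative.

def lin(a, x, b, y):
    """Real linear combination a*x + b*y of two vectors in V."""
    return tuple(a * s + b * t for s, t in zip(x, y))

def c_add(u, v):
    """(u1, u2) + (v1, v2) = (u1 + v1, u2 + v2)."""
    return (lin(1.0, u[0], 1.0, v[0]), lin(1.0, u[1], 1.0, v[1]))

def c_scale(z, u):
    """(a + bi)(u1, u2) = (a*u1 - b*u2, a*u2 + b*u1)."""
    a, b = z.real, z.imag
    u1, u2 = u
    return (lin(a, u1, -b, u2), lin(a, u2, b, u1))

def embed(u):
    """The injection V -> V^C sending u to (u, 0)."""
    return (u, tuple(0.0 for _ in u))

def c_close(u, v, tol=1e-12):
    """Componentwise comparison of two elements of V^C."""
    return all(abs(s - t) < tol for x, y in zip(u, v) for s, t in zip(x, y))

w = ((1.0, 2.0), (3.0, -1.0))
z1, z2 = complex(2, 3), complex(-1, 0.5)
i_times_real = c_scale(1j, embed((1.0, 2.0)))   # should equal (0, u)
```

Multiplying the embedded vector by $i$ moves it into the second slot, and `c_scale(z1 * z2, w)` agrees with `c_scale(z1, c_scale(z2, w))`, which is the associativity axiom for the complex scalar action.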



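We have canonical projections $\operatorname{Re}, \operatorname{Im} : V^{\mathbb{C}} \to V$ satisfying $\operatorname{Re}(u_1 + u_2 i) = u_1$ and $\operatorname{Im}(u_1 + u_2 i) = u_2$. We also have an injection $\cdot^{\mathbb{C}} : V \to V^{\mathbb{C}}$ defined by $u^{\mathbb{C}} = (u, 0)$. If $u \in V$ then $u^{\mathbb{C}}$ is the complexification of $u$, and we will simply write $u = u + 0i \in V^{\mathbb{C}}$. It is easy to check that $iu = 0 + ui$. If $(u_\alpha)_{\alpha \in A}$ is a basis for $V$, then $(u_\alpha^{\mathbb{C}})_{\alpha \in A}$ is a basis for $V^{\mathbb{C}}$. Therefore $\dim_{\mathbb{C}} V^{\mathbb{C}} = \dim_{\mathbb{R}} V$.

If $V, W$ are real vector spaces and $f : V \to W$ is a linear map, the complexification of $f$ is the linear map $f^{\mathbb{C}} : V^{\mathbb{C}} \to W^{\mathbb{C}}$ defined by

\[ f^{\mathbb{C}}(u_1 + u_2 i) = (fu_1) + (fu_2)i. \]

It is easy to check the following identities when $r \in \mathbb{R}$, and $g : V \to W$ and $h : U \to V$ are other linear maps:

\[ (f + g)^{\mathbb{C}} = f^{\mathbb{C}} + g^{\mathbb{C}}, \qquad (rf)^{\mathbb{C}} = rf^{\mathbb{C}}, \qquad (fh)^{\mathbb{C}} = f^{\mathbb{C}}h^{\mathbb{C}}. \]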

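Now suppose that $E$ is a real inner product space. We can define an inner product on $E^{\mathbb{C}}$ by

\[ \langle u_1 + u_2 i, v_1 + v_2 i \rangle = \langle u_1, v_1 \rangle + \langle u_2, v_2 \rangle - i\langle u_1, v_2 \rangle + i\langle u_2, v_1 \rangle. \]

The associated norm satisfies

\[ |u_1 + u_2 i|^2 = |u_1|^2 + |u_2|^2. \]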
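The formula for the inner product on $E^{\mathbb{C}}$ can be compared with the usual inner product on $\mathbb{C}^n$. The sketch below assumes $E = \mathbb{R}^3$ with the dot product, so that $E^{\mathbb{C}}$ is identified with $\mathbb{C}^3$; the helper names are illustrative.

```python
# Inner product on the complexification of E = R^3 (illustrative sketch).
# Assumption: elements of E^C are pairs (u1, u2) of real tuples.

def dot(x, y):
    """Real inner product on E = R^n."""
    return sum(s * t for s, t in zip(x, y))

def c_inner(u, v):
    """<u1 + u2 i, v1 + v2 i> = <u1,v1> + <u2,v2> - i<u1,v2> + i<u2,v1>."""
    (u1, u2), (v1, v2) = u, v
    return complex(dot(u1, v1) + dot(u2, v2), dot(u2, v1) - dot(u1, v2))

def as_complex(u):
    """Identify (u1, u2) with the vector u1 + i*u2 in C^n."""
    return [complex(a, b) for a, b in zip(*u)]

def standard_inner(x, y):
    """Usual inner product on C^n, conjugate-linear in the second slot."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

u = ((1.0, -2.0, 0.5), (3.0, 0.0, 1.0))
v = ((0.0, 4.0, -1.0), (2.0, 1.0, 1.5))

norm_squared = c_inner(u, u)
expected_norm_squared = dot(u[0], u[0]) + dot(u[1], u[1])
```

For these vectors `c_inner(u, v)` coincides with `standard_inner(as_complex(u), as_complex(v))`, and `norm_squared` is the real number $|u_1|^2 + |u_2|^2$.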

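Theorem 5.85. If $E$ is complete then $E^{\mathbb{C}}$ is complete.

Proof. Let $\{u_n\}$ be a Cauchy sequence in $E^{\mathbb{C}}$. It is easy to see that $\{\operatorname{Re} u_n\}$ and $\{\operatorname{Im} u_n\}$ are Cauchy in $E$, so $\operatorname{Re} u_n \to v_1$ and $\operatorname{Im} u_n \to v_2$ for some $v_1, v_2 \in E$. Then

\[ |u_n - (v_1 + v_2 i)|^2 = |\operatorname{Re} u_n - v_1|^2 + |\operatorname{Im} u_n - v_2|^2 \to 0, \]

so $u_n \to v_1 + v_2 i$.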

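Theorem 5.86. Let $f \in L(E, F)$.

1. $f^{\mathbb{C}} \in L(E^{\mathbb{C}}, F^{\mathbb{C}})$ and $|f^{\mathbb{C}}| = |f|$.

2. $f^{\mathbb{C}}$ is invertible if and only if $f$ is invertible. In that case, $(f^{\mathbb{C}})^{-1} = (f^{-1})^{\mathbb{C}}$.

3. $(f^{\mathbb{C}})^* = (f^*)^{\mathbb{C}}$ if $E$ and $F$ are real Hilbert spaces.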



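Proof. For all $u_1 + u_2 i \in E^{\mathbb{C}}$ we have

\begin{align*}
|f^{\mathbb{C}}(u_1 + u_2 i)|^2 &= |fu_1|^2 + |fu_2|^2 \le |f|^2(|u_1|^2 + |u_2|^2) \\
&= |f|^2 |u_1 + u_2 i|^2,
\end{align*}

so $|f^{\mathbb{C}}| \le |f|$. Since $|f^{\mathbb{C}}u| = |fu|$ for all $u \in E$ and the norm of $u$ is the same in $E$ and $E^{\mathbb{C}}$, we must have $|f^{\mathbb{C}}| \ge |f|$. This proves (1).

For (3), we have

\begin{align*}
\langle f^{\mathbb{C}}(u_1 + u_2 i), v_1 + v_2 i \rangle &= \langle fu_1, v_1 \rangle + \langle fu_2, v_2 \rangle - i\langle fu_1, v_2 \rangle + i\langle fu_2, v_1 \rangle \\
&= \langle u_1, f^* v_1 \rangle + \langle u_2, f^* v_2 \rangle - i\langle u_1, f^* v_2 \rangle + i\langle u_2, f^* v_1 \rangle \\
&= \langle u_1 + u_2 i, (f^*)^{\mathbb{C}}(v_1 + v_2 i) \rangle,
\end{align*}

so $(f^{\mathbb{C}})^* = (f^*)^{\mathbb{C}}$ by uniqueness.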

Conjugation

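The conjugate of a vector $u = u_1 + u_2 i \in E^{\mathbb{C}}$ is

\[ \overline{u} = u_1 - u_2 i, \]

and the conjugate of an operator $f \in L(E^{\mathbb{C}})$ is the operator $f^{\ast} \in L(E^{\mathbb{C}})$ defined by

\[ f^{\ast}(u) = \overline{f(\overline{u})}. \]

(In this subsection the star on an operator denotes conjugation unless the adjoint is mentioned explicitly.) We say that a vector $u \in E^{\mathbb{C}}$ is self-conjugate if $u = \overline{u}$, and we say that an operator $f \in L(E^{\mathbb{C}})$ is self-conjugate if $f = f^{\ast}$, i.e.

\[ f(\overline{u}) = \overline{f(u)} \]

for all $u \in E^{\mathbb{C}}$.

Note that for all $u \in E^{\mathbb{C}}$,

\[ \operatorname{Re} u = \frac{u + \overline{u}}{2} \quad \text{and} \quad \operatorname{Im} u = \frac{u - \overline{u}}{2i}. \]

Therefore, $u$ is self-conjugate if and only if it is "real", i.e. $u$ is the complexification of some vector in $E$. The following theorem shows that a similar statement is true for operators.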

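Theorem 5.87 (Properties of vector and operator conjugation). Let $u, v \in E^{\mathbb{C}}$ and let $f, g \in L(E^{\mathbb{C}})$.

1. $u$ is self-conjugate if and only if $u = w^{\mathbb{C}}$ for some $w \in E$.

2. $|\overline{u}| = |u|$.

3. $\langle \overline{u}, \overline{v} \rangle = \overline{\langle u, v \rangle}$.

4. $f$ is self-conjugate if and only if $f = h^{\mathbb{C}}$ for some $h \in L(E)$.

5. $|f^{\ast}| = |f|$.

6. $(fg)^{\ast} = f^{\ast}g^{\ast}$.

7. $f$ is invertible if and only if $f^{\ast}$ is invertible. In that case, $(f^{\ast})^{-1} = (f^{-1})^{\ast}$.

8. The conjugate of the adjoint of $f$ equals the adjoint of the conjugate of $f$.

The vector conjugation map $u \mapsto \overline{u}$ and the operator conjugation map $f \mapsto f^{\ast}$ are conjugate linear bijective isometries.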

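Proof. (3) follows from a simple calculation. For (4), suppose that $f$ equals its conjugate $f^{\ast}$. If $u \in E$ then

\[ \operatorname{Im} fu = \frac{1}{2i}(fu - \overline{fu}) = \frac{1}{2i}(fu - f\overline{u}) = f(\operatorname{Im} u) = 0, \]

so we can define $g \in L(E)$ by $gu = fu$. For all $u_1 + u_2 i \in E^{\mathbb{C}}$ we have

\[ g^{\mathbb{C}}(u_1 + u_2 i) = gu_1 + (gu_2)i = fu_1 + (fu_2)i = f(u_1 + u_2 i), \]

so $f = g^{\mathbb{C}}$. Conversely, if $f = g^{\mathbb{C}}$ then $f = f^{\ast}$ because $g^{\mathbb{C}}(u_1 - u_2 i) = (gu_1) - (gu_2)i$. For (8), let $a$ denote the adjoint of $f$, so that the conjugates of $f$ and $a$ are the maps $u \mapsto \overline{f\overline{u}}$ and $v \mapsto \overline{a\overline{v}}$. Using (3),

\[ \langle \overline{f\overline{u}}, v \rangle = \overline{\langle f\overline{u}, \overline{v} \rangle} = \overline{\langle \overline{u}, a\overline{v} \rangle} = \langle u, \overline{a\overline{v}} \rangle, \]

so the adjoint of the conjugate of $f$ is the conjugate of the adjoint of $f$, by uniqueness.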
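In coordinates, operator conjugation is entrywise complex conjugation of the matrix. The sketch below assumes $E = \mathbb{R}^2$, so that $E^{\mathbb{C}} = \mathbb{C}^2$ and operators are $2 \times 2$ complex matrices; the helper names are illustrative.

```python
# Operator conjugation on C^2: u -> conj(f(conj(u))) (illustrative sketch).
# Assumption: operators on E^C = C^2 are given by 2x2 complex matrices.

def conj_vec(u):
    """Vector conjugation u1 + u2 i -> u1 - u2 i, i.e. entrywise conjugation."""
    return [complex(a).conjugate() for a in u]

def apply(m, u):
    """Matrix-vector product."""
    return [sum(m[i][j] * u[j] for j in range(2)) for i in range(2)]

def conjugate_operator(m):
    """The conjugate of the operator m, as a function on vectors."""
    return lambda u: conj_vec(apply(m, conj_vec(u)))

def entrywise_conj(m):
    """Matrix whose entries are the complex conjugates of those of m."""
    return [[complex(a).conjugate() for a in row] for row in m]

def vec_close(x, y, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(x, y))

M = [[1 + 2j, 3], [-1j, 4 - 1j]]      # not self-conjugate
R = [[1.0, -2.0], [0.5, 3.0]]         # real entries: a complexified real operator
u = [2 - 1j, 0.5 + 3j]

via_definition = conjugate_operator(M)(u)
via_matrix = apply(entrywise_conj(M), u)
```

The two computations agree, and for the real matrix `R` the conjugate operator acts exactly like `R` itself, which matches part (4) of the theorem in this coordinate picture.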

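If $f \in L(E^{\mathbb{C}})$ then

\[ f = \frac{f + f^{\ast}}{2} + \frac{f - f^{\ast}}{2i}\, i, \]

and it is easy to check that both $(f + f^{\ast})/2$ and $(f - f^{\ast})/(2i)$ are self-conjugate. Therefore we can write $f = g^{\mathbb{C}} + h^{\mathbb{C}} i$ for some $g, h \in L(E)$.

Theorem 5.88. The map $g + hi \mapsto g^{\mathbb{C}} + h^{\mathbb{C}} i$ from $L(E)^{\mathbb{C}}$ to $L(E^{\mathbb{C}})$ is an isomorphism.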



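Proof. The map is linear because

\[ (a + bi)(g + hi) = (ag - bh) + (ah + bg)i \]

and

\begin{align*}
(ag - bh)^{\mathbb{C}} + (ah + bg)^{\mathbb{C}} i &= ag^{\mathbb{C}} - bh^{\mathbb{C}} + (ah^{\mathbb{C}} + bg^{\mathbb{C}})i \\
&= (a + bi)(g^{\mathbb{C}} + h^{\mathbb{C}} i).
\end{align*}

We have already shown that the map is surjective, so it remains to prove injectivity. If $g^{\mathbb{C}} + h^{\mathbb{C}} i = 0$ then $g(u) + h(u)i = 0$ for all $u \in E$. Therefore $g(u) = h(u) = 0$ for all $u \in E$, i.e. $g = h = 0$.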

The spectrum

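Let $E$ be a real Hilbert space. We define the spectrum of an operator $\tau \in L(E)$ by $\sigma(\tau) = \sigma(\tau^{\mathbb{C}})$.

Theorem 5.89. Let $\tau \in L(E^{\mathbb{C}})$.

1. $z \in \sigma(\tau)$ if and only if $\overline{z} \in \sigma(\tau^{\ast})$, for all $z \in \mathbb{C}$.

2. $\tau v = \lambda v$ if and only if $\tau^{\ast}(\overline{v}) = \overline{\lambda}\,\overline{v}$, for all $\lambda \in \mathbb{C}$ and $v \in E^{\mathbb{C}}$.

If $\tau$ is self-conjugate, then $\sigma(\tau)$ is self-conjugate. The same holds for the set of eigenvalues of $\tau$.

Proof. For (1), if $z\operatorname{Id}_{E^{\mathbb{C}}} - \tau$ is invertible then it is easy to check that $((z\operatorname{Id}_{E^{\mathbb{C}}} - \tau)^{-1})^{\ast}$ is the inverse of $\overline{z}\operatorname{Id}_{E^{\mathbb{C}}} - \tau^{\ast}$ (see Theorem 5.87). (2) is obvious from the fact that $\overline{\tau(v)} = \tau^{\ast}(\overline{v})$.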
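For a real operator the complexification is self-conjugate, so its spectrum is closed under complex conjugation. The sketch below assumes $E = \mathbb{R}^2$ and computes the eigenvalues of a real $2 \times 2$ matrix from its characteristic polynomial; the function name and the sample matrices are illustrative.

```python
# Spectrum of a real 2x2 matrix, viewed as an operator on C^2 (illustrative).
# Assumption: eigenvalues are the roots of z^2 - tr(m) z + det(m).
import cmath

def eigenvalues_2x2(m):
    """Roots of the characteristic polynomial of a 2x2 matrix."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + disc) / 2, (trace - disc) / 2)

def is_self_conjugate(values, tol=1e-12):
    """Check that the conjugate of every value is (approximately) in the set."""
    return all(any(abs(z.conjugate() - w) < tol for w in values) for z in values)

rotation = [[0.0, -1.0], [1.0, 0.0]]     # rotation by 90 degrees, no real eigenvalue
shear_like = [[1.0, -2.0], [1.0, 3.0]]   # another real matrix with complex spectrum

rotation_spectrum = eigenvalues_2x2(rotation)
shear_spectrum = eigenvalues_2x2(shear_like)
```

The rotation has spectrum $\{i, -i\}$, which contains no real number but is self-conjugate, as Theorem 5.89 predicts for the complexification of a real operator.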

Compare the next theorem with Theorem 5.43.

Theorem 5.90 (Holomorphic functional calculus for real Hilbert spaces). Let $U \subseteq \mathbb{C}$ be a self-conjugate open set (i.e. $\overline{z} \in U$ for all $z \in U$) and let $f : U \to \mathbb{C}$ be a holomorphic function.

1. The set $A_U = \{\tau \in L(E^{\mathbb{C}}) : \sigma(\tau) \subseteq U\}$ is self-conjugate, in the sense that $\tau^{\ast} \in A_U$ for all $\tau \in A_U$.

2. H : H(U)→ H(H(U)) is self-conjugate, in the sense that f∗(τ) = f(τ∗)∗ for

all f ∈ H(U) and τ ∈ AU (H and f are defined as in Theorem 5.13, andf∗(z) = f(z) for all z ∈ U).



In particular, if f ∈ H(U) is self-conjugate then f : AU → L(EC) is self-adjoint,

in the sense that f(τ∗) = f(τ)∗ for all τ ∈ AU . If f ∈ H(U) is self-conjugate and

τ ∈ AU is self-conjugate, then f(τ) is also self-conjugate.

Proof. For (1), suppose that $\tau \in A_U$. Then

\[ \sigma(\tau^{\ast}) = \{\overline{z} : z \in \sigma(\tau)\} \subseteq \{\overline{z} : z \in U\} = U \]

by Theorem 5.89, so $\tau^{\ast} \in A_U$.

For (2), Theorem 5.11 shows that we can choose some 1-cycle γ in U that surrounds

σ(τ) such that [γ] = [η − η]. Write [η] =∑ki=1 ci[σi] where each σi is a C1 path.

Then

f(τ∗)∗ =

(1

2πi

∫γ

f(λ)(λIdEC − τ∗)−1 dλ

)∗

= − 1

2πi

k∑i=1

ci

(∫σi−σi


)∗.

Now(∫σi


)∗

=

(∫ 1

0

f(σi(t))(σi(t)IdEC − τ∗)−1σ′i(t) dt

)∗

=

∫ 1

0

f(σi(t))(σi(t)IdEC − τ)−1σ′i(t) dt

=

∫ 1

0

f∗(σi(t))(σi(t)IdEC − τ)−1σi′(t) dt

=

∫σi

f∗(λ)(λIdEC − τ)−1 dλ

by Theorem 5.87. Therefore

f(τ∗)∗

= − 1

2πi

k∑i=1

ci

(∫σi

f(λ)(λIdEC − τ∗)−1 dλ−∫σi


)∗

= − 1

2πi

k∑i=1

ci

(∫σi

f∗(λ)(λIdEC − τ)−1 dλ−∫σi


)

=1

2πi

k∑i=1

ci

∫σi−σi


= f∗(τ).



Theorem 5.91. Let τ ∈ L(E).

1. τ is normal if and only if τC is normal.

2. τ is self-adjoint if and only if τC is self-adjoint.

3. τ is unitary if and only if τC is unitary.

4. τ is an orthogonal projection if and only if τC is an orthogonal projection.

5. τ is positive if and only if τC is positive.

6. im τ is closed if and only if im τC is closed.

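Proof. (1), (2) and (3) are obvious from Theorem 5.86 and the fact that $(\sigma\tau)^{\mathbb{C}} = \sigma^{\mathbb{C}}\tau^{\mathbb{C}}$ for all $\sigma, \tau \in L(E)$. (4) follows from (1), Theorem 1.107, and the fact that $(\tau^2)^{\mathbb{C}} = (\tau^{\mathbb{C}})^2$. For (5), we have

\[ \langle \tau^{\mathbb{C}}(u_1 + u_2 i), u_1 + u_2 i \rangle = \langle \tau u_1, u_1 \rangle + \langle \tau u_2, u_2 \rangle - i\langle \tau u_1, u_2 \rangle + i\langle \tau u_2, u_1 \rangle. \]

If $\tau$ is positive then it is self-adjoint, so $\langle \tau u_1, u_2 \rangle = \langle u_1, \tau u_2 \rangle$ and it is clear that $\tau^{\mathbb{C}}$ is positive. If $\tau^{\mathbb{C}}$ is positive then the above expression is always real, so we must have $\langle \tau u_1, u_2 \rangle = \langle \tau u_2, u_1 \rangle$ for all $u_1, u_2 \in E$. Therefore $\tau$ is self-adjoint, and setting $u_2 = 0$ shows that $\tau$ is positive. For (6), suppose that $\operatorname{im} \tau$ is closed and let $\{u_n + v_n i\}$ be a sequence in $E^{\mathbb{C}}$ such that $\tau^{\mathbb{C}}(u_n + v_n i) \to x + yi$ for some $x + yi \in E^{\mathbb{C}}$. Then $\tau u_n \to x$ and $\tau v_n \to y$, so $x, y \in \operatorname{im} \tau$ and therefore $x + yi \in \operatorname{im} \tau^{\mathbb{C}}$. Conversely, suppose that $\operatorname{im} \tau^{\mathbb{C}}$ is closed and let $\{u_n\}$ be a sequence in $E$ such that $\tau u_n \to x$ for some $x \in E$. Then $\tau^{\mathbb{C}} u_n \to x$, so $x \in \operatorname{im} \tau^{\mathbb{C}}$ and therefore $x \in \operatorname{im} \tau$.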

Corollary 5.92. Let τ ∈ L(E) be normal.

1. τ is self-adjoint if and only if σ(τ) ⊆ R.

2. τ is unitary if and only if σ(τ) ⊆ T.

3. τ is positive if and only if τ is self-adjoint and σ(τ) ⊆ [0,∞).

Proof. See Theorem 5.79 and Theorem 5.83.



The spectral theorem

The conjugate of a set $S \subseteq L(E^{\mathbb{C}})$ is the set $S^{\ast} = \{\tau^{\ast} : \tau \in S\}$. We say that $S \subseteq L(E^{\mathbb{C}})$ is self-conjugate if $S = S^{\ast}$.

Let $A$ be a closed unital self-conjugate subalgebra of $L(E^{\mathbb{C}})$. We can examine the structure of the Gelfand space of $A$ by defining a conjugation map on it, just as operator conjugation reveals the structure of $L(E^{\mathbb{C}})$. The conjugate of a complex homomorphism $\varphi \in \Delta(A)$ is the element $\varphi^{\ast} \in \Delta(A)$ defined by

\[ \varphi^{\ast}(\tau) = \overline{\varphi(\tau^{\ast})}. \]

The terms "conjugate" and "self-conjugate" are defined for subsets of $\Delta(A)$ in the usual way.

Recall that if $S \subseteq \mathbb{C}$ is self-conjugate, the conjugate of a function $f : S \to \mathbb{C}$ is the function $f^{\ast} : S \to \mathbb{C}$ defined by

\[ f^{\ast}(z) = \overline{f(\overline{z})}. \]

Similarly, the conjugate of a function $f : \Delta(A) \to \mathbb{C}$ is the function $f^{\ast} : \Delta(A) \to \mathbb{C}$ defined by

\[ f^{\ast}(\varphi) = \overline{f(\varphi^{\ast})}. \]

We say that $f$ is self-conjugate if $f = f^{\ast}$.

Theorem 5.93. The Gelfand representation Γ : A → C(∆(A)) is self-conjugate,in the sense that

Γ(τ∗) = Γ(τ)∗

for all τ ∈ A.

Proof. For all ϕ ∈ ∆(A),

Γ(τ∗)(ϕ) = ϕ(τ∗) = ϕ∗(τ) = Γ(τ)(ϕ∗) = Γ(τ)∗(ϕ).

We say that a PO-valued measure ω on B∆(A) is self-conjugate if ω(S∗) = ω(S)∗ for all S ∈ B∆(A), where S∗ = {ϕ∗ : ϕ ∈ S}. If S ⊆ C is self-conjugate and ω is a PO-valued measure on BS, we define the term “self-conjugate” for ω in the same way (where S∗ = {z̄ : z ∈ S}).



Theorem 5.94. If ω is a self-conjugate PO-valued measure on B∆(A), then the corresponding integration map ∫∆(A) : L∞(ω) → L(EC) is self-conjugate. That is,

\[ \Big(\int_{\Delta(A)} f \, d\omega\Big)^{*} = \int_{\Delta(A)} f^{*} \, d\omega \]

for all f ∈ L∞(ω). The above statement still holds if “∆(A)” is replaced by “S”, where S ⊆ C is self-conjugate.

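Proof. If f is simple with respect to {Ai}, then

\[ \Big(\int_{\Delta(A)} f \, d\omega\Big)^{*} = \Big(\sum_{i=1}^{n} f(A_i)\,\omega(A_i)\Big)^{*} = \sum_{i=1}^{n} \overline{f(A_i)}\,\omega(A_i)^{*} = \sum_{i=1}^{n} f^{*}((A_i)^{*})\,\omega((A_i)^{*}) = \int_{\Delta(A)} f^{*} \, d\omega. \]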

The simple maps are dense in L∞(ω), so equality holds for all f ∈ L∞(ω).

Theorem 5.95 (Spectral theorem for normal subalgebras of a complexification). Let A be a closed unital normal self-conjugate subalgebra of L(EC). Let ω be the Radon resolution of the identity on B∆(A) defined in Theorem 5.74. Then ω is self-conjugate.

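Proof. Let u, v ∈ EC. If f ∈ C(∆(A)) then

\[ \int_{\Delta(A)} f \, d\omega_{u,v} = \langle \Gamma^{-1}(f)u, v \rangle = \overline{\langle \overline{\Gamma^{-1}(f)u}, \bar{v} \rangle} = \overline{\langle \Gamma^{-1}(f^{*})\bar{u}, \bar{v} \rangle} = \overline{\int_{\Delta(A)} f^{*} \, d\omega_{\bar{u},\bar{v}}} \]

using Theorem 5.93 and Theorem 5.87. By continuity and Theorem 4.143, the equality holds for all f ∈ L∞(ω).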

In particular, if S ∈ B∆(A) then

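\[ \langle \omega(S^{*})u, v \rangle = \int_{\Delta(A)} \chi_{S^{*}} \, d\omega_{u,v} = \overline{\int_{\Delta(A)} \chi_{S} \, d\omega_{\bar{u},\bar{v}}} = \overline{\langle \omega(S)\bar{u}, \bar{v} \rangle} = \langle \omega(S)^{*}u, v \rangle. \]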

Therefore ω(S∗) = ω(S)∗.



Theorem 5.96 (Spectral theorem for complexified normal operators). Let τ ∈ L(E) be normal and let ω be the spectral decomposition of τC.

1. ω is self-conjugate. In particular, ω(S) is self-conjugate whenever S ∈ Bσ(τ) is self-conjugate.

2. If τ is self-adjoint then ω(S) is self-conjugate for all S ∈ Bσ(τ).

Proof. Let A be the closed unital normal subalgebra of L(EC) generated by τC. Since τC is self-conjugate, A is self-conjugate. By Theorem 5.74, there is a Radon resolution of the identity ν on B∆(A) such that σ = ∫∆(A) Γ(σ) dν for all σ ∈ A. By Theorem 5.75, the Gelfand transform Γ(τC) : ∆(A) → σ(τ) is a homeomorphism and

ω(S) = ν(Γ(τC)⁻¹(S))

for all S ∈ Bσ(τ), by definition. Theorem 5.93 shows that Γ(τC) is self-conjugate and Theorem 5.95 shows that ν is self-conjugate, so

ω(S∗) = ν(Γ(τC)⁻¹(S∗)) = ν(Γ(τC)⁻¹(S)∗) = ν(Γ(τC)⁻¹(S))∗ = ω(S)∗

for all S ∈ Bσ(τ), where S∗ = {z̄ : z ∈ S}. This proves (1). If τ is self-adjoint then σ(τ) ⊆ R by Corollary 5.92, so every set in Bσ(τ) is self-conjugate. This proves (2).

Theorem 5.97 (Borel functional calculus for complexified normal operators). Let τ ∈ L(E) be normal, let ω be the spectral decomposition of τC, and let f ∈ L∞(ω).

1. (∫σ(τ) f dω)∗ = ∫σ(τ) f∗ dω.

2. If f is self-conjugate then ∫σ(τ) f dω is self-conjugate.

If τ is self-adjoint:

1. (∫σ(τ) f dω)∗ = ∫σ(τ) f̄ dω.

2. If f is real-valued then ∫σ(τ) f dω is self-conjugate and self-adjoint.


If f is self-conjugate in Theorem 5.97 then Theorem 5.87 shows that ∫σ(τ) f dω = σC for some normal σ ∈ L(E), and we write f(τ) = σ. In other words, for a normal operator τ on a real Hilbert space, the Borel functional calculus applies to any bounded self-conjugate (complex) function defined on σ(τ).

Theorem 5.98 (Existence of positive square roots, real case). Every positive τ ∈ L(E) has a unique positive square root τ^{1/2} ∈ L(E), and the properties in Theorem 5.84 hold.

Proof. By Theorem 5.84, τC has a unique positive square root (τC)^{1/2}. Theorem 5.97 shows that (τC)^{1/2} is self-conjugate, so Theorem 5.87 implies that (τC)^{1/2} = σC for some σ ∈ L(E), and (σ²)C = (σC)² = τC implies that σ² = τ. Corollary 5.92 shows that σ is positive. Finally, if ϕ ∈ L(E) is another positive square root of τ then ϕC is a positive square root of τC, so ϕC = (τC)^{1/2} = σC by uniqueness, and hence ϕ = σ.

Polar decomposition

Let E and F be Hilbert spaces over K (where K = R or K = C). We say that ν ∈ L(E,F) is a partial isometry if the restriction ν|(ker ν)⊥ is an isometry.

Theorem 5.99. Let ν ∈ L(E,F ). The following are equivalent:

1. ν is a partial isometry.

2. ν∗ is a partial isometry.

3. ν∗ν is an orthogonal projection.

4. νν∗ is an orthogonal projection.

5. νν∗ν = ν.

6. ν∗νν∗ = ν∗.

Proof. Suppose that ν is a partial isometry. For all w ∈ F and u ∈ E, if u ∈ (ker ν)⊥ then

〈ν∗νν∗w, u〉 = 〈νν∗w, νu〉 = 〈ν∗w, u〉

and if u ∈ ker ν then

〈ν∗νν∗w, u〉 = 〈νν∗w, νu〉 = 0 = 〈w, νu〉 = 〈ν∗w, u〉 .

This proves (1) ⇒ (6), and a similar argument shows that (2) ⇒ (5). (6) ⇔ (5) is obvious. If (5) holds then

(νν∗)² = νν∗νν∗ = νν∗

and

(ν∗ν)² = ν∗νν∗ν = ν∗ν,

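which proves (5) ⇒ (4) and (5) ⇒ (3). If (4) holds and u ∈ (ker ν)⊥, then u lies in the closure of im ν∗ (see Theorem 1.100), so there is a sequence {wn} in F such that ν∗wn → u. Then

\[ \langle \nu u, \nu u \rangle = \lim_{n\to\infty} \langle \nu\nu^{*}w_n, \nu\nu^{*}w_n \rangle = \lim_{n\to\infty} \langle (\nu\nu^{*})^{2}w_n, w_n \rangle = \lim_{n\to\infty} \langle \nu\nu^{*}w_n, w_n \rangle = \lim_{n\to\infty} \langle \nu^{*}w_n, \nu^{*}w_n \rangle = \langle u, u \rangle. \]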

This proves (4)⇒ (1), and a similar argument shows that (3)⇒ (2).
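The equivalences in Theorem 5.99 can be checked numerically in finite dimensions. The following is a minimal sketch, assuming NumPy is available; the 3×3 truncated shift used here is a hypothetical example chosen for illustration, not an operator taken from the text.

```python
import numpy as np

# Hypothetical example: a truncated shift with e1 -> e2, e2 -> e3, e3 -> 0.
# Its kernel is span(e3), and it is an isometry on span(e1, e2).
nu = np.array([[0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])
nu_star = nu.conj().T


def is_orthogonal_projection(p, tol=1e-12):
    """Return True if p is idempotent and self-adjoint (up to tol)."""
    return bool(np.allclose(p @ p, p, atol=tol)
                and np.allclose(p, p.conj().T, atol=tol))


# Conditions (3)-(6) of Theorem 5.99, evaluated for this matrix.
checks = {
    "nu* nu is an orthogonal projection": is_orthogonal_projection(nu_star @ nu),
    "nu nu* is an orthogonal projection": is_orthogonal_projection(nu @ nu_star),
    "nu nu* nu == nu": bool(np.allclose(nu @ nu_star @ nu, nu)),
    "nu* nu nu* == nu*": bool(np.allclose(nu_star @ nu @ nu_star, nu_star)),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Here ν∗ν projects onto span(e1, e2) and νν∗ projects onto span(e2, e3), in line with Theorem 5.100.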

Theorem 5.100 (Properties of partial isometries). Let ν ∈ L(E,F) be a partial isometry.

1. im ν and im ν∗ are closed. In fact, im ν = (ker ν∗)⊥ and im ν∗ = (ker ν)⊥.

2. ν∗ν is the orthogonal projection onto im ν∗, and νν∗ is the orthogonal projection onto im ν.

Proof. See Corollary 1.36 and Theorem 1.100.

Lemma 5.101. Let τ ∈ L(E,F ).

1. (τ∗τ)^{1/2} ∈ L(E) is a positive operator satisfying |τx| = |(τ∗τ)^{1/2}x| for all x ∈ E.

2. If K = C and ρ ∈ L(E) is a positive operator satisfying |τx| = |ρx| for all x ∈ E, then ρ = (τ∗τ)^{1/2}.

Proof. For (1), we have

|τx|² = 〈τx, τx〉 = 〈τ∗τx, x〉 = 〈(τ∗τ)^{1/2}(τ∗τ)^{1/2}x, x〉 = 〈(τ∗τ)^{1/2}x, (τ∗τ)^{1/2}x〉 = |(τ∗τ)^{1/2}x|²

for all x ∈ E (see Lemma 1.109, Theorem 5.84 and Theorem 5.98). For (2), we have

〈ρ²x, x〉 = 〈ρx, ρx〉 = |ρx|² = |τx|² = 〈τx, τx〉 = 〈τ∗τx, x〉

for all x ∈ E, and Theorem 1.5 applies.



Lemma 5.102. Let τ ∈ L(E,F). Then τ is bounded below on (ker τ)⊥ if and only if im τ is closed (cf. Lemma 1.73).

Proof. Note that E = ker τ ⊕ (ker τ)⊥. Consider the restriction τ|(ker τ)⊥ : (ker τ)⊥ → F, which is clearly injective. Also, im τ|(ker τ)⊥ = im τ because im τ|(ker τ)⊥ ⊆ im τ and if y ∈ im τ then y = τ(s + t) for some s ∈ ker τ and t ∈ (ker τ)⊥, so y = τt ∈ im τ|(ker τ)⊥. The result follows from applying Lemma 1.73 to τ|(ker τ)⊥.

Theorem 5.103. Let τ ∈ L(E,F ). The following are equivalent:

1. im τ is closed.

2. im(τ∗τ) is closed.

3. im((τ∗τ)^{1/2}) is closed.

Proof. Theorem 1.100 implies (1) ⇒ (2), and Theorem 5.84 implies (2) ⇒ (3). Suppose that (3) holds. Lemma 5.102 shows that (τ∗τ)^{1/2} is bounded below on (ker (τ∗τ)^{1/2})⊥ = (ker τ)⊥, i.e. there exists some C > 0 such that |(τ∗τ)^{1/2}x| ≥ C|x| for all x ∈ (ker τ)⊥. By Lemma 5.101, |τx| = |(τ∗τ)^{1/2}x| ≥ C|x| for all x ∈ (ker τ)⊥, so Lemma 5.102 shows that im τ is closed. This proves (3) ⇒ (1).

Theorem 5.104 (Polar decomposition). If τ ∈ L(E,F) then there is a partial isometry ν ∈ L(E,F) and a positive operator ρ ∈ L(E) such that τ = νρ and ker ν ⊆ (im ρ)⊥. Furthermore:

1. ν can be chosen so that ker ν = (im ρ)⊥.

2. ρ is unique, and ρ = (τ∗τ)^{1/2}.

3. If τ is invertible then ν is unique, and is an isometric isomorphism.

4. If E = F and τ is normal, then ν and ρ can be chosen so that ν, ρ and τ commute, and ν is unitary.

Proof. Let ρ = (τ∗τ)^{1/2}. Define a linear map ν0 : im ρ → F as follows: if u ∈ im ρ then u = (τ∗τ)^{1/2}w for some w ∈ E, and we can set ν0u = τw. This is well-defined since (τ∗τ)^{1/2}w = 0 implies that (τ∗τ)w = 0, and Theorem 1.100 shows that τw = 0. Also, ν0 is an isometry because Lemma 5.101 shows that

|ν0u|² = |τw|² = |(τ∗τ)^{1/2}w|² = |u|².

By Theorem 1.45, we can extend ν0 to an isometry on the closure of im ρ. Define ν : E → F by setting ν equal to this extension on the closure of im ρ and ν = 0 on (im ρ)⊥; then ν is a partial isometry. Clearly νρu = ν(τ∗τ)^{1/2}u = τu for all u ∈ E because (τ∗τ)^{1/2}u ∈ im ρ.



For (2), if νρ is a polar decomposition of τ with ker ν = (im ρ)⊥ then (τ∗τ)^{1/2} = (ρν∗νρ)^{1/2} = ρ because ν is an isometry on (ker ν)⊥ ⊇ im ρ. For (3), we have im ρ = E and ν0 = τ(τ∗τ)^{−1/2}, so ν is an isometric isomorphism. For (4), define v(z) = z/|z| for z ≠ 0 and v(0) = 1, and p(z) = |z|. Let ν = v(τ) and ρ = p(τ) (see Corollary 5.76). By Theorem 5.68, ρ ≥ 0. Theorem 5.71 implies that ν is unitary and τ = νρ.
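For an invertible matrix, the construction in Theorem 5.104 can be carried out numerically: ρ = (τ∗τ)^{1/2} is obtained from an eigendecomposition of τ∗τ, and ν = τρ⁻¹. The sketch below assumes NumPy; the function name `polar_decomposition` and the sample matrix are hypothetical, and the code only covers the invertible case (3), not a general partial isometry.

```python
import numpy as np


def polar_decomposition(tau):
    """Return (nu, rho) with tau = nu @ rho and rho = (tau* tau)^{1/2}.

    Assumption: tau is a square invertible matrix, so rho is invertible.
    """
    # tau* tau is positive, so eigh returns real eigenvalues w >= 0
    # together with an orthonormal basis of eigenvectors (columns of v).
    w, v = np.linalg.eigh(tau.conj().T @ tau)
    # rho = V diag(sqrt(w)) V*; clip guards against tiny negative round-off.
    rho = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T
    nu = tau @ np.linalg.inv(rho)  # nu = tau rho^{-1}
    return nu, rho


tau = np.array([[2.0, 1.0],
                [0.0, 1.0]])
nu, rho = polar_decomposition(tau)

print("tau == nu rho:", np.allclose(nu @ rho, tau))
print("nu is an isometry:", np.allclose(nu.conj().T @ nu, np.eye(2)))
print("rho is self-adjoint:", np.allclose(rho, rho.conj().T))
```

For a non-invertible τ one would instead define ν on the closure of im ρ and set it to zero on (im ρ)⊥, as in the proof; that case is not handled here.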

The Moore-Penrose inverse

Definition 5.105. A Moore-Penrose inverse or pseudoinverse of a linear map τ ∈ L(E,F) is a linear map τ+ ∈ L(F,E) satisfying the following properties:

1. τ+τ is self-adjoint,

2. ττ+ is self-adjoint,

3. ττ+τ = τ ,

4. τ+ττ+ = τ+.
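In finite dimensions the four conditions of Definition 5.105 can be tested directly. The sketch below assumes NumPy and uses its `numpy.linalg.pinv` routine to produce a candidate τ+; the helper `penrose_conditions` and the rank-one sample matrix are hypothetical names and data introduced only for illustration.

```python
import numpy as np


def penrose_conditions(tau, tau_plus, tol=1e-10):
    """Evaluate the four conditions of Definition 5.105 for a pair of matrices."""
    a = tau_plus @ tau  # candidate for the projection tau+ tau
    b = tau @ tau_plus  # candidate for the projection tau tau+
    return {
        "tau+ tau is self-adjoint": bool(np.allclose(a, a.conj().T, atol=tol)),
        "tau tau+ is self-adjoint": bool(np.allclose(b, b.conj().T, atol=tol)),
        "tau tau+ tau == tau": bool(np.allclose(tau @ tau_plus @ tau, tau, atol=tol)),
        "tau+ tau tau+ == tau+": bool(np.allclose(tau_plus @ tau @ tau_plus, tau_plus, atol=tol)),
    }


# A rank-one map from R^2 to R^3 that is neither injective nor surjective.
tau = np.array([[1.0, 2.0],
                [2.0, 4.0],
                [0.0, 0.0]])
tau_plus = np.linalg.pinv(tau)

conditions = penrose_conditions(tau, tau_plus)
for name, ok in conditions.items():
    print(f"{name}: {ok}")
```

Since τ = ab∗ with a = (1, 2, 0) and b = (1, 2), the pseudoinverse is ba∗/(|a|²|b|²), which is the matrix with rows (1, 2, 0)/25 and (2, 4, 0)/25.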

Theorem 5.106 (Uniqueness of the pseudoinverse). If τ ∈ L(E,F) has a pseudoinverse, then it is unique.

Proof. Suppose that σ1, σ2 are both pseudoinverses of τ . Then

σ1τ = σ1τσ2τ = (σ1τ)∗(σ2τ)∗ = (σ2τσ1τ)∗ = (σ2τ)∗ = σ2τ,

and a similar argument shows that τσ1 = τσ2. Therefore

σ1 = σ1τσ1 = σ2τσ1 = σ2τσ2 = σ2.

Theorem 5.107 (Properties of the pseudoinverse). Let τ ∈ L(E,F) be such that a pseudoinverse exists for τ.

1. 0+ = 0.

2. (τ+)+ = τ .

3. (τ∗)+ = (τ+)∗.

4. (τ∗τ)+ = τ+(τ∗)+.

5. ker τ+ = (im τ)⊥.

6. im τ+ = (ker τ)⊥.



7. τ+τ is the orthogonal projection onto im τ+.

8. ττ+ is the orthogonal projection onto im τ .

9. im τ and im τ+ are closed.

Proof. (1) and (2) are obvious. (3) and (4) are easy to prove by verifying the properties in Definition 5.105.

For (5), let x ∈ ker τ+. Since ττ+ is self-adjoint, we have

〈x, τu〉 = 〈x, ττ+τu〉 = 〈ττ+x, τu〉 = 0

for all u ∈ E, which shows that x ∈ (im τ)⊥. Conversely, if x ∈ (im τ)⊥ then

〈τ+x, τ+x〉 = 〈τ+ττ+x, τ+x〉 = 〈ττ+x, (τ∗)+τ+x〉 = 〈x, ττ+(τ∗)+τ+x〉 = 0

since ττ+ is self-adjoint. This shows that x ∈ ker τ+.

For (6), let x ∈ F and u ∈ ker τ . Since τ+τ is self-adjoint, we have

〈u, τ+x〉 = 〈u, τ+ττ+x〉 = 〈τ+τu, τ+x〉 = 0,

which shows that im τ+ ⊆ (ker τ)⊥. Conversely, let u ∈ (ker τ)⊥. Then u − τ+τu ∈ ker τ since

τ(u− τ+τu) = τu− ττ+τu = τu− τu = 0,

and therefore

〈u − τ+τu, u − τ+τu〉 = 〈u, u − τ+τu〉 − 〈τ+τu, u − τ+τu〉
= −〈τ+τu, u − τ+τu〉
= −〈u, τ+τu − τ+ττ+τu〉
= −〈u, τ+τu − τ+τu〉
= 0.

This shows that u = τ+τu ∈ im τ+.

For (7), τ+τ is a projection because (τ+τ)² = τ+ττ+τ = τ+τ. Since τ+τ is also self-adjoint, Theorem 1.107 shows that τ+τ is an orthogonal projection. Clearly im(τ+τ) ⊆ im τ+, and if x ∈ F then τ+x = τ+ττ+x ∈ im(τ+τ), which shows that im τ+ ⊆ im(τ+τ). Therefore τ+τ is the orthogonal projection onto im τ+. A similar argument applies to (8), and (9) follows from Theorem 1.82 and Theorem 1.107.



Theorem 5.108 (Existence of the pseudoinverse). Let τ ∈ L(E,F). Then τ has a pseudoinverse if and only if im τ is closed.

Proof. Theorem 5.107 shows that im τ is closed if τ has a pseudoinverse.

Conversely, suppose that im τ is closed. We first show that the pseudoinverse exists when τ is normal. In that case, Corollary 1.104 shows that E = im τ ⊕ ker τ and τ|im τ is a linear homeomorphism onto im τ. Define τ+ ∈ L(E) by setting τ+ = (τ|im τ)⁻¹ on im τ and τ+ = 0 on ker τ. It is easy to check that τ+ satisfies the properties in Definition 5.105.

For the general case, write τ = νρ using Theorem 5.104, where ρ = (τ∗τ)^{1/2} and ν ∈ L(E,F) is a partial isometry with ker ν = (im ρ)⊥. Note that

ρρ+ = ρ∗(ρ∗)+ = (ρ+ρ)∗ = ρ+ρ. (*)

We will prove that ρ+ν∗ is a pseudoinverse of τ by verifying the properties in Definition 5.105.

We first show that ρ+ν∗νρ is self-adjoint. Theorem 1.100 and Theorem 5.103 imply that im ρ is closed, and Theorem 5.100 shows that ν∗ν is the orthogonal projection onto im ν∗ = (ker ν)⊥ = im ρ. Also, im ρ+ = (ker ρ)⊥ = im ρ due to Theorem 5.107 and Corollary 1.104. Using (*), we have

(ρ+ν∗νρ)∗ = ρν∗νρ+ = ρρ+ = ρ+ρ = ρ+ν∗νρ.

Next, νρρ+ν∗ is self-adjoint because

(νρρ+ν∗)∗ = ν(ρρ+)∗ν∗ = νρρ+ν∗.

Finally, the fact that ν∗ν and ρρ+ are both orthogonal projections onto im ρ implies that

νρρ+ν∗νρ = νρρ+ρ = νρ

and

ρ+ν∗νρρ+ν∗ = ρ+ν∗νν∗ = ρ+ν∗.

Theorem 5.109. Let τ ∈ L(E,F ).

1. If τ is invertible, then τ has a pseudoinverse and τ+ = τ−1.

2. If τ∗τ is invertible, then τ has a pseudoinverse and τ+ = (τ∗τ)−1τ∗.



3. If ττ∗ is invertible, then τ has a pseudoinverse and τ+ = τ∗(ττ∗)−1.

4. If τ is a partial isometry, then τ has a pseudoinverse and τ+ = τ∗.

5. If E = F and τ is normal, then τ has a pseudoinverse if and only if 0 is not a limit point of σ(τ). In that case, τ+ is normal and

τ+ = (1/·)(τ),

where 1/· : σ(τ) → C is defined by (1/·)(z) = 1/z and (1/·)(τ) is defined as in Corollary 5.76 and Theorem 5.97. Furthermore, τ+τ = ττ+.

6. If E = F, τ is self-adjoint, and τ has a pseudoinverse, then τ+ is self-adjoint.

7. If E = F and τ is an orthogonal projection, then τ has a pseudoinverse and τ+ = τ.

Proof. (1) is obvious. (2) and (3) are easy to prove by verifying the properties in Definition 5.105. (4) follows directly from Theorem 5.99. (5) follows from Corollary 5.81 and verifying that (1/·)(τ) satisfies the properties in Definition 5.105. Also if τ is normal then τ+τ is the orthogonal projection onto im τ+ = (ker τ)⊥ while ττ+ is the orthogonal projection onto im τ = (ker τ)⊥, so τ+τ = ττ+. (6) and (7) are obvious.

Theorem 5.108 gives an algebraic and analytic construction of the pseudoinverse. The following is a geometric view.

Let τ ∈ L(E,F) and let y ∈ F. If τ is not surjective, then there might not be an element x ∈ E such that τx = y. However, it is still useful to choose some x ∈ E that minimizes |τx − y|. In general, this only makes sense when im τ is closed, in which case x should be chosen so that τx = ρim τ y, where ρim τ is the orthogonal projection onto im τ. If τ is not injective, then this choice is not unique. In fact, if τx = ρim τ y for some x ∈ E then x + ker τ is the set of all possible solutions. To make this choice unique, we can additionally specify that x should be chosen to minimize |x|. To summarize:

Theorem 5.110. Let τ ∈ L(E,F) be such that a pseudoinverse exists for τ, let y ∈ F, and let S be the set of all x ∈ E such that

|τx − y| = inf_{u∈E} |τu − y|.

1. τ+y ∈ S.

2. S = τ⁻¹(ρim τ y) = τ+y + ker τ.

3. τ+y is the unique element in S such that |τ+y| = inf_{x∈S} |x|.



Proof. ττ+ is the orthogonal projection onto im τ (see Theorem 5.107), so Theorem 1.86 shows that ττ+y is the best approximation to y in im τ. That is,

|ττ+y − y| = inf_{u∈im τ} |u − y| = inf_{x∈E} |τx − y|.

This proves (1). Since the best approximation to y in im τ is unique (see Corollary 1.84), S is precisely the set of x ∈ E such that τx = ττ+y = ρim τ y. This condition is equivalent to both x ∈ τ⁻¹(ρim τ y) and x − τ+y ∈ ker τ. This proves (2). Finally, note that τ+y ∈ im τ+ = (ker τ)⊥ (see Theorem 5.107). This means that 0 is the unique best approximation to τ+y in ker τ, i.e.

|τ+y| = inf_{u∈ker τ} |τ+y − u| = inf_{u∈ker τ} |τ+y + u| = inf_{x∈S} |x|.

This proves (3).
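Theorem 5.110 is the familiar minimum-norm least-squares statement, and it can be illustrated with a small matrix. The sketch below assumes NumPy; the matrix, the vector y and the kernel vector k are hypothetical data chosen so that τ is neither injective nor surjective.

```python
import numpy as np

# ker tau = span((1, -1)) and im tau = span((1, 1, 0)).
tau = np.array([[1.0, 1.0],
                [1.0, 1.0],
                [0.0, 0.0]])
y = np.array([1.0, 3.0, 5.0])

# Candidate minimiser tau+ y from Theorem 5.110 (1).
x_plus = np.linalg.pinv(tau) @ y
residual = np.linalg.norm(tau @ x_plus - y)

# By (2), every other minimiser has the form x_plus + k with k in ker tau.
k = np.array([1.0, -1.0])
other = x_plus + 0.7 * k
other_residual = np.linalg.norm(tau @ other - y)

print("tau+ y =", x_plus)
print("same residual:", np.isclose(other_residual, residual))
# By (3), tau+ y has the smallest norm among all minimisers.
print("norms:", np.linalg.norm(x_plus), "<", np.linalg.norm(other))
```

In this example τx_plus equals the orthogonal projection (2, 2, 0) of y onto im τ, and moving along ker τ leaves the residual unchanged while increasing |x|.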

5.7 Compact operators

Ascoli’s theorem

Let X be a metric space. Recall that a subset S ⊆ X is totally bounded if for every ε > 0 there exists a finite collection Bε(x1), . . . , Bε(xn) of open balls of radius ε that covers S.

Theorem 5.111 (Heine-Borel theorem). Let X be a metric space. A subset S ⊆ X is compact if and only if it is complete and totally bounded.

Corollary 5.112. Let X be a complete metric space. A subset S ⊆ X is relatively compact if and only if it is totally bounded.

Now let F be a Banach space, and let S ⊆ C(X,F). We say that S is equicontinuous at x ∈ X if for every ε > 0, there exists a δ > 0 such that |f(x) − f(y)| < ε for all f ∈ S and y ∈ X such that d(x, y) < δ. We say that S is equicontinuous if it is equicontinuous at every x ∈ X.

Theorem 5.113 (Ascoli’s theorem). Let X be a compact metric space, let F be a Banach space, and let S ⊆ C(X,F). Then S is relatively compact if and only if it is equicontinuous and {f(x) : f ∈ S} is relatively compact for all x ∈ X.

Proof. Suppose that S is relatively compact. By Theorem 4.120 and Theorem 5.111, S is totally bounded. Let ε > 0, let x ∈ X, and choose a collection Bε/3(f1), . . . , Bε/3(fn) of open balls that covers S. Choose δ > 0 such that for all i we have |fi(x) − fi(y)| < ε/3 whenever y ∈ X and d(x, y) < δ. If f ∈ S then ‖f − fi‖ < ε/3 for some i, and we have

|f(x) − f(y)| ≤ |f(x) − fi(x)| + |fi(x) − fi(y)| + |fi(y) − f(y)| < ε

whenever y ∈ X and d(x, y) < δ. This shows that S is equicontinuous. For all x ∈ X the map f ↦ f(x) is continuous, so {f(x) : f ∈ S̄} is compact (where S̄ denotes the closure of S) and {f(x) : f ∈ S} is relatively compact.

Conversely, suppose that S is equicontinuous and {f(x) : f ∈ S} is relatively compact for all x ∈ X. By Corollary 5.112, it suffices to show that S is totally bounded. Let ε > 0. For each x ∈ X, choose a neighborhood Ux of x such that |f(x) − f(y)| < ε/4 for all f ∈ S and y ∈ Ux. Since X is compact, there are finitely many Ux1, . . . , Uxn that cover X. Let A = ⋃_{i=1}^{n} {f(xi) : f ∈ S}, which is relatively compact. Corollary 5.112 implies that A is totally bounded, so we can choose a collection Bε/4(z1), . . . , Bε/4(zm) of open balls that covers A. For every f ∈ S, there is a function σ : {1, . . . , n} → {1, . . . , m} such that f(xi) ∈ Bε/4(zσ(i)) for i = 1, . . . , n. Let Σ be the (finite) set of all such functions. For each σ ∈ Σ, let Sσ be the set of all f ∈ S such that f(xi) ∈ Bε/4(zσ(i)) for i = 1, . . . , n, and choose any fσ ∈ Sσ. It is clear that the collection {Sσ}σ∈Σ covers S, so it remains to show that Sσ ⊆ Bε(fσ) for all σ ∈ Σ. If f ∈ Sσ and x ∈ X then x ∈ Uxi for some i, so

|f(x) − fσ(x)| ≤ |f(x) − f(xi)| + |f(xi) − zσ(i)| + |zσ(i) − fσ(xi)| + |fσ(xi) − fσ(x)| < ε.

Compact operators

Let E and F be Banach spaces over K (where K = R or K = C). We say that a linear map f : E → F is compact if f(S) is relatively compact whenever S is a bounded subset of E. If B is the closed unit ball in E then f(B) is bounded, so f is continuous.

Theorem 5.114. Let f : E → F and let B be the closed unit ball in E. The following are equivalent:

1. f is compact.

2. f(S) is totally bounded whenever S is bounded.



3. f(B) is relatively compact.

4. f(B) is totally bounded.

5. For any sequence {xn} in B, there exists a subsequence {xnk} such that {fxnk} is convergent.

Let L0(E,F) ⊆ L(E,F) be the set of all compact linear maps from E to F.

Theorem 5.115. L0(E,F) is a closed subspace of L(E,F).

Proof. Let f, g ∈ L0(E,F) and r ∈ K. If S is a bounded subset of E then (f + g)(S) is relatively compact because

\[ (f + g)(S) \subseteq f(S) + g(S) \subseteq \overline{f(S)} + \overline{g(S)} \]

and the right-hand side is compact. This shows that f + g ∈ L0(E,F). Also, rf ∈ L0(E,F) because f(rS) = rf(S). To show that L0(E,F) is closed, let f be in the closure of L0(E,F), let ε > 0, and let B be the closed unit ball in E. Choose g ∈ L0(E,F) such that |f − g| < ε/2. By Theorem 5.114, there is a finite collection Bε/2(x1), . . . , Bε/2(xn) of open balls of radius ε/2 that covers g(B). For every x ∈ B we have |g(x) − xi| < ε/2 for some i, and

|f(x)− xi| ≤ |f(x)− g(x)|+ |g(x)− xi| < ε.

This shows that the collection Bε(x1), . . . , Bε(xn) covers f(B), and Theorem 5.114implies that f ∈ L0(E,F ).

Theorem 5.116. Let E, F, G be Banach spaces, let f ∈ L(E,F) and let g ∈ L(F,G). If f or g is compact, then g ∘ f is compact.

Proof. Let B be the closed unit ball in E. If f is compact then (g ∘ f)(B) = g(f(B)) is relatively compact because the closure of f(B) is compact, so its image under g is a compact set containing g(f(B)). If g is compact then g(f(B)) is relatively compact because f(B) is bounded.

Corollary 5.117. L0(E) is an ideal of L(E).

Lemma 5.118. Let f : E → F be an isometry. A subset S ⊆ E is totally bounded if and only if f(S) is totally bounded.

Proof. Suppose that S is totally bounded. If ε > 0 then there is a collection Bε(x1), . . . , Bε(xn) of open balls that covers S, and Bε(f(x1)), . . . , Bε(f(xn)) covers f(S) because f(Bε(xi)) ⊆ Bε(f(xi)). Conversely, suppose that f(S) is totally bounded, let ε > 0, and choose a collection Bε/2(y1), . . . , Bε/2(yn) that covers f(S). For each i, choose some zi ∈ Bε/2(yi) such that zi ∈ f(S), and choose any xi ∈ S such that zi = f(xi). Then Bε(x1), . . . , Bε(xn) covers S because if x ∈ S then f(x) ∈ Bε/2(yi) ⊆ Bε(zi) for some i, and |f(x) − f(xi)| < ε implies |x − xi| < ε.



Theorem 5.119. If f ∈ L(E,F), then f is compact if and only if f∗ is compact. (See Theorem 1.76 for the definition of f∗.)

Proof. Let B be the closed unit ball in E and let B′ be the closed unit ball in F∗. Suppose that f is compact and let {y′n} be a sequence in B′. For each n, define ϕn : F → C by ϕn(y) = 〈y, y′n〉. The set {ϕn} is equicontinuous because |ϕn(y) − ϕn(z)| ≤ |y − z| for all n. Since the closure of f(B) is compact, Theorem 5.113 implies that there is a subsequence {ϕnk} of {ϕn} that converges uniformly on f(B). Then

\[ |f^{*}y'_{n_j} - f^{*}y'_{n_k}| = \sup_{x \in B} |\langle fx, y'_{n_j} - y'_{n_k} \rangle| = \sup_{x \in B} |\varphi_{n_j}(fx) - \varphi_{n_k}(fx)|, \]

which implies that {f∗y′nk} is Cauchy. Since E∗ is complete, {f∗y′nk} converges. Therefore f∗ is compact (see Theorem 5.114).

Conversely, suppose that f∗ is compact. Let τE : E → E∗∗ and τF : F → F∗∗ be as in Lemma 1.54. By Theorem 1.77 we have f∗∗ ∘ τE = τF ∘ f. We have already shown that f∗∗ is compact, so Theorem 5.116 implies that f∗∗ ∘ τE is compact. Therefore τF(f(B)) is totally bounded, and Lemma 5.118 implies that f(B) is totally bounded. This shows that f is compact.

Corollary 5.120. If E, F are Hilbert spaces and f ∈ L(E,F), then f is compact if and only if f∗ is compact.

Theorem 5.121. If E, F are real Hilbert spaces and f ∈ L(E,F), then f is compact if and only if fC is compact.

Proof. Let B be the closed unit ball in E and let B′ be the closed unit ball in EC = E × E. If f is compact, then f(B) × f(B) is relatively compact in F × F. But FC = F × F with the product topology and fC(B′) ⊆ fC(B × B) ⊆ f(B) × f(B), so fC(B′) is relatively compact in FC. This shows that fC is compact. Conversely, if fC is compact then f = Re ∘ fC ∘ ·C is compact due to Theorem 5.116. (Recall that ·C : E → EC is the injection given by xC = x + 0i, and Re : FC → F is the projection given by Re(x1 + x2i) = x1.)

Theorem 5.122. Let f ∈ L(E,F ).

1. If im f is finite-dimensional, then f is compact.

2. If f is compact and im f is closed, then im f is finite-dimensional.

Proof. (1) follows from Corollary 1.42. For (2), the open mapping theorem implies that f : E → im f is an open map, so im f is locally compact because f is compact. Theorem 1.44 shows that im f is finite-dimensional.

We say that a linear map f ∈ L(E,F) has finite rank if im f is finite-dimensional. Let L00(E,F) be the set of all finite rank linear maps in L(E,F); then Theorem 5.122 implies that L00(E,F) ⊆ L0(E,F). Later we will prove that when E and F are Hilbert spaces, the closure of L00(E,F) is in fact L0(E,F) (see Corollary 5.143). Also note that L00(E) is an ideal of L(E).

Eigenvalues of compact operators

Theorem 5.123 (Eigenvalues of compact operators). Let E be a complex Hilbert space and let τ ∈ L(E) be compact.

1. If E is infinite-dimensional then 0 ∈ σ(τ).

2. If λ ∈ σ(τ) and λ ≠ 0 then λ is an eigenvalue of τ, and the eigenspace ker(λIdE − τ) is finite-dimensional.

3. If c > 0 then {λ ∈ σ(τ) : |λ| > c} is finite.

4. σ(τ) is countable and has no limit points, except possibly 0.

5. If τ has infinitely many eigenvalues λ1, λ2, . . . , then λn → 0.

Proof. (1) follows from Theorem 5.122.

For (2), suppose that λ ≠ 0 and that λ is not an eigenvalue of τ. We first show that there is some c > 0 such that |(λIdE − τ)x| ≥ c|x| for all x ∈ E. Suppose not; then there is a sequence {xn} of unit vectors such that |(λIdE − τ)xn| < 1/n for all n. Since τ is compact, there is a subsequence {xnk} of {xn} such that τxnk → y for some y ∈ E. Then

xnk = λ⁻¹((λIdE − τ)xnk + τxnk) → λ⁻¹y,

and y ≠ 0 since |xnk| = 1 for all k. But

(λIdE − τ)xnk = λxnk − τxnk → λλ⁻¹y − y = 0

and (λIdE − τ)xnk → λ⁻¹(λIdE − τ)y, so (λIdE − τ)y = 0. This contradicts the fact that λ is not an eigenvalue of τ.

Next, we show that λIdE − τ is surjective. Let Fn = im (λIdE − τ)^n, noting that F0 = E. Since |(λIdE − τ)^n x| ≥ c^n|x| for all x ∈ E, Lemma 1.73 shows that each Fn is closed. Clearly F0 ⊇ F1 ⊇ F2 ⊇ · · · , and τ(Fn) ⊆ Fn because x ∈ Fn implies τx = λx − (λIdE − τ)x ∈ Fn. Suppose that F0 ⊃ F1 ⊃ F2 ⊃ · · · , with every inclusion strict. For each n = 0, 1, . . . , choose a unit vector xn ∈ Fn ∩ (Fn+1)⊥ (see Theorem 1.85). Since τ is compact, there is a subsequence {xnk} of {xn} such that {τxnk} is Cauchy. If j < k,

|τxnj − τxnk|² = |λxnj − (λIdE − τ)xnj − τxnk|² ≥ |λ|²

because (λIdE − τ)xnj + τxnk ∈ F_{n_j+1} and xnj ⊥ F_{n_j+1}. This contradicts the fact that {τxnk} is Cauchy, so Fn = Fn+1 for at least one n. Choose the smallest such n. Suppose that n ≠ 0, and choose x ∈ Fn−1 \ Fn. Since (λIdE − τ)x ∈ Fn = Fn+1, we have (λIdE − τ)x = (λIdE − τ)y for some y ∈ Fn. Then x − y ≠ 0 because x ∉ Fn, and (λIdE − τ)(x − y) = 0. This contradicts the fact that λ is not an eigenvalue of τ. Therefore n = 0, i.e.

im(λIdE − τ) = F1 = F0 = E.

We have shown that λIdE − τ is surjective, and λIdE − τ is injective because |(λIdE − τ)x| ≥ c|x| for all x ∈ E. Corollary 1.58 shows that λIdE − τ is invertible, so λ ∉ σ(τ). Finally, if Eλ = ker(λIdE − τ) then τ|Eλ is compact and im(τ|Eλ) = Eλ, so Theorem 5.122 implies that Eλ is finite-dimensional.

For (3), suppose that {λ ∈ σ(τ) : |λ| > c} is infinite and choose a sequence {λn} of distinct eigenvalues of τ such that |λn| > c for all n. For each n, choose some nonzero xn ∈ E such that τxn = λnxn. Let Fn = span(x1, . . . , xn). Since {x1, . . . , xn} is always linearly independent (each xn corresponds to a different eigenvalue), we have F1 ⊂ F2 ⊂ · · · . Clearly τ(Fn) ⊆ Fn, and (λnIdE − τ)(Fn) ⊆ Fn−1 because x = ∑_{i=1}^{n} ai xi ∈ Fn implies (λnIdE − τ)x = ∑_{i=1}^{n−1} ai(λn − λi)xi ∈ Fn−1. By Theorem 1.12, we can choose a sequence {yn} of unit vectors such that yn ∈ Fn ∩ (Fn−1)⊥. Since τ is compact, there is a subsequence {ynk} of {yn} such that {τynk} is Cauchy. If j < k,

|τynk − τynj|² = |λnk ynk − (λnk IdE − τ)ynk − τynj|² ≥ |λnk|² > c²

because (λnk IdE − τ)ynk + τynj ∈ F_{n_k−1} and ynk ⊥ F_{n_k−1}. This contradicts the fact that {τynk} is Cauchy.

(4) follows from (3) and the fact that

σ(τ) ⊆ {0} ∪ ⋃_{n=1}^{∞} {λ ∈ σ(τ) : |λ| > 1/n}.

(5) follows directly from (3).

Diagonalizable compact operators

Let E be a Hilbert space over K. We say that an operator τ ∈ L(E) is (unitarily) diagonalizable if there exists an orthonormal set O = {uα}α∈A in E such that every uα is an eigenvector of τ, and O⊥ ⊆ ker τ. In that case, we say that τ is diagonalizable with respect to O. Note that if K = R, the eigenvalue corresponding to uα must be real.



Lemma 5.124. Let σ, τ ∈ L(E) be diagonalizable with respect to O = {uα}α∈A. Let λα be the eigenvalue of σ corresponding to uα, and let µα be the eigenvalue of τ corresponding to uα. If λα = µα for all α ∈ A, then σ = τ.

Proof. We have σuα = λαuα = µαuα = τuα for all α ∈ A, and if u ∈ O⊥ then σu = 0 = τu because u ∈ ker σ and u ∈ ker τ. Since span(O) + O⊥ is dense in E (see Theorem 1.92), σ = τ.

If u ∈ E, we write uu∗ for the operator in L(E) defined by

(uu∗)(v) = 〈v, u〉u.

Since we can naturally identify u with the linear map αu ∈ L(K,E) given by r ↦ ru, we are simply defining uu∗ = αu(αu)∗.

Lemma 5.125 (Properties of the rank 1 orthogonal projection). Let u ∈ E.

1. uu∗ is positive.

2. If u is a unit vector then uu∗ is the orthogonal projection onto span(u).

3. If {u1, . . . , un} is an orthonormal set in E, then ∑_{i=1}^{n} ui ui∗ is the orthogonal projection onto span(u1, . . . , un).

Proof. For (1), if v ∈ E then

〈uu∗v, v〉 = 〈〈v, u〉u, v〉 = 〈v, u〉〈u, v〉 = |〈v, u〉|² ≥ 0.

For (2), uu∗ is a projection because (uu∗)² = u(u∗u)u∗ = |u|²uu∗ = uu∗. For (3), see Theorem 1.89.

Theorem 5.126. Let O = {u1, u2, . . . } be a countable orthonormal set in E. If {λn} is a bounded sequence in K, then the series

∑_{n=1}^{∞} λn un un∗

converges strongly to an operator τ ∈ L(E). Furthermore:

1. τ is diagonalizable with respect to O, and the eigenvalue of τ corresponding to un is λn.

2. If λn → 0 then τ is compact, and the series converges in the norm topology to τ.



3. |τv|² = ∑_{n=1}^{∞} |λn|² |〈v, un〉|² for all v ∈ E.

4. |τ| = sup_{n≥1} |λn|.

5. τ∗ = ∑_{n=1}^{∞} λ̄n un un∗.

6. τ is normal. If K = R, then τ is self-adjoint.

Proof. The existence of τ is easy to prove by defining τun = λnun for all n and τv = 0 for v ∈ O⊥, and (1) follows immediately. For (2), let τN = ∑_{n=1}^{N} λn un un∗. When N > M,

\[ |\tau_N - \tau_M| = \Big| \sum_{n=M+1}^{N} \lambda_n u_n u_n^{*} \Big| \le \sup_{n \ge M+1} |\lambda_n|. \]

Since λn → 0, we have |τN − τM| → 0 as M, N → ∞. Therefore {τN} is Cauchy, and since L(E) is complete, the series ∑_{n=1}^{∞} λn un un∗ converges to some τ ∈ L(E) in the norm topology. Theorem 5.115 shows that L0(E) is closed, so τ is compact because every τN is compact (see Theorem 5.122).

For (3), if v ∈ E then we can write v = ∑_{n=1}^{∞} un un∗ v + w where w ⊥ O. Then

\[ |\tau v|^{2} = \Big| \sum_{n=1}^{\infty} \lambda_n u_n u_n^{*} \Big( \sum_{m=1}^{\infty} u_m u_m^{*} v + w \Big) \Big|^{2} = \Big| \sum_{n=1}^{\infty} \lambda_n u_n u_n^{*} v \Big|^{2} = \Big| \sum_{n=1}^{\infty} \lambda_n \langle v, u_n \rangle u_n \Big|^{2}. \]

Theorem 1.89 implies that ∑_{n=1}^{∞} |〈v, un〉|² converges, so ∑_{n=1}^{∞} |λn|² |〈v, un〉|² converges because {λn} is bounded. By Theorem 1.87,

\[ |\tau v|^{2} = \sum_{n=1}^{\infty} |\lambda_n|^{2} \, |\langle v, u_n \rangle|^{2}. \]

For (4), it is clear from (3) that |τ| ≤ sup_{n≥1} |λn|. Since |τun| = |λn|, we have |τ| ≥ |λn| for all n. Therefore |τ| = sup_{n≥1} |λn|. (5) follows from the fact that the adjoint map is continuous (see Theorem 1.98). (6) follows directly from (3) and (5).
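Parts (2), (3) and (4) of Theorem 5.126 can be illustrated with a finite-dimensional model. The sketch below assumes NumPy; the dimension, the choice λn = 1/n, and the random orthonormal basis are all hypothetical choices made for illustration, and a finite matrix can only mimic the infinite series.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 30

# Columns of q form an orthonormal set u_1, ..., u_dim (QR of a random matrix).
q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
lam = 1.0 / np.arange(1, dim + 1)  # lambda_n = 1/n, decreasing to 0


def partial_sum(n_terms):
    """tau_N = sum over n <= N of lambda_n u_n u_n^*."""
    u = q[:, :n_terms]
    return (u * lam[:n_terms]) @ u.T


tau = partial_sum(dim)

# Part (4): the operator norm equals sup |lambda_n|.
op_norm = np.linalg.norm(tau, 2)

# Part (2): the tail tau - tau_10 has norm sup over n > 10 of |lambda_n|.
tail_norm = np.linalg.norm(tau - partial_sum(10), 2)

# Part (3): |tau v|^2 = sum |lambda_n|^2 |<v, u_n>|^2.
v = rng.standard_normal(dim)
lhs = np.linalg.norm(tau @ v) ** 2
rhs = np.sum(lam ** 2 * (q.T @ v) ** 2)

print("operator norm:", op_norm)
print("tail norm:", tail_norm, "vs", lam[10])
print("norm identity holds:", np.isclose(lhs, rhs))
```

The tail estimate is what drives norm convergence in (2): truncating after N terms costs exactly the largest remaining |λn|.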

Theorem 5.127 (Spectral theorem for normal compact operators). Let E be a Hilbert space over K and let τ ∈ L(E). The following are equivalent:

1. τ is compact and normal (and self-adjoint, if K = R).



2. τ is compact and diagonalizable.

3. τ is compact and diagonalizable with respect to a countable orthonormal set.

In that case, if τ is diagonalizable with respect to {un} and λn is the eigenvalue corresponding to un, then

τ = ∑_{n=1}^{∞} λn un un∗.

Proof. Suppose that (1) holds. First assume that K = C. Let ω be the spectral decomposition of τ (see Theorem 5.75). Theorem 5.123 and Corollary 5.82 show that σ(τ) is countable, each nonzero λ ∈ σ(τ) is an eigenvalue of τ, and im ω({λ}) = ker(λIdE − τ) is finite-dimensional for all nonzero λ ∈ σ(τ). Let {λ1, λ2, . . . } be the set of all nonzero eigenvalues of τ. For each n, choose a (finite) orthonormal basis {un,1, . . . , un,mn} for ker(λnIdE − τ). Then {un,k} is an orthonormal set in E such that every un,k is an eigenvector of τ, and it remains to show that if v ∈ E is nonzero and v ⊥ {un,k} then τv = 0. Since v ⊥ im ω({λn}) for all n,

ω(σ(τ) \ {0}) = sup_{N≥1} ∑_{n=1}^{N} ω({λn}), (*)

and ω(σ(τ)) = IdE, we must have 0 ∈ σ(τ) and v ∈ im ω({0}) = ker τ. This proves (1) ⇒ (2) for the case K = C.

Now suppose that K = R; then τ is self-adjoint by assumption. By Theorem 5.121 and Theorem 5.91, τC is compact and self-adjoint. Using the same argument as before, we have a spectral decomposition ω of τC such that σ(τ) is countable, each nonzero λ ∈ σ(τ) is an eigenvalue of τC, and im ω({λ}) = ker(λIdEC − τC) is finite-dimensional for all nonzero λ ∈ σ(τ). Let {λ1, λ2, . . . } be the set of all nonzero eigenvalues of τC. Since τC is self-adjoint, every λn is real. For each n, Theorem 5.96 shows that ω({λn}) is self-conjugate, so Theorem 5.87 and Theorem 5.91 show that ω({λn}) = (ρn)C for some orthogonal projection ρn ∈ L(E). Choose an orthonormal basis {un,1, . . . , un,mn} for im ρn. We have

un,k ∈ im (ρn)C = im ω({λn}) = ker(λnIdEC − τC),

so

τun,k = Re(τCun,k) = Re(λnun,k) = λnun,k.

This shows that each un,k is an eigenvector of τ. It remains to show that if v ∈ E is nonzero and v ⊥ {un,k} then τv = 0. Since {un,1, . . . , un,mn} is an orthonormal basis for im (ρn)C, we have v ⊥ im (ρn)C. From (*) and the fact that im (ρn)C = im ω({λn}), we must have v ∈ ker τC. Therefore v ∈ ker τ. This proves (1) ⇒ (2) for the case K = R.



Suppose that (2) holds, and that τ is diagonalizable with respect to {uα}α∈A. Let λα be the eigenvalue corresponding to uα. By Theorem 5.123, there are only countably many α ∈ A for which λα ≠ 0. (Note that we need the countability of σ(τ) as well as the fact that ker(λIdE − τ) is finite-dimensional whenever λ is a nonzero eigenvalue of τ.) Let B be the set of all such α ∈ A. Then τ is diagonalizable with respect to {uα}α∈B, because τuα = 0 whenever α ∈ A \ B. This proves (2) ⇒ (3).

Finally, suppose that (3) holds, and that τ is diagonalizable with respect to {un}. Let λn be the eigenvalue corresponding to un. By Theorem 5.123, λn → 0 (if {λn} is infinite). Theorem 5.126 implies that the series ∑_{n=1}^{∞} λn un un∗ converges to some compact operator σ ∈ L(E) that is diagonalizable with respect to {un} and satisfies σun = λnun for all n. By Lemma 5.124, σ = τ. Theorem 5.126 shows that τ is normal (and self-adjoint if K = R). This proves (3) ⇒ (1).

Note that in Theorem 5.127 we require that τ is self-adjoint if E is a real Hilbert space. If τ is normal then τC ∈ L(EC) is diagonalizable, but τ might not be diagonalizable because σ(τ) can contain non-real values. However, we have the following theorem:

Theorem 5.128. Let E be a real Hilbert space and let τ ∈ L(E). The followingare equivalent:

1. τ is compact and normal.

2. There exist disjoint sets un, vm, wm in E such that un ∪ vm ∪wm is countable and orthonormal, and sequences λn, am, bm of realnumbers such that λn → 0 and a2

m + b2m → 0, satisfying

τ =

∞∑n=1

λnunu∗n +

∞∑m=1

[vm wm

] [am −bmbm am

] [v∗mw∗m

]

=

∞∑n=1

λnunu∗n +

∞∑m=1

(amvmv∗m + bmwmv

∗m − bmvmw∗m + amwmw

∗m).

Proof. Suppose that (1) holds. By Theorem 5.121 and Theorem 5.91, $\tau^{\mathbb{C}}$ is compact and normal. Using the argument in Theorem 5.127, we have a spectral decomposition $\omega$ of $\tau^{\mathbb{C}}$ such that $\sigma(\tau)$ is countable, each nonzero $z \in \sigma(\tau)$ is an eigenvalue of $\tau^{\mathbb{C}}$, and $\operatorname{im}\omega(\{z\}) = \ker(z\,\mathrm{Id}_{E^{\mathbb{C}}} - \tau^{\mathbb{C}})$ is finite-dimensional for all nonzero $z \in \sigma(\tau)$. Let $\{\lambda_n\}$ be the set of all nonzero real eigenvalues of $\tau^{\mathbb{C}}$, and let $\{\mu_m\}$ be the set of all non-real eigenvalues of $\tau^{\mathbb{C}}$. The argument in Theorem 5.127 already shows the existence of a countable orthonormal set $\{u_n\}$ in $E$ such that $\tau u_n = \lambda_n u_n$ for all $n$.



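For each $m$, let $\rho_m = \omega(\{\mu_m\})$ and choose an orthonormal basis $\{x_{m,1}, \dots, x_{m,k_m}\}$ for $\operatorname{im}\rho_m$. Then $\{\overline{x_{m,1}}, \dots, \overline{x_{m,k_m}}\}$ is an orthonormal basis for $\operatorname{im}(\rho_m)^*$, and Theorem 5.96 shows that $(\rho_m)^* = \omega(\{\mu_m\})^* = \omega(\{\overline{\mu_m}\})$. Since $\mu_m \neq \overline{\mu_m}$ and $\{\mu_m\} \cap \{\overline{\mu_m}\} = \emptyset$, we must have $\operatorname{im}\rho_m \perp \operatorname{im}(\rho_m)^*$. Let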

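\[ v_{m,j} = \operatorname{Re} x_{m,j}, \qquad w_{m,j} = \operatorname{Im} x_{m,j}, \]

\[ a_m = \operatorname{Re}\mu_m, \qquad b_m = -\operatorname{Im}\mu_m, \]

so that $\{v_{m,1}, \dots, v_{m,k_m}, w_{m,1}, \dots, w_{m,k_m}\}$ is an orthonormal set. Using the fact that $\tau^{\mathbb{C}}(\operatorname{Re} x) = \operatorname{Re}(\tau^{\mathbb{C}} x)$ and $\tau^{\mathbb{C}}(\operatorname{Im} x) = \operatorname{Im}(\tau^{\mathbb{C}} x)$ for all $x \in E^{\mathbb{C}}$, we have

\[
\begin{aligned}
\tau v_{m,j} &= \operatorname{Re}(\tau^{\mathbb{C}} v_{m,j}) = \operatorname{Re}(\tau^{\mathbb{C}} x_{m,j}) = \operatorname{Re}(\mu_m x_{m,j}) \\
&= \operatorname{Re}\big((a_m - b_m i)(v_{m,j} + w_{m,j} i)\big) = a_m v_{m,j} + b_m w_{m,j},
\end{aligned}
\]

and

\[
\begin{aligned}
\tau w_{m,j} &= \operatorname{Re}(\tau^{\mathbb{C}} w_{m,j}) = \operatorname{Im}(\tau^{\mathbb{C}} x_{m,j}) = \operatorname{Im}(\mu_m x_{m,j}) \\
&= \operatorname{Im}\big((a_m - b_m i)(v_{m,j} + w_{m,j} i)\big) = a_m w_{m,j} - b_m v_{m,j}.
\end{aligned}
\]

If $v \in E$ is nonzero and $v \perp \{u_n\} \cup \{v_{m,j}\} \cup \{w_{m,j}\}$, then $\tau v = 0$ (see the argument in Theorem 5.127). This proves $(1) \Rightarrow (2)$.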

Suppose that (2) holds. Since $\lambda_n \to 0$ and $a_m^2 + b_m^2 \to 0$, the series converges to a compact operator. Theorem 5.126 shows that $\sum_{n=1}^{\infty} \lambda_n u_n u_n^*$ is normal, so it remains to show that each term

\[ \sigma_m = a_m v_m v_m^* + b_m w_m v_m^* - b_m v_m w_m^* + a_m w_m w_m^* \]

is normal. A simple computation gives

\[ \sigma_m^* \sigma_m = (a_m^2 + b_m^2)(v_m v_m^* + w_m w_m^*) = \sigma_m \sigma_m^*, \]

and this proves $(2) \Rightarrow (1)$.
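The "simple computation" can be checked numerically in coordinates. The following is a minimal sketch, assuming we identify $\sigma_m$ with its $2 \times 2$ matrix in the orthonormal basis $\{v_m, w_m\}$; the values of `a` and `b` are hypothetical and the helper names are ours, not the text's.

```python
def matmul(A, B):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose, which is the adjoint for a real matrix."""
    return [list(row) for row in zip(*A)]

a, b = 0.3, -1.2            # hypothetical values of a_m and b_m
S = [[a, -b], [b, a]]       # matrix of sigma_m in the basis {v_m, w_m}
St = transpose(S)

StS = matmul(St, S)         # sigma_m^* sigma_m
SSt = matmul(S, St)         # sigma_m sigma_m^*
expected = [[a * a + b * b, 0.0], [0.0, a * a + b * b]]
```

Both products should agree with `expected`, i.e. with $(a_m^2 + b_m^2)$ times the identity on $\operatorname{span}(v_m, w_m)$.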

The sum $\sum_{n=1}^{\infty} \lambda_n u_n u_n^*$ is a discrete version of integration with respect to PO-valued measures. The next theorem shows that the continuous functional calculus for diagonalizable compact operators has a particularly simple form. If $\tau$ is compact then the set $\{\lambda_n\}$ of eigenvalues of $\tau$ can only have a limit point at 0 (see Theorem 5.123). If $\{\lambda_n\}$ is infinite and $f \in C(\sigma(\tau))$ satisfies $f(0) = 0$, then $\lambda_n \to 0$ and the continuity of $f$ at 0 ensures that $f(\lambda_n) \to f(0) = 0$.

Theorem 5.129 (Continuous functional calculus for normal compact operators). Let $E$ be a Hilbert space over $\mathbb{K}$ and let $\tau \in L(E)$ be compact and diagonalizable with respect to $\{u_n\}$. Let $\lambda_n$ be the eigenvalue corresponding to $u_n$. For all $f \in C(\sigma(\tau), \mathbb{K})$ such that $f(0) = 0$ (if $\sigma(\tau)$ contains 0),

\[ f(\tau) = \sum_{n=1}^{\infty} f(\lambda_n) u_n u_n^*. \]

In particular, $f(\tau)$ is compact. (See Theorem 5.97 for the definition of $f(\tau)$ when $\mathbb{K} = \mathbb{R}$.)

Proof. If $\{\lambda_n\}$ is infinite then $\lambda_n \to 0$, and the continuity of $f$ at 0 implies that $f(\lambda_n) \to f(0) = 0$. By Theorem 5.126, the series $\sum_{n=1}^{\infty} f(\lambda_n) u_n u_n^*$ converges to some compact operator $\sigma \in L(E)$ that is diagonalizable with respect to $\{u_n\}$ and satisfies $\sigma u_n = f(\lambda_n) u_n$ for all $n$.

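Suppose that $\mathbb{K} = \mathbb{C}$, and let $\omega$ be the spectral decomposition of $\tau$. If $\lambda_n \neq 0$, then $u_n \in \operatorname{im}\omega(\{\lambda_n\})$ by Corollary 5.82. Since $u_n \perp \operatorname{im}\omega(\sigma(\tau) \setminus \{\lambda_n\})$,

\[
\begin{aligned}
\langle f(\tau) u_n, v \rangle &= \int_{\sigma(\tau)} f \, d\omega_{u_n,v} = \int_{\{\lambda_n\}} f \, d\omega_{u_n,v} + \int_{\sigma(\tau) \setminus \{\lambda_n\}} f \, d\omega_{u_n,v} \\
&= \int_{\{\lambda_n\}} f \, d\omega_{u_n,v} = \langle f(\lambda_n)\,\omega(\{\lambda_n\}) u_n, v \rangle \\
&= \langle f(\lambda_n) u_n, v \rangle
\end{aligned}
\]

for all $v \in E$. Therefore $f(\tau) u_n = f(\lambda_n) u_n$. If $v \in E$ is nonzero and $v \perp \{u_n\}$ then $v \perp \operatorname{im}\omega(\sigma(\tau) \setminus \{0\})$, so $0 \in \sigma(\tau)$ and $v \in \operatorname{im}\omega(\{0\})$. Therefore $\{u_n\}^{\perp} \subseteq \operatorname{im}\omega(\{0\})$. Since $f(0) = 0$, Theorem 5.80 implies that

\[ \{u_n\}^{\perp} \subseteq \operatorname{im}\omega(\{0\}) \subseteq \operatorname{im}\omega(f^{-1}(0)) = \ker f(\tau). \]

This shows that $f(\tau)$ is diagonalizable with respect to $\{u_n\}$. Since $f(\tau) u_n = f(\lambda_n) u_n = \sigma u_n$, Lemma 5.124 shows that $f(\tau) = \sigma$.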

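Now suppose that $\mathbb{K} = \mathbb{R}$. Theorem 5.121 shows that $\tau^{\mathbb{C}}$ is compact, and it is easy to check that $\tau^{\mathbb{C}}$ is diagonalizable with respect to $\{u_n\}$. Therefore

\[ f(\tau^{\mathbb{C}}) = \sum_{n=1}^{\infty} f(\lambda_n) (u_n u_n^*)^{\mathbb{C}}. \]

Since the map $g \mapsto g^{\mathbb{C}}$ is continuous, we have $\sigma^{\mathbb{C}} = f(\tau^{\mathbb{C}})$. By definition, $f(\tau) = \sigma$.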

Corollary 5.130. Let $E$ be a Hilbert space over $\mathbb{K}$. If $\tau \in L(E)$ is compact and positive, then $\tau^{1/2}$ is compact.

Proof. If $\tau$ is diagonalizable with respect to $\{u_n\}$ and $\lambda_n \geq 0$ is the eigenvalue corresponding to $u_n$, then

\[ \tau^{1/2} = \sum_{n=1}^{\infty} \lambda_n^{1/2} u_n u_n^*. \]

Theorem 5.131. Let $E, F$ be Hilbert spaces and let $f \in L(E, F)$. The following are equivalent:

1. $f$ is compact.

2. $f^* f$ is compact.

3. $(f^* f)^{1/2}$ is compact.

Proof. Using Theorem 5.104, write $f = \nu (f^* f)^{1/2}$ for some partial isometry $\nu \in L(E, F)$. If $f$ is compact then $f^* f$ is compact due to Theorem 5.116. If $f^* f$ is compact then Corollary 5.130 implies that $(f^* f)^{1/2}$ is compact. Finally, if $(f^* f)^{1/2}$ is compact then $f = \nu (f^* f)^{1/2}$ is compact due to Theorem 5.116.

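Theorem 5.132 (Min-max theorem for eigenvalues). Let $\tau \in L(E)$ be a diagonalizable compact operator and let $\lambda_1, \lambda_2, \dots$ be the eigenvalues of $\tau$ (with repetition), where $|\lambda_1| \geq |\lambda_2| \geq \cdots$. Then

\[ |\lambda_n| = \inf_{u_1, \dots, u_{n-1} \in E} \sup\left\{ |\tau v| : v \in \{u_1, \dots, u_{n-1}\}^{\perp},\ |v| = 1 \right\}. \]

Proof. $\tau$ is diagonalizable with respect to some $\{e_n\}$ such that $\lambda_n$ is the eigenvalue corresponding to $e_n$. If $v \in \{e_1, \dots, e_{n-1}\}^{\perp}$ is a unit vector then

\[ |\tau v|^2 = \left| \sum_{k=1}^{\infty} \lambda_k e_k e_k^* v \right|^2 = \sum_{k=n}^{\infty} |\lambda_k|^2 |\langle v, e_k \rangle|^2 \leq |\lambda_n|^2 \sum_{k=n}^{\infty} |\langle v, e_k \rangle|^2 \leq |\lambda_n|^2, \]

so $\inf \sup \cdots \leq |\lambda_n|$. Let $u_1, \dots, u_{n-1} \in E$. Since $e_1, \dots, e_n$ are linearly independent and $\dim(E / \{u_1, \dots, u_{n-1}\}^{\perp}) \leq n - 1$, there exists a unit vector $v \in \operatorname{span}(e_1, \dots, e_n) \cap \{u_1, \dots, u_{n-1}\}^{\perp}$. Then

\[ |\tau v|^2 = \left| \sum_{k=1}^{\infty} \lambda_k e_k e_k^* v \right|^2 = \sum_{k=1}^{n} |\lambda_k|^2 |\langle v, e_k \rangle|^2 \geq |\lambda_n|^2 \sum_{k=1}^{n} |\langle v, e_k \rangle|^2 = |\lambda_n|^2, \]

which shows that $\inf \sup \cdots \geq |\lambda_n|$.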



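Theorem 5.133. If $\tau \in L(E)$ and $|\tau u_n| \to 0$ whenever $\{u_n\}$ is a countable orthonormal set in $(\ker\tau)^{\perp}$, then $\tau$ is compact.

Proof. We can assume that $(\ker\tau)^{\perp}$ is infinite-dimensional. Choose $\{u_n\}$ as follows: having chosen $u_1, \dots, u_{n-1}$, let

\[ r_n = \sup\left\{ |\tau v| : v \in (\ker\tau)^{\perp} \cap \{u_1, \dots, u_{n-1}\}^{\perp},\ |v| = 1 \right\} \]

and choose a unit vector $u_n \in (\ker\tau)^{\perp} \cap \{u_1, \dots, u_{n-1}\}^{\perp}$ such that $|\tau u_n| \geq r_n / 2$. Then $r_n \to 0$ because $|\tau u_n| \to 0$. Let $\tau_n = \sum_{k=1}^{n} (\tau u_k) u_k^*$. If $v \in \operatorname{span}(u_1, \dots, u_n)$ then

\[ \tau_n v = \sum_{k=1}^{n} (\tau u_k) u_k^* \sum_{j=1}^{n} \langle v, u_j \rangle u_j = \tau \sum_{k=1}^{n} \langle v, u_k \rangle u_k = \tau v, \]

so

\[ \left| \tau - \sum_{k=1}^{n} (\tau u_k) u_k^* \right| \leq r_{n+1} \to 0 \]

as $n \to \infty$.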

The $\ell^p$ spaces

Let $X$ be a discrete topological space and let $E$ be a Banach space. Recall that the counting measure $\mu$ on $X$ is defined by $\mu(S) = |S|$ if $S$ is any finite subset of $X$, and $\mu(S) = \infty$ if $S$ is any infinite subset of $X$. It is easy to check that $\mu$ is a Radon measure on $X$. Let $\ell^p(X) = \ell^p(X, E) = L^p(X, \mu, E)$.

Theorem 5.134. Let $s : X \to E$. Then $\sum_{x \in X} s(x)$ is absolutely convergent if and only if $s \in \ell^1(X)$. In that case,

\[ \sum_{x \in X} |s(x)| = \int_X |s| \quad \text{and} \quad \sum_{x \in X} s(x) = \int_X s, \]

and $s(x) \neq 0$ for only countably many $x \in X$.

Proof. If $s \in \ell^1(X)$ then Theorem 4.57 and Theorem 1.27 imply that $\sum_{x \in X} |s(x)|$ converges. Suppose that $\sum_{x \in X} |s(x)|$ converges, and let $n$ be a positive integer. By Theorem 1.27, there is a finite set $S_n \subseteq X$ such that $\sum_{x \in S} |s(x)| < 1/n$ whenever $S \subseteq X$ is finite and $S_n \cap S = \emptyset$. Let $T_n = S_1 \cup \cdots \cup S_n$. Then each $\chi_{T_n} s$ is a simple integrable map, and for $n \geq m$,

\[ \int_X |\chi_{T_n} s - \chi_{T_m} s| = \sum_{x \in T_n \setminus T_m} |s(x)| < 1/m \]

because $S_m \cap (T_n \setminus T_m) = \emptyset$. This shows that $\{\chi_{T_n} s\}_{n \geq 1}$ is $L^1$-Cauchy, so $\chi_{T_n} s \to t$ in $L^1$ for some $t \in \ell^1(X)$. Let $T = \bigcup_{n=1}^{\infty} T_n$. If $x \notin T$ then $S_n \cap \{x\} = \emptyset$ and $|s(x)| = \sum_{y \in \{x\}} |s(y)| < 1/n$ for all $n$, so $s(x) = 0$. This implies that $\chi_{T_n} s \to s$ pointwise, so Theorem 4.60 shows that $s = t$. Furthermore, $s(x) \neq 0$ for only countably many $x \in X$ because $T$ is countable. The equalities follow from Theorem 4.57.

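Theorem 5.135. Let $1 \leq p \leq q \leq \infty$. Then $\ell^p(X) \subseteq \ell^q(X)$, and $\|s\|_q \leq \|s\|_p$ for all $s \in \ell^p(X)$.

Proof. Let $s \in \ell^p(X)$. If $q = \infty$ and $p < \infty$ then $\|s\|_\infty^p = \sup_{x \in X} |s(x)|^p \leq \|s\|_p^p$, so $\|s\|_\infty \leq \|s\|_p$. Otherwise, we can assume that $1 \leq p < q < \infty$. If $\|s\|_p = 0$ then $s = 0$ and $\|s\|_q = 0$. If $\|s\|_p \neq 0$ then $|s| \leq \|s\|_p$ on $X$, so

\[ \big| s / \|s\|_p \big|^q \leq \big| s / \|s\|_p \big|^p \]

on $X$ because $a^q \leq a^p$ when $0 \leq a \leq 1$. Therefore

\[ \|s\|_q^q / \|s\|_p^q = \int_X \big| s / \|s\|_p \big|^q \leq \int_X \big| s / \|s\|_p \big|^p = 1, \]

i.e. $\|s\|_q \leq \|s\|_p$.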
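As a quick numerical illustration of Theorem 5.135, the sketch below computes $\|s\|_p$ for a finitely supported scalar sequence and several exponents. It assumes $X$ is a finite set and $E = \mathbb{R}$; the function name `lp_norm` and the sample data are ours.

```python
import math

def lp_norm(s, p):
    """Norm of a finitely supported sequence s in l^p; p may be math.inf."""
    if p == math.inf:
        return max((abs(x) for x in s), default=0.0)
    return sum(abs(x) ** p for x in s) ** (1.0 / p)

# A hypothetical sequence on X = {0, 1, 2, 3}.
s = [3.0, -4.0, 1.0, 0.5]
exponents = [1, 1.5, 2, 3, 10, math.inf]
norms = [lp_norm(s, p) for p in exponents]

# Theorem 5.135 predicts that the norms do not increase as p grows.
is_nonincreasing = all(x >= y - 1e-12 for x, y in zip(norms, norms[1:]))
```

The small tolerance only absorbs floating-point rounding; the inequality itself is exact.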

The multiplicative form of the spectral theorem

In order to state the singular value decomposition, we will need a version of the spectral theorem that represents a normal compact operator as a multiplication operator on $\ell^2(X, \mathbb{K})$, for some discrete topological space $X$. Given a function $s \in \ell^\infty(X, \mathbb{K})$, we define multiplication by $s$ to be the operator $m_s \in L(\ell^2(X, \mathbb{K}))$ defined by $m_s(t) = st$. Note that $s \in C_0(X, \mathbb{K})$ if and only if $s(X)$ is countable and $s^{-1}(\lambda)$ is finite for all nonzero $\lambda \in \mathbb{K}$.

Theorem 5.136 (Properties of the multiplication operator). Let $X$ be a discrete topological space and let $s \in \ell^\infty(X, \mathbb{K})$.

1. The map given by $t \mapsto m_t$ is an isometric *-homomorphism from $\ell^\infty(X, \mathbb{K})$ to $L(\ell^2(X, \mathbb{K}))$. In particular, $m_s^* = m_{\overline{s}}$.

2. ms is normal.

3. ms is self-adjoint if and only if s is real-valued.

4. ms is unitary if and only if |s| = 1 on X.

5. ms is an orthogonal projection if and only if s = χS for some set S ⊆ X.



6. ms is positive if and only if s ≥ 0 on X.

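7. $m_s$ is invertible if and only if $s \neq 0$ on $X$.

8. $m_s$ is compact if and only if $s \in C_0(X, \mathbb{K})$.

9. $\ker(\lambda\,\mathrm{Id}_{\ell^2(X, \mathbb{K})} - m_s) = \operatorname{im} m_{\chi_{s^{-1}(\lambda)}}$.

Proof. We only prove (8) and (9). For all $x \in X$ we have $m_s \chi_{\{x\}} = s(x) \chi_{\{x\}}$, so $\chi_{\{x\}}$ is an eigenvector of $m_s$ with eigenvalue $s(x)$. Since $s \in C_0(X, \mathbb{K})$ if and only if $s^{-1}(\lambda)$ is finite for all nonzero $\lambda \in \mathbb{K}$, Theorem 5.123 and Theorem 5.127 imply (8). For (9),

\[
\begin{aligned}
m_s t = \lambda t &\iff s(x) t(x) = \lambda t(x) \text{ for all } x \in X \\
&\iff s(x) = \lambda \text{ or } t(x) = 0 \text{ for all } x \in X \\
&\iff t = \chi_{s^{-1}(\lambda)} t.
\end{aligned}
\]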

Theorem 5.137 (Spectral theorem for normal compact operators, multiplicative form). Let $E$ be a Hilbert space over $\mathbb{K}$ and let $\tau \in L(E)$ be compact and diagonalizable. There exists a discrete topological space $X$, a function $s \in C_0(X, \mathbb{K})$, and an isometric isomorphism $\varphi : \ell^2(X, \mathbb{K}) \to E$ such that

\[ \tau = \varphi m_s \varphi^*. \]

The function $s$ is unique in the following sense: if there is a discrete topological space $Y$, a function $t \in C_0(Y, \mathbb{K})$, and an isometric isomorphism $\varphi_1 : \ell^2(Y, \mathbb{K}) \to E$ such that $\tau = \varphi_1 m_t \varphi_1^*$, then there is a bijection $f : X \to Y$ such that $s = t \circ f$.

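Proof. By Theorem 5.127, there is an orthonormal set $O = \{u_1, u_2, \dots\}$ in $E$ such that every $u_n$ is an eigenvector of $\tau$, and $O^{\perp} \subseteq \ker\tau$. Let $\lambda_n$ be the eigenvalue corresponding to $u_n$. By Theorem 1.91, there is a Hilbert basis $B$ for $E$ that contains $O$, and there exists an isometric isomorphism $\varphi : \ell^2(B, \mathbb{K}) \to E$ such that

\[ \varphi(t) = \sum_{u \in B} t(u) u \]

(see Theorem 1.94 and Theorem 1.87). Note that $\varphi^{-1} = \varphi^*$ due to Theorem 1.101. Define $s \in \ell^\infty(B, \mathbb{K})$ by $s(u) = \lambda_n$ if $u = u_n$ for some $n$, and $s(u) = 0$ if $u \in B \setminus O$. If we give $B$ the discrete topology, then Theorem 5.123 implies that $s \in C_0(B, \mathbb{K})$. Then

\[ \varphi^{-1} \tau \varphi \chi_{\{u_n\}} = \varphi^{-1} \tau u_n = \lambda_n \varphi^{-1} u_n = s \chi_{\{u_n\}} \]

for all $n$, and if $u \in B \setminus O$ then $u \in O^{\perp} \subseteq \ker\tau$ and

\[ \varphi^{-1} \tau \varphi \chi_{\{u\}} = \varphi^{-1} \tau u = 0 = s \chi_{\{u\}}. \]

Since $\{\chi_{\{u\}}\}_{u \in B}$ is a Hilbert basis for $\ell^2(B, \mathbb{K})$, this shows that $\varphi^{-1} \tau \varphi = m_s$.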

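Suppose that $\tau = \varphi_1 m_t \varphi_1^*$ as described; then

\[ m_s = \varphi^* \varphi_1 m_t \varphi_1^* \varphi. \]

Note that $s^{-1}(\lambda)$ and $t^{-1}(\lambda)$ are finite if $\lambda \in \mathbb{K}$ is nonzero because $s \in C_0(X, \mathbb{K})$ and $t \in C_0(Y, \mathbb{K})$. If we can show that $s(X) = t(Y)$ and $|s^{-1}(\lambda)| = |t^{-1}(\lambda)|$ for all nonzero $\lambda \in \mathbb{K}$, then there is clearly a bijection $f : X \to Y$ such that $s = t \circ f$.

Let $\lambda \in s(X)$ and choose some nonzero $r \in \operatorname{im} m_{\chi_{s^{-1}(\lambda)}}$. Theorem 5.136 implies that $m_s r = \lambda r$,

\[ m_t \varphi_1^* \varphi r = \varphi_1^* \varphi m_s r = \lambda \varphi_1^* \varphi r, \]

and $\varphi_1^* \varphi r \in \operatorname{im} m_{\chi_{t^{-1}(\lambda)}}$. Since $\varphi_1^* \varphi r \neq 0$, we have $\lambda \in t(Y)$. Now suppose that $\lambda \neq 0$, and let $m = |s^{-1}(\lambda)|$ and $n = |t^{-1}(\lambda)|$. Choose a basis $\{r_1, \dots, r_m\}$ for $\operatorname{im} m_{\chi_{s^{-1}(\lambda)}}$, noting that $\dim(\operatorname{im} m_{\chi_{s^{-1}(\lambda)}}) = m$. Then $\{\varphi_1^* \varphi r_1, \dots, \varphi_1^* \varphi r_m\}$ is a linearly independent set in $\operatorname{im} m_{\chi_{t^{-1}(\lambda)}}$, so

\[ |s^{-1}(\lambda)| = m \leq \dim(\operatorname{im} m_{\chi_{t^{-1}(\lambda)}}) = n = |t^{-1}(\lambda)|. \]

This shows that $s(X) \subseteq t(Y)$ and $|s^{-1}(\lambda)| \leq |t^{-1}(\lambda)|$ for all nonzero $\lambda \in \mathbb{K}$. A similar argument shows that $t(Y) \subseteq s(X)$ and $|t^{-1}(\lambda)| \leq |s^{-1}(\lambda)|$ for all nonzero $\lambda \in \mathbb{K}$, and this completes the proof.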

Singular values

Theorem 5.138 (Singular value decomposition, multiplicative form). Let $E, F$ be Hilbert spaces over $\mathbb{K}$ and let $\tau \in L(E, F)$ be compact. There exists a discrete topological space $X$, a nonnegative $s \in C_0(X, \mathbb{K})$, an isometric isomorphism $\varphi : \ell^2(X, \mathbb{K}) \to E$, and a partial isometry $\psi : \ell^2(X, \mathbb{K}) \to F$ such that $\ker\psi \subseteq \ker m_s$ and

\[ \tau = \psi m_s \varphi^*. \]

The function $s$ is unique in the following sense: if there is a discrete topological space $Y$, a nonnegative $t \in C_0(Y, \mathbb{K})$, an isometric isomorphism $\varphi_1 : \ell^2(Y, \mathbb{K}) \to E$, and a partial isometry $\psi_1 : \ell^2(Y, \mathbb{K}) \to F$ such that $\ker\psi_1 \subseteq \ker m_t$ and $\tau = \psi_1 m_t \varphi_1^*$, then there is a bijection $f : X \to Y$ such that $s = t \circ f$.



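Proof. By Theorem 5.104, there is a partial isometry $\nu \in L(E, F)$ such that $\tau = \nu (\tau^* \tau)^{1/2}$ and $\ker\nu = (\operatorname{im}(\tau^* \tau)^{1/2})^{\perp}$. Since $(\tau^* \tau)^{1/2}$ is compact (see Corollary 5.130), Theorem 5.137 shows that there is a discrete topological space $X$, a function $s \in C_0(X, \mathbb{K})$, and an isometric isomorphism $\varphi : \ell^2(X, \mathbb{K}) \to E$ such that

\[ (\tau^* \tau)^{1/2} = \varphi m_s \varphi^*. \]

Since $(\tau^* \tau)^{1/2}$ is positive, $s$ is nonnegative. Let $\psi = \nu \varphi$; then $\psi$ is a partial isometry and

\[ \psi m_s \varphi^* = \nu \varphi m_s \varphi^* = \nu (\tau^* \tau)^{1/2} = \tau. \]

We have

\[ \ker\nu = (\operatorname{im}(\tau^* \tau)^{1/2})^{\perp} = \ker(\tau^* \tau)^{1/2} \]

using Theorem 1.100, so

\[ \ker\psi = \ker(\nu \varphi) = \ker((\tau^* \tau)^{1/2} \varphi) = \ker m_s. \]

If $\tau = \psi_1 m_t \varphi_1^*$ as described, then

\[ \varphi m_s^2 \varphi^* = \tau^* \tau = \varphi_1 m_t \psi_1^* \psi_1 m_t \varphi_1^* = \varphi_1 m_t^2 \varphi_1^* \]

because $\psi_1$ is an isometry on $(\ker\psi_1)^{\perp} \supseteq (\ker m_t)^{\perp} = \operatorname{im} m_t$ (see Theorem 5.136, Theorem 1.100 and Theorem 1.101). This implies that

\[ m_{s^2} = \varphi^* \varphi_1 m_{t^2} \varphi_1^* \varphi, \]

and uniqueness in Theorem 5.137 shows that $s^2 = t^2 \circ f$ for some bijection $f : X \to Y$. Since $s$ and $t$ are nonnegative, $s = t \circ f$.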

Let $\tau \in L(E, F)$ be compact and let $\tau = \psi m_s \varphi^*$ be a singular value decomposition of $\tau$. If $\lambda \in s(X)$ is nonzero, we say that $\lambda$ is a singular value of $\tau$. By uniqueness in Theorem 5.138, this definition is independent of $\psi$, $s$ and $\varphi$. When $E = F$ and $\tau$ is diagonalizable, the singular values of $\tau$ are precisely the absolute values of the eigenvalues of $\tau$.

Theorem 5.139. Let $\tau \in L(E, F)$ be compact and let $\lambda \geq 0$. The following are equivalent:

1. λ is a singular value of τ .

2. λ is a singular value of τ∗.

3. λ is an eigenvalue of (τ∗τ)1/2.

4. λ is an eigenvalue of (ττ∗)1/2.



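5. There exist nonzero unit vectors $u \in E$ and $v \in F$ such that

\[ \tau u = \lambda v \quad \text{and} \quad \tau^* v = \lambda u. \]

We say that $u$ is a right singular vector of $\tau$ and $v$ is a left singular vector of $\tau$.

Proof. $(1) \Leftrightarrow (3)$ and $(2) \Leftrightarrow (4)$ follow from Theorem 5.138. Suppose that (1) holds, so that $\lambda = s(x)$ for some $x \in X$. Let $u = \varphi \chi_{\{x\}}$ and $v = \psi \chi_{\{x\}}$, which are both unit vectors. Then

\[ \tau u = \psi m_s \varphi^* \varphi \chi_{\{x\}} = s(x) \psi \chi_{\{x\}} = \lambda v \]

and

\[ \tau^* v = \varphi m_s \psi^* \psi \chi_{\{x\}} = s(x) \varphi \chi_{\{x\}} = \lambda u \]

using Theorem 1.101. This proves $(1) \Rightarrow (5)$, and a similar argument proves $(2) \Rightarrow (5)$. If (5) holds then

\[ \tau^* \tau u = \lambda \tau^* v = \lambda^2 u \]

and

\[ \tau \tau^* v = \lambda \tau u = \lambda^2 v, \]

which proves $(5) \Rightarrow (3)$ and $(5) \Rightarrow (4)$.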

Corollary 5.140. If $\tau \in L(E, F)$ is compact and $\tau = \psi m_s \varphi^*$ is a singular value decomposition of $\tau$, then $|\tau| = \|s\|_\infty$.

The following result is a useful generalization of Theorem 5.126, and will allow us to state the singular value decomposition in a slightly different way.

Theorem 5.141. Let $\{u_n\}$ and $\{v_n\}$ be countable orthonormal sets in $E$ and $F$ respectively. If $\{\lambda_n\}$ is a bounded sequence in $\mathbb{K}$, then the series

\[ \sum_{n=1}^{\infty} \lambda_n v_n u_n^* \]

converges strongly to an operator $\tau \in L(E, F)$. Furthermore:

1. Each nonzero $|\lambda_n|$ is a singular value of $\tau$, where $u_n$ and $(\lambda_n / |\lambda_n|) v_n$ are the corresponding right and left singular vectors.

2. If $\lambda_n \to 0$ then $\tau$ is compact, and the series converges in the norm topology to $\tau$.

3. $|\tau w|^2 = \sum_{n=1}^{\infty} |\lambda_n|^2 |\langle w, u_n \rangle|^2$ for all $w \in E$.

4. $|\tau| = \sup_{n \geq 1} |\lambda_n|$.

5. $\tau^* = \sum_{n=1}^{\infty} \overline{\lambda_n}\, u_n v_n^*$.

If $\tau \in L(E, F)$ is compact, we define the singular value sequence $s(\tau) = (s_n(\tau))_n \in C_0(\mathbb{Z}_+)$ of $\tau$ to be the sequence of all singular values of $\tau$ (with repetitions) in descending order. If $\tau$ has finitely many singular values $s_1(\tau), \dots, s_m(\tau)$, we define $s_n(\tau) = 0$ for $n > m$.

Theorem 5.142 (Singular value decomposition). If $\tau \in L(E, F)$ is compact, then there exist countable orthonormal sets $\{u_n\}$ and $\{v_n\}$ in $E$ and $F$ such that

\[ \tau = \sum_{n=1}^{\infty} s_n(\tau) v_n u_n^*. \]

Proof. This follows easily from Theorem 5.138 and Theorem 5.141.

Corollary 5.143. L00(E,F ) is dense in L0(E,F ).

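Theorem 5.144 (Min-max theorem for singular values). If $\tau \in L(E, F)$ is compact then

\[ s_n(\tau) = \inf_{u_1, \dots, u_{n-1} \in E} \sup\left\{ |\tau v| : v \in \{u_1, \dots, u_{n-1}\}^{\perp},\ |v| = 1 \right\}. \]

Proof. This follows directly from Theorem 5.132, using the fact that

\[ |(\tau^* \tau)^{1/2} v|^2 = \langle (\tau^* \tau)^{1/2} v, (\tau^* \tau)^{1/2} v \rangle = \langle \tau^* \tau v, v \rangle = \langle \tau v, \tau v \rangle = |\tau v|^2. \]

Corollary 5.145. If $\sigma \in L(E)$ and $\tau \in L_0(E)$, then

\[ s(\sigma\tau) \leq |\sigma|\, s(\tau) \quad \text{and} \quad s(\tau\sigma) \leq |\sigma|\, s(\tau). \]

Proof. The inequality $s(\sigma\tau) \leq |\sigma|\, s(\tau)$ follows directly from Theorem 5.144, and

\[ s(\tau\sigma) = s((\tau\sigma)^*) = s(\sigma^* \tau^*) \leq |\sigma^*|\, s(\tau^*) = |\sigma|\, s(\tau). \]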



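Corollary 5.146 (Fan's inequality). Let $\sigma, \tau \in L_0(E)$. For all $m, n \geq 0$,

\[ s_{m+n+1}(\sigma + \tau) \leq s_{m+1}(\sigma) + s_{n+1}(\tau). \]

Proof. Write

\[ M(\varphi; u_1, \dots, u_n) = \sup\left\{ |\varphi v| : v \in \{u_1, \dots, u_n\}^{\perp},\ |v| = 1 \right\}. \]

Since $|(\sigma + \tau) v| \leq |\sigma v| + |\tau v|$,

\[
\begin{aligned}
M(\sigma + \tau; u_1, \dots, u_{m+n}) &\leq M(\sigma; u_1, \dots, u_{m+n}) + M(\tau; u_1, \dots, u_{m+n}) \\
&\leq M(\sigma; u_1, \dots, u_m) + M(\tau; u_{m+1}, \dots, u_{m+n}).
\end{aligned}
\]

The result follows from taking the inf with respect to $u_1, \dots, u_{m+n}$ and applying Theorem 5.144.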
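Fan's inequality can be sanity-checked in the simplest case of diagonal operators, where the singular values are the absolute values of the diagonal entries in descending order. The sketch below assumes $\sigma$ and $\tau$ are multiplication operators on $\mathbb{R}^6$ with randomly chosen hypothetical entries; the helper names are ours.

```python
import random

def s(a):
    """Singular values of the diagonal operator with entries a, in descending order."""
    return sorted((abs(x) for x in a), reverse=True)

def fan_holds(a, b, tol=1e-12):
    """Check s_{m+n+1}(a+b) <= s_{m+1}(a) + s_{n+1}(b) for all valid m, n (0-based lists)."""
    N = len(a)
    sa, sb = s(a), s(b)
    ssum = s([x + y for x, y in zip(a, b)])
    return all(
        ssum[m + n] <= sa[m] + sb[n] + tol
        for m in range(N) for n in range(N) if m + n < N
    )

rng = random.Random(0)  # fixed seed so the check is reproducible
trials = [
    ([rng.uniform(-1, 1) for _ in range(6)], [rng.uniform(-1, 1) for _ in range(6)])
    for _ in range(200)
]
all_ok = all(fan_holds(a, b) for a, b in trials)
```

Indices with $m + n + 1 > N$ are skipped because the corresponding singular value of the sum is zero in finite dimensions.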

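Lemma 5.147. Let $\rho \in L(E)$ be a positive compact operator and let $\lambda_1 \geq \lambda_2 \geq \cdots$ be the eigenvalues of $\rho$ (with repetition). If $v_1, \dots, v_n \in E$ then

\[ \langle \rho v_1 \wedge \cdots \wedge \rho v_n, v_1 \wedge \cdots \wedge v_n \rangle \leq \lambda_1 \cdots \lambda_n |v_1 \wedge \cdots \wedge v_n|^2. \]

Proof. Write $\rho = \sum_{k=1}^{\infty} \lambda_k e_k e_k^*$ for some countable orthonormal set $\{e_k\}$ in $E$. By Theorem 1.91, there is an ordered Hilbert basis $\{e_\alpha\}_{\alpha \in A}$ for $E$ that contains $\{e_k\}$. Since $\rho e_\alpha = 0$ whenever $e_\alpha \notin \{e_k\}$, Theorem 1.124 shows that

\[ \rho v_1 \wedge \cdots \wedge \rho v_n = \sum_{k_1 < \cdots < k_n} \lambda_{k_1} \cdots \lambda_{k_n} \langle v_1 \wedge \cdots \wedge v_n, e_{k_1} \wedge \cdots \wedge e_{k_n} \rangle \, e_{k_1} \wedge \cdots \wedge e_{k_n}. \]

Therefore

\[
\begin{aligned}
\langle \rho v_1 \wedge \cdots \wedge \rho v_n, v_1 \wedge \cdots \wedge v_n \rangle
&= \sum_{k_1 < \cdots < k_n} \lambda_{k_1} \cdots \lambda_{k_n} |\langle v_1 \wedge \cdots \wedge v_n, e_{k_1} \wedge \cdots \wedge e_{k_n} \rangle|^2 \\
&\leq \lambda_1 \cdots \lambda_n \sum_{k_1 < \cdots < k_n} |\langle v_1 \wedge \cdots \wedge v_n, e_{k_1} \wedge \cdots \wedge e_{k_n} \rangle|^2 \\
&\leq \lambda_1 \cdots \lambda_n \sum_{\alpha_1 < \cdots < \alpha_n} |\langle v_1 \wedge \cdots \wedge v_n, e_{\alpha_1} \wedge \cdots \wedge e_{\alpha_n} \rangle|^2 \\
&= \lambda_1 \cdots \lambda_n |v_1 \wedge \cdots \wedge v_n|^2.
\end{aligned}
\]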



The following theorem is due to [7].

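Theorem 5.148 (Horn's inequality). Let $\sigma, \tau \in L(E)$ be compact. For all $n$,

\[ \prod_{k=1}^{n} s_k(\sigma\tau) \leq \prod_{k=1}^{n} s_k(\sigma) s_k(\tau). \]

Proof. Let $\sigma\tau = \sum_{k=1}^{\infty} s_k(\sigma\tau) v_k u_k^*$ be a singular value decomposition of $\sigma\tau$ (see Theorem 5.142). Clearly

\[ |\sigma\tau u_1 \wedge \cdots \wedge \sigma\tau u_n| = \prod_{k=1}^{n} s_k(\sigma\tau). \]

By Theorem 1.127 and Lemma 5.147,

\[
\begin{aligned}
\prod_{k=1}^{n} s_k(\sigma\tau)^2 &= |\sigma\tau u_1 \wedge \cdots \wedge \sigma\tau u_n|^2 \\
&= \langle (\sigma^* \sigma) \tau u_1 \wedge \cdots \wedge (\sigma^* \sigma) \tau u_n, \tau u_1 \wedge \cdots \wedge \tau u_n \rangle \\
&\leq \left( \prod_{k=1}^{n} s_k(\sigma)^2 \right) |\tau u_1 \wedge \cdots \wedge \tau u_n|^2 \\
&= \left( \prod_{k=1}^{n} s_k(\sigma)^2 \right) \langle \tau^* \tau u_1 \wedge \cdots \wedge \tau^* \tau u_n, u_1 \wedge \cdots \wedge u_n \rangle \\
&\leq \left( \prod_{k=1}^{n} s_k(\sigma)^2 \right) \left( \prod_{k=1}^{n} s_k(\tau)^2 \right).
\end{aligned}
\]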

5.8 Schatten class operators

In this section we will restrict our attention to separable Hilbert spaces. We first prove the Calkin correspondence, which is a bijection between the set of all ideals of $L(E)$ and a certain class of sequence spaces. Next we will define the Schatten classes, which are noncommutative analogs of the $L^p$ spaces (special cases of which include the trace class and Hilbert-Schmidt operators). Finally, we will explore the properties of trace and determinant in the infinite-dimensional setting.

From now on, we will write $\ell^p = \ell^p(\mathbb{Z}_+, \mathbb{K})$, $c_0 = C_0(\mathbb{Z}_+, \mathbb{K})$, and $c_{00} = C_c(\mathbb{Z}_+, \mathbb{K})$ (the set of sequences with finitely many nonzero values).



The Calkin correspondence

For any sequence $a = (a_n)$ in $c_0$, we define $s(a)$ to be the sequence $(|a_{\sigma(n)}|)$, where $\sigma$ is a permutation of $\mathbb{Z}_+$ chosen so that

\[ |a_{\sigma(1)}| \geq |a_{\sigma(2)}| \geq \cdots. \]

Alternatively, $s(a) = s(m_a)$, where $s(m_a)$ is the singular value sequence of the multiplication operator $m_a \in L(\ell^2)$. Often we will write $s(a) = (s_1(a), s_2(a), \dots)$.
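For a finitely supported sequence, $s(a)$ is just the list of absolute values sorted in descending order. The following minimal sketch (with hypothetical sample data and our own helper names) computes $s(a)$ and tests the entrywise domination $s(b) \leq s(a)$ that appears in the definition of a Calkin space below.

```python
def s(a):
    """Decreasing rearrangement of (|a_n|) for a finitely supported sequence a."""
    return sorted((abs(x) for x in a), reverse=True)

def dominated(b, a):
    """True if s(b) <= s(a) entrywise; the shorter sequence is padded with zeros."""
    sb, sa = s(b), s(a)
    n = max(len(sa), len(sb))
    sb += [0] * (n - len(sb))
    sa += [0] * (n - len(sa))
    return all(x <= y for x, y in zip(sb, sa))

a = [1, -3, 2j, 0]     # complex entries are allowed; only |a_n| matters
b = [0, 0.5, 0, -2]
```

Permuting the entries of `a` or multiplying them by unimodular scalars leaves `s(a)` unchanged, which is the invariance that Calkin spaces and symmetric norms are built on.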

A Calkin space is a subspace $i$ of $c_0$ satisfying the following property: if $a \in i$ and $b \in c_0$ such that $s(b) \leq s(a)$, then $b \in i$. Clearly $c_0$, $c_{00}$, and $\ell^p$ (for $1 \leq p < \infty$) are all Calkin spaces, and the intersection of any collection of Calkin spaces is also a Calkin space.

Theorem 5.149. Let $i$ be a subspace of $c_0$. Then $i$ is a Calkin space if and only if $i$ is an ideal of $\ell^\infty$ and $a \circ \sigma \in i$ whenever $a \in i$ and $\sigma$ is a permutation of $\mathbb{Z}_+$.

Proof. Suppose that $i$ is a Calkin space. Let $a \in i$ and $b \in \ell^\infty$ be nonzero, and let $c = \|b\|_\infty^{-1} ba$. Then $c \in c_0$ and $s(c) \leq s(a)$, so $c \in i$ and $ba = \|b\|_\infty c \in i$. If $\sigma$ is a permutation of $\mathbb{Z}_+$ then $s(a \circ \sigma) = s(a)$, so $a \circ \sigma \in i$. Conversely, suppose that $i$ is an ideal of $\ell^\infty$ and $a \circ \sigma \in i$ whenever $a \in i$ and $\sigma$ is a permutation of $\mathbb{Z}_+$. Let $a \in i$ and $b \in c_0$ such that $s(b) \leq s(a)$. Choose a permutation $\sigma$ of $\mathbb{Z}_+$ such that $|b| \leq |a \circ \sigma|$. Define $c \in \ell^\infty$ by $c_n = b_n / a_{\sigma(n)}$ whenever $a_{\sigma(n)} \neq 0$, and $c_n = 0$ whenever $a_{\sigma(n)} = 0$. Then $a \circ \sigma \in i$ and $b = c(a \circ \sigma) \in i$.

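Theorem 5.150. Let $E$ be a separable Hilbert space. If $I$ is a proper ideal of $L(E)$, then $I \subseteq L_0(E)$.

Proof. Let $I$ be an ideal of $L(E)$ and suppose that there is some $\tau \in I$ that is not compact. By Theorem 5.131, $\rho = (\tau^* \tau)^{1/2}$ is not compact. Let $\omega$ be the spectral decomposition of $\rho$. We have

\[ |\rho - \omega([1/n, \infty)) \rho| = \left| \int_{z \in [0, 1/n)} z \, d\omega \right| \leq \frac{1}{n} \to 0 \]

as $n \to \infty$. There must be some $c > 0$ such that $\operatorname{im}\omega([c, \infty))$ is infinite-dimensional, for otherwise the fact that $\omega([1/n, \infty)) \rho \to \rho$ implies that $\rho$ is compact. Since $I$ is an ideal,

\[
\begin{aligned}
\omega([c, \infty)) &= \left( \int_{\sigma(\rho)} \chi_{[c, \infty)} \, d\omega \right) \left( \int_{z \in [c, \infty)} z^{-1} \, d\omega \right) \left( \int_{z \in \sigma(\rho)} z \, d\omega \right) \\
&= \omega([c, \infty)) \left( \int_{z \in [c, \infty)} z^{-1} \, d\omega \right) \rho \\
&\in I.
\end{aligned}
\]

Both $\operatorname{im}\omega([c, \infty))$ and $E$ have countably infinite Hilbert bases, so there is an isometric isomorphism $\varphi : E \to \operatorname{im}\omega([c, \infty))$. Then $\varphi^{-1} \omega([c, \infty)) \varphi = \mathrm{Id}_E \in I$, which implies that $I = L(E)$.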

Theorem 5.151 (Calkin correspondence). Let $E$ be an infinite-dimensional separable Hilbert space. If $I$ is a proper ideal of $L(E)$, then

\[ i(I) = \{ a \in c_0 : s(a) = s(\tau) \text{ for some } \tau \in I \} \]

is a Calkin space. If $i$ is a Calkin space, then

\[ I(i) = \{ \tau \in L_0(E) : s(\tau) \in i \} \]

is a proper ideal of $L(E)$. Furthermore, $i(I(i)) = i$ and $I(i(I)) = I$ for all Calkin spaces $i$ and all proper ideals $I$ of $L(E)$.

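Proof. It is easy to see that $i(I)$ is a vector space (e.g. by using Theorem 5.126). Let $a \in i(I)$ and choose $\tau \in I$ such that $s(a) = s(\tau)$, i.e. $\tau = \sum_{n=1}^{\infty} |a_n| v_n u_n^*$ for orthonormal sets $\{u_n\}$ and $\{v_n\}$ in $E$. If $\sigma$ is a permutation of $\mathbb{Z}_+$ then $s(a \circ \sigma) = s(a)$, so $a \circ \sigma \in i(I)$. If $b \in \ell^\infty$ then $\sum_{m=1}^{\infty} b_m u_m u_m^*$ converges strongly to some $\varphi \in L(E)$, and

\[ \tau\varphi = \sum_{n=1}^{\infty} |a_n| v_n u_n^* \sum_{m=1}^{\infty} b_m u_m u_m^* = \sum_{n=1}^{\infty} |a_n| b_n v_n u_n^*. \]

Therefore $s(ab) = s(\tau\varphi)$, and $ab \in i(I)$ because $\tau\varphi \in I$. By Theorem 5.149, $i(I)$ is a Calkin space.

If $\tau \in I(i)$ and $\sigma \in L(E)$ then Corollary 5.145 implies that $\sigma\tau \in I(i)$ and $\tau\sigma \in I(i)$, so it remains to show that $I(i)$ is a vector space. Let $\sigma, \tau \in I(i)$ and $r \in \mathbb{K}$. Since $s(r\tau) = |r|\, s(\tau)$, we have $r\tau \in I(i)$. Corollary 5.146 implies that

\[ s_{2n+1}(\sigma + \tau) \leq s_{n+1}(\sigma) + s_{n+1}(\tau), \]

\[ s_{2n}(\sigma + \tau) \leq s_{2n-1}(\sigma + \tau) \leq s_n(\sigma) + s_n(\tau) \]

for all $n \geq 0$. Let

\[ a = (s_1(\sigma), s_1(\sigma), s_2(\sigma), s_2(\sigma), \dots), \]

\[ b = (s_1(\tau), s_1(\tau), s_2(\tau), s_2(\tau), \dots) \]

so that $s(\sigma + \tau) \leq a + b$. Since

\[ (s_1(\sigma), 0, s_2(\sigma), 0, \dots) \in i, \qquad (0, s_1(\sigma), 0, s_2(\sigma), \dots) \in i, \]

and $a$ is the sum of these two sequences, we must have $a \in i$. A similar argument shows that $b \in i$. Therefore $a + b \in i$, $s(\sigma + \tau) \in i$ and $\sigma + \tau \in I(i)$.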

Corollary 5.152. If $E$ is a separable Hilbert space, then every ideal of $L(E)$ is a *-ideal.

Proof. This follows from the fact that $s(\tau^*) = s(\tau)$ (see Theorem 5.139).

Corollary 5.153. Let $E$ be a separable Hilbert space. If $I$ is a closed ideal of $L(E)$, then $I = \{0\}$, $I = L_0(E)$ or $I = L(E)$.

Proof. If $I \neq \{0\}$ and $I \neq L(E)$ then $I$ contains the finite rank operators, so Theorem 5.115 implies that $I = \overline{I} \supseteq L_0(E)$. Theorem 5.150 implies that $I \subseteq L_0(E)$, so $I = L_0(E)$.

Some inequalities

Before continuing, we need to prove some inequalities that will be used later.

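Lemma 5.154 (Rearrangement inequality). Let $a, b \in c_0$.

1. $\sum_{k=1}^{n} |a_k| \leq \sum_{k=1}^{n} s_k(a)$ for all $n$.

2. $\sum_{n=1}^{\infty} |a_n b_n| \leq \sum_{n=1}^{\infty} s_n(a) s_n(b)$.

Proof. By rearranging $a$ and $b$, we can assume that $|a_1| \geq |a_2| \geq \cdots$. Then

\[
\begin{aligned}
\sum_{n=1}^{\infty} |a_n b_n| &= \sum_{n=1}^{\infty} (|a_n| - |a_{n+1}|) \sum_{k=1}^{n} |b_k| \leq \sum_{n=1}^{\infty} (|a_n| - |a_{n+1}|) \sum_{k=1}^{n} s_k(b) \\
&= \sum_{n=1}^{\infty} |a_n| s_n(b) = \sum_{n=1}^{\infty} s_n(a) s_n(b),
\end{aligned}
\]

which proves (2).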
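Both parts of Lemma 5.154 are easy to test on finite sequences. The sketch below uses randomly generated hypothetical real data; `s` is the decreasing rearrangement of absolute values, and the helper names are ours.

```python
import random

def s(a):
    """Decreasing rearrangement of (|a_n|)."""
    return sorted((abs(x) for x in a), reverse=True)

def partial_sums_hold(a, tol=1e-12):
    """Part (1): sum_{k<=n} |a_k| <= sum_{k<=n} s_k(a) for every n."""
    sa = s(a)
    return all(
        sum(abs(x) for x in a[:n]) <= sum(sa[:n]) + tol
        for n in range(1, len(a) + 1)
    )

def rearrangement_holds(a, b, tol=1e-12):
    """Part (2): sum |a_n b_n| <= sum s_n(a) s_n(b)."""
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    rhs = sum(x * y for x, y in zip(s(a), s(b)))
    return lhs <= rhs + tol

rng = random.Random(1)  # fixed seed so the check is reproducible
pairs = [
    ([rng.uniform(-2, 2) for _ in range(8)], [rng.uniform(-2, 2) for _ in range(8)])
    for _ in range(200)
]
all_ok = all(partial_sums_hold(a) and rearrangement_holds(a, b) for a, b in pairs)
```

Equality in part (2) occurs, for instance, when $|a|$ and $|b|$ are both already sorted in descending order.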



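Theorem 5.155. Let $a \in \mathbb{K}^N$ with $a_1 \geq \cdots \geq a_N \geq 0$. A point $b \in \mathbb{K}^N$ lies in the convex hull of $\{c \in \mathbb{K}^N : s(c) = a\}$ if and only if

\[ \sum_{k=1}^{n} s_k(b) \leq \sum_{k=1}^{n} a_k \]

for $n = 1, \dots, N$.

Proof. Let $S$ be the convex hull. If $b \in S$, then $b = \sum_{j=1}^{m} r_j c_j$ for some $c_1, \dots, c_m \in \mathbb{K}^N$ such that $s(c_j) = a$, $0 \leq r_j \leq 1$ and $\sum_{j=1}^{m} r_j = 1$. Choose a permutation $\sigma$ of $\{1, \dots, N\}$ such that $s_k(b) = |b_{\sigma(k)}| \leq \sum_{j=1}^{m} r_j |(c_j)_{\sigma(k)}|$. Then

\[
\begin{aligned}
\sum_{k=1}^{n} s_k(b) &\leq \sum_{j=1}^{m} r_j \sum_{k=1}^{n} |(c_j)_{\sigma(k)}| \leq \sum_{j=1}^{m} r_j \sum_{k=1}^{n} s_k(c_j) \\
&= \sum_{j=1}^{m} r_j \sum_{k=1}^{n} a_k = \sum_{k=1}^{n} a_k
\end{aligned}
\]

for $n = 1, \dots, N$.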

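Conversely, suppose that $\sum_{k=1}^{n} s_k(b) \leq \sum_{k=1}^{n} a_k$ for $n = 1, \dots, N$. If $b \notin S$, then Theorem 1.52 shows that there is a linear functional $\lambda : \mathbb{K}^N \to \mathbb{K}$ such that

\[ \sup_{c \in S} \operatorname{Re} \lambda c \leq 1 \quad \text{and} \quad \operatorname{Re} \lambda b > 1. \]

Choose some $r \in \mathbb{K}^N$ with $|r_1| \geq \cdots \geq |r_N|$ such that $\lambda c = \sum_{n=1}^{N} r_n c_n$ and let $\mu c = \sum_{n=1}^{N} s_n(r) c_n$ so that

\[ |\lambda c| \leq \sum_{n=1}^{N} |r_n c_n| \leq \sum_{n=1}^{N} s_n(r) s_n(c) = \mu s(c) \]

by Lemma 5.154. Define $a^* \in \mathbb{K}^N$ by $a^*_n = (|r_n| / r_n) a_n$ whenever $r_n \neq 0$ and $a^*_n = a_n$ whenever $r_n = 0$, so that $s(a^*) = a$ and

\[ \lambda a^* = \sum_{n=1}^{N} |r_n| a_n = \sum_{n=1}^{N} s_n(r) a_n = \mu a. \]

Then

\[
\begin{aligned}
\sum_{n=1}^{N} s_n(r) s_n(b) &= s_N(r) \sum_{k=1}^{N} s_k(b) + \sum_{n=1}^{N-1} (s_n(r) - s_{n+1}(r)) \sum_{k=1}^{n} s_k(b) \\
&\leq s_N(r) \sum_{k=1}^{N} a_k + \sum_{n=1}^{N-1} (s_n(r) - s_{n+1}(r)) \sum_{k=1}^{n} a_k \\
&= \sum_{n=1}^{N} s_n(r) a_n
\end{aligned}
\]

and

\[ |\lambda b| \leq \mu s(b) = \sum_{n=1}^{N} s_n(r) s_n(b) \leq \sum_{n=1}^{N} s_n(r) a_n = \mu a = \lambda a^*. \]

Since $a^* \in S$ we have $|\lambda b| \leq \lambda a^* \leq 1$, which contradicts the fact that $\operatorname{Re} \lambda b > 1$. Therefore $b \in S$.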

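Theorem 5.156. Let $a, b \in \mathbb{K}^N$ such that $a_1 \geq \cdots \geq a_N \geq 0$, $b_1 \geq \cdots \geq b_N \geq 0$, and

\[ \prod_{k=1}^{n} a_k \leq \prod_{k=1}^{n} b_k \]

for $n = 1, \dots, N$. If $f : [0, \infty) \to \mathbb{R}$ is a continuous and increasing function such that $f \circ \exp$ is convex, then

\[ \sum_{k=1}^{n} f(a_k) \leq \sum_{k=1}^{n} f(b_k) \]

for $n = 1, \dots, N$.

Proof. First assume that $a_1, \dots, a_N \neq 0$ and $b_1, \dots, b_N \neq 0$. Choose some $r > 0$ such that $r a_n > 1$ and $r b_n > 1$ for $n = 1, \dots, N$. Define $\tilde{a}, \tilde{b} \in \mathbb{K}^N$ by $\tilde{a}_n = \log(r a_n)$ and $\tilde{b}_n = \log(r b_n)$. Then

\[ \sum_{k=1}^{n} \tilde{a}_k = \log\left( \prod_{k=1}^{n} r a_k \right) \leq \log\left( \prod_{k=1}^{n} r b_k \right) = \sum_{k=1}^{n} \tilde{b}_k \]

for $n = 1, \dots, N$, so Theorem 5.155 shows that $\tilde{a}$ lies in the convex hull of $\{c \in \mathbb{K}^N : s(c) = \tilde{b}\}$. Define $F_n : \mathbb{K}^N \to \mathbb{R}$ by $F_n(c) = \sum_{k=1}^{n} f(\exp(|c_k|) / r)$. If $c, d \in \mathbb{K}^N$ and $0 \leq \lambda \leq 1$,

\[
\begin{aligned}
F_n(\lambda c + (1 - \lambda) d) &= \sum_{k=1}^{n} f(\exp(|\lambda c_k + (1 - \lambda) d_k|) / r) \\
&\leq \sum_{k=1}^{n} f(\exp(\lambda |c_k| + (1 - \lambda) |d_k|) / r) \\
&\leq \lambda \sum_{k=1}^{n} f(\exp(|c_k|) / r) + (1 - \lambda) \sum_{k=1}^{n} f(\exp(|d_k|) / r) \\
&= \lambda F_n(c) + (1 - \lambda) F_n(d)
\end{aligned}
\]

because $x \mapsto f(\exp(x) / r) = f(\exp(x - \log r))$ is convex. This shows that $F_n$ is convex. Since $F_n(c) \leq F_n(s(c))$ for all $c \in \mathbb{K}^N$, we must have $F_n(\tilde{a}) \leq F_n(\tilde{b})$. Therefore

\[ \sum_{k=1}^{n} f(a_k) = F_n(\tilde{a}) \leq F_n(\tilde{b}) = \sum_{k=1}^{n} f(b_k). \]

The general case follows easily by setting $a_k = \varepsilon$ whenever $a_k = 0$ (and similarly for $b$) for sufficiently small $\varepsilon > 0$, and taking $\varepsilon \to 0$.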
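A small numerical check of Theorem 5.156 may help fix ideas. The sketch below takes $f(x) = x^p$ with $p > 0$, for which $f \circ \exp$ is convex; the vectors `a` and `b` are hypothetical examples chosen so that the products are ordered even though $a_2 > b_2$, and the helper names are ours.

```python
import math

def log_majorized(a, b, tol=1e-12):
    """Hypothesis of the theorem: prod_{k<=n} a_k <= prod_{k<=n} b_k for every n."""
    return all(
        math.prod(a[:n]) <= math.prod(b[:n]) + tol
        for n in range(1, len(a) + 1)
    )

def sums_dominated(a, b, f, tol=1e-12):
    """Conclusion of the theorem: sum_{k<=n} f(a_k) <= sum_{k<=n} f(b_k) for every n."""
    return all(
        sum(f(x) for x in a[:n]) <= sum(f(x) for x in b[:n]) + tol
        for n in range(1, len(a) + 1)
    )

a = [2.0, 2.0, 0.5]   # nonincreasing and nonnegative
b = [4.0, 1.0, 1.0]   # nonincreasing and nonnegative
powers = [0.5, 1, 2, 3]
results = [sums_dominated(a, b, lambda x, p=p: x ** p) for p in powers]
```

Every entry of `results` should be true. The example is not an entrywise comparison, since $a_2 = 2 > 1 = b_2$, so the conclusion genuinely uses the product hypothesis.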

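We say that a (possibly infinite) matrix $(a_{m,n})$ is doubly substochastic if $\sum_{n=1}^{\infty} |a_{m,n}| \leq 1$ for all $m$ and $\sum_{m=1}^{\infty} |a_{m,n}| \leq 1$ for all $n$.

Lemma 5.157. Let $x_1 \geq \cdots \geq x_n \geq 0$ and let $0 \leq a_1, \dots, a_n \leq 1$. If $\sum_{k=1}^{n} a_k \leq m$ for some integer $m$, then

\[ \sum_{k=1}^{n} a_k x_k \leq \sum_{k=1}^{m} x_k. \]

Proof. We have

\[
\begin{aligned}
\sum_{k=1}^{n} x_k a_k &= \sum_{k=1}^{n-1} (x_k - x_{k+1}) \sum_{j=1}^{k} a_j + x_n \sum_{j=1}^{n} a_j \\
&= \sum_{k=1}^{m} (x_k - x_{k+1}) \sum_{j=1}^{k} a_j + \sum_{k=m+1}^{n-1} (x_k - x_{k+1}) \sum_{j=1}^{k} a_j + x_n \sum_{j=1}^{n} a_j \\
&\leq \sum_{k=1}^{m} (x_k - x_{k+1}) k + \sum_{k=m+1}^{n-1} (x_k - x_{k+1}) m + x_n m \\
&= \sum_{k=1}^{m} x_k.
\end{aligned}
\]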
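Lemma 5.157 says that a weighted sum with weights in $[0, 1]$ of total mass at most $m$ is largest when the mass sits on the $m$ largest entries. The sketch below tests this on randomly generated hypothetical data, taking $m$ to be the smallest integer with $\sum_k a_k \leq m$; the helper name is ours.

```python
import math
import random

def lemma_holds(x, a, m, tol=1e-12):
    """Check sum a_k x_k <= x_1 + ... + x_m, assuming x is nonincreasing and
    nonnegative, 0 <= a_k <= 1, and sum(a) <= m."""
    return sum(ak * xk for ak, xk in zip(a, x)) <= sum(x[:m]) + tol

rng = random.Random(2)  # fixed seed so the check is reproducible
checks = []
for _ in range(200):
    n = rng.randint(1, 8)
    x = sorted((rng.uniform(0, 5) for _ in range(n)), reverse=True)
    a = [rng.uniform(0, 1) for _ in range(n)]
    m = math.ceil(sum(a))          # smallest admissible integer m
    checks.append(lemma_holds(x, a, m))
all_ok = all(checks)
```

Since each $a_k \leq 1$, the integer $m$ chosen here never exceeds $n$, so `x[:m]` always refers to existing entries.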



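Theorem 5.158 (Properties of doubly substochastic matrices).

1. If $\{u_n\}$, $\{v_n\}$, $\{w_n\}$, $\{x_n\}$ are orthonormal sets in $E$, then

\[ (\langle u_n, v_m \rangle \langle w_m, x_n \rangle)_{m,n} \]

is a doubly substochastic matrix.

2. If $A = (a_{m,n})$ is a doubly substochastic $N \times N$ matrix and $b \in \mathbb{K}^N$, then $Ab \in \mathbb{K}^N$ lies in the convex hull of $\{c \in \mathbb{K}^N : s(c) = s(b)\}$.

Proof. For (1), the Cauchy-Schwarz inequality and Bessel's inequality (see Theorem 1.89) give

\[
\begin{aligned}
\sum_{n=1}^{\infty} |\langle u_n, v_m \rangle \langle w_m, x_n \rangle| &\leq \left( \sum_{n=1}^{\infty} |\langle u_n, v_m \rangle|^2 \right)^{1/2} \left( \sum_{n=1}^{\infty} |\langle w_m, x_n \rangle|^2 \right)^{1/2} \\
&\leq |v_m|\,|w_m| \\
&= 1.
\end{aligned}
\]

The sum over $m$ is bounded in the same way. For (2), let $c = Ab$. By permuting the rows and columns of $A$, we can assume that $|b_1| \geq \cdots \geq |b_N|$ and $|c_1| \geq \cdots \geq |c_N|$. If $1 \leq n \leq N$, then

\[ \sum_{k=1}^{n} s_k(c) = \sum_{k=1}^{n} |c_k| \leq \sum_{k=1}^{n} \sum_{j=1}^{N} |a_{k,j}| |b_j| = \sum_{j=1}^{N} r_j |b_j| \]

where $r_j = \sum_{k=1}^{n} |a_{k,j}|$. Since $0 \leq r_j \leq 1$ and $\sum_{j=1}^{N} r_j = \sum_{j=1}^{N} \sum_{k=1}^{n} |a_{k,j}| \leq n$, Lemma 5.157 implies that

\[ \sum_{k=1}^{n} s_k(c) \leq \sum_{j=1}^{n} |b_j|, \]

and the result follows from Theorem 5.155.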
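Part (2) can be illustrated on a small example. By Theorem 5.155, membership of $Ab$ in the convex hull is equivalent to the partial-sum inequalities $\sum_{k \leq n} s_k(Ab) \leq \sum_{k \leq n} s_k(b)$, which is what the sketch below checks. The matrix `A` is a hypothetical signed combination of two permutation matrices, and the helper names are ours.

```python
def s(a):
    """Decreasing rearrangement of (|a_n|)."""
    return sorted((abs(x) for x in a), reverse=True)

def is_doubly_substochastic(A, tol=1e-12):
    """Row and column sums of absolute values are all at most 1."""
    rows = all(sum(abs(x) for x in row) <= 1 + tol for row in A)
    cols = all(sum(abs(row[j]) for row in A) <= 1 + tol for j in range(len(A[0])))
    return rows and cols

def apply(A, b):
    """Matrix-vector product Ab."""
    return [sum(A[i][j] * b[j] for j in range(len(b))) for i in range(len(A))]

def weakly_majorized(c, b, tol=1e-12):
    """Partial sums of s(c) are dominated by partial sums of s(b)."""
    sc, sb = s(c), s(b)
    return all(sum(sc[:n]) <= sum(sb[:n]) + tol for n in range(1, len(b) + 1))

N = 4
# 0.5 * (cyclic shift) - 0.3 * (reversal): every row and column has absolute sum <= 0.8.
A = [[(0.5 if j == (i + 1) % N else 0.0) - (0.3 if j == N - 1 - i else 0.0)
      for j in range(N)] for i in range(N)]
b = [3.0, -1.0, 0.5, 2.0]
c = apply(A, b)
```

With these values `is_doubly_substochastic(A)` and `weakly_majorized(c, b)` should both hold.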

Symmetric norms on Calkin spaces

Let $E$ be a separable Hilbert space and let $i$ be a Calkin space. A norm $\|\cdot\|_*$ on $i$ is symmetric if $\|a\|_* = \|s(a)\|_*$ for all $a \in i$. Equivalently, $\|\cdot\|_*$ is symmetric if $\|ra\|_* = \|a\|_*$ and $\|a \circ \sigma\|_* = \|a\|_*$ whenever $a \in i$, $r \in \ell^\infty$ satisfies $|r_n| = 1$ for all $n$, and $\sigma$ is a permutation of $\mathbb{Z}_+$. For example, the norms $\|\cdot\|_\infty$ on $c_0$ and $\|\cdot\|_p$ on $\ell^p$ (for $1 \leq p < \infty$) are symmetric.

Let $\|\cdot\|_*$ be a symmetric norm on $c_{00}$, and let $\rho_n \in L(\ell^\infty)$ be the projection onto $\mathbb{K}^n$, so that $\rho_n a = (a_1, \dots, a_n, 0, 0, \dots)$ for $a \in \ell^\infty$. The maximal space $i_*$ of $\|\cdot\|_*$ is the vector space consisting of all $a \in \ell^\infty$ such that

\[ \|\rho_1 a\|_*, \|\rho_2 a\|_*, \dots \]

is bounded. If we define $\|a\|_* = \sup_{n \geq 1} \|\rho_n a\|_*$ for $a \in i_*$, then $\|\cdot\|_*$ is a norm on $i_*$. The minimal space $i_*^{(0)}$ of $\|\cdot\|_*$ is the closure of $c_{00}$ in $i_*$ (under $\|\cdot\|_*$, not $\|\cdot\|_\infty$). For example, the maximal space of $\|\cdot\|_p$ is $\ell^p$, and the minimal space is identical to the maximal space for $1 \leq p < \infty$.

Theorem 5.159 (Properties of symmetric norms). Let $\|\cdot\|_*$ be a symmetric norm on $c_{00}$, extended to its maximal space $i_*$.

1. $i_* \cap c_0$ is a Calkin space.

2. $\|s(a)\|_* = \|a\|_*$ for all $a \in i_* \cap c_0$.

3. $\|a\|_* \leq \|b\|_*$ for all $a, b \in i_* \cap c_0$ such that $\sum_{k=1}^{n} s_k(a) \leq \sum_{k=1}^{n} s_k(b)$ for all $n$. In particular, this holds when $s(a) \leq s(b)$.

4. $i_*$ is complete.

5. $\|e_1\|_* \|a\|_\infty \leq \|a\|_*$ for all $a \in i_*$ and $\|a\|_* \leq \|e_1\|_* \|a\|_1$ for all $a \in i_* \cap \ell^1$, where $e_1 = (1, 0, 0, \dots)$.

6. If $A = (a_{m,n})$ is a doubly substochastic infinite matrix and $b \in i_*$, then $Ab \in i_*$ and $\|Ab\|_* \leq \|b\|_*$.

7. If $i_* \setminus c_0$ is nonempty then $\|\cdot\|_*$ is equivalent to $\|\cdot\|_\infty$.

Proof. For (2) and (3), first assume that $a, b \in i_* \cap c_{00}$ and choose $n$ so that $a_k = 0$ and $b_k = 0$ for $k > n$. By Theorem 5.155, $a$ lies in the convex hull of $\{c \in \mathbb{K}^n : s(c) = s(b)\}$. Since $\|\cdot\|_*$ (restricted to $\mathbb{K}^n$) is convex, we must have $\|a\|_* \leq \|s(b)\|_* = \|b\|_*$. For the general case we have $\|\rho_n s(a)\|_* \leq \|\rho_n s(b)\|_*$ for all $n$, so $\|s(a)\|_* \leq \|s(b)\|_*$ by definition. If we can prove (2), then (3) follows immediately. Since $\rho_n a \in i_* \cap c_{00}$ and $s(\rho_n a) \leq \rho_n s(a)$,

\[ \|\rho_n a\|_* \leq \|s(\rho_n a)\|_* \leq \|\rho_n s(a)\|_* \leq \|s(a)\|_*. \]

This holds for all $n$, so $\|a\|_* \leq \|s(a)\|_*$. For each $n$, we can choose some $m$ such that

\[ \{s_1(a), \dots, s_n(a)\} \subseteq \{|a_1|, \dots, |a_m|\}. \]

Then $s(\rho_n s(a)) \leq s(\rho_m a)$, so $\|\rho_n s(a)\|_* \leq \|\rho_m a\|_* \leq \|a\|_*$. This holds for all $n$, so $\|s(a)\|_* \leq \|a\|_*$, and this proves (2). (1) follows from (2) and (3).



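For (5), applying (3) gives

\[ |a_n| \|e_1\|_* = \|(0, \dots, 0, a_n, 0, \dots)\|_* \leq \|\rho_n a\|_* \leq \|a\|_* \]

for all $n$. Also, if $a \in i_* \cap \ell^1$ then

\[ \|a\|_* \leq \sum_{n=1}^{\infty} \|(0, \dots, 0, a_n, 0, \dots)\|_* = \|e_1\|_* \sum_{n=1}^{\infty} |a_n| = \|e_1\|_* \|a\|_1 \]

using the triangle inequality.

For (4), let $\{a^{(m)}\}$ be a Cauchy sequence in $i_*$. (5) implies that $\{a^{(m)}\}$ is $L^\infty$-Cauchy, so $a^{(m)} \to a$ in $L^\infty$ for some $a \in \ell^\infty$. By Theorem 1.39, the restrictions of $\|\cdot\|_\infty$ and $\|\cdot\|_*$ to $\mathbb{K}^n$ are equivalent. Since $\|\rho_n(a^{(m)} - a)\|_\infty \to 0$, we must have $\|\rho_n(a^{(m)} - a)\|_* \to 0$. For each $m$ we have

\[ \big| \|\rho_n(a^{(m)} - a)\|_* - \|\rho_n(a^{(m)} - a^{(k)})\|_* \big| \leq \|\rho_n(a^{(k)} - a)\|_* \to 0 \]

as $k \to \infty$, so

\[ \|\rho_n(a^{(m)} - a)\|_* = \lim_{k \to \infty} \|\rho_n(a^{(m)} - a^{(k)})\|_* \leq \lim_{k \to \infty} \|a^{(m)} - a^{(k)}\|_*. \]

This holds for all $n$, so $a^{(m)} - a \in i_*$ and

\[ \|a^{(m)} - a\|_* \leq \lim_{k \to \infty} \|a^{(m)} - a^{(k)}\|_* \to 0 \]

as $m \to \infty$. Note that $a = a^{(1)} - (a^{(1)} - a) \in i_*$.

For (6), Theorem 5.158 implies that for all $n$ we have

\[ \sum_{k=1}^{m} s_k(\rho_n Ab) \leq \sum_{k=1}^{m} s_k(\rho_n b) \]

for $m = 1, \dots, n$, so (3) implies that $\|\rho_n Ab\|_* \leq \|\rho_n b\|_* \leq \|b\|_*$. This holds for all $n$, so $\|Ab\|_* \leq \|b\|_*$.

For (7), choose some $b \in i_* \setminus c_0$, and choose some $\varepsilon > 0$ such that $|b_n| \geq \varepsilon$ for infinitely many $n$. It is easy to show that $(\varepsilon, \varepsilon, \dots) \in i_*$, so $j = (1, 1, \dots) \in i_*$. If $a \in i_*$ then $s(\rho_n a) \leq s(\|a\|_\infty \rho_n j)$, so $\|\rho_n a\|_* \leq \|\rho_n j\|_* \|a\|_\infty \leq \|j\|_* \|a\|_\infty$. This holds for all $n$, so $\|a\|_* \leq \|j\|_* \|a\|_\infty$. Together with (5), this shows that $\|\cdot\|_*$ and $\|\cdot\|_\infty$ are equivalent norms.

If $i \cap c_0$ is a Calkin space, we write $I(i) = \{\tau \in L_0(E) : s(\tau) \in i\}$ as in Theorem 5.151, and if $\|\cdot\|_*$ is a symmetric norm on $i$, we write $|\tau|_* = \|s(\tau)\|_*$ for $\tau \in I(i)$. We say that $|\cdot|_*$ is induced by $\|\cdot\|_*$.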



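Lemma 5.160. Let $\|\cdot\|_*$ be a symmetric norm on $c_{00}$, and let $O$ be the collection of all orthonormal sets in $E$. If $\tau \in I(i_*)$ then $(\langle \tau u_n, v_n \rangle) \in i_*$ for all $\{u_n\}, \{v_n\} \in O$, and

\[ |\tau|_* = \sup_{\{u_n\}, \{v_n\} \in O} \|(\langle \tau u_n, v_n \rangle)\|_*. \]

If $\tau \in L_0(E)$ and $(\langle \tau u_n, v_n \rangle) \in i_*$ for all $\{u_n\}, \{v_n\} \in O$, then $\tau \in I(i_*)$ and the above equality holds.

Proof. Let $\tau \in L_0(E)$ and let $\tau = \sum_{n=1}^{\infty} s_n(\tau) x_n w_n^*$ be a singular value decomposition of $\tau$ (see Theorem 5.142). Suppose that $\tau \in I(i_*)$. Since

\[ \langle \tau u_n, v_n \rangle = \sum_{m=1}^{\infty} \langle x_m, v_n \rangle \langle u_n, w_m \rangle \, s_m(\tau), \]

Theorem 5.158 and Theorem 5.159 imply that $(\langle \tau u_n, v_n \rangle) \in i_*$ and $\|(\langle \tau u_n, v_n \rangle)\|_* \leq |\tau|_*$ for all $\{u_n\}, \{v_n\} \in O$. Therefore $\sup \cdots \leq |\tau|_*$. If $(\langle \tau u_n, v_n \rangle) \in i_*$ for all $\{u_n\}, \{v_n\} \in O$ then

\[ (s_n(\tau)) = (\langle \tau w_n, x_n \rangle) \in i_*, \]

so $\tau \in I(i_*)$ and $|\tau|_* \leq \sup \cdots$.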

Theorem 5.161 (Properties of the induced norm). Let ‖·‖∗ be a symmetric norm on c00.

1. |·|∗ is a norm on I(i∗).

2. If τ ∈ I(i∗) then τ∗ ∈ I(i∗) and |τ∗|∗ = |τ|∗.

3. If τ ∈ I(i∗) and ψ, ϕ ∈ L(E), then ψτϕ ∈ I(i∗) and |ψτϕ|∗ ≤ |ψ| |τ|∗ |ϕ|.

4. ‖e1‖∗ |τ| ≤ |τ|∗ for all τ ∈ I(i∗), where e1 = (1, 0, 0, . . . ).

5. I(i∗) is complete.

6. The closure of L00(E) in I(i∗) is I(i∗^(0)).

Proof. Let O be the collection of all orthonormal sets in E. (1) follows from Lemma 5.160 and the fact that

\[ \|(\langle(\sigma+\tau)u_n, v_n\rangle)\|_* = \|(\langle\sigma u_n, v_n\rangle) + (\langle\tau u_n, v_n\rangle)\|_* \le \|(\langle\sigma u_n, v_n\rangle)\|_* + \|(\langle\tau u_n, v_n\rangle)\|_* \]

for all {un}, {vn} ∈ O. (2) follows from the fact that s(τ∗) = s(τ). For (3), Corollary 5.145 shows that s(ψτϕ) ≤ |ψ| |ϕ| s(τ). Therefore s(ψτϕ) ∈ i∗ and ψτϕ ∈ I(i∗). Also, Theorem 5.159 shows that

|ψτϕ|∗ = ‖s(ψτϕ)‖∗ ≤ |ψ| |ϕ| ‖s(τ)‖∗ = |ψ| |ϕ| |τ |∗ .

For (4), Theorem 5.159 shows that

‖e1‖∗ |τ | = ‖e1‖∗ ‖s(τ)‖∞ ≤ ‖s(τ)‖∗ = |τ |∗ .

For (5), let τm be a Cauchy sequence in I(i∗). (4) implies that τm is uniformly Cauchy, so τm → τ for some τ ∈ L(E). Let {un}, {vn} ∈ O and let k > 0. By Theorem 1.39, the restrictions of ‖·‖∞ and ‖·‖∗ to Kk are equivalent. Since ‖ρk(〈(τj − τ)un, vn〉)‖∞ → 0 as j → ∞, we must have ‖ρk(〈(τj − τ)un, vn〉)‖∗ → 0 and

\[ \|\rho_k(\langle(\tau_m - \tau_j)u_n, v_n\rangle)\|_* \to \|\rho_k(\langle(\tau_m - \tau)u_n, v_n\rangle)\|_* \]

as j → ∞. By Lemma 5.160 we have

\[ \|\rho_k(\langle(\tau_m - \tau_j)u_n, v_n\rangle)\|_* \le \|(\langle(\tau_m - \tau_j)u_n, v_n\rangle)\|_* \le |\tau_m - \tau_j|_* , \]

so taking j → ∞ shows that

\[ \|\rho_k(\langle(\tau_m - \tau)u_n, v_n\rangle)\|_* \le \lim_{j\to\infty} |\tau_m - \tau_j|_* . \]

This holds for all k, so

\[ \|(\langle(\tau_m - \tau)u_n, v_n\rangle)\|_* \le \lim_{j\to\infty} |\tau_m - \tau_j|_* . \]

This holds for all {un} and {vn}, so Lemma 5.160 implies that

\[ |\tau_m - \tau|_* \le \lim_{j\to\infty} |\tau_m - \tau_j|_* \to 0 \]

as m → ∞.

For (6), let τ ∈ L0(E) and let τ = ∑_{n=1}^∞ sn(τ) xn wn∗ be a singular value decomposition of τ. If τ is in the closure of L00(E) in I(i∗), then Lemma 5.160 implies that (〈τun, vn〉) ∈ i∗^(0) for all {un}, {vn} ∈ O. In particular, s(τ) = (〈τwn, xn〉) ∈ i∗^(0), so τ ∈ I(i∗^(0)). Conversely, suppose that τ ∈ I(i∗^(0)) and choose a sequence a(m) in c00 such that a(m) → s(τ) in i∗. Then

\[ \Bigl| \sum_{n=1}^{\infty} a^{(m)}_n x_n w_n^* - \tau \Bigr|_* = \Bigl| \sum_{n=1}^{\infty} \bigl(a^{(m)}_n - s_n(\tau)\bigr) x_n w_n^* \Bigr|_* = \|a^{(m)} - s(\tau)\|_* \to 0. \]

Each ∑_{n=1}^∞ a(m)n xn wn∗ is in L00(E) because a(m) ∈ c00, so τ is in the closure of L00(E) in I(i∗).

Corollary 5.162. If ‖·‖∗ is a symmetric norm on c00 then I(i∗) is a *-ideal of L(E). If I is a proper ideal of L(E) and |·|I is a norm on I satisfying (3) in Theorem 5.161, then there is a symmetric norm ‖·‖∗ on c00 such that |·|I agrees with |·|∗ on L00(E), and I ⊆ I(i∗).

Proof. If ‖·‖∗ is a symmetric norm on c00 then Theorem 5.161 shows that I(i∗) is a *-ideal of L(E). For the second statement, (3) and Theorem 5.138 imply that there is a symmetric norm ‖·‖∗ on c00 such that |τ|I = ‖s(τ)‖∗ for all τ ∈ L00(E). Let τ ∈ I, let τ = ∑_{k=1}^∞ sk(τ) vk uk∗ be a singular value decomposition of τ, and let σn = ∑_{k=1}^n vk vk∗. Using (3),

‖ρns(τ)‖∗ = ‖s(σnτ)‖∗ = |σnτ |I ≤ |τ |I

for all n. Therefore s(τ) ∈ i∗ and τ ∈ I(i∗).

Theorem 5.163 (Hölder's inequality, abstract version). Let ‖·‖A, ‖·‖B, ‖·‖C be symmetric norms on c00, and let iA, iB, iC be their maximal spaces. Suppose that a ∈ iA and b ∈ iB implies ab ∈ iC and ‖ab‖C ≤ ‖a‖A ‖b‖B. Then σ ∈ I(iA) and τ ∈ I(iB) implies στ ∈ I(iC) and |στ|C ≤ |σ|A |τ|B.

Proof. Theorem 5.148 and Theorem 5.156 give

\[ \sum_{k=1}^{n} s_k(\sigma\tau) \le \sum_{k=1}^{n} s_k(\sigma) s_k(\tau) \]

for all n, so Theorem 5.155 implies that

‖ρns(στ)‖C ≤ ‖ρns(σ)s(τ)‖C ≤ ‖ρns(σ)‖A ‖ρns(τ)‖B ≤ ‖s(σ)‖A ‖s(τ)‖B .

This holds for all n, so s(στ) ∈ iC , στ ∈ I(iC) and

|στ |C = ‖s(στ)‖C ≤ ‖s(σ)‖A ‖s(τ)‖B = |σ|A |τ |B .

Schatten class operators

Let E be a separable Hilbert space. Recall that the maximal space of ‖·‖p is ℓp, for 1 ≤ p ≤ ∞. For p < ∞, we define the pth Schatten class Sp(E) to be I(ℓp), and we define S∞(E) = L(E). The norm |·|p on Sp(E) is called the pth Schatten norm, and satisfies

\[ |\tau|_p = \|s(\tau)\|_p = \Bigl( \sum_{n=1}^{\infty} s_n(\tau)^p \Bigr)^{1/p} \]

for p < ∞.

If τ ∈ S1(E) then we say that τ is a trace class operator, and if τ ∈ S2(E) then we say that τ is a Hilbert-Schmidt operator.
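The Schatten norms can be illustrated numerically in the simplest finite-dimensional case. The following is a minimal sketch, assuming a real 2×2 matrix, for which the singular values follow in closed form from s1² + s2² = (sum of squared entries) and s1·s2 = |det|. The helper names `singular_values_2x2` and `schatten_norm` and the sample matrix are hypothetical choices made for illustration only.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values s1 >= s2 >= 0 of the real matrix [[a, b], [c, d]]."""
    t = a * a + b * b + c * c + d * d       # s1^2 + s2^2
    det = abs(a * d - b * c)                # s1 * s2
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    s1 = math.sqrt((t + disc) / 2)
    s2 = math.sqrt(max((t - disc) / 2, 0.0))
    return s1, s2

def schatten_norm(svals, p):
    """p-th Schatten norm computed from a finite list of singular values."""
    if p == math.inf:
        return max(svals)                   # operator norm
    return sum(s ** p for s in svals) ** (1.0 / p)

s = singular_values_2x2(1.0, 2.0, 3.0, 4.0)
# Trace norm, Hilbert-Schmidt norm, 4th Schatten norm, operator norm.
norms = [schatten_norm(s, p) for p in (1, 2, 4, math.inf)]
```

In this example the squared Hilbert-Schmidt norm equals the sum of the squared matrix entries, and the norms are nonincreasing in p, in line with the monotonicity |τ|q ≤ |τ|p for p ≤ q.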

Theorem 5.164 (Properties of Schatten classes). Let σ, τ, ϕ, ψ ∈ L(E) and let 1 ≤ p, q ≤ ∞.

1. If τ ∈ Sp(E) then τ∗ ∈ Sp(E) and |τ∗|p = |τ |p.

2. If τ ∈ Sp(E) then ψτϕ ∈ Sp(E) and |ψτϕ|p ≤ |ψ| |τ |p |ϕ|.

3. If ϕ and ψ are unitary and τ ∈ Sp(E), then ψτϕ ∈ Sp(E) and |ψτϕ|p = |τ |p.

4. If p and q are conjugate exponents, σ ∈ Sp(E), and τ ∈ Sq(E), then στ ∈ S1(E) and |στ|1 ≤ |σ|p |τ|q.

5. If p ≤ q and τ ∈ Sp(E) then τ ∈ Sq(E) and |τ |q ≤ |τ |p.

6. Sp(E) is complete.

7. If p <∞ then L00(E) is dense in Sp(E) (under the Schatten norm).

Proof. (1), (2) and (3) follow from Theorem 5.161. (4) follows from Theorem 4.81 and Theorem 5.163. (5) follows from Theorem 5.135. (6) and (7) follow from Theorem 5.161.

Theorem 5.165. Let τ ∈ L(E). The following are equivalent:

1. τ ∈ S2(E).

2. The series ∑_{n=1}^∞ |τun|² converges for all Hilbert bases {un} for E.

3. The series ∑_{m,n≥1} |〈τum, vn〉|² converges for all Hilbert bases {un}, {vn} for E.

4. The series ∑_{n=1}^∞ |τun|² converges for some Hilbert basis {un} for E.

5. The series ∑_{m,n≥1} |〈τum, vn〉|² converges for some Hilbert bases {un}, {vn} for E.

In that case,

\[ |\tau|_2^2 = \sum_{n=1}^{\infty} |\tau u_n|^2 = \sum_{m,n\ge 1} |\langle \tau u_m, v_n\rangle|^2 . \]

Proof. Suppose that (1) holds and let τ = ∑_{n=1}^∞ sn(τ) xn wn∗ be a singular value decomposition of τ. Then

\[ |\tau u_n|^2 = \sum_{m=1}^{\infty} s_m(\tau)^2 |\langle u_n, w_m\rangle|^2 , \]

so

\[ \sum_{n=1}^{\infty} |\tau u_n|^2 = \sum_{m=1}^{\infty} s_m(\tau)^2 \sum_{n=1}^{\infty} |\langle u_n, w_m\rangle|^2 = \sum_{m=1}^{\infty} s_m(\tau)^2 . \]

This proves (1)⇒ (2). If (2) holds then

\[ \sum_{n=1}^{\infty} |\tau u_n|^2 = \sum_{n=1}^{\infty} \Bigl| \sum_{m=1}^{\infty} \langle \tau u_n, v_m\rangle v_m \Bigr|^2 = \sum_{m,n\ge 1} |\langle \tau u_n, v_m\rangle|^2 , \]

which proves (2) ⇒ (3). Suppose that (3) holds. By Theorem 5.104, we can write τ∗ = ν(ττ∗)^{1/2} for some partial isometry ν ∈ L(E) such that ker ν = ker τ∗. Since

\[ \operatorname{im}(\tau\tau^*)^{1/2} \subseteq (\ker \tau^*)^{\perp} = (\ker \nu)^{\perp} \]

and ν|(ker ν)⊥ is an isometry, Theorem 1.101 shows that

\[ \tau\nu = (\nu^*\tau^*)^* = (\nu^*\nu(\tau\tau^*)^{1/2})^* = (\tau\tau^*)^{1/2} . \]

Let {un} be an orthonormal set in (ker τ∗)⊥, and choose a Hilbert basis O for E that contains {un} (see Theorem 1.91). Since {νun} is an orthonormal set, we can choose a Hilbert basis O′ for E that contains {νun}. Then

\[ \sum_{u\in O,\, v\in O'} |\langle \tau v, u\rangle|^2 \]

converges,

\[ \sum_{n=1}^{\infty} |\langle \tau\nu u_n, u_n\rangle|^2 = \sum_{n=1}^{\infty} |\langle (\tau\tau^*)^{1/2} u_n, u_n\rangle|^2 = \sum_{n=1}^{\infty} |(\tau\tau^*)^{1/4} u_n|^4 , \]

and |(ττ∗)^{1/4} un| → 0. By Theorem 5.133, (ττ∗)^{1/4} is compact, and Theorem 5.131 and Corollary 5.120 show that τ is compact. (1) follows from Lemma 5.160, and this proves (3) ⇒ (1).

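It remains to prove (4) ⇒ (1). Let {vn} be a Hilbert basis for E. Then

\[ \sum_{n=1}^{\infty} |\tau u_n|^2 = \sum_{m,n\ge 1} |\langle \tau u_n, v_m\rangle|^2 = \sum_{m,n\ge 1} |\langle \tau^* v_n, u_m\rangle|^2 = \sum_{n=1}^{\infty} |\tau^* v_n|^2 , \]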

so τ∗ ∈ S2(E) by (2) and τ ∈ S2(E) by Theorem 5.164.
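The basis independence of ∑ |τun|² in Theorem 5.165 can be checked by hand in two dimensions. The sketch below assumes E = R² with the standard inner product, a sample matrix, and an arbitrarily chosen rotation angle; `apply` and `hs_sum` are hypothetical helper names.

```python
import math

def apply(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def hs_sum(A, basis):
    """Sum of |A u|^2 over the vectors u of an orthonormal basis."""
    return sum(sum(x * x for x in apply(A, u)) for u in basis)

A = [[1.0, 2.0], [3.0, 4.0]]
standard = [[1.0, 0.0], [0.0, 1.0]]
t = 0.7  # arbitrary rotation angle (an assumption of this example)
rotated = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]

hs_standard = hs_sum(A, standard)
hs_rotated = hs_sum(A, rotated)
```

Both sums agree with the sum of the squared matrix entries, which is the squared Hilbert-Schmidt norm of A.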

Corollary 5.166. Let τ ∈ L(E) be a positive operator and let 1 ≤ p < ∞. Let {un} be a Hilbert basis for E. Then τ ∈ Sp(E) if and only if

\[ \sum_{n=1}^{\infty} \langle \tau^p u_n, u_n\rangle \]

converges. In that case,

\[ |\tau|_p^p = \sum_{n=1}^{\infty} \langle \tau^p u_n, u_n\rangle . \]

Proof. Apply Theorem 5.165 to τ^{p/2}.

Trace and determinant

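Let τ ∈ S1(E). We define the trace of τ by

\[ \operatorname{tr} \tau = \sum_{n=1}^{\infty} \langle \tau u_n, u_n\rangle , \]

where {un} is a Hilbert basis for E. To check that this is well-defined, let {vn} be another Hilbert basis for E and let τ = ∑_{n=1}^∞ sn(τ) xn wn∗ be a singular value decomposition of τ. Then

\[ \sum_{n=1}^{\infty} \langle \tau u_n, u_n\rangle = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} s_m(\tau) \langle x_m, u_n\rangle \langle u_n, w_m\rangle , \]

which converges absolutely due to Theorem 5.158 and Theorem 5.159. Therefore

\[ \sum_{n=1}^{\infty} \langle \tau u_n, u_n\rangle = \sum_{m=1}^{\infty} s_m(\tau) \Bigl\langle \sum_{n=1}^{\infty} \langle x_m, u_n\rangle u_n, w_m \Bigr\rangle = \sum_{m=1}^{\infty} s_m(\tau) \langle x_m, w_m\rangle . \]

A similar argument shows that

\[ \sum_{n=1}^{\infty} \langle \tau v_n, v_n\rangle = \sum_{m=1}^{\infty} s_m(\tau) \langle x_m, w_m\rangle , \]

so ∑_{n=1}^∞ 〈τun, un〉 = ∑_{n=1}^∞ 〈τvn, vn〉.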

Theorem 5.167 (Properties of trace). Let τ ∈ S1(E) and σ ∈ L(E).

1. tr : S1(E)→ K is a continuous linear map satisfying |tr τ | ≤ |τ |1.

2. tr(στ) = tr(τσ).

3. tr τ∗ = \overline{tr τ}.

4. If τ is self-adjoint then tr τ is real.

5. If τ ≥ 0 then tr τ = |τ |1.

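Proof. The linearity of tr is clear. If τ = ∑_{n=1}^∞ sn(τ) xn wn∗ is a singular value decomposition of τ then

\[ |\operatorname{tr} \tau| = \Bigl| \sum_{n=1}^{\infty} s_n(\tau) \langle x_n, w_n\rangle \Bigr| \le \sum_{n=1}^{\infty} s_n(\tau) = |\tau|_1 . \]

This proves (1). For (2), choose Hilbert bases O and O′ for E such that O contains {wn} and O′ contains {xn}. Then

\[ \begin{aligned}
\operatorname{tr}(\sigma\tau) &= \sum_{w\in O} \langle \sigma\tau w, w\rangle = \sum_{w\in O} \Bigl\langle \sigma \sum_{n=1}^{\infty} s_n(\tau) x_n \langle w, w_n\rangle , w \Bigr\rangle \\
&= \sum_{n=1}^{\infty} s_n(\tau) \langle \sigma x_n, w_n\rangle = \sum_{n=1}^{\infty} s_n(\tau) \Bigl\langle \sigma \sum_{x\in O'} \langle x_n, x\rangle x, w_n \Bigr\rangle \\
&= \sum_{x\in O'} \sum_{n=1}^{\infty} s_n(\tau) \langle x_n, x\rangle \langle \sigma x, w_n\rangle = \sum_{x\in O'} \langle \tau\sigma x, x\rangle .
\end{aligned} \]

For (3), if {un} is a Hilbert basis for E then

\[ \operatorname{tr} \tau^* = \sum_{n=1}^{\infty} \langle \tau^* u_n, u_n\rangle = \sum_{n=1}^{\infty} \langle u_n, \tau u_n\rangle = \sum_{n=1}^{\infty} \overline{\langle \tau u_n, u_n\rangle} = \overline{\operatorname{tr} \tau} . \]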

Suppose that E is n-dimensional and τ ∈ L(E) is diagonalizable with respect to {u1, . . . , un}, and let λi be the eigenvalue corresponding to ui. For each k, Theorem 1.124 shows that Λ^kτ is diagonalizable with respect to

\[ \{ u_{i_1} \wedge \cdots \wedge u_{i_k} : i_1 < \cdots < i_k \} , \]

and the eigenvalue corresponding to u_{i1} ∧ · · · ∧ u_{ik} is λ_{i1} · · · λ_{ik}. Therefore

\[ \operatorname{tr} \Lambda^k \tau = \sum_{i_1 < \cdots < i_k} \lambda_{i_1} \cdots \lambda_{i_k} , \]

and

\[ \det(\mathrm{Id}_E + \tau) = \prod_{k=1}^{n} (1 + \lambda_k) = \sum_{k=0}^{n} \operatorname{tr} \Lambda^k \tau . \]

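Now let E be infinite-dimensional and let τ ∈ S1(E). Since (τ∗τ)^{1/2} ∈ S1(E), Theorem 1.127 shows that (Λ^kτ∗ Λ^kτ)^{1/2} = Λ^k(τ∗τ)^{1/2}. This allows us to apply Theorem 1.124, which gives

\[ |\Lambda^k \tau|_1 = \sum_{i_1 < \cdots < i_k} s_{i_1}(\tau) \cdots s_{i_k}(\tau) \le \frac{1}{k!} \sum_{i_1, \dots, i_k \ge 1} s_{i_1}(\tau) \cdots s_{i_k}(\tau) = \frac{1}{k!} \Bigl( \sum_{i=1}^{\infty} s_i(\tau) \Bigr)^{k} = \frac{1}{k!} |\tau|_1^k \]

and shows that Λ^kτ ∈ S1(Λ^kE). We therefore define the determinant of IdE + τ by

\[ \det(\mathrm{Id}_E + \tau) = \sum_{k=0}^{\infty} \operatorname{tr} \Lambda^k \tau , \]

noting that

\[ |\det(\mathrm{Id}_E + \tau)| \le \sum_{k=0}^{\infty} |\operatorname{tr} \Lambda^k \tau| \le \sum_{k=0}^{\infty} \frac{1}{k!} |\tau|_1^k = \exp(|\tau|_1) . \]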
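For a diagonal operator on a finite-dimensional space, tr Λ^kτ is the k-th elementary symmetric polynomial in the eigenvalues, so the expansion of the determinant and the bound by exp(|τ|1) can be verified directly. The sketch below assumes a diagonal τ with the sample eigenvalues listed (so its singular values are |λi|); `elementary_symmetric` is a hypothetical helper name.

```python
import math
from itertools import combinations

def elementary_symmetric(lams, k):
    """Sum of lam_{i1} * ... * lam_{ik} over i1 < ... < ik (tr of Lambda^k for diagonal tau)."""
    return sum(math.prod(c) for c in combinations(lams, k))

lams = [0.5, -0.25, 2.0, 0.1]        # sample eigenvalues (an assumption)
n = len(lams)

det_product = math.prod(1 + lam for lam in lams)
det_series = sum(elementary_symmetric(lams, k) for k in range(n + 1))
trace_norm = sum(abs(lam) for lam in lams)   # |tau|_1 for a diagonal tau
```

Here `det_product` and `det_series` agree up to rounding, each term of the series is bounded by |τ|1^k / k!, and the determinant is bounded by exp(|τ|1).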

Theorem 5.168 (Properties of determinant). Let E be a separable complex Hilbert space and let σ, τ ∈ S1(E).

1. |det(IdE + σ) − det(IdE + τ)| ≤ exp(1 + |σ|1 + |τ|1) |σ − τ|1.

2. det((IdE + σ)(IdE + τ)) = det(IdE + σ) det(IdE + τ).

3. IdE + τ is invertible if and only if det(IdE + τ) ≠ 0.

4. z ↦ det(IdE + zτ) is an entire function.

5. If λ ≠ 0 is an eigenvalue of τ with multiplicity n, then ord_{z=−λ^{−1}} det(IdE + zτ) = n.

Proof. TODO

5.9 Fourier analysis

A locally compact abelian (LCA) group is an abelian locally compact Hausdorff (LCH) group. For LCA groups we will use additive rather than multiplicative notation. Recall from Section 4.10 that every LCH group G has a Haar measure m, defined on the Borel sets, that is unique up to a multiplicative constant. Usually, we will choose the constant so that m(G) = 1 if G is compact, and m({x}) = 1 for all x ∈ G if G is discrete. If G is finite (both discrete and compact), we will specify the normalization constant. We will write ℒ^p(G,F) = ℒ^p(G,m,F) and L^p(G,F) = L^p(G,m,F), as well as

\[ \int_G f(x)\,dx = \int_{x\in G} f(x)\,dm \]

when it is clear which measure we are using.

Theorem 5.169. Let G be an LCA group with a Haar measure m.

1. m(E) = m(−E) for any Borel set E.

2. For all f ∈ ℒ^1(G,F), the map x ↦ f(−x) is in ℒ^1(G,F) and

\[ \int_G f(-x)\,dx = \int_G f(x)\,dx . \]

Proof. Define m′(E) = m(−E). Then m′ is a Haar measure on G and by uniqueness, m′ = cm for some c > 0. Theorem 4.166 shows that there exists a symmetric compact subset E of G, and since m′(E) = m(E) < ∞, we must have c = 1.

Let G be an LCA group and let f be a continuous map from G to a Banach space. Recall that the left a-translate of f is defined by Laf(x) = f(x − a), and that f is uniformly continuous if for every ε > 0 there exists a neighborhood U of 0 such that ‖Laf − f‖∞ < ε for all a ∈ U. (Left and right uniform continuity are equivalent when G is abelian.)

Theorem 5.170.

1. Let 1 ≤ p < ∞. For all f ∈ ℒ^p(G,F), the map x ↦ Lxf from G to ℒ^p(G,F) is uniformly continuous.

2. For all f ∈ C0(G,F), the map x ↦ Lxf from G to C0(G,F) is uniformly continuous.

Proof. For (1), let ε > 0. Since Cc(G,F) is dense in ℒ^p(G,F) (see Theorem 4.143), there is some g ∈ Cc(G,F) such that ‖g − f‖p < ε. Also, g is uniformly continuous because it has compact support (see Theorem 4.167), so there exists a neighborhood U of 0 such that ‖Lag − g‖∞ < ε for all a ∈ U. Theorem 4.166 implies that we can choose a relatively compact neighborhood V of 0 such that V̄ ⊆ U, and that the set E = supp(g) + V̄ is compact. Then Lag − g vanishes outside E and ‖Lag − g‖p < ε m(E)^{1/p} whenever a ∈ V. For all x ∈ G and a ∈ V we have

\[ \|L_a L_x f - L_x f\|_p = \|L_a f - f\|_p \le \|L_a f - L_a g\|_p + \|L_a g - g\|_p + \|g - f\|_p < \varepsilon (2 + m(E)^{1/p}) . \]

(We have used the fact that ‖Lah‖p = ‖h‖p for any h because m is translation-invariant.)

(2) follows directly from Theorem 4.167.

(m ⊗m)-measurability

In this section, we will frequently switch the order of integration using Fubini's theorem for Radon products (Theorem 4.163). Even when G is not σ-finite, the use of Fubini's theorem is valid because Theorem 4.55 shows that all integrable functions vanish outside a σ-finite set. Note that the original version of Fubini's theorem (Theorem 4.119) does not apply because the functions involved might not be (m ⊗ m)-measurable or even (BG ⊗ BG)-measurable. The following two results will be sufficient for all of the functions that we will be working with.

Lemma 5.171. Let m ⊗ m be the Radon product on G × G. If E ⊆ G is a Borel set with m(E) = 0, then

\[ (m \otimes m)(\{ (x, y) \in G \times G : x - y \in E \}) = 0 . \]

Proof. Define h : G × G → G × G by h(x, y) = (x − y, y); then h is a homeomorphism with the inverse (x, y) ↦ (x + y, y). Let F = {(x, y) ∈ G × G : x − y ∈ E}. Since E × G ∈ B_{G×G} and h is continuous, F = h^{−1}(E × G) ∈ B_{G×G}. We have F^y = y + E, so (χF)^y = χ_{F^y} ∈ ℒ^1(m,R) for all y ∈ G. The map

\[ y \mapsto \int_G (\chi_F)^y \,dm = m(y + E) = m(E) = 0 \]

is clearly in ℒ^1(m,R). Then Fubini's theorem (for Radon products) shows that χF ∈ ℒ^1(m ⊗ m,R), and

\[ (m \otimes m)(F) = \int_{G\times G} \chi_F \,d(m \otimes m) = \int_{y\in G} \int_G (\chi_F)^y \,dm\,dm = 0 . \]

Theorem 5.172. If f : G → 𝐄 is m-measurable, then the function (x, y) ↦ f(x − y) is (m ⊗ m)-measurable.

Proof. Let g(x, y) = f(x − y) and h(x, y) = x − y. By Theorem 4.39, there is a set E ⊆ G of measure zero such that f|G\E is measurable and f(G \ E) is separable. Lemma 5.171 shows that F = {(x, y) ∈ G × G : x − y ∈ E} has measure zero. If A ⊆ 𝐄 is a Borel set then

\[ \begin{aligned}
(g|_{(G\times G)\setminus F})^{-1}(A) &= \{ (x, y) \in (G\times G)\setminus F : f(x - y) \in A \} \\
&= \{ (x, y) \in (G\times G)\setminus F : (f|_{G\setminus E} \circ h)(x, y) \in A \} \\
&= (f|_{G\setminus E} \circ h)^{-1}(A) \setminus F
\end{aligned} \]

is measurable because f|G\E and h are measurable. Also, g((G × G) \ F) ⊆ f(G \ E) is separable. The result follows from Theorem 4.39.

Convolution

Let 𝐄, 𝐅, 𝐆 be Banach spaces and let · : 𝐄 × 𝐅 → 𝐆 be a compatible product. Let f : G → 𝐄 and g : G → 𝐅. The convolution of f and g at x ∈ G is defined by

\[ (f * g)(x) = \int_G f(x - y) \cdot g(y)\,dy = \int_G (L_y f)(x) \cdot g(y)\,dy , \]

assuming that y ↦ f(x − y) · g(y) is in ℒ^1(G,𝐆). If the convolution of f and g exists for all x (or almost all x, when appropriate), we can consider the convolution f ∗ g : G → 𝐆 as a function.

Theorem 5.173 (Properties of convolution). Let f : G → 𝐄 and g : G → 𝐅.

1. If f ∈ Cc(G,𝐄) and g ∈ Cc(G,𝐅) then f ∗ g ∈ Cc(G,𝐆), with supp(f ∗ g) ⊆ supp(f) + supp(g).

2. Let 1 ≤ p, q ≤ ∞ be conjugate exponents. If f ∈ ℒ^p(G,𝐄) and g ∈ ℒ^q(G,𝐅) then f ∗ g ∈ C(G,𝐆) and is uniformly continuous. If 1 < p ≤ ∞ then f ∗ g ∈ C0(G,𝐆).

3. If f ∈ ℒ^1(G,𝐄) and g ∈ ℒ^1(G,𝐅) then (f ∗ g)(x) exists for almost all x ∈ G and f ∗ g ∈ ℒ^1(G,𝐆). The map (f, g) ↦ f ∗ g is continuous and bilinear, and in particular we have ‖f ∗ g‖1 ≤ ‖f‖1 ‖g‖1.

4. Suppose that 𝐄 = 𝐅 and the compatible product is symmetric. If there is some x ∈ G such that (f ∗ g)(x) exists, then (g ∗ f)(x) exists and (f ∗ g)(x) = (g ∗ f)(x).

5. Suppose that f ∈ ℒ^1(G,𝐄1), g ∈ ℒ^1(G,𝐄2) and h ∈ ℒ^1(G,𝐄3) with appropriate compatible products such that (x · y) · z = x · (y · z) for all x ∈ 𝐄1, y ∈ 𝐄2 and z ∈ 𝐄3. Then (f ∗ g) ∗ h = f ∗ (g ∗ h).
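On a finite cyclic group the convolution integral is a finite sum, which makes the algebraic properties above easy to check. The sketch below assumes G = Z_5 with counting measure (so m({x}) = 1), scalar-valued functions stored as lists, and ordinary multiplication as the compatible product; the sample data and the names `convolve` and `norm1` are hypothetical.

```python
def convolve(f, g):
    """(f * g)(x) = sum over y of f(x - y) g(y) on Z_n with counting measure."""
    n = len(f)
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) for x in range(n)]

def norm1(f):
    """L^1 norm with respect to counting measure."""
    return sum(abs(v) for v in f)

f = [1.0, -2.0, 0.5, 0.0, 3.0]
g = [0.0, 1.0, 1.0, -1.0, 2.0]
h = [2.0, 0.0, -1.0, 0.5, 1.0]
delta0 = [1.0, 0.0, 0.0, 0.0, 0.0]   # indicator of {0}

fg = convolve(f, g)
gf = convolve(g, f)
left = convolve(convolve(f, g), h)
right = convolve(f, convolve(g, h))
```

With these definitions `fg` and `gf` agree, `left` and `right` agree up to rounding, the norm inequality ‖f ∗ g‖1 ≤ ‖f‖1 ‖g‖1 holds, and convolving with the indicator of {0} returns the original function.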

Proof. For (1), if x ∉ supp(f) + supp(g) then x − y ∉ supp(f) whenever y ∈ supp(g), and therefore (f ∗ g)(x) = 0. It is easy to show that f ∗ g is continuous.

For (2), note that (f ∗ g)(x) exists for all x ∈ G and ‖f ∗ g‖∞ ≤ ‖f‖p ‖g‖q due to Theorem 4.91. First assume that 1 < p, q < ∞. By Theorem 4.143, we can choose a sequence fn in Cc(G,𝐄) and a sequence gn in Cc(G,𝐅) such that fn → f in L^p and gn → g in L^q. For all x ∈ G, apply Theorem 4.91 to get

\[ \begin{aligned}
|(f_n * g_n)(x) - (f * g)(x)| &\le \int_G |f_n(x - y) \cdot g_n(y) - f(x - y) \cdot g(y)|\,dy \\
&= \int_G \bigl| [f_n(x - y) - f(x - y)] \cdot [g_n(y) - g(y)] + f(x - y) \cdot [g_n(y) - g(y)] \\
&\qquad\quad + [f_n(x - y) - f(x - y)] \cdot g(y) \bigr|\,dy \\
&\le \|f_n - f\|_p \|g_n - g\|_q + \|f\|_p \|g_n - g\|_q + \|f_n - f\|_p \|g\|_q .
\end{aligned} \]

This shows that fn ∗ gn → f ∗ g uniformly, and Theorem 4.128 shows that f ∗ g ∈ C0(G,𝐆) since every fn ∗ gn is in Cc(G,𝐆) from (1). Now assume that p = 1 so that f ∈ ℒ^1(G,𝐄) and g ∈ ℒ^∞(G,𝐅). For all x, z ∈ G we have

\[ |(f * g)(x) - (f * g)(z)| \le \int_G |f(x - y) - f(z - y)|\,|g(y)|\,dy \le \|(L_{-x} - L_{-z}) f\|_1 \|g\|_\infty , \]

and the uniform continuity of f ∗ g follows from Theorem 5.170.

(3) follows directly from Theorem 5.172 and Fubini's theorem (for Radon products). For (4), we have

\[ \begin{aligned}
(f * g)(x) &= \int_G f(x - y) \cdot g(y)\,dy = \int_G f(-y) \cdot g(x + y)\,dy \\
&= \int_G f(y) \cdot g(x - y)\,dy = \int_G g(x - y) \cdot f(y)\,dy \\
&= (g * f)(x)
\end{aligned} \]

using the transformations y ↦ x + y (see Theorem 4.170) and y ↦ −y (see Theorem 5.169). For (5),

\[ \begin{aligned}
((f * g) * h)(x) &= \int_G (f * g)(x - y) \cdot h(y)\,dy \\
&= \int_G \Bigl( \int_G f(x - y - z) \cdot g(z)\,dz \Bigr) \cdot h(y)\,dy \\
&= \int_G \Bigl( \int_G f(x - z) \cdot g(z - y)\,dz \Bigr) \cdot h(y)\,dy \\
&= \int_G f(x - z) \cdot \Bigl( \int_G g(z - y) \cdot h(y)\,dy \Bigr)\,dz \\
&= \int_G f(x - z) \cdot (g * h)(z)\,dz = (f * (g * h))(x)
\end{aligned} \]

using the transformation z ↦ z − y followed by Fubini's theorem.

The preceding theorem shows that when A is a Banach algebra, we can consider L1(G,A) as a Banach algebra with multiplication defined by convolution. If A is commutative then L1(G,A) is also commutative. If A is a *-algebra then L1(G,A) is a *-algebra with involution defined by

f∗(x) = f(−x)∗.

However, L1(G,A) is not a C*-algebra unless G = {0}. If A has a unit e and G is discrete then eχ{0} is a unit in L1(G,A). Even if G is not discrete, we can construct an approximate unit:

Theorem 5.174. Let f ∈ ℒ^1(G,F). For all ε > 0, there exists a neighborhood U of 0 in G such that

‖u ∗ f − f‖1 < ε

for all nonnegative u ∈ ℒ^1(G,R) such that supp(u) ⊆ U and ∫_G u(x) dx = 1.

Proof. By Theorem 5.170 we can choose a neighborhood U of 0 such that ‖Laf − f‖1 < ε for all a ∈ U. Then for any such u,

\[ \int_G |(u * f)(x) - f(x)|\,dx = \int_G \Bigl| \int_G [f(x - y) - f(x)]\,u(y)\,dy \Bigr|\,dx \le \int_G \|L_y f - f\|_1\, u(y)\,dy < \varepsilon . \]

Lemma 5.175. If f ∈ ℒ^1(G,𝐄) and g ∈ ℒ^1(G,𝐅) then (Laf) ∗ g = f ∗ (Lag) for all a ∈ G.

Proof. We compute

\[ ((L_a f) * g)(x) = \int_G f(x - y - a) \cdot g(y)\,dy = \int_G f(x - y) \cdot g(y - a)\,dy = (f * (L_a g))(x) . \]

The Fourier transform

A character of G is a continuous homomorphism from G to the circle group T = {z ∈ C : |z| = 1}. The dual group Ĝ of G is the abelian group formed by taking all characters of G and defining addition by

(α + β)(x) = α(x)β(x)

for α, β ∈ Ĝ. Note that −α = ᾱ since (α + ᾱ)(x) = α(x)\overline{α(x)} = 1. We give Ĝ the compact open topology, which is the topology generated by the sets

VG(α,K, ε) = {β ∈ Ĝ : |β(x) − α(x)| < ε for all x ∈ K},

for α ∈ Ĝ, compact subsets K of G, and ε > 0. In this topology, a net (αγ) in Ĝ converges to α if and only if αγ → α uniformly on compact sets.
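For a concrete picture of the dual group, take G = Z_n: its characters are the maps x ↦ exp(2πikx/n) for k = 0, . . . , n − 1, and addition of characters corresponds to addition of the indices k modulo n. The sketch below assumes this standard parametrization; `character` is a hypothetical helper name and n = 6 is an arbitrary choice.

```python
import cmath

def character(k, n):
    """The character x -> exp(2*pi*i*k*x/n) of the cyclic group Z_n."""
    return lambda x: cmath.exp(2j * cmath.pi * k * x / n)

n = 6
alpha = character(2, n)
beta = character(5, n)

# Addition in the dual group is pointwise multiplication of characters.
alpha_plus_beta = character((2 + 5) % n, n)
# The inverse of alpha is its complex conjugate.
minus_alpha = character((-2) % n, n)
```

The checks that `alpha` is a homomorphism into the circle group, that `alpha_plus_beta` equals the pointwise product, and that `minus_alpha` equals the pointwise conjugate are all finite computations over x, y ∈ {0, . . . , n − 1}.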

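Theorem 5.176. Ĝ is a topological group.

Proof. The inversion map is continuous because

\[ \begin{aligned}
-V_G(\alpha, K, \varepsilon) &= \{ \beta \in \hat G : |\overline{\beta(x)} - \alpha(x)| < \varepsilon \text{ for all } x \in K \} \\
&= \{ \beta \in \hat G : |\beta(x) - \overline{\alpha(x)}| < \varepsilon \text{ for all } x \in K \} \\
&= V_G(-\alpha, K, \varepsilon) .
\end{aligned} \]

For α, β, γ, δ ∈ Ĝ and x ∈ G,

\[ \begin{aligned}
|(\gamma + \delta)(x) - (\alpha + \beta)(x)| &= |\gamma(x)\delta(x) - \alpha(x)\beta(x)| \\
&= |\gamma(x)[\delta(x) - \beta(x)] + \beta(x)[\gamma(x) - \alpha(x)]| \\
&\le |\delta(x) - \beta(x)| + |\gamma(x) - \alpha(x)| .
\end{aligned} \]

Therefore

VG(α,K, ε) + VG(β,K, ε) ⊆ VG(α + β,K, 2ε),

which implies that the product map (α, β) ↦ α + β is continuous.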

From now on we will take F = C, and write ℒ^1(G) = ℒ^1(G,C) and Cc(G) = Cc(G,C). Let f ∈ ℒ^1(G). The Fourier transform of f is the function f̂ : Ĝ → C defined by

\[ \hat f(\alpha) = \int_G f(x)\,\overline{\alpha(x)}\,dx . \]

The map f ↦ f̂, usually denoted by ℱ, is also called the Fourier transform.

The following theorem provides the key connection between the dual group and Section 5.2.

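Theorem 5.177 (Gelfand space of L1(G)). Let ϕα(f) = f̂(α). Then ϕα ∈ ∆(L1(G)) for all α ∈ Ĝ, and the map α ↦ ϕα from Ĝ to ∆(L1(G)) is a homeomorphism.

Proof. Define Φ : Ĝ → ∆(L1(G)) by α ↦ ϕα. The linearity of ϕα is clear, so it remains to show that it is nonzero and continuous, and that ϕα(f ∗ g) = ϕα(f)ϕα(g) for all f, g ∈ ℒ^1(G). To show that ϕα ≠ 0, choose any nonzero, nonnegative f ∈ Cc(G) and note that

\[ \varphi_\alpha(\alpha f) = \int_G |\alpha(x)|^2 f(x)\,dx > 0 . \]

Continuity follows from the fact that

\[ |\varphi_\alpha(f)| \le \int_G |f(x)\,\overline{\alpha(x)}|\,dx = \|f\|_1 . \]

Also,

\[ \begin{aligned}
\varphi_\alpha(f * g) &= \int_G \Bigl( \int_G f(x - y) g(y)\,dy \Bigr) \overline{\alpha(x)}\,dx \\
&= \int_G \int_G f(x - y) g(y)\,\overline{\alpha(x)}\,dx\,dy \\
&= \int_G \int_G f(x) g(y)\,\overline{\alpha(x + y)}\,dx\,dy \\
&= \Bigl( \int_G f(x)\,\overline{\alpha(x)}\,dx \Bigr) \Bigl( \int_G g(y)\,\overline{\alpha(y)}\,dy \Bigr) = \varphi_\alpha(f)\varphi_\alpha(g) .
\end{aligned} \]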

Next, we show that Φ is a bijection. To prove injectivity, suppose that ϕα = ϕβ for some α, β ∈ Ĝ, i.e. f̂(α) = f̂(β) for all f ∈ ℒ^1(G). The map

\[ f \mapsto \int_G f(x)\bigl(\overline{\alpha(x)} - \overline{\beta(x)}\bigr)\,dx \]

is the zero functional in L1(G)∗, so Theorem 4.183 shows that α = β locally almost everywhere. Since α and β are continuous, α = β. To prove surjectivity, let ϕ ∈ ∆(L1(G)); we want to show that there is some α ∈ Ĝ such that ϕ = ϕα. Since ϕ ∈ L1(G)∗, Theorem 4.183 shows that there is some β ∈ L∞loc(G) such that

\[ \varphi(f) = \int_G f(x)\beta(x)\,dx \]

for all f ∈ ℒ^1(G). Since ϕ ≠ 0, we can choose some g ∈ ℒ^1(G) such that ϕ(g) = 1. Let α(x) = \overline{ϕ(Lxg)}, which is continuous because x ↦ Lxg is continuous (see Theorem 5.170) and ϕ is continuous. For all x, y ∈ G,

\[ \begin{aligned}
\alpha(x + y) &= \overline{\varphi(L_{x+y} g)} = \overline{\varphi(g)\varphi(L_{x+y} g)} \\
&= \overline{\varphi(g * L_{x+y} g)} = \overline{\varphi(L_x g * L_y g)} \\
&= \overline{\varphi(L_x g)\varphi(L_y g)} = \alpha(x)\alpha(y)
\end{aligned} \]

using Lemma 5.175. Using the fact that |ϕ| ≤ 1 (see Corollary 5.25), we have |α(x)| ≤ ‖Lxg‖1 = ‖g‖1 and

\[ |\alpha(x)|^n = |\alpha(nx)| \le \|g\|_1 \]

for every integer n. Taking n → ∞ shows that |α(x)| ≤ 1 (for otherwise there is a contradiction). Similarly, taking n → −∞ shows that |α(x)| ≥ 1, and therefore |α(x)| = 1 for all x ∈ G. This shows that α ∈ Ĝ. We compute

\[ \begin{aligned}
\varphi(f) &= \varphi(g)\varphi(f) = \varphi(g * f) \\
&= \int_G \Bigl( \int_G g(x - y) f(y)\,dy \Bigr) \beta(x)\,dx = \int_G f(y) \int_G g(x - y)\beta(x)\,dx\,dy \\
&= \int_G f(y)\varphi(L_y g)\,dy = \int_G f(y)\,\overline{\alpha(y)}\,dy \\
&= \varphi_\alpha(f)
\end{aligned} \]

for all f ∈ ℒ^1(G). This proves that Φ is a bijection.

Finally, we prove that Φ is a homeomorphism. Recall that the compact open topology on Ĝ is generated by the sets

V(α,K, ε) = {β ∈ Ĝ : |β(x) − α(x)| < ε for all x ∈ K}

for α ∈ Ĝ, compact subsets K of G, and ε > 0. Also recall that the Gelfand topology on ∆(L1(G)) is generated by the sets

U(ϕ, F, ε) = {ψ ∈ ∆(L1(G)) : |ψ(f) − ϕ(f)| < ε for all f ∈ F},

for ϕ ∈ ∆(L1(G)), finite subsets F of L1(G), and ε > 0.

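To prove that Φ is continuous, let α ∈ Φ^{−1}(U(ϕγ, F, δ)) and write F = {f1, . . . , fn}. We can assume that f1, . . . , fn ≠ 0. Let

\[ \delta' = \delta - \max_{i=1,\dots,n} |\varphi_\alpha(f_i) - \varphi_\gamma(f_i)| > 0 . \]

For each i = 1, . . . , n, choose some gi ∈ Cc(G) such that ‖gi − fi‖1 < δ′/4 (see Theorem 4.143). Let K = ⋃_{i=1}^n supp(gi) and ε = (δ′/2) min_{i=1,...,n} ‖fi‖1^{−1}. If β ∈ V(α,K, ε) then |(β − α)(x) − 1| < ε for all x ∈ K,

\[ \begin{aligned}
|\varphi_\beta(f_i) - \varphi_\alpha(f_i)| &= \Bigl| \int_G f_i(x)\,\overline{\alpha(x)}\,\bigl(\overline{(\beta - \alpha)(x)} - 1\bigr)\,dx \Bigr| \\
&\le \int_K |f_i(x)|\,|(\beta - \alpha)(x) - 1|\,dx + \int_{G\setminus K} |f_i(x)|\,|(\beta - \alpha)(x) - 1|\,dx \\
&\le \|f_i\|_1 \varepsilon + 2 \int_{G\setminus K} |f_i(x)|\,dx \\
&= \|f_i\|_1 \varepsilon + 2 \int_{G\setminus K} |g_i(x) - f_i(x)|\,dx \\
&\le \|f_i\|_1 \varepsilon + 2 \|g_i - f_i\|_1 \\
&< \delta' ,
\end{aligned} \]

and

\[ |\varphi_\beta(f_i) - \varphi_\gamma(f_i)| \le |\varphi_\beta(f_i) - \varphi_\alpha(f_i)| + |\varphi_\alpha(f_i) - \varphi_\gamma(f_i)| < \delta \]

for i = 1, . . . , n. This shows that V(α,K, ε) ⊆ Φ^{−1}(U(ϕγ, F, δ)).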

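To prove that Φ^{−1} is continuous, let ϕα ∈ Φ(V(γ,K, δ)). Since K is compact, we have

\[ \delta' = \delta - \sup_{x\in K} |\alpha(x) - \gamma(x)| > 0 . \]

Choose some f ∈ L1(G) such that ϕα(f) = f̂(α) = 1, and choose a neighborhood W of 0 ∈ G such that ‖L_{−x}f − L_{−y}f‖1 < δ′/3 whenever y − x ∈ W (see Theorem 5.170). Note that

\[ \beta(x)\hat f(\beta) = \int_G f(y)\beta(x)\,\overline{\beta(y)}\,dy = \int_G f(y)\beta(x - y)\,dy = \int_G f(x + y)\beta(-y)\,dy = \widehat{L_{-x} f}(\beta) \]

for all β ∈ Ĝ. For all x ∈ G, if y ∈ x + W and ϕβ ∈ U(ϕα, {f, L_{−x}f}, δ′/3) then

\[ \begin{aligned}
|\beta(y) - \alpha(x)| &\le |\beta(y) - \beta(y)\hat f(\beta)| + |\beta(y)\hat f(\beta) - \beta(x)\hat f(\beta)| + |\beta(x)\hat f(\beta) - \alpha(x)\hat f(\alpha)| \\
&= |1 - \hat f(\beta)| + |\widehat{L_{-y} f}(\beta) - \widehat{L_{-x} f}(\beta)| + |\widehat{L_{-x} f}(\beta) - \widehat{L_{-x} f}(\alpha)| \\
&\le |\hat f(\alpha) - \hat f(\beta)| + \|L_{-y} f - L_{-x} f\|_1 + |\widehat{L_{-x} f}(\beta) - \widehat{L_{-x} f}(\alpha)| \\
&< \delta' .
\end{aligned} \]

Since K is compact, we can choose x1, . . . , xn ∈ K such that K ⊆ ⋃_{i=1}^n (xi + W). If y ∈ K and

ϕβ ∈ U(ϕα, {f, L_{−x1}f, . . . , L_{−xn}f}, δ′/3)

then y ∈ xi + W for some i, and

|β(y) − γ(y)| ≤ |β(y) − α(xi)| + |α(xi) − γ(xi)| < δ.

This shows that U(ϕα, {f, L_{−x1}f, . . . , L_{−xn}f}, δ′/3) ⊆ Φ(V(γ,K, δ)).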

Theorem 5.178 (Properties of the Fourier transform). If we identify ∆(L1(G)) with Ĝ, then the Fourier transform is the Gelfand representation of L1(G) and satisfies the following properties:

1. ℱ is a *-homomorphism from L1(G) to C0(Ĝ).

2. ℱ is continuous and |ℱ| ≤ 1, i.e. ‖f̂‖∞ ≤ ‖f‖1 for all f ∈ ℒ^1(G).

3. ℱ(L1(G)) is dense in C0(Ĝ).

4. For all x ∈ G and α ∈ Ĝ,

\[ \alpha(x)\hat f(\alpha) = (f * \alpha)(x) = \widehat{L_{-x} f}(\alpha) . \]

In particular, ℱ(L1(G)) is invariant under multiplication by α ↦ α(x) for any x ∈ G.

5. For all α ∈ Ĝ and f ∈ ℒ^1(G),

\[ \widehat{\alpha f} = L_\alpha \hat f . \]

In particular, ℱ(L1(G)) is translation-invariant.
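On G = Z_n the Fourier transform is a finite sum, so several of the listed properties can be checked numerically. The sketch below assumes G = Z_5 with counting measure, identifies the character α_k with its index k, and uses the convention f̂(α) = Σ f(x)·conj(α(x)); the function names and sample data are hypothetical.

```python
import cmath

n = 5

def chi(k, x):
    """Value at x of the character alpha_k(x) = exp(2*pi*i*k*x/n) of Z_n."""
    return cmath.exp(2j * cmath.pi * k * x / n)

def fourier(f):
    """Fourier transform: k -> sum over x of f(x) * conj(alpha_k(x))."""
    return [sum(f[x] * chi(k, x).conjugate() for x in range(n)) for k in range(n)]

def convolve(f, g):
    """(f * g)(x) = sum over y of f(x - y) g(y)."""
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) for x in range(n)]

def translate(a, f):
    """(L_a f)(x) = f(x - a)."""
    return [f[(x - a) % n] for x in range(n)]

def close(u, v, tol=1e-9):
    """Entrywise comparison of two lists of complex numbers."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

f = [1.0, 2.0 - 1.0j, 0.0, -0.5j, 3.0]
g = [0.5, 0.0, 1.0j, 1.0, -2.0]
fh, gh = fourier(f), fourier(g)

# Multiplicativity: the transform of f * g is the pointwise product.
multiplicative = close(fourier(convolve(f, g)), [a * b for a, b in zip(fh, gh)])
# Property (5) with alpha = alpha_2: transform of alpha*f is the translate of f^.
alpha_f = [chi(2, x) * f[x] for x in range(n)]
modulation = close(fourier(alpha_f), [fh[(k - 2) % n] for k in range(n)])
```

Property (4) can be checked the same way by comparing `chi(k, x) * fh[k]` with `fourier(translate(-x, f))[k]` for each x and k.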

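Proof. (1) and (2) follow from Theorem 5.32 and the fact that

\[ \widehat{f^*}(\alpha) = \int_G \overline{f(-x)}\,\overline{\alpha(x)}\,dx = \int_G \overline{f(x)}\,\overline{\alpha(-x)}\,dx = \overline{\int_G f(x)\,\overline{\alpha(x)}\,dx} = \overline{\hat f(\alpha)} . \]

Since ℱ(L1(G)) is closed under complex conjugation and strongly separates points, (3) follows from Theorem 4.134. For (4),

\[ \alpha(x)\hat f(\alpha) = \int_G f(y)\alpha(x)\,\overline{\alpha(y)}\,dy = \int_G f(y)\alpha(x - y)\,dy = (\alpha * f)(x) \]

and

\[ (\alpha * f)(x) = \int_G f(y)\alpha(x - y)\,dy = \int_G f(y + x)\alpha(-y)\,dy = \widehat{L_{-x} f}(\alpha) . \]

For (5),

\[ \widehat{\alpha f}(\beta) = \int_G f(x)\alpha(x)\,\overline{\beta(x)}\,dx = \int_G f(x)\,\overline{(\beta - \alpha)(x)}\,dx = \hat f(\beta - \alpha) = (L_\alpha \hat f)(\beta) \]

for all β ∈ Ĝ.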

Theorem 5.179. If G is discrete then Ĝ is compact, and if G is compact then Ĝ is discrete.

Proof. If G is discrete then L1(G) has the unit χ{0}, and Theorem 5.29 shows that Ĝ is compact. Suppose that G is compact and its Haar measure m is normalized so that m(G) = 1. Define f : G → C by f = 1 on G; since G is compact, f ∈ L1(G). If α ∈ Ĝ is nonzero then α(z) ≠ 1 for some z ∈ G, so

\[ \hat f(\alpha) = \int_G \alpha(-x)\,dx = \overline{\alpha(z)} \int_G \alpha(z - x)\,dx = \overline{\alpha(z)} \int_G \alpha(-x)\,dx = \overline{\alpha(z)}\,\hat f(\alpha) \]

and therefore f̂(α) = 0. But f̂(0) = 1, so {0} = f̂^{−1}(1) is open because f̂ is continuous and takes only the values 0 and 1. Therefore Ĝ is discrete.
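The key computation in the proof, that the transform of the constant function 1 vanishes at every nonzero character, is the familiar orthogonality relation when G = Z_n. The sketch below assumes G = Z_8 with Haar measure normalized so that m(G) = 1; `fourier_of_one` is a hypothetical helper name.

```python
import cmath

n = 8

def fourier_of_one(k):
    """Transform of f = 1 at the character alpha_k, with total mass m(G) = 1."""
    return sum(cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n)) / n

values = [fourier_of_one(k) for k in range(n)]
```

The list `values` has first entry 1 (up to rounding) and all other entries numerically zero, so the transform of 1 is the indicator of the trivial character.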

Theorem 5.180. If G is second countable then Ĝ is second countable.

Proof. By Corollary 4.144, L1(G) is separable. This implies that C0(Ĝ) is separable, because ℱ(L1(G)) is dense in C0(Ĝ) (see Theorem 5.178). It follows that Ĝ is second countable (e.g. if F is a countable dense subset of C0(Ĝ) and {U1, U2, . . . } is a basis for C, then

{f^{−1}(Un) : f ∈ F, n = 1, 2, . . . }

is a basis for Ĝ.)

Convolution for measures

Recall that MR(G) = MR(G,C) is the Banach space consisting of all C-valued Radon measures on G. (Note that all C-valued measures are of bounded variation, due to Theorem 4.16.) For any two measures µ, ν ∈ MR(G), we can define the product measure µ ⊗ ν on G × G by

\[ d(\mu \otimes \nu)(x, y) = \frac{d\mu}{d|\mu|}(x)\,\frac{d\nu}{d|\nu|}(y)\,d(|\mu| \otimes |\nu|)(x, y) , \]

using Theorem 4.108. Theorem 4.103 shows that |µ ⊗ ν| = |µ| ⊗ |ν|, so µ ⊗ ν is Radon. Let ρ : G × G → G be the addition map, i.e. ρ(x, y) = x + y. The convolution of µ and ν is the image measure

µ ∗ ν = ρ∗(µ ⊗ ν)

(see Theorem 4.26). That is,

(µ ∗ ν)(E) = (µ ⊗ ν)({(x, y) ∈ G × G : x + y ∈ E}).

Theorem 5.181 (Properties of convolution for measures). Let µ, ν, λ ∈ MR(G).

1. µ ∗ ν ∈ MR(G).

2. ‖µ ∗ ν‖ ≤ ‖µ‖ ‖ν‖.

3. If f ∈ ℒ^1(ρ∗|µ ∗ ν|) then f ∘ ρ ∈ ℒ^1(|µ ⊗ ν|) and

\[ \int_G f\,d(\mu * \nu) = \int_{G\times G} f \circ \rho\,d(\mu \otimes \nu) = \int_{(x,y)\in G\times G} f(x + y)\,d(\mu \otimes \nu) . \]

4. µ ∗ ν = ν ∗ µ.

5. (µ ∗ ν) ∗ λ = µ ∗ (ν ∗ λ).

Proof. Since µ ⊗ ν is Radon, (1) follows from Theorem 4.150. For (2), Theorem 4.26 implies that

\[ |\mu * \nu|(G) \le |\mu \otimes \nu|(G \times G) = (|\mu| \otimes |\nu|)(G \times G) = |\mu|(G)\,|\nu|(G) = \|\mu\|\,\|\nu\| . \]

For (3), Theorem 4.69 shows that f ∘ ρ ∈ ℒ^1(|µ ⊗ ν|). It is easy to see that the equality holds whenever f is simple, so the result follows by continuity. For (4),

\[ \begin{aligned}
(\mu * \nu)(E) &= (\mu \otimes \nu)(\{ (x, y) \in G \times G : x + y \in E \}) \\
&= (\nu \otimes \mu)(\{ (y, x) \in G \times G : x + y \in E \}) \\
&= (\nu * \mu)(E) .
\end{aligned} \]

For (5), we have

\[ ((\mu * \nu) * \lambda)(E) = \int_{(x,y,z)\in G\times G\times G} \chi_E(x + y + z)\,d(\mu \otimes \nu \otimes \lambda) = (\mu * (\nu * \lambda))(E) \]

using (3) and Fubini's theorem.

As with L1(G), if we define multiplication in MR(G) by convolution then MR(G) is a Banach algebra. It is commutative and unital because the Dirac measure δ0 at 0 ∈ G (defined by δ0(E) = 1 if 0 ∈ E and δ0(E) = 0 if 0 ∉ E) is a unit in MR(G). Furthermore, MR(G) is a *-algebra if we define

µ∗(E) = \overline{µ(−E)}.

Lemma 5.182. Let m be a Haar measure on G, let f, g ∈ ℒ^1(G), and suppose that f and g vanish outside a σ-finite set S (which always exists due to Theorem 4.55). Then

\[ d(m_f \otimes m_g)(x, y) = f(x)g(y)\,d(m \otimes m)(x, y) , \]

where m has been restricted to S.

Proof. We have

\[ \begin{aligned}
d(m_f \otimes m_g)(x, y) &= \frac{dm_f}{d|m_f|}(x)\,\frac{dm_g}{d|m_g|}(y)\,d(|m_f| \otimes |m_g|)(x, y) \\
&= \frac{f}{|f|}(x)\,\frac{g}{|g|}(y)\,|f|(x)\,|g|(y)\,d(m \otimes m)(x, y) \\
&= f(x)g(y)\,d(m \otimes m)(x, y) ,
\end{aligned} \]

noting that f/|f| is defined |mf|-almost everywhere and g/|g| is defined |mg|-almost everywhere.

Theorem 5.183. Let m be a Haar measure on G. The map f ↦ mf from L1(G) to MR(G) is an isometric *-homomorphism. Furthermore:

1. If L1(G) is unital then f ↦ mf is unital.

2. If G is discrete then f ↦ mf is surjective.

Proof. Theorem 4.103 and Theorem 4.149 show that f ↦ mf is a linear isometry into MR(G). If f, g ∈ ℒ^1(G) and h ∈ C0(G) then Theorem 5.173 and Theorem 4.104 imply that h ∈ ℒ^1(|m_{f∗g}|). Also, Theorem 4.69 shows that h ∈ ℒ^1(ρ∗|mf ∗ mg|). By Theorem 5.181, Lemma 5.182 and Fubini's theorem,

\[ \begin{aligned}
\int_G h\,dm_{f*g} &= \int_G h\,(f * g)\,dm = \int_G \int_G h(x) f(x - y) g(y)\,dy\,dx \\
&= \int_G \int_G h(x + y) f(x) g(y)\,dx\,dy = \int_{G\times G} h \circ \rho\,d(m_f \otimes m_g) \\
&= \int_G h\,d(m_f * m_g) ,
\end{aligned} \]

and m_{f∗g} = mf ∗ mg due to uniqueness in Theorem 4.151.

(1) is obvious. For (2), suppose that G is discrete and let µ ∈ MR(G). Define f(x) = µ({x})/m({0}). If E is compact then E is finite, so

\[ \mu(E) = \int_G \chi_E\,d\mu = \sum_{x\in E} \mu(\{x\}) = \int_G \chi_E f\,dm = m_f(E) . \]

By uniqueness in Theorem 4.151, µ = mf.

The Fourier transform for measures

Let µ ∈ MR(G). The Fourier transform (or Fourier-Stieltjes transform) of µ is the function µ̂ : Ĝ → C defined by

\[ \hat\mu(\alpha) = \int_{x\in G} \overline{\alpha(x)}\,d\mu . \]

We will write ℱ for the map µ ↦ µ̂.

Theorem 5.184 (Properties of the Fourier transform for measures). Let µ ∈ MR(G).

1. ℱ is a *-homomorphism from MR(G) to C(Ĝ). In particular, µ̂ is bounded.

2. ℱ is continuous and |ℱ| ≤ 1, i.e. ‖µ̂‖∞ ≤ ‖µ‖.

3. µ̂ is uniformly continuous.

4. For all x ∈ G and α ∈ Ĝ,

\[ \alpha(x)\hat\mu(\alpha) = \hat\nu(\alpha) , \]

where ν(E) = µ(E − x). In particular, ℱ(MR(G)) is invariant under multiplication by α ↦ α(x) for any x ∈ G.

5. For all α ∈ Ĝ,

\[ \widehat{\alpha\mu} = L_\alpha \hat\mu . \]

In particular, ℱ(MR(G)) is translation-invariant.

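Proof. It is clear from Theorem 4.103 that ‖µ̂‖∞ ≤ ‖µ‖. Let ε > 0. Since µ is Radon, Theorem 4.140 shows that there is a compact set K ⊆ G such that |µ|(G \ K) < ε. If α − β ∈ VG(0,K, ε) then

\[ \begin{aligned}
|\hat\mu(\alpha) - \hat\mu(\beta)| &\le \int_{x\in G} |\alpha(x) - \beta(x)|\,d|\mu| \\
&= \int_{x\in G} |(\alpha - \beta)(x) - 1|\,d|\mu| \\
&\le \int_{x\in K} |(\alpha - \beta)(x) - 1|\,d|\mu| + \int_{x\in G\setminus K} |(\alpha - \beta)(x) - 1|\,d|\mu| \\
&\le \|\mu\|\,\varepsilon + 2\varepsilon .
\end{aligned} \]

This proves (3). Let µ, ν ∈ MR(G) and r ∈ C. It is easy to see that \widehat{rµ} = rµ̂ and \widehat{µ + ν} = µ̂ + ν̂. Furthermore,

\[ \begin{aligned}
\widehat{\mu * \nu}(\alpha) &= \int_{x\in G} \overline{\alpha(x)}\,d(\mu * \nu) = \int_{(x,y)\in G\times G} \overline{\alpha(x + y)}\,d(\mu \otimes \nu) \\
&= \int_{x\in G} \overline{\alpha(x)}\,d\mu \int_{y\in G} \overline{\alpha(y)}\,d\nu = (\hat\mu\hat\nu)(\alpha)
\end{aligned} \]

for all α ∈ Ĝ. This proves (1) and (2). For (4),

\[ \alpha(x)\hat\mu(\alpha) = \int_{y\in G} \alpha(x)\,\overline{\alpha(y)}\,d\mu = \int_{y\in G} \overline{\alpha(y - x)}\,d\mu = \int_{y\in G} \overline{\alpha(y)}\,d\nu = \hat\nu(\alpha) . \]

For (5), if β ∈ Ĝ then

\[ \widehat{\alpha\mu}(\beta) = \int_{x\in G} \overline{\beta(x)}\,d(\alpha\mu) = \int_{x\in G} \overline{\beta(x)}\,\alpha(x)\,d\mu = \int_{x\in G} \overline{(\beta - \alpha)(x)}\,d\mu = (L_\alpha \hat\mu)(\beta) . \]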

Positive definite functions

We say that a function f : G → C is positive definite if the matrix (f(xi − xj))i,j is positive semidefinite for all x1, . . . , xn ∈ G. That is,

\[ \sum_{1\le i,j\le n} a_i \overline{a_j}\, f(x_i - x_j) \ge 0 \]

for all x1, . . . , xn ∈ G and a1, . . . , an ∈ C.

Theorem 5.185 (Properties of positive definite functions). Let f : G → C be a positive definite function.

1. f(−x) = \overline{f(x)} for all x ∈ G.

2. |f(x)| ≤ f(0) for all x ∈ G. In particular, f is bounded and f(0) ≥ 0.

3. |f(x) − f(y)|² ≤ 2f(0)(f(0) − Re f(x − y)) for all x, y ∈ G.

4. If f is continuous at 0 then f is uniformly continuous.

5. f̄ and x ↦ f(−x) are positive definite.

Proof. Take x1 = 0, x2 = x, a1 = 1 and a2 = r so that

\[ (1 + |r|^2) f(0) + r f(x) + \bar r f(-x) \ge 0 . \]

Setting r = 1 shows that f(x) + f(−x) is real, and setting r = i shows that i(f(x) − f(−x)) is real. Then

\[ \begin{aligned}
-2i f(-x) &= i(f(x) - f(-x)) - i(f(x) + f(-x)) \\
&= \overline{i(f(x) - f(-x))} + \overline{i(f(x) + f(-x))} = \overline{2i f(x)} ,
\end{aligned} \]

which proves (1). Choosing r such that |r| = 1 and rf(x) = −|f(x)| gives 2f(0) − 2|f(x)| ≥ 0, which proves (2). For (3), we can assume that f(x) ≠ f(y). Let r ∈ R and take x1 = 0, x2 = x, x3 = y, a1 = 1,

\[ a_2 = \frac{r\,|f(x) - f(y)|}{f(x) - f(y)} , \]

and a3 = −a2. Since f is positive definite,

\[ \begin{aligned}
0 &\le (1 + 2r^2) f(0) + 2r\,|f(x) - f(y)| - 2r^2 \operatorname{Re} f(x - y) \\
&= f(0) + 2\,|f(x) - f(y)|\,r + 2\bigl(f(0) - \operatorname{Re} f(x - y)\bigr) r^2 .
\end{aligned} \]

This is a quadratic polynomial in r, so its discriminant cannot be positive. This proves (3), and (4) follows directly from (3). (5) is obvious.

The following two results provide examples of positive definite functions.

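Theorem 5.186. If f ∈ ℒ^2(G) then f ∗ f∗ is continuous and positive definite.

Proof. Theorem 5.173 shows that f ∗ f∗ is continuous. For all x1, . . . , xn ∈ G and a1, . . . , an ∈ C,

\[ \begin{aligned}
\sum_{1\le i,j\le n} a_i \overline{a_j}\,(f * f^*)(x_i - x_j) &= \sum_{1\le i,j\le n} a_i \overline{a_j} \int_G f(x_i - x_j - y)\,\overline{f(-y)}\,dy \\
&= \sum_{1\le i,j\le n} a_i \overline{a_j} \int_G f(x_i - y)\,\overline{f(x_j - y)}\,dy \\
&= \int_G \Bigl| \sum_{i=1}^{n} a_i f(x_i - y) \Bigr|^2 dy \\
&\ge 0 .
\end{aligned} \]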

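Theorem 5.187. If $\mu \in M_R(\hat G)$ is a positive measure then

$$x \mapsto \int_{\alpha \in \hat G} \alpha(x)\,d\mu$$

defines a continuous and positive definite function on $G$.

Proof. For all $x_1, \dots, x_n \in G$ and $a_1, \dots, a_n \in \mathbb{C}$,

$$\sum_{1 \le i,j \le n} a_i\overline{a_j}\int_{\alpha \in \hat G} \alpha(x_i - x_j)\,d\mu = \int_{\alpha \in \hat G}\left|\sum_{i=1}^n a_i\alpha(x_i)\right|^2 d\mu \ge 0.$$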
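As a numerical illustration of Theorems 5.185 and 5.187, the following sketch takes $G = \mathbb{R}$ and a positive measure made of finitely many point masses on the dual. The frequencies, weights, sample points and coefficients are arbitrary choices made for this example, and characters are written as $x \mapsto e^{itx}$ for convenience. It compares the quadratic form with the integrated square from the proof and checks properties (2) and (3) at a few points.

```python
import cmath
import random

# Hypothetical finite positive measure on the dual of R: point mass
# weights[k] at the character x -> exp(i * freqs[k] * x).
freqs = [-2.0, 0.5, 3.0]
weights = [0.7, 1.2, 0.4]

def f(x):
    """f(x) = integral of alpha(x) d(mu) for the discrete measure above."""
    return sum(w * cmath.exp(1j * t * x) for t, w in zip(freqs, weights))

def quadratic_form(xs, coeffs):
    """sum over i, j of a_i * conj(a_j) * f(x_i - x_j)."""
    return sum(a * b.conjugate() * f(x - y)
               for x, a in zip(xs, coeffs)
               for y, b in zip(xs, coeffs))

def integrated_square(xs, coeffs):
    """integral of |sum_i a_i alpha(x_i)|^2 d(mu), as in the proof of Theorem 5.187."""
    return sum(w * abs(sum(a * cmath.exp(1j * t * x) for x, a in zip(xs, coeffs))) ** 2
               for t, w in zip(freqs, weights))

def property_3_gap(x, y):
    """Right side minus left side of property (3) of Theorem 5.185."""
    f0 = f(0.0).real
    return 2 * f0 * (f0 - f(x - y).real) - abs(f(x) - f(y)) ** 2

rng = random.Random(0)
xs = [rng.uniform(-5, 5) for _ in range(6)]
coeffs = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]

q = quadratic_form(xs, coeffs)
s = integrated_square(xs, coeffs)
print(q, s)
```

The two printed values should agree up to rounding, with a nonnegative real part and a negligible imaginary part.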

Corollary 5.188. Every character of G is positive definite.



Let B(G) be the span of the set of all continuous positive definite functions on G.

Theorem 5.189 (Bochner’s theorem). If $f \in B(G)$ then there is a unique $\mu \in M_R(\hat G)$ such that

$$f(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu \qquad (*)$$

for all $x \in G$. If $f$ is positive definite, then $\mu$ is positive.

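Proof. To prove uniqueness, it suffices to show that if

$$\int_{\alpha \in \hat G} \alpha(x)\,d\mu = 0$$

for all $x \in G$, then $\mu = 0$. For all $g \in L^1(G)$,

$$\int_{\hat G} \hat g\,d\mu = \int_{\alpha \in \hat G}\int_{x \in G} g(x)\overline{\alpha(x)}\,dx\,d\mu = \int_{x \in G} g(-x)\int_{\alpha \in \hat G} \alpha(x)\,d\mu\,dx = 0.$$

Since $\mathcal{F}(L^1(G))$ is dense in $C_0(\hat G)$ (see Theorem 5.178), Theorem 4.151 implies that $\mu = 0$.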

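To prove existence, first assume that $f$ is positive definite. We will assume that $L^1(G)$ is non-unital, because the proof is even simpler when $L^1(G)$ is unital. Let $L_e$ be the unitization of $L^1(G)$, and define $\lambda \in L_e^*$ by

$$\lambda(g + re) = \int_G g(x)f(x)\,dx + rf(0)$$

for $g \in L^1(G)$ and $r \in \mathbb{C}$. Note that $\lambda e = f(0)$. We have

$$\lambda((g + re) * (g^* + \overline{r}e)) = \lambda(g * g^*) + \overline{r}\lambda g + r\lambda g^* + |r|^2 f(0), \qquad (**)$$

where

$$\lambda(g * g^*) = \int_G f(x)\int_G g(x - y)\overline{g(-y)}\,dy\,dx = \int_G\int_G f(x - y)g(x)\overline{g(y)}\,dy\,dx$$

and

$$\overline{r}\lambda g + r\lambda g^* = \int_G \overline{r}g(x)f(x)\,dx + \int_G r\,\overline{g(x)f(x)}\,dx.$$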

Suppose that $g \in C_c(G)$, and let $K = \operatorname{supp}(g)$. Since $(x, y) \mapsto f(x - y)g(x)\overline{g(y)}$ is uniformly continuous on $K \times K$ and $gf$ is uniformly continuous on $K$, we can choose a partition $E_1, \dots, E_n$ of $K$ and elements $y_1, \dots, y_n \in G$ with $y_i \in E_i$ such that

$$\sum_{1 \le i,j \le n} f(y_i - y_j)g(y_i)\overline{g(y_j)}\,m(E_i)m(E_j) + \sum_{i=1}^n f(y_i)\overline{r}g(y_i)\,m(E_i) + \sum_{i=1}^n \overline{f(y_i)}\,r\,\overline{g(y_i)}\,m(E_i) + f(0)r\overline{r}$$

is arbitrarily close to (**). The above expression is always nonnegative, because we can take $x_i = y_i$ and $a_i = g(y_i)m(E_i)$ for $1 \le i \le n$, $x_{n+1} = 0$ and $a_{n+1} = r$ in the definition of positive definiteness for $f$. This shows that $\lambda$ is a positive linear functional.

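Let $G^*$ be the one-point compactification of $\hat G$. By Theorem 5.29 and Theorem 5.49, there is a positive measure $\nu \in M_R(G^*)$ such that

$$\int_G g(x)f(x)\,dx + rf(0) = \int_{G^*} (\hat g + r)\,d\nu$$

for all $g \in \mathcal{L}^1(G)$ and $r \in \mathbb{C}$. Let $\mu$ be the image of the restriction of $\nu$ to $\hat G$ under the reflection $\alpha \mapsto -\alpha$, so that $\int_{\hat G} \hat g\,d\nu = \int_{\alpha \in \hat G} \hat g(-\alpha)\,d\mu$; setting $r = 0$ gives

$$\int_G g(x)f(x)\,dx = \int_{\alpha \in \hat G} \hat g(-\alpha)\,d\mu = \int_{\alpha \in \hat G}\int_G g(x)\alpha(x)\,dx\,d\mu = \int_G g(x)\int_{\alpha \in \hat G} \alpha(x)\,d\mu\,dx$$

for all $g \in \mathcal{L}^1(G)$. By Theorem 4.183, $f(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu$ for locally almost all $x \in G$. Since $f$ is continuous, this holds for all $x \in G$.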

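For the general case, write $f = \sum_{i=1}^n a_i f_i$ where $a_1, \dots, a_n \in \mathbb{C}$ and $f_1, \dots, f_n$ are continuous and positive definite. For each $i$ we have $f_i(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu_i$ for some positive measure $\mu_i \in M_R(\hat G)$. Therefore $f(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu$, where $\mu = \sum_{i=1}^n a_i\mu_i \in M_R(\hat G)$.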

Corollary 5.190. A function $f : G \to \mathbb{C}$ is in $B(G)$ if and only if there is a measure $\mu \in M_R(\hat G)$ such that

$$f(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu$$

for all $x \in G$.



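Proof. It remains to show that if $\mu \in M_R(\hat G)$ then $f \in B(G)$. First assume that $\mu$ is $\mathbb{R}$-valued and let

$$\mu^+ = \tfrac{1}{2}(|\mu| + \mu) \quad\text{and}\quad \mu^- = \tfrac{1}{2}(|\mu| - \mu).$$

Clearly $\mu^+$ and $\mu^-$ are positive measures in $M_R(\hat G)$, and $\mu = \mu^+ - \mu^-$. Define

$$f^+(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu^+ \quad\text{and}\quad f^-(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu^-.$$

Theorem 5.187 shows that $f^+$ and $f^-$ are continuous and positive definite, so $f = f^+ - f^- \in B(G)$.

If $\mu$ is $\mathbb{C}$-valued then $\operatorname{Re}\mu$ and $\operatorname{Im}\mu$ are $\mathbb{R}$-valued measures in $M_R(\hat G)$ (see Theorem 4.148), and $\mu = \operatorname{Re}\mu + i\operatorname{Im}\mu$. Define

$$f^R(x) = \int_{\alpha \in \hat G} \alpha(x)\,d(\operatorname{Re}\mu) \quad\text{and}\quad f^I(x) = \int_{\alpha \in \hat G} \alpha(x)\,d(\operatorname{Im}\mu).$$

Then $f^R, f^I \in B(G)$, so $f = f^R + if^I \in B(G)$.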

The inversion theorem

Theorem 5.191 (Fourier inversion theorem). If $m$ is a Haar measure on $G$, then there is a Haar measure $\hat m$ on $\hat G$ such that for all $f \in L^1(G) \cap B(G)$ we have $\hat f \in L^1(\hat G)$ and

$$f(x) = \int_{\hat G} \hat f(\alpha)\alpha(x)\,d\alpha$$

for all $x \in G$.

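Proof. Let $B^1(G) = L^1(G) \cap B(G)$. If $f \in B^1(G)$, we will write $\mu^{(f)}$ for the measure in Theorem 5.189 satisfying $f(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu^{(f)}$. In that case, for all $h \in L^1(G)$ we have

$$\begin{aligned}
(h * f)(0) &= \int_G h(-x)f(x)\,dx = \int_{\alpha \in \hat G}\int_G h(-x)\alpha(x)\,dx\,d\mu^{(f)} \\
&= \int_{\alpha \in \hat G}\int_G h(x)\overline{\alpha(x)}\,dx\,d\mu^{(f)} = \int_{\hat G} \hat h\,d\mu^{(f)}.
\end{aligned}$$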



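If $f, g \in B^1(G)$ then

$$\begin{aligned}
\int_{\hat G} \hat h\,d\mu^{(f)}_{\hat g} &= \int_{\hat G} \hat h\hat g\,d\mu^{(f)} = ((h * g) * f)(0) \\
&= ((h * f) * g)(0) = \int_{\hat G} \hat h\hat f\,d\mu^{(g)} = \int_{\hat G} \hat h\,d\mu^{(g)}_{\hat f}
\end{aligned}$$

for all $h \in L^1(G)$. Since $\mathcal{F}(L^1(G))$ is dense in $C_0(\hat G)$ (see Theorem 5.178), uniqueness in Theorem 4.151 implies that $\mu^{(f)}_{\hat g} = \mu^{(g)}_{\hat f}$.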

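Define a nonzero positive linear functional $\lambda : C_c(\hat G, \mathbb{R}) \to \mathbb{R}$ as follows: let $\varphi \in C_c(\hat G, \mathbb{R})$ and let $K = \operatorname{supp}(\varphi)$. Since $C_c(G)$ is dense in $L^1(G)$ (see Theorem 4.143), for every $\alpha \in K$ we can choose some $u \in C_c(G)$ such that $\hat u(\alpha) \ne 0$. Note that $\mathcal{F}(u * u^*) = |\hat u|^2$ is nonnegative on $\hat G$ and positive on a neighborhood of $\alpha$. Since $K$ is compact, there are finitely many $u_1, \dots, u_n \in C_c(G)$ such that

$$v = \sum_{i=1}^n u_i * u_i^* \qquad (*)$$

satisfies $\hat v > 0$ on $K$. Theorem 5.186 shows that $v \in B^1(G)$, so we can set

$$\lambda\varphi = \int_{\hat G} \frac{\varphi}{\hat v}\,d\mu^{(v)}.$$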

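If $w \in B^1(G)$ and $\hat w > 0$ on $K$ then $\mu^{(v)}_{\hat w} = \mu^{(w)}_{\hat v}$ and

$$\int_{\hat G} \frac{\varphi}{\hat w}\,d\mu^{(w)} = \int_{\hat G} \frac{\varphi}{\hat v\hat w}\,d\mu^{(w)}_{\hat v} = \int_{\hat G} \frac{\varphi}{\hat v\hat w}\,d\mu^{(v)}_{\hat w} = \int_{\hat G} \frac{\varphi}{\hat v}\,d\mu^{(v)},$$

which shows that this is well-defined. The linearity of $\lambda$ is clear. Since $v$ is positive definite, Theorem 5.189 shows that $\mu^{(v)}$ is a positive measure, and therefore $\varphi \ge 0$ implies $\lambda\varphi \ge 0$. Choose any $h \in C_c(G)$ such that $h \ne 0$ and let $w = h * h^* \in B^1(G)$, which is positive definite (see Theorem 5.186). Theorem 5.189 implies that $\mu^{(w)} \ne 0$, so there is some $\varphi \in C_c(\hat G, \mathbb{R})$ such that $\int_{\hat G} \varphi\,d\mu^{(w)} \ne 0$. If $v$ is chosen as in (*) then

$$\lambda(\varphi\hat w) = \int_{\hat G} \frac{\varphi\hat w}{\hat v}\,d\mu^{(v)} = \int_{\hat G} \varphi\,d\mu^{(w)} \ne 0, \qquad (**)$$

which shows that $\lambda \ne 0$.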

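Let $\varphi \in C_c(\hat G, \mathbb{R})$, let $K = \operatorname{supp}(\varphi)$, and let $\alpha \in \hat G$. Choose $v$ as in (*) so that $\hat v > 0$ on $K \cup (K + \alpha)$. Define $w(x) = \overline{\alpha(x)}v(x)$ so that $\hat w(\beta) = \hat v(\beta + \alpha)$ (see Theorem 5.178) for all $\beta \in \hat G$ and $\mu^{(w)}(E) = \mu^{(v)}(E + \alpha)$ for all $E \in \mathcal{B}_{\hat G}$. Define $\psi(\beta) = \varphi(\beta - \alpha)$. Then

$$\lambda\psi = \int_{\beta \in \hat G} \frac{\varphi(\beta - \alpha)}{\hat v(\beta)}\,d\mu^{(v)} = \int_{\beta \in \hat G} \frac{\varphi(\beta - \alpha)}{\hat w(\beta - \alpha)}\,d\mu^{(v)} = \int_{\hat G} \frac{\varphi}{\hat w}\,d\mu^{(w)} = \lambda\varphi,$$

which shows that $\lambda$ is translation-invariant. By Theorem 4.137, there is a Haar measure $\hat m$ on $\hat G$ such that

$$\lambda\varphi = \int_{\hat G} \varphi\,d\hat m$$

for all $\varphi \in C_c(\hat G, \mathbb{R})$.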

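If $f \in B^1(G)$ is positive definite then (**) shows that

$$\int_{\hat G} \varphi\hat f\,d\hat m = \lambda(\varphi\hat f) = \int_{\hat G} \varphi\,d\mu^{(f)} \qquad (***)$$

for all $\varphi \in C_c(\hat G, \mathbb{R})$. By Theorem 4.143, Theorem 4.140 and the continuity of $\hat f$, (***) holds for all $\varphi \in I(\hat m, \mathbb{R})$. Furthermore, $\{\alpha \in \hat G : \hat f(\alpha) \ne 0\}$ is $\sigma$-finite because $\hat f \in C_0(\hat G)$, and the linear functional $\varphi \mapsto \int_{\hat G} \varphi\hat f\,d\hat m$ is clearly $\mu$-continuous on $I(\hat m, \mathbb{R})$. If $\varphi \in I(\hat m, \mathbb{R})$ and $\|\varphi\|_\infty = 1$ then

$$\left|\int_{\hat G} \varphi\hat f\,d\hat m\right| \le \mu^{(f)}(\hat G) < \infty,$$

so Theorem 4.112 shows that $\hat f \in L^1(\hat G)$.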

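By (***) we have

$$\int_{\hat G} \varphi\,d\hat m_{\hat f} = \int_{\hat G} \varphi\,d\mu^{(f)}$$

for all $\varphi \in C_c(\hat G, \mathbb{R})$, so $\hat m_{\hat f} = \mu^{(f)}$ by uniqueness in Theorem 4.137. By linearity, this holds for all $f \in B^1(G)$. Therefore

$$f(x) = \int_{\alpha \in \hat G} \alpha(x)\,d\mu^{(f)} = \int_{\alpha \in \hat G} \alpha(x)\,d\hat m_{\hat f} = \int_{\alpha \in \hat G} \hat f(\alpha)\alpha(x)\,d\hat m.$$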



Pontryagin duality

Let $m$ be a Haar measure on $G$ and choose a Haar measure $\hat m$ on $\hat G$ such that Theorem 5.191 holds.

Theorem 5.192 (Plancherel theorem). $\mathcal{F}|_{L^1(G) \cap L^2(G)}$ is an isometry (under the $L^2$-norm) and its image is dense in $L^2(\hat G)$. Furthermore:

1. $\mathcal{F}|_{L^1(G) \cap L^2(G)}$ can be uniquely extended to an isometric isomorphism $\mathcal{F} : L^2(G) \to L^2(\hat G)$.

2. $\widehat{fg} = \hat f * \hat g$ for all $f, g \in \mathcal{L}^2(G)$.

3. $\mathcal{F}(L^1(G)) = L^2(\hat G) * L^2(\hat G) = \{h * k : h, k \in L^2(\hat G)\}$.

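Proof. If $f \in L^1(G) \cap L^2(G)$ then $f * f^* \in L^1(G)$ is continuous and positive definite by Theorem 5.186. By Theorem 5.191,

$$\begin{aligned}
\|f\|_2^2 &= \int_G |f(x)|^2\,dx = \int_G f(x)f^*(-x)\,dx \\
&= (f * f^*)(0) = \int_{\hat G} \widehat{f * f^*}(\alpha)\,d\alpha \\
&= \int_{\hat G} |\hat f(\alpha)|^2\,d\alpha = \|\hat f\|_2^2.
\end{aligned}$$

Let $F = \mathcal{F}(L^1(G) \cap L^2(G))$. To show that $F$ is dense in $L^2(\hat G)$, it suffices to show that $F^\perp = 0$ (see Lemma 1.88). Let $g \in \mathcal{L}^2(\hat G)$ and suppose that $\int_{\hat G} \hat f(\alpha)\overline{g(\alpha)}\,d\alpha = 0$ for all $f \in \mathcal{L}^1(G) \cap \mathcal{L}^2(G)$. Theorem 5.178 implies that

$$\int_{\hat G} \hat f(\alpha)\overline{g(\alpha)}\alpha(x)\,d\alpha = 0$$

for all $x \in G$, so $\hat f\overline{g} = 0$ almost everywhere by uniqueness in Theorem 5.189. This holds for all $f \in \mathcal{L}^1(G) \cap \mathcal{L}^2(G)$, and since $F$ is translation-invariant (see Theorem 5.178), we have $g = 0$ almost everywhere.

(1) follows from the fact that $L^1(G) \cap L^2(G)$ is dense in $L^2(G)$ (see Theorem 4.143), and Corollary 1.36. For (2), let $f, g \in \mathcal{L}^2(G)$ and $\alpha \in \hat G$. By Corollary 1.8 and Theorem 5.178,

$$\begin{aligned}
\widehat{fg}(\alpha) &= \int_G f(x)g(x)\overline{\alpha(x)}\,dx = \int_{\hat G} \widehat{(-\alpha)f}(\beta)\,\hat g(-\beta)\,d\beta \\
&= \int_{\hat G} \hat f(\alpha + \beta)\hat g(-\beta)\,d\beta = (\hat f * \hat g)(\alpha).
\end{aligned}$$

(3) follows immediately from (2).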
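The inversion and Plancherel theorems can be checked by hand on a finite group. The sketch below assumes $G = \mathbb{Z}/n\mathbb{Z}$ with counting measure as Haar measure and identifies the dual group with the characters $\alpha_k(x) = e^{2\pi i k x/n}$; the names `character`, `fourier` and `inverse` and the sample function are choices made for this example. With this normalization the dual Haar measure that makes inversion hold gives each character mass $1/n$.

```python
import cmath

n = 8  # G = Z/nZ with counting measure as the Haar measure m

def character(k, x):
    """The character alpha_k(x) = exp(2*pi*i*k*x/n) of Z/nZ."""
    return cmath.exp(2j * cmath.pi * k * x / n)

def fourier(f):
    """hat f(k) = sum over x of f(x) * conj(alpha_k(x))."""
    return [sum(f[x] * character(k, x).conjugate() for x in range(n))
            for k in range(n)]

def inverse(fhat):
    """f(x) = (1/n) * sum over k of hat f(k) * alpha_k(x).

    The factor 1/n is the mass that the dual Haar measure assigns to each character.
    """
    return [sum(fhat[k] * character(k, x) for k in range(n)) / n
            for x in range(n)]

# An arbitrary complex-valued function on G, listed by its values.
f = [complex(x * x - 3, x % 3) for x in range(n)]
fhat = fourier(f)
recovered = inverse(fhat)

norm_sq = sum(abs(v) ** 2 for v in f)               # squared L2 norm on G
dual_norm_sq = sum(abs(v) ** 2 for v in fhat) / n   # squared L2 norm on the dual
print(norm_sq, dual_norm_sq)
```

Both norms should agree up to rounding, and `recovered` should reproduce `f`.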



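Let $H$ be the dual group of $\hat G$. If $x \in G$ then the map $\hat x : \hat G \to \mathbb{T}$ defined by $\alpha \mapsto \alpha(x)$ is clearly a homomorphism. Recall that a net $(\alpha_\gamma)$ in $\hat G$ converges to $\alpha$ if and only if $\alpha_\gamma \to \alpha$ uniformly on compact sets. In particular,

$$\hat x(\alpha_\gamma) = \alpha_\gamma(x) \to \alpha(x) = \hat x(\alpha).$$

This shows that $\hat x$ is continuous, so $\hat x$ is a character of $\hat G$. That is, $\hat x \in H$.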

Theorem 5.193 (Pontryagin duality theorem). The map $x \mapsto \hat x$ from $G$ to $H$ is an isomorphism (of topological groups).

Proof. TODO

Corollary 5.194. $G$ is discrete if and only if $\hat G$ is compact, and $G$ is compact if and only if $\hat G$ is discrete.


Corollary 5.195. $L^1(G)$ and $M_R(G)$ are semisimple. In particular, the Fourier transform $\mathcal{F} : L^1(G) \to C_0(\hat G)$ is injective.

Proof. Theorem 5.193 and uniqueness in Theorem 5.189 imply that $M_R(G)$ is semisimple, and it follows immediately that $L^1(G)$ is semisimple.

Corollary 5.196. $G$ is discrete if and only if the map $f \mapsto m_f$ from $L^1(G)$ to $M_R(G)$ is an isometric *-isomorphism (see Theorem 5.183). In particular, $G$ is discrete if and only if $L^1(G)$ has a unit.

Proof. If $G$ is discrete then Theorem 5.183 shows that $f \mapsto m_f$ is an isometric *-isomorphism, and $\chi_{\{0\}}$ is a unit in $L^1(G)$. If $G$ is not discrete then $\hat G$ is not compact (see Corollary 5.194), and $C_0(\hat G)$ has no unit. Since $\mathcal{F}(L^1(G))$ is a subalgebra of $C_0(\hat G)$, it has no unit. Corollary 5.195 implies $\mathcal{F} : L^1(G) \to \mathcal{F}(L^1(G))$ is an isomorphism (of algebras), so $L^1(G)$ has no unit. Since $M_R(G)$ is unital, $f \mapsto m_f$ cannot be an isomorphism.

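Theorem 5.197. Let $\mu \in M_R(G)$. If $\hat\mu \in \mathcal{L}^1(\hat G)$, then the function $f : G \to \mathbb{C}$ defined by

$$f(x) = \int_{\hat G} \hat\mu(\alpha)\alpha(x)\,d\alpha$$

is in $\mathcal{L}^1(G)$, and $\mu = m_f$. In particular, $\mu \ll m$.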



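Proof. Theorem 5.193 and Corollary 5.190 imply that $\hat\mu \in \mathcal{L}^1(\hat G) \cap B(\hat G)$, so Theorem 5.191 shows that $\hat{\hat\mu} \in \mathcal{L}^1(G)$ and

$$\hat\mu(\alpha) = \int_G \hat{\hat\mu}(x)\alpha(x)\,dx.$$

Since $\hat{\hat\mu}(x) = f(-x)$ for all $x \in G$, we have $f \in \mathcal{L}^1(G)$ and

$$\int_{x \in G} \overline{\alpha(x)}\,d\mu = \hat\mu(\alpha) = \int_G f(-x)\alpha(x)\,dx = \int_G f(x)\overline{\alpha(x)}\,dx = \int_{x \in G} \overline{\alpha(x)}\,dm_f$$

for all $\alpha \in \hat G$. By Theorem 5.193 and uniqueness in Theorem 5.189, we have $\mu = m_f$.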

5.10 Fourier analysis in vector spaces

Let $E$ be a finite-dimensional vector space over $\mathbb{R}$ and let $m$ be any Lebesgue measure on $E$. Since $E$ is finite-dimensional, it is an LCA group under addition. The next result shows that the dual group of $E$ is simply $E^*$.

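Lemma 5.198. Let $\alpha$ be a character of $\mathbb{R}$. There is some $y \in \mathbb{R}$ such that $\alpha(x) = e^{2\pi ixy}$ for all $x \in \mathbb{R}$.

Proof. By Theorem 2.11, we can choose some $c > 0$ such that $\int_0^c \alpha(t)\,dt \ne 0$. Let $C = \left(\int_0^c \alpha(t)\,dt\right)^{-1}$ so that

$$\alpha(x) = C\alpha(x)\int_0^c \alpha(t)\,dt = C\int_0^c \alpha(x + t)\,dt = C\int_x^{x+c} \alpha(t)\,dt$$

and

$$\alpha'(x) = C(\alpha(x + c) - \alpha(x)) = C(\alpha(c) - 1)\alpha(x).$$

By Corollary 2.64 and the fact that $\alpha(0) = 1$, we have $\alpha(x) = \exp(C(\alpha(c) - 1)x)$. Since $|\alpha(x)| = 1$ for all $x \in \mathbb{R}$, there is some $y \in \mathbb{R}$ such that $C(\alpha(c) - 1) = 2\pi iy$.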
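The constants in this proof can be traced numerically. The sketch below starts from a known character $\alpha(x) = e^{2\pi ixy}$, approximates $\int_0^c \alpha(t)\,dt$ with a midpoint rule, and recovers $y$ from $C(\alpha(c) - 1) = 2\pi iy$. The values $y = 0.75$ and $c = 0.5$, the step count, and the helper names are arbitrary choices for this illustration.

```python
import cmath

def alpha(x, y=0.75):
    """A character of R: alpha(x) = exp(2*pi*i*x*y); y = 0.75 is an arbitrary choice."""
    return cmath.exp(2j * cmath.pi * x * y)

def integral_0_to_c(c, steps=20000):
    """Midpoint-rule approximation of the integral of alpha over [0, c]."""
    h = c / steps
    return sum(alpha((k + 0.5) * h) for k in range(steps)) * h

c = 0.5
C = 1 / integral_0_to_c(c)                # C = (integral of alpha over [0, c])^(-1)
exponent = C * (alpha(c) - 1)             # the proof shows alpha(x) = exp(exponent * x)
y_recovered = (exponent / (2j * cmath.pi)).real
print(exponent, y_recovered)
```

The exponent should be purely imaginary up to discretization error, and `y_recovered` should be close to the value of `y` used to build `alpha`.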

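For each $\xi \in E^*$ the map $\alpha_\xi : E \to \mathbb{C}$ defined by $\alpha_\xi(x) = e^{2\pi i\langle x, \xi\rangle}$ is clearly a character of $E$, where $\langle x, \xi\rangle = \xi(x)$.

Theorem 5.199. The map $\xi \mapsto \alpha_\xi$ from $E^*$ to $\hat E$ is an isomorphism (of topological groups).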



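Proof. To show that the map is injective, suppose that $\xi, \zeta \in E^*$ and $e^{2\pi i\langle x, \xi\rangle} = e^{2\pi i\langle x, \zeta\rangle}$ for all $x \in E$. Differentiating at $x = 0$ gives

$$2\pi i\langle u, \xi\rangle = 2\pi i\langle u, \zeta\rangle$$

for all $u \in E$, i.e. $\xi = \zeta$. When $E = \mathbb{R}$, Lemma 5.198 shows that the map is surjective. Suppose that $E$ is $n$-dimensional and choose a basis $u_1, \dots, u_n$ for $E$. Let $\alpha \in \hat E$. For each $i$ the function $s \mapsto \alpha(su_i)$ is a character of $\mathbb{R}$, so $\alpha(su_i) = e^{2\pi ist_i}$ for some $t_i \in \mathbb{R}$. Define $\xi \in E^*$ by setting $\xi u_i = t_i$ for each $i$; if $x = r_1u_1 + \cdots + r_nu_n$ then

$$\alpha(x) = \alpha(r_1u_1)\cdots\alpha(r_nu_n) = e^{2\pi i(r_1t_1 + \cdots + r_nt_n)} = e^{2\pi i\langle x, \xi\rangle}.$$

This shows that $\alpha = \alpha_\xi$.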

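Recall that $\hat E$ has the compact open topology, which is generated by the sets

$$V_{\hat E}(\alpha, K, \varepsilon) = \{\beta \in \hat E : |\beta(x) - \alpha(x)| < \varepsilon \text{ for all } x \in K\},$$

for $\alpha \in \hat E$, compact subsets $K$ of $E$, and $\varepsilon > 0$. Given any such $K$ and $\varepsilon$, let $\zeta \in E^*$ and let

$$\begin{aligned}
W &= \{\xi \in E^* : \alpha_\xi \in V_{\hat E}(\alpha_\zeta, K, \varepsilon)\} \\
&= \{\xi \in E^* : |e^{2\pi i\langle x, \xi\rangle} - e^{2\pi i\langle x, \zeta\rangle}| < \varepsilon \text{ for all } x \in K\} \\
&= \{\xi \in E^* : |e^{2\pi i\langle x, \xi - \zeta\rangle} - 1| < \varepsilon \text{ for all } x \in K\}.
\end{aligned}$$

To show that $W$ is open, let $\xi_0 \in W$. For each $y \in K$ the set

$$W_y = \{(x, \xi) \in E \times E^* : |e^{2\pi i\langle x, \xi - \zeta\rangle} - 1| < \varepsilon\}$$

contains $(y, \xi_0)$ and is open because $(x, \xi) \mapsto |e^{2\pi i\langle x, \xi - \zeta\rangle} - 1|$ is continuous, so we can choose neighborhoods $U_y$ of $y$ and $V_y$ of $\xi_0$ such that $U_y \times V_y \subseteq W_y$. Since $K$ is compact, we can choose $y_1, \dots, y_n \in K$ such that $K \subseteq U_{y_1} \cup \cdots \cup U_{y_n}$. Then $W_0 = V_{y_1} \cap \cdots \cap V_{y_n}$ is a neighborhood of $\xi_0$, and it is easy to check that $W_0 \subseteq W$. This proves that the map $\xi \mapsto \alpha_\xi$ is continuous, and it remains to show that its inverse is continuous.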

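Let $U \subseteq E^*$ be an open set and let $V = \{\alpha_\xi : \xi \in U\}$. To show that $V$ is open, let $B$ be the closed unit ball in $E$ and let $\zeta \in U$. It suffices to choose $\varepsilon > 0$ such that $V_{\hat E}(\alpha_\zeta, B, \varepsilon) \subseteq V$, i.e.

$$\{\xi \in E^* : |e^{2\pi i\langle x, \xi - \zeta\rangle} - 1| < \varepsilon \text{ for all } x \in B\} \subseteq U.$$

Note that

$$|e^{i\theta} - 1| = \sqrt{2(1 - \cos\theta)}$$

for all $\theta \in \mathbb{R}$. Choose some $0 < \delta < 1/4$ such that $\xi \in U$ whenever $|\xi - \zeta| < \delta$, and let

$$\varepsilon = \sqrt{2(1 - \cos(2\pi\delta))}.$$

If $\xi \in E^*$ and $|e^{2\pi i\langle x, \xi - \zeta\rangle} - 1| < \varepsilon$ for all $x \in B$ then

$$\sqrt{2(1 - \cos(2\pi(\xi - \zeta)(x)))} < \sqrt{2(1 - \cos(2\pi\delta))}$$

and

$$\cos(2\pi\delta) < \cos(2\pi(\xi - \zeta)(x)). \qquad (*)$$

For each $x \in B$, this implies that $(\xi - \zeta)(x) \in (n - \delta, n + \delta)$ for some $n \in \mathbb{Z}$. If $n > 0$ then

$$\frac{1}{4(\xi - \zeta)(x)}\,x \in B$$

and (*) implies that

$$0 < \cos(2\pi\delta) < \cos\left(2\pi(\xi - \zeta)\left(\frac{1}{4(\xi - \zeta)(x)}\,x\right)\right) = \cos\left(\frac{\pi}{2}\right) = 0,$$

which is a contradiction. A similar argument shows that we cannot have $n < 0$. Therefore $|(\xi - \zeta)(x)| < \delta$ for all $x \in B$, i.e. $|\xi - \zeta| < \delta$ and $\xi \in U$. This completes the proof.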

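Theorem 5.200 (Properties of the Fourier transform in vector spaces). The Fourier transform $\mathcal{F} : L^1(E) \to C_0(E^*)$ defined by

$$\mathcal{F}(f)(\xi) = \hat f(\xi) = \int_{x \in E} f(x)e^{-2\pi i\langle x, \xi\rangle}$$

is a continuous *-homomorphism with $|\mathcal{F}| \le 1$, and satisfies the following properties:

1. $\mathcal{F}(L^1(E))$ is dense in $C_0(E^*)$.

2. For all $x \in E$ and $\xi \in E^*$,

$$e^{2\pi i\langle x, \xi\rangle}\hat f(\xi) = \alpha_\xi(x)\hat f(\xi) = (f * \alpha_\xi)(x) = \widehat{L_{-x}f}(\xi).$$

3. For all $\xi \in E^*$ and $f \in \mathcal{L}^1(E)$,

$$\widehat{\alpha_\xi f} = L_{\alpha_\xi}\hat f.$$

4. If $f \in \mathcal{L}^1(E)$ and $\tau \in L(E)$ is invertible then $f \circ \tau \in \mathcal{L}^1(E)$ and

$$\widehat{f \circ \tau} = |\det(\tau^{-1})|\,\hat f \circ (\tau^*)^{-1},$$

where $\tau^* \in L(E^*)$ is the transpose of $\tau$ (defined in Theorem 1.76).

5. Let $p \ge 0$. If $f \in \mathcal{L}^1(E)$ and $(\zeta_1 \cdots \zeta_q)f \in \mathcal{L}^1(E)$ for all $\zeta_1, \dots, \zeta_q \in E^*$ and all $q \le p$, then $\hat f$ is of class $C^p$ and

$$D^p\hat f(\xi)(\zeta_1, \dots, \zeta_p) = \mathcal{F}((-2\pi i)^p(\zeta_1 \cdots \zeta_p)f)(\xi)$$

for all $\xi, \zeta_1, \dots, \zeta_p \in E^*$, where $\zeta_1 \cdots \zeta_p : E \to \mathbb{C}$ is defined by

$$(\zeta_1 \cdots \zeta_p)(x) = \langle x, \zeta_1\rangle \cdots \langle x, \zeta_p\rangle.$$

6. Let $p \ge 0$. If $f \in \mathcal{L}^1(E)$ is of class $C^p$, $D^qf \in \mathcal{L}^1(E)$ for all $q \le p$, and $D^qf \in C_0(E)$ for all $q \le p - 1$, then

$$\mathcal{F}(D^pf(\cdot)(u_1, \dots, u_p)) = (2\pi i)^p(u_1 \cdots u_p)\hat f$$

for all $u_1, \dots, u_p \in E$, where $u_1 \cdots u_p : E^* \to \mathbb{C}$ is defined by

$$(u_1 \cdots u_p)(\xi) = \langle u_1, \xi\rangle \cdots \langle u_p, \xi\rangle.$$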

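Proof. For (1), (2) and (3), see Theorem 5.178. For (4), Theorem 4.194 shows that $f \circ \tau \in \mathcal{L}^1(E)$ and

$$\begin{aligned}
\widehat{f \circ \tau}(\xi) &= \int_{x \in E} (f \circ \tau)(x)e^{-2\pi i\langle x, \xi\rangle} = |\det(\tau^{-1})|\int_{x \in E} f(x)e^{-2\pi i\langle \tau^{-1}x, \xi\rangle} \\
&= |\det(\tau^{-1})|\int_{x \in E} f(x)e^{-2\pi i\langle x, (\tau^*)^{-1}\xi\rangle} = |\det(\tau^{-1})|\,(\hat f \circ (\tau^*)^{-1})(\xi).
\end{aligned}$$

We prove (5) by induction. The result is clearly true for $p = 0$. Suppose that the result holds for $p - 1$. By Theorem 4.74,

$$\begin{aligned}
D^p\hat f(\xi)(\zeta_1, \dots, \zeta_p) &= DD^{p-1}\hat f(\xi)(\zeta_1, \dots, \zeta_{p-1})(\zeta_p) \\
&= D\mathcal{F}((-2\pi i)^{p-1}(\zeta_1 \cdots \zeta_{p-1})f)(\xi)(\zeta_p) \\
&= \int_{x \in E} (-2\pi i)^{p-1}(\zeta_1 \cdots \zeta_{p-1})(x)f(x)(-2\pi i)\langle x, \zeta_p\rangle e^{-2\pi i\langle x, \xi\rangle} \\
&= \int_{x \in E} (-2\pi i)^p(\zeta_1 \cdots \zeta_p)(x)f(x)e^{-2\pi i\langle x, \xi\rangle} \\
&= \mathcal{F}((-2\pi i)^p(\zeta_1 \cdots \zeta_p)f)(\xi).
\end{aligned}$$

For (6), we first assume that $E = \mathbb{R}$ and $p = 1$. For all $a, b \in \mathbb{R}$ and $\xi \in \mathbb{R}^*$ we can apply Corollary 2.13 to $f$ and $x \mapsto e^{-2\pi i\langle x, \xi\rangle}$, giving

$$f(b)e^{-2\pi i\langle b, \xi\rangle} - f(a)e^{-2\pi i\langle a, \xi\rangle} = \int_a^b f'(x)e^{-2\pi i\langle x, \xi\rangle}\,dx + \int_a^b f(x)(-2\pi i)\xi(1)e^{-2\pi i\langle x, \xi\rangle}\,dx.$$

Since $f \in C_0(\mathbb{R})$, taking $a \to -\infty$ and $b \to \infty$ gives

$$\mathcal{F}(f')(\xi) = 2\pi i\xi(1)\mathcal{F}(f)(\xi).$$

The general case follows by choosing a basis for $E$ and applying Theorem 2.33.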
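Properties (5) and (6) can be checked numerically in the one-dimensional case. The sketch below assumes $E = \mathbb{R}$, identifies $\xi \in \mathbb{R}^*$ with the real number $\xi(1)$, and uses the Gaussian $f(x) = e^{-\pi x^2}$, whose transform is $e^{-\pi\xi^2}$. The integration window, step count and function names are choices made for this example; the integral is approximated by a midpoint rule.

```python
import cmath
import math

def fourier(g, xi, lo=-8.0, hi=8.0, steps=4000):
    """Midpoint-rule approximation of the integral of g(x) * exp(-2*pi*i*x*xi) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += g(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * h

def f(x):
    return math.exp(-math.pi * x * x)

def f_prime(x):
    return -2 * math.pi * x * math.exp(-math.pi * x * x)

xi = 0.6
fhat = fourier(f, xi)

# Property (6) with p = 1: F(f')(xi) = 2*pi*i*xi * F(f)(xi).
lhs6 = fourier(f_prime, xi)
rhs6 = 2j * math.pi * xi * fhat

# Property (5) with p = 1: the derivative of hat f at xi equals F(-2*pi*i*x*f)(xi).
lhs5 = -2 * math.pi * xi * math.exp(-math.pi * xi * xi)   # derivative of exp(-pi*xi^2)
rhs5 = fourier(lambda x: -2j * math.pi * x * f(x), xi)
print(lhs6, rhs6, lhs5, rhs5)
```

The Gaussian is convenient here because both sides have closed forms, so the quadrature error is the only source of discrepancy.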


6 Probability theory

Notes

6.1 Probability, expectation and independence

Definition 6.1. A probability space (or sample space) is a positive measure space $(\Omega, \mathcal{M}, P)$ such that $P(\Omega) = 1$. We call $P$ a probability measure, and the elements of $\mathcal{M}$ are called events. The probability of an event $S \in \mathcal{M}$ is $P(S)$. In probability theory, we use the term almost surely (a.s.) instead of “almost everywhere”. For example, if $P(S) = 1$ then we say that $S$ occurs almost surely.

Random variables are traditionally functions into $\mathbb{R}^n$, but we will extend this slightly. Let $F$ be a Banach space over $\mathbb{K}$.

Definition 6.2. An ($F$-valued) random variable is a measurable function $X : \Omega \to F$. We say that $X$ is an $L^p$ random variable if $X \in \mathcal{L}^p(P, F)$. Let $(X_n)$ be a sequence of random variables.

• We say that $X_n$ converges to $X$ almost surely (a.s.) if $X_n \to X$ almost everywhere.

• We say that $X_n$ converges to $X$ in probability if $X_n \to X$ in measure.

If $X$ is an $L^1$ random variable, we define the expectation or mean of $X$ by

$$E(X) = E[X] = \int_\Omega X \in F.$$

Note that if $S \in \mathcal{M}$ then $E[\chi_S] = P(S)$.

Theorem 6.3 (Properties of expectation). Let $X, Y$ be $L^1$ random variables.

1. $E[X + Y] = E[X] + E[Y]$.

2. $E[rX] = rE[X]$ for all $r \in \mathbb{K}$.

3. For any Banach space $G$ and continuous linear map $f : F \to G$,

$$E[fX] = E[f \circ X] = fE[X].$$

4. $|E[X]| \le E[|X|]$.

Proof. $E$ is just the integration map $\int_\Omega : \mathcal{L}^1(P, F) \to F$.

Adjoint

Let $F, G$ be Hilbert spaces. Define the adjoint of a vector $x \in F$ to be the map $x^* : F \to \mathbb{K}$ given by $x^*(y) = \langle y, x\rangle$. This coincides with the usual definition for continuous linear maps (see Theorem 1.97) because we can naturally identify any vector $x \in F$ with the injective linear map $\alpha_x : \mathbb{K} \to F$ given by $r \mapsto rx$, and we are just defining $x^*$ to be $\alpha_x^*$.

If $X$ is an $F$-valued random variable, we define its adjoint $X^* : \Omega \to F^*$ by

$$X^*(\omega) = X(\omega)^*,$$

where $X(\omega)^*$ is the adjoint of $X(\omega)$ (i.e. the adjoint of $\alpha_{X(\omega)}$). Again, the adjoint satisfies

$$X^*y = \langle y, X\rangle$$

for all $y \in F$.

Theorem 6.4 (Properties of the adjoint). Let $F, G$ be Hilbert spaces and let $X, Y$ be $F$-valued random variables.

1. $(X + Y)^* = X^* + Y^*$ and $(rX)^* = \overline{r}X^*$ for all $r \in \mathbb{K}$.

2. $X^{**} = \tau X$, where $\tau : F \to F^{**}$ is the isometry defined in Lemma 1.54 (and is actually an isometric isomorphism because $F$ is a Hilbert space).

3. $(fX)^* = X^*f^*$ for all continuous linear maps $f : F \to G$. (More precisely, $(fX(\omega))^* = X(\omega)^*f^*$ for all $\omega \in \Omega$.)

4. $X$ is $L^p$ if and only if $X^*$ is $L^p$.

Proof. For (1) to (3), apply Theorem 1.98. For (4), note that $|X^*|^p = |X|^p$ and use Theorem 4.67.



Covariance and variance

Let $F, G$ be separable Hilbert spaces, let $X$ be an $F$-valued $L^2$ random variable, and let $Y$ be a $G$-valued $L^2$ random variable. We define the covariance between $X$ and $Y$ by

$$\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])^*] = \int_{\omega \in \Omega} (X(\omega) - E[X])(Y(\omega) - E[Y])^* \in L(G, F),$$

noting that $(X - E[X])(Y - E[Y])^*$ is $L^1$ due to Theorem 4.81. The variance of $X$ is the covariance between $X$ and itself, i.e.

$$\operatorname{Var}(X) = \operatorname{Cov}(X, X) = E[(X - E[X])(X - E[X])^*] \in L(F).$$

The covariance measures how much $X$ and $Y$ tend to change together while the variance measures how much $X$ tends to deviate from the mean $E[X]$: if $u \in F$ and $v \in G$ then

$$\langle \operatorname{Cov}(X, Y)v, u\rangle = \langle E[(X - E[X])(Y - E[Y])^*v], u\rangle = E\left[\langle X - E[X], u\rangle\overline{\langle Y - E[Y], v\rangle}\right].$$

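Theorem 6.5 (Properties of covariance). Let $F, G$ be separable Hilbert spaces. Let $X, Y, Z$ be $L^2$ random variables where $X, Z$ are $F$-valued and $Y$ is $G$-valued.

1. $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)^*$. In particular, if $X$ and $Y$ are $\mathbb{K}$-valued then $\operatorname{Cov}(X, Y) = \overline{\operatorname{Cov}(Y, X)}$.

2. Cov is sesquilinear:

$$\operatorname{Cov}(aX + bZ, Y) = a\operatorname{Cov}(X, Y) + b\operatorname{Cov}(Z, Y),$$

$$\operatorname{Cov}(Y, aX + bZ) = \overline{a}\operatorname{Cov}(Y, X) + \overline{b}\operatorname{Cov}(Y, Z)$$

for $a, b \in \mathbb{K}$.

3. $\operatorname{Cov}(X, Y) = E[XY^*] - E[X]E[Y]^*$.

4. $\operatorname{Cov}(fX + u, gY + v) = f\operatorname{Cov}(X, Y)g^*$, where $F', G'$ are Hilbert spaces, $f \in L(F, F')$, $g \in L(G, G')$, $u \in F'$ and $v \in G'$.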



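Proof. (1) and (2) are obvious. For (3),

$$\begin{aligned}
\operatorname{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])^*] \\
&= E[XY^* - XE[Y]^* - E[X]Y^* + E[X]E[Y]^*] \\
&= E[XY^*] - E[X]E[Y]^* - E[X]E[Y]^* + E[X]E[Y]^* \\
&= E[XY^*] - E[X]E[Y]^*.
\end{aligned}$$

For (4), first note that

$$\begin{aligned}
\operatorname{Cov}(fX + u, gY) &= \operatorname{Cov}(fX, gY) + \operatorname{Cov}(u, gY) \\
&= \operatorname{Cov}(fX, gY) + E[u(gY)^*] - uE[gY]^* \\
&= \operatorname{Cov}(fX, gY)
\end{aligned}$$

and similarly $\operatorname{Cov}(fX + u, gY + v) = \operatorname{Cov}(fX + u, gY) = \operatorname{Cov}(fX, gY)$. Then we have

$$\begin{aligned}
\operatorname{Cov}(fX, gY) &= E[f(X - E[X])(g(Y - E[Y]))^*] \\
&= fE[(X - E[X])(Y - E[Y])^*]g^* \\
&= f\operatorname{Cov}(X, Y)g^*.
\end{aligned}$$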
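To make the operator-valued covariance concrete, the sketch below works on a small finite probability space with $X$ taking values in $\mathbb{R}^2$ and $Y$ in $\mathbb{R}^3$, so that $\operatorname{Cov}(X, Y)$ is a $2 \times 3$ matrix. The outcomes, probabilities and helper names are invented for this illustration, and real scalars are assumed so that adjoints are transposes. Exact rational arithmetic lets properties (1) and (3) be compared without rounding.

```python
from fractions import Fraction

# Hypothetical finite probability space with three equally likely outcomes.
# X takes values in R^2 and Y in R^3, stored as tuples of Fractions.
P = [Fraction(1, 3)] * 3
X = [(Fraction(1), Fraction(0)), (Fraction(2), Fraction(1)), (Fraction(0), Fraction(5))]
Y = [(Fraction(1), Fraction(1), Fraction(0)),
     (Fraction(0), Fraction(2), Fraction(1)),
     (Fraction(3), Fraction(0), Fraction(1))]

def mean(V):
    """Componentwise expectation of a vector-valued random variable."""
    return tuple(sum(p * v[i] for p, v in zip(P, V)) for i in range(len(V[0])))

def outer(x, y):
    """Matrix of x y^* for real vectors: entry (i, j) is x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]

def expect_matrix(mats):
    """Entrywise expectation of a matrix-valued random variable."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(p * m[i][j] for p, m in zip(P, mats)) for j in range(cols)]
            for i in range(rows)]

def sub(x, y):
    return tuple(a - b for a, b in zip(x, y))

def transpose(m):
    return [list(col) for col in zip(*m)]

EX, EY = mean(X), mean(Y)

# Definition: Cov(X, Y) = E[(X - E[X]) (Y - E[Y])^*]
cov_xy = expect_matrix([outer(sub(x, EX), sub(y, EY)) for x, y in zip(X, Y)])
cov_yx = expect_matrix([outer(sub(y, EY), sub(x, EX)) for x, y in zip(X, Y)])

# Property (3): Cov(X, Y) = E[X Y^*] - E[X] E[Y]^*
EXY = expect_matrix([outer(x, y) for x, y in zip(X, Y)])
EXEY = outer(EX, EY)
cov_formula = [[EXY[i][j] - EXEY[i][j] for j in range(3)] for i in range(2)]
print(cov_xy)
```

Here `cov_xy` represents a linear map from $\mathbb{R}^3$ to $\mathbb{R}^2$, matching $\operatorname{Cov}(X, Y) \in L(G, F)$.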

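Theorem 6.6 (Properties of variance). Let $F$ be a separable Hilbert space and let $X$ be an $F$-valued $L^2$ random variable.

1. $\operatorname{Var}(X)$ is a positive trace class operator.

2. $\operatorname{tr}\operatorname{Var}(X) = |\operatorname{Var}(X)|_1 = E[|X - E[X]|^2]$.

3. $|\operatorname{Var}(X)| \le E[|X - E[X]|^2]$.

4. $\operatorname{Var}(X) = 0$ if and only if $X$ is constant almost surely.

5. $\operatorname{Var}(aX) = |a|^2\operatorname{Var}(X)$ for $a \in \mathbb{K}$.

6. $\operatorname{Var}(fX + u) = f\operatorname{Var}(X)f^*$ for $f \in L(F, G)$ and $u \in G$.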

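Proof. We have

$$\operatorname{Var}(X)^* = \operatorname{Cov}(X, X)^* = \operatorname{Cov}(X, X) = \operatorname{Var}(X),$$

which shows that $\operatorname{Var}(X)$ is self-adjoint. It is positive because for all $v \in F$,

$$\begin{aligned}
\langle \operatorname{Var}(X)v, v\rangle &= E[\langle (X - E[X])(X - E[X])^*v, v\rangle] \\
&= E[\langle \langle v, X - E[X]\rangle(X - E[X]), v\rangle] \\
&= E[|\langle X - E[X], v\rangle|^2] \\
&\ge 0.
\end{aligned}$$

Let $(u_n)$ be a Hilbert basis for $F$. Since $X - E[X]$ is $L^2$,

$$\begin{aligned}
E[|X - E[X]|^2] &= E\left[\sum_{n=1}^\infty |\langle X - E[X], u_n\rangle|^2\right] \\
&= \sum_{n=1}^\infty E[|\langle X - E[X], u_n\rangle|^2] \\
&= \sum_{n=1}^\infty \langle \operatorname{Var}(X)u_n, u_n\rangle
\end{aligned}$$

converges. By Corollary 5.166, $\operatorname{Var}(X)$ is a trace class operator. This proves (1). (2) follows from Theorem 5.167, and (3) follows from Theorem 5.164. (4) follows directly from (2). (5) and (6) follow from Theorem 6.5.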

Inner product and norm

If $X$ and $Y$ are both $F$-valued, we can define their inner product by

$$\langle X, Y\rangle_v = E[\langle X - E[X], Y - E[Y]\rangle] \in \mathbb{K}.$$

The norm of $X$ is

$$\|X\|_v = \sqrt{\langle X, X\rangle_v} = \sqrt{E[|X - E[X]|^2]} \ge 0.$$

Theorem 6.6 states that $|\operatorname{Var}(X)|_1 = \|X\|_v^2$, so if $\|X\|_v = 0$ then $X = E[X]$ almost surely. Usually the inner product and Cov are very different: the inner product produces scalar values, and Cov produces linear maps. However, the inner product and Cov are equivalent when $F = \mathbb{K}$, where we simply have $\langle X, Y\rangle_v = \operatorname{Cov}(X, Y)(1)$.

Recall that the space of $L^2$ random variables $L^2(\Omega, P, F)$ is a Hilbert space under the inner product

$$\langle [f], [g]\rangle_P = \int_\Omega \langle f, g\rangle,$$

where $f, g \in \mathcal{L}^2(\Omega, P, F)$. It is easy to check that $\langle \cdot, \cdot\rangle_v$ is an inner product on $L_v(\Omega, P, F) = L^2(\Omega, P, F)/L_c(\Omega, P, F)$, where $L_c(\Omega, P, F)$ is the set of all $F$-valued random variables that are constant almost surely, and that $L_v(\Omega, P, F)$ is a Hilbert space under $\langle \cdot, \cdot\rangle_v$. We can identify $L_v(\Omega, P, F)$ with the space of zero-expectation random variables ($X \in L^2(\Omega, P, F)$ such that $E[X] = 0$).

Since $\langle \cdot, \cdot\rangle_v$ is an inner product, the Cauchy-Schwarz inequality (1.6) gives the bound

$$|\langle X, Y\rangle_v| \le \|X\|_v\|Y\|_v.$$



Theorem 6.7 (Chebyshev’s inequality). For any $L^2$ random variable $X$ and $a > 0$ we have

$$P\{|X| \ge a\} \le \frac{E[|X|^2]}{a^2}.$$

In particular,

$$P\{|X - E[X]| \ge a\} \le \frac{\|X\|_v^2}{a^2}.$$

Proof. We have $a^2\chi_{\{|X| \ge a\}} \le |X|^2$, so taking expectations gives

$$P\{|X| \ge a\} = \frac{1}{a^2}E[a^2\chi_{\{|X| \ge a\}}] \le \frac{E[|X|^2]}{a^2}.$$
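Chebyshev's inequality can be verified exactly on a small example. The sketch below assumes a fair six-sided die as the probability space, with $X$ the face value; the helper names are chosen for this illustration. It compares the tail probability $P\{|X - E[X]| \ge a\}$ with the bound $\|X\|_v^2/a^2$ for a few values of $a$, using exact fractions.

```python
from fractions import Fraction

# Hypothetical probability space: a fair six-sided die, X = face value.
outcomes = range(1, 7)
prob = {w: Fraction(1, 6) for w in outcomes}

def expectation(g):
    """E[g(X)] for the die."""
    return sum(prob[w] * g(w) for w in outcomes)

mean = expectation(lambda w: Fraction(w))
variance = expectation(lambda w: (w - mean) ** 2)   # equals ||X||_v^2 for scalar X

def tail(a):
    """P{|X - E[X]| >= a}."""
    return sum(prob[w] for w in outcomes if abs(w - mean) >= a)

# For each threshold a: (tail probability, Chebyshev bound).
bounds = {a: (tail(a), variance / a ** 2) for a in (1, 2, Fraction(5, 2))}
for a, (t, b) in bounds.items():
    print(a, t, b)
```

The bound is loose for small $a$ (it exceeds 1 when $a = 1$) and only becomes informative once $a$ is larger than the standard deviation.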

Distribution and independence

An $F$-valued random variable $X$ induces an image measure $P_X$ on $F$ with its Borel $\sigma$-algebra (see Theorem 4.26), given by

$$P_X(S) = P(X^{-1}(S)).$$

We call $P_X$ the distribution (or law) of $X$. A collection $\{X_\alpha\}_{\alpha \in A}$ of $F$-valued random variables is identically distributed if $P_{X_\alpha} = P_{X_\beta}$ for all $\alpha, \beta \in A$. If $X_i$ is an $F_i$-valued random variable for $i = 1, \dots, n$ then we can consider $(X_1, \dots, X_n)$ as an $(F_1 \times \cdots \times F_n)$-valued random variable, and we call $P_{(X_1, \dots, X_n)}$ the joint distribution of $X_1, \dots, X_n$.

Theorem 6.8 (Expectation rule). Let $X$ be an $F$-valued random variable, let $U \subseteq X(\Omega)$ be measurable and let $f : U \to G$. Then $f \in \mathcal{L}^1(P_X)$ if and only if $f$ is $P_X$-measurable and $f \circ X \in \mathcal{L}^1(P)$. In that case,

$$\int_\Omega f \circ X\,dP = E[f(X)] = \int_U f\,dP_X.$$

Proof. See Theorem 4.69.

Corollary 6.9. Let $F$ be a separable Banach space and let $X$ be an $F$-valued random variable.

1. $X \in \mathcal{L}^1(P)$ if and only if $\mathrm{Id}_F \in \mathcal{L}^1(P_X)$. In that case,

$$E[X] = \int_{x \in F} x\,dP_X.$$

2. If $F$ is a Hilbert space, then $X \in \mathcal{L}^2(P)$ if and only if $\mathrm{Id}_F \in \mathcal{L}^2(P_X)$. In that case,

$$\operatorname{Var}(X) = \int_{x \in F} (x - E[X])(x - E[X])^*\,dP_X.$$

Proof. This follows directly from Theorem 6.8, noting that $\mathrm{Id}_F$ is always $P_X$-measurable due to Theorem 4.39 and the fact that $F$ is separable.

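Two events $S_1, S_2 \in \mathcal{M}$ are independent if $P(S_1 \cap S_2) = P(S_1)P(S_2)$. More generally, we say that a collection $\{S_\alpha\}_{\alpha \in A}$ of events in $\mathcal{M}$ is independent if

$$P\left(\bigcap_{i=1}^n S_{\alpha_i}\right) = \prod_{i=1}^n P(S_{\alpha_i})$$

for all finite subsets $\{\alpha_1, \dots, \alpha_n\}$ of $A$.

Let $\{X_\alpha\}_{\alpha \in A}$ be a collection of random variables where each $X_\alpha$ is $F_\alpha$-valued. We say that $\{X_\alpha\}_{\alpha \in A}$ is independent if $\{X_\alpha^{-1}(S_\alpha)\}_{\alpha \in A}$ is independent for every collection of Borel sets $\{S_\alpha\}_{\alpha \in A}$ where $S_\alpha \subseteq F_\alpha$.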

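Theorem 6.10 (Properties of independent random variables). Let $\{F_\alpha\}_{\alpha \in A}$ be a collection of separable Banach spaces, and let $\{X_\alpha\}_{\alpha \in A}$ be a collection of random variables where each $X_\alpha$ is $F_\alpha$-valued.

1. $\{X_\alpha\}_{\alpha \in A}$ is independent if and only if

$$P_{(X_{\alpha_1}, \dots, X_{\alpha_n})} = P_{X_{\alpha_1}} \otimes \cdots \otimes P_{X_{\alpha_n}}$$

for every finite subset $\{\alpha_1, \dots, \alpha_n\}$ of $A$ (see Theorem 4.116).

2. If $\{X_\alpha\}_{\alpha \in A}$ is independent and $\{A_\beta\}_{\beta \in B}$ is a partition of $A$ where each $A_\beta$ is finite and $A_\beta = \{\alpha_{\beta,1}, \dots, \alpha_{\beta,n_\beta}\}$, then

$$\{(X_{\alpha_{\beta,1}}, \dots, X_{\alpha_{\beta,n_\beta}})\}_{\beta \in B}$$

is independent.

3. If $\{X_\alpha\}_{\alpha \in A}$ is independent, $\{G_\alpha\}_{\alpha \in A}$ is a collection of separable Banach spaces and $f_\alpha : F_\alpha \to G_\alpha$ is measurable for each $\alpha \in A$, then

$$\{f_\alpha(X_\alpha)\}_{\alpha \in A}$$

is independent.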



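Proof. For (1) we first check that the statement makes sense, since $P_{(X_{\alpha_1}, \dots, X_{\alpha_n})}$ is a measure on $\mathcal{B}_{\prod_{i=1}^n F_{\alpha_i}}$ while $P_{X_{\alpha_1}} \otimes \cdots \otimes P_{X_{\alpha_n}}$ is a measure on $\bigotimes_{i=1}^n \mathcal{B}_{F_{\alpha_i}}$. Every $F_\alpha$ is a separable Banach space and is therefore second countable, so Theorem 4.6 shows that $\mathcal{B}_{\prod_{i=1}^n F_{\alpha_i}} = \bigotimes_{i=1}^n \mathcal{B}_{F_{\alpha_i}}$. If $S_{\alpha_i}$ is a Borel subset of $F_{\alpha_i}$ for each $i$, then

$$P\left(\bigcap_{i=1}^n X_{\alpha_i}^{-1}(S_{\alpha_i})\right) = P\left((X_{\alpha_1}, \dots, X_{\alpha_n})^{-1}(S_{\alpha_1} \times \cdots \times S_{\alpha_n})\right) = P_{(X_{\alpha_1}, \dots, X_{\alpha_n})}(S_{\alpha_1} \times \cdots \times S_{\alpha_n})$$

and

$$\prod_{i=1}^n P(X_{\alpha_i}^{-1}(S_{\alpha_i})) = \prod_{i=1}^n P_{X_{\alpha_i}}(S_{\alpha_i}) = (P_{X_{\alpha_1}} \otimes \cdots \otimes P_{X_{\alpha_n}})(S_{\alpha_1} \times \cdots \times S_{\alpha_n}).$$

This shows that $P\left(\bigcap_{i=1}^n X_{\alpha_i}^{-1}(S_{\alpha_i})\right) = \prod_{i=1}^n P(X_{\alpha_i}^{-1}(S_{\alpha_i}))$ for all possible $S_{\alpha_i}$ if and only if $P_{(X_{\alpha_1}, \dots, X_{\alpha_n})} = P_{X_{\alpha_1}} \otimes \cdots \otimes P_{X_{\alpha_n}}$, which proves (1). For (2), let $S_\alpha$ be a Borel subset of $F_\alpha$ for each $\alpha$, let $\{\beta_1, \dots, \beta_n\}$ be a finite subset of $B$, and write $X_i = (X_{\alpha_{\beta_i,1}}, \dots, X_{\alpha_{\beta_i,n_{\beta_i}}})$ and $S_i = S_{\alpha_{\beta_i,1}} \times \cdots \times S_{\alpha_{\beta_i,n_{\beta_i}}}$ for each $i$. Then

$$\begin{aligned}
P_{(X_1, \dots, X_n)}(S_1 \times \cdots \times S_n) &= P_{(X_{\alpha_{\beta_1,1}}, \dots, X_{\alpha_{\beta_n,n_{\beta_n}}})}(S_{\alpha_{\beta_1,1}} \times \cdots \times S_{\alpha_{\beta_n,n_{\beta_n}}}) \\
&= \prod_{i=1}^n \prod_{j=1}^{n_{\beta_i}} P_{X_{\alpha_{\beta_i,j}}}(S_{\alpha_{\beta_i,j}}) \\
&= \prod_{i=1}^n P_{X_i}(S_i).
\end{aligned}$$

For (3), let $S_\alpha$ be a Borel subset of $G_\alpha$ for each $\alpha$, let $\{\alpha_1, \dots, \alpha_n\}$ be a finite subset of $A$, and write $Y_i = f_{\alpha_i}(X_{\alpha_i})$ for each $i$. Then

$$\begin{aligned}
P_{(Y_1, \dots, Y_n)}(S_{\alpha_1} \times \cdots \times S_{\alpha_n}) &= P\left(\bigcap_{i=1}^n Y_i^{-1}(S_{\alpha_i})\right) = P\left(\bigcap_{i=1}^n X_{\alpha_i}^{-1}(f_{\alpha_i}^{-1}(S_{\alpha_i}))\right) \\
&= \prod_{i=1}^n P(X_{\alpha_i}^{-1}(f_{\alpha_i}^{-1}(S_{\alpha_i}))) = \prod_{i=1}^n P(Y_i^{-1}(S_{\alpha_i})) \\
&= \prod_{i=1}^n P_{Y_i}(S_{\alpha_i}).
\end{aligned}$$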



Theorem 6.11. Let $F_1, \dots, F_n$ be separable Banach spaces and let $G$ be a separable Banach space. Let $X_1, \dots, X_n$ be independent $L^1$ random variables where each $X_i$ is $F_i$-valued. If $f : F_1 \times \cdots \times F_n \to G$ is a continuous multilinear map, then $f(X_1, \dots, X_n)$ is $L^1$ and

$$E[f(X_1, \dots, X_n)] = f(E[X_1], \dots, E[X_n]).$$

Proof. We can assume that $n = 2$ because the general case follows by induction. Note that $f$ is $(P_{X_1} \otimes P_{X_2})$-measurable because $G$ is separable, $x_2 \mapsto f(x_1, x_2)$ is in $\mathcal{L}^1(P_{X_2}, G)$ for every $x_1 \in F_1$ because $X_2$ is $L^1$, and

$$x_1 \mapsto \int_{x_2 \in F_2} |f(x_1, x_2)|\,dP_{X_2} \le |f||x_1|\int_{x_2 \in F_2} |x_2|\,dP_{X_2}$$

is in $\mathcal{L}^1(P_{X_1}, \mathbb{R})$ because $X_1$ is $L^1$. Fubini's theorem shows that $f \in \mathcal{L}^1(P_{X_1} \otimes P_{X_2}, G)$, and since $X_1$ and $X_2$ are independent, Theorem 6.10 shows that $P_{(X_1, X_2)} = P_{X_1} \otimes P_{X_2}$. Therefore

$$\begin{aligned}
E[f(X_1, X_2)] &= \int_{x_1 \in F_1}\int_{x_2 \in F_2} f(x_1, x_2)\,dP_{X_2}\,dP_{X_1} \\
&= \int_{x_1 \in F_1} f\left(x_1, \int_{x_2 \in F_2} x_2\,dP_{X_2}\right)dP_{X_1} \\
&= f\left(\int_{x_1 \in F_1} x_1\,dP_{X_1}, \int_{x_2 \in F_2} x_2\,dP_{X_2}\right) = f(E[X_1], E[X_2]).
\end{aligned}$$

Corollary 6.12. Let $F, G$ be separable Hilbert spaces and let $X, Y$ be independent $L^1$ random variables where $X$ is $F$-valued and $Y$ is $G$-valued. Then $XY^*$ is $L^1$ and

$$E[XY^*] = E[X]E[Y]^*.$$

If $X$ and $Y$ are $L^2$, then

$$\operatorname{Cov}(X, Y) = 0.$$

Proof. Take $f(x, y) = xy^*$ in Theorem 6.11. The second statement follows from Theorem 6.5.

Corollary 6.13. Let $F$ be a separable Hilbert space and let $X_1, \dots, X_n$ be independent $F$-valued $L^2$ random variables. Then

$$\operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n).$$



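Proof. Note that $\operatorname{Cov}(X_i, X_j) = 0$ when $i \ne j$ (see Corollary 6.12). We have

$$\begin{aligned}
\operatorname{Var}(X_1 + \cdots + X_n) &= \operatorname{Cov}(X_1 + \cdots + X_n, X_1 + \cdots + X_n) \\
&= \sum_{i=1}^n \sum_{j=1}^n \operatorname{Cov}(X_i, X_j) \\
&= \sum_{i=1}^n \operatorname{Cov}(X_i, X_i) \\
&= \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n).
\end{aligned}$$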

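Theorem 6.14. Let $F$ be a finite-dimensional Banach space and let $X_1, \dots, X_n$ be independent $F$-valued random variables. Then

$$P_{X_1 + \cdots + X_n} = P_{X_1} * \cdots * P_{X_n}.$$

Proof. Define $\rho : F^n \to F$ by $\rho(x_1, \dots, x_n) = x_1 + \cdots + x_n$, and let $E$ be a Borel subset of $F$. By Theorem 5.181 and Theorem 6.10,

$$\begin{aligned}
(P_{X_1} * \cdots * P_{X_n})(E) &= \int_{(x_1, \dots, x_n) \in F^n} \chi_E(x_1 + \cdots + x_n)\,d(P_{X_1} \otimes \cdots \otimes P_{X_n}) \\
&= \int_{(x_1, \dots, x_n) \in F^n} \chi_E(x_1 + \cdots + x_n)\,dP_{(X_1, \dots, X_n)} \\
&= \int_{(x_1, \dots, x_n) \in F^n} \chi_E \circ \rho\,dP_{(X_1, \dots, X_n)} \\
&= P_{(X_1, \dots, X_n)}(\rho^{-1}(E)) \\
&= P_{X_1 + \cdots + X_n}(E).
\end{aligned}$$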

So far, we have assumed that all of the random variables we work with are functions on the same probability space. Adding two random variables $X_1 : \Omega_1 \to F$ and $X_2 : \Omega_2 \to F$ defined on different probability spaces does not make sense, just as the pointwise addition of $f : A \to \mathbb{R}$ and $g : B \to \mathbb{R}$ does not make sense if $A \ne B$. However, we can form the product probability space $\Omega_1 \times \Omega_2$ and consider the lifted random variables $X_1' = X_1 \circ \pi_1$ and $X_2' = X_2 \circ \pi_2$, where $\pi_i : \Omega_1 \times \Omega_2 \to \Omega_i$ are the canonical projections. In this product space, $X_1'$ and $X_2'$ are independent. In many situations we start with independent sources of randomness and combine them into a single probability space using this method. Using Theorem 4.165, it is possible to extend this method to an arbitrary, possibly uncountable collection of probability spaces.
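The product-space construction, Theorem 6.14 and Corollary 6.13 can be seen together on a finite example. The sketch below assumes two small probability spaces (a fair die and a three-point space), builds their product, lifts each random variable to the product through the coordinate maps, and compares the law of the sum with the convolution of the individual laws. All spaces, values and function names are invented for this illustration, and the random variables are integer-valued.

```python
from fractions import Fraction
from itertools import product

# Two hypothetical finite probability spaces, each given as {outcome: probability}.
P1 = {w: Fraction(1, 6) for w in range(1, 7)}                    # fair die
P2 = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def X1(w1):
    return w1                              # face value of the die

def X2(w2):
    return {"a": 0, "b": 1, "c": 3}[w2]

# Product probability space and the random variables lifted to it.
P = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}

def X1_lift(w):
    return X1(w[0])

def X2_lift(w):
    return X2(w[1])

def law(Z, prob):
    """Distribution of Z: value -> probability of the preimage of that value."""
    out = {}
    for w, p in prob.items():
        out[Z(w)] = out.get(Z(w), Fraction(0)) + p
    return out

def convolve(mu, nu):
    """(mu * nu)({s}) = sum over x + y = s of mu({x}) * nu({y})."""
    out = {}
    for (x, p), (y, q) in product(mu.items(), nu.items()):
        out[x + y] = out.get(x + y, Fraction(0)) + p * q
    return out

def variance(Z, prob):
    mean = sum(p * Z(w) for w, p in prob.items())
    return sum(p * (Z(w) - mean) ** 2 for w, p in prob.items())

def total(w):
    return X1_lift(w) + X2_lift(w)

law_sum = law(total, P)
conv = convolve(law(X1, P1), law(X2, P2))
print(law_sum == conv, variance(total, P))
```

Because the product measure is used, the lifted variables are independent by construction, which is what makes both the convolution identity and the additivity of variance hold here.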



Characteristic functions

6.2 Linear models

a


Bibliography

[1] Sterling K. Berberian. Notes on Spectral Theory. 2009.

[2] Jean Dieudonne. Foundations of Modern Analysis. Academic Press, 1969.

[3] Herbert Federer. Geometric Measure Theory. Springer-Verlag, 1969.

[4] Gerald B. Folland. A Course in Abstract Harmonic Analysis. CRC Press,1995.

[5] Gerald B. Folland. Real Analysis: Modern Techniques and Their Applica-tions. 2nd. John Wiley & Sons, 1999.

[6] Robert Kent Goodrich. “The spectral theorem for real Hilbert space”. In:Acta Sci. Math. 33 (1972), pp. 123–127.

[7] Alfred Horn. “On the singular values of a product of completely continuousoperators”. In: Proc. Natl. Acad. Sci. U.S.A. 36.7 (1950), pp. 374–375.

[8] Eberhard Kaniuth. A Course in Commutative Banach Algebras. Springer,2009.

[9] Serge Lang. Real and Functional Analysis. 3rd. Springer, 1993.

[10] Ivan Netuka. “The Change-of-Variables Theorem for the Lebesgue Integral”.In: Acta Universitatis Matthiae Belii 19 (2011), pp. 37–42.

[11] Marc A. Rieffel. Lectures on Bochner Lebesgue Integration. 1970.

[12] Walter Rudin. Fourier Analysis on Groups. Interscience Publishers, 1962.

[13] Walter Rudin. Functional Analysis. McGraw-Hill, 1973.

[14] Barry Simon. Trace Ideals and Their Applications. 2nd. American Mathe-matical Society, 2005.

[15] Karl Stromberg. “A Note on the Convolution of Regular Measures”. In:Mathematica Scandinavica 7 (1959), pp. 347–352.

[16] Sergei Winitzki. Linear Algebra via Exterior Products. lulu.com, 2010.
