
MATH 3210 Metric spaces

University of Leeds, School of Mathematics

November 29, 2017

Syllabus:
1. Definition and fundamental properties of a metric space. Open sets, closed sets, closure and interior. Convergence of sequences. Continuity of mappings. (6)
2. Real inner-product spaces, orthonormal sequences, perpendicular distance to a subspace, applications in approximation theory. (7)
3. Cauchy sequences, completeness of $\mathbb{R}$ with the standard metric; uniform convergence and completeness of $C[a, b]$ with the uniform metric. (3)
4. The contraction mapping theorem, with applications in the solution of equations and differential equations. (5)
5. Connectedness and path-connectedness. Introduction to compactness and sequential compactness, including subsets of $\mathbb{R}^n$. (6)

LECTURE 1

Books:
Victor Bryant, Metric spaces: iteration and application, Cambridge, 1985.
M. Ó Searcóid, Metric Spaces, Springer Undergraduate Mathematics Series, 2006.
D. Kreider, An introduction to linear analysis, Addison-Wesley, 1966.

1 Metrics, open and closed sets

We want to generalise the idea of distance between two points in the real line, given by

$$d(x, y) = |x - y|,$$

and the distance between two points in the plane, given by

$$d(\mathbf{x}, \mathbf{y}) = d((x_1, x_2), (y_1, y_2)) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2},$$

to other settings.

[DIAGRAM]

This will include the ideas of distances between functions, for example.


1.1 Definition

Let $X$ be a non-empty set. A metric on $X$, or distance function, associates to each pair of elements $x, y \in X$ a real number $d(x, y)$ such that, for all $x, y, z \in X$:
(i) $d(x, y) \ge 0$, and $d(x, y) = 0 \iff x = y$ (positive definite);
(ii) $d(x, y) = d(y, x)$ (symmetric);
(iii) $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

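Examples:

(i) $X = \mathbb{R}$. The standard metric is given by $d(x, y) = |x - y|$. There are many other metrics on $\mathbb{R}$, for example

$$d(x, y) = |e^x - e^y|;$$

$$d(x, y) = \begin{cases} |x - y| & \text{if } |x - y| \le 1, \\ 1 & \text{if } |x - y| \ge 1. \end{cases}$$

Let $X$ be any set whatsoever; then we can define

$$d(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y, \end{cases}$$

(the discrete metric).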

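(ii) $X = \mathbb{R}^2$. The standard metric is the Euclidean metric: if $\mathbf{x} = (x_1, x_2)$ and $\mathbf{y} = (y_1, y_2)$ then

$$d_2(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}.$$

This is linked to the inner product (scalar product) $\mathbf{x}.\mathbf{y} = x_1 y_1 + x_2 y_2$, since it is just $\sqrt{(\mathbf{x} - \mathbf{y}).(\mathbf{x} - \mathbf{y})}$. We will study inner products more carefully later, so for the moment we won’t prove the (well-known) fact that it is indeed a metric.

Other possible metrics include

$$d_\infty(\mathbf{x}, \mathbf{y}) = \max\{|x_1 - y_1|, |x_2 - y_2|\}.$$

Let’s check the axioms. In fact (i) and (ii) are easy (i.e., the distance is positive definite and symmetric); for (iii) let’s write $|x_1 - y_1| = p$, $|x_2 - y_2| = q$, $|y_1 - z_1| = r$ and $|y_2 - z_2| = s$. Then $|x_1 - z_1| \le p + r$ and $|x_2 - z_2| \le q + s$; so

$$d_\infty(\mathbf{x}, \mathbf{z}) = \max\{|x_1 - z_1|, |x_2 - z_2|\} \le \max\{p + r, q + s\} \le \max\{p, q\} + \max\{r, s\} = d_\infty(\mathbf{x}, \mathbf{y}) + d_\infty(\mathbf{y}, \mathbf{z}),$$

where the last inequality holds by inspection, since $p + r$ and $q + s$ are each at most $\max\{p, q\} + \max\{r, s\}$.

Another metric on $\mathbb{R}^2$ is $d_1(\mathbf{x}, \mathbf{y}) = |x_1 - y_1| + |x_2 - y_2|$. These metrics are all translation-invariant (i.e., $d(\mathbf{x} + \mathbf{z}, \mathbf{y} + \mathbf{z}) = d(\mathbf{x}, \mathbf{y})$) and homogeneous (i.e., $d(k\mathbf{x}, k\mathbf{y}) = |k|\, d(\mathbf{x}, \mathbf{y})$).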


(iii) Take $X = C[a, b]$. Here are three metrics:

$$d_2(f, g) = \sqrt{\int_a^b (f(x) - g(x))^2 \, dx}.$$

Again, this is linked to the idea of an inner product, so we will delay proving that it is a metric.

$$d_1(f, g) = \int_a^b |f(x) - g(x)| \, dx,$$

the area between two curves [DIAGRAM].

$$d_\infty(f, g) = \max\{|f(x) - g(x)| : a \le x \le b\},$$

the maximum separation between two curves [DIAGRAM].

Example: on $C[0, 1]$ take $f(x) = x$ and $g(x) = x^2$ and calculate

$$d_2(f, g) = \left( \int_0^1 (x - x^2)^2 \, dx \right)^{1/2} = \sqrt{1/30},$$

$$d_1(f, g) = \int_0^1 |x - x^2| \, dx = 1/6, \quad \text{and}$$

$$d_\infty(f, g) = \max_{x \in [0,1]} |x - x^2| = 1/4.$$
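The three values in this example can be checked numerically. The following is only a rough sketch: the helper names (`integrate`, `d1`, `d2`, `d_inf`) are made up for illustration, the integrals are approximated by a midpoint rule, and the maximum is taken over a finite grid rather than over all of $[0, 1]$.

```python
import math

def integrate(h, a, b, n=20000):
    """Composite midpoint rule for the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def d1(f, g, a, b):
    # Area between the two curves.
    return integrate(lambda x: abs(f(x) - g(x)), a, b)

def d2(f, g, a, b):
    # Square root of the integral of the squared difference.
    return math.sqrt(integrate(lambda x: (f(x) - g(x)) ** 2, a, b))

def d_inf(f, g, a, b, n=20000):
    # Maximum over a uniform grid: only an approximation to the true maximum.
    grid = (a + i * (b - a) / n for i in range(n + 1))
    return max(abs(f(x) - g(x)) for x in grid)

f = lambda x: x
g = lambda x: x * x

print(d1(f, g, 0, 1), 1 / 6)
print(d2(f, g, 0, 1), math.sqrt(1 / 30))
print(d_inf(f, g, 0, 1), 1 / 4)
```

Each printed pair should agree to several decimal places, which is consistent with (but of course does not prove) the exact values computed by hand.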

1.2 Definition

A set $X$ together with a metric $d$ is called a metric space, sometimes written $(X, d)$. If $A \subseteq X$ then we can use $d$ to measure distances between points of $A$, and $(A, d)$ is also a metric space, called a subspace of $(X, d)$.

LECTURE 2

Examples:
1. The interval $[a, b]$ with $d(x, y) = |x - y|$ is a subspace of $\mathbb{R}$.
2. The unit circle $\{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ with $d(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$ is a subspace of $\mathbb{R}^2$.
3. The space of polynomials $P$ is a metric space with any of the metrics inherited from $C[a, b]$ above.

1.3 Definition


Let $(X, d)$ be a metric space, let $x \in X$ and let $r > 0$. The open ball centred at $x$, with radius $r$, is the set

$$B(x, r) = \{y \in X : d(x, y) < r\},$$

and the closed ball is the set

$$B[x, r] = \{y \in X : d(x, y) \le r\}.$$

Note that in $\mathbb{R}$ with the usual metric the open ball is $B(x, r) = (x - r, x + r)$, an open interval, and the closed ball is $B[x, r] = [x - r, x + r]$, a closed interval.

For the $d_2$ metric on $\mathbb{R}^2$, the unit ball $B(\mathbf{0}, 1)$ is the disc centred at the origin, excluding the boundary. You may like to think about what you get for other metrics on $\mathbb{R}^2$.
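One way to experiment with the unit balls of the other metrics on $\mathbb{R}^2$ is to test a few sample points for membership. This is a small sketch; the function names and the sample points are chosen here purely for illustration.

```python
import math

def d1(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d2(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

def d_inf(x, y):
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def in_open_ball(d, centre, r, y):
    """True when y lies in the open ball B(centre, r) for the metric d."""
    return d(centre, y) < r

origin = (0.0, 0.0)
metrics = [("d1", d1), ("d2", d2), ("d_inf", d_inf)]
for point in [(0.4, 0.4), (0.6, 0.6), (0.9, 0.9)]:
    membership = {name: in_open_ball(d, origin, 1.0, point) for name, d in metrics}
    print(point, membership)
```

The point $(0.6, 0.6)$ lies in the $d_2$ and $d_\infty$ unit balls but not in the $d_1$ unit ball, and $(0.9, 0.9)$ lies only in the $d_\infty$ one, which hints at how the three shapes nest.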

1.4 Definition

A subset $U$ of a metric space $(X, d)$ is said to be open if for each point $x \in U$ there is an $r > 0$ such that the open ball $B(x, r)$ is contained in $U$ (“room to swing a cat”).

Clearly $X$ itself is an open set, and the empty set $\emptyset$ is also open, since the condition holds vacuously.

1.5 Proposition

Every “open ball” B(x, r) is an open set.

Proof: If $y \in B(x, r)$, choose $\delta = r - d(x, y) > 0$. We claim that $B(y, \delta) \subset B(x, r)$. If $z \in B(y, \delta)$, i.e., $d(z, y) < \delta$, then by the triangle inequality

$$d(z, x) \le d(z, y) + d(y, x) < \delta + d(x, y) = r.$$

So $z \in B(x, r)$.

1.6 Definition

A subset F of (X, d) is said to be closed, if its complement X \ F is open.

Note that closed does not mean “not open”. In a metric space the sets $\emptyset$ and $X$ are both open and closed. In $\mathbb{R}$ we have:
$(a, b)$ is open.
$[a, b]$ is closed, since its complement $(-\infty, a) \cup (b, \infty)$ is open.
$[a, b)$ is not open, since there is no open ball $B(a, r)$ contained in the set. Nor is it closed, since its complement $(-\infty, a) \cup [b, \infty)$ isn’t open (no ball centred at $b$ can be contained in the complement).

1.7 Example

If we take the discrete metric,

$$d(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y, \end{cases}$$

then each one-point set satisfies $\{x\} = B(x, 1/2)$, so it is an open set. Hence every set $U$ is open, since for $x \in U$ we have $B(x, 1/2) \subseteq U$.

Hence, by taking complements, every set is also closed.
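The axioms of Definition 1.1 can be spot-checked on a finite sample of points for the metrics on $\mathbb{R}$ met so far, including the discrete metric. A finite check like this cannot prove that something is a metric, but it can expose a failure. In the sketch below the function names are illustrative, and the squared difference is an extra non-example that does not appear in these notes.

```python
import itertools
import math

def discrete(x, y):
    return 0.0 if x == y else 1.0

def bounded(x, y):
    return min(abs(x - y), 1.0)

def exp_metric(x, y):
    return abs(math.exp(x) - math.exp(y))

def satisfies_axioms(d, points, tol=1e-12):
    """Check axioms (i)-(iii) on every triple drawn from a finite sample."""
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False                      # positive definiteness fails
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry fails
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False                      # triangle inequality fails
    return True

sample = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
for name, d in [("discrete", discrete), ("bounded", bounded), ("exp", exp_metric)]:
    print(name, satisfies_axioms(d, sample))

# Illustrative non-example: the squared difference breaks the triangle inequality.
print("squared", satisfies_axioms(lambda x, y: (x - y) ** 2, sample))
```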

1.8 Proposition

In a metric space, every one-point set {x0} is closed.

Proof: We need to show that the set $U = \{x \in X : x \ne x_0\}$ is open, so take a point $x \in U$. Now $d(x, x_0) > 0$, and the ball $B(x, r)$ is contained in $U$ for every $0 < r < d(x, x_0)$. [DIAGRAM]

1.9 Theorem

Let $(U_\alpha)_{\alpha \in A}$ be any collection of open subsets of a metric space $(X, d)$ (not necessarily finite!). Then $\bigcup_{\alpha \in A} U_\alpha$ is open. Let $U$ and $V$ be open subsets of a metric space $(X, d)$. Then $U \cap V$ is open. Hence (by induction) any finite intersection of open subsets is open.

Proof: If $x \in \bigcup_{\alpha \in A} U_\alpha$ then there is an $\alpha$ with $x \in U_\alpha$. Now $U_\alpha$ is open, so $B(x, r) \subset U_\alpha$ for some $r > 0$. Then $B(x, r) \subset \bigcup_{\alpha \in A} U_\alpha$, so the union is open.

If now $U$ and $V$ are open and $x \in U \cap V$, then there exist $r > 0$ and $s > 0$ such that $B(x, r) \subset U$ and $B(x, s) \subset V$, since $U$ and $V$ are open. Then $B(x, t) \subset U \cap V$ if $t \le \min(r, s)$. [DIAGRAM.]

So the collection of open sets is preserved by arbitrary unions and finite intersections.

However, an arbitrary intersection of open sets is not always open; for example $(-\tfrac{1}{n}, \tfrac{1}{n})$ is open for each $n = 1, 2, 3, \ldots$, but $\bigcap_{n=1}^{\infty} (-\tfrac{1}{n}, \tfrac{1}{n}) = \{0\}$, which is not an open set.

LECTURE 3

For closed sets we swap union and intersection.

1.10 Theorem

Let $(F_\alpha)_{\alpha \in A}$ be any collection of closed subsets of a metric space $(X, d)$ (not necessarily finite!). Then $\bigcap_{\alpha \in A} F_\alpha$ is closed. Let $F$ and $G$ be closed subsets of a metric space $(X, d)$. Then $F \cup G$ is closed. Hence (by induction) any finite union of closed subsets is closed.

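To prove this we recall de Morgan’s laws. We use the notation $S^c$ for the complement $X \setminus S$ of a set $S \subset X$.

$$x \notin \bigcup_\alpha A_\alpha \iff x \notin A_\alpha \text{ for all } \alpha, \quad \text{so } \Big(\bigcup A_\alpha\Big)^c = \bigcap A_\alpha^c.$$

$$x \notin \bigcap_\alpha A_\alpha \iff x \notin A_\alpha \text{ for some } \alpha, \quad \text{so } \Big(\bigcap A_\alpha\Big)^c = \bigcup A_\alpha^c.$$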

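Proof: Write $U_\alpha = F_\alpha^c = X \setminus F_\alpha$, which is open. So $\bigcup_{\alpha \in A} U_\alpha$ is open by Theorem 1.9. Now, by de Morgan’s laws, $(\bigcap_{\alpha \in A} F_\alpha)^c = \bigcup_{\alpha \in A} F_\alpha^c$. This is just $\bigcup_{\alpha \in A} U_\alpha$. Since its complement is open, $\bigcap_{\alpha \in A} F_\alpha$ is closed.

Similarly, the complement of $F \cup G$ is $F^c \cap G^c$, which is the intersection of two open sets and hence open by Theorem 1.9. Hence $F \cup G$ is closed. □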

Infinite unions of closed sets do not need to be closed. An example is $\bigcup_{n=1}^{\infty} [\tfrac{1}{n}, \infty) = (0, \infty)$, which is open but not closed.

1.11 Definition

The closure of $S$, written $\overline{S}$, is the smallest closed set containing $S$, and is contained in all other closed sets containing $S$. Also $S$ is dense if $\overline{S} = X$.
A smallest closed set containing $S$ does exist, because we can define

$$\overline{S} = \bigcap \{F : F \supset S,\ F \text{ closed}\},$$

the intersection of all closed sets containing $S$. There is at least one such set, namely $X$ itself.

1.12 Example in R

The closure of $S = [0, 1)$ is $[0, 1]$. This is closed, and there is nothing smaller that is closed and contains $S$.

1.13 Theorem

The set Q of rationals is dense in R, with the usual metric.

Proof: Suppose that $F$ is a closed subset of $\mathbb{R}$ which contains $\mathbb{Q}$: we claim that $F = \mathbb{R}$.
For $U = \mathbb{R} \setminus F$ is open and contains no points of $\mathbb{Q}$. But an open set $U$ (unless it is empty) must contain an interval $B(x, r)$ for some $x \in U$, and hence a rational number. The only possible conclusion is that $U = \emptyset$ and $F = \mathbb{R}$, so that $\overline{\mathbb{Q}} = \mathbb{R}$. □

1.14 Proposition

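Let $S, T \subset X$. Then:
(i) $S \subset \overline{S}$.
(ii) $S = \overline{S} \iff S$ is closed (so $\overline{\overline{S}} = \overline{S}$).
(iii) $S \subset T \Rightarrow \overline{S} \subset \overline{T}$.
(iv) $\overline{\emptyset} = \emptyset$, $\overline{X} = X$.
(v) $\overline{S \cup T} = \overline{S} \cup \overline{T}$.
(vi) $\overline{S \cap T} \subset \overline{S} \cap \overline{T}$.

Proof: All these are quite easy except (v) and (vi) (CHECK).

For (v) note that $S \subset \overline{S}$ and $T \subset \overline{T}$ so $S \cup T \subset \overline{S} \cup \overline{T}$, which is closed, so $\overline{S \cup T} \subset \overline{S} \cup \overline{T}$. Also $S \subset S \cup T$ and $T \subset S \cup T$, so by (iii) $\overline{S} \cup \overline{T} \subset \overline{S \cup T}$. So the two sets are equal.

For (vi), we have $S \cap T \subset S$ and $S \cap T \subset T$, so by (iii) $\overline{S \cap T} \subset \overline{S} \cap \overline{T}$.

But we don’t need to have equality; for example take $X = \mathbb{R}$, $S = (0, 1)$, $T = (1, 2)$. Then $\overline{S \cap T} = \overline{\emptyset} = \emptyset$, whereas $\overline{S} \cap \overline{T} = [0, 1] \cap [1, 2] = \{1\}$.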

1.15 Definition

We say that $V$ is a neighbourhood (nhd) of $x$ if there is an open set $U$ such that $x \in U \subseteq V$; this means that there is a $\delta > 0$ such that $B(x, \delta) \subseteq V$. Thus a set is open precisely when it is a neighbourhood of each of its points.

1.16 Example

The half-open interval [0, 1) is a neighbourhood of every point in it except for 0.

1.17 Theorem

For a subset $S$ of a metric space $X$, we have $x \in \overline{S}$ iff $V \cap S \ne \emptyset$ for all nhds $V$ of $x$ (i.e., all neighbourhoods of $x$ meet $S$).

Proof: If there is a neighbourhood of $x$ that doesn’t meet $S$, then there is an open subset $U$ with $x \in U$ and $U \cap S = \emptyset$. [DIAGRAM?]
But then $X \setminus U$ is a closed set containing $S$ and so $\overline{S} \subset X \setminus U$, and then $x \notin \overline{S}$ because $x \in U$.
Conversely, if every neighbourhood of $x$ does meet $S$, then $x \in \overline{S}$, as otherwise $X \setminus \overline{S}$ is an open neighbourhood of $x$ that doesn’t meet $S$. □

LECTURE 4

1.18 Definition

The interior of $S$, $\operatorname{int} S$, is the largest open set contained in $S$, and can be written as

$$\operatorname{int} S = \bigcup \{U : U \subset S,\ U \text{ open}\},$$

the union of all open sets contained in $S$. There is at least one such set, namely $\emptyset$.

We see that $S$ is open exactly when $S = \operatorname{int} S$; otherwise $\operatorname{int} S$ is smaller.

1.19 Examples in R

int[0, 1) = (0, 1); clearly this is open and there is no larger open set contained in [0, 1).

$\operatorname{int} \mathbb{Q} = \emptyset$. For any non-empty open set must contain an interval $B(x, r)$ and then it contains an irrational number, so isn’t contained in $\mathbb{Q}$.

1.20 Proposition

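$$\operatorname{int} S = X \setminus \overline{(X \setminus S)}.$$

Proof: By de Morgan’s laws,

$$\operatorname{int} S = \bigcup \{U : U \subset S,\ U \text{ open}\} = X \setminus \bigcap \{U^c : U \subset S,\ U \text{ open}\} = X \setminus \bigcap \{F : F \supset X \setminus S,\ F \text{ closed}\} = X \setminus \overline{(X \setminus S)}.$$

This is because $U \subset S$ if and only if $U^c = X \setminus U \supset X \setminus S$. Also $F = U^c$ is closed precisely when $U$ is open. That is, there is a correspondence between open sets contained in $S$ and closed sets containing its complement.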

1.21 Corollary

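(i) $\operatorname{int} S \subset S$.
(ii) $\operatorname{int} S = S \iff S$ is open.
(iii) $S \subset T \Rightarrow \operatorname{int} S \subset \operatorname{int} T$.
(iv) $\operatorname{int}(\operatorname{int} S) = \operatorname{int} S$.
(v) $\operatorname{int}(S \cup T) \supset \operatorname{int} S \cup \operatorname{int} T$.
(vi) $\operatorname{int}(S \cap T) = \operatorname{int} S \cap \operatorname{int} T$.

Proof: Easy, or take complements and use Propositions 1.14 and 1.20.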

1.22 Definition

The boundary or frontier of $S$ is $\partial S = \overline{S} \setminus \operatorname{int} S = \overline{S} \cap \overline{X \setminus S}$.
This writes $\partial S$ as the intersection of two closed sets, so it is also closed.

1.23 Examples in R

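For $S = [0, 1)$ we have $\operatorname{int} S = (0, 1)$ and $\overline{S} = [0, 1]$, so $\partial S = \{0, 1\}$.

For $S = \mathbb{Q}$ we have $\operatorname{int} S = \emptyset$ and $\overline{S} = \mathbb{R}$, so $\partial S = \mathbb{R}$.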

1.24 Examples in R2

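For $S = \{(x, y) : x^2 + y^2 < 1\}$, we have $\operatorname{int} S = S$ and $\overline{S} = \{(x, y) : x^2 + y^2 \le 1\}$, so $\partial S$ is the circle $\{(x, y) : x^2 + y^2 = 1\}$.

For $S = [0, 1)$ regarded as the subset $\{(x, y) : 0 \le x < 1,\ y = 0\}$ of $\mathbb{R}^2$, we have $\overline{S} = \{(x, y) : 0 \le x \le 1,\ y = 0\}$ and $\operatorname{int} S = \emptyset$, so $\partial S = \overline{S}$.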

2 Convergence and continuity

Let $(x_n)$ be a sequence in a metric space $(X, d)$, i.e., $x_1, x_2, \ldots$. (Sometimes we may start counting at $x_0$.)

2.1 Definition

We say $x_n \to x$ (i.e., $x_n$ tends to $x$ or converges to $x$) if $d(x_n, x) \to 0$ as $n \to \infty$. That is, for all $\varepsilon > 0$ there is an $N$ such that $d(x_n, x) < \varepsilon$ for $n \ge N$ (“for $n$ sufficiently large”).

This is the usual notion of convergence if we think of points in $\mathbb{R}^m$ with the Euclidean metric.

2.2 Theorem

(i) The sequence $(x_n)$ tends to $x$ if and only if for every open $U$ with $x \in U$, there is an $n_0$ such that $x_n \in U$ for all $n \ge n_0$.

(ii) Let $S$ be a subset of the metric space $X$. Then $x \in \overline{S}$ if and only if there is a sequence $(x_n)$ of points of $S$ with $x_n \to x$.

Proof: (i) If $x_n \to x$ and $x \in U$, then there is a ball $B(x, \varepsilon) \subset U$, since $U$ is open. But $x_n \to x$ so $d(x_n, x) < \varepsilon$ for $n$ sufficiently large, i.e., $x_n \in U$ for $n$ sufficiently large.

Conversely, if the “open set” condition works, and $\varepsilon > 0$, choose $U = B(x, \varepsilon)$. Then $x_n \in U$ for $n$ sufficiently large, and so $d(x_n, x) < \varepsilon$ for $n$ large.

(ii) If $x \in \overline{S}$, then for each $n$ we have $B(x, \tfrac{1}{n}) \cap S \ne \emptyset$ by Theorem 1.17. So choose $x_n \in B(x, 1/n) \cap S$. Clearly $d(x_n, x) \to 0$, i.e., $x_n \to x$.

Conversely, if $x \notin \overline{S}$, then there is a neighbourhood $U$ of $x$ with $U \cap S = \emptyset$. Now no sequence in $S$ can get into $U$, so it cannot converge to $x$. □

2.3 Examples

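1. Take $(\mathbb{R}^2, d_1)$, where $d_1(\mathbf{x}, \mathbf{y}) = |x_1 - y_1| + |x_2 - y_2|$, with $\mathbf{x} = (x_1, x_2)$ and $\mathbf{y} = (y_1, y_2)$, and consider the sequence $\left( \frac{1}{n}, \frac{2n+1}{n+1} \right)$. We guess its limit is $(0, 2)$. To see if this is right, look at

$$d_1\left( \left( \frac{1}{n}, \frac{2n+1}{n+1} \right), (0, 2) \right) = \left| \frac{1}{n} \right| + \left| \frac{2n+1}{n+1} - 2 \right| = \frac{1}{n} + \frac{1}{n+1} \to 0$$

as $n \to \infty$. So the limit is $(0, 2)$.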

LECTURE 5

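2. In $C[0, 1]$ let $f_n(t) = t^n$ and $f(t) = 0$ for $0 \le t \le 1$. Does $f_n \to f$, (a) in $d_1$, and (b) in $d_\infty$?

(a)

$$d_1(f_n, f) = \int_0^1 t^n \, dt = \frac{1}{n+1} \to 0$$

as $n \to \infty$. So $f_n \to f$ in $d_1$.

(b)

$$d_\infty(f_n, f) = \max\{t^n : 0 \le t \le 1\} = 1 \not\to 0$$

as $n \to \infty$. So $f_n \not\to f$ in $d_\infty$.

Note: say $g_n \to g$ pointwise on $[a, b]$ as $n \to \infty$ if $g_n(x) \to g(x)$ for all $x \in [a, b]$. If we define

$$g(x) = \begin{cases} 0 & \text{for } 0 \le x < 1, \\ 1 & \text{for } x = 1, \end{cases}$$

then $f_n \to g$ pointwise on $[0, 1]$. But $g \notin C[0, 1]$, as it is not continuous at 1.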

3. Take the discrete metric

$$d_0(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y. \end{cases}$$

Then $x_n \to x \iff d_0(x_n, x) \to 0$. But since $d_0(x_n, x) = 0$ or $1$, this happens if and only if $d_0(x_n, x) = 0$ for $n$ sufficiently large. That is, there is an $n_0$ such that $x_n = x$ for all $n \ge n_0$.
All convergent sequences in this metric are eventually constant. So, for example, $d_0(1/n, 0) \not\to 0$.
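The contrast in Example 2 between $d_1$ and $d_\infty$ for $f_n(t) = t^n$ can be seen numerically. The sketch below uses illustrative helper names, approximates the integral by a midpoint rule, and takes the maximum over a grid that contains $t = 1$.

```python
def d1_to_zero(n, steps=100000):
    """Midpoint-rule approximation of the integral of t**n over [0, 1]."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** n for i in range(steps))

def d_inf_to_zero(n, steps=1000):
    """Maximum of t**n over a uniform grid on [0, 1] that includes t = 1."""
    return max((i / steps) ** n for i in range(steps + 1))

for n in [1, 5, 25, 125]:
    # Columns: n, approximate d1 distance, exact value 1/(n+1), d_inf distance.
    print(n, d1_to_zero(n), 1 / (n + 1), d_inf_to_zero(n))
```

The second column shrinks like $1/(n+1)$ while the last column stays at 1, matching parts (a) and (b).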

A result on convergence in R2.

2.4 Proposition

Take $\mathbb{R}^2$ with any of the metrics $d_1$, $d_2$ and $d_\infty$. Then a sequence $\mathbf{x}_n = (a_n, b_n)$ converges to $\mathbf{x} = (a, b)$ if and only if $a_n \to a$ and $b_n \to b$.

Proof: We have $d_1(\mathbf{x}_n, \mathbf{x}) = |a_n - a| + |b_n - b|$. This tends to zero as $n \to \infty$ if and only if each of the terms $|a_n - a|$ and $|b_n - b|$ does. And that’s the same as saying that $a_n \to a$ and $b_n \to b$.

Also $d_2(\mathbf{x}_n, \mathbf{x}) = (|a_n - a|^2 + |b_n - b|^2)^{1/2}$, which tends to zero if and only if $|a_n - a|^2 + |b_n - b|^2$ does; this happens if and only if $|a_n - a|^2$ and $|b_n - b|^2$ tend to zero, which is the same as $a_n \to a$ and $b_n \to b$.

Finally, $d_\infty(\mathbf{x}_n, \mathbf{x}) = \max\{|a_n - a|, |b_n - b|\}$. If this tends to zero then so do $|a_n - a|$ and $|b_n - b|$, as they are no larger and are non-negative; and if they both tend to zero then so does their maximum, which is at most their sum. Again this is the same as saying $a_n \to a$ and $b_n \to b$.

A similar result holds for Rk in general.

Now let’s look at continuous functions again.

2.5 Theorem

If fn → f in (C[a, b], d∞), then fn → f in (C[a, b], d1).

(d∞ convergence is stronger than d1 convergence.)

Proof: $d_\infty(f_n, f) = \max\{|f_n(x) - f(x)| : a \le x \le b\} \to 0$ as $n \to \infty$, so, given $\varepsilon > 0$ there is an $N$ so that $d_\infty(f_n, f) < \varepsilon$ for $n \ge N$. It follows that if $n \ge N$ then

$$d_1(f_n, f) = \int_a^b |f_n(x) - f(x)| \, dx \le \int_a^b \varepsilon \, dx = \varepsilon (b - a),$$

so $d_1(f_n, f) \to 0$ as $n \to \infty$.

Note: It is also true that if $d_\infty(f_n, f) \to 0$ then $f_n \to f$ pointwise on $[a, b]$. The converse is FALSE.

Now we look at continuous functions between general metric spaces.

2.6 Definition

Let $f : (X, d_X) \to (Y, d_Y)$ be a map between metric spaces. We say that $f$ is continuous at $x_0 \in X$ if for each $\varepsilon > 0$ there is a $\delta > 0$ such that $d_Y(f(x), f(x_0)) < \varepsilon$ whenever $d_X(x, x_0) < \delta$.

We say that $f$ is continuous if it is continuous at all points of $X$.

2.7 Proposition

For $f$ as above, $f$ is continuous at $x_0$ if and only if, whenever a sequence $x_n \to x_0$, then $f(x_n) \to f(x_0)$ (“sequential continuity”).

Proof: Same proof as in real analysis, more or less. Suppose that $f$ is continuous at $x_0$ and $x_n \to x_0$. Then for each $\varepsilon > 0$ we have a $\delta > 0$ such that $d_Y(f(x), f(x_0)) < \varepsilon$ whenever $d_X(x, x_0) < \delta$.
Then there’s an $n_0$ with $d_X(x_n, x_0) < \delta$ for all $n \ge n_0$, and so $d_Y(f(x_n), f(x_0)) < \varepsilon$ for all $n \ge n_0$. Thus $f(x_n) \to f(x_0)$.

Conversely, if $f$ is not continuous at $x_0$, then there is an $\varepsilon > 0$ for which no $\delta$ will do, so for each $n$ we can find $x_n$ with $d_X(x_n, x_0) < 1/n$ but $d_Y(f(x_n), f(x_0)) \ge \varepsilon$. Then $x_n \to x_0$ but $f(x_n) \not\to f(x_0)$. □

But there is a nicer way to define continuity. For a mapping $f : X \to Y$ and a set $U \subset Y$, let $f^{-1}(U)$ be the set

f−1(U) = {x ∈ X : f(x) ∈ U}.

This makes sense even if f−1 is not defined as a function.

2.8 Theorem

A function $f : X \to Y$ is continuous if and only if $f^{-1}(U)$ is open in $X$ for every open subset $U \subset Y$.

(“The inverse image of an open set is open.” Note that for $f$ continuous we do not expect $f(U)$ to be open for all open subsets $U$ of $X$; for example, if $f : \mathbb{R} \to \mathbb{R}$, $f \equiv 0$, then $f(\mathbb{R}) = \{0\}$, which is not open.)

LECTURE 6

Proof: Suppose that $f$ is continuous, that $U$ is open, and that $x_0 \in f^{-1}(U)$, so $f(x_0) \in U$. Now there is a ball $B(f(x_0), \varepsilon) \subset U$, since $U$ is open, and then by continuity there is a $\delta > 0$ such that $d_Y(f(x), f(x_0)) < \varepsilon$ whenever $d_X(x, x_0) < \delta$. This means that for $d_X(x, x_0) < \delta$, $f(x) \in U$ and so $x \in f^{-1}(U)$. That is, $B(x_0, \delta) \subset f^{-1}(U)$, so $f^{-1}(U)$ is open. [DIAGRAM]

Conversely, if the inverse image of an open set is open, and $x_0 \in X$, let $\varepsilon > 0$ be given. We know that $B(f(x_0), \varepsilon)$ is open, so $f^{-1}(B(f(x_0), \varepsilon))$ is open, and contains $x_0$. So it contains some $B(x_0, \delta)$ with $\delta > 0$.

But now if $d_X(x, x_0) < \delta$, we have $x \in B(x_0, \delta) \subset f^{-1}(B(f(x_0), \varepsilon))$, so $f(x) \in B(f(x_0), \varepsilon)$ and we have $d_Y(f(x), f(x_0)) < \varepsilon$. □

2.9 Example

Let $X = \mathbb{R}$ with the discrete metric, and $Y$ any metric space. Then all functions $f : X \to Y$ are continuous!

(i) Because the inverse image of an open set is an open set, since all sets are open.
(ii) Because whenever $x_n \to x_0$ we have $x_n = x_0$ for $n$ large, so obviously $f(x_n) \to f(x_0)$.

2.10 Proposition

(i) A function $f : X \to Y$ is continuous if and only if $f^{-1}(F)$ is closed whenever $F$ is a closed subset of $Y$.

(ii) If $f : X \to Y$ and $g : Y \to Z$ are continuous, then so is the composition $g \circ f : X \to Z$ defined by $(g \circ f)(x) = g(f(x))$.

[DIAGRAM]

Proof: (i) We can do this by complements, as if $F$ is closed, then $U = F^c$ is open, and $f^{-1}(F) = f^{-1}(U)^c$ (a point is mapped into $F$ if and only if it isn’t mapped into $U$).
Then $f^{-1}(F)$ is always closed when $F$ is closed $\iff$ $f^{-1}(U)$ is always open when $U$ is open.

(ii) Take $U \subset Z$ open; then $(g \circ f)^{-1}(U) = f^{-1}(g^{-1}(U))$; for these are the points which map under $f$ into $g^{-1}(U)$ so that they map under $g \circ f$ into $U$.
Now $g^{-1}(U)$ is open in $Y$, as $g$ is continuous, and then $f^{-1}(g^{-1}(U))$ is open in $X$ since $f$ is continuous. □

2.11 Definition

A function $f : X \to Y$ is a homeomorphism between metric spaces if it is a bijection such that $f$ and $f^{-1}$ are continuous. Then we say $X$ and $Y$ are homeomorphic, or $X \sim Y$.

2.12 Example

The real line $\mathbb{R}$ is homeomorphic to the open interval $(0, 1)$. For if we take $y = \tan^{-1} x$ this maps it homeomorphically onto $(-\pi/2, \pi/2)$, and this can be mapped homeomorphically onto $(0, 1)$, e.g. by $z = \frac{1}{\pi}(y + \pi/2)$.

3 Real inner-product spaces

Notation:
Vectors written $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$, etc. (Sometimes just $u$, $v$, $w$.)
Scalars written $a$, $b$, $c$, etc.
Functions written $f$, $g$, $h$.
Coordinates of a vector $\mathbf{u}$ normally written $u_1$, $u_2$, $u_3$, etc.

3.1 Inner product in Rn

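For vectors $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ in $\mathbb{R}^2$ we write $\langle \mathbf{u}, \mathbf{v} \rangle$ for the standard inner product

$$\langle \mathbf{u}, \mathbf{v} \rangle = u_1 v_1 + u_2 v_2;$$

sometimes written $\mathbf{u}.\mathbf{v}$ or $(\mathbf{u}, \mathbf{v})$.
We can do similarly for vectors in $\mathbb{R}^n$ where $n = 1, 2, 3, \ldots$ (i.e., $n$ components), so if $\mathbf{u} = (u_1, \ldots, u_n)$ and $\mathbf{v} = (v_1, \ldots, v_n)$ we have

$$\langle \mathbf{u}, \mathbf{v} \rangle = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n.$$

For example,

$$\langle (1, 2, 3, 4), (0, -1, 5, 2) \rangle = 1 \cdot 0 - 2 \cdot 1 + 3 \cdot 5 + 4 \cdot 2 = 21.$$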
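As a quick sanity check of the worked example, the standard inner product on $\mathbb{R}^n$ can be written in a couple of lines. The function name `inner` is just an illustrative choice.

```python
def inner(u, v):
    """Standard inner product on R^n for two sequences of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(ui * vi for ui, vi in zip(u, v))

print(inner((1, 2, 3, 4), (0, -1, 5, 2)))  # 21
```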

3.2 Standard properties of the scalar product

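I. LINEARITY.

$$\langle a\mathbf{u} + b\mathbf{v}, \mathbf{w} \rangle = a \langle \mathbf{u}, \mathbf{w} \rangle + b \langle \mathbf{v}, \mathbf{w} \rangle,$$

for $a$, $b$ real and $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$ vectors.

II. SYMMETRY.

$$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle.$$

III. POSITIVE DEFINITENESS.
$\langle \mathbf{u}, \mathbf{u} \rangle \ge 0$ for all $\mathbf{u}$, and we have $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ if and only if $\mathbf{u} = \mathbf{0}$.

The first two are easy to check. For III note that $\langle \mathbf{u}, \mathbf{u} \rangle = u_1^2 + \ldots + u_n^2 \ge 0$, and it will be zero if and only if $u_1 = \ldots = u_n = 0$.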

3.3 Definition of a general (real) inner product

Let $V$ be a real vector space and suppose that we have for each pair of vectors $\mathbf{u}$, $\mathbf{v}$ in $V$ a real number written $\langle \mathbf{u}, \mathbf{v} \rangle$, such that properties I, II and III of (3.2) hold. Then we call $V$ a real inner product space, and $\langle \mathbf{u}, \mathbf{v} \rangle$ the inner product of $\mathbf{u}$ and $\mathbf{v}$.

N.B. In quantum mechanics and elsewhere people use complex inner products. Not in this course.

LECTURE 7

3.4 Examples

1. The usual inner product on Rn.

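2. We can define a new inner product on $\mathbb{R}^2$ by

$$\langle \mathbf{u}, \mathbf{v} \rangle = 2 u_1 v_1 + 3 u_2 v_2.$$

This is easily checked to be linear (do it!) and symmetric. For positive definiteness, note that

$$\langle \mathbf{u}, \mathbf{u} \rangle = 2 u_1^2 + 3 u_2^2 \ge 0$$

and is $> 0$ unless $u_1 = u_2 = 0$.
The following alternative is not an inner product: define

$$\langle \mathbf{u}, \mathbf{v} \rangle = 2 u_1 v_1 - 3 u_2 v_2,$$

so $\langle \mathbf{u}, \mathbf{u} \rangle = 2 u_1^2 - 3 u_2^2$, which would be negative if $\mathbf{u} = (0, 1)$, say.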

3. For $a < b$ define $C[a, b]$ to be the vector space of all continuous real functions on $[a, b]$.
For $f, g \in C[a, b]$ define

$$\langle f, g \rangle = \int_a^b f(x) g(x) \, dx.$$

Example: in $C[0, 1]$, let $f(x) = x + 1$ and $g(x) = 2x$. Then

$$\langle f, g \rangle = \int_0^1 (x + 1)(2x) \, dx = \int_0^1 (2x^2 + 2x) \, dx = \left[ \frac{2x^3}{3} + x^2 \right]_0^1 = \frac{5}{3}.$$
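The integral inner product on $C[a, b]$ can be approximated numerically to confirm the value $5/3$. The sketch below assumes a simple midpoint rule, and `inner_fn` is an illustrative name; it is not meant as an accurate quadrature routine.

```python
def inner_fn(f, g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f(x) * g(x) over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

f = lambda x: x + 1
g = lambda x: 2 * x

print(inner_fn(f, g, 0, 1), 5 / 3)                     # approximation vs exact value
print(inner_fn(f, g, 0, 1) == inner_fn(g, f, 0, 1))    # symmetry (property II)
```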

3.5 Other properties of inner products

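(a) $\langle \mathbf{u}, a\mathbf{v} + b\mathbf{w} \rangle = \langle a\mathbf{v} + b\mathbf{w}, \mathbf{u} \rangle$ (rule II) $= a \langle \mathbf{v}, \mathbf{u} \rangle + b \langle \mathbf{w}, \mathbf{u} \rangle$ (rule I) $= a \langle \mathbf{u}, \mathbf{v} \rangle + b \langle \mathbf{u}, \mathbf{w} \rangle$ (rule II again).
So it is linear in the second argument as well as the first.

(b) $\langle \mathbf{0}, \mathbf{u} \rangle = \langle 0\mathbf{u} + 0\mathbf{u}, \mathbf{u} \rangle = 0 \langle \mathbf{u}, \mathbf{u} \rangle + 0 \langle \mathbf{u}, \mathbf{u} \rangle = 0$ for all $\mathbf{u}$, using rule I.
Also $\langle \mathbf{u}, \mathbf{0} \rangle = \langle \mathbf{0}, \mathbf{u} \rangle = 0$, using rule II. This is for any $\mathbf{u} \in V$.

(c) More generally we can check that

$$\langle a_1 \mathbf{u}_1 + a_2 \mathbf{u}_2 + \ldots + a_N \mathbf{u}_N,\ b_1 \mathbf{v}_1 + b_2 \mathbf{v}_2 + \ldots + b_M \mathbf{v}_M \rangle$$

behaves like multiplication, and we get

$$\sum_{i=1}^{N} \sum_{j=1}^{M} a_i b_j \langle \mathbf{u}_i, \mathbf{v}_j \rangle.$$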

4 Lengths, angles, orthogonality

4.1 Definition

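In an inner product space we define the length of a vector $\mathbf{v}$ (sometimes called its size or norm) by

$$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}.$$

Note that $\langle \mathbf{v}, \mathbf{v} \rangle$ is always $\ge 0$; also by property III, $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$.
This agrees with what we usually do in $\mathbb{R}^n$, e.g. if $\mathbf{v} = (3, 4, -12)$, then $\|\mathbf{v}\|^2 = 3^2 + 4^2 + (-12)^2 = 9 + 16 + 144 = 169$, so $\|\mathbf{v}\| = \sqrt{169} = 13$.

Example: in $C[-1, 1]$ let $f(x) = x$. Then

$$\|f\|^2 = \int_{-1}^{1} x^2 \, dx = \left[ \frac{x^3}{3} \right]_{-1}^{1} = \frac{2}{3},$$

so $\|f\| = \sqrt{2/3}$.

Note that if $\mathbf{v} \in V$ and $a \in \mathbb{R}$, then

$$\|a\mathbf{v}\|^2 = \langle a\mathbf{v}, a\mathbf{v} \rangle = a^2 \langle \mathbf{v}, \mathbf{v} \rangle = a^2 \|\mathbf{v}\|^2,$$

so $\|a\mathbf{v}\| = \sqrt{a^2}\, \|\mathbf{v}\| = |a| \, \|\mathbf{v}\|$, taking the positive square root.

For example $(-2)\mathbf{v}$ is twice as big as $\mathbf{v}$, but with direction reversed.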

4.2 Definition

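The angle between two non-zero vectors $\mathbf{u}$ and $\mathbf{v}$ is the unique solution $\theta$ to

$$\langle \mathbf{u}, \mathbf{v} \rangle = \|\mathbf{u}\| \, \|\mathbf{v}\| \cos\theta$$

in the range $0 \le \theta \le \pi$ (radians!). It is easy to check that the angle between $\mathbf{u}$ and $\mathbf{u}$ is 0, and the angle between $\mathbf{u}$ and $-\mathbf{u}$ is $\pi$.
We say $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. This is because the angle between them satisfies $\cos\theta = 0$ so $\theta = \pi/2$. This is sometimes written $\mathbf{u} \perp \mathbf{v}$.

To make sense of our definition we will need to know that

$$\cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \, \|\mathbf{v}\|}$$

lies between $-1$ and $1$; see later.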

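Example: in $C[0, 1]$ find the number $a$ such that the functions $f(t) = t$ and $g(t) = 3t + a$ are orthogonal.

Solution:

$$\langle f, g \rangle = \int_0^1 t(3t + a) \, dt = \left[ t^3 + \frac{a t^2}{2} \right]_0^1 = 1 + \frac{a}{2},$$

so $\langle f, g \rangle = 0 \iff 1 + \frac{a}{2} = 0$, or $a = -2$.

More generally, a set of vectors $\{\mathbf{u}_1, \ldots, \mathbf{u}_N\}$ is an orthogonal set if $\langle \mathbf{u}_i, \mathbf{u}_j \rangle = 0$ whenever $i \ne j$.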

4.3 Pythagoras’s theorem

If 〈u,v〉 = 0 then ‖u + v‖2 = ‖u‖2 + ‖v‖2.

[DIAGRAM – square on the hypotenuse etc.]

Proof:

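$$\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 0 + 0 + \|\mathbf{v}\|^2,$$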

using orthogonality.

4.4 Parallelogram identity

‖u + v‖2 + ‖u− v‖2 = 2‖u‖2 + 2‖v‖2.

[DIAGRAM – draw a parallelogram.] The sum of the squares of the two diagonals equals the sum of the squares of the four sides.

Proof: expand the inner products; see the example sheets.
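Before expanding the inner products by hand, both Pythagoras’s theorem (4.3) and the parallelogram identity (4.4) can be checked on concrete vectors in $\mathbb{R}^3$. This is a small sketch with illustrative helper names and arbitrarily chosen vectors.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

# Parallelogram identity for an arbitrary pair of vectors.
u = (3.0, 4.0, -12.0)
v = (1.0, -2.0, 0.5)
lhs = norm(add(u, v)) ** 2 + norm(sub(u, v)) ** 2
rhs = 2 * norm(u) ** 2 + 2 * norm(v) ** 2
print(lhs, rhs)

# Pythagoras for an orthogonal pair: <p, q> = 4 - 4 + 0 = 0.
p = (2.0, 1.0, 0.0)
q = (2.0, -4.0, 3.0)
print(inner(p, q), norm(add(p, q)) ** 2, norm(p) ** 2 + norm(q) ** 2)
```

In each printed line the last two numbers should agree up to rounding error.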

5 Cauchy–Schwarz and its consequences

In order to make sense of (4.2) we need the following.

5.1 Cauchy–Schwarz inequality

For u and v in an inner-product space,

〈u,v〉2 ≤ 〈u,u〉〈v,v〉,

i.e., |〈u,v〉| ≤ ‖u‖ ‖v‖.

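Example: if $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ are real numbers, then

$$\left| \sum_{i=1}^{n} u_i v_i \right| \le \left( \sum_{i=1}^{n} u_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} v_i^2 \right)^{1/2}.$$

Note that the LHS is $|\langle \mathbf{u}, \mathbf{v} \rangle|$ and the RHS is $\|\mathbf{u}\| \, \|\mathbf{v}\|$, where $\mathbf{u} = (u_1, \ldots, u_n)$, $\mathbf{v} = (v_1, \ldots, v_n)$ and we use the standard inner product in $\mathbb{R}^n$.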

LECTURE 8

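We give two proofs, and in each we assume that $\mathbf{u} \ne \mathbf{0}$ and $\mathbf{v} \ne \mathbf{0}$ (otherwise the inequality is obvious).

Proof 1:

Take

$$\|a\mathbf{u} - b\mathbf{v}\|^2 = a^2 \|\mathbf{u}\|^2 - 2ab \langle \mathbf{u}, \mathbf{v} \rangle + b^2 \|\mathbf{v}\|^2 \ge 0,$$

with $a = \langle \mathbf{u}, \mathbf{v} \rangle$ and $b = \|\mathbf{u}\|^2$. We get

$$\|\mathbf{u}\|^2 \left( \langle \mathbf{u}, \mathbf{v} \rangle^2 - 2 \langle \mathbf{u}, \mathbf{v} \rangle^2 + \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \right) \ge 0,$$

which gives the result on dividing by $\|\mathbf{u}\|^2 > 0$.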

Proof 2:

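For real $t$ we have

$$\langle t\mathbf{u} + \mathbf{v}, t\mathbf{u} + \mathbf{v} \rangle \ge 0,$$

i.e.,

$$t^2 \langle \mathbf{u}, \mathbf{u} \rangle + 2t \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle \ge 0.$$

We’ll minimize this over $t$, so by differentiation this is where

$$2t \langle \mathbf{u}, \mathbf{u} \rangle + 2 \langle \mathbf{u}, \mathbf{v} \rangle = 0.$$

So we put $t = -\langle \mathbf{u}, \mathbf{v} \rangle / \langle \mathbf{u}, \mathbf{u} \rangle$, and we get

$$\frac{\langle \mathbf{u}, \mathbf{v} \rangle^2}{\langle \mathbf{u}, \mathbf{u} \rangle} - \frac{2 \langle \mathbf{u}, \mathbf{v} \rangle^2}{\langle \mathbf{u}, \mathbf{u} \rangle} + \langle \mathbf{v}, \mathbf{v} \rangle \ge 0.$$

This simplifies to

$$-\frac{\langle \mathbf{u}, \mathbf{v} \rangle^2}{\langle \mathbf{u}, \mathbf{u} \rangle} + \langle \mathbf{v}, \mathbf{v} \rangle \ge 0,$$

i.e.,

$$\frac{\langle \mathbf{u}, \mathbf{v} \rangle^2}{\langle \mathbf{u}, \mathbf{u} \rangle} \le \langle \mathbf{v}, \mathbf{v} \rangle,$$

which is what is required.

NOW we know that $\dfrac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \, \|\mathbf{v}\|}$ lies between $-1$ and $1$, and so the definition of angle makes sense.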
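The Cauchy–Schwarz inequality, and the angle it makes well defined, can be illustrated on random vectors in $\mathbb{R}^5$. The sketch below is illustrative only: the names, the dimension, the seed and the small tolerance for rounding error are all arbitrary choices made here.

```python
import math
import random

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def angle(u, v):
    """Angle in [0, pi] between non-zero vectors, as in Definition 4.2."""
    c = inner(u, v) / (norm(u) * norm(v))
    # Clamp to guard against rounding pushing the ratio just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, c)))

random.seed(0)  # fixed seed so the run is repeatable
ok = True
for _ in range(1000):
    u = [random.uniform(-1, 1) for _ in range(5)]
    v = [random.uniform(-1, 1) for _ in range(5)]
    ok = ok and abs(inner(u, v)) <= norm(u) * norm(v) + 1e-12
print(ok)

# Orthogonal vectors give pi/2; opposite vectors give (approximately) pi.
print(angle((1, 0), (0, 1)), angle((1, 2), (-1, -2)))
```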


5.2 Triangle inequality

In an inner product space we have

‖u + v‖ ≤ ‖u‖+ ‖v‖.

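For example, in $\mathbb{R}^n$ this gives

$$\left( \sum_{i=1}^{n} (u_i + v_i)^2 \right)^{1/2} \le \left( \sum_{i=1}^{n} u_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} v_i^2 \right)^{1/2}.$$

[DIAGRAM – triangle of vectors]

Proof:

$$\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 2 \langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 \le \|\mathbf{u}\|^2 + 2 \|\mathbf{u}\| \, \|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2,$$

where the inequality is Cauchy–Schwarz (5.1).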

5.3 Theorem

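In an inner-product space the norm (length) of a vector satisfies
(i) $\|\mathbf{u}\| \ge 0$, and $\|\mathbf{u}\| = 0$ if and only if $\mathbf{u} = \mathbf{0}$;
(ii) $\|a\mathbf{u}\| = |a| \, \|\mathbf{u}\|$;
(iii) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.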

5.4 Corollary

Let V be an inner-product space, and define d(x,y) = ‖x− y‖. Then d is a metric.

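Proof: From Theorem 5.3, we see easily that $d(\mathbf{x}, \mathbf{y}) \ge 0$ and $d(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} - \mathbf{y} = \mathbf{0}$, i.e., $\mathbf{x} = \mathbf{y}$.
Also $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \|\mathbf{y} - \mathbf{x}\| = d(\mathbf{y}, \mathbf{x})$.
Finally

$$d(\mathbf{x}, \mathbf{z}) = \|\mathbf{x} - \mathbf{z}\| = \|(\mathbf{x} - \mathbf{y}) + (\mathbf{y} - \mathbf{z})\| \le \|\mathbf{x} - \mathbf{y}\| + \|\mathbf{y} - \mathbf{z}\| = d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z}).$$

So every inner-product space is a metric space. □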

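5.5 The space $\ell^2$

The elements of the space $\ell^2$ (also written $\ell_2$) are real sequences $(u_k)_{k=1}^{\infty}$ such that $\sum_{k=1}^{\infty} u_k^2 < \infty$.

So, for example $(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \ldots) \in \ell^2$, since $\sum_{k=1}^{\infty} \left( \tfrac{1}{2^k} \right)^2 < \infty$ (geometric series); but $(1, 2, 3, 4, \ldots) \notin \ell^2$, since $\sum_{k=1}^{\infty} k^2 = \infty$.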

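We shall get a vector space by adding sequences term-wise; if $\mathbf{u} = (u_k)$ and $\mathbf{v} = (v_k)$, then $\mathbf{u} + \mathbf{v} = (u_k + v_k)$ and $a\mathbf{u} = (a u_k)$, just like vectors with an infinite sequence of components.
How do we know that $(u_k + v_k)$ is still in $\ell^2$?

Proof: for each $N$,

$$\left( \sum_{k=1}^{N} (u_k + v_k)^2 \right)^{1/2} \le \left( \sum_{k=1}^{N} u_k^2 \right)^{1/2} + \left( \sum_{k=1}^{N} v_k^2 \right)^{1/2} \le \left( \sum_{k=1}^{\infty} u_k^2 \right)^{1/2} + \left( \sum_{k=1}^{\infty} v_k^2 \right)^{1/2} = A,$$

say, where we used first the triangle inequality in $\mathbb{R}^N$. Since this holds for every $N$ we let $N \to \infty$ to see that $\sum_{k=1}^{\infty} (u_k + v_k)^2$ converges, and its sum is at most $A^2$.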

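In fact $\ell^2$ is an inner-product space; define

$$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{k=1}^{\infty} u_k v_k.$$

To see that this sum converges, use Cauchy–Schwarz in $\mathbb{R}^N$:

$$\sum_{k=1}^{N} |u_k v_k| \le \left( \sum_{k=1}^{N} u_k^2 \right)^{1/2} \left( \sum_{k=1}^{N} v_k^2 \right)^{1/2} \le \left( \sum_{k=1}^{\infty} u_k^2 \right)^{1/2} \left( \sum_{k=1}^{\infty} v_k^2 \right)^{1/2} = B,$$

say. Hence $\sum_{k=1}^{\infty} |u_k v_k|$ converges to a limit which is at most $B$. So $\sum_{k=1}^{\infty} u_k v_k$ is absolutely convergent.

It is easy now to check that this defines an inner product. Also $\|\mathbf{u}\|^2 = \langle \mathbf{u}, \mathbf{u} \rangle = \sum_{k=1}^{\infty} u_k^2$, so it is like $\mathbb{R}^n$ with $n = \infty$. It is an infinite-dimensional vector space, but a very useful one.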
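Partial sums give a feel for how the $\ell^2$ norm and inner product behave. In the sketch below, `u` is the sequence $(1/2, 1/4, 1/8, \ldots)$ from the text, while `v` with $v_k = 1/k$ is a second sequence picked here only for illustration (it lies in $\ell^2$ because $\sum 1/k^2$ converges). The helper name `partial_inner` is likewise made up.

```python
import math

def partial_inner(u, v, N):
    """Sum of u(k) * v(k) for k = 1..N, where u and v map k to the k-th term."""
    return sum(u(k) * v(k) for k in range(1, N + 1))

u = lambda k: 0.5 ** k      # the sequence (1/2, 1/4, 1/8, ...) from the text
v = lambda k: 1.0 / k       # illustrative second sequence, not from the notes

for N in [5, 10, 20, 40]:
    norm_u_sq = partial_inner(u, u, N)          # tends to 1/3 (geometric series)
    bound = math.sqrt(norm_u_sq * partial_inner(v, v, N))
    # Columns: N, partial ||u||^2, partial <u, v>, Cauchy-Schwarz bound in R^N.
    print(N, norm_u_sq, partial_inner(u, v, N), bound)
```

The partial inner products increase with $N$ but stay below the Cauchy–Schwarz bound, in line with the convergence argument above.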

LECTURE 9

6 Orthonormal sets

6.1 Definition

A set of vectors $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ in an inner product space is orthonormal if it is orthogonal and each vector has norm 1. So

$$\langle \mathbf{e}_i, \mathbf{e}_j \rangle = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases}$$

If it’s also a basis for the inner product space, then we call it an orthonormal basis.

Examples:
(i) $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$ is an orthonormal basis of $\mathbb{R}^3$ (the standard basis);
(ii) An unusual orthonormal basis of $\mathbb{R}^2$ is $\mathbf{e}_1 = (\tfrac{3}{5}, \tfrac{4}{5})$ and $\mathbf{e}_2 = (-\tfrac{4}{5}, \tfrac{3}{5})$.

[DIAGRAM – draw the vectors]

6.2 Proposition

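If $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is orthonormal, then

$$\left\| \sum_{i=1}^{n} a_i \mathbf{e}_i \right\| = \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2},$$

for any scalars $a_1, \ldots, a_n$, and so the vectors $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ are linearly independent.

Proof:

$$\left\langle \sum_{i=1}^{n} a_i \mathbf{e}_i, \sum_{j=1}^{n} a_j \mathbf{e}_j \right\rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \langle \mathbf{e}_i, \mathbf{e}_j \rangle,$$

by (3.5). All terms except for those with $i = j$ are zero, and we get $\sum_{i=1}^{n} a_i^2$, as required.

Also, if $\sum_{i=1}^{n} a_i \mathbf{e}_i = \mathbf{0}$, then $\sum_{i=1}^{n} a_i^2 = 0$, and so $a_1 = \ldots = a_n = 0$; i.e., the vectors are independent.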

6.3 The Gram–Schmidt process

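We start with a sequence $\mathbf{v}_1, \ldots, \mathbf{v}_n$ of independent vectors and end up with a sequence $\mathbf{e}_1, \ldots, \mathbf{e}_n$ of orthonormal vectors such that for each $1 \le k \le n$ the set $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$ spans the same subspace as $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$.

Define $\mathbf{w}_1 = \mathbf{v}_1$ and $\mathbf{e}_1 = \mathbf{w}_1 / \|\mathbf{w}_1\|$.

Let $\mathbf{w}_2 = \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{e}_1 \rangle \mathbf{e}_1$, and $\mathbf{e}_2 = \mathbf{w}_2 / \|\mathbf{w}_2\|$.

Then $\mathbf{w}_3 = \mathbf{v}_3 - \langle \mathbf{v}_3, \mathbf{e}_1 \rangle \mathbf{e}_1 - \langle \mathbf{v}_3, \mathbf{e}_2 \rangle \mathbf{e}_2$, and $\mathbf{e}_3 = \mathbf{w}_3 / \|\mathbf{w}_3\|$.

In general

$$\mathbf{w}_{k+1} = \mathbf{v}_{k+1} - \sum_{i=1}^{k} \langle \mathbf{v}_{k+1}, \mathbf{e}_i \rangle \mathbf{e}_i, \quad \text{and} \quad \mathbf{e}_{k+1} = \mathbf{w}_{k+1} / \|\mathbf{w}_{k+1}\|.$$

Then $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ are orthonormal and for each $k$ the vectors $\mathbf{e}_1, \ldots, \mathbf{e}_k$ span the same space as $\mathbf{v}_1, \ldots, \mathbf{v}_k$.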

Proof: Basically, the orthonormality property is shown by induction.

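Suppose that we know that $\mathbf{e}_1, \ldots, \mathbf{e}_k$ are orthonormal ($k = 1$ is already done).
Then we work out $\langle \mathbf{w}_{k+1}, \mathbf{e}_j \rangle$ for $j \le k$. So

$$\langle \mathbf{w}_{k+1}, \mathbf{e}_j \rangle = \langle \mathbf{v}_{k+1}, \mathbf{e}_j \rangle - \sum_{i=1}^{k} \langle \mathbf{v}_{k+1}, \mathbf{e}_i \rangle \langle \mathbf{e}_i, \mathbf{e}_j \rangle = \langle \mathbf{v}_{k+1}, \mathbf{e}_j \rangle - \langle \mathbf{v}_{k+1}, \mathbf{e}_j \rangle = 0.$$

So each new vector $\mathbf{w}_{k+1}$, and hence also $\mathbf{e}_{k+1}$, is orthogonal to the earlier $\mathbf{e}_j$. It isn’t zero since $\mathbf{v}_{k+1}$ is independent of $\mathbf{v}_1, \ldots, \mathbf{v}_k$.

Also $\mathbf{e}_{k+1} = \mathbf{w}_{k+1} / \|\mathbf{w}_{k+1}\|$ implies that $\|\mathbf{e}_{k+1}\| = \|\mathbf{w}_{k+1}\| / \|\mathbf{w}_{k+1}\| = 1$.

The span of $\mathbf{e}_1, \ldots, \mathbf{e}_k$ is $k$-dimensional and contained in $\operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$, so must equal it.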

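Example: Take $\mathbf{v}_1 = (1, 0, 0, 1)$, $\mathbf{v}_2 = (2, 3, 2, 0)$ and $\mathbf{v}_3 = (0, 7, -2, 2)$ in $\mathbb{R}^4$.
Set $\mathbf{w}_1 = \mathbf{v}_1$ and

$$\mathbf{e}_1 = \mathbf{w}_1 / \|\mathbf{w}_1\| = \frac{1}{\sqrt{2}} (1, 0, 0, 1).$$

$$\mathbf{w}_2 = \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{e}_1 \rangle \mathbf{e}_1 = (2, 3, 2, 0) - \frac{2}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} (1, 0, 0, 1) = (1, 3, 2, -1).$$

Note that $\mathbf{w}_2 \perp \mathbf{e}_1$. Then

$$\mathbf{e}_2 = \mathbf{w}_2 / \|\mathbf{w}_2\| = \frac{1}{\sqrt{15}} (1, 3, 2, -1).$$

Next

$$\mathbf{w}_3 = \mathbf{v}_3 - \langle \mathbf{v}_3, \mathbf{e}_1 \rangle \mathbf{e}_1 - \langle \mathbf{v}_3, \mathbf{e}_2 \rangle \mathbf{e}_2 = (0, 7, -2, 2) - \frac{2}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} (1, 0, 0, 1) - \frac{15}{\sqrt{15}} \cdot \frac{1}{\sqrt{15}} (1, 3, 2, -1)$$

$$= (0, 7, -2, 2) - (1, 0, 0, 1) - (1, 3, 2, -1) = (-2, 4, -4, 2).$$

Finally,

$$\mathbf{e}_3 = \mathbf{w}_3 / \|\mathbf{w}_3\| = \frac{(-2, 4, -4, 2)}{\sqrt{40}} = \frac{1}{\sqrt{10}} (-1, 2, -2, 1).$$

Having done this, CHECK that

$$\langle \mathbf{e}_i, \mathbf{e}_j \rangle = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$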
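The CHECK can also be carried out by machine. The following is a minimal floating-point sketch of the Gram–Schmidt process of (6.3) for vectors in $\mathbb{R}^n$; the function names and the tolerance used to detect dependence are illustrative assumptions, and no attempt is made at numerical robustness.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors in R^n, in the given order."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coeff = inner(v, e)                       # <v_{k+1}, e_i>
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        length = math.sqrt(inner(w, w))
        if length < 1e-12:
            raise ValueError("vectors are (numerically) linearly dependent")
        basis.append([wi / length for wi in w])
    return basis

e1, e2, e3 = gram_schmidt([(1, 0, 0, 1), (2, 3, 2, 0), (0, 7, -2, 2)])

# Rescale e3 by sqrt(10) to compare with (-1, 2, -2, 1).
print([round(x * math.sqrt(10), 6) for x in e3])

# Matrix of inner products <e_i, e_j>; it should be close to the identity.
for a in (e1, e2, e3):
    print([round(inner(a, b), 6) for b in (e1, e2, e3)])
```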


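Example (Legendre polynomials):
Take the functions $1, t, t^2, t^3, \ldots$ in $C[-1, 1]$ with inner product

$$\langle f, g \rangle = \int_{-1}^{1} f(t) g(t) \, dt.$$

Now $\|1\|^2 = \int_{-1}^{1} 1 \, dt = 2$, so $e_1(t) = 1/\sqrt{2}$.

Next take

$$w_2(t) = t - \langle t, e_1 \rangle e_1 = t - \frac{1}{\sqrt{2}} \int_{-1}^{1} \frac{t}{\sqrt{2}} \, dt = t - 0 = t.$$

Also

$$\|w_2\|^2 = \int_{-1}^{1} t^2 \, dt = \frac{2}{3}, \quad \text{so} \quad e_2(t) = w_2(t) / \|w_2\| = \sqrt{\frac{3}{2}}\, t.$$

Then

$$w_3(t) = t^2 - \langle t^2, e_1 \rangle e_1(t) - \langle t^2, e_2 \rangle e_2(t) = t^2 - \frac{1}{2} \int_{-1}^{1} t^2 \, dt - \frac{3}{2} t \int_{-1}^{1} t^3 \, dt = t^2 - \frac{1}{3} - 0 = t^2 - \frac{1}{3}.$$

But

$$\|w_3\|^2 = \int_{-1}^{1} \left( t^2 - \frac{1}{3} \right)^2 dt = \int_{-1}^{1} \left( t^4 - \frac{2}{3} t^2 + \frac{1}{9} \right) dt = \frac{2}{5} - \frac{4}{9} + \frac{2}{9} = \frac{8}{45},$$

so

$$e_3(t) = \sqrt{\frac{45}{8}} \left( t^2 - \frac{1}{3} \right) = \sqrt{\frac{5}{8}}\, (3t^2 - 1).$$

In general, $e_n(t)$ has degree $n - 1$. Lots of useful systems of polynomials are obtained by orthonormalizing $1, t, t^2, t^3, \ldots$ with respect to different inner products (e.g. Chebyshev, Hermite, Laguerre, ...).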
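The Legendre calculation can be reproduced exactly with rational arithmetic. This is only a sketch under some assumptions made here: polynomials are stored as coefficient lists `[c0, c1, c2, ...]`, the helper names are invented, and the normalisation step is skipped (so the code produces the $w_k$ rather than the $e_k$) to avoid square roots.

```python
from fractions import Fraction

def poly_inner(p, q):
    """Exact integral over [-1, 1] of p(t) * q(t); polynomials are coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                      # odd powers integrate to 0
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def orthogonalise(polys):
    """Gram-Schmidt without normalising, so all arithmetic stays rational."""
    ws = []
    for p in polys:
        w = [Fraction(c) for c in p]
        for u in ws:
            # Projecting onto u/||u|| is the same as using <p, u>/<u, u> times u.
            coeff = poly_inner(p, u) / poly_inner(u, u)
            padded = u + [Fraction(0)] * (len(w) - len(u))
            w = [wc - coeff * uc for wc, uc in zip(w, padded)]
        ws.append(w)
    return ws

monomials = [[1], [0, 1], [0, 0, 1]]          # 1, t, t^2
w1, w2, w3 = orthogonalise(monomials)
print(w3)                     # coefficients of t^2 - 1/3, lowest degree first
print(poly_inner(w3, w3))     # 8/45
```

Dividing each $w_k$ by the square root of `poly_inner(w_k, w_k)` would then recover the orthonormal $e_k$ above.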


LECTURE 10

7 Orthogonal projections and best approximation

Many approximation problems consist of taking a vector $\mathbf{v}$ and a subspace $W$ of an inner-product space, and then finding the closest element $\mathbf{w}$ in $W$ to $\mathbf{v}$, i.e., minimizing the size of the error $\mathbf{v} - \mathbf{w}$.

Examples:
1. Take $\mathbb{R}^3$ with the usual inner product and $W$ a plane through the origin.
The closest point of $W$ is obtained by “dropping a perpendicular onto $W$”.

[DIAGRAM]

2. Find the best approximation to the function $f(t) = |t|$ on $[-1, 1]$ by a quadratic $g(t) = a + bt + ct^2$, in the sense of minimizing

$$\|f - g\|^2 = \int_{-1}^{1} (f(t) - g(t))^2 \, dt.$$

7.1 Theorem

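Let $W$ be a (finite-dimensional) subspace of an inner-product space $V$, let $\mathbf{v} \in V$, and let $\mathbf{w} \in W$ satisfy

\[ \langle \mathbf{v} - \mathbf{w}, \mathbf{z}\rangle = 0 \quad\text{for all } \mathbf{z} \in W. \]

Then $\|\mathbf{v} - \mathbf{y}\| \ge \|\mathbf{v} - \mathbf{w}\|$ for all $\mathbf{y} \in W$. That is, $\mathbf{w}$ is the closest point in $W$ to $\mathbf{v}$, and it is unique.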

[DIAGRAM: plot v, w, y.]

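Proof: for $\mathbf{y} \in W$ write $\mathbf{v} - \mathbf{y} = (\mathbf{v} - \mathbf{w}) + (\mathbf{w} - \mathbf{y})$ and note that $\mathbf{v} - \mathbf{w}$ is orthogonal to $\mathbf{w} - \mathbf{y}$, since $\mathbf{w} - \mathbf{y}$ is in $W$. By Pythagoras’s theorem (4.3),

\[ \|\mathbf{v} - \mathbf{y}\|^2 = \|\mathbf{v} - \mathbf{w}\|^2 + \|\mathbf{w} - \mathbf{y}\|^2 \ge \|\mathbf{v} - \mathbf{w}\|^2, \]

as required. Note that if $\mathbf{y} \neq \mathbf{w}$, then $\|\mathbf{v} - \mathbf{y}\| > \|\mathbf{v} - \mathbf{w}\|$, so the closest point is unique.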

7.2 Definition

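If $W$ is a subspace of an inner product space $V$, then its orthogonal complement, $W^\perp$, is the set of all vectors $\mathbf{u}$ that are orthogonal to every vector of $W$.

Clearly $\mathbf{0} \in W^\perp$, and indeed $W^\perp$ is a subspace, since if $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal to everything in $W$, then $\langle a_1\mathbf{u}_1 + a_2\mathbf{u}_2, \mathbf{w}\rangle = a_1\langle \mathbf{u}_1, \mathbf{w}\rangle + a_2\langle \mathbf{u}_2, \mathbf{w}\rangle = 0$ for all $\mathbf{w} \in W$.

Example: if $W$ is the 1-dimensional subspace of $\mathbb{R}^3$ spanned by the vector $\mathbf{w} = (3, 5, 7)$, then $\mathbf{x} = (x_1, x_2, x_3)$ is in $W^\perp$ if and only if $\langle \mathbf{x}, \mathbf{w}\rangle = 0$, i.e., $3x_1 + 5x_2 + 7x_3 = 0$. This is the plane perpendicular to $W$.

It can be checked that $(W^\perp)^\perp$ is $W$ again.

Now in (7.1) we have that if $\mathbf{v} - \mathbf{w}$ lies in $W^\perp$, then $\mathbf{w}$ is the best approximation to $\mathbf{v}$ by vectors in $W$.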

7.3 The normal equations

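Suppose that $\mathbf{w}_1, \ldots, \mathbf{w}_n$ is a basis for $W$. Then the best approximant $\mathbf{w}$ to $\mathbf{v}$ is found by solving

\[ \langle \mathbf{v} - \mathbf{w}, \mathbf{w}_i\rangle = 0 \quad\text{for each } i, \]

because this makes $\mathbf{v} - \mathbf{w}$ orthogonal to all linear combinations of the $\mathbf{w}_i$. Hence we have

\[ \langle \mathbf{w}, \mathbf{w}_i\rangle = \langle \mathbf{v}, \mathbf{w}_i\rangle \quad\text{for each } i. \]

Suppose now that $\mathbf{w} = \sum_{k=1}^{n} c_k\mathbf{w}_k$ is the best approximant. Then we have

\[ \sum_{k=1}^{n} c_k\langle \mathbf{w}_k, \mathbf{w}_i\rangle = \langle \mathbf{v}, \mathbf{w}_i\rangle \]

for each $i = 1, \ldots, n$.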

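Example. In $C[-1, 1]$ we take $f(t) = |t|$; to approximate it by a quadratic take $w_1(t) = 1$, $w_2(t) = t$ and $w_3(t) = t^2$. The best approximant $c_0 + c_1t + c_2t^2$ to $|t|$ satisfies:

\[
\begin{aligned}
c_0\langle 1, 1\rangle + c_1\langle t, 1\rangle + c_2\langle t^2, 1\rangle &= \langle f, 1\rangle, \\
c_0\langle 1, t\rangle + c_1\langle t, t\rangle + c_2\langle t^2, t\rangle &= \langle f, t\rangle, \\
c_0\langle 1, t^2\rangle + c_1\langle t, t^2\rangle + c_2\langle t^2, t^2\rangle &= \langle f, t^2\rangle.
\end{aligned}
\]

Now we can easily check that

\[ \int_{-1}^{1} t^k\,dt = \begin{cases} 0 & \text{if } k \text{ is odd}, \\ \dfrac{2}{k+1} & \text{if } k \text{ is even}, \end{cases} \]

so we can soon calculate inner products and get

\[
\begin{aligned}
2c_0 + 0 + \tfrac{2}{3}c_2 &= \int_{-1}^{1} |t|\,dt = 1, \\
0 + \tfrac{2}{3}c_1 + 0 &= \int_{-1}^{1} |t|\,t\,dt = 0, \\
\tfrac{2}{3}c_0 + 0 + \tfrac{2}{5}c_2 &= \int_{-1}^{1} t^2|t|\,dt = \tfrac{1}{2}.
\end{aligned}
\]

Note that

\[ \int_{-1}^{1} |t|\,dt = \int_{-1}^{0} (-t)\,dt + \int_{0}^{1} t\,dt, \]

etc. The solution to these equations is $c_0 = \frac{3}{16}$, $c_1 = 0$ and $c_2 = \frac{15}{16}$, giving the approximation

\[ |t| \approx \frac{3}{16} + \frac{15}{16}t^2. \]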
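As a check on this answer, Theorem 7.1 says the error $f - g$ must be orthogonal to $1$, $t$ and $t^2$. The sketch below verifies this in exact arithmetic; the function names are illustrative, and it uses the fact that $\int_{-1}^{1} |t|\,t^k\,dt = 2/(k+2)$ for even $k$ and $0$ for odd $k$ (split the integral at $0$ as in the note above).

```python
from fractions import Fraction

def mono(k):
    """Integral of t^k over [-1, 1]."""
    return Fraction(2, k + 1) if k % 2 == 0 else Fraction(0)

def abs_mono(k):
    """Integral of |t| t^k over [-1, 1]."""
    return Fraction(2, k + 2) if k % 2 == 0 else Fraction(0)

# Coefficients of the claimed best approximant 3/16 + 0*t + (15/16) t^2.
c = [Fraction(3, 16), Fraction(0), Fraction(15, 16)]

# residuals[i] = <f - g, t^i> for i = 0, 1, 2; all should vanish.
residuals = [
    abs_mono(i) - sum(ck * mono(k + i) for k, ck in enumerate(c))
    for i in range(3)
]
```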

7.4 Corollary

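Suppose that $\mathbf{e}_1, \ldots, \mathbf{e}_n$ is an orthonormal basis for $W$. Then the best approximant of $\mathbf{v} \in V$ by an element of $W$ is

\[ \mathbf{w} = \sum_{k=1}^{n} \langle \mathbf{v}, \mathbf{e}_k\rangle \mathbf{e}_k. \]

Proof: Let $\mathbf{w} = \sum_{k=1}^{n} c_k\mathbf{e}_k$. Then the normal equations become

\[ \sum_{k=1}^{n} c_k\langle \mathbf{e}_k, \mathbf{e}_i\rangle = \langle \mathbf{v}, \mathbf{e}_i\rangle, \]

which reduces to $c_i = \langle \mathbf{v}, \mathbf{e}_i\rangle$ using orthonormality.

Thus we could have solved the example of approximating $f(t) = |t|$ by using an orthonormal basis for the quadratic polynomials, e.g. the Legendre functions.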

7.5 Definition

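The orthogonal projection of $\mathbf{v}$ onto $W$, written $P_W\mathbf{v}$, is the closest vector $\mathbf{w} \in W$ to $\mathbf{v}$. In particular,

\[ P_W\mathbf{v} = \sum_{k=1}^{n} \langle \mathbf{v}, \mathbf{e}_k\rangle \mathbf{e}_k, \]

if $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is an orthonormal basis of $W$. Note that $P_W : V \to W$ is a linear mapping.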

LECTURE 11

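Example: the plane $W = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 + x_2 + x_3 = 0\}$ is a 2-dimensional subspace with orthonormal basis $\mathbf{e}_1 = \frac{1}{\sqrt{2}}(1, -1, 0)$ and $\mathbf{e}_2 = \frac{1}{\sqrt{6}}(1, 1, -2)$. CHECK that these are orthonormal and lie in $W$ (so, since $\dim W = 2$, they are also a basis for it).

Calculate $P_W(1, 0, 0)$. It is

\[
\begin{aligned}
P_W(1, 0, 0) &= \langle (1, 0, 0), \mathbf{e}_1\rangle \mathbf{e}_1 + \langle (1, 0, 0), \mathbf{e}_2\rangle \mathbf{e}_2 \\
&= \tfrac{1}{2}(1, -1, 0) + \tfrac{1}{6}(1, 1, -2) = \bigl(\tfrac{2}{3}, -\tfrac{1}{3}, -\tfrac{1}{3}\bigr).
\end{aligned}
\]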

Now for some more serious applications of the theory.

7.6 Least squares approximation

Problem: find the line through $(0, 0)$ (to be varied later) which “best approximates” the data $(x_1, y_1), \ldots, (x_n, y_n)$. We would like $y_i = cx_i$ for each $i$, but we don’t know $c$ and the points won’t always lie exactly on a line.

[DIAGRAM]

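We decide to minimize $\sum_{i=1}^{n} (y_i - cx_i)^2$; this is least squares approximation, useful in statistical applications.

This is the same as taking $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ in $\mathbb{R}^n$ and minimizing $\|\mathbf{y} - c\mathbf{x}\|$. Take $V$ to be $\mathbb{R}^n$ with the usual inner product, and $W$ to be the one-dimensional subspace $\{a\mathbf{x} : a \in \mathbb{R}\}$. This is the same as finding the closest point to $\mathbf{y}$ in $W$.

Solution: take

\[ c = \frac{\langle \mathbf{y}, \mathbf{x}\rangle}{\langle \mathbf{x}, \mathbf{x}\rangle}, \]

since this is the orthogonal projection onto $W$. In detail, $\mathbf{w} = c\mathbf{x}$, and the normal equation is $\langle \mathbf{w}, \mathbf{x}\rangle = \langle \mathbf{y}, \mathbf{x}\rangle$, or $c\langle \mathbf{x}, \mathbf{x}\rangle = \langle \mathbf{y}, \mathbf{x}\rangle$. So

\[ c = \frac{x_1y_1 + \ldots + x_ny_n}{x_1^2 + \ldots + x_n^2}. \]

Example: find the best fit to the data

\[
\begin{array}{cc}
x & y \\
2 & 3 \\
1 & 2 \\
3 & 3 \\
4 & 5
\end{array}
\]

Solution:

\[ c = \frac{2\cdot 3 + 1\cdot 2 + 3\cdot 3 + 4\cdot 5}{2^2 + 1^2 + 3^2 + 4^2} = \frac{37}{30}. \]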
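The formula for $c$ is easy to evaluate by machine. A minimal sketch (the function name `slope_through_origin` is illustrative, and integer data are assumed so that the answer can be kept as an exact fraction):

```python
from fractions import Fraction

def slope_through_origin(xs, ys):
    """Least squares slope c = <y, x> / <x, x> for the model y = c x."""
    return Fraction(sum(x * y for x, y in zip(xs, ys)),
                    sum(x * x for x in xs))

xs = [2, 1, 3, 4]
ys = [3, 2, 3, 5]
c = slope_through_origin(xs, ys)

# The error vector y - c x should be orthogonal to x (the normal equation).
normal_equation_residual = sum((y - c * x) * x for x, y in zip(xs, ys))
```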

7.7 Generalization

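Suppose that $y$ is known/guessed to be a linear combination of $m$ variables $x_1, \ldots, x_m$, $y = c_1x_1 + \ldots + c_mx_m$, so we have experimental data

\[
\begin{array}{ccccc}
x_1 & x_2 & \ldots & x_m & y \\
x_{11} & x_{21} & \ldots & x_{m1} & y_1 \\
\vdots & \vdots & & \vdots & \vdots \\
x_{1n} & x_{2n} & \ldots & x_{mn} & y_n
\end{array}
\]

Set up the problem in $\mathbb{R}^n$, writing $\mathbf{x}_k = (x_{k1}, \ldots, x_{kn})$ for the column of observed values of the $k$th variable and $\mathbf{y} = (y_1, \ldots, y_n)$, and choose $c_1, \ldots, c_m$ to minimize $\|\mathbf{y} - (c_1\mathbf{x}_1 + \ldots + c_m\mathbf{x}_m)\|$. If $W = \operatorname{span}\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$, then we want the closest point in $W$ to $\mathbf{y}$. We know from (7.3) that the constants $c_1, \ldots, c_m$ are determined by the normal equations:

\[ \Bigl\langle \sum_{k=1}^{m} c_k\mathbf{x}_k, \mathbf{x}_i\Bigr\rangle = \langle \mathbf{y}, \mathbf{x}_i\rangle \quad\text{for each } i, \]

i.e.,

\[
\begin{aligned}
c_1\langle \mathbf{x}_1, \mathbf{x}_1\rangle + \ldots + c_m\langle \mathbf{x}_m, \mathbf{x}_1\rangle &= \langle \mathbf{y}, \mathbf{x}_1\rangle \\
&\;\;\vdots \\
c_1\langle \mathbf{x}_1, \mathbf{x}_m\rangle + \ldots + c_m\langle \mathbf{x}_m, \mathbf{x}_m\rangle &= \langle \mathbf{y}, \mathbf{x}_m\rangle
\end{aligned}
\]

To get a unique solution we need the vectors $\mathbf{x}_1, \ldots, \mathbf{x}_m$ to be independent, which requires $n \ge m$.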

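Example: Use the method of least squares approximation to find the best relation of the form $y = c_1x_1 + c_2x_2$ fitting the following experimental data:

\[
\begin{array}{cccc}
 & x_1 & x_2 & y \\
\text{i)} & 1 & 0 & 2 \\
\text{ii)} & 0 & 1 & 3 \\
\text{iii)} & 1 & 1 & 2 \\
\text{iv)} & 1 & -1 & 0
\end{array}
\]

Solution: We work in $\mathbb{R}^4$ and take $\mathbf{x}_1 = (1, 0, 1, 1)$, $\mathbf{x}_2 = (0, 1, 1, -1)$ and $\mathbf{y} = (2, 3, 2, 0)$. Normal equations are

\[
\begin{aligned}
3c_1 + 0c_2 &= 4, \\
0c_1 + 3c_2 &= 5,
\end{aligned}
\]

so $c_1 = 4/3$ and $c_2 = 5/3$. So the best relation is $y = \frac{4}{3}x_1 + \frac{5}{3}x_2$, giving

\[
\begin{array}{cccc}
x_1 & x_2 & y_{\text{experimental}} & y_{\text{theoretical}} \\
1 & 0 & 2 & 4/3 \\
0 & 1 & 3 & 5/3 \\
1 & 1 & 2 & 3 \\
1 & -1 & 0 & -1/3
\end{array}
\]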

7.8 Curve fitting

Given $(x_1, y_1), \ldots, (x_n, y_n)$ find a (polynomial) curve which fits these points well in the sense of least squares approximation.

Example: Find the parabola $y = c_0 + c_1x + c_2x^2$ which best fits the points $(0, 0)$, $(1, 4)$, $(-1, 1)$, $(-2, 5)$.

Solution: Apply the method of least squares approximation to $y = c_0x_0 + c_1x_1 + c_2x_2$ with $x_0 = 1$, $x_1 = x$ and $x_2 = x^2$. Put $\mathbf{x}_0 = (1, 1, 1, 1)$, $\mathbf{x}_1 = (0, 1, -1, -2)$, $\mathbf{x}_2 = (0, 1, 1, 4)$, $\mathbf{y} = (0, 4, 1, 5)$. Note that $\mathbf{x}_0$ is the vector with all components 1, $\mathbf{x}_1$ the vector of $x$ values, and $\mathbf{x}_2$ the vector of $x^2$ values. Normal equations are

\[
\begin{aligned}
4c_0 - 2c_1 + 6c_2 &= 10, \\
-2c_0 + 6c_1 - 8c_2 &= -7, \\
6c_0 - 8c_1 + 18c_2 &= 25,
\end{aligned}
\]

from which $c_0 = 3/10$, $c_1 = 8/5$ and $c_2 = 2$.

Example: Find the line $y = c_0 + c_1x$ which best fits the points $(2, 3)$, $(1, 2)$, $(3, 3)$ and $(4, 5)$. (Data used earlier to get $y = cx$ only.)

Solution: let $\mathbf{x}_0 = (1, 1, 1, 1)$, $\mathbf{x}_1 = (2, 1, 3, 4)$ and $\mathbf{y} = (3, 2, 3, 5)$. So we want $\mathbf{y} \approx c_0\mathbf{x}_0 + c_1\mathbf{x}_1$. Normal equations are

\[
\begin{aligned}
4c_0 + 10c_1 &= 13, \\
10c_0 + 30c_1 &= 37,
\end{aligned}
\]

giving $c_0 = 1$ and $c_1 = 9/10$, or $y = 1 + \frac{9}{10}x$.
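The recipe of 7.7 and 7.8 (build the vectors, form the normal equations, solve) can be sketched in a few lines. This is an illustration rather than a prescribed method: the names `solve`, `least_squares` and `polynomial_fit` are hypothetical, the linear system is solved by Gauss-Jordan elimination in exact fractions, and the column vectors are assumed to be linearly independent.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]

def least_squares(columns, y):
    """Solve the normal equations sum_k c_k <x_k, x_i> = <y, x_i>."""
    A = [[dot(xk, xi) for xk in columns] for xi in columns]
    b = [dot(y, xi) for xi in columns]
    return solve(A, b)

def polynomial_fit(points, degree):
    """Best c_0 + c_1 x + ... + c_degree x^degree in the least squares sense."""
    xs = [Fraction(x) for x, _ in points]
    ys = [y for _, y in points]
    columns = [[x ** k for x in xs] for k in range(degree + 1)]
    return least_squares(columns, ys)

two_variable = least_squares([[1, 0, 1, 1], [0, 1, 1, -1]], [2, 3, 2, 0])
parabola = polynomial_fit([(0, 0), (1, 4), (-1, 1), (-2, 5)], 2)
line = polynomial_fit([(2, 3), (1, 2), (3, 3), (4, 5)], 1)
```

The three results reproduce the coefficients found by hand in the examples of 7.7 and 7.8.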

LECTURE 12

8 Cauchy sequences and completeness

Recall that if $(X, d)$ is a metric space, then a sequence $(x_n)$ of elements of $X$ converges to $x \in X$ if $d(x_n, x) \to 0$, i.e., if given $\varepsilon > 0$ there exists $N$ such that $d(x_n, x) < \varepsilon$ whenever $n \ge N$.

Often we think of convergent sequences as ones where $x_n$ and $x_m$ are close together when $n$ and $m$ are large. This is almost, but not quite, the same thing.

8.1 Definition

A sequence $(x_n)$ in a metric space $(X, d)$ is a Cauchy sequence if for any $\varepsilon > 0$ there is an $N$ such that $d(x_n, x_m) < \varepsilon$ for all $n, m \ge N$.

Example: take $x_n = 1/n$ in $\mathbb{R}$ with the usual metric. Now $d(x_n, x_m) = \bigl|\frac{1}{n} - \frac{1}{m}\bigr|$. Suppose that $n$ and $m$ are both at least as big as $N$; then $d(x_n, x_m) \le 1/N$.

[DIAGRAM, showing the points]

Hence if $\varepsilon > 0$ and we take $N > 1/\varepsilon$, we have $d(x_n, x_m) \le 1/N < \varepsilon$ whenever $n$ and $m$ are both $\ge N$.

In fact all convergent sequences are Cauchy sequences, by the following result.

8.2 Theorem

Suppose that $(x_n)$ is a convergent sequence in a metric space $(X, d)$, i.e., there is a limit point $x$ such that $d(x_n, x) \to 0$. Then $(x_n)$ is a Cauchy sequence.

Proof: take $\varepsilon > 0$. Then there is an $N$ such that $d(x_n, x) < \varepsilon/2$ whenever $n \ge N$. Now suppose both $n \ge N$ and $m \ge N$. Then

\[ d(x_n, x_m) \le d(x_n, x) + d(x, x_m) = d(x_n, x) + d(x_m, x) < \varepsilon/2 + \varepsilon/2 = \varepsilon, \]

and we are done.

8.3 Proposition

Every subsequence of a Cauchy sequence is a Cauchy sequence.

Proof: if $(x_n)$ is Cauchy and $(x_{n_k})$ is a subsequence, then given $\varepsilon > 0$ there is an $N$ such that $d(x_n, x_m) < \varepsilon$ whenever $n, m \ge N$. Now there is a $K$ such that $n_k \ge N$ whenever $k \ge K$. So $d(x_{n_k}, x_{n_l}) < \varepsilon$ whenever $k, l \ge K$.

Does every Cauchy sequence converge?

Examples: 1. $(X, d) = \mathbb{Q}$, as a subspace of $\mathbb{R}$ with the usual metric. Take $x_0 = 2$ and define $x_{n+1} = \frac{x_n}{2} + \frac{1}{x_n}$. The sequence continues $3/2, 17/12, 577/408, \ldots$ and indeed $x_n \to x$ where $x = \frac{x}{2} + \frac{1}{x}$, i.e., $x^2 = 2$. But this isn’t in $\mathbb{Q}$.

Thus $(x_n)$ is Cauchy in $\mathbb{R}$, since it converges to $\sqrt{2}$ when we think of it as a sequence in $\mathbb{R}$. So it is Cauchy in $\mathbb{Q}$, but doesn’t converge to a point of $\mathbb{Q}$.

2. Easier. Take $(X, d) = (0, 1)$. Then $\bigl(\frac{1}{n}\bigr)_{n \ge 2}$ is a Cauchy sequence in $X$ (since it is Cauchy in $\mathbb{R}$, as seen above), and has no limit in $X$.

In each case there are “points missing from X”.
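The first example can be followed entirely inside $\mathbb{Q}$ by using exact rational arithmetic. The sketch below is illustrative (the name `sqrt2_iterates` is not from the notes): every iterate is a fraction, successive terms get closer together, and $x_n^2 - 2$ shrinks towards $0$ without ever reaching it.

```python
from fractions import Fraction

def sqrt2_iterates(n, x0=Fraction(2)):
    """First n+1 terms of x_{k+1} = x_k / 2 + 1 / x_k, kept as exact fractions."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x / 2 + 1 / x)
    return xs

xs = sqrt2_iterates(4)

# Distances between consecutive terms, and how far x_n^2 is from 2.
gaps = [abs(a - b) for a, b in zip(xs, xs[1:])]
errors = [x * x - 2 for x in xs]
```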

8.4 Definition

A metric space $(X, d)$ is complete if every Cauchy sequence in $X$ converges to a limit in $X$.

Is $\mathbb{R}$ complete? What do we mean by $\mathbb{R}$? We could regard it as the set of all infinite decimal numbers; but since there is an ambiguity e.g. 0.999. . . = 1.000. . . , we have to allow for this, e.g. by regarding all the recurring-9 numbers as the same as the corresponding recurring-0 numbers.

Cauchy sequences can be awkward, e.g. $x_n = \frac{1}{2} + (-1)^n\frac{1}{10^n}$, i.e., 0.4, 0.51, 0.499, 0.5001, 0.49999, . . . , will converge to 0.5, even though the individual digits do not converge.

8.5 Theorem

R is complete.

We do this in several stages.

A: Every bounded increasing or decreasing sequence in $\mathbb{R}$ converges. Increasing means $x_1 \le x_2 \le \ldots$ and you can guess what decreasing means. Monotone means either increasing or decreasing.
B: Every Cauchy sequence in $\mathbb{R}$ is bounded.
C: Every sequence in $\mathbb{R}$ has a monotone subsequence.
D: If a Cauchy sequence has a convergent subsequence, then the original sequence converges.
E: $\mathbb{R}$ is complete.

Proof of E: let $(x_n)$ be a Cauchy sequence in $\mathbb{R}$. By (C) it has a monotone subsequence $(x_{n_k})$, which is also Cauchy by (8.3). By (B) this sequence is bounded. So by (A) it converges. Now by (D) the original sequence converges.

Proof of A: we can take this as an axiom of $\mathbb{R}$, or observe that if the numbers are increasing and bounded, then eventually the integer parts are constant, then the first digit after the decimal point, then the second, . . . , so it is clear what number we want as our limit. But if $x_n$ agrees with $x$ to $k$ decimal places then $|x_n - x| < 10^{-k}$; this shows that $x_n \to x$.

Example: 1.0, 1.2, 1.4, 1.41, 1.412, 1.414, 1.4141, 1.4142, ... is homing in on $\sqrt{2}$.

LECTURE 13

Proof of B: if $(x_n)$ is Cauchy, then with $\varepsilon = 1$ we know that $|x_m - x_n| < 1$ whenever $m, n \ge N$. Now $|x_n| \le |x_n - x_N| + |x_N| < 1 + |x_N|$ for all $n \ge N$. Let $K = \max\{|x_1|, |x_2|, \ldots, |x_{N-1}|, 1 + |x_N|\}$. Then $|x_n| \le K$ for all $n$.

Proof of D: suppose that $(x_n)$ is Cauchy in $(X, d)$ and $\lim_{k\to\infty} x_{n_k} = y$. Take $\varepsilon > 0$. Then there exists $N$ such that $d(x_m, x_n) < \varepsilon/2$ whenever $m, n \ge N$; and $K$ such that $d(x_{n_k}, y) < \varepsilon/2$ whenever $k \ge K$. Choose $k \ge K$ such that $n_k \ge N$. Then for $n \ge N$,

\[ d(x_n, y) \le d(x_n, x_{n_k}) + d(x_{n_k}, y) < \varepsilon, \]

and so $d(x_n, y) \to 0$ as $n \to \infty$.

Proof of C: let $(x_n)$ be a sequence in $\mathbb{R}$. We say that $x_m$ is a peak point of the sequence if $x_m \ge x_n$ for all $n > m$.

[DIAGRAM]

Case 1: only finitely many peak points. Choose $n_1$ large so that $x_n$ is not a peak point for any $n \ge n_1$. Since $x_{n_1}$ is not a peak point we can find $n_2 > n_1$ with $x_{n_2} > x_{n_1}$; since $x_{n_2}$ is not a peak point we can find $n_3 > n_2$ with $x_{n_3} > x_{n_2}$; and so on. Now $(x_{n_k})$ is strictly increasing.

Case 2: $(x_n)$ has infinitely many peak points, say, $x_{n_1}, x_{n_2}, \ldots$, with $n_1 < n_2 < \ldots$. Now $x_{n_1} \ge x_{n_2} \ge \ldots$, so $(x_{n_k})$ is a decreasing subsequence.

We have finished the proof that R is complete.

8.6 Corollary

A subset X ⊂ R is complete if and only if it is closed.

Proof: If $X$ is not closed, then $\overline{X} \neq X$, so there is a point $y \in \mathbb{R}$ such that $y \in \overline{X} \setminus X$. There is a sequence $(x_n)$ in $X$ that converges to $y$, by Theorem 2.2. Then $(x_n)$ is a Cauchy sequence by Theorem 8.2, but it does not have a limit in $X$, so $X$ is not complete.

Conversely, if $X$ is closed and $(x_n)$ is a Cauchy sequence in $X$, then it has a limit $y$ in $\mathbb{R}$, since $\mathbb{R}$ is complete, by Theorem 8.5. But then $y \in \overline{X}$ by Theorem 2.2, so $y \in X$ since $X$ is closed. Hence $X$ is complete.

□

Examples: open intervals in R are not complete; closed intervals are complete.

What about C[a, b] with d1, d2 or d∞?

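Define $f_n$ in $C[0, 2]$ by

\[ f_n(x) = \begin{cases} x^n & \text{for } 0 \le x \le 1, \\ 1 & \text{for } 1 \le x \le 2. \end{cases} \]

[DIAGRAM]

Then

\[
\begin{aligned}
d_1(f_n, f_m) &= \int_0^2 |f_n(x) - f_m(x)|\,dx \\
&= \int_0^1 |x^n - x^m|\,dx \\
&= \int_0^1 (x^m - x^n)\,dx \quad\text{if } n \ge m \\
&= \frac{1}{m+1} - \frac{1}{n+1} \le \frac{1}{m+1} \to 0,
\end{aligned}
\]

and hence $(f_n)$ is Cauchy in $(C[0, 2], d_1)$. Does the sequence converge?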

If there is an $f \in C[0, 2]$ with $f_n \to f$ as $n \to \infty$, then $\int_0^2 |f_n(x) - f(x)|\,dx \to 0$, so $\int_0^1 |f_n(x) - f(x)|\,dx$ and $\int_1^2 |f_n(x) - f(x)|\,dx$ both tend to zero. So $f_n \to f$ in $(C[0, 1], d_1)$, which means that $f(x) = 0$ on $[0, 1]$ (from an example we did earlier). Likewise, $f = 1$ on $[1, 2]$, which doesn’t give a continuous limit.

Similarly, $(C[a, b], d_1)$ is incomplete in general. Also it is incomplete in the $d_2$ metric, as the same example shows (a similar calculation with squares of functions).

What about d∞?

8.7 Definition

A sequence $(f_n)$ of (not necessarily continuous) functions defined on $[a, b]$ is said to converge uniformly to $f$ if $\sup\{|f_n(x) - f(x)| : x \in [a, b]\} \to 0$ as $n \to \infty$. (If these are continuous functions, then this is just convergence in the $d_\infty$ metric.)

8.8 Theorem

If (fn) are continuous functions and fn → f uniformly, then f is also continuous.

Proof: Take $\varepsilon > 0$ and a point $x \in [a, b]$. Then there is an $N$ such that $|f_n(t) - f(t)| < \varepsilon/3$ for all $t \in [a, b]$ whenever $n \ge N$. Now $f_N$ is continuous, so we can choose $\delta > 0$ such that $|f_N(t) - f_N(x)| < \varepsilon/3$ for all $t \in [a, b]$ with $|t - x| < \delta$. Then

\[
\begin{aligned}
|f(t) - f(x)| &\le |f(t) - f_N(t)| + |f_N(t) - f_N(x)| + |f_N(x) - f(x)| \\
&\le \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon
\end{aligned}
\]

whenever $t \in [a, b]$ and $|t - x| < \delta$. Hence $f$ is continuous at $x$.

Thus, for example, the functions $f_n(t) = t^n$ converge pointwise on $[0, 1]$ to

\[ g(t) = \begin{cases} 0 & \text{for } 0 \le t < 1, \\ 1 & \text{for } t = 1, \end{cases} \]

but $g$ is not continuous, so the convergence isn’t uniform.
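The contrast between $d_1$ and $d_\infty$ for the functions $t^n$ on $[0, 1]$ can be made concrete. The sketch below is an illustration added here, with hypothetical function names. It uses two facts: $d_1(t^n, t^m) = \bigl|\frac{1}{m+1} - \frac{1}{n+1}\bigr|$ as computed earlier, and at the point $t = 2^{-1/n}$ we have $t^n - t^{2n} = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$, so $d_\infty(f_n, f_{2n}) \ge \frac{1}{4}$ for every $n$. So the sequence is Cauchy for $d_1$ but not for $d_\infty$.

```python
from fractions import Fraction

def d1(n, m):
    """Exact d_1 distance between t^n and t^m on [0, 1]."""
    return abs(Fraction(1, m + 1) - Fraction(1, n + 1))

def gap_at_witness(n):
    """Value of t^n - t^(2n) at t = 2^(-1/n); a lower bound for d_inf(f_n, f_2n)."""
    t = 2.0 ** (-1.0 / n)
    return t ** n - t ** (2 * n)

# d_1 distances between f_n and f_2n shrink to 0 ...
d1_values = [d1(2 * n, n) for n in (1, 10, 100)]
# ... while the sup distance stays at least about 1/4.
sup_lower_bounds = [gap_at_witness(n) for n in (1, 10, 100)]
```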

LECTURE 14

8.9 Theorem

(C[a, b], d∞) is a complete metric space.

Proof: take a Cauchy sequence (fn) in (C[a, b], d∞). The proof goes in two steps.

I: For each $x \in [a, b]$, $(f_n(x))$ is a Cauchy sequence in $\mathbb{R}$, and so has a limit, which we call $f(x)$.

II: $f_n \to f$ uniformly; hence $f \in C[a, b]$ and $d_\infty(f_n, f) \to 0$.

Step I: given $\varepsilon > 0$ there is an $N$ with $d_\infty(f_n, f_m) < \varepsilon$ for $n, m \ge N$, since $(f_n)$ is Cauchy. But $|f_n(x) - f_m(x)| \le d_\infty(f_n, f_m)$ and so this is also $< \varepsilon$ for $n, m \ge N$. So $(f_n(x))$ is a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete by (8.5), we see that there is a limiting value $f(x)$.

Step II: take $\varepsilon > 0$ and $N$ as in Step I. Then $|f_n(x) - f_m(x)| < \varepsilon$ for each $x$, provided that $n, m \ge N$. Fix $n \ge N$ and let $m \to \infty$. We conclude that $|f_n(x) - f(x)| \le \varepsilon$ for each $x$, provided that $n \ge N$. This is just the uniform convergence of $f_n$ to $f$. So $f$ is continuous, i.e., $f \in C[a, b]$ by (8.8), and $d_\infty(f_n, f) \to 0$.

8.10 Remark

Note that $\mathbb{R}^2$ is also complete with any of the metrics $d_1$, $d_2$ and $d_\infty$, since a Cauchy/convergent sequence $(\mathbf{v}_n) = (x_n, y_n)$ in $\mathbb{R}^2$ is just one in which both $(x_n)$ and $(y_n)$ are Cauchy/convergent sequences in $\mathbb{R}$ (cf. Prop. 2.4).

Similar arguments show that $\mathbb{R}^k$ is also complete for $k = 1, 2, 3, \ldots$, and (with the same proof as for Corollary 8.6) all closed subsets of $\mathbb{R}^k$ are complete.

9 Contraction mappings

Our aim is to use metric spaces to solve equations by using an iterative method to get approximate solutions.

9.1 Examples

1. $x^3 + 2x^2 - 8x + 4 = 0$. Rewrite this as $x = \frac{1}{8}(x^3 + 2x^2 + 4)$.

Consider the function $\phi : \mathbb{R} \to \mathbb{R}$ given by $\phi(x) = \frac{1}{8}(x^3 + 2x^2 + 4)$.

Then $x$ is a root of our equation if and only if $\phi(x) = x$, i.e., $x$ is a fixed point of $\phi$.

Guess a solution $x_0$; then let $x_1 = \phi(x_0)$, $x_2 = \phi(x_1)$, . . . . This gives a sequence of numbers $x_0, x_1, x_2, \ldots, x_n, x_{n+1} = \phi(x_n), \ldots$. If these terms converge to a limit, then this limit should be a solution. E.g. take $x_0 = 0$, then $x_1 = 0.5$, $x_2 = 0.578$, $x_3 = 0.608$, $x_4 = 0.621$, $x_5 = 0.626$, $x_6 = 0.629$, $x_7 = 0.630$, $x_8 = 0.630$.

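2. $\dfrac{dy}{dx} = x(x + y)$, for $0 \le x \le 1$, with $y(0) = 0$.

Rewrite as

\[ y(x) = \int_0^x t(t + y(t))\,dt. \]

Define

\[ \phi(f)(x) = \int_0^x t(t + f(t))\,dt. \]

So $y = f(x)$ solves the original equation if and only if $\phi(f) = f$. Again, try to find the solution as the limit of a sequence. Take $f_0(x) = 0$ for $0 \le x \le 1$. Then

\[
\begin{aligned}
f_1 = \phi(f_0), \text{ i.e., } f_1(x) &= \int_0^x t(t + f_0(t))\,dt = \int_0^x t^2\,dt = \frac{x^3}{3}, \\
f_2 = \phi(f_1), \text{ i.e., } f_2(x) &= \int_0^x t(t + f_1(t))\,dt = \int_0^x t\Bigl(t + \frac{t^3}{3}\Bigr) dt = \frac{x^3}{3} + \frac{x^5}{15}, \\
f_3 = \phi(f_2), \text{ i.e., } f_3(x) &= \int_0^x t\Bigl(t + \frac{t^3}{3} + \frac{t^5}{15}\Bigr) dt = \frac{x^3}{3} + \frac{x^5}{15} + \frac{x^7}{105}.
\end{aligned}
\]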
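Since every iterate here is a polynomial, the iteration $f_{n+1} = \phi(f_n)$ can be carried out exactly. The following sketch is illustrative (the names `picard_step`, `picard_iterates` and `nonzero_terms` are hypothetical); a polynomial is stored as a list of `Fraction` coefficients indexed by the power of $t$.

```python
from fractions import Fraction

def picard_step(f):
    """Apply phi(f)(x) = integral from 0 to x of t (t + f(t)) dt to a polynomial f."""
    g = [Fraction(0)] + list(f)          # coefficients of t * f(t)
    while len(g) < 3:
        g.append(Fraction(0))
    g[2] += 1                            # add the t^2 term: integrand is t^2 + t f(t)
    # Integrate term by term: c t^k becomes c x^(k+1) / (k+1).
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(g)]

def picard_iterates(n):
    """Return [f_0, f_1, ..., f_n] starting from f_0 = 0."""
    f = [Fraction(0)]
    out = [f]
    for _ in range(n):
        f = picard_step(f)
        out.append(f)
    return out

def nonzero_terms(f):
    """Map power -> coefficient for the nonzero terms of f."""
    return {k: c for k, c in enumerate(f) if c != 0}

iterates = picard_iterates(3)
f3_terms = nonzero_terms(iterates[3])
```

Here `f3_terms` records the coefficients $\frac{1}{3}$, $\frac{1}{15}$, $\frac{1}{105}$ of $x^3$, $x^5$, $x^7$ found above.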

Suppose we have a metric space $(X, d)$ and a function $\phi : X \to X$. Choose $x_0 \in X$ and define $x_n = \phi(x_{n-1})$ for $n \ge 1$. This gives a sequence $(x_n)$; if it is Cauchy and $(X, d)$ is complete, then $x = \lim_{n\to\infty} x_n$ exists and $x$ should solve $x = \phi(x)$. How can we guarantee that $(x_n)$ will be Cauchy?

Note that $d(x_n, x_{n+1}) = d(\phi(x_{n-1}), \phi(x_n))$, so to get $(x_n)$ Cauchy we want $\phi$ to shrink distances. Let’s call $\phi : X \to X$ a shrinking map if $d(\phi(y), \phi(z)) < d(y, z)$ for all $y, z \in X$ with $y \neq z$.

Example:

Take $X = [1, \infty)$, regarded as a subspace of $\mathbb{R}$ with the usual metric. It is complete. Define $\phi : X \to X$ by $\phi(x) = x + \frac{1}{x}$. How can we check that $\phi$ is a shrinking map? Answer: use the Mean Value Theorem (MVT)!

\[ \frac{|\phi(x) - \phi(y)|}{|x - y|} = |\phi'(c)| \]

for some $c$ between $x$ and $y$. Now $\phi'(c) = 1 - 1/c^2$, so $|\phi'(c)| < 1$ for all $c \in X$. Hence $|\phi(x) - \phi(y)| < |x - y|$ for any $x \neq y$, and so $\phi$ is a shrinking map.

Take $x_0 = 1$, $x_1 = \phi(x_0) = 2$, $x_2 = \phi(x_1) = 2 + 1/2 = 5/2$, $x_3 = \phi(x_2) = 29/10$, $x_4 = \phi(x_3) = 941/290$, . . . . Clearly $(x_n)$ is increasing. If it remains bounded, then it has a limit, $\ell$, say. Then we shall have $\ell = \ell + 1/\ell$, which is impossible. So $(x_n)$ is unbounded, doesn’t converge, isn’t Cauchy. Too bad!
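The divergence can also be seen directly. Since $x_{n+1}^2 = x_n^2 + 2 + 1/x_n^2 > x_n^2 + 2$, we get $x_n^2 \ge 1 + 2n$, which is unbounded. A short sketch in exact arithmetic (illustrative only) reproduces the iterates listed above and checks this bound for the first few terms:

```python
from fractions import Fraction

def phi(x):
    """The shrinking map phi(x) = x + 1/x on [1, infinity)."""
    return x + 1 / x

xs = [Fraction(1)]
for _ in range(4):
    xs.append(phi(xs[-1]))

# Check the lower bound x_n^2 >= 1 + 2n on the computed terms.
bound_holds = all(x * x >= 1 + 2 * n for n, x in enumerate(xs))
```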

LECTURE 15


9.2 Definition

Let $(X, d)$ be a metric space. A map $\phi : X \to X$ is a contraction mapping, if there exists a constant $k < 1$ such that

\[ d(\phi(x), \phi(y)) \le k\,d(x, y) \quad\text{for all } x, y \in X. \]

Examples:
1. Take $X = [0, 1]$, usual metric, and $\phi(x) = x^2/3$. Then

\[ d(\phi(x), \phi(y)) = \Bigl|\frac{x^2}{3} - \frac{y^2}{3}\Bigr| = \Bigl|\frac{1}{3}(x + y)(x - y)\Bigr| \le \frac{2}{3}|x - y|. \]

So $\phi$ is a contraction mapping, with $k = 2/3$.

2. Take $X = \mathbb{R}$ and $\phi(x) = \frac{1}{4}\sin 3x$. So $|\phi(x) - \phi(y)| = \frac{1}{4}|\sin 3x - \sin 3y|$. Use MVT! $\phi'(x) = \frac{3}{4}\cos 3x$, so $|\phi'(x)| \le \frac{3}{4}$, and

\[ |\phi(x) - \phi(y)| = |\phi'(c)|\,|x - y| \le \frac{3}{4}|x - y|, \text{ etc.} \]

3. Take $X = [1, \infty)$ and $\phi(x) = x + 1/x$. Suppose $x = n$ and $y = n + 1$; then

\[ \phi(y) - \phi(x) = n + 1 + \frac{1}{n+1} - n - \frac{1}{n} = 1 - \frac{1}{n(n+1)}, \]

so $|\phi(y) - \phi(x)|/|y - x|$ can be made as close to 1 as we like by taking $x = n$ and $y = n + 1$ for $n$ large. Thus $\phi$ (which is a shrinking mapping) is not a contraction mapping.

9.3 Theorem

Let $(X, d)$ be a metric space and let $\phi : X \to X$ be a contraction mapping. Then for each $x_0 \in X$ the sequence defined by $x_n = \phi(x_{n-1})$ (for each $n \ge 1$) is a Cauchy sequence.

Proof: for some $k < 1$ we have $d(\phi(x), \phi(y)) \le k\,d(x, y)$. So, writing $d = d(x_1, x_0)$,

\[
\begin{aligned}
d(x_2, x_1) &\le k\,d(x_1, x_0) = kd, \\
d(x_3, x_2) &\le k\,d(x_2, x_1) \le k^2d, \\
&\;\;\vdots \\
d(x_{n+1}, x_n) &\le k^nd, \text{ and so on.}
\end{aligned}
\]

Hence for $n > m$ we have

\[
\begin{aligned}
d(x_n, x_m) &\le d(x_m, x_{m+1}) + d(x_{m+1}, x_{m+2}) + \ldots + d(x_{n-1}, x_n) \\
&\le k^md + k^{m+1}d + \ldots + k^{n-1}d \quad\text{from the above} \\
&\le k^md(1 + k + k^2 + \ldots) = \frac{k^md}{1 - k},
\end{aligned}
\]

which tends to 0 as $m \to \infty$. Thus $(x_n)$ is Cauchy.

Note: in the above theorem, if $(X, d)$ is complete, then $(x_n)$ will converge to a limit $x \in X$. Note that $x$ is a fixed point of $\phi$, i.e., $\phi(x) = x$, since

\[
\begin{aligned}
d(x, \phi(x)) &\le d(x, x_n) + d(x_n, \phi(x_n)) + d(\phi(x_n), \phi(x)) \\
&\le d(x, x_n) + d(x_n, x_{n+1}) + k\,d(x_n, x),
\end{aligned}
\]

and each term tends to 0 as $n \to \infty$. So $d(x, \phi(x)) = 0$, i.e., $x = \phi(x)$.

9.4 Theorem (Banach’s Contraction Mapping Theorem) (CMT)

Let $(X, d)$ be a complete metric space, and let $\phi : X \to X$ be a contraction mapping. Then $\phi$ has a unique fixed point. If $x_0$ is any point in $X$ then the sequence defined by $x_n = \phi(x_{n-1})$ (for $n \ge 1$) converges to the fixed point.

Proof: by Theorem 9.3 and the note following it, we have proved everything except the fact that there is only one fixed point for $\phi$. But if $x$ and $y$ are fixed points, then

\[ d(x, y) = d(\phi(x), \phi(y)) \le k\,d(x, y), \]

with $k < 1$; this can only happen if $d(x, y) = 0$, i.e., $x = y$.

How to apply the CMT: suppose we want to solve the equation $\phi(x) = x$, where $\phi$ is a contraction mapping. Take $x_0 \in X$ and construct $(x_n)$ as above. Then $(x_n)$ tends to a solution $x$.

Note that in an incomplete metric space, there may be problems. For example take $X = (0, 1) \subset \mathbb{R}$ and $\phi(x) = x/2$. The iterates form a Cauchy sequence but the limit, 0, isn’t in the space, and there is no fixed point in the space.

9.5 How fast does it converge?

Answer: $d(x_1, x) = d(\phi(x_0), \phi(x)) \le k\,d(x_0, x)$, and in general $d(x_n, x) \le k^n d(x_0, x)$. Also $d(x_0, x) \le d(x_0, x_1) + d(x_1, x) \le d(x_0, x_1) + k\,d(x_0, x)$, so $(1 - k)\,d(x_0, x) \le d(x_0, x_1)$, and we conclude that

\[ d(x_n, x) \le \frac{k^n}{1 - k}\,d(x_0, x_1), \]

so we can choose $n$ large to make this as small as we wish.

Examples:

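1. Show that $x^3 + 2x^2 - 8x + 4 = 0$ has a unique solution in $[-1, 1]$, and find it correct to within $\pm 10^{-6}$.

Solution: write the equation as $x = \frac{1}{8}(x^3 + 2x^2 + 4)$, and let $\phi(x) = \frac{1}{8}(x^3 + 2x^2 + 4)$ for $-1 \le x \le 1$. Note that if $|x| \le 1$ then

\[ |\phi(x)| \le \frac{1}{8}(|x|^3 + 2|x|^2 + 4) \le \frac{7}{8}, \]

so $\phi$ does map $[-1, 1]$ to itself. Then

\[ |\phi'(x)| = \Bigl|\frac{1}{8}(3x^2 + 4x)\Bigr| \le \frac{7}{8} \]

for $x \in [-1, 1]$, so by the Mean Value Theorem $\phi$ is a contraction mapping with $k = 7/8$. Since $[-1, 1]$ is closed in $\mathbb{R}$, it is complete, so $\phi$ has a unique fixed point, as required.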

LECTURE 16

Take $x_0 = 0$. Defining $x_n = \phi(x_{n-1})$, we get $x_1 = 0.5$, $x_2 = 0.578$, etc. as in Examples 9.1. The sequence converges to 0.6308976, although convergence is slow, since $k = 7/8$, so the error after $n$ steps is only bounded by

\[ |x_n - x| \le \frac{k^n}{1 - k}|x_0 - x_1| = 4\Bigl(\frac{7}{8}\Bigr)^n. \]
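The bound of 9.5 tells us in advance how many iterations guarantee a given accuracy. The sketch below is illustrative (the helper names `steps_needed` and `iterate` are hypothetical): it finds the smallest $n$ with $\frac{k^n}{1-k}|x_0 - x_1| \le 10^{-6}$ and then runs that many steps. The bound is pessimistic, so the iterates are in fact accurate much sooner.

```python
import math

def phi(x):
    return (x ** 3 + 2 * x ** 2 + 4) / 8

def steps_needed(k, d01, tol):
    """Smallest n with k^n / (1 - k) * d01 <= tol, where d01 = d(x0, x1)."""
    return math.ceil(math.log(tol * (1 - k) / d01) / math.log(k))

def iterate(phi, x0, n):
    """Return x_n, where x_j = phi(x_{j-1})."""
    x = x0
    for _ in range(n):
        x = phi(x)
    return x

k, x0 = 7 / 8, 0.0
n = steps_needed(k, abs(phi(x0) - x0), 1e-6)
root = iterate(phi, x0, n)
```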

2. Define $\phi : C[0, 1] \to C[0, 1]$ by

\[ \phi(f)(x) = \int_0^x t(t + f(t))\,dt. \]

Show $\phi$ is a contraction mapping for the metric $d_\infty$, and use $\phi$ to find an approximate solution $y$ to the differential equation

\[ \frac{dy}{dx} = x(x + y), \quad y(0) = 0, \quad (0 \le x \le 1). \]

Solution: $d_\infty(\phi(f), \phi(g)) = \max\{|\phi(f)(x) - \phi(g)(x)| : 0 \le x \le 1\}$.

\[
\begin{aligned}
|\phi(f)(x) - \phi(g)(x)| &= \Bigl|\int_0^x t(t + f(t))\,dt - \int_0^x t(t + g(t))\,dt\Bigr| \\
&= \Bigl|\int_0^x t(f(t) - g(t))\,dt\Bigr| \\
&\le \int_0^x t\,|f(t) - g(t)|\,dt \\
&\le \int_0^x t\,d_\infty(f, g)\,dt = \frac{1}{2}x^2 d_\infty(f, g).
\end{aligned}
\]

Thus $d_\infty(\phi(f), \phi(g)) \le \frac{1}{2}d_\infty(f, g)$, and $\phi$ is a contraction map with $k = 1/2$.

If $y = f(x)$ is a solution of the differential equation, then $f'(t) = t(t + f(t))$ and $f(0) = 0$. Integrate from 0 to $x$:

\[ \int_0^x f'(t)\,dt = \int_0^x t(t + f(t))\,dt, \]

i.e., $f(x) = \phi(f)(x)$. So $f = \phi(f)$ and $f$ is a fixed point of $\phi$.

So the CMT says that the differential equation has a unique solution, which we can obtain by iteration. We did this in Examples 9.1 as well. Note that $f_0 = 0$, $f_1(x) = x^3/3$, so $d_\infty(f_0, f_1) = \frac{1}{3}$, and in general $d_\infty(f_n, f) \le \frac{1}{3\cdot 2^{n-1}}$, by 9.5.

Another example: $f'(t) = t(1 + f(t))$, for $t \in [0, 1]$, with $f(0) = 0$; here $f(t) = e^{t^2/2} - 1$ is the actual (unique) solution.

Take $f_0(x) = 0$,

\[
\begin{aligned}
f_1(x) &= \int_0^x t(1 + f_0(t))\,dt = \frac{x^2}{2}, \\
f_2(x) &= \int_0^x t(1 + f_1(t))\,dt = \frac{x^2}{2} + \frac{x^4}{8}, \text{ etc.}
\end{aligned}
\]

9.6 General method

Solve

\[ \frac{dy}{dx} = F(x, y), \quad y(a) = c, \quad a \le x \le b, \]

where $F(x, y)$ is a real-valued function defined for $a \le x \le b$ and $y \in \mathbb{R}$.

If $y = f(x)$ is a solution, then

\[ f'(t) = F(t, f(t)), \quad f(a) = c, \quad (t \in [a, b]), \tag{1} \]

or, equivalently,

\[ f(x) = c + \int_a^x F(t, f(t))\,dt. \tag{2} \]

Define $\phi : C[a, b] \to C[a, b]$ by

\[ \phi(f)(x) = c + \int_a^x F(t, f(t))\,dt. \]

So $f$ is a solution of (1) $\iff$ $f$ is a solution of (2) $\iff$ $f$ is a fixed point of $\phi$. If $\phi$ is a contraction for the $d_\infty$ metric on $C[a, b]$, then by the CMT (1) has a unique solution, which is the limit of a sequence $(f_n)$, where $f_0 \in C[a, b]$ is arbitrary and $f_n = \phi(f_{n-1})$ for $n \ge 1$. Also $d_\infty(f_n, f) \le \frac{k^n}{1-k}\,d_\infty(f_0, f_1)$, where $k$ is the contraction constant of $\phi$.

So when is φ a contraction map?

Note that some differential equations don’t have solutions everywhere we might want them; e.g. $f'(t) = -f(t)^2$ for $t \in [0, 2]$, with $f(0) = -1$. The only solution is $f(t) = 1/(t - 1)$, which is discontinuous at $t = 1$.

9.7 Theorem

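With $F$ and $\phi$ as above, suppose there is a constant $k < 1$ such that

\[ |F(x, y_1) - F(x, y_2)| \le \frac{k}{b - a}|y_1 - y_2| \quad\text{for all } x \in [a, b],\ y_1, y_2 \in \mathbb{R}. \]

Then $\phi$ is a contraction mapping on $(C[a, b], d_\infty)$.

Proof: For $f, g \in C[a, b]$,

\[
\begin{aligned}
|\phi(f)(x) - \phi(g)(x)| &= \Bigl|\int_a^x \bigl(F(t, f(t)) - F(t, g(t))\bigr)\,dt\Bigr| \\
&\le \int_a^x |F(t, f(t)) - F(t, g(t))|\,dt \\
&\le \frac{k}{b - a}\int_a^x |f(t) - g(t)|\,dt \\
&\le \frac{k}{b - a}\int_a^x d_\infty(f, g)\,dt \\
&= k\,\frac{x - a}{b - a}\,d_\infty(f, g),
\end{aligned}
\]

so $d_\infty(\phi(f), \phi(g)) \le k\,d_\infty(f, g)$, as required.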

9.8 Definition

A function $f : [a, b] \to \mathbb{R}$ satisfies the Lipschitz condition with constant $m$ if $|f(x_1) - f(x_2)| \le m|x_1 - x_2|$ for all $x_1, x_2 \in [a, b]$.

If $f$ is differentiable on $[a, b]$ and $m = \max\{|f'(t)| : t \in [a, b]\}$, then $f$ satisfies the Lipschitz condition with constant $m$, since the Mean Value Theorem gives, for some $c$ between $x_1$ and $x_2$,

\[ |f(x_1) - f(x_2)| = |(x_1 - x_2)f'(c)| \le m|x_1 - x_2|. \]

LECTURE 17

Similarly, if we have a function $F(x, y)$, we say that it satisfies the Lipschitz condition in $y$ with constant $m$ if

\[ |F(x, y_1) - F(x, y_2)| \le m|y_1 - y_2| \]

for all $x$ and for all $y_1$ and $y_2$ for which the above is defined. If we have partial derivatives, we can take

\[ m = \max\Bigl\{\Bigl|\frac{\partial F}{\partial y}\Bigr| : a \le x \le b,\ y \in \mathbb{R}\Bigr\}. \]

9.9 Theorem

If $F$ satisfies the Lipschitz condition in $y$ with a constant $m < \frac{1}{b-a}$, then the differential equation $y' = F(x, y)$, $y(a) = c$ has a unique solution for $a \le x \le b$.

Proof: use Theorem 9.7 writing $m = \frac{k}{b-a}$ with $k < 1$.

In fact if it satisfies the Lipschitz condition with any constant $m$ at all, we can still solve the equation. What we do is to solve it in $C[a, a + \delta]$, where $m < \frac{1}{\delta}$, and obtain a value $y(a + \delta)$. We then solve it in $C[a + \delta, a + 2\delta]$, and keep going until we get to $b$.

[DIAGRAM: lots of pieces joined together.]

Examples:

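1. $\dfrac{dy}{dx} = \cos(x^2y)$, $y(0) = 2$, for $0 \le x \le 1$.

Here $F(x, y) = \cos(x^2y)$, and $\dfrac{\partial F}{\partial y} = -x^2\sin(x^2y)$, so

\[ \Bigl|\frac{\partial F}{\partial y}\Bigr| \le x^2 \le 1. \tag{3} \]

Thus $F$ satisfies the Lipschitz condition in $y$ with constant $m = 1$. Not good enough for the theorem to apply on $[0, 1]$ (although we could use $[0, \frac{1}{2}]$ and $[\frac{1}{2}, 1]$, as above).

But if

\[ \phi(f)(x) = 2 + \int_0^x \cos(t^2f(t))\,dt, \]

then

\[
\begin{aligned}
|\phi(f)(x) - \phi(g)(x)| &= \Bigl|\int_0^x \bigl(\cos(t^2f(t)) - \cos(t^2g(t))\bigr)\,dt\Bigr| \\
&\le \int_0^x |F(t, f(t)) - F(t, g(t))|\,dt \\
&\le \int_0^x t^2|f(t) - g(t)|\,dt \quad\text{by (3),} \\
&\le \int_0^x t^2 d_\infty(f, g)\,dt = \frac{1}{3}x^3 d_\infty(f, g).
\end{aligned}
\]

So $d_\infty(\phi(f), \phi(g)) \le \frac{1}{3}d_\infty(f, g)$, and so $\phi$ is a contraction map.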

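2.
\[ \frac{dy}{dx} = \sqrt{y}, \quad y(0) = 0, \quad 0 \le x \le 1. \tag{4} \]

Here $F(x, y) = y^{1/2}$ and $\dfrac{\partial F}{\partial y} = \dfrac{1}{2}y^{-1/2}$, unbounded.

So $F$ does not satisfy a Lipschitz condition in $y$ at all. For any $c \in (0, 1]$ we can define

\[ f_c(x) = \begin{cases} 0 & \text{if } x \le c, \\ \frac{1}{4}(x - c)^2 & \text{if } c \le x \le 1. \end{cases} \]

[DIAGRAM: constant on [0, c], parabola rising on [c, 1].]

Then $f_c(x)$ satisfies (4), so there is no unique solution.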

A new type of example: consider $\phi : \mathbb{R} \to \mathbb{R}$ defined by $\phi(x) = \cos x$. Then $\phi$ is a shrinking map but not a contraction map, since

\[ \frac{|\phi(x) - \phi(y)|}{|x - y|} = |\sin z| \]

for some $z$ between $x$ and $y$, by the Mean Value Theorem. This is at most 1 (so shrinking), but can be close to 1 if $x$ and $y$ are close to $\pi/2$, for example.

[DIAGRAM: plot y = x, y = cos x; curves meet once between 0 and π/2.]

We shall see that nevertheless $\phi$ has a unique fixed point.

We see that $\phi^2 : \mathbb{R} \to \mathbb{R}$, where $\phi^2(x) = \phi(\phi(x))$, is $\cos(\cos(x))$, and this is a contraction map, since

\[ \frac{|\cos(\cos x) - \cos(\cos y)|}{|x - y|} = |\sin(\cos z)\cdot\sin z|, \]

by the MVT, and this is at most $\sin 1$, since $\cos z$ lies between $-1$ and $1$. But $\sin 1$ is about 0.8415; anyway it’s less than 1.

The following theorem shows that $\phi$ has a unique fixed point, given by iteration: $x_0 = 0$, $x_1 = \phi(x_0) = 1$, $x_2 = \phi(x_1) = 0.54$, etc. (keep hitting the cos button on a calculator, working in radians), and this converges to 0.7390851 . . ..
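“Hitting the cos button” repeatedly is a one-line loop. A minimal sketch (working in radians, as `math.cos` does; the choice of 100 iterations is arbitrary and just comfortably large):

```python
import math

xs = [0.0]
for _ in range(100):
    xs.append(math.cos(xs[-1]))

fixed = xs[-1]

# sin 1 bounds the derivative of cos(cos x), so it serves as a contraction
# constant for the second iterate.
contraction_constant = math.sin(1.0)
```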

9.10 Theorem

If $(X, d)$ is a complete metric space and $\phi : X \to X$ is a map such that some iterate $\phi^m$ of $\phi$ is a contraction map, then $\phi$ has a unique fixed point. For any $x_0 \in X$ the sequence $(x_n) = (\phi^n(x_0))$ converges to the fixed point.

Proof: by the CMT applied to $\phi^m$, we get a unique fixed point $x$ for $\phi^m$. So $x = \phi^m(x)$. Apply $\phi$, then

\[ \phi(x) = \phi(\phi^m(x)) = \phi^{m+1}(x) = \phi^m(\phi(x)), \]

so that $\phi(x)$ is also a fixed point of $\phi^m$. By the uniqueness, $\phi(x) = x$, so $x$ is a fixed point of $\phi$ as well.

If $x$ and $y$ are fixed points of $\phi$, then $x$ and $y$ are fixed points of $\phi^m$, which is a contraction mapping, and so $x = y$. Hence $\phi$ has a unique fixed point.

Sketch of last assertion: let’s do $m = 3$ for illustration (the general case is similar, with more complicated notation). We have, by the CMT for $\phi^m$:

\[
\begin{aligned}
x_0, x_3, x_6, \ldots, x_{3k}, \ldots &\to x, \\
x_1, x_4, x_7, \ldots, x_{3k+1}, \ldots &\to x, \\
x_2, x_5, x_8, \ldots, x_{3k+2}, \ldots &\to x.
\end{aligned}
\]

This implies that the single sequence $x_0, x_1, x_2, \ldots$ tends to $x$, since given $\varepsilon > 0$, we have $d(x_{3k}, x) < \varepsilon$ for $k > k_0$, say, $d(x_{3k+1}, x) < \varepsilon$ for $k > k_1$, say, and $d(x_{3k+2}, x) < \varepsilon$ for $k > k_2$, say. So $d(\phi^N(x_0), x) < \varepsilon$ for $N > \max\{3k_0, 3k_1 + 1, 3k_2 + 2\}$.

LECTURE 18

9.11 The final word on differential equations (calculation non-examinable)

Given the differential equation

\[ \frac{dy}{dx} = F(x, y), \quad y(a) = c, \quad (a \le x \le b), \tag{5} \]

we define

\[ \phi(f)(x) = c + \int_a^x F(t, f(t))\,dt, \]

as usual. Suppose that $F$ satisfies the Lipschitz condition in $y$ with constant $m$. We shall see that some iterate of $\phi$ is a contraction mapping.

As before we calculate

\[ |\phi(f)(x) - \phi(g)(x)| \le \int_a^x |F(t, f(t)) - F(t, g(t))|\,dt \le m\int_a^x |f(t) - g(t)|\,dt \tag{6} \]

by the Lipschitz condition, and then

\[ m\int_a^x |f(t) - g(t)|\,dt \le m\int_a^x d_\infty(f, g)\,dt = m\,d_\infty(f, g)\int_a^x dt = m\,d_\infty(f, g)(x - a). \tag{7} \]

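Repeat:

\[
\begin{aligned}
|\phi^2(f)(x) - \phi^2(g)(x)| &= |\phi(\phi(f))(x) - \phi(\phi(g))(x)| \\
&\le m\int_a^x |\phi(f)(t) - \phi(g)(t)|\,dt, \quad\text{by (6) replacing } f, g \text{ by } \phi(f), \phi(g) \\
&\le m\int_a^x m\,d_\infty(f, g)(t - a)\,dt, \quad\text{by (7)} \\
&= m^2 d_\infty(f, g)\int_a^x (t - a)\,dt = m^2 d_\infty(f, g)\frac{(x - a)^2}{2}.
\end{aligned}
\tag{8}
\]

Once more:

\[
\begin{aligned}
|\phi^3(f)(x) - \phi^3(g)(x)| &\le m\int_a^x |\phi^2(f)(t) - \phi^2(g)(t)|\,dt, \quad\text{by (6) replacing } f, g \text{ by } \phi^2(f), \phi^2(g) \\
&\le m\int_a^x m^2 d_\infty(f, g)\frac{(t - a)^2}{2}\,dt, \quad\text{by (8)} \\
&= m^3 d_\infty(f, g)\frac{(x - a)^3}{3!}.
\end{aligned}
\]

In general we obtain

\[ |\phi^n(f)(x) - \phi^n(g)(x)| \le \frac{m^n(x - a)^n}{n!}\,d_\infty(f, g), \]

i.e.,

\[ d_\infty(\phi^n(f), \phi^n(g)) \le \frac{m^n(b - a)^n}{n!}\,d_\infty(f, g). \]

Now $\sum \frac{x^n}{n!}$ converges for any $x$, so the terms tend to zero, and so $\frac{m^n(b-a)^n}{n!}$ tends to zero for all choices of $m$, $a$ and $b$. If we choose $N$ so that $\frac{m^N(b-a)^N}{N!} < 1$, then $\phi^N$ is a contraction mapping. We thus have: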
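To see how large $N$ has to be in practice, one can simply search for the first $N$ with $\frac{m^N(b-a)^N}{N!} < 1$. The sketch below is illustrative (the function name is hypothetical, and `length` stands for $b - a$). With $m = 1$ on $[0, 1]$, as in the $\cos(x^2y)$ example, the second iterate already works.

```python
import math

def first_contracting_iterate(m, length):
    """Smallest N with (m * length)^N / N! < 1, where length = b - a."""
    N = 1
    while (m * length) ** N / math.factorial(N) >= 1:
        N += 1
    return N

n_for_cos_example = first_contracting_iterate(1, 1)
n_for_larger_constant = first_contracting_iterate(5, 1)
```

Any larger Lipschitz constant or longer interval only increases $N$; it never prevents the search from terminating, since the terms tend to zero.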

9.12 Theorem

If $F(x, y)$ satisfies the Lipschitz condition in $y$ for some constant $m$, where $a \le x \le b$ and $y \in \mathbb{R}$, then the differential equation (5) has a unique solution which can be approximated by iteration.

10 Connectedness

10.1 Definition

A metric space $X$ is disconnected if there exist $U$, $V$, open, disjoint, nonempty, such that $X = U \cup V$. Note that $U$ and $V$ will also be closed, as their complements are open. Otherwise $X$ is connected. A subset is connected/disconnected if it is connected/disconnected when we restrict the metric to the subset to get a (smaller) metric space.

[DIAGRAM in R2]

Examples: (i) $X$ with the discrete metric is disconnected if $\#X > 1$.
(ii) In $\mathbb{R}$, consider $A = [0, 1] \cup [2, 3]$. This splits into $A \cap (-\infty, 3/2)$ and $A \cap (3/2, \infty)$.
(iii) $\mathbb{Q}$ is disconnected; it splits into $\mathbb{Q} \cap (-\infty, \sqrt{2})$ and $\mathbb{Q} \cap (\sqrt{2}, \infty)$.
(iv) $\mathbb{R}$ is connected (see later).

10.2 Definition

An interval in R is a set S such that if s, t ∈ S then [s, t] ⊂ S.

Examples are $(a, b)$, $[a, b]$, $(a, b]$, $[a, b)$, $(-\infty, b)$, $(-\infty, b]$, $(a, \infty)$, $[a, \infty)$, with $a, b$ finite; also $\emptyset$ and $\mathbb{R}$. These are all the examples possible.

We want to show that the connected subsets of $\mathbb{R}$ (usual metric) are precisely the intervals.

10.3 Lemma

Let $x, y \in \mathbb{R}$ with $x < y$, let $U, V$ be disjoint open sets in $\mathbb{R}$ with $x \in U$ and $y \in V$. Then there is a $z \in (x, y)$ with $z \notin U \cup V$.

Proof: Let $T = \{t < y : t \in U\}$. Now $x \in T$ and so $T \neq \emptyset$, and it is bounded above by $y$, so it has a least upper bound, $z = \sup T$.

We can’t have $z \in U$ or else a neighbourhood $(z - \delta, z + \delta)$ is contained in $U$, which means that $z$ isn’t the least upper bound of $T$.

We can’t have $z \in V$, or else a neighbourhood $(z - \delta, z + \delta)$ is contained in $V$ (and so doesn’t meet $U$), and again $z$ isn’t the least upper bound of $T$.

Thus $x < z < y$ and $z \notin U \cup V$. □

N.B. The same result holds assuming only that $[x, y] \cap U$ and $[x, y] \cap V$ are disjoint, which is the form we shall require.

10.4 Theorem

A subset S of R is connected if and only if it is an interval.

Proof: (i) If S is not an interval, then there are x, y ∈ S with [x, y] ⊄ S, so there is a z ∈ (x, y) with z ∉ S.

Now take U = S ∩ (−∞, z) and V = S ∩ (z, ∞). These are open in S, disjoint and nonempty (x ∈ U and y ∈ V), and S = U ∪ V; we see that this disconnects S.

(ii) Suppose that S is an interval and U, V are open in R with (U ∩ S) ∩ (V ∩ S) = ∅, but U ∩ S and V ∩ S nonempty, and S ⊂ U ∪ V.
Take x ∈ U ∩ S and y ∈ V ∩ S; we may assume that x < y. Since [x, y] ⊂ S, the sets [x, y] ∩ U and [x, y] ∩ V are disjoint, so by Lemma 10.3 (in the form of the N.B.) there is a z ∈ (x, y) with z ∉ U ∪ V.
But z ∈ S ⊂ U ∪ V, a contradiction. Hence the result.

The intersection of two connected sets needn’t be connected [picture in R^2], and nor need the union of two connected sets be (e.g. (0, 1) ∪ (2, 3)). (In R, however, the intersection of two connected sets is connected, since these are just intervals.)

Unions are OK if there is a point in common, as we see next.

LECTURE 19

10.5 Remark

Let Y ⊂ X, where X is a metric space. Then a subset S ⊂ Y is open (regarded as a subset of Y) if and only if S = U ∩ Y, where U is an open subset of X. This follows easily on noting that every open subset S of Y is a union of balls B_Y(s, ε) for s ∈ S, and B_Y(s, ε) = B_X(s, ε) ∩ Y.
So, for example, we can say that [0, 1] and [2, 3] are open subsets of the metric space Y = [0, 1] ∪ [2, 3], although not open when regarded as subsets of X = R, since, for example, [0, 1] = (−∞, 3/2) ∩ Y.

10.6 Theorem

Let (X, d) be a metric space and x ∈ X. Let {S_λ : λ ∈ Λ} be a family of connected sets, each containing x. Then ⋃_{λ∈Λ} S_λ is connected.

Proof: Let U, V be open, with ⋃S_λ ⊂ U ∪ V and U ∩ V ∩ ⋃S_λ = ∅. WLOG, x ∈ U and x ∉ V.

For each λ,

S_λ = (U ∩ S_λ) ∪ (V ∩ S_λ),

where U ∩ S_λ is nonempty, as it contains x. So, since S_λ is connected, V ∩ S_λ = ∅ for each λ, and so V ∩ ⋃S_λ = ∅.

Hence ⋃S_λ is connected.

□

So every point x ∈ X is contained in a maximal connected subset, the connected component containing x, namely

C_x = ⋃{S ⊂ X : x ∈ S, S connected}.

Of course {x} itself is one such connected set S, so this is not an empty union.

Now for x, y ∈ X, either C_x = C_y or else C_x ∩ C_y = ∅. For otherwise C_x ∪ C_y is connected, by Theorem 10.6 applied at a common point, and it is strictly larger than at least one of C_x and C_y, say C_x; this would be a strictly larger connected set containing x, which contradicts the definition of C_x.
Hence, we can write X as a disjoint union of connected components, and these are the maximal connected subsets.

If we apply this to an open subset of R, we end up by seeing that it is necessarily a countable union of disjoint open intervals (countably many, since each one contains a different rational point and there are only countably many to go round).
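As a small illustration of connected components in R, the following sketch merges a finite list of closed intervals into the components of their union. The function name `components` and the representation of an interval [a, b] as a pair `(a, b)` are assumptions made for this illustration only; they do not come from the course.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # the closed interval [a, b], with a <= b


def components(intervals: List[Interval]) -> List[Interval]:
    """Connected components of a finite union of closed intervals in R.

    Intervals that share a point have a connected union (cf. Theorem 10.6),
    so we sort by left end-point and merge whenever the next interval
    starts inside the component built so far.
    """
    merged: List[Interval] = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            # meets the current component: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            # a gap lies in between, so a new component starts here
            merged.append((a, b))
    return merged


print(components([(2, 3), (0, 1)]))          # two components
print(components([(0, 1), (1, 2), (5, 6)]))  # [0, 1] and [1, 2] merge
```

Each interval returned is a maximal connected subset of the union, in line with Theorem 10.4.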

10.7 Theorem

Let f : X → Y be a continuous mapping between metric spaces, and suppose that X is connected. Then the image f(X) := {f(x) : x ∈ X} is also connected.

Proof: If f(X) = U ∪ V with U, V open and disjoint, then X = f⁻¹(U) ∪ f⁻¹(V), with f⁻¹(U) and f⁻¹(V) disjoint, and open (since f is continuous). Since X is connected this can only happen if one of f⁻¹(U) and f⁻¹(V) is empty, which means that one of U and V is empty. So f(X) is connected.

□

10.8 Corollary (Intermediate Value Theorem)

Let f : [a, b] → R be continuous. Then for each y between f(a) and f(b) there is an x between a and b with f(x) = y.

Proof: By Theorem 10.7 we have that f([a, b]) is connected, and hence, by Theorem 10.4, it is an interval. This interval contains f(a) and f(b), and so it contains every y between them. The result is now clear.

□
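The Intermediate Value Theorem only asserts that a suitable x exists. One standard way to approximate such an x numerically is bisection, sketched below. The function name `bisect_value`, the tolerance `tol` and the test function t³ + t are all choices made for this illustration, not part of the course.

```python
from typing import Callable


def bisect_value(f: Callable[[float], float], a: float, b: float,
                 y: float, tol: float = 1e-12) -> float:
    """Approximate an x in [a, b] with f(x) = y, for continuous f.

    Requires y to lie between f(a) and f(b); the Intermediate Value
    Theorem then guarantees that such an x exists.
    """
    fa, fb = f(a) - y, f(b) - y
    if fa * fb > 0:
        raise ValueError("y must lie between f(a) and f(b)")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m) - y
        # keep the half-interval on which f - y still changes sign
        if fa * fm <= 0:
            b, fb = m, fm
        else:
            a, fa = m, fm
    return (a + b) / 2


x = bisect_value(lambda t: t**3 + t, 0.0, 2.0, 5.0)
print(x, x**3 + x)
```

At every step f − y takes values of opposite sign (or zero) at the two end-points, so the theorem applies again on the smaller interval.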

10.9 Definition

Let (X, d) be a metric space. Then X is path-connected if, for all x, y ∈ X, there is a continuous f : [0, 1] → X with f(0) = x, f(1) = y (i.e., a path joining them).

(Of course we can also talk about path-connected subsets of a metric space, as they are metric spaces too.)

10.10 Proposition

Let X be a path-connected metric space. Then X is connected.

Proof: Suppose that X = U ∪ V, with U and V open, disjoint and nonempty. Take x ∈ U and y ∈ V. Then there is a path f : [0, 1] → X joining x to y.
Hence [0, 1] = f⁻¹(U) ∪ f⁻¹(V), as open disjoint sets in the metric space [0, 1]. But [0, 1] is connected, so one of f⁻¹(U) and f⁻¹(V) is empty. But 0 ∈ f⁻¹(U) and 1 ∈ f⁻¹(V), so we have a contradiction. So X is connected.

□

The converse is false. Take X = G ∪ I, where G = {(x, sin(1/x)) : x > 0} and I = {(0, y) : −1 ≤ y ≤ 1}. Then G ∪ I is connected, but not path-connected. See the Exercises.

LECTURE 20

10.11 Remark

For open subsets of R^n it is true that connected and path-connected are the same.
Suppose that S is open and connected, and take x ∈ S. Then

U = {y ∈ S : we can join x to y by a path}

is open [DIAGRAM]. So is

V = {y ∈ S : we can’t join x to y by a path}.

Since S = U ∪ V (open, disjoint) and U is nonempty, since it contains x, we see that by connectedness V = ∅ and U = S.

10.12 Theorem

(i) Let n ≥ 2. Then R^n (usual metric) isn’t homeomorphic to R.
(ii) Moreover, no two out of (0, 1), [0, 1) and [0, 1] are homeomorphic.

Proof: (i) Suppose that f : R → R^n were a homeomorphism. Let U = R \ {0}, and V = R^n \ {f(0)}.
Then g = f|_U is a homeomorphism from U onto V, and hence V is disconnected, as it splits into f((−∞, 0)) ∪ f((0, ∞)).
But V is path-connected, hence connected. Contradiction.

(ii) Similarly, if we delete any point from (0, 1) it becomes disconnected; this is not true for the others if we delete 0. So (0, 1) is not homeomorphic to the others. If we delete any 2 points from [0, 1) it becomes disconnected; this is not true for [0, 1] if we delete the end-points. So the other two aren’t homeomorphic, either.

□

Similarly we can see that [0, 1] is not homeomorphic to the square [0, 1] × [0, 1], since removing any three points will disconnect [0, 1], whereas the square with three points removed is still connected. This is in spite of the fact that there exist “space-filling curves”, i.e., continuous (non-bijective) maps from [0, 1] onto [0, 1] × [0, 1].
There also exist discontinuous bijections between the two sets.

11 Compactness

Recall that any real continuous function on a closed bounded interval [a, b] is bounded and attains its bounds. We look at this in a more general context.

11.1 Definition

Let K ⊆ X, where (X, d) is a metric space. An open cover of K is a family of open sets (U_λ)_{λ∈Λ} such that K ⊂ ⋃_{λ∈Λ} U_λ. We say that K is compact if whenever (U_λ)_{λ∈Λ} is an open cover of K, there is a finite subcover U_{λ_1}, . . . , U_{λ_N} such that K ⊂ U_{λ_1} ∪ . . . ∪ U_{λ_N}.

“Every open cover has a finite subcover.”

It doesn’t matter whether we cover K with open sets in K or open sets in X, since open sets in K are just the intersection with K of open sets in X.

11.2 Examples

Clearly R is not compact, as U_n = (−n, n) for n = 1, 2, . . . form an open cover, but we cannot cover the whole of R by taking only finitely many of these sets.

Similarly, (0, 1) is not compact either: we have (0, 1) = ⋃_{n=2}^∞ (1/n, 1), but (0, 1) is not a finite union of any of these sets.
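To make the failure of finite subcovers concrete, here is a minimal sketch using exact rational arithmetic: for any finite choice of indices n it produces a point of (0, 1) lying in none of the chosen sets (1/n, 1). The helper names `uncovered_point` and `covered`, and the sample list `family`, are assumptions of this illustration.

```python
from fractions import Fraction
from typing import List


def uncovered_point(indices: List[int]) -> Fraction:
    """Return a point of (0, 1) missed by the sets (1/n, 1), n in indices.

    The sets are nested, so a finite union of them is just (1/m, 1) with
    m the largest index; the point 1/m itself is then left uncovered.
    """
    m = max(indices)
    return Fraction(1, m)


def covered(x: Fraction, indices: List[int]) -> bool:
    """True if x lies in (1/n, 1) for at least one n in indices."""
    return any(Fraction(1, n) < x < 1 for n in indices)


family = [2, 5, 17, 100]        # an arbitrary finite subfamily, all n >= 2
p = uncovered_point(family)
print(p, covered(p, family))    # p lies in (0, 1) but is not covered
```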

It will be shown later that [0, 1] is compact. More generally, it turns out that the compact subsets of R^n with the Euclidean metric are just the closed bounded ones. Thus, compact subsets of R include finite sets, and finite unions of closed intervals such as [0, 1] ∪ [2, 3]. But NOT (0, 1), R itself, or Q.

11.3 Theorem

Let f : (K, d) → R be continuous with K ⊂ X compact. Then f is bounded on K and it attains its bounds (so that ∃ x ∈ K with f(x) = sup{f(k) : k ∈ K} < ∞, and similarly for inf).

Proof: Let U_n = {x ∈ K : |f(x)| < n} for n = 1, 2, 3, . . ., which is f⁻¹((−n, n)) and hence open; we have K ⊂ ⋃U_n. By compactness K ⊂ U_{n_1} ∪ . . . ∪ U_{n_N} for some U_{n_1}, . . . , U_{n_N}, and now |f(x)| ≤ max{n_1, . . . , n_N} for x ∈ K.

Also, if s = sup_{x∈K} f(x), we have either that f(x) = s for some x, or else that 1/(s − f(x)) is a continuous function on K and hence bounded by M > 0, say. This means that s − f(x) ≥ 1/M for all x ∈ K; i.e., f(x) ≤ s − 1/M, contradicting the definition of s as the sup.

□

11.4 Theorem

Let (X, d) be a metric space; then every compact subset K ⊂ X is closed and bounded.

Proof: Let x be a point of X \ K. For each k ∈ K consider the balls B_k = B(k, r_k/2) and C_k = B(x, r_k/2), where r_k = d(x, k) > 0. For each k these two balls are disjoint, and the B_k form an open cover of K. By compactness we can find k_1, . . . , k_N such that K ⊂ B_{k_1} ∪ . . . ∪ B_{k_N}. But now C_{k_1} ∩ . . . ∩ C_{k_N} is an open ball containing x which is disjoint from B_{k_1} ∪ . . . ∪ B_{k_N} and hence from K. So X \ K is open, that is, K is closed. [DIAGRAM]

Also, let x be any point of X and note that K ⊂ ⋃_{n=1}^∞ B(x, n). By compactness ∃ n_1, . . . , n_N such that K ⊂ B(x, n_1) ∪ . . . ∪ B(x, n_N), and thus d(k, x) < max{n_1, . . . , n_N} for all k ∈ K, so K is bounded.

□

LECTURE 21

An alternative way to see why a compact set K is necessarily closed and bounded is to take any x ∈ X \ K and consider the continuous function on K given by f(k) = d(k, x).
By Theorem 11.3, f attains its lower bound δ ≥ 0. But δ cannot be 0, as then we would have k ∈ K with k = x. Thus δ > 0 and B(x, δ) ∩ K = ∅, so K is closed.
Also, since f is bounded we see that K is bounded.

11.5 Example

The infinite set S = {1, 1/2, 1/3, . . .} ∪ {0} is compact. For any open cover (U_λ) of S, there will be a set, say U_{λ_0}, containing 0. Since U_{λ_0} is open, there is an N such that U_{λ_0} will also contain 1/n for all n ≥ N. But then we only need finitely many more U_λ to cover the whole set.

New compact sets from old ones.

11.6 Theorem

(i) Let X be a compact metric space and F a closed subset of X. Then F is compact.
(ii) Let X be a compact metric space and Y an arbitrary metric space. Suppose that f : X → Y is continuous. Then f(X) is compact.

Proof: (i) If we have an open cover of F, say F ⊂ ⋃_{λ∈Λ} U_λ, then by adding the set X \ F, which is open, we have an open cover of X. Since X is compact we only need finitely many sets, say X ⊂ (X \ F) ∪ U_{λ_1} ∪ . . . ∪ U_{λ_N}, and now F ⊂ U_{λ_1} ∪ . . . ∪ U_{λ_N}, so F is compact.

(ii) Given an open cover f(X) ⊂ ⋃_{λ∈Λ} U_λ, we see that X ⊂ ⋃_{λ∈Λ} f⁻¹(U_λ), since for each point x ∈ X there is a λ with f(x) ∈ U_λ, meaning that x ∈ f⁻¹(U_λ). Since f is continuous, this is an open cover of X.
But now we have a finite subcover of X, say X ⊂ f⁻¹(U_{λ_1}) ∪ . . . ∪ f⁻¹(U_{λ_N}), which means that f(X) ⊂ U_{λ_1} ∪ . . . ∪ U_{λ_N}. Hence f(X) is compact.

□

This gives us another way to prove Theorem 11.3. For if K is compact then f(K) ⊂ R is compact, so it is a bounded set, and being closed implies that the least upper bound is in the set.

11.7 Theorem (Heine–Borel)

Any closed bounded real interval [a, b] ⊂ R is compact (in the usual metric).

Proof: Given an open cover [a, b] ⊂ ⋃_{λ∈Λ} U_λ, let

S = {x ∈ [a, b] : [a, x] is covered by some finite subcollection of the U_λ}.

Now a ∈ S as there is a U_λ containing a, so S is a nonempty bounded set; indeed, writing y = sup S, it’s an interval [a, y) or [a, y], since if x_1 < x_2 and x_2 ∈ S then also x_1 ∈ S.
If we can show that y = b and y ∈ S, then we have the result. But if y < b, then y lies in some U_{λ_0}, so that (y − δ, y + δ) ⊂ U_{λ_0} for some δ > 0. Now y − δ/2 ∈ S, so we can cover [a, y − δ/2] by finitely many U_λ, and then adding U_{λ_0} to the collection covers [a, y + δ/2] by finitely many, contradicting the definition of y.
The same argument also shows that y ∈ S.
Putting this together, we see that we can cover [a, b] by finitely many sets.

□

Now here is a concept that, for metric spaces, is equivalent to compactness and a little easier to understand.

11.8 Definition

A subset K of a metric space is sequentially compact if every sequence in K has a convergent subsequence with limit in K.

The classical Bolzano–Weierstrass theorem in R says that every bounded sequence has a convergent subsequence, and this implies that all closed bounded subsets F ⊂ R are sequentially compact (“closed” guarantees that the limit is in F).

11.9 Example

The closed unit ball B in ℓ² is not sequentially compact (although closed and bounded). Recall that

B = {(x_n) : ∑_{n=1}^∞ x_n² ≤ 1}.

For let e_1 = (1, 0, 0, . . .), e_2 = (0, 1, 0, 0, . . .), e_3 = (0, 0, 1, 0, . . .), etc. Then (e_n) is a sequence in B with no convergent subsequence, since d(e_n, e_m) = ‖e_n − e_m‖ = √2 for all n ≠ m.
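The distance computation can be checked numerically on truncations. The sketch below stores only the first `dim` coordinates of each e_n (all later coordinates are zero, so nothing is lost); the helper names `basis_vector` and `dist` and the choice `dim = 6` are assumptions of this illustration.

```python
import math
from itertools import combinations
from typing import List


def basis_vector(n: int, dim: int) -> List[float]:
    """First `dim` coordinates of e_n (1-indexed); all later ones are 0."""
    return [1.0 if k == n else 0.0 for k in range(1, dim + 1)]


def dist(x: List[float], y: List[float]) -> float:
    """The l^2 distance between two finitely supported sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


dim = 6
vectors = [basis_vector(n, dim) for n in range(1, dim + 1)]
# collect the distances between all pairs of distinct basis vectors
distances = {dist(u, v) for u, v in combinations(vectors, 2)}
print(distances)  # a single value, sqrt(2)
```

Since all the pairwise distances equal √2, no subsequence of (e_n) can be Cauchy.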

We need one more definition before we state the final big theorem of the course.

11.10 Definition

A subset K of a metric space is precompact or totally bounded if for each ε > 0 it can be covered with finitely many balls B(x_k, ε).
[Think of employing finitely many short-sighted guards to watch over your set.]

Easily, every compact set is precompact, since we can cover K with open balls B(x, ε), where x varies over the whole of K. By compactness we only need finitely many. [DIAGRAM]

LECTURE 22

But the closed unit ball of ℓ² isn’t precompact, since if it were covered by balls of radius 1/2 then each e_n would have to be in a different one; for if d(x, e_n) < 1/2 and d(x, e_m) < 1/2 we get d(e_n, e_m) < 1, which is a contradiction for n ≠ m. So no finite number of such balls can cover it, and it isn’t compact either.

11.11 Example - the Hilbert cube

Consider the subset C ⊂ ℓ² defined by

C = {(x_n)_{n=1}^∞ : |x_n| ≤ 1/n for each n}.

We claim that C is precompact. For given ε > 0 choose N such that

∑_{n=N+1}^∞ 1/n² < ε²/4.

Now the set

D = {x = (x_1, . . . , x_N) ∈ R^N : |x_n| ≤ 1/n for each n}

is easily seen to be precompact: we can cover it with balls of radius ε/2 simply by taking enough centres (y_1, . . . , y_N) that every x ∈ D has a centre with |x_j − y_j| < ε/(2N) for each j, where x_j ∈ [−1/j, 1/j] [DIAGRAM].

Now think of vectors in RN as being padded with zeroes, so that they lie in `2.

That is, D ⊂ ⋃_{k=1}^K B(z_k, ε/2). But now we have that C ⊂ ⋃_{k=1}^K B(z_k, ε), since for every point c ∈ C its truncation c′ to N coordinates lies in D; thus d(c′, z_k) < ε/2 for some k, while d(c, c′) < ε/2 by the choice of N, and hence d(c, z_k) ≤ d(c, c′) + d(c′, z_k) < ε by the triangle inequality.
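The choice of N in this argument can be made explicit. A minimal sketch, assuming the integral comparison ∑_{n>N} 1/n² < 1/N (which is not stated in the notes): it is then enough to take N ≥ 4/ε². The function names `truncation_index` and `tail` are hypothetical, and `tail` uses the known value π²/6 of the full series only to check the result.

```python
import math


def truncation_index(eps: float) -> int:
    """An N with sum_{n > N} 1/n^2 < eps^2 / 4.

    Uses the integral comparison sum_{n > N} 1/n^2 < 1/N, so it is enough
    to take N >= 4 / eps^2.  This N is sufficient, not minimal.
    """
    return math.ceil(4 / eps ** 2)


def tail(N: int) -> float:
    """The tail sum_{n > N} 1/n^2, computed as pi^2/6 minus a partial sum."""
    return math.pi ** 2 / 6 - sum(1 / n ** 2 for n in range(1, N + 1))


for eps in (1.0, 0.5, 0.1):
    N = truncation_index(eps)
    print(eps, N, tail(N) < eps ** 2 / 4)
```

With such an N, every point of C lies within ε/2 of its truncation to N coordinates.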

In fact the Hilbert cube is also compact, and this is a consequence of the big result that follows.

11.12 Theorem

The following are equivalent in a metric space (X, d):
(1) X is compact.
(2) If (E_n) are nonempty closed sets in X with E_1 ⊇ E_2 ⊇ . . ., then ⋂_{n=1}^∞ E_n ≠ ∅.
(3) X is sequentially compact.
(4) X is complete and precompact.

Proof: (1) ⇒ (2). Suppose that ⋂E_n = ∅; then ⋃(X \ E_n) = X (de Morgan’s law); this is an open cover of X, so there is a finite subcover, by compactness. So X = (X \ E_1) ∪ . . . ∪ (X \ E_N) = X \ E_N, as the E_n are decreasing, so their complements are increasing. This means that E_N = ∅, a contradiction.

(2) ⇒ (3). Let (x_n) be any sequence in X and let E_n be the closure of {x_n, x_{n+1}, . . .}; these are decreasing nonempty closed sets. Thus there is a point y in ⋂_{n=1}^∞ E_n.

Take B(y, 1): this meets {x_1, x_2, . . .} since y is in its closure. Pick x_{n_1} ∈ B(y, 1).
Now B(y, 1/2) meets {x_{n_1+1}, x_{n_1+2}, . . .} since y is in its closure. Pick x_{n_2} ∈ B(y, 1/2), and note that n_2 > n_1.
Continuing this way we find x_{n_k} in B(y, 1/k), so the subsequence (x_{n_k}) converges to y.

(3) ⇒ (4). If X is sequentially compact it is certainly complete, for if (x_n) is a Cauchy sequence, let (x_{n_k}) be a convergent subsequence, converging to y. Now the original Cauchy sequence also converges to y (see Part D of Theorem 8.5).

Also X will be precompact, since if not then we can find ε > 0 with no finite covering by balls of radius ε; choose x_1 ∈ X, and then inductively we obtain (x_n) such that x_n ∉ ⋃_{k=1}^{n−1} B(x_k, ε). Now it’s clear that d(x_k, x_n) ≥ ε > 0 for all k < n, which means that (x_n) has no convergent subsequence.
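The inductive choice of points in this argument can be imitated on a finite sample, where it must stop after finitely many steps. In the sketch below the function name `greedy_centres`, the 11 × 11 grid standing in for the unit square, and the value ε = 0.35 are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_centres(points: Sequence[Point], eps: float) -> List[Point]:
    """Pick centres one at a time, each outside all earlier eps-balls."""
    centres: List[Point] = []
    for p in points:
        # p becomes a new centre only if no earlier ball B(c, eps) contains it
        if all(math.dist(p, c) >= eps for c in centres):
            centres.append(p)
    return centres


# a finite grid standing in for the unit square (an assumption of this sketch)
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
centres = greedy_centres(grid, 0.35)
print(len(centres))
```

The chosen centres are at least ε apart, and every sample point lies within ε of some centre. In a space that is not precompact the same process never terminates for a suitable ε, which is how the sequence in the proof arises.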

We’ll postpone (4) ⇒ (1) until the next lecture, as it is long.

11.13 Corollary

A subset K ⊂ R^N is compact if and only if it is closed and bounded.

Proof: We saw in Theorem 11.4 that all compact sets (in any metric space) are closed and bounded.

For the converse, we can show that all closed bounded sets K in R^N are sequentially compact. If (x^n) = (x^n_1, . . . , x^n_N) is a sequence in K (the superscript n is the index in the sequence, not a power), then by passing to a subsequence we can ensure that the sequence (x^n_1) of first coordinates converges (since every bounded sequence in R has a convergent subsequence); then to a further subsequence to ensure that the sequence (x^n_2) of 2nd coordinates converges, and so on.

After finitely many steps we have a subsequence (y^n) = (y^n_1, . . . , y^n_N) such that y^n_k → z_k, say, as n → ∞ for each 1 ≤ k ≤ N. This implies that y^n → z = (z_1, . . . , z_N) (cf. Proposition 2.4). Also z ∈ K since K is closed.

Thus K is sequentially compact, and hence compact by Theorem 11.12. □

LECTURE 23

(4) ⇒ (1) in Theorem 11.12. This is the hardest bit and the proof is definitely not examinable. We suppose that X is complete and precompact, and show it is compact.
So take an open cover (U_λ)_{λ∈Λ} of X.

First we reduce it to a countable cover. For each n we can cover X by finitely many balls B(a_{n,1}, 1/n) ∪ . . . ∪ B(a_{n,r_n}, 1/n), using precompactness. Let A denote the set of centres (this is countable and dense, because for each n the set A comes within 1/n of every point of X) and consider all the balls B(a, 1/k) for a ∈ A and k = 1, 2, . . .. We claim that every open set U is a union of some of the B(a, 1/k). For if U is open and x ∈ U, there is a ball B(x, 1/j) ⊂ U and a point a ∈ A contained in B(x, 1/(2j)). But now x ∈ B(a, 1/(2j)) ⊂ B(x, 1/j) ⊂ U [DIAGRAM].

Thus we can cover X with a countable subcollection of the U_λ, since for each x ∈ X there is a ball B(a, 1/k) with x ∈ B(a, 1/k) ⊂ some U_λ. There are only countably many balls to choose from, so select one U_λ for each ball we used.

The upshot, after relabelling, is that we may assume that X = U_1 ∪ U_2 ∪ . . .. If there is a p such that X = U_1 ∪ . . . ∪ U_p, we are finished. If not, then for each i we may find x_i ∉ U_1 ∪ . . . ∪ U_i. We select a Cauchy subsequence as follows (a “diagonal” argument).

Cover X by finitely many balls of radius 1. Then for at least one of these balls there is an infinite subsequence, say x_{11}, x_{12}, . . ., all in the same ball B(y_1, 1).
Now cover X by finitely many balls of radius 1/2. Then for at least one of these balls there is an infinite subsequence of (x_{1k}), say x_{21}, x_{22}, . . ., all in the same ball B(y_2, 1/2).

Repeat. We obtain nested subsequences (x_{nk})_k, all in the same ball B(y_n, 1/n). But now the diagonal subsequence (x_{nn}) is Cauchy, since for m ≤ n, d(x_{mm}, y_m) < 1/m and also d(x_{nn}, y_m) < 1/m, as the (x_{nk}) are a subsequence of the (x_{mk}). Hence d(x_{mm}, x_{nn}) < 2/m for n > m, i.e., (x_{nn}) is a Cauchy sequence.

By completeness, x_{nn} → z, say. Now z ∈ U_j for some j, so x_{nn} ∈ U_j for n ≥ some n_0.
But this contradicts the construction of our (x_i), as for i ≥ j we have x_i ∉ U_1 ∪ . . . ∪ U_j.

□

11.14 Final Example – the Cantor set

We define the Cantor set C ⊂ [0, 1] as follows.

Let C_0 = [0, 1], C_1 = [0, 1/3] ∪ [2/3, 1], C_2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], and so on, at each stage deleting the middle open third of every interval that remains. [DIAGRAM]

Then C = ⋂_{n=0}^∞ C_n. This is the intersection of closed subsets of R, and is hence a closed (even compact) set. Remarkably, it is uncountable: indeed it consists of all numbers of the form x = ∑_{j=1}^∞ a_j/3^j, where a_j = 0 or 2 for each j (and not 1). Note that we regard 1/3 as 0.02222 . . . in base 3 (rather than 0.1).

One can use a Cantor diagonal argument (as one does to prove that R is uncountable) to show that C is uncountable.

Note that in fact there is a surjection f : C → [0, 1] defined by

f : ∑_{j=1}^∞ a_j/3^j ↦ ∑_{j=1}^∞ (a_j/2)/2^j.
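For points of C with finitely many nonzero ternary digits, the map f can be evaluated exactly. The sketch below is only an illustration: the function names `cantor_point` and `cantor_image` are hypothetical, and the digits a_1, a_2, . . . are passed as a finite list.

```python
from fractions import Fraction
from typing import Sequence


def cantor_point(digits: Sequence[int]) -> Fraction:
    """The point sum a_j / 3^j of C, for finitely many digits a_j in {0, 2}."""
    if any(a not in (0, 2) for a in digits):
        raise ValueError("ternary digits of a Cantor point must be 0 or 2")
    return sum((Fraction(a, 3 ** j) for j, a in enumerate(digits, start=1)),
               Fraction(0))


def cantor_image(digits: Sequence[int]) -> Fraction:
    """f applied to that point: sum (a_j / 2) / 2^j."""
    return sum((Fraction(a // 2, 2 ** j) for j, a in enumerate(digits, start=1)),
               Fraction(0))


digits = [2, 0, 2]            # x = 2/3 + 2/27 = 20/27
print(cantor_point(digits))   # 20/27
print(cantor_image(digits))   # binary 0.101 = 5/8
```

Halving each ternary digit 0 or 2 gives a binary digit 0 or 1, and every binary expansion arises in this way, which is why f maps onto [0, 1].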

Paradoxically, the complement of the Cantor set is an open set and so just a countable union of intervals. If one calculates the total length of the intervals removed from [0, 1], it is ∑_{j=1}^∞ 2^{j−1}/3^j, since we removed 2^{j−1} intervals of length 3^{−j} at the jth stage. This sums up to 1, but there are still many points left!
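The construction and the length calculation can be checked with exact rational arithmetic. In this sketch the function names `cantor_stage` and `removed_length` are assumptions of the illustration; the length removed in reaching C_n comes out as 1 − (2/3)^n, which tends to 1.

```python
from fractions import Fraction
from typing import List, Tuple

Interval = Tuple[Fraction, Fraction]


def cantor_stage(n: int) -> List[Interval]:
    """The closed intervals making up C_n, starting from C_0 = [0, 1]."""
    stage: List[Interval] = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt: List[Interval] = []
        for a, b in stage:
            third = (b - a) / 3
            # delete the open middle third, keep the two outer thirds
            nxt.append((a, a + third))
            nxt.append((b - third, b))
        stage = nxt
    return stage


def removed_length(n: int) -> Fraction:
    """Total length removed from [0, 1] in passing from C_0 to C_n."""
    return 1 - sum(b - a for a, b in cantor_stage(n))


print(cantor_stage(2))
for n in (1, 2, 5, 10):
    print(n, removed_length(n), 1 - Fraction(2, 3) ** n)
```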

The set C is “totally disconnected”: it clearly doesn’t contain any intervals of positive length, so all its subsets consisting of more than one point are disconnected. That is, every component of C consists of a single point.

In a technical sense (outside the scope of this course) C is a fractal set: its “dimension” is log 2/ log 3, or about 0.63.

THE END

The exercises on the sheet Extra Examples will be done in lectures, but the solutions will not be put online (if necessary, you can watch the videos).
