Notes on Real Analysis, 3A03
Prof. S. Alama1
Revised April 5, 2020
Contents
1 The Completeness Axiom, Supremum and Infimum 5

2 Sequences 12
2.1 Limits of Sequences 13
2.2 Monotone Sequences 18
2.3 Divergent Sequences and Subsequences 23
2.4 Cauchy Sequences 28

3 Series 29

4 Cardinality 37

5 Limits and Continuity 42
5.1 Limits of Functions 43
5.2 Continuous Functions 47
5.3 Extensions of limit concepts 52
5.4 Uniform Continuity 54

6 Differentiability 58
6.1 Differentiable Functions 58
6.2 The Mean Value Theorem and its Consequences 61
6.3 Taylor's Theorem 64

7 The Fundamental Theorem of Calculus 70
Introduction
Real Analysis is the study of the real numbers R and of real valued functions of one or more
real variables. If this sounds familiar, it should: you’ve been doing calculations with real
numbers and functions for several years now, starting in high school and continuing through
calculus. On the other hand, most of what you’ve done with real numbers and functions
has involved applying “rules” intended to justify which kinds of calculations are appropriate
(in the sense that they lead to correct results,) which might have seemed mysterious and
arbitrary to you at the time. The goal of this course is to develop some understanding
of functions of a real variable using logical deductive reasoning, and in doing so obtain a
¹©2018 All Rights Reserved. Do not distribute without author's permission.
much firmer grasp of how these “rules” may be justified and how to determine whether any
computational procedure will always yield correct results.
One of the great mysteries, in fact, is to figure out what the real numbers actually are.
You’ve learned that they include all of the other more familiar number systems from elemen-
tary mathematics: the counting (natural) numbers N, integers Z, rational numbers (fractions
of integers) Q, and algebraic numbers (ie, roots of polynomials with integer coefficients, such
as √2, ∛5, . . . .) In high school you mostly learned the algebra of real numbers. Amazingly,
one cannot really understand the reals via algebra. Their fundamental properties involve
analysis– basically, one must understand some sense of limits.
Let's summarize some algebraic and geometric properties of the reals, which you already
know but perhaps haven’t thought of in these terms:
• R is a field: If a, b, c ∈ R, then so are a + b, ab, and b/c if c ≠ 0. Addition and
multiplication are commutative, a + b = b + a and ab = ba, associative, a + (b + c) =
(a + b) + c and a(bc) = (ab)c, and distributive, a(b + c) = ab + ac. Each a ∈ R has an
additive inverse −a ∈ R, and a multiplicative inverse 1/a ∈ R, provided a ≠ 0.
• R is an ordered set. If a ≠ b are distinct real numbers, then either a < b or b < a.
In addition, the ordering is transitive (ie, if a > b and b > c, then a > c,) and is
consistent with the algebraic operations. In particular, if a, b, c ∈ R with a < b then
a+ c < b+ c. If c > 0, then ac < bc, but if c < 0, then ac > bc.
• R has a “metric property”, determined by the absolute value,
|x| = x if x ≥ 0, and |x| = −x if x < 0.
Then, the distance between a, b ∈ R is defined by d(a, b) = |a − b|, the length of the
line segment joining the two points on the number line.
We say R is an ordered field. However, the rational number set Q is also an ordered
field! The distinction between R and Q, and the intimate relationship between them form
an important theme in the course. To see how the algebraic and ordering properties are
defined, and how the various familiar “rules” are justified from the axioms, see Section 2 in
the Bartle & Sherbert textbook.
Let’s take a little time to verify some basic ordering and metric properties of R.
Lemma 0.1. Assume a, b ∈ R with a, b > 0. Then
a < b if and only if a2 < b2.
We often write “iff” for if and only if, or use the equivalence arrow with two heads, ⇐⇒ .
So we could also have written the statement of the lemma symbolically as,
∀a, b > 0, [a < b ⇐⇒ a2 < b2].
Since this is an equivalence, we have two things to prove. First, we must show that if
0 < a < b, then it must be true that a² < b². Since a > 0, we may multiply the inequality
a < b by a and it remains true:

a² = a · a < b · a.

As a < b and both are positive, b · a < b · b = b², and so substituting we arrive at a² < b²,
and the first half of the statement is verified.
Secondly, we assume that a, b > 0 and a² < b²; what we need to conclude is that a < b.
Subtracting a² from both sides gives,

0 < b² − a² = (b − a)(b + a),
using the field properties to write the difference of squares as a factored product. Now,
a, b > 0 implies that (a+ b) > 0, so multiplying both sides by 1/(a+ b) (which is positive,)
we have (b−a) > 0. Adding a to both sides yields a < b, as needed. The Lemma is therefore
proven.
We remark that the following is a consequence of the Lemma: if x, y > 0, then x < y
iff √x < √y. (You don't need to do anything much to verify this, just think about it for
a minute.) Finally, note that the condition a, b > 0 is essential– the Lemma would be false
if stated for any a, b ∈ R. (Try a numerical example with a < 0 and b > 0.) Moral: the
hypotheses of a statement are very very important!
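To see the role of the hypotheses concretely, here is a quick numerical check (an illustration only; the function name is ours, not the notes'):

```python
# Check the equivalence of Lemma 0.1: (a < b) <=> (a^2 < b^2), for a, b > 0.
def lemma_holds(a: float, b: float) -> bool:
    """Return True when (a < b) and (a*a < b*b) agree."""
    return (a < b) == (a * a < b * b)

# With the hypothesis a, b > 0, the equivalence holds:
assert lemma_holds(2.0, 3.0)
assert lemma_holds(3.0, 2.0)

# Without it, the Lemma fails: a = -3 < 1 = b, yet a^2 = 9 > 1 = b^2.
counterexample = lemma_holds(-3.0, 1.0)  # False
```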
We conclude this section with some facts about the absolute value. From the definition
of the absolute value, we have:
(a) For all x ∈ R, |x| ≥ 0;
(b) For all x ∈ R, −|x| ≤ x ≤ |x|;
(c) For all x, y ∈ R, |xy| = |x| |y|.
To verify each, treat the cases x ≥ 0 and x < 0 separately, in the definition. For example, to
verify (b): if x ≥ 0, |x| = x and −|x| ≤ 0 ≤ x, and if x < 0 then −|x| = x and |x| > 0 ≥ x.
We leave the other two properties as an exercise.
The following inequality is very important, and we will use it constantly.
Theorem 0.2 (Triangle Inequality). For any x, y ∈ R, |x+ y| ≤ |x|+ |y|.
Proof. First, notice that if x + y = 0, then the left side is 0 and the inequality is true, so we
can assume in the rest of the proof that both sides of the inequality are strictly positive.
First, notice that, by the definition of the absolute value, |x|² = x² ∀x ∈ R. Calculate the
square of the left term,

|x + y|² = (x + y)² = x² + y² + 2xy      using the algebraic axioms
         ≤ x² + y² + |2xy|               using property (b) of absolute value
         = |x|² + |y|² + 2|x| |y|        using property (c) of absolute value
         = (|x| + |y|)²,

so we have that |x + y|² ≤ (|x| + |y|)². Since a = |x + y| ≥ 0, b = |x| + |y| > 0, we apply
Lemma 0.1 to obtain the triangle inequality.
How about |x − y|, which measures the distance between x and y on the number line?
The absolute value does not behave in the same way for differences; after all, either x or y
could be negative!! We do have this interesting fact:
Theorem 0.3 (Reverse Triangle Inequality). For all x, y ∈ R, |x − y| ≥ ||x| − |y||.
Proof. Use the triangle inequality in this tricky way:
|x| = |(x − y) + y| ≤ |x − y| + |y|.

Therefore, |x − y| ≥ |x| − |y|. Now, if we switch the roles of x, y in the above line, we get
the opposite inequality, |x − y| = |y − x| ≥ |y| − |x|. (After all, who's to say which number
is called x?) Then, by definition of absolute value,

||x| − |y|| = |x| − |y| if |x| ≥ |y|, or |y| − |x| if |x| < |y|,

and so ||x| − |y|| ≤ |x − y| in either case.
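Both inequalities are easy to exercise numerically; here is a quick sketch checking them on a grid of half-integer values (illustrative only; the proofs above are the real argument):

```python
# Spot-check Theorems 0.2 and 0.3 (Triangle and Reverse Triangle Inequalities).
vals = [k * 0.5 for k in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
for x in vals:
    for y in vals:
        assert abs(x + y) <= abs(x) + abs(y)       # Triangle Inequality
        assert abs(x - y) >= abs(abs(x) - abs(y))  # Reverse Triangle Inequality
```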
1 The Completeness Axiom, Supremum and Infimum
Upper and Lower Bounds
We assume throughout that S ⊂ R, a proper, nonempty subset of the real numbers.
Definition 1.1. We say that S ⊂ R is bounded above if there exists v ∈ R with x ≤ v for
all x ∈ S. The number v is called an upper bound for the set S.
We say that S ⊂ R is bounded below if there exists t ∈ R with x ≥ t for all x ∈ S. The
number t is called a lower bound for the set S.
The set S is called bounded if it is both bounded above and below. If S is not bounded,
we say it is unbounded.
If S is bounded above or below it has infinitely many upper or lower bounds. In particular,
if v is an upper bound for S, any z > v is also an upper bound.
Example 1.2. (a) S1 = (0, 3] = {x ∈ R : 0 < x ≤ 3} is bounded above by v = 3 or π or
10000000. It is bounded below by t = 0 or −π or -9999999999999. It is a bounded set.
(b) S2 = N = {1, 2, 3, . . . } is bounded below by any t ≤ 1. It is unbounded above (although
we really need to prove this!) and is an unbounded set.
(c) S3 = {q ∈ Q : q² ≤ 2} = [−√2, √2] ∩ Q is a bounded set. Any v ≥ √2 is an upper
bound, and any t ≤ −√2 is a lower bound.
In each example, when there is an upper or lower bound there is a "best" choice, the
most efficient upper or lower bound, the unique one which is closest to the set S.
Definition 1.3. (A) If S ⊂ R is bounded above, we define its supremum u = supS to be
the least (smallest) upper bound of S:
(i) u is an upper bound for S; and
(ii) if v is any upper bound for S, u ≤ v.
(B) If S ⊂ R is bounded below, we define its infimum w = inf S to be the greatest
(largest) lower bound of S:
(i) w is a lower bound for S; and
(ii) if t is any lower bound for S, t ≤ w.
(C) If S is unbounded above, we write supS = +∞.
(D) If S is unbounded below, we write inf S = −∞.
Returning to Example 1.2, inf S1 = 0, supS1 = 3, inf S2 = 1, supS2 = +∞, inf S3 = −√2,
and supS3 = √2. Notice that even though the set S3 ⊂ Q, its
supremum and infimum are irrational. Thus, if we base our number system on only the
rational numbers Q, then some sets would not have suprema or infima. This is the final
axiom we choose for the real numbers R:
Completeness Axiom:² If S ⊂ R is any nonempty subset which is bounded above, then
u = supS exists as a real number.
One cannot prove that this is true from the other axioms; what is required is to demon-
strate that one can construct a set of objects which satisfies the Ordered Field, Metric, and
Completeness Axioms. This was done (separately) by Cantor and by Dedekind, both in
1872 but by completely different procedures. This construction is mind-blowingly abstract
in both cases, and the details are not important to what we will do in this course, so we will
not present it.
Proposition 1.4. Let S ⊂ R be nonempty and bounded below, and define the set
T = {x | − x ∈ S}. Then T is bounded above, and inf S = − supT.
Proof. First, we show that T is bounded above: since S is bounded below, it has a lower
bound t, with t ≤ x for all x ∈ S. For any y ∈ T, y = −x with x ∈ S, and therefore
y = −x ≤ −t, for all y ∈ T, that is (−t) is an upper bound for T, so T is bounded above.
Let u = supT, which exists by the Completeness Axiom since T is bounded above. We
need to show inf S = −u, that is, we need to show (−u) satisfies the conditions B(i) and
B(ii) from the definition. First we verify (i): for any x ∈ S, y = −x ∈ T, and since u is an
upper bound for T, −x = y ≤ u, which is equivalent to x ≥ −u. Therefore
(−u) is a lower bound for S and B(i) holds.
Next we verify (ii): let t be any lower bound for S, so x ≥ t ∀x ∈ S. Equivalently,
y = −x ∈ T and y ≤ −t, so (−t) is an upper bound for T . Since u = supT is the smallest
upper bound, u ≤ −t and thus −u ≥ t. Hence, B(ii) is also satisfied with −u = inf S.
As a corollary to the Proposition, we see that the Completeness Axiom implies that every
nonempty S ⊂ R which is bounded below must have an infimum w = inf S ∈ R.
When using the definition of the sup and inf in proofs it will be convenient to have a
more concrete criterion for each. The first statement, that u = supS is an upper bound, is
easy to represent concretely:
(i) x ≤ u ∀x ∈ S.

²Sometimes called the "Continuum Axiom".
The second condition may be written in different ways which are equivalent. If u is the
smallest upper bound for S, then any v < u is not an upper bound for S:
(ii)′ ∀v < u, ∃x ∈ S with x > v.
Furthermore, if v < u then we can write v = u − ε with ε = u − v > 0. And so it is also
equivalent to demand:
(ii)′′ ∀ε > 0 ∃x ∈ S with x > u− ε.
The same applies to the infimum:
Theorem 1.5. Assume S ⊂ R is bounded below. Then w = inf S if and only if:
(i) x ≥ w ∀x ∈ S,
and one of the following holds:
(ii)′ for any t > w, ∃x ∈ S with x < t; or
(ii)′′ for any ε > 0, ∃x ∈ S with x < w + ε.
Let’s use these!
Proposition 1.6. Assume S is bounded below, a > 0, and T = {ax : x ∈ S}. Then
inf T = a inf S.
Proof. Call w = inf S. By the definition, we need to show that (aw) is a lower bound for
T , and that it is the largest lower bound for T .
First, for any y ∈ T , y = ax for some x ∈ S. Using part (i) of the definition of inf S,
(and a > 0,) we have y = ax ≥ aw, and so (aw) is a lower bound for T (and (i) is verified
for T .)
Second, we will verify (ii′) for T using (ii′) for S. Take any t > aw. Then t/a > w (since
a > 0). By (ii′) for inf S there exists x ∈ S with x < t/a. Multiply by a and call y = ax ∈ T,
with the property y = ax < a · (t/a) = t. Therefore, t is not a lower bound for T, and by (ii′)
we're done.
Proposition 1.7. Let S1, S2 ⊂ R be bounded below, and T = {x1 + x2 : x1 ∈ S1, x2 ∈ S2}.
Then inf T = inf S1 + inf S2.
Proof. Call inf S1 = w1 and inf S2 = w2. We need to show that (w1 + w2) is a lower bound
for T , and that it is the largest lower bound for T .
First, for any y ∈ T we can find x1 ∈ S1 and x2 ∈ S2 with y = x1 + x2. Using property
(i) from the definition of the inf of S1 and S2, we have
y = x1 + x2 ≥ w1 + w2,
and so (w1 + w2) is a lower bound for T .
Second, we will verify (ii′′) for T. Given any ε > 0, we apply (ii′′) for S1 and S2, but with
ε/2 replacing ε: there exist x1 ∈ S1 and x2 ∈ S2 so that x1 < w1 + ε/2 and x2 < w2 + ε/2.
Since x1 + x2 = y ∈ T, with

y = x1 + x2 < w1 + ε/2 + w2 + ε/2 = (w1 + w2) + ε,

we have verified (ii′′) for T, and therefore by definition we have proven the Proposition.
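For finite sets the infimum is attained and is just the minimum, so Proposition 1.7 can be illustrated directly (the particular sets below are ours, chosen for the illustration):

```python
# Illustration of Proposition 1.7 with finite sets, where inf = min is attained.
S1 = {0.5, 2.0, 3.25}
S2 = {-1.0, 4.0}
T = {x1 + x2 for x1 in S1 for x2 in S2}  # the sum set

# inf T = inf S1 + inf S2:
assert min(T) == min(S1) + min(S2)
```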
The Rationals are Dense in the Reals
The rationals Q are a proper subset of the reals R: irrational numbers are like “holes” in Q.
However, we don’t worry about this too much, since we have learned that irrational numbers
can always be approximated to arbitrary precision by rational numbers. We say that Q is
“dense” in R because of this relationship. There are several equivalent ways to state the
density of the rationals Q in the reals R. The one we prove here is that between any two
real numbers there is at least one rational number.
Theorem 1.8 (Density Theorem). For any a, b ∈ R with a < b, there exists a rational
number r = m/n (m ∈ Z, n ∈ N) with a < r < b.
This is another way of saying that any real number can be approximated by rational
numbers, to arbitrary precision. Indeed, for any x ∈ R, consider a = x − 10−k and b =
x + 10−k, for some k ∈ N. Then applying the Density Theorem to the interval (a, b), we
obtain r ∈ Q with x− 10−k < r < x+ 10−k, which is equivalent to |r− x| < 10−k, that is, r
agrees with x up to the kth decimal place.
How about the irrational numbers? They’re dense in the real numbers too!
Corollary 1.9. For any a, b ∈ R with a < b, there exists an irrational number s with
a < s < b.
To prove the density of the irrationals, apply the Density Theorem to the interval (a/√2, b/√2)
to get a rational number r ∈ Q with a/√2 < r < b/√2, that is, a < r√2 < b. (If r = 0, apply
the Density Theorem again to the subinterval (0, b/√2) to get a nonzero r.) Since the product
of a nonzero rational number with an irrational number is always irrational (prove it!), the
Corollary is verified with s = r√2.
We require two basic facts to prove the Density Theorem. These may seem obvious, but
can be proven using sup and inf.
Theorem 1.10 (Archimedean Principle). For any x ∈ R there exists n ∈ N with x < n.
In other words, the set N of natural numbers is not bounded from above.
Proof. To derive a contradiction, we assume that there is no such n ∈ N, that is, n ≤ x for
all n ∈ N. So x is an upper bound for the set S = N, and by the Completeness Theorem
there exists a supremum, u = supN.
By the two properties of the sup, u is an upper bound for N,
u ≥ n for all n ∈ N, (1.1)
and u− 1 is not an upper bound for N, so there exists m ∈ N with
u− 1 < m.
But then, u < m+ 1, and since (m+ 1) ∈ N, this contradicts (1.1). Thus, there must be an
n ∈ N with x < n.
Example 1.11. Let S = {1/n : n ∈ N}. Then inf S = 0.

First, ∀n ∈ N, 1/n > 0, so w = 0 is a lower bound. Second, for any t > 0, we apply the
Archimedean Property to x = 1/t to conclude that there exists n ∈ N with n > 1/t. Hence,
1/n < t, and t > 0 is not a lower bound for S. By definition, inf S = 0.
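The Archimedean witness n can be produced explicitly; a small sketch (the helper name and the +2 rounding guard are ours):

```python
import math

# Given t > 0, produce n in N with n > 1/t, hence 1/n < t (as in Example 1.11).
def witness(t: float) -> int:
    # +2 rather than +1 guards against floating-point rounding of 1/t
    return math.floor(1.0 / t) + 2

for t in (0.3, 0.01, 1e-6):
    n = witness(t)
    assert 1.0 / n < t  # so t is not a lower bound for {1/n : n in N}
```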
Theorem 1.12. Let S ⊂ Z be nonempty and bounded below. Then S contains a minimal
element: there exists m ∈ S with m ≤ n for all n ∈ S.
Proof. Since S is bounded below, by the Completeness Axiom there exists w = inf S. If
w ∈ S then we are done. So assume w ∉ S, in order to derive a contradiction. We apply
the two properties of the inf: first, w ≤ n for all n ∈ S. Since w ∉ S, in fact we have the
slightly stronger
w < n for all n ∈ S,
and w + 1 is not a lower bound for S, so there exists k ∈ S ⊂ Z for which
k < w + 1.
Since k > w and w is the greatest lower bound of S, k is not a lower bound, so by property
(ii)′ of the inf there must be j ∈ S with j < k. Putting these all together,
w < j < k < w + 1.
In particular, 0 < k − j < (w + 1) − j < (w + 1) − w = 1, but k − j ∈ Z and the distance
between any two integers is at least one! So this is impossible, and we conclude that S has
a minimum element.
Applied to subsets of the natural numbers this property is called the “well-ordering
principle”:
Corollary 1.13 (Well-Ordering Principle). Any nonempty subset of N has a minimum ele-
ment.
The Corollary follows from the previous Theorem, since N is bounded below (by 1), and
so any nonempty subset of N is bounded below.
We’re now ready to prove the Density Theorem.
Proof of the Density Theorem. First we choose the denominator of r = m/n. By the Archimedean
Property, there exists n ∈ N with n > 1/(b − a), that is

1/n < (b − a). (1.2)
To find the numerator, consider the set S = {k ∈ Z : k > na}. Since S ⊂ Z and S is
bounded below by na, by Theorem 1.12 S contains a minimal element m, and so m ∈ S and
m − 1 ∉ S. This implies:

m > na and m − 1 ≤ na. (1.3)
Putting (1.2) and (1.3) together we get:

a < m/n = (m − 1)/n + 1/n ≤ a + 1/n < a + (b − a) = b.

Then the conclusion holds with r = m/n.
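The proof is entirely constructive, so it can be run as code; here is a sketch following it step by step (the function name and the rounding guards are ours, and exactness is only up to floating-point representation of the endpoints):

```python
import math
from fractions import Fraction

def rational_between(a: float, b: float) -> Fraction:
    """Return r = m/n with a < r < b, following the proof of the Density
    Theorem: pick n with 1/n < b - a, then take the minimal m with m > n*a."""
    assert a < b
    n = math.floor(1.0 / (b - a)) + 2  # Archimedean choice, cf. (1.2); +2 guards rounding
    m = math.floor(n * a) + 1          # minimal element of {k in Z : k > n*a}, cf. (1.3)
    return Fraction(m, n)

r = rational_between(math.sqrt(2), math.sqrt(2) + 0.01)
assert math.sqrt(2) < r < math.sqrt(2) + 0.01
```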
2 Sequences
A sequence of real numbers is a function f : N → R; for each counting number n ∈ N
we associate to it a real number f(n) = xn. There are many different ways of denoting the
sequence. Here are a few which you may see:
(x1, x2, x3, . . . ) = (xn) = (xn)n∈N = (xn : n ∈ N) .
The text also writes X = (xn), using a single capital letter for the sequence as a whole.
Example 2.1. (a) The function f can be explicitly given. For instance, xn = (2n² + 1)/n²,

( (2n² + 1)/n² )n∈N = ( 3, 9/4, 19/9, 33/16, . . . ).
(b) (xn) = ( (−1)n )n∈N = (−1, 1,−1, 1,−1, . . . ).
(c) We may define sequences by iteration. Let g : R→ R be a given real-valued function.
Choose an initial value x1 ∈ R and then define the sequence iteratively,
xn+1 = g (xn) , n = 1, 2, 3, . . .
This is a natural way to define sequences, for example to approximate solutions to equa-
tions. For a more specific example, take g(x) = ½ (x + 2/x), and generate the sequence
(xn) by iteration:

x1 = 2 and xn+1 = ½ (xn + 2/xn), n = 1, 2, 3, . . . .
The first few values are:
(xn) = (2, 1.5, 1.41666 . . . , 1.414215686 . . . , 1.414213562 . . . , . . . )

Later, we will prove that this sequence converges to √2.
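We can run this iteration numerically (a sketch; this is in fact Newton's method for x² = 2, though the notes don't rely on that):

```python
# Iterate x_{n+1} = (x_n + 2/x_n)/2 starting from x_1 = 2, as in Example 2.1(c).
def iterate(n_steps: int) -> float:
    x = 2.0  # x_1 = 2
    for _ in range(n_steps):
        x = 0.5 * (x + 2.0 / x)
    return x

assert iterate(1) == 1.5                     # x_2
assert abs(iterate(5) - 2.0 ** 0.5) < 1e-12  # rapidly approaches sqrt(2)
```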
Note that a sequence is not the same thing as a set. A sequence is an infinite ordered
list of numbers. In a sequence the same number may appear several times, and changing
the order of the elements of a sequence creates an entirely different sequence. A set has no
order, and there is no point in repeating the same value several times. Taking Example (b)
above, {xn : n ∈ N} = {−1,+1} is a set with two elements. The sequence (xn) is not the
same thing. If we let (yn)n∈N = (−1,−1, 1, 1, −1,−1, 1, 1, . . . ), this is a different sequence
than (xn)n∈N, yet it takes values in the same set {yn : n ∈ N} = {−1,+1}.
2.1 Limits of Sequences
Definition 2.2. We say that the sequence (xn) converges to x ∈ R if:

for every ε > 0 there exists K ∈ N so that |xn − x| < ε for every n ≥ K.

We write x = lim_{n→∞} xn, or xn → x as n → ∞.

If there is no value of x for which xn converges to x, we say (xn) diverges.
Example 2.1 (a) converges to x = 2: xn = (2n² + 1)/n² → 2 as n → ∞. Let's verify this via
the definition: let ε > 0 be given. We calculate

|xn − 2| = |(2n² + 1)/n² − 2| = |((2n² + 1) − 2n²)/n²| = |1/n²| = 1/n². (2.1)

We need to determine when the right-hand side 1/n² < ε. This is true when n > 1/√ε.
By the Archimedean property, we may choose K ∈ N with K > 1/√ε, so for all n ≥ K,
n ≥ K > 1/√ε and so 1/n² < ε for n ≥ K. Plugging back into (2.1) we have

|xn − 2| = 1/n² < ε ∀n ≥ K,

and so xn → 2 by definition.
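The choice of K in this argument can be made mechanical; a small sketch (the helper names are ours):

```python
import math

# For Example 2.1(a): choose K in N with K > 1/sqrt(eps), so that
# |x_n - 2| = 1/n^2 < eps for every n >= K.
def K_for(eps: float) -> int:
    return math.floor(1.0 / math.sqrt(eps)) + 1

def x(n: int) -> float:
    return (2 * n**2 + 1) / n**2

eps = 1e-4
K = K_for(eps)
for n in range(K, K + 100):
    assert abs(x(n) - 2.0) < eps
```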
Remark 2.3. By a Practice Problem, the inequality condition |xn − x| < ε is equivalent to
two inequalities, above and below,
x− ε < xn < x+ ε.
Another more geometrical way to write this is in terms of open intervals,

xn ∈ (x − ε , x + ε).

The interval (x − ε , x + ε) is called an open neighborhood of size ε centered at x. So
xn → x is equivalent to saying that the elements of the sequence satisfy xn ∈ (x − ε , x + ε)
eventually always, that is, for all n ≥ N(ε).
We verify the following properties of the limit for sequences:
Theorem 2.4. Assume the sequence (xn)n∈N is convergent, x = lim_{n→∞} xn.

(a) The limit is unique: if lim_{n→∞} xn = y, then y = x.
(b) (xn)n∈N is bounded; that is, ∃ M ∈ R with |xn| ≤M , ∀n ∈ N.
(c) If xn ≥ 0 ∀n ∈ N, then x ≥ 0.
(d) If a ≤ xn ≤ b ∀n ∈ N, then a ≤ x ≤ b.
Be careful: a bounded sequence may not be convergent! Note also that if we have strict
inequality in any of the above, for instance, if xn > 0 ∀n ∈ N, we may not conclude that
x > 0. (Exercise: find examples of sequences which demonstrate these two remarks!)
Proof. For (a), suppose that there are two different limits, y 6= x. We apply the definition
of convergence twice: for any ε > 0, there exist two numbers N1 = N1(ε) ∈ N and N2 =
N2(ε) ∈ N, so that
|xn − x| < ε ∀ n ≥ N1 and |xn − y| < ε ∀ n ≥ N2.
By taking N = max{N1, N2}, then both of the above conditions hold,
|xn − x| < ε and |xn − y| < ε ∀n ≥ N.
This is true for any ε > 0, and here we will choose ε = |y − x|/2. Then, using the Triangle
Inequality trick,
|x− y| = |(x− xn) + (xn − y)| ≤ |xn − x|+ |xn − y| < 2ε = |x− y|, ∀n ≥ N,
which is impossible (because of the strict inequality.) This proves (a).
For (b), apply the definition of limit with ε = 1, so ∃N ∈ N for which |xn − x| < 1
whenever n ≥ N . Then, by the triangle inequality,
|xn| = |(xn − x) + x| ≤ |xn − x|+ |x| < 1 + |x|, ∀n ≥ N.
This gives a bound on the absolute value of all but finitely many of the xn; for the finite
collection x1, . . . , xN−1, one of these has the largest absolute value, and for every n ∈ N,

|xn| ≤ max {|x1|, |x2|, . . . , |xN−1|, 1 + |x|} .
The right hand side is a number independent of n, and so the sequence {xn} is bounded.
For (c), argue by contradiction and suppose that the limit x < 0. Let ε = |x| > 0, and
apply the definition of limit: ∃N ∈ N for which |xn − x| < ε, whenever n ≥ N . Unfolding
the absolute value into two inequalities, we have
x− ε < xn < x+ ε ∀n ≥ N,
but we only need the upper bound on xn, and so we keep

0 ≤ xn < x + ε = x + |x| = 0 ∀n ≥ N,

since we are assuming xn ≥ 0 but x < 0, so that |x| = −x. This is impossible (strict
inequality!) and hence x ≥ 0.
For (d), let yn = xn − a, and so by hypothesis yn ≥ 0 ∀n ∈ N. Since

|yn − (x − a)| = |(xn − a) − (x − a)| = |xn − x|,

by the definition of the limit yn → (x − a). Using part (c) we conclude that (x − a) ≥ 0,
in other words, x ≥ a. To verify that x ≤ b, define the sequence zn = (b − xn) ≥ 0, and
argue as above. (Exercise!)
Here’s an old friend from calculus:
Theorem 2.5 (Squeeze Theorem). Suppose (xn)n∈N, (yn)n∈N, and (zn)n∈N are sequences in
R with:
(i) xn ≤ yn ≤ zn ∀n ∈ N; and
(ii) (xn)n∈N, (zn)n∈N are convergent, with

lim_{n→∞} xn = L = lim_{n→∞} zn.

Then (yn)n∈N is convergent, and lim_{n→∞} yn = L.
Proof. This is a very simple consequence of the definition of convergence. As in the proof of
(a) above, for any ε > 0 there exists a single value N = N(ε) ∈ N for which both
|xn − L| < ε and |zn − L| < ε, ∀n ≥ N.
Using a Practice Problem from Chapter 2 in the book, the inequality with the absolute value
may be written as a double inequality,
L− ε < xn < L+ ε and L− ε < zn < L+ ε, ∀n ≥ N.
Using the outer inequalities together with hypothesis (i), we have:

L − ε < xn ≤ yn ≤ zn < L + ε, ∀n ≥ N.

Reading only the bounds on yn, L − ε < yn < L + ε, ∀n ≥ N, which is equivalent to
yn → L.
Remark 2.6. We can also think of convergence to a limit in terms of the limit of the distance
from xn to the limit value x. That is, call this distance rn = |xn−x|, ∀n ∈ N. Then, rn ≥ 0,
and so the limit exists if and only if: ∀ε > 0 there exists N ∈ N so that
0 ≤ rn = |xn − x| < ε, ∀n ≥ N.
Notice that this statement is equivalent to lim_{n→∞} rn = 0.
Theorem 2.7. Assume (xn)n∈N and (yn)n∈N are convergent, xn → x and yn → y.
Then each of the following combinations is convergent:

(a) xn + yn → x + y;

(b) xn yn → x y;

(c) if y ≠ 0, xn/yn → x/y.
The proof of the above Theorem may be found in section 3.2 of the textbook. To give
you an idea of how to do it (Triangle Inequality!), consider (b): write
|xnyn − xy| = |xnyn − xny + xny − xy| = |xn(yn − y) + (xn − x)y|
≤ |xn| |yn − y|+ |xn − x| |y|.
By Theorem 2.4 (b), convergent sequences are bounded so there exists M ∈ R such that
|xn| ≤M . Hence,
|xnyn − xy| ≤M |yn − y|+ |y| |xn − x|. (2.2)
(Note that |y| is a constant!) Now we can either proceed by the definition (for any ε > 0,
choose N = N(ε) such that both

M |yn − y| < ε/2 and |y| |xn − x| < ε/2

hold ∀n ≥ N), or use Remark 2.6, observing that the right-hand side of (2.2) tends to zero.
The others may be proven in a similar way.
Example 2.8. We will show that lim_{n→∞} n^{1/n} = 1. First, note that for n ≥ 2, n^{1/n} > 1. So we
can write

n^{1/n} = 1 + bn, with bn > 0.

Hence, taking the nth power and applying the Binomial Theorem,

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^{n−k} b^k,

we have:

n = (1 + bn)^n = Σ_{k=0}^{n} (n choose k) bn^k > 1 + [n(n − 1)/2] bn².
The inequality in the last line is valid since the terms omitted from the sum are all positive,
and hence the last line is smaller than the sum above it.
Rearranging, we have (n − 1) > [n(n − 1)/2] bn², and hence 0 < bn < √(2/n), ∀n ≥ 2. Recalling the
definition of bn,

1 < n^{1/n} = 1 + bn < 1 + √(2/n)
for all n ≥ 2. By the Squeeze Theorem, we obtain the desired limit.
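The squeeze bounds are easy to check numerically (an illustration only; the inequality above is what actually proves the limit):

```python
# Check 1 < n**(1/n) < 1 + sqrt(2/n) from Example 2.8, for a range of n.
for n in range(2, 10_000):
    t = n ** (1.0 / n)
    assert 1.0 < t < 1.0 + (2.0 / n) ** 0.5
```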
2.2 Monotone Sequences
Definition 2.9. Let (xn)n∈N be a sequence in R.
(i) The sequence is monotone increasing if x1 ≤ x2 ≤ x3 . . . , ie, if xn ≤ xn+1 ∀n ∈ N.
(ii) The sequence is monotone decreasing if x1 ≥ x2 ≥ x3 . . . , ie, if xn ≥ xn+1 ∀n ∈ N.
(iii) The sequence is bounded above if ∃M ∈ R with xn ≤M ∀n ∈ N.
(iv) The sequence is bounded below if ∃m ∈ R with xn ≥ m ∀n ∈ N.
If the sequence is both bounded above and bounded below, then it is bounded in the
sense of Theorem 2.4 (b). (Check it!) If (i) or (ii) holds with strict inequalities, ie, xn < xn+1
∀n ∈ N, we say the sequence is strictly monotone increasing.
Example 2.10. Define a sequence by iteration:

x1 = 0, xn+1 = √(5 + xn), n ∈ N.

So

X = (xn)n∈N = ( 0, √5, √(5 + √5), √(5 + √(5 + √5)), . . . ).
Typically with a sequence defined via iteration it’s not possible to have a simple explicit
formula for xn; to get x24 you first need to calculate the first 23 values.
We first claim that 0 ≤ xn < 5 ∀n ∈ N. To verify this we use induction. First, when
n = 1, 0 = x1 < 5, so the claim is true for n = 1. Next, assume that the claim is true for xn,
and show it must hold for xn+1. Indeed, if 0 ≤ xn < 5, then xn+1 = √(5 + xn) > 0 (square
roots are positive), and

xn+1 = √(5 + xn) < √(5 + 5) = √10 < √25 = 5,

so the claim must be true for all n.
In particular, by the claim, (xn)n∈N is bounded above and below.
We also claim that (xn) is monotone increasing, xn ≤ xn+1 for all n. Again we use
induction: when n = 1, x1 = 0 < √5 = x2, so it's true. Assuming xn−1 ≤ xn, we calculate

xn+1 − xn = √(5 + xn) − √(5 + xn−1)
          = [(5 + xn) − (5 + xn−1)] / [√(5 + xn) + √(5 + xn−1)]   [ using a − b = (a² − b²)/(a + b) ]
          = (xn − xn−1) / [√(5 + xn) + √(5 + xn−1)] ≥ 0,

by the assumption xn ≥ xn−1 (and since the denominator is positive.) Therefore xn+1 ≥ xn
holds for all n.
Now, a sequence which is monotone increasing and bounded above is hemmed in, and has
nowhere to go. It must always move to the right, but can never get past the upper bound;
in this case, xn ≤ xn+1 ≤ 5 for all n ∈ N. So the values have to get compressed together as
n increases, that is, the sequence must converge:
Theorem 2.11 (Monotone Sequence Theorem). (a) If (xn)n∈N is monotone increasing and
bounded above, then (xn)n∈N is convergent; ∃x ∈ R with xn → x.

(b) If (xn)n∈N is monotone decreasing and bounded below, then (xn)n∈N is convergent; ∃x ∈ R
with xn → x.
Returning to Example 2.10, the sequence (xn) defined there is monotone increasing and
bounded above, so by the Monotone Sequence Theorem it is convergent, that is xn → x.
But what is the limit x? We can find an equation for it by passing to the limit in the iteration
equation, xn+1 = √(5 + xn) ∀n ∈ N. This implies:

xn+1 = √(5 + xn), ∀n ∈ N =⇒ x² = lim_{n→∞} xn+1² = lim_{n→∞} (5 + xn) = 5 + x,

and so x is a solution of the equation x² − x − 5 = 0. By the quadratic formula, x = ½ ± ½√21
are the roots of the polynomial. Since xn ≥ 0 ∀n, by Theorem 2.4 the limit x ≥ 0, and we
conclude that x = ½(1 + √21) = lim_{n→∞} xn.
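Running the iteration of Example 2.10 exhibits all three facts at once: monotonicity, the bound, and the limit (a numerical sketch; the checks on the early terms avoid floating-point noise near convergence):

```python
# x_{n+1} = sqrt(5 + x_n), x_1 = 0: monotone increasing, bounded by 5,
# converging to (1 + sqrt(21))/2 as derived above.
x = 0.0
trajectory = [x]
for _ in range(30):
    x = (5.0 + x) ** 0.5
    trajectory.append(x)

limit = 0.5 * (1.0 + 21.0 ** 0.5)
head = trajectory[:15]
assert all(a < b for a, b in zip(head, head[1:]))  # strictly increasing (early terms)
assert all(t < 5.0 for t in trajectory)            # bounded above by 5
assert abs(trajectory[-1] - limit) < 1e-9
```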
If the terms "bounded above" and "bounded below" sound familiar from supremum and
infimum, it's no accident. For a monotone increasing sequence, we will show below that
lim_{n→∞} xn = sup{xn | n ∈ N}, that is, we define the set of numbers included in the sequence,
S = {xn | n ∈ N}, and

lim_{n→∞} xn = supS = sup{xn | n ∈ N}.

Often, this is written as sup_{n∈N} xn for convenience. Similarly, if the sequence is monotone
decreasing,

lim_{n→∞} xn = inf_{n∈N} xn = inf{xn | n ∈ N}.
The Monotone Sequence Theorem is logically equivalent to the Completeness Theorem
(the existence of the supremum for sets which are bounded above.) In other words, we could
have chosen to take the Monotone Sequence Theorem as an axiom for the completeness of
R, and used it to prove that every set S which is bounded above has a supremum. To really
prove one or the other is to "construct R from Q", and come to grips with what kind of
beast R really is. But let's show that they really are equivalent!
Completeness Theorem =⇒ Monotone Sequence Theorem. Let (xn)n∈N be monotone increas-
ing and bounded above, and define (as above) the set
S = {xn | n ∈ N}.
Then the set S is bounded above, and by the Completeness Theorem there exists a supremum
x ∈ R, x = supS. We need to show that xn → x.
First, x is an upper bound for S, so xn ≤ x ∀n ∈ N. It is the smallest upper bound for
S, so given any ε > 0 there exists an element xK ∈ S (so ∃K ∈ N,) with x− ε < xK . Since
the sequence is monotone increasing, xK ≤ xn ∀n ≥ K. Putting these inequalities together,
x− ε < xK ≤ xn ≤ x < x+ ε, ∀n ≥ K,
and therefore xn → x.
Monotone Sequence Theorem =⇒ Completeness Theorem. This one is trickier, but it’s more
fun! Let S ⊂ R be a nonempty set which is bounded above. The idea is to define two se-
quences, xn ∈ S and a sequence of upper bounds vn for S, each of which converges monoton-
ically to supS. Start by taking any element x1 ∈ S and any upper bound v1. Then x1 ≤ v1,
and if x1 = v1 then they must agree with the supremum. (We had a practice problem like
that!) Suppose they’re not the same, and let r1 = v1 − x1 > 0.
Now look at the midpoint between these points, y1 = ½(x1 + v1). If y1 is an upper bound
for S, we define the second point in each sequence by v2 = y1 and x2 = x1. Since v2 = y1 is
the midpoint, the distance between x2 and v2 is half of what it was, r2 = v2 − x2 = ½ r1. On
the other hand, if y1 is not an upper bound for S, then there exists x2 ∈ S with x1 < y1 < x2.
In this case, we keep the old upper bound, v2 = v1. In choosing x2 > y1 the interval (x2, v2)
has shrunk by at least half, so r2 = v2 − x2 ≤ ½ r1. So in either case, we have found x2 ∈ S
and an upper bound v2 so that

x1 ≤ x2 ≤ v2 ≤ v1, and r2 = v2 − x2 ≤ ½ r1.
Now proceed by iteration. Assume we have already found x1, . . . , xn ∈ S and upper
bounds v1, v2, . . . , vn with
x1 ≤ x2 ≤ · · · ≤ xn−1 ≤ xn ≤ vn ≤ vn−1 ≤ · · · ≤ v2 ≤ v1,
and rn = vn − xn ≤ 2^{−n+1} r1. Following the above procedure, we ask if the midpoint
yn = (1/2)(xn + vn) is an upper bound for S or not. If it is, we keep x_{n+1} = xn but swap
v_{n+1} = yn. If not, we find x_{n+1} ∈ S with x_{n+1} > yn > xn and keep the upper bound
v_{n+1} = vn. The distance between them satisfies r_{n+1} ≤ (1/2) rn ≤ 2^{−n} r1 → 0 as n → ∞.
Since (xn) is monotone increasing and bounded above (by v1), by the Monotone Sequence
Theorem it converges, xn → x. Similarly, (vn) is monotone decreasing and bounded below
(by x1), so it also converges, vn → v. Since 0 ≤ vn − xn ≤ rn → 0, by the Squeeze
Theorem we must have x = v. So we only need to show that v = sup S.
First, take any y ∈ S. Since each vn is an upper bound for S, y ≤ vn ∀n ∈ N. By
Theorem 2.4 (d), y ≤ v, and so v is an upper bound for S. Now, for any ε > 0, since
xn → v there exists K ∈ N with v − ε < xK < v + ε. In particular, xK ∈ S, and so the first
inequality shows that v − ε is not an upper bound for S. We conclude that v = supS, and
so the supremum exists.
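The bisection in this proof is easy to run numerically. Below is a minimal sketch (the function names `bisect_sup` and `above_in_S` are ours, not the text's), applied to S = {x > 0 : x^2 < 2}, whose supremum is √2:

```python
def bisect_sup(above_in_S, x1, v1, steps):
    """Run the bisection from the proof: above_in_S(y) must return an
    element of S strictly greater than y, or None if y is an upper bound
    for S.  Returns the pair (x_n, v_n) after the given number of steps."""
    x, v = x1, v1
    for _ in range(steps):
        y = 0.5 * (x + v)            # midpoint of [x_n, v_n]
        witness = above_in_S(y)
        if witness is None:          # y is an upper bound: v_{n+1} = y
            v = y
        else:                        # y is not: x_{n+1} = witness > y
            x = witness
    return x, v

def above_in_S(y):
    """For S = {x > 0 : x^2 < 2}: if y^2 < 2, then y + (2 - y^2)/4 still
    lies in S and is strictly above y (for 0 <= y < 2)."""
    if y * y < 2.0:
        return y + (2.0 - y * y) / 4.0
    return None

x, v = bisect_sup(above_in_S, 1.0, 2.0, 60)
# x and v squeeze sup S = sqrt(2) from below and above
```

Each step halves the gap v_n − x_n, just as in the proof, so both ends converge to sup S.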
As a corollary of the above construction, we obtain an interesting and clarifying fact
about the supremum and infimum.
Theorem 2.12. Let S ⊂ R be a nonempty set, bounded above. Then there exists a sequence
(xn) with xn ∈ S ∀n, monotone increasing, for which xn −−−→n→∞
supS.
Of course, the same applies to the infimum, except that the sequence will be monotone
decreasing.
Another interesting application (done in class) is constructing sequences which converge
monotonically to √a, for any a > 0. See Example 3.3.5 in the text.
Exercise 2.13. We return to Example 2.1 (c),

    x1 = 2 and x_{n+1} = (1/2)(xn + 2/xn), n = 1, 2, 3, . . . ,

to show lim_{n→∞} xn = √2.
(a) Use induction to show that xn^2 − 2 ≥ 0 for all n ∈ N.
(b) Use induction to show that (xn) is monotone decreasing in n.
(c) Use the Monotone Sequence Theorem to show convergence, xn → x, and identify x
as the solution of a polynomial equation.
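A quick numerical sketch of this exercise (the helper name is ours): iterate and watch the monotone decrease toward √2.

```python
def babylonian_sqrt2(steps):
    """Return x_1, ..., x_steps for x_1 = 2, x_{n+1} = (x_n + 2/x_n)/2."""
    xs = [2.0]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append(0.5 * (x + 2.0 / x))
    return xs

xs = babylonian_sqrt2(8)
# parts (a) and (b) numerically: x_n^2 >= 2 and x_n is decreasing
```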
About Induction. We hope that you’ve already seen mathematical induction somewhere
else before. But we remind you of it here, and in the spirit of healthy skepticism about all
things, (which we encourage in studying math,) we show that it isn’t magic.
Proposition 2.14. For each n ∈ N, let P (n) denote some statement involving the value of
n. If we can show both:
(1) P (1) is true; and
(2) For every n ∈ N, if P (n) is assumed to be true then P (n+ 1) is true,
then P (n) is true for all n ∈ N.
Proof. Let S = {n ∈ N | P(n) is false}. To derive a contradiction, we suppose that S is not
empty. By the Well Ordering Principle (Corollary 1.13), S has a minimal element. Call the
minimal element (k + 1); since P(1) is true by (1), 1 ∉ S, so k ∈ N. Then (k + 1) ∈ S ⊂ N,
but for any n ≤ k, n ∉ S. Therefore, P(k) is true. But, by (2), P(k) true implies P(k + 1) is
also true, and this contradicts (k + 1) ∈ S.
From the proof, we can see that the following version of induction is also verified:
If we can show both:
(1) P (1) is true; and
(2) For every k ∈ N, if P (n) is assumed to be true for all n ≤ k, then P (k + 1) is true,
then P (n) is true for all n ∈ N.
Sometimes this form of induction is needed (for example, for sequences defined by iteration
involving several previous terms.) This is discussed in section 1.2 of the textbook.
2.3 Divergent Sequences and Subsequences
Although we prefer sequences which converge, there are also many interesting things to learn
about divergent sequences. Sequences can diverge in various ways, some more interesting
than others.
The simplest kind of divergence is called proper divergence in the book.
Definition 2.15 (Properly divergent sequences). Let (xn)n∈N be a sequence in R.
(a) We say the sequence properly diverges to +∞ if:
∀a > 0 ∃ K ∈ N so that xn > a ∀n ≥ K.
We write lim_{n→∞} xn = +∞ or xn → +∞, even though the sequence does not converge.
(b) We say the sequence properly diverges to −∞ if:
∀b < 0 ∃ K ∈ N so that xn < b ∀n ≥ K.
We write lim_{n→∞} xn = −∞ or xn → −∞, even though the sequence does not converge.
So a sequence diverges to infinity if, for any fixed number a > 0, xn is eventually always
larger than a.
Example 2.16. Let xn = √n (3 + sin(n)), n ∈ N. Then xn → +∞ as n → ∞.

Choose any a > 0. Since sin(n) ≥ −1 for any n, we have

    xn ≥ √n (3 − 1) = 2√n > a

provided n > a^2/4. So, take K ∈ N with K > a^2/4. If n ≥ K > a^2/4, then by the above
calculation xn > a, and by definition xn → +∞.
Proposition 2.17. If (xn)n∈N is monotone increasing and not bounded above, then xn → +∞
(properly divergent).

If (xn)n∈N is monotone decreasing and not bounded below, then xn → −∞ (properly
divergent).
Proof. Take any a > 0. If (xn)n∈N is unbounded above, then in particular a is not an upper
bound for the sequence, so there exists K ∈ N so that a < xK . Since the sequence is
monotone increasing, xK ≤ xn for all n ≥ K, and therefore

    a < xK ≤ xn ∀n ≥ K.

By definition, xn → +∞.
The second statement is similar, and is left as an exercise.
Example 2.18. Let r > 1 and define the sequence xn = r^n, n ∈ N. Then x_{n+1} = r^{n+1} =
r·xn > xn ∀n ∈ N, so (xn)n∈N is (strictly) monotone increasing. In particular, notice that
xn > 1 ∀n.

We claim that the sequence is not bounded above, in which case we can apply Proposition 2.17
to conclude xn → +∞. We argue by contradiction, and suppose that (xn) is bounded above.
In that case, by the Monotone Sequence Theorem, xn = r^n → x for some x ∈ R. But we may
pass to the limit in the equation x_{n+1} = r·xn to obtain the equation x = r·x. Since r > 1,
we must have x = 0. But, by Theorem 2.4, x ≥ 1, a contradiction. Therefore, the sequence
(xn) is unbounded, and Proposition 2.17 applies.
If the sequence isn’t monotone the variety of divergent behavior is much greater.
Example 2.19. Define the sequence

    xn = n sin(nπ/2) = { 0,   if n = 2k, k ∈ N (even),
                       { n,   if n = 4k − 3, k ∈ N,
                       { −n,  if n = 4k − 1, k ∈ N,

    = (1, 0, −3, 0, 5, 0, −7, 0, . . . ).
The sequence is unbounded above and unbounded below. It must therefore diverge (The-
orem 2.4!) but it does not properly diverge to either ±∞. Instead, it breaks down into
pieces, each of which has different limiting behavior. We call those parts of the sequence
subsequences.
Definition 2.20. (a) Let X = (xn)n∈N be a sequence in R, and let (nk)k∈N be a strictly
increasing sequence of counting numbers: nk ∈ N and n1 < n2 < n3 < · · · . We call the
new sequence (x_{n_k})_{k∈N} a subsequence of the original sequence X.

(b) If (x_{n_k})_{k∈N} is a subsequence of X = (xn)n∈N which converges, and y = lim_{k→∞} x_{n_k}, we
call y a subsequential limit point of the sequence X.

[Often we will drop "subsequential" and simply call y a "limit point" of the sequence.
Other books use the term cluster point.]
So a subsequence is a sequence which is extracted from the original sequence. It must be
itself a sequence (an infinite ordered list) and it must take elements from the original sequence
in the same order as they originally appeared. You can think of making a subsequence by
eliminating undesirable elements from X, but still leaving an infinite number.
In the previous example, there are three natural choices of subsequences. First, the
even elements, indexed by nk = 2k, k ∈ N, are x_{2k} = 0. This subsequence is convergent,
lim_{k→∞} x_{2k} = 0, and so y = 0 is a subsequential limit point of the original sequence.
Another interesting subsequence is defined by the indices nk = 4k − 3, k ∈ N, with
x_{4k−3} = 4k − 3. This subsequence is properly divergent to +∞. The third distinct
subsequence is defined by the choice nk = 4k − 1, and so x_{4k−1} = −(4k − 1) → −∞ is also
a properly divergent subsequence. We do not call ±∞ subsequential limit points, as they
are not real numbers.
Exercise 2.21. Consider the following crazy looking sequence:

    (xn)n∈N = (1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 2/6, 3/6, 4/6, 5/6, . . . ).

[Notice that each xn ∈ (0, 1), and every rational number between 0 and 1 is somewhere on
the list!] What is the set of all possible subsequential limit points of (xn)?
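For experimenting with this exercise, the sequence is easy to generate (the helper name is ours): list the fractions p/q by increasing denominator q, then increasing numerator p.

```python
from fractions import Fraction

def rational_enumeration(count):
    """First `count` terms of the sequence 1/2, 1/3, 2/3, 1/4, 2/4, 3/4, ..."""
    xs = []
    q = 2
    while len(xs) < count:
        for p in range(1, q):
            xs.append(Fraction(p, q))
            if len(xs) == count:
                break
        q += 1
    return xs
```

Try extracting subsequences from `rational_enumeration(1000)` and see which limits you can reach.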
What if the sequence is actually convergent: do we learn anything more by looking at
subsequences? NO:
Theorem 2.22. If xn → x (is convergent), then every subsequence (x_{n_k})_{k∈N} converges (as
k → ∞) to x also.

This statement almost does not require proof. For, if xn → x, then by definition, ∀ε > 0
∃K ∈ N so that |x − xn| < ε ∀n ≥ K. The subsequence consists of elements of (xn), so
whenever k is large enough that nk ≥ K we have |x − x_{n_k}| < ε, and so the subsequence
(x_{n_k})_{k∈N} converges to x too.
The contrapositive of Theorem 2.22 is useful to determine divergence of sequences:
Corollary 2.23. If (xn)n∈N has two different subsequential limits (ie, it contains two subse-
quences which converge to distinct values,) then the sequence is divergent.
Example 2.24. The sequence xn = (−1)^n is divergent, since it has two distinct subsequential
limits: the odd subsequence x_{2k−1} → −1 as k → ∞, and the even subsequence x_{2k} → +1.
By Corollary 2.23 we conclude that the whole sequence must be divergent.
If a sequence diverges, must it always have (subsequential) limit points? A properly
divergent sequence has no limit points (why?) so we need to restrict ourselves a bit to get a
positive answer:
Theorem 2.25 (Bolzano-Weierstrass Theorem). Let (xn)n∈N be a bounded sequence in R.
Then (xn)n∈N contains a convergent subsequence.
More precisely, if xn ∈ [a, b] ∀n ∈ N, then there exists y ∈ [a, b] and a subsequence
(x_{n_k})_{k∈N} with lim_{k→∞} x_{n_k} = y.
There are several different proofs of this important theorem. We will use one based on
this interesting fact about any sequence of real numbers:
Lemma 2.26. Every sequence (xn)n∈N contains a monotone subsequence.
The subsequence we find might be monotone increasing or monotone decreasing; it de-
pends on the sequence (xn)n∈N.
Proof of Lemma 2.26. Given a sequence (xn) of real numbers, we define the set of its “peaks”,
P = {k ∈ N | xk ≥ xn ∀n ≥ k}.
The idea is that if we graph the sequence in the plane with points (n, xn) and connect the
dots, then we get a mountain range, and standing at the peak points our view to the right
is not blocked by the rest of the sequence. Notice that for a monotone decreasing sequence,
every n is a peak, while for a strictly increasing sequence, no n is a peak.
If P is an infinite set, then it is an infinite subset of N and can be written as a sequence
(nj)j∈N, that is, the set

    P = {n1 < n2 < n3 < · · · } = {nj | j ∈ N}.

Hence it defines a subsequence (x_{n_j})_{j∈N}, and each nj is a peak, so x_{n_j} ≥ x_{n_{j+1}} ∀j ∈ N:
it is a monotone decreasing subsequence.
If P is not an infinite set (ie, it has only finitely many elements,) then it has a largest
element; call it K. Then, n1 = K + 1 is not a peak, so there exists n2 > n1 for which
xn2 > xn1 . Since n2 > K it is not a peak either, so ∃n3 > n2 > n1 with xn3 > xn2 > xn1 .
Continuing in this way, we construct a strictly monotone increasing subsequence.
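The "peak" set is easy to compute for a finite prefix of a sequence; here is a sketch (the function name is ours). One caveat of the finite version: the last index is always vacuously a peak, unlike the infinite setting, where a strictly increasing sequence has no peaks at all.

```python
def peaks(xs):
    """0-based indices k of the finite list xs with xs[k] >= xs[m] for all m > k."""
    n = len(xs)
    return [k for k in range(n) if all(xs[k] >= xs[m] for m in range(k + 1, n))]

# The values at the peak indices always form a decreasing run,
# mirroring the "infinitely many peaks" case of the lemma.
```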
The proof of the Bolzano-Weierstrass Theorem is then very easy:
Proof of B-W. Assume (xn)n∈N is a bounded sequence, so there exists M ∈ R with |xn| ≤ M
∀n ∈ N. By Lemma 2.26 it contains a monotone subsequence (x_{n_k})_{k∈N} (either increasing
or decreasing). As the whole sequence is bounded, so is the subsequence, |x_{n_k}| ≤ M ∀k ∈ N.
Thus, by the Monotone Sequence Theorem the subsequence converges: ∃x ∈ R with
lim_{k→∞} x_{n_k} = x. If a ≤ xn ≤ b ∀n ∈ N, then by Theorem 2.4 we must have a ≤ x ≤ b also.
Finally, for a divergent sequence (xn) we distinguish the two most important subsequen-
tial limits:
Definition 2.27. If X = (xn) is a bounded sequence, let S be the set of all subsequential
limits of X. The limit superior of X is the supremum of this set, and the limit inferior is
its infimum,
    lim sup_{n→∞} xn = sup S,    lim inf_{n→∞} xn = inf S.

If S is unbounded above, we define lim sup_{n→∞} xn = +∞, and if S is unbounded below, we
define lim inf_{n→∞} xn = −∞.
Example 2.28. a. (xn) = ( 2n cos(nπ) / (2n − 1) ) = (−2, +4/3, −6/5, +8/7, −10/9, . . . ).

The odd subsequence x_{2k−1} → −1, which is a subsequential limit point. The even subsequence
also converges, x_{2k} → +1, another limit point. In fact, these are the only two, so S =
{−1, +1} is the set of all limit points. Therefore, lim sup_{n→∞} xn = +1, and lim inf_{n→∞} xn =
−1. Notice that inf{xn | n ∈ N} = −2, which is not the same as the liminf!
b. (xn)n∈N = (2^{n(−1)^n}) = (1/2, 4, 1/8, 16, 1/32, 64, . . . ).

Verify yourself that there is an unbounded subsequence, and a subsequence converging to 0,
so lim sup_{n→∞} xn = +∞ while lim inf_{n→∞} xn = 0.
Proposition 2.29. Let (xn)n∈N be a bounded sequence. Then, lim infn→∞ xn ≤ lim supn→∞ xn,
and they are equal if and only if (xn)n∈N converges.
Proof. The infimum of any set is smaller than or equal to the supremum, so lim inf_{n→∞} xn ≤ lim sup_{n→∞} xn.
If (xn)n∈N is convergent to x, then x is its only limit point (remember, all subsequences
of a convergent sequence converge to the same limit,) and so S = {x} has supS = x = inf S.
To prove the converse, suppose lim infn→∞ xn = lim supn→∞ xn = L but (xn)n∈N diverges.
Since L is not the limit of (xn)n∈N, we have:
∃ε > 0 so that ∀N ∈ N there exists n ≥ N with |xn − L| ≥ ε.
Let’s use this to construct a subsequence! First, take N = 1: there exists n1 ≥ 1 with
|xn1 − L| ≥ ε. Next, take N = n1 + 1: there exists n2 ≥ n1 + 1 with |xn2 − L| ≥ ε. Then,
take N = n2 + 1: there exists n3 ≥ n2 + 1 with |xn3 − L| ≥ ε. Continue like this forever.
You get indices n1 < n2 < n3 < · · · and a subsequence (xnk)k∈N with |xnk
− L| > ε, that is,
the subsequence does not converge to L.
Since the original sequence was bounded, so is the subsequence. But the Bolzano-
Weierstrass Theorem says that any bounded sequences contains a convergent subsequence,
so (xnk)k∈N has a further subsequence which must converge, to a limit point y =6= L. But
we are assuming that there is only one limit point, and so this is a contradiction.
There are formulas for liminf and limsup (which are in the book, but I will not prove
them here):
Lemma 2.30. For any sequence (xn)n∈N (bounded or not),

    lim sup_{n→∞} xn = inf_{k∈N} ( sup_{n≥k} xn ) = lim_{k→∞} ( sup_{n≥k} xn ),

    lim inf_{n→∞} xn = sup_{k∈N} ( inf_{n≥k} xn ) = lim_{k→∞} ( inf_{n≥k} xn ).
2.4 Cauchy Sequences
Cauchy introduced a concept of convergence which looks like the usual definition, except
that it makes no mention of the limit value.
Definition 2.31. (xn)n∈N is a Cauchy sequence if: ∀ε > 0, ∃H ∈ N for which
|xn − xm| < ε ∀n,m ≥ H.
A sequence is Cauchy if eventually (n ≥ H) every element is arbitrarily close to every
other element (m ≥ H also.) Notice that n,m are treated symmetrically in this definition,
but often it will be convenient to indicate which one is the larger one, so we are permitted
to choose one to be the larger, ie,
|xn − xm| < ε ∀n > m ≥ H.
The definition of Cauchy sequences looks very much like that of limits, and indeed for
sequences in R the two are in fact equivalent.
Theorem 2.32. For any real number sequence, (xn)n∈N is a Cauchy sequence if and only if
(xn)n∈N is convergent.
This theorem is yet another equivalent statement of the Continuum Property of the
reals. That is, each of the following statements is logically equivalent, in the sense that if
you assume any one of them to be true we can prove the others are true:
(a) Every set S ⊂ R which is bounded above has a supremum u = supS ∈ R.
(b) Every sequence (xn)n∈N which is monotone and bounded converges to some x ∈ R.
(c) Every Cauchy sequence (xn)n∈N converges to some x ∈ R.
Each of these is an existence statement, and each is about “filling in the holes”, the idea
that R is a continuum.
Proof of Theorem 2.32. We break the argument down into steps.
Step 1: If (xn)n∈N is a Cauchy sequence then (xn)n∈N is bounded.
This is done exactly as in the proof of Theorem 2.4(b) (the fact that a convergent sequence
must be bounded). Again, take ε = 1 in the definition of Cauchy sequence, and get K ∈ N
for which |xn − xK| < ε = 1, ∀n ≥ K. Now, finish the proof as above, letting xK play the
role of the limit in the proof of Theorem 2.4(b). [The details are left as an exercise.]
Step 2: By Bolzano-Weierstrass, (xn)n∈N contains a convergent subsequence (x_{n_k})_{k∈N}.
Thus, ∃x ∈ R with lim_{k→∞} x_{n_k} = x.

Step 3: If (xn)n∈N is a Cauchy sequence which contains a convergent subsequence, then
the whole sequence (xn)n∈N converges.

Let ε > 0 be any given value. By the Cauchy condition, ∃N ∈ N with |xm − xj| < ε/2 for
all m, j ≥ N. The subsequence converges, so ∃K1 ∈ N with |x_{n_k} − x| < ε/2 for all k ≥ K1.
Finally, since nk → ∞ as k → ∞, we may choose K2 ∈ N with nk ≥ N whenever k ≥ K2.
Take K = max{K1, K2}. For every m ≥ N and k ≥ K,

    |xm − x| ≤ |xm − x_{n_k}| + |x_{n_k} − x| < ε,

and so xm → x.
and so xm → x.
Where did we use the Completeness Axiom? It’s in Step 2: the Bolzano-Weierstrass
Theorem uses the Monotone Sequence Theorem, which is equivalent to the Completeness
Axiom!
3 Series
You may want to think of a series of real numbers, ∑_{n=1}^∞ an, as an "infinite sum", but that
notion is too vague and may lead to confusion and error. We define an infinite series as a
limit of finite sums, and so it is just another example of a sequence!

Definition 3.1. Given a sequence (an)n∈N in R, we define the series ∑_{n=1}^∞ an as follows: for
each k ∈ N, define the kth partial sum

    sk = ∑_{n=1}^{k} an.
We say the series ∑_{n=1}^∞ an converges if the sequence (sk)k∈N is convergent, and the series
diverges if the sequence (sk)k∈N is divergent.
Since each partial sum is a finite sum of real numbers, each sk is very clearly defined; it
is the sum of the first k terms, the running tally as you sum the series one term at a time.
The series is the limiting value, if such a limit exists.
Remark 3.2. From the definition we note that the convergence or divergence of the series
doesn’t depend on which value of n the sum starts with; the series∑∞
n=1 an,∞∑n=0
an,∞∑n=3
an,
and∞∑
n=2018
an are either all convergent or all divergent. All that matters is the “tail” of the
infinite sequence (sn)n∈N of partial sums.
Example: Geometric Series ∑_{n=0}^∞ r^n, where r ∈ R is constant.
This is the most important series, and it illustrates divergence and convergence very well.
This is one of the only series where we have an explicit formula for the partial sums:

    sk = ∑_{n=0}^{k} r^n = (1 − r^{k+1})/(1 − r), k = 0, 1, 2, . . . (for r ≠ 1).

If you want to start the series with n = 1, then each term has a common factor of r, so
∑_{n=1}^∞ r^n = r ∑_{n=0}^∞ r^n, and

    ∑_{n=1}^{k} r^n = r ∑_{n=0}^{k−1} r^n = r (1 − r^k)/(1 − r) = (r − r^{k+1})/(1 − r).
If |r| < 1, then r^{k+1} → 0 as k → ∞, and so the partial sums converge and the series is
convergent,

    ∑_{n=0}^∞ r^n = lim_{k→∞} sk = lim_{k→∞} (1 − r^{k+1})/(1 − r) = 1/(1 − r),

and the limit value is explicitly known.
If r > 1, r^n → ∞ (properly divergent), and since sk > r^k the partial sums properly
diverge to infinity; the series is divergent. If r = 1, then r^k = 1 and sk = k + 1 (the number
of terms), and the series is also properly divergent to infinity.

When r < −1, r^{k+1} is unbounded and diverges, and so also does sk. (But not properly!
|r^{k+1}| → ∞ as k → ∞, but the values oscillate in sign.) A more interesting case is r = −1, for which
sk alternates between 1 (k even) and 0 (k odd). So a series can diverge even if the partial
sums remain bounded; there are divergent series which are not properly divergent to ±∞.
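The partial-sum formula above is easy to check numerically (the helper name is ours):

```python
def geometric_partial_sum(r, k):
    """s_k = sum_{n=0}^{k} r^n, computed term by term."""
    return sum(r**n for n in range(k + 1))
```

For |r| < 1 the partial sums approach 1/(1 − r); for r = 1 they equal k + 1.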
For most convergent series we don't know the value to which the series converges. This is
where Cauchy sequences come in handy: a series converges if and only if its partial sums
form a Cauchy sequence!
Theorem 3.3. The series ∑_{n=1}^∞ an converges if and only if: ∀ε > 0, ∃H ∈ N so that

    | ∑_{n=m+1}^{p} an | < ε, ∀p > m ≥ H. (3.1)
Proof. This is not difficult to prove, since the left-hand side of (3.1) can be rewritten in
terms of the partial sums,

    |sp − sm| = | ∑_{n=m+1}^{p} an |.

So the condition (3.1) is exactly the statement that the partial sums (sk)k∈N form a Cauchy
sequence in R. By Theorem 2.32 they converge if and only if (3.1) holds.
An immediate consequence comes from looking at a special case of (3.1), when m + 1 = p
and there is only one term in the sum:

    |ap| = | ∑_{n=p}^{p} an | < ε, ∀p > H.
We may conclude (after changing notation from ap to the more usual an,) that:
Corollary 3.4. If ∑_{n=1}^∞ an is a convergent series, then lim_{n→∞} an = 0.
Thus, for a series to converge, a necessary condition is that the terms an → 0. Stewart
calls this a “test for divergence”, since it is most used in the contrapositive: if an does not
tend to zero as n→∞, then the series must diverge.
This condition is, however, not sufficient for convergence of the series; not only must
the general term an → 0, but it must tend to zero sufficiently rapidly that the partial
sums settle at a limiting value. The boundary between convergent and divergent behavior
is subtle, as we will see via examples later on.
Nonnegative Series
An important special case of series are series with nonnegative terms, an ≥ 0 ∀n ∈ N.
For nonnegative series, the partial sums are monotone increasing,
    s_{k+1} = ∑_{n=1}^{k+1} an = sk + a_{k+1} ≥ sk, ∀k ∈ N.
Theorem 3.5. If ∑_{n=1}^∞ an is a series with nonnegative terms, an ≥ 0 ∀n ∈ N, then the
series either converges or properly diverges to +∞.
In particular, the series converges if and only if the partial sums form a bounded sequence.
Note that this is not true for series whose terms change sign: for example, ∑_{n=1}^∞ (−1)^n has
bounded partial sums even though the series diverges.
Remark 3.6. As noted in Remark 3.2, it is enough that the series has nonnegative terms
eventually always, that is, ∃N ∈ N for which an ≥ 0 ∀n ≥ N . In that case, the partial sums
will be monotone increasing from n = N on, ie, sN ≤ sN+1 ≤ sN+2 ≤ · · · , the sequence
(sN+k)k=0,1,2,... is monotone increasing, and hence either convergent or properly divergent to
+∞.
Example 3.7 (The Harmonic Series). ∑_{n=1}^∞ 1/n diverges. To get an idea of what's happening,
write out the first several terms, and group them in this funny way:
    ∑_{n=1}^∞ 1/n = 1 + 1/2 + [1/3 + 1/4] + [1/5 + 1/6 + 1/7 + 1/8] + [1/9 + · · · + 1/16] + · · ·
                  ≥ 1 + 1/2 + 2·(1/4) + 4·(1/8) + 8·(1/16) + · · ·
                  ≥ 1 + 1/2 + 1/2 + 1/2 + 1/2 + · · ·
Basically, we group the terms a_{2^{m−1}+1} + · · · + a_{2^m} together. There are 2^{m−1} terms in each
group, and each term in the group is ≥ 2^{−m}, so each group adds at least 1/2 to the sum. To
make this official, write it in terms of partial sums with k = 2^m:

    s_{2^{m+1}} = s_{2^m} + [ 1/(2^m + 1) + 1/(2^m + 2) + · · · + 1/2^{m+1} ] ≥ s_{2^m} + 1/2,

since the bracket contains 2^m terms, each ≥ 2^{−(m+1)}.
We may then conclude (induction!) that s_{2^m} ≥ 1 + m/2 for each m ∈ N, and so the sequence
of partial sums is unbounded, and thus divergent. (Remember, it is monotone increasing as
the series has nonnegative terms!)
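The lower bound s_{2^m} ≥ 1 + m/2 is easy to verify numerically for small m (the helper name is ours):

```python
def harmonic_partial_sum(k):
    """s_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / n for n in range(1, k + 1))

# s_{2^m} grows without bound, but only logarithmically in 2^m
```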
Example 3.8 (The p-series). The series ∑_{n=1}^∞ 1/n^p converges for p > 1. We use essentially the
same idea as for the harmonic series, but now we want to show that the partial sums are
bounded above, so we group the terms a little differently:
    ∑_{n=1}^∞ 1/n^p = 1 + [1/2^p + 1/3^p] + [1/4^p + 1/5^p + 1/6^p + 1/7^p] + [1/8^p + · · · + 1/15^p] + · · ·
                    ≤ 1 + 2·(1/2^p) + 4·(1/4^p) + 8·(1/8^p) + · · ·
                    = 1 + 1/2^{p−1} + 1/4^{p−1} + 1/8^{p−1} + · · ·
                    = 1 + 1/2^{p−1} + (1/2^{p−1})^2 + (1/2^{p−1})^3 + · · ·
                    = ∑_{m=0}^∞ (1/2^{p−1})^m,
so the p-series is bounded above by a geometric series, with r = 2^{−(p−1)}. Since p > 1,
0 < 2^{−(p−1)} < 1, the geometric series converges, and hence so does the p-series for p > 1.
With a few examples as above we can test convergence of series with nonnegative terms
via the Comparison Tests:
Theorem 3.9 (Comparison Tests). Let ∑_{n=1}^∞ an and ∑_{n=1}^∞ bn be series with nonnegative terms, for
which 0 ≤ an ≤ bn ∀n ∈ N.

(a) If ∑_{n=1}^∞ bn converges, then ∑_{n=1}^∞ an converges also.

(b) If ∑_{n=1}^∞ an diverges, then ∑_{n=1}^∞ bn diverges also.
Remark 3.10. As in Remark 3.2, it is not really necessary for 0 ≤ an ≤ bn ∀n ∈ N; it
suffices that it be true eventually always, that is: there exists N ∈ N so that 0 ≤ an ≤ bn
∀n ≥ N.
Example 3.11. ∑_{n=1}^∞ 1/n^p with 0 < p < 1 diverges (properly to +∞).

Let an = 1/n. Since n^p ≤ n for 0 < p < 1 and n ∈ N, we have an = 1/n ≤ 1/n^p. Since the harmonic
series ∑_{n=1}^∞ 1/n diverges, by the Comparison Test we conclude that the p-series ∑_{n=1}^∞ 1/n^p
diverges for 0 < p < 1 also.
Example 3.12. ∑_{n=1}^∞ √(n+3)/(n^3+2n) converges. Since n ≥ 1, we obtain a term bn with the same
order of magnitude as an by the following estimate:

    √(n+3)/(n^3+2n) ≤ √(n+3n)/n^3 = 2√n/n^3 = 2/n^{5/2} = bn.

Since ∑_{n=1}^∞ bn converges (p-series, p = 5/2 > 1), by the Comparison Test the original series
converges also.
Exercise 3.13. (a) Assume an, bn ≥ 0 ∀n ∈ N, ∑_{n=1}^∞ an converges, and (bn)n∈N is bounded.
Show that ∑_{n=1}^∞ an bn converges.

(b) Assume an, bn ≥ 0 ∀n ∈ N, ∑_{n=1}^∞ an diverges, and ∃c > 0 for which bn ≥ c ∀n. Show
that ∑_{n=1}^∞ an bn diverges.

(c) Assume an ≥ 0 ∀n ∈ N and ∑_{n=1}^∞ an converges. Show that ∑_{n=1}^∞ an^2 converges. [Hint: use
(a).]
Absolute and Conditional Convergence
When the terms of the series may change sign, the variety of behavior is greater. This
isn't surprising: the partial sums are no longer monotone, so it mirrors the situation we had
for nonmonotone sequences, which could diverge by oscillation without properly diverging
to ±∞.
Let’s look at the simple example:∞∑n=1
(−1)n
n. This is an alternating series, an =
(−1)nbn with bn ≥ 0 ∀n ∈ N. In this case bn = 1n, but what we do for this example will be
the same for any alternating series with the additional property that (bn)n∈N is monotone
decreasing. As in the previous examples, we get interesting information by grouping the
terms in two different ways. First, group the terms in pairs with the even terms in front:

    ∑_{n=1}^∞ (−1)^n/n = −1 + [1/2 − 1/3] + [1/4 − 1/5] + [1/6 − 1/7] + · · · ,

where each bracketed pair is ≥ 0.
In other words, since the series terms alternate in sign with (−1)^n, and the absolute values
decrease in magnitude, the terms

    a_{2j} + a_{2j+1} = b_{2j} − b_{2j+1} ≥ 0, ∀j = 1, 2, 3, . . .
Thus, the odd partial sums s_{2j+1} satisfy

    s_{2j+1} = s_{2j−1} + a_{2j} + a_{2j+1} ≥ s_{2j−1},

and therefore the subsequence of odd partial sums is monotone increasing, s1 ≤ s3 ≤ s5 ≤
· · · ≤ s_{2j+1} ≤ · · ·
Grouping the terms in pairs, but starting with the odd terms,

    ∑_{n=1}^∞ (−1)^n/n = [−1 + 1/2] + [−1/3 + 1/4] + [−1/5 + 1/6] + · · · ,

where each bracketed pair is ≤ 0.
With this grouping,

    a_{2j−1} + a_{2j} = −b_{2j−1} + b_{2j} ≤ 0, ∀j = 1, 2, 3, . . . ,

and the even partial sums are monotone decreasing, s2 ≥ s4 ≥ s6 ≥ · · · ≥ s_{2j} ≥ · · · .
Since a_{2j} = b_{2j} ≥ 0, by the monotonicity above we have

    s1 ≤ s_{2j−1} ≤ s_{2j−1} + b_{2j} = s_{2j} ≤ s2.
Therefore, s2 is an upper bound for the monotone increasing sequence of odd partial sums
(s_{2j−1})_{j∈N}, and by the Monotone Sequence Theorem it converges: ∃ sodd with lim_{j→∞} s_{2j−1} =
sodd. Similarly, s1 is a lower bound for the monotone decreasing sequence of even partial
sums (s_{2j})_{j∈N}, and hence ∃ seven with lim_{j→∞} s_{2j} = seven. But s_{2j} = s_{2j−1} + b_{2j}, and

    lim_{j→∞} b_{2j} = lim_{j→∞} 1/(2j) = 0,

so

    seven = lim_{j→∞} s_{2j} = lim_{j→∞} (s_{2j−1} + b_{2j}) = sodd.

Therefore, both the odd and the even subsequences converge to the same limit s = sodd =
seven, and so s = lim_{k→∞} sk and the alternating series converges.
As you may have noticed, we really didn’t use the exact form of the series, only the facts
that it alternated, with absolute value of the terms monotonically decreasing to zero. We
have thus proven the following general convergence theorem for alternating series:
Theorem 3.14 (Alternating Series Theorem). Let ∑_{n=1}^∞ an be an alternating series, an =
(−1)^n bn ∀n ∈ N, with: bn ≥ 0 ∀n ∈ N; bn ≥ b_{n+1} ∀n ∈ N; and bn → 0 as n → ∞. Then ∑_{n=1}^∞ an
converges.
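Numerically, the odd/even partial-sum monotonicity from the argument above is easy to see for ∑(−1)^n/n (the helper name is ours; the limit happens to be −ln 2, though the proof does not need that fact):

```python
def alternating_partial_sums(count):
    """Partial sums s_1, ..., s_count of the series sum_{n>=1} (-1)^n / n."""
    sums, s = [], 0.0
    for n in range(1, count + 1):
        s += (-1)**n / n
        sums.append(s)
    return sums

s = alternating_partial_sums(1000)
odd, even = s[0::2], s[1::2]   # (s_1, s_3, ...) increasing, (s_2, s_4, ...) decreasing
```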
We want to distinguish series which converge because of cancellation between positive
and negative terms, and those which converge because the magnitude of the terms tends to
zero rapidly enough.
Definition 3.15. We say the series ∑_{n=1}^∞ an converges absolutely if the series of its absolute
values ∑_{n=1}^∞ |an| is convergent. If ∑_{n=1}^∞ an is convergent but ∑_{n=1}^∞ |an| diverges, we say the
original series ∑_{n=1}^∞ an converges conditionally.
Example 3.16. The alternating series ∑_{n=1}^∞ (−1)^n/n converges conditionally. More generally,
the alternating p-series ∑_{n=1}^∞ (−1)^n/n^p converges for any p > 0! It converges absolutely for all
p > 1, and conditionally if 0 < p ≤ 1.
One thing we should check: if a series converges absolutely, is it convergent (in the sense
of the original definition)? Fortunately, the answer is yes, and absolute convergence is a
stronger condition than simple convergence (as in Definition 3.1).
Theorem 3.17. If the series ∑_{n=1}^∞ an is absolutely convergent, then it is convergent.
Proof. This is another exercise in the Cauchy criterion for convergence. Since ∑_{n=1}^∞ an is
absolutely convergent, its partial sums form a Cauchy sequence, so by Theorem 3.3, ∀ε > 0
∃H ∈ N so that

    | ∑_{n=m+1}^{p} |an| | = ∑_{n=m+1}^{p} |an| < ε, ∀p > m ≥ H.

Now we apply the triangle inequality repeatedly (use induction!) to verify that

    | ∑_{n=m+1}^{p} an | ≤ ∑_{n=m+1}^{p} |an| < ε, ∀p > m ≥ H.

By Theorem 3.3, ∑_{n=1}^∞ an is convergent.
Exercise 3.18. Find a convergent series ∑_{n=1}^∞ xn for which the series ∑_{n=1}^∞ xn^2 diverges. [The
original series must be conditionally convergent (why?)]
This part was not covered in class, but maybe we’ll come back to it later. . .
The tests for convergence of series which you learned from Stewart may all be obtained
by using the Cauchy criterion or the Comparison tests. As an example, we prove the Root
Test.
Theorem 3.19 (Root Test). Suppose lim_{n→∞} |an|^{1/n} = L exists. Then ∑_{n=1}^∞ an converges
absolutely if L < 1 and diverges if L > 1.
As you already know, the case L = 1 is indeterminate, since any of the p-series an = 1/n^p
gives L = 1 regardless of the value of p. (Check it!)
Proof. First assume L < 1. Choose ε > 0 so that r = L + ε < 1. Then, by the existence of
the limit, there exists K ∈ N with

    |an|^{1/n} < L + ε = r, ∀n ≥ K.

In other words, |an| < r^n holds ∀n ≥ K. By the Comparison Test, since r < 1 and the
Geometric Series ∑_{n=K}^∞ r^n converges, the series ∑_{n=1}^∞ an converges absolutely.
Now assume L > 1. Choose ε > 0 so that L − ε ≥ 1. Then, by the existence of
the limit, there exists K ∈ N so that

    |an|^{1/n} > L − ε ≥ 1, ∀n ≥ K.

So |an| > 1 ∀n ≥ K, and by the necessary condition for convergence, Corollary 3.4, the series
∑_{n=1}^∞ an diverges.
Remark 3.20. Notice that we really didn't need the limit to exist in the Root Test; what
we really needed for convergence was: ∃r < 1 and ∃K ∈ N for which |an|^{1/n} ≤ r ∀n ≥ K.
Then the Root Test is just the Comparison Test (to the Geometric Series) in disguise. This
weaker condition would be satisfied if we knew that every subsequential limit point of |an|^{1/n}
was strictly smaller than one. In other words, for convergence we only need lim sup_{n→∞} |an|^{1/n} < 1.

Similarly, for divergence we really only need to set up the necessary condition by showing
that |an|^{1/n} ≥ 1 infinitely often in n, which holds whenever there is a subsequential limit
point of |an|^{1/n} strictly larger than one, that is, whenever lim sup_{n→∞} |an|^{1/n} > 1. So a sharper
version of the Root Test is stated in terms of limsup rather than limit:

    ∑_{n=1}^∞ an converges absolutely if lim sup_{n→∞} |an|^{1/n} < 1 and diverges if lim sup_{n→∞} |an|^{1/n} > 1.
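As a small numerical illustration of the Root Test (the helper name is ours): for an = n^2/2^n, the n-th roots approach L = 1/2 < 1, so the series converges absolutely.

```python
def nth_roots(a, count):
    """|a_n|^(1/n) for n = 1, ..., count, for a given term function a(n)."""
    return [abs(a(n)) ** (1.0 / n) for n in range(1, count + 1)]

roots = nth_roots(lambda n: n**2 / 2.0**n, 200)
# the values drift toward the limit L = 1/2, since n^(2/n) -> 1
```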
4 Cardinality
We have been careful to distinguish sets in R from sequences of real numbers.
• A set S ⊂ R is any collection of real numbers. It has no order, and there is no point
in repeating elements when describing S. It could contain finitely or infinitely many
elements.
• A sequence (xn)n∈N is an ordered, infinite list of real numbers, indexed by n ∈ N.
Question: when can a set S be written as a sequence,
S = {xn | n ∈ N}??
• This question really is about counting, putting the elements of S in order.
• Children use their fingers to count. Mathematicians use N = {1, 2, 3, . . . }.
We start with finite sets. Clearly, a set S ⊂ R with only finitely many elements can not
be represented as a sequence. But we can understand what it means to count the elements
of a finite set.
Definition 4.1. For each n ∈ N, define the subsets Jn = {1, 2, 3, . . . , (n− 1), n} ⊂ N. A set
S is finite if it is in one-to-one correspondence with one of the sets Jn. A set which is not
finite is called infinite.
• So Jn is like having n fingers, and we assign to each element of S exactly one “finger”
in Jn.
• Be careful not to confuse “finite” and “bounded”. The set S = [0, 1] is bounded, but
infinite.
• A one-to-one correspondence establishes an equivalence between sets; two sets which
are both in 1-1 correspondence with Jn are equivalent to each other in the sense of
counting.
• If two sets S, T are in one-to-one correspondence, we say they have the same cardi-
nality.
Now let’s consider infinite sets.
Definition 4.2. We say the set S is countably infinite (or denumerable) if S is in one-
to-one correspondence with N.
In other words, S is countably infinite if its elements can be indexed by n ∈ N. In yet
other words, a countably infinite set can be written as a sequence!
Example 1: the counting numbers, S1 = N is a countably infinite set.
Example 2: the even counting numbers, S2 = {2, 4, 6, 8 . . . } = {2k | k ∈ N} is countably
infinite.
• Here x_k = 2k, k ∈ N, gives the 1-1 correspondence from N to S2.
• Notice that S2 ⊂ S1, and {xk}k∈N is a subsequence of the sequence N = (1, 2, 3, . . . ).
The example can be generalized:
Theorem 4.3. If S1 is countably infinite and S2 ⊂ S1, then S2 is either finite or countably
infinite.
Proof. • Suppose S2 is an infinite set.
• Since S1 is countably infinite, there is a sequence (xn)n∈N which lists all of the elements,
S1 = {xn |n ∈ N}.
• Let n1 be the smallest index for which xn1 ∈ S2, n2 > n1 the next index for which
xn2 ∈ S2, etc.
• Then the subsequence (x_{n_k})_{k∈N} is an enumeration of the elements of S2. ♦
Now let’s think about some operations on sets. For example, the union of the sets A and
B, A ∩ B is the set whose elements x satisfy x ∈ A or x ∈ B. Here is an example which is
special, but in the end completely typical:
Example: S = Z = N ∪ {0, −1, −2, −3, . . . }, a union of countably infinite sets, is countable.
• Careful! If we count all the positive integers first, we’ll never get to the negative ones!
• Instead, we alternate between the two sets:
Z = {0, 1,−1, 2,−2, 3,−3, . . . }
• We could write down an exact formula for the 1-1 correspondence, x_k = f(k),

f(k) =
  −k/2,      if k is even,
  (k − 1)/2, if k is odd,

and verify that it's injective and surjective, but usually we just indicate how to count
the elements by writing the sequence!
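As a check on the formula, here is a short Python sketch (not in the notes; the variable names are mine) verifying that f is injective and that the first 2n + 1 values hit exactly the integers from −n to n:

```python
# f(k) from the notes: k even -> -k/2, k odd -> (k-1)/2.
def f(k):
    return -k // 2 if k % 2 == 0 else (k - 1) // 2

n = 100
values = [f(k) for k in range(1, 2 * n + 2)]     # k ranges over J_{2n+1}
assert len(set(values)) == len(values)           # injective
assert set(values) == set(range(-n, n + 1))      # onto {-n, ..., n}
print("first few values:", values[:7])           # 0, -1, 1, -2, 2, -3, 3
```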
Now let’s look at any two countably infinite sets, S1 and S2. We write each one as a
sequence,
S1 = x1, x2, x3, . . . and S2 = {y1, y2, y3, . . . },
and then the union may be written as a sequence too,
S1 ∪ S2 = {x1, y1, x2, y2, x3, y3, . . . }.
Theorem 4.4. If S1, S2 are countably infinite sets, then the union S1 ∪S2 is also countably
infinite.
Can we find an infinite set which isn’t countable? Suppose we look at an infinite number
of sets, and take their union.
Theorem 4.5. Suppose we have a countably infinite collection of sets, S1, S2, S3, . . . , and
each of the sets Si, i ∈ N is countably infinite, so
S_i = {a_{i,1}, a_{i,2}, a_{i,3}, . . . , a_{i,j}, . . . }, ∀i ∈ N.

Then their union, S = ⋃_{i=1}^∞ S_i, is also countably infinite.
By the infinite union we mean that S contains all elements ai,j with i, j ∈ N.
Proof. List each set S_i as the ith row in an array,

S1 = {a_{1,1}, a_{1,2}, a_{1,3}, a_{1,4}, . . . }
S2 = {a_{2,1}, a_{2,2}, a_{2,3}, a_{2,4}, . . . }
S3 = {a_{3,1}, a_{3,2}, a_{3,3}, a_{3,4}, . . . }
S4 = {a_{4,1}, a_{4,2}, a_{4,3}, a_{4,4}, . . . }
. . .

and count them on diagonals:
Thus,
S = {a1,1, a1,2, a2,1, a3,1, a2,2, a1,3, a1,4, a2,3, . . . },
writes the union S as a sequence, and so S is countably infinite.
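The diagonal count can be sketched in Python (an illustration, not part of the notes; `diagonal_pairs` is a made-up helper name). It reproduces the order (1,1), (1,2), (2,1), (3,1), (2,2), (1,3), . . . used above:

```python
def diagonal_pairs(count):
    """List the first `count` index pairs (i, j), walking the
    anti-diagonals i + j = d and snaking back and forth."""
    pairs, d = [], 2
    while len(pairs) < count:
        diag = [(i, d - i) for i in range(1, d)]
        if d % 2 == 0:      # alternate direction so the path "snakes"
            diag.reverse()
        pairs.extend(diag)
        d += 1
    return pairs[:count]

print(diagonal_pairs(6))  # [(1, 1), (1, 2), (2, 1), (3, 1), (2, 2), (1, 3)]
```

Every pair (i, j) appears exactly once, which is precisely the 1-1 correspondence with N the proof needs.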
Let’s see how that works for an important example, the set of all rational numbers!
S = {q ∈ Q, q > 0} =∞⋃i=1
{i
j| j ∈ N
}
Now you may start to think that all infinite sets are countable. This is not the
case: there are different “sizes” of infinite sets, and some infinite sets are too large to be
counted!
Figure 1: Counting a countable union of countable sets.
Figure 2: Counting the rational numbers, Q.
Theorem 4.6. The real number set R is uncountable. That is, R is not in one-to-one
correspondence with N.
So R is much larger than N or Z or Q, each of which is of the same cardinality.
There are two famous arguments to prove this, both due to Georg Cantor. Here is one
which uses an old friend, the monotone sequence theorem.
Proof. We argue by contradiction: suppose that R = {x1, x2, x3, . . . } is a complete enumer-
ation of the real numbers.
• First take x1, and find any closed interval I1 = [a1, b1] with x1 ∉ I1.

• Next, take x2, and find a closed interval I2 = [a2, b2] ⊆ I1 so that x2 ∉ I2. Since
I2 ⊆ I1, also x1, x2 ∉ I2.
• We continue in this way, constructing nested closed intervals,
Ik = [ak, bk] ⊆ Ik−1 ⊆ · · · ⊆ I2 ⊆ I1,
with the property that x1, x2, . . . , xk ∉ Ik.
• Look at the endpoints of these intervals,
a1 ≤ a2 ≤ a3 ≤ · · · ≤ ak ≤ · · · bk ≤ · · · ≤ b3 ≤ b2 ≤ b1
• The (a_k) are monotone increasing, and bounded above by b1; the (b_k) are monotone
decreasing, and bounded below by a1.

• By the Monotone Sequence Theorem, both sequences converge,

lim_{k→∞} a_k = a ≤ b = lim_{k→∞} b_k,

and the interval [a, b] ⊆ [a_k, b_k] = I_k ∀k.

• Therefore, x_k ∉ [a, b] for any k, so there exist real numbers which are not on the
list (x_k)_{k∈N}, a contradiction.

Even though Q is a dense set in R, it is much smaller: Q is countably infinite,
but R is uncountably infinite!
5 Limits and Continuity
We now consider functions f : A ⊆ R→ R, defined on a domain set A ⊆ R. We will review
the basic concept of limits in this context (familiar to you from calculus) and connect it back
to the limit concept for sequences studied above.
5.1 Limits of Functions
To study the limit of f(x) as x → c, we need to know that we can approach the number c
through points x ∈ A, the domain of f.
Example 5.1. Consider the function
f(x) =
  √x, if x ≥ 0,
  −7, if x = −1.
The domain of the function is the set A = {−1}∪ [0,∞). The point c = −1 is in the domain,
but it is an isolated point: it has no neighboring points which are also in the domain, and so
we can’t “approach” c = −1 as a limit of points in the domain A. On the other hand, any
point in the other part of the domain, [0,∞) has neighbors which are also in the domain of
f .
For any δ > 0, call the set Vδ(c) = (c − δ, c + δ) the δ-neighborhood of c. It is an open
interval of width 2δ centered at x = c on the real line. In the previous example, when c = −1
and δ < 1, the δ-neighborhoods Vδ(−1) do not contain any other points from the set A other
than −1 itself. That is what we mean when we say c = −1 is an isolated point in A.
Definition 5.2. Let A ⊂ R. A point c ∈ R is called a limit point (often called cluster point)
of the set A if for every δ > 0, the δ-neighborhood (c − δ, c + δ) contains a point x ∈ A, with
x ≠ c.
If A is an interval, such as [a, b], (a, b), (a, b], (a,∞), [−∞, b], etc., then any point in the
interval or any endpoint is a limit point of the interval. This will typically be the case with
the examples we do, and so we won’t worry so much about limit points in most examples.
CAUTION: a point c ∈ R can be a limit point of a set A without being an element of A.
For example, if A = (0, 1), the endpoints c = 0 and c = 1 are limit points of A even though
they’re not elements of A.
Applying the definition repeatedly with δ = 1/n, n ∈ N, we observe that an equivalent
definition of limit point can be given in terms of sequences:
Theorem 5.3. Let A ⊆ R. Then c ∈ R is a limit point of A if and only if there exists a
sequence (x_n)_{n∈N} with x_n ∈ A ∀n ∈ N, x_n ≠ c ∀n ∈ N, and x_n → c as n → ∞.
We may now define what we mean by lim_{x→c} f(x), for limit points c of the domain of f.
Definition 5.4. Let f : A ⊆ R → R and suppose c is a limit point of the domain A. Then
lim_{x→c} f(x) = L if:

∀ε > 0, ∃δ > 0 so that |f(x) − L| < ε ∀x ∈ A with 0 < |x − c| < δ.
Notice that the condition 0 < |x− c| < δ says that x is very close to c, but not equal to
c (as in the definition of limit point!) So the limit of a function is in general not related
to the value of f(c), if indeed c ∈ A at all.
We also point out that the condition “x ∈ A” means that if c is an endpoint of the set
A, the limit above is actually a one-sided limit: a limit from the right if we are at the left
endpoint, and a limit from the left if c is the right endpoint of A. We define a limit from the
right as follows, by restricting to values of x > c:
Definition 5.5. Let f : A ⊆ R → R and suppose c is a limit point of the domain
{x ∈ A | x > c}. Then lim_{x→c+} f(x) = L if:

∀ε > 0, ∃δ > 0 so that |f(x) − L| < ε ∀x ∈ A with c < x < c + δ.
Similarly, we may define the limit from the left, lim_{x→c−} f(x), as above, but for c − δ < x < c.
Example 5.6. Let f(x) = √x, defined on A = [0,∞). We will show that lim_{x→0} √x = 0.
(Note that c = 0 is a limit point of A.)

Let ε > 0 be any given value. Since x ∈ A = [0,∞) satisfies |f(x) − 0| = √x < ε exactly
when 0 ≤ x < ε², we choose δ = ε² > 0. Then, if x ∈ A and 0 < |x − 0| = x < δ, indeed
|f(x) − 0| = √x < ε, and the definition is satisfied. Notice that because c = 0 is an endpoint
of the interval A = [0,∞), this is actually a limit from the right. Since f is not defined to
the left of c = 0, there is no limit possible from the left.
Now, try lim_{x→4} √x = 2. (Note again that c = 4 is a limit point of A.) Again, we simplify
the desired expression |f(x) − 2|, in order to express it in terms of x − 4:

|f(x) − 2| = |√x − 2| = |x − 4| / (√x + 2).

Since x ∈ A = [0,∞), √x + 2 ≥ 2, and hence we have |f(x) − 2| ≤ (1/2)|x − 4|. So if we choose
δ = 2ε, then if |x − 4| < δ = 2ε, by the above calculation, |f(x) − 2| ≤ (1/2)|x − 4| < ε, and so
the limit statement is verified. ♦
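The choice δ = 2ε can be spot-checked numerically. A small Python sketch (not in the notes; the grid sampling is an illustrative device, not a proof):

```python
import math

def check_delta(eps, samples=10_000):
    """Sample x with |x - 4| < 2*eps and confirm |sqrt(x) - 2| < eps."""
    delta = 2 * eps
    for k in range(1, samples):
        x = 4 - delta + (2 * delta) * k / samples
        if x < 0 or x == 4:       # stay in the domain, and x != c
            continue
        assert abs(math.sqrt(x) - 2) < eps
    return True

for eps in (1.0, 0.1, 0.01):
    check_delta(eps)
print("delta = 2*eps passed on all sampled points")
```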
We next relate limits of functions to limits of sequences, which we spent so much time
studying in the previous sections.
Theorem 5.7. Let f : A ⊆ R → R, and c ∈ R a limit point of A. Then L = lim_{x→c} f(x) if and only
if:

for every sequence (x_n)_{n∈N} with x_n ∈ A, x_n ≠ c ∀n ∈ N, and x_n → c as n → ∞,
it is true that f(x_n) → L as n → ∞.
Proof. First, assume L = lim_{x→c} f(x). Then, for any ε > 0, ∃δ > 0 with |f(x) − L| < ε for
all x ∈ A with 0 < |x − c| < δ. Take any sequence (x_n)_{n∈N} with x_n ∈ A, x_n ≠ c ∀n ∈ N and
x_n → c. By the convergence of the sequence, ∃K ∈ N so that |x_n − c| < δ, ∀n ≥ K.
Taking x = x_n in the first limit condition, whenever n ≥ K we then have |f(x_n) − L| < ε;
that is, the sequence (f(x_n))_{n∈N} converges to L, which is the desired conclusion.

Now, assume the second statement is true: for every sequence (x_n)_{n∈N} with x_n ∈ A,
x_n ≠ c ∀n ∈ N and x_n → c, we have f(x_n) → L. We argue by contradiction, and
assume that f(x) does not converge to L as x → c. This means that:

∃ε_0 > 0 so that ∀δ > 0, there exists x ∈ A with 0 < |x − c| < δ and |f(x) − L| ≥ ε_0.

Applying this repeatedly with δ = 1/n, n ∈ N, we may generate a sequence (x_n)_{n∈N} with
the property that x_n ∈ A ∀n ∈ N, 0 < |x_n − c| < 1/n, and |f(x_n) − L| ≥ ε_0. That is, x_n ≠ c,
x_n → c, and f(x_n) ↛ L. This contradicts the hypothesis that all such sequences converge
to L, and so the Theorem is verified.
This equivalent description of convergence of limits of functions will have many important
consequences for us. In particular, the contradiction argument in the second part of the proof
will be useful as a criterion for determining when a limit does not exist!
Corollary 5.8. If there exist two sequences (x_n)_{n∈N}, (u_n)_{n∈N} ⊂ A, with x_n, u_n ≠ c ∀n and
x_n, u_n → c, for which lim_{n→∞} f(x_n) ≠ lim_{n→∞} f(u_n), then lim_{x→c} f(x) does not exist.
Example 5.9. f(x) = cos(1/x), A = {x ≠ 0}. Then c = 0 is a limit point of A. If we look
at the sequence x_n = 1/(nπ), n ∈ N, then x_n → 0, x_n ≠ 0 ∀n, and f(x_n) = cos(nπ) = (−1)^n.
So there are two subsequences (corresponding to odd and even n) along which there
are different subsequential limit values of f(x_n), and by the Corollary the limit lim_{x→0} cos(1/x)
does not exist.
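The two subsequential limits are easy to see numerically (a sketch, not in the notes):

```python
import math

# Along x_n = 1/(n*pi) the values cos(1/x_n) = cos(n*pi) alternate,
# so f(x_n) has the two subsequential limits -1 and +1.
xs = [1 / (n * math.pi) for n in range(1, 11)]
ys = [round(math.cos(1 / x)) for x in xs]
print(ys)   # [-1, 1, -1, 1, -1, 1, -1, 1, -1, 1]
```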
We can also use the equivalence of the two kinds of convergence to extend various facts
about limits from sequences to functions:
Theorem 5.10. Assume f, g : A ⊆ R → R, and c ∈ R a limit point of A. Assume also
that L = lim_{x→c} f(x) and M = lim_{x→c} g(x) both exist. Then:

(a) For any α ∈ R, lim_{x→c} αf(x) = αL.

(b) lim_{x→c} [f(x) + g(x)] = L + M.

(c) lim_{x→c} [f(x) · g(x)] = LM.

(d) If M ≠ 0, lim_{x→c} [f(x)/g(x)] = L/M.
These all follow from Theorem 5.7 and the corresponding facts proven for sequences. If
(x_n)_{n∈N}, with x_n ∈ A, x_n ≠ c ∀n ∈ N and x_n → c, then

lim_{n→∞} [f(x_n) + g(x_n)] = lim_{n→∞} f(x_n) + lim_{n→∞} g(x_n) = L + M.

Since this is true for all such sequences, statement (b) must hold. The others are all proven
in the same way. We also have:
Theorem 5.11 (Squeeze Theorem). If f(x) ≤ g(x) ≤ h(x) ∀x ∈ A, c ∈ R is a limit point
of A, and limx→c f(x) = L = limx→c h(x), then limx→c g(x) = L also.
Example 5.12. We show that lim_{x→0} x cos(1/x) = 0. Note that cos(1/x) has no limit as x → 0,
so the limit laws in Theorem 5.10 do not apply! However, |cos(1/x)| ≤ 1 for all x ≠ 0, so

−|x| ≤ x cos(1/x) ≤ |x| ∀x ≠ 0.

Since lim_{x→0} |x| = 0 (Exercise: prove it via the definition!), by the Squeeze Theorem we have
lim_{x→0} x cos(1/x) = 0.
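A quick numerical sanity check of the squeeze (a sketch, not in the notes):

```python
import math

# -|x| <= x*cos(1/x) <= |x| pins the product to 0 as x -> 0,
# even though cos(1/x) itself has no limit.
for x in (0.1, 0.01, 0.001, 1e-6):
    g = x * math.cos(1 / x)
    assert -abs(x) <= g <= abs(x)   # the squeeze bounds hold
    print(x, g)
```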
5.2 Continuous Functions
By introducing limits we have devised a new “operation” which may be performed on func-
tions, other than evaluation of a function f(x) at a point x in its domain. We distinguish a
special class of functions, the continuous functions which are well-behaved with respect to
the taking of limits.
Definition 5.13. Let A ⊆ R, and f : A→ R. We say that f is continuous at a ∈ A if
∀ε > 0 ∃δ > 0 so that |f(x)− f(a)| < ε, ∀ x ∈ A with |x− a| < δ.
We note that in the definition, a ∈ A is in the domain of f and so f(a) has been defined.
And, as long as a is a limit point of the set A, the condition given in the definition matches
the existence of the limit, lim_{x→a} f(x) = f(a).
We thus have the equivalent definition of continuity at a limit point a ∈ A:
Theorem 5.14. Let f : A ⊆ R → R and a ∈ A a limit point. Then f is continuous at a if
and only if lim_{x→a} f(x) = f(a).
If a ∈ A is not a limit point, in other words, if a is an isolated point of A, then any
f is continuous at a. So generally we only really care about continuity at limit points of
the domain A. As with limits, it will often be convenient to think of continuity in terms
of sequences. Applying Theorem 5.7 to Theorem 5.14, we have yet one more equivalent
condition for continuity at a limit point:
Theorem 5.15. Let f : A ⊆ R → R and a ∈ A a limit point. Then f is continuous at a if
and only if lim_{n→∞} f(x_n) = f(a) holds for every sequence (x_n)_{n∈N} ⊂ A with x_n → a.
The best functions are continuous everywhere on their domain A:
Definition 5.16. Let f : A ⊆ R→ R. We say f is continuous on A if f is continuous at
each point x ∈ A.
Continuity on a set A can be written in a very satisfying way: a continuous function is
one for which
lim_{n→∞} f(x_n) = f( lim_{n→∞} x_n )

holds for every convergent sequence (x_n)_{n∈N} ⊂ A.
As in the previous discussion of limits, the use of sequences makes for a simple condition
for a function to be discontinuous:
Corollary 5.17. Let f : A ⊆ R → R and a ∈ A. If there exists a sequence (x_n)_{n∈N} ⊂ A
with x_n → a such that lim_{n→∞} f(x_n) ≠ f(a), then f is discontinuous at a.
Note that if there is a sequence (x_n)_{n∈N} with x_n → a for which (f(x_n))_{n∈N} doesn't
converge at all, then f is discontinuous at a. One way to exploit this is to find a sequence
x_n → a for which f(x_n) has two different subsequential limits. For example, if

f(x) = sgn(x) =
  −1, if x < 0;
   0, if x = 0;
   1, if x > 0,

and we take x_n = (−1)^n/n → 0, then f(x_n) = (−1)^n, which is divergent since it has two
subsequential limit points. Thus, the signum function is discontinuous at x = 0.
Using the properties of the limit from Theorem 5.10 we immediately derive the usual
properties of continuous functions from calculus:
Theorem 5.18. Assume f, g : A ⊆ R → R are each continuous at a ∈ A. Then (f + g),
fg, and f/g (provided g(a) ≠ 0) are all continuous at a ∈ A.
The other natural combination between continuous functions is composition. To define
the composition of two functions h(x) = g ◦ f(x) = g(f(x)) we have to be sure that their
domain and range are compatible.
Theorem 5.19. Let f : A ⊆ R → R and g : B ⊆ R → R with f(A) ⊆ B. If f is
continuous at a ∈ A and g is continuous at b = f(a) ∈ B, then h = g ◦ f is continuous at a.
Proof. Using the definition of continuity for g at y = b, for any ε > 0 there exists γ > 0 for
which
|g(y)− g(b)| < ε for all y ∈ B with |y − b| < γ.
Now apply the definition of continuity for f at x = a (adapting the Greek letters appropri-
ately): there exists δ > 0 for which
|f(x)− f(a)| < γ for all x ∈ A with |x− a| < δ.
Now, for x ∈ A with |x − a| < δ, call f(x) = y ∈ B, and so |y − b| = |f(x) − f(a)| < γ.
By the first continuity statement, we then have |g(f(x))− g(f(a))| = |g(y)− g(b)| < ε, and
we’re done.
As an exercise, you can make an alternate proof of the continuity of g ◦ f by using
sequences (xn)n∈N ⊂ A.
With these combinations we can verify that a lot of common elementary functions are
continuous, such as polynomials and rational functions (as long as the denominator is not
zero.)
Example 5.20. First, f(x) = √x is continuous on [0,∞). For any a ∈ (0,∞), by a familiar
trick,

|f(x) − f(a)| = |√x − √a| = |x − a| / (√x + √a) ≤ a^{−1/2} |x − a|.

Given any ε > 0 let δ = ε√a, and the definition of continuity at a is verified. For a = 0,
given ε > 0 we take δ = ε² (verify the details!)
Now we know that functions of the form h(x) = √(P(x)), with P(x) a polynomial, are
continuous at any point in their domain. (That is, x for which P(x) ≥ 0.) How do we get a
larger "vocabulary" of continuous functions? Via calculus, as we'll see in the section on the
derivative. . .
Continuous functions are special, but we gain even more special properties when the
domain of the continuous function is a closed, bounded interval: that is, A = [a, b], and
f : [a, b]→ R is continuous on the set [a, b].
Theorem 5.21. Let f : [a, b]→ R be continuous. Then:
(a) f is bounded on [a, b]; that is, ∃M > 0 so that |f(x)| ≤M ∀x ∈ [a, b].
(b) f attains its maximum and minimum values in [a, b]. That is, there exist points
α, β ∈ [a, b] for which
f(α) ≤ f(x) ≤ f(β) for all x ∈ [a, b].
Another way to express (a) is that the image of f , the set of values y = f(x),
f([a, b]) := {f(x) | x ∈ [a, b]}
is a bounded set in R. As for (b), we could equivalently state it in this way: there exist
points α, β ∈ [a, b] so that:

f(α) = inf_{x∈[a,b]} f(x), and f(β) = sup_{x∈[a,b]} f(x).
CAREFUL: this theorem does not hold in case the domain of f is either unbounded,
or not closed (i.e., lacks one or both endpoints). Try the example f(x) = e^{−x} on the interval
[0,∞) (there is no minimum) or the interval (0, 1] (there is no maximum).
Proof. We argue by contradiction, and assume that f is not bounded. Since |f(x)| ≥ 0
always, this means we are assuming that {|f(x)| | x ∈ [a, b]} is unbounded above. Therefore,
for any n ∈ N, n is not an upper bound for f([a, b]), and there exists xn ∈ [a, b] with
|f(xn)| ≥ n. As we have done many times before, we recognize that we have constructed a
sequence (xn)n∈N, in this case in the interval [a, b] and so a ≤ xn ≤ b ∀n ∈ N. Hence (xn)n∈N
is a bounded sequence, and by Bolzano-Weierstrass it contains a convergent subsequence,
x_{n_k} → x. By properties of limits of sequences, a ≤ x ≤ b, so x ∈ [a, b], and by continuity,

f(x) = lim_{k→∞} f(x_{n_k}).

However, |f(x_{n_k})| ≥ n_k → +∞ (diverges properly), and so this is impossible. Therefore f([a, b])
is bounded, and (a) must hold.
For (b), let’s show that α exists; a similar argument works for β. Let
A = infx∈[a,b]
f(x) = inf{f(x) | x ∈ [a, b]},
which exists since we just showed that f([a, b]) is bounded. By the definition of the infimum,
for every n ∈ N, ∃x_n ∈ [a, b] with

A ≤ f(x_n) ≤ A + 1/n, ∀n ∈ N. (5.1)
Again, we’ve constructed a sequence (xn)n∈N, a ≤ xn ≤ b, ∀n ∈ N, which is bounded, and
using Bolzano-Weierstrass again we conclude the existence of a subsequence (xnk)k∈N, and
α ∈ R with xnk→ α, with α ∈ [a, b]. From (5.1), f(xn) −−−→
n→∞A, and by continuity we then
have:
f(α) = limk→∞
f(xnk) = A,
since every subsequence of a convergent sequence converges to the same limit. This proves
(b), for the existence of a minimum.
Here is an old friend from Calculus:
Theorem 5.22 (Intermediate Value Theorem). Let f : [a, b]→ R be continuous. If f(a) <
k < f(b), then ∃c ∈ (a, b) with k = f(c).
There are (at least) two proofs of this, by slightly different means. The first one is based
on the Supremum, and it heavily uses the following Lemma, which is problem A in Practice
Problems # 5:
Lemma 5.23. If g : A ⊆ R → R is continuous at d ∈ A and g(d) > 0, then ∃δ > 0 so that
g(x) > 0 ∀x ∈ A with d − δ < x < d + δ.
Proof # 1. Define the set
S = {u | f(x) < k ∀ x ∈ [a, u]} ⊆ [a, b],
and c = supS. Then either f(c) < k, f(c) > k, or f(c) = k; we eliminate the first two.
If f(c) < k, by Lemma 5.23 with g(x) = k − f(x), ∃δ > 0 for which g(x) = k − f(x) > 0
in (c− δ, c+ δ), and so (c+ δ) ∈ S, which contradicts c = supS. Hence, f(c) ≥ k.
If f(c) > k, again by Lemma 5.23, but now with g(x) = f(x) − k, ∃δ > 0 for which
g(x) = f(x) − k > 0 in (c − δ, c + δ), and therefore (c − δ) is an upper bound for S, again
contradicting the choice of c = supS.
In conclusion, f(c) = k, with c ∈ (a, b).
The second proof is more constructive: in fact, it gives an algorithm, "bisection", by
which one can approximate the value of c.
Proof # 2. Define L = b − a, the length of the interval I0 = [a, b]. Consider the midpoint of
the interval, p0 = (a + b)/2. If f(p0) = k, then we are done, and c = p0. If not, define a new
interval I1 = [a1, b1] as follows:

• if f(p0) < k, then take a1 = p0 and b1 = b.

• if f(p0) > k, then take a1 = a and b1 = p0.

The interval I1 ⊂ I0, and has length L1 = L/2. Now repeat the above procedure with I1,
getting I2 ⊂ I1 ⊂ I0, and repeat again, etc. That is, if we have already constructed n
intervals, In ⊂ In−1 ⊂ · · · ⊂ I1 ⊂ I0, with each Ij = [aj, bj] having length Lj = Lj−1/2 =
2^{−j}L, then let pn = (an + bn)/2 be the midpoint of In. If f(pn) = k, then we are done and
c = pn. If not, define the next interval In+1 = [an+1, bn+1] as follows:

• if f(pn) < k, then take an+1 = pn and bn+1 = bn.

• if f(pn) > k, then take an+1 = an and bn+1 = pn.
If the process is non-terminating, we get a sequence of nested intervals, I0 ⊃ I1 ⊃ I2 ⊃ · · · ,
and so the endpoints form two monotone sequences,

a ≤ a1 ≤ a2 ≤ · · · ≤ an ≤ · · · ≤ bn ≤ · · · ≤ b2 ≤ b1 ≤ b.

By the Monotone Sequence Theorem, each sequence converges,

c = lim_{n→∞} an, d = lim_{n→∞} bn, and c ≤ d.

As bn − an = 2^{−n}L → 0, by the Squeeze Theorem d = c. And by continuity, f(c) =
lim_{n→∞} f(an) = lim_{n→∞} f(bn). Since f(an) < k and f(bn) > k ∀n, taking limits gives
f(c) ≤ k and f(c) ≥ k, so f(c) = k.
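The bisection argument above is also a practical algorithm. Here is a Python sketch of it (not part of the notes; the function name `bisect` and the example f(x) = x³ − 2 are illustrative choices), locating a point c with f(c) = k:

```python
def bisect(f, a, b, k=0.0, tol=1e-10):
    """Assumes f continuous on [a, b] with f(a) < k < f(b).
    Halve the interval, keeping the half where the crossing lies."""
    while b - a > tol:
        p = (a + b) / 2        # midpoint, as in the proof
        if f(p) < k:
            a = p              # crossing is in [p, b]
        else:
            b = p              # crossing is in [a, p]
    return (a + b) / 2

c = bisect(lambda x: x**3 - 2, 1.0, 2.0)
print(c)   # approximately 2^(1/3) = 1.2599...
```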
5.3 Extensions of limit concepts
Here we consider some variations on the theme of limits of functions. First, as in Calculus
we can also consider one-sided limits, as x tends to a from the right or from the left. It
is also useful to consider the possibility that functions diverge properly as x→ a, from one
side or another.
Definition 5.24. Let f : A ⊆ R → R, and a ∈ R a limit point of the set A.

(a) We say that f(x) converges to L as x tends to a from the right, and write lim_{x→a+} f(x) = L if:

∀ε > 0 ∃δ > 0 such that |f(x) − L| < ε whenever x ∈ A and a < x < a + δ.

(b) We say that f(x) converges to L as x tends to a from the left, and write lim_{x→a−} f(x) = L if:

∀ε > 0 ∃δ > 0 such that |f(x) − L| < ε whenever x ∈ A and a − δ < x < a.

(c) We say that f properly diverges to +∞ as x tends to a from the right, and write
lim_{x→a+} f(x) = +∞ if:

∀M > 0 ∃δ > 0 such that f(x) > M whenever x ∈ A and a < x < a + δ.

(d) We say that f properly diverges to +∞ as x tends to a from the left, and write
lim_{x→a−} f(x) = +∞ if:

∀M > 0 ∃δ > 0 such that f(x) > M whenever x ∈ A and a − δ < x < a.
We could also have proper divergence of f(x) to −∞ as x → a±; just require f(x) < −M
instead of f(x) > M in (c), (d) above, since lim_{x→a±} f(x) = −∞ is the same as
lim_{x→a±} [−f(x)] = +∞.
There is also the case of limits as x→ ±∞:
Definition 5.25. Let f : [a,∞) → R. We say lim_{x→∞} f(x) = L if

∀ε > 0, ∃B > 0 so that |f(x) − L| < ε ∀x > B.

Let f : (−∞, b] → R. We say lim_{x→−∞} f(x) = L if

∀ε > 0, ∃B > 0 so that |f(x) − L| < ε ∀x < −B.
Example 5.26. f(x) = 1/x³, x ≠ 0. Then lim_{x→0−} f(x) = −∞: for any M > 0,
f(x) = 1/x³ < −M is equivalent to the inequality −1/M^{1/3} < x < 0. So by taking δ = 1/M^{1/3} we
satisfy the definition.

In addition, lim_{x→∞} f(x) = 0: for any ε > 0 and x > 0,

|f(x) − 0| = 1/x³ < ε exactly when x > ε^{−1/3}.

Choosing B = ε^{−1/3}, the definition is satisfied.
5.4 Uniform Continuity
This is a refined form of continuity, which seems to be a kind of technical difference with the
original definition but which is surprisingly more useful than plain continuity.
Definition 5.27. Let f : A ⊆ R → R. We say that f is uniformly continuous on A if:

∀ε > 0, ∃δ > 0 such that ∀x, u ∈ A with |x − u| < δ, we have |f(x) − f(u)| < ε.

This looks a lot like the statement "f is continuous on A":

For all u ∈ A and for all ε > 0, ∃δ > 0 such that if x ∈ A with |x − u| < δ, then |f(x) − f(u)| < ε.

But notice the subtle difference: for regular continuity on A, δ could be different for different
values of u ∈ A, whereas in uniform continuity the same value of δ must work no matter
which u ∈ A is tested.
Example 5.28. f(x) = 1/x, with domain A = [a,∞) for any a > 0. Let's see if we can
apply the definition: for x, u ≥ a > 0, we have

|f(x) − f(u)| = |1/x − 1/u| = |x − u|/(xu)   (since |x − u| = |u − x|)   (5.2)
             ≤ (1/a²)|x − u|   (since x, u ≥ a > 0).

This last inequality is true ∀x, u ∈ A = [a,∞), so given any ε > 0, if we choose δ = a²ε,
then we may conclude that the definition is satisfied, and f is uniformly continuous on A.
The above inequality gives the perfect set-up for proving uniform continuity: the differ-
ence |f(x)−f(u)| is at most proportional to the distance |x−u|. This situation occurs often
enough that we have a special name for it:
Definition 5.29. A function f is called Lipschitz continuous on the set A ⊆ R if ∃K ≥ 0
for which
|f(x)− f(u)| ≤ K |x− u|, ∀ x, u ∈ A.
In the previous example, f(x) = 1/x is Lipschitz continuous on the set A = [a,∞), with
"Lipschitz constant" K = 1/a². It is an easy exercise in applying the definition to prove the
following:
Theorem 5.30. If f is Lipschitz continuous on A then f is also uniformly continuous on
A.
What’s nice about Lipschitz is that it has a geometrical interpretation, which uniform
continuity doesn’t have: it says that the slopes of the secant lines to the graph of y = f(x)
are uniformly bounded,
m =
∣∣∣∣f(u)− f(x)
u− x
∣∣∣∣ ≤M, ∀x, u ∈ A, x 6= u.
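A numerical illustration of the bounded-secant-slope picture (a sketch, not in the notes; the random sample is an illustrative device): for f(x) = 1/x on [a, ∞), every sampled secant slope stays below K = 1/a².

```python
import itertools, random

a = 0.5
K = 1 / a**2                       # Lipschitz constant from Example 5.28
random.seed(0)
pts = [a + 10 * random.random() for _ in range(50)]   # points in [a, a+10)
slopes = [abs((1 / u - 1 / x) / (u - x))
          for x, u in itertools.combinations(pts, 2) if x != u]
assert max(slopes) <= K            # secant slopes are uniformly bounded
print("max sampled slope:", max(slopes), "<= K =", K)
```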
Exercise 5.31. Show that f(x) =√x is continuous on [0,∞) but not Lipschitz continuous.
[Hint: assume it is, and get a contradiction when u = 0.]
We know that the function f(x) = 1/x is continuous everywhere on its domain (it's a
rational function), but it seems we can only show it's uniformly continuous if the domain
excludes a neighborhood of the bad point u = 0. How would we show that it's not uniformly
continuous? We need to invert the definition.
Lemma 5.32. Let f : A ⊆ R → R. f is not uniformly continuous on A if and only if:

∃ε > 0 such that ∀δ > 0 ∃x, u ∈ A with |x − u| < δ and |f(x) − f(u)| ≥ ε.
That is, to show a function is not uniformly continuous, we need to find, for each δ > 0,
pairs of points which are close to within distance δ (|x − u| < δ) but for which the y-values are
not close (|f(x) − f(u)| ≥ ε). Applying this criterion for a sequence of δ = δ_n = 1/n, n ∈ N,
we have:
Lemma 5.33. Let f : A ⊆ R → R. f is not uniformly continuous on A if and only if:

∃ε > 0 and a pair of sequences (x_n)_{n∈N}, (u_n)_{n∈N} with x_n, u_n ∈ A,
(x_n − u_n) → 0, and |f(x_n) − f(u_n)| ≥ ε ∀n ∈ N.
Let’s try these on f(x) = 1/x, A = (0, 1). We know the problem is near u = 0. Setting
up Lemma 5.32, given δ with 0 < δ < 1, let u = δ and x = 12δ, so |u − x| = 1
2δ < δ. And
then, by the calculation(5.2),
|f(x)− f(u)| = |x− u|xu
=12δ
12δ2
=1
δ≥ 1,
so by Lemma 5.32 f is not uniformly continuous on A = (1, 0).
Alternatively, we could use Lemma 5.33, with a pair of sequences. Take sequences which
both converge to zero, say x_n = 2^{−n}, u_n = 2^{−n−1}, so |x_n − u_n| = 2^{−n−1} → 0, but

|f(x_n) − f(u_n)| = |x_n − u_n|/(x_n u_n) = 2^{−n−1}/2^{−2n−1} = 2^n ≥ 2.
Whichever way you prefer, we obtain the non-uniform continuity of f on A.
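The sequence pair from Lemma 5.33 can be checked numerically (a sketch, not in the notes):

```python
# x_n = 2^-n and u_n = 2^-(n+1) squeeze together, but f(x) = 1/x
# keeps |f(x_n) - f(u_n)| = 2^n >= 2: non-uniform continuity on (0, 1).
f = lambda x: 1 / x
for n in range(1, 15):
    x, u = 2.0**-n, 2.0**-(n + 1)
    assert abs(f(x) - f(u)) == 2.0**n   # exact in binary floating point
    print(n, abs(x - u), abs(f(x) - f(u)))
```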
Is there a way to guarantee that a continuous function is in fact uniformly continuous?
Yes, provided the domain is a closed and bounded set:
Theorem 5.34. If f is continuous on the interval [a, b] (closed & bounded), then f is also
uniformly continuous on [a, b].
Example 5.35. f(x) =√x is uniformly continuous on [0, 1]. We already showed it was con-
tinuous, so the upgrade to uniform continuity follows directly from the Theorem. However,
from Exercise 5.31 it is not Lipschitz continuous, so this gives an example of how a function
can be uniformly, but not Lipschitz, continuous.
Proof. To obtain a contradiction, assume that f is continuous on A = [a, b] but not uniformly.
By Lemma 5.33 there exist ε > 0 and a pair of sequences (x_n)_{n∈N}, (u_n)_{n∈N} with x_n, u_n ∈ A,
and (u_n − x_n) → 0, but |f(u_n) − f(x_n)| ≥ ε > 0, ∀n ∈ N. By our old friends Bolzano
and Weierstrass, there exist a subsequence and u ∈ [a, b] for which lim_{k→∞} u_{n_k} = u. Next,
consider (x_{n_k})_{k∈N}, with the same n_k as for u. If a sequence converges, then we know that all
of its subsequences converge to the same limit, so (u_{n_k} − x_{n_k}) → 0. Therefore,

x_{n_k} = u_{n_k} − (u_{n_k} − x_{n_k}) → u,

so the subsequences (x_{n_k})_{k∈N}, (u_{n_k})_{k∈N} both converge to the same u ∈ [a, b]. Since f is
continuous,

lim_{k→∞} ( f(u_{n_k}) − f(x_{n_k}) ) = f(u) − f(u) = 0,

which contradicts |f(u_n) − f(x_n)| ≥ ε > 0, ∀n ∈ N.
Example 5.36. Show that f(x) = ∛x is uniformly continuous on [0,∞).

Since (a³ − b³) = (a − b)(a² + ab + b²), if one of x, u ≠ 0,

|f(x) − f(u)| = |∛x − ∛u| = |x − u| / (x^{2/3} + x^{1/3}u^{1/3} + u^{2/3}).

So if both x, u ≥ 1, we have

|f(x) − f(u)| ≤ (1/3)|x − u|,

and f is Lipschitz continuous on [1,∞), and therefore uniformly continuous there. Therefore,
∀ε > 0, ∃δ1 > 0 for which x, u ∈ [1,∞) with |x − u| < δ1 implies |f(x) − f(u)| < ε.

Next, for fixed u ∈ (0, 2], f is continuous at u since ∀ε > 0,

|f(x) − f(u)| ≤ u^{−2/3}|x − u| < ε, for all x > 0 with |x − u| < u^{2/3}ε.

And when u = 0, |f(x) − f(0)| = ∛x < ε whenever x > 0 and |x − 0| < ε³, so f is continuous
on the closed interval [0, 2]. By Theorem 5.34, f is then uniformly continuous on [0, 2], so
∃δ0 > 0 so that ∀x, u ∈ [0, 2] with |x − u| < δ0 we have |f(x) − f(u)| < ε.

Since any x, u ∈ [0,∞) with |x − u| < 1 both lie either in [0, 2] or in [1,∞), if we let
δ = min{δ0, δ1, 1}, the definition of uniform continuity is satisfied with this δ.
6 Differentiability
You know all about the derivative from first-year calculus. But here we’re interested in
understanding the definition and what it says about functions, not so much about calculating
derivatives in particular examples. Much of what we’ll do here actually is proven in Stewart’s
book, but now we will try to put things together so they give a more complete picture.
6.1 Differentiable Functions
Throughout the section we will assume f : I ⊆ R → R, where the domain of f is an
interval. This could be an open interval, I = (a, b) or (a,∞) or (−∞, b); or a closed
interval, I = [a, b] or [a,∞) or (−∞, b]; or neither, I = (a, b] or I = [a, b).
Definition 6.1. Assume f : I ⊆ R → R and c ∈ I. We say f is differentiable at c, with
derivative f′(c), if

lim_{x→c} (f(x) − f(c))/(x − c) = f′(c).

In other words,

∀ε > 0, ∃δ > 0 so that ∀x ∈ I with 0 < |x − c| < δ,  |(f(x) − f(c))/(x − c) − f′(c)| < ε.
Remark 6.2. Notice that if c is an endpoint of the interval I, then this is actually a one-sided
limit. For instance, if I = [a, b] and c = a (left endpoint), then x ∈ I with 0 < |x − c| < δ
means a = c < x < c + δ, so the limit is from the right only.
If f is differentiable at all c ∈ I, we say the function f is differentiable on I.
We want to think of differentiability as a property which certain functions have, like
continuity or uniform continuity. The existence of the derivative says something nontrivial
about the smoothness of the graph y = f(x): it says that there is a well-defined tangent line.
And the property of differentiability is stronger than continuity, in the sense that it is more
special:
Theorem 6.3. Suppose f is differentiable at c ∈ I. Then, f is continuous at c.
Proof. Take any x ∈ I, x ≠ c. Then,
f(x) − f(c) = [(f(x) − f(c))/(x − c)] (x − c).
Since f is differentiable, both factors in the product on the right-hand side have limits as
x → c, and therefore the limit of the product exists:
lim_{x→c} [f(x) − f(c)] = lim_{x→c} [(f(x) − f(c))/(x − c)] (x − c) = f′(c) · 0 = 0.
In particular lim_{x→c} f(x) = f(c), so (by definition) f is continuous at c.
Example 6.4. Let f(x) = |x|, which is continuous on R. (Check it with the definition!) At
c = 0, f is not differentiable: for x ≠ 0,
(f(x) − f(0))/(x − 0) = |x|/x = sgn x.
The limit as x→ 0 does not exist. (We did this example back in the beginning of Section 5.)
Be careful! When f is differentiable, f itself must also be continuous, but f ′(x) might be
discontinuous! For example
f(x) = x² sin(1/x) if x ≠ 0, and f(0) = 0,   (6.1)
is differentiable at all x ∈ R, including c = 0. For x ≠ 0, use the usual rules from calculus:
f′(x) = 2x sin(1/x) − cos(1/x).
When x = 0, we use the definition:
f′(0) = lim_{x→0} (f(x) − f(0))/(x − 0) = lim_{x→0} x sin(1/x) = 0
(by the Squeeze Theorem!) So f is differentiable at c = 0. However, f′(x) has no limit as
x → 0, so f′ is discontinuous at c = 0.
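Since this function is the standard counterexample, both claims are worth seeing numerically. A quick Python sketch (the sample points are chosen for convenience and are not part of the argument): the difference quotient at 0 is x sin(1/x), which is squeezed to 0, while f′ keeps oscillating between values near −1 and +1 arbitrarily close to 0.

```python
import math

def f(x):
    # the function (6.1): x^2 sin(1/x) for x != 0, and 0 at x = 0
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    # the derivative for x != 0, by the usual calculus rules
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# difference quotients at 0 shrink like x*sin(1/x):
dq = [abs(f(x) / x) for x in (1e-2, 1e-4, 1e-6)]

# but f' oscillates near 0: at a, cos(1/a) = +1; at b, cos(1/b) = -1
a = 1.0 / (2.0 * math.pi * 1000.0)
b = 1.0 / (2.0 * math.pi * 1000.0 + math.pi)
print(dq, fprime(a), fprime(b))
```

The points a and b sit closer to 0 than 10⁻³, yet f′(a) ≈ −1 and f′(b) ≈ +1, which is why lim_{x→0} f′(x) cannot exist.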
The verification of all the usual “rules” of differential calculus (product, quotient, chain
rule) may now be done in a logical, rigorous fashion. The only one which is a little subtle is
the chain rule.
Theorem 6.5 (Chain Rule). Suppose I, J ⊆ R are intervals, g : J → R, f : I → R,
f(I) ⊆ J . Assume f is differentiable at c ∈ I, and g is differentiable at d = f(c) ∈ J . Then
h(x) = g(f(x)) is differentiable at x = c, and
h′(c) = g′(f(c)) f ′(c).
It might seem that this follows directly from the definition of the derivative, by writing the difference quotient for h as a product; the subtlety is that f(x) − f(c) may vanish for x arbitrarily close to c.
Proof of the Chain Rule. The key is the following observation: since g(y) is differentiable at
y = d, the function
ϕ(y) = (g(y) − g(d))/(y − d) if y ≠ d, and ϕ(d) = g′(d),
is continuous at y = d. Then, for x ≠ c,
(h(x) − h(c))/(x − c) = [(g(f(x)) − g(d))/(f(x) − d)] [(f(x) − f(c))/(x − c)] = ϕ(f(x)) [(f(x) − f(c))/(x − c)],
where the final expression makes sense (and the identity holds) even when f(x) = d, since then both sides are zero.
By the continuity of ϕ(y) and the differentiability of f at x = c, each factor in the product
converges as x → c, and hence
h′(c) = lim_{x→c} (h(x) − h(c))/(x − c) = ϕ(d) f′(c) = g′(f(c)) f′(c).
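As a concrete sanity check of the formula h′(c) = g′(f(c)) f′(c), one can compare a symmetric difference quotient of h with the chain-rule value. The functions f(x) = x², g = sin and the point c = 1.3 below are arbitrary illustrative choices:

```python
import math

c = 1.3                        # arbitrary interior point
f = lambda x: x * x            # f'(x) = 2x
h = lambda x: math.sin(f(x))   # h = g o f with g = sin, so g'(y) = cos(y)

chain_rule = math.cos(f(c)) * (2.0 * c)   # g'(f(c)) * f'(c)

eps = 1e-6
diff_quot = (h(c + eps) - h(c - eps)) / (2.0 * eps)
print(abs(diff_quot - chain_rule))  # agreement to many decimal places
```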
The following should be a very familiar result; we will need it in a crucial way in the next
part:
Theorem 6.6. Suppose f attains its maximum at c ∈ (a, b), i.e., f(c) ≥ f(x), ∀x ∈ (a, b). If
f is differentiable at x = c, then f ′(c) = 0.
Of course, the same applies for minima, f(c) ≤ f(x), ∀x ∈ (a, b).
Proof. For all x ∈ (a, c), x < c and f(x) ≤ f(c), so (f(x) − f(c))/(x − c) ≥ 0. Since f is differentiable,
f′(c) = lim_{x→c−} (f(x) − f(c))/(x − c) ≥ 0.
On the other hand, for x ∈ (c, b), x > c and f(x) ≤ f(c), so (f(x) − f(c))/(x − c) ≤ 0, and we also have
f′(c) = lim_{x→c+} (f(x) − f(c))/(x − c) ≤ 0.
The only possibility is f ′(c) = 0.
Remark 6.7. Notice that if the maximum is attained at the right endpoint, c = b, then
the first half of the above is still valid, and we conclude that f ′(c) ≥ 0 (f increases towards
its max at the right endpoint.) And if the max occurs on the left endpoint, c = a, then
the second half of the argument gives us f ′(c) ≤ 0 (and f decreases away from its max on
the left endpoint.) For minima, the inequalities are reversed: f′(a) ≥ 0 or f′(b) ≤ 0 if the
minimum occurs on the left or right endpoint, respectively.
As a first application of this principle, we prove the following property of the derivative
due to Darboux.
Theorem 6.8 (Darboux’s Theorem). Suppose f is differentiable on [a, b], and there exists
a constant λ with f ′(a) < λ < f ′(b). Then ∃ c ∈ (a, b) with f ′(c) = λ.
The same conclusion holds if the order of the values is reversed, and f ′(b) < λ < f ′(a);
just consider g(x) = −f(x) and µ = −λ, and apply the Theorem to g and µ.
Proof. Let g(x) = f(x)−λx, so g is differentiable on [a, b], g′(x) = f ′(x)−λ, and g′(a) < 0 <
g′(b). Therefore, it’s enough to show that ∃c ∈ (a, b) with g′(c) = 0. Since g is differentiable
on [a, b], by Theorem 6.3 it is continuous, and so by Theorem 5.3.4 in the textbook it attains
its minimum value in [a, b]. By Remark 6.7, the minimum cannot occur on the endpoints
x = a, b, since g′(a) < 0 < g′(b). So the minimum must be attained in the open interval,
c ∈ (a, b), and so we are done, by Theorem 6.6.
Remember from the example (6.1) above that f ′(x) might be discontinuous. However,
Darboux’s Theorem says that whether f ′ is continuous or not, it satisfies the Intermediate
Value property. In particular, this says that f ′(x) cannot have a jump discontinuity, only
discontinuities in which at least one of the one-sided limits doesn’t exist (as in the example
(6.1) above.)
6.2 The Mean Value Theorem and its Consequences
Theorem 6.9 (Lagrange’s Mean Value Theorem). Suppose f is continuous on [a, b] and
differentiable on (a, b). Then, there exists c ∈ (a, b) so that
f(b)− f(a) = f ′(c)(b− a).
Equivalently, there exists c ∈ (a, b) so that
(f(b) − f(a))/(b − a) = f′(c).
The left-hand side gives the slope of the secant line through the points (a, f(a)) and (b, f(b))
on the graph y = f(x), while the right-hand side gives the slope of the tangent line to the
graph at x = c.
There is a somewhat stronger version of the Mean Value Theorem, due to Cauchy:
Theorem 6.10 (Cauchy’s Mean Value Theorem). Assume f, g are both continuous on [a, b]
and differentiable on (a, b), and in addition assume that g′(t) ≠ 0, ∀t ∈ (a, b). Then,
∃c ∈ (a, b) with
(f(b) − f(a))/(g(b) − g(a)) = f′(c)/g′(c).   (6.2)
This has a similar geometrical interpretation to Lagrange's MVT: consider the path r(t) =
(g(t), f(t)), t ∈ [a, b], in the plane. The left-hand side of (6.2) is the slope of the secant line
connecting the endpoints (g(a), f(a)) and (g(b), f(b)), and the right-hand side is the slope
of the tangent line at t = c.
It is easy to see that the Lagrange MVT is a corollary of Cauchy’s MVT, obtained by
choosing the function g(t) = t. To prove the MVT we start with the following special case:
Theorem 6.11 (Rolle’s Theorem). Suppose f is continuous on [a, b] and differentiable on
(a, b), with f(a) = 0 = f(b). Then there exists c ∈ (a, b) with f ′(c) = 0.
So between any two zeros of a differentiable function, there must be a critical point!
Proof of Rolle’s Theorem. If f(x) = 0 ∀x ∈ [a, b] we are done, since f ′(c) = 0 ∀c ∈ (a, b).
So assume f is not the constant function zero, and so it must be either positive or negative
somewhere. As f is continuous on [a, b], by Theorem 5.3.4 in the textbook, it attains
its maximum and minimum values in [a, b]. Since f(a) = f(b) = 0, either the (positive)
maximum or the (negative) minimum (or both) must be attained in the interior, at some
c ∈ (a, b). By Theorem 6.6, f ′(c) = 0.
The two Mean Value Theorems are merely Rolle’s Theorem if we use the secant line to
the curve as the horizontal axis. That is, we subtract off the equation of the secant, and
apply Rolle’s Theorem:
Proof of Cauchy’s MVT. First, we claim that g(a) 6= g(b). Indeed, if it were true that
g(a) = g(b), then by Rolle’s Theorem we would have g′(c) = 0 for some c ∈ (a, b), but we
have assumed that g′(t) ≠ 0, ∀t ∈ (a, b).
Then let
h(t) = f(t) − f(a) − [(f(b) − f(a))/(g(b) − g(a))] (g(t) − g(a)).
Then h is continuous on [a, b] and differentiable on (a, b), with h(a) = 0 = h(b), so by Rolle’s
Theorem, ∃c ∈ (a, b) with h′(c) = 0, that is,
0 = f′(c) − [(f(b) − f(a))/(g(b) − g(a))] g′(c),
which is (6.2).
Almost every important tool you learned in calculus is related to the Mean Value Theorem; many are actually more or less equivalent to it!
Here are some easy consequences:
Proposition 6.12. Suppose f is continuous on [a, b] and differentiable on (a, b).
(a) If f ′(x) ≥ 0 ∀x ∈ (a, b), then f is monotone increasing on [a, b]. (Strictly monotone
increasing if f ′(x) > 0∀x.)
(b) If f ′(x) ≤ 0 ∀x ∈ (a, b), then f is monotone decreasing on [a, b]. (Strictly monotone
decreasing if f ′(x) < 0∀x.)
(c) If f ′(x) = 0 ∀x ∈ (a, b), then f(x) is constant in [a, b].
Of course you’ve known forever that if f is constant, then f ′(x) = 0∀x; part (c) is the
converse. It says that the only functions whose derivatives are zero are the constants.
Proof. Let a ≤ x < u ≤ b, and apply the MVT in the interval [x, u]: we get c ∈ (x, u) with
f(u) − f(x) = f′(c)(u − x).
Since (u − x) > 0, the sign of the left-hand side is given by the sign of f′(c), and that
proves (a), (b). For (c), this says f(x) = f(u) for any pair x, u ∈ [a, b], which is another
way of saying f is a constant function: it always takes on the same value, independent
of x.
We can also connect back to the concepts of uniform continuity and Lipschitz continuity,
from the last section.
Example 6.13. Suppose f is differentiable on an interval I ⊆ R, and f ′(x) is a bounded
function on I; that is, ∃B ≥ 0 with
|f ′(x)| ≤ B ∀x ∈ I. (6.3)
Show that f is Lipschitz continuous on I.
Take any x, u ∈ I, and apply the MVT in the interval between them: ∃c ∈ I between x, u
for which f(x)− f(u) = f ′(c)(x− u). Taking the absolute value and using the boundedness
condition (6.3),
|f(x)− f(u)| = |f ′(c)| |x− u| ≤ B |x− u|,
which holds ∀x, u ∈ I. Hence f is Lipschitz continuous on I, and therefore it is uniformly
continuous on I.
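For instance, f = sin has |f′(x)| = |cos x| ≤ 1 on all of R, so Example 6.13 predicts that sin is Lipschitz with constant B = 1. A quick Python spot check over random sample points (an illustration, not a proof):

```python
import math
import random

# Spot check: the Lipschitz quotient |sin(x) - sin(u)| / |x - u|
# should never exceed B = 1 = sup |cos|.
random.seed(0)
max_ratio = 0.0
for _ in range(1000):
    x = random.uniform(-100.0, 100.0)
    u = random.uniform(-100.0, 100.0)
    if x != u:
        ratio = abs(math.sin(x) - math.sin(u)) / abs(x - u)
        max_ratio = max(max_ratio, ratio)

print(max_ratio)
```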
Here is an old friend from Calculus, which is easily proven using the Cauchy MVT. Note
that in this formulation we consider a one-sided limit x→ a+, and do not need to know that
the functions are differentiable at the endpoint x = a:
Theorem 6.14 (L'Hopital's Rule). Assume both f, g are differentiable on (a, b), and g′(x) ≠ 0, ∀x ∈ (a, b). Suppose that
lim_{x→a+} f(x) = 0 = lim_{x→a+} g(x), and ∃ L ∈ R with lim_{x→a+} f′(x)/g′(x) = L.
Then,
lim_{x→a+} f(x)/g(x) = L.
Proof. By hypothesis, ∀ε > 0, ∃δ > 0 so that ∀x ∈ (a, a + δ),
L − ε < f′(x)/g′(x) < L + ε.   (6.4)
Take any s, t with a < s < t < a + δ. As in the proof of CMVT, by Rolle's Theorem and
g′(x) ≠ 0, we have g(s) ≠ g(t), and by the CMVT, ∃c ∈ (s, t) ⊂ (a, a + δ) with
(f(t) − f(s))/(g(t) − g(s)) = f′(c)/g′(c).
Substituting into (6.4),
L − ε < f′(c)/g′(c) = (f(t) − f(s))/(g(t) − g(s)) < L + ε.
For t fixed, let s → a+, so we have f(s), g(s) → 0 (by hypothesis), and
L − ε ≤ f(t)/g(t) ≤ L + ε,
for all t ∈ (a, a + δ). By definition, lim_{t→a+} f(t)/g(t) = L.
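A numerical illustration (with f(x) = 1 − cos x and g(x) = x², an arbitrary 0/0 pair): here f′(x)/g′(x) = sin x/(2x) → 1/2 as x → 0+, and the quotient f/g approaches the same limit.

```python
import math

# f(x) = 1 - cos(x) and g(x) = x**2 both tend to 0 as x -> 0+,
# while f'(x)/g'(x) = sin(x)/(2x) -> 1/2; L'Hopital predicts f/g -> 1/2.
vals = [(1.0 - math.cos(x)) / (x * x) for x in (1e-1, 1e-2, 1e-3)]
print(vals)  # values approaching 0.5
```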
6.3 Taylor’s Theorem
Actually, Taylor’s Theorem also qualifies as a direct consequence of the Mean Value Theorem,
but it deserves its own subsection!
Definition 6.15. Let I ⊆ R be an interval.
We say f is continuously differentiable on I, and write f ∈ C1(I), if f is differentiable
at all x ∈ I and f ′(x) is continuous on I (written f ′ ∈ C(I) or C0(I).)
We say f is n-times continuously differentiable on I, f ∈ C^n(I), if the nth derivative
f^(n)(x) = (d^n/dx^n) f(x) exists at every x ∈ I, and is continuous on I (that is, f^(n) ∈ C(I).)
By Theorem 6.3, if f ∈ C^n(I) then automatically each f^(k)(x) is continuous on I, for k =
0, 1, 2, . . . , n − 1. So C^n(I) ⊂ C^k(I) for each k = 0, 1, 2, . . . , n − 1. Recalling Remark 6.2,
continuity and differentiability on the endpoints of the interval I are interpreted in terms of
one-sided limits.
If f is n times differentiable at x0 ∈ I, we may define the nth-order Taylor polynomial,
based at x0,
Pn(x) = Pn(x; x0)
:= f(x0) + f′(x0)(x − x0) + (1/2!) f″(x0)(x − x0)² + · · · + (1/n!) f^(n)(x0)(x − x0)^n
= Σ_{k=0}^{n} (1/k!) f^(k)(x0)(x − x0)^k.   (6.5)
We say that f and Pn agree to nth order at x0:
f(x0) = Pn(x0), f′(x0) = Pn′(x0), . . . , f^(n)(x0) = Pn^(n)(x0).
In fact, Pn is the unique polynomial of degree at most n which agrees with f to order n at x0.
If we stop at n = 1, then P1(x) is the equation of the tangent line to the graph y = f(x),
an old friend from calculus. The tangent line is the best linear approximation to the graph for
x near x0, and we expect Pn to be the best polynomial of order n to approximate the values
of f(x), for x near x0. The precision of this approximation is given by Taylor’s Theorem:
Theorem 6.16 (Taylor’s Theorem). Suppose I ⊆ R is an interval, f ∈ Cn(I), and f (n+1)(x)
exists for all x ∈ I. Then, for any x0, x ∈ I, ∃c ∈ I with c between x0 and x, so that:
f(x) = Pn(x; x0) + (1/(n + 1)!) f^(n+1)(c) (x − x0)^{n+1}.
The remainder term Rn(x; x0) = (1/(n + 1)!) f^(n+1)(c) (x − x0)^{n+1} is the error made by
approximating the value of f(x) by Pn(x; x0).
Proof. This proof uses a trick. Let x0, x ∈ I be fixed, and consider (for variable t ∈ I, and
thinking of x, x0 as constants) the auxiliary function
F(t) = f(x) − f(t) − Σ_{k=1}^{n} (1/k!) (x − t)^k f^(k)(t).
We note that F(x0) = f(x) − Pn(x; x0), which is the remainder term in the Taylor approximation of f.
By the hypotheses on f, F(t) is differentiable for all t ∈ I, and we calculate:
F′(t) = −f′(t) − Σ_{k=1}^{n} [ (−k/k!) (x − t)^{k−1} f^(k)(t) + (1/k!) (x − t)^k f^(k+1)(t) ]
= Σ_{j=0}^{n−1} (1/j!) (x − t)^j f^(j+1)(t) − Σ_{k=0}^{n} (1/k!) (x − t)^k f^(k+1)(t)   (re-indexing the first sum, k = j + 1)
= −(1/n!) (x − t)^n f^(n+1)(t).
Now, define another function
G(t) = F(t) − ((x − t)/(x − x0))^{n+1} F(x0),
which is also differentiable on I, with G(x) = 0 = G(x0). Applying Rolle's Theorem (or the
LMVT), ∃c lying between x and x0 with G′(c) = 0. Hence,
0 = G′(c) = F′(c) + (n + 1) ((x − c)^n/(x − x0)^{n+1}) F(x0),
and rearranging the above we get
F(x0) = −(1/(n + 1)) ((x − x0)^{n+1}/(x − c)^n) F′(c)
= (1/(n + 1)) ((x − x0)^{n+1}/(x − c)^n) (1/n!) (x − c)^n f^(n+1)(c)
= (1/(n + 1)!) (x − x0)^{n+1} f^(n+1)(c),
which is what we had to prove.
Example 6.17. Show that
x − x³/6 ≤ sin x ≤ x, ∀ x ∈ [0, π/2].
Use Taylor, based at x0 = 0, but which order n? f(x) = sin x is C^∞(R); it is differentiable
to every order. Since the highest order in the estimate is x³, let's try n = 2 (and the cubic
will come from the remainder.) Since P2(x; 0) = x, by Taylor's Theorem, ∀x ∈ (0, π/2],
∃c = c(x) with 0 < c < x and
sin x = P2(x; 0) + (1/3!) f‴(c)(x − 0)³ = x − (1/6) cos(c) x³.
Since c ∈ (0, π/2), 0 < cos(c) < 1, and therefore we obtain the desired estimates (with
equality at x = 0).
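These bounds are easy to spot-check numerically. The Python snippet below (grid size chosen arbitrarily; a check, not a proof) verifies both inequalities on a grid covering [0, π/2]:

```python
import math

# Check x - x**3/6 <= sin(x) <= x on an evenly spaced grid over [0, pi/2]
grid = [k * (math.pi / 2.0) / 200.0 for k in range(201)]
ok = all(x - x ** 3 / 6.0 <= math.sin(x) <= x for x in grid)
print(ok)
```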
Here's a question: assume f is C^∞(R), that is, all the derivatives f^(k)(x) exist for x ∈ R,
and at x = 0 all derivatives vanish: f^(k)(0) = 0. Is it true that f(x) ≡ 0, the zero function?
Example 6.18. Define a function
g(x) = e^{−1/x} if x > 0, and g(x) = 0 if x ≤ 0.
As an Exercise, use Taylor's Theorem to prove that ∀n ∈ N,
e^{1/x} ≥ (1/n!) x^{−n}, ∀x > 0.   (6.6)
[Hint: use u = 1/x and apply Taylor to e^u for u ∈ [0,∞).]
As a consequence of (6.6), for any k ∈ N, it is another Exercise to show that
lim_{x→0} g(x)/x^k = 0.   (6.7)
[Hint: use the Squeeze Theorem.]
It turns out that g is C^∞(R). This is not hard to prove, but is a bit long (there is a sketch
below.) Let's assume this, and verify that for every n = 0, 1, 2, . . . , the Taylor polynomial
Pn(x; 0) ≡ 0. So every derivative g^(n)(0) = 0, yet g is not the constant zero function. We
use induction on n. When n = 0, P0(x; 0) = g(0) = 0. Assume Pn−1(x; 0) ≡ 0, so that
Pn(x; 0) = an x^n with an = g^(n)(0)/n!. By Taylor's Theorem, ∀x > 0, ∃c ∈ (0, x) so that
g(x) = Pn(x; 0) + (1/(n + 1)!) g^(n+1)(c) x^{n+1} = an x^n + (1/(n + 1)!) g^(n+1)(c) x^{n+1}.
By (6.7),
0 = lim_{x→0} g(x)/x^n = lim_{x→0} [ an + (1/(n + 1)!) g^(n+1)(c) x ] = an,
since 0 < c < x and we are assuming g^(n+1) is continuous. Thus Pn(x; 0) ≡ 0 and g^(n)(0) = 0.
By induction we conclude this is true ∀n ∈ N.
Thus, g(x) is an example of a function which vanishes to all orders at x = 0, yet is not
the constant zero function.
[Here we sketch how to show that g ∈ C^∞ in a neighborhood of zero. The first step is to
verify, by induction, that for each n ∈ N there exists a polynomial Qn(u) for which, by the
usual Chain and Product Rules, g is n-times differentiable at every x > 0, with
(d^n/dx^n) g(x) = (d^n/dx^n) e^{−1/x} = Qn(1/x) e^{−1/x} = Qn(1/x) g(x).
Notice that g′(x) = (1/x²) g(x), so Q1(u) = u². And when taking higher derivatives,
differentiating Qn(1/x) produces another polynomial in u = 1/x.
Now use the definition of the derivative and (6.7) to show that
g^(n+1)(0) = lim_{x→0+} (g^(n)(x) − 0)/(x − 0) = 0.
This verifies that g ∈ C^∞(R), with each g^(n)(0) = 0.] ♦
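Numerically, the flatness of g at 0 is striking. A Python check of (6.7) at a single sample point (x = 0.01, chosen arbitrarily): g(x) = e^{−100} is so small that dividing by high powers of x still leaves a tiny number.

```python
import math

def g(x):
    # the flat function of Example 6.18
    return math.exp(-1.0 / x) if x > 0 else 0.0

x = 1e-2
# g(0.01) = e^(-100), astronomically small, so g(x)/x**k remains tiny
# even for the higher powers k = 1, ..., 5:
ratios = [g(x) / x ** k for k in range(1, 6)]
print(ratios)
```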
Theorem 6.19 (Second Derivative Test). Assume that f is C2 for x ∈ (a, b), and c ∈ (a, b)
with f ′(c) = 0.
(a) If f″(c) > 0, then c is a strict local minimum for f: ∃δ > 0 so that f(c) < f(x),
∀x ∈ (c − δ, c + δ) with x ≠ c.
(b) If f″(c) < 0, then c is a strict local maximum for f: ∃δ > 0 so that f(c) > f(x),
∀x ∈ (c − δ, c + δ) with x ≠ c.
Proof. We prove (a); for (b), apply (a) to g(x) = −f(x).
Since by hypothesis f″(x) is continuous in (a, b) and f″(c) > 0, ∃δ > 0 so that f″(x) > 0
for all x ∈ (c − δ, c + δ). Let x ∈ (c − δ, c + δ), x ≠ c. Applying Taylor's Theorem with n = 1,
∃u between x and c for which
f(x) = f(c) + f′(c)(x − c) + (1/2) f″(u)(x − c)² = f(c) + (1/2) f″(u)(x − c)²,
since f′(c) = 0. As u ∈ (c − δ, c + δ), f″(u) > 0, and so f(x) > f(c).
Taylor’s Theorem is also useful in Numerical Analysis, in which we want to solve prob-
lems in calculus or differential equations by approximation algorithms implemented on the
computer. To do this, we need to discretize the variables so as to only retain a finite number
of values to represent a function and its derivatives. The method of finite differences consists
in approximating the derivative by difference quotients, with values of f chosen on a grid of
x-values, with spacing xk − xk−1 = h between them.
One of the easiest approximations to the derivative at a point x is the "forward difference",
Dh f(x) = (f(x + h) − f(x))/h, h > 0.
This is a “forward” difference since it compares f(x) with the next value forward f(x+ h).
You could also make the "backward difference",
D_{−h} f(x) = (f(x) − f(x − h))/h, h > 0,
which is the same as using a negative value for h in the forward difference (and so its analysis
will be exactly the same as for Dh f(x).) Let's assume that f is C² in an open interval I
including x, and h is small enough so that x + h ∈ I also. Then, applying Taylor's Theorem
with n = 1 (i.e., with remainder term involving the second derivative f″), ∃c between x and x + h
with:
f(x + h) = f(x) + f′(x)[(x + h) − x] + (1/2!) f″(c)((x + h) − x)² = f(x) + h f′(x) + (h²/2!) f″(c).
Rearranging,
Dh f(x) − f′(x) = (f(x + h) − f(x) − h f′(x))/h = (h/2) f″(c).
Assume that we know that the second derivative is bounded,
sup_{x∈I} |f″(x)| ≤ M.
Then, we have the following estimate on the error made by approximating f′(x) by the forward
difference:
|Dh f(x) − f′(x)| ≤ (1/2) M h.
We say that the forward difference gives a first order accurate approximation to the deriva-
tive, because the error of approximation is at most proportional to the first power h = h1 of
the step size h.
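This first-order behavior is visible numerically. In the Python sketch below (f = exp at x = 0, so f′(0) = 1; the choices of function, point, and step sizes are arbitrary), shrinking h by a factor of 10 shrinks the forward-difference error by roughly a factor of 10:

```python
import math

def forward_diff(f, x, h):
    # the forward difference D_h f(x) = (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

# error of the forward difference for f = exp at x = 0, where f'(0) = 1
errs = [abs(forward_diff(math.exp, 0.0, h) - 1.0) for h in (1e-1, 1e-2, 1e-3)]
print(errs)  # each error roughly 10x smaller than the previous one
```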
Consider the centered difference approximation to the derivative,
D^c_h f(x) = (f(x + h) − f(x − h))/(2h), h > 0.
It turns out that this is a better approximation to f ′(x), in the sense that it is second order
accurate.
Exercise 6.20. Suppose f is three times differentiable in I, and ∃M ≥ 0 with the third
derivative uniformly bounded by M, that is,
sup_{x∈I} |f^(3)(x)| ≤ M.
Use Taylor's Theorem (with n = 2) to show that if x − h, x, x + h ∈ I, then
|(f(x + h) − f(x − h))/(2h) − f′(x)| ≤ (1/6) M h².
♦
As the question suggests, you will apply Taylor's Theorem with n = 2 (with remainder
involving f‴) to obtain two values, c+ lying between x and x + h, and c− lying between x − h
and x, with
f(x ± h) = f(x) ± h f′(x) + (1/2) h² f″(x) ± (1/6) h³ f‴(c±).
Now compute |D^c_h f(x) − f′(x)|, and finish as above.
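Comparing the two schemes side by side makes the difference in order visible. In the Python sketch below (f = sin at x = 1, arbitrary illustrative choices), shrinking h by a factor of 10 shrinks the forward-difference error by about 10 but the centered-difference error by about 100:

```python
import math

x = 1.0
exact = math.cos(x)  # f'(x) for f = sin

def forward(h):
    return (math.sin(x + h) - math.sin(x)) / h

def centered(h):
    return (math.sin(x + h) - math.sin(x - h)) / (2.0 * h)

fwd_errs = [abs(forward(h) - exact) for h in (1e-2, 1e-3)]
cen_errs = [abs(centered(h) - exact) for h in (1e-2, 1e-3)]
print(fwd_errs, cen_errs)  # ratios near 10 and near 100, respectively
```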
7 The Fundamental Theorem of Calculus
Theorem 7.1 (Fundamental Theorem of Calculus, I). Assume that f is differentiable on
[a, b], and the derivative f ′(x) is Riemann integrable on [a, b]. Then,
f(b) − f(a) = ∫_a^b f′(x) dx.
Proof. Let ε > 0 be any given value. Since f′(x) is Riemann integrable, by Darboux's
Theorem there exists a partition of [a, b],
P = {a = x0 < x1 < · · · < xN = b},
so that 0 ≤ U_P(f′) − L_P(f′) < ε. By Theorem 6.3, since f is differentiable it is also continuous
on [a, b]. We apply the Mean Value Theorem to f on each subinterval [x_{i−1}, x_i] of the
partition, i = 1, 2, . . . , N: there exists c_i ∈ (x_{i−1}, x_i) so that f′(c_i)(x_i − x_{i−1}) = f(x_i) − f(x_{i−1}),
∀i = 1, 2, . . . , N. We then use these values to make a "tagged" partition P and a Riemann
sum,
S(f′, P) = Σ_{i=1}^{N} f′(c_i)(x_i − x_{i−1}) = Σ_{i=1}^{N} [f(x_i) − f(x_{i−1})]   (telescoping sum!)
= f(x_N) − f(x_0) = f(b) − f(a).
Since m_i(P) ≤ f′(c_i) ≤ M_i(P), we have L_P(f′) ≤ S(f′, P) ≤ U_P(f′) (this is true for any
Riemann sum), and since f′ is Riemann integrable we similarly have L_P(f′) ≤ ∫_a^b f′(x) dx ≤
U_P(f′). By Darboux's condition 0 ≤ U_P(f′) − L_P(f′) < ε, and so
|f(b) − f(a) − ∫_a^b f′(x) dx| = |S(f′, P) − ∫_a^b f′(x) dx| < ε,
true for any ε > 0. This implies f(b) − f(a) − ∫_a^b f′(x) dx = 0, as desired.
If g is a Riemann integrable function on [a, b] and c ∈ [a, b], we may define a function
f : [a, b] → R by integration,
f(x) = ∫_c^x g(t) dt.   (7.1)
Exercise 7.2. For g Riemann integrable on [a, b], show that f (as defined above) is Lipschitz
continuous on [a, b].
Theorem 7.3 (Fundamental Theorem of Calculus, II). If g is a continuous function on [a, b],
then f(x) = ∫_a^x g(t) dt is differentiable on [a, b], with f′(x) = g(x) ∀x ∈ [a, b].
For this version of FTC we need the Mean Value Theorem in a slightly different form:
Theorem 7.4 (MVT for Integrals). If g is a continuous function on [α, β], then there exists
c ∈ (α, β) with
g(c) = (1/(β − α)) ∫_α^β g(t) dt.
That is, a continuous g attains its mean (average) value on any closed interval. Notice
that if g(x) = f ′(x) for some differentiable function f , then (applying FTC I) we recover
Lagrange’s Mean Value Theorem for f !
Proof of MVT4I. Since g is continuous on [α, β], it attains its maximum and minimum values,
so ∃u, v ∈ [α, β] with
g(u) ≤ g(t) ≤ g(v), ∀t ∈ [α, β].
By a property of the integral,
g(u)(β − α) = ∫_α^β g(u) dt ≤ ∫_α^β g(t) dt ≤ ∫_α^β g(v) dt = g(v)(β − α),
that is,
g(u) ≤ (1/(β − α)) ∫_α^β g(t) dt ≤ g(v).
So by the Intermediate Value Theorem, ∃c between u and v (and therefore c ∈ (α, β)), with
g(c) = (1/(β − α)) ∫_α^β g(t) dt.
Proof of FTC II. Take any s, x ∈ [a, b], x ≠ s. We look at the difference quotient for f, and
apply a property of the Riemann integral:
(f(x) − f(s))/(x − s) = (1/(x − s)) [ ∫_a^x g(t) dt − ∫_a^s g(t) dt ] = (1/(x − s)) ∫_s^x g(t) dt.
(Recall that we use the convention that ∫_x^s g(t) dt = −∫_s^x g(t) dt, so we don't need to worry
whether x < s or s < x.) Since g is continuous on [a, b], we may use MVT4I to obtain c
between x and s, with
g(c) = (1/(x − s)) ∫_s^x g(t) dt.
Therefore,
|(f(x) − f(s))/(x − s) − g(x)| = |g(c) − g(x)|.
Since g is continuous at x, ∀ε > 0 ∃δ > 0 so that |g(y) − g(x)| < ε whenever |y − x| < δ. As
c lies between x and s, |c − x| < δ whenever |x − s| < δ, so
|(f(x) − f(s))/(x − s) − g(x)| < ε, ∀s with 0 < |x − s| < δ.
By definition of the limit, f is differentiable at x ∈ [a, b], with f′(x) = g(x).
Corollary 7.5. If g is continuous on [a, b], then g has an antiderivative, f (defined as in
(7.1)) which is differentiable on [a, b].
Remark 7.6. You may have noticed there is a little asymmetry between FTC I and FTC
II: in the first, we only assume f′ is Riemann integrable, while in the second we ask for g to
be continuous. In fact, the integral of a Riemann integrable function need not be differentiable
at every point of [a, b]: for example, if we take
g(t) = 1 if t ≥ 0, and g(t) = −1 if t < 0,
then g is Riemann integrable on [−1, 1], and f(x) = ∫_0^x g(t) dt = |x|, which fails to be
differentiable at x = 0. (It is Lipschitz continuous, as the Exercise above asks!)
Example 7.7. Define h(x) = ∫_0^{ln x} ln t dt. For which x is h differentiable, and what is h′(x)?
This is a composition,
h(x) = f(φ(x)), with φ(x) = ln x and f(u) = ∫_0^u ln t dt.
As this is analysis (and not calculus) we need to be sure the appropriate Theorems are
applicable. f is differentiable for u ∈ (0,∞), since FTC II applies on any closed interval
[a, b] as long as a > 0. By the Chain Rule (Theorem 6.5), we need φ(x) to be differentiable
on an interval I with φ(I) ⊂ (0,∞), the domain where f is differentiable. Therefore, we need
to choose I with φ(x) = ln x ∈ (0,∞), so any I ⊂ (1,∞) will do. Hence, h is differentiable
on (1,∞), with
h′(x) = f′(φ(x)) φ′(x) = ln(ln x)/x, x ∈ (1,∞).
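Since the integrand has the elementary antiderivative t ln t − t, and t ln t − t → 0 as t → 0+, the formula h′(x) = ln(ln x)/x can be checked numerically. A Python sketch (the point x = 5 is an arbitrary choice, and the closed form for the improper integral is an observation, not part of the example's argument):

```python
import math

def F(u):
    # antiderivative of ln(t), extended so that F(u) -> 0 as u -> 0+
    return u * math.log(u) - u if u > 0 else 0.0

def h(x):
    # h(x) = integral from 0 to ln(x) of ln(t) dt, valid for x > 1
    return F(math.log(x))

x = 5.0
eps = 1e-6
diff_quot = (h(x + eps) - h(x - eps)) / (2.0 * eps)
formula = math.log(math.log(x)) / x  # h'(x) from the Chain Rule + FTC II
print(diff_quot, formula)
```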