REAL ANALYSIS I ––– BASIC MEASURE AND INTEGRATION

nweb2.uwindsor.ca/math/traynor/RA1/510.pdf

This course will follow a near minimal path through those aspects of the theory of Lebesgue measure and integral used in virtually all applications in analysis and probability theory.

We begin, not with abstract measures, but rather concretely, with the construction of Lebesgue measure in n dimensions, since most people have a reasonably good feel for how area and volume “should” behave. This is done in such a way as to illustrate the major techniques applicable also to other measures, such as those used to model distribution of mass, of electrical charge, or of probability. The reader is led to take an active part in the appropriate abstraction processes.

The integral of a real-valued function is constructed using the ordering of the real line, so that it may be viewed, as in calculus, as the “area under the curve”. We find that the integral may be characterized by five basic properties; these actually may be taken as a starting point for many purposes.

Topics covered include Lebesgue outer measure, the Caratheodory process, Borel sets, sigma-rings and the Unique Extension Theorem, measurable sets (= events), measurable functions (= random variables), image measures (= distributions), the Lebesgue integral, the Monotone Convergence Theorem, Fatou’s Lemma, the Dominated Convergence Theorem, differentiation under the integral, change of variable, formulas for “expectation”, the connection with the Riemann integral, absolute continuity, singularity, the Radon-Nikodym theorem, the Lebesgue differentiation theorem, and the Fubini Theorem. We also define the Lp spaces and show that they are complete seminormed spaces, and give the connections between various notions of convergence of sequences of functions (almost everywhere, almost uniform, in measure, and in Lp).

Area in the plane.

When we say that the area of a planar object is “A square units”, what we are doing is comparing the object with a square, 1 unit on each side. In making the comparison, we take it for granted that the areas of certain figures (such as rectangles) exist, that the area of a rectangle is finite and non-negative, that any translate of an object having an area has the same area, and that if E and F are disjoint and have areas, then the set E ∪ F also has an area, in fact the sum of the areas of E and F.

A consequence of these assumptions is that if the unit square [0, 1] × [0, 1] has area 1, then an interval of the type [a, b] × [c, d] must have area (b − a) × (d − c), which is why we were taught this formula in school. This can be proved in a straightforward, though slightly tedious, manner. The reader is invited to carry this out. An outline is provided in the exercises.

6/9/2006 1160 mam

For our treatment here, we’ll take this formula as our starting point, and later we will show that it was our only choice, because of the very useful Unique Extension Theorem.

What about other figures? Do discs have areas? Segments of discs? The portion of the plane between a parabola and one of the axes? Could we define area in such a way that if An is an increasing sequence of sets with area, then the union also has an area, one which is approximated by the areas of the An?

The answer to all of these turns out to be “yes”. We will find that all sets in the plane which we normally run into will have a well defined area satisfying these criteria; nevertheless, there will be some sets which cannot be assigned an area. These must be carefully constructed using some form of the Axiom of Choice.

1. Lebesgue measure on intervals of Rn.

Let I be the family of bounded intervals in Rn, that is, the sets of the form I = I1 × · · · × In, where each Ik is a bounded interval¹ of R. Each Ik is of one of the four forms [ak, bk], (ak, bk), (ak, bk], [ak, bk). We say that I has endpoints a = (a1, . . . , an), b = (b1, . . . , bn). The Lebesgue volume or (n-dimensional) Lebesgue measure of such an interval I is defined to be

λn(I) = (b1 − a1) · · · (bn − an).

It is worth noting that this says λn(I) = λ1(I1) · · · λ1(In). If it is clear what is meant, we will often drop the subscript n in λn.
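As a quick illustration of the definition, the following Python sketch computes the volume of a bounded interval from its endpoints. The function name `volume` and the representation of endpoints as tuples are assumptions made for this example only; they are not part of the notes.

```python
from math import prod

def volume(a, b):
    """Lebesgue volume of a bounded interval of R^n with endpoints a <= b.

    Whether each side is open, closed or half-open does not affect the value.
    """
    if len(a) != len(b):
        raise ValueError("endpoints must have the same dimension")
    if any(ai > bi for ai, bi in zip(a, b)):
        raise ValueError("need a_i <= b_i in every coordinate")
    return prod(bi - ai for ai, bi in zip(a, b))

# A box in R^3 with side lengths 2, 3 and 0.5.
box_volume = volume((0, 1, 2), (2, 4, 2.5))

# A degenerate interval (one side of length 0) has volume 0.
flat_volume = volume((1, 1), (1, 5))
```

Splitting the rectangle with endpoints (0, 0), (3, 2) along x = 1 gives pieces whose volumes add up to the volume of the whole, which is the 2-additivity proved later in this section.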

A function ϕ defined (at least) on a family C of sets with values in the extended reals is called finitely additive on C if, whenever H is a finite disjoint subfamily of C whose union also belongs to C, then

\[ \varphi\Bigl(\bigcup_{E \in \mathcal{H}} E\Bigr) = \sum_{E \in \mathcal{H}} \varphi(E). \]

Empty unions are considered ∅ and empty sums are considered 0, so in case ∅ ∈ C, this is interpreted as saying ϕ(∅) = 0:

\[ \varphi(\emptyset) = \varphi\Bigl(\bigcup_{E \in \emptyset} E\Bigr) = \sum_{E \in \emptyset} \varphi(E) = 0. \]

The function ϕ is called additive (or sometimes “2-additive” for emphasis) on C if the above holds for families H of 2 sets; that is,

ϕ(A ∪ B) = ϕ(A) + ϕ(B), if A, B, A ∪ B ∈ C and A ∩ B = ∅;

ϕ is called countably additive (or σ-additive) on C if

\[ \varphi\Bigl(\bigcup_{I \in \mathcal{H}} I\Bigr) = \sum_{I \in \mathcal{H}} \varphi(I) \]

whenever H is a countable disjoint subfamily of C whose union also belongs to C.

¹ Generally, if a capital letter such as I denotes an interval, then Ik will denote the kth factor.


1 Theorem. Lebesgue volume is finitely additive on the bounded intervals.

(Actually, we will see in Theorem 7 that it is countably additive.)

For the proof of finite additivity, we need some simple facts about products of sets (not necessarily intervals).

2 Lemma. Let A = A1 × · · · × An, B = B1 × · · · × Bn, and C = C1 × · · · × Cn. Then:

(1) B ∩ C = (B1 ∩ C1) × · · · × (Bn ∩ Cn).

(2) If B and C are disjoint, then for some k, Bk and Ck are disjoint; moreover,

(3) if also A = B ∪ C and B, C are non-empty, then for the k of (2), Ak = Bk ∪ Ck, while for i ≠ k, Ai = Bi = Ci.

Proof. Exercise.

3 Lemma. If B and C are disjoint intervals contained in the interval A, there exist disjoint intervals J and K containing B and C, respectively, with A = J ∪ K.

Proof. By the previous lemma, there exists k with Bk ∩ Ck = ∅. We may assume Bk is to the left of Ck. Let d be the right endpoint of Bk and let H be either the closed half-space {x : xk ≤ d} or the open half-space {x : xk < d}, depending on whether or not d ∈ Bk. Then J = A ∩ H and K = A ∩ Hc are the required intervals.

Proof of the theorem. First we show that λn is 2-additive on the set I of bounded intervals. Certainly if n = 1 and a ≤ b ≤ c, then λ1((a, b]) + λ1((b, c]) = b − a + c − b = c − a = λ1((a, c]); similarly for the other types of intervals. If n > 1, let A = B ∪ C ∈ I, where B, C are disjoint non-empty intervals. Since B ∩ C = ∅, there is an index k with Bk ∩ Ck = ∅, and since A = B ∪ C, Ak = Bk ∪ Ck and on the other sides Ai = Bi = Ci. Thus,

\begin{align*}
\lambda_n(A) &= \Bigl(\prod_{i \neq k} \lambda_1(A_i)\Bigr) \cdot \lambda_1(B_k \cup C_k) \\
&= \Bigl(\prod_{i \neq k} \lambda_1(A_i)\Bigr) \cdot \bigl(\lambda_1(B_k) + \lambda_1(C_k)\bigr) \\
&= \Bigl(\prod_{i \neq k} \lambda_1(A_i)\Bigr) \cdot \lambda_1(B_k) + \Bigl(\prod_{i \neq k} \lambda_1(A_i)\Bigr) \cdot \lambda_1(C_k) \\
&= \lambda_n(B) + \lambda_n(C).
\end{align*}

Now we continue to show by induction that λn is finitely additive on I. Suppose H is a disjoint set of m intervals with A = ⋃_{I∈H} I ∈ I. We want to show λ(A) = ∑_{I∈H} λ(I). For m = 0, we are looking at λ(∅), which is the measure of the degenerate interval: λ((a, a)) = ∏_i (ai − ai) = 0. The case m = 1 is trivial and we have already checked the case m = 2.

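Let m > 2, and assume the statement true for fewer than m sets. Let B, C be any two members of H. Since these are disjoint, the lemma yields disjoint intervals J, K with J ⊃ B, K ⊃ C and A = J ∪ K. Since B ∩ K = ∅ = C ∩ J, each of the families {J ∩ I : I ∈ H} and {K ∩ I : I ∈ H} has fewer than m non-empty elements; the former has union J and the latter has union K, so using the inductive hypothesis and 2-additivity twice, we get

\begin{align*}
\lambda(A) &= \lambda(J) + \lambda(K) \\
&= \sum_{\substack{I \in \mathcal{H} \\ I \cap J \neq \emptyset}} \lambda(J \cap I) + \sum_{\substack{I \in \mathcal{H} \\ I \cap K \neq \emptyset}} \lambda(K \cap I) \\
&= \sum_{I \in \mathcal{H}} \lambda(J \cap I) + \sum_{I \in \mathcal{H}} \lambda(K \cap I) \\
&= \sum_{I \in \mathcal{H}} \bigl(\lambda(J \cap I) + \lambda(K \cap I)\bigr) \\
&= \sum_{I \in \mathcal{H}} \lambda((J \cup K) \cap I) \\
&= \sum_{I \in \mathcal{H}} \lambda(I),
\end{align*}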

as required.

A finite union of bounded intervals is called an elementary set (or elementary figure). Let E denote the set of all of these and let E0 denote the set of finite disjoint unions² of bounded intervals. We will show that these are actually the same. But first, notice that for any family C of sets, if A and B are disjoint and each is a finite disjoint union of sets in C, then so is A ∪ B. Indeed, if A = ⋃H and B = ⋃H′, where H and H′ are finite disjoint subfamilies of C and A is disjoint from B, then H ∪ H′ is also a disjoint subfamily of C, so A ∪ B = ⋃(H ∪ H′) is also the union of a finite disjoint subfamily of C.

4 Lemma.

(a) If A, B ∈ I, then A ∩ B ∈ I

(b) If A, B ∈ I, then A \ B ∈ E0.

(c) If A ∈ E0, B ∈ I, then A \ B ∈ E0.

(d) If A ∈ E0, B ∈ E , then A \ B ∈ E0.

(e) If A ∈ E , then A ∈ E0, so E = E0.

Proof. (a) That the family of bounded intervals is closed under finite intersections is clear. (It reduces to the one-dimensional case through the formula A ∩ B = (A1 ∩ B1) × · · · × (An ∩ Bn), and there one just looks at the various cases, for example [a, b) ∩ [c, d) = [e, f), where e = max{a, c} and f = min{b, d}. If e ≥ f, then [e, f) is empty.)

² “A is a finite disjoint union of sets in C” means there exists a finite disjoint subfamily H of C with A = ⋃H.

(b) This is proved by induction. Certainly, if n = 1 and A, B ∈ I, then A \ B is the disjoint union of at most 2 intervals. The identity

(E1 × E2) \ (F1 × F2) = [(E1 \ F1) × E2] ∪ [(E1 ∩ F1) × (E2 \ F2)] (∗)

is used for the inductive step: if n > 1, and the result holds for dimensions < n, put E1 = A1 × · · · × An−1, E2 = An, F1 = B1 × · · · × Bn−1, F2 = Bn. Then, by the inductive hypothesis, E1 \ F1 is a finite disjoint union of intervals of Rn−1. Thus, (E1 \ F1) × E2 is a finite disjoint union of intervals of Rn. In the same way, (E1 ∩ F1) × (E2 \ F2) is a finite disjoint union of intervals of Rn. Thus, by (∗), A \ B is the disjoint union of two finite disjoint unions of intervals, and hence is one also: A \ B ∈ E0.
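The induction in (b) can be traced in code. The sketch below is only an illustration under a simplifying assumption: every interval is half-open, a product of factors [lo, hi), so that one representation covers all cases. The names `volume`, `intersect`, `is_empty` and `difference` are hypothetical helpers introduced here, not notation from the notes.

```python
from math import prod

# A box is a tuple of (lo, hi) pairs and stands for the product of [lo, hi).

def volume(box):
    """Product of the side lengths (0 for an empty box)."""
    return prod(max(hi - lo, 0) for lo, hi in box)

def intersect(a, b):
    """Coordinatewise intersection of two boxes (possibly empty)."""
    return tuple((max(al, bl), min(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))

def is_empty(box):
    return any(lo >= hi for lo, hi in box)

def difference(a, b):
    """List of pairwise disjoint boxes whose union is a minus b."""
    if is_empty(a):
        return []
    if is_empty(intersect(a, b)):
        return [a]
    if len(a) == 1:
        # One dimension: at most two intervals remain, left and right of b.
        (al, ah), (bl, bh) = a[0], b[0]
        pieces = [((al, min(ah, bl)),), ((max(al, bh), ah),)]
        return [p for p in pieces if not is_empty(p)]
    e1, e2 = a[:-1], a[-1:]   # E1 = first n-1 factors, E2 = last factor
    f1, f2 = b[:-1], b[-1:]
    # Identity (*): (E1 \ F1) x E2  together with  (E1 ∩ F1) x (E2 \ F2).
    first = [p + e2 for p in difference(e1, f1)]
    common = intersect(e1, f1)
    second = [common + q for q in difference(e2, f2)]
    return first + second

a = ((0, 4), (0, 4))
b = ((1, 2), (1, 3))
pieces = difference(a, b)
```

For these two rectangles the recursion produces four disjoint pieces, and their volumes sum to the volume of a minus the volume of a ∩ b, as finite additivity requires.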

(c) If H is a finite disjoint family in I, A = ⋃H and B ∈ I, then

\[ A \setminus B = \Bigl(\bigcup_{I \in \mathcal{H}} I\Bigr) \setminus B = \bigcup_{I \in \mathcal{H}} (I \setminus B). \]

But I \ B ∈ E0 by (b) and the sets I \ B are disjoint, so A \ B is in E0.

(d) This follows from (c) by induction.

(e) If A = A1 ∪ · · · ∪ Am, where each Ai ∈ I, put B1 = A1, B2 = A2 \ A1, B3 = A3 \ (A1 ∪ A2), . . . , Bm = Am \ (A1 ∪ · · · ∪ Am−1). Then the Bi are disjoint, they belong to E0 by (d), and A = ⋃_{i=1}^m Ai = ⋃_{i=1}^m Bi, which belongs to E0.

The process used in establishing (e), known as “disjointification”, is extremely useful.
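Disjointification applies to any sequence of sets, not only to intervals. As a minimal sketch, here it is carried out on finite Python sets; the function name `disjointify` is an assumption made for this illustration.

```python
def disjointify(sets):
    """Return B_1, ..., B_m with B_k = A_k minus (A_1 ∪ ... ∪ A_{k-1})."""
    seen = set()      # running union A_1 ∪ ... ∪ A_{k-1}
    result = []
    for a in sets:
        a = set(a)
        result.append(a - seen)
        seen |= a
    return result

A = [{1, 2, 3}, {2, 3, 4}, {1, 5}]
B = disjointify(A)
```

The resulting sets are pairwise disjoint, and each partial union of the B's agrees with the corresponding partial union of the A's.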

A family P of sets is called a prering if it is closed under finite intersection and the difference of two members of P is a finite disjoint union of members of P. Thus, (a) and (b) state that I is a prering.

A family of sets R is called a (Boolean) ring if it is closed under finite union and difference, hence also intersection: A, B ∈ R implies A ∪ B ∈ R, A \ B ∈ R (hence also A ∩ B = A \ (A \ B) ∈ R). (Munroe calls a ring an “additive class”.)

5 Proposition. The family of elementary sets is a ring of sets.

Proof. If A, B ∈ E, then each of A and B is a finite union of intervals, so A ∪ B is one also; that is, A ∪ B ∈ E. Furthermore, since E = E0, A \ B ∈ E by (d) of the lemma.

What we have called a “prering” is usually called a “semiring”, but this term also has at least two other meanings in the measure theory literature. The reader can prove now that a family P of sets is a prering iff it is closed under intersection and the family of finite unions of sets in P is a ring.



Subadditivity and superadditivity.

A set-function ϕ (that is, a function whose domain is a family of sets) is called finitely subadditive on C if

\[ \varphi(A) \le \sum_{E \in \mathcal{H}} \varphi(E), \]

whenever H is a finite subfamily of C, A ∈ C, and A ⊂ ⋃H. It is called finitely superadditive on C if the reverse inequality holds for finite disjoint H ⊂ C and A ⊃ ⋃H. It should be noticed that either of these implies ϕ is monotone (more precisely, isotone) on C in the sense that A ⊂ B implies ϕ(A) ≤ ϕ(B), for A, B ∈ C.

6 Proposition. Lebesgue volume is both finitely subadditive and finitely superadditive on the bounded intervals.

Proof. We show superadditivity first. If A ∈ I, H is a finite disjoint subfamily of I, and A ⊃ ⋃H, then ⋃H ∈ E0, and hence A \ ⋃H is a finite disjoint union of members of I; say, A \ ⋃H = ⋃H′, where H′ is a finite disjoint family of intervals. Then, by the finite additivity of λ on I, λ(A) = ∑_{I∈H} λ(I) + ∑_{I∈H′} λ(I) ≥ ∑_{I∈H} λ(I), since each λ(I) ≥ 0.³

To prove the subadditivity, we use the disjointification procedure again. Suppose A, A1, A2, . . . , Am ∈ I with A ⊂ ⋃_{i=1}^m Ai. Put B1 = A1, and for i > 1, put Bi = Ai \ ⋃_{k<i} Ak. Then the Bi are disjoint and each is the union of a finite disjoint family Hi of intervals contained in Ai. Thus

\[ A = \bigcup_{i=1}^{m} (A \cap B_i) = \bigcup_{i=1}^{m} \bigcup_{I \in \mathcal{H}_i} (A \cap I) \]

and by the finite additivity, then the superadditivity,

\[ \lambda(A) = \sum_{i=1}^{m} \sum_{I \in \mathcal{H}_i} \lambda(A \cap I) \le \sum_{i=1}^{m} \lambda(A_i), \]

as required.

Countable additivity.

We are nearly ready to prove that λ is not only finitely, but countably additive on I. For this, we use some topological properties. Recall the

Heine-Borel Theorem. Every closed, bounded subset K of Rn is compact, in the sense that if K is covered by a family U of open sets, then it is covered by a finite subfamily of U.

If I is a bounded interval, then there exists a compact interval K ⊂ I with λ(I) − λ(K) as small as we wish. Indeed, if λ(I) = 0, then K = ∅ is a compact interval, also of measure 0, while if I has endpoints a, b with a < b (that is, with ai < bi for all i) and a < a′ < b′ < b, then the interval K = [a′, b′] ⊂ I and λ(K) = ∏_{i=1}^n (b′i − a′i), which converges to ∏_{i=1}^n (bi − ai) = λ(I) as a′ → a and b′ → b. In the same way, each I ∈ I is contained in an open interval U with λ(U) − λ(I) arbitrarily small.

³ Warning: λ(⋃H) is not defined at this point.

7 Theorem. Lebesgue volume is countably additive on the bounded intervals.

Proof. We already know λ is finitely additive, so we are left to establish the countably infinite case. To do this we establish countable superadditivity and countable subadditivity.

Countable superadditivity is simple. Let I ∈ I, and let (Ik) be a disjoint sequence of elements of I with I ⊃ ⋃_k Ik. Since λ is finitely superadditive on the intervals, λ(I) ≥ ∑_{k=1}^m λ(Ik) for every m, and, letting m → ∞, we obtain λ(I) ≥ ∑_{k=1}^∞ λ(Ik).

Now to prove countable subadditivity, let I ∈ I, and let (Ik) be a sequence of elements of I with I ⊂ ⋃_k Ik. Fix ε > 0, let K be a compact interval contained in I with λ(I) − λ(K) < ε, and for each k let Uk be an open interval containing Ik with λ(Uk) − λ(Ik) < ε/2^k. Then {Uk : k ∈ N} is an open cover of K, so a finite subfamily {Uk : k ≤ m} also covers K:

\[ K \subset \bigcup_{k=1}^{m} U_k. \]

But λ is finitely subadditive, and hence

\begin{align*}
\lambda(I) - \varepsilon < \lambda(K) &\le \sum_{k=1}^{m} \lambda(U_k) \\
&< \sum_{k=1}^{m} \bigl(\lambda(I_k) + \varepsilon/2^k\bigr) \\
&< \sum_{k=1}^{\infty} \lambda(I_k) + \varepsilon.
\end{align*}

Since ε is arbitrary, λ(I) ≤ ∑_{k=1}^∞ λ(Ik).

1. If τ is any 2-additive real-valued function defined on the family I of intervals, then τ is actually finitely additive.

2. The family E of elementary sets is the smallest ring containing the bounded intervals.

3. If P1 and P2 are prerings, then so is {A1 × A2 : A1 ∈ P1, A2 ∈ P2}.

4. By definition a family P of sets is a prering iff it is closed under intersection and the difference of two members of P is a finite disjoint union of members of P.

The reason for the term is that P is a prering iff it is closed under intersection and the family of finite unions of sets in P is a ring.

5. (disjointification) Let (Ak) be a sequence of sets. Put B1 = A1, and for each k > 1, let Bk = Ak \ (A1 ∪ · · · ∪ Ak−1) = Ak \ ⋃_{j<k} Aj. Then the sequence (Bk) is disjoint and for all k, ⋃_{i=1}^k Ai = ⋃_{i=1}^k Bi, and ⋃_{i=1}^∞ Ai = ⋃_{i=1}^∞ Bi.

6. Let Q be the set of all rationals and I(Q) be its set of bounded intervals. The formula ℓ(I) = b − a, if I ∈ I(Q) has endpoints a ≤ b, does not define a function which is countably additive. Why does the proof in the discussion on Lebesgue volume not work here to produce countable additivity?

7. A family of sets R is called a ring if A, B ∈ R implies A ∪ B ∈ R and A \ B ∈ R (hence also A ∩ B = A \ (A \ B) ∈ R).

A ring in Algebra is a set together with a binary operation + (“addition”) which makes it a commutative group and an associative “multiplication” which is both left and right distributive over addition.

A ring of sets becomes a ring in the sense of Algebra if we let “addition” be A △ B = (A \ B) ∪ (B \ A) (symmetric difference) and let “multiplication” be AB = A ∩ B. Conversely, any family of sets closed under △ and ∩ is a ring of sets.

1. The following is an outline of a proof that any notion of area defined for all bounded intervals in the plane must give I = [a, b] × [c, d] the area α(I) = (b − a) × (d − c), if we assume that 0 ≤ α(I) < ∞, that the unit square U = [0, 1] × [0, 1] has area 1, that each translate of I has the same area, and that if E and F are disjoint intervals with I = E ∪ F then α(I) = α(E) + α(F).

(a) α([0, 1) × [0, 1]) ≤ α([0, 1) × [0, 1]) + α({1} × [0, 1]) = α(U) = 1.

(b) [0, 1/2) × [0, 1] has the same area as its translate [1/2, 1) × [0, 1]; but these two are disjoint with union of area ≤ 1, so each has area ≤ 1/2.

(c) Continuing, we find α([0, 1/2^n) × [0, 1]) ≤ 1/2^n, so that α({0} × [0, 1]) ≤ 1/2^n → 0. Thus this line segment has area 0, and so does its translate {1} × [0, 1].

(d) Again using the additivity and translation, we see that if q is a positive integer, then q disjoint translates of [0, 1/q) × [0, 1] and one line segment add up to U, so that α([0, 1/q) × [0, 1]) = 1/q.

(e) Similarly, α([0, p/q) × [0, 1]) = p α([0, 1/q) × [0, 1]) = p/q.

(f) Similarly, if r, s are rationals ≥ 0, then α([0, r) × [0, s)) = rs.

(g) Then by approximating from within and without, the same formula is obtained for real r, s.

(h) Finally, α([a, b] × [c, d]) = (b − a)(d − c) as required. The same holds, of course, for the other types of intervals, since they differ from these only by line segments of area 0.

2. A family S of sets is called a semiring if it is closed under finite intersections and if, whenever A, B ∈ S with A ⊂ B, there is a chain of Ck ∈ S with A = C0 ⊂ C1 ⊂ · · · ⊂ Cm = B and Ck \ Ck−1 ∈ S, for all k = 1, . . . , m.

(a) Every semiring is a prering.

(b) The family of bounded intervals of Rn forms a semiring.


2. Lebesgue outer measure in Rn.

We now develop a notion of measure of the other subsets of Rn by approximation from without. This new “outer measure” will no longer be even additive on all the subsets, but it will be possible to find a rather large family on which it is countably additive.

Lebesgue outer measure in Rn is the function defined on all subsets of Rn by the formula

\[ \lambda_n^*(A) = \inf\Bigl\{ \sum_{I \in \mathcal{H}} \lambda_n(I) : \mathcal{H} \subset \mathcal{I},\ \mathcal{H} \text{ is countable, and } \bigcup \mathcal{H} \supset A \Bigr\}. \]

As with λn, we shall usually drop the subscript ‘n’.

Call a subfamily of I covering a set A an I-cover of A.
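To get a feel for the infimum over covers, consider a countable set {x1, x2, . . .} ⊂ R. Covering xk by an interval of length ε/2^k gives an I-cover of total length at most ε, so such a set has outer measure 0. The sketch below checks this bookkeeping with exact rational arithmetic; the function name `cover_lengths` and the choice of 50 points are assumptions made for the illustration.

```python
from fractions import Fraction

def cover_lengths(num_points, eps):
    """Lengths eps/2^k, k = 1..num_points, of intervals covering x_1, ..., x_num_points."""
    return [eps / 2**k for k in range(1, num_points + 1)]

eps = Fraction(1, 100)
total = sum(cover_lengths(50, eps))   # total length of the first 50 covering intervals
```

However many points are covered, the total stays below ε, which is why the infimum in the definition is 0 for a countable set.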

1 Theorem. The Lebesgue outer measure of a bounded interval is the same as its volume: λ∗n(A) = λn(A), for A ∈ I.

Proof. Since {A} is one possible countable cover of A by intervals, λ∗(A) ≤ λ(A), by definition of infimum. On the other hand, if H is a countable I-cover of A, then λ(A) ≤ ∑_{I∈H} λ(I) (countable subadditivity on I), so taking the infimum gives λ(A) ≤ λ∗(A).

Note. A slightly more involved argument would prove that if H is a finite disjoint family of bounded intervals then λ∗(⋃H) = ∑_{I∈H} λ(I), and hence that λ∗ is finitely additive on the “elementary figures”.

If x ∈ Rn and A ⊂ Rn, then x + A denotes {x + a : a ∈ A}, the translate of A by x.

2 Theorem. Lebesgue outer measure is translation invariant: for each x ∈ Rn and A ⊂ Rn, λ∗(x + A) = λ∗(A).

Proof. First, if I is an interval with endpoints a ≤ b, then x + I is an interval with endpoints x + a, x + b and λ(x + I) = ∏_i ((xi + bi) − (xi + ai)) = λ(I).

There is a one-to-one correspondence between the countable I-covers of A and of x + A. Indeed, if H is a countable I-cover of A then Hx = {x + I : I ∈ H} is a countable I-cover of x + A, and if H′ is a countable I-cover of x + A, then H′−x = {−x + J : J ∈ H′} is the corresponding cover of A. Thus,

\begin{align*}
\lambda^*(A) &= \inf\Bigl\{ \sum_{I \in \mathcal{H}} \lambda(I) : \mathcal{H} \text{ is a countable } \mathcal{I}\text{-cover of } A \Bigr\} \\
&= \inf\Bigl\{ \sum_{I \in \mathcal{H}} \lambda(x + I) : \mathcal{H} \text{ is a countable } \mathcal{I}\text{-cover of } A \Bigr\} \\
&= \inf\Bigl\{ \sum_{J \in \mathcal{H}'} \lambda(J) : \mathcal{H}' \text{ is a countable } \mathcal{I}\text{-cover of } x + A \Bigr\} \\
&= \lambda^*(x + A).
\end{align*}


An extended real-valued function ϕ defined on all subsets of a space S is called an outer measure on S if it is countably subadditive and vanishes at ∅.

Note. Our definition of countable subadditivity uses countable covers, possibly finite, possibly even empty. However, if ϕ(∅) = 0, and ϕ(A) ≤ ∑_{i=1}^∞ ϕ(Ai) whenever (Ai) is a sequence of subsets of S covering A, then ϕ is countably subadditive. Indeed, ϕ(∅) = 0 takes care of the empty covers, and if A ⊂ ⋃_{i=1}^m Ai, we can put Ai = ∅ for i > m and get ϕ(A) ≤ ϕ(A1) + · · · + ϕ(Am) + 0 + 0 + · · · = ∑_{i=1}^m ϕ(Ai).

3 Theorem. Lebesgue outer measure in Rn is an outer measure.⁴

Proof. λ∗(∅) = λ(∅) = 0. Now suppose A ⊂ ⋃_{i=1}^∞ Ai. Fix ε > 0 and for each i let Hi be a countable I-cover of Ai with ∑_{I∈Hi} λ(I) ≤ λ∗(Ai) + ε/2^i. Then ⋃_i Hi is a countable I-cover of A, hence λ∗(A) ≤ ∑_{i=1}^∞ ∑_{I∈Hi} λ(I) ≤ ∑_{i=1}^∞ λ∗(Ai) + ε, and the conclusion follows since ε is arbitrary.

The procedure used to construct Lebesgue outer measure is called the Caratheodory process. It was invented by C. Caratheodory for this purpose and to study length and surface area, but it is useful in countless other situations.

Most outer measures (including λ∗) are not countably additive everywhere they are defined, and not even additive! But Caratheodory also found a simple way of selecting a large family of sets on which an outer measure is countably additive. For any R-valued function ϕ on all the subsets of a set S, a set A is called ϕ-measurable if for all T ⊂ S,

ϕ(T) = ϕ(T ∩ A) + ϕ(T ∩ Ac).

One may think of each T as a “test set”, so that A is measurable if it “splits every test set additively”. The set of all ϕ-measurable sets is denoted by Mϕ. In the special case of Lebesgue outer measure, each λ∗-measurable set is also called Lebesgue measurable.
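On a finite set the splitting condition can be checked by brute force. The sketch below uses a toy outer measure on S = {0, 1, 2}, chosen for this illustration and not taken from the notes: ϕ(∅) = 0, ϕ(S) = 2, and ϕ(A) = 1 for every other subset. The helper names `subsets`, `phi` and `is_measurable` are likewise assumptions.

```python
from itertools import combinations

S = frozenset({0, 1, 2})

def subsets(s):
    """All subsets of s, ordered by size."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def phi(a):
    """Toy outer measure: 0 on the empty set, 2 on S, 1 on everything else."""
    if not a:
        return 0
    return 2 if a == S else 1

def is_measurable(a):
    """Caratheodory condition: a splits every test set t additively."""
    return all(phi(t) == phi(t & a) + phi(t - a) for t in subsets(S))

measurable = [a for a in subsets(S) if is_measurable(a)]
```

Only ∅ and S pass the test here. For instance, A = {0} fails on the test set T = {0, 1}, since ϕ(T) = 1 while ϕ(T ∩ A) + ϕ(T \ A) = 2. This shows that the family of measurable sets can be much smaller than the family of all subsets.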

A family of sets A is called a (Boolean) algebra in S (or a field in S) if it is a ring of subsets of S containing S itself; that is, A is closed under finite unions and complements in S.

A σ-algebra is an algebra closed under even countable unions (and intersections).

4 Splitting Lemma. If ϕ is any function defined on all subsets of S to R, with ϕ(∅) = 0, then Mϕ is an algebra of subsets of S on which ϕ is finitely additive.

Proof. The definition builds in a condition much stronger than additivity: if A and B are disjoint, even if only A (say) belongs to Mϕ, then for all T ⊂ S,

ϕ(T ∩ (A ∪ B)) = ϕ(T ∩ (A ∪ B) ∩ A) + ϕ(T ∩ (A ∪ B) ∩ Ac) = ϕ(T ∩ A) + ϕ(T ∩ B);

in particular, ϕ(A ∪ B) = ϕ(A) + ϕ(B), so the job is to show that Mϕ is an algebra.

⁴ mercifully



The symmetry of the definition shows that Mϕ is closed under complementation. Since ϕ(T) = 0 + ϕ(T) = ϕ(T ∩ ∅) + ϕ(T ∩ ∅c), it evidently contains ∅ (and S). We show that if A, B ∈ Mϕ, then so is A ∪ B. Splitting the set T with the measurable A, then the set T ∩ Ac with the measurable B, we have

ϕ(T) = ϕ(T ∩ A) + ϕ(T ∩ Ac ∩ B) + ϕ(T ∩ Ac ∩ Bc). (∗)

Notice that Ac ∩ Bc is (A ∪ B)c. If we replace T by T ∩ (A ∪ B) in (∗), the last term on the right side vanishes and the others stay the same, so that

ϕ(T ∩ (A ∪ B)) = ϕ(T ∩ A) + ϕ(T ∩ Ac ∩ B),

which can be put back into (∗) to obtain

ϕ(T) = ϕ(T ∩ (A ∪ B)) + ϕ(T ∩ (A ∪ B)c);

that is, A ∪ B is measurable.

5 Theorem. If ϕ is an outer measure on a space S, then Mϕ is a σ-algebra on which ϕ is countably additive and which contains all sets A with ϕ(A) = 0.

The sets A for which ϕ(A) = 0 are known as ϕ-null sets. We will use the notation Nϕ for the family of ϕ-null sets.

Proof. We’ve just shown that Mϕ is an algebra. If ϕ(A) = 0, then by the subadditivity and monotonicity of ϕ, ϕ(T) ≤ ϕ(T ∩ A) + ϕ(T \ A) = 0 + ϕ(T \ A) ≤ ϕ(T), so A is measurable.

By the disjointification procedure, any algebra closed under countable disjoint unions is closed under countable unions. So let (Am) be a disjoint sequence in Mϕ and B = ⋃_m Am. Then, splitting the test set (T ∩ ⋃_{i=1}^m Ai) ∪ (T \ B) with ⋃_{i=1}^m Ai, then using the measurability of each Ai,

\[ \varphi\Bigl(\bigl(T \cap \bigcup_{i=1}^{m} A_i\bigr) \cup (T \setminus B)\Bigr) = \varphi\Bigl(T \cap \bigcup_{i=1}^{m} A_i\Bigr) + \varphi(T \setminus B) = \sum_{i=1}^{m} \varphi(T \cap A_i) + \varphi(T \setminus B). \]

Now the left side here is ≤ ϕ(T) and the right side converges to ∑_{i=1}^∞ ϕ(T ∩ Ai) + ϕ(T \ B) ≥ ϕ(T ∩ B) + ϕ(T \ B) ≥ ϕ(T). Thus

\[ \varphi(T) = \sum_{i=1}^{\infty} \varphi(T \cap A_i) + \varphi(T \setminus B) = \varphi(T \cap B) + \varphi(T \setminus B). \]

This shows that B is measurable and also gives a stronger result than countable additivity; indeed, replacing T by B in the last display gives ϕ(B) = ∑_{i=1}^∞ ϕ(Ai).



6 Theorem. The bounded intervals, hence all open and closed sets of Rn, are λ∗-measurable.

Proof. Fix T ⊂ Rn. Let A ∈ I, and let H be a countable I-cover of T. For each I ∈ H, there exists a finite disjoint family HI ⊂ I with I \ A = ⋃HI. By the finite additivity of λ on I, we have

\[ \lambda(I \cap A) + \sum_{E \in \mathcal{H}_I} \lambda(E) = \lambda(I). \]

But {I ∩ A : I ∈ H} covers T ∩ A and ⋃_{I∈H} HI covers T ∩ Ac. Hence,

\[ \lambda^*(T \cap A) + \lambda^*(T \cap A^c) \le \sum_{I \in \mathcal{H}} \lambda(I \cap A) + \sum_{I \in \mathcal{H}} \sum_{E \in \mathcal{H}_I} \lambda(E) = \sum_{I \in \mathcal{H}} \lambda(I). \]

Taking the infimum over the countable I-covers of T gives

λ∗(T ∩ A) + λ∗(T ∩ Ac) ≤ λ∗(T),

and the reverse inequality holds in any case by subadditivity of λ∗.

To show that open sets and closed sets are measurable, we need the following fact.

7 Lemma. Every open set in Rn is the union of countably many bounded intervals.

This is true because the countable family of intervals (a, b), where the coordinates of a, b are rational, forms a basis for the topology of Rn. (Indeed, if x ∈ G, open, there exists an interval (a′, b′) ⊂ G containing x, hence by choosing rational ai, bi with a′i < ai < xi < bi < b′i, using the density of the rationals in the reals, one makes the required interval Ix = (a, b). Then G = ⋃_{x∈G} Ix.)

The statement about measurability of open sets now follows immediately since Mλ∗ is closed under countable unions. Since it is also closed under complements and closed sets are complements of open sets, they too are measurable.

Definition. Lebesgue measure is the restriction of λ∗ (Lebesgue outer measure) to the family Mλ∗ of λ∗-measurable sets. From now on, we will use λ or λn to denote this measure. We recall that we proved that λ∗(A) = λ(A) for A a bounded interval. With our new definition, this becomes true for all Lebesgue measurable sets. If the superscript ∗ is used, it will be because the set is not known to be measurable. Also Mλ∗ will be written Mλ. As we know, each λ∗-null set, that is, each set N with λ∗(N) = 0, is λ-measurable. Thus, such N will also be referred to as λ-null or Lebesgue-null.

You can prove that for all x ∈ Rn, A ∈ Mλ if and only if x + A ∈ Mλ. (Then, of course, λ(x + A) = λ(A), since this was only a change of notation.)

Approximation.



8 Theorem. If µ is a countably additive extended real-valued function on a ring R, then for a sequence (Ak) in R,

(1) if Ak ↑ A ∈ R, then µ(Ak) → µ(A),

(2) if Ak ↓ A ∈ R, then µ(Ak) → µ(A), provided some µ(Ak) is finite.

These properties are called increasing and decreasing σ-continuity, respectively. (Ak ↑ A means Ak ⊂ Ak+1 for all k, and ⋃_k Ak = A. Similarly, Ak ↓ A means Ak ⊃ Ak+1 for all k, and ⋂_k Ak = A.)

Proof. (1) Disjointify: if we take B1 = A1, and Bk = Ak \ Ak−1 for k > 1, then the Bk are disjoint with union A, and Am = ⋃_{k≤m} Bk. Hence, µ(Am) = ∑_{k≤m} µ(Bk) → ∑_{k=1}^∞ µ(Bk) = µ(A), by countable additivity.

(2) is left as an exercise.

Example. The (n − 1)-dimensional subspace {x ∈ Rn : x1 = 0} has Lebesgue measure 0.

Proof. Let us denote this subspace by V. Take R = Mλ. For each k, let Ak = {0} × [−k, k] × · · · × [−k, k] = {x ∈ Rn : x1 = 0, |xi| ≤ k for i ≠ 1}. The sets V and Ak are measurable, since they are closed. Then λ(Ak) = 0 · (2k) · · · (2k) = 0 and Ak ↑ V, so λ(V) = 0.

The reader can modify this example to prove that unbounded open intervals have infinite measure.

Example. In the previous example, suppose we didn’t know that each Ak has measure 0. Let us prove that to introduce a useful technique. Fix k and write just A for simplicity. Let e1 = (1, 0, . . . , 0), a standard basis vector in Rn. Then, for each m ∈ N, the translate Bm := A + (1/m)e1 = {(a1 + 1/m, a2, . . . , an) : a ∈ A} is measurable with the same measure as A. The Bm are disjoint with union ⋃_{m=1}^∞ Bm ⊂ [0, 1] × [−k, k] × · · · × [−k, k], so

\[ (2k)^{n-1} \ge \lambda\Bigl(\bigcup_{m=1}^{\infty} B_m\Bigr) = \sum_{m=1}^{\infty} \lambda(B_m) = \sum_{m=1}^{\infty} \lambda(A). \]

But the right side here is infinity unless λ(A) = 0.

One can modify the above example to prove that each translate of every proper subspace has measure 0.

We could have used only open intervals in the definition of Lebesgue outer measure:

9 Theorem. For any A ⊂ Rn, λ∗(A) = inf ∑_{I∈U} λ(I), the infimum taken over all countable covers U of A by bounded open intervals.

Proof. Let λ′(A) be the number given by this new formula. Since each cover by open elements of I is a cover by elements of I, we have λ∗(A) ≤ λ′(A). On the other hand, fix ε > 0. If H = {Ii : i ∈ N} is a countable I-cover of A, for each i there exists an open interval Ui ⊃ Ii with λ(Ui) < λ(Ii) + ε/2^i. Thus λ′(A) ≤ ∑_i λ(Ui) ≤ ∑_i (λ(Ii) + ε/2^i) = ∑_i λ(Ii) + ε. Taking the infimum over such H gives λ′(A) ≤ λ∗(A) + ε. Since ε is arbitrary, λ′(A) ≤ λ∗(A), so equality is obtained.

Lebesgue outer measure is open outer regular. That is,

10 Theorem. For all A ⊂ Rn, λ∗(A) = inf{λ(G) : G is open, G ⊃ A}.

Proof. Denote (temporarily) the right side of the formula to be proved by λo(A). Since λ∗ is monotone, if G is open and contains A, λ∗(A) ≤ λ(G), hence

λ∗(A) ≤ λo(A).

By the countable subadditivity, if U is a countable cover of A by open intervals and G = ⋃U, then λ(G) ≤ ∑_{U∈U} λ(U). Moreover, G is open and contains A, so

\[ \lambda^o(A) \le \sum_{U \in \mathcal{U}} \lambda(U). \]

By the previous theorem, taking the infimum over such U gives λ∗(A) on the right side, so λ∗(A) = λo(A).

Thus every set (measurable or not) is covered by an open (hence measurable) set with almost the same outer measure.

On the other hand, the measure of each measurable set is approximated from within by that of a compact set: Lebesgue measure (on the Lebesgue measurable sets) is compact inner regular.

11 Theorem. If A ∈ Mλ∗, λ(A) = sup{λ(K) : K ⊂ A, K compact}.

Proof. Let λ0(A) stand for the right side of this equation. Let A be a Lebesgue-measurable set. By the monotonicity, we have

λ0(A) ≤ λ(A),

so it is the opposite inequality to be proved.

Let I be an arbitrary compact interval. With a view to using the open outer regularity, let U be an open set containing I \ A. Then I \ U ⊂ I \ (I \ A) = I ∩ A ⊂ A. So I \ U is a compact subset of A and hence

\begin{align*}
\lambda_0(A) &\ge \lambda(I \setminus U) \\
&= \lambda(I) - \lambda(I \cap U) \\
&\ge \lambda(I) - \lambda(U).
\end{align*}

(The subtraction was legal, since I has finite measure.)

Taking the supremum of the right side (that is, the infimum of λ(U)) over such U, we have

λ0(A) ≥ λ(I) − λ(I \ A) = λ(I ∩ A).

But there exists a sequence of compact intervals Ik ↑ Rn, hence Ik ∩ A ↑ A and hence

λ0(A) ≥ λ(A),

by the increasing σ-continuity.1. Prove, directly from the definition of λ∗, that if H is a finite disjoint family of bounded inter-

vals then λ∗(⋃

H) =∑

I∈H λ(I), and hence that λ∗ is finitely additive on the “elementary

figures”. Remember that at this point in the development, λ is not defined on anything but

intervals. In particular λ(⋃

H) is not defined.

2. Let ϕ be a non-negative function defined on all subsets of S with values in R such that,

ϕ(∅) = 0 and for each sequence (En) with En ⊂ En+1 , ϕ(⋃∞

n=1 En) = limn ϕ(En), thenMϕ is a sigma algebra on which ϕ is countably additive.

3. Let A be the triangular region (x, y) : x ∈ [0,1], 1− x ≤ y ≤ 1. Find λ∗(A), using onlythe facts about λ∗ available till now.

4. Let A be the region (x, y) : x ∈ [0, 1], 0 ≤ y ≤ x2. Find λ∗(A).

5. In R2 , let L = (x,x2) : x ∈ [0,1]. Prove that λ2(L) = 0. Generalize this result to thegraph of any continuous function.

6. Unbounded open intervals of $\mathbb{R}^2$ have infinite Lebesgue outer measure, but there are unbounded open sets of finite (outer) measure.

7. A translate of a Lebesgue measurable set is Lebesgue measurable.

8. Every proper vector subspace V of $\mathbb{R}^n$ has Lebesgue measure 0. Hint: use translation invariance on a bounded portion of V first.

9. If τ is any function defined (at least) on a family $\mathcal{C}$ of subsets of a space S with values in [0,∞], and

$$\tau^*(A) = \inf\Big\{\sum_{I \in H} \tau(I) : H \subset \mathcal{C},\ H \text{ is countable, and } \bigcup H \supset A\Big\},$$

for all A ⊂ S, then τ∗ is an outer measure. (τ∗ is called the Caratheodory (outer) measure generated by τ and $\mathcal{C}$.)

Even though τ need not be defined at ∅, we still get τ∗(∅) = 0, from the convention involving empty sums.

10. Let τ∗ be the Caratheodory measure generated by τ and C, τ : C −→ [0,∞], then:

a) τ∗(I) ≤ τ(I) for all I ∈ C.

b) if τ is countably subadditive, then τ∗(A) = τ(A) for A ∈ C.

11. In the setting of the splitting lemma, a set A is called ϕ-null if ϕ(B) = 0, for all B ⊂ A. The algebra $\mathcal{M}_\varphi$ need not contain all ϕ-null sets.

12. Sets A and B in $\mathbb{R}^n$ are said to be separated if the closure of A does not intersect B and the closure of B does not intersect A:

cl(A) ∩ B = A ∩ cl(B) = ∅.

Using the (Caratheodory) concept of measurability, show that if A and B are separated, then λ∗(A ∪ B) = λ∗(A) + λ∗(B).



3. Measurable sets, Borel sets and null sets.

A non-measurable set.

What sets are λ-measurable? All open sets, all closed sets, all bounded intervals, countable unions, intersections and differences of these; the unbounded intervals are included, since they are countable unions of bounded ones . . . . It is hard to imagine that there are any other sets! But it turns out that there are Lebesgue measurable sets which are not obtainable by repeating these operations starting with open sets or intervals, and there are subsets of $\mathbb{R}^n$ which are not Lebesgue measurable. We will give an example of the latter.

1 Theorem. There exists a non Lebesgue measurable subset of R.

Proof. Consider the equivalence relation on R, defined by x ∼ y iff x − y ∈ Q. The equivalence classes of this relation are the cosets x + Q, x ∈ R. Recall that these are disjoint with union R. Now, for each x ∈ R, x + Q is dense in R, and so contains a member of [0, 1]. Let E be a subset of [0, 1] consisting of exactly one such member from each coset. (The Axiom of Choice is used here.) Now:

(a) Any two distinct translates r + E of E by rationals are disjoint;

(b) there are only countably many r + E; and,

(c) $[0, 1] \subset \bigcup_{r \in \mathbb{Q} \cap [-1,1]} (r + E) \subset [-1, 2]$.

To prove (a), suppose r + E and r′ + E intersect, where r, r′ are rationals; then they have a point x in common. Say, x = r + e = r′ + e′, where e, e′ ∈ E. Then e = x − r ∈ x + Q, e′ = x − r′ ∈ x + Q. Since E contains only one element out of each coset, e = e′ and hence also r = r′.

(b) follows from the countability of Q.

Finally, let x ∈ [0, 1], let e be the point of E in x + Q and put r = x − e. Then r is rational and r ∈ [−1, 1], since x, e ∈ [0, 1]. Thus, x = r + e ∈ r + E, with r ∈ Q ∩ [−1, 1], as required for the first inclusion of (c). The second inclusion is obvious.

Now, suppose E were measurable. Since the (outer) measure of a set doesn’t change under translation, the translates of this measurable set are measurable with the same value. We cannot have λ(E) = 0, for if it were, countable subadditivity would give

$$1 = \lambda([0, 1]) \le \sum_{r} \lambda(r + E) = 0.$$

But we can’t have λ(E) > 0 either, for then countable additivity yields

$$3 = \lambda([-1, 2]) \ge \lambda\Big(\bigcup_{r} (r + E)\Big) = \sum_{r} \lambda(r + E) = \infty.$$

This contradiction shows E is not measurable.


The reader can modify this example to show that $\mathbb{R}^n$ has a non $\lambda_n$-measurable subset.

Borel sets.

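A σ-ring is a ring closed under countable unions; that is, a family of sets $\mathcal{R}$ is a σ-ring, provided

(i) $A, B \in \mathcal{R}$ implies $A \setminus B \in \mathcal{R}$ and

(ii) $\mathcal{H}$ is a countable subfamily of $\mathcal{R}$ implies $\bigcup \mathcal{H} \in \mathcal{R}$.

It follows that every σ-ring is also closed under countable intersections, since $\bigcap_n A_n = A_1 \setminus \bigcup_n (A_1 \setminus A_n)$.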
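As a small sanity check, the identity $\bigcap_n A_n = A_1 \setminus \bigcup_n (A_1 \setminus A_n)$ behind this remark can be tried on finite sets. The sketch below is only an illustration; the function name and the sample family are hypothetical, not part of the notes.

```python
def intersection_via_ring_ops(sets):
    """Intersect a list of sets using only difference and union,
    the two operations a (sigma-)ring is closed under."""
    first = sets[0]
    removed = set()
    for s in sets:
        removed |= first - s      # A_1 \ A_n
    return first - removed        # A_1 \ (union of the A_1 \ A_n)

# Hypothetical sample family, chosen only for illustration.
family = [{1, 2, 3, 4, 5}, {2, 3, 4, 6}, {0, 3, 4, 5}]
result = intersection_via_ring_ops(family)
```

Here `result` agrees with the ordinary intersection `set.intersection(*family)`.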

Note. Remember that countable means finite or countably infinite. But in checking that $\mathcal{R}$ is closed under countable unions we can assume the countable set is of the form $\mathcal{H} = \{A_n : n \in \mathbb{N}\}$, for if $\mathcal{H}$ has only N elements, we can take $A_n = \emptyset$, for n > N.

The smallest σ-ring containing a given family of sets $\mathcal{C}$ is called the σ-ring generated by $\mathcal{C}$; this will be denoted by $\sigma(\mathcal{C})$. We had better check that this exists!

2 Lemma. If $\mathcal{C}$ is a family of subsets of S, then there is a smallest σ-ring containing $\mathcal{C}$.

Proof. The set of all subsets of S is a σ-ring, so there is at least one σ-ring containing $\mathcal{C}$. Let $\mathcal{A}$ be the intersection of all the σ-rings in S containing $\mathcal{C}$, that is, the set of all subsets of S belonging to all σ-rings containing $\mathcal{C}$. If $\mathcal{A}$ can be shown to be a σ-ring, it will certainly be the smallest. Now, if $A, B \in \mathcal{A}$ and $\mathcal{R}$ is any σ-ring containing $\mathcal{C}$, then $A, B \in \mathcal{R}$ and hence $A \setminus B \in \mathcal{R}$, since $\mathcal{R}$ is a σ-ring. That is, if $A, B \in \mathcal{A}$, then $A \setminus B$ belongs to every σ-ring containing $\mathcal{C}$, and hence belongs to $\mathcal{A}$. Similarly, if $\mathcal{H}$ is a countable subfamily of $\mathcal{A}$, then $\mathcal{H}$ is a countable subfamily of every σ-ring containing $\mathcal{C}$, so that $\bigcup \mathcal{H}$ belongs to every σ-ring containing $\mathcal{C}$, that is, $\bigcup \mathcal{H} \in \mathcal{A}$. This shows that $\mathcal{A}$ is a σ-ring.
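On a finite set every ring is a σ-ring, so the generated σ-ring can be computed by brute force: keep adding differences and unions until nothing new appears. The following is a minimal sketch under that finiteness assumption; `generated_ring` and the sample generators are hypothetical names used only for illustration.

```python
def generated_ring(generators):
    """Smallest family containing `generators` that is closed under
    set difference and (finite) union. Finite setting only."""
    family = {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            for b in list(family):
                for new in (a - b, a | b):
                    if new not in family:
                        family.add(new)
                        changed = True
    return family

# Two overlapping generators inside S = {1, 2, 3, 4}.
ring = generated_ring([{1, 2}, {2, 3}])
```

The result consists of all unions of the "atoms" {1}, {2}, {3}. No member contains the point 4, which illustrates that a generated σ-ring need not contain S, and hence need not be a σ-algebra in S.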

One can show that the members of $\sigma(\mathcal{C})$ are exactly those sets which can be constructed by repeating the σ-ring operations, starting with the elements of $\mathcal{C}$. The proof requires transfinite induction. See the exercise labeled generating a σ-ring.

A similar result holds, with essentially the same proof, for the other types of set families discussed earlier, as well as others to be defined shortly. For example, there is a smallest σ-algebra in S containing a given $\mathcal{C}$, called the σ-algebra generated by $\mathcal{C}$.

Notice that a σ-ring of subsets of S is a σ-algebra in S if and only if it contains S; in particular, if $\mathcal{C}$ contains S, then $\sigma(\mathcal{C})$ is also the σ-algebra generated by $\mathcal{C}$.

If E is a topological space, then the σ-ring generated by the open sets is called the Borel σ-ring (or σ-algebra, of course, since it contains the whole space E) of E, denoted $\mathcal{B}(E)$; its members are called Borel sets. One often abbreviates $\mathcal{B}(\mathbb{R})$ by $\mathcal{B}$.

3 Theorem. The Borel sets of Rn are Lebesgue measurable.

Proof. Let $\mathcal{G}$ be the family of open sets. Since $\mathcal{M}_\lambda$ is a σ-ring containing $\mathcal{G}$, it contains $\sigma(\mathcal{G})$, the smallest σ-ring containing $\mathcal{G}$; that is, $\mathcal{M}_\lambda \supset \mathcal{B}(\mathbb{R}^n)$, the family of Borel sets of $\mathbb{R}^n$.

In section 2, we proved that every bounded interval of $\mathbb{R}^n$ is Lebesgue measurable, and deduced that every open set and every closed set of $\mathbb{R}^n$ is measurable. Looking more carefully at that proof, we see that every σ-ring which contains the family $\mathcal{I}$ of bounded intervals also contains all open sets; thus, $\sigma(\mathcal{I}) \supset \mathcal{G}$. But here we have a σ-ring containing $\mathcal{G}$, so it contains the smallest σ-ring containing $\mathcal{G}$; that is, $\sigma(\mathcal{I}) \supset \sigma(\mathcal{G}) = \mathcal{B}(\mathbb{R}^n)$, the Borel sets.

Conversely, since $\sigma(\mathcal{G})$ contains the open sets, it contains also the closed sets, hence the closed intervals. But then, by removing (closed) faces of the closed intervals we see that $\sigma(\mathcal{G})$ contains all possible intervals. Thus $\sigma(\mathcal{G}) \supset \mathcal{I}$. But then $\sigma(\mathcal{G})$ is a σ-ring containing $\mathcal{I}$, so $\sigma(\mathcal{G}) \supset \sigma(\mathcal{I})$.

We’ve just shown that

4 Proposition. The family $\mathcal{B}(\mathbb{R}^n)$ of Borel sets of $\mathbb{R}^n$ is generated by the family $\mathcal{I}$ of bounded intervals.

Here are some more examples of this type of argument, giving other familiar families generating the Borel sets.

(1) $\mathcal{B}(\mathbb{R})$ is generated by the family of open intervals $(a, b)$, for $a, b \in \mathbb{R}$.

(2) $\mathcal{B}(\mathbb{R})$ is generated by the family of intervals $(a, b]$, for $a, b \in \mathbb{R}$.

(3) $\mathcal{B}(\mathbb{R})$ is generated by the family of intervals $(-\infty, b)$, for $b \in \mathbb{R}$.

(4) If D is a dense set of real numbers, then the family of Borel sets $\mathcal{B}(\mathbb{R})$ is generated by the family of intervals $(a, b]$, for $a, b \in D$.

(5) $\mathcal{B}(\mathbb{R}^n)$ is generated by the family of compact sets of $\mathbb{R}^n$.

Proofs. We will prove (1) and (2) and leave the rest for the reader. Let $\mathcal{G}$ denote the open sets of $\mathbb{R}$, let $\mathcal{I}_o$ be the set of open intervals $(a, b)$ and let $\mathcal{I}_r$ denote the set of left-open, right-closed intervals $(a, b]$.

We know that every open interval is an open set, so $\sigma(\mathcal{G}) \supset \mathcal{G} \supset \mathcal{I}_o$. Thus, $\sigma(\mathcal{G})$ is a σ-ring containing $\mathcal{I}_o$, so it must contain the smallest σ-ring containing $\mathcal{I}_o$. That is, $\sigma(\mathcal{G}) \supset \sigma(\mathcal{I}_o)$.

On the other hand, every open set in the reals is a union of a sequence of open intervals. Therefore, since $\sigma(\mathcal{I}_o)$ contains the open intervals and is a σ-ring, it also contains $\mathcal{G}$. Since it is a σ-ring containing $\mathcal{G}$, it also contains $\sigma(\mathcal{G})$; that is, $\sigma(\mathcal{I}_o) \supset \sigma(\mathcal{G})$. Thus, $\sigma(\mathcal{I}_o) = \sigma(\mathcal{G}) = \mathcal{B}$.

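To prove (2), let $(a, b] \in \mathcal{I}_r$. For each positive integer n, $(a, b + \tfrac{1}{n}) \in \mathcal{G} \subset \mathcal{B}$. Hence, $(a, b] = \bigcap_{n=1}^{\infty} (a, b + \tfrac{1}{n}) \in \mathcal{B}$. This shows that $\mathcal{B} \supset \mathcal{I}_r$ and therefore that $\mathcal{B} \supset \sigma(\mathcal{I}_r)$.

Conversely, every open interval $(a, b)$ is a countable union $\bigcup_{n=1}^{\infty} (a, b - \tfrac{1}{n}]$ of sets in $\mathcal{I}_r$, so that $\sigma(\mathcal{I}_r)$ contains the collection of open intervals and hence contains the σ-ring generated by the open intervals. Thus, $\sigma(\mathcal{I}_r) \supset \sigma(\mathcal{I}_o) = \mathcal{B}$, by (1).

To prove statements like (4), one uses the same method and the fact that for all $a, b \in \mathbb{R}$ with $a < b$, there exist $a_n, b_n \in D$ with $a_n \downarrow a$, $b_n \uparrow b$, so that $(a, b) = \bigcup_{n=1}^{\infty} (a_n, b_n]$.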

Images of Borel sets under mappings. The generated sigma-ring argument can be applied also to study the effect of mappings on Borel sets. Here, we consider a transformation of $\mathbb{R}^n$; that is, a mapping $T : \mathbb{R}^n \to \mathbb{R}^n$. The ideas will be generalized later, when we study measurable functions.

5 Lemma. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be any mapping. Then,

(1) $\{B : T^{-1}[B] \in \mathcal{B}(\mathbb{R}^n)\}$ is a σ-ring;

(2) if T is one-to-one, then $\{B : T[B] \in \mathcal{B}(\mathbb{R}^n)\}$ is a σ-ring.

Proof. (1) Let $\mathcal{W} = \{B \subset \mathbb{R}^n : T^{-1}[B] \in \mathcal{B}(\mathbb{R}^n)\}$.

(i) If $A, B \in \mathcal{W}$, then both $T^{-1}[A]$ and $T^{-1}[B]$ belong to $\mathcal{B}(\mathbb{R}^n)$, so $T^{-1}[A \setminus B] = T^{-1}[A] \setminus T^{-1}[B] \in \mathcal{B}(\mathbb{R}^n)$ and hence $A \setminus B \in \mathcal{W}$.

(ii) If $(B_k)$ is a sequence in $\mathcal{W}$, then $(T^{-1}[B_k])$ is a sequence in the σ-ring $\mathcal{B}(\mathbb{R}^n)$, so $T^{-1}[\bigcup_k B_k] = \bigcup_k T^{-1}[B_k] \in \mathcal{B}(\mathbb{R}^n)$, and hence $\bigcup_k B_k \in \mathcal{W}$.

This shows that $\mathcal{W}$ is closed under countable unions and set difference, so is a σ-ring, as required.

(2) is proved in the same manner, using the fact that for one-to-one mappings, $T(A \setminus B) = T(A) \setminus T(B)$.
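The role of injectivity in part (2) can be seen on a toy example: preimages always commute with set difference, while images do so only for one-to-one maps. The sketch below uses finite sets of integers; the helper names `preimage` and `image` and the sample maps are hypothetical illustrations.

```python
def preimage(f, domain, B):
    """All points of `domain` that f sends into B."""
    return {x for x in domain if f(x) in B}

def image(f, A):
    """The set of values f takes on A."""
    return {f(x) for x in A}

domain = set(range(-3, 4))

def square(x):          # not one-to-one on the domain
    return x * x

def shift(x):           # one-to-one
    return x + 1

A, B = {0, 1, 4}, {1, 9}
pre_ok = (preimage(square, domain, A - B)
          == preimage(square, domain, A) - preimage(square, domain, B))

X, Y = {-2, 1}, {2}
img_noninjective = image(square, X - Y) == image(square, X) - image(square, Y)
img_injective = image(shift, X - Y) == image(shift, X) - image(shift, Y)
```

With these choices the preimage identity holds, the image identity fails for the squaring map (because −2 and 2 have the same square), and it holds for the shift.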

6 Theorem.

(1) If T is a continuous map of $\mathbb{R}^n$ into $\mathbb{R}^n$ and B is a Borel set, then $T^{-1}(B)$ is also a Borel set.

(2) If T is a one-to-one continuous map of $\mathbb{R}^n$ into $\mathbb{R}^n$ and B is a Borel set, then $T(B)$ is also a Borel set.

One summarizes the first statement by saying that a continuous mapping is Borel measurable; the second, by saying a continuous injection “maps Borel sets to Borel sets”.

Proof. (1) Let $\mathcal{G}$ be the family of open sets of $\mathbb{R}^n$, and let $T : \mathbb{R}^n \to \mathbb{R}^n$ be continuous. Then, for each $G \in \mathcal{G}$, $T^{-1}(G)$ is open, hence a Borel set. Thus,

$$\mathcal{G} \subset \{B : T^{-1}[B] \in \mathcal{B}(\mathbb{R}^n)\}.$$

By the Lemma, the family on the right is a σ-ring, so it contains the σ-ring generated by $\mathcal{G}$; that is,

$$\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{G}) \subset \{B : T^{-1}[B] \in \mathcal{B}(\mathbb{R}^n)\}.$$

Therefore, for every Borel set B, $T^{-1}[B] \in \mathcal{B}(\mathbb{R}^n)$.

(2) Now, suppose T is one-to-one and continuous and let $\mathcal{K}$ be the family of compact sets. Since the continuous image of a compact set is compact,

$$\mathcal{K} \subset \{B : T[B] \in \mathcal{B}(\mathbb{R}^n)\}.$$

Again, the family on the right is a sigma ring, so it contains $\sigma(\mathcal{K}) = \mathcal{B}(\mathbb{R}^n)$; hence, for every Borel set B, $T[B] \in \mathcal{B}(\mathbb{R}^n)$, as required.

Remark. A simple modification shows that (1) of Theorem 6 also holds for continuous mappings from any metric space to any other; in particular, it holds if T is defined on an open subset of $\mathbb{R}^n$.

It happens that a one-to-one continuous map T on an open set U of $\mathbb{R}^n$ maps onto an open set of $\mathbb{R}^n$; its inverse $T^{-1} : T(U) \to U$ is continuous. This is a difficult result, which we don’t need. (See T. Rado and P.V. Reichelderfer, “Continuous Transformations in Analysis”, Springer-Verlag 1955.) If we had it, however, we could use it together with the fact just mentioned to deduce Theorem 6(2), without involving the compacts.

Measurable sets are almost Borel.

The proof that there are Lebesgue measurable sets which are not Borel sets is beyond the scope of this course. But we can show that there is not much difference between the two as far as λ is concerned. By definition, a $G_\delta$-set is the intersection of a sequence of open sets and a $K_\sigma$-set is the union of a sequence of compact sets. Both are evidently Borel sets.

7 Theorem.

(1) If $A \subset \mathbb{R}^n$, then there is a $G_\delta$-set $G \supset A$ with $\lambda(G) = \lambda^*(A)$.

(2) If A is λ-measurable, one can choose G so that $\lambda(G \setminus A) = 0$.

(3) If A is λ-measurable, there is a $K_\sigma$-set $K \subset A$ with $\lambda(A \setminus K) = 0$.

(4) $A \in \mathcal{M}_\lambda$ iff $A = B \cup N$, where B is Borel and N is λ-null,
iff $A = B \,\triangle\, N$, where B is Borel and N is λ-null,
iff $A = B \setminus N$, where B is Borel and N is λ-null.

Here $A \,\triangle\, B$ means symmetric difference, that is, $(A \setminus B) \cup (B \setminus A)$.

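Proof. (1) Let $A \subset \mathbb{R}^n$. Then, by the open outer regularity, for each $k \in \mathbb{N}$, there exists an open set $G_k \supset A$ with $\lambda(G_k) \le \lambda^*(A) + 1/k$. The set $G = \bigcap_{k=1}^{\infty} G_k$ then has $\lambda(G) = \lambda^*(A)$, since $\lambda^*(A) \le \lambda(G) \le \lambda(G_k) \le \lambda^*(A) + 1/k$ for every k.

(2) Now if A is measurable with $\lambda(A) < \infty$ and G is as above, we obtain $\lambda(G \setminus A) = \lambda(G) - \lambda(A) = 0$. If A has infinite measure, one has to be more careful. A can be written as $\bigcup_m A_m$, where each $A_m$ has finite measure (e.g. $A_m = (mI) \cap A$, where I is the interval $[-\mathbf{1}, \mathbf{1}]$, $\mathbf{1} = (1, \dots, 1)$). Then, for each m and k, find open $G_{mk}$ with $G_{mk} \supset A_m$ and $\lambda(G_{mk}) < \lambda(A_m) + \frac{1}{2^m}\frac{1}{k}$. Then put $G_k = \bigcup_m G_{mk}$. This makes $\lambda(G_k \setminus A) \le \lambda(\bigcup_m (G_{mk} \setminus A_m)) \le \sum_m \lambda(G_{mk} \setminus A_m) \le \frac{1}{k}$, and thus $\lambda(\bigcap_k G_k \setminus A) = 0$, as required.

(3) Again, let $A \in \mathcal{M}_\lambda$, $A = \bigcup_m A_m$, where each $A_m$ has finite measure. Then by compact inner regularity, there is a compact $K_{mk} \subset A_m$, with $\lambda(A_m \setminus K_{mk}) < \frac{1}{k}$, so that if $K = \bigcup_{k,m} K_{mk}$ we have $\lambda(A_m \setminus K) \le \frac{1}{k}$, for all k. Thus, $\lambda(A_m \setminus K) = 0$. Hence, $\lambda(A \setminus K)$ is also 0.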

(4) Certainly, if A is of any one of these forms, it is measurable, since Borel and null sets are measurable. On the other hand, if A is measurable, and K is given by (3), then $A = K \cup N = K \,\triangle\, N$, where $N = A \setminus K \in \mathcal{N}_\lambda$, and if G is given by (2), then $A = G \setminus N$, where $N = G \setminus A \in \mathcal{N}_\lambda$. Since $G_\delta$-sets and $K_\sigma$-sets are Borel, we are done.

A measurable set $\hat{A}$ containing A, with $\lambda(\hat{A}) = \lambda^*(A)$, is called a measurable hull (or measurable cover) of A. Thus, (1) of the previous theorem shows that each A has a measurable hull that is a $G_\delta$-set. The following is an example of how a measurable hull can be used. The result will not, however, be needed in this course.

8 Corollary. λ∗ is increasingly σ-continuous on all subsets of Rn.

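Proof. Let $A_k \subset \mathbb{R}^n$ and $A_k \uparrow A$; we have to show that $\lambda^*(A_k) \to \lambda^*(A)$. For each $k \in \mathbb{N}$, let $\hat{A}_k$ be a measurable hull of $A_k$ and let $B_k = \bigcap_{i \ge k} \hat{A}_i$. Since, for $i \ge k$, $\hat{A}_i \supset A_i \supset A_k$, we have

$$\hat{A}_k \supset B_k \supset A_k,$$

hence,

$$\lambda^*(A_k) = \lambda(\hat{A}_k) \ge \lambda(B_k) \ge \lambda^*(A_k).$$

This shows that $B_k$ is also a measurable hull of $A_k$; but $(B_k)$ is an increasing sequence of measurable sets, so $\lambda(B_k) \uparrow \lambda(\bigcup_k B_k)$. Thus,

$$\lambda(B_k) = \lambda^*(A_k) \le \lambda^*(A) \le \lambda\Big(\bigcup_k B_k\Big),$$

so $\lambda^*(A_k) \to \lambda^*(A)$, as required.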

The Cantor set.

The family of Lebesgue null sets is denoted $\mathcal{N}_\lambda$. This family is what is called a hereditary σ-ring in $\mathbb{R}^n$ (also called a σ-ideal of subsets of $\mathbb{R}^n$); that is, $\mathcal{N}_\lambda$ is closed under countable unions, and if $A \subset B \in \mathcal{N}_\lambda$, then $A \in \mathcal{N}_\lambda$. The proof is obvious. Thus ∅, singletons, and hence countable sets are null. The following is an example of an uncountable null set.

Let I be the unit interval [0,1]. Let $G_{11} = (\tfrac13, \tfrac23)$, the “open middle third” of I. Then $I \setminus G_{11}$ is the disjoint union of the two compact intervals $K_{11} = [0, \tfrac13]$, $K_{12} = [\tfrac23, 1]$.

Let $G_{21}$ and $G_{22}$ be the “open middle third” of $K_{11}$ and $K_{12}$, respectively, and $K_{21}, K_{22}, K_{23}, K_{24}$ be the 4 compact intervals obtained by removing these middle thirds, etc.

$K_{01} = [0, 1]$

$G_{11} = (\tfrac13, \tfrac23)$

$K_{11} = [0, \tfrac13]$, $K_{12} = [\tfrac23, 1]$

$G_{21} = (\tfrac19, \tfrac29)$, $G_{22} = (\tfrac79, \tfrac89)$

$K_{21} = [0, \tfrac19]$, $K_{22} = [\tfrac29, \tfrac39]$, $K_{23} = [\tfrac69, \tfrac79]$, $K_{24} = [\tfrac89, 1]$

. . . . . .

In general, $G_{ij}$ is the open interval of length $1/3^i$ concentric with $K_{i-1,j}$, while $K_{i,2j-1}$ and $K_{i,2j}$ are the two component intervals of $K_{i-1,j} \setminus G_{ij}$.

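The Cantor (ternary) set is defined to be

$$C = \bigcap_{i} \bigcup_{j=1}^{2^i} K_{ij},$$

or what is the same, the complement in [0,1] of $\bigcup_{i=1}^{\infty} \bigcup_{j=1}^{2^{i-1}} G_{ij}$. (The reader may check that C is the set of those points of [0,1] which have a ternary expansion using only the digits 0 and 2.)

Now, all sets here are measurable, and

$$\lambda\Big(\bigcup_{i=1}^{\infty} \bigcup_{j=1}^{2^{i-1}} G_{ij}\Big) = \sum_{i=1}^{\infty} \lambda\Big(\bigcup_{j=1}^{2^{i-1}} G_{ij}\Big) = \sum_{i} \frac{2^{i-1}}{3^i} = \frac13 \Big(\frac{1}{1 - \frac23}\Big) = 1.$$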

Thus λ(C) = 1 − 1 = 0.
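The bookkeeping in this construction is easy to reproduce with exact rational arithmetic. The following sketch (function names are hypothetical, chosen for illustration) builds the compact intervals $K_{ij}$ of a given level and computes their total length, which is what remains of [0,1] after removing the first i rounds of middle thirds.

```python
from fractions import Fraction

def cantor_intervals(level):
    """Compact intervals K_{level,1}, ..., K_{level,2^level} as (a, b) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(level):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # left component
            next_intervals.append((b - third, b))   # right component
        intervals = next_intervals
    return intervals

def remaining_length(level):
    """Total length of the level-th stage, equal to (2/3)**level."""
    return sum(b - a for a, b in cantor_intervals(level))
```

For example, `cantor_intervals(2)` reproduces the four intervals $K_{21}, \dots, K_{24}$ listed above, and `remaining_length(i)` tends to 0 as i grows, in agreement with λ(C) = 0.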

To see that C is uncountable, suppose $C = \{x_1, x_2, x_3, \dots\}$. Then $x_1 \in K_{11} \cup K_{12}$, a union of two disjoint sets. Let $K_{1j_1}$ be the one of these for which $x_1 \notin K_{1j_1}$. Now take $K_{2j_2}$ to be such that $x_2 \notin K_{2j_2} \subset K_{1j_1}$, . . . . In this way obtain a decreasing sequence $K_{ij_i}$ of compact intervals. The intersection $\bigcap_i K_{ij_i}$ is non-empty, since it is the intersection of a decreasing sequence of non-empty compact sets, yet contains none of the points $x_i$ of C. This is a contradiction, since $C \supset \bigcap_i K_{ij_i}$.

One can modify the above proof to obtain Cantor subsets of [0,1] of any given measure α, 0 ≤ α < 1.

1. If D is a dense set of real numbers, then the family of Borel sets $\mathcal{B}(\mathbb{R})$ is generated by the family of intervals $(a, b]$, for $a, b \in D$.

2. If $\mathcal{C}$ is any family of sets, prove that every element of $\sigma(\mathcal{C})$ may be covered by a countable subfamily of $\mathcal{C}$.

3. $\mathcal{B}(\mathbb{R}^n)$ is the σ-ring generated by the family of compact sets of $\mathbb{R}^n$.

4. If $\mathcal{R}$ is a σ-ring of subsets of a set S, determine the σ-algebra in S generated by $\mathcal{R}$.

5. Prove that the family $\mathcal{B}(\mathbb{R}^2)$ of Borel sets of $\mathbb{R}^2$ is the σ-ring generated by $\{B_1 \times B_2 : B_1, B_2 \in \mathcal{B}(\mathbb{R})\}$.



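6. For a family of sets, $\sigma(\mathcal{C})$ denotes the smallest σ-ring containing $\mathcal{C}$. The operation σ is a closure operator on the family of subsets of S: that is, $\mathcal{C} \subset \sigma(\mathcal{C})$, $\mathcal{C} \subset \mathcal{D} \implies \sigma(\mathcal{C}) \subset \sigma(\mathcal{D})$, and $\sigma(\sigma(\mathcal{C})) = \sigma(\mathcal{C})$.

7. Let A be a set in $\mathbb{R}^n$ of finite outer Lebesgue measure such that

$$\lambda^*(A) = \inf\{\lambda(G) : G \text{ is open},\ G \supset A\} = \sup\{\lambda(K) : K \text{ is compact},\ K \subset A\}.$$

Prove that A is λ-measurable.

8. For each a with 0 ≤ a < 1, there exists a Cantor-like subset of [0, 1] with Lebesgue measure a.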

9. The Cantor ternary set consists of those elements of [0,1] which have a ternary expansion $0.a_1a_2a_3\ldots$ using only the digits 0 and 2. To see this, let $K(a_1a_2 \ldots a_k)$ be the set of points having a ternary expansion that begins with $0.a_1a_2 \ldots a_k$. Then, $K(0) = [0.0, 0.0222\ldots] = [0, \tfrac13]$, $K(2) = [0.2, 0.222\ldots] = [\tfrac23, 1]$, $K(00) = [0, \tfrac19]$, $K(02) = [\tfrac29, \tfrac39]$, $K(20) = [\tfrac69, \tfrac79]$, $K(22) = [\tfrac89, 1]$, . . . . Then $0.a_1a_2a_3\ldots = \sum_{i=1}^{\infty} \frac{a_i}{3^i}$ is the unique element in $\bigcap_k K(a_1a_2 \ldots a_k)$, and every element of the Cantor set is one of these.

10. If C is the Cantor ternary set, then $C + C = \{x + y : x, y \in C\}$ is the interval [0,2]. This is because of the representation of elements of C by their ternary expansion and the fact that

$$\Big\{\sum_{i=1}^{\infty} \frac{a_i}{3^i} : a_i \in \{0, 1, 2\}, \text{ for all } i\Big\} = [0, 1].$$
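A finite-level version of the hint in Exercise 10 can be checked by machine. If x and y have n-digit ternary expansions with digits in {0, 2}, then the digits of x + y lie in {0, 2, 4} = 2·{0, 1, 2}, so the sums should be exactly the numbers $2m/3^n$ with $0 \le m < 3^n$. The sketch below verifies this for n = 4; it is an illustration of the digit argument, not a proof of the exercise, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def truncated_cantor_points(n):
    """Numbers 0.a1...an in base 3 with every digit in {0, 2}."""
    return {
        sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))
        for digits in product((0, 2), repeat=n)
    }

n = 4
points = truncated_cantor_points(n)
sums = {x + y for x in points for y in points}
# Expected: twice every n-digit ternary fraction, i.e. 2m / 3^n.
grid = {Fraction(2 * m, 3 ** n) for m in range(3 ** n)}
```

The sets `sums` and `grid` coincide, so at every finite level the pairwise sums fill a grid in [0, 2) whose spacing $2/3^n$ shrinks to 0.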

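11. (Generating a σ-ring) The members of $\sigma(\mathcal{C})$ are exactly those which can be constructed by repeating the σ-ring operations starting with the elements of $\mathcal{C}$. Indeed, put $\mathcal{C}_0 = \mathcal{C}$, and using transfinite induction, for any ordinal α > 0, let

$$\mathcal{C}_\alpha = \Big(\bigcup_{\beta < \alpha} \mathcal{C}_\beta\Big)^{*},$$

where $\mathcal{D}^*$ means the family of all sets which are countable unions of differences of members of $\mathcal{D}$. Then

a) α < β implies $\mathcal{C}_\alpha \subset \mathcal{C}_\beta \subset \sigma(\mathcal{C})$.

b) If Ω is the first uncountable ordinal, then $\sigma(\mathcal{C}) = \bigcup_{\alpha < \Omega} \mathcal{C}_\alpha$.

c) If $\mathrm{card}(\mathcal{C}) \le c$, the same is true of $\mathrm{card}(\sigma(\mathcal{C}))$.

12. $\mathcal{B}(\mathbb{R}^n)$ is the smallest family of sets closed under countable intersection and countable union which contains the open sets.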



4. Measures, the Unique Extension Theorem.

A measure is a non-negative extended real valued countably additive function defined on a σ-ring $\mathcal{S}$ of sets. Often $\mathcal{S}$ can be taken to be a σ-algebra, as in the case of Lebesgue measure.

The first goal is to find out to what extent measures are determined by their values on a family of sets generating their domain. In particular, are there other measures defined on $\mathcal{B}(\mathbb{R}^n)$ or on $\mathcal{M}_\lambda$ agreeing with λ on the (say) bounded open intervals?

Unique Extension.

A Dynkin system, or simply D-system, is a family of sets closed under countable disjoint unions and proper difference. We will call $\mathcal{D}$ a $\mathcal{C}$-local D-system if it is closed under countable disjoint unions and proper differences of members contained in some $C \in \mathcal{C}$.

Explicitly, $\mathcal{D}$ is a D-system if

(a) $\mathcal{H} \subset \mathcal{D}$, $\mathcal{H}$ countable and disjoint $\implies \bigcup \mathcal{H} \in \mathcal{D}$.

(b) A, B ∈ D and A ⊂ B =⇒ B \ A ∈ D.

It is a C-local D-system if it satisfies (a) and

(b′) A, B ∈ D and A ⊂ B ⊂ C ∈ C =⇒ B \ A ∈ D.

Lemma. Every D-system closed under finite intersections is a σ-ring, and any σ-ring is a D-system.

The proof is an easy exercise.
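To see that the intersection hypothesis in the Lemma cannot simply be dropped, one can look at a small finite example (chosen here for illustration, not taken from the notes): the subsets of {1, 2, 3, 4} with an even number of elements. In a finite family, countable disjoint unions reduce to finite ones, so it is enough to test pairs.

```python
from itertools import combinations

S = (1, 2, 3, 4)
# All subsets of S of even cardinality (sizes 0, 2 and 4).
even_sets = {frozenset(c) for r in (0, 2, 4) for c in combinations(S, r)}

closed_disjoint_union = all(
    a | b in even_sets for a in even_sets for b in even_sets if not (a & b)
)
closed_proper_difference = all(
    b - a in even_sets for a in even_sets for b in even_sets if a <= b
)
closed_intersection = all(
    a & b in even_sets for a in even_sets for b in even_sets
)
```

The first two flags come out true and the third false (for instance {1, 2} ∩ {2, 3} = {2}), so this family is a D-system that is not a ring.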

1 D-system Theorem. If a Dynkin system contains a family $\mathcal{C}$ closed under finite intersection, then it contains $\sigma(\mathcal{C})$. More generally, suppose $\mathcal{C}$ is closed under finite intersection, and $\mathcal{D}$ is a $\mathcal{C}$-local D-system containing $\mathcal{C}$. Then $\mathcal{D}$ contains $\sigma(\mathcal{C})$.

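Proof. For the first statement, without loss of generality, we let $\mathcal{D}$ be the smallest D-system containing $\mathcal{C}$. For a family $\mathcal{A}$ of sets, let $\mathcal{A}' = \{E : E \cap A \in \mathcal{D}, \text{ for all } A \in \mathcal{A}\}$. Then, for families $\mathcal{A}$ and $\mathcal{B}$,

(i) $\mathcal{A}'$ is also a Dynkin system,

(ii) $\mathcal{A} \subset \mathcal{B}$ implies $\mathcal{A}' \supset \mathcal{B}'$, and

(iii) $\mathcal{A} \subset \mathcal{A}''$.

To prove (i), let $\{E_m : m = 1, 2, \dots\}$ be a countable disjoint family in $\mathcal{A}'$. Then, for each m,

$$E_m \cap A \in \mathcal{D}, \text{ for all } A \in \mathcal{A}.$$

The family $\{E_m \cap A : m = 1, 2, \dots\}$ is also disjoint, so

$$\Big(\bigcup_m E_m\Big) \cap A = \bigcup_m (E_m \cap A) \in \mathcal{D}, \text{ for all } A \in \mathcal{A}.$$

Therefore, $\bigcup_m E_m \in \mathcal{A}'$.

In a similar way, if $E \subset F$ and $E, F \in \mathcal{A}'$, then for all $A \in \mathcal{A}$ we have $E \cap A \in \mathcal{D}$, $F \cap A \in \mathcal{D}$, $(E \cap A) \subset (F \cap A)$, so $(F \setminus E) \cap A = (F \cap A) \setminus (E \cap A) \in \mathcal{D}$; that is, $F \setminus E \in \mathcal{A}'$.

To prove (ii), let $\mathcal{A} \subset \mathcal{B}$. If $E \in \mathcal{B}'$, then $E \cap B \in \mathcal{D}$, for all $B \in \mathcal{B}$. If $A \in \mathcal{A}$, then $A \in \mathcal{B}$, so $E \cap A \in \mathcal{D}$. This shows $E \in \mathcal{A}'$, as required.

Finally, to prove (iii), let $A \in \mathcal{A}$ and $E \in \mathcal{A}'$. Then,

$$A \cap E = E \cap A \in \mathcal{D}.$$

Thus, for each $A \in \mathcal{A}$, $A \cap E \in \mathcal{D}$, for all $E \in \mathcal{A}'$; that is, $A \in \mathcal{A}''$.

Now, by hypothesis $\mathcal{C} \subset \mathcal{D}$ and $\mathcal{C}$ is closed under finite intersection, so

$$\mathcal{C} \subset \mathcal{C}'.$$

Thus, $\mathcal{C}'$ is a D-system containing $\mathcal{C}$, so

$$\mathcal{D} \subset \mathcal{C}'.$$

But then

$$\mathcal{D}' \supset \mathcal{C}'' \supset \mathcal{C}.$$

Again, $\mathcal{D}'$ is a D-system containing $\mathcal{C}$, so again contains $\mathcal{D}$:

$$\mathcal{D} \subset \mathcal{D}'.$$

Thus, for all $D \in \mathcal{D}$, $D \cap E \in \mathcal{D}$, for all $E \in \mathcal{D}$. This shows that $\mathcal{D}$ is a D-system closed under finite intersections, so is a σ-ring. Since it contains $\mathcal{C}$, it contains $\sigma(\mathcal{C})$.

For the general case, assume $\mathcal{D}$ is only a $\mathcal{C}$-local D-system containing $\mathcal{C}$. Then, for each $C \in \mathcal{C}$, the family $\{C\}'$, defined as above, is still a D-system containing $\mathcal{C}$, so it contains $\sigma(\mathcal{C})$. That is, if $A \in \sigma(\mathcal{C})$ and $C \in \mathcal{C}$, then $A \cap C \in \mathcal{D}$.

Now if $A \in \sigma(\mathcal{C})$, then $A \subset \bigcup_n C_n$, where each $C_n \in \mathcal{C}$. (This was an exercise in section 3.) Hence, $A = \bigcup_n A_n$ where the $A_n$ are disjoint members of $\sigma(\mathcal{C})$ with $A_n \subset C_n$ [write $A = \bigcup_n (A \cap C_n)$ and disjointify], so $A_n = A_n \cap C_n \in \mathcal{D}$, and hence $A \in \mathcal{D}$ also.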

The following is better known than the D-system material, but is less useful.

A monotone class is a family of sets closed under countable increasing unions and countable decreasing intersections.

2 Monotone Class Theorem. A monotone class containing a ring of sets contains the generated σ-ring.

The proof is similar to that of the D-system theorem. See the Exercises.


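3 Unique extension Theorem. Any two (non-negative extended-real) measures which agree and are finite on a family closed under finite intersections also agree on the generated σ-ring.

Proof. Let $\mathcal{C}$ be closed under finite intersection and µ and ν be measures on $\sigma(\mathcal{C})$ which are equal and finite on $\mathcal{C}$. Let $\mathcal{D} = \{A \in \sigma(\mathcal{C}) : \mu(A) = \nu(A)\}$. Then $\mathcal{D}$ is a $\mathcal{C}$-local D-system containing $\mathcal{C}$: countable additivity gives closure under countable disjoint unions, and if $A \subset B \subset C \in \mathcal{C}$ with $A, B \in \mathcal{D}$, then $\mu(B \setminus A) = \mu(B) - \mu(A) = \nu(B) - \nu(A) = \nu(B \setminus A)$, the subtraction being legal because $\mu(C) = \nu(C) < \infty$. Hence, by the D-system theorem, $\sigma(\mathcal{C}) \subset \mathcal{D}$.

4 Corollary. Let $\mathcal{I}_o$ be the family of bounded open intervals of $\mathbb{R}^n$. The only measure defined on $\mathcal{B}(\mathbb{R}^n)$ agreeing with λ on $\mathcal{I}_o$ is λ (restricted to $\mathcal{B}(\mathbb{R}^n)$). The only measure on $\mathcal{M}_\lambda$ agreeing with λ on $\mathcal{I}_o$ is λ itself.

Proof. The first statement follows from the unique extension theorem, since $\mathcal{I}_o$ is closed under finite intersections and generates the Borel sets. For the second, if µ is defined on $\mathcal{M}_\lambda$ and µ = λ on $\mathcal{I}_o$, then $\mu|_{\mathcal{B}(\mathbb{R}^n)}$ is $\lambda|_{\mathcal{B}(\mathbb{R}^n)}$. But $A \in \mathcal{M}_\lambda$ implies $A = B \cup N$, where $B \cap N = \emptyset$, $B \in \mathcal{B}(\mathbb{R}^n)$, and $\lambda(N) = 0$. Now N is contained in a $G_\delta$-set $\hat{N}$ with $\lambda(\hat{N}) = \lambda(N) = 0$, so $\mu(N) \le \mu(\hat{N}) = \lambda(\hat{N}) = 0$, and hence $\mu(A) = \mu(B) + \mu(N) = \lambda(B) + \lambda(N) = \lambda(A)$.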

For the next application, let $\mathcal{Q}$ be the set of dyadic cubes of the form $[a, b)$ where, for each $i = 1, \dots, n$, $a_i = \frac{k_i}{2^m}$ and $b_i = a_i + \frac{1}{2^m}$, $m \in \mathbb{N}$.

5 Corollary. Any two measures on $\mathcal{B}(\mathbb{R}^n)$ which agree and are finite on $\mathcal{Q}$ also agree on all of the Borel sets.

Proof. To prove this one shows in the usual manner that $\mathcal{Q}$ generates the family of Borel sets. The family $\mathcal{Q} \cup \{\emptyset\}$ is closed under finite intersection, since any two elements in $\mathcal{Q} \cup \{\emptyset\}$ which intersect are contained one in the other, so the result follows by the unique extension theorem.
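The "nested or disjoint" property of dyadic cubes that drives this proof can be tested directly in dimension one. The sketch below assumes the $k_i$ are integers and uses a small hypothetical sample of intervals $[k/2^m, (k+1)/2^m)$; the helper names are illustrative only.

```python
from fractions import Fraction

def dyadic(k, m):
    """Half-open dyadic interval [k/2^m, (k+1)/2^m) as an (a, b) pair."""
    return Fraction(k, 2 ** m), Fraction(k + 1, 2 ** m)

def intersect(p, q):
    """Intersection of two half-open intervals, or None if it is empty."""
    a, b = max(p[0], q[0]), min(p[1], q[1])
    return (a, b) if a < b else None

def contains(p, q):
    """True if the interval p contains the interval q."""
    return p[0] <= q[0] and q[1] <= p[1]

# Sample: all dyadic intervals of levels m = 0..3 inside [-1, 1).
cubes = [dyadic(k, m) for m in range(4) for k in range(-2 ** m, 2 ** m)]
nested_or_disjoint = all(
    intersect(p, q) is None or contains(p, q) or contains(q, p)
    for p in cubes for q in cubes
)
```

Whenever two such intervals meet, their intersection is the smaller of the two, so it is again a dyadic interval; this is the closure under finite intersection used above.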

6 Corollary. Lebesgue measure is the only translation-invariant measure on the Borel sets of $\mathbb{R}^n$ giving the unit cube $[0, 1) \times \cdots \times [0, 1)$ measure 1.

Proof. Let $Q = [0, 1) \times \cdots \times [0, 1)$. Let µ be a translation-invariant measure on $\mathcal{B}(\mathbb{R}^n)$ with $\mu(Q) = \lambda(Q) = 1$.

Then Q is the union of $(2^m)^n$ disjoint translates of the cube $C = [0, 1/2^m) \times \cdots \times [0, 1/2^m)$. Thus, $2^{mn}\mu(C) = \mu(Q) = 1$, so $\mu(C) = 1/2^{mn} = \lambda(C)$. Using translation invariance again we see that µ = λ on the family $\mathcal{Q}$ of dyadic cubes, so that by the previous Corollary, µ and λ agree on all Borel sets.

7 Corollary. Every translation invariant measure on the Borel sets of $\mathbb{R}^n$ that gives a finite value to some bounded set with non-empty interior is a multiple of Lebesgue measure (restricted to the Borel sets).

Proof. Let µ be a translation invariant measure on $\mathcal{B}(\mathbb{R}^n)$ and let A be a set with non-empty interior for which $\mu(A) < \infty$. Then, since $[0, 1] \times \cdots \times [0, 1]$ is compact, a finite number of translates of A covers $Q = [0, 1) \times \cdots \times [0, 1)$. Thus, $\mu(Q)$ is also finite.

If $\mu(Q) = 0$, we can cover $\mathbb{R}^n$ by a countable number of translates of Q, so $\mu(\mathbb{R}^n) = 0$ and $\mu(B) = 0 \cdot \lambda(B)$ for all Borel sets B.

If $\mu(Q) = c > 0$, let $\nu(B) = \mu(B)/c$, for each Borel set B. Then ν is a translation invariant measure on $\mathcal{B}(\mathbb{R}^n)$, with $\nu(Q) = 1$. Thus, $\nu(B) = \lambda(B)$, for all B. Thus, $\mu(B) = c\lambda(B)$, for all Borel sets B.

Change of measure under a linear transformation.

Suppose T is a linear transformation on $\mathbb{R}^3$ to itself. If the volume of A is known, what is the volume of T[A]? More generally in the n-dimensional case, if A is Lebesgue-measurable, what is the relationship between λ(A) and λ(T[A])?

7 Theorem. If A is a Lebesgue measurable subset of $\mathbb{R}^n$ and T is a linear transformation of $\mathbb{R}^n$, then T[A] is also Lebesgue measurable and $\lambda(T[A]) = |\det(T)|\,\lambda(A)$.

Proof.

We only write out the case n > 1, leaving the easy case n = 1 to the reader.

If T is singular (that is, non-invertible), then $T(\mathbb{R}^n)$ is a proper vector subspace of $\mathbb{R}^n$, so has Lebesgue measure 0, by an exercise in Chapter 2. Since det(T) is also 0, we obtain $\lambda(T(A)) = 0 = |\det(T)|\,\lambda(A)$. (See also the Note below.)

For the case where $T : \mathbb{R}^n \to \mathbb{R}^n$ is an invertible linear transformation, we prove first the corresponding result for Borel sets. Since T is linear and defined on $\mathbb{R}^n$, it is a continuous mapping. Since it is invertible, it is also one-to-one. We proved in the previous chapter (Theorem 3.7) that every one-to-one continuous mapping maps Borel sets to Borel sets. Thus, for each $B \in \mathcal{B}(\mathbb{R}^n)$, T(B) is also a Borel set, so that λ(T(B)) makes sense.

The map $B \mapsto \lambda(T[B])$ is a measure on the Borel sets. Indeed, λ(T(∅)) = λ(∅) = 0, and if $(B_i)$ is a disjoint sequence of Borel sets, then the images $T(B_i)$ are disjoint (because T is one-to-one) and

$$\lambda\Big(T\Big(\bigcup_i B_i\Big)\Big) = \lambda\Big(\bigcup_i T(B_i)\Big) = \sum_i \lambda(T(B_i)).$$

This measure is also translation invariant because λ(T(x + B)) = λ(Tx + T(B)) = λ(T(B)). Moreover, the image of each bounded set is bounded, so by Corollary 7, there is a constant, which we will denote Δ(T) and call the scale factor of T, with λ(T(B)) = Δ(T) λ(B), for all Borel sets B. Putting the unit cube Q in for B yields Δ(T) = λ(T(Q)). It is our job to prove that Δ(T) = |det(T)|.

If S, T are two non-singular linear transformations we see that Δ(ST) = λ(ST(Q)) = Δ(S) λ(T(Q)) = Δ(S) Δ(T).

Now, every invertible linear transformation T on $\mathbb{R}^n$ can be factored as a product (composition) $T = E_k \cdots E_1$, where the $E_i$ are “elementary” linear transformations of the forms:

$E_i$ interchanges two coordinates;

$E_i$ multiplies a co-ordinate by −1;

$E_i$ multiplies the first co-ordinate by c > 0;

$E_i$ adds the second co-ordinate to the first.

Since $T \mapsto \Delta(T)$ and $T \mapsto |\det T|$ are both multiplicative, it is enough to establish the equality for T of each of these types.

If T is one of the first two types, then T maps the unit ball $U = B(0, 1) = \{x : |x| \le 1\}$ onto itself, so λ(U) = λ(T(U)) = Δ(T) λ(U) and hence Δ(T) = 1. The determinants in these two cases are both −1, so Δ(T) = |det(T)|.

If T is of the multiplication type: $(x_1, x_2, \dots, x_n) \mapsto (cx_1, x_2, \dots, x_n)$, c > 0, then det T = c and Δ(T) = λ(T(Q)) = λ([0, c] × [0, 1] × · · · × [0, 1]) = c.

Finally, if $T : (x_1, x_2, \dots, x_n) \mapsto (x_1 + x_2, x_2, \dots, x_n)$, let $S : (x_1, x_2, \dots, x_n) \mapsto (x_1, -x_2, \dots, x_n)$. Then STST is the identity and Δ(S) = 1, so $1 = \Delta(S)\Delta(T)\Delta(S)\Delta(T) = \Delta(T)^2$. Hence Δ(T) = 1 = det T.

Now, as we said, each invertible T can be factored into transformations of the above types, and since det is multiplicative, we have

$$\lambda(T[B]) = |\det T|\,\lambda(B), \text{ for all } B \in \mathcal{B}(\mathbb{R}^n).$$
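In the plane the formula can be checked by hand or by machine: the image of the unit square under a linear map is a parallelogram, whose area the shoelace formula computes exactly. The sketch below does this for one hypothetical matrix, and also confirms the identity STST = I used for the shear above. All function names and sample matrices are illustrative assumptions.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = T
    return a * d - b * c

def apply(T, p):
    """Apply the 2x2 matrix T to the point p = (x, y)."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)

def compose(S, T):
    """Matrix of the composition S∘T (apply T first, then S)."""
    (a, b), (c, d) = S
    (e, f), (g, h) = T
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def polygon_area(vertices):
    """Shoelace formula; vertices are listed in order around the boundary."""
    n = len(vertices)
    twice = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice) / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
T = ((2, 1), (1, 3))                      # sample invertible matrix
area = polygon_area([apply(T, p) for p in unit_square])

shear = ((1, 1), (0, 1))                  # (x1, x2) -> (x1 + x2, x2)
flip = ((1, 0), (0, -1))                  # (x1, x2) -> (x1, -x2)
stst = compose(compose(flip, shear), compose(flip, shear))
```

Here `area` equals `abs(det2(T))`, and `stst` is the identity matrix.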

(This concludes the proof that $\lambda \circ T = |\det T|\,\lambda$ on the Borel sets. But we still must handle non-Borel sets.)

Suppose A is any λ-measurable set. Then, there are Borel sets $B_1 \supset A \supset B_2$, with $\lambda(B_1 \setminus B_2) = 0$. Then

$$T[B_1] \supset T[A] \supset T[B_2].$$

But $\lambda(T[B_1] \setminus T[B_2]) \le \lambda(T[B_1 \setminus B_2]) = |\det T|\,\lambda(B_1 \setminus B_2) = 0$, so all three have the same measure, namely $|\det T|\,\lambda(B_1) = |\det T|\,\lambda(A)$, as required. Finally, T[A] is measurable, since it differs from the measurable set $T[B_2]$ by a null set.

Note. (1) Here is another way to handle the non-invertible case. If T is non-invertible, it can be factored as T = SR, where $R(x_1, \dots, x_n)$ has 0 for its nth coordinate. Thus $R(B) \subset \mathbb{R}^{n-1} \times \{0\}$, so $\lambda(T(\mathbb{R}^n)) = |\det S|\,\lambda(R(\mathbb{R}^n)) \le |\det S|\,\lambda(\mathbb{R}^{n-1} \times \{0\}) = 0$. Hence $\lambda(T(A)) = |\det T|\,\lambda(A)$, for all $A \in \mathcal{B}(\mathbb{R}^n)$.

(2) It is to be emphasised that in the singular case, if B is a Borel set, T(B) need not be a Borel set, just a Lebesgue measurable set of measure 0.

Lebesgue-Stieltjes outer measure. (Bounded interval approach)

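Let α be an increasing function on R to R. We recall, for such a function, that the right and left limits

$$\overline{\alpha}(a) = \alpha(a+) = \lim_{x \to a+} \alpha(x) = \inf\{\alpha(x) : x > a\}$$

$$\underline{\alpha}(a) = \alpha(a-) = \lim_{x \to a-} \alpha(x) = \sup\{\alpha(x) : x < a\}$$

satisfy

$$a < b \implies \underline{\alpha}(a) \le \alpha(a) \le \overline{\alpha}(a) \le \underline{\alpha}(b) \le \alpha(b) \le \overline{\alpha}(b). \qquad (*)$$

In particular, $\overline{\alpha}$ and $\underline{\alpha}$ are also increasing functions; $\overline{\alpha}$ is right continuous and $\underline{\alpha}$ is left continuous.

From such an α, we define a set function $\mu = \mu_\alpha$ on $\mathcal{I}$, the family of bounded intervals of R, as follows:

$$\mu[a, b] = \overline{\alpha}(b) - \underline{\alpha}(a)$$

$$\mu(a, b] = \overline{\alpha}(b) - \overline{\alpha}(a)$$

$$\mu[a, b) = \underline{\alpha}(b) - \underline{\alpha}(a)$$

$$\mu(a, b) = \underline{\alpha}(b) - \overline{\alpha}(a)$$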
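To make the four formulas concrete, here is a minimal sketch for one hypothetical increasing function, namely α(x) = x for x < 0 and α(x) = x + 1 for x > 0, with any value between 0 and 1 at x = 0. Its one-sided limits α(x+) and α(x−) are written out by hand (an assumption specific to this example); the interval measure is then read off according to which endpoints are included.

```python
def alpha_right(x):
    """alpha(x+) for the example alpha, which jumps by 1 at 0."""
    return x + 1 if x >= 0 else x

def alpha_left(x):
    """alpha(x-) for the same example."""
    return x + 1 if x > 0 else x

def mu(a, b, left_closed, right_closed):
    """Measure of the interval with endpoints a <= b:
    a closed right end uses alpha(b+), an open one alpha(b-);
    a closed left end uses alpha(a-), an open one alpha(a+)."""
    upper = alpha_right(b) if right_closed else alpha_left(b)
    lower = alpha_left(a) if left_closed else alpha_right(a)
    return upper - lower

point_mass_at_zero = mu(0, 0, True, True)      # mu of the singleton {0}
closed_piece = mu(-1, 0, True, True)           # mu[-1, 0]
half_open_piece = mu(-1, 0, True, False)       # mu[-1, 0)
```

The jump at 0 shows up as a point mass: µ{0} = 1, so µ[−1, 0] = 2 while µ[−1, 0) = 1. Splitting [−1, 1) into [−1, 0), {0} and (0, 1) gives 1 + 1 + 1 = 3 = µ[−1, 1), an instance of the additivity discussed next.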

The Lebesgue-Stieltjes outer measure associated with α is the function defined on all subsets of R by

$$\mu^*(A) = \mu^*_\alpha(A) = \inf\Big\{\sum_{E \in \mathcal{H}} \mu_\alpha(E) : \mathcal{H} \text{ is a countable } \mathcal{I}\text{-cover of } A\Big\}.$$

That is, $\mu^*_\alpha$ is the Caratheodory outer measure generated by $\mu_\alpha$ and the bounded intervals of R. This is an outer measure, since it was generated by the Caratheodory process from a non-negative function.

We now go through the same procedure as for Lebesgue measure.

(a) µ is 2-additive on I, (eight cases) and then by induction (using Lemma 3,section 1) it is therefore finitely additive.

(b) from this, as in proposition 6, section 1, µ is finitely subadditive and superad-ditive on I.

(c) For each I ∈ I, there exist open interval U ⊃ I and compact interval K ⊂ Isuch that µ(U ) − µ(I) and µ(I) − µ(K) are arbitrarily small. This requires aproof of all four cases: We will work on the “outside”, showing such U exist.Suppose I = [a, b]. If a′ < a < b < b′, then using (∗), we have

µ[a, b] ≤ µ(a′, b′) ≤ µ[a′, b′],

and as a′ −→ a and b′ −→ b, we have µ[a′, b′] = α(b′)−α(a′) −→ α(b)−α(a) =µ[a, b], by the right continuity of α and the left-continuity of α. Thus, for eachε > 0, U = (a′, b′) can be taken so that µ(U )−µ(I) < ε. If I = (a, b], one takesU = (a, b′) and uses

µ(I) ≤ µ(U ) ≤ µ(a, b′] −→ µ(I).

The case I = [a, b) is similar, and the case I = (a, b) is trivial, since it is alreadyopen. The approximation from inside by compacts follows the same pattern.

(d) It follows, then, exactly as for Lebesgue measure, that µ is countably additive on I, and also countably subadditive.

(e) The countable subadditivity of µ entails that µ∗ = µ∗α coincides with µ on I, just as in Theorem 2.1.

(f) Since µ∗ is an outer measure, the family Mµ∗ of µ∗-measurable sets is a σ-algebra on which µ∗ is countably additive and which contains the family Nµ∗ of µ∗-null sets.

(g) The proof of Th 2.6 carries over completely, showing that all bounded intervals, hence all Borel sets, are measurable.

(h) Using (c) above, we can follow the proof of 2.9 to obtain

$$\mu^*(A) = \inf\Big\{ \sum_{I \in \mathcal{U}} \mu(I) : \mathcal{U} \text{ is a countable cover of } A \text{ by bounded open intervals} \Big\}.$$

(i) So µ∗ is open outer regular on all sets and is compact inner regular on the µ∗-measurable sets.

As with λ, we now let µα stand for the restriction of µ∗α to the family of µ∗α-measurable sets. This is the Lebesgue-Stieltjes measure generated by α.

The unique extension theorem shows that any measure which coincides with µα on (say) the intervals (a, b] of (a dense subset of) the reals coincides with µα on the Borel sets. The analogue of Th 3.5 also holds, because of the outer and inner regularity: Each subset of R has a Gδ measurable hull. Each µ∗-measurable set is squeezed between a Gδ-set and a Kσ-set, the difference between the two being a µ-null set.

5 Corollary. Each monotone α : R −→ R has only countably many jumps.

Proof. By replacing, if necessary, α by −α, we may assume α is increasing. Now, for x ∈ R, µα({x}) = α(x+) − α(x−) is the jump of α at x. We say α has a jump at x if this number is > 0. Now, if α has n jumps of size ≥ 1/k in an interval [a, b], say at x1, . . . , xn, then n/k ≤ µα({x1, . . . , xn}) ≤ µα([a, b]) < ∞; so there are only finitely many jumps of size ≥ 1/k, and thus only countably many jumps in [a, b]. Since R is the union of countably many bounded intervals, there are only countably many jumps altogether.
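The counting bound in the proof, that the number of jumps of size ≥ 1/k in [a, b] is at most k µα([a, b]), can be checked on a small example. The sketch below assumes a hypothetical pure-jump α on [0, 1] with a jump of size 2^(−n) at the point 1/n; the names `jumps`, `mu_interval` and `count_big_jumps` are illustrative only.

```python
from fractions import Fraction

# Hypothetical pure-jump increasing function on [0, 1]:
# a jump of size 2**-n at the point 1/n, for n = 1, ..., 30.
jumps = {Fraction(1, n): Fraction(1, 2**n) for n in range(1, 31)}

def mu_interval(a, b):
    """mu_alpha[a, b] for a pure-jump alpha: the total size of the jumps in [a, b]."""
    return sum(size for x, size in jumps.items() if a <= x <= b)

def count_big_jumps(a, b, k):
    """Number of jumps of size >= 1/k located in [a, b]."""
    return sum(1 for x, size in jumps.items() if a <= x <= b and size >= Fraction(1, k))

total = mu_interval(0, 1)
# n jumps of size >= 1/k force n/k <= mu[a, b], i.e. n <= k * mu[a, b].
bounds_hold = all(count_big_jumps(0, 1, k) <= k * total for k in range(1, 200))
```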

This corollary can also be proved independently of the construction of µα.

Lebesgue-Stieltjes outer measure can be developed in many other ways. Here is a theorem tying some of the other methods to the one we used.



9 Theorem. For an increasing α : R −→ R, let τ = τα be defined on the intervals by τ(I) = α(b) − α(a), if I has endpoints a ≤ b. Then

(1) µ∗ is the (Caratheodory) outer measure generated by τ and the bounded open intervals.

(2) If α is right-continuous, µ∗ is the outer measure generated by τ and the intervals (a, b] ∈ I.

(3) If α is left-continuous, µ∗ is the outer measure generated by τ and the intervals [a, b) ∈ I.

Proof.

We prove (1), which many find the most difficult, and leave the rest as exercises.

Let τ∗o be the Caratheodory outer measure generated by τ and the set Io of bounded open intervals. Thus,

$$\tau^*_o(A) = \inf\Big\{ \sum_{U \in \mathcal{U}} \tau(U) : \mathcal{U} \text{ is a countable } \mathcal{I}_o\text{-cover of } A \Big\}.$$

And we recall (see point (h) in the development of Lebesgue-Stieltjes outer measure)

$$\mu^*_\alpha(A) = \inf\Big\{ \sum_{U \in \mathcal{U}} \mu_\alpha(U) : \mathcal{U} \text{ is a countable } \mathcal{I}_o\text{-cover of } A \Big\}.$$

If U = (a, b) is a (bounded) open interval, then

$$\mu_\alpha(U) = \alpha(b-) - \alpha(a+) \le \alpha(b) - \alpha(a) = \tau(U).$$

Therefore, if $\mathcal{U}$ is a countable family of open intervals covering A (that is, a countable Io-cover of A),

$$\mu^*_\alpha(A) \le \sum_{U \in \mathcal{U}} \tau(U).$$

Taking the infimum over all such covers gives

$$\mu^*_\alpha(A) \le \tau^*_o(A).$$

For the reverse inequality, first let U = (a, b) ∈ Io. Since α is increasing, it has at most countably many jumps, hence the set of points of continuity of α is dense in R. Thus, we may choose sequences (xk) and (yk) of points of continuity of α such that xk ↓ a, yk ↑ b, and x1 = y1. Then,

$$(a, b) = \bigcup_{k=1}^{\infty} (x_{k+1}, x_k] \cup \bigcup_{k=1}^{\infty} (y_k, y_{k+1}],$$

$$\sum_{k=1}^{\infty} \tau(x_{k+1}, x_k] = \lim_n \sum_{k=1}^{n} \big(\alpha(x_k) - \alpha(x_{k+1})\big) = \lim_n \big(\alpha(x_1) - \alpha(x_{n+1})\big) = \alpha(x_1) - \alpha(a+)$$

and

$$\sum_{k=1}^{\infty} \tau(y_k, y_{k+1}] = \alpha(b-) - \alpha(y_1) = \alpha(b-) - \alpha(x_1),$$

so

$$\sum_{k=1}^{\infty} \tau(x_{k+1}, x_k] + \sum_{k=1}^{\infty} \tau(y_k, y_{k+1}] = \alpha(b-) - \alpha(a+) = \mu_\alpha(a, b).$$

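Now, if ε > 0 is given, since each xk and yk is a point of continuity of α, we can find x′k > xk and y′k > yk with

$$\alpha(x'_k) - \alpha(x_k) < \varepsilon/2^{k+1} \quad\text{and}\quad \alpha(y'_k) - \alpha(y_k) < \varepsilon/2^{k+1},$$

so that

$$\sum_k \tau(x_{k+1}, x'_k) + \sum_k \tau(y_k, y'_{k+1}) < \mu_\alpha(a, b) + \varepsilon.$$

This shows that for each open interval U and each ε > 0, there is a countable family H ⊂ Io such that $\sum_{I \in H} \tau(I) < \mu(U) + \varepsilon$ and ⋃H ⊃ U. (The intervals in H overlap; only the covering property and the bound on the sum are needed.)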

Let $\mathcal{U} = \{U_n : n \in N\}$ be a countable Io-cover of A. Using the statement just proved, for each n choose a countable Hn ⊂ Io with union containing Un and $\sum_{I \in H_n} \tau(I) < \mu(U_n) + \varepsilon/2^n$. Then the family ⋃n Hn still covers A and $\sum_n \sum_{I \in H_n} \tau(I) \le \sum_n \mu(U_n) + \varepsilon$. Thus,

$$\tau^*_o(A) \le \sum_n \mu(U_n) + \varepsilon.$$

Let ε −→ 0 and then take the infimum over all countable Io-covers of A to get

$$\tau^*_o(A) \le \mu^*_\alpha(A),$$

as required to complete the proof.

If ν is a measure defined on a σ-ring S of sets, the members of S are called ν-measurable.

10 Theorem. Let ν be a measure on subsets of R, for which compact sets (or bounded intervals) are measurable of finite measure. Then ν coincides with some Lebesgue-Stieltjes outer measure on the Borel sets.

Proof. If compact sets or bounded intervals are measurable, so are all the Borel sets; if compact sets have finite measure, so do all bounded intervals. Thus, we may define α(x) = ν(0, x ∨ 0] − ν(x ∧ 0, 0], that is,

$$\alpha(x) = \begin{cases} \nu(0, x], & x \ge 0, \\ -\nu(x, 0], & x < 0. \end{cases}$$

Then α is clearly increasing and is right-continuous, by the σ-continuity of ν. Indeed, if (tn) is a sequence of reals with tn > x and tn ↓ x ≥ 0, then (0, x] = ⋂n (0, tn], so limn α(tn) = limn ν(0, tn] = ν(⋂n (0, tn]) = ν(0, x] = α(x). A similar calculation works when x < 0. If µ∗ = µ∗α, then µ∗(a, b] = α(b) − α(a) = ν(a, b], for all a ≤ b. Thus, by the unique extension theorem, µ∗ and ν agree on all Borel sets. As a bonus, we see that ν is open outer regular and compact inner regular on the Borel sets.
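The construction of α from ν can be tried on a simple case. The following sketch assumes a hypothetical measure ν equal to Lebesgue measure plus three point masses (the dictionary `point_masses` is invented for the example) and checks the identity α(b) − α(a) = ν(a, b] on a grid of endpoints.

```python
from fractions import Fraction

# Hypothetical measure nu on R: Lebesgue measure plus point masses.
point_masses = {Fraction(-1): Fraction(2), Fraction(0): Fraction(1), Fraction(3, 2): Fraction(1, 2)}

def nu(a, b):
    """nu(a, b] for a <= b: the length plus the point masses lying in (a, b]."""
    length = Fraction(b) - Fraction(a)
    return length + sum(m for p, m in point_masses.items() if a < p <= b)

def alpha(x):
    """alpha(x) = nu(0, x] for x >= 0 and -nu(x, 0] for x < 0, as in the proof."""
    return nu(0, x) if x >= 0 else -nu(x, 0)

grid = [Fraction(n, 2) for n in range(-6, 7)]
agrees = all(alpha(b) - alpha(a) == nu(a, b) for a in grid for b in grid if a <= b)
```

Note that a point mass at p contributes to α on the right of p, which is what makes α right-continuous at p.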

Note. If P is a probability measure on B(R), that is, a measure on B(R) with P(R) = 1, its (probability) distribution function is defined to be F(x) = P(−∞, x]. Besides being increasing and right-continuous, F has a limit 0 at −∞ and 1 at +∞. The Unique Extension Theorem shows that no two probabilities can have the same distribution function; the extension theorem shows that each such function yields a probability measure. Thus, these functions are in one-to-one correspondence with the probability measures on B(R).

Completion of measures.

Let S be a σ-ring in S, µ a measure on S. One calls µ complete if A ⊂ B ∈ S with µ(B) = 0 implies A ∈ S (and hence µ(A) = 0). Nµ = {B ∈ S : µ(B) = 0} is the family of µ-null sets.

Recall that Lebesgue measure (that is, Lebesgue outer measure restricted to Mλ, the Lebesgue measurable sets) is a complete measure. Moreover, if A ∈ Mλ, then there exist B, C ∈ B(Rn) with B ⊃ A ⊃ C and λ(B \ C) = 0.

Theorem. Let µ be a measure on S. Let S̄ be the set of all those A such that there exist B, C ∈ S with B ⊃ A ⊃ C and µ(B \ C) = 0, and in this situation put µ̄(A) = µ(C) = µ(B). Then

(1) S̄ is a σ-ring containing S;

(2) µ̄ is a complete measure extending µ; and

(3) any complete measure extending µ also extends µ̄.

Proof. (1) Clearly S ⊂ S̄, so we just verify the axioms of a σ-ring.

If A1, A2 ∈ S̄, then there exist B1, B2, C1, C2 ∈ S with Bi ⊃ Ai ⊃ Ci and Bi \ Ci ∈ Nµ. Then,

B1 \ C2 ⊃ A1 \ A2 ⊃ C1 \ B2

and

(B1 \ C2) \ (C1 \ B2) ⊂ (B1 \ C1) ∪ (B2 \ C2) ∈ Nµ,

which shows that A1 \ A2 ∈ S̄.

To show S̄ is closed under countable unions, let (An) be a sequence in S̄ and Bn ⊃ An ⊃ Cn as above. Then ⋃n Bn ⊃ ⋃n An ⊃ ⋃n Cn and

$$\bigcup_n B_n \setminus \Big( \bigcup_n C_n \Big) \subset \bigcup_n (B_n \setminus C_n),$$

which is again null, as required.

(2) Before showing µ̄ is a complete measure, we have to show it is well defined. But if A ∈ S̄ is squeezed between two pairs B1 ⊃ A ⊃ C1 and B2 ⊃ A ⊃ C2, with each Bi \ Ci null, then

Bi ⊃ A ⊃ C1 ∪ C2 ⊃ Ci

and µ(Bi \ Ci) = 0, so µ(C1) = µ(C1 ∪ C2) = µ(C2).

Now suppose (An) is a disjoint sequence in S̄ and Bn, Cn are as before. Then (Cn) is also disjoint and

$$\overline{\mu}\Big(\bigcup_n A_n\Big) = \mu\Big(\bigcup_n C_n\Big) = \sum_n \mu(C_n) = \sum_n \overline{\mu}(A_n),$$

which shows countable additivity.

It is obvious that µ̄ extends µ. It is complete because, if D ⊂ A and µ̄(A) = 0, then there exist B ⊃ A ⊃ C with B \ C of measure 0 and µ(C) = µ(B) = 0. But then B ⊃ D ⊃ ∅ and µ(B \ ∅) = 0, so that D ∈ S̄.

(3) If ν is complete and extends µ, then A ∈ S̄ implies there exist B, C as before, so ν(B \ C) = µ(B \ C) = 0 and A \ C ⊂ B \ C, so that A \ C belongs to the domain of ν and has ν-measure 0. Thus, A also belongs to the domain of ν and ν(A) = ν(C) = µ(C) = µ̄(A).
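On a finite set the completion can be computed by brute force, which makes the squeezing condition B ⊃ A ⊃ C concrete. The sketch below is only an illustration under assumed data: a hypothetical σ-algebra on {1, 2, 3, 4} with atoms {1}, {2, 3}, {4}, the atom {2, 3} being null. The names `atoms`, `powerset` and `complete` are not from the text.

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4})
# Hypothetical sigma-algebra on X with atoms {1}, {2, 3}, {4}; the atom {2, 3} is null.
atoms = {frozenset({1}): 1, frozenset({2, 3}): 0, frozenset({4}): 2}

def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# Every member of the sigma-algebra is a union of atoms; mu adds up the atom weights.
atom_list = list(atoms)
S = {}
for r in range(len(atom_list) + 1):
    for chosen in combinations(atom_list, r):
        S[frozenset().union(*chosen)] = sum(atoms[a] for a in chosen)

def complete(S):
    """Completion as a dict A -> mu_bar(A): A is squeezed between C and B with mu(B - C) = 0."""
    completed = {}
    for A in powerset(X):
        for C, mass_C in S.items():
            for B in S:
                if C <= A <= B and S[B - C] == 0:
                    completed[A] = mass_C
    return completed

completed = complete(S)
```

Here the original σ-algebra has 8 members, while the completion contains all 16 subsets of X, because the null atom {2, 3} may be split freely.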

1. A D-system closed under finite intersections is a σ-ring. Each σ-ring is a D-system.

2. Formulate and prove an analogue of the D-system theorem for rings instead of σ-rings.

3. A λ-system (in Billingsley) is a family of sets closed under proper difference and countable increasing unions and which contains the whole space. These are Dynkin systems in our sense. And Dynkin systems which contain the whole space are λ-systems.

4. If a λ-system contains a family C closed under finite intersection, then it contains the sigma-algebra generated by C.

5. If T is a linear transformation of Rn, then even for non-measurable A, λ(T[A]) = |det(T)| λ[A].

6. We know that any measure ν on a σ-ring S of subsets of R for which compact sets are measurable of finite measure agrees on the Borel sets with some Lebesgue-Stieltjes outer measure. If also ν is complete, that is, A ∈ S, ν(A) = 0 and B ⊂ A implies B ∈ S, then ν agrees with some Lebesgue-Stieltjes outer measure µ∗α on all the µ∗α-measurable sets.

7. Let P be the set of all intervals in Q of the form (a, b), with a < b, a, b ∈ Q. Show that σ(P) is the set of all subsets of Q. Let µ(B) = the number of points in B, for all B ∈ σ(P), and let ν(B) = 2µ(B), for all such B. Show that µ = ν on P, but not on σ(P). Why does this not contradict the Unique Extension Theorem?

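8. Proof of the Monotone Class Theorem. Suppose $\mathcal{M}$ is the smallest monotone class (MC) containing the ring $\mathcal{R}$. For a family of sets $\mathcal{A}$, let

$$\mathcal{A}' = \{E : A \cap E \in \mathcal{M},\ A \setminus E \in \mathcal{M},\ E \setminus A \in \mathcal{M}, \text{ for all } A \in \mathcal{A}\}.$$

Then $\mathcal{A}'$ is also a monotone class.

One has $\mathcal{A} \subset \mathcal{B}$ implies $\mathcal{A}' \supset \mathcal{B}'$, and $\mathcal{A} \subset \mathcal{A}''$.

Now, $\mathcal{R} \subset \mathcal{M}$ and $\mathcal{R}$ is closed under finite intersection and difference, so $\mathcal{R} \subset \mathcal{R}'$.

Since $\mathcal{R}'$ is a MC containing $\mathcal{R}$, $\mathcal{M} \subset \mathcal{R}'$. But then $\mathcal{M}' \supset \mathcal{R}'' \supset \mathcal{R}$.

Since $\mathcal{M}'$ is a MC containing $\mathcal{R}$, $\mathcal{M} \subset \mathcal{M}'$. This shows that $\mathcal{M}$ is a monotone class closed under finite intersections and difference, so is a σ-ring.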

9. Let µ be a finite measure on a σ-ring S and let (An) be a sequence of members of S. Prove that

$$\mu\Big(\bigcap_n \bigcup_{k \ge n} A_k\Big) \ge \limsup_n \mu(A_n).$$

10. Let D be the family of open disks {B(x, r) : x ∈ R², r > 0} in R². For each B(x, r) ∈ D let τ(B(x, r)) = r². Let τ∗ be the Caratheodory outer measure generated by τ and D.

a) Prove that τ∗ is open outer regular (at each A ⊂ R²).

b) Assuming open sets of R² are τ∗-measurable, show that τ∗ is a multiple of Lebesgue measure on the Borel sets; that is, that there exists a real constant K such that τ∗(A) = Kλ(A), for A ∈ B(R²). What about other sets A ⊂ R²?

11. Let f be a continuous strictly increasing function on R onto R. Recall that f⁻¹ also is continuous. Prove that on the Borel sets, the Lebesgue-Stieltjes measure generated by f is given by the formula

µf(B) = λ(f[B]).

12. Let µ be a measure on the σ-ring S. Let N be the collection of sets of µ-measure 0, and N̂ the collection of subsets of members of N. Show that the completed σ-ring S̄ can be characterized as

the set of all A of the form B ∪ N, with B ∈ S and N ∈ N̂,

or the set of all A of the form B △ N, with B ∈ S and N ∈ N̂,

or the set of all A of the form B \ N, with B ∈ S and N ∈ N̂,

and that the value of the completion at A is µ̄(A) = µ(B) in this case.



5. Measurable functions.

Definition. Let S be a σ-ring in S and let E be a σ-ring in another space E. For a mapping f on a subset of S to E, f is called S-E measurable iff the set [f ∈ B] = f⁻¹[B] belongs to S, for each B ∈ E. Occasionally, one has to deal with the possibility of domain(f) not being all of S; however, if E is an algebra, one does have domain(f) = f⁻¹[E] ∈ S.

Whenever E is understood, one just says “S-measurable”. (In the case E is R or Rn, E is usually understood to be the family of Borel sets of E.) When S and E are both the σ-algebras of Borel sets of special spaces, S-E measurable maps are called Borel measurable. Instead of “Borel measurable function”, one often says “Borel function”.

In Probability Theory, one fixes a σ-algebra F of “events”. In this case, an F-E measurable map, usually denoted by a symbol such as X or Y, is called a random variable, or, in case E = Rn, a random vector.

1 Lemma. Let f be a function on S to E.

(1) If S is a σ-ring in S, then {B ⊂ E : f⁻¹[B] ∈ S} is a σ-ring in E.

(2) If E is a σ-ring in E, then {f⁻¹[B] : B ∈ E} is a σ-ring in S.

The same statements hold for “σ-algebra” instead of “σ-ring”.

Proof.

(1) This is a repeat of the proof given in 3.5(1), with the family of Borel sets replaced by a general σ-ring S.

Let W = {B ⊂ E : f⁻¹[B] ∈ S}. Then:

(i) if A, B ∈ W, then f⁻¹[A \ B] = f⁻¹[A] \ f⁻¹[B] ∈ S, so that A \ B ∈ W; and

(ii) if (Bn) is a sequence in W, then (f⁻¹[Bn]) is a sequence in S, so f⁻¹[⋃n Bn] = ⋃n f⁻¹[Bn] ∈ S, and hence ⋃n Bn ∈ W, completing the proof.

(2) Exercise.

The remark about σ-algebras follows from f⁻¹[E] = S, since f is assumed defined on all of S.

If C is a family of sets in E and f : S −→ E, one sometimes writes f⁻¹(C) for {f⁻¹[B] : B ∈ C}. Thus the second part of the lemma says that if E is a σ-ring, then f⁻¹(E) is one also. Then f is S-E measurable iff f⁻¹(E) ⊂ S. Thus f⁻¹(E) is the smallest σ-ring in S which makes f measurable as a map into (E, E).

2 Theorem. Let f : S → E, let S be a σ-ring in S, and let C be a family of subsets of E generating the σ-ring E (that is, σ(C) = E).

(1) Then f is S-E measurable iff f⁻¹[C] ∈ S, for all C ∈ C;

(2) hence, σ(f⁻¹(C)) = f⁻¹(σ(C)).


Proof. (1) Certainly, if f is a measurable function the condition is satisfied. To prove the converse, suppose f⁻¹[C] ∈ S, for all C ∈ C; that is, suppose C ⊂ {B : f⁻¹[B] ∈ S}. By Lemma 1 (1), this latter is a σ-ring and, since it contains C, it also contains σ(C). In other words, f⁻¹[B] ∈ S, for all B ∈ E; that is, f is S-E measurable.

(2) Taking S to be σ(f⁻¹(C)), (1) gives f⁻¹(σ(C)) ⊂ σ(f⁻¹(C)). On the other hand, f⁻¹(σ(C)) is a σ-ring containing f⁻¹(C) and so contains σ(f⁻¹(C)).
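Part (2) can be checked mechanically on finite sets, where a σ-ring is just a family closed under difference and union. The sketch below is an illustration under assumed data: the map `f`, the family `generators` and the helper `sigma_ring` are all hypothetical.

```python
def sigma_ring(family):
    """Smallest family containing `family` and closed under difference and union (finite case)."""
    sets = {frozenset(A) for A in family}
    changed = True
    while changed:
        changed = False
        for A in list(sets):
            for B in list(sets):
                for C in (A - B, A | B):
                    if C not in sets:
                        sets.add(C)
                        changed = True
    return sets

domain = range(6)
# Hypothetical map from {0, ..., 5} into E = {"a", "b", "c", "d"}; the value "d" is never taken.
f = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c", 5: "c"}
generators = [{"a", "b"}, {"b", "c"}, {"d"}]

def preimage(B):
    """f^{-1}[B] as a frozenset of points of the domain."""
    return frozenset(x for x in domain if f[x] in B)

lhs = sigma_ring(preimage(C) for C in generators)    # sigma(f^{-1}(C))
rhs = {preimage(B) for B in sigma_ring(generators)}  # f^{-1}(sigma(C))
```

Both families coincide, as the theorem predicts; the check works because taking preimages commutes with differences and unions.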

Notation. To simplify the writing of sets defined in terms of functions, it is convenient to abbreviate (f < c) = {x : f(x) < c}, (f ∈ B) = {x : f(x) ∈ B}, (f1 ∈ A1, f2 ∈ A2, . . . ) = {x : f1(x) ∈ A1, f2(x) ∈ A2, . . . }, etc. We may use either “square brackets” [ ] or parentheses ( ) in this notation.

3 Corollary. Let S be a σ-ring in S.

(1) For a function f : S −→ R, f is S-measurable iff (f ≤ c) = f⁻¹[(−∞, c]] ∈ S, for every c in the real numbers.

(2) If f : Rn −→ Rm is continuous, then f is Borel-measurable.

(3) Let f : S −→ Rn; f = (f1, . . . , fn). Then f is S-measurable iff each fi is S-measurable.

Proof. (1) Since C := {(−∞, c] : c ∈ R} is a family of sets in R with σ(C) = B(R), this follows immediately from Theorem 2.

(2) Here take S to be B(Rn) and E to be B(Rm). Now, B(Rm) is the σ-ring generated by the open sets, and by continuity of f, f⁻¹[C] is open in Rn, for each open set C of Rm. Since each open set in Rn belongs to B(Rn), the conditions of the theorem are satisfied, and hence f is Borel measurable.

(3) Let fi be S-measurable, for each i. Take C to be the family of open intervals (a, b) = (a1, b1) × · · · × (an, bn) of Rn. Then C generates the Borel sets. Since fi is a measurable function, [fi ∈ (ai, bi)] ∈ S, so

[f ∈ (a, b)] = [(f1, . . . , fn) ∈ (a1, b1) × · · · × (an, bn)]

= [f1 ∈ (a1, b1), . . . , fn ∈ (an, bn)]

= [f1 ∈ (a1, b1)] ∩ · · · ∩ [fn ∈ (an, bn)] ∈ S.

This shows that C satisfies the conditions of the theorem, so that f is S-measurable.

Conversely, if f : S −→ Rn is S-measurable and I is an open interval of R, then G := {x ∈ Rn : xi ∈ I} is an open subset of Rn, so [fi ∈ I] = [f ∈ G] ∈ S. Since the open intervals of R generate the Borel sets, fi is measurable.

4 Theorem. Let S, E, E′ be σ-rings in S, E, and E′, respectively. If g : S −→ E is S–E measurable and f : E −→ E′ is E–E′ measurable, then f(g) = f ∘ g : S −→ E′ is S–E′ measurable. In other words, a measurable function of a measurable function is a measurable function.

Proof. If B ∈ E′, then f⁻¹[B] ∈ E, so g⁻¹[f⁻¹[B]] ∈ S; that is, (f(g))⁻¹[B] ∈ S.

Consequently, for example, if g1, . . . , gn are S-measurable real functions and f : Rn −→ Rk is continuous, or even Borel measurable, then f(g1, . . . , gn) is S-measurable. In particular, sums and products of measurable functions are measurable.

Warning. If one talks of a Lebesgue measurable function on (say) R to R, one means an Mλ-B(R) measurable function. The result does not say the composition of two Lebesgue measurable functions is Lebesgue measurable; it says the composition of a Borel-measurable function with a Lebesgue measurable function is Lebesgue measurable.

If (fn) is a sequence of S-measurable real-valued functions, and f = supn fn, then (f > c) = ⋃n∈N (fn > c) ∈ S. If f(x) is finite for each x ∈ S, this shows that f is an S-measurable function on S to R. If f is not necessarily finite, we still obtain

$$(f = \infty) = \bigcap_{k \in N} (f > k),$$

so that the set where f is finite, (f ∈ R) = f⁻¹[R], is still measurable.

These considerations bring out a desire for a definition of an extended real-valued measurable function. Write R̄ = [−∞, +∞] for the extended real line. An open set in R̄ is an arbitrary union of open intervals. One includes (a, ∞] as an (open) interval containing +∞, and [−∞, b) as one containing −∞. The family B(R̄) of Borel sets of R̄ is the σ-ring generated by these open sets. One finds that for B ⊂ R̄, B ∈ B(R̄) iff B ∩ R ∈ B(R). A function f : S −→ R̄ is usually called S-measurable if it is S-B(R̄) measurable.

Since the intervals (c, +∞], c ∈ R, together with R generate B(R̄), we see that an extended real-valued function is measurable iff (f > c) ∈ S, for each c ∈ R, and (f ∈ R) ∈ S. Moreover, if f is S-B(R̄) measurable and is finite-valued, then it is S-B(R) measurable.

5 Theorem. Let (fn) be a sequence of extended real-valued measurable functions on S. Then the four functions (defined pointwise) infn fn, supn fn, lim supn fn, lim infn fn are measurable.

Proof. We gave the argument above for supremum. The case of infimum is similar. Since

$$\limsup_n f_n(x) = \inf_n \sup_{k \ge n} f_k(x) \quad\text{and}\quad \liminf_n f_n(x) = \sup_n \inf_{k \ge n} f_k(x),$$

the measurability of these two also follows.

6 Theorem. Let (fn) be a sequence of extended real-valued S-measurable functions on S. If fn −→ f, another extended real-valued function on S, then f is S-measurable.

More generally, if A = {x : fn(x) converges in R̄}, then A is measurable and the function f : A −→ R̄ defined by f(x) = limn fn(x) is S-measurable.

Proof. This follows since A = (lim supn fn = lim infn fn).

The above theorem also holds with “extended real-valued” replaced by “real-valued” if convergence is interpreted in R instead of R̄. Indeed, (fn converges in R) = f⁻¹[R], where f is the limit in R̄.

In summary: All the usual operations of algebra and analysis lead from measurable functions to measurable functions.

Unless otherwise stated, measurable functions are assumed everywhere defined.

Simple functions.

An S-measurable function f is called simple if it has a finite number of values. If a1, . . . , an are all the values of f and Ai denotes [f = ai] = f⁻¹[{ai}], then Ai ∈ S and f is $\sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$. This is its standard form. An S-measurable simple function is also called an S-simple function.

7 Lemma. Let f : S −→ R be an S-measurable function. Then there exists a sequence (fn) of simple S-measurable functions converging to f; i.e., limn fn(x) = f(x), for all x ∈ S. If f ≥ 0, this sequence can be taken increasing.

Proof. In case f ≥ 0, set

$$f_n = \sum_{k=1}^{n 2^n} \frac{k-1}{2^n}\, \mathbf{1}_{[(k-1)/2^n < f \le k/2^n]}.$$

Now, fn ≤ fn+1 and fn −→ f. Indeed, for each n, if f(x) > n, then fn(x) = 0 ≤ fn+1(x); and if f(x) = 0, then fn(x) = 0 for every n. As soon as n ≥ f(x) > 0, there exists a k ∈ {0, . . . , n2^n − 1} such that k/2^n < f(x) ≤ (k + 1)/2^n. Then there are 2 possibilities: either

(i) f(x) ≤ k/2^n + 1/2^(n+1) = (2k + 1)/2^(n+1), in which case fn+1(x) = fn(x) = k/2^n, or

(ii) (2k + 1)/2^(n+1) < f(x) ≤ (2k + 1 + 1)/2^(n+1) = (k + 1)/2^n, in which case fn+1(x) = fn(x) + 1/2^(n+1).

Thus, in all cases fn(x) ≤ fn+1(x).

As for convergence, the formula k/2^n < f(x) ≤ (k + 1)/2^n shows that for n ≥ f(x), f(x) − fn(x) ≤ 1/2^n −→ 0.

In general, f can be written as f = f⁺ − f⁻, where

f⁺ = f ∨ 0 and f⁻ = (−f) ∨ 0,

the positive and negative parts of f. Apply the previous case to each of these.
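The dyadic approximation in the proof is easy to compute pointwise. The sketch below evaluates fn(x) from the value y = f(x) alone, using exact rational arithmetic; the function name `dyadic_approx` and the sample values are assumptions made for the illustration.

```python
from fractions import Fraction
from math import ceil

def dyadic_approx(y, n):
    """Value f_n(x) of the n-th simple approximation when f(x) = y >= 0."""
    if y <= 0 or y > n:
        return Fraction(0)           # no term of the sum is active
    k = ceil(y * 2**n)               # the unique k with (k-1)/2^n < y <= k/2^n
    return Fraction(k - 1, 2**n)

values = [Fraction(0), Fraction(1, 3), Fraction(5, 8), Fraction(22, 7), Fraction(10)]

# Monotonicity in n, and the error bound 2^-n once n >= f(x).
increasing = all(
    dyadic_approx(y, n) <= dyadic_approx(y, n + 1) for y in values for n in range(1, 15)
)
close = all(
    0 <= y - dyadic_approx(y, n) <= Fraction(1, 2**n) for y in values for n in range(10, 15)
)
```

With this particular formula fn(x) is 0 while f(x) > n, so the approximation of a large value only starts once n reaches it.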

Image measures (distributions).

If f : S −→ E is S-E measurable and µ is a measure on S, it induces a measure ν on E, defined by ν(B) = µ(f⁻¹[B]) = µ(f ∈ B). This is called the image measure of µ under f, often denoted f(µ), or µf⁻¹. In case µ = P is a probability measure and f = X is a random variable, PX⁻¹ is called the distribution of X, also denoted P_X. In the special case E = Rn, P_X has distribution function F_X(x) = P_X((−∞, x]) = P(X ≤ x); F_X is called the distribution function of X.
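For a finite probability space the image measure is obtained by adding up the probabilities of the points mapped to each value. The following sketch assumes a hypothetical fair die with X the remainder of the roll modulo 3; `image_measure` and `distribution_function` are illustrative names.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical probability space: a fair die, with X the remainder of the roll modulo 3.
P = {omega: Fraction(1, 6) for omega in range(1, 7)}

def X(omega):
    return omega % 3

def image_measure(P, X):
    """Return the distribution P X^{-1} as a dict value -> P(X = value)."""
    dist = defaultdict(Fraction)
    for omega, p in P.items():
        dist[X(omega)] += p
    return dict(dist)

def distribution_function(dist, x):
    """F_X(x) = P(X <= x)."""
    return sum((p for v, p in dist.items() if v <= x), Fraction(0))

PX = image_measure(P, X)
```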

1. For a function f : R −→ R, if f is monotone, then f is Borel measurable.

2. Let ℳ be a σ-algebra and (fn) a sequence of ℳ-measurable functions. Let M be the set of all x ∈ S such that (fn(x)) converges in R. Then M is ℳ-measurable, as was seen from lim sup and lim inf. This also may be deduced from the Cauchy criterion for convergence:

$$M = \bigcap_{0 < \varepsilon \in Q} \ \bigcup_{k \in N} \ \bigcap_{m, n \ge k} \{x : |f_m(x) - f_n(x)| < \varepsilon\}.$$

The functions |fn − fm| are measurable, so each {x : |fm(x) − fn(x)| < ε} ∈ ℳ, and the operations used are all those of a σ-algebra, so M ∈ ℳ.

This proof extends to the case of measurable functions with values in a Polish space (i.e. complete separable metric space).

3. There are two different definitions of S-simple function. The one in the text requires f to be (everywhere defined and) measurable S–B(R) and to take on a finite number of values. This forces S to be an algebra. If S is a σ-ring (or even just a ring) we could define f to be simple iff it is a linear combination of indicators 1_A of sets A ∈ S.

Using this latter definition, if S is a σ-ring, show f is a pointwise limit of S-simple functions iff it is S–B0 measurable, where B0 = {B \ {0} : B ∈ B(R)}. These are the measurable functions studied in Halmos (as opposed to measurable transformations).

4. Let T be the mapping on [0, 1) onto the unit circle S¹, defined by T(x) = e^(2πix). If P is Lebesgue measure restricted to the Borel sets of [0, 1), then its image measure PT⁻¹ is a probability measure on the circle which is rotationally invariant. This serves as a good model for spinning a (fair) dial.

5. The graph of any real-valued measurable function on R forms a set of λ² measure 0.


6. Integration.

In this section we develop the main properties of the so-called Lebesgue integral, using simple functions and order. For the entire section, S will be an abstract space, M will be a σ-algebra in S, and µ will be a measure on M, that is, a non-negative extended real-valued countably additive function defined on M. The elements of M are called “measurable”. We denote the set of all real-valued M-measurable functions by L0 = L0(S, M, µ).

This family is a vector space of functions, since it is closed under addition and multiplication by real scalars.

The 5 basic properties.

Recall that an M-simple function f is an M-measurable function with a finite number of real values. If a1, . . . , an is the list of all the values of f, and Ai = [f = ai], then $f = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$. This is the standard form of the simple function as a linear combination of indicators of measurable sets. Every other linear combination of such indicators still has a finite number of values, so is a simple function.

The integral of a function f : S −→ R with respect to µ is defined as follows.

(1) If f ≥ 0 is simple with standard form $\sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$, then

$$\int f \, d\mu = \sum_{i=1}^{n} a_i \mu(A_i);$$

(2) If f is non-negative and M-measurable, then

$$\int f \, d\mu = \sup\Big\{ \int h \, d\mu : 0 \le h \le f,\ h \ \mathcal{M}\text{-simple} \Big\}$$

(possibly +∞). [This will be shown compatible with (1).]

(3) If f is M-measurable, then

$$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu,$$

provided the right-hand side exists, that is, is not of the form +∞ − (+∞).

The integral of a non-measurable function, or one where both $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ are +∞, is left undefined.

The integral of f over a subset E of S is defined to be

$$\int_E f \, d\mu = \int f \mathbf{1}_E \, d\mu.$$

Other notation for $\int f \, d\mu$ is $\int f(x)\, \mu(dx)$ or $\int f(x)\, d\mu(x)$. If µ is Lebesgue measure, one often just writes $\int f(x)\, dx$.

Note. Here, and elsewhere in Measure Theory, the convention is that 0 · ±∞ = 0.
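The convention 0 · ∞ = 0 has to be imposed explicitly in floating-point arithmetic, where `0.0 * inf` evaluates to NaN. The sketch below computes the integral of a non-negative simple function given by disjoint sets, under an assumed measure on {0, 1, 2} in which one point has infinite mass; the names `times`, `integral_simple` and `weights` are hypothetical.

```python
import math

def times(a, m):
    """Product a * m with the measure-theoretic convention 0 * inf = 0."""
    return 0.0 if a == 0 or m == 0 else a * m

def integral_simple(values_and_sets, mu):
    """Integral of sum a_i 1_{A_i} for disjoint A_i: the sum of a_i * mu(A_i)."""
    return sum(times(a, mu(A)) for a, A in values_and_sets)

# Hypothetical measure on subsets of {0, 1, 2}: the point 2 carries infinite mass.
weights = {0: 1.0, 1: 0.5, 2: math.inf}

def mu(A):
    return sum(weights[x] for x in A)

# f = 3 on {0}, 4 on {1}, 0 on {2}: the set of infinite measure contributes nothing.
f = [(3.0, {0}), (4.0, {1}), (0.0, {2})]
value = integral_simple(f, mu)

# g = 1 on {2}: here the integral is +infinity.
g = [(1.0, {2})]
infinite_value = integral_simple(g, mu)
```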


1 Lemma.

(a) Let $0 \le f = \sum_{j=1}^{m} b_j \mathbf{1}_{B_j}$, where the Bj are disjoint, not necessarily in standard form. Then $\int f \, d\mu = \sum_{j=1}^{m} b_j \mu(B_j)$.

(b) If f and g are non-negative simple functions, then $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$.

(c) If f ≥ 0 is simple and c ≥ 0 is a scalar, then $\int cf \, d\mu = c \int f \, d\mu$.

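Proof.

(a) We may assume $\bigcup_{j=1}^{m} B_j = S$, for otherwise let $B_{m+1}$ be the complement of this union and $b_{m+1} = 0$; then, for $x \in B_{m+1}$, $f(x) = \sum_{j=1}^{m} b_j \mathbf{1}_{B_j}(x) = 0$, and hence $f = \sum_{j=1}^{m+1} b_j \mathbf{1}_{B_j}$ and $\sum_{j=1}^{m} b_j \mu(B_j) = \sum_{j=1}^{m+1} b_j \mu(B_j)$.

Let $f = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$ in standard form.

Now, suppose x ∈ Bj ∩ Ai. Then bj = f(x) = ai. For this j, we get f(x) = ai for all x ∈ Bj, so Bj ⊂ Ai = {x : f(x) = ai}. This shows that Bj ∩ Ai ≠ ∅ implies Bj ⊂ Ai. But each element of Ai belongs to some Bj, hence $A_i = \bigcup_{j : B_j \subset A_i} B_j$. Now we can compute.

$$\begin{aligned}
\sum_{j=1}^{m} b_j \mu(B_j) &= \sum_{i=1}^{n} \sum_{j : B_j \subset A_i} b_j \mu(B_j) \\
&= \sum_{i=1}^{n} \sum_{j : B_j \subset A_i} a_i \mu(B_j) \\
&= \sum_{i=1}^{n} a_i \mu\Big( \bigcup_{j : B_j \subset A_i} B_j \Big) \\
&= \sum_{i=1}^{n} a_i \mu(A_i) \\
&= \int f \, d\mu.
\end{aligned}$$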

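(b) Let f and g be simple of the form $f = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$ and $g = \sum_{j=1}^{m} b_j \mathbf{1}_{B_j}$, with the A1, . . . , An and the B1, . . . , Bm each disjoint subfamilies of M with union S. Then {Ai ∩ Bj : i = 1, . . . , n, j = 1, . . . , m} is also a disjoint family and

$$f + g = \sum_i a_i \mathbf{1}_{A_i} + \sum_j b_j \mathbf{1}_{B_j} = \sum_{i,j} (a_i + b_j) \mathbf{1}_{A_i \cap B_j}.$$

Thus,

$$\begin{aligned}
\int (f + g) \, d\mu &= \sum_{i,j} (a_i + b_j) \mu(A_i \cap B_j) \\
&= \sum_i \sum_j a_i \mu(A_i \cap B_j) + \sum_j \sum_i b_j \mu(A_i \cap B_j) \\
&= \sum_i a_i \sum_j \mu(A_i \cap B_j) + \sum_j b_j \sum_i \mu(A_i \cap B_j) \\
&= \sum_i a_i \mu\Big( \bigcup_j (A_i \cap B_j) \Big) + \sum_j b_j \mu\Big( \bigcup_i (A_i \cap B_j) \Big) \\
&= \sum_i a_i \mu(A_i) + \sum_j b_j \mu(B_j) \\
&= \int f \, d\mu + \int g \, d\mu,
\end{aligned}$$

as required.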

(c) If $f = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i} \ge 0$, then $cf = \sum_{i=1}^{n} c a_i \mathbf{1}_{A_i}$ is also a non-negative simple function, and its integral is $\int cf \, d\mu = \sum_i c a_i \mu(A_i) = c \sum_i a_i \mu(A_i)$.

The definition of $\int f \, d\mu$ for non-negative measurable functions is compatible with the definition for simple ones.

2 Corollary. Let f, g ∈ L0. If f, g are simple and 0 ≤ g ≤ f, then $\int g \, d\mu \le \int f \, d\mu$; hence

$$\int f \, d\mu = \sup\Big\{ \int h \, d\mu : 0 \le h \le f,\ h \text{ simple} \Big\}.$$

Proof. f = g + (f − g), and f − g ≥ 0, so $\int f \, d\mu = \int g \, d\mu + \int (f - g) \, d\mu \ge \int g \, d\mu$.

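3 Corollary. If f, g ∈ L0, and 0 ≤ f ≤ g, then $\int f \, d\mu \le \int g \, d\mu$.

Proof. Since

$$\Big\{ \int h \, d\mu : 0 \le h \le f,\ h \text{ simple} \Big\} \subset \Big\{ \int h \, d\mu : 0 \le h \le g,\ h \text{ simple} \Big\},$$

we have

$$\sup\Big\{ \int h \, d\mu : 0 \le h \le f,\ h \text{ simple} \Big\} \le \sup\Big\{ \int h \, d\mu : 0 \le h \le g,\ h \text{ simple} \Big\}.$$

The rest is definition.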

4 Monotone Convergence Theorem. Let (fn) be an increasing sequence of non-negative real-valued measurable functions converging pointwise on S to the (real-valued) function f. Then $\int f_n \, d\mu \longrightarrow \int f \, d\mu$.

Note. This is the pointwise version of the theorem. Later, in Corollary 18, we will have a version for so-called “almost-everywhere convergence”.

Proof. By Corollary 3, $\int f_n \, d\mu \le \int f \, d\mu$, for all n ∈ N. And $\int f_n \, d\mu \le \int f_{n+1} \, d\mu$, for all n, so $\lim_n \int f_n \, d\mu$ exists and is $\le \int f \, d\mu$.

For the reverse inequality, let $h = \sum_{i=1}^{k} a_i \mathbf{1}_{A_i}$ be simple with 0 ≤ h ≤ f and let 0 < c < 1. Put En = [fn ≥ ch]. Then (En) is an increasing sequence of sets with ⋃n En = S. Indeed, if f(x) > 0, then h(x) ≤ f(x) implies ch(x) < f(x), so for some n ∈ N, fn(x) > ch(x), while if f(x) = 0, then ch(x) = 0 also, so x ∈ E1. Thus,

$$\int_{E_n} ch \, d\mu \le \int_{E_n} f_n \, d\mu \le \int f_n \, d\mu. \tag{$*$}$$

As n → ∞,

$$\int_{E_n} h \, d\mu = \int \sum_{i=1}^{k} a_i \mathbf{1}_{A_i} \mathbf{1}_{E_n} \, d\mu = \sum_{i=1}^{k} a_i \mu(A_i \cap E_n) \to \sum_{i=1}^{k} a_i \mu(A_i) = \int h \, d\mu.$$

Thus, letting n −→ ∞ in (∗) we obtain

$$c \int h \, d\mu \le \lim_n \int f_n \, d\mu.$$

Taking the supremum over the non-negative simple h ≤ f and then letting c −→ 1, we obtain $\int f \, d\mu \le \lim_n \int f_n \, d\mu$.

5 Theorem. For c ∈ R, and f, g ∈ L0,

(a) $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$,

(b) $\int cf \, d\mu = c \int f \, d\mu$,

whenever the right sides exist.

Proof. (a)

Step 1. We already know this for non-negative simple functions f, g.

Step 2. If f, g ≥ 0 are measurable, let fn and gn be simple functions with 0 ≤ fn ↑ f and 0 ≤ gn ↑ g. Then fn + gn is also simple and increases to f + g. By Step 1,

$$\int (f_n + g_n) \, d\mu = \int f_n \, d\mu + \int g_n \, d\mu,$$

for each n, and in the limit, by the Monotone Convergence Theorem,

$$\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu.$$

Step 3. Suppose h = f + g and $\int f \, d\mu + \int g \, d\mu$ exists. Then either both of $\int f^+ \, d\mu$ and $\int g^+ \, d\mu$, or both of $\int f^- \, d\mu$ and $\int g^- \, d\mu$, are finite.

For example, if $\int f^+ = +\infty$, then $\int f^-$ is finite. In that case, $\int g^-$ is also finite, otherwise $\int f + \int g = \int f^+ - \int f^- + \int g^+ - \int g^- = +\infty - \infty$, which doesn’t exist.

Then h = (f⁺ + g⁺) − (f⁻ + g⁻). Put h1 = f⁺ + g⁺ and h2 = f⁻ + g⁻. Then h1, h2 are ≥ 0, and at least one of $\int h_1 \, d\mu$ and $\int h_2 \, d\mu$ is finite, say the latter.

Recall that h⁺ = h ∨ 0 and h⁻ = (−h) ∨ 0. Since h = h1 − h2 ≤ h1, h⁺ = h ∨ 0 ≤ h1 ∨ 0 = h1, and similarly h⁻ ≤ h2. Since $\int h_2 \, d\mu$ is finite, so is $\int h^- \, d\mu$, and we calculate:

$$\begin{aligned}
h^+ - h^- &= h_1 - h_2 \\
h^+ + h_2 &= h_1 + h^- \\
\int (h^+ + h_2) \, d\mu &= \int (h_1 + h^-) \, d\mu \\
\int h^+ \, d\mu + \int h_2 \, d\mu &= \int h_1 \, d\mu + \int h^- \, d\mu, \quad \text{by Step 2} \\
\int h^+ \, d\mu - \int h^- \, d\mu &= \int h_1 \, d\mu - \int h_2 \, d\mu,
\end{aligned}$$

that is,

$$\begin{aligned}
\int (f + g) \, d\mu &= \int (f^+ + g^+) \, d\mu - \int (f^- + g^-) \, d\mu \\
&= \int f^+ \, d\mu + \int g^+ \, d\mu - \int f^- \, d\mu - \int g^- \, d\mu \\
&= \int f \, d\mu + \int g \, d\mu.
\end{aligned}$$

6 Corollary. If f ≤ g then $\int f \, d\mu \le \int g \, d\mu$, whenever both are known to exist, or when $\int g \, d\mu < \infty$ or $\int f \, d\mu > -\infty$.

Proof. Exercise.

A real-valued M-measurable function f is called integrable with respect to µ if $\int f \, d\mu$ exists and is finite. The set of all such integrable functions is denoted L1 = L1(µ) = L1(S, M, µ).

7 Corollary. The space L1(S,M, µ) is a vector space on which integration withrespect to µ is a linear functional.

Proof. This follows immediately from Theorem 5.

8 Theorem. For a function f ∈ L0,

(1) $\big| \int f \, d\mu \big| \le \int |f| \, d\mu$, whenever the left side exists.

(2) f is integrable iff |f| is integrable.

Proof. We know f = f⁺ − f⁻ ≤ f⁺ + f⁻ = |f|. Thus, if $\int f \, d\mu$ exists, then $\int f \, d\mu \le \int |f| \, d\mu$ and $-\int f \, d\mu = \int -f \, d\mu \le \int |f| \, d\mu$; hence,

$$\Big| \int f \, d\mu \Big| \le \int |f| \, d\mu.$$

This proves (1).

If f is integrable, then both f⁺ and f⁻ are so also; hence, $\int |f| \, d\mu = \int f^+ \, d\mu + \int f^- \, d\mu$ is finite. If |f| is integrable, part (1) shows that $\int f \, d\mu$ is finite.

9. Note. The previous result is an important property of Lebesgue integration, which distinguishes it from other integration theories. For example, the improper Riemann integral of a function f can be finite without |f| being integrable, and |f| can be Riemann integrable without f being Riemann integrable.

10. Five basic properties. The integral with respect to a measure µ on a σ-algebra M is defined on a subfamily of the family L0 of finite real-valued M-measurable functions f and satisfies the following properties.

(1) For A ∈ M, $\int \mathbf{1}_A \, d\mu = \mu(A)$.

(2) $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$, whenever the right side exists (in particular if f, g ∈ L0 are non-negative).

(3) If c ∈ R, then $\int cf \, d\mu = c \int f \, d\mu$, if the right side exists.

(4) (MCT) If for each n ∈ N, 0 ≤ fn ∈ L0 and (fn) is increasing to f ∈ L0, then $\int f \, d\mu$ exists and $\int f_n \, d\mu \longrightarrow \int f \, d\mu$.

(5) If f ∈ L0 then $\int f \, d\mu$ exists iff $\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu$, and is finite iff $\int |f| \, d\mu < \infty$.

One can take this result (axiomatically) as a starting point for developing the theory of the integral. Indeed, from (1), (2), and (3) we recover the definition of the integral on the simple functions, and together with (4) we see that all non-negative f ∈ L0 have integrals (which are ≥ 0). This is further brought out by the following converse. It (or alternatively, the method of proof) is very useful.

11 Theorem. Let µ, as always, be a measure on the σ-algebra M in S. Suppose I : D −→ R is a mapping defined on a family D of real-valued M-measurable functions satisfying the following properties.

(1) For A ∈ M, I(1_A) = µ(A).

(2) If f, g ≥ 0 and f, g ∈ D then f + g ∈ D and I(f + g) = I(f) + I(g) (additivity).

(3) If 0 ≤ f ∈ D and 0 ≤ c ∈ R, then cf ∈ D and I(cf) = cI(f) (positive homogeneity).

(4) If for each n ∈ N, 0 ≤ f_n ∈ D and f_n is increasing to f ∈ L0, then f ∈ D and I(f_n) −→ I(f).

(5) f ∈ D iff I(f) = I(f⁺) − I(f⁻).

Then, D = {f ∈ L0 : ∫ f dµ exists} and I(f) = ∫ f dµ, for all f ∈ D.

If instead, only (1)–(4) are satisfied, I(f) = ∫ f dµ, for all f ≥ 0 in L0.


Proof. By (1), ∫ 1_A dµ = µ(A) = I(1_A), for all A ∈ M. Therefore, if ∑_i a_i 1_{A_i} is a non-negative simple function, then (2) and (3) give

∫ ∑_i a_i 1_{A_i} dµ = ∑_i a_i ∫ 1_{A_i} dµ = ∑_i a_i I(1_{A_i}) = I(∑_i a_i 1_{A_i}).

If f is a non-negative measurable function, then there exists an increasing sequence (f_n) of non-negative simple functions converging pointwise to f. Then ∫ f_n dµ = I(f_n), and thus by property (4), together with the MCT, ∫ f dµ = I(f).

Finally, suppose f ∈ L0. Then by (5), I(f) exists iff I(f) = I(f⁺) − I(f⁻). But I(f⁺) = ∫ f⁺ dµ and I(f⁻) = ∫ f⁻ dµ. Thus I(f) exists iff one of ∫ f⁺ dµ or ∫ f⁻ dµ is finite, and then I(f) = ∫ f⁺ dµ − ∫ f⁻ dµ = ∫ f dµ.

Remark. This is a compromise result. As the proof shows, the additivity (2) and positive homogeneity (3) are only used to obtain the correct integral of non-negative simple functions. However, in its application, we usually have much more satisfied. Of course, once an “integral” satisfies these properties, it satisfies all the properties of the integral, so (2) and (3) hold for all f, g ∈ L0(M) for which the right side exists.

12 Theorem. (Fatou’s Lemma) If (f_n) is a sequence of non-negative real-valued measurable functions (with lim inf_n f_n finite), then

∫ lim inf_n f_n dµ ≤ lim inf_n ∫ f_n dµ.

Proof. Recall that lim inf_n f_n = lim_n g_n, where g_n = inf_{k≥n} f_k. Now, for each k ≥ n, g_n ≤ f_k, so

∫ g_n dµ ≤ ∫ f_k dµ, for k ≥ n,

and hence

∫ g_n dµ ≤ inf_{k≥n} ∫ f_k dµ.

But (g_n) is an increasing sequence of non-negative measurable functions with limit lim inf_n f_n, so the Monotone Convergence Theorem gives

∫ lim inf_n f_n dµ = lim_n ∫ g_n dµ ≤ lim_n inf_{k≥n} ∫ f_k dµ = lim inf_n ∫ f_n dµ,

as required.
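The inequality in Fatou's Lemma can be strict. The sketch below illustrates this on a two-point space; the space, the masses and the names `mu`, `f`, `integral` are assumptions chosen for the illustration, not taken from the text. The sequence alternates between the two indicator functions, so its lim inf is 0 while every integral is 1/2.

```python
from fractions import Fraction

# Hypothetical finite measure space: S = {0, 1}, each point of mass 1/2.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def f(n, x):
    """Indicator of {0} for even n and of {1} for odd n."""
    return 1 if x == n % 2 else 0

def integral(h):
    """Integral of h: S -> R with respect to mu (a finite sum here)."""
    return sum(h(x) * m for x, m in mu.items())

N = 50  # the sequence has period 2, so a finite window already gives the lim inf
liminf_f = {x: min(f(n, x) for n in range(N)) for x in mu}

lhs = integral(lambda x: liminf_f[x])                            # integral of lim inf f_n
rhs = min(integral(lambda x, n=n: f(n, x)) for n in range(N))    # lim inf of the integrals
print(lhs, rhs)
```

Here the left side is 0 and the right side is 1/2, so the inequality holds without equality.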


13 Dominated Convergence Theorem. Let (f_n) be a sequence of real-valued measurable functions converging pointwise to another one f. If there exists g ∈ L1 with |f_n| ≤ g, for all n ∈ N, then ∫ f_n dµ −→ ∫ f dµ.

Proof. The conditions give −g ≤ f_n ≤ g. Thus (f_n + g) is a sequence of non-negative functions with lim inf_n (f_n + g) = f + g. Thus, by Fatou’s Lemma,

∫ f dµ + ∫ g dµ = ∫ (f + g) dµ ≤ lim inf_n ∫ (f_n + g) dµ = lim inf_n ∫ f_n dµ + ∫ g dµ.

Since ∫ g dµ is finite, we may subtract it from both sides, giving

∫ f dµ ≤ lim inf_n ∫ f_n dµ. (∗)

Similarly, using Fatou’s Lemma with g − f_n one obtains

∫ (−f) dµ ≤ lim inf_n ∫ (−f_n) dµ.

Thus,

lim sup_n ∫ f_n dµ ≤ ∫ f dµ. (∗∗)

Putting (∗) and (∗∗) together yields the result.
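As a numerical illustration of the theorem, consider f_n(x) = x^n on [0, 1] with Lebesgue measure: |f_n| ≤ 1, the constant 1 is integrable on [0, 1], and f_n −→ 0 except at x = 1, so the integrals should tend to 0. The sketch below approximates the integrals by a midpoint rule; the helper `midpoint_integral` and the cell count are assumptions made for this example, and the quadrature only approximates the Lebesgue integral.

```python
def midpoint_integral(h, a, b, cells=10_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / cells
    return sum(h(a + (i + 0.5) * width) for i in range(cells)) * width

def f(n):
    """f_n(x) = x**n on [0, 1]; |f_n| <= 1 and f_n -> 0 for x < 1."""
    return lambda x: x ** n

integrals = [midpoint_integral(f(n), 0.0, 1.0) for n in (1, 5, 20)]
print(integrals)  # close to 1/2, 1/6, 1/21, decreasing towards 0
```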

14 Theorem. Let f ∈ L0(S, M, µ). If ∫ f dµ exists, then the set-function ν : A ↦ ∫_A f dµ is countably additive on M.

The function ν of this theorem is known as the indefinite integral of f with respect to µ.

Proof. Since ν(∅) = ∫_∅ f dµ is clearly 0, we need only show that if (A_n) is a disjoint sequence in M, then ν(⋃_n A_n) = ∑_n ν(A_n).

Now, for each x ∈ S, x belongs to at most one of the A_n, so the series ∑_n f 1_{A_n}(x) has at most one non-zero term, and therefore converges; if A = ⋃_n A_n, the sum is f 1_A(x).

If f ≥ 0, then the partial sums g_n = ∑_{k≤n} f 1_{A_k} increase pointwise to f 1_A, so by the Monotone Convergence Theorem,

∑_n ν(A_n) = lim_n ∑_{k≤n} ν(A_k)
= lim_n ∑_{k≤n} ∫ f 1_{A_k} dµ
= lim_n ∫ g_n dµ
= ∫ f 1_A dµ
= ν(A),


as required.

In general, if ∫ f dµ exists, then it is the difference of ∫ f⁺ dµ and ∫ f⁻ dµ, one of which is finite, say ∫ f⁻ dµ < ∞. Then each ∫_A f⁻ dµ is finite and, by the first part,

ν(A) = ∫_A f⁺ dµ − ∫_A f⁻ dµ

is countably additive, since the indefinite integrals of f⁺ and f⁻ are. (We used here the fact that (f 1_A)⁺ = f⁺ 1_A and (f 1_A)⁻ = f⁻ 1_A.)

In case f is integrable, one can argue instead as follows: each g_n, as defined above, is dominated by an integrable function: |g_n| ≤ |f| 1_A ≤ |f| ∈ L1. Thus, the Dominated Convergence Theorem applies to give the same conclusion.

Almost everywhere convergence.

A property of points of S is said to hold µ-almost everywhere (µ-a.e.) or for almost all x ∈ S (µ-a.a. x ∈ S), if there exists a µ-null set N (that is, a set N ∈ M with µ(N) = 0) such that the property holds on the complement of N.

In particular, if (f_n) is a sequence of measurable functions and f is another one, (f_n) is said to converge µ-almost everywhere to f if there exists a µ-null N such that f_n(x) −→ f(x) for all x ∈ N^c = S \ N; (f_n) is said to converge µ-almost everywhere if there exists a µ-null N such that the sequence (f_n(x)) converges for all x ∈ N^c.

Notice the subtle difference between these two definitions: in the first, there is reference to a limit function; in the second, there is not. Let us rectify this.

15 Lemma. If (f_n) is a sequence of real-valued measurable functions converging µ-a.e. (in R), then there exists a real-valued measurable function f such that f_n −→ f µ-a.e.

Proof. By the definition of convergence µ-a.e., there exists N ∈ M, with µ(N) = 0, such that (f_n(x)) converges for x ∈ N^c. Define f(x) = lim_n f_n(x) for x ∈ N^c and f(x) = 0 for x ∈ N. Then f = lim_n f_n 1_{N^c} pointwise, so f is M-measurable and f_n −→ f µ-a.e. Indeed, f_n(x) −→ f(x) for x ∈ N^c.

Almost everywhere limits are unique only up to µ-null sets:

16 Lemma. Suppose f_n, f, and g are all in L0, and f_n −→ f µ-a.e. Then f_n −→ g µ-a.e. iff f = g µ-a.e.

Proof. ( =⇒ ) Let N_1 and N_2 be µ-null sets such that f_n(x) → f(x) for x ∈ N_1^c and f_n(x) → g(x) for x ∈ N_2^c. Put N = N_1 ∪ N_2. Then N is also a null set, and for x ∈ N^c, both statements hold:

f_n(x) → f(x), and f_n(x) → g(x).

Since limits in R are unique, f(x) = g(x) for all x ∈ N^c. Thus f = g µ-a.e.

The converse is proved similarly.


17 Theorem. For f ∈ L0, δ > 0:

(1) ∫ |f| dµ ≥ δµ(|f| ≥ δ).

(2) ∫ |f| dµ = 0 iff f = 0 µ-a.e.; in this case ∫ f dµ is also 0.

(3) If f, g ∈ L0, ∫ f dµ exists, and f = g µ-a.e., then ∫ f dµ = ∫ g dµ.

Proof. (1) This is one form of the “Chebyshev inequality”. If A = (|f| ≥ δ), then |f| ≥ |f| 1_A ≥ δ 1_A, so ∫ |f| dµ ≥ δµ(A), as required.

(2) First suppose ∫ |f| dµ = 0. Then, by (1), if δ > 0, then 0 = ∫ |f| dµ ≥ δµ(|f| ≥ δ), so µ(|f| ≥ δ) = 0. For k ∈ N, let A_k = [|f| ≥ 1/k]. Then ⋃_k A_k = [|f| > 0], so N := [|f| > 0] is a countable union of null sets, hence of measure 0, and f = 0 on N^c. Hence, f = 0 µ-a.e.

For the converse, suppose f = 0 µ-a.e. Then there is a µ-null set N with f(x) = 0 for x ∈ N^c. Then ∫_{N^c} |f| dµ = ∫_{N^c} 0 dµ = 0, so ∫ |f| dµ = ∫_N |f| dµ + ∫_{N^c} |f| dµ = ∫_N |f| dµ. But if we put g_k = |f| ∧ k, then g_k increases pointwise to |f|, and therefore, by the MCT, ∫_N |f| dµ = lim_k ∫_N g_k dµ ≤ lim_k ∫_N k dµ = lim_k kµ(N) = 0, as required.

Now, if ∫ |f| dµ = 0, |f| is integrable, so f is also and |∫ f dµ| ≤ ∫ |f| dµ, so ∫ f dµ = 0.

(3) Suppose f, g ∈ L0, f = g µ-a.e., and ∫ f dµ exists. Then g − f = 0 µ-a.e., so ∫ (g − f) dµ = 0, by (2). Then, g = f + (g − f), so ∫ g dµ = ∫ f dµ + ∫ (g − f) dµ = ∫ f dµ.
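Part (1) of the theorem is easy to check by hand on a small example. The sketch below uses counting measure on a ten-point set; the set `S`, the function `f` and the threshold `delta` are assumptions chosen for the illustration.

```python
from fractions import Fraction

# Hypothetical example: counting measure on S = {0, ..., 9}.
S = range(10)
f = lambda x: Fraction(x - 4, 2)      # takes both signs
delta = Fraction(3, 2)

integral_abs = sum(abs(f(x)) for x in S)                 # integral of |f|
measure_big = sum(1 for x in S if abs(f(x)) >= delta)    # mu(|f| >= delta)
print(integral_abs, delta * measure_big)
```

In this example the integral of |f| is 25/2, while δ times the measure of [|f| ≥ δ] is 15/2, in agreement with the inequality.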

The MCT, the DCT, and Fatou’s Lemma hold with pointwise convergence replaced by a.e. convergence.

18 Corollary. Suppose f_n ∈ L0, for all n ∈ N, and f ∈ L0. Then ∫ f_n dµ −→ ∫ f dµ, provided f_n −→ f µ-a.e. and either

(a) (MCT) for all n ∈ N, 0 ≤ f_n µ-a.e., f_n ≤ f_{n+1} µ-a.e. or

(b) (DCT) there exists g ∈ L1, such that for all n ∈ N, |f_n| ≤ g µ-a.e.

Similarly,

(c) (Fatou) if 0 ≤ f_n µ-a.e. and f_n ∈ L0, for all n ∈ N, f ∈ L0 and f = lim inf_n f_n µ-a.e., then ∫ f dµ ≤ lim inf_n ∫ f_n dµ.

Proof. (a) Let N, N_n be µ-null sets such that f_n → f pointwise on N^c, f_1 ≥ 0 on (N_0)^c and f_n ≤ f_{n+1} on (N_n)^c, for n ≥ 1. Then the set Z := N ∪ ⋃_{n=0}^∞ N_n is also µ-null and all three statements hold on Z^c. Thus, if we let f′_n = 1_{Z^c} f_n and f′ = 1_{Z^c} f, we have f′_n −→ f′ pointwise (everywhere) and 0 ≤ f′_n ≤ f′_{n+1} (everywhere); hence, the pointwise MCT applies to get ∫ f′_n dµ −→ ∫ f′ dµ. But f′_n = f_n µ-a.e. and f′ = f µ-a.e., so ∫ f′_n dµ = ∫ f_n dµ and ∫ f′ dµ = ∫ f dµ, and the result follows.

(b) Similarly, let N and N_n be null sets such that f_n −→ f on N^c and |f_n| ≤ g on (N_n)^c. Then Z = N ∪ ⋃_n N_n is µ-null and f′_n = 1_{Z^c} f_n converges pointwise to f′ = 1_{Z^c} f and |f′_n| ≤ g′ = 1_{Z^c} g ∈ L1, so by the pointwise DCT,

∫ f_n dµ = ∫ f′_n dµ −→ ∫ f′ dµ = ∫ f dµ.

(c) This is proved similarly, or deduced from (a) as the pointwise Fatou’s Lemma is deduced from the pointwise MCT.

The point of the requirement f ∈ L0 is that we require the limit or lim inf to be finite for almost all x.

Further upgrades. The Monotone Convergence Theorem and Fatou’s Lemma can be improved still further in some cases.

19 Theorem. Let (f_n) be a sequence in L0(S, M, µ).

(a) If 0 ≤ f_n ≤ f_{n+1} µ-a.e. for all n ∈ N and lim_n ∫ f_n dµ < ∞, then there exists f ∈ L1 such that f_n −→ f µ-a.e. and then lim_n ∫ f_n dµ = ∫ f dµ.

(b) If 0 ≤ f_n µ-a.e. for all n ∈ N and lim inf_n ∫ f_n dµ < ∞, then there exists an f ∈ L1 such that f = lim inf_n f_n µ-a.e. and ∫ f dµ ≤ lim inf_n ∫ f_n dµ.

Proof. We prove (a) and leave (b) as an exercise.

Since 0 ≤ f_n ≤ f_{n+1} µ-a.e. for each n, we change the f_n on a set of measure 0, so that these inequalities hold everywhere. This does not affect the integrals. Then f_n converges pointwise to some g which is measurable, but might be +∞ at some points.

Let a = lim_n ∫ f_n dµ < ∞. Fix M ∈ N. Then f_n ∧ M ↑ g ∧ M, which is finite everywhere. Hence,

∫ g ∧ M dµ = lim_n ∫ f_n ∧ M dµ ≤ a < ∞.

Now on the set where g > M, g ∧ M = M, so

Mµ(g > M) ≤ ∫ g ∧ M dµ ≤ a.

This shows that µ(g = +∞) ≤ µ(g > M) ≤ a/M, for all M, and hence g is finite µ-a.e. Let f(x) = g(x) if g(x) is finite and 0 otherwise. Then f ∈ L0 and f_n −→ f µ-a.e., and hence ∫ f_n dµ −→ ∫ f dµ, as required. Finally, f ∈ L1, since ∫ f dµ = a is finite.

Remark. The above result and the a.e. version of the MCT inspire the following conventions. If f̄ is an extended real-valued measurable function which is almost everywhere equal to a real-valued f whose integral exists, one puts

∫ f̄ dµ = ∫ f dµ,

and if f̄ ≥ 0 takes the value +∞ on a set of positive measure, one puts ∫ f̄ dµ = +∞. With these conventions, 0 ≤ f_n ↑ f µ-a.e. implies ∫ f_n dµ −→ ∫ f dµ in all cases.

The reader can check that if f is non-negative and M-measurable (with some infinite values) then the formula

∫ f dµ = sup { ∫ h dµ : 0 ≤ h ≤ f, h M-simple }

still holds.

If f is measurable with possibly infinite values we again put

∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ,

whenever the right side exists. Of course, at most one of µ(f = +∞) and µ(f = −∞) will be non-zero.

Dependence on a real parameter.

For the following we assume f : S × [a, b] −→ R. For t ∈ [a, b], to say f(x, t) is measurable in x means the map x ↦ f(x, t) (also denoted f(·, t)) is measurable (that is, belongs to L0).

20 Theorem. Suppose f(x, t) is measurable in x for each t ∈ [a, b], g is integrable and |f(x, t)| ≤ g(x) for each t ∈ [a, b] and x ∈ S. If, for all x ∈ S, f(x, t) −→ h(x) as t −→ t_0, then

∫ f(x, t) µ(dx) −→ ∫ h(x) µ(dx), as t −→ t_0.

Proof. Here we use the sequential criterion for convergence in R. Let (t_n) be a sequence in [a, b] \ {t_0} with t_n −→ t_0. Put h_n(x) = f(x, t_n). Then (h_n) is a sequence of measurable functions converging pointwise to h, with |h_n| ≤ g ∈ L1, so by the DCT ∫ h_n dµ −→ ∫ h dµ. Since this is true for all choices of the sequence (t_n), lim_{t→t_0} ∫ f(x, t) µ(dx) = ∫ h dµ.

21 Corollary. If f(x, t) is measurable in x for each t ∈ [a, b], and continuous in t at t_0 ∈ [a, b] for each x ∈ S, with |f(·, t)| ≤ g, where g ∈ L1, then the function F : t ↦ ∫ f(x, t) µ(dx) is continuous at t_0.

Proof. The function F is real-valued, since each |f(·, t)| ≤ g and g is integrable. To show F is continuous at t_0 just put h(x) = f(x, t_0) in the previous result.

22 Corollary. Let f(x, t) be measurable in x, for each t ∈ [a, b], and differentiable in t, for each x ∈ S, with

|∂f/∂t (x, t)| ≤ g(x),

for some integrable function g. If f(·, t) is integrable for some t ∈ [a, b], then it is integrable for all t ∈ [a, b] and the function

F : t ↦ ∫ f(x, t) µ(dx)

is differentiable on [a, b] with

F′(t) = d/dt ∫ f(x, t) µ(dx) = ∫ ∂f/∂t (x, t) µ(dx).


Proof. For each t ∈ [a, b], the partial derivative ∂f/∂t (x, t) is measurable as a function of x, since there is a sequence of measurable functions (f(x, t_n) − f(x, t))/(t_n − t) converging to it.

Now let t_0 be such that f(·, t_0) is integrable and t be another point of [a, b]. By the mean-value theorem, the function

h(x) := (f(x, t) − f(x, t_0))/(t − t_0)

satisfies h(x) = ∂f/∂t (x, s(x)), for some s(x) between t_0 and t. Now, h is measurable, and by hypothesis |h| = |∂f/∂t (·, s(·))| ≤ g, an integrable function, so h is integrable and hence f(·, t) = f(·, t_0) + (t − t_0)h is also integrable for each t ∈ [a, b]. Thus F(t) = ∫ f(x, t) µ(dx) is finite for each t.

To avoid more notation, let t_0 now be any element of [a, b]. As before,

|(f(x, t) − f(x, t_0))/(t − t_0)| ≤ g(x).

Since g is integrable, the real-parameter form of the DCT above (Theorem 20) says

lim_{t→t_0} (F(t) − F(t_0))/(t − t_0) = lim_{t→t_0} ∫ (f(x, t) − f(x, t_0))/(t − t_0) µ(dx) = ∫ ∂f/∂t (x, t_0) µ(dx).
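The formula F′(t) = ∫ ∂f/∂t (x, t) µ(dx) can be sanity-checked numerically. The sketch below assumes a measure made of four point masses (so the dominating function g(x) = |x| is automatically integrable) and the integrand f(x, t) = sin(tx); the names `xs`, `w`, `F` and `dF` are hypothetical. It compares a central difference quotient of F with the integral of the partial derivative.

```python
import math

# Hypothetical finite measure: point masses w[i] at the points xs[i].
xs = [0.5, 1.0, 2.0, 3.5]
w = [0.1, 0.4, 0.3, 0.2]

def F(t):
    """F(t) = integral of f(x, t) = sin(t x) with respect to mu."""
    return sum(wi * math.sin(t * xi) for xi, wi in zip(xs, w))

def dF(t):
    """Integral of the partial derivative x cos(t x), as in Corollary 22."""
    return sum(wi * xi * math.cos(t * xi) for xi, wi in zip(xs, w))

t0, h = 0.7, 1e-5
difference_quotient = (F(t0 + h) - F(t0 - h)) / (2 * h)
print(difference_quotient, dF(t0))
```

The two printed numbers agree to many decimal places; the small remaining gap comes from the finite step h and floating-point rounding.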


Connection with the Riemann integral.

We first review the Darboux form of the Riemann integral of a real function defined on an interval of R. A partition of the interval [a, b] is a set P = {x_0, x_1, ..., x_n} such that a = x_0 ≤ x_1 ≤ x_2 ≤ ··· ≤ x_n = b. The upper and lower Darboux sums for a function f : [a, b] −→ R with respect to the partition P are

U(f, P) = ∑_{i=1}^n sup f[x_{i−1}, x_i] (x_i − x_{i−1}) and

L(f, P) = ∑_{i=1}^n inf f[x_{i−1}, x_i] (x_i − x_{i−1}).

Then the upper and lower Riemann integrals are:

(upper) ∫_a^b f(x) dx = inf_P U(f, P) and (lower) ∫_a^b f(x) dx = sup_P L(f, P).

When these two are equal and a real number, f is called Riemann integrable on [a, b] and ∫_a^b f(x) dx is their common value. One sees right away that if f is Riemann integrable, then it is bounded.

Now suppose f is bounded. Then one can prove that there exists a sequence (P_k) of such partitions with P_k ⊂ P_{k+1} (“P_{k+1} refines P_k”), such that the distance between adjacent points of P_k is less than 1/k and such that

lim_k L(f, P_k) = (lower) ∫_a^b f(x) dx, lim_k U(f, P_k) = (upper) ∫_a^b f(x) dx.
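To make the Darboux sums concrete, the following sketch computes them for f(x) = x² on [0, 1] with uniform partitions. The function, the interval and the name `darboux_sums` are assumptions made for this illustration. Because this f is increasing, the infimum and supremum over each cell are attained at its endpoints, and the dyadic partitions with 2, 4, 8, ... cells refine one another.

```python
from fractions import Fraction

def darboux_sums(n):
    """Lower and upper Darboux sums of f(x) = x**2 on [0, 1] for the
    uniform partition with n cells; f is increasing, so the inf and sup
    on each cell sit at its left and right endpoints."""
    pts = [Fraction(i, n) for i in range(n + 1)]
    lower = sum(pts[i - 1] ** 2 * (pts[i] - pts[i - 1]) for i in range(1, n + 1))
    upper = sum(pts[i] ** 2 * (pts[i] - pts[i - 1]) for i in range(1, n + 1))
    return lower, upper

for n in (2, 4, 8, 1024):
    lo, up = darboux_sums(n)
    print(n, float(lo), float(up), up - lo)   # the gap equals 1/n
```

The lower sums increase and the upper sums decrease along the refinements, and both squeeze the value 1/3.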

23 Theorem.

(a) If f is Riemann integrable on [a, b], then it is Lebesgue integrable (that is, integrable with respect to λ) on [a, b] to the same value.

(b) If f is bounded on [a, b], then it is Riemann integrable iff it is continuous at λ-almost all points of [a, b].

Proof. Take P_k as in the preceding discussion. If P_k = {x_0, x_1, ..., x_n}, set A_i = (x_{i−1}, x_i] and Ā_i = [x_{i−1}, x_i], and put

h_k = f(a) 1_{{a}} + ∑_{i=1}^n inf f[Ā_i] 1_{A_i},

g_k = f(a) 1_{{a}} + ∑_{i=1}^n sup f[Ā_i] 1_{A_i}.

Then, h_k ≤ f ≤ g_k.

Since P_{k+1} refines P_k, we see that h_k increases to some h ≤ f and g_k decreases to some g ≥ f. Moreover,

L(f, P_k) = ∫_{[a,b]} h_k dλ ≤ ∫_{[a,b]} g_k dλ = U(f, P_k).

Taking limits, using either the MCT or the DCT, one has

(lower) ∫_a^b f(x) dx = ∫_{[a,b]} h dλ ≤ ∫_{[a,b]} g dλ = (upper) ∫_a^b f(x) dx.

Notice that to this point, the hypothesis of Riemann integrability has not been used.

Now, if f is Riemann integrable, the two outside integrals are equal, so h ≤ g with the same Lebesgue integral. This gives ∫_{[a,b]} (g − h) dλ = 0, with g − h ≥ 0, and so g = h for λ-almost all x ∈ [a, b]. But then f is also equal to each of these for λ-almost all x ∈ [a, b]. Since subsets of λ-null sets are again null (hence measurable), f is an Mλ-measurable function. Its integral is then the same as that of h (or g): ∫_{[a,b]} f dλ = ∫_{[a,b]} h dλ = ∫_a^b f(x) dx. This proves (a).

Now, still under the hypothesis that f is Riemann integrable, if x ∉ ⋃_k P_k, h(x) = g(x), and ε > 0, there exists k_0 so large that k ≥ k_0 implies |h_k(x) − g_k(x)| < ε. But x ∈ (x_{i−1}, x_i) for some adjacent pair x_{i−1}, x_i ∈ P_k, and h_k and g_k are constant on that interval. Since h_k ≤ f ≤ g_k, we see that for all t ∈ (x_{i−1}, x_i), |f(t) − f(x)| ≤ ε, so f is continuous at x. Since the set ⋃_k P_k is countable, it has measure 0, so f is continuous at almost all points of [a, b].

Conversely, suppose f is continuous at x and x ∉ ⋃_k P_k, and let I_k be the closed interval between adjacent points of P_k which contains x. Since the length of I_k is < 1/k and f is continuous at x, for all k ≥ some k_0 we have f(x) − ε < f(t) < f(x) + ε for all t ∈ I_k, and hence f(x) − ε ≤ h_k(x) ≤ f(x) ≤ g_k(x) ≤ f(x) + ε. This shows that if f is continuous almost everywhere, then h = f = g a.e. and thus

(lower) ∫_a^b f(x) dx = ∫_{[a,b]} h dλ = ∫_{[a,b]} g dλ = (upper) ∫_a^b f(x) dx,

so f is Riemann integrable.

1. If f ≤ g then ∫ f dµ ≤ ∫ g dµ, whenever both are known to exist, or when ∫ g dµ < ∞ or ∫ f dµ > −∞.

2. Let S = N, M the set of all subsets of S and µ be “counting measure”, that is, µ(A) = the number of points in A (= ∞, if A has infinitely many points). Then:

a) for all non-negative f : N −→ R, ∫ f dµ = ∑_{n=1}^∞ f(n).

b) f : N −→ R is integrable iff the series above is absolutely convergent, and then the same formula holds.

c) If the series is conditionally, but not absolutely, convergent then ∫ f dµ does not exist.

3. In the Monotone Convergence Theorem, the assumption f_1 ≥ 0 may be replaced by ∫ f_1 dµ > −∞, yielding the same conclusion.

4. If ∫ f_1 dµ < +∞ and (f_n) is a decreasing sequence of measurable functions with f_n −→ f, a finite-valued function, then ∫ f_n dµ −→ ∫ f dµ.

5. Fatou’s Lemma may be extended by assuming, instead of non-negativity, that there exists g with ∫ g dµ > −∞ and g ≤ f_n, for all n ∈ N, again obtaining ∫ lim inf_n f_n dµ ≤ lim inf_n ∫ f_n dµ.

6. Using Fatou’s Lemma, one can actually remove the monotonicity from the Monotone Convergence Theorem, just as long as the approximation is from below: Let (f_n) be a sequence of non-negative real-valued measurable functions converging pointwise on S to the (real-valued) function f. If f_n(x) ≤ f(x), for all x, then ∫ f_n dµ −→ ∫ f dµ.

7. Let µ be counting measure on a two-point space S = {a, b}, so that µ{a} = µ{b} = 1. By defining appropriate functions f_n on S, show that Fatou’s Lemma is an extension of the elementary result

lim inf_n (x_n + y_n) ≥ lim inf_n x_n + lim inf_n y_n.

8. Let f_n = (1/n) 1_{[0,n]} and f = 0. Show that f_n converges uniformly to 0, f_n ∈ L1(R, Mλ, λ) (that is, f_n is integrable with respect to Lebesgue measure), yet ∫ f dλ ≠ lim_n ∫ f_n dλ. Explain why this does not contradict the Monotone Convergence Theorem or the Dominated Convergence Theorem. What about Fatou’s Lemma?

9. Let g_n = −f_n, where f_n is as in the previous problem. Show that the conclusion of Fatou’s Lemma does not hold.

10. If µ is σ-finite, then condition (1) of the characterization of the integral need only be assumed for sets of finite measure.

11. If f is integrable and a > 0, show that {x : |f(x)| > a} has finite measure and that {x : f(x) ≠ 0} has σ-finite measure (that is, is a union of a countable family of sets of finite measure).

12. If 0 ≤ f_n µ-a.e. for all n ∈ N and lim inf_n ∫ f_n dµ < ∞, then there exists an f ∈ L0 such that f = lim inf_n f_n and ∫ f dµ ≤ lim inf_n ∫ f_n dµ.

13. If f is integrable and ε > 0 then ∫ |f − ϕ| dµ < ε, for some integrable simple function ϕ.

14. Let f be integrable and ν be its indefinite integral: ν(A) = ∫_A f dµ, for A ∈ M. Prove ν(A) = 0, for all A ∈ M, iff f = 0 µ-a.e.

15. Show that if f ≥ 0 is improper Riemann integrable on [0, ∞) then f is λ-integrable there and ∫_{[0,∞)} f dλ = ∫_0^∞ f(x) dx.

16. If |f| is improper Riemann integrable on [0, +∞), then f is λ-integrable on [0, ∞), but not necessarily improper Riemann integrable. On the other hand, f can be improper Riemann integrable but not Lebesgue integrable.

17. From Calculus, ∫_0^∞ e^{−tx} dx = 1/t, for t > 0. Moreover, t ≥ a > 0 implies e^{−tx} ≤ e^{−ax}. Use this and the connection with the Lebesgue integral to justify differentiating under the integral sign, and from this obtain ∫_0^∞ x^n e^{−x} dx = n!.

18. Deduce the Monotone Convergence Theorem from the Dominated Convergence Theorem. This is easy if the limit function is integrable; a little trickier, if not.


7. Change of variable, Radon-Nikodym Theorem.

Integration with respect to image measures (distributions).

Recall the basic definitions: Let M be a σ-algebra in S, and ℰ a σ-algebra in another space E. If T : S −→ E is M-ℰ measurable and µ is a measure on M, it induces a measure ν on ℰ, defined by ν(B) = µ(T⁻¹[B]) = µ(T ∈ B). This is called the image measure of µ under T, often denoted T(µ), or µT⁻¹.

In case µ = P is a probability measure and T = X is a random variable, PX⁻¹ is called the distribution of X, also denoted P_X. In the special case E = R^n, P_X has distribution function F_X(x) = P_X((−∞, x]) = P(X ≤ x); F_X is called the distribution function of X.

Image measures are defined also for σ-rings in the same way, of course. We are restricting to the case of σ-algebras only because we have restricted the integration theory to that setting.

In what follows, we sometimes use the notation f(T) instead of f ∘ T for composition of mappings.

1 Theorem. (Basic change of variable theorem) Let T : S −→ E be M-ℰ measurable and let f : E −→ R be real-valued and ℰ-measurable. Then,

∫ f(T) dµ = ∫ f dµT⁻¹,

whenever one side exists.

If X is a random variable, with distribution P_X, then

E(f(X)) = ∫ f(x) P_X(dx) = ∫ f(x) dF_X(x).

Proof. The second statement is just a special case in probabilistic notation. To prove the first, we show that the mapping

I(f) = ∫ f(T) dµ

on real-valued ℰ-measurable functions satisfies the 5 basic properties of the integral with respect to µT⁻¹ (Th 6.11).

(1) If f = 1_B, the indicator of a B ∈ ℰ, then I(f) = ∫ 1_B(T) dµ = ∫ 1_{T⁻¹[B]} dµ = µ(T⁻¹[B]). Here we used the fact that 1_B(T)(x) = 1_B(T(x)) = 1_{T⁻¹[B]}(x).

(2) and (3) If f, g are non-negative, and a ≥ 0, then (af + g)(T) = af(T) + g(T), so

∫ (af + g)(T) dµ = a ∫ f(T) dµ + ∫ g(T) dµ.

(4) If f ≥ 0, it is the increasing pointwise limit of a sequence of simple functions f_n. But then f_n(T) increases to f(T) and ∫ f_n(T) dµ −→ ∫ f(T) dµ by the Monotone Convergence Theorem.

(5) Finally, if f is a general ℰ-measurable real function, then f = f⁺ − f⁻ and f(T)⁺ = f⁺(T) and f(T)⁻ = f⁻(T). Thus,

∫ f(T) dµ = ∫ f⁺(T) dµ − ∫ f⁻(T) dµ,

when the left side exists.

Since the map f ↦ ∫ f(T) dµ satisfies these 5 properties, it must coincide with ∫ f dµT⁻¹.
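On a finite space the change of variable formula reduces to regrouping a finite sum, which the following sketch makes explicit. The measure `mu`, the map `T` and the function `f` are assumptions chosen for the example; the image measure is built directly from its definition µT⁻¹({y}) = µ(T = y).

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite example: mu is a weighted counting measure on S.
mu = {-2: Fraction(1, 8), -1: Fraction(1, 4), 0: Fraction(1, 8),
      1: Fraction(1, 4), 2: Fraction(1, 4)}
T = lambda x: x * x          # measurable map S -> E = {0, 1, 4}
f = lambda y: 3 * y - 1      # real-valued function on E

# Image measure: (mu T^{-1})({y}) = mu(T = y).
image = defaultdict(Fraction)
for x, m in mu.items():
    image[T(x)] += m

lhs = sum(f(T(x)) * m for x, m in mu.items())   # integral of f(T) d mu
rhs = sum(f(y) * m for y, m in image.items())   # integral of f d(mu T^{-1})
print(dict(image), lhs, rhs)
```

Both sides evaluate to 5 here; in probabilistic language, `image` is the distribution of the random variable T and both sums compute the expectation of f(T).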

Measures with densities.

The word “density” in measure theory has two meanings. The first is “Radon-Nikodym derivative” and the second is “Vitali derivative”. Here, we are referring to the first meaning.

If µ is a measure on the σ-algebra M and g is a measurable function, then g is called a Radon-Nikodym derivative or density of ν with respect to µ if ν(A) = ∫_A g dµ for all A ∈ M; that is, if ν is the indefinite integral of g with respect to µ. In this context g is often denoted dν/dµ. The name comes from the Radon-Nikodym Theorem, which gives conditions under which such g exists for a given pair ν and µ. In the present section, we assume that g is known and non-negative and look for a formula in terms of µ for integrating with respect to ν.

2 Theorem. Let g ≥ 0 be M-measurable and ν its indefinite integral with respect to the measure µ on M. Then for f ∈ L0,

∫ f dν = ∫ fg dµ, (∗)

whenever one side exists.

Proof. Again we use the 5 basic properties; this time showing that the right side of (∗) satisfies the properties of the integral with respect to ν.

(1) If A ∈ M, then ∫ 1_A g dµ = ∫_A g dµ = ν(A), which was given.

(2) and (3) If f_1, f_2 are non-negative M-measurable functions, and a ≥ 0, then so is (af_1 + f_2)g = af_1 g + f_2 g and

∫ (af_1 + f_2)g dµ = a ∫ f_1 g dµ + ∫ f_2 g dµ.

(4) If f ≥ 0, there exists a sequence of non-negative simple functions f_n increasing pointwise to f. Since g is assumed non-negative, f_n g increases to fg also, so ∫ fg dµ = lim_n ∫ f_n g dµ, by the MCT.

(5) Finally, if f is a general measurable function, (fg)⁺ = f⁺g, and similarly for the negative parts. So if ∫ fg dµ exists then one of ∫ f⁺g dµ and ∫ f⁻g dµ is finite and ∫ fg dµ = ∫ f⁺g dµ − ∫ f⁻g dµ, as required.
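For a measure given by point masses, formula (∗) can be verified directly. In the sketch below the three-point space and the dictionaries `mu`, `g`, `f` are assumptions made for the illustration; `nu` is built as the indefinite integral of `g`.

```python
from fractions import Fraction

# Hypothetical finite example: mu has point masses, g is a density.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
g = {"a": Fraction(2), "b": Fraction(0), "c": Fraction(3)}
f = {"a": Fraction(-1), "b": Fraction(5), "c": Fraction(4)}

# nu is the indefinite integral of g: nu({x}) = g(x) mu({x}).
nu = {x: g[x] * mu[x] for x in mu}

int_f_dnu = sum(f[x] * nu[x] for x in mu)            # left side of (*)
int_fg_dmu = sum(f[x] * g[x] * mu[x] for x in mu)    # right side of (*)
print(int_f_dnu, int_fg_dmu)
```

Note that the point "b", where g vanishes, is ν-null, so the value of f there does not affect either side.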

3 Corollary. In the setting of the basic change of variable theorem, if the image measure of T has a density g with respect to some other measure η, then

∫ f(T) dµ = ∫ fg dη.

In Probability Theory, where µ = P is a probability measure, integration with respect to P is expectation. If T = X is a random variable with density g_X (with respect to Lebesgue measure), that is, if P(X ∈ B) = ∫_B g_X dλ, the corollary reads

E(f(X)) = ∫ f g_X dλ = ∫ f(x) g_X(x) dx.

Linear Change of Variable.

Suppose T : R^n −→ R^n is linear and one-to-one. We saw earlier that the image measure λT of λ under T⁻¹ is given by λT(B) = λ(T[B]) = |det(T)| λ(B), so it has constant density J_T = |det(T)| with respect to λ. When one applies the basic change of variable theorem (Th 1), through the previous corollary, one obtains

∫ f dλ = ∫ f ∘ T ∘ T⁻¹ dλ = ∫ f ∘ T dλT = ∫ (f ∘ T) J_T dλ,

or as one often writes,

∫ f(x) dx = ∫ f(T(y)) J_T dy.

This result generalizes to the case of a one-to-one continuously differentiable map T on an open set U onto an open set V. One finds that λT has density y ↦ J_T(y), the absolute value of the Jacobian of T at y, and the result becomes

∫_V f(x) dx = ∫_U f(T(y)) J_T(y) dy.

Radon-Nikodym Theorem.

A measure ν (or even an extended real-valued countably additive set function) on M is called absolutely continuous with respect to the measure µ if it is 0 on µ-null sets (µ(A) = 0 =⇒ ν(A) = 0). The notation is ν ≪ µ. If ν has a density f, then the indefinite integral ν : A ↦ ∫_A f dµ is certainly absolutely continuous with respect to µ. If µ is σ-finite, the measure (or c.a. set function) ν is also σ-finite. In this subsection, we will show that absolute continuity is sufficient for a σ-finite c.a. function to have a density.

For a function ϕ : M −→ R, and A ∈ M, A is called positive, negative, or null with respect to ϕ if, respectively, ϕ(E) ≥ 0, ϕ(E) ≤ 0, or ϕ(E) = 0, for all E ∈ M with E ⊂ A. Obviously, a set is both positive and negative with respect to ϕ iff it is null.

4 Hahn Decomposition Theorem. Let ϕ be an extended real-valued countably additive function on M. Then there exist disjoint N, P with union S such that N is negative and P is positive with respect to ϕ.

The following two lemmas are of interest in their own right.



5 Lemma. Every family of sets contains a maximal disjoint subfamily.

Proof. Let F be the set of all disjoint subfamilies of the given family, ordered by inclusion H ⊂ G. If C is a chain in F then ⋃{H : H ∈ C} is an upper bound in F for C (that is, it is still disjoint and contains each of the H ∈ C). Therefore, by Zorn’s lemma, F has a maximal element.

6 Lemma. Suppose ϕ is an extended real-valued countably additive function on M not taking the value +∞ (respectively, −∞). Then

(a) If H is a disjoint family of measurable sets, and ε > 0, then only finitely many A ∈ H satisfy ϕ(A) ≥ ε (respectively, ≤ −ε).

(b) Any disjoint family H of measurable sets A with ϕ(A) > 0 (respectively, < 0) is countable.

Proof. (The < +∞ case).

(a) Otherwise H would contain a countably infinite subfamily {A_n : n ∈ N} with ϕ(A_n) ≥ ε. But ∑_{n=1}^∞ ϕ(A_n) = ϕ(⋃_{n=1}^∞ A_n), a real number, so the nth term ϕ(A_n) converges to 0, a contradiction.

(b) Thus, if ϕ(A) > 0 for all the A in H, then H = ⋃_{k∈N} {A ∈ H : ϕ(A) ≥ 1/k} is countable.

Proof of the Hahn Decomposition Theorem.

We refer to the members A of M as “measurable” and to ϕ(A) as the “charge” of A.

Since ϕ is finitely additive, it cannot take on both values +∞ and −∞. We suppose ϕ(A) > −∞, for all A ∈ M.

By Lemma 5, we may choose a maximal disjoint family H⁻ of non-null negative sets. (Recall that a set A is negative for ϕ if A ∈ M and each B ∈ M contained in A has charge ≤ 0.)

The family H⁻ is countable, by Lemma 6(b), since each of its members has strictly negative charge. Let N = ⋃H⁻. Then N ∈ M and is a negative set; indeed, if N ⊃ E ∈ M, then

ϕ(E) = ϕ(E ∩ N) = ∑_{A∈H⁻} ϕ(E ∩ A) ≤ 0.

We claim the set P = N^c is a positive set.

Suppose otherwise. Then there exists a measurable A ⊂ N^c with ϕ(A) < 0. Since ϕ(A) > −∞, each measurable E ⊂ A has finite charge. Let 𝒫 be a maximal disjoint family of measurable sets E ⊂ A such that ϕ(E) > 0. Then 𝒫 is countable. Put B = A \ ⋃𝒫. Then ϕ(B) ≠ 0, otherwise ϕ(A) = ∑_{E∈𝒫} ϕ(E) ≥ 0.

If B contains a set E of charge > 0, it contradicts maximality of 𝒫.

If B is non-null and negative, it contradicts the maximality of H⁻.

The only possibility left is that A doesn’t exist: N^c is a positive set.
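When the charge is carried by finitely many points, a Hahn decomposition can be read off directly: collect the points of negative charge into N. The sketch below checks the defining property on all subsets by brute force; the five-point space and the name `charge` are assumptions made for the illustration, and this shortcut replaces the maximality argument of the proof, which is needed only in the general case.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical charge on a five-point space, given by point charges.
charge = {1: Fraction(3), 2: Fraction(-1), 3: Fraction(0),
          4: Fraction(-5, 2), 5: Fraction(1, 2)}

def phi(A):
    """The countably additive set function determined by the point charges."""
    return sum(charge[x] for x in A)

N = {x for x, c in charge.items() if c < 0}   # a negative set
P = set(charge) - N                            # its complement

def subsets(A):
    """All subsets of the finite set A."""
    A = sorted(A)
    return (set(c) for r in range(len(A) + 1) for c in combinations(A, r))

is_negative = all(phi(E) <= 0 for E in subsets(N))
is_positive = all(phi(E) >= 0 for E in subsets(P))
print(sorted(N), sorted(P), is_negative, is_positive)
```

The decomposition is not unique: the null point 3 could equally well be moved from P to N.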



7 Radon-Nikodym Theorem. If µ is a measure on M, ν is a countably additive function on M absolutely continuous with respect to µ, and both ν and µ are σ-finite, then there exists a real-valued M-measurable function f with ν(A) = ∫_A f dµ, for all A ∈ M. If ν is a measure, then f can be taken ≥ 0.

The function f is unique in the sense that if also ν(A) = ∫_A g dµ for all A ∈ M, then f = g µ-a.e.

Proof. Let us first get the uniqueness out of the way. Suppose that, for all A ∈ M,

ν(A) = ∫_A f dµ = ∫_A g dµ.

Then, ∫_A (f − g) dµ = 0, for all A ∈ M. Careful: σ-finiteness is used here. How? Put A = (f > g) = {x : f(x) > g(x)}. Then, (f − g)1_A ≥ 0 and ∫ (f − g)1_A dµ = 0. Hence, (f − g)1_A = 0, µ-almost everywhere. Thus, µ(f > g) = 0. Similarly, µ(f < g) = 0, so f = g, µ-almost everywhere.

We now tackle existence.

Finite measure case. We first assume both ν and µ are finite measures on M. If we knew that there were such an f ≥ 0, then the partition {A_{nk} : k ∈ N}, where A_{nk} = [k/2^n < f ≤ (k+1)/2^n], would satisfy

(k/2^n) µ(E ∩ A_{nk}) ≤ ν(E ∩ A_{nk}) ≤ ((k+1)/2^n) µ(E ∩ A_{nk}), (∗)

for all E ∈ M, and the functions f_n = ∑_{k=0}^∞ (k/2^n) 1_{A_{nk}} would increase pointwise to f. The idea of the proof is to try to construct similar partitions π_n, construct the f_n from them, and then the f as a limit of the f_n.

If t ∈ R is given, the function ν − tµ is a real-valued c.a. function on M. Thus, by the Hahn Decomposition Theorem, there exist N_t and P_t = N_t^c, negative and positive sets for ν − tµ, respectively, disjoint with union S. This means that for a measurable set E

ν(E) ≤ tµ(E), if E ⊂ N_t and

ν(E) ≥ tµ(E), if E ⊂ N_t^c.

Since ν and µ are non-negative, when t < 0, a natural choice for N_t is N_t = ∅.

Let P_∞ = (⋃_{t∈Q} N_t)^c = ⋂_{t∈Q} N_t^c. Then ν(P_∞) ≥ tµ(P_∞) for all t ∈ Q. Since ν is finite valued, this means that µ(P_∞) = 0, and by absolute continuity, ν(P_∞) is also 0.

If s < t, we have

tµ(N_s \ N_t) ≤ ν(N_s \ N_t) ≤ sµ(N_s \ N_t).

This is impossible unless µ(N_s \ N_t) = 0. Now, let

Z = P_∞ ∪ ⋃{N_s \ N_t : 0 ≤ s < t, s, t ∈ Q}.



This set is of measure 0 for µ and hence also for ν, since ν ≪ µ. By including Z in all of the N_t, for t ≥ 0, we may assume that

(1) for rational s, t, 0 ≤ s < t implies N_s ⊂ N_t.

(2) ⋃_{0≤t∈Q} N_t = S.

Now, for each n ∈ N partition S into disjoint pieces,

A_{nk} = N_{(k+1)/2^n} \ N_{k/2^n}, k = 0, 1, 2, . . . .

This partition satisfies (∗) since A_{nk} is positive for ν − (k/2^n)µ and negative for ν − ((k+1)/2^n)µ. Then define the function f_n = ∑_{k=0}^∞ (k/2^n) 1_{A_{nk}} and check using (∗) that it satisfies

∫_E f_n dµ ≤ ν(E) ≤ ∫_E (f_n + 1/2^n) dµ ≤ ∫_E f_n dµ + (1/2^n) µ(E). (∗∗)

Now f_n ≤ f_{n+1}. Indeed A_{n,k} = A_{n+1,2k} ∪ A_{n+1,2k+1} and

• for x ∈ A_{n+1,2k}, f_{n+1}(x) = 2k/2^{n+1} = k/2^n = f_n(x) and

• for x ∈ A_{n+1,2k+1}, f_{n+1}(x) = (2k+1)/2^{n+1} = f_n(x) + 1/2^{n+1}.

Moreover, we see that f_n ≤ f_1 + 1 < ∞. Thus we can define f to be the pointwise limit of the f_n and use the Monotone Convergence Theorem in (∗∗), yielding

∫_E f dµ ≤ ν(E) ≤ ∫_E f dµ + 0·µ(S),

as required. Notice that the function f is non-negative in this case.
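The dyadic construction can be traced on a finite example. The sketch below assumes two measures given by point masses with ν ≪ µ, so the density is simply the ratio of the point masses; it then builds f_n on the sets [k/2^n < density ≤ (k+1)/2^n] and checks (∗∗). The data and the names `density`, `f_n`, `check` are hypothetical, and the Hahn decompositions of the proof are replaced here by direct comparison of point masses.

```python
from fractions import Fraction
from math import ceil

# Hypothetical finite measures given by point masses, with nu << mu.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 3), "b": Fraction(0), "c": Fraction(5, 7)}
density = {x: nu[x] / mu[x] for x in mu}     # the function the proof recovers

def f_n(n, x):
    """Value k/2**n on A_nk = [k/2**n < density <= (k+1)/2**n], 0 elsewhere."""
    d = density[x]
    return Fraction(ceil(d * 2 ** n) - 1, 2 ** n) if d > 0 else Fraction(0)

def check(n):
    """Verify (**) on every singleton E = {x}; additivity extends it to all E."""
    return all(f_n(n, x) * mu[x] <= nu[x] <= (f_n(n, x) + Fraction(1, 2 ** n)) * mu[x]
               for x in mu)

print([check(n) for n in range(1, 8)])
print([float(density[x] - f_n(10, x)) for x in mu])   # each gap is at most 2**-10
```

The approximations f_n increase with n and stay within 2^{−n} of the density, mirroring the two bullet points in the proof.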

σ-finite measure case. Suppose µ and ν are σ-finite measures on M. Let H_1 and H_2 each be countable disjoint families covering S such that ν is finite on each A ∈ H_1 and µ is finite on each B ∈ H_2. Then H = {A ∩ B : A ∈ H_1, B ∈ H_2} is a countable disjoint family covering S such that both ν and µ are finite on each member of H. For each C ∈ H, let ν_C and µ_C be the restrictions of ν and µ to C, that is, ν_C(A) = ν(A ∩ C) and µ_C(A) = µ(C ∩ A), and let f_C ≥ 0 be a measurable function for which ν_C(A) = ∫_A f_C dµ_C, using the finite case. Notice that µ_C(A) = ∫_A 1_C dµ, for all A ∈ M, so by Th 2, ∫ g dµ_C = ∫ g 1_C dµ, for all real M-measurable functions g, so

ν_C(A) = ∫_A f_C 1_C dµ,

so that by the MCT, if f = ∑_{C∈H} f_C 1_C,

ν(A) = ∑_C ν_C(A) = ∫_A ∑_C f_C 1_C dµ = ∫_A f dµ,

as required.


7. CHANGE OF VARIABLE, RADON–NIKODYM THEOREM 63

General c.a. case. Assume now that ν : M −→ R is countably additive and σ-finite. Let P and N be the positive and negative sets of ν, given by the Hahn Decomposition Theorem. We cannot have both of these parts infinite, for otherwise ν(S) = ν(P) + ν(N) does not exist. Let us suppose ν(N) is finite without loss of generality. Put $\nu^+(A) = \nu(P \cap A)$ and $\nu^-(A) = -\nu(N \cap A)$. Then $\nu(A) = \nu^+(A) - \nu^-(A)$; that is, ν is the difference of two (non-negative) measures. Let $f_1 \ge 0$ be an RN-density for $\nu^+$ and $f_2 \ge 0$ be one for $\nu^-$, and put $f = f_1 - f_2$. Then $\int_A f_2\,d\mu = \nu^-(A) < \infty$, for all A ∈ M, so $\int_A f\,d\mu = \int_A f_1\,d\mu - \int_A f_2\,d\mu = \nu^+(A) - \nu^-(A) = \nu(A)$, as required.


1. If H is a disjoint family of sets of non-zero charge with respect to a σ-finite countably additive

function on a σ-ring, then H is countable.

2.(a) Let g : R −→ R be strictly increasing, continuously differentiable. Show that the

Lebesgue-Stieltjes measure µg corresponding to g is the image measure of a certain

map.

(b) Use the fundamental theorem of calculus to show that its Radon-Nikodym density with

respect to Lebesgue measure is the usual derivative g′.

(c) Prove that for all Borel functions f : R −→ R, $\int f\,dg = \int f(x)g'(x)\,dx$, whenever either side exists (the left side means $\int f\,d\mu_g$ and the right side refers to the integral with respect to Lebesgue measure).

(d) Show that $\int f(g(x))g'(x)\,dx = \int f(u)\,du$, again for Borel functions f, when one side exists.

3. Let λ be Lebesgue measure and µ be counting measure. Show that λ ≪ µ, but λ has no RN density with respect to µ.

4. Prove the chain rule for RN derivatives: If ν ≪ µ ≪ η (all σ-finite measures) then

$$\frac{d\nu}{d\eta} = \frac{d\nu}{d\mu}\,\frac{d\mu}{d\eta} \quad \eta\text{-almost everywhere}.$$

1. If ϕ(A) < +∞, for all A ∈ S, a σ-ring, and ϕ is c.a. on S, then there exists a maximal disjoint family of sets A with ϕ(A) > 0. Measure-theoretic method to avoid Zorn: For each k ∈ N, let $\mathcal{C}_k = \{A \in \mathcal{C} : \varphi(A) \ge 1/k\}$ and $\mathcal{C} = \bigcup_{k\in\mathbb{N}} \mathcal{C}_k$. Start with k = 1. If $\mathcal{C}_1 = \emptyset$, let $\mathcal{H}_1 = \mathcal{C}_1$; otherwise define $\mathcal{H}_1 = \{A_1, \dots, A_n\}$ by taking an arbitrary $A_1 \in \mathcal{C}_1$ and, assuming $A_1, \dots, A_i$ chosen, letting $A_{i+1}$ be any element of $\mathcal{C}_1$ disjoint from $A_1 \cup \dots \cup A_i$. This process stops after a finite number n of steps. Thus $\mathcal{H}_1$ is maximal: any member A of $\mathcal{C}$ disjoint from the members of $\mathcal{H}_1$ satisfies ϕ(A) < 1. Repeat this process with the members of $\mathcal{C}_2$ which are disjoint from each of the members of $\mathcal{H}_1$, creating a finite (possibly ∅) maximal such family $\mathcal{H}_2$: any $A \in \mathcal{C}$ disjoint from the members of $\mathcal{H}_1 \cup \mathcal{H}_2$ must satisfy ϕ(A) < 1/2 (that is, $A \notin \mathcal{C}_2$). Continue in this manner, obtaining families $\mathcal{H}_k \subset \mathcal{C}_k$ such that $\bigcup_k \mathcal{H}_k$ is a disjoint family of sets and the elements of $\mathcal{C}$ disjoint from those of $\bigcup_{k\le m} \mathcal{H}_k$ do not belong to $\mathcal{C}_m$. Put $\mathcal{H} = \bigcup_k \mathcal{H}_k$. Then $\mathcal{H}$ is the required maximal set. Indeed, if $A \in \mathcal{C}$ is disjoint from the members of $\mathcal{H}$, then $A \notin \bigcup_k \mathcal{C}_k = \mathcal{C}$, which cannot happen.



8. Product Measure, Fubini’s Theorem.

In this section we continue to work with measures on σ-algebras. We fix measures µ1 and µ2 defined on σ-algebras M1 and M2 in spaces S1 and S2, respectively. The aim is to get general conditions under which one can replace double integrals by iterated integrals and interchange the order of iterated integration.

Product Measure.

If µ1 and µ2 are measures on σ-algebras M1 and M2 (as always in this section) we seek a measure µ in S = S1 × S2 with respect to which we can do double integration. Of course, in case µ1 = µ2 is one-dimensional Lebesgue measure, two-dimensional Lebesgue measure will do the job. We will see, however, that even in this case a more general construction will be useful.

Let S = S1 × S2. The product σ-ring of M1 and M2, denoted M1 ⊗ M2, is the σ-ring generated by the family M1 × M2 = {A1 × A2 : A1 ∈ M1, A2 ∈ M2}. Since S ∈ M1 ⊗ M2, this is also the generated σ-algebra, and is also called the product σ-algebra.

Recall that a measure µ on a σ-algebra M in S is called σ-finite if there exists a countable subfamily H ⊂ M of sets A with µ(A) < ∞ and ⋃H = S.

Product Measure Theorem. Let µ1, µ2 be σ-finite measures on σ-algebras M1 and M2. Then there is exactly one measure µ on M1 ⊗ M2 for which µ(A1 × A2) = µ1(A1)µ2(A2) for all A1 ∈ M1 and A2 ∈ M2. This µ is σ-finite.

The µ obtained this way is called product measure, the product of µ1 and µ2, denoted µ = µ1 ⊗ µ2.

Proof. First we give the construction in outline, then verify each part in detail. For each E ⊂ S = S1 × S2 and x1 ∈ S1, let E(x1) be the S1-section of E determined by x1; namely,

E(x1) = {x2 ∈ S2 : (x1, x2) ∈ E}.

Then,

(a) E(x1) ∈ M2, for each E ∈ M1 ⊗ M2, and if we put

ν(x1, E) = µ2(E(x1)),

we find that:

(b) for fixed x1 ∈ S1, ν(x1, ·) is a measure on M1 ⊗ M2,

(c) for fixed E ∈ M1 ⊗ M2, ν(·, E) is M1-measurable, and

(d) the required measure on M1 ⊗ M2 is defined by

$$\mu(E) = \int \nu(x_1, E)\,\mu_1(dx_1).$$

(e) The uniqueness comes from the Unique Extension Theorem.
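On finite spaces the outline in (a) to (d) can be carried out by hand. The following sketch assumes two small spaces whose measures are given by point weights (the dictionaries `mu1` and `mu2` and all set names are made up for illustration); measurability is automatic there, so only the section formula is being exercised.

```python
from fractions import Fraction

# Hypothetical finite measure spaces: each point carries a weight (its measure).
mu1 = {"a": Fraction(1, 2), "b": Fraction(3, 2)}
mu2 = {0: Fraction(1), 1: Fraction(2), 2: Fraction(1, 4)}

def measure(weights, subset):
    """Measure of a subset of a finite space with point weights."""
    return sum((weights[x] for x in subset), Fraction(0))

def section(E, x1):
    """E(x1) = {x2 : (x1, x2) in E}."""
    return {x2 for (y1, x2) in E if y1 == x1}

def product_measure(E):
    """mu(E) = integral of nu(x1, E) = mu2(E(x1)) with respect to mu1."""
    return sum((mu1[x1] * measure(mu2, section(E, x1)) for x1 in mu1),
               Fraction(0))

# On a rectangle A1 x A2 the construction returns mu1(A1) * mu2(A2).
A1, A2 = {"b"}, {0, 2}
rectangle = {(x1, x2) for x1 in A1 for x2 in A2}
assert product_measure(rectangle) == measure(mu1, A1) * measure(mu2, A2)

# For a non-rectangular set the result is still the sum of the point masses.
E = {("a", 0), ("a", 1), ("b", 2)}
assert product_measure(E) == sum(mu1[x1] * mu2[x2] for (x1, x2) in E)
```

In this finite setting the integral in (d) is just a weighted sum over x1, which is why the code is a single `sum`.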


The details.

(a) The map T : x2 ↦ (x1, x2) is M2–(M1 ⊗ M2) measurable. Indeed, for A1 ∈ M1 and A2 ∈ M2, $T^{-1}[A_1 \times A_2]$ is A2 if x1 ∈ A1 and ∅ otherwise. Since each of these is in M2 and {A1 × A2 : A1 ∈ M1, A2 ∈ M2} generates M1 ⊗ M2, Theorem 5.2 applies. Hence, $E(x_1) = T^{-1}[E] \in$ M2, for E ∈ M1 ⊗ M2.

(b) This is straightforward: $\nu(x_1, E) = \mu_2(T^{-1}[E])$, so ν(x1, ·) is the image measure of µ2 under the map T above.

(c) We use the D-system theorem.

First, the family $\mathcal{C}$ of all A1 × A2 (with A1 ∈ M1, A2 ∈ M2) such that µ1(A1) and µ2(A2) are finite is closed under finite intersection, and $\sigma(\mathcal{C})$ = M1 ⊗ M2.

Then the family $\mathcal{D}$ of all E for which ν(·, E) is measurable is a $\mathcal{C}$-local D-system. Indeed, $\mathcal{C} \subset \mathcal{D}$: if E = A1 × A2, where A1 ∈ M1 and A2 ∈ M2, then

$$\nu(x_1, E) = \begin{cases} \mu_2(A_2), & \text{if } x_1 \in A_1 \\ 0, & \text{if } x_1 \notin A_1, \end{cases}$$

which is an M1-simple function in x1.

If $(E_n)$ is a disjoint sequence of members of $\mathcal{D}$, then

$$\nu\Bigl(\cdot, \bigcup_n E_n\Bigr) = \sum_{n=1}^{\infty} \nu(\cdot, E_n),$$

the sum of a sequence of non-negative measurable functions, so is also measurable (possibly infinite valued); and if E ⊂ F ⊂ C, where $E, F \in \mathcal{D}$ and $C \in \mathcal{C}$, then ν(·, F \ E) = ν(·, F) − ν(·, E) is again measurable, since it is the difference of two (finite) real-valued measurable functions.

Thus, by the D-system theorem, M1 ⊗ M2 = $\sigma(\mathcal{C}) \subset \mathcal{D}$. That is, ν(·, E) is measurable for all E in M1 ⊗ M2.

(d) That µ is now a measure follows immediately from the MCT: if $(E_n)$ is a disjoint sequence in M1 ⊗ M2, then

$$\int \nu\Bigl(\cdot, \bigcup_n E_n\Bigr)\,d\mu_1 = \int \sum_{n=1}^{\infty} \nu(\cdot, E_n)\,d\mu_1 = \sum_{n=1}^{\infty} \int \nu(\cdot, E_n)\,d\mu_1,$$

since the functions involved are all non-negative (though possibly infinite at some points). (Here we use the convention that a non-negative function with value +∞ on a set of measure > 0 has infinite integral.)

(e) The family $\mathcal{C}$ of the proof of part (c) satisfies the conditions of the Unique Extension Theorem. (The σ-finiteness comes from the fact that $\mathcal{C}$ consists of sets of finite µ-measure.)

Fubini Theorem. Let µ1 and µ2 be σ-finite measures on the σ-algebras M1 and M2 respectively, let S = S1 × S2 and let µ = µ1 ⊗ µ2 be the product measure, defined on M = M1 ⊗ M2, and let f be M-measurable. Then:


(a) For each x1 ∈ S1, f(x1, ·) is M2-measurable and for each x2 ∈ S2, f(·, x2) is M1-measurable.

(b) If f ≥ 0 or f is µ-integrable then

$$\iint f(x_1, x_2)\,\mu_1(dx_1)\,\mu_2(dx_2) = \int f\,d\mu = \iint f(x_1, x_2)\,\mu_2(dx_2)\,\mu_1(dx_1). \tag{$*$}$$

(c) If $\iint |f(x_1, x_2)|\,\mu_1(dx_1)\,\mu_2(dx_2)$ is finite, then f is µ-integrable, so (b) applies.

Note. Included in the meaning of (∗) is that, in case f ≥ 0, the maps

$$x_2 \mapsto \int f(x_1, x_2)\,\mu_1(dx_1) \quad\text{and}\quad x_1 \mapsto \int f(x_1, x_2)\,\mu_2(dx_2)$$

are measurable (possibly infinite valued). See the conventions in section 6 about the integral of infinite valued functions.

In the case f is integrable, one can only assert that these maps are almost everywhere equal to integrable maps; indeed they may not be defined everywhere. Thus (∗) must be interpreted as meaning that after appropriately defining these maps on a set of measure 0 they become integrable with integral $\int f\,d\mu$. This will become clear in the proof.

Proof. (a) (The map f(x1, ·) is referred to as the x1-section of f.) We have shown earlier (in the proof of the product measure theorem) that the map $T_{x_1}$ : x2 ↦ (x1, x2) is M2–(M1 ⊗ M2) measurable. The map f(x1, ·) is just the composite of f and $T_{x_1}$, thus f(x1, ·) is M2-measurable. The other “section” f(·, x2) is handled in the same way, using the map x1 ↦ (x1, x2).

Here is an alternate argument. If E ∈ M = M1 ⊗ M2, then for each x1 ∈ S1, E(x1) ∈ M2 and hence $1_{E(x_1)}$ is M2-measurable. This, however, is the same as the map $1_E(x_1, \cdot)$, so the result holds for f an indicator function. If f is an M-simple function, then it is a linear combination of indicators, so f(x1, ·) is M2-simple. In general, if f is M-measurable, then f is a pointwise limit of M-simple functions $f_n$ and f(x1, ·) is the limit of the M2-simple functions $f_n(x_1, \cdot)$, so is M2-measurable.

(b) Put $I(f) = \iint f(x_1, x_2)\,\mu_2(dx_2)\,\mu_1(dx_1)$.

If $f = 1_E$ is the indicator of an E ∈ M1 ⊗ M2, then

$$\int f(x_1, x_2)\,\mu_2(dx_2) = \int 1_{E(x_1)}\,d\mu_2 = \mu_2(E(x_1)),$$

which we have shown measurable as a function of x1 in the construction of product measure. Moreover, its integral with respect to µ1 is exactly µ(E), by definition, and this is $\int 1_E\,d\mu = \int f\,d\mu$. Thus for f an indicator function of E ∈ M,

$$I(f) = \int f\,d\mu. \tag{$**$}$$


If f is a non-negative M-simple function, then $\int f(x_1, \cdot)\,d\mu_2$ is a non-negative linear combination of non-negative M1-measurable functions of x1, so is measurable (possibly infinite valued); since integration preserves such linear combinations (basic properties (2) and (3)), (∗∗) still holds for such f.

If f ≥ 0 is M-measurable and finite-valued, then there exists a sequence $(f_n)$ of non-negative (finite valued) M-simple functions increasing to f everywhere. Then for each x1, the sections $f_n(x_1, \cdot)$ increase to f(x1, ·). By the MCT

$$\int f_n(x_1, \cdot)\,d\mu_2 \longrightarrow \int f(x_1, \cdot)\,d\mu_2.$$

The integrals can be infinite for x1 in a set of positive µ1 measure. In any case, the resulting functions of x1 still satisfy the MCT, so

$$I(f) = \lim_n \iint f_n(x_1, x_2)\,d\mu_2\,d\mu_1 = \lim_n I(f_n) = \lim_n \int f_n\,d\mu = \int f\,d\mu.$$

Now if f is µ-integrable, then $f^+$ and $f^-$ are non-negative µ-integrable functions. Thus, (∗∗) holds for each of them. Now comes the tricky part:

$$\iint f^+(x_1, x_2)\,\mu_2(dx_2)\,\mu_1(dx_1) = \int f^+\,d\mu < \infty$$

implies that $\int f^+(x_1, x_2)\,\mu_2(dx_2)$ is finite for µ1-almost all x1 ∈ S1. Similarly, the same holds for $\int f^-(x_1, x_2)\,\mu_2(dx_2)$. Thus, there exists a µ1-null set N such that for $x_1 \in N^c$, f(x1, ·) is µ2-integrable, and the map F : S1 −→ R defined by $F(x_1) = \int f(x_1, x_2)\,\mu_2(dx_2)$ if $x_1 \in N^c$ and $F(x_1) = 0$ if $x_1 \in N$ is µ1-integrable with $\int F\,d\mu_1 = \int f\,d\mu$, which is what is meant by the second equality of (∗). The first is similar.

(c) Finally, if the iterated integral $\iint |f|\,d\mu_2\,d\mu_1$ is finite, then since it is equal to $\int |f|\,d\mu$ by (∗) applied to |f| ≥ 0, the function |f| is µ-integrable; since f was given to be measurable, f itself is µ-integrable and (b) applies.

Remarks.

(1) The non-negative case of the Fubini Theorem is often called Tonelli’s theorem.

(2) The function F in the proof above can be obtained by changing the integrand f, replacing it by $\tilde f$, where $\tilde f(x_1, x_2) = f(x_1, x_2)$ for $x_1 \in N^c$ and $\tilde f(x_1, x_2) = 0$ otherwise. Then N × S2 is µ1 ⊗ µ2-null, so that $\tilde f$ has the same integral as f with respect to µ1 ⊗ µ2, and the problem of undefined integral is gone.

1. Prove that the family of Borel sets of R² is the product B ⊗ B, where B is the family of Borel sets of R, and that the restriction of 2-dimensional Lebesgue measure to this family is the product of 2 copies of 1-dimensional Lebesgue measure.

2. Let µ1 = µ2 be counting measure on all subsets of N. Let f(m, n) = 1 if n = m, −1 if n = m + 1 and 0 otherwise. Then $\iint f(m, n)\,\mu_1(dm)\,\mu_2(dn) \ne \iint f(m, n)\,\mu_2(dn)\,\mu_1(dm)$, though both are finite. Why does this not contradict the Fubini theorem?
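The two iterated integrals in exercise 2 are iterated sums, and they can be evaluated directly. The sketch below assumes N = {1, 2, 3, ...}; the cutoff `LIMIT` on the outer sum is an arbitrary illustrative choice (the inner sums have only finitely many non-zero terms, so they are computed exactly). It only exhibits the two values; the question posed in the exercise is left to the reader.

```python
def f(m, n):
    """The function of the exercise on N x N, with N = {1, 2, 3, ...}."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def sum_over_m(n):
    # For fixed n, f(m, n) is non-zero only for m = n and (if n >= 2) m = n - 1.
    return sum(f(m, n) for m in range(1, n + 1))

def sum_over_n(m):
    # For fixed m, f(m, n) is non-zero only for n = m and n = m + 1.
    return sum(f(m, n) for n in range(m, m + 2))

LIMIT = 1000  # truncation of the outer sum; an arbitrary illustrative choice

# Inner integral with respect to mu1(dm), outer with respect to mu2(dn).
m_first = sum(sum_over_m(n) for n in range(1, LIMIT + 1))
# Inner integral with respect to mu2(dn), outer with respect to mu1(dm).
n_first = sum(sum_over_n(m) for m in range(1, LIMIT + 1))

print(m_first, n_first)
```

Only the term n = 1 contributes to the first sum, since m = 0 is not available to cancel it; every inner sum in the second order vanishes.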


3. Prove the following version of Cavalieri’s Principle. Let µ1, µ2 be σ-finite measures on σ-algebras M1, M2 in S1, S2 respectively. Prove that if E, F are subsets of S = S1 × S2 such that µ2(E(x1)) = µ2(F(x1)), for µ1-almost all x1 ∈ S1, then (µ1 ⊗ µ2)(E) = (µ1 ⊗ µ2)(F).

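4. Let µ1 = µ2 = λ, Lebesgue measure on the Borel subsets of [0, 1]. On [0, 1] × [0, 1], put

$$f(x, y) = \begin{cases} 2^{2n}, & \text{if } \frac{1}{2^n} \le x < \frac{1}{2^{n-1}},\ \frac{1}{2^n} \le y < \frac{1}{2^{n-1}},\ n \in \mathbb{N} \\ -2^{2n+1}, & \text{if } \frac{1}{2^{n+1}} \le x < \frac{1}{2^n},\ \frac{1}{2^n} \le y < \frac{1}{2^{n-1}},\ n \in \mathbb{N} \\ 0, & \text{otherwise.} \end{cases}$$

Notice that for fixed y with 0 < y < 1, there exists an n with $\frac{1}{2^n} \le y < \frac{1}{2^{n-1}}$, and for this y,

$$f(\cdot, y) = 2^{2n}\, 1_{[\frac{1}{2^n}, \frac{1}{2^{n-1}})} - 2^{2n+1}\, 1_{[\frac{1}{2^{n+1}}, \frac{1}{2^n})},$$

which has integral 0. For fixed x with 0 < x < 1/2, there is an n ∈ N with $\frac{1}{2^{n+1}} \le x < \frac{1}{2^n}$, and for that n,

$$f(x, \cdot) = 2^{2n+2}\, 1_{[\frac{1}{2^{n+1}}, \frac{1}{2^n})} - 2^{2n+1}\, 1_{[\frac{1}{2^n}, \frac{1}{2^{n-1}})},$$

which again has integral 0; but there is still the case 1/2 ≤ x < 1, where

$$f(x, \cdot) = 4 \cdot 1_{[1/2, 1)},$$

and we get

$$\int_0^1\!\!\int_0^1 f(x, y)\,\lambda(dx)\,\lambda(dy) = 0 \quad\text{but}\quad \int_0^1\!\!\int_0^1 f(x, y)\,\lambda(dy)\,\lambda(dx) = 1.$$

Check that this does not violate the Fubini Theorem.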

5. For (x, y) in the unit square [0, 1] × [0, 1], define

$$f(x, y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}.$$

Prove that

$$\int_0^1\!\!\int_0^1 f(x, y)\,dx\,dy = -\pi/4,$$

but

$$\int_0^1\!\!\int_0^1 f(x, y)\,dy\,dx = \pi/4.$$

Show that this doesn’t violate the Fubini Theorem, since f is not integrable on [0, 1] × [0, 1].

6. Let µ1 be Lebesgue measure on the Borel sets of [0, 1] and let µ2 be counting measure, again on the Borel sets of [0, 1]. Let

$$f(x, y) = \begin{cases} 1, & x = y \\ 0, & x \ne y. \end{cases}$$

Prove that f is product measurable (that is, belongs to B(R) ⊗ B(R)), yet

$$\iint f(x, y)\,\mu_1(dx)\,\mu_2(dy) \ne \iint f(x, y)\,\mu_2(dy)\,\mu_1(dx).$$

What hypothesis of the Fubini Theorem doesn’t hold?

7. Let µ1 and µ2 be measures defined on σ-algebras M1 and M2 in spaces S1 and S2, respectively. Let M1 × M2 = {A1 × A2 : A1 ∈ M1, A2 ∈ M2}. For A = A1 × A2 ∈ M1 × M2, put τ(A) = µ1(A1)µ2(A2). Prove that τ is defined on a prering and is countably additive. Let τ∗ be the Caratheodory outer measure generated by τ. Show that each element A ∈ M1 × M2 is τ∗-measurable with τ∗(A) = τ(A), so this is another way of producing a product measure. The previous problem, however, shows that this need not be given by the iterated integral formula.


9. The Lp-spaces, types of convergence.

We recall that if E is a vector space over R, a (real) seminorm on E is a function x ↦ ‖x‖ on E to R such that for x, y ∈ E

(1) ‖x‖ ≥ 0

(2) ‖ax‖ = |a|‖x‖, for all a ∈ R

(3) ‖x + y‖ ≤ ‖x‖+ ‖y‖.

The second condition implies that ‖x‖ = 0 when x = 0; a seminorm is called a norm if this is the only x for which ‖x‖ = 0. A vector space with a (semi)norm is called a (semi)normed space. It is immediate that the set N of all x in a seminormed space E for which ‖x‖ = 0 is a vector subspace, so the quotient space E/N is also a vector space. Its elements are the equivalence classes (cosets) $\tilde{x} = x + N$. Notice that the seminorm is constant on equivalence classes: if y ∈ x + N then | ‖x‖ − ‖y‖ | ≤ ‖x − y‖ = 0, so ‖x‖ = ‖y‖. Thus, there is a function defined on E/N by $\|\tilde{x}\| = \|x\|$. This makes E/N into a normed space.

If E is a seminormed space, there is a distance on E defined by d(x, y) = ‖x − y‖. This distance is a semi-metric (or pseudometric), that is

(1) d(x, y) ≥ 0; d(x, x) = 0

(2) d(x, y) = d(y, x)

(3) d(x, y) ≤ d(x, z) + d(z, y).

If ‖ · ‖ is actually a norm then d is a metric; that is, it also satisfies

(4) d(x, y) = 0 =⇒ x = y.

We assume familiarity with the concepts of neighborhood, open set, convergence, Cauchy sequence etc. in a pseudometric space.

Recall that a convergent sequence in a pseudometric space is Cauchy and that if the converse holds, the space is called complete. If X is a complete seminormed space and N = {x : ‖x‖ = 0}, then the quotient space X/N, described above, is also complete, as the reader can easily check.

Shall we review the following, “for the sake of completeness”?

Lemma. In a semimetric space,

(1) if $(x_n)$ is a Cauchy sequence, and $\varepsilon_k > 0$, for all k ∈ N, then $(x_n)$ has a subsequence $(y_k)$ such that $d(y_k, y_{k+1}) < \varepsilon_k$.

(2) If a Cauchy sequence has a convergent subsequence, then the entire sequence is convergent.

(3) A seminormed space is complete if (and only if) every absolutely convergent series is convergent.

Proof. (1) Suppose $(x_n)$ is Cauchy. Then we may choose $n_1$ such that for $n, m \ge n_1$, $d(x_n, x_m) < \varepsilon_1$. Recursively, if $n_k$ has been chosen, choose $n_{k+1} > n_k$ such that $d(x_n, x_m) < \varepsilon_{k+1}$ for $n, m \ge n_{k+1}$. Then $(y_k) = (x_{n_k})$ is the required subsequence.

(2) Suppose $(x_n)$ is Cauchy and $(x_{n_k})$ is a subsequence converging to a. Then, for all ε > 0 there exists N such that for m, n ≥ N, $d(x_m, x_n) < \varepsilon$, and there is k with $n_k \ge N$ and $d(x_{n_k}, a) < \varepsilon$. Thus, for m ≥ N, $d(x_m, a) \le d(x_m, x_{n_k}) + d(x_{n_k}, a) < 2\varepsilon$. Hence $x_m \to a$.

(3) Suppose $(x_n)$ is Cauchy. Using (1), choose a subsequence $(y_k)$ such that $\|y_{k+1} - y_k\| < \frac{1}{2^k}$. Then $\sum_k (y_{k+1} - y_k)$ converges absolutely; indeed, $\sum_{k=1}^{\infty} \|y_{k+1} - y_k\| < 1$, so by hypothesis $y_m = y_1 + \sum_{k=1}^{m-1} (y_{k+1} - y_k)$ converges to some x and hence, by (2), $x_n$ also converges to x. (The easy converse is omitted.)

The spaces $\mathcal{L}^1$ and $L^1$.

Recall that $\mathcal{L}^0 = \mathcal{L}^0(S, \mathcal{M}, \mu)$ is the set of all (finite) real-valued $\mathcal{M}$-measurable functions. For each $f \in \mathcal{L}^0$, $\|f\|_1 = \int |f|\,d\mu$ is defined, though possibly infinite.

The space $\mathcal{L}^1 = \mathcal{L}^1(S, \mathcal{M}, \mu)$ is the set of all $f \in \mathcal{L}^0$ for which $\|f\|_1 < \infty$.

1 Theorem. $\mathcal{L}^1(S, \mathcal{M}, \mu)$ is a complete seminormed space under the seminorm $\|\cdot\|_1$, with $\|f\|_1 = 0$ iff f = 0 µ-a.e.

Proof. Clearly, for all $f, g \in \mathcal{L}^0$ and a ∈ R: (1) $\|f\|_1 \ge 0$, (2) $\|af\|_1 = \int |af|\,d\mu = |a|\,\|f\|_1$ and (3) $\|f + g\|_1 = \int |f + g|\,d\mu \le \int (|f| + |g|)\,d\mu = \|f\|_1 + \|g\|_1$. The latter two properties also show that $\mathcal{L}^1$ is closed under addition and multiplication by a scalar, so is a vector subspace of $\mathcal{L}^0$. Finally, we have already shown in section 6 that $\int |f|\,d\mu = 0$ implies f = 0 µ-a.e., so only the completeness remains to be proved.

For this, we show that absolute convergence implies convergence.

Suppose $\sum_n f_n$ is an absolutely convergent series in $\mathcal{L}^1$, and let $g_n = \sum_{k=1}^{n} |f_k|$. Then

$$\lim_n \int g_n\,d\mu = \lim_n \int \Bigl(\sum_{k\le n} |f_k|\Bigr)\,d\mu \le \lim_n \sum_{k\le n} \|f_k\|_1 < \infty.$$

So, by the Monotone Convergence Theorem, there exists a finite integrable function g such that $g_n \to g$ µ-a.e. (and $\int g\,d\mu = \lim_n \int g_n\,d\mu < \infty$). Thus, the series $\sum_n f_n(x)$ converges absolutely in R, for almost all x. Thus $h_n = \sum_{k\le n} f_k$ converges µ-a.e. to some $h \in \mathcal{L}^0$ and $|h_n|$ is dominated µ-a.e. by g, a µ-integrable function. Since $|h_n - h| \le 2g$ µ-a.e., the DCT gives

$$\|h_n - h\|_1 = \int |h_n - h|\,d\mu \longrightarrow 0,$$

as required.

The reader may have noticed, embedded in the above proof, the following strengthening of the Dominated Convergence Theorem.

Corollary. If $f_n \to f$ µ-almost everywhere and $|f_n| \le g \in \mathcal{L}^1$, for all n, then $f_n \to f$ in $\mathcal{L}^1$.

We will call a function f which is 0 µ-a.e. a µ-null function. The quotient space of $\mathcal{L}^1$ (the seminormed space of integrable functions) by the subspace of null functions is denoted $L^1 = L^1(S, \mathcal{M}, \mu)$. It is thus a normed space under the induced norm. Many authors make no distinction between $\mathcal{L}^1$ and $L^1$, regarding the equivalence classes as functions, whenever convenient. Convergence of a sequence in the $\mathcal{L}^1$-(semi)norm is called convergence in the mean. We have shown that $\mathcal{L}^1$ is complete in its seminorm, so that $L^1$ is a complete normed space, that is, a Banach space.

The spaces $\mathcal{L}^p$ and $L^p$, for 1 ≤ p < ∞.

For 1 ≤ p < ∞ and $f \in \mathcal{L}^0$, we define $\|f\|_p = \bigl(\int |f|^p\,d\mu\bigr)^{1/p}$. This reduces to the previous definition for p = 1 and again this could be infinite.

2 Theorem. For 1 ≤ p < ∞, the set $\mathcal{L}^p = \mathcal{L}^p(S, \mathcal{M}, \mu)$ of those $f \in \mathcal{L}^0$ for which $\|f\|_p$ is finite is a subspace of $\mathcal{L}^0$ and $\|\cdot\|_p$ is a complete seminorm on $\mathcal{L}^p$ vanishing exactly on the µ-null functions.

The corresponding normed space $\mathcal{L}^p / \{f : f \text{ is } \mu\text{-null}\}$ will be denoted $L^p = L^p(S, \mathcal{M}, \mu)$.

Proof. Clearly $\|f\|_p = 0$ iff $\bigl\|\,|f|^p\,\bigr\|_1 = 0$ iff f = 0 a.e. Now, if $\int |f|^p\,d\mu < \infty$ and a ∈ R then $\int |af|^p\,d\mu = |a|^p \int |f|^p\,d\mu < \infty$, so $\mathcal{L}^p$ is closed under multiplication by a scalar and the second property of the definition of seminorm is satisfied. If $f, g \in \mathcal{L}^p$ we have $|f + g| \le |f| + |g| \le 2\max\{|f|, |g|\}$, so

$$|f + g|^p \le 2^p \max\{|f|^p, |g|^p\} \le 2^p(|f|^p + |g|^p);$$

integrating, we see $\mathcal{L}^p$ is closed under addition. But the inequality we get is weaker than the triangle inequality:

$$\|f + g\|_p \le 2\bigl(\|f\|_p^p + \|g\|_p^p\bigr)^{1/p}.$$

The triangle inequality is known as Minkowski’s Inequality. If $f, g \in \mathcal{L}^0$, then

$$\|f + g\|_p \le \|f\|_p + \|g\|_p \quad (1 \le p < +\infty). \tag{$*$}$$

We already know this in case p = 1, so we assume p > 1. It really depends on a statement about real numbers. The function $\varphi(t) = t^p$ is strictly convex on [0, ∞), since $\varphi''(t) = p(p-1)t^{p-2} > 0$ for t > 0. Thus, for a, b > 0, 0 < α < 1,

$$(\alpha a + (1 - \alpha)b)^p \le \alpha a^p + (1 - \alpha)b^p,$$

and actually < if a ≠ b.



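Now, we may assume the right side of (∗) is finite. And if f or g were null, the statement would be trivial, so we may assume that neither $\|f\|_p$ nor $\|g\|_p$ is 0.

Put $f_0 = \dfrac{|f|}{\|f\|_p}$ and $g_0 = \dfrac{|g|}{\|g\|_p}$. Then $\|f_0\|_p = \|g_0\|_p = 1$ and

$$\frac{|f| + |g|}{\|f\|_p + \|g\|_p} = \frac{\|f\|_p}{\|f\|_p + \|g\|_p}\,f_0 + \frac{\|g\|_p}{\|f\|_p + \|g\|_p}\,g_0,$$

which is a convex combination $\alpha f_0 + (1 - \alpha)g_0$. Thus, since $|f + g| \le |f| + |g|$,

$$\Bigl(\frac{|f + g|}{\|f\|_p + \|g\|_p}\Bigr)^p \le \alpha f_0^p + (1 - \alpha)g_0^p.$$

Integrating gives

$$\frac{1}{(\|f\|_p + \|g\|_p)^p} \int |f + g|^p\,d\mu \le \alpha \int f_0^p\,d\mu + (1 - \alpha) \int g_0^p\,d\mu = \alpha + (1 - \alpha) = 1.$$

In other words,

$$\|f + g\|_p^p \le (\|f\|_p + \|g\|_p)^p,$$

as required.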
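Minkowski's inequality, and the cruder bound obtained just before it, can be sanity-checked on a finite measure space. The sketch below assumes a four-point space with made-up point masses; `p_norm` is an illustrative helper, not notation from the text.

```python
def p_norm(values, weights, p):
    """(integral of |f|^p)^(1/p) for a function on a finite weighted set."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

# Hypothetical data: a four-point space with point masses `weights`.
weights = [0.5, 1.0, 2.0, 0.25]
f = [1.0, -2.0, 0.5, 3.0]
g = [-0.5, 1.5, 2.0, -1.0]
f_plus_g = [a + b for a, b in zip(f, g)]

for p in (1.0, 1.5, 2.0, 4.0):
    norm_f, norm_g = p_norm(f, weights, p), p_norm(g, weights, p)
    lhs = p_norm(f_plus_g, weights, p)
    # The weaker bound derived first: 2 (||f||_p^p + ||g||_p^p)^(1/p).
    crude = 2 * (norm_f ** p + norm_g ** p) ** (1.0 / p)
    # Minkowski sits between the norm of the sum and the crude bound.
    assert lhs <= norm_f + norm_g + 1e-12
    assert norm_f + norm_g <= crude + 1e-12
```

The small tolerance only guards against floating-point rounding; it is not part of the inequality.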

The completeness of $\mathcal{L}^p$ is proved in the same way as for $\mathcal{L}^1$. We’ve written it here slightly differently, illustrating the convention of integrating functions which are not necessarily finite.

Suppose $\sum_n f_n$ is an absolutely convergent series in $\mathcal{L}^p$, and let $g = \sum_{k=1}^{\infty} |f_k|$. Then by the Monotone Convergence Theorem and Minkowski’s inequality,

$$\int g^p\,d\mu = \lim_n \int \Bigl(\sum_{k\le n} |f_k|\Bigr)^p d\mu \le \lim_n \Bigl(\sum_{k\le n} \|f_k\|_p\Bigr)^p < \infty,$$

so that g is finite µ-a.e. Thus $h_n = \sum_{k\le n} f_k$ converges µ-a.e. to some $h \in \mathcal{L}^0$ and $|h_n|^p$ is dominated by $g^p$, which is equal µ-a.e. to a µ-integrable function. Since $|h_n - h|^p \le 2^p g^p$ µ-a.e., the DCT gives

$$\|h_n - h\|_p^p = \int |h_n - h|^p\,d\mu \longrightarrow 0,$$

as required.

As with $\mathcal{L}^1$, we see the following embedded in the above proof.

Corollary. If $f_n \to f$ µ-almost everywhere and $|f_n| \le g \in \mathcal{L}^p$, for all n, then $f_n \to f$ in $\mathcal{L}^p$.

The following result is often used in the proof of Minkowski’s inequality. (See the Exercises.) It is indispensable in determining the duals of (i.e., the spaces of continuous linear functionals on) the $L^p$ spaces.



3 Hölder’s inequality. If p > 1 and $\frac{1}{p} + \frac{1}{p'} = 1$, and $f, g \in \mathcal{L}^0$, then $\|fg\|_1 \le \|f\|_p \|g\|_{p'}$.

The exponents p and p′ in this result are called conjugate exponents. Note that, in particular, this says that fg is integrable whenever $f \in \mathcal{L}^p$ and $g \in \mathcal{L}^{p'}$. The unique self-conjugate exponent is 2. In this case, Hölder’s inequality gives the Cauchy-Bunyakovskii-Schwarz inequality: $|\int fg\,d\mu| \le \|fg\|_1 \le \|f\|_2 \|g\|_2$.

Proof. We may assume the right side is finite and non-zero, for otherwise the result is trivial. Then, dividing by $\|f\|_p$ and $\|g\|_{p'}$, we may assume both are 1.

The logarithm function φ(t) = log t is (strictly) concave. Indeed, $\varphi''(t) = -1/t^2$, for t ∈ (0, ∞). Thus, for 0 < s < 1, and a, b > 0

$$s \log a + (1 - s) \log b \le \log(sa + (1 - s)b).$$

Exponentiating gives

$$a^s b^{1-s} \le sa + (1 - s)b.$$

If we take $s = \frac{1}{p}$, so $1 - s = \frac{1}{p'}$, and put $a = A^p$ and $b = B^{p'}$, we get

$$AB \le \frac{A^p}{p} + \frac{B^{p'}}{p'},$$

which also holds trivially when A or B is 0. Substitute |f| and |g| for A and B and integrate (remembering that $\|f\|_p = 1$ and $\|g\|_{p'} = 1$) to get

$$\|fg\|_1 \le \frac{\|f\|_p^p}{p} + \frac{\|g\|_{p'}^{p'}}{p'} = \frac{1}{p} + \frac{1}{p'} = 1,$$

as required.
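Both the numerical inequality $AB \le A^p/p + B^{p'}/p'$ (often called Young's inequality) and Hölder's inequality itself can be checked on a small example. The sketch assumes a finite space with made-up point masses; `p_norm` and `conjugate` are illustrative helper names.

```python
def p_norm(values, weights, p):
    """(integral of |f|^p)^(1/p) for a function on a finite weighted set."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def conjugate(p):
    """The exponent p' with 1/p + 1/p' = 1, for p > 1."""
    return p / (p - 1.0)

# Hypothetical data: a four-point space with point masses `weights`.
weights = [0.5, 1.0, 2.0, 0.25]
f = [1.0, -2.0, 0.5, 3.0]
g = [-0.5, 1.5, 2.0, -1.0]
fg = [a * b for a, b in zip(f, g)]

for p in (1.5, 2.0, 3.0):
    q = conjugate(p)
    assert abs(1.0 / p + 1.0 / q - 1.0) < 1e-12
    # Hoelder: ||fg||_1 <= ||f||_p ||g||_p'.
    assert p_norm(fg, weights, 1.0) <= p_norm(f, weights, p) * p_norm(g, weights, q) + 1e-12

# The inequality for numbers used in the proof, here with p = 2.5.
p = 2.5
q = conjugate(p)
for A, B in [(0.3, 2.0), (1.0, 1.0), (4.0, 0.1), (0.0, 7.0)]:
    assert A * B <= A ** p / p + B ** q / q + 1e-12
```

For p = 2 the first loop is a check of the Cauchy-Bunyakovskii-Schwarz inequality.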

The spaces $\mathcal{L}^\infty$ and $L^\infty$.

The space $\mathcal{L}^\infty = \mathcal{L}^\infty(S, \mathcal{M}, \mu)$ is the set of all functions $f \in \mathcal{L}^0$ which are µ-essentially bounded; that is, for which there is a number α with |f(x)| ≤ α, for µ-almost all x. Thus, $f \in \mathcal{L}^\infty$ iff

$$\|f\|_\infty := \inf\{\alpha \ge 0 : \mu(|f| > \alpha) = 0\} < \infty.$$

Let W (temporarily) be the set {α ≥ 0 : µ(|f| > α) = 0}. If α ∈ W and α′ > α, then µ(|f| > α′) ≤ µ(|f| > α) = 0, so α′ is also in W. This shows that W is an infinite interval. Moreover, $\|f\|_\infty$ belongs to W, that is, a minimum is attained.

Indeed, if $\alpha_n \downarrow \alpha = \|f\|_\infty$, and $\mu(|f| > \alpha_n) = 0$, then $[|f| > \alpha_n] \uparrow [|f| > \alpha]$ and so $\mu(|f| > \alpha) = \lim_n \mu(|f| > \alpha_n) = 0$.

We have just shown that $\|f\|_\infty$ is the least number such that $|f| \le \|f\|_\infty$ almost everywhere. One can easily check that

$$\|f\|_\infty = \inf_{N \in \mathcal{N}}\ \sup_{x \in S \setminus N} |f(x)|,$$

where $\mathcal{N}$ is, as before, the family of µ-null sets. This extends the usual idea of “sup-norm”, and gives $\|f\|_\infty$ the name essential supremum.
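On a finite space the essential supremum is easy to compute, and the difference from the ordinary supremum shows up as soon as some point has measure 0. The sketch below assumes a four-point space with made-up weights, one of which is 0; the helper names are illustrative.

```python
def measure_above(values, weights, alpha):
    """mu(|f| > alpha) for a function on a finite weighted set."""
    return sum(w for v, w in zip(values, weights) if abs(v) > alpha)

def ess_sup(values, weights):
    """inf{alpha >= 0 : mu(|f| > alpha) = 0} on a finite weighted set."""
    # Points of weight zero form a null set and cannot affect the result.
    return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)

values = [1.0, -3.0, 100.0, 2.0]
weights = [0.5, 1.0, 0.0, 2.0]   # the third point is mu-null

norm = ess_sup(values, weights)
assert norm == 3.0
assert measure_above(values, weights, norm) == 0        # the infimum is attained
assert measure_above(values, weights, norm - 0.01) > 0  # nothing smaller works
assert max(abs(v) for v in values) == 100.0             # the plain sup is larger
```

Removing the null point from the space is exactly the choice of N that realizes the infimum in the inf-sup formula.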



Theorem. $\mathcal{L}^\infty$ is a complete seminormed vector space under $\|\cdot\|_\infty$, and $\|f\|_\infty = 0$ iff f is µ-null.

$L^\infty = L^\infty(S, \mathcal{M}, \mu)$ will denote the corresponding normed space (actually, Banach space, since it is complete).

Proof. We have $\|f\|_\infty = 0$ iff |f(x)| ≤ 0 µ-a.e., that is, iff f = 0 µ-a.e.

We now verify the seminorm properties.

Certainly, $\|f\|_\infty \ge 0$.

If a ≠ 0, (|f| > α) = (|af| > |a|α), so

$$|a|\,\|f\|_\infty = |a| \inf\{\alpha : \mu(|af| > |a|\alpha) = 0\} = \inf\{\beta : \mu(|af| > \beta) = 0\} = \|af\|_\infty.$$

(Triangle inequality) Let $f, g \in \mathcal{L}^\infty$. Then $|f| \le \|f\|_\infty$ a.e. and $|g| \le \|g\|_\infty$ a.e., so $|f + g| \le \|f\|_\infty + \|g\|_\infty$ a.e. Since $\|f + g\|_\infty$ is the least number α such that |f + g| ≤ α a.e., $\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$, as required.

In more detail, and from another point of view, which will be useful later when we study $\mathcal{L}^0$ and convergence in measure: for real numbers, |y + z| > a + b implies |y| > a or |z| > b, thus

$$\mu(|f + g| > \|f\|_\infty + \|g\|_\infty) \le \mu(|f| > \|f\|_\infty) + \mu(|g| > \|g\|_\infty) = 0.$$

Thus $\alpha = \|f\|_\infty + \|g\|_\infty$ satisfies µ(|f + g| > α) = 0, so $\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$, as required.

(Completeness.) Suppose $(f_n)$ is Cauchy in the seminorm $\|\cdot\|_\infty$. Outside some null set $N_{nm}$, we have $|f_m - f_n| \le \|f_n - f_m\|_\infty$, so outside $N = \bigcup_{n,m} N_{nm}$,

$$|f_n - f_m| \le \|f_n - f_m\|_\infty, \text{ for all } m, n \in \mathbb{N}.$$

Now, for all ε > 0, there exists $n_0$ so large that

$$\|f_n - f_m\|_\infty \le \varepsilon, \text{ for } n, m \ge n_0.$$

If x ∉ N, $|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$, so

$$|f_n(x) - f_m(x)| \le \varepsilon, \text{ for } n, m \ge n_0. \tag{$\infty$}$$

This shows first that $(f_n(x))$ is a Cauchy sequence and hence $f_n(x)$ converges, by the completeness of R. Put

$$f(x) = \begin{cases} \lim_n f_n(x), & \text{if } x \notin N \\ 0, & \text{otherwise.} \end{cases}$$

Now let m −→ ∞ in (∞), obtaining $|f_n(x) - f(x)| \le \varepsilon$, for all $x \in N^c$ and $n \ge n_0$. But then $\|f_n - f\|_\infty \le \varepsilon$, for all $n \ge n_0$. So, $f_n \to f$ in the $\|\cdot\|_\infty$ distance.

Finally, to show $f \in \mathcal{L}^\infty$, let ε = 1 and choose $n_0$ such that for $n \ge n_0$, $\|f_n - f\|_\infty \le 1$. Then $\|f\|_\infty \le \|f_{n_0}\|_\infty + \|f_{n_0} - f\|_\infty \le \|f_{n_0}\|_\infty + 1 < \infty$, so that $f \in \mathcal{L}^\infty$, completing the proof.



The spaces $\mathcal{L}^p$ and $L^p$, 0 < p < 1 (Optional).

If 0 < p < 1, and if $\|f\|_p$ is defined as before to be $\|f\|_p = \bigl(\int |f|^p\,d\mu\bigr)^{1/p}$, it is still non-negative and has $\|f\|_p = 0$ iff f = 0 µ-a.e., but it need not satisfy the triangle inequality; indeed, even for positive numbers,

$$\bigl(a^{1/2} + b^{1/2}\bigr)^2 = a + 2(ab)^{1/2} + b > (a^{1/2})^2 + (b^{1/2})^2.$$

On the other hand, the map $f \mapsto \|f\|_p^p = \int |f|^p\,d\mu$ does satisfy the triangle inequality. This is proved, as with the case of p ≥ 1, by obtaining a corresponding result for numbers: for a, b ≥ 0, 0 < p < 1, $(a + b)^p \le a^p + b^p$.

Since this is obviously true if one of a, b is 0, we may assume neither is 0. Thus, it will be enough to show that $(1 + \frac{b}{a})^p < 1 + (\frac{b}{a})^p$. By symmetry, we may assume a ≤ b. Put t = b/a ≥ 1. Then, by the mean value theorem,

$$(1 + t)^p - t^p = p\,t_0^{p-1},$$

for some $t_0 \in (t, t + 1)$. Since such a $t_0$ is > 1 and p < 1, we have $p\,t_0^{p-1} < 1$, so

$$(1 + t)^p - t^p < 1,$$

which is what we wanted.

Now, if $f, g \in \mathcal{L}^0$, we have |f + g| ≤ |f| + |g|, so

$$|f + g|^p \le (|f| + |g|)^p \le |f|^p + |g|^p,$$

and integrating gives the desired result

$$\|f + g\|_p^p \le \|f\|_p^p + \|g\|_p^p.$$
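A two-point space with counting measure already shows both halves of this discussion for p = 1/2: the quantity $\|\cdot\|_p$ fails the triangle inequality, while $\|\cdot\|_p^p$ satisfies it. The sketch below uses made-up data, and the helper names are illustrative.

```python
def integral_abs_p(values, weights, p):
    """||h||_p^p, the integral of |h|^p with respect to the weights."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))

p = 0.5

# The inequality for numbers: (a + b)^p <= a^p + b^p when 0 < p < 1.
for a, b in [(1.0, 1.0), (0.2, 5.0), (3.0, 0.0), (7.0, 2.5)]:
    assert (a + b) ** p <= a ** p + b ** p + 1e-12

# Counting measure on two points; f and g are the two indicator functions.
weights = [1.0, 1.0]
f, g = [1.0, 0.0], [0.0, 1.0]
f_plus_g = [a + b for a, b in zip(f, g)]

def p_norm(h):
    return integral_abs_p(h, weights, p) ** (1.0 / p)

# ||f + g||_p = (1 + 1)^2 = 4, but ||f||_p + ||g||_p = 2.
assert p_norm(f_plus_g) == 4.0
assert p_norm(f) + p_norm(g) == 2.0

# The p-th powers do satisfy the triangle inequality (here with equality).
assert (integral_abs_p(f_plus_g, weights, p)
        <= integral_abs_p(f, weights, p) + integral_abs_p(g, weights, p))
```

This is the displayed computation with a = b = 1, read as a statement about functions on two points.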

A (real) topological vector space (TVS) is a (real) vector space E, with a topology on it for which addition (x, y) ↦ x + y is continuous and for which multiplication by a scalar (α, x) ↦ αx is also continuous.

If E is a vector space over R, a (real) paranorm on E is a function x ↦ ‖x‖ on E to R such that for x, y ∈ E

(1) ‖x‖ ≥ 0 = ‖0‖

(2) ‖−x‖ = ‖x‖

(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖

(4) $\alpha_n \to \alpha$, $\|x_n - x\| \to 0$ implies $\|\alpha_n x_n - \alpha x\| \to 0$.

A paranormed space is made into a semimetric space in the usual way, using the distance d(x, y) = ‖x − y‖. Condition (3) assures that addition is continuous; condition (4), that multiplication is so.

We have seen that on $\mathcal{L}^p$, $\|f\| = \|f\|_p^p$ satisfies the first 3 conditions of paranorm. Since $\|\alpha f\|_p^p = |\alpha|^p \|f\|_p^p$, it also satisfies condition (4): $\|\alpha_n f_n - \alpha f\| \le |\alpha_n|^p \|f_n - f\| + |\alpha_n - \alpha|^p \|f\|$. The subspace of those f with $\|f\|_p^p = 0$ is, of course, still the set of µ-null functions (i.e. those which are 0 a.e.).



Theorem. For 0 < p < 1, the space $\mathcal{L}^p$ of measurable functions with $\|f\|_p < \infty$ is a subspace of $\mathcal{L}^0$, and under the induced distance becomes a complete pseudometric space.

The proof of completeness is similar to that for 1 ≤ p < ∞, and is left to the reader.

The metric space obtained by factoring out the subspace of null functions is denoted $L^p$. It is then a metrizable topological vector space, and the resulting metric is complete. This is what is known as a Fréchet space, or F-space.



The space $\mathcal{L}^0$; convergence in measure, convergence almost everywhere, and convergence almost uniformly.

As we know, $\mathcal{L}^0$ is a vector space and for each δ > 0, [|f| > δ] = {x : |f(x)| > δ} belongs to $\mathcal{M}$. A sequence $(f_n)$ in $\mathcal{L}^0$ converges to $f \in \mathcal{L}^0$ in µ-measure, denoted $f_n \xrightarrow{\mu} f$ or $f_n \to f$ in µ, if

for each δ > 0, $\mu(|f_n - f| > \delta) \to 0$.

This says that “the measure of the set where there is an error of more than δ in using $f_n$ to approximate f is arbitrarily small.” The sequence $(f_n)$ is Cauchy in µ-measure if for each δ > 0,

$$\mu(|f_n - f_m| > \delta) \xrightarrow[m,n]{} 0.$$

We will see below that each sequence which is Cauchy in measure converges in measure.

From the Chebyshev inequality, we immediately obtain that convergence in $\mathcal{L}^p$ implies convergence in measure.

Lemma. If $f_n \to f$ in $\mathcal{L}^p$, then $f_n \to f$ in µ-measure.

Proof. Indeed, by the Chebyshev inequality,

$$\mu(|f_n - f| > \delta) = \mu(|f_n - f|^p > \delta^p) \le \frac{1}{\delta^p} \int |f_n - f|^p\,d\mu,$$

which converges to 0 if $f_n \to f$ in $\mathcal{L}^p$.
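The Chebyshev bound in this proof can be watched at work on a finite space. The sketch below assumes made-up point masses and an illustrative sequence whose error $f_n - f$ shrinks like 1/n; none of these names or numbers come from the text.

```python
def measure_above(values, weights, delta):
    """mu(|h| > delta) for a function h on a finite weighted set."""
    return sum(w for v, w in zip(values, weights) if abs(v) > delta)

def integral_abs_p(values, weights, p):
    """Integral of |h|^p with respect to the weights."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))

# Hypothetical data: point masses and an error profile that shrinks like 1/n.
weights = [0.5, 1.0, 2.0, 0.25]
base_error = [1.0, -2.0, 0.5, 3.0]

def error(n):
    """Values of f_n - f for the illustrative sequence f_n = f + base_error / n."""
    return [e / n for e in base_error]

p, delta = 2.0, 0.1
for n in (1, 10, 100, 1000):
    h = error(n)
    bound = integral_abs_p(h, weights, p) / delta ** p
    # Chebyshev: mu(|f_n - f| > delta) <= delta^(-p) * integral |f_n - f|^p.
    assert measure_above(h, weights, delta) <= bound

# For large n the set where the error exceeds delta is empty.
assert measure_above(error(1000), weights, delta) == 0
```

The bound is usually far from sharp; what matters for the lemma is only that it tends to 0.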

In probability theory, if µ is a probability measure, convergence in measure is called convergence in probability.

These concepts can be given by a distance through the $\mathcal{L}^0$-quasinorm, to be defined presently.

A quasinorm on a commutative group (E, +) is a function x ↦ ‖x‖ on E to [0, +∞] such that

(1) ‖0‖ = 0 ≤ ‖x‖ ≤ +∞

(2) ‖−x‖ = ‖x‖

(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Again, there is a natural distance associated with the quasinorm, given by d(x, y) = ‖x − y‖. (This is what is known as a “generalized semimetric”, since it can take on the value +∞; it can be turned into a semimetric d′ simply by putting d′(x, y) = d(x, y) ∧ 1. The convergence properties will not change.)

The reader can show, just as for seminorms, that E is complete in the distance of the quasinorm if and only if every absolutely convergent series converges.

For $f \in \mathcal{L}^0$, define

$$\|f\|_0 = \inf\{\delta \ge 0 : \mu(|f| > \delta) \le \delta\}.$$



Temporarily, let W = {δ ≥ 0 : µ(|f| > δ) ≤ δ}. Then δ′ > δ ∈ W implies δ′ ∈ W. Indeed, [|f| > δ′] ⊂ [|f| > δ], so

$$\mu(|f| > \delta') \le \mu(|f| > \delta) \le \delta < \delta'.$$

Thus, W is an infinite interval. Moreover, $\|f\|_0$ is the least element of W; that is, a minimum is attained:

$$\mu(|f| > \|f\|_0) \le \|f\|_0. \tag{$*$}$$

Indeed, if $\delta_n \downarrow \|f\|_0$, the sets $[|f| > \delta_n]$ increase to $[|f| > \|f\|_0]$, so

$$\mu(|f| > \delta_n) \to \mu(|f| > \|f\|_0).$$

Thus, since $\mu(|f| > \delta_n) \le \delta_n$, we obtain (∗) in the limit.
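For a function on a finite weighted set, $\|f\|_0$ can be computed exactly, because δ ↦ µ(|f| > δ) is a step function. The sketch below is one possible way to do this (the search over "breakpoints" and "step heights" is a choice made here, not a method from the text), with made-up data and exact rational arithmetic.

```python
from fractions import Fraction

def measure_above(values, weights, delta):
    """mu(|f| > delta) for a function on a finite weighted set."""
    return sum((w for v, w in zip(values, weights) if abs(v) > delta),
               Fraction(0))

def l0_quasinorm(values, weights):
    """min{delta >= 0 : mu(|f| > delta) <= delta}, computed exactly."""
    # delta -> mu(|f| > delta) only changes at the values |f(x)|, so the
    # minimum is either such a value or one of the step heights.
    breakpoints = {Fraction(0)} | {Fraction(abs(v)) for v in values}
    heights = {measure_above(values, weights, b) for b in breakpoints}
    candidates = breakpoints | heights
    return min(d for d in candidates if measure_above(values, weights, d) <= d)

# Hypothetical three-point space and functions.
weights = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
f = [Fraction(3), Fraction(1, 2), Fraction(1, 10)]
g = [Fraction(0), Fraction(1), Fraction(-1, 5)]
f_plus_g = [a + b for a, b in zip(f, g)]

norm_f = l0_quasinorm(f, weights)
assert norm_f == Fraction(1, 2)
assert measure_above(f, weights, norm_f) <= norm_f   # (*): the minimum is attained
assert l0_quasinorm(f_plus_g, weights) <= norm_f + l0_quasinorm(g, weights)

# Not homogeneous: doubling a tall, thin spike leaves the quasinorm unchanged.
spike = [Fraction(5), Fraction(0), Fraction(0)]
double_spike = [2 * v for v in spike]
assert l0_quasinorm(spike, weights) == l0_quasinorm(double_spike, weights)
```

The last check illustrates why $\|\cdot\|_0$ is only a quasinorm on a group rather than a seminorm: it measures how large the set of large values is, not the size of the values themselves.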

Proposition. $\|\cdot\|_0$ is a quasinorm on $\mathcal{L}^0$ determining convergence and Cauchyness in µ-measure. Moreover, $\|f\|_0 = 0$ iff f = 0 µ-almost everywhere.

To say that the quasinorm $\|\cdot\|_0$ determines convergence in µ-measure means $f_n \xrightarrow{\mu} f$ if and only if $\|f_n - f\|_0 \to 0$. A similar comment holds for Cauchyness in measure.

Proof. First, f = 0 µ-almost everywhere iff µ(|f| > 0) = 0 iff $\|f\|_0 = 0$. In particular,

(1) $\|0\|_0 = 0$. Clearly, $0 \le \|f\|_0 \le +\infty$.

(2) $\|f\|_0 = \|-f\|_0$ is obvious.

(3) To see $\|f + g\|_0 \le \|f\|_0 + \|g\|_0$, notice that

$$(|f + g| > \|f\|_0 + \|g\|_0) \subset (|f| > \|f\|_0) \cup (|g| > \|g\|_0),$$

so

$$\mu(|f + g| > \|f\|_0 + \|g\|_0) \le \mu(|f| > \|f\|_0) + \mu(|g| > \|g\|_0) \le \|f\|_0 + \|g\|_0$$

and since $\|f + g\|_0$ is the least of the numbers δ for which µ(|f + g| > δ) ≤ δ, $\|f + g\|_0 \le \|f\|_0 + \|g\|_0$ follows.

We now show that convergence in measure is determined by the distance of the quasinorm.

Suppose $f_n \xrightarrow{\mu} f$. Then for each δ > 0, $\mu(|f_n - f| > \delta) \to 0$. Thus, for a fixed δ > 0, there exists $n_0$ such that for $n \ge n_0$, $\mu(|f_n - f| > \delta) \le \delta$, and hence $\|f_n - f\|_0 \le \delta$, so $f_n \to f$ in the distance of $\|\cdot\|_0$.

Conversely, if $f_n \to f$ in the quasinorm and δ > 0, ε > 0 are fixed, then there exists $n_0$ such that for $n \ge n_0$, $\|f_n - f\|_0 \le \delta \wedge \varepsilon$. But then for $n \ge n_0$,

$$\mu(|f_n - f| > \delta) \le \mu(|f_n - f| > \|f_n - f\|_0) \le \|f_n - f\|_0 \le \varepsilon,$$



so $f_n \xrightarrow{\mu} f$.

In the same way, one shows that a sequence is Cauchy in measure iff it is Cauchy in the distance of the quasinorm.

From now on, unless otherwise stated, “$f_n$ converges (or is Cauchy) in $L^0$” will mean in the quasinorm $\|\cdot\|_0$, that is, in $\mu$-measure. We will find that $L^0$ is complete in the distance of $\|\cdot\|_0$. We will do this by examining relations with almost everywhere convergence and with another concept, “almost uniform convergence”.

A sequence $(f_n)$ in $L^0$ is said to converge $\mu$-almost uniformly to $f$ (denoted $f_n \to f$ a.u. or $f_n \xrightarrow{\text{a.u.}} f$) if for each $\varepsilon > 0$, there exists $A \in \mathcal{M}$ with $\mu(A) < \varepsilon$ and $f_n \to f$ uniformly on $A^c$. Naturally, $(f_n)$ is called almost uniformly Cauchy if for each $\varepsilon > 0$, there exists $A \in \mathcal{M}$ with $\mu(A) < \varepsilon$ and $(f_n)$ is uniformly Cauchy on $A^c$.

Lemma. If $(f_n)$ is a sequence in $L^0$ which is Cauchy almost uniformly, then there exists $f \in L^0$ such that $f_n$ converges almost uniformly to $f$.

Proof. For each $m \in \mathbb{N}$, there exists $A_m$ with $\mu(A_m) < 1/m$ such that $(f_n)$ is Cauchy uniformly, hence converges uniformly, on $A_m^c$. In particular, $\lim_n f_n(x)$ exists in $\mathbb{R}$, for each $x \in A_m^c$. Since limits in the real number system are unique,

\[ f(x) = \begin{cases} \lim_n f_n(x), & x \in \bigcup_m A_m^c, \\ 0, & x \in \bigcap_m A_m \end{cases} \]

defines a function which is easily seen to belong to $L^0$ and $f_n \to f$ almost uniformly.

Lemma. For $f_n, f \in L^0$, $f_n \to f$ $\mu$-a.e. iff for each $\delta > 0$,

\[ \mu\Bigl(\bigcap_n \bigcup_{k \ge n} [|f_k - f| > \delta]\Bigr) = 0. \]

The set whose measure is 0 here is the lim sup of the sets $[|f_n - f| > \delta]$; it is the set of points where $|f_n - f| > \delta$ for infinitely many $n$. The proof is an easy exercise.

Theorem. Equivalent are:

(a) $f_n \to f$ almost uniformly.

(b) For each $\delta > 0$, $\mu(\exists k \ge n, |f_k - f| > \delta) \to 0$ as $n \to \infty$.

(c) $f_n \to f$ $\mu$-a.e. and for each $\delta > 0$, $\mu(\exists k \ge n, |f_k - f| > \delta) < \infty$, for some $n \in \mathbb{N}$.

The term Egorov convergence is sometimes used instead of “almost uniform convergence”. The point of view here is due to R.G. Bartle, American Mathematical Monthly, 1980.

Notice that the set $(\exists k \ge n, |f_k - f| > \delta)$ is the same as $\bigcup_{k \ge n} (|f_k - f| > \delta)$.



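Proof. (a) $\Longrightarrow$ (b) Suppose $f_n \to f$ a.u. Fix $\delta > 0$. For each $\varepsilon > 0$, there exists a set $A_\varepsilon$ of measure $< \varepsilon$ such that $f_n \to f$ uniformly on $A_\varepsilon^c$. Hence, there exists $n$ such that

\[ \text{for all } x \in A_\varepsilon^c, \quad |f_k(x) - f(x)| \le \delta, \quad \text{for all } k \ge n; \]

that is,

\[ \bigcup_{k \ge n} [|f_k - f| > \delta] \subset A_\varepsilon, \]

hence

\[ \mu\Bigl(\bigcup_{k \ge n} [|f_k - f| > \delta]\Bigr) \le \varepsilon. \]

Thus, (b) holds.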

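(b) $\Longrightarrow$ (a) Fix $\varepsilon > 0$. From condition (b), we may choose $n_m$ such that if $B_m$ denotes $\bigcup_{k \ge n_m} \bigl[|f_k - f| > \tfrac{1}{m}\bigr]$, then $\mu(B_m) < \dfrac{\varepsilon}{2^m}$.

Put $A_\varepsilon = \bigcup_m B_m$. Then $\mu(A_\varepsilon) < \varepsilon$, and

\[ \forall x \in A_\varepsilon^c,\ \forall m \in \mathbb{N},\ \forall k \ge n_m, \quad |f_k(x) - f(x)| \le \frac{1}{m}. \]

Equivalently,

\[ \forall m \in \mathbb{N},\ \forall k \ge n_m,\ \forall x \in A_\varepsilon^c, \quad |f_k(x) - f(x)| \le \frac{1}{m}. \]

Thus, for all $m \in \mathbb{N}$, there exists $n$ $(= n_m)$ such that for all $k \ge n$,

\[ \forall x \in A_\varepsilon^c, \quad |f_k(x) - f(x)| \le \frac{1}{m}, \]

and so $f_n \to f$ uniformly on $A_\varepsilon^c$.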

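(b) $\Longleftrightarrow$ (c) By the lemma, $f_n \to f$ $\mu$-a.e. iff for each $\delta > 0$,

\[ \mu\Bigl(\bigcap_n \bigcup_{k \ge n} [|f_k - f| > \delta]\Bigr) = 0. \]

But, by continuity of $\mu$ along decreasing sequences of sets,

\[ \lim_n \mu\Bigl(\bigcup_{k \ge n} [|f_k - f| > \delta]\Bigr) = \mu\Bigl(\bigcap_n \bigcup_{k \ge n} [|f_k - f| > \delta]\Bigr) \]

as soon as, for some $n$, $\bigcup_{k \ge n} [|f_k - f| > \delta]$ has finite measure; conversely, if the limit on the left is 0, such an $n$ certainly exists. Thus, $f_n \to f$ $\mu$-a.e. and this finiteness condition holds iff $\lim_n \mu\bigl(\bigcup_{k \ge n} [|f_k - f| > \delta]\bigr) = 0$.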



Corollary.

(1) Almost uniform convergence implies convergence in measure.

(2) (Egorov’s theorem) For a finite measure, almost everywhere convergence implies almost uniform convergence, hence also convergence in measure.

Proof. (1) This follows from the characterization of almost uniform convergence given by (b) above, since

\[ \mu(|f_n - f| > \delta) \le \mu\Bigl(\bigcup_{k \ge n} [|f_k - f| > \delta]\Bigr). \]

(2) If the measure is finite, the finiteness condition in (c) of the theorem is satisfied, so the result follows.
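To see why finiteness of the measure matters in Egorov’s theorem, here is a standard counterexample sketch. The choice of Lebesgue measure on $\mathbb{R}$ and of the particular sequence is an illustrative assumption, not taken from the notes.

```latex
% Illustrative example (assumption): \mu = Lebesgue measure on \mathbb{R}, f_n = 1_{[n, n+1]}, f = 0.
% Pointwise: for each fixed x, f_n(x) = 0 as soon as n > x, so f_n \to 0 everywhere.
% Not in measure: for 0 < \delta < 1,
\[
\mu(|f_n - 0| > \delta) = \mu([n, n+1]) = 1 \not\to 0 ,
\]
% and the sets appearing in the finiteness condition all have infinite measure:
\[
\mu\Bigl(\bigcup_{k \ge n} [|f_k| > \delta]\Bigr) = \mu([n, \infty)) = +\infty \quad\text{for every } n .
\]
```

So on an infinite measure space, convergence everywhere need not give convergence in measure, and hence need not give almost uniform convergence either.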

We say $f_n \to f$ $\mu$-almost uniformly on $M \in \mathcal{M}$ if for every $\varepsilon > 0$, there exists $E \in \mathcal{M}$ with $\mu(M \setminus E) < \varepsilon$ and $f_n \to f$ uniformly on $E$; we say $f_n \to f$ in $\mu$-measure on $M$ if for every $\delta > 0$, $\mu(M \cap (|f_n - f| > \delta)) \to 0$.

By applying Egorov’s theorem to the measure $\mu_M$ defined by $\mu_M(A) = \mu(M \cap A)$, we can reword the result to say that convergence $\mu$-almost everywhere implies $\mu$-almost uniform convergence on each measurable set $M$ of finite measure, hence also convergence in $\mu$-measure on each such set.

Theorem. Every absolutely convergent series in $L^0$ (in the quasinorm $\|\cdot\|_0$) converges $\mu$-almost uniformly, hence in $\mu$-measure.

Proof. Let $f_n \in L^0$, and suppose $\sum_n \|f_n\|_0 < \infty$. Then,

\[ \sum_n \mu(|f_n| > \|f_n\|_0) \le \sum_n \|f_n\|_0 < \infty. \]

Thus, if $F_n = \bigcup_{k \ge n} [|f_k| > \|f_k\|_0]$, then $\mu(F_n) \le \sum_{k \ge n} \mu(|f_k| > \|f_k\|_0) \to 0$, and on the complement of $F_n$, $\sum_{k \ge n} |f_k| \le \sum_{k \ge n} \|f_k\|_0 < \infty$, so that the series $\sum_n f_n$ converges uniformly on $F_n^c$, by the Weierstrass $M$-test. This shows that the series converges almost uniformly.
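The sets $F_n$ in this proof can be made concrete in a small example. The sets $A_n$ below, with $\mu(A_n) = 2^{-n}$ for $n \ge 1$, are a hypothetical choice made only for illustration.

```latex
% Illustrative example (assumption): A_n \in \mathcal{M} with \mu(A_n) = 2^{-n}, n \ge 1, and f_n = 1_{A_n}.
% Since \mu(|f_n| > \delta) = 2^{-n} for 0 \le \delta < 1 and = 0 for \delta \ge 1,
\[
\|f_n\|_0 = 2^{-n}, \qquad \sum_{n \ge 1} \|f_n\|_0 = 1 < \infty .
\]
% Here [|f_k| > \|f_k\|_0] = A_k, so
\[
F_n = \bigcup_{k \ge n} A_k, \qquad \mu(F_n) \le \sum_{k \ge n} 2^{-k} = 2^{1-n} \to 0 ,
\]
% and on F_n^c every f_k with k \ge n vanishes, so \sum_k f_k reduces there to the finite sum \sum_{k < n} f_k.
```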

Corollary. $L^0$ is complete under convergence in measure.

Proof. A quasinormed space is complete iff every absolutely convergent series converges, which it does by the theorem.

Corollary. If $(f_n)$ is a sequence converging in $\mu$-measure, then it has a subsequence which converges $\mu$-almost uniformly.

Proof. Suppose $f_n$ converges in $L^0$. Then $(f_n)$ is Cauchy in $L^0$, so we may recursively choose $n_k$ such that $n_{k+1} > n_k$ and $\|f_m - f_n\|_0 < 1/2^k$, for $m, n \ge n_k$. Then the series $\sum_{k=1}^{\infty} (f_{n_{k+1}} - f_{n_k})$ converges absolutely, hence almost uniformly. But $f_{n_k} = f_{n_1} + \sum_{j=1}^{k-1} (f_{n_{j+1}} - f_{n_j})$, so $f_{n_k}$ converges $\mu$-almost uniformly.
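Passing to a subsequence is genuinely needed here. The following standard “sliding interval” sequence is offered as an illustration; the choice of Lebesgue measure on $[0,1]$ and the indexing $n = 2^k + j$ are assumptions made for the example.

```latex
% Illustrative example (assumption): \mu = Lebesgue measure on [0,1].
% For n = 2^k + j with k \ge 0 and 0 \le j < 2^k, put
\[
f_n = 1_{[\,j/2^k,\ (j+1)/2^k\,]} .
\]
% Convergence in measure to 0: for 0 < \delta < 1,
\[
\mu(|f_n| > \delta) = 2^{-k} \to 0 \quad (n \to \infty).
\]
% No convergence almost everywhere: the intervals with a given k cover all of [0,1], so
\[
\bigcup_{m \ge n} [|f_m| > \delta] = [0,1] \ \text{ for every } n, \qquad
\mu\Bigl(\bigcap_n \bigcup_{m \ge n} [|f_m| > \delta]\Bigr) = 1 \ne 0 .
\]
% A subsequence that does converge almost uniformly: n_k = 2^k gives
\[
f_{2^k} = 1_{[0,\,2^{-k}]}, \qquad f_{2^k} = 0 \ \text{on } (2^{-K}, 1] \ \text{for all } k \ge K .
\]
```

Given $\varepsilon > 0$, taking $A = [0, 2^{-K}]$ with $2^{-K} < \varepsilon$ shows that the subsequence $(f_{2^k})$ converges to 0 uniformly off a set of measure less than $\varepsilon$.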

As an application of the previous result, we can prove that $L^p$-dominated convergence in $L^0$ implies convergence in $L^p$.



Theorem. Let $f_n \to f$ in $L^0$ and suppose there exists $g \in L^p$ such that $|f_n| \le g$ almost everywhere, for all $n \in \mathbb{N}$. Then $f_n \to f$ in $L^p$.

Proof. Let $f_n \to f$ in $\mu$-measure and $|f_n| \le g \in L^p$. Suppose $(f_n)$ does not converge to $f$ in $L^p$. Then there is an $\varepsilon > 0$ and a subsequence $(f_{n_k})$ for which $\|f_{n_k} - f\|_p \ge \varepsilon$, for all $k$. Then, there is a further subsequence $(f_{n_{k_j}})$ which converges almost uniformly (hence almost everywhere) to $f$. But $|f_{n_{k_j}}| \le g \in L^p$, so by the $L^p$ version of the Dominated Convergence Theorem, actually $f_{n_{k_j}} \to f$ in $L^p$, a contradiction.

Another proof of this fact will be found in the Vitali Convergence Theorem below.

Convergence almost everywhere and domination by an element of $L^p$ implies almost uniform convergence.

Theorem. Let $f_n \to f$ $\mu$-a.e. If there exists $g \in L^0$ such that $\mu(g > \delta) < \infty$, for all $\delta > 0$, and $|f_n - f| \le g$ $\mu$-a.e. for $n \ge k$, then $f_n \to f$ a.u. In particular, this holds if $0 < p < \infty$ and there exists an $h \in L^p$ such that $|f_n| \le h$ $\mu$-a.e.

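Proof. If such a $g$ exists, $\bigcup_{n \ge k} [|f_n - f| > \delta]$ is contained (except for a set of measure 0) in $[g > \delta]$ and so has finite measure. Thus $f_n \to f$ almost uniformly, by condition (c) of the theorem characterizing almost uniform convergence.

If $|f_n| \le h$ $\mu$-a.e., then since $f_n \to f$ $\mu$-a.e., $|f| \le h$ $\mu$-a.e. also, so $|f_n - f| \le 2h \in L^p$; moreover, $\mu(2h > \delta) \le \dfrac{1}{\delta^p} \int (2h)^p \, d\mu < \infty$. Thus $2h$ is a suitable $g$.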
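The bound on $\mu(2h > \delta)$ used above is a Chebyshev-type estimate. For completeness, a short derivation is sketched here, assuming only that $h \ge 0$ is measurable with $\int h^p \, d\mu < \infty$.

```latex
% Chebyshev-type estimate for 2h \ge 0, \delta > 0, 0 < p < \infty:
% on the set [2h > \delta] we have (2h)^p > \delta^p, hence
\[
\delta^p\, \mu(2h > \delta)
= \int_{[2h > \delta]} \delta^p \, d\mu
\le \int_{[2h > \delta]} (2h)^p \, d\mu
\le \int (2h)^p \, d\mu
= 2^p \int h^p \, d\mu < \infty .
\]
% Dividing by \delta^p gives \mu(2h > \delta) \le \delta^{-p} \int (2h)^p \, d\mu.
```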

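A complete description of the connection between convergence in measure and $L^p$ convergence is in the following theorem.

Vitali Convergence Theorem. For a sequence $(f_n)$ in $L^0$ and another $f \in L^0$, $f_n \to f$ in $L^p$ iff

(1) $f_n \to f$ in $L^0$ (that is, in measure),

(2) for all $\varepsilon > 0$, there exists $K_\varepsilon \in \mathcal{M}$ with $\mu(K_\varepsilon) < \infty$ and $\int_{K_\varepsilon^c} |f_n|^p \, d\mu < \varepsilon$, for all $n$, and

(3) for all $\varepsilon > 0$, there exists $\delta > 0$ such that $\mu(A) < \delta \Longrightarrow \int_A |f_n|^p \, d\mu < \varepsilon$, for all $n$.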

Proof. For writing this proof, we find it convenient to use norm notation rather than integrals. Notice that if $g, h$ are $\mathcal{M}$-measurable and $|g| \le |h|$, then $\|g\|_p \le \|h\|_p$.

($\Longleftarrow$)

Let $K$ be any member of $\mathcal{M}$. For each $n, m$ we have $f_n - f_m = f_n 1_{K^c} - f_m 1_{K^c} + (f_n - f_m) 1_K$, so

\[ \|f_n - f_m\|_p \le \|f_n 1_{K^c}\|_p + \|f_m 1_{K^c}\|_p + \|(f_n - f_m) 1_K\|_p. \tag{$*$} \]

(The plan is to use condition (2) to make the first two terms small.) For each $n, m$ and $H_{mn} \in \mathcal{M}$, we can further estimate the third term by

\[ \|(f_n - f_m) 1_K\|_p \le \|(f_n - f_m) 1_{K \setminus H_{mn}}\|_p + \|f_n 1_{H_{mn}}\|_p + \|f_m 1_{H_{mn}}\|_p. \]



Take $H_{mn}$ to be $(|f_n - f_m| > \alpha)$ where $\alpha > 0$ is fixed. (We will specify what $\alpha$ should be in a moment.) Then,

\[ \|(f_n - f_m) 1_{K \setminus H_{mn}}\|_p \le \|\alpha 1_K\|_p = \alpha (\mu(K))^{1/p}. \]

Thus,

\[ \|f_n - f_m\|_p \le \|f_n 1_{K^c}\|_p + \|f_m 1_{K^c}\|_p + \alpha (\mu(K))^{1/p} + \|f_n 1_{H_{mn}}\|_p + \|f_m 1_{H_{mn}}\|_p. \tag{$**$} \]

Now, let $\varepsilon > 0$.

First, using (2), choose $K$ with $\mu(K) < \infty$, so that for all $n$, $\|f_n 1_{K^c}\|_p < \varepsilon$.

Then, choose $\alpha$ so that $\alpha (\mu(K))^{1/p} < \varepsilon$.

Then, using condition (3), choose $\delta$ so that $\mu(A) < \delta$ implies $\|f_n 1_A\|_p < \varepsilon$, for all $n$.

Finally, use the fact that $(f_n)$ is Cauchy in measure (by condition (1)) to choose $n_0$ such that for $m, n \ge n_0$, $\mu(H_{mn}) = \mu(|f_n - f_m| > \alpha) < \delta$. Then the last two terms of $(**)$ are each $< \varepsilon$.

Then, for $n, m \ge n_0$, $\|f_n - f_m\|_p \le 5\varepsilon$, so $(f_n)$ is Cauchy in $L^p$. Thus, $(f_n)$ converges in $L^p$ to some $f'$. But then, $f_n \to f'$ in measure also, so $f = f'$ $\mu$-almost everywhere and $f_n \to f$ in $L^p$.
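The role of condition (2) can be seen in a simple example. The sketch below reads condition (2) the way the proof uses it (a set $K$ of finite measure off which the integrals $\int_{K^c} |f_n|^p \, d\mu$ are uniformly small); the measure space and the sequence are illustrative assumptions, not part of the notes.

```latex
% Illustrative example (assumption): \mu = Lebesgue measure on \mathbb{R}, 0 < p < \infty,
% f_n = n^{-1/p} 1_{[0,n]}, f = 0.
% (1) holds: \sup_x |f_n(x)| = n^{-1/p} \to 0, so f_n \to 0 uniformly, hence in measure.
% (3) holds: \int_A |f_n|^p \, d\mu = \mu(A \cap [0,n])/n \le \mu(A) for all n.
% (2) fails: for any K with \mu(K) < \infty,
\[
\int_{K^c} |f_n|^p \, d\mu = \frac{\mu([0,n] \setminus K)}{n} \ge 1 - \frac{\mu(K)}{n} \longrightarrow 1 ,
\]
% and indeed f_n does not converge to 0 in L^p:
\[
\int |f_n|^p \, d\mu = \frac{1}{n}\,\mu([0,n]) = 1 \quad \text{for all } n .
\]
```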

The details of the remainder of this section are still to be put in these notes.

Connections between the various $L^p$ spaces.

Theorem.

(1) For a finite measure $\mu$ on $S$, $L^\infty \subset L^p \subset L^q$ if $q < p$, and

\[ \|f\|_p \le \|f\|_\infty \, \mu(S)^{1/p}, \qquad \|f\|_q \le \|f\|_p \, \mu(S)^{\frac{1}{q} - \frac{1}{p}}. \]

In particular,

(a) convergence in $L^\infty$ implies convergence in $L^p$.

(b) If $q < p$, then convergence in $L^p$ implies convergence in $L^q$.

Moreover, as $p \to \infty$, $\|f\|_p \to \|f\|_\infty$.

(2) For counting measure, the reverse.
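The second norm inequality in (1) can be obtained from Hölder’s inequality. The following is a sketch under the assumption $0 < q < p < \infty$ and $\mu(S) < \infty$; the auxiliary exponent $r = p/q$ is introduced here only for the derivation.

```latex
% Sketch, assuming 0 < q < p < \infty and \mu(S) < \infty.
% Apply Holder's inequality to |f|^q \cdot 1 with exponents r = p/q > 1 and r' = p/(p-q):
\[
\int |f|^q \, d\mu
\le \Bigl(\int |f|^{q \cdot p/q} \, d\mu\Bigr)^{q/p} \Bigl(\int 1 \, d\mu\Bigr)^{1 - q/p}
= \|f\|_p^{\,q}\, \mu(S)^{1 - q/p} .
\]
% Taking q-th roots:
\[
\|f\|_q \le \|f\|_p \, \mu(S)^{\frac{1}{q} - \frac{1}{p}} .
\]
```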

Summary diagrams to be done in class.

1. For functions $f_n, f \in L^0$, $f_n \to f$ $\mu$-almost everywhere iff for all $\delta > 0$, $\mu(|f_n - f| > \delta \text{ for infinitely many } n) = 0$.

2. Prove that $L^0$ is a topological vector space under the $\|\cdot\|_0$ quasinorm iff there does not exist a sequence $A_n \downarrow \emptyset$ with $\mu(A_n) = +\infty$. (Hint: it is the continuity of $(a, f) \mapsto af$ that is in question.)



3. If $\varphi$ is uniformly continuous on $\mathbb{R}$ to $\mathbb{R}$, and $f_n$ converges in measure to $f$, then $\varphi \circ f_n$ converges in measure to $\varphi \circ f$.

Conversely, if $\varphi$ is not uniformly continuous, there exists a measure space $(S, \mathcal{M}, \mu)$, $\mathcal{M}$ a $\sigma$-algebra, and a sequence $(f_n)$ converging uniformly (hence in measure) to a function $f$, but such that $\varphi \circ f_n$ does not converge in measure to $\varphi \circ f$. (Hint: start by negating the definition of uniform continuity.)

4. Let $(f_n)$ and $(h_n)$ be sequences in $L^0(\mu)$, $f \in L^0$. Suppose $(h_n)$ is a decreasing sequence converging to 0 in $\mu$-measure, and $|f_n - f| \le h_n$, for all $n$. Prove that $f_n \to f$ $\mu$-almost uniformly.

Proof. The sequence $(f_n)$ converges almost uniformly to $f$ if and only if for each $\delta > 0$, $\mu(\exists k \ge n, |f_k - f| > \delta) \to 0$. But, by hypothesis,

\[ |f_k - f| \le h_k \le h_n, \quad \text{for } k \ge n. \]

Hence,

\[ \bigcup_{k \ge n} (|f_k - f| > \delta) \subset (h_n > \delta). \]

Since $(h_n)$ converges to 0 in measure,

\[ \mu\Bigl(\bigcup_{k \ge n} (|f_k - f| > \delta)\Bigr) \le \mu(h_n > \delta) \to 0, \]

as required.

5. For $0 < p < \infty$ deduce from the Dominated Convergence Theorem (in $L^1$) the $L^p$ version: If $f_n \to f$ a.e. and there exists $g \in L^p$ such that $|f_n| \le g$ pointwise (or a.e.) then $f_n \to f$ in $L^p$.

6. Concerning the $L^0$ quasinorm, $\|f\|_0 = \inf\{\delta \ge 0 : \mu(|f| > \delta) \le \delta\}$, prove

a) $\|f\|_0 \le \|f\|_\infty \wedge \mu(f \ne 0)$. Need there be equality?

b) If $|\alpha| \le 1$, $\|\alpha f\|_0 \le \|f\|_0$.

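Soln. (a) First, $\mu(|f| > \|f\|_\infty) = 0 \le \|f\|_\infty$. Since $\|f\|_0$ is the least $\delta$ with $\mu(|f| > \delta) \le \delta$, this shows

\[ \|f\|_0 \le \|f\|_\infty. \]

Also, if $\delta = \mu(f \ne 0)$, then $\mu(|f| > \delta) \le \mu(f \ne 0) = \delta$, so

\[ \|f\|_0 \le \mu(f \ne 0). \]

Thus, $\|f\|_0 \le \|f\|_\infty \wedge \mu(f \ne 0)$.

To see that there need not be equality, take $\mu$ to be Lebesgue measure on the Borel sets of $\mathbb{R}$, and put

\[ f(x) = \begin{cases} 1, & 0 \le x \le 2, \\ 3, & 2 < x \le 3, \\ 0, & \text{otherwise.} \end{cases} \]

Then $\mu(|f| > 1) = 1 \le 1$, and if $\delta < 1$, $\mu(|f| > \delta) = 3 > \delta$, so $\|f\|_0 = 1 < \|f\|_\infty \wedge \mu(f \ne 0) = 3$.

(b) If $|\alpha| \le 1$, then $|\alpha f| \le |f|$, so

\[ \mu(|\alpha f| > \|f\|_0) \le \mu(|f| > \|f\|_0) \le \|f\|_0, \]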



so ‖αf‖0 ≤ ‖f‖0.

7. Suppose µ is a finite measure. Define ‖f‖• = ‖ |f | ∧ 1‖1, for f ∈ L0 and d(f, g) = ‖f − g‖•.

a) Show that this d is a semimetric on L0 .

b) Show that a sequence (fn) in L0 converges to f in measure iff d(fn, f) −→ 0.

8. Suppose $\mu$ is a finite measure. Define $\|f\|_\bullet = \left\| \dfrac{|f|}{1 + |f|} \right\|_1$, for $f \in L^0$ and $d(f, g) = \|f - g\|_\bullet$.

a) Show that this d is a semimetric on L0 .

b) Show that a sequence (fn) in L0 converges to f in measure iff d(fn, f) −→ 0.

9. If $(f_n)$ is a sequence of indicator functions $1_{A_n}$ converging in $L^1$ to $f$, prove that $f$ is equal $\mu$-almost everywhere to an indicator function.

10. Alternate way of establishing $AB \le \dfrac{A^p}{p} + \dfrac{B^{p'}}{p'}$: Fix $0 < s < 1$, let $\varphi(x) = x^s$, for $x > 0$. Then $\varphi'(x) = s x^{s-1}$, which is $s$ at $x = 1$, and $\varphi$ has a negative second derivative, so the graph of $\varphi$ is below its tangent line:

\[ x^s \le sx + (1 - s). \]

Now substitute $x = a/b$, $a, b > 0$ and multiply by $b$.

\[ a^s b^{1-s} \le sa + (1 - s)b. \]

Now take $s = \frac{1}{p}$, so $(1 - s) = \frac{1}{p'}$ and put $a = A^p$ and $b = B^{p'}$.
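For the reader who wants the final substitution written out, it can be carried through as follows (assuming $A, B > 0$ and $1 < p < \infty$ with conjugate exponent $p'$).

```latex
% Completing the substitution s = 1/p, 1 - s = 1/p', a = A^p, b = B^{p'} (A, B > 0):
\[
a^{s} b^{1-s} = (A^p)^{1/p} (B^{p'})^{1/p'} = AB,
\qquad
sa + (1-s)b = \frac{A^p}{p} + \frac{B^{p'}}{p'} ,
\]
% so the inequality a^s b^{1-s} \le sa + (1-s)b becomes
\[
AB \le \frac{A^p}{p} + \frac{B^{p'}}{p'} .
\]
```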

11. Proof of Minkowski’s inequality from Hölder’s. Let $p'$ be the conjugate exponent, then $|f + g|^p = |f + g||f + g|^{p-1} \le |f||f + g|^{p-1} + |g||f + g|^{p-1}$. Integrating and using Hölder’s inequality on the first term on the right gives (since $(p - 1)p' = p$)

\[ \int |f||f + g|^{p-1} \, d\mu \le \|f\|_p \Bigl(\int |f + g|^{(p-1)p'} \, d\mu\Bigr)^{1/p'} = \|f\|_p \|f + g\|_p^{p/p'} = \|f\|_p \|f + g\|_p^{p-1}. \]

The second term is handled in the same way, so

\[ \|f + g\|_p^p \le (\|f\|_p + \|g\|_p) \|f + g\|_p^{p-1}. \]

The result follows by dividing by $\|f + g\|_p^{p-1}$, since the case when this is zero is trivial.

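12. The material on $L^0$ remains valid for the case of $\mathcal{S}$ a $\sigma$-ring, $\mu$ a (non-negative c.a.) measure on $\mathcal{S}$, and $L^0 = L^0(\mathcal{S})$, the set of all $\mathcal{S}$-$\mathcal{B}_0$ measurable functions. $\mathcal{B}_0$ is the collection of Borel sets of $\mathbb{R} \setminus \{0\}$, that is $\{B \setminus \{0\} : B \in \mathcal{B}(\mathbb{R})\}$. Each $f \in L^0$ is a pointwise limit of a sequence of $\mathcal{S}$-simple functions.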
