
An Introduction to Measure and Probability || Probability Spaces


CHAPTER I

PROBABILITY SPACES

1. INTRODUCTION TO $\mathbb{R}$

An introduction to analysis usually begins with a study of properties of $\mathbb{R}$, the set of real numbers. It will be assumed that you know something about them. More specifically, it is assumed that you realize that

(i) they form a field (which means that you can add, multiply, subtract and divide in the usual way), and

(ii) they are (totally) ordered (i.e., for any two numbers $a, b \in \mathbb{R}$, either $a < b$, $a = b$, or $a > b$) and $a \ge b$ if and only if $a - b \ge 0$. Note that $a < b$ and $0 < c$ imply $ac < bc$; $a \le b$ implies $a + c \le b + c$ for any $c \in \mathbb{R}$; $0 < 1$ and $-1 < 0$.

Inside $\mathbb{R}$ lies the set $\mathbb{Z}$ of integers $\{\dots, -2, -1, 0, 1, 2, 3, \dots\}$ and the set $\mathbb{N}$ of natural numbers $\{1, 2, \dots\}$. Also, there is a smallest field inside $\mathbb{R}$ that contains $\mathbb{Z}$, namely $\mathbb{Q}$, the set of rational numbers $\{p/q : p, q \in \mathbb{Z} \text{ and } q \ne 0\}$. A real number that is not rational is said to be irrational. It is often useful to view $\mathbb{R}$ as the points on a straight line (see Fig. 1.1).

Fig. 1.1. The real line.

Then $a < b$ is indicated by placing $a$ strictly to the left of $b$, if the line is oriented by putting 0 to the left of 1. The field $\mathbb{Q}$ is also totally ordered, but it differs from $\mathbb{R}$ in that there are "holes" in $\mathbb{Q}$. To explain this, it is useful to make a definition.

Definition 1.1.1. Let $A \subset \mathbb{R}$. A number $b$ with the property that $b \ge a$ for all $a \in A$ is said to be an upper bound of $A$. If $b \le a$ for all $a \in A$, $b$ is said to be a lower bound of $A$.

Example. Let $A = \{a \in \mathbb{R} \mid a < 1\}$. Then any number $b \ge 1$ is an upper bound of $A$. Here there is a smallest or least upper bound (l.u.b.), namely 1.

Exercise 1.1.2. Let $A = \{a \in \mathbb{Q} \mid a^2 < 2\}$. Show that if there is a least upper bound of $A$, it cannot belong to $\mathbb{Q}$. [Hints: show that, for small $p > 0$, $a^2 < 2$ implies that $(a + p)^2 < 2$ and $a^2 > 2$ implies that $(a - p)^2 > 2$; conclude that if $b$ is a l.u.b. of $A$, then $b^2 = 2$; show that there is no rational number $b$ with $b^2 = 2$.]

One can view $\sqrt{2} \in \mathbb{R}$ as a number that "fills a hole in $\mathbb{Q}$". "Holes" exist because it is often possible to split the set $\mathbb{Q}$ into two disjoint subsets $A$, $B$ so that (i) $a \in A$ and $b \in B$ implies that $a \le b$; (ii) there is no l.u.b. of $A$ in $\mathbb{Q}$ and no largest or greatest lower bound (g.l.b.) of $B$ in $\mathbb{Q}$. A decomposition of $\mathbb{Q}$ satisfying (i) is called a Dedekind cut of $\mathbb{Q}$.

J. C. Taylor, An Introduction to Measure and Probability © Springer-Verlag New York, Inc. 1997

1.1.3. Axiom of the least upper bound. Every subset $A$ of $\mathbb{R}$ that has an upper bound has a least upper bound.

This axiom is an extremely important property of $\mathbb{R}$. It will be taken for granted that the real numbers exist and have this property. (Starting with suitable axioms for set theory, one can show that (i) there exist fields with the properties of $\mathbb{R}$ and (ii) any two such fields are isomorphic (i.e., are the "same"); see Halmos [H2].)

Exercise 1.1.4. (1) Show that $\mathbb{N}$ has no upper bound. [Hint: show that if $b$ is an upper bound of $\mathbb{N}$, then $b - 1$ is also an upper bound of $\mathbb{N}$.]
(2) Let $\varepsilon > 0$ be a positive real number. Show that there is a natural number $n \in \mathbb{N}$ with $0 < 1/n < \varepsilon$. [Hint: use (1) and the fact that multiplication by a positive number preserves an inequality.]

This last exercise shows that the ordered field $\mathbb{R}$ has the Archimedean property, i.e., for any number $\varepsilon > 0$ there is a natural number $n$ with $0 < 1/n < \varepsilon$.

If $E$ and $F$ are two sets, it will be taken for granted that the concept of a function $f : E \to F$ is understood. When $E = \mathbb{N}$, a function $f : \mathbb{N} \to F$ is also called a sequence of elements from $F$. The function $f$ in this case is often denoted by $(f(n))_{n \ge 1}$ or $(f_n)_{n \ge 1}$, where $f(n) = f_n$ is the value of $f$ at $n$. When the domain of $n$ is understood, $(f_n)_{n \ge 1}$ is often shortened to $(f_n)$. Given a sequence $(f_n)_{n \ge 1}$, a subsequence of $(f_n)_{n \ge 1}$ is a sequence of the form $(f_{n_k})_{k \ge 1}$, where $n : \mathbb{N} \to \mathbb{N}$ is a strictly increasing function $k \mapsto n_k$, e.g., $n_k = 2k$ for all $k \ge 1$. If the original sequence is written as $(f(n))_{n \ge 1}$, a subsequence may be indicated as $(f(n_k))_{k \ge 1}$ or $(f(n(k)))_{k \ge 1}$. The important thing is that in a subsequence one selects elements from the original sequence by using a strictly increasing function $n$.

Definition 1.1.5. A sequence $(b_n)_{n \ge 1}$ of real numbers converges to $B$ as $n \to +\infty$ if for any positive number $\varepsilon > 0$,

$B - \varepsilon < b_n < B + \varepsilon$, for all sufficiently large $n$.

That is, $|b_n - B| < \varepsilon$ if $n \ge n(\varepsilon)$, where $n(\varepsilon)$ is an integer depending on $\varepsilon$ and the sequence (see Exercise 1.1.16 for the definition of $|a|$). This is denoted by writing $B = \lim_{n \to +\infty} b_n$.

A sequence $(b_n)_{n \ge 1}$ of real numbers converges to $+\infty$ if, for any $N \in \mathbb{N}$, $b_n \ge N$ whenever $n \ge n(N)$.


Exercise 1.1.6. Let $(b_n)_{n \ge 1}$ be a non-decreasing sequence of real numbers, i.e., $b_n \le b_{n+1}$ for all $n \ge 1$. Show that

(1) the sequence converges,
(2) the limit is finite if and only if the sequence has an upper bound,
(3) when the limit is finite, it equals the l.u.b. of $\{b_n \mid n \ge 1\}$.

Let $(b'_n)_{n \ge 1}$ be another non-decreasing sequence with $b_n \le b'_n$ for all $n \ge 1$. Show that

(4) $\lim_n b_n \le \lim_n b'_n$.

Definition 1.1.7. Let $(a_n)_{n \ge 0}$ be a sequence of real numbers. Let $s_n = a_0 + a_1 + a_2 + \dots + a_n$. The sequence $(s_n)_{n \ge 0}$ is called the infinite series $\sum_{n=0}^{\infty} a_n$, and the series is said to converge if $(s_n)_{n \ge 0}$ converges.

Exercise 1.1.8. Let $(a_n)_{n \ge 0}$ be a sequence of positive real numbers. Show that

(1) the series $\sum_{n=0}^{\infty} a_n$ converges if and only if $\{s_n \mid n \ge 0\}$ has an upper bound,
(2) if $\sum_{n=0}^{\infty} a_n$ converges, then it converges to l.u.b. $\{s_n \mid n \ge 0\}$.

Finally, show that

(3) if $\sum_{n=0}^{\infty} a_n$ converges to a finite limit, then $\lim_{N \to \infty} \sum_{n=N}^{\infty} a_n = 0$.

Exercise 1.1.9. Suppose one has a random variable $X$ whose values are non-negative integers. Let $(a_n)_{n \ge 0}$ be a sequence of positive real numbers. When can the probability that $X$ is $n$ equal $c a_n$, where $c$ is a fixed constant? What happens if an = *? (if an = ~ or an = nl~gnor (ni~2)?) This exercise begs the question: what is a random variable? For the time being, think of it as a procedure that assigns probabilities to certain outcomes. The definition is given in Chapter II, see Definition 2.1.6.

Exercise 1.1.10. A random variable $X$ has a Poisson distribution with mean 1 if the probability that $X$ is $n$ is $e^{-1}/n!$ (see Feller [F1] for the Poisson distribution).

Proposition 1.1.11. (Exchange of order of limits). Let $(b_{m,n})_{m,n \ge 1}$ be a double sequence of real numbers, i.e., a function $b : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$. Assume that

(1) $m_1 \le m_2 \Rightarrow b_{m_1,n} \le b_{m_2,n}$ for all $n \ge 1$,
(2) $n_1 \le n_2 \Rightarrow b_{m,n_1} \le b_{m,n_2}$ for all $m \ge 1$.

Then

$$\lim_{n \to +\infty} \Bigl( \lim_{m \to +\infty} b_{m,n} \Bigr) = \lim_{m \to +\infty} \Bigl( \lim_{n \to +\infty} b_{m,n} \Bigr) = \lim_{n \to +\infty} b_{n,n},$$

where an increasing sequence has limit $+\infty$ if it is unbounded.

Proof. By symmetry it suffices to verify the second equality. Now $b_{m,m} \le b_{n,n}$ if $m \le n$ and so $B = \lim_{n \to \infty} b_{n,n}$ exists, as does $B_m \stackrel{\text{def}}{=} \lim_{n \to \infty} b_{m,n}$. Also, by Exercise 1.1.6, $B_m \le B$ as $b_{m,n} \le b_{n,n}$ when $n \ge m$. It follows from (1) and Exercise 1.1.6 that $B_{m_1} \le B_{m_2}$ when $m_1 \le m_2$. Hence, $\lim_{m \to \infty} B_m = B'$ exists and is less than or equal to $B$ (see Exercise 1.1.6 again).

If $B$ is finite, let $\varepsilon > 0$ and $n = n(\varepsilon)$ be such that $B - \varepsilon \le b_{n,n} \le B$ if $n \ge n(\varepsilon)$. Let $m = n(\varepsilon)$. Then $B_m = \lim_{n \to \infty} b_{m,n} \ge b_{n(\varepsilon),n(\varepsilon)} \ge B - \varepsilon$. Hence, $B' = B$ as $B' \ge B_m$ for all $m$.

If $B = +\infty$ and $N \le b_{n,n}$ for $n \ge n(N)$, then $B_m = \lim_{n \to \infty} b_{m,n} \ge N$ if $m = n(N)$ and so $B' = +\infty$. $\square$

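Corollary 1.1.12. $\sum_{i=1}^{\infty} \bigl( \sum_{j=1}^{\infty} a_{ij} \bigr) = \sum_{j=1}^{\infty} \bigl( \sum_{i=1}^{\infty} a_{ij} \bigr)$ if $a_{ij} \ge 0$, $1 \le i, j$.

Proof. Let $b_{m,n} = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$ and verify that the conditions of Proposition 1.1.11 are satisfied. $\square$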

Exercise 1.1.13. Decide whether Proposition 1.1.11 is valid when only one of (1) and (2) is assumed. In Corollary 1.1.12, what happens if $a_{ii} = 1$, $a_{i,i+1} = -1$ for all $i \ge 1$ and all other $a_{ij} = 0$?
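
One way to explore the second question of Exercise 1.1.13 numerically is sketched below. The helper names `a`, `row_sum`, and `col_sum` are illustrative choices, not notation from the text. The sketch relies on the observation that each row and each column of this particular array has only finitely many non-zero entries, so every inner sum can be computed exactly by a finite sum.

```python
def a(i: int, j: int) -> int:
    """Entries from Exercise 1.1.13: a_ii = 1, a_{i,i+1} = -1, all others 0."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i: int) -> int:
    # Row i is non-zero only at j = i and j = i + 1, so this finite sum
    # equals the full inner sum over j.
    return sum(a(i, j) for j in range(1, i + 2))


def col_sum(j: int) -> int:
    # Column j is non-zero only at i = j - 1 and i = j, so summing over
    # i = 1..j equals the full inner sum over i.
    return sum(a(i, j) for i in range(1, j + 1))


N = 50  # number of outer terms to add up (an arbitrary cut-off)
rows_first = sum(row_sum(i) for i in range(1, N + 1))
cols_first = sum(col_sum(j) for j in range(1, N + 1))
print(rows_first, cols_first)
```

The two outer partial sums do not depend on the cut-off `N`, which suggests comparing them with the hypothesis $a_{ij} \ge 0$ of Corollary 1.1.12.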

Exercise 1.1.14. (See Feller [F1], p. 267.) Let $p_n \ge 0$ for all $n \ge 0$ and assume $\sum_{n=0}^{\infty} p_n = 1$. Let $m_r = \sum_{n=0}^{\infty} n^r p_n$. Show that $\sum_{r=0}^{\infty} \frac{m_r}{r!} t^r = \sum_{n=0}^{\infty} p_n e^{nt}$ for $t \ge 0$.

This brief discussion of properties of $\mathbb{R}$ concludes with a discussion of intervals.

Definition 1.1.15. A set $I \subset \mathbb{R}$ is said to be an interval if $x \le y \le z$ and $x, z \in I$ implies $y \in I$.

If an interval $I$ has an upper bound, then $I \subset (-\infty, b]$, where $b = $ l.u.b. $I$ and $(-\infty, b] = \{x \in \mathbb{R} \mid x \le b\}$. If it also has a lower bound, then $I \subset [a, b] = \{x \mid a \le x \le b\}$ if $a = $ g.l.b. $I$. A bounded interval $I$, that is, one having both upper and lower bounds, is said to be

(1) a closed interval when $I = [a, b]$,
(2) an open interval when $I = (a, b) \stackrel{\text{def}}{=} \{x \mid a < x < b\}$ (often denoted by $]a, b[$),
(3) a half-open interval when $I = (a, b]$ or $[a, b)$, where $(a, b] \stackrel{\text{def}}{=} \{x \mid a < x \le b\}$ and similarly $[a, b) \stackrel{\text{def}}{=} \{x \mid a \le x < b\}$.

One also denotes $(a, b]$ by $]a, b]$ and $[a, b)$ by $[a, b[$.

An unbounded interval $I$ is one of the following:

$(-\infty, b) = \{x \mid x < b\}$;

$(-\infty, b] = \{x \mid x \le b\}$;

$(a, +\infty) = \{x \mid a < x\}$;

$[a, +\infty) = \{x \mid a \le x\}$; or

$(-\infty, +\infty) = \mathbb{R}$.


Exercise 1.1.16. If $x \in \mathbb{R}$, define $|x| = x$ if $x \ge 0$ and $= (-1)x$ if $x < 0$. Let $a$, $b$ be any two real numbers. Show that

(1) $|a + b| \le |a| + |b|$ (the triangle inequality),
(2) conclude that $\bigl| |a| - |b| \bigr| \le |a - b|$ by two applications of the triangle inequality.

If $a$, $b$ are two numbers, let $a \vee b$ denote their maximum, also denoted by $\max\{a, b\}$, and $a \wedge b$ denote their minimum, also denoted by $\min\{a, b\}$. Define $x^+$ to be $\max\{x, 0\} = x \vee 0$ and $x^-$ to be $\max\{-x, 0\} = (-x) \vee 0$. Show that

(3) $x^- = -(x \wedge 0)$,
(4) $x = x^+ - x^-$,
(5) $|x| = x^+ + x^-$,
(6) $a \vee b = \frac{1}{2}\{a + b + |a - b|\}$, and
(7) $a \wedge b = \frac{1}{2}\{a + b - |a - b|\}$.
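
Before proving the statements of Exercise 1.1.16, it can be reassuring to test them on a grid of rational numbers. The following is only a sanity check under that assumption (a finite sample proves nothing); the function names `pos`, `neg`, and `identities_hold` are made up for this sketch.

```python
from fractions import Fraction
from itertools import product


def pos(x):
    """x^+ = max{x, 0}."""
    return max(x, 0)


def neg(x):
    """x^- = max{-x, 0}."""
    return max(-x, 0)


def identities_hold(a, b):
    """Check statements (1)-(7) of the exercise for one pair (a, b), with x = a."""
    x = a
    return (
        abs(a + b) <= abs(a) + abs(b)                   # (1)
        and abs(abs(a) - abs(b)) <= abs(a - b)          # (2)
        and neg(x) == -min(x, 0)                        # (3)
        and x == pos(x) - neg(x)                        # (4)
        and abs(x) == pos(x) + neg(x)                   # (5)
        and max(a, b) == (a + b + abs(a - b)) / 2       # (6)
        and min(a, b) == (a + b - abs(a - b)) / 2       # (7)
    )


# Exact rational sample points -3, -11/4, ..., 3 (no floating-point error).
samples = [Fraction(n, 4) for n in range(-12, 13)]
all_ok = all(identities_hold(a, b) for a, b in product(samples, repeat=2))
print(all_ok)
```

Using `Fraction` rather than floats keeps every comparison exact, so a failure would point to a wrong identity rather than to rounding.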

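Exercise 1.1.17. Verify the following statements for $-\infty < a < b < +\infty$:

(1) $(a, b) = \bigcup_{n=N}^{\infty} (a, b - \frac{1}{n}]$, where $b - a > \frac{1}{N}$;
(2) $[a, b) = \bigcap_{n=N}^{\infty} (a - \frac{1}{n}, b)$;
(3) $[a, b] = \bigcap_{n=1}^{\infty} [a, b + \frac{1}{n})$.

Show that if $x_0 \in (a, b)$, then $(x_0 - \delta, x_0 + \delta) \subset (a, b)$ for some $\delta > 0$ (this implies that $(a, b)$ is an open set; see Exercise 1.3.10).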

Exercise 1.1.18. Let $0 < a < 1$. If $p > 0$, show that $a^{1/p} = e^{\frac{1}{p} \log a}$ is an increasing function of $p$ and that $\lim_{p \to +\infty} a^{1/p} = 1$.

Exercise 1.1.19. Let $\mathbb{R}$ be the union of two disjoint intervals $I_1$ and $I_2$. Show that

(1) one of the two intervals is to the left of the other (either $x_1 \in I_1$ and $x_2 \in I_2$ always implies $x_1 < x_2$ or vice-versa), and
(2) if neither of these two intervals is the void set, then $\sup I_1 = \inf I_2$ if $I_1$ is to the left of $I_2$.

Let $(I_n)$ be an increasing sequence of intervals. Show that

(3) if $I = \bigcup_n I_n$, then $I$ is an interval.

Assume that each $I_n$ above is unbounded and bounded below, i.e., there is $a_n \in \mathbb{R}$ with $(a_n, +\infty) \subset I_n \subset [a_n, +\infty)$. Assume that $I$ has the same property. Show that

(4) if $(a, +\infty) \subset I \subset [a, +\infty)$, then $\lim a_n = a$.

For further information on the real line and general background information in analysis, consult Marsden [M1] or Rudin [R4].

2. WHAT IS A PROBABILITY SPACE? MOTIVATION

A probability space can be viewed as something that models an "experiment" whose outcomes are "random" (whatever that means). There are often "simple" or "elementary outcomes" in the model (as points in an underlying set $\Omega$) and weights assigned to these outcomes that indicate the likelihood or probability of the outcome occurring (see Feller [F1]). The general outcome or "event" is often a collection of "elementary outcomes". For example, consider the following.

Example 1.2.1. The "experiment" consists of rolling a fair six-sided die two times. The "elementary outcomes" could be taken to be ordered pairs $\omega = (m, n)$, where $m$ and $n$ are integers from 1 to 6. The set $\Omega$ of elementary outcomes may be taken to be the set of all such ordered pairs (it is usual to denote this set as the Cartesian product $\{1, 2, \dots, 6\} \times \{1, 2, \dots, 6\}$). The set of all events may be taken as the collection of all subsets of $\Omega$, denoted by $\mathfrak{P}(\Omega)$. If each elementary outcome $\omega$ is assigned weight $\frac{1}{36}$, then one may define the probability $P(A)$ of an event $A$ by $P(A) = \sum_{\omega \in A} P(\{\omega\}) = \frac{|A|}{36} = \frac{|A|}{|\Omega|}$, where $|A|$ denotes the number of elements in $A$. If the die is not fair - the probability of getting either a 1, 2, 3, or a 4 is k and for example that of getting either a 5 or a 6 is ~ - then the basic probabilities or weights of the elementary outcomes will need to be altered to correspond to the new situation.
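
The fair-die model of Example 1.2.1 is small enough to enumerate. The sketch below builds $\Omega$ as a set of ordered pairs and computes $P(A) = |A|/|\Omega|$ exactly; the event names `sum_is_seven` and `doubles` are illustrative examples that do not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Omega = {1,...,6} x {1,...,6}: the 36 ordered pairs (m, n).
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    """P(A) = |A| / |Omega| for the fair-die model (each outcome has weight 1/36)."""
    return Fraction(len(event & omega), len(omega))


# Two sample events (hypothetical choices, for illustration only).
sum_is_seven = {w for w in omega if w[0] + w[1] == 7}
doubles = {w for w in omega if w[0] == w[1]}

print(prob(sum_is_seven), prob(doubles), prob(sum_is_seven | doubles))
```

The two events are disjoint (a double has an even sum), so the last probability is the sum of the first two, in line with finite additivity.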

For elementary situations as in Example 1.2.1, it suffices to consider a so-called finitely additive probability space. This is a triple $(\Omega, \mathfrak{A}, P)$, where $\Omega$ is a set (corresponding intuitively to the set of "elementary outcomes"), $\mathfrak{A}$ is a collection of subsets of $\Omega$ (the "events") with certain "algebraic" properties that make it into a Boolean algebra of subsets of $\Omega$, and for each event $A \in \mathfrak{A}$ there is a number $P(A)$ assigned that lies between 0 and 1 (the probability of the occurrence of $A$). More explicitly, to say that $\mathfrak{A}$ is a Boolean algebra means that the collection $\mathfrak{A}$ of subsets satisfies the following conditions:

($\mathfrak{A}_1$) $\Omega \in \mathfrak{A}$;
($\mathfrak{A}_2$) $A_1, A_2 \in \mathfrak{A}$ implies that $A_1 \cup A_2 \in \mathfrak{A}$; and
($\mathfrak{A}_3$) $A \in \mathfrak{A}$ implies that $A^c \in \mathfrak{A}$, where $A^c \stackrel{\text{def}}{=} \{\omega \in \Omega \mid \omega \notin A\} \stackrel{\text{def}}{=} \complement A$.

The statement that $P$ is a probability means that it is a function defined on $\mathfrak{A}$ with the following properties:

($P_1$) $P(\Omega) = 1$;
($P_2$) $0 \le P(A) \le 1$; and
($FAP_3$) $A_1 \cap A_2 = \emptyset \Rightarrow P(A_1 \cup A_2) = P(A_1) + P(A_2)$.

It is not hard to see that Example 1.2.1 is a finitely additive probability space.

Some simple consequences of the properties of a Boolean algebra $\mathfrak{A}$ and a finitely additive probability defined on it are given in the next exercise.

Exercise 1.2.2. Show that

(1) $A_1, A_2 \in \mathfrak{A}$ implies that $A_1 \cap A_2 \in \mathfrak{A}$ and $A_1 \cap A_2^c \in \mathfrak{A}$ (one often denotes $A_1 \cap A_2^c$ by $A_1 \setminus A_2$ or $A_1 \cap \complement A_2$),
(2) $P(\emptyset) = 0$,
(3) $A_1 \subset A_2$ implies that $P(A_1) \le P(A_2)$,
(4) $P(A_1 \cup A_2) \le P(A_1) + P(A_2)$,
(5) $P(A_1 \cup A_2) + P(A_1 \cap A_2) = P(A_1) + P(A_2)$,
(6) $P\bigl(\bigcup_{k=1}^{n} A_k\bigr) = \sum_{k=1}^{n} P\bigl(A_k \setminus \bigcup_{i=1}^{k-1} A_i\bigr) \le \sum_{k=1}^{n} P(A_k)$.

Remark 1.2.3. In Exercise 1.2.2 (6), a union of sets $\bigcup_{k=1}^{n} A_k$ is converted to a disjoint union $\bigcup_{k=1}^{n} A'_k$, where $A'_k = A_k \setminus \bigcup_{i=1}^{k-1} A_i$. This is a standard device or trick that is often used, especially for countable unions $A = \bigcup_{k=1}^{\infty} A_k$.
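
The device of Remark 1.2.3 translates directly into a short loop. The sketch below handles a finite list of finite sets only; the function name `disjointify` and the sample sets are assumptions made for illustration.

```python
def disjointify(sets):
    """Return the pieces A'_k = A_k minus (A_1 ∪ ... ∪ A_{k-1}) for a finite list of sets."""
    seen = set()      # running union A_1 ∪ ... ∪ A_{k-1}
    pieces = []
    for a_k in sets:
        a_k = set(a_k)
        pieces.append(a_k - seen)   # keep only the points not covered earlier
        seen |= a_k
    return pieces


# A small illustrative family of overlapping sets.
family = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
pieces = disjointify(family)
print(pieces)
```

The pieces are pairwise disjoint and have the same union as the original family, so a finitely additive $P$ can be evaluated on the union by adding up the values on the pieces.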

Here is another example of an "experiment" with "random" outcomes.

Example 1.2.4. What probability space is it natural to use to discuss the probability of choosing a number at random from $[0, 1]$? What is the probability of choosing a number from (~, ~]? Clearly, one should take $\Omega$ to be $[0, 1]$ and $\mathfrak{A}$ to be the collection of finite unions of intervals contained in $[0, 1]$ (so that $\mathfrak{A}$ is a Boolean algebra containing intervals and their unions). Show that $\mathfrak{A}$ is a Boolean algebra. How do you define $P$ on $\mathfrak{A}$?

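Example 1.2.5. Continuing with the same probability space as in Example 1.2.4, suppose that one wants to discuss the probability of selecting at random a number $x$ with the following property: it does not lie in $(\frac{1}{3}, \frac{2}{3})$, i.e., the middle third of $[0, 1]$, nor does it lie in the middle third of either $[0, \frac{1}{3}]$ or $[\frac{2}{3}, 1]$, and so on, infinitely often. This describes a subset $C$ of $[0, 1]$, an event.

Look at the complementary event $C^c$: it is the disjoint union of middle third intervals;

$$C^c = (\tfrac{1}{3}, \tfrac{2}{3}) \cup [(\tfrac{1}{9}, \tfrac{2}{9}) \cup (\tfrac{7}{9}, \tfrac{8}{9})] \cup [(\tfrac{1}{27}, \tfrac{2}{27}) \cup (\tfrac{7}{27}, \tfrac{8}{27}) \cup (\tfrac{19}{27}, \tfrac{20}{27}) \cup (\tfrac{25}{27}, \tfrac{26}{27})] \cup \cdots.$$

Let $q$ be the probability of $C^c$. Then one sees that

(1) $q \ge \frac{1}{3}$,
(2) $q \ge \frac{1}{3} + \frac{1}{9} + \frac{1}{9} = \frac{1}{3} + \frac{2}{9}$,
(3) $q \ge \frac{1}{3} + \frac{2}{9} + \frac{4}{27}$,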

To verify this, one makes use of the principle of mathematical induction,stated below.

The principle of mathematical induction. Let $P(n)$ be a proposition or statement for each $n \in \mathbb{N}$. If

(1) $P(1)$ is true and
(2) $P(n + 1)$ is true provided $P(n)$ is true,

then $P(n)$ is true for all $n$. (This principle amounts to saying that if $A \subset \mathbb{N}$ is such that (1) $1 \in A$ and (2) $n \in A$ implies $n + 1 \in A$, then $A = \mathbb{N}$.)

Now $\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \dots + \frac{2^{n-1}}{3^n}$ is the $n$th partial sum of the series $\sum_{k=0}^{\infty} \frac{2^k}{3^{k+1}}$. This is a geometric series with $a_0 = \frac{1}{3}$ and $r = \frac{2}{3}$, and so the sum is $\frac{1}{3}\bigl(1 - \frac{2}{3}\bigr)^{-1} = 1$. Consequently, one expects the probability of choosing such an $x$ from $C$ to be zero!
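
The partial sums above can be checked exactly with rational arithmetic. In the sketch below, `removed_length` is an illustrative name for the $n$th partial sum, and the closed form $1 - (2/3)^n$ is the standard formula for a finite geometric sum with $a_0 = \frac{1}{3}$ and $r = \frac{2}{3}$.

```python
from fractions import Fraction


def removed_length(n: int) -> Fraction:
    """Total length removed after n stages: sum_{k=0}^{n-1} 2^k / 3^(k+1)."""
    return sum((Fraction(2**k, 3**(k + 1)) for k in range(n)), Fraction(0))


for n in (1, 2, 3, 10, 40):
    # The partial sum agrees with the closed form 1 - (2/3)^n.
    print(n, removed_length(n), float(1 - removed_length(n)))
```

The length not yet removed after $n$ stages is $(2/3)^n$, which tends to 0; this is the sense in which one expects $C$ to have probability zero.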

Remark. The subset $C$ of $[0, 1]$ described in Example 1.2.5 is called the Cantor set or Cantor discontinuum. It contains no interval with distinct end points. Why? What would happen if instead of extracting middle thirds one removed the middle quarter at each stage?

Note that neither the Cantor set $C$ nor its complement is in the Boolean algebra $\mathfrak{A}$ defined in Example 1.2.4, as the set $C^c$ is the union of an infinite number of open intervals each of which is in the Boolean algebra $\mathfrak{A}$: it can be shown that $C^c$ cannot be expressed as a finite union of sets from $\mathfrak{A}$; this has to do with the fact that each of the intervals involved is a connected component of $C^c$, i.e., for any point $x$ in one of these intervals $I$, the largest open interval that contains it coincides with $I$ (see Exercise 1.3.11).

Example 1.2.6. (Coin tossing) Suppose one tosses a fair coin until a head occurs. What is the probability of this event? To begin with, what could one take as $\Omega$, the set of "elementary outcomes"? Take $\Omega$ to be $\mathbb{N}$, where each $n \ge 1$ corresponds to a finite string of length $n$ that commences with $n - 1$ tails and concludes with a head. Here probabilities may be assigned to each integer $n \ge 1$, namely $\frac{1}{2^n}$ for the string of length $n$. Since $\sum_{n=1}^{\infty} \frac{1}{2^n} = \frac{1}{2}\bigl(\frac{1}{1 - \frac{1}{2}}\bigr) = 1$, this gives a probability space with $\mathfrak{A}$ taken to be all the subsets of $\Omega$ and $P(A)$ defined to be $\sum_{n \in A} \frac{1}{2^n}$. Therefore, the probability of first obtaining a head on an even-numbered toss is $\sum_{n=1}^{\infty} \frac{1}{4^n} = \frac{1}{4}\bigl(\frac{1}{1 - \frac{1}{4}}\bigr) = \frac{1}{3}$. In this example, you can verify that if $A = \bigcup_{n=1}^{\infty} A_n$ and $A_n \cap A_m = \emptyset$ when $n \ne m$, then $P(A) = \sum_{n=1}^{\infty} P(A_n)$. This is clearly a desirable property of a probability, but how can it be obtained in the context of the previous exercise, where $\Omega = [0, 1]$?
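
The two series in Example 1.2.6 can be examined through their exact partial sums. The sketch below assumes the weights $\frac{1}{2^n}$ from the example; the function names are illustrative.

```python
from fractions import Fraction


def weight(n: int) -> Fraction:
    """Probability assigned to the outcome n (first head on toss n): 1 / 2^n."""
    return Fraction(1, 2**n)


def partial_total(N: int) -> Fraction:
    """P({1, ..., N}) = sum_{n=1}^{N} 1/2^n."""
    return sum((weight(n) for n in range(1, N + 1)), Fraction(0))


def partial_even(N: int) -> Fraction:
    """P({2, 4, ..., 2N}) = sum_{m=1}^{N} 1/4^m."""
    return sum((weight(2 * m) for m in range(1, N + 1)), Fraction(0))


for N in (1, 5, 20):
    print(N, float(partial_total(N)), float(partial_even(N)))
```

The partial sums equal $1 - 2^{-N}$ and $\frac{1}{3}(1 - 4^{-N})$ respectively, so they increase to 1 and $\frac{1}{3}$, matching the values computed in the example.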

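Example 1.2.7. Suppose $X$ is a random variable with unit normal distribution, i.e., the probability that $a < X \le b$ is $\frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx = P((a, b])$. What is the probability $p$ that $X$ takes values in the Cantor set? Imitating Example 1.2.5, one may compute the probability $q$ of the complementary set. Then

(1) $q \ge P((\frac{1}{3}, \frac{2}{3}))$,
(2) $q \ge P((\frac{1}{3}, \frac{2}{3})) + P((\frac{1}{9}, \frac{2}{9})) + P((\frac{7}{9}, \frac{8}{9}))$, and
(n) $q \ge P((\frac{1}{3}, \frac{2}{3})) + P((\frac{1}{9}, \frac{2}{9})) + P((\frac{7}{9}, \frac{8}{9})) + \dots + P((\frac{3^n - 2}{3^n}, \frac{3^n - 1}{3^n}))$.

Therefore, one expects the probability $p$ to be

$$1 - \lim_{n \to +\infty} \Bigl\{ P((\tfrac{1}{3}, \tfrac{2}{3})) + P((\tfrac{1}{9}, \tfrac{2}{9})) + P((\tfrac{7}{9}, \tfrac{8}{9})) + P((\tfrac{1}{27}, \tfrac{2}{27})) + P((\tfrac{7}{27}, \tfrac{8}{27})) + P((\tfrac{19}{27}, \tfrac{20}{27})) + P((\tfrac{25}{27}, \tfrac{26}{27})) + \dots + P((\tfrac{3^n - 2}{3^n}, \tfrac{3^n - 1}{3^n})) \Bigr\}.$$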


At this point, it is not so clear what the answer should be here. In fact it is zero! (See Proposition 2.4.1.)

Exercise 1.2.8. Write a more explicit formula for the probability that the above random variable $X$ takes values in the set $C_n$, which results after the procedure for constructing the Cantor set has been applied $n$ times. This amounts to getting a handle on this set by writing an explicit description of the intervals involved in the removal process.

[Hints: after completion of the $n$th stage of the procedure for constructing the Cantor set, one is left with $2^n$ intervals each of length $\frac{1}{3^n}$. They are all translates of the interval $[0, \frac{1}{3^n}]$. To describe $C_n$, it suffices to determine the left hand endpoints of these intervals. One may do this by observing that, for each integer $n$, if $k < 3^n$, then $k$ has a unique expression as $k = \sum_{i=0}^{n-1} a_i 3^i$ with the $a_i \in \{0, 1, 2\}$. One may use this to write a triadic "decimal" expansion for the left hand endpoints of the intervals remaining at the $n$th stage. Show that they will be of the form $0.b_1 b_2 \cdots b_n$ with $b_i \in \{0, 2\}$, where $0.b_1 b_2 \cdots b_n = \frac{1}{3^n} \sum_{i=1}^{n} b_i 3^{n-i}$. Show that one may obtain them from the $2^{n-1}$ left hand endpoints occurring at the $(n-1)$st stage by first putting a zero in the "first position", i.e., by shifting the "decimal" over to the right by one place and inserting a zero and then doing the same thing but this time inserting a two.]
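
The recursive description in the hints can be tried out directly. The sketch below is one possible reading of it, with illustrative function names: `left_endpoints` shifts the previous stage's endpoints one triadic place to the right and prepends the digit 0 or 2, while `endpoints_from_digits` enumerates all digit strings $b_1 \cdots b_n$ with $b_i \in \{0, 2\}$ for comparison.

```python
from fractions import Fraction
from itertools import product


def left_endpoints(n: int):
    """Left endpoints of the 2^n intervals of C_n, built stage by stage."""
    endpoints = [Fraction(0)]            # stage 0: the single interval [0, 1]
    for _ in range(n):
        shifted = [e / 3 for e in endpoints]               # insert the digit 0
        endpoints = shifted + [Fraction(2, 3) + e for e in shifted]  # insert 2
    return endpoints


def endpoints_from_digits(n: int):
    """The numbers 0.b_1 b_2 ... b_n (base 3) with every digit b_i in {0, 2}."""
    return sorted(
        sum((Fraction(b, 3 ** (i + 1)) for i, b in enumerate(digits)), Fraction(0))
        for digits in product((0, 2), repeat=n)
    )


print(left_endpoints(2))
```

Each remaining interval at stage $n$ is then $[e, e + \frac{1}{3^n}]$ for an endpoint $e$ in the list, which is the explicit description the exercise asks for.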

What Examples 1.2.5, 1.2.6, and 1.2.7 hint at is the following: while one often starts to construct a probability using some "obvious" definition for certain simple sets (those in $\mathfrak{A}$), it is soon useful and necessary to try to extend the probability to more complicated sets that are made up from those of $\mathfrak{A}$ by infinite operations. In addition the probability should not only be finitely additive (i.e., satisfy ($FAP_3$)), but also countably additive, as its computation will often involve infinite series.

3. DEFINITION OF A PROBABILITY SPACE

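Definition 1.3.1. $(\Omega, \mathfrak{F}, P)$ is said to be a probability space if $\Omega$ is a set, $\mathfrak{F}$ is a $\sigma$-algebra of subsets of $\Omega$, and $P$ is a (countably additive) probability on $\mathfrak{F}$. To say that $\mathfrak{F}$ is a $\sigma$-algebra means that the collection $\mathfrak{F}$ of subsets of $\Omega$ satisfies

($\mathfrak{F}_1$) $\Omega \in \mathfrak{F}$;
($\mathfrak{F}_2$) $A \in \mathfrak{F}$ implies $A^c \in \mathfrak{F}$; and
($\mathfrak{F}_3$) $(A_n)_{n \ge 1} \subset \mathfrak{F}$ implies $\bigcup_{n=1}^{\infty} A_n \in \mathfrak{F}$.

Furthermore, $P$ is a function defined on $\mathfrak{F}$ that satisfies

($P_1$) $P(\Omega) = 1$;
($P_2$) $0 \le P(A) \le 1$ if $A \in \mathfrak{F}$; and
($P_3$) $(A_n)_{n \ge 1} \subset \mathfrak{F}$ and $A_n \cap A_m = \emptyset$ if $n \ne m$ implies $P\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) = \sum_{n=1}^{\infty} P(A_n)$ (countable additivity).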


Remarks. (1) A probability differs from a finitely additive probability by the important property of being countably additive ($P_3$) (also called $\sigma$-additive).
(2) A probability space is also a finitely additive probability space since countable additivity ($P_3$) implies finite additivity ($FAP_3$).
(3) A $\sigma$-algebra is also called a $\sigma$-field.

In Example 1.2.4, the basic way of computing probability was to assign length to intervals contained in $[0, 1]$. Corresponding to this is a natural function $F$, which determines these probabilities: define $F(x)$ to be the length of $[0, x]$ if $0 \le x \le 1$. This function can be extended to all of $\mathbb{R}$ by setting $F(x) = 0$ when $x \le 0$ and $F(x) = 1$ if $1 \le x$. This extended function can be used to assign a probability to any interval $(a, b]$: namely, $P((a, b]) = F(b) - F(a)$. This probability is the length of $(a, b] \cap [0, 1]$. As a result, the "experiment" of Example 1.2.4 can also be modeled by using all the intervals of $\mathbb{R}$ as basic events, with the proviso that any interval disjoint from $[0, 1]$ has probability zero. The function $F$ is an example of a distribution function (see Definition 1.3.5).

Whenever one has a probability space $(\Omega, \mathfrak{F}, P)$ with $\Omega = \mathbb{R}$ and the $\sigma$-algebra $\mathfrak{F}$ contains every interval $(-\infty, x]$, then the probability determines a natural function in the same way: let $F(x) \stackrel{\text{def}}{=} P((-\infty, x])$. This function has the properties ($DF_1$), ($DF_2$), and ($DF_3$) stated in the following proposition.

Proposition 1.3.2. The function $F(x) = P((-\infty, x])$ associated with a probability $P$ has the following properties:

($DF_1$) $x \le y$ implies $F(x) \le F(y)$ (i.e., $F$ is a non-decreasing function);
($DF_2$) (i) $\lim_{x \to +\infty} F(x) = 1$ (i.e., for any positive number $\varepsilon$, $F(x) \ge 1 - \varepsilon$ if $x$ is large enough and positive); and
(ii) $\lim_{x \to -\infty} F(x) = 0$ (i.e., for any positive number $\varepsilon$, $F(x) \le \varepsilon$ if $x$ is negative and $|x|$ is sufficiently large);
($DF_3$) for any $x$, if $(x_n)$ is a sequence that decreases to $x$, then the values $F(x_n)$ decrease to $F(x)$ (i.e., $F(x) = \lim_{n \to \infty} F(x_n)$).

Proof. As an exercise prove ($DF_1$) and ($DF_2$) using properties ($P_1$), ($P_2$), and ($P_3$). [For the first part of ($DF_2$) use Exercise 1.1.8 (3) and the fact that $\mathbb{R} = \bigcup_{n=-\infty}^{+\infty} (n, n + 1]$ to show that $1 = F(n) + \sum_{k=n}^{\infty} P((k, k + 1])$.]

The property ($DF_3$) says that for any $x$, the values of $F(x_n)$ as $x_n$ approaches $x$ from above (i.e., from the right) approach $F(x)$. The technical formulation of this statement is that $\lim_{n \to \infty} F(x_n) = F(x)$ if $x \le x_{n+1} \le x_n$ and $\lim_n x_n = x$. Since $F$ is non-decreasing, it suffices to show that for any $x$, $F(x) = \lim_{n \to \infty} F(x + \frac{1}{n})$.

Let $A_n = (x, x + \frac{1}{n}]$. Then $A_{n+1} \subset A_n$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$. Now $A_1 = (x, x + 1] = \bigcup_{k=1}^{\infty} (x + \frac{1}{k+1}, x + \frac{1}{k}]$ and $A_n = \bigcup_{k=n}^{\infty} (x + \frac{1}{k+1}, x + \frac{1}{k}]$. Hence, $P(A_1) = \sum_{k=1}^{\infty} P((x + \frac{1}{k+1}, x + \frac{1}{k}]) \le 1$, and so $\sum_{k=n}^{\infty} P((x + \frac{1}{k+1}, x + \frac{1}{k}]) = P(A_n) \to 0$ as $n \to +\infty$ since it is the tail end of a convergent series. Now $F(x + \frac{1}{n}) = F(x) + P(A_n)$. Hence, $\lim_{n \to \infty} F(x + \frac{1}{n}) = F(x)$. $\square$

Exercise 1.3.3. Let $(A_n)_{n \ge 1}$ be a sequence in $\mathfrak{F}$, where $(\Omega, \mathfrak{F}, P)$ is a probability space. Show that $P(A_n) \to 0$ if (1) $A_n \supset A_{n+1}$ for all $n \ge 1$ and (2) $\bigcap_{n=1}^{\infty} A_n = \emptyset$.

The property ($DF_3$) of a distribution function is a reflection of the countable additivity of $P$. It can be reformulated by saying that $F(x)$ is right continuous at every point $x$.

Definition 1.3.4. A function $F : \mathbb{R} \to \mathbb{R}$ has a right limit $\lambda$ at a point $a \in \mathbb{R}$ if, for any $\varepsilon > 0$, there exists a $\delta > 0$, where $\delta = \delta(a, \varepsilon)$, such that

$|F(x) - \lambda| < \varepsilon$ whenever $a < x < a + \delta$.

The right limit $\lambda$ is denoted by $F(a+)$. The function $F$ is right continuous at $a$ if $F(a) = F(a+)$. Similarly, one defines the left limit $F(a-)$ to be $\lambda$ if, for any $\varepsilon > 0$, there exists a $\delta > 0$, where $\delta = \delta(a, \varepsilon)$, such that

$|F(x) - \lambda| < \varepsilon$ whenever $a - \delta < x < a$

and $F$ is left continuous at $a$ if $F(a) = F(a-)$.

Remark. When $F$ is the distribution function of a probability, then $F(a-) = P((-\infty, a))$ (see Exercise 1.4.14 (3)).

Definition 1.3.5. A function $F : \mathbb{R} \to [0, 1]$ is said to be a distribution function if

(1) it is non-decreasing and right continuous; and
(2) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$.

A distribution function $F$ determines the probability of certain sets, namely the probability of $(a, b]$, where $P((a, b]) \stackrel{\text{def}}{=} F(b) - F(a)$. This notion of probability can then be extended to the Boolean algebra $\mathfrak{A}$ generated by the intervals $(a, b]$, where $A \in \mathfrak{A}$ if and only if $A$ is a finite union of intervals of the form $(a, b]$, with $-\infty \le a \le b \le +\infty$, and by convention one takes $(-\infty, +\infty] = \mathbb{R}$. Note that $\mathfrak{A}$ is the smallest Boolean algebra containing all the intervals $(-\infty, x]$, $x \in \mathbb{R}$. A priori, there is no reason why the probability $P$ on $\mathfrak{A}$ determined in this way by a distribution function $F$ should come from a probability on a $\sigma$-algebra $\mathfrak{F}$ containing the Boolean algebra $\mathfrak{A}$. This raises the following issue.

Basic Problem 1.3.6. Given a distribution function $F$ on $\mathbb{R}$, are there a $\sigma$-algebra $\mathfrak{F} \supset \mathfrak{A}$ and a probability $P$ on $\mathfrak{F}$ such that $F$ is the distribution function of $P$? Note that for such a probability $P$, it follows that $P((a, b]) = F(b) - F(a)$ whenever $a \le b$.


Remark. As will be seen later in Theorem 2.2.2, this basic problem has an important generalization: the distribution function $F$ is replaced by any right continuous, non-decreasing function $G$ on $\mathbb{R}$. Then the question becomes: is there a $\sigma$-algebra $\mathfrak{F} \supset \mathfrak{A}$, and is there a function $\mu$ on $\mathfrak{F}$ that behaves like a probability except that its value on $\Omega = \mathbb{R}$ is not forced to be 1? For example, if $G(x) = x$ for all $x$, then such a set function $\mu$ would compute the length of sets.

Returning to distribution functions, the rest of this chapter is devoted to the solution of Basic Problem 1.3.6: to show that every distribution function comes from a probability. As a first step, one has the following.

Exercise 1.3.7. Let $\mathfrak{A}$ be the collection of finite unions of intervals of the form $(a, b]$, where $-\infty \le a \le b \le +\infty$, and by convention one takes $(-\infty, +\infty] = \mathbb{R}$. Verify that

(1) $\mathfrak{A}$ is a Boolean algebra (i.e., it satisfies ($\mathfrak{A}_1$), ($\mathfrak{A}_2$), ($\mathfrak{A}_3$) of §1.2), and
(2) there is one and only one finitely additive probability $P$ on $\mathfrak{A}$ such that $P((a, b]) = F(b) - F(a)$ for all $a \le b$.

[Hints: (1) Show that any $A \in \mathfrak{A}$ can be written as a finite pairwise disjoint union of intervals of the form $(a, b]$; note that the union of two intervals of this type is an interval of this type if they are not disjoint and observe that $(a, b] \cap (c, d]$ is $\emptyset$ or $(a \vee c, b \wedge d]$. (2) Convince oneself of (2) by showing that if $A = \bigcup_{i=1}^{n} (a_i, b_i]$ with the intervals pairwise disjoint, then $\sum_{i=1}^{n} \{F(b_i) - F(a_i)\}$ is not dependent on the particular way $A$ is written as a disjoint union. For example, if $A = (0, 3] = (0, 1] \cup (1, 3] = (0, 2] \cup (2, 3]$, then $P((0, 3]) = F(3) - F(0) = \{F(1) - F(0)\} + \{F(3) - F(1)\} = P((0, 1]) + P((1, 3])$. Suggestion: given two disjoint unions, make up a "finer" disjoint union by using the second one to "cut up" all the intervals of the first. Then observe that, if $(a, b] = \bigcup_{l=1}^{L} (c_l, d_l]$ with the intervals $(c_l, d_l]$ pairwise disjoint, after relabeling if necessary, one has $a = c_1 < d_1 = c_2 < \dots < d_{L-1} = c_L < d_L = b$.]
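
The "finer disjoint union" suggested in the hint can be illustrated on a small case. The sketch below is not a proof: it fixes one particular $F$ (the uniform distribution function on $[0, 1]$, chosen only for convenience) and two hand-picked decompositions of $(0, \frac{3}{4}]$. The names `prob_of_union` and `refine` are hypothetical helpers, and intervals $(a, b]$ are represented as pairs `(a, b)`.

```python
from fractions import Fraction


def F(x):
    """Uniform distribution function on [0, 1] (an illustrative choice of F)."""
    return min(max(Fraction(x), Fraction(0)), Fraction(1))


def prob_of_union(intervals):
    """Sum of F(b) - F(a) over pairwise disjoint intervals (a, b]."""
    return sum((F(b) - F(a) for a, b in intervals), Fraction(0))


def refine(first, second):
    """Cut the intervals of `first` at every endpoint occurring in either list."""
    cuts = sorted({p for a, b in first + second for p in (a, b)})
    return [
        (lo, hi)
        for lo, hi in zip(cuts, cuts[1:])
        if any(a <= lo and hi <= b for a, b in first)   # keep pieces inside A
    ]


# Two ways of writing A = (0, 3/4] as a disjoint union.
first = [(Fraction(0), Fraction(1, 4)), (Fraction(1, 4), Fraction(3, 4))]
second = [(Fraction(0), Fraction(1, 2)), (Fraction(1, 2), Fraction(3, 4))]
finer = refine(first, second)

print(prob_of_union(first), prob_of_union(second), prob_of_union(finer))
```

All three sums agree because each telescopes to $F(\frac{3}{4}) - F(0)$; the exercise asks for the same argument with an arbitrary distribution function and arbitrary decompositions.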

Convention 1.3.8. Until further notice, unless otherwise stated or the context makes it evident (see Definition 1.4.2), $\mathfrak{A}$ will denote the above Boolean algebra of finite unions of half-open intervals $(a, b] \subset \mathbb{R}$.

Example 1.3.9. Here are three well-known distribution functions:

(1) for the uniform distribution on $[0, 1]$ (see Fig. 1.2),

$$F(x) = \begin{cases} 0 & \text{if } x \le 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } 1 \le x. \end{cases}$$

Fig. 1.2. Graph of the uniform distribution function, with the points $(0, 1)$ and $(1, 0)$ marked on the axes.

(2) for the unit normal distribution (see Fig. 1.3),

$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du.$$

Fig. 1.3. Graph of the unit normal distribution function.

(3) for the Poisson distribution with mean one (see Fig. 1.4),

$$F(x) = \begin{cases} 0, & x < 0, \\ \frac{1}{e} \sum_{0 \le n \le x} \frac{1}{n!}, & 0 \le x. \end{cases}$$

Fig. 1.4. Graph of the Poisson distribution function with mean one, a step function with a jump at each integer $n \ge 0$.
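
The three distribution functions of Example 1.3.9 can be evaluated numerically as follows. This is a floating-point sketch with illustrative function names; the normal case uses the standard identity $F(x) = \frac{1}{2}\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)$ rather than numerical integration, which is an implementation choice and not something taken from the text.

```python
import math


def uniform_cdf(x: float) -> float:
    """F for the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def normal_cdf(x: float) -> float:
    """F for the unit normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def poisson1_cdf(x: float) -> float:
    """F for the Poisson distribution with mean one: (1/e) * sum_{0<=n<=x} 1/n!."""
    if x < 0:
        return 0.0
    return math.exp(-1.0) * sum(1.0 / math.factorial(n) for n in range(math.floor(x) + 1))


def prob_interval(F, a: float, b: float) -> float:
    """P((a, b]) = F(b) - F(a) for a <= b."""
    return F(b) - F(a)


print(prob_interval(uniform_cdf, 0.25, 0.75))
print(prob_interval(normal_cdf, -1.0, 1.0))
print(prob_interval(poisson1_cdf, 0.0, 2.0))
```

Note that `prob_interval(poisson1_cdf, 0.0, 2.0)` is the probability of the values 1 and 2 only, since the interval $(0, 2]$ excludes its left endpoint.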


Exercise 1.3.10. (a) Let n be a set and let ~ be a collection of subsets ofn. Show that there is a smallest (I-algebra of subsets of n that contains ~.

It is called the (I-algebra generated by ~ and will be denoted by (I(~).

(b) Let n = R Show that the smallest (I-algebra containing each of thefollowing collections ~ of subsets of IR is the same:

(1) ~={(a,b]la:=;b}j

(2) ~ = {(a, b) Ia :=; b};(3) ~ = {[a, b) Ia :=; b};(4) ~ = {[a,b] Ia:=; b};(5) ~ is the collection of open subsets G of IRj and(6) ~ is the collection of closed subsets F of R

The (I-algebra that results is called the (I-algebra of Borel subsets ofIR and is denoted by ~(IR).

[Hints for (1) to (4) in (b): make use of Exercise 1.1.17.]

[Hints for (2), (5), and (6) in (b): for the two other collections of sets, two definitions are needed: a subset $G$ of $\mathbb{R}$ is said to be open if $x_0 \in G$ implies that, for some $\epsilon_0 > 0$, $(x_0 - \epsilon_0, x_0 + \epsilon_0) \subset G$; a subset $F$ is said to be closed if $F^c$ is open. As a result, the $\sigma$-algebras generated by collections (5) and (6) coincide. To show that collections (2) and (5) generate the same $\sigma$-algebras, it is necessary to know the following fact: every open set can be written as a countable union of open intervals (a fact that is part of the next exercise).]
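On a finite set, the $\sigma$-algebra generated by a collection can be computed by brute force, which gives a concrete feeling for part (a). The sketch below is only a finite toy and says nothing about $\mathbb{R}$; the function name `generated_sigma_algebra` is hypothetical. It repeatedly adds complements and pairwise unions until nothing new appears, which suffices because countable unions reduce to finite ones on a finite set.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def generated_sigma_algebra(
    omega: Iterable[int], collection: Iterable[Iterable[int]]
) -> Set[FrozenSet[int]]:
    """Smallest family containing `collection` that is closed under
    complements and unions, for a finite set omega."""
    omega_set = frozenset(omega)
    sets = {frozenset(), omega_set} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            complement = omega_set - a
            if complement not in sets:
                sets.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sets:
                sets.add(union)
                changed = True
    return sets
```

For example, on $\{1,2,3,4\}$ the collection $\{\{1\}, \{1,2\}\}$ generates the eight sets built from the "atoms" $\{1\}$, $\{2\}$, $\{3,4\}$; the set $\{3\}$ is not among them.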

Exercise 1.3.11. (a) Let $\mathcal{O}$ be the collection of open subsets of $\mathbb{R}$. Show that

(1) $\mathbb{R} \in \mathcal{O}$,
(2) $G_1, G_2 \in \mathcal{O} \Rightarrow G_1 \cap G_2 \in \mathcal{O}$,
(3) the union of any collection of open sets is open.

(b) Show that if $G$ is open and $x_0 \in G$, then there is a largest open interval $(a, b) \subset G$ that contains $x_0$.
[Hint: the union of a collection of intervals that contain a fixed point is itself an interval: recall Definition 1.1.15.]

(c) Let $G$ be an open set and let $x_1, x_2 \in G$. Show that the largest open interval $I_1 \subset G$ that contains $x_1$ either equals $I_2$, the largest open interval $\subset G$ that contains $x_2$, or is disjoint from $I_2$.

(d) Let $G$ be an open set. Show that $G$ is a disjoint union of at most a countable number of open intervals. [Hint: suppose $G \subset (-1, 1)$. Show that if $G$ is expressed as a disjoint union of open intervals using (c), at most a finite number can have length $\ge 1/n$, where $n \ge 1$ is any fixed positive integer.]

Comment. It is standard mathematical terminology to call a collection $\mathcal{O}$ of sets a topology if it satisfies (1), (2), and (3) in the above exercise. The complements of open sets are defined to be the closed sets, and it follows from (3) that for any set $A$, the intersection of all the closed sets containing it is a closed set. It is called the closure of $A$ and is usually denoted by $\bar{A}$.

Digression on countable sets. It is about time to say what the word "countable" means. A set $E$ is countable if all its elements can be labeled by natural numbers in a 1:1 way, i.e., if there is a function $c : \mathbb{N} \to E$ such that (i) $E = \{c(n) \mid n \in \mathbb{N}\}$, (ii) $c(n_1) = c(n_2)$ implies $n_1 = n_2$. A set is at most countable if it is either finite (i.e., it can be "counted" using $\{1, \ldots, n\}$ for some $n$) or countable (i.e., it can be "counted" using $\mathbb{N}$).

Given two sets $A$ and $B$, the Cartesian product is $A \times B \stackrel{\text{def}}{=} \{(a, b) \mid a \in A, b \in B\}$.

Proposition 1.3.12. If $A$ and $B$ are countable, then $A \times B$ is countable.

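Proof. To begin with, $A \times B$ is clearly not finite. To "count" $A \times B$ is really the same as "counting" $\mathbb{N} \times \mathbb{N}$. This can be viewed as a set in the plane. Figure 1.5 explains how to "count" or "enumerate" $\mathbb{N} \times \mathbb{N}$:

Fig. 1.5 (the lattice points $(n, m)$ of $\mathbb{N} \times \mathbb{N}$, with $(1,1), (2,1), (3,1), (4,1), (5,1)$ labeled along the bottom row and $(1,5)$ at the top of the first column)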
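One standard way to enumerate $\mathbb{N} \times \mathbb{N}$ is to walk through the finite diagonals $n + m = 2, 3, 4, \ldots$ in turn. The Python sketch below does this; it is an assumption that this matches the route drawn in Fig. 1.5, and the name `enumerate_pairs` is hypothetical.

```python
from itertools import islice
from typing import Iterator, Tuple


def enumerate_pairs() -> Iterator[Tuple[int, int]]:
    """Yield every (n, m) in N x N exactly once, one diagonal n + m = const at a time."""
    total = 2  # smallest possible value of n + m
    while True:
        for n in range(1, total):
            yield (n, total - n)
        total += 1


first_six = list(islice(enumerate_pairs(), 6))
```

Each pair $(n, m)$ sits on the diagonal $n + m$, which contains only $n + m - 1$ points, so it is reached after finitely many steps. This is exactly what a 1:1 labeling by $\mathbb{N}$ requires.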

Proposition 1.3.13. $\mathbb{Q}$ is countable.

Proof sketch. It suffices to show that $\{q \in \mathbb{Q} \mid q > 0\}$ is countable. Why? Look at the diagram of $\mathbb{N} \times \mathbb{N}$, and at each "site" $(n, m)$ attach the rational $n/m$. It should then be clear how to count $\{q \in \mathbb{Q} \mid q > 0\}$! □
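The proof sketch leaves one detail to the reader: the same rational appears at many sites (for instance $1/2 = 2/4$), so repeats must be skipped. The following sketch shows one way to do that; the name `positive_rationals` and the diagonal ordering are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import islice
from typing import Iterator, Set


def positive_rationals() -> Iterator[Fraction]:
    """Yield each positive rational exactly once by scanning the sites (n, m)."""
    seen: Set[Fraction] = set()
    total = 2
    while True:
        for n in range(1, total):
            q = Fraction(n, total - n)  # the rational attached to the site (n, m)
            if q not in seen:           # skip sites whose rational was already counted
                seen.add(q)
                yield q
        total += 1


first_five = list(islice(positive_rationals(), 5))
```

The set `seen` grows without bound, so this is a demonstration of the counting argument rather than an efficient enumeration.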

Proposition 1.3.14. (Cantor's diagonal argument) $(0,1]$ and hence $\mathbb{R}$ is not countable.

Proof. If $0 < a \le 1$, $a$ has a decimal expansion $a = 0.a_1 a_2 \cdots a_n \cdots = \sum_{i=1}^{\infty} a_i / 10^i$. If one eliminates decimals that terminate in an unbroken string of zeros, this decimal expression is unique (explain!).

Assume that the numbers in $(0,1]$ can be enumerated, and write them in a sequence using only decimals that fail to terminate in an unbroken string of zeros:

$$\begin{aligned} a_1 &= 0.a_{11} a_{12} a_{13} \cdots a_{1n} \cdots \\ a_2 &= 0.a_{21} a_{22} a_{23} \cdots a_{2n} \cdots \\ &\ \ \vdots \end{aligned}$$

Let $a = 0.a'_{11} a'_{22} a'_{33} \cdots a'_{nn} \cdots$, where $0 \ne a'_{nn} \ne a_{nn}$ for each $n \ge 1$. In other words, use the diagonal entries of the above infinite table to make a number in $(0,1]$. Then $a$ is not in the list since its decimal expansion fails to agree with that of any of the expansions for $a_1, a_2, \ldots$. This is a contradiction.

Since $(0,1]$ is not countable, $\mathbb{R}$ is not countable. □
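The diagonal construction can be imitated on a finite table of digits. The sketch below is purely illustrative (the name `diagonal_escape` and the rule "use 1 unless the diagonal digit is 1, then use 2" are choices made here, not in the text); any rule producing a nonzero digit different from the diagonal one would do.

```python
from typing import List


def diagonal_escape(rows: List[str]) -> str:
    """rows[i] holds at least len(rows) decimal digits of the i-th listed number.

    Returns a decimal whose n-th digit is nonzero and differs from the
    n-th digit of the n-th row."""
    digits = []
    for n, row in enumerate(rows):
        diagonal_digit = row[n]
        digits.append("1" if diagonal_digit != "1" else "2")
    return "0." + "".join(digits)


table = ["141", "718", "500"]      # digits of three listed numbers
escaped = diagonal_escape(table)   # differs from row n in its n-th digit
```

Because every digit chosen is 1 or 2, the resulting expansion never ends in an unbroken string of zeros, which is the uniqueness condition used in the proof.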

This argument applies to any non-void interval since, for example, the function $f(x) = \frac{x - a}{b - a}$ maps $[a, b]$ in a 1:1 way onto $[0,1]$. As a result, any non-void open set is not countable, as it contains some interval $[a, b]$ with $a \ne b$. In particular, one has the following observation.

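Exercise 1.3.15. Let $C$ be any countable subset of $\mathbb{R}$. Then the closure $\overline{\mathbb{R} \setminus C}$ of $\mathbb{R} \setminus C$ equals $\mathbb{R}$. Show this by observing that $C \supset (\overline{\mathbb{R} \setminus C})^c$, which is an open set. This property of the complement $\mathbb{R} \setminus C$ of $C$ is referred to by saying that $\mathbb{R} \setminus C$ is dense. Show that a set $D$ is dense in $\mathbb{R}$ if and only if every non-void open set contains a point of $D$.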

Finally, one can verify the following fact.

Exercise 1.3.16. Let $E_1, \ldots, E_n$ be a finite collection of disjoint countable sets. Then $E = \cup_{i=1}^{n} E_i$ is countable. Let $E_1, \ldots, E_n, \ldots$ be a countable collection of countable sets. Then $E = \cup_{i=1}^{\infty} E_i$ is countable.

4. CONSTRUCTION OF A PROBABILITY FROM A DISTRIBUTION FUNCTION

Now consider the basic problem of constructing a probability from a distribution function $F$. Exercise 1.3.10 (b) shows that if one is to get a probability space $(\mathbb{R}, \mathfrak{F}, P)$ from $F$ with $\mathfrak{F} \supset \mathfrak{A}$, then $\mathfrak{F}$ will have to contain the $\sigma$-algebra $\mathfrak{B}(\mathbb{R})$ of all Borel subsets of $\mathbb{R}$.

Furthermore, if $A \in \mathfrak{A}$ and $A = \cup_{n=1}^{\infty} A_n$, where the $A_n$ are in $\mathfrak{A}$ and pairwise disjoint (for example, $(0,1] = \cup_{n=1}^{\infty} (\frac{1}{n+1}, \frac{1}{n}]$), then it will be necessary (as shown later) that

$$P(A) = \sum_{n=1}^{\infty} P(A_n)$$

if there is to be a probability $P$ on a $\sigma$-algebra $\mathfrak{F} \supset \mathfrak{A}$. Recall that $P$ is to be $\sigma$-additive.

Exercise 1.4.1. Show that $F(1) - F(0) = \sum_{n=1}^{\infty} \{F(\frac{1}{n}) - F(\frac{1}{n+1})\}$. [Hint: use Exercise 1.1.8 and the right continuity of $F$.]
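A quick numerical experiment suggests why right continuity matters in Exercise 1.4.1. In the sketch below, `F` and `G` are hypothetical examples chosen for illustration: both have a jump of size 1/2 at the origin, but only `F` is right continuous there (so only `F` is a distribution function). The partial sums telescope to $F(1) - F(\frac{1}{N+1})$.

```python
from typing import Callable


def F(x: float) -> float:
    """Right continuous at 0: mass 1/2 at the origin plus 1/2 spread uniformly on [0, 1]."""
    if x < 0:
        return 0.0
    return 0.5 + 0.5 * min(x, 1.0)


def G(x: float) -> float:
    """Same values as F except G(0) = 0, so G is left continuous at 0."""
    if x <= 0:
        return 0.0
    return 0.5 + 0.5 * min(x, 1.0)


def partial_sum(func: Callable[[float], float], N: int) -> float:
    """Sum over n = 1..N of func(1/n) - func(1/(n+1))."""
    return sum(func(1.0 / n) - func(1.0 / (n + 1)) for n in range(1, N + 1))
```

For `F` the partial sums approach $F(1) - F(0) = 1/2$. For `G` they approach the same limit $1/2$, which is not $G(1) - G(0) = 1$: the jump at the origin is lost.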

Definition 1.4.2. A finitely additive probability $P$ on a Boolean algebra $\mathfrak{A}$ is said to be $\sigma$-additive if $A = \cup_{n=1}^{\infty} A_n$ implies

$$P(A) = \sum_{n=1}^{\infty} P(A_n),$$

when $A$ is in $\mathfrak{A}$, the sets $A_n$ are all in $\mathfrak{A}$, and are pairwise disjoint.

A finitely additive probability $P$ on $\mathfrak{A}$ need not be $\sigma$-additive, as the following example shows.

Example. Define $P$ on $\mathfrak{A}$ by setting $P(A) = 0$ if $A$ has an upper bound and $P(A) = 1$ if $A$ has no upper bound (remember that $\mathfrak{A}$ is a special Boolean algebra; see Convention 1.3.8). Show that $(\mathbb{R}, \mathfrak{A}, P)$ satisfies conditions $(\mathfrak{A}_1)$, $(\mathfrak{A}_2)$, $(\mathfrak{A}_3)$, $(P_1)$, $(P_2)$, and $(FAP_3)$ in §1.2. Show that it is not $\sigma$-additive. To see this, try to calculate $P(\mathbb{R})$ as $\sum_{n=-\infty}^{\infty} P((n, n+1])$. Notice that this "probability" does not come from a distribution function. What is missing?
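The failure of $\sigma$-additivity in this example can be seen in a few lines of Python. The sketch assumes, as a convention of the code only, that a set in $\mathfrak{A}$ is given as a list of pairs `(a, b)` standing for $(a, b]$, with `b = math.inf` marking an interval that is unbounded above; the name `P_unbounded` is hypothetical.

```python
import math
from typing import List, Tuple


def P_unbounded(intervals: List[Tuple[float, float]]) -> float:
    """P(A) = 1 if A has no upper bound, and 0 otherwise."""
    return 1.0 if any(a < b and b == math.inf for a, b in intervals) else 0.0


whole_line = P_unbounded([(-math.inf, math.inf)])
# Every unit interval (n, n + 1] is bounded above, so each term of the series is 0.
partial = sum(P_unbounded([(n, n + 1)]) for n in range(-1000, 1000))
```

Every partial sum of the series is 0 while `whole_line` is 1, so the series cannot converge to $P(\mathbb{R})$.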

Returning to the problem of extending $P$ from $\mathfrak{A}$ to $\mathfrak{B}(\mathbb{R})$, it will now be shown that if $P$ on $\mathfrak{A}$ is determined by a distribution function, then it is $\sigma$-additive.

Exercise 1.4.3. Show that $P$ on $\mathfrak{A}$ is $\sigma$-additive if and only if $(a, b] = \cup_{k=1}^{\infty} (c_k, d_k]$, with the $(c_k, d_k]$ pairwise disjoint, implies

$$P((a, b]) = F(b) - F(a) = \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\}.$$

Theorem 1.4.4. Let $F$ be a distribution function on $\mathbb{R}$. Let $P$ be the unique finitely additive probability on $\mathfrak{A}$ such that $P((a, b]) = F(b) - F(a)$ whenever $a \le b$. Then $P$ is $\sigma$-additive on $\mathfrak{A}$.

Remark. This theorem may appear obvious to you in view of Exercise 1.4.3. In a way it should. However, its actual justification depends on Axiom 1.1.3.

Proof. By Exercise 1.4.3, it suffices to prove that if $(a, b] = \cup_{k=1}^{\infty} (c_k, d_k]$ with the intervals $(c_k, d_k]$ pairwise disjoint, then

$$F(b) - F(a) = \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\}. \tag{$*$}$$

Now it is obvious that $F(b) - F(a) \ge \sum_{k=1}^{n} \{F(d_k) - F(c_k)\}$ since $(a, b] \supset \cup_{k=1}^{n} (c_k, d_k]$. Hence, by Exercise 1.1.8, $F(b) - F(a) \ge \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\}$. Therefore, it is enough to verify the opposite inequality, and it is here that Axiom 1.1.3 and the right continuity of the distribution function $F$ come into play.

First, one shows that it suffices to verify $(*)$ when $-\infty < a < b < +\infty$.

Suppose, for example, that $-\infty = a < b < +\infty$. Then, for large $n$, one has $-n < b$. If $(-\infty, b] = \cup_{k=1}^{\infty} (c_k, d_k]$ with the intervals $(c_k, d_k]$ pairwise disjoint, then $(-n, b] = \cup_{k=1}^{\infty} (c_k \vee (-n), d_k]$. If $(*)$ holds when the endpoints are both finite, then

$$F(b) - F(-n) = \sum_{k=1}^{\infty} \{F(d_k) - F(c_k \vee (-n))\} \le \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\}.$$

Since $P((-\infty, b]) = F(b)$ and $F(-n)$ converges to zero as $n$ tends to $+\infty$, it follows that $F(b) \le \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\}$, and hence $F(b) = \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\}$ (the opposite inequality is valid, as observed above). Similar arguments apply when $-\infty < a < b = +\infty$ and when $-\infty = a < b = +\infty$. This shows that the theorem holds provided it holds for intervals with finite endpoints.

Assume $-\infty < a < b < +\infty$, and choose any positive number $\epsilon > 0$.

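For each $k$, there is a positive number $e_k > 0$ such that

$$F(d_k + e_k) \le F(d_k) + \frac{\epsilon}{2^k},$$

because the distribution function $F$ is increasing and right continuous. In other words,

$$P((c_k, d_k + e_k]) \le P((c_k, d_k]) + \frac{\epsilon}{2^k}.$$

Therefore,

$$\sum_{k=1}^{\infty} \{F(d_k + e_k) - F(c_k)\} = \sum_{k=1}^{\infty} P((c_k, d_k + e_k]) \le \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\} + \epsilon.$$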

Note that $(a, b] = \cup_{k=1}^{\infty} (c_k, d_k] \subset \cup_{k=1}^{\infty} (c_k, d_k + e_k)$, which is an open set. Using the right continuity once again (at $x_0 = a$), there is a positive number $e < b - a$ such that $F(a + e) \le F(a) + \epsilon$.

In other words, $P((a, b]) - \epsilon \le P((a + e, b])$.

Since $[a + e, b] \subset (a, b] \subset \cup_{k=1}^{\infty} (c_k, d_k + e_k)$, the closed interval $[a + e, b]$ is contained in a countable union of open intervals. A very famous theorem (the Heine-Borel theorem) asserts that, as a result, the union of some finite number of the open intervals $(c_k, d_k + e_k)$ contains $[a + e, b]$: the proof will be given below and it uses Axiom 1.1.3. Assuming the validity of the Heine-Borel theorem, suppose that one has $[a + e, b] \subset \cup_{k=1}^{n} (c_k, d_k + e_k)$. Then

$$(a + e, b] \subset \cup_{k=1}^{n} (c_k, d_k + e_k] \stackrel{\text{def}}{=} A',$$

and so

$$P((a + e, b]) = F(b) - F(a + e) \le P(A') \le \sum_{k=1}^{n} P((c_k, d_k + e_k]) = \sum_{k=1}^{n} \{F(d_k + e_k) - F(c_k)\}$$

(recall that $P(A') \le \sum_{k=1}^{n} P((c_k, d_k + e_k])$ by Exercise 1.2.2 (6)). The conclusion is then that

$$\begin{aligned} F(b) - F(a) - \epsilon = P((a, b]) - \epsilon &\le P((a + e, b]) \\ &\le \sum_{k=1}^{n} \{F(d_k + e_k) - F(c_k)\} \\ &\le \sum_{k=1}^{\infty} \{F(d_k + e_k) - F(c_k)\} \le \sum_{k=1}^{\infty} \{F(d_k) - F(c_k)\} + \epsilon. \end{aligned}$$

Since $\epsilon$ is any positive number, the desired inequality is proved. □

For completeness, the key theorem that was used in the proof of this result will now be proved.

Theorem 1.4.5 (Heine-Borel). Let $[a, b]$ be a closed, bounded interval in $\mathbb{R}$. Assume that there is a collection of open intervals $(a_i, b_i)$ whose union contains $[a, b]$. Then the union of some finite number of the given collection of open intervals also contains $[a, b]$.

Proof. The idea of the proof is to see how large an interval $[a, x]$ with $a \le x \le b$ can actually be covered by a finite number of the open intervals (i.e., $[a, x] \subset$ some finite union of these intervals). One knows that for some $i$, say $i_1$, $a \in (a_{i_1}, b_{i_1})$. So, if $x = \min\{b, \frac{1}{2}(a + b_{i_1})\}$, then $[a, x] \subset (a_{i_1}, b_{i_1})$. Let $H$ equal the set of $x$ in the interval $[a, b]$ such that $[a, x]$ is contained in the union of a finite number of the intervals. This is a set with an upper bound $b$. Note that if $x \in H$ and $a \le y \le x$, then $y \in H$. Also, if $d$ is an upper bound of $H$, then $d < e \le b$ implies $e \notin H$. By Axiom 1.1.3, $H$ has an l.u.b. Call it $c$.

Exercise 1.4.6. (1) Show that $[a, c]$ is contained in a finite union of the open intervals. [Hint: $c$ is in some interval.] (2) Show that if $[a, x]$ is contained in a finite union of the open intervals and $x < b$, then $x$ is not an upper bound of $H$.

This exercise implies $c = b$, and so the theorem is proved. □
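The idea of the proof, pushing the covered point $x$ to the right as far as possible, can be turned into a small greedy procedure. The sketch below is an illustration under a strong simplifying assumption: the cover is given as a finite Python list (possibly with many redundant intervals), so it does not replace the theorem, which concerns arbitrary collections. The name `finite_subcover` is hypothetical.

```python
from typing import List, Optional, Tuple

OpenInterval = Tuple[float, float]  # the pair (c, d) stands for the open interval (c, d)


def finite_subcover(a: float, b: float, cover: List[OpenInterval]) -> Optional[List[OpenInterval]]:
    """Greedily pick intervals from `cover` whose union contains [a, b].

    Returns None if some point of [a, b] lies in no interval of the cover."""
    chosen: List[OpenInterval] = []
    x = a  # every point of [a, x) is already covered; x itself still needs an interval
    while True:
        candidates = [(c, d) for (c, d) in cover if c < x < d]
        if not candidates:
            return None  # the point x is not covered
        best = max(candidates, key=lambda iv: iv[1])  # reach as far right as possible
        chosen.append(best)
        if best[1] > b:
            return chosen
        x = best[1]  # the right endpoint is not in the open interval, so cover it next
```

Each step strictly increases `x`, and every chosen interval has a larger right endpoint than the previous one, so the loop stops after at most `len(cover)` steps.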

Remark. An equivalent form of the Heine-Borel theorem is the following result.

Theorem 1.4.7. Let $(O_i)_{i \in J}$ be a family of open sets that covers the closed, bounded interval $[a, b] \subset \mathbb{R}$ (i.e., $\cup_{i \in J} O_i \supset [a, b]$). Then a finite number of the sets $O_i$ covers $[a, b]$ (i.e., for some finite set $F \subset J$, $\cup_{i \in F} O_i \supset [a, b]$).

Exercise 1.4.8. Show that Theorem 1.4.7 and Theorem 1.4.5 are equivalent.

The Heine-Borel theorem is so basic that the class of sets for which it is true is given a name.

Definition 1.4.9. A set $K \subset \mathbb{R}$ is said to be compact if, whenever $K \subset \cup_{i \in J} O_i$, each $O_i$ open, there is a finite set $F \subset J$ with $K \subset \cup_{i \in F} O_i$.

The Heine-Borel theorem states that every closed and bounded interval is compact. Given this theorem, it is not hard to show that a set $K \subset \mathbb{R}$ is compact if and only if $K$ is closed and bounded (see Exercise 1.5.6). For example, the Cantor set is compact!

Returning again to the extension problem for $P$, it is clear that if $A = \cup_{n=1}^{\infty} A_n$, $A_n \in \mathfrak{A}$ for all $n$ and $A$ not necessarily in $\mathfrak{A}$, then $A \in \mathfrak{F}$. Also, by Remark 1.2.3, $A$ can be written as a disjoint union of sets in $\mathfrak{A}$: one replaces each $A_n$ by $A_n \setminus \cup_{i=1}^{n-1} A_i$. Consequently, if there is an extension, $P(A) = \sum_{n=1}^{\infty} P(A_n \setminus \cup_{i=1}^{n-1} A_i) \le \sum_{n=1}^{\infty} P(A_n)$. Without assuming the extension to be possible, one may define $P^*(A)$ to be the greatest lower bound of $\{\sum_{n=1}^{\infty} P(A_n) \mid A = \cup_{n=1}^{\infty} A_n, A_n \in \mathfrak{A} \text{ for all } n\}$. Then $P^*(A)$ is an estimate for the value $P(A)$ of a possible extension when $A \in \mathfrak{A}_\sigma$, the collection of sets that are countable unions of sets from $\mathfrak{A}$. Since $\mathbb{R} = \cup_{n=-\infty}^{\infty} (n, n+1]$, every $E \subset \mathbb{R}$ is a subset of some set $A \in \mathfrak{A}_\sigma$. Hence if $E \subset A$, then one expects to have $P(E) \le P^*(A)$. This motivates the definition of the following set function $P^*$.

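If $E \subset \mathbb{R}$, define $P^*(E)$ to be the greatest lower bound of $\{P^*(A) \mid E \subset A \in \mathfrak{A}_\sigma\}$, i.e.,

$$P^*(E) \stackrel{\text{def}}{=} \inf_{\substack{A \in \mathfrak{A}_\sigma \\ A \supset E}} P^*(A) = \inf\Big\{\sum_{n=1}^{\infty} P(A_n) \;\Big|\; E \subset \cup_{n=1}^{\infty} A_n,\ A_n \in \mathfrak{A} \text{ for all } n\Big\}.$$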
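Since $P^*(E)$ is an infimum over covers, any single cover yields an upper bound. The sketch below illustrates this for a finite set of points and the uniform distribution function: covering the $k$-th point by an interval of length $\epsilon / 2^{k}$ gives a cover of total weight at most $\epsilon$. The names `shrinking_cover` and `cover_sum` are hypothetical, and the code computes only an upper bound, not $P^*(E)$ itself.

```python
from typing import Callable, List, Tuple


def F_uniform(x: float) -> float:
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def shrinking_cover(points: List[float], eps: float) -> List[Tuple[float, float]]:
    """Cover the k-th point x_k (k = 1, 2, ...) by the interval (x_k - eps / 2**k, x_k]."""
    return [(x - eps / 2 ** (k + 1), x) for k, x in enumerate(points)]


def cover_sum(F: Callable[[float], float], cover: List[Tuple[float, float]]) -> float:
    """Sum of P((a, b]) = F(b) - F(a) over the cover: an upper bound for P*(E)."""
    return sum(F(b) - F(a) for a, b in cover)
```

Because the bound holds for every $\epsilon > 0$, this suggests that finite (and, by the same device, countable) subsets of $[0,1]$ have outer measure zero for the uniform distribution.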

Remark. The terms infimum (abbreviated to "inf") and supremum (abbreviated to "sup") are merely other words for "g.l.b." and "l.u.b.", respectively.

In general, the set function $P^*$ does not behave like a probability because $(P_3)$ need not hold! However, it has certain important properties which make it into what is called an outer measure. They are stated in the next definition.

Definition 1.4.10. An outer measure on the subsets of $\mathbb{R}$ is a set function $P^*$ such that

(1) $0 \le P^*(E)$ for all $E \subset \mathbb{R}$,
(2) $E_1 \subset E_2$ implies $P^*(E_1) \le P^*(E_2)$, and
(3) $E = \cup_{n=1}^{\infty} E_n$ implies $P^*(E) \le \sum_{n=1}^{\infty} P^*(E_n)$ (i.e., it is countably subadditive).

Proposition 1.4.11. The set function $P^*$ defined above is an outer measure with $P^*(E) \le 1$ for all $E \subset \mathbb{R}$. Furthermore, since $P$ is $\sigma$-additive on $\mathfrak{A}$, $P(A) = P^*(A)$ for all $A \in \mathfrak{A}$.

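Proof. Properties (1) and (2) of Definition 1.4.10 are obvious. Let $\epsilon > 0$ and, for each $n$, let $E_n \subset \cup_{k=1}^{\infty} A_{n,k}$ be such that $\sum_{k=1}^{\infty} P(A_{n,k}) \le P^*(E_n) + \frac{\epsilon}{2^n}$, where the sets $A_{n,k} \in \mathfrak{A}$. Then $E = \cup_{n=1}^{\infty} E_n \subset \cup_{n=1}^{\infty} \cup_{k=1}^{\infty} A_{n,k}$, which is a countable union of sets from $\mathfrak{A}$. Hence,

$$P^*(E) \le \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} P(A_{n,k}) \le \sum_{n=1}^{\infty} P^*(E_n) + \epsilon.$$

Since $\epsilon$ is arbitrary, (3) follows.

If $A \in \mathfrak{A}$ is contained in $\cup_{n=1}^{\infty} A_n$, $A_n \in \mathfrak{A}$, $n \ge 1$, then $A = \cup_{n=1}^{\infty} A'_n$, where $A'_n = A \cap [A_n \setminus \cup_{k=1}^{n-1} A_k]$. Then, by the $\sigma$-additivity of $P$ on $\mathfrak{A}$ (Theorem 1.4.4), $P(A) = \sum_{n=1}^{\infty} P(A'_n) \le \sum_{n=1}^{\infty} P(A_n)$ as $A'_n \subset A_n$ for all $n \ge 1$. This shows that $P(A) \le P^*(A)$ and hence $P(A) = P^*(A)$ if $A \in \mathfrak{A}$. □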

Remark. The fact that $P$ and $P^*$ agree on $\mathfrak{A}$ is crucial in what follows. This is why it is so important that $P$ be $\sigma$-additive on $\mathfrak{A}$.

While $P^*$ is defined for all subsets of $\mathbb{R}$, it is not necessarily a probability. This raises the problem as to whether there is a natural class of sets on which it is a probability. The following way of solving this problem is due to a well-known Greek mathematician, C. Carathéodory. He observed that (i) the sets in any $\sigma$-algebra $\mathfrak{G}$ containing $\mathfrak{A}$ have a special property (see (C) below) provided the outer measure $P^*$ restricted to $\mathfrak{G}$ is a probability, and (ii) the collection of all sets with this property is in fact a $\sigma$-algebra, and the restriction of $P^*$ to this $\sigma$-algebra is a probability.

First, notice that because of property (3) of Definition 1.4.10, for any two sets $E$ and $Q$, one has $P^*(E) \le P^*(E \cap Q) + P^*(E \setminus Q)$. However, if $Q \in \mathfrak{A}$, then in fact

$$P^*(E) = P^*(E \cap Q) + P^*(E \cap Q^c) \tag{C}$$

for any set $E$ because $P$ and $P^*$ agree on $\mathfrak{A}$.

To prove (C) for sets $Q \in \mathfrak{A}$, let $\epsilon > 0$ and let $A_n \in \mathfrak{A}$ for all $n \ge 1$ be such that $E \subset \cup_{n=1}^{\infty} A_n$ and $\sum_{n=1}^{\infty} P(A_n) \le P^*(E) + \epsilon$. These sets $A_n$ may be assumed to be pairwise disjoint by Remark 1.2.3. Further, $A_n \cap Q^c \in \mathfrak{A}$ for all $n \ge 1$, since $Q \in \mathfrak{A}$.

Now

$$\begin{aligned} P^*(E \cap Q) + P^*(E \cap Q^c) &\le \sum_{n=1}^{\infty} \{P(A_n \cap Q)\} + \sum_{n=1}^{\infty} \{P(A_n \cap Q^c)\} \\ &= \sum_{n=1}^{\infty} \{P(A_n \cap Q) + P(A_n \cap Q^c)\} = \sum_{n=1}^{\infty} P(A_n) \le P^*(E) + \epsilon. \end{aligned}$$

Therefore, $P^*(E \cap Q) + P^*(E \cap Q^c) \le P^*(E)$, proving (C).

Now suppose that $\mathfrak{G} \supset \mathfrak{A}$ is a $\sigma$-algebra and that $P^*$ restricted to $\mathfrak{G}$ is a probability, say $R$. Then, since a $\sigma$-algebra is a Boolean algebra and $R$ is $\sigma$-additive on $\mathfrak{G}$, one could construct a new outer measure $R^*$ from $R$. As stated later in Exercise 1.5.4, in fact $R^* = P^*$. Therefore, it follows from what has just been proved for $\mathfrak{A}$ that condition (C) holds for all the sets in $\mathfrak{G}$.

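This suggests that one should look at the collection $\mathfrak{F}$ of all sets for which condition (C) holds, i.e.,

$$\mathfrak{F} = \{Q \mid P^*(E) = P^*(E \cap Q) + P^*(E \cap Q^c) \text{ for any } E \subset \mathbb{R}\}.$$

It will now be shown that

(i) $\mathfrak{F}$ is a $\sigma$-algebra (containing $\mathfrak{A}$), and
(ii) $P^*$ restricted to $\mathfrak{F}$ is a probability.

Hence, $(\mathbb{R}, \mathfrak{F}, P^*)$ is a probability space and $\mathfrak{F} \supset \mathfrak{A}$ (also, in view of Exercise 1.3.10 (b), $\mathfrak{F} \supset \mathfrak{B}(\mathbb{R})$, the $\sigma$-algebra of Borel sets).

It remains to verify (i) and (ii).

To verify (i), first note that $\Omega = \mathbb{R} \in \mathfrak{F}$ and that $A \in \mathfrak{F}$ implies $A^c \in \mathfrak{F}$. If $A_1, A_2 \in \mathfrak{F}$, then $A_1 \cup A_2 \in \mathfrak{F}$. To see this, let $E \subset \mathbb{R}$. Then, by (C), one has

$$P^*(E) = P^*(E \cap A_1) + P^*(E \cap A_1^c).$$

Since, by (C),

$$P^*(E \cap A_1) = P^*(E \cap A_1 \cap A_2) + P^*(E \cap A_1 \cap A_2^c),$$

and, again by (C),

$$P^*(E \cap A_1^c) = P^*(E \cap A_1^c \cap A_2) + P^*(E \cap A_1^c \cap A_2^c),$$

this implies that

$$\begin{aligned} P^*(E) = {} & P^*(E \cap A_1 \cap A_2) + P^*(E \cap A_1 \cap A_2^c) \\ & + P^*(E \cap A_1^c \cap A_2) + P^*(E \cap A_1^c \cap A_2^c). \end{aligned} \tag{1}$$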

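Since

$$E \cap (A_1 \cup A_2) = (E \cap A_1 \cap A_2) \cup (E \cap A_1 \cap A_2^c) \cup (E \cap A_1^c \cap A_2),$$

it follows from (1) and Definition 1.4.10 (3) that

$$P^*(E) \ge P^*(E \cap (A_1 \cup A_2)) + P^*(E \cap A_1^c \cap A_2^c).$$

Hence, $A_1, A_2 \in \mathfrak{F}$ implies that $A_1 \cup A_2 \in \mathfrak{F}$.

By now it should be clear that $\mathfrak{F}$ is a Boolean algebra. Therefore, $\mathfrak{F}$ is a $\sigma$-algebra provided $\cup_{n=1}^{\infty} A_n \in \mathfrak{F}$ whenever the $A_n \in \mathfrak{F}$ are pairwise disjoint.

To verify this, one has to show that for any set $E \subset \Omega$,

$$P^*(E) = P^*(E \cap (\cup_{n=1}^{\infty} A_n)) + P^*(E \cap (\cup_{n=1}^{\infty} A_n)^c)$$

when the $A_n \in \mathfrak{F}$ are pairwise disjoint. To do this, it will suffice to show that for all $n \ge 1$,

$$P^*(E) \ge \sum_{i=1}^{n} P^*(E \cap A_i) + P^*(E \cap (\cup_{m=1}^{\infty} A_m)^c). \tag{2}$$

The reason is that this inequality and Definition 1.4.10 (3) imply that

$$P^*(E) \ge \sum_{n=1}^{\infty} P^*(E \cap A_n) + P^*(E \cap (\cup_{n=1}^{\infty} A_n)^c) \ge P^*(E \cap (\cup_{n=1}^{\infty} A_n)) + P^*(E \cap (\cup_{n=1}^{\infty} A_n)^c).$$

In view of the validity of the opposite inequality, again by Definition 1.4.10 (3), one then has

$$P^*(E) = P^*(E \cap (\cup_{n=1}^{\infty} A_n)) + P^*(E \cap (\cup_{n=1}^{\infty} A_n)^c), \quad \text{i.e., } \cup_{n=1}^{\infty} A_n \in \mathfrak{F}.$$

Now (2) holds if

$$P^*(E) \ge \sum_{i=1}^{n} P^*(E \cap A_i) + P^*(E \cap (\cup_{i=1}^{n} A_i)^c). \tag{3}$$

Hence, to verify (2), it suffices to prove this last inequality (i.e., inequality (3)), as $(\cup_{i=1}^{n} A_i)^c \supset (\cup_{i=1}^{\infty} A_i)^c$. Note that (3) is equivalent by Definition 1.4.10 (3) to the identity

$$P^*(E) = \sum_{i=1}^{n} P^*(E \cap A_i) + P^*(E \cap (\cup_{i=1}^{n} A_i)^c). \tag{4}$$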

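To prove (4), let $A_1, \ldots, A_n$ be pairwise disjoint and in $\mathfrak{F}$. Then, by applying the defining property of $\mathfrak{F}$ first to $A_1$ using $E$, and then to $A_2$ using $E \cap A_1^c$ in place of $E$, it follows that

$$\begin{aligned} P^*(E) &= P^*(E \cap A_1) + P^*(E \cap A_1^c) \\ &= P^*(E \cap A_1) + P^*(E \cap A_1^c \cap A_2) + P^*(E \cap A_1^c \cap A_2^c) \\ &= P^*(E \cap A_1) + P^*(E \cap A_2) + P^*(E \cap A_1^c \cap A_2^c), \end{aligned}$$

since $A_1 \cap A_2 = \emptyset$.

Hence, (4) holds for $n = 2$. Note that this follows immediately from formula (1) if $A_1 \cap A_2 = \emptyset$. Assume that (4) is true for $n - 1$ pairwise disjoint sets $A_i$, i.e., for any $E \subset \mathbb{R}$,

$$P^*(E) = \sum_{i=1}^{n-1} P^*(E \cap A_i) + P^*(E \cap (\cup_{i=1}^{n-1} A_i)^c).$$

Apply the defining property of $\mathfrak{F}$ to $A_n$ and use the set $E \cap (\cup_{i=1}^{n-1} A_i)^c$. It then follows that

$$\begin{aligned} P^*(E \cap (\cup_{i=1}^{n-1} A_i)^c) &= P^*(E \cap (\cup_{i=1}^{n-1} A_i)^c \cap A_n) + P^*(E \cap (\cup_{i=1}^{n-1} A_i)^c \cap A_n^c) \\ &= P^*(E \cap A_n) + P^*(E \cap (\cup_{i=1}^{n} A_i)^c), \end{aligned}$$

since $A_n$ is disjoint from $A_1, \ldots, A_{n-1}$. Therefore,

$$P^*(E) = \sum_{i=1}^{n} P^*(E \cap A_i) + P^*(E \cap (\cup_{i=1}^{n} A_i)^c).$$

This completes the proof of (i). The proof of (ii) is given below.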

Definition 1.4.12. The $\sigma$-algebra $\mathfrak{F}$ of sets $Q$ that satisfy

$$P^*(E) = P^*(E \cap Q) + P^*(E \cap Q^c) \tag{C}$$

for all $E \subset \mathbb{R}$ is called the $\sigma$-algebra of $P^*$-measurable sets.
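Condition (C) can be checked exhaustively in a finite toy model, which may help in seeing what the definition selects. The sketch below is not the construction on $\mathbb{R}$: it assumes a hypothetical four-point set whose Boolean algebra is generated by the two "atoms" $\{0,1\}$ and $\{2,3\}$, each of probability $1/2$. In that setting the cheapest cover of a set $E$ consists of the atoms that meet $E$.

```python
from itertools import combinations
from typing import FrozenSet, List

OMEGA = frozenset({0, 1, 2, 3})
# Hypothetical toy algebra: its atoms and their probabilities.
ATOMS = {frozenset({0, 1}): 0.5, frozenset({2, 3}): 0.5}


def subsets(s: FrozenSet[int]) -> List[FrozenSet[int]]:
    """All subsets of the finite set s."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def outer(E: FrozenSet[int]) -> float:
    """Outer measure of E: total probability of the atoms that meet E."""
    return sum(p for atom, p in ATOMS.items() if atom & E)


def is_measurable(Q: FrozenSet[int]) -> bool:
    """Check condition (C) for Q against every test set E."""
    return all(
        abs(outer(E) - outer(E & Q) - outer(E - Q)) < 1e-12 for E in subsets(OMEGA)
    )


measurable = [Q for Q in subsets(OMEGA) if is_measurable(Q)]
```

In this toy case the measurable sets turn out to be exactly the four sets of the algebra. A set such as $\{0\}$ fails (C) with $E = \Omega$, since its outer measure and that of its complement add up to $1.5$ rather than $1$ (compare Exercise 1.5.5).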

The following theorem is the goal to which all these arguments have been leading.

Theorem 1.4.13. Let $F$ be a distribution function on $\mathbb{R}$.

(1) Then there is a unique probability $P$ on $\mathfrak{B}(\mathbb{R})$, the $\sigma$-algebra of Borel subsets of $\mathbb{R}$, such that $P((a, b]) = F(b) - F(a)$ whenever $a \le b$.

(2) The $\sigma$-algebra $\mathfrak{F}$ of $P^*$-measurable subsets contains $\mathfrak{B}(\mathbb{R})$, and $P^*$ restricted to $\mathfrak{F}$ is a probability such that $P^*((a, b]) = F(b) - F(a)$ whenever $a \le b$.

Furthermore, the $\sigma$-algebra $\mathfrak{F}$ of $P^*$-measurable sets is the largest $\sigma$-algebra $\mathfrak{G}$ containing $\mathfrak{A}$ with the property that the restriction of $P^*$ to $\mathfrak{G}$ is a probability.

Proof. First consider (2). It has been shown that $\mathfrak{F}$ is a $\sigma$-algebra. To prove that $P^*$ restricted to $\mathfrak{F}$ is a probability, it will suffice to verify $(P_3)$ of Definition 1.3.1 since $P^*$ obviously satisfies $(P_1)$ and $(P_2)$. Let $(A_n)_{n \ge 1} \subset \mathfrak{F}$ be pairwise disjoint. Since $P^*$ is an outer measure, it is countably subadditive, i.e., (3) of Definition 1.4.10 holds. Hence

$$P^*(\cup_{n=1}^{\infty} A_n) \le \sum_{n=1}^{\infty} P^*(A_n).$$

To prove the reverse inequality, note that $P^*(\cup_{n=1}^{\infty} A_n) \ge P^*(\cup_{n=1}^{N} A_n) = \sum_{n=1}^{N} P^*(A_n)$ in view of (4) (take $E = \cup_{n=1}^{N} A_n$). This completes the proof of (2) and of (ii) above.

To prove the first statement, let $P_1$ and $P_2$ be two probabilities on $\mathfrak{B}(\mathbb{R})$ such that $P_1((a, b]) = P_2((a, b]) = F(b) - F(a)$ whenever $a \le b$. Let $\mathfrak{M} = \{A \in \mathfrak{B}(\mathbb{R}) \mid P_1(A) = P_2(A)\}$. Then, by Exercise 1.3.7, $P_1$ and $P_2$ agree on $\mathfrak{A}$.

Exercise 1.4.14. Let $(\Omega, \mathfrak{F}, P)$ be a probability space and let $(A_n)_{n \ge 1} \subset \mathfrak{F}$. Show that

(1) if $A_n \subset A_{n+1}$ for all $n$, then $\lim_{n \to \infty} P(A_n) = P(\cup_{n=1}^{\infty} A_n)$,
(2) if $A_n \supset A_{n+1}$ for all $n$, then $\lim_{n \to \infty} P(A_n) = P(\cap_{n=1}^{\infty} A_n)$.

[Hint: recall Exercise 1.3.3.]

If $F$ is the distribution function of a probability $P$ on $\mathfrak{B}(\mathbb{R})$, show that

(3) $F(x-) = P((-\infty, x))$ for all $x \in \mathbb{R}$.

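Exercise 1.4.15. Show that the set $\mathfrak{M}$ defined above has the following two properties (where $(A_n)_{n \ge 1} \subset \mathfrak{M}$):

(1) $A_n \subset A_{n+1}$ for all $n$ implies $\cup_{n=1}^{\infty} A_n \in \mathfrak{M}$; and
(2) $A_n \supset A_{n+1}$ for all $n$ implies $\cap_{n=1}^{\infty} A_n \in \mathfrak{M}$,

i.e., $\mathfrak{M}$ is a so-called monotone class.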

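Exercise 1.4.16. (Monotone class theorem for sets) Let $\mathfrak{M}$ be a monotone class that contains a Boolean algebra $\mathfrak{A}$. Show that $\mathfrak{M}$ contains the smallest $\sigma$-algebra $\mathfrak{F}$ containing $\mathfrak{A}$. [Hints: let $\mathfrak{M}_0$ be the smallest monotone class containing $\mathfrak{A}$ (why does it exist?). Let $A \in \mathfrak{A}$. Show that $\{B \in \mathfrak{M}_0 \mid B \cup A \in \mathfrak{M}_0\}$ is a monotone class. Conclude that $B \in \mathfrak{M}_0$, $A \in \mathfrak{A}$ imply $B \cup A \in \mathfrak{M}_0$. Now fix $B_1 \in \mathfrak{M}_0$ and look at $\{B \in \mathfrak{M}_0 \mid B_1 \cup B \in \mathfrak{M}_0\}$. Conclude as before that $B_1, B \in \mathfrak{M}_0$ implies $B_1 \cup B \in \mathfrak{M}_0$. Make a similar argument to show that $\{B^c \mid B \in \mathfrak{M}_0\}$ is $\mathfrak{M}_0$. Conclude that $\mathfrak{M}_0 \supset \mathfrak{F}$.]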

The monotone class theorem has a function version, which is stated as Theorem 3.6.14. It gives conditions that ensure that a given collection $\mathcal{H}$ of bounded functions on $\Omega$ contains all the bounded random variables in the $\sigma$-algebra determined by a subset $\mathcal{C}$ of $\mathcal{H}$ that is closed under multiplication.

From these three exercises and Exercise 1.3.10 (b), one concludes that $\mathfrak{M} = \mathfrak{B}(\mathbb{R})$, as it contains $\mathfrak{A}$, and so (1) is established.

The last statement of the theorem is a consequence of Exercise 1.5.4 and the observation made following Proposition 1.4.11 that condition (C) holds for all the sets in $\mathfrak{G}$ if $\mathfrak{G}$ is a $\sigma$-algebra containing $\mathfrak{A}$ such that $P^*$ restricted to $\mathfrak{G}$ is a probability. □

Remark 1.4.17. It is easily verified that the collection $\mathfrak{M}$ defined above has the following properties: (i) $\Omega \in \mathfrak{M}$ and (ii) if $A, B \in \mathfrak{M}$ with $A \subset B$, then $B \setminus A \in \mathfrak{M}$. Dynkin proved (see Proposition 3.2.6) that the smallest such system $\mathcal{L}$ containing a collection $\mathfrak{C}$ of sets closed under finite intersections is the smallest $\sigma$-field containing $\mathfrak{C}$. This is another version of the monotone class theorem. It is proved in Exercise 3.2.5, part C, that any collection of sets $\mathcal{L} \supset \mathfrak{C}$ satisfying (i) and (ii) also contains the smallest Boolean algebra $\mathfrak{A}$ containing $\mathfrak{C}$ if the collection $\mathfrak{C}$ is closed under finite intersections. Taking $\mathfrak{C}$ as $\{(a, b] \mid -\infty \le a < b \le +\infty\}$, Proposition 3.2.6 also proves the uniqueness of the probability $P$ on $\mathfrak{B}(\mathbb{R})$ for which $P((a, b]) = F(b) - F(a)$.

5. ADDITIONAL EXERCISES*

Exercise 1.5.1. Let $(\Omega, \mathfrak{F}, P)$ be a probability space. Show that

(1) $P(\cup_{n=1}^{\infty} A_n) \le \sum_{n=1}^{\infty} P(A_n)$ (see Exercise 1.2.2 (6)); and
(2) $P(\cup_{n=1}^{\infty} A_n) = 0$ if $P(A_n) = 0$ for all $n \ge 1$.

Exercise 1.5.2. The so-called Heaviside function is the function $H$, where

$$H(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0. \end{cases}$$

Calculate for this distribution function the corresponding outer measure $P^*$, and determine the $\sigma$-algebra $\mathfrak{F}$ of $P^*$-measurable subsets of $\mathbb{R}$. [Hint: guess the $\sigma$-algebra $\mathfrak{F}$ and $P^*$. Then see if they "work".]

The resulting measure, denoted by $\epsilon_0$ or $\delta_0$, is called the Dirac measure at the origin or unit point mass at the origin. Replace $H(x)$ by $H(x - a) \stackrel{\text{def}}{=} H_a(x)$, and let $\epsilon_a$ denote the resulting measure. Give the formula for $\epsilon_a(A)$, $A \in \mathfrak{F}$, where $\mathfrak{F}$ is the corresponding $\sigma$-algebra of measurable sets. The measure $\epsilon_a$ is the Dirac measure or unit point mass at $a$.

Exercise 1.5.3. For the distribution function of the Poisson distribution (see Example 1.3.9 (3)), calculate the corresponding outer measure $P^*$ and determine the $\sigma$-algebra $\mathfrak{F}$ of $P^*$-measurable subsets of $\mathbb{R}$.

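Exercise 1.5.4. Let $\mathfrak{G} \supset \mathfrak{A}$ be a $\sigma$-algebra, and assume that $P^*$ restricted to $\mathfrak{G}$ is a probability $R$. Define the outer measure $R^*$ by setting $R^*(E) = \inf\{\sum_{n=1}^{\infty} R(B_n) \mid E \subset \cup_{n=1}^{\infty} B_n, B_n \in \mathfrak{G} \text{ for all } n\}$. Show that

(1) $R^*(E) = \inf\{R(B) \mid E \subset B, B \in \mathfrak{G}\}$,
(2) $R^*(E) \le P^*(E)$ for all $E \subset \mathbb{R}$,
(3) $R^*(E) = P^*(E)$ for all $E \subset \mathbb{R}$. [Hint: $R(B) = P^*(B)$.]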

Exercise 1.5.5. Show that $Q$ is $P^*$-measurable if and only if $P^*(Q) + P^*(Q^c) = 1$. This exercise may be done by going through the following steps.

(1) Let $Q, E$ be subsets of $\mathbb{R}$, and let $(A_m)$, $(B_n)$, and $(C_p)$ be pairwise disjoint collections of sets from $\mathfrak{A}$ such that (i) $E \subset \cup_p C_p = C$ and (ii) $Q \subset \cup_m A_m = A$, $Q^c \subset \cup_n B_n = B$. Show that

$$\begin{aligned} P^*(E \cap Q) + P^*(E \cap Q^c) &\le \sum_{m,p} P(A_m \cap C_p) + \sum_{n,p} P(B_n \cap C_p) \\ &\le \sum_{p} P(C_p) + \sum_{m,n,p} P(A_m \cap B_n \cap C_p) \\ &\le \sum_{p} P(C_p) + \sum_{m,n} P(A_m \cap B_n). \end{aligned}$$

[Hint: for the second inequality use Exercise 1.2.2 (5).]

(2) If $(A_m)$ and $(B_n)$ are two collections of pairwise disjoint sets from $\mathfrak{A}$ such that (i) $\mathbb{R} = A \cup B$, where $A = \cup_{m=1}^{\infty} A_m$ and $B = \cup_{n=1}^{\infty} B_n$, and (ii) $\sum_m P(A_m) + \sum_n P(B_n) \le 1 + \epsilon$, show that $\sum_{m,n} P(A_m \cap B_n) \le \epsilon$. [Hint: make use of Exercise 1.2.2 (5) to compute $P(A_m) + P(B_n)$ and use the fact that $\mathbb{R} = \cup_{m,n} A_m \cup B_n$.]

(3) If $P^*(Q) + P^*(Q^c) = 1$ and $E \subset \mathbb{R}$, show that for any $\epsilon > 0$ one has $P^*(E \cap Q) + P^*(E \cap Q^c) \le P^*(E) + \epsilon$.

Remarks. Result (2) says that $P(A \cap B) \le \epsilon$, i.e., the overlap of $A$ and $B$ is small (relative to $P$) if (i) and (ii) hold. It will help to realize that all the sets in $\mathfrak{A}_\sigma$ are Borel and that, for example, $\sum_m P(A_m) = P(A)$, where $P$ is the probability on $\mathfrak{B}(\mathbb{R})$. It is also of some interest to realize that all the above computations can be done without knowing that $P$ on $\mathfrak{A}$ has an extension as a probability to $\mathfrak{B}(\mathbb{R})$. As a result, the $P^*$-measurable sets $Q$ may be defined by the equation $P^*(Q) + P^*(Q^c) = 1$ and Theorem 1.4.13 proved using this definition (see Neveu [N1]).

Exercise 1.5.6. (The Bolzano-Weierstrass property) This exercise characterizes the compact subsets of $\mathbb{R}$.

Part A. Let $A$ be a subset of $\mathbb{R}$. Show that $A$ is compact if and only if it is closed and bounded. [Hints: if $A$ is closed and bounded, then for some $N > 0$ one has $A \subset [-N, N]$; if $A \subset \cup_i O_i$, then $[-N, N] \subset \cup_i O_i \cup A^c$; now use the Heine-Borel Theorem 1.4.5. For the converse, observe $A \subset \cup_n (-n, n)$; and if $x_0 \in A^c$ and no open interval $(x_0 - \frac{1}{n}, x_0 + \frac{1}{n}) \subset A^c$, look at the open sets $[x_0 - \frac{1}{n}, x_0 + \frac{1}{n}]^c$.]

Part B. Let $A$ be a compact subset of $\mathbb{R}$, and let $(x_n)_{n \ge 1}$ be a sequence of points of $A$. Let $S = \{x_n \mid n \ge 1\}$. If the sequence has no convergent subsequence, show that

(1) $S$ is infinite [Hint: what happens if it is finite?],
(2) $S$ is closed [Hint: if $a \notin S$, show that some open interval centered at $a$ is disjoint from $S$; otherwise what happens?],
(3) for each point $s$ of $S$, there is an open interval centered at $s$ containing no other point of $S$. [Hint: start off with the first point.]

Show that (3) contradicts the assumption that $A$ is compact [use (2)]. Conclude that $(x_n)_{n \ge 1}$ has a convergent subsequence.

Part C. Let $A$ be a subset of $\mathbb{R}$, and assume that every sequence $(x_n)_{n \ge 1}$ of points in $A$ has a subsequence that converges to a point of $A$. Show that

(1) $A$ is closed [Hint: consider the hint for Part B (2)],
(2) $A$ is bounded. [Hint: if a sequence converges it is bounded.]

This exercise emphasizes the importance of the Bolzano-Weierstrass property (see Royden [R3]) for a set: every sequence (in the set) has a convergent subsequence (convergent to a point in the set). It is equivalent to the property of compactness.