Introduction to Stochastic Processes

August 30, 2006

Contents

1 Measure space and random variables

2 Integration, Expectation and Independence

3 The art of conditioning

4 Martingales

5 Martingale convergence problems

6 Continuous time processes: the Wiener process or Brownian motion

7 Diffusions and Ito processes

Abstract

These notes have a two-fold use: they contain both the material (albeit slightly re-shuffled) of the course on this topic taught by the above authors in Fall 2003, as well as extra notes where we feel that the book on 'Basic Stochastic Processes' is slightly too ephemeral.

1 Measure space and random variables

Definition 1.1 A probability space is a triple (Ω,F ,P) with the following properties:

• The sample space of outcomes Ω is a non-empty set;

• the set of observable events F is a σ-algebra over Ω. This means that F is a collection of subsets of Ω with the following properties:

i) Ω ∈ F ;

ii) B ∈ F ⇒ Ω \B ∈ F ;


iii) if (Bn)n∈N is a sequence of events in F, then ∪_{n=1}^∞ Bn ∈ F.

F can be interpreted as the amount of information of Ω that can be observed. The smaller F, the less information we have of Ω.

• P is a probability measure on (Ω,F), i.e. P : F → [0, 1] with the properties

i) P{Ω} = 1;

ii) for (Bn)n∈N a sequence of mutually disjoint events in F, i.e. Bi ∩ Bj = ∅ for i ≠ j, one has P{∪_{i=1}^∞ Bi} = ∑_{i=1}^∞ P{Bi} (σ-additivity).


Problem 1.1 Check that B1, B2, . . . ∈ F implies ∩_{i=1}^∞ Bi ∈ F, i.e. the intersection of countably many elements of F belongs to F.

The Borel σ-algebra B(R^d) over Ω = R^d is the intersection of all σ-algebras containing the open sets in R^d. It is the smallest σ-algebra containing all open sets in R^d.

Problem 1.2 Show that all one-point sets {x}, x ∈ R, belong to B(R). Show that Q belongs to B(R).

The σ-algebra σ(A) generated by a subset A ⊆ P(Ω) is the intersection of all σ-algebras containing A:

σ(A) := ∩ {B : B is a σ-algebra over Ω with A ⊆ B}.

Then B(R) is the σ-algebra generated by e.g. the open intervals (−a, b), a, b ∈ Q.

Problem 1.3 Let Ω = Z+. Suppose that A = {{i} | i ∈ Z+} is the collection of all one-point sets. Determine the minimal σ-algebra containing A.


Problem 1.4 Let V ⊂ N. Let 𝒱 be the class of subsets V for which the '(Cesàro) density'

γ(V) = lim_{n→∞} #(V ∩ {1, . . . , n})/n

exists. Give an example of sets V, W ∈ 𝒱 for which V ∩ W ∉ 𝒱. Hence, 𝒱 is not a σ-algebra.

Problem 1.5 Let Ω = {0, 1}^{Z+}, i.e. Ω = {(ω1, ω2, . . .) : ωn ∈ {0, 1}, n = 1, 2, . . .}. Define

F = σ({ω : ωn = k}, n ∈ Z+, k ∈ {0, 1}).

Describe F in words. Show that F contains the following sets: (i) An = {ω : ωi = 0, i > n}; (ii) {ω : ∑_{i=1}^∞ ωi < ∞}; (iii) {ω : ∑_{i=1}^∞ ωi 2^{-i} < 1/3}; (iv) {ω : lim_{n→∞} ∑_{i=1}^n ωi/n = 1/2}.


Probability measure A statement S about points ω ∈ Ω is said to hold almost everywhere (a.e.) if

{S} = {ω | S(ω) is true} ∈ F, and P{S} = 1.

As an example of a simple probability space, take Ω = {±1}^n, F = P(Ω) (power set or collection of all subsets), and P the Laplace measure on Ω, i.e.

P{B} = #(B)/#(Ω).

σ-algebras are complicated objects. It is often easier to work with π-systems.

A collection I of subsets of Ω is called a π-system, if it is invariant under intersection:

I1, I2 ∈ I → I1 ∩ I2 ∈ I.


Lemma 1.1 Let µ1, µ2 be two probability measures on (Ω, σ(I)), such that µ1 = µ2 on I. Then µ1 = µ2 on σ(I). That is, if two probability measures agree on a π-system, then they agree on the σ-algebra generated by the π-system.

Problem 1.6 Give a π-system I, such that σ(I) = B([0, 1]).

Let Ω = [0, 1] and F = B([0, 1]) the Borel sets on [0, 1]. The Lebesgue measure P = λ 'measures' the length of an interval:

λ(a, b] = b− a.

It is not trivial to prove that λ can be extended to a probability measure on ([0, 1], B([0, 1])).

Let (Ω, F, P) be a probability space and let {An}n∈N be a sequence of events (An ∈ F, n = 1, . . .). Then

lim sup_{n→∞} An := ∩m ∪_{n≥m} An = {An i.o.}

(i.o. = infinitely often). Explanation: x ∈ lim sup_n An iff x ∈ ∪_{n≥m} An for all m. Then x ∈ lim sup_n An iff for all m there exists n ≥ m such that x ∈ An. Similarly

lim inf_{n→∞} An := ∪m ∩_{n≥m} An = {An eventually}.

Then x ∈ lim inf_n An iff there exists m such that x ∈ ∩_{n≥m} An. That is, x ∈ lim inf_n An iff x belongs to all An except at most finitely many.

Problem 1.7 Prove that lim infn→∞ An ⊂ lim supn→∞ An.

The notation An ↑ A means: An ⊂ An+1, n ∈ N, A = ∪n An; An ↓ A means that An ⊃ An+1, A = ∩n An.

Lemma 1.2 (Monotone convergence of the measure of a set) i) An ↑ A implies P{An} ↑ P{A};

ii) An ↓ A implies P{An} ↓ P{A}.

Problem 1.8 Prove this lemma, see the hint on BSP p.3. (BSP=Basic Stochastic Processes)


Note that for (ii) it is crucial that we consider probability measures. In case of a general measure µ, (ii) does not necessarily hold when µ(Ω) = ∞. An example: the Lebesgue measure on R with An = (n, ∞). Then µ(An) = ∞, and A = ∩n An = ∅, µ(∅) = 0.

Lemma 1.3 (Fatou Lemma for sets) i) P{lim inf_{n→∞} An} ≤ lim inf_{n→∞} P{An};

ii) P{lim sup_{n→∞} An} ≥ lim sup_{n→∞} P{An}.

In this case, (ii) requires finiteness of the measure at play.

Problem 1.9 Give an example where (ii) does not hold.

Proof of the Fatou Lemma. We prove (ii). Let Gm = ∪_{n≥m} An; then Gm ↓ G = lim sup_{n→∞} An (why?). Hence P{Gm} ↓ P{G}. Since P{Gm} ≥ P{An} for n ≥ m, we have that P{Gm} ≥ sup_{n≥m} P{An}. Hence

P{G} = ↓lim_{m→∞} P{Gm} ≥ ↓lim_{m→∞} sup_{n≥m} P{An} = lim sup_{n→∞} P{An}.

QED

Problem 1.10 Prove statement (i) of the Fatou Lemma.

Lemma 1.4 (First Borel-Cantelli Lemma) Suppose that ∑_{n=1}^∞ P{An} < ∞. Then

P{lim sup_{n→∞} An} = P{An i.o.} = 0.

Applications of this lemma come later, after introducing the notion of independence.
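As a quick numerical illustration (not part of the notes): the Python sketch below simulates events An with P{An} = 1/n², whose probabilities form a convergent series, and checks that along each sample path only finitely many An occur, as the first Borel-Cantelli lemma predicts. The probabilities, horizon and number of paths are arbitrary choices for the illustration.

import numpy as np

rng = np.random.default_rng(0)
n_max = 100_000                                  # consider A_1, ..., A_{n_max} per sample path
probs = 1.0 / np.arange(1, n_max + 1) ** 2       # P{A_n} = 1/n^2, a convergent series

for path in range(5):
    occurred = rng.random(n_max) < probs         # indicator of A_n along this omega
    hits = occurred.nonzero()[0]
    last = hits[-1] + 1 if hits.size else 0
    print(f"path {path}: {occurred.sum()} events occurred, last one at n = {last}")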

Random variables What functions on a probability space (Ω,F ,P) are consistent with the σ-algebra F? These are the measurable functions.

Definition 1.2 A map X : Ω → R is called (F-)measurable if X^{-1}(B) ∈ F for all B ∈ B(R).

In other words:

{X ∈ B} := {ω : X(ω) ∈ B} = X^{-1}(B)

is an observable event for all B ∈ B(R). In the probabilistic context a measurable, real-valued function is called a random variable. For the present we stick to speaking of measurable functions.

If Ω = R^k and F = B(R^k), then we call X a Borel function.

Problem 1.11 Let Ω = [0, 1] and let A ⊊ Ω, A ≠ ∅. Determine the minimal σ-algebra F containing A. Classify all (Ω, F)-measurable functions.


The building blocks of these functions are the elementary functions: let A1, . . . , An ∈ F be disjoint (Aj ∩ Ai = ∅, j ≠ i) and let a1, . . . , an ∈ R. Then

f(ω) = ∑_{i=1}^n ai 1_{Ai}(ω)

is an elementary function or a simple function. Here 1_{Ai} is the indicator function of Ai, i.e.

1_{Ai}(ω) = 1 if ω ∈ Ai, and 0 otherwise.

Problem 1.12 Show that an elementary function is measurable.

Problem 1.13 Let Ω = R, and F = B(R). Show that the function X : R → R, defined by

X(ω) = 1 if ω ∈ Q, and 0 if ω ∈ R \ Q,

is an elementary function.

In order to show that limits of elementary functions are measurable, we need the following elementary results on measurability.

Lemma 1.5 i) f^{-1} preserves set operations:

f^{-1}(∪α Aα) = ∪α f^{-1}(Aα); f^{-1}(A^c) = (f^{-1}(A))^c, . . .

ii) If C ⊆ B(R) is a collection of sets generating B(R), that is σ(C) = B(R), then f^{-1}(C) ∈ F for all C ∈ C implies that f is F-measurable.

iii) The function g : Ω → R is measurable, if

{g ≤ x} = {ω : g(ω) ≤ x} ∈ F, for all x ∈ R.

Proof. The proof of (i) is straightforward. For the proof of (ii) let C(B) be the collection of elements B (which are sets!) of B(R) with f^{-1}(B) ∈ F. By (i), C(B) is a σ-algebra; by assumption C(B) contains C, hence C(B) contains B(R). (iii) follows from (ii) when we take C = π(R), the class of intervals of the form (−∞, x]. QED

Problem 1.14 Let Ω = R, F = B(R). Show that f : R → R given by f(x) = cos(x) is measurable.

Measurability is preserved under a number of operations.

Lemma 1.6 (Sums and products are measurable) If f, g are measurable and λ ∈ R, then f + g, f · g and λf are measurable.


Proof (partial). It is sufficient by the previous lemma to check that {f + g > x} ∈ F (why?). Now, f(ω) + g(ω) > x iff f(ω) > x − g(ω). Hence there exists q_ω ∈ Q such that f(ω) > q_ω > x − g(ω). It follows that

{f + g > x} = ∪_{q∈Q} ({f > q} ∩ {g > x − q}),

the latter of which is a countable union of elements of F. QED

Lemma 1.7 (Composition lemma) If f is F-measurable and g a Borel function, then the composition g ∘ f is F-measurable.

In the next lemma, we may allow that the limits have values ±∞. All results can be extended to this case, but here we restrict to finite limits. This lemma ensures that non-decreasing limits of elementary functions are measurable. Most 'reasonable' functions fall into this category.


Lemma 1.8 (Measurability of infs, liminfs and lims) Let f1, . . . be a sequence of measurable functions. Then (i) inf_n fn, sup_n fn, (ii) lim inf_n fn, lim sup_n fn are measurable (provided these limits are finite); moreover (iii) {ω : lim_n fn(ω) exists} ∈ F.

Proof. For (i), use {ω : inf_n fn(ω) ≥ x} = ∩_n {ω : fn(ω) ≥ x}. For (ii) let ln(ω) = inf_{m≥n} fm(ω). Then ln is measurable by (i). Moreover,

l(ω) := lim inf fn(ω) = ↑lim_n ln(ω) = sup_n ln(ω),

and so {l ≤ x} = ∩_n {ln ≤ x} ∈ F. For (iii), note that

{lim_n fn exists} = {lim sup fn < ∞} ∩ {lim inf fn > −∞} ∩ {lim sup fn − lim inf fn = 0}.

QED

Problem 1.15 We did not prove the case of sup and lim sup. How does this follow from the inf and lim inf case?

The uniqueness lemma for measures allows us to deduce results on σ-algebras from results on π-systems for these σ-algebras. There is a similar result for measurable functions. The following theorem allows us to deduce results for general measurable functions from results on indicator functions of elements from a π-system for the σ-algebra at hand! This version is taken from Williams' book; most versions tend to be formulated as assertions on σ-algebras.

Theorem 1.9 ((Halmos) Monotone class Theorem: elementary version) Let H be a class of bounded functions from a set S into R, satisfying the following conditions:

i) H is a vector space over R (i.e. it is an Abelian group w.r.t. addition of functions, it is closed under scalar multiplication by real scalars, such that (αβ)f = α(βf), (−1)f = −f and (α + β)f = αf + βf, for f ∈ H, α, β ∈ R);

ii) if fn, n = 1, 2, . . ., is a sequence of non-negative functions in H such that fn ↑ f, with f bounded, then f ∈ H;

iii) the constant function 1 is an element of H.

If H contains the indicator function of every set in a π-system I, then H contains every bounded σ(I)-measurable function.


Coin tossing Let Ω = {0, 1}^N. So, Ω = {(ω1, ω2, . . .) : ωn ∈ {0, 1}, n = 1, . . .}. Define

F = σ({ω : ωn = k} : n ∈ N, k ∈ {0, 1}).

Let Xn(ω) be the projection on the n-th co-ordinate: Xn(ω) = ωn. It is the result of the n-th toss. By definition of F, Xn is a random variable. By Lemma 1.6

Sn = X1 + · · · + Xn = number of ones in n tosses

is a random variable. Next, for x ∈ [0, 1],

{ω : (number of ones in n tosses)/n → x} = {ω : lim sup Sn/n = x} ∩ {ω : lim inf Sn/n = x} ∈ F

by Lemma 1.8. Note that this means that the Strong Law of Large Numbers is a meaningful result!

Problem 1.16 Define P{ω : ω1 = x1, . . . , ωn = xn} = 1/2^n, where x1, . . . , xn ∈ {0, 1}. Assume that this can be extended to a probability measure on Ω. Prove the following assertions:

i) E = {ω : ∑_n ωn < ∞} ∈ F, and P{E} = 0.

ii) The function X(ω) = ∑_n ωn 2^{-n} is a random variable.

iii) λ(a, b] = P{X ∈ (a, b]} for all intervals (a, b] ⊂ [0, 1].

iv) λ(B) = P{X ∈ B} for all Borel sets B ⊂ [0, 1]. Hence X has the uniform distribution on [0, 1].
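A small simulation sketch (an illustration only, not part of the problem) of the construction in Problem 1.16: truncating X(ω) = ∑_n ωn 2^{-n} at finitely many fair coin tosses already produces an approximately uniform sample on [0, 1]. The truncation level and sample size are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
n_digits, n_samples = 30, 100_000
# omega_1, omega_2, ... are i.i.d. fair coin tosses; X = sum_n omega_n 2^{-n}
omega = rng.integers(0, 2, size=(n_samples, n_digits))
weights = 0.5 ** np.arange(1, n_digits + 1)
X = omega @ weights

# compare the empirical distribution function with F(x) = x on a grid
grid = np.linspace(0.05, 0.95, 10)
emp = [(X <= x).mean() for x in grid]
print(np.round(emp, 3))   # should be close to the grid values themselves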

σ-algebra generated by a random variable or a collection of these Suppose we have a collection of random variables Xt : Ω → R, t ∈ I, where I is some index set. Then

X = σ(Xt : t ∈ I)

is defined to be the smallest σ-field such that each random variable Xt is X-measurable. It follows that X ⊂ F! One can view σ(Xt : t ∈ I) as the information carried by the random variables Xt, t ∈ I. For instance, observing an outcome y = X1(ω), we can only retrieve the set X1^{-1}(y) that ω belongs to, and in general not the precise point ω that produced outcome y. Compared to the σ-algebra F, we lose information by observing the outcome of a random variable, and so the σ-algebras σ(X1), σ(X1, X2), . . . , X are sub-σ-algebras of F. It makes sense that observing more outcomes y1 = X1(ω), y2 = X2(ω), . . ., provides us more information as to the precise point ω that produced these outcomes. This is consistent with the fact that e.g. σ(X1, . . . , Xn) ⊃ σ(X1, . . . , Xn−1): the more outcomes we observe, the bigger ('finer') the generated σ-algebra.

How can we build the σ-algebra X if e.g. the index set I = N? π-systems help us here: let Xn = σ(Xk : k ≤ n); then ∪n Xn is a π-system that generates σ(Xn : n ∈ N).

Problem 1.17 Let Ω = [0, 1], F = B([0, 1]), and

X1(ω) = 1 if ω ≤ 1/5, and 0 if ω > 1/5;

X2(ω) = −1 if ω ≤ 1/2, 0 if 1/2 < ω ≤ 3/4, and 2 if ω > 3/4.

Determine σ(X1), σ(X2) and σ(X1, X2). Describe all σ(X1, X2)-measurable functions.


Problem 1.18 Let Ω = R, F = B(R). For X(ω) = cos(ω), determine σ(X). Is Y, defined by Y(ω) = sin(ω), σ(X)-measurable?

Problem 1.19 Prove that the σ-algebra σ(X) generated by the random variable X is given by

σ(X) = X^{-1}(B) := {{ω | X(ω) ∈ B} : B ∈ B},

and that σ(X) is generated by the π-system

π(X) := {{ω | X(ω) ≤ x} : x ∈ R}.

How can one characterise π-systems generating σ(X1, . . . , Xn) and X? Explain.

Theorem 1.10 Let (Ω, F) be a measurable space. Let Ω1 be another space and f : Ω1 → Ω a function. Let F1 = σ(f^{-1}(A), A ∈ F) = f^{-1}(F) be the σ-algebra generated by the inverse images of A ∈ F under f. Then a function g : Ω1 → R is F1-measurable if and only if there exists an F-measurable function h : Ω → R such that g = h ∘ f.

An application of the above theorem is the Doob-Dynkin lemma.

Lemma 1.11 (in BSP: Doob-Dynkin lemma) Let X : Ω → R be a random variable. Then Y : Ω → R is σ(X)-measurable if and only if there exists a Borel function f : R → R such that Y = f(X).

The lemma can be proved by first proving it for elementary functions and then extending this to positive and then to general measurable functions.

Problem 1.20 Show how the Doob-Dynkin lemma follows from Theorem 1.10. Suppose that X is an elementary function. Show the assertion of the Lemma by explicitly constructing σ(X) and by subsequently specifying how to choose f.

2 Integration, Expectation and Independence

It is convenient here to assume a general measure µ, i.e. we have a measure space (Ω, F, µ). As a reminder: we say that an event A ∈ F occurs µ-a.s. (almost surely), or µ-a.e. (almost everywhere), if µ(A^c) = 0. In case that µ is a probability measure, we can also say that this event occurs with probability 1.

For a non-negative elementary function f = ∑_{i=1}^n ai 1_{Ai}, ai ≥ 0, i = 1, . . . , n, we define

∫ f dµ = ∑_{i=1}^n ai µ(Ai).

For general positive, measurable functions f, the integral can be defined by

∫ f dµ = lim_{n→∞} ∫ fn dµ,

where fn, n = 1, . . ., is a non-decreasing sequence of elementary functions with fn ↑ f, n → ∞.

For example, one can choose

fn(ω) = n if f(ω) > n, and fn(ω) = (i − 1)2^{-n} if (i − 1)2^{-n} < f(ω) ≤ i 2^{-n} ≤ n, i = 1, . . . , n 2^n.
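A sketch (an illustration only; the notes give no code) of this dyadic approximation applied to f(x) = x² on ((0, 1], B, λ): the integrals ∫ fn dλ increase to ∫ f dλ = 1/3. Here λ of each level set is computed exactly because the level sets of this monotone f on (0, 1] are intervals.

import math

def lower_dyadic_integral(n):
    """Integral of the n-th dyadic approximation f_n of f(x) = x^2 w.r.t. Lebesgue
    measure on (0, 1]: f_n = (i-1)2^{-n} on {(i-1)2^{-n} < f <= i 2^{-n}}, i = 1, ..., n 2^n,
    and f_n = n on {f > n} (which has measure 0 here for n >= 1)."""
    total = 0.0
    for i in range(1, n * 2 ** n + 1):
        lo, hi = (i - 1) * 2.0 ** (-n), i * 2.0 ** (-n)
        # {lo < x^2 <= hi} intersected with (0, 1] is (sqrt(lo), min(sqrt(hi), 1)]
        length = max(0.0, min(math.sqrt(hi), 1.0) - min(math.sqrt(lo), 1.0))
        total += lo * length
    return total

for n in (1, 2, 4, 8, 12):
    print(n, lower_dyadic_integral(n))   # increases towards 1/3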


Problem 2.1 These approximating elementary functions fn are σ(f)-measurable. Prove this.

For general measurable f, write f = f^+ − f^−, f^+, f^− ≥ 0: f^+ = max(f, 0), f^− = max(−f, 0). Then f is integrable if at least one of ∫ f^+ dµ, ∫ f^− dµ is finite; if both are finite we call f summable! N.B. this is slightly different from Definition 1.9 of BSP. N.B. this stepwise argument from elementary functions, via positive functions, to general functions is part of a standard proof machine. Later on it will be used for stochastic integrals.

Problem 2.2 Let Ω = (0, 1], F = B(R), µ = λ. Let f = 1_{Q∩(0,1]}. Calculate ∫ f dλ.

Problem 2.3 i) Suppose that µ(f ≠ 0) = 0 for some measurable function f (not necessarily non-negative!). Prove that ∫ f dµ = 0.

ii) Let µ(f < 0) = 0 (i.e. f ≥ 0 µ-a.e.). Prove that ∫ f dµ ≥ 0.

iii) Let f be a measurable function with µ(f < 0) = 0. Prove that ∫ f dµ = 0 implies µ(f > 0) = 0, i.e. f ≡ 0 µ-a.e. Give an example of a measure space and a function f with f ≢ 0, and ∫ f dµ = 0.

The next step is to formulate a number of basic convergence theorems giving conditions under which integral and limit may be interchanged. These conditions amount to requiring positivity (positive functions are always integrable, there are no problems of subtracting ∞ from ∞) or some well-behaved dominating function.


Theorem 2.1 (Monotone Convergence Theorem) Suppose that 0 ≤ fn ↑ f µ-a.e. (i.e. µ(∪n {fn < 0} ∪ {f < 0} ∪ {fn ↑ f fails}) = 0). Then

lim_{n→∞} ∫ fn dµ = ∫ lim_{n→∞} fn dµ = ∫ f dµ. (2.1)

(Dominated Convergence Theorem) Suppose that fn → f µ-a.e., and |fn| ≤ g µ-a.e. with g a µ-summable function. Then

∫ |fn − f| dµ → 0, n → ∞,

and in particular (2.1) holds.

Lemma 2.2 (Fatou's Lemma) (BSP, p. 109) If fn ≥ 0 µ-a.e., then

∫ lim inf_n fn dµ ≤ lim inf_n ∫ fn dµ.

Proof. Let gn := inf_{k≥n} fk. gn is measurable and gn ↑ lim inf_k fk. Then fk ≥ gn for k ≥ n. Hence ∫ fk dµ ≥ ∫ gn dµ, k ≥ n (see Problem 2.3 (iii)). By monotone convergence ∫ gn dµ ↑ ∫ lim inf_k fk dµ, and so

∫ lim inf_k fk dµ = ↑lim_n ∫ gn dµ ≤ ↑lim_n inf_{k≥n} ∫ fk dµ = lim inf_n ∫ fn dµ.

QED


Problem 2.4 There is a limsup version of Fatou's lemma:

∫ lim sup_n fn dµ ≥ lim sup_{n→∞} ∫ fn dµ.

Provide conditions on the sequence fn, n = 1, . . ., such that this version follows from the above Fatou's lemma.

Problem 2.5 Let Ω = (0, 1]; F = B(0, 1] and µ = λ, the Lebesgue measure. Let

fn = n 1_{(0,1/n]}.

Compute lim_n fn and lim_n ∫ fn dλ. Compare this with the statements in the Monotone Convergence Theorem, Dominated Convergence Theorem and Fatou's Lemma. Which results fail and why?
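A quick numerical check (illustration only) of the escape-of-mass phenomenon in Problem 2.5: fn = n 1_{(0,1/n]} converges to 0 at every fixed point of (0, 1], yet ∫ fn dλ = 1 for every n, so no summable dominating function can exist.

import numpy as np

def f(n, x):
    """f_n = n * indicator of (0, 1/n]."""
    return n * ((x > 0) & (x <= 1.0 / n))

x = np.array([0.9, 0.5, 0.1, 0.01])
for n in (1, 10, 100, 1000):
    values = f(n, x)
    integral = n * (1.0 / n)          # lambda((0, 1/n]) = 1/n, so the integral is always 1
    print(n, values, integral)
# each column of `values` is eventually 0, yet every integral equals 1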

Problem 2.6 Let fn, n = 1, . . ., and f be measurable functions, with the property that ∫ |fn − f| dµ → 0, n → ∞. Does this imply that fn → f, n → ∞, µ-a.e.? Unfortunately not in general: choose Ω = (0, 1], F = B(0, 1] and

f_{2^n+i} = 1_{(i·2^{-n},(i+1)·2^{-n}]}, i = 0, . . . , 2^n − 1, n = 0, 1, . . .

Calculate ∫ fk dλ, and investigate whether the limits lim_{k→∞} fk and lim_{k→∞} ∫ fk dλ exist.

In order to be able to define conditional expectations later on, we need the following result.

Theorem 2.3 (Radon-Nikodym, BSP p. 28) Let (Ω, F) be given. Suppose that µ is a σ-finite measure, i.e. there are events An, n = 1, . . . ∈ F, with ∪ An = Ω and µ(An) < ∞ for n = 1, . . .. Suppose further that ν is µ-absolutely continuous, i.e. µ(A) = 0 implies ν(A) = 0. Then there exists a measurable function f ≥ 0, which is integrable w.r.t. µ, such that

ν(A) = ∫_A f dµ = ∫ f 1_A dµ.

Notation: f = dν/dµ is called the density or Radon-Nikodym derivative of ν w.r.t. µ.

A consequence of the Theorem, for measurable functions g integrable w.r.t. ν, is that

∫ g dν = ∫ g · f dµ. (2.2)

Back to random variables and probability measures In general, when speaking of random variables, we define these in terms of the outcomes (values X can take) and a probability distribution on the space of outcomes. The underlying probability space (Ω, F, P) is mostly left undefined and its role is hidden.

It can be useful to know a way of constructing an underlying probability space. However, first wewill discuss some notation and concepts for random variables related to integration.

Suppose that (Ω, F, P) is given as well as the random variable X : Ω → R. Then PX given by

PX{A} = P{ω : X(ω) ∈ A}

is a probability measure on (R, B(R)) by virtue of the so-called 'overplantingsstelling' (image measure or change-of-variables theorem).


Theorem 2.4 (Overplantingsstelling) Let (Ω, F, µ) be a measure space. Suppose that (Ω′, F′) is a measurable space. Let f : Ω → Ω′ be an F–F′-measurable function in the sense that f^{-1}(A′) ∈ F for all A′ ∈ F′. Then the function

µ′(A′) = µ(f^{-1}(A′)), A′ ∈ F′,

is a measure on F′. Moreover, for any F′-measurable function g : Ω′ → R, one has

∫_Ω g(f) dµ = ∫_{Ω′} g dµ′,

in the sense that both integrals exist and are equal, whenever at least one of them exists.

Problem 2.7 Prove this theorem.

PX is called the probability distribution of X. Since {(−∞, x], x ∈ R} is a π-system generating B(R), the uniqueness lemma 1.1 implies that it is sufficient to specify the values

FX(x) = PX{(−∞, x]} = P{X ≤ x},

which is called the (probability) distribution function of X.

Problem 2.8 Show that FX has the following properties:

i) FX : R → [0, 1], and FX is non-decreasing;

ii) limx→−∞ FX(x) = 0, limx→∞ FX(x) = 1;

iii) FX is right-continuous.

The function FX provides a nice tool for the construction of random variables with a given distribution function.

Let a function F with properties (i, ii, iii) be given. Then again there is a unique (why?) probability measure p on (R, B(R)) with

p(−∞, x] = F(x).

Choose Ω = R, F = B(R) and P = p, and set X(ω) = ω. We have PX = p.

We can also construct X on (Ω, F, P) = ([0, 1], B[0, 1], λ): set

X(ω) = inf{y : F(y) ≥ ω} (= sup{z : F(z) < ω}).

This is called the Skorokhod representation.

Problem 2.9 Show that FX = F.
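A sketch of how the Skorokhod representation is used in practice (inverse-transform sampling; the distribution Exp(1) below is chosen here purely as an example): on ([0, 1], B[0, 1], λ) set X(ω) = inf{y : F(y) ≥ ω}. When F is continuous and strictly increasing this is simply F^{-1}(ω).

import numpy as np

def skorokhod_sample(F_inv, n, rng):
    """Draw n samples with distribution function F by applying the (generalised)
    inverse of F to uniform variates omega, which play the role of the point in [0, 1]."""
    omega = rng.random(n)
    return F_inv(omega)

rng = np.random.default_rng(2)
# Exp(1): F(y) = 1 - exp(-y) for y >= 0, so F^{-1}(u) = -log(1 - u)
samples = skorokhod_sample(lambda u: -np.log(1.0 - u), 100_000, rng)
print(samples.mean(), (samples > 1.0).mean())   # approx. 1 and exp(-1) = 0.3679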

If the probability measure PX is absolutely continuous w.r.t. the Lebesgue measure, then PX has a probability density function fX (w.r.t. the Lebesgue measure) by the Radon-Nikodym theorem and then we can write

PX{A} = ∫_A fX(x) dλ(x).


Whenever f is Riemann-integrable, this integral is the same as the normal Riemann integral! This applies for instance when the density is a continuous function on an open interval of R. On the other hand, there is a pitfall here: one would expect continuity of FX to imply existence of a density. This is not true: the Cantor set provides a way to construct an example of this. However, if FX is a continuous function and there is a function f such that FX(x) = ∫_{−∞}^x f(u) du, then the density exists and one may choose fX(x) = f(x), as in the usual cases.

If X is P-summable, we say that X has finite expectation (or a finite first moment), given by

E(X) = ∫_Ω X(ω) dP(ω).

Using the 'overplantingsstelling', we can write it in terms of PX by E(X) = ∫_R x dPX(x). If X has a density fX w.r.t. λ, then

E(X) = ∫_R x fX(x) dλ(x).

N.B. different authors define the existence of the expectation or of moments differently: some require only integrability in our sense.

If X² is P-summable, then we call X square integrable. The variance of X is defined by σ²(X) = E(X − E(X))² (= E(X²) − (E(X))²).

In order to calculate expectations of functions of X, we can use the 'overplantingsstelling' in a convenient way. Suppose that g : R → R is Borel-measurable. Then g(X) has a finite expectation if and only if g is summable w.r.t. PX and we have

E g(X) = ∫_Ω g(X(ω)) dP(ω) = ∫_R g(x) dPX(x).

If X has a density fX w.r.t. λ, then

E g(X) = ∫_R g(x) fX(x) dλ(x).

Remark The space of summable functions on (Ω, F, P) is denoted by 𝓛¹(Ω, F, P), and the space of square integrable functions on (Ω, F, P) by 𝓛²(Ω, F, P). Both play important roles: ||X||₁ = ∫ |X| dP and ||X||₂ = √(∫ X² dP) act 'almost as' norms on these spaces. The problem is that ||X||₁,₂ = 0 does not imply that X = 0. It only implies that X = 0 P-a.e.! The solution is to define equivalence classes of functions that are P-almost everywhere equal. The resulting quotient spaces are denoted by L¹(Ω, F, P) and L²(Ω, F, P) and these are complete, normed spaces. In case of L²(Ω, F, P), the norm comes from the inner product (X, Y) = E(XY) and so the space is a Hilbert space. Note that convergence in these spaces means convergence in the respective norms.

Problem 2.10 Suppose that X takes only countably many values.

i) What type of function is X? PX cannot be absolutely continuous w.r.t. the Lebesgue measure λ on (R, B(R)) - why?

ii) Give a formula for E(X) and σ²(X).


iii) Suppose that X ∈ {0, 1, . . .} P-a.s. and suppose that X has a finite expectation. Show the following alternative formula for its expectation:

E(X) = ∑_{n=0}^∞ P{X > n}.

N.B. the limit theorems of the previous section can be transferred to a formulation in terms of expectations! When restricting to positive r.v.s, these can be used to yield some useful results. Note that by definition for a r.v. X we have P{|X| < ∞} = 1 (this is not necessary; the theory goes through if we allow infinite values!). Suppose that {Xn}n∈N is a collection of r.v.s on (Ω, F, P) that are all P-a.e. non-negative.

• One has

E(∑_n Xn) = ∑_n E(Xn), (2.3)

where both sides are either finite or infinite.

• ∑_n E(Xn) < ∞ implies that ∑_n Xn < ∞ a.e. and so Xn → 0, n → ∞, a.s.

Problem 2.11 Prove this. Conjure up a simple example where (2.3) fails when the positivity condition is dropped.

One can write probabilities of sets in terms of expectations:

P{X ∈ A} = PX{A} = E(1_{X∈A})

and similarly

∫_A g dPX = ∫_R g 1_A dPX.

We conclude this section with two important inequalities.

Lemma 2.5 (Chebyshev's inequality) Suppose that X is a random variable. Let φ : R → R+ be a non-decreasing, non-negative function such that E(φ(X)) < ∞. Then for all a > 0 with φ(a) > 0 one has

P{X ≥ a} ≤ E(φ(X))/φ(a).

Proof.

E(φ(X)) = ∫ φ(x) dPX(x) ≥ ∫_{x≥a} φ(x) dPX(x) ≥ φ(a) P{X ≥ a}.

Positivity of φ justifies the first inequality. QED


Let Z ∼ N(0, 1), that is Z has the standard normal distribution with density

fZ(x) = (1/√(2π)) exp{−x²/2}.

We will prove that

P{Z > a} ≤ exp{−a²/2}. (2.4)

Take φ(z) = exp{γz}, γ > 0. Then

E(φ(Z)) = (1/√(2π)) ∫ exp{γz − z²/2} dz = (1/√(2π)) ∫ exp{−(z − γ)²/2} dz · exp{γ²/2} = exp{γ²/2}.

So that

P{Z > a} ≤ exp{γ²/2 − γa} = exp{−a²/2} for the choice γ = a.

As an application, let X1, X2, . . . be N(0, 1) distributed random variables. Let An = {max{X1, . . . , Xn} > √(6 log n)}. Then

P{An} = P{max{X1, . . . , Xn} > √(6 log n)} ≤ n P{X1 > √(6 log n)} ≤ n exp{−6 log n/2} = 1/n².

Hence

∑_{n=1}^∞ P{An} ≤ ∑_{n=1}^∞ 1/n² < ∞.

Applying the first Borel-Cantelli lemma yields that

0 = P{lim sup_{n→∞} An} = P{lim sup_{n→∞} max{X1, . . . , Xn}/√(6 log n) > 1}.

This implies that for a.a. ω

max{X1(ω), . . . , Xn(ω)} ≤ √(6 log n), n ≥ n(ω).
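A simulation sketch (illustration only; the sample sizes are arbitrary) checking the Chernoff-type bound (2.4) and the resulting bound on normal maxima.

import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(1_000_000)

for a in (1.0, 2.0, 3.0):
    empirical = (z > a).mean()
    bound = np.exp(-a * a / 2.0)
    print(f"a={a}: P(Z>a) approx {empirical:.5f} <= bound {bound:.5f}")

# running maxima of the first n standard normals stay below sqrt(6 log n) eventually
x = rng.standard_normal(100_000)
running_max = np.maximum.accumulate(x)
n = np.arange(1, x.size + 1)
threshold = np.sqrt(6 * np.log(np.maximum(n, 2)))
print("exceedances of sqrt(6 log n):", int((running_max > threshold).sum()))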

A function f : A → R, where A = (a, b) is an open interval of R, is called convex on A if for all x, y ∈ A and all p ∈ [0, 1] one has that

f(px + (1 − p)y) ≤ pf(x) + (1 − p)f(y).

Important convex functions on R are f(x) = |x|, x², exp{αx}.

Lemma 2.6 (Jensen's Inequality, BSP p. 31) Suppose that f : A → R is convex on A, with A = (a, b). Suppose that X is a summable r.v. with

P{X ∈ A} = 1, E(|f(X)|) < ∞.

Then

E f(X) ≥ f(E(X)).

Problem 2.12 Prove this lemma, by successively carrying out the following steps.


i) Show that there exists c ∈ [a, b] such that f is non-increasing on (a, c) and non-decreasing on (c, b). Use this to show continuity of f on A.

ii) Show that for x0 < x1 < x2, x0, x1, x2 ∈ A, one has

(f(x2) − f(x0))/(x2 − x0) ≥ (f(x1) − f(x0))/(x1 − x0),

by suitably expressing x1 as a convex combination of x2 and x0. Show that together with (i) this implies that for each x0 ∈ A there exists a number n(x0) with

f(x) ≥ f(x0) + n(x0)(x − x0), x ∈ A.

iii) Finish the proof of the lemma by taking expectations in the last inequality and selecting a suitable value for x0.

Independence We now have a basic probability space (Ω, F, P).

Independence of σ-algebras Sub-σ-algebras F1, F2, . . . of F are called independent, whenever for each sequence of sets A1 ∈ F1, A2 ∈ F2, . . . and each finite set of distinct indices i1 < i2 < · · · < in one has

P{Ai1 ∩ · · · ∩ Ain} = ∏_{k=1}^n P{Aik}.

Independence of r.v.s Random variables X1, X2, . . . are independent if the σ-algebras σ(X1), σ(X2), . . . are independent.

Independence of events Events A1, A2, . . . are independent if the σ-algebras A1, A2, . . . are independent, where

Ai = {∅, Ω, Ai, Ai^c}.

In other words, if the r.v.s 1_{A1}, 1_{A2}, . . . are independent.

Problem 2.13 Show that for independence of A1, . . ., it is sufficient to check for each finite set of indices i1, i2, . . . , in that

P{Ai1 ∩ · · · ∩ Ain} = ∏_{k=1}^n P{Aik}.

Remark: independence of r.v.s X1 and X2, say, does not imply that X2 is not σ(X1)-measurable. Construct a trivial example to illustrate this.

Checking independence of σ-algebras and r.v.s is a cumbersome task, but fortunately π-systems lighten (up) life.

Lemma 2.7 Suppose that F1 and F2 are sub-σ-algebras of F. Suppose that there are π-systems I1 and I2 generating F1 and F2: σ(I1) = F1, σ(I2) = F2. Then F1 and F2 are independent iff I1 and I2 are independent in that

P{I1 ∩ I2} = P{I1} P{I2}, I1 ∈ I1, I2 ∈ I2.


Proof. Clearly, independence of the σ-algebras implies independence of the π-systems. So assume independence of the π-systems. The only apparatus for extending assertions on measures to whole σ-algebras we have so far is the uniqueness lemma 1.1.

Let I1 ∈ I1 be given. Then

µ(A) = P{I1 ∩ A}, ν(A) = P{I1} P{A}

are measures on F2 (check this). These two measures agree on the π-system I2. Moreover, µ(Ω) = ν(Ω). By the uniqueness lemma they now agree on the whole of F2. This implies

P{I1 ∩ A} = P{I1} P{A}, A ∈ F2. (2.5)

Since I1 ∈ I1 was arbitrarily chosen, (2.5) holds for all I1 ∈ I1 and A ∈ F2. Now, fix A ∈ F2 and define µ(B) = P{B ∩ A}, ν(B) = P{B} P{A}. Again µ and ν agree on I1, with ν(Ω) = µ(Ω), and so by the uniqueness lemma they agree on the whole of F1. This is what we wanted to prove. QED

Example. Suppose that for two random variables X and Y one has

P{X ≤ x, Y ≤ y} = P{X ≤ x} P{Y ≤ y}, x, y ∈ R,

i.e. the π-systems π(X) = {{X ≤ x} : x ∈ R} and π(Y) are independent. These π-systems generate the σ-algebras σ(X) and σ(Y), so that independence of X and Y follows. N.B. The book BSP treats this matter slightly differently - independence of r.v.s is defined slightly differently.

Problem 2.14 Let X1, X2, . . . be independent r.v.s. Show that the σ-algebras σ(X1, . . . , Xn) and σ(Xn+1, . . . , Xn+l) are independent.

Of course it is nice to define independence, but can one construct independent r.v.s at all? Recall the construction in Problem 1.16. There we had that X(ω) = ∑_n ωn 2^{-n} has the uniform distribution on (0, 1].

Problem 2.15 Show that Zn(ω) = ωn, n = 1, . . ., are independent, identically distributed r.v.s, and give their distribution.

It follows easily that also

X1(ω) = ω1 2^{-1} + ω3 2^{-2} + ω6 2^{-3} + ω10 2^{-4} + · · ·

X2(ω) = ω2 2^{-1} + ω5 2^{-2} + ω9 2^{-3} + ω14 2^{-4} + · · ·

X3(ω) = ω4 2^{-1} + ω8 2^{-2} + ω13 2^{-3} + ω19 2^{-4} + · · ·

and so forth, have the uniform distribution on (0, 1]. The different subsequences of the expansion of ω generating the Xi's are disjoint. It is intuitively clear that the Xi are independent r.v.s with the same uniform distribution on (0, 1]. Let any sequence of distribution functions Fn, n ∈ N, be given. By the Skorokhod representation, one can find r.v.s Yn = gn(Xn) having distribution function Fn. Independence is obviously preserved.

Problem 2.16 Let X and Y be independent r.v’s. Let g, h : R → R be Borel functions. Show thatg(X) and h(Y ) are independent.


Let us now consider two r.v.s X and Y on the probability space (Ω, F, P). For each point ω we have the vector function (X(ω), Y(ω)) taking values in R². This gives rise to distributions on the plane R² and hence to so-called product measures. We will not discuss this further here, but restrict to essentially one-dimensional sub-cases.

A first instance is when we consider g(ω) = X(ω)Y(ω).

Lemma 2.8 If X and Y are independent then E(XY) = E(X) E(Y), provided the latter expectations exist (i.e. X and Y are P-summable).

Problem 2.17 Prove this by carrying out the following steps. First show the result for elementary functions, then for positive functions. For the latter one uses sequences of approximating elementary functions for X and Y: note these should be independent! Then finish the proof.

We now turn to proving a number of results on sequences of random variables. The proofs rely on assertions derived or stated hitherto. They will be applicable later on to stochastic processes.

There are two results that concern sequences of independent, identically distributed r.v.s X1, X2, . . . on the probability space (Ω, F, P).

For the first lemma, we need the concept of a stopping time (BSP p. 54). T : Ω → R ∪ {∞} is a stopping time for the sequence X1, . . ., if {T ≤ n} ∈ σ(X1, . . . , Xn). In words: the decision to stop before or at time n is taken on the basis of the outcomes X1, X2, . . . , Xn. Note that we allow T = ∞; this is the non-stopping decision.

Let Xn = 1 with probability p and Xn = −1 with probability 1 − p: it can be interpreted as the respective gain and loss a gambler incurs when tossing a biased coin. Then the gambler's gain or loss after n tosses equals Sn = X1 + · · · + Xn. If the gambler decides to stop after the n-th game if his gain at that time is some number x, then this is a stopping time.

Lemma 2.9 (Wald's equation) Let X1, . . . be a sequence of i.i.d. r.v.s with finite expectation and let T be a stopping time for this sequence. Suppose that T < ∞ a.e. and E(T) < ∞. Then E(∑_{i=1}^T Xi) = E(X1) E(T).

Proof. Write ST = ∑_{i=1}^T Xi. Assume first that Xi ≥ 0 P-a.e. Now we have (check the validity of all steps)

E(ST) = ∑_{n=1}^∞ ∫_{ω:T(ω)=n} ST dP = ∑_{n=1}^∞ ∫_{ω:T(ω)=n} Sn dP = ∑_{n=1}^∞ ∑_{k=1}^n ∫_{ω:T(ω)=n} Xk dP

= ∑_{k=1}^∞ ∑_{n=k}^∞ ∫_{ω:T(ω)=n} Xk dP = ∑_{k=1}^∞ ∫_{ω:T(ω)≥k} Xk dP = ∑_{k=1}^∞ E(Xk 1_{T≥k})

= ∑_{k=1}^∞ E(Xk) P{T ≥ k} = E(X1) ∑_{k=1}^∞ P{T ≥ k} = E(X1) E(T).

For the second equality we use that T < ∞ with probability 1. For the seventh equality we use independence of Xk and 1_{T≥k}. To show this independence, note that 1_{T≥k} = 1 − 1_{T≤k−1}. By definition, 1_{T≤k−1} is σ(X1, . . . , Xk−1)-measurable, hence so is 1_{T≥k}. It follows that σ(1_{T≥k}) ⊂ σ(X1, . . . , Xk−1). Since σ(Xk) and σ(X1, . . . , Xk−1) are independent, σ(Xk) and σ(1_{T≥k}) are independent, i.e. Xk and 1_{T≥k} are independent.

Now, for general r.v.s Xn, the assertion follows from the fact that it applies to X1^+, X2^+, . . . and X1^−, X2^−, . . .. Check this. QED
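A simulation sketch of Wald's equation (an illustration under arbitrary choices: Xi uniform on {1, . . . , 6} and T the first time the partial sum exceeds 20, which is a stopping time): the averages of S_T and of E(X1)·T should agree.

import numpy as np

rng = np.random.default_rng(4)

def one_run():
    """Roll a fair die until the running total exceeds 20 (a stopping time T);
    return (S_T, T)."""
    s, t = 0, 0
    while s <= 20:
        s += rng.integers(1, 7)
        t += 1
    return s, t

results = np.array([one_run() for _ in range(50_000)])
S_T, T = results[:, 0], results[:, 1]
print(S_T.mean(), 3.5 * T.mean())   # Wald: E(S_T) = E(X_1) E(T) = 3.5 E(T)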

Problem 2.18 Suppose that p ≥ 1/2. The gambler intends to stop the first time t that his total gain is −1 (i.e. he has 1 less than what he started with), i.e. T = t iff t = min{n | Sn = −1}. Assuming that T is finite with probability 1 and has finite expectation, Wald's equation applies. What contradiction do we get and what might be wrong with our assumptions on T? Study the cases p = 1/2 and p > 1/2 separately.

Many interesting events have probability 0 or 1. The first Borel-Cantelli lemma is an assertion on a sequence of events when their probabilities form a convergent series. What can we say if the series diverges? For this, we need an extra condition of independence.

Lemma 2.10 (Second Borel-Cantelli Lemma) Suppose that A1, A2, . . . ∈ F are independent events, such that ∑_{n=1}^∞ P{An} = ∞. Then P{lim sup_{n→∞} An} = 1.

Proof.

G := (lim sup_{n→∞} An)^c = (∩m ∪_{n≥m} An)^c = ∪m ∩_{n≥m} An^c.

Call Br,m = ∩_{n=m}^r An^c and Bm = ∩_{n=m}^∞ An^c. Then G = ∪m Bm and Br,m ↓ Bm. Hence by monotone convergence P{Br,m} ↓ P{Bm}. By independence

P{Br,m} = ∏_{n=m}^r P{An^c} = ∏_{n=m}^r (1 − P{An}) = exp{∑_{n=m}^r log(1 − P{An})} ≤ exp{−∑_{n=m}^r P{An}},

where we use that log(1 − x) ≤ −x for x ∈ (0, 1). By taking limits, we obtain

P{Bm} ≤ lim_{r→∞} exp{−∑_{n=m}^r P{An}} = 0.

Now, P{G} ≤ ∑_{m=1}^∞ P{Bm} = 0, and so P{G^c} = 1, which is what we set out to prove. QED

As an example, let Xn, n = 1, 2, . . ., be a sequence of i.i.d. random variables. Suppose that Xn are exponentially distributed with parameter 1, i.e. P{Xn > x} = exp{−x}, x ≥ 0. Then

P{Xn > α log n} = n^{−α}, α > 0.

Applying the two Borel-Cantelli lemmas, we find

P{Xn > α log n i.o.} = 0 if α > 1, and = 1 if α ≤ 1.

Put S = lim sup_{n→∞}(Xn/log n). S is a r.v.!

{ω : S(ω) ≥ 1} = {ω : lim sup_{n→∞}(Xn(ω)/log n) ≥ 1} ⊃ {ω : Xn(ω) > log n i.o.}.

Hence, P{S ≥ 1} = 1. On the other hand,

P{S > 1 + 2α^{−1}} ≤ P{Xn > (1 + α^{−1}) log n i.o.} = 0.

We have that {S > 1} = ∪_{α=1}^∞ {S > 1 + 2α^{−1}}, hence P{S > 1} = 0. As a consequence, S ≡ 1 with probability 1.
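A simulation sketch (illustration only) of this example: for Xn i.i.d. Exp(1), exceedances of α log n keep occurring when α ≤ 1 but die out when α > 1, consistent with lim sup_n Xn/log n = 1 a.s.

import numpy as np

rng = np.random.default_rng(5)
n = np.arange(2, 1_000_001)
x = rng.exponential(1.0, size=n.size)     # X_n ~ Exp(1), so P{X_n > x} = exp(-x)

for alpha in (0.9, 1.0, 1.1):
    exceed = x > alpha * np.log(n)
    last = n[exceed][-1] if exceed.any() else None
    print(f"alpha={alpha}: {exceed.sum()} exceedances, last at n={last}")
# for alpha > 1 exceedances die out quickly; for alpha < 1 they keep occurring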

Problem 2.19 (Monkey typing the Bible) Suppose that a monkey types a sequence of symbols at random, one per unit of time. This produces an infinite sequence Xn, n = 1, 2, . . ., of i.i.d. r.v.s, with values in the set of possible symbols on the typewriter. If it is a finite set of symbols, then we agree that min_x P{X1 = x} =: ε > 0. The monkey lives infinitely long and types incessantly. Typing the Bible corresponds to typing a particular sequence of say N symbols (N is the number of symbols in the Bible). Let H = {monkey types infinitely many copies of the Bible}.

Use the second Borel-Cantelli lemma to show that P{H} = 1. Define suitable Ω, F and P and sets An.


Problem 2.20 A sometimes convenient characterisation of convergence with probability 1. Let X, Xn, n = 1, . . ., be r.v.s on the same probability space (Ω, F, P). Then Xn → X with probability 1 iff for all ε > 0

lim_{n→∞} P{∪_{m=n}^∞ (|Xm − X| > ε)} = 0,

or equivalently iff for all ε > 0

lim_{n→∞} P{∩_{m≥n} (|Xm − X| ≤ ε)} = 1.

Show this.

Problem 2.21 (Algebraic...) Let s > 1 and define the Riemann zeta function ζ(s) = ∑_{n∈N} n^{−s}. Let X, Y be i.i.d. r.v.s with

P{Y = n} = P{X = n} = n^{−s}/ζ(s).

Prove that the events

Ap = {X divisible by p}, p prime,

are independent. Explain Euler's formula

1/ζ(s) = ∏_{p prime} (1 − 1/p^s)

probabilistically. Prove that

P{no square other than 1 divides X} = 1/ζ(2s).

Let H be the highest common factor of X and Y. Prove that

P{H = n} = n^{−2s}/ζ(2s).


Problem 2.22 Suppose that Xi denotes the 'quality' of the i-th applicant for a job. Applicants are interviewed in a random order and so one may assume that X1, . . . are i.i.d. random variables with the same continuous distribution (i.e. they all have a continuous density). What is the probability that the i-th candidate is the best so far? Prove that

P{Ei} = 1/i,

where Ei = {i-th candidate is best so far} = {Xi > Xj, j < i}. Prove that the events E1, E2, . . . are independent. Why would we assume a continuous distribution for the qualities? Suppose that there are only N candidates in total. Calculate the probability that the i-th candidate is the best amongst all N candidates.

Problem 2.23 Let X1, . . . be i.i.d. r.v.s with the N(0, 1) distribution. Prove that

P{lim sup_{n→∞} Xn/√(2 log n) = 1} = 1.

Use that for x > 0

(1/(x + 1/x)) (1/√(2π)) exp{−x²/2} ≤ P{X1 > x} ≤ (1/x) (1/√(2π)) exp{−x²/2},

since X1 has the N(0, 1) distribution. The second inequality can be derived from the fact that

(d/dx) exp{−x²/2} = −x exp{−x²/2}.

We recall one of the versions of the law of large numbers.

Theorem 2.11 (Strong Law of Large Numbers) Let Xn, n = 1, 2, . . ., be a sequence of i.i.d. r.v.s on the probability space (Ω, F, P), with finite expectation. Then ∑_{i=1}^n Xi/n → E(X1), with probability 1.

There are elementary proofs for which one needs only results from these pages, but we will not do that here.
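A short simulation sketch (illustration only) of the Strong Law of Large Numbers for i.i.d. Exp(1) variables: the running averages settle down to E(X1) = 1 along each sample path.

import numpy as np

rng = np.random.default_rng(6)
for path in range(3):
    x = rng.exponential(1.0, size=1_000_000)          # i.i.d. with E(X_1) = 1
    running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
    print(f"path {path}:", running_mean[[999, 99_999, 999_999]])   # approaches 1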

Problem 2.24 (Is a fair game fair?) Let X1, . . . be independent r.v.s with P{Xn = n² − 1} = 1/n² = 1 − P{Xn = −1}. Prove that E(Xn) = 0, but that

(X1 + · · · + Xn)/n → −1, with probability 1.

This is counter-intuitive, bearing in mind the Law of Large Numbers! What would you expect on the basis of this law?

Problem 2.25 The following is a sometimes simple test of a.s. convergence. Let Xn, n = 1, . . ., and X be r.v.s on the same probability space (Ω, F, P). If for all ε > 0

∑_n P{|Xn − X| > ε} < ∞,

then Xn → X with probability 1. Hint: use Problem 2.20.


Problem 2.26 You have a lamp working on a battery. As soon as the battery fails, you replace it with a new one. Batteries have i.i.d. lifetimes, say Xn ≥ 0 is the lifetime of the n-th battery. Assume the lifetimes to be bounded: Xn ≤ M with probability 1 for some constant M. Let N(t) be the number of batteries that have failed by time t.

i) Show that in general N(t) is not a stopping time, whereas N(t) + 1 is. Hint: N(t) = n iff X1 + · · · + Xn ≤ t and X1 + · · · + Xn+1 > t.

ii) Argue that t < E(∑_{i=1}^{N(t)+1} Xi) ≤ t + M. Use Wald's equation to show the elementary renewal theorem for bounded r.v.s:

lim_{t→∞} E(N(t))/t = 1/E(X1).

That is: the rate at which batteries fail is exactly 1/(expected lifetime), which is an intuitively obvious result.
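A simulation sketch of the elementary renewal theorem in Problem 2.26 (illustration only; bounded lifetimes uniform on [0, 2], so E(X1) = 1, are an arbitrary choice): E(N(t))/t approaches 1/E(X1) = 1.

import numpy as np

rng = np.random.default_rng(7)

def n_failures(t, rng):
    """Number of batteries (with Uniform[0, 2] lifetimes) that have failed by time t."""
    total, count = 0.0, 0
    while True:
        total += rng.uniform(0.0, 2.0)
        if total > t:
            return count
        count += 1

for t in (10.0, 100.0, 1000.0):
    mean_n = np.mean([n_failures(t, rng) for _ in range(2_000)])
    print(t, mean_n / t)     # tends to 1 / E(X_1) = 1 as t grows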

Problem 2.27 A deck of 52 cards is shuffled and the cards are then turned face up, one at a time. Let Xi equal 1 if the i-th card turned up is an ace, and otherwise Xi = 0, i = 1, . . . , 52. Let N denote the number of cards needed to be turned over until all 4 aces appear. That is, the final ace appears on the N-th card to be turned over.

i) Show that P{Xi = 1} = 4/52.

ii) Is Wald's equation valid? If not, why not?

3 The art of conditioning

The corresponding chapter in BSP is clear enough; only a few remarks are to be made here. Let us just give the definition of conditional expectation.

Suppose we have a probability space (Ω, F, P). Let X be a random variable with finite expectation, i.e. E(|X|) < ∞. Let A ⊂ F be a sub-σ-algebra: say this is our knowledge of the structure of the space Ω, which is coarser than F, but consistent with it. In fact, let us assume that we cannot observe X in detail, that is, our knowledge of the space Ω is for instance also coarser than σ(X).

Then we have to 'estimate' X in a way consistent with our knowledge A. It makes sense to replace X by averaging the values of X over all sets A ∈ A. This gives rise to the following theorem-definition (BSP Def. 2.3, Def. 2.4, Prop. 2.3).

Theorem 3.1 (Fundamental Theorem and Definition of Kolmogorov 1933) Suppose we have a probability space (Ω, F, P). Let X be a random variable with finite expectation, i.e. E(|X|) < ∞. Let A be a sub-σ-algebra of F. Then there exists a random variable Y such that

i) Y is A-measurable;

ii) E(|Y|) < ∞;

iii) for each A ∈ A we have

∫_A Y dP = ∫_A X dP.


If Y′ is another r.v. with properties (i, ii, iii), then Y′ = Y with probability 1, i.e. P{Y′ = Y} = 1. We call Y a version of the conditional expectation E(X|A) of X given A and we write Y = E(X|A) a.s.

N.B.1 Conditional expectations are random variables!
N.B.2 Suppose we have constructed an A-measurable r.v. Z, with E(|Z|) < ∞, such that (iii) holds for all A ∈ π(A), i.e. (iii) holds on a π-system generating A. Then (iii) holds for all A ∈ A, and so Z is a version of the conditional expectation E(X|A).
N.B.3 BSP p. 29 lists a number of important properties of conditional expectation. An important one, on independence, is lacking.


Lemma 3.2 (Independence) Let (Ω, F, P) be a probability space. Suppose that A, G ⊂ F are sub-σ-algebras and that X is a r.v. on (Ω, F, P) with finite expectation. Suppose that A is independent of σ(σ(X), G). Then

E(X|σ(G, A)) = E(X|G), a.s. (3.1)

In particular, choosing G = {∅, Ω}, it follows that E(X|A) = E(X), a.s., whenever A and σ(X) are independent.

Proof. We may assume that X ≥ 0 with probability 1. For A ∈ A and G ∈ G, X1G and 1A are independent and so

E(X 1G 1A) = E(X 1G) E(1A).

Since Y = E(X|G) a.s. is G-measurable, also Y 1G and 1A are independent with

E(Y 1G 1A) = E(Y 1G) E(1A).

Since E(X 1G) = E(E(X 1G|G)) = E(1G E(X|G)) = E(1G Y), it follows that

E(X 1_{G∩A}) = E(X 1G 1A) = E(Y 1G 1A) = E(Y 1_{G∩A}). (3.2)

For a set C ∈ F, the functions µ(C) = E(X 1C), ν(C) = E(Y 1C) define positive, finite measures on (Ω, F, P). Note that the sets C = G ∩ A, G ∈ G, A ∈ A, form a π-system for σ(G, A). By (3.2) µ and ν are equal on this π-system, µ(Ω) = ν(Ω), and so they are equal on σ(G, A). Hence Y is a version of E(X|σ(G, A)). QED

Many theorems for integrals, i.e. expectations, apply to conditional expectations, even though the latter are r.v.s and not integrals! We quote some of these.

Properties of conditional expectations without proof (see also BSP p. 29) Let the probability space (Ω, F, P) be given. Let X, Xn, n = 1, 2, . . ., be r.v.s on this probability space, with finite expectation (E|X|, E|Xn| < ∞). Let A be a sub-σ-algebra of F.

conditional monotone convergence If 0 ≤ Xn ↑ X, a.s., then E(Xn|A) ↑ E(X|A) a.s.

conditional Fatou If Xn ≥ 0 a.s., then E(lim inf Xn|A) ≤ lim inf E(Xn|A) a.s.

conditional dominated convergence If Xn → X a.s., and |Xn(ω)| ≤ Y(ω), n = 1, 2, . . ., for a r.v. Y with finite expectation, then E(Xn|A) → E(X|A) a.s.


conditional Jensen If f : R → R is a convex function, and E|f(X)| < ∞, then E(f(X)|A) ≥ f(E(X|A)) a.s.

Problem 3.1 A rather queer example. Let Ω = (0, 1]. Let A be the σ-algebra generated by all one-point sets {x}, x ∈ (0, 1]. Let P{x} = 0 for all x ∈ (0, 1].

i) Does A contain any intervals? If yes, which ones? What is the relation between A and B(0, 1]? What values can P{A} take for A ∈ A?

ii) Let X : (0, 1] → R be any r.v. Determine E(X|A). Explain heuristically.

N.B.4 Let X be square integrable. Then the conditional expectation is in fact a least squares estimate or an orthogonal projection of X onto the space of square integrable functions on (Ω, A, P). Some terminology: by E(X|Y), E(X|Y1, Y2, . . .) we mean E(X|σ(Y)), E(X|σ(Y1, Y2, . . .)), etc.

Problem 3.2 Let X, Y1, Y2 be r.v.s on (Ω,F ,P). Use BSP p.29 to show the following properties.

i) E(Xg(Y1)|Y1) = g(Y1)E(X|Y1), for Borel functions g.

ii) E(E(X|Y1, Y2)|Y2) = E(X|Y2).

Problem 3.3 Let (Ω, F, P) be given and let X be a r.v. Let A1, A2, . . . be a measurable partition of Ω, that is: A1, A2, . . . ∈ F with Ai ∩ Aj = ∅ for i ≠ j and ∪i Ai = Ω. Let A = σ(A1, . . .) be the σ-algebra generated by this partition.

i) Show that there is a version Y of E(X|A) that is constant on each of the Ai, in particular

Y(ω) = E(1_{Ai} X)/P{Ai}, ω ∈ Ai,

provided that P{Ai} > 0. What is the value when P{Ai} = 0?

ii) Let Z be any A-measurable r.v. which has distinct values on the Ai, i = 1, . . .. How can you express E(X|A) in terms of Z?
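A numerical sketch of Problem 3.3(i) (illustration only, with an arbitrary example): given a partition A1, A2, . . ., a version of E(X|A) is obtained by averaging X over each block, i.e. taking E(1_{Ai} X)/P{Ai} on Ai.

import numpy as np

rng = np.random.default_rng(8)
n = 100_000
omega = rng.random(n)                              # sample points, playing the role of omega
X = omega ** 2                                     # a random variable X(omega) = omega^2
labels = np.minimum((omega * 4).astype(int), 3)    # partition of [0, 1) into 4 blocks A_0..A_3

cond_exp = np.empty(n)
for i in range(4):
    block = labels == i
    cond_exp[block] = X[block].mean()              # E(1_{A_i} X) / P{A_i}, estimated empirically

print(np.unique(np.round(cond_exp, 3)))            # one constant value per block
print(cond_exp.mean(), X.mean())                   # tower property: E(E(X|A)) = E(X)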

This is not explicitly stated in BSP, but it is a very important property: the Doob-Dynkin lemma implies that there exists some Borel function g such that

E(X|Y) = g(Y), with probability 1.

In this case we write

E(X|Y = y) := g(y).

A similar assertion holds when Y = (Y1, Y2, . . . , Yn) is a random vector on (Ω, F, P). We can often calculate this and then it is extremely important in computing expectations etc.

Problem 3.4 Suppose that X = g(Y ) for some Borel function g. What is E(X|Y )?


Note that this entails that E(X|Y) is constant on sets where Y is constant, i.e. on sets of the form {ω : Y(ω) = y}.

Since E(X|Y) = g(Y) is a function of Y a.e., one can write integrals of E(X|Y) over measurable sets of Ω as integrals over sets of B(R) w.r.t. the induced probability distribution PY of Y:

∫_A E(X|Y) dP = ∫_{Y(A)} g(y) dPY(y) = ∫_{Y(A)} E(X|Y = y) dPY(y).

Problem 3.5 In the case of a discrete r.v. Y we have seen how to calculate E(X|Y). Specify E(X|Y = y). Show that

E(X) = ∑_y E(X|Y = y) P{Y = y}.

Problem 3.6 Suppose that X1, . . . is a sequence of i.i.d. r.v.s on the probability space (Ω, F, P) with finite expectation. Let T be a stopping time for this sequence. Let Sn = ∑_{i=1}^n Xi. It is tempting to say that

E(ST|T = n) = n E(X1).

This is not correct in general - explain why and give a counter-example.

This conditioning on a value Y = y gives rise to conditioning on events. Say, let A ∈ F. Put Y = 1A. Then we define E(X|A) := E(X|Y = 1): this is a number! If E(X|Y) = g(Y), then E(X|A) = g(1).

Problem 3.7 Let A ∈ F, and let B1, . . . be a measurable partition of the set A. Show that

E(X|A) P{A} = ∑_i E(X|Bi) P{Bi}.

Problem 3.8 Let a probability space (Ω, F, P) be given. Let X and Y be r.v.s on this space with E|X| < ∞. By definition, E(X|Y) is σ(Y)-measurable.

Suppose that Y has a density w.r.t. the Lebesgue measure λ, i.e. there is a Borel function fY such that P{Y ∈ B} = ∫_B fY(y) dλ(y). Show that

E(X) = ∫_y E(X|Y = y) fY(y) dλ(y).

This is the analogue of the formula for discrete r.v.s Y!

There are now two issues to be addressed. The first is that conditional expectations E(X|Y) are easily calculated when Y is a discrete r.v., taking only countably many values. However, when Y has a more general distribution, it is not that obvious how to do this. A first step in this direction is to write conditional expectations as expectations of r.v.s.


Conditional probabilities and conditional distribution functions (pdf) Since probabilities can be written as expectations, it is clear that one can also condition probabilities. Let A ∈ F and let Y be a r.v. on (Ω, F). Then

P{A|Y} := E(1A|Y) = pA(Y),

where pA is a Borel function that depends on A. We call this the conditional probability of A given Y. Write P{A|Y = y} = pA(y), and we call it the conditional probability of A given Y = y. As in the foregoing,

P{A ∩ Y^{-1}(B)} = ∫_{Y^{-1}(B)} 1A(ω) dP(ω) = ∫_{Y^{-1}(B)} P{A|Y}(ω) dP(ω) = ∫_B P{A|Y = y} dPY(y),

so we can write this probability in terms of the probability distribution of Y!

Problem 3.9 Calculate PA|Y = y when Y is a discrete r.v.

Let X be another r.v. on the same probability space. Then we can apply the above to the set A = {X ∈ B′}. It is common to write P_{X|Y}(B′) = P{X ∈ B′|Y}, P_{X|Y=y}(B′) = P{X ∈ B′|Y = y} as the conditional distribution of X given Y and given Y = y respectively. This implies that

P{{X ∈ B′} ∩ {Y ∈ B}} = ∫_{Y^{-1}(B)} P{X ∈ B′|Y} dP = ∫_B P_{X|Y=y}(B′) dPY(y). (3.3)

It is a theorem that one can choose a so-called regular version of P_{X|Y=Y(ω)}, which is a probability measure on (R, B) for P-almost all ω ∈ Ω.

Problem 3.10 Argue that P_{X|Y=y}(A) = PX(A) when X and Y are independent r.v.s on the same probability space (Ω, F, P).

Since P_{X|Y=y} is a probability distribution on (R, B), we can calculate expectations of B-measurable functions.


Lemma 3.3 Let φ be a Borel function. Then

E(φ(X)|Y = y) = ∫_R φ(x) dP_{X|Y=y}(x). (3.4)

Problem 3.11 Derive this relation, when Y is a discrete r.v.

Proof. Why is this so? Again we apply the stratagem of going from elementary functions, via non-negative functions, to general functions. First, let φ = 1B, B ∈ B. In this case φ(X) = 1B(X) = 1_{X^{-1}(B)}. Hence

E(φ(X)|Y) = E(1B(X)|Y) = E(1_{X^{-1}(B)}|Y) = P{X^{-1}(B)|Y} = P_{X|Y}(B) by definition.

On the other hand

∫_x φ(x) dP_{X|Y=y}(x) = ∫_{x∈B} dP_{X|Y=y}(x) = P_{X|Y=y}(B).


General elementary functions φ are linear combinations of indicator functions. The assertion then follows from the above and the linearity property (BSP p. 29, property (1)). For positive functions it follows by monotone convergence of conditional expectations. Finally we write φ = φ+ − φ−, and then the result follows again from linearity. QED

We have reduced the problem of computing conditional expectations to the problem of computing conditional probability distributions. Does this help?

Very often, a problem already is formulated in terms of conditional distributions. If this is not the case, one can still do something in the following situation.

Say X, Y have a joint probability density fX,Y with respect to the Lebesgue measure λ2 on (R², B²):

P{X ∈ B′, Y ∈ B} = ∫∫_{x∈B′, y∈B} fX,Y(x, y) dλ2(x, y).

Then fY(y) = ∫_R fX,Y(x, y) dλ(x) acts as a probability density of Y.

Define the elementary conditional pdf (= probability density function) of X given Y as

f_{X|Y=y}(x) = fX,Y(x, y)/fY(y), if fY(y) ≠ 0; and 0, otherwise.

Then

P_{X|Y=y}(A) = P{X ∈ A|Y = y} = ∫_{x∈A} f_{X|Y=y}(x) dλ(x), E(φ(X)|Y = y) = ∫ φ(x) f_{X|Y=y}(x) dλ(x).

This material is contained in BSP Exercise 2.16 and Remark 2.3 and you should be able to do the derivations with the help of BSP. One can check the validity of this by checking the definition of conditional expectation, rewriting (3.3).
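A small sketch (illustration only; the joint density below is an arbitrary example chosen here) of the elementary conditional pdf: take fX,Y(x, y) = 1/y on {0 < x < y < 1}, so that f_{X|Y=y} is the uniform density on (0, y) and E(X|Y = y) = y/2. The simulation compares this with empirical conditional means.

import numpy as np

rng = np.random.default_rng(9)
n = 1_000_000
Y = rng.random(n)                 # Y ~ Uniform(0, 1)
X = Y * rng.random(n)             # given Y = y, X ~ Uniform(0, y), i.e. f_{X|Y=y} = 1/y on (0, y)

for y0 in (0.2, 0.5, 0.8):
    band = np.abs(Y - y0) < 0.01              # approximate the event {Y = y0} by a thin band
    print(y0, X[band].mean(), y0 / 2.0)        # empirical E(X | Y near y0) vs y0/2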

Extra observation on conditioning Often, the same random variable appears in the conditioning and as part of the random variable that we take the conditional expectation of. For instance, E(∑_{i=1}^T Xi|T = t), where T is a r.v. with positive integer values. Intuitively it is clear that we can insert the value t for T in the conditioning: E(∑_{i=1}^T Xi|T = t) = E(∑_{i=1}^t Xi|T = t). Is this true generally?

For some cases we do know this already:

i) E(X + f(Y)|Y = y) = E(X|Y = y) + f(y) = E(X + f(y)|Y = y), by linearity of conditional expectations, for any Borel function f;

ii) or, E(Xf(Y )|Y = y) = E(X|Y = y)f(y) = E(Xf(y)|Y = y), by “taking out what is known”.

How can one prove this in case of the above example of a random sum?

Let X, Y be given r.v.'s on a probability space (Ω, F, P). Let us consider functions f : R² → R, with f B²-measurable. A π-system for B² is for instance the collection of product sets {(−∞, x] × (−∞, y] | x, y ∈ R}. The question is whether E(f(X, Y) | Y = y) = E(f(X, y) | Y = y) P_Y-a.s.

Define H as the collection of bounded Borel functions f : R² → R with E(f(X, Y) | Y = y) = E(f(X, y) | Y = y) P_Y-a.s. It is straightforward to check that H is a monotone class.


Let f = 1_{(−∞,a]×(−∞,b]}. If f ∈ H for any a, b ∈ R, then H contains all bounded B²-measurable functions by the Monotone Class Theorem. We check that f ∈ H:

E(f(X, Y) | Y) = E(1_{X≤a} 1_{Y≤b} | Y) = 1_{Y≤b} E(1_{X≤a} | Y).

On the other hand

E(f(X, y) | Y) = E(1_{X≤a} 1_{y≤b} | Y) = 1_{y≤b} E(1_{X≤a} | Y).

On the set Y^{-1}({y}) one has 1_{y≤b} = 1_{Y≤b} (the first is a constant function on Ω, either 0 everywhere or 1 everywhere). So the result is proved, if we take the same version of E(1_{X≤a} | Y).

Now, think for yourself how to extend this to unbounded B²-measurable functions f.

Another approach is to use joint measures: P{X ≤ a, Y ≤ b} defines a probability measure P_{X,Y} on (R², B²). We have seen that

P_{X,Y}{(−∞, a] × (−∞, b]} = P{X ≤ a, Y ≤ b} = ∫_{y≤b} ∫_{x≤a} dP_{X|Y=y}(x) dP_Y(y).

Under assumed regularity conditions, for an arbitrary Borel set B₂ ∈ B² one gets by standard procedures

P_{X,Y}(B₂) = ∫_{y ∈ B_y} ∫_{x : (x,y) ∈ B₂} dP_{X|Y=y}(x) dP_Y(y),

with B_y = {y ∈ R : ∃x such that (x, y) ∈ B₂}. So we have an identity for measures. Now, by going through the standard machinery of indicator functions, elementary functions, positive and general functions, one can show that

∫_ω f(X, Y) dP = ∫_{x,y} f(x, y) dP_{X,Y}(x, y) = ∫_y ∫_x f(x, y) dP_{X|Y=y}(x) dP_Y(y) = ∫_y E(f(X, y) | Y = y) dP_Y(y),

provided that E(f(X, y) | Y = y) is a B-measurable function on R! Can you prove this from the standard machinery?

Recall that E(f(X, Y) | Y) = g(Y) for some Borel function g. Our goal is to prove that one can take g(y) = E(f(X, y) | Y = y) (presuming we have proved measurability). To avoid confusing notation, write h(y) = E(f(X, y) | Y = y). We get

∫_{ω ∈ Y^{-1}(B)} h(Y) dP = ∫_{y ∈ B} h(y) dP_Y(y) = ∫_{y ∈ B} ∫_x f(x, y) dP_{X|Y=y}(x) dP_Y(y) = ∫_{y ∈ B, x ∈ R} f(x, y) dP_{X,Y}(x, y) = ∫_{ω ∈ Y^{-1}(B)} f(X, Y) dP.

It follows that h(Y ) is a version of the conditional expectation E(f(X, Y )|Y ).

In case X and Y are independent, we have a simpler expression, since then P_{X|Y=y}(B) = P_X(B): h(y) = E(f(X, y) | Y = y) = E_X(f(X, y)), where we take the unconditional expectation w.r.t. X.


Help variables Sometimes it is convenient to consider ‘mixtures’ of conditional expectations in the following sense. Let X, Y, Z be r.v.s on (Ω, F, P). One can then speak of E(X | Y, Z = z). Let g(Y, Z) be a Borel function that is a.s. equal to E(X | Y, Z). Then E(X | Y, Z = z) = g(Y, z), where Y is left unspecified.

Since σ(Z, Y ) ⊃ σ(Y ), the Tower property yields that E(E(X|Y, Z)|Y ) = E(X|Y ).

Let us consider E(E(X | Y, Z) | Y) = E(g(Y, Z) | Y). We are in the above situation: E(g(Y, Z) | Y = y) = E(g(y, Z) | Y = y) = ∫_z g(y, z) dP_{Z|Y=y}(z). Now, if Z and Y are independent, we find

E(g(Y, Z) | Y = y) = ∫_z g(y, z) dP_Z(z),

so that E(X | Y = y) = ∫_z g(y, z) dP_Z(z). Hence, if the conditional expectation E(X | Y, Z) = g(Y, Z) is easy to calculate, this may help to solve the more complicated problem of calculating E(X | Y).

Problem 3.12 Try to justify all these steps.

This procedure may help to attack BSP exercise 2.6 in a more structured way.

Problem 3.13 Let X = ξ and Y = η from exercise 2.6. Define an appropriate r.v. Z such that E(X | Y, Z) can be calculated directly. Compute the desired conditional expectation E(X | Y).

One can derive many convenient statements about these ‘mixed’ conditional distributions. Let X, Y₁, . . . , Yₙ, Z be r.v.s on the same probability space (Ω, F, P).

Problem 3.14 i) Show that

E(X|Z = z) = E(E(X|Y1, . . . , Yn, Z = z)|Z = z).

ii) Let {Z = z} ∈ σ(Y₁, . . . , Yₙ). Show that

E(E(X|Y1, . . . , Yn)|Z = z) = E(E(X|Y1, . . . , Yn, Z = z)|Z = z).

Problem 3.15 Let X, Y be independent r.v.s with X d= exp(λ), Y d= exp(µ). Show that min{X, Y} d= exp(λ + µ).

Problem 3.16 Let X₁, . . . , Xₙ be i.i.d. r.v.s, distributed according to the homogeneous (uniform) distribution on (0, 1) (Xᵢ d= Hom(0, 1)).

i) Determine the distribution function F_Z and density f_Z of Z = max(X₁, . . . , Xₙ).

ii) Calculate P{Z ≤ z | X₁ = x} and the density f_{Z|X₁=x}(z).

iii) Calculate P{X₁ ≤ x | Z = z} and P{X₁ ≤ x | Z}. Hint: use (ii). Calculate E(X₁ | Z).

Problem 3.17 Let U, V be i.i.d. r.v.s with U, V d= Hom(0, 1). Let X = min(U, V) and Y = max(U, V). Calculate P{Y ≤ y | X} and calculate E(Y | X).


Problem 3.18 Let X₁, . . . , Xₙ be i.i.d. r.v.s with continuous distribution function F. Let X = max{X₁, . . . , Xₙ} and Y = min{X₁, . . . , Xₙ}. Prove the following statements.

i)

P{Y > y | X = t} = ((F(t) − F(y)) / F(t))^{n−1}, y < t.

ii)

P{X_k ≤ x | X = t} = ((n−1)/n) · F(x)/F(t) for x < t, and = 1 for x ≥ t.

iii)

E(X_k | X = t) = ((n−1)/(n F(t))) ∫_{−∞}^t y dF(y) + t/n.

Problem 3.19 Gambler’s ruin. A man is saving money to buy a new Jaguar at the cost of N units of money. He starts with k (1 < k < N) units and tries to win the remainder by the following gamble with his bank manager. He tosses a fair coin repeatedly; if it comes up heads the manager pays him one unit, but if it comes up tails then he pays the bank manager one unit. He plays this game repeatedly, until one of two events occurs: either he runs out of money and is bankrupted or he wins enough to buy the Jaguar. What is the probability that he is ultimately bankrupted?

Let A_k denote the event that he is eventually bankrupted, given an initial capital of k units. Write p_k = P{A_k}. Let B be the event that the first toss of the coin shows heads.

Conditioning on B yields a linear relation between p_k, p_{k−1} and p_{k+1}, for k = 1, . . . , N − 1. This is a linear difference equation with boundary conditions p₀ = 1, p_N = 0.

A trick to solve this (and many similar problems) is to look at the differences b_k = p_k − p_{k−1}. The linear difference equation then transforms into a linear relation between b_k and b_{k+1}.

i) Solve it and determine pk.

ii) One can look at the problem from a different point of view. Let T be the first time our man either is bankrupted or has collected the money for buying the Jaguar. Show that T is a stopping time. Assume that it is finite with probability 1 and has finite expectation. Use this to derive the same formula for p_k.
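A quick simulation sketch (our own illustration, not part of the problem set) estimates p_k and compares it with the value (N − k)/N that the difference equation yields for a fair coin; the function name and parameter values are arbitrary.

```python
import random

def ruin_probability(k, N, trials=20000):
    """Estimate the probability of ruin for a fair-coin gambler starting at k."""
    ruined = 0
    for _ in range(trials):
        capital = k
        while 0 < capital < N:
            capital += 1 if random.random() < 0.5 else -1
        ruined += (capital == 0)
    return ruined / trials

k, N = 3, 10
print(ruin_probability(k, N), (N - k) / N)   # the two numbers should be close
```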

Problem 3.20 Now the man follows another strategy. He starts by betting one unit of money. If heads comes up, the manager pays him his bet; if tails comes up, he loses his bet to the manager. Every time he wins, he increases his bet by one, but he will never bet more than his present capital or the remainder needed to buy the Jaguar. If he loses, he decreases the next bet by one, again with the condition that he will not bet more than his present capital or the sum still needed to buy the Jaguar. He will always bet at least 1.

Denote by Sₙ his capital after n bets; S₀ = k is his initial capital. Let T again denote the moment that the man stops betting. Then let us simply model that the man's capital remains the same forever after.


i) Show that E(S_{n+1} | S₀, . . . , Sₙ) = Sₙ.

ii) Show that E(Sₙ) = S₀.

iii) Assume that we may conclude that E(S_T) = S₀. Determine now the probability that the man gets bankrupted. How do the two strategies compare?

Problem 3.21 A biased coin is tossed repeatedly. Each time there is a probability p of a head turning up. Let pₙ be the probability that an even number of heads has occurred after n tosses (zero is an even number). Then p₀ = 1. Derive an expression for pₙ in terms of p_{n−1} and use it to calculate pₙ, n = 1, 2, . . ..

Sequences of r.v.s and some examples

Gambling systems (cf. BSP Ch. 3) A casino offers the following game consisting of n rounds. In every round t the gambler bets αₜ ≥ 0. His bet in round t may depend on his knowledge of the game's past.

The outcomes ηₜ, t = 1, . . ., of the game are i.i.d. r.v.s with values in {−1, 1} and P{ηₜ = 1} = 1/2 = P{ηₜ = −1}. The gambler's capital at time t is therefore Xₜ = ∑_{i=1}^t αᵢηᵢ.

A gambling strategy α₁, α₂, . . . is called admissible if αₜ is σ(η₁, η₂, . . . , η_{t−1})-measurable. In words this means that the gambler has no prophetic abilities: his bet at time t depends exclusively on the observed past history.

Example: αₜ = 1_{ηₜ>0}, “only bet if you will win”, is not admissible.

Problem 3.22 By the distribution of outcomes, one has E(Xt) = 0. Prove this.

One has that T = min{t | Xₜ ≤ α} is a stopping time, since {T ≤ t} = ∪_{l=0}^t {X_l ≤ α} and

{X_l ≤ α} ∈ σ(η₁, . . . , η_l) ⊂ σ(η₁, . . . , ηₜ), l ≤ t.

Now, αₜ = 1_{T>t−1} = 1_{T≥t} ∈ σ(η₁, . . . , η_{t−1}) defines an admissible gambling strategy with

Xₜ = ∑_{j=1}^t αⱼηⱼ = ∑_{j=1}^t 1_{T≥j} ηⱼ = ∑_{j=1}^{min{t,T}} ηⱼ = S_{min{t,T}},

where Sₜ = ∑_{j=1}^t ηⱼ. Hence E(S_{min{t,T}}) = 0 if T is a stopping time.
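A small simulation sketch of this last identity (our own illustration, not from BSP): the strategy αₜ = 1_{T ≥ t} is admissible, and the resulting stopped capital S_{min{t,T}} has mean zero even though it uses the stopping time T. The level called `barrier` below is an arbitrary choice.

```python
import random

def stopped_capital(t_max=200, barrier=-5):
    """Play t_max fair rounds with alpha_t = 1_{T >= t}; T = first time capital <= barrier."""
    capital = 0
    for _ in range(t_max):
        if capital <= barrier:        # T has occurred: stop betting
            break
        capital += 1 if random.random() < 0.5 else -1
    return capital                    # this is S_{min(t_max, T)}

samples = [stopped_capital() for _ in range(20000)]
print(sum(samples) / len(samples))    # should be close to 0
```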

Hedging We have seen that the above gambling strategies cannot modify the expectation: on the average the gambler wins and loses nothing. Apart from that, which payoffs can one obtain by gambling?

We discuss a simple model for stock options. Assume that the stock price either increases by 1 or decreases by 1 every day, with probability 1/2, independently from day to day. Suppose I own αₜ units of stock at time t. Then the value of my portfolio increases by αₜηₜ every day (the ηₜ are defined as in the gambling section).

Suppose the bank offers the following contract, a “European option”: at a given time t one has the choice to buy 1 unit of stock for price C or not to buy it. C is specified in advance. Our pay-off per unit of stock is (Sₜ − C)⁺. In exchange, the bank receives a deterministic amount E((Sₜ − C)⁺).

Can one generate the pay-off by an appropriate gambling strategy? The answer is yes, and in fact much more is true.


Lemma 3.4 Let Y be a σ(η₁, . . . , ηₙ)-measurable function. Then there is a gambling strategy α₁, . . . , αₙ such that

Y − E(Y) = ∑_{j=1}^n αⱼηⱼ.

Proof. Write Fₙ = σ(η₁, . . . , ηₙ). Define αⱼ by

αⱼηⱼ = E(Y | Fⱼ) − E(Y | F_{j−1}).

We have to show that αⱼ is F_{j−1}-measurable.

Problem 3.23 i) Show that E(αⱼηⱼ | F_{j−1}) = 0.

ii) Use this fact to show that E(αⱼ | F_{j−1}, ηⱼ = 1) = E(αⱼ | F_{j−1}, ηⱼ = −1).

Now αⱼ is F_{j−1}-measurable if

αⱼ = (1/ηⱼ) E(αⱼηⱼ | F_{j−1}, σ(ηⱼ))

does not depend on the value of ηⱼ. But this follows from the above.

Problem 3.24 Explain this.

We conclude that αj , j = 1, . . . is a gambling strategy. The result follows by addition. QED
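The proof is constructive, so for small n one can compute the hedging increments by brute-force enumeration of the 2ⁿ outcome paths. The sketch below is our own illustration (the payoff Y and the helper names are arbitrary choices): it builds E(Y | Fⱼ) by averaging over the unrevealed coordinates and checks the identity Y − E(Y) = ∑ⱼ αⱼηⱼ on every path.

```python
from itertools import product

n = 3
paths = list(product([-1, 1], repeat=n))          # all outcome sequences (eta_1, ..., eta_n)
Y = lambda e: max(sum(e), 0)                      # an arbitrary sigma(eta_1..eta_n)-measurable payoff

def cond_exp(prefix):
    """E(Y | F_j) for the observed prefix (eta_1, ..., eta_j): average over the rest."""
    rest = list(product([-1, 1], repeat=n - len(prefix)))
    return sum(Y(prefix + r) for r in rest) / len(rest)

for e in paths:
    # alpha_j * eta_j = E(Y | F_j) - E(Y | F_{j-1}); alpha_j uses only eta_1..eta_{j-1}
    increments = [cond_exp(e[:j]) - cond_exp(e[:j - 1]) for j in range(1, n + 1)]
    assert abs(sum(increments) - (Y(e) - cond_exp(()))) < 1e-12

print("replication identity verified on all", len(paths), "paths")
```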

Problem 3.25 Symmetry Let X₁, . . . be i.i.d. r.v.s with finite expectation. Let Sₙ = X₁ + · · · + Xₙ. In general X₁ is not σ(Sₙ)-measurable for n ≥ 2. Explain, and give an example.

Show that with probability 1 we have

E(X₁ | Sₙ) = · · · = E(Xₙ | Sₙ) = (1/n) E(X₁ + · · · + Xₙ | Sₙ) = (1/n) E(Sₙ | Sₙ) = Sₙ/n.

4 Martingales

From now on we will mainly list homework problems. As a basic ‘datum’ we take a filtered space (Ω, F, {Fₙ}ₙ, P). Here (Ω, F, P) is a probability space and Fₙ ⊂ F, n = 1, . . ., is a filtration, that is F₁ ⊆ F₂ ⊆ F₃ ⊆ · · ·.

Define F∞ = σ(∪nFn).

Let Mₙ be a supermartingale, adapted to the filtration {Fₙ}ₙ.

Problem 4.1 Suppose that S and T are stopping times adapted to the filtered space. Show thatmin(S, T ) = S ∧ T , max(S, T ) = S ∨ T and S + T are stopping times.


Suppose that T is a stopping time that is finite with probability 1.

Then {M_{n∧T}}ₙ is a supermartingale (provided that E|M_{n∧T}| < ∞) and hence E(M_{n∧T}) ≤ E(M₀). Under what conditions is

E(M_T) ≤ E(M₀)?  (4.1)

Basically one needs a condition ensuring that

E(M_T) ≤ lim_{n→∞} E(M_{n∧T}),  (4.2)

in the supermartingale case, or

E(M_T) = lim_{n→∞} E(M_{n∧T})

in the martingale case. The latter amounts to justifying interchange of limit and expectation.

BSP gives general conditions for this to happen in the form of (Doob's) Optional Stopping Theorem. We can also give simpler conditions that often apply and for which (4.2) can be proved in a more direct manner.

We give another form of the Optional Stopping Theorem.

Theorem 4.1 (Doob's Optional Stopping Theorem) i) Let {Mₙ}ₙ be a supermartingale and T an a.s. finite stopping time. One has E|M_T| < ∞ and (4.1) in each of the following cases.

1. T is a.s. bounded: T (ω) ≤ N for almost all ω ∈ Ω, for some constant N .

2. Mn(ω) ≤ C for some constant C, for almost all ω, n = 0, 1, . . ..

3. E(T ) < ∞ and |Mn(ω)−Mn−1(ω)| ≤ C for some constant C, for a.a. ω, n = 1, . . ..

ii) If {Mₙ}ₙ is a martingale, then E(M_T) = E(M₀) under any of the conditions 1, 2 or 3.

iii) Martingale transformation. Suppose that {Mₙ}ₙ is a martingale and T a stopping time satisfying (i, 3). Let {αₙ}ₙ be an admissible gambling strategy adapted to {Fₙ}ₙ (or a previsible process), such that |αₙ(ω)| ≤ C₂ for a.a. ω, n = 1, . . ., for some constant C₂. Then

E(∑_{n=1}^T αₙ(Mₙ − M_{n−1})) = 0,

in other words, on the average, we cannot turn a neutral game into a profitable (or losing) one.

iv) If {Mₙ}ₙ is a non-negative supermartingale and T is a.s. finite, then (4.1) again applies.

Problem 4.2 i) Prove parts (i, ii, iii) of the above Optional Stopping Theorem.

ii) Prove (iv). Deduce that λ P{supₙ Mₙ ≥ λ} ≤ E(M₀).

A problem in applying this theorem is to check a.s. finiteness of the stopping time, let alone checking that it has finite expectation.

There is a simple result, which applies in many cases.


Lemma 4.2 What always stands a reasonable chance of happening will a.s. happen, sooner rather than later. Let T be a stopping time on the filtered space (Ω, F, (Fₙ)ₙ, P). Suppose T has the property that for some N ∈ Z₊ and some ε > 0,

P{T ≤ t + N | Fₜ} > ε, a.s., t = 1, 2, . . .

Then E(T) < ∞; in particular, T < ∞ a.s.

Problem 4.3 Prove Lemma 4.2. Hint: using that P{T > kN} = P{T > kN, T > (k − 1)N}, prove by induction that P{T > kN} ≤ (1 − ε)^k.

Monkey typing ABRACADABRA At each of the times 1, 2, 3, . . . a monkey types a capital letter at random. The sequence of letters forms an i.i.d. sequence of r.v.s, uniformly drawn from the 26 possible capital letters.

Just before each time t = 1, 2, 3, . . ., a new gambler arrives, carrying D 1 in his pocket. He bets D 1 that the t-th letter will be A. If he loses, he leaves; if he wins he receives 26 times his bet (so that his total capital after his first bet is D 26). He then bets all of D 26 on the event that the (t+1)-th letter will be B. If he loses, he leaves. If he wins, he bets his whole fortune of D 26² on the event that the (t+2)-th letter will be R. And so forth through the whole ABRACADABRA sequence. Let T be the first time by which the monkey has produced the ABRACADABRA sequence. Once this sequence has been produced, gamblers stop arriving and nothing happens anymore.

Problem 4.4 i) Put M₀ = 0. Show that the total accumulated gain Mₜ by the gamblers at time t, t = 0, 1, 2, . . ., is a martingale (loss is a negative gain).

ii) Show that T is a.s. finite with E(T) < ∞.

iii) Explain why martingale theory makes it intuitively obvious that

E(T) = 26^{11} + 26^4 + 26.

Prove this.

iv) Can you make a guess of the expected time till the monkey has typed 10 successive A's? Explain intuitively.
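The martingale answer is easy to check by simulation when the alphabet and the pattern are small; 26^{11} is of course far out of reach. The sketch below (our own illustration) uses a 3-letter alphabet and the pattern ABA, for which the same gambler argument predicts E(T) = 3³ + 3 = 30.

```python
import random

def waiting_time(pattern="ABA", alphabet="ABC"):
    """Time until the i.i.d. uniform letter stream first shows `pattern`."""
    recent, t = "", 0
    while not recent.endswith(pattern):
        recent = (recent + random.choice(alphabet))[-len(pattern):]
        t += 1
    return t

mean_t = sum(waiting_time() for _ in range(20000)) / 20000
print(mean_t)   # should be close to 3**3 + 3 = 30
```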

Simple and asymmetric random walks Let Xₙ be a simple (possibly asymmetric) random walk on the integers. Then Xₙ is a martingale whenever p = 1/2; it is a supermartingale whenever p < 1/2 and a submartingale when p > 1/2.

First consider a finite interval (a, b), such that X₀ ∈ (a, b). Let T be the first time that Xₙ leaves this interval, i.e.

T = min{n | Xₙ ∉ (a, b)}.

Problem 4.5 i) Show that Xn − n(2p− 1) is a martingale.

ii) Show that T is a.s. finite and has finite expectation.

Let p = 1/2.


iii) Compute P{X_T = a} and P{X_T = b}, using the martingale from (i).

iv) Compute E(T ). Hint: use one of the ways discussed in BSP or during the lectures, to define asuitable related martingale.

In the case that e.g. b = ∞, the result that T is a.s. finite whenever p ≤ 1/2, but not whenever p > 1/2, should be intuitively obvious. Let a = 0, X₀ = 1, b = ∞; that is, we are interested in the probability that the random walk will hit 0. There are many ways of investigating this. Here we aim to use methods discussed in Ch. 3 of BSP and the notes.

Problem 4.6 i) Use the previous problem to show that T is a.s. finite in the symmetric case. Show that E(T) = ∞.

ii) Assume that p < 1/2. Show that T is a.s. finite by using the martingale from (i) of the previous problem. Hint: show that E(n ∧ T) ≤ 1/(1 − 2p). Deduce that E(T) < ∞.

This still leaves the case p > 1/2. A simple technique coming from Markov chain theory and potential theory helps. We formulate it in a more general context.

Lemma 4.3 Let {Xₙ}ₙ be a stochastic process with values in Z, adapted to the filtration {Fₙ}ₙ. Let H be the collection of functions f : Z₊ → R with the following properties: f ≥ 0, and {f(Xₙ)}ₙ is a supermartingale (adapted to {Fₙ}ₙ), for any initial position X₀ = x. Let T₀ = min{n > 0 | Xₙ = 0}. In Markov chain theory such functions are called non-negative superharmonic functions.

i) Let x > 0 be given. Suppose that P{T₀ < ∞ | X₀ = x} = 1. Then f(0) ≤ f(x).

ii) Show that the stopping time T for the asymmetric walk with p > 1/2 is infinite with positive probability. Hint: construct a function f with f(0) > f(x), such that f(Xₙ) is a martingale.

Martingale formulation of Bellman's optimality principle. Your winnings per unit stake on game n are εₙ, where the εₙ are i.i.d. r.v.s with

P{εₙ = 1} = p = 1 − P{εₙ = −1},

with p > 1/2. Your bet αₙ on game n must lie between 0 and Z_{n−1}, your capital at time n − 1. Your object is to maximise your ‘interest rate’ E log(Z_N/Z₀), where N, the length of the game, is finite and Z₀ is a given constant. Let Fₙ = σ(ε₁, . . . , εₙ) be your ‘history’ up to time n. Let {αₙ}ₙ be an admissible strategy.

Problem 4.7 Show that log(Zn)− nα is a supermartingale with α the entropy given by

α = p log p + (1− p) log(1− p) + log 2.

Hence E log(Z_N/Z₀) ≤ Nα. Show also that for some strategy log(Zₙ) − nα is a martingale. What is the best strategy?
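For intuition (our own sketch, not part of the problem): the strategy that makes log(Zₙ) − nα a martingale turns out to be proportional betting, and one can compare the realised growth rate of a few candidate betting fractions by simulation. The fractions tried below are arbitrary choices; 2p − 1 is the candidate optimum.

```python
import random, math

def growth_rate(fraction, p=0.6, N=10000):
    """Realised (1/N) log(Z_N/Z_0) when betting a fixed fraction of capital each game."""
    log_z = 0.0
    for _ in range(N):
        eps = 1 if random.random() < p else -1
        log_z += math.log(1 + fraction * eps)
    return log_z / N

p = 0.6
alpha = p * math.log(p) + (1 - p) * math.log(1 - p) + math.log(2)   # the entropy bound
for f in (0.1, 2 * p - 1, 0.5):
    print(f, growth_rate(f, p))
print("bound alpha =", alpha)   # the fraction 2p-1 should come closest to alpha
```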


5 Martingale convergence problems

Let the filtered probability space (Ω, F, {Fₙ}ₙ, P) be given. All processes are again processes on this space, adapted to the filtration {Fₙ}ₙ.

A summary of the L1-supermartingale convergence theorem is as follows.

Theorem 5.1 (BSP Thm. 4.3, 4.4) Let {Mₙ}_{n=0,1,...} be a UI supermartingale. Then Mₙ → M_∞ a.s. for some r.v. M_∞, and even Mₙ → M_∞ in L¹, i.e.

E|Mₙ − M_∞| → 0, n → ∞.

If {Mₙ}ₙ is a martingale, then

Mₙ = E(M_∞ | Fₙ),

and so {Mₙ}ₙ is a Doob-type martingale w.r.t. M_∞.

It is useful to quote the following theorem, which extends BSP exercise 4.5.

Theorem 5.2 (Levy's ‘Upward’ Theorem) Let X be a r.v. with E|X| < ∞. Then Mₙ = E(X | Fₙ) is a UI martingale. Let M_∞ = a.s. limₙ→∞ Mₙ; then

M_∞ = E(X | F_∞), a.s.,

where F_∞ = σ(∪ₙ Fₙ).

That M_∞ = E(X | F_∞) is by no means trivial. It amounts again to justifying a limit interchange: limₙ E(X | Fₙ) = E(X | σ(limₙ Fₙ)).

Proof. We only have to prove that M∞ = E(X|F∞), a.s.

Let Y = E(X | F_∞), a.s., and suppose that P{Y ≠ M_∞} > 0. We may assume that X ≥ 0, a.s. Define two measures on (Ω, F_∞):

µ₁(A) = E(Y 1_A),  µ₂(A) = E(M_∞ 1_A),  A ∈ F_∞.

For B ∈ Fₙ we have B ∈ F_∞ and so

µ₁(B) = E(Y 1_B) = E(X 1_B)  (definition of Y) = E(Mₙ 1_B)  (definition of Mₙ) = E(M_∞ 1_B)  (BSP Thm. 4.4) = µ₂(B).

Hence µ₁ and µ₂ agree on the π-system ∪ₙ Fₙ and therefore they agree on F_∞.

Now Y is F_∞-measurable. Take M_∞ = lim supₙ Mₙ; then M_∞ is F_∞-measurable as well. Hence the event F = {Y > M_∞} is F_∞-measurable and so

E((Y − M_∞) 1_F) = µ₁(F) − µ₂(F) = 0.

Since (Y − M_∞) 1_F ≥ 0, it follows that P{F} = 0. Similarly, P{Y < M_∞} = 0. QED

Theorem 5.3 (Kolmogorov's 0-1 law) Let X₁, . . . be a sequence of independent r.v.'s. Then P{A} = 0 or 1 for all A ∈ T, with T the tail-σ-algebra.


Proof. Define Fₙ = σ(X₁, . . . , Xₙ). Let A ∈ T, and let X = 1_A. By Levy's upward theorem,

X = E(X | F_∞) = limₙ E(X | Fₙ), a.s.

Now X is T_{n+1}-measurable, where T_{n+1} = σ(X_{n+1}, X_{n+2}, . . .). Since T_{n+1} and Fₙ are independent, it follows that X is independent of Fₙ. And so, E(X | Fₙ) = E(X) = P{A}. Consequently, X = P{A}, a.s. The result follows, since indicator functions take only the values 0 or 1. QED

There is a nice proof of the strong Law of Large Numbers using Kolmogorov's 0-1 Law. To this end we will in fact use so-called ‘reverse martingales’.

Theorem 5.4 (Levy's Downward Theorem) Let (Ω, F, P) be a probability space. Let {F_{−n}}ₙ be a non-increasing collection of sub-σ-algebras of F with

F_{−1} ⊇ F_{−2} ⊇ · · · ⊇ F_{−n} ⊇ · · · ⊇ F_{−∞} = ∩ₙ F_{−n}.

Let X be a r.v. with E|X| < ∞, and define M_{−n} = E(X | F_{−n}). Then M_{−∞} = limₙ→∞ M_{−n} exists a.s. and in L¹. Moreover M_{−∞} = E(X | F_{−∞}), a.s.

Problem 5.1 Prove the theorem. Use the techniques that were used for Doob's submartingale convergence theorem, the L¹-convergence and Levy's Upward Theorem.

Let X₁, . . . be a sequence of i.i.d. r.v.s with finite expectation. Write Sₙ = ∑_{i=1}^n Xᵢ. Define F_{−n} = σ(Sₙ, S_{n+1}, . . .).

Problem 5.2 i) Show that E(X₁ | F_{−n}) = Sₙ/n, a.s.

ii) Show that limn→∞ Sn/n exists a.s. and in L1, and that it equals E(X1).

Galton-Watson process - the simplest form of a branching process This is a simple model for population growth, growth of the number of cells, etc. Suppose that we start with a population of 1 individual at time 0, i.e. N₀ = 1.

The number of t-th generation individuals is denoted by Nₜ. Individual n from this generation has an amount of offspring Zₜⁿ. We assume that Zₜⁿ, n = 1, . . . , Nₜ, t = 0, . . ., are bounded i.i.d. r.v.s, say

P{Zₜⁿ = k} = p_k, k = 0, . . . , K,

for some constant K > 0, and that p₀ > 0. Clearly, N_{t+1} = ∑_{n=1}^{Nₜ} Zₜⁿ and µ = E(N₁) = ∑_k k p_k.

We are interested in the extinction probability of the population as well as the expected time till extinction. Define T = min{t | Nₜ = 0}.

Problem 5.3 i) Show that Nₜ/µ^t is a martingale with respect to an appropriate filtration.

Let now µ < 1, that is, on the average an individual produces less than one child.

ii) Show that Nt → 0 a.s. What does this imply for the extinction time T?


iii) Show that Mₜ = α^{Nₜ} 1_{Nₜ>0} is a contracting supermartingale for some α > 1, i.e. for some α > 1 there exists β < 1 such that

E(M_{t+1} | Fₜ) ≤ β Mₜ, t = 1, . . .

iv) Show that this implies that E(T ) < ∞. What is the smallest bound on E(T ) you can get?
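A quick simulation sketch of the subcritical case (our own illustration; the offspring distribution below is an arbitrary choice with µ < 1): it estimates the extinction time T and confirms that extinction happens quickly, in line with the geometric-type bound that the contracting supermartingale gives.

```python
import random

def extinction_time(p=(0.5, 0.3, 0.2), max_gen=10000):
    """One Galton-Watson run; p[k] = P{Z = k}, here mu = 0.3 + 2*0.2 = 0.7 < 1."""
    N, t = 1, 0
    while N > 0 and t < max_gen:
        N = sum(random.choices(range(len(p)), weights=p)[0] for _ in range(N))
        t += 1
    return t

times = [extinction_time() for _ in range(5000)]
print(sum(times) / len(times), max(times))   # E(T) is small and finite here
```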

The case of a population that remains constant on the average is more complicated. Let T_N = min{t | Nₜ = 0 or Nₜ ≥ N}. Intuitively it is clear that T_N should be a.s. finite. In order to prove this, define the function

f(x) = P{0 < Nₜ < N for all t ≥ n | Nₙ = x}.

Problem 5.4 i) Show that f(Nt) is a supermartingale.

ii) Show that this implies that f ≡ 0, for all values of µ. Hint: consider the value f(x*) where x* = argmax f(x).

iii) Let µ = 1. Use (ii) to show that P{T < ∞} = 1. Is Nₜ a UI martingale in this case? Explain.

iv) Prove that P{T < ∞} < 1 whenever µ > 1. You can prove this by using arguments that you have seen before during the course.

We are still left with the question whether the average time till extinction is finite or not in the critical situation µ = 1. The answer is that E(T) = ∞, for which there seems to exist no probabilistic proof.

Problem 5.5 Find the simplest proof in the literature of this statement and write it down in yourown words.

6 Continuous time processes: the Wiener process or Brownian motion

Multivariate normal distribution Let X = (X₁, X₂, . . . , X_k)ᵀ be a k-dimensional random vector. We say that X has a N(µ, Σ) distribution, with µ ∈ R^k and Σ a k × k positive definite matrix, if aᵀX has the normal distribution N(aᵀµ, aᵀΣa) for all a ∈ R^k. The simultaneous distribution of X is given by

f_X(x) = (1/√((2π)^k det(Σ))) exp{−½ (x − µ)ᵀ Σ^{-1} (x − µ)}, x ∈ R^k.

A first consequence is that for a non-singular matrix B, the vector BX has the N(µ*, Σ*) distribution with µ* = Bµ and Σ* = BΣBᵀ.


The definition determines all information on the components Xᵢ and their covariances cov(Xᵢ, Xⱼ) = E(XᵢXⱼ) − E(Xᵢ)E(Xⱼ). Putting a = eᵢ, the i-th unit vector, we obtain that Xᵢ d= N(µᵢ, Σᵢᵢ). Using a = eᵢ + eⱼ, one can deduce that cov(Xᵢ, Xⱼ) = Σᵢⱼ.

By using the density, one can then show that Σᵢⱼ = 0 implies independence of Xᵢ and Xⱼ. This is a special property of normally distributed r.v.s.

Next we define Brownian motion.

Brownian motion or Wiener process The stochastic process W(t), t ∈ R₊, defined on the probability space (Ω, F, P) is called a standard Brownian motion (or standard Wiener process) if

i) W(0) = 0 a.s.;

ii) (W(t₁), . . . , W(tₙ)) has a multivariate normal distribution, for all n and all times 0 < t₁ < t₂ < · · · < tₙ;

iii) E(W (t)) = 0 for t > 0;

iv) cov(W (s),W (t)) = min(s, t);

v) W (·, ω) is a continuous function for a.a. ω ∈ Ω.

Problem 6.1 Let 0 < t₁ < · · · < tₙ. By assumption (W(t₁), . . . , W(tₙ)) has a multivariate normal distribution. Compute the covariance matrix.

Construction of Brownian motion on [0, 1] For each ω we will define a uniformly convergent sequence of continuous functions W_l(t, ω), t ∈ [0, 1], l = 0, 1, . . ..

Define W₀(0) = 0, and choose W₀(1) = ∆_{0,0} d= N(0, 1). Extend W₀(t), 0 < t < 1, by linear interpolation: W₀(t) = t · W₀(1).

Next let ∆_{1,1} d= N(0, 1/4) be drawn independently of ∆_{0,0}. Define W₁(0) = 0,

W₁(1/2) = ½ W₀(1) + ∆_{1,1},

and W₁(1) = W₀(1), and define W₁(t), t ≠ 0, 1/2, 1, by linear interpolation. It is easily checked that (W₁(1/2), W₁(1)) has a multivariate normal distribution with the properties (iii, iv). Indeed

cov(W₁(1/2), W₁(1)) = cov(½ W₀(1), W₀(1)) = ½ σ²(W₀(1)) = ½ = min(½, 1).

Further

σ²(W₁(1/2)) = ¼ σ²(W₀(1)) + σ²(∆_{1,1}) = ½.

The construction of W_{l+1}(t) from W_l(t) is as follows. Let ∆_{l+1,j} d= N(0, 2^{-(l+2)}), j = 1, . . . , 2^l, be independent and independent of ∆_{i,j}, i ≤ l, j = 1, . . . , 2^{i−1}. Assign W_{l+1}(0) = 0,

W_{l+1}((2j − 1)2^{-(l+1)}) = W_l((2j − 1)2^{-(l+1)}) + ∆_{l+1,j}, j = 1, . . . , 2^l,

and

W_{l+1}(j·2^{-l}) = W_l(j·2^{-l}), j = 1, . . . , 2^l.

For t ≠ j·2^{-(l+1)}, j = 0, 1, . . . , 2^{l+1}, we define W_{l+1}(t) by linear interpolation:

W_{l+1}(t) = W_{l+1}(j·2^{-(l+1)}) + ((t − j·2^{-(l+1)}) / 2^{-(l+1)}) · (W_{l+1}((j + 1)·2^{-(l+1)}) − W_{l+1}(j·2^{-(l+1)})), j·2^{-(l+1)} ≤ t ≤ (j + 1)·2^{-(l+1)}, j = 0, . . . , 2^{l+1} − 1.

Then W_{l+1}(t), t = j·2^{-(l+1)}, has the multivariate normal distribution with properties (iii, iv) from the definition of standard Brownian motion.
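The construction is easy to carry out on a computer; the sketch below (our own illustration) refines a path over the dyadic points exactly as described: at every level the midpoint corrections get half the previous variance and the already fixed dyadic values are kept.

```python
import numpy as np

def levy_construction(levels=10, rng=np.random.default_rng(0)):
    """Successive dyadic refinements W_0, W_1, ... of a Brownian path on [0, 1]."""
    W = np.array([0.0, rng.normal(0.0, 1.0)])        # W_0 at t = 0 and t = 1
    for l in range(levels):
        n = len(W) - 1                                # current number of intervals, 2^l
        midpoints = 0.5 * (W[:-1] + W[1:]) + rng.normal(0.0, np.sqrt(2.0**-(l + 2)), n)
        refined = np.empty(2 * n + 1)
        refined[0::2] = W                             # keep the old dyadic values
        refined[1::2] = midpoints                     # insert the corrected midpoints
        W = refined
    return np.linspace(0, 1, len(W)), W               # times and values W_levels(j 2^-levels)

t, w = levy_construction()
print(len(w), w[-1])   # 2**levels + 1 points; w[-1] = W(1), distributed N(0, 1)
```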

Lemma 6.1 sup_{0≤t≤1} |Wₙ(t) − Wₘ(t)| → 0 a.s. as n, m → ∞, i.e.

P{ω : sup_{0≤t≤1} |Wₙ(t, ω) − Wₘ(t, ω)| does not tend to 0 along some sequence n_{k,ω}, m_{k,ω}, k = 1, 2, . . .} = 0.

Proof. Let X_{l,j}, j = 1, . . . , 2^{l−1}, l = 1, . . ., be a collection of i.i.d. N(0, 1)-distributed r.v.s. Clearly, for l ≥ 1,

sup_{0≤t≤1} |W_l(t) − W_{l−1}(t)| ≤ max{|∆_{l,j}|, j = 1, . . . , 2^{l−1}} = (1/2^{(l+1)/2}) · max{|X_{l,1}|, . . . , |X_{l,2^{l−1}}|}.

Put

Aₙ = {ω : max_{l=1,...,n; j=1,...,2^{l−1}} |X_{l,j}(ω)| > 2·√(6 log(2^n − 1))}

(there are ∑_{l=1}^n 2^{l−1} = 2^n − 1 i.i.d. N(0, 1)-distributed r.v.s that determine the max). We have seen that P{Aₙ} ≤ 1/(2^n − 1)², and so the first Borel-Cantelli lemma implies that P{lim sup_{n→∞} Aₙ} = 0.

Put A = lim sup_{n→∞} Aₙ. Fix any ω ∈ Aᶜ. There exists n_ω such that

sup_{0≤t≤1} |Wₙ(t, ω) − W_{n−1}(t, ω)| ≤ (1/2^{(n+1)/2}) · 2 · √(6 log(2^n − 1)), n ≥ n_ω.

Consequently, for m > n ≥ n_ω,

sup_{0≤t≤1} |Wₘ(t, ω) − Wₙ(t, ω)| ≤ ∑_{l=n+1}^m sup_{0≤t≤1} |W_l(t, ω) − W_{l−1}(t, ω)| ≤ ∑_{l=n+1}^m (1/2^{(l+1)/2}) · 2 · √(6 log(2^l − 1)) → 0, m, n → ∞.

QED

As a result, for ω ∈ Aᶜ the sequence of continuous functions Wₙ(·, ω) has a continuous limit W(·, ω). To see that this limit defines a Brownian motion on [0, 1], we still have to do some work.

Let 0 < t₁ < t₂ < · · · < tₙ ≤ 1. We have to show that (W(t₁), . . . , W(tₙ)) is a random vector with the desired properties. This is clearly true if all t_k are dyadic rationals j_k/2^l: on these points W(t_k) = Wₙ(t_k) for n ≥ l. Otherwise let t_k^m > t_k, t_k^m → t_k, m → ∞, be a sequence of dyadic rationals. Then

(W(t₁, ω), . . . , W(tₙ, ω)) = lim sup_{m→∞} (W(t₁^m, ω), . . . , W(tₙ^m, ω)), ω ∈ Aᶜ,


by continuity. If A = ∅ then the lim sup is measurable. If A ≠ ∅ we need in fact that F be extended with all subsets of 0-probability sets. This is a little beyond the scope of the course.

Now, (W(t₁^m), . . . , W(tₙ^m)) converges a.s. to the random vector (W(t₁), . . . , W(tₙ)). The corresponding multivariate normal densities converge as well to the desired multivariate normal density. Hence the corresponding distribution functions converge. One can then show that the limit distribution function is the distribution function of (W(t₁), . . . , W(tₙ)).


Lemma 6.2 Let (Ω, F, P) be a probability space. Let X and Xₙ, n = 1, 2, . . ., be r.v.s on this probability space, such that Xₙ → X a.s. Let Fₙ(·) = P{Xₙ ≤ ·} be the distribution function of Xₙ and assume that Fₙ → F for some distribution function F. Then F is the distribution function of X.

Problem 6.2 Prove this lemma - Fatou’s lemma for sets plays a role here.

With probability 1, Brownian motion paths W(t), 0 ≤ t, are nowhere differentiable. In other words: there exists a set A ∈ F, P{A} = 0, such that W(·, ω) is nowhere differentiable for all ω ∈ Aᶜ.

Note: we assume that W(·, ω) has continuous paths on R₊ for a.a. ω. We have proved this only for a compact time interval. Let

X_{nk} = max{ |W((k+1)/2^n) − W(k/2^n)|, |W((k+2)/2^n) − W((k+1)/2^n)|, |W((k+3)/2^n) − W((k+2)/2^n)| }.

These increments each have the distribution of 2^{-n/2} · W(1), and so

P{X_{nk} ≤ ε} = (P{|W(1)| ≤ 2^{n/2} ε})³ ≤ (2 · 2^{n/2} · ε)³,

since the density of the standard normal distribution is bounded by 1. For Yₙ = min_{k ≤ n·2^n} X_{nk}, we have

P{Yₙ ≤ ε} ≤ n · 2^n (2 · 2^{n/2} · ε)³.  (6.1)

Problem 6.3 Explain (6.1).

Let

D̄W(t, ω) = lim sup_{h↓0} (W(t + h, ω) − W(t, ω))/h,   D̲W(t, ω) = lim inf_{h↓0} (W(t + h, ω) − W(t, ω))/h.

Let

E = {ω : D̄W(t, ω) and D̲W(t, ω) are both finite for some t}.

It is not clear whether E ∈ F, i.e. whether E is a measurable set! Choose ω ∈ E. Then there exists K = K(ω) such that

−K < D̲W(t, ω) ≤ D̄W(t, ω) < K,

for some time t = t(ω). Then there exists a constant δ = δ(ω, t, K) such that |W(s) − W(t)| ≤ K|s − t| for all s ∈ [t, t + δ]. Hence, there exists n₀ = n₀(δ, K, t) such that for n > n₀

4 · 2^{-n} < δ, 8K < n, n > t.

Given such n, choose k so that (k − 1)2^{-n} ≤ t < k·2^{-n}. It follows that |i · 2^{-n} − t| < δ for i = k, k + 1, k + 2, k + 3, and so

X_{nk}(ω) ≤ 2K(4 · 2^{-n}) < n · 2^{-n}.  (6.2)


Problem 6.4 Explain (6.2).

Since k − 1 ≤ t · 2^n < n · 2^n, it follows that

Yₙ(ω) ≤ X_{nk}(ω) ≤ n · 2^{-n}.

We have thus shown that for ω ∈ E, there exists N_ω such that ω ∈ Aₙ = {ω : Yₙ(ω) ≤ n · 2^{-n}} for n ≥ N_ω. So, E ⊂ lim infₙ Aₙ.

By virtue of (6.1),

P{Aₙ} ≤ n · 2^n (2 · 2^{n/2} · n·2^{-n})³.

Thus P{lim infₙ Aₙ} ≤ lim inf_{n→∞} P{Aₙ} = 0. By extending F with all sets contained in sets of probability 0, we obtain that P{E} = 0. This example shows again the necessity of such an extension procedure!

Markov property and strong Markov property Fix a time T ≥ 0. Put Fₜ = σ(W(s), s ≤ t) and F₀ = {∅, Ω}.

Now W(T + t) − W(T), t ≥ 0, is independent of F_T. This is the Markov property of Brownian motion. Moreover, it is a Brownian motion.

Problem 6.5 Prove these statements.

We may even allow T to be a stopping time.

T is a stopping time if T is a non-negative r.v. on (Ω, F, P) such that

{ω : T(ω) ≤ t} ∈ Fₜ for all t ≥ 0.

Define F_T to be the collection of all sets M ∈ F such that M ∩ {ω : T(ω) ≤ t} ∈ Fₜ for all t ≥ 0.

Problem 6.6 Deduce that {ω : T(ω) = t} ∈ Fₜ and that M ∈ F_T implies M ∩ {ω : T(ω) = t} ∈ Fₜ.

Now, let T be a stopping time and put W*(t) = W(T + t) − W(T). Then the strong Markov property holds: W*(t), t ≥ 0, is independent of F_T (i.e. σ(W*(t), t ≥ 0) is independent of F_T). Moreover, W*(t) is a Brownian motion.

This is true if for all x₁, . . . , x_k ∈ R, t₁ < · · · < t_k, k = 1, . . ., and all M ∈ F_T we have

P{(W*(t₁) ≤ x₁, . . . , W*(t_k) ≤ x_k) ∩ M}
= P{W*(t₁) ≤ x₁, . . . , W*(t_k) ≤ x_k} · P{M} = P{W(t₁) ≤ x₁, . . . , W(t_k) ≤ x_k} · P{M}.  (6.3)

To prove this, first assume that T ∈ A for a countable set A with probability 1. Since

{ω : W*(t) ≤ x} = ∪_{s ∈ A} {ω : W(s + t, ω) − W(s, ω) ≤ x, T(ω) = s} ∈ F,

it follows that W*(t) is F-measurable. Moreover,

P{(W*(t₁) ≤ x₁, . . . , W*(t_k) ≤ x_k) ∩ M} = ∑_{t ∈ A} P{(W*(t₁) ≤ x₁, . . . , W*(t_k) ≤ x_k) ∩ M ∩ {T = t}}.


If M ∈ F_T, then M ∩ {T = t} ∈ Fₜ. Further, if T = t, then (W*(t₁), . . . , W*(t_k)) has the same distribution as (W(t₁ + t) − W(t), . . . , W(t_k + t) − W(t)). We obtain

P{(W*(t₁) ≤ x₁, . . . , W*(t_k) ≤ x_k) ∩ M}
= ∑_{t ∈ A} P{(W(t₁ + t) − W(t) ≤ x₁, . . . , W(t_k + t) − W(t) ≤ x_k) ∩ M ∩ {T = t}}
= ∑_{t ∈ A} P{W(t₁ + t) − W(t) ≤ x₁, . . . , W(t_k + t) − W(t) ≤ x_k} P{M ∩ {T = t}}
= P{W(t₁ + t) − W(t) ≤ x₁, . . . , W(t_k + t) − W(t) ≤ x_k} P{M}.

This proves that the first and last terms in (6.3) are equal. To prove equality of the second and lastterms, simply take M = Ω. Consequently, the assertion has been proved for stopping times with acountable range.

Let T be an arbitrary stopping time. Define

τₙ = k · 2^{-n} if (k − 1)2^{-n} < T ≤ k · 2^{-n}, k = 1, 2, . . ., and τₙ = 0 if T = 0.

If k · 2^{-n} ≤ t < (k + 1) · 2^{-n}, then {τₙ ≤ t} = {T ≤ k · 2^{-n}} ∈ F_{k·2^{-n}} ⊂ Fₜ. It follows that τₙ is a stopping time with a countable range.

Suppose that M ∈ F_T and k · 2^{-n} ≤ t < (k + 1) · 2^{-n}. Then M ∩ {τₙ ≤ t} = M ∩ {T ≤ k·2^{-n}} ∈ F_{k·2^{-n}} ⊂ Fₜ. So F_T ⊂ F_{τₙ}.

Let W^{(n)}(t, ω) = W(τₙ(ω) + t, ω) − W(τₙ(ω), ω) be the displacement process after the stopping time τₙ. Since M ∈ F_T implies M ∈ F_{τₙ}, we have by virtue of (6.3)

P{(W^{(n)}(t₁) ≤ x₁, . . . , W^{(n)}(t_k) ≤ x_k) ∩ M} = P{W^{(n)}(t₁) ≤ x₁, . . . , W^{(n)}(t_k) ≤ x_k} P{M}.  (6.4)

However, τₙ(ω) → T(ω) for all ω, and by a.s. continuity of the sample paths, W^{(n)}(t, ω) → W*(t, ω) for a.a. ω.

To finish the proof, we have to invoke Lemma 6.2.

Problem 6.7 Finish the proof by suitably applying this Lemma.

Curious properties Let Ta be the first time that the Brownian motion process hits the set [a,∞).

Problem 6.8 i) By conditioning on the event {T_a ≤ t} show that

2 P{W(t) ≥ a} = P{T_a ≤ t}.

ii) Use this to show that

P{T_a ≤ t} = (2/√(2π)) ∫_{a/√t}^∞ exp{−y²/2} dy.

Compute the corresponding density f_{T_a}(t). Derive that T_a < ∞ a.s., but E(T_a) = ∞. Compute P{max_{0≤s≤t} W(s) ≥ a}.
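The reflection identity in (i) is easy to check numerically on discretised Brownian paths; a minimal sketch (our own, with arbitrary values of a and t and a crude time grid):

```python
import numpy as np

rng = np.random.default_rng(1)
a, t, n_steps, n_paths = 1.0, 1.0, 1000, 5000

dt = t / n_steps
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

hit = (paths.max(axis=1) >= a).mean()          # estimate of P{T_a <= t}
tail = 2 * (paths[:, -1] >= a).mean()          # estimate of 2 P{W(t) >= a}
print(hit, tail)                               # the two estimates should be close
```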

How often does Brownian motion hit 0 in a finite time interval?


Problem 6.9 Let ρ(s, t) be the probability that a Brownian motion path has at least one zero in (s, t).

i) Deduce that

ρ(s, t) = 1 − (2/π) arcsin √(s/t).

ii) Use (i) to show that the position of the last zero before time 1 is distributed over (0, 1) with density π^{-1}(t(1 − t))^{-1/2}.

iii) For each ω let Z(ω) = {t : W(t, ω) = 0} be the set of zeroes of W(·, ω). Show that λ(Z(ω)) = 0 for a.a. ω; in words, the Lebesgue measure of the set of zeroes of W(·, ω) is 0 a.s.

We give some applications of stopping times. The first one is the curious phenomenon that one can embed any given distribution law in a Brownian motion. One version of this statement is the so-called Skorokhod embedding, which is minimal in the sense that the stopping time involved has finite expectation. Without this minimality condition, it is an almost trivial statement, as pointed out by Doob.

Problem 6.10 Let F be a distribution function. Determine a function h such that P{h(W(1)) ≤ x} = F(x). Show that τ = min{t > 1 | W(t) = h(W(1))} is an a.s. finite stopping time. Show that W(τ) has distribution function F and that E(τ) = ∞.

The stochastic process X(t) = µ · t + W(t), t ≥ 0, is called a Brownian motion with drift µ. Recall that we can associate 3 martingales with Brownian motion: W(t), t ≥ 0, W²(t) − t, t ≥ 0, and exp{cW(t) − c²t/2}, t ≥ 0. With the Brownian motion with drift, one can also associate martingales.

Problem 6.11 Show that X(t) − µt, t ≥ 0, and exp{−2µX(t)}, t ≥ 0, are martingales.

Let a < 0 < b and suppose that X(0) = x ∈ (a, b). We are interested in the probability p_x that a is hit before b.


Problem 6.12 i) Let T = min{t | X(t) ∈ {a, b}}. Show that T < ∞ with probability 1.

ii) Use the continuous time version (not formulated but evident) of the optional stopping theorem to compute p_x.

iii) Show that E(T) < ∞. Compute E(T) through a suitable martingale.

7 Diffusions and Ito processes

Let us first give a proof of Theorem 7.4 from BSP.

Theorem 7.1 Let f be a stochastic process belonging to M²ₜ and let I(s) = ∫₀ˢ f(u, ω) dW(u, ω), s ≤ t. Then there exists a stochastic process ζ(s), s ≤ t, such that ζ(·, ω) is continuous for a.a. ω and P{I(s) = ζ(s)} = 1 for all s ∈ (0, t].


Proof. Let fₙ → f be a sequence of approximating random step functions. Clearly, {I_s(fₙ)}_{0≤s≤t} is a.s. continuous.

Since {I_s(fₙ)}_{0≤s≤t} is a martingale, also {I_s(fₙ) − I_s(fₘ)}_{0≤s≤t} is a martingale. Hence {(I_s(fₙ) − I_s(fₘ))²}_{0≤s≤t} is a sub-martingale. We may apply Doob's maximal inequality, yielding that

P{sup_{0≤s≤t} |I_s(fₙ) − I_s(fₘ)| > ε} = P{sup_{0≤s≤t} |I_s(fₙ) − I_s(fₘ)|² > ε²}
≤ (1/ε²) E((I_t(fₙ) − I_t(fₘ))²)
= (1/ε²) ||I_t(fₙ) − I_t(fₘ)||²_{L²} = (1/ε²) ||fₙ − fₘ||²_{M²ₜ} → 0, n, m → ∞.

It follows that there exists a subsequence {n_k}_k such that

P{sup_{0≤s≤t} |I_s(f_{n_k}) − I_s(f_{n_{k+1}})| > 2^{-k}} < 2^{-k}.

We may apply the first Borel-Cantelli Lemma to obtain that for almost all ω there exists an index k(ω) such that

sup_{0≤s≤t} |I_s(f_{n_k})(ω) − I_s(f_{n_{k+1}})(ω)| ≤ 2^{-k}, k ≥ k(ω).

Hence, the sequence I_s(f_{n_k})(ω) converges uniformly on (0, t] for a.a. ω. Hence the limit J_s(ω) = lim_{k→∞} I_s(f_{n_k})(ω) is a continuous function of s on (0, t] for a.a. ω. Now, I_s(f_{n_k}) → I_s in L², k → ∞, for s ∈ (0, t]. Hence, there is a subsequence converging to I_s for a.a. ω. It follows that P{I_s = J_s} = 1 for s ∈ (0, t]. QED

Problem 7.1 Suppose that X, Xₙ, n = 1, . . ., are r.v.s in L²(Ω, F, P). Assume that Xₙ → X in L².

i) Show that limₙ→∞ P{|Xₙ − X| > ε} = 0, for each ε > 0.

ii) Use this to show that there is a subsequence {n_k}_k along which there is a.s. convergence, i.e. X_{n_k} → X for a.a. ω.

The proof of the above theorem gives some indications as to how to prove this.

Problem 7.2 Brownian bridge Let a, b ∈ R be given. Consider the following 1-dimensional equation:

dY(t) = ((b − Y(t))/(1 − t)) dt + dW(t), 0 ≤ t < 1, Y(0) = a.

Verify that

Y(t) = a(1 − t) + bt + (1 − t) ∫₀ᵗ dW(s)/(1 − s), 0 ≤ t < 1,

solves the equation and prove that lim_{t→1} Y(t) = b a.s.
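One way to get a feeling for this SDE is to discretise it with a simple Euler-Maruyama scheme and watch the path being pulled towards b as t → 1; the sketch below (our own illustration, with arbitrary a, b and step size) is an experiment, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, n = 0.0, 1.0, 1000
dt = 1.0 / n

Y = np.empty(n + 1)
Y[0] = a
for i in range(n):
    t = i * dt
    drift = (b - Y[i]) / (1.0 - t)                    # pull towards b, stronger near t = 1
    Y[i + 1] = Y[i] + drift * dt + rng.normal(0.0, np.sqrt(dt))

print(Y[-1])   # close to b = 1.0 (the scheme stops just before t = 1)
```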

So far, we have studied how to construct Ito processes. However, given an arbitrary stochastic differential equation, there is no clue so far as to how to judge whether there exists a solution and, if it exists, whether it is unique (with prob. 1).


BSP treats the case of a so-called Ito diffusion. Let us give the definition for the n-dimensional case.

A time-homogeneous Ito diffusion is a stochastic vector process X(t, ω) = (X₁(t, ω), . . . , Xₙ(t, ω)) on (Ω, F, P) that satisfies a stochastic differential equation of the form

dX(t) = b(X(t)) dt + σ(X(t)) dW(t), t ≥ s, X(s) = x,

where W(t) = (W₁(t), . . . , W_d(t)) is a d-dimensional Brownian motion, b : R^n → R^n, and σ : R^n → R^{n×d}. We assume that b and σ satisfy a Lipschitz condition: there exists a constant C such that

||b(x) − b(y)|| + ||σ(x) − σ(y)|| ≤ C ||x − y||.  (7.1)

Since we have not spoken of the multi-dimensional case, let us briefly spend a few words on it (we need it in the later examples). A d-dimensional Brownian motion is simply the vector process associated with d independent one-dimensional Brownian motions defined on the same space. The SDE then simply stands for

dXᵢ(t) = bᵢ(X(t)) dt + σ_{i1}(X(t)) dW₁(t) + . . . + σ_{id}(X(t)) dW_d(t), i = 1, . . . , n.

We can now set Ft = σ(Wi(s), 0 < s ≤ t, i = 1, . . . , d).

The analog of BSP Theorem 7.7 gives that under the above Lipschitz condition there is an a.s. unique solution, with a.s. continuous paths, of the initial value problem

dX(t) = b(X(t)) dt + σ(X(t)) dW(t), 0 ≤ t ≤ T, X(0) = X₀,

provided E(X₀²) < ∞ and X₀ is independent of σ(Fₜ, t > 0). This solution is adapted to the filtration σ(Fₜ, σ(X₀)) and has ∫₀ᵀ Xᵢ²(t) dt < ∞ for i = 1, . . . , n.

Lipschitz conditions are also commonly used in (deterministic) differential equations for guaranteeing existence and uniqueness properties. Next we will list a number of properties of Ito diffusions. In fact, these properties are inherent to so-called diffusion processes, given technical conditions. One does not need the notion of SDEs and Ito integrals to arrive at these properties. However, it appears from the literature that SDEs are an efficient formalism for deriving existence and uniqueness results for diffusion processes with certain given properties. In fact, the formalism constructs a diffusion process with given properties from Brownian motion paths. Some authors regard this as the key of Ito's contribution to the field of diffusion processes.

The properties described later on rely on Ito's formalism.

In our case it is better not to digress into the field of diffusion processes, all the more so because there are many conflicting definitions of this notion. The best references for a rigorous treatment are the books by Rogers and Williams (the latter being the author of the martingale book).

From now on, we will only consider Ito diffusions. One can prove that these are strong Markov processes.

The infinitesimal generator A of the process X(t) is defined by

Af(x) := lim_{t↓0} (E(f(X(t)) | X(0) = x) − f(x)) / t, x ∈ R^n.

If for a given function f this limit exists for all x, then we say that f belongs to D_A, the domain of the generator. Let C²₀(R^n) be the set of twice continuously differentiable functions on R^n with compact support. Then one can prove that Af(x) exists for all x ∈ R^n and

Af(x) = ∑ᵢ bᵢ(x) ∂f/∂xᵢ(x) + ½ ∑_{i,j} (σ(x)σᵀ(x))_{ij} ∂²f/∂xᵢ∂xⱼ(x),

whenever f ∈ C²₀(R^n). Note that A is a linear operator on C²₀(R^n).

It is obvious that Brownian motion is a time-homogeneous one-dimensional Ito diffusion with infinitesimal parameters b(x) = 0 and σ(x) = 1. The infinitesimal operator A associated with it is given by

Af(x) = ½ ∂²f/∂x²(x) =: ½ ∆f(x).

(∆ stands for the Laplace operator: if f : R^n → R is twice differentiable, then ∆f = ∑_{i=1}^n (∂²/∂xᵢ²) f.)

One can model the graph of Brownian motion by a two-dimensional diffusion as follows: X(t) = (X₁(t), X₂(t)) with X₁(t) = t and X₂(t) = W(t).

Problem 7.3 Compute the corresponding infinitesimal generator.

The Ornstein-Uhlenbeck process is the Ito diffusion defined by

dX(t) = −αX(t) dt + σ dW(t),

with constants α and σ.

Problem 7.4 Give the infinitesimal generator.
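For concreteness, a small Euler-Maruyama sketch of the Ornstein-Uhlenbeck SDE (our own illustration; the parameter values are arbitrary). For α > 0 the long-run sample mean and variance should settle near 0 and σ²/(2α), a useful sanity check alongside the generator computation asked for above.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, sigma, dt, n = 1.5, 0.8, 0.001, 500_000

x = 0.0
xs = np.empty(n)
for i in range(n):
    x += -alpha * x * dt + sigma * np.sqrt(dt) * rng.normal()   # Euler step of the SDE
    xs[i] = x

print(xs.mean(), xs.var(), sigma**2 / (2 * alpha))   # variance close to sigma^2/(2 alpha)
```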

The infinitesimal generator contains the information on the marginal distributions of an Ito diffusion.

Lemma 7.2 (Dynkin's Lemma) Let f ∈ C²₀(R^n). Suppose that τ is a stopping time with E(τ | X(0) = x) < ∞. Then

E(f(X(τ)) | X(0) = x) = f(x) + E(∫₀^τ Af(X(s)) ds | X(0) = x).

The proof of this lemma follows rather straightforwardly from Ito’s formula.

Problem 7.5 Search the literature for a proof of Dynkin's lemma based on Ito's formula. Write it down and, if necessary, supply the missing details.

Problem 7.6 Consider the n-dimensional Brownian motion W(t) = (W₁(t), . . . , Wₙ(t)), t ≥ 0. Suppose Brownian motion starts at a point x ∈ R^n. Let R > 0 be given. As the norm on R^n we consider the L²-norm: ||x|| = √(∑ᵢ xᵢ²).

i) Compute the infinitesimal generator of n-dimensional Brownian motion.

Let ||x|| < R and let τ denote the first exit time of the ball Bₙ = {y | ||y|| < R}. By a.s. continuity of Brownian motion paths, τ = inf{t > 0 | W(t) ∉ Bₙ} is equal in distribution to

inf{t > 0 | ||W(t)|| = R}.


ii) Show that P{τ < ∞ | X(0) = x} = 1. Define a suitable martingale to compute E(τ) by virtue of the optional stopping theorem. Argue that the theorem is applicable and compute the expected exit time. Hint: Problem 6.12 may be helpful here.

Let now ||x|| > R and let τ be the first entrance time of Bₙ. The question is whether τ < ∞ a.s., and if yes, what its expectation is. The case n = 1 has been solved already (where?), so we assume n ≥ 2. I do not know how to get the optional stopping theorem to work for answering the above questions - if you can, please do. Therefore, it seems better to apply Dynkin's lemma for suitable functions f. What type of function f would be suitable?

The complement of the closure of the R-ball is unbounded, and so we start in unbounded territory. Now, we need to use functions that have a compact support. It makes sense to consider the annulus A_k = {y | R < ||y|| < 2^k R}. Choose k large enough so that x ∈ A_k. Denote τ_k = inf{t > 0 | W(t) ∉ A_k}.

A suitable function f = f_{n,k} should depend on y only through the norm. Further, the integral ∫₀^{τ_k} Af_{n,k}(X(s)) ds should be easy to calculate. The best would be if this expression disappears altogether on the annulus, i.e. Af_{n,k} = 0 on A_k. In other words, ∆f_{n,k} = 0 on the annulus, i.e. f_{n,k} is harmonic (on the annulus).

Choose f = f_{n,k} a function in C²₀(R^n), with f_{n,k}(y) = log ||y|| on A_k if n = 2 and f_{n,k}(y) = ||y||^{2−n} on A_k if n > 2.

iii) Show that τ_k satisfies the conditions of Dynkin's lemma. Show that f_{n,k} is harmonic on A_k. Compute E(f(X(τ_k)) | X(0) = x). Derive that

P{τ < ∞ | X(0) = x} = 1 if n = 2, and = (||x||/R)^{2−n} if n > 2.

In case n = 2, show that E(τ | X(0) = x) = ∞. The implication is that Brownian motion in 2 dimensions is null-recurrent and in n ≥ 3 dimensions it is transient.
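For part (ii) a quick Monte Carlo check is possible: applying Dynkin's lemma to f(y) = ||y||² gives expected exit time (R² − ||x||²)/n for n-dimensional Brownian motion started at x inside the ball of radius R. The sketch below is our own illustration, with arbitrary parameters and a crude time discretisation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, R, dt, n_paths = 3, 1.0, 1e-3, 2000
x0 = np.array([0.5, 0.0, 0.0])

def exit_time():
    """Crude Euler walk of n-dim Brownian motion until it leaves the ball of radius R."""
    x, t = x0.copy(), 0.0
    while np.dot(x, x) < R**2:
        x += rng.normal(0.0, np.sqrt(dt), n)
        t += dt
    return t

est = np.mean([exit_time() for _ in range(n_paths)])
print(est, (R**2 - np.dot(x0, x0)) / n)   # both about (1 - 0.25)/3 = 0.25
```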

Now, if we choose the stopping time τ deterministic, i.e. τ ≡ t, then we see that

u(t, x) = E(f(X(t)) | X(0) = x)

is differentiable w.r.t. t and

∂u/∂t = E(Af(X(t)) | X(0) = x).

It turns out that we can express the right-hand side of the above also in terms of u. This gives rise to Kolmogorov's backward equation.

Theorem 7.3 (Kolmogorov's backward equation) Let f ∈ C²₀(R^n).

i) Define u(t, x) = E(f(X(t)) | X(0) = x). Then u(t, ·) ∈ D_A for each t and

∂u/∂t = Au, t > 0, x ∈ R^n,  (7.2)
u(0, x) = f(x), x ∈ R^n.  (7.3)

Interpret the right-hand side of (7.2) as A applied to u as a function of x.


ii) Suppose that w(t, x) is a bounded function solving (7.2) and (7.3), which is continuously differentiable in t and twice continuously differentiable in x. Then w(t, x) = u(t, x). In particular, we have the explicit partial differential equation

∂u/∂t = ∑ᵢ bᵢ ∂u/∂xᵢ + ½ ∑_{i,j} (σσᵀ)_{ij} ∂²u/∂xᵢ∂xⱼ.

This theorem gives a probabilistic solution to the initial value problem (7.2), (7.3). Now suppose that the Ito diffusion X(t) has a density p(t, x, y) = (∂/∂y)P{X(t) ≤ y | X(0) = x} that is once continuously differentiable in t and twice continuously differentiable in x. Then it makes sense that this density itself solves (7.2).

Problem 7.7 Sketch a way how to prove this from Theorem 7.3.

Heat equation Let us now fix X(t) = x + W(t), with x given. Then Kolmogorov's backward equation (7.2) reduces to

∂u(x, t)/∂t = ½ ∂²u(x, t)/∂x²,

which is the heat equation in one dimension. If X(t) is n-dimensional Brownian motion, then we get

∂u/∂t = ½ ∆u.

We can interpret this equation physically in different ways. It may model the time development of temperature u by heat conduction. On the other hand, microscopic particles suspended in a fluid or gas perform a very irregular motion, caused by collisions with molecules in thermal motion. One can then interpret u as the particle density, evolving in time.

From a microscopic point of view, individual particles perform a Brownian motion, which is a stochastic process. The process is an Ito diffusion. From a macroscopic point of view, the particle density evolves in time according to the heat equation.

The relation between the two is that the density of Brownian motion is a solution to the heat equation.

Problem 7.8 Check the validity of this statement.

Now we will check the validity of Theorem 7.3 for this simple model. Consider the initial value problem

∂u/∂t = ½ ∆u in R^n × R₊,
u continuous in R^n × [0, ∞), and u(x₁, . . . , xₙ, 0) = Φ(x₁, . . . , xₙ),

for a given bounded and continuous function Φ : R^n → R.

By Theorem 7.3 this initial value problem has the following solution:

u(x, t) = E(Φ(x + W(t))) = ∫_{R^n} Φ(x + y) (1/(2πt)^{n/2}) exp{−||y||²/2t} dy = ∫_{R^n} Φ(y) φₙ(x − y, t) dy,

provided that Φ satisfies the conditions of that theorem. For continuous functions Φ the statement can be checked directly.


Problem 7.9 Do this by carrying out the following steps.

i) Argue that (x, t) → Φ(x + W (t)) is a continuous and bounded function for a.a. ω ∈ Ω.

ii) Use (i) and a suitable convergence theorem to conclude that (x, t) → E(Φ(x+W (t))) is continuous.

iii) Show that E(Φ(x + W (0))) satisfies the initial conditions.

iv) Finally show that E(Φ(x + W(t))) solves the heat equation. To this end we need to interchange differentiation and integration, so you have to justify that operation.

We derive another property of the solution u to the initial value problem.

Theorem 7.4 Let s > 0 be fixed. Then u(x + W(t), s − t) is a martingale w.r.t. the filtration (Fₜ)_{0≤t≤s}, Fₜ = σ(W(t′), 0 ≤ t′ ≤ t).

Problem 7.10 Prove the theorem. First argue that the following relation holds true: u(y, s − t) = E(Φ(y + W(s − t))) = E(Φ(y + W(s) − W(t)) | Fₜ).

Boundary value problems Let X(t) be an Ito diffusion in one dimension:

dX(t) = b(X(t)) dt + σ(X(t)) dW(t),

where the functions b and σ satisfy the Lipschitz condition (7.1). Let (a, b) be a given interval, and let X(0) = x ∈ (a, b).

Put τ = inf{t > 0 | X(t) ∉ (a, b)} and define p = P{X(τ) = b | X(0) = x}. Suppose that we can find a solution f ∈ C²(R) such that

Af(x) = b(x)f′(x) + ½σ²(x)f″(x) = 0, x ∈ R.

Problem 7.11 i) Prove that

p = (f(x) − f(a)) / (f(b) − f(a)),

provided τ < ∞ a.s.

ii) Now specialise to the case X(t) = x + W(t), t ≥ 0. Prove that

p = (x − a) / (b − a).

iii) Determine p if X(t) = x + bt + σW(t).

Now we are interested in the following boundary value problem: find u ∈ C²(R) such that

u″(x) = 0, x ∈ (a, b),  u(a) = φ(a),  u(b) = φ(b).


Problem 7.12 Determine a solution to this problem analytically.

We can derive a solution by a stochastic approach.

Problem 7.13 Let X(t) = x + W(t). Show that

u(x) := E(φ(X(τ)) | X(0) = x)

solves the boundary value problem.

Solving a PDE in this way is a standard device. However, in general one needs a detailed study of suitable properties of the function E(φ(X(τ)) | X(0) = x), because in most cases one cannot calculate it explicitly. That involves many technicalities.

Problem 7.14 Solve the following boundary value problem by a stochastic approach: find u ∈ C²(R) such that

b u′(x) + ½σ² u″(x) = 0, x ∈ (a, b),  u(a) = φ(a),  u(b) = φ(b).

In the above solutions, time did not play a role. We will next consider the simplest version of a boundary value problem involving the heat equation.

Back to the heat equation Let D denote the infinite strip:

D = {(t, x) ∈ R² : x < R}.

Let φ be a bounded continuous function on ∂D = {(t, R) | t ∈ R}. We consider the following boundary value problem: find u ∈ C^{1,2}(R × (−∞, R)) such that

∂u/∂t + ½ ∂²u/∂x² = 0, (t, x) ∈ D,
lim_{(s,x)→(t,R), (s,x)∈D} u(s, x) = φ(t), (t, R) ∈ ∂D.

A physical interpretation of this problem is the following: we consider an infinitely long vertical bar, with upper end point at R. We fix the temperature φ at the upper end point of the bar as a function of time. Now we are interested in how temperature ‘spreads’ over the whole bar, while time is running.

Problem 7.15 i) Define (hint: look at earlier exercises) a 2-dimensional Ito diffusion X(t), with generator

A = ∂/∂t + ½ ∂²/∂x².

ii) Let τ_{t,x} = inf{s > t | X(s) ∉ D}, given that X(0) = (t, x). Show that

u(t, x) = E(φ(X(τ_{t,x})) | X(0) = (t, x))

is the solution of the boundary value problem. Hint: the distribution of τ_{t,x} − t is the distribution of the hitting time of R for Brownian motion given initial state W(0) = x.


Option pricing We will indicate how to arrive at the simplest form of the Black-Scholes formula for European options. There is an extensive mathematical formalism to define all notions that we use below in a precise manner, but this goes far beyond the scope of this course.

The basis is the following Ito diffusion:

dS(t) = µS(t) dt + σS(t) dW(t),

where S(t) is the value of one unit of stock; µ and σ ≠ 0 are assumed constant. This is a geometric Brownian motion (see BSP) and it has the solution

S(t) = s₀ exp{(µ − σ²/2)t + σW(t)},

where S(0) = s₀ is given. Of course, dealing in stock is a risky investment because of the diffusion term σS dW(t). If we assume the interest rate of a bank investment to equal a constant ρ, then a bank investment is a safe investment.

A European option is the right to buy one unit of stock at the expiration time T for D K. At the expiration time T you will exercise your option when K < S(T); you will not buy when K > S(T). This means that at time T the value of your ‘warrant’ (the right to buy the stock) is

max(S(T) − K, 0).

The question is how to calculate the price of the warrant at time t < T. If one assumes a stable market, that is, on the average one cannot gain or lose, then the price and the value of a warrant must be equal. Write F(S(t), T − t) for the price of a unit warrant at time t < T. Then F(S, 0) = max(S − K, 0). The aim is to formulate an initial value problem for F(S, T − t), 0 ≤ t ≤ T.

Problem 7.16 Derive an SDE for dF(S, T − t).

Suppose we have the following investment policy: at time t our portfolio consists of 1 unit of warrant with value F(S(t), T − t) and α(t) units of stock, so as to eliminate risk. Here α(t) is assumed Fₜ = σ(W(s), s ≤ t)-measurable. As a consequence the value of our portfolio at time t is

V(t) = F(S(t), T − t) + α(t)S(t)

and we get dV = dF + α dS.

Problem 7.17 Derive an SDE for V. Determine α such that the dW term (diffusion term) disappears. Conclude that

dV(t) = (½ (∂²F/∂S²) S²σ² − ∂F/∂t) dt.

On the other hand, in a stable market, the average growth of the value of a portfolio is the same as that of a safe (bank) investment:

dV = ρV dt.

Problem 7.18 i) Show that combining the above gives rise to the following initial value problem for F:

∂F/∂t = ρS ∂F/∂S + ½ S²σ² ∂²F/∂S² − ρF,
F(S, 0) = max(S − K, 0).

ii) Suppose ρ = 0. Of what Ito diffusion would the first equation be the Kolmogorov backward equation (7.2)?

To solve this problem, we need to invoke the Feynman-Kac formula.

Theorem 7.5 Let f ∈ C²₀(R^n) and q ∈ C(R^n). Assume that q is bounded from below.

i) Put

v(t, x) = E(exp{−∫₀ᵗ q(X(s)) ds} f(X(t)) | X(0) = x).

Then

∂v/∂t = Av − qv, t > 0, x ∈ R^n,
v(0, x) = f(x), x ∈ R^n.

ii) If w(t, x) ∈ C^{1,2}(R × R^n) is bounded on K × R^n for each compact subset K ⊂ R and w is a solution of the above PDE, then w(t, x) = v(t, x).

Problem 7.19 Show that the value of the option at time t = 0 equals

s₀Φ(u) − e^{−ρT} K Φ(u − σ√T),

where

Φ(u) = (1/√(2π)) ∫_{−∞}^u e^{−x²/2} dx

is the distribution function of a standard normal r.v. and

u = (ln(s₀/K) + (ρ + σ²/2)T) / (σ√T).

This is the classical Black-Scholes formula.
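The formula is straightforward to evaluate and to cross-check against a risk-neutral Monte Carlo estimate E(e^{−ρT}(S(T) − K)⁺), simulating the geometric Brownian motion with drift ρ; the sketch below is our own, with arbitrary parameter values.

```python
import math, random

def black_scholes_call(s0, K, rho, sigma, T):
    """Evaluate the Black-Scholes formula from Problem 7.19."""
    u = (math.log(s0 / K) + (rho + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return s0 * Phi(u) - math.exp(-rho * T) * K * Phi(u - sigma * math.sqrt(T))

def mc_call(s0, K, rho, sigma, T, n=200000):
    """Monte Carlo estimate of the discounted payoff under the risk-neutral drift rho."""
    total = 0.0
    for _ in range(n):
        w = random.gauss(0.0, math.sqrt(T))
        sT = s0 * math.exp((rho - 0.5 * sigma**2) * T + sigma * w)
        total += math.exp(-rho * T) * max(sT - K, 0.0)
    return total / n

print(black_scholes_call(100, 95, 0.05, 0.2, 1.0), mc_call(100, 95, 0.05, 0.2, 1.0))
```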
