
Exercises from Probability with Martingales

Ryan McCorvie


EG 1 Two points are chosen at random on a line AB independently according to the uniform distribution. If the line is cut at these two points, what's the probability the segments will form a triangle?

The sample space is the unit square [0, 1]² with points corresponding to (x1, x2). The probability measure is the same as area in this square. A necessary and sufficient condition to form a triangle is that the segments [0, x1], [x1, x2] and [x2, 1] satisfy the triangle inequalities. Assuming x1 < x2, these take the form

x1 < (x2 − x1) + (1 − x2),   1 − x2 < x1 + (x2 − x1),   x2 − x1 < (1 − x2) + x1 (1)

The first inequality says x1 < 1/2, the second that x2 > 1/2, and the last that x2 − x1 < 1/2. These inequalities describe a triangle in the unit square with area 1/8. Considering also the symmetrical case where x1 > x2, the total probability is 2/8 = 1/4.
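A quick Monte Carlo sketch confirms the value 1/4 (an added illustration, not part of the original notes; it assumes numpy is available):

```python
import numpy as np

# Cut [0, 1] at two independent uniform points and test the triangle inequalities.
rng = np.random.default_rng(0)
x = rng.random((1_000_000, 2))
lo, hi = x.min(axis=1), x.max(axis=1)
a, b, c = lo, hi - lo, 1.0 - hi            # the three segment lengths
ok = (a < b + c) & (b < a + c) & (c < a + b)
print(ok.mean())                           # ~0.25, matching 1/4
```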

EG 2 Planet X is a ball with center O. Three spaceships A, B and C land at random on its surface, their positions being independent and each uniformly distributed on the surface. Spaceships A and B can communicate directly by radio if ∠AOB ≤ 90°. What's the probability all three can communicate? (For example, A can communicate with C via B if necessary.)

A spaceship X may communicate directly with any spaceship in the hemisphere centered at X; denote this hemisphere by h(X). Without loss of generality, rotate the sphere so that A is the north pole and B lies in a fixed plane through O and A. The three ships may communicate if:

1. B is in the northern hemisphere and C is in the northern hemisphere (the ships can communicate via A)

2. B is in the northern hemisphere and C is in the half-lune h(B) \ h(A) (the ships can communicate via B)

3. B is in the southern hemisphere and C is in the half-lune h(A) ∩ h(B) (the ships can communicate via C)

These cases are disjoint, so we can consider them separately and sum their contributions. The area of a half-lune is proportional to the angle ∠XOY, where X and Y are the centers of the hemispheres; this is the same as the dihedral angle between the equatorial planes of the hemispheres. Thus the probability of a uniform point lying in a half-lune of angle θ is θ/2π.

1

Page 2: Exercises from Probability with Martingales

So, writing one integral for each case above, conditioning on the angle θ = ∠AOB, and giving the probability for C to be in the configuration described by each case, we get

P = ∫_0^{π/2} (1/2) p(dθ) + ∫_0^{π/2} (θ/2π) p(dθ) + ∫_{π/2}^{π} ((π − θ)/2π) p(dθ) (2)

Now, the probability density of θ = ∠AOB is given by (1/2) sin θ dθ. Heuristically, this can be seen by considering the infinitesimally thin strip on the surface of the sphere at constant latitude θ. This is like a cylinder (or frustum of a cone) of side edge length r dθ and radius r sin θ, so its area is like 2πr² sin θ dθ. Normalizing by the total surface area 4πr² gives the probability density.

Calculating each contribution:

∫_0^{π/2} (1/2) · (1/2) sin θ dθ = 1/4
∫_0^{π/2} (θ/2π) · (1/2) sin θ dθ = 1/(4π)
∫_{π/2}^{π} ((π − θ)/2π) · (1/2) sin θ dθ = 1/(4π) (3)

Summing the contributions, we find P = 1/4 + 1/(4π) + 1/(4π) = (2 + π)/(4π).
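This too is easy to confirm by simulation (an added sketch, not from the original notes; uniform points on the sphere are generated as normalized Gaussians):

```python
import numpy as np

# Three uniform points on the sphere; two ships communicate directly iff their
# dot product is nonnegative (angle at most 90 degrees). All three can talk iff
# the 3-vertex communication graph is connected, i.e. has at least 2 edges.
rng = np.random.default_rng(0)
n = 500_000
p = rng.normal(size=(n, 3, 3))
p /= np.linalg.norm(p, axis=2, keepdims=True)
dots = np.einsum('nij,nkj->nik', p, p)
edges = dots[:, [0, 0, 1], [1, 2, 2]] >= 0      # AB, AC, BC
print((edges.sum(axis=1) >= 2).mean(), (2 + np.pi) / (4 * np.pi))  # both ~0.409
```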

EG.3 Let G be the free group with two generators a and b. Start at time 0 with the unit element 1, the empty word. At each second multiply the current word on the right by one of the four elements a, a⁻¹, b, b⁻¹, choosing each with probability 1/4 (independently of previous choices). The choices

a, a, b, a⁻¹, a, b⁻¹, a⁻¹, a, b (4)

at times 1 to 9 will produce the reduced word aab of length 3 at time 9. Prove that the probability that the reduced word 1 ever occurs at a positive time is 1/3, and explain why it is intuitively clear that, almost surely,

(length of reduced word at time n)/n → 1/2 (5)

Multiplying by random elements of {a, a⁻¹, b, b⁻¹} is the same as taking a random walk on the Cayley graph of the free group with two generators (see Figure 1). This is the rooted 4-regular tree. The depth of a vertex is the length of the unique path to the root, the vertex corresponding to the identity 1 of the free group. For any vertex of depth d > 0, there are three edges leading to a node of depth d + 1 and one edge leading to a node of depth d − 1.

Figure 1: Cayley graph of the free group on two generators

Let pd be the probability of returning to 1 starting from a node of depth d. It's intuitively clear from the symmetry of the graph that this probability depends only on the depth: there is a graph isomorphism mapping any node of depth d to any other node of depth d, so there is a one-to-one correspondence between paths leading back to 1, and corresponding paths have equal probability. The return probability must satisfy the recurrence

pn = (3/4) pn+1 + (1/4) pn−1 (6)

This is because we can condition the return probability on the first node of the walk, and use the fact that the return probability depends only on the depth of that node. Fundamental solutions of recurrence equations have the form pn = λⁿ and satisfy a characteristic polynomial equation, which for this recurrence is

3λ² − 4λ + 1 = 0 ⇒ λ = 1/3 or λ = 1 (7)

The only solution of the form pn = a · (1/3)ⁿ + b which also satisfies the boundary conditions p0 = 1 and pn → 0 as n → ∞ is pn = (1/3)ⁿ. In particular, a random walk starting at time 1 from a node of depth 1 returns to the origin with probability 1/3. This is the same as the event that the reduced word is ever 1 at a positive time, since every word sequence has one letter at time 1.

Let Ln be the random variable representing the word length at time n. Note that

E[Ln | Ln−1] = (3/4)(Ln−1 + 1) + (1/4)(Ln−1 − 1) = Ln−1 + 1/2 (8)

since Ln is the same thing as vertex depth. Taking expectations and using the tower law,

E Ln = E Ln−1 + 1/2 (9)

Hence the expectation satisfies a simple recurrence. Since L0 = 0, this means E Ln = n/2. More precisely, the increments Ln − Ln−1 are i.i.d. random variables with finite variance and mean 1/2 (at least once the walk leaves the root, which it eventually does for good). Therefore by the SLLN, Ln/n → 1/2 almost surely.
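The return probability is easy to see in a simulation (an added sketch, not part of the original solution):

```python
import random

# Random walk on the Cayley graph of F2. Generators are coded 0,1 (= a, a^-1)
# and 2,3 (= b, b^-1); a step cancels iff it is the inverse of the last letter.
random.seed(0)
trials, horizon, returns = 100_000, 300, 0
for _ in range(trials):
    word = []
    for _ in range(horizon):
        g = random.randrange(4)
        if word and word[-1] == g ^ 1:   # pairs (0,1) and (2,3) are inverses
            word.pop()
        else:
            word.append(g)
        if not word:
            returns += 1
            break
print(returns / trials)   # ~1/3 (slightly less, since the horizon is finite)
```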


EG.4 Suppose now that the elements a, a⁻¹, b, b⁻¹ are chosen instead with respective probabilities α, α, β, β, where α > 0, β > 0, α + β = 1/2. Prove that the conditional probability that the reduced word 1 ever occurs at a positive time, given that the element a is chosen at time 1, is the unique root x = r(α) (say) in (0, 1) of the equation

3x³ + (3 − 4α⁻¹)x² + x + 1 = 0 (10)

As time goes on, it is almost surely true that more and more of the reduced word becomes fixed, so that a final word is built up. If in the final word the symbols a and a⁻¹ are both replaced by A, and the symbols b and b⁻¹ are both replaced by B, show that the sequence of A's and B's obtained is a Markov chain on {A, B} with (for example)

pAA = α(1 − x)/(α(1 − x) + 2β(1 − y)) (11)

where y = r(β). What is the (almost sure) limiting proportion of occurrence of the symbol a in the final word? (Note: this result was used by Professor Lyons of Edinburgh to solve a long-standing problem in potential theory on Riemannian manifolds.)

Algebras, etc.

1.1 Let V ⊆ ℕ, and say V has Cesàro density γ(V) whenever

γ(V) = lim_{n→∞} #(V ∩ {1, 2, . . . , n})/n (12)

exists. Let C be the set of all sets with Cesàro density. Give an example of V1, V2 ∈ C such that V1 ∩ V2 ∉ C.

Take V1 to be {1} together with those n > 10 whose base-10 representation has equal first and last digits. Take V2 to be the natural numbers whose base-10 representation ends in the digit 1.

Then 0 ≤ #(Vi ∩ {1, 2, . . . , n}) − ⌊n/10⌋ ≤ 1, since in every block {10q, 10q + 1, . . . , 10q + 9} each of V1 and V2 has exactly one representative. Therefore γ(V1) = γ(V2) = 1/10.

Note V1 ∩ V2 consists of 1 and all numbers of the form 1 b2 b3 . . . bn−1 1, where each bi can be any digit from 0 to 9. Thus between 10^k and 10^{k+1} there are 10^{k−1} members of V1 ∩ V2. Therefore if n = 10^k,

#(V1 ∩ V2 ∩ {1, . . . , n}) = 1 + 1 + 10 + 100 + · · · + 10^{k−2} = (10^{k−1} − 1)/9 + 1 (13)

and #(V1 ∩ V2 ∩ {1, . . . , n})/n < 1/89 once k is large. However, if n = 2 · 10^k then

#(V1 ∩ V2 ∩ {1, . . . , n}) = 1 + 1 + 10 + 100 + · · · + 10^{k−1} = (10^k − 1)/9 + 1 (14)

and #(V1 ∩ V2 ∩ {1, . . . , n})/n > 1/18. Therefore the limit does not exist, since the lim sup (at least 1/18) is greater than the lim inf (at most 1/89).
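A small numeric illustration of these densities (added here; plain Python, not from the original notes):

```python
# V1: first digit equals last digit (plus the number 1); V2: last digit is 1.
def in_v1(n):
    s = str(n)
    return n == 1 or (n > 10 and s[0] == s[-1])

for n in (10**5, 2 * 10**5, 10**6, 2 * 10**6):
    c1 = sum(in_v1(k) for k in range(1, n + 1))
    c12 = sum(in_v1(k) and k % 10 == 1 for k in range(1, n + 1))
    print(n, c1 / n, c12 / n)   # c1/n ~ 0.1; c12/n swings between ~1/90 and ~1/18
```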

Independence

4.1 Let (Ω, F, Pr) be a probability triple. Let 𝓘1, 𝓘2 and 𝓘3 be three π-systems on Ω such that, for k = 1, 2, 3,

𝓘k ⊆ F and Ω ∈ 𝓘k (15)

Prove that if

Pr(I1 ∩ I2 ∩ I3) = Pr(I1) Pr(I2) Pr(I3) (16)

whenever Ik ∈ 𝓘k, then the σ(𝓘k) are independent. Why did we require Ω ∈ 𝓘k?

Fix I2 ∈ 𝓘2 and I3 ∈ 𝓘3, and consider the two measures defined for I ∈ F by

µ(I) = Pr(I ∩ I2 ∩ I3),   µ′(I) = Pr(I) Pr(I2) Pr(I3) (17)

Each of µ and µ′ is a countably additive measure, because Pr is a countably additive measure on F. By the hypothesis (16), the measures agree on the π-system 𝓘1. By assumption Ω ∈ 𝓘1 as well, so µ(Ω) = µ′(Ω). (If we dropped the assumption that Ω ∈ 𝓘1, the condition µ(Ω) = µ′(Ω) would amount to Pr(I2 ∩ I3) = Pr(I2) Pr(I3), which does not hold generically.) We conclude from Theorem 1.6 that µ(I) = µ′(I) for all I ∈ σ(𝓘1). Since I2 and I3 were arbitrary, equation (16) holds for all I1 ∈ σ(𝓘1), I2 ∈ 𝓘2, I3 ∈ 𝓘3.

Now we can repeat the above argument with the π-systems 𝓘′1 = 𝓘2, 𝓘′2 = 𝓘3, 𝓘′3 = σ(𝓘1), observing that (16) is invariant under permutations of {1, 2, 3}, to conclude that (16) holds for I1 ∈ σ(𝓘1), I2 ∈ σ(𝓘2), I3 ∈ 𝓘3. Applying the argument one more time, with 𝓘′1 = 𝓘3, 𝓘′2 = σ(𝓘1), 𝓘′3 = σ(𝓘2), gives the result we want: (16) holds for all Ik ∈ σ(𝓘k).

4.2 For s > 1 define ζ(s) = ∑_{n∈ℕ} n^{-s}, and consider the distribution on ℕ given by Pr(X = n) = ζ(s)^{-1} n^{-s}. Let Ek = {n : k divides n}. Show Ep and Eq are independent for primes p ≠ q. Show Euler's formula 1/ζ(s) = ∏_p (1 − 1/p^s). Show Pr(X is square-free) = 1/ζ(2s). Finally, let Y be independent of X with the same distribution and let H = gcd(X, Y); show Pr(H = n) = n^{-2s}/ζ(2s).

The elements m ∈ Ek are in one-to-one correspondence with the elements n ∈ ℕ via the relationship m = nk. Therefore

Pr(Ek) = ∑_{m∈Ek} Pr(m) = ∑_{n∈ℕ} Pr(nk) (18)

Since

Pr(nk) = k^{-s} n^{-s}/ζ(s) = k^{-s} Pr(n) (19)

we have

Pr(Ek) = k^{-s} ∑_{n∈ℕ} Pr(n) = k^{-s} (20)


Furthermore, the conditional probability satisfies

Pr(X = nk | X ∈ Ek) = Pr(X = nk)/Pr(Ek) = ((nk)^{-s}/ζ(s))/k^{-s} = n^{-s}/ζ(s) (21)

Thus the conditional distribution of X given X ∈ Ek is the ζ(s) distribution on the relative divisor X/k.

Now, for m, n relatively prime, Em ∩ En = Emn by unique factorization. Therefore

Pr(Em ∩ En) = Pr(Emn) = (mn)^{-s} = m^{-s} n^{-s} = Pr(Em) Pr(En) (22)

which proves independence.

Consider the probability that X is not divisible by any of the first n primes p1, p2, . . . , pn. This is the event E = E_{p1}^c ∩ · · · ∩ E_{pn}^c, and by independence

Pr(E) = ∏_{i=1}^{n} (1 − Pr(E_{pi})) = ∏_{i=1}^{n} (1 − p_i^{-s}) (23)

As n → ∞ the set E consists of 1 and very large numbers: every element of E other than 1 exceeds p_{n+1}, and Pr(X > m) ↓ 0 as m → ∞. Thus Pr(E) → Pr(X = 1) = 1/ζ(s). This shows

ζ(s)^{-1} = ∏_p (1 − 1/p^s) (24)

The set of integers with no square factor among p1², . . . , pn² is the event

F = E_{p1²}^c ∩ · · · ∩ E_{pn²}^c (25)

The sets E_{p²} are independent, because the squares p² are relatively prime. Thus as n → ∞, F decreases to the set of square-free numbers together with a set of negligible probability, so

Pr(F) → ∏_p (1 − (p²)^{-s}) = ∏_p (1 − 1/p^{2s}) = 1/ζ(2s) (26)

Now n = gcd(X, Y) iff X and Y are both divisible by n and the relative divisors X/n and Y/n are relatively prime. By independence, the event that both X ∈ En and Y ∈ En satisfies

Pr(X ∈ En, Y ∈ En) = Pr(X ∈ En) Pr(Y ∈ En) = n^{-2s} (27)

Let R = {(a, b) : a, b relatively prime}, with the convention (1, 1) ∈ R (so that gcd(X, Y) = n includes the case X = Y = n). We can calculate Pr((X, Y) ∈ R) as follows. First consider the sets Fp = {X ∈ Ep} ∩ {Y ∈ Ep}. By independence, Pr(Fp) = Pr(X ∈ Ep) Pr(Y ∈ Ep) = p^{-2s}, and the Fp are independent of each other since the Ep are. Then Fp^c is the event that p fails to divide at least one of X and Y, so F = ⋂_p Fp^c is the event that X and Y have no common prime divisor, i.e. are relatively prime. Hence

Pr(X, Y relatively prime) = Pr(F) = ∏_p (1 − p^{-2s}) = 1/ζ(2s) (28)


Pulling it together,

Pr(gcd(X, Y) = n) = Pr(An) Pr(Bn | An) = n^{-2s}/ζ(2s) (29)

where An = {X ∈ En} ∩ {Y ∈ En} and Bn = {(X/n, Y/n) ∈ R}: Pr(An) = n^{-2s} by (27), and Pr(Bn | An) = 1/ζ(2s) because, by (21), the relative divisors X/n and Y/n are again independent and ζ(s)-distributed.
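A numerical check of this law (added illustration; scipy's zipf distribution is exactly the ζ(s) law on {1, 2, . . .}):

```python
import numpy as np
from scipy.stats import zipf          # pmf proportional to k^(-s)
from scipy.special import zeta

# Check Pr(gcd(X, Y) = n) = n^(-2s)/zeta(2s) by simulation, for s = 2.
s, N = 2.0, 1_000_000
rng = np.random.default_rng(0)
x = zipf.rvs(s, size=N, random_state=rng)
y = zipf.rvs(s, size=N, random_state=rng)
g = np.gcd(x, y)
for n in (1, 2, 3):
    print(n, (g == n).mean(), n**(-2 * s) / zeta(2 * s, 1))
```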

4.3 Let X1, X2, . . . be i.i.d. random variables drawn from a continuous distribution. Let E1 = Ω and for n ≥ 2 let En = {Xn = max(X1, X2, . . . , Xn)}; that is, a "record" occurs at time n. Show that the events En are independent and that Pr(En) = 1/n.

A continuous distribution has the property that singletons have measure zero: Pr(X = x) = 0 for every x. Conditioning on the value of X1, it follows that {X1 = X2} has measure zero as well: by Fubini's theorem, Pr(X1 = X2) = ∫ Pr(X2 = x) µ(dx) = 0, where µ is the law of X1. Since there are only countably many pairs of variables, we may assume no two of the Xi coincide, by excluding a set of measure zero.

Allow a permutation σ ∈ Sn to act on the sample space by permuting the indices, that is, σ : (X1, X2, . . . , Xn) ↦ (Xσ1, Xσ2, . . . , Xσn). Clearly this also permutes the ranks, so that ρi(σX) = ρσi(X), or in vector form ρ(σX) = σρ(X). Because the Xi are i.i.d., the permutation σ is measure-preserving: Pr(E) = Pr(σE) for any event E ⊆ Ωⁿ. Consider the Weyl chamber W = {x1 < x2 < · · · < xn}. The disjoint sets σW partition Ωⁿ (excluding the measure-zero set of coincident points), since every point x is in ρ(x)^{-1}(W), where ρ(x) is the permutation induced by mapping each component to its ordinal ranking from smallest to greatest.

We infer that Pr(W) = 1/n!, since the disjoint copies of W have equal probability and comprise the entire space. The event En is the union of all the sets σW with σn = n. There are (n − 1)! such permutations, so Pr(En) = (n − 1)!/n! = 1/n.

For m < n, take any σW ⊆ En and any permutation σ′ ∈ Sm ⊆ Sn acting only on the first m letters and keeping the rest fixed. For any σ′ ∈ Sm we have σ′σW ⊆ En, since the nth component is largest and unchanged by the action of σ′. Group the sets σW ⊆ En into the orbits of the action of Sm. From each orbit we may choose the representative with ρ1(x) < ρ2(x) < · · · < ρm(x) for each x in it, which is formed by permuting the first m elements into sorted order. Let U be the union of all orbit representatives. For σ′ ∈ Sm the sets σ′U are disjoint and partition En. There are m! such sets, they all have equal probability, and their union has probability 1/n, so Pr(U) = 1/(n · m!). Of the sets σ′U, exactly (m − 1)! are also contained in Em: these correspond to σ′ ∈ Sm satisfying σ′m = m. Therefore Pr(Em ∩ En) = (m − 1)!/(n · m!) = 1/(mn) = Pr(Em) Pr(En), and the events are independent.
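The independence of records is easy to check numerically (added sketch, assuming numpy):

```python
import numpy as np

# Check Pr(E_m) = 1/m, Pr(E_n) = 1/n and Pr(E_m ∩ E_n) = 1/(mn).
rng = np.random.default_rng(0)
m, n, N = 3, 5, 1_000_000
x = rng.random((N, n))
em = x[:, :m].argmax(axis=1) == m - 1   # record at time m
en = x.argmax(axis=1) == n - 1          # record at time n
print(em.mean(), en.mean(), (em & en).mean(), 1 / (m * n))
```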

Borel-Cantelli Lemmas

4.4 Suppose a coin with probability p of heads is tossed repeatedly. Let Ak be the event that a sequence of k (or more) consecutive heads occurs among the tosses numbered 2^k, 2^k + 1, . . . , 2^{k+1} − 1. Prove that

Pr(Ak i.o.) = 1 if p ≥ 1/2, and 0 if p < 1/2 (30)


Let Ek,i be the event that there is a string of k heads at the times j with

2^k + i ≤ j ≤ 2^k + i + k − 1 (31)

Observe Pr(Ek,i) = p^k. If mk = 2^k − k and i ≤ mk, then Ek,i ⊆ Ak. Furthermore Ak ⊆ ⋃_{i=0}^{mk} Ek,i, since if there's a string of k heads in this range it has to start somewhere. Thus by the union bound

Pr(Ak) ≤ ∑_{i=0}^{mk} Pr(Ek,i) = (mk + 1) p^k ≤ (2p)^k (32)

If p < 1/2 then ∑_k Pr(Ak) < ∞, and by the first Borel-Cantelli lemma we must have Pr(Ak i.o.) = 0.

Now consider Ek,0, Ek,k, Ek,2k, . . . , so long as ik ≤ mk. These sets are independent, since they concern disjoint blocks of tosses, and

Ak ⊇ ⋃_{i=0}^{⌊mk/k⌋} Ek,ik (33)

Taking the inclusion-exclusion inequality,

Pr(Ak) ≥ ∑_i Pr(Ek,ik) − ∑_{i<j} Pr(Ek,ik ∩ Ek,jk) (34)

All the terms in the first sum equal p^k, and there are ⌊mk/k⌋ + 1 ≥ 2^k/k − 2 of them. All the terms in the second sum equal p^{2k}, since the Ek,ik are independent, and there are C(⌊mk/k⌋ + 1, 2) ≤ 2^{2k}/(2k²) of them. Therefore when p = 1/2,

Pr(Ak) ≥ (2^k/k − 2) p^k − (2^{2k}/(2k²)) p^{2k} = 1/k − 2^{−(k−1)} − 1/(2k²) (35)

Therefore ∑_k Pr(Ak) = ∞, since this lower bound is the harmonic series minus two convergent series. Since the Ak are independent, the second Borel-Cantelli lemma says that Pr(Ak i.o.) = 1.

As a function of p, the probability Pr(Ak i.o.) must be non-decreasing, since increasing the probability of heads increases the likelihood of the event. We just showed this probability is 1 for p = 1/2; therefore it is 1 for any p ≥ 1/2.

4.5 Prove that if G ∼ N(0, 1) then for x > 0

Pr(G > x) ≤ (1/(x√(2π))) e^{−x²/2} (36)

Let Xi ∼ N(0, 1) be i.i.d. Show that L = 1 almost surely, where

L = lim sup (Xn/√(2 log n)) (37)

Let Sn = ∑_{i≤n} Xi. Recall Sn/√n ∼ N(0, 1). Prove that

|Sn| < 2√(n log n) eventually, almost surely (38)


Let's do a calculation, noting that u/x ≥ 1 in the range of the integral:

∫_x^∞ e^{−u²/2}/u^k du ≤ (1/x^{k+1}) ∫_x^∞ u e^{−u²/2} du = e^{−x²/2}/x^{k+1} (39)

Now, integrating by parts (writing e^{−u²/2} = (1/u) · u e^{−u²/2}), we get

√(2π) Pr(G > x) = ∫_x^∞ e^{−u²/2} du = e^{−x²/2}/x − ∫_x^∞ e^{−u²/2}/u² du (40)

Alternately dropping the last term or applying (39) with k = 2, we find

(1/√(2π)) (1/x − 1/x³) e^{−x²/2} ≤ Pr(G > x) ≤ (1/√(2π)) (1/x) e^{−x²/2} (41)

If I(x) = x^{-1} e^{−x²/2}, this shows that for any ε > 0, once x is large enough, ((1 − ε)/√(2π)) I(x) ≤ Pr(G > x) ≤ (1/√(2π)) I(x). Now

I(√(2α log n)) = (1/√(2α log n)) e^{−α log n} = 1/(n^α √(2α log n)) (42)

Using the substitution u = √(log x) we can apply the integral test:

∫_a^∞ I(√(2α log x)) dx = ∫_a^∞ dx/(x^α √(2α log x)) = (2/√(2α)) ∫_{√(log a)}^∞ e^{(1−α)u²} du (43)

Clearly the integral converges if α > 1 and diverges if α ≤ 1, so the same is true of

∑_n Pr(Xn > √(2α log n)) ≍ ∑_n I(√(2α log n)) (44)

The Xn are independent, so we may apply Borel-Cantelli and its converse to conclude

lim sup Xn/√(2α log n) ≤ 1 a.s. for α > 1,   lim sup Xn/√(2α log n) ≥ 1 a.s. for α ≤ 1 (45)

Letting α ↓ 1 through a countable sequence gives L = 1 almost surely.

Note that we only used independence for the lower bound; the upper bound uses plain-jane Borel-Cantelli, which does not assume independence, and so applies to Yn = Sn/√n, a (dependent) sequence of N(0, 1) random variables. Taking α = 2 we conclude

Yn < 2√(log n) eventually, a.s. (46)

The same is true for −Yn, which has the same distribution, so eventually (taking the later of the two eventualities) |Yn| < 2√(log n). This is the same as |Sn| < 2√(n log n) eventually, almost surely. In particular this implies the strong law of large numbers for Xn ∼ N(µ, σ²), since |Sn/n − µ| < 2σ√(log n)/√n → 0 eventually, almost surely.
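The two-sided bound (41) is easy to verify numerically (added check; scipy assumed):

```python
import numpy as np
from scipy.stats import norm

# Check (1/x - 1/x^3) * phi(x) <= Pr(G > x) <= phi(x)/x, with phi the N(0,1) density.
for x in (1.0, 2.0, 4.0, 8.0):
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, (1/x - 1/x**3) * phi, norm.sf(x), phi / x)
```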


4.6 This is a converse to the SLLN. Let Xn be a sequence of i.i.d. RVs with E[|Xn|] = ∞. Prove that

∑_n Pr(|Xn| > kn) = ∞ for every k, and lim sup |Xn|/n = ∞ a.s. (47)

For Sn = ∑_{i=1}^n Xi, conclude that

lim sup |Sn|/n = ∞ a.s. (48)

For a non-negative random variable Z,

⌊Z⌋ = ∑_{n∈ℕ} I_{Z≥n} (49)

since each indicator with n ≤ Z adds 1 to the sum and all the other indicators are 0, and there are ⌊Z⌋ such indicators. Taking expectations of both sides, and noting that ⌊Z⌋ ≤ Z ≤ ⌊Z⌋ + 1,

E[Z] − 1 ≤ ∑_{n∈ℕ} Pr(Z ≥ n) ≤ E[Z] (50)

For any k ∈ ℕ, take Z = |X1|/k; since E[Z] = ∞ we get that

∑_n Pr(|X1|/k > n) ≥ E[Z] − 1 = ∞ (51)

Since the Xn are i.i.d., Pr(|Xn| > α) = Pr(|X1| > α) for any α ≥ 0, and this implies

∑_n Pr(|Xn| > kn) = ∞ (52)

By the converse of Borel-Cantelli we must have |Xn|/n > k infinitely often, and therefore lim sup |Xn|/n ≥ k, for every k ∈ ℕ. We conclude lim sup |Xn|/n = ∞.

Now suppose |Sn|/n ≤ k eventually for some k > 0; that is, there exist k > 0 and N > 0 such that |Sn| ≤ kn whenever n ≥ N. We know that almost surely there exists an n ≥ N + 1 such that |Xn| ≥ 2kn. But then

|Sn| ≥ |Xn| − |Sn−1| ≥ 2kn − k(n − 1) > kn (53)

This is a contradiction. Hence, with probability 1, |Sn|/n > k infinitely often for every k > 0, and therefore

lim sup |Sn|/n = ∞ a.s. (54)

4.7 What's fair about a fair game? Let X1, X2, . . . be independent random variables such that

Xn = n² − 1 with probability n^{-2}, and Xn = −1 with probability 1 − n^{-2} (55)

Prove that E Xn = 0 for all n, but that if Sn = X1 + X2 + · · · + Xn then

Sn/n → −1 a.s. (56)


An easy calculation shows

E Xn = (n² − 1) · Pr(Xn = n² − 1) + (−1) · Pr(Xn = −1) (57)
= (1 − n^{-2}) + (n^{-2} − 1) = 0 (58)

Now let An be the event {Xn = −1}. Since

∑_n Pr(An^c) = ∑_n n^{-2} < ∞ (59)

by Borel-Cantelli, with probability 1 only finitely many of the An^c occur. Therefore, with probability 1, Xn = −1 with a finite number of exceptions. On any such sample path, let N be the largest index with XN ≠ −1. Then for n > N,

Sn/n = SN/n + (−1) · (n − N)/n (60)

As n → ∞, the first term tends to zero because SN is fixed, and the second term tends to −1. (Note the independence of the Xn was not needed in this argument.)
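A simulated path makes the point vivid (added sketch, assuming numpy; on rare paths a large late win visibly delays the convergence):

```python
import numpy as np

# Simulate the "fair" game: E X_n = 0 for every n, yet S_n/n -> -1.
rng = np.random.default_rng(0)
N = 10**6
n = np.arange(1, N + 1)
win = rng.random(N) < n**-2.0                 # X_n = n^2 - 1 with probability n^-2
x = np.where(win, n**2 - 1, -1).astype(float)
print(np.cumsum(x)[-1] / N)                   # close to -1 on most sample paths
```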

4.8 This exercise assumes that you are familiar with continuous-parameter Markov chains with two states. For each n ∈ ℕ, let X(n) = (X(n)(t) : t ≥ 0) be a Markov chain with state-space the two-point set {0, 1} and Q-matrix

Q(n) = [[−an, an], [bn, −bn]],   an, bn > 0 (61)

and transition function P(n)(t) = exp(tQ(n)). Show that, for every t,

p(n)00(t) ≥ bn/(an + bn),   p(n)01(t) ≤ an/(an + bn) (62)

The processes (X(n) : n ∈ ℕ) are independent and X(n)(0) = 0 for every n; each X(n) has right-continuous paths. Suppose that ∑ an = ∞ and ∑ an/bn < ∞. Prove that if t is a fixed time, then

Pr(X(n)(t) = 1 for infinitely many n) = 0 (63)

Use Weierstrass's M-test to show that ∑_n log p(n)00(t) is uniformly convergent on [0, 1], and deduce that

Pr(X(n)(t) = 0 for ALL n) → 1 as t ↓ 0 (64)

Prove that

Pr(X(n)(s) = 0 ∀s ≤ t, ∀n) = 0 for every t > 0 (65)

and discuss with your tutor why it is almost surely true that within every non-empty time interval, infinitely many of the X(n) chains jump.


First let's calculate the matrix exponential. With

Q = [[−a, a], [b, −b]], we have Q² = [[a² + ab, −a² − ab], [−ab − b², ab + b²]] = −(a + b) Q (66)

Thus, by induction, Qⁿ = (−a − b)^{n−1} Q, and therefore

exp(tQ) = I + ∑_{k≥1} t^k Q^k/k! = I − (Q/(a + b)) ∑_{k≥1} (−(a + b)t)^k/k! = I − ((e^{−(a+b)t} − 1)/(a + b)) Q
= (1/(a + b)) [[b + a e^{−t(a+b)}, a − a e^{−t(a+b)}], [b − b e^{−t(a+b)}, a + b e^{−t(a+b)}]] (67)

From this, (62) is evident. For fixed t > 0,

∑_n Pr(X(n)(t) = 1) = ∑_n p(n)01(t) ≤ ∑_n an/(an + bn) ≤ ∑_n an/bn < ∞ (68)

Thus by Borel-Cantelli, for each fixed time t, almost surely only finitely many of the X(n)(t) equal 1.

Now consider the joint probability Π(t) that X(n)(t) = 0 for every n. Using log(1 + x) ≤ x, this satisfies the bound

−log Π(t) = −log ∏_n p(n)00(t) ≤ ∑_n log((an + bn)/bn) ≤ ∑_n an/bn < ∞ (69)

The summands are bounded uniformly in t by Mn = an/bn, so by the Weierstrass M-test the series ∑_n log p(n)00(t) converges uniformly on [0, 1]. Each term is continuous in t and vanishes at t = 0, so the sum is continuous at 0, and

lim_{t↓0} Π(t) = Π(0) = ∏_{n=1}^∞ 1 = 1 (70)

For (65): the holding time of X(n) in state 0 is exponential with rate an, and the chains are independent, so for any t > 0

Pr(X(n)(s) = 0 ∀s ≤ t, ∀n) = ∏_n e^{−an t} = exp(−t ∑_n an) = 0

since ∑ an = ∞. The same argument applied to any subinterval shows that within every non-empty time interval, almost surely infinitely many of the X(n) chains jump.
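As a quick numerical sanity check of the closed form (67) (an added sketch; scipy assumed):

```python
import numpy as np
from scipy.linalg import expm

# Compare the closed form for exp(tQ) against scipy's matrix exponential.
a, b, t = 1.3, 0.7, 0.5
Q = np.array([[-a, a], [b, -b]])
e = np.exp(-(a + b) * t)
P = np.array([[b + a*e, a - a*e], [b - b*e, a + b*e]]) / (a + b)
print(np.allclose(expm(t * Q), P))   # True
```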

Tail σ-algebras

4.9 Let Y0, Y1, Y2, . . . be independent random variables with

Yn = +1 with probability 1/2, and −1 with probability 1/2 (71)

Define

Xn = Y0 Y1 · · · Yn (72)

Prove the Xn are independent. Define

Y = σ(Y1, Y2, . . . ),   Tn = σ(Xr : r > n) (73)

Prove that

L := ⋂_n σ(Y, Tn) ≠ σ(Y, ⋂_n Tn) =: R (74)


First note Pr(Xn = 1) = Pr(Xn = −1) = 1/2, so Xn has the same distribution as Yn. Setting X0 = Y0, this follows by induction, since

Xn = Xn−1 with probability 1/2, and Xn = −Xn−1 with probability 1/2 (75)

and so, for example,

Pr(Xn = 1) = (1/2) Pr(Xn−1 = 1) + (1/2) Pr(Xn−1 = −1) = 2 · (1/4) = 1/2 (76)

For m < n and εm, εn ∈ {−1, 1}, consider Pr(Xm = εm, Xn = εn). Clearly this is the same as Pr(Xm = εm, Xm·Xn = εm εn): given Xm and Xn we can multiply them to generate Xm·Xn, and conversely, given Xm and Xm·Xn we can multiply them to recover Xn. Note also that

Xm · Xn = (Y0 Y1 · · · Ym)² Ym+1 · · · Yn = Ym+1 · · · Yn (77)

Hence Xm and Xm·Xn are independent, since they are functions of disjoint collections of independent random variables. Thus

Pr(Xm = εm, Xn = εn) = Pr(Xm = εm, Xm·Xn = εm εn) = Pr(Xm = εm) Pr(Xm·Xn = εm εn) = (1/2) · (1/2) = Pr(Xm = εm) Pr(Xn = εn) (78)

This argument generalizes to any finite set Xn1, . . . , Xnk by considering the invertible transformation

(Xn1, Xn2, . . . , Xnk) ↦ (Xn1, Xn1·Xn2, . . . , Xn1·Xn2 · · · Xnk) (79)

and noting that the variables on the right-hand side are independent, each with the same distribution as the Xn, so any joint assignment of signs has probability 2^{-k}. But this is the same as ∏_j Pr(Xnj = εnj).

Furthermore, Xn is independent of each Ym with m ≥ 1. If m > n this is obvious, because the definition of Xn involves only Y0, . . . , Yn, which are independent of Ym. If m ≤ n, consider the invertible transformation (Ym, Xn) ↦ (Ym, Xn·Ym); the second variable is the product Y0 · · · Ym−1 · Ym+1 · · · Yn, which is clearly independent of Ym.

Suppose we know the values of Yk for all k ≥ 1 and also some Xr. Then we can immediately deduce the value of Y0, since

Y0 = Y0 · (Y1 · · · Yr)² = Xr · Y1 · Y2 · · · Yr (80)

This shows that for any n, Y0 is measurable in the σ-algebra σ(Y1, . . . , Yn, Xn) ⊆ σ(Y, Tn−1). Therefore Y0 is measurable with respect to L = ⋂_n σ(Y, Tn).

On the other hand, Y0 is independent of R. Let T = ⋂_n Tn be the tail field. By the above argument, Y0 is independent of σ(Y1, . . . , Ym, Tn) for any n > m + 1, hence independent of Rm = σ(Y1, . . . , Ym, T), since that is a sub-σ-algebra. But Rm ⊆ Rm+1, so Y0 is independent of σ(⋃_m Rm) = R. Since Y0 is measurable in L but independent of R (and not a.s. constant), L ≠ R.


Dominated Convergence Theorem

5.1 Let S = [0, 1], Σ = B(S), and let µ be Lebesgue measure. Define fn = n·I_{(0,1/n)}. Prove that fn(s) → 0 for every s ∈ S but that µ(fn) = 1 for every n. Draw a picture of g = sup_n |fn| and show that g ∉ L¹(S, Σ, µ).

If n > 1/s then fn(s) = 0, because the indicator I_{(0,1/n)} puts no mass at points as large as s. So fn(s) = 0 eventually for every s > 0 (and fn(0) = 0 for every n), hence lim fn(s) = 0. On the other hand, µ(fn) = n · µ((0, 1/n)) = n/n = 1. The function g = sup_n |fn| looks rather like the function 1/x near 0, except it has stairsteps at each 1/n: it takes the value n on the interval (1/(n + 1), 1/n]. Now gn = max(|f1|, . . . , |fn|) is a simple function which satisfies

µ(gn) ≥ ∑_{k=1}^n k (1/k − 1/(k + 1)) = ∑_{k=1}^n 1/(k + 1) (81)

The sum diverges as n ↑ ∞, which means µ(g) also diverges, since it is greater than every µ(gn).

5.2 Prove inclusion-exclusion by considering integrals of indicator functions.

We will use the following property of indicators repeatedly: for events A and B,

IA IB = IA∩B (82)

This can be verified by considering the four cases x ∈ (A ∪ B)^c, x ∈ A \ B, x ∈ B \ A and x ∈ A ∩ B. These four cases partition Ω, and in each case the values of the expressions on both sides of (82) are equal.

Let A1, . . . , An be a collection of events. To simplify notation I'm going to write I(A) for indicators rather than IA, but the indicator remains a function of ω ∈ Ω. Now, by de Morgan's law, (A1 ∪ A2 ∪ · · · ∪ An)^c = A1^c ∩ A2^c ∩ · · · ∩ An^c, so

1 − I(A1 ∪ · · · ∪ An) = I((A1 ∪ · · · ∪ An)^c)
= I(A1^c ∩ A2^c ∩ · · · ∩ An^c)
= I(A1^c) I(A2^c) · · · I(An^c)
= (1 − I(A1))(1 − I(A2)) · · · (1 − I(An))
= 1 − ∑_i I(Ai) + ∑_{i<j} I(Ai)I(Aj) − ∑_{i<j<k} I(Ai)I(Aj)I(Ak) + · · · + (−1)^n I(A1)I(A2) · · · I(An)
= 1 − ∑_i I(Ai) + ∑_{i<j} I(Ai ∩ Aj) − ∑_{i<j<k} I(Ai ∩ Aj ∩ Ak) + · · · + (−1)^n I(A1 ∩ A2 ∩ · · · ∩ An) (83)


Taking expectations of each side and rearranging yields inclusion-exclusion:

Pr(A1 ∪ · · · ∪ An) = ∑_i Pr(Ai) − ∑_{i<j} Pr(Ai ∩ Aj) + ∑_{i<j<k} Pr(Ai ∩ Aj ∩ Ak) − · · · + (−1)^{n−1} Pr(A1 ∩ A2 ∩ · · · ∩ An) (84)

Now we prove the inclusion-exclusion inequalities. The depth-d inequality on n terms, at the level of indicator functions, is

I(A1 ∪ · · · ∪ An) ≶ ∑_i I(Ai) − ∑_{i<j} I(Ai ∩ Aj) + ∑_{i<j<k} I(Ai ∩ Aj ∩ Ak) − · · · + (−1)^{d−1} ∑_{i1<i2<···<id} I(Ai1 ∩ Ai2 ∩ · · · ∩ Aid) (85)

where the inequality is ≤ if d is odd and ≥ if d is even. For each d we induct on n. Clearly it's true for d = 1 and n = 2, since by inclusion-exclusion

I(A1 ∪ A2) = I(A1) + I(A2) − I(A1 ∩ A2) ≤ I(A1) + I(A2) (86)

Then it's true inductively for any n and d = 1, assuming it's true for n − 1:

I(A1 ∪ · · · ∪ An) ≤ I(A1 ∪ · · · ∪ An−1) + I(An) ≤ I(A1) + · · · + I(An−1) + I(An) (87)

For arbitrary d, the inequality is true for n ≤ d, since in that case it is just the inclusion-exclusion equality; no terms are truncated. So, inductively, if it is true for n − 1 sets, we can group A1 ∪ A2 together and treat it as one set:

I(A1 ∪ · · · ∪ An) ≶ I(A1 ∪ A2) + ∑_i I(Ai)
− ∑_i I((A1 ∪ A2) ∩ Ai) − ∑_{i<j} I(Ai ∩ Aj)
+ ∑_{i<j} I((A1 ∪ A2) ∩ Ai ∩ Aj) + ∑_{i<j<k} I(Ai ∩ Aj ∩ Ak)
− · · ·
+ (−1)^{d−1} ∑_{i1<···<i_{d−1}} I((A1 ∪ A2) ∩ Ai1 ∩ · · · ∩ Ai_{d−1})
+ (−1)^{d−1} ∑_{i1<···<id} I(Ai1 ∩ · · · ∩ Aid) (88)

Here the indices range from 3 to n, since we've explicitly broken out 1 and 2. Using the distributive law, each indicator which includes A1 ∪ A2 can be written as the indicator of a union of two sets, (A1 ∩ B) ∪ (A2 ∩ B), where B = Ai1 ∩ · · · ∩ Aim. For all such indicators of m < d terms, use inclusion-exclusion for two sets to expand I((A1 ∪ A2) ∩ B) → I(A1 ∩ B) + I(A2 ∩ B) − I(A1 ∩ A2 ∩ B). After performing all these expansions, we recover most of the terms of the inclusion-exclusion formula up to level d. This is because the terms of level m expand to provide all the missing terms at level m which contain exactly one of A1 or A2, and also all the missing terms at level m + 1 containing both A1 and A2. Terms with neither A1 nor A2 were already present before the expansion.

Finally, we extend the inequality by expanding the terms at level d using the inequality I((A1 ∪ A2) ∩ B) ≤ I(A1 ∩ B) + I(A2 ∩ B). This provides the missing terms at level d, and it also extends the inequality in the right direction, ≤ for odd d and ≥ for even d, owing to the factor (−1)^{d−1} on these terms.
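The identity (84) can be spot-checked mechanically on random events (added illustration, assuming numpy):

```python
import numpy as np
from itertools import combinations

# Check inclusion-exclusion (84) on a random indicator matrix for A_1..A_n.
rng = np.random.default_rng(0)
n, N = 4, 100_000
ind = rng.random((N, n)) < 0.3
lhs = ind.any(axis=1).mean()
rhs = sum((-1)**(d - 1) * ind[:, list(c)].all(axis=1).mean()
          for d in range(1, n + 1) for c in combinations(range(n), d))
print(lhs, rhs)   # equal up to floating point
```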

The Strong Law

7.1 Let f be a bounded continuous function on [0, ∞). The Laplace transform of f is the function L on (0, ∞) defined by

L(λ) = ∫_0^∞ e^{−λx} f(x) dx (89)

Let X1, X2, . . . be i.i.d. random variables with the exponential distribution of rate λ, so Pr(X > x) = e^{−λx}, E[X] = 1/λ, Var(X) = 1/λ². If Sn = X1 + X2 + · · · + Xn, show that

E f(Sn) = ((−1)^{n−1} λⁿ/(n − 1)!) L^{(n−1)}(λ) (90)

Show that f may be recovered from L as follows: for y > 0,

f(y) = lim_{n↑∞} (−1)^{n−1} ((n/y)ⁿ/(n − 1)!) L^{(n−1)}(n/y) (91)

We can write

E f(Sn) = ∫_{ℝ₊ⁿ} f(x1 + x2 + · · · + xn) λⁿ e^{−λ(x1+x2+···+xn)} dx1 dx2 . . . dxn = λⁿ ∫_0^∞ ∫_{u1}^∞ · · · ∫_{u_{n−1}}^∞ f(un) e^{−λun} dun . . . du1 (92)

where we performed the change of variables u1 = x1, u2 = x1 + x2, . . . , un = x1 + x2 + · · · + xn, whose Jacobian is 1. (Note the factor λⁿ contributed by the n exponential densities λe^{−λx}.)

To simplify this, we're going to successively apply Fubini's theorem. To that end, consider the calculation

∫_0^∞ u^k ∫_u^∞ g(v) dv du = ∫_0^∞ g(v) ∫_0^v u^k du dv = ∫_0^∞ (v^{k+1}/(k + 1)) g(v) dv (93)

So let u = u1, v = u2, k = 0 and g(v) = ∫_v^∞ · · · ∫_{u_{n−1}}^∞ f(un) e^{−λun} dun . . . du3. We are left with

λⁿ ∫_0^∞ u2 ∫_{u2}^∞ · · · ∫_{u_{n−1}}^∞ f(un) e^{−λun} dun . . . du2 (94)

We may repeatedly interchange the order of integration and integrate out one variable at a time, until we are left with

E f(Sn) = λⁿ ∫_0^∞ (un^{n−1}/(n − 1)!) f(un) e^{−λun} dun (95)

Now we need two facts from the theory of Laplace transforms. Differentiating under the integral sign,

(d/dλ) L[f] = ∫_0^∞ (∂/∂λ) f(x) e^{−λx} dx = ∫_0^∞ −x f(x) e^{−λx} dx = −L[x f] (96, 97)

and, integrating by parts (for differentiable f),

L[f′] = ∫_0^∞ f′(x) e^{−λx} dx = f(x) e^{−λx} |_0^∞ + λ ∫_0^∞ f(x) e^{−λx} dx = −f(0) + λ L[f] (98, 99)

Iterating (96) gives L^{(n−1)}[f] = (−1)^{n−1} L[x^{n−1} f]. Comparing this with (95), and keeping track of the factor λⁿ,

E f(Sn) = (λⁿ/(n − 1)!) L[x^{n−1} f](λ) = ((−1)^{n−1} λⁿ/(n − 1)!) L^{(n−1)}(λ) (100)

which is exactly (90).

Note E X⁴ = Γ(5)/λ⁴ < ∞. Therefore, by the SLLN, Sn/n → E X = 1/λ almost surely. By continuity, f(Sn/n) → f(1/λ) almost surely, and by the bounded convergence theorem (|f| ≤ B for some constant B), E f(Sn/n) → f(1/λ). We connect this to our previous expression by first noting, for α > 0,

L[f(αx)](λ) = ∫_0^∞ f(αx) e^{−λx} dx = α^{-1} ∫_0^∞ e^{−(λ/α)u} f(u) du = α^{-1} L[f(x)](λ/α) (101)

where the middle equality comes from the substitution u = αx. From this it's clear that

(d^{n−1}/dλ^{n−1}) L[f(αx)](λ) = α^{-n} L^{(n−1)}(λ/α) (102)

Applying (90) to the function x ↦ f(x/n) (that is, α = 1/n), and then taking y = 1/λ, we get the desired result:

f(y) = lim_{n→∞} E f(Sn/n) = lim_{n→∞} (−1)^{n−1} ((n/y)ⁿ/(n − 1)!) L^{(n−1)}(n/y) (103)


7.2 As usual, write S^{n−1} = {x ∈ ℝⁿ : |x| = 1}. There is a unique probability measure ν_{n−1} on (S^{n−1}, B(S^{n−1})) such that ν_{n−1}(A) = ν_{n−1}(HA) for every orthogonal n × n matrix H and every A in B(S^{n−1}). This is the same as the radial measure in polar coordinates, or the Haar measure under the action of orthogonal matrices.

(a) Prove that if X is a vector in ℝⁿ whose components are independent N(0, 1) variables, then for every orthogonal n × n matrix H the vector HX has the same property. Deduce that X/‖X‖ has law ν_{n−1}.

(b) Let Z1, Z2, · · · ∼ N(0, 1) be i.i.d. and let

Rn = ‖(Z1, . . . , Zn)‖ = (Z1² + Z2² + · · · + Zn²)^{1/2} (104)

Show Rn/√n → 1 a.s.

(c) For each n, let (Y(n)1, Y(n)2, . . . , Y(n)n) be a point chosen on S^{n−1} according to the distribution ν_{n−1}. Then

lim_{n→∞} Pr(√n Y(n)1 ≤ x) = Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy (105)

lim_{n→∞} Pr(√n Y(n)1 ≤ x1; √n Y(n)2 ≤ x2) = Φ(x1) Φ(x2) (106)

(a) Let Y = HX. Each component of Y is a linear combination of jointly Gaussian random variables, so Y is jointly Gaussian. Let's compute the moments: E Y = H E X = 0, and

Cov Y = E YYᵀ = E (HX)(HX)ᵀ = E HXXᵀHᵀ = H (E XXᵀ) Hᵀ = H I Hᵀ = I (107)

We've used the identity HHᵀ = I for orthogonal matrices. Thus each Yi ∼ N(0, 1), and Cov(Yi, Yj) = 0 for i ≠ j; since these are jointly Gaussian random variables, zero correlation is the same thing as independence. Now X/‖X‖ ∈ S^{n−1}, and the law of X is, by the above calculation, invariant under orthogonal transformations. That is, HX/‖HX‖ has the same distribution as X/‖X‖, which holds because HX has the same distribution as X. We conclude that the law of X/‖X‖ is ν_{n−1}, by the uniqueness of this measure.

(b) Note E Zk² = 1, since this is just the variance of Zk. Also E Zk⁸ < ∞, since a Gaussian has finite moments of all orders. By the SLLN applied to the random variables Zk²,

Rn²/n = (∑_{k=1}^n Zk²)/n → 1 a.s. (108)

By continuity of the square root, Rn/√n → 1 almost surely.

(c) Note Y(n) has the same law as X/‖X‖ = X/Rn, where X = (Z1, . . . , Zn). Then √n Y(n)1 has the same law as Z1/(Rn/√n), and Rn/√n → 1 almost surely. Thus Pr(√n Y(n)1 ≤ x) → Pr(Z1 ≤ x) = Φ(x). Similarly Pr(√n Y(n)1 ≤ x1; √n Y(n)2 ≤ x2) → Pr(Z1 ≤ x1; Z2 ≤ x2) = Φ(x1) Φ(x2).


Conditional Expectation

9.1 Prove that if G is a sub-σ-algebra of F, if X ∈ L¹(Ω, F, Pr) and Y ∈ L¹(Ω, G, Pr), and if

E(X; G) = E(Y; G) (109)

for every G in a π-system which contains Ω and generates G, then (109) holds for every G ∈ G.

We make a standard monotone class argument: we will show that the collection of sets satisfying (109) is a d-system containing the given π-system 𝓘, so by Dynkin's lemma it contains σ(𝓘) = G.

(a) By assumption Ω ∈ 𝓘, so Ω satisfies (109).

(b) Suppose A and B satisfy (109) with A ⊆ B. Then B = (B \ A) ⊔ A is a disjoint union, so IB\A = IB − IA, and by linearity of expectations

E(X; B \ A) = E X(IB − IA) = E(X; B) − E(X; A) (110)

Since the same is true for Y, we get

E(X; B \ A) = E(X; B) − E(X; A) = E(Y; B) − E(Y; A) = E(Y; B \ A) (111)

so the class of sets satisfying (109) is closed under proper differences.

(c) Suppose An ↑ A. Then XIAn is dominated by |X| and XIAn → XIA, so by dominated convergence

E(X; An) → E(X; A) (112)

Mutatis mutandis, the same is true for Y, hence

E(X; A) = lim_{n→∞} E(X; An) = lim_{n→∞} E(Y; An) = E(Y; A) (113)

This shows A is in the class of sets satisfying (109).

Thus the class of sets satisfying (109) is a d-system containing the π-system 𝓘, so by Dynkin's lemma it includes σ(𝓘) = G.

I guess another approach is to note that G ↦ E[X; G] is a (signed) measure, so we could use something like Theorem 1.6 on the uniqueness of measures that agree on π-systems.

9.2 Suppose that X, Y ∈ L1(Ω,F , Pr) and that

E(X|Y) = Y a.s. E(Y|X) = X a.s. (114)

Prove that Pr(X = Y) = 1


Recall the defining property of conditional expectation: first, E[Y | X] is σ(X)-measurable, and second, for G ∈ σ(X),

E[E[Y | X]; G] = E[Y; G] (115)

Let A ∈ σ(X). From (114) and (115),

E[X; A] = E[E[Y | X]; A] = E[Y; A] (116)

Also, mutatis mutandis, E[Y; B] = E[X; B] for any B ∈ σ(Y). For c ∈ ℝ, taking A = {X ≤ c} and B = {Y ≤ c} gives

E(X; X ≤ c) = E(Y; X ≤ c) ⇒ E(X − Y; X ≤ c, Y > c) + E(X − Y; X ≤ c, Y ≤ c) = 0
E(X; Y ≤ c) = E(Y; Y ≤ c) ⇒ E(X − Y; X > c, Y ≤ c) + E(X − Y; X ≤ c, Y ≤ c) = 0 (117)

Therefore

E(X − Y; X ≤ c, Y > c) = E(X − Y; X > c, Y ≤ c) (118)

since both equal E(Y − X; X ≤ c, Y ≤ c). Since the left-hand side is non-positive and the right-hand side is non-negative, both sides must be zero.

This clinches it: note {X > Y} ⊆ ⋃_{c∈ℚ} {X > c, Y ≤ c}, and the integrand X − Y is positive on each of these sets, so

E(X − Y; X > Y) ≤ ∑_{c∈ℚ} E(X − Y; X > c, Y ≤ c) = 0 (119)

But if Pr(X > Y) > 0 then E(X − Y; X > Y) > 0. Thus X ≤ Y almost surely. By symmetry, Y ≤ X almost surely, and the intersection of these almost sure events is almost sure, so X = Y almost surely.

Martingales

10.1 Polya's urn. At time 0 an urn contains 1 black ball and 1 white ball. At each time t = 1, 2, . . . a ball is chosen at random from the urn and is replaced, and another ball of the same color is also added. Just after time n there are therefore n + 2 balls in the urn, of which Bn + 1 are black, where Bn is the number of black balls chosen by time n. Let Mn = (Bn + 1)/(n + 2) be the proportion of black balls just after time n. Prove that M is a martingale. Prove Pr(Bn = k) = 1/(n + 1) for 0 ≤ k ≤ n. What is the distribution of Θ = lim Mn? Prove that for 0 < θ < 1,

Nθn = ((n + 1)!/(Bn! (n − Bn)!)) θ^{Bn} (1 − θ)^{n−Bn} (120)

defines a martingale Nθ.


We will consider M with respect to the filtration Fn = σ(B1, . . . , Bn). Just after time n − 1 the urn holds n + 1 balls, of which Bn−1 + 1 are black, so Pr[Bn = Bn−1 + 1 | Fn−1] = (Bn−1 + 1)/(n + 1) and Pr[Bn = Bn−1 | Fn−1] = (n − Bn−1)/(n + 1). Thus

E[Mn | Fn−1] = ((n − Bn−1)/(n + 1)) · ((Bn−1 + 1)/(n + 2)) + ((Bn−1 + 1)/(n + 1)) · ((Bn−1 + 2)/(n + 2))
= ((n + 2)(Bn−1 + 1))/((n + 2)(n + 1)) = Mn−1 (121)

which verifies that M is a martingale.

Now let ti specify the draw on which the ith black ball is selected, and sj the draw on which the jth white ball is selected. That is, if n = 7, k = 4 and the balls drawn are bwwbbwb, then the ti are given by t1 = 1, t2 = 4, t3 = 5, t4 = 7 and the sj by s1 = 2, s2 = 3, s3 = 6. The length of each sequence is the total number of balls chosen of the respective color, and since every draw is either black or white, {t1, . . . , tk} ∪ {s1, . . . , s_{n−k}} = {1, 2, . . . , n}. Let's compute the probability of a specific sequence: at draw t the urn holds t + 1 balls, and when the ith black (respectively jth white) ball is drawn, i black (respectively j white) balls are present, so

P = ∏_{i=1}^k (i/(ti + 1)) · ∏_{j=1}^{n−k} (j/(sj + 1)) = k!(n − k)!/(n + 1)! = (1/(n + 1)) C(n, k)^{-1} (122)

We've learned a remarkable fact: the probability doesn't depend on the order of the draws, just on the total number which are black. The sequence is exchangeable: any permutation of the outcomes results in an event with the same probability. There are C(n, k) sequences with k black balls and n − k white ones, where C(n, k) is the binomial coefficient, so

Pr(Bn = k) = C(n, k) · (1/(n + 1)) C(n, k)^{-1} = 1/(n + 1) (123)

Thus we've learned another remarkable fact: the distribution of Bn is uniform on {0, 1, . . . , n}. Consequently Mn = (Bn + 1)/(n + 2) converges in distribution to the uniform distribution on (0, 1), so Θ = lim Mn ∼ U(0, 1).

Turning to Nθ, let B be the event that a black ball is drawn at time n, i.e. Bn = Bn−1 + 1, and let W = B^c be the event of a white ball. Then

E[Nθn | Fn−1] = Pr(W) ((n + 1)!/(Bn−1! (n − Bn−1)!)) θ^{Bn−1} (1 − θ)^{n−Bn−1}
+ Pr(B) ((n + 1)!/((Bn−1 + 1)! (n − Bn−1 − 1)!)) θ^{Bn−1+1} (1 − θ)^{n−1−Bn−1}
= (n!/(Bn−1! (n − 1 − Bn−1)!)) θ^{Bn−1} (1 − θ)^{n−1−Bn−1} × [((1 − θ)(n + 1)/(n − Bn−1)) Pr(W) + (θ(n + 1)/(Bn−1 + 1)) Pr(B)] (124)

The factor before the brackets is Nθ(n−1). The expression in the brackets is

(1 − θ) ((n + 1)/(n − Bn−1)) ((n − Bn−1)/(n + 1)) + θ ((n + 1)/(Bn−1 + 1)) ((Bn−1 + 1)/(n + 1)) = (1 − θ) + θ = 1 (125)


So the martingale property is proved.
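A simulation of the urn (added sketch, assuming numpy) makes the uniformity of Bn vivid:

```python
import numpy as np

# Simulate Polya's urn: check that B_n is uniform on {0, ..., n}.
rng = np.random.default_rng(0)
n, N = 10, 200_000
b = np.zeros(N, dtype=int)
for t in range(1, n + 1):
    black = rng.random(N) < (b + 1) / (t + 1)   # draw black w.p. (B+1)/(t+1)
    b += black
print(np.bincount(b, minlength=n + 1) / N)      # each frequency ~ 1/11
```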

10.2 Bellman's optimality principle. Your winnings per unit stake on game n are εn, where the εn are i.i.d. RVs with distribution

εn = +1 with probability p, and −1 with probability q (126)

Your stake Cn on game n must lie between 0 and Zn−1, where Zn−1 is your fortune at time n − 1. Your object is to maximize the expected interest rate E log(ZN/Z0), where N is a given integer representing the number of times you will play and Z0, your initial wealth, is a given constant. Let Fn = σ(ε1, . . . , εn) be your history up to time n. Show that if C is any previsible strategy, then log Zn − nα is a supermartingale, where α denotes the entropy

α = p log p + q log q + log 2 (127)

so that E log(ZN/Z0) ≤ Nα, but that for a certain strategy log Zn − nα is a martingale. What is the best strategy?

For a discrete distribution (p1, . . . , pn), let's maximize ∑_i pi log xi subject to the constraint ∑_{i=1}^n xi = K. Using the method of Lagrange multipliers, we maximize

F = ∑_{i=1}^n pi log xi − λ(∑_{i=1}^n xi − K) (128)

over the n + 1 variables x1, x2, . . . , xn, λ. (Maximizing with respect to λ just enforces the constraint.) The first-order conditions are

∂F/∂xi = 0 ⇒ pi/xi = λ (129)

Summing over the equations, 1 = ∑_i pi = λ ∑_i xi = λK, so λ = 1/K and xi = K pi is the extremal solution. This is a maximum, as can be seen from the second-order conditions ∂²F/∂xi² = −pi/xi² < 0. The maximum value is

∑_i pi log xi = ∑_i pi (log pi + log K) = −H(p) + log K (130)

where H(p) = −∑_i pi log pi is the entropy of the distribution.

Now calculating,

log Zn − log Zn−1 = log((Zn−1 + Cn εn)/Zn−1) (131)

Let fn = Cn/Zn−1; for any choice 0 ≤ Cn ≤ Zn−1 we get 0 ≤ fn ≤ 1. Then

E[log Zn − log Zn−1 | Fn−1] = p log(1 + fn) + q log(1 − fn) ≤ p log p + q log q + log 2 (132)


where the last inequality follows from our previous calculation with K = 2, taking x1 = 1 + fn and x2 = 1 − fn. Therefore log Zn − nα is a supermartingale for any previsible C, while the choice Cn = (2p − 1)Zn−1 = (p − q)Zn−1 attains equality and makes log Zn − nα a martingale. The supermartingale property implies E[log ZN − Nα] ≤ E[log Z0], and thus E[log(ZN/Z0)] ≤ Nα. The best strategy is to stake the fixed fraction p − q of your current fortune.
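A short numeric illustration of this optimum (added; the fraction f = p − q is the Kelly fraction):

```python
import numpy as np

# The expected log-growth p*log(1+f) + q*log(1-f) is maximized at f = p - q,
# where it equals alpha = p*log(p) + q*log(q) + log(2).
p = 0.6
q = 1 - p
alpha = p*np.log(p) + q*np.log(q) + np.log(2)
f = np.linspace(0, 0.99, 1000)
growth = p*np.log(1 + f) + q*np.log(1 - f)
print(f[growth.argmax()], p - q)    # both ~0.2
print(growth.max(), alpha)          # both ~0.0201
```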

10.3 Stopping times. Suppose S and T are stopping times relative to (Ω, F, (Fn)). Prove that S ∧ T = min(S, T), S ∨ T = max(S, T) and S + T are all stopping times.

First note that for any constant n, {T = n} ∈ Fn: this is {T ≤ n} \ {T ≤ n − 1}, and the former is in Fn while the latter is in Fn−1 ⊆ Fn. Also {T ≥ n} ∈ Fn, since {T ≥ n} = {T ≤ n − 1}^c ∈ Fn−1 ⊆ Fn.

Turning to S ∧ T: {S ∧ T = n} = ({S = n} ∩ {T ≥ n}) ∪ ({T = n} ∩ {S ≥ n}), and we've already shown that each of these events is in Fn, so the intersections and their union are as well. The function S ∨ T is even simpler: {S ∨ T = n} = ({S = n} ∩ {T ≤ n}) ∪ ({T = n} ∩ {S ≤ n}), and these events are all in Fn.

For S + T, note that S and T take values in the natural numbers. Therefore if S + T = n for some n ∈ ℕ, there are only finitely many possibilities S = s, T = t with s + t = n (namely s ∈ {0, 1, . . . , n} and t = n − s). So {S + T = n} = ⋃_{k=0}^n {S = k} ∩ {T = n − k}, and each of the sets in this expression is clearly in Fn.

10.4 Let S and T be stopping times with S ≤ T. Define the process 1(S,T] with parameter set ℕ via

1(S,T](n, ω) = 1 if S(ω) < n ≤ T(ω), and 0 otherwise (133)

Prove 1(S,T] is previsible, and deduce that if X is a supermartingale then

E(XT∧n) ≤ E(XS∧n), ∀n (134)

Note that

{1(S,T](n, ·) = 1} = {S ≤ n − 1} ∩ {T ≥ n} (135)

The first event, {S ≤ n − 1}, is in Fn−1 because S is a stopping time. Similarly the second event {T ≥ n} = {T ≤ n − 1}^c ∈ Fn−1 because T is a stopping time. Since 1(S,T] takes only the values 0 and 1, this proves that ω ↦ 1(S,T](n, ω) is Fn−1-measurable; hence the process 1(S,T] is previsible.

Thus for any supermartingale X, the process Y = 1(S,T] • X is also a supermartingale, with Y0 = 0 and value at time n given by

Yn = (1(S,T] • X)n = XT∧n − XS∧n (136)

Taking expectations of both sides gives the desired result:

E Yn = E XT∧n − E XS∧n ≤ E Y0 = 0 (137)


10.5 Suppose that T is a stopping time such that for some N ∈ ℕ and some ε > 0 we have, for every n,

Pr(T ≤ n + N | Fn) > ε, a.s. (138)

Show by induction, using Pr(T > kN) = Pr(T > kN; T > (k − 1)N), that for k = 1, 2, 3, . . .

Pr(T > kN) ≤ (1 − ε)^k (139)

Show that E(T) < ∞.

Let Gn = {T > n}. Taking complements, our hypothesis implies

Pr(Gn+N | Fn) < 1 − ε, a.s. (140)

Taking n = 0, this implies Pr(GN) < 1 − ε. By induction, using the relation above for n = N(k − 1), we get

Pr(GNk) = Pr(GN(k−1)+N | GN(k−1)) Pr(GN(k−1)) ≤ (1 − ε)(1 − ε)^{k−1} (141)

so the induction holds.

Let's derive an alternate expression for the expectation:

E T = ∑_{n=1}^∞ n Pr(T = n) = ∑_{n=1}^∞ ∑_{k=1}^n Pr(T = n) = ∑_{k=1}^∞ ∑_{n=k}^∞ Pr(T = n) = ∑_{k=1}^∞ Pr(T ≥ k) (142)

Also, if m < n then Gm ⊇ Gn, so we have the trivial inequality Pr(Gm) ≥ Pr(Gn). Using this to bound each of the N terms Pr(T ≥ n) with Nk < n ≤ N(k + 1) by Pr(T > Nk), we get

E T = ∑_{n=1}^∞ Pr(T ≥ n) ≤ ∑_{k=0}^∞ N Pr(T > Nk) ≤ N ∑_{k=0}^∞ (1 − ε)^k = N/ε (143)

Clearly this is finite. Note that the inequalities used above hold almost surely; however, we used only a countable number of them, and the countable intersection of almost sure events is still almost sure, so the bound above also holds almost surely. The result may be summarized: so long as there's a chance something happens, no matter how small (provided the chance is bounded below by a positive number), then it will happen eventually.


10.6 ABRACADABRA. At each of times 1, 2, 3, . . . a monkey types a capital letter at random, so the sequence typed is an i.i.d. sequence of RVs, each chosen uniformly among the 26 possible capital letters. Just before each time n = 1, 2, . . . a new gambler arrives. He bets $1 that

the nth letter will be A (144)

If he loses, he leaves. If he wins, he bets his fortune of $26 that

the (n + 1)st letter will be B (145)

If he loses, he leaves. If he wins, he bets his fortune of $26² that

the (n + 2)nd letter will be R (146)

and so on through ABRACADABRA. Let T be the first time by which the monkey has produced the consecutive sequence ABRACADABRA. Show why

E(T) = 26^11 + 26^4 + 26 (147)

Intuitively, the cumulative winnings of all the gamblers form a martingale, since they are the sum of a bunch of martingales. At time T the cumulative winnings are −T for the initial stakes placed at each of the times 1, . . . , T, plus the payout 26^11 + 26^4 + 26 to the gamblers still in the game: the one who bet on the first A of the final ABRACADABRA (winning on the whole word), the one who bet on the second-to-last A (winning on the terminal fragment ABRA), and the one who bet on the final A (winning on A). For this to be a martingale, the expected cumulative winnings must be 0, so E T = 26^11 + 26^4 + 26.

More formally, let Ln be the nth randomly selected letter. The Ln are i.i.d., uniformly distributed over {A, B, . . . , Z}. For the fixed word a1 a2 . . . a11 = ABRACADABRA, consider for each j the process

M(j)n = 1 if n ≤ j; 26^k if n = j + k and Lj = a1, Lj+1 = a2, . . . , Lj+k−1 = ak; 26^11 if n > j + 11 and Lj = a1, Lj+1 = a2, . . . , Lj+10 = a11; 0 otherwise (148)

This is bounded, hence in L¹, and M(j) satisfies the martingale property: except in the second case it is constant from one time to the next, and in that case

E[M(j)n+1 | Fn] = (1/26) · 26 M(j)n + (25/26) · 0 = M(j)n (149)

Now form the martingale N(j) = I_{n≥j} • M(j), where I_{n≥j} is the indicator of n ≥ j; for n ≥ j it equals M(j)n − 1, gambler j's net winnings. Then form

B = ∑_{j≥1} N(j) (150)

At any time n only finitely many of the terms are non-zero, and each term is in L¹, so Bn is in L¹; the martingale property holds since B is a sum of martingales. For n ∈ ℕ, let S be the set of j ≤ n such that M(j)n is in case 2, matching some initial fragment of the word, and let S′ be the set of j such that M(j)n is in case 3, having matched all the letters. The value of B is

Bn = −n + ∑_{j∈S} 26^{n−j} + ∑_{j∈S′} 26^11 (151)

Let T be the stopping time at which some M(j) first matches all 11 letters of ABRACADABRA. We have E|T| < ∞ because T satisfies the condition of 10.5 with N = 11 and ε = 26^{-11}: whatever has happened so far, the next 11 letters spell out ABRACADABRA with probability 26^{-11}. When the stopping time occurs, S = {T − 4, T − 1}: the terminal fragments ABRA and A match the gamblers M(T−4) and M(T−1) respectively (these are the only proper suffixes of ABRACADABRA that are also prefixes), and S′ has exactly one element, since this is the first time ABRACADABRA has appeared. Hence by Doob's optional stopping theorem,

0 = B0 = E BT = −E T + 26^4 + 26 + 26^11 (152)

This is the same as (147).

10.7 Gambler's Ruin. Suppose that X1, X2, . . . are i.i.d. RVs with

Pr[X = +1] = p, Pr[X = −1] = q, where 0 < p = 1 − q < 1 (153)

and p ≠ q. Suppose that a and b are integers with 0 < a < b. Define

Sn = a + X1 + · · · + Xn,   T = inf{n : Sn = 0 or Sn = b} (154)

Let F0 = {∅, Ω} and Fn = σ(X1, . . . , Xn). Explain why T satisfies the condition in 10.5. Prove that

Mn = (q/p)^{Sn} and Nn = Sn − n(p − q) (155)

define martingales M and N. Deduce the values of Pr(ST = 0) and E T.

Let's compute:

E[Mn | Fn−1] = p (q/p)^{Sn−1 + 1} + q (q/p)^{Sn−1 − 1} = (q + p) (q/p)^{Sn−1} = Mn−1 (156)

E[Nn | Fn−1] = p (Sn−1 + 1 − n(p − q)) + q (Sn−1 − 1 − n(p − q)) (157)
= (p + q) Sn−1 + (p − q) − n(p − q) = Nn−1 (158)

Now, the condition described in 10.5 holds with N = b − 1. For any n, suppose that Xn+1 = Xn+2 = · · · = Xn+b−1 = +1. On {T > n} we have Sn ∈ (0, b), so these b − 1 consecutive upward steps force Sn+k = b for some k ≤ b − 1; in particular T ≤ n + b − 1. Since Pr(Xn+1 = · · · = Xn+b−1 = +1) = p^{b−1} > 0,

Pr(T ≤ n + b − 1 | Fn) ≥ p^{b−1} (159)

and the condition is satisfied (with any ε < p^{b−1}). We conclude that E T < ∞.

Let π = Pr(ST = 0) = 1 − Pr(ST = b). By Doob's optional stopping theorem,

(q/p)^a = M0 = E MT = π · 1 + (1 − π) (q/p)^b (160)

so

Pr(ST = 0) = π = ((q/p)^a − (q/p)^b)/(1 − (q/p)^b),   Pr(ST = b) = 1 − π = (1 − (q/p)^a)/(1 − (q/p)^b) (161)

Also

a = N0 = E NT = π · 0 + (1 − π) b − E T · (p − q) (162)

so

E T = ((1 − π) b − a)/(p − q) (163)

10.8 A random number Θ is chosen uniformly in [0, 1], and a coin with probability Θ of heads is minted. The coin is tossed repeatedly; let Bn be the number of heads in n tosses. Prove that Bn has exactly the same probabilistic structure as the (Bn) sequence in 10.1 on Polya's urn. Prove that Nθn is the regular conditional pdf of Θ given B1, . . . , Bn.

Consider the beta function B(m + 1, n + 1) = ∫_0^1 u^m (1 − u)^n du. Note B(m, n) = B(n, m) by the change of variables v = 1 − u. Integrating by parts repeatedly,

B(m + 1, n + 1) = ∫_0^1 u^m (1 − u)^n du
= [u^{m+1} (1 − u)^n/(m + 1)]_0^1 + (n/(m + 1)) ∫_0^1 u^{m+1} (1 − u)^{n−1} du
= (n/(m + 1)) B(m + 2, n) = (∏_{k=1}^{n} (n + 1 − k)/(m + k)) B(m + n + 1, 1)
= (n! m!/(m + n)!) · (1/(m + n + 1)) = n! m!/(n + m + 1)! (164)

Now, conditional on Θ = θ, Bn ∼ Bin(n, θ). Thus we can marginalize over θ to find

Pr(Bn = k) = ∫_0^1 Pr(Bn = k | Θ = θ) dθ = ∫_0^1 C(n, k) θ^k (1 − θ)^{n−k} dθ = C(n, k) B(k + 1, n − k + 1) = 1/(n + 1) (165)


Compare this with (123). Now

Pr(Bn = k; Bn−1 = k − 1) = ∫_0^1 Pr(Bn = k; Bn−1 = k − 1 | Θ = θ) dθ = ∫_0^1 θ · C(n − 1, k − 1) θ^{k−1} (1 − θ)^{n−k} dθ
= C(n − 1, k − 1) B(k + 1, n − k + 1) = k/(n(n + 1)) (166)

We've used the fact that Pr(Bn = k | Bn−1 = k − 1, Θ = θ) = θ, since conditional on Θ the tosses are independent with success probability θ. Therefore, using Bayes' rule and Pr(Bn−1 = k − 1) = 1/n,

Pr(Bn = k | Bn−1 = k − 1) = Pr(Bn = k; Bn−1 = k − 1)/Pr(Bn−1 = k − 1) = k/(n + 1) (167)

This is exactly the transition rule for Polya's urn, where the probability of drawing black at time n given Bn−1 = k − 1 is (Bn−1 + 1)/(n + 1) = k/(n + 1).

Let's compute the regular conditional pdf of Θ. Let B(n) = (B1, B2, . . . , Bn) be the vector of values of Bk for k = 1, . . . , n, and let b = (b1, . . . , bn) be a realization. Now

Pr(B(n) = b | Θ = θ) = ∏_{k=1}^n θ^{I(bk = bk−1 + 1)} (1 − θ)^{I(bk = bk−1)} (168)

That is, we pick up a factor of θ whenever bk increases (we flipped heads) and a factor of 1 − θ whenever bk stays the same (we flipped tails). Clearly this simplifies to

Pr(B(n) = b | Θ = θ) = θ^{bn} (1 − θ)^{n−bn} (169)

Marginalizing over Θ, we compute

Pr(B(n) = b) = ∫_0^1 θ^{bn} (1 − θ)^{n−bn} dθ = bn! (n − bn)!/(n + 1)! (170)

Now by Bayes' rule,

Pr(Θ ∈ dθ | B(n) = b) = Pr(B(n) = b | Θ = θ) Pr(Θ ∈ dθ)/Pr(B(n) = b) = ((n + 1)!/(bn! (n − bn)!)) θ^{bn} (1 − θ)^{n−bn} dθ = Nθn dθ (171)

This proves the last assertion of the problem.


The jailhouse problem.

The Beta-Gamma identity behind (164) can be verified in general:

Γ(x)Γ(y) = ∫_0^∞ s^{x−1} e^{−s} ds ∫_0^∞ t^{y−1} e^{−t} dt
= ∫_0^∞ ∫_0^∞ s^{x−1} t^{y−1} e^{−s−t} ds dt
= ∫_0^∞ ∫_0^1 (uv)^{x−1} ((1 − u)v)^{y−1} e^{−v} v du dv
= ∫_0^1 u^{x−1} (1 − u)^{y−1} du ∫_0^∞ v^{x+y−1} e^{−v} dv
= B(x, y) Γ(x + y) (172)

where we made the substitution u = s/(s + t), v = s + t, with Jacobian

∂(u, v)/∂(s, t) = det [[t/(s + t)², −s/(s + t)²], [1, 1]] = 1/(s + t) = 1/v (173)

so that ds dt = v du dv. When x, y ∈ ℕ we recover our previous formula (164), since Γ(n) = (n − 1)! for n ∈ ℕ.

10.9 Show that if X is a non-negative supermartingale and T is a stopping time, then

E(XT; T < ∞) ≤ E X0 (174)

Deduce that

Pr(sup Xn ≥ c) ≤ E X0/c (175)

Let Zn = Xn I_{T<∞}. Then ZT∧n → XT I_{T<∞} as n → ∞: if T(ω) = N < ∞ then ZT∧n = XT for all n > N, and if T(ω) = ∞ then ZT∧n = Zn = 0 for all n. Since Xn ≥ 0, Fatou's lemma gives

E[XT; T < ∞] = E lim ZT∧n = E lim inf ZT∧n ≤ lim inf E ZT∧n (176)

So, because 0 ≤ Zn ≤ Xn and because of the supermartingale property E XT∧n ≤ E X0,

E[XT; T < ∞] − E X0 ≤ lim inf E ZT∧n − E X0 ≤ lim inf E XT∧n − E X0 ≤ 0 (177)

This gives us (174).

Now consider the stopping time T = inf{n : Xn ≥ c}. From the definition of T, if T < ∞ then XT ≥ c. Hence

c Pr(sup Xn ≥ c) = c Pr(T < ∞) ≤ E[XT; T < ∞] ≤ E X0 (178)

This gives us (175). Note this is a maximal version of Markov's inequality.


10.10 The "Starship Enterprise" problem. The control system on the starship Enterprise has gone wonky. All that one can do is to set a distance to be travelled. The spaceship will then move that distance in a randomly chosen direction, then stop. The object is to get into the Solar System, a ball of radius ρ. Initially, the Enterprise is at a distance R0 > ρ from the Sun. Show that for all ways of setting the jump lengths rn,

Pr(the Enterprise returns to the Solar System) ≤ ρ/R0 (179)

and find a strategy of jump lengths rn so that

Pr(the Enterprise returns to the Solar System) ≥ ρ/R0 − ε (180)

Let Rn be the distance from the Sun after n space-hops. Now f(x) = 1/|x| is a harmonic function away from the origin, satisfying ∇²f = ∑_i ∂²f/∂xi² = 0 for x ≠ 0: with r = |x|,

∂f/∂xi = −xi/r³,   ∂²f/∂xi² = −1/r³ + 3xi²/r⁵,   ∇²f = −3/r³ + 3r²/r⁵ = 0 (181)

Let S(x, r) be the 2-sphere of radius r centered at x, with S(r) = S(0, r) and surface area element dσ; similarly let B(x, r) be the ball of radius r centered at x. Any harmonic function satisfies the mean-value property. Let

A(x, r) = (1/(4πr²)) ∫∫_{S(x,r)} f dσ = (1/4π) ∫∫_{S(1)} f(x + rn) dσ1 (182)

where in the second expression we applied a dilation to integrate over the unit sphere S(1), with surface element dσ1 = dσ/r². Differentiating under the integral sign with respect to r,

(d/dr) A(x, r) = (1/4π) ∫∫_{S(1)} ∇f(x + rn) · n dσ1 = (1/(4πr²)) ∫∫_{S(x,r)} ∇f · n dσ = (1/(4πr²)) ∫∫∫_{B(x,r)} ∇²f dV = 0 (183)

In the last equation we used Gauss's law, valid so long as f is harmonic throughout B(x, r). By continuity A(x, r) → f(x) as r → 0, so A(x, r) = f(x) for all such r.

The probability law of the Enterprise's position after a jump of size r from x is the uniform measure dσ/(4πr²) on the sphere S(x, r), so E[1/Rn+1] = A(xn, r) with f(x) = |x|^{-1}. Hence, if r < Rn, by the above calculation

E[1/Rn+1 | Fn] = 1/Rn (184)

since |x|^{-1} is harmonic inside B(xn, r), and therefore 1/Rn is a martingale so long as the jumps satisfy rn < Rn.


Before considering the case r > R_n we need to perform a calculation. For any ε > 0 and f(x) = |x|^{-1},

∫∫_{S(ε)} ∇f · n dσ = ∫∫_{S(ε)} −(x · x)/r⁴ dσ = −4πε²/ε² = −4π    (185)

since ∇|x|^{-1} = −x/r³ and n = x/r. Remarkably, the integral does not depend on ε. Thus for r > R_n, we can modify (183) to integrate over B(x, r) \ B(0, ε), and then take ε → 0. This gives the formula

d/dr A(x, r) = 0 if r < |x|;   −1/r² if r ≥ |x|    (186)

Thus this more complete analysis shows that 1/R_n is a supermartingale everywhere: integrating (186) gives A(x, r) = 1/max(r, |x|), so

E(1/R_{n+1} | F_n) = 1/max(r, R_n) ≤ 1/R_n    (187)

By the positive supermartingale maximal inequality (175),

Pr(inf_n R_n ≤ ρ) = Pr(sup_n 1/R_n ≥ 1/ρ) ≤ ρ/R_0    (188)

The probability on the left is the probability that the Enterprise ever gets into the Solar System, and it applies to any strategy (r_n) of jump-lengths.

To get a lower bound, consider the process with constant jump sizes r_n = ε for any 0 < ε < ρ, and also choose ρ' > R_0. Let τ = inf{n : R_n ≤ ρ} and τ' = inf{n : R_n ≥ ρ'}, and let t = τ ∧ τ'. First we show Pr(t < ∞) = 1, that is, we hit one barrier or the other eventually. Note that the x-coordinate of the Enterprise's location follows a random walk, since the x-projections of the jumps of length ε are i.i.d. Let X_n be the x-coordinate at time n. By the symmetry of each spherical jump, X_n is a martingale, so E X_n = X_0. Also, if J_n = X_n − X_{n-1}, note Var(J_n) = ε²σ₀² for some σ₀ > 0, since by the dilation symmetry of the spherical jump the variance scales as ε². Now

Pr(R_n ≥ ρ') ≥ Pr(|X_n| ≥ ρ') ≥ Pr(|X_n − X_0| > 2ρ')    (189)

However, by the central limit theorem

Pr(|X_n − X_0| > 2ρ') ≈ 2Φ(−2ρ'/(εσ₀√n)) → 2Φ(0) = 1 as n → ∞    (190)

We conclude

Pr(t = ∞) ≤ Pr(τ' = ∞) = Pr(sup_n R_n < ρ') ≤ inf_n Pr(R_n < ρ') = 0    (191)

Now let's consider the stopped process Z_n = 1/R_{t∧n}. This is a martingale, since our choice of ε < ρ implies r_n = ε < R_n for all n ≤ t. We can apply Doob's optional stopping theorem since

1/(ρ' + ε) ≤ Z_n ≤ 1/(ρ − ε)    (192)


and therefore Z_n is a bounded, positive martingale. We can break the expectation up into two parts, according as R hits the upper or the lower barrier first:

1/R_0 = E[Z_t] = E[Z_t; t = τ] + E[Z_t; t = τ'] ≤ Pr(t = τ)/(ρ − ε) + Pr(t = τ')/ρ'    (193)

where we've estimated the expectations by the simple upper bounds implied by

ρ − ε ≤ R_t ≤ ρ if t = τ;   ρ' ≤ R_t ≤ ρ' + ε if t = τ'    (194)

This gives the lower bound

Pr(t = τ) ≥ (1/R_0 − 1/ρ') / (1/(ρ−ε) − 1/ρ')    (195)

Now as ρ' → ∞, the event {t = τ} is essentially the event that the Enterprise returns to the Solar System: it says that R_n ≤ ρ occurs before R_n → ∞. So we can interpret this equation in the limit ρ' → ∞ to say

Pr(Enterprise returns) ≥ (ρ − ε)/R_0    (196)

10.11 Star Trek 2, Captain's Log. "Mr Spock and Chief Engineer Scott have modified the control system so that the Enterprise is confined to move for ever in a fixed plane passing through the Sun. However, the next 'hop-length' is now automatically set to be the current distance to the Sun ('next' and 'current' being updated in the obvious way). Spock is muttering something about logarithms and random walks, but I wonder whether it is (almost) certain that we will get into the Solar System sometime."

Let f(x) = log|x| = ½ log(x₁² + x₂²). Then

∂f/∂x_i = x_i/r²,   ∂²f/∂x_i² = 1/r² − 2x_i²/r⁴,   ∇²f = 2/r² − 2r²/r⁴ = 0    (197)

so this function is harmonic for x ≠ 0. Thus log R_n is a martingale, so X_n = log R_n − log R_{n-1} has mean zero. Note that the law of R_n is uniquely determined by the value of R_{n-1}. However, the law of X_n is independent of the value of R_{n-1}, because

X_n = log R_n − log R_{n-1} = log(αR_n) − log(αR_{n-1}) for any α > 0    (198)

and we can choose α = 1/R_{n-1}. Therefore we conclude the sequence X_1, X_2, . . . is i.i.d. Furthermore, as we'll show later, E X_n² < ∞, so the random variables have finite variance σ². Consider

S_n = X_1 + X_2 + · · · + X_n    (199)


For any real number α > 0,

Pr(inf_n S_n = −∞) ≥ Pr(S_n ≤ −ασ√n i.o.)    (200)

since if the event on the right-hand side happens, the event on the left-hand side certainly does: S_n takes on arbitrarily negative values if it is less than −ασ√n infinitely often.

Let E_k be a sequence of events. Then the event that the E_k occur infinitely often is given by

E_{i.o.} = ⋂_{n∈N} ⋃_{k≥n} E_k    (201)

This implies that

Pr(E_{i.o.}) = lim_{n→∞} Pr(⋃_{k≥n} E_k) = lim sup_{n→∞} Pr(⋃_{k≥n} E_k)    (202)

The limit exists because the terms are non-increasing, and it equals the lim sup. Since Pr(⋃_{k≥n} E_k) ≥ Pr(E_n), a lower bound is given by

Pr(E_{i.o.}) ≥ lim sup_{n→∞} Pr(E_n)    (203)

By the central limit theorem

lim_{n→∞} Pr(S_n/√n ≤ −ασ) = Φ(−α) > 0    (204)

and therefore Pr(inf_n S_n = −∞) > 0. This implies that Pr(inf_n S_n = −∞) = 1 by Kolmogorov's 0-1 law, since, as we will show, this event is in the tail field.

There are two loose ends to tie up. First we demonstrate that the event E = {inf S_n = −∞} is in the tail field. On E we may select a subsequence n_k such that S_{n_k} ≤ −k. Now replace X_1, . . . , X_m by X'_1, . . . , X'_m, and let

S'_n = X'_1 + X'_2 + · · · + X'_n if n ≤ m;   X'_1 + · · · + X'_m + X_{m+1} + · · · + X_n if n > m    (205)

If D = ∑_{i=1}^m (X'_i − X_i), then for n > m we have S'_n = S_n + D. From this it is clear that S'_{n_k} ≤ −k + D → −∞, so inf S'_n = −∞ as well. Therefore E does not depend on the values of X_1, . . . , X_m, i.e. E ∈ σ(X_{m+1}, X_{m+2}, . . . ). Since this is true for any m, E is in the tail field.

Finally we show that σ² = Var(X_i) < ∞, by showing E X_i² < ∞. Without loss of generality, assume R_{n-1} = 1, and compute the pdf of R = R_n. Let Θ be the direction the Enterprise travels, with Θ = 0 corresponding to the direction toward the center of the Solar System. By hypothesis Θ ∼ U(0, 2π). Now R is the base of an isosceles triangle with equal sides 1 and opposite angle Θ, so

R = 2 sin(Θ/2)  ⇒  p_R(r) = 2/(π√(4 − r²))    (206)


where p_R(r) is the pdf of R. Thus we want to compute

E(log R)² = ∫_0^2 (2/π) (log r)²/√(4 − r²) dr = ∫_{−∞}^{log 2} (2/π) u² e^u/√(4 − e^{2u}) du    (207)

where we substituted u = log r. The left tail, where r is close to zero, corresponds to the transformed integral where u is close to −∞; there the integrand is comparable to u² e^u, so the integral resembles the tail of Γ(3) = ∫_0^∞ u² e^{−u} du and is therefore finite. On the right tail, as r → 2, the integrand resembles a constant times 1/√(4 − r²), which has antiderivative arcsin(r/2) and therefore finite integral on [2 − ε, 2]. On the intermediate range [ε, 2 − ε] the integrand is continuous and hence bounded, so the integral is finite there too. Thus E(log R)² < ∞.

12.1 Branching process. A branching process Z = {Z_n : n ≥ 0} is constructed in the usual way. Thus, a family {X_k^{(n)} : n, k ≥ 0} of i.i.d. Z⁺-valued random variables is supposed given. We define Z_0 = 1 and recursively

Z_{n+1} = X_1^{(n+1)} + · · · + X_{Z_n}^{(n+1)}    (208)

Assume that if X denotes any of the X_k^{(n)} then

μ = E X < ∞ and 0 < σ² = Var(X) < ∞    (209)

Prove that M_n = Z_n/μ^n defines a martingale M relative to the filtration F_n = σ(Z_0, Z_1, . . . , Z_n). Show that

E(Z²_{n+1} | F_n) = μ² Z²_n + σ² Z_n    (210)

and deduce that M is bounded in L² iff μ > 1. Show that when μ > 1,

Var(M_∞) = σ²/(μ(μ − 1))    (211)

Note that since E(X_k^{(n+1)} | F_n) = E X_k^{(n+1)} = μ and since Z_n is F_n-measurable,

E(Z_{n+1} | F_n) = ∑_{k=1}^{Z_n} E(X_k^{(n+1)} | F_n) = Z_n μ    (212)

From this it immediately follows that M_n is a martingale:

E(M_{n+1} | F_n) = E(Z_{n+1}/μ^{n+1} | F_n) = Z_n/μ^n = M_n    (213)

Similarly, since Var(X_k^{(n+1)} | F_n) = σ²,

Var(Z_{n+1} | F_n) = ∑_{k=1}^{Z_n} Var(X_k^{(n+1)} | F_n) = Z_n σ²    (214)


But by definition, Var(Z_{n+1} | F_n) = E(Z²_{n+1} | F_n) − (E(Z_{n+1} | F_n))², so

E(Z²_{n+1} | F_n) = (E(Z_{n+1} | F_n))² + Z_n σ² = Z²_n μ² + Z_n σ²    (215)

From this we get

E(M²_{n+1} | F_n) = μ² Z²_n/μ^{2n+2} + Z_n σ²/μ^{2n+2} = M²_n + M_n σ²/μ^{n+2}    (216)

Thus

b_{n+1} := E(M²_{n+1} − M²_n) = (σ²/μ^{n+2}) E M_n = σ²/μ^{n+2}    (217)

since E M_n = M_0 = 1. Note (b_n) is a geometric sequence, and therefore ∑_n b_n < ∞ iff μ > 1. By the theorem in section 12.1, M is bounded in L² iff ∑_n b_n < ∞, and in that case it converges almost surely to some random variable M_∞ with E M²_∞ = ∑_n b_n + M²_0. When μ > 1 we can sum the series:

Var(M_∞) = E M²_∞ − (E M_∞)² = (σ²/μ²)/(1 − μ^{−1}) = σ²/(μ(μ − 1))    (218)

since E M_∞ = M_0 = 1.
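A Monte Carlo sketch of (211), assuming numpy, with Poisson(μ) offspring as an arbitrary choice (so σ² = μ, and for μ = 2 the predicted Var(M_∞) = σ²/(μ(μ−1)) = 1):

    import numpy as np

    rng = np.random.default_rng(4)
    mu, gens, paths = 2.0, 25, 200_000

    z = np.ones(paths, dtype=np.int64)
    for _ in range(gens):
        z = rng.poisson(mu * z)          # sum of z i.i.d. Poisson(mu) offspring
    m = z / mu**gens                     # M_n = Z_n / mu^n, nearly M_inf
    print(m.var(), mu / (mu * (mu - 1))) # empirical vs predicted variance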

12.2 Let E_1, E_2, . . . be independent events with Pr(E_n) = 1/n. Let Y_k = I_{E_k}. Prove that ∑(Y_k − 1/k)/log k converges a.s. and use Kronecker's Lemma to deduce that

N_n/log n → 1 a.s.    (219)

where N_n = Y_1 + · · · + Y_n.

First note E Y_k = Pr(E_k) = 1/k and

Var Y_k ≤ E Y²_k = E I²_{E_k} = E I_{E_k} = Pr(E_k) = 1/k    (220)

Consider the random variables Z_k = (Y_k − k^{−1})/log k (for k ≥ 2). Note that E Z_k = 0 and

σ²_k = Var Z_k = Var(Y_k)/(log k)² ≤ 1/(k (log k)²)    (221)

Now ∫ dx/(x (log x)²) = −1/log x, so by the integral test ∑_k σ²_k < ∞. Also the Z_k are independent, so by theorem 12.2,

∑_k Z_k = ∑_k (Y_k − 1/k)/log k    (222)

converges almost surely. By Kronecker's lemma, this means

(∑_{k=1}^n (Y_k − k^{−1}))/log n = (N_n − H_n)/log n → 0 a.s.    (223)


where N_n = Y_1 + · · · + Y_n and H_n = 1 + 1/2 + · · · + 1/n is the harmonic series. Now H_n − log n → γ, where γ is the Euler-Mascheroni constant; in particular H_n/log n → 1. Therefore

N_n/log n → 1 a.s.    (224)

Note that the record events of 4.3 satisfy the hypothesis of this problem: for X_1, X_2, . . . drawn i.i.d. from a continuous distribution, the event that there is a "record" at time k has probability 1/k, and in 4.3 we showed these events are independent.

12.3 Star Trek 3. Prove that if the strategy in 10.11 is (in the obvious sense) employed, and for ever, in R³ rather than in R², then

∑ R_n^{−2} < ∞ a.s.    (225)

where R_n is the distance from the Enterprise to the Sun at time n.

Let E_n be the event that the Enterprise returns to the Solar System for the first time at time n. The area of a cap on a sphere of radius R with central angle θ is 2πR²(1 − cos θ), and therefore the probability of a jump landing in that cap is ½(1 − cos θ). The area formula can be seen by integrating the surface area element R² sin θ dθ dφ, or by using Archimedes' observation that the projection of a sphere onto an enclosing cylinder preserves surface area. For the jump strategy described in this problem, the radius of the jump sphere for jump k is R_{k-1}, the distance from the Sun. The probability that we enter the Solar System corresponds to the cap whose edge is at distance ρ from the Sun: there is an isosceles triangle with equal sides R_{k-1} and base ρ, with angle θ opposite the base, and by the law of cosines ρ² = 2R²_{k-1} − 2R²_{k-1} cos θ, i.e. ½(1 − cos θ) = ρ²/(4R²_{k-1}). Let T be the stopping time at which the Enterprise returns to the Solar System, T = inf{n : R_n ≤ ρ}. This implies

ξ_k = Pr(E_k | F_{k-1}) = ρ²/(4R²_{k-1}) if k ≤ T;  0 otherwise    (226)

Using the notation of theorem 12.15, Z_n = ∑_{k≤n} I_{E_k} ≤ 1 for all n. Let Y_∞ = ∑_{k=1}^∞ ξ_k = ∑_{k=1}^T ρ²/(4R²_{k-1}). If Y_∞ = ∞ then 12.15(b) implies that lim_{n→∞} Z_n/Y_n = 1, but this is a contradiction because Z_n is bounded. Therefore Y_∞ < ∞ almost surely.

TODO: to clinch the argument I want to take ρ → 0 and T → ∞, but the sum also gets bigger in this limit, since it includes the terms where R_n is really close to 0.

13.1 Prove that a class C of RVs is UI if and only if both of the following conditions hold:

(i) C is bounded in L¹, so that A = sup{E(|X|) : X ∈ C} < ∞

(ii) for every ε > 0, ∃δ > 0 such that if F ∈ F, Pr(F) < δ and X ∈ C, then E(|X|; F) < ε


Assume the conditions are true. Let

B_K = sup_{X∈C} Pr(|X| > K)    (227)

Now for any X ∈ C, Markov's inequality implies

Pr(|X| > K) ≤ E|X|/K ≤ A/K    (228)

Taking the supremum over all X ∈ C, this shows B_K ≤ A/K, so B_K → 0 as K → ∞. For any ε > 0, there is a δ satisfying condition (ii). Take K large enough that B_K < δ. From the definition of B_K, all of the sets F = {|X| > K} satisfy Pr(F) < δ, and hence condition (ii) implies E(|X|; |X| > K) < ε for each X ∈ C. This verifies that C is UI.

Conversely, assume C is UI. For any K > 0 and F ∈ F, note

I_F ≤ I_F (I_{|X|>K} + I_{|X|≤K}) ≤ I_{|X|>K} + I_{|X|≤K} I_F    (229)

Multiplying both sides by |X| and taking expectations gives

E(|X|; F) ≤ E(|X|; |X| > K) + E(|X|; F ∩ {|X| ≤ K}) ≤ E(|X|; |X| > K) + K Pr(F)    (230)

Hence, given ε > 0, choose K such that E(|X|; |X| > K) < ε/2 for all X ∈ C, and then choose δ = ε/(2K). Equation (230) shows that for this choice of δ, condition (ii) is satisfied. Furthermore, with F = Ω and K chosen so that E(|X|; |X| > K) ≤ 1, (230) gives the uniform bound 1 + K for E(|X|), so condition (i) is satisfied.

13.2 Prove that if C and D are UI classes of RVs, and if we define

C + D = {X + Y : X ∈ C, Y ∈ D}    (231)

then C + D is UI.

We'll verify the conditions in 13.1. First note that for Z_0 ∈ C + D with Z_0 = X_0 + Y_0,

E|Z_0| ≤ E|X_0| + E|Y_0| ≤ sup_{X∈C} E|X| + sup_{Y∈D} E|Y|    (232)

The bound on the right-hand side is independent of Z_0, so it is a bound for sup_{Z∈C+D} E|Z|, and condition (i) is satisfied.

For ε > 0, choose δ such that for any X ∈ C, Y ∈ D and F ∈ F with Pr(F) < δ we have E(|X|; F) < ε/2 and E(|Y|; F) < ε/2 (take δ to be the minimum of the δ's needed for C and D alone). Then for Z = X + Y,

E(|Z|; F) ≤ E(|X|; F) + E(|Y|; F) < ε/2 + ε/2 = ε    (233)

Thus condition (ii) is satisfied for C + D.


13.3 Let C be a UI family of RVs. Say that Y ∈ D if for some X ∈ C and some sub-σ-algebra G of F we have Y = E(X | G) a.s. Prove that D is UI.

Let G ∈ G and Y = E(X | G). Then

E(|Y|; G) = E(|E(X | G)| I_G) = E(|E(X I_G | G)|) ≤ E(E(|X I_G| | G)) = E(|X| I_G) = E(|X|; G)    (234)

The inequality in the middle is the conditional Jensen inequality for the convex function x ↦ |x|. When G = Ω, this gives the bound E(|Y|) ≤ E(|X|), so A = sup_{X∈C} E(|X|) is a bound for sup_{Y∈D} E(|Y|), and condition (i) of 13.1 is satisfied. Given ε > 0, choose δ > 0 to satisfy condition (ii) of 13.1 for C. Then for any G ∈ G ⊂ F with Pr(G) < δ we have, for any Y ∈ D,

E(|Y|; G) ≤ E(|X|; G) < ε    (235)

This suffices for uniform integrability, since the sets {|Y| > K} needed in the UI definition lie in G, and Pr(|Y| > K) ≤ A/K < δ for K large, uniformly in Y. Thus D is UI.

14.1 Hunt's lemma. Suppose that (X_n) is a sequence of RVs such that X = lim X_n exists a.s. and that (X_n) is dominated by Y ∈ (L¹)⁺:

|X_n(ω)| ≤ Y(ω) ∀(n, ω), and E(Y) < ∞    (236)

Let (F_n) be any filtration. Prove that

E(X_n | F_n) → E(X | F_∞) a.s.    (237)

Let Z_m = sup_{r≥m} |X_r − X|. This tends to 0 because

lim_n X_n = X ⇒ lim_n |X_n − X| = 0 ⇒ lim sup_n |X_n − X| = 0    (238)

and the last limit is the same as lim_m Z_m = 0. Furthermore Z_m → 0 in L¹ by dominated convergence: |X| = lim_n |X_n| ≤ Y, so Z_m = sup_{r≥m} |X_r − X| ≤ 2Y is dominated by an L¹ function. Therefore E Z_m → 0.

Let's turn to E(X_n | F_n). Note that for any m ≤ n,

|E(X_n | F_n) − E(X | F_∞)| ≤ |E(X | F_n) − E(X | F_∞)| + |E(X_n | F_n) − E(X | F_n)|
 ≤ |E(X | F_n) − E(X | F_∞)| + E(|X_n − X| | F_n)
 ≤ |E(X | F_n) − E(X | F_∞)| + E(Z_m | F_n)
 ≤ |E(X | F_n) − E(X | F_∞)| + |E(Z_m | F_n) − E(Z_m | F_∞)| + E(Z_m | F_∞)    (239)

By Doob's upward convergence theorem, E(X | F_n) → E(X | F_∞) and E(Z_m | F_n) → E(Z_m | F_∞) in L¹ and almost surely, so the first two terms vanish as n → ∞ with m fixed. By the tower property and positivity,

E|E(Z_m | F_∞)| = E Z_m → 0    (240)

so the last term converges to 0 in L¹ as m → ∞. Furthermore, by conditional dominated convergence, E(Z_m | F_∞) → E(0 | F_∞) = 0 a.s., so the last term also converges to 0 almost surely. Therefore E(X_n | F_n) → E(X | F_∞) in L¹ and almost surely.


14.2 Azuma-Hoeffding.

(a) Show that if Y is a RV with values in [−c, c] and E Y = 0, then for θ ∈ R

E e^{θY} ≤ cosh θc ≤ exp(½θ²c²)    (241)

(b) Prove that if M is a martingale null at 0 such that for some sequence (c_n : n ∈ N) of positive constants

|M_n − M_{n-1}| ≤ c_n ∀n    (242)

then, for x > 0,

Pr(sup_{k≤n} M_k ≥ x) ≤ exp(−½ x² / ∑_{k=1}^n c²_k)    (243)

(a) Define α so that Y is the convex combination of c and −c:

α = (Y + c)/(2c) ⇒ Y = cα − c(1 − α)    (244)

Since |Y| ≤ c we have α ∈ [0, 1]. By the convexity of the exponential, and since E α = ½ (because E Y = 0),

E e^{θY} = E e^{θcα − θc(1−α)} ≤ e^{θc} E α + e^{−θc} E(1 − α) = ½(e^{θc} + e^{−θc}) = cosh θc    (245)

Now compare the Taylor series of cosh x and exp(½x²):

cosh x = ∑_k x^{2k}/(2k)!,   exp(½x²) = ∑_k x^{2k}/(2^k k!)    (246)

exp(½x²) dominates cosh x term by term, since

(2k)! = 2k · (2k−1) · · · 2 · 1 ≥ 2k · (2k−2) · · · 2 = 2^k k!    (247)

Thus we get the second inequality, cosh θc ≤ e^{½θ²c²}.

(b) By the conditional Jensen inequality, e^{θM_n} is a submartingale for θ > 0. By Doob's submartingale inequality

Pr(sup_{k≤n} e^{θM_k} ≥ e^{θx}) ≤ e^{−θx} E e^{θM_n}    (248)

Using the tower property and the inequality from (a) applied conditionally to M_n − M_{n-1},

E e^{θM_n} = E(e^{θM_{n-1}} E(e^{θ(M_n − M_{n-1})} | F_{n-1})) ≤ e^{½θ²c²_n} E e^{θM_{n-1}}    (249)


Thus by induction E e^{θM_n} ≤ e^{½θ²(c²_1 + · · · + c²_n)} E e^{θM_0} = exp(½θ² ∑_k c²_k), since M_0 = 0. If we choose θ to minimize exp(−θx + ½θ² ∑_k c²_k) we get θ = x/∑_k c²_k. Plugging this into (248), where we can rewrite the LHS because y ↦ e^{θy} is monotonically increasing,

Pr(sup_{k≤n} M_k ≥ x) ≤ exp(−½ x² / ∑_{k=1}^n c²_k)    (250)

16.1 Prove that

lim_{T↑∞} ∫_0^T x^{−1} sin x dx = π/2    (251)

Define

f(λ) = L[x^{−1} sin x](λ) = ∫_0^∞ e^{−λx} (sin x)/x dx    (252)

Then by (97), f′(λ) = −L[sin x](λ). We can compute this handily by integrating by parts twice:

f′(λ) = −∫_0^∞ e^{−λx} sin x dx    (253)
 = [cos x e^{−λx}]_0^∞ + λ ∫_0^∞ e^{−λx} cos x dx    (254)
 = −1 + λ ([sin x e^{−λx}]_0^∞ + λ ∫_0^∞ e^{−λx} sin x dx)    (255)
 = −1 − λ² f′(λ)    (256)

so f′(λ) = −1/(1 + λ²). From the dominated convergence theorem, f(λ) → 0 as λ → ∞. Thus we integrate back from ∞ to find

f(λ) = ∫_λ^∞ dt/(1 + t²) = π/2 − arctan(λ)    (257)

Set λ = 0 to find ∫_0^∞ (sin x)/x dx = π/2.

A second solution uses complex analysis. Consider the contour γ in the complex plane along the real line from −T to −ε, then a semicircle γ_ε centered at 0 from −ε to ε, then along the real line from ε to T, then a semicircle centered at 0 from T to −T. The integrand e^{iz}/z is holomorphic inside the region bounded by γ, so by Cauchy's theorem ∮_γ e^{iz}/z dz = 0; and since the residue at 0 is 1 and γ_ε subtends π radians traversed clockwise, ∫_{γ_ε} → −iπ as ε → 0. Hence

lim_{ε→0} ∫_{γ \ γ_ε} e^{iz}/z dz = iπ    (258)

On the semicircle of radius T, writing z = Te^{iθ}, we have dz = iTe^{iθ} dθ and

|∫_0^π (exp(iTe^{iθ})/(Te^{iθ})) iTe^{iθ} dθ| ≤ ∫_0^π |e^{iT(cos θ + i sin θ)}| dθ = ∫_0^π e^{−T sin θ} dθ    (259)


The integrand is dominated by 1 and tends to 0 for θ ∈ (0, π), so by the dominated convergence theorem the integral converges to 0 as T → ∞. Therefore as T → ∞ and ε → 0 we have

lim_{T→∞, ε→0} ∫_{γ \ γ_ε} e^{iz}/z dz = PV ∫_{−∞}^∞ e^{ix}/x dx    (260)

and we conclude this integral equals iπ. Making the substitution x → −x, we conclude

PV ∫_{−∞}^∞ e^{−ix}/x dx = −iπ    (261)

Using the fact that sin x = (e^{ix} − e^{−ix})/(2i), this implies

∫_{−∞}^∞ (sin x)/x dx = π  ⇒  ∫_0^∞ (sin x)/x dx = π/2    (262)

since the integrand on the left is even.
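A numerical check of (251), assuming scipy; integrating one period at a time tames the oscillation (np.sinc(x/π) = sin(x)/x, well defined at 0):

    import numpy as np
    from scipy.integrate import quad

    val = sum(quad(lambda x: np.sinc(x / np.pi),
                   2 * np.pi * k, 2 * np.pi * (k + 1))[0]
              for k in range(200))
    print(val, np.pi / 2)   # slow O(1/T) convergence, but visibly close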

16.2 Prove that if Z has the U[−1, 1] distribution, then

φ_Z(θ) = sin θ/θ    (263)

Show there do not exist i.i.d. RVs X and Y such that

X − Y ∼ U[−1, 1]    (264)

Computing,

φ_Z(θ) = E e^{iθZ} = ∫_{−1}^1 e^{iθx} (1/2) dx = e^{iθx}/(2iθ) |_{−1}^1 = sin θ/θ    (265)

For X and Y i.i.d. and Z = X − Y,

φ_Z(θ) = E e^{iθ(X−Y)} = E e^{iθX} e^{−iθY} = E e^{iθX} E e^{−iθY} = φ_X(θ) φ_X(−θ)    (266)

where we used the property that E f(X)g(Y) = E f(X) E g(Y) for independent X and Y. However, since φ_X(−θ) is the complex conjugate of φ_X(θ), in order for Z = X − Y to satisfy Z ∼ U[−1, 1] it must be that

|φ_X(θ)|² = sin θ/θ    (267)

But this is impossible, since the left-hand side is non-negative whereas the right-hand side is sometimes negative.

16.3 Calculate φ_X(θ) where X has the Cauchy distribution, which has pdf 1/(π(1 + x²)). Show that the Cauchy distribution is stable, i.e., that (X_1 + X_2 + · · · + X_n)/n ∼ X for i.i.d. Cauchy X_i.


First assume θ > 0. Consider the contour γ which goes from −R to R along the real line, and then along a semicircle centered at 0 from R to −R. By Cauchy's theorem

∫_γ e^{iθz}/(1 + z²) dz = 2πi e^{−θ}/(2i) = πe^{−θ}    (268)

since there is exactly one pole at z = i inside γ (the denominator factors as (z − i)(z + i)) and the residue there is e^{−θ}/(2i). Along the semicircle we can use the elementary bound

|∫_{γ'} e^{iθz}/(1 + z²) dz| ≤ πR/(R² − 1) → 0    (269)

since for z = Re^{iψ} the magnitude |exp(iθRe^{iψ})| = e^{−Rθ sin ψ} ≤ 1 (this is where we use the hypothesis θ > 0 and that the path is a semicircle in the upper half plane), and |1 + z²| ≥ |z|² − 1 = R² − 1; thus the integrand is bounded by 1/(R² − 1) and the length of the path is πR. In the limit only the segment along the real line contributes to the path integral, and we conclude that for θ > 0

φ_X(θ) = (1/π) ∫_{−∞}^∞ e^{iθx}/(1 + x²) dx = e^{−θ} if θ ≥ 0    (270)

For θ < 0 we can modify the above argument, using a semicircle in the lower half plane from R to −R instead. Then in Cauchy's theorem we pick up the residue at −i (instead of i), going along a clockwise path (instead of counterclockwise). Thus the integral equals πe^{θ}. We get the same bounds showing that the contribution of the semicircular segment tends to 0. Thus

φ_X(θ) = (1/π) ∫_{−∞}^∞ e^{iθx}/(1 + x²) dx = e^{θ} if θ ≤ 0    (271)

Both cases may be summarized by the formula φ_X(θ) = e^{−|θ|}. From this it immediately follows that X is stable, since

φ_{S_n/n}(θ) = φ_X(θ/n)^n = e^{−n|θ/n|} = e^{−|θ|}    (272)

By the uniqueness of characteristic functions, S_n/n is distributed like a standard Cauchy random variable.

All that remains is to justify φ_{S_n/n}(θ) = φ_X(θ/n)^n. This follows from the following calculation for i.i.d. X_k:

φ_{S_n/n}(θ) = E e^{iθ(X_1 + · · · + X_n)/n} = (E e^{iθX_1/n})(E e^{iθX_2/n}) · · · (E e^{iθX_n/n}) = φ_X(θ/n)^n    (273)
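A simulation sketch of the stability property (272), assuming numpy:

    import numpy as np

    rng = np.random.default_rng(7)
    means = rng.standard_cauchy(size=(100_000, 50)).mean(axis=1)
    # compare sample quantiles with the standard Cauchy quantiles tan(pi(p - 1/2))
    for p in (0.25, 0.5, 0.9):
        print(np.quantile(means, p), np.tan(np.pi * (p - 0.5)))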

16.4 Suppose that X ∼ N(0, 1). Show that φ_X(θ) = exp(−½θ²).


Note that

φ_X(θ) = E e^{iθX} = (1/√(2π)) ∫_{−∞}^∞ e^{iθx} e^{−x²/2} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−(x−iθ)²/2 − θ²/2} dx = e^{−θ²/2} I(iθ)    (274)

where I(α) = (1/√(2π)) ∫_{−∞}^∞ e^{−(x−α)²/2} dx. For a real parameter α, I(α) = I(0) = 1, since we can just perform the change of variables u = x − α, and the range of integration is unchanged.

In the complex case, consider a path integral in C along the rectangle with corners at ±R and ±R − iθ of the function f(z) = (1/√(2π)) e^{−z²/2}. Since f(z) is entire, the path integral is 0. As R → ∞, the integral along the bottom side from −R − iθ to R − iθ tends to I(iθ). The integral along the top side from R to −R tends to −I(0) (since the limits of the integral are reversed). Along the vertical sides of the rectangle, z = ±R − ix for x ∈ [0, θ], so the magnitude of the integrand satisfies

|(1/√(2π)) e^{−(±R−ix)²/2}| = (1/√(2π)) |e^{−(R² − x² ∓ 2Rxi)/2}| ≤ (1/√(2π)) e^{−(R² − θ²)/2} → 0    (275)

The path along each vertical side has constant length |θ| as R → ∞, so the contribution to the path integral from the left and right sides of the rectangle is negligible as R → ∞, by the simple bound of the path length times the maximum integrand magnitude. We conclude

0 = ∫_γ f(z) dz → I(iθ) − I(0)    (276)

Thus I(iθ) = I(0) = 1 and φ_X(θ) = e^{−θ²/2}.

16.5 Prove that if φ is the characteristic function of a RV X, then φ is non-negative definite, in that for c_1, c_2, . . . , c_n ∈ C and θ_1, θ_2, . . . , θ_n ∈ R

∑_{j,k} c_j c̄_k φ(θ_j − θ_k) ≥ 0    (277)

Consider the RV Z = ∑_k c_k e^{iθ_k X}. By the positivity of expectations,

E|Z|² ≥ 0    (278)

Since

|Z|² = Z Z̄ = (∑_j c_j e^{iθ_j X})(∑_k c̄_k e^{−iθ_k X}) = ∑_{j,k} c_j c̄_k e^{i(θ_j − θ_k)X}    (279)

taking expectations of both sides and using linearity yields (277).


16.6

(a) Let (Ω, F, Pr) = ([0, 1], B[0, 1], Leb). What is the distribution of the random variable Z = 2ω − 1? Let ω = ∑ 2^{−n} R_n(ω) be the binary expansion of ω. Let

U(ω) = ∑_{odd n} 2^{−n} Q_n(ω) where Q_n(ω) = 2R_n(ω) − 1    (280)

Find a random variable V independent of U such that U and V are identically distributed and U + ½V is uniformly distributed on [−1, 1].

(b) Now suppose that (on some probability triple) X and Y are i.i.d. RVs such that

X + ½Y is uniformly distributed on [−1, 1]    (281)

Let φ be the characteristic function of X. Calculate φ(θ)/φ(¼θ). Show that the distribution of X must be the same as that of U in part (a), and deduce that there exists a set F ∈ B[−1, 1] such that Leb(F) = 0 and Pr(X ∈ F) = 1.

(a) The Lebesgue measure gives the uniform distribution on [0, 1]. Under the linear transform Z = 2ω − 1 the distribution remains uniform, but over [−1, 1]. To see this, note that for any z ∈ [−1, 1]

Pr(Z ≤ z) = Pr(ω ≤ ½(z + 1)) = Leb(0, ½(z + 1)) = ½(z − (−1))    (282)

which is exactly the distribution function of U[−1, 1]. Let V be given by an expression similar to U:

V(ω) = ∑_{odd n} 2^{−n} (2R_{n+1}(ω) − 1) = ∑_{even n} 2^{−n+1} (2R_n(ω) − 1)    (283)

the difference being that the nth term uses the (n+1)st Rademacher function R_{n+1} rather than R_n. It is a standard result that the Rademacher functions R_n(ω) are i.i.d. The variables U and V are functions of disjoint collections of the R_n, so they are independent; and because their expressions in terms of identically distributed Rademacher functions are the same, they have the same distribution. It is also clear that

U + ½V = ∑_{odd n} 2^{−n}(2R_n(ω) − 1) + ∑_{even n} 2^{−n}(2R_n(ω) − 1) = 2(∑_{n∈N} 2^{−n} R_n(ω)) − 1 = 2ω − 1    (284)

Thus U + ½V ∼ U[−1, 1].

(b) From the assumption that Z = X + ½Y has Z ∼ U[−1, 1] we get

φ_X(θ) φ_X(½θ) = ∫_{−1}^1 e^{iθu} du/2 = sin θ/θ    (285)


From this we deduce φ_X(½θ) φ_X(¼θ) = sin(½θ)/(½θ), and hence

φ_X(θ)/φ_X(¼θ) = (φ_X(θ) φ_X(½θ))/(φ_X(½θ) φ_X(¼θ)) = sin θ/(2 sin(½θ)) = cos(½θ)    (286)

Hence, by induction,

φ_X(θ)/φ_X(4^{−k}θ) = (φ_X(θ)/φ_X(¼θ)) (φ_X(¼θ)/φ_X((1/16)θ)) · · · (φ_X(4^{−k+1}θ)/φ_X(4^{−k}θ)) = cos(½θ) cos(⅛θ) · · · cos(2^{−2k+1}θ)    (287)

Since φ is continuous and φ(0) = 1, we conclude by taking the limit k → ∞ that

φ(θ) = ∏_{odd k} cos(2^{−k}θ)    (288)

On the other hand,

φ_{Q_n}(θ) = E e^{iθQ_n} = ½e^{iθ} + ½e^{−iθ} = cos(θ)    (289)

Therefore, since U = ∑_{odd k} 2^{−k} Q_k, we have

φ_U(θ) = ∏_{odd k} cos(2^{−k}θ) = φ_X(θ)    (290)

So it must be that U and X have the same distribution. TODO: prove the measure-zero claim. It is not hard to show that the values of U lie in a Cantor-like set of Lebesgue measure zero; however, this approach doesn't use the characteristic function at all... is there some clever thing with Fourier analysis?
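A numerical check of (284) and (290), assuming numpy; 40 binary digits stand in for the full expansion:

    import numpy as np

    rng = np.random.default_rng(8)
    bits = rng.integers(0, 2, size=(200_000, 40)) * 2 - 1  # Q_1..Q_40 = +/-1
    w = 2.0 ** -np.arange(1, 41)
    u = (bits[:, ::2] * w[::2]).sum(axis=1)    # weights 2^{-n}, odd n
    v = (bits[:, 1::2] * w[::2]).sum(axis=1)   # independent copy of U
    print(np.sort(u + v / 2)[::40_000])        # roughly evenly spread on [-1,1]
    theta = 3.0
    # empirical CF of U vs the (truncated) cosine product (290)
    print(np.mean(np.cos(theta * u)), np.prod(np.cos(theta * w[::2])))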

18.1

(a) Suppose that λ > 0 and that (for n > λ) F_n is the DF associated with the binomial distribution Bin(n, λ/n). Prove using CFs that F_n converges weakly to F, where F is the DF of the Poisson distribution with parameter λ.

(b) Suppose that X_1, X_2, . . . are i.i.d. RVs each with the density function (1 − cos x)/(πx²) on R. Prove that for x ∈ R

lim_{n→∞} Pr((X_1 + X_2 + · · · + X_n)/n ≤ x) = ½ + π^{−1} arctan x    (291)

where arctan ∈ (−π/2, π/2).

Consider the Bernoulli random variable Z with parameter p:

φ_Z(θ) = E e^{iθZ} = pe^{iθ·1} + qe^{iθ·0} = p(e^{iθ} − 1) + 1    (292)


(a) Since Bin(n, p) is the sum of n independent Bernoulli random variables, if Z_n ∼ Bin(n, λ/n) has DF F_n then

φ_{Z_n}(θ) = (pe^{iθ} + q)^n = (1 + (λ/n)(e^{iθ} − 1))^n → exp(λ(e^{iθ} − 1))    (293)

On the other hand, if Z has a Poisson distribution with parameter λ, its CF is given by

φ_Z(θ) = E e^{iθZ} = ∑_{k=0}^∞ e^{iθk} e^{−λ} λ^k/k! = exp(λe^{iθ} − λ)    (294)

which is exactly lim_{n→∞} φ_{Z_n}(θ). Thus we conclude that F_n → F weakly.

(b) Let's compute

φ_X(θ) = E e^{iθX} = ∫_R e^{iθx} (1 − cos x)/(πx²) dx    (295)

To this end, first let's compute, for k ∈ N and α > 0,

I_k(α) = ∫_R x^{−k} e^{iαx} dx    (296)

(Aside: there is something subtle going on here because, like the integral in 16.1, this is not integrable in the L¹ sense; the integrals are understood as principal values.) As in problem 16.1, take a contour along the real axis from −R to −ε, a semicircle in the upper half plane from −ε to ε, a line from ε to R, and then a semicircle in the upper half plane back to −R. We use a variation of the residue theorem: for an infinitesimal arc of Δθ radians around z_0, ∫_{γ'} f(z)/(z − z_0)^k dz → iΔθ f^{(k−1)}(z_0)/(k − 1)!. This gives

∮_γ e^{iαz}/z^k dz = πi (iα)^{k−1}/(k − 1)!    (297)

Using the technique of 16.1, we get an estimate of the integral along the big semicircle:

|∫_{γ''} e^{iαz}/z^k dz| ≤ ∫_0^π e^{−Rα sin θ}/R^{k−1} dθ → 0 as R → ∞    (298)

(since |z^{−k}| ≤ R^{−k} there, the estimate of 16.1 carries over). Therefore as R → ∞ and ε → 0 we get

∮_γ e^{iαz}/z^k dz → I_k(α)    (299)

When α < 0 we can modify the argument by taking semicircles in the lower half plane. This again shows the integral vanishes along the big semicircle, but now the sense of integration along the small semicircle is reversed, from counterclockwise to clockwise. We can summarize both cases as

I_k(α) = π(iα)^k / ((k − 1)! |α|)    (300)


This gives I_1(α) = πi sgn(α) and I_2(α) = −π|α|. Expressing cos x = ½(e^{ix} + e^{−ix}), our CF can be written

φ_X(θ) = (1/π)(I_2(θ) − ½(I_2(θ + 1) + I_2(θ − 1))) = ½(|θ + 1| + |θ − 1|) − |θ|
 = 1 − |θ| if |θ| ≤ 1;  0 otherwise    (301)

Thus φ_X is a kind of witch-hat centered at 0.

As in 16.3, for S_n = X_1 + · · · + X_n,

φ_{S_n/n}(θ) = φ_X(θ/n)^n = (1 − |θ|/n)^n if |θ|/n ≤ 1;  0 otherwise    (302)

For any fixed θ and large enough n, |θ|/n ≤ 1, so the first case applies eventually. Taking the limit as n → ∞, this implies

φ_{S_n/n}(θ) → e^{−|θ|}    (303)

This is the CF of the Cauchy distribution, so S_n/n converges in distribution to the Cauchy distribution. We conclude

Pr(S_n/n ≤ x) → ∫_{−∞}^x dt/(π(1 + t²)) = ½ + π^{−1} arctan(x)    (304)

18.2 Prove the weak law of large numbers in the following form. Suppose X_1, X_2, . . . are i.i.d. RVs, each with the same distribution as X. Suppose X ∈ L¹ and E X = μ. Prove by use of CFs that the distribution of

S_n = n^{−1}(X_1 + · · · + X_n)    (305)

converges weakly to the unit mass at μ. Deduce that

S_n → μ in probability    (306)

Of course, the SLLN implies this weak law.

Let Z ∼ δ_μ be equal to μ almost surely. Then

φ_Z(θ) = E e^{iθZ} = e^{iθμ}    (307)

Now consider the Taylor expansion of φ_X(θ). Note that

φ_X(0) = 1,   φ′_X(0) = E iXe^{iθX} |_{θ=0} = iμ    (308)

Therefore

φ_X(θ) = 1 + iμθ + o(θ)    (309)

So the characteristic function satisfies

φ_{S_n}(θ) = φ_X(θ/n)^n = (1 + iμθ/n + o(θ/n))^n → exp(iθμ) = φ_Z(θ)    (310)

This implies that S_n → δ_μ in distribution. However, for a unit mass, convergence in distribution is the same as convergence in probability, since μ ± ε are continuity points of the DF of Z and so, for any ε > 0,

Pr(S_n ≤ μ − ε) → Pr(Z ≤ μ − ε) = 0,   Pr(S_n > μ + ε) → Pr(Z > μ + ε) = 0    (311)

and hence Pr(|S_n − μ| > ε) → 0 for any ε > 0. This proves the weak law of large numbers.

Weak Convergence for [0, 1]

18.3 Let X and Y be RVs taking values in [0, 1]. Suppose that

E X^k = E Y^k, k = 0, 1, 2, . . .    (312)

Prove that

(i) E p(X) = E p(Y) for every polynomial p

(ii) E f(X) = E f(Y) for every continuous function f on [0, 1]

(iii) Pr(X ≤ α) = Pr(Y ≤ α) for every α ∈ [0, 1].

Item (i) follows from linearity of expectations. If p(x) = ∑_{k=0}^n c_k x^k, then

E p(X) = ∑_{k=0}^n c_k E X^k = ∑_{k=0}^n c_k E Y^k = E p(Y)    (313)

Item (ii) follows from the Weierstrass theorem. On the compact set [0, 1], every continuous function f can be approximated uniformly by a sequence of polynomials p_n: choose N large enough that for all n > N, |f(x) − p_n(x)| < ε for all x. In this case

|E f(X) − E p_n(X)| ≤ E|f(X) − p_n(X)| ≤ ε    (314)

Therefore

|E f(X) − E f(Y)| ≤ |E f(X) − E p_n(X)| + |E p_n(X) − E p_n(Y)| + |E p_n(Y) − E f(Y)| ≤ ε + 0 + ε = 2ε    (315)


Since ε is arbitrary, this shows E f(X) = E f(Y).

For (iii), approximate I_{[0,α]} by the continuous piecewise-linear functions

f_n(x) = 1 if x ∈ [0, α];  1 − n(x − α) if x ∈ (α, α + 1/n);  0 if x ∈ [α + 1/n, 1]    (316)

Using the dominated convergence theorem,

lim_{n→∞} E f_n(X) = E I_{[0,α]}(X) = Pr(X ≤ α)    (317)

The same holds for Y. However, E f_n(X) = E f_n(Y) for all n by part (ii). Therefore the limits are equal and Pr(X ≤ α) = Pr(Y ≤ α). Thus the moments uniquely determine a distribution on [0, 1].
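Part (ii) leans on Weierstrass approximation; here is a concrete sketch using Bernstein polynomials (one standard proof of the theorem), assuming numpy, with an arbitrary continuous test function:

    import numpy as np
    from math import comb

    def bernstein(f, n, x):
        # B_n f(x) = sum_k f(k/n) C(n,k) x^k (1-x)^{n-k}
        k = np.arange(n + 1)
        coeff = np.array([comb(n, j) for j in k], dtype=float)
        basis = coeff * x[:, None]**k * (1 - x[:, None])**(n - k)
        return basis @ f(k / n)

    f = lambda t: np.abs(t - 0.4)          # continuous but not smooth
    x = np.linspace(0, 1, 1001)
    for n in (10, 100, 1000):
        print(n, np.max(np.abs(bernstein(f, n, x) - f(x))))  # sup error shrinks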

18.4 Suppose that (F_n) is a sequence of DFs with

F_n(x) = 0 for x < 0, F_n(1) = 1, for every n    (318)

Suppose that

m_k = lim_n ∫_{[0,1]} x^k dF_n exists for k = 0, 1, 2, . . .    (319)

Use the Helly-Bray Lemma and 18.3 to show that F_n →w F, where F is characterized by ∫_{[0,1]} x^k dF = m_k, ∀k.

Let F_n be associated with the measure μ_n. By Helly-Bray there is a subsequence (n_i) and a DF F such that F_{n_i} →w F; call the associated measure μ. By the definition of weak convergence, since x^k is a continuous function on [0, 1] and (n_i) is a subsequence of (n),

m_k = lim_n μ_n(x^k) = lim_i μ_{n_i}(x^k) = μ(x^k) = ∫_{[0,1]} x^k dF    (320)

If another subsequence (n′_i) satisfies F_{n′_i} →w F′ with associated measure μ′, the same argument shows μ′(x^k) = m_k = μ(x^k) for all k, so 18.3 implies F′ = F.

This implies that F_n →w F. For suppose there is some continuous g : [0, 1] → R such that lim sup μ_n(g) ≠ μ(g) or lim inf μ_n(g) ≠ μ(g). Then choose a subsequence (n_i) so that μ_{n_i}(g) converges to a value other than μ(g). By Helly-Bray we can take a further subsequence converging weakly to a distribution F′, whose law μ′ must then satisfy μ′(g) ≠ μ(g); but this contradicts what we've previously shown, that every convergent subsequence has the same limit F. Therefore lim_n μ_n(g) exists for all g ∈ C[0, 1] and lim μ_n(g) = μ(g).


18.5 Improving on 18.3, a moment inversion formula. Let F be a distribution function with F(0−) = 0 and F(1) = 1. Let μ be the associated law, and define

m_k = ∫_{[0,1]} x^k dF(x)    (321)

Define

Ω = [0, 1] × [0, 1]^N,  F = B × B^N,  Pr = μ × Leb^N;  Θ(ω) = ω_0,  H_k(ω) = I_{[0,ω_0]}(ω_k)    (322)

This models the situation in which Θ is chosen with law μ, and a coin with probability Θ of heads is minted and tossed at times 1, 2, . . . (see E10.8). The RV H_k is 1 if the kth toss produces heads, 0 otherwise. Define

S_n = H_1 + H_2 + · · · + H_n    (323)

By the strong law of large numbers and Fubini's theorem,

S_n/n → Θ a.s.    (324)

Define a map D on the space of real sequences (a_n : n ∈ Z⁺) by setting

Da = (a_n − a_{n+1} : n ∈ Z⁺)    (325)

Prove that

F_n(x) = ∑_{i ≤ nx} \binom{n}{i} (D^{n−i} m)_i → F(x)    (326)

at every point of continuity of F.

By the SLLN we know that, at every continuity point x of F,

F(x) = Pr(Θ ≤ x) = lim_{n→∞} Pr(S_n/n ≤ x)    (327)

Conditional on Θ, S_n ∼ Bin(n, Θ), so marginalizing over Θ and using Fubini's theorem we find

Pr(S_n ≤ nx) = ∫_{[0,1]} ∑_{k≤nx} \binom{n}{k} θ^k (1−θ)^{n−k} μ(dθ) = ∑_{k≤nx} \binom{n}{k} ∫_{[0,1]} θ^k (1−θ)^{n−k} μ(dθ)    (328)

But we can write the integral in terms of the m_k. By induction, I claim

(D^n m)_k = ∫_{[0,1]} x^k (1−x)^n μ(dx)    (329)

Clearly this is true for n = 0; for n ≥ 1,

(D^n m)_k = (D(D^{n−1} m))_k = (D^{n−1} m)_k − (D^{n−1} m)_{k+1}
 = ∫_{[0,1]} x^k (1−x)^{n−1} μ(dx) − ∫_{[0,1]} x^{k+1} (1−x)^{n−1} μ(dx)
 = ∫_{[0,1]} x^k (1−x)^{n−1} (1−x) μ(dx) = ∫_{[0,1]} x^k (1−x)^n μ(dx)    (330)

Pulling together the parts, we get (326).
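A numerical sketch of the inversion formula (326), assuming numpy/scipy; the Beta(2, 5) law is an arbitrary test case, and n is kept moderate because iterated differencing in floating point loses precision for large n:

    from math import comb
    from scipy.stats import beta

    a, b, n = 2.0, 5.0, 20
    m = [1.0]
    for k in range(n):                  # Beta(a,b) moments: E X^{k+1}
        m.append(m[-1] * (a + k) / (a + b + k))

    levels = [m]                        # levels[j] holds D^j m
    for _ in range(n):
        prev = levels[-1]
        levels.append([prev[i] - prev[i + 1] for i in range(len(prev) - 1)])

    x = 0.3
    Fn = sum(comb(n, i) * levels[n - i][i] for i in range(int(n * x) + 1))
    print(Fn, beta(a, b).cdf(x))        # F_n(x) roughly matches F(x)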

18.6 Moment Problem. Prove that if (m_k : k ∈ Z⁺) is a suitable sequence of numbers in [0, 1], then there exists a RV X with values in [0, 1] such that E(X^k) = m_k.

Weak Convergence for [0, ∞)

18.7 Using Laplace transforms instead of CFs. Suppose that F and G are DFs on R⁺ such that F(0−) = G(0−) and

∫_{[0,∞)} e^{−λx} dF(x) = ∫_{[0,∞)} e^{−λx} dG(x), ∀λ ≥ 0    (331)

Note that the integral on the LHS has a contribution F(0) from 0. Prove that F = G. Suppose that (F_n) is a sequence of distribution functions on R, each with F_n(0−) = 0, and such that

L(λ) = lim_n ∫ e^{−λx} dF_n(x)    (332)

exists for λ ≥ 0 and L is continuous at 0. Prove that (F_n) is tight and that

F_n →w F, where ∫ e^{−λx} dF(x) = L(λ), ∀λ ≥ 0    (333)

Let X ∼ F, Y ∼ G, U = e^{−X} and V = e^{−Y}. Since the range of X, Y is [0, ∞), the range of U, V is contained in [0, 1]. From (331),

E U^k = E e^{−kX} = E e^{−kY} = E V^k    (334)

Hence by 18.3 the distributions of U and V are equal. However, since e^{−x} is monotonic and invertible, this means that the distributions of X and Y are equal. More concretely, the relationship Pr(U ≥ c) = Pr(X ≤ −log c) provides a dictionary to translate between the distributions of X and U, and similarly for Y and V.

Given ε > 0, choose λ small enough that L(λ) > 1 − ε/4; this is possible because L is continuous at 0 and L(0) = 1. Then, because L is the limit of the Laplace transforms of the F_n, for all but finitely many n we must have

∫_{[0,∞)} (1 − e^{−λx}) dF_n(x) < ε/2    (335)


Thus for any K, for all but finitely many n,

(1 − F_n(K))(1 − e^{−λK}) ≤ ∫_{[0,∞)} (1 − e^{−λx}) dF_n(x) < ε/2    (336)

Choose K large enough that 1 − e^{−λK} > ½. Then, for all but finitely many n,

F_n(K) > 1 − ε    (337)

which shows that (F_n) is tight.

The rest of the proof is essentially the same as 18.4. Having proved tightness, Helly-Bray implies that some subsequence converges in distribution to some distribution, say F. Any distribution which is the limit of a subsequence of (F_n) has Laplace transform L (since a limit along a subsequence agrees with the limit of the sequence). By the argument above, a distribution is uniquely determined by its Laplace transform, so the limit of a convergent subsequence of (F_n) is the same F regardless of the subsequence. This implies that F_n →w F since, for any continuous bounded g, we must have lim sup ∫ g dF_n = lim inf ∫ g dF_n = ∫ g dF; otherwise we could use Helly-Bray to find a subsequence converging to a distribution other than F.

A13.1 Modes of convergence.

(a) Show that (X_n → X a.s.) ⇒ (X_n → X in prob).

(b) Prove that (X_n → X in prob) does not imply (X_n → X a.s.).

(c) Prove that if ∑ Pr(|X_n − X| > ε) < ∞, ∀ε > 0, then X_n → X a.s.

(d) Suppose that X_n → X in probability. Prove that a subsequence (X_{n_k}) of (X_n) converges X_{n_k} → X a.s.

(e) Deduce from (a) and (d) that X_n → X in probability if and only if every subsequence of (X_n) contains a further subsequence which converges a.s. to X.

(a) Suppose that for some ε > 0 the event E = {|X_n − X| > ε i.o.} has Pr(E) > 0. For every ω ∈ E it can't be that X_n(ω) → X(ω), since either X_n > X + ε infinitely often or X_n < X − ε infinitely often, and hence either the lim inf or the lim sup differs from X; thus Pr(X_n → X) ≤ 1 − Pr(E) < 1, contradicting almost sure convergence. So a.s. convergence forces Pr(|X_n − X| > ε i.o.) = 0 for every ε > 0, and since lim sup_n Pr(|X_n − X| > ε) ≤ Pr(|X_n − X| > ε i.o.) (compare (202)-(203)), we get Pr(|X_n − X| > ε) → 0, i.e. X_n → X in probability.

(b) Let Ω = [0, 1], taking Pr to be the Lebesgue measure on the Borel sets. Let

X_{n,k}(x) = 1 if k/n ≤ x < (k+1)/n;  0 otherwise  (0 ≤ k < n)    (338)

Then take the X_{n,k} as a single sequence, with the indices in lexicographic order: take the n's in order, and for a given n take the k's in order. Now for n ≥ N and ε ∈ (0, 1),

Pr(|X_{n,k} − 0| > ε) = Pr(X_{n,k} = 1) = 1/n ≤ 1/N    (339)


Thus X_{n,k} → 0 in probability. On the other hand, lim sup X_{n,k}(x) = 1 for any x ∈ [0, 1): for any n, let k_n(x) = ⌊nx⌋; then X_{n,k_n(x)}(x) = 1, so X_{n,k}(x) = 1 infinitely often along the sequence. Therefore it can't be that X_{n,k} → 0 almost surely (in fact, almost surely the sequence doesn't converge).

A second solution is given by considering I_{E_n} for independent events E_n with Pr(E_n) = p_n, where p_n → 0 and ∑ p_n = ∞ (for instance p_n = 1/n). Note

Pr(|I_{E_m} − I_{E_n}| > ε) = Pr(E_n Δ E_m) = (1 − p_n)p_m + p_n(1 − p_m)    (340)

Since p_n → 0, I_{E_n} → 0 in probability; but since the E_n are independent and ∑ p_n = ∞, the second Borel-Cantelli lemma gives Pr(E_n i.o.) = 1, so almost surely I_{E_n} = 1 infinitely often and I_{E_n} does not converge to 0.

(c) By Borel-Cantelli, |X_n − X| ≤ ε eventually, almost surely, because only finitely many of the events {|X_n − X| > ε} occur. By hypothesis this is true for each ε = 1/k, k ∈ N, so take the intersection of these countably many almost-sure sets E_k to get an almost-sure set E = ⋂_k E_k. For ω ∈ E and each k ∈ N there exists N_k(ω) ∈ N such that if n > N_k(ω) then |X_n(ω) − X(ω)| ≤ 1/k. For arbitrary ε > 0, choose 1/k < ε and N_ε(ω) = N_k(ω) to get the corresponding statement for ε. This is the definition of almost sure convergence: on the almost-sure set E, for each ε there exists an N such that |X_n − X| < ε for n > N.

(d) Start with m = 1. Since X_n → X in probability, we can choose a subsequence (n_{1,k}) such that Pr(|X_{n_{1,k}} − X| > 1) < 2^{−k} for k ∈ N. This subsequence evidently satisfies ∑_k Pr(|X_{n_{1,k}} − X| > 1) < ∞.

Now recursively choose a subsequence (n_{m,k}) of (n_{m−1,k}) which satisfies Pr(|X_{n_{m,k}} − X| > 1/m) < 2^{−k}, so that ∑_k Pr(|X_{n_{m,k}} − X| > 1/m) < ∞. For any m′ ∈ N, the diagonal subsequence n_m = n_{m,m} lies entirely within the subsequence (n_{m′,k}) except for at most the finitely many terms with m < m′. Thus ∑_m Pr(|X_{n_m} − X| > 1/m′) < ∞ for all m′, and hence ∑_m Pr(|X_{n_m} − X| > ε) < ∞ for every ε > 0. This shows that the diagonal subsequence satisfies the hypothesis of (c), and therefore X_{n_m} → X almost surely.

(e) If X_n → X in probability, then for any ε > 0, p_n = Pr(|X_n − X| > ε) → 0. Since the sequence (p_n) has a limit, any subsequence (p_{n_k}) converges to the same limit, 0. Thus any subsequence (X_{n_k}) also converges in probability to X, and by (d) it has a further subsequence which converges almost surely to X.

Conversely, assume the subsequence property, and choose ε > 0. We'll show p_n = Pr(|X_n − X| > ε) → 0. Choose δ > 0, set n_1 = 1, and recursively find a subsequence (n_k) by choosing p_{n_k} + δ > sup_{m > n_{k−1}} p_m. By the property, (X_{n_k}) has a further subsequence (X_{n_l}) converging almost surely to X. By (a), X_{n_l} converges in probability, so p_{n_l} → 0. Since (n_l) is a subsequence of (n_k), it has the property that p_{n_l} + δ > p_m for any m > n_l. Taking the limit l → ∞, this shows lim sup_m p_m ≤ δ. Since δ is arbitrary, p_m → 0; and since ε is arbitrary, X_n → X in probability.


A13.2 If ξ ∼ N(0, 1), show E e^{λξ} = exp(½λ²). Suppose ξ_1, ξ_2, . . . are i.i.d. RVs each with the N(0, 1) distribution. Let S_n = ∑_{i=1}^n ξ_i, let a, b ∈ R, and define

X_n = exp(aS_n − bn)    (341)

Show that

(X_n → 0 a.s.) ⇔ (b > 0)    (342)

but that for r ≥ 1

(X_n → 0 in L^r) ⇔ (r < 2b/a²)    (343)

Calculating,

E e^{λξ} = (1/√(2π)) ∫_R e^{λξ} e^{−ξ²/2} dξ = (1/√(2π)) ∫_R e^{−(ξ−λ)²/2 + λ²/2} dξ = e^{λ²/2} (1/√(2π)) ∫_R e^{−ζ²/2} dζ = e^{λ²/2}    (344)

where we performed the substitution ζ = ξ − λ.

Suppose b > 0. By the SLLN, S_n/n → 0 almost surely, so aS_n/n − b → −b < 0. Thus for almost every sample point ω there is an N(ω) such that aS_n/n − b < −½b for all n > N, and then X_n = exp(aS_n − bn) < exp(−½bn) → 0. Now suppose b < 0. Similarly, there is an N(ω) such that aS_n/n − b > −½b > 0, and therefore X_n > exp(−½bn) → ∞. For b = 0 (and a ≠ 0), the law of the iterated logarithm says that S_n > √(n log log n) infinitely often, so for a > 0, e^{aS_n} > exp(a√(n log log n)) infinitely often; and since −S_n is also the sum of i.i.d. N(0, 1) RVs, e^{aS_n} < exp(−a√(n log log n)) infinitely often. Thus the limit of e^{aS_n} doesn't exist: it oscillates, reaching arbitrarily high and arbitrarily small positive values. (The case a < 0 is symmetric.)

On the other hand,

E X^r_n = E e^{r(aS_n − bn)} = e^{−rbn} (E e^{raξ})^n = e^{(½(ra)² − br)n}    (345)

The exponent ½r²a² − br is negative iff r < 2b/a². So if r < 2b/a² then ‖X_n‖_r → 0; if r > 2b/a² then ‖X_n‖_r → ∞; and if r = 2b/a² then ‖X_n‖_r = 1 for all n.
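A Monte Carlo check of (344), assuming numpy (λ beyond about 1.5 makes the estimator too heavy-tailed to be informative):

    import numpy as np

    rng = np.random.default_rng(9)
    xi = rng.normal(size=5_000_000)
    for lam in (0.5, 1.0, 1.5):
        # empirical E exp(lam * xi) vs the closed form exp(lam^2 / 2)
        print(lam, np.exp(lam * xi).mean(), np.exp(lam**2 / 2))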
