Advanced Probability: Solutions to Sheet 2
Guolong Li∗
November 26, 2013
1 Discrete-time martingales
Exercise 1.1
Let us suppose that, at time 0, an urn contains a single black ball and a
single white ball. At each time n ≥ 1, a ball is chosen uniformly at random
from those in the urn and it is replaced, together with another ball of the
same colour. We will denote the number of black balls that we have chosen
by time n by Bn. Immediately after time n, then, the urn contains n + 2
balls, of which Bn + 1 are black.
Proposition. Let Mn := (Bn+1)/(n+2) for all n ≥ 0; this is the proportion
of black balls in the urn immediately after time n. With respect to a certain
natural filtration, M = (Mn : n ≥ 0) is a martingale that converges a.s. and
in Lp for all p ∈ [1,∞) to some [0, 1]-valued random variable, X∞.
Proof. Let us define Fn := σ(B0, . . . , Bn) for all n ≥ 0. We shall show that
the process M is a martingale with respect to this filtration. Clearly, for
all n ≥ 0, Mn is Fn-measurable and such that |Mn| ≤ 1, so that it is in
particular integrable. We also see that, almost surely,
E[M_{n+1} | F_n] = M_n · (B_n + 2)/(n + 3) + (1 − M_n) · (B_n + 1)/(n + 3)
                 = [(B_n + 1)(B_n + 2) + (n + 1 − B_n)(B_n + 1)] / [(n + 2)(n + 3)]
                 = [(B_n + 1)(n + 3)] / [(n + 2)(n + 3)] = M_n,
and so we conclude that M is a martingale.
∗Comments and corrections should be sent to [email protected].
As we mentioned earlier, |Mn| ≤ 1 for all n ≥ 0, so the process is bounded
in Lp for all p ≥ 1. Therefore, by the Lp martingale convergence theorem,
there exists a random variable X∞ such that, for all p ∈ (1,∞), Mn → X∞
a.s. and in Lp. (By the ‘a.s.’ part of the theorem, this X∞ is the same for
each p ∈ (1,∞).) A fortiori, then, Mn → X∞ in L1 as well. Additionally, as
Mn ∈ [0, 1], it follows that X∞ ∈ [0, 1] a.s.
Proposition. For each k ≥ 1 and each n ≥ 0, define

M^{(k)}_n := ∏_{r=1}^{k} (B_n + r)/(n + r + 1).

The process (M^{(k)}_n : n ≥ 0) is a martingale with respect to the same natural
filtration as in the previous proposition.
Proof. Let us fix some k ≥ 1. Again, it is obvious that, for each n ≥ 0,
M^{(k)}_n is F_n-measurable and that, as each factor lies in [0, 1], |M^{(k)}_n| ≤ 1; the
process is therefore adapted and integrable. To verify that the martingale
property obtains, let us fix some n ≥ 0. Then, a.s.,
E[M^{(k)}_{n+1} | F_n] = E[∏_{r=1}^{k} (B_{n+1} + r)/(n + r + 2) | F_n]
= M_n · ∏_{r=1}^{k} (B_n + r + 1)/(n + r + 2) + (1 − M_n) · ∏_{r=1}^{k} (B_n + r)/(n + r + 2)
= [∏_{r=2}^{k} (B_n + r) / ∏_{r=1}^{k} (n + r + 2)] · [(B_n + 1)(B_n + k + 1) + (n + 1 − B_n)(B_n + 1)]/(n + 2)
= [∏_{r=2}^{k} (B_n + r) / ∏_{r=1}^{k} (n + r + 2)] · (B_n + 1)(n + k + 2)/(n + 2)
= ∏_{r=1}^{k} (B_n + r) / ∏_{r=1}^{k} (n + r + 1) = M^{(k)}_n.
Looking back at the definition of M^{(k)}_n, it is quite clear that it almost
equals M_n^k. Our next goal will be to quantify this and show that, as n → ∞,
the difference disappears in a suitable manner. Each factor in the definition
of M^{(k)}_n can be rewritten as

(B_n + r)/(n + r + 1) = (B_n + r)/(n + 2) · (n + 2)/(n + r + 1) = (M_n + (r − 1)/(n + 2)) · (n + 2)/(n + r + 1).
From this it is clear that each of the k factors tends to X∞ a.s. as n → ∞
and hence that M^{(k)}_n → X∞^k a.s. as n → ∞. As we mentioned earlier,
|M^{(k)}_n| ≤ 1 so, by the same reasoning as that employed in the proof of
our first proposition, there exists a random variable M∞ such that, for all
p ∈ [1,∞), M^{(k)}_n → M∞ a.s. and in Lp. Therefore X∞^k = M∞ a.s. and
so, for all p ∈ [1,∞), M^{(k)}_n → X∞^k a.s. and in Lp. In particular, then, the
convergence holds in L1, and

E[X∞^k] = lim_{n→∞} E[M^{(k)}_n] = E[M^{(k)}_0] = 1/(k + 1),
where the penultimate equality in the above holds by the martingale property.
We can use this to determine the law of X∞. The moment generating
function of X∞ exists as X∞ ∈ [0, 1] a.s. It is given by
M_{X∞}(t) := E[e^{tX∞}] = E[Σ_{k=0}^{∞} (tX∞)^k / k!] = Σ_{k=0}^{∞} t^k E[X∞^k] / k! = Σ_{k=0}^{∞} t^k/(k + 1)! = (e^t − 1)/t.
We have used Fubini’s theorem in the third equality together with the
absolute convergence of the series in the third term. The moment generating
function of a random variable which is uniformly distributed on [0, 1] is
precisely equal to MX∞ so it follows that X∞ is itself uniformly distributed
on [0, 1].
We shall next reobtain this result in a more direct way by showing that
Bn is uniformly distributed on {0, 1, . . . , n} for each n ≥ 0. For the cases
n = 0 and n = 1, this is immediate; let us suppose that we have established
the result for B_N. Let us take some 1 ≤ k ≤ N. Then
P(B_{N+1} = k) = P(B_{N+1} = k | B_N = k) P(B_N = k) + P(B_{N+1} = k | B_N = k − 1) P(B_N = k − 1)
= [(N + 2) − (k + 1)]/(N + 2) · 1/(N + 1) + [(k − 1) + 1]/(N + 2) · 1/(N + 1)
= 1/(N + 2).
We also have that

P(B_{N+1} = 0) = P(B_{N+1} = 0 | B_N = 0) P(B_N = 0) = (N + 1)/(N + 2) · 1/(N + 1) = 1/(N + 2)

and, finally, that

P(B_{N+1} = N + 1) = P(B_{N+1} = N + 1 | B_N = N) P(B_N = N) = (N + 1)/(N + 2) · 1/(N + 1) = 1/(N + 2).
It follows from these calculations that B_{N+1} is uniformly distributed on the
set {0, 1, . . . , N + 1} and hence, by induction, we have proven the claim.
We can use this to rederive the distribution of X∞ in the following way.
As Mn → X∞ a.s., Mn → X∞ in distribution, so if we can show that the
distribution function of Mn converges everywhere to that of a random variable
that is uniformly distributed on [0, 1], it will follow that X∞ is uniformly
distributed on [0, 1]. We begin by noting that, as Mn = (Bn + 1)/(n + 2), Mn
is uniformly distributed on {1/(n + 2), . . . , (n + 1)/(n + 2)}. If Fn denotes
the distribution function of Mn, by definition

Fn(x) = 0 if x < 0,    Fn(x) = ⌊(n + 2)x⌋/(n + 1) if 0 ≤ x < 1,    Fn(x) = 1 if x ≥ 1.

Clearly Fn(x) → 0 if x < 0 and Fn(x) → 1 if x ≥ 1, so let us suppose that
0 ≤ x < 1. Then Fn(x) = ⌊(n + 2)x⌋/(n + 1) → x as n → ∞, and hence Fn
converges everywhere to the distribution function of a random variable that
is uniformly distributed on [0, 1]. We conclude that X∞ is, as we had shown
earlier, uniformly distributed on [0, 1].
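The induction can also be checked mechanically. The sketch below (the function name is ours, not part of the exercise) propagates the exact law of Bn through the urn's transition probabilities, using the fact from the text that after time m the urn holds m + 2 balls of which Bm + 1 are black, and confirms that the law remains uniform.

```python
from fractions import Fraction

def urn_law(n):
    """Exact law of Bn: after time m the urn holds m + 2 balls, of which
    Bm + 1 are black, so a black ball is drawn with probability (Bm+1)/(m+2)."""
    law = {0: Fraction(1)}  # B0 = 0: no ball has been drawn yet
    for m in range(n):
        new = {}
        for k, pr in law.items():
            p_black = Fraction(k + 1, m + 2)
            new[k + 1] = new.get(k + 1, 0) + pr * p_black  # black ball drawn
            new[k] = new.get(k, 0) + pr * (1 - p_black)    # white ball drawn
        law = new
    return law

law10 = urn_law(10)  # the law of B10, which should be uniform on {0, ..., 10}
```

Because the arithmetic is exact rational arithmetic, the uniformity P(B10 = k) = 1/11 comes out exactly rather than up to simulation error.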
Proposition. Let 0 < θ < 1 and define, for all n ≥ 0,

N_n(θ) := (n + 1)!/(B_n! (n − B_n)!) · θ^{B_n} (1 − θ)^{n − B_n}.

The process N(θ) := (N_n(θ) : n ≥ 0) is a martingale with respect to the same
filtration as that in the previous propositions.
Proof. Again, it is trivially the case that N_n(θ) is F_n-measurable and
integrable, so it suffices to check the martingale property. Let n ≥ 0. Then,
a.s.,

E[N_{n+1}(θ) | F_n] = E[(n + 2)!/(B_{n+1}! (n + 1 − B_{n+1})!) · θ^{B_{n+1}} (1 − θ)^{n+1−B_{n+1}} | F_n]
= M_n · (n + 2)!/((B_n + 1)! (n − B_n)!) · θ^{B_n+1} (1 − θ)^{n−B_n} + (1 − M_n) · (n + 2)!/(B_n! (n + 1 − B_n)!) · θ^{B_n} (1 − θ)^{n+1−B_n}
= (n + 1)!/(B_n! (n − B_n)!) · θ^{B_n} (1 − θ)^{n−B_n} (θ + 1 − θ)
= N_n(θ).
Exercise 1.2
Given Θ = θ, the probability of observing a particular sequence B_1, B_2, . . . , B_n is
θ^{B_n}(1 − θ)^{n−B_n}. So, when Θ is uniformly distributed on [0, 1], the probability of
observing the sequence B_1, B_2, . . . , B_n is

∫_0^1 θ^{B_n}(1 − θ)^{n−B_n} dθ = B(B_n + 1, n − B_n + 1) = B_n!(n − B_n)!/(n + 1)!,

where B denotes the beta function. It follows that N_n(θ) is indeed the
conditional density of Θ given B_1, B_2, . . . , B_n. The probability that
the (n + 1)th toss is a head is then the conditional expectation of Θ given
B_1, B_2, . . . , B_n, namely

(n + 1)!/(B_n!(n − B_n)!) · ∫_0^1 θ^{B_n+1}(1 − θ)^{n−B_n} dθ = (B_n + 1)/(n + 2).

The process therefore has the same probabilistic structure as the urn of the
previous exercise.
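The closing identity can be verified exactly, since at positive integer arguments the beta function reduces to factorials. A minimal sketch with hypothetical helper names:

```python
from fractions import Fraction
from math import factorial

def beta(x, y):
    """Beta function at positive integer arguments: B(x, y) = (x-1)!(y-1)!/(x+y-1)!."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))

def predictive_head_prob(n, b):
    """P(the (n+1)th toss is a head | Bn = b) under a uniform prior: the
    posterior normalising constant times the integral of theta^(b+1)(1-theta)^(n-b),
    i.e. times B(b + 2, n - b + 1)."""
    norm = Fraction(factorial(n + 1), factorial(b) * factorial(n - b))
    return norm * beta(b + 2, n - b + 1)

# Compare against the urn's proportion (b + 1)/(n + 2) for every small case.
checks = [(predictive_head_prob(n, b), Fraction(b + 1, n + 2))
          for n in range(12) for b in range(n + 1)]
```

The exact rational computation confirms that the Bayesian predictive probability agrees with the urn's proportion of black balls in every case checked.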
Exercise 1.3
At each time n ≥ 1, let us suppose that an idealised monkey types a capital
letter at random; that is, let us suppose that (U_n : n ≥ 1) is an iid sequence
of random variables that are uniformly distributed on {A, . . . , Z} and that U_n
corresponds to the nth random character that the monkey types. Let us define
T to be the first time that the idealised monkey types 'ABRACADABRA',
i.e.

T := inf{n ≥ 11 : U_{n−10} = A, U_{n−9} = B, . . . , U_n = A}.
Our goal in this exercise is to calculate E[T ] with a martingale argument.
In addition to the above, let us suppose that we have a sequence of
gamblers indexed by k ∈ Z>0 and that each gambler plays the following
game. He begins by betting £1 that the nth letter will be ‘A’. If he loses, he
leaves; if he wins, he receives £26 and continues playing. If he is still playing,
he places his fortune—all £26 of it—on the (n + 1)th letter being ‘B’. If
he loses, he leaves; if he wins, he receives £26². He continues in this way,
betting everything on ‘ABRACADABRA’ being the eventual sequence.
We will now introduce a martingale based on these gamblers. Let us
stipulate that the kth gambler begins playing at time k and that X^{(k)}_n denotes
the kth gambler's fortune at time n; to be more specific, X^{(k)}_{k−1} = 1, X^{(k)}_k
is either 0 (if U_k ≠ A) or 26 (if U_k = A), and so on up to X^{(k)}_{k+10}, which
is either 0 or 26^{11}. (We choose this time parameterisation so that the kth
gambler first bets on the outcome of the kth letter.) Let us also set X^{(k)}_n = 0
if n < k − 1 and let us set X^{(k)}_n = X^{(k)}_{k+10} if n > k + 10; this second condition
corresponds to the gambler keeping his fortune if the monkey has typed
'ABRACADABRA'. Let us define X_n := Σ_{k=1}^{n} X^{(k)}_n. The random variable
X_n corresponds to the sum of the fortunes of all 'active' players. Finally,
let us set F_n := σ(U_1, . . . , U_n) for each n ≥ 1. If k < n − 9, then X^{(k)}_{n+1} = X^{(k)}_{k+10},
which is F_n-measurable as k + 10 ≤ n. It follows that, almost surely,
E[X_{n+1} | F_n] = Σ_{k=1}^{n+1} E[X^{(k)}_{n+1} | F_n] = Σ_{k=1}^{n−10} E[X^{(k)}_{n+1} | F_n] + Σ_{k=n−9}^{n+1} E[X^{(k)}_{n+1} | F_n]
= Σ_{k=1}^{n−10} X^{(k)}_{k+10} + Σ_{k=n−9}^{n+1} E[X^{(k)}_{n+1} | F_n]
= Σ_{k=1}^{n−10} X^{(k)}_n + Σ_{k=n−9}^{n+1} (26 · X^{(k)}_n · (1/26) + 0)
= X_n + X^{(n+1)}_n
= X_n + 1.
It follows from the above that (Xn − n : n ≥ 1) satisfies the martingale
property. Moreover, Xn − n is clearly Fn-measurable. It is also integrable as
|X_n − n| ≤ n · 26^{11} + n, and hence (X_n − n : n ≥ 1) is a martingale.
Now, the random time T is plainly a stopping time and it is clearly the
case that
T ≤ inf{n ∈ 11 · Z_{>0} : U_{n−10} = A, U_{n−9} = B, U_{n−8} = R, . . . , U_n = A}.
By independence, the right-hand side of the above is a geometric random
variable with success probability 26−11. As this has a finite (albeit large) mean,
it follows by comparison that E[T ] <∞. By Exercise 2.4, if (Xn − n : n ≥ 1)
has (a.s.) bounded increments then we can apply an optional stopping
theorem with T . We see that
|X_{n+1} − (n + 1) − X_n + n| ≤ 1 + |X_{n+1} − X_n|
= 1 + |Σ_{k=1}^{n+1} X^{(k)}_{n+1} − Σ_{k=1}^{n} X^{(k)}_n|
= 1 + |Σ_{k=1}^{n−10} X^{(k)}_{k+10} + Σ_{k=n−9}^{n+1} X^{(k)}_{n+1} − Σ_{k=1}^{n−11} X^{(k)}_{k+10} − Σ_{k=n−10}^{n} X^{(k)}_n|
≤ 1 + X^{(n−10)}_n + Σ_{k=n−9}^{n+1} X^{(k)}_{n+1} + Σ_{k=n−10}^{n} X^{(k)}_n
≤ 1 + 23 · 26^{11},

and hence (X_n − n : n ≥ 1) has bounded increments. Therefore E[X_T − T] =
E[X_1 − 1] = 0, and so E[T] = E[X_T] = 26^{11} + 26^4 + 26, as X^{(k)}_T ≠ 0 if and
only if k ∈ {T, T − 3, T − 10}.
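The surviving indices {T, T − 3, T − 10} correspond to the prefixes of 'ABRACADABRA' that are also suffixes, of lengths 11, 4 and 1, so E[T] is the sum of 26^ℓ over those lengths. The sketch below (our own helper, not part of the solution) recomputes those lengths with the standard KMP failure function and evaluates the sum.

```python
def border_lengths(word):
    """Lengths of every prefix of `word` that is also a suffix of `word`
    (including the whole word), found by walking the KMP failure function
    down from the full length."""
    n = len(word)
    fail = [0] * n  # fail[i]: length of the longest proper border of word[:i+1]
    k = 0
    for i in range(1, n):
        while k > 0 and word[i] != word[k]:
            k = fail[k - 1]
        if word[i] == word[k]:
            k += 1
        fail[i] = k
    lengths, l = [], n
    while l > 0:
        lengths.append(l)
        l = fail[l - 1]
    return lengths

lengths = border_lengths("ABRACADABRA")     # the surviving gamblers' match lengths
expected_T = sum(26 ** l for l in lengths)  # 26^11 + 26^4 + 26
```

This is the general pattern behind the answer: for any target word, E[T] is the sum of (alphabet size)^ℓ over the word's border lengths.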
Exercise 1.4
Let us write P(A | G ) for E[1A | G ] and let us suppose that X = (Xn : n ≥ 0)
is a sequence of (0, 1)-valued random variables defined in the following
way. We begin by fixing some a ∈ (0, 1) and setting X0 := a a.s. We then
inductively define Xn+1 from Xn by
P(Xn+1 = Xn/2 | Fn) = 1−Xn = 1− P(Xn+1 = (Xn + 1)/2 | Fn),
where Fn := σ(Xk : 0 ≤ k ≤ n).
Proposition. The process X is a martingale with respect to its natural
filtration. Moreover, there exists some X∞ such that Xn → X∞ in Lp for
each p ∈ [1,∞).
Proof. The process is clearly adapted and, as X_n is a.s. (0, 1)-valued for each
n ≥ 0, it is integrable. We will now check the martingale property. If we
fix n ≥ 0 then, a.s.,
E[X_{n+1} | F_n] = E[X_{n+1} 1{X_{n+1} = X_n/2} | F_n] + E[X_{n+1} 1{X_{n+1} = (X_n+1)/2} | F_n]
= E[(X_n/2) 1{X_{n+1} = X_n/2} | F_n] + E[((X_n + 1)/2) 1{X_{n+1} = (X_n+1)/2} | F_n]
= (X_n/2) E[1{X_{n+1} = X_n/2} | F_n] + ((X_n + 1)/2) E[1{X_{n+1} = (X_n+1)/2} | F_n]
= X_n(1 − X_n)/2 + (X_n + 1)X_n/2 = X_n,
where we have used the fact that Xn is bounded and Fn-measurable in
the third equality. The process is therefore a martingale. As X is bounded,
it is in particular bounded in Lp for all p ∈ (1,∞); by the Lp martingale
convergence theorem, there is some X∞ such that, for all p > 1, Xn → X∞
a.s. and in Lp. As Lp convergence implies L1 convergence on finite measure
spaces when p ∈ (1,∞), Xn → X∞ in L1 as well.
For the second part of the exercise, we see that
E[(X_{n+1} − X_n)²] = E[E[(X_n²/4) 1{X_{n+1} = X_n/2} | F_n]] + E[E[((1 − X_n)²/4) 1{X_{n+1} = (X_n+1)/2} | F_n]]
= E[(X_n²/4) E[1{X_{n+1} = X_n/2} | F_n]] + E[((1 − X_n)²/4) E[1{X_{n+1} = (X_n+1)/2} | F_n]]
= E[X_n²(1 − X_n) + (1 − X_n)² X_n]/4
= E[X_n(1 − X_n)]/4,
where the second equality holds as Xn is bounded and Fn-measurable.
As X_n → X∞ both in L1 and L2, we have that E[X_n] → E[X∞] and
E[X_n²] → E[X∞²] and hence that

E[X_n(1 − X_n)] → E[X∞(1 − X∞)].
Moreover, as (Xn : n ≥ 1) is a Cauchy sequence in L2,
E[Xn(1−Xn)]/4 = E[(Xn+1 −Xn)2]→ 0.
By combining the above two facts we see that E[X∞(1−X∞)] = 0. As X∞
is a.s. [0, 1]-valued, X∞(1 − X∞) ≥ 0 a.s.; as its expectation is 0, it must
equal 0 a.s., and hence X∞ is a.s. {0, 1}-valued. Finally, as X_n → X∞ in L1
and as X is a martingale, E[X∞] = E[X0] = a and therefore X∞ is Bernoulli
distributed with parameter a.
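The decay of E[X_n(1 − X_n)] can be made concrete: the same conditioning as above gives E[X_{n+1}(1 − X_{n+1}) | F_n] = (3/4) X_n(1 − X_n), a step not taken in the text. The sketch below (helper name ours) propagates the exact law of X_n from X_0 = a and checks both this geometric decay and the martingale identity E[X_n] = a.

```python
from fractions import Fraction

def law_at(n, a):
    """Exact law of Xn for the chain started at X0 = a, moving to Xn/2 with
    probability 1 - Xn and to (Xn + 1)/2 with probability Xn."""
    law = {Fraction(a): Fraction(1)}
    for _ in range(n):
        new = {}
        for x, pr in law.items():
            new[x / 2] = new.get(x / 2, 0) + pr * (1 - x)        # down move
            new[(x + 1) / 2] = new.get((x + 1) / 2, 0) + pr * x  # up move
        law = new
    return law

a = Fraction(1, 3)
law8 = law_at(8, a)
mean8 = sum(x * pr for x, pr in law8.items())             # should equal a
decay8 = sum(x * (1 - x) * pr for x, pr in law8.items())  # should be (3/4)^8 a(1-a)
```

The exact identity E[X_8(1 − X_8)] = (3/4)^8 a(1 − a) shows quantitatively how the mass of X_n concentrates near {0, 1}.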
Exercise 1.5
Let us suppose that X = (Xn : n ≥ 0) is a martingale in L2.
Proposition. The increments of X are pairwise orthogonal. That is, for all
n ≠ m,

E[(X_{n+1} − X_n)(X_{m+1} − X_m)] = 0.
Proof. Without loss of generality let us suppose that n > m ≥ 0. Then
E[(Xm+1 −Xm)(Xn+1 −Xn)] = E[E[(Xm+1 −Xm)(Xn+1 −Xn) | Fn]]
= E[(Xm+1 −Xm)E[Xn+1 −Xn | Fn]]
= E[(Xm+1 −Xm)(Xn −Xn)]
= 0.
The second equality in the above holds as Xm+1 − Xm is Fn-measurable
and in L2 and Xn+1 −Xn is in L2, and the penultimate equality holds by
the martingale property.
Proposition. The process X is bounded in L2 if and only if

Σ_{n=0}^{∞} E[(X_{n+1} − X_n)²] < ∞.
Proof. The Pythagorean theorem for inner product spaces says that, if
{v_1, . . . , v_N} is an orthogonal set, then ‖Σ_{n=1}^{N} v_n‖² = Σ_{n=1}^{N} ‖v_n‖². As
{X_{n+1} − X_n : 0 ≤ n ≤ N} is an orthogonal set in L2, we see that
Σ_{n=0}^{N} E[(X_{n+1} − X_n)²] = E[(X_{N+1} − X_0)²] and hence that

0 ≤ lim_{N→∞} Σ_{n=0}^{N} E[(X_{n+1} − X_n)²] = lim_{N→∞} E[(X_{N+1} − X_0)²] ≤ sup_{N≥0} E[(X_{N+1} − X_0)²].

If Σ_{n=0}^{∞} E[(X_{n+1} − X_n)²] < ∞ then the second term in the above is finite
and, as convergent sequences are bounded, the third term is finite too.
It follows that X = (X − X_0) + X_0 is bounded in L2. Conversely, if X is
bounded in L2 then the third term is finite and hence so is the first.
Exercise 1.6
In this exercise we will prove several versions of Wald’s identity. Let us take
X = (Xn : n ≥ 1) to be an iid sequence of integrable random variables, and
let us define S_n := X_1 + · · · + X_n for all n ≥ 0, F_n := σ(X_1, . . . , X_n) for all
n ≥ 1, and F_0 := {∅, Ω}. Finally, let T be a stopping time with respect to
the filtration (F_n : n ≥ 0).
Proposition. If Xn is nonnegative for each n ≥ 1 then E[ST ] = E[T ]E[X1].
Remark. Under the assumptions above, if ω is such that T (ω) = ∞ then
we should interpret ST (ω) to mean limn→∞ Sn(ω). Provided we permit the
possibility of infinity, this interpretation is sensible as (Sn : n ≥ 1) is a
nondecreasing sequence.
Proof. Let us consider the process Y = (Yn : n ≥ 0) := (Sn−nE[X1] : n ≥ 0).
It is trivial that Y is adapted to the filtration (Fn : n ≥ 0) and that Yn is
integrable. We also a.s. have that, for each n ≥ 0,
E[Yn+1 | Fn] = E[Yn +Xn+1 − E[X1] | Fn] = Yn + E[Xn+1]− E[X1] = Yn.
This is the case as Y is adapted and as (Xn : n ≥ 1) is an iid sequence. It
follows from these properties that Y is a martingale and hence that Y^T is,
too. The martingale property implies that E[Y^T_n] = E[Y^T_0] = 0, that is, that
E[S_{T∧n}] = E[T ∧ n] E[X_1]. If we apply the monotone convergence theorem to
both sides of this then we see that E[S_T] = E[T] E[X_1].
Proposition. If T is integrable then E[ST ] = E[T ]E[X1].
Proof. We will use our previous proposition in our proof. We see that

S_n = Σ_{k=1}^{n} X_k = Σ_{k=1}^{n} X_k^+ − Σ_{k=1}^{n} X_k^− =: S_n^1 − S_n^2.

As (X_n^+ : n ≥ 1) and (X_n^− : n ≥ 1) are iid sequences of nonnegative,
integrable random variables, the first part of the question implies that
E[S_T^1] = E[T] E[X_1^+] and E[S_T^2] = E[T] E[X_1^−]. As E[T] < ∞ and E[|X_1|] < ∞,
S_T^1 and S_T^2 are integrable, and hence so is S_T = S_T^1 − S_T^2:

E[S_T] = E[S_T^1] − E[S_T^2] = E[T](E[X_1^+] − E[X_1^−]) = E[T] E[X_1].
Let us now suppose that X_1 is centred and that T_a := inf{n ≥ 0 : S_n ≥ a}
for some fixed a > 0. Were it the case that E[T_a] < ∞, we would have that
T_a < ∞ a.s. and hence, by the previous proposition, that 0 < a ≤ E[S_{T_a}] =
E[T_a] E[X_1] = 0, which is absurd. It follows that E[T_a] = ∞.
For the final part of the exercise, we may as well consider the more general
scenario in which P(X_n = 1) = p ∈ (1/2, 1) and P(X_n = −1) = q := 1 − p, as
the work is identical. If we can prove that E[T_a] < ∞ then we can apply the
second proposition and conclude that E[S_{T_a}] = E[X_1] E[T_a], i.e. that E[T_a] =
⌈a⌉/(p − q). For each n ≥ 0, T_a ∧ n is a bounded (and therefore integrable)
stopping time. By the previous proposition, then, E[T_a ∧ n] E[X_1] = E[S_{T_a ∧ n}].
The monotone convergence theorem then tells us that

0 ≤ E[T_a] E[X_1] = lim_{n→∞} E[T_a ∧ n] E[X_1] = lim_{n→∞} E[S_{T_a ∧ n}] ≤ ⌈a⌉.

The last inequality is true as, for each n ≥ 0, E[S_{T_a ∧ n}] ≤ ⌈a⌉. Therefore, as
E[X_1] > 0, E[T_a] < ∞ and so, from our earlier reasoning, we conclude that
E[T_a] = ⌈a⌉/(p − q). In the case where p = 2/3, E[T_a] = 3⌈a⌉.
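The value 1/(p − q) = 3 per unit level can be cross-checked without martingales. By the strong Markov property everything reduces to E[T_1], and the first-passage law P(T_1 = 2n + 1) = C_n p^{n+1} q^n (C_n the nth Catalan number; this decomposition is standard but not part of the solution) can be summed numerically:

```python
from fractions import Fraction

def catalan(n):
    """Catalan number C_n = (2n)! / (n! (n+1)!), via the standard recurrence."""
    c = Fraction(1)
    for i in range(n):
        c = c * 2 * (2 * i + 1) / (i + 2)
    return c

p, q = Fraction(2, 3), Fraction(1, 3)
# E[T_1] = sum over n of (2n+1) * P(T_1 = 2n+1); the truncation error is
# geometrically small because the terms decay like (4pq)^n = (8/9)^n.
expected_T1 = sum((2 * n + 1) * catalan(n) * p ** (n + 1) * q ** n
                  for n in range(300))
```

The truncated sum agrees with ⌈a⌉/(p − q) for a = 1, i.e. with the value 3 obtained from Wald's identity, to well beyond floating-point display precision.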
Exercise 1.7
In this exercise we will investigate the gambler’s ruin. Let us suppose that
(Xn : n ≥ 1) is an iid sequence with P(X1 = 1) = p ∈ (0, 1) and P(X1 =
−1) = q := 1 − p and that a, b ∈ Z are such that 0 < a < b. Let us define
S_n := a + X_1 + · · · + X_n for all n ≥ 0; F_n := σ(X_1, . . . , X_n) for all n ≥ 1;
F_0 := {∅, Ω}; and T := inf{n ≥ 0 : S_n ∈ {0, b}}.
Proposition. The process M = (M_n : n ≥ 0) defined by M_n := (q/p)^{S_n} for
all n ≥ 0 is a martingale with respect to (F_n : n ≥ 0).

Proof. The process M is clearly adapted. It is also integrable as, for each
n ≥ 0, |M_n| = (q/p)^{S_n} ≤ (q/p)^{a+n} ∨ (q/p)^{a−n} < ∞. To establish the
martingale property, let us fix some n ≥ 0. Then, noting that M_n is bounded
and that X_{n+1} is independent of F_n, we a.s. have that

E[M_{n+1} | F_n] = E[M_n (q/p)^{X_{n+1}} | F_n] = (q/p)^{S_n} E[(q/p)^{X_{n+1}} | F_n]
= M_n E[(q/p)^{X_{n+1}}]
= M_n ((q/p) · p + (p/q) · q) = M_n.
Proposition. The process N = (N_n : n ≥ 0) defined by N_n := S_n − n(p − q) for all n ≥ 0 is a martingale with respect to (F_n : n ≥ 0).
Proof. The process N is clearly adapted. As Nn is a finite sum of integrable
random variables added to a constant for each n ≥ 1, it is also integrable.
Let us fix some n ≥ 0. We see that
E[Nn+1 | Fn] = E[Xn+1] + Sn − (n+ 1)(p− q) = Nn
as Xn+1 is independent of Fn and E[Xn+1] = p− q. It follows that N is a
martingale.
If S steps up b times in a row, then T must have occurred by the end of
that sequence, so
T ≤ b · inf{k + 1 : k ∈ Z_{≥0}, X_{kb+j} = 1 for all 1 ≤ j ≤ b} =: bτ
with probability 1. It is clear that τ has the distribution of a geometric
random variable (taking values in {1, 2, . . .}) with success probability p^b. As
τ has finite expectation, E[T] < ∞; it follows that T < ∞ a.s. If T_α denotes
the first time that S hits α ∈ Z, then T = T_0 ∧ T_b and

P(T_0 < T_b) + P(T_b < T_0) = 1. (1)
As M^T is a bounded process, we can apply the dominated convergence
theorem to give us

(q/p)^a = E[M^T_0] = E[M^T_n] → E[M_T] = P(T_0 < T_b) + (q/p)^b P(T_b < T_0), (2)

where the second equality follows from M^T being a martingale. By (1) and
(2),

P(T_b < T_0) = ((q/p)^a − 1)/((q/p)^b − 1)  and  P(T_0 < T_b) = ((q/p)^b − (q/p)^a)/((q/p)^b − 1). (3)
We note that P(ST = 0) = P(T0 < Tb). We also see that N has bounded
increments:
|Nn+1−Nn| = |(Sn+1−Sn)−((n+1)(p−q)−n(p−q))| ≤ |Xn+1|+|p−q| ≤ 2.
As E[T ] <∞, the optional stopping theorem entails that
a = E[N0] = E[NT ] = E[ST − T (p− q)] = bP(Tb < T0)− E[T ](p− q).
By the above, together with (3), we conclude that

E[T] = (b P(T_b < T_0) − a)/(p − q) = (b((q/p)^a − 1) − a((q/p)^b − 1)) / ((p − q)((q/p)^b − 1)).
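Formulas (3) and the expression for E[T] can be checked against the one-step conditioning equations that the hitting probability and expected duration must satisfy. The sketch below (helper names are ours) solves those equations exactly for small a and b and compares with the closed forms:

```python
from fractions import Fraction

def ruin_exact(a, b, p):
    """Solve h(s) = p h(s+1) + q h(s-1) (probability of hitting b before 0)
    and m(s) = 1 + p m(s+1) + q m(s-1) (expected duration), by propagating
    each recurrence with the value at s = 1 left as an unknown u and then
    matching the boundary condition at s = b."""
    q = 1 - p

    def solve(inhomog, target):
        # Represent each value as c0 + c1 * u, u the unknown value at s = 1.
        vals = [(Fraction(0), Fraction(0)), (Fraction(0), Fraction(1))]
        for s in range(1, b):
            c0, c1 = vals[s]
            d0, d1 = vals[s - 1]
            vals.append(((c0 - inhomog - q * d0) / p, (c1 - q * d1) / p))
        c0, c1 = vals[b]
        u = (target - c0) / c1
        return [v0 + v1 * u for v0, v1 in vals]

    h = solve(Fraction(0), Fraction(1))  # boundary values h(0) = 0, h(b) = 1
    m = solve(Fraction(1), Fraction(0))  # boundary values m(0) = 0, m(b) = 0
    return h[a], m[a]

p = Fraction(2, 3)
q = 1 - p
a, b = 3, 7
hit_b, duration = ruin_exact(a, b, p)
r = q / p
formula_hit = (r ** a - 1) / (r ** b - 1)  # P(Tb < T0), from (3)
formula_T = (b * (r ** a - 1) - a * (r ** b - 1)) / ((p - q) * (r ** b - 1))
```

Since the closed forms were derived from the same dynamics, exact rational agreement here is a strong consistency check on the algebra in (3).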
Exercise 1.8
We shall prove a version of the Azuma–Hoeffding inequality. Let us suppose
that Y is a centred random variable that takes its values in [−c, c]. The
function f_θ : [−c, c] → R defined by f_θ(y) := e^{θy} is convex for each θ ∈ R. If
−c ≤ y ≤ c, then y = λ(−c) + (1 − λ)c with λ = (c − y)/2c ∈ [0, 1]. As f_θ is
convex, f_θ(y) ≤ λ f_θ(−c) + (1 − λ) f_θ(c). That is,

e^{θy} ≤ (c − y)/(2c) · e^{−θc} + (c + y)/(2c) · e^{θc}.
It follows that, as E[Y] = 0,

E[e^{θY}] ≤ E[(c − Y)/(2c) · e^{−θc} + (c + Y)/(2c) · e^{θc}] = (e^{θc} + e^{−θc})/2 = cosh(θc).
We also have that

cosh(θc) = Σ_{n=0}^{∞} (θc)^{2n}/(2n)! ≤ Σ_{n=0}^{∞} (θc)^{2n}/(2^n · n!) = Σ_{n=0}^{∞} (1/n!) · ((θc)²/2)^n = exp((θc)²/2).
We also have a conditional analogue to this; the proof is essentially identical.
Lemma. If Y is a random variable taking values in [−c, c] and if E[Y | G] = 0
a.s. for some sub-σ-algebra G of F then, for all θ ∈ R, with probability 1,

E[e^{θY} | G] ≤ cosh(θc) ≤ exp((θc)²/2).
Next, let us suppose that M = (Mn : n ≥ 0) is a martingale such that
M0 = 0 and, for each n ≥ 1, there is some cn > 0 such that |Mn−Mn−1| ≤ cn.
By convexity, nonnegativity and Jensen's inequality, the process (e^{θM_n} : n ≥ 0)
is a nonnegative submartingale. If we take θ > 0, by Doob's submartingale
inequality we also have that, for each x > 0,

P(sup_{k≤n} M_k ≥ x) = P(sup_{k≤n} e^{θM_k} ≥ e^{θx}) ≤ e^{−θx} E[e^{θM_n}].
As e^{θM_n} is bounded for every n ≥ 0, for each n ≥ 1 we have that

E[e^{θM_n}] = E[E[e^{θ(M_n − M_{n−1})} e^{θM_{n−1}} | F_{n−1}]] = E[e^{θM_{n−1}} E[e^{θ(M_n − M_{n−1})} | F_{n−1}]]

and so, as M_n − M_{n−1} satisfies the conditions of the lemma with G := F_{n−1}
and c := c_n,

E[e^{θM_n}] ≤ e^{θ²c_n²/2} E[e^{θM_{n−1}}] ≤ · · · ≤ exp((θ²/2) Σ_{k=1}^{n} c_k²).
Therefore, for every θ > 0 and x > 0,

P(sup_{k≤n} M_k ≥ x) ≤ exp((θ²/2) Σ_{i=1}^{n} c_i² − θx).

By varying θ > 0 and using elementary calculus, we see that the right-hand
side is minimised when θ = x/Σ_{i=1}^{n} c_i² > 0. By putting this value of θ into
the above, we conclude that

P(sup_{k≤n} M_k ≥ x) ≤ exp(−x²/(2 Σ_{k=1}^{n} c_k²)).
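As a concrete sanity check (our example, not part of the exercise): for the simple symmetric random walk, c_k = 1 for every k, and the exact probability that the running maximum reaches a level x can be computed by a small absorbing dynamic programme and compared with the bound exp(−x²/2n).

```python
from fractions import Fraction
from math import exp

def prob_max_reaches(n, x):
    """Exact P(max over k <= n of S_k >= x) for a simple symmetric +-1 walk,
    computed by a DP that absorbs the walk the first time it reaches level x."""
    dist = {0: Fraction(1)}
    absorbed = Fraction(0)
    for _ in range(n):
        new = {}
        for s, pr in dist.items():
            for step in (1, -1):
                t = s + step
                if t >= x:
                    absorbed += pr / 2  # level x reached: absorb this mass
                else:
                    new[t] = new.get(t, 0) + pr / 2
        dist = new
    return absorbed

n = 30
# With c_k = 1 the Azuma-Hoeffding bound is exp(-x^2 / 2n); it should
# dominate the exact probability at every level x.
checks = [float(prob_max_reaches(n, x)) <= exp(-x * x / (2 * n))
          for x in range(1, n + 1)]
```

The bound is not tight for this walk, but it holds uniformly in x, which is exactly what the inequality promises.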
Exercise 1.9
Let us suppose that f : [0, 1] → R is Lipschitz continuous with Lipschitz
constant K and that, for each n ≥ 0, f_n is the piecewise linear function
that agrees with f on the set D_n := {k2^{−n} : 0 ≤ k ≤ 2^n}. Let us take
M_n := f'_n 1_{D_n^c}, say. By definition,

M_n = Σ_{k=0}^{2^n−1} [(f((k + 1)2^{−n}) − f(k2^{−n})) / 2^{−n}] · 1_{(k2^{−n}, (k+1)2^{−n})}.
This is a suggestive way of writing Mn, especially once we notice that
([0, 1],B([0, 1]), µ) is a probability space, where µ denotes Lebesgue measure.
For each n ≥ 0, let us define

F_n := σ(P(D_n) ∪ {(k2^{−n}, (k + 1)2^{−n}) : 0 ≤ k ≤ 2^n − 1}).
This defines a filtration on our probability space.
Lemma. The process M = (Mn : n ≥ 0) is a martingale with respect to
(Fn : n ≥ 0).
Proof. Let us fix some n ≥ 0. It is clear that M_n is F_n-measurable and
integrable as it is a linear combination of indicator functions of sets in F_n.
To show that the martingale property holds, it is enough to verify that
E[M_{n+1} 1_{(k2^{−n},(k+1)2^{−n})}] = E[M_n 1_{(k2^{−n},(k+1)2^{−n})}] for each 0 ≤ k ≤ 2^n − 1.
(This is the case as every set in F_n is a finite disjoint union of sets of this
type together with a null set, namely a subset of D_n.) This is true, however:
if 0 ≤ k ≤ 2^n − 1, then

E[M_{n+1} 1_{(k2^{−n},(k+1)2^{−n})}] = [f((2k + 1)2^{−(n+1)}) − f(k2^{−n})]/2^{−(n+1)} · 2^{−(n+1)}
  + [f((k + 1)2^{−n}) − f((2k + 1)2^{−(n+1)})]/2^{−(n+1)} · 2^{−(n+1)}
= f((k + 1)2^{−n}) − f(k2^{−n})
= E[M_n 1_{(k2^{−n},(k+1)2^{−n})}].
We will now show that f can be written as the integral of a bounded
function. As f is Lipschitz continuous with Lipschitz constant K, we have
that

|M_n| ≤ Σ_{k=0}^{2^n−1} [|f((k + 1)2^{−n}) − f(k2^{−n})| / 2^{−n}] · 1_{(k2^{−n}, (k+1)2^{−n})} ≤ K
for each n ≥ 0. It follows that M is a bounded process and hence that it
is uniformly integrable. By the UI martingale convergence theorem there
is an integrable random variable M∞ such that Mn → M∞ a.s. and in L1.
It is thus clear that |M∞| ≤ K a.s. By the L1 convergence, if n ≥ 1 and
0 ≤ k ≤ 2^n then E[M_{n+m} 1_{[0, k2^{−n})}] → E[M∞ 1_{[0, k2^{−n})}] as m → ∞, and
therefore

f(k2^{−n}) − f(0) = E[M_{n+m} 1_{[0, k2^{−n})}] → E[M∞ 1_{[0, k2^{−n})}] = ∫_0^{k2^{−n}} M∞(x) dx

as m → ∞. It follows that for all dyadic rationals q ∈ [0, 1],

f(q) = f(0) + ∫_0^q M∞(x) dx. (4)
If y ∈ [0, 1] and (q_n : n ≥ 0) is a sequence of dyadic rationals in [0, 1] such
that q_n → y, then

|f(y) − f(0) − ∫_0^y M∞(x) dx| ≤ |f(y) − f(q_n)| + |f(q_n) − f(0) − ∫_0^{q_n} M∞(x) dx| + |∫_{q_n}^y M∞(x) dx|,

which tends to 0 as n → ∞ as f is continuous and |M∞| ≤ K. We conclude
that (4) also holds when q is any element of [0, 1].
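A numerical illustration, under the assumption f(x) = |x − 1/2| (our choice of example, with K = 1): the slopes M_n are ±1, stay bounded by K, and their Riemann sums telescope to f(q) − f(0), which is the discrete form of identity (4).

```python
def f(x):
    """A concrete Lipschitz function with constant K = 1 (our example)."""
    return abs(x - 0.5)

def slope(n, x):
    """M_n(x): the slope of the level-n dyadic interpolant f_n on the
    interval (k 2^-n, (k+1) 2^-n) containing x."""
    k = int(x * 2 ** n)
    return (f((k + 1) / 2 ** n) - f(k / 2 ** n)) * 2 ** n

n = 12
# |M_n| <= K everywhere, as in the text.
bounded = all(abs(slope(n, (i + 0.5) / 2 ** n)) <= 1.0 for i in range(2 ** n))

# The Riemann sum of M_n up to a dyadic point q telescopes to f(q) - f(0).
q_checks = []
for j in range(1, 2 ** 6):
    q = j / 2 ** 6
    integral = sum(slope(n, (i + 0.5) / 2 ** n)
                   for i in range(int(q * 2 ** n))) / 2 ** n
    q_checks.append(abs(f(q) - f(0) - integral) < 1e-9)
```

Here the limit M∞ is the a.e. derivative sign(x − 1/2), so the code is computing the fundamental-theorem identity (4) on a dyadic grid.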
Exercise 1.10
We shall prove a decomposition result for submartingales due to Doob.1
Doob’s decomposition theorem. If X = (Xn : n ≥ 0) is a submartingale
then, modulo null sets, there is a unique martingale M = (Mn : n ≥ 0) and
a unique previsible process A = (An : n ≥ 0) such that A0 = 0 a.s., A is a.s.
nondecreasing and X = M +A a.s.
Proof. We will begin with a proof of uniqueness. If X has a decomposition
X = M +A a.s. of the required type then, a.s.,
E[X_{n+1} − X_n | F_n] = E[M_{n+1} − M_n | F_n] + E[A_{n+1} − A_n | F_n] = A_{n+1} − A_n.

Therefore we a.s. have that, for all n ≥ 0,

A_{n+1} = Σ_{k=0}^{n} (A_{k+1} − A_k) = Σ_{k=0}^{n} E[X_{k+1} − X_k | F_k], (5)
1There is a continuous-time analogue of this result due to Meyer.
and so A is uniquely determined (up to null sets) and hence M = X −A is
also uniquely determined (up to null sets).
For existence, let us define A0 := 0 and, for all n ≥ 0, An+1 as in (5)
and Mn := Xn − An. It is immediate from the definition of A that it is
previsible; it is equally immediate from the submartingale property that it
is a.s. nondecreasing. To see that M is a martingale we notice that, for all
n ≥ 0, Mn is Fn-measurable and integrable as it is a linear combination of
Fn-measurable and integrable functions, and that, a.s.,
E[Mn+1 −Mn | Fn] = E[Xn+1 −Xn | Fn]− E[An+1 −An | Fn] = 0
by the definition of A. The required decomposition thus exists.
Proposition. Suppose that X is a submartingale and that X = M + A a.s.
is its Doob decomposition. The processes M and A are bounded in L1 if and
only if X is bounded in L1; whenever this is the case, A∞ := lim_{n→∞} A_n < ∞ a.s.
Proof. It follows immediately from the triangle inequality that if A and M are
bounded in L1 then X is bounded in L1. Conversely, if sup_{n≥0} E[|X_n|] ≤ K
then, for each n ≥ 0,

E[|A_{n+1}|] = E[A_{n+1}] = E[Σ_{k=0}^{n} E[X_{k+1} − X_k | F_k]] = E[X_{n+1}] − E[X_0] ≤ 2K

and hence, as A_0 = 0 a.s., sup_{n≥0} E[|A_n|] ≤ 2K. As M = X − A a.s., the
triangle inequality implies that M is also bounded in L1. To close, as A is a.s.
nondecreasing, it possesses an a.s. limit A∞. If X is bounded in L1, then A is
also bounded in L1, and hence E[A∞] < ∞ by the monotone convergence
theorem; this entails that A∞ < ∞ a.s.
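A concrete instance (our example, not the exercise's): for a simple symmetric random walk S, the process X_n = S_n² is a submartingale with E[X_{n+1} − X_n | F_n] = E[2 S_n ε_{n+1} + 1 | F_n] = 1, so formula (5) gives A_n = n and hence M_n = S_n² − n. Exhaustive enumeration of paths confirms that M is centred:

```python
from fractions import Fraction
from itertools import product

# Doob decomposition of X_n = S_n^2 for a simple symmetric random walk:
# by (5), A_n = n and M_n = S_n^2 - n. We verify E[M_n] = 0 by summing
# over all 2^N equally likely +-1 paths.
N = 10
mean_X = [Fraction(0)] * (N + 1)
for steps in product((1, -1), repeat=N):
    s = 0
    for n, eps in enumerate(steps, start=1):
        s += eps
        mean_X[n] += Fraction(s * s, 2 ** N)

# M is a martingale started at 0, so E[M_n] = E[X_n] - A_n should vanish.
martingale_means = [mean_X[n] - n for n in range(N + 1)]
```

The enumeration recovers the textbook fact E[S_n²] = n, i.e. the previsible part A absorbs exactly the submartingale's drift.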
Exercise 1.11
Proposition. If X = (Xn : n ≥ 0) is a UI submartingale and X = M +A
a.s. is its Doob decomposition then M is UI.
Proof. As X is UI it is bounded in L1 and so, by the results of the previous
exercise, A must also be bounded in L1. Our work there also showed that
0 ≤ A_n ≤ A∞ ∈ L1 and hence A is also UI. We will now show that M = X − A a.s. is UI,
so let us fix some ε > 0 and let δ > 0 be such that, if P(B) < δ, then
sup_{n≥0} E[|A_n| 1_B] < ε/2 and sup_{n≥0} E[|X_n| 1_B] < ε/2. Then

sup_{n≥0} E[|M_n| 1_B] ≤ sup_{n≥0} E[|A_n| 1_B] + sup_{n≥0} E[|X_n| 1_B] < ε/2 + ε/2 = ε.

It follows that M is UI.
Proposition. If X = (Xn : n ≥ 0) is a UI submartingale and S and T are
stopping times such that T ≥ S, then E[XT | FS ] ≥ XS a.s.
Remark. You might be shaking your head here, as T needn’t be finite. This
is true. However, as X is UI, Xn → X∞ a.s. for some random variable X∞
and hence, if T (ω) =∞, we can (and do) take XT (ω) to mean X∞(ω).
Proof. As we a.s. have that X_T = A_T + M_T = (A_T − A_S) + A_S + M_T, we
also a.s. have that

E[X_T | F_S] = E[A_T − A_S | F_S] + E[A_S | F_S] + E[M_T | F_S] ≥ A_S + M_S = X_S.
In the inequality we have used, respectively, the fact that A is a.s. nonde-
creasing and T ≥ S; the fact that AS is FS-measurable; and the optional
stopping theorem for UI martingales. The only one of these facts that has
not been proven in lectures is the second, which we prove now. For each
Borel set B,

A_S^{−1}(B) = (A_∞^{−1}(B) ∩ {S = ∞}) ∪ ⋃_{n=0}^{∞} (A_n^{−1}(B) ∩ {S = n}). (6)
Let us fix some n ≥ 0. Then, for all m ≥ 0, the set A_n^{−1}(B) ∩ {S = n} ∩ {S ≤ m}
is equal to ∅ if m < n and to A_n^{−1}(B) ∩ {S = n} otherwise; it belongs to F_m
in both cases. We also see that A_∞^{−1}(B) ∩ {S = ∞} ∩ {S ≤ m} = ∅, which
is an element of every σ-algebra, and hence A_S^{−1}(B) ∈ F_S for each Borel
set B by (6). We conclude that A_S is F_S-measurable.
2 Weak Convergence
Exercise 2.1

Let us suppose that (Xn : n ≥ 1) is a sequence of iid random variables, each
uniformly distributed on [0, 1], and let us define Mn := X1 ∨ · · · ∨Xn. Let
us also define, for all n ≥ 1, Y_n := n(1 − M_n) and F_{Y_n} to be its distribution
function. Then, for all x ∈ R,

F_{Y_n}(x) = P(n(1 − M_n) ≤ x) = P(M_n ≥ 1 − x/n) = 1 − P(M_n < 1 − x/n)
           = 1 − P(∀i ≤ n, X_i < 1 − x/n)
           = 1 − (P(X_1 < 1 − x/n))^n,

where we have used the iid hypothesis in the final line. If x < 0, then
P(X_1 < 1 − x/n) = 1 and so F_{Y_n}(x) = 0 for all n ≥ 1. If x ≥ 0 then, for
all sufficiently large n, P(X_1 < 1 − x/n) = 1 − x/n and therefore (P(X_1 <
1 − x/n))^n = (1 − x/n)^n → e^{−x} as n → ∞, and hence F_{Y_n}(x) → 1 − e^{−x} as
n → ∞. It follows that

F_{Y_n}(x) → 0 if x < 0,  and  F_{Y_n}(x) → 1 − e^{−x} if x ≥ 0.

Hence Y_n tends in distribution to an exponential random variable of
parameter 1 as n → ∞.
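The convergence of distribution functions can be seen numerically: for 0 ≤ x ≤ n the exact value is F_{Y_n}(x) = 1 − (1 − x/n)^n, which is already very close to 1 − e^{−x} for large n. A sketch (function name ours):

```python
from math import exp

def F_Yn(n, x):
    """Exact distribution function of Y_n = n(1 - M_n): since M_n is the max
    of n iid Uniform[0, 1] variables, F(x) = 1 - (1 - x/n)^n for 0 <= x <= n."""
    if x < 0:
        return 0.0
    if x > n:
        return 1.0
    return 1.0 - (1.0 - x / n) ** n

# Compare with the Exp(1) distribution function on a grid of points.
xs = [0.1 * i for i in range(51)]
max_err = max(abs(F_Yn(10 ** 6, x) - (1.0 - exp(-x))) for x in xs)
```

The discrepancy is of order x²/n pointwise, so for n = 10^6 the two distribution functions agree to several decimal places across the grid.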
Exercise 2.2
Let us suppose that (Xn : n ≥ 0) is a sequence of random elements of a
metric space (M,d) and that all of these random elements are defined on
the same probability space, (Ω,F ,P).
Proposition. If Xn → X a.s. then Xn → X in distribution.
Proof. If Xn → X a.s. and f ∈ Cb(M) then, as f is continuous, f(Xn) → f(X) a.s. and so E[f(Xn)] → E[f(X)] by the dominated convergence theorem.
Proposition. If Xn → X in probability then Xn → X in distribution.
Proof. Let us suppose that Xn → X in probability and that it is not the case
that Xn → X in distribution; i.e., that there is some f ∈ Cb(M), some ε > 0
and some subsequence (n(k) : k ≥ 1) such that |E[f(Xn(k))] − E[f(X)]| ≥ ε
for all k ≥ 1. As Xn → X in probability, there is a further subsequence
(n(k(r)) : r ≥ 1) such that Xn(k(r)) → X a.s. as r → ∞. By the previous
proposition, Xn(k(r)) → X in distribution as r → ∞, which is absurd. It
therefore must be that Xn → X in distribution.
Proposition. If Xn → c in distribution for some constant c ∈ M then
Xn → c in probability.
Proof. Let us suppose that Xn → c in distribution and that B(c, ε) denotes
the (open) ball in M with centre c and radius ε > 0. By the portmanteau
lemma we see that

lim inf_{n→∞} P(Xn ∈ B(c, ε)) ≥ P(c ∈ B(c, ε)) = 1

and hence, as probabilities are at most 1, that P(Xn ∈ B(c, ε)) → 1 or,
phrased equivalently, that P(d(Xn, c) ≥ ε) → 0. As ε > 0 was arbitrary we
thus have that Xn → c in probability.
Exercise 2.3
Let us suppose that (Xn : n ≥ 0) and (Yn : n ≥ 0) are random variables. We
first remark that it need not be the case that, if Xn → X in distribution and
Yn → Y in distribution, then (Xn, Yn) → (X, Y) in distribution. For example,
we can take Xn and Z to be uniformly distributed on {−1, 1} and we can
define Yn := −Xn; in this case, both Xn and Yn tend to Z in distribution
but (Xn, Yn) does not tend to (Z, Z) in distribution. (To see this, take the
open set U = (−2, 0) × (−2, 0) and observe that P((Xn, Yn) ∈ U) = 0 for all
n ≥ 0, so that

lim inf_{n→∞} P((Xn, Yn) ∈ U) = 0 < 1/2 = P((Z, Z) ∈ U).

By the portmanteau lemma, it cannot be the case that (Xn, Yn) → (Z, Z) in
distribution.)
Proposition. Suppose that X and Y are independent random variables
defined on the probability space (Ω,F ,P) and that, for every n ≥ 0, Xn
and Yn are independent random variables defined on the probability space
(Ωn,Fn,Pn). If Xn → X and Yn → Y in distribution, (Xn, Yn)→ (X,Y ) in
distribution.
Proof. If we let φ denote the characteristic function then, for all ξ = (ξ1, ξ2) ∈ R²,

φ_{(Xn,Yn)}(ξ) = φ_{Xn}(ξ1) φ_{Yn}(ξ2) → φ_X(ξ1) φ_Y(ξ2) = φ_{(X,Y)}(ξ),

where the convergence follows from Lévy's convergence theorem and the
equalities follow from our independence assumptions. By Lévy's convergence
theorem, (Xn, Yn) → (X, Y) in distribution.
Proposition. Suppose that X,Y,X1, Y1, X2, Y2, . . . are random variables
defined on the common probability space (Ω,F ,P). If Y is a.s. constant,
Xn → X in distribution and Yn → Y in distribution, then (Xn, Yn)→ (X,Y )
in distribution.
Proof. Let us suppose that Y = c a.s., where c ∈ R is constant. It is clear
that Yn → c in distribution so, by the previous exercise, Yn → c in probability.
Let us fix some ξ = (ξ1, ξ2) ∈ R²; then

φ_{(Xn,Yn)}(ξ) = E[e^{iξ1Xn} e^{iξ2(Yn−c)}] e^{iξ2c} = E[e^{iξ1Xn}(e^{iξ2(Yn−c)} − 1)] e^{iξ2c} + φ_{Xn}(ξ1) e^{iξ2c}. (7)

Let us fix some ε > 0. By continuity, there is some δ > 0 such that |Yn − c| < δ
implies that |e^{iξ2(Yn−c)} − 1| < ε/4. As Yn → c in probability, for this δ and
ε there is some N ≥ 0 such that, for all n ≥ N, P(|Yn − c| ≥ δ) < ε/4. We
therefore see that

|E[e^{iξ1Xn}(e^{iξ2(Yn−c)} − 1)]| ≤ E[|e^{iξ2(Yn−c)} − 1|(1{|Yn−c| < δ} + 1{|Yn−c| ≥ δ})]
≤ ε/4 + 2 P(|Yn − c| ≥ δ)
≤ 3ε/4.

As Xn → X in distribution, by Lévy's convergence theorem there is some
N′ ≥ N such that, for all n ≥ N′, |φ_{Xn}(ξ1) − φ_X(ξ1)| ≤ ε/4. From (7), then,
for all n ≥ N′ ≥ N we have that

|φ_{(Xn,Yn)}(ξ) − φ_X(ξ1) e^{iξ2c}| ≤ |φ_{(Xn,Yn)}(ξ) − φ_{Xn}(ξ1) e^{iξ2c}| + |φ_{Xn}(ξ1) e^{iξ2c} − φ_X(ξ1) e^{iξ2c}|
= |E[e^{iξ1Xn}(e^{iξ2(Yn−c)} − 1)]| + |φ_{Xn}(ξ1) − φ_X(ξ1)|
≤ ε,

and so φ_{(Xn,Yn)}(ξ) → φ_X(ξ1) e^{iξ2c} as n → ∞. By Lévy's convergence theorem,
we conclude that (Xn, Yn) → (X, c) in distribution and hence that (Xn, Yn) → (X, Y) in distribution.
Remark. It is natural to consider the weak convergence of probability mea-
sures when those measures are on a given metric space, (M,d). Both proposi-
tions in this question have generalisations to this setting [?, Theorems 2.8 &
3.9]. I will develop a general proof here so that we can appreciate the change
in flavour when we attempt to generalise results to metric spaces; the proof
becomes overtly topological and ‘hands on’ when we are unable to bludgeon
claims with Levy’s cudgel.
Exercise 2.4
Proposition. All finite families of probability measures on (Rd,B(Rd)) are
tight.
Proof. Let us define Q_m := [−m, m]^d for all m ≥ 1; this is a compact set
and Q_m ↑ R^d as m → ∞. Let us suppose that µ1, . . . , µn are our probability
measures and let us fix some ε > 0. As µk(Q_m) → 1 as m → ∞, there must be
some M_k such that, for all m ≥ M_k, µk(Q_m^c) ≤ ε. It follows that

sup_{1≤k≤n} µk(Q^c_{M1∨···∨Mn}) ≤ ε

and thus, as ε > 0 was arbitrary, we conclude that our family is tight.
Proposition. If (µn : n ≥ 0) is a tight sequence of measures on (R^d, B(R^d))
with the property that sup_{n≥0} µn(R^d) < ∞, then there is some subsequence
(n(k) : k ≥ 0) such that µn(k) converges weakly to some measure, µ.
Proof. If there is a subsequence (nk : k ≥ 0) such that µn(k)(Rd) = 0 for
each k ≥ 0, then µn(k) converges weakly to the zero measure.
Let us suppose instead that there is some N ≥ 0 such that, for all
n ≥ N , µn(Rd) > 0. If it is the case that infn≥N µn(Rd) = 0, we can extract
a subsequence (nk : k ≥ 0) such that µn(k)(Rd) → 0. If f ∈ Cb(Rd), then
|µn(k)(f)| ≤ ‖f‖∞µn(k)(Rd)→ 0 as k →∞ and hence µn(k) converges weakly
to the zero measure as k →∞.
On the other hand, if it is the case that inf_{n≥N} µn(R^d) > 0, we can define
probability measures νn on (R^d, B(R^d)) by νn(A) := µn(A)/µn(R^d) for all
A ∈ B(R^d) and all n ≥ N. The sequence (νn : n ≥ N) is tight: for all
ε > 0, there is a compact set K with the property that sup_{n≥N} µn(K^c) ≤
ε · inf_{n≥N} µn(R^d), and hence

sup_{n≥N} νn(K^c) = sup_{n≥N} µn(K^c)/µn(R^d) ≤ ε.
By Prohorov’s theorem, there is a subsequence (νn(k) : k ≥ 0) of (νn : n ≥ N)
such that νn(k) converges weakly to a probability measure, ν, as k → ∞.
Further, as our assumptions imply that (µn(k)(Rd) : k ≥ 0) is a bounded
sequence, the Bolzano–Weierstrass theorem implies that there is a convergent
subsequence (µn(k(r))(Rd) : r ≥ 0). It follows therefore that, as r → ∞,
νn(k(r)) → ν weakly and µn(k(r))(R^d) → C, say. To close, we claim that
µn(k(r)) = µn(k(r))(R^d) · νn(k(r)) → C · ν =: µ weakly as r → ∞. To prove this,
we observe that if f ∈ Cb(R^d), then

µn(k(r))(f) = µn(k(r))(R^d) · νn(k(r))(f) → C · ν(f) = µ(f)

as r → ∞. We conclude that µn(k(r)) → µ weakly as r → ∞, as required.