
Convergence of Markov Processes

Amanda Turner

University of Cambridge


Contents

1 Introduction
2 The Space $D_E[0,\infty)$
  2.1 The Skorohod Topology
3 Convergence of Probability Measures
  3.1 The Prohorov Metric
  3.2 Examples
  3.3 The Skorohod Representation
4 Convergence of Finite Dimensional Distributions
5 Relative Compactness in $D_E[0,\infty)$
  5.1 Prohorov's Theorem
  5.2 Compact Sets in $D_E[0,\infty)$
  5.3 Some Useful Criteria
6 A Law of Large Numbers
  6.1 Preliminaries
  6.2 The Fluid Limit
  6.3 A Brief Look at the Exit Time
7 A Central Limit Theorem
  7.1 Relative Compactness
  7.2 Convergence of the Finite Dimensional Distributions
8 Applications
  8.1 Epidemics
  8.2 Logistic Growth
9 Conclusion


1 Introduction

This essay aims to give an account of the theory and applications of the convergence of stochastic processes, and in particular Markov processes. This is developed as a generalisation of the convergence of real-valued random variables, using ideas mainly due to Prohorov and Skorohod. Sections 2 to 5 cover the general theory, which is applied in Sections 6 to 8.

For random variables taking values in $\mathbb{R}$, there are a number of types of convergence, including almost sure convergence, convergence in probability and convergence in distribution. The first two depend on the random variables being defined on the same probability space and are consequently not sufficiently general. Convergence in distribution is weaker than the other two types of convergence in the sense that if $X_n \to X$ almost surely or in probability, then $X_n$ converges to $X$ in distribution. However, if $X_n$ converges to $X$ in distribution, then there exists a probability space on which are defined random variables $Y_n$ and $Y$ with the same distributions as $X_n$ and $X$ such that $Y_n \to Y$ almost surely, and hence in probability. In this way convergence in distribution 'incorporates' the other types of convergence.

The notion of convergence for stochastic processes, that is random variables taking values in some space of functions on $[0,\infty)$, is even less straightforward. Once again, there is almost sure convergence and convergence in probability, but one may also say that $X_n$ converges to $X$ if the finite-dimensional distributions converge, that is if for any choice $t_1, \dots, t_k$ of times, $(X^n_{t_1}, \dots, X^n_{t_k})$ converges in distribution to $(X_{t_1}, \dots, X_{t_k})$. It turns out that a direct generalisation of convergence in distribution of random variables in $\mathbb{R}$ to stochastic processes in some sense 'incorporates' all these types of convergence.

We restrict our attention to stochastic processes whose sample paths are right continuous functions with left limits at every time point $t$, also known as cadlag functions. The reason for this is that most processes which arise in applications have this property, and these functions are reasonably well behaved. In order to be able to talk about almost sure convergence and convergence in probability, we need a topology on the space of cadlag functions. It turns out to be extremely difficult to construct a topology with useful properties on this space, and Section 2 contains a very technical discussion of the construction and properties of such a topology, the Skorohod topology.

For each stochastic process $X$, there is a unique probability measure on the space of cadlag functions that characterises the distribution of $X$. As this probability measure is independent of the probability space on which $X$ is defined, it is easier to work with the probability measures to obtain results on convergence. In Section 3 we construct a metric on the space of probability measures that induces a topology equivalent to that generated by convergence in distribution of the related stochastic processes. Using this we establish a relationship between convergence in distribution, almost sure convergence and convergence in probability.

Section 4 looks at the convergence of finite dimensional distributions. This viewpoint is used to establish a key result, due to Prohorov, that states that a sequence of stochastic processes converges if and only if it is relatively compact and the corresponding finite dimensional distributions converge. This gives the required equivalence between convergence in distribution and convergence of finite dimensional distributions.

In order to apply the result from Section 4 in any practical situations, we need to have an understanding of what it means for a sequence of stochastic processes to be relatively compact. Prohorov's Theorem, discussed in Section 5, establishes equivalent conditions in terms of the compact sets on the space of cadlag functions. Characterising these compact sets in a way that can be easily applied, and putting these results together, gives some useful necessary and sufficient conditions for a family of stochastic processes to be relatively compact.

The remainder of the essay applies the general theory that has been built up so far to various cases of interest. In Section 6 the Law of Large Numbers is generalised by showing that under certain conditions a sequence of Markov jump processes can converge to the solution of an ordinary differential equation. (The sense in which this is a generalisation of the Law of Large Numbers is explained at the beginning of the section.)

This idea is developed further in Section 7, by showing that the fluctuations about this limit converge in distribution to the solution of a stochastic differential equation, which generalises the central limit theorem. Here the results from Section 4 and the characterisation of relative compactness from Section 5 are applied to prove the convergence in distribution.

Finally, the applications of the large number and central limit results to some practical situations are discussed. Particular mention is given to the application of these limit theorems to population processes in biology.

The material in Sections 2 to 5 is broadly based on the approach of Ethier and Kurtz [4]. Sections 6 and 7 cover material from the paper of Darling and Norris [3], although the application of the theorems from Section 4 and Section 5 is an extension of the results in this paper.

2 The Space $D_E[0,\infty)$

Most stochastic processes arising in applications have right and left limits at each time point for almost all sample paths. By convention we assume that the sample paths are in fact right continuous where this can be done without changing the finite-dimensional distributions. For this reason, the space of right continuous functions with left limits is of great importance, and in this section we explore its various properties and define a suitable metric on it. We conclude the section by investigating the Borel $\sigma$-algebra that results from this metric.

Although in the applications to be discussed the stochastic processes have sample paths taking values in some subset of $(\mathbb{R}^d, |\cdot|)$, where possible we establish results for processes with sample paths taking values in a general metric space. Throughout this essay we shall denote this metric space by $(E, r)$, and define $q$ to be the metric $q = r \wedge 1$.

Definition 2.1. $D_E[0,\infty)$ is the space of all right continuous functions $x : [0,\infty) \to E$ with left limits, i.e. for each $t \geq 0$, $\lim_{s \downarrow t} x(s) = x(t)$, and $\lim_{s \uparrow t} x(s) = x(t-)$ exists.

We begin with a result that shows that functions in $D_E[0,\infty)$ are fairly well behaved.

Lemma 2.2. If $x \in D_E[0,\infty)$, then $x$ has at most countably many points of discontinuity.

Proof. The set of discontinuities of $x$ is given by $\bigcup_{n=1}^\infty A_n$ where $A_n = \{t > 0 : r(x(t), x(t-)) > \frac{1}{n}\}$, so it is enough to show that each $A_n$ is countable. Suppose we have distinct points $t_1, t_2, \ldots \in A_n$ with $t_m \to t$ for some $t$, as $m \to \infty$. By restricting to a subsequence if necessary, we may assume that either $t_m \uparrow t$ or $t_m \downarrow t$. Then $\lim_{m\to\infty} x(t_m) = x(t-) = \lim_{m\to\infty} x(t_m-)$ or $\lim_{m\to\infty} x(t_m) = x(t) = \lim_{m\to\infty} x(t_m-)$, and so $r(x(t_m), x(t_m-)) < \frac{1}{n}$ for large enough $m$, contradicting $t_m \in A_n$. Therefore $A_n$ cannot contain any convergent sequences of distinct points. But for each $T > 0$, every sequence in the interval $[0, T]$ has a convergent subsequence, and so there are only a finite number of points of $A_n$ in $[0, T]$. Hence $A_n$ is countable, as required.

2.1 The Skorohod Topology

The results on convergence of probability measures that we shall prove in subsequent sections are most applicable to complete separable metric spaces. For this reason we define a metric on $D_E[0,\infty)$ under which it is separable and complete if $(E, r)$ is separable and complete. In particular, $D_{\mathbb{R}^d}[0,\infty)$ will be separable and complete.

Definition 2.3. Let $\Lambda'$ be the collection of strictly increasing functions $\lambda$ mapping $[0,\infty)$ onto $[0,\infty)$ (in particular, $\lambda(0) = 0$, $\lim_{t\to\infty} \lambda(t) = \infty$, and $\lambda$ is continuous). Let $\Lambda$ be the set of Lipschitz continuous functions $\lambda \in \Lambda'$ such that
\[
\gamma(\lambda) = \sup_{0 \leq s < t} \left| \log \frac{\lambda(t) - \lambda(s)}{t - s} \right| < \infty.
\]
For $x, y \in D_E[0,\infty)$, define
\[
d(x, y) = \inf_{\lambda \in \Lambda} \left[ \gamma(\lambda) \vee \int_0^\infty e^{-u}\, d(x, y, \lambda, u)\,du \right]
\]
where
\[
d(x, y, \lambda, u) = \sup_{t \geq 0} q(x(t \wedge u), y(\lambda(t) \wedge u)).
\]
The Skorohod topology is the topology induced on $D_E[0,\infty)$ by the metric $d$.
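To make the definition concrete, the following Python sketch (an added illustration, not part of the original essay) evaluates $\gamma(\lambda)$ for a piecewise linear time change and approximates the inner distance $d(x, y, \lambda, u)$ on a grid, taking $E = \mathbb{R}$ with $r(a, b) = |a - b|$. The step functions, the time change and the discretisation are invented for illustration; the integral over $u$ and the infimum over $\lambda$ are not computed here.

```python
import numpy as np

def gamma(slopes):
    """gamma(lambda) = sup |log lambda'| for a piecewise linear time change
    with the given (positive) slopes."""
    return max(abs(np.log(s)) for s in slopes)

def d_u(x, y, lam, u, ts):
    """sup_t q(x(t ^ u), y(lambda(t) ^ u)), approximated over the grid ts,
    with q = r ^ 1."""
    q = lambda a, b: min(abs(a - b), 1.0)
    return max(q(x(min(t, u)), y(min(lam(t), u))) for t in ts)

# Two step functions whose jumps are slightly out of alignment.
x = lambda t: 0.0 if t < 1.0 else 1.0    # jump at t = 1
y = lambda t: 0.0 if t < 1.1 else 1.0    # jump at t = 1.1

ident = lambda t: t                      # identity time change
lam = lambda t: 1.1 * t if t < 1.0 else t + 0.1   # sends 1 to 1.1

ts = np.linspace(0.0, 5.0, 2001)
print(d_u(x, y, ident, 4.0, ts))   # 1.0: without warping, the jumps clash
print(d_u(x, y, lam, 4.0, ts))     # 0.0: the time change aligns the jumps
print(gamma([1.1, 1.0]))           # log(1.1) ~ 0.095, the price of warping
```

The point of the definition shows up clearly: a small time change can absorb a misalignment of jumps at the price of a small $\gamma(\lambda)$.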

Proposition 2.4. The function $d$, defined above, is a metric on $D_E[0,\infty)$.

Proof. Suppose $(x_n)_{n\geq 1}$, $(y_n)_{n\geq 1}$ are sequences in $D_E[0,\infty)$. Then $\lim_{n\to\infty} d(x_n, y_n) = 0$ if and only if there exists a sequence $(\lambda_n)_{n\geq 1}$ in $\Lambda$ such that
\[
\lim_{n\to\infty} \gamma(\lambda_n) = 0 \tag{2.1}
\]
and
\[
\lim_{n\to\infty} \mu\{u \in [0, u_0] : d(x_n, y_n, \lambda_n, u) \geq \varepsilon\} = 0 \tag{2.2}
\]
for every $\varepsilon > 0$ and $u_0 > 0$, where $\mu$ is Lebesgue measure.

Now for all $T > 0$,
\[
\begin{aligned}
T\bigl(e^{\gamma(\lambda)} - 1\bigr) &= T\Bigl(e^{\sup_{0\leq s<t} |\log \frac{\lambda(t)-\lambda(s)}{t-s}|} - 1\Bigr) = \sup_{0\leq s<t} T\Bigl(e^{|\log \frac{\lambda(t)-\lambda(s)}{t-s}|} - 1\Bigr) \\
&\geq \sup_{0<t\leq T} T\Bigl(e^{|\log \frac{\lambda(t)}{t}|} - 1\Bigr) = \sup_{0<t\leq T} T \max\Bigl\{\frac{\lambda(t)-t}{t}, \frac{t-\lambda(t)}{\lambda(t)}\Bigr\} \\
&\geq \sup_{0<t\leq T} \max\{\lambda(t)-t,\ t-\lambda(t)\},
\end{aligned}
\]
where the first inequality follows from setting $s = 0$ and bounding $t$ by $T$, and the next line follows by considering the cases $\log\frac{\lambda(t)}{t} \geq 0$ and $< 0$ separately. This gives us
\[
\sup_{0\leq t\leq T} |\lambda(t) - t| \leq T\bigl(e^{\gamma(\lambda)} - 1\bigr), \tag{2.3}
\]
by which (2.1) implies
\[
\lim_{n\to\infty} \sup_{0\leq t\leq T} |\lambda_n(t) - t| = 0 \tag{2.4}
\]
for all $T > 0$.

Now suppose $d(x, y) = 0$. Then setting $x_n = x$ and $y_n = y$ for all $n \in \mathbb{N}$, (2.2) and (2.4) imply that $x(t) = y(t)$ for almost all continuity points $t$ of $y$. But by Lemma 2.2, $y$ has at most countably many points of discontinuity, and so $x(t) = y(t)$ for almost all points $t$ and, as $x$ and $y$ are right continuous, $x = y$.

Let $x, y \in D_E[0,\infty)$. Then since $\lambda$ is bijective on $[0,\infty)$,
\[
\sup_{t\geq 0} q(x(t \wedge u), y(\lambda(t) \wedge u)) = \sup_{t\geq 0} q(x(\lambda^{-1}(t) \wedge u), y(t \wedge u))
\]
for all $\lambda \in \Lambda$ and $u \geq 0$, and so $d(x, y, \lambda, u) = d(y, x, \lambda^{-1}, u)$. Also
\[
\begin{aligned}
\gamma(\lambda) &= \sup_{0\leq s<t} \left|\log \frac{\lambda(t)-\lambda(s)}{t-s}\right| = \sup_{0\leq s<t} \left|\log \frac{\lambda(\lambda^{-1}(t)) - \lambda(\lambda^{-1}(s))}{\lambda^{-1}(t) - \lambda^{-1}(s)}\right| \\
&= \sup_{0\leq s<t} \left|\log \frac{t-s}{\lambda^{-1}(t) - \lambda^{-1}(s)}\right| = \sup_{0\leq s<t} \left|\log \frac{\lambda^{-1}(t) - \lambda^{-1}(s)}{t-s}\right| = \gamma(\lambda^{-1})
\end{aligned}
\]
for every $\lambda \in \Lambda$, and so $d(x, y) = d(y, x)$.

To show that d is a metric it only remains to check the triangle inequality.

Let $x, y, z \in D_E[0,\infty)$, $\lambda_1, \lambda_2 \in \Lambda$, and $u \geq 0$. Then
\[
\begin{aligned}
\sup_{t\geq 0} q(x(t \wedge u), z(\lambda_2(\lambda_1(t)) \wedge u)) &\leq \sup_{t\geq 0} q(x(t \wedge u), y(\lambda_1(t) \wedge u)) + \sup_{t\geq 0} q(y(\lambda_1(t) \wedge u), z(\lambda_2(\lambda_1(t)) \wedge u)) \\
&= \sup_{t\geq 0} q(x(t \wedge u), y(\lambda_1(t) \wedge u)) + \sup_{t\geq 0} q(y(t \wedge u), z(\lambda_2(t) \wedge u)),
\end{aligned}
\]
i.e. $d(x, z, \lambda_2 \circ \lambda_1, u) \leq d(x, y, \lambda_1, u) + d(y, z, \lambda_2, u)$. But since $\lambda_2 \circ \lambda_1 \in \Lambda$ and
\[
\begin{aligned}
\gamma(\lambda_2 \circ \lambda_1) &= \sup_{0\leq s<t} \left|\log \frac{\lambda_2(\lambda_1(t)) - \lambda_2(\lambda_1(s))}{t-s}\right| = \sup_{0\leq s<t} \left|\log \frac{\lambda_2(\lambda_1(t)) - \lambda_2(\lambda_1(s))}{\lambda_1(t) - \lambda_1(s)} + \log \frac{\lambda_1(t) - \lambda_1(s)}{t-s}\right| \\
&\leq \sup_{0\leq s<t} \left|\log \frac{\lambda_2(\lambda_1(t)) - \lambda_2(\lambda_1(s))}{\lambda_1(t) - \lambda_1(s)}\right| + \sup_{0\leq s<t} \left|\log \frac{\lambda_1(t) - \lambda_1(s)}{t-s}\right| = \gamma(\lambda_2) + \gamma(\lambda_1),
\end{aligned}
\]
we obtain $d(x, z) \leq d(x, y) + d(y, z)$ as required.

It is not very clear from the definition of the Skorohod topology under which conditions sequences in $D_E[0,\infty)$ converge. The following two propositions establish some necessary and sufficient conditions for convergence in $D_E[0,\infty)$ which are slightly easier to grasp intuitively.

Proposition 2.5. Let $(x_n)_{n\geq 1}$ and $x$ be in $D_E[0,\infty)$. Then $\lim_{n\to\infty} d(x_n, x) = 0$ if and only if there exists a sequence $(\lambda_n)_{n\geq 1}$ in $\Lambda$ such that (2.1) holds and
\[
\lim_{n\to\infty} d(x_n, x, \lambda_n, u) = 0 \quad \text{for all continuity points } u \text{ of } x. \tag{2.5}
\]
In particular, $\lim_{n\to\infty} d(x_n, x) = 0$ implies that $\lim_{n\to\infty} x_n(u) = \lim_{n\to\infty} x_n(u-) = x(u)$ for all continuity points $u$ of $x$.

Proof. By Lemma 2.2, $x$ has only countably many discontinuity points and so, by the reverse Fatou lemma and (2.5),
\[
\begin{aligned}
\lim_{n\to\infty} \mu\{u \in [0,u_0] : d(x_n, x, \lambda_n, u) \geq \varepsilon\} &= \limsup_{n\to\infty} \int 1_{\{u \in [0,u_0] : d(x_n, x, \lambda_n, u) \geq \varepsilon\}}\,d\mu \\
&\leq \int 1_{\{u \in [0,u_0] : \limsup_{n\to\infty} d(x_n, x, \lambda_n, u) \geq \varepsilon\}}\,d\mu \\
&\leq \mu\{u \in [0,u_0] : u \text{ is a discontinuity point of } x\} = 0,
\end{aligned}
\]
i.e. (2.2) holds, and so the conditions are sufficient.

Conversely, suppose that $\lim_{n\to\infty} d(x_n, x) = 0$ and that $u$ is a continuity point of $x$. Then there exists a sequence $(\lambda_n)_{n\geq 1}$ in $\Lambda$ such that (2.1) holds, and (2.2) holds with $y_n = x$ for all $n$. In particular, there exists an increasing sequence $(N_m)_{m\geq 1}$ such that for all $n \geq N_m$,
\[
\mu\Bigl\{v \in (u, u+1] : d(x_n, x, \lambda_n, v) < \frac{1}{m}\Bigr\} > 0. \tag{2.6}
\]
Hence, for each $N_m \leq n < N_{m+1}$, there exists a $u_n \in (u, u+1]$ such that $d(x_n, x, \lambda_n, u_n) < \frac{1}{m}$. By picking arbitrary values of $u_n \in (u, u+1]$ for $n < N_1$, we obtain
\[
\lim_{n\to\infty} \sup_{t\geq 0} q(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) = \lim_{n\to\infty} d(x_n, x, \lambda_n, u_n) = 0. \tag{2.7}
\]

Now
\[
\begin{aligned}
d(x_n, x, \lambda_n, u) = \sup_{t\geq 0} q(x_n(t \wedge u), x(\lambda_n(t) \wedge u)) &\leq \sup_{t\geq 0} q(x_n(t \wedge u), x(\lambda_n(t \wedge u) \wedge u_n)) \\
&\quad + \sup_{t\geq 0} q(x(\lambda_n(t \wedge u) \wedge u_n), x(\lambda_n(t) \wedge u)).
\end{aligned}
\]
But
\[
\begin{aligned}
\sup_{t\geq 0} q(x(\lambda_n(t \wedge u) \wedge u_n), x(\lambda_n(t) \wedge u)) &= \sup_{0\leq t\leq u} q(x(\lambda_n(t) \wedge u_n), x(\lambda_n(t) \wedge u)) \vee \sup_{t>u} q(x(\lambda_n(u) \wedge u_n), x(\lambda_n(t) \wedge u)) \\
&= \sup_{u\leq s\leq \lambda_n(u)\vee u} q(x(s), x(u)) \vee \sup_{\lambda_n(u)\wedge u < s \leq u} q(x(\lambda_n(u) \wedge u_n), x(s)),
\end{aligned}
\]
where the last equality is obtained by setting $s = \lambda_n(t) \wedge u_n$ in the first term and $s = \lambda_n(t) \wedge u$ in the second term. Hence
\[
d(x_n, x, \lambda_n, u) \leq \sup_{0\leq t\leq u} q(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) + \sup_{u\leq s\leq \lambda_n(u)\vee u} q(x(s), x(u)) \vee \sup_{\lambda_n(u)\wedge u < s \leq u} q(x(\lambda_n(u) \wedge u_n), x(s)) \tag{2.8}
\]
for each $n$. Thus $\lim_{n\to\infty} d(x_n, x, \lambda_n, u) = 0$ by (2.7), (2.4), and the continuity of $x$ at $u$.


Proposition 2.6. Let $(x_n)_{n\geq 1}$ and $x$ be in $D_E[0,\infty)$. Then $\lim_{n\to\infty} d(x_n, x) = 0$ if and only if there exists a sequence $(\lambda_n)_{n\geq 1}$ in $\Lambda$ such that (2.1) holds and
\[
\lim_{n\to\infty} \sup_{0\leq t\leq T} r(x_n(t), x(\lambda_n(t))) = 0 \tag{2.9}
\]
for all $T > 0$.

Remark 2.7. The above proposition is equivalent to one with (2.9) replaced by
\[
\lim_{n\to\infty} \sup_{0\leq t\leq T} r(x_n(\lambda_n(t)), x(t)) = 0. \tag{2.10}
\]

Proof. Suppose $\lim_{n\to\infty} d(x_n, x) = 0$. Then there exists a sequence $(\lambda_n)_{n\geq 1}$ in $\Lambda$ such that (2.1) holds and (2.2) holds with $y_n = x$ for all $n$. In particular, by (2.6) with $u = m$, there exists a sequence $(u_n)_{n\geq 1} \subset (0,\infty)$ with $u_n \to \infty$ and $d(x_n, x, \lambda_n, u_n) \to 0$, i.e.
\[
\lim_{n\to\infty} \sup_{t\geq 0} r(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) = 0.
\]
But given $T > 0$, $u_n \geq T \vee \lambda_n(T)$ for sufficiently large $n$ (using (2.4)), and so the above equation implies (2.9).

Conversely, suppose there exists a sequence $(\lambda_n)_{n\geq 1}$ in $\Lambda$ such that (2.1) and (2.9) hold. Then for every continuity point $u$ of $x$,
\[
\sup_{0\leq t\leq u} q(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) = \sup_{0\leq t\leq u} q(x_n(t), x(\lambda_n(t))) \leq \sup_{0\leq t\leq u} r(x_n(t), x(\lambda_n(t)))
\]
for all $n$ large enough that $u_n > \lambda_n(u) \vee u$. And so, by (2.8) and the right continuity of $x$, (2.5) holds. The result follows by Proposition 2.5.

Two simple examples of sequences that converge in $(D_E[0,\infty), d)$ may give some insight into why the metric $d$ is defined as above:

Example 2.8. Suppose $(x_n)_{n\geq 1}$ is a sequence in $(D_E[0,\infty), d)$ such that $x_n \to x$ locally uniformly for some $x \in (D_E[0,\infty), d)$. Then $\lim_{n\to\infty} \sup_{0\leq t\leq T} r(x_n(t), x(t)) = 0$ and so, taking $\lambda_n(t) = t$ for all $n$ in Proposition 2.6, $x_n \to x$ with respect to $d$. Hence the Skorohod topology is weaker than the (locally) uniform topology.

Example 2.9. Define a sequence $(x_n)_{n\geq 1}$ in $(D_E[0,\infty), d)$ by $x_n(s) = \alpha_n 1_{\{t_n \leq s\}}$, where $\alpha_n \in E$ and $t_n \in [0,\infty)$. Suppose that $t_n \to t$ and $\alpha_n \to \alpha$ for some $t \in [0,\infty)$ and $\alpha \in E$. Intuitively one would expect $x_n \to x$ where $x(s) = \alpha 1_{\{t \leq s\}}$. However, in the locally uniform topology this is not always the case; for example if $t < t_n$ for all $n$ and $\alpha \neq 0$, then $x_n(t) = 0$ for all $n$, whereas $x(t) = \alpha$. In other words, the locally uniform topology is too strong. The functions $\lambda_n$ are introduced to allow small perturbations around the points of discontinuity of $x$. In this example, if $t \neq 0$, then $t_n > 0$ for sufficiently large $n$ and so we may set $\lambda_n(s) = \frac{t}{t_n} s$. Then $\lim_{n\to\infty} \gamma(\lambda_n) = \lim_{n\to\infty} \bigl|\log\bigl(\frac{t}{t_n}\bigr)\bigr| = 0$ and
\[
\sup_{0\leq s\leq T} r(x_n(s), x(\lambda_n(s))) = \sup_{0\leq s\leq T} r(\alpha_n 1_{\{t_n \leq s\}}, \alpha 1_{\{t_n \leq s\}}) \to 0.
\]

By Proposition 2.6, xn → x, as intuitively expected.
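The quantities in Example 2.9 are easy to compute numerically. In the Python sketch below (an added illustration with $E = \mathbb{R}$; the values $t = \alpha = 1$ and the grid are assumptions chosen for concreteness), the locally uniform distance stays near 1 while the time-changed distance and $\gamma(\lambda_n)$ both vanish.

```python
import numpy as np

# x_n(s) = alpha_n * 1_{t_n <= s} with t_n -> t and alpha_n -> alpha.
t, alpha = 1.0, 1.0
s_grid = np.linspace(0.0, 3.0, 3001)
x = (s_grid >= t) * alpha

for n in [10, 100, 1000]:
    tn, an = t + 1.0 / n, alpha + 1.0 / n
    xn = (s_grid >= tn) * an
    # Locally uniform distance: stuck near |alpha| because of the jump.
    uniform_dist = np.max(np.abs(xn - x))
    # Time change lambda_n(s) = (t / t_n) s aligns the jumps, so
    # x(lambda_n(s)) = alpha * 1_{t_n <= s} and the distance is |alpha_n - alpha|.
    x_warped = (s_grid * (t / tn) >= t) * alpha
    warped_dist = np.max(np.abs(xn - x_warped))
    gamma_n = abs(np.log(t / tn))
    print(n, uniform_dist, warped_dist, gamma_n)
# uniform_dist stays ~1, while warped_dist -> 0 and gamma_n -> 0.
```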

We are now ready to prove that $(D_E[0,\infty), d)$ has the required properties. Note that while separability is a topological property, completeness is a property of the metric.

Theorem 2.10. If $E$ is separable, then $D_E[0,\infty)$ is separable. If the metric space $(E, r)$ is complete, then $(D_E[0,\infty), d)$ is complete.


Proof. Since $E$ is separable, there exists a countable dense subset $\{\alpha_1, \alpha_2, \ldots\} \subset E$. Let $\Gamma$ be the countable collection of elements of $D_E[0,\infty)$ of the form
\[
y(t) = \begin{cases} \alpha_{i_k}, & t_{k-1} \leq t < t_k, \ k = 1, \ldots, n, \\ \alpha_{i_n}, & t \geq t_n, \end{cases} \tag{2.11}
\]
where $0 = t_0 < t_1 < \cdots < t_n$ are rationals, $i_1, \ldots, i_n$ are positive integers and $n \geq 1$. We shall show that $\Gamma$ is dense in $D_E[0,\infty)$. Given $\varepsilon > 0$, let $x \in D_E[0,\infty)$ and let $T \in \mathbb{N}$ be sufficiently large that $e^{-T} < \frac{\varepsilon}{2}$. By Lemma 2.2, $x$ has only finitely many points of discontinuity in the interval $(0, T)$, at $s_1, \ldots, s_m$ say, where $0 = s_0 < s_1 < \cdots < s_m < s_{m+1} = T$. Since $x$ has left limits, for each $j = 0, \ldots, m$, $x$ is uniformly continuous on the interval $[s_j, s_{j+1})$ and so there exists some $0 < \delta_j < s_{j+1} - s_j$ such that if $s, t \in [s_j, s_{j+1})$ and $|s - t| < \delta_j$, then $r(x(s), x(t)) < \frac{\varepsilon}{4}$.

Let $n \geq \frac{3T}{\delta_0 \wedge \cdots \wedge \delta_m}$ and let $t_k = \frac{kT}{n}$ for $k = 0, \ldots, n$. For each $k = 0, \ldots, n-1$, there exists some positive integer $i_k$ such that $r(\alpha_{i_k}, x(t_k)) < \frac{\varepsilon}{4}$. Define $y$ as in (2.11). For each $j = 1, \ldots, m$ for which $s_j$ is not one of the $t_k$, there exists some $k_j$ such that $s_j \in (t_{k_j}, t_{k_j+1})$. Let
\[
t'_{k_j} = \frac{s_j - e^{-\varepsilon} t_{k_j+1}}{1 - e^{-\varepsilon}} \in (t_{k_j - 1}, s_j), \qquad t'_{k_j+1} = \frac{s_j - e^{\varepsilon} t_{k_j+1}}{1 - e^{\varepsilon}} \in (t_{k_j+1}, t_{k_j+2}).
\]
Note that, by the definition of $n$, $|k_j - k_i| \geq 3$ for all $i \neq j$ and so the $t'_{k_j}$ are strictly increasing. Let $\lambda \in \Lambda$ be the strictly increasing piecewise linear function joining $(t'_{k_j}, t'_{k_j})$ to $(t_{k_j+1}, s_j)$ to $(t'_{k_j+1}, t'_{k_j+1})$ for those $j = 1, \ldots, m$ for which $s_j$ is not one of the $t_k$, with gradient 1 otherwise. Each linear piece of $\lambda$ then has gradient $e^{-\varepsilon}$, $e^{\varepsilon}$ or 1, and so $\gamma(\lambda) \leq \varepsilon$. Moreover, if $t < T$, then $r(x(t), y(\lambda(t))) \leq r(x(t), x(t_k)) + r(x(t_k), \alpha_{i_k}) < \frac{\varepsilon}{2}$ for some $k$ with $0 \leq t - t_k < \delta_0 \wedge \cdots \wedge \delta_m$. Hence, by the definition of $d$,
\[
d(x, y) \leq \varepsilon \vee \Bigl(\frac{\varepsilon}{2} + e^{-T}\Bigr) \leq \varepsilon,
\]
and so $\Gamma$ is dense in $D_E[0,\infty)$.

To prove completeness, suppose that $(x_n)_{n\geq 1}$ is a Cauchy sequence in $D_E[0,\infty)$. There exist $1 \leq N_1 < N_2 < \cdots$ such that $m, n \geq N_k$ implies
\[
d(x_n, x_m) \leq 2^{-k-1} e^{-k}.
\]
Then if $y_k = x_{N_k}$ for $k = 1, 2, \ldots$, there exist $u_k > k$ and $\lambda_k \in \Lambda$ such that
\[
\gamma(\lambda_k) \vee d(y_k, y_{k+1}, \lambda_k, u_k) \leq 2^{-k}.
\]
Let $\mu_{n,k} = \lambda_{k+n} \circ \cdots \circ \lambda_k$. Then $\gamma(\mu_{n,k}) \leq \sum_{j=k}^{k+n} \gamma(\lambda_j) \leq 2^{-k+1}$. By a similar argument to that used to prove (2.3), for any $\mu, \lambda \in \Lambda$,
\[
e^{\gamma(\mu)} \geq \sup_{t \neq \lambda(t)} \left|\frac{\mu(\lambda(t)) - \mu(t)}{\lambda(t) - t}\right|,
\]
and so
\[
\sup_{0\leq t\leq T} |\mu(\lambda(t)) - \mu(t)| \leq \sup_{0\leq t\leq T} |\lambda(t) - t|\, e^{\gamma(\mu)}.
\]
Hence, if $n \geq m$, then
\[
\sup_{0\leq t\leq T} |\mu_{n,k}(t) - \mu_{m,k}(t)| \leq \sup_{0\leq t\leq T} |\mu_{n-m-1,\,k+m+1}(t) - t|\, e^{\gamma(\mu_{m,k})} \leq T\bigl(e^{\gamma(\mu_{n-m-1,\,k+m+1})} - 1\bigr) e^{2^{-k+1}} \leq T\bigl(e^{2^{-k-m}} - 1\bigr) e^{2^{-k+1}} \to 0
\]
as $m \to \infty$, where the second inequality follows by (2.3). Hence $\mu_{n,k}$ converges uniformly on compact intervals to a strictly increasing, Lipschitz continuous function $\mu_k$, with $\gamma(\mu_k) \leq 2^{-k+1}$, i.e. $\mu_k \in \Lambda$. Now
\[
\begin{aligned}
\sup_{t\geq 0} q(y_k(\mu_k^{-1}(t) \wedge u_k), y_{k+1}(\mu_{k+1}^{-1}(t) \wedge u_k)) &= \sup_{t\geq 0} q(y_k(\mu_k^{-1}(t) \wedge u_k), y_{k+1}(\lambda_k(\mu_k^{-1}(t)) \wedge u_k)) \\
&= \sup_{t\geq 0} q(y_k(t \wedge u_k), y_{k+1}(\lambda_k(t) \wedge u_k)) = d(y_k, y_{k+1}, \lambda_k, u_k) \leq 2^{-k}
\end{aligned}
\]
for $k = 1, 2, \ldots$. Since $(E, r)$ is complete, $z_k = y_k \circ \mu_k^{-1} \in D_E[0,\infty)$ converges uniformly on bounded intervals to a function $y : [0,\infty) \to E$. As each $z_k \in D_E[0,\infty)$, $y \in D_E[0,\infty)$ (taking locally uniform limits preserves right continuity and the existence of left limits). Now $\lim_{k\to\infty} \gamma(\mu_k^{-1}) = \lim_{k\to\infty} \gamma(\mu_k) = 0$ and
\[
\lim_{k\to\infty} \sup_{0\leq t\leq T} r(y_k(\mu_k^{-1}(t)), y(t)) = \lim_{k\to\infty} \sup_{0\leq t\leq T} r(z_k(t), y(t)) = 0
\]
for all $T > 0$, and so, by Proposition 2.6 (using Remark 2.7), $\lim_{k\to\infty} d(y_k, y) = 0$. Hence $(D_E[0,\infty), d)$ is complete.

In order to study Borel probability measures on $D_E[0,\infty)$ it is important to know more about $\mathcal{S}_E$, the Borel $\sigma$-algebra of $D_E[0,\infty)$. The following result states that $\mathcal{S}_E$ is just the $\sigma$-algebra generated by the coordinate random variables.

Proposition 2.11. For each $t \geq 0$, define $\pi_t : D_E[0,\infty) \to E$ by $\pi_t(x) = x(t)$. Then
\[
\mathcal{S}_E \supset \mathcal{S}'_E = \sigma(\pi_t : 0 \leq t < \infty).
\]
If $E$ is separable, then $\mathcal{S}_E = \mathcal{S}'_E$.

Proof. For each $\varepsilon > 0$, $t \geq 0$, and $f \in C(E)$, the space of real-valued bounded continuous functions, define
\[
f^\varepsilon_t(x) = \frac{1}{\varepsilon} \int_t^{t+\varepsilon} f(\pi_s(x))\,ds.
\]
Now suppose $(x_n)_{n\geq 1}$ is a sequence in $D_E[0,\infty)$ converging to $x$. Then there exists a sequence $(\lambda_n)_{n\geq 1}$ in $\Lambda$ such that (2.1) and (2.10) hold. Then
\[
f^\varepsilon_t(x_n) = \frac{1}{\varepsilon} \int f(x_n(\lambda_n(s)))\, 1_{\{\lambda_n(t) \leq s \leq \lambda_n(t+\varepsilon)\}}\, \lambda'_n(s)\,ds \to \frac{1}{\varepsilon} \int f(x(s))\, 1_{\{t \leq s \leq t+\varepsilon\}}\,ds = f^\varepsilon_t(x),
\]
by dominated convergence, since $x_n(\lambda_n(s)) \to x(s)$ uniformly on bounded intervals, $f$ is bounded and continuous, $\lambda_n(s) \to s$ uniformly on bounded intervals, and $\gamma(\lambda_n) \to 0$ implies $\lambda'_n(s) \to 1$ almost everywhere. Hence $f^\varepsilon_t$ is a continuous function on $D_E[0,\infty)$ and so is Borel measurable. As $\lim_{\varepsilon \downarrow 0} f^\varepsilon_t(x) = f(\pi_t(x))$ for every $x \in D_E[0,\infty)$, $f \circ \pi_t$ is Borel measurable for every $f \in C(E)$, and hence for every bounded measurable function $f$. In particular,
\[
\pi_t^{-1}(\Gamma) = \{x \in D_E[0,\infty) : 1_\Gamma(\pi_t(x)) = 1\} \in \mathcal{S}_E
\]
for all Borel subsets $\Gamma \subset E$, and hence $\mathcal{S}_E \supset \mathcal{S}'_E$.

Assume now that $E$ is separable. Let $n \geq 1$, let $0 = t_0 < t_1 < \cdots < t_n < t_{n+1} = \infty$, and for $\alpha_0, \alpha_1, \ldots, \alpha_n \in E$ define $\eta(\alpha_0, \alpha_1, \ldots, \alpha_n) \in D_E[0,\infty)$ by
\[
\eta(\alpha_0, \alpha_1, \ldots, \alpha_n)(t) = \alpha_i, \quad t_i \leq t < t_{i+1}, \ i = 0, 1, \ldots, n.
\]


Now
\[
d(\eta(\alpha_0, \alpha_1, \ldots, \alpha_n), \eta(\alpha'_0, \alpha'_1, \ldots, \alpha'_n)) \leq \max_{0\leq i\leq n} r(\alpha_i, \alpha'_i),
\]
and so $\eta$ is a continuous function from $E^{n+1}$ into $D_E[0,\infty)$. Since $E$ is separable, $E^{n+1}$ is separable and so there exists a countable dense subset $A \subset E^{n+1}$. For fixed $z \in D_E[0,\infty)$ and $\varepsilon > 0$, by the continuity of $\eta$, the set $\Gamma = \{a \in E^{n+1} : d(z, \eta(a)) < \varepsilon\}$ is open and, since $A$ is dense, can be written as a countable union of balls centred at points of $A$; it is therefore a measurable subset of $E^{n+1}$ with respect to the Borel product $\sigma$-algebra. So, since each $\pi_t$ is $\mathcal{S}'_E$-measurable, $d(z, \eta(\pi_{t_0}, \pi_{t_1}, \ldots, \pi_{t_n}))$ is an $\mathcal{S}'_E$-measurable function from $D_E[0,\infty)$ into $\mathbb{R}$. For $m = 1, 2, \ldots$, define $\eta_m$ as $\eta$ but with $n = m^2$ and $t_i = \frac{i}{m}$, $i = 0, 1, \ldots$. By an identical argument to that in the proof of the separability of $D_E[0,\infty)$ in Theorem 2.10, $d(x, \eta_m(\pi_{t_0}(x), \pi_{t_1}(x), \ldots, \pi_{t_n}(x))) \to 0$ as $m \to \infty$, and hence
\[
\lim_{m\to\infty} |d(z, \eta_m(\pi_{t_0}(x), \pi_{t_1}(x), \ldots, \pi_{t_n}(x))) - d(z, x)| \leq \lim_{m\to\infty} d(x, \eta_m(\pi_{t_0}(x), \pi_{t_1}(x), \ldots, \pi_{t_n}(x))) = 0
\]
for every $x \in D_E[0,\infty)$. Therefore $d(z, x) = \lim_{m\to\infty} d(z, \eta_m(\pi_{t_0}(x), \pi_{t_1}(x), \ldots, \pi_{t_n}(x)))$ is $\mathcal{S}'_E$-measurable in $x$ for fixed $z \in D_E[0,\infty)$, and in particular every open ball $B(z, \varepsilon) = \{x \in D_E[0,\infty) : d(z, x) < \varepsilon\}$ belongs to $\mathcal{S}'_E$. Since $E$ (and, by Theorem 2.10, $D_E[0,\infty)$) is separable, $\mathcal{S}'_E$ contains all the open sets in $D_E[0,\infty)$ and hence contains $\mathcal{S}_E$.

3 Convergence of Probability Measures

In order to study the convergence of the distributions of stochastic processes, it is necessary to understand the probability measures that characterise these. In this section we construct a metric on the space of probability measures corresponding to the convergence, in distribution, of the stochastic processes. Using this, we establish a relationship between convergence in distribution and convergence in probability of processes defined on a common probability space. This result is applied to sequences of Markov chains and diffusion processes to obtain some simple conditions for convergence.

Where possible, results are proved for probability measures on a general metric space $(S, d)$. However, in practice we generally take $S = D_E[0,\infty)$, and in particular $D_{\mathbb{R}^d}[0,\infty)$, with the metric $d$ defined in the previous section.

Notation 3.1. For a metric space $(S, d)$:

$\mathcal{B}(S)$ is the $\sigma$-algebra of Borel subsets of $S$;

$\mathcal{P}(S)$ is the family of Borel probability measures on $S$;

$C(S)$ is the space of real-valued bounded continuous functions on $(S, d)$, with norm $\|f\| = \sup_{x\in S} |f(x)|$;

$\mathcal{C}$ is the collection of closed subsets of $S$;

$F^\varepsilon = \{x \in S : \inf_{y\in F} d(x, y) < \varepsilon\}$, where $F \subset S$.

Definition 3.2. A sequence $(P_n)_{n\geq 1}$ in $\mathcal{P}(S)$ is said to converge weakly to $P \in \mathcal{P}(S)$ (denoted $P_n \Rightarrow P$) if
\[
\lim_{n\to\infty} \int f\,dP_n = \int f\,dP \quad \text{for all } f \in C(S).
\]
The distribution of an $S$-valued random variable $X$, denoted by $PX^{-1}$, is the element of $\mathcal{P}(S)$ given by $PX^{-1}(B) = \mathbb{P}(X \in B)$, where $\mathbb{P}$ is the probability measure on the probability space underlying $X$.


A sequence $(X_n)_{n\geq 1}$ of $S$-valued random variables is said to converge in distribution to the $S$-valued random variable $X$ if $PX_n^{-1} \Rightarrow PX^{-1}$, or equivalently, if
\[
\lim_{n\to\infty} \mathbb{E}(f(X_n)) = \mathbb{E}(f(X)) \quad \text{for all } f \in C(S).
\]
This is denoted by $X_n \Rightarrow X$.

Remark 3.3. Note that this is a direct generalisation of the definition of convergence in distribution of a sequence of real-valued random variables $(X_n)_{n\geq 1}$, where we say that $X_n \Rightarrow X$ if $\lim_{n\to\infty} \mathbb{E}(f(X_n)) = \mathbb{E}(f(X))$ for all $f \in C(\mathbb{R})$.

3.1 The Prohorov Metric

We now define a metric $\rho$ on $\mathcal{P}(S)$ with the property that a sequence of probability measures converges with respect to $\rho$ if and only if it converges weakly.

Definition 3.4. For $P$ and $Q \in \mathcal{P}(S)$ the Prohorov metric is defined by
\[
\rho(P, Q) = \inf\{\varepsilon > 0 : P(F) \leq Q(F^\varepsilon) + \varepsilon \text{ for all } F \in \mathcal{C}\},
\]
using the notation defined in 3.1.
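The definition can be explored numerically. The brute-force Python sketch below (an added illustration; the two measures are invented) computes $\rho(P, Q)$ for finitely supported measures on $\mathbb{R}$: for such a $P$ one has $P(F) = P(F \cap \operatorname{supp} P)$ and $F \cap \operatorname{supp} P \subset F$, so it suffices to let $F$ range over subsets of $\operatorname{supp} P$, and the condition in the infimum is monotone in $\varepsilon$, which permits bisection.

```python
from itertools import chain, combinations

def prohorov(P, Q, tol=1e-9):
    """Prohorov distance between finitely supported measures on R,
    given as dicts point -> mass (masses summing to 1)."""
    def ok(eps):
        # Check P(F) <= Q(F^eps) + eps for every F subset of supp(P).
        pts = list(P)
        for F in chain.from_iterable(combinations(pts, k)
                                     for k in range(1, len(pts) + 1)):
            pF = sum(P[x] for x in F)
            qFeps = sum(m for y, m in Q.items()
                        if min(abs(y - x) for x in F) < eps)
            if pF > qFeps + eps + tol:
                return False
        return True
    lo, hi = 0.0, 1.0          # the condition always holds at eps = 1
    for _ in range(60):        # bisect on the monotone condition
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

P = {0.0: 0.5, 1.0: 0.5}
Q = {0.1: 0.5, 1.0: 0.3, 2.0: 0.2}
print(prohorov(P, Q))   # ~0.2, forced by F = {1}: mass 0.5 vs 0.3 nearby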

In order to prove that ρ is a metric, the following lemma is needed.

Lemma 3.5. Let $P, Q \in \mathcal{P}(S)$ and $\alpha, \beta > 0$. If
\[
P(F) \leq Q(F^\alpha) + \beta \tag{3.1}
\]
for all $F \in \mathcal{C}$, then
\[
Q(F) \leq P(F^\alpha) + \beta \tag{3.2}
\]
for all $F \in \mathcal{C}$.

Proof. Suppose $F \in \mathcal{C}$. $F^\alpha$ is open, since if $x \in F^\alpha$, then there exists $y \in F$ such that $d(x, y) < \alpha$; then $d(z, y) \leq d(z, x) + d(x, y) < \alpha$ for all $z \in B(x, \alpha - d(x, y))$, and so $B(x, \alpha - d(x, y)) \subset F^\alpha$. Hence, if $G = S \setminus F^\alpha$, then $G \in \mathcal{C}$. Moreover $F \subset S \setminus G^\alpha$, since if $x \in G^\alpha$, then there exists some $y \notin F^\alpha$ such that $d(x, y) < \alpha$; then $d(y, z) \geq \alpha$ for all $z \in F$, and so $d(x, z) \geq d(y, z) - d(x, y) > 0$ for all $z \in F$, i.e. $x \notin F$. Substituting $G$ into (3.1) gives
\[
P(F^\alpha) = 1 - P(G) \geq 1 - Q(G^\alpha) - \beta \geq Q(F) - \beta.
\]

Proposition 3.6. The function $\rho$, defined above, is a metric on $\mathcal{P}(S)$.

Proof. By the above lemma,
\[
\begin{aligned}
\rho(P, Q) &= \inf\{\varepsilon > 0 : P(F) \leq Q(F^\varepsilon) + \varepsilon \text{ for all } F \in \mathcal{C}\} \\
&= \inf\{\varepsilon > 0 : Q(F) \leq P(F^\varepsilon) + \varepsilon \text{ for all } F \in \mathcal{C}\} = \rho(Q, P).
\end{aligned}
\]
If $\rho(P, Q) = 0$, then there exists a sequence $(\varepsilon_n)_{n\geq 1}$ with $\varepsilon_n \to 0$ such that $P(F) \leq Q(F^{\varepsilon_n}) + \varepsilon_n$ for all $n$. Letting $n \to \infty$ and using the continuity of probability measures gives $P(F) \leq Q(F)$ for all $F \in \mathcal{C}$. By the above symmetry between $P$ and $Q$, $P(F) = Q(F)$ for all $F \in \mathcal{C}$ and hence for all $F \in \mathcal{B}(S)$. Therefore $\rho(P, Q) = 0$ if and only if $P = Q$.


Finally, if $P, Q, R \in \mathcal{P}(S)$, and $\delta > 0$, $\varepsilon > 0$ satisfy $\rho(P, Q) < \delta$ and $\rho(Q, R) < \varepsilon$, then
\[
P(F) \leq Q(F^\delta) + \delta \leq R((F^\delta)^\varepsilon) + \delta + \varepsilon \leq R(F^{\delta+\varepsilon}) + \delta + \varepsilon
\]
for all $F \in \mathcal{C}$, so $\rho(P, R) \leq \delta + \varepsilon$ and hence $\rho(P, R) \leq \inf_{\delta > \rho(P,Q)} \delta + \inf_{\varepsilon > \rho(Q,R)} \varepsilon = \rho(P, Q) + \rho(Q, R)$, as required.

Theorem 3.7. Let $(P_n)_{n\geq 1}$ be a sequence in $\mathcal{P}(S)$ and $P \in \mathcal{P}(S)$. If $S$ is separable, then $\lim_{n\to\infty} \rho(P_n, P) = 0$ if and only if $P_n \Rightarrow P$.

Proof. Suppose $\lim_{n\to\infty} \rho(P_n, P) = 0$. For each $n$, let $\varepsilon_n = \rho(P_n, P) + \frac{1}{n}$. Given $f \in C(S)$ with $f \geq 0$,
\[
\int f\,dP_n = \int_0^{\|f\|} P_n(\{f \geq t\})\,dt \leq \int_0^{\|f\|} P(\{f \geq t\}^{\varepsilon_n})\,dt + \varepsilon_n \|f\|
\]
for every $n$ and so, by dominated convergence (for fixed $t$, the sets $\{f \geq t\}^{\varepsilon_n}$ decrease to the closed set $\{f \geq t\}$ as $\varepsilon_n \downarrow 0$),
\[
\limsup_{n\to\infty} \int f\,dP_n \leq \lim_{n\to\infty} \int_0^{\|f\|} P(\{f \geq t\}^{\varepsilon_n})\,dt = \int_0^{\|f\|} P(\{f \geq t\})\,dt = \int f\,dP.
\]
Hence, for all $f \in C(S)$,
\[
\limsup_{n\to\infty} \int (\|f\| + f)\,dP_n \leq \int (\|f\| + f)\,dP \quad \text{and} \quad \limsup_{n\to\infty} \int (\|f\| - f)\,dP_n \leq \int (\|f\| - f)\,dP.
\]
Therefore
\[
\int f\,dP \leq \|f\| - \limsup_{n\to\infty} \int (\|f\| - f)\,dP_n = \liminf_{n\to\infty} \int f\,dP_n \leq \limsup_{n\to\infty} \int f\,dP_n = \limsup_{n\to\infty} \int (\|f\| + f)\,dP_n - \|f\| \leq \int f\,dP,
\]
and so we must have equality throughout. Thus $\lim_{n\to\infty} \int f\,dP_n = \int f\,dP$ for all $f \in C(S)$, i.e. $P_n \Rightarrow P$.

Conversely, suppose $P_n \Rightarrow P$. We first establish some preliminary results for open and closed subsets of $S$. Let $F \in \mathcal{C}$ and for each $\varepsilon > 0$ define $f_\varepsilon \in C(S)$ by
\[
f_\varepsilon(x) = \Bigl(1 - \frac{d(x, F)}{\varepsilon}\Bigr) \vee 0,
\]
where $d(x, F) = \inf_{y\in F} d(x, y)$. Then $f_\varepsilon \geq 1_F$ for all $\varepsilon > 0$ and so
\[
\limsup_{n\to\infty} P_n(F) \leq \lim_{n\to\infty} \int f_\varepsilon\,dP_n = \int f_\varepsilon\,dP
\]
for each $\varepsilon > 0$. Therefore, by dominated convergence,
\[
\limsup_{n\to\infty} P_n(F) \leq \lim_{\varepsilon\to 0} \int f_\varepsilon\,dP = P(F).
\]
If $G \subset S$ is open, then
\[
\liminf_{n\to\infty} P_n(G) = 1 - \limsup_{n\to\infty} P_n(G^c) \geq 1 - P(G^c) = P(G).
\]

Now let $\varepsilon > 0$. Since $S$ is separable (note: this is the only point in the proof where we use separability), there exists a countable dense subset $\{x_1, x_2, \ldots\} \subset S$. Let $E_i = B(x_i, \frac{\varepsilon}{4})$ for $i = 1, 2, \ldots$. Then $P(\bigcup_{i=1}^\infty E_i) = P(S) = 1$ and so there exists some smallest integer $N$ such that $P(\bigcup_{i=1}^N E_i) > 1 - \frac{\varepsilon}{2}$. Now let $\mathcal{G}$ be the collection of open sets of the form $(\bigcup_{i\in I} E_i)^{\varepsilon/2}$, where $I \subset \{1, \ldots, N\}$. Since $\mathcal{G}$ is finite, by the above result on open sets, there exists some $N_0$ such that $P(G) \leq P_n(G) + \frac{\varepsilon}{2}$ for all $G \in \mathcal{G}$ and $n \geq N_0$. Given $F \in \mathcal{C}$, let
\[
F_0 = \bigcup\{E_i : 1 \leq i \leq N,\ E_i \cap F \neq \emptyset\}.
\]
Then $F_0^{\varepsilon/2} \in \mathcal{G}$ and so
\[
P(F) \leq P(F_0^{\varepsilon/2}) + P\Bigl(\bigcup_{i=N+1}^\infty E_i\Bigr) \leq P(F_0^{\varepsilon/2}) + \frac{\varepsilon}{2} \leq P_n(F_0^{\varepsilon/2}) + \varepsilon \leq P_n(F^\varepsilon) + \varepsilon
\]
for all $n \geq N_0$, where the first inequality is by $F \subset F_0 \cup (\bigcup_{i=N+1}^\infty E_i)$, the second by the definition of $N$, the third by the definition of $N_0$, and the fourth by the diameter of the $E_i$ being at most $\frac{\varepsilon}{2}$. Hence $\rho(P_n, P) \leq \varepsilon$ for each $n \geq N_0$, i.e. $\lim_{n\to\infty} \rho(P_n, P) = 0$.

Definition 3.8. Let $P, Q \in \mathcal{P}(S)$. Define $\mathcal{M}(P, Q)$ to be the set of all $\mu \in \mathcal{P}(S \times S)$ with marginals $P$ and $Q$, i.e. $\mu(A \times S) = P(A)$ and $\mu(S \times A) = Q(A)$ for all $A \in \mathcal{B}(S)$.

The following lemma provides a probabilistic interpretation of the Prohorov metric:

Lemma 3.9.
\[
\rho(P, Q) \leq \inf_{\mu \in \mathcal{M}(P,Q)} \inf\{\varepsilon > 0 : \mu(\{(x, y) : d(x, y) \geq \varepsilon\}) \leq \varepsilon\}.
\]
Proof. If for some $\varepsilon > 0$ and $\mu \in \mathcal{M}(P, Q)$ we have
\[
\mu(\{(x, y) : d(x, y) \geq \varepsilon\}) \leq \varepsilon,
\]
then
\[
P(F) = \mu(F \times S) \leq \mu((F \times S) \cap \{(x, y) : d(x, y) < \varepsilon\}) + \mu(\{(x, y) : d(x, y) \geq \varepsilon\}) \leq \mu(S \times F^\varepsilon) + \varepsilon = Q(F^\varepsilon) + \varepsilon
\]
for all $F \in \mathcal{C}$, and so $\rho(P, Q) \leq \varepsilon$. The result follows.


In fact, in the case when $S$ is separable, the inequality in the above lemma can be replaced by an equality. (The proof is an immediate consequence of Lemma 3.11.)

Proposition 3.10. Let $(S, d)$ be separable. Suppose that $X_n$, $n = 1, 2, \ldots$, and $X$ are $S$-valued random variables defined on the same probability space with distributions $P_n$, $n = 1, 2, \ldots$, and $P$ respectively. If $d(X_n, X) \to 0$ in probability as $n \to \infty$, then $P_n \Rightarrow P$.

Proof. For $n = 1, 2, \ldots$, let $\mu_n$ be the joint distribution of $X_n$ and $X$. Then, for every $\varepsilon > 0$,
\[
\lim_{n\to\infty} \mu_n(\{(x, y) : d(x, y) \geq \varepsilon\}) = \lim_{n\to\infty} \mathbb{P}(d(X_n, X) \geq \varepsilon) = 0,
\]
where $\mathbb{P}$ is the probability measure on the probability space underlying $X_n$ and $X$. By Lemma 3.9, $\lim_{n\to\infty} \rho(P_n, P) = 0$, and since $S$ is separable, the result follows by Theorem 3.7.

3.2 Examples

Proposition 3.10 suggests a method of proving that a sequence of probability measures converges weakly: construct random variables with the required distributions on a common probability space and show that they converge in probability. We illustrate this method by looking at three examples: discrete time Markov chains, continuous time Markov chains and diffusion processes.

3.2.1 Discrete Time Markov Chains

Suppose that $E$ is countable and that $(X^N)_{N\geq 1}$ and $X$ are discrete time Markov chains with initial distributions $(\lambda^N)_{N\geq 1}$ and $\lambda$, and transition matrices $(P^N)_{N\geq 1}$ and $P$ respectively. We will show that if $P^N \to P$ and $\lambda^N \to \lambda$ uniformly, then $X^N \Rightarrow X$.

We construct random variables $(Y^N)_{N\geq 1}$ and $Y$ with the required distributions on the probability space $([0, 1), \mathcal{B}, \mu)$, where $\mathcal{B}$ is the Borel $\sigma$-algebra and $\mu$ is Lebesgue measure. Without loss of generality we may assume $E = \mathbb{N}$ (set the relevant probabilities to zero if $E$ is finite).

For each $\omega \in [0, 1)$, construct a sequence $a(\omega) = (i_n)_{n\geq 0}$ of elements of $\mathbb{N}$ inductively as follows.

Since $\sum_{i=0}^\infty \lambda_i = 1$, there exists a smallest $i_0 \in \mathbb{N}$ such that $\omega < \sum_{i=0}^{i_0} \lambda_i$. Set $a(\omega)_0 = i_0$, and let $\mu_{i_0} = \sum_{i=0}^{i_0-1} \lambda_i$.

Since $\mu_{i_0} \leq \omega < \mu_{i_0} + \lambda_{i_0}$, and $\sum_{i=0}^\infty p_{i_0 i} = 1$, there exists some smallest $i_1$ such that $\omega < \mu_{i_0} + \lambda_{i_0} \sum_{i=0}^{i_1} p_{i_0 i}$. Set $a(\omega)_1 = i_1$, and let $\mu_{i_0, i_1} = \mu_{i_0} + \lambda_{i_0} \sum_{i=0}^{i_1-1} p_{i_0 i}$.

Suppose we have constructed $a(\omega)_m = i_m$ and $\mu_{i_0,\ldots,i_m}$ for all $m < n$, such that $\mu_{i_0,\ldots,i_m} \leq \omega < \mu_{i_0,\ldots,i_m} + \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{m-1} i_m}$. Since $\sum_{i=0}^\infty p_{i_{n-1} i} = 1$, there exists some smallest $i_n$ such that $\omega < \mu_{i_0,\ldots,i_{n-1}} + \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{n-2} i_{n-1}} \sum_{i=0}^{i_n} p_{i_{n-1} i}$. Set $a(\omega)_n = i_n$, and let $\mu_{i_0,\ldots,i_n} = \mu_{i_0,\ldots,i_{n-1}} + \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{n-2} i_{n-1}} \sum_{i=0}^{i_n - 1} p_{i_{n-1} i}$.

Define a discrete time process $(Y_n)_{n\geq 0}$ on $([0, 1), \mathcal{B}, \mu)$ by setting $Y_n(\omega) = a(\omega)_n$. Then
\[
\mathbb{P}(Y_0 = i_0, Y_1 = i_1, \ldots, Y_n = i_n) = \mu(\{\omega : \mu_{i_0,\ldots,i_n} \leq \omega < \mu_{i_0,\ldots,i_n} + \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} i_n}\}) = \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} i_n},
\]
and so $(Y_n)_{n\geq 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$. For each $N \in \mathbb{N}$, construct $(Y_n^N)_{n\geq 0}$ with initial distribution $\lambda^N$ and transition matrix $P^N$ similarly.

n )n≥0 with initial distribution λN and transition matrix PN similarly.

$\mu_{i_0,\ldots,i_n}$ is a continuous function of a finite number of the entries of $\lambda$ and $P$, and $\lambda^N \to \lambda$ and $P^N \to P$ uniformly. Hence $\mu^N_{i_0,\ldots,i_n} \to \mu_{i_0,\ldots,i_n}$. Therefore, if $\mu_{i_0,\ldots,i_n} < \omega < \mu_{i_0,\ldots,i_n+1}$, then there exists some $N_0 \in \mathbb{N}$ such that $N \geq N_0$ implies that
\[
\mu^N_{i_0,\ldots,i_n} < \omega < \mu^N_{i_0,\ldots,i_n+1},
\]
and hence $Y_m^N(\omega) = Y_m(\omega)$ for all $m \leq n$. In other words, provided $\omega \neq \mu_{i_0,\ldots,i_n}$ for any $i_0, \ldots, i_n$, $Y^N(\omega) \to Y(\omega)$. But only a countable number of elements of $[0, 1)$ are equal to $\mu_{i_0,\ldots,i_n}$ for some $i_0, \ldots, i_n$, and so $\lim_{N\to\infty} Y^N = Y$ almost surely. By Proposition 3.10, $X^N \Rightarrow X$ as required.

3.2.2 Continuous Time Markov Chains

Suppose that $E$ is countable and that $(X^N)_{N\geq 1}$ and $X$ are continuous time Markov chains with initial distributions $(\lambda^N)_{N\geq 1}$ and $\lambda$, and generator matrices $(Q^N)_{N\geq 1}$ and $Q$ respectively. We will show that if $Q^N \to Q$ and $\lambda^N \to \lambda$ uniformly, then $X^N \Rightarrow X$.

We construct random variables $(Z^N)_{N\geq 1}$ and $Z$ with the required distributions on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, using a construction due to Norris [9]. As in the discrete case we shall assume that $E = \mathbb{N}$.

Let $(\Pi^N)_{N\geq 1}$ and $\Pi$ be the jump matrices corresponding to the generator matrices $(Q^N)_{N\geq 1}$ and $Q$ respectively. Since $Q^N \to Q$ uniformly, the corresponding jump matrices $\Pi^N \to \Pi$ uniformly and, by the discrete case above, there exist discrete time Markov chains $(Y^N)_{N\geq 1}$ and $Y$ with initial distributions $(\lambda^N)_{N\geq 1}$ and $\lambda$, and transition matrices $(\Pi^N)_{N\geq 1}$ and $\Pi$ respectively, such that $\lim_{N\to\infty} Y^N = Y$ almost surely. By discarding a set of measure zero if necessary, we may assume that $Y^N(\omega) \to Y(\omega)$ for all $\omega \in \Omega$. Let $T_1, T_2, \ldots$ be independent exponential random variables of parameter 1, independent of $(Y^N)_{N\geq 1}$ and $Y$. Defining $q(i) = -q_{ii}$, set $S_n = \frac{T_n}{q(Y_{n-1})}$, $J_n = S_1 + \cdots + S_n$, and
\[
Z_t = \begin{cases} Y_n & \text{if } J_n \leq t < J_{n+1} \text{ for some } n, \\ \infty & \text{otherwise.} \end{cases}
\]
Then, conditional on the jump chain, the $S_n$ are independent exponential random variables with parameters $q(Y_{n-1})$, and so $Z$ has the required distribution. Define $S^N_n$, $J^N_n$ and $Z^N$ similarly for $N \geq 1$. Since $Y^N(\omega) \to Y(\omega)$ for all $\omega \in \Omega$, given $\omega \in \Omega$, for each $n$ there exists some $N_n$ such that $N \geq N_n$ implies that $Y^N_m(\omega) = Y_m(\omega)$ for all $m \leq n$. Then since $Q^N \to Q$, if $N \geq N_n$, then
\[
S^N_m(\omega) = \frac{T_m(\omega)}{q^N(Y^N_{m-1}(\omega))} = \frac{T_m(\omega)}{q^N(Y_{m-1}(\omega))} \to \frac{T_m(\omega)}{q(Y_{m-1}(\omega))} = S_m(\omega)
\]
for all $m \leq n+1$, and hence $J^N_m \to J_m$ for all $m \leq n+1$. By the same argument as that used to prove that $D_E[0,\infty)$ is separable in Theorem 2.10, it follows that $d(Z^N(\omega), Z(\omega)) \to 0$ as $N \to \infty$, and so $\lim_{N\to\infty} Z^N = Z$ almost surely. By Proposition 3.10, $X^N \Rightarrow X$, as required.
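A minimal numerical sketch of this coupling follows (an added illustration with invented generators; as a simplification, the jump chains are coupled here by sharing uniforms rather than via the interval construction of 3.2.1). The shared exponentials $T_n$ make the jump times of the two chains converge together with the generators.

```python
import numpy as np

def sample_path(Q, y0, Us, Ts, horizon):
    """Chain with generator Q started at y0, driven by shared uniforms Us
    (jump choices) and shared unit-rate exponentials Ts (holding times
    S_n = T_n / q(Y_{n-1}))."""
    Q = np.asarray(Q, dtype=float)
    jumps, states = [0.0], [y0]
    t, y = 0.0, y0
    for u, T in zip(Us, Ts):
        q = -Q[y, y]                       # total rate out of state y
        t += T / q                         # holding time
        if t > horizon:
            break
        pi = Q[y].clip(min=0.0) / q        # row y of the jump matrix Pi
        y = min(int(np.searchsorted(np.cumsum(pi), u, side='right')),
                len(Q) - 1)                # clamp guards against rounding
        jumps.append(t); states.append(y)
    return jumps, states

Q1 = np.array([[-1.0, 0.7, 0.3],
               [0.5, -1.2, 0.7],
               [0.2, 0.8, -1.0]])
Q2 = Q1 + 1e-4 * np.array([[-2.0, 1.0, 1.0],
                           [1.0, -2.0, 1.0],
                           [1.0, 1.0, -2.0]])   # nearby generator

rng = np.random.default_rng(1)
Us, Ts = rng.random(200), rng.exponential(1.0, 200)
j1, s1 = sample_path(Q1, 0, Us, Ts, 20.0)
j2, s2 = sample_path(Q2, 0, Us, Ts, 20.0)
print(s1 == s2)                                  # typically True: same jump chain
print(max(abs(a - b) for a, b in zip(j1, j2)))   # jump times differ only slightly
```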

3.2.3 Diffusion Processes

Suppose that $(a_n)_{n\geq 1}$ and $a$ are bounded symmetric uniformly positive definite Lipschitz functions $\mathbb{R}^d \to \mathbb{R}^d \otimes \mathbb{R}^d$, and that $(b_n)_{n\geq 1}$ and $b$ are bounded Lipschitz functions $\mathbb{R}^d \to \mathbb{R}^d$. Let $(X_n)_{n\geq 1}$ and $X$ be diffusion processes in $\mathbb{R}^d$ with diffusivities $(a_n)_{n\geq 1}$ and $a$, and drifts $(b_n)_{n\geq 1}$ and $b$ respectively, starting from $x \in \mathbb{R}^d$. We shall show that if $a_n \to a$ and $b_n \to b$ uniformly, then $X_n \Rightarrow X$.

Let $(B_t)_{t\geq 0}$ be a Brownian motion in $\mathbb{R}^d$. We shall construct diffusions $(Z^n)_{n\geq 1}$ and $Z$, with the required distributions, on the probability space underlying $(B_t)_{t\geq 0}$.

Since $a(x)$ is symmetric positive definite for all $x \in \mathbb{R}^d$, there exists a unique symmetric positive definite map $\sigma : \mathbb{R}^d \to \mathbb{R}^d \otimes (\mathbb{R}^d)^*$ such that $\sigma(x)\sigma(x)^* = a(x)$. Furthermore, since $a$ is bounded and Lipschitz, $\sigma$ is bounded and Lipschitz. Similarly, there exist unique symmetric positive definite bounded Lipschitz maps $(\sigma_n)_{n\geq 1}$ and, since $a_n \to a$ uniformly, $\sigma_n \to \sigma$ uniformly. Assume that $\sigma$, $\sigma_n$, $b$ and $b_n$ have Lipschitz constant $K$, independent of $n$. Since $\sigma$, $\sigma_n$, $b$ and $b_n$ are Lipschitz, there exist continuous processes $Z$ and $Z^n$ for $n = 1, 2, \ldots$, adapted to the filtration generated by $(B_t)_{t\geq 0}$, satisfying
\[
dZ_t = \sigma(Z_t)\,dB_t + b(Z_t)\,dt, \quad Z_0 = x,
\]
\[
dZ^n_t = \sigma_n(Z^n_t)\,dB_t + b_n(Z^n_t)\,dt, \quad Z^n_0 = x.
\]


$Z^n$ and $Z$ have the required distributions. Using $(x + y)^2 \leq 2x^2 + 2y^2$,
\[
|Z^n_t - Z_t|^2 = \left|\int_0^t (\sigma_n(Z^n_s) - \sigma(Z_s))\,dB_s + \int_0^t (b_n(Z^n_s) - b(Z_s))\,ds\right|^2 \leq 2\left|\int_0^t (\sigma_n(Z^n_s) - \sigma(Z_s))\,dB_s\right|^2 + 2\left|\int_0^t (b_n(Z^n_s) - b(Z_s))\,ds\right|^2.
\]

By Doob's $L^2$ inequality,
\[
\mathbb{E}\Bigl(\sup_{s\leq t} \Bigl|\int_0^s (\sigma_n(Z^n_r) - \sigma(Z_r))\,dB_r\Bigr|^2\Bigr) \leq 4\,\mathbb{E}\Bigl(\int_0^t |\sigma_n(Z^n_s) - \sigma(Z_s)|^2\,ds\Bigr),
\]
and by the Cauchy-Schwarz inequality,
\[
\mathbb{E}\Bigl(\sup_{s\leq t} \Bigl|\int_0^s (b_n(Z^n_r) - b(Z_r))\,dr\Bigr|^2\Bigr) \leq t\,\mathbb{E}\Bigl(\int_0^t |b_n(Z^n_s) - b(Z_s)|^2\,ds\Bigr).
\]

Now since $K$ is a Lipschitz constant for $\sigma_n$,
\[
|\sigma_n(Z^n_r) - \sigma(Z_r)| \leq |\sigma_n(Z^n_r) - \sigma_n(Z_r)| + |\sigma_n(Z_r) - \sigma(Z_r)| \leq K|Z^n_r - Z_r| + \|\sigma_n - \sigma\|,
\]
and similarly
\[
|b_n(Z^n_r) - b(Z_r)| \leq K|Z^n_r - Z_r| + \|b_n - b\|.
\]

Hence
\[
\begin{aligned}
\mathbb{E}\Bigl(\sup_{s\leq t} |Z^n_s - Z_s|^2\Bigr) &\leq 8\,\mathbb{E}\Bigl(\int_0^t |\sigma_n(Z^n_s) - \sigma(Z_s)|^2\,ds\Bigr) + 2t\,\mathbb{E}\Bigl(\int_0^t |b_n(Z^n_s) - b(Z_s)|^2\,ds\Bigr) \\
&\leq 16\,\mathbb{E}\Bigl(\int_0^t (K^2|Z^n_s - Z_s|^2 + \|\sigma_n - \sigma\|^2)\,ds\Bigr) + 4t\,\mathbb{E}\Bigl(\int_0^t (K^2|Z^n_s - Z_s|^2 + \|b_n - b\|^2)\,ds\Bigr) \\
&\leq 16t\|\sigma_n - \sigma\|^2 + 4t^2\|b_n - b\|^2 + (16 + 4t)K^2 \int_0^t \mathbb{E}\Bigl(\sup_{r\leq s} |Z^n_r - Z_r|^2\Bigr)\,ds.
\end{aligned}
\]

Given $\varepsilon > 0$ and $T > 0$, set $c = (16 + 4T)K^2$ and $\varepsilon' = \varepsilon e^{-cT}$. Since $\sigma_n \to \sigma$ and $b_n \to b$ uniformly, there exists $N \in \mathbb{N}$ such that $n \geq N$ implies $\|\sigma_n - \sigma\| < \sqrt{\frac{\varepsilon'}{32T}}$ and $\|b_n - b\| < \sqrt{\frac{\varepsilon'}{8T^2}}$. If $f_n(t) = \mathbb{E}(\sup_{s\leq t} |Z^n_s - Z_s|^2)$, then
\[
f_n(t) \leq \varepsilon' + c\int_0^t f_n(s)\,ds
\]
for all $n \geq N$ and $0 \leq t \leq T$. By Gronwall's Inequality (Lemma 6.9), $f_n(t) \leq \varepsilon' e^{ct} \leq \varepsilon$, and so $\mathbb{E}(\sup_{s\leq t} |Z^n_s - Z_s|^2) \to 0$ as $n \to \infty$. In particular, $\sup_{s\leq t} |Z^n_s - Z_s| \to 0$ in probability. Taking $\lambda_n(t) = t$ for all $t$, Proposition 2.6 implies that $d(Z^n, Z) \to 0$ in probability. By Proposition 3.10, $X_n \Rightarrow X$, as required.
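The effect of sharing one Brownian path is easy to see numerically. The Euler-Maruyama sketch below (an added illustration; the one-dimensional coefficients are invented, chosen only to be bounded and Lipschitz) drives both equations with the same increments and watches the supremum distance shrink as the coefficients converge.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 1.0, 10_000
dt = T / N
dB = rng.normal(0.0, np.sqrt(dt), N)   # shared Brownian increments

def euler(sigma, b, x0=0.0):
    """Euler-Maruyama solution of dZ = sigma(Z) dB + b(Z) dt on [0, T]."""
    z = np.empty(N + 1)
    z[0] = x0
    for k in range(N):
        z[k + 1] = z[k] + sigma(z[k]) * dB[k] + b(z[k]) * dt
    return z

sigma = lambda x: 1.0 + 0.5 * np.tanh(x)   # bounded, Lipschitz, >= 1/2
b     = lambda x: np.sin(x)                # bounded, Lipschitz

for eps in [0.1, 0.01, 0.001]:
    sigma_n = lambda x, e=eps: sigma(x) + e * np.cos(x)
    b_n     = lambda x, e=eps: b(x) + e
    Z, Zn = euler(sigma, b), euler(sigma_n, b_n)
    print(eps, np.max(np.abs(Zn - Z)))   # sup distance shrinks roughly like eps
```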

3.3 The Skorohod Representation

A converse to Proposition 3.10 exists, in the form of the Skorohod Representation. Before we can prove this, we need the following lemma.


Lemma 3.11. Let $S$ be separable. Let $P, Q \in \mathcal{P}(S)$ and $\varepsilon > 0$ satisfy $\rho(P, Q) < \varepsilon$, and let $\delta > 0$. Suppose that $E_1, \ldots, E_N \in \mathcal{B}(S)$ are disjoint with diameters less than $\delta$ and that $P(E_0) \leq \delta$, where $E_0 = S \setminus \bigcup_{i=1}^N E_i$. Then there exist constants $c_1, \ldots, c_N \in [0, 1]$ and independent random variables $X, Y_0, \ldots, Y_N$ ($S$-valued) and $\xi$ ($[0, 1]$-valued) on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $X$ has distribution $P$, $\xi$ is uniformly distributed on $[0, 1]$,
\[
Y = \begin{cases} Y_i & \text{on } \{X \in E_i, \xi \geq c_i\}, \ i = 1, \ldots, N, \\ Y_0 & \text{on } \{X \in E_0\} \cup \bigcup_{i=1}^N \{X \in E_i, \xi < c_i\} \end{cases}
\]
has distribution $Q$,
\[
\{d(X, Y) \geq \delta + \varepsilon\} \subset \{X \in E_0\} \cup \Bigl\{\xi < \max\Bigl\{\frac{\varepsilon}{P(E_i)} : P(E_i) > 0\Bigr\}\Bigr\},
\]
and
\[
\mathbb{P}(d(X, Y) \geq \delta + \varepsilon) \leq \delta + \varepsilon.
\]
Proof. This lemma is not proved here, as the proof is long and not very illuminating. The interested reader is referred to pp. 97-101 of Ethier and Kurtz [4].

Theorem 3.12 (The Skorohod Representation). Let $(S, d)$ be separable. Suppose $P_n$, $n = 1, 2, \ldots$, and $P$ in $\mathcal{P}(S)$ satisfy $P_n \Rightarrow P$. Then there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which are defined $S$-valued random variables $X_n$, $n = 1, 2, \ldots$, and $X$ with distributions $P_n$, $n = 1, 2, \ldots$, and $P$ respectively, such that $\lim_{n\to\infty} X_n = X$ almost surely.

Proof. Let $\{x_1, x_2, \ldots\}$ be a dense subset of $S$. Then $P(\bigcup_{i=1}^\infty B(x_i, 2^{-k})) = 1$ for each $k$, and so there exist integers $N_1, N_2, \ldots$ such that
\[
P\Bigl(\bigcup_{i=1}^{N_k} B(x_i, 2^{-k})\Bigr) \geq 1 - 2^{-k}
\]
for $k = 1, 2, \ldots$. Set $E^{(k)}_i = B(x_i, 2^{-k})$ and $E^{(k)}_0 = S \setminus \bigcup_{i=1}^{N_k} E^{(k)}_i$. Assume (without loss of generality) that $\varepsilon_k = \min_{1\leq i\leq N_k} P(E^{(k)}_i) > 0$. Define the sequence $(k_n)_{n\geq 1}$ by
\[
k_n = 1 \vee \max\Bigl\{k \geq 1 : \rho(P_n, P) < \frac{\varepsilon_k}{k}\Bigr\}.
\]
Apply Lemma 3.11 with $Q = P_n$, $\varepsilon = \frac{\varepsilon_{k_n}}{k_n}$ if $k_n > 1$ and $\varepsilon = \rho(P_n, P) + \frac{1}{n}$ if $k_n = 1$, $\delta = 2^{-k_n}$, $E_i = E^{(k_n)}_i$, and $N = N_{k_n}$, for $n = 1, 2, \ldots$. Then there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which are defined $S$-valued random variables $Y^{(n)}_0, \ldots, Y^{(n)}_{N_{k_n}}$, $n = 1, 2, \ldots$, a random variable $\xi$, uniformly distributed on $[0, 1]$, and an $S$-valued random variable $X$ with distribution $P$, all of which are independent, such that if the constants $c^{(n)}_1, \ldots, c^{(n)}_{N_{k_n}} \in [0, 1]$, $n = 1, 2, \ldots$, are appropriately chosen, then the random variable
\[
X_n = \begin{cases} Y^{(n)}_i & \text{on } \{X \in E^{(k_n)}_i, \xi \geq c^{(n)}_i\}, \ i = 1, \ldots, N_{k_n}, \\ Y^{(n)}_0 & \text{on } \{X \in E^{(k_n)}_0\} \cup \bigcup_{i=1}^{N_{k_n}} \{X \in E^{(k_n)}_i, \xi < c^{(n)}_i\} \end{cases}
\]
has distribution $P_n$ and
\[
\Bigl\{d(X_n, X) \geq 2^{-k_n} + \frac{\varepsilon_{k_n}}{k_n}\Bigr\} \subset \{X \in E^{(k_n)}_0\} \cup \Bigl\{\xi < \frac{1}{k_n}\Bigr\}
\]
if $k_n > 1$, for $n = 1, 2, \ldots$.


Since $P_n \Rightarrow P$, by Theorem 3.7, $\rho(P_n, P) \to 0$. Hence, for each $k \in \mathbb{N}$, $\rho(P_n, P) < \frac{\varepsilon_k}{k}$ for sufficiently large $n$, and so $k_n \geq k$ for sufficiently large $n$. If $K_n = \min_{m\geq n} k_m$, then $\lim_{n\to\infty} K_n = \infty$. However, for $K_n > 1$,
\[
\mathbb{P}\Bigl(\bigcup_{m=n}^\infty \Bigl\{d(X_m, X) \geq 2^{-k_m} + \frac{\varepsilon_{k_m}}{k_m}\Bigr\}\Bigr) \leq \sum_{k=K_n}^\infty \mathbb{P}(X \in E^{(k)}_0) + \mathbb{P}\Bigl(\xi < \frac{1}{K_n}\Bigr) \leq 2^{-K_n+1} + \frac{1}{K_n} \to 0.
\]
So $\lim_{n\to\infty} X_n = X$ almost surely.

We conclude this section by giving an application of this theorem.

Corollary 3.13 (The Continuous Mapping Theorem). Let $(S, d)$ and $(S', d')$ be separable metric spaces and let $h : S \to S'$ be Borel measurable. Suppose that $P_n$, $n = 1, 2, \ldots$, and $P$ in $\mathcal{P}(S)$ satisfy $P_n \Rightarrow P$, and define $Q_n$, $n = 1, 2, \ldots$, and $Q$ in $\mathcal{P}(S')$ by $Q_n = P_n h^{-1}$, $Q = P h^{-1}$. (By definition, $Ph^{-1}(B) = P(\{s \in S : h(s) \in B\})$.) Let $C_h$ be the set of points of $S$ at which $h$ is continuous. If $P(C_h) = 1$, then $Q_n \Rightarrow Q$ on $S'$.

Proof. By Theorem 3.12, there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which are defined $S$-valued random variables $X_n$, $n = 1, 2, \ldots$, and $X$ with distributions $P_n$, $n = 1, 2, \ldots$, and $P$ respectively, such that $\lim_{n\to\infty} X_n = X$ almost surely. Since $\mathbb{P}(X \in C_h) = P(C_h) = 1$, we have $\lim_{n\to\infty} h(X_n) = h(X)$ almost surely, and so, by Proposition 3.10, $Q_n \Rightarrow Q$ in $S'$.

4 Convergence of Finite Dimensional Distributions

We now come to a key result, due to Prohorov, characterising convergent processes. This states that a sequence of stochastic processes converges if and only if it is relatively compact and the finite dimensional distributions converge.

This is a particularly useful method for checking the convergence of processes where the limit has independent increments, as in this case computing the finite dimensional distributions is relatively straightforward. (See Example 4.4 at the end of the section for an illustration.)

Definition 4.1. Let $\{X_\alpha\}$ (where $\alpha$ ranges over some index set) be a family of stochastic processes with sample paths in $D_E[0,\infty)$, and let $\{P_\alpha\} \subset \mathcal{P}(D_E[0,\infty))$ be the family of associated probability distributions (i.e. $P_\alpha(B) = \mathbb{P}(X_\alpha \in B)$ for all $B \in \mathcal{S}_E$, where $\mathbb{P}$ is the probability measure on the probability space underlying $X_\alpha$, and $\mathcal{S}_E$ is the Borel $\sigma$-algebra of $D_E[0,\infty)$). We say that $\{X_\alpha\}$ is relatively compact if $\{P_\alpha\}$ is (i.e. if the closure of $\{P_\alpha\}$ in $\mathcal{P}(D_E[0,\infty))$ is compact).

Lemma 4.2. If $X$ is a process with sample paths in $D_E[0,\infty)$, then the complement in $[0,\infty)$ of
\[
D(X) = \{t \geq 0 : \mathbb{P}(X(t) = X(t-)) = 1\}
\]
is at most countable.

Proof. For each $\varepsilon > 0$, $\delta > 0$, and $T > 0$, let
\[
A(\varepsilon, \delta, T) = \{0 \leq t \leq T : \mathbb{P}(r(X(t), X(t-)) \geq \varepsilon) \geq \delta\}.
\]


Now if $A(\varepsilon, \delta, T)$ contains a sequence $(t_n)_{n\geq 1}$ of distinct points, then
\[
\begin{aligned}
\mathbb{P}(r(X(t_n), X(t_n-)) \geq \varepsilon \text{ infinitely often}) &= \mathbb{P}\Bigl(\bigcap_{n=1}^\infty \bigcup_{m=n}^\infty \{r(X(t_m), X(t_m-)) \geq \varepsilon\}\Bigr) \\
&= \lim_{n\to\infty} \mathbb{P}\Bigl(\bigcup_{m=n}^\infty \{r(X(t_m), X(t_m-)) \geq \varepsilon\}\Bigr) \\
&\geq \limsup_{n\to\infty} \mathbb{P}(r(X(t_n), X(t_n-)) \geq \varepsilon) \geq \delta > 0,
\end{aligned}
\]
contradicting the fact that for each $x \in D_E[0,\infty)$, $r(x(t), x(t-)) \geq \varepsilon$ for at most finitely many $t \in [0, T]$ (see the proof of Lemma 2.2). Hence $A(\varepsilon, \delta, T)$ is finite and so
\[
D(X)^c = \{t \geq 0 : \mathbb{P}(r(X(t), X(t-)) > 0) > 0\} = \bigcup_{n=1}^\infty \bigcup_{m=1}^\infty \bigcup_{N=1}^\infty A\Bigl(\frac{1}{n}, \frac{1}{m}, N\Bigr)
\]
is at most countable.

Theorem 4.3. Let $E$ be separable and let $X_n$, $n = 1, 2, \ldots$, and $X$ be processes with sample paths in $D_E[0,\infty)$.

(a) If $X_n \Rightarrow X$, then
\[
(X_n(t_1), \ldots, X_n(t_k)) \Rightarrow (X(t_1), \ldots, X(t_k)) \tag{4.1}
\]
for every finite set $\{t_1, \ldots, t_k\} \subset D(X)$. Moreover, for each finite set $\{t_1, \ldots, t_k\} \subset [0,\infty)$, there exist sequences $(t^n_1)_{n\geq 1}$ in $[t_1,\infty)$, ..., $(t^n_k)_{n\geq 1}$ in $[t_k,\infty)$ converging to $t_1, \ldots, t_k$, respectively, such that $(X_n(t^n_1), \ldots, X_n(t^n_k)) \Rightarrow (X(t_1), \ldots, X(t_k))$.

(b) If $\{X_n\}$ is relatively compact and there exists a dense set $D \subset [0,\infty)$ such that (4.1) holds for every finite set $\{t_1, \ldots, t_k\} \subset D$, then $X_n \Rightarrow X$.

Proof. (a) Suppose that $X_n \Rightarrow X$. Using the Skorohod Representation (Theorem 3.12), there exists a probability space on which are defined processes $Y_n$, $n = 1, 2, \ldots$, and $Y$ with sample paths in $D_E[0,\infty)$ and with the same distributions as $X_n$, $n = 1, 2, \ldots$, and $X$, such that $\lim_{n\to\infty} d(Y_n, Y) = 0$ almost surely. If $\{t_1, \ldots, t_k\} \subset D(X) = D(Y)$, then, using the notation of Proposition 2.11, $(\pi_{t_1}, \ldots, \pi_{t_k}) : D_E[0,\infty) \to E^k$ is continuous almost surely with respect to the distribution of $Y$ and so, by the Continuous Mapping Theorem (Corollary 3.13),
\[
\lim_{n\to\infty} (Y_n(t_1), \ldots, Y_n(t_k)) = \lim_{n\to\infty} (\pi_{t_1}, \ldots, \pi_{t_k})(Y_n) = (\pi_{t_1}, \ldots, \pi_{t_k})(Y) = (Y(t_1), \ldots, Y(t_k)) \quad \text{almost surely.}
\]
The first conclusion follows by Proposition 3.10.

For the second conclusion, we observe that, by Lemma 4.2, for each finite set $\{t_1, \ldots, t_k\} \subset [0,\infty)$, there exist sequences $(t^n_1)_{n\geq 1}$ in $[t_1,\infty) \cap D(X)$, ..., $(t^n_k)_{n\geq 1}$ in $[t_k,\infty) \cap D(X)$ converging to $t_1, \ldots, t_k$, respectively. Then, by the above result, $(X_n(t^m_1), \ldots, X_n(t^m_k)) \Rightarrow (X(t^m_1), \ldots, X(t^m_k))$ for each $m \in \mathbb{N}$ as $n \to \infty$. Since the process $X$ is right continuous, $(X(t^m_1), \ldots, X(t^m_k)) \to (X(t_1), \ldots, X(t_k))$ almost surely as $m \to \infty$, and so $(X_n(t^n_1), \ldots, X_n(t^n_k)) \Rightarrow (X(t_1), \ldots, X(t_k))$.

(b) Since $\{X_n\}$ is relatively compact, the closure of $\{P_n\}$ is compact and hence every subsequence of $\{P_n\}$ has a convergent subsequence. It follows that every subsequence of $\{X_n\}$ has a convergent (in distribution) subsequence, and so it is enough to show that every convergent subsequence of $\{X_n\}$ converges in distribution to $X$. Restricting to a subsequence if necessary, suppose that $X_n \Rightarrow Y$. We must show that $X$ and $Y$ have the same distribution.


Let $\{t_1, \ldots, t_k\} \subset D(Y)$ and $f_1, \ldots, f_k \in C(E)$, and choose sequences $(t^n_1)_{n\geq 1}$ in $D \cap [t_1,\infty)$, ..., $(t^n_k)_{n\geq 1}$ in $D \cap [t_k,\infty)$ converging to $t_1, \ldots, t_k$, respectively. The map $(x_1, \ldots, x_k) \mapsto \prod_{i=1}^k f_i(x_i)$ is in $C(E^k)$ and so (4.1) implies $\mathbb{E}\bigl(\prod_{i=1}^k f_i(X_n(t^m_i))\bigr) \to \mathbb{E}\bigl(\prod_{i=1}^k f_i(X(t^m_i))\bigr)$ as $n \to \infty$ for each $m \geq 1$. Therefore there exist integers $n_1 < n_2 < n_3 < \ldots$ such that
\[
\Bigl|\mathbb{E}\Bigl(\prod_{i=1}^k f_i(X(t^m_i))\Bigr) - \mathbb{E}\Bigl(\prod_{i=1}^k f_i(X_{n_m}(t^m_i))\Bigr)\Bigr| < \frac{1}{m}. \tag{4.2}
\]
Now
\[
\begin{aligned}
\Bigl|\mathbb{E}\Bigl(\prod_{i=1}^k f_i(X(t_i))\Bigr) - \mathbb{E}\Bigl(\prod_{i=1}^k f_i(Y(t_i))\Bigr)\Bigr|
&\leq \Bigl|\mathbb{E}\Bigl(\prod_{i=1}^k f_i(X(t_i))\Bigr) - \mathbb{E}\Bigl(\prod_{i=1}^k f_i(X(t^m_i))\Bigr)\Bigr| \\
&\quad + \Bigl|\mathbb{E}\Bigl(\prod_{i=1}^k f_i(X(t^m_i))\Bigr) - \mathbb{E}\Bigl(\prod_{i=1}^k f_i(X_{n_m}(t^m_i))\Bigr)\Bigr| \\
&\quad + \Bigl|\mathbb{E}\Bigl(\prod_{i=1}^k f_i(X_{n_m}(t^m_i))\Bigr) - \mathbb{E}\Bigl(\prod_{i=1}^k f_i(X_{n_m}(t_i))\Bigr)\Bigr| \\
&\quad + \Bigl|\mathbb{E}\Bigl(\prod_{i=1}^k f_i(X_{n_m}(t_i))\Bigr) - \mathbb{E}\Bigl(\prod_{i=1}^k f_i(Y(t_i))\Bigr)\Bigr|
\end{aligned}
\]
for each $m \geq 1$. All four terms on the right tend to zero as $m \to \infty$: the first by the right continuity of $X$, the second by (4.2), the third by the right continuity of $X_{n_m}$, and the fourth by (a), using the facts that $X_{n_m} \Rightarrow Y$ and $\{t_1, \ldots, t_k\} \subset D(Y)$. Hence
\[
\mathbb{E}\Bigl(\prod_{i=1}^k f_i(X(t_i))\Bigr) = \mathbb{E}\Bigl(\prod_{i=1}^k f_i(Y(t_i))\Bigr) \tag{4.3}
\]
for all $f_1, \ldots, f_k \in C(E)$ and all $\{t_1, \ldots, t_k\} \subset D(Y)$ (and hence, by Lemma 4.2 and the right continuity of $X$ and $Y$, for all $\{t_1, \ldots, t_k\} \subset [0,\infty)$). Now let
\[
\mathcal{D} = \{A \in \mathcal{S}_E : \mathbb{E}(1_{\{X \in A\}}) = \mathbb{E}(1_{\{Y \in A\}})\}.
\]
This is clearly a d-system. Also, since $1_A$ is in the closure of $C(E)$ for all open sets $A \subset E$, (4.3), together with the dominated convergence theorem, implies that $\mathcal{D}$ contains the $\pi$-system consisting of all finite intersections of $\{\pi_t^{-1}(A) : t \in [0,\infty) \text{ and } A \subset E \text{ is open}\}$. By Dynkin's $\pi$-system Lemma, $\mathcal{D}$ contains the $\sigma$-algebra generated by the coordinate random variables $\pi_t$, and hence, by Proposition 2.11, $\mathcal{S}_E$ itself. Thus $X$ and $Y$ have the same distribution.

Example 4.4. Suppose $(X_n)_{n\geq 1}$ is a relatively compact sequence of processes with sample paths in $D_{\mathbb{R}^d}[0,\infty)$ having independent increments, and let $X$ be a process in $D_{\mathbb{R}^d}[0,\infty)$ having independent increments. For simplicity assume that $X(0) = X_n(0) = 0$ for all $n$. Now $(X_n(t_1), \ldots, X_n(t_k)) \Rightarrow (X(t_1), \ldots, X(t_k))$ for every finite set $\{t_1, \ldots, t_k\} \subset D(X)$ if and only if
\[
\mathbb{E}\Bigl(\exp\Bigl(i\sum_{j=1}^k \langle \theta_j, X_n(t_j)\rangle\Bigr)\Bigr) \to \mathbb{E}\Bigl(\exp\Bigl(i\sum_{j=1}^k \langle \theta_j, X(t_j)\rangle\Bigr)\Bigr)
\]
for every $k$-tuple $(\theta_1, \ldots, \theta_k) \in ((\mathbb{R}^d)^*)^k$. Since $X_n$ has independent increments,
\[
\mathbb{E}\Bigl(\exp\Bigl(i\sum_{j=1}^k \langle \theta_j, X_n(t_j)\rangle\Bigr)\Bigr) = \prod_{j=1}^k \mathbb{E}\bigl(\exp i\langle \theta'_j, X_n(t_j) - X_n(t_{j-1})\rangle\bigr),
\]
where $\theta'_j = \sum_{m=j}^k \theta_m$ for $j = 1, \ldots, k$. The same result holds for $X$ and so, if $X_n(t) - X_n(s) \Rightarrow X(t) - X(s)$ for all $s, t$, then the finite dimensional distributions of $X_n$ converge in distribution to those of $X$, and hence $X_n \Rightarrow X$. Since this condition is clearly necessary, $X_n \Rightarrow X$ if and only if $X_n(t) - X_n(s) \Rightarrow X(t) - X(s)$ for all $s, t$.

5 Relative Compactness in $D_E[0,\infty)$

In order to apply Theorem 4.3 in any practical cases, it is necessary to have an understanding of the conditions a family of stochastic processes, or equivalently probability measures, must satisfy to be relatively compact. In this section we establish some necessary and sufficient conditions for relative compactness in $D_E[0,\infty)$ which will be useful in later sections.

5.1 Prohorov’s Theorem

Prohorov's Theorem gives a characterisation of the compact subsets of $\mathcal{P}(S)$, where $(S, d)$ is the metric space of Section 3, by relating compactness to the notion of tightness.

Definition 5.1. A probability measure $P \in \mathcal{P}(S)$ is said to be tight if for each $\varepsilon > 0$ there exists a compact set $K \subset S$ such that $P(K) \geq 1 - \varepsilon$.

A family of probability measures $M \subset \mathcal{P}(S)$ is tight if for each $\varepsilon > 0$ there exists a compact set $K \subset S$ such that $\inf_{P\in M} P(K) \geq 1 - \varepsilon$.

Theorem 5.2 (Prohorov's Theorem). Let $(S, d)$ be complete and separable, and let $M \subset \mathcal{P}(S)$. Then the following are equivalent:

(a) $M$ is tight.

(b) For each $\varepsilon > 0$, there exists a compact set $K \subset S$ such that
\[
\inf_{P\in M} P(K^\varepsilon) \geq 1 - \varepsilon,
\]
where $K^\varepsilon$ is defined in 3.1.

(c) $M$ is relatively compact.

Before we can prove this, we need two intermediate results.

Theorem 5.3. If $S$ is separable, then $\mathcal{P}(S)$ is separable. If in addition $(S, d)$ is complete, then $(\mathcal{P}(S), \rho)$ is complete.

Proof. Since $S$ is separable, there exists a countable dense subset $\{x_1, x_2, \ldots\} \subset S$. Let $\delta_x$ denote the element of $\mathcal{P}(S)$ with unit mass at $x \in S$. Fix $x_0 \in S$. We shall show that the countable set of probability measures of the form $\sum_{i=0}^N a_i \delta_{x_i}$, with $N$ finite, $a_i$ rational and $\sum_{i=0}^N a_i = 1$, is dense in $\mathcal{P}(S)$. Given $\varepsilon > 0$, let $P \in \mathcal{P}(S)$. Since $P(\bigcup_{i=1}^\infty B(x_i, \varepsilon/2)) = 1$, there exists some $N < \infty$ such that $P(\bigcup_{i=1}^N B(x_i, \varepsilon/2)) \geq 1 - \frac{\varepsilon}{2}$. Set $A_i = B(x_i, \varepsilon/2) \setminus \bigcup_{j=1}^{i-1} B(x_j, \varepsilon/2)$ for each $i \leq N$. Pick some $m \in \mathbb{N}$ with $m \geq \frac{2N}{\varepsilon}$. Let $a_i = \frac{\lfloor m P(A_i) \rfloor}{m} \leq P(A_i)$ for $i = 1, \ldots, N$ and define $a_0 = 1 - \sum_{i=1}^N a_i$. Then the $a_i$ are rational and so $Q = \sum_{i=0}^N a_i \delta_{x_i}$ is of the required form. If $F \in \mathcal{C}$, then
\[
P(F) \leq P\Bigl(\bigcup_{F\cap A_i \neq \emptyset} A_i\Bigr) + P\Bigl(\Bigl(\bigcup_{i=1}^N A_i\Bigr)^c\Bigr) \leq \sum_{F\cap A_i \neq \emptyset} \frac{\lfloor m P(A_i) \rfloor}{m} + \frac{N}{m} + \frac{\varepsilon}{2} \leq Q(F^\varepsilon) + \varepsilon.
\]


Therefore $\rho(P, Q) \leq \varepsilon$ and so $\mathcal{P}(S)$ is separable.

To prove completeness, suppose $(P_n)_{n\geq 1}$ is a Cauchy sequence in $\mathcal{P}(S)$. By restricting to a subsequence if necessary, we may assume that $\rho(P_{n-1}, P_n) < 2^{-n}$ for each $n \geq 2$. As in the proof of separability, for each $n = 2, 3, \ldots$ there exist some $N_n < \infty$ and disjoint sets $E^{(n)}_1, \ldots, E^{(n)}_{N_n} \in \mathcal{B}(S)$ with diameters less than $2^{-n}$ such that $P_{n-1}(E^{(n)}_0) \leq 2^{-n}$, where $E^{(n)}_0 = S \setminus \bigcup_{i=1}^{N_n} E^{(n)}_i$. By applying Lemma 3.11 successively for $n = 2, 3, \ldots$, with $P = P_{n-1}$, $Q = P_n$, $\varepsilon = \delta = 2^{-n}$ and $N = N_n$, there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which are defined $S$-valued random variables $Y^{(n)}_0, \ldots, Y^{(n)}_{N_n}$, for $n = 2, 3, \ldots$, $[0, 1]$-valued random variables $\xi^{(n)}$, $n = 2, 3, \ldots$, and an $S$-valued random variable $X_1$ with distribution $P_1$, such that if the constants $c^{(n)}_1, \ldots, c^{(n)}_{N_n} \in [0, 1]$ are appropriately chosen, then the random variable
\[
X_n = \begin{cases} Y^{(n)}_i & \text{on } \{X_{n-1} \in E^{(n)}_i, \xi^{(n)} \geq c^{(n)}_i\}, \ i = 1, \ldots, N_n, \\ Y^{(n)}_0 & \text{on } \{X_{n-1} \in E^{(n)}_0\} \cup \bigcup_{i=1}^{N_n} \{X_{n-1} \in E^{(n)}_i, \xi^{(n)} < c^{(n)}_i\} \end{cases}
\]
has distribution $P_n$, and
\[
\mathbb{P}(d(X_{n-1}, X_n) \geq 2^{-n+1}) \leq 2^{-n+1}.
\]
Then $\sum_{n=2}^\infty \mathbb{P}(d(X_{n-1}, X_n) \geq 2^{-n+1}) < \infty$ and, by the Borel-Cantelli Lemma, $\mathbb{P}(d(X_{n-1}, X_n) \geq 2^{-n+1} \text{ infinitely often}) = 0$. Hence
\[
\mathbb{P}\Bigl(\sum_{n=2}^\infty d(X_{n-1}, X_n) < \infty\Bigr) = 1.
\]
Since $(S, d)$ is complete, $\lim_{n\to\infty} X_n$ exists on this set. Setting $X$ to be the value of the limit where it exists, and some fixed point of $S$ otherwise, $\lim_{n\to\infty} X_n = X$ almost surely and so, by Proposition 3.10, $\lim_{n\to\infty} \rho(P_n, P) = 0$, where $P$ is the distribution of $X$.

Lemma 5.4. If (S, d) is complete and separable, then each P ∈ P(S)is tight.

Proof. Let x1, x2, . . . be a dense subset of S, and let P ∈ P(S). Then P(⋃∞

k=1 B(

xk,1n

))

= 1for each n and so, given ε > 0, there exist integers N1, N2, . . . such that

P

(

Nn⋃

k=1

B

(

xk,1

n

)

)

≥ 1 − ε

2n

for n = 1, 2, . . .. Let K be the closure of⋂

n≥1

⋃Nn

k=1 B(

xk,1n

)

. Then for each δ > 0, K can

be covered by Nn balls of radius δ where n > 1δ . Therefore K is totally bounded and hence

compact, and

P (K) ≥ 1 −∞∑

n=1

[

1 − P

(

Nn⋃

k=1

B

(

xk,1

n

)

)]

≥ 1 −∞∑

n=1

ε

2n

= 1 − ε.

Proof of Theorem 5.2.

(a ⇒ b) Immediate.

22

Page 24: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

(b ⇒ c) By Theorem 5.3, (P(S), ρ) is complete and hence the closure of M is complete. So itsuffices to show that M is totally bounded i.e. given δ > 0, there exists a finite set N ⊂ P(S)such that M ⊂ ⋃P∈N Q : ρ(P,Q) < δ.

Let 0 < ε < δ2 . Then there exists a compact set K ⊂ S such that (b) holds. By the

compactness of K, there exists a finite set x1, . . . , xn ⊂ K such that Kε ⊂ ⋃ni=1Bi, where

Bi = B(xi, 2ε). Fix x0 ∈ S and an integerm ≥ nε and let N be the finite collection of probability

measures of the form

P =

n∑

i=0

(

ki

m

)

δxi, (5.1)

where ki are integers with 0 ≤ ki ≤ m and∑n

i=0 ki = m.

Given Q ∈ M, let ki = ⌊mQ(Ei)⌋ for i = 1, 2, . . . , n, where Ei = Bi \⋃i−1

j=1Bj , and let

k0 = m−∑ni=1 ki. Then, defining P by 5.1, we have

Q(F ) ≤ Q

F∩Ei 6=∅

Ei

+Q((Kε)c)

≤∑

F∩Ei 6=∅

⌊mQ(Ei)⌋m

+n

m+ ε

≤ P (F 2ε) + 2ε

for all closed sets F ⊂ S. So ρ(P,Q) ≤ 2ε < δ as required.

(c ⇒ a) Let ε > 0. Since M is relatively compact, it is totally bounded and hence, for eachn ∈ N, there exists a finite subset Nn ⊂ M such that M ⊂ ⋃P∈Nn

Q : ρ(P,Q) < ε2n+1 . Since

Nn is finite, by Lemma 5.4, for each n ∈ N we can choose a compact set Kn ⊂ S such thatP (Kn) ≥ 1 − ε

2n+1 for all P ∈ Nn. Given Q ∈ M, for each n ∈ N, there exists Pn ∈ Nn suchthat

Q(Kε/2n+1

n ) ≥ Pn(Kn) − ε

2n+1≥ 1 − ε

2n.

Letting K be the closure of⋂

n≥1Kε/2n+1

n , K is compact and

Q(K) ≥ 1 −∞∑

n=1

ε

2n= 1 − ε.

5.2 Compact Sets in DE [0,∞)

To apply Prohorov’s Theorem (Theorem 5.2) to P(DE [0,∞)), it is necessary to have a charac-terisation of the compact sets of DE[0,∞). We first give conditions under which a collection ofstep functions is compact.

Definition 5.5. Given a step function x ∈ DE [0,∞), define s0(x) = 0 and, for k = 1, 2, . . .,define sk(x) = inft > sk−1(x) : x(t) 6= x(t−) if sk−1(x) <∞, and sk(x) = ∞ if sk−1(x) = ∞.

Lemma 5.6. For Γ ⊂ E, compact, and δ > 0, define A(Γ, δ) to be the set of step functionsx ∈ DE[0,∞) such that x(t) ∈ Γ for all t ≥ 0, and sk(x)− sk−1(x) > δ for each k ≥ 1 for whichsk−1 <∞. Then the closure of A(Γ, δ) is compact.

Proof. It is enough to show that every sequence in A(Γ, δ) has a convergent subsequence. Sup-pose (xn)n≥1 is a sequence in A(Γ, δ). Either there exists a subsequence (x1a

n )n≥1 such thats1(x

1an ) <∞ for all n, or there exists a subsequence (x1

n)n≥1 such that s1(x1n) = ∞ for all n. In

23

Page 25: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

the first case, there exists a subsequence (x1bn )n≥1 of (x1a

n )n≥1 such that, for some t1 ∈ [δ,∞],

limn→∞ s1(x1bn ) = t1. If t1 <∞, we can insist further that

∣log t1s1(x1b

n )

∣ < 1n . Since Γ is compact,

the sequence (x1bn )n≥1 has a subsequence (x1

n)n≥1 such that limn→∞ x1n(s1(x

1n)) = α1 for some

α1. In this way, we can construct a sequence of subsequences (x1n)n≥1 ⊂ (x2

n)n≥1 ⊂ · · · suchthat, for k = 1, 2, . . ., either

(a) sk(xkn) <∞ for all n, there exists some tk ∈ [kδ,∞] such that limn→∞ sk(xk

n) = tk, and if

tk <∞,∣

∣log tk−tk−1

sk(xkn)−sk−1(xk

n)

∣< 1

n , and limn→∞ xkn(sk(xk

n)) = αk for some αk, or

(b) sk(xkn) = ∞ for all n.

Let (ym)m≥1 be the subsequence of (xn)n≥1 defined by ym = xmm and define y ∈ DE [0,∞) by

y(t) = αk, tk ≤ t < tk+1, for k = 0, 1, . . . where we take t0 = 0. Since sk(ym) − sk−1(ym) > δfor each k ≥ 1 and each m for which sk−1(ym) <∞, we may define λm ∈ Λ to be the piecewiselinear function that joins (sk−1(ym), tk−1) to (sk(ym), tk) if tk < ∞ and which has gradient 1

otherwise. Then γ(λm) = supk

∣logtk−tk−1

sk(ym)−sk−1(ym)

∣ < 1m → 0, and, for each t there exists a k

such that r(ym(t), y(λm(t))) = r(ym(sk(ym)), y(tk)) → 0. Thus ym → y by Proposition 2.6.

The conditions for compactness will be stated in terms of the following modulus of continuity.

Definition 5.7. For x ∈ DE[0,∞), δ > 0, and T > 0, define

w′(x, δ, T ) = infti

maxi

sups,t∈[ti−1,ti)

r(x(s), x(t)),

where ti ranges over all partitions of the form 0 = t0 < t1 < · · · < tn−1 < T ≤ tn withmin1≤i≤n(ti − ti−1) > δ and n ≥ 1. (The initially strange looking condition tn−1 < T ≤ tnallows us to not have to worry about the length of the final interval, in a partition of [0, T ],being smaller than δ. For example, partitions where each interval is the same length, but thislength does not divide T , are admissible.)

Note that w′(x, δ, T ) is non-decreasing in δ and in T , and that

w′(x, δ, T ) ≤ w′(y, δ, T ) + 2 sup0≤s<T+δ

r(x(s), y(s)).

Before we can characterise the compact sets, we need to establish a few properties ofw′(x, δ, T ).

Lemma 5.8. (a) For each x ∈ DE [0,∞) and T > 0, w′(x, δ, T ) is right continuous in δ and

limδ→0

w′(x, δ, T ) = 0.

(b) If (xn)n≥1 is a sequence in DE [0,∞), and limn→∞ d(xn, x) = 0, then

lim supn→∞

w′(x, δ, T ) ≤ w′(x, δ, T + ε)

for every δ > 0, T > 0, and ε > 0.

(c) For each δ > 0 and T > 0, w′(x, δ, T ) is Borel measurable in x.

Proof. (a) The right continuity follows from the fact that any partition 0 = t0 < t1 < · · · <tn−1 < T ≤ tn with min1≤i≤n(ti−ti−1) > δ and n ≥ 1 also satisfies min1≤i≤n(ti−ti−1) > δ′

for δ′ = 12 (δ + min1≤i≤n(ti − ti−1)) > δ.

24

Page 26: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Let N ∈ N and define τN0 = 0 and, for k = 1, 2, . . .,

τNk = inf

t > τNk−1 : r(x(t), x(τN

k−1)) >1

N

if τNk−1 < ∞, τN

k = ∞ if τNk−1 = ∞. Note that the sequence (τN

k )k≥0 is strictly increasing

(as long as its terms remain finite) by the right continuity of x, and for 0 < δ < minτNk+1−

τNk : τN

k < T , w′(x, δ, T ) ≤ maxi sups,t∈[τNi ,τN

i+1)

(

r(x(s), x(τNi )) + r(x(τN

i ), x(t)))

≤ 2N .

Hence limδ→0 w′(x, δ, T ) = 0.

(b) Let (xn)n≥1 be a sequence in DE [0,∞), x ∈ DE [0,∞). Suppose δ > 0 and T > 0. Iflimn→∞ d(xn, x) = 0, then by Proposition 2.6, there exists a sequence (λn)n≥1 in Λ suchthat (2.1) and (2.9) hold when T is replaced by T + δ. For each n, let yn(t) = x(λn(t)) forall t ≥ 0 and let δn = sup0≤t≤T (λn(t+ δ) − λn(t)). Then, for every ε > 0,

lim supn→∞

w′(xn, δ, T ) ≤ lim supn→∞

w′(yn, δ, T ) + 2 lim supn→∞

sup0≤s<T+δ

r(xn(s), x(λn(t)))

≤ lim supn→∞

w′(x, δn, λn(T ))

≤ limn→∞

w′(x, δn ∨ δ, T + ε)

= w′(x, δ, T + ε),

where the first inequality follows from the comment at the end of Definition 5.7, the secondfollows by substituting s, t for λn(s), λn(t) in the definition of w′ and by w′ being non-decreasing in δ, the third follows by w′ being non-decreasing in δ and T and by λn(T ) → T ,and the equality follows by w′ being right continuous in δ (part (a)).

(c) Define w′(x, δ, T+) = limε↓0 w′(x, δ, T + ε). This exists by the monotonicity of w′ in T .Then if xn → x, by (b),

lim supn→∞

w′(xn, δ, T+) ≤ limε↓0

w′(x, δ, T + 2ε)

= w′(x, δ, T+).

So w′(x, δ, T+) is upper semicontinuous and hence Borel measurable in x. The result fol-lows by the observation that w′(x, δ, T ) = limε↓0 w′(x, δ, (T −ε)+) for every x ∈ DE [0,∞).

Theorem 5.9. Let (E, r) be complete. Then the closure of A ⊂ DE [0,∞) is compact if andonly if the following two conditions hold:

(a) For every rational t ≥ 0, there exits a compact set Γt ⊂ E such that x(t) ∈ Γt for allx ∈ A.

(b) For each T > 0,limδ→0

supx∈A

w′(x, δ, T ) = 0.

Proof. Suppose A satisfies (a) and (b), and let l ≥ 1. Choose δl ∈ (0, 1) such that

supx∈A

w′(x, δl, l) ≤1

l

and ml ≥ 2 such that 1ml

< δl. Define Γ(l) =⋃(l+1)ml

i=0 Γi/ml. Using the notation of Lemma 5.6,

let Al = A(Γ(l), δl).

25

Page 27: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Given x ∈ A, there is a partition 0 = t0 < t1 < · · · < tn−1 < l ≤ tn < l+ 1 < tn+1 = ∞ withmin1≤i≤n(ti − ti−1) > δl such that

max1≤i≤n

supt∈[ti−1,ti)

r(x(s), x(t)) ≤ 2

l.

Define x′ ∈ Al by x′(t) = x((⌊mlti⌋ + 1)/ml) for ti ≤ t < ti+1, i = 0, 1, . . . , n. Then, by thedefinition of ml, ti ≤ (⌊mlti⌋ + 1)/ml ≤ ti + 1

ml< ti+1, sup0≤t<l r(x

′(t), x(t)) ≤ 2l . Hence

d(x′, x) ≤∫ ∞

0

e−u supt≥0

r(x′(t ∧ u), x(t ∧ u)) ∧ 1du

≤ 2

l+ e−l

<3

l,

and so A ⊂ A3/ll . Now l was arbitrary and so A ⊂ ⋂l≥1 A

l/3l . By Lemma 5.6, Al is compact for

each l ≥ 1 and hence A is totally bounded. It follows that A has compact closure, as required.

Conversely, suppose that A has compact closure. For each rational t ≥ 0, define Γt ⊂ E byΓt = At, where At = x(t) : x ∈ A. In order to show that Γt is compact, it suffices to showthat every sequence in At has a convergent subsequence. Suppose (xn(t))n≥1 is a sequence inAt. Since A has compact closure, by restricting to a subsequence if necessary, we may assumethat xn → x, for some x ∈ DE[0,∞). By Proposition 2.6, there exists a sequence (λn)n≥1

in Λ such that (2.1) and (2.9) hold. There is a subsequence λnrsuch that either λnr

(t) ≥ tfor all nr, or λnr

(t) < t for all nr. In the first case, r(xnr(t), x(t)) ≤ r(xnr

(t), x(λnr(t))) +

r(x(λnr(t)), x(t)), the first term of which converges to 0 by (2.9) and the second term of which

converges to 0 by (2.4) and the right continuity of x. In the second case, r(xnr(t), x(t−)) ≤

r(xnr(t), x(λnr

(t))) + r(x(λnr(t)), x(t−)), which converges to 0 similarly. Hence (xn(t))n≥1 has

a convergent subsequence, and (a) holds.

To see that (b) holds, suppose there exist η > 0, T > 0 and a sequence (xn)n≥1 in A such thatw′(xn,

1n , T ) ≥ η for all n. Since A has compact closure, we may assume that limn→∞ d(xn, x) =

0 for some x ∈ DE [0,∞). By Lemma 5.8(b),

η ≤ lim supn→∞

w′(xn, δ, T ) ≤ w′(x, δ, T + 1)

for all δ > 0. Letting δ → 0, by Lemma 5.8(a), the right hand side tends to zero, resulting in acontradiction. Hence (b) holds.

5.3 Some Useful Criteria

We now combine the above characterisation with Prohorov’s Theorem (Theorem 5.2) to obtainsome useful criteria for relative compactness in DE [0,∞).

Theorem 5.10. Let (E, r) be complete and separable, and let Xα be a family of stochasticprocesses with sample paths in DE [0,∞). Then Xα is relatively compact if and only if thefollowing two conditions hold:

(a) For every η > 0 and rational t ≥ 0, there exists a compact set Γη,t ⊂ E such that

infα

P(

Xα(t) ∈ Γηη,t

)

≥ 1 − η.

(b) For every η > 0 and T > 0, there exists δ > 0 such that

supα

P (w′(Xα, δ, T ) ≥ η) ≤ η.

26

Page 28: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

(Note that, by Lemma 5.8, w′(x, δ, T ) is Borel measurable and so the set w′(Xα, δ, T ) ismeasurable in the probability space underlying Xα. Hence it is legitimate to refer to theprobability of this set).

Proof. If Xα is relatively compact, then since (DE [0,∞), d) is complete and separable (The-orem 2.10) we may apply Prohorov’s Theorem (Theorem 5.2), to find a compact set Kη ⊂DE [0,∞) such that infα P (Xα ∈ Kη) ≥ 1 − η. Since Kη is compact, by Theorem 5.9, for everyrational t ≥ 0, there exists a compact set Γη,t such that x(t) ∈ Γη,t for all x ∈ Kη i.e.

infα

P (Xα(t) ∈ Γη,t) ≥ infα

P (Xα ∈ Kη) ≥ 1 − η.

Also, by Theorem 5.9, limδ→0 supx∈Kηw′(x, δ, T ) = 0 and so, for each T > 0, there exists δ > 0

such that supx∈Kηw′(x, δ, T ) < η. Then

supα

P (w′(Xα, δ, T ) ≥ η) ≤ supα

P(Xα /∈ Kη)

≤ 1 − infα

P(Xα ∈ Kη)

≤ η.

So (a) and (b) hold and in fact Γηη,t can be replaced by Γη,t in (a).

Conversely, let ε > 0, let T be a positive integer such that e−T < ε2 , and choose δ > 0 such

that (b) holds with η = ε4 . Let m > 1

δ and set Γ =⋃mT

i=0 Γε2−i−2, im

. Then

infα

P(

Xα(i/m) ∈ Γε4 , i = 0, 1, . . . ,mT

)

≥ 1 −mT∑

i=0

(

1 − infαP(

Xα(i/m) ∈ Γε2−i−2

ε2−i−2, im

))

≥ 1 −mT∑

i=0

ε2−i−2

≥ 1 − ε

2.

Using the notation of Lemma 5.6, let A = A(Γ, δ). By the lemma, A has compact closure.

Given x ∈ DE [0,∞), with w′(x, δ, T ) < ε4 and x(i/m) ∈ Γ

ε4 for i = 0, 1, . . . ,mT , by the

definition of w′ (Definition 5.7), there exists a partition 0 = t0 < t1 < · · · < tn−1 < T ≤ tn suchthat min1≤i≤n(ti − ti−1) > δ and

max1≤i≤n

sups∈[ti−1,ti)

r(x(s), x(t)) <ε

4,

and there exist yi ⊂ Γ such that r(x(i/m), yi) <ε4 , for i = 0, 1, . . . ,mT . Define x′ ∈ A by

x′(t) =

y⌊mti−1⌋+1, ti−1 ≤ t < ti, i = 1, . . . , n− 1

y⌊mtn−1⌋+1, t ≥ tn−1.

Then since m > 1δ , for each i = 1, . . . , n, ti−1 ≤ ⌊mti−1⌋+1

m < ti and so if ti−1 ≤ t < ti ∧ T , thenr(x(t), x′(t)) ≤ r(x(t), x((⌊mti−1⌋+ 1)/m))+ r(x((⌊mti−1⌋+ 1)/m), y⌊mti−1⌋+1) <

ε2 , and hence

d(x, x′) < ε2 + e−T < ε, implying that x ∈ Aε. Consequently,

infα

P(Xα ∈ Aε) ≥ infα

P

(

Xα(i/m) ∈ Γε4 , i = 0, 1, . . . ,mT and w′(Xα, δ, T ) <

ε

4

)

≥ 1 −(ε

2+ε

4

)

> 1 − ε.

Since Theorem 2.10 implies (DE [0,∞), d) is complete and separable, by Prohorov’s Theorem(Theorem 5.2), Xα is relatively compact.

27

Page 29: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Corollary 5.11. Let (E, r) be complete and separable, and let (Xn)n≥1 be a sequence of stochas-tic processes with sample paths in DE[0,∞). Then (Xn)n≥1 is relatively compact if and only ifthe following two conditions hold:

(a) For every η > 0 and rational t ≥ 0, there exists a compact set Γη,t ⊂ E such that

lim infn→∞

P(

Xn(t) ∈ Γηη,t

)

≥ 1 − η.

(b) For every η > 0 and T > 0, there exists δ > 0 such that

lim supn→∞

P (w′(Xn, δ, T ) ≥ η) ≤ η.

Proof. The conditions are necessary as an immediate consequence of Theorem 5.10.

For the sufficiency, fix η > 0, rational t ≥ 0, and T > 0. For every n ∈ N, by Lemma 5.4there exists a compact set Γn ⊂ E such that P(Xn(t) ∈ Γη

n) ≥ 1 − η and by Lemma 5.8(a)there exists δn > 0 such that P(w′(Xn, δn, T ) ≥ η) ≤ η. By conditions (a) and (b) there existsa compact set Γ0 ⊂ E, δ0 > 0, and a positive integer n0 such that

infn≥n0

P (Xn(t) ∈ Γη0) ≥ 1 − η

andsup

n≥n0

P (w′(Xn, δ0, T ) ≥ η) ≤ η.

We can replace n0 by 1 in the above relations if we replace Γ0 by Γ =⋃n0−1

n=0 Γn and δ0 by

δ =∧n0−1

n=0 δn. The result follows by Theorem 5.10.

6 A Law of Large Numbers

We now change course slightly to discuss a generalisation of the Law of Large Numbers to asequence of Markov jump processes. We shall use this result in the next section when we applythe theory we have built up so far to establishing a generalisation of the Central Limit Theoremto Markov processes.

The Law of Large Numbers essentially demonstrates that the average behaviour of a sequenceof independent identically distributed random variables with finite means becomes deterministicas the number of random variables becomes large. By viewing these random variables as thejump sizes in a random walk, the value of

X1 +X2 + · · · +XN

N

can be regarded as the position of the random walk at time 1 if the jump rate is increased andthe jump size is decreased, each by a factor of N (See Figure 1). In this setting, the Law of LargeNumbers can be interpreted as saying that in the limit as N → ∞ the position of a random walkat time 1 becomes deterministic if the jump rate is increased, and the jump size is decreased,each by a factor of N .

Now suppose that instead of a random walk we have a general pure jump Markov process(defined below). We generalise the Law of Large Numbers by showing that, under certainconditions, if we alter the jump rate and jump size as described above, then, in the limit asN → ∞, the position of the jump process at time t is deterministic for all t. This deterministiclimit is known as the fluid limit.

28

Page 30: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Figure 1: Illustration of the scaling process for N = 3

6.1 Preliminaries

We start with a few definitions.

Definition 6.1. A stochastic process X = (Xt)t≥0 taking values in a subset I of Rd is a purejump process if there exist some (possibly infinite) random times 0 = J1 < J2 < . . . < Jn ↑ ζ,(where ∞ <∞ is allowed) and some I-valued process (Yn)n∈Z+ such that

Xt =

Yn if Jn ≤ t < Jn+1

∂ if t ≥ ζ

where ζ (which may be infinite) is the explosion time of the process and ∂ is some cemeterystate.

X is a pure jump Markov process with Levy kernel K if it is a pure jump process and, for alln ∈ N,

P(Jn ∈ dt, ∆XJn∈ dy | Jn > t, XJn−1 = x) = K(x, dy)dt,

where ∆XJn= XJn

−XJn−1 is the displacement of the nth jump.

Definition 6.2. The jump measure µ of X on (0,∞) × Rd is given by

µ =

∞∑

n=1

δ(Jn,∆XJn )

where δ(t,y) denotes the unit mass at (t, y) i.e.∫

fdµ =∑∞

n=1 f(Jn,∆XJn).

We also introduce the random measure ν on (0,∞) × Rd, given by

ν(dt, dy) = K(Xt−, dy)dt.

29

Page 31: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Definition 6.3. The Laplace transform of a pure jump Markov process with Levy kernel K isgiven by

m(x, θ) =

Rd

e〈θ,y〉K(x, dy).

We assume that, for some η0 > 0,

supx∈I

sup|θ|≤η0

m(x, θ) ≤ C <∞. (6.1)

The conditions for the fluid limit to exist will be formulated in terms of the Laplace transform,so before we state the main theorem, we establish a few preliminary results.

Fix η ∈ (0, η0). We establish bounds on m′′(x, θ) and m′′′(x, θ) for |θ| ≤ η in the lemmabelow, where ′ denotes differentiation in θ. Although we only use the first bound in the proof ofthe fluid limit, the second bound will be useful in the next section and it is convenient to provethem together.

Lemma 6.4. There exist A <∞ and B <∞ such that

|m′′(x, θ)| ≤ A and |m′′′(x, θ)| ≤ B (6.2)

for all x ∈ I and |θ| ≤ η.

Proof. Set δ = η0 − |θ| ≥ η0 − η > 0. Note that (δy)2 ≤ eδy + e−δy for all y ∈ R. Then

|y|2 = (y1)2 + · · · + (yd)

2 ≤ δ−2d∑

i=1

(eδyi + e−δyi)

and so∫

Rd

|y|2e〈θ,y〉K(x, dy) ≤ δ−2d∑

i=1

(∫

Rd

eδyie〈θ,y〉K(x, dy) +

Rd

e−δyie〈θ,y〉K(x, dy)

)

= δ−2d∑

i=1

(∫

Rd

e〈θ+δei,y〉K(x, dy) +

Rd

e〈θ−δei,y〉K(x, dy)

)

≤ 2Cd

δ2

since |θ± δei| ≤ |θ|+ δ = η0. We may now differentiate twice under the integral sign to see thatm′′(x, θ) exists (m′(x, θ) exists since |y| ≤ 1

2 (|y|2 + 1)). Then for |u|, |v| ≤ 1,

〈u,m′′(x, θ)v〉 =

Rd

〈u, y〉〈y, v〉e〈θ,y〉K(x, dy)

≤∫

Rd

|u||y|2|v|e〈θ,y〉K(x, dy)

≤ 2Cd

δ2

≤ 2Cd

(η0 − η)2

So, setting A = 2Cd(η0−η)2 gives the required result.

Now, with δ as above, (δy)2 ≤ 94 (e

23 δy + e−

23 δy) and so

|y|3 =(

(y1)2 + · · · + (yd)

2)

32 ≤ 27

8δ−3

(

d∑

i=1

(e23 δyi + e−

23 δyi)

)

32

≤ 27

8δ−3(2d)

12

d∑

i=1

(eδyi + e−δyi)

30

Page 32: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

where the last inequality follows by Jensen’s Inequality and the concavity of x23 . Then, as above

Rd

|y|3e〈θ,y〉K(x, dy) ≤ 27

8δ−3(2d)

12 2Cd ≤ 27Cd

32

2√

2(η0 − η)3

and so, by the same reasoning as above, m′′′(x, θ) exists and taking B = 27Cd32

2√

2(η0−η)3gives the

required result.

Proposition 6.5. Let a : Ω × (0,∞) × Rd → R be a previsible process satisfying

E

∫ t

0

Rd

|a(s, y)|ν(ds, dy) <∞

for all t. Then the following process is a martingale.

∫ t

0

Rd

a(s, y)(µ− ν)(ds, dy).

Proof. This result is well known.

In particular, by the proof of Lemma 6.4, |y| ≤ 12 (|y|2 + 1) ≤ 1

2 (η−20

∑di=1(e

〈η0ei,y〉 +

e〈−η0ei,y〉) + 1), and so (6.1) gives

E

∫ t

0

Rd

|y|ν(ds, dy) ≤ 1

2

(

2d

η20

+ 1

)

Ct <∞

and hence we may take a(s, y) = y to get the martingale

Mt =

Rd

y(µ− ν)(ds, dy). (6.3)

For θ ∈ (Rd)∗, define

φ(x, θ) =

Rd

e〈θ,y〉 − 1 − 〈θ, y〉

K(x, dy).

Then φ ≥ 0 and, by the second-order mean value theorem,

φ(x, θ) =

∫ 1

0

〈θ,m′′(x, rθ)θ〉(1 − r)dr.

Therefore, by Lemma 6.4

φ(x, θ) ≤∫ 1

0

|m′′(x, rθ)||θ|2(1 − r)dr ≤ 1

2A|θ|2,

for x ∈ I and |θ| ≤ η.

Definition 6.6. Let (θt)t≥0 be a previsible process in (Rd)∗ with |θt| ≤ η for all t. Define

Zt = Zθt = exp

∫ t

0

〈θs, dMs〉 −∫ t

0

φ(Xs, θs)ds

.

Lemma 6.7. (Zt)t≥0 is a martingale.

31

Page 33: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Proof. Since |θt| ≤ η for all t,

|Zt| ≤ exp∫ t

0

|θs||dMs| +∫ t

0

|φ(Xs, θs)|ds

≤ exp∫ t

0

Rd

2η|y|ν(ds, dy) +

∫ t

0

1

2Aη2dt

≤ exp

(

2d

η20

+ 1

)

ηCt+1

2Aη2t

,

and so (Zt)t≥0 is locally bounded. Now,

Zt − Zt− = Zt−

(

exp∫

Rd

〈θt, y〉(µ− ν)(dt, dy) −∫

Rd

e〈θt,y〉 − 1 − 〈θt, y〉ν(dt, dy) − 1

)

= Zt−

y:(Jn,∆XJn)=(t,y)

e〈θt,y〉 − 1 +

Rd

(e〈θt,y〉 − 1)ν(dt, dy)

=

Rd

Zt−(e〈θt,y〉 − 1)(µ− ν)(dt, dy).

Hence

Zt = 1 +

∫ t

0

Rd

Zs−(e〈θs,y〉 − 1)(µ− ν)(ds, dy),

and so (Zt)t≥0 is a non-negative local martingale. Therefore there exist stopping times Tn → ∞such that (ZTn

t )t≥0 is a martingale and so, by Fatou’s Lemma, E(Zt) = E(lim infn→∞ ZTn

t ) ≤lim infn→∞ E(ZTn

t ) = 1. Therefore

E

∫ t

0

Rd

|Zs−(e〈θs,y〉 − 1)|ν(ds, dy) ≤ E

∫ t

0

Zs−

Rd

(e〈θs,y〉 + 1)K(Xs−, dy)ds

≤ E

∫ t

0

Zs(m(Xs, θs) +m(Xs, 0))ds

≤ 2Ct <∞.

By Proposition 6.5, (Zt)t≥0 is a martingale.

Proposition 6.8 (The Exponential Martingale Inequality). For all δ ∈ (0, Aηt√d],

P

(

sups<t

|Ms| > δ

)

≤ 2de−δ2

2Adt .

Proof. Fix θ ∈ (Rd)∗ with |θ| = 1 and, for any δ′ > 0, consider the stopping time

T = inft ≥ 0 : 〈θ,Mt〉 > δ′.For any ε < η, by Lemma 6.7, (Zεθ

t )t≥0 is a martingale, where θt = θ for all t. Therefore, bythe Optional Stopping Theorem, E(Zεθ

T∧t) = E(Zεθ0 ) = 1. Since on the set T ≤ t,

ZεθT∧t ≥ exp

ε〈θ,MT 〉 −∫ T

0

1

2Aε2|θ|2dt

≥ eδ′ε− 12Atε2

(using the right continuity of M to get 〈θ,MT 〉 ≥ δ′),

P

(

sups≤t

〈θ,Ms〉 > δ′)

= P(T ≤ t)

≤ P

(

ZεθT∧t ≥ eδ′ε− 1

2Atε2)

≤ e−δ′ε+ 12Atε2

E(ZεθT∧t)

= e−δ′ε+ 12Atε2

,

32

Page 34: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

where the second inequality is by Chebyshev’s Inequality. In particular, when δ′ ≤ Atη, we cantake ε = δ′

At to get

P

(

sups≤t

〈θ,Ms〉 > δ′)

≤ e−δ′22At .

Now, if sups≤t |Ms| > δ, then sups≤t〈θ,Ms〉 > δ√d

for one of θ = ±e1, . . . ,±ed. Setting δ′ =δ√d≤ ATη gives

P(sups<t

|Ms| > δ) ≤d∑

i=1

P

(

〈ei,Ms〉 >δ√d

)

+

d∑

i=1

P

(

〈−ei,Ms〉 >δ√d

)

≤ 2de−δ2

2Adt .

Finally we state Gronwall’s Inequality, which will be useful in this section and the next.

Lemma 6.9 (Gronwall’s Inequality). Let µ be a Borel measure on [0,∞), let ε ≥ 0 and letf be a Borel measurable function that is bounded on bounded intervals and satisfies

0 ≤ f(t) ≤ ε+

[0,t)

f(s)µ(ds)

for t ≥ 0. Thenf(t) ≤ εeµ[0,t).

In particular, if M > 0 and

0 ≤ f(t) ≤ ε+M

∫ t

0

f(s)ds

for t ≥ 0, thenf(t) ≤ εeMt.

Proof. This result is well known.

6.2 The Fluid Limit

Let (XNt )t≥0 be a sequence of pure jump Markov processes with Levy kernels KN (x, dy) taking

values in some subsets IN of Rd. Let S be an open subset in Rd and set SN = IN ∩S. We shallstudy, under certain conditions, the limiting behaviour of (XN

t )t≥0 as N → ∞, on compact timeintervals, up to the first time the process leaves S.

The introduction of S does not add any additional restrictions as we are free to take S = Rd.However, in some situations our processes may all stop abruptly on leaving some open set Ui.e. KN(x, dy) = 0 for all x /∈ U and when this happens it is useful to restrict our attention towhat happens before the processes leave U . In this case we may choose S to be a subset of U .In other cases we may wish to take S to be small to make it easier to check that the variousconvergence conditions are satisfied. The only restriction imposed on S is that the conjecturedlimit path does not leave S in the relevant compact time interval.

Definition 6.10. Fix t0 > 0 and set

TN = inft ≥ 0 : XNt /∈ S ∧ t0.

¿From now on we shall assume that XNt = XN

t∧T N for all t ≥ 0.

33

Page 35: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Let mN (x, θ) be the Laplace transform corresponding to Levy kernel KN(x, dy) for x ∈ SN

and θ ∈ (Rd)∗ (see Definition 6.3). We assume that there is a limit kernel K(x, dy) defined forx ∈ S with corresponding Laplace transform m(x, θ) with the following properties:

(a) There exists a constant η0 > 0 such that m(x, θ) is bounded for all x ∈ S and |θ| ≤ η0.

(b) As N → ∞,

supx∈SN

sup|θ|≤η0

mN (x,Nθ)

N−m(x, θ)

→ 0. (6.4)

We give some justification of condition (6.4) with the following example.

Example 6.11. Consider the case outlined at the beginning of this chapter where XN arisesfrom X1 by increasing the jump rate by a factor of N and decreasing the jump size by a factor ofN . Then XN

t = 1N (X1

Nt −X10 ) +XN

0 = 1N (XNt −X0) +XN

0 . So, setting K(x, dy) = K1(x, dy),we get

K(x, dy)dt = P (J1 ∈ dt, ∆XJ1 ∈ dy | J1 > t, X0 = x)

= P

(

NJN1 ∈ dt, N∆XN

JN1

∈ dy | NJN1 > t, XN

0 = x)

= P

(

JN1 ∈ dt/N, ∆XN

JN1

∈ dy/N | JN1 > t/N, XN

0 = x)

= KN (x, dy/N) dt/N.

HenceKN (x, dy

N)

N = K(x, dy) and

m(x,Nθ)

N=

1

N

Rd

e〈Nθ,y〉KN (x, dy)

=

Rd

e〈θ,y′〉KN (x, dy′/N)/N

=

Rd

e〈θ,y′〉K(x, dy′)

= m(x, θ)

(where y′ = Ny) i.e. (6.4) holds. Hence our fluid limit theorem will apply in this case (subjectto a few additional constraints on K(x, dy)) and so we do have a generalisation of the Law ofLarge Numbers as claimed at the beginning of the section.

We also assume that there exists some x0 ∈ S, the closure of S such that, for all δ > 0,

lim supN→∞

N−1 log P(|XN0 − x0| ≥ δ) < 0. (6.5)

What this means is that there exists some γδ > 0 such that P(|XN0 − x0| ≥ δ) < e−γδN for

sufficiently large N i.e. XN0 converges in probability to x0 exponentially fast in N . This holds

in the simple case XN0 = xN

0 (deterministic) with xN0 → x0 as N → ∞.

Set b(x) = m′(x, 0), where once again ′ denotes differentiation in θ. We make the finalassumptions that b is Lipschitz on S and that S has a Lipschitz boundary so that b has anextension to a Lipschitz vector field b on Rd. Then there is a unique solution (xt)t≥0 to the

ordinary differential equation xt = b(xt) starting from x0.

Theorem 6.12 (Fluid Limit). Under the above assumptions, for all δ > 0,

lim supN→∞

N−1 log P( supt≤T N

|XNt − xt| ≥ δ) < 0,

where TN is defined in Definition 6.10.

34

Page 36: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Proof. By assumption (6.4), since m(x, θ) is bounded, there exists some constant C < ∞ suchthat, for all n ∈ N,

supx∈SN

sup|θ|≤η0

mN (x,Nθ)

N≤ C.

Fix η ∈ (0, η0). Then, by the proof of Lemma 6.4,

|mN ′′(x, θ)| ≤ 2NCd

(Nη0 −Nη)2=A

N,

for all x ∈ SN and |θ| ≤ Nη.

Set bN (x) = mN ′(x, 0) and define (MNt )t≥0 by

XNt = XN

0 +MNt +

∫ t

0

bN (XNs )ds.

Then

MNt = XN

t −XN0 −

∫ t

0

Rd

yKN(XNs , dy)ds

=∑

JNn ≤t

∆XJNn−∫ t

0

Rd

yνN (ds, dy)

=

∫ t

0

Rd

y(µN − νN )(ds, dy),

where µN and νN are defined as in Definition 6.2. MNt corresponds to the martingale Mt in

(6.3) and hence we may apply Proposition 6.8 to MNt to conclude that, if ε0 = A

NNηt0√d =

Aηt0√d > 0 and C0 = max2d, 2Adt0, then, for all ε ∈ (0, ε0],

P

(

supt≤T N

|MNt | > ε

)

≤ 2de−Nε2

2AdT N ≤ C0e−Nε2

C0 . (6.6)

Given δ > 0, set ε = min δ2e

−Kt0 , ε0, where K is the Lipschitz constant of b. Let

ΩN = |XN0 − x0| ≤ ε and sup

t≤T N

|MNt | ≤ ε.

Then by (6.5) and (6.6)

P(Ω \ ΩN ) = P(|XN0 − x0| > ε or sup

t≤T N

|MNt | > ε)

≤ P(|XN0 − x0| > ε) + P( sup

t≤T N

|MNt | > ε)

≤ e−γεN + C0e−Nε2

C0

for large enough N , and, by L’Hopital’s Rule,

lim supN→∞

N−1 log(e−γεN + C0e−Nε2

C0 ) = lim supN→∞

−γεe−γεN − ε2

C0C0e

−Nε2

C0

e−γεN + C0e−Nε2

C0

≤ −minγε, ε2/C0 < 0.

Hencelim supN→∞

N−1 log P(Ω \ ΩN ) < 0.

35

Page 37: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

However, by (6.4), there exists N1 ∈ N such that

supx∈SN

sup|θ|≤η0

mN (x,Nθ)

N−m(x, θ)

≤ ε2

8At0d

for all N ≥ N1. Then for each N ≥ N1, pick h ∈ (Rd)∗ in the same direction as bN(x) − b(x)and with |h| = ε

2At0√

d. |h| ≤ η by the definition of ε0, since ε ≤ ε0. Then

|bN(x) − b(x)| =|〈h, bN (x) − b(x)〉|

|h|

≤ 1

|h|

( ∣

〈Nh, bN(x)〉N

− mN (x,Nh) −mN (x, 0)

N

+

mN (x,Nh)

N−m(x, h)

+

mN (x, 0)

N−m(x, 0)

+ |m(x, h) −m(x, 0) − 〈h, b(x)〉|)

≤ 1

|h|

(

φN (x,Nh)

N

+ 2 supx∈SN

sup|θ|≤η0

mN (x,Nθ)

N−m(x, θ)

+ |φ(x, h)|)

≤ 1

|h|

(

1

N

1

2

A

NN2|h|2 + 2

ε2

8At0d+

1

2A|h|2

)

t0(6.7)

We note that

XNt − xt = (XN

0 − x0) +MNt +

∫ t

0

(bN (XNs ) − b(XN

s ))ds+

∫ t

0

(b(XNs ) − b(xs))ds

for t ≤ TN ≤ t0. So, for N ≥ N1, on ΩN ,

|XNt − xt| ≤ 3ε+K

∫ t

0

|XNs − xs|ds

which implies, by Gronwall’s lemma (Lemma 6.9), that supt≤T N |XNt −xt| ≤ 3εeKt0 ≤ δ. Hence

lim supN→∞

N−1 log P( supt≤T N

|XNt − xt| ≥ δ) ≤ lim sup

N→∞N−1 log P(Ω \ ΩN ) < 0,

as required.

Remark 6.13. A corresponding generalisation of the Law of Large Numbers holds for diffusionprocesses. Suppose X1 is a diffusion process with diffusivity a1 and drift b1, satisfying theconditions of Section 3.2.3. For simplicity assume that X1

0 = 0. Then there exists a Brownianmotion B such that

dX1t = σ1(X

1t )dBt + b1(X

1t )dt,

where σ1(x)σ1(x)∗ = a1(x). Defining XN as in Example 6.11, but with XN

0 = 0, we getXN

t = 1NX

1Nt and so,

dXNt =

σ1(NXNt )√

NdB′

t + b1(NXNt )dt,

where B′t = 1√

NBNt is a Brownian motion. Hence XN has diffusivity aN (x) = a1(Nx)

N , and

drift bN (x) = b1(Nx). Since a1 is bounded, aN → 0 uniformly as N → ∞. If there exists someLipschitz function b such that bN → b uniformly, let xt be the solution to the ordinary differentialequation dxt

dx = b(xt) with x0 = 0. By Section 3.2.3, XNt → xt in probability, generalising the

Law of Large Numbers, as required.

36

Page 38: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

6.3 A Brief Look at the Exit Time

We conclude this section by making some observations about the limiting distribution of theexit time TN . This will enable us to make the statement of Theorem 6.12 more precise and willalso be useful in the next section.

Definition 6.14. Defineτ = inft ≥ 0 : xt /∈ S ∧ t0,T = t ∈ [0, τ) : xt /∈ S.

We aim to show that

lim supN→∞

N−1 log P( inft∈T ∪τ

|TN − t| > δ) < 0.

SetΩN

δ = supt≤T N

|XNt − xt| < δ

andAδ = s ∈ [0, t0] : inf

y∈Sc|xs − y| < δ.

Note that since S is open, by the right continuity of XN , either TN = t0 or XNT N /∈ S. Hence,

on the set ΩNδ , TN ∈ Aδ or TN = t0 (See Figure 2).

Figure 2: Illustration of the set Aδ

Note that if τ 6= t0, then, for 0 < ε < t0−τ there exists some τε ∈ [τ, τ+ε) such that xτε/∈ S.

Since (S)c is open, there exists δ > 0 such that B(xτ , δ)∩ S = ∅. Then on ΩNδ , XN

τε/∈ S and so

TN ≤ τε < t0. Hence TN = t0 on ΩNδ implies that τ = t0.

Lemma 6.15. sups∈Aδ inft∈T ∪τ |s− t| → 0 as δ → 0.

37

Page 39: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Proof. Suppose the contrary. Then there exists some ε > 0 such that, for each n ∈ N, thereexists some sn ∈ A

1n , with inft∈T ∪τ |sn − t| ≥ ε. By the definition of A

1n , there is a sequence

yn ∈ Sc with |xsn− yn| < 1

n . Since sn is in the compact interval [0, t0] for all n, there existsa convergent subsequence snr

→ s for some s ≤ t0. As xt is continuous, xsnr→ xs. Hence

ynr= (ynr

− xsnr) + xsnr

→ xs and, since Sc is closed, xs ∈ Sc i.e. s ∈ T ∪ τ, contradictingour choice of sn.

Proposition 6.16. For all δ > 0,

lim supN→∞

N−1 log P

(

inft∈T ∪τ

|TN − t| > δ

)

< 0.

Proof. By the above lemma, given δ > 0, there exists an ε > 0 such that sups∈Aε inft∈T ∪τ |s−t| ≤ δ and such that, if τ 6= t0, B(xτη

, ε) ∩ S = ∅ for some 0 < η < t0 − τ . On ΩNε , TN ∈ Aε

or TN = t0. Hence inft∈T ∪τ |TN − t| ≤ δ and so inft∈T ∪τ |TN − t| > δ ⊂ Ω \ ΩNε . By

Theorem 6.12,

lim supN→∞

N−1 log P( inft∈T ∪τ

|TN − t| > δ) ≤ lim supN→∞

N−1 log P(Ω \ ΩNε )

= lim supN→∞

N−1 log P( supt≤T N

|XNt − xt| > δ)

< 0.

In particular, if T is empty, then TN → τ in probability, exponentially fast in N . SinceXN

t = XNt∧T N , for all t ≥ TN , |XN

t − xt∧τ | ≤ |XT N − xT N | + |xT N − xt∧τ |. Now as xt iscontinuous, there exists some ε0 > 0 such that |TN − τ | ≤ ε0 implies that the second term isless than δ

2 for all t ≥ TN . Choosing 0 < ε < δ2 sufficiently small that ΩN

ε ⊂ |TN − τ | ≤ ε0gives

lim supN→∞

N−1 log P(supt≤t0

|XNt − xt∧τ | > δ) ≤ lim sup

N→∞N−1 log P(Ω \ ΩN

ε ) < 0.

7 A Central Limit Theorem

The Central Limit Theorem states that if we have a sequence (Xn)n∈N of independent identicallydistributed random variables with finite mean and variance, then

√N

(

X1 +X2 + · · · +XN

N− µ

)

converges in distribution to a N(0, σ2) random variable where σ2 is the variance of each randomvariable, and µ is the mean of each random variable or, equivalently, the deterministic limit towhich (X1 +X2 + · · · +XN )/N converges by the Law of Large Numbers.

As in the previous section, we can view these random variables as the jump sizes in a randomwalk, and so obtain a corresponding statement about the position of the random walk at time1 if the jump rate is increased and the jump size is decreased, each by a factor of N .

In this section we generalise the Central Limit Theorem to Markov jump processes by showingthat, under certain conditions, if we alter the jump rate and jump size as described above toobtain a sequence (XN

t )t≥0 of jump processes, then,

√N(

XNt − xt

)

38

Page 40: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

converges in distribution to a Gaussian process (the analogue of a Gaussian random variablefor processes), where xt is the deterministic fluid limit obtained in the previous section bygeneralising the Law of Large Numbers.

We prove our result for the more general sequence of Markov jump processes described inthe previous section.

Definition 7.1. DefineγN

t =√N(

XNt − xt∧T N

)

,

where TN is the exit time defined in Definition 6.10 and, as before, XNt = XN

t∧T N .

In addition to the assumptions of the previous section, we also assume that

(a) there exists some random variable γ0 such that

γN0 ⇒ γ0, (7.1)

(b)

supx∈SN

√N |bN (x) − b(x)| → 0, (7.2)

where, as before, bN(x) = mN ′(x, 0) and b(x) = m′(x, 0),

(c) b is C1 on S,

(d) a, defined by a(x) = m′′(x, 0), is Lipschitz on S.

Example 7.2. In the case where XN arises from X1 by increasing the jump rate by a factor

of N and decreasing the jump size by a factor of N , by Example 6.11 mN (x,Nθ)N = m(x, θ) and

so bN (x) = b(x). Hence assumption (b) holds in this case and so our convergence theorem willapply in this case (subject to a few additional constraints on K(x, dy)) and so we do have ageneralisation of the Central Limit Theorem as claimed at the beginning of the section.

Note that

〈u, a(x)v〉 =

Rd

〈u, y〉〈y, v〉K(x, dy) = 〈ua(x)∗, v〉

and

〈v, a(x)v〉 =

Rd

〈v, y〉2K(x, dy) ≥ 0

so a is symmetric positive definite and hence there exists σ, unique up to change of sign, suchthat σ(x)σ(x)∗ = a(x).

Definition 7.3. Let (γt)t≤τ be the unique solution to the linear stochastic differential equation

dγt = σ(xt)dBt + ∇b(xt)γtdt

starting from γ0, where τ is defined in Definition 6.14 and B is a Brownian motion. Note thatthe distribution of (γt)t≤τ does not depend on the choice of σ since −B and B have the samedistributions.

As the limiting distribution of γNt will depend on the limiting behaviour of TN , we make the

final assumptions that T , defined in Definition 6.14 is finite and that, for all t ∈ T , ∂S is C1 atxt with inward normal nt, and P(〈nt, γt〉 = 0) = 0.

Definition 7.4. DefineT = mint ∈ T : 〈nt, γt〉 < 0 ∧ τ.

From now on we assume that γt = γt∧T .

We aim to prove that γNt ⇒ γt as N → ∞.

39

Page 41: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

7.1 Relative Compactness

As (γt)t≥0 arises from a linear stochastic differential equation, it has independent incrementsand so checking that the finite dimensional distributions of γN converge to those of γ will berelatively straightforward. Therefore it is convenient to use Theorem 4.3 to prove the convergencein distribution of the above processes.

We establish the relative compactness of the sequence (γN)N∈N by checking the necessaryand sufficient conditions stated in Corollary 5.11, namely that

(a) For every ε > 0 and rational t ≥ 0, there exists a compact set Γε,t ⊂ Rd such that

lim infN→∞

P(

γNt ∈ Γε

ε,t

)

≥ 1 − ε.

(b) For every ε > 0 and T > 0, there exists δ > 0 such that

lim supN→∞

P(

w′(γN , δ, T ) ≥ ε)

≤ ε.

where w′ is defined in Definition 5.7.

By conditioning on F0 = σ(γ0), if necessary, it is sufficient to consider the case when γ0 isnon-random. We make this assumption throughout the remainder of this section.

Lemma 7.5. For all ε > 0 there exists λ <∞ such that, for all N ∈ N,

P

(

supt≤t0

|γNt | ≥ λ

)

< ε.

Proof. Given ε > 0, pick λ′ ≥ max|γ0| + 1,√

C0 log 2C0

ε , where C0 is defined in (6.6). By

(7.2), there exists some N1 ∈ N such that N ≥ N1 implies

√N |bN(x) − b(x)| ≤ λ′

t0

for all x ∈ SN . By (7.1), there exists some N2 ∈ N such that N ≥ N2 implies

P(|γN0 | > λ′) ≤ P(|γN

0 | > |γ0| + 1) <ε

2.

By (6.6), there exists some N3 ∈ N such that N ≥ N3 implies that

P

(

√N sup

t≤T N

|MNt | > λ′

)

≤ P

supt≤T N

|MNt | >

C0 log 2C0

ε

N

≤ ε

2.

N3 is chosen so that

C0 log2C0

ε

N ≤ ε0, where ε0 is defined in (6.6).

Now, by the definition of MNt ,

γNt = γN

0 +√NMN

t +√N

∫ t

0

(bN (XNs ) − b(XN

s ))ds+√N

∫ t

0

(b(XNs ) − b(xs))ds

and so, on the set

√N |bN(x) − b(x)| ≤ λ′

t0, |γN

0 | ≤ λ′,√N sup

t≤T N

|MNt | ≤ λ′

,

40

Page 42: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

|γNt | ≤ λ′ + λ′ + t

λ′

t0+√NK

∫ t

0

|XNs − xs|ds

≤ 3λ′ +K

∫ t

0

|γNs |ds,

where K is the Lipschitz constant of b. Hence, by Gronwall’s Inequality (Lemma 6.9),

supt≤T N

|γNt | ≤ 3λ′e−Kt0 ,

and so, if N0 = maxN1, N2, N3, then N ≥ N0 implies that

P

(

supt≤t0

|γNt | ≥ 3λ′e−Kt0

)

≤ P(|γN0 | > λ′ or

√N sup

t≤T N

|MNt | > λ′) < ε.

Pick λ ≥ 3λ′e−Kt0 sufficiently large that P(

supt≤t0 |γNt | ≥ λ

)

< ε for all N < N0. Then, for all

N ∈ N, P(

supt≤t0 |γNt | ≥ λ

)

< ε.

For all t, take Γε,t = B(0, λ), the closed ball of radius λ in Rd. Then by the above lemma,

lim infN→∞

P(γNt ∈ Γε

ε,t) ≥ 1 − ε,

and so condition (a) of Corollary 5.11 holds.

Lemma 7.6. For all ε > 0, there exists λ < ∞ such that, for all δ > 0, there exists Nδ < ∞such that, for all N ≥ Nδ and all t ≤ t0,

P

(

sups≤t0,t≤s≤t+δ

|γNs − γN

t | > λ√δ

)

< ε.

Proof. Given ε > 0, by Lemma 7.5 there exists λ1 <∞ such that

P

(

|γNt | ≥ λ1

K√t0

)

2,

for all t ≤ t0. By (6.6), setting λ2 =√

2Ad log 4dε , there exists N ′

δ < ∞ such that N ≥ N ′δ

implies that

P

(

√N sup

t≤T N∧δ

|MNt | > λ2

√δ

)

< 2d exp

−N(λ2

δ/N)2

2Ad(TN ∧ δ)

≤ ε

2,

and hence

P

(

√N sup

s≤t0,t≤s≤t+δ|MN

s −MNt | > λ2

√δ

)

2,

where the dependence of N ′δ on δ arises from needing λ2

δ/N ≤ ε0. Setting λ′ = maxλ1, λ2and λ = 3λ′e−Kt0 , by (7.2), there exists N0 <∞ such that N ≥ N0 implies

√N |bN(x) − b(x)| ≤ λ′√

t0.

Now, by the definition of MNt , if s ≥ t

γNs − γN

t =√N(MN

s −MNt ) +

√N

∫ s

t

(bN (XNu ) − b(XN

u ))du +√N

∫ s

t

(b(XNu ) − b(xu))du,

41

Page 43: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

and∣

√N

∫ s

t

(b(XNu ) − b(xu))du

≤ K

∫ s

t

|γNu |du ≤ K

∫ s

t

(|γNt | + |γN

u − γNt |)du.

Hence, on the set

√N |bN(x) − b(x)| ≤ λ′√

t0, |γN

t | ≤ λ′

K√t0,

√N sup

s≤t0,t≤s≤t+δ|MN

s −MNt | ≤ λ′

√δ

,

if s ≤ t0 and t ≤ s ≤ t+ δ,

|γNs − γN

t | ≤ λ′√δ + (s− t)

λ′√t0

+K(s− t)λ′

K√t0

+K

∫ s

t

|γNu − γN

t |du

≤ 3λ′√δ +K

∫ s

t

|γNu − γN

t |du,

where, for the second inequality, δ may be taken to be at most t0 without changing the admissiblevalues of s. By Gronwall’s Inequality (Lemma 6.9),

sups≤t0,t≤s≤t+δ

|γNs − γN

t | ≤ 3λ′√δe−Kt0 = λ

√δ,

and so, if Nδ = maxN0, N′δ, then N ≥ Nδ implies that

P

(

sups≤t0,t≤s≤t+δ

|γNs − γN

t | ≥ λ√δ

)

≤ P

(

|γNt | ≤ λ′

K√t0

)

+P

(

√N sup

s≤t0,t≤s≤t+δ|MN

s −MNt | ≤ λ′

√δ

)

≤ ε.

Given ε > 0 and T > 0, set 2δ =(

ελ

)2where λ is as above. Then there exists a partition

ti of the form 0 = t0 < t1 < · · · < tn−1 < T ≤ tn with ti − ti−1 = 2δ for all i. Now

w′(γN , δ, T ) ≤ maxi

sups,t∈[ti−1,ti)

|γNs − γN

t |

≤ sups≤t0,t≤s≤t+2δ

|γNs − γN

t |,

and so

lim supN→∞

P(

w′(γN , δ, T ) ≥ ε)

≤ lim supN→∞

P

(

sups≤t0,t≤s≤t+2δ

|γNs − γN

t | ≥ λ√

)

≤ ε.

Therefore condition (b) of Corollary 5.11 holds and hence the sequence (γN )N∈N is relativelycompact.

7.2 Convergence of the Finite Dimensional Distributions

It now remains to prove that, for all finite subsets t1, . . . tk ⊂ [0,∞), the finite dimensionaldistributions (γN

t1 , . . . , γNtk

) ⇒ (γt1 , . . . , γtk). As γN

t is stopped at TN and γt is stopped at T ,where T is defined in Definition 7.4, it is necessary to establish a preliminary result about therelationship between TN and T when N is large.

Lemma 7.7. (a) If T = 0, then P(TN = 0) → 1 as N → ∞.

42

Page 44: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

(b) If T > 0 and τ1 is the smallest non-zero element of T , then for every t < τ1, P(TN > t) → 1as N → ∞.

Proof. (a) By the definition of T , T = 0 if and only if x0 ∈ ∂S and 〈n0, γ0〉 < 0 where n0 isthe inward normal at x0. Since γN

0 → γ0 in probability (γ0 is non random), 〈n0, γ0〉 < 0implies that P(〈n0, γ

N0 〉 < 0) → 1 as N → ∞ and so P(〈n0, X

N0 −x0〉 < 0) → 1 as N → ∞

(by the definition of γN0 ). If x0 ∈ ∂S, then this means that XN

0 /∈ S provided |XN0 − x0|

is small enough. Therefore TN = 0. Since XN0 → x0 in probability, this implies that

P(TN = 0) → 1 as N → ∞.

(b) If T > 0, then, as above, either x0 ∈ S, or 〈n0, γ0〉 > 0 (P(〈n0, γ0〉 = 0) = 0). If x0 ∈ Sthen, since T is finite, the result follows by Proposition 6.16. Suppose that x0 ∈ ∂S and〈n0, γ0〉 > 0. Since ∂S is C1 at x0, for all ε > 0, there exists δ(ε) > 0 such that, for allx ∈ S with |x− x0| ≤ δ(ε), and all v ∈ Rd,

|v| ≤ δ(ε) and 〈n0, v〉 ≥ ε|v| ⇒ x+ v ∈ S. (7.3)

We illustrate this condition in Figure 3.

Figure 3: Illustration of the case x0 ∈ S and 〈n0, γ0〉 > 0

Given ε > 0, let λ1 be the λ from Lemma 7.5 corresponding to ε3 . Let λ2 be the λ from

Lemma 7.6 corresponding to ε3 . Since γN

0 → γ0 in probability, there exists some N1 < ∞such that N ≥ N1 implies

P

(

|γN0 − γ0| >

〈n0, γ0〉4

)

3.

Let ε1 = min

〈n0,γ0〉2λ1

, 1λ1,(

〈n0,γ0〉4λ2

)2

and N0 = maxN1, Nε1, where Nε1 is defined as

in Lemma 7.6. Let Ω be the set

supt≤t0

|γNt | < max

1

ε1,〈n0, γ0〉

2ε1

, supt≤t0∧ε1

|γNt − γN

0 | ≤ 〈n0, γ0〉4

, |γN0 − γ0| ≤

〈n0, γ0〉4

.

43

Page 45: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Then if N ≥ N0 and t ≤ t0 ∧ ε1, on Ω,

〈n0, γNt 〉 ≥ 2ε1

〈n0, γ0〉(

〈n0, γ0〉 − |〈n0, γNt − γN

0 〉| − |〈n0, γN0 − γ0〉|

) 〈n0, γ0〉2ε1

≥ 2ε1〈n0, γ0〉

(

〈n0, γ0〉 − |γNt − γN

0 | − |γN0 − γ0|

)

|γNt |

≥ 2ε1〈n0, γ0〉

(

〈n0, γ0〉 −〈n0, γ0〉

4− 〈n0, γ0〉

4

)

|γNt |

= ε1|γNt |,

and

|γNt | < 1

ε1.

But

P(Ωc) ≤ P

(

supt≤t0

|γNt | ≥ λ1

)

+P

(

supt≤t0∧ε1

|γNt − γN

0 | > λ2√ε1

)

+P

(

|γN0 − γ0| >

〈n0, γ0〉4

)

which is less than ε. Hence

P

(

〈n0, γNt 〉 > ε1|γN

t | and |γNt | < 1

ε1

)

≥ 1 − ε.

By the continuity of xt, there exists 0 < ε2 < ε1 ∧ τ such that t ≤ ε2 implies |xt − x0| ≤δ(ε1). Set N ′ ≥ maxN0, (ε1δ(ε1))

−2. Then, for N ≥ N ′ and t ≤ TN ∧ ε2, xt ∈ S,

|xt − x0| ≤ δ(ε1), N− 1

2 |γNt | ≤ N− 1

2

ε1≤ δ(ε1) and 〈n0, γ

Nt 〉 > ε1|γN

t |, all with probability

exceeding 1 − ε. By (7.3), this implies that XNt = xt + N− 1

2 γNt ∈ S i.e. TN 6= t for all

t ≤ ε2. Hence P(TN ≤ ε2) < ε for all N ≥ N ′, and so P(TN = 0) → 0 as N → ∞. Theresult follows by Proposition 6.16.

Lemma 7.8. Suppose t ≤ t0. Then γNt ⇒ γt as N → ∞.

Proof. If T = 0, then by Lemma 7.7, TN → 0 in probability as N → ∞. If t > 0, then, recallingthat γN

t is stopped at TN and that γt is stopped at T , by Lemma 7.6 and (7.1),

P(|γNt − γt| ≤ ε) = P(|γN

T N − γ0| ≤ ε)

≥ P(|γNT N − γN

0 | ≤ ε

2, |γN

0 − γ0| <ε

2and TN ≤ t)

→ 1.

So assume T > 0. By conditioning on σ(γτm) if necessary, where τm ∈ T , we may assume

that t < τ1, where τ1 is the smallest non-zero element of T .

Define (ψt)t≤τ in Rd ⊗ (Rd)∗ by

ψt = ∇b(xt)ψt, ψ0 = id.

Fix θ ∈ (Rd)∗ and set θt = (ψ∗t )−1θ. Then ψ∗

t θt + ψ∗t θt = 0 (θ is constant) and so θt =

−ψ∗t−1(ψ∗

t θt) = −ψ∗t−1(ψ∗

t ∇b(xt)∗θt) = −∇b(xt)

∗θt. By Ito’s formula, and the definition of γt

(Definition 7.3),

d〈θt, γt〉 = 〈dθt, γt〉 + 〈θt, dγt〉 + 0

= −〈∇b(xt)∗θt, γt〉dt+ 〈θt, σ(xt)dBt〉 + 〈θt,∇b(xt)γt〉dt

= 〈θt, σ(xt)dBt〉,

44

Page 46: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

and so

〈θt, γt〉 ∼ N(〈θ, γ0〉,∫ t

0

〈θs, a(xs)θs〉ds),

i.e. γ is a Gaussian process. On the other hand,

dγNt =

√NdMN

t +√N(bN (XN

t ) − b(xt))dt,

where MNt is defined as in Theorem 6.12. As above,

d〈θt, γNt 〉 =

√N〈θt, dM

Nt 〉 +RN,θ

t dt, (7.4)

whereRN,θ

t =√N〈θt, b

N (XNt ) − b(xt) −∇b(xt)(X

Nt − xt)〉.

By (7.2)

supt≤T N

√N |bN (XN

t ) − b(XNt )| → 0.

Given ε > 0, since xt ∈ S for all t in the compact interval [ε, τ1 − ε], there exists some δ′ > 0such that the compact set x ∈ Rd : inft∈[ε,τ1−ε] |x − xt| ≤ δ′ is a subset of S. Then since b isC1 on S, ∇b is uniformly continuous on compact sets and so there exists some 0 < δ < δ′ suchthat |x− y| < δ implies that |∇b(x)−∇b(y)| ≤ ε. Now suppose |x−xt| ≤ δ. By the mean valuetheorem, there exists some ξ ∈ B(xt, |x−xt|) ⊂ B(xt, δ) such that b(x)− b(xt) = ∇b(ξ)(x−xt).Then

|b(x) − b(xt) −∇b(xt)(x− xt)| = |∇b(ξ) −∇b(xt)||x− xt|≤ ε|x− xt|.

Hence, |XNt − xt| ≤ δ and ε ≤ t ≤ τ1 − ε imply

√N |b(XN

t ) − b(xt) −∇b(xt)(XNt − xt)| ≤ ε|γN

t |.

θt is continuous and hence bounded, by C1 say, on the compact interval [0, τ1]. Similarly ∇b(xt)is bounded, by C2 say. Given η > 0, by Lemma 7.5 there exists some λ < ∞ such thatP(supt≤T N |γN

t | > λ) < η2 . Given t′ < τ1, pick ε sufficiently small that t′ ≤ τ1 − ε and

ε[C1(ε+ (K + C2)λ) + t′(C1(1 + λ))] < η. Then, if δ > 0 is as above, on the set

Ω = supt≤T N

√N |bN (XN

t ) − b(XNt )| < ε, sup

t≤T N

|XNt − xt| ≤ δ, sup

t≤T N

|γNt | ≤ λ,

|RN,θt | ≤ C1(sup

t≤t′

√N |bN (XN

t ) − b(XNt )| + (K + sup

t≤t0

|∇b(xt)|) supt≤t′

|γNt |)

≤ C1(ε+ (K + C2)λ),

and if ε ≤ t ≤ t′, then

|RN,θt | ≤ C1(sup

t≤t′

√N |bN (XN

t ) − b(XNt )| + ε sup

t≤t′|γN

t |)

≤ C1ε(1 + λ).

Hence

∫ t′

0

|RN,θt |dt =

∫ ε

0

|RN,θt |dt+

∫ t′

ε

|RN,θt |dt

≤ ε [C1(ε+ (K + C2)λ)] + (t′ − ε)(C1ε(1 + λ))

< η.

45

Page 47: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Choosing N sufficiently large that P(Ω) ≥ 1 − η,

P

(

∫ t′

0

|RN,θt |dt ≤ η

)

≥ 1 − η

i.e. the integral converges to 0 in probability.

Therefore, returning to (7.4), in order to show γNt ⇒ γt, it suffices to show, for all θ ∈ (Rd)∗

and all t ≤ τ1, that√N

∫ t

0

〈θs, dMNs 〉 → N(0,

∫ t

0

〈θs, a(xs)θs〉ds)

in distribution. By Levy’s Continuity Theorem, it is sufficient to show, for all θ ∈ (Rd)∗ and all

t ≤ τ1, that E(EN,θt ) → 1 as N → ∞, where

EN,θt = expi

√N

∫ t

0

〈θs, dMNs 〉 +

1

2

∫ t

0

〈θs, a(xs)θs〉ds.

Set mN (x, θ) = mN (x, iθ), m(x, θ) = m(x, iθ) and

φN (x, θ) =

Rd

(

ei〈θ,y〉 − 1 − i〈θ, y〉)

KN(x, dy).

Assuming that (6.4) holds for θ ∈ (Cd)∗, by the same argument as (6.7),

supx∈SN

sup|θ|≤η

|mN ′(x,Nθ) − m′(x, θ)| → 0

as N → ∞, for all η ≤ η0. By the second-order mean value theorem, if h ∈ (Rd)∗, then

m′′(x, θ)h = m′(x, θ + h) − m′(x, θ) −∫ 1

0

m′′′(x, rθ)(h, h)(1 − r)dr,

and similarly for mN ′′(x,Nθ). By Lemma 6.4, |m′′′(x, θ)| ≤ B, for all x ∈ S and |θ| ≤ η, andN2|mN ′′′(x,Nθ)| ≤ N2 NB

N3 = B, for all x ∈ SN and |θ| ≤ η. Given ε > 0, let δ = min η0−η2 , ε

2B .There exists N0 such that N ≥ N0 implies

supx∈SN

sup|θ|≤η+η0

2

|mN ′(x,Nθ) − m′(x, θ)| < εδ

4.

If N ≥ N0, then

|NmN ′′(x,Nθ) − m′′(x, θ)| = sup|h|=δ

1

|h|∣

∣(NmN ′′(x,Nθ) − m′′(x, θ))h∣

= sup|h|=δ

1

|h|∣

∣mN ′(x,N(θ + h)) − m′(x, θ + h)

− mN ′(x,Nθ) + m′(x, θ)

+

∫ 1

0

(

m′′′(x, θ) −N2mN ′′′(x,Nθ))

(h, h)(1 − r)dr∣

≤ sup|h|=δ

1

|h|(

2 supx∈SN

sup|θ|≤η+η0

2

|mN ′(x,Nθ) − m′(x, θ)| + 1

22B|h|2

)

<1

δ

2εδ

4+Bδ

≤ ε.

Hence, for all η < η0, we have

supx∈SN

sup|θ|≤η

|NmN ′′(x,Nθ) − m′′(x, θ)| → 0.

46

Page 48: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

Since a(x) = m′′(x, 0) = −m′′(x, 0),

φN (x,√Nθ) +

1

2〈θ, a(x)θ〉 =

∫ 1

0

(

NmN ′′(x,√Nrθ) − m′′(x, 0)

)

(θ, θ)(1 − r)dr,

where the first term arises from the second-order mean value theorem. Hence, for all ρ <∞,

supx∈SN

sup|θ|≤ρ

|φN (x,√Nθ) +

1

2〈θ, a(x)θ〉| ≤ 1

2ρ2 sup

x∈SN

sup|θ|≤ρ

|NmN ′′(x,√Nθ) − m′′(x, 0)|

≤ 1

2ρ2 sup

x∈SN

sup|θ|≤ ρ√

N

|NmN ′′(x,Nθ) − m′′(x, θ)|

+ supx∈SN

sup|θ|≤ρ

|m′′(x, θ/√N) − m′′(x, 0)|

→ 0, (7.5)

the second term converging by the continuity of m′′(x, θ) as a function of θ.

Write EN,θt = EN

t = ZNt A

Nt B

Nt , where

ZNt = expi

√N

∫ t

0

〈θs, dMNs 〉 −

∫ t

0

φN (XNs ,

√Nθs)ds,

ANt = exp

∫ t

0

(

φN (XNs ,

√Nθs +

1

2〈θs, a(X

Ns )θs〉

)

ds,

BNt = exp

∫ t

0

1

2〈θs, (a(xs) − a(XN

s ))θs〉ds.

(ZNt∧T N )t≤τ is a martingale as in Lemma 6.7, and so E(ZN

t∧T N ) = 1 for all N . Fix t ≤ τ . Sinceθs is continuous, it is bounded on the compact interval [0, t]. By Lemma 6.4, a(x) is boundedand so, for s ≤ t, by (7.5), φN (XN

s ,√Nθs), and hence ZN

t∧T N , is bounded, uniformly in N . Alsoby (7.5), AN

t∧T N → 1 uniformly as N → ∞. Since a(x) is bounded, BNt∧T N is bounded uniformly

in N , and since a is Lipschitz and XNt → xt in probability (Theorem 6.12), BN

t∧T N converges to1 in probability. Given ε > 0, suppose |ZN

t∧T N |, |ANt∧T N |, |BN

t∧T N | ≤M for some constant M andthat N is sufficiently large that P(|AN

t∧T NBNt∧T N − 1| ≥ δ) ≤ ε

2(M3+1) where δ = ε2M . Then

∣E(ZNt∧T NAN

t∧T NBNt∧T N ) − 1

∣ =∣

∣E(

ZNt∧T N (AN

t∧T NBNt∧T N − 1)

)∣

≤ E

(

|ZNt∧T N |AN

t∧T NBNt∧T N − 1|1|AN

t∧TN BN

t∧T N −1|<δ)

+ E

(

(|ZNt∧T NA

Nt∧T NB

Nt∧T N | + 1)1|AN

t∧TNBN

t∧T N−1|≥δ

)

≤ MδP(|ANt∧T NB

Nt∧T N − 1| < δ)

+ (M3 + 1)P(|ANt∧T NBN

t∧T N − 1| ≥ δ)

< ε.

Hence E(ZNt∧T NA

Nt∧T NB

Nt∧T N ) → 1 as N → ∞. By Lemma 7.7, P(TN > t) → 1 for all t < τ1.

Since ENt = ZN

t ANt B

Nt is bounded uniformly in N , by a similar argument to that above,

E(ENt ) → 1 for all t < τ1, as required.

Lemma 7.9. For all finite subsets t1, . . . tk ⊂ [0, τ ], (γNt1 , . . . , γ

Ntk

) ⇒ (γt1 , . . . γtk).

Proof. The proof is by induction on k. The case k = 1 is proved in Lemma 7.8 above. Supposewe have proved the result for all m < k, where k > 1. Let 0 ≤ t1 < · · · < tk ≤ τ . By Levy’sContinuity Theorem, it is sufficient to show that

E

exp i

k∑

j=1

〈θj , γNtj〉

→ E

exp i

k∑

j=1

〈θj , γtj〉

,

47

Page 49: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

for every k-tuple (θ1, . . . , θk) ∈ ((Rd)∗)k. Let

βN =

k−2∑

j=1

〈θj , γNtj〉 + 〈θk−1 + θk, γ

Ntk−1

〉,

and define β similarly. By our inductive hypothesis, βN ⇒ β and so, conditional on Ftk−1=

σ(γt : 0 ≤ t ≤ tk−1), βN → β in probability. Hence

E

(∣

∣ei(βN−β) − 1∣

∣ | Ftk−1

)

→ 0

as N → ∞. Also, by an identical argument to Lemma 7.8,

E

(

ei〈θk,γN

tk−γN

tk−1〉 | Ftk−1

)

→ E

(

ei〈θk,γtk−γtk−1

〉 | Ftk−1

)

,

as N → ∞. Hence∣

∣E

(

eiβN

ei〈θk,γN

tk−γN

tk−1〉)− E

(

eiβei〈θk,γtk−γtk−1

〉)∣

=∣

∣E

[

E

(

eiβN

ei〈θk,γN

tk−γN

tk−1〉 − eiβei〈θk,γtk

−γtk−1〉 | Ftk−1

)]∣

≤ E

∣E

(

ei(βN−β)ei〈θk,γN

tk−γN

tk−1〉 − ei〈θk,γtk

−γtk−1〉 | Ftk−1

)∣

≤ E

∣E

[(

ei(βN−β) − 1)

ei〈θk,γN

tk−γN

tk−1〉 | Ftk−1

]∣

+ E

∣E

(

ei〈θk,γN

tk−γN

tk−1〉 − ei〈θk,γtk

−γtk−1〉 | Ftk−1

)∣

≤ E

[

E

(∣

∣ei(βN−β) − 1∣

∣ | Ftk−1

)]

+ E

∣E

(

ei〈θk,γN

tk−γN

tk−1〉 − ei〈θk,γtk

−γtk−1〉 | Ftk−1

)∣

→ 0.

Therefore (γNt1 , . . . , γ

Ntk

) ⇒ (γt1 , . . . γtk), completing the inductive step.

Combining these results gives the following theorem.

Theorem 7.10. γN ⇒ γ as N → ∞.

Proof. By Lemmas 7.5 and 7.6, the sequence (γN )N∈N is relatively compact and by Lemma 7.9,the finite dimensional distributions of γN converge in distribution to those of γ. The resultfollows by Theorem 4.3.

The basic implication of this theorem is that (XNt )t≥0 can be approximated (in distribution)

by the Gaussian process (xt +N− 12 γt)t≥0. This allows us to use properties of diffusion processes

when investigating the behaviour of XN for large values of N .

8 Applications

The results from the previous two sections can be applied to obtain limiting results in a widerange of areas, from random graphs and stochastic networks to perturbed dynamical systems.We look at the applications to biology and specifically to population processes, discussing twoexamples: epidemics and logistic growth.

48

Page 50: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

8.1 Epidemics

Suppose we have a population consisting of $N$ individuals. In the population, at any given time $t$, there are a number of individuals, $S^N_t$, who are susceptible to a particular disease, and a number of individuals, $I^N_t$, who are infected by the disease and can pass it on. A susceptible individual encounters diseased individuals at a rate proportional to the fraction of the total population that is diseased, with proportionality constant $\lambda$. We assume that diseased individuals recover and become immune independently of each other, at a rate $\mu$. Therefore, $(S^N_t, I^N_t)_{t\geq 0}$ is a continuous-time Markov chain, taking values in $(\mathbb{Z}^+)^2$, where

$$
(s, i) \to (s-1, i+1) \quad \text{at rate } \lambda s i / N, \qquad\qquad (s, i) \to (s, i-1) \quad \text{at rate } \mu i.
$$

We are interested in the proportion of susceptible and infected individuals for large values of $N$, as this is generally the situation under which an actual epidemic occurs.
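Before passing to the scaled process, it may help to see this chain concretely. The following is a minimal simulation sketch (our own illustration, not part of the original development; the function name `simulate_epidemic` and the parameter values are hypothetical choices), stepping the chain from jump to jump:

```python
import random

def simulate_epidemic(N, s0, i0, lam, mu, t_max):
    """Simulate the (S, I) chain by stepping from jump to jump.

    Transitions: (s, i) -> (s - 1, i + 1) at rate lam*s*i/N  (infection),
                 (s, i) -> (s, i - 1)     at rate mu*i       (recovery).
    Returns the path as a list of (time, s, i) triples.
    """
    t, s, i = 0.0, s0, i0
    path = [(t, s, i)]
    while i > 0 and t < t_max:
        rate_inf = lam * s * i / N
        rate_rec = mu * i
        total = rate_inf + rate_rec
        t += random.expovariate(total)            # exponential holding time
        if random.random() < rate_inf / total:
            s, i = s - 1, i + 1                   # infection
        else:
            i -= 1                                # recovery
        path.append((t, s, i))
    return path

path = simulate_epidemic(N=1000, s0=999, i0=1, lam=0.5, mu=0.1, t_max=100.0)
print("susceptible fraction at the end:", path[-1][1] / 1000)
```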

Let $X^N_t = (S^N_t, I^N_t)/N$ be the proportion of susceptibles and infectives at time $t$. Since $I^N_t + S^N_t \leq N$ for all $t$, $X^N_t$ takes values in $U_N = \frac{1}{N}\mathbb{Z}^2 \cap U$, where $U$ is the unit square $[0,1]^2$. We are particularly interested in the proportion of individuals who have avoided being infected by the time the last infective recovers and, for this reason, we let $S$ be the open set $(-1,2) \times (0,2)$. (This choice of $S$ is somewhat arbitrary, but has the property that the only way the process can leave $S$ is by the last infective recovering.) Set $S_N = U_N \cap S$ and, for some fixed $t_0$, let $T^N = \inf\{t \geq 0 : X^N_t \notin S\} \wedge t_0$.

The jumps of $X^N$ are given by

$$
\Delta X^N_t = \begin{cases} \dfrac{(-1,1)}{N} & \text{with rate } \lambda S^N_t I^N_t / N = N\lambda X^{N,1}_t X^{N,2}_t, \\[4pt] \dfrac{(0,-1)}{N} & \text{with rate } \mu I^N_t = N\mu X^{N,2}_t, \end{cases}
$$

and so $K_N(x, dy) = N\lambda x_1 x_2\,\delta_{(-1,1)/N} + N\mu x_2\,\delta_{(0,-1)/N}$. Then

$$
m_N(x, \theta) = \int_{\mathbb{R}^2} e^{\langle\theta, y\rangle} K_N(x, dy) = N\lambda x_1 x_2\, e^{(\theta_2 - \theta_1)/N} + N\mu x_2\, e^{-\theta_2/N},
$$

and so there exists a limit kernel $K(x, dy) = \lambda x_1 x_2\,\delta_{(-1,1)} + \mu x_2\,\delta_{(0,-1)}$, with corresponding Laplace transform $m(x, \theta) = \lambda x_1 x_2\, e^{\theta_2 - \theta_1} + \mu x_2\, e^{-\theta_2}$, with the properties:

(a) There exists a constant $\eta_0 = 1$ such that

$$
\sup_{x \in S}\, \sup_{|\theta| \leq 1} m(x, \theta) \leq 4\lambda e^2 + 2\mu e < \infty.
$$

(b)

$$
\sup_{x \in S_N}\, \sup_{|\theta| \leq \eta_0} \left| \frac{m_N(x, N\theta)}{N} - m(x, \theta) \right| = 0.
$$

We also assume that there exists some $x_0 \in \bar{S}$, the closure of $S$, such that, for all $\delta > 0$,

$$
\limsup_{N\to\infty} N^{-1} \log P\big(|X^N_0 - x_0| \geq \delta\big) < 0.
$$
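For convenience, here is the routine differentiation of $m$ at $\theta = 0$ (left implicit in the text) that produces the drift vector $b(x)$ used in the next paragraph:

$$
\frac{\partial m}{\partial \theta_1}(x, \theta) = -\lambda x_1 x_2\, e^{\theta_2 - \theta_1}, \qquad \frac{\partial m}{\partial \theta_2}(x, \theta) = \lambda x_1 x_2\, e^{\theta_2 - \theta_1} - \mu x_2\, e^{-\theta_2},
$$

so that, evaluating at $\theta = 0$, $m'(x, 0) = (-\lambda x_1 x_2,\ \lambda x_1 x_2 - \mu x_2)$.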

Note that, as $S^N_0 + I^N_0 = N$ for all $N$, $x_0 = (1-\alpha, \alpha)$ for some $\alpha \in [0,1]$. Set $b(x) = m'(x, 0) = (-\lambda x_1 x_2,\ \lambda x_1 x_2 - \mu x_2)$. On $S$,

$$
|\lambda(x_1 x_2 - y_1 y_2)| \leq \lambda\big(|x_1 - y_1||x_2| + |x_2 - y_2||y_1|\big) \leq 2\lambda\big(|x_1 - y_1| + |x_2 - y_2|\big),
$$

and

$$
|\lambda(x_1 x_2 - y_1 y_2) - \mu(x_2 - y_2)| \leq 2\lambda|x_1 - y_1| + (2\lambda + \mu)|x_2 - y_2| \leq (2\lambda + \mu)\big(|x_1 - y_1| + |x_2 - y_2|\big).
$$

Hence

$$
|b(x) - b(y)|^2 = |\lambda(x_1 x_2 - y_1 y_2)|^2 + |\lambda(x_1 x_2 - y_1 y_2) - \mu(x_2 - y_2)|^2 \leq \big(4\lambda^2 + (2\lambda + \mu)^2\big)\big(|x_1 - y_1| + |x_2 - y_2|\big)^2 \leq 2\big(4\lambda^2 + (2\lambda + \mu)^2\big)|x - y|^2,
$$

and so $b$ is Lipschitz on $S$. Therefore, there is a unique solution $(x_t)_{t\geq 0}$ to the ordinary differential equation $\dot{x}_t = b(x_t)$ starting from $x_0$, that is,

$$
\dot{x}^1_t = -\lambda x^1_t x^2_t, \qquad \dot{x}^2_t = (\lambda x^1_t - \mu)\, x^2_t.
$$

The sequence of jump processes $(X^N)_{N\geq 1}$ satisfies all the conditions needed to apply the Fluid Limit Theorem (Theorem 6.12) and so, for all $\delta > 0$,

$$
\limsup_{N\to\infty} N^{-1} \log P\Big(\sup_{t \leq T^N} |X^N_t - x_t| \geq \delta\Big) < 0.
$$

Let $\tau = \inf\{t \geq 0 : x^2_t = 0\}$. Then $T = \{t \geq 0 : x_t \notin S\} = [\tau, \infty)$ (if $x^2_t = 0$, then $\dot{x}^1_t = \dot{x}^2_t = 0$), and so $x_{t\wedge\tau} = x_t$ for all $t$. The argument in Proposition 6.16 then shows that

$$
\limsup_{N\to\infty} N^{-1} \log P\Big(\sup_{t\geq 0} |X^N_t - x_{t\wedge\tau}| > \delta\Big) < 0.
$$

By investigating the properties of $x_t$, we can get a good idea of the behaviour of $X^N_t$ for large values of $N$. We introduce a new variable $x^3_t$, the proportion of individuals that have recovered by time $t$, so that $x^1_t + x^2_t + x^3_t = 1$ for all $t$, and hence $\dot{x}^3_t = \mu x^2_t$. We shall discuss three questions:

(a) What conditions are needed for an epidemic to take place?

(b) What proportion of people will catch the disease and, in particular, will there be any susceptibles left when the epidemic has died out?

(c) What is the time dependence of the epidemic?

The case $\alpha = 0$, when there are no infectives initially, is not interesting, as all the values remain constant over time, so we assume that $\alpha > 0$. By reversing time in the differential equations, it can be seen that this implies that $x^2_t > 0$ for all $t$ (and so in fact $\tau = \infty$). An epidemic occurs when the proportion of infectives increases over some time period. Looking at the equation for $\dot{x}^2_t$, this happens if and only if $\lambda x^1_t > \mu$ for some $t$. By the equation for $\dot{x}^1_t$, $x^1_t$ is a decreasing function of $t$, and so an epidemic occurs if and only if $\lambda x^1_0 > \mu$, or $\rho_0 = \frac{\lambda(1-\alpha)}{\mu} > 1$. We call $\rho_0$ the reproductive rate.

By the equation for $\dot{x}^3_t$, $x^3_t$ is increasing, but bounded above by 1, and so must converge to a limit as $t\to\infty$. Hence $\dot{x}^3_t \to 0$, and so $x^2_t \to 0$ as $t\to\infty$. Therefore, the epidemic dies out in the limit. Now

$$
\frac{dx^1_t}{dx^3_t} = \frac{\dot{x}^1_t}{\dot{x}^3_t} = -\frac{\lambda}{\mu}\, x^1_t,
$$

and so $x^1_t = x^1_0\, e^{-\frac{\lambda}{\mu} x^3_t}$. At the end of the epidemic, $x^2_\infty = 0$ and so $x^1_\infty + x^3_\infty = 1$. Hence the proportion of susceptibles at the end of the epidemic satisfies

$$
x^1_\infty = (1-\alpha)\, e^{-\frac{\lambda}{\mu}(1 - x^1_\infty)},
$$

which can be solved numerically. Often, the initial number of infectives is very small, so $1-\alpha$ is approximately 1. If we let $\pi = 1 - \frac{x^1_\infty}{1-\alpha}$ be the proportion of susceptibles ultimately infected, then substituting in the above equation gives

$$
\pi \approx 1 - e^{-\frac{\lambda(1-\alpha)}{\mu}\pi} = 1 - e^{-\rho_0 \pi}.
$$

For a given value of $\rho_0$, this can be solved using Newton's method, as sketched below.
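The following is a minimal sketch of that computation (our own illustration; the function name `final_size` and the starting point are hypothetical choices), applying Newton's method to $f(\pi) = \pi - 1 + e^{-\rho_0\pi}$, which has a unique root in $(0, 1)$ whenever $\rho_0 > 1$:

```python
import math

def final_size(rho0, pi0=0.5, tol=1e-12, max_iter=50):
    """Newton's method for f(pi) = pi - 1 + exp(-rho0*pi) = 0.

    Starting away from pi = 0 avoids the trivial root at zero.
    """
    pi = pi0
    for _ in range(max_iter):
        f = pi - 1 + math.exp(-rho0 * pi)
        fprime = 1 - rho0 * math.exp(-rho0 * pi)
        step = f / fprime
        pi -= step
        if abs(step) < tol:
            break
    return pi

print(final_size(4.99))  # approx 0.993: almost all susceptibles are infected
print(final_size(1.49))  # approx 0.576: a much milder epidemic
```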

Using the fact that $x^2_t = 1 - x^1_t - x^3_t$ and $x^1_t = (1-\alpha)e^{-\frac{\lambda}{\mu}x^3_t}$, the epidemic curve is given by

$$
\dot{x}^3_t = \mu\Big(1 - x^3_t - (1-\alpha)\, e^{-\frac{\lambda}{\mu}x^3_t}\Big).
$$

For specified values of $\lambda$, $\mu$ and $\alpha$, this equation can be solved by numerical methods, which in turn gives values for $x^1_t$ and $x^2_t$; a sketch of such a computation follows the figure below. The graphs in Figure 4 (from [2]) show the course of a model epidemic for $\lambda = 0.001$, $\mu = 0.1$, in the cases $\alpha = 1/500$, i.e. $\rho_0 = 4.99 > 1$, and $\alpha = 1/150$, i.e. $\rho_0 = 1.49 > 1$.

Figure 4: Graphs of a model epidemic for ρ0 = 4.99 and ρ0 = 1.49
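As an illustration of such a numerical method (our own sketch, not the computation used in [2]; the function name `epidemic_curve`, the Euler step size, and the parameter values are our own choices, with the infection rate `lam` already scaled so that $\rho_0 = \texttt{lam}\,(1-\alpha)/\mu$), one can integrate the epidemic curve and recover $x^1_t$ and $x^2_t$:

```python
import math

def epidemic_curve(lam, mu, alpha, t_max, dt=0.01):
    """Euler scheme for dx3/dt = mu*(1 - x3 - (1 - alpha)*exp(-lam*x3/mu)).

    Recovers x1 = (1 - alpha)*exp(-lam*x3/mu) and x2 = 1 - x1 - x3 at each
    step. Returns a list of (t, x1, x2, x3) tuples.
    """
    t, x3 = 0.0, 0.0
    out = []
    while t <= t_max:
        x1 = (1 - alpha) * math.exp(-lam * x3 / mu)
        x2 = 1 - x1 - x3
        out.append((t, x1, x2, x3))
        x3 += dt * mu * (1 - x3 - (1 - alpha) * math.exp(-lam * x3 / mu))
        t += dt
    return out

# Parameters chosen so that rho0 = lam*(1 - alpha)/mu = 4.99, cf. Figure 4.
curve = epidemic_curve(lam=0.5, mu=0.1, alpha=1 / 500, t_max=100.0)
print(max(x2 for _, _, x2, _ in curve))  # peak infective proportion
```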

8.2 Logistic Growth

We now interpret $N$ as the area of a region occupied by a certain population. If the population size is $k$, then the population density is $k/N$. For simplicity, we assume that births and deaths occur singly. The rate of births and deaths should be approximately proportional to the population size. We assume, however, that crowding affects the birth and death rates, which therefore depend on the population density. Therefore, the birth and death rates should be of the form $\lambda(k/N)k$ and $\mu(k/N)k$, respectively, for some functions $\lambda$ and $\mu$. If $a$, $b$ and $c$ are non-negative constants, then one of the simplest such models is given by taking $\lambda(x) = a$ and $\mu(x) = b + cx$.

Let the Markov chain $(X^N_t)_{t\geq 0}$ be the population density at time $t$. Then $X^N$ takes values in $I_N = \frac{1}{N}\mathbb{Z}^+$, and

$$
\Delta X^N_t = \begin{cases} \dfrac{1}{N} & \text{at rate } NaX^N_t, \\[4pt] -\dfrac{1}{N} & \text{at rate } NbX^N_t + Nc(X^N_t)^2. \end{cases}
$$

We are interested in whether the population exceeds a certain density or dies out and so, for fixed $R > 0$, we set $S = (0, R)$ and $S_N = I_N \cap S$. $X^N$ has Lévy kernel $K_N(x, dy) = Nax\,\delta_{1/N} + (Nbx + Ncx^2)\,\delta_{-1/N}$, and so

$$
m_N(x, \theta) = Nax\, e^{\theta/N} + (Nbx + Ncx^2)\, e^{-\theta/N}.
$$

There exists a limit kernel $K(x, dy) = ax\,\delta_1 + (bx + cx^2)\,\delta_{-1}$, with corresponding Laplace transform $m(x, \theta) = ax\,e^\theta + (bx + cx^2)\,e^{-\theta}$, with the required properties. Suppose there exists some $x_0 \in \bar{S}$, the closure of $S$, such that, for all $\delta > 0$,

$$
\limsup_{N\to\infty} N^{-1} \log P\big(|X^N_0 - x_0| \geq \delta\big) < 0.
$$

Set $b(x) = m'(x, 0) = (a - b)x - cx^2$. Then $b$ is Lipschitz on $S$, and so there is a unique solution $(x_t)_{t\geq 0}$ to the ordinary differential equation $\dot{x}_t = (a-b)x_t - cx^2_t$ starting from $x_0$. Applying the Fluid Limit Theorem (Theorem 6.12), for all $\delta > 0$,

$$
\limsup_{N\to\infty} N^{-1} \log P\Big(\sup_{t \leq T^N} |X^N_t - x_t| \geq \delta\Big) < 0,
$$

where $T^N$ is defined as in Definition 6.10. We can solve the ordinary differential equation explicitly to get

$$
x_t = \frac{x_0(a-b)}{(a - b - cx_0)e^{(b-a)t} + cx_0} \quad \text{if } a \neq b, \qquad\qquad x_t = \frac{x_0}{x_0 ct + 1} \quad \text{if } a = b,
$$

from which it is easy to see that, when $c \neq 0$, the population dies out as $t\to\infty$ if $a \leq b$, and the population density stabilises to $\frac{a-b}{c}$ if $a > b$. If $c = 0$, then the population dies out if $a < b$, it explodes (i.e. becomes infinite) if $a > b$, and the density is stable at $x_0$ if $a = b$. A graph of the case $a > b$, $c \neq 0$ is sketched in Figure 5.

Figure 5: A graph of the logistic growth model in the case $a > b$, $c \neq 0$
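To see the fluid limit at work, here is a minimal simulation sketch (our own illustration; the function names `simulate_density` and `fluid_limit` and all parameter values are hypothetical) comparing one run of the birth-death chain with the explicit solution above:

```python
import math
import random

def simulate_density(N, x0, a, b, c, t_max):
    """One run of the logistic birth-death chain for the density X^N_t.

    Births at rate N*a*x and deaths at rate N*b*x + N*c*x**2, with jumps
    of size +-1/N; the state 0 is absorbing. Returns the density at t_max.
    """
    t, x = 0.0, x0
    while t < t_max and x > 0:
        birth = N * a * x
        death = N * b * x + N * c * x * x
        t += random.expovariate(birth + death)   # exponential holding time
        x += 1 / N if random.random() < birth / (birth + death) else -1 / N
    return x

def fluid_limit(x0, a, b, c, t):
    """Explicit solution of dx/dt = (a - b)x - c*x^2 in the case a != b."""
    return x0 * (a - b) / ((a - b - c * x0) * math.exp((b - a) * t) + c * x0)

a, b, c, x0, t = 2.0, 1.0, 1.0, 0.1, 5.0
print(simulate_density(10000, x0, a, b, c, t))   # one sample of X^N_t
print(fluid_limit(x0, a, b, c, t))               # x_t, approaching (a-b)/c = 1
```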

We now look at the fluctuations about this result. As in Section 7, let $\gamma^N_t = \sqrt{N}(X^N_t - x_t)$ and suppose that there exists a random variable $\gamma_0$ such that $\gamma^N_0 \Rightarrow \gamma_0$. Let $a(x) = m''(x, 0) = (a+b)x + cx^2$. It is easy to check that all the necessary conditions for Theorem 7.10 hold. Hence $\gamma^N \Rightarrow \gamma$, where $\gamma$ is the solution to the stochastic differential equation

$$
d\gamma_t = \sigma(x_t)\,dB_t + \nabla b(x_t)\,\gamma_t\,dt = \sqrt{(a+b)x_t + cx^2_t}\,dB_t + (a - b - 2cx_t)\,\gamma_t\,dt.
$$

In all of the cases mentioned above, except $a > b$, $c = 0$, $x_t$ converges to a finite limit as $t\to\infty$. When this limit is zero, $\gamma_t$ behaves asymptotically like the solution to the differential equation

$$
dy_t = (a - b)\,y_t\,dt.
$$

That is, $y_t = y_0\, e^{(a-b)t} \to 0$ as $t\to\infty$ when $a < b$, and so the fluctuations die out as well. When $x_t$ converges to $\frac{a-b}{c}$, we have $a > b$, and so $\gamma_t$ behaves asymptotically like the solution to the stochastic differential equation

$$
dY_t = \sqrt{\frac{2a(a-b)}{c}}\,dB_t - (a - b)\,Y_t\,dt,
$$

i.e. $Y$ is an Ornstein-Uhlenbeck process, and so

$$
Y_t \sim N\Big(Y_0\, e^{-(a-b)t},\ \tfrac{a}{c}\big(1 - e^{-2(a-b)t}\big)\Big) \Rightarrow N\Big(0, \tfrac{a}{c}\Big)
$$

as $t\to\infty$. Hence the fluctuations are asymptotically stable and $X^N_t$ behaves like a $N\big(\tfrac{a-b}{c}, \tfrac{a}{Nc}\big)$ random variable for large $t$ and $N$.
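As a purely illustrative check of this limiting variance (our own sketch; the Euler step size and sample count are arbitrary choices), one can simulate the Ornstein-Uhlenbeck equation above by an Euler-Maruyama scheme and compare the long-run empirical variance with $a/c$:

```python
import math
import random
import statistics

# Euler-Maruyama scheme for dY = sqrt(2a(a-b)/c) dB - (a-b) Y dt, Y_0 = 0.
a, b, c = 2.0, 1.0, 1.0
sigma = math.sqrt(2 * a * (a - b) / c)
kappa = a - b
dt, t_max = 0.02, 10.0

def ou_sample():
    """Return Y at time t_max, started from Y_0 = 0."""
    y = 0.0
    for _ in range(int(t_max / dt)):
        y += -kappa * y * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
    return y

samples = [ou_sample() for _ in range(1000)]
print("empirical variance:", statistics.variance(samples))
print("predicted a/c:     ", a / c)
```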

9 Conclusion

In this essay, the notion of convergence for a sequence of stochastic processes has been discussed. We followed the route of defining a topology on the space of cadlag functions, and a metric on the space of probability measures that induced the same topology as that generated by weak convergence. This enabled us to establish an equivalence between weak convergence, convergence in probability and almost sure convergence, which we used to prove that, in the simple cases of Markov chains and diffusion processes, the convergence of the generators implied weak convergence. In Section 4, we proved a key result, due to Prohorov, which states that a sequence of stochastic processes converges if and only if it is relatively compact and the corresponding finite dimensional distributions converge. The remainder of the essay was spent establishing a more workable form of this result, and applying it to generalise the Law of Large Numbers and the Central Limit Theorem to Markov jump processes. Applications of these theorems were also discussed.

There are a number of ways in which the key result mentioned above can be developed. One idea is to show that any sequence of Markov processes converges if the corresponding sequence of generators converges. Using the semigroup representation, it follows directly from the Markov property that the finite dimensional distributions converge. The difficulty is showing the relative compactness of the sequence of processes. This can be done using Aldous' Criterion, which is proved by Kallenberg [8]. A full proof of the result, for Feller processes, is given in the same reference.

A shortcoming of Prohorov’s result is that in general it is very difficult, and sometimes im-possible, to calculate the finite dimensional distributions of a process. Strook and Varadhanformulated an alternative result which depends on the generators of the processes, and is conse-quently applicable in many more situations. It relies on the fact that if X is a Markov processwith sample paths in a metric space (S, d) and with generator A, then

f(X(t)) −∫ t

0

Af(X(s))ds

is a martingale for all f ∈ C(S). This can in fact be used to characterise the process X .Strook and Varadhan’s result is stated as follows. Suppose that (Xn)n≥1 is a relatively compact

53

Page 55: Convergence of Markov Processes - Lancasterturnera/essay.pdf · 2007. 10. 5. · 1 Introduction This essay aims to give an account of the theory and applications of the convergenceof

sequence of Markov processes, with generators (An)n≥1 such that An → A. If there exists aMarkov process X such that

f(X(t)) −∫ t

0

Af(X(s))ds

is a martingale for all f ∈ C(S), then Xn ⇒ X . A comprehensive account of this result is givenin [4] and in [7].


References

[1] Billingsley, P. Convergence of Probability Measures. Wiley, New York, 1999

[2] Brown, D. and Rothery, P. Models in Biology: Mathematics, Statistics and Computing. Wiley, Chichester, 1993

[3] Darling, R.W.R. and Norris, J.R. Vertex Identifiability in Large Random Hypergraphs. In preparation, 2002

[4] Ethier, S.N. and Kurtz, T.G. Markov Processes: Characterization and Convergence. Wiley, New York, 1986

[5] Freidlin, M.I. Markov Processes and Differential Equations: Asymptotic Problems. Birkhäuser Verlag, Basel, 1996

[6] Galambos, J. and Gani, J. Studies in Applied Probability. Applied Probability Trust, Oxford, 1994

[7] Jacod, J. and Shiryaev, A.N. Limit Theorems for Stochastic Processes. Springer-Verlag, Berlin, 1987

[8] Kallenberg, O. Foundations of Modern Probability. Springer, New York, 1997

[9] Norris, J.R. Markov Chains. Cambridge University Press, Cambridge, 1997

[10] Rogers, L.C.G. and Williams, D. Diffusions, Markov Processes, and Martingales, Vol. I. Cambridge University Press, Cambridge, 2001
