48
Stochastic Processes in Discrete Time - Stochastik II Lecture notes Prof. Dirk Becherer Institut f¨ ur Mathematik Humboldt Universit¨ at zu Berlin February 12, 2018

Stochastic Processes in Discrete Time - Stochastik II Lecture ...Stochastic Processes in Discrete Time - Stochastik II Lecture notes Prof. Dirk Becherer Institut fur¨ Mathematik Humboldt

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

  • Stochastic Processes in Discrete Time - Stochastik II

    Lecture notes

    Prof. Dirk Becherer

    Institut für MathematikHumboldt Universität zu Berlin

    February 12, 2018

  • Contents

    1 Conditional expectations 21.1 The Hilbert space L2 and orthogonal projections . . . . . . . . . . . . . . . 21.2 Conditional expectations:

    Definition and construction . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Properties of conditional expectations . . . . . . . . . . . . . . . . . . . . . 41.4 Regular conditional probability distributions . . . . . . . . . . . . . . . . . 71.5 Disintegration

    and semi-direct products . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

    2 Martingales 142.1 Definition and interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . 142.2 Martingale convergence theorems

    for a.s. convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.3 Uniform integrability and

    martingale convergence in L1 . . . . . . . . . . . . . . . . . . . . . . . . . 182.4 Martingale inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.5 The Radon-Nikodym theorem

    and differentiation of measures . . . . . . . . . . . . . . . . . . . . . . . . . 232.6 A brief excursion on Markov chains . . . . . . . . . . . . . . . . . . . . . . 29

    2.6.1 First-entry times of Markov chains . . . . . . . . . . . . . . . . . . 302.6.2 Further properties of Markov chains . . . . . . . . . . . . . . . . . . 312.6.3 Branching Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    3 Construction of stochastic processes 333.1 General construction of stochastic processes . . . . . . . . . . . . . . . . . 333.2 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353.3 Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    4 Weak convergence and tightness of probability measures 404.1 Weak convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404.2 Tightness and Prohorov’s theorem . . . . . . . . . . . . . . . . . . . . . . . 414.3 Weak convergence on S � Cr0, 1s . . . . . . . . . . . . . . . . . . . . . . . 43

    5 Donsker’s invariance principle 45

    1

  • Chapter 1

    Conditional expectations

    To define conditional expectation, we will need the following theory.

    1.1 The Hilbert space L2 and orthogonal projectionsLet pΩ,F ,Pq be a probability space. L2 � L2pΩ,F ,Pq is the space of (equivalence classesof) square-integrable real-valued random variables with the scalar product

    xX, Y y :� E rXY sthat induces the norm }X} :� axX,Xy � aE rX2s. From measure and integrationtheory we know that L2 is a complete normed space (hence a Banach space), with thescalar product x�, �y, i.e. it is even a Hilbert space.Theorem 1.1. Let K be a closed subspace in L2. Then for any Y P L2 there exists anelement Ŷ P K such that

    (i) }Y � Ŷ } � dpY,Kq :� inf!}Y �X}

    ��� X P K),(ii) Y � Ŷ K X for all X P K, i.e. Y � Ŷ P KK.

    Properties (i) and (ii) are equivalent and Ŷ is uniquely described through (i) or (ii).Remark 1.1. The result holds as well for general Hilbert spaces: Note that in the proofwe just used the inner-product-structure but not the specific form of this inner product.Definition 1.2 (“best forecast in L2 sense”). Ŷ from Theorem 1.1 is called the orthogonalprojection of Y into K (in the Hilbert space L2).

    1.2 Conditional expectations:Definition and construction

    Let us fix a probability space pΩ,F ,Pq. Conditional expectations will be essential for thedefinition of martingales. A conditional expectation could be seen as the best approxima-tion in a mean-squared-error sense for a real valued random variable given some partialinformation (modelled as a sub-σ-field G of F).

    More precisely and more generally, we define

    2

  • Definition 1.3 (Conditional expectation). Let G be a sub-σ-field of F . Let also X be arandom variable with X ¥ 0 or X P L1. Then a random variable Y with the properties

    (i) Y is G-measurable,

    (ii) E rY 1Gs � E rX1Gs for each G P G,is called (a version of) the conditional expectation of X with respect to G. We writeY � E rX|Gs.Remark 1.2. • Intuitively, (i) means “based on information G”, while (ii) means

    “best forecast”.

    • Using (i) and (ii) only, it is straightforward to check that

    X ¥ 0 ùñ Y ¥ 0,

    X � integrable ùñ Y � integrable.

    (By (i), note that tŶ ¥ 0u, tŶ   0u P G; use (ii) in Definition 1.3)Exercise 1.1. Let Y be a G-measurable integrable random variable for a σ-field G. Thenthe following are equivalent

    (a) Y satisfies (ii) in Definition 1.3 for every G P G, i.e.

    E rX1Gs � E rY 1Gs @G P G;

    (b) Y satisfiesE rX1Gs � E rY 1Gs @G P E

    for a class E 2Ω with Ω P E , G � σpEq and E - X-closed;(c) Y satisfies

    E rXZs � E rY Zsfor all G-measurable, bounded (and1 non-negative) random variables Z.

    Sidenote: An analogous statement holds with Y being nonnegative instead of inte-grable, if one requires Z ¥ 0 (instead of boundedness) in part (c).Theorem 1.4. Let X be a random variable with X ¥ 0 (or X P L1) and let G be asub-σ-field of F . Then:

    (1) Conditional expectation E rX|Gs exists and satisfies E rX|Gs ¥ 0 (or E rX|Gs P L1respectively).

    (2) The conditional expectation is P-a.s. unique, i.e. if Y and Y 1 are both versions of theconditional expectation, then Y � Y 1 P-a.s.

    1optional

    3

  • Remark 1.3. In part (b) of the proof above we constructed ErX|Gs for X P L2 as theorthogonal projection onto the subspace L2G, i.e. the conditional expectation minimizesErpX�Y q2s � }X�Y }2L2 over all Y P L2G, which justifies the description of the conditionalexpectation as “the best” projection (for X P L2).

    In simple cases conditional expectations can be constructed explicitly.

    Example 1.1. Let G be a σ-field that is generated by a finite number of sets, i.e. G �σpA1, . . . , ANq for a finite number of sets Ai P F . For such partitions, we can find adisjoint partition Ω � ni�1Bi with Bi XBj � H for i � j with G � σpB1, . . . , Bnq ( �disjoint union). Then every G-measurable real-valued random variable Y has the form

    Y �ņ

    i�1yi1Bi , yi P R

    (ù Exercise: ð is clear, ñ follows by contradiction).Let X P L1 or ¥ 0. Then we have the following explicit representation for the condi-

    tional expectation of X given G:

    ErX|Gs �ņ

    i�1

    ErX1BisPrBks 1Bi pwith the convention 0{0 � 0q.

    There is an analogous formula if G is generated by a countable (possibly infinite) set.For example, if X � fpUq with U uniformly distributed on r0, 1s and pBkq some

    partition of [0,1], then the formula above shows that ErX|Gs is a piecewise-constantfunction on each interval Bk with constant value equal to the mean of fpUq on thisinterval.

    Intuitive interpretation of conditional expectations with conditional probabilities: forPrBis ¡ 0 the elementary conditional probability is defined by Pr�|Bis :� Pr � XBis{PrBis,and the conditional expectation of X given Bi could be defined by

    ErX|Bis �»XdPr�|Bis � ErX1BisPrBis ,

    such that ErX|Gs � °ni�1 ErX|Bis � 1Bi .Remark 1.4. A similar formula holds when G is generated by a countably infinite par-tition. Check that (i) and (ii) in Definition 1.3 are fulfilled (exercise).

    Remark 1.5. If G � σpZq for a random variable Z, we define ErX|Zs :� ErX|σpZqs.Then ErX|Zs is of the form hpZq for a measurable function h, since σ-measurable. Weoften write hpzq � ErX|Z � zs, or more precise would be hpzq � ErX|Zs��

    Z�z.

    1.3 Properties of conditional expectationsThroughout this section, we fix a probability space pΩ,F ,Pq and a sub-σ-field G F .Theorem 1.5. Let X ¥ 0 or in L1. Then

    4

  • (1) ErErX|Gss � ErXs;(2) X: G-measurable ùñ ErX|Gs � X a.s.;(3) Linearity: X1,2 P L1, a, b P R ùñ EraX1 � bX2|Gs � aErX1|Gs � bErX2|Gs;(4) Monotonicity: X1,2 ¥ 0 or P L1 with X1 ¤ X2 a.s. ùñ ErX1|Gs ¤ ErX2|Gs a.s.;(5) Projectivity, “tower property”: H G F - σ-fields, then

    ErErX|Gs|Hs � ErX|Hs a.s.;

    (6) “Taking out what is known”: Z: G-measurable, Z �X and X are ¥ 0 or in L1, then

    ErZX|Gs � ZErX|Gs a.s.;

    (7) If X is independent of G, then ErX|Gs � ErXs a.s.Example 1.2 (Application of conditional expectations: martingales). Let pFkqkPN0 withF0 F1 � � � F , be an increasing family of σ-fields. Such a sequence pFkq is called afiltration. A process pMkqkPN0 with the properties(1) Mk P L1 for every k P N0,(2) Any Mk, k P N0, is Fk-measurable, (We say: pMkq is adapted to pFkq.)(3) ErMk|Fls �Ml a.s. @l ¤ k,is called a martingale.

    (a) Random walk: Let Y1, . . . , Yn be i.i.d. random variables with Yk P L1, ErYis � 0. ThenMk :�

    °ki�1 Yi is a martingale with respect to the filtration pFkq � pσpYi : i ¤ kqq

    (ù Exercise).

    (b) Let X P L1 and pFkqkPN0 be a filtration. Then Mk :� ErX|Fks, k P N0, is a martingale.(c) “Multiplicative random walk”: Y1, Y2, � � � , are i.i.d. random variables such that Yi ¥

    0 @i and bounded. Let Fk � σpYi : i ¤ kq. Then Mk :�±k

    i�1 Yi, k � 1, 2, . . . , is amartingale if ErYis � 1 (noting M0 � 1).

    Theorem 1.6 (Limiting theorems and inequalities for conditional expectations).

    (1) Monotone convergence: Let 0 ¤ Xn Õ X a.s. Then

    ErXn|Gs Õ ErX|Gs a.s.

    (2) Fatou’s lemma: Let Xn ¥ 0 for all n. Then

    Erlim infnÑ8 Xn|Gs ¤ lim infnÑ8 ErXn|Gs a.s.

    5

  • (3) Dominated convergence theorem (theorem of Lebesgue): If Xn a.s.Ñ X andthere exists Z P L1 such that |Xn| ¤ Z, then

    ErXn|Gs Ñ ErX|Gs a.s.

    (4) Jensen’s inequality: Let X P L1 and u : RÑ R be convex with upXq P L1. ThenErupXq|Gs ¥ upErX|Gsq a.s.

    A simple corollary of Jensen’s inequality gives us

    Corollary 1.7. For p ¥ 1, the conditional expectation maps Lp into itself, i.e. for X P Lpwe have ErX|Gs P Lp, and

    }ErX|Gs}p ¤ }X}p.Example 1.3. (1) Conditional expectation if joint density exists:

    Let X and Y be real-valued random variables with joint density fpx, yq. Let g be abounded (or non-negative) measurable function. Then

    ErgpXq|Y s � ErgpXq|σpY qs � hpY qfor

    hpyq :�³gpxqfpx, yq dx³fpx, yq dx �

    »gpxq fpx, yq³

    fpu, yq du dx��»gpxq fX|Y px|yqloooomoooon

    cond. density

    dx,

    where the right-hand side is defined finitely, i.e. h � 0 when the denominator in theformula is 0. (Exercise)

    (2) Bayes’ formula:Let pΩ,F ,Pq be a probability space and consider the probability measure Q withdensity Z with respect to P, i.e. QrAs � EPrZ1As for every A P F (in particularErZs � 1, Z ¥ 0 P -a.s.). Then for any random variable X ¥ 0 (or in L1) andsub-σ-field G F we have

    EQrX|Gs � EPrX � Z|GsZG

    ,

    where ZG :� EPrZ|Gs. Note that the right-hand side is defined Q-a.s.; on the Q-nullsetwhere the (version of the) denominator is 0, one could set the conditional expectationto 0.Remark: ZG is the density of Q with respect to P on G in the sense that QrGs �EPrZG1Gs for every G P G.

    (3) Wald’s identities:Let X1, X2, . . . , be i.i.d. square integrable random variables and let N be an N-valued random variable that is independent of pXiqiPN. Consider the random variable

    6

  • S :� °Ni�1Xi (interpretation: “total claim size of an insurance company”). We areinterested in computing some statistics of S, e.g. moments: mean and variance. Wedo so by using the Theorem 1.8 with G :� σpNq:

    ErSs � E�E� Ņi�1

    Xi

    ��� G�� � E�E� ņi�1

    Xi

    ����n�N

    �� E

    �nErX1s

    ��n�N

    �� ErN s � ErX1s (Wald’s first identity).

    ErS2s � E�E�� ņ

    i�1Xi

    2����n�N

    �� E

    ��Var

    � ņi�1

    Xi

    � E

    �� ņi�1

    Xi

    �2���n�N

    �� E

    ��nVar pX1q � n2E rpX1qs2

    � ���n�N

    �� ErN s � VarrX1s � ErN2s � ErX1s2,

    and thus

    VarrSs � ErN s � VarrX1s � ErX1s2 ��ErN2s � ErN s2�

    � ErN s � VarrX1s � ErX1s2 � VarrN s (Wald’s second identity).

    (4) Let X1, . . . , Xn be i.i.d., X1 P L1. Set Sn :�°ni�1Xi. Then we have

    ErXi|Sns � Snn.

    Now, consider random variables X and Y defined on the probability space pΩ,F ,Pqand taking values in the measurable spaces pSX ,SXq and pSY ,SY q respectively.Theorem 1.8. Let X and Y be independent, gpx, yq be a measurable real-valued functiontaking non-negative values (or gpX, Y q P L1). Then

    ErgpX, Y q|Y s � ErgpX, yqs��y�Y .

    More precisely, hpyq :� ErgpX, yqs is a measurable function and the random variable hpY qis a version of ErgpX, Y q|Y s.

    1.4 Regular conditional probability distributionsFor this section, fix a probability space pΩ,F ,Pq, a sub-σ-field G F , a measurable spacepS,Sq, and a random variable X : pΩ,Fq Ñ pS,Sq.

    Motivation: for any B P S, 1BpXq � 1tXPBu is a bounded random variable, hence(a version of) the conditional expectation Er1BpXq|Gs � PrX P B|Gs exits, it is G-measurable, valued in r0, 1s a.s. This leads to a mapping

    Ω� S ÞÑ r0, 1spω,Bq ÞÑ PrX P B|Gsrωs

    7

  • for any chosen versions (for each B) of the conditional expectations. However, the defini-tion of such a mapping would depend on the choice of versions.

    Question: Can we choose versions of the conditional expectations such that B ÞÑPrX P B|Gs is a probability measure for any ω P Ω (or a.e.)? Problem: In general therea uncountably many B’s.

    Definition 1.9. (a) A stochastic kernel (or Markov kernel) from a measurable spacepS1,S1q into another one pS2,S2q is a mapping K : S1 � S2 Ñ r0, 1s such that

    (i) for all x1 P S1, B ÞÑ Kpx1, Bq is a probability measure on S2,(ii) for all A2 P S2, x1 ÞÑ Kpx1, A2q is a S1 � Bpr0, 1sq-measurable mapping.

    (b) A regular conditional probability distribution of a random variable X taking values inpS,Sq given the σ-field G is a stochastic kernel K from pΩ,Gq to pS,Sq such that forevery B P S, Kp�, Bq is a version of Er1BpXq|Gs � PrX P B|Gs.

    The existence results below will make use of the so-called “Monotone Class Theorem”(MCT). That is why we explain it first.

    Definition 1.10. Let pΩ,Fq be a measurable space. Let D 2Ω be a class (familiy) ofsets which is closed with respect to differences and increasing limits, that means

    • A,B P D with B A implies AzB P D,• Ak P D, k P N, with A1 A2 � � � , implies A :�

    8k�1Ak P D.

    We may call suchD an “m-class”2. If additionally Ω P D, then one callsD a Dynkin system.Note that a Dynkin system contains any monotone limit of its elements, being sets.

    Theorem 1.11 (Monotone class theorem for classes of sets, cf. [Klenke06] (Thm. 1.19)or [JacPro03] (ch.6)). Let E 2Ω be closed under finite intersections (and Ω P E). Thenthe smallest Dynkin system δpEq (or m-class when Ω P E) containing E is equal to theσ-field σpEq generated by E, i.e.

    δpEq � σpEq.Note that “” is clear, “” is the main statement.

    Corollary 1.12 (a simple MCT for functions). Let E be as in Theorem 1.11 with Ω P E.Let H be a vector space of functions Ω ÞÑ R such that• H contains all functions 1E, E P E,• if fn P H, n P N, 0 ¤ f1 ¤ f2 ¤ � � � , and f :�Ò � limnÑ8 fn is bounded, then f P H.

    Then H contains all bounded σpEq-measurable functions.Theorem 1.13. Let S � R with the σ-field S � BpRq, and X be a pS,Sq-valued randomvariable. For every sub-σ-field G F there exists a regular conditional probability of Xgiven G.

    2this is not standard in the literature

    8

  • Remark 1.6. The important property about R which we will use is that there exists acountable dense subset (e.g. Q), i.e. that R is separable.Remark 1.7. If pS,Sq as a measurable space is “similar” (isomorphic) to pR,BpRqq, thenone can construct a regular conditional probability on it.Definition 1.14. A measurable space pS,Sq is called Borel space if there exists A P BpRqand a bijection Ψ : S Ñ A such that Ψ and Ψ�1 are measurable with respect to pS,Sqand pA,BpAqq.Remark 1.8. Such Ψ induces a bijection between S and BpAq by3 B ÞÑ ΨpBq �pΨ�1q�1pBq ÞÑ Ψ�1pΨpBqq � B. That means, the σ-fields are isomorphic. One couldwrite briefly ΨpSq � pΨ�1q�1pSq � BpAq and Ψ�1pBpAqq � S.Theorem 1.15. Let pS,Sq be a Borel space, X be a random variable valued in pS,Sqand defined on a probability space pΩ,F ,Pq. Let also G F be a sub-σ-field. Then thereexists a regular conditional probability for X given G.Definition 1.16. A Polish space is a complete, separable and metrizable topologicalspace.Theorem 1.17. Let S be a Polish space with S � BpSq (the σ-algebra generated by thetopology). Then pS,Sq is a Borel space.Example 1.4 (Examples of Polish spaces). • pRd,Bdq;• pZd, 2Zdq;• closed subsets A of Rd with BpAq;• Cpr0, T s,Rq with } � }8.

    Lemma 1.18 (“Lifting” of a kernel). Let X be a random variable taking values in pS1,S1q,on a probability space pΩ,F ,Pq; let σpXq �: G F , and K be a stochastic kernel frompΩ, σpXqq to pS2,S2q. Then there exists a unique stochastic kernel rK from pXpΩq, XpΩqXS1q to pS2,S2q such that rKpXpωq, Bq � Kpω,Bq for all ω P Ω and B P S2.

    Terminology: If K is the regular conditional probability of a random variable Ygiven σpXq, then one calls rK the regular conditional probability of Y given X.

    pS1,S1q

    pΩ,G � σpXqq pS2,S2q

    X

    K

    rK

    Figure 1.4.1: “Lifting” of a kernel K to a kernel rK via the random variable X3Note that each preimage here can be read as an inverse image.

    9

  • Example 1.5 (Application example). Let K be a regular conditional probability of arandom variable Y (taking values in SY ,SY q) given G � σpXq for some random variableX. Then for every measurable function g ¥ 0 (or g bounded, or gpY q P L1) we have

    ErgpY q|Xs �»gpyqKpx, dyq

    ���x�X

    a.s. (1.4.1)

    Remark 1.9. What if gpXq P L1 but not bounded? Then (1.4.1) would hold for g�and g�. The decomposition g � g� � g� yields a.s. finiteness and equality (1.4.1), thusPr“8 � 82s � P �³ g� dK � ³ g� dK � 8� � 0, i.e. the right-hand side above is well-defined for P �X�1-almost all values of x � Xpωq. Hence, the claim will hold true; onecould define the integral at the RHS (for all values x) to be �8 for those x where it is(as a measure-integral) not well-defined, to have RHS defined for all x with a.s. equalityholding true.

    1.5 Disintegrationand semi-direct products

    Setting to consider: Given a stochastic kernel K from pS1,S1q to pS2,S2q, a probabilitymeasure P1 on pS1,S1q, Ω � S1 � S2, F � S1 b S2, and projections Xi : Ω Ñ Si,ω � px1, x2q ÞÑ xi. The aim is to construct a probability measure P on F such thatP �X�11 � P1 and K is the regular conditional probability distribution of X2 given X1.Example 1.6. Let Si be countable sets, Si � 2Si (i � 1, 2), F � 2Ω. Any probabilitymeasure P on F is determined by the probability weights

    Prtpx1, x2qus � PrX1 � x1, X2 � x2s� PrX1 � x1s � PrX2 � x2|X1 � x1s� P1rx1s �Kpx1, x2q.

    Hence, P1 and K determine P on F . Furthermore, for any f : pΩ,Fq Ñ pR�,BpR�qq thefollowing formula holds:

    ErfpX1, X2qs �»

    Ωf dP

    �¸x1PS1

    ¸x2PS2

    fpx1, x2qKpx1, x2qP1px1q

    �»S1

    �»S2

    fpx1, x2qKpx1, dx2q

    P1pdx1q.

    In other words, the probability measure P and expectations with respect to it are beingdetermined by P1 and K via a representation similar to the one from Fubini’s theorem.

    Theorem 1.19 (Disintegration). In the setting under consideration, there exists a prob-ability measure P on pΩ,Fq such that

    10

  • (a) »Ωf dP �

    »S1

    �»S2

    fpx1, x2qKpx1, dx2q

    P1pdx1q for all measurable f ¥ 0.

    In particular, for all A P F the sections Ax1 :� tx2 P S2| px1, x2q P Au are in S2, and(b) PrAs � ³

    S1Kpx1, Ax1qP1pdx1q,

    (c) PrA1 � A2s �³A1Kpx1, A2qP1pdx1q for all Ai P Si, i � 1, 2.

    The probability measure P is uniquely characterized by (c).

    Notation: For this unique measure P we write

    P � P1 bK .

    One calls P the (semi-direct) product of P1 with K.

    Remark 1.10. Theorem 1.19 implies the standard statement of the Fubini’s theorem forproduct measures P � P1 b P2 (take Kpx1, �q � P2p�q) (cf. [Klenke06, Theorem 14.14 and14.16]).

    Terminology: The marginal distributions of P on S1 b S2 are PX1 :� P � X�11 andPX2 :� P �X�12 . Then Kpx1, �q is the (regular) conditional probability distribution of X2given X1.

    Example 1.7. If X1, X2 are discrete random variables, we get using elementary condi-tional probabilities that

    PrX2 P A2|X1 � x1s � PrtX2 P A2u X tX1 � x1usP1rtX1 � x1us ,

    and hence Kpx1, A2q � PrX2 P A2|X1 � x1s because PrtX2 P A2u X tX1 � x1us �P1rtX1 � x1us �Kpx1, A2q by (c) in Theorem 1.19.Remark 1.11. We can obtain a canonical generalization from 2 to n factors (finite numberof factors) in Theorem 1.19 by induction. For a countably infinite number of factors theexistence of a measure P will be proven later in the lecture.

    Now, let pSi,Siq be measurable spaces for 1 ¤ i ¤ n, P1 be a probability measure onpS1,S1q and Ki be stochastic kernels from p

    i�1k�1 Sk,

    Âi�1k�1 Skq to pSi,Siq for 2 ¤ i ¤ n.

    Set Ω :�n1 Si and F :�Ân1 Si.Corollary 1.20. In the aforementioned setting, there exists a probability measure P onpΩ,Fq such that for every measurable f ¥ 0(a) »

    ΩfdP �

    »S1

    »S2

    � � �»Sn

    fpx1, � � � , xnqKnppx1, � � � , xn�1q, dxnq � � �K2px1, dx2qP1pdx1q,

    11

  • (b) and, in particular, for any Ai P Si

    P

    �n¡1Ai

    ��»A1

    »A2

    � � �»An�1

    Knppx1, . . . , xn�1q, AnqKn�1ppx1, � � � , xn�2q, dxn�1q

    � � �K2px1, dx2qP1pdx1q;with probability measure P being uniquely defined in this way.

    Remark 1.12. Remark for product measures P �Ân1 Pi:1. The construction above works for general measures (instead of just probability mea-

    sures); the uniqueness argument in the Fubini’s thm. needed σ-finiteness of P.

    2. One can consider f ¥ 0, or bounded or f P L1pPq in the theorem of Fubini forproduct measures; in the L1-case, the inner integrals are finitely defined a.e. (wrt.the respective finite dim. distribution given by the outer intergrals); integral wrt tokernels could be defined (in generalization of the ususal measure integral) pointwiseeverywhere, defining their value as the usual difference of the integral of the positivand negative integrand where that is defined, and letting it be �8 elsewhere.

    Theorem 1.21. Let X1, X2 be random variables defined on a probability space pΩ,F ,Pqand taking values in the measurable spaces pS1,S1q and pS2,S2q respectively. Let P1 bea measure on S1 and K be a stochastic kernel from S1 to S2. Suppose that the jointdistribution pX1, X2q on pS1 � S2,S1 b S2q is P1 bK. Then for any S1 b S2-measurablereal-valued function F ¥ 0, the following holds

    ErF pX1, X2q|X1spωq �»S2

    F pX1pωq, x2qKpX1pωq, dx2q

    � hpX1pωqq,for hpx1q :�

    ³S2F px1, x2qKpx1, dx2q.

    Remark 1.13. This statement generalizes a previous result: see Example 1.5.

    Remark 1.14. A simple corollary of Theorem 1.21 is Theorem 1.8. Indeed, this followswhen P1 bK � P1 b P2 (i.e. independent case, without dependence of K from its firstargument).

    Theorem 1.22. Let pSi,Siq be measurable spaces for i � 1, 2, and pS2,S2q be a Polishspace with S2 � BpS2q. Then every probability distribution Q on pS1 � S2,S1 b S2q �:pΩ,Fq is of the form Q � Q1 b rK for a probability measure Q1 on S1 and a stochastickernel rK from pS1,S1q to pS2,S2q.

    An analogous statement holds for measures on product spaces with finitely manyfactors pSi,Siq, i � 1, � � � , n,. Proof by induction.Example 1.8 (Markov process in discrete finite time with a general state space). LetpS,Sq be a measurable space, P0 a probability measure on S (starting initial distributionfor the process), and K be a stochastic kernel from pS,Sq to pS,Sq. An S-valued processpXnqn¥0 on a probability space pΩ,F ,Pq is called (time-homogeneous) Markov processwith initial distribution P0 and transition kernel K if

    12

  • (i) PX0 � P �X�10 � P0,(ii) for every n ¥ 0 and B P S holds

    PrXi P B|Xi�1, Xi�2, � � � , X0s � PrXi P B|Xi�1s � KpXi�1, Bq a.s..

    For a given P0 and K, there exists a Markov process pXkqk�0,1,...,N for any given finitetime horizon N P N, and the joint law of pX0, X1, � � � , XNq is given by P0 bK b � � � bKloooooomoooooon

    N times.

    (For infinite time horizon, we will construct such a process only later in the lecture).Existence: Take Ω � N0 Si, F � ÂN0 Si, P :� P0 b K b � � � bKloooooomoooooon

    N timeswith kernels

    Kippx0, . . . , xi�1q, dxiq � Kpxi�1, dxiq as for Cor. 1.20 but with the conditional distributionof the next future state xi depending on the ’past’ px0, . . . , xi�1q only through the ’presentstate’ xi�1 (Markov property). Provided that the Kernels K do not depend on i onespeaks of a time-homogeneous Markov process. (If one considers more generally kernelsKipxi�1, dxiq with Ki not being the same for all i one speaks of a time-inhomogeneousMarkov process and can do a likewise construction.)

    Then it is easy to check that (i) and (ii) hold for the canonical projections X0, � � � , XNon the product space:

    • Clearly PX0 � P0 by construction.• By the generalized Fubini’s theorem (Theorem 1.19, extension for more kernels) we

    have for all Bk P S

    Er1BnpXnq �n�1¹

    01BkpXkqs �

    »B0

    � � �»Bn�1

    Kpxn�1, BnqKpxn�2, dxn�1q � � �P0pdx0q

    � ErKpXn�1, Bnq �n�1¹

    01BkpXkqs.

    Hence, PrXn P B|Xn�1, � � � , X0s � KpXn�1, Bq sincen�1

    0 tXk P Bku forms anintersection stable generator of σpX0, � � � , Xn�1q (cf. Exercise 1.1). Analogously,taking Bk � S for k � 0, � � � , n� 2 we get PrXn P B|Xn�1s � KpXn�1, Bq.

    Uniqueness of the finite dimensional distribution: follows by uniqueness ofthe generalized Fubini theorem (Cor.1.20), noting that (ii) implies (with notations as fromthe proof there) that Pk�1 � PkbK for k � 1, . . . n is the joint distribution of X0, . . . , Xk.

    13

  • Chapter 2

    Martingales

    2.1 Definition and interpretationLet pΩ,F ,Pq be a probability space. Consider discrete time n � 0, 1, . . . , with n Pt0, 1, . . . , T u (finite horizon) or n P N0 (infinite horizon).Definition 2.1. (a) A filtration pFnqn¥0 on F is an increasing family of σ-fields F0 F1 � � � F . Set F8 :� σ

    �n¥0Fn

    � F .(b) A stochastic process pXnqn¥0 is adapted to the filtration pFnqn¥0 ifXn is Fn-measurable

    for each n ¥ 0.(c) The natural filtration of a process pXnqn¥0 is the smallest filtration with respect to

    which pXnqn¥0 is an adapted process, i.e. pGnqn¥0 for Gn � σpYk : k ¤ nq.Remark 2.1. Filtrations are models for information flow, Fn represents informationavailable at time n.

    Definition 2.2. A stochastic process pXnqn¥0 is called martingale with respect to thefiltration pFnqn¥0 and a probability measure P if

    (i) pXnqn¥0 is adapted,(ii) pXnqn¥0 is integrable, i.e. Xn P L1pPq for all n ¥ 0,

    (iii) ErXn|Fn�1s � Xn�1 P-a.s. for all n ¥ 1.The process pXnqn¥0 is a super-/sub-martingale if (iii) only holds with “¤” / “¥”.Remark 2.2. • Note that (iii) is equivalent to ErXm|Fns � Xn P-a.s. for all m ¥ n.• A super(sub)-martingale is a an adapted process that decreases (increases) in con-

    ditional mean.

    Example 2.1. (1) Additive martingales: let Yk P L1 be i.i.d. with ErY1s � 0, andconsider Xn :�

    °nk�1 Yk. Then pXnqn¥1 is a martingale with respect to Fn � σpYk :

    k ¤ nq.

    14

  • (2) Multiplicative martingales: let Yk be i.i.d. with Yk P L1 and ErYks � 1. ConsiderXn :�

    ±nk�1 Yk. Then pXnqn¥1 is a martingale with respect to Fn � σpYk : k ¤ nq.

    (3) For X P L1pPq, let Xn :� ErX|Fns, n ¥ 0, for some filtration pFnqn¥0. Then pXnqn¥0is a martingale w.r.t pFnqn¥0.

    Definition 2.3. A process pHnqn¥1 is called predictable if Hn is Fn�1-measurable for alln ¥ 1 (sometimes also called “previsible”).

    For a stochastic process pXnq and a predictable process pHnqn¥1, we define by H �Xthe process

    Yn � pH �Xqn �ņ

    k�1Hk∆Xk, n ¥ 0 pY0 � 0q.

    Interpretation: “cumulative gain/loss from betting according to the strategy H”. Y “�³HdX” is a discrete-time version of the stochastic integral of H with respect to X. If X

    is a martingale, then H �X is also called a martingale transform.Theorem 2.4. (a) Let H be predictable, bounded (i.e. there exists K   8 such that

    |Hn| ¤ K for all n), and X be a martingale. Then Y � H �X is a martingale.(b) Let H be predictable, bounded and non-negative, X be a supermartingale. Then Y �

    H �X is a supermartingale.(c) Instead of boundedness of H in (a) or (b), it suffices that H P L2 when X P L2.Definition 2.5. A map τ : Ω Ñ N0 � t0, 1, . . . ,8u is called stopping time with respectto a filtration pFnqn¥0 if tτ ¤ nu P Fn for all n ¥ 0.Remark 2.3. τ : Ω Ñ N0 is a stopping time iff tτ � nu P Fn for all n since (note:notation being for disjoint unions) tτ ¤ nu � nk�0tτ � ku and Fk Fn for k ¤ n.Example 2.2. • Let pXnqnPN be an adapted process taking values in the measurable

    space pS,Sq and let B P S. Then, “the first entry time of X in B”:τ :� inftn P N0| Xn P Bu pinf H :� 8q,

    is a stopping time. Indeed, tτ ¤ nu � nk�0 tXk P BuloooomoooonPFkFn

    P Fn.

    • Typically, a “last entry time”: τ :� suptn P N0| Xn P Bu would not be a stoppingtime as tτ ¤ nu would depend on foresight (future) knowledge,• Let τ, τ 1 be stopping times. Then τ ^ τ 1, τ _ τ 1, τ � τ 1 are also stopping times.• Every deterministic time n is a stopping time. In particular, τ ^ n is a stopping

    time for each stopping time τ .

    Remark 2.4. Intuitively, Definition 2.5 says that it is known at any time n whetherstopping has occured up to that time.

    Note that a finite horizon process pXkqk�0,...,N can be naturally embedded into theinfinite horizon setting by letting Xk :� XN for k ¡ N (and Fk � aFN) for k ¡ N .

    15

  • Now, consider a filtered probability space pΩ,F , pFnq,Pq.Definition 2.6. Let X � pXnq be adapted process and τ be a stopping time. ThenXτ � pXτnq with Xτn :� Xn^τ (i.e. Xτnpωq � Xn^τpωqpωq for ω P Ω) is called the process Xstopped at time τ .

    Remark 2.5. Note the difference

    • Xτ is a process, while

    • Xτ is a random variable: Xτ : ω ÞÑ Xτpωq (which requires τ to be finite or, otherwise,X8 to be defined).

    Exercise 2.1. (a) If X is an adapted process and τ is a finite stopping time, then Xτ isa random variable.

    (b) If X is an adapted process and τ is a stopping time, then Xτ is an adapted process.

    (c) If τ and τ 1 are stopping times, then τ ^ τ 1, τ _ τ 1 and τ � τ 1 are stopping times.Now, let τ be a stopping time and let Hpτqn :� 1tn¤τu. Note that tn ¤ τu � Ωztτ ¤

    n� 1u P Fn�1, and thus pHpτqn q is a previsible process. For a process pXnq we have

    pHpτq �Xqn �ņ

    k�11tk¤τu � pXk �Xk�1q � Xn^τ �X0 � Xτn �X0.

    By Theorem 2.4 we thus obtain

    Theorem 2.7 (Stopping theorem, Doob, version I).

    (a) Let pXnq be a supermartingale τ be a stopping time. Then Xτ is a supermartingale.In particular, ErXτn |Fks ¤ Xτk and ErXτns ¤ ErX0s for all k ¤ n.

    (b) Let pXnq be a martingale τ be a stopping time. Then Xτ is a martingale. In particular,ErXτn |Fks � Xk and ErXτns � ErX0s for all k ¤ n.

    Remark 2.6. • If τ is a bounded stopping time, i.e. if there exists a deterministicconstant N   8 such that τ ¤ N , then Theorem 2.7 gives that ErXτ s ¤ ErX0s(and “=” in (b)). Indeed, simply choose n ¥ N .• Stopped supermartingale remains supermartingale.Next aim will be obtain statements as in Theorem 2.7 for σ-fields from stopping times.

    For that purpose, we need to define the stopped-σ-field.

    Definition 2.8. Let τ be a stopping time with respect to the filtration pFnqn¥0 in F .Then

    Fτ :� tA P F | AX tτ ¤ nu P Fn @n ¥ 0uis called the σ-field of the events observable until time τ .

    Exercise 2.2. 1. for deterministic stopping times τ � k we have Fτ � Fk, hencenotation in non-ambiguous.

    16

  • 2. Prove that Fτ is a σ-field.

    3. If X is an adapted process and τ is a finite stopping time, then Xτ is Fτ -measurable.

    Lemma 2.9. Let τ and τ 1 be stopping times with τ ¤ τ 1. Then Fτ Fτ 1.Corollary 2.10 (Corollary of Theorem 2.7). Let X � pXnqn¥0 be a martingale (or su-permartinga) and σ, τ be stopping times with σ ¤ τ a.s.(a) Then

    ErXτ^n|Fσ^ns p¤q� Xσ^n a.s.(b) If, moreover, τ and σ are bounded, then

    ErXτ |Fσs p¤q� Xσ a.s.

    Corollary 2.11. 1. Let pXnq be a martingale. Then for any bounded stopping time τholds ErXτ s � ErX0s.

    2. Let pXnq be an adapted and integrable process (i.e. Xn P L1 for all n) such thatfor each bounded stopping time τ it holds that ErXτ s � ErX0s. Then pXnq is amartingale.

    2.2 Martingale convergence theoremsfor a.s. convergence

    Let pΩ,F ,Pq be a probability space with pFnqn¥0 a filtration. Let also pXnqn¥0 be areal-valued stochastic process. For a, b P R with a   b define

    UNa,b :� # of times the process X crosses ra, bs for time [0,N], N P N.In other words,

    UNa,b � suptk| Tk ¤ Nu,where Tk is defined as follows:

    S0 � T0 � 0, Sk :� inftn| n ¥ Tk�1, Xn ¤ au, Tk :� inftn| n ¥ Sk, Xn ¥ bu, k ¥ 1.For X being adapted, Sk and Tk are stopping times.

    Lemma 2.12 (Upcrossing lemma). Let pXnqn¥0 be a supermartingale. Then

    ErUNa,bs ¤1

    b� aErpXN � aq�s. (2.2.1)

    Remark 2.7. • With Ua,b �Ò - limNÑ8 UNa,b � # of all upcrossings, the monotoneconvergence theorem gives

    ErUa,bs ¤ 1b� a supNPNErpXN � aq

    �s. (2.2.2)

    17

  • • The right-hand side in (2.2.2) is finite if and only if supNPN ErX�N s is finite.• This condition is satisfied if X�N � 0 for all N (i.e. X ¥ 0) or if supNPN Er|XN |s   8

    (i.e. X is L1-bounded). Indeed, the latter condition suffices since ErX�N s ¤ Er|XN |s.Theorem 2.13 (Doob’s martingale convergence theorem I). Let pXnqn¥0 be a super-martingale with supN¥0 ErX�N s   8. Then the a.s. limit X8 � limnÑ8Xn exists and isin L1. In particular, X8 is finite a.s. and every non-negative supermartingale convergesalmost surely against a finite and integrable limit.

    Example 2.3. (1) Let Zi be i.i.d. with Zi � N p0, 1q and let a ¥ σ2 for σ ¡ 0 be fixed.Consider the random variables

    Yi � exp�σZi � 12a

    .

    Then ErYis ¤ 1 because a ¥ σ2. Consider the non-negative process Xn :�±n

    1 Yi. Itis a non-negative supermartingale (cf. Example 2.1, 2). Hence, Theorem 2.13 impliesthat Xn converges a.s. Question: What is the limit? (Exercise)

    (2) Consider a random walk Sn �°n

    1 Yn, where Yi are i.i.d. and p � PrYi � �1s �1� PrYi � �1s.Symmetric case: p � 1{2. In this case, Sn is a martingale (cf. Exercise 2.1, 1).Consider the stopping time Tc :� inftn| Sn � cu for some c P Z. Then Theorem 2.7gives that pSTcn q is a martingale. It is bounded from above if c ¡ 0 and from below ifc   0. By the martingale convergence theorem, limnÑ8 STcn exists a.s. and thereforePrTc   8s � 1 for every c P Z. Thus,£

    cPZtTc   8u � Prlim supSn � �8, lim inf Sn � �8s � 1.

    Asymmetric case: p � 1{2. Then one can check that�

    1�pp

    Snis a (non-trivial) non-

    negative martingale (ù exercise). Hence, Theorem 2.13 implies that it convergesa.s. The limit is 0 because the Law of large numbers gives

    1nSn Ñ 2p� 1 � ErX1s,

    and hence Sn Ñ �8 if p ¡ 1{2 and Sn Ñ �8 if p   1{2. In both cases,�

    1�pp

    Sn Ñ 0.Note that

    �1�pp

    SnDOES NOT converge to 0 in L1 (expectations being 1 @n).

    2.3 Uniform integrability andmartingale convergence in L1

    Definition 2.14. A family X of random variables on a probability space pΩ,F ,Pq iscalled uniformly integrable if

    limcÑ8 supXPX

    Er|X|1t|X|¥cus � 0.

    18

  • Notation: Er|X|; |X| ¥ cs :� Er|X|1t|X|¥cus or ErY ; Bs � ErY 1Bs (common nota-tion, as e.g. in [Will91]); Writing X “is UI” is an abbrevation for “is uniformly integrable”.

    Example 2.4. Let Y P L1. Then the family X � tErY |Gs| G is a sub-σ-field of Fu isuniformly integrable (exercise, or [Will91, Chapter 13.1], or example 2.5 )

    Lemma 2.15. Each of the following is a sufficient condition for a family X to be uni-formly integrable.

    (a) supXPX Er|X|ps   8 for some p ¡ 1.(b) There exists Y P L1, such that |X| ¤ Y a.s. for all X P X .Theorem 2.16. Let X be a family of random variables. The following are equivalent

    (1) X is uniformly integrable;

    (2) X is L1-bounded (i.e. supXPX Er|X|s   8) and @ε ¡ 0 Dδ ¡ 0 such that

    supXPX

    Er|X|1As   ε @A P F with PrAs   δ;

    (3) (De La Vallée Poussin criterion:) There exists a function G : r0,8q Ñ r0,8q withlimxÑ8 Gpxqx � �8 and supXPX ErGp|X|qs   8; moreover, G can be chosen as in-creasing and convex.

    Example 2.5 (for application of the de La Vallée Poussin criterion). (a) If X is Lp-boundedfor some p ¡ 1, then X is UI (choose Gpxq � |x|p for dLVP in part (3)).

    (b) Let Z P L1. Then X :� tErZ|Gs | G is a sub-σ-field of Fu is uniformly integrable.Theorem 2.17. For a random variable X8 and a sequence pXnqnPN in L1, the followingare equivalent:

    (1) pXnq converges to X8 in L1 and X8 P L1;(2) pXnq is UI and converges to X8 in probability.Remark 2.8. • This generalizes the dominated convergence theorem.

    Remark 2.9. On the connection of uniform integrability to compactness in the weaktopology σpL1, L8q on L1:• The weak topology σpL1, L8q is the coarsest topology on L1 with respect to which

    all mappings fZpXq :� ErZXs, Z P L8, are continuous. A sequence pXnq in L1converges to Xw P L1 weakly if

    limn

    ErZXns � ErZXws @Z P L8.

    • A set K L1 is weakly (relatively) compact if K (resp. its closure) is compact withrespect to the topology σpL1, L8q.

    19

  • Theorem 2.18 (Dunford, Pettis). For a set K L1 the following are equivalent(1) K is UI,

    (2) K is weakly relatively compact,

    (3) K is weakly relatively sequentially compact, i.e. any sequence in K has a subsequencethat converges weakly (with limit in the closure of K).

    Remark 2.10. • “(2) ô (3)” holds on every Banach space (result by Eberlein andSmulian).

    • Weak convergence of probability measures (will come later in the lecture) is just theweak-�-convergence of probability measures as subset of the dual space of continuousand bounded functionals.

    Theorem 2.19 (Martingale convergence theorem II, L1-convergence). Let pXnqnPN0 be amartingale. Then the following are equivalent.

    (1) There exists a random variable Y P L1 such that Xn � ErY |Fns a.s.(2) pXnqnPN0 converges in L1 against an F8-measurable random variable X8.(3) There exists a random variable X8 P L1pF8q such that pXnqnPN0 is a martingale (on

    n P N0).(4) pXnqnPN0 is UI.Exercise 2.3. (Lp-convergence of martingales): Let p P p1,8q. Assume that pXnqnPN0 ismartingale that is bounded in Lp. Then the convergence statement (2) of Theorem 2.19holds even with convergence in Lp; i.p. X8 is in Lp and the other statements of thattheorem hold true too (with Y � X8 P Lp).Remark 2.11 (On the relation of Y and X8). In general,

    ErX81As � ErXn1As � ErY 1As @A P Fn, @n P N0.

    Therefore,ErX81As � ErY 1As @A P

    ¤nPN0Fn.

    Hence, by a monotone class / Dynkin-type of argument it follows that

    ErX81As � ErY 1As @A P σ�¤nPN0Fn

    �� F8,

    i.e. X8 � ErY |F8s.Corollary 2.20. Let Y P L1 and Xn :� ErY |Fns. Then pXnqnPN0 is a uniformly integrablemartingale and Xn Ñ X8 :� ErY |F8s, where the convergence is a.s. and in L1.

    20

  • Corollary 2.21 (Kolmogorov’s 0-1-law). Let Y0, Y1, . . . , be independent random variables.Consider the σ-fields An :� σpYn�1, Yn�2, . . .q and A �

    nPNAn. Then for every A P A

    we have either PrAs � 0 or PrAs � 1.Example 2.6 (Approximation of functions). Consider Ω � r0, 1s, F � BpΩq, P - theLebesgue measure on Ω. Consider the filtration

    Fn :� σ��

    j

    2n ,j � 1

    2n

    ��� j ¤ 2n � 1 , n P N0.For a function f P L1, let fn :� Erf |Fns. Then fn Ñ f in L1 and almost everywhere.

    To be able to apply the results, one needs to check that f is F8-measurable, i.e. thatBpr0, 1sq � F8. But this is true since for every ra, bs with a, b P r0, 1s we can take dyadicapproximations an Ò a and bn Ó b such that ran, bnq P Fn; then ra, bs �

    nran, bnq P F8.

    2.4 Martingale inequalitiesLet pΩ,F ,Pq be a probability space and pFnqnPN0 be a filtration.Lemma 2.22. Let X � pXnqnPN0 be a supermartingale with X ¥ 0. Then for any N0–valued stopping time τ holds

    ErX0s ¥ ErXτ s ¥ ErXτ1tτ 8us.

    (in particular, Xτ is also defined on the set tτ � 8u a.s.)Theorem 2.23 (Doob’s maximal inequality for probabilities).

    (1) Let X � pXnqnPN0 be a supermartingale with X ¥ 0. Then for every c ¡ 0

    P�

    supnPN0

    Xn ¥ c�¤ ErX0s

    c.

    (2) Let X � pXnqnPN0 be a submartingale with X ¥ 0. Then for every c ¡ 0 and N P N0

    P�

    maxn�0,1,...,N

    Xn ¥ c�¤ 1cErXN1tmaxn�0,1,...,N Xn¥cus

    ¤ 1cErXN s.

    Corollary 2.24. Let X � pXnqnPN0 be a martingale. Then

    P�

    maxn�0,1,...,N

    |Xn| ¥ c�¤ 1cEr|XN |s.

    Example 2.7 (Ruin probabilities in insurance mathematics). Consider the following in-surance balance process

    Xn � Xn�1 � cn � Yn n P N,

    21

  • where X0 � x0 P R� denotes the starting capital, cn is deterministic and denotes thepremiums at time n, and Yn is stochastic and denotes the claims at time n. Assume thatpXnq is adapted to pFnq (e.g. Fn � σpYk : l ¤ nq). Let τ :� inftn P N| Xn ¤ 0u; this isthe time of technical ruin of the insurance company.

    A main question in Ruin theory is to estimate Prτ   8s (ruin probability). Note thattτ   8u tinfnXn ¤ 0u � tsupnp�Xnq ¥ 0u. The idea is to look for a decreasingfunction u : R Ñ R� such that upXnq is a supermartingale. This way we can apply theDoob’s inequality (Theorem 2.23, (1)) to estimate

    Prτ   8s ¤ PrsupnpupXnqq ¥ up0qs ¤ upX0q

    up0q �upx0qup0q .

    A specific example is when Erexpp�λpcn � Ynqq|Fn�1s ¤ 1 for all n P N, where λ ¡ 0 isfixed. In this case upXnq :� expp�λXnq is a supermartingale, and thus

    Prτ   8s ¤ e�λx0 .

    For instance, if Yn are i.i.d. and cn � c, then the moment estimate EreλY1s ¤ eλc for someλ ¡ 0 sufficies.

    Remark:

    • exponential decay of ruin probability e�λx0 in initial capital x0 ¡ 0;• but assuming exponential moments for claim sizes Yn is restrictive.

    The next statement is a preparation for the Doob’s martingale inequalities for Lp-moments.

    Lemma 2.25. Let U, V ¥ 0 be random variables with c �PrV ¥ cs ¤ ErU1tV¥cus for everyc ¡ 0. Then for any measurable function ψ : r0,8q Ñ r0,8q with Ψpvq :� ³v0 ψpzq dz wehave

    ErΨpV qs ¤ E�U

    » V0

    1cψpcq dc

    �.

    Theorem 2.26 (Doob’s inequality for Lp-moments). Let pXnqnPN0 be a non-negativesubmartingale (or a general martingale). Set X�8 :� supnPN0 |Xn|. Then for every p ¡ 1there exist constants cp, Cp ¡ 0, depending only on p but not on X, such that

    cp � supnPN0

    }Xn}Lppaq¤ }X�8}Lp

    pbq¤ Cp � supnPN0

    }Xn}Lp .

    Remark 2.12. • For p ¡ 1 and a non-negative submartingale X we have: X isbounded in Lp ðñ X�8 is in Lp.• As we saw in the proof of Theorem 2.26, the inequality (a) holds even for p � 1,

    but in general (b) does not necessarily hold for p � 1.• Note that, by stopping, the inequality holds up to any (finite) time horizon.

    22

  • 2.5 The Radon-Nikodym theoremand differentiation of measures

    Let P and Q be finite measures on pΩ,Fq.Definition 2.27. (a) Q is absolutely continuous w.r.t P on F , denoted by Q ! P, if the

    following holds@A P F : PrAs � 0 ñ QrAs � 0.

    (b) Q is equivalent to P on F , denoted by Q � Q, if Q ! P and P ! Q on F , i.e. the nullsets of P and Q are the same.

    (c) Q is singular to P on F , denoted by Q K P, if there exists A P F with PrAs � 0 andQrAcs � 0, i.e. Q is supported on a P-null set and vice versa.

    Example 2.8. (1) Let P be a continuous distribution on R and Q be a discrete distri-bution on R. Then P K Q.

    (2) Let P and Q be continuous distributions on Rd with density functions f and g (w.r.tLebesgue measure λ). Then

    Q ! P ðñ tg � 0u tf � 0u λ-a.e.,Q � P ðñ tg � 0u � tf � 0u λ-a.e..

    (3) Let X1, X2, . . . , be i.i.d Bernouli random variables, t0, 1u-valued, with success prob-ability p and q under P and Q respectively. P and Q are measures on Ω � t0, 1uN,Fn :� σpX1, . . . , Xnq, F :�

    nPNFn � σpXn| n P Nq. Then P � Q on Fn for ev-

    ery n P N if p, q P p0, 1q: for every non-empty A P Fn we have that PrAs ¡ 0 andQrAs ¡ 0. However, for p � q we have that Q K P on F . Indeed, by the Law oflarge numbers we have that 1

    n

    °nk�1Xk Ñ p P-a.s., and 1n

    °nk�1Xk Ñ q Q-a.s., i.e. for

    A :� limnÑ8 1n °nk�1Xk � q( P F we have that PrAs � 0 and QrAs � 1.Lemma 2.28. Let P be a probability measure on pΩ,Fq, X ¥ 0 be a random variable inL1pPq, i.e. EPrXs   8. Then

    QrAs :� EPrX1As �»A

    X dP, A P F ,

    defines a finite measure on pΩ,Fq with Q ! P.Remark 2.13. • For Y ¥ 0, F -measurable, we have EQrY s � EPrXY s (by standard

    approximation).

    • The Radon-Nikodym theorem will show that Q ! P is sufficient for having a repre-sentation as in Lemma 2.28.

    More generally, we will next study the Lebesgue decomposition.

    23

  • Definition 2.29. Lebesgue decomposition of a probability measure Q with respect to aprobability measure P on F is a decomposition of the form

    QrAs � EPrX1As �QrAX tX � 8us @A P F , (2.5.1)

    where X P L1pPq is F -measurable and non-negative. Such a random variable is calledgeneralized Radon-Nikodym derivative of Q w.r.t P on F , denoted by

    X � dQdP���F.

    Remark 2.14. A Lebesgue decomposition of a measure Q w.r.t P is a decomposition ofQ into an absolutely continuous part and a singular part. More precisely, let QapAq :�EPrX1As and QsrAs :� QrAX tX � 8us for A P F . Then Q � Qa�Qs; also, Qa ! P onF by Lemma 2.28 and Qs K P because PrX � 8s � 0 since X P L1pPq.Lemma 2.30. Let Q have a Lebesgue decomposition w.r.t P on F with a generalizedRadon-Nikodym derivative X. Then

    (1) X ¡ 0 Q-a.s.(2) P has a Lebesgue decomposition w.r.t Q on F with a generalized Radon-Nikodym

    derivative given bydPdQ

    ���F� 1X.

    Example 2.9. Let F be generated by a countable partition of Ω: Ω � 8i�1Bi for(disjoint) Bi P F and F � σpBi : i P Nq. Set

    X �8̧

    i�11Bi �

    �QrBisPrBis 1tPrBis�0u �8 � 1tPrBis�0u

    �"

    QrBis{PrBis if ω P Bi with PrBis � 08 if ω P Bi with PrBis � 0 .

    Then PrX � 8s � 0 and EPrXs �°i:PrBis�0 QrBis (being   8 because Q is a probability

    measure). Each set A P F is of the form A � iPJ Bi with J N. HenceQrAs �

    ¸jPJ

    QrBjs �¸jPJ

    PrBj s�0

    QrBisPrBis PrBis �

    ¸jPJ

    PrBj s�0

    QrBis

    � EPrX1As �QpAX tX � 8uq,

    i.e. X � dQdP��F .

    Remark 2.15. The example showed that the Lebesgue decomposition for any two mea-sures Q and P on a σ-field generated by a countable partition always exists!

    24

  • Let us fix the set-up for the statements that follow. Let P and Q be probabilitymeasures on a measurable space pΩ,Fq with a filtration pFnq. Set F8 :�

    nFn. Assume

    that Lebesgue decompositions of Q with respect to P on any Fn exist, providing randomvariables Xn ¥ 0, Xn P L1pP,Fnq such that

    QrAs � EPrXn1Asloooomoooon�:QarAs

    �QrAX tXn � 8usloooooooooomoooooooooon�:QsrAs

    @A P Fn

    Remark 2.16. • If Q ! P on Fn, then QsrAs � 0 and Q � Qa on Fn.• If Q K P (on F8) with gen.RN-derivative X8, then Qs � Q, Qa � 0, and X8 � 0P-a.s.

    Theorem 2.31. In the set-up defined above, the following holds true

    (1) pXnq is a P-supermartingale and p 1Xn q is a Q-supermartingale.(2) X8 :� a.s- limnÑ8Xn exists and

    X8 �" P r0,8q P-a.s.P p0,8s Q-a.s. .

    (3) The Lebesgue decomposition of Q w.r.t P on F8 is given by

    dQdP

    ���F8

    � X8.

    (4) Q is locally absolutely continuous w.r.t P (“Q loc! P”), i.e.Q ! P on Fn @n,

    if and only if pXnq is a P-martingale with EPrXns � 1.

    (5) Q ! P on F8 if and only if Q loc! P and pXnq is P-UI, or (equivalently) if QrX8 �8s � 0.

    Definition 2.32. A σ-field is called separable if it is generated by a countable familypAiqiPN of sets.Example 2.10. (1) BpRdq is separable (generated by rectangles with rational vertices).(2) Borel σ-field of a Polish space is separable.

    Remark:

    • The sets Ai, i P N, do not need to be disjoint. But for any n one can find finitelymany disjoint sets A11, . . . , A1m such that σpA1, . . . , Anq � σpA11, . . . , A1mq. For in-stance, consider the finite partition generated by all intersections of sets Ai or Aci ,for i ¤ n. In this sense, each Fn � σpA1, . . . , Anq may be assumed to be generatedby a finite number of disjoint sets, or by a finite partition of Ω.

    25

  • • Thus, every separable σ-field can be represented as

    nPNFn as in Thm.2.31 witheach Fn being generated by a finite partition.

    • Recall that for every σ-field Fn generated by a countable partition the Lebesguedecomposition of Q with respect to P exists for any probability measures P and Q(see Example ).

    Theorem 2.33 (Radon-Nikodym). Let Q and P be finite measures on pΩ,Fq with Q ! P.Then there exists X P L1pP,Fq, X ¥ 0, such that

    QrAs � EPrX1As �»A

    X dP @A P F .

    We write X � dQdP��F and call it the Radon-Nikodym derivative (or density) of Q with

    respect to P on F .The same statement holds more generally for P being only σ-finite. For both P,Q

    being only σ-finite, the statement holds too except that the density X ¥ 0 does not needto be in L1pPq.

    Note that the density is unique up to equality P -almost everywhere..

    Example 2.11 (Likelihood-ratio test). Let pYiqiPN be i.i.d. under P and Q real-valuedrandom variables with Yi having distribution µ under P and ν under Q.

    Question: are successive observations of pYiq being drawn from P or Q?Set-up: Ω � RN with Yipωq � ωi (coordinate mappings), Fn � σpYk : k ¤ nq,

    F � F8 �

    nPNFn, P � µbN and Q � νbN.Note that knowing the whole sequence, the decision between Q and P is easy: for

    every B P BpRq we have

    limnÑ8

    i�11BpYiq �

    "µpBq P-a.s.,νpBq Q-a.s.

    So take B P BpRq such that µpBq � νpBq and observe the limit, or rather the sequence,above. What to conclude based on only finitely many observations?

    Suppose that ν ! µ with dνdµ � h on R. For instance, µ and ν have densities f and gw.r.t the Lebesgue measure λ on R, in which case one needs tg � 0u tf � 0u λ a.s.,and h � g{f :

    νpBq �»B

    g dλ �»B

    g

    ff dλ � Eµrh1Bs @B P BpRq.

    Then Q loc! P and dQdP��Fn� Xn :�

    ±n1 hpYiq : Indeed, let A �

    n1 Y

    �1i pBiq for Bi P BpRq,

    26

  • then

    QrAs �n¹1QrYi P Bis pYi are Q-i.i.d.q

    �n¹1

    »Bi

    g dλ �n¹1

    »Bi

    g

    floomoon�h

    f dλ

    �n¹1EPrhpYiq1BipYiqs

    � EP��

    n¹1hpYiq

    �1n

    1 BipY1, . . . , Ynq

    �pYi are P-i.i.d.q

    � EPrXn1As,

    and since such sets form X-stable generator of BpRnq, we conclude that dQdP��Fn� Xn (using

    standard arguments/monotone class theorem).However, if µ � ν, then Q K P on F8 (take B such that µpBq � νpBq, then use the

    Law of large numbers to construct a suitable set). Also, by Theorem 2.31 and Remark2.16 we have that limnÑ8Xn � X8 � 0 P-a.s. and limnÑ8Xn � 8 Q-a.s.

    Decision between P and Q can be made by observing the sequence of the likelihoodratios Xn. More precisely, the law of large numbers gives

    1n

    logXn � 1n

    1log hpYiq Ñ

    "EQrlog hpY1qs Q-a.s.,EPrlog hpY1qs P-a.s.,

    assuming that log hpY1q P L1pQq or P L1pPq respectively.Set

    EQrlog hpY1qs �»

    log h dν �»

    log dνdµ dν �: Hpν|µq,

    the relative entropy of ν with respect to µ, and we have respectively when also µ ! ν,i.e. µ � ν,

    EPrlog hpY1qs �»

    log h dµ �»� log dµdν dµ � �Hpµ|νq.

    Exercise: Hpν|µq ¥ 0 with equality iff µ � ν.If both entropies are finite, we have asymptotically

    Xn �"

    exppn �Hpν|µqq Q-a.s.,expp�n �Hpµ|νqq P-a.s.

    The relative entropy intuitively gives the rate of convergence of pXnq towards �8 or 0.Example 2.12 (“Exponential tilting”). (a) Let Yi � Expp1q i.i.d. under P (with µ :�

    Expp1q). Can we construct a change to an absolutely continuous probability measureQ such that Y1, . . . , Yn are i.i.d. with distribution Exppλq �: ν under Q for a givenparameter λ ¡ 0?

    27

  • Ansatz: For hpxq � dνdx � dxdµpxq � λe�λx

    e�x, note that h ¥ 0 and Eµrhs � 1. So if we

    defineνpBq :� Eµr1Bhs @B P BpRq,

    we have that ν � Exppλq and with

    Xn :�n¹1hpYiq � dν

    bn

    dµbn

    one can describe a probability distribution dQ � XndP (on any Fn): Indeed, for allBi P BpRq

    QrY P Bs � EPrn¹1hpYiq1BpY1, . . . , Ynqs

    �n¹i�1

    EPrhpYiq1BipYiqs (Yi are i.i.d.)

    �n¹i�1

    »Bi

    hpxqe�x dx �n¹i�1

    »Bi

    λ expp�λxq dx.

    Thus, Y1, . . . , Yn are i.i.d. and Exppλq-distributed under Q.However, the measure Q � νbN, under which all Yi, i P N, are i.i.d. Exppλq-distributed, is singular to P � µbN on F8 � σpYi : i P Nq, although νbn � µbn,i.e. Q � P on Fn � σpY1, . . . , Ynq.Note that pXnq is a P-martingale but not P-UI.

    (b) Let Yi � N p0, 1q be i.i.d. under P (µ :� N p0, 1q). For m P R, define ν on BpRqthrough hpxq � exp �mx� 12m2� � exp�� px�mq22 { exp p�x2{2q by

    νpBq � Eµrh1Bs �»B

    h dµ

    �»B

    ϕm,1ϕ0,1

    ϕ0,1 dx

    �»B

    ϕm,1 dx,

    where ϕa,b is the density of N pa, bq w.r.t dx. Thus, ν � N pm, 1q distribution. Nowset

    dνbndµbn �

    n¹1hpYiq �: Xn.

    pXnq is a P-martingale but not P-UI because µbN K νbN on the canonical probabilityspace pRN,BpRNqq since µ � ν although µbn � νbn for every n P N (because Xn ¡ 0).

    28

  • 2.6 A brief excursion on Markov chainsWe do a brief excusion on Markov chains, postponing question about construction ofthe respective stochastic model to the next chapter. Indeed, the construction is a directapplication (using Markov kernels) of the general contruction for stochastic processes indiscrete time provided there. The idea of Markov transition kernels also provides anintuitive example that motivates the more general framework (more general kernels) inthe next chapter.

    • Markov chain: Markov process in discrete time with countable (discrete) state space.

    • Setting: E is a countable state space, i P E are the states, N � t0, 1, . . .u is the timeindex set.

    Definition 2.34 (Markov property). A process pXnqnPN on some probability space pΩ,F ,Pqhas the Markov property (w.r.t. P) if @n @i0, i1, . . . , in�1 P E

    P�Xn�1 � in�1

    ��Xr0,ns � ir0,ns� � P �Xn�1 � in�1��Xn � in�(whereever defined), i.e. if P

    �Xn�1 � in�1

    ��Xr0,ns� � P �Xn�1 � in�1��Xn� a.e..• Notational abbreviation used: irs,ts � pis, . . . , itq P Et�s�1, Xrs,ts � pXs, . . . , Xtq.• On E we consider the discrete topology, BpEq � 2E.• Stochastic kernel K : E � BpEq Ñ r0, 1s from E to E is determined by a stochastic

    matrix P � ppijqi,jPE with pij � Kpi, tjuq.• A stochastic kernel K is also called Markov kernel, and P is called transition prob-

    ability matrix, as K (resp. P) describes the dynamics of transitions for a Markovchain.

    Definition 2.35 (Markov chain). pXnqnPN is called (time-homogeneous) Markov chain(under P) with initial (starting) distribution ν on E and stochastic kernel K representedby a stochastic matrix P , if @n P N @i0, . . . , in�1 P E

    P�Xn�1 � in�1

    ��Xr0,ns � ir0,ns� � pinin�1 � Kpin, tin�1uq (2.6.1)(whereever conditional probability is defined), i.e. if

    P�Xn�1 � in�1

    ��Xr0,ns� � pinin�1 ��in�Xn � KpXn, tin�1uq a.e..Notation: given a transition matrix/kernel, the distribution of the process pXnqnPN

    is determined by ν and one often writes Pν for P to indicate the initial distribution, or Pifor Pν when ν � δi is the point mass at i.

In the sequel we will work on the canonical probability space of paths for a Markov chain: Ω = E^N, F = σ(π_n : n ∈ N) = σ(X_n : n ∈ N) = B(E)^{⊗N}, X_n = π_n for the canonical projections π_n(ω) = ω_n. This permits us to define a family P_i, i ∈ E, of probability measures simultaneously on the same space (Ω, F) with the canonical mapping X : N × Ω → E, X_n(ω) = ω_n, such that X = (X_n)_{n∈N} is a Markov chain with initial distribution δ_i (starting at i ∈ E) under P_i for any i. By Theorem 2.36(a) one can also define P_ν on (Ω, F) for any initial probability distribution ν on E.

Any question that depends only on the distributional properties of the Markov process (e.g. about probabilities or expectations depending on the distribution of the process (X_n)) can be addressed on the canonical space, since the distribution on (E^N, B(E)^{⊗N}) of a Markov chain X (defined on some probability space) with initial distribution ν := P ∘ X_0^{-1} and given transition matrix is identical to the distribution of the canonical process π on the canonical space under P_ν.

Indeed, part (a) of Theorem 2.36 shows that the distribution of X_{[0,n]} under P is uniquely determined on F_n for any n ∈ N. As ∪_n F_n is an intersection-stable generator of F, the distributions of the processes X and π on B(E)^{⊗N} are the same (uniqueness), provided they exist. Existence is postponed to the next chapter; it will be obtained from Kolmogorov's consistency theorem.

Theorem 2.36. Let (X_n)_{n∈N} be an E-valued stochastic process.

(a) (X_n)_{n∈N} is a Markov chain (under P) with transition kernel (matrix) K (P) and initial distribution P ∘ X_0^{-1} = ν if and only if ∀n ∈ N ∀i_0, ..., i_n ∈ E

P[X_{[0,n]} = i_{[0,n]}] = ν(i_0) p_{i_0 i_1} p_{i_1 i_2} ⋯ p_{i_{n-1} i_n}.

(b) If (X_n)_{n∈N} is a Markov chain, then ∀n < m, i_n ∈ E, A ⊆ E^n, B ⊆ E^{m-n}

P[X_{[n+1,m]} ∈ B | X_{[0,n-1]} ∈ A, X_n = i_n] = P[X_{[n+1,m]} ∈ B | X_n = i_n] = P_{i_n}[X_{[1,m-n]} ∈ B]     (2.6.2)

(wherever defined), i.e.

P[X_{[n+1,m]} ∈ B | X_0, ..., X_n] = P[X_{[n+1,m]} ∈ B | X_n],

meaning that the future evolution of a Markov chain depends on the history up to today only through the present state.

Theorem 2.37 (Conditional independence of past and future given the present). Let (X_n)_{n∈N} be a Markov chain. Then ∀n < m, i_n ∈ E, A ⊆ E^n, B ⊆ E^{m-n} we have

P[X_{[0,n-1]} ∈ A, X_{[n+1,m]} ∈ B | X_n = i_n] = P[X_{[0,n-1]} ∈ A | X_n = i_n] · P[X_{[n+1,m]} ∈ B | X_n = i_n]

(wherever defined), i.e.

P[X_{[0,n-1]} ∈ A, X_{[n+1,m]} ∈ B | X_n] = P[X_{[0,n-1]} ∈ A | X_n] · P[X_{[n+1,m]} ∈ B | X_n] a.e.

2.6.1 First-entry times of Markov chains

We consider the natural filtration F_n = σ(X_k : k ≤ n) generated by the process (X_n). With respect to (F_n), the first-entry times of a set A ⊆ E,

T_A := inf{n ≥ 1 : X_n ∈ A},
H_A := inf{n ≥ 0 : X_n ∈ A},


are stopping times, possibly taking the value +∞. Note that P_i[H_A = 0] = 1 for i ∈ A and P_i[H_A = T_A] = 1 for i ∉ A. Define h_A(i) := P_i[H_A < ∞] as the probability of eventually reaching A when starting at i, and define k_A(i) := E_i[H_A] as the expected time for reaching A from i (expectation taken under P_i).

Theorem 2.38. Let A ⊆ E be non-empty. Then h_A : E → [0,1] is the smallest function E → [0,∞) solving

h_A(i) = Σ_{j∈E} p_{ij} h_A(j)   ∀i ∈ E∖A

with boundary condition h_A(i) = 1 for i ∈ A.

For the proof of the theorem we will need the following lemma.

Lemma 2.39. For every i ∈ E∖A, n ∈ N, A ≠ ∅ we have

P_i[T_A ≤ n+1] = Σ_{j∈E} p_{ij} P_j[H_A ≤ n].

Theorem 2.40. k_A is the minimal function E → [0,∞] solving

k_A(i) = 1 + Σ_{j∉A} p_{ij} k_A(j)   ∀i ∉ A

with boundary condition k_A(i) = 0 for i ∈ A.
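On a finite state space the systems of Theorems 2.38 and 2.40 are plain linear equations. The following Python sketch (ours; the random-walk example and the helper solve_system are assumptions for illustration) solves both for a simple random walk on {0, ..., N} with absorbing barriers. For A = {N} we additionally impose h(0) = 0, the minimal-solution choice of Theorem 2.38, since the state 0 is absorbing and never reaches N.

    import numpy as np

    N, p = 10, 0.4                     # walk on {0,...,N}, up-probability p, 0 and N absorbing
    P = np.zeros((N + 1, N + 1))
    P[0, 0] = P[N, N] = 1.0
    for i in range(1, N):
        P[i, i + 1], P[i, i - 1] = p, 1 - p

    def solve_system(P, boundary, rhs):
        """Solve f(i) = rhs + sum_j p_ij f(j) for i outside `boundary`, with f fixed
        on `boundary` (cf. the systems in Theorems 2.38 and 2.40)."""
        n = P.shape[0]
        f = np.zeros(n)
        bnd = list(boundary)
        for i in bnd:
            f[i] = boundary[i]
        rest = [i for i in range(n) if i not in boundary]
        M = np.eye(len(rest)) - P[np.ix_(rest, rest)]
        b = rhs + P[np.ix_(rest, bnd)] @ f[bnd]
        f[rest] = np.linalg.solve(M, b)
        return f

    h = solve_system(P, {0: 0.0, N: 1.0}, rhs=0.0)   # h_{ {N} }: probability of reaching N
    k = solve_system(P, {0: 0.0, N: 0.0}, rhs=1.0)   # k_{ {0,N} }: expected absorption time
    print(h[5], k[5])   # h[5] equals ((q/p)**5 - 1)/((q/p)**N - 1) with q = 1 - p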

2.6.2 Further properties of Markov chains

For n ∈ N define the shift operator θ_n : Ω → Ω, ω = (ω_j)_{j∈N} ↦ (ω_{n+j})_{j∈N}.

Theorem 2.41. Let g : (Ω, F) → (R, B(R)) be measurable and nonnegative (or bounded). Then for any n ≥ 0 we have

E_ν[g ∘ θ_n | F_n] = E_i[g]|_{i=X_n} =: E_{X_n}[g].

Instead of just 1-step transition probabilities, one can also consider multi-step transition probabilities

p_{ij}^{(n)} := P[X_{k+n} = j | X_k = i] = P_i[X_n = j],

and these are related by the following theorem.

Theorem 2.42 (Chapman-Kolmogorov). For m, n ∈ N, i, j ∈ E we have

P_i[X_{n+m} = j] = Σ_{y∈E} P_i[X_n = y] P_y[X_m = j].
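In matrix form, Chapman-Kolmogorov says that the n-step transition matrices are the matrix powers P^n, with P^{n+m} = P^n P^m. A quick numerical check (a sketch; any stochastic matrix will do):

    import numpy as np

    P = np.array([[0.5, 0.5, 0.0],    # any stochastic matrix works here
                  [0.1, 0.6, 0.3],
                  [0.0, 0.2, 0.8]])
    mp = np.linalg.matrix_power

    # P_i[X_{n+m} = j] = sum_y P_i[X_n = y] P_y[X_m = j], i.e. P^{n+m} = P^n P^m
    n, m = 3, 4
    print(np.allclose(mp(P, n + m), mp(P, n) @ mp(P, m)))   # True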

    For Markov processes in general the analogue of the next property would be strongerthan the Markov property itself, but for Markov chains both are equivalent:


Theorem 2.43. Let (X_n)_{n∈N} be a Markov chain and T < ∞ a finite stopping time with respect to (F_n)_{n∈N}. Then

P[X_{[T+1,T+k]} ∈ B | X_T = i, A] = P_i[X_{[1,k]} ∈ B],     (2.6.3)

for k ∈ N, A ∈ F_T, B ⊆ E^k, i ∈ E (wherever defined), such that

P[X_{[T+1,T+k]} ∈ B | F_T] = P_i[X_{[1,k]} ∈ B]|_{i=X_T} =: P_{X_T}[X_{[1,k]} ∈ B].     (2.6.4)

Remark 2.17. For T = n, m = n + k, we have for A = {X_{[0,n]} ∈ Ã} with some Ã ⊆ E^{n+1} in (2.6.3):

P[X_{[n+1,m]} ∈ B | X_n = i, X_{[0,n]} ∈ Ã] = P_i[X_{[1,m-n]} ∈ B],

i.e. the ordinary Markov property.

2.6.3 Branching Process

We consider the branching process (Galton-Watson process) as a simple Markovian model for a growing population. Let (ξ_i^n)_{n,i≥1} be an array of i.i.d., N = N_0-valued random variables. Define the process (Z_n)_{n∈N} by Z_0 := 1 and

Z_{n+1} := Σ_{i=1}^{Z_n} ξ_i^{n+1}.

Then (Z_n) is a Markov chain, taking values in E = N, with respect to the filtration

F_n := σ(ξ_i^k : k ≤ n, i ∈ N),

with transition matrix (p_{ij}) and kernel K given by

P[Z_{n+1} = j | Z_n = i] = P[ξ_1^{n+1} + ⋯ + ξ_i^{n+1} = j] =: p_{ij} =: K(i, {j});

indeed we have

P[Z_{n+1} = j | F_n] = P[Σ_{i=1}^{Z_n} ξ_i^{n+1} = j | F_n] = P[Z_{n+1} = j | Z_n], j ∈ E,

implying (by monotone convergence)

P[Z_{n+1} ∈ B | F_n] = P[Z_{n+1} ∈ B | Z_n], B ⊆ E.

Let m := E[ξ_i^n] and σ² := Var(ξ_i^n). Assume m ∈ (0, ∞). (The case m = 0 is trivial.)

Theorem 2.44. Assume σ² ∈ (0, ∞) and let W_n := Z_n/m^n. Then (W_n) is a martingale, the almost-sure limit lim_n W_n =: W_∞ exists, and

m > 1 ⟺ E[W_∞] = 1 ⟺ E[W_∞] > 0.
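A simulation sketch of Theorem 2.44 (ours; the Poisson(m) offspring law and all sizes are an arbitrary choice): for a supercritical chain the sample mean of W_n stays near E[W_∞] = 1, while the fraction of extinct paths approximates the extinction probability.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n_gen, n_paths = 1.5, 20, 2000    # Poisson(m) offspring; supercritical since m > 1

    W = np.empty(n_paths)
    for p in range(n_paths):
        Z = 1
        for _ in range(n_gen):
            Z = rng.poisson(m, size=Z).sum() if Z > 0 else 0
        W[p] = Z / m**n_gen              # W_n = Z_n / m^n, a nonnegative martingale

    print(W.mean())          # ~1 = E[W_infty] in the supercritical case
    print((W == 0).mean())   # ~0.42: extinction probability, the root q of q = e^{m(q-1)}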


Chapter 3

Construction of stochastic processes

3.1 General construction of stochastic processes

The aim of this section is to explain general concepts for the construction of a mathematical model (a probability space) carrying general stochastic processes.

Let (S, S) be a measurable space, where S is the space of observations, and let (X_i)_{i∈I} be a family of random variables defined on a probability space (Ω, F, P) and taking values in (S, S), with some general index set I.

Definition 3.1 (and Lemma). Let the index set I be non-empty, (S, S) a measurable space, and (X_i)_{i∈I} a stochastic process (a family of random variables with values in (S, S)). For J ⊆ I, J finite, the mapping (X_i)_{i∈J} : Ω → S^J induces a measure Q^{(J)} on S^J := ⊗_{i∈J} S by

Q^{(J)} = P ∘ ((X_i)_{i∈J})^{-1},

the "image measure" (distribution) of (X_i)_{i∈J}. The family {Q^{(J)} | J ⊆ I, J finite} is called the family of finite-dimensional marginal distributions of (X_i)_{i∈I} under P. This family is consistent in the sense that for any finite J′ ⊆ J we have

Q^{(J′)} = Q^{(J)} ∘ (π_{JJ′})^{-1},

where π_{JJ′} : S^J → S^{J′} are the canonical projections, π_{JJ′}((s_i)_{i∈J}) = (s_i)_{i∈J′}.

On S^I = ∏_{i∈I} S we define the coordinate mappings π_j = π_{I{j}} : ω = (s_i)_{i∈I} ↦ s_j. S^I = ⊗_{i∈I} S is the product σ-field given by σ(π_i : i ∈ I), i.e. the smallest σ-field on S^I such that all π_i are measurable. An intersection-stable generator of S^I is given by the system of all cylinder sets of the form Z = {ω ∈ S^I | ω_i ∈ A_i for i ∈ J} with A_i ∈ S and J ⊆ I, J finite. A process (X_i)_{i∈I} also induces a distribution on S^I, namely the image measure of P under the mapping (X_i)_{i∈I} : Ω → S^I. Studying the probabilistic properties of (X_i)_{i∈I} requires knowing this distribution Q = P ∘ ((X_i)_{i∈I})^{-1} of the process X.

Remark 3.1. • More generally, we can also consider a family of different measurable spaces (S_i, S_i), i ∈ I; in this case (S^I, S^I) := (∏_{i∈I} S_i, ⊗_{i∈I} S_i), where ⊗_{i∈I} S_i = σ(π_i : i ∈ I).

• S^I = ⊗_{i∈I} S_i in general is not the same as σ({∏_{i∈I} A_i | A_i ∈ S_i}) if I is uncountable. If I is countable, then the two σ-fields coincide (exercise).


Question: How can one construct a model (Ω, F, P) and (X_i)_{i∈I} such that X has a given "stochastic structure"?

Canonical model: Choose Ω = S^I, F = S^I, and X_i = π_i the canonical coordinate projections. Hence the question reduces to the construction of a measure P on (S^I, S^I) with some "structure" (on the canonical probability space, Q = P in the above notation).

    There are alternative conceptual ways to impose “stochastic structure” in general:

    (1) specify a consistent family of finite-dimensional distributions;

    (2) specify some initial distribution for one (first) coordinate and a “transition rule” onhow to sample further coordinates from given ones.

Lemma 3.2. A probability measure P on (Ω, F) = (S^I, S^I) is uniquely determined by its finite-dimensional marginal distributions, i.e. for any consistent family of finite-dimensional distributions {Q^{(J)} | J ⊆ I, J finite} there exists at most one P on (Ω, F) such that Q^{(J)} = P ∘ (π_{IJ})^{-1} for every finite J ⊆ I.

Let (S, S) be a measurable space, P_0 a probability measure on (S, S), and I = N = {0, 1, ...}. For any n ≥ 1, let K_n be a stochastic kernel from (S^n, S^n) to (S, S). Consider the following iterative construction of measures (as semidirect products of measures and kernels):

P^{(0)} = P_0, P^{(1)} = P_0 ⊗ K_1, ..., P^{(n)} = P^{(n-1)} ⊗ K_n, ...

For every S^{n+1}-measurable function f ≥ 0 we have

∫_{S^{n+1}} f dP^{(n)} = ∫_S P_0(dx_0) ∫_S K_1(x_0, dx_1) ⋯ ∫_S K_n((x_0, ..., x_{n-1}), dx_n) f(x_0, ..., x_n).

Let Ω = S^N = {ω = (x_0, x_1, ...) | x_i ∈ S, i ∈ N}; this is the space of all trajectories. Let X_n(ω) = π_n(ω) = x_n for n ∈ N; this process describes the state at time n. Set F_n := σ(X_0, ..., X_n) for all n ∈ N, describing the information up to time n, and let F := σ(X_0, X_1, ...) = σ(∪_{n∈N} F_n) = S^{⊗N}. Note that F_n is a σ-field on S^N and each A ∈ F_n has the form A_n × S × S × ⋯ for some A_n ∈ S^{n+1}.

Remark 3.2 (Consistency of P^{(0)}, P^{(1)}, ...). For A_n ∈ S^{n+1} we have A_n × S × S × ⋯ = A_{n+1} × S × ⋯ with A_{n+1} := A_n × S ∈ S^{n+2}, and P^{(n+1)}[A_n × S] = P^{(n)} ⊗ K_{n+1}[A_n × S] = P^{(n)}[A_n]. Similarly, P^{(m)}[A_n × S × ⋯ × S] = P^{(n)}[A_n] (with m − n factors S) for every m > n and A_n ∈ S^{n+1}.

Theorem 3.3 (Ionescu Tulcea). In the discrete-time situation I = N, there exists a unique probability measure P on S^N satisfying

P[A_n × S × ⋯] = P^{(n)}[A_n]   ∀A_n ∈ S^{n+1},     (3.1.1)

or equivalently, for all n and all S^{n+1}-measurable f ≥ 0,

∫_Ω f(X_0, ..., X_n) dP = ∫_S P_0(dx_0) ∫_S K_1(x_0, dx_1) ⋯ ∫_S K_n((x_0, ..., x_{n-1}), dx_n) f(x_0, ..., x_n).     (3.1.2)
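Formula (3.1.2) contains a simulation recipe: draw x_0 from P_0, then x_n from K_n((x_0, ..., x_{n-1}), ·) step by step. A Python sketch with invented ingredients (P_0 = N(0,1) and a Gaussian AR(1)-type kernel, chosen only for illustration; it depends on the past only through the last coordinate, cf. Example 3.1 below):

    import numpy as np

    rng = np.random.default_rng(3)

    def sample_P0():
        # Assumed initial law P_0 = N(0,1), chosen only for this demo
        return rng.standard_normal()

    def sample_Kn(history):
        # Hypothetical kernel K_n((x_0,...,x_{n-1}), .) = N(0.8 * x_{n-1}, 1)
        return 0.8 * history[-1] + rng.standard_normal()

    def sample_trajectory(n):
        """One draw of (X_0, ..., X_n) under P = P_0 (x) K_1 (x) K_2 (x) ..., as in (3.1.2)."""
        xs = [sample_P0()]
        for _ in range(n):
            xs.append(sample_Kn(xs))
        return np.array(xs)

    print(sample_trajectory(10))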


Notation and example: If in Theorem 3.3 we have K_n((x_0, ..., x_{n-1}), dx_n) = P_n(dx_n) for some probability measure P_n on S for all n, i.e. the kernel does not depend on x_0, ..., x_{n-1}, then the unique measure P (from Theorem 3.3) is called the product measure of the P_n, and we write

P = ⊗_{n=0}^∞ P_n.

In the general case we write

P = P_0 ⊗ (⊗_{n=1}^∞ K_n),

using the notation of semidirect products.

For P = ⊗_{n=0}^∞ P_n we have that P_n is the marginal distribution of the n-th coordinate and P^{(n)} = ⊗_{k=0}^n P_k for all n ≥ 0; in particular the coordinate mappings are independent.

Corollary 3.4. The probability measure P from Theorem 3.3 is the product of its one-dimensional marginal distributions P_n := P ∘ π_n^{-1} = P ∘ X_n^{-1} if and only if the random variables (X_n)_{n≥0} are independent under P.

Example 3.1. If K_n((x_0, ..., x_{n-1}), ·) = K_n(x_{n-1}, ·) depends not on x_0, x_1, ..., x_{n-2} but only on x_{n-1} for all n, then the coordinate process (X_n)_{n≥0} is a (time-inhomogeneous) discrete-time Markov process. If moreover K_n = K (no dependence on n), then (X_n)_{n≥0} is a time-homogeneous Markov process.

The next goal is to construct stochastic processes with an arbitrary (possibly uncountable) index set I and state space (S, S). Let Ω = S^I, F = S^I, and X_i : Ω → S, i ∈ I, the coordinate mappings.

Theorem 3.5 (Kolmogorov's consistency theorem). Let {Q^{(J)} | J ⊆ I, J finite} be a consistent family of finite-dimensional distributions Q^{(J)} on (S^J, S^J), and let S be a Polish space with Borel σ-field S = B(S). Then there exists a unique probability measure P on (Ω, F) = (S^I, S^I) which has the Q^{(J)} as its finite-dimensional marginal distributions.

Lemma 3.6. Let I be an arbitrary index set and S^I = ⊗_{i∈I} S = σ(X_i : i ∈ I) the product σ-field. Then we have

(a) S^I = ∪_{J⊆I, J countable} σ(X_j : j ∈ J).

(b) S^I = σ({(∏_{j∈J} A_j) × (∏_{i∈I∖J} S) | A_j ∈ S ∀j ∈ J, J ⊆ I, J countable}).

3.2 Gaussian Processes

Let I be an arbitrary index set and (X_i)_{i∈I} a stochastic process defined on the probability space (Ω, F, P) and taking values in (S, S) = (R, B(R)).


Definition 3.7. (a) X = (X_i)_{i∈I} is called a Gaussian process if all finite-dimensional marginal distributions are multivariate normal ("Gaussian"), i.e. ∀n ∈ N, i_1, ..., i_n ∈ I, a ∈ R^n, the linear combination Σ_{k=1}^n a_k X_{i_k} is a univariate normal random variable (possibly degenerate, i.e. with zero variance).

(b) X is called a centered Gaussian process if it is a Gaussian process with E[X_i] = 0 for every i ∈ I.

(c) A family (X_i^k)_{i∈I_k}, k ∈ K, of stochastic processes on (Ω, F, P) is called a Gaussian process if {X_i^k | k ∈ K, i ∈ I_k} = (X_i^k)_{(k,i)∈Ĩ} with Ĩ := {(k,i) | k ∈ K, i ∈ I_k} is a Gaussian process.

Theorem 3.8. (a) Let X = (X_i)_{i∈I} be a Gaussian process. Then its distribution is uniquely determined by E[X_i], i ∈ I, and Cov(X_i, X_j), i, j ∈ I.

(b) The process X = (X_i)_{i∈I} is a Gaussian process with mean values E[X_i], i ∈ I, if and only if the process X̃ = (X_i − E[X_i])_{i∈I} is a centered Gaussian process (with the covariance function of both processes being the same).

(c) Existence: Let Γ : I × I → R be such that for every finite J ⊆ I the matrix (Γ(i,j))_{i,j∈J} is symmetric and positive semi-definite. Then there exists a centered Gaussian process (X_i)_{i∈I} on some probability space with Cov(X_i, X_j) = Γ(i,j) for all i, j ∈ I.

Example 3.2. Consider the space L²(R_+, dx) with the scalar product ⟨f, g⟩ = ∫_{R_+} fg dx. This space is a separable Hilbert space. Let (Ω, F, P) be a probability space carrying a sequence (ε_i)_{i∈N} of i.i.d. N(0,1) random variables. The subspace of L²(Ω, F, P) spanned by (ε_i)_{i∈N} is also a separable Hilbert space, with the scalar product ⟨X, Y⟩ = E[XY] = ∫_Ω XY dP.

Now an isometry between these two Hilbert spaces is given, after choosing an orthonormal basis (b_i)_{i∈N} of L²(R_+, dx), by b_i ↔ ε_i, i ∈ N, inducing

L²(R_+, dx) ∋ h = Σ_i α_i b_i ⟷ Σ_i α_i ε_i =: X^h ∈ L²(Ω, F, P),

where α_i = ⟨h, b_i⟩ for i ∈ N.

Since the ε_i are Gaussian, one can prove that (X^h)_{h∈L²(R_+,dx)} is a centered Gaussian process with covariance Cov(X^h, X^g) = ⟨h, g⟩.

Take h = 1_{[0,t]} and set W_t := X^{1_{[0,t]}} for every t ∈ R_+. The process (W_t)_{t∈R_+} is a centered Gaussian process, since {1_{[0,t]} | t ∈ R_+} ⊆ L²(R_+, dx), and its covariance function is given by

Cov(W_s, W_t) = Cov(X^{1_{[0,s]}}, X^{1_{[0,t]}}) = ⟨1_{[0,s]}, 1_{[0,t]}⟩ (isometry) = ∫_0^∞ 1_{[0,s]} 1_{[0,t]} dx = s ∧ t.

This will be the covariance structure of Brownian motion, also known as the Wiener process (after Norbert Wiener, 1894-1964).


Example 3.3. In this example we construct a Gaussian process with the same covariance structure as in the previous example, using a different approach.

Let P_0 = δ_0 and

K_{s,t}(x, dy) = (2π(t−s))^{-1/2} exp(−|y−x|²/(2|t−s|)) dy,   t > s ≥ 0.

For every finite index set J = {t_1, ..., t_n}, n ∈ N, define the finite-dimensional distributions

Q^{(J)}(∏_{i=1}^n A_i) = ∫_R P_0(dx_0) ∫_{A_{t_1}} K_{0,t_1}(x_0, dx_{t_1}) ⋯ ∫_{A_{t_n}} K_{t_{n-1},t_n}(x_{t_{n-1}}, dx_{t_n})   ∀A_{t_i} ∈ B(R).

This family of measures is consistent. Indeed, δ_x ⊗ K_{s,t}(x, dy) = x + N(0, t−s) and N(0, t−s) ∗ N(0, u−t) = N(0, u−s) for all u > t > s ≥ 0. Hence, by Kolmogorov's consistency theorem (Theorem 3.5), there exists a measure P on (R^{[0,∞)}, B(R)^{⊗[0,∞)}) such that the canonical process X = (X_t)_{t≥0} has {Q^{(J)} | J finite} as its finite-dimensional marginal distributions, and these are multivariate normal. Therefore X is a Gaussian process under P, centered and with covariances: for s < t,

Cov(X_s, X_t) = E[X_s X_t] = E[X_s (X_s + (X_t − X_s))] = E[X_s²] = s = s ∧ t,

using that the increment X_t − X_s is independent of X_s.

Remark 3.3. Note that (X_t) and (W_t) (from the previous example) have the same distribution on (R^{[0,∞)}, B(R)^{⊗[0,∞)}). The process X starts at 0, X_t − X_s is N(0, t−s)-distributed, and X_t − X_s is independent of the past of the process up to time s, for t > s ≥ 0. This Gaussian process has the properties that define Brownian motion except continuity of paths; it is sometimes called pre-Brownian motion.
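Example 3.3 is exactly the recipe by which such a process is simulated on a finite grid: starting from x_0 = 0, each step adds an independent N(0, t_{k+1} − t_k) increment, i.e. samples from K_{t_k, t_{k+1}}. A sketch (ours; grid and sample sizes arbitrary) that also checks the covariance s ∧ t empirically:

    import numpy as np

    rng = np.random.default_rng(5)

    t = np.linspace(0.0, 1.0, 101)       # grid 0 = t_0 < ... < t_n = 1
    n_paths = 50_000

    # X_{t_{k+1}} = X_{t_k} + N(0, t_{k+1} - t_k): one draw from K_{t_k, t_{k+1}} per step
    incs = rng.standard_normal((n_paths, len(t) - 1)) * np.sqrt(np.diff(t))
    X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(incs, axis=1)], axis=1)

    s_idx, t_idx = 30, 80                # s = 0.3, t = 0.8
    print(np.mean(X[:, s_idx] * X[:, t_idx]))   # ~0.3 = s ^ t, the Brownian covariance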

3.3 Brownian motion

Definition 3.9. Processes (X_t), (Y_t), t ∈ I, defined on the probability space (Ω, F, P), are called

(a) versions or modifications of each other if

∀t ∈ I ∃N_t ∈ F : P[N_t] = 0 and X_t(ω) = Y_t(ω) ∀ω ∉ N_t,

i.e.

P[X_t = Y_t] = 1 ∀t ∈ I;

(b) indistinguishable if there exists a null set N such that

X_t(ω) = Y_t(ω) ∀t ∈ I, ∀ω ∉ N,

i.e.

P[X_t = Y_t for all t ∈ I] = 1.


Theorem 3.10 (Continuity criterion of Kolmogorov and Chentsov). Let (X_t)_{t∈[0,1]} be an R^d-valued process and let α, β, C > 0 be constants such that

E[|X_t − X_s|^α] ≤ C |t − s|^{1+β}   ∀s, t ∈ [0,1].

Then there exists a version (Y_t)_{t∈[0,1]} of (X_t)_{t∈[0,1]} with (globally) Hölder-continuous paths of any order λ ∈ (0, β/α).

Remark 3.4. • A function g : [0,1] → R^d is (globally) Hölder-continuous of order λ if

sup_{s,t∈[0,1], s≠t} ‖g(s) − g(t)‖ / |s − t|^λ < ∞.

• If X = (X_t)_{t∈[0,1]} is a process such that X_t − X_s ∼ N(0, t−s) for all t, s ∈ [0,1] with t > s, then E[|X_t − X_s|^4] = 3|t − s|² (using E[Z^4] = 3σ^4 for Z ∼ N(0, σ²)). Hence (X_t) has a Hölder-continuous version, with α = 4, β = 1 and C = 3 in the previous theorem.

Definition 3.11 (Brownian motion). An R^d-valued stochastic process (B_t)_{t∈R_+} defined on some probability space (Ω, F, P) is called

(1) [d = 1] a (1-dimensional) Brownian motion if

(a) B_0 = 0,
(b) ∀n ∈ N, 0 ≤ t_0 ≤ t_1 ≤ ⋯ ≤ t_n, the increments ∆B_{t_k} := B_{t_k} − B_{t_{k−1}}, k = 1, ..., n, are N(0, t_k − t_{k−1})-distributed and independent (increments over disjoint intervals are independent),
(c) P-a.a. paths are continuous;

(2) [d ≥ 1] a (d-dimensional) Brownian motion if all coordinate processes B^j, j = 1, ..., d, of B = (B^j)_{j=1,...,d} are independent 1-dimensional Brownian motions;

(3) a Brownian motion with respect to a given filtration (F_t)_{t∈R_+} on F if it is a d-dimensional Brownian motion with increments B_t − B_s independent of F_s for all t ≥ s ≥ 0.

Theorem 3.12 (Existence and Hölder continuity of Brownian motion). Let X = (X_t)_{t∈R_+} be a centered real-valued Gaussian process with covariance function Cov(X_s, X_t) = s ∧ t (which exists by previous results). Then there exists a version B of X such that B has continuous paths which are Hölder-continuous of every order λ < 1/2 on any compact interval [0, n] ⊆ R_+. In particular, (d-dimensional) Brownian motion exists.

Remark 3.5. Let (X_t)_{t∈R_+} and (Y_t)_{t∈R_+} be right-continuous processes. If X is a version of Y, then X and Y are indistinguishable. The reason is that a right-continuous function R_+ → R^d is determined by its values on a countable dense subset of R_+.

Definition 3.13. Let X = (X_t)_{t∈R_+} be a stochastic process on some probability space (Ω, F, P).


(a) The natural filtration (F_t^0)_{t≥0} generated by X is the filtration F_t^0 = σ(X_s : s ≤ t), i.e. the smallest filtration with respect to which X is adapted.

(b) • A filtration (F_t)_{t≥0} is complete (w.r.t. the measure P) if each F_t is complete w.r.t. P (i.e. contains all subsets of P-null sets).

• A filtration (F_t)_{t≥0} is called right-continuous if

F_t = F_{t+} := ∩_{ε>0} F_{t+ε}   ∀t ∈ R_+.

• A filtration (F_t)_{t≥0} satisfies the usual conditions if it is complete (w.r.t. P) and right-continuous.

(c) The usual filtration (F_t^X) generated by a process X is

F_t^X := ∩_{ε>0} (F_{t+ε}^0)^P   ∀t ∈ R_+,

where (·)^P denotes completion with respect to P.

Lemma 3.14. If a filtration (F_t) is right-continuous, then its completion (F_t^P) is also right-continuous, hence it satisfies the usual conditions.

Theorem 3.15. Let (F_t)_{t≥0} denote the completion of the natural filtration of the Brownian motion (B_t)_{t≥0}. Then (F_t)_{t≥0} is right-continuous, i.e. it satisfies the usual conditions.

Remark 3.6. A Brownian motion is an (F_t)-Brownian motion with respect to its usual filtration =: (F_t).

Corollary 3.16. For (F_t)_{t≥0} from Theorem 3.15, F_0 = (F_0^0)^P is trivial, i.e. for every A ∈ F_0 we have P[A] ∈ {0, 1}.

Example 3.4. If B is a Brownian motion, then

A = {∃ε > 0 : B_t = 0 on (0, ε]} ∈ F_{0+}^B

and hence P[A] ∈ {0, 1}. However, P[A] ≠ 1 (e.g. P[B_{ε/3} > 0, B_{2ε/3} < 0] > 0), so P[A] = 0.

Similarly, P[C] ≠ 0, and hence P[C] = 1, for C := {∀ε > 0 ∃t_+, t_− ∈ (0, ε) : B_{t_+} > 0 and B_{t_−} < 0}.


Chapter 4

Weak convergence and tightness of probability measures

4.1 Weak convergence

Setting: (S, d) is a metric space with S = B(S) (the Borel σ-field); µ and µ_n, n ∈ N, are probability measures on (S, S).

Remark 4.1. We know from the course Stochastics I that convergence notions like "µ_n(A) → µ(A) ∀A ∈ S" or "‖µ_n − µ‖ = sup_{A∈S} |µ_n(A) − µ(A)| → 0" would be too restrictive, e.g. for the central limit theorem.

Definition 4.1. (a) (µ_n)_{n∈N} converges weakly to µ, written µ_n ⇒ µ, if

∫_S h dµ_n → ∫_S h dµ   ∀h ∈ C_b(S),

where C_b(S) denotes the space of all bounded continuous functions on S.

(b) A sequence of (S, S)-valued random variables X_n, defined on some probability spaces (Ω_n, F_n, P_n), converges in distribution to an (S, S)-valued random variable X defined on (Ω, F, P), written X_n →^D X, if µ_n := P_n ∘ X_n^{-1} ⇒ µ := P ∘ X^{-1}.
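A numerical illustration of Definition 4.1 (our sketch; the uniform summands and the test function h = cos are arbitrary choices): by the central limit theorem, the normalized sums X_n converge in distribution to N(0,1), so ∫ h dµ_n approaches ∫ h dN(0,1) = e^{-1/2} for this bounded continuous h.

    import numpy as np

    rng = np.random.default_rng(7)

    def integral_h_mu_n(n, n_samples=200_000):
        """Monte Carlo estimate of int h d(mu_n) for X_n = (U_1 + ... + U_n)/sqrt(n/3),
        U_i i.i.d. Uniform[-1,1] (variance 1/3), and the test function h = cos in C_b(R)."""
        U = rng.uniform(-1.0, 1.0, (n_samples, n))
        Xn = U.sum(axis=1) / np.sqrt(n / 3)
        return np.cos(Xn).mean()

    for n in (1, 2, 10, 50):
        print(n, integral_h_mu_n(n))
    # values approach int cos dN(0,1) = exp(-1/2) ~ 0.6065, i.e. mu_n => N(0,1) (CLT)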

Remark 4.2. • It is essential in (b) that X_n, X are (S, S)-valued (the same space).

• Notations in the literature for this type of convergence: X_n →^D X, X_n →^L X (convergence in law), L(X_n | P_n) → L(X | P), µ_n →^w µ.

Theorem 4.2 (Portemanteau theorem). The following are equivalent:

(1) µ_n ⇒ µ as n → ∞.

(2) ∫ h dµ_n → ∫ h dµ for every h ∈ C_b(S) that is uniformly continuous.

(3) lim sup_n µ_n(F) ≤ µ(F) for every closed set F ⊆ S.

(4) lim inf_n µ_n(G) ≥ µ(G) for every open set G ⊆ S.


(5) lim_{n→∞} µ_n(A) = µ(A) for every µ-boundaryless A ∈ S, i.e. every A ∈ S with µ(Ā ∖ A°) = 0.

Sometimes it is very useful that it suffices to have µ_n(A) → µ(A) only for a subfamily of sets A.

Corollary 4.3. Let E ⊆ S be ∩-stable such that

(1) any open G ⊆ S can be represented as a countable union of sets from E,

(2) µ_n(A) → µ(A) as n → ∞ for every A ∈ E.

Then µ_n ⇒ µ.

Remark 4.3. For separable (S, d) one can construct a countable family E as in Corollary 4.3: take the balls of rational radii centered at a countable dense subset of S, and consider all finite intersections of such balls.

4.2 Tightness and Prohorov's theorem

Let (S, d) be a metric space and S = B(S) the Borel σ-field. The set of all probability measures on (S, S) is denoted by M_1(S). By "µ_n ⇒ µ" we denote weak convergence.

Remark 4.4. • The topology of weak convergence is described by the neighborhood basis (which is also a basis of the topology)

U_{ε,h_1,...,h_n}(µ) := {ν ∈ M_1(S) | |∫ h_i dν − ∫ h_i dµ| < ε, i = 1, ..., n},   n ∈ N, ε > 0, h_i ∈ C_b(S), µ ∈ M_1(S).

Open sets of M_1(S) are the sets which contain, for each of their elements, a neighborhood of this form. Note that the basis sets above are themselves open, and any open set is an (arbitrary) union of such basis sets.

• Recall (cf. [Alt02, Chapter 2.5]): For a set M in a topological space:

– M is compact :⟺ any open cover has a finite subcover.

– M is sequentially compact (respectively, relatively sequentially compact) :⟺ any sequence in M has a subsequence converging in M (respectively, in the closure M̄). Note that compactness is not in general equivalent to sequential compactness.

– M is precompact (totally bounded) in a metric space :⟺ for any ε > 0 it admits a finite covering by ε-balls.

– For a set M in a metric space: compact ⟺ sequentially compact ⟺ precompact and complete ⟹ M is separable.

– A subset M of a complete metric space is precompact iff it is relatively compact.


• A natural question: is the topology of weak convergence on M_1(S) metrizable? Answer: yes, if S is separable ("Prohorov's metric").

Example 4.1. Consider S = R with the usual metric, (x_n) a sequence in R, and µ_n := δ_{x_n}. If lim_{n→∞} |x_n| = ∞, there is in general no convergent subsequence. Consider for example x_n = n. Then ∫ h dµ_n = h(n), and this does not necessarily converge, i.e. µ_n does not converge weakly to any measure. However, if (x_n) is bounded, then there exists a subsequence (x_{n_k}) converging to some x ∈ R. In this case it is easy to see that µ_{n_k} ⇒ δ_x.

Definition 4.4. A set M ⊆ M_1(S) is called tight if for every ε > 0 there exists a compact set K ⊆ S such that µ(K) ≥ 1 − ε for every µ ∈ M.

Example 4.2. The following is a sufficient condition for tightness: there exists a function h : S → R such that {|h| ≤ c} is compact for every c ≥ 0 and

sup_{µ∈M} ∫ h dµ < ∞.

Indeed, for h ≥ 0 (the typical case) Markov's inequality gives µ({h > c}) ≤ c^{-1} ∫ h dµ, which is uniformly small over M for large c. For instance, S = R and h(x) = |x|^p for some p > 0: the measures with uniformly bounded p-th moments form a tight set.

Theorem 4.5 (Prohorov). Let M ⊆ M_1(S).

(1) If M is tight, then M is relatively sequentially compact.

(2) Let S be complete and separable. If M is relatively sequentially compact, then M is tight.

Example 4.3. Let (µ_n) be a tight sequence of probability measures. Assume that any weakly convergent subsequence has the weak limit µ. Then the whole sequence converges: µ_n ⇒ µ.

Proposition 4.6. If S is compact, then M_1(S) is compact and sequentially compact.

For probability measures µ and µ_n, n ∈ N, on (R^d, B^d), let φ_{µ_n}, φ_µ denote the characteristic functions of µ_n, µ, and let X_n, X denote random variables with distributions µ_n = P_n ∘ X_n^{-1}, µ = P ∘ X^{-1}, respectively. We recall Lévy's continuity theorem for characteristic functions; it states that the following are equivalent:

(1) There exists a probability measure µ on B^d such that µ_n ⇒ µ.

(2) There exists a function φ : R^d → C such that φ_{µ_n}(u) → φ(u) for all u ∈ R^d and φ is (partially) continuous at 0 (in which case φ = φ_µ).

The proof for d = 1 is usually given in the lecture Stochastics I. The proofs of Lévy's continuity theorem for d ≥ 1, and likewise of the Cramér-Wold theorem (cf. below), make use of Prohorov's Theorem 4.5 (cf. [Klenke06, Thms 15.23 and 15.55]; elaborate the details), part (b) following by means of part (a) and the already known result in dimension d = 1.

Proposition 4.7. With the previous notations, we have:


(a) (Cramér-Wold theorem)

µ_n ⇒ µ ⟺ ⟨y, X_n⟩ →^D ⟨y, X⟩ ∀y ∈ R^d.

(b) Lévy's continuity theorem (cf. above) for characteristic functions holds in multiple dimensions, on R^d for every d ∈ N.

Remark 4.5 (On M_1(S) being metrizable and the Prohorov metric). (1)

ρ(µ, ν) := inf{ε > 0 | µ(A) ≤ ε + ν(U_ε(A)) and ν(A) ≤ ε + µ(U_ε(A)) ∀A ∈ S}

defines a metric on M_1(S), called the Prohorov metric (U_ε(A) being the open ε-neighborhood of A).

(2) ρ(µ_n, µ) → 0 implies µ_n ⇒ µ.

(3) If S is separable, the converse is also true: µ_n ⇒ µ implies ρ(µ_n, µ) → 0. Hence M_1(S) with the topology of weak convergence is metrizable by the Prohorov metric. For reference, cf. [Klenke06].

4.3 Weak convergence on S = C[0,1]

We consider the space S = C[0,1] of continuous real-valued functions on [0,1] with the sup-norm ‖x‖ = sup_{t∈[0,1]} |x(t)| (inducing the metric d(x,y) = ‖x − y‖). With this topology, S is a separable Banach space.

Lemma 4.8. (a) The Borel σ-field S = B(S) is generated by the cylinder sets of the form

Z = {x ∈ S | x(t_i) ∈ A_i, i = 1, ..., n}

with 0 ≤ t_1 ≤ ⋯ ≤ t_n ≤ 1, A_i ∈ B(R) for i = 1, ..., n, n ∈ N; i.e. σ({Z sets}) = B(S) = S.

(b) B(S) = σ(π_t | t ∈ [0,1]) with π_t : x ↦ x(t) the coordinate projections.

(c) The set Z of all cylinder sets is ∩-stable.

Any measure µ on C[0,1] is determined by its finite-dimensional marginal distributions µ_J = µ ∘ π_J^{-1} with J ⊆ [0,1] finite, where π_J = π_{[0,1]J} : x ↦ (x(t_j))_{t_j∈J} are the canonical projections, since the cylinder sets form an ∩-stable generator of B(C[0,1]) by Lemma 4.8.

Lemma 4.9. (a) If µ_n ⇒ µ for some µ, µ_n ∈ M_1(C