
Lecture 4: Introduction to stochastic processes and stochastic calculus

Cédric Archambeau

Centre for Computational Statistics and Machine Learning
Department of Computer Science

University College London

[email protected]

Advanced Topics in Machine Learning (MSc in Intelligent Systems), January 2008


Discrete-time vs continuous-time?

Real systems are continuous. Can we gain something?

Physical model. Can we exploit this information?



Outline

Some definitions

Stochastic processes

Lévy processes

Markov processes

Diffusion processes

Itô's formula

Variational inference for diffusion processes


Elements of probability theory

A collection A of subsets of the sample space Ω is a σ-algebra if

A contains Ω: Ω ∈ A.

A is closed under the operation of complementation:

Ω\A ∈ A if A ∈ A.

A is closed under the operation of countable unions:

⋃_n An ∈ A if A1, A2, . . . , An, . . . ∈ A.

This implies that A is closed under countable intersections.

We say that (Ω,A) is a measurable space if Ω is a non-empty set and A is a σ-algebra of Ω.


Elements of probability theory (continued)

A measure H(·) on (Ω,A) is a nonnegative valued set function on A satisfying

H(∅) = 0,

H(⋃_n An) = ∑_n H(An) if Ai ∩ Aj = ∅ for i ≠ j,

for any sequence A1, A2, . . . , An, . . . ∈ A.

If A ⊆ B, it follows that H(A) ≤ H(B).

If H(Ω) is finite, i.e. 0 ≤ H(Ω) < ∞, then H(·) can be normalized to obtain a probability measure P(·):

P(A) = H(A)/H(Ω), P(A) ∈ [0, 1],

for all A ∈ A.

We say that (Ω,A,P) is a probability space if P is a probability measure on the measurable space (Ω,A).


Elements of probability theory (continued)

Let (Ω1,A1) and (Ω2,A2) be two measurable spaces. The function f : Ω1 → Ω2 is measurable if the pre-image of any A2 ∈ A2 is in A1:

f⁻¹(A2) = {ω1 ∈ Ω1 : f(ω1) ∈ A2} ∈ A1,

for all A2 ∈ A2.

Let (Ω,A,P) be a probability space. We call the measurable function X : Ω → R^D a continuous random variable.


Stochastic process

Let T be the time index set and (Ω,A,P) the underlying probability space. The function X : T × Ω → R^D is a stochastic process, such that

Xt = X(t, ·) : Ω → R^D is a random variable for each t ∈ T,

Xω = X(·, ω) : T → R^D is a realization or sample path for each ω ∈ Ω.

When considering continuous-time systems, T will often be equal to R+.

In practice, we call a stochastic process a collection of random variables X = {Xt, t ≥ 0}, which are defined on a common probability space.

We can think of Xt as the position of a particle at time t, changing as t varies. The particle moves continuously or has jumps for some t ≥ 0:

ΔXt = Xt+ − Xt− = lim_{ε↓0} Xt+ε − lim_{ε↓0} Xt−ε.

In general, we will assume that the process is right-continuous, i.e. Xt+ = Xt .


Independence

Let Y1, . . . , Yn be a collection of random variables, with Yi ∈ R^Di. They are independent if

P(Y1 ∈ A1, . . . , Yn ∈ An) = ∏_{i=1}^{n} P(Yi ∈ Ai),

for all Ai ⊂ R^Di.

An infinite collection is said to be independent if every finite subcollection is independent.

A stochastic process X = {Xt, t ≥ 0} has independent increments if the random variables

Xt0, Xt1 − Xt0, . . . , Xtn − Xtn−1

are independent for all n ≥ 1 and t0 < t1 < . . . < tn.


Stationarity

A stochastic process is (strictly) stationary if all the joint marginals are invariant under time displacement h > 0, that is

p(Xt1+h, Xt2+h, . . . , Xtn+h) = p(Xt1, Xt2, . . . , Xtn)

for all t1, . . . , tn.

The stochastic process X = {Xt, t ≥ 0} is wide-sense stationary if there exist a constant m ∈ R^D and a function C : R+ → R^{D×D}, such that

µt ≡ 〈Xt〉 = m,
Σt ≡ 〈(Xt − µt)(Xt − µt)ᵀ〉 = C(0),
Vs,t ≡ 〈(Xt − µt)(Xs − µs)ᵀ〉 = C(t − s),

for all s, t ∈ R+.

We call Vs,t the two-time covariance.

The stochastic process X = {Xt, t ≥ 0} has stationary increments if Xt+s − Xt has the same distribution as Xs for all s, t ≥ 0.


Example: Poisson process

The Poisson process with intensity parameter λ > 0 is a continuous-time stochastic process X = {Xt, t ∈ R+} with independent, stationary increments:

Xt − Xs ∼ P(λ(t − s)),
X0 = 0,

for all 0 ≤ s ≤ t.

The Poisson process is not wide-sense stationary:

µt = λt,
σ²t = λt,
vs,t = λ min{s, t}.

(Figure: sample path of a Poisson process, Xt against t.)

The Poisson process is right-continuous and, in fact, it is a Lévy process (see later) consisting only of jumps.
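As a quick illustration, a Poisson sample path can be simulated from its exponential inter-arrival times. This is a standard construction rather than code from the lecture; a minimal sketch assuming NumPy, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T = 0.2, 50.0  # hypothetical intensity and time horizon

# Inter-arrival times of a Poisson(lam) process are iid Exponential(lam).
gaps = rng.exponential(scale=1.0 / lam, size=100)
jump_times = np.cumsum(gaps)
jump_times = jump_times[jump_times <= T]

def X(t):
    """Right-continuous sample path: number of jumps up to and including t."""
    return int(np.searchsorted(jump_times, t, side="right"))

# Independent, stationary increments: both counts are ~ P(lam * 10).
print(X(10.0) - X(0.0), X(30.0) - X(20.0))
```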


The Poisson distribution (or law of rare events) is defined as

n ∼ P(λ) = (λ^n / n!) e^{−λ}, n ∈ N,

where λ > 0. The mean and the variance are given by

〈n〉 = λ, 〈(n − 〈n〉)²〉 = λ.


Lévy process

A stochastic process X = {Xt, t ≥ 0} is a Lévy process if

The increments on disjoint time intervals are independent.

The increments are stationary: increments over equally long time intervals are identically distributed.

The sample paths are right-continuous with left limits, i.e.

lim_{ε↓0} Xt+ε = Xt, lim_{ε↓0} Xt−ε = Xt−.

Lévy processes are usually described in terms of the Lévy–Khintchine representation.

A Lévy process can have three types of components: a deterministic drift, a random diffusion component and a random jump component.

It is implicitly assumed that a Lévy process starts at X0 = 0 with probability 1.

Applications:

Financial stock prices: Black-Scholes

Population models: birth-and-death processes

...


Interpretation of Lévy processes

Lévy processes are the continuous-time equivalent of random walks.

A random walk over n time units is a sum of n independent and identically distributed random variables:

Sn = ∑_{j=1}^{n} Δxj,

where the Δxj are iid random variables.

Random walks have independent and stationary increments.

Figure: Example of a Gaussian random walk with S0 = 1 (state against time).
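A minimal sketch of how such a Gaussian random walk is generated (NumPy assumed; the initial state S0 = 1 follows the figure, the step variance is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt = 1000, 1.0 / 1000  # n steps on [0, 1]

# iid Gaussian steps give independent, stationary increments.
steps = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n)
S = 1.0 + np.cumsum(steps)  # S0 = 1, as in the figure
```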


Interpretation of Lévy processes (continued)

A random variable X has an infinitely divisible distribution if for every m ≥ 1 we can write

X ∼ ∑_{j=1}^{m} Xj(m),

where the Xj(m) are iid.

For example, the normal, Poisson and gamma distributions are infinitely divisible. The Bernoulli distribution is not infinitely divisible.

Lévy processes are infinitely divisible since the increments for non-overlapping time intervals are independent and stationary:

Xs = ∑_{j=1}^{m} (Xjs/m − X(j−1)s/m),

for all m ≥ 1.

In fact, it can be shown that there is a Lévy process for each infinitely divisible probability distribution.


Markov process

The stochastic process X = {Xt, t ≥ 0} is a (continuous-time, continuous-state) Markov process if

p(Xt |Xs) = p(Xt |Xr1, . . . , Xrn, Xs),

for all 0 ≤ r1 ≤ . . . ≤ rn ≤ s ≤ t.

We call p(Xt |Xs) the transition density. It can be time dependent.

The Chapman-Kolmogorov equation follows from the Markov property:

p(Xt |Xs) = ∫ p(Xt |Xτ) p(Xτ |Xs) dXτ,

for all s ≤ τ ≤ t.

The Chapman-Kolmogorov equation already played an important role in (discrete-time) dynamical systems.

Lévy processes satisfy the Markov property.
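A numerical sanity check of the Chapman-Kolmogorov equation, using the Gaussian transition density of the Wiener process (introduced later) as the example; a sketch assuming NumPy, with illustrative states, times and integration grid:

```python
import numpy as np

def p(y, x, dt):
    """Wiener transition density p(X_t = y | X_s = x) with dt = t - s."""
    return np.exp(-(y - x) ** 2 / (2 * dt)) / np.sqrt(2 * np.pi * dt)

x_s, y_t = 0.0, 1.3               # start and end states (hypothetical)
dt1, dt2 = 0.4, 0.6               # tau - s and t - tau, with s < tau < t
grid = np.linspace(-10, 10, 2001)  # integration grid over X_tau
dx = grid[1] - grid[0]

lhs = p(y_t, x_s, dt1 + dt2)
rhs = np.sum(p(y_t, grid, dt2) * p(grid, x_s, dt1)) * dx  # marginalize X_tau
print(lhs, rhs)  # equal up to quadrature error
```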


Markov process (continued)

A Markov process is homogeneous if its transition density depends only on the time difference:

p(Xt+h|Xt) = p(Xh|X0),

for all h ≥ 0.

The Poisson process is a homogeneous discrete-state Markov process:

P(nt+h|nt) = P(λ(t + h − t)) = P(nh|n0).

Let f(·) be a bounded function. A Markov process is ergodic if the time-average limit coincides with the spatial average, i.e.

lim_{T→∞} (1/T) ∫_0^T f(Xt) dt = 〈f〉,

where the expectation is taken with respect to the stationary probability density.


Martingale (fair game)

A martingale is a stochastic process such that the expectation of some future event given the past and the present is the same as if given only the present:

〈Xt |Xτ, 0 ≤ τ ≤ s〉 = Xs

for all t ≥ s.

More formally, let (Ω,A,P) be a probability space and {At, t ≥ 0} a filtration¹ of A. The stochastic process X = {Xt, t ≥ 0} is a martingale if

〈Xt |As〉 = Xs,

with probability 1, for all 0 ≤ s < t.

When the process Xt satisfies the Markov property, we have 〈Xt |As〉 = 〈Xt |Xs〉.

¹A filtration {At, t ≥ 0} of A is an increasing family of σ-algebras on the measurable space (Ω,A), that is As ⊆ At ⊆ A for any 0 ≤ s ≤ t. This means that more information becomes available with increasing time.


Diffusion process

A Markov process X = {Xt, t ≥ 0} is a diffusion process if the following limits exist for all ε > 0:

lim_{t↓s} 1/(t − s) ∫_{|Xt−Xs|>ε} p(Xt |Xs) dXt = 0,

lim_{t↓s} 1/(t − s) ∫_{|Xt−Xs|<ε} (Xt − Xs) p(Xt |Xs) dXt = α(s, Xs),

lim_{t↓s} 1/(t − s) ∫_{|Xt−Xs|<ε} (Xt − Xs)(Xt − Xs)ᵀ p(Xt |Xs) dXt = β(s, Xs)βᵀ(s, Xs),

where 0 ≤ s ≤ t.

We call the vector α(s, x) the drift and the matrix β(s, x) the diffusion coefficient at time s and state Xs = x.

The first condition prevents the diffusion process from having instantaneous jumps.

The drift α is the instantaneous rate of change of the mean, given that Xs = x at time s.

The diffusion matrix D = ββᵀ is the instantaneous rate of change of the squared fluctuations of the process, given that Xs = x at time s.


Diffusion process (continued)

Diffusion processes are almost surely continuous functions of time, but they need not be differentiable.

Diffusion processes are Lévy processes (without the jump component).

The time evolution of the transition density p(y, t|x, s), with s ≤ t, given some initial condition or target constraint was described by Kolmogorov:

The forward evolution of the transition density is given by the Kolmogorov forward equation (also known as the Fokker-Planck equation):

∂p/∂t = −∑_i ∂/∂yi {αi(t, y) p} + (1/2) ∑_{i,j} ∂²/∂yi∂yj {Dij(t, y) p},

for a fixed initial state (s, x).

The backward evolution of the transition density is given by the Kolmogorov backward equation (or adjoint equation):

−∂p/∂s = ∑_i αi(s, x) ∂p/∂xi + (1/2) ∑_{i,j} Dij(s, x) ∂²p/∂xi∂xj,

for a fixed final state (t, y).


Wiener process

The Wiener process was proposed by Wiener as a mathematical description of Brownian motion.

It characterizes the erratic motion (i.e. diffusion) of a pollen grain on a water surface due to its continual bombardment by water molecules.

It can be viewed as a scaling limit of a random walk on any finite time interval (Donsker's Theorem).

It is also commonly used to model stock market fluctuations.

(Figure: sample path of a standard Wiener process, Wt against t.)
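A minimal path simulation for plots like the one above (the standard construction from independent Gaussian increments; NumPy assumed, resolution illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 1000
dt = T / n

# W_0 = 0 and independent increments W_{t+dt} - W_t ~ N(0, dt).
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))
t = np.linspace(0.0, T, n + 1)
```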


Wiener process (continued)

A standard Wiener process is a continuous-time Gaussian Markov process W = {Wt, t ≥ 0} with (non-overlapping) independent increments for which

W0 = 0,
the sample path Wω is almost surely continuous for all ω ∈ Ω,
Wt − Ws ∼ N(0, t − s),

for all 0 ≤ s ≤ t.

The sample paths Wω are almost surely nowhere differentiable.

The expectation 〈Wt〉 is equal to 0 for all t.

W is not wide-sense stationary as vs,t = min{s, t}, but it has stationary increments.

W is homogeneous since p(Wt+h|Wt) = p(Wh|W0).

W is a diffusion process with drift α = 0 and diffusion coefficient β = 1, such that Kolmogorov's forward and backward equations are given by

∂p/∂t − (1/2) ∂²p/∂y² = 0,

∂p/∂s + (1/2) ∂²p/∂x² = 0.
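In this case the transition density is the Gaussian p(y, t|x, s) = N(y; x, t − s) and the forward equation is the heat equation. A finite-difference sanity check (a sketch assuming NumPy; evaluation point and step size are illustrative):

```python
import numpy as np

def p(y, t):
    """Transition density N(y; 0, t) of W started at x = 0, s = 0."""
    return np.exp(-y ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

y, t, h = 0.3, 1.0, 1e-3
dp_dt = (p(y, t + h) - p(y, t - h)) / (2 * h)                 # central difference
d2p_dy2 = (p(y + h, t) - 2 * p(y, t) + p(y - h, t)) / h ** 2  # second derivative
print(dp_dt - 0.5 * d2p_dy2)  # ~ 0, i.e. the forward equation holds
```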


Informal proof that a Wiener process is not differentiable:

Consider the partition of a bounded time interval [s, t] into subintervals [τ(n)k, τ(n)k+1] of equal length, such that

τ(n)k = s + k (t − s)/2^n, k = 0, 1, . . . , 2^n − 1.

Consider a sample path Wω(τ) of the standard Wiener process W = {Wτ, τ ∈ [s, t]}. It can be shown (Kloeden and Platen, p. 72) that

lim_{n→∞} ∑_{k=0}^{2^n−1} (W(τ(n)k+1, ω) − W(τ(n)k, ω))² = t − s.

Hence, taking the limit superior, i.e. the supremum² of all the limit points, we get

t − s ≤ lim sup_{n→∞} max_k |W(τ(n)k+1, ω) − W(τ(n)k, ω)| ∑_{k=0}^{2^n−1} |W(τ(n)k+1, ω) − W(τ(n)k, ω)|.

From the sample path continuity, we have max_k |W(τ(n)k+1, ω) − W(τ(n)k, ω)| → 0 with probability 1 when n → ∞ and therefore

∑_{k=0}^{2^n−1} |W(τ(n)k+1, ω) − W(τ(n)k, ω)| → ∞.

As a consequence, the sample paths almost surely do not have bounded variation on [s, t] and cannot be differentiated.

²For S ⊆ T, the supremum of S is the least element of T which is greater than or equal to all elements of S.
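The quadratic-variation identity above can be illustrated numerically by refining dyadic partitions of one finely simulated path (a sketch assuming NumPy; this only approximates the mean-square limit):

```python
import numpy as np

rng = np.random.default_rng(3)
s, t, n = 0.0, 1.0, 2 ** 20

# One finely sampled Wiener path on [s, t].
dW = rng.normal(0.0, np.sqrt((t - s) / n), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

for level in (2 ** 4, 2 ** 8, 2 ** 16):
    coarse = W[:: n // level]          # dyadic partition with `level` steps
    qv = np.sum(np.diff(coarse) ** 2)  # sum of squared increments
    print(level, qv)                   # approaches t - s = 1
```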


Let s ≤ t. The two-time covariance is then given by

vs,t = 〈WtWs〉
= 〈(Wt − Ws + Ws)Ws〉
= 〈Wt − Ws〉〈Ws〉 + 〈Ws²〉
= 0 · 0 + s.

The transition density of W is given by p(Wt |Ws) = N(Ws, t − s). Hence, the drift and the diffusion coefficient for a standard Wiener process are

α(s, Ws) = lim_{t↓s} (〈Wt〉 − Ws)/(t − s) = 0,

β²(s, Ws) = lim_{t↓s} (〈Wt²〉 − 2〈Wt〉Ws + Ws²)/(t − s) = lim_{t↓s} (〈Wt²〉 − Ws²)/(t − s) = lim_{t↓s} (t − s)/(t − s) = 1.

The same results are found by directly differentiating the transition density as required in Kolmogorov's equations.


Brownian bridge

A Brownian bridge is a Wiener process pinned at both ends, i.e. the sample paths all go through an initial state at time t = 0 and a given state at a later time t = T.

Let W = {Wt, t ≥ 0} be a standard Wiener process. The Brownian bridge B(x0, yT) = {Bt(x0, yT), 0 ≤ t ≤ T} is a stochastic process, such that

Bt(x0, yT) = x0 + Wt − (t/T)(x0 + WT − yT).

A Brownian bridge Bt(x0, yT) is a Gaussian process with mean function and two-time covariance given by

〈Bt〉 = x0 − (t/T)(x0 − yT),
vs,t = min{s, t} − st/T,

for 0 ≤ s, t ≤ T.
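A sketch constructing bridge sample paths directly from the definition above (NumPy assumed; the pinned values x0, yT and horizon T are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 5.0, 1000
x0, yT = 1.0, -2.0  # hypothetical pinned end points

t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

# B_t = x0 + W_t - (t / T) (x0 + W_T - yT), pinned at both ends.
B = x0 + W - (t / T) * (x0 + W[-1] - yT)
print(B[0], B[-1])  # exactly x0 and yT
```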


Brownian bridge (continued)

Figure: Sample path examples of a Brownian bridge for different initial and final states (Bt against t).


Diffusion processes revisited

Let W = {Wt, t ≥ 0} be a standard Wiener process. The time evolution of a diffusion process can be described by a stochastic differential equation (SDE):

dXt = α(t, Xt) dt + β(t, Xt) dWt, dWt ∼ N(0, dt ID),

where X = {Xt, t ≥ 0} is a stochastic process with drift α ∈ R^D and diffusion coefficient β ∈ R^{D×D}.

This representation corresponds to the state-space representation of discrete-time dynamical systems.

An SDE is interpreted as a (stochastic) integral equation along a sample path ω, that is

X(t, ω) − X(s, ω) = ∫_s^t α(τ, X(τ, ω)) dτ + ∫_s^t β(τ, X(τ, ω)) dW(τ, ω).

This representation is symbolic as a Wiener process is almost surely not differentiable, but the limiting process corresponds to Gaussian white noise:

(W(τ + h, ω) − W(τ, ω))/h ∼ N(0, 1/h),

whose variance diverges as h → 0. This means that Gaussian white noise cannot be realized physically!
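In practice the SDE is simulated by discretizing this integral equation. A generic Euler-Maruyama sketch (a standard scheme, not from the lecture; NumPy assumed, the example coefficients are hypothetical):

```python
import numpy as np

def euler_maruyama(alpha, beta, x0, T, n, rng):
    """Simulate dX_t = alpha(t, X_t) dt + beta(t, X_t) dW_t on [0, T]."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))  # dW_t ~ N(0, dt)
        x[k + 1] = x[k] + alpha(k * dt, x[k]) * dt + beta(k * dt, x[k]) * dW
    return x

# Example with hypothetical scalar drift and diffusion coefficients.
rng = np.random.default_rng(5)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.5, 1.0, 5.0, 1000, rng)
```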


Construction of Itô's stochastic integral

The central question is how to compute a stochastic integral of the form

∫_s^t β(τ, X(τ, ω)) dW(τ, ω) = ?

K. Itô's starting point is the following:

Consider the standard Wiener process W = {Wt, t ≥ 0} and a (scalar) constant diffusion coefficient β(t, Xt) = β for all t.

The integral along the sample path ω is equal to

∫_s^t β dW(τ, ω) = β (W(t, ω) − W(s, ω))

with probability 1.

The expected integral and the expected squared integral are thus given by

〈∫_s^t β dW(τ, ω)〉 = 0,

〈(∫_s^t β dW(τ, ω))²〉 = β²(t − s).


Construction of Itô's stochastic integral (continued)

Consider the integral of the random function f : T × Ω → R:

I[f](ω) = ∫_s^t f(τ, ω) dW(τ, ω).

It is assumed that f is mean-square integrable.

1. If f is a random step function, that is f(t, ω) = fj(ω) on [tj, tj+1[, then

I[f](ω) = ∑_{j=1}^{n−1} fj(ω) (W(tj+1, ω) − W(tj, ω)),

with probability 1 for all ω. Since fj(ω) is constant on [tj, tj+1[, we get

〈I[f]〉 = 0, 〈I²[f]〉 = ∑_j 〈fj²〉 (tj+1 − tj).

2. If f(n) is a sequence of random n-step functions converging to the general random function f, such that f(n)(t, ω) = f(t(n)j, ω) on [t(n)j, t(n)j+1[, then

I[f(n)](ω) = ∑_{j=1}^{n−1} f(t(n)j, ω) (W(t(n)j+1, ω) − W(t(n)j, ω)),

with probability 1 for all ω. The same results follow.


The Itô stochastic integral

Theorem: The Itô stochastic integral I[f] of a random function f : T × Ω → R is the (unique) mean-square limit of sequences of stochastic integrals I[f(n)], for any sequence of random n-step functions f(n) converging to f:

I[f](ω) = m.s. lim_{n→∞} ∑_{j=1}^{n−1} f(t(n)j, ω) (W(t(n)j+1, ω) − W(t(n)j, ω)),

with probability 1 and s = t(n)1 < . . . < t(n)n−1 < t.

The Itô integral of f with respect to W is a zero-mean random variable.

Since the Itô integral is constructed from the sequence f(n) evaluated at the tj's, it defines a stochastic process which is a martingale.

The chain rule from classical calculus does not apply (see later)!

The Stratonovich construction preserves the classical chain rule, but not the martingale property.

We call 〈I²[f]〉 = ∫_s^t 〈f²〉 dt the Itô isometry.
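A numerical sketch of this construction for f = W, summing the integrand at the left endpoint of each subinterval (NumPy assumed). The limit is (Wt² − t)/2 rather than the classical Wt²/2, previewing the failure of the classical chain rule:

```python
import numpy as np

rng = np.random.default_rng(6)
t, n = 1.0, 2 ** 16
dt = t / n

dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

# Ito sum: integrand evaluated at the LEFT endpoint of each subinterval.
ito_sum = np.sum(W[:-1] * np.diff(W))
print(ito_sum, (W[-1] ** 2 - t) / 2)  # the two agree as n grows
```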


Itô's stochastic integral follows from the fact that

〈I²[f(n)]〉 = ∑_{j=1}^{n−1} 〈f²(t(n)j, ω)〉 (t(n)j+1 − t(n)j)

converges to a proper Riemann integral as t(n)j+1 − t(n)j → 0.


Itô formula

Let Yt = U(t, Xt) and consider the process X = {Xt, t ≥ 0} described by the following SDE:

dXt = α(t, Xt) dt + β(t, Xt) dWt.

The stochastic chain rule is given by

dYt = {∂U/∂t + α ∂U/∂x + (1/2) β² ∂²U/∂x²} dt + β (∂U/∂x) dWt,

with probability 1.

The additional term comes from the fact that

the symbolic SDE is to be interpreted as an Itô stochastic integral, i.e. with equality in the mean-square sense,

dWt² is of O(dt).


Chain rule for classical calculus:

Consider y = u(t, x). Discarding the second- and higher-order terms in the Taylor expansion of u leads to

dy = u(t + dt, x + dx) − u(t, x)
= (∂u/∂t) dt + (∂u/∂x) dx.

Chain rule for stochastic calculus:

For Yt = U(t, Xt), the Taylor expansion of U leads to

dYt = U(t + dt, Xt + dXt) − U(t, Xt)
= (∂U/∂t) dt + (∂U/∂x) dXt + (1/2){(∂²U/∂t²)(dt)² + 2(∂²U/∂t∂x) dt dXt + (∂²U/∂x²)(dXt)²} + . . . ,

where

(dXt)² = α²(dt)² + 2αβ dt dWt + β²(dWt)².

Hence, we need to keep the additional term of O(dt), such that

dYt = {∂U/∂t + (1/2) β² ∂²U/∂x²} dt + (∂U/∂x) dXt.

Substituting dXt leads to the desired result.


Application: Black-Scholes option-pricing model

Assume the evolution of a stock price Xt is described by a geometric Wiener process:

dXt = ρXt dt + σXt dWt,

where ρ is called the risk-free rate (or drift) and σ the volatility.

Consider the change of variable Yt = log Xt. Applying the stochastic chain rule leads to the Black-Scholes formula:

dYt = (ρ − σ²/2) dt + σ dWt.

This leads to the following solution for the stock price at time t:

Xt = X0 exp{(ρ − σ²/2) t + σWt}.
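A quick check of this closed-form solution against a direct Euler-Maruyama simulation of the SDE, driven by the same Brownian path (a sketch assuming NumPy; ρ, σ and X0 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
rho, sigma, x0 = 0.05, 0.2, 100.0  # hypothetical rate, volatility, price
T, n = 1.0, 10 ** 5
dt = T / n

dW = rng.normal(0.0, np.sqrt(dt), size=n)

# Euler-Maruyama on dX = rho X dt + sigma X dW.
x = x0
for dw in dW:
    x += rho * x * dt + sigma * x * dw

# Closed-form solution from Ito's formula, using the same W_T.
x_exact = x0 * np.exp((rho - sigma ** 2 / 2) * T + sigma * np.sum(dW))
print(x, x_exact)  # close for small dt
```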

Assumptions:

No dividends or charges

European exercise terms

Markets are efficient

Interest rates are known

Returns are log-normal


Ornstein-Uhlenbeck (OU) process

The Ornstein-Uhlenbeck process with drift γ > 0 and mean µ is defined as follows:

dXt = −γ(Xt − µ) dt + σ dWt.

The OU process is known as the mean-reverting process.

It is a Gaussian process with covariance function

vs,t = (σ²/(2γ)) e^{−γ|t−s|}, s ≤ t.

It is wide-sense stationary.

It is a homogeneous Markov process.

It is a diffusion process.

It is the continuous-time equivalent of the discrete-time AR(1) process (see the sketch below).
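The AR(1) link can be made concrete: over a fixed step Δt the OU transition is exactly Gaussian, so paths can be sampled without discretization error (a sketch assuming NumPy; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
gamma, mu, sigma = 2.0, 0.0, 1.0  # hypothetical drift, mean, diffusion
T, n = 5.0, 1000
dt = T / n

# Exact transition: X_{t+dt} | X_t is Gaussian, i.e. an AR(1) update.
a = np.exp(-gamma * dt)
q = sigma ** 2 / (2 * gamma) * (1 - a ** 2)  # conditional variance

x = np.empty(n + 1)
x[0] = -4.0  # start away from mu to show mean reversion
for k in range(n):
    x[k + 1] = mu + a * (x[k] - mu) + np.sqrt(q) * rng.normal()
```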


OU process (continued)

Figure: Sample path examples of an OU process with different drift and diffusion coefficients. The same mean µ and initial condition were used.


References

Crispin W. Gardiner: Handbook of Stochastic Methods. Springer, 2004 (3rd edition).

Peter E. Kloeden and Eckhard Platen: Numerical Solution of Stochastic Differential Equations. Springer, 1999.

Bernt Øksendal: Stochastic Differential Equations: An Introduction with Applications. Springer, 2000 (5th edition).

Christopher K. I. Williams: A Tutorial Introduction to Stochastic Differential Equations: Continuous-time Gaussian Markov Processes. NIPS workshop on Dynamical Systems, Stochastic Processes and Bayesian Inference, 2006.

Matthias Winkel: Lévy Processes and Finance.