
FGV - Fundação Getulio Vargas
Master Program - Mathematical Modeling

Merton Portfolio Optimization Problem

Student: Gustavo Adolfo Martins Jotta Soares

Supervisor: Yuri Fahham Saporito

Rio de Janeiro
2017


Contents

1 Introduction

2 Preliminary results
  2.1 Stochastic Processes
  2.2 Ito's integration
  2.3 Ito's formula
  2.4 Stochastic Differential Equations (SDEs)
  2.5 Feynman-Kac representation

3 Continuous-time Stochastic Control
  3.1 Controlled diffusion processes
  3.2 Admissible controls
  3.3 Stochastic Optimal Control
  3.4 Dynamic Programming Principle (DPP)
    3.4.1 Interpretation of DPP
    3.4.2 Using DPP to compute the value function
  3.5 Hamilton-Jacobi-Bellman Equation (HJB)
    3.5.1 Formal derivation of HJB
  3.6 Verification Theorem

4 Merton Portfolio Optimization Problem
  4.1 The Market Model
  4.2 The Classical Merton Portfolio Optimization Problem
    4.2.1 Terminal Utility of Wealth Maximization
    4.2.2 Utility of Consumption Maximization

5 Conclusions

Bibliographic references


Abstract

Merton's portfolio optimization problem is the choice an investor must make of how much of their wealth to consume and how much to allocate between stocks and a risk-free asset in order to maximize expected utility.

The focus of this work is to solve two cases of the Merton problem. To this end, we study some fundamental topics, namely the Dynamic Programming Principle (DPP) and the Hamilton-Jacobi-Bellman (HJB) equation. In addition, we review some concepts of stochastic processes and some important results of Ito calculus.

Merton's portfolio optimization problem is well known in finance, and the central ideas used to solve it can be adapted to other problems in finance.

Key words: Merton, DPP, HJB


Catalog card prepared by the Mario Henrique Simonsen Library/FGV

Soares, Gustavo Adolfo Martins Jotta
    Merton portfolio optimization problem / Gustavo Adolfo Martins Jotta Soares. - 2017.
    47 f.

    Dissertation (master's degree) - Fundação Getulio Vargas, Escola de Matemática Aplicada.
    Supervisor: Yuri Fahham Saporito.
    Includes bibliography.

    1. Financial mathematics. 2. Investments - Analysis. 3. Merton model. I. Saporito, Yuri Fahham. II. Fundação Getulio Vargas. Escola de Matemática Aplicada. III. Title.

CDD - 513.93


Chapter 1

Introduction

Control theory is an interdisciplinary branch of engineering and mathematics that deals with the behavior of dynamical systems. Stochastic control, or optimal stochastic control, deals with the existence of both uncertainty in observations and noise that drives the evolution of the system.

In this work, we will consider a financial market consisting of one risk-free asset $S^0$ and one risky asset $S^1$. An agent may invest in this market at any time $t$, holding a number of shares $\alpha_t$ of the risky asset at price $S^1_t$. Denoting by $X_t$ the agent's wealth at time $t$, the amount invested in the risk-free asset at time $t$ is
\[ \frac{X_t - \alpha_t S^1_t}{S^0_t}. \tag{1.1} \]

Then, the accumulated wealth evolves according to
\[ dX_t = \frac{X_t - \alpha_t S^1_t}{S^0_t}\, dS^0_t + \alpha_t\, dS^1_t. \tag{1.2} \]

The portfolio selection optimization problem is to choose the best investment in certain assets given a criterion. One criterion that can be chosen by the agent is the mean-variance criterion, which can be stated in two ways:
\[ \inf_\alpha \left\{ \mathrm{Var}(X_T) : \mathbb{E}[X_T] \ge m \right\} \tag{1.3} \]
or
\[ \sup_\alpha \left\{ \mathbb{E}[X_T] : \mathrm{Var}(X_T) \le \sigma^2 \right\}. \tag{1.4} \]


This problem is known as the classical Merton portfolio problem, and in the present work we will see how to solve it. To this end, we present some preliminary results, among them Ito calculus, and two key subjects: the Dynamic Programming Principle (DPP) and the Hamilton-Jacobi-Bellman (HJB) equation.

In general, a stochastic control problem deals with a model where the state of the system is governed by a stochastic differential equation (SDE):
\[ dX^{t,x,\alpha}_s = b(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + \sigma(s, X^{t,x,\alpha}_s, \alpha_s)\,dW_s, \tag{1.5} \]
where $X^{t,x,\alpha}_t = x$, and a gain function
\[ J(t,x,\alpha) = \mathbb{E}\left[ \int_t^T f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + g(X^{t,x,\alpha}_T) \right]. \tag{1.6} \]

The goal is to maximize the gain function with respect to the admissible control¹ $\alpha$. Then, we introduce the value function:
\[ v(t,x) = \sup_\alpha J(t,x,\alpha). \tag{1.7} \]

¹We will introduce this concept in Section 3.2.


Chapter 2

Preliminary results

To obtain the solution to the problem mentioned in the previous chapter, we will present, among others, the key concepts of the Dynamic Programming Principle (DPP) and the Hamilton-Jacobi-Bellman (HJB) equation in the next chapter. Before that, we will revisit some important concepts and results of stochastic analysis. Here, we will worry neither about the construction nor about the proofs of these results; we refer, among others, to Pham [Ph09], Karatzas and Shreve [KaSh88], Steele [St01] or Friedman [Fr75].

2.1 Stochastic Processes

A stochastic process is a mathematical model for the occurrence, at each moment after the initial time, of a random phenomenon. The randomness is captured by the introduction of a measurable space $(\Omega, \mathcal{F})$, called the sample space, on which probability measures can be placed. Thus, a stochastic process with state space $S$ is a collection of random variables $\{X_t;\ t \in T\}$ taking values in $S$, defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$.

The set $T$ is called its parameter set. If $T = \mathbb{N} = \{0, 1, 2, \dots\}$, the process is said to be a discrete-parameter process. If $T$ is an interval of $\mathbb{R}$, the process is said to have a continuous parameter.

The index $t$ represents time, and one thinks of $X_t$ as the state or position of the process at time $t$. The state space is $\mathbb{R}$ in most examples, and the process is then said to be real-valued.

For every fixed $\omega \in \Omega$, the mapping
\[ t \mapsto X_t(\omega), \]
defined on the parameter set $T$, is called the sample path of the process $X$ associated with $\omega$.

Definition 2.1. The stochastic process $X$ is said to be càdlàg (resp. continuous) if, for each $\omega \in \Omega$, the path $t \mapsto X_t(\omega)$ is right-continuous and admits left limits at each $t$ (resp. is continuous).

Definition 2.2. A filtration is an increasing family $\mathbb{F} = (\mathcal{F}_t)_{t\in[0,\infty)}$ of sub-$\sigma$-fields of $\mathcal{F}$: $\mathcal{F}_s \subset \mathcal{F}_t \subset \mathcal{F}$ for all $0 \le s \le t$ in $[0,\infty)$.

In the sequel, we are given a filtration $\mathbb{F} = (\mathcal{F}_t)_{t\in[0,\infty)}$ on $(\Omega, \mathcal{F}, \mathbb{P})$.


Definition 2.3. The set $\mathcal{B}([0,t])$ of Borel sets of $[0,t]$ is the smallest $\sigma$-field that contains all open subsets of $[0,t]$.

Definition 2.4. Let $\mathcal{F}_T \otimes \mathcal{B}$ be the smallest $\sigma$-field that contains all product sets $A \times B$, where $A \in \mathcal{F}_T$ and $B \in \mathcal{B}([0,T])$. Then a function $f : \Omega \times [0,T] \to \mathbb{R}$:

(i) is measurable if $f$ is $\mathcal{F}_T \otimes \mathcal{B}$-measurable; and

(ii) is adapted provided that $f(\cdot, t)$ is $\mathcal{F}_t$-measurable for each $t \in [0,T]$.

This means that an adapted process ($\mathbb{F}$-adapted) is a process whose value at any time $t$ is revealed by the information $\mathcal{F}_t$.

Definition 2.5. The stochastic process $(X_t)_{t \in T}$ is progressively measurable (with respect to $\mathbb{F}$) if, for any $t \in T$, the mapping $(s,\omega) \mapsto X_s(\omega)$ is measurable on $[0,t] \times \Omega$ equipped with the product $\sigma$-field $\mathcal{B}([0,t]) \otimes \mathcal{F}_t$.

Continuous-time martingale

Definition 2.6 (Continuous-time martingale). Let $(X_t)_{t\in[0,\infty)}$ be a process adapted with respect to $(\mathcal{F}_t)_{t\in[0,\infty)}$ as in Definition 2.4. We say that $(X_t)$ is a martingale if:

(i) $\mathbb{E}[|X_t|] < \infty$, for all $0 \le t < \infty$;

(ii) $\mathbb{E}[X_t \mid \mathcal{F}_s] = X_s$, for all $0 \le s \le t < \infty$.

Brownian Motion

Brownian motion is the most important stochastic process. L. Bachelier, the acknowledged father of quantitative methods in finance, invented it for one of its first important applications. His purpose was to provide a model for financial markets, hoping that Brownian motion would lead to a model for security prices that would provide a sound basis for the pricing of options, a hope that was vindicated after some modifications. Since then, Brownian motion has been widely used in finance for modelling stock prices. The first proof that Brownian motion exists as a rigorously defined mathematical object was given by N. Wiener; that is why Brownian motion is also known as the Wiener process.

Definition 2.7 (Brownian Motion). A Brownian motion, or Wiener process, is a continuous-time stochastic process $\{W_t;\ 0 \le t \le T\}$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ satisfying:

(i) $W_0 = 0$;

(ii) for any $0 \le t_0 < t_1 < \dots < t_n \le T$, the random variables $W_{t_k} - W_{t_{k-1}}$ ($1 \le k \le n$) are independent;

(iii) for any $0 \le s < t \le T$, the increment $W_t - W_s$ has the Gaussian distribution with mean $0$ and variance $t - s$;

(iv) for all $\omega$ in a set of probability one, $W_t(\omega)$ is a continuous function of $t$.
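The defining properties above are easy to probe numerically. The following sketch is not part of the original text and all parameter choices are illustrative: it simulates Brownian paths by summing independent Gaussian increments and checks property (iii).

import numpy as np

# Illustrative sketch: simulate Brownian paths on [0, T] by summing
# independent N(0, dt) increments, per properties (i)-(iv) of Definition 2.7.
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 1000, 5000
dt = T / n_steps

increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

# Check (iii): W_t - W_s ~ N(0, t - s) for s = 0.3, t = 0.7.
s_idx, t_idx = int(0.3 * n_steps), int(0.7 * n_steps)
incr = W[:, t_idx] - W[:, s_idx]
print(incr.mean(), incr.var())  # approximately 0 and 0.4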


Figure 2.1: Louis Bachelier (1879-1946) Figure 2.2: Norbert Wiener (1894-1964)

In this work, we will not discuss the existence, construction or simulation of Brownian motion; we only state its classical properties.

Proposition 2.1. Let $(W_t)_{t\in[0,T]}$ be a Brownian motion with respect to $(\mathcal{F}_t)_{t\in[0,T]}$.

(1) Symmetry: $(-W_t)_{t\in[0,T]}$ is also a Brownian motion.

(2) Scaling: for all $\lambda > 0$, the process $\left( \frac{1}{\lambda} W_{\lambda^2 t} \right)_{t\in[0,T]}$ is also a Brownian motion.

(3) Invariance by translation: for all $s > 0$, the process $(W_{t+s} - W_s)_{t\in[0,T]}$ is a standard Brownian motion independent of $\mathcal{F}_s$.

Moreover, $(X_t)_{t\in[0,\infty)}$ is continuous if there exists $\Omega_0 \subset \Omega$ with $\mathbb{P}(\Omega_0) = 1$ such that, for all $\omega \in \Omega_0$, the function on $[0,\infty)$ defined by $t \mapsto X_t(\omega)$ is continuous.

2.2 Ito’s integration

Kiyosi Ito was a Japanese mathematician who pioneered the construction of a new calculus for stochastic processes, now known as the Ito calculus, which is one of the most useful tools of probability theory.

Figure 2.3: Kiyosi Ito (1915-2008)


Definition 2.8. The class $\mathcal{H}^2 = \mathcal{H}^2[0,T]$ consists of all measurable adapted functions $\varphi$ that satisfy the integrability constraint
\[ \mathbb{E}\left[ \int_0^T \varphi^2(\omega, t)\,dt \right] < \infty. \tag{2.1} \]

Definition 2.9 (Simple Functions). A function $\varphi : \Omega \times [0,T] \to \mathbb{R}$ is simple, denoted $\varphi \in \mathcal{H}^0[0,T]$, if
\[ \varphi(\omega, t) = \sum_{i=0}^{n-1} a_i(\omega)\, \mathbf{1}_{(t_i, t_{i+1}]}(t), \tag{2.2} \]
where $0 = t_0 < t_1 < \dots < t_n = T$ and $a_i$ is $\mathcal{F}_{t_i}$-measurable with $\mathbb{E}[a_i^2] < \infty$.

Lemma 2.1. Let $\varphi \in \mathcal{H}^2[0,T]$. Then there exists a sequence of bounded simple functions $\varphi_n$ in $\mathcal{H}^2[0,T]$ such that
\[ \mathbb{E}\left[ \int_0^T |\varphi(t) - \varphi_n(t)|^2\,dt \right] \to 0, \quad \text{as } n \to \infty. \tag{2.3} \]

Definition 2.10. Let $\varphi$ be a simple function in $\mathcal{H}^0[0,T]$, say $\varphi(t) = \varphi_i$ if $t_i \le t < t_{i+1}$, $0 \le i \le r-1$, where $0 = t_0 < t_1 < \dots < t_r = T$. The random variable
\[ \sum_{k=0}^{r-1} \varphi(t_k)\left[ W_{t_{k+1}} - W_{t_k} \right] \tag{2.4} \]
is denoted by
\[ \int_0^T \varphi(s)\,dW_s \tag{2.5} \]
and is called the stochastic integral, or Ito integral, of $\varphi$ with respect to the Brownian motion $W$.

In the setting of Lemma 2.1, one can show that the sequence
\[ \int_0^T \varphi_n(s)\,dW_s \tag{2.6} \]
converges in probability and that its limit is independent of the particular sequence $\varphi_n$. We denote this limit by
\[ \int_0^T \varphi(s)\,dW_s \tag{2.7} \]
and call it the stochastic integral, or Ito integral, of $\varphi(t)$ with respect to the Brownian motion $W_t$.


Theorem 2.1 (Ito's isometry). If $\varphi$ is a function in $\mathcal{H}^2[0,T]$, then
\[ \mathbb{E}\left[ \int_0^T \varphi(s)\,dW_s \right] = 0, \tag{2.8} \]
\[ \mathbb{E}\left[ \left| \int_0^T \varphi(s)\,dW_s \right|^2 \right] = \mathbb{E}\left[ \int_0^T \varphi^2(s)\,ds \right]. \tag{2.9} \]

Let $\varphi \in \mathcal{H}^2[0,T]$ and consider the process
\[ X_t = \int_0^t \varphi(s)\,dW_s, \quad 0 \le t \le T, \tag{2.10} \]
where, by definition, $\int_0^0 \varphi(s)\,dW_s = 0$. The stochastic process $X_t$ is called the indefinite integral of $\varphi$. Notice that $X_t$ is $\mathcal{F}_t$-measurable. Moreover,
\[ \int_0^u \varphi(s)\,dW_s + \int_u^t \varphi(s)\,dW_s = \int_0^t \varphi(s)\,dW_s, \quad 0 \le u < t \le T, \tag{2.11} \]
for any $\varphi \in \mathcal{H}^2[0,T]$.

Theorem 2.2. If $\varphi \in \mathcal{H}^2[0,T]$, then the indefinite integral $X_t$, $0 \le t \le T$, is a martingale.

2.3 Ito’s formula

In classical calculus, we usually compute integrals via the Fundamental Theorem of Calculus, as it is difficult to compute them directly from the definition. With the Ito integral the situation is similar, and there exists an appropriate analogue of the traditional fundamental theorem of calculus.

Theorem 2.3 (Ito's formula - Brownian case). If $f : \mathbb{R} \to \mathbb{R}$ has a continuous second derivative, then
\[ f(W_t) = f(0) + \int_0^t f'(W_s)\,dW_s + \frac{1}{2} \int_0^t f''(W_s)\,ds. \tag{2.12} \]

The presence of the second integral is the most important feature of Ito's formula; without it, we would just have a formal transcription of the usual fundamental theorem of calculus. Moreover, both integrals on the right-hand side make sense because of the continuity of $f'$ and $f''$.

So far, in order to calculate or interpret an Ito integral, we had to resort to its definition. Using Ito's formula, this laborious reduction is replaced by a procedure that is almost always simpler and more concrete. Taking, for example, $f \in C^2(\mathbb{R})$ and $F$ such that $F' = f$ and $F(0) = 0$, Ito's formula can be written as a formula for the integral of $f$:
\[ \int_0^t f(W_s)\,dW_s = F(W_t) - \frac{1}{2} \int_0^t f'(W_s)\,ds. \tag{2.13} \]


For example, if $f(W_s) = W_s$, then we have:
\[ \int_0^t f(W_s)\,dW_s = \frac{W_t^2}{2} - \frac{t}{2}. \tag{2.14} \]

Notice that $W_s \in \mathcal{H}^2$, so its Ito integral is a martingale. Then, by Equation (2.14), we conclude that $W_t^2 - t$ is also a martingale.
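Both the identity (2.14) and Ito's isometry can be checked by Monte Carlo simulation. The sketch below is illustrative (parameters are arbitrary choices, not from the text): it approximates the Ito integral by the left-point sums of Definition 2.10.

import numpy as np

# Illustrative sketch: approximate the Ito integral of W against dW by
# left-point Riemann sums and compare with the closed form W_T^2/2 - T/2.
rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 2000, 20000
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
W_left = np.concatenate([np.zeros((n_paths, 1)), W[:, :-1]], axis=1)  # W_{t_k}

ito_sum = np.sum(W_left * dW, axis=1)        # left-point evaluation is essential
closed_form = 0.5 * W[:, -1] ** 2 - 0.5 * T
print(np.abs(ito_sum - closed_form).mean())  # -> 0 as n_steps grows
print(ito_sum.mean())                        # ~0: the Ito integral is a martingale
print(np.mean(ito_sum ** 2), T**2 / 2)       # Ito isometry: E[(int W dW)^2] = T^2/2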

Theorem 2.4 (Ito's formula with space and time variables). For any function $f \in C^{1,2}(\mathbb{R}_+ \times \mathbb{R})$, we have the representation
\[ f(t, W_t) = f(0,0) + \int_0^t \frac{\partial f}{\partial t}(s, W_s)\,ds + \int_0^t \frac{\partial f}{\partial x}(s, W_s)\,dW_s + \frac{1}{2} \int_0^t \frac{\partial^2 f}{\partial x^2}(s, W_s)\,ds. \tag{2.15} \]

We can rewrite Ito's formula more concisely in differential notation. If $X_t = f(t, W_t)$, then
\[ dX_t = \frac{\partial f}{\partial t}(t, W_t)\,dt + \frac{\partial f}{\partial x}(t, W_t)\,dW_t + \frac{1}{2} \frac{\partial^2 f}{\partial x^2}(t, W_t)\,dt. \tag{2.16} \]

There are connections between the theory of martingales and differential equations, many of which are made through Ito's formula. The simplest, but handiest, of these connections is given by the next theorem.

Theorem 2.5 (Martingale PDE condition). If $f \in C^{1,2}(\mathbb{R}_+ \times \mathbb{R})$,
\[ \frac{\partial f}{\partial t} = -\frac{1}{2} \frac{\partial^2 f}{\partial x^2} \tag{2.17} \]
and
\[ \mathbb{E}\left[ \int_0^T \left( \frac{\partial f}{\partial x}(t, W_t) \right)^2 dt \right] < \infty, \tag{2.18} \]
then $X_t = f(t, W_t)$ is a martingale on $0 \le t \le T$.

Example 2.1. Consider the process
\[ M_t = f(t, W_t) = \exp\left( \alpha W_t - \frac{\alpha^2 t}{2} \right). \]
Then we have
\[ \frac{\partial f}{\partial t}(t, W_t) = -\frac{\alpha^2}{2} f(t, W_t) \quad \text{and} \quad \frac{\partial^2 f}{\partial x^2}(t, W_t) = \alpha^2 f(t, W_t), \]
so condition (2.17) is satisfied. Moreover, $\frac{\partial f}{\partial x}(t, W_t) = \alpha f(t, W_t)$ and
\[ \mathbb{E}\left[ \int_0^T \big( \alpha f(t, W_t) \big)^2\,dt \right] = \mathbb{E}\left[ \int_0^T \alpha^2 f^2(t, W_t)\,dt \right] < \infty. \]
Hence, by condition (2.18), $M_t$ is a martingale.
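As a numerical sanity check (illustrative only, with arbitrary $\alpha$ and horizon), one can verify that $\mathbb{E}[M_t]$ stays equal to $M_0 = 1$ along the path, as a martingale requires:

import numpy as np

# Illustrative sketch: the exponential martingale of Example 2.1 should have
# constant expectation E[M_t] = M_0 = 1 for every t.
rng = np.random.default_rng(2)
alpha, T, n_steps, n_paths = 0.8, 2.0, 500, 200000
dt = T / n_steps
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
t = dt * np.arange(1, n_steps + 1)
M = np.exp(alpha * W - 0.5 * alpha**2 * t)
print(M[:, n_steps // 4].mean(), M[:, -1].mean())  # both approximately 1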


Definition 2.11 (Brownian motion with drift). Consider a standard Brownian motion $W_t$. Then the process $X_t$ defined by
\[ X_t = \mu t + \sigma W_t \tag{2.19} \]
is called a Brownian motion with constant drift rate $\mu$ and instantaneous variance $\sigma^2$.

We can rewrite the Brownian motion with drift of Definition 2.11 as a stochastic differential equation:
\[ dX_t = \mu\,dt + \sigma\,dW_t. \tag{2.20} \]

Consider now a function $Y_t = f(t, X_t)$, which can naturally be written as $Y_t = g(t, W_t)$, where $g(t,x) = f(t, \mu t + \sigma x)$. Applying Ito's formula to $Y_t$, we have:
\[ dY_t = \frac{\partial g}{\partial t}(t, W_t)\,dt + \frac{\partial g}{\partial x}(t, W_t)\,dW_t + \frac{1}{2} \frac{\partial^2 g}{\partial x^2}(t, W_t)\,dt. \tag{2.21} \]

Moreover, by the chain rule, we have:
\[ \frac{\partial g}{\partial t}(t,x) = \frac{\partial f}{\partial t}(t, \mu t + \sigma x) + \mu \frac{\partial f}{\partial x}(t, \mu t + \sigma x), \tag{2.22} \]
\[ \frac{\partial g}{\partial x}(t,x) = \sigma \frac{\partial f}{\partial x}(t, \mu t + \sigma x), \tag{2.23} \]
\[ \frac{\partial^2 g}{\partial x^2}(t,x) = \sigma^2 \frac{\partial^2 f}{\partial x^2}(t, \mu t + \sigma x). \tag{2.24} \]

Substituting (2.22), (2.23) and (2.24) into (2.21), we have:
\begin{align*}
dY_t &= \left[ \frac{\partial f}{\partial t}(t, \mu t + \sigma W_t) + \mu \frac{\partial f}{\partial x}(t, \mu t + \sigma W_t) \right] dt + \sigma \frac{\partial f}{\partial x}(t, \mu t + \sigma W_t)\,dW_t \\
&\quad + \frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}(t, \mu t + \sigma W_t)\,dt \\
&= \frac{\partial f}{\partial t}(t, X_t)\,dt + \mu \frac{\partial f}{\partial x}(t, X_t)\,dt + \sigma \frac{\partial f}{\partial x}(t, X_t)\,dW_t + \frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}(t, X_t)\,dt \\
&= \frac{\partial f}{\partial t}(t, X_t)\,dt + \frac{\partial f}{\partial x}(t, X_t)\,dX_t + \frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}(t, X_t)\,dt. \tag{2.25}
\end{align*}

There is a rule, called box calculus, that allows us to obtain Equation (2.25) without delay. This is an algebra for the set $\mathcal{A}$ of linear combinations of the formal symbols $dt$ and $dW_t$. In this algebra, adapted functions are regarded as scalar values and addition is the usual algebraic addition. Products are then computed by the traditional rules of associativity and distributivity, together with a multiplication table for the special symbols $dt$ and $dW_t$:

    ·       dt    dW_t
    dt      0     0
    dW_t    0     dt


Applying the rules of box calculus to (2.20), we have:
\begin{align*}
(dX_t)^2 &= dX_t \cdot dX_t = (\mu\,dt + \sigma\,dW_t) \cdot (\mu\,dt + \sigma\,dW_t) \\
&= \mu^2\,dt \cdot dt + \mu\sigma\,dt \cdot dW_t + \sigma\mu\,dW_t \cdot dt + \sigma^2\,dW_t \cdot dW_t \\
&= \sigma^2\,dt. \tag{2.26}
\end{align*}

Then,
\begin{align*}
df(t, X_t) &= \frac{\partial f}{\partial t}(t, X_t)\,dt + \frac{\partial f}{\partial x}(t, X_t)\,dX_t + \frac{1}{2} \frac{\partial^2 f}{\partial x^2}(t, X_t)\,(dX_t)^2 \\
&= \frac{\partial f}{\partial t}(t, X_t)\,dt + \frac{\partial f}{\partial x}(t, X_t)\,dX_t + \frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}(t, X_t)\,dt.
\end{align*}

Indeed, equation (2.25) holds not only for functions of Brownian motion with drift: it is valid whenever $X_t$ is a smooth function of time and Brownian motion.

2.4 Stochastic Differential Equations (SDE’s)

Let us fix a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t\in[0,\infty)}, \mathbb{P})$ satisfying the usual conditions, and a Brownian motion $W_t$ with respect to $\mathbb{F}$. Consider given functions $\mu(t,x,\omega)$ and $\sigma(t,x,\omega)$ defined on $[0,\infty) \times \mathbb{R} \times \Omega$, both valued in $\mathbb{R}$. Moreover, we will assume that, for all $\omega$, the functions $\mu(\cdot,\cdot,\omega)$ and $\sigma(\cdot,\cdot,\omega)$ are Borel measurable on $[0,\infty) \times \mathbb{R}$, and that for all $x \in \mathbb{R}$ the processes $\mu(\cdot,x,\cdot)$ and $\sigma(\cdot,x,\cdot)$, written $\mu(\cdot,x)$ and $\sigma(\cdot,x)$ for simplicity, are progressively measurable. So, let us define:

Definition 2.12 (Stochastic Differential Equation). A stochastic differential equation (SDE) is an equation of the form
\[ dX^{t,x}_s = \mu(s, X^{t,x}_s)\,ds + \sigma(s, X^{t,x}_s)\,dW_s, \quad \text{with } X^{t,x}_t = x. \tag{2.27} \]

The coefficients $\mu$ and $\sigma$ are interpreted as measures of short-term growth and short-term variability, respectively.

Definition 2.13 (Strong solution of an SDE). A strong solution of the SDE (2.27) starting at time $t$ is a progressively measurable process $X$ such that
\[ \int_t^T \left| \mu(s, X^{t,x}_s) \right| ds + \int_t^T \left| \sigma(s, X^{t,x}_s) \right|^2 ds < \infty \quad \text{a.s., for all } t \le T \in [0,\infty), \]
and the relation
\[ X^{t,x}_T = x + \int_t^T \mu(s, X^{t,x}_s)\,ds + \int_t^T \sigma(s, X^{t,x}_s)\,dW_s, \quad t \le T \in [0,\infty), \tag{2.28} \]
holds true a.s.


Uniqueness and existence of a solution

Theorem 2.6. Assume there exist a (deterministic) constant $K$ and a real-valued process $\kappa$ such that, for all $t \in [0,\infty)$, $\omega \in \Omega$, $x, y \in \mathbb{R}$:

• Lipschitz condition:
\[ |\mu(t,x,\omega) - \mu(t,y,\omega)| + |\sigma(t,x,\omega) - \sigma(t,y,\omega)| \le K |x - y|; \tag{2.29} \]

• Linear growth condition:
\[ |\mu(t,x,\omega)| + |\sigma(t,x,\omega)| \le \kappa_t(\omega) + K|x|, \tag{2.30} \]
with
\[ \mathbb{E}\left[ \int_0^t |\kappa_u|^2\,du \right] < \infty, \quad \forall t \in [0,\infty). \tag{2.31} \]

Under conditions (2.29), (2.30) and (2.31), we can affirm that:

(i) there exists, for all $t \in [0,\infty)$, a strong solution to the SDE (2.27) starting at time $t$;

(ii) for any $\mathcal{F}_t$-measurable random variable $\xi$ valued in $\mathbb{R}$ such that $\mathbb{E}[|\xi|^2] < \infty$, there is a unique strong solution $X$ starting from $\xi$ at time $t$, i.e., $X_t = \xi$. The uniqueness is pathwise, i.e., if $X$ and $Y$ are two such strong solutions, we have $\mathbb{P}[X_s = Y_s,\ \forall s \ge t] = 1$;

(iii) this solution is square integrable: for all $T > t$, there exists a constant $C_T$ such that
\[ \mathbb{E}\left[ \sup_{t \le s \le T} |X_s|^2 \right] \le C_T \left( 1 + \mathbb{E}\left[ |\xi|^2 \right] \right). \]
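Theorem 2.6 guarantees a strong solution but gives no formula for it; in practice one approximates it with the Euler-Maruyama scheme. The sketch below is illustrative and not part of the original text: the Ornstein-Uhlenbeck coefficients and all parameters are assumptions chosen because the mean of the solution is known in closed form.

import numpy as np

# Illustrative sketch: Euler-Maruyama discretization of the SDE (2.27); under
# the Lipschitz and linear growth conditions of Theorem 2.6 it converges to
# the strong solution as the step size shrinks.
def euler_maruyama(mu, sigma, x0, t0, T, n_steps, rng):
    dt = (T - t0) / n_steps
    x, s = x0, t0
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        x = x + mu(s, x) * dt + sigma(s, x) * dw
        s += dt
    return x

rng = np.random.default_rng(3)
# Ornstein-Uhlenbeck example: dX = -2 X ds + 0.5 dW, X_0 = 1, so E[X_1] = e^{-2}.
samples = [euler_maruyama(lambda s, x: -2.0 * x, lambda s, x: 0.5,
                          1.0, 0.0, 1.0, 500, rng) for _ in range(10000)]
print(np.mean(samples), np.exp(-2.0))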

Corollary 2.1 (Flow property). By pathwise uniqueness, for all $t \le u$ and $x \in \mathbb{R}$, we have
\[ X^{t,x}_s = X^{u,\, X^{t,x}_u}_s, \quad s \ge u. \]

Proof. By (2.28), we have:
\[ X^{t,x}_T = x + \int_t^T \mu(s, X^{t,x}_s)\,ds + \int_t^T \sigma(s, X^{t,x}_s)\,dW_s \]
\[ \Rightarrow X^{t,x}_T = x + \int_t^u \mu(s, X^{t,x}_s)\,ds + \int_t^u \sigma(s, X^{t,x}_s)\,dW_s + \int_u^T \mu(s, X^{t,x}_s)\,ds + \int_u^T \sigma(s, X^{t,x}_s)\,dW_s \]
\[ \Rightarrow X^{t,x}_T = X^{t,x}_u + \int_u^T \mu(s, X^{t,x}_s)\,ds + \int_u^T \sigma(s, X^{t,x}_s)\,dW_s. \]
Hence, on $[u,T]$, the process $X^{t,x}$ solves the SDE
\[ dX_s = \mu(s, X_s)\,ds + \sigma(s, X_s)\,dW_s, \quad X_u = X^{t,x}_u, \]
whose unique strong solution is $X^{u,\, X^{t,x}_u}$. □


2.5 Feynman-Kac representation

Stochastic differential equations have many applications. One of the most important is the stochastic representation of solutions to partial differential equations, known as the Feynman-Kac formula. The formula builds a bridge between stochastic differential equations (SDEs) and partial differential equations (PDEs) and underlies the probabilistic approach to the study of PDEs.

Suppose that the function $u : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ satisfies the PDE
\[ \frac{\partial u}{\partial t}(t,x) = \frac{1}{2}\sigma^2(x) \frac{\partial^2 u}{\partial x^2}(t,x) + \mu(x) \frac{\partial u}{\partial x}(t,x) + q(x)\,u(t,x) \tag{2.32} \]
for any $(t,x) \in (0,\infty) \times \mathbb{R}$, with $u(0,x) = f(x)$. Let us assume that $\mu$, $\sigma$ and $f$ are smooth enough. Moreover, by assuming that $u$ is a solution of (2.32), we are implicitly assuming that $u \in C^{1,2}(\mathbb{R}_+ \times \mathbb{R})$.

Let us verify the Feynman-Kac representation of the solution $u$:
\[ u(t,x) = \mathbb{E}\left[ \exp\left( \int_0^t q(X_s)\,ds \right) f(X_t) \right], \tag{2.33} \]
where $X$ is the solution of the SDE
\[ dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t, \]
$W$ is a Brownian motion and $X_0 = x$. The strategy is to find a martingale involving $u$. In the sequel, we fix $t > 0$.

First, let us define $U_s = u(t-s, X_s)$ for $0 \le s \le t$. Applying Ito's formula, we have:
\[ dU_s = -\frac{\partial u}{\partial t}(t-s, X_s)\,ds + \frac{\partial u}{\partial x}(t-s, X_s)\,dX_s + \frac{1}{2} \frac{\partial^2 u}{\partial x^2}(t-s, X_s)\,(dX_s)^2. \tag{2.34} \]

But $dX_s = \mu(X_s)\,ds + \sigma(X_s)\,dW_s$, so
\[ (dX_s)^2 = \mu^2(X_s)(ds)^2 + 2\mu(X_s)\sigma(X_s)\,ds\,dW_s + \sigma^2(X_s)(dW_s)^2 = \sigma^2(X_s)\,ds. \tag{2.35} \]

Substituting (2.35) into (2.34), we have:
\[ dU_s = -\frac{\partial u}{\partial t}(t-s, X_s)\,ds + \frac{\partial u}{\partial x}(t-s, X_s)\,dX_s + \frac{1}{2}\sigma^2(X_s) \frac{\partial^2 u}{\partial x^2}(t-s, X_s)\,ds. \tag{2.36} \]

By Equation (2.32), we can affirm:
\[ \frac{\partial u}{\partial t}(t-s, X_s) = \frac{1}{2}\sigma^2(X_s) \frac{\partial^2 u}{\partial x^2}(t-s, X_s) + \mu(X_s) \frac{\partial u}{\partial x}(t-s, X_s) + q(X_s)\,u(t-s, X_s). \tag{2.37} \]


Then, substituting (2.37) into (2.36), we have:
\begin{align*}
dU_s &= -\frac{1}{2}\sigma^2(X_s)\frac{\partial^2 u}{\partial x^2}(t-s,X_s)\,ds - \mu(X_s)\frac{\partial u}{\partial x}(t-s,X_s)\,ds - q(X_s)\,u(t-s,X_s)\,ds \\
&\quad + \frac{\partial u}{\partial x}(t-s,X_s)\,dX_s + \frac{1}{2}\sigma^2(X_s)\frac{\partial^2 u}{\partial x^2}(t-s,X_s)\,ds \\
&= -\mu(X_s)\frac{\partial u}{\partial x}(t-s,X_s)\,ds - q(X_s)\,u(t-s,X_s)\,ds + \frac{\partial u}{\partial x}(t-s,X_s)\big( \mu(X_s)\,ds + \sigma(X_s)\,dW_s \big) \\
&= -q(X_s)\,u(t-s,X_s)\,ds + \sigma(X_s)\frac{\partial u}{\partial x}(t-s,X_s)\,dW_s. \tag{2.38}
\end{align*}

Now, let $I_s = \exp\left( \int_0^s q(X_u)\,du \right)$, so that $dI_s = I_s\, q(X_s)\,ds$, and let us compute $d(U_s I_s)$:
\begin{align*}
d(U_s I_s) &= I_s\,dU_s + U_s\,dI_s + dU_s \cdot dI_s \\
&= \left[ -q(X_s)U_s\,ds + \sigma(X_s)\frac{\partial u}{\partial x}(t-s,X_s)\,dW_s \right] I_s + U_s I_s\, q(X_s)\,ds \\
&\quad + \left[ -q(X_s)U_s\,ds + \sigma(X_s)\frac{\partial u}{\partial x}(t-s,X_s)\,dW_s \right] \cdot I_s\, q(X_s)\,ds \\
&= \sigma(X_s)\, I_s\, \frac{\partial u}{\partial x}(t-s,X_s)\,dW_s, \tag{2.39}
\end{align*}
since, by box calculus, $dU_s \cdot dI_s = 0$.

The drift of (2.39) is zero, so $M_s = U_s I_s$ is a local martingale. Assuming that $\sigma(X_s) I_s \frac{\partial u}{\partial x}(t-s, X_s) \in \mathcal{H}^2$, $(M_s)_{s\in[0,t]}$ is a true martingale. Then,
\[ \mathbb{E}[M_t \mid \mathcal{F}_0] = M_0 \Rightarrow \mathbb{E}[M_t] = \mathbb{E}[M_0]. \tag{2.40} \]

But
\[ M_t = U_t I_t = u(t-t, X_t) \exp\left( \int_0^t q(X_s)\,ds \right) = \exp\left( \int_0^t q(X_s)\,ds \right) f(X_t). \tag{2.41} \]

Moreover, $M_0 = U_0 I_0 = u(t-0, X_0) = u(t,x)$. Therefore,
\[ u(t,x) = \mathbb{E}[M_0] = \mathbb{E}\left[ \exp\left( \int_0^t q(X_s)\,ds \right) f(X_t) \right]. \]
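The representation (2.33) suggests a Monte Carlo method: simulate $X$ and average the weighted terminal values. The following sketch is illustrative only; the choices $\mu = 0$, $\sigma = 1$, $q \equiv c$ and $f(x) = x$ are assumptions made because then $u(t,x) = e^{ct}x$ solves (2.32) in closed form.

import numpy as np

# Illustrative sketch: Monte Carlo check of the Feynman-Kac representation
# (2.33) in a case with a known solution u(t, x) = e^{c t} x.
rng = np.random.default_rng(4)
c, t, x, n_steps, n_paths = 0.5, 1.0, 2.0, 400, 100000
dt = t / n_steps

X = x + np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
weight = np.exp(c * t)                 # exp(int_0^t q(X_s) ds) with q constant
estimate = np.mean(weight * X[:, -1])  # E[ exp(int_0^t q) f(X_t) ]
print(estimate, np.exp(c * t) * x)     # both approximately e^{0.5} * 2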


Chapter 3

Continuous-time Stochastic Control

3.1 Controlled diffusion processes

Let us consider a control model where the state of the system is governed by a stochastic differential equation (SDE) valued in $\mathbb{R}$:
\[ dX^{t,x,\alpha}_s = b(X^{t,x,\alpha}_s, \alpha_s)\,ds + \sigma(X^{t,x,\alpha}_s, \alpha_s)\,dW_s, \quad X^{t,x,\alpha}_t = x, \quad s \ge t, \tag{3.1} \]
where $W$ is a 1-dimensional Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$ satisfying the usual conditions. In this work, we develop the theory for a 1-dimensional Brownian motion and SDEs valued in $\mathbb{R}$, but the results for $m$-dimensional Brownian motions and SDEs valued in $\mathbb{R}^n$ are similar.

The control $\alpha = (\alpha_s)$ is a progressively measurable (with respect to $\mathbb{F}$) process, valued in some set $A \subset \mathbb{R}$.

The measurable functions $b : \mathbb{R} \times A \to \mathbb{R}$ and $\sigma : \mathbb{R} \times A \to \mathbb{R}$ satisfy a Lipschitz condition uniform in $A$:
\[ \exists k \ge 0,\ \forall x, y \in \mathbb{R},\ \forall a \in A, \quad |b(x,a) - b(y,a)| + |\sigma(x,a) - \sigma(y,a)| \le k|x-y|. \tag{3.2} \]

3.2 Admissible controls

Finite horizon problem

Now let us deal with the objective functions and, for the finite horizon problem, establish some conditions for the value function to be well defined.

Let us fix a finite horizon $0 < T < \infty$. We denote by $\mathcal{A}$ the set of control processes $\alpha$ such that
\[ \mathbb{E}\left[ \int_0^T |b(0, \alpha_t)|^2 + |\sigma(0, \alpha_t)|^2\,dt \right] < \infty. \tag{3.3} \]

As we know, conditions (3.2) and (3.3) ensure, for all $\alpha \in \mathcal{A}$ and any initial condition $(t,x) \in [0,T] \times \mathbb{R}$, the existence and uniqueness of a strong solution of the SDE (3.1) starting from $x$ at $t$. We denote this solution by $X^{t,x,\alpha}_s$, $t \le s \le T$; it has a.s. continuous paths.


Moreover, we have:
\[ \mathbb{E}\left[ \sup_{t \le s \le T} |X^{t,x,\alpha}_s|^2 \right] < \infty. \tag{3.4} \]

Definition 3.1. Let $f : [0,T] \times \mathbb{R} \times A \to \mathbb{R}$ be a measurable function. For $(t,x) \in [0,T] \times \mathbb{R}$, we denote by $\mathcal{A}(t,x)$ the subset of controls $\alpha \in \mathcal{A}$ such that
\[ \mathbb{E}\left[ \int_t^T \left| f(s, X^{t,x,\alpha}_s, \alpha_s) \right| ds \right] < \infty. \tag{3.5} \]

Remark 3.1. When $f$ satisfies a quadratic growth condition in $x$, i.e., there exist a positive constant $C$ and a positive function $k : A \to \mathbb{R}_+$ such that
\[ |f(t,x,a)| \le C(1 + |x|^2) + k(a), \quad \forall (t,x,a) \in [0,T] \times \mathbb{R} \times A, \tag{3.6} \]
then the estimate (3.4) shows that, for all $(t,x) \in [0,T] \times \mathbb{R}$ and any constant control $\alpha = a$ in $A$, we have
\begin{align*}
\mathbb{E}\left[ \int_t^T |f(s, X^{t,x,\alpha}_s, a)|\,ds \right]
&\le \mathbb{E}\left[ \int_t^T \big( C(1 + |X^{t,x,\alpha}_s|^2) + k(a) \big)\,ds \right] \\
&= (C + k(a))(T-t) + C \int_t^T \mathbb{E}\left[ |X^{t,x,\alpha}_s|^2 \right] ds \\
&\le (C + k(a))(T-t) + C\, \underbrace{\mathbb{E}\left[ \sup_{t \le u \le T} |X^{t,x,\alpha}_u|^2 \right]}_{<\infty}\,(T-t).
\end{align*}
Then,
\[ \mathbb{E}\left[ \int_t^T |f(s, X^{t,x,\alpha}_s, a)|\,ds \right] < \infty. \]

Hence, constant controls lie in $\mathcal{A}(t,x)$. Moreover, if there exists a positive constant $C$ such that
\[ k(a) \le C(1 + |b(0,a)|^2 + |\sigma(0,a)|^2) \]


for all $a \in A$, then conditions (3.3) and (3.4) show that, for all $(t,x) \in [0,T] \times \mathbb{R}$ and any control $\alpha \in \mathcal{A}$:
\begin{align*}
\mathbb{E}\left[ \int_t^T |f(s, X^{t,x,\alpha}_s, \alpha_s)|\,ds \right]
&\le \mathbb{E}\left[ \int_t^T \big( C(1 + |X^{t,x,\alpha}_s|^2) + k(\alpha_s) \big)\,ds \right] \\
&\le \mathbb{E}\left[ \int_t^T \big( C(1 + |X^{t,x,\alpha}_s|^2) + C(1 + |b(0,\alpha_s)|^2 + |\sigma(0,\alpha_s)|^2) \big)\,ds \right] \\
&= 2C(T-t) + C\,\mathbb{E}\left[ \int_t^T |X^{t,x,\alpha}_s|^2\,ds \right] + C\,\mathbb{E}\left[ \int_t^T (|b(0,\alpha_s)|^2 + |\sigma(0,\alpha_s)|^2)\,ds \right] \\
&\le 2C(T-t) + C\,\underbrace{\mathbb{E}\left[ \sup_{t\le u\le T} |X^{t,x,\alpha}_u|^2 \right]}_{<\infty}\,(T-t) + C\,\mathbb{E}\left[ \int_t^T (|b(0,\alpha_s)|^2 + |\sigma(0,\alpha_s)|^2)\,ds \right].
\end{align*}
Moreover, since $|b(0,\alpha_s)|^2 + |\sigma(0,\alpha_s)|^2 \ge 0$ for all $s \in [t,T]$, we have
\[ \int_t^T |b(0,\alpha_s)|^2 + |\sigma(0,\alpha_s)|^2\,ds \le \int_0^T |b(0,\alpha_s)|^2 + |\sigma(0,\alpha_s)|^2\,ds, \]
so, by (3.3),
\[ \mathbb{E}\left[ \int_t^T (|b(0,\alpha_s)|^2 + |\sigma(0,\alpha_s)|^2)\,ds \right] < \infty. \]
Hence,
\[ \mathbb{E}\left[ \int_t^T |f(s, X^{t,x,\alpha}_s, \alpha_s)|\,ds \right] < \infty. \]
In other words, in this case, $\mathcal{A}(t,x) = \mathcal{A}$.


Infinite horizon problem

We denote by $\mathcal{A}_0$ the set of control processes $\alpha$ such that
\[ \mathbb{E}\left[ \int_0^T |b(0,\alpha_t)|^2 + |\sigma(0,\alpha_t)|^2\,dt \right] < \infty, \quad \forall T > 0. \tag{3.7} \]

Given an initial condition $x \in \mathbb{R}$ at $t = 0$ and a control $\alpha \in \mathcal{A}_0$, there exists a unique strong solution of (3.1), denoted by $X^{x,\alpha}_s$, $s \ge 0$. By Theorem 2.6, we have
\[ \mathbb{E}\left[ |X^{x,\alpha}_s|^2 \right] \le C|x|^2 + C e^{Cs}\, \mathbb{E}\left[ \int_0^s |x|^2 + |b(0,\alpha_u)|^2 + |\sigma(0,\alpha_u)|^2\,du \right], \tag{3.8} \]
for some constant $C$ independent of $s$, $x$ and $\alpha$.

Now let us deal with the objective function. Let $\beta > 0$ and let $f : \mathbb{R} \times A \to \mathbb{R}$ be a measurable function. For $x \in \mathbb{R}$, we denote by $\mathcal{A}(x)$ the subset of controls $\alpha \in \mathcal{A}_0$ such that
\[ \mathbb{E}\left[ \int_0^\infty e^{-\beta s} \left| f(X^{x,\alpha}_s, \alpha_s) \right| ds \right] < \infty, \tag{3.9} \]
and we assume that $\mathcal{A}(x)$ is not empty for all $x \in \mathbb{R}$.

Remark 3.2. When $f$ satisfies a quadratic growth condition in $x$, i.e., as in Remark 3.1, there exist a positive constant $C$ and a positive function $k : A \to \mathbb{R}_+$ such that
\[ |f(x,a)| \le C(1 + |x|^2) + k(a), \quad \forall (x,a) \in \mathbb{R} \times A, \tag{3.10} \]
then the estimate (3.8) shows that, for $\beta > 0$ large enough, for all $x \in \mathbb{R}$ and $a \in A$,
\[ \mathbb{E}\left[ \int_0^\infty e^{-\beta s} |f(X^{x,\alpha}_s, a)|\,ds \right] < \infty. \]
Hence, the constant controls in $A$ belong to $\mathcal{A}(x)$.

3.3 Stochastic Optimal Control

Definition 3.2 (Gain function).

i) For the finite horizon problem, under the conditions stated in Definition 3.1, the gain function is defined by
\[ J(t,x,\alpha) = \mathbb{E}\left[ \int_t^T f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + g(X^{t,x,\alpha}_T) \right], \tag{3.11} \]
for all $(t,x) \in [0,T] \times \mathbb{R}$ and $\alpha \in \mathcal{A}(t,x)$, where $g : \mathbb{R} \to \mathbb{R}$ is a measurable function.

ii) For the infinite horizon problem, the gain function is defined by
\[ J(x,\alpha) = \mathbb{E}\left[ \int_0^\infty e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right], \tag{3.12} \]
for all $x \in \mathbb{R}$ and $\alpha \in \mathcal{A}(x)$.

Remark 3.3. The constant $\beta$ is usually called the discount factor, and it is very important to ensure that the gain function is well defined.

In both problems, the objective is to maximize the gain function $J$ over all control processes. Then, let us introduce:

Definition 3.3 (Associated value function).

i) For the finite horizon problem, the associated value function is
\[ v(t,x) = \sup_{\alpha \in \mathcal{A}(t,x)} J(t,x,\alpha); \tag{3.13} \]

ii) For the infinite horizon problem, the associated value function is
\[ v(x) = \sup_{\alpha \in \mathcal{A}(x)} J(x,\alpha). \tag{3.14} \]

Definition 3.4 (Optimal control).

i) For the finite horizon problem, given an initial condition $(t,x) \in [0,T) \times \mathbb{R}$, we say that $\hat{\alpha} \in \mathcal{A}(t,x)$ is an optimal control if $v(t,x) = J(t,x,\hat{\alpha})$.

ii) For the infinite horizon problem, given an initial condition $x \in \mathbb{R}$, we say that $\hat{\alpha} \in \mathcal{A}(x)$ is an optimal control if $v(x) = J(x,\hat{\alpha})$.

3.4 Dynamic Programming Principle (DPP)

The dynamic programming principle (DPP) is a fundamental result in the theory of stochastic control.

3.4.1 Interpretation of DPP

The interpretation of the DPP is that the optimization problem can be split in two parts: an optimal control on the whole interval $[t,T]$ may be obtained by

I) searching for an optimal control from time $u$ given the state value $X^{t,x,\alpha}_u$, i.e., computing $v(u, X^{t,x,\alpha}_u)$;

II) then maximizing over controls on $[t,u]$ the quantity
\[ \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right]. \]


Theorem 3.1 (DPP for the finite horizon). Let $(t,x) \in [0,T] \times \mathbb{R}$ and fix $u \in (t,T)$. Then we have:
\[ v(t,x) = \sup_{\alpha \in \mathcal{A}(t,x)} \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right]. \tag{3.15} \]

Remark 3.4. To prove this theorem, we will use the following equivalent description of the DPP: for any fixed $u \in [t,T]$, (3.15) holds if and only if

(i) for all $\alpha \in \mathcal{A}(t,x)$:
\[ v(t,x) \ge \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right]; \tag{3.16} \]

(ii) for all $\varepsilon > 0$, there exists $\alpha \in \mathcal{A}(t,x)$ such that
\[ v(t,x) - \varepsilon \le \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right]. \tag{3.17} \]

Proof. Let $\bar{v}(t,x) = \sup_{\alpha \in \mathcal{A}(t,x)} \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right]$. Let us divide the proof of this theorem in two parts.

i) First, we prove that $v(t,x) \le \bar{v}(t,x)$.

Given an admissible control $\alpha \in \mathcal{A}(t,x)$, we have, by pathwise uniqueness of the flow of the SDE for $X$, the Markovian structure
\[ X^{t,x,\alpha}_s = X^{u,\, X^{t,x,\alpha}_u,\, \alpha}_s, \quad s \ge u, \]
for any fixed $u \in (t,T)$.

By the law of iterated conditional expectations, we can affirm that:
\begin{align*}
J(t,x,\alpha) &= \mathbb{E}\left[ \int_t^T f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + g(X^{t,x,\alpha}_T) \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + \int_u^T f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + g(X^{t,x,\alpha}_T) \,\Big|\, \mathcal{F}_u \right] \right] \\
&= \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds \right] + \mathbb{E}\left[ \mathbb{E}\left[ \int_u^T f\big(s, X^{u,X^{t,x,\alpha}_u,\alpha}_s, \alpha_s\big)\,ds + g\big(X^{u,X^{t,x,\alpha}_u,\alpha}_T\big) \,\Big|\, \mathcal{F}_u \right] \right] \\
&= \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + J(u, X^{t,x,\alpha}_u, \alpha) \right].
\end{align*}


But $v(u, X^{t,x,\alpha}_u) = \sup_{\alpha \in \mathcal{A}(u, X^{t,x,\alpha}_u)} J(u, X^{t,x,\alpha}_u, \alpha)$, so $J(u, X^{t,x,\alpha}_u, \alpha) \le v(u, X^{t,x,\alpha}_u)$. Hence,
\[ J(t,x,\alpha) \le \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right], \]
and taking the supremum over $\alpha \in \mathcal{A}(t,x)$ on both sides,
\[ v(t,x) \le \bar{v}(t,x). \tag{3.18} \]

ii) Now, we prove that $v(t,x) \ge \bar{v}(t,x)$.

Fix some arbitrary control $\alpha \in \mathcal{A}(t,x)$ and $u \in (t,T)$. By the definition of the value function, for any $\varepsilon > 0$ there exists $\alpha^\varepsilon \in \mathcal{A}(u, X^{t,x,\alpha}_u)$ which is an $\varepsilon$-optimal control for $v(u, X^{t,x,\alpha}_u)$, i.e.,
\[ v(u, X^{t,x,\alpha}_u) - \varepsilon \le J(u, X^{t,x,\alpha}_u, \alpha^\varepsilon). \tag{3.19} \]

Let us now define the process
\[ \hat{\alpha}_s = \begin{cases} \alpha_s, & \text{if } s \in [t,u], \\ \alpha^\varepsilon_s, & \text{if } s \in (u,T]. \end{cases} \]

By the Measurable Selection Theorem, the process $\hat{\alpha}$ is progressively measurable and, hence, $\hat{\alpha} \in \mathcal{A}(t,x)$. This point is delicate and not trivial, because measurability questions arise here (see e.g. [BeSh78]).

Then, using the law of iterated conditional expectations and (3.19), we have:
\begin{align*}
v(t,x) \ge J(t,x,\hat{\alpha}) &= \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + J(u, X^{t,x,\alpha}_u, \alpha^\varepsilon) \right] \\
&\ge \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right] - \varepsilon.
\end{align*}

From the arbitrariness of $\alpha \in \mathcal{A}(t,x)$ and $\varepsilon > 0$, we obtain
\[ v(t,x) \ge \sup_{\alpha \in \mathcal{A}(t,x)} \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + v(u, X^{t,x,\alpha}_u) \right] = \bar{v}(t,x). \tag{3.20} \]

By combining the two relations (3.18) and (3.20), we get the required result. □

Theorem 3.2 (DPP for the infinite horizon). Let $x \in \mathbb{R}$ and let $u > 0$ be fixed. Then we have:
\[ v(x) = \sup_{\alpha \in \mathcal{A}(x)} \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} v(X^{x,\alpha}_u) \right]. \tag{3.21} \]


Proof. Let $\bar{v}(x) = \sup_{\alpha \in \mathcal{A}(x)} \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} v(X^{x,\alpha}_u) \right]$.

(i) First, we prove that $v(x) \le \bar{v}(x)$.

Given an admissible control $\alpha \in \mathcal{A}(x)$, we have, by pathwise uniqueness of the flow of the SDE for $X$, the Markovian structure
\[ X^{x,\alpha}_s = X^{X^{x,\alpha}_u,\, \alpha}_s, \quad s \ge u, \]
for any fixed $u > 0$.

By the law of iterated conditional expectations, we can affirm:
\begin{align*}
J(x,\alpha) &= \mathbb{E}\left[ \int_0^\infty e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + \int_u^\infty e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \,\Big|\, \mathcal{F}_u \right] \right] \\
&= \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right] + \mathbb{E}\left[ \mathbb{E}\left[ \int_0^\infty e^{-\beta(s+u)} f\big(X^{X^{x,\alpha}_u,\alpha}_{s+u}, \alpha_{s+u}\big)\,ds \,\Big|\, \mathcal{F}_u \right] \right] \\
&= \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right] + e^{-\beta u}\, \mathbb{E}\Big[ \underbrace{\mathbb{E}\left[ \int_0^\infty e^{-\beta s} f\big(X^{X^{x,\alpha}_u,\bar{\alpha}}_s, \bar{\alpha}_s\big)\,ds \,\Big|\, \mathcal{F}_u \right]}_{= J(X^{x,\alpha}_u,\, \bar{\alpha}), \text{ where } \bar{\alpha}_s = \alpha_{s+u}} \Big] \\
&= \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} J(X^{x,\alpha}_u, \bar{\alpha}) \right].
\end{align*}

But $v(X^{x,\alpha}_u) = \sup_{\alpha \in \mathcal{A}(X^{x,\alpha}_u)} J(X^{x,\alpha}_u, \alpha)$, so $J(X^{x,\alpha}_u, \bar{\alpha}) \le v(X^{x,\alpha}_u)$. Hence,
\[ J(x,\alpha) \le \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} v(X^{x,\alpha}_u) \right], \]
and taking the supremum over $\alpha \in \mathcal{A}(x)$ on both sides,
\[ v(x) \le \bar{v}(x). \tag{3.22} \]


(ii) Now, we prove that $v(x) \ge \bar{v}(x)$.

Fix some arbitrary control $\alpha \in \mathcal{A}(x)$ and $u > 0$. By the definition of the value function, for any $\varepsilon > 0$ there exists $\alpha^\varepsilon \in \mathcal{A}(X^{x,\alpha}_u)$ which is an $\varepsilon$-optimal control for $v(X^{x,\alpha}_u)$, i.e.,
\[ v(X^{x,\alpha}_u) - \varepsilon \le J(X^{x,\alpha}_u, \alpha^\varepsilon). \tag{3.23} \]

Then, defining $\hat{\alpha}$ by concatenation as in the proof of Theorem 3.1, using the law of iterated conditional expectations and (3.23), we have:
\begin{align*}
v(x) \ge J(x,\hat{\alpha}) &= \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} J(X^{x,\alpha}_u, \alpha^\varepsilon) \right] \\
&\ge \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} v(X^{x,\alpha}_u) \right] - \varepsilon.
\end{align*}

From the arbitrariness of $\alpha \in \mathcal{A}(x)$, $u > 0$ and $\varepsilon > 0$, we obtain
\[ v(x) \ge \sup_{\alpha \in \mathcal{A}(x)} \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} v(X^{x,\alpha}_u) \right] = \bar{v}(x). \tag{3.24} \]

By combining the two relations (3.22) and (3.24), we get the required result.

3.4.2 Using DPP to compute the value function

In this section, we compute the value function heuristically. Here, we are not worried about the precision of the approximation, but rather about the method used to obtain the value function. For more precise approximations, we recommend the references [KloPla99] and [KushDu92].

First, let us consider $v(T,x)$. By equation (3.15), we have:
\[ v(T,x) = \sup_{\alpha \in \mathcal{A}(T,x)} J(T,x,\alpha) = \mathbb{E}\left[ \int_T^T f(s, X^{T,x,\alpha}_s, \alpha_s)\,ds + g(X^{T,x,\alpha}_T) \right]. \]
But $\int_T^T f(s, X^{T,x,\alpha}_s, \alpha_s)\,ds = 0$, hence
\[ v(T,x) = g(x). \tag{3.25} \]

Now let us take $\alpha \in \mathcal{A}(T - \Delta T, x)$, where $\Delta T$ is very small. Then,
\[ v(T - \Delta T, x) = \sup_{\alpha \in \mathcal{A}(T-\Delta T, x)} \mathbb{E}\left[ \int_{T-\Delta T}^T f(s, X^{T-\Delta T,x,\alpha}_s, \alpha_s)\,ds + v(T, X^{T-\Delta T,x,\alpha}_T) \right]. \tag{3.26} \]

Since $\Delta T$ is small, we may restrict attention to constant controls, so that $\mathcal{A}(T - \Delta T, x) \approx A$. Then,
\[ v(T - \Delta T, x) \approx \sup_{a \in A} \mathbb{E}\left[ \int_{T-\Delta T}^T f(s, X^{T-\Delta T,x,a}_s, a)\,ds + g(X^{T-\Delta T,x,a}_T) \right]. \tag{3.27} \]


Moreover, as $\Delta T$ is small, by the Euler approximation we can affirm that
\[ \int_{T-\Delta T}^T f(s, X^{T-\Delta T,x,a}_s, a)\,ds \approx f(T - \Delta T, x, a) \cdot \Delta T. \tag{3.28} \]

Substituting (3.28) into (3.27), we have:
\[ v(T - \Delta T, x) \approx \sup_{a \in A} \left( f(T - \Delta T, x, a) \cdot \Delta T + \mathbb{E}\left[ g(X^{T-\Delta T,x,a}_T) \right] \right). \tag{3.29} \]

Consider now the SDE
\[ dX^{T-\Delta T,x,a}_s = b(X^{T-\Delta T,x,a}_s, a)\,ds + \sigma(X^{T-\Delta T,x,a}_s, a)\,dW_s, \quad X^{T-\Delta T,x,a}_{T-\Delta T} = x. \tag{3.30} \]
To calculate $\mathbb{E}[g(X^{T-\Delta T,x,a}_T)]$, we simulate the SDE above $N$ times. We can rewrite (3.30) as
\[ X^{T-\Delta T,x,a}_T = x + \int_{T-\Delta T}^T b(X^{T-\Delta T,x,a}_s, a)\,ds + \int_{T-\Delta T}^T \sigma(X^{T-\Delta T,x,a}_s, a)\,dW_s. \tag{3.31} \]
So, by the Euler approximation, we have:
\[ X^{T-\Delta T,x,a}_T \approx x + b(x,a) \cdot \Delta T + \sigma(x,a)\,(W_T - W_{T-\Delta T}). \tag{3.32} \]

We already know $x$, $b(x,a)$ and $\sigma(x,a)$, and, moreover, $W_T - W_{T-\Delta T} \sim N(0, \Delta T)$. Hence, we only need to simulate $N$ times $\sqrt{\Delta T}\, Z \stackrel{D}{=} W_T - W_{T-\Delta T}$, where $Z \sim N(0,1)$. Let $z_1, z_2, \dots, z_N$ be the simulated values. Then, using Monte Carlo simulation, we can affirm that
\[ \mathbb{E}\left[ g(X^{T-\Delta T,x,a}_T) \right] \approx \frac{1}{N} \sum_{i=1}^N g\big( x + b(x,a)\Delta T + \sigma(x,a)\sqrt{\Delta T}\, z_i \big). \tag{3.33} \]

Substituting (3.33) into (3.29), we have:
\[ v(T - \Delta T, x) \approx \sup_{a \in A} \left( f(T - \Delta T, x, a) \cdot \Delta T + \frac{1}{N} \sum_{i=1}^N g\big( x + b(x,a)\Delta T + \sigma(x,a)\sqrt{\Delta T}\, z_i \big) \right). \tag{3.34} \]

For each $x$, the optimization (3.34) can be carried out numerically. Thus we find an approximation of $v(T - \Delta T, x)$ and an optimal control $\hat{\alpha}(T - \Delta T, x)$.

By similar arguments, we can affirm that:
\begin{align*}
v(T - 2\Delta T, x) &= \sup_{a \in A} \mathbb{E}\left[ \int_{T-2\Delta T}^{T-\Delta T} f(s, X^{T-2\Delta T,x,a}_s, a)\,ds + v(T - \Delta T, X^{T-2\Delta T,x,a}_{T-\Delta T}) \right] \\
&\approx \sup_{a \in A} \left( f(T - 2\Delta T, x, a) \cdot \Delta T + \mathbb{E}\left[ v(T - \Delta T, X^{T-2\Delta T,x,a}_{T-\Delta T}) \right] \right). \tag{3.35}
\end{align*}
By (3.34), we already know $v(T - \Delta T, \cdot)$. As in (3.33), we have:
\[ \mathbb{E}\left[ v(T - \Delta T, X^{T-2\Delta T,x,a}_{T-\Delta T}) \right] \approx \frac{1}{N} \sum_{i=1}^N v\big( T - \Delta T,\ x + b(x,a)\Delta T + \sigma(x,a)\sqrt{\Delta T}\, z_i \big). \tag{3.36} \]

This process can be repeated backward until the desired time.
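The backward recursion (3.25), (3.34) and (3.36) can be written as a short program. The sketch below is a minimal implementation of the scheme described above, assuming a fixed control grid in place of the supremum over $A$; the dynamics $b$, $\sigma$, the gains $f$, $g$, and all grids and parameters are illustrative assumptions, not taken from the text.

import numpy as np

# Illustrative sketch of Section 3.4.2: backward dynamic programming with
# a one-step Euler approximation and Monte Carlo averaging of v(t + dT, .).
rng = np.random.default_rng(5)

b = lambda x, a: a            # drift directly controlled (placeholder choice)
sigma = lambda x, a: 1.0      # constant noise (placeholder choice)
f = lambda t, x, a: -a**2     # running cost of using the control (placeholder)
g = lambda x: -x**2           # terminal gain (placeholder)

T, n_t, N = 1.0, 20, 2000
dT = T / n_t
x_grid = np.linspace(-2.0, 2.0, 41)
A = np.linspace(-1.0, 1.0, 21)
z = rng.normal(size=N)                      # the z_i of equation (3.33)

v = g(x_grid)                               # v(T, x) = g(x), equation (3.25)
for k in range(n_t - 1, -1, -1):            # backward in time
    t = k * dT
    v_new = np.empty_like(v)
    for i, x in enumerate(x_grid):
        best = -np.inf
        for a in A:                         # brute-force sup over the grid A
            x_next = x + b(x, a) * dT + sigma(x, a) * np.sqrt(dT) * z
            v_next = np.interp(x_next, x_grid, v)  # evaluate v(t+dT, .) off-grid
            best = max(best, f(t, x, a) * dT + v_next.mean())  # (3.34)/(3.36)
        v_new[i] = best
    v = v_new
print(v[len(x_grid) // 2])                  # approximate v(0, 0)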


3.5 Hamilton-Jacobi-Bellman Equation (HJB)

In the previous section, we introduced the dynamic programming method for solving a stochastic control problem in the framework of controlled diffusions. The Dynamic Programming Principle yields a certain partial differential equation (PDE), of second order and nonlinear, called the Hamilton-Jacobi-Bellman (HJB) equation. Let us derive this equation for the finite and infinite horizon problems.

3.5.1 Formal derivation of HJB

Finite horizon problem

Let us consider the time $u = t + h$ and a constant control $\alpha_s = a$, for some arbitrary $a \in A$, in the relation (3.16) of the DPP:
\[ v(t,x) \ge \mathbb{E}\left[ \int_t^{t+h} f(s, X^{t,x,\alpha}_s, a)\,ds + v(t+h, X^{t,x,\alpha}_{t+h}) \right]. \tag{3.37} \]

Assuming that $v$ is smooth enough, we may apply Ito's formula between $t$ and $t+h$:
\begin{align*}
v(t+h, X^{t,x,\alpha}_{t+h}) &= v(t,x) + \int_t^{t+h} \frac{\partial v}{\partial t}(s, X^{t,x,\alpha}_s)\,ds + \int_t^{t+h} \frac{\partial v}{\partial x}(s, X^{t,x,\alpha}_s)\,dX^{t,x,\alpha}_s \\
&\quad + \frac{1}{2} \int_t^{t+h} \frac{\partial^2 v}{\partial x^2}(s, X^{t,x,\alpha}_s)\,(dX^{t,x,\alpha}_s)^2. \tag{3.38}
\end{align*}

By (3.1), we can affirm that:
\[ dX^{t,x,\alpha}_s = b(X^{t,x,\alpha}_s, a)\,ds + \sigma(X^{t,x,\alpha}_s, a)\,dW_s. \tag{3.39} \]
Then, by box calculus,
\[ (dX^{t,x,\alpha}_s)^2 = b^2(X^{t,x,\alpha}_s, a)\underbrace{(ds)^2}_{=0} + \sigma^2(X^{t,x,\alpha}_s, a)\underbrace{(dW_s)^2}_{=ds} + 2\,b(X^{t,x,\alpha}_s, a)\,\sigma(X^{t,x,\alpha}_s, a)\underbrace{ds \cdot dW_s}_{=0} = \sigma^2(X^{t,x,\alpha}_s, a)\,ds. \tag{3.40} \]

By substituting (3.39) and (3.40) into (3.38), we have:
\begin{align*}
v(t+h, X^{t,x,\alpha}_{t+h}) &= v(t,x) + \int_t^{t+h} \frac{\partial v}{\partial t}(s, X^{t,x,\alpha}_s)\,ds \\
&\quad + \int_t^{t+h} \frac{\partial v}{\partial x}(s, X^{t,x,\alpha}_s)\big( b(X^{t,x,\alpha}_s, a)\,ds + \sigma(X^{t,x,\alpha}_s, a)\,dW_s \big) \\
&\quad + \frac{1}{2} \int_t^{t+h} \frac{\partial^2 v}{\partial x^2}(s, X^{t,x,\alpha}_s)\,\sigma^2(X^{t,x,\alpha}_s, a)\,ds. \tag{3.41}
\end{align*}


By substituting (3.41) into (3.37), and using that the stochastic integral has zero expectation, we have:
\begin{align*}
0 &\ge \mathbb{E}\left[ \int_t^{t+h} f(s, X^{t,x,\alpha}_s, a)\,ds \right] + \mathbb{E}\left[ \int_t^{t+h} \frac{\partial v}{\partial t}(s, X^{t,x,\alpha}_s)\,ds \right] \\
&\quad + \mathbb{E}\left[ \int_t^{t+h} \frac{\partial v}{\partial x}(s, X^{t,x,\alpha}_s)\,b(X^{t,x,\alpha}_s, a)\,ds \right] + \mathbb{E}\left[ \frac{1}{2} \int_t^{t+h} \frac{\partial^2 v}{\partial x^2}(s, X^{t,x,\alpha}_s)\,\sigma^2(X^{t,x,\alpha}_s, a)\,ds \right].
\end{align*}

Dividing by $h$ and sending $h$ to $0$, this yields
\[ 0 \ge f(t,x,a) + \frac{\partial v}{\partial t}(t,x) + \frac{\partial v}{\partial x}(t,x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(t,x)\,\sigma^2(x,a). \]

Since this holds true for any $a \in A$,
\[ -\sup_{a \in A} \left[ f(t,x,a) + \frac{\partial v}{\partial x}(t,x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(t,x)\,\sigma^2(x,a) \right] - \frac{\partial v}{\partial t}(t,x) \ge 0. \tag{3.42} \]

Now, let us suppose that $\alpha^*$ is an optimal control. Then, in equation (3.15), we have:
\[ v(t,x) = \mathbb{E}\left[ \int_t^{t+h} f(s, X^{t,x,\alpha^*}_s, \alpha^*_s)\,ds + v(t+h, X^{t,x,\alpha^*}_{t+h}) \right], \tag{3.43} \]
where $X^{t,x,\alpha^*}$ is the state solution of (3.1) starting from $x$ at $t$, with control $\alpha^*$. By arguments similar to the above, we then get
\[ -\left[ f(t,x,\alpha^*_t) + \frac{\partial v}{\partial x}(t,x)\,b(x,\alpha^*_t) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(t,x)\,\sigma^2(x,\alpha^*_t) \right] - \frac{\partial v}{\partial t}(t,x) = 0. \tag{3.44} \]

Combining (3.44) with (3.42) suggests that $v$ should satisfy
\[ -\sup_{a \in A} \left[ f(t,x,a) + \frac{\partial v}{\partial x}(t,x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(t,x)\,\sigma^2(x,a) \right] - \frac{\partial v}{\partial t}(t,x) = 0, \tag{3.45} \]


for all $(t,x) \in [0,T) \times \mathbb{R}$, if the above supremum is finite. Usually, this PDE is rewritten in the form
\[ -H\!\left( t, x, \frac{\partial v}{\partial x}(t,x), \frac{\partial^2 v}{\partial x^2}(t,x) \right) - \frac{\partial v}{\partial t}(t,x) = 0, \quad \forall (t,x) \in [0,T) \times \mathbb{R}, \tag{3.46} \]
where, for $(t,x,p,\gamma) \in [0,T] \times \mathbb{R} \times \mathbb{R} \times \mathbb{R}$,
\[ H(t,x,p,\gamma) = \sup_{a \in A} \left[ f(t,x,a) + p \cdot b(x,a) + \frac{1}{2}\, \gamma\, \sigma^2(x,a) \right]. \tag{3.47} \]

The function $H$ is called the Hamiltonian of the associated control problem. Equation (3.46) is called the dynamic programming equation or Hamilton-Jacobi-Bellman (HJB) equation.

From the definition (3.13) of the value function $v$ considered at the horizon date $T$, we can affirm that the terminal condition associated with this PDE is
\[ v(T,x) = g(x), \quad \forall x \in \mathbb{R}. \tag{3.48} \]
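When no closed form is available, the Hamiltonian (3.47) can be evaluated pointwise by brute force over a discretized control set. The sketch below is illustrative: $f$, $b$, $\sigma$ and the argument values are placeholder assumptions, chosen so that $H(t,x,p,\gamma) = p^2/4 + \gamma/2$ in closed form, which the grid search recovers.

import numpy as np

def hamiltonian(t, x, p, gamma, f, b, sigma, A):
    # Brute-force sup over a discretized control set, cf. equation (3.47).
    return max(f(t, x, a) + p * b(x, a) + 0.5 * gamma * sigma(x, a) ** 2 for a in A)

A = np.linspace(-1.0, 1.0, 201)
H = hamiltonian(0.0, 0.5, p=1.0, gamma=-2.0,
                f=lambda t, x, a: -a**2, b=lambda x, a: a,
                sigma=lambda x, a: 1.0, A=A)
print(H, 1.0**2 / 4 + 0.5 * (-2.0))  # grid search vs. closed form: both -0.75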

Remark 3.5. When looking at the minimization problem
\[ v(t,x) = \inf_{\alpha \in \mathcal{A}(t,x)} \mathbb{E}\left[ \int_t^T f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + g(X^{t,x,\alpha}_T) \right], \]
we can go back to a maximization problem by considering the value $-v$. This is equivalent to considering the Hamiltonian function
\[ H(t,x,p,\gamma) = \inf_{a \in A} \left[ f(t,x,a) + p \cdot b(x,a) + \frac{1}{2}\, \gamma\, \sigma^2(x,a) \right] \]
with the HJB equation
\[ -H\!\left( t, x, \frac{\partial v}{\partial x}(t,x), \frac{\partial^2 v}{\partial x^2}(t,x) \right) - \frac{\partial v}{\partial t}(t,x) = 0, \quad \forall (t,x) \in [0,T) \times \mathbb{R}. \]

Infinite horizon problem

Let us consider a constant control $\alpha_s = a$, for some arbitrary $a \in A$. By Equation (3.21), we can affirm:
\[ v(x) \ge \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds + e^{-\beta u} v(X^{x,\alpha}_u) \right]. \tag{3.49} \]

Let $\phi(u, X^{x,\alpha}_u) = e^{-\beta u} v(X^{x,\alpha}_u)$. Then, by Ito's formula,
\begin{align*}
d\phi(u, X^{x,\alpha}_u) &= \frac{\partial \phi}{\partial u}(u, X^{x,\alpha}_u)\,du + \frac{\partial \phi}{\partial x}(u, X^{x,\alpha}_u)\,dX^{x,\alpha}_u + \frac{1}{2} \frac{\partial^2 \phi}{\partial x^2}(u, X^{x,\alpha}_u)\,(dX^{x,\alpha}_u)^2 \\
&= -\beta e^{-\beta u} v(X^{x,\alpha}_u)\,du + e^{-\beta u} \frac{\partial v}{\partial x}(X^{x,\alpha}_u)\big( b(X^{x,\alpha}_u, a)\,du + \sigma(X^{x,\alpha}_u, a)\,dW_u \big) \\
&\quad + \frac{1}{2} e^{-\beta u} \frac{\partial^2 v}{\partial x^2}(X^{x,\alpha}_u)\,\sigma^2(X^{x,\alpha}_u, a)\,du.
\end{align*}


So, integrating from $0$ to $u$ and using $\phi(0, X^{x,\alpha}_0) = e^{-\beta \cdot 0}\, v(X^{x,\alpha}_0) = v(x)$, we have:
\begin{align*}
\phi(u, X^{x,\alpha}_u) &= v(x) + \int_0^u \Big( -\beta e^{-\beta s} v(X^{x,\alpha}_s) + e^{-\beta s} \frac{\partial v}{\partial x}(X^{x,\alpha}_s)\,b(X^{x,\alpha}_s, a) + \frac{1}{2} e^{-\beta s} \frac{\partial^2 v}{\partial x^2}(X^{x,\alpha}_s)\,\sigma^2(X^{x,\alpha}_s, a) \Big)\,ds \\
&\quad + \underbrace{\int_0^u e^{-\beta s} \frac{\partial v}{\partial x}(X^{x,\alpha}_s)\,\sigma(X^{x,\alpha}_s, a)\,dW_s}_{\text{a martingale}}. \tag{3.50}
\end{align*}

Substituting (3.50) into (3.49), and using that the stochastic integral has zero expectation, we have:
\begin{align*}
v(x) &\ge v(x) + \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right] - \beta\, \mathbb{E}\left[ \int_0^u e^{-\beta s} v(X^{x,\alpha}_s)\,ds \right] \\
&\quad + \mathbb{E}\left[ \int_0^u e^{-\beta s} \frac{\partial v}{\partial x}(X^{x,\alpha}_s)\,b(X^{x,\alpha}_s, a)\,ds \right] + \frac{1}{2}\, \mathbb{E}\left[ \int_0^u e^{-\beta s} \frac{\partial^2 v}{\partial x^2}(X^{x,\alpha}_s)\,\sigma^2(X^{x,\alpha}_s, a)\,ds \right].
\end{align*}
Hence,
\[ 0 \ge \mathbb{E}\left[ \int_0^u e^{-\beta s} \Big( f(X^{x,\alpha}_s, \alpha_s) - \beta v(X^{x,\alpha}_s) + \frac{\partial v}{\partial x}(X^{x,\alpha}_s)\,b(X^{x,\alpha}_s, a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(X^{x,\alpha}_s)\,\sigma^2(X^{x,\alpha}_s, a) \Big)\,ds \right]. \]

Dividing by $u$ and sending $u$ to $0$, this yields
\[ 0 \ge f(x,a) - \beta v(x) + \frac{\partial v}{\partial x}(x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(x)\,\sigma^2(x,a). \]

So,
\[ \beta v(x) - \left[ f(x,a) + \frac{\partial v}{\partial x}(x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(x)\,\sigma^2(x,a) \right] \ge 0. \]


Since this holds true for any $a \in A$,
\[ \beta v(x) - \sup_{a \in A} \left[ f(x,a) + \frac{\partial v}{\partial x}(x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(x)\,\sigma^2(x,a) \right] \ge 0. \]

Now, let us suppose that $\alpha^*$ is an optimal control. Then, in equation (3.21), we have:
\[ v(x) = \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha^*}_s, \alpha^*_s)\,ds + \phi(u, X^{x,\alpha^*}_u) \right], \]
where $X^{x,\alpha^*}$ is the state solution of equation (3.1) starting from $x$ at $0$ with control $\alpha^*$. By arguments similar to those of the finite horizon problem, this suggests that $v$ should satisfy
\[ \beta v(x) - \sup_{a \in A} \left[ f(x,a) + \frac{\partial v}{\partial x}(x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 v}{\partial x^2}(x)\,\sigma^2(x,a) \right] = 0, \]
for all $x \in \mathbb{R}$, if the above supremum is finite. This PDE may also be written as
\[ \beta v(x) - H\!\left( x, \frac{\partial v}{\partial x}(x), \frac{\partial^2 v}{\partial x^2}(x) \right) = 0, \quad \forall x \in \mathbb{R}, \tag{3.51} \]
where
\[ H(x,p,\gamma) = \sup_{a \in A} \left[ f(x,a) + p \cdot b(x,a) + \frac{1}{2}\, \gamma\, \sigma^2(x,a) \right]. \tag{3.52} \]

Similarly to the finite horizon problem, the function $H$ is called the Hamiltonian of the associated control problem. Equation (3.51) is called the dynamic programming equation or Hamilton-Jacobi-Bellman (HJB) equation for the infinite horizon problem.

3.6 Verification Theorem

In the previous section, we learned how to find, when it exists, a smooth function solving the HJB equation. A natural question then arises: does this candidate coincide with the value function? To answer it, we will see the Verification Theorem for the finite and infinite horizon problems. Besides confirming that the smooth solution of the HJB equation is indeed the value function, the Verification Theorem allows us to exhibit, as a by-product, an optimal Markovian control.

Given the assumptions stated in the previous section, let us formulate the:

Theorem 3.3 (Verification Theorem).

Finite horizon problem

Let $\varphi$ be a function in $C^{1,2}([0,T] \times \mathbb{R})$.


(i) Suppose that
\[ -\frac{\partial \varphi}{\partial t}(t,x) - \sup_{a \in A} \left[ f(t,x,a) + \frac{\partial \varphi}{\partial x}(t,x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(t,x)\,\sigma^2(x,a) \right] \ge 0, \tag{3.53} \]
\[ \varphi(T,x) \ge g(x), \quad x \in \mathbb{R}. \tag{3.54} \]
Then $\varphi \ge v$ on $[0,T] \times \mathbb{R}$.

(ii) Suppose further that $\varphi(T,\cdot) = g$ and that there exists a measurable function $\hat{\alpha}(t,x)$, $(t,x) \in (0,T) \times \mathbb{R}$, valued in $A$, such that
\begin{align*}
&-\frac{\partial \varphi}{\partial t}(t,x) - \sup_{a \in A} \left[ f(t,x,a) + \frac{\partial \varphi}{\partial x}(t,x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(t,x)\,\sigma^2(x,a) \right] \\
&= -\frac{\partial \varphi}{\partial t}(t,x) - f(t,x,\hat{\alpha}(t,x)) - \frac{\partial \varphi}{\partial x}(t,x)\,b(x,\hat{\alpha}(t,x)) - \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(t,x)\,\sigma^2(x,\hat{\alpha}(t,x)) = 0,
\end{align*}
that the SDE
\[ dX_s = b(X_s, \hat{\alpha}(s, X_s))\,ds + \sigma(X_s, \hat{\alpha}(s, X_s))\,dW_s \tag{3.55} \]
admits a unique solution, denoted by $\hat{X}^{t,x}_s$, given the initial condition $\hat{X}^{t,x}_t = x$, and that the process $\{\hat{\alpha}(s, \hat{X}^{t,x}_s),\ t \le s \le T\}$ lies in $\mathcal{A}(t,x)$. Then
\[ \varphi = v \quad \text{on } [0,T] \times \mathbb{R} \tag{3.56} \]
and $\hat{\alpha}$ is an optimal Markovian control.

Proof.

(i) By hypothesis, $\varphi \in C^{1,2}([0,T] \times \mathbb{R})$. Then, applying Ito's formula, we have, for all $(t,x) \in [0,T) \times \mathbb{R}$, $\alpha \in \mathcal{A}(t,x)$ and fixed $u \in [t,T)$:
\begin{align*}
\varphi(u, X^{t,x,\alpha}_u) &= \varphi(t,x) + \int_t^u \frac{\partial \varphi}{\partial t}(s, X^{t,x,\alpha}_s)\,ds + \int_t^u \frac{\partial \varphi}{\partial x}(s, X^{t,x,\alpha}_s)\,b(X^{t,x,\alpha}_s, \alpha_s)\,ds \\
&\quad + \frac{1}{2} \int_t^u \frac{\partial^2 \varphi}{\partial x^2}(s, X^{t,x,\alpha}_s)\,\sigma^2(X^{t,x,\alpha}_s, \alpha_s)\,ds + \underbrace{\int_t^u \frac{\partial \varphi}{\partial x}(s, X^{t,x,\alpha}_s)\,\sigma(X^{t,x,\alpha}_s, \alpha_s)\,dW_s}_{\text{a martingale}}.
\end{align*}


Taking the expectation, we get:
\[ \mathbb{E}\left[ \varphi(u, X^{t,x,\alpha}_u) \right] = \varphi(t,x) + \mathbb{E}\left[ \int_t^u \Big( \frac{\partial \varphi}{\partial t}(s, X^{t,x,\alpha}_s) + \frac{\partial \varphi}{\partial x}(s, X^{t,x,\alpha}_s)\,b(X^{t,x,\alpha}_s, \alpha_s) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(s, X^{t,x,\alpha}_s)\,\sigma^2(X^{t,x,\alpha}_s, \alpha_s) \Big)\,ds \right]. \tag{3.57} \]

But, as $\varphi$ satisfies condition (3.53), we can affirm that
\[ -f(s, X^{t,x,\alpha}_s, \alpha_s) \ge \frac{\partial \varphi}{\partial t}(s, X^{t,x,\alpha}_s) + \frac{\partial \varphi}{\partial x}(s, X^{t,x,\alpha}_s)\,b(X^{t,x,\alpha}_s, \alpha_s) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(s, X^{t,x,\alpha}_s)\,\sigma^2(X^{t,x,\alpha}_s, \alpha_s). \tag{3.58} \]

Substituting (3.58) into (3.57), we have:
\[ \mathbb{E}\left[ \varphi(u, X^{t,x,\alpha}_u) \right] \le \varphi(t,x) - \mathbb{E}\left[ \int_t^u f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds \right], \quad \forall \alpha \in \mathcal{A}(t,x). \tag{3.59} \]

Since $\varphi$ is continuous on $[0,T] \times \mathbb{R}$, sending $u$ to $T$ we obtain, for all $\alpha \in \mathcal{A}(t,x)$, by the Dominated Convergence Theorem and (3.54):
\begin{align*}
\mathbb{E}\left[ g(X^{t,x,\alpha}_T) \right] &\le \varphi(t,x) - \mathbb{E}\left[ \int_t^T f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds \right] \\
&\Rightarrow \mathbb{E}\left[ \int_t^T f(s, X^{t,x,\alpha}_s, \alpha_s)\,ds + g(X^{t,x,\alpha}_T) \right] \le \varphi(t,x) \\
&\Rightarrow J(t,x,\alpha) \le \varphi(t,x) \Rightarrow v(t,x) \le \varphi(t,x). \tag{3.60}
\end{align*}

(ii) Applying Ito's formula to $\varphi(s, \hat{X}^{t,x}_s)$ between $t \in [0,T)$ and $u \in [t,T)$, and taking expectations as above, we have:
\[ \mathbb{E}\left[ \varphi(u, \hat{X}^{t,x}_u) \right] = \varphi(t,x) + \mathbb{E}\left[ \int_t^u \Big( \frac{\partial \varphi}{\partial t}(s, \hat{X}^{t,x}_s) + \frac{\partial \varphi}{\partial x}(s, \hat{X}^{t,x}_s)\,b(\hat{X}^{t,x}_s, \hat{\alpha}_s) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(s, \hat{X}^{t,x}_s)\,\sigma^2(\hat{X}^{t,x}_s, \hat{\alpha}_s) \Big)\,ds \right], \tag{3.61} \]


where $\hat{\alpha}_s = \hat{\alpha}(s, \hat{X}^{t,x}_s)$.

But, by the definition of $\hat{\alpha}(t,x)$, we have:
\[ -\frac{\partial \varphi}{\partial t}(t,x) - f(t,x,\hat{\alpha}(t,x)) - \frac{\partial \varphi}{\partial x}(t,x)\,b(x,\hat{\alpha}(t,x)) - \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(t,x)\,\sigma^2(x,\hat{\alpha}(t,x)) = 0, \]
and so
\[ \mathbb{E}\left[ \varphi(u, \hat{X}^{t,x}_u) \right] = \varphi(t,x) - \mathbb{E}\left[ \int_t^u f\big(s, \hat{X}^{t,x}_s, \hat{\alpha}(s, \hat{X}^{t,x}_s)\big)\,ds \right]. \]

Sending $u$ to $T$ and using $\varphi(T,\cdot) = g$, we obtain:
\begin{align*}
\mathbb{E}\left[ g(\hat{X}^{t,x}_T) \right] &= \varphi(t,x) - \mathbb{E}\left[ \int_t^T f\big(s, \hat{X}^{t,x}_s, \hat{\alpha}(s, \hat{X}^{t,x}_s)\big)\,ds \right] \\
&\Rightarrow \mathbb{E}\left[ \int_t^T f\big(s, \hat{X}^{t,x}_s, \hat{\alpha}(s, \hat{X}^{t,x}_s)\big)\,ds + g(\hat{X}^{t,x}_T) \right] = \varphi(t,x) \\
&\Rightarrow J(t,x,\hat{\alpha}) = \varphi(t,x) \Rightarrow v(t,x) \ge \varphi(t,x). \tag{3.62}
\end{align*}

Combining (3.60) with (3.62), we conclude that $\varphi = v$, with $\hat{\alpha}$ as an optimal Markovian control. □

Infinite horizon problem

Let $\varphi \in C^2(\mathbb{R})$ satisfy the growth condition (3.10) of Remark 3.2.

(i) Suppose that
\[ \beta\varphi(x) - \sup_{a \in A} \left[ f(x,a) + \frac{\partial \varphi}{\partial x}(x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(x)\,\sigma^2(x,a) \right] \ge 0, \quad x \in \mathbb{R}, \tag{3.63} \]
\[ \limsup_{T \to \infty} e^{-\beta T}\, \mathbb{E}\left[ \varphi(X^{x,\alpha}_T) \right] \ge 0, \quad x \in \mathbb{R},\ \forall \alpha \in \mathcal{A}(x). \tag{3.64} \]
Then $\varphi \ge v$ on $\mathbb{R}$.

(ii) Suppose further that there exists a measurable function $\hat{\alpha}(x)$, $x \in \mathbb{R}$, valued in $A$, such that
\[ \beta\varphi(x) - \sup_{a \in A} \left[ f(x,a) + \frac{\partial \varphi}{\partial x}(x)\,b(x,a) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(x)\,\sigma^2(x,a) \right] = \beta\varphi(x) - f(x,\hat{\alpha}(x)) - \frac{\partial \varphi}{\partial x}(x)\,b(x,\hat{\alpha}(x)) - \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(x)\,\sigma^2(x,\hat{\alpha}(x)) = 0, \]
that the SDE
\[ dX_s = b(X_s, \hat{\alpha}(X_s))\,ds + \sigma(X_s, \hat{\alpha}(X_s))\,dW_s \]
admits a unique solution, denoted by $\hat{X}^x_s$, given the initial condition $\hat{X}^x_0 = x$, satisfying
\[ \liminf_{T \to \infty} e^{-\beta T}\, \mathbb{E}\left[ \varphi(\hat{X}^x_T) \right] \le 0, \tag{3.65} \]
and that the process $\{\hat{\alpha}(\hat{X}^x_s),\ s \ge 0\}$ lies in $\mathcal{A}(x)$. Then
\[ \varphi(x) = v(x), \quad \forall x \in \mathbb{R}, \]
and $\hat{\alpha}$ is an optimal Markovian control.

Proof.

(i) By hypothesis, $\varphi \in C^2(\mathbb{R})$. Then, applying Ito's formula to $e^{-\beta s}\varphi(X^{x,\alpha}_s)$, we have, for all $x \in \mathbb{R}$, $\alpha \in \mathcal{A}(x)$ and fixed $u \in [0,\infty)$:
\begin{align*}
e^{-\beta u}\varphi(X^{x,\alpha}_u) &= \varphi(x) + \int_0^u e^{-\beta s} \Big( -\beta\varphi(X^{x,\alpha}_s) + \frac{\partial \varphi}{\partial x}(X^{x,\alpha}_s)\,b(X^{x,\alpha}_s, \alpha_s) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(X^{x,\alpha}_s)\,\sigma^2(X^{x,\alpha}_s, \alpha_s) \Big)\,ds \\
&\quad + \underbrace{\int_0^u e^{-\beta s} \frac{\partial \varphi}{\partial x}(X^{x,\alpha}_s)\,\sigma(X^{x,\alpha}_s, \alpha_s)\,dW_s}_{\text{a martingale}}.
\end{align*}

Taking the expectation, we get:
\[ \mathbb{E}\left[ e^{-\beta u}\varphi(X^{x,\alpha}_u) \right] = \varphi(x) + \mathbb{E}\left[ \int_0^u e^{-\beta s} \Big( -\beta\varphi(X^{x,\alpha}_s) + \frac{\partial \varphi}{\partial x}(X^{x,\alpha}_s)\,b(X^{x,\alpha}_s, \alpha_s) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(X^{x,\alpha}_s)\,\sigma^2(X^{x,\alpha}_s, \alpha_s) \Big)\,ds \right]. \tag{3.66} \]

But, as $\varphi$ satisfies condition (3.63), we can affirm that
\[ -\beta\varphi(X^{x,\alpha}_s) + \frac{\partial \varphi}{\partial x}(X^{x,\alpha}_s)\,b(X^{x,\alpha}_s, \alpha_s) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(X^{x,\alpha}_s)\,\sigma^2(X^{x,\alpha}_s, \alpha_s) \le -f(X^{x,\alpha}_s, \alpha_s). \tag{3.67} \]


Substituting (3.67) into (3.66), we have:
\[ \mathbb{E}\left[ e^{-\beta u}\varphi(X^{x,\alpha}_u) \right] \le \varphi(x) - \mathbb{E}\left[ \int_0^u e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right]. \tag{3.68} \]

Sending $u$ to infinity, we obtain, for all $\alpha \in \mathcal{A}(x)$, by (3.64):
\begin{align*}
\underbrace{\limsup_{u \to \infty} \mathbb{E}\left[ e^{-\beta u}\varphi(X^{x,\alpha}_u) \right]}_{\ge 0} \le \varphi(x) - \mathbb{E}\left[ \int_0^\infty e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right] &\Rightarrow \mathbb{E}\left[ \int_0^\infty e^{-\beta s} f(X^{x,\alpha}_s, \alpha_s)\,ds \right] \le \varphi(x) \\
&\Rightarrow J(x,\alpha) \le \varphi(x) \Rightarrow v(x) \le \varphi(x). \tag{3.69}
\end{align*}

(ii) Applying Ito's formula to $e^{-\beta s}\varphi(\hat{X}^x_s)$ between $0$ and $u \in (0,\infty)$, and taking expectations, we have:
\[ \mathbb{E}\left[ e^{-\beta u}\varphi(\hat{X}^x_u) \right] = \varphi(x) + \mathbb{E}\left[ \int_0^u e^{-\beta s} \Big( -\beta\varphi(\hat{X}^x_s) + \frac{\partial \varphi}{\partial x}(\hat{X}^x_s)\,b(\hat{X}^x_s, \hat{\alpha}_s) + \frac{1}{2} \frac{\partial^2 \varphi}{\partial x^2}(\hat{X}^x_s)\,\sigma^2(\hat{X}^x_s, \hat{\alpha}_s) \Big)\,ds \right], \]
where $\hat{\alpha}_s = \hat{\alpha}(\hat{X}^x_s)$. By the definition of $\hat{\alpha}(x)$, we can affirm that:
\[ \mathbb{E}\left[ e^{-\beta u}\varphi(\hat{X}^x_u) \right] = \varphi(x) - \mathbb{E}\left[ \int_0^u e^{-\beta s} f\big(\hat{X}^x_s, \hat{\alpha}(\hat{X}^x_s)\big)\,ds \right]. \]

Sending $u$ to infinity, we obtain, by (3.65):
\begin{align*}
\underbrace{\liminf_{u \to \infty} \mathbb{E}\left[ e^{-\beta u}\varphi(\hat{X}^x_u) \right]}_{\le 0} = \varphi(x) - \mathbb{E}\left[ \int_0^\infty e^{-\beta s} f\big(\hat{X}^x_s, \hat{\alpha}(\hat{X}^x_s)\big)\,ds \right] &\Rightarrow \mathbb{E}\left[ \int_0^\infty e^{-\beta s} f\big(\hat{X}^x_s, \hat{\alpha}(\hat{X}^x_s)\big)\,ds \right] \ge \varphi(x) \\
&\Rightarrow J(x,\hat{\alpha}) \ge \varphi(x) \Rightarrow v(x) \ge \varphi(x). \tag{3.70}
\end{align*}

Then, by combining (3.69) and (3.70), we conclude that $\varphi = v$, with $\hat{\alpha}$ as an optimal Markovian control. □


Chapter 4

Merton Portfolio Optimization Problem

4.1 The Market Model

Let us consider a model for a financial market with one risk-free asset $S^0 = (S^0_t)_{t\ge 0}$ and one risky asset $S^1 = (S^1_t)_{t\ge 0}$, whose dynamics are given by
\[ \begin{cases} dS^0_t = S^0_t\, r_t\,dt, \\ dS^1_t = S^1_t\, [\mu_t\,dt + \sigma_t\,dW_t], \end{cases} \]
for $0 \le t \le T$, with $S^1_0 > 0$, where:

(i) $r$, $\mu$ and $\sigma$ are supposed to be progressively measurable 1-dimensional processes;

(ii) $W$ is a Brownian motion.

We denote by $\pi = (\pi^0_t, \pi^1_t)_{0 \le t \le T}$ a small investor's portfolio, where $\pi^0_t$ and $\pi^1_t$ represent the amounts in cash invested in the bond (risk-free asset) and in the risky asset, respectively.

The value at time $t$ of such a portfolio is denoted by
\[ X_t = X^\pi_t = \pi^0_t + \pi^1_t. \tag{4.1} \]

Definition 4.1. A portfolio $\pi$ is said to be admissible if its components are progressively measurable and
\[ \int_0^T |\pi^0_t|\,dt < \infty, \tag{4.2} \]
\[ \int_0^T |\pi^1_t \sigma_t|^2\,dt < \infty \tag{4.3} \]
and
\[ X^\pi_t \ge 0, \quad 0 \le t \le T, \tag{4.4} \]
almost surely.

Remark 4.1. A negative $\pi^0_t$ or $\pi^1_t$ corresponds to a short sale, i.e., an occasion when someone sells shares that they have borrowed, hoping that their price will fall before they have to replace them, so that they make a profit.


Definition 4.2. A portfolio $\pi$ is said to be self-financing if
\[ dX^\pi_t = \pi^0_t \frac{dS^0_t}{S^0_t} + \pi^1_t \frac{dS^1_t}{S^1_t}. \tag{4.5} \]

In the market model that we are considering, we have
\[ \frac{dS^0_t}{S^0_t} = r_t\,dt \tag{4.6} \]
and
\[ \frac{dS^1_t}{S^1_t} = \mu_t\,dt + \sigma_t\,dW_t. \tag{4.7} \]

Then, by the self-financing property, we have:
\begin{align*}
dX^\pi_t &= \pi^0_t r_t\,dt + \pi^1_t(\mu_t\,dt + \sigma_t\,dW_t) \\
&= \pi^0_t r_t\,dt + \pi^1_t r_t\,dt + \pi^1_t(\mu_t - r_t)\,dt + \pi^1_t \sigma_t\,dW_t \\
&= (\pi^0_t + \pi^1_t)\, r_t\,dt + \pi^1_t(\mu_t - r_t)\,dt + \pi^1_t \sigma_t\,dW_t \\
&= X^\pi_t r_t\,dt + \pi^1_t(\mu_t - r_t)\,dt + \pi^1_t \sigma_t\,dW_t. \tag{4.8}
\end{align*}

Definition 4.3. We denote by $\mathcal{A}$ the set of self-financing, admissible portfolios. Since, in the present context, the proportion of wealth invested in the stock can be any real number, an admissible control $\alpha = (\alpha_t)_t$ is a progressively measurable process such that
\[ \int_0^T |\alpha_t|^2\,dt < \infty. \]

Let us denote by $\alpha_t$ the proportion of wealth invested in the stock at time $t$, so that
\[ \alpha_t = \frac{\pi^1_t}{X_t} \iff \alpha_t X_t = \pi^1_t. \tag{4.9} \]

Substituting (4.9) into (4.8), we have:
\begin{align*}
dX^{t,x,\alpha}_s &= X^{t,x,\alpha}_s r_s\,ds + \alpha_s X^{t,x,\alpha}_s(\mu_s - r_s)\,ds + \alpha_s X^{t,x,\alpha}_s \sigma_s\,dW_s \\
&= X^{t,x,\alpha}_s \left[ \big( \alpha_s(\mu_s - r_s) + r_s \big)\,ds + \alpha_s \sigma_s\,dW_s \right]. \tag{4.10}
\end{align*}

4.2 The Classical Merton Portfolio Optimization Problem

We will examine two kinds of problem:

1) the maximization of the expected utility of terminal wealth over a finite time horizon;

2) the maximization of the expected utility of consumption over an infinite time horizon.

To solve these problems, we will consider the market model defined in the previous section. For the sake of simplicity, we assume that the coefficients are deterministic and constant over time, so the model reduces to:
\[
dS^0_t = S^0_t\, r\, dt, \quad S^0_0 = 1, \qquad dS^1_t = S^1_t\,[\mu\, dt + \sigma\, dW_t], \quad S^1_0 = s.
\]
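Before specializing further, it may help to see these dynamics numerically. The following is a minimal simulation sketch of the reduced model and of the wealth equation (4.10) under a constant proportion $\alpha$; all numerical parameter values are illustrative assumptions, not taken from the text.

```python
# Minimal simulation sketch of the reduced market model and the wealth
# dynamics (4.10) for a constant proportion alpha; all parameter values
# below are illustrative assumptions, not prescribed by the text.
import numpy as np

rng = np.random.default_rng(0)
r, mu, sigma = 0.03, 0.08, 0.2   # constant coefficients
T, n = 1.0, 1_000                # horizon and number of Euler steps
dt = T / n
alpha, x0 = 0.5, 100.0           # constant proportion of wealth in the stock

dW = rng.normal(0.0, np.sqrt(dt), size=n)
S0 = np.exp(r * dt * np.arange(n + 1))            # bond: S0_t = e^{rt}
S1 = np.empty(n + 1)                              # stock: geometric BM
S1[0] = 1.0
X = np.empty(n + 1)                               # wealth under (4.10)
X[0] = x0
for i in range(n):
    S1[i + 1] = S1[i] * (1.0 + mu * dt + sigma * dW[i])
    X[i + 1] = X[i] * (1.0 + (alpha * (mu - r) + r) * dt + alpha * sigma * dW[i])

print(f"S1_T = {S1[-1]:.4f}, X_T = {X[-1]:.4f}")
```

The Euler scheme is used here only because it mirrors the differential notation of (4.10) line by line; for constant coefficients the stock and the wealth could equally well be simulated exactly as geometric Brownian motions.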


4.2.1 Terminal Utility of Wealth Maximization

Denoting by $(X^{t,x,\alpha}_s)_{t\le s\le T}$ the solution of (4.10) starting from $x$ at time $s = t$, the Merton portfolio optimization problem is to compute
\[
v(t,x) = \sup_{\alpha\in\mathcal{A}} \mathbb{E}\big[U(X^{t,x,\alpha}_T)\big], \quad (t,x)\in[0,T]\times[0,\infty),
\]
where the utility function $U$ is a concave, increasing function on $(0,\infty)$.

Lemma 4.1. For each $t\in[0,T]$, the function $x \mapsto v(t,x)$ is non-decreasing and concave (and strictly concave if there exists an optimal control).

Proof. Part I: non-decreasing.

Fix $t\in[0,T]$ and $x$, $y$ such that $0 < x < y < \infty$. Let $\alpha\in\mathcal{A}$ be a generic admissible control and denote by $X^{t,x,\alpha}_s$ and $X^{t,y,\alpha}_s$ the solutions at time $s$ of (4.10) starting from $x$ and $y$ at time $t$. Then we have:
\[
\left.
\begin{aligned}
dX^{t,x,\alpha}_s &= X^{t,x,\alpha}_s\big[\big(\alpha_s(\mu-r)+r\big)ds + \alpha_s\sigma\, dW_s\big] \\
dX^{t,y,\alpha}_s &= X^{t,y,\alpha}_s\big[\big(\alpha_s(\mu-r)+r\big)ds + \alpha_s\sigma\, dW_s\big]
\end{aligned}
\right\}
\Longrightarrow
d\big(X^{t,y,\alpha}_s - X^{t,x,\alpha}_s\big) = \big(X^{t,y,\alpha}_s - X^{t,x,\alpha}_s\big)\big[\big(\alpha_s(\mu-r)+r\big)ds + \alpha_s\sigma\, dW_s\big].
\]

Let $Y^{t,y-x,\alpha}_s = X^{t,y,\alpha}_s - X^{t,x,\alpha}_s$ and let us compute $d\big(\log Y^{t,y-x,\alpha}_s\big)$:
\[
\begin{aligned}
d\big(\log Y^{t,y-x,\alpha}_s\big)
&= \frac{1}{Y^{t,y-x,\alpha}_s}\, dY^{t,y-x,\alpha}_s
+ \frac{1}{2}\cdot\frac{-1}{\big(Y^{t,y-x,\alpha}_s\big)^2}\,\big(dY^{t,y-x,\alpha}_s\big)^2 \\
&= \big(\alpha_s(\mu-r)+r\big)ds + \alpha_s\sigma\, dW_s
- \frac{1}{2}\cdot\frac{1}{\big(Y^{t,y-x,\alpha}_s\big)^2}\cdot\big(Y^{t,y-x,\alpha}_s\big)^2\,\alpha_s^2\,\sigma^2\, ds \\
&= \left(\alpha_s(\mu-r)+r - \frac{\alpha_s^2\sigma^2}{2}\right)ds + \alpha_s\sigma\, dW_s. \tag{4.11}
\end{aligned}
\]

Then,
\[
\log Y^{t,y-x,\alpha}_s - \log Y^{t,y-x,\alpha}_t
= \int_t^s \left(\alpha_u(\mu-r)+r - \frac{\alpha_u^2\sigma^2}{2}\right)du + \int_t^s \alpha_u\sigma\, dW_u
\]
\[
\Longrightarrow
\log\left(\frac{Y^{t,y-x,\alpha}_s}{y-x}\right)
= \underbrace{\int_t^s \left(\alpha_u(\mu-r)+r - \frac{\alpha_u^2\sigma^2}{2}\right)du + \int_t^s \alpha_u\sigma\, dW_u}_{I_s}
\]
\[
\Longrightarrow
Y^{t,y-x,\alpha}_s = (y-x)\cdot e^{I_s}. \tag{4.12}
\]

41

Page 42: Merton Portfolio Optimization Problem

But $X^{t,y,\alpha}_t - X^{t,x,\alpha}_t = y - x > 0$ and $e^{I_s} > 0$. Hence, by (4.12), we have:
\[
Y^{t,y-x,\alpha}_s > 0 \Longrightarrow X^{t,y,\alpha}_s - X^{t,x,\alpha}_s > 0 \Longrightarrow X^{t,y,\alpha}_s > X^{t,x,\alpha}_s, \quad \forall s\in[t,T]. \tag{4.13}
\]
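As a numerical aside, the exponential representation (4.12) can be checked against a direct Euler discretization of the SDE for $Y$. The sketch below assumes constant coefficients and a constant control; these values are illustrative only.

```python
# Numerical check (sketch) of the exponential representation (4.12):
# Y_s = (y - x) exp(I_s), against an Euler discretization of dY.
# Constant parameters and a constant control are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
r, mu, sigma, alpha = 0.03, 0.08, 0.2, 0.6
t, T, n = 0.0, 1.0, 100_000
dt = (T - t) / n
y_minus_x = 5.0

dW = rng.normal(0.0, np.sqrt(dt), size=n)
drift = alpha * (mu - r) + r

# Exact representation: I_s accumulated by left-point sums
I = np.cumsum((drift - 0.5 * alpha**2 * sigma**2) * dt + alpha * sigma * dW)
Y_exact = y_minus_x * np.exp(I)

# Euler scheme for dY = Y [(alpha(mu - r) + r) ds + alpha sigma dW]
Y_euler = np.empty(n + 1)
Y_euler[0] = y_minus_x
for i in range(n):
    Y_euler[i + 1] = Y_euler[i] * (1.0 + drift * dt + alpha * sigma * dW[i])

# Terminal values agree up to discretization error, and Y stays positive.
print(f"terminal values: exact {Y_exact[-1]:.4f}, Euler {Y_euler[-1]:.4f}")
```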

Now, $U$ being non-decreasing, we have:
\[
\mathbb{E}\big[U\big(X^{t,x,\alpha}_T\big)\big] \le \mathbb{E}\big[U\big(X^{t,y,\alpha}_T\big)\big] \le v(t,y). \tag{4.14}
\]

Since this holds for every $\alpha\in\mathcal{A}$, we get $v(t,x) = \sup_{\alpha\in\mathcal{A}}\mathbb{E}\big[U\big(X^{t,x,\alpha}_T\big)\big] \le v(t,y)$. Hence, $v(t,\cdot)$ is non-decreasing.

Part II: concavity.

Fix $t\in[0,T]$, let $0 < x_1 < x_2 < \infty$, let $\alpha^1$ and $\alpha^2$ be admissible controls and let $\lambda\in[0,1]$. Define
\[
x_\lambda = \lambda x_1 + (1-\lambda)x_2 \tag{4.15}
\]
and the process $\alpha^\lambda$ by
\[
\alpha^\lambda_s = \frac{\lambda X^{t,x_1,\alpha^1}_s \alpha^1_s + (1-\lambda) X^{t,x_2,\alpha^2}_s \alpha^2_s}{\lambda X^{t,x_1,\alpha^1}_s + (1-\lambda) X^{t,x_2,\alpha^2}_s}, \quad t\le s\le T, \tag{4.16}
\]

where $X^{t,x_1,\alpha^1}_s$ and $X^{t,x_2,\alpha^2}_s$ are the solutions at time $s$ of (4.10) starting from $x_1$ and $x_2$ and controlled by the admissible control processes $\alpha^1$ and $\alpha^2$, respectively. Then,
\[
\begin{aligned}
dX^{t,x_1,\alpha^1}_s &= X^{t,x_1,\alpha^1}_s\big[\big(\alpha^1_s(\mu-r)+r\big)ds + \alpha^1_s\sigma\, dW_s\big] \\
dX^{t,x_2,\alpha^2}_s &= X^{t,x_2,\alpha^2}_s\big[\big(\alpha^2_s(\mu-r)+r\big)ds + \alpha^2_s\sigma\, dW_s\big],
\end{aligned}
\]
so that
\[
\begin{aligned}
d\big(\lambda X^{t,x_1,\alpha^1}_s\big) &= \lambda X^{t,x_1,\alpha^1}_s\alpha^1_s(\mu-r)\,ds + \lambda X^{t,x_1,\alpha^1}_s r\,ds + \lambda X^{t,x_1,\alpha^1}_s\alpha^1_s\sigma\, dW_s, \\
d\big((1-\lambda) X^{t,x_2,\alpha^2}_s\big) &= (1-\lambda) X^{t,x_2,\alpha^2}_s\alpha^2_s(\mu-r)\,ds + (1-\lambda) X^{t,x_2,\alpha^2}_s r\,ds + (1-\lambda) X^{t,x_2,\alpha^2}_s\alpha^2_s\sigma\, dW_s.
\end{aligned}
\]

Summing the last two equations above, we have:
\[
\begin{aligned}
d\big(\lambda X^{t,x_1,\alpha^1}_s + (1-\lambda)X^{t,x_2,\alpha^2}_s\big)
&= \big(\lambda X^{t,x_1,\alpha^1}_s\alpha^1_s + (1-\lambda)X^{t,x_2,\alpha^2}_s\alpha^2_s\big)(\mu-r)\,ds \\
&\quad + \big(\lambda X^{t,x_1,\alpha^1}_s + (1-\lambda)X^{t,x_2,\alpha^2}_s\big)\, r\,ds \\
&\quad + \big(\lambda X^{t,x_1,\alpha^1}_s\alpha^1_s + (1-\lambda)X^{t,x_2,\alpha^2}_s\alpha^2_s\big)\,\sigma\, dW_s.
\end{aligned}
\]
Dividing by $\lambda X^{t,x_1,\alpha^1}_s + (1-\lambda)X^{t,x_2,\alpha^2}_s$ and using the definition (4.16) of $\alpha^\lambda$, this yields
\[
d\big(\lambda X^{t,x_1,\alpha^1}_s + (1-\lambda)X^{t,x_2,\alpha^2}_s\big)
= \big(\lambda X^{t,x_1,\alpha^1}_s + (1-\lambda)X^{t,x_2,\alpha^2}_s\big)\big[\big(\alpha^\lambda_s(\mu-r)+r\big)ds + \alpha^\lambda_s\sigma\, dW_s\big]. \tag{4.17}
\]


Let’s admit that αλ is an admissible control to the associated controlled processXt,xλ,αλ . Then,

dXt,xλ,αλs = Xt,xλ,αλ

s

[(αλs (µ − r) + r

)ds + αλs σdWs

](4.18)

Therefore, by (4.17), (4.18) and Theorem 2.6, we conclude that:

Xt,xλ,αλss = λXt,x1,α1

s + (1 − λ)Xt,x2,α2

s , t ≤ s ≤ T. (4.19)
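The identity (4.19) can also be observed numerically: the algebra leading to (4.17) applies verbatim to the Euler increments, so the convex combination is preserved at every time step up to floating-point rounding. A sketch, with assumed parameter values and two arbitrary constant controls:

```python
# Sketch verifying (4.19): the wealth started at x_lambda and controlled by
# alpha^lambda of (4.16) coincides with the convex combination of the two
# wealths. Parameters and the two constant controls are assumptions.
import numpy as np

rng = np.random.default_rng(2)
r, mu, sigma = 0.03, 0.08, 0.2
T, n = 1.0, 10_000
dt = T / n
x1, x2, lam = 50.0, 150.0, 0.3
a1, a2 = 0.4, 0.9                     # two admissible (constant) controls
dW = rng.normal(0.0, np.sqrt(dt), size=n)

X1, X2 = x1, x2
Xlam = lam * x1 + (1 - lam) * x2      # start at x_lambda of (4.15)
max_gap = 0.0
for i in range(n):
    # alpha^lambda of (4.16), computed from the current (pre-update) states
    alam = (lam * X1 * a1 + (1 - lam) * X2 * a2) / (lam * X1 + (1 - lam) * X2)
    X1 *= 1.0 + (a1 * (mu - r) + r) * dt + a1 * sigma * dW[i]
    X2 *= 1.0 + (a2 * (mu - r) + r) * dt + a2 * sigma * dW[i]
    Xlam *= 1.0 + (alam * (mu - r) + r) * dt + alam * sigma * dW[i]
    max_gap = max(max_gap, abs(Xlam - (lam * X1 + (1 - lam) * X2)))

print(f"max |X^lambda - (lam X1 + (1-lam) X2)| = {max_gap:.2e}")  # ~ rounding
```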

The utility function being concave, we have:
\[
U\big(X^{t,x_\lambda,\alpha^\lambda}_T\big) \ge \lambda U\big(X^{t,x_1,\alpha^1}_T\big) + (1-\lambda)U\big(X^{t,x_2,\alpha^2}_T\big).
\]

Taking expectations, bounding the left-hand side by $v(t,x_\lambda)$, and using that this holds for all admissible control processes $\alpha^1$ and $\alpha^2$, we obtain:
\[
v(t, \lambda x_1 + (1-\lambda)x_2) \ge \lambda v(t,x_1) + (1-\lambda)v(t,x_2).
\]

So, the value function is concave.

Part III: strict concavity.

Let us assume that the utility function $U$ is strictly concave. If the supremum defining the value function is always attained, we have:
\[
\begin{aligned}
v(t,x_\lambda) &= \mathbb{E}\big[U\big(X^{t,x_\lambda,\alpha}_T\big)\big], \quad \text{where } \alpha \text{ is an optimal control,} \\
&\ge \mathbb{E}\big[U\big(X^{t,x_\lambda,\alpha^\lambda}_T\big)\big] \\
&> \lambda\,\mathbb{E}\big[U\big(X^{t,x_1,\alpha^1}_T\big)\big] + (1-\lambda)\,\mathbb{E}\big[U\big(X^{t,x_2,\alpha^2}_T\big)\big], \quad \text{where } \alpha^1, \alpha^2 \text{ are optimal controls,} \\
&= \lambda v(t,x_1) + (1-\lambda)v(t,x_2).
\end{aligned}
\]

Hence, if there exists an optimal control, the value function is strictly concave. $\square$

We now write the Hamiltonian of the problem:
\[
H(t,x,p,\gamma) = \sup_{a\in A}\left\{ x\big[a(\mu-r)+r\big]p + \frac{1}{2}a^2x^2\sigma^2\gamma \right\}. \tag{4.20}
\]

Let $\psi(a) = pxr + px(\mu-r)a + \frac{1}{2}\gamma\sigma^2x^2a^2$. Assuming $\mu > r$, then:

(i) if $\gamma > 0$, then $\sup_{a\in A}\psi(a) = H(t,x,p,\gamma) = +\infty$;

(ii) if $\gamma = 0$ and $p \neq 0$, then $\sup_{a\in A}\psi(a) = H(t,x,p,\gamma) = +\infty$;

(iii) if $\gamma = 0$ and $p = 0$, then $\sup_{a\in A}\psi(a) = H(t,x,p,\gamma) = 0$; and

(iv) if $\gamma < 0$, then we have:
\[
\psi'(a) = px(\mu-r) + \gamma\sigma^2x^2 a. \tag{4.21}
\]

Let us consider $\hat a$ such that $\psi'(\hat a) = 0$. So,
\[
px(\mu-r) + \gamma\sigma^2x^2\hat a = 0 \Longrightarrow \hat a = -\frac{\mu-r}{\sigma^2}\cdot\frac{p}{\gamma x}. \tag{4.22}
\]
Then,
\[
\sup_{a\in A}\psi(a) = \psi(\hat a) \Longrightarrow H(t,x,p,\gamma) = prx - \frac{1}{2}\,\frac{p^2(\mu-r)^2}{\gamma\sigma^2}. \tag{4.23}
\]
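As a quick sanity check of case (iv), one can compare the closed form (4.23) with a brute-force maximization of $\psi$ over a grid of values of $a$. The numerical values of $(x, p, \gamma)$ below are illustrative assumptions, with $p > 0$ and $\gamma < 0$ as expected for an increasing, strictly concave value function.

```python
# Sanity-check sketch for case (iv): with gamma < 0, the maximizer (4.22)
# and maximum (4.23) of psi agree with a brute-force grid search.
# The numerical values of (x, p, gamma) are illustrative assumptions.
import numpy as np

r, mu, sigma = 0.03, 0.08, 0.2
x, p, gamma = 100.0, 1.0, -0.01      # p = v_x > 0, gamma = v_xx < 0

def psi(a):
    return p * x * r + p * x * (mu - r) * a + 0.5 * gamma * sigma**2 * x**2 * a**2

a_hat = -(mu - r) / sigma**2 * p / (gamma * x)                        # (4.22)
H_closed = p * r * x - 0.5 * p**2 * (mu - r)**2 / (gamma * sigma**2)  # (4.23)

grid = np.linspace(a_hat - 10.0, a_hat + 10.0, 200_001)
print(f"grid max   = {psi(grid).max():.6f}")
print(f"psi(a_hat) = {psi(a_hat):.6f}")
print(f"H (4.23)   = {H_closed:.6f}")   # the three values should coincide
```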


As we are looking for a strictly concave value function, we will consider $\gamma < 0$. By the HJB equation for the finite horizon problem (3.46), we have:
\[
-\frac{\partial v}{\partial t}(t,x) - H\!\left(t,x,\frac{\partial v}{\partial x}(t,x),\frac{\partial^2 v}{\partial x^2}(t,x)\right) = 0
\]
\[
\Longrightarrow
-\left[rx\frac{\partial v}{\partial x}(t,x) - \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2 \frac{\left(\frac{\partial v}{\partial x}(t,x)\right)^2}{\frac{\partial^2 v}{\partial x^2}(t,x)}\right] - \frac{\partial v}{\partial t}(t,x) = 0, \tag{4.24}
\]

with terminal condition $v(T,x) = U(x)$. Let us consider the (strictly concave) power utility function $U(x) = \frac{x^q}{q}$, $x > 0$, $q < 1$, $q \ne 0$. We will look for a function $v$ of the form:
\[
v(t,x) = \lambda(t)\cdot U(x) = \lambda(t)\cdot\frac{x^q}{q}. \tag{4.25}
\]
So,
\[
\frac{\partial v}{\partial t}(t,x) = \lambda'(t)\cdot\frac{x^q}{q}, \qquad
\frac{\partial v}{\partial x}(t,x) = \lambda(t)\cdot x^{q-1}, \qquad
\frac{\partial^2 v}{\partial x^2}(t,x) = \lambda(t)\cdot(q-1)\cdot x^{q-2}, \tag{4.26}
\]
and
\[
\lambda(T)\cdot U(x) = v(T,x) \Longrightarrow \lambda(T) = 1. \tag{4.27}
\]

Substituting (4.26) into (4.24), we have:
\[
\begin{aligned}
&-rx\,\lambda(t)\,x^{q-1} + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2\cdot\frac{\big(\lambda(t)\,x^{q-1}\big)^2}{\lambda(t)\,(q-1)\,x^{q-2}} - \lambda'(t)\cdot\frac{x^q}{q} = 0 \\
&\Longrightarrow \lambda'(t)\cdot\frac{x^q}{q} = -r\,\lambda(t)\,x^q + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2\cdot\frac{\lambda(t)\,x^q}{q-1} \\
&\Longrightarrow \lambda'(t)\cdot x^q = -rq\,\lambda(t)\,x^q + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2\cdot\left(\frac{q}{q-1}\right)\lambda(t)\,x^q \\
&\Longrightarrow \lambda'(t) = \underbrace{\left[-rq + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2\cdot\left(\frac{q}{q-1}\right)\right]}_{=\,k}\cdot\lambda(t). \tag{4.28}
\end{aligned}
\]

Solving the ordinary differential equation (4.28), we have:
\[
\lambda(t) = c\cdot e^{kt}, \quad \lambda(T) = 1 \Longrightarrow c\cdot e^{kT} = 1 \Longrightarrow c = e^{-kT}.
\]
Hence,
\[
\lambda(t) = e^{k(t-T)} \Longrightarrow v(t,x) = e^{k(t-T)}\cdot\frac{x^q}{q}. \tag{4.29}
\]
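The computation above can be double-checked symbolically; the following sketch (using sympy) verifies that the candidate (4.29) satisfies the PDE (4.24) and the terminal condition.

```python
# Symbolic check (sketch, using sympy) that the candidate (4.29) solves the
# HJB equation (4.24) with terminal condition v(T, x) = x^q / q.
import sympy as sp

t, x, T = sp.symbols("t x T", positive=True)
r, mu, sigma = sp.symbols("r mu sigma", positive=True)
q = sp.Symbol("q")                      # q < 1, q != 0

k = -r * q + sp.Rational(1, 2) * ((mu - r) / sigma) ** 2 * q / (q - 1)
v = sp.exp(k * (t - T)) * x**q / q

v_t = sp.diff(v, t)
v_x = sp.diff(v, x)
v_xx = sp.diff(v, x, 2)

# Left-hand side of (4.24)
lhs = -(r * x * v_x
        - sp.Rational(1, 2) * ((mu - r) / sigma) ** 2 * v_x**2 / v_xx) - v_t
print(sp.simplify(lhs))                       # expected output: 0
print(sp.simplify(v.subs(t, T) - x**q / q))   # terminal condition: 0
```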

Moreover, the optimal control $\alpha(t,x)$ is
\[
\alpha(t,x) = -\frac{\mu-r}{\sigma^2}\cdot\frac{\lambda(t)\cdot x^{q-1}}{\lambda(t)\cdot(q-1)\cdot x^{q-2}\cdot x} = \frac{\mu-r}{\sigma^2}\cdot\frac{1}{1-q}, \tag{4.30}
\]
a constant proportion of wealth: it does not depend on $x$ or $t$, it is inversely proportional to $\sigma^2$, and it increases with the excess return $\mu - r$ and with $q$.
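Since, for a constant proportion $\alpha$, the terminal wealth is an explicit log-normal random variable, the optimality of (4.30) can also be illustrated by Monte Carlo: the Merton fraction should (approximately) attain the closed-form value (4.29), while other constant fractions fall short. A sketch with assumed parameters:

```python
# Monte Carlo sketch: for constant-proportion strategies, terminal wealth is
# an explicit geometric Brownian motion, and the Merton fraction (4.30)
# should attain the closed-form value (4.29). Parameters are assumptions.
import numpy as np

rng = np.random.default_rng(3)
r, mu, sigma, q = 0.03, 0.08, 0.2, 0.5
t, T, x = 0.0, 1.0, 1.0
tau = T - t
n_paths = 2_000_000

k = -r * q + 0.5 * ((mu - r) / sigma) ** 2 * q / (q - 1)
v_closed = np.exp(k * (t - T)) * x**q / q                 # (4.29)
a_star = (mu - r) / (sigma**2 * (1 - q))                  # (4.30)

W = rng.normal(0.0, np.sqrt(tau), size=n_paths)

def expected_utility(a):
    # X_T in closed form for a constant proportion a
    X_T = x * np.exp((a * (mu - r) + r - 0.5 * a**2 * sigma**2) * tau
                     + a * sigma * W)
    return np.mean(X_T**q / q)

for a in (0.0, 0.5 * a_star, a_star, 1.5 * a_star):
    print(f"alpha = {a:5.3f}: E[U(X_T)] = {expected_utility(a):.6f}")
print(f"closed-form v(t, x)      = {v_closed:.6f}")       # matched at a_star
```

Restricting the comparison to constant proportions is enough here, because (4.30) tells us that the optimal Markovian control is itself constant.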


4.2.2 Utility of Consumption Maximization

As in the previous section, let us denote by $\alpha_t$ the proportion of wealth invested in the risky asset, and now by $\delta_t$ the consumption per unit of wealth at time $t$. Moreover, for the sake of simplicity, let us consider the infinite horizon problem. An admissible control is now a pair of progressively measurable processes $(\alpha_t,\delta_t)_{t\in[0,\infty)}$ with $\delta_t \ge 0$, so that $A\times C = \mathbb{R}\times[0,\infty)$, such that
\[
\int_0^\infty \big(|\alpha_t|^2 + \delta_t\big)\, dt < \infty \quad \text{a.s.}
\]
Given such an admissible control, the corresponding wealth process $(X^{x,\alpha}_t)_{t\in[0,\infty)}$ is the unique strong solution of the equation
\[
dX^{x,\alpha}_s = X^{x,\alpha}_s\big[\big(\alpha_s(\mu-r)+r\big)ds + \alpha_s\sigma\, dW_s\big] - \delta_s\cdot X^{x,\alpha}_s\, ds, \tag{4.31}
\]
and our goal is to compute the value function
\[
v(x) = \sup_{(\alpha,\delta)\in\mathcal{A}\times\mathcal{C}} \mathbb{E}\left[\int_0^\infty e^{-\beta s}\, U\big(\delta_s\cdot X^{x,\alpha}_s\big)\, ds\right], \tag{4.32}
\]
where $\beta > 0$ is a discount factor and $U(\delta\cdot x)$ represents the utility of consuming the proportion $\delta$ per unit of wealth $x$.

By the HJB equation for the infinite horizon problem (3.51), writing $p = v'(x)$ and $\gamma = v''(x)$, we can affirm that:
\[
\begin{aligned}
0 &= \beta v(x) - \sup_{(a,c)\in A\times C}\left\{U(c\cdot x) + px\big[a(\mu-r)+r\big] - pcx + \frac{1}{2}a^2\gamma\sigma^2x^2\right\} \\
&= \beta v(x) - \sup_{(a,c)\in A\times C}\big\{U(c\cdot x) - pcx\big\} - \sup_{(a,c)\in A\times C}\left\{pxr + px(\mu-r)a + \frac{1}{2}\gamma\sigma^2x^2a^2\right\} \tag{4.33} \\
&= \beta v(x) - \sup_{c\in C}\big\{U(c\cdot x) - pcx\big\} - \sup_{a\in A}\left\{pxr + px(\mu-r)a + \frac{1}{2}\gamma\sigma^2x^2a^2\right\}. \tag{4.34}
\end{aligned}
\]

Let $\psi(a) = pxr + px(\mu-r)a + \frac{1}{2}\gamma\sigma^2x^2a^2$. Assuming $\mu > r$, then:

(i) if $\gamma > 0$, then $\sup_{a\in A}\psi(a) = +\infty$;

(ii) if $\gamma = 0$ and $p \neq 0$, then $\sup_{a\in A}\psi(a) = +\infty$;

(iii) if $\gamma = 0$ and $p = 0$, then $\sup_{a\in A}\psi(a) = 0$; and

(iv) if $\gamma < 0$, then, analogously to (4.23), we have:
\[
\sup_{a\in A}\psi(a) = \psi\!\left(-\frac{\mu-r}{\sigma^2}\cdot\frac{p}{\gamma x}\right)
\Longrightarrow
\sup_{a\in A}\psi(a) = prx - \frac{1}{2}\,\frac{p^2(\mu-r)^2}{\gamma\sigma^2}. \tag{4.35}
\]

Furthermore, substituting $\lambda = c\cdot x$ (for fixed $x > 0$, $\lambda$ ranges over $[0,\infty)$ as $c$ does):
\[
-\sup_{c\in C}\big\{U(c\cdot x) - pcx\big\}
= \inf_{c\in C}\big\{pcx - U(c\cdot x)\big\}
= \inf_{\lambda\ge 0}\big\{p\lambda - U(\lambda)\big\}
= \widetilde{(-U)}(-p), \tag{4.36}
\]
where $\widetilde{(-U)}$ denotes the Fenchel-Legendre transform of the convex function $-U$. Recall that the transform of a function $\Phi$ is the function $\widetilde\Phi$ defined by
\[
\widetilde\Phi(y) = \inf_{c\ge 0}\big\{\Phi(c) - cy\big\}. \tag{4.37}
\]

Substituting (4.35) and (4.36) into (4.34), we have:
\[
\beta v(x) + \widetilde{(-U)}\big(-v'(x)\big) - \left[v'(x)\,rx - \frac{1}{2}\cdot\frac{(v'(x))^2(\mu-r)^2}{\sigma^2\, v''(x)}\right] = 0. \tag{4.38}
\]

Let’s, consider the power utility function U(λ) = λq

q , q < 1 and let’s compute (−U).

Call θ(λ) = pλ −λq

q. Then,

θ′(λ) = p − λq−1. (4.39)

Taking λ such that θ′(λ) = 0, we have λ = p1

q−1 . Then,

θ(λ) = p · p1

q−1 −p

qq−1

q= p

qq−1 −

pq

q−1

q

= pq

q−1 ·

(1 −

1q

)= p

qq−1 ·

(q − 1

q

). (4.40)

Hence,
\[
\widetilde{(-U)}\big(-v'(x)\big) = \big(v'(x)\big)^{\frac{q}{q-1}}\cdot\left(\frac{q-1}{q}\right). \tag{4.41}
\]

We will look for a candidate solution to the HJB equation in the form:
\[
v(x) = k\cdot U(x) = k\cdot\frac{x^q}{q}. \tag{4.42}
\]
Then,
\[
v'(x) = k\cdot x^{q-1} \quad\text{and}\quad v''(x) = k\cdot(q-1)\cdot x^{q-2},
\]

and, by (4.38), we can affirm that:
\[
\beta\cdot k\cdot\frac{x^q}{q} + \big(k\cdot x^{q-1}\big)^{\frac{q}{q-1}}\cdot\left(\frac{q-1}{q}\right)
- \left[k\cdot x^{q-1}\cdot r\cdot x - \frac{1}{2}\cdot\frac{k^2\cdot x^{2q-2}\cdot(\mu-r)^2}{\sigma^2\cdot k\cdot(q-1)\cdot x^{q-2}}\right] = 0.
\]
Dividing through by $k\cdot\frac{x^q}{q}$, this becomes
\[
\beta + (q-1)\cdot k^{\frac{1}{q-1}} - rq + \frac{q\cdot(\mu-r)^2}{2\sigma^2(q-1)} = 0
\Longrightarrow
k^{\frac{1}{q-1}} = \frac{1}{q-1}\left(rq - \frac{q\cdot(\mu-r)^2}{2\sigma^2(q-1)} - \beta\right)
\]
\[
\Longrightarrow
k = \left[\frac{1}{q-1}\left(rq - \frac{q\cdot(\mu-r)^2}{2\sigma^2(q-1)} - \beta\right)\right]^{q-1},
\]
provided the parameters are such that $k^{\frac{1}{q-1}} > 0$, so that $k$ is well defined and positive.


Therefore,
\[
v(x) = \left[\frac{1}{q-1}\left(rq - \frac{q\cdot(\mu-r)^2}{2\sigma^2(q-1)} - \beta\right)\right]^{q-1}\cdot\frac{x^q}{q}. \tag{4.43}
\]

Moreover, the optimal control $\alpha(x)$ is
\[
\alpha(x) = -\frac{\mu-r}{\sigma^2}\cdot\frac{k\cdot x^{q-1}}{k\cdot(q-1)\cdot x^{q-2}\cdot x} = \frac{\mu-r}{\sigma^2(1-q)}, \tag{4.44}
\]
which again does not depend on $x$, is inversely proportional to $\sigma^2$ and increases with the excess return $\mu - r$ and with $q$.

The optimal consumption per unit of wealth is
\[
\delta(x) = \frac{\hat\lambda}{x} = \frac{\big(v'(x)\big)^{\frac{1}{q-1}}}{x} = \frac{\big(k\cdot x^{q-1}\big)^{\frac{1}{q-1}}}{x} = \frac{k^{\frac{1}{q-1}}\cdot x}{x} = k^{\frac{1}{q-1}}, \tag{4.45}
\]
a constant proportion of wealth as well.
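Finally, the closed forms of this section can be evaluated for sample parameter values (illustrative assumptions); note the constraint that $k^{\frac{1}{q-1}}$ must be positive for the candidate solution, and hence the consumption rate (4.45), to make sense.

```python
# Sketch evaluating the closed forms of this section for illustrative
# parameter values (assumptions): Merton fraction (4.44), consumption
# rate (4.45) and the constant k, with the positivity check on k^{1/(q-1)}.
r, mu, sigma, q, beta = 0.03, 0.09, 0.25, 0.5, 0.10

a_star = (mu - r) / (sigma**2 * (1 - q))                       # (4.44)
delta_star = (r * q - q * (mu - r) ** 2 / (2 * sigma**2 * (q - 1)) - beta) / (q - 1)
assert delta_star > 0, "parameters must make the consumption rate positive"
k = delta_star ** (q - 1)                                      # delta* = k^{1/(q-1)}

def v(x):
    return k * x**q / q                                        # (4.43)

print(f"alpha* = {a_star:.4f}, delta* = {delta_star:.4f}, "
      f"k = {k:.4f}, v(1) = {v(1.0):.4f}")
```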


Chapter 5

Conclusions

In the present work, we reviewed some important results on stochastic processes and stochastic calculus. These results served as the basis for the theory presented in the subsequent chapters.

The Dynamic Programming Principle (DPP) and the Hamilton-Jacobi-Bellman (HJB) equation, which is derived from the DPP, studied in Chapters 2 and 3, are very important tools for solving numerous optimization problems. In our view, the most valuable feature of the HJB approach is the connection it establishes between an SDE and a PDE.

In Chapter 4, we studied the Merton portfolio optimization problem, applying the concepts presented in Chapters 2 and 3. We treated the case without transaction costs; with transaction costs the problem becomes somewhat harder, but the main ideas of its solution derive from the one presented here.

