Time-Inconsistent Stochastic Optimal Control Problems Jiongmin Yong (University of Central Florida) May 8, 2018


Page 1: Time-Inconsistent Stochastic Optimal Control Problems

Time-InconsistentStochastic Optimal Control Problems

Jiongmin Yong

(University of Central Florida)

May 8, 2018

Page 2: Time-Inconsistent Stochastic Optimal Control Problems

Outline

1. Introduction: Time-Consistency

2. Time-Inconsistent Problems

3. Equilibrium Strategies

4. Further Study

Page 3: Time-Inconsistent Stochastic Optimal Control Problems

1. Introduction: Time-Consistency

Optimal Control Problem: Consider the state equation
\[ \dot X(s) = b(s, X(s), u(s)), \quad s \in [t,T], \qquad X(t) = x, \]
with cost functional
\[ J(t, x; u(\cdot)) = h(X(T)) + \int_t^T g(s, X(s), u(s))\,ds, \]
\[ u(\cdot) \in \mathcal{U}[t,T] = \big\{ u : [t,T] \to U \;\big|\; u(\cdot) \text{ is measurable} \big\}. \]

Problem (C). For $(t,x) \in [0,T)\times\mathbb{R}^n$, find $\bar u(\cdot) \in \mathcal{U}[t,T]$ s.t.
\[ J(t, x; \bar u(\cdot)) = \inf_{u(\cdot) \in \mathcal{U}[t,T]} J(t, x; u(\cdot)) \equiv V(t,x). \]

Page 4: Time-Inconsistent Stochastic Optimal Control Problems

Bellman Optimality Principle: For any $\tau \in [t,T]$,
\[ V(t,x) = \inf_{u(\cdot) \in \mathcal{U}[t,\tau]} \Big[ \int_t^\tau g(s, X(s), u(s))\,ds + V\big(\tau, X(\tau; t, x, u(\cdot))\big) \Big]. \]

Let $(\bar X(\cdot), \bar u(\cdot))$ be optimal for $(t,x) \in [0,T)\times\mathbb{R}^n$. Then
\[ \begin{aligned}
V(t,x) &= J(t, x; \bar u(\cdot)) = \int_t^\tau g(s, \bar X(s), \bar u(s))\,ds + J\big(\tau, \bar X(\tau; t, x, \bar u(\cdot)); \bar u(\cdot)\big|_{[\tau,T]}\big) \\
&\ge \int_t^\tau g(s, \bar X(s), \bar u(s))\,ds + V\big(\tau, \bar X(\tau; t, x, \bar u(\cdot))\big) \\
&\ge \inf_{u(\cdot) \in \mathcal{U}[t,\tau]} \Big[ \int_t^\tau g(s, X(s), u(s))\,ds + V\big(\tau, X(\tau; t, x, u(\cdot))\big) \Big] = V(t,x).
\end{aligned} \]

Thus, all the inequalities are in fact equalities.

Page 5: Time-Inconsistent Stochastic Optimal Control Problems

Consequently,
\[ J\big(\tau, \bar X(\tau); \bar u(\cdot)\big|_{[\tau,T]}\big) = V(\tau, \bar X(\tau)) = \inf_{u(\cdot) \in \mathcal{U}[\tau,T]} J(\tau, \bar X(\tau); u(\cdot)), \quad \text{a.s.} \]

Hence, $\bar u(\cdot)\big|_{[\tau,T]} \in \mathcal{U}[\tau,T]$ is optimal for $\big(\tau, \bar X(\tau; t, x, \bar u(\cdot))\big)$.

This is called the time-consistency of Problem (C).
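The argument above can be checked numerically in a discrete-time toy version of Problem (C). The dynamics $b$, running cost $g$, terminal cost $h$, and control set below are illustrative choices, not the slide's data; the point is only that the tail of an optimal control remains optimal from any intermediate time on the optimal trajectory.

```python
# Toy discrete-time check of time-consistency: brute-force the optimal
# control sequence from (0, x), then re-solve from an intermediate time
# tau along the optimal trajectory.  All model data are illustrative.
import itertools

T = 5                               # horizon (number of steps)
U = [-1, 0, 1]                      # finite control set
b = lambda s, x, u: x + u           # dynamics: X(s+1) = b(s, X(s), u(s))
g = lambda s, x, u: x * x + u * u   # running cost
h = lambda x: x * x                 # terminal cost

def J(t, x, controls):
    """Total cost of a control sequence started from (t, x)."""
    total = 0
    for s, u in zip(range(t, T), controls):
        total += g(s, x, u)
        x = b(s, x, u)
    return total + h(x)

def optimal_controls(t, x):
    """Brute-force minimizer over all control sequences on [t, T)."""
    return min(itertools.product(U, repeat=T - t), key=lambda c: J(t, x, c))

u_bar = optimal_controls(0, 3)
tau, x_tau = 2, 3
for s in range(tau):                # follow the optimal trajectory to tau
    x_tau = b(s, x_tau, u_bar[s])

# time-consistency: u_bar restricted to [tau, T) is optimal from (tau, X(tau))
assert J(tau, x_tau, u_bar[tau:]) == J(tau, x_tau, optimal_controls(tau, x_tau))
```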

Page 6: Time-Inconsistent Stochastic Optimal Control Problems

Stochastic Optimal Control Problem:

$(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ — filtered complete probability space, $W(\cdot)$ — $d$-dimensional standard Brownian motion, $\mathbb{F} = \{\mathcal{F}_t\}_{t \ge 0} \equiv \mathbb{F}^W$,
\[ \mathcal{U}[t,T] = \big\{ u : [t,T] \times \Omega \to U \;\big|\; u(\cdot) \text{ is } \mathbb{F}\text{-progressively measurable} \big\}, \]
\[ dX(s) = b(s, X(s), u(s))\,ds + \sigma(s, X(s), u(s))\,dW(s), \quad s \in [t,T], \qquad X(t) = x, \]
\[ J(t, x; u(\cdot)) = \mathbb{E}\Big[ \int_t^T g(s, X(s), u(s))\,ds + h(X(T)) \Big]. \]

Problem (S). For given $(t,x) \in [0,T)\times\mathbb{R}^n$, find a $\bar u(\cdot) \in \mathcal{U}[t,T]$ such that
\[ J(t, x; \bar u(\cdot)) = \inf_{u(\cdot) \in \mathcal{U}[t,T]} J(t, x; u(\cdot)). \]

It is also time-consistent.

Page 7: Time-Inconsistent Stochastic Optimal Control Problems

A modified problem:
\[ dX(s) = b(s, X(s), u(s))\,ds + \sigma(s, X(s), u(s))\,dW(s), \quad s \in [t,T], \qquad X(t) = x, \]
\[ J(t, x; u(\cdot)) = \mathbb{E}\Big[ e^{-\lambda(T-t)} h(X(T)) + \int_t^T e^{-\lambda(s-t)} g(s, X(s), u(s))\,ds \Big]. \]

$e^{-\lambda\,\cdot}$ — exponential discounting, $\lambda$ — discount rate. Since
\[ e^{-\lambda t} J(t, x; u(\cdot)) = \mathbb{E}\Big[ e^{-\lambda T} h(X(T)) + \int_t^T e^{-\lambda s} g(s, X(s), u(s))\,ds \Big], \]
minimizing $u(\cdot) \mapsto J(t, x; u(\cdot))$ is equivalent to minimizing $u(\cdot) \mapsto e^{-\lambda t} J(t, x; u(\cdot))$.

The problem remains time-consistent.

Page 8: Time-Inconsistent Stochastic Optimal Control Problems

Recursive Utility/Cost Functional

Adam Smith (1759), "The Theory of Moral Sentiments":

Utility is not intertemporally separable; rather, past and future experiences, jointly with current ones, provide current utility.

Roughly, in mathematical terms, one should have
\[ U(t, X(t)) = f\big( U(t-r, X(t-r)),\; U(t+\tau, X(t+\tau)) \big), \]
where $U(t, X)$ is the utility at $(t, X)$.

Page 9: Time-Inconsistent Stochastic Optimal Control Problems

* Epstein–Duffie (1992) introduced stochastic differential utility: the current utility $Y(t)$ of the payoff $\xi$ at a future time $T > t$ satisfies
\[ Y(t) = \mathbb{E}_t\Big[ \int_t^T g(s, Y(s))\,ds + \xi \Big]. \]

Thus, current utility depends on future utility. The above is equivalent to the following BSDE:
\[ Y(t) = \xi + \int_t^T g(s, Y(s))\,ds - \int_t^T Z(s)\,dW(s), \quad t \in [0,T]. \]

A natural extension:
\[ Y(t) = h(X(T)) + \int_t^T g(s, Y(s), Z(s))\,ds - \int_t^T Z(s)\,dW(s). \]

$(Y(\cdot), Z(\cdot))$ — recursive utility/disutility process.

Page 10: Time-Inconsistent Stochastic Optimal Control Problems

A further extension:
\[ Y(t) = h(X(T)) + \int_t^T g(s, X(s), u(s), Y(s), Z(s))\,ds - \int_t^T Z(s)\,dW(s), \]
\[ dX(s) = b(s, X(s), u(s))\,ds + \sigma(s, X(s), u(s))\,dW(s), \qquad X(t) = x. \]

Define the recursive cost functional
\[ J(t, x; u(\cdot)) = Y(t) \equiv \mathbb{E}_t\Big[ h(X(T)) + \int_t^T g(s, X(s), u(s), Y(s), Z(s))\,ds \Big]. \]

If $g(s,x,u,y,z) = g(s,x,u)$, it reduces to the classical one.

Page 11: Time-Inconsistent Stochastic Optimal Control Problems

Further adding exponential discounting:
\[ J(t, x; u(\cdot)) = \mathbb{E}_t\Big[ e^{-\lambda(T-t)} h(X(T)) + \int_t^T e^{-\lambda(s-t)} g(s, X(s), u(s), Y(s), Z(s))\,ds \Big]. \]

If the above is called $Y(t)$ (for $t \in [0,T]$), then
\[ Y(t) = h(X(T)) + \int_t^T \big[ g(s, X(s), u(s), Y(s), Z(s)) - \lambda Y(s) \big]\,ds - \int_t^T Z(s)\,dW(s). \]

• The corresponding problem is still time-consistent.

• The dynamic programming principle holds, the HJB equation is valid, etc.

Page 12: Time-Inconsistent Stochastic Optimal Control Problems

2. Time-Inconsistent Problems

In reality, problems are hardly ever time-consistent: an optimal decision/policy made at time $t$, more often than not, will not stay optimal thereafter.

Main reason: when building the model and describing the utility/cost, etc., the following are used:

subjective Time-Preferences and

subjective Risk-Preferences.

Page 13: Time-Inconsistent Stochastic Optimal Control Problems

• Time-Preferences:

Most people do not discount exponentially! Instead, they over-discount the utility of outcomes in the immediate future.

* What if the car in front is not moving 2 seconds after the light turned green? (Honk the horn!)

* Planning to finish a job within the next week (will you finish it Monday, or Friday?)

* Shopping with credit cards (meeting immediate satisfaction)

* Unintentionally polluting the environment through over-development

...

Immediate utility weighs heavier!

Page 14: Time-Inconsistent Stochastic Optimal Control Problems

The annual interest rate is $r = 10\% = 0.1$.

Option (A): Get \$100 today (5/8/2018).

Option (B): Get \$105 $\big( > 100(1 + \tfrac{0.1}{12}) \big)$ on 6/8/2018.

Option (A′): Get \$110 $(= 100 \times 1.10)$ on 5/8/2019.

Option (B′): Get \$115.50 $\big( > 110(1 + \tfrac{0.1}{12}) \big)$ on 6/30/2019.

For a time-consistent person,
(A) ∼ (A′), (B) ∼ (B′) (no difference),
(B) ≻ (A), (B′) ≻ (A′) (consistent preferences).

However, for an uncertainty-averse person,
(A) ≻ (B), (B′) ≻ (A′) (inconsistent preferences).
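The consistent ranking can be verified directly. A minimal sketch, under two assumptions of mine for illustration: continuous compounding at the slide's annual rate, and thirteen months until (B′):

```python
# Present values under exponential discounting at annual rate r = 0.10
# (continuous compounding assumed): an exponential discounter ranks (B)
# over (A) AND (B') over (A') -- preferring (A) now but (B') later is
# inconsistent with any exponential discount rate.
import math

r = 0.10
pv = lambda amount, years: amount * math.exp(-r * years)

A  = pv(100.0, 0)          # $100 today
B  = pv(105.0, 1 / 12)     # $105 in one month
A2 = pv(110.0, 1)          # $110 in one year
B2 = pv(115.5, 13 / 12)    # $115.50 in thirteen months (assumed date)

assert B > A and B2 > A2   # the exponential ranking never flips
```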

Page 15: Time-Inconsistent Stochastic Optimal Control Problems

Magnifying the example:

Option (A): Get \$1M today (5/8/2018).

Option (B): Get \$1.05M $\big( > 1\mathrm{M}(1 + \tfrac{0.1}{12}) \big)$ on 6/8/2018.

Option (A′): Get \$1.1M $(= 1\mathrm{M} \times 1.10)$ on 5/8/2019.

Option (B′): Get \$1.155M $\big( > 1.1\mathrm{M}(1 + \tfrac{0.1}{12}) \big)$ on 6/8/2019.

For an uncertainty-averse person,
(A) ≻ (B), (B′) ≻ (A′).

The feeling is now even stronger: people are more rational about the farther future.

Page 16: Time-Inconsistent Stochastic Optimal Control Problems

Exponential discounting: $\lambda_e(t) = e^{-rt}$, $r > 0$ — discount rate.

Hyperbolic discounting: $\lambda_h(t) = \dfrac{1}{1+kt}$ — a hyperbola.

If we let $k = e^r - 1$, i.e., $e^{-r} = \lambda_e(1) = \lambda_h(1) = \dfrac{1}{1+k}$, then
\[ \lambda_e(t) = e^{-rt} = \frac{1}{(1+k)^t}, \qquad \lambda_h(t) = \frac{1}{1+kt}. \]

For $t \sim 0$, $t \mapsto \dfrac{1}{1+kt}$ decreases faster than $t \mapsto \dfrac{1}{(1+k)^t}$:
\[ \lambda_h'(0) = -k < -\ln(1+k) = \lambda_e'(0). \]

Hyperbolic discounting actually appears in people’s behavior.
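The comparison between the two discount functions can be checked numerically (the rate $r = 0.05$ is my arbitrary choice for illustration):

```python
# Exponential vs hyperbolic discounting with matched one-period factors:
# choosing k = e^r - 1 makes lam_e(1) == lam_h(1); near t = 0 the
# hyperbola decays faster (present bias), for large t it decays slower.
import math

r = 0.05
k = math.exp(r) - 1.0
lam_e = lambda t: math.exp(-r * t)       # = (1 + k) ** (-t)
lam_h = lambda t: 1.0 / (1.0 + k * t)

assert abs(lam_e(1.0) - lam_h(1.0)) < 1e-12   # matched at t = 1
assert lam_h(0.5) < lam_e(0.5)                # steeper near the present
assert lam_h(10.0) > lam_e(10.0)              # flatter in the far future
assert -k < -math.log(1.0 + k)                # slopes at 0: lam_h'(0) < lam_e'(0)
```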

Page 17: Time-Inconsistent Stochastic Optimal Control Problems

* D. Hume (1739), “A Treatise of Human Nature”

“Reason is, and ought only to be the slave of the passions.”

People’s actions/behaviors are due to their passions.

Page 18: Time-Inconsistent Stochastic Optimal Control Problems

Generalized Merton Problem

\[ dX(s) = [rX(s) + (\mu - r)u(s) - c(s)]\,ds + \sigma u(s)\,dW(s), \qquad X(t) = x, \]
\[ J(t, x; u(\cdot), c(\cdot)) = \mathbb{E}_t\Big[ \int_t^T \nu(t,s)\,c(s)^\beta\,ds + \rho(t)\,X(T)^\beta \Big], \]
with $\beta \in (0,1)$. Classical case:
\[ \nu(t,s) = e^{-\delta(s-t)}, \quad \rho(t) = e^{-\delta(T-t)}, \quad 0 \le t \le s \le T. \]

Problem. Find (u(·), c(·)) to maximize J(t, x ; u(·), c(·)).

Page 19: Time-Inconsistent Stochastic Optimal Control Problems

For given $t \in [0,T)$, the optimal solution is
\[ \bar u^t(s) = \frac{(\mu - r)\,\bar X^t(s)}{\sigma^2(1-\beta)}, \qquad
\bar c^t(s) = \frac{\nu(t,s)^{\frac{1}{1-\beta}}\,\bar X^t(s)}
{e^{\frac{\lambda}{1-\beta}(T-s)}\,\rho(t)^{\frac{1}{1-\beta}} + \displaystyle\int_s^T e^{\frac{\lambda}{1-\beta}(\tau-s)}\,\nu(t,\tau)^{\frac{1}{1-\beta}}\,d\tau}, \]
\[ \lambda = \frac{\big[2r\sigma^2(1-\beta) + (\mu - r)^2\big]\beta}{2\sigma^2(1-\beta)}. \]

It is time-inconsistent.

Page 20: Time-Inconsistent Stochastic Optimal Control Problems

* Palacios–Huerta (2003), survey on the history

* Strotz (1956), Pollak (1968), Laibson (1997), ...

* Finn E. Kydland and Edward C. Prescott (1977) (2004 Nobel Prize winners; classical optimal control theory does not work)

* Ekeland–Lazrak (2008)

* Yong (2011, 2012) (Multi-person differential games)

* Wei–Yong–Yu (2017) (recursive cost functional case)

* Karnam–Ma–Zhang (2017)

* Mei–Yong (2017) (with regime-switching)

Page 21: Time-Inconsistent Stochastic Optimal Control Problems

• Risk-Preferences:

Consider two investments whose returns are $R_1$ and $R_2$, with
\[ P(R_1 = 100) = \tfrac{1}{2}, \quad P(R_1 = -50) = \tfrac{1}{2}, \]
\[ P(R_2 = 150) = \tfrac{1}{3}, \quad P(R_2 = -60) = \tfrac{2}{3}. \]

Which one do you prefer?
\[ \mathbb{E}R_1 = \tfrac{1}{2} \cdot 100 + \tfrac{1}{2} \cdot (-50) = 25, \qquad \mathbb{E}R_2 = \tfrac{1}{3} \cdot 150 + \tfrac{2}{3} \cdot (-60) = 10. \]

So R1 seems to be better.
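A quick check of the two expectations:

```python
# Mean returns of the two investments on the slide: by expectation
# alone R1 beats R2, though risk preferences may still say otherwise.
ER1 = 0.5 * 100 + 0.5 * (-50)
ER2 = (1 / 3) * 150 + (2 / 3) * (-60)

assert ER1 == 25
assert abs(ER2 - 10) < 1e-9
```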

Page 22: Time-Inconsistent Stochastic Optimal Control Problems

* St. Petersburg Paradox (posed by Nicolas Bernoulli in 1713):
\[ P(X = 2^n) = \frac{1}{2^n}, \quad n \ge 1, \]
\[ \mathbb{E}[X] = \sum_{n=1}^\infty 2^n\,P(X = 2^n) = \sum_{n=1}^\infty 2^n \cdot \frac{1}{2^n} = \infty. \]

Question: How much are you willing to pay to play the game?

How about $10,000? Or $1,000? Or ???

Page 23: Time-Inconsistent Stochastic Optimal Control Problems

In 1738, Daniel Bernoulli (a cousin of Nicolas) introduced expected utility: $\mathbb{E}[u(X)]$. With $u(x) = \sqrt{x}$, one has
\[ \mathbb{E}\sqrt{X} = \sum_{n=1}^\infty \Big(\frac{1}{\sqrt{2}}\Big)^n = 1 + \sqrt{2}. \]

* 1944, von Neumann–Morgenstern: introduced the "rationality" axioms: Completeness, Transitivity, Independence, Continuity.

Standard stochastic optimal control theory is based on expected utility theory.
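Both computations above — the divergent mean and Bernoulli's value $1 + \sqrt{2}$ — can be reproduced by truncating the series:

```python
# St. Petersburg payoffs P(X = 2**n) = 2**-n: the truncated mean grows
# without bound (one unit per term), while Bernoulli's expected utility
# E[sqrt(X)] is a geometric series summing to 1 + sqrt(2).
import math

def truncated_mean(N):
    return sum(2**n * 2**-n for n in range(1, N + 1))   # equals N

def expected_sqrt(N):
    return sum(math.sqrt(2**n) * 2**-n for n in range(1, N + 1))

assert truncated_mean(50) == 50                          # diverges linearly
assert abs(expected_sqrt(200) - (1 + math.sqrt(2))) < 1e-12
```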

Page 24: Time-Inconsistent Stochastic Optimal Control Problems

• Decision-making based on expected utility theory istime-consistent.

• In classical expected utility theory, the probability is objective.

• It is controversial whether a probability should be objective.

• Early relevant works: Ramsey (1926), de Finetti (1937)

Page 25: Time-Inconsistent Stochastic Optimal Control Problems

Allais Paradox (1953). Let $X_i$ be a payoff.

Option 1. $P(X_1 = 100) = 100\%$

Option 2. $P(X_2 = 100) = 89\%$, $P(X_2 = 0) = 1\%$, $P(X_2 = 500) = 10\%$

Option 3. $P(X_3 = 0) = 89\%$, $P(X_3 = 100) = 11\%$

Option 4. $P(X_4 = 0) = 90\%$, $P(X_4 = 500) = 10\%$

Most people have the following preferences:
\[ X_2 \prec X_1, \qquad X_3 \prec X_4. \]

Page 26: Time-Inconsistent Stochastic Optimal Control Problems

If there existed a utility function $u : \mathbb{R} \to \mathbb{R}_+$ such that
\[ X \prec Y \iff \mathbb{E}[u(X)] < \mathbb{E}[u(Y)], \]
then
\[ X_2 \prec X_1 \;\Rightarrow\; \mathbb{E}[u(X_2)] = 0.89\,u(100) + 0.1\,u(500) + 0.01\,u(0) < \mathbb{E}[u(X_1)] = u(100), \]
\[ X_3 \prec X_4 \;\Rightarrow\; \mathbb{E}[u(X_3)] = 0.89\,u(0) + 0.11\,u(100) < \mathbb{E}[u(X_4)] = 0.9\,u(0) + 0.1\,u(500). \]
Thus,
\[ 0.11\,u(100) > 0.1\,u(500) + 0.01\,u(0), \qquad 0.11\,u(100) < 0.01\,u(0) + 0.1\,u(500), \]
a contradiction.
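The contradiction is structural: for any utility $u$, the two expected-utility gaps coincide, so no expected-utility maximizer can rank $X_1$ over $X_2$ while also ranking $X_4$ over $X_3$. A short script confirms this (the candidate utilities are my arbitrary choices):

```python
# For ANY utility u, the Allais payoffs satisfy the identity
# E[u(X1)] - E[u(X2)] = E[u(X3)] - E[u(X4)], since both sides equal
# 0.11 u(100) - 0.01 u(0) - 0.1 u(500).
import math

def gaps(u):
    EU1 = u(100)
    EU2 = 0.89 * u(100) + 0.01 * u(0) + 0.10 * u(500)
    EU3 = 0.89 * u(0) + 0.11 * u(100)
    EU4 = 0.90 * u(0) + 0.10 * u(500)
    return EU1 - EU2, EU3 - EU4

for u in (math.sqrt, lambda x: math.log1p(x), lambda x: x):
    d12, d34 = gaps(u)
    assert abs(d12 - d34) < 1e-9   # identical gaps for every utility
```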

Page 27: Time-Inconsistent Stochastic Optimal Control Problems

Relevant Literature:

* Subjective expected utility theory (Savage 1954)

* Mean-variance preference (Markowitz 1952), leading to the nonlinear appearance of conditional expectations

* Choquet integral (1953), leading to Choquet expected utility theory

* Prospect theory (Kahneman–Tversky 1979; Kahneman won the 2002 Nobel Prize)

* Distorted probability (Wang–Young–Panjer 1997), widely used in insurance/actuarial science

* BSDEs, g-expectation (Peng 1997), leading to time-consistent nonlinear expectations

* BSVIEs (Yong 2006, 2008), leading to time-inconsistent dynamic risk measures

Page 28: Time-Inconsistent Stochastic Optimal Control Problems

Recent Relevant Literature:

* Bjork–Murgoci (2008), Bjork–Murgoci–Zhou (2013)

* Hu–Jin–Zhou (2012, 2015)

* Yong (2014, 2015)

* Bjork–Khapko–Murgoci (2016)

* Hu–Huang–Li (2017)

These cover only the case where $\mathbb{E}_t[X(\cdot)]$ and/or $\mathbb{E}_t[u(\cdot)]$ appear nonlinearly.

Page 29: Time-Inconsistent Stochastic Optimal Control Problems

• A Summary:

Time-Preferences: (Exponential/General) Discounting.

Risk-Preferences: (Subjective/Objective) Expected Utility.

Exponential discounting + objective expected utility/disutility

leads to time-consistency.

Otherwise, the problem will be time-inconsistent.

Page 30: Time-Inconsistent Stochastic Optimal Control Problems

3. Equilibrium Strategies

Time-consistent solution:

Instead of finding an optimal solution (which is time-inconsistent), find an equilibrium strategy (which is time-consistent).

Sacrifice some immediate satisfaction and save some for the future (retirement plans, controlling the speed of economic growth, ...).

Page 31: Time-Inconsistent Stochastic Optimal Control Problems

A General Formulation:
\[ dX(s) = b(s, X(s), u(s))\,ds + \sigma(s, X(s), u(s))\,dW(s), \quad s \in [t,T], \qquad X(t) = x, \]
\[ dY(s) = -g(t, s, X(s), u(s), Y(s), Z(s))\,ds + Z(s)\,dW(s), \quad s \in [t,T], \qquad Y(T) = h(t, X(T)). \]

Let $(Y(\cdot), Z(\cdot)) \equiv \big(Y(\cdot\,; t, x, u(\cdot)),\, Z(\cdot\,; t, x, u(\cdot))\big)$. Then define
\[ J(t, x; u(\cdot)) = Y(t; t, x, u(\cdot)) = \mathbb{E}_t\Big[ h(t, X(T)) + \int_t^T g(t, s, X(s), u(s), Y(s), Z(s))\,ds \Big]. \]

Problem (N). For $(t,x) \in [0,T)\times\mathbb{R}^n$, find $\bar u(\cdot) \in \mathcal{U}[t,T]$ s.t.
\[ J(t, x; \bar u(\cdot)) = \inf_{u(\cdot) \in \mathcal{U}[t,T]} J(t, x; u(\cdot)). \]

This problem is time-inconsistent.

Page 32: Time-Inconsistent Stochastic Optimal Control Problems

Comparison:

Recursive cost functional with exponential discounting:
\[ J(t, x; u(\cdot)) = \mathbb{E}_t\Big[ e^{-\lambda(T-t)} h(X(T)) + \int_t^T e^{-\lambda(s-t)} g(s, X(s), u(s), Y(s), Z(s))\,ds \Big]. \]

Recursive cost functional with general discounting:
\[ J(t, x; u(\cdot)) = Y(t; t, x, u(\cdot)) = \mathbb{E}_t\Big[ h(t, X(T)) + \int_t^T g(t, s, X(s), u(s), Y(s), Z(s))\,ds \Big]. \]

Page 33: Time-Inconsistent Stochastic Optimal Control Problems

• For a time-inconsistent problem, one should seek an equilibrium strategy which is time-consistent and locally optimal (in some sense).

• The idea behind it: sacrifice some near-future utility to meet some farther needs (such as a retirement fund).

• Mathematically, this can be achieved via a multi-person differential game.

Page 34: Time-Inconsistent Stochastic Optimal Control Problems

[Figure: timeline $t_{N-3} < t_{N-2} < t_{N-1} < t_N = T$. On each subinterval a cost functional is attached ($J(t_{N-2}, x_{N-2}; u(\cdot))$ on $[t_{N-2}, t_{N-1}]$, $J(t_{N-1}, x_{N-1}; u(\cdot))$ on $[t_{N-1}, t_N]$), and the states $X^{N-1}(t_{N-2})$, $X^N(t_{N-1})$ provide the initial data linking consecutive subintervals.]

Page 35: Time-Inconsistent Stochastic Optimal Control Problems

Idea of Seeking Equilibrium Strategies.

• Partition the interval $[0,T]$:
\[ [0,T] = \bigcup_{k=1}^N [t_{k-1}, t_k], \qquad \Pi : 0 = t_0 < t_1 < \cdots < t_{N-1} < t_N = T. \]

• Solve an optimal control problem on $[t_{N-1}, t_N]$, with cost functional
\[ J_N(u) = \mathbb{E}_{t_{N-1}}\Big[ h(t_{N-1}, X(T)) + \int_{t_{N-1}}^{t_N} g(t_{N-1}, s, X(s), u(s), Y(s), Z(s))\,ds \Big], \]
obtaining an optimal 4-tuple $(X^N(\cdot), u^N(\cdot), Y^N(\cdot), Z^N(\cdot))$, depending on the initial pair $(t_{N-1}, x_{N-1})$.

Page 36: Time-Inconsistent Stochastic Optimal Control Problems

• Solve an optimal control problem on $[t_{N-2}, t_{N-1}]$ with a sophisticated cost functional:
\[ \begin{aligned} J_{N-1}(u) = \mathbb{E}\Big[ h(t_{N-2}, X^N(T)) &+ \int_{t_{N-1}}^{t_N} g(t_{N-2}, s, X^N(s), u^N(s), Y(s), Z(s))\,ds \\ &+ \int_{t_{N-2}}^{t_{N-1}} g(t_{N-2}, s, X(s), u(s), Y(s), Z(s))\,ds \Big]. \end{aligned} \]

• Proceed by induction to get an approximate equilibrium strategy, depending on $\Pi$.

• Let $\|\Pi\| \to 0$ to get a limit.
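The partition idea can be sketched in discrete time. The sketch below is not the slide's continuous-time construction: it replaces the recursive cost functional by quasi-hyperbolic (beta-delta) discounted log utility in a consumption-savings game, with all parameters my illustrative choices. The backward induction over the partition is the same idea: the player on $[t_{k-1}, t_k]$ best-responds to the already-computed strategies of the later players.

```python
# Multi-player backward induction under time-inconsistent (beta-delta)
# discounting: player k chooses the consumption fraction of current
# wealth, taking the later players' (already computed) fractions as
# given.  Log utility, gross return R; all parameters illustrative.
import math

N, beta, delta, R = 6, 0.7, 0.95, 1.05

def tail_utility(k, w, frac):
    """delta-discounted log utility from period k on, starting wealth w,
    when period j consumes the fraction frac[j] of its wealth."""
    total = 0.0
    for j in range(k, N):
        c = frac[j] * w
        total += delta ** (j - k) * math.log(c)
        w = R * (w - c)
    return total

grid = [i / 400 for i in range(1, 400)]    # candidate fractions in (0, 1)
frac = [0.0] * N
frac[N - 1] = 1.0                          # the last player consumes everything
for k in range(N - 2, -1, -1):             # backward induction over players
    def selfk(a, k=k):                     # player k's beta-delta objective
        trial = frac[:k] + [a] + frac[k + 1:]
        return math.log(a) + beta * delta * tail_utility(k + 1, R * (1 - a), trial)
    frac[k] = max(grid, key=selfk)

# present bias: the first player consumes a larger fraction than the
# classical exponential (beta = 1) log-utility rule would prescribe
exp_frac = 1 / sum(delta ** j for j in range(N))
assert frac[0] > exp_frac
```

With log utility the equilibrium fractions are wealth-independent, which is why a single grid search per player suffices here.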

Page 37: Time-Inconsistent Stochastic Optimal Control Problems

Definition. $\Psi : [0,T] \times \mathbb{R}^n \to U$ is called a time-consistent equilibrium strategy if for any $x \in \mathbb{R}^n$,
\[ d\bar X(s) = b(s, \bar X(s), \Psi(s, \bar X(s)))\,ds + \sigma(s, \bar X(s), \Psi(s, \bar X(s)))\,dW(s), \quad s \in [0,T], \qquad \bar X(0) = x \]
admits a unique solution $\bar X(\cdot)$; for some $\Psi^\Pi : [0,T] \times \mathbb{R}^n \to U$,
\[ \lim_{\|\Pi\| \to 0} d\big( \Psi^\Pi(t,x), \Psi(t,x) \big) = 0, \]
uniformly for $(t,x)$ in any compact set, where $\Pi : 0 = t_0 < t_1 < \cdots < t_{N-1} < t_N = T$; and (local optimality)
\[ J_k\big(t_{k-1}, X^\Pi(t_{k-1}); \Psi^\Pi(\cdot)\big|_{[t_{k-1},T]}\big) \le J_k\big(t_{k-1}, X^\Pi(t_{k-1}); u^k(\cdot) \oplus \Psi^\Pi(\cdot)\big|_{[t_k,T]}\big), \quad \forall u^k(\cdot) \in \mathcal{U}[t_{k-1}, t_k], \]
$J_k(\cdot)$ — the sophisticated cost functional.

Page 38: Time-Inconsistent Stochastic Optimal Control Problems

Here
\[ dX^\Pi(s) = b(s, X^\Pi(s), \Psi^\Pi(s, X^\Pi(s)))\,ds + \sigma(s, X^\Pi(s), \Psi^\Pi(s, X^\Pi(s)))\,dW(s), \quad s \in [0,T], \qquad X^\Pi(0) = x, \]
\[ \big[ u^k(\cdot) \oplus \Psi^\Pi(\cdot)\big|_{[t_k,T]} \big](s) = \begin{cases} u^k(s), & s \in [t_{k-1}, t_k), \\ \Psi^\Pi(s, X^k(s)), & s \in [t_k, T], \end{cases} \]
\[ dX^k(s) = b(s, X^k(s), u^k(s))\,ds + \sigma(s, X^k(s), u^k(s))\,dW(s), \quad s \in [t_{k-1}, t_k), \]
\[ dX^k(s) = b(s, X^k(s), \Psi^\Pi(s, X^k(s)))\,ds + \sigma(s, X^k(s), \Psi^\Pi(s, X^k(s)))\,dW(s), \quad s \in [t_k, T], \]
\[ X^k(t_{k-1}) = X^\Pi(t_{k-1}). \]

Page 39: Time-Inconsistent Stochastic Optimal Control Problems

Equilibrium control:
\[ \bar u(s) = \Psi(s, \bar X(s)), \quad s \in [0,T]. \]

Equilibrium state process $\bar X(\cdot)$, satisfying
\[ d\bar X(s) = b(s, \bar X(s), \Psi(s, \bar X(s)))\,ds + \sigma(s, \bar X(s), \Psi(s, \bar X(s)))\,dW(s), \quad s \in [0,T], \qquad \bar X(0) = x. \]

Equilibrium value function:
\[ V(t, \bar X(t)) = J(t, \bar X(t); \bar u(\cdot)). \]

Page 40: Time-Inconsistent Stochastic Optimal Control Problems

Let $D[0,T] = \big\{ (\tau, t) \;\big|\; 0 \le \tau \le t \le T \big\}$. Define
\[ a(t,x,u) = \tfrac{1}{2}\,\sigma(t,x,u)\,\sigma(t,x,u)^\top, \quad \forall (t,x,u) \in [0,T]\times\mathbb{R}^n\times U, \]
\[ \mathbb{H}(\tau, t, x, u, \theta, p, P) = \mathrm{tr}\big[ a(t,x,u)\,P \big] + \langle b(t,x,u),\, p \rangle + g\big(\tau, t, x, u, \theta, p^\top \sigma(t,x,u)\big), \]
\[ \forall (\tau, t, x, u, \theta, p, P) \in D[0,T]\times\mathbb{R}^n\times U\times\mathbb{R}\times\mathbb{R}^n\times\mathbb{S}^n. \]

Let $\psi : D(\psi) \subseteq D[0,T]\times\mathbb{R}^n\times\mathbb{R}\times\mathbb{R}^n\times\mathbb{S}^n \to U$ be such that
\[ \mathbb{H}(\tau, t, x, \psi(\tau, t, x, \theta, p, P), \theta, p, P) = \inf_{u \in U} \mathbb{H}(\tau, t, x, u, \theta, p, P) > -\infty, \quad \forall (\tau, t, x, \theta, p, P) \in D(\psi). \]

In the classical case, one just needs
\[ H(t, x, p, P) = \inf_{u \in U} \mathbb{H}(t, x, u, p, P) > -\infty, \quad \forall (t, x, p, P) \in [0,T]\times\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{S}^n. \]

Page 41: Time-Inconsistent Stochastic Optimal Control Problems

Equilibrium HJB equation (writing $\psi^\circ \equiv \psi\big(t, t, x, \Theta(t,t,x), \Theta_x(t,t,x), \Theta_{xx}(t,t,x)\big)$ for brevity):
\[ \begin{aligned} &\Theta_t(\tau, t, x) + \mathrm{tr}\big[ a(t, x, \psi^\circ)\,\Theta_{xx}(\tau, t, x) \big] + \big\langle b(t, x, \psi^\circ),\, \Theta_x(\tau, t, x) \big\rangle \\ &\qquad + g\big(\tau, t, x, \psi^\circ, \Theta(\tau, t, x), \Theta_x(\tau, t, x)^\top \sigma(t, x, \psi^\circ)\big) = 0, \quad (\tau, t, x) \in D[0,T]\times\mathbb{R}^n, \\ &\Theta(\tau, T, x) = h(\tau, x), \quad (\tau, x) \in [0,T]\times\mathbb{R}^n. \end{aligned} \]

Classical HJB equation:
\[ \begin{aligned} &\Theta_t(t, x) + \mathrm{tr}\big[ a\big(t, x, \psi(t, x, \Theta_x(t,x), \Theta_{xx}(t,x))\big)\,\Theta_{xx}(t, x) \big] + \big\langle b\big(t, x, \psi(t, x, \Theta_x(t,x), \Theta_{xx}(t,x))\big),\, \Theta_x(t, x) \big\rangle \\ &\qquad + g\big(t, x, \psi(t, x, \Theta_x(t,x), \Theta_{xx}(t,x))\big) = 0, \quad (t, x) \in [0,T]\times\mathbb{R}^n, \\ &\Theta(T, x) = h(x), \quad x \in \mathbb{R}^n. \end{aligned} \]

Page 42: Time-Inconsistent Stochastic Optimal Control Problems

Comparison:

Classical HJB equation:
\[ \Theta_t(t, x) + H\big(t, x, \Theta_x(t, x), \Theta_{xx}(t, x)\big) = 0, \quad (t, x) \in [0,T]\times\mathbb{R}^n, \qquad \Theta(T, x) = h(x), \quad x \in \mathbb{R}^n. \]

Equilibrium HJB equation:
\[ \Theta_t(\tau, t, x) + H\big(\tau, t, x, \Theta_x(\tau, t, x), \Theta_{xx}(\tau, t, x), \Theta(t, t, x), \Theta_x(t, t, x), \Theta_{xx}(t, t, x)\big) = 0, \quad (\tau, t, x) \in D[0,T]\times\mathbb{R}^n, \]
\[ \Theta(\tau, T, x) = h(\tau, x), \quad (\tau, x) \in [0,T]\times\mathbb{R}^n. \]

Page 43: Time-Inconsistent Stochastic Optimal Control Problems

Equilibrium value function:
\[ V(t, x) = \Theta(t, t, x), \quad \forall (t, x) \in [0,T]\times\mathbb{R}^n. \]

It satisfies
\[ V(t, \bar X(t; x)) = J\big(t, \bar X(t; x); \Psi(\cdot)\big|_{[t,T]}\big), \quad (t, x) \in [0,T]\times\mathbb{R}^n. \]

Equilibrium strategy:
\[ \Psi(t, x) = \psi\big(t, t, x, V(t, x), V_x(t, x), V_{xx}(t, x)\big), \quad (t, x) \in [0,T]\times\mathbb{R}^n. \]

Theorem. Under proper conditions, the equilibrium HJB equation admits a unique classical solution $\Theta(\cdot, \cdot, \cdot)$. Hence, an equilibrium strategy $\Psi(\cdot, \cdot)$ exists.

Page 44: Time-Inconsistent Stochastic Optimal Control Problems

The equilibrium strategy $\Psi(\cdot, \cdot)$ has the following properties:

• Time-consistency: $t \mapsto \Psi(t, \bar X(t))$.

• Local approximate optimality: for any $t \in [0,T)$, any $\varepsilon > 0$, and any $u(\cdot) \in \mathcal{U}[t, t+\varepsilon)$, let
\[ \big[ u(\cdot) \oplus \Psi(\cdot, \cdot) \big](s, x) = \begin{cases} u(s), & (s, x) \in [t, t+\varepsilon)\times\mathbb{R}^n, \\ \Psi(s, x), & (s, x) \in [t+\varepsilon, T]\times\mathbb{R}^n. \end{cases} \]
Then
\[ \varliminf_{\varepsilon \to 0} \frac{J\big(t, \bar X(t); u(\cdot) \oplus \Psi(\cdot, \cdot)\big|_{[t+\varepsilon,T]}\big) - J\big(t, \bar X(t); \Psi(\cdot, \cdot)\big|_{[t,T]}\big)}{\varepsilon} \ge 0, \]
i.e.,
\[ J\big(t, \bar X(t); \Psi(\cdot, \bar X(\cdot))\big) \le J\big(t, \bar X(t); u(\cdot) \oplus \Psi(\cdot, \bar X(\cdot))\big) + o(\varepsilon). \]
(Perturbation on $[t, t+\varepsilon)$.)

Page 45: Time-Inconsistent Stochastic Optimal Control Problems

Return to the Generalized Merton Problem:
\[ dX(s) = [rX(s) + (\mu - r)u(s) - c(s)]\,ds + \sigma u(s)\,dW(s), \qquad X(t) = x, \]
\[ J(t, x; u(\cdot), c(\cdot)) = \mathbb{E}_t\Big[ \int_t^T \nu(t,s)\,c(s)^\beta\,ds + \rho(t)\,X(T)^\beta \Big], \]
with $\beta \in (0,1)$. Classical case:
\[ \nu(t,s) = e^{-\delta(s-t)}, \quad \rho(t) = e^{-\delta(T-t)}, \quad 0 \le t \le s \le T. \]

Problem. Find $(u(\cdot), c(\cdot))$ to maximize $J(t, x; u(\cdot), c(\cdot))$.

Page 46: Time-Inconsistent Stochastic Optimal Control Problems

Time-consistent equilibrium strategy:
\[ \Psi(t, x) = \varphi(t)\,x^\beta, \]
with $\varphi(\cdot)$ satisfying the integral equation
\[ \varphi(t) = e^{\lambda(T-t) - \beta \int_t^T \left( \frac{\nu(s',s')}{\varphi(s')} \right)^{\frac{1}{1-\beta}} ds'} \rho(t)
+ \int_t^T e^{\lambda(s-t) - \beta \int_t^s \left( \frac{\nu(s',s')}{\varphi(s')} \right)^{\frac{1}{1-\beta}} ds'} \left( \frac{\nu(s,s)}{\varphi(s)} \right)^{\frac{\beta}{1-\beta}} \nu(t,s)\,ds, \quad t \in [0,T]. \]

Page 47: Time-Inconsistent Stochastic Optimal Control Problems

4. Further Study

1. Well-posedness of the equilibrium HJB equation when $\sigma(t,x,u)$ depends on $u$.

2. The case where $\psi$ is not unique, has discontinuities, etc.

3. The case where $\sigma(t,x,u)$ is degenerate: viscosity solutions?

4. The random-coefficient case (non-degenerate/degenerate).

5. The case involving conditional expectations.

6. Infinite-horizon problems.

Page 48: Time-Inconsistent Stochastic Optimal Control Problems

Thank You!