Static and Dynamic Optimization (42111)
Lecture 12: Stochastic Dynamic Programming

Section for Dynamical Systems
Dept. of Applied Mathematics and Computer Science
The Technical University of Denmark
Build. 303b, room 048
Email: [email protected], phone: +45 4525 3356, mobile: +45 9351 1161

2019-11-24


Outline of lecture

Recap: L11 Deterministic Dynamic Programming (D)
Dynamic Programming (C)
Stochastics (random variables)
Stochastic Dynamic Programming
Booking profiles
The stochastic Bellman equation
Stochastic optimal stepping (SDD)

Reading guidance: DO pp. 83-92.

Dynamic Programming (D)

Find a sequence of decisions u_i, i = 0, 1, ..., N-1, which takes the system

    x_{i+1} = f_i(x_i, u_i)        x_0 given

along a trajectory, such that the cost function

    J = φ(x_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i)

is minimized.

Dynamic Programming

The Bellman function (the optimal cost to go) is defined as

    V_i(x_i) = min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})

and is a function of the present state, x_i, and the index, i. In particular,

    V_N(x_N) = φ_N(x_N)

Theorem
The Bellman function V_i is given by the backward recursion

    V_i(x_i) = min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) ]        x_{i+1} = f_i(x_i, u_i), x_0 given

with the boundary condition

    V_N(x_N) = φ_N(x_N)

The Bellman equation is a functional equation; it gives a sufficient condition, and V_0(x_0) = J*.

Dynamic programming

The optimal decision is

    u_i = arg min_{u_i} W_i(x_i, u_i)        W_i(x_i, u_i) = L_i(x_i, u_i) + V_{i+1}( f_i(x_i, u_i) )

where f_i(x_i, u_i) = x_{i+1}. If it is a maximization problem: min → max.
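This minimization can be carried out by tabulating W_i over finite state and decision grids. A minimal sketch in Python (the grids, dynamics, and costs below match the deterministic version of the stepping example at the end of this lecture; everything else about the implementation is an illustrative assumption):

    def dp(N, states, decisions, f, L, phi):
        # backward recursion: V_N = phi, then V_i(x) = min_u W_i(x, u)
        V = {x: phi(x) for x in states}
        policy = []
        for i in range(N - 1, -1, -1):
            Vnew, ustar = {}, {}
            for x in states:
                # W_i(x, u) = L_i(x, u) + V_{i+1}(f_i(x, u)); u leaving the grid is infeasible
                W = {u: L(i, x, u) + V[f(i, x, u)]
                     for u in decisions if f(i, x, u) in V}
                ustar[x] = min(W, key=W.get)
                Vnew[x] = W[ustar[x]]
            V, policy = Vnew, [ustar] + policy
        return V, policy                       # V_0 and the feedback tables u_i*(x)

    V0, pol = dp(4, list(range(-2, 3)), [-1, 0, 1],
                 lambda i, x, u: x + u,        # x_{i+1} = x_i + u_i
                 lambda i, x, u: x**2 + u**2,  # L_i(x_i, u_i)
                 lambda x: x**2)               # φ_N(x_N)
    print("V_0(2) =", V0[2])                   # 7
    x = 2
    for i, ustar in enumerate(pol):            # trace the optimal trajectory
        print(f"i={i}: x={x}, u*={ustar[x]}"); x += ustar[x]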

Type of solutions

[Figure: the Bellman function V_t(x) as a surface over the state x and the time index i.]

Fish bone method (graphical method)
Schematic method (tables) → programming
Analytical (e.g. separation of variables)

Analytical: guess the type of functionality in V_i(x), i.e. a functional form with a (number of) free parameter(s). Check that it satisfies the Bellman equation; this results in a (number of) recursion(s) for the parameter(s).

Continuous Dynamic Programming

Find the input function u_t, t ∈ R (more precisely {u}_0^T), that takes the system

    ẋ_t = f_t(x_t, u_t)        x_0 given, t ∈ [0, T]        (1)

such that the cost function

    J = φ_T(x_T) + ∫_0^T L_t(x_t, u_t) dt        (2)

is minimized. Define the truncated performance index (cost to go)

    J_t(x_t, {u}_t^T) = φ_T(x_T) + ∫_t^T L_s(x_s, u_s) ds

The Bellman function (optimal cost to go) is defined by

    V_t(x_t) = min_{{u}_t^T} [ J_t(x_t, {u}_t^T) ]

We have the following theorem, which states a sufficient condition.

Theorem (Hamilton-Jacobi-Bellman)
The Bellman function V_t(x_t) satisfies the equation

    -∂V_t(x_t)/∂t = min_{u_t} [ L_t(x_t, u_t) + (∂V_t(x_t)/∂x) f_t(x_t, u_t) ]        (3)

This is a PDE with the boundary condition

    V_T(x_T) = φ_T(x_T)

Continuous Dynamic Programming

Proof.
In discrete time we have the Bellman equation

    V_i(x_i) = min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) ]

with the boundary condition V_N(x_N) = φ_N(x_N). The discrete index i corresponds to the time t, and the index i + 1 to t + Δt. Then

    V_t(x_t) = min_{u_t} [ ∫_t^{t+Δt} L_s(x_s, u_s) ds + V_{t+Δt}(x_{t+Δt}) ]

Approximate the integral by L_t(x_t, u_t) Δt and apply a Taylor expansion on V_{t+Δt}(x_{t+Δt}):

    V_t(x_t) = min_{u_t} [ L_t(x_t, u_t) Δt + V_t(x_t) + (∂V_t(x_t)/∂x) f_t Δt + (∂V_t(x_t)/∂t) Δt + o(|Δt|) ]

Continuous Dynamic Programming

Proof (continued).

    V_t(x_t) = min_{u_t} [ L_t(x_t, u_t) Δt + V_t(x_t) + (∂V_t(x_t)/∂x) f_t Δt + (∂V_t(x_t)/∂t) Δt + o(|Δt|) ]        (just a copy)

Collect the terms which do not depend on the decision u_t:

    V_t(x_t) = V_t(x_t) + (∂V_t(x_t)/∂t) Δt + min_{u_t} [ L_t(x_t, u_t) Δt + (∂V_t(x_t)/∂x) f_t(x_t, u_t) Δt ] + o(|Δt|)

In the limit Δt → 0 (after cancelling V_t(x_t) and dividing by Δt):

    -∂V_t(x_t)/∂t = min_{u_t} [ L_t(x_t, u_t) + (∂V_t(x_t)/∂x) f_t(x_t, u_t) ]        ∎

The HJB equation:

    -∂V_t(x_t)/∂t = min_{u_t} [ L_t(x_t, u_t) + (∂V_t(x_t)/∂x) f_t(x_t, u_t) ]        (just a copy)

The Hamiltonian function:

    H_t(x_t, u_t, λ_t) = L_t(x_t, u_t) + λ_t^T f_t(x_t, u_t)

The HJB equation can also be formulated as

    -∂V_t(x_t)/∂t = min_{u_t} H_t(x_t, u_t, ∂V_t(x_t)/∂x)

Link to Pontryagin's maximum principle: identify λ_t^T = ∂V_t(x_t)/∂x; then

    ẋ_t = f_t(x_t, u_t)                    State equation
    -λ̇_t^T = ∂H_t/∂x_t                     Costate equation
    u_t = arg min_{u_t} [ H_t ]            Optimality condition

Motion control

Consider the system

    ẋ_t = u_t        x_0 given

and the performance index

    J = (1/2) p x_T² + ∫_0^T (1/2) u_t² dt

The HJB equation, (3), gives:

    -∂V_t(x_t)/∂t = min_{u_t} [ (1/2) u_t² + (∂V_t(x_t)/∂x) u_t ]        V_T(x_T) = (1/2) p x_T²

The minimization can be carried out and gives the solution

    u_t = -∂V_t(x_t)/∂x

So if the Bellman function is known, the control action (the decision) can be determined from this. If the result above is inserted in the HJB equation, we get

    -∂V_t(x_t)/∂t = (1/2) [∂V_t(x_t)/∂x]² - [∂V_t(x_t)/∂x]² = -(1/2) [∂V_t(x_t)/∂x]²

which is a partial differential equation with the boundary condition

    V_T(x_T) = (1/2) p x_T²

PDE:

    -∂V_t(x_t)/∂t = -(1/2) [∂V_t(x_t)/∂x]²        (just a copy)

Inspired by the boundary condition, we guess a candidate function of the type

    V_t(x) = (1/2) s_t x²

where the time dependence is in the function s_t. Since

    ∂V/∂x = s_t x        ∂V/∂t = (1/2) ṡ_t x²

the following equation

    -(1/2) ṡ_t x² = -(1/2) (s_t x)²

must be valid for any x, i.e. we can find s_t by solving the ODE

    ṡ_t = s_t²        s_T = p

backwards. This is actually (a simple version of) the continuous-time Riccati equation. The solution can be found analytically or by means of numerical methods. Knowing the function s_t, we can find the control input

    u_t = -∂V_t(x_t)/∂x = -s_t x_t
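For this scalar case the analytic solution follows by separation of variables: since d(1/s_t)/dt = -ṡ_t/s_t² = -1, integrating from t to T and using s_T = p gives 1/s_t = 1/p + (T - t), i.e.

    s_t = p / (1 + p (T - t))

so the optimal feedback becomes u_t = -p x_t / (1 + p (T - t)). Note that s_t → p as t → T, matching the boundary condition.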


Stochastic Dynamic Programming


The Bank loan

Deterministic:    x_{i+1} = (1 + r) x_i - u_i        x_0 given
Stochastic:       x_{i+1} = (1 + r_i) x_i - u_i      x_0 given

[Figure: a realization of the rate of interest r_i (%) vs. time (months).]
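A minimal simulation sketch of the two recursions (the uniform interest-rate distribution and all numbers are illustrative assumptions; the slides do not specify them):

    import random

    x0, u, N = 50_000.0, 6_500.0, 10        # assumed initial loan, yearly payment, horizon
    r_mean = 0.05                           # assumed mean yearly interest rate

    def balance(stochastic):
        x, path = x0, [x0]
        for i in range(N):
            r = random.uniform(0.0, 2 * r_mean) if stochastic else r_mean
            x = (1 + r) * x - u             # x_{i+1} = (1 + r_i) x_i - u_i
            path.append(x)
        return path

    print(balance(stochastic=False))        # one deterministic trajectory
    print(balance(stochastic=True))         # differs from realization to realization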

[Figure: two panels showing simulated bank balances vs. time (years), for the deterministic and the stochastic recursion.]

Discrete Random Variable

    X ∈ {x^1, x^2, ..., x^m} ⊂ R^n

    p_k = P{X = x^k} ≥ 0        Σ_{k=1}^m p_k = 1

[Figure: an example probability mass function, p_k vs. k.]

    E{X} = Σ_{k=1}^m p_k x^k        E{g(X)} = Σ_{k=1}^m p_k g(x^k)
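A small numerical sketch of the two expectation formulas (the pmf is an assumed example):

    pmf = {1: 0.2, 2: 0.5, 3: 0.3}                 # p_k = P{X = x^k}; sums to 1

    EX = sum(p * x for x, p in pmf.items())        # E{X}    = Σ_k p_k x^k
    Eg = sum(p * x**2 for x, p in pmf.items())     # E{g(X)} = Σ_k p_k g(x^k), g(x) = x²
    print(EX, Eg)                                  # 2.1 4.9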

Stochastic Dynamic Programming

Consider the problem of minimizing (in some sense):

    J = φ_N(x_N, e_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i, e_i)

subject to

    x_{i+1} = f_i(x_i, u_i, e_i)        x_0 given

and the constraints

    (x_i, u_i, e_i) ∈ V_i        (x_N, e_N) ∈ V_N

Here e_i might be vectors reflecting model errors or direct stochastic effects.

Ranking performance indexes

When e_i, and consequently J, are stochastic variables, what do we mean by saying that one strategy is better than another?

In a deterministic situation we mean that

    J_1 > J_2

(J_1 and J_2 being the objective functions for strategy 1 and 2, respectively).

In a stochastic situation we can choose the definition

    E{J_1} > E{J_2}

but others do exist. This choice reflects some kind of average consideration.

Example: Booking profiles

Normally a plane is overbooked, i.e. more tickets are sold than the number of seats, x̄_N. Let x_i be the number of sold tickets at the beginning of day i.

[Timeline: days 0, 1, 2, ..., N (day of departure).]

If x_N < x̄_N we have empty seats: money out the window. If x_N > x̄_N we have to pay compensations: also money out the window. So we want to find a strategy that minimizes

    E{ φ(x_N - x̄_N) }

Let w_i be the number of requests for a ticket on day i (with probability P{w_i = k} = p_k), and let v_i be the number of cancellations on day i (with probability P{v_i = k} = q_k).

Dynamics (u_i is the number of tickets we are willing to sell on day i, so min(u_i, w_i) is the number actually sold):

    x_{i+1} = x_i + min(u_i, w_i) - v_i        e_i = [w_i, v_i]^T

Decision information: u_i(x_i).
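A minimal simulation sketch of these dynamics (the pmfs, seat count, horizon, and the simple booking-limit policy are all illustrative assumptions; the slides only specify generic pmfs p_k and q_k):

    import random

    seats, N = 150, 30                                       # assumed seats x̄_N and horizon
    w_vals, p = [0, 1, 2, 3, 4], [0.1, 0.2, 0.4, 0.2, 0.1]   # assumed P{w_i = k} = p_k
    v_vals, q = [0, 1, 2], [0.7, 0.2, 0.1]                   # assumed P{v_i = k} = q_k

    x = 0                                        # sold tickets at the beginning of day 0
    for i in range(N):
        u = max(seats + 5 - x, 0)                # assumed policy: overbook by at most 5
        w = random.choices(w_vals, weights=p)[0]   # ticket requests on day i
        v = random.choices(v_vals, weights=q)[0]   # cancellations on day i
        x = max(x + min(u, w) - v, 0)            # x_{i+1} = x_i + min(u_i, w_i) - v_i
    print("tickets sold at departure:", x, "seats:", seats)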

Stochastic Bellman Equation

Consider the problem of minimizing:

    J = E{ φ_N(x_N, e_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i, e_i) }

subject to

    x_{i+1} = f_i(x_i, u_i, e_i)        x_0 given

and the constraints

    (x_i, u_i, e_i) ∈ V_i        (x_N, e_N) ∈ V_N

Theorem
The Bellman function (optimal cost to go), V_i(x_i), is given by the (backward) recursion

    V_i(x_i) = min_{u_i} E{ L_i(x_i, u_i, e_i) + V_{i+1}(x_{i+1}) }        x_{i+1} = f_i(x_i, u_i, e_i)

with the boundary condition

    V_N(x_N) = E{ φ_N(x_N, e_N) }

where the optimization is subject to the constraints and the available information.

Discrete (SDD) case

If e_i is discrete, i.e.

    e_i ∈ {e_i^1, e_i^2, ..., e_i^m}        p_i^k = P{e_i = e_i^k}        k = 1, 2, ..., m

then the stochastic Bellman equation can be expressed as

    V_i(x_i) = min_{u_i} Σ_{k=1}^m p_i^k [ L_i(x_i, u_i, e_i^k) + V_{i+1}( f_i(x_i, u_i, e_i^k) ) ]

where f_i(x_i, u_i, e_i^k) = x_{i+1} and the sum is W_i(x_i, u_i), with the boundary condition

    V_N(x_N) = Σ_{k=1}^m p_N^k φ_N(x_N, e_N^k)

The entries in the scheme below are now expected values (i.e. weighted sums).

    W_i             u_i                   V_i(x_i)   u_i*(x_i)
    x_i       0     1     2     3
     0
     1
     2
     3
     4
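The scheme can be filled in mechanically. A minimal tabular sketch in Python (assumptions: finite grids, and a next state outside the grid is treated as infeasible, which is what the ∞ entries in the following tables mean):

    from math import inf

    def sdd(N, states, decisions, noises, p, f, L, phiN):
        # p(i, x, e) = P{e_i = e | x_i = x}; f = dynamics; L = stage cost;
        # phiN(x) = terminal cost V_N(x) (already averaged over e_N if needed)
        V = {N: {x: phiN(x) for x in states}}
        policy = {}
        for i in range(N - 1, -1, -1):                 # backward recursion
            V[i], policy[i] = {}, {}
            for x in states:
                best_u, best_W = None, inf
                for u in decisions:
                    W = 0.0                            # W_i(x, u) = Σ_k p_k [L + V_{i+1}]
                    for e in noises:
                        pk = p(i, x, e)
                        if pk == 0.0:
                            continue
                        xn = f(i, x, u, e)
                        if xn not in V[i + 1]:         # may leave the grid: infeasible
                            W = inf
                            break
                        W += pk * (L(i, x, u, e) + V[i + 1][xn])
                    if W < best_W:
                        best_W, best_u = W, u
                V[i][x], policy[i][x] = best_W, best_u
        return V, policy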

Optimal stochastic stepping (SDD)

Consider the system

    x_{i+1} = x_i + u_i + e_i        x_0 = 2

where

    e_i ∈ {-1, 0, 1}        u_i ∈ {-1, 0, 1}*        x_i ∈ {-2, -1, 0, 1, 2}

(* a decision that could take the state outside the grid is infeasible; such combinations appear as ∞ below), and the distribution of e_i depends on the current state:

    p_i^k           e_i
    x_i       -1     0     1
    -2         0    1/2   1/2
    -1         0    1/2   1/2
     0        1/2    0    1/2
     1        1/2   1/2    0
     2        1/2   1/2    0

The cost function is

    J = E{ x_4² + Σ_{i=0}^{3} (x_i² + u_i²) }

Notice: the cost function itself contains no stochastic components e_i.
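This example can be encoded for the sdd sketch above as follows (a sketch; up to rounding, it reproduces the tables on the following slides):

    P = {  # P{e_i = e} as a function of the current state x_i (table above)
        -2: {-1: 0.0, 0: 0.5, 1: 0.5},
        -1: {-1: 0.0, 0: 0.5, 1: 0.5},
         0: {-1: 0.5, 0: 0.0, 1: 0.5},
         1: {-1: 0.5, 0: 0.5, 1: 0.0},
         2: {-1: 0.5, 0: 0.5, 1: 0.0},
    }

    V, policy = sdd(N=4,
                    states=[-2, -1, 0, 1, 2],
                    decisions=[-1, 0, 1],
                    noises=[-1, 0, 1],
                    p=lambda i, x, e: P[x][e],
                    f=lambda i, x, u, e: x + u + e,    # x_{i+1} = x_i + u_i + e_i
                    L=lambda i, x, u, e: x**2 + u**2,  # stage cost x_i² + u_i²
                    phiN=lambda x: x**2)               # terminal cost x_4²
    print(V[0][2], policy[0][2])                       # ≈ 7.56 and -1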

Optimal stochastic stepping (SDD)

Firstly, from

    J = E{ x_4² + Σ_{i=0}^{3} (x_i² + u_i²) }

(no stochastics in the cost) we establish V_4(x_4) = x_4². We are assuming perfect state information.

    x_4    V_4
    -2      4
    -1      1
     0      0
     1      1
     2      4

Optimal stochastic stepping (SDD)

Then we establish the W_3(x_3, u_3) function (the expected cost to go):

    W_3(x_3, u_3) = Σ_{k=1}^m p_3^k [ L_3(x_3, u_3, e_3^k) + V_4( f_3(x_3, u_3, e_3^k) ) ]

With L_3(x_3, u_3, e_3^k) = x_3² + u_3² and f_3(x_3, u_3, e_3^k) = x_3 + u_3 + e_3^k this is

    W_3(x_3, u_3) = p_3^1 [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^1) ]
                  + p_3^2 [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^2) ]
                  + p_3^3 [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^3) ]

or, more compactly,

    W_3(x_3, u_3) = x_3² + u_3² + p_3^1 V_4(x_3 + u_3 + e_3^1)
                                + p_3^2 V_4(x_3 + u_3 + e_3^2)
                                + p_3^3 V_4(x_3 + u_3 + e_3^3)

Optimal stochastic stepping (SDD)

    W_3(x_3, u_3) = Σ_{k=1}^3 p_3^k [ x_3² + u_3² + V_4(x_3 + u_3 + e_3^k) ]

For example, for x_3 = 0 and u_3 = -1 (where e_3 ∈ {-1, 1}, each with probability 1/2):

    W_3(0, -1) = 1/2 [ 0² + (-1)² + V_4(0 - 1 - 1) ]        (e_3 = -1, p = 1/2)
               +   0 [ 0² + (-1)² + V_4(0 - 1 + 0) ]        (e_3 = 0,  p = 0)
               + 1/2 [ 0² + (-1)² + V_4(0 - 1 + 1) ]        (e_3 = 1,  p = 1/2)
               = 1/2 (1 + 4) + 0 + 1/2 (1 + 0) = 3

    W_3          u_3
    x_3      -1     0     1
    -2        ∞    6.5   5.5
    -1       4.5   1.5   2.5
     0        3     1     3
     1       2.5   1.5   4.5
     2       5.5   6.5    ∞

    x_4    V_4
    -2      4
    -1      1
     0      0
     1      1
     2      4
    (just for reference)

Optimal stochastic stepping (SDD)

    W_3          u_3              V_3(x_3)   u_3*(x_3)
    x_3      -1     0     1
    -2        ∞    6.5   5.5        5.5         1
    -1       4.5   1.5   2.5        1.5         0
     0        3     1     3          1          0
     1       2.5   1.5   4.5        1.5         0
     2       5.5   6.5    ∞         5.5        -1

    W_2          u_2              V_2(x_2)   u_2*(x_2)
    x_2      -1     0     1
    -2        ∞    7.5   6.25       6.25        1
    -1       5.5   2.25  3.25       2.25        0
     0       4.25  1.5   4.25       1.5         0
     1       3.25  2.25  5.5        2.25        0
     2       6.25  7.5    ∞         6.25       -1

Optimal stochastic stepping (SDD)

    W_1          u_1              V_1(x_1)   u_1*(x_1)
    x_1      -1     0     1
    -2        ∞    8.25  6.88       6.88        1
    -1       6.25  2.88  3.88       2.88        0
     0       4.88  2.25  4.88       2.25        0
     1       3.88  2.88  6.25       2.88        0
     2       6.88  8.25   ∞         6.88       -1

    W_0          u_0              V_0(x_0)   u_0*(x_0)
    x_0      -1     0     1
     2       7.56  8.88   ∞         7.56       -1

Trace back u_i*(x_i): the result is a feedback solution (a function of the state), not a time function.
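Applying the feedback law is then a forward pass through the stored tables (a short sketch, continuing the code sketches above):

    import random

    x = 2                                          # x_0 = 2
    for i in range(4):
        u = policy[i][x]                           # look up u_i*(x_i) in the table
        e = random.choices([-1, 0, 1], weights=[P[x][-1], P[x][0], P[x][1]])[0]
        print(f"i={i}: x={x}, u*={u}, e={e}")
        x = x + u + e                              # x_{i+1} = x_i + u_i + e_i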

Deterministic setting (x_{i+1} = x_i + u_i, i = 0, ..., 3): the solution is an open-loop sequence,

    i       0    1    2    3
    u_i*   -1   -1    0    0

Stochastic setting (x_{i+1} = x_i + u_i + e_i, i = 0, ..., 3): the solution is a feedback law (u_0* is only computed for the initial state x_0 = 2),

    x_i    u_0*   u_1*   u_2*   u_3*
    -2            1      1      1
    -1            0      0      0
     0            0      0      0
     1            0      0      0
     2     -1    -1     -1     -1

Concluding remarks

Discrete state and decision space.
Approximation: grid covering the state and decision space.
Curse of dimensionality: combinatorial explosion.