Stochastic Optimal Control Problems, Part I: Deterministic Case

Hasnaa Zidani

ENSTA-Paris, University Paris-Saclay

IMPA, June 20-24, 2016

H. Zidani (ENSTA ParisTech), Stochastic Optimal Control Problems, SVAN’2016

Outline

1 Controlled differential systems
    Introduction and Examples
    State equation
    Existence of optimal solutions

2 A Direct Numerical approach
    Discrete Optimal Control Problem
    Example
    State of the art

3 Optimality conditions: Pontryagin principle



y: state of the system

u: control input

Find a control law and its corresponding trajectory that optimize some performance criterion of the system while complying with prescribed constraints (physical or economic constraints on the control and/or the state).


Consider the problem of minimizing the cost function

$$\int_0^T \ell(y_t, u_t)\,dt + \varphi(y_0, y_T)$$

subject to: $\dot y_t = f(y_t, u_t)$, $t \in (0,T)$,

and the constraints:

Control constraints: $c(u_t) \le 0$, $t \in (0,T)$,

State constraints: $g(y_t) \le 0$, $t \in (0,T)$,

Mixed state and control constraints: $c(u_t, y_t) \le 0$, $t \in (0,T)$,

Initial-final equality and inequality constraints:

$\Phi_i(y_0, y_T) = 0$, $i = 1, \dots, r_1$; $\Psi_i(y_0, y_T) \le 0$, $i = r_1 + 1, \dots, r$.


Function spaces: Control and state spaces

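$\mathcal{U} := L^\infty(0,T;\mathbb{R}^m)$; $\mathcal{Y} := W^{1,\infty}(0,T;\mathbb{R}^d)$.

Their extension to Hilbert spaces:

$\mathcal{U}_2 := L^2(0,T;\mathbb{R}^m)$; $\mathcal{Y}_2 := H^1(0,T;\mathbb{R}^d)$.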


The space race: Goddard problem

Example (Goddard)

$\dot h(t) = v(t)$, $h(0) = 0$,

$\dot v(t) = \dfrac{u(t)}{m(t)} - g$, $v(0) = 0$,

$\dot m(t) = -b\,u(t)$, $m(0) = m_0$,

$h(t)$: altitude; $v(t)$: velocity; $m(t)$: mass; $u(t)$: thrust.

• The thrust $u(t)$ is subject to: $0 \le u(t) \le u_{\max}$.

• The rocket's mass satisfies the constraint: $m_1 \le m(t) \le m_2(t)$.

The optimal control problem is the following:

Maximize $h(T)$ subject to: $u(t) \in [0, u_{\max}]$; $(h, v, m)$ satisfies the ODE; $m_1 \le m(t) \le m_2(t)$, $t \ge 0$.


Launcher’s problem: Ariane 5

• Steer the launcher from Kourou to the GEO

• State variables $(r, v) \in \mathbb{R}^3 \times \mathbb{R}^3$:

$\dot r = v$, $\dot v = \vec{P} + \vec{F}_T(r, v, u) - \vec{F}_D(r, v, u)$;

$u \in \mathbb{R}^3$ is the thrust force (control input).

• State constraints: heat flux, limited propellant capacity, target constraint (GEO)

Objective function: maximization of the payload.


Standing assumptions

Assume the set of admissible control inputs is:

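$\mathcal{U}_{ad} := \{u \in \mathcal{U};\ u_t \in U \text{ on } (0,T)\}$.

(A0) $U$ is a closed set in $\mathbb{R}^m$.

(A1) $f : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^d$ is locally Lipschitz continuous.

(A2) For every $x \in \mathbb{R}^d$, $f(x, U)$ is a convex set of $\mathbb{R}^d$.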


Proposition

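Assume (A0)-(A1). Let $x \in \mathbb{R}^d$.

i) For every $u \in \mathcal{U}_{ad}$, there exists $y^u \in H^1([0,T];\mathbb{R}^d)$ solution of the equation: $\dot y^u_t = f(y^u_t, u_t)$, $y^u_0 = x$.

ii) Moreover, the mapping defined by

$\mathcal{T}(\cdot) : L^2(0,T;\mathbb{R}^m) \to H^1(0,T;\mathbb{R}^d)$, $u \mapsto \mathcal{T}(u) := y^u$

is continuous.

Set of admissible trajectories starting from $x$:

$S_{[0,T]}(x) := \{\, y \mid \exists u \in \mathcal{U}_{ad},\ \dot y_t = f(y_t, u_t),\ y_0 = x \,\}$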


Under (A0)-(A2) and if $U$ is a compact set,

• $S_{[0,T]}(x)$ is a compact set in $W^{1,1}$ endowed with the $C^0$-topology. This result is a consequence of Filippov's theorem; see the books of Vinter (2010) or Aubin-Cellina (1984).

• the set-valued function $x \rightsquigarrow S_{[0,T]}(x)$ is Lipschitz continuous:

$\exists L > 0,\quad S_{[0,T]}(x) \subset S_{[0,T]}(z) + L\,|x - z|\,B_{W^{1,1}} \quad \forall x, z \in \mathbb{R}^d.$


Example (1)

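Min $\int_0^1 y^2(t)\,dt$ subject to $\dot y(t) = u(t)$, $y(0) = 0$, $u(t) \in \{-1, 1\}$.

Minimizing sequence, for $k = 0, \dots, n-1$:

$$u_n(t) = \begin{cases} 1 & \text{on } \left(\tfrac{2k}{2n}, \tfrac{2k+1}{2n}\right), \\ -1 & \text{on } \left(\tfrac{2k+1}{2n}, \tfrac{2k+2}{2n}\right), \end{cases} \qquad y_n(t) = \begin{cases} t - \tfrac{k}{n} & \text{on } \left(\tfrac{2k}{2n}, \tfrac{2k+1}{2n}\right), \\ -t + \tfrac{k+1}{n} & \text{on } \left(\tfrac{2k+1}{2n}, \tfrac{2k+2}{2n}\right). \end{cases}$$

This simple problem doesn't admit a solution:

$y_n \to 0$, but $y \equiv 0$ is not admissible! $\|u_n\|_{L^\infty} = \|u_n\|_{L^2} = 1 \not\to 0$.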


Example (1’)

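Min $\int_0^1 y^2(t)\,dt$ subject to $\dot y(t) = u(t)$, $y(0) = 0$, $u(t) \in [-1, 1]$.

Same sequence, for $k = 0, \dots, n-1$:

$$u_n(t) = \begin{cases} 1 & \text{on } \left(\tfrac{2k}{2n}, \tfrac{2k+1}{2n}\right), \\ -1 & \text{on } \left(\tfrac{2k+1}{2n}, \tfrac{2k+2}{2n}\right), \end{cases} \qquad y_n(t) = \begin{cases} t - \tfrac{k}{n} & \text{on } \left(\tfrac{2k}{2n}, \tfrac{2k+1}{2n}\right), \\ -t + \tfrac{k+1}{n} & \text{on } \left(\tfrac{2k+1}{2n}, \tfrac{2k+2}{2n}\right). \end{cases}$$

The relaxed control problem admits a solution!

$y_n \to 0$, and $y \equiv 0$ is admissible (with $u \equiv 0$). $\|u_n\|_{L^\infty} = \|u_n\|_{L^2} = 1 \not\to 0$.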
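The behaviour of the chattering sequence $u_n$ can be checked numerically. The sketch below is only an illustration: the function name `chattering_cost` and the parameter `steps_per_half` are hypothetical, and the integration is exact only because $y_n$ is piecewise affine. It evaluates $\int_0^1 y_n^2\,dt$, which a direct computation gives as $1/(12 n^2)$, so the cost tends to 0 while the controls keep norm 1.

```python
def chattering_cost(n, steps_per_half=100):
    """Cost int_0^1 y_n(t)^2 dt for the chattering control u_n with n periods.

    u_n equals +1 on the first half of each period of length 1/n and -1 on
    the second half; y_n solves y' = u_n, y(0) = 0.
    """
    total_steps = 2 * n * steps_per_half
    h = 1.0 / total_steps
    y = 0.0
    cost = 0.0
    for i in range(total_steps):
        # The index of the current half-period decides the sign of the control.
        u = 1.0 if (i // steps_per_half) % 2 == 0 else -1.0
        y_next = y + h * u  # exact, since u is constant on the step
        # Exact integral of y^2 over a step on which y is affine.
        cost += h * (y * y + y * y_next + y_next * y_next) / 3.0
        y = y_next
    return cost


for n in (1, 2, 4, 8):
    print(n, chattering_cost(n), 1.0 / (12 * n * n))
```

The infimum 0 is approached but, with $u \in \{-1, 1\}$, never attained; allowing $u \in [-1, 1]$ makes the limit $y \equiv 0$ admissible.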







"First discretize and then optimize"

Consider a general control problem

Min $\varphi(y_T) + \int_0^T \ell(y_t, u_t)\,dt$

subject to: $\dot y_t = f(y_t, u_t)$, $t \in (0,T)$, $y_0 = x$,

$c(u_t) \le 0$, $t \in (0,T)$,

$g(y_t) \le 0$, $t \in (0,T)$,

$c(u_t, y_t) \le 0$, $t \in (0,T)$,

$\Phi_i(y_0, y_T) = 0$, $i = 1, \dots, r_1$,

$\Psi_i(y_0, y_T) \le 0$, $i = r_1 + 1, \dots, r$.


The Euler discretization

• $N$: number of time steps; $h_k > 0$: duration of the $k$-th time step, $k = 0, \dots, N-1$

• Steps begin at time $t_0 = 0$, and for $k = 1$ to $N$, $t_k = \sum_{j=0}^{k-1} h_j$

• State equation: $y_{k+1} = y_k + h_k f(y_k, u_k)$, $k = 0, \dots, N-1$.

• Cost function: $\varphi(y_N) + \sum_{k=0}^{N-1} h_k\,\ell(y_k, u_k)$

• Running constraints:

$c(u_k) \le 0$; $g(y_k) \le 0$; $c(u_k, y_k) \le 0$, $k = 1, \dots, N-1$.

• Final equality and inequality constraints:

$\Phi_i(y_0, y_N) = 0$, $i = 1, \dots, r_1$; $\Psi_i(y_0, y_N) \le 0$, $i = r_1 + 1, \dots, r$.
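The Euler scheme above can be sketched in a few lines for a scalar state. This is a minimal illustration under assumptions, not a full solver: the function name `euler_rollout`, its argument names, and the test dynamics $\dot y = -y + u$ are hypothetical, constraints are ignored, and the running cost is approximated by the left-endpoint sum $\sum_k h_k\,\ell(y_k, u_k)$.

```python
def euler_rollout(f, running_cost, final_cost, x, controls, steps):
    """Explicit Euler scheme for y' = f(y, u), y(0) = x, with scalar state.

    controls[k] and steps[k] are the control u_k and the step length h_k.
    Returns the discrete trajectory [y_0, ..., y_N] and the discrete cost
    final_cost(y_N) + sum_k h_k * running_cost(y_k, u_k).
    """
    ys = [x]
    cost = 0.0
    for u_k, h_k in zip(controls, steps):
        y_k = ys[-1]
        cost += h_k * running_cost(y_k, u_k)
        ys.append(y_k + h_k * f(y_k, u_k))
    cost += final_cost(ys[-1])
    return ys, cost


# Hypothetical test case: y' = -y + u with u = 0, no running cost,
# final cost y_N^2. The exact final state is exp(-1).
N = 1000
ys, cost = euler_rollout(
    f=lambda y, u: -y + u,
    running_cost=lambda y, u: 0.0,
    final_cost=lambda y: y * y,
    x=1.0,
    controls=[0.0] * N,
    steps=[1.0 / N] * N,
)
print(ys[-1], cost)
```

Once the controls $u_0, \dots, u_{N-1}$ are treated as unknowns, the discrete cost returned here is the objective of a finite-dimensional optimization problem.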


• Some control problems are "naturally" described by controlled discrete dynamics.

• Indeed, in some cases the controller can act on the control variable only at very specific dates (daily, monthly, ...).

• In this case, the time schedule is fixed and the control problem is already in the form of a complex finite-dimensional control problem.


Example: A production problem

$y_t$: amount of steel produced at time $t$.

$0 \le u_t \le 1$ is the fraction of the steel produced at time $t$ that is allocated to investment.

The part of $y_t$ allocated to investment is used to increase the production capacity according to:

$\dfrac{dy}{dt}(t) = k\,u_t\,y_t$,

where $y_0 = A$ is the initial production and $k$ is the coefficient of increase in production.

The optimal control problem consists in choosing $u$ so as to maximize the production allocated to consumption over a fixed time horizon $T$.
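A small experiment can make the production problem concrete. The sketch below rests on several assumptions that are not stated in the text: the objective is modelled as $\int_0^T (1 - u_t)\,y_t\,dt$, only policies of the form "invest everything until a switching time, then consume everything" are compared, and the values $T = 2$, $k = 1$, $A = 1$ and the function name `consumption` are illustrative.

```python
def consumption(switch_index, n_steps, horizon=2.0, k=1.0, initial=1.0):
    """Discrete consumption for 'invest everything, then consume everything'.

    u_j = 1 for j < switch_index and u_j = 0 afterwards, with the Euler
    scheme y_{j+1} = y_j + h * k * u_j * y_j and y_0 = initial.
    The objective sum_j h * (1 - u_j) * y_j is an assumed cost model.
    """
    h = horizon / n_steps
    y = initial
    total = 0.0
    for j in range(n_steps):
        u = 1.0 if j < switch_index else 0.0
        total += h * (1.0 - u) * y
        y += h * k * u * y
    return total


n_steps = 200
values = [consumption(j, n_steps) for j in range(n_steps + 1)]
best = max(range(n_steps + 1), key=lambda j: values[j])
print("best switching time:", best * 2.0 / n_steps, "consumption:", values[best])
```

Under these assumptions the best switching time found on the grid lies close to $T - 1/k$, and it beats both extremes: consuming from the start, and investing until the end (which leaves nothing for consumption).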


Questions

For a continuous control problem:

How is the discretized version related to the original continuous control problem?

Given a nominal local solution $(\bar u, \bar y)$ of the original problem:

Does the discretized problem have a solution $(u_h, y_h)$ near $(\bar u, \bar y)$?

Can we expect an error estimate of the form $\|u_h - \bar u\| + \|y_h - \bar y\| = O(h)$, where $h := \max_k h_k$?

Is it reasonable to assume that the solution is (piecewise) smooth?

How do we solve the discretized problem?


Example: double integrator (I)

Consider the very simple example with constraints on the control:

• Dynamics: $\ddot y_t = u_t \in [-1, 1]$

• Optimization problem: reach the zero state in minimal time

[Figure: plot with both axes graduated from -2.0 to 2.0; the plotted curves are not recoverable.]


Example: double integrator (I)

• Solution: bang-bang optimal control, at most one switching time

• Discretized solution of the same nature (costate is an affine function of time)

• Error only due to the time step containing the switching time

• Expected error: at most $O(h)$

Ref. Alt, Baier, Gerdts, Lempio, Error bounds for Euler approximation of linear-quadratic control problems with bang-bang solutions, 2012.
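The role of the switching step can be observed on a small simulation. The sketch below is an illustration under assumptions: the initial state $(1, 0)$, the function name `final_error` and its parameters are hypothetical choices. For this initial state the continuous time-optimal control is $u = -1$ on $(0, 1)$ and $u = +1$ on $(1, 2)$. The code applies the same switching rule on an Euler grid and measures how far the discrete final state is from the origin.

```python
def final_error(n_steps, x0=(1.0, 0.0), t_switch=1.0, horizon=2.0):
    """Euler simulation of y'' = u with u = -1 before t_switch and +1 after.

    For x0 = (1, 0) the continuous bang-bang control with t_switch = 1 and
    horizon = 2 steers the state exactly to the origin. Returns the step h
    and the Euclidean distance of the discrete final state to the origin.
    """
    h = horizon / n_steps
    position, velocity = x0
    for k in range(n_steps):
        # Tests t_k < t_switch without forming t_k = k * h in floating point.
        u = -1.0 if k * horizon < t_switch * n_steps else 1.0
        # Explicit Euler: both updates use the values from step k.
        position, velocity = position + h * velocity, velocity + h * u
    return h, (position ** 2 + velocity ** 2) ** 0.5


for n in (11, 101, 1001):
    h, err = final_error(n)
    print(n, h, err, err / h)
```

When the switching time falls strictly inside a time step (odd `n_steps` here), the ratio `err / h` stays bounded, in line with the expected $O(h)$ error; when the grid contains the switching time (even `n_steps`), the final error in this particular example is at round-off level.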


Example: double integrator (II)

Fuller's problem I (work with J. Laurent-Varin)

Same dynamics: $\ddot x_t = u_t \in [-1, 1]$; integral cost $\int_0^T x_t^2\,dt$.

[Figure 2: Fuller problem: optimal control, logarithmic penalty. Horizontal axis from 0 to 8, vertical axis from -1.0 to 1.0.]

Ref. PhD work of J. Laurent-Varin, 2005.


PROS

This method can integrate all types of constraints (state constraints, mixed constraints, etc.)

The discrete problem is a finite-dimensional optimization problem

CONS

Local approach

Huge number of variables

Stability and convergence are not guaranteed: in some cases, the discretized control problem doesn't have any feasible solution while the original control problem does have a solution!

The discretization of the control problem should take into account the structure of the optimal trajectory





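With a final state constraint.

Min $\varphi(y_T)$

subject to: $\dot y_t = f(y_t, u_t)$, $t \in (0,T)$, $y_0 = y^0$,

$\Psi(y_T) = 0$. (P)

The mapping $\mathcal{T} : u \mapsto y^u$ is single-valued.

The OCP (P) can be re-written as:

Min $F(u) := J(u, y^u)$

subject to: $u \in \mathcal{U}_{ad}$; $\Psi(\mathcal{T}(u)(T)) = 0$.

Reminder (A known result in Optimization theory)

$\bar u \in \mathcal{U}_{ad}$ is a minimum of (P) $\Longrightarrow$ $\exists (\lambda_0, \lambda) \ne 0$ such that

$$\big[\lambda_0 F'(\bar u) + [\Psi'(\mathcal{T}(\bar u)(T)) \cdot \mathcal{T}'(\bar u)(T)]^{\mathsf T} \lambda\big] \cdot (u - \bar u) \ge 0 \quad \forall u \in \mathcal{U}_{ad}.$$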


Differentiability of F

(A1') Assume $f$ is of class $C^1$.

Theorem

Assume (A0)-(A1) and (A1'). Then $\mathcal{T}$ is differentiable on $L^2(0,T;\mathbb{R}^m)$. Moreover, we have:

$\mathcal{T}'(u) \cdot v = z^{u,v} \quad \forall u, v \in L^2(0,T;\mathbb{R}^m)$,

where $z^{u,v}$ is the linearized state, solution of:

$$\begin{cases} \dot z_t = f'_y(y^u_t, u_t)\, z_t + f'_u(y^u_t, u_t)\, v_t & \text{on } (0,T), \\ z_0 = 0, \end{cases} \qquad (1)$$

where $y^u_\cdot := \mathcal{T}(u)$ stands for the state associated with $u$.
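The linearized equation (1) can be sanity-checked against a difference quotient. The sketch below is only an illustration: the scalar dynamics $f(y, u) = \sin y + u^2$, the control $u_t = \cos t$, the direction $v \equiv 1$ and all function names are assumptions chosen for the test, and both the state and the linearized state are approximated by the explicit Euler scheme.

```python
import math


def f(y, u):
    return math.sin(y) + u * u


def f_y(y, u):
    return math.cos(y)


def f_u(y, u):
    return 2.0 * u


def euler_state(u, x, h):
    """Euler approximation of y' = f(y, u), y(0) = x."""
    ys = [x]
    for u_k in u:
        ys.append(ys[-1] + h * f(ys[-1], u_k))
    return ys


def euler_linearized(u, v, ys, h):
    """Euler approximation of z' = f_y(y, u) z + f_u(y, u) v, z(0) = 0."""
    zs = [0.0]
    for k in range(len(u)):
        slope = f_y(ys[k], u[k]) * zs[-1] + f_u(ys[k], u[k]) * v[k]
        zs.append(zs[-1] + h * slope)
    return zs


N, T, x = 200, 1.0, 0.5
h = T / N
u = [math.cos(k * h) for k in range(N)]
v = [1.0] * N
eps = 1e-6

ys = euler_state(u, x, h)
ys_eps = euler_state([u_k + eps * v_k for u_k, v_k in zip(u, v)], x, h)
zs = euler_linearized(u, v, ys, h)

# Compare (y^{u + eps v} - y^u) / eps with the linearized state z.
gap = max(abs((a - b) / eps - z) for a, b, z in zip(ys_eps, ys, zs))
print("max gap between difference quotient and linearized state:", gap)
```

The gap shrinks proportionally to `eps` (until round-off dominates), which is the discrete counterpart of $\mathcal{T}'(u) \cdot v = z^{u,v}$.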


Theorem

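We have:

$$\lambda_0 F'(u) \cdot v + \big[[\Psi'(y^u_T)\,\mathcal{T}'(u)(T)]^{\mathsf T} \lambda\big] \cdot v = \int_0^T \langle p(t), f_u(y^u_t, u_t)\, v_t \rangle\, dt,$$

where $y^u = \mathcal{T}(u)$, and $p$ is the adjoint state associated with $u$, solution of:

$-\dot p(t) = [f_y(y^u_t, u_t)]^{\mathsf T} p(t)$,

$p(T) = \lambda_0\, \varphi'(y^u_T) + [\Psi'(y^u_T)]^{\mathsf T} \lambda$.

Introduce the Hamiltonian $H : \mathbb{R}^d \times \mathbb{R}^m \times \mathbb{R}^d \to \mathbb{R}$, defined by:

$H(x, v, q) = q \cdot f(x, v)$.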
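The adjoint state gives the gradient of the cost with a single backward sweep. The sketch below shows the discrete analogue for the Euler scheme, under simplifying assumptions that are not in the text: scalar state, no final constraint (so $\lambda_0 = 1$ and $\lambda = 0$), hypothetical dynamics $f(y, u) = \sin y + u^2$ and final cost $\varphi(y) = y^2/2$. The discrete adjoint is $p_N = \varphi'(y_N)$, $p_k = p_{k+1} + h\, f_y(y_k, u_k)\, p_{k+1}$, and the gradient component is $h\, f_u(y_k, u_k)\, p_{k+1}$.

```python
import math


def f(y, u):
    return math.sin(y) + u * u


def f_y(y, u):
    return math.cos(y)


def f_u(y, u):
    return 2.0 * u


def final_cost(y):
    return 0.5 * y * y


def final_cost_prime(y):
    return y


def rollout(u, x, h):
    """Euler approximation of y' = f(y, u), y(0) = x."""
    ys = [x]
    for u_k in u:
        ys.append(ys[-1] + h * f(ys[-1], u_k))
    return ys


def adjoint_gradient(u, x, h):
    """Gradient of u -> final_cost(y_N) via one backward adjoint sweep."""
    ys = rollout(u, x, h)
    p = final_cost_prime(ys[-1])              # p_N
    grad = [0.0] * len(u)
    for k in reversed(range(len(u))):
        grad[k] = h * f_u(ys[k], u[k]) * p    # here p holds p_{k+1}
        p = p + h * f_y(ys[k], u[k]) * p      # now p holds p_k
    return grad


N, x = 50, 0.5
h = 1.0 / N
u = [math.cos(k * h) for k in range(N)]
grad = adjoint_gradient(u, x, h)

# Central finite-difference check on a few components.
eps = 1e-6
for k in (0, N // 2, N - 1):
    up = u[:k] + [u[k] + eps] + u[k + 1:]
    um = u[:k] + [u[k] - eps] + u[k + 1:]
    fd = (final_cost(rollout(up, x, h)[-1]) - final_cost(rollout(um, x, h)[-1])) / (2 * eps)
    print(k, grad[k], fd)
```

One forward rollout and one backward sweep give all $N$ gradient components, whereas finite differences need two rollouts per component.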


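Theorem (Under (A1)-(A3) and (A1'))

Let $\bar u \in \mathcal{U}_{ad}$ be a minimum of (P). Then the triplet $(\bar u, \bar y, \bar p)$ satisfies:

$\dot{\bar y}(t) = f(\bar y(t), \bar u(t))$, $\bar y(0) = x_0$,

$-\dot{\bar p}(t) = [f_y(\bar y(t), \bar u(t))]^{\mathsf T}\, \bar p(t)$,

$\partial_u H(\bar y(t), \bar u(t), \bar p(t)) \cdot (u - \bar u(t)) \ge 0 \quad \forall u \in U$.

The triplet $(\bar u, \bar y, \bar p)$ is called a Pontryagin extremal.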



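Let $\bar u \in \mathcal{U}_{ad}$ be a minimum of (P). Then the triplet $(\bar u, \bar y, \bar p)$ satisfies:

$\dot{\bar y}(t) = f(\bar y(t), \bar u(t))$, $\bar y(0) = x_0$,

$-\dot{\bar p}(t) = [f_y(\bar y(t), \bar u(t))]^{\mathsf T}\, \bar p(t)$,

$H(\bar y(t), \bar u(t), \bar p(t)) = \min_{u \in U} H(\bar y(t), u, \bar p(t))$.

The triplet $(\bar u, \bar y, \bar p)$ is called a Pontryagin extremal.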
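For dynamics that are affine in the control, the minimum condition on the Hamiltonian can be solved by hand. The sketch below takes the double integrator $f(x, v) = (x_2, v)$ with $U = [-1, 1]$ as an assumed example, so that $H(x, v, q) = q_1 x_2 + q_2 v$ and the costate equations reduce to $\dot p_1 = 0$, $\dot p_2 = -p_1$. The function names and the costate values $p_1 = 1$, $p_2(0) = 1.05$ are hypothetical.

```python
def minimizing_control(q2):
    """Minimizer over v in [-1, 1] of H(x, v, q) = q1 * x2 + q2 * v."""
    if q2 > 0.0:
        return -1.0
    if q2 < 0.0:
        return 1.0
    return 0.0  # H does not depend on v here; any v in [-1, 1] is a minimizer


def extremal_control(p1, p2_initial, times):
    """Control along the costate p1(t) = p1, p2(t) = p2_initial - p1 * t."""
    return [minimizing_control(p2_initial - p1 * t) for t in times]


times = [0.1 * k for k in range(21)]  # grid on [0, 2]
controls = extremal_control(p1=1.0, p2_initial=1.05, times=times)
switches = sum(1 for a, b in zip(controls, controls[1:]) if a != b)
print(controls[0], controls[-1], switches)
```

Since $p_2$ is affine in time, it changes sign at most once, so the resulting control is bang-bang with at most one switch.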


More generally ...

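Min $\varphi(y_T) + \int_0^T \ell(y(t), u(t))\,dt$


$\Psi(y_T) = 0$


Let $\bar u \in \mathcal{U}_{ad}$ be a minimum of (P). Then there exists $(\lambda_0, \lambda) \in \{0, 1\} \times \mathbb{R}^d$ such that

$\dot{\bar y}(t) = \partial_p H(\bar y(t), \bar u(t), \bar p(t), \lambda_0)$, $\bar y(0) = x_0$,

$-\dot{\bar p}(t) = \partial_y H(\bar y(t), \bar u(t), \bar p(t), \lambda_0)$,

$\partial_u H(\bar y(t), \bar u(t), \bar p(t), \lambda_0) \cdot (u - \bar u(t)) \ge 0 \quad \forall u \in U$,

where $H(x, v, q, \mu) := \langle q, f(x, v) \rangle + \mu\,\ell(x, v)$ for $x \in \mathbb{R}^d$, $v \in U$, $q \in \mathbb{R}^d$ and $\mu \in \{0, 1\}$. Moreover, $\lambda_0 = 1$ if the problem is free of state constraints.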


