Pontryagin’s Minimum Principle. Optimal Control with Constraints on Inputs

1 Numerical determination of optimal trajectories

1.1 Two-point boundary value problem

In the previous chapter, variational techniques were used to derive necessary conditions for optimal control. The problem was stated as follows:

Consider the problem of minimizing the performance measure

J = h(x(tf), tf) + ∫_{t0}^{tf} g(x(t), u(t), t) dt    (1.1)

subject to:

ẋ(t) = f(x(t), u(t), t), x(t0) = x0    (1.2)

The problem is to find an admissible control u∗(t) that causes the system (1.2) to follow an admissible trajectory x∗(t) that minimizes the performance measure (1.1).

The hamiltonian has been defined as:

H(x(t),u(t), λ(t), t) = g(x(t),u(t), t) + λT (t)f(x(t),u(t), t)

Assuming that the state and control variables are not constrained, that the final time tf is fixed and the final state is free, we can summarize the two-point boundary value problem (TPBVP) that results from the variational approach by the equations (Kirk, 2004):

ẋ∗ = ∂H/∂λ = f(x∗, u∗, t)    (1.3)

λ̇∗ = −∂H/∂x    (1.4)


0 = ∂H/∂u    (1.5)

x∗(t0) = x0    (1.6)

λ∗(tf) = ∂h/∂x(x∗(tf))    (1.7)

From these five sets of conditions it is desired to obtain an explicit relationship for x∗(t) and u∗(t), t ∈ [t0, tf]. The difficulty of solving the differential equations in this case is caused by the combination of split boundary values and nonlinear differential equations.

Note that the differential equations (1.3) and (1.4) should, in general, be solved simultaneously because they depend on the same variables (the states and the adjoint variables). The boundary conditions, however, are split as shown in (1.6) and (1.7): the first are the values of the states at the initial time and the rest are the values of the adjoint variables at the final time. A classical numerical integration method cannot be applied directly.

The algorithm (Kirk, 2004) is based on the observation that the control u∗(t) which minimizes the hamiltonian will minimize J. Thus, the numerical algorithm will determine the control u∗ which makes the first derivative of the hamiltonian equal to zero.

• Write the hamiltonian H(x(t), u(t), λ(t), t):

H(x(t), u(t), λ(t), t) = g(x(t), u(t), t) + λT(t) · f(x(t), u(t), t)    (1.8)

Calculate the first derivative of the hamiltonian with respect to u: ∂H/∂u.

• Write the differential equations of the adjoint variables:

λ̇i = −∂H/∂xi

• Determine the final conditions for λ(tf )

• Start with a random discrete approximation for u(t), t ∈ [t0, tf]. For example, divide the interval [t0, tf] into N subintervals and consider the control u constant on each subinterval:

u(t) = u(tk), k = 1..N


Figure 1.1: Discrete control variable (the initial guess uk on the N subintervals and the optimal control u∗)

• Minimize H(u) using a Matlab function or a numerical algorithm to determine the minimum (for example, gradient methods). The function to be minimized has to solve at each step:

– Using the vector u, integrate the state equations (use Runge-Kutta or Euler methods) with the initial conditions x(0) = x0 to obtain a state trajectory constant on the N subintervals.

– Calculate λ(tf).

– Solve the adjoint system of differential equations in reverse, from tf to t0.

– Using u, x and λ, calculate the first derivative of the hamiltonian.

The procedure to solve the optimal control problem using the steepest descent method can be summarized as follows:

• Select a discrete approximation of the control variable u(t), t ∈ [t0, tf], by subdividing the interval [t0, tf] into N subintervals and considering the control variable as piecewise constant on each of these subintervals.

• Using the N control values, integrate the state equations (1.3) from t0 to tf, with initial conditions (1.6), and obtain a state trajectory as a piecewise-constant vector function.

• Calculate λ(tf) by substituting x(tf) into (1.7).

• Integrate (1.4) from tf to t0, with final conditions (1.7), using the piecewise-constant values of the states calculated before.


• If

‖∂H/∂u‖ ≤ ε    (1.9)

where ε is a small positive constant, terminate the iterative procedure and output the extremal state and control values.

• If the stopping criterion (1.9) is not satisfied, generate a new piecewise-constant control function given by:

u ← u − τ ∂H/∂u    (1.10)

where τ is the step of the steepest descent method.
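As a concrete illustration, the steepest-descent loop above can be sketched in a few lines of Python (an illustrative language choice; the text suggests Matlab). The plant ẋ = −x + u with g = ½(x² + u²) and h = 0 is a hypothetical test problem, not one from the text; for it, ∂H/∂u = u + λ and the costate satisfies λ̇ = −x + λ with λ(T) = 0.

```python
import numpy as np

def steepest_descent_ocp(x0=1.0, T=1.0, N=100, tau=0.5, eps=1e-6, max_iter=2000):
    """Steepest-descent solution of an illustrative test problem:
    minimize J = int 0.5*(x^2 + u^2) dt for xdot = -x + u, x(0) = x0,
    fixed final time T, free final state."""
    dt = T / N
    u = np.zeros(N)                      # initial guess: piecewise-constant control
    for _ in range(max_iter):
        # forward pass: Euler integration of the state equation
        x = np.empty(N + 1)
        x[0] = x0
        for k in range(N):
            x[k + 1] = x[k] + dt * (-x[k] + u[k])
        # backward pass: costate from t = T down to t = 0
        lam = np.empty(N + 1)
        lam[N] = 0.0                     # lambda(T) = dh/dx = 0 (since h = 0)
        for k in range(N, 0, -1):
            lam[k - 1] = lam[k] - dt * (-x[k] + lam[k])
        dHdu = u + lam[:N]               # gradient of H with respect to u
        if np.linalg.norm(dHdu) * np.sqrt(dt) <= eps:
            break                        # stopping criterion (1.9)
        u -= tau * dHdu                  # steepest-descent update (1.10)
    return x, u
```

For x0 = 1 the computed control is negative (it opposes the state) and tends to zero at t = T, as required by λ(T) = 0.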


2 Pontryagin’s Minimum Principle

2.1 Constrained optimal control and the minimum principle

We have assumed that the admissible controls and states are not constrained by any boundaries; however, in realistic systems such constraints commonly occur. Physically realizable controls generally have magnitude limitations: the thrust of a rocket engine cannot exceed a certain value; motors, which provide torque, saturate; attitude control mass expulsion systems are capable of providing a limited torque. State constraints often arise because of safety or structural restrictions: the current in an electric motor cannot exceed a certain value without damaging the windings; the turning radius of a maneuvering aircraft cannot be less than a specified minimum value (Kirk, 2004).

The general approach in which we consider the effect of control constraints and derive the necessary conditions leads to Pontryagin’s minimum principle.

Figure 2.1: An extremal control that is constrained by a boundary


Figure 2.1 shows an example of bounded controls. All functions bounded by vertical lines between t0 and tf and placed in the admissible region are admissible controls.

It can be demonstrated (Kirk, 2004) that a necessary condition for u∗(t) to minimize the functional J is:

H(x∗(t), u∗(t), λ∗(t), t) ≤ H(x∗(t), u(t), λ∗(t), t)    (2.1)

for all t ∈ [t0, tf] and for all admissible controls. Equation (2.1), which indicates that an optimal control must minimize the hamiltonian, is called Pontryagin’s minimum principle. Notice that we have established a necessary, but not (in general) sufficient, condition for optimality. An optimal control must satisfy Pontryagin’s minimum principle; however, there may be controls that satisfy the minimum principle that are not optimal.

Let us now summarize the principal results. The problem is to find a control u∗(t) ∈ U (where U is the class of admissible controls) which causes the system

ẋ(t) = f(x(t), u(t), t)    (2.2)

to follow an admissible trajectory that minimizes the performance measure

J(u) = h(x(tf), tf) + ∫_{t0}^{tf} g(x(t), u(t), t) dt    (2.3)

In terms of the hamiltonian:

H(x(t),u(t), λ(t), t) = g(x(t),u(t), t) + λT (t)f(x(t),u(t), t) (2.4)

the necessary conditions for u∗(t) to be optimal are:

ẋ∗(t) = ∂H/∂λ(x∗(t), u∗(t), λ∗(t), t) = f(x∗(t), u∗(t), t)    (2.5)

λ̇∗(t) = −∂H/∂x(x∗(t), u∗(t), λ∗(t), t)    (2.6)

H(x∗(t), u∗(t), λ∗(t), t) ≤ H(x∗(t), u(t), λ∗(t), t)    (2.7)

for all admissible controls u(t) and for all t ∈ [t0, tf]. The following boundary condition holds when the final state is free and the final time is fixed:

∂h/∂x(x∗(tf), tf) = λ∗(tf)    (2.8)


and:

H(x∗(tf), u∗(tf), λ∗(tf), tf) + ∂h/∂t(x∗(tf), tf) = 0    (2.9)

when the final state is fixed and the final time is free. It should be emphasized that:

• u∗(t) is a control that causes H(x∗(t), u∗(t), λ∗(t), t) to assume its global, or absolute, minimum.

• Equations (2.5), (2.6), (2.7), (2.8) constitute a set of necessary conditions for optimality; these conditions are not, in general, sufficient.

In addition, the minimum principle, although derived for controls with values in a closed and bounded region, can also be applied to problems in which the admissible controls are not bounded. This can be done by viewing the unbounded control region as having arbitrarily large bounds, thus ensuring that the optimal control will not be constrained by boundaries. In this case, for u∗(t) to minimize the hamiltonian it is necessary (but not sufficient) that

∂H/∂u(x∗(t), u∗(t), λ∗(t), t) = 0    (2.10)

If equation (2.10) is satisfied, and the matrix

∂²H/∂u²(x∗(t), u∗(t), λ∗(t), t)

is positive definite, this is sufficient to guarantee that u∗(t) causes H to be a local minimum. If the hamiltonian can be expressed in the form

H(x(t), u(t), λ(t), t) = f(x, λ, t) + c(x, λ, t)T u(t) + (1/2) uT R(t) u    (2.11)

where c is an m × 1 array that does not contain any terms in u(t), then satisfaction of (2.10) together with ∂²H/∂u² positive definite is necessary and sufficient for H(x∗(t), u∗(t), λ∗(t), t) to be a global minimum.

For H of the form (2.11),

∂²H/∂u²(x∗(t), u∗(t), λ∗(t), t) = R(t)

thus, if R(t) is positive definite,

u∗(t) = −R^{−1}(t) c(x∗(t), λ∗(t), t)

minimizes the hamiltonian.


Example 2.1 (Weber, 2000) Consider the problem of accelerating a skateboard in such a way as to maximize the total distance traveled in a given time T, minus the effort expended. Denote by x1(t) the distance traveled at time t. The speed x2(t) is the first derivative of x1, and let the acceleration u(t) be the first derivative of x2. The dynamical system that describes the problem is:

ẋ1(t) = x2(t)

ẋ2(t) = u(t)    (2.12)

where x1(0) = 0, x2(0) = 0. The performance measure to be minimized in this case is:

J = −x1(T) + (1/2) ∫_0^T u²(t) dt    (2.13)

Note that the first term in J is the negative of the distance traveled in time T, i.e. while this term is minimized, the distance x1(T) is maximized. The squared acceleration integrated over the entire time interval corresponds to the effort expended in time T.

The hamiltonian is

H(x, u, λ) =1

2u2(t) + λ1(t)x2(t) + λ2(t)u(t)

The control u∗ that minimizes the hamiltonian is computed from

∂H/∂u = u(t) + λ2(t) = 0

or

u∗(t) = −λ2(t)

This control will indeed minimize the hamiltonian since the second derivative is positive:

∂²H/∂u² = 1 > 0

The costates are computed from:

λ̇1(t) = −∂H/∂x1 = 0

λ̇2(t) = −∂H/∂x2 = −λ1(t)    (2.14)

Since the final states are free and the final time is given (T), the final conditions for λ1,2 are given by:

λ1(T) = ∂h(x(T))/∂x1(T) = −1

λ2(T) = ∂h(x(T))/∂x2(T) = 0    (2.15)


where h(x(T)) = −x1(T).

From (2.14) and (2.15) we obtain:

λ1(t) = −1

λ2(t) = t− T (2.16)

Because u∗(t) = −λ2(t), we obtain the optimal control law as:

u∗(t) = T − t

which decreases linearly with time and reaches zero at the final time.
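A quick numerical sanity check of this result (illustrative, with the hypothetical choice T = 2): Euler-integrating the state equations (2.12) under u∗(t) = T − t should reproduce the closed-form trajectory x2(t) = Tt − t²/2, x1(t) = Tt²/2 − t³/6.

```python
# Numerical check of Example 2.1 (illustrative value T = 2): integrate
# x1' = x2, x2' = u with the optimal control u*(t) = T - t found above.
T, N = 2.0, 20000
dt = T / N
x1, x2 = 0.0, 0.0
for k in range(N):
    t = k * dt
    x1, x2 = x1 + dt * x2, x2 + dt * (T - t)
# closed-form optimal trajectory: x2(t) = T*t - t^2/2, x1(t) = T*t^2/2 - t^3/6
assert abs(x2 - T ** 2 / 2) < 1e-3   # x2(T) = T^2/2
assert abs(x1 - T ** 3 / 3) < 1e-3   # x1(T) = T^3/3
```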

Example 2.2 Let us now illustrate the effect of constraining the admissible control values on the necessary conditions. Consider the system having the state equations:

ẋ1(t) = x2(t)

ẋ2(t) = −x2(t) + u(t)    (2.17)

with the initial conditions x(t0) = x0. The performance measure to be minimized is:

J(u) = ∫_{t0}^{tf} (1/2)[x1²(t) + u²(t)] dt    (2.18)

tf is specified and the final state x(tf) is free.

a) Find the necessary conditions for an unconstrained control to minimize J.

H(x, u, λ) = (1/2)[x1²(t) + u²(t)] + λ1(t) x2(t) + λ2(t)(−x2(t) + u(t))    (2.19)

The costate equations are:

λ̇1(t) = −∂H/∂x1 = −x1(t)

λ̇2(t) = −∂H/∂x2 = −λ1(t) + λ2(t)    (2.20)

Since the control values are unconstrained, it is necessary that:

∂H/∂u = u∗(t) + λ2(t) = 0 ⇒ u∗(t) = −λ2∗(t)    (2.21)


Notice that the hamiltonian is of the form (2.11) and

∂²H/∂u² = 1 > 0

therefore u∗(t) does minimize the hamiltonian.

The costate boundary conditions are:

λ1(tf) = ∂h/∂x1(x(tf), tf) = 0

λ2(tf) = ∂h/∂x2(x(tf), tf) = 0    (2.22)

since h does not appear in J.

b) Find necessary conditions for optimal control if:

−1 ≤ u(t) ≤ 1, for all t ∈ [t0, tf ] (2.23)

The state and costate equations and the boundary conditions for λ(tf) remain unchanged; however, u must be selected to minimize

H(x∗, u, λ∗) = (1/2) x1∗² + (1/2) u² + λ1∗ x2∗ + λ2∗ (−x2∗ + u)

subject to the constraining relations (2.23). To determine the control that minimizes H, we first separate all of the terms containing u(t):

(1/2) u²(t) + λ2∗(t) u(t)    (2.24)

from the hamiltonian. For times when the optimal control is unsaturated, we have:

u∗(t) = −λ2∗(t)

as in part a); clearly this will occur when |λ2∗(t)| ≤ 1. If, however, there are times when |λ2∗(t)| > 1, then, from (2.24), the control that minimizes H is:

u∗(t) = −1 for λ2∗(t) > 1;  u∗(t) = +1 for λ2∗(t) < −1

Thus u∗(t) is the saturated function of λ2∗(t) pictured in Figure 2.2. For the constrained control we have:

u∗(t) = −1 for λ2∗(t) > 1;  −λ2∗(t) for −1 ≤ λ2∗(t) ≤ 1;  +1 for λ2∗(t) < −1    (2.25)


Figure 2.2: Saturated control

To determine u∗(t) explicitly, the state and costate equations must be solved. Because of the differences between equations (2.21) and (2.25), the state-costate trajectories in the two cases will be the same only if the initial state values are such that the bounded control does not saturate. If this situation occurs, the control constraints do not affect the solution.
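Computationally, the constrained law (2.25) is just the unconstrained control −λ2∗ clipped to the admissible interval [−1, 1]; a minimal sketch:

```python
def u_star(lam2):
    """Saturated optimal control (2.25): -lam2 clipped to [-1, 1]."""
    return max(-1.0, min(1.0, -lam2))

assert u_star(2.0) == -1.0    # lam2 > 1: saturated at -1
assert u_star(-3.0) == 1.0    # lam2 < -1: saturated at +1
assert u_star(0.4) == -0.4    # |lam2| <= 1: unsaturated, u* = -lam2
```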

Note that the optimal control history for part b) cannot, in general, be determined by calculating the optimal control history for part a) and letting it saturate whenever the stipulated boundaries are violated.

2.2 Additional necessary conditions

Other necessary conditions may be obtained for the special case when the hamiltonian does not depend explicitly on time, i.e. the optimal control problem is stated as follows:

Determine the optimal control u∗(t) that causes the system

ẋ(t) = f(x(t), u(t)), t0 ≤ t ≤ tf    (2.26)

to follow an optimal trajectory x∗(t), while minimizing the performance measure:

J = h(x(tf)) + ∫_{t0}^{tf} g(x(t), u(t)) dt    (2.27)

Additional condition 1: If the final time tf is fixed, the hamiltonian is constant along the optimal trajectory, or:

H(x∗(t), u∗(t), λ∗(t)) = C, t0 ≤ t ≤ tf    (2.28)


Additional condition 2: If the final time tf is free, the hamiltonian is zero along the optimal trajectory, or:

H(x∗(t), u∗(t), λ∗(t)) = 0, t0 ≤ t ≤ tf    (2.29)

One approach to prove the relation (2.28) is to differentiate the hamiltonian with respect to time:

d/dt H(x(t), u(t), λ(t)) = d/dt ( g(x(t), u(t)) + λT(t) f(x(t), u(t)) )

= [dx(t)/dt]T ∂H/∂x + [dλ(t)/dt]T ∂H/∂λ + [du(t)/dt]T ∂H/∂u

= ẋ(t)T ∂H/∂x + λ̇(t)T ∂H/∂λ + u̇(t)T ∂H/∂u    (2.30)

Using the necessary conditions for the optimal trajectory x∗(t):

ẋ∗(t) = ∂H/∂λ    (2.31)

λ̇∗(t) = −∂H/∂x    (2.32)

and knowing that the optimal control u∗(t) satisfies:

∂H/∂u = 0    (2.33)

the relation (2.30) becomes:

dH/dt = ẋ∗(t)T (−λ̇∗(t)) + λ̇∗(t)T ẋ∗(t) + u̇∗(t)T · 0 = 0    (2.34)

which means the hamiltonian is constant when evaluated along the optimal trajectory.

The relation (2.29) results directly from the boundary condition written for the case when the final time is free, i.e.:

H(x∗(tf), u∗(tf), λ∗(tf), tf) + ∂h/∂t(x∗(tf), tf) = 0    (2.35)

When the hamiltonian and the function h do not depend explicitly on time, the relation (2.35) becomes:

H(x∗(tf), u∗(tf), λ∗(tf)) = 0    (2.36)

But the hamiltonian is constant along the optimal trajectory, therefore from (2.36) and (2.28) we obtain:

H(x∗(t), u∗(t), λ∗(t)) = 0, t0 ≤ t ≤ tf    (2.37)


3 Minimum time problems. Bang-bang control

Example 3.1

Consider the problem of accelerating a skateboard in such a way as to bring the skateboard to rest at a given position in minimum time. Denote by x1(t) the distance traveled at time t, by x2(t) = ẋ1(t) the velocity, and by u(t) = ẋ2(t) the acceleration. The model of the dynamical system is:

ẋ1(t) = x2(t);  ẋ2(t) = u(t)

with the final states specified: x1(T) = x2(T) = 0. We want to determine the minimum final time T by minimizing:

J = ∫_0^T 1 dt

Consider the constraints on the control variable: −1 ≤ u(t) ≤ 1. The hamiltonian is:

H(x, u, λ) = 1 + λ1(t) x2(t) + λ2(t) u(t)

By examining the form of the hamiltonian we can see that the term λ2(t)u(t), the only one which depends on u(t), decides the minimum value of H with respect to u.

The costate λ2(t) is a switching function because the optimal value of u(t) changes when λ2(t) changes sign. The explanation follows below.

The costate equations are:

λ̇1(t) = −∂H/∂x1 = 0,  λ̇2(t) = −∂H/∂x2 = −λ1(t)  ⇒  λ1(t) = C1,  λ2(t) = −C1 t + C2


The switching function λ2(t) is a line and therefore it can change sign at most once. Then:

u∗(t) = +1 for λ2(t) < 0;  u∗(t) = −1 for λ2(t) > 0

Figure 3.1: Bang-bang control

For λ2(t) < 0, the term λ2(t)u(t) from the hamiltonian has its minimum (negative) value when u(t) has its maximum positive value. Then for u∗(t) = +1 the hamiltonian is minimal for any t ∈ [0, T] (we subtracted the maximum possible amount from the term 1 + λ1(t)x2(t), which is constant with respect to u). For λ2(t) > 0, the term λ2(t)u(t) will be ’most negative’ for u∗(t) = −1, and then the hamiltonian reaches its global minimum.

The optimal control is now given in terms of the costate vector λ(t). We want the control to be given in terms of the state x(t), so that we have a closed-loop control system.

Solving the state equations:

u∗(t) = −1 ⇒ ẋ1(t) = x2(t), ẋ2(t) = −1, x1(0) = x10, x2(0) = x20 ⇒

x1(t) = −t²/2 + x20 t + x10,  x2(t) = −t + x20    (3.1)

⇒ x1(t) = −(1/2)(x20 − t)² + x10 + x20²/2 = −(1/2)(x20 − t)² + k1 = −x2²(t)/2 + k1    (3.2)

u∗(t) = +1 ⇒ ẋ1(t) = x2(t), ẋ2(t) = +1, x1(0) = x10, x2(0) = x20 ⇒

x1(t) = t²/2 + x20 t + x10,  x2(t) = t + x20    (3.3)

⇒ x1(t) = (1/2)(t + x20)² + x10 − x20²/2 = (1/2)(t + x20)² + k2 = x2²(t)/2 + k2    (3.4)


The equations (3.2) and (3.4) are two sets of parabolas opening about the x1 axis, as shown in Figure 3.2.

Figure 3.2: State space trajectories for bang-bang control

Two parabolas pass through each point in the x1−x2 space, one for u∗ = −1 and one for u∗ = +1. For a given x(0) = [x10 x20]T, using only u∗ = −1 or only u∗ = +1 may not transfer the system from the initial state to the origin x(T) = 0 (the target set).

The parabolas that pass through the origin are the ones for which k1 = k2 = 0, or the ones for which the initial conditions satisfy:

x10 = −x20²/2 for u∗ = −1,  or  x10 = x20²/2 for u∗ = +1    (3.5)

The equations of the curves that transfer the system to the origin are obtained as follows:

u∗(t) = −1 ⇒ ẋ1(t) = x2(t), ẋ2(t) = −1, x1(T) = x2(T) = 0 ⇒ x1(t) = −(T − t)²/2, x2(t) = T − t ⇒ x1(t) = −x2²(t)/2

u∗(t) = +1 ⇒ ẋ1(t) = x2(t), ẋ2(t) = +1, x1(T) = x2(T) = 0 ⇒ x1(t) = (t − T)²/2, x2(t) = t − T ⇒ x1(t) = x2²(t)/2

Since t < T, u∗ = −1 implies x2(t) > 0 and u∗ = +1 implies x2(t) < 0. Then the equation of the switching locus can be written as:

x1(t) = −(1/2) x2(t)|x2(t)|    (3.6)

and is presented in Figure 3.3.


Figure 3.3: Switching locus

If, by chance, the initial state lies on the switching locus, then we have u = ±1 according as x1 ≷ 0. If, as in most cases, the initial state is not on the switching locus, then u = ±1 must be chosen to move the system toward the switching locus. By inspection, it is apparent that above the switching locus we have u = −1 and below it we have u = +1; a typical path from an initial state (x10, x20) is shown in Figure 3.4.

Figure 3.4: Examples of optimal trajectories

The optimal control law is applied according to the following rules:

• If x(0) is to the left of the switching curve, apply u∗ = +1 until the curve is reached, then apply u∗ = −1 until the origin is reached.

• If x(0) is to the right of the switching curve, apply u∗ = −1 until the curve is reached, then apply u∗ = +1 until the origin is reached.

• If x(0) is on the switching curve, apply u = −sign(x2(0)) until the origin is reached.


• Apply u = 0 when the origin is reached.

We have obtained the optimal control law at any time t as a function of the state value. To express the optimal control law in a convenient form, let us define the switching function, obtained from equation (3.6):

s(x(t)) = x1(t) + (1/2) x2(t)|x2(t)|    (3.7)

Notice that:

• s(x(t)) > 0 implies x(t) lies to the right of the switching locus

• s(x(t)) < 0 implies x(t) lies to the left of the switching locus

• s(x(t)) = 0 implies x(t) lies on the switching locus

In terms of this switching function, the optimal control law is:

• u∗ = −1 for x(t) such that s(x(t)) > 0

• u∗ = +1 for x(t) such that s(x(t)) < 0

• u∗ = −1 for x(t) such that s(x(t)) = 0 and x2(t) > 0

• u∗ = +1 for x(t) such that s(x(t)) = 0 and x2(t) < 0

• u∗ = 0 for x(t) = 0

An implementation of this optimal control law is shown in Figure 3.5.

Figure 3.5: Implementation of the time-optimal control law
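The rules above translate directly into code. The sketch below (an illustrative Python rendering with hypothetical function names) implements the law through the switching function (3.7), and a crude Euler simulation checks that the state starting from x(0) = (1, 0) is driven close to the origin:

```python
def s(x1, x2):
    """Switching function (3.7): zero on the switching locus."""
    return x1 + 0.5 * x2 * abs(x2)

def u_star(x1, x2):
    """Time-optimal bang-bang control for x1'' = u, |u| <= 1."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0                      # origin reached: switch off
    v = s(x1, x2)
    if v > 0:
        return -1.0                     # right of the switching locus
    if v < 0:
        return 1.0                      # left of the switching locus
    return -1.0 if x2 > 0 else 1.0      # on the locus: ride it to the origin

def closest_approach(x1, x2, dt=1e-3, steps=10000):
    """Euler simulation under the law above; distance of closest approach to 0."""
    best = (x1 ** 2 + x2 ** 2) ** 0.5
    for _ in range(steps):
        u = u_star(x1, x2)
        x1, x2 = x1 + dt * x2, x2 + dt * u
        best = min(best, (x1 ** 2 + x2 ** 2) ** 0.5)
    return best
```

Starting from (1, 0) the law first gives u = −1, switches to u = +1 on the curve, and the simulated state passes close to the origin (within the Euler discretization error).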


4 Minimum fuel problems

4.1 Minimum fuel problems

4.1.1 The statement of the problem

For a system described by the state equation:

ẋ(t) = f(x(t), u(t), t), t0 ≤ t ≤ tf, x(t0) = x0    (4.1)

where u(t) is an m × 1 control vector, a minimum fuel problem requires determining the optimal control vector u∗(t) that minimizes the performance measure:

J = ∫_{t0}^{tf} [ Σ_{i=1}^{m} αi |ui(t)| ] dt    (4.2)

while satisfying the constraints:

−M ≤ ui(t) ≤ M, i = 1, 2, ..., m (4.3)

It is assumed that the fuel consumption is proportional to the magnitude of the control inputs.

4.1.2 Example of a minimum fuel problem

Example 4.1 The plant described by the differential equation:

ẋ(t) = −x(t) + u(t)    (4.4)


is to be transferred from an arbitrary initial state x(0) = x0 to the origin, x(T) = 0, by a control that minimizes the performance measure:

J(u) = ∫_0^T |u(t)| dt    (4.5)

The admissible controls are constrained by:

|u(t)| ≤ 1 (4.6)

and the final time T is fixed.

A more general case of this problem may be found in (Kirk, 2004). The solution of the problem is the optimal control signal that minimizes the hamiltonian, written as:

H(x(t), u(t), λ(t)) = |u(t)|+ λ(t)(−x(t) + u(t)) (4.7)

The terms in the hamiltonian that depend on the control u(t) are given by (the reduced hamiltonian Hr):

Hr(u(t), λ(t)) = |u(t)|+ λ(t)u(t) (4.8)

or:

Hr = u(t) + λ(t)u(t), for u(t) ≥ 0 (4.9)

Hr = −u(t) + λ(t)u(t), for u(t) ≤ 0 (4.10)

• If λ(t) > 1, the minimum value of expression (4.9), which is a positive quantity, is 0 because u(t) ≥ 0; but the minimum of (4.10) is a negative value and is attained for u∗(t) = −1.

• If λ(t) = 1, the expression (4.9) has a minimum at 0, with u(t) = 0; but (4.10) will be zero for any non-positive value of u(t). Thus, any non-positive value of u(t) will minimize the hamiltonian.

• If 0 < λ(t) < 1, the minimum of (4.9) and (4.10) is zero (they are both positive) and is attained for u∗(t) = 0.

• If λ(t) < −1, the minimum value of expression (4.10), which is a positive quantity, is 0 because u(t) ≤ 0; but the minimum of (4.9) is a negative value and is attained for u∗(t) = +1.


Figure 4.1: Reduced hamiltonian |u| + λu for various values of λ

• If λ(t) = −1, the expression (4.10) has a minimum at 0, with u(t) = 0;but (4.9) will be zero (minimum) for any non-negative value of u(t).

• If −1 < λ(t) < 0, the minimum of (4.9) and (4.10) is zero (they areboth positive) and is attained for u∗(t) = 0.

The above situations are depicted in Figure 4.1 and the conclusions follow directly from the plots.

Summarizing, the control signal that minimizes the hamiltonian (4.7) is:

u∗(t) = +1 for λ∗(t) < −1;  −1 for λ∗(t) > 1;  0 for |λ∗(t)| < 1;  an undetermined non-positive value for λ∗(t) = +1;  an undetermined non-negative value for λ∗(t) = −1    (4.11)

The costate equation and its solution are:

λ̇(t) = −∂H/∂x = λ(t) ⇒ λ(t) = C e^t    (4.12)


There are five possible trajectories for λ∗(t) that affect the control signal u∗(t), depending on the value of the constant of integration C, as shown in Figure 4.2 (Beale, 2001).

Figure 4.2: Forms of the costate λ∗(t) = C e^t (for C ≥ 1, 0 < C < 1, C = 0, −1 < C < 0, C ≤ −1)

Because of the exponential form of the costate, the situation λ∗(t) = ±1 cannot occur over a finite time interval. Therefore, the optimal control can be expressed as:

u∗(t) = +1 for λ∗(t) < −1;  −1 for λ∗(t) > 1;  0 for |λ∗(t)| < 1    (4.13)

The bang-off-bang form of the optimal control u∗(t) as a function of the costate λ∗(t) is shown in Figure 4.3.

Figure 4.3: Bang-off-bang optimal control
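As a minimal sketch, the bang-off-bang law (4.13) maps the costate to the control with a dead zone:

```python
def u_star(lam):
    """Bang-off-bang control (4.13) as a function of the costate lambda*(t)."""
    if lam > 1.0:
        return -1.0
    if lam < -1.0:
        return 1.0
    return 0.0                  # dead zone |lambda*| < 1: no fuel is spent

assert u_star(2.5) == -1.0
assert u_star(-1.5) == 1.0
assert u_star(0.3) == 0.0
```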

Four of the λ∗-trajectories are shown in Figure 4.4; the fifth is λ∗(t) = 0, for which u∗(t) = 0, t ∈ [0, T].

The state trajectories can be found by substituting the admissible values of the control signal u∗(t) into the state equation and solving for x(t). This will determine whether or not the target set can be reached and how this depends on the initial conditions (Beale, 2001).


Figure 4.4: Forms of the costate and optimal control signals: (a) C > 1, (b) 0 < C < 1, (c) C < −1, (d) −1 < C < 0

u∗(t) = 0. The solution of the state equation is obtained as follows:

ẋ(t) = −x(t), x(0) = x0 ⇒ x(t) = x0 e^{−t}    (4.14)

The exponential form of the trajectory indicates that the system will approach the final state zero as time approaches infinity, but it will never attain the zero value in finite time. Therefore, u∗(t) = 0 cannot be the only control value used.

u∗(t) = +1. The solution of the state equation is:

ẋ(t) = −x(t) + 1, x(0) = x0 ⇒ x(t) = 1 + (x0 − 1) e^{−t}    (4.15)

This state trajectory moves towards 1 as time increases and it will pass through the value 0 only if x0 < 0.

For a fixed final time T, the state will reach the origin only for a particular value of the initial condition, determined from:

x(T) = 0 = 1 + (x0 − 1) e^{−T} ⇒ x0 = 1 − e^T    (4.16)

For all other initial conditions, switching is needed.


u∗(t) = −1. The solution of the state equation is:

ẋ(t) = −x(t) − 1, x(0) = x0 ⇒ x(t) = −1 + (x0 + 1) e^{−t}    (4.17)

This state trajectory moves towards −1 as time increases. It will pass through the value 0 only if x0 > 0.

The state will reach the origin using only the control value u∗(t) = −1, in a fixed time T, only for the initial state obtained as:

x(T) = 0 = −1 + (x0 + 1) e^{−T} ⇒ x0 = −1 + e^T    (4.18)

Figure 4.5: State trajectories for different initial conditions. (a) u∗(t) = +1, x∗(t) = 1 + (x0 − 1)e^{−t}; (b) u∗(t) = −1, x∗(t) = −1 + (x0 + 1)e^{−t}

Figure 4.5 (a) and (b) shows the state trajectories for u∗(t) = +1 and u∗(t) = −1. They indicate that the target set x(T) = 0 can be reached in most cases (when the initial condition does not have one of the values given by (4.16) or (4.18)) only after switching the control value from u∗(t) = 0 to u∗(t) = −1 (if the initial state x0 > 0) or to u∗(t) = +1 (if the initial state x0 < 0).

4.1.3 Determining the switching time


Initial condition x0 is positive: x0 > 0. The control u∗(t) = 0 will not drive the system to the origin x(T) = 0 in finite time. As shown in Figure 4.4 (b), either u∗(t) = −1 alone or the sequence u∗(t) = [0, −1] must be used.

Let t = t1 be the switching time from u∗(t) = 0 to u∗(t) = −1. The state value at time t1 is obtained from (4.14) evaluated at t = t1:

x(t1) = x0 e^{−t1}    (4.19)

After switching, the state equation (4.17) is solved on the time interval [t1, T] with the initial condition x(t1):

ẋ(t) = −x(t) − 1, x(t1) = x0 e^{−t1}    (4.20)

and the solution is:

x(t) = −1 + (x0 + e^{t1}) e^{−t}    (4.21)

The switching time t1 is obtained by substituting t = T into (4.21) and setting the result equal to zero (the target set), as shown below:

x(T) = −1 + (x0 + e^{t1}) e^{−T} = 0    (4.22)

(x0 + e^{t1}) e^{−T} = 1 ⇒ x0 + e^{t1} = e^T ⇒ t1 = ln(e^T − x0)    (4.23)

Initial condition x0 is negative: x0 < 0. The switching time t1 can be obtained for negative initial conditions in the same manner.

In this case (see Figure 4.4 (d)), the optimal control is zero, u∗(t) = 0, for t ∈ [0, t1] and u∗(t) = +1 for t ∈ [t1, T].

The switching time results from the calculations given below:

ẋ(t) = −x(t) + 1, x(t1) = x0 e^{−t1}    (4.24)

x(t) = 1 + (x0 − e^{t1}) e^{−t}    (4.25)

x(T) = 1 + (x0 − e^{t1}) e^{−T} = 0    (4.26)

x0 − e^{t1} = −e^T ⇒ t1 = ln(e^T + x0)    (4.27)


4.1.4 Open-loop control

From all the calculations above, the optimal control is:

u∗(t) = 0 for x0 > 0 and t < ln(e^T − x0);  −1 for x0 > 0 and ln(e^T − x0) < t < T;  0 for x0 < 0 and t < ln(e^T + x0);  +1 for x0 < 0 and ln(e^T + x0) < t < T    (4.28)

The optimal control (4.28) is in open-loop form, since the current value of the state x(t) is not used to determine the control signal.

Closed-loop control would be preferable in order to reduce the effects of disturbances.
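The open-loop law (4.28) and the switching times (4.23)/(4.27) can be sketched and checked numerically (illustrative values x0 = 0.5, T = 2; the validity condition |x0| ≤ e^T − 1 is an assumption added here so that the switching time lies in [0, T]):

```python
import math

def u_open_loop(t, x0, T):
    """Open-loop minimum-fuel control (4.28) for xdot = -x + u, x(T) = 0.
    Assumes |x0| <= e^T - 1 so that the switching time lies in [0, T]."""
    if x0 > 0:
        t1 = math.log(math.exp(T) - x0)   # switching time (4.23)
        return 0.0 if t < t1 else -1.0
    if x0 < 0:
        t1 = math.log(math.exp(T) + x0)   # switching time (4.27)
        return 0.0 if t < t1 else 1.0
    return 0.0                            # already at the origin

# Euler check with illustrative values x0 = 0.5, T = 2: x(T) should be near 0
x0, T, N = 0.5, 2.0, 20000
dt, x = T / N, x0
for k in range(N):
    x += dt * (-x + u_open_loop(k * dt, x0, T))
assert abs(x) < 1e-2
```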

4.1.5 Closed-loop control

The closed-loop control law is obtained by solving the state equation for u∗(t) = −1 and then for u∗(t) = +1, with the final condition x(T) = 0. This is based on the observation that during the last part of the time interval, [t1, T], the optimal control is either +1 or −1, depending on whether the initial state is negative or positive.

For x(t) > 0:

ẋ(t) = −x(t) − 1, x(T) = 0 ⇒ x(t) = −1 + e^{T−t}, t > t1    (4.29)

For x(t) < 0:

ẋ(t) = −x(t) + 1, x(T) = 0 ⇒ x(t) = 1 − e^{T−t}, t > t1    (4.30)

During the first part of the interval, t ∈ [0, t1], the optimal control is zero and

x(t) = x0 e^{−t}, t < t1    (4.31)

The switching of the control from 0 to −1 occurs when the curve (4.31) intersects the curve (4.29). We denote x(t) from (4.29) by:

s(t) = −1 + e^{T−t}    (4.32)

and call it the equation of the switching curve. The control will be 0 for all state values smaller than s(t), and it will switch to −1 when they become equal.


For negative values of the state, x(t) < 0, a similar reasoning shows that the control is 0 for −s(t) < x(t) < 0 and it switches to +1 when x(t) ≤ −s(t).

The closed-loop form of the optimal control can be written as:

u∗(t) = −1 for x(t) ≥ s(t);  0 for |x(t)| < s(t);  +1 for x(t) ≤ −s(t)    (4.33)
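A sketch of the closed-loop law (4.33) (illustrative, with hypothetical parameter values), together with a check that feeding it back into ẋ = −x + u steers x0 = 0.5 approximately to zero at t = T:

```python
import math

def u_closed_loop(x, t, T):
    """Closed-loop minimum-fuel control (4.33), with s(t) = e^(T-t) - 1."""
    s = math.exp(T - t) - 1.0
    if x >= s:
        return -1.0
    if x <= -s:
        return 1.0
    return 0.0

# Euler check with illustrative values x0 = 0.5, T = 2: x(T) should be near 0
T, N = 2.0, 20000
dt, x = T / N, 0.5
for k in range(N):
    x += dt * (-x + u_closed_loop(x, k * dt, T))
assert abs(x) < 1e-2
```

Note that, unlike (4.28), this law reacts to the current state x(t), so a disturbance that pushes the state off its nominal trajectory simply shifts the moment the switching curve is met.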

Figure 4.6 shows the solution for x0 > 0.

Figure 4.6: The optimal trajectory and the optimal control (x∗(t) follows x0 e^{−t} until t1, then the switching curve s(t) = e^{T−t} − 1; u∗(t) switches from 0 to −1 at t1)


5 Exercises

1. (Owens, 1981) Verify that for the scalar plant

ẋ(t) = −x(t) + u(t), x(0) = 1, |u(t)| ≤ M, t ∈ [0, 1]    (5.1)

the admissible controller u∗(t) that minimizes the performance measure:

J = x(1) + ∫_0^1 u²(t) dt    (5.2)

takes the form (open-loop):

(a) if 2M ≥ 1:

u∗(t) = −(1/2) e^{t−1}    (5.3)

(b) if 2M < 1:

u∗(t) = −(1/2) e^{t−1} for 0 ≤ t ≤ 1 + ln 2M;  u∗(t) = −M for 1 + ln 2M ≤ t ≤ 1    (5.4)

2. (Kirk, 2004) A differential equation that describes the leaky reservoir shown in Figure 5.1 is:

ẋ(t) = −0.1 x(t) + u(t)    (5.5)

where x(t) is the height of the water and u(t) is the net inflow rate of water at time t. Assume 0 ≤ u(t) ≤ M.

Find the optimal control law if it is desired to minimize:

J = ∫_0^100 −x(t) dt    (5.6)


Figure 5.1: Reservoir

3. (Owens, 1981) Consider the plant with the state equations:

ẋ1(t) = x2(t)    (5.7)

ẋ2(t) = u(t)    (5.8)

Find the optimal control law that brings the system from the initial state x(0) = [1, 0]T to the final state x(1) = [0, 0]T and minimizes the performance measure:

J = (1/2) ∫_0^1 u²(t) dt    (5.9)

The control is constrained by: −M ≤ u(t) ≤ M .

4. Minimize the performance measure:

J = ∫_0^1 x(t) dt    (5.10)

subject to the state equation:

ẋ(t) = u(t), x(0) = 1    (5.11)

and the control constraint: −1 ≤ u(t) ≤ 1.

5. Minimize the performance measure:

J = ∫_0^2 ( u²(t) + 3u(t) − 2x(t) ) dt    (5.12)

subject to the state equation:

ẋ(t) = x(t) + u(t), x(0) = 5    (5.13)

and the control constraint: 0 ≤ u(t) ≤ 2.


6. (Owens, 1981) Verify that the solution of the problem:

ẋ(t) = −x(t) + u(t), x(0) = 0    (5.14)

J = 2x(1) + ∫_0^1 ( x(t) + (1/2) u²(t) ) dt    (5.15)

in the presence of the constraint |u(t)| ≤M is the optimal control:

u∗(t) = −λ(t) if |λ(t)| ≤ M;  u∗(t) = −M sign(λ(t)) if |λ(t)| > M    (5.16)

where λ(t) = 1 + e^{t−1} is the solution of the costate equation

λ̇(t) = λ(t) − 1, λ(1) = 2    (5.17)

Hence verify that:

a) u∗(t) = −M if M ≤ 1 + e^{−1}    (5.18)

b) u∗(t) = −1 − e^{t−1} if M ≥ 2    (5.19)

c) if 1 + e^{−1} ≤ M ≤ 2:

u∗(t) = −1 − e^{t−1} if 0 ≤ t ≤ 1 + ln(M − 1);  u∗(t) = −M if 1 + ln(M − 1) < t ≤ 1    (5.20)


Bibliography

Beale, G. (2001). Optimal control. Online, George Mason University.

Kirk, D. E. (2004). Optimal Control Theory. An Introduction. Dover Publications, Inc.

Owens, D. (1981). Multivariable and Optimal Systems. Academic Press.

Weber, R. (2000). Optimization and control. Online at www.statslab.cam.ac.uk.
