

1486 Ind. Eng. Chem. Res. 1994, 33, 1486-1492

PROCESS DESIGN AND CONTROL

Time-Optimal Control by Iterative Dynamic Programming

Bojan Bojkov and Rein Luus*

Department of Chemical Engineering, University of Toronto, Toronto, Ontario M5S 1A4, Canada

Although it is possible to have singular subarcs in time-optimal control, in most of the time interval the control is bang-bang in nature. We therefore apply iterative dynamic programming to search simultaneously for the switching times and for the values for control. Instead of using stages of equal length, we use stages of varying length so that switching times can be accurately determined. Computations based on three examples show that the optimal control policy can be obtained without difficulty even when the system equations are highly nonlinear. The computational procedure is straightforward, and the computations can be readily carried out on a personal computer.

Introduction

The problem of reaching a desired state from some given initial condition in minimum time is an important problem in engineering. For example, when a plant is started up, it is desirable to reach the final operating conditions as soon as possible. Also, when going from one operating condition to another, the transition should be accomplished as fast as possible if the intermediate product has no economic value. For two-dimensional systems, time-optimal control problems can be solved quite readily by applying Pontryagin's maximum principle. However, for high-dimensional problems, when the equations are nonlinear, the time-optimal control problem is very difficult to solve.

The simplest approach to high-dimensional time-optimal control problems consists of choosing a sampling period over which the control is kept constant and then using optimization to see if the origin can be reached in a given number of steps. The smallest number of steps for which the origin can be reached then gives the solution to the time-optimal control problem. Such an approach was used for linear systems by Lapidus and Luus (1967), by Bashein (1971), and more recently by Rosen et al. (1987). Since linear programming (LP) is computationally very fast, it was used to solve the time-optimal control problem involving linear systems. Rosen and Luus (1989, 1990) showed that by a similar approach, time-optimal control problems involving nonlinear systems can be solved using sequential quadratic programming (SQP) with continuation on the control policy to obtain the smallest final time for which the desired state can be reached.

Alternatively, Edgar and Lapidus (1972) used discrete dynamic programming and penalty functions to determine the optimal singular bang-bang control policy of nonlinear chemical systems. Furthermore, Nishida et al. (1976) presented a suboptimal control procedure using piecewise maximization of the Hamiltonian and a penalty function involving the control variables to obtain the control policies for various problems. More recently, Bojkov and Luus (1994) used iterative dynamic programming (IDP) in conjunction with a false position search on the final time to determine the minimum time of high-dimensional chemical engineering systems. However, the bang-bang nature of the optimal control policy was not exploited.

In the present investigation, we wish to examine the possibility of using IDP to find the best time-optimal control policy to reach the origin. To determine the switching times accurately, we allow the stages to have varying lengths. Therefore, as variables we take the values for control and also the length of each time stage. The purpose of this paper is to develop and test such a computational procedure based on iterative dynamic programming, as introduced by Luus (1990).

Problem Statement

Let us consider the continuous dynamic system described by the vector differential equation

dx/dt = f(x, u)   (1)

with initial state x(0) given, where x is an n-dimensional state vector, and u is an m-dimensional control vector subject to the simple upper and lower bounds

α_j ≤ u_j(t) ≤ β_j,   j = 1, 2, ..., m   (2)

It is assumed that the bounds are physically realistic so that the system can be operated at each bound for a finite length of time.

The problem is to choose u such that the desired state, taken to be the origin, is reached in minimum time. Numerically, it is required to choose the control policy u(t) in the time interval 0 ≤ t < tf such that

|x_i(tf)| ≤ ε,   i = 1, 2, ..., n   (3)

and the final time tf is minimized. Let us now turn to the implementation of this minimum time problem by IDP.

Mathematical Formulation

To apply IDP to this problem, we first introduce an appropriate final time scalar performance index to be minimized:

J = φ(x(tf), tf)   (4)

so that the final state constraint in eq 3 is satisfied at some value of the final time tf. We then transform the continuous control policy into a piecewise constant control problem by dividing the time interval 0 ≤ t < tf into P time stages, each of variable length

© 1994 American Chemical Society


v(k) = t_k − t_{k−1},   k = 1, 2, ..., P   (5)

where the control is kept constant during each of these time intervals. By having stages of variable length, we can determine switching times accurately.

Now, let us introduce a normalized time variable 7 such that

τ = t/tf   (6)

with the discrete values

τ_k = kL,   k = 0, 1, ..., P   (7)

Therefore, in the transformed time domain, each stage is of equal length

L = 1/P   (8)

and the transformed system equations, in the time interval τ_{k−1} ≤ τ < τ_k, become

dx/dτ = P v(k) f(x, u),   k = 1, 2, ..., P   (9)

In each time interval the control is kept constant at u(τ_k). The time-optimal control problem is to determine u(τ_k) and the length of the time stages v(k), for k = 1, 2, ..., P, such that the constraints are satisfied and tf is minimized.

As formulated, this problem is now ideally suited for iterative dynamic programming. Although there are situations where time-optimal control involves singular subarcs, as was the case in the autothermic reaction problem presented by Jackson (1966), a bang-bang type of control policy for the entire time interval provides a reasonable approximation. Therefore, to keep the problem computationally as simple as possible, we examine only the extreme values for the control variables.
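In the transformed time domain the integration always runs over τ ∈ [0, 1], with the factor P v(k) rescaling the dynamics on stage k. A sketch of this stage-wise integration with a fixed-step fourth-order Runge-Kutta scheme (our own illustration; the paper used the DVERK and LSODE subroutines):

```python
def integrate_stages(f, x0, u, v, steps=20):
    """Integrate dx/dtau = P*v(k)*f(x, u(k)) (eq 9) over tau in [0, 1],
    holding the control constant on each of the P equal tau-stages.
    f(x, u) returns dx/dt as a list; the result is x at tau = 1, i.e.
    the final state x(tf)."""
    P = len(v)
    x = list(x0)
    h = (1.0 / P) / steps                 # tau step within one stage
    for k in range(P):
        scale = P * v[k]                  # dt/dtau on stage k
        def g(y):
            return [scale * d for d in f(y, u[k])]
        for _ in range(steps):            # classical RK4 in tau
            k1 = g(x)
            k2 = g([xi + 0.5 * h * a for xi, a in zip(x, k1)])
            k3 = g([xi + 0.5 * h * a for xi, a in zip(x, k2)])
            k4 = g([xi + h * a for xi, a in zip(x, k3)])
            x = [xi + h * (a + 2*b + 2*c + d) / 6.0
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x
```

For the double integrator f(x, u) = (x2, u) with u = (1, −1) and v = (1, 1), starting from the origin, this returns x(tf) = (1, 0), matching the analytic solution.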


Algorithm

The basic iterative dynamic programming algorithm employing randomly chosen candidates for the admissible control is given by Bojkov and Luus (1992, 1993). Here, we allow stages of different length to be used by transforming the time variable and then determining the optimal length of each stage, as well as the best control policy. The algorithm is summarized in 8 steps:

(1) Choose the number of time stages P, allowing a maximum of P − 1 switching times.

(2) Choose the number of state grid points N, the number of allowable values for the control R, the initial region size r_j for each of the control variables u_j, and the initial region size s for the scalar stage length parameter v. Also, choose initial values for u_j and v for each stage.

(3) By choosing N values for control and the stage length, evenly distributed inside the specified regions, integrate eq 9 from τ = 0 to τ = 1 to generate N values for the state grid at each stage. For u, the bang-bang control policy closest to the boundary is used:

if u_j < α_j + (1/2)(β_j − α_j), then u_j = α_j,   j = 1, 2, ..., m   (10)

or

if u_j ≥ α_j + (1/2)(β_j − α_j), then u_j = β_j,   j = 1, 2, ..., m   (11)
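In code, this quantization simply pushes each candidate control component to whichever bound it is nearer (a hypothetical helper, not from the paper):

```python
def to_nearest_bound(u, alpha, beta):
    """Eqs 10 and 11: components below the midpoint of [alpha_j, beta_j]
    are set to the lower bound alpha_j, the rest to the upper bound
    beta_j, giving a bang-bang candidate control."""
    return [a if uj < a + 0.5 * (b - a) else b
            for uj, a, b in zip(u, alpha, beta)]
```

For bounds [0, 1] on each component, a candidate (0.2, 0.8) becomes (0.0, 1.0).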

(4) Starting at stage P, corresponding to the normalized time τ_P − L, for each state grid point generate R sets of allowable values for control

u(τ_{P−1}) = u*^(q)(τ_{P−1}) + Θ r^(q)(τ_{P−1})   (12)

and for the stage length v by

v(P) = v*^(q)(P) + ω s^(q)(P)   (13)

where Θ is an (m × m) diagonal matrix with random numbers between −1 and 1 along the diagonal and ω is a scalar random number between −1 and 1; u*^(q)(τ_{P−1}) is the best value for control, and v*^(q)(P) is the best value for the stage length, both obtained for that particular grid point in the previous iteration. For each test point different Θ and ω are used. The control u_j for j = 1, 2, ..., m is subjected to eqs 10 and 11, but v(P) is a nonnegative unconstrained variable. Now, integrate eq 9 from τ_{P−1} to τ_P once with each of the R sets of allowable values for control and stage length to yield x(τ_P). For each grid point we therefore have R values of the performance index, and we choose the control and the stage length that give a minimum value for J in eq 4; store the corresponding control and the stage length for use in step 5.

(5) Step back to stage P − 1, corresponding to time τ_P − 2L, and for each state grid point, generate R allowable sets of controls and stage lengths and integrate eq 9 from τ_P − 2L to τ_P − L once with each of the R sets. To continue the integration from τ_P − L to τ_P, choose the control and the stage length from step 4 that correspond to the grid point closest to the x at τ_P − L. Compare the R values of the performance index J, and store the control policy and the stage length that give the minimum value.

(6) Repeat step 5 until stage 1, corresponding to the initial time τ = 0, is reached. We now have the control policy and the stage lengths at each stage that yield the minimum value for the performance index J in eq 4.

(7) Reduce the size of the allowable region for the stage length by the factor γ,

s^(q+1)(k) = γ s^(q)(k),   k = 1, 2, ..., P   (14)

where q is the IDP iteration index. The region size for allowable control, r^(q), however, is kept constant at its initial value r^(0) throughout all the iterations, allowing the control to leave the boundary if necessary. Use the best control policy and the stage lengths from step 6 as the midpoint for the next iteration.

(8) Increment the IDP iteration index q by 1, and go to step 4. Continue the procedure for a specified number of iterations, such as 30, and examine the results.
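To make the steps concrete, the following is a deliberately simplified, self-contained illustration of the idea (not the authors' FORTRAN program): a single grid point (N = 1) is used, so the backward stage-wise sweep collapses into a direct random search over the bang-bang controls and the stage lengths, with the region restored at the start of each pass and contracted by γ after every iteration. The test problem is the time-optimal double integrator ẋ1 = x2, ẋ2 = u, |u| ≤ 1, driven from (1, 0) to the origin, whose known optimum is u = (−1, +1) with stage lengths (1, 1), i.e., tf = 2.

```python
import random

def f(x, u):                              # double integrator dynamics
    return [x[1], u]

def simulate(u, v, x0=(1.0, 0.0), steps=40):
    """Integrate through P stages of lengths v(k) with the control held
    at u(k) on stage k (classical RK4)."""
    x = list(x0)
    for uk, vk in zip(u, v):
        h = vk / steps
        for _ in range(steps):
            k1 = f(x, uk)
            k2 = f([x[0] + 0.5*h*k1[0], x[1] + 0.5*h*k1[1]], uk)
            k3 = f([x[0] + 0.5*h*k2[0], x[1] + 0.5*h*k2[1]], uk)
            k4 = f([x[0] + h*k3[0], x[1] + h*k3[1]], uk)
            x = [x[i] + h*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i])/6.0
                 for i in range(2)]
    return x

def idp_search(P=2, R=40, passes=5, iters=30, gamma=0.8, s0=1.0, seed=1):
    rng = random.Random(seed)
    best_u, best_v = [1.0] * P, [1.0] * P  # step 2: initial guesses
    best_J = float("inf")
    for _ in range(passes):                # multipass: restore region
        s = s0
        for _ in range(iters):
            for _ in range(R):             # R random candidates
                # bang-bang candidates at the bounds (eqs 10 and 11)
                u = [-1.0 if rng.random() < 0.5 else 1.0 for _ in range(P)]
                # stage lengths perturbed about the best known values
                v = [max(0.0, bv + s * rng.uniform(-1.0, 1.0))
                     for bv in best_v]
                x = simulate(u, v)
                J = x[0]**2 + x[1]**2      # distance from the origin
                if J < best_J:
                    best_J, best_u, best_v = J, u, v
            s *= gamma                     # step 7: contract the region
    return best_J, best_u, best_v, sum(best_v)
```

With the settings above, the search settles on u ≈ (−1, +1) and tf = v(1) + v(2) ≈ 2, recovering the analytic bang-bang solution.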

From the computational point of view, we would like to keep the number of grid points N and the number of sets of allowable values for control and stage length R as small as possible. Bojkov and Luus (1993, 1994) showed that the use of a region contraction factor γ = 0.80, as few as N = 3 or N = 5 state grid points, and R = 15 allowable values for control in iterative dynamic programming yielded rapid convergence to the global optimum of typical chemical engineering systems. If R is chosen to be 15 or larger, then the results are also expected to be independent of the random number generator.

Examples

All computations were performed in double precision on a 486DX2-66 personal computer using IBM's OS/2 version 2.1 operating system and the WATCOM FORTRAN 77/32 version 9.5 compiler. The FORTRAN subroutine DVERK of Hull et al. (1976) was used for the integration of the differential equations in examples 1 and 2. For the third system, to decrease the computation time without compromising the accuracy, the FORTRAN subroutine LSODE of Hindmarsh (1980) with a tolerance of 10⁻⁸ was used. The built-in compiler random number generator URAND was used for the generation of the random numbers.


Example 1: Drug Displacement Problem

Let us first consider the time-optimal drug displacement problem, considered recently by Maurer and Wiegand (1992), where a desired level of two drugs, warfarin and phenylbutazone, must be reached in a patient's bloodstream in minimum time. We consider here the case where there is no upper bound on the concentration of warfarin. The equations expressing the changes in the concentrations of these two drugs in the patient's bloodstream are

dx1/dt = (D²/C3)[C2(0.02 − x1) + 46.4 x1(u − 2x2)]   (15)

dx2/dt = (D²/C3)[C1(u − 2x2) + 46.4 x2(0.02 − x1)]   (16)

where x1 is the concentration of warfarin, x2 is the concentration of phenylbutazone, and

D = 1 + 0.2(x1 + x2)   (17)

C1 = D² + 232 + 46.4 x2   (18)

C2 = D² + 232 + 46.4 x1   (19)

C3 = C1C2 − 2152.96 x1x2   (20)

The control is the rate of infusion of phenylbutazone and is bounded by

0 ≤ u ≤ 8   (21)

The initial concentrations of warfarin and phenylbutazone are given by

x(0) = [0.02  0]^T   (22)

The system is to be taken to the desired final state

x(tf) = [0.02  2.00]^T   (23)

and tf is to be minimized.

To solve this time-optimal control problem, we introduce the performance index

J = (x1(tf) − 0.02)² + (x2(tf) − 2.00)²   (24)

to be minimized, and we wish to find the control policy u such that

|x1(tf) − 0.02| ≤ 5 × 10⁻⁷   (25)

and

|x2(tf) − 2.00| ≤ 5 × 10⁻⁷   (26)

are reached and tf is minimized.
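The acceptance test and the performance index can be written down directly; a small helper (ours, with the targets from eqs 24-26; the tolerance is a parameter rather than a value from the paper):

```python
def performance_index(x_tf):
    """Eq 24: squared deviation of the final concentrations from the
    desired levels of warfarin (0.02) and phenylbutazone (2.00)."""
    return (x_tf[0] - 0.02)**2 + (x_tf[1] - 2.00)**2

def meets_spec(x_tf, tol=5e-7):
    """Final-state test in the spirit of eqs 25 and 26: both
    concentrations must lie within tol of their targets."""
    return abs(x_tf[0] - 0.02) <= tol and abs(x_tf[1] - 2.00) <= tol
```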

Figure 1. Convergence profiles for the drug displacement problem in example 1, with P = 5, N = 3, R = 15, and γ = 0.80. (○) Performance index; (Δ) final time tf.

Figure 2. State profiles for the drug displacement problem in example 1 with tf = 221.465. (—) Warfarin concentration x1; (- -) phenylbutazone concentration x2.

For the first run, we chose the number of stages P = 5, allowing a maximum of 4 switching times, the number of state grid points N = 3, and the number of allowable values for control and stage length R = 15, and allowed a maximum of 50 iterations. By starting with an initial control policy u^(0)(k) = 0 and a control region size r^(0)(k) = 12, for k = 1, 2, ..., P, and choosing as initial guesses for the stage length v^(0)(k) = 15 and the region size s^(0)(k) = 20, for k = 1, 2, ..., P, we obtained rapid convergence of the performance index in eq 24 to a value of 2.40 × 10⁻¹⁵, as is shown in Figure 1. The resulting final time of 221.4653 is in good agreement with the value of 221.4661 reported by Maurer and Wiegand (1992). It is interesting to note that the final state specifications in eqs 25 and 26 were reached at a final time of 221.465 after 41 iterations requiring only 15 s of CPU time on the personal computer. The resulting control policy for this system consists of initially taking u at the upper bound (u = 8) until t = 189.156 and then switching to u = 0 and keeping the control at this lower bound until tf = 221.465, which is the minimum final time. In Figure 2, the resulting state trajectories are plotted as a function of the normalized time t/tf. It is noted that at the switching time t/tf = 0.854 there is a sudden change in the trajectories.

The convergence properties of the algorithm for this system to within the final state specifications in eqs 25 and 26 were investigated by studying the effects of the initial values for the stage length v^(0)(k) and the initial region size s^(0)(k), as illustrated in Figure 3. Of the 154 runs, 50 runs yielded the time-optimal control policy within the specified tolerance in 50 or fewer iterations. Another 19 runs yielded results very close to the optimal control policy, while not reaching the specifications in eqs 25 and 26. The computation time for each run consisting of 50 iterations was less than 20 s. Therefore, the global optimum was reached quite easily.

Figure 3. Effect of the initial stage length v^(0)(k) and the initial region size s^(0)(k) on the convergence for the drug displacement problem in example 1.

Example 2: Minimum Time Control of a Two-Link Robotic Arm

Let us consider the robotic arm discussed for time-optimal control by Weinreb and Bryson (1985) and Lee (1992), consisting of two identical rigid links, where an idealized payload is located at the tip of the elbow link. The torques u1 and u2 are generated by the two torquers located at the shoulder and elbow of the arm. With the assumption that the arm operates in a horizontal plane and its links have unlimited angular freedom about their respective joints, the equations of motion as given by Lee (1992) are

dx1/dt = [sin(x3)((9/4) cos(x3) x1² + 2 x2²) + (4/3)(u1 − u2) − (3/2) cos(x3) u2] / ((31/36) + (9/4) sin²(x3))   (27)

dx2/dt = −[sin(x3)((7/2) x1² + (9/4) cos(x3) x2²) + (3/2) cos(x3)(u1 − u2) − (7/3) u2] / ((31/36) + (9/4) sin²(x3))   (28)

dx3/dt = x2 − x1   (29)

dx4/dt = x1   (30)

In eqs 27-30, x1 and x2 are the inertial angular velocities of the shoulder and elbow links, respectively, and x3 and x4 represent angles. The control variables u1 and u2, representing the torques, are bounded by

|u_i| ≤ 1,   i = 1, 2   (31)

The problem is to find the control policy of the robotic arm, starting and ending at rest, with the initial state

x(0) = [0  0  0.5  0]^T   (32)

and reaching the final state

x(tf) = [0  0  0.5  0.522]^T   (33)

in minimum time tf. We therefore take the final time performance index expressing the deviations from the desired state

J = x1(tf)² + x2(tf)² + (x3(tf) − 0.500)² + (x4(tf) − 0.522)²   (34)

to be minimized.

As in example 1, we chose the number of time stages P = 5, allowing for a maximum of 4 switching times. After some preliminary runs with the IDP parameters of example 1, it was found that the easiest way to obtain adequate convergence was to use a multipass method where the control policy and stage lengths from the last iteration were used as the initial control for the subsequent pass, while restoring the search region to s^(0). Furthermore, the number of state grid points N was increased from 3 to 5 and the number of allowable values for the variables R was increased from 15 to 25, to ensure convergence. The use of these parameters with the performance index in eq 34 yielded rapid convergence of the final states x3 and x4 but left deviations from the final state values for x1 and x2. It was therefore necessary to modify eq 34 by reducing the influence of the state variables x3 and x4 relative to the first two state variables, by introducing the weighting factor η into the performance index
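The interplay between the per-iteration contraction (eq 14) and the per-pass restoration can be seen from the region-size schedule; a small sketch (our illustration):

```python
def region_schedule(s0, gamma, iters, passes):
    """Region sizes for the stage-length search: within a pass the size
    contracts by the factor gamma after each iteration; at the start of
    every pass it is restored to its initial value s0."""
    sizes = []
    for _ in range(passes):
        s = s0
        for _ in range(iters):
            sizes.append(s)
            s *= gamma
    return sizes
```

With s0 = 0.1, γ = 0.8, and 30 iterations per pass, the region shrinks to 0.1 × 0.8²⁹ ≈ 1.5 × 10⁻⁴ by the end of each pass before being reset.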

J_η = x1(tf)² + x2(tf)² + η[(x3(tf) − 0.500)² + (x4(tf) − 0.522)²]   (35)

where the subscript η on J is used to specify the weighting factor used in the performance index.

By using the same IDP parameters as before and choosing the weighting factor η = 0.3, the initial control u^(0)(k) = 1 with the initial region size r^(0)(k) = 3, for k = 1, 2, ..., P, and the initial stage length v^(0)(k) = 0.25 with region size s^(0)(k) = 0.1, for k = 1, 2, ..., P, the procedure yielded a final time of 2.9814 after 50 passes of 30 iterations each, with the resulting final state

(36)

Convergence to the optimal values of J_η and tf was systematic and rapid, as is shown in Figure 4. The computation time was 47 min for the 50 passes.

To get results closer to the desired final state, the problem was resubmitted for 50 more passes with IDP. The minimum time of 2.9822 was obtained, with the corresponding control policy given in Figure 5. Although the final time was increased, each state variable was within 5 × 10⁻⁶ of its target at the final time. This result is a slight improvement over 2.989 reported as the minimum time by Lee (1992) for a continuous control policy. The specific value of η is not crucial since it was found that η in the


Figure 6. Effect of the initial stage length v^(0)(k) and the initial region size s^(0)(k) on the convergence for the robot arm problem in example 2.

Figure 4. Convergence profiles for the robot arm problem in example 2, with P = 5, N = 5, R = 25, and γ = 0.80. (●) Performance index; (Δ) final time tf.

Figure 5. Optimal control policy for the robot arm problem in example 2 with tf = 2.9822.

range 0.1-0.5 yielded the same final time and the same control policy.

The convergence of the algorithm to the vicinity of the optimal final time tf = 2.9822 was investigated for various initial values of the stage parameters v^(0)(k) and s^(0)(k). As is shown in Figure 6, out of 165 sets of starting conditions the algorithm converged 55 times to within 0.07% of the optimal final time in approximately 30-35 passes, and the region over which convergence was obtained is reasonably large.

Example 3: Startup of an Autothermic Reaction System

Let us consider the optimization of the startup procedure for an autothermic reaction where a first-order reversible reaction A ⇌ B is occurring. This nonlinear system was first presented by Jackson (1966) and used for optimal control studies by Nishida et al. (1976). It is interesting to investigate this system because of the multitude of solutions reported by Jackson (1966) and by Nishida et al. (1976). Although Jackson (1966) and later Nishida et al. (1976) showed that the desired state for this system could not be reached by using a bang-bang control policy, we shall attempt to reach the close vicinity of the desired state by using the proposed algorithm employing bang-bang control. The dynamic behavior of this reaction system is represented by material and energy balances for x1 and x2 (eqs 37 and 38), where x1 is the mole fraction of the product B, x2 is the temperature in the reactor, and ζ is the reaction rate expression, given by the difference of Arrhenius terms for the forward and reverse reactions (eq 39).

The control variables u1 and u2, representing the external heat supply to the reactant preheater and the fraction of the hot gas leaving the reactor, are bounded by

0 ≤ u1 ≤ 200   (40)

and

0 ≤ u2 ≤ 1   (41)

The reactor startup problem is to determine the controls u1(t) and u2(t) which will transfer the system from the given initial state x(0) (eq 42) to the desired final operating state

x(tf) = [0.7  800]^T   (43)

in minimum time tf.

Since there is an order of magnitude difference between the two state variables, as in example 2, we chose the weighted performance index to be minimized

J_η = (x1(tf) − 0.7)² + η(x2(tf) − 800)²   (44)

with η = 10⁻⁶ to balance the magnitudes of the two state variables. After some preliminary runs using P = 5 time stages, we used for our first run a multipass method of 30 iterations each, the number of state grid points N = 5, and the number of allowable values for control and stage length R = 15. As the initial time stage length we chose v^(0)(k) = 0.4 with an initial region size s^(0)(k) = 0.1, for k = 1, 2, ..., P. Rapid convergence of the weighted performance index to a value of 0.3296 × 10⁻⁶ was obtained, as can be seen in


Figure 9. Region of convergence for the autothermic reaction for various initial stage lengths v^(0)(k) and region sizes s^(0)(k). (G) Convergence to within 0.03% of the optimum performance index; (O) final state reached with a final time slightly larger than the optimum.

Figure 7. Convergence profiles for the autothermic reaction, with P = 5, N = 5, R = 25, and γ = 0.80. (○) Performance index; (Δ) final time tf.

Figure 7. The state variables obtained for this run at tf = 2.249 are

(45)

The violation of the final state constraints is negligible, and the improvement over the minimum time of 2.8 reported by Nishida et al. (1976) is quite significant. As is shown in Figure 7, the convergence to the optimum was obtained in only 5 IDP passes.

To drive the reaction system closer to the desired state than obtained in eq 45, we used continuation on the factor η through the following scheme: η = 10⁻⁶ is used for the first 5 IDP passes and is then increased by a factor of 10 for each subsequent pass until η = 1 is reached for the 11th pass. Starting from an initial time of 2, the procedure yielded a final time of 2.252 with a slight improvement in the second state variable:

x(2.252) = [0.6961  800.00]^T   (46)

The computation time for these 11 passes was 40 min. Another run with v^(0)(k) = 0.36 and s^(0)(k) = 0.26 yielded a minimum time of 2.247 and a performance index value of J1 = 0.33980 × 10⁻⁵ with the final state

(47)

This is a very minor improvement over the result in eq 46 and represents very closely the desired state. It is interesting to observe that both controls switch only once, at t = 1.488, where u1 switches from the upper bound to the lower bound, and u2 switches from the lower bound to the upper bound. The corresponding state trajectories are given in Figure 8.

The region of convergence to within 0.03% of the optimum performance index, i.e., to within J1 = 0.33990 × 10⁻⁵ and tf = 2.247, for the continuation approach is shown in Figure 9, based on the choice of the initial stage length v^(0)(k) and region size s^(0)(k). As was the case in the two previous examples, this procedure has a wide region where convergence to the vicinity of the optimum occurs. As the initial values for v^(0)(k) and s^(0)(k) increase, the final states are still satisfied but the final time is slightly larger than the optimum 2.247. These cases are indicated by O in Figure 9. Such a problem occurs because of the relatively flat profile of the performance index in the range 2.25 ≤ tf ≤ 2.30. This final time, however, is significantly better than previously reported in the literature.

In all of these three nonlinear examples, it is observed that the choices of the initial stage length and the corresponding initial region are quite important to ensure that the global optimum is reached. A relatively small number of starting conditions is required to obtain the global optimum. The results obtained from one run usually give useful information for the next run, so that in practice the success rate is somewhat higher than choosing the starting conditions at random or over a uniform grid. Since the computational effort is not excessive, several runs of different starting conditions can be easily carried out in a single day. Research in this area of global optimization and how to avoid local optima is continuing.

Conclusions

Solving nonlinear time-optimal control problems by searching simultaneously for the stage lengths and for the values of the control, as presented here, appears to be an attractive way of tackling such difficult problems. The three examples considered show that convergence is obtained for typical nonlinear problems and that the proposed algorithm based on iterative dynamic programming yields control policies better than those previously reported in the literature. The method also provides a convenient means of cross-checking results for highly nonlinear systems, since the computations can be readily performed on a personal computer.



Acknowledgment

Financial support from the Natural Sciences and Engineering Research Council of Canada Grant A-3515 is gratefully acknowledged.

Nomenclature

C = variables used in example 1
D = variables used in example 1
f = nonlinear function of x and u, (n × 1)
i = general index, state variable number
j = general index, control variable number
J = performance index
k = general index, stage number
L = length of time stage
m = number of control variables
n = number of state variables
N = number of grid points for the state
P = number of stages
q = general index, IDP iteration
r = region over which control is taken, (m × 1)
R = number of sets of random allowable values for the control vector and stage lengths
s = region size for the parameter v(k)
t = time
tf = final time
u = control vector, (m × 1)
v = stage length as defined in eq 5
x = state vector, (n × 1)

Greek Letters

αj = lower bound on control uj
βj = upper bound on control uj
γ = reduction factor for the control region
ε = final state constraint specification used in eq 3
ζ = reaction term for example 3
η = weighting factor for the performance index J
θ = diagonal matrix of random numbers between -1 and 1
τ = normalized time
φ = objective function as defined in eq 4
ω = random number between -1 and 1

Literature Cited

Bashein, G. A Simplex Algorithm for On-line Computations of Time-Optimal Controls. IEEE Trans. Autom. Control 1971, AC-16, 479-482.

Bojkov, B.; Luus, R. Use of Random Admissible Values for Control in Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1992, 31, 1308-1314.

Bojkov, B.; Luus, R. Evaluation of the Parameters used in Iterative Dynamic Programming. Can. J. Chem. Eng. 1993, 71, 451-459.

Bojkov, B.; Luus, R. Application of Iterative Dynamic Programming to Time Optimal Control. Chem. Eng. Res. Des. 1994, 72,72430.

Edgar, T. F.; Lapidus, L. The Computation of Optimal Singular Bang-Bang Control II: Nonlinear Systems. AIChE J. 1972, 18, 780-785.

Hindmarsh, A. C. LSODE and LSODI, Two New Initial Value Ordinary Differential Equation Solvers. ACM-SIGNUM Newslett. 1980, 15, 10-11.

Hull, T. E.; Enright, W. D.; Jackson, K. R. User Guide to DVERK - a Subroutine for Solving Nonstiff ODEs. Report 100; Department of Computer Science, University of Toronto: Toronto, Ontario, Canada, 1976.

Jackson, R. Optimum Startup Procedures for an Autothermic Reaction System. Chem. Eng. Sci. 1966,21, 241-260.

Lapidus, L.; Luus, R. Optimal Control of Engineering Processes; Blaisdell: Waltham, MA, 1967; pp 177-184.

Lee, A. Y. Solving Constrained Minimum-Time Robot Problems using Sequential Gradient Restoration Algorithm. Optim. Control Appl. Methods 1992,13, 145-154.

Luus, R. A Practical Approach to Time-Optimal Control of Nonlinear Systems. Ind. Eng. Chem. Process Des. Dev. 1974,13,405-408.

Luus, R. Optimal Control by Dynamic Programming using Accessible Grid Points and Region Contraction. Hung. J. Ind. Chem. 1989, 17, 523-543.

Luus, R. Application of Dynamic Programming to High-Dimensional Nonlinear Optimal Control Problems. Int. J. Control 1990, 52, 239-250.

Maurer, H.; Wiegand, M. Numerical Solution of a Drug Displacement Problem with Bounded State Variables. Optim. Control Appl. Methods 1992, 13, 43-55.

Nishida, N.; Liu, Y. A.; Lapidus, L.; Hiratsuka, S. An Effective Computational Algorithm for Suboptimal Singular and/or Bang-Bang Control. AIChE J. 1976, 22, 505-523.

Rosen, O.; Luus, R. Sensitivity of Optimal Control to Final State Specification by a Combined Continuation and Nonlinear Programming Approach. Chem. Eng. Sci. 1989, 44, 2527-2534.

Rosen, O.; Luus, R. Evaluation of Gradients for Piecewise Constant Optimal Control. Comput. Chem. Eng. 1991, 15, 273-281.

Rosen, O.; Imanudin; Luus, R. Sensitivity of the Final State Specification for Time-Optimal Control Problems. Int. J. Control 1987, 45, 1371-1381.

Weinreb, A.; Bryson, A. E., Jr. Optimal Control of Systems with Hard Control Bounds. IEEE Trans. Autom. Control 1985, AC-30, 1135-1138.

Received for review November 15, 1993
Revised manuscript received February 22, 1994
Accepted March 8, 1994*

* Abstract published in Advance ACS Abstracts, April 15, 1994.