
PROCESS DESIGN AND CONTROL

Optimal Control by Iterative Dynamic Programming with Deterministic and Random Candidates for Control

Wichaya Mekarapiruk and Rein Luus*

Department of Chemical Engineering, University of Toronto, Toronto, Ontario M5S 3E5, Canada

In addition to randomly chosen candidates for control, we examine the effect of also including deterministic control candidates in iterative dynamic programming (IDP) to improve the chance of achieving the global optimal solution. Two types of deterministic control candidates (shifting and smoothing candidates) are chosen on the basis of the control policy obtained in the previous iteration. The search for the optimal value for control in the subsequent iteration is then made on the combined set of control candidates chosen randomly and deterministically. Three highly nonlinear and multimodal chemical engineering optimal control problems are used to illustrate the advantages of this procedure in obtaining the global optimum.

Introduction

In recent years, iterative dynamic programming (IDP) has been developed and applied to solve successfully many different classes of chemical engineering optimal control problems.1 In the most recent IDP algorithms, the candidates for control are chosen at random inside a region that is contracted after every iteration. This provides a high probability of attaining the global optimum solution and ease of implementation, because no gradient information or auxiliary variable is required. The use of a random search approach with IDP was suggested by Bojkov and Luus2 as a better alternative to the earlier approach of choosing the test values for the control variables over a uniform distribution.3 However, even though the chance of getting the global optimum with IDP is high, if the system is highly nonlinear, sometimes the convergence is slow and a local optimum may result.4-6

To improve the convergence property and the probability of obtaining the global optimum of IDP, in this paper we investigate the use of a deterministic approach together with the existing random approach to select the test values for the control variables. The candidates for control chosen deterministically here can be classified into two types: shifting and smoothing candidates. The shifting candidate is meant to shift the control policy forward or backward along the time scale to prevent stalling at a local optimum. On the other hand, the smoothing candidate is meant to smooth the control profile and facilitate the convergence of the algorithm. These control candidates are calculated from the best control policy obtained in the previous iteration. They are then used in combination with the control candidates chosen randomly to establish the search space for IDP for the next iteration. In this way, the deterministic candidates are taken only as “suggestions”, and neither the deterministic nor the random component of the search mechanism dominates the convergence of the algorithm. Thus, at the same time, we can take advantage of the robust character generally provided by the random candidates, and the fast convergence rate and the guard against stalling at a local optimum offered by the deterministic candidates. To illustrate and test the procedure, we choose three highly nonlinear chemical engineering optimal control problems for which difficulties in optimization have been reported.

Optimal Control Problem

We consider the continuous dynamic system described by the differential equation

dx/dt = f(x, u)    (1)

where the initial state x(0) is given. The state vector x is an (n × 1) vector and u is an (m × 1) control vector bounded by

αj ≤ uj ≤ βj,  j = 1, 2, ..., m    (2)

The performance index associated with this system is a scalar function of the state at the final time tf given by

I = Φ(x(tf))    (3)

The optimal control problem is to find the control policy u(t) in the time interval 0 ≤ t < tf that maximizes the performance index in eq 3.

* To whom correspondence should be addressed. Telephone: 416-978-5200. Fax: 416-978-8605. E-mail: luus@ecf.utoronto.ca.

Deterministic Candidates for Control

The IDP algorithm used here is based on the algorithm developed by Luus3 for seeking piecewise constant control, with region sizes contracted in a multipass fashion as recommended by Luus7 for high-dimensional systems.


The main features of IDP are given in the review paper by Luus.1 As test values for control at each time stage in each iteration, here we choose R candidates at random and also three additional deterministic candidates based on the values for control at the adjacent stages. The calculation of these deterministic candidates is similar to the use of the shift and smoothing reproductive operators to generate new offspring in the evolutionary optimization method proposed by Pham.8 At each time stage, these three additional candidates for control are chosen as given by eqs 4-12 below, where φ is a smoothing parameter, 0 ≤ φ ≤ 0.5. Here [uc]_{i,k}^q denotes the ith control candidate for u at stage k and iteration q, and [us]_k^{q-1} denotes the best control value for u at stage k from iteration q - 1. The first R values of uc are the control candidates chosen at random over some allowable regions, and eqs 4-12 give the additional three candidates. Therefore, the total number of candidates for control used at each grid point in each time stage is M = R + 3.

The control candidates chosen in eqs 4, 5, 7, and 10 are shifting candidates, which are simply taken to be equal to the best control values obtained from the previous iteration for the adjacent stage(s). Because it is often found that a local optimum is just the shifting of the optimal control policy one or a few steps forward or backward along the time scale with similar size and shape,4,9 it is expected that the shifting candidates chosen in this way provide an escape route from a local optimum. This is especially important if there is a large change or switching involved in the control policy.

The candidates for control calculated in eqs 6, 8, 9, 11, and 12 are used to smooth the control policy. Here, φ is the smoothing factor, with a value within the range of 0-0.5. The motivation for the introduction of smoothing candidates is the fact that the optimal control policy always contains some smooth portion(s). Thus, by incorporating the smoothing candidates into the search space of IDP, the smooth portions of the optimal control policy can be established more readily, which then reduces the computation effort required for IDP. Here we shall use the same smoothing factor φ for all time stages. The choice of the value of φ will be discussed in the examples.

Numerical Results

All computations were performed in double precision on a Pentium/300 personal computer using the Watcom Fortran compiler version 9.5. The built-in compiler random number generator URAND was used for the generation of random numbers. The Fortran subroutine DVERK of Hull et al.10 was used for the integration of the differential equations, with the local error tolerance chosen to be 10^-6 for examples 1 and 2, and 10^-8 for example 3.
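As an illustration of this setup (ours, not part of the original paper), the sketch below evaluates a piecewise-constant control policy by integrating eq 1 stage by stage and returning the performance index of eq 3; scipy's solve_ivp is used here as a stand-in for DVERK, with the same error tolerance.

```python
import numpy as np
from scipy.integrate import solve_ivp

def evaluate_policy(f, Phi, x0, u_stages, t_f, tol=1e-6):
    """Integrate dx/dt = f(x, u) (eq 1) over P equal stages with the
    control held constant on each stage, and return I = Phi(x(t_f)) (eq 3).

    f        : callable f(x, u) returning dx/dt
    Phi      : callable giving the scalar performance index from x(t_f)
    u_stages : sequence of P control values, one held on each stage
    """
    P = len(u_stages)
    dt = t_f / P
    x = np.asarray(x0, dtype=float)
    for k in range(P):
        u_k = u_stages[k]
        sol = solve_ivp(lambda t, y: f(y, u_k), (k * dt, (k + 1) * dt), x,
                        rtol=tol, atol=tol)
        x = sol.y[:, -1]
    return Phi(x)
```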

Example 1: Ethanol Fermentation Problem

To illustrate the procedure, let us first choose the ethanol fermentation process as formulated and used for optimal control by Hong,11 and used for optimal control studies by Hartig et al.,12 Bojkov and Luus,13 and Luus and Hennessy.14 The differential equations describing the ethanol fermentation are given by eqs 13-16, with the expressions A and B defined by eqs 17 and 18, where x1 represents the cell mass concentration, x2 is the substrate concentration, x3 is the desired product concentration, and x4 is the volume of the liquid inside the reactor.

The initial state is specified by eq 19, and the feed rate u to the reactor is constrained by eq 20. The liquid volume is limited by the 200 L vessel size, so that at the final time tf the inequality constraint of eq 21 must be satisfied. The performance index to be maximized is the total mass of the desired product at the final time, given by eq 22.

The deterministic candidates for control (eqs 4-12) are, for 1 < k < P,

[uc]_{R+1,k}^q = [us]_{k-1}^{q-1}    (4)

[uc]_{R+2,k}^q = [us]_{k+1}^{q-1}    (5)

[uc]_{R+3,k}^q = φ[us]_{k-1}^{q-1} + (1 - 2φ)[us]_k^{q-1} + φ[us]_{k+1}^{q-1}    (6)

for k = 1,

[uc]_{R+1,1}^q = [us]_2^{q-1}    (7)

[uc]_{R+2,1}^q = (1 - φ)[us]_1^{q-1} + φ[us]_2^{q-1}    (8)

[uc]_{R+3,1}^q = φ[us]_1^{q-1} + (1 - φ)[us]_2^{q-1}    (9)

and for k = P,

[uc]_{R+1,P}^q = [us]_{P-1}^{q-1}    (10)

[uc]_{R+2,P}^q = (1 - φ)[us]_P^{q-1} + φ[us]_{P-1}^{q-1}    (11)

[uc]_{R+3,P}^q = φ[us]_P^{q-1} + (1 - φ)[us]_{P-1}^{q-1}    (12)
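A minimal Python sketch (ours, not the authors' code) of how the M = R + 3 candidates of eqs 4-12 might be assembled at a given stage, with the R random candidates drawn about the best policy from the previous iteration inside the current region size r; the clipping to the bounds of eq 2 is an added assumption:

```python
import numpy as np

def control_candidates(u_best, k, r, R, phi, u_lo, u_hi, rng):
    """Return the M = R + 3 candidates for stage k (0-based) of a P-stage
    policy, following eqs 4-12.

    u_best : (P, m) best control policy from the previous iteration
    r      : (m,) current region size for the random candidates
    phi    : smoothing factor, 0 <= phi <= 0.5
    """
    P, m = u_best.shape
    # R candidates chosen at random about the best value for stage k
    rand = u_best[k] + r * rng.uniform(-1.0, 1.0, size=(R, m))
    if k == 0:                                   # eqs 7-9
        det = [u_best[1],
               (1 - phi) * u_best[0] + phi * u_best[1],
               phi * u_best[0] + (1 - phi) * u_best[1]]
    elif k == P - 1:                             # eqs 10-12
        det = [u_best[P - 2],
               (1 - phi) * u_best[P - 1] + phi * u_best[P - 2],
               phi * u_best[P - 1] + (1 - phi) * u_best[P - 2]]
    else:                                        # eqs 4-6
        det = [u_best[k - 1],
               u_best[k + 1],
               phi * u_best[k - 1] + (1 - 2 * phi) * u_best[k]
               + phi * u_best[k + 1]]
    cand = np.vstack([rand, np.array(det)])
    return np.clip(cand, u_lo, u_hi)             # keep within the bounds of eq 2
```

For the scalar-control examples in this paper, u_best would simply be an array of shape (P, 1).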

The ethanol fermentation model and its constraints (eqs 13-21) are

dx1/dt = A x1 - u x1/x4    (13)

dx2/dt = -10 A x1 + u (150 - x2)/x4    (14)

dx3/dt = B x1 - u x3/x4    (15)

dx4/dt = u    (16)

A = (0.408/(1 + x3/16)) (x2/(0.22 + x2))    (17)

B = (1/(1 + x3/71.5)) (x2/(0.44 + x2))    (18)

x(0) = [1 150 0 10]^T    (19)

0 ≤ u ≤ 12    (20)

x4(tf) - 200 ≤ 0    (21)
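For reference, a direct transcription of eqs 13-18 into a right-hand-side function (a sketch; the function and variable names are ours), which can be passed to the evaluate_policy helper sketched earlier:

```python
def ethanol_rhs(x, u):
    """Ethanol fermentation model of eqs 13-18.
    x = [cell mass, substrate, product, liquid volume]; u = feed rate."""
    x1, x2, x3, x4 = x
    A = (0.408 / (1.0 + x3 / 16.0)) * (x2 / (0.22 + x2))   # eq 17
    B = (1.0 / (1.0 + x3 / 71.5)) * (x2 / (0.44 + x2))     # eq 18
    return [A * x1 - u * x1 / x4,                           # eq 13
            -10.0 * A * x1 + u * (150.0 - x2) / x4,         # eq 14
            B * x1 - u * x3 / x4,                           # eq 15
            u]                                              # eq 16
```

Here the control is scalar, so the u_stages argument of evaluate_policy can simply be a one-dimensional array of P feed rates.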


For the maximum yield, we expect that we must have the maximum allowed volume. Therefore, instead of the inequality constraint in eq 21, we use the equality constraint of eq 23 and formulate the augmented performance index to be maximized as eq 24, where a penalty function with the shifting term s is added to the performance index in eq 22. The value of the shifting term is initially chosen to be zero and is updated after every iteration by the amount of violation of the equality constraint at the final time tf, i.e., by eq 25, where q is the iteration number. The use of the quadratic penalty function with a shifting term was introduced by Luus15 as a way of dealing with difficult equality constraints in steady-state optimization. It was found to be efficient in handling equality state constraints in optimal control problems.16-18

To have a direct comparison with the results in the literature,12,14 we chose the final time to be 63.0 h, and we considered the case when the time interval was divided into 20 stages (P = 20). Hartig et al.12 compared three optimization methods, IDP, sequential quadratic programming (SQP), and the direct search optimization method of Luus and Jaakola19 (LJ optimization), in solving this problem and reported that IDP yielded the highest success rate with P = 20. However, they found that, due to numerous local optima, a large number of grid points had to be used in order to achieve the global optimum. In fact, with the use of 23 grid points and between 21 and 41 allowable values for control, the global optimum was obtained in 18 out of 28 runs. Luus and Hennessy14 applied LJ optimization with an adaptive scheme for adjusting the region size and with a shifting term in the penalty function to solve this problem, and achieved reasonably high success rates for P = 10 and 15. However, when P = 20, the overall success rate was found to be only 35%.

In the work by Hartig et al.,12 IDP was used with an absolute-value penalty function to handle the inequality state constraint in eq 21. Here we shall first examine how well the regular IDP algorithm performs when used with a quadratic penalty function including the shifting term. We took the penalty function factor θ = 1, and chose the initial control to be halfway between the bounds, namely u(0) = 6, and the initial region size to be the size of the interval between the upper boundary and the lower boundary on u, i.e., r(0) = 12. The reduction factor γ and the restoration factor η were taken to be 0.8 and 0.9, respectively. We allowed a maximum of 50 passes, each pass consisting of 20 iterations. As is shown in Table 1, here the optimum I = 20841.1 is achieved with a smaller number of grid points and a smaller number of allowable candidates for control compared to the results reported by Hartig et al.12 Of the 18 runs, the global optimum was obtained seven times; thus, the success rate in getting the global optimum is reasonably good but not very high (39%). In Table 1, the results obtained with r(0) = 6 are also given. When a smaller initial region size is used, the optimum is more difficult to reach with the regular IDP. In this case, of the 18 runs, the global optimum was obtained only two times (11%).

Now, let us use IDP with the combined deterministic and randomly chosen candidates. The effects of the smoothing factor φ and the penalty factor θ are shown in Table 2. Here the parameters for IDP were taken to be the same as in the previous case. The initial region size was chosen to be r(0) = 6, and we used 6 total allowable candidates for control (M = 6). As is seen in the table, for each value of φ, the chance of getting the global optimum was higher than 60%. Furthermore, for an intermediate value of φ (0.15 ≤ φ ≤ 0.35), the global optimum was obtained in all of the runs. The effect of the penalty function factor θ on the convergence rate, as seen in Table 2, is not great. However, for values of θ greater than 2, generally a larger number of passes is required to reach the optimum.

To compare these results with those for regular IDP, we solved the problem using the present procedure with r(0) = 6, the number of grid points taken to be within the same range as in Table 1, and the total number of allowable candidates ranging from 6 to 14. The smoothing factor φ was chosen to be 0.25 and the penalty factor θ was chosen to be 1. The results, given in Table 3, show that the success rate obtained with the proposed procedure was 87%, which is considerably higher than the success rate achieved with the regular IDP algorithm. The computation times are reasonable. For example, the computation time for 18 passes with N = 3 and M = 6 was 4 min, and for 5 passes with N = 13 and M = 14 it was about 12 min on a Pentium/300.

However, it is noted that even when a large number of grid points, N = 11, was used, there was still a chance of getting a local optimum for this problem.

I = x3(tf) x4(tf)    (22)

h = x4(tf) - 200 = 0    (23)

J = x3(tf) x4(tf) - θ(h - s)^2    (24)

s^{q+1} = s^q - (x4(tf) - 200)    (25)
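A minimal sketch (an illustration, not the authors' code) of the augmented index of eq 24 and the shifting-term update of eq 25, using the state ordering of eqs 13-16:

```python
def augmented_index(x_tf, s, theta=1.0):
    """Eq 24: J = x3(tf) x4(tf) - theta*(h - s)^2, with h = x4(tf) - 200 (eq 23)."""
    h = x_tf[3] - 200.0
    return x_tf[2] * x_tf[3] - theta * (h - s) ** 2

def update_shift(s, x_tf):
    """Eq 25: s is corrected after every iteration by the constraint violation."""
    return s - (x_tf[3] - 200.0)
```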

Table 1. Number of Passes Required To Reach I = 20841.1 When the Regular IDP Algorithm with θ = 1 and Different Values of M and N in Example 1 Were Used^a

                              total number of candidates for control, M = R
                              r(0) = 12                r(0) = 6
number of grid points, N       6     8     10           6     8     10
 3                             *     *     10           *     *     *
 5                             *     *     *            *     *     *
 7                             *     10    *            *     8     *
 9                             15    *     *            *     *     *
11                             14    12    14           8     *     *
13                             *     *     15           *     *     *

^a Asterisk denotes a local optimum.

Table 2. Number of Passes Required To Reach I = 20841.1 in Example 1 When N = 3, M = 6, and Different Values of φ and θ Were Used^a

                      penalty function factor, θ
smoothing factor, φ   0.1   0.2   1     2     5     10
0                     22    10    28    20    *     *
0.10                  14    25    15    *     *     43
0.15                  22    11    19    14    32    42
0.20                  21    6     20    13    27    47
0.25                  15    28    18    20    25    36
0.30                  22    24    17    17    23    42
0.35                  23    19    9     29    24    31
0.40                  7     22    *     15    26    45
0.50                  26    8     13    19    18    38

^a Asterisk denotes a local optimum.


This shows that care is needed in choosing the parameters for IDP. To achieve a high probability of obtaining the global optimum, several runs should be made by changing the number of grid points and the number of candidates for control. This is not a difficult task because, when the present method of choosing control candidates with IDP is used, the range of the parameters yielding the global optimum is increased.

Example 2: Bifunctional Catalyst Problem

A very challenging optimal control problem possessing numerous local optima is the bifunctional catalyst problem as formulated by Luus et al.4 and considered by Luus and Bojkov.20 The system is described by the seven differential equations 26-32, with the initial condition given by eq 33. The rate expressions are given by eq 34, where the coefficients cij are given by Luus et al.4 The objective is to determine the catalyst blend u along the length of the reactor, specified by the characteristic reaction time interval 0 ≤ t < 2000 g·h/mol, such that the performance index of eq 35 is maximized subject to the constraint of eq 36 on the catalyst blend.

As was done by Luus and Bojkov,20 let us divide the time interval into 10 stages, i.e., P = 10. As the initial control policy, we chose the value of u halfway between the bounds, as specified by eq 36, i.e., u(0) = 0.75, and as the initial region size, we chose r(0) = 0.3. Only a single grid point (N = 1) was used, and a run consisted of only a single pass with the reduction factor γ taken to be 0.8.
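The single-grid-point setting used here lends itself to a compact sketch of the overall iteration (a greatly simplified illustration under our own assumptions, not the authors' implementation): the best trajectory from the previous iteration supplies the stored state at each stage, the stages are scanned backward testing the candidates of eqs 4-12, and the region size is contracted by γ after every iteration. It reuses the control_candidates helper sketched earlier and a fixed-step RK4 integrator in place of DVERK.

```python
import numpy as np

def rk4_stage(f, x, u, dt, substeps=20):
    """Integrate dx/dt = f(x, u) over one stage with fixed-step RK4."""
    h = dt / substeps
    x = np.array(x, dtype=float)
    for _ in range(substeps):
        k1 = np.array(f(x, u))
        k2 = np.array(f(x + 0.5 * h * k1, u))
        k3 = np.array(f(x + 0.5 * h * k2, u))
        k4 = np.array(f(x + h * k3, u))
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def idp_single_grid(f, Phi, x0, u_init, t_f, u_lo, u_hi,
                    R=5, phi=0.25, gamma=0.8, r0=0.3, n_iter=50, seed=0):
    """Simplified IDP sketch with N = 1 (a single grid point per stage)."""
    rng = np.random.default_rng(seed)
    u_best = np.array(u_init, dtype=float)       # shape (P, m)
    P = len(u_best)
    r = r0 * np.ones(u_best.shape[1])
    dt = t_f / P
    for _ in range(n_iter):
        # forward pass: store the state at the start of every stage
        x_grid = [np.array(x0, dtype=float)]
        for k in range(P - 1):
            x_grid.append(rk4_stage(f, x_grid[-1], u_best[k], dt))
        # backward sweep: test the M candidates stage by stage
        for k in range(P - 1, -1, -1):
            best_J, u_best_k = -np.inf, u_best[k].copy()
            for u_k in control_candidates(u_best, k, r, R, phi,
                                          u_lo, u_hi, rng):
                u_trial = u_best.copy()
                u_trial[k] = u_k
                x = x_grid[k]
                for j in range(k, P):            # integrate to t_f
                    x = rk4_stage(f, x, u_trial[j], dt)
                J = Phi(x)
                if J > best_J:
                    best_J, u_best_k = J, u_k
            u_best[k] = u_best_k
        r = gamma * r                            # region contraction
    return u_best
```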

To investigate the effect of the smoothing factor, we solved the problem using the proposed procedure with different values of φ. Table 4 shows the number of iterations required to reach the global optimum of I = 10.0942 × 10^-3 when φ in the range 0-0.5 and different numbers of total allowable control candidates are chosen. A larger number of allowable control candidates tends to yield the optimum in fewer iterations, but the effect of φ does not seem to be significant for this problem. The number of iterations required to reach the optimum does not vary much for different values of φ. In fact, these numbers do not differ much from the number of iterations required when φ = 0, suggesting that, for this problem, using only the shifting candidates along with randomly chosen values should suffice.

The problem was then solved by incorporating only the shifting candidates for control into the set of allowable control candidates. In this case, the total numbers of allowable control candidates become M = R + 2 for stages 2 to P - 1 and M = R + 1 for stages 1 and P. Table 5 shows the performance index obtained after 50 iterations for different values of γ and r(0) when taking R = 2 (i.e., M = 4 for stages 2 to P - 1 and M = 3 for stages 1 and P).

Table 3. Number of Passes Required To Reach I = 20841.1 When φ = 0.25, θ = 1, and Different Values of M and N in Example 1 Were Used^a

                              total number of candidates for control, M = R + 3
number of grid points, N       6     8     10    12    14
 3                             18    12    18    11    7
 5                             19    10    *     10    *
 7                             *     15    20    8     5
 9                             14    8     8     8     15
11                             19    10    14    *     6
13                             21    5     5     11    5

^a Asterisk denotes a local optimum.

The bifunctional catalyst system and its performance index (eqs 26-35) are

dx1/dt = -k1 x1    (26)

dx2/dt = k1 x1 - (k2 + k3) x2 + k4 x5    (27)

dx3/dt = k2 x2    (28)

dx4/dt = -k6 x4 + k5 x5    (29)

dx5/dt = k3 x2 + k6 x4 - (k4 + k5 + k8 + k9) x5 + k7 x6 + k10 x7    (30)

dx6/dt = k8 x5 - k7 x6    (31)

dx7/dt = k9 x5 - k10 x7    (32)

x(0) = [1 0 0 0 0 0 0]^T    (33)

ki = ci1 + ci2 u + ci3 u^2 + ci4 u^3,  i = 1, 2, ..., 10    (34)

I = x7(tf)    (35)
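For illustration only, the rate constants of eq 34 can be evaluated from a 10 × 4 array of coefficients; the numerical values of cij are tabulated by Luus et al.4 and are not reproduced here.

```python
import numpy as np

def rate_constants(u, c):
    """Eq 34: k_i = c_i1 + c_i2 u + c_i3 u^2 + c_i4 u^3 for i = 1, ..., 10.
    c is a (10, 4) array of the coefficients given by Luus et al.4"""
    return c @ np.array([1.0, u, u ** 2, u ** 3])
```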

Table 4. Number of Iterations Required To Reach I = 10.0942 × 10^-3 in Example 2 When γ = 0.8 and Different Values of φ and M Were Used

                      total number of candidates for control, M = R + 3
smoothing factor, φ   5     7     9     11    13
0                     27    23    14    17    16
0.10                  24    20    16    17    12
0.20                  22    24    19    17    16
0.25                  27    23    18    17    16
0.30                  27    23    18    17    16
0.40                  22    23    14    17    16
0.50                  22    23    15    17    16

Table 5. Value of the Performance Index I × 10^3 Obtained with Different Values of r(0) and γ in Example 2 When R = 2 Was Used (Numbers in Parentheses Indicate the Number of Iterations Required To Reach I = 10.0942 × 10^-3)

                      initial region size, r(0)
reduction factor, γ   0.1           0.2           0.3           0.4           0.5
0.60                  10.0940       10.0941       10.0942 (18)  10.0938       10.0942 (12)
0.65                  10.0942 (20)  10.0941       10.0939       10.0942 (16)  10.0936
0.70                  10.0941       10.0942 (13)  10.0942 (15)  10.0938       10.0942 (20)
0.75                  10.0942 (20)  10.0942 (20)  10.0942 (21)  10.0942 (21)  10.0942 (24)
0.80                  10.0942 (18)  10.0942 (24)  10.0942 (24)  10.0942 (24)  10.0942 (28)
0.85                  10.0942 (22)  10.0942 (29)  10.0942 (29)  10.0942 (37)  10.0942 (29)
0.90                  10.0942 (38)  10.0942 (40)  10.0942 (41)  10.0942 (40)  10.0942 (36)

0.60 ≤ u ≤ 0.90    (36)


In all runs, the global optimum was obtained to at least five-figure accuracy. The computation time required for 50 iterations was 30 s on a Pentium/300 computer.

The results in Table 5 compare favorably to the results obtained by Luus and Bojkov20 when the regular IDP algorithm was used. In their work, Luus and Bojkov20 chose the value of γ to be in the range of 0.7-0.9 and the initial region size in the range of 0.2-0.5. They found that, to obtain the global optimum, the region size chosen had to be greater than 0.2; and when the initial region was chosen to be greater than 0.3, there were still some chances that the global optimum would be missed. However, one can see from Table 5 that, using the present procedure, even if the initial region size is as small as 0.1, the global optimum is still obtained. If the reduction factor γ was chosen to be equal to or greater than 0.75, the global optimum was always reached to six-figure accuracy.

Example 3: Fed-Batch Reactor for Protein Production Problem

Next let us consider the problem of determining the optimal nutrient and inducer feeding rates for a fed-batch reactor, as introduced by Lee and Ramirez21 and used for optimal control studies by Luus.1 The system is described by the differential equations 37-44, where the auxiliary expressions μ, ψ, ζ, RfP, k1, and k2 are defined by eqs 45-49. The initial state is specified by eq 50. The final time is specified as tf = 10 h, and the performance index to be maximized is given by eq 51, where Q is the relative cost factor associated with the inducer feed. The constraints on the control variables are given by eq 52.

First, let us consider the case where the relative cost of the inducer to the value of the product is 5 (Q = 5). Luus1 reported that, in this case, at the optimum, u1 is zero throughout the time interval. Thus, optimization can be done only on u2. By using IDP with flexible stage lengths, Luus1 achieved the performance index I = 0.816480. This value is 2.2% better than the performance index 0.7988 reported by Lee and Ramirez.21

To solve this problem with stages of equal length, using IDP with the deterministic-random search, let us divide the time interval into 50 stages (P = 50). To check the observation made by Luus1 that when Q = 5, u1 can be set to zero, we first performed the optimization on both u1 and u2. We chose as initial control values u1(0) = u2(0) = 0, and as the initial region sizes r1(0) = r2(0) = 0.5. The reduction factor γ and the region restoration factor η were taken to be 0.8. The smoothing factor was arbitrarily taken to be 0.25. A few preliminary runs showed that using 3, 5, or 7 grid points yielded rather slow convergence to the optimum. This is due to the nonlinearities and low sensitivity of the system. Therefore, we used 9 grid points (N = 9) and 12 allowable candidates for control (M = 12 or R = 9). After 12 passes, each consisting of 50 iterations, I = 0.816480 was achieved, which is the same performance index as reported by Luus.1 The control policy for u2 is given in Figure 1. It has a similar shape to, but is somewhat smoother than, the control policy given by Luus.1 The value of the control u1 at the optimum is indeed zero throughout the time interval. This allows us to simplify the problem and reduce the computational effort by setting u1 = 0 and performing the optimization on only u2.

We now study the effect of the smoothing factor φ for this problem. As the initial control policy, we took u2(0) = 0, and chose the initial region size, r(0), to be 0.5. Each run consists of only a single pass with 100 iterations. Table 6 shows the values of the performance index obtained when N = 9 and different values for the number of allowable candidates for control M and the smoothing factor φ were used. The results show that the value of φ has some effect on the convergence property of the algorithm for this problem. An intermediate value of φ (0.15 ≤ φ ≤ 0.35) tends to yield a better result than a value at the extremes.

The fed-batch reactor model for induced foreign protein production (eqs 37-52) is

dx1/dt = u1 + u2    (37)

dx2/dt = μ x2 - ψ x2    (38)

dx3/dt = 100 u1/x1 - ψ x3 - μ x2/0.51    (39)

dx4/dt = RfP x2 - ψ x4    (40)

dx5/dt = 4 u2/x1 - ψ x5    (41)

dx6/dt = -k1 x6    (42)

dx7/dt = k2 (1 - x7)    (43)

dx8/dt = u2    (44)

μ = (0.407 x3/ζ)(x6 + 0.22 x7/(0.22 + x5))    (45)

ψ = (u1 + u2)/x1    (46)

ζ = 0.108 + x3 + x3^2/14814.8    (47)

RfP = (0.095 x3/ζ)(0.0005 + x5)/(0.022 + x5)    (48)

k1 = k2 = 0.09 x5/(0.034 + x5)    (49)

x(0) = [1 0.1 40 0 0 1 0 0]^T    (50)

I = x1(tf) x4(tf) - Q x8(tf)    (51)

0 ≤ uj ≤ 1,  j = 1, 2    (52)
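As a companion to the earlier sketches (our transcription, not the authors' code), the right-hand side of eqs 37-44 with the auxiliary expressions of eqs 45-49; here u is the pair (u1, u2) of nutrient and inducer feed rates:

```python
def protein_rhs(x, u):
    """Fed-batch protein production model, eqs 37-49."""
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    u1, u2 = u
    zeta = 0.108 + x3 + x3 ** 2 / 14814.8                       # eq 47
    mu = (0.407 * x3 / zeta) * (x6 + 0.22 * x7 / (0.22 + x5))   # eq 45
    psi = (u1 + u2) / x1                                        # eq 46
    RfP = (0.095 * x3 / zeta) * (0.0005 + x5) / (0.022 + x5)    # eq 48
    k1 = k2 = 0.09 * x5 / (0.034 + x5)                          # eq 49
    return [u1 + u2,                                            # eq 37
            mu * x2 - psi * x2,                                 # eq 38
            100.0 * u1 / x1 - psi * x3 - mu * x2 / 0.51,        # eq 39
            RfP * x2 - psi * x4,                                # eq 40
            4.0 * u2 / x1 - psi * x5,                           # eq 41
            -k1 * x6,                                           # eq 42
            k2 * (1.0 - x7),                                    # eq 43
            u2]                                                 # eq 44
```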


The computation time required for 60 iterations ranges from 219 s when M = 6 is used to 510 s when M = 14 is used.

Note here the low sensitivity of the performance index to the control policy. Figure 2 shows a control policy yielding a performance index I = 0.816478, obtained after a single pass when φ = 0.5 and R = 7 were chosen. This control policy is quite different from the optimal control policy in Figure 1, even though the performance index obtained in this case is just 0.0002% lower than the optimal solution. This kind of low sensitivity of the performance index to a change in the control policy makes the problem very difficult to solve.

Next let us consider the case where there is no penalty for the use of the inducer feed (Q = 0). In this case, u1 is not zero, and the search must be done on both u1 and u2. Using IDP with 10 stages of flexible lengths, Luus1 obtained I = 1.00960. Here, we took the initial control values to be u1(0) = u2(0) = 0, and the initial region sizes to be 0.5 for the two controls. The smoothing factor was chosen to be 0.25. After four passes, each consisting of 50 iterations, the performance index I = 1.009600 was achieved without any difficulty. The computation time required was 25 min. To further refine the control policy, several runs were made. The best control policy obtained, yielding I = 1.009604, is given in Figure 3. The value of u1 reaches the peak of 0.95 at stage 44 (corresponding to 8.6 ≤ t ≤ 8.8) and then decreases to zero, whereas u2 reaches the maximum of 1.0 also at stage 44 and remains at 1.0 until the final time.

Now, let us take the relative cost of the inducer to be just slightly greater than zero, i.e., Q = 0.002. Using the same IDP parameters as above, we obtained the performance index I = 1.007738. The optimal control policy for this case, given in Figure 4, shows that increasing the value of Q from zero by only a very small amount causes a significant reduction in the controls u1 and u2.

Figure 1. Optimal control policy of u2 for example 3 when Q = 5, yielding I = 0.816480.

Figure 2. Control policy of u2 yielding I = 0.816478 for example 3 when Q = 5, showing the low sensitivity of the performance index to the control policy.

Table 6. Value of the Performance Index Obtained When N = 9 Was Used (Numbers in Parentheses Indicate the Number of Iterations Required To Reach I = 0.816480)

                      total number of candidates for control, M = R + 3
smoothing factor, φ   6              8              10             12             14
0                     0.816439       0.816418       0.816471       0.816464       0.816479
0.05                  0.816480 (53)  0.816480 (53)  0.816480 (50)  0.816479       0.816480 (45)
0.10                  0.816480 (46)  0.816480 (44)  0.816480 (43)  0.816480 (42)  0.816480 (40)
0.15                  0.816480 (42)  0.816480 (42)  0.816480 (41)  0.816480 (40)  0.816480 (43)
0.20                  0.816480 (44)  0.816480 (42)  0.816480 (42)  0.816480 (42)  0.816480 (39)
0.25                  0.816480 (42)  0.816480 (39)  0.816480 (42)  0.816480 (39)  0.816480 (40)
0.30                  0.816480 (56)  0.816480 (43)  0.816480 (42)  0.816480 (39)  0.816480 (40)
0.35                  0.816480 (43)  0.816480 (53)  0.816480 (53)  0.816480 (41)  0.816480 (40)
0.40                  0.816478       0.816480 (37)  0.816480 (41)  0.816479       0.816480 (40)
0.45                  0.816473       0.816480 (46)  0.816478       0.816479       0.816480 (39)
0.50                  0.816473       0.816478       0.816478       0.816479       0.816479
Figure 3. Optimal control policy of u1 and u2 for example 3 when Q = 0.


The peak value for u1 is reduced from 0.95 to 0.08, and the value of u2 rapidly decreases to zero after reaching the maximum value of 1.0 at stage 46 (corresponding to 9 ≤ t ≤ 9.2). Although Q penalizes the control u2 only, it is interesting to note that it affects u1 more strongly.

Figure 4. Optimal control policy of u1 and u2 for example 3 when Q = 0.002.

Conclusions

The use of combined deterministic and randomly chosen values for the control candidates in iterative dynamic programming improves convergence to the optimum for the three difficult chemical engineering problems chosen here. The effect of the smoothing factor φ was found to be problem-dependent, but an intermediate value of φ = 0.25 gave good results in all three examples. Even though care is still necessary in choosing the parameters for IDP, the ranges for the parameters yielding the global optimum become greater when the deterministic candidates are included, and the success rate in achieving the global optimal solution for IDP is improved.

Acknowledgment

Financial support from the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged.

Nomenclature

A = expression defined by eq 17
B = expression defined by eq 18
f = nonlinear function of x and u, dimension (n × 1)
h = equality constraint defined in eq 23
I = performance index
J = augmented performance index
k = index used for stage number
m = number of control variables
M = number of sets of total allowable values for the control
n = number of state variables
N = number of grid points used in IDP
P = number of time stages
Q = relative cost factor used in eq 51
r = region over which allowable values for the control are taken
R = number of sets of random allowable values for the control
RfP = nonlinear function defined by eq 48
s = shifting term used in the quadratic penalty function
t = time
tf = final time
u = scalar control variable
u = control vector, dimension (m × 1)
uc = candidate for the control
us = best control value
x = state vector, dimension (n × 1)

Greek Letters

αj = lower bound on control uj
βj = upper bound on control uj
γ = reduction factor for the control region after every iteration
ζ = nonlinear function defined in eq 47
η = region restoration factor used after every pass
θ = penalty factor
μ = nonlinear function defined in eq 45
φ = smoothing factor
Φ = performance index
ψ = nonlinear function defined in eq 46

Subscripts

f = final value
i = index used for control candidate number
j = index used for control variable number
k = index used for stage number

Superscripts

(0) = initial value
q = index used for iteration number

Literature Cited

(1) Luus, R. Iterative Dynamic Programming: From Curiosity to a Practical Optimization Procedure. Control Intelligent Syst. 1998, 26, 1.
(2) Bojkov, B.; Luus, R. Use of Random Admissible Values for Control in Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1992, 31, 1308.
(3) Luus, R. Optimal Control by Dynamic Programming Using Accessible Grid Points and Region Reduction. Hung. J. Ind. Chem. 1989, 17, 523.
(4) Luus, R.; Dittrich, J.; Keil, F. J. Multiplicity of Solutions in the Optimization of a Bifunctional Catalyst Blend in a Tubular Reactor. Can. J. Chem. Eng. 1992, 70, 780.
(5) Bojkov, B.; Luus, R. Evaluation of the Parameters Used in Iterative Dynamic Programming. Can. J. Chem. Eng. 1993, 71, 451.
(6) Mekarapiruk, W.; Luus, R. Optimal Control of Final State Constrained Systems. Can. J. Chem. Eng. 1997, 75, 806.
(7) Luus, R. Numerical Convergence Properties of Iterative Dynamic Programming when Applied to High Dimensional Systems. Chem. Eng. Res. Des. 1996, 74, 55.
(8) Pham, Q. T. Dynamic Optimization of Chemical Engineering Processes by an Evolutionary Method. Comput. Chem. Eng. 1998, 22, 1089.
(9) Luus, R. Application of Iterative Dynamic Programming to Optimal Control of Nonseparable Problems. Hung. J. Ind. Chem. 1997, 25, 293.



(10) Hull, T. E.; Enright, W. D.; Jackson, K. R. User Guide to DVERK, a Subroutine for Solving Nonstiff ODE's. Report 100, Department of Computer Science, University of Toronto, Canada, 1976.
(11) Hong, J. Optimal Substrate Feeding Policy for Fed Batch Fermentation with Substrate and Product Inhibition Kinetics. Biotechnol. Bioeng. 1986, 27, 1421.
(12) Hartig, F.; Keil, F. J.; Luus, R. Comparison of Optimization Methods for a Fed-Batch Reactor. Hung. J. Ind. Chem. 1995, 23, 141.
(13) Bojkov, B.; Luus, R. Optimal Control of Nonlinear Systems with Unspecified Final Times. Chem. Eng. Sci. 1996, 51, 905.
(14) Luus, R.; Hennessy, D. Optimization of Fed-Batch Reactors by the Luus-Jaakola Optimization Procedure. Ind. Eng. Chem. Res. 1999, 38, 1948.
(15) Luus, R. Handling Difficult Equality Constraints in Direct Search Optimization. Hung. J. Ind. Chem. 1996, 24, 285.
(16) Mekarapiruk, W.; Luus, R. Optimal Control of Inequality State Constrained Systems. Ind. Eng. Chem. Res. 1997, 36, 1686.
(17) Luus, R.; Storey, C. Optimal Control of Final State Constrained Systems. Proc. IASTED Int. Conf. on Modelling, Simulation and Control, Singapore, Aug 11-13, 1997; IASTED/Acta Press: Anaheim, CA, 1997; p 245.
(18) Luus, R.; Mekarapiruk, W.; Storey, C. Evaluation of Penalty Functions for Optimal Control. Proc. Int. Conf. on Optimization Techniques and Applications ICOTA '98, Perth, Australia, July 1-4, 1998; Curtin Printing Services: Curtin, Australia, 1998; p 724.
(19) Luus, R.; Jaakola, T. H. I. Optimization by Direct Search and Systematic Reduction of the Size of Search Region. AIChE J. 1973, 19, 760.
(20) Luus, R.; Bojkov, B. Global Optimization of the Bifunctional Catalyst Problem. Can. J. Chem. Eng. 1994, 72, 160.
(21) Lee, J.; Ramirez, W. F. Optimal Fed-Batch Control of Induced Foreign Protein Production by Recombinant Bacteria. AIChE J. 1994, 40, 899.

Received for review June 1, 1999
Revised manuscript received September 29, 1999

Accepted October 12, 1999

IE990381T
