Handling Inequality Constraints in Optimal Control by Problem Reformulation

Rein Luus*

Department of Chemical Engineering, University of Toronto, Toronto, ON M5S 3E5, Canada

Establishment of optimal control for systems, where constraints involve both control and state, is very difficult. In some problems the difficulty is reduced significantly by transforming the optimal control problem. For illustration, the optimal control of a nonisothermal fed-batch reactor with heat removal constraint is considered. Although there are only two control variables, the feed rate and the temperature, the heat removal rate constraint makes the optimal control problem very difficult. To parametrize the optimal control problem, the time interval is divided into $P$ time stages of variable length, and piecewise constant control is used at each time stage. Establishment of the optimal control policy is very challenging owing to the low sensitivity, the heat removal constraint, and the need for a large number of time stages for adequate approximation. However, by reformulating the optimal control problem where heat generation, rather than temperature, is used as a control variable, we are able to get greater accuracy with a smaller number of time stages. The optimal control policy is established with iterative dynamic programming and checked with the LJ optimization procedure.

Introduction

In engineering design and applications, we are frequently faced with the problem of determining the control policy to achieve the best operation of the system within specified constraints. If the system is well-behaved, methods based on Pontryagin's maximum principle may be successfully used. Although the resulting optimal control problem becomes a boundary value problem, if the system of equations is well-behaved, the optimal control policy can be readily obtained by boundary condition iteration,1 or by control vector iteration.2

In fact, if the time horizon is reasonably short, boundary condition iteration provides very rapid quadratic convergence to the optimal control policy of well-behaved systems even if there are constraints on the control.3

However, many chemical engineering systems are not well-behaved and alternate procedures may have to be used to obtain the optimal control policy. For some systems, the optimal control policy is highly oscillatory in nature and there can be doubts whether the optimal control policy is actually achieved.4 A seemingly simple isothermal fed-batch reactor optimization problem5 has presented challenges for many optimization procedures, and, as can be anticipated, establishing the optimal control of nonisothermal fed-batch reactors is much more difficult. Recently, Schlegel and Marquardt6 showed that the control policy thought to be optimal for a nonisothermal fed-batch reactor could be improved to provide a 2% improvement in the reported yield, but their reported control policy is still not the global optimum. Numerous methods have been proposed to deal with the establishment of optimal control of badly behaved systems such as fed-batch reactors where the very low sensitivity makes the establishment of the optimal control policy quite difficult.7-10 When temperature is included as an additional control variable and a heat constraint is added, the problem of establishing the optimal control becomes much more difficult.11,12

The goal of this paper is to investigate establishing the optimal control policy for nonisothermal fed-batch reactors by iterative dynamic programming (IDP),13 and to show the benefits of slightly reformulating the optimal control problem.

Problem Formulation

We consider the nonisothermal fed-batch reactor as modeled by Srinivasan and Bonvin,11 where the exothermic reaction

$$A + B \xrightarrow{k_1} C \xrightarrow{k_2} D$$

is occurring. It is required to determine the optimal temperature profile and the feed rate to maximize the yield $Vc_C$, where $V$ denotes the volume and $c_C$ is the concentration of the desired species C, in a given batch time $t_f$, so that the heat removal capacity of the reactor would not be exceeded. The equations describing the reactor are

$$\frac{dx_1}{dt} = -k_1 x_1 c_B \tag{1}$$

$$\frac{dx_2}{dt} = k_2 (x_1 - x_2) \tag{2}$$

$$\frac{dx_3}{dt} = u_1 \tag{3}$$

where $x_1 = Vc_A$, $x_2 = V(c_A + c_C)$, and $x_3 = V$, the volume of the liquid in the reactor, $c_B = (20x_3 + x_1 - 29.139)/x_3$, and $u_1$ is the feed rate.

The initial state is

$$x^T(0) = [10 \quad 10 \quad 1] \tag{4}$$

and the rate constants are

$$k_1 = 2 \times 10^3 \exp\left( \frac{-2.0 \times 10^4}{8.31 (u_2 + 273.15)} \right) \tag{5}$$

$$k_2 = 8 \times 10^{12} \exp\left( \frac{-8.0 \times 10^4}{8.31 (u_2 + 273.15)} \right) \tag{6}$$

where $u_2$ is the temperature in degrees Celsius. The reactions are exothermic, where the heat produced by the reaction is

$$Q = 5 \times 10^5 k_1 x_1 c_B + 5 \times 10^4 k_2 (x_2 - x_1) \tag{7}$$

* To whom correspondence should be addressed. E-mail: [email protected].

Ind. Eng. Chem. Res. 2009, 48, 9622–9630

10.1021/ie801806t CCC: $40.75 2009 American Chemical Society. Published on Web 04/17/2009

The constraints on the feed rate and temperature are

$$0 \le u_1 \le 2 \tag{8}$$

$$20 \le u_2 \le 50 \tag{9}$$

and the constraints on the volume and heat withdrawal rate are

$$0 < x_3 \le 1.1 \tag{10}$$

$$Q \le 5 \times 10^6 \tag{11}$$

At the batch time $t_f = 0.25$ h, it is required to maximize the yield of the desired product:

$$I = x_2(t_f) - x_1(t_f) \tag{12}$$
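As a quick consistency check, the performance index in eq 12 is the yield $Vc_C$ named at the start of the formulation. This follows directly from the state definitions given after eq 3:

$$I = x_2(t_f) - x_1(t_f) = \left[ V(c_A + c_C) - Vc_A \right]_{t = t_f} = \left[ Vc_C \right]_{t = t_f}$$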

Equation 11 is a very difficult inequality constraint since $Q$ is a function of both the state $x$ and control $u$.
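To make the coupling between state and control in eq 11 concrete, the following is a minimal Python sketch of the model in eqs 1-7. The function and constant names are illustrative choices and not part of the original work, which used FORTRAN; only the numerical constants come from the equations above.

```python
import math

R_GAS = 8.31          # gas constant as used in eqs 5 and 6
Q_MAX = 5.0e6         # heat removal limit, eq 11


def rate_constants(temp_c):
    """Rate constants k1, k2 (eqs 5 and 6) at temperature temp_c in deg C."""
    t_abs = temp_c + 273.15
    k1 = 2.0e3 * math.exp(-2.0e4 / (R_GAS * t_abs))
    k2 = 8.0e12 * math.exp(-8.0e4 / (R_GAS * t_abs))
    return k1, k2


def conc_b(x):
    """Concentration of B computed from the state, as defined after eq 3."""
    x1, _, x3 = x
    return (20.0 * x3 + x1 - 29.139) / x3


def heat_rate(x, temp_c):
    """Heat produced by the reactions, eq 7."""
    x1, x2, _ = x
    k1, k2 = rate_constants(temp_c)
    return 5.0e5 * k1 * x1 * conc_b(x) + 5.0e4 * k2 * (x2 - x1)


def reactor_rhs(x, u):
    """Right-hand sides of eqs 1-3 for state x and control u = (u1, u2)."""
    x1, x2, _ = x
    u1, u2 = u
    k1, k2 = rate_constants(u2)
    return [-k1 * x1 * conc_b(x), k2 * (x1 - x2), u1]


x0 = [10.0, 10.0, 1.0]        # initial state, eq 4
u = (1.0, 35.0)               # feed rate and temperature (deg C)
derivs = reactor_rhs(x0, u)
q0 = heat_rate(x0, u[1])
print(derivs, q0, q0 <= Q_MAX)
```

Because `heat_rate` depends on the state through $x_1$, $x_2$, and $c_B$ and on the control through the rate constants, a simple bound on $u_2$ alone cannot enforce eq 11.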

We divide the time interval $[0, t_f]$ into $P$ time stages of variable length as suggested by Bojkov and Luus,14 and use piecewise constant control at each time stage. For optimization, we use iterative dynamic programming (IDP) as presented in detail by Luus.13 For clarity, here we outline the algorithm.

Algorithm for IDP. Let us consider the general problem, where we wish to choose the $m$-dimensional control vector $u$ in the time interval $[0, t_f]$ to maximize the performance index that is an explicit function of the $n$-dimensional state at the final time $x(t_f)$:

$$I = \Phi(x(t_f)) \tag{13}$$

subject to the mathematical model

$$\frac{dx}{dt} = f(x, u), \quad x(0) \text{ given} \tag{14}$$

the constraints on the control variables

$$a_j \le u_j \le b_j, \quad j = 1, 2, ..., m \tag{15}$$

and the general inequality constraints for state and control in the form

$$\psi_i(x, u) \le 0, \quad i = 1, 2, ..., k \tag{16}$$

for the entire time interval.

To deal with general inequality constraints on the state, we follow the penalty function approach used by Luus,15 with a small variation. Instead of difference equations, we use differential equations as suggested by Mekarapiruk and Luus16 to construct the penalty functions. This approach was also used successfully by Shelokar et al.17 We therefore introduce $k$ state constraint variables through the differential equations

$$\frac{dx_{n+i}}{dt} = \begin{cases} 0 & \text{if } \psi_i(x, u) \le 0 \\ \psi_i(x, u) & \text{if } \psi_i(x, u) > 0 \end{cases} \tag{17}$$

for $i = 1, 2, ..., k$, with the initial condition

$$x_{n+i}(0) = 0, \quad i = 1, 2, ..., k \tag{18}$$

At the final time, $t_f$, $x_{n+i}(t_f)$ therefore gives the total violation of the $i$th inequality constraint integrated over time. The advantage of using this integrated value is that a violation may sometimes occur for a short time inside a time stage, and such a violation may go unnoticed if the difference equation approach is used, where the violations are checked only at the ends of the time stages. Now we choose the augmented performance index to be maximized as

$$J = I - \sum_{i=1}^{k} \theta_i x_{n+i}(t_f) \tag{19}$$

where $\theta_i > 0$ are penalty function factors for the inequality state constraints.
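The difference between the integrated violation of eqs 17 and 18 and an end-of-stage check can be illustrated with a short Python sketch. The constraint history `psi` below is hypothetical and given directly as a function of time; in the actual problem it would be evaluated along the integrated trajectory. The trapezoidal rule is used only for illustration.

```python
def integrated_violation(psi, t0, t1, steps=2000):
    """Trapezoidal estimate of the integral of max(psi(t), 0) over [t0, t1].

    This mirrors eqs 17 and 18 for a constraint whose value along the
    trajectory is available as a function of time.
    """
    h = (t1 - t0) / steps
    total = 0.0
    prev = max(psi(t0), 0.0)
    for i in range(1, steps + 1):
        cur = max(psi(t0 + i * h), 0.0)
        total += 0.5 * h * (prev + cur)
        prev = cur
    return total


def augmented_index(perf_index, violations, thetas):
    """Augmented performance index of eq 19."""
    return perf_index - sum(th * v for th, v in zip(thetas, violations))


# Hypothetical constraint history on one time stage [0, 1]: feasible at both
# ends, violated only for 0.4 < t < 0.6.
def psi(t):
    return 1.0 - 100.0 * (t - 0.5) ** 2


end_check_sees_violation = psi(0.0) > 0.0 or psi(1.0) > 0.0
x_extra = integrated_violation(psi, 0.0, 1.0)
j_aug = augmented_index(2.0, [x_extra], [100.0])
print(end_check_sees_violation, x_extra, j_aug)
```

In this example the end-of-stage check reports no violation, while the integrated state is positive and lowers the augmented index.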

The given time interval is divided into $P$ time stages of varying length, and we consider the case where the control is kept constant in each time interval, although different control parametrizations, such as piecewise linear, could also be used. As suggested by Bojkov and Luus,14 the time variable $t$ is transformed to $\tau$, so that the time interval $[0, t_f]$ becomes $[0, 1]$ and in the new time variable all stages are of equal length. The differential equation in the $k$th time interval then becomes

$$\frac{dx}{d\tau} = v(k) P f(x, u) \tag{20}$$

and the stage length at each stage $v(k)$ becomes an additional control variable to be determined through optimization. The algorithm then becomes the following:
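The time transformation in eq 20 can be sketched in Python as follows. The classical fourth-order Runge-Kutta step and the toy right-hand side are assumptions made for illustration; they are not the integrator or the model used in the paper.

```python
def rk4_step(f, x, u, h):
    """One classical fourth-order Runge-Kutta step for dx/ds = f(x, u)."""
    k1 = f(x, u)
    k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], u)
    k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], u)
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)], u)
    return [xi + h * (a + 2 * b + 2 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate_normalized(f, x0, controls, stage_lengths, substeps=20):
    """Integrate eq 20 from tau = 0 to tau = 1 with piecewise constant control."""
    p = len(stage_lengths)
    x = list(x0)
    h = (1.0 / p) / substeps          # every stage has length 1/P in tau
    for u, v in zip(controls, stage_lengths):
        def scaled(xs, us, v=v):
            # dx/dtau = v(k) * P * f(x, u)
            return [v * p * fi for fi in f(xs, us)]
        for _ in range(substeps):
            x = rk4_step(scaled, x, u, h)
    return x


# Toy system (not the reactor): dx1/dt = 1 tracks elapsed time, dx2/dt = u.
def toy_rhs(x, u):
    return [1.0, u]


v = [0.05, 0.15, 0.05]        # stage lengths in the original time units
u = [2.0, 0.0, 1.0]           # one constant control value per stage
x_final = integrate_normalized(toy_rhs, [0.0, 0.0], u, v)
print(x_final)
```

The first state recovers the elapsed time, which equals the sum of the stage lengths, even though every stage occupies the same interval $1/P$ in $\tau$.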

1. Choose the number of time stages $P$, the number of grid points $N$, the number of allowable values for control (including stage lengths) $R$ at each grid point, the region contraction factor $\gamma$ used after every iteration, the region restoration factor $\eta$, initial values for the control and the stage lengths, the initial region sizes, the number of iterations to be used in every pass, and the number of passes.

2. By choosing $N$ values for control and the stage lengths around the best available control and stage lengths inside the region sizes, integrate eq 20 from $\tau = 0$ to $\tau = 1$ to generate $N$ trajectories. The $N$ values for $x$ at the beginning of each time stage make up the $N$ grid points at each stage.

3. Starting at stage $P$, corresponding to the normalized time $\tau = (P - 1)/P$, for each grid point generate $R$ sets of values for control and stage length:

$$u(P) = u^{*j}(P) + D r^j(P) \tag{21}$$

$$v(P) = v^{*j}(P) + \omega w^j(P) \tag{22}$$

where $D$ is an $m \times m$ diagonal matrix with different random numbers between -1 and 1 along the diagonal and $\omega$ is another random number between -1 and 1; $u^{*j}(P)$ is the best value for control, and $v^{*j}(P)$ is the best value for the stage length, both obtained for that particular grid point in the previous iteration; $r^j(P)$ and $w^j(P)$ are the region sizes for the control and for the stage length at iteration $j$. Integrate eq 20 from $\tau = (P - 1)/P$ to $\tau = 1$ once with each of the $R$ allowable values for control and stage length to yield $x(t_f)$ so that the performance index can be evaluated. Compare the $R$ values of the performance index and choose the control and stage length which give the maximum value. The corresponding control and stage length are stored for use in step 4.

4. Step back to stage $P - 1$, corresponding to time $\tau = (P - 2)/P$. For each grid point generate $R$ allowable sets of control and stage lengths. Integrate eq 20 from $\tau = (P - 2)/P$ to $\tau = (P - 1)/P$ once with each of the $R$ sets. To continue integration, choose the control and the stage length from step 3 that corresponds to the grid point that is closest to the $x$ at $\tau = (P - 1)/P$. Now compare the $R$ values of the performance index and store in memory the control policy and the stage length that yield the maximum value.

5. Step back to stage $P - 2$ and continue the procedure in the previous step. Continue until stage 1, corresponding to $\tau = 0$ with the given initial state as the grid point, is reached. Make the comparison of the $R$ values of the performance index to give the best control and the best stage length for this stage. We now have the best control policy and the best stage length for each stage in the sense of maximizing the performance index from the allowable choices.

6. In preparation for the next iteration, reduce the size of the allowable regions

$$r^{j+1}(k) = \gamma r^j(k), \quad k = 1, 2, ..., P \tag{23}$$

$$w^{j+1}(k) = \gamma w^j(k), \quad k = 1, 2, ..., P \tag{24}$$

where $\gamma$ is the region reduction factor and $j$ is the iteration index. Use the best control policy and the best stage lengths from step 5 as the midpoint for the next iteration.

7. Increment the iteration index $j$ by 1 and go to step 2 to generate another set of grid points. Continue for the specified number of iterations.

8. Increment the pass number index by 1, set the iteration index $j$ to 1, and restore the region sizes to $\eta$ times the region sizes used at the beginning of the preceding pass, or choose the region sizes from the amount that the corresponding variables have changed, and go to step 2. Continue for the specified number of passes and examine the results.
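The region bookkeeping in steps 6-8 can be sketched in Python as follows. The sketch covers only the first option in step 8 (restoration by the factor $\eta$) and is an illustration, not the author's code. The numerical values are the ones the paper reports later for the reactor runs, with 25 being the initial region size for the temperature.

```python
def region_schedule(r0, gamma, eta, iterations, passes):
    """Region size at the start of every iteration, pass by pass.

    Within a pass the region shrinks by gamma after each iteration (eqs 23
    and 24). At the start of a new pass it is restored to eta times the size
    used at the start of the preceding pass (first option in step 8).
    """
    schedule = []
    start = r0
    for _ in range(passes):
        r = start
        sizes = []
        for _ in range(iterations):
            sizes.append(r)
            r *= gamma
        schedule.append(sizes)
        start *= eta
    return schedule


sched = region_schedule(r0=25.0, gamma=0.95, eta=0.90, iterations=20, passes=3)
for q, sizes in enumerate(sched, start=1):
    # pass number, region at first iteration, region at last iteration
    print(q, round(sizes[0], 4), round(sizes[-1], 4))
```

Each pass therefore starts with a smaller region than the one before it, while still being wide enough to escape a poor point reached at the end of the preceding pass.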

The clipping technique is used to handle the control constraints given in eq 15. In step 2, in generating the grid points, the suggestion of Fikar et al.18 is used where the values for control are generated at random inside the region. This method of generating grid points has been found to be especially useful for the optimal control of fed-batch reactors.19 Although IDP cannot guarantee obtaining the global optimum from any starting value for control,20 it has been found useful in time optimal control of distillation columns,21-23 and in the study of oscillatory systems.24 For some systems the computations can be carried out fast enough to enable its use for online optimal control.25,26
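A minimal Python sketch of random candidate generation (eqs 21 and 22) combined with clipping is given below. The function names, the fixed seed, and in particular the limits placed on the stage length (`v_bounds`) are assumptions for illustration; the paper does not state bounds on the stage lengths.

```python
import random


def clip(value, lower, upper):
    """Clipping technique for the bounds of eq 15."""
    return min(max(value, lower), upper)


def candidates(u_best, v_best, r, w, bounds, v_bounds, count, rng):
    """Generate `count` (control, stage length) pairs as in eqs 21 and 22."""
    out = []
    for _ in range(count):
        # D is diagonal, so each control gets its own random number in [-1, 1].
        u = [clip(ub + rng.uniform(-1.0, 1.0) * rj, lo, hi)
             for ub, rj, (lo, hi) in zip(u_best, r, bounds)]
        v = clip(v_best + rng.uniform(-1.0, 1.0) * w, *v_bounds)
        out.append((u, v))
    return out


rng = random.Random(0)                       # fixed seed, illustration only
cands = candidates(u_best=[1.0, 35.0], v_best=0.01,
                   r=[1.0, 25.0], w=0.001,
                   bounds=[(0.0, 2.0), (20.0, 50.0)],   # eqs 8 and 9
                   v_bounds=(1.0e-4, 0.25),  # assumed limits on stage length
                   count=25, rng=rng)
print(cands[0])
```

With a temperature region of 25 around 35 °C, unclipped candidates could fall anywhere between 10 and 60 °C, so clipping is what keeps every trial inside the bounds of eq 9.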

Numerical Results. All computations were done in double precision using the WATCOM FORTRAN77 compiler version 9.5 on an AMD Athlon/3800 (2.4 GHz) personal computer, which is about 1.5 times faster than a Pentium4/2.4 GHz computer. To obtain an accurate integration, the IMSL subroutine DVERK27 was used with a local error tolerance of $10^{-8}$. DVERK is a Runge-Kutta subroutine based on Verner's fifth and sixth order pair formulas.

We first consider solving the optimal control problem as presented in eqs 1-12. To deal with the heat removal constraint in eq 11, we introduce the state variable $x_4$ which is initially chosen to be zero, and

$$\frac{dx_4}{dt} = \begin{cases} Q \times 10^{-6} - 5.0 & \text{if } Q > 5.0 \times 10^6 \\ 0 & \text{if } Q \le 5.0 \times 10^6 \end{cases} \tag{25}$$

Here, to keep all the state variables of the same order of magnitude, the heat constraint violation is multiplied by $10^{-6}$. Thus, if there is no constraint violation, then $x_4$ remains zero, and whenever the heat removal constraint is violated, a positive value is assigned to the derivative and $x_4$ becomes positive. Then $x_4(t_f)$ is incorporated as a penalty into an augmented performance index in an effort to drive the constraint violation to zero. The value of $x_4(t_f)$ shows the extent to which the constraint has been violated. Even a very small value indicates that there is a violation, and methods can be used to reduce the violation to a negligible amount. This approach has been found to be effective by Mekarapiruk and Luus28 in dealing with state constraints.

We are using stages of variable length, and since the final time $t_f$ is specified, this introduces another term into the augmented performance index. Also, it is expected that for maximum yield, the volume should be at its maximum value, so the upper limit on the volume in eq 10 is treated as an equality constraint at the final time $t = t_f$. This assumption is verified through sensitivity analysis. We therefore choose the augmented performance index to be maximized as

$$J = [x_2(t_f) - x_1(t_f)] - \theta_1 \left[ (x_3(t_f) - 1.1 - s_1)^2 + (t_{fc} - 0.25 - s_2)^2 \right] - \theta_2 x_4(t_f) \tag{26}$$

where the calculated value of the final time is

$$t_{fc} = \sum_{k=1}^{P} v(k) \tag{27}$$

The penalty function factors $\theta_1$ and $\theta_2$ are positive and sufficiently large to provide convergence; the shifting terms $s_1$ and $s_2$ are put equal to zero initially and are updated after every pass according to

$$s_1^{q+1} = s_1^q - (x_3(t_f) - 1.1) \tag{28}$$

$$s_2^{q+1} = s_2^q - (t_{fc} - 0.25) \tag{29}$$

where $q$ is the pass number. The use of shifting terms to deal with equality constraints in optimization was first proposed by Luus29 for steady state optimization and used in IDP by Luus and Storey30 and is discussed in some detail in Luus13 and Luus et al.31,32 An interesting aspect of shifting terms is that, upon convergence, they give the sensitivity information. At their final converged values, $-2\theta_1 s_1$ gives the sensitivity of the performance index to the violation of the volume constraint, and $-2\theta_1 s_2$ gives the sensitivity of the performance index to the violation of the final time constraint. Thus if $s_1$ is negative, then increasing the volume beyond 1.1 L will increase the performance index, and will confirm our expectation of getting the maximum yield when the volume reaches its maximum allowed value. Similarly, if $s_2$ is negative, then an increase in $t_f$ will increase the performance index.
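Equations 26, 28, and 29 and the sensitivity estimate can be written as a short Python sketch. The function names and the end-of-pass values passed to `update_shifts` are hypothetical and chosen only to exercise the formulas; the value of $s_1$ in the last line is the one quoted in the text for the best preliminary run.

```python
def augmented_index(x_tf, t_fc, x4_tf, s1, s2, theta1=100.0, theta2=100.0):
    """Augmented performance index of eq 26 for the reactor problem."""
    x1, x2, x3 = x_tf
    yield_term = x2 - x1
    equality_penalty = (x3 - 1.1 - s1) ** 2 + (t_fc - 0.25 - s2) ** 2
    return yield_term - theta1 * equality_penalty - theta2 * x4_tf


def update_shifts(s1, s2, x3_tf, t_fc):
    """Shifting-term update applied after every pass, eqs 28 and 29."""
    return s1 - (x3_tf - 1.1), s2 - (t_fc - 0.25)


def sensitivity(theta, s):
    """Sensitivity estimate -2*theta*s read off a converged shifting term."""
    return -2.0 * theta * s


# Hypothetical end-of-pass values, chosen only to exercise the formulas.
s1, s2 = update_shifts(0.0, 0.0, x3_tf=1.102, t_fc=0.249)
print(s1, s2)

# Sensitivity to the volume constraint for the s1 quoted in the text.
vol_sens = sensitivity(100.0, -0.047746)
print(vol_sens)
```

A final volume above 1.1 drives $s_1$ negative on the next pass, which moves the target of the quadratic penalty in eq 26 downward and pulls the volume back toward 1.1. The positive sensitivity computed in the last line is consistent with the statement that a negative $s_1$ means a larger volume would increase the performance index.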

For a large number of optimal control problems that have been solved by this approach, it has been found that there is a wide range over which the penalty function factors may be chosen to obtain successful results, so here each penalty function factor was assigned a value of 100. For a preliminary run we took $P = 25$. The initial values for the control variables were taken as 1.0 and 35, respectively, and the initial stage lengths were chosen to be 0.01 for each time stage. The initial region sizes were taken as 1, 25, and 0.001. The region contraction factor $\gamma = 0.95$ was used with the region restoration factor of $\eta = 0.90$. For each pass, 20 iterations were used, and a total of 100 passes were used. The results of this preliminary run, shown in Table 1, indicate that more than a single grid point is necessary, but no more than 7 are required for this problem, and no more than 100 random points per iteration are necessary. The total computation time on the Athlon/3800 was 3 h, ranging from 107 s for the case with $N = 1$, $R = 25$ to 2353 s for the case with $N = 7$, $R = 100$.

The control policy for the best value, $I = 2.0825$ with $s_1 = -0.047746$, $s_2 = -0.267783$, in Table 1 was used as a starting value for a refined run with $R = 50$ and $N = 1$, 3, and 5. The

Table 1. Performance Index $I$ as a Function of the Number of Grid Points $N$ Generated by Assigning Control Values at Random, and the Number of Random Points $R$ Used at Each Time Stage in Each Iteration with the Use of $\theta_1 = \theta_2 = 10^2$ with $P = 25$

                             performance index, I
number of grid points, N     R = 25      R = 50      R = 100
1                            2.06592     2.07319     2.07946
3                            2.07762     2.07774     2.07932
5                            2.07957     2.08114     2.08053
7                            2.07631     2.07949     2.08250

negative value of $s_1$ shows that $x_3$ is at its upper bound at the optimum, validating the assumption made in using the equality constraint. Changing the seed number for generating random numbers had negligible effect. After several refined runs, the maximum value $I = 2.0931$ was obtained with the control policy shown in Table 2. The feed rate $u_1$ starts at its maximum value, drops to 0.6 at 0.033 h, and then to zero at 0.12 h. It is interesting to note that the first time stage is very short and the temperature $u_2$ is close to its maximum allowable value, but drops very rapidly in the beginning, reaching a minimum value of 31.6 °C, then increases to a maximum of 46.5 °C, and then decreases to 44.9 °C at the final time. The last column shows the heat removal rate $Q$ at the end of each time stage. It is noted that for the first half of the time period, the heat removal rate is at its maximum value.

To get a more accurate control policy, the number of time stages $P$ was doubled to 50. A very small improvement to $I = 2.0949$ was obtained in the performance index, but the control policy was not changed much, as can be seen in Figures 1 and 2. The best control policy obtained here is quite different from that reported by Srinivasan and Bonvin,11 where the feed rate starts at 0.56, increases to 1.2, and then decreases gradually to 0.57 before dropping to zero at 0.16 h.

Since at the optimum, $Q$ is changing in a very smooth fashion, to check the results, it was decided to reformulate the problem where the heat rate $Q$ is used as a control variable rather than temperature.

Reformulation of the Optimal Control Problem. Instead of having the control $u_2$ appearing as the temperature in the rate constants, an alternate way of expressing this optimal control problem is to have

$$u_2 = Q \tag{30}$$

Then eq 7 is used to obtain the temperature $T$, where

$$k_1 = 2 \times 10^3 \exp\left( \frac{-2.0 \times 10^4}{8.31 (T + 273.15)} \right) \tag{31}$$

$$k_2 = 8 \times 10^{12} \exp\left( \frac{-8.0 \times 10^4}{8.31 (T + 273.15)} \right) \tag{32}$$

When a value is given to $Q$, eq 7 is solved numerically to yield the temperature $T$ by the method suggested by Luus.33 Equality constraints in optimization are readily handled in this way and the results are very accurate.34 Identification of active inequality constraints and using these as equalities has been shown to be feasible, and this approach improves the convergence properties of direct search optimization very significantly for many problems.35 Here, the algebraic equation is solved by Newton's method inside the differential equation subroutine, so it is solved many times in each integration time step, yielding an almost continuous profile for the temperature. To get the gradients, a forward and backward perturbation of $10^{-4}$ was used. The optimal control problem is still the same, but this formulation is expected to give greater accuracy.
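The inner solve can be sketched in Python as Newton's method on the residual of eq 7, with the derivative approximated by a central difference using a perturbation of $10^{-4}$. The function names, the iteration cap, and the mid-batch state are hypothetical. The tolerance of $10^{-5}$ on $Q$ and the starting value of 35 °C are the values the paper quotes for its runs.

```python
import math


def heat_rate(x, temp_c):
    """Heat produced by the reactions (eq 7) with k1, k2 from eqs 31 and 32."""
    x1, x2, x3 = x
    t_abs = temp_c + 273.15
    k1 = 2.0e3 * math.exp(-2.0e4 / (8.31 * t_abs))
    k2 = 8.0e12 * math.exp(-8.0e4 / (8.31 * t_abs))
    c_b = (20.0 * x3 + x1 - 29.139) / x3
    return 5.0e5 * k1 * x1 * c_b + 5.0e4 * k2 * (x2 - x1)


def temperature_for_heat(x, q_target, t_start=35.0, tol=1.0e-5,
                         h=1.0e-4, max_iter=50):
    """Solve heat_rate(x, T) = q_target for T by Newton's method.

    The derivative is a central difference with perturbation h.
    """
    t = t_start
    for _ in range(max_iter):
        residual = heat_rate(x, t) - q_target
        if abs(residual) < tol:
            return t
        slope = (heat_rate(x, t + h) - heat_rate(x, t - h)) / (2.0 * h)
        t -= residual / slope
    raise RuntimeError("Newton iteration did not converge")


x = [9.0, 9.8, 1.05]                # hypothetical mid-batch state
q_at_40 = heat_rate(x, 40.0)        # manufacture a target with a known answer
t_found = temperature_for_heat(x, q_at_40)
print(t_found)
```

In an integration routine this solve would be called on every evaluation of the right-hand side, which is consistent with the roughly doubled computation time reported for the reformulated problem.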

We still need the auxiliary variable $x_4$, but here it becomes

$$\frac{dx_4}{dt} = \begin{cases} T - 50 & \text{if } T > 50 \\ 20 - T & \text{if } T < 20 \\ 0 & \text{if } 20 \le T \le 50 \end{cases} \tag{33}$$

As before, each penalty function factor was assigned a value of 100 and we took $P = 25$. The initial values for the control variables were taken as 1.0 and $2.5 \times 10^6$, respectively, and

Table 2. Control Policy with the Use of $P = 25$ Time Stages of Variable Length Giving $I = 2.0931$

stage   time       u1         u2          u3 = Δt    Q
1       0.000775   1.999999   48.789321   0.000775   4999999.99
2       0.001662   1.999949   47.623159   0.000887   5000000.00
3       0.002658   2.000000   46.366872   0.000996   5000000.00
4       0.003780   2.000000   45.013031   0.001123   5000000.00
5       0.004904   1.998614   43.721329   0.001124   5000000.00
6       0.006224   1.999366   42.275984   0.001319   5000000.00
7       0.008019   1.999136   40.421285   0.001795   5000000.00
8       0.009914   1.999341   38.592236   0.001895   5000000.00
9       0.012149   1.999193   36.585655   0.002234   4999999.99
10      0.014783   1.999962   34.400843   0.002635   5000000.00
11      0.018472   2.000000   31.620800   0.003689   4999999.99
12      0.032874   0.614723   31.618181   0.014401   4999649.26
13      0.072193   0.621229   31.602861   0.039319   4997163.35
14      0.119033   0.633107   31.600542   0.046841   4995120.40
15      0.126270   0.017450   31.659878   0.007237   4761327.74
16      0.132793   0.000000   33.547491   0.006523   4768272.97
17      0.139320   0.000000   35.397620   0.006527   4760398.47
18      0.144853   0.000000   37.331527   0.005533   4788888.44
19      0.150427   0.000000   39.046367   0.005574   4780954.76
20      0.155194   0.000000   40.841850   0.004766   4806047.71
21      0.160257   0.000000   42.439509   0.005063   4788638.53
22      0.166534   0.000000   44.195145   0.006277   4731942.28
23      0.201936   0.000000   46.449477   0.035402   3653868.25
24      0.223716   0.000000   45.603403   0.021780   2989470.14
25      0.250000   0.000000   44.869540   0.026284   2388897.74

Figure 1. Optimal feed rate policy with the use of $P = 50$ time stages with the original formulation; $I = 2.0949$.

Figure 2. Optimal temperature policy with the use of $P = 50$ time stages with the original formulation; $I = 2.0949$.


the initial stage lengths were chosen to be 0.01 for each time stage. The initial region sizes were taken as 1, $1.25 \times 10^6$, and 0.001. To solve the algebraic equation, the initial value for $T$ was taken as 35 °C for the first iteration of the first pass. Thereafter, the best previously obtained value of $T$ for a stage was used as the starting value for that stage. The iterations were continued until the calculated $Q$ was within $10^{-5}$ of $u_2$. This provided 11 figure accuracy. The region contraction factor $\gamma = 0.95$ was used with the region restoration factor of $\eta = 0.90$. As before, for each pass, 20 iterations were used, and a total of 100 passes were used. There was no constraint violation ($x_4(t_f) = 0$) and considerably better results were obtained, as is shown in Table 3. As expected, the computation time was higher, about double the time required with the original formulation. Here, with $P = 25$, the best value $I = 2.1017$ is 0.2% better than obtained with the original formulation. It is most interesting to note that the control policy for $u_1$ as shown in Table 4 is slightly different from the control policy in Table 2. The shifting terms obtained here, $s_1 = -0.050327$ and $s_2 = -0.26705$, are quite close to those obtained with the original formulation. The last column of Table 4 gives the calculated temperature at the end of each stage. We see that the temperature profile is quite similar to the one obtained before, but it reaches the lower bound of 20 °C at $t = 0.046$ h.

Increasing the number of time stages to $P = 50$ increased the performance index to 2.1018 and changed the control policy very slightly, as is seen in Figures 3 and 4. Figure 5 shows the temperature profile. It is interesting to see that the temperature starts at 49.84 °C, which is very close to the upper limit, then falls off to the lower limit, then rises to a peak of 48 °C, and then gradually decreases. Due to the low sensitivity, the optimal control policy is very difficult to obtain accurately with the original formulation, but the reformulated problem provides a way of getting excellent results.

Checking Results with Direct Search Optimization Procedure. To check the optimal control policy as obtained with IDP, the reformulated problem was run with direct search optimization using time stages of constant length. The Luus-Jaakola (LJ) optimization procedure36,37 was used here because of its simplicity and high reliability.38 To improve convergence, line search was used after every pass, as suggested by Luus.39

The time interval was divided into $P = 25$ time stages, each of length 0.01 h. In each pass, 20 iterations were used and a maximum of 200 passes were allowed. The region reduction factor 0.95 was used to reduce the size of the region after every iteration, and after every pass the region was restored to 0.85 of its value at the beginning of each pass. The initial values for $u_1$ and $u_2$ were chosen randomly in

Table 3. Performance Index $I$ as a Function of the Number of Grid Points $N$ Generated by Assigning Control Values at Random, and the Number of Random Points $R$ Used at Each Time Stage in Each Iteration with the Use of $\theta_1 = \theta_2 = 10^2$ for the Reformulated Optimal Control Problem with $P = 25$

                             performance index, I
number of grid points, N     R = 25      R = 50      R = 100
1                            2.09473     2.09598     2.09096
3                            2.10097     2.10121     2.10055
5                            2.10055     2.10170     2.10136
7                            2.10151     2.10121     2.10157

Table 4. Optimal Control Policy for the Reformulated Problem with the Use of $P = 25$ Time Stages of Variable Length, Yielding $I = 2.10170$

stage   time      u1        u2 × 10^-6   u3 = Δt    temperature
1       0.00038   2.00000   5.0000       0.000375   49.3335
2       0.00734   2.00000   5.0000       0.006966   41.1541
3       0.03678   2.00000   5.0000       0.029438   21.5100
4       0.04626   1.16240   5.0000       0.009483   20.0058
5       0.06900   0.67829   5.0000       0.022735   20.0079
6       0.09655   0.00000   5.0000       0.027554   26.0262
7       0.11986   0.00000   5.0000       0.023307   31.9438
8       0.14736   0.00000   5.0000       0.027500   40.2188
9       0.16948   0.00000   5.0000       0.022120   48.1778
10      0.17330   0.00000   4.7994       0.003828   47.9398
11      0.17720   0.00000   4.6079       0.003892   47.7158
12      0.18117   0.00000   4.4241       0.003977   47.5064
13      0.18528   0.00000   4.2464       0.004108   47.3144
14      0.18957   0.00000   4.0725       0.004287   47.1340
15      0.19407   0.00000   3.9007       0.004507   46.9600
16      0.19885   0.00000   3.7308       0.004773   46.7997
17      0.20395   0.00000   3.5604       0.005102   46.6402
18      0.20934   0.00000   3.3903       0.005391   46.4655
19      0.21492   0.00000   3.2238       0.005580   46.2725
20      0.22060   0.00000   3.0652       0.005677   46.0836
21      0.22636   0.00000   2.9149       0.005759   45.9038
22      0.23218   0.00000   2.7724       0.005821   45.7304
23      0.23805   0.00000   2.6378       0.005877   45.5654
24      0.24399   0.00000   2.5104       0.005940   45.4099
25      0.25000   0.00000   2.3896       0.006007   45.2626

Figure 3. Optimal feed rate policy with the use of $P = 50$ time stages with reformulation; $I = 2.1018$.

Figure 4. Optimal heat rate policy with the use of $P = 50$ time stages with reformulation; $I = 2.1018$.


(0, 2) and (0, $5.0 \times 10^6$), respectively. The initial region sizes were chosen to be 0.25 and $2.5 \times 10^6$, respectively.
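The structure of an LJ-type direct search can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the line search used after every pass is omitted, the toy objective stands in for the performance index (which would require integrating the reactor model), and all names and default values other than the contraction factor 0.95, the restoration factor 0.85, and 20 iterations per pass are hypothetical.

```python
import random


def lj_maximize(f, x0, r0, bounds, n_random=100, iterations=20, passes=20,
                gamma=0.95, eta=0.85, seed=0):
    """Luus-Jaakola style random search with region contraction (sketch)."""
    rng = random.Random(seed)
    best_x = list(x0)
    best_f = f(best_x)
    start = list(r0)
    for _ in range(passes):
        r = list(start)
        for _ in range(iterations):
            center = list(best_x)             # sample around the current best
            for _ in range(n_random):
                trial = [min(max(c + rng.uniform(-1.0, 1.0) * ri, lo), hi)
                         for c, ri, (lo, hi) in zip(center, r, bounds)]
                value = f(trial)
                if value > best_f:
                    best_x, best_f = trial, value
            r = [gamma * ri for ri in r]      # contract after every iteration
        start = [eta * ri for ri in start]    # restored region for next pass
    return best_x, best_f


# Toy objective with a known maximum at (1, -2).
def toy(x):
    return -(x[0] - 1.0) ** 2 - (x[1] + 2.0) ** 2


x_best, f_best = lj_maximize(toy, x0=[0.0, 0.0], r0=[2.0, 2.0],
                             bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(x_best, f_best)
```

For the reactor problem the vector being searched would hold the feed rate and heat rate for every one of the $P$ stages, which is why several hundred random points per iteration are needed in the runs reported next.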

By using different seeds for the random number generator, five runs were performed with $R = 200$, $R = 500$, and $R = 1000$. The results in Table 5 show that the choice of the seed for the random numbers has very little effect and at least 500 random points are necessary to get the performance index $I = 2.1014$. With $R = 1000$, $I = 2.1014$ was obtained each time. The average computation time with the use of 500 random points per iteration was 507 s for the 5 cases.

When the number of stages was increased to 50, the performance index was increased to 2.1017. The optimal control policy obtained with $P = 50$ stages of equal length is shown in Figures 6 and 7. As can be seen, the results are quite close to the optimal control policy obtained with IDP, presented earlier in Figures 3 and 4. However, since the stages are of equal length, the switching in the feed rate in Figure 6 is not very accurate. Thus a run was performed with the use of $P = 100$ time stages, each of length 0.0025 h. An improved value $I = 2.1018$ was obtained, and the switching in the feed rate was now much closer to that displayed in Figure 3. Thus by reformulating the problem, direct search optimization can be readily used to give results close to the optimal.

Discussion of Results

Although the reformulated problem requires more computational effort, the problem of piecewise constant temperature profile is avoided. Recently Schlegel and Marquardt6 considered the optimal control of a slightly different nonisothermal fed-batch reactor as presented by Srinivasan et al.12 The system equations and initial state are the same, but the batch time is 0.5 h instead of 0.25 h, and there are a few other differences:

$$c_B = (20x_3 + x_1 - 28.8315)/x_3 \tag{34}$$

the rate constants are

$$k_1 = 4 \exp\left( \frac{-6.0 \times 10^3}{8.31 (u_2 + 273.15)} \right) \tag{35}$$

$$k_2 = 800 \exp\left( \frac{-2.0 \times 10^4}{8.31 (u_2 + 273.15)} \right) \tag{36}$$

and the heat produced by the reaction is

$$Q = 3 \times 10^4 k_1 x_1 c_B + 10^4 k_2 (x_2 - x_1) \tag{37}$$

The constraints on the feed rate and heat rate are also different:

$$0 \le u_1 \le 1 \tag{38}$$

Figure 5. Optimal temperature profile with the use of $P = 50$ time stages with reformulation; $I = 2.1018$.

Table 5. Performance Index $I$ as a Function of the Number of Random Points $R$ Used in Each Iteration Using LJ Optimization Procedure with the Use of $\theta_1 = \theta_2 = 10^2$ for the Reformulated Optimal Control Problem with $P = 25$ Stages, Each of Length 0.01, for 5 Runs with Different Seeds for the Random Number Generator in Each Run

              performance index, I
run number    R = 200    R = 500    R = 1000
1             2.1013     2.1014     2.1014
2             2.1012     2.1012     2.1014
3             2.1014     2.1014     2.1014
4             2.1013     2.1014     2.1014
5             2.1012     2.1013     2.1014

Figure 6. Optimal feed rate policy obtained with LJ optimization procedure by using $P = 50$ time stages of equal length with reformulation; $I = 2.1017$.

Figure 7. Optimal heat rate policy obtained with LJ optimization procedure by using $P = 50$ time stages of equal length with reformulation; $I = 2.1017$.


$$Q \le 1.5 \times 10^5 \tag{39}$$

The rest of the equations and the performance index are the same as before.

When this problem was reformulated, using $Q$ instead of $T$ as the second control variable, the computer program that was used for the earlier example could be used easily. With the use of $P = 60$ stages of variable length, the maximum performance index $I = 2.05269$ was obtained. By doubling the number of stages to 120, the performance index was increased to 2.05271. Although this is only marginally better than 2.05270 reported by Schlegel and Marquardt,6 the control policy close to $t = 0$ is somewhat different, as shown in Figures 8 and 9. Here, $Q$ is at its upper bound, whereas in that paper $Q$ is initially significantly below the upper bound. The temperature profile is given in Figure 10. The temperature starts at 49.88 °C rather than 46 °C as reported by Schlegel and Marquardt.6 The rest of the temperature profile is almost identical to theirs. There was no violation of the constraints, with $V(t_f) = 1.100000$ and $t_{fc} = 0.500000$, and with the shifting terms $s_1 = -0.040031$ and $s_2 = -0.009118$. Because of the low sensitivity, it is very difficult to get the optimal control policy very accurately.

Although IDP requires considerably more computational effort than sequential quadratic programming (SQP), for highly nonlinear systems the reliability of IDP has been found to be considerably greater. This was especially true when difficulties were experienced in the use of SQP in establishing the optimal bifunctional catalyst blend in a tubular reactor.40 At present, the computer time on personal computers is essentially free, so the expenditure of a few hours to solve a complex problem is not excessive. Once the nature of the optimal control policy is known, then the computational effort is reduced substantially, as is the case of establishing the optimal control of oscillatory systems.24

For the nonisothermal fed-batch reactor, very accurate results were possible because the problem of searching for the best temperature was replaced by searching for the best heat rate. In the original formulation, piecewise constant control for temperature is a poor approximation, since under optimal control the temperature changes very rapidly initially, so the initial stage lengths would have to be extremely short. Piecewise linear parametrization may diminish this difficulty, but this was not attempted. Reformulation of the problem provided a much more attractive alternative.

Conclusions

Reformulating a nonisothermal fed-batch reactor optimal control problem given by Srinivasan and Bonvin11 provided a significant simplification for establishing the optimal control. Instead of using the temperature as the second control variable, the heat produced by the reaction was used. This gave better results with a relatively small number of time stages. The reformulated problem involved solving a nonlinear algebraic equation inside the integration routine and required about twice as much computational effort for the same number of time stages. The better results, however, showed that the benefit of determining the heat rate rather than the temperature for the nonisothermal fed-batch reactor was worth the additional effort. For the reformulated problem, the optimal control policy is also readily obtained with a direct search optimization procedure.
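The extra work inside the integration routine can be sketched as follows. The form of the heat-generation expression (two exothermic reactions with Arrhenius rate constants), all parameter values, and the function names are assumptions made for illustration only, and bisection is used here simply because it is robust; this is not the program used to obtain the results reported above.

```python
import math

R_GAS = 8.314  # gas constant, J/(mol K)


def heat_rate(T, cA, cB, V, p):
    """Assumed heat generation rate (J/h) at absolute temperature T (K)."""
    k1 = p["k10"] * math.exp(-p["E1"] / (R_GAS * T))
    k2 = p["k20"] * math.exp(-p["E2"] / (R_GAS * T))
    return V * (p["dH1"] * k1 * cA * cB + p["dH2"] * k2 * cB ** 2)


def temperature_from_heat(Q, cA, cB, V, p, T_lo=273.15, T_hi=373.15, tol=1e-10):
    """Solve heat_rate(T) = Q for T by bisection.

    heat_rate increases monotonically with T for positive parameters,
    so a sign change over [T_lo, T_hi] brackets exactly one root.
    """
    f_lo = heat_rate(T_lo, cA, cB, V, p) - Q
    f_hi = heat_rate(T_hi, cA, cB, V, p) - Q
    if f_lo * f_hi > 0:
        raise ValueError("Q is not attainable within the temperature bracket")
    while T_hi - T_lo > tol:
        T_mid = 0.5 * (T_lo + T_hi)
        if heat_rate(T_mid, cA, cB, V, p) - Q > 0:
            T_hi = T_mid
        else:
            T_lo = T_mid
    return 0.5 * (T_lo + T_hi)


# Hypothetical parameter values, not those of the reactor model in this paper.
params = {"k10": 4.0, "E1": 6.0e3, "k20": 800.0, "E2": 2.0e4,
          "dH1": 3.0e4, "dH2": 1.0e4}

Q_given = heat_rate(320.0, 2.0, 1.5, 1.0, params)
print(temperature_from_heat(Q_given, 2.0, 1.5, 1.0, params))
```

In such a scheme the temperature is recovered from the specified Q at every evaluation of the right-hand side of the state equations, which is consistent with the roughly doubled computational effort noted above.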

A maximum yield of 2.1018 moles was obtained for this mathematical model of a nonisothermal fed-batch reactor. The optimal control policy for this model is quite different from the one reported in the literature and was cross-checked by a different optimization procedure. Successful results were also obtained with a similar optimal control problem, where the optimal control policy had evaded several researchers.

Figure 8. Optimal feed rate policy with the use of P = 120 time stages with reformulation of the second optimal problem; I = 2.05271.

Figure 9. Optimal heat rate policy with the use of P = 120 time stages with reformulation of the second optimal problem; I = 2.05271.

Figure 10. Optimal temperature profile with the use of P = 120 time stages with reformulation of the second optimal problem; I = 2.05271.


Iterative dynamic programming provides a good means of obtaining the optimal control policy of highly nonlinear systems. To get to the vicinity of the global optimum, however, more than one grid point at each time stage was required for both of the systems considered. As noted by Fikar et al.18 and by Rusnak et al.,26 we also found that convergence was faster if the grid points were generated by assigning control values at random rather than uniformly. Computations showed that for this problem no more than 7 grid points are required at each time stage if the initial control policy is far from the optimum. For refining the results, 3 grid points are quite adequate.
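The two ways of choosing the control values used to generate the grid points can be contrasted in a minimal sketch. It assumes a single control variable with bounds a and b, a scalar region size r, and hypothetical function names; integrating the state equations with each policy to obtain the actual grid points is omitted.

```python
import random


def clip(value, lower, upper):
    """Keep a control value within its bounds."""
    return max(lower, min(upper, value))


def random_grid_policies(u_best, r, a, b, n_grid, rng):
    """Return n_grid control policies; policy 0 is the current best policy.

    Every other policy perturbs each stage value by omega * r, with omega
    drawn uniformly between -1 and 1, and is then clipped to [a, b].
    """
    policies = [list(u_best)]
    for _ in range(n_grid - 1):
        policies.append(
            [clip(u + rng.uniform(-1.0, 1.0) * r, a, b) for u in u_best]
        )
    return policies


def uniform_grid_policies(u_best, r, a, b, n_grid):
    """Deterministic alternative: offsets spaced evenly over [-r, r]."""
    if n_grid == 1:
        return [list(u_best)]
    offsets = [-r + 2.0 * r * i / (n_grid - 1) for i in range(n_grid)]
    return [[clip(u + d, a, b) for u in u_best] for d in offsets]


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
policies = random_grid_policies([0.5, 0.5, 0.5], r=0.2, a=0.0, b=1.0,
                                n_grid=7, rng=rng)
print(len(policies), policies[0])
```

In the uniform scheme every stage of a given policy receives the same offset, so the N policies explore only N directions in control space; the random scheme varies the offset from stage to stage, which may be one reason for the faster convergence observed.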

Acknowledgment

The office space and access to the library provided to me by the University of Toronto after my official retirement in 2004 are gratefully acknowledged.

Nomenclature

aj = lower bound on uj

bj = upper bound on uj

cA = concentration of reactant A (mol/L)

cB = concentration of reactant B (mol/L)

cC = concentration of desired product C (mol/L)

D = diagonal matrix with random numbers in (-1, 1)

f = continuous vector function of x and u

I = performance index, yield of desired product C (mol)

J = augmented performance index (mol)

k = number of inequality constraints

k1 = rate constant for first reaction (L/(mol h))

k2 = rate constant for second reaction (L/h)

m = number of control variables

n = number of state variables

N = number of grid points used at each time stage

P = number of time stages

q = pass number

Q = heat produced by reaction (J/h)

r = region size vector for control variables

R = number of random points per iteration

s1 = shifting term for volume constraint (L)

s2 = shifting term for batch time constraint (h)

t = time (h)

tf = batch time (h)

tfc = calculated batch time (h)

T = temperature (°C)

u1 = feed rate (L/h)

u2 = temperature (°C)

u = control vector

v = stage length

V = volume (L)

x1 = moles of A (mol)

x2 = moles of A and C combined (mol)

x3 = volume of liquid in reactor (L)

x4 = heat constraint violation variable, temperature constraint variable

x = state vector

w = region size for stage length

Greek Symbols

γ = amount by which the region sizes are reduced after every iteration

ε = region collapse parameter specifying the minimum region size

η = region restoration factor

θ1 = penalty function factor for equality constraints

θ2 = penalty function factor for inequality constraints

τ = transformed time

Φ = a continuous function of final state

ψ = general inequality constraint variable

ω = random number between -1 and 1

Literature Cited

(1) Luus, R. Boundary condition iteration, BCI. Encyclopedia of Optimization, 2nd ed.; Floudas, C. A., Pardalos, P. M., Eds.; Springer: New York, 2009; p 313.

(2) Luus, R. Control vector iteration, CVI. Encyclopedia of Optimization, 2nd ed.; Floudas, C. A., Pardalos, P. M., Eds.; Springer: New York, 2009; p 509.

(3) Luus, R. Further developments in the new approach to boundary condition iteration in optimal control. Can. J. Chem. Eng. 2001, 79, 968.

(4) Luus, R. Parametrization in nonlinear optimal control problems. Optimization 2006, 55, 65.

(5) Park, S.; Ramirez, W. F. Optimal production of secreted protein in fed-batch reactors. AIChE J. 1988, 34, 1550.

(6) Schlegel, M.; Marquardt, W. Detection and exploitation of the control switching structure in the solution of dynamic optimization problems. J. Process Control 2006, 16, 275.

(7) Yang, R. L.; Wu, C. P. Global Optimal Control by Accelerated Simulated Annealing. Paper presented at the First Asian Control Conference, Tokyo, Japan, 1994.

(8) Luus, R. Sensitivity of control policy on yield of a fed-batch reactor. Proceedings of IASTED International Conference on Modelling and Simulation, Pittsburgh, PA, April 27-29, 1995; Hamza, M. H., Ed.; IASTED/Acta Press: Calgary, AB, 1995; p 224.

(9) Luus, R.; Hennessy, D. Optimization of fed-batch reactors by the Luus-Jaakola optimization procedure. Ind. Eng. Chem. Res. 1999, 38, 1948.

(10) Schlegel, M.; Stockmann, K.; Binder, T.; Marquardt, W. Dynamic optimization using adaptive control vector parameterization. Comput. Chem. Eng. 2005, 29, 1731.

(11) Srinivasan, B.; Bonvin, D. Characterization of optimal temperature and feed-rate policies for discontinuous two-reaction systems. Ind. Eng. Chem. Res. 2003, 42, 5607.

(12) Srinivasan, B.; Palanki, S.; Bonvin, D. Dynamic optimization of batch processes I. Characterization of the nominal solution. Comput. Chem. Eng. 2003, 27, 1.

(13) Luus, R. Iterative Dynamic Programming; Chapman & Hall/CRC: London, UK, 2000.

(14) Bojkov, B.; Luus, R. Optimal control of nonlinear systems with unspecified final times. Chem. Eng. Sci. 1996, 51, 905.

(15) Luus, R. Application of iterative dynamic programming to state constrained optimal control problems. Hung. J. Ind. Chem. 1991, 19, 245.

(16) Mekarapiruk, W.; Luus, R. Optimal control of inequality state constrained systems. Ind. Eng. Chem. Res. 1997, 36, 1686.

(17) Shelokar, P. S.; Jayaraman, V. K.; Kulkarni, B. D. Multicanonical jump walk annealing assisted by Tabu for dynamic optimization of chemical engineering processes. Eur. J. Oper. Res. 2008, 185, 1213.

(18) Fikar, M.; Latifi, A. M.; Fournier, F.; Creff, Y. Application of iterative dynamic programming to optimal control of a distillation column. Can. J. Chem. Eng. 1998, 76, 1110.

(19) Luus, R. Choosing grid points in solving singular optimal control problems by iterative dynamic programming. Proceedings of the 10th IASTED International Conference on Intelligent Systems and Control; Sztandera, L. M., Ed.; ACTA Press: Anaheim, CA, 2007; p 425.

(20) Luus, R.; Galli, M. Multiplicity of solutions in using dynamic programming for optimal control. Hung. J. Ind. Chem. 1991, 19, 55.

(21) Luus, R. Time optimal control of a binary distillation column. Dev. Chem. Eng. Mineral Process. 2002, 10, 19.

(22) Woinaroschy, A. Time-optimal control of startup distillation columns by iterative dynamic programming. Ind. Eng. Chem. Res. 2008, 47, 4158.

(23) Woinaroschy, A. Time-optimal control of startup distillation of nonideal mixtures. Ind. Eng. Chem. Res. 2009, 48, 3873.

(24) Luus, R. Optimal control of oscillatory systems by iterative dynamic programming. J. Ind. Manage. Optim. 2008, 4, 1.

(25) Luus, R.; Okongwu, O. N. Towards practical optimal control of batch reactors. Chem. Eng. J. 1999, 75, 1.

(26) Rusnak, A.; Fikar, M.; Latifi, M. A.; Meszaros, A. Receding horizon iterative dynamic programming with discrete time models. Comput. Chem. Eng. 2001, 25, 161.

(27) Hull, T. E.; Enright, W. D.; Jackson, K. R. User Guide to DVERK: A Subroutine for Solving Nonstiff ODE's; Department of Computer Science, University of Toronto: Canada, 1976; Report 100.

(28) Mekarapiruk, W.; Luus, R. Iterative dynamic programming with adaptive scheme for region size determination. Hung. J. Ind. Chem. 1999, 27, 235.

(29) Luus, R. Handling difficult equality constraints in direct search optimization. Hung. J. Ind. Chem. 1996, 24, 285.

(30) Luus, R.; Storey, C. Optimal control of final state constrained systems. Proceedings of the IASTED International Conference on Modeling, Simulation, and Optimization; Singapore, August 11-13, 1997; Hamza, M. H., Ed.; IASTED/Acta Press: Calgary, AB, 1997; p 245.

(31) Luus, R.; Mekarapiruk, W.; Storey, C. Evaluation of penalty functions for optimal control. Proceedings of the International Conference on Optimization Techniques and Applications (ICOTA '98); Curtin Printing Services: Perth, Western Australia, 1998; p 724.

(32) Luus, R.; Mekarapiruk, W.; Storey, C. Evaluation of penalty functions for optimal control. Optimization Methods and Applications; Yang, X., Teo, K. L., Caccetta, L., Eds.; Kluwer Academic Publishers: London, UK, 2001; p 81.

(33) Luus, R. Effective solution procedure for systems of nonlinear algebraic equations. Hung. J. Ind. Chem. 1999, 27, 307.

(34) Luus, R. Handling difficult equality constraints in direct search optimization. Part 2. Hung. J. Ind. Chem. 2000, 28, 211.

(35) Luus, R.; Sabaliauskas, K.; Harapyn, I. Handling inequality constraints in direct search optimization. Eng. Optim. 2006, 38, 391.

(36) Luus, R.; Jaakola, T. H. I. Optimization by direct search and systematic reduction of the size of search region. AIChE J. 1973, 19, 760.

(37) Luus, R. Direct search Luus-Jaakola optimization procedure. Encyclopedia of Optimization, 2nd ed.; Floudas, C. A., Pardalos, P. M., Eds.; Springer: New York, 2009; p 735.

(38) Liao, B.; Luus, R. Comparison of the Luus-Jaakola optimization procedure and the genetic algorithm. Eng. Optim. 2005, 37, 381.

(39) Luus, R. Use of line search in the Luus-Jaakola optimization procedure. Proceedings of the 3rd IASTED International Conference on Computer Intelligence; Banff, Alberta, Canada, July 2-4, 2007; Acta Press: Anaheim, CA, 2007; p 128.

(40) Luus, R.; Dittrich, J.; Keil, F. J. Multiplicity of solutions in the optimization of bifunctional catalyst blend in tubular reactor. Can. J. Chem. Eng. 1992, 70, 780.

Received for review November 25, 2008

Revised manuscript received March 28, 2009

Accepted April 2, 2009

IE801806T

