


Ind. Eng. Chem. Fundam. 1983, 22, 436-445


Improvement of Gauss-Newton Method for Parameter Estimation through the Use of Information Index

Nicolas Kalogerakis and Rein Luus*

Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 1A4, Canada

In parameter estimation for systems described by ordinary differential equations, the region of convergence for the Gauss-Newton method can be substantially enlarged through the use of an "information index" and an optimal step-size policy. The information index provides a measure of the available sensitivity information as a function of time, thereby locating the most appropriate section of data to be used for parameter estimation. The use of the chosen section of data significantly improves the conditioning of the linear algebraic equations that are solved at the end of each iteration in the Gauss-Newton method, and the region of convergence is thus expanded. If observations are unavailable in the most appropriate section of data, artificial data can be generated by data smoothing and interpolation. The effectiveness of the approach is illustrated by four examples depicting typical chemical engineering systems. The proposed procedure is especially attractive for systems described by stiff differential equations.

Introduction

Frequently the structure of the mathematical model of a physical system is available from fundamental principles. It is then required to use experimental data to obtain the best estimate for the parameters in the model, in the sense of minimizing the difference between the model predictions and the actual observations taken from the physical system. Quasilinearization and Gauss-Newton are the most commonly used parameter estimation procedures because of their fast quadratic convergence near the optimum (Bellman and Kalaba, 1965; Lee, 1968; Bard, 1970, 1974). However, a major problem is the small region of convergence. Unless the initial guess of the unknown parameters is in the immediate vicinity of the optimum, divergence occurs. This problem has received much attention recently. Ramaker et al. (1970) suggested the incorporation of Marquardt's modification to expand the region of convergence. Donnelly and Quon (1970) proposed a procedure whereby the given data are perturbed if divergence occurs, and a series of problems are solved until the model trajectories "match" the original data. Nieman and Fisher (1972) incorporated linear programming into the parameter estimation procedure and suggested the solution of a series of constrained parameter estimation problems where the search is restricted to a small parameter space around the chosen initial guess. Recently, Wang and Luus (1980) showed that the use of a shorter data length can enlarge substantially the region of convergence. To further increase the region of convergence, Kalogerakis and Luus (1982) proposed a two-step procedure whereby direct search optimization is used initially to reach the vicinity of the optimum, followed by the Gauss-Newton method.

Whenever an initial guess is outside the region of convergence, the source of difficulties may be traced to the following. First, if the estimate of the parameter vector is far from the optimum, the differential equations may become numerically unstable. Second, the Gauss-Newton method may cause excessive overstepping; and third, it may predict a totally wrong direction in which to change the parameter vector. In this paper the importance of using a robust integration routine is demonstrated, and an optimal step-size policy is proposed to cope with the first two problems. It is further shown that lack of sufficient sensitivity information causes the Gauss-Newton method to predict a wrong direction, and that this problem can be overcome by incorporating such information into the parameter estimation procedure.

Problem Statement

Let us consider a dynamic system described by the ordinary differential equation

dx(t)/dt = f(x, k)    (1)

x(t_0) = x_0    (2)

where x is the n-dimensional state vector, x_0 is the given

© 1983 American Chemical Society



initial state, and k is a p-dimensional vector of constant parameters. We assume that f has continuous first partial derivatives with respect to its arguments. The m-dimensional output vector y is related to the state vector by

y(t) = Cx(t)    (3)

where C is a constant m × n matrix.

Given the measurements of the output vector, ŷ(t_i), i = 1, 2, ..., N, the problem is to estimate the unknown parameter vector k which minimizes the performance index S, given by

S = Σ_{i=1}^{N} [ŷ(t_i) − y(t_i)]^T Q(t_i) [ŷ(t_i) − y(t_i)]    (4)

where Q is an m × m positive definite, symmetric weighting matrix which can have different values along the data length.

Gauss-Newton Method

Let us consider the well-known Gauss-Newton method. Suppose an estimate k^(j) of the unknown parameter vector is available at the jth iteration. To obtain the next estimate k^(j+1), we first expand the corresponding output vector by Taylor series to yield

y^(j+1)(t) = y^(j)(t) + (∂y^T/∂k)^T Δk^(j+1)    (5)

where Δk^(j+1) = k^(j+1) − k^(j). Substitution of eq 3 into eq 5 yields

y^(j+1)(t) = Cx^(j)(t) + CG(t) Δk^(j+1)    (6)

where G(t) is the n × p sensitivity matrix (∂x^T/∂k)^T, obtained by differentiating both sides of eq 1 with respect to k to yield

dG(t)/dt = (∂f^T/∂x)^T G(t) + (∂f^T/∂k)^T    (7)

with

G(t_0) = 0    (8)

since the initial state x_0 does not depend on k. Substituting y^(j+1)(t) into the performance index in eq 4 and setting ∂S/∂Δk^(j+1) = 0, we obtain a set of linear algebraic equations

[Σ_{i=1}^{N} G^T(t_i) C^T Q(t_i) C G(t_i)] Δk^(j+1) = Σ_{i=1}^{N} G^T(t_i) C^T Q(t_i) [ŷ(t_i) − Cx^(j)(t_i)]    (9)

which is of the form

A Δk^(j+1) = b    (10)

Solution of eq 10 yields Δk^(j+1), and thus k^(j+1) is obtained. If the initial guess k^(0) is sufficiently close to the optimum, this procedure yields a sequence of parameter values converging rapidly to the optimum. If, however, the parameter values are far from the optimum, there are three problems associated with the estimate k^(j+1) obtained by eq 10: (a) Δk^(j+1) may correspond to a wrong direction; (b) the change Δk^(j+1) may be in the right direction, but too large in magnitude, i.e., there may be excessive overstepping; (c) the state equations may be numerically unstable at k^(j) + Δk^(j+1). In the following sections each of these problems is examined separately, and means of overcoming the problems are proposed.

Information Index

When the parameter values are far from the optimum, some difficulties arise in the solution of the linear algebraic equations that are to be solved at the end of each iteration of the Gauss-Newton method. Reliable determination of Δk^(j+1) from eq 10 requires not simply that matrix A be nonsingular, but that the system of linear equations be not ill-conditioned, i.e., that the condition number of matrix A be not excessively large. Since A is symmetric and positive definite, the condition number is defined (Strang, 1980) in terms of the largest and the smallest eigenvalues of A, namely

cond(A) = λ_max / λ_min    (11)

Even with double precision, computational difficulties usually arise in the solution of eq 10 if the condition number is very large. Since A depends only on the sensitivity coefficients, the ill-conditioning of A when the condition number is high is related to the insensitivity of the output to one or more parameters.

Several methods have been proposed in the literature to overcome this problem of ill-conditioning. Among them the most common are the pseudoinverse (or truncation) approach and the damped least squares (Osborne, 1972), both used extensively in least-squares problems. The pseudoinverse can be computed either by singular value decomposition of matrix A (Lawson and Hanson, 1974) or by approximation techniques (Klinger, 1968).

In the case of the damped least squares, eq 10 is replaced by a better conditioned system

(A + μ²I) Δk^(j+1) = b    (12)

proposed by Levenberg (1944) and Marquardt (1963). Here μ² is a small free parameter, which is chosen to be larger than the smallest eigenvalue of A if A is ill-conditioned.

The idea behind both methods in solving nonlinear least-squares problems is to use an approximation of eq 10 until the parameters reach the vicinity of the optimum, where it is assumed that the system is well-posed. However, these methods often converge to local optima, and convergence is rather slow, if at all. On the other hand, the efficiency of Marquardt's modification is greatly influenced both by the starting value of μ and the rate at which μ is varied (Ramaker et al., 1970; Osborne, 1972). Therefore, a large number of preliminary runs may be required. Although these techniques may overcome the ill-conditioning of A, the estimate Δk^(j+1) cannot really be obtained reliably, since the direction of the change in the parameter vector may be totally wrong when the basic sensitivity information is lacking.

A careful examination of the matrix A in eq 10 shows that A depends not only on the sensitivity coefficients but also on their values at the times when the given measurements have been taken. Therefore, when the parameter estimates are far from the optimum, some of the sensitivity coefficients may be excited and become large in magnitude at times which fall outside the range of the given data points. This means that the output vector will be insensitive to these parameters at the given measurement times, resulting in the loss of the sensitivity information in A. Thus eq 10 becomes ill-conditioned, and any attempt to change the equation is really self-defeating, since the parameter estimate will be unreliable.

Therefore, instead of modifying eq 10, we propose a direct approach whereby the conditioning of matrix A can be improved significantly by the use of an appropriately selected section of data, so that more of the available sensitivity information is captured for the current parameter values. Namely, we propose to use the section of data over which the output vector is most sensitive to the parameters, even if the section is entirely different from the


range of the given data points.

To be able to determine the proper section of data, we introduce the information index for each parameter, defined by

s_j(t) = k_j (∂y(t)/∂k_j)^T Q (∂y(t)/∂k_j) k_j    (j = 1, 2, ..., p)    (13)

or equivalently, using eq 3 and the definition of the sensitivity matrix,

s_j(t) = k_j e_j^T G^T(t) C^T Q C G(t) e_j k_j    (j = 1, 2, ..., p)    (14)

where e_j is a unit p-dimensional vector with 1 in the jth element and zeros elsewhere. The scalar s_j(t) can be viewed as an index measuring the overall sensitivity of the output to the parameter k_j at time t.

Thus, given an initial guess k^(0), we can integrate eq 1 and 7 and then compute s_j(t), j = 1, 2, ..., p, as functions of time. Subsequently, by plotting s_j(t) vs. time, preferably on a log scale, we can determine the time interval where the information indices are excited and become large in magnitude. If observations are not available within this time interval, artificial data should be generated by data smoothing and interpolation to provide the sensitivity information.
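For illustration, the computation of s_j(t) by eq 14 can be sketched as follows for a hypothetical one-state, two-parameter model (not one of the paper's examples); the model, the parameter values, and the choices C = I and Q = I are assumptions made only to keep the sketch short.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical model dx/dt = -k1*x + k2 with x(t0) = 1; the state is
# augmented with the sensitivities G = dx/dk obtained from eq 7.
def augmented_rhs(t, z, k1, k2):
    x, g1, g2 = z
    dx = -k1 * x + k2
    dg1 = -k1 * g1 - x      # d(dx/dk1)/dt = (df/dx)*g1 + df/dk1
    dg2 = -k1 * g2 + 1.0    # d(dx/dk2)/dt = (df/dx)*g2 + df/dk2
    return [dx, dg1, dg2]

def information_indices(k1, k2, t_eval):
    sol = solve_ivp(augmented_rhs, (t_eval[0], t_eval[-1]),
                    [1.0, 0.0, 0.0], args=(k1, k2), t_eval=t_eval,
                    rtol=1e-8, atol=1e-10)
    g1, g2 = sol.y[1], sol.y[2]
    # eq 14 with C = I and Q = I: s_j(t) = (k_j * dy/dk_j)^2
    return (k1 * g1) ** 2, (k2 * g2) ** 2

t = np.logspace(-3, 1, 200)
s1, s2 = information_indices(2.0, 1.0, t)
# The location of the peak of s_j(t) identifies the most informative
# section of data for each parameter.
print(t[np.argmax(s1)], t[np.argmax(s2)])
```

Plotted against log t, the two indices peak at different times, showing which section of data carries information about each parameter.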

The idea behind this approach is that, given the structure of the model and the initial state, for each set of parameter values there exists a time interval which contains most of the sensitivity information. The information index for each parameter therefore enables us to locate the best section of data which should be used for parameter estimation at that particular iteration. Use of such a section improves directly the conditioning of matrix A by capturing most of the available sensitivity information, and makes the next estimate of the parameter vector more reliable.

An Optimal Step-Size Policy

When the parameter values are far from the optimum, the Gauss-Newton method usually causes overstepping, since eq 5 is accurate only to first-order variations. To limit the amount of change in the parameter vector, we introduce a stepping parameter λ (0 < λ ≤ 1) so that the parameter vector for the next iteration is

k^(j+1) = k^(j) + λ Δk^(j+1)    (15)

There are two steps involved in obtaining an optimal value for λ, in the sense of minimizing the performance index along the direction chosen by the Gauss-Newton method.

First we find an acceptable value for λ, denoted by λ_a, simply by choosing λ = 1 and halving it until the performance index is improved, i.e.

S(k^(j) + λ Δk^(j+1)) < S(k^(j))    (16)

It should be emphasized here that it is unnecessary to integrate the state equations for the entire data length for each value of λ. Once the performance index becomes greater than S(k^(j)), a smaller value for λ can be chosen. By this procedure, besides the savings in computation time, numerical instability is avoided, since the performance index S becomes large quickly and integration is stopped well before computer overflow is threatened.
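The search for an acceptable λ_a can be sketched as follows; here S is a stand-in quadratic objective rather than the integral-based index of eq 4, and all numerical values are illustrative.

```python
import numpy as np

# Start with lam = 1 and halve until S(k + lam*dk) < S(k), as in eq 16.
# In the paper, evaluating S requires integrating the state equations,
# and the integration is abandoned early once S exceeds S(k).
def acceptable_step(S, k, dk, max_halvings=30):
    S_current = S(k)
    lam = 1.0
    for _ in range(max_halvings):
        if S(k + lam * dk) < S_current:
            return lam               # acceptable lambda_a found
        lam *= 0.5                   # halve and try again
    raise RuntimeError("no acceptable step found")

# Toy objective with minimum at (1, 2) and a badly overstepped move.
S = lambda k: float(np.sum((k - np.array([1.0, 2.0])) ** 2))
lam_a = acceptable_step(S, np.array([5.0, 5.0]), np.array([-40.0, -30.0]))
print(lam_a)
```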

Having an acceptable value λ_a, we next obtain λ_opt as follows. Taylor series expansion of the state vector with respect to λ, retaining terms up to second order, yields

x(t, k^(j) + λ Δk^(j+1)) = x^(j)(t) + λ G(t) Δk^(j+1) + λ² r(t)    (17)

where the n-dimensional vector r(t) is not a function of λ. The vector r(t) is easily computed from the integration of the state equation already performed at k^(j) + λ_a Δk^(j+1); i.e.

r(t) = (1/λ_a²) [x(t, k^(j) + λ_a Δk^(j+1)) − x^(j)(t) − λ_a G(t) Δk^(j+1)]    (18)

Thus, having r(t_i), i = 1, 2, ..., N, the performance index becomes

S = Σ_{i=1}^{N} [ŷ(t_i) − Cx^(j)(t_i) − λ CG(t_i) Δk^(j+1) − λ² Cr(t_i)]^T Q(t_i) [ŷ(t_i) − Cx^(j)(t_i) − λ CG(t_i) Δk^(j+1) − λ² Cr(t_i)]    (19)

Subsequent use of the stationary condition ∂S/∂λ = 0 yields

a₃λ³ + a₂λ² + a₁λ + a₀ = 0    (20)

where

a₃ = 2 Σ_{i=1}^{N} r^T(t_i) C^T Q(t_i) C r(t_i)

a₂ = 3 Σ_{i=1}^{N} r^T(t_i) C^T Q(t_i) C G(t_i) Δk^(j+1)

a₁ = Δk^(j+1)T [Σ_{i=1}^{N} G^T(t_i) C^T Q(t_i) C G(t_i)] Δk^(j+1) − 2 Σ_{i=1}^{N} r^T(t_i) C^T Q(t_i) [ŷ(t_i) − Cx^(j)(t_i)]

a₀ = −Δk^(j+1)T Σ_{i=1}^{N} G^T(t_i) C^T Q(t_i) [ŷ(t_i) − Cx^(j)(t_i)]    (21)
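The cubic of eq 20 is readily solved by Newton's method started from λ_a; a sketch with purely illustrative coefficients (not derived from a model) follows.

```python
# Newton iteration on a3*l^3 + a2*l^2 + a1*l + a0 = 0 (eq 20), started
# from the acceptable step lambda_a. The coefficients would come from
# eq 21; the values below are illustrative only.
def lambda_opt(a3, a2, a1, a0, lam_a, iters=10):
    lam = lam_a
    for _ in range(iters):
        f = ((a3 * lam + a2) * lam + a1) * lam + a0      # cubic value
        df = (3 * a3 * lam + 2 * a2) * lam + a1          # its derivative
        lam -= f / df
    return lam

# This cubic has its only real root at lambda = 0.25.
lam = lambda_opt(8.0, 4.0, 2.0, -0.875, lam_a=0.5)
print(round(lam, 6))
```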

Solution of eq 20 thus yields the optimum value of λ, namely λ_opt. The solution, for example, is obtained readily by Newton's method within 4-5 iterations using λ_a as a starting value. By this procedure only one integration of the state equation is required to obtain λ_opt.

Stability during Numerical Integration

The importance of using a good integration routine should also be emphasized. When Δk^(j+1) is excessively large, numerical instability during the determination of λ_a may cause computer overflow very fast, even before we have a chance to compare the performance indices. In this case, the use of a good integration routine is of great importance: it provides a message indicating that the tolerance requirements cannot be met, so that actual computer overflow is avoided. We then stop the integration, simply halve λ, and start the integration from the beginning again. Several restarts may be necessary before λ_a is obtained.

The Proposed Algorithm

At this point it is worthwhile to summarize the whole algorithm, which can be seen as a two-step procedure. In the first step we determine the most appropriate section of data to be used for parameter estimation by computing the information index of each parameter. Next, we generate artificial data by data smoothing and interpolation in the desired section of data, if measurements are not available there. In principle, the best section of data should be determined at each iteration, since the information index provides the best section only for the current parameter values. Thus, it is expected that during the course of the iterations different sections of data will be judged most appropriate. However, it is also expected that during the iterations the most appropriate sections of data will lie somewhere in between


the range of the given data points and the section determined by the initial parameter estimates. It is therefore recommended that the selected section of data obtained for the initial parameter estimates be combined with the section of the given measurements, and that the entire section be used for all iterations, except in the last iteration. At the last iteration all artificial data are dropped and only the given data are used.

For the parameter estimation we propose to use the Gauss-Newton method with the optimal step-size policy discussed earlier. Thus, the steps to be followed are: (A1) Select an initial guess k^(0). (A2) Integrate eq 1 and 7 to obtain G(t), from which compute s_j(t), j = 1, 2, ..., p, using eq 14, as functions of time. (A3) Plot the information index of each parameter s_j(t) vs. time (preferably on a log scale) to determine the time interval where the information indices are excited and become large in magnitude. (A4) If observations are not available in this time interval, generate artificial data by data smoothing and interpolation. (B1) Having the appropriate section of data and k^(0), set the iteration index j = 0. (B2) Integrate eq 1 and 7 to obtain x^(j)(t) and G(t). (B3) Obtain Δk^(j+1) from eq 10. (B4) Using the stepping procedure mentioned previously, determine λ_opt from eq 20 and then compute k^(j+1) from eq 15. (B5) If ||Δk^(j+1)|| is less than a preset criterion, terminate the iteration; otherwise continue with step (B2) after incrementing j by one.

Modifications for Stiff Systems

The proposed procedure is especially useful when the system is described by a set of stiff differential equations, often encountered in engineering processes (Byrne et al., 1977). In this case the wide range of the prevailing time constants creates additional numerical difficulties which tend to shrink even further the region of convergence. To avoid numerical difficulties in the integration of eq 1 and 7, some modifications to the Gauss-Newton method are necessary. It should be noted that if eq 1 is stiff, then eq 7 is also stiff, since both equations have the same jacobian. As has been pointed out (Edsberg, 1976), the scaling of the problem is very crucial for the numerical solver to behave at its best. Thus, it is preferable to scale eq 1 according to the order of magnitude of the given measurements, whenever possible. In addition, to normalize the sensitivity coefficients, we introduce the reduced sensitivity coefficients θ_ij(t) = (∂x_i/∂k_j)k_j, k_j ≠ 0; i.e.

Θ(t) = G(t)K    (22)

where K = diag(k₁, k₂, ..., k_p). By this transformation eq 7 becomes

dΘ(t)/dt = (∂f^T/∂x)^T Θ(t) + (∂f^T/∂k)^T K    (23)

with Θ(t₀) = 0.
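A numerical illustration of why this scaling helps: when the parameters differ by several orders of magnitude, A = G^T G is badly conditioned, while the matrix built from Θ = GK is not. The sensitivity matrix below is random, and the assumed 1/k_j scaling of ∂x/∂k_j is only a device to produce the imbalance the text describes.

```python
import numpy as np

k = np.array([1.0e4, 1.0e-2])          # widely separated parameter values
K = np.diag(k)
rng = np.random.default_rng(0)
G = rng.standard_normal((50, 2)) / k   # assume dx/dk_j ~ 1/k_j in size
A = G.T @ G                            # matrix of eq 10 (C = I, Q = I)
A_tilde = K @ A @ K                    # matrix built from Theta = G*K (eq 22)
print(np.linalg.cond(A), np.linalg.cond(A_tilde))
```

The condition number drops by many orders of magnitude, which is the reduction referred to in the text.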

With the reduced sensitivity coefficients, eq 9 becomes

[Σ_{i=1}^{N} Θ^T(t_i) C^T Q(t_i) C Θ(t_i)] K⁻¹ Δk^(j+1) = Σ_{i=1}^{N} Θ^T(t_i) C^T Q(t_i) [ŷ(t_i) − Cx^(j)(t_i)]    (24)

which is of the form

Ã Δk̃^(j+1) = b̃    (25)

where

Δk̃^(j+1) = K⁻¹ Δk^(j+1)    (26)

This modification can also be viewed as a transformation of variables which results in a reduction of the condition

Figure 1. Given measurements for example 1: (●) y₁; (○) y₂; (—) interpolation curve.

number of A. Therefore, the use of the reduced sensitivity coefficients will also be beneficial for nonstiff systems.

When the system is stiff and observations are unavailable during the fast transients of some variables, generation of artificial data by interpolation at times close to the origin may be difficult. This difficulty can be overcome, however, by redefining the initial state. Instead of restricting the use of data close to the given initial state, the data can be shifted to allow any of the early data points to constitute the initial state, where the fast transients have died out (Kalogerakis and Luus, 1980). At the shifted origin, generation of artificial data by interpolation is considerably easier. This is illustrated in examples 3 and 4. To illustrate the usefulness of the information index and the proposed optimal step-size policy, we next consider four examples depicting typical chemical engineering systems.

Numerical Results

Example 1. We consider the pyrolytic dehydrogenation of benzene to diphenyl and triphenyl studied by Hougen and Watson (1948) and used for parameter estimation by Seinfeld and Gavalas (1970), Kalogerakis and Luus (1980), and Glowinski and Stocki (1981). The system is described by the two-parameter model

dx₁/dt = −r₁ − r₂    x₁(0) = 1

dx₂/dt = r₁/2 − r₂    x₂(0) = 0

with

r₁ = k₁(x₁² − x₂(2 − 2x₁ − x₂)/0.726)

r₂ = k₂(x₁x₂ − (1 − x₁ − 2x₂)(2 − 2x₁ − x₂)/3.852)    (27)

where x₁ and x₂ are the gram-moles of benzene and diphenyl per gram-mole of pure benzene feed. The parameters k₁ and k₂ are to be determined from the eight measurements of x₁ and x₂, which are given as a function of the residence time t and are shown in Figure 1. The weighting matrix Q = diag(1, 10) was used here. The simplicity of the model allows an easy representation of the region of convergence in the k₁-k₂ plane.
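A Gauss-Newton iteration on the model of eq 27 can be sketched as follows. Since Figure 1's measurements are not reproduced here, the data are synthetic (simulated at k = (355.4, 403.3) on an assumed time grid), sensitivities are obtained by finite differences rather than from eq 7, and Q = I rather than diag(1, 10); only the structure of the iteration follows the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, x, k1, k2):
    # Benzene dehydrogenation model of eq 27
    x1, x2 = x
    r1 = k1 * (x1**2 - x2 * (2 - 2*x1 - x2) / 0.726)
    r2 = k2 * (x1 * x2 - (1 - x1 - 2*x2) * (2 - 2*x1 - x2) / 3.852)
    return [-r1 - r2, r1 / 2 - r2]

def model(k, times):
    sol = solve_ivp(rhs, (0.0, times[-1]), [1.0, 0.0], args=tuple(k),
                    t_eval=times, method="LSODA", rtol=1e-9, atol=1e-12)
    return sol.y.T if sol.success else None     # (N, 2) array of x1, x2

times = np.linspace(5.63e-4, 5.63e-3, 8)        # assumed measurement times
k_true = np.array([355.4, 403.3])
data = model(k_true, times)                     # synthetic "measurements"

def S(kv):
    y = model(kv, times)
    if y is None:
        return np.inf                           # failed integration: bad step
    r = (data - y).ravel()
    return float(r @ r)

k = np.array([550.0, 550.0])                    # initial guess, as in Table I
for _ in range(15):
    y = model(k, times)
    J = np.empty((data.size, 2))                # finite-difference sensitivities
    for j in range(2):
        kp = k.copy(); kp[j] *= 1 + 1e-6
        J[:, j] = ((model(kp, times) - y) / (k[j] * 1e-6)).ravel()
    dk = np.linalg.solve(J.T @ J, J.T @ (data - y).ravel())   # eq 10, Q = I
    lam = 1.0                                   # step halving, as in eq 16
    while S(k + lam * dk) >= S(k) and lam > 1e-6:
        lam *= 0.5
    k = k + lam * dk
print(k)
```

With noise-free synthetic data the iterates settle at the generating parameters, mirroring the rapid convergence reported in Table I.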

To establish a base for comparison, we first determine the region of convergence for the standard Gauss-Newton method using a fourth-order Runge-Kutta routine (with fixed step for integration). The region is shown in Figure 2 by curve a. The Gauss-Newton method converges very rapidly to the optimum from any initial guess which falls within this region. A typical run is shown in Table I, where the optimum k* = (355.4, 403.3)^T is reached in four iterations.

By using the proposed step size policy, the region of convergence is enlarged as shown in Figure 2 by curve b,


Figure 2. Region of convergence of the Gauss-Newton method for example 1: (a) standard Gauss-Newton method (Runge-Kutta for integration); (b) with optimal step size (Runge-Kutta for integration); (c) with optimal step size (DVERK for integration); (d) with optimal step size (DGEAR for integration); (e) with optimal step size and use of information index (DVERK for integration). Test points for the Gauss-Newton method with optimal step size and use of information index are denoted by + (DGEAR for integration).

Table I. Convergence of the Gauss-Newton Method for Example 1

iteration    k₁       k₂       performance index
0            550.0    550.0    0.43017 × 10⁻¹
1            298.9    413.8    0.11596 × 10⁻¹
2            349.7    406.1    0.14118 ×
3            355.3    403.3    0.30806 ×
4            355.4    403.3    0.30788 ×

where the same integration procedure was used. It is clear that the region of convergence has been more than doubled in size.

The region of convergence can be further enlarged by using better integration procedures. Although there is a large number of good differential equation solvers available, we selected two typical ones, the IMSL routines DVERK (J. H. Verner's Runge-Kutta formulas of fifth and sixth order), designed for nonstiff differential equations, and DGEAR (Gear's method, revision 3, analytical jacobian supplied), designed for stiff systems (IMSL reference manual, 1980). The expansion of the region of convergence is shown in Figure 2 by curves c and d for DVERK and DGEAR, respectively. Although this example is a nonstiff system, the use of a stiff differential equation solver increases the region of convergence quite substantially. This result is not surprising, since the eigenvalues of the jacobian of the system depend on the parameters, which change during the course of the iterations; thus, the equations could become stiff at some point. By using the proposed step-size policy and DGEAR for integration, the region of convergence is increased by a factor of 100.
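DVERK and DGEAR are IMSL routines and are not reproduced here; as a rough modern stand-in for the comparison, SciPy's nonstiff RK45 and Gear-type BDF solvers can be run on a classically stiff test equation:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Stiff linear test problem: a fast transient onto the slow solution
# x(t) ~ cos(t). A nonstiff Runge-Kutta solver is stability-limited to
# tiny steps, while a BDF (Gear-type) solver is not.
rhs = lambda t, x: [-1000.0 * (x[0] - np.cos(t))]
kw = dict(rtol=1e-6, atol=1e-9)
rk = solve_ivp(rhs, (0.0, 10.0), [0.0], method="RK45", **kw)
bdf = solve_ivp(rhs, (0.0, 10.0), [0.0], method="BDF", **kw)
print(rk.nfev, bdf.nfev)
```

The function-evaluation counts show the nonstiff solver doing far more work, consistent with the computation times in Table II.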

Next, let us assume that as an initial guess we have k^(0) = (355400, 403300)^T, which is three orders of magnitude away from the optimum. Obviously, the Gauss-Newton method fails to converge whether the step-size policy is used or not. We can readily detect the source of difficulties by computing the information index for each parameter as a function of time, shown in Figure 3, and observing that most of the sensitivity information is available in the interval [10⁻⁸, 10⁻⁶], which falls outside the range of the given measurements. In the range of the given data points the


Figure 3. Normalized information index of k₁ and k₂ vs. time, to determine the best section of data to be used by the Gauss-Newton method, for example 1 (s₁,max = 0.0884; s₂,max = 0.0123).

Table II. Comparison of Computation Time Requirements by DVERK and DGEAR for Parameter Estimation in Example 1

                                      comput. time, s on IBM 3033
init. guess k^(0)        iterations      DVERK      DGEAR
(3554, 4033)             6               2.16       2.28
(35540, 40330)           8               4.08       2.76
(355400, 403300)         9               20.64      3.00

information indices are practically zero, which shows that for the assumed parameter values no sensitivity informa- tion is available by using only the given measurements.

If, however, we generate artificial data by interpolation in the interval [10⁻⁸, 5.63 × 10⁻⁴], with 10 data points per cycle (46 points in total), subsequent use of the Gauss-Newton method with the proposed step-size policy brings the parameters to k = (355.4, 403.8)^T in nine iterations. The significant expansion of the region of convergence through the introduction of the information index is shown in Figure 2 by curve e. DVERK was used here for integration.
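Generating artificial data on a log-spaced grid between the initial state and the early measurements can be sketched as follows; all times and values are made-up placeholders (the actual data are in Figure 1), and a small positive time stands in for the origin since log 0 is undefined.

```python
import numpy as np

# Anchor the interpolant at the known initial state and at the first
# few given measurements, then sample on a log-spaced grid with about
# 10 points per decade, interpolating linearly in log(t). A smoothing
# spline through noisy data would serve the same purpose.
t_known = np.array([1.0e-8, 5.63e-4, 1.13e-3, 2.26e-3])  # assumed times
y_known = np.array([1.00, 0.95, 0.90, 0.83])             # assumed values
n = int(10 * np.log10(t_known[-1] / t_known[0]))         # ~10 per decade
t_art = np.logspace(np.log10(t_known[0]), np.log10(t_known[-1]), n)
y_art = np.interp(np.log10(t_art), np.log10(t_known), y_known)
print(t_art.size)
```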

At this point the integration routine becomes the limiting factor. If instead of DVERK, DGEAR is used for integration, the region of convergence is further expanded. To illustrate, we used 16 different starting points beyond curve e, indicated by a + in Figure 2. At most, 12 iterations were required to converge to the optimum from all these points.

We thus see that the use of the information index, combined with the proposed step size policy and an appropriate integration routine, results in a very substantial expansion of the region of convergence for the Gauss-Newton method.

It should be noted that the use of DGEAR for integration does not require extra programming effort, since an analytical expression for the Jacobian (∂f/∂x) is also required by the parameter estimation method. In addition to the expanded region of convergence, the use of DGEAR for integration yields savings in computation time as well, as shown in Table II for some typical starting parameter values.
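A minimal sketch of one Gauss-Newton pass along these lines, under stated simplifications: the toy consecutive-reaction model is assumed, the sensitivity matrix is formed by finite differences instead of the paper's analytically integrated sensitivity equations, and the optimal step-size policy is omitted (a full step is taken).

```python
# Sketch: one Gauss-Newton step for ODE parameter estimation, with
# finite-difference sensitivities and no step-size policy (simplifications).
import numpy as np
from scipy.integrate import solve_ivp

def model(t_out, k):
    # assumed consecutive reactions A -> B -> C; outputs y = (A, B)
    def rhs(t, x):
        return [-k[0] * x[0], k[0] * x[0] - k[1] * x[1]]
    sol = solve_ivp(rhs, (0.0, t_out[-1]), [1.0, 0.0],
                    t_eval=t_out, method="BDF", rtol=1e-8, atol=1e-10)
    return sol.y.T                         # (n_times, n_outputs)

def gauss_newton_step(k, t_out, y_meas, h=1e-6):
    y0 = model(t_out, k)
    r = (y_meas - y0).ravel()              # residual vector
    G = np.empty((r.size, k.size))         # sensitivities dy/dk_j
    for j in range(k.size):
        kp = k.copy()
        kp[j] *= 1.0 + h
        G[:, j] = ((model(t_out, kp) - y0) / (k[j] * h)).ravel()
    A = G.T @ G                            # normal-equations matrix (Q = I)
    dk = np.linalg.solve(A, G.T @ r)
    return k + dk, np.linalg.cond(A)       # monitoring cond(A) flags trouble

k_true = np.array([3.0, 1.0])
t_out = np.linspace(0.1, 2.0, 10)
y_meas = model(t_out, k_true)              # noise-free synthetic data
k = np.array([2.0, 1.5])                   # deliberately poor initial guess
for _ in range(10):
    k, condA = gauss_newton_step(k, t_out, y_meas)
print(k, condA)
```

The condition number of A returned at each step is exactly the diagnostic the paper monitors when deciding whether the chosen section of data is adequate.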

For the sake of comparison, the information indices are computed versus time when the parameters have their optimum values. As shown in Figure 4, in terms of experimental design the given measurements have been taken at proper times, although some extra information could have been gained by having a few extra measurements in the interval [10⁻⁴, 5.63 × 10⁻⁴]. As indicated by the information indices, data points in the interval [10⁻⁸, 10⁻⁴] should not contribute significantly to the accuracy of the best estimates. This is shown in Table III, where


Ind. Eng. Chem. Fundam., Vol. 22, No. 4, 1983 441


Figure 4. Normalized information index of k1 and k2 (at the optimum) vs. time, to determine whether measurements have been collected at proper times, for example 1 (s1,max = 0.0885; s2,max = 0.0123).

Table III. Standard Deviation of Best Estimates vs. the Section of Data Used for Example 1

                                no. of            std dev, %
section of data               data points       k1         k2
[5.63 × 10⁻⁴, 1.697 × 10⁻¹]        8           0.30       0.78
[10⁻⁴, 1.697 × 10⁻¹]              14           0.25       0.74
[10⁻⁸, 1.697 × 10⁻¹]              54           0.24       0.74

the decrease in the standard deviation of each parameter is quite small when 40 additional data points are used in the interval [10⁻⁸, 10⁻⁴]. Therefore, the information indices can also be used to check whether measurements of the output have been taken at proper times.
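Standard deviations of the kind reported in Table III are commonly obtained from the linearized covariance of the estimates, cov(k) ≈ σ²(GᵀQG)⁻¹; the sketch below assumes that form, with a made-up sensitivity matrix G, an assumed noise variance, and Q = I.

```python
# Sketch: percentage standard deviations of parameter estimates from the
# linearized covariance; G, sigma, and k_hat below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(54, 2))            # hypothetical sensitivity matrix
sigma2 = 0.05 ** 2                      # assumed measurement noise variance
k_hat = np.array([355.4, 403.3])        # example 1's best estimates

cov = sigma2 * np.linalg.inv(G.T @ G)   # cov(k), with Q = I assumed
std_pct = 100.0 * np.sqrt(np.diag(cov)) / np.abs(k_hat)
print(std_pct)
```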

Example 2. Next, we choose the model of an isothermal CSTR with complex reactions used by Lapidus and Luus (1967) and Rao and Luus (1972) for optimal control studies. The governing equations are

dx1/dt = k5 - qx1 - k1x1x2 - k4x1x6^(1/2);   x1(0) = 0.1883
dx2/dt = 7.0 - qx2 - k1x1x2 - 2k2x2x3;   x2(0) = 0.2507
dx3/dt = 1.75 - qx3 - k2x2x3;   x3(0) = 0.0467
dx4/dt = -qx4 + 2k1x1x2 - k3x4x5;   x4(0) = 0.0899
dx5/dt = -qx5 + 3k2x2x3 - k3x4x5;   x5(0) = 0.1804
dx6/dt = -qx6 + 2k3x4x5 - k4x1x6^(1/2);   x6(0) = 0.1394
dx7/dt = -qx7 + 2k4x1x6^(1/2);   x7(0) = 0.1046
q = 8.75   (28)

The output vector is related to the state variables through eq 3, where

C = [1 0 0 0 0 0 0
     0 0 0 1 0 0 0
     0 0 0 0 1 0 0
     0 0 0 0 0 1 0]   (29)

which means that four state variables, x1, x4, x5, and x6, are observed. The problem is to determine the five unknown parameters given the measurements ŷ(ti), ti = 0.01i


Figure 5. Simulated measurements for example 2: (○) y1; (□) y2; (△) y3; (×) y4; (—) interpolation curve.

(i = 1, 2, ..., 20). The values of the parameters are assumed to be k1 = 17.6, k2 = 73.0, k3 = 51.3, k4 = 23.0, and k5 = 6.0. The data are generated from

ŷ(ti) = [I + σRi]Cx*(ti)   (i = 1, 2, ..., 20)   (30)

where Ri are 4 × 4 diagonal matrices of random numbers uniformly distributed from -1 to 1, σ is the maximum noise level, and x*(t) is the exact solution of eq 28, obtained using the given parameter values. A maximum noise level of 0.05 was used here, and the simulated data points are shown in Figure 5. The identity matrix was used here as the weighting matrix Q.

Since the state variables are the weight fractions of the reacting species, the seventh differential equation in eq 28 is redundant, since x7 can be calculated from the other mass fractions. As a result, the total number of differential equations to be integrated per iteration is reduced from 42 to 36.

To obtain a good indication of the size of the region of convergence, we may examine specific directions in the parameter space. Thus, let us consider the initial guesses generated by

k⁽ʲ⁾ = [I + αEj]k*   (j = 1, 2, ..., 6)   (31)

where E1 = diag(1,1,1,1,1), E2 = diag(1,.1,.1,.1,.1), E3 = diag(.1,1,.1,.1,.1), E4 = diag(.1,.1,1,.1,.1), E5 = diag(.1,.1,.1,1,.1), and E6 = diag(.1,.1,.1,.1,1). The maximum value of α for which convergence is obtained by the Gauss-Newton method therefore shows the size of the region of convergence along the chosen direction.

First we determine the size of the region of convergence for the standard Gauss-Newton method with DVERK used for integration. As seen in Table IV, the region is very small, and it can be enlarged somewhat when the proposed step size policy is used. Again, if instead of DVERK the stiff differential equation solver DGEAR is used, the region of convergence is further enlarged, as shown in Table IV.

Suppose now that as an initial guess we have k⁽⁰⁾ = [0.176 × 10⁶, 0.731 × 10⁵, 0.513 × 10⁵, 0.230 × 10⁵, 0.601 × 10⁴]ᵀ, which corresponds to α = 10000 along the direction (1,.1,.1,.1,.1). In this case the condition number of matrix A is excessively large, of the order of 10¹⁹, and the Gauss-Newton method fails to produce a new estimate for the parameter vector in the first iteration. However, we can easily overcome this problem by the proposed
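The generation of initial guesses by eq 31, k⁽ʲ⁾ = [I + αEj]k*, can be sketched directly; the parameter values are those of example 2, while the variable names are our own.

```python
# Sketch of eq 31: initial guesses along six directions in parameter space,
# used to probe the region of convergence.
import numpy as np

k_star = np.array([17.6, 73.0, 51.3, 23.0, 6.0])   # example 2 parameters
directions = [
    (1, 1, 1, 1, 1),
    (1, .1, .1, .1, .1),
    (.1, 1, .1, .1, .1),
    (.1, .1, 1, .1, .1),
    (.1, .1, .1, 1, .1),
    (.1, .1, .1, .1, 1),
]
alpha = 10000.0
guesses = [(np.eye(5) + alpha * np.diag(d)) @ k_star for d in directions]
# direction (1,.1,.1,.1,.1) reproduces the initial guess quoted in the text
print(guesses[1])
```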

Table IV. Region of Convergence of the Gauss-Newton Method along 6 Different Directions, for Example 2

                                          maximum α for convergence                           integration
direction                  (1,1,1,1,1) (1,.1,.1,.1,.1) (.1,1,.1,.1,.1) (.1,.1,1,.1,.1) (.1,.1,.1,1,.1) (.1,.1,.1,.1,1)   routine
standard Gauss-Newton            3           10              10              10               2             1.5           DVERK
Gauss-Newton with
  optimal step size              4           25              30              35               5              3            DVERK
Gauss-Newton with
  optimal step size             15          160             290             310              70             10            DGEAR




Figure 6. Normalized information index of k1, ..., k5 vs. time, to determine the best section of data to be used by the Gauss-Newton method, for example 2 (s1,max = 0.0135, s2,max = 0.0014, s3,max = 0.0113, s4,max = 0.0072, s5,max = 0.2454).


Figure 7. Normalized information index of k1, ..., k5 (at the optimum) vs. time, to determine whether measurements have been taken at proper times, for example 2 (s1,max = 0.0137, s2,max = 0.0055, s3,max = 0.0039, s4,max = 0.00098, s5,max = 0.0813).

procedure as follows. We first compute the information indices, sj(t), j = 1, 2, ..., 5, with respect to time. Next, from the plot of sj(t) vs. log(t) in Figure 6, we can see that the time interval where the information indices are excited is [10⁻⁶, 10⁻²], which is outside the range of the given data points, [0.01, 0.20]. Next, we generate artificial data by interpolation in the interval [10⁻⁶, 0.01], 10 data points per cycle, 40 points in total. The smoothness of the data, shown in Figure 5, indicates that such an interpolation poses no difficulty. Subsequent use of the Gauss-Newton method with the proposed step size policy and with DGEAR for integration brings the parameters to k = [17.59, 73.01, 51.29, 23.01, 6.00]ᵀ in 15 iterations.
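This excerpt does not reproduce the defining equation of the information index, so the sketch below uses a plausible sensitivity-based surrogate, s_j(t) = k_j² g_j(t)ᵀQ g_j(t) with g_j = ∂y/∂k_j and Q = I, normalized by its maximum over time; the two-parameter toy model and the finite-difference sensitivities are our assumptions.

```python
# Sketch: a normalized sensitivity-based information index (surrogate form;
# the paper's defining equation is not reproduced in this excerpt).
import numpy as np
from scipy.integrate import solve_ivp

k = np.array([17.6, 73.0])             # two parameters, toy case

def output(t_grid, k):
    def rhs(t, x):
        return [-k[0] * x[0], k[0] * x[0] - k[1] * x[1]]
    sol = solve_ivp(rhs, (0.0, t_grid[-1]), [1.0, 0.0],
                    t_eval=t_grid, method="BDF", rtol=1e-8, atol=1e-10)
    return sol.y.T

t_grid = np.logspace(-6, 0, 61)        # log-spaced, to expose fast transients
y0 = output(t_grid, k)
s = np.empty((t_grid.size, k.size))
h = 1e-6
for j in range(k.size):
    kp = k.copy()
    kp[j] *= 1.0 + h
    g = (output(t_grid, kp) - y0) / (k[j] * h)   # dy/dk_j, finite differences
    s[:, j] = k[j] ** 2 * np.sum(g * g, axis=1)  # Q = I
s_norm = s / s.max(axis=0)             # normalize each index by its maximum
# data are most informative where s_norm is excited (close to 1)
```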

It is noted that the condition number of matrix A was reduced from 10¹⁹ to 1.74 × 10² during the first iteration. This shows quite clearly that the ill-conditioning of the linear system can be significantly improved by a proper choice of data points in the proper place, so that most of the available sensitivity information is utilized. Similarly, the proposed procedure yielded convergence to the optimum from all other directions with α = 10000.

For comparison purposes, the information indices versus time are shown in Figure 7 when the parameters have their optimum values. It is seen that in terms of experimental design the given data points are taken at proper times. Although the generated data are important during the early iterations, they should not contribute significantly to the accuracy of the final estimates, as indicated by the information indices. This is again shown in Table V, where the decrease in the standard deviation of each parameter is quite small when 40 additional data points are used in the interval [10⁻⁶, 0.01].

Example 3. Next, we consider the reversible one-enzyme system used by Garfinkel et al. (1966) and Enright

Table V. Standard Deviation of Best Estimates vs. the Section of Data Used, for Example 2

                      no. of                 std dev, %
section of data     data points     k1     k2     k3     k4     k5
[10⁻⁶, .20]             60         5.10   7.85   6.55   17.1   2.65
[.01, .20]              20         5.55   8.65   7.00   18.5   2.90

ok" I 2 3 4 6 8 IO 20 30 40 60 80 DO TIME

Figure 8. Simulated measurements for example 3: (○) y1; (□) y2; (—) interpolation curve.

and Hull (1976) as a test problem for stiff differential equation solvers.

A1 + A2 ⇌ A3 ⇌ A1 + A4   (32)

with rate constants k1 and k2 for the forward and reverse steps of the first reaction, and k3 and k4 for those of the second. The governing equations are

dx1/dt = -k1x1x2 + (k2 + k3)x3 - k4x1x4;   x1(0) = 3.365 × 10⁻⁷
dx2/dt = -k1x1x2 + k2x3;   x2(0) = 8.261 × 10⁻⁴
dx3/dt = k1x1x2 - (k2 + k3)x3 + k4x1x4;   x3(0) = 9.380 × 10⁻⁶
dx4/dt = -k4x1x4 + k3x3;   x4(0) = 1.642 × 10⁻³   (33)

The parameter values are assumed to be k1 = 3.0 × 10⁵, k2 = 2.0 × 10¹, k3 = 1.0 × 10², and k4 = 9.0 × 10⁵. The observation matrix is

C = [1 0 0 0
     0 1 0 0]   (34)

Two state variables, x1 and x2, are observed. Simulated data, generated at 21 unequally spaced points in the interval [1., 100.] from

ŷ(ti) = [I + σRi]Cx*(ti)   (i = 1, 2, ..., 21)   (35)

with σ = 0.01, are shown in Figure 8. The weighting matrix Q = diag(10⁸, 1) was used here to compensate for the difference in the magnitudes of y1 and y2.

For the above system it is readily seen that there are two invariants, i.e., x1(t) + x3(t) = constant and x2(t) - x1(t) + x4(t) = constant. Therefore, two of the four differential equations reduce to algebraic ones, and only 10 differential equations have to be integrated per iteration.

As in example 2, we compute the size of the region of convergence for the standard Gauss-Newton method along five different directions. DGEAR was used for integration throughout this example. As shown in Table VI, the region is very small, but it can be enlarged substantially with the proposed step-size policy.

Let us assume now for an initial guess k⁽⁰⁾ = [3.0 × 10¹⁰, 2.0 × 10⁷, 1.0 × 10⁷, 9.0 × 10¹⁰]ᵀ, which corresponds to α = 10⁶ along the direction (.1,1,.1,.1). Direct use of the Gauss-Newton method fails in the first iteration, as the condition number of matrix A in eq 25 is excessively large, of the order of 10¹⁸. Integrating eq 1 and 20, we compute
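The reduction from four differential equations to two by means of the invariants can be sketched as follows; the rate constants and initial conditions follow the text above (the exponents of the initial concentrations are uncertain in this copy), and scipy's BDF method stands in for DGEAR.

```python
# Sketch: integrate only x1 and x2 of eq 33, recovering x3 and x4 from the
# two invariants; initial-condition exponents are OCR-uncertain assumptions.
import numpy as np
from scipy.integrate import solve_ivp

k1, k2, k3, k4 = 3.0e5, 2.0e1, 1.0e2, 9.0e5
x0 = np.array([3.365e-7, 8.261e-4, 9.380e-6, 1.642e-3])
c1 = x0[0] + x0[2]              # invariant: x1 + x3 = constant
c2 = x0[1] - x0[0] + x0[3]      # invariant: x2 - x1 + x4 = constant

def rhs(t, z):
    x1, x2 = z
    x3 = c1 - x1                # algebraic recovery via the invariants
    x4 = c2 - x2 + x1
    dx1 = -k1 * x1 * x2 + (k2 + k3) * x3 - k4 * x1 * x4
    dx2 = -k1 * x1 * x2 + k2 * x3
    return [dx1, dx2]

sol = solve_ivp(rhs, (0.0, 100.0), x0[:2], method="BDF",
                rtol=1e-8, atol=1e-14)
print(sol.success, sol.y[:, -1])
```

With sensitivities for two states and four parameters, this reduction is what brings the count to 10 integrated equations per iteration.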




Table VI. Region of Convergence of the Gauss-Newton Method along 5 Different Directions, for Example 3

                                   maximum α for convergence
direction               (1,1,1,1) (1,.1,.1,.1) (.1,1,.1,.1) (.1,.1,1,.1) (.1,.1,.1,1)
standard Gauss-Newton      1.5        1.0          2.0          2.5          0.5
Gauss-Newton with
  optimal step size         90        110           40          190           30


Figure 9. Normalized information index of k1, ..., k4 vs. time, to determine the best section of data to be used by the Gauss-Newton method, for example 3. Initial state at t = 0.0 (s1,max = 33.08, s2,max = 33.08, s3,max = 4.496, s4,max = 5.645).


Figure 10. Normalized information index of k1, ..., k4 vs. time, to determine the best section of data to be used by the Gauss-Newton method, for example 3. Initial state at t = 5.0 (s1,max = 33.18, s2,max = 35.59, s3,max = 2.957, s4,max = 10.03).

the information indices, sj(t), j = 1, 2, 3, 4, as a function of time. Plotting sj(t) vs. log(t), as shown in Figure 9, we see that most of the sensitivity information is available in the time interval [10⁻⁹, 10⁻¹], which falls outside the range of the time interval for the given data points, [1., 100.].

Here we have the additional problem that we cannot reliably generate artificial data in the desired interval because of the fast transients near the origin. However, observing the given data points, shown in Figure 8, we see that interpolation does not create any difficulties if we consider as the initial state the data point given at t = 5.0. Recomputing the information indices with the data at t = 5.0 as the initial state, shown in Figure 10, we determine the interval where artificial data should be generated, namely [5.0 × 10⁻⁹, 6.].

Generating 10 data points per cycle, 90 in total, and using the Gauss-Newton method with the proposed step-size policy and t0 = 5.0, we converge to k = [2.98 × 10⁵, 1.99 × 10¹, 9.98 × 10¹, 9.00 × 10⁵]ᵀ. By introducing the artificial data, the condition number of matrix A was reduced from 10¹⁸ to 2.9 × 10³ in the first iteration.

Again, for comparison purposes, we computed the information indices when the parameters have their optimum values. As shown in Figure 11, the sensitivity information related to the fast transients cannot be captured


Figure 11. Normalized information index of k1, ..., k4 (at the optimum) vs. time, to determine whether measurements have been taken at proper times, for example 3 (s1,max = 5.924, s2,max = 5.785, s3,max = 7.387, s4,max = 6.560).

Table VII. Standard Deviation of Best Estimates vs. the Section of Data Used, for Example 3

                   no. of              std dev, %
section of data  data points     k1     k2     k3     k4
[10⁻⁵, 10²]          71         0.76   1.21   0.93   1.29
[1, 10²]             21         1.00   1.51   1.61   2.03

by the given measurements. However, since the sensitivities of all four parameters are excited by the slow transients as well, it appears that good estimates for the parameters can be obtained without measuring the fast transients. This is shown in Table VII, where the decrease in the standard deviation of each parameter is rather small when 50 additional data points are used in the interval [10⁻⁵, 1]. A situation where such measurements would be necessary can be detected from such a graph if the information index of one or more parameters is excited only by the fast transients.

Example 4. As a fourth example, the reaction rate equations introduced by Robertson (1967) are considered. This system has been widely used for the evaluation of stiff differential equation solvers (Weimer and Clough, 1979; Villadsen and Michelsen, 1978; Enright, 1976; Edsberg, 1976; Hindmarsh and Byrne, 1976). The reaction system, involving three chemical species, is described by the two differential equations

dx1/dt = -k1x1 + k2x2(1 - x1 - x2);   x1(0) = 1
dx2/dt = k1x1 - k2x2(1 - x1 - x2) - k3x2²;   x2(0) = 0   (36)

where the third concentration variable, x3, has been eliminated by the use of the invariant x1(t) + x2(t) + x3(t) = 1. The values for the rate constants k1, k2, and k3 are taken to be 0.04, 1.0 × 10⁴, and 3.0 × 10⁷, respectively. Simulated data were generated in the interval [1., 100.] from

ŷ(ti) = [I + σRi]x*(ti)   (i = 1, 2, ..., 21)   (37)

with a maximum noise level σ = 0.01 and are shown in Figure 12. The weighting matrix Q = diag(1, 10⁵) was used here to compensate for the difference in the magnitudes of y1 and y2.

In Table VIII it is seen that the size of the region of convergence along four different directions for the
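The Robertson system of eq 36, with x3 eliminated through the invariant, can be integrated with a stiff method; scipy's BDF is used here as a stand-in for DGEAR.

```python
# Sketch: Robertson kinetics of eq 36 (x3 eliminated via x1 + x2 + x3 = 1),
# integrated by a stiff method (scipy BDF as a stand-in for DGEAR).
import numpy as np
from scipy.integrate import solve_ivp

k1, k2, k3 = 0.04, 1.0e4, 3.0e7

def rhs(t, x):
    x1, x2 = x
    x3 = 1.0 - x1 - x2
    dx1 = -k1 * x1 + k2 * x2 * x3
    dx2 = k1 * x1 - k2 * x2 * x3 - k3 * x2 ** 2
    return [dx1, dx2]

sol = solve_ivp(rhs, (0.0, 100.0), [1.0, 0.0], method="BDF",
                rtol=1e-8, atol=1e-10, t_eval=np.logspace(0, 2, 21))
x1, x2 = sol.y
# x2 remains orders of magnitude smaller than x1, which motivates the
# weighting matrix used to balance the two outputs
print(sol.success, x1[-1], x2.max())
```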




Figure 12. Simulated measurements for example 4: (○) y1; (□) y2; (—) interpolation curve.


Figure 13. Normalized information index of k1, k2, k3 vs. time, to determine the best section of data to be used by the Gauss-Newton method, for example 4. Initial state at t = 0.0 (s1,max = 0.4713, s2,max = 0.0428, s3,max = 0.3320).

Table VIII. Region of Convergence of the Gauss-Newton Method along 4 Different Directions for Example 4

                            maximum α for convergence
direction               (1,1,1) (1,.1,.1) (.1,1,.1) (.1,.1,1)
standard Gauss-Newton      3       0.5       0.5        5
Gauss-Newton with
  optimal step size      110        20         5      850

Gauss-Newton method is quite small but can be enlarged substantially with the proposed step size policy. DGEAR was used for integration throughout this example.

Let us consider the case where k⁽⁰⁾ = [4. × 10⁴, 1. × 10⁹, 3. × 10¹²]ᵀ, which corresponds to α = 10⁶ along the direction (1,.1,.1). The information indices, shown in Figure 13, show that artificial data should be generated very close to the origin. Thus, taking as the initial state the data point at t = 4.0, we recompute the information indices, shown in Figure 14. Based on the latter, we generate artificial data in the interval [4. × 10⁻⁹, 5.], 10 data points per cycle, 90 in total. Subsequent use of the Gauss-Newton method with the proposed step size policy and with t0 = 4.0 brings the parameter estimates to k = [0.039, 0.99 × 10⁴, 3.0 × 10⁷]ᵀ in 14 iterations.

Similarly, by the proposed procedure, convergence to the optimum was readily obtained from the other three directions with α = 10⁶. There is great flexibility in the choice of t0, which is based on one's ability to interpolate reliably between data points. For this example, choosing 3.0, 8.0, or 10.0 instead of 4.0 as the origin yields equally good results.

Conclusions

The usefulness of the information index has been illustrated with several examples. The information index


Figure 14. Normalized information index of k1, k2, k3 vs. time, to determine the best section of data to be used by the Gauss-Newton method, for example 4. Initial state at t = 4.0 (s1,max = 0.3285, s2,max = 0.0427, s3,max = 0.1666).

enables us to locate the most appropriate section of the data to be used by the Gauss-Newton method, which then predicts accurately the direction toward the optimum. Subsequently, the distance along that direction is determined with the proposed optimal step size policy. The use of a good integration routine is essential to avoid numerical difficulties in the determination of the step size. As a result of the above, the region of convergence for the Gauss-Newton method is significantly enlarged. Once the optimal values for the parameters are obtained, the information indices can then be used to determine whether the measurements of the output have been taken at proper times. This information is of special value for stiff systems.

Acknowledgment

This research was supported by the Natural Sciences and Engineering Research Council of Canada Grant A-3515. The computations were done with the facilities of the University of Toronto Computer Centre.

Nomenclature
C = (m × n) observation matrix
e = p-dimensional unit vector
f = n-dimensional vector function
G = (n × p) sensitivity matrix
k = p-dimensional parameter vector
K = (p × p) diagonal matrix of the parameters
Q = (m × m) positive definite, symmetric weighting matrix
R = (m × m) diagonal matrix of random numbers uniformly distributed between -1 and 1
r = n-dimensional vector defined by eq 16
S = performance index defined by eq 4
sj = information index of parameter kj
t = time
x = n-dimensional state vector
y = m-dimensional output vector

Greek Letters
α = parameter indicating normalized distance of initial guess from optimum
Θ = (n × p) reduced sensitivity matrix
λ = step size, eigenvalue
μ = small free parameter
σ = maximum noise level

Superscripts
¯ = measured value of the variable
* = value of the variable at the optimum
(j) = value of the variable at the jth iteration
T = transpose


Ind. Eng. Chem. Fundam. 1983, 22, 445-453

Literature Cited
Bellman, R.; Kalaba, R. "Quasilinearization and Nonlinear Boundary Value Problems"; American Elsevier: New York, 1965.
Byrne, G. D.; Hindmarsh, A. C.; Jackson, K. R.; Brown, H. G. Comput. Chem. Eng. 1977, 1, 133.
Donnelly, J. K.; Quon, D. Can. J. Chem. Eng. 1970, 48, 114.
Edsberg, L. In "Numerical Methods for Differential Systems"; Lapidus, L.; Schiesser, W. E., Ed.; Academic Press: New York, 1976; p 181.
Enright, W. H.; Hull, T. E. In "Numerical Methods for Differential Systems"; Lapidus, L.; Schiesser, W. E., Ed.; Academic Press: New York, 1976; p 45.
Garfinkel, D.; Ching, D. W.; Adelman, M.; Clark, P. Ann. N.Y. Acad. Sci. 1966, 1054.
Glowinski, J.; Stocki, J. AIChE J. 1981, 27, 1041.
Hindmarsh, A. C.; Byrne, G. D. In "Numerical Methods for Differential Systems"; Lapidus, L.; Schiesser, W. E., Ed.; Academic Press: New York, 1976; p 147.
Hougen, O.; Watson, K. M. "Chemical Process Principles", Vol. 3; Wiley: New York, 1948.
IMSL Library, "Reference Manual", 8th ed.; International Mathematical and Statistical Libraries Inc.: Houston, TX, 1980.
Kalogerakis, N.; Luus, R. AIChE J. 1980, 26, 670.
Kalogerakis, N.; Luus, R. Proceedings of 1982 American Control Conference.
Lawson, C. L.; Hanson, R. J. "Solving Least Squares Problems"; Prentice-Hall: Englewood Cliffs, NJ, 1974.
Lee, E. S. "Quasilinearization and Invariant Imbedding"; Academic Press: New York, 1968.
Levenberg, K. Quart. Appl. Math. 1944, 2, 164.
Marquardt, D. W. J. Soc. Ind. Appl. Math. 1963, 11, 431.
Nieman, R. E.; Fisher, D. G. Can. J. Chem. Eng. 1972, 50, 802.
Osborne, M. R. In "Numerical Methods for Nonlinear Optimization"; Lootsma, F. A., Ed.; Academic Press: New York, 1972; p 171.
Ramaker, B. L.; Smith, C. L.; Murrill, P. W. Ind. Eng. Chem. Fundam. 1970, 9, 28.
Rao, S. N.; Luus, R. Can. J. Chem. Eng. 1972, 50, 777.
Robertson, H. H. In "Numerical Methods"; Walsh, J., Ed.; Thompson: Washington, 1967; p 178.
Seinfeld, J. H.; Gavalas, G. R. AIChE J. 1970, 16, 644.
Strang, G. "Linear Algebra and Its Applications"; Academic Press: New York, 1980.
Villadsen, J.; Michelsen, M. L. "Solution of Differential Equation Models by Polynomial Approximation"; Prentice-Hall: Englewood Cliffs, NJ, 1978.
Wang, B. C.; Luus, R. Int. J. Control 1980, 31, 947.
Weimer, A. W.; Clough, D. E. AIChE J. 1979, 25, 730.

Received for review September 20, 1982
Revised manuscript received June 10, 1983
Accepted July 20, 1983

Filtration of Airborne Dust in a Triboelectrically Charged Fluidized Bed

Gabriel Tardos* and Robert Pfeffer

Department of Chemical Engineering, The City College of The City University of New York, New York, New York 10031

Michael Peters

Department of Chemical Engineering and Environmental Engineering, Rensselaer Polytechnic Institute, Troy, New York 12181

Thomas Sweeney

Department of Chemical Engineering, The Ohio State University, Columbus, Ohio 43210

Experimental evidence is given to show that charges generated naturally on dielectric granules in a fluidized bed during the fluidization process can considerably increase the efficiency of such beds to remove micron and submicron airborne particles. The experimental work includes both the measurement of electrostatic charges generated on fluidized granules with a modified Faraday cage, as well as the measurement of the filtration efficiency when such parameters as gas velocity and gas humidity are varied over a wide range. The experimentally obtained efficiencies are then compared to the predictions of a previous fluidized bed filtration model which took into account both the electrostatic charges on the bed granules and dust particles as well as the presence of bubbles in the bed. Furthermore, it is demonstrated that a proper choice of fluidized particles can significantly increase the efficiency of the fluidized bed filter without an increase in bed thickness or a corresponding increase in pressure drop through the filter.

Introduction

The filtration of dusts and aerosols from a gas stream is a common industrial problem. Many applications involve high temperatures, corrosive environments, and high particle loadings, in which classical cleaning devices, such as electrostatic precipitation, venturi scrubbing, and fixed bed filtration, may not be well suited. The idea of using a fluidized bed to circumvent these difficulties has been the subject of many investigations dating back to Meissner and Mickley (1949).

The concept involves using the gas to be filtered as the fluidizing medium for a bed of collector particles. Because the collector particles are fluidized, they can be easily removed, cleaned, and replaced continuously (see, e.g., Svrcek and Beeckmans, 1976). Recent reviews of studies on fluidized bed filtration may be found in the work of

© 1983 American Chemical Society