Constructing NARMAX models using ARMAX models

This article was downloaded by: [171.67.34.69]On: 09 March 2013, At: 05:48Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,37-41 Mortimer Street, London W1T 3JH, UK

International Journal of ControlPublication details, including instructions for authors and subscription information:http://www.tandfonline.com/loi/tcon20

Constructing NARMAX models using ARMAX modelsTOR A. JOHANSEN a & BJARNE FOSS aa Department of Engineering Cybernetics, Norwegian Institute of Technology, Trondheim,N-7034, NorwayVersion of record first published: 15 Mar 2007.

To cite this article: TOR A. JOHANSEN & BJARNE FOSS (1993): Constructing NARMAX models using ARMAX models, InternationalJournal of Control, 58:5, 1125-1153

To link to this article: http://dx.doi.org/10.1080/00207179308923046

PLEASE SCROLL DOWN FOR ARTICLE

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematicreproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form toanyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contentswill be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses shouldbe independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims,proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly inconnection with or arising out of the use of this material.

http://www.tandfonline.com/loi/tcon20

http://dx.doi.org/10.1080/00207179308923046

http://www.tandfonline.com/page/terms-and-conditions

INT. 1. CONTROL, 1993, VOL. 58, No.5, 1125-1153

Constructing NARMAX models using ARMAX models

TOR A. JOHANSENt and BJARNE A. FOSSt

This paper outlines how it is possible to decompose a complex non-linearmodelling problem into a set of simpler linear modelling problems. LocalARMAX models valid within certain operating regimes are interpolated toconstruct a global NARMAX (non-linear NARMAX) model. Knowledge ofthe system behaviour in terms of operating regimes is the primary basis forbuilding such models, hence it should not be considered as a pure black-boxapproach, but as an approach that utilizes a limited amount of a priori systemknowledge. It is shown that a large classof non-linear systems can be modelledin this way, and indicated how to decompose the systems range of operationinto operating regimes. Standard system identification algorithms can be usedto identify the NARMAX model, and several aspects of the system identification problem are discussed and illustrated by a simulation example.

1. IntroductionModelling complex systems using first principles is in many cases resource

demanding. In some cases our system knowledge is so limited that detailedmodelling is difficult. Examples of this are found in, for example, the metallurgical and biochemical process industries.

In some case, resources can be saved by using black-box models describingthe input/output behaviour of the system. Such models represent the controllable and observable part of the system. The structure and parameters ofblack-box models have in general no direct interpretation in terms of thephysical properties of the system. The ARMAX model is a well-known linearinput/output model representation. The NARMAX (nonlinear ARMAX) modelrepresentation is an extension of the linear ARMAX model, and represents thesystem by a nonlinear mapping of past inputs, outputs and noise terms to futureoutputs. In this paper we discuss how NARMAX models can be representedand, in particular, we discuss how a NARMAX model can be constructed froma set of ARMAX models.

We concentrate on non-linear systems that are working in several operatingregimes, because systems that normally work within one operating regime mayin many cases be adequately described by a linear model. There are numerousexamples of systems that must work in several operating regimes, including mostbatch processes (Rippin 1989) and many control systems. Apart from normaloperating conditions, the control system may also have to take care of startupand shutdown, operation during maintenance and faulty operation, whichobviously lead to different operating regimes.

Traditionally, the problem with multiple operating regimes is solved bynon-linear first principles models covering several operating regimes, gainscheduling, or simply by manual or rule-based control of the system when

Received 5 May 1992. Revised 7 November 1992.t Department of Engineering Cybernetics, Norwegian Institute of Technology, N-7034

Trondheim, Norway.

0020-7179/93 $10.00 © 1993 Taylor & Francis Ltd

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3

1126 T. A. Johansen and B. A. Foss

operating outside the normal operating regimes. From an engineering point ofview, it may seem appealing to decompose the modelling problem into a set ofsimpler modelling problems. This is exactly what we propose here: First, thesystem operation is decomposed into a set of operating regimes that are assumedto cover the full range of operation we want our model to cover. Next, for eachoperating regime we choose a simple (typically linear) local model. It is usuallynot natural to define the operating regimes as crisp sets, since there will usuallybe a smooth transition from one regime to another, not a jump. Hence, it makessense to interpolate the local models in a smooth fashion to get a global model.The interpolation is such that the local model that is assumed to be the bestmodel at the current operating point will be given most weight in theinterpolation, while neighbouring local models may be given some weight, andlocal models corresponding to distant operating regimes will not contribute tothe global model at that operating point. To do the smooth interpolation at agiven operating point, we need to know which of the local models describe thesystem well around that operating point. For that purpose, to each local modelwe associate a local model validity function, i.e. a function that indicates therelative validity of local models at a given operating point.

The use of local linear models without interpolation, i.e. piecewise linearmodels, have been suggested by several authors, including Skeppstedt et al .(1992), Skeppstedt (1988), Hilhorst et al . (1991), Billings and Voon (1987), andTong and Lim (1980). A related technique is the use of splines (Friedman 1991)for representing dynamics models (Psichogios et al. 1992). Splines are also localmodels, but unlike piecewise linear models, there are constraints that enforcessmoothness on the boundaries between the local models. Different variations ofinterpolating memories (Tolle, Parks, Ersil, Hormel & Militzer 1992, Lane,Handelman & Gelfand 1992, Omohundro 1987) is a related local modellingtechnique, where a number of input/output-pairs of the system is memorizedand interpolated to give a model. Our approach can be thought of as ageneralization of these techniques, since we interpolate local models.

This paper is organized as follows: first, in § 2, we present a modelrepresentation based on local models. Then we discuss the approximationcapabilities of this representation, and show that a large class of non-linearsystems can be represented. The notion of operating regimes is introduced andwe present a general result guiding the choice of operating point vector.Thereafter, we discuss some practical aspects of modelling using local models in§ 3, and some aspects of system identification in § 4. In § 5, the concepts areillustrated by a simulation example, and § 6 contains some discussions andconclusions.

2. Model representationThe NARMAX model representation

yet) = f(y(t - 1), , y(t - ny), u(t - 1), ... , u(t - nu),

e(t - 1), , e(t - ne»+ e(t) (1)

has been shown by Leontaritis and Billings (1985) and Chen and Billings (1989)to represent a large class of discrete-time nonlinear dynamic systems. Here

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3

Constructing NARMAX models using ARMAX models 1127

, yet) EYe IR m is the output vector, u(t) E U c IR' is the input vector, ande(t) E E c IRm is the equation error. We introduce the (m(ny + ne) + rnu)dimensional information vector

1/J(t - 1) = [yT(t - 1), ... , yT(t - ny ) , uT(t - 1), ... , uT(t - nu),

eT(t - 1), ... , eT(t - neW

where 1/J(t - 1) is in the set lJI = yny x if" x E?«. This enables us to write (1) inthe form

yet) = f( 1/J(t - 1» + e(t) (2)

Provided that necessary smoothness conditions on f: lJI-. Yare satisfied, ageneral way of representing functions is by series expansions. Using a first-orderTaylor-series expanded about the systems equilibrium point yields a standardARMAX model. Second-order Taylor-expansions are possible, while higherorder Taylor-expansions are not very useful in practice because the number ofparameters in the model increases drastically with the expansion order, andbecause of the poor extrapolation and interpolation capabilities of higher-orderpolynomials. Splines offer a solution to this problem, but the approximation inhigher dimensional spaces may be difficult. Chen et al. (1990 a) have proposedusing a sigmoidal neural network expansion; Billings and Voon (1987) used apiecewise linear model, and Chen et af. (1990 b) have applied a radial basisfunction expansion as a means of representing f. A generic model representation based on local models was introduced by the present authors (Johansenand Foss 1992a, c), inspired by work by Stokbro et af. (1990) and Jones et al.(1991). We study this model representation in detail in the following.

2.1. Approximation using local models and interpolationGiven a set of functions OJi: lJI-. [0, 1)};';,OI, the following equation is

trivially true:

(3)

N-l

2: f( 1/J)Pj(1/J)j=O

N-l

2: Pi( 1/J);=0

f(1J!) = ----

(4)N-l

2: Pi(1/J);=0

assuming that at any point 1/J E lJI, not all Pi vanish. We assume the function Piis chosen such that it is localized in a subset lJIj C lJI. This means that Pie1/J) issignificantly larger than zero for 1/J E lJIi, and Pie1/J) is very close to zero for1/J ~ lJIj . Since Pi is close to zero for 1/J ~ lJIj , we can substitute f on theright-hand side in (3) with an t. that is a good approximation only in lJIi

N-l

2: li(1/J)Pi(1/J);=0

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


Introducing the normalized function Wi: lJI-o> [0, 1] defined by

W;(1J!) = N~;(1J!)

L Pj( 1J!)j=O

gives the approximationN-l

1(1J!) = L 1;(1J!) Wi( 1J!)i=O

(5)

(6)

In this equation, we can interpret Wi as a function that gives a value close to 1in parts of lJI where the function 1i is a good approximation to [, and close tozero elsewhere. By definition of w; we know that L;:'(jlw;{1J!) = 1 for all 1J! E lJI,and we call the functions w; interpolation [unctions because they are used tointerpolate local models t: We call t. a local model since it is assumed to be anaccurate description of the true [locally (where Pi is not close to zero).

The set of all functions of the form (6) with local models of polynomial orderp and smooth interpolation functions is denoted

'#p = {l: lJI-o> yll(1/J) = ~11;(1J!)Wi(1J!)}

At the extreme, zeroth-order Taylor-expansions of [ about 1J!; E lJI; may be usedto define 1;:

(7)

where 0; is a parameter vector. Such a simple local model is closely related toan interpolating memory (Tolle et al. 1992) and requires a large number ofinterpolation functions, since this means that the value [( 1J!i) is extrapolatedlocally. This representation is identical to neural networks with localizedreceptive fields (Moody and Darken 1989, Stokbro et al. 1990). Considering{w;};:'(jl as a set of basis-functions, the method is also similar to radialbasis-function expansions (Broomhead and Lowe 1988), for the followingreason: if the functions P; are chosen as local radial functions, the normalizedfunction Wi defined by (5) will not be radial in general, but it will qualitativelyhave much the same shape and features as Pi, except near the boundary of lJI.

A first-order Taylor-expansion of [ about 1J!i provides better extrapolationand interpolation than the zeroth-order expansion (7). Assuming the firstderivative of [ exists, the local models are given by

1;(1J!) = [(1J!;) + 'V[(1J!i)(1J! - 1J!;) = 0i + 8;(1J! - 1J!;) (8)

where 0; is a parameter vector and 8 i is a parameter matrix. Observe that (8) isactually an ARMAX model resulting from a linearization about 1J!,. Both the'weighted linear maps' of Stokbro et al. (1990), Stokbro (1991) and Stokbro andUmberger (1990) and the 'connectionist normalized linear spline networks' ofJones et al. (1989, 1991) used a first-order expansion locally. This representationmakes it possible to build a NARMAX model by interpolating between a set ofARMAX models.

Higher-order local models can of course also be used. Furthermore, there is

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3

(9)


no requirement that all the local models must have the same structure. Some ofthe local models may be based on first principles modelling, while others may begeneric black-box models. The present authors (Johansen and Foss 1992c) usedthis approach to integrate first principles models with neural network typemodels.

2.2. Approximation propertiesIt seems reasonable that the approximation can be made arbitrarily good by

choosing a sufficient number of local models. This is indeed the case, asillustrated in the following. We use the following norm to measure theapproximation accuracy:

Ilf - lll~ = sup Ilf(1/J) - 1(1/J)11z1J'E 'P

where II· liz denotes the euclidean norm.The (p + 1)th derivative of the vector function f at the point 1/J is denoted by

"Jp+lf( 1/J). Assume f is continuously differentiable p + 1 times, and {fi} f"::;/ arelocal models equal to the first p terms of the Taylor-series expansion of f about1/Ji' For any 1/J E lJI, we have

N-l

f(1/J) - 1(1/J) = L: (f(1/J) - l;(1/J»w;(1/J)i~O

If we assume II"Jp+1f (1/J)II< M for all 1/J E 'l', where 11·11 denotes the inducedoperator norm, we obtain by Taylor's theorem

N-l

Ilf(1/J) - 1(1/J)11z < ~ (p ~ I)! 111/J - 1/J;llf+l Wi(1/J)

In order to ensure that this norm is smaller than an arbitrary e > 0, we mustensure that for any 1/J E lJI the following condition holds:

N-l N-l

L: 111/J - 1/Jillf+l,O;(1/J) < e (p + I)! L: '(J;(1/J)i~O M i~O

Defining the set of functions {gi: lJI--.~}{:c/ by

g;(1JI) = 111/J - 1/Jillf+1_ e (p + I)!

M

and rewriting (9) gives the following condition that must hold for any 1/J E lJI:N-l

L: gi(1/J)Pi( 1/J) < 0i=O

or, equivalently, if we divide (10) by L:{:c/{!;(1/J)N-l

L: gi(1/J) w;(1/J) < 0;=0

(10)

(11)

The problem is now to find the conditions on N and the functions {Pi}{:Ol toensure that (10) holds for any given e > O. A geometric interpretation of (10) isgiven in Fig, 1. Certainly, this equation holds if the negative contribution of one

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


Figure 1. A geometric interpretation of the constraints on 'Pi'

term gi(1/J)Pi(1/J) in (10) dominates the (possibly positive) contribution of allother terms. A necessary condition is gi(1/J){'J;(1/J) --+ 0 as 111/Jlb --+ 00. This willcertainly be ensured if we choose Pi as an exponential or gaussian function.

Notice that the shape of the g,-functions are fixed and given by thespecifications. We are, however, free to choose the location and number N oflocal models. Let us choose the set {1/Ji};:~ so large and 'sufficiently dense inlp' that at least one of the functions {gil i=O1 will be negative at any 1/J E 1J1.Then the functions {gil ;:01 are fixed, and we must choose the Pi-functions suchthat (10) holds.

This can be done in several ways. In the limit when the width of thePi-functions go to zero the interpolation functions Hii will approach step-functions as shown in Fig. 2. The model will then approach a piecewise constant

4

2

(~-I .0 _

-I 0 1 2 4 8 9

Figure 2. Situation when the width of 'Pi goes to zero.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


model if p = 0, a piecewise linear model if p = 1, etc. In this limit, at any

1/J E lJI there will exist a j such~hat {I if i = j

wi( 1/J) = 0 if i *" j

By the choice of {1/Jj}~(/ we know that gj(1/J) < 0, and since Wi(1/J) = 0 for i *" i,(11) will hold. We can now provide a result for the case when lJI is a boundedset.

Theorem 1: Suppose we are given any integer p "" 0, and suppose f hascontinuous (p + l)th derivative. If lJI is bounded, then for any E > 0 there is an7E @ip (with finite N, which may depend on E) such that

Ilf - 71100 < E (12)

Proof: Since Vp +1f(1/J) is continuous, it is bounded (by M < 00) on lJI. Since lJIis bounded, a finite N is sufficient to ensure that one gj-function is negative atany point. Since N is finite, we do not have to go to the limit and make{Wi}~Ol step-functions, but can stop when sufficiently close. Then {Pi}~Ol canbe chosen as smooth functions such that (11) holds. Since 1/J E lJI was arbitrary,the theorem is proved. 0

This is an existence theorem. However, the proof is constructive and givesindications on how to construct the approximator. In order to use this proof toformulate an upper bound on the approximation error, we introduce thefollowing definition of distance between sets, similar to the Haussdorf metric.

Definition 1: Assume A and B are two non-empty subsets of a vector space.Then the distance between the sets is defined as

'21J(A, B) = inf sup Iia - bib£tEA beB

o

The crux in the proof of Theorem 1 is that at any point 1/J E lJI one of thegj-functions is negative and that the Pi-functions are chosen such that at anypoint 1/J E lJI, a negative term gj(1/J)Pi(1/J) will dominate the sum (10). At leastone g;-function will be negative at any 1/J E lJI if the following condition holds:

(13)

If the set {1/Jj} is dense in lJI, this distance will be zero. The term 'sufficientlydense' used informally above means that the set {1/J;} ~o 1 should be chosen suchthat (13) holds for the given E.

Theorem 2: Suppose we are given any integer p "" O. If lJI is bounded and f hasbounded (p + l)th derivative, i.e. Ilvp+lf(1/J)II,;;; M for all 1/J E lJI, then for any7E @ip with finite N and sufficiently narrow functions {p;} ~o1

, an upper boundon the approximation error is given by

A MIlf - flloo ,;;; (p + I)! ('21J({1/J;}, lJI))p+l (14)

Proof: Equation (13) will hold for E equal to the right-hand side of (14). Fromthe p!evious discussion, it is evident that (11) holds for any 1/J E lJI. Hence,Ilf - flloo will be bounded by E, and the result follows. 0

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


This is under the condition that the Prfunctions are chosen narrow. Thisbound is conservative, meaning that for Pi-functions that are not too narrow andnot too wide, one may expect better accuracy.

From (14) we see that if the polynomial order p of the local models isincreased, then the accuracy will improve. If IJf is not bounded and M > 0, Nmust be infinite in order to guarantee a bounded error.

If f does not satisfy the smoothness conditions in Theorem 1, the proofobviously does not hold. If, however, f is such that it can be approximatedarbitrarily well by a sufficiently smooth function, then we can show that f canbe approximated arbitrarily well by interpolating local models. In particular wehave the following corollary.

Corollary 1: The results of Theorem I also hold if the smoothness assumptionon f is relaxed to assuming only continuity. In other words, the set 5;p is dense inthe set of continuous functions from IJf into Y.

Proof: By the Weierstrass approximation theorem (see, for example, Stromberg 1981), for any e > 0 there exists a polynomial 7 such that IIf -711", :;;; e/2.By Th~eor~m 1,7 can be approximated by an 1E 5;p on the bo~nded set IJf suchthat Ilf - fll", < e/2. Using the triangle inequality, we get Ilf - fll", < e. 0

Example 1: Assume p = 1, i.e. that the local models are ARMAX models.Then (14) can be written

A M 2Ilf - fll", :;;; - (2D( {1/1J, 1Jf) = e2

(15)

If f and 1/1 are scalars, M is a bound on the second derivative of f, in otherwords a bound on the curvature. If the system is linear, then M = 0 and onelocal linear model is sufficient to make an arbitrarily good global model (ofcourse). M indicates the nonlinearity of the function, and we expect e toincrease with increasing M, i.e. increasing nonlinearity, which is indeed the caseas indicated by (15). However, using the upper bound M gives a conservativeresult since the system may behave more linearly in some regions than others.Hence, we need not have a high density of local models where the system doesnot have strong nonlinear behaviour. 0

Example 2: With a simple example we illustrate the use of Theorem 2.Consider the function f: [0, 2]--> ~ given by f( 1/1) = ~ + 1. Assume that wehave two local linear models located at 1/10 = 0·5 and 1/11 = 1·5. Then9l({0·5,.1·5}, [0,2])=0·5, p=1 and M=2. Theorem 2 predicts the bounde = 0·25 on the approximation accuracy. As shown by Fig. 3, this bound is exactwhen using infinitely narrow functions Pi, i.e. a piecewise linear approximation.The reason for this is that M = f"( 1/1) = 2 for all 1/1, hence there is no regionwhere f is 'less nonlinear'. As we see later, better approximations can beachieved using well-chosen Prfunctions. From this figure we also see that if thelocal linear models are not chosen as first-order Taylor-expansions, but chosenon the basis of, for example, a least squares regression, improvement might alsobe achieved. 0

Since the system function f can be approximated arbitrary well, we are ableto make arbitrarily good predictions on a finite horizon if there is no noise,

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


6,--- -_-_-_- ----,

4

0.2 0.4 0.6 0.8 12 lA 1~ l.8 2

Figure 3. Approximation of f(1{J) = v? + 1 using two local linear models.

provided the initial values are correct and the inputs and outputs are such that1jJ(t) E 1JI (Polycarpou et al. 1992). However, it is well known that the solutionsof some difference equations are sensitive with respect to initial value ormodelling errors. Examples of such systems are chaotic or marginally stablesystems.

2.3. Operating regimes

In the rest of this paper, we usually assume p = 1, i.e. we use linear ARMAXmodels locally to build a non-linear NARMAX model. In the representation (6)the interpolation functions {Iii;} ~o1 are defined on the set lfI. This is a subset ofthe information-space. If the information-space has high dimension (as it oftenhas), the curse-of-dimensionallity problem arises. The problem, first describedby Bellman (1961), is essentially that the number of local models needed tocover a region of this space uniformly increases exponentially with the dimension of the space. In practice, uniform coverage is usually not necessary, but theproblem is still significant. In some cases the interpolation functions may bedefined on a space of smaller dimension. This is our motivation for introducingthe terms operating regime and operating point. First, we define cP to be the setof operating points. Motivated by the fact that we want to model a nonlinearsystem with a set of linear models, it is convenient to define an operating regimeas a subset of cP where the system behaves approximately linearly.

Definition 2: An operating regime is a set of operating points CPi C cP where thesystem behaves approximately linearly. 0

A model validity function Pi: cP--+ [0, 1] is smooth and satisfies p;(I/» "'" 1 forI/> E CP;, and goes to zero outside CPi' The interpolation function Wi: cP--+ [0, 1] isnow defined as

Wi( 1/» = N~:( 1/»

L Pj(1/»j=O

assuming that at every operating point I/> E CP, not all model validity functions Pivanish.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


(17)

In many cases there will exist a function H: lJI-7 (/> such that, at any t ,<P(t) = H(1/I(t». The function H will typically be a projection, i.e, (/> will be asubset of a space of lower dimension than lJI. In cases where the operating pointis calculated on the basis of filtered or estimated quantities, the relationshipbetween 1/I(t) and <P(I) is more complex, and must be described by an operator'Je mapping previous data to the operating point. Although very important, thiscomplicates the analysis considerably, and we do not consider this case here, butleave it as a topic for future research.

To summarize, the representation we address at this stage isN-I

Ht) = 1(1/I(t - 1» = ~ 1;(1/1(1 - l»w;(<P(t - 1» (16);=0

where the local models

1;(1/I(t - 1» = 8; + 8;(1/I(t - 1) - 1/1;)

are ARMAX models. We define the set

s, = {l: lJI-7 yI1(1/I) = ~11;(1/I)w;(<p)}

where p is the polynomial order of 1;, the interpolation functions {w;} i~;ii I aresmooth, and <p = H(1/I). Now we want to state some general results regardingthe transform H from the information vector to the operating point vector. Ingeneral, [ can be written as an affine function of some of its argument. Werearrange the elements of 1/1 into 1/IT = [1/11 1/1'1] such that

(18)

Assume lJILand lJIN are the subsets of the information-space corresponding to1/IL and 1/IN' respectively. [I: lJIN -7 ~m and fz: lJIN -7 ~mxl are non-linearvector- and matrix-valued functions, respectively. Our main result guiding thechoice of <p is the following, which indicates that <p must be chosen such that itcaptures the systems non-linearities.

Theorem 3: Assume f given in (18) is continuous, and lJI is bounded. Then [orany s > 0 there is an f E ?Ji l with <p = 1J1N, and finite N such that Ilf-11/", < e.

Proof: Fix an arbitrary 1/1 E lJI such that 1J1T = [1/11 1J1'1].

11[(1/1) - 1(1/1)112 = 1I[(1J1N' 1/IL) - ]oI1;(1/IN' 1/IL)W;(1J1N)112

= 11~\[1(1/IN) + f2(1/IN)1/IL - 1;(1/IN, 1/IL))W;(1/JN)112

= 11~\ft(1/IN) -1Ni(1J1N) + f2(1/IN)1J1L - 1U(1/IL»)W;(1/IN)1I2

In the last line we split the linear function 1;: lJI-7 ~m into two linear functions1N;: lJIN -7 ~m and 1u: lJIL -+ ~m. Now we choose 1u(1/Id = P;1/IL where P; isa not yet specified constant parameter matrix. Then we have

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


11/(1/1) - 1(1/1)112

,,;;: 11~\/l(1/IN) - 1Ni(1/IN) + (f(1/IN) - !3J1/IL)W;(1/IN)llz

,,;;: 11~\!J(1/IN) - 1Ni(1/IN»W;(1/IN)lIz + 11~\h(1/IN) - !3;)1/ILWi(1/IN)llz

,,;;: 11ft<1/IN) - ~11Ni(1/IN)Wi(1/IN)llz + 111/ILIb . IIIz(1/IN) - ~~>;Wi(1/IN)lIzThe first term in this equation can be made arbitrarily small by Corollary 1 withp ,;, 1 since 1N; is linear. Since lJI is bounded, the second term can be madearbitrarily small by the same corollary with p = 0 through the choice of !3i'Hence, for any s > 0 we can make 11/(1/1) -1(1/1)112 < e and since 1/1 is arbitrarywe get III - 11100 < e. 0

Using the same notation as before, the attainable approximation error isbounded by

11/-11100 ,,;;:}i «0J({rj>;}, cI»)Z + MlllJILII0J({rj>;} , cI>)) = e (19)2

where

The motivation for introducing the operating point rj> is that in many cases thisvector may be of a significantly lower dimension than 1/1. With a fixed N thefirst term in (19) will be significantly smaller than the corresponding term

~ (0J({1/Ii}, lJI)z

However, the second term in (19) will make the error increase, but in mostcases when cI> is of smaller dimension than lJI, the approximation (16)-(17) willgive better accuracy than (6), (8). Another important fact is that a lowdimension of cI> makes it easier to partition the set into operating regimes.

Example 3: For example, if I is linear in the control variables u(t - 1), then rj>need not contain any u(t - I)-terms. If we have the system

yet) = I(y(t - 1), u(t - 1» = 11(y(t - 1» + h(y(t - I»u(t - 1)

we can choose rj>(t - 1) = yet - 1) without losing accuracy in the approximation.o

We now generalize this result for local expansions of (polynomial) order p.We split 1/1 into two parts and rearrange 1/IT = [1/11 1/111 such that

(20)

where A: lJIL -> ~[ is of polynomial order less than or equal to p, IH1:lJIH -> ~m, and 1HZ: lJIH -> ~mxl may be of higher order.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


Theorem 4: Suppose f given in (20) is continuous and IJI is a bounded set. Thenfor any e > 0 there is an 1E '?Jip with ep = 1/JH, and finite N such thatIlf - 111", < e.

Proof: The proof follows the same idea as the proof of Theorem 3, butrequires some tedious notation, and is therefore omitted. 0

2.4. Some comparisons

Using local linear models we can write the model representation (16)-(17) asN-l

Y(t) = L (0; + e;(1/J(t - 1) - 1/J;»w;(ep(t - 1»;=0

= (~I(Oi - e;1/J;)w;(ep(t - 1») + ('~le;w;(ep(t - l)))1/J(t - 1)

= O(ep(t - 1» + e(ep(t - 1»1/J(t - 1)

This means that the non-linear model can be written as an apparently linearmodel, where the parameters are dependent on the operating point. Priestley(1981) introduced state-dependent models that can be written

Y(t) = O(x(t - 1» + e(x(t - 1»1/J(t - 1) (21)

where x is the state-vector, 0 is a state-dependent vector, and e is astate-dependent matrix. In general x = 1/J was suggested, but it was alsoobserved that this might be redundant, so a simpler vector may be used todescribe the parameter dependence. The present approach with x = ep hasobvious similarities. Billings and Voon (1987) discussed the use of models withsignal-dependent parameters, which are similar to (21) with x = to, where w(t) isthe auxiliary signal. Billings and Voon (1987) used polynomials to define thedependence of the parameters on the auxiliary signal, i.e. O(w(t» and e(w(t»are polynomials in w(t). A similar approach was proposed by Cyrot-Normandand Mien (1980). Our approach is also similar, but system knowledge is appliedto choose the Pi-functions, which again defines O(ep(t» and e(ep(t». TheThreshold AR Model by Tong and Lim (1980) can also be written in the form(21) with x(t - 1) = y(t - 1)

{

OJO(y(t - 1» =

Oz

if y(t - 1) E Y l

if y(t - 1) E Yz

te l if y(t - 1) E Yl

e(y(t - 1» =ez if y(t - 1) E Yz

where Y = Yj U Yz. Here the parameters are switched between two possibleparameter sets, and the decision is based on the value of y(t - 1). The resultingmodel is a piecewise linear model and related to our approach if ep(t - 1) =y(f - 1) and the interpolation functions are step-functions.

The notion of operating points and model validity functions offers acomplementary method for parametrizing the state-dependence of the parameters given by Priestley (1988).

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3

(22)


Takagi and Sugeno (1985) suggested a fuzzy-logic-based technique forcombining in a smooth fashion a set of linear models into a global model. Itturns out that if the operating regimes <P; are viewed as fuzzy sets withmembership functions equal to the model validity functions, then inference on arulebase of the form

IF eJ>(t - 1) E <P; THEN yet) = 1;(1Jl(t - 1»

gives a resulting global model of the same form as that analysed in the presentpaper, provided the fuzzy operations are properly defined. This suggests the useof fuzzy sets and rules as a means of defining the operating regimes and localmodel validity functions. This is appealing since this gives a direct method ofrepresenting the empirical knowledge the engineers and operators have aboutthe system.

A related non-linear modelling approach uses radial basis-functions (RBFs)(Powell 1987, Broomhead and Lowe 1988). Using RBFs, a non-linear functionmay be modelled as

N-\

1(1Jl) = 2: e;r;(II1Jl - 1JI;lb);=0

where r.: ~+ ~ is typically chosen as a gaussian function. The relationshipbetween some of these approaches is best illustrated by an example.

Example 4: We consider again the function in Example 2, and the followingfour modelling approaches.

(1) Two piecewise linear models, as Example 2, centred at 1Jlo = 0·5 and1Jl\ = 1·5. This may also be interpreted as a 'threshold AR' model.

(2) We choose gaussian model validity functions

( ( )2)1 eJ> - eJ>

Pie eJ» = exp - 2 :E; I

with eJ>o = 0·5, eJ>\ = 1·5, :E; = 0·5, and use two local linear models.

(3) Five local zeroth-order models centred at 1JIo = 0, 1J!1 = 0·5, 1Jl2 = 1,1Jl3 = 1·5, 1J!4 = 2, and gaussian model validity functions with 1:; = 0·5.

(4) A radial basis-function expansion with five gaussian basis-functionscentred at 1Jlo = 0, 1J!1 = 0·5, 1Jl2 = 1, 1J!3 = 1·5, 1Jl4 = 2, and 1:; = 0·5.

Linear regression is used to estimate the model parameters, and the resultsare shown in Fig. 4. By comparing Fig. 4(a) with 4(b), it is obvious thatinterpolating local linear models using well-chosen model validity functions canimprove the accuracy compared with piecewise linear models.

Notice that f now is defined on [-1,3], while data on [0,2] is used forparameter estimation. The extrapolation capabilities can thus be evaluated, andwe see that the local linear approximations give first-order extrapolation, aswould be expected, while the local zeroth-order models give zeroth-orderextrapolation. The RBF approach does not given any extrapolation at all, sinceall basis-functions go to O. A feed-forward neural net with one hidden layer (ofsigmoidal basis-functions) would give an extrapolation qualitatively similar to thezeroth-order models in Fig. 4(c). As we see, there are fundamental differencesconcerning the extrapolation capabilities. D

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


few)

32o-5L-_-'--_--'-_-'-_-.J

-I32o-5L-_.-<-_~__~_-.J

-I

(a) (b)

32o32oOL-_~-~--,-----'

-1

(e) (d)

Figure 4. (a) Approximation of f("') = 1J? + 1 using a piecewise linear model with two locallinear models; (b) approximation using two local linear models and gaussian modelvalidity functions; (e) approximation using five local zeroth-order expansions (constant); (d) approximation using a radial basis-function expansion with five gaussianbasis-functions.

3. Modelling

The representation (16)-(17) is appealing since ARMAX models are simple.We represent a complex nonlinear system by a number of simple linear systems.A piecewise linear model will have the same features, but unless we enforce themodel to be continuous on the boundary between the local models, the resultingglobal model may be discontinuous, which may be undesirable in some cases.Unlike the piecewise linear model, the model (16) will be smooth when themodel validity functions are chosen as smooth functions. In practice, there are. atleast four ways (shown in the following) a NARMAX model can be constructedusing local ARMAX models and interpolation.

(1) First we choose a set of operating regimes {<Pi} ;;:c/ that correspond to thenormal equilibrium points and major transient operating regimes of the systemin question. This means that we partition the set of operating points into partswhere we believe that the system will behave linearly. Then we performexperiments on the system and identify a local ARMAX model l for eachoperating regime, using cost indices corresponding to each local model

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


1 mj

I, = - 2:(y(t) - li(1/J(t - 1)))T Ai(y(t) - li(1/J(t - 1))) (23)mi 1=1

where m, is the number of data points relevant to regime <Pi, and Ai is a scalingmatrix. Then the local models are integrated using system knowledge to choosesensible model validity functions {Pi};::'O 1 such that the set <P is covered, asshown in Fig. 5. The choice of these functions will strongly influence theaccuracy of the global model, since the local ARMAX models are identifiedbefore the functions Pi are chosen.

(2) Instead of choosing the model validity functions {p;};::'o 1 empirically, wemay try to find optimal validity functions after we have found the localARMAX models. Choosing fixed structures for the functions {p;} ;::'01 is necessary to make the problem finite-dimensional. Keeping the parameters of theidentified ARMAX models fixed, we may search for optimal parameters of{p;};::'Ol using the same data used for identifying the ARMAX models, and aglobal performance index. This procedure leads to a two-step optimizationprocedure.

(3) A more direct approach is first to choose the model validity functionscorresponding to the operating regimes <Pi using system knowledge. Keeping themodel validity functions fixed, we minimize the global index

I = ~ ~(y(t) - l(1/J(t - 1»)TA(y(t) -l(1/J(t - 1») (24)m 1=1

with respect to all parameters in the local models. Here m is the number of data

Figure 5. The set <P of operating points is covered with local models. The figure shows lineswhere the model validity functions Pi are constant.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


points available, and A is a suitable scaling matrix. Now the shape of the modelvalidity functions will be taken into consideration when finding the optimalparameters for the local models. This has two side effects: first, the accuracy ofthe global model is less sensitive to the choice of {Pi};~;OI; second, the localmodels (17) are influenced by the user-specified functions {p;} ;':,ijl-hence, theyare no longer linearizations of f about {1/Ji};':,ijl.

(4) An obvious improvement to (2) and (3) would be to search for ARMAXand local model validity function parameters simultaneously. This leads to acomplex non-linear optimization problem.

3.1. Required a priori system knowledgeWhen building models, different kinds of system knowledge must be avail

able. For ARMAX models, one must first estimate the dominant time-constantsof the system to choose the sampling interval. Second, the ARMAX modelorder must be chosen and the structure of the system disturbances must beknown in order to select the MA-part of the model.

When building NARMAX models, it is in addition necessary to find asuitable structure for the system function f. First principles knowledge can beapplied here, if available. We have proposed a generic structure that does notrequire first-principles modelling. It is, however, not a completely black-boxapproach, as some limited system knowledge must be included. In order to usethe local modelling approach introduced here, a priori knowledge in terms ofoperating regimes must be available. One must be able to estimate operatingregimes in which the system will behave approximately linearly.

In general, the technique with local models and interpolation may be used inan elegant fashion to integrate first-principles models with black-box models,since it is completely feasible that some of the local models may be derivedbased on first principles, while others may be black-box models (Johansen andFoss 1992c). In operating regimes where the dominating physical pheomena arewell understood and possible to model, and in operating regimes where the datais so sparce that black-box modelling is not possible, it makes sense to use localfirst-principles models. In the remaining regimes, black-box models can beconstructed, and the proposed technique with operating regime decompositioncan be applied to integrate the different models. Of course, in regimes where wehave limited data and limited knowledge, modelling is impossible, and the bestwe can hope for is some reasonable extrapolation of the neighbouring localmodels into those regimes.

Defining the operating point in a suitable manner is important. If the localmodels are linear, we have shown in Theorem 3 that the operating point mustcapture the systems non-linearities. Given a set of data from the system,different tests for linear relationships between some inputs and the outputs is ofgreat interest (Haber 1985). In the case with signal-dependent piecewise linearmodels, it has been observed by Billings and Voon (1987) that the model maybe input-sensitive. This must certainly be expected if the data used foridentification does not cover the full range of operation. Input sensitivity andbiased models may also be the result if the operating point vector is not suitablychosen, i.e. only some of the non-linearities are captured by the operating point.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


4. IdentificationFirst we consider identification of local model parameters based on the local

cost indices (23); secondly, we consider the global cost index (24). Finally, weconsider identification of local model validity function parameters and modelstructure identification.

4.1. Identifying local model parameters using local cost indicesThe prediction error at time t for the local model 11 is defined to be

EI(t) = y(t) - lj( 1J1(t - 1))

The cost index Jj associated with the local model can be written as

(25)

Consider the local ARMAX model (17). The local models are parametrized by avector OJ and a matrix B j • Since all local models (17) are linear functions of theparameters, the representation is basically linear in the parameters, and standardidentification methods can be applied, e.g. those of Soderstrom and Stoica(1988).

Assume first the noise e(t) is sequentially uncorrelated, and the informationvector 1J1( t - 1) does not contain noise terms e( t - 1), ...• e( t - ne) . Then themodel can be written in the linear regression form

(26)

(27)

where ej is a parameter vector and qJi(t - 1) is a regression matrix.The parameters can be estimated using the least squares (LS) method. The

regression matrix, qJj(t - 1), is a matrix of computable or measurable quantitiesnot depending on the parameter vector and not correlated with e(t). The leastsquares estimate minimizing (25) can be written as

-" (1 m, T) -I(1 m, )OJ = mj ~ rpj(t)lI. jrpj (t) mj ~/irpi(t)y(t)

In the general case when delayed noise, e(t - 1), e(t - 2), ... , e(t - ne) , isincluded in 1J1(t - 1), we use the prediction error (PE) method. Since the noiseis assumed not to be measurable, we do not know the values of e(t - 1), ... ,e(t - ne ) . If the model matches the true system, then Ej(t) = e(t). Now Ej(t - 1)depends on the parameters ej since it is the prediction error. Since qJi(t - 1)depends on Ej(t - 1) and hence e;, we may conclude that the predictor (26) isno longer linear in the parameters. Hence it is not possible to find a simpleanalytic solution like (27). The cost indices (25) must be minimized numericallyusing, for example, the Newton-Raphson algorithm

B~k+l) = B\k) - (Yk(V2Jj(B~k)-IVJj(B~k))

or the Gauss-Newton algorithm, which is based on simplified calculations of theinverse hessian matrix.

Both the LS-method and PE-method can be formulated recursively. TheRPE-algorithm is

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


A A

8j( t ) = 8i(t - 1) + Kj(t)£i(t) (28)

Ki(t) = P;(t) lJI(t)ll i (29)

P;(t) = P;{t - 1) - P;(t - l)lJI(t)(lli l + lJIT(t)p;(t - 1) lJI(t»-llJIT(t)Pi(t - 1)

(30)

with

a£·lJI(t) = - ~ (r)

aei

4.2. Identifying local model parameters using a global cost indexWe define the global prediction error to be

£(t) = yet) - I(1J!(t - 1»)

The global cost index (24) can be rewritten as

1 mJ = - 2>T(t)Il£(t)

m 1=1(31)

The LS- and PE-methods can be formulated in the same manner as above.When the identification is performed on-line, some rather heuristic modificationmay be desirable. As pointed out by the present authors (Johansen and Foss1992 b), only the parameters of those local models that are assumed to be validat the current operating point should be updated. This is easily accomplished byensuring that the model validity function p, is exactly zero for operating pointswhere the local model is not valid. In practice, however, we wish the function Pito be smooth, i.e. it should go fast to zero instead of being exactly zero. Thismay cause problems when the system is operating within one operating regimefor a long time. Then all the other local models will be updated slightly eachtime-step, and information about other operating regimes will 'leak out', inparticular if forgetting factors are used.

The heuristics we propose to eliminate this problem is to update only theparameters of those local models satisfying Wj(</J(t)) > 6, where typically6 E [0·1,0·5]. It is a danger that parameters being an active part of the globalmodel may never be updated. This requires the system to be excited such thatthe parameters of all local models will be updated from time to time.

4.3. Identifying model validity function parametersIn general, the local model validity function parameters will enter the

equations for the prediction error nonlinearly. In particular, if these parametersare to be identified simultaneously with the local model parameters, we get acomplex nonlinear programming problem. We do not discuss this problem here,but refer to the vast literature on non-linear programming (e.g. Gill et al. 1981).We should like to point out that most of our simulations so far indicate that arough empirical choice of model validity functions combined with local modelidentification based on a global cost index in most cases gives better results thanthe use of local indices and subsequent optimization of the local model validityfunctions.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


4.4. Identifying model structureWe have suggested how knowledge about operating regimes may be used to

decompose the modelling problem into the problem of building simple ARMAXmodels corresponding to each operating regime. Together with model validityfunctions, this gives a model structure in which only the parameters areunknown. In some cases, such knowledge may not be available and we needmethods to find an adequate model structure, i.e. the number N of localmodels, and the structure of each local model. From the parsimony principle weknow that the best model is the model with the fewest parameters able todescribe the system adquately. There are several theoretical frameworks thatdeal with this problem.

(I) The prediction error using a good model should be uncorrelated with pastprediction errors and inputs (and future inputs if the system operates in openloop). There are several correlation-based tests (see, for example, Billings andVoon 1986).

(2) If we consider the expected average square error on future data, weobtain the final prediction error (Akaike 1969)

SiFPE = Si(1 + P/m)/(I - P/m) (32)

This depends on the number of parameters P and the numbers of data points mused for identification.

In general, J will decrease when more parameters are introduced in themodel, e.g. when new local models are added. However, P will increase, and atsome stage the index SiFPE will increase, indicating that the quality of the modeldecreases. It is therefore important to keep the number. of local models at aminimum, and use as simple as possible local models.

Both these methods can be implemented off-line by an exhaustive search.Not all of them are suited for on-line structure identification, however. Itrequires large amounts of computer power to test more than two or three modelhypotheses simultaneously. Generally a batch of data is required to perform anystatistical test with some given significance level. A large prediction error insome time sample may be due to noise, parameter errors or inadequate modelstructure. Collecting a batch of data over some time interval will reduce theimpact of noise. If the batch is so large that the parameter estimator is expectedto converge during the batch, the impact of parameter errors may also beinsignificant. Hence, if the batch is large and the prediction error is biased orcorrelated, we can infer that the model structure is not good. In our approachwith local models, we must decide which local models are the cause for themismatch. This can be found by collecting statistics locally for every local model.

4.5. Time-varying systemsIn the case of time varying parameters, the RPE-formulation (28)-(30) must

be modified. This is usually done by introducing a method for artificiallyincreasing the covariance matrix estimate P so that it will not go to zero. The

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


most common schemes are linear increase (which corresponds to the Kalmanfilter), exponential increase, and covariance resetting.

At one time-step we get the information vector 1jJ( t). The information vectoris transformed to the vector w(t) = [wo(!/J(t)) ... WN-l(!/J(t)W. This vectorcorresponds to the direction in interpolation-space where we get information.The interpolation-space consists of N dimensions, one for each local model. Ifforgetting is to be avoided, we should only update models along the direction inthe model space where we get new information. This means that we should onlyforget if we know that new information will arrive to compensate for theforgotten information. At the operating regime level, this is rather simple in ourcase since by construction the components of w(t) will be close to zero when theinformation is not relevant for the corresponding local models. Hence, bythresholding the parameter update, only the local models about which we getnew information will be updated. This leads to a small set of local models to beupdated at each time-step. Within each local model, techniques based on thesame type of reasoning can be applied to only update in the directions inparameters-space where we get new information (Fortescue et al. 1981, Stelid etat. 1985, Parkum et al. 1990).

5. Simulation exampleWe use the proposed approach to identify a model of a continuous stirred

tank reactor (CSTR) where a first-order, exothermic chemical reaction A ....... Btakes place. The system is described by the following mass- and energy-balance

(33)

(34)

(35)

with symbols as defined in Table 1. Inputs to the system are the power Q andthe inlet flow qi' Outputs are temperature T and concentration CA' We definethe vectors y = [5 X 10- 4 x CA, 5 X 10- 3 X T]T and U = [50 X qi, 10- 7 X Q]T.The system is simulated as a continuous-time system. Two independent sequences {(y(t), u(t))} ~~ are collected by sampling the system every twominutes (see Fig. 6). One set is used for parameter identification only, and theother is used for model validation only.

The reactor is open-loop unstable. There are therefore two stabilizingsingle-loop PI-controllers. The reference signal to the controllers are LP-filteredwhite noise signals, ensuring the input is rich.

In our example, we choose the information vector 1jJT(t - 1) =[yT(t - 1) uT(t - 1)]. We have performed four simulations, and a least squaresalgorithm is used in all cases:

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


Symbol

Tt,TR

CA

CAi

EARkopCp

V{).H,q,qoQ

Value

350-400310350

110

700008·3140·0421·04·0

10·0-90000

50-50050-500-6-3

Units

KKKkmol rn?

kmol m"!kl krnol "!kJ(Kkmol)-1min"!kgl- 1

kJ (K kg)-lm3

kl kmol"!l min "!

l min "!

MWkmol l'

Description

Reactor temperatureInlet temperatureReference temperatureConcentration of A in reactorInlet concentration of AActivation energyGas constantFrequency factorAverage density in reactorAverage heat capacity in reactorReactor volumeReaction energyInlet flowOutlet flowExternal power from heat exchangerReaction rate

(36)

Table 1. Symbols.

Simulation 1: First, an ARMAX model

y(t) = ql(t - 1)6

lOT 61o 1 62

Yl(t - 1) - yt 0 63

Y2(t-l)-y! 0 64

o Yl(t-l)-Yt 65

o Y2(t - 1) - Y! 66Ul(t - 1) - ut 0 67

U2(t - 1) - u! 0 68o ul(t-1)-ut 69

o U2(t - 1) - u! 610

is fitted using linear regression on the data. The points [yt yl1T and [ut u!]Tare the points about which the ARMAX models are linearized, and are chosenas average values of the respective signals.

Simulation 2: Second, a NARMAX model with two local ARMAX models (36)is fitted. If CA; and T, are constant, the model (33)-(35) can be written

i. y(t) = f(y(t), u(t)) = fl(y(t») + fz(y(t))u(t)dt

Discretizing by the Euler method gives

(37)

y(t) = y(t - 1) + !),.t(ft(y(t - 1)) + fz(y(t - l))u(t - 1)) (38)

Comparing this with (18), Theorem 3 indicates that the operating point <p = ycaptures the nonlinearities. In order to keep the example simple, we observethat the system exhibits its strongest nonlinear behaviour with respect totemperature. As a first approximation, it is therefore reasonable to choose<P(t) = Y2(t) = 5 x 10- 4 X T(t). The two local model validity functions arechosen as gaussians (22) with <Po = 1·8 and <PI = 1·9 (corresponding to 360 K and

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


2.' ,- - __-- -,

2.' ,--_-- --_-- ~_- __,

.000900BOO600400300.00. ·O~---,,=---,,=---;=:--=,,---=;;--=;;--~"""---;;;=-<>=-....,..,;!

Figure 6. The first two curves show output and input data used for identification, while thelast two curves show data used for validation.

380 K). These values are found by exarrurnng the operating range of the dataspan. The size of the operating regimes are estimated to be ~i = 0·05 (corresponding to 10 K). The identification of the two ARMAX models is performed separately using local cost indices (23).

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


Simulation 3: The same model structure as simulation 2, but a global cost index(24) is used to identify the parameters of the two ARMAX models simultaneously.

Simulation 4: Finally, we use four local ARMAX models and a global costindex (24). The local model validity functions are still gaussians, centred at</>0 = 1·775, CP1 = 1·825, cI>2 = 1·875, and c/J3 = 1·925 corresponding to 355, 365,375, 385 K, respectively. The width parameter is chosen as I j =0·02, corresponding to 4 K.

5.1. ResultsThe results are summarized in Table 2 (different data are used for identifica

tion and validation). We see that all NARMAX approaches give significantlybetter results than the ARMAX approach. As would be expected, better resultsare achieved with a global performance index than local indices, and increasingthe number of local models also improves on the model accuracy.

One-step-ahead prediction errors are shown in Fig. 7. The curves indicatethat the prediction error is considerably reduced using the NARMAX modelscompared to the ARMAX models.

The covariance functions for the prediction errors are

£[(£(t) - e)(£(t + r) - e)T] = [r l1«r»

r21 r(39)

Estimates of the autocorrelation functions are shown in Fig. 8. It is well knownthat an unbiased model gives an autocorrelation function equal to a <'i-function.(This is not a sufficient condition. A more detailed model validation should alsoconsider £[(u(t) - u)(£(t + r) - E)T] and higher order covariances-Billingsand Voon 1986). However, we may conclude from the curves that the NARMAX models strongly improve on the model accuracy compared to theARMAX models. We may not expect a perfect model because the modelstructure is different from the true system: first, there is fundamental structuraldifference between the state-space system and the model based on local modelsand interpolation. The low number of local models limits the accuracy. Second,there is a structure error introduced by sampling the continuous system. Third,the simplified choice of operating point introduces a systematic error.

We have also done simulations using optimization of the I;-parameters in thelocal model validity functions after the local ARMAX parameters were identified using local cost indices. This gave substantial improvements, but the resultswere not as good as those obtained using a global index with pre-chosen (notoptimal) I;-parameters. Including Yl(t) in the operating point vector also gavesome improvements, but not as many as expected. Hence, we can conclude thatthe choice of operating point is sensible.

6. Discussion and conclusionsAs discussed in the introduction, this model representation may be useful

when first-principles modelling is resource-demanding or does not give satisfactory results. Today, ARMAX models are perhaps the most widely used

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


Simulation Parameters Variance: E[(£( I) - e)(£(1) - e]Tj

- 0·5055-1-85330·6256

-1·9255Simulation 1: ARMAX

fi=0·0755 [ 9·40 -1.77J X 10-4

model 1·4010 -1·77 0·340·8599

-0'1189-0·0336

_ 0·1758_

r- 0.5985- 0·3931-1·7812 1·9235

Simulation 2: NARMAX 0·7470 0·4694model with two local -1·3276 -2·9493ARMAX models. The

fit =0·0514

fi2 =0·1069 [ 2-10 -0.4OJ X 10-4

local models are fitted 1·2817 1·5959 -0·40 0·09separately using local 0·9842 0·8765prediction errors. -0·1043 -0·1875

-0·0486 -0·0410'- 0·1721_ _ 0·1890_

- 0·5771- - 0·3965-1·7859 1·9255

Simulation 3: NARMAX 0·9096 0·3176model with two local -1·4806 -3·3892ARMAX models. The

fi, =0·0185

fi2 =0·1381 [ 0·96 -0.19J 10-4

local models are fitted 1·2940,

1·6952 -0'19 0.04 xusing the global predic- 1·0534 0·8493tion error. -0·0549 -0·2342

-0·0534 -0'0418

- 0·1620_ - 0·1983_

0·6185 0·56591·7528 1·81310·8624 0·7364

-1·9076 -1·7990

fit =0·0276

fi2 =0·0538 [ 0·38 -0'08J x 10-4

1·1933,

1·3712 -0·08 0·021·0209 0·9867

-0·0573 -0'1131-0·0467 -0·0492

Simulation 4: NARMAX 0·1626 0,1737model with four localNARMAX models. Theglobal prediction error is 0·4643 0·2971

used. 1·8839 1·96960·5439 0·1970

-2·7944 -4·0579

fi, =0·0918

fi4 =0·1642

1·5668,

1·82750·9428 0·8471

-0·1685 -0·2624-0·0505 -0·0433

0·1847 0·2042

Table 2. Results for ARMAX and NARMAX model fitting.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


'1 _:~~ ~Sim. 1 a 100 :::00 300 400 300 600 700 800 900 1000

0.'

0.05

'2

_o.o~.0.1 0 100 ~oo 300 400 ",DO 600 700 800 <;>00 1000

o .•~=

'I_::~=Sim. 2 -0.1 0 loa ~oo 300 400 500 600 700 800 900 1000

'I _:~~~~'='I',...-.r-:~.........,...,~.. . "." ":' /1Sim. 4 0 1 00 ~oo :;JOO 800 900 1000

O·'E ~0.0",

'2 _0:': --:--~---a 100 1000

Figure 7. Prediction errors on the independent validation data for the resulting models of thefour simulations. Here £1 = y, - h. and £2 = Y2 - h·

black-box type model in industry. Neural networks have over the past few yearsgained considerable popularity, at least academically. The most popular feedforward type networks can be used to build black-box NARMAX models,typical examples can be found in the work of Chen et al. (1990 a) and Nguyenand Widrow (1990). Common to most types of neural network is the fact thatthe representation is very complex and difficult to understand and validate,which may explain why so few industrial applications are reported.

Decomposing the system operation into several operating regimes and usinglocal ARMAX models to describe each operating regime is appealing for thefollowing reasons.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


0.5

105_I '----__----L. --J

o

-0.5

-0.5

105_} '----__-----'- --J

o10

0.5

Figure 8. The curves show the autocorrelation functions for the prediction errors, for theresulting models of the four simulations.

(a) It is possible to integrate several kinds of system knowledge, includinglocal first principles models, with a black-box type model. ARMAXmodels are well understood and widely used in industry, and is hence aconvenient basis for building NARMAX models.

(b) The class of systems that can be represented is large, and a linearparametrization of the model is sufficient.

(c) The concept is straightforward, and the model structure is easy tounderstand. This is important, since the model structure can be easilyvalidated. In addition, some validation may be performed by validatingeach local model separately.

(d) Describing a system by means of operating regimes is common practicein engineering. Integrating the models for each operating regime byinterpolation have not been common so far, but seems to be a straightforward way to build models that are valid within several operatingregimes. The interpolation may improve the accuracy of the model,compared with piecewise linear models. In addition, the smoothness ofthe model is an inherent property.

A fundamental problem with all local modelling methods is the curse-ofdimensionality problem (Bellman 1961, Moody and Darken 1989, Tolle et al.1992, Friedman 1991). In our case, we have shown that this problem may

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


sometimes be reduced considerably, since the operating point may be of lowerdimension than the information vector.

To summarize, we have investigated how nonlinear systems can be modelledusing NARMAX models based on local ARMAX models. The main result isthat given a sufficient number of local models and well-defined operatingregimes, the system function can be approximated with arbitrary accuracy. Inpractice, noise and the amount of data available will limit the attainableaccuracy. Standard identification algorithms can easily be applied since theproposed representation is linear in the parameters. The empirical choice ofmodel validity function may however complicate the problem. Several practicalaspects of building such models are outlined and illustrated by a simulationexample.

The approach falls somewhat between first-principles modelling and pureblack-box modelling. Available local first principles models, as well as a prioriknowledge in terms of operating regimes, can be incorporated in the model.

ACKNOWLEDGMENTSThis work was supported by the Royal Norwegian Council for Scientific and

Industrial Research (NTNF) under Doctoral Scholarship Grant No. ST.10.12.221718, which was received by the first author.

REFERENCESAKAIKE, H. 1969, Fitting autoregressive models for prediction. Annals of the Institute of

Statistical Mathematics, 21, 243-247.BELLMAN, R. E., 1961, Adaptive Control Processes (Princeton, NJ: Princeton University

Press).BILLINGS, S. A., and VOON, W. S. F., 1986, Correlation based model validity tests for

non-linear models. International Journal of Control, 44, 235-244; 1987, Piecewiselinear identification of non-linear systems. International Journal of Control, 46,215-235.

BROOMHEAD, D. S., and LOWE, D., 1988, Multivariable functional interpolation and adaptivenetworks. Complex Systems, 2, 321-355.

CHEN, S., and BILLINGS, S. A., 1989, Representation of non-linear systems: the NARMAXmodel. International Journal of Control, 49. 1013-1032.

CHEN, S., BILLINGS, S. A., COWAN, C. F. N., and GRANT, P., 1990b, Practical identificationof NARMAX models using radial basis functions. International Journal of Control, 52,1327-1350.

CHEN, S., BILLINGS, S. A., and GRANT, P. M., 1990 a, Non-linear system identification usingneural networks. International Journal of Control, 51, 1191-1214.

CYROT-NoRMAND, D., and MIEN, H. D. V., 1980, Non-linear state-affine identificationmethods: application to electrical power plants. Proceedings of the IFAC Symposiumon Automatic Control in Power Generation, Distribution and Protection, pp. 449-462.

FORTESCUE, T. R., KERSHENBAUM, L. S., and YDSTIE, B. E., 1981, Implementation ofself-tuning regulators with variable forgetting factors. Automatica, 17, 831-835.

FRIEDMAN, J. H., 1991, Multivariable adaptive regression splines. The Annals of Statistics, 19,1-141.

GILL, P., MURRAY, W., and WRIGHT, M., 1981, Practical optimization (New York: AcademicPress).

HABER, R., 1985, Nonlinearity tests for dynamics processes. 7th IFAC Symposium onIdentification and System Parameter Estimation, York, U.K., pp. 409-414.

HILHORST, R. A., VAN AMERONGEN, J., and LoHNBERG, P., 1991, Intelligent adaptive control

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


of mode-switch processes. Proceedings of the IFAC International Symposium onIntelligent Tuning and Adaptive Control, Singapore.

JOHANSEN, T. A., and Foss, B. A., 1992 a, A NARMAX model representation for adaptivecontrol based on local models. Modeling, Identification, and Control, 13, 25-39;1992b. Nonlinear local model representation for adaptive systems. Proceedings of theSingapore International Conference on Intelligent Control and Instrumentation, Vol. 2,pp. 677-682: 1992 c, Representing and learning unmodeled dynamics with neuralnetwork memories. Proceedings of the American Control Conference, Chicago, Illinois,Vol. 3.

JONES, R. D., LEE, Y. c., BARNES, C. W., FLAKE, G. W., LEE, K., LEWIS, P. S., and QIAN,S., 1989, Function approximation and time series prediction with neural networks.Technical Report 90-21, Los Alamos National Laboratory, New Mexico.

JONES, et al., R. D., 1991, Nonlinear adaptive networks: a little theory, a few applications.Technical Report 91-273, Los Alamos National Laboratory, New Mexico.

LANE, S. H., HANOELMAN, D. A., and GELFANO, J. J., 1992, Theory and development ofhigher-order CMAC neural networks. IEEE Control System Magazine 12, 23-30.

LEONTARITIS, I. J., and BILLINGS, S. A., 1985, Input-output parametric models for non-linearsystems. International Journal of Control, 41, 303-344.

Mooov, J., and DARKEN, C. J., 1989, Fast learning in networks of locally-tuned processingunits. Neural Computation, 1, 281-294.

NGUYEN, D. H., and WIDROW, B., 1990, Neural networks for self-learning control systems.IEEE Control Systems Magazine, 10, 18-23.

OMOHUNDRO, S. M., 1987, Efficient algorithms with neural network behavior. ComplexSystems, 1, 273-347.

PARKUM, J. E., POULSEN, N. K., and HOLST, J., 1990, Selective forgetting in adaptiveprocedures. Proceedings of the 11th IFAC World Congress, Tallinn, Estonia.

POLYCARPOU, M. M., IOANNOU, P. A., and AHMED-ZAID, F., 1992, Neural networks andon-line approximators for discrete-time nonlinear system identification. IEEE Transactions on Control Systems Technology, submitted for publication.

POWELL, M. J. D., 1987, Radial basis function approximations to polynomials. 12th BiennalNumerical Analysis Conference, Dundee, pp. 223-241.

PRIESTLEY, M. B., 1981, Spectral Analysis and Time Series (London, U.K.: Academic Press);1988, Non-linear and Non-stationary Time Series Analysis (London, U.K.: AcademicPress).

PSICHOGIOS, D. c., DE VEAUX, R. D., and UNGAR, L. H., 1992, Nonparametric systemidentification: a comparison of MARS and neural nets. Proceedings of the AmericanControl Conference, Chicago, Illinois.

R'PPIN, D. W. T., 1989, Control of batch processes. Preprints of the 3rd IFAC DYCORD+ '89Symposium, Maastrict, The Netherlands, pp. 115-125.

SreLID, S" EGELAND, 0., and Foss, B., 1985, A solution to the blow-up problem in adaptivecontrollers. Modeling, Identification and Control, 6, 39-56.

SKEPPSTEDT, A., 1988, Construction of composite models from large data-sets. Ph.D. thesis,University of Linkoping, Sweden.

SKEPPSTEDT, A., LJUNG, L., and MILLNERT, M., 1992, Construction of composite models fromobserved data. International Journal of Control, 55, 141-152.

SODERSTROM, T., and STOICA, P., 1988, System Identification (Englewood Cliffs, NJ: PrenticeHall).

STOKBRO, K., 1991, Predicting chaos with weighted maps. Technical Report 91/10 S,NORDITA, Copenhagen.

STOKBRO, K., HERTZ, J. A., and UMBERGER, D. K., 1990, Exploiting neurons with localizedreceptive fields to learn chaos. Preprint 28, Niels Bohr Institute and NORDITA,Copenhangen; also: Journal of Complex Systems, submitted for publication.

STOKBRO, K., and UMBERGER, D. K., 1990, Forecasting with weighted maps. Proceedings ofthe 1990 Workshop on Nonlinear Modeling and Forecasting, Santa Fe Institute, NewMexico.

STROMBERG, K. R., 1981, An Introduction to Classical Real Analysis (Belmont, Calif:Wadsworth).

TAKAGI, T., and SUGENO, M., 1985, Fuzzy identification of systems and its application to

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3


modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, 15,116-132.

TOLLE, H., PARKS, P. C, ERSO, E., HaRMEL, M., and MILlTZER, J., 1992, Learning controlwith interpolating memories-general ideas, design lay-out, theoretical approaches andpractical applications. International Journal of Control, 56, 291-317.

TONG, H., and LIM, K. S., 1980, Threshold autoregression, limit cycles and cyclical data.Journal of the Royal Statistical Society, B, 42, 245-292.

Dow

nloa

ded

by [

171.

67.3

4.69

] at

05:

48 0

9 M

arch

201

3

Documents

Constructing NARMAX models using ARMAX models