
Deriving Monotonic Function Envelopes from Observations*



Herbert Kay
Department of Computer Sciences
University of Texas at Austin
Austin, TX 78712
bert@cs.utexas.edu

Abstract

Much work in qualitative physics involves constructing models of physical systems using functional descriptions such as "flow monotonically increases with pressure." Semiquantitative methods improve model precision by adding numerical envelopes to these monotonic functions. Ad hoc methods are normally used to determine these envelopes. This paper describes a systematic method for computing a bounding envelope of a multivariate monotonic function given a stream of data. The derived envelope is computed by determining a simultaneous confidence band for a special neural network which is guaranteed to produce only monotonic functions. By composing these envelopes, more complex systems can be simulated using semiquantitative methods.

Introduction

Scientists and engineers build models of continuous systems to better understand and control them. Ideally, these models are constructed based on the underlying physical properties of the system. Unfortunately, real systems are seldom well enough understood to construct precise models based solely on a priori knowledge of physical laws. Therefore, process data is often used to estimate some portions of the model.

Techniques for estimating a functional relationship between an output variable y and the input vector x typically assume that there is some deterministic function g and some random variate ε such that y = g(x) + ε, where ε is a normally distributed, mean-zero random variable with variance σ² that represents measurement error and stochastic variations. The estimate is computed by using a parameterized fitting

*This work has taken place in the Qualitative Reasoning Group at the Artificial Intelligence Laboratory, The University of Texas at Austin. Research of the Qualitative Reasoning Group is supported in part by NSF grants IRI-8905494, IRI-8904454, and IRI-9017047, by NASA contract NCC 2-760, and by the Jet Propulsion Laboratory.

Lyle H. Ungar
Department of Chemical Engineering
University of Pennsylvania
Philadelphia, PA 19104
ungar@cis.upenn.edu

function f(x; θ) and then using regression analysis to determine the values θ such that f(x; θ) ≈ g(x).

Traditional regression methods require knowledge of the form of the estimation function f. For instance, we may know that body weight is linearly related to amount of body fat. We may then determine that f(weight; θ) = weight · θ is an appropriate model. Neural network methods have been developed for cases where no information about f is known. This may be the case if f models a complex process whose physics is poorly understood. Such networks are known to be capable of representing any functional relationship given a large enough network. In this paper, we consider the case where some intermediate level of knowledge about f is known. In particular, we are interested in cases where we have knowledge of the monotonicity of f in terms of the signs of ∂f/∂x_k where x_k ∈ x. For example, we might know that outflow from a tank monotonically increases with tank pressure. This type of knowledge is prevalent in qualitative descriptions of systems, so it makes sense to take advantage of it.

Since the estimate is based on a finite set of data, it is not possible for it to be exact. We therefore require our estimate to have an associated confidence measure which takes into account the uncertainty introduced by the finite sample size. For our semiquantitative representation, we are therefore interested in deriving an envelope that bounds, with some probability P, all possible functions that could have generated the datastream.

This paper describes a method for estimating and computing bounding envelopes for multivariate functions based on a set of data and knowledge of the monotonicity of the functional relationship. First, we describe our method for computing an estimate f(x; θ) ≈ g(x) based on a neural network that is constrained to produce only monotonic functions. Second, we describe our method for computing a bounding envelope, which is based on linearizing the estimation function and then using F-statistics to compute a simultaneous confidence band. Third, we present several examples of function fitting and its use in semiquantitative simulation using QSIM. Next we discuss related


work in neural networks and nonparametric analysis, and finally we summarize the results and describe future work.

Bounded monotonic functions are a key element of the semiquantitative representation used by simulators such as Q2 [Kuipers and Berleant, 1988], Q3 [Berleant and Kuipers, 1992], and Nsim [Kay and Kuipers, 1993], which predict behaviors from models that are incompletely specified. To date, the bounds for such functions have been derived in an ad hoc manner. The work described in this paper provides a systematic method for finding these functional bounds. It is particularly appropriate for semiquantitative monitoring and diagnosis systems (such as MIMIC [Dvorak and Kuipers, 1989]) because process data is readily available in such applications. By combining our function bounding method with model abduction methods such as MISQ [Richards et al., 1992], we plan to construct a self-calibrating system which derives models directly from observations of the monitored process. Such a system will have the property that as more data is acquired from the process, the model and its predictions will improve.

Computing the Estimate

Computing the estimate of g requires that we make some assumptions about the nature of the deterministic and stochastic portions of the model. We assume that the relationship between y and x is y = g(x) + ε, where ε is a normally distributed random variable with mean 0 and variance σ². Other assumptions, such as different noise probability distributions or multiplicative rather than additive noise coupling, could be made. The above model, however, is fairly general, and it permits us to use powerful regression techniques for the computation of the estimate and its envelope. For situations where variance is not uniform, we can use variance stabilization techniques to transform the problem so that it has a constant variance.

In traditional regression analysis, the modeler supplies a function f(x; θ) together with a dataset to a least-squares algorithm which determines the optimal values for θ so that f(x; θ) ≈ g(x). The estimated value of y is then ŷ = f(x; θ̂). In our case, however, the only information available about f is the signs of its n partial derivatives ∂f/∂x_k where x_k ∈ x, so no explicit equation for f can be assumed. One way to work without an explicit form for f is to use a neural net function estimator. Figure 1 illustrates a network for determining ŷ given a set of inputs x. The network has three layers. The input layer contains one node for each element x_k and one bias node set to a constant value of 1¹. The hidden layer consists of a set of n_h nodes which are connected to each input variable as well as to the bias input. The output layer consists of a single node which is connected to all the hidden nodes as well as to another bias value which is fixed at 1. All nodes use sigmoidal basis functions and all connections are weighted. In our notation, w_i[i,j] represents the connection from input x_i to hidden node j and w_o[j,1] represents the connection from hidden node j to the output layer². This network represents the function

    ŷ = σ( S + w_o[n_h+1,1] ),    S = Σ_{j=1}^{n_h} w_o[j,1] σ( Σ_{i=1}^{n} w_i[i,j] x_i + w_i[n+1,j] )

where σ(x) is the sigmoidal function σ(x) = 1 / (1 + e^(-x)). We can compute the weights by solving the nonlinear least squares problem

    min_w Σ_i (y_i - ŷ_i)²

where ŷ_i = f(x_i; w) and w is a vector of all weights. Cybenko [Cybenko, 1989] and others have shown that with a large enough number of hidden units, any continuous function may be approximated by a network of this form.

Figure 1: A neural net-based function estimator. This three-layer net computes the function ŷ = σ( Σ_{j=1}^{n_h} w_o[j,1] σ( Σ_{i=1}^{n} w_i[i,j] x_i + w_i[n+1,j] ) + w_o[n_h+1,1] ).

One drawback of using this estimation function is

that it can overfit the given data. This results in the estimate following the random variate ε as well as the deterministic part of the model, which means that we get a poor approximation of g³. We therefore reduce the scope of possible functions to include only monotonic functions. To do this, note that if f is monotonically increasing in x_k, then ∂f/∂x_k must be positive.
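As a concrete sketch, the network function above can be evaluated directly from its formula. The weight values below are arbitrary illustrations (not fitted parameters), chosen so that every product w_o[j,1] · w_i[1,j] is positive and the function is therefore increasing in its single input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net_forward(x, w_i, w_o):
    """Evaluate y_hat = sigma(sum_j w_o[j] * sigma(w_i[:,j] . x + bias_j) + out_bias).

    x   : inputs, shape (n,)
    w_i : shape (n + 1, nh); the last row holds the hidden-bias weights w_i[n+1, j]
    w_o : shape (nh + 1,); the last entry is the output-bias weight w_o[nh+1, 1]
    """
    hidden = sigmoid(x @ w_i[:-1] + w_i[-1])      # one sigmoid per hidden node
    return sigmoid(hidden @ w_o[:-1] + w_o[-1])   # outer sigmoid plus output bias

# Arbitrary weights for n = 1 input and nh = 2 hidden nodes.
w_i = np.array([[0.8, 1.2],     # input-to-hidden weights
                [0.1, -0.3]])   # hidden-bias weights
w_o = np.array([1.5, 0.7, -0.2])
y_a = net_forward(np.array([0.0]), w_i, w_o)
y_b = net_forward(np.array([1.0]), w_i, w_o)   # larger input gives larger output
```

Because of the outer sigmoid, the raw network output lies in (0, 1); in practice inputs and targets would be scaled accordingly.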

¹The bias terms permit the estimation function to shift the center of the sigmoid.

²The weight w_i[n+1,j] represents the connection to the input bias and the weight w_o[n_h+1,1] represents the connection from the hidden layer bias node to the output node.

³Visually, the estimate will try to pass through every datapoint.


By constraining the weights, we can force this derivative to be positive for all x, ensuring that the resulting function is monotonic. The derivative in question is

    ∂f/∂x_k = σ'( S + w_o[n_h+1,1] ) · p,    p = Σ_{j=1}^{n_h} w_o[j,1] σ'( Σ_{i=1}^{n} w_i[i,j] x_i + w_i[n+1,j] ) w_i[k,j]

Since the derivative of the sigmoid function is positive for all values of its domain, ∂f/∂x_k will be positive if p is positive. This will be the case if ∀ 1 ≤ j ≤ n_h : w_o[j,1] · w_i[k,j] ≥ 0. If the partial derivative is negative, then the inequality is reversed.
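The sign condition on the weights is easy to check mechanically. The helper below is hypothetical (not from the paper), but implements exactly the stated test: for each input k and hidden node j, the product w_o[j,1] · w_i[k,j] must carry the required sign or be zero.

```python
import numpy as np

def satisfies_monotonicity(w_i, w_o, signs):
    """True iff signs[k] * w_o[j] * w_i[k, j] >= 0 for every input k and hidden node j.

    signs[k] is +1 where f must increase in x_k and -1 where it must decrease.
    w_i has one row per input (any trailing bias row is ignored because signs
    is shorter); the trailing output-bias entry of w_o is ignored.
    """
    nh = len(w_o) - 1
    for k, sgn in enumerate(signs):
        for j in range(nh):
            if sgn * w_o[j] * w_i[k, j] < 0:
                return False
    return True

w_i = np.array([[0.8, 1.2],    # weights from x_1 to the two hidden nodes
                [0.1, -0.3]])  # hidden-bias row (not constrained)
w_o = np.array([1.5, 0.7, -0.2])
ok_up = satisfies_monotonicity(w_i, w_o, [+1])    # f increasing in x_1: holds
ok_down = satisfies_monotonicity(w_i, w_o, [-1])  # f decreasing in x_1: violated
```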

This set of additional constraints on the weights causes the network to produce only monotonic functions. We may determine the values of the weights by solving the problem

    min_w Σ_i (y_i - ŷ_i)²

    subject to ∀ 1 ≤ j ≤ n_h, 1 ≤ k ≤ n : w_o[j,1] · w_i[k,j] ≥ 0.

These n_h · n constraints transform the least squares problem into a constrained nonlinear optimization problem. While more difficult to solve than the unconstrained problem, there are still a number of solutions based on numerical methods. In our work we use the SQP algorithm [Biegler, 1985; Biegler and Cuthrell, 1985].

To compute the estimate, we need to find the best value for n_h. We determine n_h by repeatedly solving the optimization problem for increasing values of n_h, starting at 1 and continuing until there is no improvement in the sample standard error

    s² = Σ_i (y_i - ŷ_i)² / (n - p)

For our examples, n_h runs around 2.

Figure 2 shows the result of using the above method to estimate a fit to a datastream derived from a quadratic function with noise from a normal distribution with σ² = 4. Note that the estimate is in fact monotonic and does not follow the noise.
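The search over n_h can be sketched with an off-the-shelf constrained optimizer. This univariate sketch uses SciPy's SLSQP in place of the SQP code of Biegler cited above; the weight layout, helper names, and synthetic data are illustrative assumptions, not the authors' implementation. The targets are generated inside (0, 1) to match the outer sigmoid of the network.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def unpack(w, nh):
    # layout: input weights a (nh), hidden biases b (nh), output weights c (nh), output bias d
    return w[:nh], w[nh:2 * nh], w[2 * nh:3 * nh], w[3 * nh]

def predict(w, x, nh):
    a, b, c, d = unpack(w, nh)
    return sigmoid(sigmoid(np.outer(x, a) + b) @ c + d)

def sse(w, x, y, nh):
    return np.sum((y - predict(w, x, nh)) ** 2)

def fit(x, y, nh):
    # monotonicity constraints: c_j * a_j >= 0 for every hidden node (increasing f)
    cons = [{"type": "ineq",
             "fun": lambda w, j=j: unpack(w, nh)[2][j] * unpack(w, nh)[0][j]}
            for j in range(nh)]
    best = None
    for _ in range(5):  # random restarts to dodge local minima
        res = minimize(sse, rng.normal(size=3 * nh + 1), args=(x, y, nh),
                       method="SLSQP", constraints=cons)
        if best is None or res.fun < best.fun:
            best = res
    return best

x = np.linspace(0, 1, 50)
y = sigmoid(4 * x - 2) + rng.normal(scale=0.05, size=x.size)  # noisy monotone data

prev_err = np.inf
for nh in range(1, 4):           # grow nh until s^2 stops improving
    res = fit(x, y, nh)
    p = 3 * nh + 1               # number of weights for this nh
    s2 = res.fun / (x.size - p)  # sample standard error
    if s2 >= prev_err:
        break
    prev_err, best_nh = s2, nh
```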

Computing the Envelope

The estimate computed in the previous section is affected by the sample dataset. Since this dataset is of finite size, the estimate generated from it will not be precisely correct. In this section, we describe our method for bounding our estimate with an envelope that captures the uncertainty introduced by using a finite set of data.

Figure 2: Fitting a neural net estimator to a datastream of 100 points. The data was generated by adding noise with a variance of 4 to the curve y = x² + 5 (shown as a dashed line). The estimated function is shown as a solid line. There are two hidden nodes.

The typical confidence estimate used in regression analysis is the confidence interval, which is defined at a point x as P(f(x) - b_x ≤ g(x) ≤ f(x) + b_x) = 1 - α, where b_x depends on x. The confidence interval is a point probability measure since it expresses the uncertainty of the estimated value f(x) at a single point x in the domain. Since we wish our envelope to bound all possible functions, we require that our confidence interval hold simultaneously at all points in the domain. This measure (called a confidence band) can easily be computed for linear regression models. Since our model is nonlinear, we use a linearization of f to form an approximate confidence band.

Assume that we have a linear model f(x; β) = xᵀβ

where β is a parameter vector of length p and β̂ is our estimate of the parameters. If we represent the datastream as a matrix [Y | X] where the ith row represents a single sample datapoint (y_i, x_i), it can be shown [Bates and Watts, 1988] that the 1 - α confidence band for f is

    xᵀβ̂ ± s sqrt( p F(α; p, n - p) ) ‖xᵀR⁻¹‖

where s² is the sample standard error, F is the α quantile of the F-statistic with p and n - p degrees of freedom, and R is the square portion of the QR decomposition of the array of sample inputs X (X = Q [R; 0]). Geometrically, the columns of X form a p-dimensional linear subspace called the expectation surface in which the solution must lie. The least squares computation finds the point on the expectation surface that is closest to Y.

To use this result, we must linearize the nonlinear problem as follows. Note that the entry x_np of X is ∂(x_nᵀβ)/∂β_p. Using this, we linearize our estimator by defining a matrix V such that v_np = ∂f(x_n; ŵ)/∂w_p and a vector v such that v_p = ∂f(x; ŵ)/∂w_p. The envelope is then defined by

    f(x; ŵ) ± s sqrt( p F(α; p, n - p) ) ‖vᵀR_v⁻¹‖

where V = Q_v [R_v; 0]. The linearization can be viewed as defining a plane which is tangent to the true expectation surface (which is not a linear subspace) at ŵ. Assuming that ŵ is close to the exact value for w, the linearization will hold near this point on the plane.
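The band machinery is easiest to see in the linear case, where X itself plays the role of V. The sketch below is a hypothetical helper (not the paper's code) implementing the xᵀβ̂ ± s·sqrt(p·F)·‖xᵀR⁻¹‖ form with NumPy's QR decomposition and SciPy's F quantile.

```python
import numpy as np
from scipy.stats import f as f_dist

def simultaneous_band(X, y, alpha=0.05):
    """Scheffe-type 1 - alpha simultaneous confidence band for y = X beta + eps."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)             # sample standard error s^2
    _, R = np.linalg.qr(X)                   # X = Q [R; 0]
    Fq = f_dist.ppf(1 - alpha, p, n - p)     # F quantile with (p, n - p) dof
    Rinv = np.linalg.inv(R)
    def band(x):
        half = np.sqrt(s2) * np.sqrt(p * Fq) * np.linalg.norm(x @ Rinv)
        fit = x @ beta
        return fit - half, fit + half
    return band

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
X = np.column_stack([np.ones_like(x), x])    # intercept + slope
y = 5 + 0.5 * x + rng.normal(scale=2.0, size=x.size)
band = simultaneous_band(X, y)
lo5, hi5 = band(np.array([1.0, 5.0]))        # band at the center of the data
lo0, hi0 = band(np.array([1.0, 0.0]))        # band at the left edge
```

As in the figures later in the paper, the band is widest at the edges of the data, where the leverage term ‖xᵀR⁻¹‖ is largest.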

Because the linearization is only an approximation, this estimate does not provide an exact 1 - α confidence band for f. However, the result is approximately correct, and the quality of the approximation depends on the degree of nonlinearity of the expectation surface [Bates and Watts, 1988].

We use the LINPACK subroutine DQRDC to compute the QR decomposition of V. This method performs the decomposition using pivoting so that R_v is both upper triangular and has diagonal elements whose magnitude decreases going down the diagonal. With this form, we can easily recognize the case where the linearized matrix V may not have full rank (i.e., only q < p columns are linearly independent). This in turn means that R_v has zeros in all diagonals past column q. In such cases, the product vᵀR_v⁻¹ may not have a solution. To handle this problem, we simply reduce the size of R_v by using only its upper left q × q corner. We must then also reduce the size of v so that it does not contain partials with respect to w_i where i > q. This is justified since, if V forms a p-dimensional basis for the linearized expectation surface and its rank is only q, then the w_i where i > q may be ignored because their values are arbitrary.

The confidence band is designed to cover all possible curves that fit the data with a probability of 1 - α. With traditional regression, this confidence band is the most precise description we can achieve given the general form of f. Using monotonicity information, however, we can further refine our prediction to derive a tighter envelope by ruling out portions of the curves that could not be monotonic. Consider the confidence band shown in Figure 3. If we know that the underlying function is monotonic, we may remove the shaded regions from the prediction since no monotonic function within the confidence band could pass through these regions. Note that this final step could not be performed if we used point confidence intervals.
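On a grid, this trimming step is two running extrema: no increasing function can dip below the highest lower bound seen to its left, nor rise above the lowest upper bound remaining to its right. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def trim_band(lower, upper):
    """Tighten a pointwise band around a monotonically increasing function."""
    lo = np.maximum.accumulate(lower)              # running max from the left
    hi = np.minimum.accumulate(upper[::-1])[::-1]  # running min from the right
    return lo, hi

lower = np.array([0.0, 0.5, 0.3, 1.0])
upper = np.array([2.0, 1.5, 1.8, 1.6])
lo, hi = trim_band(lower, upper)
# lo -> [0.0, 0.5, 0.5, 1.0]; hi -> [1.5, 1.5, 1.6, 1.6]
```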

Examples

We have applied our envelope method to noisy datasets whose underlying functions are linear, quadratic, square root, and exponential. In order to give a feeling for the form of the envelopes generated, we present several of these test cases. Each result is for a univariate monotonically increasing model function f(x).

Figure 3: Reducing the size of a confidence band using monotonicity information. If we know that the underlying function is monotonically increasing, we can rule out the shaded areas of the figure as part of the envelope.

Figure 4 shows the estimate and envelope for a set of 100 samples drawn from the function y = 0.5x + 5 with additive noise of variance 4. Note that the envelope expands at the ends since there is less data there to constrain the estimate.

Figure 4: 95% envelope (solid lines) for the linear function y = 0.5x + 5 (dashed line). The estimator has 3 hidden nodes. Sample variance is 4.520.

The second example is that of a quadratic function with noise of variance 4. This dataset demonstrates the effect of nonuniform sampling. In areas where there is little data, the envelope bulges outward to compensate. Note also that the upper portion of the envelope is narrowest at the far end of the plot. This is due to bias in the estimation function, which assumes that the curve will "heel over" just past the end of the dataset. For this reason, the computed envelope is valid only over the range of the given data.

In the third example, the envelope method is used to create a semiquantitative model which is then simulated using QSIM to predict the behavior of a real physical system. To develop this example, we took a small plastic tank and drilled a hole in its bottom. Our goal was to construct a semiquantitative model that could be used to predict the level of fluid in the tank over time. The standard approach to constructing such a model in QSIM is to construct a qualitative model and then augment it with numerical information about the monotonic functions and parameters. Normally, these numerical envelopes are ad hoc in that they are hand-derived by the modeler. By applying the envelope method to a data stream from the system, we can construct these envelopes in a more principled way.

Figure 5: 95% envelope (solid lines) for the quadratic function y = x² + 5 (dashed line). The estimator has two hidden nodes. Sample variance is 4.227.

We require that the model prediction bound all real behaviors of the system (i.e., that the true behavior is within the predicted bounds) with probability 1 - α. Semiquantitative models are particularly good at this because they produce a bounded solution. We begin by asserting the qualitative part of the model, A' = -f(A), f ∈ M+, which asserts that the derivative of amount (A) is a monotonic function of amount. Since we are interested in level rather than amount, we also assert that g(L) = A, g ∈ M+, i.e., that level (L) and amount are monotonically related. The complete qualitative model is thus

    A' = -f(g(L)),    f, g ∈ M+

This qualitative description holds for a wide class of tanks. While it is unquestionably true for our tank, it is not specific enough to produce useful predictions. To increase the precision of the prediction, we must further define f(A) and g(L). To determine these functions, we performed the following experiment. We filled the tank with water and allowed it to drain while measuring flow (cc/sec), level (cm), and amount (cc). Figures 6 and 7 show the experimental data together with the 95% confidence envelopes. With this information, we used the QSIM semiquantitative simulation techniques Q2 [Kuipers and Berleant, 1988] and Nsim [Kay and Kuipers, 1993] to produce a prediction. Since QSIM already reasons with monotonic envelopes, there are no special techniques required to handle the composed function f(g(L)).

Figure 8 shows the results of the simulation together with experimental data for time vs. level. Since the width of the monotonic function envelopes is a function of the datastream length, by observing the physical system for a longer time period we would expect the envelopes to tighten further, thus yielding tighter predictions.

Figure 6: 95% envelope for level vs. flow data from a tank. The estimator has two hidden nodes. Sample variance is 0.441.

Figure 7: 95% envelope for the amount vs. level data. The measurement accuracy for this data was greater than for the level vs. flow data, hence the envelope is narrower. The estimator has two hidden nodes. Sample variance is 0.003.

Related Work

This work is related to several approaches to function estimation using neural networks. Cybenko [Cybenko, 1989] has shown that any continuous function can be represented with a neural net similar to the one we use. With our method, we assume a priori that the true function is monotonic. This assumption restricts our attention to a smaller class of functions, which means that less data is needed to compute a reasonable estimate. Additionally, we compute a confidence measure on our estimate in the form of an envelope.

Figure 8: Prediction envelope for the tank model using the previous monotonic envelopes for f and g. Note that sample measurements taken from the system are all contained within the envelope. The predictions of this model could be strengthened with increased measurements, which would serve to reduce the monotonic envelopes.

The VI-NET method [Leonard et al., 1992] also uses a neural network for computing estimates of general functions. It is notable in that it provides a confidence measure on its estimate and can determine when it is being asked to extrapolate. By using radial basis functions instead of sigmoidal ones, it is able to handle different variances across the function. This is especially useful in applications where there is no a priori information about g. Because it allows non-constant variances, it would be difficult to get a VI-NET to return simultaneous confidence bands, which are important since we wish to bound all curves that could have generated the datastream. In contrast, our approach can only handle fixed-variance problems; but since variance will often track either x or y, we can make monotonicity assumptions about σ² if it should prove necessary. Under such circumstances, variance stabilization techniques [Draper and Smith, 1981] should prove useful in fixing the variance.

This work is also related to monotonic function estimation [Kruskal, 1964; Hellerstein, 1990], particularly the NIMF estimator [Hellerstein, 1990], which also determines envelopes for multivariate monotonic functions. NIMF allows for a more general noise model (zero mean, symmetrically distributed) and uses a non-parametric statistical method for determining point confidence bounds. These bounds, however, are much weaker than those derived with methods that first compute an estimate. This is not surprising, since we are assuming more about the noise than NIMF does. Finally, NIMF produces point bounds only, and it is unclear how these bounds could be made into confidence bands of reasonable width.

Discussion and Future Work

This paper has described a method for computing envelopes for functions described solely by monotonicity information and a stream of data. It improves over existing methods for function estimation by providing a simultaneous confidence band that encloses all functions that could have generated the datastream. Using monotonicity information provides several benefits to function estimation:

" Much less data is required to obtain a reasonablyprecise model.

" When combined with simultaneous confidencebands, portions of the band can be eliminated, thusfurther improving model precision.

While the class of monotonic functions may seem restrictive for modeling continuous systems, we can represent many non-monotonic functions as compositions of monotonic ones. For example, in a system of two cascaded tanks, the level of water in the lower tank is typically a non-monotonic function of time. The flow into the lower tank, however, is the composition of a monotonically increasing function of the amount of water in the upper tank and a monotonically decreasing function of the water in the lower tank. By using semiquantitative simulation methods, we can represent this composition and thus simulate systems with non-monotonic behaviors.
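Composing bounded monotone functions reduces to interval bookkeeping: an increasing envelope maps interval endpoints directly, while a decreasing one swaps them. The sketch below is illustrative only; the envelope curves and the cascaded-tank quantities are hypothetical stand-ins, not fitted models.

```python
def mono_interval(fn_lo, fn_hi, lo, hi, increasing=True):
    """Propagate the interval [lo, hi] through a bounded monotone function.

    fn_lo / fn_hi are the envelope's lower and upper curves. For an
    increasing function the endpoints map directly; for a decreasing
    function they swap.
    """
    if increasing:
        return fn_lo(lo), fn_hi(hi)
    return fn_lo(hi), fn_hi(lo)

# Hypothetical envelopes: upper-tank outflow increases with its amount;
# a head-dependent term decreases with the lower tank's amount.
out_lo, out_hi = mono_interval(lambda a: 0.09 * a, lambda a: 0.11 * a,
                               100.0, 120.0)                      # increasing
back_lo, back_hi = mono_interval(lambda a: 50.0 - 0.06 * a,
                                 lambda a: 50.0 - 0.04 * a,
                                 200.0, 300.0, increasing=False)  # decreasing
inflow = (out_lo + back_lo, out_hi + back_hi)  # bounds on net inflow
```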

The method is applicable for cases where sample variance is fixed. Future work includes adding variance stabilization methods to handle cases where the data has a non-constant variance. We also plan to add the ability to impose further constraints, such as stating that the function must pass through zero.

Our method for deriving bounds for monotonic functions plays a key role in the construction of semiquantitative models, especially for monitoring and diagnosis tasks where process data is readily available. Because the precision of the resulting envelopes is a function of the amount of data used to compute them, our bounding method also provides a systematic method for shifting the precision of a semiquantitative model along a continuum from purely qualitative to exact.

Nomenclature

x          The domain of the function (y = g(x)).
n          The dimension of x.
n_h        The number of hidden units in the network.
p          The number of weights in the network (a total of (n + 2)n_h + 1).
w_i[i,j]   The weight from x_i to hidden unit j.
w_o[j,1]   The weight from hidden unit j to the output.
w          A vector of all weights.
ŷ          The estimate of y.

References

Bates, Douglas M. and Watts, Donald G. 1988. Nonlinear Regression Analysis and Its Applications. John Wiley and Sons, New York.

Berleant, Daniel and Kuipers, Benjamin 1992. Combined qualitative and numerical simulation with Q3. In Faltings, Boi and Struss, Peter, editors, Recent Advances in Qualitative Physics. MIT Press.

Biegler, L. T. and Cuthrell, J. E. 1985. Improved infeasible path optimization for sequential modular simulators - II: the optimization algorithm. Computers and Chemical Engineering 9(3):257-267.

Biegler, L. T. 1985. Improved infeasible path optimization for sequential modular simulators - I: the interface. Computers and Chemical Engineering 9(3):245-256.

Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 2:303-314.

Draper, Norman R. and Smith, H. 1981. Applied Regression Analysis (second edition). John Wiley and Sons, New York.

Dvorak, Daniel Louis and Kuipers, Benjamin 1989. Model-based monitoring of dynamic systems. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89). 1238-1243.

Hellerstein, Joseph 1990. Obtaining quantitative predictions from monotone relationships. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90). 388-394.

Hertz, John; Krogh, Anders; and Palmer, Richard G. 1991. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City.

Kay, Herbert and Kuipers, Benjamin 1993. Numerical behavior envelopes for qualitative models. In Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93). To appear.

Kruskal, J. B. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1):1-27.

Kuipers, Benjamin and Berleant, Daniel 1988. Using incomplete quantitative knowledge in qualitative reasoning. In Proceedings of the Seventh National Conference on Artificial Intelligence (AAAI-88). 324-329.

Leonard, J. A.; Kramer, M. A.; and Ungar, L. H. 1992. Using radial basis functions to approximate a function and its error bounds. IEEE Transactions on Neural Networks 3(4):624-627.

Richards, Bradley L.; Kraan, Ina; and Kuipers, Benjamin 1992. Automatic abduction of qualitative models. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92). 723-728.
; Kraan, Ina; and Kuipers, Ben-jamin 1992 . Automatic abduction of qualitative mod-els. In Proceedings of the Tenth National Conferenceon Artificial Intelligence (AAAI-92). 723-728.