Forcing Function Diagnostics for Nonlinear Dynamicsfaculty.bscb.cornell.edu/~hooker/forcingpaper.pdfForcing Function Diagnostics for Nonlinear Dynamics Giles Hooker Abstract This paper

Forcing Function Diagnostics for NonlinearDynamics

Giles Hooker

Abstract

This paper investigates the problem of model diagnostics for sys-tems described by nonlinear differential equations. We study lack offit through the estimation of forcing functions – external inputs thatprovide the desired behavior. We derive lack of fit tests based onthese functions and study the problems associated with diagnosticsfor partially observed systems. Some observations are made concern-ing model building and identifiability. The methods are illustratedwith examples from computational neuroscience.

1 Introduction

Recent research has seen an increase in interest in fitting nonlinear differentialequations to data. Many systems of differential equations used to describereal-world phenomena are developed from first-principles approaches; eitherthrough conservation laws or by rough guesses. Such a priori modeling haslead to proposed models that mimic the qualitative behavior of observedsystems, but have poor quantitative agreement with empirical measurements.There has been relatively little literature on the development of interpretabledifferential equations models from an empirical point of view. The need forgood diagnostics has been cited a number of times, for example in Ramsayet al. (2007) and associated discussion.

This paper considers the problem of performing goodness of fit diagnosticsand model improvement when a set of deterministic nonlinear differentialequations has been proposed to explain observed data. We describe thestate of a system by a vector of k time-varying quantities x = {x1, . . . , xk}.

1

An ordinary differential equation (ODE) describes the evolution of x(t) byrelating the rate of change of x to the current state of the system

d

dtx(t) = f(x(t); t,θ). (1)

Here f : Rk → Rk is a known vector-valued function that depends on a finiteset of unknown parameters θ. The inclusion of t as an argument to f in (1)allows this dependence to vary with time. We assume that the nature of thisvariation is deterministic and specified up to elements of θ. Time-dependentvariation is included to allow known external inputs to affect x, for example.

Such systems have been used in many areas of applied mathematics andscience. It has been shown that even quite simple systems can producehighly complex behavior. They also represent a natural way of developingsystems from first-principles or conservation laws and frequently representthe appropriate scale on which to describe how a system responds to anexternal input. We describe such systems as being modeled on the derivativescale as opposed to on the scale of the observations.

The central observation in this paper is that since differential equationsare modeled on the derivative scale, lack of fit for these equations must bemeasured on the same scale. We illustrate this with a thought experiment.Suppose we have continuously observed a system y(t) without error and havepostulated a model of the form (1) to describe it. The appropriate measureof lack of fit is

g(t) =d

dty(t)− f(y(t); t,θ).

Throughout this paper we will maintain the convention that y indicates ob-servations of a system, while x represents solutions to a differential equationused to describe it. In practice, we do not have continuous, noiseless obser-vations of a system, but instead need to estimate y from a discrete set ofdata.

Re-arranging the equation above, we can express the observed system as:

d

dty(t) = f(y(t); t,θ) + g(t). (2)

g(t), representing lack of fit, appears as a collection of what may be termedempirical forcing functions. The term empirical is applied here to distinguishthem from known inputs already accounted for in the specification of f. Theseact as an external influence on y(t) which responds in terms of d

dty(t). The

2

basic task of diagnostics now becomes the estimation of g(t). This may becomplicated by knowledge (or lack of knowledge) about unmeasured compo-nents of y(t) and where fit may be lacking. Once g(t) has been estimated,the usual diagnostic procedures may be undertaken. The statistical signifi-cance of g(t) may be assessed and it may then be examined graphically forrelationships with various measured variables. Any apparent relationshipsmay then be accounted for by modifications to the model structure and themodified system re-estimated.

We note that equation (2) represents a stochastic differential equation ifg(t) is regarded as being random. Stochastic processes represent a violation ofthe model specification (1), and make up part of the alternatively hypothesisfor the goodness of fit tests we describe below. However, distinguishingbetween g being the realization of a stochastic process and being purely dueto a deterministic mis-specification of f is beyond the scope of this paper.

1.1 Neural Spike Data

In order to demonstrate the ideas from this paper in practice, we examinethe results of a common experiment in neurobiology. Figure 1 gives readingstaken from a voltage clamp experiment in which transmembrane potential ofthe axon of a single zebrafish motor neuron in response to an electric stimulusis recorded. When the stimulus exceeds a thresh-hold value, neurons respondby initiating the sharp oscillations that can be seen in Figure 1. There is along literature on models that aim to describe neural behavior; see Wilson(1999) or Clay (2005) for an overview. These models range from very simpleFitzHugh-Nagumo type systems (FitzHugh, 1961; Nagumo et al., 1962) tohighly complex models originating with Hodgkin and Huxley (1952) involvingmany ion channels that appear as dynamical components.

The task we pose is to develop a tractable modification of simple modelsthat provides a reasonable agreement with the shape of the spikes foundin the data. As a first modeling step, we use a polynomial whose termsare motivated from equation (9.7) in Wilson (1999). There, the model wasdeveloped with the aim of mimicking the turning points of a physiological

3

system. Our equations take the form:

d

dtV = p1 + p2V + p3V

2 + p4V3 + p5R + p6V R (3)

d

dtR = p7 + p8V + p9R (4)

for unknown parameters p1, . . . , p9. We observe that the form of equations(3) is invariant under linear transformations of V and R. We have used thisproperty to derive initial guesses for the parameters by taking the parametersgiven in Wilson (1999) and transforming them to approximately match thepeaks and troughs found in the data.

Figure 1 presents a plot of a short sequence of these data along with thebest fit from these equations. It is clear that while there is a qualitativelygood agreement between the model and the data, some systematic lack of fitis evident. This paper develops tests to determine the significance of lack offit and visual tools for model improvement.

1.2 Outline

This paper begins with a discussion of estimating lack of fit, illustrated withexamples, in Section 2. We develop computationally inexpensive approxi-mate goodness-of-fit tests to be used as initial diagnostics in Section 3 andgo on to discuss visual tools for model development in Section 4. Some iden-tifiability problems and possible solutions to them are discussed in Section5. We provide an example in the form of model development for the spikingbehavior of zebrafish neurons in Section 6. We suggest further directions forresearch in section 7.

2 Estimating Lack of Fit

The central technique in the paper is the estimation of a set of forcing func-tions g(t) as in (2). y(t) and its derivatives are not known exactly; ratherwe have access to a set of noisy observations yobs = {yobs

ij }k,ni

i,j=1 where yij isa measurement of the ith component of the system at time tij. We take

y(t) = {y(tij)}k,ni

i,j=1 to be the corresponding estimated values of the system,

and ‖yobs − y(t)‖ to be a (possibly weighted) norm between them. We notethat it is common that some of the components of y have no observations.

4

Figure 1: Zebrafish data. Dots provide observed values of trans-membranepotential, lines the results of a least-squares fit of equations (3) and (4) tothese data.

5

g(t) is represented as a basis function expansion:

g(t) = Φ(t)d. (5)

where Φ(t) is a vector containing the evaluation of a set of m basis expansions{φ1(t), . . . , φm(t)} at time t. In the examples below, the φi are given by cubicB-splines with equi-spaced knots. d is then a m × k matrix providing thecoefficients of each of m basis functions for each of k forcing components.d may be estimated via any parameter estimation scheme for differentialequations.

As is the case for other methods based on basis expansions, the varianceresulting from estimating a larger number of coefficients d may be regulatedeither by limiting the number of basis functions or by directly penalizing thesmoothness of g(t). The goodness of fit tests developed below correspond tothe latter approach. However, in many systems, the numerical challenges inestimating g(t) already limit the number of basis functions that can be used.

2.1 Estimating Parameters in a Differential Equation

Parameter estimation techniques in differential equation models is a difficulttopic with a large associated literature. Solutions to (1) typically do not haveanalytic expressions, necessitating the use of sometimes computationally-intensive numerical approximation methods. Further, such solutions aregiven only up to a set of initial conditions x0 = x(t0). When x0 is unknown,the set of parameters must be augmented to include it.

This paper largely aims to be generic in its approach and the diagnosticprescriptions here may be tied to any parameter estimation scheme. Theinitial tests for goodness of fit described in this paper make use of a straight-forward implementation of nonlinear least squares (NLS), described below.

In this NLS scheme, solutions x(t; θ,x0) to (1) are approximated for anyparameter vector θ and initial conditions x0. A Gauss-Newton method isthen used to estimate both. In order to make use of a Gauss-Newton solver,gradients are calculated by solving an enlarged system to include the sensi-tivity equations (see Atherton et al., 1975; Leis and Kramer, 1988):

d

dt

xvec

(dxdθ

)vec

(dxdx0

) =

f(x; t, θ)

vec(

∂f∂θ

+ ∂f∂x

dxdθ

)vec

(∂f∂x

dxdx0

) (6)

6

Here the vec(A) operator vectorizes the matrix A by concatenating its columns.Each row of (6) may be derived by a straight-forward chain rule. The fol-lowing initial conditions may also be given:

x(0) = x0,dx

dθ

∣∣∣∣t=0

= 0,dxi

dx0,j

∣∣∣∣t=0

=

{1 if i = j0 otheriwise.

(7)

Both x(t; θ,x0) and its derivatives with respect to θ and x0 may be obtainedby solving the expanded system (6 - 7).

There are many methods for approximating solutions to ODEs and theiraccuracy can depend on the nature of the system to be estimated; see Deu-flhard and Bornemann (2000). The simulations below have been undertakenusing a a Runge-Kutta method implemented in Matlab. However, thesewere insufficiently accurate for the zebrafish data above; details of numericalmethods used for these data are given in Section 6.

When the θ are replaced by d in (5), we replace the second line of (6) by:

d

dt

dy

dd=

df

dx

dy

dd+ Φ(t). (8)

For the purposes of visualization, it is frequently useful to smooth the es-timate of g(t). When a standard roughness penalty is placed on g, thepenalized nonlinear least squares problem becomes

(yobs(t)− y(t))2 + λk∑

i=1

dTi Pdi (9)

where the di are columns of d and P is a penalty matrix. This can be achievedsimply by augmenting the set of squared errors in the Gauss-Newton solver.

This estimation scheme ignores the many numerical problems associatedwith parameter estimation in differential equations. These include controllingfor numerical error in the Runge-Kutta scheme and the problem of minimiz-ing over rough response surfaces. The author’s experience is that the schemeabove provides good answers when initial parameter estimates are quite closeto the true value. However it frequently breaks down when started from pa-rameter estimates that are moderately far from the truth and either findsa local optimum or a set of parameters for which the ODE system is notnumerically solvable.

There are several methods available that may be used to overcome theseproblems. Stochastic search techniques such as simulated annealing (Jaeger

7

et al., 2004) or Markov Chain Monte Carlo methods (Huang et al., 2006)have been employed to try to find global minima in the response surface.Other methods include the estimation of parameters while attempting tosolve (1) at the same time; see, for example, Tjoa and Biegler (1991). Thislast approach has been used in Section 6.

Alternative methods rely on pre-smoothing steps; Varah (1982) proposesestimating a smooth y to the data and choosing θ to minimize:

k∑i=1

∫ (d

dtyj(t)− fj(y(t); t,θ)

)2

dt (10)

This method requires all components of y to be measured with high resolutionand precision and may still suffer from bias due to the smoothing methodused. More recent techniques include intermediate methods such as Ramsayet al. (2007), which which replace the penalty in (9) with (10), and Ionideset al. (2006), based on particle filtering techniques.

2.2 Recovering a Forced Linear System

As an initial experiment, we use a two-component linear differential equation:[x1

x2

]=

[−4 8−4 4

] [x1

x2

]+

[g(t)0

]in which g(t) takes the form of a stepped interval:

g(t) =

{5 if t < 50 otherwise

We assume the equation is known exactly and attempt to reproduce g non-parametrically from noisy data. The system was run with initial conditions(x1, x2) = (−1, 1), observations were taken every 0.05 time units on the in-terval [0,20] and gaussian noise with variance 0.25 was added to the observedsystem.

Figure 2 provides the result of estimating g(t) using the methods de-scribed above. In order to regularize the resulting smooth, we used a secondderivative roughness penalty, λ

∫g(2)(t)2dt as in (9). λ was chosen by mini-

mizing the squared discrepancy between the estimated differential equations

8

at the observation times and the true (noiseless) values of the system. Settingyλ(t) to be the reconstruction of the system, we choose

λopt = argminλ

2∑i=1

∫(yλ,i(t)− xi(t))

2 dt (11)

These results have been compared to the simple expedient of generating asmooth of the data y(t) and plotting

d

dty(t)− f(y(t); t,θ)

In this case, the smooth was generated by a spline with a third derivativepenalty, following the recommendations of Ramsay and Silverman (2005) andthe smoothing parameter chosen to minimize (11).

The second method provides rough results due to the difficulty of estimat-ing derivatives accurately from discrete, noisy data. We can understand thisby observing that the bias in a smooth is related to the size of the derivativethat is penalized as well as the penalty parameter. Nonlinear systems pro-duce trajectories that can have large derivatives, reducing the optimal valueof λ and increasing the roughness of the resulting estimate.

In the first method, the choice of basis functions and roughness penaltyhave been deliberately made without reference to the structure of the ex-periment. These results could be improved significantly with the knowledgethat the system disturbance was a square wave. Similarly, prior knowledgeof the structure of functional mis-specification, such as cyclic behavior, canbe incorporated explicitly either in the selection of basis functions, or in thesmoothness penalty that is used.

3 Goodness of Fit Tests

When f(x,u) = Ax+u(t) is linear up to an additive function, exact solutionsto (1) may be given by:

x(t) = eAtx0 + eAt

∫ s

t0

e−Asu(s)ds (12)

9

Figure 2: Experiments with a forced, linear system. The true forcing compo-nent (dashed) with the estimated component (solid) and a forcing componentderived from a generic smooth of the data (dotted).

10

where eAt represents a matrix exponential. Including a lack of fit forcingfunction of the form Φ(t)d now modifies (12) to

y(t) = eAtx0 + eAt

∫ s

t0

e−Asu(s)ds + eAt

∫ s

t0

e−AsΦ(s)dds

= u(t) + X(t)x0 + Φ(t)d. (13)

Thus for an additive observational error process, we may express the esti-mation problem for d in terms of a linear model with parametric effects forinitial conditions x0:

yobs = u(t) + X(t)x0 + Φ(t)d + ε = u(t) + Z(t)d + ε. (14)

Where Z(t) and d represent the concatinations [X(t), Φ(t)] and [x0,d] re-spectively and ε represents independent observational errors.

Since the size of d may be expected to grow with the number of observa-tions, we regard (14) as a mixed-effects model with d being random effectsand add the distributional assumptions:

d ∼ N(0, τ 2P ), ε ∼ N(0, σ2I) (15)

to (14). This is a direct extension of the random-effects treatment of smooth-ing splines (et Gu, 2002). P may be given by a roughness penalty for a basisexpansion as in (9), or simply taken to be the identity. In practise, we havefound that the numerical challenges involved in solving (6) restrict us to asufficiently coarse basis that this choice makes little difference. A test for thesignificance of d is then equivalent to testing the hypothesis

H0 : τ 2 = 0.

This process was described in Crainiceanu et al. (2005). Exact finite-sampledistributions for a likelihood-ratio test can be derived under the null hypoth-esis; these are detailed in the appendix.

A simulation study is presented in Figure 4. We used the same linearsystem as presented in Section 2.2. The height of the perturbation wasvaried between 0 and 2.5 and we simulated Gaussian errors with variance0.5. Goodness of fit was assessed using a 2nd-order B-spline basis on 21knots across the interval [0 20]. We compared the power of the test when

11

Figure 3: Power comparisons for testing lack of fit in a linear differentialequation for lack of fit represented as forcing functions (solid lines) and asadditive disturbances on the scale of the observations (dashed) as a forcingperturbation is varied (horizontal axes). Lines with circles represent powerwhen both x1 and x2 are observed; lines without represent power when onlyx1 is observed.

the basis was taken as representing a forcing function as in (2) against themodel

y(t) = x(t; θ,x0) + Φ(t)d (16)

= u(t) + X(t)x0 + Φ(t)d

with distributional assumptions given by (15). We might describe this modelas representing lack of fit on the scale of the observations. In the case of alinear system, this amounts to the choice of basis used to describe lack offit. Power was calculated using 1000 simulations in each case. It is apparentthat the representation in terms of forcing functions does noticeably better.

One common aspect of nonlinear dynamical systems is that only somecomponents of x are measured. Moreover, the measured components may

12

not be those for which system mis-specification is most severe. In thesesituations, examining residuals (model (16)) only allows us to examine thosecomponents of x which are measured. Representing lack of fit as a forcingfunction allows it to be estimated for all components where it frequently hasa simpler form. Moreover, it is possible to explore lack of fit for differentcomponents in order to estimate where it may be most severe. Power for asystem in which x1 is the only observed component is also given in Figure 4.

Tests with Unknown Parameters

Typically, some or all of the parameters in a differential equation modelneed to be estimated from data before a goodness of fit test may be applied.Moreover, in nonlinear dynamics, the coefficients of a basis representationfor a forcing function do not influence the model linearly, even when theparameters are fixed.

Crainiceanu and Ruppert (2004) suggest extensions of mixed-effects like-lihood ratio tests to nonlinear models by linearizing the model about theestimated parameter values. In that paper, a nonlinear model was assumedto be of the form Y = g(t, θ) + Φ(t)c, placing lack of fit on the scale ofthe observations. The test proceeds by linearizing g about the estimated θ,providing the approximation

Y ≈ g(t, θ) +d

dθg(t, θ)

∣∣∣∣θ

δθ + Φ(t)c + ε (17)

This model is then viewed as a linear mixed effects model with random effectsc. A test for the variance of c can then be carried out. The notion is that thevariance associated with estimating θ is partially accounted for by includingthe fixed effects δθ in the mixed model estimation scheme.

In differential equation models, the procedure may be carried out byestimating dy/dθ and dy/dy0 from the sensitivity equations (6). Setting θto be the full vector (x0, θ), the approximation (17) becomes

yobs − x(t; x0, θ) ≈dx

dx0

∣∣∣∣x0

δx0 +dx

dθ

∣∣∣∣θ

δθ + Φ(t)d + ε (18)

which fits into the framework above. We note that representing the distur-bance in this manner is not dissimilar to the Gaussian process errors modelsuggested in Kennedy and O’Hagan (2001).

13

In order to use a model of the form (2), a further linearization is required.A generalization of the approximation (17) is given by

y = g(t, θ, c) + ε ≈ g(t, θ, 0) + δdg

dθ+ c

dg

dc+ ε (19)

For differential equations, this amounts to

yobs − x(t; x0, θ) ≈dx

dx0

∣∣∣∣x0

δx0 +dx

dθ

∣∣∣∣θ

δθ +dy

dd

∣∣∣∣0

d + ε (20)

For linear differential equations,dydd

is given by the last term of (13). Innonlinear models, it may be estimated by solving the sensitivity equations(8). Treating the linearized model (20) as a mixed effect model with distri-butional assumptions (15) in the same manner as (14) now allows us to applythe methodology above to test that the variance of d is zero.

In order to examine the behavior of this approximate test, a simulationstudy was conducted using data taken from (3-4) with parameters set tozero apart from (p2, p4, p5, p7, p8, p8) = (1/3,−1, 1/3,−3/5, 3, 3/5). Thesewere chosen to provide a system which resembled a sin-curve. For eachsimulation, a linear differential equation was fit to the data and goodnessof fit tested using the methodology above. Figure ?? presents the power ofthis proposed test as a function of the variance of observational noise. As inFigure 4, this is compared to conducting the test using model (17) and weobserve a noticeable improvement in power at moderate noise levels.

Linearization distorts the null distribution of the likelihood ratio teststatistic and should be treated with some care. We note, however, that Wald-type tests and confidence regions based on NLS have a very similar form (egBates and Watts, 1988). The advantage of the tests suggested here are thatthey may be performed before running an expensive optimization scheme toestimate g(t). Moreover, they allow us to investigate which components ofx may suffer most from lack of fit. This may allow us to reduce the numberof components for which g(t) is estimated, both reducing the size of theoptimization problem and the complexity of visually exploring them.

4 Mis-specification

There are many ways in which mis-specification may arise in a dynamicalsystem. Wood and Thomas (1999) demonstrate that even minor changes

14

Figure 4: Power comparisons for testing lack of fit in a linear differentialequation for lack of fit represented as forcing functions (solid lines) and asadditive disturbances on the scale of the observations (dashed) as the amountof observational noise around a solution to the nonlinear differential equations(3-4) is increased.

15

in the form of a dynamical system may have a significant impact on systembehavior. It can be important, therefore, to pay careful attention to the formin which a system is written down.

We have demonstrated that a forcing function is a point-wise measure oflack of fit. As such it represents the same quantity as a residual in standardregression for the purposes of diagnostics and model building. We can there-fore perform the usual diagnostics; producing plots of g(t) against y(t) oragainst other quantities.

It is important to note, however, that such diagnostic plots may quicklybecome complex and difficult to interpret. Where y(t) and g(t) both containnumerous components each gi(t) must be examined with respect to all thecomponents of yj(t). In this case the data available for gi(y) becomes aunivariate path through Rk making it difficult to determine an appropriateform for this dependence. Figure 5 demonstrate these difficulties.

Forcing function representations of lack of fit provide a basic recipe formodel building that is very similar to the standard techniques taught fordata analysis via linear regression:

1. Postulate and estimate a model. Linear models are a common default.

2. Perform a test – either formal or informal – for lack of fit.

3. If fit to data is unsatisfactory, estimate forcing functions to representlack of fit on the modeling scale.

4. Attempt to represent lack of fit in terms of any known quantities.

5. Re-formulate and re-estimate the differential equation model to accountfor any relationships found in Step 4.

As in linear regression, some trial and error must be expected. As pointedout, the reformulation step may not be straightforward. Moreover, the recipe,as stated here, ignores several other modeling choices that are available in thecontext of dynamical systems: notably latent variables, higher order systemsand transformations of the data.

16

5 Unmeasured Components and Lack of Fit

Identifiability

In nonlinear dynamics, it is typical that only some of the components of asystem are measured. Data from voltage clamp experiments, for example,are only available for the variable V in (3-4). For deterministic systems, thelack of measurements for some components does not present a problem forparameter estimation since the unmeasured components have a direct impacton those that are measured. However, the presence of missing componentsdoes become a problem for the estimation of forcing functions.

Consider a system described by states (y,x) where y has been continu-ously measured and x has not been measured. Suppose a differential equationf(y,x; t) has been proposed to describe it and

f(y, ·; t) : Rk′ → Rk′

is invertible for all (y, t). It is possible to show that it is only possible toidentify the same number of forcing functions as there are observed compo-nents; this is proved formally in the appendix. Note that identification hereassumes a fixed y(t) that is known continuously. This would be the caseif we were to first smooth data and then perform diagnostics as in Varah(1982). The problem of identification for discretely observed data involvesthe number of observations and the number of basis functions used and willnot necessarily be easy to analyze.

For a single observed component, it is possible to independently estimatea forcing function for each component in the system and perform model build-ing by including the strongest observed relationships at each step. However,when d out of k components are observed, estimating the

(dk

)possible com-

binations of forcing functions becomes computationally intractable and ana-lyzing them becomes intellectually infeasible. It may therefore be preferableto impose some identifiability constraints and estimate all the componentsjointly. A standard method of doing this is to use a ridge-type penalty, aug-menting (9) with a term: λ

∑τi

∫gi(t)

2dt where the τi are chosen accordingto the scale of the variables xi, or according to prior information about whichcomponents of xi are likely to be mis-specified. We may regard this as thefully nonlinear equivalent of the model (13) and use the REML estimate forλ as a guide to the amount of penalization required. In the author’s expe-rience, diagnostic plots are insensitive to the choice of λ and very similar to

17

Figure 5: Diagnostics for the Zebrafish data. Lack of fit forcing functionsfor V , plotted against estimated trajectories for V (left)and R (right), afterfitting equations (4) and (3) to the patch clamp data in Figure 1.

those provided by estimating each gi individually.

6 Zebrafish Data

Figure 5 presents lack of fit as estimated by nonlinear least squares for thezebra fish data. For this model and data, a naive implementation of nonlinearleast squares proved insufficient to estimate parameters or forcing functions.Instead, they have been estimated via a constrained optimization techniquefor collocation methods, implemented in the IPOPT routines (Wachter andBiegler, 2006; Arora and Biegler, 2004). A complete description of the useof these methods for this application is given in Hooker and Biegler (2007).

In order to estimate lack of fit, we used 200 linear B-splines in both vari-ables. We have plotted the lack of fit functions for the observed variable Vas a function of their estimated position in both variables. These plots makethe difficulties of manually assessing lack of fit clear. The most evident rela-tionship is between V , R and gV . However, this relationship is given only fora one-dimensional curve in V, R space and guessing a useful functional formis correspondingly problematic. We propose that the flat line evident in theright-hand panel of Figure 5 indicated a regime transition; the model provid-ing a relatively good fit on one side of the regime while different parametersmight be required on the other.

18

Figure 6: A fit to the zebra fish data using a modified Rinzle model.

Formally, this model was implemented by modifying (3) to

d

dtV =p1 + p2V + p3V

2 + p4V3 + p5R + p6V R

+ logit [(c1 + c2R− V )c3] (q1 + q2V + q3V2 + q4V

3 + q5R + q6V R).

The logistic term provides a smooth transition between the two regimes.The results of an implementation of this model with additional parametersestimated in an ad hoc manner by fitting them to the estimated forcingfunctions are given in Figure 6. Here, the general shape of the spike is clearlybetter resolved. It is also evident from this figure that the timings of thespikes in the data are not exact, sometimes occurring earlier and sometimeslater than a perfectly cyclic system would produce. Optimizing parameterswithin this system produced a 100-fold decreased in total squared error.However, the modified equations, with the estimated parameters, occur in achaotic region. In particular, small changes to c3, which controls the steepnessof the transition, produce a system which either has an eventual fixed pointnear zero or eventually diverges. Thus, additional analysis is required inorder to satisfy the requirements of a realistic system.

19

7 Conclusion

The most appropriate representation for a models lack of fit is on the scalethat the model is given. For differential equations, it is appropriate to mea-sure lack of fit on the scale of derivatives. This observation leads to theestimation of forcing functions as a diagnostic tool. This paper has shownthat these tools provide useful insight into where a system of differentialequations may be mis-specified. We have also derived approximate tests forthe significance of lack of fit and the need for further modeling.

Simple residual analysis, however, is not sufficient as a model buildingtechnique. Lack of fit suffers from identifiability problems when the positedmodel has unmeasured components. This makes it important to exercisecaution in exploratory model development. As exemplified by our attemptsto improve simple neurological models, it is also important to consider thedynamical properties of the proposed modification. In particular, care needsto be taken to preserve the stability of limit cycles, bifurcation points andother dynamical features of the model. Much further work is therefore war-ranted. Similarly, interpolation tools that allow lack of fit to be estimatedin such a way that it maintains the stability of limit cycles would prove veryhelpful for visual diagnostics. In the realm of stochastic systems, it wouldalso be useful to develop tests for independence between estimated forcingfunctions and the trajectory of the model. We anticipate work in this fieldto be challenging and exciting for a long time.

Acknowledgements

The zebrafish data displayed in this study was collected in and kindly sup-plied by Joe Fetcho’s lab at the Biology Department in Cornell University. Iam greatly indebted to Larry Biegler for the time and effort he spent in as-sisting me to set up these data and equations to estimate parameters withinthe IPOPT system.

References

Arora, N. and Biegler, L. T. (2004). A trust region SQP algorithm forequality constrained parameter estimation with simple parametric bounds.Computational Optimization and Applications 28, 51–86.

20

Atherton, R. W., Schainker, R. B., and Ducot, E. R. (1975). On the statisticalsensitivity analysis of models for chemical kinetics. AIChE Journal 3, 441–448.

Bates, D. M. and Watts, D. B. (1988). Nonlinear Regression Analysis andIts Applications. Wiley, New York.

Bellman, R. (1953). Stability Theory of Differential Equations. Dover, NewYork.

Clay, J. R. (2005). Axonal excitability revisited. Progress in Biophysics andMolecular Biology 88, 59–90.

Crainiceanu, C. M. and Ruppert, D. (2004). Likelihood ratio tests forgoodness-of-fit of a nonlinear regression model. Journal of MultivariateAnalysis 91, 35–52.

Crainiceanu, C. M., Ruppert, D., Claeskins, G., and Wand, M. P. (2005).Exact liklihood ratio tests for penalized splines. Biometrika 92, 91–103.

Deuflhard, P. and Bornemann, F. (2000). Scientific Compuitng with OrdinaryDifferential Equations. Springer-Verlag, New York.

FitzHugh, R. (1961). Impulses and physiological states in models of nervemembrane. Biophysical Journal 1, 445–466.

Gu, C. (2002). Smoothing Spline ANOVA Models. Springer, New York.

Hodgkin, A. L. and Huxley, A. F. (1952). A quantitative descripotion of mem-brane current and its application to conduction and excitation in nerve. J.Physiol. 133, 444–479.

Hooker, G. and Biegler, L. (2007). IPOPT and neural dynamics: Tips, tricksand diagnostics. Technical Report BU-1676-M, Dept. Bio. Stat. and Comp.Bio., Cornell University.

Huang, Y., Liu, D., and Wu, H. (2006). Hierarchical bayesian methods forestimation of parameters in a longitudinal hiv dynamic system. Biometrics62, 413423.

Ionides, E. L., Breto, C., and King, A. A. (2006). Inference for nonlineardynamical systems. Proceedings of the National Academy of Sciences .

21

Jaeger, J., Blagov, M., Kosman, D., Kolsov, K., Manu, Myasnikova, E.,Surkova, S., Vanario-Alonso, C., Samsonova, M., Sharp, D., and Reinitz,J. (2004). Dynamical analysis of regulatory interactions in the gap genesystem of drosophila melanogaster. Genetics 167, 1721–1737.

Kennedy, M. and O’Hagan, A. (2001). Bayesian calibration of computermodels.

Leis, J. R. and Kramer, M. A. (1988). The simultaneous solution and sen-sitivity analysis of systems described by ordinary differential equations.ACM Transactions on Mathematical Software 14, 45–60.

Nagumo, J. S., Arimoto, S., and Yoshizawa, S. (1962). An active pulsetransmission line simulating a nerve axon. Proceedings of the IRE 50,2061–2070.

Ramsay, J. O., Hooker, G., Campbell, D., and Cao, J. (2007). Parameterestimation in differential equations: A generalized smoothing approach.Journal of the Royal Statistical Society, Series B (with discussion) 16,.

Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis.Springer, New York.

Rheinbolt, W. C. (1984). Differential-algebraic systems as differential equa-tions on manifolds. Mathematics of Computation 43, 473–482.

Tjoa, I.-B. and Biegler, L. (1991). Simultaneous solution and optimizationstrategies for parameter estimation of differential-algebraic equation sys-tems. Industrial Engineering and Chemical Research 30, 376–385.

Varah, J. M. (1982). A spline least squares method for numerical parameterestimation in differential equations. SIAM Journal on Scientific Computing3, 28–46.

Wachter, A. and Biegler, L. T. (2006). On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming.Mathematical Programming pages 25–57.

Wilson, H. R. (1999). Spikes, decisions and actions: the dynamical founda-tions of neuroscience. Oxford University Press, Oxford.

22

Wood, S. N. and Thomas, M. B. (1999). Super sensitivity to structure inbiological models. Proceedings of the Royal Society (B) .

A

A.1 Tests for Random Effects

Crainiceanu et al. (2005) derive distributions for the Restricted likelihoodratio test of a random effect variance under the null hypothesis that thevariance is zero. Specifically consider a linear mixed model

Y = Xβ + Zb + ε, cov

(bε

)=

[σ2

bΣ 00 σ2

ε In

]where Y is a n-vector of responses to be regressed on a n×p predictor matrixX with a further set of random effects contained in the n× k matrix Z. Weassume that the random effects b have covariance matrix σ2

bΣ for known Σand the ε are independent N(0, σ2

ε ).Re-define λ = σ2

b/σ2ε , we have cov(Y ) = σ2

ε Vλ for Vλ = In + λZΣZT .Twice the negative restricted log likelihood is then

REL(β, σ2, λ) = (n− p− 1) log(σ2ε ) + log (det(Vλ))

− log(det(XT V −1

λ X))

+(Y −Xβ)T V −1(Y −Xβ)

σ2ε

and the restricted likelihood ratio test statistic for λ = 0 is

RLRTn = supβ,σ2

ε ,λ

REL(β, σ2ε , λ)− sup

β,σ2ε

REL(β, σ2ε , 0)

Let P0 = In−X(XT X)−1X and µs and ξs be the eigenvalues of Σ1/2ZT P0ZΣ1/2

and Σ1/2ZT ZΣ1/2. Let us for s = 1, . . . , K and wt for t = 1, . . . , n− p− 1 beindependent N(0, 1) variables and

Nn(λ) =K∑

i=1

λµs

1 + λµs

w2s , Dn(λ) =

K∑s=1

w2s

1 + λµs

+

n−p−1∑s=K+1

w2s

then RLRTn is equal in distribution to∑λ≥0

(n− p− 1) log

[1 +

Nn(λ)

Dn(λ)

]−

K∑s=1

log(1 + λµs)

Although this appears complex, the distribution is easily simulated.

23

B

B.1 Lack of Fit Identifiability

Section 5 explored identifiability for lack of fit functions for systems describedby two variables. The extension to more complex settings cannot be madeso directly. We examine this setting by breaking the system into componentswhich are or are not measured, and which are or are not forced. In thefollowing theorem they will be given distinct labels:

Theorem B.1. Let x0(t) and j0(t) have dimension k0, x1(t), y(t) and j(t)have dimension k1 for each t ∈ [0 1], and let z(t) have dimension k2. Letf0(x

0,x1,y, z; t), f1(x0,x1,y, z; t), g(x0,x1,y, z; t) and h(x0,x1,y, z; t) be lip-

schitz continuous in their arguments, h be bounded and let f1(x0,x1,y, z; t)

have a lipschitz continuous inverse f−1y (x0,x1,w, z; t) in y for each x0,x1, z, t.

For fixed, continuously differentiable x0(t), and x1(t), let y0, z0 be such thatx1(0) = d

dtx1

∣∣0

= f(x0(0),x1(0),y0, z0; 0), the equation

d

dtx0(t) = f0(x

0(t),x1(t),y(t), z(t); t) + j0(t)

d

dtx(t) = f1(x

0(t),x1(t),y(t), z(t); t)

d

dty(t) = g(x0(t),x1(t),y(t), z(t); t) + j(t)

d

dtz(t) = h(x0(t),x1(t),y(t), z(t); t)

(y(0), z(0)) = (y0, z0)

have unique continuous solutions y(t), z(t), j0(t) and j(t) with y(t) and z(t)continuously differentiable.

Proof. Solutions may be constructed directly by noting that any solution z(t)must satisfy:

d

dtz(t) = h

(x0(t),x1(t), f−1

y (x0(t),x1(t), x(t), z(t); t), z(t), t)

(21)

z(0) = z0

(Note that y0 = f−1(x0(t),x1(t), x(0), z0; 0) by assumption). Under the as-sumptions of lipschitz continuity, the right hand side of (21) is also lipschitz

24

continuous. This provides the local existence and uniqueness for solutionsof (21); see, for example, Bellman (1953). Boundedness of h guarantees aglobal solution.

We can now re-construct y(t) and j(t) as

y(t) = f−1y (x0(t),x1(t), x1(t), z(t); t)

j0(t) =d

dtx0(t)− f0(x

0(t),x1(t),y(t), z(t); t)

j(t) =d

dty(t)− g(x(t),y(t), z(t); t)

Continuity follows directly.

We remark that this construction is only possible when y has the samedimension as x, otherwise no such f−1

y will exist. Rather than insisting onthis inverse, we could proceed to solve the Differential-Algebraic System:

0 = x(t)− f(x(t),y(t), z(t),u(t))

d

dtz(t) = h(x(t),y(t), z(t); t)

in terms of y and z and proceed with the substitution. This allows slightlydifferent conditions on f. See, for example Rheinbolt (1984).

The division of components into x0, x1, y and z is arbitrary here. Thepoint being that for any division for which f−1

y exists, solutions may be found.

25

Documents

Forcing Function Diagnostics for Nonlinear Dynamicsfaculty.bscb.cornell.edu/~hooker/forcingpaper.pdfForcing Function Diagnostics for Nonlinear Dynamics Giles Hooker Abstract This paper