
Biometrics 65, 928–936
September 2009, DOI: 10.1111/j.1541-0420.2008.01172.x

Forcing Function Diagnostics for Nonlinear Dynamics

Giles Hooker

Department of Biological Statistics and Nonlinear Dynamics, Cornell University, Ithaca, New York 14850, U.S.A. email: [email protected]

Summary. This article investigates the problem of model diagnostics for systems described by nonlinear ordinary differential equations (ODEs). I propose modeling lack of fit as a time-varying correction to the right-hand side of a proposed differential equation. This correction can be described as being a set of additive forcing functions, estimated from data. Representing lack of fit in this manner allows us to graphically investigate model inadequacies and to suggest model improvements. I derive lack-of-fit tests based on estimated forcing functions. Model building in partially observed systems of ODEs is particularly difficult and I consider the problem of identification of forcing functions in these systems. The methods are illustrated with examples from computational neuroscience.

Key words: Diagnostics; Goodness of fit; Neural dynamics; Nonlinear dynamics.

1. Introduction
Recent research has seen a significant increase in interest in fitting nonlinear differential equations to data. Many systems of differential equations used to describe real-world phenomena are developed from first-principles approaches by a combination of conservation laws and rough guesses. Such a priori modeling has led to proposed models that mimic the qualitative behavior of observed systems, but have poor quantitative agreement with empirical measurements. There has been relatively little literature on the development of interpretable differential equation models from an empirical point of view. The need for good diagnostics has been cited a number of times, for example, in Ramsay et al. (2007) and associated discussions.

This article considers the problem of performing goodness-of-fit diagnostics and model improvement when a set of deterministic nonlinear differential equations has been proposed to explain observed data. I describe the state of a system by a vector of k time-varying quantities x(t) = {x1(t), …, xk(t)}. An ordinary differential equation (ODE) describes the evolution of x(t) by relating its rate of change to the current state of the system:

dx/dt = f(x; t, θ).  (1)

Here f : R^k → R^k is a known vector-valued function that depends on a finite set of unknown parameters θ. The inclusion of t as an argument to f in equation (1) allows this dependence to vary with time. I assume that the nature of this variation is deterministic and specified up to elements of θ. This variation is included to allow known external inputs to affect x, for example.

Such systems have been used in many areas of applied mathematics and science. It has been shown that even quite simple systems can produce highly complex behavior. They also represent a natural means of developing systems from first principles or conservation laws and frequently represent the appropriate scale on which to describe how a system responds to an external input. I describe such systems as being modeled on the derivative scale as opposed to on the scale of the observations.

The central observation in this article is that because differential equations are modeled on the derivative scale, lack of fit for these equations is most interpretably measured on the same scale. I illustrate this with a thought experiment. Suppose we have continuously observed a system y(t) without error and have postulated a model of the form (1) to describe it. A directly interpretable measure of lack of fit is g(t) = dy/dt − f(y; t, θ).

Throughout this article, I will maintain the convention that y indicates observations of a system, while x represents solutions to a differential equation used to describe it. In practice, we do not have continuous, noiseless observations of a system, but instead need to estimate y from a discrete set of data.
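A minimal numerical sketch of this thought experiment (the linear system and the deliberately misspecified model below are my own illustrative choices, not from the article):

```python
import numpy as np

t = np.linspace(0.0, 10.0, 2001)
y = np.exp(-0.5 * t)           # "observed" trajectory: solves dy/dt = -0.5 y
f_model = -1.0 * y             # proposed (misspecified) model: dy/dt = -y

dydt = np.gradient(y, t)       # numerical derivative of the observations
g = dydt - f_model             # empirical forcing g(t) = dy/dt - f(y; t, theta)

# Here g(t) recovers the misspecification: it is close to 0.5 * y(t).
```

On densely sampled, noiseless data the recovered g(t) pinpoints exactly what the proposed right-hand side is missing; the rest of the article is about doing this from noisy, discrete observations.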

Rearranging the equation above,

dy/dt = f(y; t, θ) + g(t).  (2)

g(t), representing lack of fit, appears as a collection of what may be termed empirical forcing functions. The term empirical is applied here to distinguish them from known inputs already accounted for in the specification of f. These act as an external influence on y(t), which responds in terms of dy(t)/dt. The basic task of diagnostics now becomes the estimation of g(t). This may be complicated by knowledge (or lack of knowledge) about unmeasured components of y(t) and where fit may be lacking. Once g(t) has been estimated, the usual diagnostic procedures may be undertaken: the statistical significance of g(t) may be assessed and it may then be examined graphically for relationships with various measured variables. Any apparent relationships may then be accounted for by modifications to the model structure and the modified system reestimated.

© 2009, The International Biometric Society

Figure 1. Zebrafish data. Dots provide observed values of transmembrane potential, lines the results of a least-squares fit of equations (3) and (4) to these data.

We note that equation (2) represents a stochastic differential equation if g(t) is regarded as being random. Stochastic processes represent a violation of the model specification (1), and make up part of the alternative hypothesis for the goodness-of-fit tests I describe below. However, distinguishing between g(t) being the realization of a stochastic process and being purely due to a deterministic misspecification of f is beyond the scope of this article.

1.1 Neural Spike Data
To demonstrate the ideas from this article in practice, I examine the results of a common experiment in neurobiology. Figure 1 gives readings taken from a voltage clamp experiment in which transmembrane potential is recorded on the axon of a single zebrafish motor neuron in response to an electric stimulus. When a stimulus exceeds a threshold value, neurons respond by initiating the sharp oscillations that can be seen in Figure 1. There is a long literature on models that aim to describe neural behavior; see Wilson (1999) or Clay (2005) for an overview. These models range from very simple FitzHugh–Nagumo type systems (FitzHugh, 1961; Nagumo, Arimoto, and Yoshizawa, 1962) to highly complex models originating with Hodgkin and Huxley (1952) involving many ion channels that appear as dynamical components.

The task I pose is to develop a tractable modification of simple models that provides a reasonable agreement with the shape of the spikes found in the data. As a first modeling step, I use a polynomial whose terms are motivated from equation (9.7) in Wilson (1999). There, the model was developed with the aim of mimicking the turning points of a physiological system. Our equations take the form:

dV/dt = p1 + p2 V + p3 V^2 + p4 V^3 + p5 R + p6 V R,  (3)
dR/dt = p7 + p8 V + p9 R,  (4)

for unknown parameters p1, …, p9. I observe that the form of equation (3) is invariant under linear transformations of V and R. I have used this property to derive initial guesses for the parameters by taking the parameters given in Wilson (1999) and transforming them to approximately match the peaks and troughs found in the data.
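As an illustrative sketch only, equations (3) and (4) can be integrated with a standard solver; the parameter values below are placeholders I chose so that the system is stable to integrate, not the fitted estimates:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Right-hand side of equations (3)-(4); p holds (p1, ..., p9).
def vr_system(t, state, p):
    V, R = state
    dV = p[0] + p[1]*V + p[2]*V**2 + p[3]*V**3 + p[4]*R + p[5]*V*R
    dR = p[6] + p[7]*V + p[8]*R
    return [dV, dR]

# Placeholder parameters (illustrative only): the cubic term keeps V bounded.
p = (0.0, 3.0, 0.0, -1.0, 3.0, 0.0, 1.0/15, -1.0/3, -1.0/15)
sol = solve_ivp(vr_system, (0.0, 50.0), [-1.0, 1.0], args=(p,),
                t_eval=np.linspace(0.0, 50.0, 501), rtol=1e-8)
V_traj = sol.y[0]   # trajectory of the voltage-like component
```

The negative cubic coefficient on V^3 is what bounds the voltage excursions, which is why this family of polynomials can mimic spiking without blowing up.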

Figure 1 presents a plot of a short sequence of these data along with the best fit from these equations. It is clear that while there is a qualitatively good agreement between the model and the data, some systematic lack of fit is evident. This article develops tests to determine the significance of lack of fit and visual tools for model improvement.

1.2 Outline
This article begins with a discussion of estimating lack of fit, illustrated with examples, in Section 2. I develop computationally inexpensive approximate goodness-of-fit tests to be used as initial diagnostics in Section 3. Some identifiability problems and possible solutions to them are discussed in Section 4. I provide an example in the form of model development for the spiking behavior of zebrafish neurons in Section 5. I suggest further directions for research in Section 6.

2. Estimating Lack of Fit
The central technique in the article is the estimation of a set of forcing functions g(t) as in equation (2). For real-world data, y(t) and its derivatives are not known exactly; rather we have access to a set of noisy observations y^obs = (y^obs_ij), i = 1, …, k, j = 1, …, n_i, where y^obs_ij is a measurement of the ith component of the system at times t_ij. The vector t_i will be taken to be the collection of times at which the ith component of y is measured. We take y(t) = {y_i(t_ij)}, i = 1, …, k, j = 1, …, n_i, to be the corresponding estimated values of the system, and ||y^obs − y(t)|| to be a (possibly weighted) norm between them. Where appropriate, y^obs and y(t) will be regarded as vectors formed by concatenating the observations for each component of y(t). We note that it is common that some of the components of y have no observations.

g(t) is represented as a basis function expansion:

g(t)^T = Φ(t)D,  (5)

where Φ(t) is a row vector containing the evaluation of a set of m basis functions {φ1(t), …, φm(t)} at time t. In the examples below, the φi are given by cubic B-splines with equispaced knots. D is then an m × k matrix providing the coefficients of each of m basis functions for each of k forcing components. It will frequently be convenient to use d = vec(D) to denote the vectorized version of D formed by concatenating its columns. d may be estimated via any parameter estimation scheme for differential equations.

As is the case for other methods based on basis expansions, the variance resulting from estimating a large number of coefficients d may be regulated either by limiting the number of basis functions or by directly penalizing the smoothness of g(t). The goodness-of-fit tests developed below correspond to the latter approach. However, in many systems, the numerical challenges in estimating g(t) already limit the number of basis functions that can be used.


2.1 Estimating Parameters in a Differential Equation
Parameter estimation in differential equation models is a difficult topic with a large associated literature. Solutions to equation (1) typically do not have analytic expressions, necessitating the use of sometimes computationally intensive numerical approximation methods. Further, such solutions are given only up to a set of initial conditions x0 = x(t0). When x0 is unknown, the set of parameters must be augmented to include it.

This article largely aims to be generic in its approach and the diagnostic prescriptions here may be tied to any parameter estimation scheme. The initial tests for goodness of fit described in this article make use of a straightforward implementation of nonlinear least squares, described below.

In this implementation, solutions x(t; θ, x0) to equation (1) are approximated for any parameter vector θ and initial conditions x0. A Gauss–Newton method is then used to estimate both. To make use of a Gauss–Newton solver, gradients are calculated by solving a system enlarged to include the sensitivity equations (see Atherton, Schainker, and Ducot, 1975; Leis and Kramer, 1988):

d/dt { x,  vec(dx/dθ),  vec(dx/dx0) } = { f(x; t, θ),  vec(∂f/∂θ + (∂f/∂x)(dx/dθ)),  vec((∂f/∂x)(dx/dx0)) }.  (6)

Each row of equation (6) may be derived by a straightforward chain rule and the following initial conditions may also be given:

x(0) = x0,  (dx/dθ)|_{t=0} = 0,  (dx_i/dx_{0,j})|_{t=0} = 1 if i = j and 0 otherwise.  (7)

Both x(t; θ, x0) and its derivatives with respect to θ and x0 may be obtained by solving the expanded systems (6)–(7).

There are many methods for approximating solutions to ODEs and their accuracy can depend on the nature of the system to be estimated; see Deuflhard and Bornemann (2000). The simulations below have been undertaken using a Runge–Kutta method implemented in Matlab. However, these were insufficiently accurate for the real-data example in Section 5, where details are given for the specific methods that were employed.
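The sensitivity-equation idea in (6) can be sketched on a scalar ODE dx/dt = −θx, whose sensitivity dx/dθ is known in closed form (the example and solver settings are mine, not the article's):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Augmented state z = (x, s) with s = dx/dtheta, for dx/dt = -theta * x.
# The chain rule gives ds/dt = df/dtheta + (df/dx) s = -x - theta * s.
def augmented(t, z, theta):
    x, s = z
    return [-theta * x, -x - theta * s]

theta, x0 = 0.7, 2.0
# Initial conditions as in (7): x(0) = x0 and dx/dtheta at t = 0 is zero.
sol = solve_ivp(augmented, (0.0, 3.0), [x0, 0.0], args=(theta,),
                rtol=1e-10, atol=1e-12, dense_output=True)
x_T, s_T = sol.sol(3.0)

# Closed form for comparison: x(t) = x0 exp(-theta t),
# dx/dtheta = -t x0 exp(-theta t).
```

The same construction, applied componentwise, is what supplies exact Gauss–Newton gradients without finite differencing.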

When the θ are replaced by d in equation (5), the second line of equation (6) becomes:

d/dt (dy/dd) = (∂f/∂x)(dy/dd) + Φ(t),  (8)

where Φ(t) here denotes the block diagonal matrix with Φ(t) on the diagonal blocks. For the purposes of visualization, it is frequently useful to smooth the estimate of g(t). When a standard roughness penalty is placed on g, the penalized nonlinear least squares objective becomes

||y^obs − y(t)||^2 + λ d^T P d,  (9)

where P is a penalty matrix. This can be achieved simply by augmenting the set of squared errors in the Gauss–Newton solver.
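The augmentation can be sketched as follows: writing P = L^T L, the penalty λ d^T P d is equivalent to appending pseudo-residuals √λ L d to the residual vector, so an ordinary least-squares (or Gauss–Newton) step minimizes the penalized objective. A linear toy example (all names and values mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "model": y = Phi @ d + noise, with a second-difference penalty.
n, m = 60, 12
Phi = rng.standard_normal((n, m))
d_true = np.sin(np.linspace(0, np.pi, m))
y = Phi @ d_true + 0.1 * rng.standard_normal(n)

L = np.diff(np.eye(m), n=2, axis=0)   # P = L.T @ L penalizes roughness of d
lam = 1.0

# Augmented least squares: stack residuals y - Phi d with sqrt(lam) * L d.
A_aug = np.vstack([Phi, np.sqrt(lam) * L])
y_aug = np.concatenate([y, np.zeros(L.shape[0])])
d_hat, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)

# Same estimate via the normal equations (Phi'Phi + lam P) d = Phi'y.
d_direct = np.linalg.solve(Phi.T @ Phi + lam * (L.T @ L), Phi.T @ y)
```

In the nonlinear setting, the same stacking is applied to the Gauss–Newton residuals and Jacobian at each iteration.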

This estimation scheme ignores the many numerical problems associated with parameter estimation in differential equations. These include controlling for numerical error in the Runge–Kutta scheme and the problem of minimizing over rough response surfaces. The author's experience is that the scheme above provides good estimates when initial parameter estimates are quite close to the true value. However, it frequently breaks down when started from parameter estimates that are moderately far from the truth and either finds a local optimum or a set of parameters for which the ODE system is not numerically solvable.

There are several methods available that may be used to overcome these problems. Stochastic search techniques such as simulated annealing (Jaeger et al., 2004) or Markov chain Monte Carlo methods (Huang, Liu, and Wu, 2006) have been employed to try to find global minima in the response surface. Other methods include the estimation of parameters while attempting to solve equation (1) at the same time; see, for example, Tjoa and Biegler (1991). This last approach has been used in Section 5.

Alternative methods rely on presmoothing steps. Varah (1982) proposes fitting a smooth y(t) to the data and choosing θ to minimize:

Σ_{i=1}^{k} ∫ { dy_i(t)/dt − f_i(y(t); t, θ) }^2 dt.  (10)

This method requires all components of y(t) to be measured with high resolution and precision and may still suffer from bias due to the smoothing method used. More recent techniques include intermediate methods such as Ramsay et al. (2007), which replace the penalty in equation (9) with equation (10), and Ionides, Breto, and King (2006), based on particle filtering techniques.

2.2 Example: A Forced Linear System
As an initial experiment, I use a two-component linear differential equation:

d/dt (x1, x2)^T = [−4 8; −4 4] (x1, x2)^T + (g(t), 0)^T,

in which g(t) takes the form of a stepped interval:

g(t) = 5 if t < 5, and 0 otherwise.

I assume the equation is known exactly and attempt to reproduce g nonparametrically from noisy data. The system was run with initial conditions (x1, x2) = (−1, 1), observations were taken every 0.05 time units on the interval [0, 20], and Gaussian noise with variance 0.25 was added to the observed system.
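This simulation setup can be reproduced as follows (an illustrative sketch; the solver, step control, and random seed are my choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[-4.0, 8.0],
              [-4.0, 4.0]])

def g(t):
    return 5.0 if t < 5.0 else 0.0   # stepped forcing on the first component

def forced_linear(t, x):
    return A @ x + np.array([g(t), 0.0])

t_obs = np.arange(0.0, 20.0 + 1e-9, 0.05)   # observations every 0.05 units
sol = solve_ivp(forced_linear, (0.0, 20.0), [-1.0, 1.0],
                t_eval=t_obs, max_step=0.01, rtol=1e-8)

rng = np.random.default_rng(1)
y_obs = sol.y + np.sqrt(0.25) * rng.standard_normal(sol.y.shape)  # var 0.25
```

The small max_step keeps the adaptive solver from stepping over the discontinuity in g at t = 5.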

Figure 2. Experiments with a forced linear system. The true forcing component (solid) with the estimated component (dashed) and a forcing component derived from a generic smooth of the data (dotted).

Figure 2 provides the result of estimating g(t) using the methods described above. To regularize the resulting smooth, I used a second-derivative roughness penalty, λ ∫ g''(t)^2 dt, as in equation (9). λ was chosen by minimizing the squared discrepancy between the estimated differential equations at the observation times and the true (noiseless) values of the system. Setting y_λ(t) to be the reconstruction of the system, I choose

λ_opt = argmin_λ Σ_{i=1}^{2} ∫ {y_{λ,i}(t) − x_i(t)}^2 dt.  (11)

These results have been compared to the simple expedient of generating a smooth of the data y(t) and plotting

dy(t)/dt − f(y(t); t, θ).

In this case, the smooth was estimated by a spline with a third-derivative penalty, following the recommendations of Ramsay and Silverman (2005), and the smoothing parameter chosen to minimize equation (11).
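A sketch of this presmoothing alternative on a toy system (the smoothing-spline routine and the misspecified model are my own illustrative choices): fit a generic smooth to noisy observations, differentiate it, and subtract the model right-hand side.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 201)
x = np.exp(-0.3 * t)                       # true trajectory of dx/dt = -0.3 x
y_obs = x + 0.05 * rng.standard_normal(t.size)

spl = make_smoothing_spline(t, y_obs)      # generic smooth of the data
y_smooth = spl(t)
dy_dt = spl.derivative()(t)                # derivative of the smooth

# Empirical forcing against a misspecified model dy/dt = -0.5 y:
g_hat = dy_dt - (-0.5 * y_smooth)
```

As the article notes, the derivative of a generic smooth is noisy, which is why this estimate of g is typically rougher than the forcing-function fit.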

The second method provides rough results due to the difficulty of estimating derivatives accurately from discrete, noisy data. We can understand this by observing that the bias in a smooth is related to the size of the derivative that is penalized as well as to the penalty parameter. Nonlinear systems produce trajectories that can have large derivatives, reducing the optimal value of λ and increasing the roughness of the resulting estimate.

In the first method, the choice of basis functions and roughness penalty has been deliberately made without reference to the structure of the experiment. These results could be improved significantly with the knowledge that the system disturbance was a square wave. Similarly, prior knowledge of the structure of functional misspecification, such as cyclic behavior, can be incorporated explicitly either in the selection of basis functions, or in the smoothness penalty that is used.

3. Goodness-of-Fit Tests for Linear Differential Equations
I consider a system f(x, u) = Ax(t) + u(t), where A is a known k × k matrix and u(t) a known k-valued function. Under these conditions, exact solutions to equation (1) are given by:

x(t) = e^{At} x0 + e^{At} ∫_{t0}^{t} e^{−As} u(s) ds,  (12)

where e^{At} represents a matrix exponential. Including a set of lack-of-fit forcing functions of the form D^T Φ(t)^T now modifies equation (12) to

y(t)^T = e^{At} x0 + e^{At} ∫_{t0}^{t} e^{−As} u(s) ds + e^{At} ∫_{t0}^{t} e^{−As} D^T Φ(s)^T ds = X(t) x0 + u(t) + D^T Φ(t)^T,  (13)

where, in the last expression, X(t), u(t), and Φ(t) denote the correspondingly transformed quantities.

Thus for an additive observational error process, we may express the estimation problem for d in terms of a linear model with parametric effects for the initial conditions x0:

y^obs = u(t) + X(t) x0 + Φ(t) d + ε.  (14)

Here, y^obs is taken as a vector, u(t) is formed by concatenating the u_i(t_i), X(t) is a block matrix with the (i, j)th block being X_ij(t_i), and Φ(t) is a block diagonal matrix with diagonal blocks Φ(t_i). ε represents a vector of independent observational errors.

Because the size of d may be expected to grow with the number of observations, I regard equation (14) as a mixed-effects model with d being random effects, adding the distributional assumptions

d ∼ N(0, τ^2 P),  ε ∼ N(0, σ^2 I),  (15)

to equation (14). Here, P may be given by a roughness penalty for a basis expansion as in equation (9), or simply taken to be the identity. This is a direct extension of the random-effects treatment of smoothing splines (e.g., Gu, 2002). In practice, I have found that the numerical challenges involved in solving equation (6) restrict us to a sufficiently coarse basis that this choice of P makes little difference. A test for the significance of d is then equivalent to testing the hypothesis

H0: τ^2 = 0.

This process was described in Crainiceanu et al. (2005). Exact finite-sample distributions for a likelihood-ratio test can be derived under the null hypothesis; these are detailed in the Appendix.

A simulation study is presented in Figure 3. I used the same linear system as presented in Section 2.2. The height of the perturbation was varied between 0 and 2.5 and I simulated Gaussian errors with variance 0.5. Goodness of fit was assessed using a second-order B-spline basis on 21 knots across the interval [0, 20]. I compared the power of the test when the basis was taken as representing a forcing function as in equation (2) against the model

y(t)^T = x(t; θ, x0)^T + D^T Φ(t)^T = u(t) + X(t) x0 + D^T Φ(t)^T,  (16)

with distributional assumptions given by equation (15). I describe this model as representing lack of fit on the scale of the observations. Power was calculated using 1000 simulations in each case. It is apparent that the representation in terms of forcing functions does noticeably better.

One common aspect of nonlinear dynamical systems is that only some components of x are measured. Moreover, the measured components may not be those for which system misspecification is most severe. In these situations, examining residuals (model 16) only allows us to examine those components of x which are measured. Representing lack of fit as a forcing function allows it to be estimated for all components, where it frequently has a simpler form. Moreover, it is possible to explore lack of fit for different components to estimate where it may be most severe. Power for a system in which x1 is the only observed component is also given in Figure 3.

Figure 3. Power comparisons for testing lack of fit in a linear differential equation for lack of fit represented as forcing functions (solid lines) and as additive disturbances on the scale of the observations (dashed) as a forcing perturbation is varied (horizontal axes). Lines with circles represent power when both x1 and x2 are observed; lines without represent power when only x1 is observed.

3.1 Tests with Unknown Parameters
Typically, some or all of the parameters in a differential equation model need to be estimated from data before a goodness-of-fit test may be applied. Moreover, in nonlinear dynamics, the coefficients of a basis representation for a forcing function do not influence the model linearly, even when the parameters are fixed.

Crainiceanu and Ruppert (2004) suggest extensions of mixed-effects likelihood ratio tests to nonlinear models by linearizing the model about the estimated parameter values. In that paper, a nonlinear model was assumed to be of the form Y = g(t, θ) + Φ(t)c, placing lack of fit on the scale of the observations. The test proceeds by linearizing g about the estimated θ, providing the approximation

Y ≈ g(t, θ̂) + (dg(t, θ)/dθ)|_{θ=θ̂} δ + Φ(t)c + ε,  (17)

where δ and c are then fit to Y − g(t, θ̂) via a standard linear mixed-effects model. A test for the variance of the random effects can then be carried out. The notion is that the variance associated with estimating θ is partially accounted for by including the fixed effects δ in the mixed-model estimation scheme.

In differential equation models, the procedure may be carried out by estimating dx/dθ and dx/dx0 from the sensitivity equations (6). Setting θ to be the full vector (x0, θ), the approximation (17) becomes

y^obs − x(t; x̂0, θ̂) ≈ X_x0(t) δ_x0 + X_θ(t) δ_θ + Φ(t) d + ε,  (18)

where X_x0(t) and X_θ(t) represent the concatenations of dx_i(t_i)/dx0 and dx_i(t_i)/dθ row-wise across i, with these derivatives evaluated at the estimates x̂0 and θ̂. They may be calculated from equation (6). Φ(t) is the block diagonal matrix with Φ(t_i) on the diagonal blocks and x(t; x̂0, θ̂) is the vector formed by concatenating x_i(t_i; x̂0, θ̂).

Equation (18) fits into the framework above. We note that representing the disturbance in this manner is not dissimilar to the Gaussian process errors model suggested in Kennedy and O'Hagan (2001).

To use a model of the form (2), a further linearization is required. A generalization of the approximation (17) is given by

y = g(t, θ, c) + ε ≈ g(t, θ, 0) + (dg/dθ) δ + (dg/dc) c + ε.  (19)

For linear differential equations, this is equivalent to replacing Φ(t) in equation (18) with its transformed counterpart. For nonlinear models it becomes

y^obs − x(t; x̂0, θ̂) ≈ X_x0(t) δ_x0 + X_θ(t) δ_θ + (dy/dd)|_{d=0} d + ε,  (20)

where dy/dd is the concatenation of the dy_i(t_i)/dd, which can be estimated by solving the sensitivity equation (8). Treating the linearized model (20) as a mixed-effects model with distributional assumptions (15) in the same manner as equation (14) now allows us to apply the methodology above to test that the variance of d is zero.

To examine the behavior of this approximate test, a simulation study was conducted using data taken from equations (3)–(4) with parameters set to zero apart from (p2, p4, p5, p7, p8, p9) = (3, −1, 3, 1/15, −1/3, −1/15). These were chosen to provide a system that resembled a sine curve. For each simulation, a linear differential equation was fit to the data and goodness of fit tested using the methodology above. Figure 4 presents the power of this proposed test as a function of the variance of observational noise. As in Figure 3, this is compared to conducting the test using model (17) and we observe a noticeable improvement in power at moderate noise levels.

Linearization distorts the null distribution of the likelihood ratio test statistic and should be treated with some care. We note, however, that Wald-type tests and confidence regions based on nonlinear least squares have a very similar form (e.g., Bates and Watts, 1988). The advantage of the tests suggested here is that they may be performed before running an expensive optimization scheme to estimate g(t). Moreover, they allow us to investigate which components of x may suffer most from lack of fit. This may allow us to reduce the number of components for which g(t) is estimated, both reducing the size of the optimization problem and the complexity of visually exploring them.


Figure 4. Power comparisons for testing lack of fit in a linear differential equation for lack of fit represented as forcing functions (solid lines) and as additive disturbances on the scale of the observations (dashed) as the amount of observational noise around a solution to the nonlinear differential equations (3)–(4) is increased.

4. Unmeasured Components and Lack-of-Fit Identifiability
In nonlinear dynamics, it is typical that only some of the components of a system are measured. Data from voltage clamp experiments, for example, are only available for the variable V in equations (3)–(4). For deterministic systems, the lack of measurements for some components does not present a problem for parameter estimation because the unmeasured components have a direct impact on those that are measured. However, the presence of missing components does become a problem for the estimation of forcing functions.

Consider a system described by states (y, x), where y has been continuously measured and x has not been measured. Suppose a differential equation f(y, x; t) has been proposed to describe it and that $f(y, \cdot\,; t): \mathbb{R}^{k'} \to \mathbb{R}^{k'}$ is invertible for all (y, t). It can be shown that it is only possible to identify the same number of forcing functions as there are observed components; this is proved formally in the Appendix.

Figure 5. Diagnostics for the zebrafish data. Lack-of-fit forcing functions for V, plotted against estimated trajectories for V (left) and R (right), after fitting equations (3) and (4) to the patch clamp data in Figure 1.

Note, however, that identification here assumes a fixed y(t) that is known continuously. This would be the case if we were to first smooth the data and then perform diagnostics as in Varah (1982). The problem of identification for discretely observed data involves the number of observations and the number of basis functions used, and will not necessarily be easy to analyze.
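As a concrete sketch of this smooth-then-diagnose recipe (the paper's supplementary code is Matlab; the Python fragment below is illustrative only, and the model f(x) = -x with true forcing sin(t) is invented for the example), one can smooth the data with a smoothing spline and then measure lack of fit on the derivative scale:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Proposed model: dx/dt = f(x) = -x.  Data are generated from the forced
# system dx/dt = -x + sin(t), so the estimated forcing should recover sin(t).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 400)

# Closed-form solution of x' = -x + sin(t) with x(0) = 0:
x_true = 0.5 * (np.sin(t) - np.cos(t) + np.exp(-t))
y = x_true + rng.normal(0.0, 0.01, t.size)   # noisy observations

# Step 1: smooth the discrete observations; the residual budget s is
# matched to the assumed noise level.
spl = UnivariateSpline(t, y, k=3, s=t.size * 0.01**2)

# Step 2: lack of fit on the derivative scale,
#   g_hat(t) = dy/dt - f(y) = spl'(t) + spl(t).
g_hat = spl.derivative()(t) + spl(t)
```

Plotting g_hat against the smoothed trajectory, rather than against t, mimics the diagnostic plots of Figure 5.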

For a single observed component, it is possible to independently estimate a forcing function for each component in the system and perform model building by including the strongest observed relationships at each step. However, when b out of k components are observed, estimating the $\binom{k}{b}$ possible combinations of forcing functions becomes computationally intractable and analyzing them becomes intellectually infeasible. It may therefore be preferable to impose some identifiability constraints and estimate all the components jointly. A standard method of doing this is to use a ridge-type penalty, augmenting equation (9) with a term $\lambda \sum \tau_i \int g_i(t)^2 \, dt$, where the $\tau_i$ are chosen according to the scale of the variables $x_i$, or according to prior information about which components of $x_i$ are likely to be misspecified. We may regard this as the fully nonlinear equivalent of the model (13) and use the restricted maximum likelihood (REML) estimate for $\lambda$ as a guide to the amount of penalization required. In the author's experience, diagnostic plots are insensitive to the choice of $\lambda$ and very similar to those provided by estimating each $g_i$ individually.
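As a sketch of how such a ridge-type penalty acts, suppose the trajectory has already been smoothed and fixed, so that estimating a single g(t) in a linear B-spline basis reduces to a ridge regression in the basis coefficients. The residual curve, knot placement, and tuning constants lam and tau below are all invented for illustration:

```python
import numpy as np
from scipy.interpolate import BSpline

t = np.linspace(0.0, 10.0, 400)
resid = np.sin(t)              # stand-in for dx/dt - f(x; theta_hat) at t
lam, tau = 1e-2, 1.0           # illustrative tuning constants

# Clamped knots for a linear (k = 1) B-spline basis on 52 break points,
# echoing the linear B-splines used in Section 5.
k = 1
tk = np.r_[0.0, np.linspace(0.0, 10.0, 52), 10.0]
B = BSpline.design_matrix(t, tk, k).toarray()   # (400, 52) basis matrix

# Penalty (lam / tau) * integral g(t)^2 dt, with the Gram matrix of the
# basis approximated on the evaluation grid.
dt = t[1] - t[0]
P = B.T @ B * dt
c = np.linalg.solve(B.T @ B + (lam / tau) * P, B.T @ resid)
g_hat = B @ c
```

For joint estimation across components, each component contributes its own block of coefficients and its own tau_i, but the normal equations retain this ridge structure.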

5. Zebrafish Data

Figure 5 presents lack of fit as estimated by nonlinear least squares for the zebrafish data. For this model and data, a naive implementation of nonlinear least squares proved insufficient to estimate parameters or forcing functions. Instead, they have been estimated via a constrained optimization technique for collocation methods, implemented in the IPOPT routines (Arora and Biegler, 2004; Wachter and Biegler, 2006). A complete description of the use of these methods for this application is given in Hooker and Biegler (2007).

To estimate lack of fit, I used 200 linear B-splines in both variables. I have plotted the lack-of-fit functions for the observed variable V as a function of their estimated position in both variables. These plots make the difficulties of manually assessing lack of fit clear. The most evident relationship is between gV and (V, R), where there is a clear nonlinear dependence. However, because gV is only measured for (V, R)



Figure 6. A fit to the zebrafish data using the system (3)–(4) including the modification (21).

on a one-dimensional curve in (V, R) space, proposing any form to fit this relationship inherently requires extrapolation. I propose that the flat line evident in the right-hand panel of Figure 5 indicates a regime transition: the model provides a relatively good fit on one side of the regime, while different parameters are required on the other.

Formally, this model was implemented by modifying equation (3) to

$$\frac{d}{dt} V = p_1 + p_2 V + p_3 V^2 + p_4 V^3 + p_5 R + p_6 V R + \operatorname{logit}\{(c_1 + c_2 R - V) c_3\} \times (q_1 + q_2 V + q_3 V^2 + q_4 V^3 + q_5 R + q_6 V R). \qquad (21)$$

The logistic term provides a smooth transition between the two regimes. The results of an implementation of this model, with additional parameters estimated in an ad hoc manner by fitting them to the estimated forcing functions, are given in Figure 6. Here, the general shape of the spike is clearly better resolved. However, it is also evident from this figure that the timings of the spikes in the data are not exact, sometimes occurring earlier and sometimes later than a perfectly cyclic system would produce. Optimizing parameters within this system produced a 100-fold decrease in total squared error. However, we note that the modified equations, with the estimated parameters, lie in a chaotic region. Small changes to c3, which controls the steepness of the transition, produce a system that alternates between an eventual fixed point near zero and a system that eventually diverges. Thus, additional analysis is required to satisfy the requirements of a realistic system.
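The dynamical behavior described above is easy to probe numerically. The fragment below integrates a regime-switching system in the spirit of equation (21), with purely illustrative coefficients rather than the fitted values, and the recovery equation is likewise an invented stand-in for equation (4):

```python
import numpy as np
from scipy.integrate import solve_ivp

def expit(u):
    """Logistic function providing the smooth regime transition."""
    return 1.0 / (1.0 + np.exp(-u))

# Illustrative transition parameters only, not the fitted values.
c1, c2, c3 = 0.5, 1.0, 10.0

def rhs(t, s):
    V, R = s
    base = 3.0 * V - V**3 + 3.0 * R         # p-polynomial of equation (3)
    corr = -0.5 * V                         # q-polynomial correction
    w = expit((c1 + c2 * R - V) * c3)       # smooth regime weight
    dV = base + w * corr
    dR = 1.0 / 15.0 - V / 3.0 - R / 15.0    # recovery equation, illustrative
    return [dV, dR]

sol = solve_ivp(rhs, (0.0, 50.0), [0.5, 0.0], max_step=0.1)
```

Rerunning this with perturbed values of c3 and inspecting sol.y is a crude but direct way to check whether a proposed modification preserves bounded, cyclic behavior.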

6. Conclusions

The most interpretable representation for a model's lack of fit is on the scale on which the model is given. For differential equations, it is appropriate to measure lack of fit on the scale of derivatives. This observation leads to the estimation of forcing functions as diagnostic tools. This article has shown that these tools provide useful insight into where a system of differential equations may be misspecified. I have also derived approximate tests for the significance of lack of fit and the need for further modeling.

Exploratory model development is particularly difficult in dynamical systems. Lack of fit suffers from identifiability problems when the posited model has unmeasured components. This makes it important to exercise caution in exploratory model development. As exemplified by our attempts to improve simple neurological models, it is also important to consider the dynamical properties of the proposed modification. In particular, care needs to be taken to preserve the stability of limit cycles, bifurcation points, and other dynamical features of the model. Much further work is therefore warranted. Similarly, interpolation tools that allow lack of fit to be estimated in such a way that it maintains the stability of limit cycles would prove very helpful for visual diagnostics. In the realm of stochastic systems, it would also be useful to develop tests for independence between estimated forcing functions and the trajectory of the model. I anticipate work in this field to be challenging and exciting for a long time.

7. Supplementary Materials

Matlab code for the simulations in this article is available from the Paper Information link at the Biometrics website http://www.biometrics.tibs.org. A full description of the fitting procedures and further diagnostics for the neural data are given in Hooker and Biegler (2007), available from http://www.bscb.cornell.edu/~hooker.

Acknowledgements

The zebrafish data used in this study were provided by Joe Fetcho's lab in the Department of Biology at Cornell University. I am greatly indebted to Larry Biegler for the time and effort he spent in assisting me with the IPOPT system.

References

Arora, N. and Biegler, L. T. (2004). A trust region SQP algorithm for equality constrained parameter estimation with simple parametric bounds. Computational Optimization and Applications 28, 51–86.

Atherton, R. W., Schainker, R. B., and Ducot, E. R. (1975). On the statistical sensitivity analysis of models for chemical kinetics. AIChE Journal 21, 441–448.

Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. New York: Wiley.

Bellman, R. (1953). Stability Theory of Differential Equations. New York: Dover.

Clay, J. R. (2005). Axonal excitability revisited. Progress in Biophysics and Molecular Biology 88, 59–90.

Crainiceanu, C. M. and Ruppert, D. (2004). Likelihood ratio tests for goodness-of-fit of a nonlinear regression model. Journal of Multivariate Analysis 91, 35–52.

Crainiceanu, C. M., Ruppert, D., Claeskens, G., and Wand, M. P. (2005). Exact likelihood ratio tests for penalized splines. Biometrika 92, 91–103.

Deuflhard, P. and Bornemann, F. (2000). Scientific Computing with Ordinary Differential Equations. New York: Springer-Verlag.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal 1, 445–466.

Gu, C. (2002). Smoothing Spline ANOVA Models. New York: Springer.

Hodgkin, A. L. and Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology 117, 500–544.

Hooker, G. and Biegler, L. (2007). IPOPT and neural dynamics: Tips, tricks and diagnostics. Technical Report BU-1676-M, Department of Biological Statistics and Computational Biology, Cornell University.

Huang, Y., Liu, D., and Wu, H. (2006). Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics 62, 413–423.

Ionides, E. L., Breto, C., and King, A. A. (2006). Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences 103, 18438–18443.

Jaeger, J., Blagov, M., Kosman, D., Kolsov, K., Manu, Myasnikova, E., Surkova, S., Vanario-Alonso, C., Samsonova, M., Sharp, D., and Reinitz, J. (2004). Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics 167, 1721–1737.

Kennedy, M. and O'Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society, Series B 63, 425–464.

Leis, J. R. and Kramer, M. A. (1988). The simultaneous solution and sensitivity analysis of systems described by ordinary differential equations. ACM Transactions on Mathematical Software 14, 45–60.

Nagumo, J. S., Arimoto, S., and Yoshizawa, S. (1962). An active pulse transmission line simulating a nerve axon. Proceedings of the IRE 50, 2061–2070.

Ramsay, J. O., Hooker, G., Campbell, D., and Cao, J. (2007). Parameter estimation in differential equations: A generalized smoothing approach. Journal of the Royal Statistical Society, Series B (with discussion) 69, 741–796.

Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. New York: Springer.

Rheinboldt, W. C. (1984). Differential-algebraic systems as differential equations on manifolds. Mathematics of Computation 43, 473–482.

Tjoa, I.-B. and Biegler, L. (1991). Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. Industrial and Engineering Chemistry Research 30, 376–385.

Varah, J. M. (1982). A spline least squares method for numerical parameter estimation in differential equations. SIAM Journal on Scientific and Statistical Computing 3, 28–46.

Wachter, A. and Biegler, L. T. (2006). On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming 106, 25–57.

Wilson, H. R. (1999). Spikes, Decisions and Actions: The Dynamical Foundations of Neuroscience. Oxford, U.K.: Oxford University Press.

Received October 2007. Revised July 2008. Accepted August 2008.

Appendix

Tests for Random Effects

Crainiceanu et al. (2005) derive distributions for the restricted likelihood ratio test of a random effect variance under the null hypothesis that the variance is zero. Specifically, consider a linear mixed model

$$\mathbf{Y} = X\boldsymbol{\beta} + Z\mathbf{b} + \boldsymbol{\varepsilon}, \qquad \operatorname{cov}\begin{pmatrix}\mathbf{b}\\ \boldsymbol{\varepsilon}\end{pmatrix} = \begin{pmatrix}\sigma_b^2\Sigma & 0\\ 0 & \sigma_\varepsilon^2 I_n\end{pmatrix},$$

where Y is an n-vector of responses to be regressed on an n × p predictor matrix X, with a further set of random effects contained in the n × K matrix Z. I assume that the random effects b have covariance matrix $\sigma_b^2\Sigma$ for known $\Sigma$ and that the $\varepsilon$ are independent $N(0, \sigma_\varepsilon^2)$.

Redefining $\lambda = \sigma_b^2/\sigma_\varepsilon^2$, we have $\operatorname{cov}(\mathbf{Y}) = \sigma_\varepsilon^2 V_\lambda$ for $V_\lambda = I_n + \lambda Z \Sigma Z^T$. Twice the negative restricted log likelihood is then

$$\mathrm{REL}(\boldsymbol{\beta}, \sigma_\varepsilon^2, \lambda) = (n - p - 1)\log(\sigma_\varepsilon^2) + \log\{\det(V_\lambda)\} - \log\{\det(X^T V_\lambda^{-1} X)\} + \frac{(\mathbf{Y} - X\boldsymbol{\beta})^T V_\lambda^{-1} (\mathbf{Y} - X\boldsymbol{\beta})}{\sigma_\varepsilon^2}$$

and the restricted likelihood ratio test statistic for $\lambda = 0$ is

$$\mathrm{RLRT}_n = \sup_{\boldsymbol{\beta}, \sigma_\varepsilon^2, \lambda} \mathrm{REL}(\boldsymbol{\beta}, \sigma_\varepsilon^2, \lambda) - \sup_{\boldsymbol{\beta}, \sigma_\varepsilon^2} \mathrm{REL}(\boldsymbol{\beta}, \sigma_\varepsilon^2, 0).$$

Let $P_0 = I_n - X(X^T X)^{-1} X^T$, and let $\mu_s$ and $\xi_s$ be the eigenvalues of $\Sigma^{1/2} Z^T P_0 Z \Sigma^{1/2}$ and $\Sigma^{1/2} Z^T Z \Sigma^{1/2}$, respectively. Letting $u_s$ for $s = 1, \ldots, K$ and $w_s$ for $s = 1, \ldots, n - p - 1$ be independent $N(0, 1)$ variables and

$$N_n(\lambda) = \sum_{s=1}^{K} \frac{\lambda\mu_s}{1 + \lambda\mu_s}\, w_s^2, \qquad D_n(\lambda) = \sum_{s=1}^{K} \frac{w_s^2}{1 + \lambda\mu_s} + \sum_{s=K+1}^{n-p-1} w_s^2,$$

$\mathrm{RLRT}_n$ is equal in distribution to

$$\sup_{\lambda \geq 0}\left[(n - p - 1)\log\left\{1 + \frac{N_n(\lambda)}{D_n(\lambda)}\right\} - \sum_{s=1}^{K}\log(1 + \lambda\mu_s)\right].$$

Although this appears complex, the distribution is easily simulated.
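For instance, a minimal Monte Carlo simulation of this null distribution, maximizing the expression over a grid of λ values, might look like the following; the eigenvalues mu supplied in the usage lines are invented for illustration:

```python
import numpy as np

def rlrt_null_sample(mu, n, p, n_sims=2000, seed=0):
    """Monte Carlo draws from the null distribution of RLRT_n via the
    spectral representation above; mu holds the K eigenvalues mu_s."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    K, m = mu.size, n - p - 1
    lam = np.concatenate(([0.0], np.logspace(-4, 4, 81)))  # grid for the sup
    lam_mu = np.outer(lam, mu)                             # (grid, K)
    draws = np.empty(n_sims)
    for i in range(n_sims):
        w2 = rng.standard_normal(m) ** 2
        # N_n(lambda) and D_n(lambda) evaluated on the whole grid at once.
        Nn = (lam_mu / (1.0 + lam_mu)) @ w2[:K]
        Dn = (w2[:K] / (1.0 + lam_mu)).sum(axis=1) + w2[K:].sum()
        stat = m * np.log1p(Nn / Dn) - np.log1p(lam_mu).sum(axis=1)
        draws[i] = stat.max()
    return draws

# e.g., a 5% critical value for three assumed eigenvalues:
draws = rlrt_null_sample(mu=[50.0, 10.0, 2.0], n=100, p=2)
crit = np.quantile(draws, 0.95)
```

Because λ = 0 is included in the grid, every draw is nonnegative, and the point mass of the null distribution at zero appears automatically.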

Lack-of-Fit Identifiability

Section 4 explored identifiability of lack-of-fit functions for systems described by two variables. The extension to more complex settings cannot be made so directly. I examine this setting by breaking the system into components that are or are not measured, and that are or are not forced. In the following theorem they will be given distinct labels:

Theorem 1: Let $\mathbf{x}^0(t)$ and $\mathbf{j}^0(t)$ have dimension $k_0$; let $\mathbf{x}^1(t)$, $\mathbf{y}(t)$, and $\mathbf{j}(t)$ have dimension $k_1$ for each $t \in [0, 1]$; and let $\mathbf{z}(t)$ have dimension $k_2$. Let $\mathbf{f}^0(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$, $\mathbf{f}^1(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$, $\mathbf{g}(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$, and $\mathbf{h}(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$ be Lipschitz continuous in their arguments, let $\mathbf{h}$ be bounded, and let $\mathbf{f}^1(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$ have a Lipschitz continuous inverse $\mathbf{f}_y^{-1}(\mathbf{x}^0, \mathbf{x}^1, \mathbf{w}, \mathbf{z}; t)$ in $\mathbf{y}$ for each $\mathbf{x}^0, \mathbf{x}^1, \mathbf{z}, t$. For fixed, continuously differentiable $\mathbf{x}^0(t)$ and $\mathbf{x}^1(t)$, let $\mathbf{y}_0, \mathbf{z}_0$ be such that $\dot{\mathbf{x}}^1_0 = \frac{d}{dt}\mathbf{x}^1\big|_0 = \mathbf{f}^1(\mathbf{x}^0_0, \mathbf{x}^1_0, \mathbf{y}_0, \mathbf{z}_0; 0)$. Then the equations

$$\frac{d}{dt}\mathbf{x}^0 = \mathbf{f}^0(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t) + \mathbf{j}^0(t)$$
$$\frac{d}{dt}\mathbf{x}^1 = \mathbf{f}^1(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$$
$$\frac{d}{dt}\mathbf{y} = \mathbf{g}(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t) + \mathbf{j}(t)$$
$$\frac{d}{dt}\mathbf{z} = \mathbf{h}(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$$

have unique continuous solutions $\mathbf{y}(t)$, $\mathbf{z}(t)$, $\mathbf{j}^0(t)$, and $\mathbf{j}(t)$, with $\mathbf{y}(t)$ and $\mathbf{z}(t)$ continuously differentiable.



Proof. Solutions may be constructed directly by noting that any solution $\mathbf{z}(t)$ must satisfy

$$\frac{d}{dt}\mathbf{z} = \mathbf{h}\left(\mathbf{x}^0, \mathbf{x}^1, \mathbf{f}_y^{-1}(\mathbf{x}^0, \mathbf{x}^1, \dot{\mathbf{x}}^1, \mathbf{z}; t), \mathbf{z}; t\right), \qquad \mathbf{z}(0) = \mathbf{z}_0 \qquad (A.1)$$

(note that $\mathbf{y}_0 = \mathbf{f}_y^{-1}(\mathbf{x}^0_0, \mathbf{x}^1_0, \dot{\mathbf{x}}^1_0, \mathbf{z}_0; 0)$ by assumption). Under the assumption of Lipschitz continuity, the right-hand side of (A.1) is also Lipschitz continuous. This provides local existence and uniqueness for solutions of (A.1); see, for example, Bellman (1953). Boundedness of $\mathbf{h}$ guarantees a global solution.

We can now reconstruct $\mathbf{y}(t)$, $\mathbf{j}^0(t)$, and $\mathbf{j}(t)$ as

$$\mathbf{y}(t) = \mathbf{f}_y^{-1}(\mathbf{x}^0, \mathbf{x}^1, \dot{\mathbf{x}}^1, \mathbf{z}; t),$$
$$\mathbf{j}^0(t) = \frac{d}{dt}\mathbf{x}^0 - \mathbf{f}^0(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t),$$
$$\mathbf{j}(t) = \frac{d}{dt}\mathbf{y} - \mathbf{g}(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t).$$

Continuity follows directly.

We remark that this construction is only possible when $\mathbf{y}(t)$ has the same dimension as $\mathbf{x}^1(t)$; otherwise no such $\mathbf{f}_y^{-1}$ will exist. Rather than insisting on this inverse, we could proceed to solve the differential-algebraic system

$$0 = \frac{d}{dt}\mathbf{x}^1 - \mathbf{f}^1(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$$
$$\frac{d}{dt}\mathbf{z} = \mathbf{h}(\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}, \mathbf{z}; t)$$

in terms of $\mathbf{y}(t)$ and $\mathbf{z}(t)$ and proceed with the substitution. This allows slightly different conditions on $\mathbf{f}^1$; see, for example, Rheinboldt (1984).

The division of components into $\mathbf{x}^0, \mathbf{x}^1, \mathbf{y}$, and $\mathbf{z}$ is arbitrary here; the point is that for any division for which $\mathbf{f}_y^{-1}$ exists, solutions may be found.
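The construction in the proof can be checked numerically in a toy scalar system in which the inverse $\mathbf{f}_y^{-1}$ is trivial; the model and forcing below are invented for illustration, with finite differences standing in for the exact derivatives of a continuously observed x(t):

```python
import numpy as np

# Toy check of the construction: with scalar states, take
#   dx/dt = f1(x, y) = y   (trivially invertible in y),
#   dy/dt = g(x, y) + j(t),  with g(x, y) = -x and true forcing j(t) = sin(t).
# Given x(t) observed continuously, the construction gives
#   y(t) = x'(t)  and  j(t) = y'(t) - g(x, y) = x''(t) + x(t).
t = np.linspace(0.0, 10.0, 2001)
x = -0.5 * t * np.cos(t)            # chosen so that x'' + x = sin(t) exactly

y_hat = np.gradient(x, t)           # reconstruct y = dx/dt
j_hat = np.gradient(y_hat, t) + x   # reconstruct j = dy/dt - g(x, y)
```

Away from the boundary, where one-sided differences degrade the derivative estimates, j_hat recovers sin(t) closely, mirroring the uniqueness claim of Theorem 1.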