This article was downloaded by: [University of North Carolina]
On: 13 November 2014, At: 13:46
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Statistical Theory and Practice
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ujsp20

Statistical Inference for Non-Linear Models Involving Ordinary Differential Equations
Sujit K. Ghosh (a) & Lovely Goyal (b)
(a) Department of Statistics, NC State University, Raleigh, NC 27695-8203, USA
(b) Medical Sciences Biostatistics, Amgen Inc., Thousand Oaks, CA 91230-1799, USA
Published online: 30 Nov 2011.

To cite this article: Sujit K. Ghosh & Lovely Goyal (2010) Statistical Inference for Non-Linear Models Involving Ordinary Differential Equations, Journal of Statistical Theory and Practice, 4:4, 727-742, DOI: 10.1080/15598608.2010.10412015
To link to this article: http://dx.doi.org/10.1080/15598608.2010.10412015

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Statistical Inference for Non-Linear Models Involving Ordinary Differential Equations




© Grace Scientific Publishing
Journal of Statistical Theory and Practice, Volume 4, No. 4, December 2010

Statistical Inference for Non-linear Models involvingOrdinary Differential Equations

Sujit K. Ghosh, Department of Statistics, NC State University, Raleigh, NC 27695-8203, USA. Email: [email protected]

Lovely Goyal, Medical Sciences Biostatistics, Amgen Inc., Thousand Oaks, CA 91230-1799, USA. Email: [email protected]

Received: July 21, 2010 Revised: November 3, 2010

Abstract

In the context of nonlinear fixed effect modeling, it is common to describe the relationship between a response variable and a set of explanatory variables by a system of nonlinear ordinary differential equations (ODEs). More often than not, such a system of ODEs does not have any analytical closed form solution, making parameter estimation for these models quite challenging and computationally very demanding. Two new methods based on Euler's approximation are proposed to obtain an approximate likelihood that is analytically tractable, thus making parameter estimation computationally less demanding than with other competing methods. These methods are illustrated using data on the growth of colonies of paramecium aurelium, and simulation studies are presented to compare the performance of these new methods to other established methods in the literature.

AMS Subject Classification: 62F03; 62F15; and 62P10.

Key-words: Bayesian inference; Ordinary differential equations; MCMC; Non-linear models; Splines.

1. Introduction

In the field of biomedical applications, data usually consist of repeated measurements on individuals observed under varying experimental conditions. For example, in pharmacokinetics, several blood samples are taken from participating individuals over a period of time following administration of a drug. These individuals can be considered a random sample drawn from a population of interest. More often than not, the relationship between the measured response and the varying experimental conditions is nonlinear and involves unknown parameters of interest. The model is then fitted to data sets from different individuals,

* 1559-8608/10-4/$5 + $1pp - see inside front cover. © Grace Scientific Publishing, LLC


where the main interest is to make inferences about population characteristics and, in special cases, about individual characteristics, which requires a mixed effects modeling framework. However, in this paper we treat each individual's data as a separate data set, and therefore the scope of this paper is restricted to nonlinear fixed effects models. Extensions of our methods to the nonlinear mixed effects modeling framework will be presented elsewhere.

Within the framework of nonlinear models (NLM), much of the interest is focused on representing the mean function (or mean trajectory), describing the dynamic relationship between the response and explanatory variables (such as time), by a system of ordinary differential equations (ODEs) whose parameters describe the different characteristics of the underlying population. A system of ODEs therefore provides an attractive modeling tool for describing dynamic processes, where the interest is focused on modeling the rate of change over time rather than the static average value of the response variable.

In Ho et al. (1995), ordinary differential equations were used to analyze the temporal dynamics of HIV viral load measurements in AIDS patients, and their results revealed that the HIV virus replicates at a very high rate. In another example, a system of nonlinear ODEs was used to describe the temporal expectation of virus and infected cell densities after initiation of anti-retroviral treatment (Perelson et al., 1996). In the case of an HIV study, parameters involved in the differential equations can characterize rates of production, infection, and death of immune system cells, as well as viral production and clearance (Ding and Wu, 1999).

It is well known that when a closed form analytic solution is available for the system of ODEs, the parameters can be estimated using standard statistical packages, e.g., R, SAS, WinBUGS, etc. For example, in Han, Chaloner and Perelson (2002), parameters involved in a system of ODEs were estimated using an analytical solution of the ODEs. The closed form analytical solution was obtained by assuming that the virus dynamics are in steady state prior to initiation of the anti-retroviral therapy. However, in practice it turns out that such steady state assumptions may not hold, and there are very few cases where it is actually possible to derive a closed form expression for the exact solution of a well-posed system of ODEs. The parameter estimation problem for such models becomes challenging and computationally demanding in the absence of an analytical closed form solution for the system of ODEs.

The objective of this article is to develop computationally efficient methods for statistical inference on the parameters of a NLM that involves a system of ODEs, in the absence of an analytical solution. In Section 2, we describe the nonlinear statistical models and provide a brief review of the associated numerical methods. In Section 3, we present the two proposed methods based on Euler's approximation: (i) the Bayesian Euler's approximation method (BEAM) in Section 3.1 and (ii) the splines Euler's approximation method (SEAM) in Section 3.2. We then illustrate our methods in Section 4 by applying them to data on the growth of colonies of paramecium aurelium. Simulation studies motivated by this application are then presented in Section 5. Finally, in Section 6, we provide some general conclusions and directions for future research.

2. Nonlinear models involving ODEs

Let y_j denote the jth observed response, measured at time point t_j, for j = 1, 2, ..., n. To keep our description simple, we consider time as the only dynamic explanatory variable in the model. However, the methodologies proposed in this paper can be extended to the more general case with multiple dynamic covariates. The statistical model can be written as,

$$y_j = \mu(t_j, \theta) + \epsilon_j, \qquad j = 1, 2, \ldots, n. \qquad (2.1)$$

In equation (2.1), µ is the mean function describing the population dynamics of the response, and depends on a vector θ = (θ_1, ..., θ_p)^T of p regression parameters. In the context of many biological applications (e.g., PK/PD or PBPK models), µ can be defined as the solution of a system of ODEs given by

$$\frac{d\nu}{dt} = g(\nu(t, \theta)) \quad \text{for } t \geq t_0 \qquad (2.2)$$

$$\nu(t_0, \theta) = \nu_0(\theta) \qquad (2.3)$$

where ν(·) = (ν_1(·), ..., ν_q(·))^T represents the underlying vector of dynamics and ν_0(·) provides a set of known initial conditions (often free of θ). The q-vector valued function g(·) = (g_1(·), ..., g_q(·))^T that describes the dynamics is completely known up to the unknown parameter θ. Notice that (2.2) can equivalently be expressed as a set of q ODEs, dν_k/dt = g_k(ν(t, θ)) for k = 1, ..., q.

The mean function µ(·) is related to ν by a completely known function H: R^q → R via µ(·) = H(ν(·)). In this paper, for simplicity, we use a single-compartmental system, i.e., q = 1, for all our illustrations, but the methods proposed here can be applied to the general case of a q-compartmental system of ODEs. The random errors ε_j correspond to the measurement uncertainties associated with the observed response at different time points. These random errors are assumed to be independently, identically distributed (iid) with zero mean and constant variance across all measurements, i.e.,

$$E(\epsilon_j) = 0 \quad \text{and} \quad \mathrm{Var}(\epsilon_j) = \sigma^2, \quad \text{for } j = 1, 2, \ldots, n. \qquad (2.4)$$

The iid assumption for the errors is clearly restrictive, and in Section 6 we discuss how our methods can be extended to allow the variance of the errors to change with time, i.e., Var(ε_j) = σ²(t_j, η) with unknown parameter η.

The objective is to estimate the parameter vector θ and the variance parameter σ². The parameter estimates could be obtained by numerical methods, using packages like nlm in R or proc nlin in SAS, if an analytic closed form expression were available for the mean function µ(·). However, as discussed earlier, in most cases such an analytical solution for the system of ODEs either requires restrictive assumptions or is simply not available. The lack of a closed form expression for the mean function makes the parameter estimation problem challenging, and this is the focus of our research.

The usual approach to overcome this problem is to solve the system of ODEs numerically, using popular ODE solvers, at a known set of values of the parameter θ. However, the success of most of these numerical approximation methods depends on a "good" choice of starting value for θ and on some characteristics of the system (e.g., its steepness). A "bad" starting value often leads to an unstable solution and creates numerical problems for the optimization method that follows these ODE solvers. The popular odesolve package in R provides an interface to the Fortran ODE solver lsoda (Petzold, 1987). The numerical


solutions obtained from these ODE solvers are then used to obtain parameter estimates, either by a Bayesian method (Gelman, Bois and Jing, 1996; Wakefield, 1996; Lunn et al., 2002; Putter et al., 2002) or by the maximum likelihood method (Davidian and Giltinan, 1995; Racine-Poon and Wakefield, 1998). Even though ODE solvers are widely used for estimating parameters in PK/PD modeling, they may be difficult to implement or may lack control over essential numerical subroutines required to obtain the desired numerical solution of the ODE. Moreover, these numerical methods also turn out to be unstable, especially in the case of censored or missing data (Putter et al., 2002). Apart from such numerical instabilities, these methods are computationally intensive iterative procedures in which the system of ODEs must be solved at each time point for each individual, and this becomes more complicated in the case of multi-compartmental problems with censored or missing data.

An alternative approach, known as the "Integrated Data" (ID) method, for parameter estimation in models described by a system of ODEs was proposed by Holte et al. (2003). The idea behind the ID method is to simplify a nonlinear regression problem by transforming a system of ODEs into a system of integral equations and fitting a linear regression model with "covariates" given by the approximate integrals in these equations. However, the ID method requires a set of dense measurements taken from each compartment represented in the ODE system, which may be difficult or costly, if not impossible, especially when the system consists of multiple compartments, i.e., when q > 1. We compared our methods to the ID approach in a simulation study and found the results competitive; however, we have not presented the results from the ID method in this paper due to lack of space. In this paper, we propose two alternative methodologies to resolve the numerical problems related to parameter estimation.

The first approach will be termed the "Bayesian Euler's Approximation Method (BEAM)"; it builds on the existing Bayesian framework for parameter estimation in PK/PD modeling (Gelman, Bois and Jing, 1996; Lunn et al., 2002; Han, Chaloner and Perelson, 2002; Putter et al., 2002; Wakefield, 1996; Huang, Liu and Wu, 2004), using Euler's method of solving a system of ODEs and thus providing an analytic closed form approximation of the likelihood function. The advantages of BEAM are that it provides a closed form analytic approximation of the likelihood and that it can handle missing or censored data by using data augmentation methods (Schafer, 1997). It also has the flexibility to handle sparse and/or unbalanced data. The availability of the entire posterior distribution of the unknown parameters (θ, σ) also makes it straightforward to draw statistical inferences.

The second approach will be termed the "Splines Euler's Approximation Method (SEAM)"; it uses a suitable class of interpolating spline functions (Wahba, 1990) to pre-process the data before applying Euler's method to approximate the likelihood function. The advantages of SEAM are that it relaxes the distributional assumptions on the errors and offers substantial computational savings over competing methods.

3. Likelihood approximation using the Euler’s method

Many different numerical approximation methods are available for computing approximate solutions to a system of ODEs with a given set of initial conditions, such as the generic problem given by (2.2) and (2.3).

A numerical approximation method is basically a prescription for replacing the system of


ODEs by a system of linear algebraic equations that can be solved on a computer using software written in a standard programming language (Lambert, 1987). A detailed discussion of these methods can be found in the literature (Shampine, 1994; Lambert, 1987; Atkinson, 1978). All numerical approximation methods involve discretizing the time points by an amount h known as the "step size," the distance between two consecutive time points. This step size may or may not be the same for all consecutive time points, but for our description we assume h to be constant over the range of time points; i.e., we assume that t_k^0 = t_0 + hk, for k = 0, 1, 2, ..., represent the discretized time points. It is easy to see that the

solution to the system of ODEs in equation (2.2) can be expressed as

$$\nu(t, \theta) = \int_{t_0}^{t} g(\nu(s, \theta))\, ds + \nu_0(\theta) \qquad (3.1)$$

which suggests the approximation, as h → 0,

$$\nu(t+h, \theta) - \nu(t, \theta) = \int_{t}^{t+h} g(\nu(s, \theta))\, ds \approx h\, g(\nu(t, \theta)) \qquad (3.2)$$

and hence an approximation for µ(t, θ) = H(ν(t, θ)), where H is a completely known continuous function. Thus, using (3.2) we can obtain a recurrence relation to approximate the mean function.
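The recurrence implied by (3.2) takes only a few lines of code. The sketch below is our own minimal illustration, not the authors' implementation; the logistic dynamics g(ν) = ν(θ1 − θ2ν) and the parameter values are hypothetical stand-ins for a one-compartment system (so H is the identity):

```python
import numpy as np

def euler_mean(g, nu0, t0, h, N, H=lambda v: v):
    """Naive Euler recurrence nu_{k+1} = nu_k + h*g(nu_k) on the grid
    t_k = t0 + (k-1)*h, k = 1, ..., N; returns the grid and mu_k = H(nu_k)."""
    nu = np.empty(N)
    nu[0] = nu0
    for k in range(N - 1):
        nu[k + 1] = nu[k] + h * g(nu[k])
    t_grid = t0 + h * np.arange(N)
    return t_grid, H(nu)

# hypothetical one-compartment logistic dynamics with theta = (0.5, 0.001), nu(0) = 2
theta1, theta2 = 0.5, 0.001
t_grid, mu = euler_mean(lambda v: v * (theta1 - theta2 * v),
                        nu0=2.0, t0=0.0, h=0.1, N=200)
```

For q > 1 the same loop applies with `nu` as an N-by-q array; the choice of step size h trades accuracy for computing time, exactly as discussed below.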

We now describe a method based on (3.2) to approximate the likelihood arising from the model given by equations (2.1)-(2.4). Let t_1 < t_2 < ... < t_n denote the observed time points in the data set, at which we observe the response values {Y_j = Y(t_j) : j = 1, 2, ..., n}. As the observed time points can be unevenly distributed, we first consider a discretization by N fixed time points t_0 = t_1^0 < t_2^0 < ... < t_N^0 such that t_{k+1}^0 - t_k^0 = h for k = 1, 2, ..., (N-1). In order to cover the range of observed time points, we choose the maximum of these fixed time points to be larger than t_n; in other words, we assume t_N^0 > t_n. The choice of h (and hence that of N) will depend on the sample size n. Letting ν_k ≡ ν(t_k^0, θ) and µ_k ≡ µ(t_k^0, θ) = H(ν_k) for k = 1, 2, ..., N-1, we can write

$$\nu_{k+1} = \nu_k + h\, g(\nu_k), \qquad \mu_k = H(\nu_k), \qquad (3.3)$$

with initial condition ν_1 = ν_0(θ). This simple approximation, defined by equation (3.3), forms the basis of our analytical approximation. Now, to define µ(t, θ) for any value of t ∈ [t_1^0, t_N^0], we use linear interpolation. More precisely, given a time point t ∈ [t_1^0, t_N^0], we define the labels,

$$L(t) = \sum_{k=1}^{N} I(t_k^0 \leq t). \qquad (3.4)$$

Notice that for any t ∈ [t_1^0, t_N^0], the function L(t) takes values in the range {1, ..., (N-1)} and counts how many t_k^0's are less than or equal to t, which in turn provides the lower limit of the interval containing t, i.e., t_{L(t)}^0 ≤ t < t_{L(t)+1}^0. The value of the approximate mean function at t ∈ [t_1^0, t_N^0] is then given by,


$$\mu(t, \theta) \equiv \mu_h(t, \theta) = \mu_{L(t)} + \frac{t - t_{L(t)}^0}{t_{L(t)+1}^0 - t_{L(t)}^0}\left(\mu_{L(t)+1} - \mu_{L(t)}\right), \qquad (3.5)$$

where L(t) is defined in (3.4) and the µ_k's are defined in (3.3). Thus, we can approximate the true likelihood function of (θ, σ) using the function µ(t, θ). Notice that µ_h(t, θ) → µ(t, θ) as h → 0 (Lambert, 1987). In fact, µ_h(t, θ) = µ(t, θ) + o(h) if we use the above method, known as the "naive" Euler's method. If we use the "improved" Euler's method (see Appendix A), then µ_h(t, θ) = µ(t, θ) + o(h²), and more generally the Runge-Kutta method yields µ_h(t, θ) = µ(t, θ) + o(h⁴). Thus, it follows that as h → 0, the likelihood function based on the true mean function µ(t, θ) can be well approximated by the likelihood function based on the approximating function defined above. We now use (3.3) and (3.5) as the basis to propose two methods, which can be further improved using other numerical recipes, as described in Appendix A, at the cost of computational time.

3.1. The Bayesian Euler's Approximation Method (BEAM)

In order to construct a likelihood based on (3.5), we assume that the errors are iid and normally distributed with mean zero and variance σ². Given the observed data D = {(Y_j, t_j) : j = 1, 2, ..., n}, the model described by equations (2.1)-(2.4) can now be approximated by the following hierarchical model:

$$y_j \mid (\theta, \sigma^2) \overset{\text{indep}}{\sim} N(\mu_j(\theta), \sigma^2) \quad \text{for } j = 1, 2, \ldots, n \qquad (3.6)$$

where µ_j(θ) = µ(t_j, θ) as defined in (3.5). For practical applications, the y_j's may be the log-transformed (or, more generally, Box-Cox transformed) values, depending on whether normality or log-normality is the more appropriate assumption for the data (Lunn et al., 2002).

The second and final stage of this hierarchical model consists of specifying prior distributions for the parameters as follows:

$$\theta \mid \sigma^2 \sim MVN_p(\theta_0, H_0) \quad \text{and} \quad \sigma^2 \sim IG(a_0, b_0), \qquad (3.7)$$

where MVN_p(θ_0, H_0) denotes a multivariate normal distribution with mean θ_0 and variance matrix H_0, and IG(a_0, b_0) denotes an inverse gamma distribution with mean b_0/(a_0 - 1). The values of a_0, b_0, θ_0, and H_0 are assumed to be known and are used to elicit prior information when available. In the absence of prior information, we choose these known quantities to reflect prior ignorance, selecting values that yield vague priors (i.e., priors with large variances). For a detailed discussion about the choice of prior distribution, see Natarajan and Kass (2000). The joint posterior distribution for the parameters θ and σ² based on the model (3.6)-(3.7) can be written as:

$$p(\theta, \sigma^2 \mid D) \propto p(Y \mid \theta, \sigma^2)\, p(\theta)\, p(\sigma^2), \qquad (3.8)$$

where Y = (Y_1, ..., Y_n). Clearly, the above posterior density is analytically intractable, as it is highly nonlinear in θ. In order to use sampling based methods, such as Markov chain Monte Carlo (see Robert and Casella, 2005), we obtain the full conditionals of θ and σ², which are given by,


$$p(\theta \mid \sigma^2, D) \propto p(Y \mid \theta, \sigma^2)\, p(\theta) \qquad (3.9)$$

$$p(\sigma^2 \mid \theta, D) \propto p(Y \mid \theta, \sigma^2)\, p(\sigma^2). \qquad (3.10)$$

If the right hand sides of the above equations correspond to densities of standard form, then we can use Gibbs sampling (Geman and Geman, 1984) to simulate values from the posterior distribution of (θ, σ²). For instance, a standard form is available for the full conditional of σ²:

$$\sigma^2 \mid \theta, D \sim IG\!\left(a_0 + \frac{n}{2},\; b_0 + \frac{1}{2}\sum_{j=1}^{n} \left(y_j - \mu_j(\theta)\right)^2\right), \qquad (3.11)$$

and therefore we can easily draw samples from the full conditional of σ². Since we do not have a standard form for the conditional distribution of θ, we can use the Metropolis-Hastings algorithm (Hastings, 1970) to draw samples. Though we used WinBUGS to generate approximate samples from the posterior distribution of (θ, σ²), here we give a brief outline of the iterative MCMC algorithm suitable for BEAM.

1. Initialize the iteration counter of the chain at l = 0 and start with some initial values, S^(0) = (σ^2(0), θ^(0)).

2. Obtain S^(l) from S^(l-1) in two steps:

   (a) Draw σ^2(l) ∼ π(σ² | θ^(l-1), Y) using (3.11), and

   (b) For θ^(l), generate a new value φ from a symmetric proposal density q(φ | θ^(l-1)). Evaluate the acceptance probability of the move, given by,

$$\alpha(\phi \mid \theta^{(l-1)}) = \min\left\{1,\; \frac{\pi(\phi \mid \sigma^{2(l)}, Y)}{\pi(\theta^{(l-1)} \mid \sigma^{2(l)}, Y)}\right\}$$

   Also, independently sample u from the uniform (0,1) distribution; if u ≤ α(φ | θ^(l-1)), the move is accepted, else stay at θ^(l-1). In other words, if the move is accepted then θ^(l) = φ; otherwise θ^(l) = θ^(l-1).

3. Move from iteration l to l+1 using step (2) and repeat until the Markov chain {S^(l), l = 1, 2, ...} converges (to p(θ, σ² | D)).

In practice, WinBUGS uses a Metropolis algorithm based on a normal proposal distribution with mean equal to the current value of the parameter and variance determined by tuning over the first 4000 iterations to achieve an acceptance rate between 20% and 40%. To diagnose convergence, we used graphical techniques such as history plots (available in WinBUGS) of the values of multiple chains for each parameter. Based on these plots, we decided upon an initial number of burn-in iterations (e.g., at least 4000), followed by, say, B = 2000 samples per chain drawn afterwards. We performed MCMC sampling based on three parallel chains; therefore, all posterior summaries are based on a total of 3B = 6000 samples. The mean of the posterior distribution of each unknown parameter was taken as its posterior estimate, along with a 95% posterior interval formed by the 2.5% and 97.5% posterior percentiles.
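The Metropolis-within-Gibbs scheme above can be sketched in a few lines. The example below is our own toy illustration, not the paper's WinBUGS code: it assumes a hypothetical scalar mean function µ_j(θ) = θ·t_j (so the focus stays on the sampler itself), vague priors as in the paper, and a fixed-width normal random-walk proposal in place of WinBUGS's adaptive tuning:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data with hypothetical mean mu_j(theta) = theta * t_j (illustration only)
t = np.linspace(0.5, 10.0, 20)
theta_true, sigma_true = 1.5, 0.3
y = theta_true * t + rng.normal(0.0, sigma_true, t.size)

a0, b0 = 0.01, 0.01            # vague IG(a0, b0) prior on sigma^2
th0, v0 = 0.0, 100.0           # vague N(th0, v0) prior on theta

def log_cond_theta(theta, sig2):
    """Log full conditional of theta up to a constant, as in (3.9)."""
    resid = y - theta * t
    return -0.5 * np.sum(resid ** 2) / sig2 - 0.5 * (theta - th0) ** 2 / v0

theta, sig2, draws = 0.0, 1.0, []
for l in range(6000):
    # (a) Gibbs step: sigma^2 | theta is inverse gamma, as in (3.11)
    rss = np.sum((y - theta * t) ** 2)
    sig2 = 1.0 / rng.gamma(a0 + t.size / 2.0, 1.0 / (b0 + 0.5 * rss))
    # (b) Metropolis step for theta with a symmetric normal proposal
    prop = theta + rng.normal(0.0, 0.05)
    if np.log(rng.uniform()) <= log_cond_theta(prop, sig2) - log_cond_theta(theta, sig2):
        theta = prop
    if l >= 4000:              # discard burn-in, as in the paper
        draws.append((theta, sig2))

draws = np.array(draws)
```

For BEAM proper, `theta * t` would be replaced by the Euler-plus-interpolation mean µ_h(t_j, θ) of (3.5), and θ would be a vector updated componentwise or as a block.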


3.2. The Splines Euler’s Approximation Method (SEAM)

Interpolation is a method of constructing a smooth curve from a discrete set of known data points; it is a special case of curve fitting in which the function must pass exactly through the data points. Spline interpolation is a form of interpolation in which the interpolant is a special type of piecewise polynomial function called a spline. Spline interpolation is preferred over ordinary polynomial interpolation because the interpolation error can be made small even when using low degree polynomials for the spline. The motivation behind the second approach, the "Splines Euler's Approximation Method (SEAM)," is to obtain a smooth estimate of the mean using an interpolating spline by regressing the observed response on the observed time points. Given the observed data D (as defined in the previous section), we fit a cubic spline interpolation passing through each of the time points t_j. More specifically, for each interval [t_j, t_{j+1}] there is a separate cubic polynomial, each with its own coefficients:

$$S_j(t) = a_j(t - t_j)^3 + b_j(t - t_j)^2 + c_j(t - t_j) + d_j \quad \text{for } t \in [t_j, t_{j+1}].$$

Together, these segments constitute the spline S(t), which must satisfy the following conditions:

(i) Interpolation: S_j(t_j) = y_j and S_j(t_{j+1}) = y_{j+1}; and

(ii) Continuity of first and second derivatives: S'_{j-1}(t_j) = S'_j(t_j) and S''_{j-1}(t_j) = S''_j(t_j).

A detailed discussion of splines and interpolation can be found in the literature (Wahba, 1990).

In order to construct the approximating mean function µ(t, θ), we use the following steps:

1. Fit an interpolating spline Ŷ(t) by regressing the observed data Y_j on the observed time points t_j for j = 1, 2, ..., n, and obtain the predicted values Ŷ_k = Ŷ(t_k^0) corresponding to each of the fixed time points t_k^0 = t_0 + hk, for k = 1, ..., N. This creates a pseudo-data set {(Ŷ_k, t_k^0) : k = 1, 2, ..., N} suitable for Euler's approximation.

2. Next, use Euler's method (3.3) to construct an approximate mean function µ(t_k^0, θ).

3. Finally, obtain the least squares estimate θ̂ by minimizing

$$SS(\theta) = \sum_{k=1}^{N} \left(\hat{Y}_k - \mu(t_k^0, \theta)\right)^2, \qquad (3.12)$$

and then obtain σ̂² = (1/(N-p)) Σ_{k=1}^{N} (Ŷ_k − µ(t_k^0, θ̂))².

The first advantage of this approach lies in the fact that no parametric distributional assumption is needed for the errors. Another advantage comes from the use of splines for curve fitting: since splines can fit data observed over sparse and unevenly spaced time points, the SS(θ) defined by (3.12) becomes a smooth function of θ, facilitating the minimization problem in (3.12). Moreover, this approach is computationally very efficient and easy to implement, requiring no iterative MCMC procedure. SEAM therefore provides an attractive alternative to other, computationally intensive procedures.
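The three steps above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it uses SciPy's CubicSpline as the interpolating spline (in place of R's interpSpline) and Nelder-Mead (in place of R's optim), with a log-scale logistic dynamics of the kind used in Section 4 standing in for g:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize

def seam_fit(t_obs, y_obs, g, mu0, N, theta_init):
    """SEAM: (1) interpolate (t_obs, y_obs) onto an even grid of N points,
    (2) run Euler's method (3.3) at candidate theta, (3) minimize SS (3.12)."""
    grid = np.linspace(t_obs[0], t_obs[-1], N)
    h = grid[1] - grid[0]
    y_tilde = CubicSpline(t_obs, y_obs)(grid)          # step 1: pseudo-data

    def ss(theta):                                      # steps 2 and 3
        mu = np.empty(N)
        mu[0] = mu0
        for k in range(N - 1):
            mu[k + 1] = mu[k] + h * g(mu[k], theta)
        return np.sum((y_tilde - mu) ** 2)

    res = minimize(ss, theta_init, method="Nelder-Mead")
    sigma2_hat = res.fun / (N - len(res.x))             # error variance estimate
    return res.x, sigma2_hat

# noise-free data from the log-scale logistic ODE d(mu)/dt = th1 - th2*exp(mu)
th = (0.5, 0.001)
t_obs = np.linspace(0.0, 19.0, 20)
nu = th[0] * 2.0 * np.exp(th[0] * t_obs) / (th[0] + th[1] * 2.0 * (np.exp(th[0] * t_obs) - 1.0))
y_obs = np.log(nu)
theta_hat, s2 = seam_fit(t_obs, y_obs, lambda m, p: p[0] - p[1] * np.exp(m),
                         mu0=np.log(2.0), N=100, theta_init=np.array([0.4, 0.002]))
```

Since the whole fit is a single smooth least squares minimization over the pseudo-data, no MCMC is needed; the residual bias of the naive Euler step shrinks as N grows.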


4. Analysis of Growth Colonies of Paramecium Aurelium

Diggle (1990) presents a data set that describes the growth of three closed colonies of paramecium aurelium in a nutritive medium over a 19-day period. For a detailed description of this experiment, we encourage readers to refer to Diggle (1990, p. 8). One of the main goals of this experiment was to develop a dynamic model for the growth count (say x_j) of paramecium aurelium as a function of time t. The data are assumed to follow a log-normal distribution, with

$$y_j = \log\{x_j\} = \log\{\nu(t_j, \theta)\} + \epsilon_j = \mu(t_j, \theta) + \epsilon_j, \qquad \epsilon_j \sim N(0, \sigma^2) \qquad (4.1)$$

where x_j is the observed growth count at time point t_j (measured in days) and µ(t_j, θ) = log{ν(t_j, θ)}.

Next, it is assumed that ν(t) follows the standard two-parameter logistic growth curve described by the nonlinear differential equation

$$\frac{d\nu}{dt} = g(\nu(t, \theta)) \equiv \nu(\theta_1 - \theta_2 \nu), \quad \text{and} \quad \nu(0) = y_0 = 2. \qquad (4.2)$$

Equivalently, we can express equation (4.2) in terms of µ(·):

$$\frac{d\mu}{dt} = g^*(\mu(t, \theta)) \equiv \theta_1 - \theta_2 e^{\mu} \quad \text{and} \quad \mu(0) = \log(2). \qquad (4.3)$$

In the logistic growth curve model, θ_1 represents the per capita growth rate, θ_1/θ_2 measures the carrying capacity of the population, and y_0 represents the initial size of the population.
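Since the logistic model (4.2) admits a closed form solution, the accuracy of the Euler approximation underlying BEAM and SEAM can be checked directly. The sketch below is our own illustration with hypothetical parameter values, not part of the paper's analysis:

```python
import numpy as np

theta1, theta2, nu0 = 0.6, 0.002, 2.0   # hypothetical parameter values

def nu_exact(t):
    """Closed form solution of d(nu)/dt = nu*(theta1 - theta2*nu), nu(0) = nu0."""
    e = np.exp(theta1 * t)
    return theta1 * nu0 * e / (theta1 + theta2 * nu0 * (e - 1.0))

def nu_euler(t_end, h):
    """Naive Euler approximation (3.3) of the same trajectory."""
    nu = nu0
    for _ in range(int(round(t_end / h))):
        nu += h * nu * (theta1 - theta2 * nu)
    return nu

err_coarse = abs(nu_euler(10.0, 0.1) - nu_exact(10.0))
err_fine = abs(nu_euler(10.0, 0.05) - nu_exact(10.0))   # halving h shrinks the error
```

Halving h roughly halves the discrepancy, consistent with the o(h) rate of the naive Euler method quoted in Section 3.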

There are three data sets, consisting of individual counts from the three closed colonies of paramecium aurelium, and here we analyze all three data sets separately. Although we proposed our estimation methods for situations where an analytical closed form solution is not available for the system of differential equations, here a closed form solution is actually available. The reason behind the choice of this simple growth curve model is that it allows us to compare the performance of estimation based on our proposed approaches with that based on the ideal nonlinear regression approach, which requires that a closed form mean function be available. A detailed description of nonlinear regression techniques can be found in the classic book by Davidian and Giltinan (1995).

To approximate the likelihood using BEAM, we chose N = n = 19, and for SEAM we chose N = 40. These choices of the tuning parameter N (or equivalently, of the step size h = (t_N − t_0)/N) are not based on any analytical work; rather, they are driven mainly by computational convenience. In general, the finer the grid (large N and hence small h), the better the likelihood is approximated. Alternatively, one may use improved numerical approximation methods (such as those described in Appendix A) at the cost of computing time.
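A quick way to see what the Euler approximation of the mean function looks like is to code it directly. The sketch below is a Python translation of the idea only (the paper's computations used WinBUGS and R); the grid size and parameter values echo the ones used in this section, and `euler_mu` is a hypothetical helper name.

```python
import numpy as np

def euler_mu(theta1, theta2, t_grid, mu0=np.log(2.0)):
    """Approximate mu(t) = log nu(t) on a grid with the naive Euler scheme
    applied to d(mu)/dt = theta1 - theta2 * exp(mu), equation (4.3)."""
    mu = np.empty(len(t_grid))
    mu[0] = mu0
    for n in range(1, len(t_grid)):
        h = t_grid[n] - t_grid[n - 1]
        mu[n] = mu[n - 1] + (theta1 - theta2 * np.exp(mu[n - 1])) * h
    return mu

# N = 40 grid points over the 19-day observation window
t_grid = np.linspace(0.0, 19.0, 41)
mu_approx = euler_mu(0.8, 0.0015, t_grid)
```

With these values the approximate trajectory rises from log 2 toward the plateau log(θ1/θ2), mirroring the logistic growth of the colonies.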

To implement BEAM on the log-transformed data, we followed the steps described in Section 2.2, coupled with equations (4.1) and (4.3). For the MCMC runs required within BEAM, we generated samples from three parallel Markov chains with an initial burn-in of 4000 iterations followed by 2000 post-burn-in samples per chain, giving a total of 6000 approximate samples from the posterior distribution of (θ, σ). Convergence of the chains was diagnosed visually by inspecting simultaneous trace and autocorrelation plots of all three chains for each parameter; good mixing of the chains was taken as a reasonable indication of convergence. The software WinBUGS, freely available at http://www.mrc-bsu.cam.ac.uk/bugs/, was used to perform all of the computations for this data analysis. The following values were used to elicit priors: θ_0 = (0, 0)^T and H_0 = 0.1 I_2 for the prior on θ, and a_0 = b_0 = 0.01 for σ². Finally, the MC estimates of the mean and standard deviation of the posterior distribution of each parameter were used as the point estimate and standard error, respectively.

Similarly, to implement SEAM for these data sets we followed the steps described in Section 2.3. The function interpSpline in R was used to obtain the spline interpolation, followed by the optim function in R to minimize the criterion SS(θ) in (3.12). Finally, to fit the NLM, we used the nlm function in R to obtain parameter estimates and associated standard errors (SE). Results from all three estimation approaches are presented in Table 4.1.
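The SEAM fitting step can be sketched in a few lines. The paper minimizes SS(θ) from (3.12) with optim after an interpSpline fit in R; since (3.12) is not reproduced in this section, the Python stand-in below assumes a derivative-matching criterion (our assumption, not taken from the paper). For this logistic model that criterion is linear in θ, so the minimization reduces to a linear least-squares fit; `fit_seam` is a hypothetical name.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fit_seam(t_obs, y_obs, n_grid=40):
    """Two-stage sketch: spline-interpolate the log counts, then choose theta
    so the spline's derivative matches the ODE right-hand side
    g*(mu) = theta1 - theta2 * exp(mu) in least squares on a fine grid."""
    spline = CubicSpline(t_obs, y_obs)
    t_grid = np.linspace(t_obs[0], t_obs[-1], n_grid)
    mu_hat = spline(t_grid)       # smoothed mu(t) on the grid
    dmu_hat = spline(t_grid, 1)   # spline derivative on the grid
    # g* is linear in (theta1, theta2), so the least-squares criterion can be
    # solved directly (in general one would minimize numerically, as with
    # optim in R).
    X = np.column_stack([np.ones_like(t_grid), -np.exp(mu_hat)])
    theta, *_ = np.linalg.lstsq(X, dmu_hat, rcond=None)
    return theta
```

On noise-free data generated from the logistic model this recovers (θ1, θ2) essentially exactly; with noisy counts one would first smooth rather than interpolate.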

Table 4.1. Parameter estimates and standard errors (SE) based on the logistic growth model for colonies of Paramecium aurelium, using the NLS, BEAM and SEAM methods to fit the three data sets.

                                Method
Data Set   Estimate     NLS      BEAM     SEAM
I          θ1           0.789    0.760    0.783
           SE           0.025    0.029    0.044
           θ2 (×10^-3)  1.446    1.470    1.464
           SE           0.120    0.152    0.238
           σ            0.218    0.266    0.233
           SE           -        0.049    -
II         θ1           0.837    0.803    0.827
           SE           0.025    0.034    0.049
           θ2 (×10^-3)  1.672    1.678    1.685
           SE           0.126    0.175    0.265
           σ            0.201    0.266    0.207
           SE           -        0.049    -
III        θ1           0.892    0.857    0.875
           SE           0.018    0.025    0.051
           θ2 (×10^-3)  1.594    1.596    1.579
           SE           0.078    0.118    0.246
           σ            0.132    0.201    0.137
           SE           -        0.037    -


[Figure 4.1 appears here: three panels ("Data 1", "Data 2", "Data 3"), each plotting the mean function against time (0 to 15 days), with curves labelled BEAM, NLS and SEAM.]

Figure 4.1. Plots of observations on growth colonies of Paramecium aurelium (in log scale) and the estimated mean trajectories obtained by NLS, BEAM and SEAM for each of the three data sets.

presented estimates and standard errors for the parameters θ = (θ1, θ2) and σ corresponding to the nonlinear least squares (NLS) method, BEAM and SEAM. Additionally, to compare the performance of the three estimation approaches, we plotted the estimated mean functions for the three approaches along with the observed data points in Figure 4.1. From Figure 4.1, all three approaches appear to capture the trajectory of the mean function reasonably well. It can also be observed that both BEAM and SEAM provide fits close to those obtained by the NLS method, which uses the exact analytical form of the mean function. We used BEAM to calculate point estimates of the carrying capacity (θ1/θ2) for all three data sets, along with the corresponding 95% posterior credible intervals; the point estimates were obtained as the mean of the ratio of the posterior samples of θ1 and θ2. For data set I, the estimated carrying capacity is 523.91 with 95% posterior credible interval (445.80, 606.92); for data set II, it is 526.88 with interval (448.19, 612.83); and for data set III, it is 533.74 with interval (453.38, 616.70).
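Given posterior draws of θ1 and θ2, the carrying-capacity summaries above take one line each. A small Python sketch (the paper's draws came from WinBUGS; the helper name is ours):

```python
import numpy as np

def carrying_capacity_summary(theta1_draws, theta2_draws):
    """Point estimate (mean of the ratio of posterior samples) and 95%
    equal-tailed posterior credible interval for theta1/theta2."""
    ratio = np.asarray(theta1_draws) / np.asarray(theta2_draws)
    return ratio.mean(), tuple(np.quantile(ratio, [0.025, 0.975]))
```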


5. Simulation Study

A simulation study, motivated by the above real data analysis, was carried out to compare the performance of the two proposed methods, BEAM and SEAM, with the NLS procedure in terms of estimation accuracy and efficiency. The true parameter values for data generation were chosen based on the estimates obtained in the real data application. For the simulation study, data were generated using model (4.1), with µ(t, θ) given by the closed-form solution of (4.3):

µ(t, θ) = log(ν(t, θ))
        = log(θ1) + µ(0) + tθ1 − log[θ2 e^{µ(0)} (e^{tθ1} − 1) + θ1]      (5.1)

We chose the time points as given in the real data set and simulated data sets using equations (4.1) and (5.1) with true parameter values θ1 = 0.8, θ2 = 0.0015 and σ = 0.25. We chose the same sample size as the real data set, i.e., n = 19, and replicated the data generation for 1000 Monte Carlo runs. To fit the model by BEAM we chose N = n, the same prior distributions as in the real data analysis, and the same numbers of burn-in and post-burn-in MCMC samples for each of the 1000 data sets as in the real data application. Similarly, for SEAM we used N = 40 to fit the model to each simulated data set.
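The data-generating step of this design can be sketched as follows (a Python translation; the evenly spaced time points stand in for the observed ones, and the function names are ours):

```python
import numpy as np

def mu_closed_form(t, theta1, theta2, mu0=np.log(2.0)):
    """Closed-form solution (5.1) of the logistic ODE on the log scale."""
    return (np.log(theta1) + mu0 + theta1 * t
            - np.log(theta2 * np.exp(mu0) * (np.exp(theta1 * t) - 1.0) + theta1))

def simulate_data(t_obs, theta1=0.8, theta2=0.0015, sigma=0.25, rng=None):
    """One simulated data set under model (4.1): y_j = mu(t_j, theta) + eps_j."""
    rng = np.random.default_rng() if rng is None else rng
    return mu_closed_form(t_obs, theta1, theta2) + rng.normal(0.0, sigma, len(t_obs))

t_obs = np.arange(19, dtype=float)  # n = 19 time points (stand-in)
y_sim = simulate_data(t_obs, rng=np.random.default_rng(1))
```

Note that (5.1) satisfies µ(0) = log 2 and tends to log(θ1/θ2) as t grows, as the logistic model requires.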

A summary of the comparative study of the three procedures based on this simulation is given in Table 5.1 and Figure 5.1. In Table 5.1, we summarize our findings in terms of (i) the bias, which is the difference between the MC mean of the point estimates and the true value of a parameter; (ii) the estimated standard error (ESE), which is the MC mean of the standard errors of the parameter estimates; (iii) the Monte Carlo simulation standard error (MCSE), which is the standard deviation of the point estimates; and (iv) the mean square error, MSE = Bias² + MCSE².
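The four summaries can be computed from the Monte Carlo output in a few lines (an illustrative Python helper, not the authors' code):

```python
import numpy as np

def mc_summaries(estimates, ses, true_value):
    """Bias (MC mean of estimates minus truth), ESE (MC mean of the reported
    standard errors), MCSE (SD of the estimates) and MSE = Bias^2 + MCSE^2."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    ese = float(np.mean(ses)) if ses is not None else float("nan")  # "-" entries
    mcse = estimates.std(ddof=1)
    return {"Bias": bias, "ESE": ese, "MCSE": mcse, "MSE": bias**2 + mcse**2}
```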

From Table 5.1, it is evident that all three methods performed equally well in terms of bias, estimated standard errors and MCSEs. For a better understanding of the MC distribution of the parameter estimates when comparing BEAM and SEAM with the NLS method, we present box plots of the estimates obtained by each of the three methods in Figure 5.1; the horizontal solid line in each panel represents the true value of the parameter. Figure 5.1(i) reveals that although BEAM and SEAM tend to underestimate θ1, the inter-quartile ranges of the estimates from all three methods contain the true value of θ1. Even though BEAM appears to exhibit the largest negative bias among the three methods, this bias is not statistically significant (p-value = 0.8156). Similarly, for θ2, we observe from Figure 5.1(ii) and Table 5.1 that all three methods again perform almost identically. For σ, Figure 5.1(iii) indicates that BEAM tends to overestimate and SEAM to underestimate the true value, but neither bias is statistically significant.

Finally, comparing the MSEs obtained by the three methods (see Table 5.1), we find that, as expected, NLS has the smallest MSE, but its gain over the two proposed methods is very small considering that NLS uses the exact analytical form of the mean function. In practice, when an analytical closed form for the mean function is not available, NLS is not applicable, but BEAM and SEAM still work.


Table 5.1. Simulation results for the logistic growth model for colonies of Paramecium aurelium using the NLS, BEAM and SEAM methods, based on 1000 MC runs.

                              Method
Parameter      Summary   NLS      BEAM     SEAM
θ1             Bias      0.002    -0.014   -0.008
               MCSE      0.029    0.028    0.029
               ESE       0.029    0.029    0.233
               MSE       0.001    0.001    0.001
θ2 (×10^-3)    Bias      0.005    0.036    0.017
               MCSE      0.146    0.149    0.149
               ESE       0.140    0.152    0.235
               MSE       0.021    0.024    0.022
σ              Bias      -0.005   0.014    -0.017
               MCSE      0.042    0.044    0.044
               ESE       -        0.048    -
               MSE       0.0018   0.0021   0.0022

[Figure 5.1 appears here: three box-plot panels, (i) to (iii), each showing estimates from NLS, BEAM and SEAM.]

Figure 5.1. Box plots of point estimates of (i) θ1, (ii) θ2 and (iii) σ based on 1000 simulated data sets. (The horizontal solid line in each panel represents the true value of the parameter.)


6. Discussion

The main objective of the data analysis (Section 4) and simulation study (Section 5) was to compare the performance of BEAM and SEAM with NLS in situations where a closed-form analytical solution of the system of differential equations is available. The results of the data analysis and the simulation study suggest that both methods give results very close to those obtained by the NLS method, and therefore that the Euler approximation to the mean function, described in Section 3, is quite accurate. The strikingly similar values of the MSEs of the parameter estimates also suggest that the proposed methods are as efficient as the NLS method when an analytic solution is available.

In this paper we assumed that the random errors are independently and identically distributed, but the methodologies presented here are not restricted to this assumption. It can be relaxed by using a generalized nonlinear modeling framework (see Davidian and Giltinan, 1995) with µ(t, θ) as the mean function and σ²(t, η) as the variance function, the latter assumed known up to the unknown parameter η. To implement BEAM within this generalized framework, we need suitable priors for the variance parameter η. For SEAM, we have to replace SS(θ) in (3.12) by a weighted least squares criterion, where, say, the weights can be chosen inversely proportional to the variance function.

Further, in most biomedical applications the data consist of several individuals, and modeling such data involves population-level as well as individual-specific parameters, which requires a mixed effects modeling framework. Future work for this research consists of extending the proposed methodologies to the mixed effects framework where data are subject to missingness and censoring.

One of the main advantages of BEAM and SEAM is that they do not require any restrictive assumptions beyond those typically made in nonlinear modeling. The proposed likelihood approximation also provides a closed-form approximation of the mean function µ(t, θ), so these methods can be used to estimate the mean function at any time point in the close vicinity of the observed time range. This is a substantial advantage, as it avoids repeatedly evaluating the numerical solution of the mean function at the parameter estimates for interpolation or extrapolation. Because of its Bayesian framework, a key advantage of BEAM is its ability to handle missing data, which are very common in longitudinal studies; the availability of posterior distributions for the unknown parameters also makes it straightforward to draw statistical inferences. The advantage of SEAM comes not only from its estimation accuracy and weaker distributional assumptions but also from its computational convenience compared with BEAM. Although BEAM provides estimates that are both accurate and applicable with missing or censored data, it is undeniably a computationally intensive procedure. SEAM takes much less computing time, but is limited to handling uncensored data. In the end, we leave the choice between BEAM and SEAM to the reader, as both have their pros and cons.

For both proposed methods, we used the "naive" Euler approximation. We chose the Euler approach for the sake of simplicity, and because it provided reasonable parameter estimates in our simulation studies. However, other improved numerical methods (see Appendix A) can also be implemented within the proposed methods, though that will most likely increase the computational time.


A. Appendix: Numerical methods to solve a system of ODEs

To describe a few popular approximation methods, for simplicity we assume µ ≡ ν with q = 1.

1. Naive Euler's method: This method is based on the following sequence of solutions:

   µ(tn) = µn ≈ µn−1 + g(tn−1, µn−1) h                          (A.1)

   The order of accuracy of this method is o(h). Recall that the order of accuracy of a method is the order with which the solution function is approximated: an approximation to the solution of a system of ODEs is said to be rth-order accurate, denoted o(h^r), if the term corresponding to h^r in the Taylor expansion of the solution is correctly reproduced.

2. Improved Euler's method: This second algorithm improves on the naive Euler method by modifying it as follows:

   µ(tn) ≈ µn = µn−1 + (1/2)[g(tn−1, µn−1) + g(tn, µn−1 + g(tn−1, µn−1)h)] h   (A.2)

   The order of accuracy of the improved Euler method is o(h²).

3. Runge-Kutta method: This algorithm uses Simpson's rule:

   ∫ from tn to tn+h of g(t, µ(t)) dt ≈ (h/6)[g(tn, µ(tn)) + 4 g(tn + h/2, µ(tn + h/2)) + g(tn + h, µ(tn + h))]

   The final algorithm, after approximating µ(tn), µ(tn + h/2) and µ(tn + h), is given by the following sequence of steps:

   Kn,1 = g(tn, µn)
   Kn,2 = g(tn + h/2, µn + (h/2) Kn,1)
   Kn,3 = g(tn + h/2, µn + (h/2) Kn,2)
   Kn,4 = g(tn + h, µn + h Kn,3)
   µn+1 = µn + (h/6)[Kn,1 + 2 Kn,2 + 2 Kn,3 + Kn,4]             (A.3)

   This method has accuracy o(h⁴), much better than the previous two approximation methods, but it is also more computationally intensive.
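The orders of accuracy above can be checked numerically on the logistic model of Section 4. The Python sketch below (illustrative; the function names are ours) implements one step of each method and compares absolute errors at t = 10 against the closed-form solution (5.1):

```python
import numpy as np

def step_euler(g, t, mu, h):                 # naive Euler, o(h), (A.1)
    return mu + g(t, mu) * h

def step_improved_euler(g, t, mu, h):        # improved Euler, o(h^2), (A.2)
    k = g(t, mu)
    return mu + 0.5 * (k + g(t + h, mu + k * h)) * h

def step_rk4(g, t, mu, h):                   # Runge-Kutta, o(h^4), (A.3)
    k1 = g(t, mu)
    k2 = g(t + h / 2, mu + h / 2 * k1)
    k3 = g(t + h / 2, mu + h / 2 * k2)
    k4 = g(t + h, mu + h * k3)
    return mu + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def solve(step, g, mu0, t0, t1, n_steps):
    """March the one-step scheme from t0 to t1 and return mu(t1)."""
    t, mu, h = t0, mu0, (t1 - t0) / n_steps
    for _ in range(n_steps):
        mu = step(g, t, mu, h)
        t += h
    return mu

# Test problem: d(mu)/dt = theta1 - theta2 * exp(mu), mu(0) = log 2
th1, th2 = 0.8, 0.0015
g = lambda t, mu: th1 - th2 * np.exp(mu)
exact = (np.log(th1) + np.log(2.0) + th1 * 10.0
         - np.log(th2 * 2.0 * (np.exp(th1 * 10.0) - 1.0) + th1))
errs = {name: abs(solve(s, g, np.log(2.0), 0.0, 10.0, 40) - exact)
        for name, s in [("euler", step_euler),
                        ("improved", step_improved_euler),
                        ("rk4", step_rk4)]}
```

With h = 0.25 the errors shrink sharply from Euler to improved Euler to Runge-Kutta, reflecting the o(h), o(h²) and o(h⁴) orders.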

There is a huge literature on numerical methods for solving ODEs; an extensive review can be found in the classic book by Butcher (2003).


Acknowledgements

The authors would like to thank the Editor for inviting the first author to submit a manuscript for the special volume. We would also like to thank the reviewer for the insightful comments and helpful suggestions.

References

Atkinson, K., 1978. An Introduction to Numerical Analysis. Wiley, New York.
Butcher, J.C., 2003. Numerical Methods for Ordinary Differential Equations. John Wiley & Sons.
Davidian, M., Giltinan, D.M., 1995. Nonlinear Models for Repeated Measurement Data. Chapman & Hall, London.
Diggle, P.J., 1990. Time Series: A Biostatistical Introduction. Oxford Science Publications.
Ding, A.A., Wu, H., 1999. Relationships between antiviral treatment effects and biphasic viral decay rates in modelling HIV dynamics. Mathematical Biosciences, 160, 63–82.
Gelman, A., Bois, F., Jing, J., 1996. Physiological pharmacokinetic analysis using population modeling and informative prior distributions. Journal of the American Statistical Association, 85, 398–409.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Han, C., Chaloner, K., Perelson, A.S., 2002. Bayesian analysis of a population HIV dynamic model. Case Studies in Bayesian Statistics, 6, 223–237.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.
Ho, D.D., Neumann, A.U., Perelson, A.S., Chen, W., Leonard, J.M., Markowitz, M., 1995. Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature, 373, 123–126.
Holte, S.E., Cornelisse, P., Heagerty, P., Self, S., 2003. An alternative to nonlinear least-squares regression for estimating parameters in ordinary differential equation models (personal communication).
Huang, Y., Liu, D., Wu, H., 2004. Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics, 62(2), 413–423.
Lambert, J.D., 1987. Numerical Methods for Ordinary Differential Equations. John Wiley, Chichester.
Lunn, D.J., Best, N., Thomas, A., Wakefield, J., Spiegelhalter, D., 2002. Bayesian analysis of population PK/PD models: General concepts and software. Journal of Pharmacokinetics and Pharmacodynamics, 29, 271–307.
Natarajan, R., Kass, R., 2000. Reference Bayesian methods for generalized linear mixed models. Journal of the American Statistical Association, 95, 227–237.
Perelson, A.S., Neumann, A.V., Markowitz, M., Leonard, J.M., Ho, D.H., 1996. HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science, 271, 1582–1587.
Petzold, L.R., 1987. Automatic selection of methods for solving stiff and nonstiff systems of ordinary differential equations. Journal of Scientific and Statistical Computing, 4, 136–148.
Putter, H., Heisterkamp, S.H., Lange, J.M.A., de Wolf, F., 2002. A Bayesian approach to parameter estimation in HIV dynamical models. Statistics in Medicine, 21, 2199–2214.
Racine-Poon, A., Wakefield, J., 1998. Statistical methods for population pharmacokinetic modelling. Statistical Methods in Medical Research, 7, 63–84.
Robert, C.P., Casella, G., 2005. Monte Carlo Statistical Methods. Springer.
Schafer, J.L., 1997. Analysis of Incomplete Multivariate Data. Chapman & Hall/CRC.
Shampine, L.F., 1994. Numerical Solutions of Ordinary Differential Equations. Chapman & Hall, New York.
Wahba, G., 1990. Spline Models for Observational Data. Society for Industrial and Applied Mathematics.
Wakefield, J.C., 1996. The Bayesian analysis of population pharmacokinetic models. Journal of the American Statistical Association, 91, 62–75.
