Model Selection and Mixed-Effects Modeling of HIV Infection Dynamics


D. M. Bortz and P. W. Nelson

Mathematical Biology Research Group

Department of Mathematics

University of Michigan

2074 East Hall

525 East University Avenue

Ann Arbor, MI 48109-1043

USA

Abstract

We present an introduction to a model selection methodology and an application to mathematical models of in vivo HIV infection dynamics. We consider six previously published deterministic models and compare them with respect to their ability to represent HIV infected patients undergoing reverse transcriptase mono-therapy. In the creation of the statistical model, a hierarchical mixed-effects modeling approach is employed to characterize the inter- and intra-individual variability in the patient population. We estimate the population parameters in a likelihood function formulation, which is then used to calculate information theory-based model selection criteria, providing mathematical ranking of each of the models' ability to represent patient data. The parameter fits generated by these models, furthermore, provide statistical support for the higher viral clearance rate $c$ in Louie et al. (2003). Among the candidate models, our results suggest which mathematical structures, e.g., linear versus nonlinear, best describe the data we are modeling and illustrate a framework for others to consider when modeling infectious diseases.

Key words: Model Selection, Mixed-effects Modeling, HIV, Information Criteria, Parameter Estimation

Author to whom correspondence should be addressed. Email addresses: [email protected] (D. M. Bortz), [email protected], [email protected] (P. W. Nelson).

Preprint submitted to Bulletin of Mathematical Biology August 4, 2005



1 Background

In 1995 a simple model for Human Immunodeficiency Virus-1 viral loads in the blood plasma of infected individuals was presented in Ho et al. (1995), which revolutionized our understanding of the disease. This work was the first to show that the infection pathogenesis was a rapidly varying dynamical process during which about twelve billion viral particles per day were being produced in infected individuals. The model used to examine the data from twenty patients was

$$\dot{V}(t) = P - cV(t) ,$$

where $\dot{V}(t)$ represents the rate of change (at time $t$) in the time-dependent viral concentration $V(t)$, $P$ the daily production rate, and $c$ the viral clearance rate. By assaying each patient's viral loads before receiving anti-viral therapy, a steady state level for the virus was calculated, thus yielding an initial condition for $V(t)$. Following administration of the protease inhibitor, repeated patient testing depicted a rapid decline in the total viral loads in the plasma. By assuming the drug to be completely effective, they were then able to identify a specific value for $c$, which strongly suggested that incredibly large numbers of virions were created every day in an HIV infected individual. A concurrent study by Wei et al. (1995) used a different mathematical model, but still led to a nearly identical conclusion, thus further supporting the high HIV production claims.
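The estimation idea described above can be sketched in a few lines. This is a minimal illustration, not the procedure of Ho et al. (1995): it assumes a completely effective drug ($P = 0$ after therapy starts), so that $\ln V$ is linear in $t$ with slope $-c$, and it uses synthetic, noise-free data. The function names and all numerical values are hypothetical.

```python
import math

def viral_load(t, v0, production, clearance):
    """Closed-form solution of dV/dt = P - c V with V(0) = v0."""
    steady = production / clearance
    return steady + (v0 - steady) * math.exp(-clearance * t)

def estimate_clearance(times, loads):
    """Least-squares slope of ln V against t; when P = 0 the slope equals -c."""
    n = len(times)
    logs = [math.log(v) for v in loads]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx

# Hypothetical values, chosen only for illustration.
c_true = 0.5   # clearance rate, per day
v0 = 1.0e5     # pre-therapy steady state load, virions/ml
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
loads = [viral_load(t, v0, 0.0, c_true) for t in times]  # P = 0 under therapy

c_hat = estimate_clearance(times, loads)
# At the pre-therapy steady state dV/dt = 0, so P = c * V0.
production_before_therapy = c_hat * v0
```

With real measurements the fit would of course carry noise, and any residual drug inefficacy would bias the estimate of $c$ downward.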

The viral decay process is fundamentally exponential, but explicit models of this form have a limited capacity for biological interpretation. Thus, the work in Ho et al. (1995) was followed in 1996 by Perelson et al. (1996), in which they extended the model to

$$
\begin{aligned}
\dot{T}^*(t) &= k T_0 V_I(t) - \delta T^*(t) , \\
\dot{V}_I(t) &= (1 - n_p) N \delta T^*(t) - c V_I(t) , \\
\dot{V}_{NI}(t) &= n_p N \delta T^*(t) - c V_{NI}(t) ,
\end{aligned}
\tag{1}
$$

which included the dynamics of productively infected T cells $T^*$, their rate of decay $\delta$, and the biological fact that an anti-viral therapy using a protease inhibitor leads to the production of non-infectious viral loads $V_{NI}$. Neither the original model nor the data distinguished between infectious and non-infectious viral concentrations, and thus system (1) separated the viral concentrations into these two compartments to aid in analysis. Clearly no drug is completely effective, thus the efficacy of the protease inhibitor $n_p$ is between zero and one, depending on its potency. If $n_p = 1$, as is done in Perelson et al. (1996), then all subsequent virions produced after the initiation of therapy are non-infectious viral particles and incapable of infecting target T cells $T_0$. The decline observed in the patient data is characterized by the decay rates $\delta$ and $c$, and leads to estimates for the half life of productively infected T cells and viral particles of two days and eight hours, respectively. Since this model assumes a perfectly efficacious drug therapy ($n_p = 1$), both these estimates should be considered lower bounds. These publications catalyzed a large number of studies over the next several years, with the more theoretical work consisting primarily of variations of the model (1). It is this body of research that we wish to address in this paper.
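As a quick worked check, assuming pure exponential decay (so that a half-life $t_{1/2}$ and a rate $r$ satisfy $t_{1/2} = \ln 2 / r$), the half-lives quoted above correspond to rates of roughly

$$\delta \approx \frac{\ln 2}{2\ \text{days}} \approx 0.35\ \text{day}^{-1} , \qquad c \approx \frac{\ln 2}{1/3\ \text{day}} \approx 2.1\ \text{day}^{-1} .$$

These are only back-of-the-envelope conversions of the rounded figures above, not fitted values.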

1.1 Subsequent Development of Mathematical Models of HIV Infection Dynamics

As of July 2005, there were over 2400 publications citing Ho et al. (1995) and 1050 citing Perelson et al. (1996). In many of these papers, researchers argued over the importance of a variety of biological effects as well as for the inclusion or exclusion of the corresponding representations in their mathematical models. Following the publication of Perelson et al. (1996), additional and/or alternative compartment formulations were proposed (Callaway and Perelson, 2002; Kramer, 1999; Murray et al., 1998; Nowak et al., 1997; Perelson and Nelson, 1999; Stafford et al., 2000; Wodarz et al., 1999) and the use of delay differential equations in modeling the eclipse phase was heavily debated (Grossman et al., 1998; Herz et al., 1996; Lloyd, 2001; Mittler et al., 1998; Nelson et al., 2000; Nelson and Perelson, 2002). The knowledge gained from using models of disease pathogenesis has, in many cases, suggested novel design ideas for treatment strategies as well as laboratory experiments. In the mid nineties, for example, several publications provided strong support for the existence of a high rate of HIV replication and clearance in infected individuals (Ho et al., 1995; Perelson et al., 1996; Wei et al., 1995). It is now commonly believed that in vivo, on the order of $10^{10}$ virions are created and then destroyed every day by the immune system (Mittler et al., 1999; Perelson et al., 1997; Ramratnam et al., 1999). The high replication rate implies that the virus has an enormous number of opportunities to mutate and evolve into a drug resistant strain. A pharmacological mono-therapy will, therefore, eventually fail since the virus will almost certainly manifest a resistance to any one drug. To counteract the high mutation rate, the current approach is to simultaneously administer multiple drugs to HIV infected individuals.

In many of the aforementioned papers, the viral clearance rate was identified by modeling the disease pathogenesis with a system of deterministic differential equations, numerically calculating a solution, and then fitting the results with plasma viral load data (using an ordinary least squares (OLS) approach), e.g., see Perelson et al. (1996, 1997); Ramratnam et al. (1999). Two statistical issues rarely considered when studying disease pathogenesis using dynamical systems are the modeling of variability within and between individuals as well as the estimation of statistical evidence for the superiority of one model over others. We will employ a hierarchical nonlinear mixed-effects (NLME) modeling approach to address the first issue and model selection criteria for the second. To our knowledge, the first publication to employ NLME and basic model selection in modeling in vivo HIV dynamics was Wu et al. (1998), though they restricted their models to low-dimension, linear DEs. Aside from Wu et al. (1998), Wu and Ding (1999), and Wu and Wu (2002b), moreover, model selection methods have experienced limited use in the mathematical modeling community in general and among HIV dynamical system modelers in particular.

2 Mixed-Effects Modeling

In the modeling of any physical system, one must choose how to address the inherent randomness. The variability present in inorganic phenomena frequently leads to singular probability densities and is typically ignored without consequences. In biological systems, however, there are a multitude of potential sources of randomness and ignoring their impact could potentially lead to spurious conclusions. In the development of many of the aforementioned models, the dynamics of cellular and viral populations inside an individual patient were implicitly assumed to be deterministic. When the statistical issues were addressed, they were frequently secondary to the main conclusions of the article.$^{1}$ We will employ a (statistically) straightforward characterization of the inherent within- and between-patient variability. Consequently, it is important to realize that the subsequent development in this section could have realistically been performed using alternative distributions (such as Gamma or exponential) and approximation schemes.

We assign the sources of randomness in two stages, the first for an individual and the second for the overall population. For patient $i$, the log transform of the $n_i$-dimensional vector of viral load measurements $v_i$ at $n_i$ time-points is modeled by

$$\ln v_i = \ln f(\beta + b_i) + \epsilon_i , \tag{2}$$

where the vector valued function $f$ represents the solution to the differential equation, evaluated at the same $n_i$ timepoints, and $\epsilon_i$ is a vector random variable describing the measurement error in observing the viral loads.$^{2}$ The vector $\beta + b_i$ consists of parameters from the chosen differential equation such as viral clearance rates and CD4 T-cell death rates. For each patient, $\beta$ is composed of $q$ population parameters and $b_i$ is a vector of random variables of perturbations away from $\beta$, realized by selecting an individual patient $i$ from the population. This framework is known as a hierarchical nonlinear mixed-effects model (Davidian and Giltinan, 1995; Pinheiro and Bates, 2000), wherein $\beta$ is commonly called a fixed effect and $b_i$ is a random effect. In this paper, we will assume that the measurement errors are independent, identically distributed (iid) log-normal, i.e., $\epsilon_i \sim N(0, \sigma^2 I)$, with common variance $\sigma^2$, mitigating heteroscedastic features of the data. To test this assumption, we fit all patients using a delayed exponential and studied the errors using a single sample Kolmogorov-Smirnov hypothesis test. The result was that the null hypothesis (of log-normal errors) cannot be rejected at confidence levels above 70%. On a patient by patient basis, moreover, nine out of ten reported similar (or better) results, thus giving us further support for our assumption.

$^{1}$ Excluding the publications by H. Wu and V. DeGruttola cited in this article.

$^{2}$ In a mild abuse of notation, for a vector $x = (x_1, x_2, \ldots, x_n)^T$, we let $\ln x = (\ln x_1, \ln x_2, \ldots, \ln x_n)^T$.

With regard to the distribution for the perturbations $b_i$, the ten samples generated in the simple case are, however, not enough to provide strong support for any distribution. In the HIV literature, several different distributions have been assumed, with the most popular being a normal distribution. In the absence of further information, we simply assumed $b_i \sim N(0, \Psi)$ with variance/covariance matrix $\Psi$ (as done in Wu et al. (1998)).
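To make the two-stage structure concrete, the following sketch draws synthetic patients from a model of the form (2). It is only an illustration under simplifying assumptions: the differential equation solution $f$ is replaced by a hypothetical one-parameter exponential decay, the random effect is a scalar (so the covariance matrix reduces to a single variance, passed as a standard deviation `psi_sd`), and all numerical values are made up.

```python
import math
import random

def log_model(times, clearance, v0=1.0e5):
    """ln f for a hypothetical one-parameter decay f(t) = v0 * exp(-clearance * t)."""
    return [math.log(v0) - clearance * t for t in times]

def simulate_patient(rng, beta, psi_sd, sigma, times):
    """Draw one patient from ln v_i = ln f(beta + b_i) + eps_i."""
    b_i = rng.gauss(0.0, psi_sd)                   # between-patient perturbation
    mean_log = log_model(times, beta + b_i)
    eps = [rng.gauss(0.0, sigma) for _ in times]   # within-patient measurement error
    return b_i, [m + e for m, e in zip(mean_log, eps)]

rng = random.Random(0)  # fixed seed, for reproducibility only
times = [0.0, 1.0, 2.0, 4.0, 7.0, 10.0, 14.0]
population = [simulate_patient(rng, 0.5, 0.1, 0.2, times) for _ in range(10)]
```

Each entry of `population` holds one patient's realized perturbation and log viral loads; setting `psi_sd` and `sigma` to zero recovers the purely deterministic model.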

We assume that the individual patient pdfs are independent and consider the following probability density function (pdf) $p$ as a function of the viral load data $V = \{v_1, v_2, \ldots, v_M\}$ for all $M$ patients

$$p(V, \beta, \Psi, \sigma^2) = \prod_{i=1}^{M} p_{pat}(v_i \mid b_i, \beta, \Psi, \sigma^2)\, p_{pop}(b_i \mid \beta, \Psi, \sigma^2)\, p_{par}(\beta, \Psi, \sigma^2) .$$

For simplicity, we let $\theta$ be a vector containing $\beta$, the unique elements of $\Psi$, and $\sigma^2$. The first term in the product, $p_{pat}(v_i \mid b_i, \theta)$, is the pdf of an individual patient, given perturbations away from the population parameters, whereas $p_{pop}(b_i \mid \theta)$ represents the pdf of the perturbations, given population parameters. We will simply assume the last term in the product, $p_{par}(\theta)$, to be proportional to one; while it is possible to assert priors on $\beta$, $\Psi$, and $\sigma^2$, doing so would distract from the main theme of this article.

To compute a likelihood for $\beta$, $\Psi$, and $\sigma^2$, we integrate out the marginal distribution of the $b_i$'s

$$\prod_{i=1}^{M} \int p_{pat}(v_i \mid b_i, \theta)\, p_{pop}(b_i \mid \theta)\, db_i . \tag{3}$$

For convenience in later computations, we designate the penalized nonlinear least squares term as

$$g(\theta, v_i, b_i) = \left\| \ln v_i - \ln f(\beta + b_i) \right\|_2^2 + \sigma^2 b_i^T \Psi^{-1} b_i ,$$

setting up our approach for calculating the likelihood

$$L(\theta \mid V) = \frac{|\sigma^2 \Psi^{-1}|^{M/2}}{(2\pi\sigma^2)^{(N+Mq)/2}} \prod_{i=1}^{M} \int \exp\left( -\frac{g(\theta, v_i, b_i)}{2\sigma^2} \right) db_i , \tag{4}$$

where $N = \sum_{i=1}^{M} n_i$ is the total number of observations over all patients and timepoints and $q$ is the dimension of $\beta$. The values $\hat{\theta}$ which maximize (4) are the Maximum Likelihood Estimators (MLE) of $\theta$.

The direct evaluation of the multidimensional integral in (4) over the domain of $b_i$ is challenging. To address this, we employ Laplace's approximation and consider a Taylor series expansion of $g$ around the best fit MLEs of the perturbations of $\beta$, $\hat{b}_i(\theta) = \arg\min_{b_i} g(\theta, v_i, b_i)$,

$$g(\theta, v_i, b_i) \approx g(\theta, v_i, \hat{b}_i(\theta)) + (b_i - \hat{b}_i(\theta))^T\, \nabla_b^2 g(\theta, v_i, \hat{b}_i(\theta))\, (b_i - \hat{b}_i(\theta)) ,$$

noting both that the gradient of $g$ at $\hat{b}_i(\theta)$ will be zero and that $\hat{b}_i(\theta)$ is dependent upon $\beta$, $\Psi$, and $\sigma^2$. In other words, we minimize $g$ for the $i$th patient and then approximate it at the best fit $\hat{b}_i(\theta)$ using a truncated Taylor series. If the Hessian of $g$ is diagonalizable, we recognize the integral as a multivariate Gaussian, readily allowing a computation of an estimate of the integral in (4). According to nonlinear least squares theory, the Hessian can be approximated near $\hat{b}_i$ by

$$\nabla_b^2 g(\theta, v_i, \hat{b}_i(\theta)) \approx \tilde{\nabla}_b^2 g(\theta, v_i, \hat{b}_i(\theta)) = \nabla_b f(\beta + \hat{b}_i)^T \nabla_b f(\beta + \hat{b}_i) + \sigma^2 \Psi^{-1} ,$$

and we shall employ this in our calculations. The advantage of this approach is that it requires only that $\nabla_b^2 g$ (and $\tilde{\nabla}_b^2 g$) be diagonalizable.
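The approximation step can be checked numerically in a toy setting. The sketch below assumes a scalar random effect and a hypothetical log-model that is linear in $b$, namely $\ln f = \ln V_0 - (\beta + b)t$, with the Jacobian taken of the log-model. Under these assumptions $g$ is exactly quadratic in $b$, so the Laplace estimate should agree with brute-force quadrature; for the nonlinear DE models considered in this paper it would only be an approximation. All names and values are illustrative.

```python
import math

def penalized_ls(b, y, times, beta, sigma2, psi, log_v0):
    """g = ||y - ln f(beta + b)||^2 + sigma^2 * b^2 / psi for a scalar random effect b."""
    fit = sum((yj - (log_v0 - (beta + b) * tj)) ** 2 for yj, tj in zip(y, times))
    return fit + sigma2 * b * b / psi

def laplace_estimate(y, times, beta, sigma2, psi, log_v0):
    """Minimize g over b in closed form, then integrate the quadratic expansion."""
    curvature = sum(tj * tj for tj in times) + sigma2 / psi        # J^T J + sigma^2 / psi
    resid0 = [yj - log_v0 + beta * tj for yj, tj in zip(y, times)]  # residuals at b = 0
    b_hat = -sum(tj * rj for tj, rj in zip(times, resid0)) / curvature
    g_hat = penalized_ls(b_hat, y, times, beta, sigma2, psi, log_v0)
    integral = math.exp(-g_hat / (2.0 * sigma2)) * math.sqrt(2.0 * math.pi * sigma2 / curvature)
    return b_hat, integral

def trapezoid_estimate(y, times, beta, sigma2, psi, log_v0, center, half_width, n=4000):
    """Brute-force check: trapezoid rule for the same integral on a wide window."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        b = center - half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        g = penalized_ls(b, y, times, beta, sigma2, psi, log_v0)
        total += weight * math.exp(-g / (2.0 * sigma2))
    return total * h

# Hypothetical data: five log viral loads for one patient.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
log_v0 = math.log(1.0e5)
noise = [0.02, -0.01, 0.03, -0.02, 0.01]
y = [log_v0 - 0.55 * t + e for t, e in zip(times, noise)]

b_hat, laplace = laplace_estimate(y, times, beta=0.5, sigma2=0.09, psi=0.04, log_v0=log_v0)
brute = trapezoid_estimate(y, times, 0.5, 0.09, 0.04, log_v0, b_hat, half_width=0.6)
```

Note that `b_hat` is shrunk toward zero relative to the perturbation used to generate the data (0.05), which is the effect of the penalty term in $g$.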

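Designating $\ell$ as the natural log of the likelihood, our overall goal is thus to maximize

$$\ell(\theta \mid V) = -\frac{N}{2}\ln(2\pi\sigma^2) + \frac{M}{2}\ln\left|\sigma^2\Psi^{-1}\right| - \frac{1}{2}\left[ \sum_{i=1}^{M} \ln\left|\tilde{\nabla}_b^2 g(\theta, v_i, \hat{b}_i(\theta))\right| + \sigma^{-2}\sum_{i=1}^{M} g(\theta, v_i, \hat{b}_i(\theta)) \right] ,$$

over $\theta$, where for patient $i$

$$
\begin{aligned}
\ln v_i &= \ln f(\beta + b_i) + \epsilon_i , \\
b_i &\sim N(0, \Psi) , \\
\epsilon_i &\sim N(0, \sigma^2 I) , \\
\theta &= \{\beta, \Psi, \sigma^2\} , \\
\hat{b}_i(\theta) &= \arg\min_{b_i} g(\theta, v_i, b_i) ,
\end{aligned}
$$

and

$$\tilde{\nabla}_b^2 g(\theta, v_i, \hat{b}_i) = \nabla_b f(\beta + \hat{b}_i(\theta))^T \nabla_b f(\beta + \hat{b}_i(\theta)) + \sigma^2\Psi^{-1} .$$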

The value of $\ell$ at its maximized point will subsequently be used in the next section to calculate a selection criterion for each model.

We have chosen not to focus on the mixed-effects framework, since there is a precedent for this formulation in HIV modeling (Wu et al., 1998; Wu and Ding, 1999; Wu and Wu, 2002a). In particular, we direct the interested reader to the excellent review article (Wu, 2005). We do note, however, that these publications do not include nonlinear differential equation models, without closed form solutions, in their analyses.

3 Model Selection Criteria

It is possible that a candidate model may generate an improved, but over-fitted system, due to increased degrees of freedom in the differential equation. If the model choice is based solely on the maximum likelihood (or least squares), we will over-fit the data. It is well known, furthermore, that the solutions to systems of DEs of dimension three and higher can exhibit topologically complex and even chaotic behavior. This calls into question the robustness of firm conclusions based upon best least squares fit parameters in DEs of modest to high dimension. In the information theory and statistical inference literature, there exist methodologies for selecting the best model from several candidate models. From the perspective of applied mathematics, these information criteria (IC) are measures of parametric complexity. While they do not directly address any questions concerning the topological complexity of the solution manifolds for the candidate DEs, we can use them to provide statistical evidence supporting one model over others.

While differing in the derivation details, all of the considered criteria involve a goodness-of-fit term as well as a term measuring the complexity of the model. It is therefore important to understand the utility of these criteria as well as their limitations. Specifically, since each of the three criteria we consider employs a different representation of complexity, each will rank the superiority of candidate models according to different priorities.

3.1 Akaike Information Criterion

If $p^*$ is a probability density function (pdf) representing the reality of some phenomena and $p$ represents the pdf of a given candidate model, one way to quantify the discrepancy between $p^*$ and $p$ is by using the Kullback-Leibler (KL) distance (Kullback and Leibler, 1951)

$$I(p^*, p) = E_{p^*}[\ln(p^*/p)] = \int_{\Omega} \ln\left(\frac{p^*(x)}{p(x)}\right) p^*(x)\, dx ,$$

where $\Omega$ is the set of all realizable values for $x$ (viral loads in our case). Clearly $p^*$ and $p$ need not be from the same class of functions, but we will search for a parametrization for $p$, $\theta$, that yields a pdf $p(\cdot \mid \theta)$ as close as possible to $p^*$ (as measured by $I$). By assuming that a minimum for $I(p^*, p(\cdot \mid \theta))$ exists and asymptotically expanding $I$ around it, Akaike (1973, 1974, 1977) was able to compute an expected estimate of $I(p^*, p)$ up to a data-dependent, additive constant. The non-constant part of this estimate is the

$$\text{Akaike Information Criterion (AIC)} = -2\ell(\hat{\theta} \mid V) + 2Q , \tag{5}$$

where $\hat{\theta}$ is the vector of $Q$ best fit Maximum Likelihood Estimator (MLE) parameters and $\ell$ is the log-likelihood function, conditioned on the given data $V$.$^{3}$ Since the criterion estimates the KL distance up to an additive constant, this allows relative ranking of a set of candidate models, when all are fit to the same set of data.

Later developments and refinements of the AIC include the small sample size AIC (Hurvich and Tsai, 1989)

$$AIC_C = -2\ell(\hat{\theta} \mid V) + 2Q\left(\frac{N}{N - Q - 1}\right) ,$$

and the Takeuchi Information Criterion (TIC) (Takeuchi, 1976)

$$TIC = -2\ell(\hat{\theta} \mid V) + 2\,\mathrm{tr}\left(J(\hat{\theta})\, I(\hat{\theta})^{-1}\right) . \tag{6}$$

Note that the trace term in the TIC contains

$$I(\theta) = -E_{p^*}\left[\frac{\partial^2 \ell(\theta \mid \cdot)}{\partial\theta\,\partial\theta^T}\right] ,$$

and

$$J(\theta) = E_{p^*}\left[\frac{\partial \ell(\theta \mid \cdot)}{\partial\theta}\, \frac{\partial \ell(\theta \mid \cdot)}{\partial\theta^T}\right] ,$$

which are the Hessian and outer product forms of the inverse Fisher Information Matrix (IFIM), respectively.

$^{3}$ For the HIV data, $V = \{v_1, v_2, \ldots, v_M\}$ and $\hat{\theta}$ consists of $\hat{\beta}$, the unique elements of $\hat{\Psi}$, and $\hat{\sigma}^2$ for a total of $q(q + 3)/2 + 1 = Q$ MLE parameters.
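The AIC and its small sample variant are simple to evaluate once a maximized log-likelihood is available. The sketch below implements (5), the small sample correction, and the parameter count $Q = q(q+3)/2 + 1$; the log-likelihood values and sample size in the example are hypothetical and are not results from this study.

```python
def n_mle_parameters(q):
    """Q = q fixed effects + q(q+1)/2 unique covariance entries + 1 error variance."""
    return q * (q + 3) // 2 + 1

def aic(loglik, n_params):
    """Akaike Information Criterion: -2 * loglik + 2 * Q."""
    return -2.0 * loglik + 2.0 * n_params

def aicc(loglik, n_params, n_obs):
    """Small sample AIC: the penalty 2Q is inflated by N / (N - Q - 1)."""
    return -2.0 * loglik + 2.0 * n_params * n_obs / (n_obs - n_params - 1)

# Hypothetical comparison: model B fits slightly better but has one more fixed effect.
q_a, loglik_a = 2, -120.0
q_b, loglik_b = 3, -118.5
n_obs = 100

aic_a = aic(loglik_a, n_mle_parameters(q_a))   # 252.0
aic_b = aic(loglik_b, n_mle_parameters(q_b))   # 257.0
```

In this made-up example the smaller model A is ranked first (lower AIC) despite its lower likelihood, because adding one fixed effect raises $Q$ from 6 to 10.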

There are, however, disadvantages to each of these criteria. The AIC is known to over-fit data and conventional wisdom suggests that we have enough observations to make the small sample correction insignificant.$^{4}$ In both of these criteria, the model complexity measure ascribes parsimony to models with a small number of parameters. While the TIC does include sensitivities, it is also asymptotically unbiased, with the second term converging to $2Q$ in the limit of large data. It is, moreover, also well-known for its volatility in practice (Shibata, 1989), and as depicted in Section 5, our results illustrate this behavior. We would prefer our metric for complexity to somehow reflect the interdependence between parameters and these criteria are, therefore, not ideally suited for our purposes.

$^{4}$ Accepted practice is to use the $AIC_C$ when observations/parameters is less than 40 (Burnham and Anderson, 2002). As expected, the differences between the $AIC_C$ and AIC values for our models were insignificant (results not presented).

3.2 Information Complexity (ICOMP)

Under the assumption of a multivariate Gaussian density for both the statistical model $p$ and reality $p^*$, Bozdogan (1988) proposed Information Complexity (ICOMP)

$$-2\ell(\hat{\theta} \mid V) + Q \ln\left(\frac{\bar{\lambda}_a}{\bar{\lambda}_g}\right) , \tag{7}$$

as a model selection criterion, where $\bar{\lambda}_a$ and $\bar{\lambda}_g$ are the arithmetic and geometric means of the eigenvalues of the variance/covariance matrix of the MLE $\hat{\theta}$, respectively. The basis for this criterion lies in Shannon's concept of channel capacity (Shannon, 1948), also known as the mutual information between the parameters. Given the true density $p^*$, the first term in (7) is an unbiased estimator of the non-constant part of $I(p^*, p)$, i.e., $-E_{p^*}[\ln p]$, and as in other criteria, reflects the goodness of fit of a candidate model. The second term is derived by considering $I(\prod_{i=1}^{Q} p_i, p)$ where the $p_i$'s are the marginal densities of $p$ with respect to the $i$th parameter. The computation of this distance reduces to

$$I\left(\prod_{i=1}^{Q} p_i, p\right) = \frac{1}{2}\sum_{i=1}^{Q} \ln(s_{ii}) - \frac{1}{2}\ln\left|\Sigma(\hat{\theta})\right| , \tag{8}$$

where $\Sigma(\hat{\theta})$ is the variance/covariance matrix for the Gaussian $p$ and $s_{ij}$ is the $(i, j)$th element of $\Sigma(\hat{\theta})$ (van Emden, 1971, page 61). From a coding theory perspective, one could think of (8) as the actual rate of transmission of signals between the random variables, i.e., $\beta$ and the elements of $b_i$ (Shannon, 1948, pages 20-2). The capacity of a noisy channel would then be calculated by maximizing (8) over all possible sources. From a model selection perspective, a complex model would be one in which signals are accurately passed between parameters, and several researchers, including van Emden (1971) and Bozdogan (1988), have advocated this philosophy. Maximizing over sources is analogous to maximizing (8) over all possible orthonormal transformations of the coordinate axis of $\beta$ and $b_i$. Since the eigenvalues of $\Sigma(\hat{\theta})$ are invariant under similarity transformations and $\Sigma(\hat{\theta})$ is symmetric positive definite, the maximal transform rotates the axes so as to equalize the $s_{ii}$, yielding

$$\frac{Q}{2}\ln\left(\frac{\mathrm{tr}(\Sigma(\hat{\theta}))}{Q}\right) - \frac{1}{2}\ln\left|\Sigma(\hat{\theta})\right| , \tag{9}$$

as a measure of the complexity. This transformation can be explicitly constructed employing Jacobi rotations to compute a diagonalization of $\Sigma$ resulting in the arithmetic mean of the eigenvalues of $\Sigma$ on the diagonal (see van Emden (1971, page 65) or Trefethen and Bau III (1997, pages 225-7)). Lastly, it is known that asymptotically $\Sigma = I^{-1}$, and thus we can compute an estimate of the ICOMP by replacing $\Sigma(\hat{\theta})$ with the IFIM

$$ICOMP(IFIM) = -2\ell(\hat{\theta} \mid V) + Q \ln\left(\frac{\mathrm{tr}\, I(\hat{\theta})^{-1}}{Q}\right) - \ln\left|I(\hat{\theta})^{-1}\right| .$$

A helpful interpretation of (9) is revealed by recalling that for $i, j = 1, \ldots, Q$, the eigenvalues $\{\lambda_i\}_{i=1}^{Q}$ of $\Sigma(\hat{\theta})$ will be located in the Gerschgorin discs of radius $\sum_{j \neq i} |s_{ij}|$, centered at $s_{ii}$. If we let $e_i$ be such that for each $i$, $\lambda_i = s_{ii} + e_i$ where $|e_i| \leq \sum_{j \neq i} |s_{ij}|$, Equation (9) then equals

$$
\begin{aligned}
\frac{Q}{2}\ln\left[\frac{\frac{1}{Q}\sum_{i=1}^{Q}\lambda_i}{\left(\prod_{i=1}^{Q}\lambda_i\right)^{1/Q}}\right]
&= \frac{Q}{2}\ln\left[\frac{\frac{1}{Q}\sum_{i=1}^{Q} s_{ii}}{\left(\prod_{i=1}^{Q}(s_{ii} + e_i)\right)^{1/Q}}\right] \\
&= \frac{Q}{2}\ln\left[\frac{\frac{1}{Q}\sum_{i=1}^{Q} s_{ii}}{\left(\prod_{i=1}^{Q} s_{ii} + \sum_{i=1}^{Q} O(e_i)\prod_{j \neq i} s_{jj}\right)^{1/Q}}\right] \\
&= \frac{Q}{2}\ln\left(\frac{\bar{s}_a}{\bar{s}_g}\right) - \frac{1}{2}\ln\left(1 + \sum_{i=1}^{Q} O(e_i)\, \frac{\prod_{j \neq i} s_{jj}}{\prod_{k=1}^{Q} s_{kk}}\right) ,
\end{aligned}
$$

where $\bar{s}_a$ and $\bar{s}_g$ are the arithmetic and geometric means of the variances $s_{ii}$, illustrating how the ICOMP ascribes simplicity to a model with a small number of parameters with small covariances and tight clustering of variances ($\bar{s}_a \approx \bar{s}_g$).

We desire a parametrically parsimonious model that not only fits the data well, but also ascribes each parameter to a specific mathematical feature of the data, and of the three criteria considered, the ICOMP is the criterion best suited to our application.
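The behavior of the complexity term in (9) can be explored with a small sketch. The code below evaluates $(Q/2)\ln(\mathrm{tr}(\Sigma)/Q) - (1/2)\ln|\Sigma|$ for a given covariance matrix, using a plain Gaussian elimination determinant so that it needs no external libraries. The matrices are hypothetical and the function names are ours, chosen for illustration.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det

def complexity(cov):
    """(Q/2) ln(tr(cov)/Q) - (1/2) ln|cov| for a symmetric positive definite cov."""
    q = len(cov)
    trace = sum(cov[i][i] for i in range(q))
    return 0.5 * q * math.log(trace / q) - 0.5 * math.log(determinant(cov))

def icomp(loglik, cov):
    """Goodness-of-fit term plus twice the complexity, as in (7)."""
    return -2.0 * loglik + 2.0 * complexity(cov)

equal_uncorrelated = [[2.0, 0.0], [0.0, 2.0]]     # equal variances, no covariance
unequal_uncorrelated = [[1.0, 0.0], [0.0, 3.0]]   # same trace, spread-out variances
strongly_correlated = [[2.0, 1.8], [1.8, 2.0]]    # same variances, large covariance

c_equal = complexity(equal_uncorrelated)
c_unequal = complexity(unequal_uncorrelated)
c_correlated = complexity(strongly_correlated)
```

The complexity vanishes (up to rounding) when all variances are equal and the covariances are zero, and grows as the variances spread out or the parameters become correlated, in line with the interpretation given above.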

3.3 Discussion

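In the computation of the above criteria, $I(\hat{\theta})$ and $J(\hat{\theta})$ must be estimated. The matrix $I$ is similar to the Fisher Information matrix, but in our case, the expectation is taken with respect to reality $p^*$ and not the candidate model $p$. We will employ the empirical Hessian as an approximation

$$I(\hat{\theta}) \approx \hat{I}(\hat{\theta}) = -\frac{\partial^2 \ell(\hat{\theta} \mid V)}{\partial\theta\,\partial\theta^T} ,$$

and will estimate the matrix $J$ by assuming independent random samples, allowing the approximation

$$J(\hat{\theta}) \approx \hat{J}(\hat{\theta}) = \sum_{j=1}^{N} \frac{\partial \ell(\hat{\theta} \mid V_j)}{\partial\theta}\, \frac{\partial \ell(\hat{\theta} \mid V_j)}{\partial\theta^T} ,$$

where $j$ indexes over all patients and measurements.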
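A one-parameter toy problem shows how these two estimates enter the TIC trace term. The sketch assumes a hypothetical model $x_j \sim N(\mu, 1)$ with the variance fixed at one, so that both matrices are scalars: the empirical Hessian estimate is the number of observations and the outer product estimate is the sum of squared scores. It is not the computation performed for the HIV models, which relies on sensitivity equations.

```python
def tic_penalty_normal_mean(data):
    """tr(J I^{-1}) for the one-parameter model x_j ~ N(mu, 1), evaluated at the MLE."""
    n = len(data)
    mu_hat = sum(data) / n                    # MLE of the mean
    scores = [x - mu_hat for x in data]       # per-observation derivative of the log-likelihood
    j_hat = sum(s * s for s in scores)        # outer product estimate (a scalar here)
    i_hat = float(n)                          # minus the second derivative of the log-likelihood
    return j_hat / i_hat

well_specified = tic_penalty_normal_mean([-1.0, 1.0, -1.0, 1.0])   # 1.0
misspecified = tic_penalty_normal_mean([-2.0, 2.0, -2.0, 2.0])     # 4.0
```

When the data have unit sample variance, as the model assumes, the trace equals the number of parameters ($Q = 1$) and the TIC penalty matches that of the AIC; when the assumed variance is wrong, the trace departs from $Q$.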

The computation of these estimates involves calculating the sensitivity of the viral compartments to the constitutive parameters in the differential equation, i.e., the derivative of state variables with respect to some chosen parameter. These parameter sensitivities are actually solutions to sensitivity equations, and for illustrations of derivations and analysis of sensitivity equations in the context of HIV infection dynamics, see Bortz and Nelson (2004) or Banks and Bortz (2005b). In all models except for the delay differential equation, we employed the automatic differentiation software ADOL-C (Griewank et al., 1996) to calculate both the gradients and the Hessians. The accuracy of the elements in these matrices, therefore, is restricted only by the accuracy of the algorithm chosen to numerically simulate a solution to the chosen differential equation.

As is illustrated in Banks et al. (2003) and Ciupe et al. (2005), a more conventional approach to the model selection is to use the least squares functional to calculate the statistical significance of the introduction of more complexity into a candidate model. The application of this type of traditional statistical analysis in the context of DEs was originally proposed in Banks and Fitzpatrick (1990). A related technique involving statistical testing of likelihood ratios is employed by Gorfine et al. (2003) in the context of modeling B lymphocyte development, to provide evidence for phenotypic reflux between B-cells in different stages of development. Both of these techniques are, however, restricted to a null/alternative hypothesis framework which involves binary comparisons between nested models (where one model is a simplification of the other). Since IC-based approaches are not restricted in this manner, they have a distinct advantage when comparing multiple models from different schools of thought and with different mathematical representations.

There are also manifest well-posedness questions regarding the model selection methodology we have employed. The identifiability of parameters, such as clearance and death rates, is not obvious and is the focus of Sonday and Nelson (2005). We restricted our collection of DE models to published ones with previously proven existence and uniqueness results, and with a numerical approximation $f^N$ to an actual solution $f$, with index of approximation $N$, we assume that as $N \to \infty$, $f^N \to f$. While there are issues concerning control of error propagation into the likelihood calculation, the well-posedness of the forward problem is easily shown (Bortz, 2005). Since we are using a numerical approximation to the MLE parameters $\hat{\theta}^N$, the likelihood $L^N$, and the selection criteria $IC^N$, however, the well-posedness of the inverse problem is not clear. With the absence of identifiability results, in order to carefully answer the question of how (or if) $IC^N \to IC$ as $N \to \infty$, the topology of the parameter space will likely need to be reformulated using the Prohorov metric as was done in Banks and Bortz (2005a). This endeavor and results concerning the numerical stability of the likelihood approximation scheme are the focus of Bortz (2005).

Given multiple models which fit the data equally well, we can, therefore, calculate their IC values and rank them based on ability to encapsulate the reality of the data. While our primary focus is to illustrate the utility of the information theory-based model selection framework, there are other possible choices for criteria including the Schwarz Information Criterion (SIC) (Schwarz, 1978),$^{5}$ the Minimum Description Length (MDL) based Stochastic Complexity (SC) (Rissanen, 1989), and the Kullback Information Criterion (KIC) (Kim and Cavanaugh, 2005). For a survey of Akaike, MDL, and Bayesian based model selection, we direct the reader to Burnham and Anderson (2002), Grunwald et al. (2005), and Kass and Raftery (1995), respectively.

$^{5}$ The SIC is also known as the Bayesian Information Criterion (BIC).

4 HIV Model Comparison

We will now utilize the mathematical theory presented in the last section to study a collection of models for HIV pathogenesis during anti-viral drug therapy. We consider the data from an experiment reported in Louie et al. (2003) in which ten HIV infected patients were placed on a reverse transcriptase (RT) inhibitor mono-therapy (Tenofovir). Measurements of their plasma HIV RNA concentrations were recorded over three weeks and it is the first two weeks of this data which we propose to fit.

There are many variations to (1) which could potentially be used to model the infection dynamics and we have chosen in this section to examine and compare five published variations to this model. Any one of the alternative models could be a better representation of the actual dynamics, and we are interested in the one which best matches the dynamics as well as the statistics.

Since the experiment under consideration consists of an RT monotherapy, the first model we consider is

$$
\begin{aligned}
\dot{T}^*(t) &= (1 - n_{rt}) k T_0 V(t) - \delta T^*(t) , \\
\dot{V}(t) &= N \delta T^*(t) - c V(t) ,
\end{aligned}
\tag{10}
$$

where $n_{rt}$ is the efficacy of the RT inhibitor and $N$ is the average number of virions produced per productively infected T cell over its lifetime. We assume that the viral concentration prior to therapy is at a steady state, and adjust $k$ accordingly (see Appendix A.1). As in other studies, we will focus on identifying the fitted values of the $c$ and $\delta$ parameters and assume known values for the remaining constants (except for the delay equation, where we also fit the delay parameter).
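Model (10) can be simulated with a few lines of code. The sketch below is illustrative only: the steady state conditions are derived here directly from (10) with $n_{rt} = 0$ (giving $k T_0 = c/N$ and $T^*(0) = cV_0/(N\delta)$) rather than taken from Appendix A.1, the integrator is a hand-written classical Runge-Kutta scheme, and the values of $c$, $\delta$, and $V_0$ are hypothetical rates per day, not fitted estimates.

```python
def rhs(state, c, delta, n_virions, n_rt):
    """Right-hand side of model (10) with k*T0 set to its steady state value c/N."""
    t_star, v = state
    infection = (1.0 - n_rt) * (c / n_virions) * v
    d_t_star = infection - delta * t_star
    d_v = n_virions * delta * t_star - c * v
    return (d_t_star, d_v)

def rk4_step(state, dt, params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))
    k1 = rhs(state, *params)
    k2 = rhs(shifted(state, k1, 0.5 * dt), *params)
    k3 = rhs(shifted(state, k2, 0.5 * dt), *params)
    k4 = rhs(shifted(state, k3, dt), *params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * cc + d)
        for s, a, b, cc, d in zip(state, k1, k2, k3, k4)
    )

def simulate_viral_load(v0, c, delta, n_virions, n_rt, days, dt=0.01):
    """Return V at each step, starting from the pre-therapy steady state."""
    state = (c * v0 / (n_virions * delta), v0)   # T*(0) from 0 = N*delta*T* - c*V
    params = (c, delta, n_virions, n_rt)
    loads = [v0]
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, dt, params)
        loads.append(state[1])
    return loads

# Hypothetical c, delta (per day) and V0; N = 480 and n_rt = 0.8 are assumed fixed.
treated = simulate_viral_load(1.0e5, c=3.0, delta=0.5, n_virions=480.0, n_rt=0.8, days=14.0)
untreated = simulate_viral_load(1.0e5, c=3.0, delta=0.5, n_virions=480.0, n_rt=0.0, days=14.0)
```

Without therapy the simulated load stays at its steady state, while with $n_{rt} = 0.8$ it declines over the two week window; in a fitting procedure, `simulate_viral_load` would play the role of $f$ in the statistical model.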

Our second model extends (10) to allow for a time-varying target cell population. It was previously assumed that the uninfected T-cell population $T_0$ remained constant during the first week following the initiation of drug therapy. When measuring viral loads over longer periods of time, however, this assumption is not reasonable and the following model

$$
\begin{aligned}
\dot{T}^*(t) &= (1 - n_{rt}) k V(t) T(t) - \delta T^*(t) , \\
\dot{V}(t) &= N \delta T^*(t) - c V(t) , \\
\dot{T}(t) &= S + p T(t)\left(1 - \frac{T(t) + T^*(t)}{T_{max}}\right) - d_T T(t) - (1 - n_{rt}) k V(t) T(t) ,
\end{aligned}
\tag{11}
$$

allows the $T$ compartment to vary in time. See Appendix A.2 for the values of $T^*_0$, $k$, and $S$ used to ensure that $\dot{T}(0) = \dot{T}^*(0) = \dot{V}(0) = 0$. Note that a mathematical analysis of this system, with and without the density dependent growth term, was presented in Perelson and Nelson (1999).

Our third and fourth models retain the constant target cell population assumption and instead address the effects of modeling latently infected T-cells $L$. As in Perelson et al. (1997), we assume that upon infection, a certain fraction $f$ of target cells initially become latently infected and then switch to productively infected cells $T^*$ with rate $k_2$. These latently infected cells are also less detectable by the immune system than productively infected T cells and hence have a lower death rate $\delta_L$ (Perelson, 2004), which we will fit along with $\delta$ and $c$. The cells in the $L$ compartment are not considered to be true latently infected cells and are now interpreted as simply infected cells, not yet producing virus. For the third model

$$
\begin{aligned}
\dot{T}^*(t) &= -\delta T^*(t) + k_2 L(t) + (1 - f)(1 - n_{rt}) k T_0 V(t) , \\
\dot{V}(t) &= N \delta T^*(t) - c V(t) , \\
\dot{L}(t) &= f (1 - n_{rt}) k T_0 V(t) - (\delta_L + k_2) L(t) ,
\end{aligned}
\tag{12}
$$

we also assume an initial steady state condition, reflected in the conditions for $T^*_0$, $k$, and $L_0$ in Appendix A.3.

It is also possible that all infected cells must pass through the non-productive stage ($f = 1$), and to address this possibility, we consider for our fourth model

$$
\begin{aligned}
\dot{T}^*(t) &= -\delta T^*(t) + k_2 L(t) , \\
\dot{V}(t) &= N \delta T^*(t) - c V(t) , \\
\dot{L}(t) &= (1 - n_{rt}) k T_0 V(t) - (\delta_L + k_2) L(t) ,
\end{aligned}
\tag{13}
$$

with conditions for $T^*_0$, $k$, and $L_0$ in Appendix A.4.

Our fifth model

$$
\begin{aligned}
\dot{T}^*(t) &= -\delta T^*(t) + k_2 L(t) , \\
\dot{V}(t) &= N \delta T^*(t) - c V(t) - k T(t) V(t) , \\
\dot{T}(t) &= S + p T(t)\left(1 - \frac{T(t) + T^*(t) + L(t)}{T_{max}}\right) - d_T T(t) - (1 - n_{rt}) k V(t) T(t) , \\
\dot{L}(t) &= (1 - n_{rt}) k T(t) V(t) - (\delta_L + k_2) L(t) ,
\end{aligned}
\tag{14}
$$

with conditions for $T^*_0$, $k$, $L_0$, and $S$ in Appendix A.5, is based on a system originally presented in Perelson et al. (1993) and describes the dynamics for both uninfected target cells $T$ and latently infected cells $L$. This model, furthermore, includes a nonlinear mass-action term in the $\dot{V}(t)$ equation to account for the loss of the virion upon infection of a target cell ($-kT(t)V(t)$).

Parameter   Value       Units                     Description                      Source
n_rt        0.8         -                         RT efficacy                      Nelson et al. (2001)
p           0.03        hr^-1                     T-cell growth rate               Perelson et al. (1993)
T_0         from data   cells/mm^3                Initial target cell population   Louie et al. (2003)
N           480         virions mm^3/(cell ml)    Virions per lysing cell          Perelson and Nelson (1999)
V_0         from data   virions/ml                Initial viral load               Louie et al. (2003)
d_T         0.02        hr^-1                     Natural T-cell death rate        Perelson et al. (1993)
f           0.03        -                         Efficacy of latent infection     Perelson et al. (1997)
k_2         0.01        hr^-1                     Latent cell activation rate      Perelson et al. (1997)
T_max       1500        mm^-3                     Maximum T-cell density           Perelson et al. (1993)

Table 1. Model parameters.

As mentioned previously, many researchers have argued for the inclusion of an intra-cellular time delay between viral infection and production. Our last differential equation (DE)-based model is

\begin{equation}
\begin{aligned}
\dot{T}^*(t) &= (1 - n_{rt})kT(t)V(t - \tau) - \delta T^*(t), \\
\dot{V}(t) &= N\delta T^*(t) - cV(t), \\
\dot{T}(t) &= S - d_T T(t) - (1 - n_{rt})kV(t)T(t),
\end{aligned}
\tag{15}
\end{equation}

with the conditions on $T^*_0$, $k$, and $S$ defined in Appendix A.6. A similar model was originally presented in Nelson et al. (2001); for a full mathematical analysis of delay differential equation (DDE) models of HIV infection dynamics, see Nelson and Perelson (2002).
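As a rough illustration of how a delay model of the form (15) can be stepped forward in time, the sketch below uses forward Euler with a fixed-length history buffer for the delayed viral load. Several things here are assumptions of the sketch and not statements about this study: the delay enters only through V(t - tau), the viral load is held at V_0 for all times before therapy starts, the values of delta, c, tau, T_0, and V_0 are hypothetical, and forward Euler stands in for the high performance DE solver software actually used. N and d_T are taken from Table 1.

```python
# Sketch only: forward Euler for a delay model of the form (15).
from collections import deque

N = 480.0      # virions per lysing cell (Table 1)
D_T = 0.02     # d_T, 1/hr (Table 1)
DELTA = 0.03   # hypothetical delta, 1/hr
C = 1.0        # hypothetical c, 1/hr
TAU = 24.0     # hypothetical delay tau, hr
T0 = 300.0     # hypothetical T_0, cells/mm^3
V0 = 1.0e3     # hypothetical V_0, virions/ml


def steady_state():
    """Pre-treatment steady state: right-hand sides vanish at n_rt = 0."""
    t_star0 = C * V0 / (N * DELTA)
    k = C / (T0 * N)
    s = D_T * T0 + C * V0 / N
    return t_star0, k, s


def simulate(n_rt, hours=168.0, h=0.01):
    """Integrate from the steady state with RT efficacy n_rt."""
    t_star, k, s = steady_state()
    v, t = V0, T0
    lag = int(round(TAU / h))  # number of steps spanned by the delay
    # History of V on [-tau, 0): assumed constant at the pre-treatment level.
    v_hist = deque([V0] * lag, maxlen=lag)
    for _ in range(int(round(hours / h))):
        v_delayed = v_hist[0]  # oldest stored value, i.e. V(t - tau)
        d_t_star = (1.0 - n_rt) * k * t * v_delayed - DELTA * t_star
        d_v = N * DELTA * t_star - C * v
        d_t = s - D_T * t - (1.0 - n_rt) * k * v * t
        v_hist.append(v)  # store V(t); the oldest value is dropped
        t_star += h * d_t_star
        v += h * d_v
        t += h * d_t
    return t_star, v, t
```

The `deque` with `maxlen` keeps exactly one delay interval of past values, so the oldest entry is always the viral load one delay earlier. A production code would normally use a higher-order method with interpolation of the history instead of a fixed grid.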

5 Methods

We maximized the likelihood of each model by optimizing over the infected cell death rates, viral clearance rates, and eclipse phase delays from each model as well as the variances and covariances of the parameters across the patient population. Table 1 summarizes the parameters used for all patients. We chose to focus on the viral clearance rates and the T-cell decay rates and, therefore, assumed constant $n_{rt}$ and $N$ across all patients. There is evidence that these parameters could vary across patients, but we do not address this issue here.

A sequential quadratic programming method with inequality constraints, employing a BFGS Hessian update, was used to fit the population parameters in the likelihood function. A trust-region optimization method, also with inequality constraints, was used to fit both the parameters for individual patients $b_i$. To ensure positive definiteness of , a nonlinear constraint was employed


are quite large in the $c$ estimate and relatively small in the $\delta$ estimate. This suggests that while immune system efficacy may vary across patients, HIV-infected CD4+ T-cell death rates are fairly consistent. When evaluated at the fitted parameter values, the Hessian of the likelihood objective is negative definite and the gradient is smaller than 1e-3 (in Euclidean norm). Such a large variance, however, could suggest a misspecification in the statistical model. Accordingly, our future studies include strategies for model diagnostics more sophisticated than first and second order numerical convergence criteria (Bortz, 2005).

In Table 4, we report the correlations for the parameters in all models. The correlations differ widely depending on the model chosen, giving even more credence to our claim that models must include dynamic and stochastic components. Scientifically, therefore, we focus on the correlation reported for model (11), which suggests a weak, but positive, correlation between viral clearance and infected cell death rate.

In Bortz and Nelson (2004), we presented a sensitivity analysis methodology which included a principal component based analysis. In the diagonalization of the sensitivity matrix, the eigenvector associated with the smallest (in magnitude) eigenvalue represented a direction (locally) along which $(c, \delta)$ values generated similar solutions. As Figure 4 on page 1019 in Bortz and Nelson (2004) illustrated, there is a wide range for $c$ and a small range for $\delta$ over which solutions are similar. Even though the patients in the dataset used in that paper were undergoing protease inhibitor therapy, it is reassuring to know that different analyses generate similar conclusions (both coinciding with microbiological understanding).
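The idea of a least-sensitive direction in the $(c, \delta)$ plane can be illustrated with a small Python sketch. It is not the procedure of Bortz and Nelson (2004): as a stand-in for the fitted models it uses the textbook closed-form viral load for a fully effective RT inhibitor started from steady state, V(t) = V_0 (c e^{-delta t} - delta e^{-c t}) / (c - delta), and it takes the "sensitivity matrix" to be the 2 x 2 Gram matrix of finite-difference sensitivities of log V with respect to log c and log delta. All function names and parameter values are hypothetical.

```python
# Sketch only: eigen-direction of least sensitivity in the (c, delta) plane.
import math


def viral_load(t, c, delta, v0=1.0e5):
    """Surrogate V(t): fully effective RT inhibitor, start at steady state."""
    return v0 * (c * math.exp(-delta * t) - delta * math.exp(-c * t)) / (c - delta)


def log_sensitivities(times, c, delta, h=1e-4):
    """Central differences of log V w.r.t. log c and log delta at each time."""
    rows = []
    for t in times:
        d_c = (math.log(viral_load(t, c * math.exp(h), delta))
               - math.log(viral_load(t, c * math.exp(-h), delta))) / (2.0 * h)
        d_delta = (math.log(viral_load(t, c, delta * math.exp(h)))
                   - math.log(viral_load(t, c, delta * math.exp(-h)))) / (2.0 * h)
        rows.append((d_c, d_delta))
    return rows


def gram(rows):
    """Entries (a, b, d) of the symmetric matrix [[a, b], [b, d]] = J^T J."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    return a, b, d


def smallest_eigenpair(a, b, d):
    """Smallest eigenvalue and unit eigenvector of [[a, b], [b, d]]."""
    lam = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    # Two algebraically equivalent eigenvector formulas; keep the longer one
    # to avoid dividing by a nearly zero norm.
    candidates = [(b, lam - a), (lam - d, b)]
    vec = max(candidates, key=lambda w: math.hypot(w[0], w[1]))
    norm = math.hypot(vec[0], vec[1])
    if norm == 0.0:  # matrix is a multiple of the identity
        return lam, (1.0, 0.0)
    return lam, (vec[0] / norm, vec[1] / norm)
```

For example, with `times = [6.0 * j for j in range(1, 29)]`, `c = 1.0`, and `delta = 0.03` (hypothetical values per hour), the eigenvector returned by `smallest_eigenpair(*gram(log_sensitivities(times, 1.0, 0.03)))` points mostly along the c axis, which matches the qualitative statement that solutions are similar over a wide range of $c$ but only a small range of $\delta$.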

If we also consider the second best model (the delay model), we can comment on a common modeling pitfall in which terms are added to a model without fully realizing the mathematical and statistical implications. The inclusion of a delay term is biologically justified and does yield a best fit with a lower least squares value (not presented). Without accounting for variation across patients, however, the fitted rate of decay for the productively infected cells is notably higher, and many researchers have addressed the implications of this increase. For example, Grossman et al. (1998) argue that including a delay in the model for the death of infected cells leads to different conclusions regarding residual transmission of infection during antiviral pharmaceutical therapy, while Lloyd (2001) argues that an absence of delays in the model leads to grossly optimistic conclusions about treatment efficacy. Based on the mixed effects/model selection methodology we must, however, be cautious of claims and conclusions concerning the higher $\delta$.


7 Conclusions

Information theory-based model selection is a powerful tool for comparing models and modeling mechanisms. We presented an introduction to this methodology as well as three criteria. We discussed the relative merits of each and chose one well suited to studying models of HIV infection dynamics during therapy. The calculation of the model selection criteria for these nonlinear models is nontrivial, but is facilitated by the use of modern differential equation solver, optimization, and automatic differentiation software.

We studied six models, previously published in the literature, and compared the ability of each model to characterize the random as well as deterministic features of HIV patient data from Louie et al. (2003). We concluded that it is crucial to include target cell dynamics with density dependent growth (as in (11)), but that a latent cell compartment is not needed over this shorter time scale. Our results also strongly support higher estimates for the viral clearance rate than have been used in previous publications.

It is not sufficient to rely solely on biological intuition and goodness of fit criteria to develop complex models. When considering several valid candidates, the model selection approach presented here provides a framework for constructing models supported by statistical evidence.

Acknowledgements

The research of P. W. Nelson, Ph.D., is supported in part by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. D. M. Bortz, Ph.D., is also supported in part by the Burroughs Wellcome Fund.

The authors wish to thank the anonymous referees for helpful insight and excellent suggestions. The authors also wish to thank A. S. Perelson (Los Alamos National Laboratory) and K. A. Shedden (University of Michigan Statistics) for their comments on an earlier draft of this manuscript and D. D. Ho and M. Markowitz (Aaron Diamond AIDS Research Center) for generously sharing their data. The authors also wish to thank C. Ludwig (Technische Universität München) for assistance in calling high performance DE solver software from within Matlab, A. Verschild (RWTH Aachen University) for the use of the ADiMat automatic differentiation software, and R. Serban (Lawrence Livermore National Laboratory) for assistance with the SUNDIALS software.


A Model Parameter definitions

A.1 Model (10)

\[
T^*_0 = \frac{cV_0}{N\delta}; \quad k = \frac{c}{NT_0}
\]

A.2 Model (11)

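\[
T^*_0 = \frac{cV_0}{N\delta}; \quad k = \frac{c}{NT_0}; \quad
S = \frac{N\delta pT_0^2 + \left(pcV_0 + (d_T - p)N\delta T_{max}\right)T_0 + \delta V_0T_{max}c}{N\delta T_{max}}
\]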

A.3 Model (12)

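\[
T^*_0 = \frac{cV_0}{N\delta}; \quad
L_0 = \frac{fcV_0}{N(\delta_L + k_2 - f\delta_L)}; \quad
k = \frac{c(\delta_L + k_2)}{T_0N(\delta_L + k_2 - f\delta_L)}
\]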
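As a consistency check, the A.3 expressions can be recovered by setting the right-hand sides of (12) to zero before therapy ($n_{rt} = 0$). The short derivation below is a sketch added for illustration; the symbol $X = kT_0V_0$ (the pre-treatment infection rate) is introduced here only for brevity and is not notation used elsewhere in this paper.

```latex
\begin{align*}
0 &= N\delta T^*_0 - cV_0
  &&\Rightarrow\quad T^*_0 = \frac{cV_0}{N\delta}, \\
0 &= fX - (\delta_L + k_2)L_0
  &&\Rightarrow\quad L_0 = \frac{fX}{\delta_L + k_2}, \\
0 &= -\delta T^*_0 + k_2L_0 + (1 - f)X
  &&\Rightarrow\quad \frac{cV_0}{N} = X\,\frac{\delta_L + k_2 - f\delta_L}{\delta_L + k_2}.
\end{align*}
% Solving the last line for X and substituting back:
\begin{align*}
X = \frac{cV_0(\delta_L + k_2)}{N(\delta_L + k_2 - f\delta_L)}, \qquad
k = \frac{X}{T_0V_0} = \frac{c(\delta_L + k_2)}{T_0N(\delta_L + k_2 - f\delta_L)}, \qquad
L_0 = \frac{fcV_0}{N(\delta_L + k_2 - f\delta_L)}.
\end{align*}
```

Setting $f = 1$ in these expressions reduces the common denominator to $Nk_2$, which is the special case used for model (13).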

A.4 Model (13)

\[
T^*_0 = \frac{cV_0}{N\delta}; \quad
L_0 = \frac{cV_0}{k_2N}; \quad
k = \frac{c(\delta_L + k_2)}{T_0k_2N}
\]

A.5 Model (14)

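\[
T^*_0 = \frac{cV_0k_2}{\delta(Nk_2 - \delta_L - k_2)}; \quad
k = \frac{c(\delta_L + k_2)}{T_0(k_2(N - 1) - \delta_L)}; \quad
L_0 = \frac{cV_0}{k_2(N - 1) - \delta_L};
\]

\[
S = \frac{1}{\delta T_{max}(k_2(N - 1) - \delta_L)}
\Big\{ \delta((N - 1)k_2 - \delta_L)pT_0^2 + \delta V_0T_{max}c(\delta_L + k_2)
+ \big( \delta((N - 1)k_2 - \delta_L)(d_T - p)T_{max} + (\delta + k_2)pcV_0 \big)T_0 \Big\}
\]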
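A quick numerical sanity check of steady-state conditions of this kind is to substitute them into the right-hand side of model (14) with $n_{rt} = 0$ and confirm that every derivative vanishes. The Python sketch below does this. The closed forms coded in it are the ones obtained by setting the right-hand sides of (14) to zero and should be read as an assumption of the sketch; N, p, d_T, k_2, and T_max are taken from Table 1, and delta, delta_L, c, T_0, and V_0 are hypothetical values chosen only for illustration.

```python
# Sketch only: check that the model (14) initial conditions give a steady state.
N = 480.0        # Table 1
P = 0.03         # p, 1/hr (Table 1)
D_T = 0.02       # d_T, 1/hr (Table 1)
K2 = 0.01        # k_2, 1/hr (Table 1)
T_MAX = 1500.0   # T_max, 1/mm^3 (Table 1)
DELTA = 0.03     # hypothetical delta, 1/hr
DELTA_L = 0.01   # hypothetical delta_L, 1/hr
C = 1.0          # hypothetical c, 1/hr
T0 = 300.0       # hypothetical T_0, cells/mm^3
V0 = 1.0e3       # hypothetical V_0, virions/ml


def initial_conditions():
    """T*_0, L_0, k, S from the steady state of (14) at n_rt = 0."""
    d = K2 * (N - 1.0) - DELTA_L
    t_star0 = C * V0 * K2 / (DELTA * d)
    l0 = C * V0 / d
    k = C * (DELTA_L + K2) / (T0 * d)
    s = (
        DELTA * d * P * T0 ** 2
        + DELTA * V0 * T_MAX * C * (DELTA_L + K2)
        + (DELTA * d * (D_T - P) * T_MAX + (DELTA + K2) * P * C * V0) * T0
    ) / (DELTA * T_MAX * d)
    return t_star0, l0, k, s


def rhs(t_star, v, t, l, k, s, n_rt):
    """Right-hand side of (14), ordered as (T*, V, T, L)."""
    infection = (1.0 - n_rt) * k * t * v
    return (
        -DELTA * t_star + K2 * l,
        N * DELTA * t_star - C * v - k * t * v,
        s + P * t * (1.0 - (t + t_star + l) / T_MAX) - D_T * t - infection,
        infection - (DELTA_L + K2) * l,
    )


def steady_state_residuals():
    """Derivatives at the initial state before therapy; all should be ~0."""
    t_star0, l0, k, s = initial_conditions()
    return rhs(t_star0, V0, T0, l0, k, s, 0.0)
```

If any entry of `steady_state_residuals()` is not zero to rounding error, either the closed forms or the right-hand side has been transcribed inconsistently, which makes this a cheap test to run before any fitting.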


A.6 Model (15)

\[
T^*_0 = \frac{cV_0}{N\delta}; \quad k = \frac{c}{T_0N}; \quad S = d_TT_0 + \frac{cV_0}{N}
\]

References

Akaike, H., 1973. Information theory as an extension of the maximum likelihood principle. In: Petrov, B. N., Csáki, F. (Eds.), Second International Symposium on Information Theory. Akadémiai Kiadó, Budapest, Hungary, pp. 267-281.

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716-723.

Akaike, H., 1977. On entropy maximization principle. In: Krishnaiah, P. R. (Ed.), Applications of Statistics. North Holland Publishing Company, Amsterdam, The Netherlands, pp. 27-41.

Banks, H. T., Bortz, D. M., Apr. 2005a. Inverse problems for a class of measure dependent dynamical systems. Journal of Inverse and Ill-Posed Problems 13 (2).

Banks, H. T., Bortz, D. M., Jun. 2005b. A parameter sensitivity methodology in the context of HIV delay equation models. Journal of Mathematical Biology 50 (6), 607-625.

Banks, H. T., Bortz, D. M., Holte, S. E., 2003. Incorporation of variability into the mathematical modeling of viral delays in HIV infection dynamics. Mathematical Biosciences 183 (1), 63-91.

Banks, H. T., Fitzpatrick, B. G., 1990. Statistical methods for model comparison in parameter estimation problems for distributed systems. Journal of Mathematical Biology 28, 501-527.

Bortz, D. M., 2005. Accurate model selection computations. In preparation.

Bortz, D. M., Nelson, P. W., 2004. Sensitivity analysis of nonlinear lumped parameter models of HIV infection dynamics. Bulletin of Mathematical Biology 66 (5), 1009-1026.

Bozdogan, H., 1988. ICOMP: A new model-selection criterion. In: Bock, H. H. (Ed.), Classification and Related Methods of Data Analysis. North Holland Publishing Company, Amsterdam, The Netherlands, pp. 599-608.

Burnham, K. P., Anderson, D. R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd Edition. Springer-Verlag, New York, NY.

Callaway, D. S., Perelson, A. S., 2002. HIV-1 infection and low steady state viral loads. Bulletin of Mathematical Biology 64, 29-64.

Ciupe, S., de Bivort, B. L., Bortz, D. M., Nelson, P. W., 2005. Estimating kinetic parameters from HIV primary infection data through the eyes of three different mathematical models. Mathematical Biosciences. To appear.

Davidian, M., Giltinan, D. M., 1995. Nonlinear Models for Repeated Measurement Data. No. 62 in Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, Boca Raton, Florida.

Gorfine, M., Freedman, L., Shahaf, G., Mehr, R., 2003. Maximum likelihood estimator and likelihood ratio test in complex models: An application to B lymphocyte development. Bulletin of Mathematical Biology 65, 1131-1139.

Griewank, A., Juedes, D., Utke, J., 1996. ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Transactions on Mathematical Software 22 (2), 131-167.

Grossman, Z., Feinberg, M., Kuznetsov, V., Dimitrov, D., Paul, W., 1998. HIV infection: how effective is drug combination treatment? Immunology Today 19, 528-532.

Grünwald, P. D., Myung, I. J., Pitt, M. A. (Eds.), 2005. Advances in Minimum Description Length: Theory and Applications. Neural Information Processing. MIT Press.

Hairer, E., Nørsett, S. P., Wanner, G., 1993. Solving Ordinary Differential Equations I. Nonstiff Problems, 2nd Edition. Series in Computational Mathematics. Springer-Verlag.

Herz, A. V. M., Bonhoeffer, S., Anderson, R. M., May, R. M., Nowak, M. A., 1996. Viral dynamics in vivo: limitations on estimates of intracellular delay and virus decay. Proceedings of the National Academy of Sciences, USA 93, 7247-7251.

Hindmarsh, A. C., Brown, P. N., Grant, K. E., Lee, S. L., Serban, R., Shumaker, D. E., Woodward, C. S., Sep. 2005. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software 31 (3).

Ho, D. D., Neumann, A. U., Perelson, A. S., Chen, W., Leonard, J. M., Markowitz, M., Jan. 1995. Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373 (6510), 123-126.

Hurvich, C. M., Tsai, C.-L., 1989. Regression and time series model selection in small samples. Biometrika 76, 271-293.

Kass, R. E., Raftery, A. E., Jun. 1995. Bayes factors. Journal of the American Statistical Association 90 (430), 773-795.

Kim, H.-J., Cavanaugh, J. E., 2005. Model selection criteria based on Kullback information measures for nonlinear regression. Journal of Statistical Planning and Inference. To appear.

Kramer, I., 1999. Modeling the dynamical impact of HIV on the immune system: Viral clearance, infection, and AIDS. Mathematical and Computer Modelling 29, 95-112.

Kullback, S., Leibler, R. A., 1951. On information and sufficiency. Annals of Mathematical Statistics 22, 79-86.

Lloyd, A. L., 2001. The dependence of viral parameter estimates on the assumed viral load life cycle: limitations of studies of viral load data. Proceedings of the Royal Society of London Series B 268, 847-854.

Louie, M., Hogan, C., Hurley, A., Simon, V., Chung, C., Padte, N., Lamy, P., Flaherty, J., Coakley, D., Mascio, M. D., Perelson, A. S., Markowitz, M., 2003. Determining the antiviral activity of tenofovir disoproxil fumarate in treatment-naive chronically HIV-1-infected individuals. AIDS 17, 1151-1156.

Mittler, J. E., Markowitz, M., Ho, D. D., Perelson, A. S., 1999. Improved estimates for HIV-1 clearance rate and intracellular delay. AIDS 13, 1415-1417.

Mittler, J. E., Sulzer, B., Neumann, A. U., Perelson, A. S., 1998. Influence of delayed viral production on viral dynamics in HIV-1 infected patients. Mathematical Biosciences 152, 143-163.

Murray, J. M., Kaufmann, G., Kelleher, A. D., Cooper, D. A., 1998. A model of primary HIV-1 infection. Mathematical Biosciences 154, 57-85.

Nelson, P. W., Mittler, J. E., Perelson, A. S., 2001. Effect of drug efficacy and the eclipse phase of the viral life cycle on estimates of HIV viral dynamic parameters. Journal of Acquired Immune Deficiency Syndromes 26, 405-412.

Nelson, P. W., Murray, J. D., Perelson, A. S., 2000. A model of HIV-1 pathogenesis that includes an intracellular delay. Mathematical Biosciences 163, 201-215.

Nelson, P. W., Perelson, A. S., 2002. Mathematical analysis of delay differential equation models of HIV-1 infection. Mathematical Biosciences 179, 73-94.

Nowak, M. A., Bonhoeffer, S., Shaw, G. M., May, R. M., 1997. Anti-viral drug treatment: Dynamics of resistance in free virus and infected cell populations. Journal of Theoretical Biology 184, 203-217.

Perelson, A. S., 2004. Personal communication.

Perelson, A. S., Essunger, P., Cao, Y., Vesanen, M., Hurley, A., Saksela, K., Markowitz, M., Ho, D. D., May 1997. Decay characteristics of HIV-1-infected compartments during combination therapy. Nature 387 (6629), 188-191.

Perelson, A. S., Kirschner, D. E., de Boer, R., 1993. Dynamics of HIV infection of CD4+ T-cells. Mathematical Biosciences 114, 81-125.

Perelson, A. S., Nelson, P. W., 1999. Mathematical analysis of HIV-1 dynamics in vivo. SIAM Review 41, 3-44.

Perelson, A. S., Neumann, A. U., Markowitz, M., Leonard, J. M., Ho, D. D., 1996. HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271, 1582-1586.

Pinheiro, J. C., Bates, D. M., 2000. Mixed-Effects Models in S and S-PLUS. Statistics and Computing. Springer-Verlag, New York, NY.

Ramratnam, B., Bonhoeffer, S., Binley, J., Hurley, A., Zhang, L., Mittler, J. E., Markowitz, M., Moore, J. P., Perelson, A. S., Ho, D. D., 1999. Rapid production and clearance of HIV-1 and hepatitis C virus assessed by large volume plasma apheresis. The Lancet 354, 1782-1785.

Rissanen, J., 1989. Stochastic Complexity and Statistical Inquiry. Vol. 15 of Series in Computer Science. World Scientific, Singapore.

Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461-464.

Shannon, C. E., 1948. A mathematical theory of communication. Bell System Technical Journal 27, 379-423 and 623-656.

Shibata, R., 1989. Statistical aspects of model selection. In: Willems, J. C. (Ed.), From data to model. Springer-Verlag, London, pp. 375-394.

Sonday, B., Nelson, P. W., 2005. Identifiability criteria for models of HIV. In preparation.

Stafford, M. A., Corey, L., Cao, Y., Daar, E. S., Ho, D. D., Perelson, A. S., 2000. Modeling plasma virus concentration during primary HIV infection. Journal of Theoretical Biology 203, 285-301.

Takeuchi, K., 1976. Distribution of informational statistics and criterion of model fitting. Suri-Kagaku (Mathematic Sciences) 153, 12-18.

Trefethen, L. N., Bau III, D., 1997. Numerical Linear Algebra. SIAM, Philadelphia, PA.

van Emden, M. H., 1971. An Analysis of Complexity. No. 35 in Mathematical Centre tracts. Mathematisch Centrum, Amsterdam.

Wei, X., Ghosh, S. K., Taylor, M. E., Johnson, V. A., Emini, E. A., Deutsch, P., Lifson, J. D., Bonhoeffer, S., Nowak, M. A., Hahn, B. H., Saag, M. S., Shaw, G. M., 1995. Viral dynamics in human immunodeficiency virus type 1 infection. Nature 373, 117-122.

Wodarz, D., Lloyd, A. L., Jansen, V. A. A., Nowak, M. A., 1999. Dynamics of macrophage and T cell infection by HIV. Journal of Theoretical Biology 196, 101-113.

Wu, H., 2005. Statistical methods for HIV dynamic studies in AIDS clinical trials. Statistical Methods in Medical Research 14, 1-22.

Wu, H., Ding, A., 1999. Population HIV-1 dynamics in vivo: Applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55, 410-418.

Wu, H., Ding, A. A., de Gruttola, V., 1998. Estimation of HIV dynamic parameters. Statistics in Medicine 17, 2463-2485.

Wu, H., Wu, L., 2002a. Identification of significant host factors for HIV dynamics modeled by nonlinear mixed-effect models. Statistics in Medicine 21, 753-771.

Wu, L., Wu, H., 2002b. Missing time-dependent covariates in human immunodeficiency virus dynamic models. Journal of the Royal Statistical Society, Series C (Applied Statistics) 51, 297-318.
