24
JSS Journal of Statistical Software July 2016, Volume 71, Issue 3. doi: 10.18637/jss.v071.i03 JMFit:A SAS Macro for Joint Models of Longitudinal and Survival Data Danjie Zhang Gilead Sciences, Inc. Ming-Hui Chen University of Connecticut Joseph G. Ibrahim University of North Carolina Mark E. Boye Eli Lilly and Company Wei Shen Eli Lilly and Company Abstract Joint models for longitudinal and survival data now have a long history of being used in clinical trials or other studies in which the goal is to assess a treatment effect while accounting for a longitudinal biomarker such as patient-reported outcomes or immune responses. Although software has been developed for fitting the joint model, no software packages are currently available for simultaneously fitting the joint model and assessing the fit of the longitudinal component and the survival component of the model separately as well as the contribution of the longitudinal data to the fit of the survival model. To fulfill this need, we develop a SAS macro, called JMFit. JMFit implements a variety of popular joint models and provides several model assessment measures including the decomposition of AIC and BIC as well as ΔAIC and ΔBIC recently developed in Zhang, Chen, Ibrahim, Boye, Wang, and Shen (2014). Examples with real and simulated data are provided to illustrate the use of JMFit. Keywords : AIC, BIC, patient-reported outcome (PRO), shared parameter model, time-varying covariates. 1. Introduction The joint analysis of longitudinal and time-to-event outcomes has been widely published in statistical journals. One popular approach in joint modeling of longitudinal and survival data is based on shared random effects, where the longitudinal model and survival model share common random effects and these random effects then induce correlation between the longitudinal and survival components of the model. This family of joint models is also called

JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Embed Size (px)

Citation preview

Page 1: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

JSS Journal of Statistical SoftwareJuly 2016 Volume 71 Issue 3 doi 1018637jssv071i03

JMFit A SAS Macro for Joint Models ofLongitudinal and Survival Data

Danjie ZhangGilead Sciences Inc

Ming-Hui ChenUniversity of Connecticut

Joseph G IbrahimUniversity of North Carolina

Mark E BoyeEli Lilly and Company

Wei ShenEli Lilly and Company

Abstract

Joint models for longitudinal and survival data now have a long history of being usedin clinical trials or other studies in which the goal is to assess a treatment effect whileaccounting for a longitudinal biomarker such as patient-reported outcomes or immuneresponses Although software has been developed for fitting the joint model no softwarepackages are currently available for simultaneously fitting the joint model and assessingthe fit of the longitudinal component and the survival component of the model separatelyas well as the contribution of the longitudinal data to the fit of the survival model Tofulfill this need we develop a SAS macro called JMFit JMFit implements a varietyof popular joint models and provides several model assessment measures including thedecomposition of AIC and BIC as well as ∆AIC and ∆BIC recently developed in ZhangChen Ibrahim Boye Wang and Shen (2014) Examples with real and simulated dataare provided to illustrate the use of JMFit

Keywords AIC BIC patient-reported outcome (PRO) shared parameter model time-varyingcovariates

1 Introduction

The joint analysis of longitudinal and time-to-event outcomes has been widely published instatistical journals One popular approach in joint modeling of longitudinal and survivaldata is based on shared random effects where the longitudinal model and survival modelshare common random effects and these random effects then induce correlation between thelongitudinal and survival components of the model This family of joint models is also called

2 JMFit Joint Models of Longitudinal and Survival Data in SAS

the ldquoshared parameter modelsrdquo (SPMs) There are two basic formulations of SPMs Thefirst is the ldquotime trajectory modelrdquo denoted by SPM1 where one essentially substitutes thepolynomial time trajectory function from the longitudinal model into the hazard function ofthe survival model and in this case the trajectory function acts like a time-varying covariatein the survival model The second formulation denoted by SPM2 is to directly include therandom effects as covariates in the survival model There are several R packages available infitting joint models based on shared random effects including JM (Rizopoulos 2012) JMbayes(Rizopoulos 2016) and joineR (Philipson Sousa Diggle Williamson Kolamunnage-Donaand Henderson 2012) There is also a Stata module stjm (Crowther 2012 Crowther Abramsand Lambert 2013) which estimates shared random effects models In addition another Rpackage lcmm (Proust-Lima Philipps and Liquet 2016) estimates joint models based onshared latent classesOne important issue in the joint modeling of longitudinal and survival data concerns theseparate contribution of the model components to the overall goodness-of-fit of the jointmodel Recently Zhang et al (2014) derived a novel decomposition of the AIC and BICcriteria into additive components that will allow us to assess the goodness of fit for eachcomponent of the joint model Such a decomposition leads to the development of ∆AICand ∆BIC which quantify the change of AIC and BIC in fitting the survival data with andwithout using the longitudinal data Thus ∆AIC and ∆BIC can be used to determine theimportance of the longitudinal data relative to the model fit of the survival data In addition∆AIC and ∆BIC are also very useful in assessing whether a linear trajectory or quadratictrajectory is more suitable and also facilitating a direct comparison between SPM1s andSPM2s These measures will help the data analyst in not only assessing each component ofthe joint model but also in determining the contribution of the longitudinal measures to thefit of the survival data These newly developed model assessment criteria are not available inany of these packages or module mentioned before We mention here that the methodologyfor ∆AIC and ∆BIC was fully developed in Zhang et al (2014) but our goal here is thenovel implementation of this methodology into user-friendly software along with a class ofjoint models for jointly analyzing longitudinal and time-to-event dataThis paper introduces JMFit a SAS macro that will allow us to fit the SPM1 SPM2 time-varying covariates and two-stage models as well as to assess the goodness-of-fit of each ofthe longitudinal and survival components in the joint model A detailed analysis of thelongitudinal and survival data from a cancer clinical trial as well as an analysis of the simulateddata are carried out to illustrate the functionality of JMFit A detailed description of JMFitis given in Appendix A

2 The models and model assessment

21 The joint models

Suppose that there are n subjects For the ith subject let yi(t) denote the longitudinalmeasure which is observed at time t isin ai1 ai2 aimi where 0 le ai1 lt ai2 lt middot middot middot lt aimi

and mi ge 1 Here yi(0) denotes the baseline value of the longitudinal measure Let tiand δi be the failure time and the censoring indicator such that δi = 1 if ti is a failuretime and 0 if ti is right-censored for the ith subject We further let xi(t) and zi denote a

Journal of Statistical Software 3

pL-dimensional vector of time-dependent covariates and a pS-dimensional vector of baselinecovariates respectively The joint model for (yi ti) consists of the longitudinal componentand the survival componentFor the longitudinal component a mixed effects regression model is assumed for yi(t) whichtakes the form

yi(aij) = θgti g(aij) + γgtxi(t) + εi(aij) (1)

where g(aij) = (1 aij a2ij a

qij)prime is a polynomial vector of order q for j = 1 mi θi is

a (q + 1)-dimensional vector of random effects and γ is a p-dimensional vector of regressioncoefficients In (1) we further assume θi sim N(θΩ) where θ is the (q+1)-dimensional vectorof the overall effects Ω is a (q + 1) times (q + 1) positive definite covariance matrix with thelower triangle consisting of Ω00Ω10Ω11 Ωqq εi(aij) sim N(0 σ2) and θi and εi(aij) areindependent We note that in (1) if q = 1 g(aij) = (1 aij)gt and θgti g(aij) represents a lineartrajectory and if q = 2 g(aij) = (1 aij a2

ij)gt and θgti g(aij) leads to a quadratic trajectoryFor the failure time ti we assume that the hazard function takes the form

λ(t|λ0βαθi g(t) zi) = λ0(t) expβθgti g(t) +αgtzi (2)

orλ(t|λ0βαθi g(t) zi) = λ0(t) expβgtθi +αgtzi (3)

where λ0(t) is the baseline hazard function β is a one-dimensional regression coefficient in(2) and β is a (q+ 1)-dimensional vector of the regression coefficients in (3) Note that in (2)or (3) θi and g(t) are the parameters and the functions from the longitudinal component ofthe joint model in (1) while λ0 β (or β) and α are the only parameters pertaining to thesurvival component As shown in Section 31 β (or β) controls the association between thelongitudinal marker and the time-to-event A value of β = 0 (or β = 0) implies no associationbetween the longitudinal marker and the time-to-event The joint model with hazard functionspecified in (2) is the trajectory model denoted by SPM1 while the one with hazard functiongiven by (3) is denoted by SPM2 Under SPM1 a positive value of β implies that a largercurrent value of the longitudinal marker is associated with a larger instantaneous hazardwhereas a negative value of β implies that a larger current value of the longitudinal markeris associated with a smaller instantaneous hazard

22 The construction of the piecewise constant baseline hazard function

Assuming λ0(t) to be a piecewise constant baseline hazard function we partition the timeaxis into J intervals with 0 = sJ0 lt sJ1 lt sJ2 lt lt sJJminus1 lt sJJ = infin Then we assign aconstant baseline hazard to each of the J intervals that is

λ0(t) = λj t isin (sJjminus1 sJj ] for j = 1 2 J (4)

Let tlowast1 le tlowast2 le middot middot middot le tlowastnlowast be the nlowast event times of the tirsquos where nlowast =sumni=1 δi We consider

four algorithms to construct the sJj rsquos

Algorithm 1 Equally-spaced quantile partition (ESQP)

Step 1 Compute pj = jJ for j = 1 J minus 1

4 JMFit Joint Models of Longitudinal and Survival Data in SAS

Step 2 Let nj = [pjnlowast] which is the integer part of pjnlowast

Step 3 Set

sJj =

(tlowastnj+ tlowastnj+1)2 if nj = pjn

lowast

tlowastnj+1 if nj lt pjnlowast

(5)

for j = 1 J minus 1

The ESQP is a popular approach to construct the piecewise constant hazard function whichis discussed in Ibrahim Chen and Sinha (2001 Chapter 5) and also implemented in the Rpackage JM developed by Rizopoulos (2010) We note that sJj is the pjth quantile of the tlowasti rsquosand (5) is implemented in the SAS UNIVARIATE procedure as the default option for computingquantiles We also note that the ESQP algorithm does not yield nested partitions in the sensethat sJ1

j j = 1 J1 is not necessarily a subset of sJ2j j = 1 J2 when J1 lt J2

In order to construct nested partitions we propose the following three bi-sectional quantilepartition algorithms

Algorithm 2 Left bi-sectional quantile partition (LBSQP)

Step 1 Decompose J into two parts

J = 2K +M (6)

where K and M are integers and M lt 2K In (6) K = [log J log 2] and thenM = J minus 2K

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2mminus 1)2K+1 for m = 1 M(ge 1)

Step 3 Sort a1 a2Kminus1 b1 bM in ascending order and the resulting ordered J minus 1values are denoted by p1 le p2 le middot middot middot le pJminus1

Step 4 Use Steps 2 and 3 of Algorithm 1 to compute sJj j = 1 J minus 1

Algorithm 3 Middle bi-sectional quantile partition (MBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) for m = 1 M(ge 1)

(a) bm = (2K minusm)2K+1 for m = 1 3 5 and(b) bm = (2K +mminus 1)2K+1 for m = 2 4 6

Steps 3 and 4 The same as Algorithm 2

Journal of Statistical Software 5

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

J = 1

J = 2 = 21

J = 3 = 21 + 1

J = 4 = 22

J = 5 = 22 + 1

J = 6 = 22 + 2

J = 7 = 22 + 3

J = 8 = 23

J = 9 = 23 + 1

0 10 20 30 40 50 60 70 80 90 100()

(a) LBSQP (b) MBSQP (c) RBSQP

Figure 1 An illustration of three bi-sectional partition methods

Algorithm 4 Right bi-sectional quantile partition (RBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2K+1 minus (2mminus 1))2K+1 for m = 1 M(ge 1)

Steps 3 and 4 The same as Algorithm 2

If there are ties in sJj j = 1 J say sJ` = sJ`+1 the interval (sJ` sJ`+1] is undefined Thuswe only use distinct values of the sJj Let Jt denote the number of ties in sJj j = 1 JThen the number of distinct intervals reduces to J minus JtFigure 1 shows how the partition intervals are constructed based on LBSQP MBSQP andRBSQPNotice that when J = 2K K = 1 2 ESQP LBSQP MBSQP and RBSQP yield thesame partition LBSQP MBSQP and RBSQP are desirable when there are more events atthe beginning in the middle and at the end of the follow-up period respectively Anotheradvantage of LBSQP MBSQP and RBSQP is that the resulting partitions are nested andhence the log-likelihood of the joint model increases in J when the longitudinal componentremains fixed

23 The joint likelihoodWe rewrite (1) as follows

yi = Wi(θgti γgt)gt + εiwhere yi = (yi(ai1) yi(aimi))gt Wi = ((g(aij)gtxi(aij)gt)gt j = 1 mi)gt and εi =(εi(ai1) εi(aimi))gt sim N(0 σ2Imi) The complete-data likelihood function of the longitu-dinal measures for the ith subject is given by

L(γ σ2|yiWiθi) = 1(2πσ2)

mi2

expminus 1

2σ2 (yi minusWi(θgti γgt)gt)gt(yi minusWi(θgti γgt)gt) (7)

6 JMFit Joint Models of Longitudinal and Survival Data in SAS

for i = 1 n Note that the density of θi is given by

f(θi|θΩ) = |Ω|minus12

(2π)q+1

2exp

minus 1

2(θi minus θ)gtΩminus1(θi minus θ) (8)

Let ϕ = (λβαγ σ2θΩ) Using (2) (or (3)) (7) and (8) the observed-data likelihoodfunction for (yi ti δi) for the ith subject is given by

L(ϕ|yi ti δi ziWi) =intL(λβα|ti δi ziθi g)L(γ σ2|yiWiθi)f(θi|θΩ)dθi (9)

where the complete-data likelihood function for the survival component is written as

L(λβα|ti δi ziθi g) =[λ(ti|λ0βαθi g(ti) zi)

]δi

times expminusint ti

0λ(u|λ0βαθi g(u) zi)du

(10)

for i = 1 n In (2) (or (3)) when β = 0 (or β = 0) the hazard function reduces toλ(t|λ0α zi) = λ0(t) exp(αgtzi) In this case we fit the survival data alone and the likelihoodfunction in (10) for the ith subject reduces to

L0(λα|ti δi zi) = λ0(ti) exp(αgtzi)δi exp[minus exp(αgtzi)

int ti

0λ0(u)du

] (11)

Letting Dobs = (yi ti δixi zi) i = 1 n denote the observed data the joint likelihoodfor all subjects is given by

L(ϕ|g Dobs) =nprodi=1

L(ϕ|yi ti δi ziWi) (12)

24 AIC (BIC) decomposition and ∆AIC (∆BIC)

Write ϕ1 = (γ σ2θΩ) and ϕ2 = (λβα) Let f(θi|yiWiϕ1) be the conditional density ofthe random effects θi given yi and also let L(ϕ1|yiWi) =

intL(γ σ2|yiWiθi)f(θi|θΩ)dθi

which is the likelihood function corresponding to the marginal distribution of yi FollowingZhang et al (2014) the joint likelihood given in (12) can be decomposed as

L(ϕ|g Dobs) = LLong(ϕ1|g Dobs)LSurv|Long(ϕ2|gϕ1 Dobs) (13)

where LLong(ϕ1|g Dobs) =prodni=1 L(ϕ1|yiWi) and LSurv|Long(ϕ2|gϕ1 Dobs) =

prodni=1

intL(ϕ2|ti

δi ziθi g)f(θi|yiWiϕ1)dθi Using (13) the decomposition of the total Akaike InformationCriterion (AIC) (Akaike 1973) developed in Zhang et al (2014) is given as

AIC = AICLong + AICSurv|Long

where AIC = minus2 logL(ϕ|g Dobs)+2 dim(ϕ) AICLong = minus2 logLLong(ϕ1|g Dobs)+2 dim(ϕ1)AICSurv|Long = minus2 logLSurv|Long(ϕ2|g ϕ1 Dobs) + 2 dim(ϕ2) and ϕ ϕ1 and ϕ2 are the maxi-mum likelihood estimates (MLEs) of ϕ ϕ1 and ϕ2 Similarly the total Bayesian InformationCriterion (BIC) (Schwarz 1978) for the joint model can be decomposed into

BIC = BICLong + BICSurv|Long

Journal of Statistical Software 7

where BIC = AIC+dim(ϕ)(lognminus2) BICLong = AICLong+dim(ϕ1) (lognminus2) BICSurv|Long =AICSurv|Long + dim(ϕ2)(logn minus 2) Using the decompositions of AIC and BIC Zhang et al(2014) proposed two new model assessment criteria given by

∆AIC = AICSurv0 minusAICSurv|Long∆BIC = BICSurv0 minus BICSurv|Long

whereAICSurv0 = minus2

sumni=1 logL0(λ α|ti δi zi) + 2 dim(λα)

BICSurv0 = minus2sumni=1 logL0(λ α|ti δi zi) + dim(λα) logn

and L0(λα|ti δi zi) is defined by (11) The ∆AIC or ∆BIC measure the gain of the fit in thesurvival component due to the longitudinal data with a penalty for the additional parametersin the survival component of the joint model The model with a large value of ∆AIC (∆BIC)is more preferred

3 The SAS macro JMFit

31 Design

The SAS macro JMFit has been developed to assess model fit in joint models of longitudinaland survival data In fact it can fit five models including the two types of joint models withlinear and quadratic trajectories as well as the time-varying covariates model The macroJMFit consists of five submacros SPM1L SPM1Q SPM2L SPM2Q and TVC corresponding to thefive models respectively The MODEL argument of JMFit specifies one of the following fivemodels to be fitted

SPM1L SPM1 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it) +αgtzi

SPM1Q SPM1 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it+ θ2it2) +αgtzi

SPM2L SPM2 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i +αgtzi

where θ0i and θ1i are subject-level random intercept and random slope

SPM2Q SPM2 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i + β3θ2i +αgtzi

where θ0i θ1i and θ2i are random effects (ie random intercept random slope andrandom quadratic coefficient)

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 2: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

2 JMFit Joint Models of Longitudinal and Survival Data in SAS

the ldquoshared parameter modelsrdquo (SPMs) There are two basic formulations of SPMs Thefirst is the ldquotime trajectory modelrdquo denoted by SPM1 where one essentially substitutes thepolynomial time trajectory function from the longitudinal model into the hazard function ofthe survival model and in this case the trajectory function acts like a time-varying covariatein the survival model The second formulation denoted by SPM2 is to directly include therandom effects as covariates in the survival model There are several R packages available infitting joint models based on shared random effects including JM (Rizopoulos 2012) JMbayes(Rizopoulos 2016) and joineR (Philipson Sousa Diggle Williamson Kolamunnage-Donaand Henderson 2012) There is also a Stata module stjm (Crowther 2012 Crowther Abramsand Lambert 2013) which estimates shared random effects models In addition another Rpackage lcmm (Proust-Lima Philipps and Liquet 2016) estimates joint models based onshared latent classesOne important issue in the joint modeling of longitudinal and survival data concerns theseparate contribution of the model components to the overall goodness-of-fit of the jointmodel Recently Zhang et al (2014) derived a novel decomposition of the AIC and BICcriteria into additive components that will allow us to assess the goodness of fit for eachcomponent of the joint model Such a decomposition leads to the development of ∆AICand ∆BIC which quantify the change of AIC and BIC in fitting the survival data with andwithout using the longitudinal data Thus ∆AIC and ∆BIC can be used to determine theimportance of the longitudinal data relative to the model fit of the survival data In addition∆AIC and ∆BIC are also very useful in assessing whether a linear trajectory or quadratictrajectory is more suitable and also facilitating a direct comparison between SPM1s andSPM2s These measures will help the data analyst in not only assessing each component ofthe joint model but also in determining the contribution of the longitudinal measures to thefit of the survival data These newly developed model assessment criteria are not available inany of these packages or module mentioned before We mention here that the methodologyfor ∆AIC and ∆BIC was fully developed in Zhang et al (2014) but our goal here is thenovel implementation of this methodology into user-friendly software along with a class ofjoint models for jointly analyzing longitudinal and time-to-event dataThis paper introduces JMFit a SAS macro that will allow us to fit the SPM1 SPM2 time-varying covariates and two-stage models as well as to assess the goodness-of-fit of each ofthe longitudinal and survival components in the joint model A detailed analysis of thelongitudinal and survival data from a cancer clinical trial as well as an analysis of the simulateddata are carried out to illustrate the functionality of JMFit A detailed description of JMFitis given in Appendix A

2 The models and model assessment

21 The joint models

Suppose that there are n subjects For the ith subject let yi(t) denote the longitudinalmeasure which is observed at time t isin ai1 ai2 aimi where 0 le ai1 lt ai2 lt middot middot middot lt aimi

and mi ge 1 Here yi(0) denotes the baseline value of the longitudinal measure Let tiand δi be the failure time and the censoring indicator such that δi = 1 if ti is a failuretime and 0 if ti is right-censored for the ith subject We further let xi(t) and zi denote a

Journal of Statistical Software 3

pL-dimensional vector of time-dependent covariates and a pS-dimensional vector of baselinecovariates respectively The joint model for (yi ti) consists of the longitudinal componentand the survival componentFor the longitudinal component a mixed effects regression model is assumed for yi(t) whichtakes the form

yi(aij) = θgti g(aij) + γgtxi(t) + εi(aij) (1)

where g(aij) = (1 aij a2ij a

qij)prime is a polynomial vector of order q for j = 1 mi θi is

a (q + 1)-dimensional vector of random effects and γ is a p-dimensional vector of regressioncoefficients In (1) we further assume θi sim N(θΩ) where θ is the (q+1)-dimensional vectorof the overall effects Ω is a (q + 1) times (q + 1) positive definite covariance matrix with thelower triangle consisting of Ω00Ω10Ω11 Ωqq εi(aij) sim N(0 σ2) and θi and εi(aij) areindependent We note that in (1) if q = 1 g(aij) = (1 aij)gt and θgti g(aij) represents a lineartrajectory and if q = 2 g(aij) = (1 aij a2

ij)gt and θgti g(aij) leads to a quadratic trajectoryFor the failure time ti we assume that the hazard function takes the form

λ(t|λ0βαθi g(t) zi) = λ0(t) expβθgti g(t) +αgtzi (2)

orλ(t|λ0βαθi g(t) zi) = λ0(t) expβgtθi +αgtzi (3)

where λ0(t) is the baseline hazard function β is a one-dimensional regression coefficient in(2) and β is a (q+ 1)-dimensional vector of the regression coefficients in (3) Note that in (2)or (3) θi and g(t) are the parameters and the functions from the longitudinal component ofthe joint model in (1) while λ0 β (or β) and α are the only parameters pertaining to thesurvival component As shown in Section 31 β (or β) controls the association between thelongitudinal marker and the time-to-event A value of β = 0 (or β = 0) implies no associationbetween the longitudinal marker and the time-to-event The joint model with hazard functionspecified in (2) is the trajectory model denoted by SPM1 while the one with hazard functiongiven by (3) is denoted by SPM2 Under SPM1 a positive value of β implies that a largercurrent value of the longitudinal marker is associated with a larger instantaneous hazardwhereas a negative value of β implies that a larger current value of the longitudinal markeris associated with a smaller instantaneous hazard

22 The construction of the piecewise constant baseline hazard function

Assuming λ0(t) to be a piecewise constant baseline hazard function we partition the timeaxis into J intervals with 0 = sJ0 lt sJ1 lt sJ2 lt lt sJJminus1 lt sJJ = infin Then we assign aconstant baseline hazard to each of the J intervals that is

λ0(t) = λj t isin (sJjminus1 sJj ] for j = 1 2 J (4)

Let tlowast1 le tlowast2 le middot middot middot le tlowastnlowast be the nlowast event times of the tirsquos where nlowast =sumni=1 δi We consider

four algorithms to construct the sJj rsquos

Algorithm 1 Equally-spaced quantile partition (ESQP)

Step 1 Compute pj = jJ for j = 1 J minus 1

4 JMFit Joint Models of Longitudinal and Survival Data in SAS

Step 2 Let nj = [pjnlowast] which is the integer part of pjnlowast

Step 3 Set

sJj =

(tlowastnj+ tlowastnj+1)2 if nj = pjn

lowast

tlowastnj+1 if nj lt pjnlowast

(5)

for j = 1 J minus 1

The ESQP is a popular approach to construct the piecewise constant hazard function whichis discussed in Ibrahim Chen and Sinha (2001 Chapter 5) and also implemented in the Rpackage JM developed by Rizopoulos (2010) We note that sJj is the pjth quantile of the tlowasti rsquosand (5) is implemented in the SAS UNIVARIATE procedure as the default option for computingquantiles We also note that the ESQP algorithm does not yield nested partitions in the sensethat sJ1

j j = 1 J1 is not necessarily a subset of sJ2j j = 1 J2 when J1 lt J2

In order to construct nested partitions we propose the following three bi-sectional quantilepartition algorithms

Algorithm 2 Left bi-sectional quantile partition (LBSQP)

Step 1 Decompose J into two parts

J = 2K +M (6)

where K and M are integers and M lt 2K In (6) K = [log J log 2] and thenM = J minus 2K

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2mminus 1)2K+1 for m = 1 M(ge 1)

Step 3 Sort a1 a2Kminus1 b1 bM in ascending order and the resulting ordered J minus 1values are denoted by p1 le p2 le middot middot middot le pJminus1

Step 4 Use Steps 2 and 3 of Algorithm 1 to compute sJj j = 1 J minus 1

Algorithm 3 Middle bi-sectional quantile partition (MBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) for m = 1 M(ge 1)

(a) bm = (2K minusm)2K+1 for m = 1 3 5 and(b) bm = (2K +mminus 1)2K+1 for m = 2 4 6

Steps 3 and 4 The same as Algorithm 2

Journal of Statistical Software 5

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

J = 1

J = 2 = 21

J = 3 = 21 + 1

J = 4 = 22

J = 5 = 22 + 1

J = 6 = 22 + 2

J = 7 = 22 + 3

J = 8 = 23

J = 9 = 23 + 1

0 10 20 30 40 50 60 70 80 90 100()

(a) LBSQP (b) MBSQP (c) RBSQP

Figure 1 An illustration of three bi-sectional partition methods

Algorithm 4 Right bi-sectional quantile partition (RBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2K+1 minus (2mminus 1))2K+1 for m = 1 M(ge 1)

Steps 3 and 4 The same as Algorithm 2

If there are ties in sJj j = 1 J say sJ` = sJ`+1 the interval (sJ` sJ`+1] is undefined Thuswe only use distinct values of the sJj Let Jt denote the number of ties in sJj j = 1 JThen the number of distinct intervals reduces to J minus JtFigure 1 shows how the partition intervals are constructed based on LBSQP MBSQP andRBSQPNotice that when J = 2K K = 1 2 ESQP LBSQP MBSQP and RBSQP yield thesame partition LBSQP MBSQP and RBSQP are desirable when there are more events atthe beginning in the middle and at the end of the follow-up period respectively Anotheradvantage of LBSQP MBSQP and RBSQP is that the resulting partitions are nested andhence the log-likelihood of the joint model increases in J when the longitudinal componentremains fixed

23 The joint likelihoodWe rewrite (1) as follows

yi = Wi(θgti γgt)gt + εiwhere yi = (yi(ai1) yi(aimi))gt Wi = ((g(aij)gtxi(aij)gt)gt j = 1 mi)gt and εi =(εi(ai1) εi(aimi))gt sim N(0 σ2Imi) The complete-data likelihood function of the longitu-dinal measures for the ith subject is given by

L(γ σ2|yiWiθi) = 1(2πσ2)

mi2

expminus 1

2σ2 (yi minusWi(θgti γgt)gt)gt(yi minusWi(θgti γgt)gt) (7)

6 JMFit Joint Models of Longitudinal and Survival Data in SAS

for i = 1 n Note that the density of θi is given by

f(θi|θΩ) = |Ω|minus12

(2π)q+1

2exp

minus 1

2(θi minus θ)gtΩminus1(θi minus θ) (8)

Let ϕ = (λβαγ σ2θΩ) Using (2) (or (3)) (7) and (8) the observed-data likelihoodfunction for (yi ti δi) for the ith subject is given by

L(ϕ|yi ti δi ziWi) =intL(λβα|ti δi ziθi g)L(γ σ2|yiWiθi)f(θi|θΩ)dθi (9)

where the complete-data likelihood function for the survival component is written as

L(λβα|ti δi ziθi g) =[λ(ti|λ0βαθi g(ti) zi)

]δi

times expminusint ti

0λ(u|λ0βαθi g(u) zi)du

(10)

for i = 1 n In (2) (or (3)) when β = 0 (or β = 0) the hazard function reduces toλ(t|λ0α zi) = λ0(t) exp(αgtzi) In this case we fit the survival data alone and the likelihoodfunction in (10) for the ith subject reduces to

L0(λα|ti δi zi) = λ0(ti) exp(αgtzi)δi exp[minus exp(αgtzi)

int ti

0λ0(u)du

] (11)

Letting Dobs = (yi ti δixi zi) i = 1 n denote the observed data the joint likelihoodfor all subjects is given by

L(ϕ|g Dobs) =nprodi=1

L(ϕ|yi ti δi ziWi) (12)

24 AIC (BIC) decomposition and ∆AIC (∆BIC)

Write ϕ1 = (γ σ2θΩ) and ϕ2 = (λβα) Let f(θi|yiWiϕ1) be the conditional density ofthe random effects θi given yi and also let L(ϕ1|yiWi) =

intL(γ σ2|yiWiθi)f(θi|θΩ)dθi

which is the likelihood function corresponding to the marginal distribution of yi FollowingZhang et al (2014) the joint likelihood given in (12) can be decomposed as

L(ϕ|g Dobs) = LLong(ϕ1|g Dobs)LSurv|Long(ϕ2|gϕ1 Dobs) (13)

where LLong(ϕ1|g Dobs) =prodni=1 L(ϕ1|yiWi) and LSurv|Long(ϕ2|gϕ1 Dobs) =

prodni=1

intL(ϕ2|ti

δi ziθi g)f(θi|yiWiϕ1)dθi Using (13) the decomposition of the total Akaike InformationCriterion (AIC) (Akaike 1973) developed in Zhang et al (2014) is given as

AIC = AICLong + AICSurv|Long

where AIC = minus2 logL(ϕ|g Dobs)+2 dim(ϕ) AICLong = minus2 logLLong(ϕ1|g Dobs)+2 dim(ϕ1)AICSurv|Long = minus2 logLSurv|Long(ϕ2|g ϕ1 Dobs) + 2 dim(ϕ2) and ϕ ϕ1 and ϕ2 are the maxi-mum likelihood estimates (MLEs) of ϕ ϕ1 and ϕ2 Similarly the total Bayesian InformationCriterion (BIC) (Schwarz 1978) for the joint model can be decomposed into

BIC = BICLong + BICSurv|Long

Journal of Statistical Software 7

where BIC = AIC+dim(ϕ)(lognminus2) BICLong = AICLong+dim(ϕ1) (lognminus2) BICSurv|Long =AICSurv|Long + dim(ϕ2)(logn minus 2) Using the decompositions of AIC and BIC Zhang et al(2014) proposed two new model assessment criteria given by

∆AIC = AICSurv0 minusAICSurv|Long∆BIC = BICSurv0 minus BICSurv|Long

whereAICSurv0 = minus2

sumni=1 logL0(λ α|ti δi zi) + 2 dim(λα)

BICSurv0 = minus2sumni=1 logL0(λ α|ti δi zi) + dim(λα) logn

and L0(λα|ti δi zi) is defined by (11) The ∆AIC or ∆BIC measure the gain of the fit in thesurvival component due to the longitudinal data with a penalty for the additional parametersin the survival component of the joint model The model with a large value of ∆AIC (∆BIC)is more preferred

3 The SAS macro JMFit

31 Design

The SAS macro JMFit has been developed to assess model fit in joint models of longitudinaland survival data In fact it can fit five models including the two types of joint models withlinear and quadratic trajectories as well as the time-varying covariates model The macroJMFit consists of five submacros SPM1L SPM1Q SPM2L SPM2Q and TVC corresponding to thefive models respectively The MODEL argument of JMFit specifies one of the following fivemodels to be fitted

SPM1L SPM1 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it) +αgtzi

SPM1Q SPM1 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it+ θ2it2) +αgtzi

SPM2L SPM2 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i +αgtzi

where θ0i and θ1i are subject-level random intercept and random slope

SPM2Q SPM2 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i + β3θ2i +αgtzi

where θ0i θ1i and θ2i are random effects (ie random intercept random slope andrandom quadratic coefficient)

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 3: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 3

pL-dimensional vector of time-dependent covariates and a pS-dimensional vector of baselinecovariates respectively The joint model for (yi ti) consists of the longitudinal componentand the survival componentFor the longitudinal component a mixed effects regression model is assumed for yi(t) whichtakes the form

yi(aij) = θgti g(aij) + γgtxi(t) + εi(aij) (1)

where g(aij) = (1 aij a2ij a

qij)prime is a polynomial vector of order q for j = 1 mi θi is

a (q + 1)-dimensional vector of random effects and γ is a p-dimensional vector of regressioncoefficients In (1) we further assume θi sim N(θΩ) where θ is the (q+1)-dimensional vectorof the overall effects Ω is a (q + 1) times (q + 1) positive definite covariance matrix with thelower triangle consisting of Ω00Ω10Ω11 Ωqq εi(aij) sim N(0 σ2) and θi and εi(aij) areindependent We note that in (1) if q = 1 g(aij) = (1 aij)gt and θgti g(aij) represents a lineartrajectory and if q = 2 g(aij) = (1 aij a2

ij)gt and θgti g(aij) leads to a quadratic trajectoryFor the failure time ti we assume that the hazard function takes the form

λ(t|λ0βαθi g(t) zi) = λ0(t) expβθgti g(t) +αgtzi (2)

orλ(t|λ0βαθi g(t) zi) = λ0(t) expβgtθi +αgtzi (3)

where λ0(t) is the baseline hazard function β is a one-dimensional regression coefficient in(2) and β is a (q+ 1)-dimensional vector of the regression coefficients in (3) Note that in (2)or (3) θi and g(t) are the parameters and the functions from the longitudinal component ofthe joint model in (1) while λ0 β (or β) and α are the only parameters pertaining to thesurvival component As shown in Section 31 β (or β) controls the association between thelongitudinal marker and the time-to-event A value of β = 0 (or β = 0) implies no associationbetween the longitudinal marker and the time-to-event The joint model with hazard functionspecified in (2) is the trajectory model denoted by SPM1 while the one with hazard functiongiven by (3) is denoted by SPM2 Under SPM1 a positive value of β implies that a largercurrent value of the longitudinal marker is associated with a larger instantaneous hazardwhereas a negative value of β implies that a larger current value of the longitudinal markeris associated with a smaller instantaneous hazard

22 The construction of the piecewise constant baseline hazard function

Assuming λ0(t) to be a piecewise constant baseline hazard function we partition the timeaxis into J intervals with 0 = sJ0 lt sJ1 lt sJ2 lt lt sJJminus1 lt sJJ = infin Then we assign aconstant baseline hazard to each of the J intervals that is

λ0(t) = λj t isin (sJjminus1 sJj ] for j = 1 2 J (4)

Let tlowast1 le tlowast2 le middot middot middot le tlowastnlowast be the nlowast event times of the tirsquos where nlowast =sumni=1 δi We consider

four algorithms to construct the sJj rsquos

Algorithm 1 Equally-spaced quantile partition (ESQP)

Step 1 Compute pj = jJ for j = 1 J minus 1

4 JMFit Joint Models of Longitudinal and Survival Data in SAS

Step 2 Let nj = [pjnlowast] which is the integer part of pjnlowast

Step 3 Set

sJj =

(tlowastnj+ tlowastnj+1)2 if nj = pjn

lowast

tlowastnj+1 if nj lt pjnlowast

(5)

for j = 1 J minus 1

The ESQP is a popular approach to construct the piecewise constant hazard function whichis discussed in Ibrahim Chen and Sinha (2001 Chapter 5) and also implemented in the Rpackage JM developed by Rizopoulos (2010) We note that sJj is the pjth quantile of the tlowasti rsquosand (5) is implemented in the SAS UNIVARIATE procedure as the default option for computingquantiles We also note that the ESQP algorithm does not yield nested partitions in the sensethat sJ1

j j = 1 J1 is not necessarily a subset of sJ2j j = 1 J2 when J1 lt J2

In order to construct nested partitions we propose the following three bi-sectional quantilepartition algorithms

Algorithm 2 Left bi-sectional quantile partition (LBSQP)

Step 1 Decompose J into two parts

J = 2K +M (6)

where K and M are integers and M lt 2K In (6) K = [log J log 2] and thenM = J minus 2K

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2mminus 1)2K+1 for m = 1 M(ge 1)

Step 3 Sort a1 a2Kminus1 b1 bM in ascending order and the resulting ordered J minus 1values are denoted by p1 le p2 le middot middot middot le pJminus1

Step 4 Use Steps 2 and 3 of Algorithm 1 to compute sJj j = 1 J minus 1

Algorithm 3 Middle bi-sectional quantile partition (MBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) for m = 1 M(ge 1)

(a) bm = (2K minusm)2K+1 for m = 1 3 5 and(b) bm = (2K +mminus 1)2K+1 for m = 2 4 6

Steps 3 and 4 The same as Algorithm 2

Journal of Statistical Software 5

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

J = 1

J = 2 = 21

J = 3 = 21 + 1

J = 4 = 22

J = 5 = 22 + 1

J = 6 = 22 + 2

J = 7 = 22 + 3

J = 8 = 23

J = 9 = 23 + 1

0 10 20 30 40 50 60 70 80 90 100()

(a) LBSQP (b) MBSQP (c) RBSQP

Figure 1 An illustration of three bi-sectional partition methods

Algorithm 4 Right bi-sectional quantile partition (RBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2K+1 minus (2mminus 1))2K+1 for m = 1 M(ge 1)

Steps 3 and 4 The same as Algorithm 2

If there are ties in sJj j = 1 J say sJ` = sJ`+1 the interval (sJ` sJ`+1] is undefined Thuswe only use distinct values of the sJj Let Jt denote the number of ties in sJj j = 1 JThen the number of distinct intervals reduces to J minus JtFigure 1 shows how the partition intervals are constructed based on LBSQP MBSQP andRBSQPNotice that when J = 2K K = 1 2 ESQP LBSQP MBSQP and RBSQP yield thesame partition LBSQP MBSQP and RBSQP are desirable when there are more events atthe beginning in the middle and at the end of the follow-up period respectively Anotheradvantage of LBSQP MBSQP and RBSQP is that the resulting partitions are nested andhence the log-likelihood of the joint model increases in J when the longitudinal componentremains fixed

23 The joint likelihoodWe rewrite (1) as follows

yi = Wi(θgti γgt)gt + εiwhere yi = (yi(ai1) yi(aimi))gt Wi = ((g(aij)gtxi(aij)gt)gt j = 1 mi)gt and εi =(εi(ai1) εi(aimi))gt sim N(0 σ2Imi) The complete-data likelihood function of the longitu-dinal measures for the ith subject is given by

L(γ σ2|yiWiθi) = 1(2πσ2)

mi2

expminus 1

2σ2 (yi minusWi(θgti γgt)gt)gt(yi minusWi(θgti γgt)gt) (7)

6 JMFit Joint Models of Longitudinal and Survival Data in SAS

for i = 1 n Note that the density of θi is given by

f(θi|θΩ) = |Ω|minus12

(2π)q+1

2exp

minus 1

2(θi minus θ)gtΩminus1(θi minus θ) (8)

Let ϕ = (λβαγ σ2θΩ) Using (2) (or (3)) (7) and (8) the observed-data likelihoodfunction for (yi ti δi) for the ith subject is given by

L(ϕ|yi ti δi ziWi) =intL(λβα|ti δi ziθi g)L(γ σ2|yiWiθi)f(θi|θΩ)dθi (9)

where the complete-data likelihood function for the survival component is written as

L(λβα|ti δi ziθi g) =[λ(ti|λ0βαθi g(ti) zi)

]δi

times expminusint ti

0λ(u|λ0βαθi g(u) zi)du

(10)

for i = 1 n In (2) (or (3)) when β = 0 (or β = 0) the hazard function reduces toλ(t|λ0α zi) = λ0(t) exp(αgtzi) In this case we fit the survival data alone and the likelihoodfunction in (10) for the ith subject reduces to

L0(λα|ti δi zi) = λ0(ti) exp(αgtzi)δi exp[minus exp(αgtzi)

int ti

0λ0(u)du

] (11)

Letting Dobs = (yi ti δixi zi) i = 1 n denote the observed data the joint likelihoodfor all subjects is given by

L(ϕ|g Dobs) =nprodi=1

L(ϕ|yi ti δi ziWi) (12)

24 AIC (BIC) decomposition and ∆AIC (∆BIC)

Write ϕ1 = (γ σ2θΩ) and ϕ2 = (λβα) Let f(θi|yiWiϕ1) be the conditional density ofthe random effects θi given yi and also let L(ϕ1|yiWi) =

intL(γ σ2|yiWiθi)f(θi|θΩ)dθi

which is the likelihood function corresponding to the marginal distribution of yi FollowingZhang et al (2014) the joint likelihood given in (12) can be decomposed as

L(ϕ|g Dobs) = LLong(ϕ1|g Dobs)LSurv|Long(ϕ2|gϕ1 Dobs) (13)

where LLong(ϕ1|g Dobs) =prodni=1 L(ϕ1|yiWi) and LSurv|Long(ϕ2|gϕ1 Dobs) =

prodni=1

intL(ϕ2|ti

δi ziθi g)f(θi|yiWiϕ1)dθi Using (13) the decomposition of the total Akaike InformationCriterion (AIC) (Akaike 1973) developed in Zhang et al (2014) is given as

AIC = AICLong + AICSurv|Long

where AIC = minus2 logL(ϕ|g Dobs)+2 dim(ϕ) AICLong = minus2 logLLong(ϕ1|g Dobs)+2 dim(ϕ1)AICSurv|Long = minus2 logLSurv|Long(ϕ2|g ϕ1 Dobs) + 2 dim(ϕ2) and ϕ ϕ1 and ϕ2 are the maxi-mum likelihood estimates (MLEs) of ϕ ϕ1 and ϕ2 Similarly the total Bayesian InformationCriterion (BIC) (Schwarz 1978) for the joint model can be decomposed into

BIC = BICLong + BICSurv|Long

Journal of Statistical Software 7

where BIC = AIC+dim(ϕ)(lognminus2) BICLong = AICLong+dim(ϕ1) (lognminus2) BICSurv|Long =AICSurv|Long + dim(ϕ2)(logn minus 2) Using the decompositions of AIC and BIC Zhang et al(2014) proposed two new model assessment criteria given by

∆AIC = AICSurv0 minusAICSurv|Long∆BIC = BICSurv0 minus BICSurv|Long

whereAICSurv0 = minus2

sumni=1 logL0(λ α|ti δi zi) + 2 dim(λα)

BICSurv0 = minus2sumni=1 logL0(λ α|ti δi zi) + dim(λα) logn

and L0(λα|ti δi zi) is defined by (11) The ∆AIC or ∆BIC measure the gain of the fit in thesurvival component due to the longitudinal data with a penalty for the additional parametersin the survival component of the joint model The model with a large value of ∆AIC (∆BIC)is more preferred

3 The SAS macro JMFit

31 Design

The SAS macro JMFit has been developed to assess model fit in joint models of longitudinaland survival data In fact it can fit five models including the two types of joint models withlinear and quadratic trajectories as well as the time-varying covariates model The macroJMFit consists of five submacros SPM1L SPM1Q SPM2L SPM2Q and TVC corresponding to thefive models respectively The MODEL argument of JMFit specifies one of the following fivemodels to be fitted

SPM1L SPM1 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it) +αgtzi

SPM1Q SPM1 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it+ θ2it2) +αgtzi

SPM2L SPM2 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i +αgtzi

where θ0i and θ1i are subject-level random intercept and random slope

SPM2Q SPM2 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i + β3θ2i +αgtzi

where θ0i θ1i and θ2i are random effects (ie random intercept random slope andrandom quadratic coefficient)

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 4: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

4 JMFit Joint Models of Longitudinal and Survival Data in SAS

Step 2 Let nj = [pjnlowast] which is the integer part of pjnlowast

Step 3 Set

sJj =

(tlowastnj+ tlowastnj+1)2 if nj = pjn

lowast

tlowastnj+1 if nj lt pjnlowast

(5)

for j = 1 J minus 1

The ESQP is a popular approach to construct the piecewise constant hazard function whichis discussed in Ibrahim Chen and Sinha (2001 Chapter 5) and also implemented in the Rpackage JM developed by Rizopoulos (2010) We note that sJj is the pjth quantile of the tlowasti rsquosand (5) is implemented in the SAS UNIVARIATE procedure as the default option for computingquantiles We also note that the ESQP algorithm does not yield nested partitions in the sensethat sJ1

j j = 1 J1 is not necessarily a subset of sJ2j j = 1 J2 when J1 lt J2

In order to construct nested partitions we propose the following three bi-sectional quantilepartition algorithms

Algorithm 2 Left bi-sectional quantile partition (LBSQP)

Step 1 Decompose J into two parts

J = 2K +M (6)

where K and M are integers and M lt 2K In (6) K = [log J log 2] and thenM = J minus 2K

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2mminus 1)2K+1 for m = 1 M(ge 1)

Step 3 Sort a1 a2Kminus1 b1 bM in ascending order and the resulting ordered J minus 1values are denoted by p1 le p2 le middot middot middot le pJminus1

Step 4 Use Steps 2 and 3 of Algorithm 1 to compute sJj j = 1 J minus 1

Algorithm 3 Middle bi-sectional quantile partition (MBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) for m = 1 M(ge 1)

(a) bm = (2K minusm)2K+1 for m = 1 3 5 and(b) bm = (2K +mminus 1)2K+1 for m = 2 4 6

Steps 3 and 4 The same as Algorithm 2

Journal of Statistical Software 5

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

J = 1

J = 2 = 21

J = 3 = 21 + 1

J = 4 = 22

J = 5 = 22 + 1

J = 6 = 22 + 2

J = 7 = 22 + 3

J = 8 = 23

J = 9 = 23 + 1

0 10 20 30 40 50 60 70 80 90 100()

(a) LBSQP (b) MBSQP (c) RBSQP

Figure 1 An illustration of three bi-sectional partition methods

Algorithm 4 Right bi-sectional quantile partition (RBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2K+1 minus (2mminus 1))2K+1 for m = 1 M(ge 1)

Steps 3 and 4 The same as Algorithm 2

If there are ties in sJj j = 1 J say sJ` = sJ`+1 the interval (sJ` sJ`+1] is undefined Thuswe only use distinct values of the sJj Let Jt denote the number of ties in sJj j = 1 JThen the number of distinct intervals reduces to J minus JtFigure 1 shows how the partition intervals are constructed based on LBSQP MBSQP andRBSQPNotice that when J = 2K K = 1 2 ESQP LBSQP MBSQP and RBSQP yield thesame partition LBSQP MBSQP and RBSQP are desirable when there are more events atthe beginning in the middle and at the end of the follow-up period respectively Anotheradvantage of LBSQP MBSQP and RBSQP is that the resulting partitions are nested andhence the log-likelihood of the joint model increases in J when the longitudinal componentremains fixed

23 The joint likelihoodWe rewrite (1) as follows

yi = Wi(θgti γgt)gt + εiwhere yi = (yi(ai1) yi(aimi))gt Wi = ((g(aij)gtxi(aij)gt)gt j = 1 mi)gt and εi =(εi(ai1) εi(aimi))gt sim N(0 σ2Imi) The complete-data likelihood function of the longitu-dinal measures for the ith subject is given by

L(γ σ2|yiWiθi) = 1(2πσ2)

mi2

expminus 1

2σ2 (yi minusWi(θgti γgt)gt)gt(yi minusWi(θgti γgt)gt) (7)

6 JMFit Joint Models of Longitudinal and Survival Data in SAS

for i = 1 n Note that the density of θi is given by

f(θi|θΩ) = |Ω|minus12

(2π)q+1

2exp

minus 1

2(θi minus θ)gtΩminus1(θi minus θ) (8)

Let ϕ = (λβαγ σ2θΩ) Using (2) (or (3)) (7) and (8) the observed-data likelihoodfunction for (yi ti δi) for the ith subject is given by

L(ϕ|yi ti δi ziWi) =intL(λβα|ti δi ziθi g)L(γ σ2|yiWiθi)f(θi|θΩ)dθi (9)

where the complete-data likelihood function for the survival component is written as

L(λβα|ti δi ziθi g) =[λ(ti|λ0βαθi g(ti) zi)

]δi

times expminusint ti

0λ(u|λ0βαθi g(u) zi)du

(10)

for i = 1 n In (2) (or (3)) when β = 0 (or β = 0) the hazard function reduces toλ(t|λ0α zi) = λ0(t) exp(αgtzi) In this case we fit the survival data alone and the likelihoodfunction in (10) for the ith subject reduces to

L0(λα|ti δi zi) = λ0(ti) exp(αgtzi)δi exp[minus exp(αgtzi)

int ti

0λ0(u)du

] (11)

Letting Dobs = (yi ti δixi zi) i = 1 n denote the observed data the joint likelihoodfor all subjects is given by

L(ϕ|g Dobs) =nprodi=1

L(ϕ|yi ti δi ziWi) (12)

24 AIC (BIC) decomposition and ∆AIC (∆BIC)

Write ϕ1 = (γ σ2θΩ) and ϕ2 = (λβα) Let f(θi|yiWiϕ1) be the conditional density ofthe random effects θi given yi and also let L(ϕ1|yiWi) =

intL(γ σ2|yiWiθi)f(θi|θΩ)dθi

which is the likelihood function corresponding to the marginal distribution of yi FollowingZhang et al (2014) the joint likelihood given in (12) can be decomposed as

L(ϕ|g Dobs) = LLong(ϕ1|g Dobs)LSurv|Long(ϕ2|gϕ1 Dobs) (13)

where LLong(ϕ1|g Dobs) =prodni=1 L(ϕ1|yiWi) and LSurv|Long(ϕ2|gϕ1 Dobs) =

prodni=1

intL(ϕ2|ti

δi ziθi g)f(θi|yiWiϕ1)dθi Using (13) the decomposition of the total Akaike InformationCriterion (AIC) (Akaike 1973) developed in Zhang et al (2014) is given as

AIC = AICLong + AICSurv|Long

where AIC = minus2 logL(ϕ|g Dobs)+2 dim(ϕ) AICLong = minus2 logLLong(ϕ1|g Dobs)+2 dim(ϕ1)AICSurv|Long = minus2 logLSurv|Long(ϕ2|g ϕ1 Dobs) + 2 dim(ϕ2) and ϕ ϕ1 and ϕ2 are the maxi-mum likelihood estimates (MLEs) of ϕ ϕ1 and ϕ2 Similarly the total Bayesian InformationCriterion (BIC) (Schwarz 1978) for the joint model can be decomposed into

BIC = BICLong + BICSurv|Long

Journal of Statistical Software 7

where BIC = AIC+dim(ϕ)(lognminus2) BICLong = AICLong+dim(ϕ1) (lognminus2) BICSurv|Long =AICSurv|Long + dim(ϕ2)(logn minus 2) Using the decompositions of AIC and BIC Zhang et al(2014) proposed two new model assessment criteria given by

∆AIC = AICSurv0 minusAICSurv|Long∆BIC = BICSurv0 minus BICSurv|Long

whereAICSurv0 = minus2

sumni=1 logL0(λ α|ti δi zi) + 2 dim(λα)

BICSurv0 = minus2sumni=1 logL0(λ α|ti δi zi) + dim(λα) logn

and L0(λα|ti δi zi) is defined by (11) The ∆AIC or ∆BIC measure the gain of the fit in thesurvival component due to the longitudinal data with a penalty for the additional parametersin the survival component of the joint model The model with a large value of ∆AIC (∆BIC)is more preferred

3 The SAS macro JMFit

31 Design

The SAS macro JMFit has been developed to assess model fit in joint models of longitudinaland survival data In fact it can fit five models including the two types of joint models withlinear and quadratic trajectories as well as the time-varying covariates model The macroJMFit consists of five submacros SPM1L SPM1Q SPM2L SPM2Q and TVC corresponding to thefive models respectively The MODEL argument of JMFit specifies one of the following fivemodels to be fitted

SPM1L SPM1 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it) +αgtzi

SPM1Q SPM1 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it+ θ2it2) +αgtzi

SPM2L SPM2 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i +αgtzi

where θ0i and θ1i are subject-level random intercept and random slope

SPM2Q SPM2 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i + β3θ2i +αgtzi

where θ0i θ1i and θ2i are random effects (ie random intercept random slope andrandom quadratic coefficient)

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 5: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 5

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

0 10 20 30 40 50 60 70 80 90 100()

| |

| |

| | |

| | | |

| | | | |

| | | | | |

| | | | | | |

| | | | | | | |

| | | | | | | | |

|

|

|

|

|

|

|

|

J = 1

J = 2 = 21

J = 3 = 21 + 1

J = 4 = 22

J = 5 = 22 + 1

J = 6 = 22 + 2

J = 7 = 22 + 3

J = 8 = 23

J = 9 = 23 + 1

0 10 20 30 40 50 60 70 80 90 100()

(a) LBSQP (b) MBSQP (c) RBSQP

Figure 1 An illustration of three bi-sectional partition methods

Algorithm 4 Right bi-sectional quantile partition (RBSQP)

Step 1 The same as Algorithm 2

Step 2 Compute

(i) ak = k2K for k = 1 2K minus 1 and(ii) bm = (2K+1 minus (2mminus 1))2K+1 for m = 1 M(ge 1)

Steps 3 and 4 The same as Algorithm 2

If there are ties in sJj j = 1 J say sJ` = sJ`+1 the interval (sJ` sJ`+1] is undefined Thuswe only use distinct values of the sJj Let Jt denote the number of ties in sJj j = 1 JThen the number of distinct intervals reduces to J minus JtFigure 1 shows how the partition intervals are constructed based on LBSQP MBSQP andRBSQPNotice that when J = 2K K = 1 2 ESQP LBSQP MBSQP and RBSQP yield thesame partition LBSQP MBSQP and RBSQP are desirable when there are more events atthe beginning in the middle and at the end of the follow-up period respectively Anotheradvantage of LBSQP MBSQP and RBSQP is that the resulting partitions are nested andhence the log-likelihood of the joint model increases in J when the longitudinal componentremains fixed

23 The joint likelihoodWe rewrite (1) as follows

yi = Wi(θgti γgt)gt + εiwhere yi = (yi(ai1) yi(aimi))gt Wi = ((g(aij)gtxi(aij)gt)gt j = 1 mi)gt and εi =(εi(ai1) εi(aimi))gt sim N(0 σ2Imi) The complete-data likelihood function of the longitu-dinal measures for the ith subject is given by

L(γ σ2|yiWiθi) = 1(2πσ2)

mi2

expminus 1

2σ2 (yi minusWi(θgti γgt)gt)gt(yi minusWi(θgti γgt)gt) (7)

6 JMFit Joint Models of Longitudinal and Survival Data in SAS

for i = 1 n Note that the density of θi is given by

f(θi|θΩ) = |Ω|minus12

(2π)q+1

2exp

minus 1

2(θi minus θ)gtΩminus1(θi minus θ) (8)

Let ϕ = (λβαγ σ2θΩ) Using (2) (or (3)) (7) and (8) the observed-data likelihoodfunction for (yi ti δi) for the ith subject is given by

L(ϕ|yi ti δi ziWi) =intL(λβα|ti δi ziθi g)L(γ σ2|yiWiθi)f(θi|θΩ)dθi (9)

where the complete-data likelihood function for the survival component is written as

L(λβα|ti δi ziθi g) =[λ(ti|λ0βαθi g(ti) zi)

]δi

times expminusint ti

0λ(u|λ0βαθi g(u) zi)du

(10)

for i = 1 n In (2) (or (3)) when β = 0 (or β = 0) the hazard function reduces toλ(t|λ0α zi) = λ0(t) exp(αgtzi) In this case we fit the survival data alone and the likelihoodfunction in (10) for the ith subject reduces to

L0(λα|ti δi zi) = λ0(ti) exp(αgtzi)δi exp[minus exp(αgtzi)

int ti

0λ0(u)du

] (11)

Letting Dobs = (yi ti δixi zi) i = 1 n denote the observed data the joint likelihoodfor all subjects is given by

L(ϕ|g Dobs) =nprodi=1

L(ϕ|yi ti δi ziWi) (12)

24 AIC (BIC) decomposition and ∆AIC (∆BIC)

Write ϕ1 = (γ σ2θΩ) and ϕ2 = (λβα) Let f(θi|yiWiϕ1) be the conditional density ofthe random effects θi given yi and also let L(ϕ1|yiWi) =

intL(γ σ2|yiWiθi)f(θi|θΩ)dθi

which is the likelihood function corresponding to the marginal distribution of yi FollowingZhang et al (2014) the joint likelihood given in (12) can be decomposed as

L(ϕ|g Dobs) = LLong(ϕ1|g Dobs)LSurv|Long(ϕ2|gϕ1 Dobs) (13)

where LLong(ϕ1|g Dobs) =prodni=1 L(ϕ1|yiWi) and LSurv|Long(ϕ2|gϕ1 Dobs) =

prodni=1

intL(ϕ2|ti

δi ziθi g)f(θi|yiWiϕ1)dθi Using (13) the decomposition of the total Akaike InformationCriterion (AIC) (Akaike 1973) developed in Zhang et al (2014) is given as

AIC = AICLong + AICSurv|Long

where AIC = minus2 logL(ϕ|g Dobs)+2 dim(ϕ) AICLong = minus2 logLLong(ϕ1|g Dobs)+2 dim(ϕ1)AICSurv|Long = minus2 logLSurv|Long(ϕ2|g ϕ1 Dobs) + 2 dim(ϕ2) and ϕ ϕ1 and ϕ2 are the maxi-mum likelihood estimates (MLEs) of ϕ ϕ1 and ϕ2 Similarly the total Bayesian InformationCriterion (BIC) (Schwarz 1978) for the joint model can be decomposed into

BIC = BICLong + BICSurv|Long

Journal of Statistical Software 7

where BIC = AIC+dim(ϕ)(lognminus2) BICLong = AICLong+dim(ϕ1) (lognminus2) BICSurv|Long =AICSurv|Long + dim(ϕ2)(logn minus 2) Using the decompositions of AIC and BIC Zhang et al(2014) proposed two new model assessment criteria given by

∆AIC = AICSurv0 minusAICSurv|Long∆BIC = BICSurv0 minus BICSurv|Long

whereAICSurv0 = minus2

sumni=1 logL0(λ α|ti δi zi) + 2 dim(λα)

BICSurv0 = minus2sumni=1 logL0(λ α|ti δi zi) + dim(λα) logn

and L0(λα|ti δi zi) is defined by (11) The ∆AIC or ∆BIC measure the gain of the fit in thesurvival component due to the longitudinal data with a penalty for the additional parametersin the survival component of the joint model The model with a large value of ∆AIC (∆BIC)is more preferred

3 The SAS macro JMFit

31 Design

The SAS macro JMFit has been developed to assess model fit in joint models of longitudinaland survival data In fact it can fit five models including the two types of joint models withlinear and quadratic trajectories as well as the time-varying covariates model The macroJMFit consists of five submacros SPM1L SPM1Q SPM2L SPM2Q and TVC corresponding to thefive models respectively The MODEL argument of JMFit specifies one of the following fivemodels to be fitted

SPM1L SPM1 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it) +αgtzi

SPM1Q SPM1 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it+ θ2it2) +αgtzi

SPM2L SPM2 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i +αgtzi

where θ0i and θ1i are subject-level random intercept and random slope

SPM2Q SPM2 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i + β3θ2i +αgtzi

where θ0i θ1i and θ2i are random effects (ie random intercept random slope andrandom quadratic coefficient)

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 6: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

6 JMFit Joint Models of Longitudinal and Survival Data in SAS

for i = 1 n Note that the density of θi is given by

f(θi|θΩ) = |Ω|minus12

(2π)q+1

2exp

minus 1

2(θi minus θ)gtΩminus1(θi minus θ) (8)

Let ϕ = (λβαγ σ2θΩ) Using (2) (or (3)) (7) and (8) the observed-data likelihoodfunction for (yi ti δi) for the ith subject is given by

L(ϕ|yi ti δi ziWi) =intL(λβα|ti δi ziθi g)L(γ σ2|yiWiθi)f(θi|θΩ)dθi (9)

where the complete-data likelihood function for the survival component is written as

L(λβα|ti δi ziθi g) =[λ(ti|λ0βαθi g(ti) zi)

]δi

times expminusint ti

0λ(u|λ0βαθi g(u) zi)du

(10)

for i = 1 n In (2) (or (3)) when β = 0 (or β = 0) the hazard function reduces toλ(t|λ0α zi) = λ0(t) exp(αgtzi) In this case we fit the survival data alone and the likelihoodfunction in (10) for the ith subject reduces to

L0(λα|ti δi zi) = λ0(ti) exp(αgtzi)δi exp[minus exp(αgtzi)

int ti

0λ0(u)du

] (11)

Letting Dobs = (yi ti δixi zi) i = 1 n denote the observed data the joint likelihoodfor all subjects is given by

L(ϕ|g Dobs) =nprodi=1

L(ϕ|yi ti δi ziWi) (12)

24 AIC (BIC) decomposition and ∆AIC (∆BIC)

Write ϕ1 = (γ σ2θΩ) and ϕ2 = (λβα) Let f(θi|yiWiϕ1) be the conditional density ofthe random effects θi given yi and also let L(ϕ1|yiWi) =

intL(γ σ2|yiWiθi)f(θi|θΩ)dθi

which is the likelihood function corresponding to the marginal distribution of yi FollowingZhang et al (2014) the joint likelihood given in (12) can be decomposed as

L(ϕ|g Dobs) = LLong(ϕ1|g Dobs)LSurv|Long(ϕ2|gϕ1 Dobs) (13)

where LLong(ϕ1|g Dobs) =prodni=1 L(ϕ1|yiWi) and LSurv|Long(ϕ2|gϕ1 Dobs) =

prodni=1

intL(ϕ2|ti

δi ziθi g)f(θi|yiWiϕ1)dθi Using (13) the decomposition of the total Akaike InformationCriterion (AIC) (Akaike 1973) developed in Zhang et al (2014) is given as

AIC = AICLong + AICSurv|Long

where AIC = minus2 logL(ϕ|g Dobs)+2 dim(ϕ) AICLong = minus2 logLLong(ϕ1|g Dobs)+2 dim(ϕ1)AICSurv|Long = minus2 logLSurv|Long(ϕ2|g ϕ1 Dobs) + 2 dim(ϕ2) and ϕ ϕ1 and ϕ2 are the maxi-mum likelihood estimates (MLEs) of ϕ ϕ1 and ϕ2 Similarly the total Bayesian InformationCriterion (BIC) (Schwarz 1978) for the joint model can be decomposed into

BIC = BICLong + BICSurv|Long

Journal of Statistical Software 7

where BIC = AIC+dim(ϕ)(lognminus2) BICLong = AICLong+dim(ϕ1) (lognminus2) BICSurv|Long =AICSurv|Long + dim(ϕ2)(logn minus 2) Using the decompositions of AIC and BIC Zhang et al(2014) proposed two new model assessment criteria given by

∆AIC = AICSurv0 minusAICSurv|Long∆BIC = BICSurv0 minus BICSurv|Long

whereAICSurv0 = minus2

sumni=1 logL0(λ α|ti δi zi) + 2 dim(λα)

BICSurv0 = minus2sumni=1 logL0(λ α|ti δi zi) + dim(λα) logn

and L0(λα|ti δi zi) is defined by (11) The ∆AIC or ∆BIC measure the gain of the fit in thesurvival component due to the longitudinal data with a penalty for the additional parametersin the survival component of the joint model The model with a large value of ∆AIC (∆BIC)is more preferred

3 The SAS macro JMFit

31 Design

The SAS macro JMFit has been developed to assess model fit in joint models of longitudinaland survival data In fact it can fit five models including the two types of joint models withlinear and quadratic trajectories as well as the time-varying covariates model The macroJMFit consists of five submacros SPM1L SPM1Q SPM2L SPM2Q and TVC corresponding to thefive models respectively The MODEL argument of JMFit specifies one of the following fivemodels to be fitted

SPM1L SPM1 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it) +αgtzi

SPM1Q SPM1 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it+ θ2it2) +αgtzi

SPM2L SPM2 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i +αgtzi

where θ0i and θ1i are subject-level random intercept and random slope

SPM2Q SPM2 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i + β3θ2i +αgtzi

where θ0i θ1i and θ2i are random effects (ie random intercept random slope andrandom quadratic coefficient)

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 7: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 7

where BIC = AIC+dim(ϕ)(lognminus2) BICLong = AICLong+dim(ϕ1) (lognminus2) BICSurv|Long =AICSurv|Long + dim(ϕ2)(logn minus 2) Using the decompositions of AIC and BIC Zhang et al(2014) proposed two new model assessment criteria given by

∆AIC = AICSurv0 minusAICSurv|Long∆BIC = BICSurv0 minus BICSurv|Long

whereAICSurv0 = minus2

sumni=1 logL0(λ α|ti δi zi) + 2 dim(λα)

BICSurv0 = minus2sumni=1 logL0(λ α|ti δi zi) + dim(λα) logn

and L0(λα|ti δi zi) is defined by (11) The ∆AIC or ∆BIC measure the gain of the fit in thesurvival component due to the longitudinal data with a penalty for the additional parametersin the survival component of the joint model The model with a large value of ∆AIC (∆BIC)is more preferred

3 The SAS macro JMFit

31 Design

The SAS macro JMFit has been developed to assess model fit in joint models of longitudinaland survival data In fact it can fit five models including the two types of joint models withlinear and quadratic trajectories as well as the time-varying covariates model The macroJMFit consists of five submacros SPM1L SPM1Q SPM2L SPM2Q and TVC corresponding to thefive models respectively The MODEL argument of JMFit specifies one of the following fivemodels to be fitted

SPM1L SPM1 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it) +αgtzi

SPM1Q SPM1 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ(θ0i + θ1it+ θ2it2) +αgtzi

SPM2L SPM2 with Linear trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i +αgtzi

where θ0i and θ1i are subject-level random intercept and random slope

SPM2Q SPM2 with Quadratic trajectory The hazard function has the form

λ(t|λ0αβθi g(t) zi) = λ0(t) expβ1θ0i + β2θ1i + β3θ2i +αgtzi

where θ0i θ1i and θ2i are random effects (ie random intercept random slope andrandom quadratic coefficient)

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 8: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

8 JMFit Joint Models of Longitudinal and Survival Data in SAS

TVC Time-Varying Covariates model The hazard function has the form

λ(t|λ0 βα zi yi(t)) = λ0(t) expβyi(t) +αgtzi

where yi(t) = yi(aij) for aij le t lt aij+1 for j = 1 mi where aimi+1 = infin Notethat the TVC model is a ldquonon-jointrdquo model and the use of this model has great potentialfor bias (Fisher and Lin 1999)

We provide the two versions of SPM1L SPM1Q SPM2L and SPM2Q with the TS argumentIf TS is missing or equal to 0 the joint model will be fit while TS = 1 yields the correspondingtwo-stage model Similar to the method in Tsiatis DeGruttola and Wulfsohn (1995) (i) wefirst fit the linear mixed model specified in (1) to the longitudinal data alone and then obtainthe estimates of θi denoted by θi and (ii) we replace θi in (2) or (3) with the estimate θiat the second stage The only difference between the joint model and the corresponding two-stage model is that θi in (2) or (3) is replaced with the estimate θi This two-stage approachmay potentially lead to biased and inefficient estimates (Ibrahim Chu and Chen 2010)The number of intervals J (ge 1) for the piecewise constant baseline hazard function needs tobe specified in the NPIECES argument For the PARTITION argument 1 represents ESQP 2represents LBSQP 3 represents MBSQP and 4 represents RBSQPJMFit automatically produces a rich text file (RTF) including five tables (i) Number of Sub-jects (ii) Fit Statistics (iii) Survival Parameter Estimates (Survival Alone) (iv) ParameterEstimates and (v) Hazard Ratios amp λ Estimates

32 Implementation detailsIf the observed longitudinal measures are sparse the full trajectories of longitudinal measuresmight not be well estimated For example in the case of fitting a quadratic trajectory the signof the estimated second-order coefficient could be incorrect if the longitudinal measures wereobserved only within the first half of the follow-up period leading to incorrect extrapolationwhen the observed progression time was far beyond the time of the last observed longitudinalmeasureLet tmaxi = max1lejlemiaij When t gt tmaxi yi(t) is never observed and no longitudinaldata are available to estimate the trajectory θgti g(t) for t gt tmaxi Under SPM1 the extrap-olation of the trajectory θgti g(t) beyond tmaxi may lead to a survival component of the jointmodel that fits the survival data poorly In addition such an extrapolation also causes asevere convergence problem in the SAS NLMIXED (SAS Institute Inc 2011b) procedure espe-cially when tmaxi ti for many subjects To circumvent these issues for SPM1 we modifythe hazard function in (2) as

λ(t | λ0αβθi g zi) = λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+) +αgtzi (14)

or

λ(t | λ0αβθi g zi)

= λ0(t) expβθgti g(tminus [tminus tlowastmaxi]+)timesτ minus (tlowastmaxi + [tminus tlowastmaxi]+)

τ minus tlowastmaxi+αgtzi (15)

where [tminustlowastmaxi]+ = max(tminustlowastmaxi 0) tlowastmaxi = tmaxi+wtimesmax(timinustmaxi 0) w (isin [0 1]) is theproportion of max(ti minus tmaxi 0) and τ = max1leilenti which is the last follow-up survival

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 9: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 9

time If w = 0 (default) tmaxi will be the starting point of the modified extrapolation of thetrajectory while w = 1 implies that the trajectory extends to ti with no tmaxi adjustmentThis modification can also be applied to the TS model corresponding to SPM1L or SPM1QThere are two arguments called TMAXI and WEIGHT If TMAXI is missing or equal to 0 no tmaxiadjustment will be applied If TMAXI = 1 the tmaxi adjustment based on hazard functiongiven in (14) will be applied with weight given in the WEIGHT argument that is the trajectorywill become flat after tlowastmaxi If TMAXI = 2 the tmaxi adjustment based on hazard functiongiven in (15) will be applied with weight given in the WEIGHT argument that is startingat tlowastmaxi the trajectory will linearly go down to 0 at the last follow-up survival time Anldquooptimalrdquo choice of weight w in conjunction with TMAXI option may be determined by eitherAICSurv|Long or BICSurv|Long The purpose of defining tmaxi is that we create an extrapolationof the longitudinal measures so that the trajectory function can be well estimated Thisextrapolation is needed when there are few longitudinal measures at later points We notehere that the tmaxi method corresponds to a ldquoprediction carried forwardrdquo approach whichmay potentially induce bias in the estimation We also note that this is not an issue for theSPM2 in which the hazard function is independent of g(t)The OPTIONS argument allows users to specify options (eg integration method optimizationtechnique and convergence criteria) that are available in the PROC NLMIXED statement IfOPTIONS is missing JMFit will use adaptive Gaussian quadrature to approximate the integralof the likelihood over the random effects perform a quasi-Newton optimization and apply arelative gradient convergence criterion of 10minus8 All the methods mentioned above are defaultmethods in PROC NLMIXED The Riemann integral is used to compute the cumulative hazardfunction for the trajectory models Each time interval is divided into 200 subintervalsA big challenge in fitting joint models using the SAS NLMIXED procedure is convergence Poorinitial values may lead to the failure of convergence in NLMIXED To address this issue wefirst fit the longitudinal data alone using the SAS MIXED procedure to obtain the estimatesγ σ2 θ Ω θi for the parameters in the longitudinal component of the joint model Using(θi i = 1 n) to replace (θi i = 1 n) in (2) (or (3)) we fit the survival data aloneto obtain the estimates β (or β) α and λ for the parameters in the survival component ofthe joint model Finally these estimates are used as the initial values for the joint modelJMFit does not exclude any longitudinal measures for the joint models If one wishes toexclude the longitudinal measures observed after the survival time ti those longitudinal mea-sures should be pre-excluded in the input longitudinal data for JMFit ldquoCAUTION Longi-tudinal measures are observed after the survival timerdquo will be given at the end of the outputfile if there are any longitudinal measures observed after the survival time

4 Examples

41 The IBCSG data

To illustrate how JMFit works we use the data from a clinical trial in premenopausal womenwith node-positive breast cancer conducted by the IBCSG (International Breast Cancer StudyGroup 1996) Each participant was randomly assigned in a 2 times 2 factorial design to receive(A) cyclophosphamide methotrexate and fluorouracil for 6 consecutive courses on months 1to 6 (CMF6) (B) CMF6 plus three single courses of reintroduction CMF given on months 9

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 10: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

10 JMFit Joint Models of Longitudinal and Survival Data in SAS

Number of Subjects

Subjects_in_coping 1015

Subjects_in_os 1015

Subjects_Used 1015

Figure 2 The number of subjects for coping

12 and 15 (C) CMF for three consecutive courses on months 1 to 3 (CMF3) or (D) CMF3plus three single courses of reintroduction CMF given on months 6 9 and 12 Four indicatorsof patientsrsquo quality of life (QOL) appetite perceived coping mood and physical well-beingwere collected in this trial We consider a subset of the IBCSG data which consists of 1015patients with at least three values of each longitudinal measure and six binary covariatesincluding treatment (A 1 = icyc 0 = reint and 0 = intera B 1 = icyc 1 = reintand 1 = intera C 0 = icyc 0 = reint and 0 = intera and D 0 = icyc 1 = reintand 0 = intera) age (1 = lsquogt 40rsquo and 0 = lsquole 40rsquo) the number of positive nodes of thetumor (1 = lsquogt 4rsquo and 0 = lsquole 4rsquo) and estrogen receptor (ER) status (1 = positive and 0 =negative) To satisfy the normality assumption for each of the four QOL indicators followingChi and Ibrahim (2006) the corresponding observed value of each QOL was transformed toradic

100minusQOL so that smaller value reflects better QOL For the subset of 1015 patients aftera median follow-up of 968 years (interquartile range 810ndash1104 years) 296 patients diedWe note that QOL indicators were collected only up to 165 years (about 18 months) Byapplying joint models in this study we compare the longitudinal QOL indicators in termsof their contributions to the fit of overall survival (OS) via ∆AIC and ∆BIC which arecomputed using JMFitSuppose we fit the trajectory model with a linear trajectory as considered by Chi and Ibrahim(2006) and Zhu Ibrahim Chi and Tang (2012) and J = 9 for coping Then we create two datasets named as ldquocopingrdquo and ldquoosrdquo for the macro JMFitrsquos LONG and SURV options respectivelyThe MODEL option is set to ldquoSPM1Lrdquo and TS is set to 0 We assign 2 to TMAXI for tmaxiadjustment with WEIGHT=05 The NPIECES option is set to 9 and the PARTITION option isset to 2 for the LBSQP algorithmJMFit is called

JMFit(LONG = coping SURV = os MODEL = SPM1L TS = 0 TMAXI = 2WEIGHT = 05 NPIECES = 9 PARTITION = 2)

The RTF file lsquoOutput for coping under SPM1L with J=9 (Partition=2)rtfrsquo is gener-ated by JMFit which lists five tables Table ldquoNumber of Subjectsrdquo given in Figure 2 showsthat there are 1015 patients in data sets coping and os respectively and 1015 subjects areused implying that the IDs of the subjects in these two data sets matchFigure 3 shows the ldquoFit Statisticsrdquo table which consists of the log likelihood AICLong(BICLong) AICSurv (BICSurv) and ∆AIC (∆BIC) Inside the JMFit macro PROC NLMIXEDprovides the log likelihood AIC and BIC and PROC IML (SAS Institute Inc 2011a) is usedto compute AICLong and BICLongNext the table titled ldquoSurvival Parameter Estimates (Survival Alone)rdquo shown in Figure 4is obtained by fitting the survival data alone For this table the estimate the standard error

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 11: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 11

Fit Statistics

Log Likelihood -806953

AIC 1619507 BIC 1633290

AICLong 1383046 BICLong 1388954

AICSurv|Long 236460 BICSurv|Long 244336

AICSurv0 240095 BICSurv0 247479

ΔAIC 3635 ΔBIC 3143

Figure 3 The fit statistics for coping under SPM1L with J = 9 based on LBSQP

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

icyc -000126 01642 1015 -001 09939 (-0323 0321) -000133

reint -02772 01707 1015 -162 01046 (-0612 0058) -000141

intera 02678 02337 1015 115 02520 (-0191 0726) -000211

agegrp -02070 01356 1015 -153 01272 (-0473 0059) 0000143

nodegrp 08926 01170 1015 763 lt0001 (0663 1122) -000009

er_stat -01982 01275 1015 -156 01203 (-0448 0052) -000221

log λ1 -49829 02846 1015 -1751 lt0001 (-5541 -4424) -000031

log λ2 -27823 02848 1015 -977 lt0001 (-3341 -2224) -000051

log λ3 -29729 02373 1015 -1253 lt0001 (-3439 -2507) 0003588

log λ4 -27064 02359 1015 -1147 lt0001 (-3169 -2244) -000073

log λ5 -26998 02364 1015 -1142 lt0001 (-3164 -2236) -000167

log λ6 -29433 02355 1015 -1250 lt0001 (-3406 -2481) 0001627

log λ7 -28127 02387 1015 -1178 lt0001 (-3281 -2344) -000293

log λ8 -30135 02367 1015 -1273 lt0001 (-3478 -2549) 0001908

log λ9 -34253 02358 1015 -1453 lt0001 (-3888 -2963) -000119

Figure 4 The estimates of the parameters obtained by (11) with J = 9 based on LBSQP

(SE) the degrees of freedom (DF) the t value the p value and the 95 confidence interval(CI) are all shown for each parameter In addition the gradient of the negative log-likelihoodfunction is displayed which can be used to check convergence of PROC NLMIXED A smallgradient implies better convergence The largest absolute value of the gradients shown inFigure 4 is 0003588 indicating good convergence of PROC NLMIXEDPROC NLMIXED produces the table titled ldquoParameter Estimatesrdquo in Figure 5 which consistsof three subtables ldquoCovariance Parameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquoand ldquoSurvival Parameter Estimatesrdquo For each parameter in this table the estimate SE DFt value p value 95 CI and gradient are all provided The Subtable ldquoCovariance ParameterEstimatesrdquo consists of the estimates of the three lower-triangle elements (Ω00Ω10Ω11) of the

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 12: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

12 JMFit Joint Models of Longitudinal and Survival Data in SAS

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 05350 003274 1013 1634 lt0001 (0471 0599) -008107

Ω10 -005658 002094 1013 -270 00070 (-0098 -0015) 0101966

Ω11 01725 002650 1013 651 lt0001 (0120 0224) -002461

σ 06149 0007308 1013 8414 lt0001 (0601 0629) 0087386

Longitudinal Parameter Estimates

Intercept 008643 007928 1013 109 02759 (-0069 0242) 0123621

t -04004 002277 1013 -1758 lt0001 (-0445 -0356) -015625

icyc 01625 007073 1013 230 00218 (0024 0301) 0015924

reint 01259 007007 1013 180 00726 (-0012 0263) 0313939

intera -006607 009767 1013 -068 04989 (-0258 0126) 0002171

agegrp 005158 005907 1013 087 03828 (-0064 0167) -013258

nodegrp 002780 005384 1013 052 06057 (-0078 0133) -018797

er_stat 002232 005408 1013 041 06799 (-0084 0128) -005203

Survival Parameter Estimates

icyc -006267 01695 1013 -037 07116 (-0395 0270) 0024537

reint -03355 01756 1013 -191 00563 (-0680 0009) -001849

intera 03464 02406 1013 144 01502 (-0126 0818) 014234

agegrp -01827 01397 1013 -131 01913 (-0457 0091) 0049036

nodegrp 09522 01217 1013 782 lt0001 (0713 1191) 003694

er_stat -01540 01315 1013 -117 02418 (-0412 0104) -000715

log λ1 -49259 02888 1013 -1706 lt0001 (-5493 -4359) 0016501

log λ2 -25917 02923 1013 -887 lt0001 (-3165 -2018) -002471

log λ3 -27430 02482 1013 -1105 lt0001 (-3230 -2256) -00547

log λ4 -24318 02494 1013 -975 lt0001 (-2921 -1942) -006882

log λ5 -23952 02519 1013 -951 lt0001 (-2889 -1901) -007637

log λ6 -26149 02529 1013 -1034 lt0001 (-3111 -2119) -008057

log λ7 -24741 02571 1013 -962 lt0001 (-2979 -1970) -008093

log λ8 -26860 02547 1013 -1054 lt0001 (-3186 -2186) -00729

log λ9 -31547 02510 1013 -1257 lt0001 (-3647 -2662) -005614

β 02643 006188 1013 427 lt0001 (0143 0386) 0314192

Figure 5 The estimates of the parameters for coping under SPM1L with J = 9 based onLBSQP

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 13: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 13

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_icyc 09393 (0674 1310)

HR_reint 07150 (0507 1009)

HR_intera 14140 (0882 2267)

HR_agegrp 08330 (0633 1096)

HR_nodegrp 25915 (2041 3291)

HR_er_stat 08573 (0662 1110)

λ1 0007256 (0004 0013)

λ2 007489 (0042 0133)

λ3 006438 (0040 0105)

λ4 008788 (0054 0143)

λ5 009116 (0056 0149)

λ6 007318 (0045 0120)

λ7 008424 (0051 0140)

λ8 006815 (0041 0112)

λ9 004265 (0026 0070)

HR_β 13025 (1154 1471)

Figure 6 The hazard ratios and λ estimates for coping under SPM1L with J = 9 based onLBSQP

random-effects covariance matrix as well as the estimate of the standard deviation (σ) for theerror term The estimates of the coefficients of the variables from the longitudinal componentare shown in Subtable ldquoLongitudinal Parameter Estimatesrdquo In Subtable ldquoSurvival ParameterEstimatesrdquo ldquoParamVarrdquo lists all the names of the variables as well as the parameters from thesurvival component In this subtable log λ1 log λ2 and log λ9 are the natural logarithmsof the piecewise baseline hazards for the three intervals respectively The hazard ratios of thecovariates and β as well as the estimates of λ are shown in Figure 6 From Figure 5 we seethat the number of induction cycles has p values of 00218 and 07116 in the longitudinal andsurvival submodels respectively indicating that the effect of the number of induction coursesis significant at the 005 level in the longitudinal submodel and is not statistically significantin the survival submodel The reintroduction of CMF and the interaction between the numberof initial cycles and the reintroduction do not have significant effects on QOL or OS based onthe large p values From Figure 5 we also see that (i) the number of positive nodes is highlysignificant in the survival submodel implying that a greater number of positive nodes wouldincrease the risk of death and (ii) a significant coefficient β indicates that coping is highlyassociated with OSFrom the ldquoFit Statisticsrdquo tables in Figure 3 and Figure 7 we see that coping had the largestvalues of ∆AIC and ∆BIC and physical well-being had the smallest values of ∆AIC and ∆BICwhich indicate that coping led to the most gain in fitting the OS while physical well-beinghad the least contribution to the fit of the OS

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 14: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

14 JMFit Joint Models of Longitudinal and Survival Data in SAS

Fit Statistics

Log Likelihood -870714

AIC 1747029 BIC 1760812

AICLong 1508529 BICLong 1514436

AICSurv|Long 238500 BICSurv|Long 246376

AICSurv0 240095 BICSurv0 247479

ΔAIC 1596 ΔBIC 1104

Fit Statistics

Log Likelihood -854107

AIC 1713814 BIC 1727597

AICLong 1476026 BICLong 1481933

AICSurv|Long 237788 BICSurv|Long 245664

AICSurv0 240095 BICSurv0 247479

ΔAIC 2308 ΔBIC 1815

(a) Appetite (b) Mood

Fit Statistics

Log Likelihood -876795

AIC 1759190 BIC 1772973

AICLong 1519749 BICLong 1525657

AICSurv|Long 239440 BICSurv|Long 247317

AICSurv0 240095 BICSurv0 247479

ΔAIC 655 ΔBIC 163

(c) Physical well-being

Figure 7 The fit statistics for appetite mood and physical well-being under SPM1L withJ = 9 based on LBSQP

AICSurv|Long ∆AICJ LBSQP MBSQP RBSQP LBSQP MBSQP RBSQP1 252820 252820 252820 370 370 3702 251219 251219 251219 884 884 8843 244981 244981 250867 2041 2041 9654 244598 244598 244598 2152 2152 21525 240602 244795 244396 2928 2155 22436 240799 244959 244561 2932 2160 22477 240959 240959 244757 2941 2941 22508 240734 240734 240734 3054 3054 30549 236460 240938 240922 3635 3050 306410 236565 240990 241105 3654 3060 3063

Table 1 AICSurv|Longrsquos and ∆AICrsquos for coping under SPM1L with different partition algo-rithms and different J rsquos

Since ∆AIC = AICSurv0 minus AICSurv|Long both AICSurv0 and AICSurv|Long change when Jchanges Thus the ldquobest Jrdquo based on ∆AIC may not necessarily lead to the best survivalsubmodel or the best joint model As an illustration Table 1 shows the values of AICSurv|Longand ∆AIC for coping under SPM1L with different partition algorithms and different J rsquosNote TMAXI is set to 2 with WEIGHT = 05 in all the models From Table 1 we see that (i)J = 10 or J = 9 is the ldquobestrdquo choice according to ∆AIC (ii) J = 9 or J = 8 is the ldquobestrdquoone based on AICSurv|Long and (iii) the value of AICSurv|Long is 236460 for LBSQP when

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 15: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 15

Coping Physical well-beingw TMAXI = 1 TMAXI = 2 TMAXI = 1 TMAXI = 20 240288 240295 240278 240254

025 238423 237658 239927 23969805 238351 236460 239910 239440075 239576 238560 240170 2399091a 240295 240295 240282 240282

Table 2 AICSurv|Longrsquos for coping and physical well-being under SPM1L with tmaxi adjust-ments using different wrsquos a No tmaxi adjustment

J = 9 which is the smallest one among all the combination of the partition algorithms andJ rsquos Since our goal is to fit the joint model to both the longitudinal data and the survivaldata it is more appropriate to use AICSurv|Long to determine the number of intervals for thepiecewise constant baseline hazard functionTable 2 shows the values of AICSurv|Long for coping and physical well-being under SPM1Lwith J = 9 and tmaxi adjustments using different wrsquos We can see from Table 2 that amongthe five values of w for both tmaxi adjustments coping and physical well-being have thesmallest values of AICSurv|Long at w = 05 For both coping and physical well-being TMAXI =2 outperforms TMAXI = 1 except for w = 0 for coping In addition both coping and physicalwell-being have the largest values of AICSurv|Long at w = 1 indicating that the full trajectorieswere not well estimated

42 A simulated data example

We consider a simulated data example which is similar to the data used in Zhang et al (2014)We generated a simulated data set with n = 400 subjects as follows First the time points(aij rsquos) at which the longitudinal measures were taken were fixed at (0 21 42 63 84 105126)304375 For each subject we generated seven binary covariates (xi1 xi7) indepen-dently from Bernoulli distributions with success probabilities (ie P (xij = 1) j = 1 7))049 092 081 049 038 056 and 074 respectively These proportions were estimated fromthe data in Zhang et al (2014) corresponding to the covariates treatment raceethnicitygender age Karnofsky status baseline stage of disease and vitamin supplementation re-spectively Second we simulated the longitudinal trajectory as

microi(aij) = (θ0 + θ0i) + (θ1 + θ1i)aij + γxi

where θ0 = 062 θ1 = 004 γ = (minus011minus010minus018minus006minus058minus009 010)gt and

(θ0i θ1i)gt sim N

(0(

062 minus004minus004 006

)) Finally we generated the longitudinal data from

a N(microi(aij) σ2) distribution with σ = 054 and tlowasti as

tlowasti = minus log(1minus U)λ expβ1(θ0 + θ0i) + β2(θ1 + θ1i) +αxi

where α = (minus036 015 004minus0003minus033minus038minus007)gt β1 = 026 β2 = 117 λ = exp(minus167) and U sim U(0 1) This longitudinal dataset is denoted by DLong Note that the

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 16: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

16 JMFit Joint Models of Longitudinal and Survival Data in SAS

values of the parameters were obtained by fitting SPM2L to the data in Zhang et al (2014)in which the longitudinal measure corresponds to pain The censoring time Ci was generatedfrom an exponential distribution with mean 50 and the right-censoring percentage was about12 The failure time and censoring indicator were calculated as ti = mintlowasti Ci and δi =1tlowasti le Ci where 1A denotes the indicator function such that 1A = 1 if A is true and 0otherwise This survival dataset is denoted by DSurvWe also generated three additional sets of longitudinal data with longitudinal trajectoriessimulated from

micro`i(aij) = (θ0 + θ0i + τ`0i) + (θ1 + θ1i + τ`1i)aij + γxi

where (i) (τ10i τ11i)gt sim N

(0(

012 00 012

)) (ii) (τ20i τ21i)gt sim N

(0(

052 00 052

)) and

(iii) (τ30i τ31i)gt sim N

(0(

1 00 1

)) Then the longitudinal data were generated from a

N(micro`i(aij) σ2) distribution with σ = 054 for ` = 1 2 3 These three additional sets oflongitudinal data were coupled with the same survival times as in DLong + DSurv to formthree additional data sets These resulting data sets are denoted by DLong1 + DSurv DLong2+ DSurv and DLong3 + DSurv This simulation setting is similar to Simulation III in Zhanget al (2014) except for the six additional covariatesFigure 8 shows the fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv andDLong3 +DSurv respectively using JMFit We see from Figure 8 that DLong + DSurv has thelargest values of ∆AIC and ∆BIC

Fit Statistics

Log Likelihood -405149

AIC 814899 BIC 824079

AICLong 610463 BICLong 615652

AICSurv|Long 204436 BICSurv|Long 208427

AICSurv0 207571 BICSurv0 210764

ΔAIC 3135 ΔBIC 2337

Fit Statistics

Log Likelihood -407949

AIC 820498 BIC 829678

AICLong 615187 BICLong 620376

AICSurv|Long 205311 BICSurv|Long 209303

AICSurv0 207571 BICSurv0 210764

ΔAIC 2260 ΔBIC 1461

(a) DLong + DSurv (b) DLong1 + DSurv

Fit Statistics

Log Likelihood -439992

AIC 884584 BIC 893764

AICLong 677837 BICLong 683026

AICSurv|Long 206747 BICSurv|Long 210739

AICSurv0 207571 BICSurv0 210764

ΔAIC 824 ΔBIC 025

Fit Statistics

Log Likelihood -477545

AIC 959691 BIC 968871

AICLong 752567 BICLong 757756

AICSurv|Long 207123 BICSurv|Long 211115

AICSurv0 207571 BICSurv0 210764

ΔAIC 447 ΔBIC -351

(c) DLong2 + DSurv (d) DLong3 + DSurv

Figure 8 The fit statistics for DLong + DSurv DLong1 + DSurv DLong2 + DSurv and DLong3+ DSurv under SPM2L

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 17: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 17

The rest of the output for DLong +DSurv are provided in Appendix B The output for DLong`+DSurv (` = 1 2 3) are omitted here for brevity

5 Concluding remarksThe JMFit SAS macro fits the joint models for longitudinal and survival data The piecewiseexponential constant hazard model is assumed for the baseline hazard function The time-axis is partitioned into J intervals which are constructed by four algorithms namely ESQPLBSQP MBSQP and RBSQP JMFit allows users to specify the number of intervals J andthe partition method Five models including SPM1L SPM1Q SPM2L SPM2Q and TVCare implemented in JMFit This SAS macro also computes AIC BIC ∆AIC ∆BIC and theestimates of the parameters in the joint model The computational time of JMFit dependson which of SPM1L SPM1Q SPM2L SPM2Q or TVC is chosen and how big the datasetis For the example given in Section 41 it took 5 minutes to fit SPM1L with J = 9 basedon LBSQP for coping on a Dell PC with an Intel i5 processor 330 GHz CPU and 8 GB ofmemory On the same PC it only took 10 to 11 seconds to fit SPM2L to each of the simulateddatasets illustrated in Section 42The JMFit SAS macro provides two versions for SPM1L SPM1Q SPM2L and SPM2Q withthe TS argument with TS = 1 yielding the corresponding two-stage model Comparing thesemodels with TS missing or equal to 0 for the two-stage models (i) we first fit (1) to thelongitudinal data alone and obtain the estimates of θi denoted by θi and (ii) we then use θiin (2) and (3) The TS models are also substantially different than the TVC model since theTS models do not require the LOCF assumptionThe current version of the JMFit SAS macro only fits linear and quadratic models for thelongitudinal outcome and the piecewise constant baseline hazard function for the survivalsubmodel In the joint modeling framework other dependence structures such as dependencethrough the derivatives of the trajectory function or interactions with covariates as well asspline approximations to the baseline hazard could be assumed In addition other trajectoryfunctions may be more appropriate to model the time effect on the longitudinal outcomes incertain applications These additional features could be built in the JMFit macro in a futurereleaseFinally we note that both the IBCSG dataset and the simulated datasets DSurv DLongDLong1 DLong2 and DLong3 are available for downloading from the journal website

AcknowledgmentsWe would like to thank the Editors-in-Chief the Editor and the three anonymous reviewersfor their very helpful comments and suggestions which have led to a much improved versionof the paper and the the JMFit SAS macro Dr M-H Chen and Dr J G Ibrahimrsquos researchwas partially supported by NIH grants GM 70335 and P01 CA142538

References

Akaike H (1973) ldquoInformation Theory and an Extension of the Maximum Likelihood Prin-

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 18: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

18 JMFit Joint Models of Longitudinal and Survival Data in SAS

ciplerdquo In BN Petrov F Csaki (eds) Proceedings of the Second International Symposiumon Information Theory pp 267ndash281 Budapest Akademiai Kiado

Chi YY Ibrahim JG (2006) ldquoJoint Models for Multivariate Longitudinal and MultivariateSurvival Datardquo Biometrics 62 432ndash445 doi101111j1541-0420200500448x

Crowther MJ (2012) STJM Stata Module to Fit Shared Parameter Joint Models of Lon-gitudinal and Survival Data URL httpEconPapersrepecorgRePEcbocbocodes457502

Crowther MJ Abrams KR Lambert PC (2013) ldquoJoint Modeling of Longitudinal and SurvivalDatardquo The Stata Journal 13 165ndash184

Fisher LD Lin D (1999) ldquoTime-Dependent Covariates in the Cox Proportional-HazardsRegression Modelrdquo Annual Review of Public Health 20 145ndash157 doi101146annurevpublhealth201145

Ibrahim JG Chen MH Sinha D (2001) Bayesian Survival Analysis Springer-Verlag doi101007978-1-4757-3447-8

Ibrahim JG Chu H Chen LM (2010) ldquoBasic Concepts and Methods for Joint Models ofLongitudinal and Survival Datardquo Journal of Clinical Oncology 28 2796ndash2801 doi101200jco2009250654

International Breast Cancer Study Group (1996) ldquoDuration and Reintroduction of Adju-vant Chemotherapy for Node-Positive Premenopausal Breast Cancer Patientsrdquo Journal ofClinical Oncology 14 1885ndash1894 doi101200jco2005030783

Philipson P Sousa I Diggle P Williamson P Kolamunnage-Dona R Henderson R (2012)joineR Joint Modelling of Repeated Measurements and Time-to-Event Data R packageversion 10-3 URL httpsCRANR-projectorgpackage=joineR

SAS Institute Inc (2011a) SASIML Software Version 93 Cary NC URL httpwwwsascom

SAS Institute Inc (2011b) SASSTAT Software Version 93 Cary NC URL httpwwwsascom

Proust-Lima C Philipps V Liquet B (2016) ldquoEstimation of Extended Mixed Models UsingLatent Classes and Latent Processes The R Package lcmmrdquo Journal of Statistical SoftwareForthcoming

Rizopoulos D (2010) ldquoJM An R Package for the Joint Modelling of Longitudinal and Time-to-Event Datardquo Journal of Statistical Software 35(9) 1ndash33 doi1018637jssv035i09

Rizopoulos D (2012) JM Joint Modeling of Longitudinal and Survival Data R packageversion 11-0 URL httpsCRANR-projectorgpackage=JM

Rizopoulos D (2016) ldquoThe R Package JMbayes for Fitting Joint Models for Longitudinal andTime-to-Event Data Using MCMCrdquo Journal of Statistical Software Forthcoming

Schwarz G (1978) ldquoEstimating the Dimension of a Modelrdquo The Annals of Statistics 6461ndash464 doi101214aos1176344136

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 19: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 19

Tsiatis AA DeGruttola V Wulfsohn MS (1995) ldquoModelling the Relationship of Survivalto Longitudinal Data Measured with Error Applications to Survival and CD4 Counts inPatients with AIDSrdquo Journal of the American Statistical Association 90 27ndash37 doi10108001621459199510476485

Zhang D Chen MH Ibrahim JG Boye ME Wang P Shen W (2014) ldquoAssessing Model Fitin Joint Models of Longitudinal and Survival Data with Applications to Cancer ClinicalTrialsrdquo Statistics in Medicine 33(27) 4715ndash4733 doi101002sim6269

Zhu H Ibrahim JG Chi YY Tang N (2012) ldquoBayesian Influence Measures for JointModels for Longitudinal and Survival Datardquo Biometrics 68 954ndash964 doi101111j1541-0420201201745x

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 20: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

20 JMFit Joint Models of Longitudinal and Survival Data in SAS

A The macro JMFit

The SAS macro JMFit as well as the five submacros should be stored in a folder named jmfitThen JMFit can be accessed by including the following lines

filename jmfit directory of the file JMFitinclude jmfit(JMFitsas)JMFit(LONG= SURV= MODEL= TS= TMAXI= WEIGHT= NPIECES= PARTITION=

OPTIONS= INITIAL= OUTPUT=)

Inputs for JMFit

LONG Data set with the first three columns SID (subject ID) Y (longitudinal measure) A(time at which Y was taken) and additional columns for covariates (XL1-XLp) whereSID Y and A should be arranged in the first second and third columns and XL1middot middot middot XLp should be placed after column 3 which can be enforced in SAS by usingthe retain command Note that XL1 XLp can be time-dependent or baselinecovariates Required

SURV Data set with the first three columns SID survival time (T) censoring indicator(delta) (1 = death and 0 = censored) and additional columns for covariates (XS1-XSq)where SID T and delta should be arranged in the first second and third columns andXS1 XSq should be placed after column 3 Required

MODEL Model specification Required One of

1 SPM1L Shared Parameter Model 1 with Linear trajectory2 SPM1Q Shared Parameter Model 1 with Quadratic trajectory3 SPM2L Shared Parameter Model 2 with Linear trajectory4 SPM2Q Shared Parameter Model 2 with Quadratic trajectory5 TVC Time-Varying Covariates Model

TS Indicates whether to implement the model specified in the MODEL argument or the cor-responding two-stage model If 0 (default) the model specified in the MODEL argumentwill be fit If 1 the corresponding two-stage model will be fit instead It only works forSPM1L SPM1Q SPM2L and SPM2Q

TMAXI tmaxi adjustment to the model specified in the MODEL argument If 0 (default) notmaxi adjustment will be applied If 1 the trajectory will become flat after tlowastmaxi =tmaxi+ WEIGHTtimesmax(ti minus tmaxi 0) If 2 starting at tlowastmaxi the trajectory will linearlygo down to 0 at the last follow-up survival time It only works for SPM1L and SPM1Q

WEIGHT The proportion (isin [0 1]) of max(ti minus tmaxi 0) If 0 (default) the starting point ofthe modified extrapolation of the trajectory is tmaxi If 1 the trajectory extends to tiwith no tmaxi adjustment It only works when TMAXI = 1 or TMAXI = 2

NPIECES Number of intervals J (ge 1) for the piecewise constant baseline hazard functionRequired

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 21: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 21

PARTITION Algorithm for constructing the partition of the time axis Required One of

(i) 1 Equally-Spaced Quantile Partition (ESQP)(ii) 2 Left Bi-Sectional Quantile Partition (LBSQP)(iii) 3 Middle Bi-Sectional Quantile Partition (MBSQP)(iv) 4 Right Bi-Sectional Quantile Partition (RBSQP)

OPTIONS Allows users to specify options that are available in the PROC NLMIXED statementFor example OPTIONS = str(QPOINTS = 5 TECH = CONGRA ABSGCONV = 00001)specifies Gaussian quadrature with five quadrature points for approximating the in-tegral of the likelihood over the random effects the conjugate-gradient optimizationand an absolute gradient convergence criterion of 00001

INITIAL Allows users to set their own initial values JMFit will automatically generate thestarting values for the model parameters and these initial values will be stored in thedata set _initial Since the order of the parameters is very important when calculatingAICLong and BICLong users are recommended to change the initial values in _initialand then rename _initial

OUTPUT Name of the output rich text file (RTF) One can also specify the directory in whichthe file will be put For example the output file named ldquomyoutputrdquo will be storedin Cmyoutput by JMFit( OUTPUT = Cmyoutput) If OUTPUT is notspecified the file will be indexed by the name of Y from LONG and the modelrsquos name

Note 1 (i) The name of the SID variable in LONG should be the same as that of the SIDvariable in SURV (ii) A and T should be in the same unit of time (month preferred)(iii) the categorical covariates must be coded as dummy variables and (iv) the SASmacro allows for any numbers of covariates for both components of the joint model andthe covariates for the longitudinal component can be totally different from those for thesurvival component

Note 2 (i) ldquoERROR Not enough memory to generate coderdquo This might occur if J is toobig (ii) a too long path for the OUTPUT may lead to an error (iii) the macro is assumingoptions validvarname = v7 for valid variable names that can be processed in SASand (iv) the calculations of AICLong and BICLong require the IML Procedure

Note 3 No missing values are allowed in both data sets

Output for JMFit

The macro automatically produces an RTF file indexed by the name of Y from LONG andthe modelrsquos name The RTF file includes five tables (i) Number of Subjects (ii) Fit Statis-tics (iii) Survival Parameter Estimates (Survival Alone) (iv) Parameter Estimates and (v)Hazard Ratios amp λ Estimates

Note The construction of the Parameter Estimates table is different for each model ForSPM1L SPM1Q SPM2L and SPM2Q it consists of three subtables ldquoCovarianceParameter Estimatesrdquo ldquoLongitudinal Parameter Estimatesrdquo and ldquoSurvival Parameter

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 22: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

22 JMFit Joint Models of Longitudinal and Survival Data in SAS

Estimatesrdquo for the TS model corresponding to SPM1L SPM1Q SPM2L or SPM2Qthere are two tables ldquoLongitudinal Parameter Estimates (Stage I)rdquo and ldquoSurvival Pa-rameter Estimates (Stage II)rdquo and for the TVC model there is only one table namedldquoParameter Estimatesrdquo

B Output for the simulated data DLong + DSurv under SPM2L

Number of Subjects

Subjects_in_long 400

Subjects_in_surv 400

Subjects_Used 400

Figure 9 The number of subjects for DLong + DSurv

Survival Parameter Estimates (Survival Alone)

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

therapy -03373 01081 400 -312 00019 (-0550 -0125) 0002926

race -000214 01811 400 -001 09906 (-0358 0354) 0001148

gender 005360 01272 400 042 06737 (-0196 0304) -000402

age 004130 01078 400 038 07018 (-0171 0253) 0005914

karnofsky -02666 01116 400 -239 00173 (-0486 -0047) -000098

stage -02204 01088 400 -203 00435 (-0434 -0006) -000063

bf -003879 01262 400 -031 07587 (-0287 0209) 0003399

log λ1 -15883 02502 400 -635 lt0001 (-2080 -1096) 0000383

Figure 10 The estimates of the parameters obtained by fitting the survival data alone forDLong + DSurv

Hazard Ratios amp λ Estimates

Parameter Estimate 95 CI

HR_therapy 07020 (0563 0875)

HR_race 10552 (0732 1521)

HR_gender 09778 (0756 1264)

HR_age 11057 (0889 1375)

HR_karnofsky 07499 (0598 0940)

HR_stage 08237 (0661 1026)

HR_bf 09328 (0722 1205)

λ1 01809 (0109 0301)

HR_β1 12041 (1046 1387)

HR_β2 36966 (2312 5910)

Figure 11 The hazard ratios and λ estimates for DLong + DSurv under SPM2L

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 23: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

Journal of Statistical Software 23

Parameter Estimates

ParamVar Estimate SE DF T-Value P-Value 95 CI Gradient

Covariance Parameter Estimates

Ω00 06819 005753 398 1185 lt0001 (0569 0795) 0005795

Ω10 -005231 001440 398 -363 00003 (-0081 -0024) 0023612

Ω11 006822 0006320 398 1079 lt0001 (0056 0081) -00075

σ 05252 0008301 398 6326 lt0001 (0509 0542) 0000529

Longitudinal Parameter Estimates

Intercept 06852 01916 398 358 00004 (0309 1062) -00101

longt 003229 001491 398 217 00309 (0003 0062) -003416

therapy -001673 008539 398 -020 08447 (-0185 0151) -000516

race -01504 01449 398 -104 03002 (-0435 0135) -001036

gender -01901 01019 398 -187 00628 (-0390 0010) -000528

age -01539 008503 398 -181 00711 (-0321 0013) -002083

karnofsky -05958 008773 398 -679 lt0001 (-0768 -0423) -000274

stage -02010 008602 398 -234 00199 (-0370 -0032) 0010376

bf -007759 009840 398 -079 04309 (-0271 0116) -000665

Survival Parameter Estimates

therapy -03539 01118 398 -316 00017 (-0574 -0134) -001536

race 005376 01859 398 029 07726 (-0312 0419) 0000795

gender -002245 01307 398 -017 08637 (-0279 0234) 0008286

age 01005 01109 398 091 03656 (-0118 0319) -000213

karnofsky -02878 01150 398 -250 00128 (-0514 -0062) 0010698

stage -01940 01117 398 -174 00831 (-0414 0026) -000047

bf -006953 01302 398 -053 05937 (-0326 0186) 0001858

log λ1 -17098 02590 398 -660 lt0001 (-2219 -1201) -00061

β1 01858 007175 398 259 00100 (0045 0327) 0003564

β2 13074 02387 398 548 lt0001 (0838 1777) 0003804

Figure 12 The estimates of the parameters for DLong + DSurv under SPM2L

AffiliationDanjie ZhangGilead Sciences Inc333 Lakeside DriveFoster City CA 94404 United States of AmericaE-mail danjiezhanggileadcom

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong
Page 24: JMFit: A SAS MacroforJointModelsof LongitudinalandSurvivalData · JSS JournalofStatisticalSoftware July 2016, Volume 71, Issue 3.doi:10.18637/jss.v071.i03 JMFit: ASAS MacroforJointModelsof

24 JMFit Joint Models of Longitudinal and Survival Data in SAS

Ming-Hui ChenDepartment of StatisticsUniversity of Connecticut215 Glenbrook Road U-4120Storrs CT 06269 United States of AmericaE-mail ming-huichenuconnedu

Joseph G IbrahimDepartment of BiostatisticsUniversity of North CarolinaMcGavran Greenberg Hall CB7420Chapel Hill NC 27599 United States of AmericaE-mail ibrahimbiosuncedu

Mark E Boye Wei ShenEli Lilly and CompanyLilly Corporate CenterIndianapolis IN 46285 United States of AmericaE-mail boyemalillycom shen_wei_x1lillycom

Journal of Statistical Software httpwwwjstatsoftorgpublished by the Foundation for Open Access Statistics httpwwwfoastatorg

July 2016 Volume 71 Issue 3 Submitted 2013-05-01doi1018637jssv071i03 Accepted 2015-02-23

  • Introduction
  • The models and model assessment
    • The joint models
    • The construction of the piecewise constant baseline hazard function
      • Algorithm 1 Equally-spaced quantile partition (ESQP)
      • Algorithm 2 Left bi-sectional quantile partition (LBSQP)
      • Algorithm 3 Middle bi-sectional quantile partition (MBSQP)
      • Algorithm 4 Right bi-sectional quantile partition (RBSQP)
        • The joint likelihood
        • AIC and BIC decomposition and Delta AIC and Delta BIC
          • The SAS macro JMFit
            • Design
            • Implementation details
              • Examples
                • The IBCSG data
                • A simulated data example
                  • Concluding remarks
                  • The macro JMFit
                  • Output for the simulated data DLong