Development and application of
methodology for the parametric analysis
of complex survival and joint
longitudinal-survival data in biomedical
research
Thesis submitted for the degree of
Doctor of Philosophy
At the University of Leicester.
by
Michael James Crowther MMath MSc
Department of Health Sciences
University of Leicester.
1st November 2014.
Abstract
Development and application of methodology for the parametric analysis of complex
survival and joint longitudinal-survival data in biomedical research
M. J. Crowther
The occurrence of survival, or time-to-event, data is commonplace in medical research,
where interest lies in the time it takes from a given baseline for an event of interest
to occur, and the factors that are associated with it. For example, this could be the
effect of a treatment on the time to death since diagnosis of cardiovascular disease. The
primary aim of this thesis is to develop parametric methods for the analysis of complex
survival data, including the extension to joint models of longitudinal and survival data,
to provide a number of advantages over the commonly used semi-parametric Cox model.
New and current methodology is often assessed using simulation studies; however, in the
field of survival analysis these are often simplistic and fail to reflect biologically
plausible scenarios. In this thesis a general algorithm for simulating complex survival
data, from any given hazard function, is proposed and assessed. A general framework for
the parametric analysis of survival data is then developed, utilising numerical quadrature,
illustrated in detail using the special case of restricted cubic splines to model the
baseline hazard and time-dependent effects. Extensions to the framework including cluster
robust standard errors and excess mortality models are also considered. Finally, the
joint longitudinal-survival modelling framework is extended to incorporate the Royston-
Parmar survival model, and a mixture of two parametric distributions, both evaluated
through simulation, utilising the proposed simulation algorithm, showing advantages
over more simple parametric approaches. The estimation of joint models, using Gaussian
quadrature, is also evaluated through an extensive simulation study. Throughout
the thesis, user friendly software is developed to implement the methodological
components, allowing statisticians and non-statisticians alike to apply the methods
directly. A variety of clinical datasets in the areas of cancer, cardiovascular disease
and liver cirrhosis are used to exemplify the proposals.
Acknowledgements
I would first like to express my gratitude to my two main supervisors, Professor
Keith Abrams and Professor Paul Lambert, both of whom have been not only excellent
supervisors and mentors, but also friends. I would also like to thank Professor Harry
Hemingway for acting as my third supervisor, who I look forward to working with more
in the future.
Thanks also go to other members of the Department of Health Sciences for their
input and expertise, including fellow PhD students and junior researchers. In particular,
I’d like to thank Dr. Mark Rutherford for taking the time to read through a draft of this
thesis. Thanks also to my friends and colleagues at Karolinska Institutet, in particular
Dr. Therese Andersson and Dr. Sandra Eloranta.
I would like to thank all the employees of StataCorp who made my internship,
conducted during my PhD studies, such an enjoyable and invaluable experience. In
particular, I would like to thank Yulia Marchenko, Jeff Pitblado, Alan Riley and Vince
Wiggins. I would also like to thank all the users of my Stata commands who have
emailed over the years with questions, queries, bug reports and suggestions, that have
led to many improvements and made the experience so worthwhile.
I would also like to thank Dr. James Cook and Rickard Henrikson, purely for
keeping me sane throughout my PhD, predominantly through collaborative beer con-
sumption.
Finally, I would like to thank my parents, my sister, and my Grandpa, and the rest
of my friends, for their support over the past few years.
This PhD thesis has been prepared as part of a Doctoral Research Fellowship funded
by the National Institute for Health Research (NIHR) (DRF-2012-05-409).
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . II
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VI
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VIII
Chapter 1. Introduction . . . 1
1.1. Aims of the thesis . . . 1
1.2. Survival analysis . . . 2
1.3. Joint models for the analysis of longitudinal and survival data . . . 2
1.4. Layout of the thesis . . . 4
Chapter 2. Survival analysis . . . 6
2.1. Chapter outline . . . 6
2.2. Introduction . . . 6
2.3. Censoring . . . 6
2.4. Fundamental relationships in survival analysis . . . 8
2.5. Delayed entry (left truncation) . . . 9
2.6. Parametric survival analysis . . . 10
2.7. Cox proportional hazards model . . . 13
2.8. Royston-Parmar flexible parametric survival model . . . 14
2.9. Why a parametric approach? . . . 18
2.10. Maximum likelihood estimation . . . 20
2.11. Discussion . . . 23
Chapter 3. Clinical motivating examples . . . 24
3.1. Chapter outline . . . 24
3.2. German breast cancer dataset . . . 24
3.3. Breast cancer in England and Wales . . . 28
3.4. Bladder cancer recurrence . . . 31
3.5. Trial in prednisone treatment of patients with liver cirrhosis . . . 32
3.6. Cohort of obese patients with type 2 diabetes mellitus from the General Practice Research Database . . . 35
3.7. Trials of D-penicillamine treatment for patients with primary biliary cirrhosis . . . 38
3.8. Discussion . . . 41
Chapter 4. Simulation of complex survival data . . . 42
4.1. Chapter outline . . . 42
4.2. Introduction . . . 42
4.3. Motivating dataset - German breast cancer trial . . . 45
4.4. Simulating survival data . . . 47
4.5. A general framework for simulation of survival data . . . 48
4.6. Simulating from a complex baseline hazard function . . . 50
4.7. Simulating time-dependent effects . . . 57
4.8. Simulating time-varying covariates . . . 61
4.9. Simulating a censoring distribution . . . 64
4.10. survsim . . . 65
4.11. Discussion . . . 65
Chapter 5. A general framework for parametric survival analysis . . . 68
5.1. Chapter outline . . . 68
5.2. Introduction . . . 68
5.3. A general framework for the parametric analysis of survival data . . . 71
5.4. Time-dependent effects . . . 73
5.5. Excess mortality models . . . 73
5.6. Cluster robust standard errors . . . 75
5.7. Prediction . . . 76
5.8. Implementation in Stata: The stgenreg package . . . 77
5.9. Analysis of example datasets using stgenreg . . . 80
5.10. Improving the estimation procedure when using restricted cubic splines . . . 91
5.11. Further analysis of example datasets using strcs . . . 97
5.12. Discussion . . . 101
Chapter 6. Joint modelling of longitudinal and survival data . . . 104
6.1. Chapter outline . . . 104
6.2. Introduction . . . 104
6.3. Survival analysis with time-varying covariates . . . 105
6.4. Formulating the standard joint model . . . 108
6.5. Alternative association structures . . . 112
6.6. Estimation . . . 115
6.7. Prediction . . . 123
6.8. Discussion . . . 128
Chapter 7. Flexible parametric joint modelling of longitudinal and survival data . . . 130
7.1. Chapter outline . . . 130
7.2. Introduction . . . 130
7.3. Defining the joint model . . . 132
7.4. Simulation study . . . 134
7.5. Analysis of liver cirrhosis dataset . . . 138
7.6. Discussion . . . 143
Chapter 8. Joint modelling of longitudinal and survival data using finite mixture models . . . 152
8.1. Chapter outline . . . 152
8.2. Introduction . . . 152
8.3. Methods . . . 153
8.4. Simulation study . . . 158
8.5. Application to liver cirrhosis data . . . 160
8.6. Discussion . . . 162
Chapter 9. Adjusting for measurement error in baseline prognostic biomarkers: A joint modelling approach . . . 165
9.1. Chapter outline . . . 165
9.2. Introduction . . . 165
9.3. Joint model formulation . . . 167
9.4. Simulation study . . . 169
9.5. Analysis of GPRD cohort . . . 171
9.6. Discussion . . . 177
Chapter 10. Software for the joint modelling of longitudinal and survival data . . . 181
10.1. Chapter outline . . . 181
10.2. Introduction . . . 181
10.3. Data setup . . . 182
10.4. Model fitting . . . 184
10.5. Association structures . . . 187
10.6. Estimation . . . 188
10.7. Joint modelling of longitudinally measured serum bilirubin and time to death . . . 189
10.8. Post-estimation . . . 192
10.9. Discussion . . . 192
Chapter 11. Discussion . . . 194
11.1. Chapter outline . . . 194
11.2. Introduction . . . 194
11.3. Summary of the thesis . . . 194
11.4. Limitations . . . 197
11.5. Future work . . . 200
11.6. Conclusion . . . 201
Appendix A. Research Papers 1 and 2 - The Stata Journal and Statistics in Medicine . . . 203
Appendix B. Research Papers 3 and 4 - Journal of Statistical Software and Statistics in Medicine . . . 204
Appendix C. Research Paper 5 - Statistics in Medicine . . . 222
Appendix D. Research Paper 6 - Statistics in Medicine . . . 223
Appendix E. Research Paper 7 - BMC Medical Research Methodology . . . 224
Appendix F. Research Paper 8 - The Stata Journal . . . 233
References . . . 234
List of Tables
3.1 German breast cancer trial. Results from a Cox proportional hazards model, investigating effect of hormonal therapy on survival. . . . 25
3.2 German breast cancer trial. Results from a Royston-Parmar proportional hazards model with 4 degrees of freedom, investigating effect of hormonal therapy on survival. . . . 26
3.3 England and Wales breast cancer dataset. Results from a Cox proportional hazards model, investigating effect of deprivation status on survival. . . . 29
3.4 England and Wales breast cancer dataset. Results from a Royston-Parmar proportional hazards model, investigating effect of deprivation status on survival. . . . 30
3.5 Bladder cancer recurrence dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer. . . . 31
4.1 German breast cancer trial. Estimated treatment effects from Weibull, Royston-Parmar and Cox models. . . . 45
4.2 Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from both Weibull and mixture Weibull models. . . . 54
4.3 Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from Weibull and Royston-Parmar models. . . . 58
4.4 Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from proportional hazards Weibull and Royston-Parmar models, and non-proportional hazards Royston-Parmar model. . . . 62
5.1 Description of each component that can be included in the linear predictor of a parameter included in a stgenreg model. . . . 79
5.2 England and Wales breast cancer dataset. Results from optimised Weibull and general parametric models with varying number of nodes. . . . 82
5.3 Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure. . . . 90
5.4 Liver cirrhosis trial of prednisone treatment. Results from spline based proportional hazards model with time-varying prothrombin index. . . . 90
5.5 England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the fully numeric approach. . . . 94
5.6 England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the combined analytical/numeric approach. . . . 95
5.7 England and Wales breast cancer dataset. Results from combined analytic/numerical spline based survival model. . . . 96
5.8 England and Wales breast cancer dataset. Comparison of excess hazard ratios (and 95% confidence intervals) from models with the linear predictor on the log hazard scale and the log cumulative hazard scale. Both models have 6 knots with these placed evenly according to the distribution of log death times. . . . 98
5.9 Bladder cancer dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer. . . . 100
5.10 Bladder cancer dataset. Results from spline based and Cox models with cluster robust standard errors. . . . 101
6.1 Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure. . . . 107
7.1 Simulation results from Weibull scenario 1. Association is varied with α = −0.25, 0.25. . . . 148
7.2 Simulation results from mixture-Weibull scenario 2. Association is varied with α = −0.25, 0.25. . . . 149
7.3 Simulation results from Weibull scenario 3. Association is varied with α = −0.25, 0.25. . . . 150
7.4 Liver cirrhosis trial of prednisone treatment. Results from applying the RP restricted cubic spline and B-spline joint models. . . . 151
7.5 Liver cirrhosis trial of prednisone treatment. Results from joint models with varying knot locations. . . . 151
8.1 Estimates of bias and coverage from the simulation study. . . . 160
8.2 Liver cirrhosis trial of prednisone treatment. Parameter estimates and 95% Confidence Intervals (CI) for the Weibull, mixture-Weibull and RP joint models. . . . 164
9.1 Simulation results of bias and coverage of the association parameter, α. The standard deviation of the measurement error is σe. . . . 172
9.2 GPRD cohort of obese patients with type 2 diabetes mellitus. Results from applying a Royston-Parmar proportional hazards model adjusting for observed baseline systolic blood pressure, and a full joint model using the intercept association structure. All models are adjusted for age (years), sex (male) and BMI (kg/m2). . . . 175
9.3 GPRD cohort of obese patients with type 2 diabetes mellitus. Results from applying a joint model with random intercept and random linear slope in the longitudinal submodel, using the intercept association structure. . . . 176
List of Figures
2.1 Example hazard functions from the exponential, Weibull and Gompertz distributions. . . . 11
3.1 German breast cancer trial. Survival function is estimated using the Kaplan-Meier method, across treatment groups. . . . 25
3.2 German breast cancer trial. Hazard function estimated using the Royston-Parmar survival model. . . . 27
3.3 German breast cancer trial. Non-parametric estimates of the hazard function using the Epanechnikov and Epanechnikov 2 kernel density smoothers. Follow-up is restricted to 6 years. . . . 27
3.4 England and Wales breast cancer data, all ages, comparing least and most deprived. Survival function is estimated using the Kaplan-Meier method. . . . 29
3.5 England and Wales breast cancer dataset, all ages, comparing least and most deprived. Hazard function is estimated using a Royston-Parmar model. . . . 30
3.6 Liver cirrhosis trial investigating prednisone treatment. Survival function is estimated using the Kaplan-Meier method. . . . 32
3.7 Liver cirrhosis trial investigating prednisone treatment. Hazard function estimated using a Royston-Parmar model. . . . 33
3.8 Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid. . . . 34
3.9 Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid. . . . 34
3.10 GPRD cohort of obese patients with type 2 diabetes mellitus. Survival function is estimated using the Kaplan-Meier method. . . . 36
3.11 GPRD cohort of obese patients with type 2 diabetes mellitus. Hazard function is estimated using a Royston-Parmar model. . . . 36
3.12 GPRD cohort of obese patients with type 2 diabetes mellitus. Box plots of baseline SBP for patients who were censored or who suffered a stroke. . . . 37
3.13 GPRD cohort of obese patients with type 2 diabetes mellitus. Distribution of number of observed measurements of systolic blood pressure per patient. . . . 37
3.14 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Survival function is estimated using the Kaplan-Meier method. . . . 38
3.15 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Hazard function is estimated using a nonparametric kernel smoother. . . . 39
3.16 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) index for patients who were censored/died. Lowess smoother overlaid. . . . 40
3.17 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) index for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid. . . . 40
4.1 German breast cancer trial. Predicted survival from Weibull and Royston-Parmar models overlaid on the Kaplan-Meier curves. . . . 46
4.2 German breast cancer trial. Predicted hazard functions from Weibull and Royston-Parmar survival models. . . . 46
4.3 Schematic flow diagram of simulation techniques, beginning at the top-left. . . . 49
4.4 Example two-component mixture Weibull baseline hazard functions. . . . 52
4.5 Example fractional polynomial baseline hazard function. . . . 55
4.6 Example time-dependent log hazard ratio representing a diminishing treatment effect. . . . 60
5.1 England and Wales breast cancer dataset. Comparison of fitted survival function and confidence intervals between analytic Weibull model and equivalent model fitted with quadrature. . . . 83
5.2 England and Wales breast cancer dataset. Predicted hazard function for the most affluent group with 95% confidence interval. . . . 87
5.3 England and Wales breast cancer dataset. Time-dependent hazard ratio for deprivation status. . . . 88
5.4 England and Wales breast cancer dataset. Kaplan-Meier estimates for the most affluent and most deprived groups, with predicted survival overlaid. The figure on the left shows predicted survival with a proportional effect of deprivation status, with the figure on the right allowing for non-proportional hazards in the effect of deprivation status. . . . 89
5.5 England and Wales breast cancer dataset. Excess hazard ratio comparing most deprived with least deprived group. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Shaded area represents the 95% confidence interval. . . . 99
5.6 England and Wales breast cancer dataset. Excess hazard ratios comparing most deprived with least deprived group. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Thicker lines are the estimated excess hazard ratios, with thinner lines associated 95% confidence intervals. . . . 99
5.7 Bladder cancer dataset. Baseline hazard rates for the four ordered events. . . . 101
6.1 Liver cirrhosis trial of prednisone treatment. Time-varying covariate within a survival analysis. . . . 108
6.2 Time-varying covariate within a joint model. . . . 110
6.3 Basis node locations for 2-dimensional 9 point simple Gauss-Hermite quadrature. . . . 120
6.4 Basis node locations for 2-dimensional 9 point simple Gauss-Hermite quadrature, transformed by Choleski decomposition of variance-covariance matrix. . . . 121
6.5 Conceptual example of adaptive Gauss-Hermite quadrature. . . . 124
7.1 Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid. . . . 138
7.2 Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's survival time. Lowess smoother overlaid. . . . 139
7.3 Liver cirrhosis trial of prednisone treatment. Marginal survival using 6 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval. . . . 141
7.4 Liver cirrhosis trial of prednisone treatment. Marginal survival using cubic B-splines with 3, 4 or 5 internal knots, resulting in 7, 8 or 9 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval. . . . 142
7.5 Liver cirrhosis trial of prednisone treatment. Predicted conditional survival probabilities for patients 98 and 253. . . . 143
7.6 Liver cirrhosis trial of prednisone treatment. Fitted marginal survival function from joint models with varying knot locations and number of internal knots. Left hand plot used 6 parameters to model the baseline log cumulative hazard function, right hand plot uses 4, 5 and 7 parameters. . . . 144
7.7 Simulated data. Fitted marginal survival function from joint models with either 1 or 5 degrees of freedom, overlaid on the Kaplan-Meier survival curve. . . . 146
8.1 Baseline hazard functions used in the simulation study. . . . 159
8.2 Liver cirrhosis trial of prednisone treatment. Fitted marginal survival functions. The left plot is from the Weibull based joint model, and the right from the mixture Weibull joint model. . . . 161
9.1 Example simulated observed longitudinal measurements with varying measurement error standard deviation. . . . 171
9.2 GPRD cohort of obese patients with type 2 diabetes mellitus. Longitudinal response measurements for SBP for 9 randomly selected patients who had at least 10 measurements. The dashed line represents the fitted longitudinal trajectories based on the joint model. . . . 174
9.3 Predicted survival from the Royston-Parmar flexible parametric model (FPM) and joint model, for a female, aged 60 years, BMI of 30, with SBP of 90, 130 or 200. . . . 176
10.1 Longitudinal profiles of log(serum bilirubin) for patients who were censored/died. Timescale is adjusted by taking away each patient's survival time. . . . 184
CHAPTER 1
Introduction
1.1. Aims of the thesis
The development of what is now known as the Cox proportional hazards model,
by Sir David Cox in 1972, is without doubt one of the most important developments
in medical and statistical research (Cox, 1972). With over 36,000 citations (Google
Scholar, February 2014) since publication, it is often the default choice for the analysis
of survival data; however, it is not without its disadvantages. The primary aim of this
thesis is to develop methodology for the parametric analysis of survival data, as an
alternative to the Cox model.
Statistical methods are often assessed using simulation studies, which generate
data under a known truth. To validate and assess the parametric methods developed
in this thesis, a further aim is to derive, in parallel, a general framework for
simulating more biologically plausible survival data, since simulation studies based
on current methods are often too simplistic. This provides a framework which can be
utilised in an extensive array of methodological areas.
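To make the idea of simulating "under a known truth" concrete, consider the simplest case: when a standard distribution is assumed, survival times can be drawn by inverting the survival function directly. For a Weibull hazard h(t) = λγt^(γ−1) with S(t) = exp(−λt^γ), setting S(t) = U for U ~ Uniform(0,1) gives t = (−log U / λ)^(1/γ). The sketch below, in Python purely for illustration (the software developed in this thesis is written in Stata), adds a proportional hazards treatment effect; the function name and all parameter values are arbitrary choices, not part of the thesis:

```python
import math
import random

def simulate_weibull(n, lam, gamma, log_hr, seed=1):
    """Simulate n survival times from a Weibull proportional hazards model:
    h(t | x) = lam * gamma * t**(gamma - 1) * exp(log_hr * x),
    where x is a binary treatment indicator. Returns (times, treatment)."""
    rng = random.Random(seed)
    times, treat = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0      # randomised 1:1 treatment arm
        rate = lam * math.exp(log_hr * x)       # PH effect scales the rate
        u = 1.0 - rng.random()                  # u in (0, 1], so log(u) is finite
        # Invert S(t) = exp(-rate * t**gamma)  =>  t = (-log(u) / rate)**(1/gamma)
        times.append((-math.log(u) / rate) ** (1.0 / gamma))
        treat.append(x)
    return times, treat

times, treat = simulate_weibull(1000, lam=0.1, gamma=1.5, log_hr=-0.5)
```

With a negative log hazard ratio the treated arm should, on average, survive longer, which provides a quick sanity check on any such simulation.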
Advanced statistical techniques, with the potential to be used in a variety of
clinical settings, are often developed and published in the methodological literature;
however, they can struggle to permeate through to applied biomedical research. A
further aim of this thesis is, in parallel to the methodological work, to create and
disseminate user friendly software which can be used by statisticians and non-
statisticians alike. The methods could be implemented in a variety of statistical
packages, such as R; however, I have chosen Stata.
Throughout the thesis, the methodology developed will be applied to real clinical
datasets in a variety of settings, such as liver cirrhosis and breast cancer, serving to
illustrate the wide variety of clinical areas that can benefit from the novel methodology,
and user friendly software.
1.2. Survival analysis
The occurrence of survival (or time-to-event) data is commonplace in medical re-
search, where interest lies in the time it takes from a given baseline, for an event of
interest to occur, and the factors that are associated with it. For example, this could be
the effect of a treatment on the time to death since diagnosis of cardiovascular disease.
The two main approaches to survival analysis are the semi-parametric approach
of Cox (1972), and fully parametric approaches assuming distributions such as the
exponential or Weibull (Collett, 2003). The Cox model does not assume
any functional form for the baseline hazard function, whereas a parametric approach
assumes a specific shape, estimated as part of the model. Both allow us to investigate
the influence that risk factors have on the rate of disease or mortality, for example.
In this thesis I concentrate on the parametric approach to survival analysis, in par-
ticular, deriving a general algorithm to simulate survival data under more biologically
plausible scenarios to better assess both methods used in practice, and novel models.
This then leads to the development of a general framework for parametric survival
analysis, motivated by wanting to incorporate greater flexibility than standard para-
metric models can provide, particularly in capturing complex baseline hazard functions
and time-dependent effects. The framework is extended to incorporate cluster robust
standard errors and relative survival, with an improved estimation routine when using
the special case of restricted cubic splines to model the baseline and time-dependent
effects, illustrated with applications in the areas of breast and bladder cancer.
1.3. Joint models for the analysis of longitudinal and survival data
It is often intrinsic in clinical practice that longitudinal outcomes, such as repeat-
edly measured systolic blood pressure, and survival data, such as time to cardiovascular
disease, are collected in parallel. Most often, these outcomes are analysed separately,
for example using linear mixed effect models to model the longitudinal response, and
as described above, the survival outcome is commonly analysed using a proportional
hazards model. When these two processes are associated, a more complex approach
to analysis is required. This has led to the field of joint modelling of longitudinal and
survival data, sometimes known as simultaneous modelling, receiving substantial at-
tention over the past 20 years, particularly within methodological statistical research.
The most common approach combines the linear mixed effect model with a propor-
tional hazards model, through shared random effects, which serve to characterise the
association between the two processes. A joint model can provide much greater in-
sights into both the longitudinal and survival outcomes, reducing bias and making
most efficient use of the data available, in an attempt to disentangle the underlying
relationships (Gould et al., 2014). Common examples include the association between
repeatedly measured prostate specific antigen (PSA) levels and recurrence of prostate
cancer (Proust-Lima and Taylor, 2009), the relationship between cluster of differenti-
ation 4 (CD4) cell counts in predicting the progression to acquired immune deficiency
syndrome (AIDS) in human immunodeficiency virus (HIV) infected patients (Wang and
Taylor, 2001), and modelling quality of life over time jointly with survival (Billingham
and Abrams, 2002). Their potential for use in clinical practice is quite substantial,
enabling prediction of the time to an event conditional upon a biomarker trajectory to
date (Rizopoulos, 2011).
Motivated by allowing more flexibility in the joint model framework, I extend the
frequentist shared parameter joint model to incorporate the Royston-Parmar (RP) flex-
ible parametric survival model, which uses restricted cubic splines on the log cumulative
hazard scale, and compare it to a previous proposal which used B-splines (Rizopoulos
et al., 2009), in application to a dataset of patients with liver cirrhosis, investigating
the association between repeated measures of prothrombin index and survival. The
proposal is further evaluated through an extensive simulation study, where I compare
adaptive and non-adaptive Gauss-Hermite quadrature, used to calculate the joint like-
lihood. I then move to the more traditional log hazard scale, incorporating a finite
mixture of parametric distributions as the survival submodel, comparing it to the RP
based joint model in application to the liver cirrhosis trial, and further investigating the
effect of misspecifying the baseline hazard function through simulation. I then describe
the novel use of a specific association structure within a joint model, linking baseline
systolic blood pressure (SBP) to the time to stroke in obese patients with type 2 dia-
betes mellitus, but utilising the often ignored repeated measures of SBP to account for
measurement error. This approach is further evaluated through simulation, to assess
the impact of measurement error. All of the joint models described in this thesis are
implemented in the extensive stjm package in Stata, written as a central component of
this thesis, which is illustrated using a further dataset of patients with primary biliary
cirrhosis, investigating the association between repeated measures of serum bilirubin
and survival.
1.4. Layout of the thesis
The thesis is organised as follows. Chapter 2 introduces the field of survival analysis,
and describes the fundamental relationships that form the basis of this thesis. Chapter
3 introduces six clinical datasets that are used throughout the thesis to motivate the
methodological developments, briefly describing the medical setting and summary sta-
tistics. Chapter 4 develops a framework for the simulation of complex survival data,
providing methods to simulate more biologically plausible data. Chapter 5 develops
a framework for the general parametric analysis of survival data, illustrated with a
variety of applications in breast cancer and liver cirrhosis. Chapter 6 introduces the
joint modelling of longitudinal and survival data, and Chapters 7 and 8 develop new
joint models to provide more flexible survival submodels to capture complex hazard
functions, incorporating the Royston-Parmar survival model and a finite mixture of
parametric survival models, respectively. Both proposals are illustrated by investigat-
ing the association between repeated measures of prothrombin index and survival in a
prednisone treatment trial of patients with liver cirrhosis. Chapter 9 describes the use
of a specific association structure within a joint model, linking baseline SBP to the time
to stroke in obese patients with type 2 diabetes mellitus. I then describe in Chapter 10
user friendly Stata software for fitting a variety of joint models. I conclude the thesis
in Chapter 11 with a discussion, outlining further work that could be undertaken.
CHAPTER 2
Survival analysis
2.1. Chapter outline
This chapter introduces the foundations of survival analysis, including commonly
used methods to analyse survival data. The framework described in this chapter will
form the basis for methodological extensions described in later chapters.
2.2. Introduction
Often in medical research, interest lies in the time it takes for an event of interest
to occur, from a given baseline. The analysis of such data is described as survival or
time-to-event analysis, where the term ‘failure’ is used to describe the occurrence of
the event of interest, and the ‘survival time’ is the time taken for the event of interest
to occur (Collett, 2003). Common examples of this include the time to death in a ran-
domised controlled trial of patients with cardiovascular disease, with the time origin
being the randomisation of treatment, or time to progression to acquired immunode-
ficiency syndrome (AIDS) in human immunodeficiency virus (HIV) positive patients,
with the time origin being diagnosis of HIV.
In the analysis of survival data, we are interested in the probability of an event and
the rate at which the event occurs, and how this rate varies between groups of patients,
for example, between males and females. However, the analysis is often complicated by
the fact that not all patients will experience the event of interest within the allocated
follow-up period. These patients are said to be censored.
2.3. Censoring
The event of interest will usually not occur, or be observed, for all subjects within
the time frame of a particular study. For example, consider a cohort of patients with
breast cancer, who enter the study at diagnosis, and the goal is to investigate the effect
of treatment A compared to treatment B on the time to death. We will not have
the time to death for all patients unless we wait for all patients to die. In the case of
breast cancer, some patients will survive over 30 years; as such, it is often not feasible to
continue a study beyond a certain time point. If patients have been under observation
for the duration of the study, but have not experienced the event of interest when the
study comes to an end, this is described as administrative censoring. Some patients
will be lost to follow-up (for example, a patient may move house and contact is lost
during the study period). When the study is concluded and the data is analysed, we
know that their unobserved time to death will be longer than their time in the study.
Censored patients cannot simply be removed from the analysis, as this would create a
missing-not-at-random process, leading to biased inferences.
The above described form of censoring is generally known as right censoring. Other
types, such as left censoring and interval censoring, can also occur. Left censoring
occurs when the event of interest happens before a patient comes under observation. For
example, consider patients followed up for recurrence of a particular cancer, following
removal of the primary tumour. If they are examined two months after the operation and
a recurrence is found, then the actual time of recurrence is less than two months, and
therefore their time to recurrence is left censored. Interval censoring occurs when the
event of interest occurs within a particular time window, but it is not known exactly
when. For example, continuing with the recurrence of cancer example, if a patient is
event free at two months, but a recurrence is found at the second visit occurring at four
months, then the actual time of recurrence is known to have occurred between two and
four months, which can be described as interval censored.
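These censoring mechanisms can be made concrete in code: under right censoring we observe the minimum of the event and censoring times, together with an indicator for which occurred. A minimal sketch (the function and variable names are my own, for illustration):

```python
# Right censoring: we observe t = min(T, C) and an event indicator
# d = 1 if the event occurred before censoring, 0 otherwise.
def apply_censoring(event_times, censoring_times):
    observed = []
    for T, C in zip(event_times, censoring_times):
        t = min(T, C)           # observed follow-up time
        d = 1 if T <= C else 0  # event indicator
        observed.append((t, d))
    return observed

# Three patients, with administrative censoring at the end of study (5 years)
data = apply_censoring([2.3, 7.1, 4.8], [5.0, 5.0, 5.0])
# -> [(2.3, 1), (5.0, 0), (4.8, 1)]
```

The second patient's true event time (7.1 years) is never observed; only the fact that it exceeds 5 years enters the analysis.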
2.3.1. Informative censoring
When patients are administratively censored, this can generally be considered to be
non-informative, i.e. the censoring process is not related to any factors which are
associated with a patient’s underlying survival time. When censoring is informative,
for example, if patients on a particular treatment being evaluated are withdrawn from
the study due to adverse effects of the particular treatment, and are subsequently
censored at this point, then standard analyses become invalid and care must be taken
when interpreting results. This can be viewed as a form of informative drop-out, which
will be considered in more detail in Chapters 6 to 10.
2.4. Fundamental relationships in survival analysis
The four fundamental functions in survival analysis are the cumulative probability
distribution function, F (t); survival function, S(t); the hazard function, h(t); and the
cumulative hazard function, H(t). Mathematically, they can all be written in terms of
one another. Let the continuous non-negative random variable T denote the observed
survival time, since entry to the study (t = 0). Suppose T has a probability distribution
with underlying probability density function f(t). The distribution function can then
be written as (Collett, 2003)
F(t) = P(T ≤ t) = ∫_0^t f(u) du,    (2.1)
which represents the probability that the survival time is less than some value t. Al-
ternatively,
S(t) = P (T > t) (2.2)
which is the proportion of the study population who have not experienced the event
of interest by time t, i.e. the probability of being event-free at time t. The hazard
function is defined as
h(t) = lim_{δt→0} P(t < T ≤ t + δt | T > t) / δt    (2.3)

     = −d log S(t) / dt    (2.4)
such that h(t) is the instantaneous failure rate at time t. I can further write
H(t) = ∫_0^t h(u) du    (2.5)

     = −log S(t)
where H(t) is the cumulative hazard function. The probability density function f(t) is
defined as,
f(t) = h(t)S(t) (2.6)
In any particular survival dataset, n patients will be observed, each with a time to event,
ti, and an event indicator, di, which takes the value of 1 if the ith patient experienced
the event, or 0 if they were censored.
Perhaps the most useful of these functions, in terms of communicating risk, are
the survival and hazard. The survival function allows us to make statements such as:
“The probability of experiencing the event within the next 5 years is A”, and the hazard
function allows us to make statements such as, “The rate of experiencing the event is B
per year.” From a patient’s perspective it is important to translate complex statistical
methods into communicable statements, easily understood by patient and clinician,
alike. Furthermore, these functions are required in the definition of the log-likelihood,
and subsequently required for estimation, which will be covered in Section 2.10.
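The relationships above can be verified numerically for any distribution with closed-form functions; a quick sketch using the exponential distribution (the parameter value is arbitrary):

```python
import math

# Exponential distribution with rate lam: check the fundamental
# identities f(t) = h(t)S(t) and H(t) = -log S(t).
lam = 0.3

def S(t): return math.exp(-lam * t)        # survival function
def h(t): return lam                       # constant hazard
def f(t): return lam * math.exp(-lam * t)  # density
def H(t): return lam * t                   # cumulative hazard

t = 2.0
assert abs(f(t) - h(t) * S(t)) < 1e-12     # f = h * S, Equation (2.6)
assert abs(H(t) + math.log(S(t))) < 1e-12  # H = -log S, Equation (2.5)
```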
2.5. Delayed entry (left truncation)
A patient does not always become at risk at time 0. Often in epidemiological
studies, age is used as the timescale, as it can be considered a better way of controlling
for age, compared to adjusting for it at baseline (Cheung et al., 2003; Thiebaut and
Benichou, 2004). Age at diagnosis, say, is then taken as the time at which a patient
becomes at risk of the event. Within a survival analysis, we must therefore condition
upon survival up to the entry time, and therefore the probability of surviving up to
time t, conditional on surviving up to time t0, can be written as
P(T > t | T > t_0) = S(t) / S(t_0)    (2.7)
which is the survival function evaluated at time t, divided by the survival function
evaluated at the entry time, t0. This must be accounted for when estimating a model,
which will be described further in Sections 2.6.2 and 2.10.
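Equation (2.7) is straightforward to evaluate once a survival function is specified; a sketch using a Weibull survival function (the parameter values are illustrative, not from any dataset in the thesis):

```python
import math

# Weibull survival S(t) = exp(-lam * t**gam); the conditional survival
# given entry at t0 is S(t) / S(t0), as in Equation (2.7).
lam, gam = 0.1, 1.2

def S(t):
    return math.exp(-lam * t ** gam)

def conditional_survival(t, t0):
    return S(t) / S(t0)

# Probability of surviving to t = 10, conditional on surviving to t0 = 5
p = conditional_survival(10.0, 5.0)
assert 0.0 < p < 1.0
assert conditional_survival(5.0, 5.0) == 1.0  # no extra time, probability 1
```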
2.6. Parametric survival analysis
To gain a greater insight into the observed data, we are generally interested in
fitting a statistical model. In survival analysis, this is usually conducted on the (log)
hazard scale, with covariates included in a linear predictor which is additive on the log
hazard scale. This then allows us to make statements such as by how much does being
exposed to a particular factor X increase/decrease the hazard rate, in relative terms.
It is also possible to model on alternative scales, such as the (log) cumulative hazard
scale, which I will return to in Section 2.8.2.
2.6.1. Exponential, Weibull and Gompertz distributions
By making assumptions about the shape of the hazard function we can begin to model
our time to event data.
2.6.1.1. Exponential distribution
The simplest assumption we can make is that the hazard is constant over time, and
therefore

h(t) = λ,    S(t) = exp(−λt),    f(t) = λ exp(−λt)
which gives f(t), the probability density function of the exponential distribution. There-
fore, if we assume that the hazard rate is constant we are assuming that the survival
times follow an exponential distribution.
2.6.1.2. Weibull distribution
A more flexible choice is the Weibull distribution, which assumes a hazard function
of the form
h(t) = λγt^{γ−1},

with

S(t) = exp(−λt^γ),    f(t) = λγt^{γ−1} exp(−λt^γ)
By assuming survival times come from a Weibull distribution, we incorporate more
flexibility for the hazard function, which can now be monotonically increasing or de-
creasing. It should be noted that if γ = 1, then the Weibull model reduces to the
exponential.
2.6.1.3. Gompertz distribution
An alternative parametric distribution is the Gompertz, which has been used exten-
sively to model mortality data. I define the hazard function
h(t) = λ exp(γt)
with corresponding survival and probability density functions,
S(t) = exp{−λγ^{−1}(e^{γt} − 1)},    f(t) = λ exp{γt − λγ^{−1}(e^{γt} − 1)}
This distribution is suitable for modelling hazard rates which increase or decrease ex-
ponentially with time.
I illustrate some of the shapes that the three above distributions can capture, in
terms of the hazard function, in Figure 2.1.
[Figure 2.1: hazard function (y-axis, 0.05 to 0.25) plotted against follow-up time
(x-axis, 0 to 4) for the exponential (λ = 0.1), Weibull (λ = 0.1, γ = 1.2 and γ = 0.8)
and Gompertz (λ = 0.1, γ = 0.2 and γ = −0.2) distributions.]

Figure 2.1. Example hazard functions from the exponential, Weibull and Gompertz
distributions.
Note from Figure 2.1, that the hazard functions which can be modelled using these
three distributions are either constant or monotonically increasing or decreasing. The
ability to incorporate a more flexible, but still parametric, function will be returned to
in Section 2.8.2 and throughout the thesis.
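The hazard shapes in Figure 2.1 can be reproduced directly from the three definitions above; a minimal sketch (parameter values chosen to match the figure):

```python
import math

# Hazard functions for the three standard parametric distributions
def h_exponential(t, lam):
    return lam  # constant in t

def h_weibull(t, lam, gam):
    return lam * gam * t ** (gam - 1)  # monotone in t

def h_gompertz(t, lam, gam):
    return lam * math.exp(gam * t)  # exponentially increasing/decreasing

# Weibull with gam > 1 is increasing, gam < 1 decreasing
assert h_weibull(2.0, 0.1, 1.2) > h_weibull(1.0, 0.1, 1.2)
assert h_weibull(2.0, 0.1, 0.8) < h_weibull(1.0, 0.1, 0.8)
# Weibull with gam = 1 reduces to the exponential
assert abs(h_weibull(2.0, 0.1, 1.0) - h_exponential(2.0, 0.1)) < 1e-12
```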
2.6.2. Log-likelihood function for a parametric model
The log-likelihood contribution of the ith patient, for a parametric survival model,
allowing for right-censoring, can be written as
log L_i = log{ f(t_i)^{d_i} S(t_i)^{1−d_i} }

        = d_i log[f(t_i)] + (1 − d_i) log[S(t_i)]    (2.8)
where ti and di are the event/censoring time and event indicator for the ith patient,
respectively. Alternatively, using Equation (2.6) I can write
log L_i = log{ h(t_i)^{d_i} S(t_i) }

        = d_i log[h(t_i)] + log[S(t_i)]    (2.9)
Equations (2.8) and (2.9) assume a patient becomes at risk at time 0. As described in
Section 2.5, this may not always be the case; a patient may not be at risk of the
event until some time after time 0. For example, this can occur when
age is used as the timescale, and a patient is not at risk until the age of diagnosis. To
incorporate delayed entry, Equation (2.9) is extended to
log L_i = d_i log[h(t_i)] + log[S(t_i)] − log[S(t_{0i})]    (2.10)
where t0i is the time at which the ith patient becomes at risk. The total likelihood is
therefore
L = ∏_{i=1}^{n} L_i

or alternatively the total log-likelihood is

log L = ∑_{i=1}^{n} log L_i
Maximising a log-likelihood function such as Equation (2.10) will be covered in Section
2.10.
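As an illustration of Equation (2.9), the code below evaluates the total Weibull log-likelihood under right censoring for a small invented dataset; a sketch only, not the estimation routine used in the thesis:

```python
import math

# Total Weibull log-likelihood under right censoring, Equation (2.9):
# log L_i = d_i * log h(t_i) + log S(t_i)
def weibull_loglik(lam, gam, times, events):
    total = 0.0
    for t, d in zip(times, events):
        log_h = math.log(lam * gam) + (gam - 1) * math.log(t)  # log hazard
        log_S = -lam * t ** gam                                # log survival
        total += d * log_h + log_S
    return total

times = [1.2, 3.4, 2.0, 5.0]
events = [1, 1, 0, 0]  # last two patients censored
ll = weibull_loglik(0.1, 1.2, times, events)
assert ll < 0
```

For an event (d_i = 1) the contribution matches the density form of Equation (2.8), since log f(t_i) = log h(t_i) + log S(t_i); a censored patient contributes only log S(t_i).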
2.7. Cox proportional hazards model
Interest is often in comparing between groups, for example, a clinical trial may
investigate how the hazard rate in a group of patients randomised to standard therapy,
compares to the hazard rate in a group of patients randomised to receive a new therapy.
To illustrate this, consider a binary covariate, X, with X = 0 representing standard
therapy, and X = 1 representing a new therapy. If I let h0(t) represent the hazard
function for a patient on standard therapy, I can write the hazard function for the ith
patient as
h_i(t) = h_0(t) exp(X_i β)    (2.11)
which is known as the proportional hazards model, with h0(t) known as the baseline
hazard function (i.e. the hazard function when all covariates are equal to zero), and
therefore
h_i(t) = h_0(t) exp(β)   if X_i = 1

h_i(t) = h_0(t)          if X_i = 0
To compare the two groups, I can calculate how much higher (or lower) the hazard rate
is in the new therapy group (X = 1), compared to standard therapy (X = 0)
HR = [h_0(t) exp(β)] / h_0(t) = exp(β)
which is known as the Hazard Ratio (HR), and equivalently, β is known as the log
hazard ratio. This of course generalises to a vector of covariates, Xi, and associated
vector of log hazard ratios, β.
Cox (1972) derived a partial likelihood function for the ith patient for a proportional
hazards model (assuming no ties) as
L_i(β) = [ exp(X_i β) / ∑_{l∈R(t_i)} exp(X_l β) ]^{d_i}    (2.12)
where R(ti) is the set of individuals who are at risk at time ti. The corresponding
log-likelihood function is written as
log L_i(β) = d_i [ X_i β − log ∑_{l∈R(t_i)} exp(X_l β) ]    (2.13)
Unlike the parametric models described in Section 2.6, this formulation does not require
specification of h0(t), the baseline hazard function, and therefore the distribution of T
does not have to be specified. Equation (2.12) is therefore not a full likelihood, but
known as a partial likelihood (Cox, 1975), as it does not make use of the observed
survival or censoring times. It can be maximised using standard procedures such as
Newton-Raphson, described later in Section 2.10.
The key assumption of Equation (2.11) is that of proportional hazards, in that
covariate effects remain constant over the entire follow-up period. Various diagnostics have
been proposed in the literature to assess the proportional hazards assumption (Gramb-
sch and Therneau, 1994), including graphical approaches such as log(− log(S(t))) plots,
and those based on residuals (Schoenfeld, 1982). Alternatively, the presence of non-
proportional hazards can be evaluated directly by including an interaction of the co-
variate of interest and a function of time (Sauerbrei et al., 2007). The simulation and
modelling of non-proportional hazards will be considered further in Chapters 4 and 5.
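The partial log-likelihood of Equation (2.13) can be computed directly for a small dataset; the sketch below assumes no tied event times, no delayed entry, and a single binary covariate (the data are invented):

```python
import math

# Cox partial log-likelihood for untied event times, Equation (2.13):
# each event contributes X_i*beta - log(sum over risk set of exp(X_l*beta)).
def cox_partial_loglik(beta, times, events, x):
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored observations contribute no event term
        # risk set R(t_i): everyone still under observation at t_i
        risk_sum = sum(math.exp(x[l] * beta)
                       for l in range(len(times)) if times[l] >= t_i)
        total += x[i] * beta - math.log(risk_sum)
    return total

times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
x = [0, 1, 0, 1]  # binary treatment indicator
# At beta = 0 each event contributes -log(risk set size): sets of 4, 3, 1
assert abs(cox_partial_loglik(0.0, times, events, x) + math.log(12)) < 1e-12
```

Note that no baseline hazard appears anywhere in the computation, reflecting the semi-parametric nature of the model.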
2.8. Royston-Parmar flexible parametric survival model
In the analysis of clinical trial data or population-based studies, the phenomenon
of a turning point in the underlying hazard function is often observed (Murtagh et al.,
1994; Lambert and Royston, 2009; Eloranta et al., 2012). For example, in cancer trials,
the mortality rate tends to peak soon after the beginning of the trial, before tailing off.
Although hazard ratios can generally be quite insensitive to a poorly specified baseline
hazard function (Rutherford et al., 2014), when interest lies in measures of absolute risk
it is vital to accurately capture the baseline (Royston and Lambert, 2011). Use of more
flexible, yet parametric, models is increasing in medical research as the benefits of a
parametric approach, such as the reporting of measures of absolute risk, prediction and
extrapolation, become recognised (King et al., 2012). One such class of survival models
is the Royston-Parmar flexible parametric survival model (Royston and Parmar, 2002).
This model framework relies on the use of restricted cubic splines to model the baseline
on a variety of scales.
2.8.1. Restricted cubic splines
The use of splines, piecewise smooth polynomials, to model nonlinear relationships in
statistics is commonplace (Durrleman and Simon, 1989). They provide an extremely
flexible framework to capture complex shapes in patterns of data. They are continuous
in nature, and have constraints imposed that ensure they are continuous at the places
where the piecewise polynomials join, called knots. Here I concentrate on restricted
cubic splines, as they will be used extensively in later chapters.
I begin by defining a cubic spline function, s_3(x), the cubic being the most commonly
used degree

s_3(x) = ∑_{j=0}^{3} β_{0j} x^j + ∑_{i=1}^{K} β_{i3} (x − k_i)_+^3    (2.14)

with

(u)_+ = u if u > 0, and (u)_+ = 0 if u ≤ 0
This can be extended to allow for a cubic interpolation with the added restriction
that the fitted function is forced to be linear beyond the boundary knots, known as a
restricted cubic spline function. I can therefore write a restricted cubic spline function
of x, with vector of knots k_0, as s{x | γ, k_0}. For example, with K knots (in
ascending order) a restricted cubic spline function can be expressed as:
s(x) = γ_0 + γ_1 z_1 + γ_2 z_2 + ⋯ + γ_{K−1} z_{K−1}    (2.15)
and
z_1 = x

z_p = (x − k_p)_+^3 − κ_p (x − k_1)_+^3 − (1 − κ_p)(x − k_K)_+^3,    p = 2, …, K − 1

where

κ_p = (k_K − k_p) / (k_K − k_1)
Choosing the number of knots, K, will be considered further in Section 2.8.3.
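The basis functions z_p defined above can be computed directly from a set of knots; a minimal sketch (the function name is mine):

```python
# Restricted cubic spline basis for a single value x, given knots
# k_1 < ... < k_K; returns [z_1, ..., z_{K-1}] as in Equation (2.15).
def rcs_basis(x, knots):
    def pos3(u):  # (u)_+^3
        return u ** 3 if u > 0 else 0.0
    K = len(knots)
    k1, kK = knots[0], knots[-1]
    z = [x]  # z_1 = x
    for p in range(1, K - 1):
        kappa = (kK - knots[p]) / (kK - k1)
        z.append(pos3(x - knots[p]) - kappa * pos3(x - k1)
                 - (1 - kappa) * pos3(x - kK))
    return z

knots = [0.0, 1.0, 2.0, 3.0]
# Below the first knot all cubic terms vanish: the function is linear in x
assert rcs_basis(-1.0, knots)[1:] == [0.0, 0.0]
# K knots give K - 1 basis terms
assert len(rcs_basis(1.5, knots)) == len(knots) - 1
```

The restriction also forces linearity beyond the final knot: for x above k_K the cubic and quadratic terms of each z_p cancel exactly, which is what stabilises the fitted function in the tails.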
2.8.2. The Royston-Parmar model
Following Royston and Parmar (2002), to derive the flexible parametric model, I begin
with the survival function for a Weibull distribution
S(t) = exp(−λt^γ)
Transforming to the log cumulative hazard scale,
log H(t) = log[−log S(t)] = log(λ) + γ log(t)
Therefore on the log cumulative hazard scale, this gives a linear function of log time.
Incorporating a vector of baseline covariates, X_i, gives

log H(t | X_i) = log(λ) + γ log(t) + X_i β    (2.16)
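The linearity just derived is easy to confirm numerically: for the Weibull, the log cumulative hazard is an exact linear function of log time. A quick check (parameter values arbitrary):

```python
import math

# For the Weibull, H(t) = lam * t**gam, so
# log H(t) = log(lam) + gam * log(t): linear in log(t).
lam, gam = 0.1, 1.2

def log_cum_hazard(t):
    return math.log(lam * t ** gam)

for t in [0.5, 1.0, 2.0, 4.0]:
    expected = math.log(lam) + gam * math.log(t)
    assert abs(log_cum_hazard(t) - expected) < 1e-12
```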
The fundamental idea of the Royston-Parmar model is to replace log(t) with a restricted
cubic spline function of log(t), to provide much greater flexibility. By relaxing the linear
relation between the log cumulative hazard and log time, it provides a much more
flexible modelling framework to capture non-linearities in the log cumulative hazard
function (and subsequently the hazard function). Furthermore, the (log) cumulative
hazard function can be considered a simpler function than the hazard function, and so
fewer spline terms are generally required to capture it, compared to modelling on the
(log) hazard scale (Royston and Lambert, 2011).
I now substitute the baseline function, log(λ) + γ log(t), in Equation (2.16), for a
restricted cubic spline function of log(t)
log H(t | X_i) = η_i(t) = s{log(t) | γ, k_0} + X_i β    (2.17)
and transforming to the hazard and survival scales, we have
h(t | X_i) = (1/t) · [d s{log(t) | γ, k_0} / d log(t)] · exp(η_i(t)),    S(t | X_i) = exp{−exp(η_i(t))}
Given the fully parametric nature of the model, the derivatives of the spline function
required in the definition of the hazard function can be easily calculated. For fur-
ther details on the Royston-Parmar survival model, I refer the reader to Royston and
Lambert (2011).
This ‘flexible parametric model’ has been gaining in popularity in recent years,
with a number of applied examples in the areas of cancer survival (Lambert et al.,
2011) and prognostic research (Miladinovic et al., 2012; Rooney et al., 2013), and
methodological extensions such as competing risks (Hinchliffe and Lambert, 2013),
incorporating a cure fraction (Andersson et al., 2011), and relative survival (Nelson
et al., 2007). Incorporating the Royston-Parmar survival model into a joint model
framework is the focus of Chapter 7.
When applying this form of survival model, knot locations are often defined based
on the distribution of event times, i.e. equally spaced centiles. The linearity assumption
before the first knot and after the final knot leads to stability in the estimated function
at the extremes of the data, with the first and final knots often chosen to be the
minimum and maximum observed (log) event times. The parametric nature of the
survival model ensures that smooth continuous time predictions can be obtained and
tailored at the individual level, allowing out of sample predictions to be made.
2.8.3. Selecting the degrees of freedom
A central component of the RP model is that a number of degrees of freedom (or
equivalently a number of knots), must be specified to model the baseline cumulative
hazard function. This is also true of modelling time-dependent effects using splines;
however, for brevity I concentrate only on the aspect of capturing the baseline.
A recent extensive simulation study that I was involved in, which directly utilised
the simulation methodology developed in Chapter 4, assessed standard model selection
criteria in selecting the optimum degrees of freedom (Rutherford et al., 2014), including
the Akaike Information Criterion (AIC) proposed by Akaike (1973), and the Bayesian
Information Criterion (BIC) proposed by Gelfand and Dey (1994).
Rutherford et al. (2014) concluded that restricted cubic splines provide an effective
way of capturing simple and complex baseline hazard functions, with the approxima-
tion not heavily reliant on the ‘correct’ selection of number of knots. We found that
if a sufficient number of knots are chosen, the hazard function can be very closely ap-
proximated. However, care must be taken not to specify too many degrees of freedom,
as this can lead to over-fitting where local deviations in the data are picked up.
The default way of choosing knot locations is based on equally spaced centiles of
the distribution of (log) event times, which was found to perform very well across all
scenarios. Of course, sensitivity analyses can be conducted to assess this on specific
clinical datasets, by varying the knot locations, where parameter estimates have been
found to be quite robust (Lambert et al., 2010b; Andersson et al., 2011). This can
be expected as hazard ratios are often quite insensitive to a poorly specified baseline
hazard, which can occur if too few degrees of freedom are chosen.
2.9. Why a parametric approach?
Despite the Cox model remaining the most popular survival model choice (Cox,
1972), there is growing interest in parametric survival models (King et al., 2012). In
particular, there are a number of benefits to undertaking a parametric approach in the
analysis of survival data. As described above, the Cox model does not directly estimate
the baseline hazard function, whereas a parametric model does. By directly modelling
the baseline hazard function, measures of absolute risk, such as hazard rates, can be
obtained directly, including quantifying the associated uncertainty. From epidemiolog-
ical and clinical perspectives, this is particularly useful in illustrating how hazard rates
change and evolve over time. If the baseline is correctly specified then a fully paramet-
ric approach can be more efficient, with smaller standard errors, than the equivalent
Cox model estimated using partial likelihood (Collett, 2003); however, this gain is often
very minor. The modelling of time-dependent effects (non-proportional hazards) can
be conducted much more easily within a parametric framework, and is often compu-
tationally more efficient when compared to a Cox model with time-dependent effects.
Modelling both the baseline hazard and time-dependent effects in continuous time has
the further benefit of allowing predictions to be made both in and out of sample, which
is particularly useful in prognostic modelling. Hazard ratios can also be translated back
to an absolute scale, which can be of use for calculating the number needed to treat.
Finally, within the economic decision modelling framework, parametric survival models
must be used to model survival data, as extrapolation is often required, frequently over
a lifetime horizon, in order to calculate projected treatment effects and costs (Latimer,
2013).
In particular, there is growing use of the Royston-Parmar flexible parametric sur-
vival model, described in the previous section (Royston and Parmar, 2002). Recently,
a number of applied publications have extolled the benefits of the Royston-Parmar ap-
proach (Miladinovic et al., 2012; Rooney et al., 2013). In direct comparison to the Cox
model, they found that “On balance we prefer the Royston-Parmar model over Cox as
it comes with the advantages of parametric models whilst closely matching the Cox es-
timates”, and concluded that “Royston-Parmar flexible parametric modeling proved a
powerful method” (Rooney et al., 2013), and “Researchers involved in prognosticating
patient survival are encouraged to consider the Royston-Parmar model as an alternative
to Cox” (Miladinovic et al., 2012).
2.10. Maximum likelihood estimation
The models described in Section 2.6 can be estimated using maximum likelihood
techniques (Gould et al., 2010). Once again, the log-likelihood contribution of the ith
patient, allowing for right censoring and delayed entry (left truncation), is written as
log L_i = log[ h(t_i)^{d_i} S(t_i) / S(t_{0i}) ]

        = d_i log{h(t_i)} + log{S(t_i)} − log{S(t_{0i})}    (2.18)
where t0i and ti are the observed entry and survival/censoring times, respectively, for
the ith patient. If delayed entry is not present then the third term in Equation (2.18)
can be dropped. Using Equation (2.18) the log-likelihood can be directly maximised if
using one of the common parametric survival models described in Section 2.6.1.
Finally, it should be noted that using Equation (2.5), Equation (2.18) can be written
only in terms of the hazard function
log L_i = d_i log{h(t_i)} − ∫_{t_{0i}}^{t_i} h(u) du    (2.19)
This directly implies that a model can be estimated using only the hazard function,
assuming the integral in Equation (2.19) can be calculated. Given that often the hazard
function is of most interest, specifying a complex hazard function can be advantageous.
I will return to this aspect of the log-likelihood function for parametric models in
Chapter 5.
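As a sketch of this idea, which Chapter 5 develops using numerical quadrature, the code below evaluates Equation (2.19) for a single patient, approximating the cumulative hazard integral with a hard-coded three-point Gauss–Legendre rule (a toy implementation of my own, not the thesis software):

```python
import math

# Three-point Gauss-Legendre nodes and weights on [-1, 1]
GL_NODES = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
GL_WEIGHTS = [5 / 9, 8 / 9, 5 / 9]

def cumulative_hazard(h, t0, t1):
    """Approximate the integral of h(u) du from t0 to t1 by quadrature."""
    mid, half = (t0 + t1) / 2, (t1 - t0) / 2
    return half * sum(w * h(mid + half * v)
                      for v, w in zip(GL_NODES, GL_WEIGHTS))

def loglik_i(h, t, d, t0=0.0):
    # Equation (2.19): d_i * log h(t_i) - integral_{t0i}^{ti} h(u) du
    return d * math.log(h(t)) - cumulative_hazard(h, t0, t)

# Weibull hazard with gamma = 2 is linear in t, so the rule is exact here
lam, gam = 0.1, 2.0
h = lambda t: lam * gam * t ** (gam - 1)
t, d = 3.0, 1
analytic = d * math.log(lam * gam * t) - lam * t ** gam
assert abs(loglik_i(h, t, d) - analytic) < 1e-12
```

Only the hazard function is ever specified; the survival function never appears explicitly, which is what makes complex user-defined hazard models tractable.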
More formally, I have the likelihood function with parameter vector θ ∈ Θ, given
the observed data vector, X
L(θ; X) = ∏_{i=1}^{n} L(θ; X_i)    (2.20)
with the assumptions that the observations are independent and identically distributed.
2.10.1. The Newton-Raphson algorithm
Maximising the likelihood in question can be conducted using a variety of techniques.
Here I briefly describe, based on Gould et al. (2010), the most commonly used approach,
as implemented in the ml suite in Stata (StataCorp., 2011). In this section I assume
interest in solving for a parameter vector, to maintain generality.
In essence, I wish to find the estimated parameter values, \hat{\theta}, which maximise a
likelihood function given the observed data, X:

L(\hat{\theta}; X) = \max_{\theta \in \Theta} L(\theta; X) \quad (2.21)

or equivalently

\log L(\hat{\theta}; X) = \max_{\theta \in \Theta} \log L(\theta; X) \quad (2.22)
To obtain the solution, I can attempt to solve this analytically by taking the first
derivative with respect to the parameter vector, and setting it equal to zero
\left. \frac{\partial \log L(\theta; X)}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0 \quad (2.23)
which represents finding the roots of the gradient vector.
Perhaps the most popular method of finding roots is Newton’s method. This is an
iterative procedure, which begins with a set of initial values, θi, and is updated under
the following scheme:
(1) Start with an initial guess θi
(2) Calculate a new guess \theta_{i+1} = \theta_i - H(\theta_i)^{-1} g(\theta_i), where g(\theta_i) is the
gradient vector, or score, and H(\theta_i) is the matrix of second derivatives, or Hessian.
(3) Repeat
The procedure will be repeated until specified convergence criteria are met. For
example, within Stata, convergence is declared when g(\theta_i) H(\theta_i)^{-1} g(\theta_i)' < \varepsilon_1,
where \varepsilon_1 = 1 \times 10^{-5}. One of the following two conditions must also be met before
convergence is declared:
(1) |\theta_{i+1} - \theta_i| < \varepsilon_2, where \varepsilon_2 = 1 \times 10^{-6},
(2) |\log L(\theta_{i+1}; X) - \log L(\theta_i; X)| < \varepsilon_3, where \varepsilon_3 = 1 \times 10^{-7}.
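The update and stopping rules above can be sketched as follows, using an exponential model with θ = log λ, for which the score and Hessian are available in closed form and the maximum likelihood estimate is the number of events divided by the total follow-up time. This is an illustrative Python sketch, not Stata's ml implementation; the event count D and person-time T are made-up values.

```python
import numpy as np

def newton_raphson(theta, grad, hess, eps1=1e-5, eps2=1e-6, max_iter=100):
    """Generic Newton-Raphson: theta_{i+1} = theta_i - H^{-1} g, stopping
    when the scaled gradient g' H^{-1} g is below eps1 (cf. Stata's
    criterion) and the parameter change is below eps2."""
    theta = np.atleast_1d(np.asarray(theta, float))
    for _ in range(max_iter):
        g = np.atleast_1d(grad(theta))
        H = np.atleast_2d(hess(theta))
        step = np.linalg.solve(H, g)       # H^{-1} g
        theta_new = theta - step
        if abs(g @ step) < eps1 and np.max(np.abs(theta_new - theta)) < eps2:
            return theta_new
        theta = theta_new
    raise RuntimeError("did not converge")

# Exponential model with theta = log(lambda): logL = D*theta - exp(theta)*T,
# so the MLE is lambda-hat = D / T (events over total follow-up time)
D, T = 10.0, 52.3
grad = lambda th: np.array([D - np.exp(th[0]) * T])
hess = lambda th: np.array([[-np.exp(th[0]) * T]])
theta_hat = newton_raphson([0.0], grad, hess)
print(np.exp(theta_hat[0]))   # should equal D / T
```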
2.10.2. Numerical derivatives
The algorithm described above requires the log-likelihood function, and its first and
second derivatives. If the score and Hessian cannot be calculated analytically, then
Stata can provide a numerical approximation. Even when analytic forms are available,
general purpose optimisers such as the ml suite in Stata can be considered powerful and
accurate enough that the numerical approach can be used instead of programming the
score and Hessian analytically (MacDonald, 2014).
An analytic derivative can be defined as

f'(z) = \frac{\mathrm{d} f(z)}{\mathrm{d} z} = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h} \quad (2.24)

which leads to the approximation

f'(z) \approx \frac{f(z + h) - f(z)}{h} \quad (2.25)
for an appropriate h. Stata actually uses a centred derivative calculation of the form
f ′(z) ≈ f(z + h/2)− f(z − h/2)
h
which, although it increases the computation time (since f(z) from Equation (2.25) is
already calculated in the likelihood function call), reduces the order of magnitude
of the error from O(h) to O(h^2). For more details on choosing h, and the numerical
accuracy of the approximation, I refer the reader elsewhere (Gould et al., 2010).
Similarly, the same method can be applied to calculate the required second deriva-
tives
f''(z) \approx \frac{f'(z + h/2) - f'(z - h/2)}{h}
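The centred difference scheme can be sketched and checked against a function with known derivatives. This is a Python illustration; the step sizes h are fixed illustrative values, not Stata's adaptive choices.

```python
import numpy as np

def central_diff(f, z, h=1e-5):
    """Centred first derivative, f'(z) ~ (f(z + h/2) - f(z - h/2)) / h,
    with error O(h^2) rather than the O(h) of the forward difference."""
    return (f(z + h / 2) - f(z - h / 2)) / h

def central_diff2(f, z, h=1e-3):
    """Second derivative, applying the same centred scheme to f'."""
    return (central_diff(f, z + h / 2, h) - central_diff(f, z - h / 2, h)) / h

# Check against a function with known derivatives: f(z) = exp(2z),
# so f'(z) = 2 exp(2z) and f''(z) = 4 exp(2z)
f = lambda z: np.exp(2 * z)
print(central_diff(f, 0.5), 2 * np.exp(1.0))
print(central_diff2(f, 0.5), 4 * np.exp(1.0))
```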
All methodological developments described in this thesis use this approach. Recently,
MacDonald (2014) has argued that, given the availability of general purpose numerical
maximisers, there is no need to go directly to the Expectation-Maximisation (EM)
algorithm when handling incomplete data.
2.11. Discussion
In this chapter I have introduced the key fundamentals of survival analysis and
introduced notation that will be used in succeeding chapters. The parametric distri-
butions described in this chapter will be used to simulate survival data in Chapter
4, a process often required to validate model assumptions and assess new methodol-
ogy, which will then be extended to simulate complex survival data, including joint
longitudinal-survival data to be used in Chapters 7, 8 and 9.
Researchers are encouraged to adopt new methodology, and move away from the
‘cottage industry’ that has developed around the Cox model (Reid, 1994). Given the
benefits of the parametric approach it is hoped that researchers move away from the de-
fault choice, and embrace the alternatives and the benefits that they can provide. This
motivates the methodological developments in Chapter 5, where I propose a general
framework for the parametric analysis of survival data.
In Chapter 7, I will extend the Royston-Parmar model to the joint longitudinal-
survival modelling framework, illustrating the benefits that a flexible parametric ap-
proach can have within the joint model framework.
CHAPTER 3
Clinical motivating examples
3.1. Chapter outline
In this chapter I describe the motivating clinical datasets used throughout the thesis.
The methodological developments described in subsequent chapters can be motivated
by a variety of datasets from varying clinical areas, not just limited to a particular
disease or setting of interest. Here I present summary statistics and exploratory data
analyses of datasets from areas such as cancer, cardiovascular disease and liver cirrhosis.
This includes non-parametric methods, such as Kaplan-Meier survival curves, which are
a standard way to present survival data (Kaplan and Meier, 1958), and Royston-
Parmar models to illustrate parametric estimates of the hazard function, with degrees
of freedom selected using the AIC and BIC, described in Chapter 2.
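As a reminder of how the Kaplan-Meier estimate is constructed, a minimal sketch follows (in Python, with a toy dataset; the analyses in this chapter are carried out in Stata).

```python
import numpy as np

def kaplan_meier(t, d):
    """Kaplan-Meier estimate: S(t) is the product over event times
    t_j <= t of (1 - d_j / n_j), with n_j the number at risk just
    before t_j and d_j the number of events at t_j."""
    t, d = np.asarray(t, float), np.asarray(d, int)
    times, surv, s = [], [], 1.0
    for tj in np.unique(t[d == 1]):          # distinct event times
        n_at_risk = np.sum(t >= tj)          # at risk just before tj
        n_events = np.sum((t == tj) & (d == 1))
        s *= 1.0 - n_events / n_at_risk
        times.append(tj)
        surv.append(s)
    return np.array(times), np.array(surv)

# Small worked example: events at times 2 and 5, censoring at 3 and 7
times, surv = kaplan_meier(t=[2, 3, 5, 7], d=[1, 0, 1, 0])
print(times, surv)   # S(2) = 3/4, S(5) = 3/4 * 1/2 = 3/8
```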
3.2. German breast cancer dataset
The first dataset I describe comes from a study of 686 women diagnosed with breast
cancer in Germany (Schumacher et al., 1994), with 246 patients randomised to receive
hormonal therapy and 440 to receive a placebo. The primary outcome is recurrence-free
survival, with 299 patients experiencing the event of interest, i.e. recurrence or death.
Figure 3.1 shows the Kaplan-Meier plot for recurrence-free survival, across treatment
groups, where baseline was defined as time of mastectomy.
An indication of a beneficial treatment effect can be seen from Figure 3.1, with
improved survival in the arm treated with hormonal therapy, throughout follow-up.
This can be formally investigated using a Cox proportional hazards model; therefore,
as an initial analysis, I fit such a model to the data, investigating the effect of
treatment, with results shown in Table 3.1.
Figure 3.1. German breast cancer trial. Survival function is estimated using the Kaplan-Meier method, across treatment groups.
Table 3.1. German breast cancer trial. Results from a Cox proportional hazards model, investigating effect of hormonal therapy on survival.
Variable      Hazard ratio   Std. Err.   95% CI
Treatment     0.695          0.087       0.544, 0.888
Table 3.1 shows the estimated effect of treatment, with a hazard ratio of 0.695
(95% CI: 0.544, 0.888), indicating a beneficial treatment effect reducing the rate of
breast cancer recurrence by 30.5% (11.2%, 45.6%). The fundamental assumption of
proportional hazards can be evaluated using a test of non-zero slope in a generalised
linear regression of the scaled Schoenfeld residuals on time (Grambsch and Therneau,
1994), with the null hypothesis of zero slope (equating to proportional hazards). The
test for proportional hazards in the effect of hormonal therapy gives a p-value of 0.639,
showing no statistically significant evidence against the null hypothesis of proportional
hazards.
As described in Chapter 2, obtaining an estimate of the hazard function can provide
clinically meaningful information, making such things as prediction and extrapolation
possible. It is much more convenient to do this through a parametric approach. I
therefore now apply a proportional hazards Royston-Parmar survival model to the
breast cancer data, investigating the effect of hormonal therapy, and modelling the
baseline hazard using restricted cubic splines with four degrees of freedom. Results are
presented in Table 3.2.
Table 3.2. German breast cancer trial. Results from a Royston-Parmar proportional hazards model with 4 degrees of freedom, investigating effect of hormonal therapy on survival.
Variable      Hazard ratio   Std. Err.   95% CI
Treatment     0.695          0.087       0.544, 0.888

Baseline      Coefficient    Std. Err.   95% CI
Spline 1      1.513          0.137       1.244, 1.782
Spline 2      0.483          0.134       0.221, 0.746
Spline 3      0.011          0.047       -0.080, 0.103
Spline 4      -0.037         0.017       -0.071, -0.003
Intercept     -1.173         0.082       -1.334, -1.012
From Table 3.2 I observe an identical estimate of the effect of hormonal therapy on
survival to that of the Cox model, shown in Table 3.1, with an estimated hazard ratio
of 0.695 (95% CI: 0.544, 0.888). In Table 3.2 I also present the estimated coefficients
for the spline terms used to model the baseline log cumulative hazard function, which
can therefore be directly used to produce Figure 3.2, which shows the estimated hazard
function for each treatment arm (confidence intervals are suppressed for illustration).
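For reference, the restricted cubic spline basis underlying such a fit can be sketched as follows. This follows the standard parameterisation used in Royston-Parmar models (a linear term in log time plus one truncated-cubic term per interior knot, constrained to linearity beyond the boundary knots); the knot locations below are illustrative values, not those estimated from the breast cancer data.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: for boundary knots k_min, k_max and
    each interior knot k_j, the basis function is
    (x - k_j)_+^3 - lam_j (x - k_min)_+^3 - (1 - lam_j)(x - k_max)_+^3,
    with lam_j = (k_max - k_j) / (k_max - k_min), giving a function that
    is linear beyond the boundary knots."""
    x = np.asarray(x, float)
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: np.maximum(u, 0.0) ** 3          # truncated cubic
    cols = [x]                                        # linear term
    for kj in knots[1:-1]:                            # interior knots
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(pos(x - kj) - lam * pos(x - kmin)
                    - (1.0 - lam) * pos(x - kmax))
    return np.column_stack(cols)

# Illustrative knots on the log-time scale (in practice, centiles of the
# log event times are a common choice)
knots = [-1.5, -0.3, 0.5, 1.4]
x = np.linspace(-3.0, 3.0, 121)
B = rcs_basis(x, knots)   # shape (121, 3): linear + 2 interior-knot terms
```

The cubic and quadratic terms cancel beyond the boundary knots by construction, which is what tames the spline's behaviour in the tails.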
Figure 3.2 indicates the presence of at least one turning point in the baseline hazard
function, which is often observed in cancer and cardiovascular trials, where the mortality
rate peaks soon after the start of the trial, before tailing off (Royston, 2012). Clearly,
a standard parametric distribution such as the exponential or Weibull, would fail to
capture such a complex hazard function, as I illustrated in the example hazard functions
shown in Figure 2.1 of Chapter 2. Non-parametric kernel smoothers are often used to
show the estimated hazard function; however, they can be particularly erratic (Clark
et al., 2003), with large differences occurring between alternative kernel smoothers. I
illustrate this in Figure 3.3, which shows two commonly used non-parametric estimates
Figure 3.2. German breast cancer trial. Hazard function estimated using the Royston-Parmar survival model.
of the hazard function, namely the Epanechnikov and Epanechnikov 2 kernel smoothers,
respectively (Silverman, 1992).
Figure 3.3. German breast cancer trial. Non-parametric estimates of the hazard function using the Epanechnikov and Epanechnikov 2 kernel density smoothers. Follow-up is restricted to 6 years.
Comparing non-parametric smoothers, it is clear that I obtain quite different esti-
mates of the hazard function, although in the non-parametric method I am not assuming
proportional hazards. In particular, I have restricted follow-up to 6 years due to erratic
behaviour in the tail when using the Epanechnikov 2 kernel smoother. Comparing to
Figure 3.2, the parametric approach produces a much more stable estimated function.
The breast cancer trial will be utilised in Chapter 4, building on the illustration de-
scribed here, that complex hazard functions, incorporating turning points, often occur
in clinical datasets. Therefore, within simulation studies designed to assess methods
used in practice, it is important that simulation scenarios accurately reflect such com-
plex settings.
3.3. Breast cancer in England and Wales
The second dataset used in this thesis consists of 115,331 women diagnosed with
breast cancer between 1986 and 1990 (inclusive) in England and Wales, with follow-
up to the end of 1995. In this case, the event of interest is death from any cause,
and follow-up is restricted to five years post diagnosis. Two covariates are available;
age at diagnosis (in years), and deprivation status categorised into 5 levels (1 = least
deprived, ..., 5 = most deprived), with quintile groups based on the Carstairs deprivation
index (Coleman et al., 1999). The Carstairs deprivation index is an area measure based
on postcode (Carstairs and Morris, 1991).
All analyses presented in this thesis are restricted to comparing the least and most
deprived groups of patients, resulting in 41,645 women. Furthermore, the analyses
presented in Chapter 5, except in Section 5.11.1 where I model relative survival, are
restricted to women less than 50 years of age, which therefore consists of 9,721 women.
Of the 9,721 women, 2,847 (29.29%) died within 5 years of diagnosis. As an initial ex-
ploration of the reduced dataset, Figure 3.4 shows the Kaplan-Meier survival function,
for the least and most deprived groups, with associated 95% confidence intervals.
Figure 3.4 indicates a marked difference in survival between deprivation groups,
indicating poorer survival in the most deprived group. I further investigate this by
fitting a Cox proportional hazards model, investigating the effect of deprivation status
on survival, with results presented in Table 3.3.
Figure 3.4. England and Wales breast cancer data, all ages, comparing least and most deprived. Survival function is estimated using the Kaplan-Meier method.
Table 3.3. England and Wales breast cancer dataset. Results from a Cox proportional hazards model, investigating effect of deprivation status on survival.
Variable        Hazard ratio   Std. Err.   95% CI
Most deprived   1.309          0.051       1.212, 1.414

Reference group = least deprived
Results from Table 3.3 indicate a statistically significant hazard ratio of 1.309 (95%
CI: 1.212, 1.414) for the effect of deprivation status, indicating a 30.9% higher mortal-
ity rate in the most deprived group compared to the least deprived. However, using
the same proportional hazards test as was used in Section 3.2, I obtain a p-value of
<0.001, indicating strong evidence against the null hypothesis of proportional hazards.
This motivates the investigation of the presence of a time-dependency in the effect of
deprivation status, which will be considered further in Chapter 5.
For illustration, I proceed assuming proportional hazards is valid, and fit a Royston-
Parmar survival model to estimate the baseline hazard function, using 5 degrees of
freedom, with results shown in Table 3.4.
Table 3.4. England and Wales breast cancer dataset. Results from a Royston-Parmar proportional hazards model, investigating effect of deprivation status on survival.
Variable        Hazard ratio   Std. Err.   95% CI
Most deprived   1.309          0.051       1.212, 1.414

Baseline        Coefficient    Std. Err.   95% CI
Spline 1        0.752          0.015       0.722, 0.782
Spline 2        -0.024         0.012       -0.047, -0.002
Spline 3        0.053          0.006       0.041, 0.066
Spline 4        0.007          0.003       0.001, 0.014
Spline 5        0.000          0.002       -0.003, 0.003
Intercept       -1.467         0.024       -1.514, -1.421
Comparing results from Tables 3.3 and 3.4, I observe complete agreement between
Cox and Royston-Parmar models, with a hazard ratio of 1.309 (95% CI: 1.212, 1.414)
comparing the most to the least deprived groups. I show the estimated hazard function
in Figure 3.5.
Figure 3.5. England and Wales breast cancer dataset, all ages, comparing least and most deprived. Hazard function is estimated using a Royston-Parmar model.
The high initial mortality rate is generally attributed to a combination of patients
diagnosed with late-stage breast cancer and patients who are diagnosed due to investi-
gation of a fatal condition, unrelated to the cancer (Royston and Lambert, 2011). This
dataset is used extensively in Chapter 5 to illustrate more flexible parametric stan-
dard survival models, where I return to investigating the presence of a time-dependent
effect due to deprivation status, and the extension to relative survival utilising the
unrestricted age range dataset.
3.4. Bladder cancer recurrence
In this thesis, I predominantly concentrate on single event survival analysis. How-
ever, patients can often be at risk of multiple events, for example if the event is non-fatal,
then it may occur multiple times. To illustrate this, I employ a commonly used example
in bladder cancer (Wei et al., 1989; Therneau and Grambsch, 2000), which consists of
86 patients who experience up to 4 recurrences of their bladder cancer. Tumours were
removed initially, before patients were randomised to treatment, and following recur-
rence, further tumours were removed at each repeat visit. The number of recurrences
that patients experience is shown in Table 3.5, with a maximum of 4. Some patients
experience a recurrence and then are immediately lost to follow-up, and hence are not
considered at risk of a subsequent recurrence. In total, 112 recurrences were observed.
Covariates of interest include treatment group (0 for placebo, 1 for thiotepa), initial
number of tumours (range 1 to 8, with 8 meaning 8 or more), and initial size of tumours
(in centimetres, with range 1 to 7).
Table 3.5. Bladder cancer recurrence dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer.
Recurrence number   Censored   Event   Total
1                   38         47      85
2                   17         29      46
3                   5          22      27
4                   6          14      20
This dataset will be utilised in Chapter 5 Section 5.11.2 to illustrate the use of
cluster robust standard errors to account for the correlation induced between recurrent
events, within a parametric survival analysis using restricted cubic splines to model the
baseline hazard function. This will allow the estimation of separate hazard functions
for each recurrence.
3.5. Trial in prednisone treatment of patients with liver cirrhosis
Here I describe a trial investigating the effect of prednisone treatment, in patients
with liver cirrhosis, previously described in Anderson et al. (1993) and others including
Henderson et al. (2002). The data consists of 488 patients with cirrhosis of the liver,
where 251 were randomised at diagnosis to receive treatment with prednisone, and 237
to receive a placebo. A total of 292 patients died during the study, with maximum follow-
up of just over 13 years. A further covariate of interest is prothrombin index, which was
measured repeatedly throughout the study. Measurements were scheduled at baseline,
3, 6, and 12 months, and then annually; however, actual time of measurement varied
substantially. Figure 3.6 presents the Kaplan-Meier survival function, across treatment
groups.
Figure 3.6. Liver cirrhosis trial investigating prednisone treatment. Survival function is estimated using the Kaplan-Meier method.
Figure 3.6 indicates little difference between treatment groups in survival during
the first 2-3 years since diagnosis, then the curves diverge indicating a possible im-
proved prognosis in the prednisone group, before the curves come together again at
approximately 9 years. I also conduct a preliminary investigation of the shape of the
baseline hazard, using a Royston-Parmar model with 5 degrees of freedom investigat-
ing the direct treatment effect on survival. The estimated hazard function is shown in
Figure 3.7.
Figure 3.7. Liver cirrhosis trial investigating prednisone treatment. Hazard function estimated using a Royston-Parmar model.
The Royston-Parmar model indicated a non-statistically significant hazard ratio of
0.909 (95% CI: 0.723, 1.144) comparing prednisone treatment to placebo. Here I am
interested not only in a possible treatment effect on survival, but also the association
between the repeatedly measured prothrombin index and survival, which will be covered
extensively in Chapters 7 and 8.
In Figures 3.8 and 3.9 I present the longitudinal profiles of prothrombin index over
time, combining treatment groups for purposes of illustration, with a lowess smoother
overlaid to give an indication of the average profile over time, and comparing between
patients who died and who were censored. Figure 3.8 begins to give an idea of the
relationship between the longitudinal and survival processes, which becomes more
pronounced in Figure 3.9, where the timescale is adjusted by taking away each patient's
observed survival time. Figure 3.9 indicates a trend in prothrombin index in the 2-3
year period before death, possibly indicating that lower values of prothrombin index
could be associated with an increased risk of death.
Figure 3.8. Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid.
Figure 3.9. Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid.
Figure 3.7 highlights another example of a complex underlying hazard function,
with multiple turning points, which cannot be captured using standard parametric
methods. It is this which motivates the incorporation of more flexible models, such as
the Royston-Parmar model and a finite mixture survival model, which will be developed
within the joint model framework in Chapters 7 and 8.
3.6. Cohort of obese patients with type 2 diabetes mellitus from the
General Practice Research Database
I now introduce an observational dataset, which comes from the General Practice
Research Database (GPRD). This dataset comes from a recent Health Technology As-
sessment (HTA) that I was involved in, investigating the clinical and cost effectiveness
of anti-obesity treatments in primary care (Ara et al., 2012). The dataset I describe
is a subset of the full 100,000 patient cohort analysed in the HTA report, and consists
of 4,850 obese patients with diabetes mellitus. Covariates of interest include gender
and age (in years) at baseline, and repeated measures of systolic blood pressure (SBP),
of which I have 107,347 measurements. The event of interest in this case is non-fatal
stroke, of which 278 were observed, with maximum follow-up of 22 years. Primary
interest in the analysis presented in Chapter 9, is the association between baseline SBP
and the risk of stroke. Baseline is defined as the time at which each patient entered
the cohort, i.e. time of first SBP measurement. Figure 3.10 shows the Kaplan-Meier
survival function.
The baseline hazard function is estimated using a Royston-Parmar model with two
degrees of freedom, with the estimated function and 95% confidence interval shown in
Figure 3.11.
In Figure 3.12 I illustrate the distribution of baseline SBP, separately for patients
who were censored and for those who suffered a stroke, indicating slightly higher values
of SBP for patients who had a stroke. Figure 3.13 illustrates the distribution of the
number of observed measurements of SBP per patient. The median number of measure-
ments is 17, with a range of 1 to 269.
This dataset will be used in Chapter 9 in which I investigate the use of a specific
association structure within a joint longitudinal-survival framework, to account for
Figure 3.10. GPRD cohort of obese patients with type 2 diabetes mellitus. Survival function is estimated using the Kaplan-Meier method.
Figure 3.11. GPRD cohort of obese patients with type 2 diabetes mellitus. Hazard function is estimated using a Royston-Parmar model.
measurement error when linking baseline SBP and the risk of stroke. With an observa-
tional dataset, the issue of appropriate choice of timescale must be raised. The baseline
time point used here is time of first measurement of SBP; however, when estimating
survival probabilities, this does not provide an intuitive, clinically meaningful way of
conveying survival to patients. More appropriate perhaps, would be to use age as the
timescale; however, this requires delayed entry to be incorporated into the joint model
Figure 3.12. GPRD cohort of obese patients with type 2 diabetes mellitus. Box plots of baseline SBP for patients who were censored or who suffered a stroke.
Figure 3.13. GPRD cohort of obese patients with type 2 diabetes mellitus. Distribution of number of observed measurements of systolic blood pressure per patient.
framework, which is not currently available in any software package. This extension
will be discussed further in Chapter 11.
3.7. Trials of D-penicillamine treatment for patients with primary biliary
cirrhosis
The final dataset I present is in the area of Primary Biliary Cirrhosis (PBC), which
is a long-term disease of the liver affecting the small bile ducts, leading to a build-up of
bile which subsequently damages the liver (Murtagh et al., 1994). The dataset consists
of 312 patients who were enrolled in two trials investigating the effect of D-penicillamine
treatment, of which 140 died. A total of 1945 measurements of serum bilirubin were
recorded and are available for analysis. Figure 3.14 presents the Kaplan-Meier survival
curve, across treatment groups.
Figure 3.14. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Survival function is estimated using the Kaplan-Meier method.
Figure 3.14 indicates very similar survival probabilities between the two treatment
groups. This is further illustrated by fitting a Royston-Parmar survival model with 1
degree of freedom (equivalent to a Weibull model), investigating the effect of treatment.
The estimated hazard function is shown in Figure 3.15.
The Royston-Parmar model indicated a non-statistically significant hazard ratio of 1.000
(95% CI: 0.718, 1.392), which is illustrated by the overlapping estimated hazard functions
shown in Figure 3.15. However, as in the liver cirrhosis example described in Section
Figure 3.15. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Hazard function is estimated using a Royston-Parmar survival model.
3.5, I am interested not only in the direct treatment effect on survival, but also the
association between a repeatedly measured biomarker, serum bilirubin, and survival.
Due to right skewness, in all analyses of the PBC dataset I work with the log of
serum bilirubin.
In Figures 3.16 and 3.17 I present the longitudinal profiles of log(serum bilirubin)
over time, with the lowess smoother overlaid to give an indication of the average profile
over time, and comparing between patients who died and who were censored. Figure
3.16 suggests a relationship between the longitudinal and survival processes, which
becomes more pronounced in Figure 3.17, where the timescale is adjusted by taking
away each patient's observed survival time. Figure 3.17 indicates an upward trend in
log(serum bilirubin) in the 5 month period before death, possibly indicating that higher
values of log(serum bilirubin) could be associated with an increased risk of death.
The PBC dataset can be considered a classic joint model dataset, and will be used
in Chapter 10 to illustrate some of the features of the stjm Stata package written as a
central component of this thesis.
Figure 3.16. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) for patients who were censored/died. Lowess smoother overlaid.
Figure 3.17. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid.
3.8. Discussion
In this chapter I have introduced and described a variety of clinical datasets which
will be used throughout the remainder of the thesis. The wide variety of example
datasets will be used to motivate and illustrate complex statistical techniques.
CHAPTER 4
Simulation of complex survival data
4.1. Chapter outline
In this chapter I describe and develop methods for the simulation of complex sur-
vival data. In order to evaluate statistical models, such as those described in Chapter
2, simulation studies can be conducted to establish the performance of a model when
generating data from a known ‘truth’. In other words, I can generate data under a
known model, apply a proposed model, and establish how well the proposal performs.
I begin by describing how to simulate survival times from a standard proportional
hazards model, assuming a parametric distribution such as the Weibull, and then de-
velop methodology and propose an algorithm which allows the simulation of survival
times from essentially any defined hazard function. This includes complex baseline haz-
ard functions, which will be utilised in Chapter 5, and joint longitudinal and survival
data, which will be utilised in Chapters 7, 8 and 9. The simulation framework is illus-
trated through a number of example simulation studies. The methods are implemented
in the publicly available survsim Stata package, which can be downloaded by typing
ssc install survsim in Stata.
This chapter is based on two recent publications; Crowther and Lambert (2012)
and Crowther and Lambert (2013a), copies of which can be found in Appendix A.
4.2. Introduction
Simulation studies are conducted to assess the performance of current and novel
statistical models in pre-defined scenarios. By simulating from a known ‘truth’, one can
directly evaluate how well a proposed model recovers that truth, something that cannot
be done using real clinical data. The quality and reporting of simulation
studies varies considerably, which has led to the establishment of general guidelines for
the development and reporting of simulation studies in medical research (Burton et al.,
2006). In order to establish certain properties, such as bias and coverage, or robustness
to deviations from underlying assumptions, it is often desirable that chosen simulation
scenarios accurately reflect a biologically plausible distribution. This is particularly
important in the framework of survival analysis, where distributions are chosen for
both the event time and the censoring time.
Previous studies have introduced a framework to simulate survival data for Cox
proportional hazards models (Leemis, 1987; Bender et al., 2005), known as the inver-
sion method, which can generate survival times from standard parametric distributions
including the exponential, Weibull and Gompertz amongst others, incorporating base-
line covariate effects under proportional hazards. This method is easily implemented
in any standard software with a random number generator. Bender et al. (2005) noted
that many simulation studies that generated survival data assumed an exponential dis-
tribution for the distribution of event times. Although many recent studies have gone
beyond the standard exponential choice to a slightly more complex Weibull distribu-
tion (Rashid et al., 2008; Belot et al., 2010), these choices are often not flexible enough
to fully reflect the underlying distributions frequently observed in clinical data, as I
illustrated in Chapter 3.
Often in clinical trials or population based studies, at least one turning point is
observed in the underlying hazard function (Murtagh et al., 1994; Lambert and Roys-
ton, 2009; Eloranta et al., 2012). Although hazard ratios can be insensitive to a poorly
specified baseline hazard function (Rutherford et al., 2014), it is vital to accurately
capture the baseline hazard function when interest lies in measures of absolute risk
(Royston and Lambert, 2011). Through a fully flexible parametric model, I can not only
accurately capture complex hazard functions, but also simulate biologically plausible
survival data. As discussed in Section 2.9, such methods are becoming more common-
place as the benefits of a parametric approach, such as the reporting of measures of
absolute risk, become recognised in applied research. I will return to parametric mod-
elling further in Chapter 5. In order to assess such parametric approaches, methods
are required to simulate survival data from a variety of complex distributions, beyond
standard distributions such as the exponential, Weibull and Gompertz.
Furthermore, a variety of extensions to the standard survival analysis framework,
such as incorporation of time-dependent effects (non-proportional hazards), the oc-
currence of time-varying covariates (which will be considered further in Chapter 6),
heterogeneous covariate effects and covariates measured with error, all require suitable
simulation techniques to assess statistical models developed for each setting. A further,
often observed, phenomenon in survival analysis is the presence of informative censoring.
Standard survival models make the assumption of no dependence between the
survival and censoring mechanisms. Assessing the robustness of methods to deviations
from this assumption is a key question in survival analysis (Siannis et al., 2005).
In this chapter, I develop a general algorithm for the simulation of survival times
(Crowther and Lambert, 2013a). In Section 4.3, I briefly describe again a dataset
of breast cancer patients, introduced in Section 3.2, which exhibits turning points in
the underlying hazard function, which cannot be simulated using standard parametric
distributions. I briefly describe the inversion method in Section 4.4, to simulate data
from standard parametric distributions with an analytically tractable and invertible
cumulative hazard function, which forms the basis for the simulation framework. In
Sections 4.5 and 4.6, I describe a range of simulation scenarios, culminating in a general
simulation algorithm to simulate survival data from complex distributions using root
finding techniques with nested numerical integration. In Section 4.7, I describe how
to incorporate time-dependent effects, both with standard and complex parametric
distributions. I then describe how to incorporate both binary and continuous time-
varying covariates in Section 4.8, whilst in Section 4.9, I describe how the techniques
can be applied to incorporate dependent censoring. The methods are implemented
using the publicly available survsim package in Stata (Crowther and Lambert, 2012;
Crowther, 2011). Finally, in Section 4.11, I conclude the chapter with a discussion.
4.3. Motivating dataset - German breast cancer trial
I begin by illustrating the commonly observed phenomenon of a turning point in the
baseline hazard function, as motivation for simulating more clinically plausible survival
data. I use the dataset described in Section 3.2. Briefly, this dataset consists of 686
women diagnosed with breast cancer in Germany (Schumacher et al., 1994), with 246
patients randomized to receive hormonal therapy and 440 to receive a placebo. Outcome
is recurrence-free survival, with 299 patients experiencing the event of interest.
I apply a Weibull proportional hazards model and a proportional hazards Royston-
Parmar (RP) model, as described in Section 2.8.2, investigating the effect of hormonal
therapy treatment. In preliminary modelling, the degrees of freedom used to model
the baseline of the RP model are selected using the AIC and BIC, as recommended
by Rutherford et al. (2014). The AIC selected four degrees of freedom, and the BIC
selected two. In this case I follow the AIC and choose four degrees of freedom, to ensure
sufficient flexibility. Estimates of the treatment effect from the Weibull and RP models
are shown in Table 4.1. For comparison, I also fit a Cox model.
Table 4.1. German breast cancer trial. Estimated treatment effects from Weibull, Royston-Parmar and Cox models

Model        Hazard ratio    95% Confidence Interval
Weibull      0.675           0.528 to 0.862
RP (df=4)    0.695           0.544 to 0.888
Cox          0.695           0.544 to 0.888
Table 4.1 indicates identical estimates between the RP and Cox models (to three
decimal places), with a minor difference in estimated hazard ratios between the Weibull
and RP models, with hazard ratios of 0.675 and 0.695, respectively, both showing a
statistically significant beneficial treatment effect. Figure 4.1 shows the fitted survival
curves from the Weibull and RP models overlaid on the Kaplan-Meier curves, by treat-
ment group, showing the much improved fit from the more flexible model.
[Figure 4.1. German breast cancer trial. Predicted survival from Weibull and Royston-Parmar models overlaid on the Kaplan-Meier curves, by treatment group (panels: Weibull model; flexible parametric model).]
Furthermore, in Figure 4.2 I present the fitted hazard functions, across treatment
group, for the Weibull and RP models, illustrating a marked difference in the estimated
underlying shapes, indicating the presence of two turning points in the hazard function.
[Figure 4.2. German breast cancer trial. Predicted hazard functions from Weibull and Royston-Parmar survival models (panels: Weibull model; flexible parametric model).]
4.4. Simulating survival data
4.4.1. Simulating survival times from standard parametric distributions
Bender et al. (2005) described a highly efficient, and easy to implement, technique
to generate survival times from a variety of standard parametric distributions, known
as the inversion method. Here, I briefly describe the method, as it forms the basis
for the extensions below. The hazard function of a proportional hazards model can be
expressed as
h(t|X) = h0(t) exp(Xβ)
where h0(t) is the baseline hazard function specified by some parametric distribution,
X is a vector of time-independent covariates with corresponding log hazard ratios,
β. The corresponding cumulative hazard, H(t|X), survival, S(t|X) and cumulative
distribution, F (t|X), functions, conditional on baseline covariates, are defined as follows
H(t|X) = H0(t) exp(Xβ), where H0(t) = ∫_0^t h0(u) du

S(t|X) = exp[−H(t|X)] and F(t|X) = 1 − exp[−H(t|X)]

If I let T be the simulated survival time, Bender et al. (2005) showed that by letting

F(T|X) = 1 − exp[−H(T|X)] = u, where u ∼ U(0, 1)

or alternatively

S(T|X) = 1 − u (or equivalently = u)   (4.1)

Thus, if h0(T) > 0, then Equation (4.1) can be re-arranged and directly solved for T,
as long as H0(T) can be directly inverted:

T = H0^(−1)(−log(u) / exp(Xβ))   (4.2)
The data generating process then only requires draws from a uniform distribution,
followed by application of Equation (4.2). The three standard choices for h0(T ) are the
exponential, Weibull and Gompertz distributions described in Section 2.6, with common
hazard function shapes shown in Figure 2.1. For example, to simulate survival times
from a Weibull distribution, with H0(t) = λt^γ, Equation (4.2) becomes

T = (−log(u) / (λ exp(Xβ)))^(1/γ)   (4.3)
The three standard distributions can be considered restrictive in terms of the shapes of
the baseline hazard function that can be generated. However, these distributions remain
appealing to researchers conducting simulation studies, perhaps because Equation (4.3)
has an analytically tractable form.
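As a concrete illustration, the Weibull case of Equation (4.3) takes only a few lines of code. The sketch below is a hypothetical Python translation (the methods in this chapter are implemented in the Stata survsim package); the parameter values λ = 0.1, γ = 1.5 and hazard ratio 0.7 are illustrative choices only.

```python
import numpy as np

def simulate_weibull_ph(n, lam, gamma, beta, X, seed=None):
    """Inversion method of Bender et al. (2005) for a Weibull baseline:
    H0(t) = lam * t**gamma, so T = (-log(u) / (lam * exp(X @ beta)))**(1/gamma)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                     # u ~ U(0, 1)
    return (-np.log(u) / (lam * np.exp(X @ beta))) ** (1.0 / gamma)

# Illustrative example: a binary treatment with hazard ratio 0.7
rng = np.random.default_rng(1)
X = rng.binomial(1, 0.5, size=10000).reshape(-1, 1)   # binary covariate
beta = np.array([np.log(0.7)])                        # log hazard ratio
t = simulate_weibull_ph(10000, lam=0.1, gamma=1.5, beta=beta, X=X, seed=2)
```

Since the treated group has the lower hazard, its simulated event times are systematically longer, as expected.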
4.5. A general framework for simulation of survival data
In this section, I give a broad overview of the simulation framework, before describ-
ing in more detail the statistical techniques used in the algorithms, in Section 4.6. The
inversion method relies on two properties: 1) the cumulative hazard function must be
analytically tractable, i.e. it can be written down in closed form; and 2) the cumulative
hazard function must be invertible, i.e. we must be able to directly solve for the
simulated survival time, T.
I will describe algorithms to first overcome the situation when I can write the cumu-
lative hazard function in closed form; however, cannot directly solve for the simulated
survival time. I then move to the general framework, where both the cumulative hazard
cannot be written in a closed form, and further it cannot be inverted to solve for the
simulated survival time. Figure 4.3 shows a schematic flow diagram illustrating the
general framework for simulating survival data from a defined hazard or cumulative
hazard function.
4.5.1. Scenario 1
Scenario 1 involves the setting of Bender et al. (2005) described in Section 4.4.1,
where the cumulative hazard function has a closed form expression and can be directly
inverted to solve for T , the simulated survival time, hence the inversion method can be
directly applied.
Does H0(t) have a closed form expression?
  Yes: Can you solve for T analytically?
    Yes: Scenario 1. Apply the method of Bender et al. (2005).
    No: Scenario 2. Use iterative root finding to solve for the simulated time, T.
  No: Scenario 3. Numerically integrate to obtain H0(t), within iterative root finding to solve for T.

Figure 4.3. Schematic flow diagram of simulation techniques, beginning at the top-left.
4.5.2. Scenario 2
Scenario 2 arises when I wish to use a more complex baseline hazard function to
simulate data under a proportional hazards model. In this case I assume the cumulative
hazard function has a closed form expression. However, if I choose a more complex
hazard and consequently cumulative hazard function, it may be that the cumulative
hazard function can no longer be directly inverted, and therefore cannot directly be
solved for T , the simulated survival time. In this situation I proceed by applying
iterative root finding techniques. I describe this in more detail, and through example,
in Section 4.6.1.
4.5.3. Scenario 3
Finally, Scenario 3 arises when I define a complex hazard function which cannot be
integrated analytically to obtain the cumulative hazard function. To accommodate this
setting, numerical integration techniques such as Gauss-Legendre quadrature (described
in Section 4.6.3) can be used. Following this, I once again have a cumulative hazard
function which cannot be directly inverted to solve for the simulated survival time, T ,
therefore requiring iterative root finding techniques as in Scenario 2. This results in
a general 2-stage algorithm involving numerical integration nested within an iterative
root finding procedure. I describe this in more detail in Section 4.6.3.
4.6. Simulating from a complex baseline hazard function
4.6.1. Root finding
The first extension I describe involves the situation where I wish to use a more complex
baseline hazard function, to simulate data under a proportional hazards model. In this
case I still assume that the cumulative hazard can be evaluated analytically for a given
hazard function.
The step between Equation (4.1) and Equation (4.2) is reliant on being able to
directly re-arrange Equation (4.1) to solve for T , the simulated survival time. When
this condition fails it requires iterative techniques to find the root of Equation (4.1). I
illustrate this situation through an example.
4.6.1.1. Example: 2-component mixture Weibull distribution
I now begin to introduce some complexity in the parametric distribution used to gen-
erate survival times. Motivation for going beyond the standard parametric approaches
described in Section 4.4.1 originates from the often observed situation of a turning point
in a dataset’s baseline hazard function, illustrated in Section 4.3. One such approach is
to use a mixture of parametric distributions (Crowther and Lambert, 2012; Rutherford
et al., 2014).
Here I define the overall baseline survival function of a two-component parametric
mixture model. Finite mixture survival models of this form have been used in standard
survival analysis (McLachlan and McGiffin, 1994), and mixture and non-mixture cure
models to obtain improved estimates of statistical cure (Lambert et al., 2010b). I define
two parametric components additive on the survival scale
S0(t) = pS01(t) + (1− p)S02(t) (4.4)
which defines a baseline survival function, S0(t), where S01(t) and S02(t) are the survival
function of any standard parametric distribution, and p represents the mixing parame-
ter where 0 ≤ p ≤ 1. For illustrative purposes I proceed by assuming a two-component
mixture Weibull distribution, with
S0(t) = p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)   (4.5)

where λ1, λ2 and γ1, γ2 are scale and shape parameters, respectively, with λ1, λ2,
γ1, γ2 > 0. Transforming to the cumulative hazard scale

H0(t) = −log[p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (4.6)

and differentiating with respect to t, I obtain the baseline hazard function

h0(t) = [λ1γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (4.7)

Proportional hazards can then be induced

h(t) = ([λ1γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]) exp(Xβ)   (4.8)
where X is a vector of time-independent covariates (and no intercept term), with
associated regression coefficients, β. This model can be used to simulate survival
data from a variety of functions with turning points, to better reflect observed clinical
datasets. I illustrate some examples in Figure 4.4, based on those seen in real datasets
(Anderson et al., 1993; Murtagh et al., 1994; Schumacher et al., 1994).
Equation (4.8) can be directly integrated with respect to t to obtain the cumulative
hazard function, and consequently the survival function. However, this leads to a
survival function that when substituted into Equation (4.1), produces an equation
which cannot be directly solved for t (Scenario 2 described in Section 4.5.2). I now
describe two root finding techniques to accommodate this situation.
[Figure 4.4. Example two-component mixture Weibull baseline hazard functions, for parameter sets (λ1, γ1, λ2, γ2, p) = (1, 1.5, 1, 0.5, 0.5), (0.1, 3, 0.1, 1.6, 0.8), (1.4, 1.3, 0.1, 0.5, 0.9) and (1.5, 0.2, 0.5, 0.1, 0.1).]
4.6.1.2. Brent’s univariate root-finding method
To generate survival times, the following needs to be solved for t,
g(t) = S(t)− U = 0 (4.9)
An efficient method to calculate the simulated survival times is to use Brent’s univariate
root finder. This algorithm combines the bisection method with linear or quadratic
interpolation (Jann, 2005). The algorithm is executed until a desired tolerance (I use
a default of 1E-08) is met.
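A minimal sketch of this step in Python (a hypothetical translation; the chapter's implementation is the Stata survsim package), using scipy's `brentq` to solve S(t) − u = 0 under the two-component mixture Weibull of Equation (4.5), with the parameter values used in the simulation study of Section 4.6.2:

```python
import numpy as np
from scipy.optimize import brentq

def mixture_weibull_surv(t, lam1, gam1, lam2, gam2, p, linpred=0.0):
    """Two-component mixture Weibull survival, Equation (4.5),
    raised to exp(linpred) to induce proportional hazards."""
    s0 = p * np.exp(-lam1 * t**gam1) + (1 - p) * np.exp(-lam2 * t**gam2)
    return s0 ** np.exp(linpred)

def simulate_one(u, upper=1000.0, **pars):
    # Brent's method on g(t) = S(t) - u, which has a single root since
    # S(t) decreases monotonically from 1 towards 0
    return brentq(lambda t: mixture_weibull_surv(t, **pars) - u,
                  1e-12, upper, xtol=1e-8)

pars = dict(lam1=0.3, gam1=2.5, lam2=0.025, gam2=1.9, p=0.3)
us = np.random.default_rng(42).uniform(size=500)
times = np.array([simulate_one(u, **pars) for u in us])
```

Each returned time t satisfies S(t) = u to within the root-finder tolerance, which is the defining property of the inversion method.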
4.6.1.3. Newton-Raphson root finder
An alternative method to Brent’s root finder is to use Newton-Raphson iterations,
which uses the first two terms of the Taylor series expansion of g(t), the objective
function. I begin with an initial guess, t0, and then execute the following formula
t_(n+1) = t_n − g(t_n) / g′(t_n)   (4.10)
Equation (4.10) is executed until a desired tolerance is met, for example
|t_(n+1) − t_n| < 1E-08. Royston (2012) proposed to use Newton-Raphson iterations to
simulate survival times using restricted cubic splines on the log cumulative hazard
scale. In my experience I have found Brent's method to be far superior in terms of
reliability and accuracy compared to Newton-Raphson iterations, which can have
convergence problems.
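For completeness, a bare-bones Newton-Raphson sketch follows (hypothetical Python code, not the thesis implementation). Since g′(t) = −h(t)S(t), the update (4.10) can be written in terms of the hazard and survival functions; a Weibull example, where the closed-form inverse of Equation (4.3) is known, is used purely to check the iterations.

```python
import math

def newton_survival_time(S, h, u, t0=1.0, tol=1e-8, max_iter=100):
    """Newton-Raphson for g(t) = S(t) - u; since g'(t) = -h(t)*S(t),
    the update (4.10) becomes t_(n+1) = t_n + (S(t_n) - u) / (h(t_n)*S(t_n))."""
    t = t0
    for _ in range(max_iter):
        t_new = t + (S(t) - u) / (h(t) * S(t))
        if t_new <= 0:              # crude guard against leaving the domain
            t_new = t / 2.0
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    raise RuntimeError("Newton-Raphson failed to converge")

# Check against the closed-form Weibull inverse: S(t) = exp(-lam * t**gam)
lam, gam, u = 0.1, 1.5, 0.37
t_nr = newton_survival_time(lambda t: math.exp(-lam * t**gam),
                            lambda t: lam * gam * t**(gam - 1), u)
t_exact = (-math.log(u) / lam) ** (1.0 / gam)
```

Even with the crude domain guard above, convergence depends on the starting value t0, which illustrates why Brent's bracketing method is more reliable in practice.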
4.6.2. Example simulation study
I illustrate the use of Brent’s root finding technique described in Section 4.6.1, applied
to the 2-component mixture Weibull distribution. In each of 1000 repetitions, I generate
1000 survival times from a two-component mixture Weibull baseline hazard function,
with parameters λ1 = 0.3, γ1 = 2.5, λ2 = 0.025, γ2 = 1.9 and p = 0.3. These
parameter values are chosen to closely approximate the observed hazard function seen
in the breast cancer dataset. I also include a binary treatment variable, drawn from
Xi ∼ Bin(1, 0.5), with associated hazard ratio exp(β) = 0.7, and apply administrative
censoring at 5 years. Computation time to generate the 1000 datasets was 34 seconds
on an Intel Core i5 2.5GHz CPU, illustrating the computational efficiency of the root
finding technique. To each simulated dataset I apply a Weibull survival model and the
two-component mixture Weibull model (Crowther and Lambert, 2011), i.e. the true
model, monitoring estimates of the log hazard ratio. Furthermore, I monitor estimates
of the survival probability and hazard rate in the reference group (Xi = 0), at time
points t = 1, 2, 3, 4, 5, to assess the ability of each model to capture the baseline
hazard and survival functions.
Results are presented in Table 4.2. The mixture Weibull model produces unbiased
estimates and good coverage probabilities in the log hazard ratio, and the estimates of
survival and hazard, indicating its ability to capture the complex underlying hazard
function. In comparison, estimates from the Weibull model indicate minor bias in the
log hazard ratio, with large bias observed in the estimates of survival and hazard, across
the 5 time points, indicating its inability to effectively capture the underlying shape.
Table 4.2. Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from both Weibull and mixture Weibull models.

                       Weibull                  Mixture Weibull
           Truth   Bias     95% Coverage    Bias     95% Coverage
β         -0.357  -0.013    92.3            -0.002   93.7
Survival
1 year     0.905  -0.019    43.9             0.001   94.7
2 year     0.693   0.055     5.4            -0.001   94.3
3 year     0.575   0.041    39.2            -0.000   95.2
4 year     0.494   0.004    93.7             0.001   94.3
5 year     0.411  -0.015    88.1            -0.000   94.1
Hazard
1 year     0.220  -0.067     0.0            -0.002   93.8
2 year     0.250  -0.066     0.1             0.001   95.4
3 year     0.146   0.059     0.0            -0.001   94.2
4 year     0.166   0.055     1.5            -0.001   95.1
5 year     0.202   0.033    51.0             0.009   95.1
4.6.3. Numerical integration
I now describe in more detail scenario 3, introduced in Section 4.5.3, where I define
a more general functional form for the hazard function, which will require numerical
integration techniques to evaluate the cumulative hazard function. This is then followed
by one of the root finding techniques described above. I once again illustrate this
through an example.
4.6.3.1. Example: Fractional polynomials
Fractional polynomials select powers from a pre-defined set, usually (-2, -1, -0.5, 0,
0.5, 1, 2, 3), which can be used to model continuous covariates which exhibit non-
linearity (Royston and Sauerbrei, 2008). Note that a power of 0 corresponds to log(x).
If a power is repeated, such as (0.5, 0.5), then if x is the covariate being transformed,
the repeated term becomes x^0.5 × log(x). In this case, the continuous ‘covariate’ of interest is survival time.
I once again define the proportional hazards model
h(t) = h0(t) exp(Xβ) (4.11)
where h0(t) is any general function which satisfies h0(t) > 0 for t > 0. Here I expand
log[h0(t)] into a function with 2 turning points, in this case a fractional polynomial
function with three powers (1, 0.5, 0.5):

log(h0(t)) = −18 + 7.3t − 11.5 t^0.5 log(t) + 9.5 t^0.5   (4.12)
The assumed hazard function is shown in Figure 4.5. This provides a reasonable fit to
the example dataset described in Section 4.3.
[Figure 4.5. Example fractional polynomial baseline hazard function.]
The next step to simulate survival times from this underlying hazard function is to
calculate the cumulative hazard function; however, when I substitute the exponential of
Equation (4.12) into Equation (4.11) and attempt to integrate, I obtain an analytically
intractable integral, therefore requiring numerical techniques in order to evaluate it.
4.6.3.2. Gaussian quadrature
Numerical integration techniques, such as Gaussian quadrature (Stoer and Burlirsch,
2002), provide an approximation to an analytically intractable integral. Gaussian quad-
rature turns an integral into a weighted summation of a function evaluated at a set of
pre-defined points called nodes. It performs well if the function being integrated can be
approximated by a polynomial of degree 2m− 1, where m is the number of quadrature
nodes. For example, integrating a hazard function

∫_0^t h(u) du   (4.13)

I first need to undertake a change of interval using

∫_0^t h(u) du = (t/2) ∫_(−1)^(1) h((t/2)z + (t/2)) dz   (4.14)

I can now numerically integrate, using for example Gauss-Legendre quadrature, resulting in

∫_0^t h(u) du ≈ (t/2) Σ_(i=1)^(m) w_i h((t/2)z_i + (t/2))   (4.15)

where w and z are vectors of weights and node locations, respectively. Under Gauss-
Legendre quadrature the weights are defined as

w_i = 2 / [(1 − z_i^2)(P′_m(z_i))^2]
approximation depends on the number of nodes, m. In my experience I have found
that often 30 nodes are sufficient; however, when the hazard is initially very high then
often more will be required. The accuracy can be assessed by setting a simulation seed
and simulating survival times with an increasing number of nodes, and comparing the
relative difference between times. Further forms of numerical integration for integrating
over an infinite domain are introduced in Chapter 6.
Now that I can calculate the cumulative hazard, I then apply one of the root finding
procedures described in Section 4.6.1. The iterative algorithm, however, in this case
now has multiple steps, including numerical integration nested within either Newton-
Raphson steps or Brent’s method.
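Putting the pieces together, the Scenario 3 algorithm (numerical integration nested within root finding) can be sketched as below, here for the fractional polynomial log hazard of Equation (4.12). This is hypothetical illustrative Python code; the chapter's implementation is the Stata survsim package.

```python
import numpy as np
from scipy.optimize import brentq

nodes, weights = np.polynomial.legendre.leggauss(30)   # 30-node Gauss-Legendre

def log_h0(t):
    # Fractional polynomial baseline log hazard, Equation (4.12)
    return -18 + 7.3 * t - 11.5 * t**0.5 * np.log(t) + 9.5 * t**0.5

def cum_hazard(t, linpred=0.0):
    # Equations (4.14)-(4.15): change of interval [0, t] -> [-1, 1],
    # then a weighted sum of hazard evaluations at the nodes
    z = 0.5 * t * nodes + 0.5 * t
    return 0.5 * t * np.sum(weights * np.exp(log_h0(z) + linpred))

def simulate_time(u, linpred=0.0, upper=15.0):
    # Brent's method on g(t) = exp(-H(t)) - u, with H(t) itself obtained
    # by quadrature at every function evaluation
    return brentq(lambda t: np.exp(-cum_hazard(t, linpred)) - u,
                  1e-8, upper, xtol=1e-8)

us = np.random.default_rng(7).uniform(size=200)
times = np.array([simulate_time(u) for u in us])
```

Each call to the root finder triggers repeated quadrature evaluations, which is why the computation times reported in the simulation studies of this chapter are longer than under the closed-form inversion method.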
4.6.4. Example simulation study
I now illustrate the algorithm by simulating survival data from the baseline hazard
function defined in Equation (4.12). In this case, I can use the Royston-Parmar survival
model, which models the baseline log cumulative hazard function using restricted cubic
splines, to assess how well it captures complex hazard functions. I apply a Royston-
Parmar model with 5 degrees of freedom and for comparison I also apply a Weibull
proportional hazards model. For each of 1000 repetitions I simulate 1000 survival
times, incorporating a binary and continuous covariate, representing gender, X1i ∼Bin(1, 0.5), and age, X2i ∼ N(65, 12), with associated log hazard ratios of β1 = −0.5
and β2 = 0.02, respectively. I assume administrative censoring at 5 years. Computation
time to generate the 1000 datasets was 144 seconds on an Intel Core i5 2.5GHz CPU.
In each repetition I monitor estimates of the log hazard ratios for the effects of gender
and age. Furthermore, I assess estimates for survival probabilities and hazard rates at
t = 1, 2, 3, 4, 5, estimated at X1i = 0 and X2i = 65.
Results are presented in Table 4.3. Under the Weibull model, I observe moderate
bias of -0.082 (-16.4%) for β1, the effect of gender, compared to essentially unbiased
estimates under the Royston-Parmar survival model. Estimates of the hazard and
survival functions are generally heavily biased under the Weibull model, with poor
coverage probabilities, compared to minimal bias and good coverage under the Royston-
Parmar approach. Note, there is a small amount of bias for the hazard at 5 years under
the Royston-Parmar survival model; however, given that this is not the true model, it
generally performs very well.
4.7. Simulating time-dependent effects
The presence of non-proportional hazards, i.e. time-dependent effects, is common-
place in the analysis of time to event data (Jatoi et al., 2011). This is often observed
in the analysis of registry based data sources where follow-up time can be over many
years (Lambert et al., 2011). Furthermore, evidence is often found of time-dependent
treatment effects (Mok et al., 2009).
4.7.1. Standard parametric distributions
Under standard parametric distributions, the inclusion of time-dependent effects can
be undertaken so as to ensure an analytically tractable and invertible cumulative hazard
function, allowing direct application of the inversion method. For example, under
Table 4.3. Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from Weibull and Royston-Parmar models.

                       Weibull                  RP
           Truth   Bias     95% Coverage    Bias     95% Coverage
β1        -0.500  -0.082    74.2            -0.001   93.1
β2         0.020   0.003    75.4             0.000   94.2
Survival
1 year     0.602   0.011    91.4             0.000   94.7
2 year     0.189   0.059     4.9             0.001   94.3
3 year     0.076   0.001    87.1            -0.000   94.2
4 year     0.037  -0.018    10.3            -0.001   94.4
5 year     0.018  -0.014     0.1             0.001   95.3
Hazard
1 year     1.105  -0.365     0.0            -0.001   95.6
2 year     1.064  -0.009    84.9            -0.005   94.7
3 year     0.796   0.503     0.0             0.022   94.4
4 year     0.681   0.824     0.0             0.013   96.1
5 year     0.709   0.979     0.0            -0.064   91.2
an exponential or a Gompertz baseline hazard function, the interaction of covariates
with linear time can be included, to result in a hazard function which can still be
directly integrated, and subsequently directly solved for the simulated survival time,
t. Similarly, under a Weibull distribution, an interaction can be formed between the
covariates and log time, equivalent to changing the shape parameter, γ.
For example, consider a binary covariate, X1, which takes values 0 or 1. Under a
Gompertz baseline hazard function, I can invoke non-proportional hazards by interact-
ing X1 with linear time, t:
h(t) = λ exp(γt+ β1X1 + β2X1t) (4.16)
Equation (4.16) can be re-arranged,
h(t) = λ exp [(γ + β2X1)t+ β1X1] (4.17)
integrated to obtain the cumulative hazard function

H(t) = ∫_0^t λ exp[(γ + β2X1)u + β1X1] du = [λ exp(β1X1) / (γ + β2X1)] {exp[(γ + β2X1)t] − 1}   (4.18)

I therefore have, from Equation (4.1), and letting T be the simulated survival time

U = exp(−[λ exp(β1X1) / (γ + β2X1)] {exp[(γ + β2X1)T] − 1})   (4.19)

which can be inverted and solved for T, the simulated survival time

T = [1 / (γ + β2X1)] log(−[(γ + β2X1) / (λ exp(β1X1))] log(U) + 1)   (4.20)
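Equation (4.20) can be applied directly; a short hypothetical Python sketch follows (the chapter's implementation is the Stata survsim package), with illustrative parameter values λ = 0.1, γ = 0.05, β1 = −0.5 and β2 = 0.1, which are not values used in the chapter.

```python
import numpy as np

def simulate_gompertz_tde(n, lam, gamma, beta1, beta2, seed=None):
    """Closed-form simulation from Equation (4.20): a Gompertz baseline with
    a binary covariate X1 interacted with linear time, Equation (4.16)."""
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(1, 0.5, size=n)
    u = rng.uniform(size=n)
    a = gamma + beta2 * x1            # effective Gompertz slope, gamma + beta2*X1
    t = np.log(-(a / (lam * np.exp(beta1 * x1))) * np.log(u) + 1.0) / a
    return t, x1, u

t, x1, u = simulate_gompertz_tde(5000, lam=0.1, gamma=0.05,
                                 beta1=-0.5, beta2=0.1, seed=3)
```

Substituting the simulated times back into the cumulative hazard of Equation (4.18) recovers the original uniform draws, confirming the inversion.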
This of course can be extended to multiple time-dependent effects; however, if I wish
to use a more complex distribution, or indeed a more complex function of time to
interact with covariates, I once again have analytically intractable and non-invertible
cumulative hazard functions.
4.7.2. Complex hazard functions
Incorporating time-dependent effects when simulating more complex hazard functions,
returns to the scenario where both numerical integration and iterative root-finding
procedures are required. For example, this arises when including a time-dependent
effect into the 2-component mixture Weibull model.
h(t) = h0(t) exp(β1(t)X) (4.21)
where β1(t) is a general function of time, t, such as a simple linear term, or something
more complex such as a fractional polynomial or spline function. This of course can
be extended to multiple time-dependent effects, with alternative complex functions of
time.
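This general setting can be sketched as follows (hypothetical Python code; the chapter uses Stata's survsim), with an illustrative time-dependent log hazard ratio β1(t) = −0.7 + 0.01t + 0.4 log(t) and an illustrative Weibull baseline, neither of which is prescribed by the framework itself.

```python
import numpy as np
from scipy.optimize import brentq

nodes, weights = np.polynomial.legendre.leggauss(30)

def simulate_tde(u, x,
                 beta1=lambda t: -0.7 + 0.01 * t + 0.4 * np.log(t),
                 h0=lambda t: 0.1 * 1.5 * t**0.5,
                 upper=200.0):
    """Simulate from h(t) = h0(t) * exp(beta1(t) * x), Equation (4.21),
    with Gauss-Legendre quadrature nested inside Brent's method."""
    def g(t):
        z = 0.5 * t * nodes + 0.5 * t              # map nodes onto (0, t)
        H = 0.5 * t * np.sum(weights * h0(z) * np.exp(beta1(z) * x))
        return np.exp(-H) - u
    return brentq(g, 1e-8, upper, xtol=1e-8)

t_treat = simulate_tde(0.5, x=1)      # diminishing protective effect
t_control = simulate_tde(0.5, x=0)    # plain Weibull baseline
```

Because the log hazard ratio is negative over early follow-up, the treated median event time exceeds the control median, as required.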
4.7.3. Example simulation study
I now conduct a simulation study assessing the performance of the Weibull and
Royston-Parmar models (using 5 degrees of freedom for the baseline) under propor-
tional hazards, when simulating a time-dependent diminishing treatment effect. I fur-
ther apply a Royston-Parmar model, with 5 degrees of freedom for the baseline, allowing
for a time-dependent treatment effect with 3 degrees of freedom. In all models I assess
estimates of the hazard and survival functions in each treatment group. Survival times
are simulated from the baseline hazard function shown in Equation (4.12).
For each of 1000 repetitions I simulate 1000 survival times, incorporating a binary
and continuous covariate, representing treatment, X1i ∼ Bin(1, 0.5) and age, X2i ∼N(65, 12), and assume administrative censoring at 5 years. I simulate a time-dependent
treatment effect under the following
β1(t) = −0.7 + 0.01t+ 0.4 log(t) (4.22)
and proportional age effect of β2 = 0.02. Figure 4.6 illustrates the underlying log
hazard ratio shown in Equation (4.22), indicating the diminishing treatment effect.
[Figure 4.6. Example time-dependent log hazard ratio representing a diminishing treatment effect.]
The true hazard ratio of the treatment effect at 1, 2 and 5 years is 0.502, 0.668 and
0.994, respectively, to represent a diminishing treatment effect. Computation time to
generate the 1000 datasets was 173 seconds on an Intel Core i5 2.5GHz CPU. In each
repetition I also monitor estimates of the log hazard ratios for the effect of age.
Results are presented in Table 4.4. As in Section 4.6.4, I observe very poor perfor-
mance when using a Weibull model, with large bias in the estimates of the hazard and
survival functions for both treatment groups. I observe improved performance under
the proportional hazards Royston-Parmar model; however, some bias and poor coverage
is seen in estimates of the hazard and survival functions, particularly in the treatment
group, but as this model assumes proportional hazards and the true model has non-
proportional hazards, this is to be expected. Under the Royston-Parmar model allowing
for a time-dependent treatment effect I observe much reduced bias and improved cover-
age probabilities, indicating that the model has captured the complex time-dependent
effect, even though I do not fit the true underlying model.
4.8. Simulating time-varying covariates
Time-varying covariates occur frequently in medical research. In cancer clinical
trials the occurrence of treatment switching or non-compliance, occurs when a patient
switches from, for example, the standard therapy to the new treatment, often around
the time of progression. An area of increasing interest in the biostatistical literature
is the joint modelling of longitudinal and survival data, where a repeatedly measured
biomarker is observed, and it is of interest to investigate the relationship of this time-
varying biomarker to survival. Joint modelling will be comprehensively described in
Chapters 6 to 10.
Recently Austin (2012) extended the methods of Bender et al. (2005) to simu-
late time-varying covariates of three types: first, a dichotomous time-varying covariate
that can change at most once; second, a continuous time-varying covariate; third, a
dichotomous time-varying covariate where subjects can switch groups multiple times.
Austin derived closed form expressions, including time-independent covariates, under
the exponential, Weibull and Gompertz distributions.
Table 4.4. Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from proportional hazards Weibull and Royston-Parmar models, and non-proportional hazards Royston-Parmar model.

                     Weibull PH               RP PH                    RP NPH
         Truth   Bias     95% Coverage    Bias     95% Coverage    Bias     95% Coverage
β2       0.020   0.003    74.5             0.000   94.0             0.000   94.6
Survival (X1 = 0, X2 = 65)
1 year   0.602   0.035    39.0             0.030   61.7             0.000   94.9
2 year   0.189   0.062     5.0             0.006   91.3             0.001   94.0
3 year   0.076  -0.006    79.6            -0.010   78.9            -0.000   94.3
4 year   0.037  -0.022     0.7            -0.012   42.7            -0.001   95.5
5 year   0.018  -0.016     0.0            -0.008   38.4             0.001   95.7
Hazard (X1 = 0, X2 = 65)
1 year   1.105  -0.374     0.0            -0.058   90.1            -0.005   95.4
2 year   1.064   0.062    74.5             0.093   81.7            -0.012   94.8
3 year   0.796   0.654     0.0             0.224   10.6             0.025   95.0
4 year   0.681   1.054     0.0             0.268   17.9             0.025   95.6
5 year   0.709   1.286     0.0             0.223   54.0            -0.046   95.5
Survival (X1 = 1, X2 = 65)
1 year   0.803   0.180     0.0             0.168   61.7            -0.000   94.8
2 year   0.406   0.281     0.0             0.206   91.3             0.001   94.6
3 year   0.208   0.157     0.0             0.138   78.9            -0.001   94.8
4 year   0.112   0.061     0.0             0.086   42.7            -0.002   95.3
5 year   0.059   0.018     7.2             0.055   38.4             0.001   95.5
Hazard (X1 = 1, X2 = 65)
1 year   0.554  -0.707     0.0            -0.511    0.0             0.000   94.1
2 year   0.711  -0.451     0.0            -0.408    0.0            -0.008   94.2
3 year   0.632  -0.007    90.2            -0.218    0.1             0.020   93.6
4 year   0.612   0.263     0.0            -0.144   37.5             0.011   95.5
5 year   0.705   0.377     0.0            -0.181   38.1            -0.079   88.6

PH - proportional hazards, RP - Royston-Parmar model, NPH - non-proportional hazards
Under the simulation framework described above, I generalise the approach of
Austin (2012) to incorporate any combination of time-varying covariates, with a user-
defined baseline hazard function allowing more biologically realistic hazard functions,
although a standard parametric distribution can of course still be used.
4.8.1. Simulating treatment switching
In this scenario I wish to simulate a time-varying binary covariate. I define X1 to
represent initial treatment a patient is randomised to, with treatment A (X1 = 0) and
treatment B (X1 = 1). I assume patients were randomised to treatment arms at t = 0.
For simplicity I allow patients to switch arm at most once. I also include a binary
covariate which represents disease severity, X2, with each patient having a 40% chance
of having a bad prognosis (X2 = 1), which increases a patient’s event rate. Under a
general baseline hazard function, h0(t), the hazard function is written as
h(t) = h0(t) exp{β1[I(t ≤ ts)X1 + I(t > ts)(1 − X1)] + β2X2}   (4.23)
where β1 is the log hazard ratio for the effect of treatment, which in this case I assume
is the same regardless of whether patients switched or not. In this example I assume
that a patient initially randomised to treatment (X1 = 1) has a treatment effect of
exp(β1) until their switching time, ts, after which their hazard ratio is 1. Thus, the
time-dependence is introduced through the indicator functions I(t ≤ ts) and I(t > ts).
The switching times need to be generated; for example, I can generate the potential
switching times from a uniform distribution which depends on disease severity
(X2). Endogenous/non-ignorable treatment switching can be created if the variable
X2 is deleted and not available for analysis. This scenario can be easily extended
to allow for any number of switches. Alternatively, I could first generate a vector of
survival times, ts, to represent time to progression, using the simulation framework
described in the previous sections.
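The switching scenario above can be sketched numerically: draw U ~ U(0,1) and solve H(t) = −log U for t, evaluating the cumulative hazard by Gauss-Legendre quadrature. The sketch below is illustrative only, not the survsim implementation: the parameter values, the Weibull baseline, the upper search bound of 500, and the bisection root finder are all assumptions.

```python
import numpy as np

# Illustrative parameter values (assumed, not taken from the thesis)
beta1, beta2 = -0.5, 0.3

def h0(t):
    # Weibull baseline hazard with lambda = 0.1, gamma = 2 (illustrative)
    return 0.1 * 2.0 * t

def hazard(t, x1, x2, ts):
    # Equation (4.23): the treatment indicator switches at time ts
    tde = np.where(t <= ts, x1, 1 - x1)
    return h0(t) * np.exp(beta1 * tde + beta2 * x2)

nodes, weights = np.polynomial.legendre.leggauss(30)

def cumhaz(t, x1, x2, ts):
    # Gauss-Legendre approximation of H(t); the hazard is discontinuous at
    # ts, so accuracy could be improved further by integrating piecewise
    u = 0.5 * t * (nodes + 1)  # map nodes from [-1, 1] to [0, t]
    return 0.5 * t * np.sum(weights * hazard(u, x1, x2, ts))

def sim_time(x1, x2, ts, rng):
    # Invert S(t) = exp(-H(t)): solve H(t) = -log(U) by bisection
    target = -np.log(rng.uniform())
    lo, hi = 1e-8, 500.0  # assumed upper bound for the search
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cumhaz(mid, x1, x2, ts) < target else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
x1 = rng.integers(0, 2, 1000)      # randomised arm
x2 = rng.binomial(1, 0.4, 1000)    # 40% chance of poor prognosis
ts = rng.uniform(0.5, 3.0, 1000)   # potential switching times (assumed uniform)
times = np.array([sim_time(a, b, c, rng) for a, b, c in zip(x1, x2, ts)])
```

With the linear baseline hazard used here the quadrature is exact away from the switch point, so the generated times can be checked directly against the analytic cumulative hazard.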
4.8.2. Simulating a continuous time-varying biomarker and survival in-
corporating random effects and covariates measured with error
Here I wish to simulate a continuous biomarker, measured with error, exhibiting a
linear trend, under the following model
m(t) = β0i + β1it+ δui (4.24)
where
βi ∼ N(β,Σ) (4.25)
and ui is a vector of baseline covariates with associated coefficients δ. By including
the trajectory function, m(t), in the linear predictor of the survival model, multiplied
by an association parameter α, I can use the simulation algorithm described in Section
4.6.3 to directly simulate survival times under a joint model framework (Wulfsohn and
Tsiatis, 1997).
h(t) = h0(t) exp{ψᵀvi + αm(t)}   (4.26)
where h0(t) is our user defined baseline hazard function, vi is a vector of baseline
covariates with associated log hazard ratios ψ.
Following the simulation of survival times, I can then construct any measurement
schedule for the longitudinal outcome, using Equation (4.24), and subsequently calcu-
late the observed longitudinal measurements. To complete the joint model framework,
measurement error in the longitudinal outcome can be incorporated simply by drawing
the observed longitudinal values from N(m(t), σe²), where σe² is the measurement error
variance. This example further illustrates the ease with which random covariate
effects can be incorporated. This framework will be utilised in Chapters 7, 8 and
9 to simulate joint longitudinal-survival data.
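The joint-model scenario can be sketched in the same way: the trajectory m(t) enters the hazard through the association parameter α as in Equation (4.26), and measurement error is added after the survival times are generated. All numeric values below (baseline hazard, random-effects covariance, α, visit schedule, and the truncation of follow-up at t = 100 for subjects whose cumulative hazard never reaches the target) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
alpha, psi = 0.25, -0.4                      # association and treatment effect (assumed)
Sigma = np.array([[0.5, 0.1], [0.1, 0.05]])  # random-effects covariance (assumed)
b = rng.multivariate_normal([1.0, 0.2], Sigma, n)  # (beta0i, beta1i) per subject
trt = rng.integers(0, 2, n)                  # baseline covariate v_i

def traj(i, t):
    return b[i, 0] + b[i, 1] * t             # m_i(t) = beta0i + beta1i * t

nodes, weights = np.polynomial.legendre.leggauss(30)

def cumhaz(i, t):
    # h(t) = h0(t) exp(psi*trt + alpha*m_i(t)), with an illustrative Weibull h0
    u = 0.5 * t * (nodes + 1)
    h = 0.05 * 1.2 * u ** 0.2 * np.exp(psi * trt[i] + alpha * traj(i, u))
    return 0.5 * t * np.sum(weights * h)

def sim_time(i):
    # Bisection inversion of the cumulative hazard; subjects whose hazard
    # decays before reaching the target are truncated at t = 100 (assumed)
    target = -np.log(rng.uniform())
    lo, hi = 1e-8, 100.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cumhaz(i, mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

t_event = np.array([sim_time(i) for i in range(n)])

# Yearly measurement schedule up to the event, with measurement error
sigma_e = 0.3
visits = [np.arange(0.0, t_event[i], 1.0) for i in range(n)]
y = [traj(i, v) + rng.normal(0.0, sigma_e, v.size) for i, v in enumerate(visits)]
```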
4.9. Simulating a censoring distribution
In the previous examples I have assumed an administrative censoring time, i.e. a
maximum follow-up time for which each patient can be observed. In practice, intermittent
censoring is often observed, which may also be simulated. All of the scenarios and
techniques described above can be used to generate censoring times. By simulating a set
of event times and a second set of censoring times, for each patient, I can simply take the
minimum to obtain the observed survival time, and consequently the event indicator.
Furthermore, by making the censoring distribution dependent on covariates (be they
baseline covariates, with time-dependent effects, or time-varying), I can incorporate
informative censoring (Siannis et al., 2005). Alternatively, I could simulate survival
times, and then draw from a uniform distribution between the minimum and maximum
follow-up times to define a censoring fraction.
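Combining event and censoring times is a one-line operation; the sketch below is minimal and all distributions and the 5-year administrative limit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
event_t = rng.weibull(1.3, n) / 0.5  # illustrative event times
cens_t = rng.exponential(4.0, n)     # illustrative dropout (censoring) times
admin = 5.0                          # administrative censoring at 5 years (assumed)

# Observed time is the minimum of event, dropout and administrative times;
# the event indicator records whether the event time came first
obs_t = np.minimum(event_t, np.minimum(cens_t, admin))
died = (event_t <= np.minimum(cens_t, admin)).astype(int)
```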
4.10. survsim
The methodology developed in this chapter to simulate survival data has been
implemented in the user friendly Stata package survsim, which can be downloaded
from within Stata by typing ssc install survsim. Example code to simulate all of
the scenarios described in this chapter can be found in the associated publications, in
Appendix A.
4.11. Discussion
I have described a general framework for the generation of survival data, incorpo-
rating any combination of complex baseline hazard functions, time-dependent effects,
time-varying covariates, delayed entry, random effects and covariates measured with
error. This centres on scenarios where the simulated survival time cannot be defined
in a closed form expression.
Previous work on the simulation of survival data includes using standard baseline distri-
butions, such as the exponential, Weibull and Gompertz, with time-invariant covariates
(Bender et al., 2005). Mackenzie and Abrahamowicz (2002) described techniques to al-
low for time-dependent effects, and allowed specification of the marginal distribution
of event times and covariate distributions. A recent paper by Austin (2012) provided
closed form expressions to incorporate 3 types of time-varying covariates, which built
on work by Leemis (1987), who described techniques to invert the cumulative hazard
function with a single time-varying covariate. Furthermore, Sylvestre and Abrahamow-
icz (2008) describe two algorithms (permutation based and binomial model based) to
generate survival times with time-varying covariates. Finally, Royston (2012) provided
a method to simulate from parametric models that use restricted cubic splines on the log
cumulative hazard scale, using the Newton-Raphson root-finding technique described
in Section 4.6.1.3.
The general approach I have described relies on numerical integration to evaluate
analytically intractable hazard functions. In my experience, 15 to 20 Gauss-Legendre
quadrature points is often sufficient to provide accurate generation of survival times;
however, in the associated software I use 30 nodes as the default, as computation time is
often minimal. As with any estimation method which utilises numerical techniques, the
accuracy of the generation process can be assessed by defining a seed and changing the
tolerance of the root finder, and/or the number of quadrature nodes, and establishing
that the generated survival times do not change.
I have illustrated the simulation approach through a variety of simulation studies
and examples. In particular, by simulating from a complex underlying distribution, I
have shown that moderate bias can be observed in estimates of the log hazard ratio for
a treatment effect, when fitting a standard Weibull proportional hazards model.
Although in this chapter I have extolled the benefits of simulating from distribu-
tions beyond the standard choices, it must be stated that in many settings a simpler
distribution may be adequate. For example, if fitting Cox models under proportional
hazards and only the hazard ratio is of interest, then the baseline distribution used is
inconsequential and therefore a simpler distribution should take preference. However,
as described above, if evaluating parametric methods or incorporating time-dependent
effects, then using a more complex distribution can provide much more realistic sce-
narios in order to fully assess the methods being evaluated.
Further extensions not described in this thesis include incorporating a cure propor-
tion. This can be easily achieved by defining a mixture or non-mixture cure hazard
function. This framework can also be applied in the generation of competing risks data,
be it through cause-specific hazards or the approach of Beyersmann et al. (2009).
Given the inherent requirement of simulation studies to assess the statistical prop-
erties and performance of current and novel methods, I believe this framework can
play an important role in allowing the generation of more biologically realistic survival
data, incorporating much more complex scenarios. For example, the two-component
mixture distribution described in Section 4.6.1.1 has recently been used to simulate
joint model data from a baseline (cumulative) hazard function, to assess the use of
splines to capture complex baseline hazard functions (Crowther et al., 2012a), which
will be considered further in Chapter 7. Although I have concentrated on parametric
survival models in this chapter, the framework is entirely applicable to examining the
performance of the Cox model in any of the scenarios described (Cox, 1972).
CHAPTER 5
A general framework for parametric survival analysis
5.1. Chapter outline
In this chapter I develop a general framework for the parametric analysis of sur-
vival data. Building on the simulation framework described in Chapter 4, numerical
integration will be used to allow the estimation of a parametric survival model, with
essentially any general hazard or log hazard function. I will describe a variety of ex-
tensions such as complex time-dependent effects, relative survival (excess mortality
models) and cluster robust standard errors. Concentrating on the special case of using
restricted cubic splines, on the log hazard scale, I will further show how the estimation
routine can be substantially improved using a combined analytic/numeric approach.
Example Stata code, using the publicly available stgenreg package, will be used to
display the flexibility and use of the modelling framework in developing novel survival
models.
This chapter is based on two recent papers; the first published in the Journal
of Statistical Software (Crowther and Lambert, 2013b), and the second published in
Statistics in Medicine (Crowther and Lambert, 2014), copies of which can be found in
Appendix B.
5.2. Introduction
The most popular tool for analysing survival data remains the Cox proportional
hazards model (Cox, 1972), which avoids making any assumptions for the shape of
the baseline hazard function. One of the reasons the Cox model remains the preferred
choice over parametric models is that parametric models available in standard software
are often not flexible enough to capture the underlying shape of the hazard function
seen in real data, illustrated in Section 4.3 of Chapter 4. Not having to specify the
baseline hazard function can be seen as both a strength and a weakness. The virtue of
the Cox model is that no assumption need be made about the functional form of the
baseline hazard, since any assumption made could of course be incorrect. Conversely, Cox models
are estimated using partial likelihood, which is less efficient than full maximum
likelihood when the model is correctly specified (Collett, 2003), and when interest lies in
measures of absolute risk it is much more convenient to obtain them parametrically.
Parametric models, such as those described in Chapter 2, remain a standard tool
for the analysis of survival data. As described in Section 2.9, knowledge of the baseline
hazard or survival allows absolute risk predictions over time (e.g. in prognostic models),
enables hazards ratios to be translated back to the absolute scale (e.g. useful for cal-
culating the number needed to treat), and further clinically meaningful statistics such
as differences in survival proportions (Lambert et al., 2010a). Parametric models are
also useful where extrapolation is required, such as in the economic decision modelling
framework (Weinstein et al., 2003; Latimer, 2013), and when capturing time-dependent
effects (Lambert et al., 2011). In particular, as described in Section 2.8.2, many authors
have shown the benefits of the Royston-Parmar flexible parametric model, as hazard
ratio estimates closely match those from a Cox model, whilst gaining the advantages
of undertaking a parametric approach, where the flexible modelling of
the baseline log cumulative hazard function and of time-dependent effects are particu-
larly appealing aspects of the modelling framework (Miladinovic et al., 2012; Rooney
et al., 2013). However, modelling on the log cumulative hazard scale is not without
its disadvantages; when there are multiple time-dependent effects there are difficulties
in interpretation of time-dependent hazard ratios, since these will vary over different
covariate patterns, even with no interaction between these covariates (Royston and
Lambert, 2011). Moving to the more traditional (log) hazard scale, whilst maintaining
flexibility, is desirable.
The traditional approach to estimation of parametric models is through maximum
likelihood, as discussed in Section 2.10. This is relatively simple when using a known
probability distribution function, such as the Weibull or Gompertz. Many commonly
used parametric survival models are implemented in a variety of software packages,
such as the streg package in Stata (StataCorp., 2011), survreg (Therneau, 2012) in
R (R Development Core Team, 2012) and LIFEREG in SAS (SAS Institute Inc., 2008).
However, every parametric model makes underlying assumptions, differing in the
assumed shape of the hazard and in how covariate effects act (proportional hazards,
accelerated failure time); for example, the widely used Weibull proportional hazards model assumes a
monotonically increasing or decreasing baseline hazard rate. Such assumptions can be
considered restrictive, leading to the development of other more flexible approaches
(Royston and Parmar, 2002; Royston and Lambert, 2011).
In this chapter, I present a general framework for the parametric analysis of sur-
vival data. Through the use of numerical integration, introduced in Chapter 4, any
hazard or log hazard function which can be written in a closed form expression can be
fitted through maximum likelihood. This allows complex extensions to standard para-
metric models, for example, modelling the log baseline hazard function using splines or
fractional polynomials, as well as complex time-dependent effects; methods that are un-
available in standard software. I will further extend the framework to relative survival
(excess mortality models) and cluster robust standard errors.
Concentrating on the special case of using restricted cubic splines, on the log hazard
scale to ensure positivity of the hazard function, I will further show how the estimation
routine can be substantially improved using a combined analytic/numeric approach.
Various types of splines have been used in the analysis of survival data, predominantly
on the hazard scale which results in an analytically tractable cumulative hazard func-
tion. For example, M-splines, which by definition are non-negative can be directly
applied on the hazard scale, due to the positivity condition. Kooperberg et al. (1995)
proposed using various types of splines on the log hazard scale, such as piecewise linear
splines (Kooperberg et al., 1995; Kooperberg and Clarkson, 1997).
The framework will be illustrated using the publicly available Stata package stgenreg
(Crowther and Lambert, 2013b), to display the flexibility and use of the modelling
framework in developing novel survival models. A further package, strcs, implements
the combined analytic/numerical approach.
5.3. A general framework for the parametric analysis of survival data
I now return to the estimation of parametric survival models, first described in
Section 2.10. Once again, the log-likelihood contribution of the ith patient, allowing for
right censoring and delayed entry (left truncation), can be written as
logLi = log{ f(ti)^di S(ti)^(1−di) / S(t0i) }

      = di log f(ti) + (1 − di) log S(ti) − log S(t0i)   (5.1)
where t0i and ti are the observed entry and survival/censoring times for the ith patient.
Using Equation (5.1) I can directly maximize the log-likelihood if using known proba-
bility density and survival functions. Alternatively, as shown in Section 2.10, Equation
(5.1) can be written as
logLi = di log h(ti) − ∫_{t0i}^{ti} h(u) du   (5.2)
The important aspect to note from Equation (5.2) is that the log-likelihood can also be
maximised if only the hazard function is known. The maximisation of such a specified
hazard model relies on being able to evaluate the integral in Equation (5.2). If I
propose to use such functions as fractional polynomials or splines to model a complex
baseline (log) hazard function, or to incorporate complex time-dependent effects, then I
have a situation where this integral cannot always be evaluated analytically, motivating
alternative approaches.
5.3.1. Numerical integration
I propose to use numerical quadrature, introduced in Chapter 4, to evaluate the cu-
mulative hazard, and hence maximise the likelihood in Equation (5.2). This allows
the estimation of a parametric survival model, specifying any function for the baseline
hazard, satisfying h(t) > 0 for all t > 0.
As described in Section 4.6.3.2 of Chapter 4, Gaussian quadrature allows the eval-
uation of an analytically intractable integral through a weighted sum of a function
evaluated at a set of pre-defined points, known as nodes (Stoer and Burlirsch, 2002).
The integral over [t0i, ti] in Equation (5.2) must be changed to an integral over [−1, 1]
using the following change of interval rule
∫_{t0i}^{ti} h(x) dx = ((ti − t0i)/2) ∫_{−1}^{1} h( ((ti − t0i)/2) x + (t0i + ti)/2 ) dx
Gaussian quadrature can then be applied as follows
∫_{t0i}^{ti} h(x) dx ≈ ((ti − t0i)/2) Σ_{j=1}^{m} wj h( ((ti − t0i)/2) xj + (t0i + ti)/2 )
where xj is the jth quadrature node with associated weight, wj , where j = 1, . . . ,m.
This transformation allows the incorporation of delayed entry directly into the model
framework. As in Chapter 2, the default choice of quadrature is Gauss-Legendre.
Equation (5.2) then becomes
logLi ≈ di log h(ti) − ((ti − t0i)/2) Σ_{j=1}^{m} wj h( ((ti − t0i)/2) xj + (t0i + ti)/2 )   (5.3)
Equation (5.3) can then be maximised using standard methods described in Section
2.10. The accuracy of the numerical integral depends on the number of quadrature
nodes, m, with node locations dependent on the type of quadrature chosen. As with all
methods which use numerical integration, the stability of maximum likelihood estimates
should be established by using an increasing number of quadrature nodes. This is
explored further in Sections 5.9.1 and 5.10.1, and within the joint model framework in
Chapter 7.
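Equation (5.3) is straightforward to evaluate for any closed-form (log) hazard. The sketch below checks the quadrature likelihood contribution against the analytic answer for a Weibull hazard, where the cumulative hazard λt^γ is available in closed form; all parameter values are illustrative.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(30)

def loglik_i(log_h, t0, t, d):
    # Equation (5.3): d * log h(t) minus the Gauss-Legendre estimate of the
    # cumulative hazard over [t0, t], which handles delayed entry directly
    u = 0.5 * (t - t0) * nodes + 0.5 * (t0 + t)
    cumhaz = 0.5 * (t - t0) * np.sum(weights * np.exp(log_h(u)))
    return d * log_h(t) - cumhaz

# Weibull hazard h(t) = lam * gam * t^(gam-1) (illustrative values),
# for which H(t) - H(t0) = lam * t^gam - lam * t0^gam analytically
lam, gam = 0.2, 1.5
log_h = lambda t: np.log(lam * gam) + (gam - 1) * np.log(t)
t0, t, d = 0.5, 3.0, 1
exact = d * log_h(t) - (lam * t ** gam - lam * t0 ** gam)
```

Because the integrand is smooth over [t0, t], 30 Gauss-Legendre nodes reproduce the analytic value to near machine precision here.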
5.4. Time-dependent effects
Time-dependent effects were introduced in Section 4.7, where the effect of a co-
variate is allowed to vary over time, also commonly known as non-proportional haz-
ards. Time-dependent effects are incorporated seamlessly into this modelling frame-
work. Within the general hazard function, an interaction can be formed between a
covariate(s) and any function of time, to provide the desired level of flexibility.
h(t) = h0(t) exp [X1β1 +X2β2(t)] (5.4)
where h0(t) is the baseline hazard function which could be anything from a simple
exponential to something more complex such as a spline based function. Here, X1
is a vector of time-independent (baseline) covariates with associated log hazard ratios,
β1, and X2 is a further baseline covariate (usually ∈ X1), with β2(t) a function of
time, t, such as a simple linear term, or something more complex such as a fractional
polynomial or spline function. The extension to multiple time-dependent effects follows
naturally.
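As a sketch of Equation (5.4), take β2(t) = β20 + β21 log(t), a simple assumed form (the thesis framework allows splines or fractional polynomials here); the resulting hazard ratio for X2 then varies smoothly over time. The baseline hazard and all parameter values below are illustrative.

```python
import numpy as np

def hazard(t, x1, x2, beta1=-0.5, b20=0.2, b21=-0.3):
    # Equation (5.4) with beta2(t) = b20 + b21*log(t); the baseline is an
    # illustrative Weibull hazard (all parameter values assumed)
    h0 = 0.1 * 1.3 * t ** 0.3
    return h0 * np.exp(x1 * beta1 + x2 * (b20 + b21 * np.log(t)))

t = np.linspace(0.1, 5.0, 50)
hr_x2 = hazard(t, 0, 1) / hazard(t, 0, 0)  # time-dependent hazard ratio for X2
```

At t = 1 the hazard ratio is exp(β20), and with β21 < 0 it decreases over follow-up, a common pattern for effects that wane with time.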
5.5. Excess mortality models
In population-based studies where interest lies in mortality associated with a par-
ticular disease, it is not always possible to use cause of death information. This may
be due to this information not being available or it considered too unreliable to use
(Begg and Schrag, 2002; Fall et al., 2008). In these situations it is common to model
and estimate excess mortality by comparing the mortality experienced amongst a dis-
eased population to that expected amongst a disease free population. The methods
have most commonly been applied to population-based cancer studies, but have also
been used in studies of HIV (Bhaskaran et al., 2008) and cardiovascular disease (Nelson
et al., 2007). The total mortality (hazard) rate for the ith patient, hi(t), is partitioned
into the expected mortality rate, h∗i (t), and the excess mortality rate associated with
a diagnosis of disease, λi(t).
hi(t) = h∗i (t) + λi(t) (5.5)
The expected mortality rate, h∗i (t), is usually obtained from national or regional life
tables stratified by age, calendar year, sex and sometimes other covariates such as
socio-economic class (Coleman et al., 1999).
Transforming to the survival scale gives,
Si(t) = S∗i (t)Ri(t) (5.6)
where Ri(t) is known as the relative survival function and S∗i (t) is the expected survival
function. The effect of covariates on the excess mortality rate is usually considered to
be multiplicative and so covariates, Xi are modelled as,
hi(t) = h∗i (t) + λ0(t) exp(Xiβ) (5.7)
where h∗i (t) is the expected mortality rate at time t, λ0(t) is the baseline excess hazard
function and the β’s are log excess hazard ratios (also referred to as log excess mortality
rate ratios). This model assumes proportional excess hazards, but in population-based
cancer studies this assumption is rarely true and there has been substantial work on
methods to fit models that relax the assumption of proportionality (Bolard et al., 2002;
Giorgi et al., 2003; Dickman et al., 2004; Nelson et al., 2007). A further fundamental
assumption is that h∗i (t) and S∗i (t) are assumed known.
A common model for analysing excess mortality is an extension of Royston-Parmar
models (Nelson et al., 2007), which were described in Section 2.8.2. With multiple
time-dependent effects interpretation of hazard ratios can be complicated and so there
are advantages to modelling on the log hazard scale instead. For example, in a model
on the log cumulative excess hazard scale where both age group and sex are modelled as
time-dependent effects, but with no interaction between the covariates, the estimated
time-dependent hazard ratio for sex would be different in each of the age groups. In a
model on the log excess hazard scale, this would not be the case (Royston and Lambert,
2011). Within the relative survival field, previous work by Remontet et al. (2007) used
numerical integration, but used quadratic splines, limited to only two knots, with no
restriction on the splines. Here I generalise the approach.
The log-likelihood for an excess mortality model is defined as,
logLi = di log{h∗(ti) + λ(ti)} + log{S∗(ti)} + log{R(ti)} − log{S∗(t0i)} − log{R(t0i)}   (5.8)
Since the terms log S∗(ti) and log S∗(t0i) do not depend on any model parameters
they can be omitted from the log-likelihood function for purposes of estimation, which
becomes
logLi = di log{h∗(ti) + λ(ti)} + log{R(ti)} − log{R(t0i)}   (5.9)
This means that in order to estimate the model parameters the expected mortality rate
at the time of death, h∗(ti), is needed for subjects that experience an event.
Under the general parametric framework using numerical quadrature, Equation
(5.9) becomes
logLi = di log{h∗(ti) + λ(ti)} − ∫_{t0i}^{ti} λ(u) du

      ≈ di log{h∗(ti) + λ(ti)} − ((ti − t0i)/2) Σ_{j=1}^{m} wj λ( ((ti − t0i)/2) xj + (t0i + ti)/2 )   (5.10)
which provides a general framework for relative survival models (Crowther and Lam-
bert, 2014), whereby Equation (5.10) can be maximised using techniques described in
Section 2.10 of Chapter 2.
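A sketch of the likelihood contribution in Equation (5.10): only the excess hazard λ(t) requires integration, while the expected rate h∗(ti) enters as a known constant at the event time. The values below are illustrative, and a constant excess rate is chosen so the answer is available analytically for checking.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(30)

def loglik_excess(lam, h_star_ti, t0, t, d):
    # Equation (5.10): total hazard = expected + excess at the event time;
    # only the excess cumulative hazard needs numerical integration
    u = 0.5 * (t - t0) * nodes + 0.5 * (t0 + t)
    cum_excess = 0.5 * (t - t0) * np.sum(weights * lam(u))
    return d * np.log(h_star_ti + lam(t)) - cum_excess

# Constant excess rate of 0.1/year (assumed), expected rate 0.02 at death;
# analytically the contribution is log(0.12) - 0.1 * 4
lam = lambda t: 0.1 * np.ones_like(np.asarray(t, dtype=float))
ll = loglik_excess(lam, 0.02, 0.0, 4.0, 1)
```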
5.6. Cluster robust standard errors
In standard survival analysis the assumption of independent observations is gener-
ally made; however, in some circumstances observations may be correlated if a group
structure exists within the data. For example in the analysis of recurrent event data,
where individual patients can experience an event multiple times, resulting in multiple
observations per individual (Prentice et al., 1981). In these circumstances, correlation
between observations of the same group would be expected. Failing to account for this
sort of structure can lead to underestimated standard errors.
Given V , our standard estimate of the variance covariance matrix, which is the
inverse of the negative Hessian matrix evaluated at the maximum likelihood estimates,
I define the robust variance estimate developed by Huber (1967), White (1980) and
White (1982)
Vr = V ( Σ_{i=1}^{N} ui′ ui ) V   (5.11)
where ui is the contribution of the ith observation to ∂ logL/∂β, with N the total
number of observations.
This can be extended to allow for a clustered structure. Suppose the N observations
can be classified into M groups, which I denote by G1, . . . , GM , where groups are now
assumed independent rather than individual level observations. The robust estimate of
variance becomes
Vr = V ( Σ_{j=1}^{M} uj(G)′ uj(G) ) V   (5.12)

where uj(G) is the contribution of the jth group to ∂ logL/∂β. More specifically, Rogers
(1993) noted that if the log-likelihood is additive at the observation level, where
logL = Σ_{i=1}^{N} logLi (5.13)
then with ui = ∂ logLi/∂β, which is estimated numerically as described in Section
2.10.2 of Chapter 2, I have
uj(G) = Σ_{i∈Gj} ui (5.14)
I follow the implementation in Stata which also incorporates a finite sample adjustment
of Vr* = (M/(M − 1)) Vr.
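The calculation can be sketched directly: given observation-level score contributions u_i (here random numbers standing in for ∂ logL_i/∂β) and a model-based variance matrix V, both assumed for illustration, the cluster robust estimate sums scores within clusters before forming the sandwich.

```python
import numpy as np

def cluster_robust(V, scores, groups):
    # Sandwich estimator V * (sum_j u_j' u_j) * V, with observation-level
    # score rows summed within clusters, times the M/(M-1) adjustment
    groups = np.asarray(groups)
    labels = np.unique(groups)
    M = labels.size
    U = np.vstack([scores[groups == g].sum(axis=0) for g in labels])
    meat = U.T @ U
    return (M / (M - 1)) * V @ meat @ V

rng = np.random.default_rng(4)
scores = rng.normal(size=(100, 3))  # stand-in for d logL_i / d beta
V = np.eye(3) * 0.01                # illustrative model-based variance
Vr = cluster_robust(V, scores, rng.integers(0, 10, 100))
```

When every observation forms its own cluster, the estimator reduces to the ordinary robust variance of Equation (5.11) multiplied by N/(N − 1).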
5.7. Prediction
Following a model fit, a variety of predictions can be calculated to assess model
fit, and produce clinically meaningful and interpretable plots. Of course, given that
I use numerical integration to calculate the cumulative hazard, and subsequently the
survival function in the estimation of a model, this means that numerical integration
must be used to calculate predictions of the cumulative hazard and survival functions.
For example, the fitted survival function is given by
S(t) = exp( − ∫_0^t h(u) du )   (5.15)
When approximating the integral in the above equation using quadrature, this becomes
S(t) ≈ exp( − (t/2) Σ_{j=1}^{m} wj h( (t/2) xj + t/2 ) )   (5.16)
This approximation means that the calculation of confidence intervals does not follow
naturally; however, given that Equation (5.16) is simply the exponential of a linear
combination (weighted sum), I can apply the delta method (Carstensen, 2004).
This method provides a way of calculating confidence intervals for the cumulative haz-
ard and survival functions, for any specified (log) hazard function. I compare this
method of calculating confidence intervals to an analytical form in Section 5.9.1.
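A sketch of a delta-method interval computed on the log cumulative hazard scale, with the gradient obtained by finite differences; the Weibull parameterisation, parameter values, and covariance matrix below are all assumed for illustration rather than taken from a fitted model.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(30)

def log_cumhaz(theta, t):
    # Weibull hazard with theta = (log lambda, log gamma), so both
    # parameters are unrestricted; H(t) evaluated by quadrature
    lam, gam = np.exp(theta)
    u = 0.5 * t * (nodes + 1)
    return np.log(0.5 * t * np.sum(weights * lam * gam * u ** (gam - 1)))

def surv_ci(theta, Vtheta, t, eps=1e-6):
    # Delta method on log H(t): numeric gradient, then back-transform the
    # interval to the survival scale via S = exp(-exp(log H))
    g = np.array([(log_cumhaz(theta + eps * np.eye(2)[k], t)
                   - log_cumhaz(theta - eps * np.eye(2)[k], t)) / (2 * eps)
                  for k in range(2)])
    lh = log_cumhaz(theta, t)
    se = np.sqrt(g @ Vtheta @ g)
    lo, hi = lh - 1.96 * se, lh + 1.96 * se
    return np.exp(-np.exp(lh)), np.exp(-np.exp(hi)), np.exp(-np.exp(lo))

theta = np.array([np.log(0.2), np.log(2.0)])  # illustrative estimates
Vtheta = np.diag([0.01, 0.005])               # illustrative covariance
s, s_lo, s_hi = surv_ci(theta, Vtheta, 2.0)
```

Working on the log cumulative hazard scale keeps the interval within (0, 1) after back-transformation, a common motivation for this choice.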
5.8. Implementation in Stata: The stgenreg package
A central theme of the thesis is to not only develop methods for the analysis of
complex data, but to provide user friendly Stata software which puts the methods
directly into practice, to enable researchers to utilise the methodology described.
The methodological framework described above is implemented in the Stata package
stgenreg (Crowther and Lambert, 2013b). In this section I describe the implementation
of the program, to aid understanding of the flexibility of the approach, and how
it can be used, often through a single line of code, to fit novel survival models.
5.8.1. Program implementation and syntax
The log-likelihood shown in Equation (5.3) is maximized using the Newton-Raphson
algorithm, with first and second derivatives estimated using finite differences, as imple-
mented in the ml command in Stata (Gould et al., 2010), described in Section 2.10.
The evaluator program has been optimised using Stata’s matrix programming lan-
guage, Mata. This provides computational benefits and use of the wide array of math-
ematical functions available for the user to specify in the definition of the hazard or
log hazard function. In addition, I have implemented special cases of specific functions
which allow the incorporation of restricted cubic splines or fractional polynomials into
the hazard or log hazard function (Durrleman and Simon, 1989; Royston and Altman,
1994), making it extremely simple for the user to incorporate such complex and flexible
functions.
When using stgenreg one of the options loghazard() or hazard() must be de-
fined. These specify a user-defined log hazard or hazard function. The function must
be defined in Mata code, with parameters specified in square brackets, for example
[ln lambda]. The use of Mata means that mathematical operations require a colon
(:) prefix, for example :+ instead of +. Time must be coded as #t. The user can
specify covariates or functions of time within the linear predictor of any parameter,
providing a highly flexible framework.
For example, a Weibull distribution can be specified using either the log hazard or
hazard function. Each parameter should be parameterised to contain the entire real
number line, for example, both λ and γ are restricted to be positive by modelling on
the log scale.
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ (exp([ln_gamma]):-1):*log(#t))
A linear predictor can be defined for any of the parameters, with the name of the
option defined as the name of the parameter specified in the loghazard() or hazard()
option. For example, a proportional hazards Weibull model can be fitted with covari-
ates treatment, age and sex by adding the option ln lambda(treatment age sex), as
follows
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ (exp([ln_gamma]):-1):*log(#t))
> ln_lambda(treatment age sex)
One of the key advantages of stgenreg is that I can incorporate a variety of func-
tions (including functions of time) into the linear predictor of any parameter. For
example, parameter [ln lambda] has an available option ln lambda(comp1 | comp2
| ...| compn), which can contain a variety of component functions to increase com-
plexity. Each compj can contain a variety of functions described in Table 5.1.
Table 5.1. Description of each component that can be included in the linear predictor of a parameter included in a stgenreg model.

Component                  Description
varlist [, nocons]         the user may specify a standard variable list within a
                           component section, with an optional nocons option
g(#t)                      where g() is any user defined function of #t written
                           in Mata code, for example #t:^2
#rcs(options)              creates restricted cubic splines of either log time or
                           time. Options include df(int), the number of degrees
                           of freedom; noorthog, which turns off the default
                           orthogonalisation; time, which creates splines using
                           time rather than log time, the default; and
                           offset(varname), to include an offset when calculating
                           the splines. See rcsgen in Stata for more details.
#fp(numlist [, options])   creates fractional polynomials of time with powers
                           defined in numlist. If 0 is specified, log time is
                           generated. The only current option is offset(), which
                           is consistent with that described in #rcs() above.
varname:*f(#t)             to include time-dependent effects, where f(#t) is one
                           of #rcs(), #fp() or g().
Finally, all standard options of the ml suite in Stata can be used when fitting a
stgenreg model, such as constraints() which allow the user to constrain the value of
any coefficient to be a particular constant, and weights which can allow the extension
for use in case-cohort studies.
5.8.2. Predictions
A variety of predictions can be obtained following the estimation of a model. These
include the hazard, survival and cumulative hazard functions. The standard Stata syn-
tax to obtain predictions following a model fit is predict newvarname, statistic.
So for example, to obtain the fitted survival function, you simply specify predict
surv1, survival. Extended prediction options unavailable in standard software include:
zeros, which obtains baseline predictions; at(), which obtains predictions at specified
covariate patterns; and timevar(), which obtains predictions at specified times. These
options can be combined with standard choices of hazard, cumhazard and survival. Finally, the
ci option can be used to obtain confidence intervals using the delta method (Carstensen,
2004).
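The delta-method calculation behind the ci option can be sketched numerically. The Python fragment below is not part of stgenreg; it builds a pointwise confidence interval for a fitted Weibull survival probability from a numerical gradient, using the point estimates reported later in Section 5.9.1, and an assumed diagonal covariance matrix as a stand-in for the full estimated one.

```python
import math

def surv(theta, t):
    # Weibull survival S(t) = exp(-lambda * t^gamma), theta = (log lambda, log gamma)
    lam, gam = math.exp(theta[0]), math.exp(theta[1])
    return math.exp(-lam * t ** gam)

def delta_ci(theta, V, t, eps=1e-6):
    # delta method: var{S(t)} ~= grad' V grad, with a forward-difference gradient
    s = surv(theta, t)
    grad = []
    for i in range(len(theta)):
        th = list(theta)
        th[i] += eps
        grad.append((surv(th, t) - s) / eps)
    var = sum(grad[i] * V[i][j] * grad[j]
              for i in range(len(theta)) for j in range(len(theta)))
    se = math.sqrt(var)
    return s - 1.96 * se, s + 1.96 * se

theta = [-2.823, 0.045]                        # point estimates from Section 5.9.1
V = [[0.0372 ** 2, 0.0], [0.0, 0.0181 ** 2]]   # assumed diagonal covariance (sketch only)
print(delta_ci(theta, V, 5))
```

In practice the full covariance matrix of the maximum likelihood estimates, including off-diagonal terms, would be used.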
5.9. Analysis of example datasets using stgenreg
In this section I illustrate the use of the general modelling framework using the
England and Wales breast cancer dataset, described in Section 3.3 of Chapter 3, and
the dataset of patients with liver cirrhosis, described in Section 3.5 of Chapter 3. Briefly,
the England and Wales breast cancer dataset consists of 9721 women aged under 50 and
diagnosed with breast cancer in England and Wales between 1986 and 1990. The event
of interest is death from any cause, with follow-up restricted to 5 years. Deprivation
was categorised into 5 levels; however, I have restricted the analyses to comparing the
least and most deprived groups, for illustrative purposes. I therefore only consider a
binary covariate, dep5, with 0 for the least deprived and 1 for the most deprived group.
The second dataset consists of 488 patients with liver cirrhosis (Anderson et al., 1993).
A total of 251 patients were randomised to receive prednisone, with 237 randomised to
receive a placebo. Prothrombin index was measured repeatedly, with between 1 and
17 measurements per subject, resulting in 2968 observations. Outcome was all-cause
death.
5.9.1. Proof of concept - Weibull proportional hazards model
I begin by fitting a Weibull proportional hazards model to the breast cancer dataset,
investigating the effect of deprivation status. Given that Weibull models are available in
all standard statistical software, I first illustrate the concept showing that the estimates
agree with estimates derived using analytically tractable definitions of the hazard and
survival functions. The baseline hazard and log hazard functions have the following
form
h_i(t) = \lambda \gamma t^{\gamma - 1} \exp(X_i\beta)
and
\log(h_i(t)) = \log(\lambda) + \log(\gamma) + (\gamma - 1)\log(t) + X_i\beta
where Xi is a vector of covariates, with corresponding regression coefficients β. In this
case it is convenient to use the loghazard() option of stgenreg. I can investigate
covariate effects by including deprivation status in the linear predictor of log(λ), using
the option ln_lambda().
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ (exp([ln_gamma]):-1):*log(#t))
>     ln_lambda(dep5)

Log likelihood = -8808.3461                        Number of obs   =      9721

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_lambda    |
        dep5 |   .2698351   .0392018     6.88   0.000     .1930011    .3466692
       _cons |  -2.823244   .0371849   -75.92   0.000    -2.896125   -2.750363
-------------+----------------------------------------------------------------
ln_gamma     |
       _cons |   .0454263   .0181255     2.51   0.012     .0099009    .0809517
------------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 15 nodes
A log hazard ratio of 0.270 (95% CI: 0.193, 0.347) is observed, and consequently a
hazard ratio of 1.310 (95% CI: 1.213, 1.414), indicating a 31% increase in the mortality
rate in the most deprived group compared to the least deprived. I could further model
the γ parameter by deprivation status, simply by adding the option ln gamma(dep5).
When fitting models which rely on numerical integration, it is important to estab-
lish the stability of maximum likelihood estimates by using an increasing number of
quadrature nodes. In the case of a Weibull proportional hazards model, I can both
compare with the optimised model using streg in Stata (where integrals are obtained
analytically), and compare with an increasing number of quadrature nodes to evaluate
the numerical approximation. Here I present results from fitting the streg model and
stgenreg models with 15, 30, 50 and 100 nodes.
Table 5.2. England and Wales breast cancer dataset. Results from optimised Weibull and general parametric models with varying number of nodes.

                        Optimised            General parametric models
Variable                  Weibull   15 nodes   30 nodes   50 nodes  100 nodes
log(λ): Most deprived       0.270      0.270      0.270      0.270      0.270
  (standard error)          0.039      0.039      0.039      0.039      0.039
log(λ): Intercept          -2.825     -2.823     -2.825     -2.825     -2.825
  (standard error)          0.037      0.037      0.037      0.037      0.037
log(γ)                      0.047      0.045      0.046      0.047      0.047
  (standard error)          0.018      0.018      0.018      0.018      0.018
log-likelihood          -8808.085  -8808.346  -8808.149  -8808.108  -8808.091
From Table 5.2, consistent parameter estimates are observed to 3 decimal places
with 30 nodes, and accuracy improves as the number of nodes is increased. However,
computation time also increases with the number of nodes; for example, using 15 nodes
takes 7.4 seconds compared with 12.4 seconds using 100 nodes (on a HP laptop with
Intel i5 2.5GHz processor and 8GB of RAM). In comparison, the fully optimised streg
model took 0.4 seconds to converge. This difference is clearly expected,
as the stgenreg formulation of the Weibull model is not the most computationally effi-
cient, as there is no need to use numerical integration when using the standard Weibull
model. The performance of the quadrature approximation is more formally evaluated
in Section 5.10.1.
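The convergence behaviour in Table 5.2 reflects standard Gauss-Legendre quadrature. As a minimal numeric sketch (not part of the stgenreg code; it reuses the point estimates above), the quadrature approximation of the Weibull cumulative hazard can be compared with its analytic value as the number of nodes grows.

```python
import numpy as np

def cumhaz_quad(t, lam, gam, m):
    """Approximate H(t) = integral of lam*gam*u^(gam-1) over [0, t]
    with m-node Gauss-Legendre quadrature."""
    z, w = np.polynomial.legendre.leggauss(m)   # nodes/weights on [-1, 1]
    u = t / 2 * z + t / 2                       # map nodes to [0, t]
    return t / 2 * np.sum(w * lam * gam * u ** (gam - 1))

lam, gam, t = np.exp(-2.823), np.exp(0.045), 5.0
exact = lam * t ** gam                          # analytic Weibull cumulative hazard
for m in (15, 30, 100):
    print(m, abs(cumhaz_quad(t, lam, gam, m) - exact))
```

The absolute error shrinks as nodes are added, mirroring the stabilisation of the log-likelihood in Table 5.2.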
To illustrate the use of the delta method in calculating confidence intervals for
fitted cumulative hazard and survival functions, I present Figure 5.1, which compares
the fitted survival function and 95% confidence intervals from the analytic Weibull, and
the equivalent Weibull model fitted with quadrature.
Figure 5.1 indicates extremely close agreement between the confidence interval cal-
culated using the delta method and quadrature, described in Section 5.7, compared to
[Figure: survival function (y-axis, 0.7-1.0) against follow-up time in years (x-axis, 0-5); fitted survival and lower/upper 95% confidence limits shown for both the numeric and analytic models.]
Figure 5.1. England and Wales breast cancer dataset. Comparison of fitted survival function and confidence intervals between analytic Weibull model and equivalent model fitted with quadrature.
the analytic equivalent, illustrating the validity of the technique. This will be further
discussed in Section 5.12.
5.9.2. Generalised gamma proportional hazards model
The Generalised Gamma (GG) is a 3-parameter parametric model implemented in a
variety of statistical packages (Cox et al., 2007). However, it is parameterised as an
accelerated failure time model in Stata. I define the survival and density functions as
follows
S_{GG}(t) =
\begin{cases}
1 - I(\gamma, u) & \text{if } \kappa > 0 \\
1 - \Phi(z) & \text{if } \kappa = 0 \\
I(\gamma, u) & \text{if } \kappa < 0
\end{cases}
\quad (5.17)

and

f_{GG}(t) =
\begin{cases}
\dfrac{\gamma^{\gamma}}{\sigma t \sqrt{\gamma}\, \Gamma(\gamma)} \exp(z\sqrt{\gamma} - u) & \text{if } \kappa \neq 0 \\[1ex]
\dfrac{1}{\sigma t \sqrt{2\pi}} \exp(-z^{2}/2) & \text{if } \kappa = 0
\end{cases}
\quad (5.18)

where \gamma = |\kappa|^{-2}, z = \mathrm{sign}(\kappa)\{\log(t) - \mu\}/\sigma, u = \gamma \exp(|\kappa| z), \Phi(z) is the standard
normal cumulative distribution function, and I(a, x) is the incomplete gamma function.
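As a numeric sanity check of these definitions, note that the κ = 0 branch reduces to the log-normal distribution. The Python sketch below (with arbitrary illustrative values of µ and σ; not part of the thesis software) verifies that integrating the hazard f/S over an interval recovers the corresponding difference in log survival.

```python
import math

def surv_ln(t, mu, sigma):
    # kappa = 0 branch of (5.17): log-normal survival, 1 - Phi(z)
    z = (math.log(t) - mu) / sigma
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

def haz_ln(t, mu, sigma):
    # hazard = density / survival, using the kappa = 0 branch of (5.18)
    z = (math.log(t) - mu) / sigma
    f = math.exp(-z * z / 2) / (sigma * t * math.sqrt(2 * math.pi))
    return f / surv_ln(t, mu, sigma)

# trapezoidal integral of h over [t0, t1] should equal log S(t0) - log S(t1)
mu, sigma, t0, t1, n = 2.7, 1.2, 0.1, 5.0, 20000
ts = [t0 + (t1 - t0) * i / n for i in range(n + 1)]
H = sum((haz_ln(a, mu, sigma) + haz_ln(b, mu, sigma)) / 2 * (b - a)
        for a, b in zip(ts, ts[1:]))
print(H, math.log(surv_ln(t0, mu, sigma) / surv_ln(t1, mu, sigma)))
```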
Therefore, using Equation (2.6), I can write down the baseline hazard function as
the ratio of the probability density function to the survival function,

h_{GG}(t) = \frac{f_{GG}(t)}{S_{GG}(t)}

To invoke proportional hazards I can then simply multiply by the exponential of a
parameter, the linear predictor of which is our vector of covariates,

h_{GG}(t) = \frac{f_{GG}(t)}{S_{GG}(t)} \exp(X_i\beta)
\quad \text{or} \quad
\log(h_{GG}(t)) = \log\left(\frac{f_{GG}(t)}{S_{GG}(t)}\right) + X_i\beta

where \beta is a vector of log hazard ratios. In terms of implementation, I must specify
the nocons option in the linear predictor for the X_i\beta parameter to ensure there is no
intercept term, obtaining a proportional hazards formulation for the GG model.
I can build up the definition of this complex hazard function using Stata’s local
macros
. local mu [mu]

. local sigma exp([ln_sigma])

. local kappa [kappa]

. local gamma (abs(`kappa') :^ (-2))

. local z (sign(`kappa') :* (log(#t) :- `mu') :/ (`sigma'))

. local u ((`gamma') :* exp(abs(`kappa') :* (`z')))

. local surv1 (1 :- gammap(`gamma',`u')) :* (`kappa' :> 0)

. local surv2 (1 :- normal(`z')) :* (`kappa' :== 0)

. local surv3 gammap(`gamma',`u') :* (`kappa' :< 0)

. local pdf1 ((`gamma' :^ `gamma') :* exp(`z' :* sqrt(`gamma') :- `u') :/ ///
>     (`sigma' :* #t :* sqrt(`gamma') :* gamma(`gamma'))) :* (`kappa' :!= 0)

. local pdf2 (exp(-(`z' :^ 2) :/ 2) :/ (`sigma' :* #t :* sqrt(2 :* pi()))) ///
>     :* (`kappa' :== 0)

. local haz (`pdf1' :+ `pdf2') :/ (`surv1' :+ `surv2' :+ `surv3')
. stgenreg, hazard(exp([xb]) :* (`haz')) nodes(30) xb(dep5,nocons)

Log likelihood = -8801.2754                        Number of obs   =      9721

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
        dep5 |   .2694578   .0391992     6.87   0.000     .1926289    .3462868
-------------+----------------------------------------------------------------
kappa        |
       _cons |   .6752793   .0749985     9.00   0.000      .528285    .8222735
-------------+----------------------------------------------------------------
mu           |
       _cons |   2.710497    .032793    82.65   0.000     2.646224    2.774771
-------------+----------------------------------------------------------------
ln_sigma     |
       _cons |   .1727204   .0521935     3.31   0.001     .0704231    .2750178
------------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
I obtain a log hazard ratio for the most deprived compared to the least deprived of
0.269 (95% CI: 0.193, 0.346), in very close agreement with the Weibull model fitted in
Section 5.9.1. This is very much a conceptual example, but it illustrates just how
flexible the modelling framework is: by simply introducing an extra parameter I can
implement a model not available in any software package. As far as I am aware, a
proportional hazards extension of the generalised gamma model has not previously been
implemented.
5.9.3. Restricted cubic splines
I now introduce an alternative to the RP model, using restricted cubic splines but now
on the log hazard scale. I begin by assuming a proportional hazards model, modelling
the baseline log hazard function using restricted cubic splines
\log h_i(t) = \log h_0(t) + X_i\beta = s(\log(t) \mid \gamma, k_0) + X_i\beta \quad (5.19)
where Xi is a vector of baseline covariates with associated log hazard ratios β, and
s(log(t)|γ,k0) is a function of log(t) expanded into restricted cubic spline basis with
knot location vector, k0, and associated coefficient vector, γ.
Similarly to the RP model, in terms of knot locations, for the internal knots, I use
by default the centiles of the uncensored log survival times, and for the boundary knots
I use the minimum and maximum observed uncensored log survival times. Once more,
the choice of the number of spline terms (more spline terms allows greater flexibility),
is left to the user.
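The restricted spline basis itself is straightforward to write down. Below is a minimal Python sketch of an unorthogonalised restricted cubic spline basis in the Royston-Parmar form, offered as an illustration of the construction rather than a reimplementation of rcsgen; it checks that every basis column is linear beyond the boundary knots, the property exploited later in Section 5.10.1.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis: z_1(x) = x and, for each interior knot
    k_j, z(x) = (x-k_j)^3_+ - lam_j (x-k_min)^3_+ - (1-lam_j)(x-k_max)^3_+,
    with lam_j = (k_max - k_j)/(k_max - k_min)."""
    kmin, kmax = knots[0], knots[-1]
    cube = lambda u: max(u, 0.0) ** 3
    basis = [x]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        basis.append(cube(x - kj) - lam * cube(x - kmin) - (1 - lam) * cube(x - kmax))
    return basis

# beyond the boundary knots every column changes linearly in x
knots = [0.0, 1.0, 2.0, 3.0]
b4, b5, b6 = (rcs_basis(x, knots) for x in (4.0, 5.0, 6.0))
print([round(b5[j] - b4[j], 9) == round(b6[j] - b5[j], 9) for j in range(3)])
```

Below the first knot all truncated cubes vanish, so the basis reduces to the linear term alone; above the final knot the cubic and quadratic coefficients cancel by construction.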
Shorthand notation has been programmed within stgenreg to allow a very simple
syntax when fitting a restricted cubic spline based model, for example, to fit a model
on the log hazard scale with 5 spline parameters to model the baseline, I use the
#rcs(df()) notation as follows
. stgenreg, loghazard([xb]) xb(dep5 | #rcs(df(5))) nodes(30)
Variables _eq1_cp2_rcs1 to _eq1_cp2_rcs5 were created

initial:       log likelihood =  -39318.31
alternative:   log likelihood = -24641.326
rescale:       log likelihood = -9630.8486
Iteration 0:   log likelihood = -9630.8486
Iteration 1:   log likelihood = -8763.1988
Iteration 2:   log likelihood = -8756.2377
Iteration 3:   log likelihood = -8756.2213
Iteration 4:   log likelihood = -8756.2213
Log likelihood = -8756.2213 Number of obs = 9721
Coef. Std. Err. z P>|z| [95% Conf. Interval]
dep5 .2693634 .0392018 6.87 0.000 .1925293 .3461976
_eq1_cp2_rcs1 -.0621779 .0274602 -2.26 0.024 -.1159989 -.008357
_eq1_cp2_rcs2 .0784834 .0192975 4.07 0.000 .0406611 .1163057
_eq1_cp2_rcs3 .1158689 .0176746 6.56 0.000 .0812272 .1505106
_eq1_cp2_rcs4 -.0251518 .0143719 -1.75 0.080 -.0533202 .0030165
_eq1_cp2_rcs5 .0012793 .0134076 0.10 0.924 -.0249991 .0275576
_cons -2.910463 .0607005 -47.95 0.000 -3.029434 -2.791492
Quadrature method: Gauss-Legendre with 30 nodes
The model estimates a log hazard ratio for the most deprived compared to the least
deprived of 0.269 (95% CI: 0.193, 0.346), which I compare to that of 0.269 (95% CI:
0.193, 0.346) obtained from a proportional hazards Cox model, indicating extremely
good agreement between the parametric and semi-parametric approaches. The esti-
mated baseline hazard function from the spline-based model and associated 95% con-
fidence interval can be calculated by
. predict h1, hazard zeros ci
with the estimated function shown in Figure 5.2.
Figure 5.2 illustrates that the model predicts a highly complex underlying hazard
function, which clearly could not be captured by a simple parametric model such as
the Weibull.
5.9.3.1. Complex time-dependent effects
Continuing with the special case of using restricted cubic splines, I can incorporate
time-dependent effects into the model framework as follows
\log h_i(t) = s(\log(t) \mid \gamma_0, k_0) + X_i\beta + \sum_{p=1}^{P} x_{ip}\, s(\log(t) \mid \gamma_p, k_p) \quad (5.20)
[Figure: hazard rate (y-axis, 0.05-0.25) against follow-up time in years (x-axis, 0-5); baseline hazard rate with 95% confidence interval.]
Figure 5.2. England and Wales breast cancer dataset. Predicted hazard function for the most affluent group with 95% confidence interval.
where for the pth time-dependent effect, with p = 1, . . . , P , I have xp, the pth covariate,
multiplied by a spline function of log time, s(log(t)|γp,kp), with knot location vector,
kp, and coefficient vector, γp. Once again, degrees of freedom for each time-dependent
effect can be guided using model selection criteria, and/or the impact of different knot
locations assessed through sensitivity analysis. An advantage of modelling on the log
hazard scale is that when there are multiple time-dependent effects, the interpretation
of the time-dependent hazard ratios is simplified as they do not depend on values of
other covariates, which is the case when modelling on the cumulative hazard scale
(Royston and Lambert, 2011).
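This independence of the time-dependent hazard ratio from other covariates can be seen in a toy calculation. The sketch below uses arbitrary illustrative coefficients and collapses the spline of log time to a single linear term purely to keep the example short; the hazard ratio for x1 is unchanged across values of a second covariate x2.

```python
import math

def log_haz(t, x1, x2, b):
    # illustrative log hazard: baseline spline (reduced to one log-time term),
    # a proportional effect of x2, and a time-dependent effect of x1
    baseline = b["g0"] + b["g1"] * math.log(t)
    td_effect = x1 * (b["b1"] + b["d1"] * math.log(t))   # x1 * s(log t)
    return baseline + b["b2"] * x2 + td_effect

b = dict(g0=-2.9, g1=0.1, b1=0.2, d1=-0.15, b2=0.3)
t = 2.5
hr_given_x2_0 = math.exp(log_haz(t, 1, 0, b) - log_haz(t, 0, 0, b))
hr_given_x2_1 = math.exp(log_haz(t, 1, 1, b) - log_haz(t, 0, 1, b))
print(hr_given_x2_0, hr_given_x2_1)
```

On the log cumulative hazard scale the analogous ratio would involve the whole cumulative hazard, and hence the other covariates, which is the complication noted above.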
Returning to the England/Wales breast cancer data, I can investigate the presence
of a time-dependent effect due to deprivation status, by applying Equation (5.20). I
use 5 degrees of freedom to capture the baseline and use 3 degrees of freedom to model
the time-dependent effect of deprivation status, which can be fitted as follows
. stgenreg, loghazard([xb]) xb(dep5 | #rcs(df(5)) | dep5:*#rcs(df(3))) nodes(30)
Variables _eq1_cp2_rcs1 to _eq1_cp2_rcs5 were created
Variables _eq1_cp3_rcs1 to _eq1_cp3_rcs3 were created

initial:       log likelihood =  -39318.31
alternative:   log likelihood = -24641.326
rescale:       log likelihood = -9630.8486
Iteration 0:   log likelihood = -9630.8486
Iteration 1:   log likelihood = -8755.8859
Iteration 2:   log likelihood =  -8747.373
Iteration 3:   log likelihood = -8747.3275
Iteration 4:   log likelihood = -8747.3275
Log likelihood = -8747.3275 Number of obs = 9721
Coef. Std. Err. z P>|z| [95% Conf. Interval]
dep5 .0723415 .0924005 0.78 0.434 -.1087602 .2534433
_eq1_cp2_rcs1 -.0108058 .0309504 -0.35 0.727 -.0714673 .0498558
_eq1_cp2_rcs2 .0672877 .0224852 2.99 0.003 .0232177 .1113578
_eq1_cp2_rcs3 .1128672 .0207167 5.45 0.000 .0722634 .1534711
_eq1_cp2_rcs4 -.0261438 .0145455 -1.80 0.072 -.0546525 .002365
_eq1_cp2_rcs5 .0014202 .0134079 0.11 0.916 -.0248589 .0276992
_eq1_cp3_rcs1 -.1464002 .0443983 -3.30 0.001 -.2334194 -.0593811
_eq1_cp3_rcs2 .0425164 .0333753 1.27 0.203 -.022898 .1079307
_eq1_cp3_rcs3 .0135896 .0322604 0.42 0.674 -.0496396 .0768187
_cons -2.849318 .0649361 -43.88 0.000 -2.976591 -2.722046
Quadrature method: Gauss-Legendre with 30 nodes
Figure 5.3 shows the time-dependent hazard ratio, illustrating the decrease in the
effect of deprivation over time.
[Figure: hazard ratio (y-axis, 1-6) against follow-up time in years since diagnosis (x-axis, 0-5); time-dependent hazard ratio with 95% confidence interval.]
Figure 5.3. England and Wales breast cancer dataset. Time-dependent hazard ratio for deprivation status.
In Figure 5.4 I compare the fit of the models with either time-independent or
time-dependent hazard ratios for deprivation status, by overlaying the fitted survival
functions onto the Kaplan-Meier curve, for each deprivation group. It is clear from
Figure 5.4 that I obtain a much improved fit to the Kaplan-Meier curve when modelling
the time-dependent effect of deprivation group.
[Figure: two panels of survival (y-axis, 0.6-1.0) against follow-up time in years (x-axis, 0-5), labelled "Proportional hazards" (left) and "Non-proportional hazards" (right); Kaplan-Meier curves and stgenreg fitted survival shown for the affluent and deprived groups.]
Figure 5.4. England and Wales breast cancer dataset. Kaplan-Meier estimates for the most affluent and most deprived groups, with predicted survival overlaid. The figure on the left shows predicted survival with a proportional effect of deprivation status, with the figure on the right allowing for non-proportional hazards in the effect of deprivation status.
5.9.4. Time-varying covariates
Within a survival analysis framework, time-varying covariates occur when the value
of a covariate for individual patients can change at various points in follow-up. For
example in oncology clinical trials, patients will often switch treatment group when
their condition progresses (Morden et al., 2011), or biomarkers may be measured re-
peatedly over time, resulting in multiple records per subject (Wulfsohn and Tsiatis,
1997; Henderson et al., 2000). For this form of analysis the data is often set up into
start and stop times, and since delayed entry (left truncation) is allowed, this again
is incorporated into the described modelling framework. The analysis of repeatedly
measured biomarkers and survival data within a joint model framework is considered
in later chapters.
I now illustrate the data setup required for survival analysis incorporating a time-
varying covariate, usually referred to as multiple-record per subject. I return to the
liver cirrhosis dataset described above. Table 5.3 shows the observed data for 2 patients,
in multiple-records per subject structure.
Table 5.3. Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure.

ID    Prothrombin   Treatment    Enter time   Exit time   Event
1     38            placebo      0            0.244       0
1     31            placebo      0.244        0.381       0
1     27            placebo      0.381        0.413       1
111   59            prednisone   0            0.246       0
111   60            prednisone   0.246        0.498       0
111   87            prednisone   0.498        0.745       0
111   59            prednisone   0.745       1.128        0
111   35            prednisone   1.128       1.158        1
The ‘enter’ times refer to when the prothrombin index was measured, and the ‘exit’
time in the final row for each subject corresponds to that subject’s observed
event/censoring time. With the data set up in this form, any general parametric model
can then be applied; for example, I apply a model using restricted cubic splines for the
baseline, adjusting for treatment and the time-varying prothrombin index, using the following
. stgenreg, loghazard([xb]) xb(pro trt | #rcs(df(5))) nodes(30)
Results are presented in Table 5.4.
Table 5.4. Liver cirrhosis trial of prednisone treatment. Results from spline based proportional hazards model with time-varying prothrombin index.

Variable            log(HR)   Std. Err.   p-value        95% CI
Prothrombin index   -0.035    0.002       0.000     -0.040, -0.030
Treatment            0.133    0.118       0.262     -0.099,  0.364
I observe a log hazard ratio of -0.035 (95% CI: -0.040, -0.030), which equates to a
hazard ratio of 0.966 (95% CI: 0.961, 0.970) indicating lower values of the biomarker
are associated with an increased risk of death.
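The multiple-record structure of Table 5.3 can also be generated programmatically. The helper below is hypothetical (it is not part of stgenreg or strcs); it simply expands one subject's measurement times into start/stop rows, reproducing patient 1 of Table 5.3.

```python
def to_start_stop(measure_times, values, exit_time, event):
    """measure_times[j] is when values[j] was recorded; the final interval
    ends at the subject's observed event/censoring time."""
    bounds = list(measure_times) + [exit_time]
    return [(values[j], bounds[j], bounds[j + 1],
             int(event and j == len(values) - 1))   # event flag on last row only
            for j in range(len(values))]

rows = to_start_stop([0, 0.244, 0.381], [38, 31, 27], 0.413, True)
for r in rows:
    print(r)
```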
5.10. Improving the estimation procedure when using restricted cubic
splines
The very nature of the modelling framework described above implies that I can
specify practically any general function in the definition of the hazard or log hazard
function, given that it satisfies h(t) > 0 for all t > 0. As a development tool, this can
be seen as being very useful, in that novel models can often be fitted with a single
line of code. However, moving beyond the general framework and employing a more
model-specific implementation can provide a more efficient and better optimised
estimation routine. To illustrate this aspect, I concentrate on a particular flexible way of
modelling survival data, using restricted cubic splines (Durrleman and Simon, 1989)
on the log hazard scale, as opposed to the log cumulative hazard scale as implemented
in RP models, described in Section 2.8.2.
5.10.1. Improving estimation using a combined analytic/numeric ap-
proach
In the restricted cubic spline based models described above, the restricted nature of
the spline function forces the baseline log hazard function to be linear beyond the
boundary knots. In those areas the cumulative hazard function can actually be written
analytically, as the log hazard is a linear function of log time. Defining the boundary
knots to be k01, k0n, I need only conduct numerical integration between k01, k0n, using
the analytical form of the cumulative hazard function beyond the boundary knots.
I define δ0i and δ1i to be the intercept and slope of the log hazard function for the
ith patient before the first knot, k01, and φ0i and φ1i to be the intercept and slope of the
log hazard function for the ith patient beyond the final knot, k0n. If there are no time-
dependent effects then δ0i, δ1i, φ0i, φ1i are constant across patients. The cumulative
hazard function can then be defined in three components
H_i(t) = H_{1i}(t) + H_{2i}(t) + H_{3i}(t) \quad (5.21)
If I assume t_{0i} < k_{01} and t_i > k_{0n}, then before the first knot, I have

H_{1i}(t) = \frac{\exp(\delta_{0i})}{\delta_{1i} + 1} \left\{ \min(t_i, k_{01})^{\delta_{1i}+1} - t_{0i}^{\delta_{1i}+1} \right\} \quad (5.22)

and after the final knot,

H_{3i}(t) = \frac{\exp(\phi_{0i})}{\phi_{1i} + 1} \left\{ t_i^{\phi_{1i}+1} - \max(t_{0i}, k_{0n})^{\phi_{1i}+1} \right\} \quad (5.23)

and H_{2i}(t) becomes

H_{2i}(t) \approx \frac{k_{0n} - k_{01}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{k_{0n} - k_{01}}{2} z_j + \frac{k_{01} + k_{0n}}{2} \right) \quad (5.24)

where z_j is the jth quadrature node with associated weight, v_j. Alternatively, there
may be observations where k_{0n} > t_{0i} > k_{01} and t_i > k_{0n}; then

H_{1i}(t) = 0

H_{2i}(t) \approx \frac{k_{0n} - t_{0i}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{k_{0n} - t_{0i}}{2} z_j + \frac{t_{0i} + k_{0n}}{2} \right)

H_{3i}(t) = \frac{\exp(\phi_{0i})}{\phi_{1i} + 1} \left\{ t_i^{\phi_{1i}+1} - \max(t_{0i}, k_{0n})^{\phi_{1i}+1} \right\}

If t_{0i} < k_{01} and k_{01} < t_i < k_{0n}, then

H_{1i}(t) = \frac{\exp(\delta_{0i})}{\delta_{1i} + 1} \left\{ \min(t_i, k_{01})^{\delta_{1i}+1} - t_{0i}^{\delta_{1i}+1} \right\}

H_{2i}(t) \approx \frac{t_i - k_{01}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{t_i - k_{01}}{2} z_j + \frac{k_{01} + t_i}{2} \right)

H_{3i}(t) = 0

If k_{01} < t_{0i} < t_i < k_{0n}, then

H_{1i}(t) = 0

H_{2i}(t) \approx \frac{t_i - t_{0i}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{t_i - t_{0i}}{2} z_j + \frac{t_{0i} + t_i}{2} \right)

H_{3i}(t) = 0

If t_{0i} < t_i < k_{01}, then

H_{1i}(t) = \frac{\exp(\delta_{0i})}{\delta_{1i} + 1} \left\{ t_i^{\delta_{1i}+1} - t_{0i}^{\delta_{1i}+1} \right\}

H_{2i}(t) = 0

H_{3i}(t) = 0

Finally, if k_{0n} < t_{0i} < t_i, then

H_{1i}(t) = 0

H_{2i}(t) = 0

H_{3i}(t) = \frac{\exp(\phi_{0i})}{\phi_{1i} + 1} \left\{ t_i^{\phi_{1i}+1} - t_{0i}^{\phi_{1i}+1} \right\}
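The bookkeeping above can be sketched numerically. The Python fragment below uses a hypothetical hazard whose log is linear in log time outside the boundary knots (the structure the restricted splines impose; all coefficient values are illustrative, not estimates), evaluates the combined analytic-tails-plus-quadrature cumulative hazard, and checks it against brute-force quadrature over the whole interval.

```python
import numpy as np

# hypothetical log hazard: linear in log(t) before k1 and after kn,
# with a log-linear interpolation between the knots (illustrative values)
k1, kn = 0.5, 4.0
d0, d1 = -3.0, 0.4          # intercept/slope of log h before the first knot
p0, p1 = -2.6, 0.1          # intercept/slope of log h after the final knot
a1, a2 = d0 + d1 * np.log(k1), p0 + p1 * np.log(kn)

def log_h(t):
    mid = a1 + (a2 - a1) * (np.log(t) - np.log(k1)) / (np.log(kn) - np.log(k1))
    return np.where(t < k1, d0 + d1 * np.log(t),
                    np.where(t > kn, p0 + p1 * np.log(t), mid))

h = lambda t: np.exp(log_h(t))

def gauss_int(f, a, b, m=30):
    z, w = np.polynomial.legendre.leggauss(m)   # nodes/weights on [-1, 1]
    u = (b - a) / 2 * z + (a + b) / 2
    return (b - a) / 2 * np.sum(w * f(u))

def cumhaz_combined(t, m=30):
    # analytic tails (as in Equations 5.22 and 5.23, with t_0i = 0)
    # plus quadrature over [k1, kn] (Equation 5.24)
    H1 = np.exp(d0) / (d1 + 1) * min(t, k1) ** (d1 + 1)
    H2 = gauss_int(h, k1, min(t, kn), m) if t > k1 else 0.0
    H3 = np.exp(p0) / (p1 + 1) * (t ** (p1 + 1) - kn ** (p1 + 1)) if t > kn else 0.0
    return H1 + H2 + H3

print(cumhaz_combined(5.0), gauss_int(h, 0.0, 5.0, 500))
```

The combined version only ever integrates numerically over the smooth middle region, which is the source of the efficiency gain examined below.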
In this section I wish to establish the benefit of incorporating the analytic compo-
nents, described above, compared to the general method of only using numerical inte-
gration, described in Section 5.3. I use the general Stata software package, stgenreg,
described above, to fit the full quadrature based approach, and a newly developed Stata
package, strcs, which implements the combined analytic and numerical approach when
using splines on the log hazard scale. Returning to the England/Wales breast cancer
dataset, I apply the spline based models shown in Equation (5.19), with 5 degrees of
freedom (6 knots), i.e. 5 spline variables to capture the baseline, incorporating the pro-
portional effect of deprivation status, with an increasing number of quadrature points,
ranging from 10 to 1000.
Table 5.5. England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the fully numeric approach.

                                           Number of nodes
Parameter            10         20         30         40         50        100        250        500       1000
Most deprived    0.268560   0.269302   0.269363   0.269380   0.269386   0.269393   0.269395   0.269395   0.269395
                (0.039203) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202)
γ0              -2.916819  -2.912434  -2.910463  -2.909648  -2.909240  -2.908601  -2.908289  -2.908201  -2.908162
                (0.060860) (0.060749) (0.060701) (0.060682) (0.060673) (0.060659) (0.060651) (0.060648) (0.060647)
γ1              -0.085113  -0.066088  -0.062178  -0.060704  -0.059979  -0.058850  -0.058346  -0.058214  -0.058158
                (0.027644) (0.027508) (0.027460) (0.027442) (0.027432) (0.027416) (0.027408) (0.027405) (0.027404)
γ2               0.038085   0.072033   0.078483   0.080923   0.082146   0.084099   0.084980   0.085214   0.085314
                (0.019940) (0.019462) (0.019297) (0.019231) (0.019196) (0.019135) (0.019101) (0.019090) (0.019084)
γ3               0.147381   0.121891   0.115869   0.113473   0.112252   0.110276   0.109344   0.109088   0.108976
                (0.018258) (0.017899) (0.017675) (0.017569) (0.017509) (0.017398) (0.017333) (0.017311) (0.017299)
γ4              -0.040437  -0.027974  -0.025152  -0.024017  -0.023433  -0.022474  -0.022017  -0.021890  -0.021834
                (0.014469) (0.014429) (0.014372) (0.014343) (0.014327) (0.014296) (0.014277) (0.014270) (0.014267)
γ5               0.010185   0.003174   0.001279   0.000518   0.000133  -0.000481  -0.000775  -0.000857  -0.000893
                (0.013512) (0.013438) (0.013408) (0.013395) (0.013388) (0.013374) (0.013366) (0.013363) (0.013361)
log-likelihood -8739.9490 -8753.8333 -8756.2213 -8757.0858 -8757.5006 -8758.1249 -8758.3830 -8758.4444 -8758.4683

Standard errors in parentheses
Table 5.6. England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the combined analytical/numeric approach.

                                           Number of nodes
Parameter            10         20         30         40         50        100        250        500       1000
Most deprived    0.269295   0.269376   0.269390   0.269393   0.269394   0.269395   0.269395   0.269395   0.269395
                (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202)
γ0              -2.906390  -2.908770  -2.908353  -2.908198  -2.908148  -2.908133  -2.908133  -2.908133  -2.908133
                (0.060656) (0.060663) (0.060650) (0.060648) (0.060647) (0.060647) (0.060647) (0.060647) (0.060647)
γ1              -0.061499  -0.059304  -0.058469  -0.058225  -0.058149  -0.058118  -0.058117  -0.058117  -0.058117
                (0.027397) (0.027411) (0.027405) (0.027404) (0.027404) (0.027403) (0.027403) (0.027403) (0.027403)
γ2               0.077581   0.083720   0.084902   0.085233   0.085337   0.085390   0.085390   0.085390   0.085390
                (0.019033) (0.019082) (0.019082) (0.019080) (0.019080) (0.019079) (0.019079) (0.019079) (0.019079)
γ3               0.112949   0.110410   0.109370   0.109043   0.108938   0.108889   0.108888   0.108888   0.108888
                (0.017117) (0.017279) (0.017291) (0.017290) (0.017289) (0.017288) (0.017288) (0.017288) (0.017288)
γ4              -0.024649  -0.022456  -0.021996  -0.021857  -0.021812  -0.021790  -0.021790  -0.021790  -0.021790
                (0.014188) (0.014258) (0.014263) (0.014263) (0.014263) (0.014263) (0.014263) (0.014263) (0.014263)
γ5              -0.000164  -0.000367  -0.000745  -0.000869  -0.000908  -0.000921  -0.000922  -0.000922  -0.000922
                (0.013428) (0.013363) (0.013360) (0.013360) (0.013360) (0.013360) (0.013360) (0.013360) (0.013360)
log-likelihood -8754.2660 -8757.6342 -8758.2559 -8758.4167 -8758.4634 -8758.4839 -8758.4840 -8758.4840 -8758.4840

Standard errors in parentheses
Table 5.5 presents parameter estimates and standard errors under the full numer-
ical approach, across varying number of quadrature nodes, and Table 5.6 presents the
equivalent results for the combined analytic/numeric approach. From Table 5.5, I still
observe variation in estimates and the log-likelihood to 5 or 6 decimal places between
500 and 1000 nodes, whilst for the combined approach shown in Table 5.6, the maxi-
mum difference between 100 and 1000 nodes is 0.000001. For the combined approach
the log-likelihood does not change to 3 decimal places between 100 and 1000 nodes,
whilst the log-likelihood for the full numerical approach is only the same to 1 decimal
place.
I found that the full numerical approach required 23 nodes and 50 nodes to establish
consistent estimates to 3 and 4 decimal places, respectively, compared with 18 and
27 nodes under the combined analytic and numerical approach. Final results for the
combined approach using 27 nodes are presented in Table 5.7.
Table 5.7. England and Wales breast cancer dataset. Results from combined analytic/numerical spline based survival model.

Variable        Hazard ratio        95% CI
Most deprived      1.309        1.212,  1.414

Baseline        Coefficient         95% CI
γ1                -0.059       -0.112, -0.005
γ2                 0.085        0.047,  0.122
γ3                 0.110        0.076,  0.143
γ4                -0.022       -0.050,  0.006
γ5                -0.001       -0.027,  0.025
Intercept         -2.908       -3.027, -2.789
From Table 5.7 I observe a statistically significant hazard ratio of 1.309 (95% CI:
1.212, 1.414), indicating an increased hazard rate in the most deprived group, compared
to the least deprived. Comparing computation time, the general approach with 49
quadrature nodes took 20.5 seconds on a HP laptop with Intel i5 2.5GHz processor and
8GB of RAM, compared to 17.5 seconds using the combined approach with 27 nodes.
For clarity, I can compare the estimated hazard ratio from the spline based approaches
to that obtained from a Weibull model, which gave an estimated hazard ratio of 1.310
(95% CI: 1.213, 1.414), indicating very close agreement.
5.11. Further analysis of example datasets using strcs
I now proceed with further illustration of the general framework; however, now
using the spline based model, on the log hazard scale, with the improved estimation
routine implemented in the Stata package strcs.
5.11.1. Excess mortality model
To illustrate the excess mortality model I use the same data source as in Section
5.10.1; the England and Wales breast cancer dataset. However, I now include women
aged over 50. Expected mortality is stratified by age, sex, calendar year, region and
deprivation quintile (Coleman et al., 1999). As for the previous analyses in this chapter,
I only include the least and most deprived groups for simplicity. Age is categorised into
5 groups: <50, 50-59, 60-69, 70-79, 80+. There are 41,645 subjects included in the
analysis.
5.11.1.1. Proportional excess hazards model
I initially fit a model where the excess mortality rate is assumed to be proportional be-
tween different covariate patterns. I compare the estimates to a model using restricted
cubic splines on the log cumulative hazard scale (Nelson et al., 2007). In both models
6 knots are used with these placed evenly according to the distribution of log death
times. Results are shown in Table 5.8.
From Table 5.8, I observe very similar hazard ratios and their 95% confidence
intervals between the models on different scales.
5.11.1.2. Time-dependent effects
A model is now fitted where the assumption of proportional excess hazards is relaxed
for all covariates. This is done by incorporating an interaction between each covariate
and a restricted cubic spline function of log time with 4 knots (3 degrees of freedom).
The knots are placed evenly according to the distribution of log death times. The
estimated excess hazard ratio for deprivation group can be seen in Figure 5.5. As there
Table 5.8. England and Wales breast cancer dataset. Comparison of excess hazard ratios (and 95% confidence intervals) from models with the linear predictor on the log hazard scale and the log cumulative hazard scale. Both models have 6 knots with these placed evenly according to the distribution of log death times.

Covariate         log hazard             log cumulative hazard
Most deprived     1.313 (1.265, 1.364)   1.313 (1.265, 1.364)
Age (50-59)       1.055 (0.998, 1.114)   1.055 (0.998, 1.114)
Age (60-69)       1.071 (1.014, 1.130)   1.071 (1.015, 1.131)
Age (70-79)       1.453 (1.372, 1.539)   1.454 (1.373, 1.540)
Age (80+)         2.647 (2.484, 2.822)   2.647 (2.484, 2.821)

Age <50 and least deprived are the reference groups
is no interaction between deprivation group and age group, this hazard ratio applies
to each of the 5 age groups. If the model were fitted on the log cumulative excess
hazard scale, this would not be the case. This is illustrated in Figure 5.6, where
the same linear predictor has been fitted for a model on the log cumulative excess
hazard scale; the estimated excess hazard ratio is shown for two age groups and
clearly differs between them.
5.11.2. Cluster robust standard errors
To illustrate the use of cluster robust standard errors combined with the numerical
integration approach, I use a dataset of 85 patients with bladder cancer (Prentice et al.,
1981; Therneau and Grambsch, 2000). I fit a model for recurrent event data, where the
event of interest is recurrence of bladder cancer. Each patient can experience a total of
4 events, shown in Table 5.9. A total of 112 events were observed. Covariates of interest
include treatment group (0 for placebo, 1 for thiotepa), initial number of tumors (range
1 to 8, with 8 meaning 8 or more), and initial size of tumors (in centimetres, with range
1 to 7).
Figure 5.5. England and Wales breast cancer dataset. Excess hazard ratio comparing most deprived with least deprived group. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Shaded area represents the 95% confidence interval.
Figure 5.6. England and Wales breast cancer dataset. Excess hazard ratios comparing most deprived with least deprived group, shown for the age <50 and age 70-79 groups. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Thicker lines are the estimated excess hazard ratios, with thinner lines the associated 95% confidence intervals.
To allow for the inherent hierarchical structure, i.e. events nested within patients,
I fit a parametric version of the Prentice-Williams-Peterson model, allowing for
cluster robust standard errors. This model uses non-overlapping time intervals;
thus, for example, a patient is not at risk of a second recurrence until after the
first has occurred.

Table 5.9. Bladder cancer dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer.

Recurrence number   Censored   Event   Total
1                   38         47      85
2                   17         29      46
3                   5          22      27
4                   6          14      20
I illustrate this by showing the data structure required:
. list id _t0 _t _d if id==14
id _t0 _t _d
21. 14 0 3 1
22. 14 3 9 1
23. 14 9 21 1
24. 14 21 23 0
The above shows the observed data for patient 14, who experienced three
recurrences and was at risk of a fourth. The variable _t0 indicates when the
patient becomes at risk of each event, _t indicates the recurrence time, and _d is
the event indicator.
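The construction of these non-overlapping risk intervals can be sketched in code. The function below is my own illustration (the thesis's analysis is in Stata), using patient 14's recurrence times from the listing above:

```python
def pwp_intervals(recurrence_times, end_time, end_event):
    """Build non-overlapping (enter, exit, event) intervals for a
    Prentice-Williams-Peterson model: a patient only becomes at risk
    of event k+1 once event k has occurred."""
    starts = [0] + list(recurrence_times)
    exits = list(recurrence_times) + [end_time]
    events = [1] * len(recurrence_times) + [end_event]
    return list(zip(starts, exits, events))

# Patient 14: recurrences at times 3, 9 and 21, censored at 23
print(pwp_intervals([3, 9, 21], 23, 0))
# → [(0, 3, 1), (3, 9, 1), (9, 21, 1), (21, 23, 0)]
```

The output reproduces the four records for patient 14 shown in the listing.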
The baseline hazard for each event is also allowed to vary, i.e. there is a stratification
factor by event. I use 5 knots for a shared baseline between the events, but allow
departures from this baseline using restricted cubic splines with 3 knots for each of the
subsequent events. For comparison, I also fit a Cox model, stratified by event number,
with cluster robust standard errors (Lin and Wei, 1989). Results are presented in Table
5.10.
From Table 5.10, I observe similar estimates from the spline based model, compared
to the Cox model with cluster robust standard errors. I can compare estimated baseline
hazard rates for each of the four ordered events from the spline based model, shown in
Figure 5.7. These are not directly estimated in the Cox model.
Table 5.10. Bladder cancer dataset. Results from spline based and Cox models with cluster robust standard errors.

           Spline hazard model                  Cox model
Variable   HR      Robust SE   95% CI           HR      Robust SE   95% CI
group      0.699   0.149       0.459, 1.063     0.716   0.148       0.478, 1.073
size       0.990   0.064       0.872, 1.123     0.992   0.061       0.878, 1.120
number     1.146   0.060       1.035, 1.269     1.127   0.058       1.018, 1.247
Figure 5.7. Bladder cancer dataset. Baseline hazard rates, with 95% confidence intervals, for the four ordered events.
It can be seen from Figure 5.7 that patients who go on to experience a third and
fourth event have a high initial hazard rate, suggesting that they are likely a
more severe subgroup.
5.12. Discussion
In this chapter I have presented a framework for the general, parametric analysis
of survival data, incorporating any combination of complex baseline hazard functions,
time-dependent effects, time-varying covariates, delayed entry (left truncation), robust
and cluster robust standard errors, and the extension to relative survival. Through
specification of a user-defined hazard function, I have illustrated how to implement
standard proportional hazards models, novel restricted cubic spline survival models
and a generalised gamma model with proportional hazards. In essence, this modelling
framework may be used to implement a parametric survival model defined by anything
from a very simple one parameter proportional hazards model, to models which contain
highly flexible functions of time, for both the baseline and time-dependent effects, and
also allowing for a hierarchical structure. Any parameter defined in the hazard function
can be dependent on complex functions of time, including fractional polynomials or
restricted cubic splines.
The choice of the number of quadrature nodes is left to the user. As in Chapter 4,
where simulated survival times should be checked for consistency by using an
increasing number of quadrature nodes, parameter estimates should likewise be
checked for stability under an increasing number of nodes.
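As an illustration of this node-stability check, the sketch below (my own, in Python rather than the Stata used for the analyses in this thesis) approximates a Weibull cumulative hazard by Gauss-Legendre quadrature and repeats the calculation with an increasing number of nodes:

```python
import numpy as np

def cumulative_hazard_gl(hazard, t, nodes=15):
    """Approximate H(t), the integral of h(u) over [0, t], by Gauss-Legendre
    quadrature, mapping the standard nodes on [-1, 1] onto [0, t]."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    u = 0.5 * t * (x + 1.0)
    return 0.5 * t * np.sum(w * hazard(u))

# Weibull hazard h(t) = lam * gam * t^(gam-1), with known H(t) = lam * t^gam
lam, gam = 0.1, 1.5
weibull_hazard = lambda t: lam * gam * t ** (gam - 1.0)

# Stability check: increase the number of nodes until the estimate settles
for n in (5, 15, 30):
    print(n, cumulative_hazard_gl(weibull_hazard, 5.0, nodes=n))
```

Because the Weibull cumulative hazard is known analytically, the quadrature approximation can be verified directly; for a user-defined hazard without a closed form, only the stability of successive estimates can be checked.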
As it is a general framework, it may not be the most computationally efficient;
however, it is a useful tool for the development of novel models. For example, it may
be useful to develop ideas and test new models, but then spend time developing more
computationally efficient methods for specific cases. In particular, when using restricted
cubic splines on the log hazard scale, I described how the general framework can be
optimised in special cases with respect to the estimation routine, utilising the restricted
nature of the splines to incorporate the analytic parts of the cumulative hazard function,
in combination with the Gaussian quadrature. This provided a much more efficient es-
timation process, requiring far fewer quadrature nodes to obtain consistent estimates,
providing computational benefits. In application to the England/Wales breast cancer
data, I showed that the general numerical approach requires a large number of quadra-
ture nodes, compared to the combined analytic/numeric approach, in order to obtain
consistent estimates. This is due to the numerical approach struggling to capture high
hazards at the beginning of follow-up time. Given that hazard ratios are usually only
reported to two/three decimal places, the large number of nodes used in Section 5.9
will often not be required. In further examples not shown, where the hazard is low at
the beginning of follow-up, often < 30 nodes are sufficient with the full numerical ap-
proach. I have chosen to use restricted cubic spline functions of log time, since in many
applications it has been found that this provides an equivalent or better fit, compared
to using splines of time. However, in studies with age as the timescale it may be more
appropriate to use spline functions of untransformed time.
I described in Section 5.7 that, in order to calculate standard errors and
subsequently confidence intervals of functions such as the cumulative hazard and
survival, the delta method combined with numerical quadrature can be used. I showed briefly in
Section 5.9.1 that there was excellent agreement between confidence intervals calculated
using this method, and the equivalent analytic method where numerical integration and
the delta method were not required. It would be useful to evaluate this approximation
in further detail through simulation, particularly to calculate coverage in estimates of
the survival function, as was conducted by Rutherford et al. (2014) within the context
of RP models.
Numerical quadrature such as Gauss-Legendre will be utilised in subsequent chap-
ters to estimate the cumulative hazard function within a joint model of longitudinal
and survival data, and furthermore, alternative forms of Gaussian quadrature will be
described and evaluated when calculating the joint likelihood.
CHAPTER 6
Joint modelling of longitudinal and survival data
6.1. Chapter outline
In this chapter I outline the foundations of joint modelling of longitudinal and
survival data, which has been a highly active area of biostatistical research in recent
years. I describe the motivation, model framework and estimation of joint models,
which form the basis of extensions and more in depth clinical analyses presented in
subsequent chapters. I was recently involved in a review of current methods and issues
within joint modelling, conducted by the Drug Information Association joint modelling
working group, published in Statistics in Medicine, which is included in Appendix C
(Gould et al., 2014).
6.2. Introduction
It is intrinsic to clinical practice that longitudinal and survival data are collected
in parallel. Biomarker levels such as blood pressure are recorded at more than one
time point as an indicator of patient status, as well as clinical event data such as time
to myocardial infarction, stroke or death. In many settings the two processes have
an underlying association, for example, repeated measurements of biomarkers such as
prostate specific antigen (PSA) have been used to predict recurrence of prostate cancer
(Proust-Lima and Taylor, 2009). Cluster of Differentiation 4 (CD4) counts have also
been used extensively to predict time to progression to Acquired Immune Deficiency
Syndrome (AIDS) in Human Immunodeficiency Virus (HIV) infected patients (Wang
and Taylor, 2001; Wolbers et al., 2010). The key question here is, how do changes in a
repeatedly measured biomarker profile impact the time to an event of interest?
Conversely, this problem can be viewed from a different perspective. Longitudinal
studies, which measure biomarkers repeatedly over time, are often affected by drop-out.
When this drop-out is associated with the underlying biomarker profile, then the miss-
ingness process can be considered missing not at random, and will affect conclusions
if ignored in the longitudinal analysis (McArdle et al., 2005). Naively removing such
patients that dropped-out through list-wise deletion can have severe consequences on
analyses. If this element of the data generating mechanism is combined with the in-
herent clustering structure of the longitudinal data, with measurements nested within
patients (and possible higher levels), then it requires a complex model framework to at-
tempt to disentangle the various elements of the generating biological processes (Diggle
et al., 2008).
Recent arguments have been made that much more importance should be placed on
attempting to evaluate patients’ perspectives in clinical research (Gould et al., 2014).
Within cancer or cardiovascular trials, such commonly available outcomes as quality
of life (QOL), measured repeatedly, are often considered as supplementary information
and not utilised to their full extent. The joint analysis of QOL and survival is a
particular area which can benefit from a joint modelling approach, where death can
act as an informative drop-out process, especially when the two processes are strongly
associated (Billingham and Abrams, 2002; Ibrahim et al., 2010).
Although the two scenarios described above can be approached with the same mod-
elling framework, in this thesis I concentrate on the first aspect; including a biomarker,
measured repeatedly on the same subject, and with error, in a time to event analysis.
6.3. Survival analysis with time-varying covariates
The occurrence of time-varying covariates within the survival analysis framework
was briefly covered in Chapters 2 and 5. Here I give a formal definition of exogenous
and endogenous time-varying covariates, following Kalbfleisch and Prentice (2002) and
Rizopoulos (2012).
6.3.1. Exogenous covariates
Let yi(t) represent the covariate at time t for the ith subject, with Yi(t) = {yi(s), 0 ≤ s < t} representing the entire covariate history up to time t. I can formally define an exogenous covariate, which must satisfy the following relation:

Pr{s ≤ Ti < s + ds | Ti ≥ s, Yi(s)} = Pr{s ≤ Ti < s + ds | Ti ≥ s, Yi(t)}   (6.1)
for all s, t such that 0 < s ≤ t, and ds → 0. Given this definition, yi(.) is associated
with the rate of event; however, note the independence between the occurrence of the
event at time s, and the future profile when t > s. Exogenous covariates can be thought
of as being external to the patients in a particular study; a common example is
the time of day.
6.3.2. Endogenous covariates
Endogenous covariates can often be considered to be patient specific biomarkers, in
that they are often recorded on the cohort of patients under study, i.e. they are internal
processes. Given this, endogenous covariates can only be measured when the patient
is alive, therefore when the event of interest is death, the trajectory of the internal
process can have direct information relating to the time that the failure occurs. This
can be formally defined as
Si(t|Yi(t)) = Pr(Ti > t|Yi(t)) = 1 (6.2)
and therefore by definition, the existence of a measurement at time t means the
patient is still alive. An inherent feature of endogenous covariates is that they
are typically measured with error, since they are generally biological
measurements made on particular patients.
Furthermore, they are often measured intermittently, i.e. measurements are only
recorded at specified measurement times. It is these features which must be accom-
modated in the analysis of endogenous covariates. Common examples include repeated
measures of lung function, systolic and diastolic blood pressure, and serum cholesterol.
6.3.3. Data set up for survival analysis with time-varying covariates
For illustration, a single time-varying covariate can be incorporated into the propor-
tional hazards framework as follows
hi(t|Xi, Yi(t)) = h0(t) exp(Xiβ + αyi(t)) (6.3)
where exp(α) represents the hazard ratio for a one-unit increase in the time-varying
covariate, yi(t), at time t. This model can be fitted in standard statistical software
using the enter and exit notation, first shown in Table 5.3, which I repeat here for ease
of exposition in Table 6.1.
Table 6.1. Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure.

ID    Prothrombin   Treatment    Enter time   Exit time   Event
1     38            placebo      0            0.244       0
1     31            placebo      0.244        0.381       0
1     27            placebo      0.381        0.413       1
111   59            prednisone   0            0.246       0
111   60            prednisone   0.246        0.498       0
111   87            prednisone   0.498        0.745       0
111   59            prednisone   0.745        1.128       0
111   35            prednisone   1.128        1.158       1
Using this data setup invokes the strict assumption that the time-varying
covariate changes only at observation times, with the value of the covariate held
constant following each measurement until a new measurement is taken. This is illustrated
in Figure 6.1. There are fundamental issues with this approach when modelling an en-
dogenous time-varying covariate. Firstly, the magnitude of the association between the
biomarker and the event will be underestimated due to measurement error (Prentice,
1982). This particular issue will be investigated in Chapter 9. Secondly, by assuming
a last value carried forward approach, I make the assumption that the time-varying
covariate values do not change between observations, which is highly likely to be
unrealistic for many processes. The resulting interpolations conducted between
possibly sparse recordings of the endogenous covariate serve to induce further
measurement error and dilute the estimate of association.

Figure 6.1. Liver cirrhosis trial of prednisone treatment. Time-varying covariate within a survival analysis: observed measurements recorded at follow-up visits, and the biomarker profile used in model fitting (last value carried forward).
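The construction of the multiple-record data in Table 6.1 under last value carried forward can be sketched as follows (my own illustration; the column names are hypothetical), reproducing the rows for patient 1:

```python
def lvcf_rows(times, values, event_time, event):
    """Expand one subject's intermittent measurements into start/stop rows,
    carrying each observed value forward until the next measurement."""
    rows = []
    for j, (t, y) in enumerate(zip(times, values)):
        exit_t = min(times[j + 1], event_time) if j + 1 < len(times) else event_time
        if exit_t <= t:
            break  # measurement at or after the end of follow-up
        d = event if exit_t == event_time else 0
        rows.append({"value": y, "enter": t, "exit": exit_t, "event": d})
    return rows

# Patient 1 from Table 6.1: prothrombin measured at 0, 0.244 and 0.381; event at 0.413
for row in lvcf_rows([0, 0.244, 0.381], [38, 31, 27], 0.413, 1):
    print(row)
```

Only the final interval carries the event indicator; all earlier intervals are, by construction, event-free.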
In the next Section I introduce a framework to address the inherent complications
when analysing an endogenous covariate within a survival analysis.
6.4. Formulating the standard joint model
In essence, a joint model consists of two component submodels: a model for the
trajectory of longitudinal measurements, and a model for the event occurrence, that
share one or more parameters. The form of joint model which has dominated the liter-
ature assumes that the association between the time-to-event and longitudinal marker
is characterised by shared random effects, and it is this approach which I adopt.
Building on the notation used in previous chapters and that of Rizopoulos (2012),
let Si be the true survival time of patient i = 1, . . . , n, and Ti = min(Si, Ci) the observed
survival time, with Ci the censoring time. Define an event indicator di, which takes
the value of 1 if Si ≤ Ci and 0 otherwise. Let yij = yi(tij), j = 1, . . . , ni denote the
observed longitudinal responses, with yi(tij) the jth observed longitudinal response of a
continuous biomarker for the ith patient taken at time tij . Each patient has ni repeated
measures. I further define a vector of time-independent baseline covariates, Ui.
6.4.1. Longitudinal submodel
The observed longitudinal responses, yij , are inherently measured intermittently, and
with error. To accommodate these features of the data, I can utilise the linear mixed
effects framework, to estimate the true underlying, and complete, subject-specific tra-
jectory function (Diggle et al., 2002). I therefore define
yi(t) = mi(t) + ei(t),   ei(t) ∼ N(0, σ²)   (6.4)

mi(t) = xiᵀ(t)β + ziᵀ(t)bi + uiᵀδ,   bi ∼ N(0, Σ)   (6.5)
with time-dependent design matrices xi(t) and zi(t) for the fixed β and random bi
effects, respectively. Furthermore, I also define a vector of time invariant baseline
covariates, ui ∈ Ui, and corresponding regression coefficients, δ. I assume the
measurement error, ei(t), is normally distributed with variance σ², is independent
of the random effects, and that cov(ei(t), ei(u)) = 0 for t ≠ u.
Through this formulation, mi(t) now represents the true unobserved longitudinal
values, essentially removing the measurement error at any time point t, not just
the times at which measurements were recorded, and tailored to individual patients
through the random effects. I can then relate the longitudinal trajectory (or
components of it, see Section 6.5) directly to the risk of event. Returning to the
prothrombin index example above, Figure 6.2 illustrates this concept, showing the
smooth estimated profile mi(t).
Flexibility in the longitudinal submodel can be incorporated through the use of
fractional polynomials of time, for example, which will often be sufficient to capture
the longitudinal trajectory (Royston and Altman, 1994). B-splines and restricted cubic
splines have also been used in the joint model context (Brown et al., 2005; Rizopoulos
and Ghosh, 2011). Restricted cubic splines will be used to capture variable longitudinal
profiles in Chapter 9.
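As a small numerical illustration of Equations (6.4) and (6.5), the sketch below evaluates a subject-specific trajectory mi(t) under a random intercept and slope model, and generates noisy observations from it (my own illustration; all parameter values are invented):

```python
import numpy as np

beta = np.array([75.0, -2.0])   # fixed effects: intercept and slope (hypothetical)
b_i = np.array([5.0, 0.8])      # subject i's random deviations, drawn from N(0, Sigma)
sigma = 4.0                     # residual (measurement error) standard deviation

def m_i(t):
    """True unobserved trajectory, Eq (6.5): x_i(t)' beta + z_i(t)' b_i.
    Here x_i(t) = z_i(t) = (1, t), so m_i(t) = 80 - 1.2 t for this subject."""
    design = np.array([1.0, t])
    return design @ (beta + b_i)

# Observed responses, Eq (6.4): y_i(t) = m_i(t) + e_i(t), e_i(t) ~ N(0, sigma^2)
rng = np.random.default_rng(1)
t_obs = np.array([0.0, 1.0, 2.5])
y_obs = np.array([m_i(t) for t in t_obs]) + rng.normal(0.0, sigma, size=3)
```

The random effects shift both the intercept and slope away from the population values, which is what tailors mi(t) to the individual patient.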
6.4.2. Survival submodel
The idea behind a joint model is to link the component processes together through
some shared parameters.

Figure 6.2. Time-varying covariate within a joint model: observed measurements recorded at follow-up visits, the biomarker profile used in model fitting (last value carried forward), and the estimated underlying biomarker profile from a joint model.

Given Equation (6.5), I can now incorporate mi(t), our true
unobserved patient specific longitudinal trajectory, to the risk of an event. I define
Mi(t) = {mi(s), 0 ≤ s ≤ t}, to be the true unobserved longitudinal profile up to
time t. I define the proportional hazards time-to-event submodel

h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + αmi(t)}   (6.6)
where h0(t) is the baseline hazard function, and vi ∈ Ui a set of baseline time-
independent covariates with associated vector of log hazard ratios, ψ. By including
the true unobserved trajectory function, mi(t), into the linear predictor of the pro-
portional hazards model, I now begin to link the component submodels to form the
joint modelling framework. Multiplying mi(t) by an association parameter, α, I can
interpret exp(α) as the hazard ratio for a one unit increase in mi(t), at time t. In this
formulation I assume the association is based on the current value of the longitudinal
response at time t. This, and other association structures, are described further in
Section 6.5.
The survival function follows naturally
S(t|Mi(t), vi) = exp( −∫_0^t h0(u) exp{ψᵀvi + αmi(u)} du )   (6.7)
Given this definition, I see that the survival function is dependent on the entire history
of the trajectory function, mi(t). The integral in Equation (6.7), due to the time-
dependent nature of the longitudinal process mi(t), often requires numerical integration
to evaluate, which complicates the estimation process. This will be described further
in Section 6.6.
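To make this concrete, the sketch below (my own illustration, with invented parameter values) evaluates Equation (6.7) by Gauss-Legendre quadrature for a Weibull baseline hazard and a linear subject-specific trajectory:

```python
import numpy as np

lam, gam = 0.1, 2.0            # Weibull baseline: h0(t) = lam * gam * t^(gam-1)
psi_v = 0.3                    # linear predictor contribution, psi' v_i
alpha = 0.25                   # association parameter
m = lambda t: 5.0 - 0.2 * t    # true subject-specific trajectory m_i(t)

def survival(t, nodes=30):
    """Evaluate S(t | M_i(t), v_i) of Eq (6.7): because the integrand depends
    on m_i(u), the cumulative hazard has no closed form in general."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    u = 0.5 * t * (x + 1.0)                       # map nodes from [-1, 1] to [0, t]
    h = lam * gam * u ** (gam - 1.0) * np.exp(psi_v + alpha * m(u))
    return np.exp(-0.5 * t * np.sum(w * h))

print(survival(1.0), survival(5.0))
```

Note that the whole trajectory up to t enters through the quadrature nodes, reflecting the dependence of the survival function on the history Mi(t).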
6.4.2.1. Choosing a baseline hazard function
The two fundamental papers that developed the joint modelling framework I have de-
scribed above, from a frequentist perspective, were that of Wulfsohn and Tsiatis (1997)
and Henderson et al. (2000). They both chose the Cox model as the survival submodel
of choice, which of course does not directly estimate the baseline hazard function (Cox,
1972; Wulfsohn and Tsiatis, 1997; Henderson et al., 2000). In general, this can be
seen as both a strength and a weakness. By leaving the baseline hazard function
unspecified, I avoid making any assumptions about the underlying functional form.
Conversely, as I have described in previous chapters, if interest lies in absolute
measures of risk, such as predicting the outcome for a particular patient, which
is very often precisely what is wanted, then it is much more convenient to work
within a parametric framework.
In particular, Hsieh et al. (2006) showed that incorporating the semi-parametric
Cox model into the joint model framework results in underestimation of the
parameter standard errors. Consequently, bootstrapping is required to obtain appropriate
standard errors. Given that computationally intense numerical integration is required
to fit the models (described in detail in Section 6.6), this is an undesirable aspect of
this particular joint model framework.
This issue is avoided when taking a parametric approach. Such standard para-
metric choices include the exponential, Weibull and Gompertz distributions. These
forms of survival model were described in Section 2.6 of Chapter 2. Of course, as
described in Chapters 4 and 5, assuming a standard survival distribution can be con-
sidered restrictive in the underlying baseline hazard functions that can be accurately
captured. Incorporating more flexibility into the survival submodel choice is evaluated
and discussed further in Chapters 7 and 8.
6.5. Alternative association structures
The standard joint model framework, described above, relates the true unobserved
longitudinal profile at time t, directly to the risk of event at time t. This is often referred
to as the current value parameterisation. Exploring alternative, clinically meaningful,
ways of linking the two processes expands the usefulness of the joint modelling frame-
work.
In this Section I describe various alternatives to the current value parameterisation,
in particular, concentrating on association structures that are available within the Stata
package stjm, developed as part of this thesis (described in detail in Chapter 10).
6.5.1. Interaction effects
The standard joint model association structure, seen in Equation (6.6), assumes the
same association between the true longitudinal value and the risk of event for all
patients. In some cases this is an unrealistic assumption, and it may be of
interest to allow different values of association for different patient subgroups.
This can be achieved by forming interactions between baseline covariates
and the true unobserved longitudinal trajectory function, as follows
h(t|Mi(t), vi1, vi2) = h0(t) exp[ψᵀvi1 + αᵀ{vi2 × mi(t)}]   (6.8)
where vi1,vi2 ∈ Ui. As in Equation (6.6), I have a vector of baseline covariates, vi1,
but now introduce a vector of interaction covariates, vi2, which is interacted with the
true longitudinal profile, mi(t). This then provides a vector of association parameters
α, providing different associations for different covariate patterns. If vi2 = 1, then
Equation (6.8) reduces to the standard current value parameterisation in Equation
(6.6).
6.5.2. Time-dependent slope
The association structures described above are parameterised to link the current value
of the true longitudinal response. Given that I am modelling the longitudinal outcome
over time, a natural extension to the association structures is to link the rate of change,
or slope,
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α1mi(t) + α2m′i(t)}   (6.9)

with

m′i(t) = dmi(t)/dt = d{xiᵀ(t)β + ziᵀ(t)bi}/dt   (6.10)
Inclusion of this further association structure, from a clinical perspective, could be
important where interest not only lies in the current values of, say, a biomarker level,
but also how the rate of change of the biomarker is associated with the risk of event. For
example, it may be of interest to know not only the effect that the level of the biomarker
has, but whether it is on an increasing or decreasing trend, at time t. This association
structure was incorporated into a two-stage regression calibration joint model by Ye
et al. (2008). Assessment of the added benefit of including the rate of change of CD4
trajectories within a joint model framework to model the risk of progression to AIDS
or death in HIV-positive patients was conducted by Wolbers et al. (2010).
6.5.3. Random effects parameterisation
Finally, I define a time-independent association structure, which includes only the
random effects in the linear predictor of the survival submodel
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + αᵀ(β + bi)}   (6.11)
Equation (6.11) includes both the population level mean of the random effect, β, plus
the subject specific deviation, bi. Alternatively, I can include only the subject specific
deviation, as follows
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + αᵀbi}   (6.12)
The association parameters in Equations (6.11) and (6.12) differ in their interpretation.
For example, consider a random intercept and random slope longitudinal submodel
mi(t) = (β0 + b0i) + (β1 + b1i)t (6.13)
Under Equation (6.11), I can investigate the association between patient specific values
of the true longitudinal outcome at t = 0 using
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α1(β0 + b0i)}   (6.14)
where exp(α1) is the hazard ratio for a one unit increase in the baseline value of the
longitudinal outcome i.e. the intercept. This particular association structure will be
discussed in more detail in Chapter 9. Alternatively, using Equation (6.12)
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α2b0i}   (6.15)
where now the association is assumed to be based only on the subject specific deviation
from the population mean intercept.
I also draw attention to an equivalence which can exist between association
structures under certain simple longitudinal models. For example, using the
trajectory model in Equation (6.13), the rate of change association structure
described in Section 6.5.2 is equivalent to a time-independent association
structure, as follows
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α3 dmi(t)/dt} = h0(t) exp{ψᵀvi + α3(β1 + b1i)}
where exp(α3) now represents the hazard ratio for a one unit increase in the subject
specific slope of the true longitudinal outcome.
It is important to note, however, that interpretation under the time-independent
associations must be undertaken with caution. When a complex longitudinal trajectory
function is chosen for the random effects, such as fractional polynomials or splines, then
linking the random coefficients of such spline terms, for example, does not provide a
clinically interpretable association parameter.
From a computational perspective, however, the time-independent association struc-
tures are particularly useful, as the cumulative hazard function (and consequently the
survival function) required in the estimation process often requires numerical integra-
tion under commonly used log hazard scale parametric survival submodels and time-
dependent association structures. By using a time-independent association structure,
this numerical integration can be avoided as the cumulative hazard function has a
closed form, providing direct computational benefits. This is described in more detail
in the next section.
On a final note, incorporating more than one association structure into the linear
predictor can allow investigation of the effect of various aspects of the longitudinal
trajectory, such as in Equation (6.9) (Henderson et al., 2000). For example, combining
the current value parameterisation with the random intercept association can provide
an association framework which can allow for the time-dependent changes in the lon-
gitudinal response, whilst incorporating the often highly predictive component of the
baseline values.
6.6. Estimation
The estimation of a joint longitudinal and survival model is not a trivial task.
Motivated by the difficulty in estimation, and lack of available software, early work in
the field of joint modelling adopted a two-stage approach, in efforts to improve on a
standard survival analysis whereby the observed values of a time-varying covariate were
used, as described in Section 6.3.3. The two-stage approach first fitted a longitudinal
mixed effects model to the observed repeated measures, and then obtained subject-
specific predictions for the longitudinal marker, including empirical Bayes estimates of
the random effects, and subsequently used the predictions as a time-varying covariate
within a survival model (Self and Pawitan, 1992). The appeal of this approach is
in its general simplicity, in that it can be implemented in any standard statistical
software. However, many authors such as Tsiatis and Davidian (2001) and Sweeting
and Thompson (2011) have shown that although this method offers an improvement in
terms of bias compared to using the observed marker values, this two-stage approach
still produces substantial bias and poor coverage in a range of scenarios. In particular,
Sweeting and Thompson (2011) noted that any informative drop-out is ignored, and
the uncertainty in the estimation of the maximum likelihood estimates and empirical
Bayes contributions are ignored in the survival model, producing estimates that are too
precise.
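For intuition, stage one of the two-stage approach can be sketched for the simplest case of a random intercept model, y_ij = beta0 + b_i + e_ij, with known variance components. The empirical Bayes (BLUP) estimate of b_i shrinks the subject's raw mean deviation toward the population value; the function below is my own illustration, not code from the cited papers:

```python
def eb_intercept(y_i, beta0, sigma2, tau2):
    """Empirical Bayes (BLUP) estimate of the random intercept b_i, given
    subject i's observations y_i, residual variance sigma2 and random
    intercept variance tau2. The shrinkage factor n*tau2 / (n*tau2 + sigma2)
    tends to 1 as the number of observations (or tau2) grows."""
    n = len(y_i)
    shrink = n * tau2 / (n * tau2 + sigma2)
    return shrink * (sum(y_i) / n - beta0)

# Three measurements at 2.0 around a population intercept of 1.0:
# the raw deviation of 1.0 is shrunk toward zero
print(eb_intercept([2.0, 2.0, 2.0], beta0=1.0, sigma2=1.0, tau2=1.0))
# → 0.75
```

Treating these shrunken predictions as known, error-free covariate values in the second-stage survival model is precisely what ignores the estimation uncertainty noted above.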
Consequently, much of the recent research has concentrated on full likelihood ap-
proaches, where the longitudinal and survival processes are estimated simultaneously
within a joint framework, and it is this approach which I take in this thesis. Other
alternatives include a Bayesian approach, using Markov Chain Monte Carlo methods
(Faucett and Thomas, 1996; Brown and Ibrahim, 2003a), which is appealing as it avoids
the need for numerical integration to integrate out the random effects (as will be
described in subsequent sections under a likelihood approach), although it can
still be highly computationally intensive.
6.6.1. Deriving the joint likelihood
I now define the log-likelihood under the full joint model approach, for the joint dis-
tribution of Ti, di, yi, the survival time, event indicator, and continuous longitudinal
response, respectively. I define a vector of time-independent random effects, bi which
serve to characterise the longitudinal and survival processes. I further define an overall
parameter vector θ = {θt, θy, θb}, where θt is the parameter vector for the
survival process, θy is the vector of parameters for the longitudinal process, and
θb contains the elements of the variance-covariance matrix of the random effects.
The fundamental assumption of this joint model framework is that the random
effects account both for the association between the survival and longitudinal
processes and for the correlation between the repeated measurements of the
longitudinal outcome. Following Rizopoulos (2012), this can be expressed as
p(Ti, di,yi|bi;θ) = p(Ti, di|bi;θ)p(yi|bi;θ) (6.16)
p(yi|bi;θ) = ∏_{j=1}^{ni} p{yi(tij)|bi;θ} (6.17)
where j indexes the longitudinal measurements. Furthermore, I make the common
assumptions of independent censoring, and non-informative times of measurement.
Therefore, the log-likelihood for the ith patient can be written as
log p(Ti, di,yi;θ) = log ∫ p(Ti, di,yi, bi;θ) dbi
= log ∫ p(Ti, di|bi;θt) [∏_{j=1}^{ni} p{yi(tij)|bi;θy}] p(bi;θb) dbi (6.18)
where
p(Ti, di|bi;θt) = hi(Ti|Mi(Ti);θt)^di × Si(Ti|Mi(Ti);θt)
= [h0(Ti) exp{ψ^T vi + αmi(Ti)}]^di × exp( −∫_0^{Ti} h0(u) exp{ψ^T vi + αmi(u)} du ) (6.19)

p{yi(tij)|bi;θy} = (2πσ²)^{−1/2} exp( −[yi(tij) − mi(tij)]² / 2σ² ) (6.20)

and

p(bi;θb) = (2π)^{−qb/2} |Σ|^{−1/2} exp( −bi^T Σ^{−1} bi / 2 ) (6.21)
with a continuous longitudinal response, yi, assuming conditional normality, and nor-
mally distributed random effects with variance-covariance matrix Σ, where qb is the
dimension of the random effects.
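To make these components concrete, the following sketch evaluates the conditional likelihood contribution of Equations (6.19)-(6.20) for a single subject, given their random effects. The Weibull baseline hazard h0(t) = λγt^{γ−1}, the linear trajectory mi(t) = b0 + b1 t, the scalar linear predictor psi_v, and all numerical values are illustrative assumptions, not part of the framework itself; the cumulative hazard integral is approximated with Gauss-Legendre quadrature.

```python
import numpy as np

def cond_loglik_i(T, d, y, t_obs, b, theta):
    """Conditional log-likelihood contribution for one subject, following
    Equations (6.19)-(6.20). Assumes (illustratively) a Weibull baseline
    hazard h0(t) = lam*gam*t^(gam-1), a linear trajectory m_i(t) = b0 + b1*t,
    and a scalar linear predictor psi_v = psi'v_i."""
    lam, gam, psi_v, alpha, sigma = theta
    b0, b1 = b
    m = lambda t: b0 + b1 * t
    # hazard at the event/censoring time T
    log_haz_T = np.log(lam * gam) + (gam - 1) * np.log(T) + psi_v + alpha * m(T)
    # cumulative hazard via 30-point Gauss-Legendre quadrature on [0, T]
    nodes, wts = np.polynomial.legendre.leggauss(30)
    u = 0.5 * T * (nodes + 1.0)
    cumhaz = 0.5 * T * np.sum(
        wts * lam * gam * u ** (gam - 1) * np.exp(psi_v + alpha * m(u)))
    log_surv = d * log_haz_T - cumhaz
    # longitudinal component, Equation (6.20)
    resid = y - m(t_obs)
    log_long = np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                      - resid ** 2 / (2 * sigma ** 2))
    return log_surv + log_long
```

Setting α = 0 gives the closed-form cumulative hazard λT^γ exp(ψ^T vi), which provides a convenient check on the numerical integration.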
The estimation of full likelihood joint models has almost exclusively been through
(semi-parametric) maximum likelihood, implemented using Expectation-Maximisation
(EM) algorithms, where in the expectation step the unknown random effects are treated
as missing values (Wulfsohn and Tsiatis, 1997; Henderson et al., 2000). Furthermore,
many authors have proposed a Bayesian approach to estimation, through Markov Chain
Monte Carlo (MCMC) techniques (Brown and Ibrahim, 2003a,b; Wang and Taylor,
2001). In addition, Tsiatis and Davidian (2001) have proposed a conditional score
approach, which makes no distributional assumption on the random effects.
Alternatively, within the maximum likelihood framework, estimation can be con-
ducted via a direct maximisation of the observed data log-likelihood using standard
maximisation techniques such as the Newton-Raphson algorithm. It is this approach
which I adopt to fit the models. As has been discussed in Rizopoulos et al. (2009), the
score equations can be calculated; however, as with the log-likelihood, numerical integration is required to compute them. Within a generalised linear mixed effects model context, Lesaffre and Spiessens (2001) have shown that the integrals required for such derivatives are often more poorly approximated by quadrature than are the numerical estimates obtained using finite differences.
The important aspect to note from Equation (6.18) is that the (possibly multi-dimensional) integral is analytically intractable, and therefore requires numerical integration techniques to evaluate, and hence maximise, the likelihood. In the next section I describe methods to evaluate the likelihood when integrating over the random effects.
Furthermore, under a parametric survival submodel, the integral in Equation (6.19)
requires numerical integration to evaluate it due to the time-dependency of the longi-
tudinal submodel. Evaluation of this form of integral was described in Chapters 4 and
5. It is clear that the requirement of numerical integration to calculate the survival function, nested within (possibly multi-dimensional) numerical integration over the random effects, makes this a computationally demanding and challenging task.
6.6.2. Numerical integration over the infinite domain: Gauss-Hermite
quadrature
As described in earlier chapters, Gaussian quadrature is a numerical method to ap-
proximate analytically intractable integrals. In the case of the joint likelihood shown
in Equation (6.18), the (possibly multi-dimensional) integral over the infinite domain
must be evaluated using such techniques. In this Section I describe simple and adaptive
Gauss-Hermite quadrature, used to approximate the log-likelihood seen in Equation
(6.18), with my exposition based on Naylor and Smith (1982) and Tuerlinckx et al.
(2006).
6.6.2.1. Simple Gauss-Hermite quadrature
Gauss-Hermite quadrature is used to evaluate analytically intractable integrals of the
form (Naylor and Smith, 1982)
∫_{−∞}^{∞} e^{−x²} f(x) dx ≈ Σ_{q=1}^{m} wq f(xq) (6.22)
where xq and wq are the quadrature nodes and weights, with
wq = 2^{m−1} m! √π / ( m² [H_{m−1}(xq)]² ) (6.23)
and xq is the qth root of the Hermite polynomial Hm(x). The theoretical basis for Gauss-Hermite quadrature states that, with m nodes, the quadrature formulation will be exact if f(x) is a polynomial of degree at most 2m − 1. Following Naylor and Smith (1982)
and Tuerlinckx et al. (2006), if I replace the weighting function e^{−x²} in Equation (6.22) with a normal density, φ(·), with mean µ and standard deviation σ, I have

∫_{−∞}^{∞} f(x) φ(x|µ, σ²) dx = (1/(σ√(2π))) ∫_{−∞}^{∞} f(x) exp[ −(x − µ)² / 2σ² ] dx (6.24)
Then undertaking a change of variable, where x = µ + σ√2 r, and correspondingly r = (x − µ)/(σ√2), I have

∫_{−∞}^{∞} f(x) φ(x|µ, σ²) dx = (σ√2/(σ√(2π))) ∫_{−∞}^{∞} f(µ + σ√2 r) e^{−r²} dr (6.25)
≈ Σ_{q=1}^{m} f(µ + σ√2 xq) wq/√π (6.26)

which corresponds to a quadrature evaluation based on the normal kernel, with nodes µ + σ√2 xq, and weights wq/√π. If I use the standard normal kernel, then I have nodes, dq = √2 xq, and weights, vq = wq/√π.
So far I have considered the univariate case, where I wish to integrate over a single dimension, i.e. a single random effect. If I wish to move to the multivariate case, e.g. random intercept and slope, I can extend the univariate case to an integration over a multivariate normal distribution of dimension Q. I can therefore define the vector of nodes
for the multivariate standard normal as d_{q1,...,qQ} = (dq1, . . . , dqQ). However, this must be pre-multiplied by the Choleski decomposition of the variance-covariance matrix of the random effects, Σ^{1/2}. To illustrate the multivariate case, I present Figures 6.3 and 6.4. In Figure 6.3 I have the 2-dimensional basis node locations, d_{q1,q2} = (dq1, dq2), for 9-point Gauss-Hermite quadrature.

[Figure 6.3. Basis node locations for 2-dimensional 9-point simple Gauss-Hermite quadrature; axes show the quadrature points for, e.g., a random intercept and a random slope.]

These basis node locations are then pre-multiplied
by the Choleski decomposition of the random effects variance-covariance matrix. In
Figure 6.4, I have multiplied the basis node locations by Σ^{1/2} = (1, 0.5; 0.5, 1), to represent
positive correlation between the two random effects. The log-likelihood is then evalu-
ated at each of the ‘co-ordinates’, multiplied by the appropriate weights and summed
across dimensions.
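The construction of the transformed 2-dimensional grid can be sketched as follows, with the illustrative variance-covariance matrix used for Figure 6.4. As a check, the quadrature exactly recovers the covariance between the two random effects.

```python
import numpy as np

m = 9
x_q, w_q = np.polynomial.hermite.hermgauss(m)
d = np.sqrt(2.0) * x_q                     # standard normal basis nodes
v = w_q / np.sqrt(np.pi)                   # standard normal weights
D1, D2 = np.meshgrid(d, d, indexing="ij")  # 81 basis 'co-ordinates'
grid = np.stack([D1.ravel(), D2.ravel()])  # shape (2, 81)
V = np.outer(v, v).ravel()                 # product weights v_{q1} v_{q2}

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])  # illustrative Sigma, as in Fig. 6.4
L = np.linalg.cholesky(Sigma)               # lower-triangular Choleski factor
b = L @ grid                                # transformed node locations

# Check: quadrature recovers Cov(b1, b2) = 0.5 under N(0, Sigma)
cov12 = np.sum(V * b[0] * b[1])
print(cov12)
```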
Therefore, I can now write the log-likelihood seen in Equation (6.18), as
log p(Ti, di,yi;θ) = log ∫ p(Ti, di|bi;θt) [∏_{j=1}^{ni} p{yi(tij)|bi;θy}] p(bi;θb) dbi
[Figure 6.4. Basis node locations for 2-dimensional 9-point simple Gauss-Hermite quadrature, transformed by the Choleski decomposition of the variance-covariance matrix; axes show the quadrature points for, e.g., a random intercept and a random slope.]
≈ log Σ_{b1=1}^{m} · · · Σ_{bQ=1}^{m} [ p(Ti, di|Σ^{1/2} d_{b1,...,bQ};θt) {∏_{j=1}^{ni} p{yi(tij)|Σ^{1/2} d_{b1,...,bQ};θy}} × v_{b1} . . . v_{bQ} ] (6.27)
The approximation to the log-likelihood seen in Equation (6.27) depends on the number of quadrature points, m, and consequently on the closeness of the objective function to a polynomial of degree 2m − 1. Standard practice within the generalised linear mixed modelling framework, when utilising numerical quadrature, is to fit models with an increasing number of quadrature points, until the difference between estimates when using m and m + 1 nodes is sufficiently small. It is also important to note that the computational burden of such quadrature evaluations grows substantially with an increasing number of nodes, and grows exponentially with each increase in dimension, Q.
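This node-increasing practice can be illustrated on a simple non-polynomial Gaussian integral, E[exp(X)] for X ∼ N(0, 1), whose exact value is exp(1/2); the choice of integrand is illustrative only, standing in for a likelihood contribution.

```python
import numpy as np

def gh_mean_exp(m):
    """m-node Gauss-Hermite approximation to E[exp(X)], X ~ N(0, 1)."""
    x_q, w_q = np.polynomial.hermite.hermgauss(m)
    return np.sum(w_q / np.sqrt(np.pi) * np.exp(np.sqrt(2.0) * x_q))

exact = np.exp(0.5)
for m in (3, 5, 9, 15):
    # the error shrinks rapidly as m grows, until successive fits stabilise
    print(m, abs(gh_mean_exp(m) - exact))
```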
Under simple Gauss-Hermite quadrature, the same node locations are used for
each subject, centred at the zero vector and scaled by the Choleski decomposition of
the estimated variance-covariance matrix of the random effects. Clearly, this will not
place the nodes in the optimum locations of the likelihood function for each subject.
If the between-subject variance is large then it is likely that the use of common node
locations will miss key information from the likelihood function for specific subjects,
resulting in an inefficient algorithm.
6.6.2.2. Adaptive Gauss-Hermite quadrature
To improve on the simple Gauss-Hermite (GH) quadrature described above, Pinheiro
and Bates (1995) proposed an adaptive quadrature routine, which allows subject-specific
centring and scaling of the quadrature nodes, in an attempt to place the nodes at the
most appropriate positions of the integral function for each subject.
This is achieved by considering an alternative kernel distribution, that of φ(bi|b̂i, Ω̂i), as follows

log p(Ti, di,yi;θ) = log ∫ p(Ti, di|bi;θt) [∏_{j=1}^{ni} p{yi(tij)|bi;θy}] p(bi;θb) dbi
= log ∫ [ p(Ti, di|bi;θt) {∏_{j=1}^{ni} p{yi(tij)|bi;θy}} p(bi;θb) / φ(bi|b̂i, Ω̂i) ] φ(bi|b̂i, Ω̂i) dbi (6.28)
Equation (6.28) can then be evaluated by GH quadrature with this alternative normal kernel density, with the nodes appropriately transformed using ri = b̂i + Ω̂i^{1/2} d_{b1,...,bQ}. This gives

log p(Ti, di,yi;θ) (6.29)
≈ log [ (2π)^{Q/2} |Ω̂i|^{1/2} Σ_{b1=1}^{m} · · · Σ_{bQ=1}^{m} p(Ti, di|ri;θt) {∏_{j=1}^{ni} p{yi(tij)|ri;θy}} × φ(ri|0,Σ) exp( d^T_{b1,...,bQ} d_{b1,...,bQ} / 2 ) ∏_{q=1}^{Q} v_{bq} ] (6.30)
I illustrate the adaptive approach conceptually in Figure 6.5. In Figure 6.5A I have a
normal density with population mean, µ = 0, and standard deviation, σ, to represent
a random intercept for example, that I wish to integrate out. Overlaid are the nodes
for 5-point simple GH quadrature, where the height of the vertical bars represents the
size of the weight applied at that node location. They will be scaled by the current
estimate of population standard deviation of the random effect, σ. As described above,
if the variation between subjects is moderate then it is highly likely that the original
node locations will fail to capture important contributions to the likelihood of some
patients, as shown in Figure 6.5B. The adaptive nature of GH quadrature allows us,
between each iteration of the estimation process, to estimate how each subject differs
from the population. Figure 6.5C illustrates this ‘shift’ for 2 patients. In Figure 6.5D I can use the estimate, b̂1, to shift the original nodes, whilst in Figure 6.5E, I can then scale them using the estimated standard error of b̂1, to locate the quadrature nodes in the optimum location for subject 1. Further details on the implementation in Stata
can be found in Chapter 10.
This process of course extends to the multivariate case. The major benefit of this approach, from a computational efficiency point of view, is a substantial reduction in computation time, because far fewer quadrature nodes are generally required to obtain accurate estimates and standard errors, compared to the use of simple Gauss-Hermite quadrature. This will be explored through simulation in Chapter 7.
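The effect of recentring and rescaling can also be sketched numerically. Below, a hypothetical subject-level likelihood contribution, sharply peaked well away from the population mean, is integrated against the standard normal random effect density: simple GH with 5 nodes centred at zero misses essentially all of the mass, whereas recentring at the mode of the log integrand and rescaling by its curvature recovers the exact answer with the same 5 nodes. All numerical values are illustrative.

```python
import numpy as np

m = 5
x_q, w_q = np.polynomial.hermite.hermgauss(m)

# Hypothetical subject-level likelihood contribution, sharply peaked at b = 4
# (scale 0.1), multiplied by the standard normal random effect density
g = lambda b: (np.exp(-0.5 * ((b - 4.0) / 0.1) ** 2)
               * np.exp(-0.5 * b ** 2) / np.sqrt(2.0 * np.pi))
exact = np.sqrt(0.01 / 1.01) * np.exp(-16.0 / 2.02)   # closed form (Gaussian product)

# Simple GH: nodes centred at the population mean 0, scaled by sigma = 1
simple = np.sum(w_q / np.sqrt(np.pi)
                * np.exp(-0.5 * ((np.sqrt(2.0) * x_q - 4.0) / 0.1) ** 2))

# Adaptive GH: recentre at the mode of log g, rescale by its curvature
b_hat = 4.0 / 1.01            # mode of log g
omega = 1.0 / 101.0           # minus the inverse second derivative at the mode
tau = np.sqrt(omega)
r = b_hat + tau * np.sqrt(2.0) * x_q
adaptive = tau * np.sqrt(2.0) * np.sum(w_q * np.exp(x_q ** 2) * g(r))

print(simple, adaptive, exact)
```

Here the integrand happens to be Gaussian, so the adaptively placed kernel matches it exactly; in a real joint model the gain is smaller but the principle is the same.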
6.7. Prediction
In this Section I describe some of the clinically useful predictions that can be ob-
tained following the fitting of a joint model. The presence of both fixed and random
effects means that a variety of both simple and complex predictions can be made. Av-
erage predictions can be made simply by setting bi = 0 for all patients; however, for
patient-specific predictions I need estimates of the random effects, bi. There is growing interest in personalised medicine, with predictions tailored to individual patients being a clinically appealing goal. The joint model framework lends itself to this quite naturally.
[Figure 6.5. Conceptual example of adaptive Gauss-Hermite quadrature. Panels: (A) 5-point simple GH quadrature to evaluate the population level random intercept; (B) subject-specific deviations from the population mean; (C) estimate subject-specific shifts from the population mean; (D) centre nodes for subject 1 using the estimate of b̂1; (E) re-scale centred nodes using se(b̂1).]
6.7.1. Random effects
In order to obtain patient-specific predictions, I must obtain estimates of the random
effects vector, bi. This can be conducted within an empirical Bayes framework (Rabe-
Hesketh and Skrondal, 2012; Rizopoulos, 2012). I assume that the prior distribution
for the random effects is simply φ(bi;θ), the multivariate normal distribution specified
for the model above. This can then be combined with the likelihood, conditional on
the random effects vector, p(Ti, di|bi;θ)p(yi|bi;θ), to give (using Bayes’ theorem)
p(bi|Ti, di,yi;θ) = φ(bi;θ) p(Ti, di|bi;θ) p(yi|bi;θ) / p(Ti, di,yi;θ) (6.31)
∝ φ(bi;θ) p(Ti, di|bi;θ) p(yi|bi;θ) (6.32)
the posterior distribution, given the observed responses Ti, di and yi. The desired
empirical Bayes prediction is the mean of the posterior distribution, calculated at the parameter estimates θ̂, given by

b̂i = ∫ bi p(bi|Ti, di,yi; θ̂) dbi (6.33)

with variance

var(b̂i) = ∫ (bi − b̂i)² p(bi|Ti, di,yi; θ̂) dbi (6.34)
It should be noted that both integrals in Equations (6.33) and (6.34) remain analytically
intractable, requiring quadrature.
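For a toy normal-normal random intercept model, where the posterior happens to be available in closed form as a check, Equations (6.33) and (6.34) can be sketched via quadrature as follows; the data, variance values, and kernel centring are all illustrative.

```python
import numpy as np

# Toy random-intercept model: y_j = b + e_j, b ~ N(0, s_b^2), e_j ~ N(0, s_e^2)
y = np.array([1.2, 0.8, 1.5])     # illustrative measurements for one subject
s_b, s_e = 1.0, 1.0

def post_unnorm(b):
    """Unnormalised posterior: phi(b; 0, s_b^2) x prod_j p(y_j | b)."""
    prior = np.exp(-0.5 * (b / s_b) ** 2)
    lik = np.exp(-0.5 * ((y[:, None] - b[None, :]) ** 2).sum(axis=0) / s_e ** 2)
    return prior * lik

# Quadrature against a roughly centred/scaled normal kernel (centre and scale
# are deliberately rough guesses; enough nodes compensate)
x_q, w_q = np.polynomial.hermite.hermgauss(40)
c, tau = 0.8, 0.6
b = c + np.sqrt(2.0) * tau * x_q
core = w_q * np.exp(x_q ** 2) * post_unnorm(b)          # kernel-corrected terms
b_hat = np.sum(core * b) / np.sum(core)                 # Equation (6.33)
v_hat = np.sum(core * (b - b_hat) ** 2) / np.sum(core)  # Equation (6.34)

# Closed form here: posterior precision 3/s_e^2 + 1/s_b^2 = 4,
# posterior mean = (sum of y)/4 = 0.875, posterior variance = 1/4
print(b_hat, v_hat)
```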
6.7.2. Longitudinal and survival predictions
6.7.2.1. Longitudinal predictions
The longitudinal trajectory can be calculated based only on the fixed portion of the
model, using
mi(t) = xi(t)^T β + ui^T δ (6.35)
which can be used to calculate the population level trend over time, for a specific
covariate pattern, calculated at the mean of the random effects distribution, i.e. when
bi = 0. Confidence intervals follow naturally when the prediction is only based on
fixed effects. Including random effects in the prediction, by calculating subject-specific
values for the random effects as described in Section 6.7.1, means that subject-specific
longitudinal predictions can be calculated, from Equation (8.2), as
mi(t) = xi(t)^T β + zi(t)^T b̂i + ui^T δ (6.36)
One of the advantages of the shared parameter framework is the ability to make both
population level and subject-specific predictions.
6.7.2.2. Survival predictions
I can also make predictions for the survival outcome, such as the hazard, cumulative hazard, and survival functions, either using only the fixed effect estimates, or including patient-specific estimates of the random effects. Marginal survival predictions are also often calculated, by drawing from the estimated random-effects distribution and averaging across the predictions (Rizopoulos, 2010).
6.7.3. Conditional survival predictions of Rizopoulos (2011)
As interest in personalised medicine grows, so does the clinical usefulness of tailoring predictions to an individual patient. The time-dependent nature of the longitudinal process, and how changes in its evolution impact the survival outcome, provides an extremely appealing risk prediction framework (Rizopoulos, 2011; Proust-Lima and Taylor, 2009; Taylor et al., 2013).
Given an estimated joint model, for a specific patient, who could be a censored
observation, or indeed a new patient, it is of direct clinical interest to predict the time
to the event of interest. I now relate back to the description of endogenous covariates in Section 6.3: the fact that a patient has had a measurement at time t implies survival up to time t. Therefore, it is of interest to calculate the probability of survival at time u > t, conditional on survival up to time t. The following is based on Rizopoulos (2011).

πi(u|t) = P(Ti* ≥ u | Ti* > t, Yi(t), Dn; θ) (6.37)

where Dn = {Ti, di, yi; i = 1, . . . , n} represents the sample dataset that was used to fit the joint model of interest, which will be used to inform the conditional survival predictions.
P(Ti* ≥ u | Ti* > t, Yi(t); θ)
= ∫ P(Ti* ≥ u | Ti* > t, Yi(t), bi; θ) p(bi | Ti* > t, Yi(t); θ) dbi
= ∫ P(Ti* ≥ u | Ti* > t, bi; θ) p(bi | Ti* > t, Yi(t); θ) dbi
= ∫ [ S{u | Mi(u, bi, θ); θ} / S{t | Mi(t, bi, θ); θ} ] p(bi | Ti* > t, Yi(t); θ) dbi (6.38)
A first-order estimate has been proposed for πi(u|t)

π̂i(u|t) = S{u | Mi(u, b̂i, θ̂); θ̂} / S{t | Mi(t, b̂i, θ̂); θ̂} + O(ni^{−1}) (6.39)
However, calculating appropriate standard errors for this estimate is problematic, and
hence Monte Carlo sampling schemes have been proposed to account for the variability
in both the maximum likelihood estimates and the estimates of the random effects. Rizopoulos (2011) takes a Bayesian approach to this, and proposed the following Monte Carlo sampling scheme:

• For l = 1, . . . , L repetitions:
(1) Draw θ(l) ∼ N(θ̂, var(θ̂))
(2) Draw bi(l) ∼ {bi | Ti* > t, Yi(t), θ(l)}
(3) Calculate πi(l)(u|t) = S{u | Mi(u, bi(l), θ(l)); θ(l)} / S{t | Mi(t, bi(l), θ(l)); θ(l)}
The Monte Carlo samples can then be used to derive statistics such as the median and
mean, with percentile-based confidence intervals.
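A minimal sketch of the scheme above, for a single subject, is given below. It assumes an illustrative Weibull-type model in which the trajectory term acts multiplicatively on the cumulative hazard (a deliberate simplification so that the survival ratio has a closed form), normal approximations for both the parameter and random effect draws, and hypothetical values for all estimates and standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)

def cond_surv(u, t, b, theta):
    """Conditional survival ratio S(u)/S(t) under an illustrative model with
    cumulative hazard H(s) = lam * s^gam * exp(alpha * (b0 + b1*s))."""
    lam, gam, alpha = theta
    H = lambda s: lam * s ** gam * np.exp(alpha * (b[0] + b[1] * s))
    return np.exp(-(H(u) - H(t)))

theta_hat = np.array([0.1, 1.2, 0.25])      # hypothetical 'MLEs'
theta_sd = np.array([0.01, 0.05, 0.05])     # hypothetical standard errors
b_hat, b_sd = np.array([0.2, -0.1]), np.array([0.1, 0.05])

t, u = 1.0, 3.0
pi = np.empty(500)
for l in range(500):                         # l = 1, ..., L
    th_l = rng.normal(theta_hat, theta_sd)   # (1) draw theta^(l)
    b_l = rng.normal(b_hat, b_sd)            # (2) draw b^(l) (posterior approx.)
    pi[l] = cond_surv(u, t, b_l, th_l)       # (3) conditional survival ratio

print(np.median(pi), np.percentile(pi, [2.5, 97.5]))
```

The median and percentiles of the draws give a point estimate and a percentile-based interval for πi(u|t), as described above.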
6.7.4. Distribution of the random effects
Tsiatis and Davidian (2001) have proposed alternative joint model frameworks which make no distributional assumption on the random effects; however, Huang et al. (2009) and Rizopoulos et al. (2008) have shown, both theoretically and through simulation, that joint models are quite robust to deviations from the assumption of normally distributed random effects. The estimates of the random effects, described in Section 6.7.1, can be assessed for deviations from this assumption using standard techniques, such as Q-Q plots.
6.8. Discussion
The joint modelling of longitudinal and time-to-event data has received substantial
attention in the literature over the past 20 years (Wulfsohn and Tsiatis, 1997; Hen-
derson et al., 2000), with the ability to investigate the inter-relationships between the
joint processes being advocated in ever widening fields of study (Tseng et al., 2005;
Ibrahim et al., 2010). A number of excellent reviews of the field have been conducted
(Tsiatis and Davidian, 2004; Yu et al., 2004), most recently the review of Gould et al.
(2014) that I was involved in. It is now well established that joint modelling offers an
appropriate methodological framework when assessing the association between a lon-
gitudinal biomarker, measured with error, and a time to event of interest, be it from a
survival analysis context, or a longitudinal study adjusting for informative drop-out.
Joint modelling is well known to be a technically complex subject area, generally due to the calculations required in fitting the models; however, this is becoming less so with the increased availability of user-friendly software. A central theme of this thesis is to develop and disseminate user-friendly Stata software for the joint modelling of longitudinal and survival data, which will be comprehensively described in Chapter 10.
In this chapter I have described the methodological foundations of the standard joint
model framework. In the succeeding chapters I introduce some methodological exten-
sions and apply joint modelling to two datasets in the area of liver cirrhosis. Building
on the motivations described in Chapters 4 and 5, a particular limitation of joint models has been the lack of flexibility of the parametric survival submodel. Many authors proposed standard survival submodel choices, such as the Weibull distribution (Deslandes and Chevret, 2007), but failed to consider the implications of misspecifying the baseline hazard function. It is this which motivates the methodological extension
described in the next chapter. Furthermore, a comprehensive assessment of the commonly used numerical integration methods, described in Section 6.6.2, on which estimation within a frequentist framework is heavily reliant, had not been conducted. This will also be undertaken in the next chapter.
CHAPTER 7
Flexible parametric joint modelling of longitudinal and
survival data
7.1. Chapter outline
In this Chapter I propose to incorporate the Royston-Parmar (RP) model (Royston
and Parmar, 2002), introduced in Chapter 2, as the survival submodel within a joint modelling framework. Motivated by the need for a more flexible formulation of the survival submodel to capture complex hazard functions, whilst maintaining the benefits of a parametric approach, the proposed joint model is evaluated through an extensive simulation study, comparing the RP submodel, which uses restricted cubic splines, to a previously defined B-spline formulation for capturing complex hazard functions. I further evaluate and compare adaptive and non-adaptive Gauss-Hermite quadrature used to evaluate the joint likelihood, an often neglected issue.
This chapter is based on a recent publication in Statistics in Medicine, a copy of
which can be found in Appendix D (Crowther et al., 2012a).
7.2. Introduction
In Chapter 6, I introduced the main concepts of joint modelling of longitudinal and
survival data, describing the motivation for undertaking a joint model analysis and
derived the frequentist estimation framework. In this chapter, I propose a methodolog-
ical extension to the joint model framework to address the issue of capturing complex
hazard functions, raised in Chapters 4, 5 and 6.
As described in Chapters 2 and 5, the occurrence of more complex survival data
and underlying baseline hazard functions has motivated the development of more flex-
ible, parametric approaches to survival analysis. One such proposition is the model
of Royston and Parmar (2002), which is modelled on the log cumulative hazard scale
using restricted cubic splines, to provide a highly flexible, but still fully parametric,
survival modelling framework, described in Chapter 2.
Within a joint modelling framework, I propose to incorporate the RP survival
model, to provide a highly flexible framework to capture complex hazard functions.
Rizopoulos et al. (2009) developed a joint model using this form, but expanded log time using B-splines. I extend their approach by alternatively using restricted cubic
splines (Durrleman and Simon, 1989; Lambert and Royston, 2009), which impose the restriction that the fitted function is linear beyond the boundary knots, as described in Section 2.8.1 of Chapter 2. The number of parameters in a restricted
cubic spline specification is equal to the number of internal knots plus 2 (one of which
is an intercept). The number of parameters in a B-spline formulation is equal to the
order plus the number of internal knots.
There are 2 primary motivations for my approach. First, as described in Chapters
4 and 5, standard parametric survival models can often lack the flexibility to capture complex underlying hazard functions; for example, the Weibull assumes a monotonic shape, which will be insufficiently flexible to fully capture a hazard function with a
turning point. Patient-specific conditional survival predictions (Rizopoulos, 2011), a key output of the joint model framework, rely on sufficiently capturing, and consequently extrapolating, the baseline hazard function. Secondly, joint models can be considered computationally intensive; therefore, by modelling on the log cumulative hazard scale I avoid numerically integrating the hazard function, which is required to evaluate the joint likelihood when using a hazard scale parametric survival submodel, as described in Chapter 6.
I conduct an extensive simulation study not only to evaluate the proposed joint model, comparing it to the appropriate B-spline based model, but also to evaluate non-adaptive quadrature with varying numbers of nodes, and fully adaptive quadrature, an issue often neglected in the literature. The simulation study makes use of the general survival simulation algorithm described in Chapter 4.
The methods are illustrated using a dataset of 488 patients with liver cirrhosis, in-
troduced in Chapter 3 (Anderson et al., 1993). A total of 251 patients were randomised
to receive prednisone, with 237 assigned to a placebo. Prothrombin index was measured repeatedly, at baseline and at scheduled follow-up times of 3, 6 and 12 months and then annually; however, observed times of measurement varied substantially. A total of
2968 measurements were recorded. I investigate the effect of treatment after adjusting
for the relationship between prothrombin index and time to death.
7.3. Defining the joint model
In this section I define the joint model incorporating the Royston-Parmar survival
model as the survival submodel of choice. I use notation defined in Chapter 6. The
longitudinal submodel follows that defined in Section 6.4.1 of the previous chapter.
7.3.1. Survival submodel
I define the Royston-Parmar survival submodel, on the log cumulative hazard scale,
with Mi(t) = {mi(s), 0 ≤ s ≤ t} defined to be the true unobserved longitudinal profile up to time t

logHi(t|Mi(t),vi) = logH0(t) + ψ^T vi + αmi(t) (7.1)
where H0(t) is the cumulative baseline hazard function, α denotes the association pa-
rameter and ψ is a set of log (cumulative) hazard ratios associated with a set of baseline
covariates, vi, again a subset of Ui. For illustration, I present the model formulation
with the current value association structure, described in Section 6.4.2 of Chapter
6; however, note that the other association structures remain valid in this modelling
framework. Following Section 2.8.2, the log baseline cumulative hazard is written as a
restricted cubic spline function
logHi(t|Mi(t),vi) = ηi(t) = s{log(t)|γ, k0} + ψ^T vi + αmi(t) (7.2)
with corresponding hazard and survival functions

h(t|Mi(t),vi) = dH(t|Mi(t),vi)/dt
= dηi(t)/dt × exp{ηi(t)}
= d[ s{log(t)|γ, k0} + ψ^T vi + αmi(t) ]/dt × exp{ηi(t)}
= [ (1/t) ds{log(t)|γ, k0}/d log(t) + α dmi(t)/dt ] exp{ηi(t)} (7.3)

and

S(t|Mi(t),vi) = exp[ −exp{ηi(t)} ] (7.4)
Note that, as opposed to the parametric survival submodels described in the previous
chapter, the survival function defined in Equation (7.4) is analytically tractable, i.e.
can be written in closed form, providing substantial computational benefits.
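The computations in Equations (7.2)-(7.4) can be sketched as follows, using the Durrleman-Simon restricted cubic spline basis and its analytic derivative; the knot locations and spline coefficients γ are illustrative values, not fitted estimates, and the association term is omitted (α = 0) for simplicity.

```python
import numpy as np

knots = np.log([0.5, 2.0, 5.0])      # boundary knots and one internal knot
gamma = np.array([-1.5, 1.2, 0.15])  # intercept, linear and spline coefficients

def rcs(x, knots):
    """Restricted cubic spline basis columns for x (linear beyond boundaries)."""
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: np.maximum(u, 0.0)
    cols = [x]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(pos(x - kj) ** 3 - lam * pos(x - kmin) ** 3
                    - (1 - lam) * pos(x - kmax) ** 3)
    return np.stack(cols)

def rcs_deriv(x, knots):
    """Analytic derivative of each basis column with respect to x."""
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: np.maximum(u, 0.0)
    cols = [np.ones_like(x)]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(3 * (pos(x - kj) ** 2 - lam * pos(x - kmin) ** 2
                         - (1 - lam) * pos(x - kmax) ** 2))
    return np.stack(cols)

t = np.array([1.0, 3.0])
eta = gamma[0] + gamma[1:] @ rcs(np.log(t), knots)   # eta_i(t), Equation (7.2)
H = np.exp(eta)                                      # cumulative hazard
S = np.exp(-H)                                       # survival, Equation (7.4)
ds_dlogt = gamma[1:] @ rcs_deriv(np.log(t), knots)
h = ds_dlogt / t * np.exp(eta)                       # hazard, Equation (7.3)
print(S, h)
```

The analytic hazard can be checked against a finite-difference derivative of the cumulative hazard.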
7.3.1.1. A note on time-dependent association structures
Due to modelling on the log cumulative hazard scale, interpretation of the association parameter must be undertaken with caution. As can be seen in the definition of the hazard function in Equation (7.3), the association parameter α and longitudinal component mi(t) appear not only in the definition of ηi(t) but also within the derivative of the cumulative hazard function. This means that α, when interpreted as the log cumulative hazard ratio for a one-unit increase in the longitudinal outcome at time t, is not equivalent to the log hazard ratio for a one-unit increase in the longitudinal outcome at time t; i.e. the equivalence between cumulative hazard ratios and hazard ratios breaks down.
7.3.2. Likelihood
The joint likelihood described in Chapter 6 remains the same for the longitudinal
submodel components, and the distribution of the shared random effects. However, the
likelihood component for the survival submodel is now
p(Ti, di|bi,θ) = [ { (1/Ti) ds{log(Ti)|γ, k0}/d log(Ti) + α dmi(Ti)/dTi } exp{ηi(Ti)} ]^di × exp[ −exp{ηi(Ti)} ] (7.5)
As mentioned above, the important aspect to note from Equation (7.5) is that there
is no longer the integral which is present under a parametric survival model on the
log hazard scale, as described in Chapter 6. This directly avoids the need for nested
numerical integration to evaluate the cumulative hazard function, and subsequently
the survival function required for the likelihood, seen in Equation (6.19). This provides
direct computational benefits due to modelling on the log cumulative hazard scale.
7.4. Simulation study
I now evaluate the performance of the proposed joint model, comparing the use
of restricted cubic splines to the B-spline formulation of Rizopoulos et al. (2009), and
further evaluate and compare the numerical integration techniques of adaptive and
non-adaptive Gauss-Hermite quadrature, used to calculate the joint likelihood. Under
each scenario I apply both the proposed joint model with 5 degrees of freedom, plus an
intercept, resulting in 6 parameters to capture the baseline cumulative hazard function,
and a B-spline function of degree 3 (cubic splines for consistency) and 2 internal knots,
again resulting in 6 parameters to capture the baseline hazard, providing a fair compar-
ison. I also apply the proposed joint model with 1 degree of freedom (equivalent to a
Weibull based joint model). The proposed model is implemented in the stjm command
in Stata (Crowther, 2012). I use the JM package (Rizopoulos, 2010) version 0.4.0 in R
to fit the model of Rizopoulos et al. (2009). Both Stata and R packages use a tolerance
of 1E-04 for the parameter estimates, and Stata uses 1E-07 for the log-likelihood, with
R using 1.5E-8.
Furthermore, what has often been neglected in the literature is an assessment of the number of quadrature nodes required to obtain consistent estimates of effect parameters and sufficient coverage probabilities. For each scenario I use non-adaptive Gauss-Hermite quadrature to evaluate the joint likelihood of both the proposed model and
the B-spline model, with 5 and 15 nodes for each random effect dimension to assess performance. I also implement 5-point fully adaptive quadrature for the proposed model for comparison; however, this was not available for the B-spline model, which is a limitation of the R package.
For each scenario, 300 patients were included in each of 500 repetitions. The true longitudinal profile was generated from mi(t) = β0i + β1i t + δui, with β0i ∼ N(0, 1), β1i ∼ N(0, 0.25²) and correlation between (β0i, β1i) of 0.25. The observed longitudinal measurements were then generated from yi(t) ∼ N(mi(t), 0.5²). Times of measurement were fixed at (0, 1, 2, 3, 4). Survival times were generated from logH(t|mi(t), ui) = logH0(t) + αmi(t) + ψui, where H0(t) is detailed below. Administrative censoring was applied at 5 years. A binary treatment group variable was generated from ui ∼ Bin(1, 0.5). The direct treatment effect on the longitudinal response, δ, is fixed at −0.25, the direct treatment effect on the time-to-event, ψ, is fixed at 0.25, and the association parameter, α, is varied between −0.25 and 0.25.
7.4.1. Generating survival times
Here I put into practice the general survival simulation algorithm described in Chapter
4, in order to simulate survival times from complex baseline hazard functions incorpo-
rating turning points. This will allow a full assessment of the use of splines to capture
complex hazard functions. For example, the method of Rizopoulos et al. (2009) was
evaluated in a simulation study with survival times generated from an exponential dis-
tribution (however, the primary motivation of the simulation study was to evaluate the
Laplacian estimation method, not the survival submodel). I generate survival times
from a two-component mixture Weibull distribution (McLachlan and McGiffin, 1994),
with:
S0(t) = p exp{−λ1 t^γ1} + (1 − p) exp{−λ2 t^γ2} (7.6)

and

H0(t) = −log[ p exp{−λ1 t^γ1} + (1 − p) exp{−λ2 t^γ2} ] (7.7)

I now add the linear predictor for the association and time-independent covariates, on the log cumulative hazard scale:

logH(t) = log[ −log( p exp{−λ1 t^γ1} + (1 − p) exp{−λ2 t^γ2} ) ] + αmi(t) + ψui (7.8)
Therefore, given that I can write down the survival function as S(t) = exp [−H(t)], with
H(t) defined in Equation (7.8), but cannot invert it to solve for t, I can directly apply
simulation scenario 2, from Chapter 4 Section 4.5.2, utilising root finding techniques,
to simulate survival times.
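The following sketch applies this approach: for each subject, a uniform draw U is generated and S(t) = U is solved for t by bisection, with logH(t) built from the two-component mixture Weibull baseline of Equation (7.8). The baseline parameter values follow the mixture scenario described below, while the subject-level random effect values and the bisection settings are illustrative; note that with a sufficiently negative random slope the cumulative hazard can plateau, in which case the time is set to the upper bound (in the thesis simulations, administrative censoring is applied in any case).

```python
import numpy as np

rng = np.random.default_rng(2)
lam1, gam1, lam2, gam2, p = 0.1, 1.5, 0.1, 0.5, 0.9   # mixture Weibull baseline
alpha, psi = 0.25, 0.25

def logH(t, b0, b1, u):
    """log cumulative hazard, Equation (7.8), with m_i(t) = b0 + b1*t."""
    S0 = p * np.exp(-lam1 * t ** gam1) + (1 - p) * np.exp(-lam2 * t ** gam2)
    return np.log(-np.log(S0)) + alpha * (b0 + b1 * t) + psi * u

def sim_time(b0, b1, u, upper=1000.0, tol=1e-10):
    """Solve exp{-H(t)} = U for t by bisection on log S(t) - log U."""
    target = np.log(rng.uniform())                  # log U
    f = lambda t: -np.exp(logH(t, b0, b1, u)) - target
    lo, hi = 1e-12, upper
    if f(hi) > 0:                                   # S(upper) still above U:
        return upper                                # no root; return the bound
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

times = np.array([sim_time(rng.normal(0, 1), rng.normal(0, 0.25),
                           rng.integers(0, 2)) for _ in range(5)])
print(times)
```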
Three scenarios of baseline parameters were chosen: a standard Weibull with in-
creasing hazard function, λ1 = 0.1, γ1 = 1.5, and p = 1; a mixture Weibull with a
single turning point in the baseline hazard function, λ1 = 0.1, γ1 = 1.5, λ2 = 0.1,
γ2 = 0.5, and p = 0.9; and finally a Weibull distribution with λ1 = 1 × 10⁻⁵, γ1 = 6.1,
and p = 1. The final scenario assesses the validity of the approach when the hazard
is essentially zero for a portion of the follow-up time.
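The shapes of these scenarios can be checked numerically. This sketch recovers the baseline hazard as a numerical derivative of H0(t) from Equation (7.7), confirming the monotone hazard of scenario 1 and the turning point of scenario 2:

```python
import numpy as np

def baseline_hazard(t, p, l1, g1, l2, g2, eps=1e-6):
    # h0(t) as the numerical derivative of H0(t) from Equation (7.7)
    H0 = lambda s: -np.log(p * np.exp(-l1 * s**g1)
                           + (1 - p) * np.exp(-l2 * s**g2))
    return (H0(t + eps) - H0(t - eps)) / (2 * eps)

t = np.linspace(0.005, 5, 400)
h_scen1 = baseline_hazard(t, 1.0, 0.1, 1.5, 0.1, 0.5)  # monotone increasing
h_scen2 = baseline_hazard(t, 0.9, 0.1, 1.5, 0.1, 0.5)  # single turning point
```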
7.4.2. Results
Tables 7.1, 7.2 and 7.3 present bias and coverage estimates from all simulations gen-
erated under the three baseline hazard functions. Under the three scenarios, survival
submodel parameter estimates from the proposed model, i.e. the direct treatment effect
on survival (ψ) and the association parameter (α), appear to be unbiased. However,
under the B-spline approach, across all scenarios I observe consistent under-estimation
of the association parameter, α. This bias is eliminated in the restricted cubic spline
approach. For example, under scenario 2 with the true α = 0.25, the percentage bias
under the restricted cubic spline approach is -0.8% compared to -10% under the B-spline
approach. Coverage probabilities very closely approximate the desired 95% in all sce-
narios when using restricted cubic splines, even with a small number of non-adaptive
quadrature nodes. For the longitudinal submodel parameters, I observe generally unbiased
estimates; however, for the variance parameters this holds only when the number of
non-adaptive quadrature nodes is ≥ 15, or when fully adaptive quadrature with 5
nodes is used. Under non-adaptive quadrature, coverage estimates are generally below the de-
sired 95%, indicating marked under-estimation of the standard errors, compared with
optimum coverage probabilities across scenarios when fully adaptive quadrature with
5 nodes is used. Further simulations, not shown here, illustrated that 35 non-adaptive
quadrature nodes were required to provide optimum coverage probabilities. Standard
errors of variance parameters are not available in R, so coverage could not be assessed
for all parameters in the B-spline models.
The proposed model also produces moderate bias in the variance estimate of the
slope parameter when 5 point non-adaptive quadrature is used; however, this bias is
eliminated under both 15 point non-adaptive and 5 point adaptive quadrature. Com-
paring across degrees of freedom, I observe almost identical estimates of bias and cov-
erage probabilities between models.
Table 7.2 presents bias and coverage estimates for simulations generated from a
two-component mixture Weibull baseline hazard, described in Section 7.4.1. Results
appear entirely consistent with those found when generating under a standard Weibull
distribution. The underestimation of the standard errors of the longitudinal parameters
remains a problem when an insufficient number of quadrature nodes are used. Despite
generating data from a complex baseline hazard, the joint models fitted with only one
degree of freedom appear to estimate all parameters just as effectively as with 5 degrees
of freedom, specifically the three treatment effects. This is perhaps to be expected, as
the hazard ratio is often insensitive to the specification of the baseline hazard
function (Royston and Lambert, 2011).
I discuss the implications of the choice of the number of quadrature nodes, and the
insensitivity to the baseline hazard further in Section 7.6.
7.5. Analysis of liver cirrhosis dataset
In this section I apply the proposed joint model to the dataset described in Section
3.5 of Chapter 3, where primary interest is the effect of treatment after adjusting for
the repeatedly measured prothrombin index on the time to all-cause death. A total of
488 patients had their prothrombin index measured at baseline, with further scheduled
measurements at 3, 6, and 12 months, and annually thereafter. The median number of
measurements was 6 (range: 1 to 17). A total of 292 (59.8%) patients died during the
study. Patients were randomised to two treatment groups, namely prednisone or placebo.
For further details regarding the dataset, I refer the reader elsewhere (Anderson et al.,
1993). I begin with some exploration of the observed data, by plotting the observed
repeated measures of prothrombin index over time, splitting between patients who died
and who were censored, shown in Figure 7.1.
[Figure: two panels ('Censored', 'Event'); longitudinal response (0–200) against measurement time (years).]
Figure 7.1. Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid.
Figure 7.1 illustrates the observed biomarker measurements for all patients, with a
lowess smoother overlaid to give an indication of the average profile over time, indicat-
ing some evidence that patients who died had generally lower values of prothrombin
index. This is further investigated in Figure 7.2, which provides an exploration of the
relationship between prothrombin trajectory and the time (in years) to death by plot-
ting the observed longitudinal responses against observation time, where the timescale
is adjusted by taking away the observed censoring/event time. A lowess smoother is
overlaid.
[Figure: two panels ('Censored', 'Event'); longitudinal response (0–200) against time before censoring/event (−15 to 0 years).]
Figure 7.2. Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's survival time. Lowess smoother overlaid.
From Figure 7.2, it is apparent that patients who experienced the event, compared
to patients who were censored, had decreasing levels of the biomarker during the 2-3
year period before death. If I assume the association between the longitudinal and
survival models is based on the current value parameterisation discussed in Section
6.4.2, I would expect a negative association, indicating that a lower value of prothrombin
index is associated with an increased risk of death. This form of plot can be a useful exploratory tool
in the analysis of joint longitudinal and survival data, specifically indicating a current
value association structure between the two processes.
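The timescale adjustment behind Figure 7.2 amounts to a one-line transformation of long-format data; the toy data below are hypothetical:

```python
import pandas as pd

# Hypothetical long-format data: one row per prothrombin measurement
df = pd.DataFrame({
    "id":          [1, 1, 1, 2, 2],
    "meas_time":   [0.0, 1.0, 2.5, 0.0, 3.0],  # years since randomisation
    "surv_time":   [4.0, 4.0, 4.0, 6.0, 6.0],  # censoring/event time per patient
    "died":        [1, 1, 1, 0, 0],
    "prothrombin": [80.0, 70.0, 55.0, 90.0, 88.0],
})

# Adjusted timescale of Figure 7.2: measurement time minus survival time,
# so the censoring/event time sits at zero for every patient
df["time_before_event"] = df["meas_time"] - df["surv_time"]
```

Plotting `prothrombin` against `time_before_event`, split by `died`, reproduces the layout of Figure 7.2.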
I now apply the joint model described in Section 7.3 to the liver cirrhosis dataset. In
the longitudinal submodel I assume a random intercept with random effect of log(time),
and also adjust for the interaction between treatment and time. In preliminary analysis
log(time) showed an improved fit compared to a linear effect of time. In the survival
submodel I adjust for the direct effect of treatment. I model the association between
prothrombin index and time to death through the current value parameterisation. I
use 5 degrees of freedom to model the baseline cumulative hazard, equivalent to 4
internal knots. Boundary knots are placed at the 0th and 100th percentiles of the
uncensored log survival times. For comparison I also apply the model of Rizopoulos
et al. (2009). Under the B-spline model I use cubic splines with 2 internal knots to
provide a comparison of model fit with the same number of parameters used to model
the baseline cumulative hazard function. As adaptive quadrature is not available for
the B-spline model, I apply both models using 35 point non-adaptive quadrature.
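The restricted cubic spline basis underlying the survival submodel can be sketched as follows. This is the standard restricted cubic spline construction, with illustrative boundary knots and the internal knots quoted in Section 7.5.2; the exact Royston-Parmar implementation details (e.g. knot scaling) may differ:

```python
import numpy as np

def rcs_basis(x, knots):
    # Restricted cubic spline basis in x = log(t); knots = boundary +
    # internal knots, ascending. With k knots this returns k - 1 columns,
    # i.e. the 'degrees of freedom'; the basis is linear beyond the
    # boundary knots by construction.
    kmin, kmax = knots[0], knots[-1]
    cols = [x]                                  # linear term
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(np.maximum(x - kj, 0) ** 3
                    - lam * np.maximum(x - kmin, 0) ** 3
                    - (1 - lam) * np.maximum(x - kmax, 0) ** 3)
    return np.column_stack(cols)

# Illustrative boundary knots plus the 4 internal knots of Section 7.5.2
# (0.424, 1.186, 2.894, 5.418 years), all on the log-time scale
knots = np.log([0.08, 0.424, 1.186, 2.894, 5.418, 13.9])
B = rcs_basis(np.log(np.linspace(0.1, 13, 100)), knots)  # 5 df
```

The restriction to linearity beyond the boundary knots is what later gives the 'sensible' basis for extrapolation discussed in Section 7.6.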
In Table 7.4, comparing the proposed approach with the B-spline model, I
generally observe similar parameter estimates. In particular, both models show a neg-
ative association between prothrombin index and time to death; for example, under
the RP approach I observe an association of -0.038 (95% CI: -0.045, -0.031), indicat-
ing that a lower value of prothrombin index increases the risk of death. I observe a non-
statistically significant direct effect of treatment on survival, with a log hazard ratio of
0.210 (95% CI: -0.038, 0.457).
I now return to the primary motivation of the approach, which is to effectively
capture complex hazard functions. I compare the fitted marginal survival functions
across models with the Kaplan-Meier estimates for the liver cirrhosis dataset, shown
in Figure 7.3. It is evident from Figure 7.3 that the restricted cubic spline approach
provides an improved fit compared to the B-spline approach, using the same number of
parameters to model the baseline cumulative hazard function. Indeed, in Figure 7.4 I
show the marginal survival function with an increased number of internal knots under
the B-spline approach, highlighting that I need to use 5 internal knots to achieve a
function which fits as closely as the restricted cubic splines approach. In other words,
I need to use 9 parameters under the B-spline approach compared to only 6 under the
restricted cubic spline approach to achieve a well-fitting function.
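A marginal survival curve of the kind compared with the Kaplan-Meier estimate can be obtained by averaging subject-specific curves over the random-effect distribution. The survival function below is a hypothetical stand-in for the fitted model:

```python
import numpy as np

rng = np.random.default_rng(7)

def subj_surv(t, b):
    # Hypothetical subject-specific survival given a random intercept b;
    # a Weibull stand-in for the fitted joint model's survival submodel
    return np.exp(-0.08 * t**1.3 * np.exp(0.03 * b))

# Marginal survival: average the subject-specific curves over the random-
# effect distribution (Monte Carlo over b), for comparison with the
# Kaplan-Meier curve as in Figures 7.3 and 7.4
b = rng.normal(0.0, 20.0, size=5000)
tgrid = np.linspace(0.0, 14.0, 60)
S_marginal = np.array([subj_surv(t, b).mean() for t in tgrid])
```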
[Figure: two panels ('Restricted cubic splines', 'B-splines'); survival against follow-up time (0–14 years); marginal survival overlaid on the Kaplan-Meier curve and its 95% confidence interval.]
Figure 7.3. Liver cirrhosis trial of prednisone treatment. Marginal survival using 6 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval.
7.5.1. Predictions
To illustrate the prognostic benefits of the joint modelling framework, conditional
survival predictions can be tailored at the individual level using the empirical Bayes
predictions from the random effects, and sampling schemes have been proposed to
calculate appropriate standard errors for these predictions (Rizopoulos, 2011). I adapt
the approach of Rizopoulos (2011) to calculate conditional survival predictions of 2
patients with similar baseline values of prothrombin index, using the fitted restricted
cubic spline based joint model, shown in Figure 7.5. Given the negative association
between prothrombin index and the risk of death, it can be seen from Figure
7.5 that patient 98 has a sharply increasing pattern of prothrombin index across follow-
up time, resulting in higher survival probabilities, conditional on survival at time of final
measurement, when compared to Patient 253. Patient 253 maintains lower values of
prothrombin index, resulting in lower survival predictions. The confidence intervals are
derived using the Monte Carlo sampling scheme described in Section 6.7.3 of Chapter 6,
where I have both the estimates of the fixed effects, and the subject-specific estimates
[Figure: three panels ('3 internal knots', '4 internal knots', '5 internal knots'); survival against follow-up time (0–14 years); marginal survival using B-splines overlaid on the Kaplan-Meier curve and its 95% confidence interval.]
Figure 7.4. Liver cirrhosis trial of prednisone treatment. Marginal survival using cubic B-splines with 3, 4 or 5 internal knots, resulting in 7, 8 or 9 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval.
of the random effects, feeding into the uncertainty. The reliance of these predictions
on accurately specifying the baseline hazard is discussed in Section 7.6.
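The conditional survival calculation, and the Monte Carlo scheme behind its pointwise intervals, can be sketched as follows. The survival function and the posterior for the linear predictor are hypothetical stand-ins for the fitted quantities:

```python
import numpy as np

rng = np.random.default_rng(11)

def surv(t, eta):
    # Hypothetical subject-specific survival function; a Weibull stand-in
    # for the fitted spline model, with subject linear predictor eta
    return np.exp(-0.1 * t**1.2 * np.exp(eta))

def conditional_surv(t, s, eta):
    # S(t | T > s, eta) = S(t | eta) / S(s | eta), for t >= s:
    # survival to t conditional on being alive at the last measurement s
    return surv(t, eta) / surv(s, eta)

# Monte Carlo draws of eta from an assumed approximate normal posterior,
# mimicking the sampling scheme used for the pointwise intervals
eta_draws = rng.normal(loc=0.2, scale=0.1, size=2000)
draws = conditional_surv(8.0, 5.0, eta_draws)
point = np.median(draws)
lo, hi = np.percentile(draws, [2.5, 97.5])
```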
7.5.2. Sensitivity to location and number of knots
In my experience, the default knot locations, based on the distribution
of uncensored event times, provide the most sensible approach to modelling using
spline formulations, as was found in Rizopoulos et al. (2009) and Rutherford et al.
(2014). This allows the data to be modelled more accurately in the areas of greatest
density. Previous work within the flexible parametric survival modelling framework
has shown insensitivity to knot placement (Lambert et al., 2010a). Using 5 degrees
of freedom (4 internal knots), the default knot locations are 0.424, 1.186, 2.894, 5.418.
[Figure: two panels ('Patient 98', 'Patient 253'); prothrombin index (0–200) and survival probability (0–1) against follow-up time (0–14 years); longitudinal response, predicted conditional survival, and 95% pointwise confidence interval shown.]
Figure 7.5. Liver cirrhosis trial of prednisone treatment. Predicted conditional survival probabilities for patients 98 and 253.
I choose 3 other sets of internal knot locations (on the original time scale) and compare
parameter estimates and predicted marginal survival curves. I have knot locations A
of 0.3, 1, 3, 5, locations B of 1, 3, 5, 8 and locations C of 0.2, 1, 2, 9. Table 7.5
contains the parameter estimates across models with differing knot choices, illustrat-
ing once again the robustness of parameter estimates when compared to the original
results in Table 7.4, with only minor differences observed in the third decimal place. Sim-
ilarly, the left plot in Figure 7.6 shows very stable predicted marginal survival curves
across knot choices. Furthermore, the right plot in Figure 7.6 illustrates the fitted
marginal survival function when using 2, 3 and 5 internal knots (with locations based
on equally spaced quantiles of the distribution of uncensored survival times), illustrat-
ing the stability of the proposed model. In comparison to Figure 7.4, I observe much
more variability in the marginal survival predictions when using B-splines with varying
number of knots.
7.6. Discussion
In this chapter I have described a highly flexible joint model for a single longitu-
dinal continuous biomarker and the time to an event of interest. The restricted cubic
[Figure: marginal survival sensitivity analysis; left panel 'Varying knot locations, 4 internal knots', right panel 'Locations based on quantiles, 2, 3 or 5 internal knots'; survival against follow-up time (0–14 years), overlaid on the Kaplan-Meier curve and its 95% confidence interval.]
Figure 7.6. Liver cirrhosis trial of prednisone treatment. Fitted marginal survival function from joint models with varying knot locations and number of internal knots. Left-hand plot uses 6 parameters to model the baseline log cumulative hazard function; right-hand plot uses 4, 5 and 7 parameters.
spline basis for the log cumulative baseline hazard function provides a flexible frame-
work where often the time-to-event is of primary interest. Flexibility in the longitudinal
submodel can be incorporated through the use of fixed and/or random fractional poly-
nomials of time, which can capture a variety of shapes (Royston and Altman, 1994).
The simulation study conducted to assess the proposed joint model raised three
important issues. Firstly, I observed consistent under-estimation of the association
parameter, α, under the B-spline approach. This bias was eliminated when using re-
stricted cubic splines, both with 1 and 5 degrees of freedom. Secondly, the choice of the
number of quadrature nodes can have a marked impact on both parameter estimates,
and the associated standard errors. Whether increasing the number of non-adaptive
quadrature nodes would eliminate the bias under the B-spline approach should be investigated further.
If interest is purely on the time-to-event then a lower number of quadrature nodes can
be used and will produce unbiased estimates with optimum coverage levels; however, if
the longitudinal submodel is of interest then the choice of quadrature nodes and method
is crucial. For example, in studies where quality of life is the longitudinal marker of
interest (Billingham and Abrams, 2002), the longitudinal response profile can be of
direct interest in order to be included into an economic decision model, where reliable
estimates of associated standard errors can be pivotal in assessing cost-effectiveness and
thus health policy decisions (Briggs, 2000). The simulation study highlighted the supe-
riority of fully adaptive Gauss-Hermite quadrature in the joint model setting. The use
of adaptive quadrature means I can use a much reduced number of quadrature nodes,
resulting in substantial computational benefits. Finally, the simulation study showed in
general how the estimates of covariate effects were insensitive to the specification of the
baseline hazard. This of course can be beneficial; however, one of the key benefits of the
joint model framework is the predictions that can be obtained. These predictions
will rely heavily on the accuracy of the model in estimating the baseline hazard func-
tion. I illustrate this in Figure 7.7, whereby data is simulated under a two-component
mixture Weibull baseline hazard function with a turning point. I apply joint models to
the single simulated dataset, firstly with 1 degree of freedom (equivalent to a Weibull
model), and then 5 degrees of freedom. I then predict the marginal survival function
and compare to the Kaplan-Meier survival curve. It is evident from Figure 7.7 that
only with a sufficient number of degrees of freedom can the baseline survival function
be adequately captured.
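The adaptive Gauss-Hermite quadrature discussed above can be sketched in one function. Centring and scaling the nodes (at the empirical Bayes mode and curvature in a joint model; here, for illustration, at a known mean and standard deviation) is what makes a small number of nodes sufficient:

```python
import numpy as np

def gh_expect(f, mu, sigma, n):
    # E[f(b)] for b ~ N(mu, sigma^2) via n-node Gauss-Hermite quadrature.
    # Centring and scaling the nodes at (mu, sigma) is the 'adaptive' step;
    # non-adaptive quadrature keeps the nodes fixed at the prior (0, 1).
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(mu + np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)

# A smooth likelihood-like factor; the exact value of E[exp(b/2)] for
# b ~ N(2, 0.5^2) is exp(2/2 + 0.5^2/8) = exp(1.03125)
f = lambda b: np.exp(0.5 * b)
approx = gh_expect(f, 2.0, 0.5, 5)   # 5 centred nodes already suffice
```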
In application to the liver cirrhosis dataset, it was found that the restricted cu-
bic spline approach provided improved flexibility in capturing complex baseline hazard
functions when compared to a B-spline formulation with the same number of param-
eters, implying that I can obtain greater flexibility with fewer parameters. Of course,
B-spline functions of other degrees may in fact provide well-fitting models; however,
the results presented in this chapter have shown that they can produce unstable fitted
functions.
In this chapter I also described patient specific conditional survival predictions (Ri-
zopoulos, 2011); a particularly clinically relevant tool of the joint model framework.
[Figure: two panels ('Marginal survival function - 1 degree of freedom', 'Marginal survival function - 5 degrees of freedom'); survival probability against follow-up time (0–5), overlaid on the Kaplan-Meier estimate and its 95% confidence interval.]
Figure 7.7. Simulated data. Fitted marginal survival function from joint models with either 1 or 5 degrees of freedom, overlaid on the Kaplan-Meier survival curve.
It is important to note that such predictions require extrapolation of both the base-
line hazard function, and the longitudinal profile (under a time-dependent association
structure), often beyond the range of the data. Concentrating on extrapolating the
baseline hazard function, the RP framework described here is particularly appealing,
as I have shown that the baseline can be captured well, providing a sufficient number of
degrees of freedom are chosen, but also the restricted nature of the splines beyond the
boundary knots immediately provides a somewhat ‘sensible’ basis for extrapolation.
There are a multitude of extensions to this joint model framework. For example,
adaptation to incorporate a cure fraction can be achieved simply due to the restricted
linear basis for the final spline function. Imposing the constraint that the final spline
function beyond the last knot is constant has been implemented to allow for a cure
fraction in population-based cancer studies within the flexible parametric framework
(Andersson et al., 2011). Furthermore, extension to the competing risks setting by
modelling cause-specific hazards can be accommodated, introducing cause-specific as-
sociation parameters. The generalised linear mixed effects framework for the longitu-
dinal measures submodel can be adapted to handle categorical responses (Rizopoulos
and Ghosh, 2011). Finally, a Bayesian approach to the proposed model could be inves-
tigated and contrasted (Guo and Carlin, 2004). Many of these extensions are discussed
further in Chapter 11.
In application to the liver cirrhosis dataset, a single term of observation time pro-
vided sufficient flexibility to capture the shape of subject specific longitudinal trajecto-
ries; however, further flexibility could be investigated through the use of splines (Brown
et al., 2005; Rizopoulos and Ghosh, 2011), which will be investigated in Chapter 9.
During the review process for the accompanying paper to this chapter, found in
Appendix D, concerns were raised about ensuring the monotonicity of the cumulative
hazard function. In my experience, including all scenarios of the simulation study,
this is not a practical issue. If at any point in the estimation process the hazard
function goes negative, then the algorithm will fail. This was not observed in any
simulations, ensuring that valid cumulative hazard and subsequently survival functions
were estimated.
Implementation of the model is facilitated through user friendly Stata software
(Crowther, 2012), developed as part of this thesis, and described in more detail in
Chapter 10.
Table 7.1. Simulation results from Weibull scenario 1. Association is varied with α = -0.25, 0.25.

Parameter (true value)  Model        NAQ 5 nodes      NAQ 15 nodes     AQ 5 nodes
                                     Bias    95% CP   Bias    95% CP   Bias    95% CP

α = 0.25:
β0 (0)       FPM (df=1)  -0.001  66.5   -0.003  84.4   -0.003  95.6
             FPM (df=5)  -0.005  67.7   -0.002  84.6   -0.003  95.6
             B-spline     0.013  52.4    0.019  81.5    -       -
σ0 (1)       FPM (df=1)  -0.046  59.8   -0.002  85.6   -0.004  94.8
             FPM (df=5)  -0.046  60.5   -0.004  85.4   -0.004  94.6
             B-spline     0.012  -      -0.002  -       -       -
β1 (0)       FPM (df=1)  -0.014  70.3   -0.002  87.2   -0.001  94.8
             FPM (df=5)  -0.015  69.7   -0.002  86.4   -0.001  94.2
             B-spline    -0.027  56.6   -0.024  73.0    -       -
σ1 (0.25)    FPM (df=1)  -0.248  14.6   -0.018  93.0   -0.009  94.2
             FPM (df=5)  -0.247  14.2   -0.021  92.3   -0.009  94.2
             B-spline     0.022  -      -0.028  -       -       -
σ01 (0.25)   FPM (df=1)   0.001  70.1   -0.020  72.8    0.001  94.8
             FPM (df=5)   0.001  70.5   -0.022  71.9    0.001  94.8
             B-spline    -0.040  -      -0.018  -       -       -
δ (-0.25)    FPM (df=1)  -0.017  60.3   -0.004  81.2    0.001  93.8
             FPM (df=5)  -0.012  60.9   -0.009  80.6    0.001  93.8
             B-spline     0.003  49.0   -0.007  80.9    -       -
σe (0.5)     FPM (df=1)   0.068  30.5    0.000  95.0   -0.001  95.0
             FPM (df=5)   0.068  30.3    0.000  95.0   -0.001  95.0
             B-spline     0.096  -       0.008  -       -       -
ψ (0.25)     FPM (df=1)   0.005  94.0    0.007  94.4    0.007  94.0
             FPM (df=5)   0.002  94.0    0.002  93.8    0.006  93.6
             B-spline    -0.004  95.2   -0.002  94.6    -       -
α (0.25)     FPM (df=1)   0.004  95.2    0.005  95.4    0.005  96.0
             FPM (df=5)  -0.002  93.2   -0.003  91.8    0.005  95.2
             B-spline    -0.019  94.8   -0.012  95.6    -       -

α = -0.25:
β0 (0)       FPM (df=1)   0.001  63.2    0.006  82.6    0.002  92.6
             FPM (df=5)   0.000  64.0    0.004  82.4    0.002  92.6
             B-spline    -0.019  56.6   -0.015  80.9    -       -
σ0 (1)       FPM (df=1)  -0.048  62.4   -0.004  85.6   -0.008  95.0
             FPM (df=5)  -0.048  62.2   -0.004  84.4   -0.008  95.0
             B-spline     0.009  -      -0.005  -       -       -
β1 (0)       FPM (df=1)   0.010  71.2    0.000  88.2   -0.001  95.2
             FPM (df=5)   0.010  72.0    0.001  88.2   -0.001  94.6
             B-spline     0.025  55.8    0.022  72.9    -       -
σ1 (0.25)    FPM (df=1)  -0.246  17.6   -0.016  92.8   -0.010  95.2
             FPM (df=5)  -0.245  17.4   -0.012  92.6   -0.009  95.2
             B-spline     0.019  -      -0.029  -       -       -
σ01 (0.25)   FPM (df=1)   0.001  71.6    0.000  76.6    0.016  96.4
             FPM (df=5)   0.002  71.4   -0.002  76.6    0.015  96.2
             B-spline    -0.031  -      -0.004  -       -       -
δ (-0.25)    FPM (df=1)  -0.007  61.2   -0.008  80.4   -0.005  94.8
             FPM (df=5)  -0.008  61.8   -0.010  80.8   -0.005  94.8
             B-spline    -0.012  49.7   -0.012  79.7    -       -
σe (0.5)     FPM (df=1)   0.069  29.8    0.001  93.0    0.000  94.8
             FPM (df=5)   0.070  29.8    0.001  92.6    0.000  94.8
             B-spline     0.095  -       0.007  -       -       -
ψ (0.25)     FPM (df=1)   0.001  94.0    0.002  94.0    0.002  94.2
             FPM (df=5)   0.002  93.0   -0.002  92.8    0.002  93.8
             B-spline    -0.001  93.7   -0.003  94.0    -       -
α (-0.25)    FPM (df=1)  -0.004  96.0   -0.002  96.4   -0.002  96.2
             FPM (df=5)   0.001  95.6    0.008  93.8   -0.001  96.2
             B-spline     0.021  92.1    0.014  94.6    -       -

95% CP - 95% Coverage Probability, df - degrees of freedom, NAQ - Non-Adaptive Quadrature, AQ - Adaptive Quadrature, FPM - Flexible Parametric Model using restricted cubic splines
Table 7.2. Simulation results from mixture-Weibull scenario 2. Association is varied with α = -0.25, 0.25.

Parameter (true value)  Model        NAQ 5 nodes      NAQ 15 nodes     AQ 5 nodes
                                     Bias    95% CP   Bias    95% CP   Bias    95% CP

α = 0.25:
β0 (0)       FPM (df=1)  -0.007  64.2    0.002  82.1    0.001  94.8
             FPM (df=5)  -0.004  64.8    0.003  80.9    0.002  94.7
             B-spline     0.028  52.5    0.022  80.4    -       -
σ0 (1)       FPM (df=1)  -0.050  58.2   -0.002  86.5   -0.004  95.2
             FPM (df=5)  -0.050  58.8   -0.002  86.4   -0.004  95.3
             B-spline     0.011  -      -0.001  -       -       -
β1 (0)       FPM (df=1)  -0.012  67.0   -0.001  86.9    0.000  94.8
             FPM (df=5)  -0.013  66.8   -0.001  85.6   -0.001  94.3
             B-spline    -0.024  57.6   -0.022  70.9    -       -
σ1 (0.25)    FPM (df=1)  -0.250  11.2   -0.020  93.2   -0.010  94.6
             FPM (df=5)  -0.252  10.4   -0.020  92.9   -0.011  94.3
             B-spline     0.014  -      -0.028  -       -       -
σ01 (0.25)   FPM (df=1)  -0.007  68.2   -0.008  75.5    0.012  95.6
             FPM (df=5)  -0.012  68.6   -0.007  74.8    0.012  95.5
             B-spline    -0.022  -      -0.002  -       -       -
δ (-0.25)    FPM (df=1)   0.001  62.0   -0.007  79.9   -0.005  96.6
             FPM (df=5)   0.000  61.0   -0.008  78.5   -0.005  96.6
             B-spline    -0.005  50.9   -0.006  78.1    -       -
σe (0.5)     FPM (df=1)   0.068  27.2    0.000  95.0   -0.001  96.4
             FPM (df=5)   0.068  27.2    0.000  95.5   -0.001  96.4
             B-spline     0.101  -       0.006  -       -       -
ψ (0.25)     FPM (df=1)  -0.002  95.4   -0.002  95.6   -0.002  95.4
             FPM (df=5)  -0.003  95.4   -0.004  94.9   -0.003  95.1
             B-spline    -0.011  95.8   -0.012  94.9    -       -
α (0.25)     FPM (df=1)   0.001  93.6    0.000  93.4    0.000  93.4
             FPM (df=5)  -0.002  93.0   -0.002  91.7   -0.001  93.3
             B-spline    -0.025  90.1   -0.020  91.7    -       -

α = -0.25:
β0 (0)       FPM (df=1)  -0.010  64.3    0.003  81.1   -0.002  94.4
             FPM (df=5)  -0.011  63.3   -0.001  81.7   -0.002  94.4
             B-spline    -0.040  46.6   -0.019  78.1    -       -
σ0 (1)       FPM (df=1)  -0.042  60.7   -0.001  84.5   -0.003  94.8
             FPM (df=5)  -0.043  60.7   -0.001  85.3   -0.003  94.8
             B-spline     0.012  -       0.003  -       -       -
β1 (0)       FPM (df=1)   0.009  74.4    0.000  87.6   -0.001  96.4
             FPM (df=5)   0.009  74.8    0.001  88.2   -0.001  96.2
             B-spline     0.023  58.1    0.022  75.1    -       -
σ1 (0.25)    FPM (df=1)  -0.247  14.0   -0.018  95.6   -0.008  94.0
             FPM (df=5)  -0.248  13.4   -0.018  95.2   -0.008  94.0
             B-spline     0.017  -      -0.028  -       -       -
σ01 (0.25)   FPM (df=1)   0.002  67.7   -0.002  69.9    0.016  94.2
             FPM (df=5)   0.003  67.1   -0.003  70.1    0.017  94.4
             B-spline    -0.017  -      -0.002  -       -       -
δ (-0.25)    FPM (df=1)   0.014  56.3   -0.002  81.9    0.002  95.0
             FPM (df=5)   0.014  57.7    0.001  81.9    0.002  95.0
             B-spline     0.023  43.8   -0.003  79.7    -       -
σe (0.5)     FPM (df=1)   0.069  25.7    0.000  92.8   -0.002  93.8
             FPM (df=5)   0.069  25.0    0.000  92.8   -0.001  94.0
             B-spline     0.100  -       0.006  -       -       -
ψ (0.25)     FPM (df=1)  -0.004  94.4   -0.001  94.4   -0.002  94.4
             FPM (df=5)  -0.004  94.2   -0.007  94.2   -0.002  94.8
             B-spline    -0.004  95.2   -0.004  95.0    -       -
α (-0.25)    FPM (df=1)  -0.008  96.2   -0.004  96.4   -0.005  96.4
             FPM (df=5)  -0.005  94.0    0.003  93.0   -0.005  95.2
             B-spline     0.020  94.0    0.014  94.2    -       -

95% CP - 95% Coverage Probability, df - degrees of freedom, NAQ - Non-Adaptive Quadrature, AQ - Adaptive Quadrature, FPM - Flexible Parametric Model using restricted cubic splines
Table 7.3. Simulation results from Weibull scenario 3. Association is varied with α = -0.25, 0.25.

Parameter (true value)  Model        NAQ 5 nodes      NAQ 15 nodes     AQ 5 nodes
                                     Bias    95% CP   Bias    95% CP   Bias    95% CP

α = 0.25:
β0 (0)       FPM (df=1)   0.006  57.3   -0.001  78.5   -0.001  95.0
             FPM (df=5)   0.009  57.1    0.000  78.0   -0.005  94.9
             B-spline    -0.002  42.7    0.006  74.6    -       -
σ0 (1)       FPM (df=1)  -0.063  49.9   -0.010  79.5   -0.010  96.2
             FPM (df=5)  -0.063  50.3   -0.010  79.4   -0.011  95.6
             B-spline     0.006  -      -0.006  -       -       -
β1 (0)       FPM (df=1)   0.000  79.8    0.001  85.7    0.000  94.6
             FPM (df=5)   0.001  78.2    0.001  85.3   -0.002  94.1
             B-spline     0.000  68.1    0.000  83.0    -       -
σ1 (0.25)    FPM (df=1)  -0.217   6.6   -0.016  92.6   -0.003  93.4
             FPM (df=5)  -0.218   6.8   -0.016  92.3   -0.005  93.9
             B-spline     0.012  -      -0.004  -       -       -
σ01 (0.25)   FPM (df=1)  -0.046  20.6   -0.020  68.5    0.011  95.2
             FPM (df=5)  -0.046  20.4   -0.023  69.7    0.009  94.7
             B-spline     0.040  -       0.012  -       -       -
δ (-0.25)    FPM (df=1)   0.006  95.8    0.008  77.3    0.000  96.6
             FPM (df=5)  -0.012  94.8    0.007  76.6   -0.001  96.6
             B-spline     0.013  35.3    0.000  71.3    -       -
σe (0.5)     FPM (df=1)   0.073   9.8    0.003  93.0    0.000  93.8
             FPM (df=5)   0.073  10.0    0.003  92.9    0.000  94.3
             B-spline     0.133  -       0.010  -       -       -
ψ (0.25)     FPM (df=1)   0.006  95.8    0.007  95.6    0.006  95.4
             FPM (df=5)  -0.012  94.8    0.006  95.6   -0.012  95.1
             B-spline    -0.005  95.4   -0.002  95.8    -       -
α (0.25)     FPM (df=1)   0.010  94.0    0.004  94.6    0.004  94.4
             FPM (df=5)   0.001  93.2    0.001  93.7   -0.016  90.7
             B-spline    -0.025  96.0   -0.018  96.4    -       -

α = -0.25:
β0 (0)       FPM (df=1)   0.008  56.6    0.006  73.4    0.003  94.8
             FPM (df=5)   0.009  56.8    0.006  74.0    0.005  94.8
             B-spline     0.009  39.1    0.002  70.4    -       -
σ0 (1)       FPM (df=1)  -0.061  45.6   -0.009  75.8   -0.006  95.8
             FPM (df=5)  -0.061  45.8   -0.009  76.2   -0.007  95.2
             B-spline     0.011  -      -0.003  -       -       -
β1 (0)       FPM (df=1)   0.000  80.0    0.000  87.6   -0.001  95.2
             FPM (df=5)   0.000  80.4    0.000  88.2    0.000  95.4
             B-spline     0.000  70.9    0.000  87.8    -       -
σ1 (0.25)    FPM (df=1)  -0.219   5.2   -0.019  93.6   -0.004  94.3
             FPM (df=5)  -0.219   4.8   -0.018  93.2   -0.005  93.6
             B-spline     0.010  -      -0.005  -       -       -
σ01 (0.25)   FPM (df=1)  -0.036  23.4   -0.021  64.5    0.010  94.1
             FPM (df=5)  -0.034  23.4   -0.020  64.8    0.009  94.4
             B-spline     0.037  -       0.009  -       -       -
δ (-0.25)    FPM (df=1)  -0.011  52.6   -0.015  74.6   -0.009  94.8
             FPM (df=5)  -0.010  52.6   -0.015  75.4   -0.009  94.8
             B-spline    -0.023  36.3   -0.015  68.8    -       -
σe (0.5)     FPM (df=1)   0.073   8.6    0.002  93.4    0.000  94.3
             FPM (df=5)   0.074   8.2    0.003  93.6   -0.001  94.4
             B-spline     0.133  -       0.010  -       -       -
ψ (0.25)     FPM (df=1)  -0.006  95.2   -0.003  95.2   -0.003  95.2
             FPM (df=5)  -0.010  95.4   -0.009  94.8   -0.019  94.2
             B-spline     0.000  95.4   -0.001  95.4    -       -
α (-0.25)    FPM (df=1)  -0.007  95.0   -0.001  95.0    0.000  94.8
             FPM (df=5)   0.002  93.4    0.000  94.2    0.013  92.0
             B-spline     0.028  94.2    0.021  95.0    -       -

95% CP - 95% Coverage Probability, df - degrees of freedom, NAQ - Non-Adaptive Quadrature, AQ - Adaptive Quadrature, FPM - Flexible Parametric Model using restricted cubic splines
Table 7.4. Liver cirrhosis trial of prednisone treatment. Results from applying the RP restricted cubic spline and B-spline joint models.

Parameter             Joint model: RCS             Joint model: B-spline
                      Estimate  95% CI             Estimate  95% CI

Longitudinal:
t1                     0.872   (0.388, 1.356)       1.169   (0.694, 1.643)
t1*treatment           0.272   (-0.354, 0.899)      0.299   (-0.327, 0.925)
Intercept             75.252   (73.033, 77.471)    75.901   (73.745, 78.056)
sd(t1)                 2.333   (1.953, 2.786)       2.176   -
sd(intercept)         21.839   (20.152, 23.667)    21.327   -
corr(t1,intercept)     0.610   (0.467, 0.722)       0.567   -
sd(Residual)          17.612   (17.092, 18.148)    17.658   -

Survival:
Association           -0.038   (-0.045, -0.031)    -0.039   (-0.046, -0.031)
Treatment              0.210   (-0.038, 0.457)      0.241   (-0.007, 0.489)

where t1 = log(time + 0.00273); RCS - Restricted Cubic Splines, CI - Confidence Interval
Table 7.5. Liver cirrhosis trial of prednisone treatment. Results from joint models with varying knot locations.

Parameter            Knot locations A            Knot locations B            Knot locations C
                     Estimate  95% CI            Estimate  95% CI            Estimate  95% CI

Longitudinal:
t1                    0.877  (0.393, 1.361)       0.873  (0.390, 1.357)       0.874  (0.391, 1.358)
t1*treatment          0.275  (-0.351, 0.902)      0.275  (-0.352, 0.901)      0.273  (-0.353, 0.900)
Intercept            75.304  (73.088, 77.519)    75.261  (73.044, 77.479)    75.262  (73.045, 77.481)
sd(t1)                2.333  (1.954, 2.786)       2.331  (1.952, 2.784)       2.331  (1.951, 2.784)
sd(intercept)        21.843  (20.132, 23.700)    21.830  (20.143, 23.658)    21.829  (20.142, 23.657)
corr(t1,intercept)    0.609  (0.466, 0.721)       0.609  (0.466, 0.722)       0.609  (0.466, 0.722)
sd(Residual)         17.611  (17.091, 18.147)    17.613  (17.093, 18.148)    17.612  (17.093, 18.148)

Survival:
Association          -0.038  (-0.045, -0.031)    -0.038  (-0.045, -0.031)    -0.038  (-0.045, -0.031)
Treatment             0.209  (-0.038, 0.456)      0.212  (-0.035, 0.459)      0.210  (-0.037, 0.457)

where t1 = log(time + 0.00273)
CHAPTER 8
Joint modelling of longitudinal and survival data using
finite mixture models
8.1. Chapter outline
In this chapter I explore a different class of parametric survival submodels, incorpo-
rating finite mixtures of parametric distributions to model the baseline hazard flexibly
and on the log hazard scale, as an alternative to the Royston-Parmar submodel de-
scribed in the previous chapter which is modelled on the log cumulative hazard scale.
Through simulation I investigate the effect of misspecifying the baseline hazard func-
tion when modelling on the log hazard scale, and compare the finite mixture approach
to the Royston-Parmar approach through application to the liver cirrhosis dataset used
in the previous chapter and introduced in Chapter 3.
8.2. Introduction
In the previous chapter, I developed a flexible parametric framework for the joint
analysis of a longitudinal continuous biomarker and a time to event. Although the
RP based survival submodel has advantages in terms of flexibility and estimation,
by modelling on the log cumulative hazard scale, the interpretation of the association
parameter under a time-dependent association structure is not equivalent to that obtained when mod-
elling on the log hazard scale. Moving to the more traditional log hazard scale, whilst
maintaining flexibility, can overcome this.
In this chapter I propose to incorporate a finite mixture of parametric distribu-
tions as the survival submodel within the joint modelling framework. Finite mixture
models of this form have been previously shown to increase the flexibility of standard
time-to-event models (McLachlan and McGiffin, 1994), and mixture and non-mixture
cure models to obtain improved estimates of statistical cure (Lambert et al., 2010b).
A finite mixture of two Weibull distributions was described in Chapter 4 as a way to
simulate survival times with more biologically plausible baseline hazard functions in-
corporating turning points. This was successfully utilised in an extensive simulation
study that I was involved in, evaluating the Royston-Parmar model (Rutherford et al.,
2014), where we wished to simulate from a complex baseline hazard, and apply flexible
approximations using splines, rather than the true underlying model (in this case the
equivalence between proportional hazards and proportional cumulative hazard holds).
This was further utilised in Chapter 7, simulating data on the log cumulative hazard
scale to evaluate the RP based joint model.
In particular, in this chapter I aim to explore the impact of misspecifying the baseline
hazard function, specifically in terms of estimating the association parameter within a
joint model. In Chapter 4 Section 4.6.2, I conducted a small simulation study which
simulated survival times under a proportional hazards two-component mixture Weibull
model, and applied a Weibull proportional hazards model and the underlying mixture
model, monitoring the estimates of the log hazard ratio due to a binary covariate. It
indicated some bias in the estimate of the log hazard ratio under a Weibull model,
compared to unbiased estimates under the true mixture model.
In Section 8.3 I derive the two-component mixture survival submodel and describe
the appropriate likelihood formulation for a joint longitudinal-survival model under
this model specification. In Section 8.4 I conduct a simulation study, evaluating the
impact of misspecifying the baseline hazard function. In Section 8.5 I compare the two-
component mixture survival submodel to the Royston-Parmar submodel presented in
the previous chapter, in application to the liver cirrhosis dataset. Finally, in Section
8.6, I conclude the chapter with a discussion.
8.3. Methods
For clarity, I briefly re-define the notation. For the ith patient, I observe time-to-
event data, longitudinal response data and covariate data. Let Si be the true survival
time of patient i = 1, . . . , n, and Ti = min(Si, Ci) the observed survival time, with Ci
the censoring time. Define an event indicator di, which takes the value of 1 if Si ≤ Ci
and 0 otherwise. Let yij = yi(tij), j = 1, . . . ,mi denote the jth longitudinal response
measurements of a biomarker for the ith patient taken at times tij . Furthermore, I define
shared random effects, bi, which underpin the survival and longitudinal processes. Each
sub-model can be dependent on a set of covariates, Ui, which can potentially differ
between sub-models. Both censoring and time of measurements are assumed to be
non-informative.
8.3.1. Longitudinal sub-model
As in Chapters 6 and 7, I once again assume a continuous longitudinal biomarker of
interest, postulating a linear mixed effects model for the longitudinal sub-model
yi(tij) = mi(tij) + eij , eij ∼ N(0, σ2e) (8.1)
mi(tij) = xTi (tij)β + zTi (tij)bi + uiδ (8.2)
where xi and zi are design matrices for the fixed (β) and random (bi) effects of time.
Here I can incorporate flexibility by allowing both xi and zi to contain, for example,
either fractional polynomials or restricted cubic spline functions of measurement time
(Royston and Altman, 1994; Durrleman and Simon, 1989). ui ∈ Ui is a vector of
covariates, with corresponding regression coefficients, δ.
8.3.2. Two-component mixture proportional hazards models
I now derive the two-component mixture proportional hazards model. I begin with a
general definition, with components additive on the survival scale
S0(t) = pS01(t) + (1− p)S02(t) (8.3)
where S0(t) is the overall baseline survival function, with two component survival func-
tions S01(t) and S02(t), which can each take the form of the survival function of any
particular distribution, such as the exponential or Weibull. The parameter p is known
as the mixture parameter, with 0 ≤ p ≤ 1. The baseline hazard function is then
calculated as follows
h0(t) = d/dt [−log(S0(t))]   (8.4)
Proportional hazards can then be invoked, where
h(t) = h0(t) exp(Xiβ) (8.5)
with the linear predictor, Xiβ, constrained to have no intercept term. In this chapter I
concentrate on using a two-component mixture Weibull baseline hazard function, which
is defined as follows
S0(t) = p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)   (8.6)

where λ1, λ2 and γ1, γ2 are scale and shape parameters, respectively, with λ1, λ2, γ1, γ2 > 0.
Once again, p represents the mixing parameter, with 0 ≤ p ≤ 1. Transforming to the
cumulative hazard scale
H0(t) = −log [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (8.7)
Differentiating with respect to t, I obtain the baseline hazard function:
h0(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (8.8)
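As a numerical sanity check on (8.8), the hazard should equal the negative derivative of log S0(t) from (8.6). The thesis implementation is in Stata; the following Python sketch is purely illustrative, with function names and parameter values of my own choosing:

```python
import math

def mix_weibull_surv(t, l1, g1, l2, g2, p):
    """Two-component mixture Weibull baseline survival, equation (8.6)."""
    return p * math.exp(-l1 * t**g1) + (1 - p) * math.exp(-l2 * t**g2)

def mix_weibull_haz(t, l1, g1, l2, g2, p):
    """Baseline hazard, equation (8.8), i.e. -d/dt log S0(t)."""
    num = (l1 * g1 * t**(g1 - 1) * p * math.exp(-l1 * t**g1)
           + l2 * g2 * t**(g2 - 1) * (1 - p) * math.exp(-l2 * t**g2))
    return num / mix_weibull_surv(t, l1, g1, l2, g2, p)

# A central finite difference of -log S0(t) should agree with the closed form
pars = dict(l1=0.2, g1=0.8, l2=1.0, g2=1.2, p=0.3)   # illustrative values
t, eps = 2.0, 1e-6
fd = (math.log(mix_weibull_surv(t - eps, **pars))
      - math.log(mix_weibull_surv(t + eps, **pars))) / (2 * eps)
assert abs(fd - mix_weibull_haz(t, **pars)) < 1e-6
```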
As was shown in Chapter 4, this hazard function can capture a variety of shapes,
including multiple turning points, providing the desired flexibility. This can be used as
the baseline with covariates acting multiplicatively, with proportional hazards induced
as follows
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(Xs(t)β)   (8.9)
where Xs(t) is a matrix of time-independent and/or time-dependent covariates, with
associated log hazard ratios, β. This form of survival model could be useful in standard
survival analysis, providing a flexible baseline function beyond standard choices such as
the Weibull. This model, along with a mixture Weibull-exponential model, is available
through the stmix command in Stata (Crowther and Lambert, 2011).
Within the joint modelling framework, the association structures described in Sec-
tion 6.4 of Chapter 6 are all applicable within the two-component mixture model, for
example, if I assume the association between the longitudinal and survival models acts
through the current value of the longitudinal response, as modelled by the longitudinal
submodel, and furthermore, define a vector of time independent baseline covariates,
vi ∈ Ui, I have
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(ψTvi + α1 mi(t))   (8.10)
where α1 is our association parameter and estimates the strength of the association
between the longitudinal and survival components, and ψ is a vector of log hazard
ratios corresponding to baseline covariates, vi.
Alternatively, interest may be in the association between the rate of change (slope)
of the biomarker trajectory, and how it relates to the time to event. This association
structure is formulated as
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(ψTvi + α2 m′i(t))   (8.11)
where m′i(t) = d mi(t)/dt is the first derivative of the longitudinal submodel and α2
estimates the strength of the association between the rate of change of the biomarker
trajectory and survival.
Finally, I can incorporate random coefficients, such as the intercept, into the linear
predictor of the survival submodel
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(ψTvi + α3 (β0 + b0i))   (8.12)
where by including the fixed intercept, β0, α3 corresponds to the log hazard ratio for
a one unit increase in the patient specific baseline (t = 0) value of the biomarker. The
above association structures were described in Chapter 6, along with further types; here I
concentrate only on the most commonly used, for illustration.
8.3.3. Maximising the full joint likelihood
I re-define the full likelihood here, formulated for a continuous biomarker and the time
to an event of interest, as follows,
∏_{i=1}^{n} ∫_{−∞}^{∞} [ ∏_{j=1}^{mi} f(yi(tij)|bi, θ) ] f(bi|θ) f(Ti, di|bi, θ) dbi   (8.13)

with

f(yi(tij)|bi, θ) = (2πσe²)^(−1/2) exp{ −[yi(tij) − mi(tij)]² / (2σe²) },   (8.14)

f(bi|θ) = (2π)^(−q/2) |Σ|^(−1/2) exp( −biᵀ Σ^(−1) bi / 2 ),   (8.15)

and

f(Ti, di|bi, θ) = h(Ti|bi, θ)^di × S(Ti|bi, θ)   (8.16)

where

S(Ti|bi, θ) = exp( −∫_0^Ti h(u|bi, θ) du )   (8.17)
where θ is our full parameter vector. I assume the random effects follow a multi-
variate normal distribution with variance-covariance matrix Σ. The hazard function,
h(Ti|bi, θ), is defined in Equation (8.10). The integral in Equation (8.17) can be
evaluated using k-point Gauss-Legendre quadrature, which was utilised in Chapters 4
and 5. When evaluating the full joint likelihood in Equation (8.13), I showed in the
previous chapter that non-adaptive Gauss-Hermite quadrature (Pinheiro and Bates,
1995) can lead to quite substantial under-estimation of standard errors, particularly of
parameters in the longitudinal sub-model, unless a large number (> 30) of quadrature
points are used. This also has the effect of increasing computation time substantially.
Alternatively, I recommend the routine use of fully adaptive Gauss-Hermite quadra-
ture which was shown to perform well with as few as 5 quadrature points (Crowther
et al., 2012a). As with all methods which use numerical integration, it is important to
establish the stability of estimates with an increasing number of quadrature nodes.
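To make the Gauss-Legendre step concrete: the k nodes on [−1, 1] are mapped to [0, T] and the weighted hazard evaluations are summed. For the mixture Weibull, the closed form H0(T) = −log S0(T) of (8.7) gives a convenient check on both the quadrature and its stability over node counts. A Python sketch (illustrative parameter values; not the thesis's Stata implementation):

```python
import numpy as np

def cum_haz_gl(hazard, T, k=15):
    """Approximate H(T) = int_0^T h(u) du with k-point Gauss-Legendre."""
    nodes, weights = np.polynomial.legendre.leggauss(k)
    u = 0.5 * T * (nodes + 1.0)            # map nodes from [-1, 1] to [0, T]
    return 0.5 * T * float(np.sum(weights * hazard(u)))

# Two-component mixture Weibull pieces, equations (8.6)-(8.8); values illustrative
l1, g1, l2, g2, p = 0.1, 3.0, 0.1, 1.5, 0.6
S0 = lambda t: p * np.exp(-l1 * t**g1) + (1 - p) * np.exp(-l2 * t**g2)
h0 = lambda t: (l1 * g1 * t**(g1 - 1) * p * np.exp(-l1 * t**g1)
                + l2 * g2 * t**(g2 - 1) * (1 - p) * np.exp(-l2 * t**g2)) / S0(t)

# The quadrature estimate should converge to the closed form -log S0(T) of (8.7)
T = 5.0
exact = -float(np.log(S0(T)))
for k in (5, 10, 15):                       # stability over node counts
    print(k, round(cum_haz_gl(h0, T, k), 5), "closed form:", round(exact, 5))
```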
The likelihood in Equation (8.13) can once again be maximised using the Newton-
Raphson technique with the first and second derivatives estimated numerically, as im-
plemented in the ml command in Stata (Gould et al., 2010). I obtain variance estimates
using the inverse of the negative Hessian matrix evaluated at the maximum likelihood
estimates.
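The non-adaptive Gauss-Hermite rule itself is simple to sketch: for b ∼ N(0, σ²), E[g(b)] ≈ π^(−1/2) Σ_k wk g(√2 σ xk), with (xk, wk) the Hermite nodes and weights; the adaptive version additionally recentres and rescales the nodes around each subject's posterior mode. An illustrative Python check on a case with a known answer (not the full adaptive scheme used in the thesis):

```python
import math
import numpy as np

def gh_expectation(g, sigma, k=5):
    """E[g(b)] for b ~ N(0, sigma^2) via k-point (non-adaptive) Gauss-Hermite."""
    x, w = np.polynomial.hermite.hermgauss(k)    # rule for weight exp(-x^2)
    return float(np.sum(w * g(math.sqrt(2.0) * sigma * x)) / math.sqrt(math.pi))

# Known answer: E[exp(b)] = exp(sigma^2 / 2), the mean of a lognormal
approx5 = gh_expectation(np.exp, sigma=1.0, k=5)
approx15 = gh_expectation(np.exp, sigma=1.0, k=15)
exact = math.exp(0.5)
```

Even with 5 nodes the approximation is close for this smooth integrand; the accuracy gap between 5 and 15 nodes gives a feel for the node-count stability check described above.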
8.4. Simulation study
I evaluate the proposed model through simulation. In previous chapters I have em-
phasised the benefit of simulating time-to-event data using biologically plausible base-
line hazard functions when assessing current and novel methodology (Crowther et al.,
2012a; Crowther and Lambert, 2012). This is particularly important when assessing
flexible and parametric methods.
The primary objective of this simulation study is to assess the finite mixture based
joint model under complex baseline hazard functions. I assume a random intercept
and slope model for a continuous longitudinal outcome, including a binary treatment
group, mi(t) = β0i + β1i t + δui, where β0i ∼ N(0, 1), β1i ∼ N(0, 0.25²) and correlation
between (β0i, β1i) of 0.25. I define the direct effect of treatment on the longitudinal
biomarker, δ = 0.25, with ui ∼ Bin(1, 0.5).
For each scenario, 400 patients were included in each of 500 replications. The
observed longitudinal profile was generated from yi(tij) ∼ N(mi(tij), 0.5²). Times of
measurement, tij, were fixed at (0, 1, 2, 3, . . . , 9, 10). Survival times were generated
from log(h(t|bi, ui)) = log(h0(t)) + αmi(t) + ψui, with h0(t) defined in Equation (8.8),
i.e. a mixture Weibull baseline hazard. Simulating such survival times required the
general simulation algorithm described in Chapter 4. Administrative censoring was
applied at 10 years. The direct treatment effect on the time-to-event, ψ, is fixed at
0.25, and the association parameter, α, takes the values −0.25 and 0.25.
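The general simulation algorithm of Chapter 4 draws U ∼ U(0, 1) and solves S(t) = U, i.e. H(t) = −log U, for t, combining numerical quadrature for the cumulative hazard with iterative root finding. A simplified single-subject Python sketch of this idea (bisection in place of Brent's method, and all parameter values illustrative; not the thesis's Stata implementation):

```python
import math
import random

random.seed(7)

# Scenario-1-style mixture Weibull baseline hazard (illustrative values)
l1, g1, l2, g2, p = 0.2, 0.8, 1.0, 1.2, 0.3
def h0(t):
    s1 = p * math.exp(-l1 * t**g1)
    s2 = (1 - p) * math.exp(-l2 * t**g2)
    return (l1 * g1 * t**(g1 - 1) * s1 + l2 * g2 * t**(g2 - 1) * s2) / (s1 + s2)

alpha, psi = 0.25, 0.25              # association and direct treatment effect

def cum_haz(h, t, n=100):
    """Midpoint-rule approximation to int_0^t h(u) du (never evaluates u = 0)."""
    step = t / n
    return step * sum(h((j + 0.5) * step) for j in range(n))

def sim_one(b0, b1, u_trt, t_max=10.0):
    """Draw one survival time by inverting S(t) = U, i.e. solving H(t) = -log U."""
    h = lambda t: h0(t) * math.exp(alpha * (b0 + b1 * t) + psi * u_trt)
    target = -math.log(random.random())
    if cum_haz(h, t_max) < target:
        return t_max, 0              # administratively censored at t_max
    lo, hi = 0.0, t_max
    for _ in range(40):              # bisection on H(t) - target
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cum_haz(h, mid) < target else (lo, mid)
    return 0.5 * (lo + hi), 1

t, d = sim_one(b0=random.gauss(0, 1), b1=random.gauss(0, 0.25), u_trt=1)
```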
Two combinations of mixture Weibull parameters were used: a mixture Weibull
with a single turning point in the baseline hazard function, λ1 = 0.2, γ1 = 0.8, λ2 = 1,
γ2 = 1.2, p = 0.3, and a mixture Weibull with 2 turning points and generally increasing
baseline hazard function, λ1 = 0.1, γ1 = 3, λ2 = 0.1, γ2 = 1.5, p = 0.6. I illustrate
the 2 baseline hazard functions used in Figure 8.1. Scenarios 1 and 2 were chosen to
broadly reflect the observed shape of hazard functions found in two commonly used
joint model example datasets (Anderson et al., 1993; Murtagh et al., 1994).
[Figure: two panels, Scenario 1 and Scenario 2, each plotting the hazard rate (0.0 to 0.8) against follow-up time (0 to 10 years).]

Figure 8.1. Baseline hazard functions used in the simulation study.
To each simulated dataset I apply the two-component mixture Weibull based joint
model, assessing bias and coverage of parameter estimates. 15-point Gauss-Legendre
quadrature is used to evaluate the cumulative hazard, with 5-point adaptive Gauss-
Hermite quadrature used to evaluate the joint likelihood. For comparison, I also apply
a Weibull based joint model so that any potential bias in the estimated hazard ratios
can be evaluated when fitting an overly simple model.
8.4.1. Simulation results
Table 8.1 contains bias and coverage estimates for the three parameters of interest,
the direct effect of treatment on survival, the direct effect of treatment on the longi-
tudinal marker, and the association parameter. Results show some substantial bias
in the association parameter when using a Weibull joint model compared to minimal
bias under the two-component mixture Weibull joint model. This is observed in both
scenarios with either a positive or negative association. For example under Scenario
1, with α = −0.25 the bias observed under a Weibull model was -0.103 (41.2%) with
coverage of 29.1%, compared to bias of -0.007 (2.8%) and coverage of 96.4% under the
mixture model. Moderate bias is also observed in the direct effect of treatment on sur-
vival (ψ) when fitting a Weibull model, compared to minimal bias under the mixture
model. For example in scenario 1 with α = −0.25, the bias observed in the estimates
of ψ was 0.059 (23.6%), compared to 0.008 (3.2%) under the mixture model.
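The bias, percentage bias and coverage figures above follow the usual simulation-study definitions; a small Python helper makes these explicit (function and variable names are my own, and the replicate values shown are toy numbers, not thesis results):

```python
import statistics

def sim_performance(estimates, ses, true_value, z=1.96):
    """Bias, percentage bias and coverage across simulation replicates."""
    bias = statistics.mean(estimates) - true_value
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, ses)]
    return {"bias": bias,
            "pct_bias": 100.0 * bias / true_value,
            "coverage": 100.0 * sum(covered) / len(covered)}

# Toy replicates: truth -0.25, three well-behaved estimates with SE 0.02
perf = sim_performance([-0.24, -0.26, -0.25], [0.02, 0.02, 0.02], -0.25)
```

Note that with a negative true value, a negative bias gives a positive percentage bias, consistent with "-0.103 (41.2%)" quoted above for α = −0.25.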
Table 8.1. Estimates of bias and coverage from the simulation study.

Scenario  Parameter  True value   Weibull             Mixture Weibull
                                  Bias     Coverage   Bias     Coverage
1         α           0.25       -0.067    43.7        0.003   96.6
          δ           0.25       -0.008    96.4       -0.007   96.4
          ψ           0.25        0.060    85.6        0.009   95.6
1         α          -0.25       -0.103    29.1       -0.007   96.4
          δ           0.25       -0.006    96.0       -0.007   96.0
          ψ           0.25        0.059    86.4        0.008   94.2
2         α           0.25        0.033    72.4        0.003   95.2
          δ           0.25       -0.008    95.8       -0.006   95.8
          ψ           0.25        0.059    83.0        0.007   93.8
2         α          -0.25       -0.063    44.4        0.019   89.2
          δ           0.25       -0.006    96.2       -0.007   96.0
          ψ           0.25        0.050    85.6        0.002   94.0
8.5. Application to liver cirrhosis data
I now apply the proposed model to the liver cirrhosis dataset, which was also used to
illustrate the RP based joint model in the previous chapter, allowing direct comparisons
to be made. I also apply a Weibull based joint model, as in the simulation study above.
I use a random intercept and random slope trajectory model for prothrombin index
over time, using the log time scale. I adjust for treatment in the survival sub-model
and the interaction between measurement time (log) and treatment in the longitudinal
model.
Results are presented in Table 8.2. Consistent parameter estimates were obtained
in all three models with 5- and 6-point adaptive quadrature, to 4 decimal places. From
the mixture-Weibull joint model I observe a statistically significant (at the 5% level)
association of -0.039 (95% CI: -0.047, -0.032), indicating a lower value of prothrombin
index increases the risk of death. I find general consistency across the Weibull, mixture
Weibull models and RP joint models, in terms of the common parameters, and in
particular the association parameter.
Comparing the log hazard scale models, I illustrate the improved fit of the mixture
Weibull approach compared to a standard Weibull through plots of the fitted marginal
survival function, overlaid on the Kaplan-Meier curve, presented in Figure 8.2. The
left hand fitted marginal survival curve clearly illustrates that the Weibull based joint
model is not flexible enough to capture the underlying hazard function, compared to
the mixture Weibull in the right hand plot, showing improved fit to the Kaplan-Meier
curve. This is emphasised by comparing AIC and BIC values from the model fits,
shown in Table 8.2.
[Figure: two panels plotting survival (0.0 to 1.0) against follow-up time (0 to 15 years), each showing the Kaplan-Meier curve with its 95% CI and the fitted marginal survival function.]

Figure 8.2. Liver cirrhosis trial of prednisone treatment. Fitted marginal survival functions. The left plot is from the Weibull based joint model, and the right from the mixture Weibull joint model.
8.6. Discussion
In this chapter, I have proposed a new time to event submodel for the joint model
framework, incorporating a two-component parametric mixture model. This formula-
tion overcomes limitations of standard parametric choices such as the exponential and
Weibull, which often lack the flexibility to capture complex hazard functions observed
in clinical datasets. In application to the trial dataset, I observed a much improved
fit to the baseline hazard function when compared to a standard Weibull based joint
model.
The simulation study conducted illustrated excellent performance of the model
in terms of bias and coverage probabilities. The adaptive Gauss-Hermite quadrature
performed well with as few as 5 quadrature nodes, supporting the results found in
the previous chapter (Crowther et al., 2012a). Furthermore, when simulating from
more biologically plausible baseline hazard functions incorporating turning points, I
observed bias in estimates of the association parameter and the direct treatment effect
on survival, when applying a Weibull based joint model.
In the previous chapter, I proposed a joint model which incorporated the flexible
parametric survival model as the survival submodel. This survival submodel is mod-
elled on the (log) cumulative hazard scale, which has direct computational benefits
by avoiding the nested Gauss-Legendre quadrature described in Section 8.3. However,
under time-dependent association structures the equivalence between log hazard ratios
and log cumulative hazard ratios does not always hold. Therefore by directly modelling
on the more traditional log hazard scale, the approach taken in this chapter, I avoid
this issue. In application to the liver cirrhosis data, I found very consistent estimates
of all parameter estimates across the modelling frameworks.
The mixture approach is not without limitations. An issue with such a mixture
of functions is that there are multiple maxima that a model could converge to. For
example, the same baseline hazard function in the described model could be constructed
if p = 0.2, or alternatively if p = 0.8, by swapping the scale and shape parameters
between the two components (so-called label switching). Furthermore, the question of
‘how many components?’ should be raised. In the example described in this chapter, it
is likely that two components are sufficient to capture the complexity of the baseline
hazard function; however, there could be situations where more than two components
may be required. This issue has been
considered from a Bayesian perspective in which the number of mixtures is treated as
a further parameter to be estimated (Demiris et al., 2011).
An alternative to the mixture approach is to use restricted cubic splines on the log
hazard scale, which I would recommend, as opposed to the RP model. This will be
discussed further in Chapter 10.
Table 8.2. Liver cirrhosis trial of prednisone treatment. Parameter estimates and 95% Confidence Intervals (CI) for the Weibull, mixture-Weibull and RP joint models.

Parameter            Weibull joint model        Mixture-Weibull joint model   RP joint model
                     Estimate (95% CI)          Estimate (95% CI)             Estimate (95% CI)
Longitudinal:
t1                   0.924 (0.444, 1.404)       0.909 (0.428, 1.391)          0.877 (0.393, 1.361)
Treatment*t1         0.269 (-0.353, 0.891)      0.269 (-0.354, 0.893)         0.276 (-0.351, 0.902)
Intercept            75.482 (73.274, 77.690)    75.248 (73.220, 77.639)       75.304 (73.089, 77.519)
sd(t1)               2.282 (1.897, 2.746)       2.304 (1.920, 2.766)          2.333 (1.964, 2.786)
sd(Intercept)        21.796 (20.088, 23.648)    21.805 (20.098, 23.658)       21.843 (20.132, 23.700)
corr(t1,Intercept)   0.616 (0.468, 0.730)       0.616 (0.470, 0.730)          0.609 (0.466, 0.721)
sd(Residual)         17.633 (17.110, 18.172)    17.628 (17.106, 18.166)       17.611 (17.091, 18.147)
Survival:
Association          -0.038 (-0.045, -0.031)    -0.039 (-0.047, -0.032)       -0.038 (-0.045, -0.031)
Treatment            0.194 (-0.050, 0.437)      0.194 (-0.050, 0.438)         0.209 (-0.038, 0.457)
log(λ1)              1.082 (0.606, 1.558)       -0.105 (-0.913, 0.703)        -
log(γ1)              -0.042 (-0.138, 0.053)     0.428 (0.210, 0.646)          -
log(λ2)              -                          1.586 (1.163, 2.009)          -
log(γ2)              -                          0.134 (-0.078, 0.346)         -
logit(p)             -                          -3.081 (-5.122, -1.040)       -
log likelihood       -14041.504                 -14031.385                    -14029.375
AIC                  28105.007                  28090.769                     28088.751
BIC                  28145.451                  28142.244                     28143.902

where t1 = log(t + 0.002737)
CHAPTER 9
Adjusting for measurement error in baseline prognostic
biomarkers: A joint modelling approach
9.1. Chapter outline
In this chapter I explore the use of a specific association structure within the joint
model framework, describing how joint modelling can be used to account for measure-
ment error in a biomarker, even when only the baseline value of the biomarker is of
interest. This is a common occurrence in registry data sources, where often repeated
measurements exist but are simply ignored. The proposed specification is evaluated
through simulation and applied to data from the General Practice Research Database,
investigating the association between baseline Systolic Blood Pressure (SBP) and the
time to stroke in a cohort of obese patients with type 2 diabetes mellitus.
This chapter is based on a recent publication in BMC Medical Research Methodol-
ogy, a copy of which can be found in Appendix E (Crowther et al., 2013b).
9.2. Introduction
Many biomarkers such as systolic blood pressure (SBP) have been identified as
key prognostic factors in the development and validation of cardiovascular risk scores
(Conroy et al., 2003; Hippisley-Cox et al., 2007). However, often only baseline values of
these biomarkers are used, for example measured at diagnosis, despite the existence of
repeated measures, especially in registry sources such as the General Practice Research
Database (GPRD) (GPRD, 2011; Ara et al., 2012). Prognosis from diagnosis is a
relevant research question and of interest to both patients and clinicians. Furthermore,
biomarkers are often measured with error. Failing to adjust for such measurement error
leads to estimates being biased towards the null (Prentice, 1982).
As I have described in the previous chapters, a joint model of longitudinal and
survival data allows us to investigate the relationship between a repeatedly measured
biomarker, subject to measurement error, such as SBP, and the time to an event of
interest, such as time to non-fatal stroke. The approach which has dominated the
methodological literature involves linking the two component sub-models using shared
random effects (Wulfsohn and Tsiatis, 1997; Henderson et al., 2000). These methods
require computationally intensive numerical integration, which is difficult to implement.
However, due to the recent introduction of user-friendly software in R (Rizopoulos, 2010;
Philipson et al., 2012) and Stata (Crowther et al., 2013a), these models are starting to
find their place in applied research (Wolbers et al., 2010; Ibrahim et al., 2010; Gould
et al., 2014), but the potential uses of and forms of the association parameters, linking
the longitudinal and survival components, are yet to be fully explored.
The most commonly used association structures include the current value param-
eterisation (Wulfsohn and Tsiatis, 1997), whereby I directly link the value of the
biomarker, as estimated by the longitudinal sub-model, to survival; and the first deriv-
ative or slope (Ye et al., 2008; Wolbers et al., 2010), allowing the investigation of the
effect that the rate of change of the biomarker has on survival.
There is often interest in predicting prognosis based on an initial baseline measure-
ment (Conroy et al., 2003; Hippisley-Cox et al., 2007). In this chapter I investigate
the use of the joint model framework with a random intercept association structure
as an approach to adjust for measurement error, inherent in biomarkers such as SBP.
By incorporating the repeated measures I thus make the most efficient use of the data
available.
In Section 9.3 I briefly re-formulate the joint model, whilst in Section 9.4 I conduct
a simulation study assessing the validity of the approach, varying the magnitude of
measurement error in the assumed biomarker, the magnitude of the association between
the biomarker and survival, and compare the joint model to the standard approach of
adjusting for observed baseline biomarker values in a survival model. In Section 9.5
I illustrate the approach with application to a cohort of 4,850 patients with type 2
diabetes mellitus, who have repeated measures of SBP and a time-to-event outcome
of non-fatal stroke. Here I model the longitudinal trajectories flexibly using restricted
cubic splines, and assess the impact that using simpler longitudinal trajectory models,
based on polynomials, has on estimates of the association parameter. I conclude the chapter in
Section 9.6 with a discussion.
9.3. Joint model formulation
As introduced in Chapter 6, a joint model of longitudinal and survival data consists
of two component submodels, the longitudinal submodel and the survival submodel. I
define a set of baseline covariates, Ui, which can potentially differ between submodels.
The longitudinal submodel allows us to model the trajectory of a repeatedly measured
biomarker over time, adjusting for baseline covariates. The standard approach assumes
a linear mixed effects model. I observe
yi(tij) = mi(tij) + εij , εij ∼ N(0, σ2e) (9.1)
with
mi(tij) = xTi (tij)β + zTi (tij)bi + uiδ (9.2)
where yi(tij) is the observed longitudinal response for the ith patient measured at
the jth time point. mi(tij) is our true trajectory function consisting of design matri-
ces xTi (tij) and zTi (tij) for the fixed and random effects, β and bi, respectively, where
bi ∼ MVN(0,Σ). I can incorporate flexibility here by allowing both xTi (tij) and zTi (tij)
to contain restricted cubic spline functions of time (Durrleman and Simon, 1989), de-
scribed further in Section 9.5. I also have a vector of baseline covariates ui ∈ Ui, and corresponding regression coefficients, δ. Finally, εij is our normally distributed
measurement error with constant variance σ2e .
The time-to-event submodel usually takes the form of a proportional hazards model
hi(t) = h0(t) exp(α1mi(t) +ψTvi) (9.3)
with h0(t) the baseline hazard function and vi ∈ Ui is a vector of baseline covariates
with corresponding log hazard ratios, ψ. The parameter α1 is commonly named the
association parameter, indicating the strength of association between the longitudinal
biomarker and the time to event. This parameterisation assumes the hazard is de-
pendent on the biomarker through its current value. This form of association is one
of many ways to link the two component sub-models. The baseline hazard function,
h0(t), can be modelled using a parametric distribution, most frequently the Weibull,
or more flexibly using Royston-Parmar survival models (Royston and Lambert, 2011),
or of course can be left unspecified (Henderson et al., 2000).
For illustration, I let mi(tij), the longitudinal submodel, be a linear function of
time
mi(tij) = (β0 + b0i) + (β1 + b1i)tij (9.4)
giving a model with a random intercept and random linear slope. As an alternative
way of linking the component models to that of Equation (9.3), I may link elements
of the trajectory function, mi(tij), to the hazard directly. For example, I can link the
subject specific baseline biomarker values through the intercept association structure,
where
hi(t) = h0(t) exp[α2(β0 + b0i) + ψTvi]   (9.5)
where α2 now estimates the strength of the association between the patient specific
baseline biomarker values, as estimated by the longitudinal submodel, and the time to
event. This way I can let the risk of event depend directly on the subject specific value
of the biomarker at time t = 0.
If interest lies in prediction when a new patient is observed at baseline, the issue
of measurement error can be accounted for through this approach. A benefit of this
association structure also lies in the evaluation of the joint likelihood. Under most
parametric survival submodels (e.g. Weibull distribution) and time-dependent asso-
ciation structures (e.g. current value), numerical quadrature is required to integrate
out not only the random effects, but under Equation (9.3), nested quadrature is also
required to evaluate the cumulative hazard function. Under the time-independent as-
sociation structure of Equation (9.5), I avoid this nested quadrature as the cumulative
hazard function has an analytically tractable form, which provides computational ben-
efits. For example under a Weibull baseline hazard function, the cumulative hazard
and subsequently the survival under the association structure shown in Equation (9.5)
are written as
Hi(t) = λ t^γ exp[α2(β0 + b0i) + ψTvi]   (9.6)

and

Si(t) = exp( −λ t^γ exp[α2(β0 + b0i) + ψTvi] )   (9.7)
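Because the linear predictor in (9.6) does not depend on t, the Weibull cumulative hazard integrates in closed form, which is exactly the computational benefit described above. A short Python sketch verifying the closed form against brute-force numerical integration (all parameter values are illustrative):

```python
import math

lam, gam = 0.1, 1.5        # Weibull baseline parameters (illustrative)
alpha2 = 0.25              # intercept association parameter
beta0, b0i = 0.0, 0.8      # fixed intercept and one subject's random intercept
psi_v = -0.5               # psi^T v_i evaluated for this subject

lp = math.exp(alpha2 * (beta0 + b0i) + psi_v)   # time-constant multiplier

def H_closed(t):
    """Equation (9.6): closed-form cumulative hazard."""
    return lam * t**gam * lp

def H_numeric(t, n=2000):
    """Midpoint-rule integral of h(u) = lam*gam*u**(gam-1)*lp, as a check."""
    step = t / n
    return step * sum(lam * gam * ((j + 0.5) * step)**(gam - 1) * lp
                      for j in range(n))

S_closed = math.exp(-H_closed(5.0))             # survival, equation (9.7)
```

No quadrature is needed for the survival contribution here, unlike the nested scheme required under the current value association of (9.3).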
As discussed in Section 9.2, this model formulation can be an alternative to the standard
approach of using the observed baseline biomarker value
hi(t) = h0i(t) exp(α3Y0i +ψTvi) (9.8)
where Y0i is the observed baseline biomarker value and α3 is the log hazard ratio for
a one unit increase in the observed baseline biomarker value. Although simple to fit,
Equation (9.8) does not account for potential measurement error in Y0i.
Equation (9.5) assumes, to some extent, that the longitudinal submodel has been
correctly specified, in that it is capturing the variability over time and providing sen-
sible estimates of the subject-specific intercept to feed into the survival component.
Robustness to this will be investigated in application to the GPRD cohort, in Section
9.5.1.
9.4. Simulation study
In order to assess the performance of the standard approach of including observed
biomarker values, compared to the full joint model described above, I evaluate both
through simulation, allowing an assessment of performance when I can fit the true data-
generating model. For ease of exposition I assume a longitudinal model with random
intercept and slope, assuming a continuous biomarker of interest with
mi(tij) = (β0 + b0i) + (β1 + b1i)tij
where β0 = β1 = 0, and b0i ∼ N(0, 1), b1i ∼ N(0, 0.25²) with correlation between
(b0i, b1i) of 0.25. Observed measurements are then generated from Yij ∼ N(mi(tij), σ2e),
where tij is the time of the jth measurement for the ith patient. I vary σe over
0.1, 0.5 and 1. I assume a Weibull baseline hazard function with λ = 0.1 and γ = 1.5. A binary
variable, v1i to represent treatment group was generated from Bin(1, 0.5), with an
associated log hazard ratio of φ1 = −0.5. A continuous covariate, v2i, to represent age at
baseline was generated from N(65, 12) with an associated log hazard ratio of φ2 = 0.01.
I then generate survival times from a Weibull distribution where the hazard is defined
as h(t) = h0(t) exp(α2(β0 + b0i) + φ1v1i + φ2v2i), with α2 the association parameter,
indicating the effect of a one unit increase in the value of the subject specific intercept
on the risk of event. I vary α2 over −0.5, −0.25, 0.25 and 0.5. Each simulation contained 300
patients with up to 5 annual measurements (including baseline), and administrative
censoring at 5 years. This corresponds to an approximate 18.9% survival proportion at
5 years (calculated at the mean of covariate values, v1 = 1/2, v2 = 65 and β0i = 0). To
each dataset I fit a Weibull proportional hazards model including the observed baseline
measurement, and a Weibull-based joint model with the random intercept association
structure. I adjust for age and treatment in the survival submodel. Each scenario is
simulated 1000 times.
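The longitudinal part of this data-generating mechanism can be sketched as follows, drawing the correlated random intercept and slope from their bivariate normal distribution (Python for illustration; function and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(42)

def sim_longitudinal(n=300, times=(0, 1, 2, 3, 4), sigma_e=0.5):
    """Simulate y_ij = b0i + b1i * t_ij + e_ij with correlated random effects."""
    sd0, sd1, corr = 1.0, 0.25, 0.25
    cov = np.array([[sd0**2, corr * sd0 * sd1],
                    [corr * sd0 * sd1, sd1**2]])
    b = rng.multivariate_normal([0.0, 0.0], cov, size=n)    # rows: (b0i, b1i)
    t = np.asarray(times, dtype=float)
    traj = b[:, [0]] + b[:, [1]] * t                        # m_i(t_ij), n x T
    y = traj + rng.normal(0.0, sigma_e, size=traj.shape)    # add measurement error
    return b, y

b, y = sim_longitudinal()
```

At baseline the observed variance is the sum of the intercept and error variances, which is what drives the attenuation of the naive baseline-value analysis as σe grows.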
To illustrate the varying measurement error standard deviations used in the simula-
tion scenarios, I show in Figure 9.1 observed longitudinal measurements from the same
100 patients with σe = 0.1, 0.5, 1, and when α = 0.25. Figure 9.1 illustrates that
as the measurement error standard deviation increases, the variability in the observed
biomarker values increases.
[Figure: three panels of simulated longitudinal response against time of measurement
(years), for σe = 0.1, 0.5 and 1.]

Figure 9.1. Example simulated observed longitudinal measurements with varying
measurement error standard deviation.
9.4.1. Results
Bias and coverage estimates for the association parameter are presented in Table 9.1.
Under the standard Weibull model, I observe increasing bias in the estimates of the
association between baseline biomarker values and survival, as the magnitude of the
measurement error standard deviation, σe, increases. In parallel I observe very poor
coverage probabilities under the Weibull approach. For example, with α = 0.5 and
σe = 1, I observe bias of -0.261 (percentage bias of -52.2%) and coverage of 0.4%.
In contrast, under the joint modelling approach I observe minimal bias and coverage
probabilities close to 95% across all scenarios.
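The bias and coverage figures quoted above follow the usual Monte Carlo definitions, which can be sketched as follows (an illustrative helper with my own names, not the thesis code):

```python
import numpy as np

def bias_and_coverage(estimates, std_errs, truth):
    """Monte Carlo bias and 95% CI coverage across simulation replicates."""
    est = np.asarray(estimates)
    se = np.asarray(std_errs)
    bias = est.mean() - truth
    lower, upper = est - 1.96 * se, est + 1.96 * se
    coverage = 100.0 * ((lower <= truth) & (truth <= upper)).mean()
    return bias, coverage

# Estimates centred on the truth should give roughly 95% coverage
rng = np.random.default_rng(0)
est = rng.normal(0.5, 0.05, size=1000)
bias, cov = bias_and_coverage(est, np.full(1000, 0.05), truth=0.5)
```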
9.5. Analysis of GPRD cohort
I now present the analysis of the GPRD cohort, first introduced in Chapter 3.
The cohort consists of 4,850 obese patients diagnosed with type 2 diabetes mellitus.
There are 107,347 measurements of SBP, with maximum follow-up of 22 years. There
were 278 stroke events. In all analyses I use SBP/10 so that a unit increase in SBP/10
represents a clinically meaningful 10 mm Hg increase in SBP. The primary interest is the
Table 9.1. Simulation results of bias and coverage of the association
parameter, α. The standard deviation of the measurement error is σe.

True value of α    σe     Weibull model         Joint model
                          Bias     Coverage     Bias     Coverage
 0.50              0.1    -0.001   94.8          0.005   95.3
                   0.5    -0.105   65.4          0.005   95.6
                   1.0    -0.261    0.4          0.008   94.8
 0.25              0.1     0.002   94.4          0.005   94.3
                   0.5    -0.046   89.0          0.007   94.5
                   1.0    -0.123   34.1          0.010   94.8
-0.25              0.1     0.003   93.8          0.001   94.0
                   0.5     0.051   87.1          0.000   94.2
                   1.0     0.127   29.7         -0.002   94.6
-0.50              0.1     0.000   96.6         -0.005   95.9
                   0.5     0.104   66.7         -0.006   95.7
                   1.0     0.260    0.4         -0.010   94.5
association between baseline SBP and the risk of stroke. Baseline (tij = 0) corresponds
to when each patient entered the cohort, i.e. the time of first SBP measurement.
In Figure 9.2 I show the observed SBP measurements for 9 randomly selected
patients, who had at least 10 measurements, illustrating some non-linear trajectories.
To accommodate such non-linearities I can use restricted cubic splines in the linear
mixed effects submodel. In particular, I specify the following longitudinal trajectory
submodel
mi(tij) = (β0 + b0i) + β1agei + β2sexi + β3BMIi
+ βF sF (tij ; kF ) + bRsR(tij ; kR) (9.9)
where sF(tij; kF) is the restricted cubic spline basis of measurement time with
corresponding fixed effects, βF, with knot locations kF, and sR(tij; kR) is the restricted
cubic spline basis of measurement time with corresponding random effects, bR, and
knot locations kR.
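For illustration, a restricted cubic spline basis of the kind used in sF and sR can be constructed as below. This sketch uses the Durrleman-Simon parameterisation, one standard choice; software implementations often additionally rescale or orthogonalise the basis, so it should not be read as a replica of the thesis code:

```python
import numpy as np

def rcs_basis(t, knots):
    """Restricted cubic spline basis: a linear term plus K-2 cubic terms,
    constrained to be linear beyond the boundary knots."""
    t = np.asarray(t, dtype=float)
    k = np.asarray(knots, dtype=float)
    pos = lambda x: np.maximum(x, 0.0) ** 3   # truncated power function
    cols = [t]
    for j in range(len(k) - 2):
        lam = (k[-1] - k[j]) / (k[-1] - k[-2])
        cols.append(pos(t - k[j]) - lam * pos(t - k[-2])
                    + (lam - 1.0) * pos(t - k[-1]))
    return np.column_stack(cols)
```

The defining property is that the cubic and quadratic terms cancel beyond the last knot, leaving each basis function linear there.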
Preliminary modelling of the longitudinal data can be conducted to guide model
selection, in particular, the degrees of freedom for the spline terms capturing the un-
derlying longitudinal trajectory over time. I begin by assuming a random intercept and
selecting the degrees of freedom for the fixed spline terms using the AIC and BIC. In
this case, both select five degrees of freedom for sF (tij ; kF ). For the random splines
of time I assume a linear term, which equates to one spline term for sR(tij ; kR). This
allows a very flexible form to take into account the variation in SBP over time. I further
adjust for age (years), sex (female as the reference group) and Body-Mass Index (BMI
in kg/m2) at baseline.
To allow flexibility in the survival submodel I use the flexible parametric Royston-
Parmar survival submodel, developed in Chapter 7 (Royston and Lambert, 2011; Roys-
ton and Parmar, 2002; Crowther et al., 2012a), which models the baseline log cumula-
tive hazard function using restricted cubic splines. I can once again undertake separate
analysis of just the survival data to inform model selection. In particular, I use the
AIC and BIC to select the number of degrees of freedom to capture the baseline hazard
function, following Rutherford et al. (2014). In this case, both selected two degrees of
freedom, with an AIC of 2408.7173 and BIC of 2430.483. If one degree of freedom had
been selected, then this would be equivalent to a Weibull survival model. My final
joint model is then
logHi(t) = logH0(t) + φ1agei + φ2sexi + φ3BMIi + α2(β0 + b0i) (9.10)
where
logH0(t) = s(log(t);γ,kS) (9.11)
where the baseline log cumulative hazard function, log(H0(t)), is expanded into a re-
stricted cubic spline function of log(t), s(log(t);γ,kS), with knot locations kS and
coefficient vector, γ. In each submodel I adjust for the baseline effects of age, sex and
BMI. I fit the joint model with the random intercept association structure shown in
Equation (9.5). For comparison, I also apply the standard Royston-Parmar survival
model, adjusting for observed baseline SBP, age, sex and BMI.
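Survival predictions under the Royston-Parmar model follow directly from the log cumulative hazard specification in Equations (9.10) and (9.11): S(t) = exp(−exp(log H(t))). A minimal sketch, with an arbitrary stand-in for the fitted spline (illustrative names only):

```python
import numpy as np

def rp_survival(t, spline_fn, lin_pred):
    """Survival under a Royston-Parmar model, where log H(t) is a spline
    function of log(t) plus a linear predictor. spline_fn is any callable
    s(log t); here just a stand-in for the fitted restricted cubic spline."""
    logH = spline_fn(np.log(t)) + lin_pred
    return np.exp(-np.exp(logH))

# With s(x) = log(lam) + gam * x the model reduces to a Weibull,
# so S(t) should match exp(-lam * t**gam)
lam, gam = 0.1, 1.5
S = rp_survival(5.0, lambda x: np.log(lam) + gam * x, 0.0)
```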
[Figure: nine panels of SBP against follow-up time (years) for Patients 1-9, with
observed and predicted SBP overlaid.]

Figure 9.2. GPRD cohort of obese patients with type 2 diabetes mellitus.
Longitudinal response measurements for SBP for 9 randomly selected patients who
had at least 10 measurements. The dashed line represents the fitted longitudinal
trajectories based on the joint model.
Results are presented in Table 9.2. Under the standard Royston-Parmar survival
model I observe a hazard ratio for a ten mm Hg increase in baseline SBP of 1.111
(95% CI: 1.051, 1.172). Under a joint model I observe an increased hazard ratio of
1.198 (95% CI: 1.107, 1.298). The increased effect using a joint model is consistent
with that observed in the simulation study, i.e. that the bias in the standard survival
model is towards the null. The fitted trajectories seen in Figure 9.2 appear to capture
the subject-specific measurements well, although some patients appear to only require
a linear trend.
Table 9.2. GPRD cohort of obese patients with type 2 diabetes mellitus.
Results from applying a Royston-Parmar proportional hazards model adjusting
for observed baseline systolic blood pressure, and a full joint model using
the intercept association structure. All models are adjusted for age (years),
sex (male) and BMI (kg/m2).

                         Standard RP model            Joint model
                         Coefficient  95% CI          Coefficient  95% CI
Survival model:
  Baseline SBP/10 (α2)    0.105   0.050,  0.159        0.181   0.102,  0.261
  Age (years)             0.048   0.036,  0.060        0.050   0.038,  0.062
  Sex                     0.011  -0.233,  0.254       -0.010  -0.254,  0.234
  BMI (kg/m2)             0.011  -0.015,  0.037        0.013  -0.012,  0.039
Longitudinal model:
  Intercept                 -         -               13.006  12.629, 13.382
  Age (years)               -         -                0.025   0.022,  0.029
  Sex                       -         -               -0.252  -0.332, -0.171
  BMI (kg/m2)               -         -                0.003  -0.005,  0.011
  RCS1                      -         -               -0.080  -0.121, -0.039
  RCS2                      -         -               -0.006  -0.019,  0.006
  RCS3                      -         -               -0.001  -0.010,  0.007
  RCS4                      -         -                0.003   0.000,  0.006
  RCS5                      -         -                0.000  -0.001,  0.001
  σe                        -         -                1.522   1.515,  1.528

RCS - Restricted Cubic Spline
I illustrate the bias from the standard approach in Figure 9.3, showing predictions
from both models for a female patient aged 60, with low (90), medium (130) and high
(200) SBP baseline measurements. To quantify the differences, at 10 years under the
standard model I observe a survival probability of 0.881 for a SBP of 200, compared to
0.770 under the full joint model.
9.5.1. Robustness to model specification. The association structure used to
link baseline SBP to the risk of stroke, relies on the estimated subject-specific baseline
value of SBP, as modelled by the longitudinal submodel. This will clearly be influenced
by how the longitudinal trajectory over time has been specified. To investigate the
sensitivity of this association structure, I repeat the analysis described in Section 9.5;
however, now assume a random intercept and random linear slope. This model is
clearly inadequate to capture the variability in SBP over time, as was indicated by
the selection of a restricted cubic spline function with 5 degrees of freedom in the
above application. Results are presented in Table 9.3.

[Figure: predicted survival probability against follow-up time (years), comparing the
FPM and joint models at SBP = 90, 130 and 200.]

Figure 9.3. Predicted survival from the Royston-Parmar flexible parametric model
(FPM) and joint model, for a female, aged 60 years, BMI of 30, with SBP of 90,
130 or 200.
Table 9.3. GPRD cohort of obese patients with type 2 diabetes mellitus.
Results from applying a joint model with random intercept and random linear
slope in the longitudinal submodel, using the intercept association structure.

                         Coefficient  Std. Err.    95% CI
Survival model:
  Baseline SBP/10 (α2)     0.145      0.040        0.066,  0.224
  Age (years)              0.051      0.006        0.038,  0.063
  Sex                     -0.010      0.124       -0.253,  0.233
  BMI (kg/m2)              0.014      0.013       -0.012,  0.039
Longitudinal model:
  Time                    -0.060      0.004       -0.067, -0.053
  Age (years)              0.024      0.002        0.021,  0.028
  Sex                     -0.259      0.041       -0.340, -0.179
  BMI (kg/m2)              0.000      0.004       -0.007,  0.008
  Intercept               13.218      0.191       12.843, 13.593
Comparing results from Table 9.3 to Table 9.2, I observe very consistent estimates
for all covariate effects between models, except the association parameter. Under the
random intercept and random linear slope longitudinal trajectory, I obtain an estimate
of association of 0.145 (95% CI: 0.066, 0.224), representing a hazard ratio of 1.156 (95%
CI: 1.068, 1.251). This is interpreted as a 15.6% (95% CI: 6.8%, 25.1%) increase in
the rate of stroke for a 10 unit increase in baseline systolic blood pressure. I compare
this to a hazard ratio of 1.198 (95% CI: 1.107, 1.298), representing a 19.8% (95% CI:
10.7%, 29.8%) increase in the rate of stroke for a 10 unit increase in baseline SBP, from
the more flexible spline longitudinal submodel. This will be discussed in Section 9.6. To
highlight the different model fits, the AIC and BIC from the linear trajectory model
were 415,250.38 and 415,308.42, respectively, and from the spline based model were
414,627.73 and 414,700.28, respectively, indicating a much poorer fit from the linear
trajectory based joint model.
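The AIC and BIC comparisons above use the standard definitions, sketched below (with n the effective sample size, whose choice — subjects, observations or events — is a convention that varies between settings):

```python
import math

def aic_bic(loglik, k, n):
    """Standard information criteria: AIC = -2*loglik + 2k,
    BIC = -2*loglik + k*log(n), with k the number of free parameters."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)
```

Smaller values of either criterion indicate a preferred model, as used for the model comparisons in the text.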
9.6. Discussion
A wealth of patient data is becoming available in registry sources such as the Clinical
Practice Research Datalink (the successor to the GPRD), providing extensive
opportunities to utilise the joint modelling framework. I have shown that by incorporating
repeated measures of a biomarker within a unified joint model framework, the bias due
to measurement error is minimised, even when only the baseline level of the biomarker
is predictive of survival. As illustrated in the simulation study, ignoring measurement
error in biomarkers such as blood pressure can lead to a marked under-estimation of
covariate effects. In the application, through the use of restricted cubic splines in the
linear mixed effects submodel, I can model highly nonlinear trajectories over time,
compared to linear slope models. Furthermore, the Royston-Parmar survival submodel
can also capture complex baseline hazard functions, an important component when
predicting survival at the patient level (Crowther et al., 2012a).
Given that, to my knowledge, all current cardiovascular risk scores only use base-
line measures, with no adjustment for measurement error, the prospects of utilising this
framework to improve prognostic risk scores are quite substantial. Predicting survival
for a new patient using this framework follows naturally, as often only a first baseline
biomarker observation will be available. However, such a modelling approach also al-
lows a dynamic risk prediction approach to be adopted, whereby a patient’s estimated
future risk is updated as each new biomarker value is obtained in real-time using elec-
tronic health records (Rizopoulos, 2011). Such an approach could enable response to
treatment to be monitored and patients counselled accordingly.
In the analysis of the GPRD cohort, I incorporated flexibility in both the longitu-
dinal submodel through the use of restricted cubic splines, and the Royston-Parmar
survival submodel. Given that both submodels require choosing the number of degrees
of freedom, a simple sensitivity analysis can be undertaken to assess knot locations
and number of knots. I showed in Chapter 7 that the Royston-Parmar survival sub-
model is very robust to both knot placement and number of knots within a joint model
framework (Crowther et al., 2012a), and furthermore, an extensive simulation study,
described in Chapter 2, has been conducted by Rutherford et al. (2014), which showed
excellent performance of the Royston-Parmar model to capture simple and complex
baseline hazard functions. Furthermore, given that primary interest was in the sur-
vival component, and the estimate of association, often modelling the longitudinal
component with a sensible functional form will provide an improved estimate
compared to simplistic approaches of separate modelling.
In sensitivity analysis, I investigated the influence that the functional form for the
longitudinal trajectory over time, has on the estimate of the association parameter. I
found a marked difference between the estimate of association from the model which
was chosen through use of the AIC and BIC to select the functional form (selecting a
restricted cubic spline function with 5 degrees of freedom for the fixed effects of time,
and a random linear slope), and an intentionally more simplistic model which assumed
a random intercept and fixed and random linear effect of time. Although the AIC and
BIC indicated a much improved fit from the selected spline based model, compared to
the simplistic linear model, caution must be taken when assessing model fit for each
component of a joint model. This is an area which requires further research.
In this chapter I have concentrated on a specific association structure linking the
two component submodels; however, it may be of interest to investigate linking multiple
components of a biomarker's trajectory to the time to an event of interest. For example,
recent work by Rothwell et al. (2010) has shown associations between not only baseline
blood pressure, but also variability over time as important predictors of cardiovascular
events. Furthermore, I have only compared the standard approach of adjusting for
observed baseline biomarker values to the full joint model. It would be of interest
to compare alternative approaches for adjusting for measurement error, not only in
baseline biomarkers, but also under a time-dependent association structure (Zucker,
2005; Liao et al., 2011).
Extensions to the modelling framework include incorporating multiple biomarkers.
In particular, in this chapter I modelled SBP over time, whilst adjusting for baseline
BMI. It may be of interest to model not only SBP but also the inter-relationships
between different biomarkers such as BMI, and how they are related to an event of
interest (Rizopoulos and Ghosh, 2011). A further issue with an observational dataset
in particular, is the choice of appropriate timescale. The baseline time point used here
is time of first measurement of SBP; however, when estimating survival probabilities,
this does not provide an intuitive, clinically meaningful way of conveying survival to
patients. More appropriate perhaps, would be to use age as the timescale; however,
this requires delayed entry to be incorporated into the joint model framework, which
is not currently available in any software package. I discuss this extension further in
Chapter 11.
An issue which has perhaps made researchers apprehensive about adopting a joint
modelling approach, particularly to registry based datasets, is the computational bur-
den. To illustrate computational aspects of the framework, the presented joint model
applied to the cohort of 4,850 patients and a total of 107,347 observations took just
over 13 minutes to converge on an HP Intel Core i5 2.5GHz computer with 8GB of
RAM, which illustrates that this form of analysis is becoming entirely feasible.
CHAPTER 10
Software for the joint modelling of longitudinal and
survival data
10.1. Chapter outline
In this chapter I describe one of the central components of this thesis: the Stata
package stjm, which implements the class of joint models that have been described in
the previous four chapters.
The first version of the software was published in The Stata Journal, a copy of
which can be found in Appendix F (Crowther et al., 2013a). I describe the latest
version of the software, consisting of approximately 15,000 lines of code, which includes
developments and extensions since publication, with the software available from the
Statistical Software Components (SSC) archive, which can be downloaded by typing
ssc install stjm from within Stata.
10.2. Introduction
A central theme of this thesis is to not only develop novel methodology for the
analysis of complex survival data, but in parallel, develop and disseminate Stata soft-
ware packages that can be readily used by statistical and medical researchers. Within
the Stata user community, this is generally done by depositing packages at the Statisti-
cal Software Components (SSC) archive, or through publication in The Stata Journal.
Both of these resources allow any Stata user to download the required package.
The uptake of new methodology within applied research can often be hampered
through a lack of ‘user friendly’ software, with sufficient documentation that researchers
can apply the methods with confidence. In this chapter, I describe the Stata package
stjm, which has been developed alongside the work described in Chapters 6 to 9. It
allows the joint modelling of a longitudinal continuous response and the time to an event
of interest, using the linear mixed effects framework as the longitudinal submodel, and
the proportional hazards or cumulative hazards framework for the survival submodel.
10.3. Data setup
I illustrate stjm through application to a dataset of 312 patients with Primary
Biliary Cirrhosis (PBC), from Murtaugh et al. (1994), first introduced in Chapter 3. 158
patients were randomised to receive D-penicillamine, with 154 assigned to a placebo.
Serum bilirubin was measured repeatedly at intermittent time points. I investigate the
effect of treatment after adjusting for the relationship between serum bilirubin levels
and time to death. A total of 140 deaths were observed. Due to right skewness, in all
analyses I work with log(serum bilirubin).
The dataset must be correctly stset for use with stjm, through the use of start and
stop times. The purpose of the stset command is to declare it to be multiple-record
per subject survival data, which invokes sets of internal error checks to ensure the data
set has been set up correctly, and creates core variables such as _d, the event indicator;
_t0, the start times; and _t, the stop times. The dataset must be stset in order for
stjm to be used, which leaves less room for user error. Through this, stjm can use _t0
as the measurement times, and the final row of _t as the survival time. I illustrate the
data structure below:
. use fullpbc, clear
. stset stop, enter(start) f(event=1) id(id)
                id: id
     failure event: event == 1
obs. time interval: (stop[_n-1], stop]
 enter on or after: time start
 exit on or before: failure

     1945 total obs.
        0 exclusions

     1945 obs. remaining, representing
      312 subjects
      140 failures in single failure-per-subject data
 2000.307 total analysis time at risk, at risk from t = 0
                             earliest observed entry t = 0
                                  last observed exit t = 14.30566
. list id logb drug _t0 _t _d if id==3 | id==5, noobs sepby(id)
id logb drug _t0 _t _d
3 .3364722 D-penicil 0 .48187494 0
3 .0953102 D-penicil .48187494 .99660498 0
3 .4054651 D-penicil .99660498 2.0342789 0
3 .5877866 D-penicil 2.0342789 2.7707808 1
5 1.223776 placebo 0 .54484725 0
5 .6418539 placebo .54484725 1.070529 0
5 .9162908 placebo 1.070529 2.1054649 0
5 1.740466 placebo 2.1054649 3.0062425 0
5 1.648659 placebo 3.0062425 3.9836819 0
5 2.944439 placebo 3.9836819 4.1205783 0
Here I have 2 patients with 4 and 6 measurements of log(serum bilirubin), respectively.
The variable _t0 can now be used to denote the times that measurements were
taken, and the final row (for each patient) of _t represents the survival time. The
data can be explored using the supplementary command stjmgraph (which can be in-
stalled by typing ssc install stjmgraph), which creates joint longitudinal-survival
plots. The syntax is simple: I state the command name, the name of the longitudinal
outcome and the panel identifier variable. I also use the lowess option to overlay a
lowess smoother to aid interpretation, and the adjust option which scales the x-axis, by
taking away each patient’s observed survival/censoring time from the time of measure-
ments. A variety of further standard graphical options are also allowed in the syntax
of stjmgraph, to edit for example axis titles and labels.
. stjmgraph logb, panel(id) lowess adjust
Figure 10.1 displays all patients’ longitudinal trajectories against time, across event
status, with the timescale adjusted by subtracting each patient's survival/censoring
time. If the dataset was particularly large, I could restrict the plotted sample using
the if or in qualifiers. There appears to be a general increasing trend which is much
sharper in patients who died, compared to those who were censored. This is indicative
of a positive association between the longitudinal response and time to death, whereby
a higher level of the biomarker appears to be associated with a shorter time to death.
[Figure: two panels of longitudinal response against time before censoring/event,
for censored patients and for patients who died.]

Figure 10.1. Longitudinal profiles of log(serum bilirubin) for patients who were
censored/died. Timescale is adjusted by taking away each patient's survival time.
10.4. Model fitting
I now describe the main options which control the appropriate submodel specifi-
cations. A joint model can be fitted with essentially a single line of code, under the
following general syntax:
. stjm longdepvar [varlist] [if] [in] , panel(varname)
> survmodel(survsubmodel) [options]
The core components include declaring the longdepvar; the continuous longitudinal
response variable, panel(); the variable which contains a unique identifier for each
patient, and survmodel(); the type of survival submodel (see Section 10.4.2). Variables
specified in [varlist] pertain to baseline covariates to include in the longitudinal
submodel specification.
10.4.1. Longitudinal submodels
The specification of how to model the longitudinal response over time must be exclu-
sively handled through options, as under a time-dependent association structure the
cumulative hazard function is obtained using numerical integration, integrating over
follow-up time, and therefore stjm needs to know the exact specification of the time-
dependent fixed and random design matrices. To maintain flexibility, there are two
core ways to model the longitudinal outcome over time.
The first is to use polynomials using the options ffp(numlist) and rfp(numlist).
ffp() specifies power transformations of the time variable, to be included in the
longitudinal submodel as fixed covariates, with _t0 used as the time of measurements.
Values must be in −5,−4,−3,−2,−1,−0.5, 0, 0.5, 1, 2, 3, 4, 5. rfp() specifies power
transformations of the time variable, to be included in the longitudinal submodel as
fixed and random covariates. By default, a random intercept term is assumed in all
stjm models.
Alternatively, restricted cubic splines can be used through the options frcs(#) and
rrcs(#). By default, internal knot locations are placed at equally spaced centiles of
the distribution of observation times, with boundary knots placed at the minimum and
maximum observation times. frcs() specifies the degrees of freedom of the time vari-
able, expanding time into restricted cubic splines as fixed covariates. rrcs() specifies
the degrees of freedom of the time variable, expanding time into restricted cubic splines
as random covariates.
The variance-covariance structure of the random effects can be specified using the
covariance() option, with four choices available: independent specifies a distinct
variance for each random effect, with all covariances zero; exchangeable specifies
equal variances for all random effects and one common pairwise covariance; identity
specifies equal variances for all random effects, with all covariances zero; and
unstructured specifies that all variances and covariances are distinctly estimated. If
covariance() is left empty, then covariance(unstructured) is assumed.
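The four structures can be illustrated by constructing the implied variance-covariance matrices directly. This is a sketch with my own names; for the exchangeable and unstructured cases a single off-diagonal value is used purely for brevity:

```python
import numpy as np

def re_covariance(structure, sds, cov=0.0):
    """Illustrative construction of the four covariance() choices."""
    sds = np.asarray(sds, dtype=float)
    q = len(sds)
    diag = np.eye(q, dtype=bool)
    if structure == "identity":
        # equal variances, zero covariances
        return sds[0] ** 2 * np.eye(q)
    if structure == "independent":
        # distinct variances, zero covariances
        return np.diag(sds ** 2)
    if structure == "exchangeable":
        # equal variances, one common pairwise covariance
        return np.where(diag, sds[0] ** 2, cov)
    if structure == "unstructured":
        # distinct variances (off-diagonals here filled with one value
        # for illustration; in estimation each is a free parameter)
        return np.where(diag, sds ** 2, cov)
    raise ValueError("unknown structure: " + structure)
```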
Finally, interactions between baseline covariates and fixed effects of measurement
time can be specified using the timeinteraction(varlist) option. This allows dif-
ferent slopes for different covariate patterns.
10.4.2. Survival submodels
A variety of survival submodel choices are available, many of which were described in
earlier chapters. I briefly list them here for completeness, assuming the current value
parameterisation for illustration. Baseline covariates can be included in all survival
submodels, through the survcov(varlist) option.
10.4.2.1. Exponential
An exponential baseline hazard can be assumed under the following model
hi(t) = λ exp(ψTvi + αmi(t)) (10.1)
which can be specified using the survmodel(exp) option.
10.4.2.2. Weibull
A Weibull baseline hazard can be assumed under the following model
hi(t) = λγtγ−1 exp(ψTvi + αmi(t)) (10.2)
which can be specified using the survmodel(weibull) option.
10.4.2.3. Gompertz
A Gompertz baseline hazard can be assumed under the following model
hi(t) = λ exp(γt+ψTvi + αmi(t)) (10.3)
which can be specified using the survmodel(gompertz) option. I believe this was
the first implementation of the Gompertz model within a joint modelling framework
(Crowther et al., 2013a).
10.4.2.4. Restricted cubic splines on the log hazard scale
A restricted cubic spline baseline hazard can be assumed under the following model
hi(t) = exp(s(log(t)|k0,γ) +ψTvi + αmi(t)) (10.4)
which can be specified using the survmodel(rcs) option, with the degrees of freedom
defined using the df(#) option. By default, the spline terms are orthogonalised, which
can be suppressed using the noorthog option.
10.4.2.5. Flexible parametric model
The Royston-Parmar survival submodel described in Chapter 7 can be assumed
Hi(t) = exp(s(log(t)|k0,γ) +ψTvi + αmi(t)) (10.5)
which can be specified using the survmodel(fpm) option, with the degrees of freedom
defined using the df(#) option, or a vector of internal knots can be defined using
knots(numlist). By default, the spline terms are orthogonalised, which can be sup-
pressed using the noorthog option.
10.4.2.6. 2-component mixture models
The two-component mixture Weibull model, described in Chapter 8, is defined under
the following model
hi(t) = [p λ1γ1t^(γ1−1) exp(−λ1t^γ1) + (1 − p) λ2γ2t^(γ2−1) exp(−λ2t^γ2)]
        / [p exp(−λ1t^γ1) + (1 − p) exp(−λ2t^γ2)]
        × exp(Xiβ + αmi(t))                                            (10.6)
which can be specified using the survmodel(weibweib) option.
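As a numerical check of Equation (10.6), the mixture hazard can be coded directly; setting p = 1 collapses it to the standard Weibull hazard of Equation (10.2). This is an illustrative sketch, not the stjm implementation:

```python
import numpy as np

def mixture_weibull_hazard(t, lam1, gam1, lam2, gam2, p, lin_pred=0.0):
    """Two-component Weibull mixture hazard: the mixture density over the
    mixture survival function, scaled by exp(linear predictor)."""
    s1 = np.exp(-lam1 * t ** gam1)                 # component survivals
    s2 = np.exp(-lam2 * t ** gam2)
    num = (p * lam1 * gam1 * t ** (gam1 - 1.0) * s1
           + (1.0 - p) * lam2 * gam2 * t ** (gam2 - 1.0) * s2)
    den = p * s1 + (1.0 - p) * s2
    return num / den * np.exp(lin_pred)

# p = 1 gives the plain Weibull hazard lam1 * gam1 * t**(gam1 - 1)
h = mixture_weibull_hazard(2.0, 0.1, 1.5, 0.2, 0.5, 1.0)
```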
10.5. Association structures
All of the association structures that were introduced in Chapter 6 are implemented
in stjm. By default, stjm assumes the current value association structure; however,
this can be turned off by specifying the nocurrent option. Interaction effects, as
described in Section 6.5, can be specified using the assoccovariates(varlist) op-
tion, and the slope association can be used by specifying the derivassociation op-
tion. The random effects pameterisations can be specified using either the intassoc
option, which includes the subject-specific intercept, or the association(numlist)
option where numlist must be one of the numbers specified in rfp(numlist). By
default, intassoc and association() include the fixed effect (which generally leads
to a more interpretable association parameter); however, this can be turned off using
the nocoefficient option.
Any combination of the association structures described above can be specified
within the same joint model, providing a highly flexible framework with a variety of
ways to investigate the relationship between the two processes.
10.6. Estimation
As described in Chapter 6, evaluating the likelihood of a joint model is a computa-
tionally challenging task, requiring advanced numerical integration techniques. Within
Stata, similar techniques have been implemented within the Generalised Linear Latent
And Mixed Models (GLLAMM) package (Rabe-Hesketh et al., 2002), and are now part of
official Stata with the release of version 13 (StataCorp., 2013), with the introduction
of the new suite of me commands, implementing multilevel mixed effects models, and
the gsem suite to fit generalised structural equation models.
The default numerical integration technique is adaptive Gauss-Hermite (GH) quad-
rature, described in Chapter 6. Following Rabe-Hesketh et al. (2002), I implement the
adaptive GH quadrature at the beginning of each full Newton-Raphson iteration, under
the following scheme:
• Obtain starting values for θ0. These are obtained by fitting a separate linear
  mixed effects model, obtaining patient-specific predictions from the mixed
  effects model, and including them as a time-varying covariate within a standard
  survival model (this assumes the current value association, but is appropriately
  altered under different association structures). This is a computationally
  efficient way of getting excellent starting values for the full joint model.
• Repeat for k = 1, 2, . . . until the Newton-Raphson iterations have converged:
  – Predict the subject-specific posterior means and standard deviations,
    b_i^k and Ω_i^k:
    ∗ Predict the subject-specific posterior means and standard deviations,
      b_i^(k,0) and Ω_i^(k,0), using adaptive quadrature based on the previous
      estimates, b_i^(k-1) and Ω_i^(k-1)
    ∗ Repeat for j = 1, 2, . . . until convergence:
      · Predict the subject-specific posterior means and standard deviations,
        b_i^(k,j) and Ω_i^(k,j), using adaptive quadrature based on the
        previous estimates, b_i^(k,j-1) and Ω_i^(k,j-1)
  – Update the parameter estimates to θ^k using adaptive quadrature with
    b_i^k and Ω_i^k
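The basic (non-adaptive) Gauss-Hermite step underlying this scheme can be sketched as follows; the adaptive version recentres and rescales the nodes using the posterior quantities above (illustrative names, not the stjm code):

```python
import numpy as np

def gh_expectation(f, mu=0.0, sigma=1.0, n=15):
    """Gauss-Hermite approximation of E[f(b)] for b ~ N(mu, sigma^2),
    via the change of variable b = mu + sqrt(2)*sigma*x."""
    x, w = np.polynomial.hermite.hermgauss(n)
    nodes = mu + np.sqrt(2.0) * sigma * x
    return float((w * f(nodes)).sum() / np.sqrt(np.pi))

# Closed-form check: E[exp(b)] = exp(0.5) when b ~ N(0, 1)
val = gh_expectation(np.exp)
```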
Various options are available to control the maximisation process, including the
numerical approximations used to evaluate the joint likelihood. These include the
number of quadrature points used for each dimension of the random effects, gh(#), with
a default of 5 under adaptive quadrature and 15 under non-adaptive quadrature. The
tolerance used when updating the quadrature nodes can be controlled using atol(#)
(the j iterations in the above scheme), with a default of 1E-08.
10.6.1. A note on calculating the cumulative hazard function
As described in Chapter 6, under time-dependent association structures and a para-
metric baseline hazard function, the estimation of a joint model requires numerical
integration to evaluate the cumulative hazard function, and subsequently the survival
function required to calculate the joint likelihood. This adds to the computational de-
mands of the estimation process, given that this set of numerical integration is nested
within possibly multi-dimensional Gauss-Hermite quadrature, to integrate out the random
effects. However, this is not always necessary: if only time-independent association
structures are specified, stjm detects this and the nested numerical
integration is avoided when fitting a standard parametric baseline hazard function, such
as the exponential, Weibull or Gompertz. In testing on the liver cirrhosis trial dataset,
this can reduce computation time by up to half.
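As a concrete sketch of this inner integral, the following Python snippet (illustrative only; plain Gauss-Legendre in place of the Gauss-Kronrod rule used by stjm, and a Gompertz hazard chosen so the analytic answer is available for checking) evaluates the cumulative hazard and hence the survival function:

```python
import numpy as np

# 15-node Gauss-Legendre rule on [-1, 1]
x, w = np.polynomial.legendre.leggauss(15)

def cum_hazard(hazard, t):
    """Approximate H(t), the integral of h(u) over [0, t], by quadrature."""
    u = 0.5 * t * (x + 1.0)              # map nodes from [-1, 1] to [0, t]
    return 0.5 * t * np.sum(w * hazard(u))

lam, alpha = 0.1, 0.3                    # illustrative Gompertz parameters
hazard = lambda u: lam * np.exp(alpha * u)

t = 5.0
H = cum_hazard(hazard, t)                # numerical cumulative hazard
S = np.exp(-H)                           # survival function for the likelihood
H_exact = lam / alpha * (np.exp(alpha * t) - 1.0)
```

Under a time-dependent association this evaluation sits inside every Gauss-Hermite node, which is why detecting time-independent structures and skipping the nested integration is worthwhile.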
10.7. Joint modelling of longitudinally measured serum bilirubin and time
to death
To illustrate the ease with which a joint model can be fitted, I apply a joint model
to the PBC dataset. I model the longitudinal process using a linear trajectory model,
with random intercept and slope, adjusting for the interaction between fixed linear
189
time and treatment group. I model the survival process using a Weibull proportional
hazards survival submodel, adjusting for treatment group. I use the default current
value association and the default unstructured form for the random effects variance-
covariance matrix.
. stjm logb, panel(id) rfp(1) timeinterac(trt) survm(weibull) survcov(trt)
-> gen double _time_1 = X^(1)
-> gen double _time_1_trt = trt * _time_1
(where X = _t0)
Obtaining initial values:
Fitting full model:
-> Conducting adaptive Gauss-Hermite quadrature
Iteration 0: log likelihood = -1923.9358
Iteration 1: log likelihood = -1919.2078
Iteration 2: log likelihood = -1919.1856
Iteration 3: log likelihood = -1919.1855
Joint model estimates                        Number of obs.    =      1945
Panel variable: id                           Number of panels  =       312
Number of failures = 140
Log-likelihood = -1919.1855
Coef. Std. Err. z P>|z| [95% Conf. Interval]
Longitudinal
_time_1 .1826073 .0183264 9.96 0.000 .1466883 .2185264
_time_1_trt .0045744 .0244713 0.19 0.852 -.0433885 .0525373
_cons .4927346 .0582861 8.45 0.000 .3784959 .6069733
Survival
assoc:value
_cons 1.24083 .0931223 13.32 0.000 1.058314 1.423347
ln_lambda
trt .0407589 .179847 0.23 0.821 -.3117347 .3932525
_cons -4.409684 .2740596 -16.09 0.000 -4.946831 -3.872537
ln_gamma
_cons .0188928 .0827694 0.23 0.819 -.1433322 .1811178
Random effects Parameters Estimate Std. Err. [95% Conf. Interval]
id: Unstructured
sd(_time_1) .1806879 .0123806 .1579812 .2066583
sd(_cons) 1.002541 .0426659 .9223098 1.089751
corr(_time_1,_cons) .4256451 .0728762 .2730232 .5573664
sd(Residual) .3471453 .0066734 .334309 .3604745
Longitudinal submodel: Linear mixed effects model
Survival submodel: Weibull proportional hazards model
Integration method: Adaptive Gauss-Hermite quadrature using 5 nodes
Cumulative hazard: Gauss-Kronrod quadrature using 15 nodes
The first component of the output indicates the creation of core variables, gener-
ated by stjm, which are included in the linear predictor of the longitudinal submodel.
The next statement indicates that stjm is obtaining initial values, which means it is
currently fitting a two-stage model to get starting values for the full joint model. It
then proceeds to fit the full joint model using adaptive GH quadrature. The adaptive
iterations could be shown by specifying the showadapt option. The model converges
within three Newton-Raphson iterations and the results table is produced, showing the
longitudinal and the survival parameters.
The results table indicates a population average baseline value of the biomarker of
0.493 (95% CI: 0.378, 0.607), with an average statistically significant increasing linear
trend in the control group of 0.183 (95% CI: 0.147, 0.219). This increasing trend is
estimated to be slightly higher in the treatment group, though the difference is not statistically significant (estimated difference in linear trend of 0.005 (95% CI: -0.043, 0.053)).
The association parameter is labelled under assoc:value, indicating the current
value association structure, with an estimated coefficient of 1.241 (95% CI: 1.058,
1.423), which equates to a hazard ratio of 3.459 (95% CI: 2.882, 4.151) for a one-
unit increase in log serum bilirubin at time t, indicating a strong positive, statistically
significant association structure. I also observe a non-statistically significant direct ef-
fect of treatment on survival, with a log hazard ratio of 0.041 (95% CI: -0.312, 0.393).
As I fitted a Weibull baseline hazard function, baseline covariates are included in the
linear predictor of log(λ) under proportional hazards. The baseline parameters for the
Weibull survival submodel are presented on their parameterised scales (i.e. the log
scale), which can be exponentiated: λ = 0.012 and γ = 1.019.
The final part of the results table shows the estimated variance-covariance ma-
trix, which by default is presented showing standard deviations and correlation coeffi-
cients. It can be changed to show variances and covariances by simply entering stjm,
variance following a model fit to produce the output table again, or the variance
option when the model is fitted. The estimate of the residual measurement error is
shown as sd(Residual).
Finally, some summary text is presented to indicate which submodels have been
chosen, and the type and number of quadrature nodes for both the integrals over the
random effects, and the integral to calculate the cumulative hazard.
10.8. Post-estimation
A variety of predictions can be calculated following the fitting of a joint model using
stjm. Predictions follow the standard Stata syntax of:
. predict newvarname, statistic [options]
Fitted values for both the longitudinal response and the survival component (hazard,
cumhazard, survival) are available, either based on the fixed component of the model,
xb, or the fixed and empirical Bayes estimates of the random effects, fitted. A va-
riety of residuals can also be calculated: residuals, the response minus the fitted values for the longitudinal submodel; rstandard, standardised longitudinal residuals; martingale, martingale-like residuals for the survival component; and deviance, deviance residuals for the survival component. Empirical Bayes estimates of
the random effects can also be calculated using the reffects option, along with their
standard errors using the reses option.
Extended predictions can also be obtained using the at() option, which specifies
particular values of covariates at which to calculate predictions. This was used to create
Figure 9.3 in Chapter 9, predicting survival at specific covariate patterns. Furthermore,
baseline predictions can be calculated using the zeros option, which sets all baseline
covariates to be zero.
10.9. Discussion
In this chapter I have described the implementation in Stata of the joint mod-
elling framework described in earlier chapters. Through extensive model choices and
association structures, it provides a highly flexible framework for both the longitudinal
submodel, through the use of fractional polynomials or splines, and the survival sub-
model. Through the implementation of adaptive Gauss-Hermite quadrature, accurate
estimates of effect can be obtained using a much reduced number of quadrature nodes,
resulting in substantial computational benefits.
Although the methodological developments I have described throughout the thesis
are all implemented in Stata, some excellent alternatives to stjm exist in the statis-
tical software R. The first is joineR (Philipson et al., 2012), which implements joint
longitudinal-survival models using the Cox model as the survival model of choice, im-
plementing the model of Henderson et al. (2000) and provides a variety of plotting
functions and variograms. Alternatively, there is the extensive JM package, which im-
plements many of the standard joint models described in this thesis, and also allows the
Cox model to be used. It also includes many extensions such as the conditional survival
predictions of Rizopoulos (2011), and multiple-imputation based residuals (Rizopoulos
et al., 2010).
The stjm package is constantly being updated and improved, with extensions planned to include competing risks, a cure proportion, and categorical longitudinal
responses, to name a few. Future developments will be described in Chapter 11.
CHAPTER 11
Discussion
11.1. Chapter outline
In this chapter I conclude the thesis with a summary of the methodological de-
velopments described in previous chapters, whilst also considering the limitations of
the work. I further discuss the variety of future extensions, many currently under
investigation, that could build on the work that has been presented.
11.2. Introduction
In this thesis, I have developed methodology for both the simulation and analysis
of complex survival data, including the framework of joint models for longitudinal and
survival data. The methods have been assessed through simulation and application to
a variety of clinical datasets, illustrating the broad range of clinical topics that can
utilise the methodology.
In Section 11.3, I present a summary of the thesis, describing the main components
of each chapter. In Section 11.4, I place the work of the thesis in context by considering
the limitations of the work presented. In Section 11.5 I describe possible extensions
to the methodological frameworks described, including ongoing areas of research, mo-
tivated by clinically important questions. I make some final conclusions in Section
11.6.
11.3. Summary of the thesis
The following is a broad summary of the developments that have been presented
in this thesis. In Chapter 2 I described the foundations of survival analysis, which
formed the basis for extensions and developments in later chapters. Concentrating in
particular on parametric approaches to survival analysis, I described the fundamental
relationships which can provide clinically useful information in relating complex sta-
tistical concepts to patients. In Chapter 3 I described six motivating datasets, from
a variety of clinical areas, which were utilised throughout the thesis to illustrate the
methodological developments. Chapter 4 described the development of a general algo-
rithm to simulate survival times, providing a widely applicable framework with which to
simulate both simple and complex survival data. The framework was motivated by a
trial in breast cancer therapy, exhibiting a turning point in the baseline hazard function,
which is a commonly observed phenomenon in cancer trials, with mortality often peak-
ing shortly after the beginning of a trial before tailing off. The proposed algorithm can
simulate survival times from scenarios incorporating anything from extremely complex
underlying hazard functions, flexible time-dependent effects, time-varying covariates of
any type, and frailty terms, to name a few. This algorithm used numerical integration
nested within iterative root-finding, providing a computationally efficient framework for simulation in practice; for example, it has recently been used in a study I was involved in to investigate the performance of methods to adjust for treatment switching
in oncology trials (Latimer et al., 2014). Furthermore, I am involved in the extension
to simulate complex competing risks data, combining the general simulation algorithm
with the approach of Beyersmann et al. (2009). The numerical integration described
and used in Chapter 4, was then directly employed in Chapter 5, to develop a general
framework for the parametric analysis of survival data. Through the use of numerical
integration, a survival model can be fitted, with essentially any reasonable specification
for the hazard or log hazard function. This allowed the use of complex functions, such
as splines or fractional polynomials, to model not only the baseline hazard or log hazard
function, but time-dependent effects, and time-varying covariates. Delayed entry (left
truncation) was also accommodated within the framework. The first article relating to
Chapter 5 has been downloaded over 4,200 times since publication (Crowther and Lam-
bert, 2013b). The methods were applied to an observational dataset investigating the
effect of deprivation status on breast cancer survival, in both standard and relative sur-
vival frameworks, in particular highlighting the ability to model time-dependent effects
flexibly using restricted cubic splines. I further showed how the framework can be com-
bined with cluster robust standard errors to account for a hierarchical data structure,
in patients with multiple bladder cancer recurrences. The methods have since been
used in a National Institute for Health and Care Excellence (NICE) commissioned re-
port on total hip replacement and surface replacement for the treatment of pain and
disability resulting from end stage arthritis of the hip (Clarke et al., 2013), which I
was asked to collaborate on. The hip replacement example exemplifies the usefulness
of the general framework implemented in stgenreg as a development tool, whereby
we wished to fit a bathtub hazard model, a model unavailable in standard software, which could nevertheless be fitted in a single line of code using stgenreg. Subsequently I wrote
an improved estimation routine which implemented the bathtub hazard model analyt-
ically. This echoes the improvements I illustrated in Chapter 5, through the combined
analytic/numeric approach when using restricted cubic splines on the log hazard scale,
implemented in strcs. In Chapter 6 I introduced the field of joint modelling of lon-
gitudinal and survival data, focusing on the frequentist approach and the modelling
of a single continuous longitudinal response and the time to an event of interest. I
described the foundations of the field and approaches previously conducted in the lit-
erature which motivated the extensions described in later chapters. In particular, in
Chapter 7, I extended the joint model framework to incorporate the parametric survival
model of Royston-Parmar, which provided a highly flexible framework to capture com-
plex baseline hazard functions, and provided computational benefits by modelling on
the log cumulative hazard scale. I showed improved performance of the method when
compared to a previous equivalent formulation which used cubic B-splines. Further to
this, I conducted an extensive simulation study which evaluated and compared the per-
formance of non-adaptive and fully adaptive Gauss-Hermite quadrature in calculating
the joint likelihood. This showed the clear superiority of fully adaptive quadrature,
both in terms of precision but also computational efficiency. I illustrated the proposed
model by investigating the association between repeatedly measured prothrombin in-
dex and survival, in patients with liver cirrhosis, indicating the improved fit over more
simple parametric approaches, and improved stability in estimates when compared to
the B-spline approach. Although modelling on the log cumulative hazard scale has its
advantages, the disadvantage of losing the equivalence between log cumulative hazard
ratios and log hazard ratios under a time-dependent association structure, motivated
the model framework of Chapter 8. In this chapter, I incorporated a mixture of two
parametric distributions as the survival submodel, returning to the log hazard scale,
concentrating on a two-component mixture Weibull model. Through simulation I illus-
trated that misspecifying the baseline hazard function can lead to bias in the estimates
of the association parameter and log hazard ratios of baseline effects. In Chapter 9, I
described the use of a specific association structure, with a novel application to registry
based data from the General Practice Research Database, investigating the association
between baseline systolic blood pressure (SBP) and the time to stroke, utilising the
repeated measures of SBP that are often simply ignored. Through simulation I showed
the reduction in bias compared to using observed baseline values of a biomarker, pro-
viding an effective way to account for measurement error. Finally in Chapter 10, I
described the implementation of the Stata package stjm, written in parallel to the
methodological developments presented in Chapters 7 to 9, for the fitting of a joint
longitudinal-survival model. A variety of model specifications were described providing
the user with great flexibility to model both the longitudinal trajectory, the survival
outcome, and how they are linked.
11.4. Limitations
It is important to consider the limitations and possible pitfalls of any analytical
method, to place in context the appropriate use of advanced statistical techniques.
The techniques described throughout Chapters 4 to 10 require at least one form of
numerical integration, be it across the infinite or finite domain. By definition, numerical
integration is a way of approximating an analytically intractable integral, and any such approximation carries a degree of error. Within the simulation framework of Chapter
4, there are instances where complex scenarios can be chosen that can avoid the need
for numerical integration and root-finding, such as was proposed by Austin (2012), to
simulate time-varying covariates under particular functional forms and baseline hazard
functions. Furthermore, if interest is in relative effects through fitting a Cox model,
where the baseline hazard is not estimated, then simulating from a simple parametric
form for the baseline, such as an exponential, should be preferred.
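For reference, the combination of quadrature and root-finding under discussion can be sketched as follows: draw u ~ U(0, 1) and solve H(t) = -log(u) for t, with H(t) itself evaluated numerically. This is an illustrative Python sketch (bisection standing in for a Brent-type root-finder, and a Weibull hazard chosen so the inversion can be checked analytically), not the published implementation.

```python
import numpy as np

x, w = np.polynomial.legendre.leggauss(30)   # quadrature rule on [-1, 1]

def cum_hazard(h, t):
    """H(t) = integral of h(u) over [0, t], by Gauss-Legendre quadrature."""
    u = 0.5 * t * (x + 1.0)
    return 0.5 * t * np.sum(w * h(u))

def simulate_time(h, u, t_max=100.0, tol=1e-10):
    """Solve S(t) = u, i.e. H(t) = -log(u), by bisection on [0, t_max]."""
    target = -np.log(u)
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cum_hazard(h, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Weibull hazard with lambda = 0.1, gamma = 2, so the inversion is checkable:
# H(t) = 0.1 t^2, giving the analytic solution t = sqrt(-log(u) / 0.1)
h = lambda t: 0.1 * 2.0 * t
t = simulate_time(h, u=0.5)
```

In a simulation study u would be drawn repeatedly; the point of the nesting is that the same machinery applies unchanged however complex the hazard becomes.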
The general survival modelling framework developed in Chapter 5 also relies on in-
tegrating the hazard function using numerical quadrature to obtain the cumulative haz-
ard and consequently the survival function, for estimation. Alternatives to this, which
avoid the need for numerical integration include the reduced piecewise-exponential ap-
proach of Han et al. (2014), which combines a likelihood ratio test to identify shifts in
the hazard rate, a backward elimination procedure, and an optional presumed order
restriction on the hazard rate. However, this approach cannot model covariate effects,
as yet. Through fine splitting of the timescale, a piecewise exponential approach
can be advantageous, particularly as estimates closely match those from a Cox model
(Clayton and Hills, 1993; Crowther et al., 2012b). In particular, estimates will be iden-
tical to a Cox model if splitting at unique event times (Whitehead, 1980; Carstensen,
2004). A piecewise approach, however, loses the continuous-time modelling nature of
the approach I developed in Chapter 5, which has particular advantages in terms of
prediction and extrapolation. Investigating improved numerical integration techniques
would be advantageous, such as using non-parametric or simple parametric estimates of
the hazard function to inform the nodes and weights of the numerical quadrature in an
attempt to place and weight them more appropriately.
A criticism of the frequentist approach to joint modelling is its reliance on numerical
quadrature to integrate out the random effects, in order to calculate the likelihood. It
is always recommended that any analyses relying on such approximations ensure con-
sistent estimates by using an increasing number of quadrature nodes (Rabe-Hesketh
et al., 2002). Many authors have proposed a Bayesian approach to joint modelling,
which avoids the need for numerical quadrature (Faucett and Thomas, 1996; Brown
and Ibrahim, 2003a; Ibrahim et al., 2004; Taylor et al., 2013). Often the choice be-
tween frequentist and Bayesian approaches is a philosophical one; however, the ease of
implementation of a Bayesian joint model can be seen as an advantage over more dif-
ficult to implement frequentist approaches, with vague prior distributions chosen with
the aim of recovering estimates that closely match the maximum likelihood equivalent.
This approach does not fully exploit the possible benefits of a Bayesian approach, where
genuine prior information can be incorporated. This aspect will be discussed further
in Section 11.5.
The choice of how to link the longitudinal and survival components, within a joint
model, has so far largely been guided by clinical opinion, or standard model selection
criteria, such as the AIC and BIC. Given a joint model consists of two inter-related
models, it is of interest to explore selection techniques which partition the influence
that particular components have on each aspect of a fitted model. This issue was
highlighted in Chapter 9, showing the impact that an inappropriate functional form for
the longitudinal trajectory over time can have on estimates of association.
Throughout the thesis, I have emphasised the benefits of using restricted cubic
splines for modelling both the baseline hazard function, and time-dependent effects.
An open criticism of splines is in the selection of knot locations. I have adopted the
most commonly used approach, which is to use centiles of the appropriate distribution
of observed data, for example, centiles of the uncensored (log) survival times, with the
degrees of freedom chosen by the user (Rutherford et al., 2014). Sensitivity analyses to
such selection must be conducted in order to establish suitability of functional form, and
robustness to misspecification (Royston and Lambert, 2011). Alternative estimation
routines such as penalised likelihood have been developed to incorporate a trade-off
between the data fit and the smoothness of the functions (Rondeau et al., 2003, 2012),
which could be explored within the spline based models described in this thesis.
11.5. Future work
There are a variety of extensions to the work described in this thesis. The general
parametric survival framework described in Chapter 5 can be extended for competing
risks, and further to multi-state models. Competing risks occur commonly in medical
research, where patients are at risk of experiencing more than one type of event, with
the occurrence of one event preventing the occurrence of others (Putter et al., 2007).
A typical example is partitioning the mortality rate into death due to cancer, death
due to cardiovascular disease, and death due to other causes (Eloranta et al., 2012). In
this instance, the quantity that is generally of most interest is the cumulative incidence
function (CIF), which can be defined as the integral from time 0 to t, of the kth cause-
specific hazard multiplied by the overall survival function, representing the probability
of failing from cause k before time t. If utilising the general parametric framework,
which will provide smooth continuous time estimates of the CIF, then this function
will also require numerical integration to calculate it. This can be considered a post-
estimation task, as has been described previously (Lambert et al., 2010a; Hinchliffe and
Lambert, 2013). This extension is currently under investigation. Building on this is the
area of multi-state modelling, particularly for the analysis of large-scale cohort studies,
often using multiple linked databases, and how this could be used to more appropriately
populate decision models to allow for competing risks and correlation structures. A
further extension of interest is incorporating frailty terms into the general parametric
framework, to account for heterogeneity in baseline risk and covariate effects, which
are commonly used in areas such as individual patient data meta-analyses, and multi-
centre clinical trials. I have recently incorporated this extension into simple parametric
models and the Royston-Parmar survival model (Crowther et al., 2014).
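Since the CIF described above is itself an integral, CIF_k(t) = ∫_0^t h_k(u) S(u) du, its post-estimation computation is another quadrature problem. A minimal Python sketch follows (constant cause-specific hazards, chosen only so the analytic answer is available for checking; this is not the proposed implementation):

```python
import numpy as np

x, w = np.polynomial.legendre.leggauss(15)

def integrate(f, a, b):
    """Gauss-Legendre approximation of the integral of f over [a, b]."""
    u = 0.5 * (b - a) * (x + 1.0) + a
    return 0.5 * (b - a) * np.sum(w * f(u))

# Two constant cause-specific hazards (illustrative values only)
h1 = lambda u: np.full_like(u, 0.10)          # cause of interest
h2 = lambda u: np.full_like(u, 0.05)          # competing cause
S = lambda u: np.exp(-(0.10 + 0.05) * u)      # overall survival, analytic here

t = 5.0
cif1 = integrate(lambda u: h1(u) * S(u), 0.0, t)
# analytic check for constant hazards: (h1/(h1+h2)) * (1 - exp(-(h1+h2) t))
cif1_exact = (0.10 / 0.15) * (1.0 - np.exp(-0.15 * t))
```

In the general parametric framework S(u) would itself come from a (nested) quadrature of the total hazard, rather than the closed form used here.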
Despite the extensive methodological development of joint models within the bio-
statistical literature in the past 20 years, there are many aspects which require further
work. With the growing availability of registry based data sources (Jutte et al., 2011),
comes the potential for more complex and rigorous investigations of the relationships
between biomarkers and how they relate to prognosis. For example, I am currently
investigating the added benefit of jointly modelling the relationship between systolic
blood pressure over time and the time to cardiovascular outcomes, in a cohort of patients
who have experienced their first cardiovascular disease event. This will be extended to
incorporate multiple biomarkers, such as serum cholesterol, where I need to account for
the expected correlation between biomarkers over time. This motivates the extension
to handle multiple biomarkers, and furthermore allowing for competing risks, where I
could investigate the different associations across event types, such as death from my-
ocardial infarction or stroke. Undertaking a Bayesian approach could be particularly
useful in this context, where there may be prior knowledge about the expected cor-
relation between biomarkers over time. Generalising the longitudinal outcomes to the
generalised linear mixed effects framework is also desirable, to accommodate categorical
and count data outcomes.
Finally, as I discussed in Chapter 9, the incorporation of delayed entry (left trun-
cation) into the joint model framework has received little attention in the joint model
literature (Piccorelli and Schluchter, 2012). If age is to be used as the timescale, which
can be of particular use when modelling observational data, the presence of random effects makes the calculation of the joint likelihood more complex when delayed entry is present, as the random effects are conditional on a patient's entry time (Jensen et al.,
2004; Rondeau et al., 2011). I have recently implemented this extension, motivated by
a current project in breast cancer, jointly modelling repeated measures of breast den-
sity and how they are associated with survival, where patients have a minimum of two
breast density recordings, and as such are not at risk of the event until the time of the second measurement.
11.6. Conclusion
In this thesis, I have developed a general framework for the simulation of survival
data, which can be utilised in a wide range of settings within the field of survival
analysis, providing an extensive approach to assess and validate assumptions through
simulation. Furthermore, the general parametric survival analysis model framework
developed in this thesis is widely applicable, providing the benefits of a parametric
approach to gain greater insights into clinical datasets, particularly through prediction.
As joint modelling becomes a recognised appropriate modelling technique for the com-
bined analysis of longitudinal and survival data, the areas of application will only grow,
and availability of the methods will become crucial. The importance of providing well
documented and user-friendly statistical software, in tandem with methodological developments, cannot be overstated. For new techniques and improved ways of analysing
data to permeate from methodological to applied research, the methods must be acces-
sible to statisticians and non-statisticians, alike.
APPENDIX A
Research Papers 1 and 2 - The Stata Journal and Statistics in
Medicine
This appendix contains two papers which developed methodology and software for
the simulation of simple and complex survival data, published in The Stata Journal in
2012, and Statistics in Medicine in 2013. They are omitted from the e-thesis due to
copyright.
APPENDIX B
Research Papers 3 and 4 - Journal of Statistical Software and
Statistics in Medicine
This appendix contains two papers which developed methodology and software for
the general parametric analysis of survival data; the first published in the Journal of
Statistical Software in 2013, and the second published in Statistics in Medicine in 2014.
The second paper is omitted due to copyright.
JSS Journal of Statistical Software, May 2013, Volume 53, Issue 12. http://www.jstatsoft.org/
stgenreg: A Stata Package for General Parametric
Survival Analysis
Michael J. Crowther
University of Leicester

Paul C. Lambert
University of Leicester
Abstract
In this paper we present the Stata package stgenreg for the parametric analysis of survival data. Any user-defined hazard function can be specified, with the model estimated using maximum likelihood utilising numerical quadrature. Models that can be fitted range from the Weibull proportional hazards model to the generalized gamma model, mixture models, cure rate models, accelerated failure time models and relative survival models. We illustrate the features of stgenreg through application to a cohort of women diagnosed with breast cancer with outcome all-cause death.
Keywords: survival analysis, parametric models, numerical quadrature, maximum likelihood, Stata.
1. Introduction
Parametric models remain a standard tool for the analysis of survival data. Through a fully parametric approach, we can not only obtain relative effects, such as hazard ratios in a proportional hazards model, but also clinically relevant absolute measures of risk, such as differences in survival proportions (Lambert, Dickman, Nelson, and Royston 2010). Parametric models are also useful where extrapolation is required, such as in the economic decision modelling framework (Weinstein et al. 2003).
The most popular tool for analysing survival data remains the Cox proportional hazards model (Cox 1972), which avoids making any assumptions for the shape of the baseline hazard function. One of the reasons the Cox model remains the preferred choice over parametric models is that standard parametric models available in standard software are often not flexible enough to capture the underlying shape of the hazard function seen in real data.
The traditional approach to estimation of parametric models is through maximum likelihood. This is relatively simple when using a known probability distribution function, such as the
Weibull or Gompertz. Many commonly used parametric survival models are implemented in a variety of software packages, such as the streg package in Stata (StataCorp. 2011), survreg (Therneau 2012) in R (R Core Team 2013) and LIFEREG in SAS (SAS Institute Inc. 2008). However, every parametric model has underlying assumptions, for example, the widely used Weibull proportional hazards model assumes a monotonically increasing or decreasing baseline hazard rate. Such assumptions can be considered restrictive, leading to the development of other more flexible parametric approaches (Royston and Parmar 2002; Royston and Lambert 2011).
In this paper we present the Stata command stgenreg which enables the user to fit general parametric models through specifying any baseline hazard function which can be written in a standard analytical form. This is implemented through numerical integration of the user-defined hazard function. This allows complex extensions to standard parametric models, for example, modelling the log baseline hazard function using splines or fractional polynomials, as well as complex time-dependent effects; methods that are unavailable in standard software. Time-varying covariates can also be incorporated through using multiple records per subject. We do not consider frailty (unobserved heterogeneity) in this article.
One of the key advantages of such a general framework for survival analysis is in the development of new models, for example in one line of code a parametric survival model can be fitted rather than having to directly program the likelihood evaluator.
2. Parametric survival analysis
Let $T^*_i$ be the true event time of patient $i = 1, \dots, n$, and $T_i = \min(T^*_i, C_i)$ the observed survival time, with $C_i$ the censoring time. Define an event indicator $d_i$, which takes the value of 1 if $T^*_i \leq C_i$ and 0 otherwise. We define the probability density function of $T^*_i$ as
$$f(t) = \lim_{\delta \to 0} \frac{P(t \leq T^* \leq t + \delta)}{\delta}$$
where $f(t)$ is the unconditional probability of an event occurring in the interval $(t, t + \delta)$. We define the hazard and survival functions as
$$h(t) = \lim_{\delta \to 0} \frac{P(t \leq T^* \leq t + \delta \mid T^* \geq t)}{\delta} \qquad \text{and} \qquad S(t) = P(T^* \geq t)$$
such that $h(t)$ is the instantaneous failure rate at time $t$, and $S(t)$ is the probability of 'surviving' longer than time $t$. This leads to
$$f(t) = h(t)S(t) \tag{1}$$
We can further write
$$H(t) = \int_0^t h(u)\,\mathrm{d}u, \qquad S(t) = \exp\{-H(t)\} \tag{2}$$
where $H(t)$ is the cumulative hazard function. When the integral in Equation 2 is analytically intractable, we can use numerical integration techniques to derive the cumulative hazard and thus still calculate the survival function.
Journal of Statistical Software 3
2.1. Maximum likelihood estimation
The log-likelihood contribution of the i-th patient, allowing for right censoring and delayed entry (left truncation), using Equation 1 can be written as
l_i = log{ f(t_i)^{d_i} S(t_i)^{1−d_i} / S(t_{0i}) } = d_i log f(t_i) + (1 − d_i) log S(t_i) − log S(t_{0i})   (3)
where t_{0i} and t_i are the observed entry and survival/censoring times for the i-th patient. If delayed entry is not present then the third term in Equation 3 can be dropped. Using Equation 3 we can directly maximize the log-likelihood if using known probability density and survival functions. Alternatively, using Equation 1 we can write
l_i = log{ h(t_i)^{d_i} S(t_i) / S(t_{0i}) } = d_i log h(t_i) + log S(t_i) − log S(t_{0i})
and substituting Equation 2 this becomes
l_i = d_i log h(t_i) − ∫_{t_{0i}}^{t_i} h(u) du   (4)
We note from Equation 4 that the likelihood can also be maximized if only the hazard function is known. Of course, in standard parametric models all three functions are known; however, given that the hazard function is often of most interest, specifying a complex hazard function can be advantageous. The maximization of such a hazard model relies on being able to evaluate the integral in Equation 4. If we propose to use functions such as fractional polynomials or splines to model a complex baseline hazard function, or to incorporate complex time-dependent effects, then this integral cannot always be evaluated analytically, motivating alternative approaches.
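The hazard-only likelihood of Equation 4 can be sketched numerically. The following Python snippet (an illustrative sketch, not the stgenreg implementation) evaluates the log-likelihood from a hazard function alone, replacing Gauss-Legendre quadrature with composite Simpson's rule for brevity, and checks it against the closed form for an assumed Gompertz hazard:

```python
import math, random

# Minimal sketch: evaluate the Eq. 4 log-likelihood using only a hazard
# function, with numerical integration of the cumulative hazard.
# The Gompertz hazard h(t) = lam * exp(gam * t) is an illustrative choice.

def cum_hazard(h, t0, t1, m=100):
    """Composite Simpson's rule for the integral of h over [t0, t1]."""
    if t1 <= t0:
        return 0.0
    step = (t1 - t0) / (2 * m)
    total = h(t0) + h(t1)
    for i in range(1, 2 * m):
        total += (4 if i % 2 else 2) * h(t0 + i * step)
    return total * step / 3

def loglik(lam, gam, times, events):
    """Sum of d_i * log h(t_i) minus the integral of h over [0, t_i]."""
    h = lambda t: lam * math.exp(gam * t)
    return sum(d * math.log(h(t)) - cum_hazard(h, 0.0, t)
               for t, d in zip(times, events))

# Simulate right-censored Gompertz data with lam = 0.5, gam = 0.4
random.seed(1)
times, events = [], []
for _ in range(500):
    t = math.log(1 + 0.4 * random.expovariate(1) / 0.5) / 0.4
    c = random.uniform(0, 4)
    times.append(min(t, c))
    events.append(1 if t <= c else 0)

# Agrees with the closed form d*log h(t) + log S(t), where
# H(t) = lam * (exp(gam * t) - 1) / gam and S(t) = exp(-H(t)).
closed = sum(d * math.log(0.5 * math.exp(0.4 * t))
             - 0.5 * (math.exp(0.4 * t) - 1) / 0.4
             for t, d in zip(times, events))
print(abs(loglik(0.5, 0.4, times, events) - closed) < 1e-6)
```

The numeric and closed-form log-likelihoods agree to high precision, so the hazard-only formulation can be handed directly to any maximizer.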
2.2. Numerical integration
We propose to use numerical quadrature to evaluate the cumulative hazard, and hence maximize the likelihood in Equation 4, allowing the user to estimate a parametric survival model specifying any function for the baseline hazard that satisfies h(t) > 0 for all t > 0.
Gaussian quadrature allows us to evaluate an analytically intractable integral through a weighted sum of a function evaluated at a set of pre-defined points, known as nodes (Stoer and Bulirsch 2002). We have
∫_{−1}^{1} g(x) dx = ∫_{−1}^{1} W(x) g(x) dx ≈ Σ_{i=1}^{m} w_i g(x_i)
where W(x) is a known weighting function and g(x) can be approximated by a polynomial function. The integral over [t_{0i}, t_i] in Equation 4 must be changed to an integral over [−1, 1]
4 stgenreg: General Parametric Survival Analysis in Stata
using the following rule:

∫_{t_{0i}}^{t_i} h(x) dx = (t_i − t_{0i})/2 × ∫_{−1}^{1} h( (t_i − t_{0i})x/2 + (t_{0i} + t_i)/2 ) dx
                         ≈ (t_i − t_{0i})/2 × Σ_{j=1}^{m} w_j h( (t_i − t_{0i})x_j/2 + (t_{0i} + t_i)/2 )
This transformation allows the incorporation of delayed entry quite simply. The form of Gaussian quadrature depends on the choice of weighting function. The default within stgenreg is Gauss-Legendre quadrature, with weighting function W(x) = 1.
The accuracy of the numerical integral depends on the number of quadrature nodes, m, with node locations dependent on the type of quadrature chosen. As with all methods which use numerical integration, the stability of the maximum likelihood estimates should be established by using an increasing number of quadrature nodes.
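The interval transformation and node-stability check above can be illustrated with a short Python sketch (not stgenreg's Mata code), computing Gauss-Legendre nodes by Newton's method on the Legendre recurrence:

```python
import math

# Illustrative sketch of m-point Gauss-Legendre quadrature mapped from
# [-1, 1] to [t0, t1]; nodes/weights computed from scratch for clarity.

def leggauss(m):
    """Nodes and weights of m-point Gauss-Legendre quadrature on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))   # initial guess
        for _ in range(100):
            # Evaluate P_m(x) and its derivative via the recurrence relation
            p0, p1 = 1.0, x
            for n in range(2, m + 1):
                p0, p1 = p1, ((2 * n - 1) * x * p1 - (n - 1) * p0) / n
            dp = m * (x * p1 - p0) / (x * x - 1)
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2 / ((1 - x * x) * dp * dp))
    return nodes, weights

def cum_hazard(h, t0, t1, m):
    """Approximate the integral of h over [t0, t1] after mapping to [-1, 1]."""
    nodes, weights = leggauss(m)
    half, mid = (t1 - t0) / 2, (t0 + t1) / 2
    return half * sum(w * h(half * x + mid) for x, w in zip(nodes, weights))

# Node-stability check for a Weibull hazard: the true value of the
# integral of 1.3 * t**0.3 over [0, 2] is 2**1.3.
h = lambda t: 1.3 * t ** 0.3
for m in (5, 15, 30):
    print(m, round(cum_hazard(h, 0.0, 2.0, m), 6))
```

Re-running with an increasing m and comparing successive results mirrors the stability check recommended above.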
2.3. Time-dependent effects and time-varying covariates
The presence of non-proportional hazards, i.e., time-dependent effects, is common in the analysis of time-to-event data (Jatoi, Anderson, Jeong, and Redmond 2011). This is frequently observed in registry data sources where follow-up often extends over many years (Lambert et al. 2011). Similarly, time-dependent treatment effects are also observed in clinical trials (Mok et al. 2009). Time-dependent effects are incorporated seamlessly into our modelling framework by allowing the user to interact any covariate with a specified function of time. We illustrate this in Section 4.2.1.
Time-varying covariates are a further commonly observed scenario in the analysis of survival data, where the value of a covariate for an individual patient can change at various points during follow-up. For example, in oncology clinical trials patients will often switch treatment group when their condition progresses (Morden, Lambert, Latimer, Abrams, and Wailoo 2011), or biomarkers may be measured repeatedly over time, resulting in multiple records per subject. For this form of analysis the data are often set up as start and stop times, and since delayed entry (left truncation) is allowed, this is again incorporated into the described modelling framework. We illustrate this through example in Section 4.4.
3. The Stata package stgenreg
The Stata package stgenreg is implemented as three Stata ado files. The primary shell program, stgenreg.ado, handles the syntax options for the package and then calls the likelihood evaluator program stgenreg_d0.ado, described in Section 3.1. Finally, a variety of predictions can be obtained following estimation of a model using Stata's predict command, which calls the program stgenreg_pred.ado, described in Section 3.2.
3.1. Program implementation and syntax
The log-likelihood shown in Equation 4 is maximized using the Newton-Raphson algorithm, with first and second derivatives estimated numerically, as implemented in the ml command in Stata (Gould, Pitblado, and Poi 2010). As described in Section 2.2, the integral in Equation 4 is evaluated using m-point Gaussian quadrature.
The evaluator program has been optimized using Stata's matrix programming language, Mata. This provides computational benefits and access to the wide array of mathematical functions available for the user to specify in the hazard function. In addition, we have implemented specific functions which allow the incorporation of restricted cubic splines or fractional polynomials into the hazard or log hazard function (Durrleman and Simon 1989; Royston and Altman 1994).
When using stgenreg, one of the options loghazard() or hazard() must be defined. These specify a user-defined log hazard or hazard function. The function must be defined in Mata code, with parameters specified in square brackets, for example [ln_lambda]. The use of Mata means that mathematical operations require a colon (:) prefix, for example :+ instead of +. Time must be coded as #t. The user can specify covariates or functions of time within the linear predictor of any parameter, providing a highly flexible framework.
For example, we can specify a Weibull distribution using either the log hazard or the hazard function. Each parameter is defined on the entire real line; here both λ and γ are restricted to be positive by modelling on the log scale.
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] ///
> :+ (exp([ln_gamma]) :- 1) :* log(#t))
. stgenreg, hazard(exp([ln_lambda]) :* exp([ln_gamma]) :* ///
> #t :^ (exp([ln_gamma]) :- 1))
A linear predictor can be defined for any of the parameters, with the name of the option being the name of the parameter specified in the loghazard() or hazard() option. For example, a proportional hazards Weibull model can be fitted with covariates treatment, age and sex by adding the option ln_lambda(treatment age sex).
One of the key advantages of stgenreg is that we can incorporate a variety of functions (including functions of time) into the linear predictor of any parameter. For example, parameter [ln_lambda] has an available option ln_lambda(comp1 | comp2 | ... | compn), which can contain a variety of component functions to increase complexity. Each compj can contain a variety of the functions described in Table 1.
Additionally, excess mortality (relative survival) models (Nelson, Lambert, Squire, and Jones 2007) can be fitted by use of the bhazard(varname) option. In these models a known expected mortality rate, h*(t), is included in the model as follows:
h(t) = h∗(t) + λ(t)
Here the loghazard() and hazard() options refer to the modelling of λ(t). Note that it is the expected mortality rate at the event time that needs to be supplied to the bhazard() option.
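The role of the known expected rate in the likelihood can be sketched in a few lines of Python (an illustrative toy, not stgenreg internals): the cumulative expected hazard does not involve the model parameters, so dropping it shifts the log-likelihood by a constant and leaves the maximum unchanged. The toy excess hazard λ(t) = θt and the constant expected rate below are assumptions for display only.

```python
import math

# Toy excess-mortality log-likelihood contribution: overall hazard is
# h(t) = hstar(t) + lam(t), with hstar known. Only lam(t) needs integrating.

def loglik_i(theta, t, d, hstar, cum_hstar, drop_expected=True):
    lam = theta * t                       # excess hazard at the event time
    cum_lam = theta * t ** 2 / 2          # closed-form integral over [0, t]
    ll = d * math.log(hstar(t) + lam) - cum_lam
    if not drop_expected:
        ll -= cum_hstar(t)                # parameter-free term
    return ll

hstar = lambda u: 0.01                    # assumed constant expected rate
cum_hstar = lambda t: 0.01 * t

# The two versions differ by the same constant at every parameter value,
# so they share the same maximizer.
t, d = 3.0, 1
diffs = {round(loglik_i(th, t, d, hstar, cum_hstar, False)
               - loglik_i(th, t, d, hstar, cum_hstar, True), 10)
         for th in (0.1, 0.5, 1.0)}
print(diffs)
```

The set contains a single value, confirming the difference is constant in θ.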
Finally, all standard options of the ml suite in Stata can be used when fitting a stgenreg model, such as constraints(), which allows the user to constrain the value of any coefficient to a particular constant.
3.2. Predictions
A variety of predictions can be obtained following the estimation of a model. These include the hazard, survival and cumulative hazard functions.
Component — Description

varlist [, nocons] — The user may specify a standard variable list within a component section, with an optional nocons option.

g(#t) — Any user-defined function of #t written in Mata code, for example #t:^2.

#rcs(options) — Creates restricted cubic splines of either log time or time. Options include df(int), the number of degrees of freedom; noorthog, which turns off the default orthogonalisation; time, which creates splines using time rather than log time (the default); and offset(varname), to include an offset when calculating the splines. See rcsgen in Stata for more details.

#fp(numlist [, options]) — Creates fractional polynomials of time with powers defined in numlist. If 0 is specified, log time is generated. The only current option is offset(), which is consistent with that described for #rcs() above.

varname :* f(#t) — Includes time-dependent effects, where f(#t) is one of #rcs(), #fp() or g().

Table 1: Description of each component that can be included in the linear predictor of a parameter.
The standard Stata syntax to obtain predictions following a model fit is as follows
. predict newvarname, statistic
So for example, to obtain the fitted survival, hazard and cumulative hazard functions
. predict surv1, survival
. predict haz1, hazard
. predict cumhaz1, cumhazard
Extended prediction options unavailable in standard software include: zeros, which obtains baseline predictions; at(), which obtains predictions at specified covariate patterns; and timevar(), which obtains predictions at specified times. These options can be combined with the standard choices of hazard, cumhazard and survival. Finally, the ci option can be used to obtain confidence intervals.
4. Analysis of example datasets using stgenreg
We illustrate stgenreg through use of a dataset comprising 9721 women aged under 50 and diagnosed with breast cancer in England and Wales between 1986 and 1990. The event of interest is death from any cause, with follow-up restricted to 5 years. Deprivation was categorized into 5 levels; however, we have restricted the analyses to comparing the most affluent and most deprived groups, for illustrative purposes. We therefore only consider a binary covariate, dep5, with 0 for the most affluent and 1 for the most deprived group.
We further illustrate how to incorporate a time-varying covariate through use of a dataset of 488 patients with liver cirrhosis (Andersen, Borgan, Gill, and Keiding 1993). A total of 251 patients were randomized to receive prednisone, with 237 randomized to receive placebo. Prothrombin index was measured repeatedly, with between 1 and 17 measurements per subject, resulting in 2968 observations. The outcome was all-cause death.
4.1. Weibull proportional hazards model
We begin by fitting a Weibull proportional hazards model to the breast cancer dataset, investigating the effect of deprivation status. Given that Weibull models are available in all standard statistical software, we first illustrate the concept by showing that the estimates agree with estimates derived using analytically tractable definitions of the hazard and survival functions. The baseline hazard and log hazard functions have the following form
h(t) = λγt^(γ−1) exp(βX)
and
log(h(t)) = log(λ) + log(γ) + (γ − 1) log(t) + βX
where X is a vector of covariates, with corresponding regression coefficients β. In this case it is convenient to use the loghazard() option of stgenreg. We can investigate covariate effects by including deprivation status in the linear predictor of log(λ), using the option ln_lambda().
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ ///
> (exp([ln_gamma]) :- 1) :* log(#t)) nodes(30) ln_lambda(dep5)
Log likelihood = -8808.149 Number of obs = 9721
----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ln_lambda |
dep5 | .2698633 .0392017 6.88 0.000 .1930293 .3466972
_cons | -2.824814 .0370151 -76.32 0.000 -2.897362 -2.752265
-----------+----------------------------------------------------------------
ln_gamma |
_cons | .0464514 .0179823 2.58 0.010 .0112068 .081696
----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
We observe a log hazard ratio of 0.270 (95% CI: 0.193, 0.347) and consequently a hazard ratio of 1.310 (95% CI: 1.213, 1.414), indicating a 31% increase in the mortality rate in the most deprived group compared with the most affluent. We could further adjust the γ parameter by deprivation status by adding the option ln_gamma(dep5).
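As a quick arithmetic check, the quoted hazard ratio and confidence interval can be reproduced from the reported coefficient and standard error by exponentiating (a small Python sketch):

```python
import math

# Back-transforming the reported log hazard ratio and standard error
# (0.2698633, 0.0392017) to the hazard-ratio scale quoted in the text.
b, se = 0.2698633, 0.0392017
hr = math.exp(b)
lo, hi = math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)
print(round(hr, 3), round(lo, 3), round(hi, 3))
```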
When fitting models which rely on numerical integration, it is important to establish the stability of the maximum likelihood estimates by using an increasing number of quadrature nodes. In the case of a Weibull proportional hazards model, we can both compare with the optimized model using streg in Stata, and compare across an increasing number of quadrature nodes. Here we present results from fitting the streg model and stgenreg models with 15, 30, 50 and 100 nodes.
-----------------------------------------------------------------------------
Variable | streg stgenreg15 stgenreg30 stgenreg50 stgenreg100
-----------+-----------------------------------------------------------------
#1 |
dep5 | .2698715 .26983514 .26986326 .26986899 .26987095
| .0392017 .03920178 .03920173 .03920172 .03920171
_cons | -2.8252423 -2.8232443 -2.8248136 -2.8251059 -2.8252139
| .03694985 .03718485 .03701515 .03697471 .03695639
-----------+-----------------------------------------------------------------
#2 |
_cons | .04673335 .04542627 .04645138 .04664313 .04671442
| .01792781 .01812554 .01798227 .01794843 .0179332
-----------+-----------------------------------------------------------------
Statistics |
ll | -8808.0854 -8808.3461 -8808.149 -8808.1075 -8808.0906
-----------------------------------------------------------------------------
We obtain parameter estimates consistent to 3 decimal places with 30 nodes, and accuracy improves as the number of nodes is increased. However, computation time will also increase with the number of nodes; for example, using 15 nodes takes 7.4 seconds compared with 12.4 seconds using 100 nodes (on an HP laptop with an Intel i5 2.5GHz processor and 8GB of RAM). In comparison, the fully optimized streg model took 0.4 seconds to converge. This difference is to be expected, as the stgenreg formulation of the Weibull model is not the most computationally efficient: there is no need to use numerical integration for the standard Weibull model.
4.2. Restricted cubic spline proportional hazards model
We now introduce a much more flexible proportional hazards survival model, modelling the baseline log hazard function using restricted cubic splines of log(time). We formulate the baseline log hazard function as
log(h(t)) = s(log(t)) +Xβ (5)
where s(log(t)) is a restricted cubic spline function of log(t). This can be implemented by using the #rcs component option. We use the default knot locations, based on the centiles of the distribution of uncensored survival times.
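A restricted cubic spline basis of this kind can be sketched in a few lines of Python. This is an illustrative construction in the style of Durrleman and Simon (1989), with hand-picked knot values and no orthogonalisation (stgenreg orthogonalises by default); the key property is that the fitted curve is constrained to be linear beyond the boundary knots.

```python
import math

# Illustrative restricted cubic spline basis on the log-time scale;
# knot values are arbitrary choices for display, not the paper's knots.

def rcs_basis(x, knots):
    """Basis [s1, ..., s_{K-1}] for K knots; linear beyond boundary knots."""
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: max(u, 0.0) ** 3          # truncated cubic (u)_+^3
    basis = [x]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        basis.append(pos(x - kj) - lam * pos(x - kmin)
                     - (1 - lam) * pos(x - kmax))
    return basis

knots = [math.log(t) for t in (0.1, 0.9, 2.0, 5.0)]   # illustrative knots
# Beyond the last knot every basis function is linear, so equally spaced
# x values give equal successive differences in each basis column.
xs = [knots[-1] + 0.5 * i for i in (1, 2, 3)]
cols = list(zip(*[rcs_basis(x, knots) for x in xs]))
print(all(abs((c[2] - c[1]) - (c[1] - c[0])) < 1e-9 for c in cols))
```

The check prints True, confirming linearity beyond the upper boundary knot; below the lower boundary knot all non-linear basis terms are exactly zero.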
This draws parallels with the flexible parametric model of Royston and Parmar (2002), implemented in Stata as the stpm2 command (Royston and Lambert 2011), which uses restricted cubic splines to model the log cumulative hazard function
log(H(t)) = s(log(t)) +Xβ (6)
Figure 1: Predicted hazard function for the most affluent group with 95% confidence interval.
An advantage of modelling on the log hazard scale is that when there are multiple time-dependent effects, the interpretation of the time-dependent hazard ratios is simplified, as they do not depend on the values of other covariates, which is the case when modelling on the cumulative hazard scale (Royston and Lambert 2011).
We apply the model in Equation 5 with 5 degrees of freedom, i.e., 4 internal knots placed at the 20th, 40th, 60th and 80th percentiles of the distribution of log event times, and 2 boundary knots placed at the 0th and 100th percentiles.
. stgenreg, loghazard([xb]) xb(dep5 | #rcs(df(5))) nodes(30)
Log likelihood = -8756.2213 Number of obs = 9721
-----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+--------------------------------------------------------------
dep5 | .2693634 .0392018 6.87 0.000 .1925293 .3461976
_eq1_cp2_rcs1 | -.0621779 .0274602 -2.26 0.024 -.1159989 -.008357
_eq1_cp2_rcs2 | .0784834 .0192975 4.07 0.000 .0406611 .1163057
_eq1_cp2_rcs3 | .1158689 .0176746 6.56 0.000 .0812272 .1505106
_eq1_cp2_rcs4 | -.0251518 .0143719 -1.75 0.080 -.0533202 .0030165
_eq1_cp2_rcs5 | .0012793 .0134076 0.10 0.924 -.0249991 .0275576
_cons | -2.910463 .0607005 -47.95 0.000 -3.029434 -2.791492
-----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
When using the component options, stgenreg will create variables labelled by the equation number (indexed from left to right in the log hazard or hazard specification) and the component number (again counting from left to right in each parameter option). So variables _eq1_cp2_* contain the spline basis variables defined by the #rcs(df(5)) component. The estimate of the log hazard ratio for the effect of deprivation is very similar to the Weibull-based estimate; however, we have now estimated 6 parameters to model the baseline hazard function: an intercept and 5 parameters associated with the spline terms. We can obtain the predicted baseline hazard function and 95% confidence interval as follows
. predict haz1, hazard ci zeros
We illustrate the fitted baseline hazard function in Figure 1.
Time-dependent effects
We now investigate the presence of a time-dependent effect of deprivation status. Within the framework of restricted cubic splines, this can be investigated using the component form varname :* #rcs(df(num)), i.e., an interaction between the effect of time (using splines) and the deprivation group. We use 3 degrees of freedom for illustration.
. stgenreg, loghazard([xb]) nodes(30) ///
> xb(dep5 | #rcs(df(5)) | dep5 :* #rcs(df(3)))
Log likelihood = -8747.3275 Number of obs = 9721
-----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+--------------------------------------------------------------
dep5 | .0723415 .0924005 0.78 0.434 -.1087602 .2534433
_eq1_cp2_rcs1 | -.0108058 .0309504 -0.35 0.727 -.0714673 .0498558
_eq1_cp2_rcs2 | .0672877 .0224852 2.99 0.003 .0232177 .1113578
_eq1_cp2_rcs3 | .1128672 .0207167 5.45 0.000 .0722634 .1534711
_eq1_cp2_rcs4 | -.0261438 .0145455 -1.80 0.072 -.0546525 .002365
_eq1_cp2_rcs5 | .0014202 .0134079 0.11 0.916 -.0248589 .0276992
_eq1_cp3_rcs1 | -.1464002 .0443983 -3.30 0.001 -.2334194 -.0593811
_eq1_cp3_rcs2 | .0425164 .0333753 1.27 0.203 -.022898 .1079307
_eq1_cp3_rcs3 | .0135896 .0322604 0.42 0.674 -.0496396 .0768187
_cons | -2.849318 .0649361 -43.88 0.000 -2.976591 -2.722046
-----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
In Figure 2 we compare the fit of the models with either time-independent or time-dependent hazard ratios for deprivation status, by overlaying the fitted survival functions on the Kaplan-Meier curves for each deprivation group. We observe a much improved fit to the Kaplan-Meier curves when modelling the time-dependent effect of deprivation group. We can predict the time-dependent hazard ratio using the partpred (Lambert 2010) command as follows.
Figure 2: Kaplan-Meier estimates for the most affluent and most deprived groups, with predicted survival overlaid. The figure on the left shows predicted survival with a proportional effect of deprivation status; the figure on the right allows for non-proportional hazards in the effect of deprivation status.
Figure 3: The estimated time-dependent hazard ratio for deprivation group, with associated 95% confidence interval.
. partpred hr, for(dep5 _eq1_cp3*) ci(hr_uci hr_lci) eform
This is then plotted in Figure 3, which shows that the relative increase in the mortality rate is much larger at the start of follow-up and decreases to around one by 5 years.
4.3. Generalized gamma proportional hazards model
The generalized gamma (GG) is a 3-parameter parametric model implemented in a variety of statistical packages (Cox, Chu, Schneider, and Munoz 2007); however, in Stata it is parameterized as an accelerated failure time model. We can write the survival and density functions as

S_GG(t) = 1 − I(γ, u)   if κ > 0
          1 − Φ(z)      if κ = 0
          I(γ, u)       if κ < 0          (7)

and

f_GG(t) = γ^γ exp(z√γ − u) / (σ t √γ Γ(γ))   if κ ≠ 0
          exp(−z²/2) / (σ t √(2π))            if κ = 0          (8)

where γ = |κ|^(−2), z = sign(κ)(log(t) − μ)/σ, u = γ exp(|κ|z), Φ(·) is the standard normal cumulative distribution function, Γ(·) is the gamma function, and I(a, x) is the regularized incomplete gamma function.
Therefore, using Equation 1, we can write down our baseline hazard function as the ratio of the probability density function to the survival function:

h_GG(t) = f_GG(t) / S_GG(t)
To invoke proportional hazards we can then simply multiply by the exponential of a further parameter, whose linear predictor contains our vector of covariates:

h_GG(t) = {f_GG(t) / S_GG(t)} exp(Xβ)   or   log(h_GG(t)) = log{f_GG(t) / S_GG(t)} + Xβ
where β is a vector of log hazard ratios. In terms of implementation, in the linear predictor for our Xβ parameter we must specify the nocons option to ensure no intercept term, obtaining a proportional hazards formulation of the GG model. As this is a complex function, we can use Stata's local macros to build it up.
. local mu [mu]
. local sigma exp([ln_sigma])
. local kappa [kappa]
. local gamma (abs(`kappa') :^ (-2))
. local z (sign(`kappa') :* (log(#t) :- `mu') :/ (`sigma'))
. local u ((`gamma') :* exp(abs(`kappa') :* (`z')))
. local surv1 (1 :- gammap(`gamma',`u')) :* (`kappa' :> 0)
. local surv2 (1 :- normal(`z')) :* (`kappa' :== 0)
. local surv3 gammap(`gamma',`u') :* (`kappa' :< 0)
. local pdf1 ((`gamma' :^ `gamma') :* exp(`z' :* sqrt(`gamma') :- `u') :/ ///
> (`sigma' :* #t :* sqrt(`gamma') :* gamma(`gamma'))) :* (`kappa' :!= 0)
. local pdf2 (exp(-(`z' :^ 2) :/ 2) :/ (`sigma' :* #t :* sqrt(2 :* pi())))///
> :* (`kappa' :== 0)
. local haz (`pdf1' :+ `pdf2') :/ (`surv1' :+ `surv2' :+ `surv3')
. stgenreg, hazard(exp([xb]) :* (`haz')) nodes(30) xb(dep5,nocons)
Log likelihood = -8801.2754 Number of obs = 9721
----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+--------------------------------------------------------------
xb |
dep5 | .2694578 .0391992 6.87 0.000 .1926289 .3462868
-------------+--------------------------------------------------------------
kappa |
_cons | .6752793 .0749985 9.00 0.000 .528285 .8222735
-------------+--------------------------------------------------------------
mu |
_cons | 2.710497 .032793 82.65 0.000 2.646224 2.774771
-------------+--------------------------------------------------------------
ln_sigma |
_cons | .1727204 .0521935 3.31 0.001 .0704231 .2750178
----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
Once again we obtain very similar estimates to the Weibull model, but now modelling the baseline with 3 parameters. This formulation illustrates a powerful tool: by simply introducing an extra parameter we can implement a model not available in any standard software package.
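The hazard built from Equations 7 and 8 can be checked numerically. Below is a Python sketch (not the Mata code above) with a basic implementation of the regularized lower incomplete gamma function; the parameter values are illustrative choices near the fitted estimates, and the sanity check verifies that f/S agrees with −d/dt log S by finite differences.

```python
import math

# Sketch of the generalized gamma hazard from Equations 7 and 8.
# gammainc_p is a standard series/continued-fraction implementation of
# the regularized lower incomplete gamma function P(a, x).

def gammainc_p(a, x, eps=1e-14):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0:
        return 0.0
    lg = math.lgamma(a)
    if x < a + 1:                          # series representation
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * eps:
            n += 1
            term *= x / n
            total += term
        return total * math.exp(-x + a * math.log(x) - lg)
    tiny = 1e-300                          # continued fraction for Q(a, x)
    b, c, d = x + 1 - a, 1 / tiny, 1 / (x + 1 - a)
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return 1 - math.exp(-x + a * math.log(x) - lg) * h

def gg_surv(t, mu, sigma, kappa):
    """S_GG(t) of Equation 7."""
    z = (math.log(t) - mu) / sigma
    if kappa == 0:
        return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    g = abs(kappa) ** -2
    u = g * math.exp(abs(kappa) * math.copysign(1, kappa) * z)
    p = gammainc_p(g, u)
    return 1 - p if kappa > 0 else p

def gg_hazard(t, mu, sigma, kappa):
    """f_GG(t) / S_GG(t) from Equations 7 and 8."""
    z = (math.log(t) - mu) / sigma
    if kappa == 0:
        pdf = math.exp(-z * z / 2) / (sigma * t * math.sqrt(2 * math.pi))
    else:
        g = abs(kappa) ** -2
        zs = math.copysign(1, kappa) * z
        u = g * math.exp(abs(kappa) * zs)
        pdf = (g ** g * math.exp(zs * math.sqrt(g) - u)
               / (sigma * t * math.sqrt(g) * math.gamma(g)))
    return pdf / gg_surv(t, mu, sigma, kappa)

# Sanity check: f/S agrees with -d/dt log S by central finite differences.
mu, sigma, kappa = 2.71, math.exp(0.173), 0.675   # illustrative values
t, eps = 2.0, 1e-5
fd = -(math.log(gg_surv(t + eps, mu, sigma, kappa))
       - math.log(gg_surv(t - eps, mu, sigma, kappa))) / (2 * eps)
print(abs(gg_hazard(t, mu, sigma, kappa) - fd) / fd < 1e-5)
```

This internal-consistency check is a useful safeguard whenever a hazard is assembled by hand from density and survival pieces, as in the Mata macros above.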
4.4. Time-varying covariates
We now illustrate the data setup required for a survival analysis incorporating a time-varying covariate, using the liver cirrhosis dataset described above. Here we use the enter() and id() options of stset in Stata to declare the data as multiple records per subject.
. stset stop, enter(start) id(id) failure(event=1)
id: id
failure event: event == 1
obs. time interval: (stop[_n-1], stop]
enter on or after: time start
exit on or before: failure
---------------------------------------------------------------------------
2968 total obs.
0 exclusions
---------------------------------------------------------------------------
2968 obs. remaining, representing
488 subjects
292 failures in single failure-per-subject data
1777.749 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 13.39393
We illustrate the data structure for 2 patients, where _t0 represents the entry times at which prothrombin was measured.
. list id pro trt _t0 _t _d if id==1 | id==111, noobs sepby(id)
+-----------------------------------------------------+
| id pro trt _t0 _t _d |
|-----------------------------------------------------|
| 1 38 placebo 0 .2436754 0 |
| 1 31 placebo .2436754 .38057169 0 |
| 1 27 placebo .38057169 .41342679 1 |
|-----------------------------------------------------|
| 111 59 prednisone 0 .24641332 0 |
| 111 60 prednisone .24641332 .49830249 0 |
| 111 87 prednisone .49830249 .74471581 0 |
| 111 59 prednisone .74471581 1.1280254 0 |
| 111 35 prednisone 1.1280254 1.1581426 1 |
+-----------------------------------------------------+
We can now fit a stgenreg model using restricted cubic splines to model the baseline, adjusting for the proportional effects of treatment and prothrombin index.
. stgenreg, loghazard([xb]) xb(pro trt | #rcs(df(3))) nolog
Variables _eq1_cp2_rcs1 to _eq1_cp2_rcs3 were created
Log likelihood = -588.17466 Number of obs = 2968
----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+-------------------------------------------------------------
pro | -.0349754 .0024771 -14.12 0.000 -.0398304 -.0301205
trt | .1325576 .1182068 1.12 0.262 -.0991235 .3642388
_eq1_cp2_rcs1 | -.091006 .0579785 -1.57 0.116 -.2046419 .0226298
_eq1_cp2_rcs2 | -.1354551 .0431334 -3.14 0.002 -.219995 -.0509151
_eq1_cp2_rcs3 | -.2292129 .0499583 -4.59 0.000 -.3271295 -.1312964
_cons | .7376377 .1690535 4.36 0.000 .4062988 1.068977
----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 15 nodes
We observe a log hazard ratio of −0.035 (95% CI: −0.040, −0.030), indicating that lower values of the biomarker are associated with an increased risk of death.
Alternatively, stgenreg can be used in conjunction with Stata's stsplit command to create the at-risk time intervals.
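The start/stop data structure shown in the listing above can be sketched in Python (an illustrative helper, not part of stgenreg or stset): each biomarker measurement opens a risk interval that closes at the next measurement or at exit, with the event indicator carried only on the final interval.

```python
# Illustrative construction of start/stop records for one subject with a
# time-varying covariate; names and the example times are abbreviations
# of the patient-1 listing above.

def to_start_stop(meas_times, values, exit_time, event):
    """Return (start, stop, value, d) records for one subject."""
    records = []
    for j, (t, v) in enumerate(zip(meas_times, values)):
        stop = meas_times[j + 1] if j + 1 < len(meas_times) else exit_time
        d = event if stop == exit_time else 0   # event only on last interval
        records.append((t, stop, v, d))
    return records

recs = to_start_stop([0.0, 0.244, 0.381], [38, 31, 27], 0.413, 1)
for r in recs:
    print(r)
```

Each record supplies the delayed-entry time _t0 (start), the exit time _t (stop), the covariate value in force over that interval, and the event indicator _d.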
5. Discussion
We have presented the stgenreg command in Stata, for the general parametric analysis of survival data. Through specification of a user-defined hazard function, we have illustrated how to implement standard proportional hazards models, novel restricted cubic spline survival models and a generalized gamma model with proportional hazards. In essence, stgenreg may be used to implement a parametric survival model defined by anything from a very simple one-parameter proportional hazards model, to models which contain highly flexible functions of time, for both the baseline and time-dependent effects. Any parameter defined in the hazard function can depend on complex functions of time, including fractional polynomials or restricted cubic splines.
The choice of the number of quadrature nodes is left to the user. An increasing number of quadrature nodes should be used to establish that the parameter estimates are stable.
As stgenreg is a general framework, it may not be the most computationally efficient tool; however, it is useful for the development of novel models. For example, it can be used to develop ideas and test new models, before time is spent developing more computationally efficient implementations for specific cases.
In future developments we aim to allow for interval censoring, the extension to incorporate frailty, and a post-estimation command to calculate the cumulative incidence function for competing risks. The package is available from the Statistical Software Components archive (Crowther and Lambert 2013) and can be installed from Stata by typing ssc install stgenreg.
Acknowledgments
Michael Crowther was funded by a National Institute for Health Research (NIHR) Doctoral Fellowship (DRF-2012-05-409).
The authors would like to thank two anonymous reviewers and an editor, whose comments greatly improved the paper.
References
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993). Statistical Models Based on Counting Processes. Springer-Verlag.

Cox C, Chu H, Schneider MF, Munoz A (2007). "Parametric Survival Analysis and Taxonomy of Hazard Functions for the Generalized Gamma Distribution." Statistics in Medicine, 26(23), 4352–4374.

Cox DR (1972). "Regression Models and Life-Tables." Journal of the Royal Statistical Society B, 34(2), 187–220.

Crowther MJ, Lambert P (2013). "stgenreg: Stata Module to Fit General Parametric Survival Models." Statistical Software Components, Boston College Department of Economics. URL http://ideas.repec.org/c/boc/bocode/s457579.html.

Durrleman S, Simon R (1989). "Flexible Regression Models with Cubic Splines." Statistics in Medicine, 8(5), 551–561.

Gould W, Pitblado J, Poi B (2010). Maximum Likelihood Estimation with Stata. 4th edition. Stata Press.

Jatoi I, Anderson WF, Jeong JH, Redmond CK (2011). "Breast Cancer Adjuvant Therapy: Time to Consider Its Time-Dependent Effects." Journal of Clinical Oncology, 29(17), 2301–2304.

Lambert P (2010). "partpred: Stata Module to Generate Partial Predictions." Statistical Software Components, Boston College Department of Economics. URL http://ideas.repec.org/c/boc/bocode/s457176.html.

Lambert PC, Dickman PW, Nelson CP, Royston P (2010). "Estimating the Crude Probability of Death due to Cancer and other Causes using Relative Survival Models." Statistics in Medicine, 29(7-8), 885–895.

Lambert PC, Holmberg L, Sandin F, Bray F, Linklater KM, Purushotham A, Robinson D, Møller H (2011). "Quantifying Differences in Breast Cancer Survival between England and Norway." Cancer Epidemiology, 35(6), 526–533.

Mok TS, Wu YL, Thongprasert S, Yang CH, Chu DT, Saijo N, Sunpaweravong P, Han B, Margono B, Ichinose Y, Nishiwaki Y, Ohe Y, Yang JJ, Chewaskulyong B, Jiang H, Duffield EL, Watkins CL, Armour AA, Fukuoka M (2009). "Gefitinib or Carboplatin-Paclitaxel in Pulmonary Adenocarcinoma." New England Journal of Medicine, 361(10), 947–957.

Morden JP, Lambert PC, Latimer N, Abrams KR, Wailoo AJ (2011). "Assessing Methods for Dealing with Treatment Switching in Randomised Controlled Trials: A Simulation Study." BMC Medical Research Methodology, 11, 4.

Nelson CP, Lambert PC, Squire IB, Jones DR (2007). "Flexible Parametric Models for Relative Survival, with Application in Coronary Heart Disease." Statistics in Medicine, 26(30), 5486–5498.

R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Royston P, Altman DG (1994). "Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling." Journal of the Royal Statistical Society C, 43(3), 429–467.

Royston P, Lambert PC (2011). Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model. Stata Press.

Royston P, Parmar MKB (2002). "Flexible Parametric Proportional Hazards and Proportional Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects." Statistics in Medicine, 21(15), 2175–2197.

SAS Institute Inc (2008). SAS/STAT Software, Version 9.2. Cary, NC. URL http://www.sas.com/.

StataCorp (2011). "Stata Data Analysis Statistical Software: Release 12." URL http://www.stata.com/.

Stoer J, Bulirsch R (2002). Introduction to Numerical Analysis. 3rd edition. Springer-Verlag.

Therneau T (2012). survival: A Package for Survival Analysis in S. R package version 2.36-14. URL http://CRAN.R-project.org/package=survival.

Weinstein MC, O'Brien B, Hornberger J, Jackson J, Johannesson M, McCabe C, Luce BR (2003). "Principles of Good Practice for Decision Analytic Modeling in Health-Care Evaluation: Report of the ISPOR Task Force on Good Research Practices—Modeling Studies." Value in Health, 6(1), 9–17.
Affiliation:
Michael J. Crowther
Department of Health Sciences
University of Leicester
Leicester, United Kingdom
E-mail: [email protected]
URL: http://www2.le.ac.uk/departments/health-sciences/research/biostats/staff-pages/mjc76/

Paul C. Lambert
Department of Health Sciences
University of Leicester
Leicester, United Kingdom
and
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet
Stockholm, Sweden
E-mail: [email protected]
URL: http://www2.le.ac.uk/Members/pl4/
Journal of Statistical Software http://www.jstatsoft.org/
published by the American Statistical Association http://www.amstat.org/
Volume 53, Issue 12, May 2013. Submitted: 2012-07-09; Accepted: 2013-01-08.
APPENDIX C
Research Paper 5 - Statistics in Medicine
This appendix contains a review of current methods and issues in the joint modelling of
longitudinal and survival data, conducted by the Drug Information Association joint
model working group, which I was invited to join. It has recently been published
in Statistics in Medicine. It is omitted from the e-thesis due to copyright.
APPENDIX D
Research Paper 6 - Statistics in Medicine
This appendix contains a methodological paper published in Statistics in Medicine,
describing the development of the Royston-Parmar-based joint model. It is omitted
from the e-thesis due to copyright.
APPENDIX E
Research Paper 7 - BMC Medical Research Methodology
This appendix contains a paper published in BMC Medical Research Methodology,
describing the novel use of the intercept association structure within a joint model,
described in Chapter 9.
Crowther et al. BMC Medical Research Methodology 2013, 13:146
http://www.biomedcentral.com/1471-2288/13/146

RESEARCH ARTICLE  Open Access

Adjusting for measurement error in baseline prognostic biomarkers included in a time-to-event analysis: a joint modelling approach

Michael J Crowther1*, Paul C Lambert1,2 and Keith R Abrams1
Abstract
Background: Methodological development of joint models of longitudinal and survival data has been rapid in recent years; however, their full potential in applied settings is yet to be fully explored. We describe a novel use of a specific association structure, linking the two component models through the subject-specific intercept, and thus extend joint models to account for measurement error in a biomarker, even when only the baseline value of the biomarker is of interest. This is a common occurrence in registry data sources, where repeated measurements often exist but are simply ignored.
Methods: The proposed specification is evaluated through simulation and applied to data from the General Practice Research Database, investigating the association between baseline systolic blood pressure (SBP) and the time to stroke in a cohort of obese patients with type 2 diabetes mellitus.
Results: By directly modelling the longitudinal component we reduce bias in the hazard ratio for the effect of baseline SBP on the time to stroke, showing the large potential to improve on previous prognostic models which use only observed baseline biomarker values.
Conclusions: The joint modelling of longitudinal and survival data is a valid approach to account for measurement error in the analysis of a repeatedly measured biomarker and a time-to-event. User-friendly Stata software is provided.
Background
Many biomarkers such as systolic blood pressure (SBP) have been identified as key prognostic factors in the development and validation of cardiovascular risk scores [1,2]. However, often only baseline values of these biomarkers are used, despite the existence of repeated measures, especially in registry sources such as the General Practice Research Database (GPRD) [3]. Furthermore, biomarkers are often measured with error. Failing to adjust for such measurement error leads to estimates being biased towards the null [4].

*Correspondence: [email protected]
1University of Leicester, Department of Health Sciences, Adrian Building, University Road, Leicester LE1 7RH, UK
Full list of author information is available at the end of the article

A joint model of longitudinal and survival data allows us to investigate the relationship between a repeatedly measured biomarker, subject to measurement error, such as SBP, and the time to an event of interest, such as time to non-fatal stroke. The approach which has dominated the methodological literature involves linking the two component submodels using shared random effects [5,6]. From a classical perspective, these methods require computationally intensive numerical integration, which is difficult to implement. However, due to the recent introduction of user-friendly software in R [7,8] and Stata [9], these models are starting to find their place in applied research [10,11], but the potential uses and forms of the association parameters, linking the longitudinal and survival components, are yet to be fully explored. Alternatively, many authors have proposed a Bayesian approach,
providing readily available BUGS code to implement the models [12,13].
The most commonly used association structures include the current value parameterisation [5], whereby we directly link the value of the biomarker, as estimated by the longitudinal submodel, to survival, and the first derivative or slope [10], allowing investigation of the effect that the rate of change of the biomarker has on survival.
There is often interest in predicting prognosis based on an initial baseline measurement [1,2]. In this paper we investigate the use of the joint model framework with a random intercept association structure as an approach to adjust for measurement error, inherent in biomarkers such as SBP. By incorporating the repeated measures we thus make the most efficient use of the data available. In particular, as a prognostic model for future patients, we describe how this framework can be used to predict survival for new patients who will only have baseline measurements.
Methods
A joint model of longitudinal and survival data consists of two component submodels: the longitudinal submodel and the survival submodel. We define a set of baseline covariates, U_i, which can potentially differ between submodels. The longitudinal submodel allows us to model the trajectory of a repeatedly measured biomarker over time, adjusting for baseline covariates. The standard approach assumes a linear mixed effects model [14]. We observe

Y_i(t_ij) = W_i(t_ij) + ε_ij,   ε_ij ∼ N(0, σ_e²)   (1)

with

W_i(t_ij) = X_i′(t_ij)β + Z_i′(t_ij)b_i + u_i δ   (2)

where Y_i(t_ij) is the observed longitudinal response for the ith patient measured at the jth time point. W_i(t_ij) is our true unobserved trajectory function consisting of design matrices X_i′(t_ij) and Z_i′(t_ij) for the fixed and random effects, β and b_i, respectively, where b_i ∼ MVN(0, Σ). We can incorporate flexibility here by allowing both X_i′(t_ij) and Z_i′(t_ij) to contain restricted cubic spline functions of measurement time [15]. We also have a vector of baseline covariates u_i ∈ U_i, and corresponding regression coefficients, δ. Finally, ε_ij is our normally distributed measurement error with constant variance σ_e². We further assume that the random effects and error term are independent, and that cov(ε_ij, ε_ik) = 0 (where j ≠ k).
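The random intercept and slope specification of Equations (1)-(2) can be sketched numerically. The following is an illustrative Python sketch, not the paper's Stata implementation; all parameter values, and the choice X = Z = (1, t), are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) fixed effects, baseline covariate effect, and
# random-effects covariance; sigma_e is the measurement-error SD.
beta = np.array([120.0, -0.5])       # fixed intercept and slope
delta = np.array([0.3])              # effect of one baseline covariate u_i
Sigma = np.array([[1.0, 0.1],
                  [0.1, 0.25**2]])   # cov(b_0i, b_1i)
sigma_e = 1.5

def trajectory(t, b_i, u_i):
    """True (error-free) trajectory W_i(t) = X'beta + Z'b_i + u_i*delta,
    here with X = Z = (1, t): a random intercept and random slope."""
    X = np.column_stack([np.ones_like(t), t])
    return X @ beta + X @ b_i + u_i @ delta

# One subject: draw random effects, then observed Y_ij = W_i(t_ij) + e_ij
t = np.array([0.0, 1.0, 2.0, 3.0])
b_i = rng.multivariate_normal(np.zeros(2), Sigma)
u_i = np.array([65.0])
W = trajectory(t, b_i, u_i)
Y = W + rng.normal(0.0, sigma_e, size=t.shape)
```

With annual measurement times, consecutive differences of W recover the subject-specific slope β_1 + b_1i, which is one way to check the design matrices are assembled correctly.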
The time-to-event submodel usually takes the form of a proportional hazards model

h_i(t) = h_0(t) exp(α_1 W_i(t) + φ v_i)   (3)

with h_0(t) the baseline hazard function and v_i ∈ U_i a vector of baseline covariates with corresponding log hazard ratios, φ. The parameter α_1 is commonly named the association parameter, indicating the strength of association between the longitudinal biomarker and the time to event. If α_1 = 0, then the joint model reduces to the two separate models and fitting a joint model will not prove advantageous. This parameterisation assumes the hazard is dependent on the biomarker through its current value. This form of association is one of many ways to link the two component submodels. The baseline hazard function, h_0(t), can be modelled using a parametric distribution, most frequently the Weibull, or less restrictively using flexible parametric survival models [16], or of course can be left unspecified [17]. However, an unspecified baseline hazard function leads to underestimation of the standard errors of parameter estimates [18], and consequently bootstrapping is required to obtain appropriate standard errors.
For illustration, we let W_i(t_ij), the longitudinal submodel, be a linear function of time where the intercept and slope vary between subjects

W_i(t_ij) = (β_0 + b_0i) + (β_1 + b_1i) t_ij   (4)

giving a model with a random intercept and random linear slope. As an alternative way of linking the component models to that of Equation (3), we may link elements of the trajectory function, W_i(t_ij), to the hazard directly. For example, we can link the subject-specific baseline biomarker values through the intercept association structure, where

h_i(t) = h_0(t) exp[α_2(β_0 + b_0i) + φ v_i]   (5)

In this expression α_2 now estimates the strength of the association between the patient-specific baseline biomarker values, as estimated by the longitudinal submodel, and the time-to-event. This way we can let the risk of an event depend directly on the subject-specific value of the biomarker at time t = 0.
If interest lies in prediction when a new patient is observed at baseline, the issue of measurement error can be accounted for through this approach. A benefit of this association structure also lies in the evaluation of the joint likelihood. Under most parametric survival submodels (e.g. the Weibull distribution) and time-dependent association structures (e.g. the current value), numerical quadrature is required to integrate out not only the random effects but, under Equation (3), nested quadrature is also required to evaluate the cumulative hazard function. Under the time-independent association structure of Equation (5), we avoid this nested quadrature as the cumulative hazard function has an analytically tractable form, which provides computational benefits.
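The tractability claim can be checked directly for a Weibull baseline: since the linear predictor is constant in time under Equation (5), H_i(t) = λ t^γ exp(α_2(β_0 + b_0i) + φ v_i) in closed form. Below is a sketch with assumed parameter values, comparing the closed form against Gauss-Legendre quadrature of the hazard (the kind of quadrature a time-dependent association would force on every likelihood evaluation).

```python
import numpy as np

# Assumed illustrative parameter values for the sketch
lam, gam = 0.1, 1.5                   # Weibull baseline: h0(t) = lam*gam*t**(gam-1)
alpha2, beta0, b0i = 0.25, 0.0, 0.4   # intercept association and subject intercept
phi_v = -0.5                          # covariate contribution phi * v_i

eta = alpha2 * (beta0 + b0i) + phi_v  # constant in t under Equation (5)

def hazard(t):
    return lam * gam * t**(gam - 1.0) * np.exp(eta)

t = 5.0
# Closed form: H(t) = lam * t**gam * exp(eta), since eta does not depend on t
H_closed = lam * t**gam * np.exp(eta)

# 15-point Gauss-Legendre quadrature of the hazard over (0, t)
nodes, weights = np.polynomial.legendre.leggauss(15)
u = 0.5 * t * (nodes + 1.0)           # map [-1, 1] -> [0, t]
H_quad = 0.5 * t * np.sum(weights * hazard(u))
```

The two agree to several decimal places, while the closed form costs a single expression per likelihood evaluation.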
Crowther et al. BMC Medical Research Methodology 2013, 13:146 Page 3 of 8http://www.biomedcentral.com/1471-2288/13/146
As discussed in the introduction, this model formulation can be an alternative to the standard approach of using the observed baseline biomarker value

h_i(t) = h_0(t) exp(α_3 Y_0i + φ v_i)   (6)

where Y_0i is the observed baseline biomarker value and α_3 is the log hazard ratio for a one unit increase in the observed baseline biomarker value. Although simple to fit, Equation (6) does not account for potential measurement error in Y_0i.
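The "biased towards the null" behaviour [4] has a simple analogue in linear regression with classical measurement error, where the slope is attenuated by the factor σ_b²/(σ_b² + σ_e²). A hypothetical Python sketch (all values assumed; this is a linear-regression illustration, not the paper's survival setting):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta_true = 0.5
sigma_b, sigma_e = 1.0, 1.0              # SDs of the true value and the error

x_true = rng.normal(0.0, sigma_b, n)               # true biomarker
x_obs = x_true + rng.normal(0.0, sigma_e, n)       # observed with classical error
y = beta_true * x_true + rng.normal(0.0, 1.0, n)   # outcome depends on the truth

# Regressing on the error-prone x attenuates the slope by
# lambda = sigma_b^2 / (sigma_b^2 + sigma_e^2) = 0.5 here
slope_obs = np.polyfit(x_obs, y, 1)[0]
slope_true = np.polyfit(x_true, y, 1)[0]
```

With these values the observed-data slope is roughly half the true slope, mirroring the attenuation of α_3 in Equation (6) relative to the association estimated by the joint model.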
Simulation study
In order to assess the performance of the standard approach of including observed biomarker values, compared to the full joint model described above, we evaluated both through simulation [19]. For ease of exposition we assume a longitudinal model with random intercept and slope, assuming a continuous biomarker of interest with

W_i(t_ij) = (β_0 + b_0i) + (β_1 + b_1i) t_ij

where β_0 = β_1 = 0, and b_0i ∼ N(0, 1), b_1i ∼ N(0, 0.25²), with correlation between (b_0i, b_1i) of 0.25. Observed measurements are then generated from Y_ij ∼ N(W_i(t_ij), σ_e²), where t_ij is the time of the jth measurement for the ith patient. We vary σ_e over 0.1, 0.5 and 1.
We assume a Weibull baseline hazard function with λ = 0.1 and γ = 1.5. A binary variable, X_1, to represent treatment group, was generated from Bin(1, 0.5), with an associated log hazard ratio of φ_1 = −0.5. A continuous covariate, X_2, to represent age at baseline, was generated from N(65, 12) with an associated log hazard ratio of φ_2 = 0.01. We then generate survival times from a Weibull distribution where the hazard is defined as h(t) = h_0(t) exp(α_2 β_0i + φ_1 X_1 + φ_2 X_2), with α_2 the association parameter, indicating the effect of a one unit increase in the value of the subject-specific intercept on the risk of event. We vary α_2 = −0.5, −0.25, 0.25, 0.5. Each simulation contained 300 patients with up to 5 annual measurements (including baseline), and administrative censoring at 5 years. This corresponds to an approximate 18.9% survival proportion at 5 years (calculated at the mean of covariate values, X_1 = 1/2, X_2 = 65 and β_0i = 0).
To each dataset we fit a Weibull proportional hazards model including the observed baseline measurement, and a Weibull-based joint model with the random intercept association structure. We adjust for age and treatment in the survival submodel. Each scenario is simulated 1000 times.
To illustrate the varying measurement error standard deviations used in the simulation scenarios, we show in Figure 1 observed longitudinal measurements from the same 100 patients with σ_e = 0.1, 0.5, 1, and when α = 0.25. Figure 1 illustrates that as the measurement error standard deviation increases, the variability in the observed biomarker values increases.
The GPRD cohort
The General Practice Research Database (GPRD) Group has obtained ethical approval from a Multi-centre Research Ethics Committee (MREC) for all purely observational research using GPRD data; namely, studies which do not include patient involvement. The core work of the GPRD is covered by MREC approval granted by the Trent Multi-centre Research Ethics Committee (REC reference number 05/MRE04/87) and this study was approved by the GPRD Independent Scientific Advisory Committee (ISAC) (Protocol number 09_094). This study is based in part on data from the GPRD obtained under licence from the UK Medicines and Healthcare products Regulatory Agency (MHRA). However, the interpretation and conclusions contained in this study are those of the authors alone.
The example cohort used to illustrate the methods in this paper consists of 4,850 obese patients diagnosed with type 2 diabetes mellitus. We have 107,347 measurements of SBP, with maximum follow-up of 22 years. There were 278 stroke events observed.
In Figure 2 we show the observed SBP measurements for 9 randomly selected patients, who had at least 10 measurements, illustrating some nonlinear trajectories. To accommodate such nonlinearities we can use restricted cubic splines in the linear mixed effects submodel. In particular, we specify the following longitudinal submodel
W_i(t_ij) = (β_0 + b_0i) + β_1 age_i + β_2 sex_i + β_3 BMI_i + (β_F s_F(t_ij; k_F) + b_R s_R(t_ij; k_R))   (7)

where s_F(t_ij; k_F) is the restricted cubic spline basis of measurement time with corresponding fixed effects, β_F, and knot locations k_F, and s_R(t_ij; k_R) is the restricted cubic spline basis of measurement time with corresponding random effects, b_R, and knot locations k_R.
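A restricted cubic spline basis of the kind used for s_F and s_R can be written down directly. The sketch below uses the Royston-Parmar style parameterisation (a linear term plus one term per interior knot), constrained to be linear beyond the boundary knots; the knot locations are assumptions for illustration.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: column 1 is x; each interior knot k_j
    contributes (x-k_j)+^3 - lam_j*(x-kmin)+^3 - (1-lam_j)*(x-kmax)+^3,
    with lam_j = (kmax-k_j)/(kmax-kmin), which enforces linearity
    beyond the boundary knots."""
    x = np.asarray(x, dtype=float)
    kmin, kmax = knots[0], knots[-1]
    pos = lambda v: np.maximum(v, 0.0)
    cols = [x]
    for kj in knots[1:-1]:
        lam_j = (kmax - kj) / (kmax - kmin)
        cols.append(pos(x - kj)**3
                    - lam_j * pos(x - kmin)**3
                    - (1.0 - lam_j) * pos(x - kmax)**3)
    return np.column_stack(cols)

knots = [0.0, 1.0, 3.0, 5.0]         # assumed boundary and interior knots
x = np.linspace(-1.0, 7.0, 200)
B = rcs_basis(x, knots)              # 200 x 3: linear term + 2 interior knots
```

A quick check of the linearity constraint is that second differences of every basis column vanish beyond the last knot.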
Preliminary modelling of the longitudinal data can be conducted to guide model selection, in particular, the degrees of freedom for the spline terms capturing the underlying longitudinal trajectory over time.
To allow flexibility in the survival submodel we use the flexible parametric survival model [16,20], which models the baseline log cumulative hazard function using restricted cubic splines. We can once again undertake separate analysis of just the survival data to inform model selection. In particular, we can use the AIC and BIC to guide the selection of the number of degrees of freedom to capture the baseline hazard function, following
[Figure 1: simulated longitudinal responses plotted against time of measurement (years), in three panels with σ_e = 0.1, 0.5 and 1.]
Figure 1 Example simulated observed longitudinal measurements with varying measurement error standard deviation.
[Figure 2: observed SBP plotted against follow-up time (years) for Patients 1 to 9, with predicted SBP overlaid.]
Figure 2 Longitudinal response measurements for SBP for 9 randomly selected patients who had at least 10 measurements. The dashed line represents the fitted longitudinal trajectories based on the joint model.
Rutherford et al. (2013) [21]. Our final joint model is then

log[H_i(t)] = log[H_0(t)] + φ_1 age_i + φ_2 sex_i + φ_3 BMI_i + α_2(β_0 + b_0i)   (8)

where

log[H_0(t)] = s(log(t); γ, k_S)   (9)

where the baseline log cumulative hazard function, log[H_0(t)], is expanded into a restricted cubic spline function of log(t), s(log(t); γ, k_S), with knot locations k_S and coefficient vector, γ. This framework has recently been incorporated into a joint model [22]. In each submodel we adjust for the baseline effects of age, sex and BMI. We fit the joint model with the random intercept association structure shown in Equation (5). For comparison, we also apply the standard flexible parametric survival model, adjusting for observed baseline SBP, age, sex and BMI.
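The survival function implied by modelling on the log cumulative hazard scale is S_i(t) = exp(−exp(·)). A minimal sketch: with one degree of freedom the spline in Equation (9) reduces to s(log t) = γ_0 + γ_1 log t, which is exactly a Weibull model with λ = exp(γ_0) and shape γ_1 (the values below are illustrative, not the fitted GPRD coefficients).

```python
import numpy as np

# One-df special case of the log cumulative hazard spline: g0 + g1*log(t),
# equivalent to a Weibull with lam = exp(g0), gam = g1 (assumed values).
g0, g1 = np.log(0.1), 1.5

def log_cum_hazard(t):
    return g0 + g1 * np.log(t)

def survival(t):
    return np.exp(-np.exp(log_cum_hazard(t)))

t = np.array([1.0, 2.0, 5.0])
S = survival(t)
S_weibull = np.exp(-0.1 * t**1.5)     # direct Weibull form for comparison

# A covariate enters additively on the log cumulative hazard scale, e.g. a
# log hazard ratio of 0.105 multiplies H(t) by exp(0.105) and lowers survival
S_shift = np.exp(-np.exp(log_cum_hazard(t) + 0.105))
```

With more degrees of freedom, extra spline terms in log(t) let the same expression capture non-monotonic baseline hazards while survival predictions remain a single closed-form evaluation.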
Results
Simulation study results
Bias and coverage estimates for the association parameter are presented in Table 1. Under the standard Weibull model, we observe increasing bias in the estimates of the association between baseline biomarker values and survival as the magnitude of the measurement error standard deviation, σ_e, increases. In parallel we observe very poor coverage probabilities under the Weibull approach. For example, with α = 0.5 and σ_e = 1, we observe bias of −0.261 (percentage bias of −52.2%) and coverage of 0.4%. In contrast, under the joint modelling approach we observe minimal bias and coverage probabilities close to 95% across all scenarios.
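The quantities reported in Table 1 can be computed from per-replicate estimates and standard errors as follows. This is a hypothetical sketch with toy inputs (the paper's actual simulation estimates are not reproduced here); `performance` is an illustrative helper name.

```python
import numpy as np

def performance(est, se, true, z=1.96):
    """Bias, percentage bias, MSE and coverage probability of a set of
    simulation estimates, the performance measures shown in Table 1."""
    est, se = np.asarray(est), np.asarray(se)
    bias = est.mean() - true
    pct_bias = 100.0 * bias / true
    mse = np.mean((est - true)**2)
    covered = (est - z * se <= true) & (true <= est + z * se)
    return bias, pct_bias, mse, covered.mean()

# Toy check: unbiased estimates with correct SEs should give ~95% coverage
rng = np.random.default_rng(4)
est = rng.normal(0.5, 0.08, 1000)
res = performance(est, np.full(1000, 0.08), true=0.5)
```

Applied to the Weibull fits in the simulation, the same calculation reproduces the pattern of growing negative bias and collapsing coverage as σ_e increases.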
Analysis of GPRD cohort
We now present the analysis of the GPRD cohort. In all analyses we use SBP/10 so that a unit increase in SBP/10 represents a clinically meaningful 10 unit increase in SBP. Our primary interest is the association between baseline SBP and the risk of stroke. Baseline (t_ij = 0) corresponds to when each patient entered the cohort, i.e. the time of first SBP measurement.
We began by assuming a random intercept and selecting the degrees of freedom for the fixed spline terms using the AIC and BIC. In this case, both selected five degrees of freedom for s_F(t_ij; k_F), with an AIC of 417565.8 and BIC of 417604.1. For the random splines of time we assumed a linear term, which equates to one spline term for s_R(t_ij; k_R). This allows a very flexible form to take into account the variation in SBP over time. We further adjust for age, sex and Body-Mass Index (BMI) at baseline.
For the flexible parametric survival submodel, both AIC and BIC selected two degrees of freedom, with an AIC of 2408.7173 and BIC of 2430.483. If one degree of freedom had been selected, then this would be equivalent to a Weibull survival model.
Results are presented in Table 2. Under the standard flexible parametric survival model we observe a hazard ratio for a ten unit increase in baseline SBP of 1.111 (95% CI: 1.051, 1.172). Under a joint model we observe an increased hazard ratio of 1.198 (95% CI: 1.107, 1.298). The
Table 1 Simulation results of the association parameter, α

                    |            Weibull             |           Joint model
True α   True σ_e   |   Bias   % bias   MSE    CP    |   Bias   % bias   MSE    CP
 0.50      0.1      |  -0.001   -0.2   0.006  94.8   |  0.005    0.9   0.006  95.3
           0.5      |  -0.105  -21.1   0.016  65.4   |  0.005    0.9   0.007  95.6
           1.0      |  -0.261  -52.1   0.071   0.4   |  0.008    1.6   0.012  94.8
 0.25      0.1      |   0.002    1.0   0.005  94.4   |  0.005    2.0   0.006  94.3
           0.5      |  -0.046  -18.5   0.007  89.0   |  0.007    2.7   0.007  94.5
           1.0      |  -0.123  -49.2   0.018  34.1   |  0.010    4.1   0.009  94.8
-0.25      0.1      |   0.003   -1.3   0.006  93.8   |  0.001   -0.2   0.006  94.0
           0.5      |   0.051  -20.6   0.007  87.1   |  0.000   -0.1   0.007  94.2
           1.0      |   0.127  -50.7   0.019  29.7   | -0.002    0.9   0.009  94.6
-0.50      0.1      |   0.000   -0.1   0.006  96.6   | -0.005    1.0   0.006  95.9
           0.5      |   0.104  -20.9   0.015  66.7   | -0.006    1.1   0.007  95.7
           1.0      |   0.260  -52.0   0.070   0.4   | -0.010    2.0   0.012  94.5

MSE, mean square error; CP, coverage probability; σ_e, standard deviation of the measurement error.
Table 2 Results from applying a flexible parametric proportional hazards model adjusting for observed baseline systolic blood pressure, and a full joint model using the intercept association structure

                        |     Standard FPSM        |      Joint model
                        | Coefficient   95% CI     | Coefficient   95% CI
Survival model:
 Baseline SBP/10 (α_2)  |   0.105    0.050, 0.159  |   0.181    0.102, 0.261
 Age (years)            |   0.048    0.036, 0.060  |   0.050    0.038, 0.062
 Sex (male)             |   0.011   -0.233, 0.254  |  -0.010   -0.253, 0.234
 BMI (kg/m2)            |   0.011   -0.015, 0.037  |   0.013   -0.012, 0.039
Longitudinal model:
 Intercept              |     -           -        |  13.006   12.629, 13.382
 Age (years)            |     -           -        |   0.025    0.022, 0.029
 Sex (male)             |     -           -        |  -0.252   -0.332, -0.171
 BMI (kg/m2)            |     -           -        |   0.003   -0.005, 0.011
 RCS1                   |     -           -        |  -0.080   -0.121, -0.039
 RCS2                   |     -           -        |  -0.006   -0.019, 0.006
 RCS3                   |     -           -        |  -0.001   -0.010, 0.007
 RCS4                   |     -           -        |   0.003    0.000, 0.006
 RCS5                   |     -           -        |   0.000   -0.001, 0.001
 σ_e                    |     -           -        |   1.522    1.515, 1.528

FPSM, flexible parametric survival model; RCS, restricted cubic spline.
increased effect using a joint model is consistent with that observed in the simulation study, i.e. that the bias in the standard survival model is towards the null. The fitted trajectories seen in Figure 2 appear to capture the subject-specific measurements well, although some panels appear to only require a linear trend.
We illustrate how the bias from the standard approach increases with SBP in Figure 3, showing predictions from both models for a female patient aged 60, with low (90), medium (130) and high (200) SBP baseline measurements. To quantify the differences, at 10 years under the standard model we observe a survival probability of 0.881 for a SBP of 200, compared to 0.816 under the full joint model.
Discussion
A wealth of patient data is becoming available in registry sources such as the GPRD, providing extensive opportunities to utilise the joint modelling framework. We have shown that by incorporating repeated measures of a biomarker within a unified joint model framework, we reduce bias due to measurement error, even when only the baseline level of the biomarker is predictive of survival. As illustrated in the simulation study, ignoring measurement error in biomarkers such as blood pressure can lead to a marked underestimation of covariate effects. In our application, through the use of restricted cubic splines in the linear mixed effects submodel, we can model highly nonlinear trajectories over time, compared to linear slope models. Furthermore, the flexible parametric survival submodel can also capture complex baseline hazard functions, an important component when predicting survival at the patient level [22].
Given that, to our knowledge, all current cardiovascular risk scores only use baseline measures, with no
[Figure 3: predicted survival probability against follow-up time (years) from the flexible parametric model and the joint model, for SBP = 90, 130 and 200.]
Figure 3 Predicted survival from the flexible parametric survival model and joint model, for a female, aged 60 years, BMI of 30, with SBP of 90, 130 or 200.
adjustment for measurement error, the prospect of utilising this framework to improve prognostic risk scores is substantial. Predicting survival for a new patient using this framework follows naturally, as often only a first baseline biomarker observation will be available. However, such a modelling approach also allows a dynamic risk prediction approach to be adopted, whereby a patient's estimated future risk is updated as each new biomarker value is obtained [23]. Such an approach could enable response to treatment to be monitored and patients counselled accordingly.
In the analysis of the GPRD cohort, we incorporated flexibility in both the longitudinal submodel, through the use of restricted cubic splines, and the flexible parametric survival submodel. Given that both submodels require choosing the number of degrees of freedom, a simple sensitivity analysis can be undertaken to assess knot locations and number of knots. We showed recently that the flexible parametric survival submodel is very robust to both knot placement and number of knots within a joint model framework [22], and furthermore, an extensive simulation study conducted by Rutherford et al. (2013) showed excellent performance of the flexible parametric model in capturing simple and complex baseline hazard functions [21]. Furthermore, given that primary interest was in the survival component and the estimate of association, modelling the longitudinal component with a sensible functional form will often provide an improved estimate compared to the simplistic approach of separate modelling.
In this paper we have concentrated on a specific association structure linking the two component submodels; however, it may be of interest to investigate linking multiple components of a biomarker's trajectory to the time to an event of interest. For example, recent work by Rothwell et al. (2010) [24] has shown that not only baseline blood pressure, but also its variability over time, is an important predictor of cardiovascular events. Furthermore, we have only compared the standard approach of adjusting for observed baseline biomarker values to the full joint model. It would be of interest to compare alternative approaches for adjusting for measurement error, not only in baseline biomarkers, but also under a time-dependent association structure [25,26].
Extensions to the modelling framework include incorporating multiple biomarkers. In particular, in our example we modelled SBP over time, whilst adjusting for baseline BMI. It may be of interest to model not only SBP but also the inter-relationships between different biomarkers such as BMI, and how they are related to an event of interest [13].
To facilitate the use of the methods in practice, user-friendly Stata software, written by the first author, is available, with a variety of survival model choices and association structures, including those discussed in this article [9,27]. To illustrate the computational aspects of the framework, the presented joint model applied to the cohort took just over 13 minutes to converge on a standard laptop computer.
Conclusion
The joint modelling of longitudinal and survival data is a valid approach to account for measurement error in the analysis of a repeatedly measured biomarker and a time to event. User-friendly Stata software is provided.
Competing interests
The authors declare that they have no competing interests.

Authors' contributions
All authors were involved in conception and design of the project. MJC conducted the simulation study, analysed the clinical dataset and wrote the first draft of the manuscript. PCL and KRA both revised the manuscript. All authors read and approved the final manuscript.

Acknowledgements
MJC is funded by a National Institute for Health Research (NIHR) Doctoral Fellowship (DRF-2012-05-409) and KRA is partially supported as an NIHR Senior Investigator (NI-51-0508-10061). The cohort of obese patients with type 2 diabetes mellitus was obtained from the General Practice Research Database (GPRD) under Independent Scientific Advisory Committee (ISAC)-approved Protocol 09_094, which was funded by a National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme Project Grant (07/85/02). The authors would like to thank Monica Hernández for initial preparatory work on the data from the GPRD cohort of patients, Roberta Ara for useful initial discussions regarding anti-obesity treatment, and finally the reviewers whose comments greatly improved the paper.

Author details
1University of Leicester, Department of Health Sciences, Adrian Building, University Road, Leicester LE1 7RH, UK. 2Karolinska Institutet, Department of Medical Epidemiology and Biostatistics, Box 281, S-171 77 Stockholm, Sweden.

Received: 8 July 2013  Accepted: 4 November 2013  Published: 1 December 2013
References
1. Conroy RM, Pyorala K, Fitzgerald AP, Sans S, Menotti A, De Backer G, De Bacquer D, Ducimetiere P, Jousilahti P, Keil U, Njolstad I, Oganov RG, Thomsen T, Tunstall-Pedoe H, Tverdal A, Wedel H, Whincup P, Wilhelmsen L, Graham IM, SCORE project group: Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003, 24(11):987–1003.
2. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P: Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ 2007, 335(7611):136. [http://dx.doi.org/10.1136/bmj.39261.471806.55]
3. Ara R, Blake L, Gray L, Hernandez M, Crowther M, Dunkley A, Warren F, Jackson R, Rees A, Stevenson M, Abrams K, Cooper N, Davies M, Khunti K, Sutton A: What is the clinical effectiveness and cost-effectiveness of using drugs in treating obese patients in primary care? A systematic review. Health Technol Assess 2012, 16(5):1–202. [http://dx.doi.org/10.3310/hta16050]
4. Prentice RL: Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 1982, 69(2):331–342. [http://www.jstor.org/stable/2335407]
5. Wulfsohn MS, Tsiatis AA: A joint model for survival and longitudinal data measured with error. Biometrics 1997, 53:330–339.
6. Henderson R, Diggle P, Dobson A: Joint modelling of longitudinal measurements and event time data. Biostatistics 2000, 1(4):465–480.
7. Rizopoulos D: JM: An R package for the joint modelling of longitudinal and time-to-event data. J Stat Softw 2010, 35(9):1–33. [http://www.jstatsoft.org/v35/i09]
8. Philipson P, Sousa I, Diggle P, Williamson P, Kolamunnage-Dona R, Henderson R: joineR - Joint modelling of repeated measurements and time-to-event data. 2012. [http://cran.r-project.org/web/packages/joineR/index.html]
9. Crowther MJ, Abrams KR, Lambert PC: Joint modeling of longitudinal and survival data. Stata J 2013, 13:165–184.
10. Wolbers M, Babiker A, Sabin C, Young J, Dorrucci M, Chêne G, Mussini C, Porter K, Bucher HC, CASCADE: Pretreatment CD4 cell slope and progression to AIDS or death in HIV-infected patients initiating antiretroviral therapy–the CASCADE collaboration: a collaboration of 23 cohort studies. PLoS Med 2010, 7(2):e1000239. [http://dx.doi.org/10.1371/journal.pmed.1000239]
11. Ibrahim JG, Chu H, Chen LM: Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol 2010, 28(16):2796–2801.
12. Guo X, Carlin BP: Separate and joint modeling of longitudinal and event time data using standard computer packages. Am Stat 2004, 58:16–24. [http://www.jstor.org/stable/27643494]
13. Rizopoulos D, Ghosh P: A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Stat Med 2011, 30(12):1366–1380.
14. Laird NM, Ware JH: Random-effects models for longitudinal data. Biometrics 1982, 38(4):963–974.
15. Durrleman S, Simon R: Flexible regression models with cubic splines. Stat Med 1989, 8(5):551–561.
16. Royston P, Lambert PC: Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model. College Station: Stata Press; 2011.
17. Cox DR: Regression models and life-tables. J R Stat Soc Ser B (Methodological) 1972, 34(2):187–220.
18. Hsieh F, Tseng YK, Wang JL: Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics 2006, 62(4):1037–1043. [http://dx.doi.org/10.1111/j.1541-0420.2006.00570.x]
19. Burton A, Altman DG, Royston P, Holder RL: The design of simulation studies in medical statistics. Stat Med 2006, 25(24):4279–4292.
20. Royston P, Parmar MKB: Flexible parametric proportional hazards and proportional odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med 2002, 21(15):2175–2197.
21. Rutherford MJ, Crowther MJ, Lambert PC: The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. J Stat Comput Simul 2013. [http://www.tandfonline.com/doi/abs/10.1080/00949655.2013.845890]
22. Crowther MJ, Abrams KR, Lambert PC: Flexible parametric joint modelling of longitudinal and survival data. Stat Med 2012, 31(30):4456–4471. [http://dx.doi.org/10.1002/sim.5644]
23. Rizopoulos D: Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 2011, 67(3):819–829. [http://dx.doi.org/10.1111/j.1541-0420.2010.01546.x]
24. Rothwell PM, Howard SC, Dolan E, O'Brien E, Dobson JE, Dahlöf B, Sever PS, Poulter NR: Prognostic significance of visit-to-visit variability, maximum systolic blood pressure, and episodic hypertension. Lancet 2010, 375(9718):895–905. [http://dx.doi.org/10.1016/S0140-6736(10)60308-X]
25. Zucker DM: A pseudo partial likelihood method for semiparametric survival regression with covariate errors. J Am Stat Assoc 2005, 100(472):1264–1277. [http://www.tandfonline.com/doi/abs/10.1198/016214505000000538]
26. Liao X, Zucker DM, Li Y, Spiegelman D: Survival analysis with error-prone time-varying covariates: a risk set calibration approach. Biometrics 2011, 67:50–58. [http://dx.doi.org/10.1111/j.1541-0420.2010.01423.x]
27. Crowther MJ: STJM: Stata module to fit shared parameter joint models of longitudinal and survival data. Stat Softw Components, Boston Coll Dep Econ 2012. [http://ideas.repec.org/c/boc/bocode/s457339.html]
doi:10.1186/1471-2288-13-146. Cite this article as: Crowther et al.: Adjusting for measurement error in baseline prognostic biomarkers included in a time-to-event analysis: a joint modelling approach. BMC Medical Research Methodology 2013, 13:146.
APPENDIX F
Research Paper 8 - The Stata Journal
This appendix contains a paper published in The Stata Journal, describing the Stata package stjm, which implements the joint models described in this thesis, among others. The paper is omitted from the e-thesis due to copyright.
References
H. Akaike. Information theory and an extension of the maximum likelihood principle,
volume 1, pages 267–281. Akademiai Kiado, 1973.
P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer, New York, 1993.
T. M. Andersson, P. W. Dickman, S. Eloranta, and P. C. Lambert. Estimating and
modelling cure in population-based cancer studies within the framework of flexible
parametric survival models. BMC Med Res Methodol, 11:96, 2011.
R. Ara, L. Blake, L. Gray, M. Hernandez, M. Crowther, A. Dunkley, F. Warren, R. Jackson, A. Rees, M. Stevenson, K. Abrams, N. Cooper, M. Davies, K. Khunti, and A. Sutton. What is the clinical effectiveness and cost-effectiveness of using drugs in treating obese patients in primary care? A systematic review. Health Technol Assess, 16(5):1–202, Feb 2012.
P. C. Austin. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat Med, 31(29):3946–3958, 2012.
C. B. Begg and D. Schrag. Attribution of deaths following cancer treatment. J Natl
Cancer Inst, 94(14):1044–1045, Jul 2002.
A. Belot, M. Abrahamowicz, L. Remontet, and R. Giorgi. Flexible modeling of competing risks in survival analysis. Stat Med, 29(23):2453–2468, 2010.
R. Bender, T. Augustin, and M. Blettner. Generating survival times to simulate Cox
proportional hazards models. Stat Med, 24(11):1713–1723, 2005.
J. Beyersmann, A. Latouche, A. Buchholz, and M. Schumacher. Simulating competing
risks data in survival analysis. Stat Med, 28(6):956–971, 2009.
K. Bhaskaran, O. Hamouda, M. Sannes, F. Boufassa, A. M. Johnson, P. C. Lambert,
K. Porter, and CASCADE Collaboration. Changes in the risk of death after HIV
seroconversion compared with mortality in the general population. JAMA, 300(1):
51–59, 2008.
L. J. Billingham and K. R. Abrams. Simultaneous analysis of quality of life and survival
data. Stat Methods Med Res, 11(1):25–48, 2002.
P. Bolard, C. Quantin, M. Abrahamowicz, J. Esteve, R. Giorgi, H. Chadha-Boreham,
C. Binquet, and J. Faivre. Assessing time-by-covariate interactions in relative survival
models using restrictive cubic spline functions. J Cancer Epidemiol Prev, 7(3):113–
122, 2002.
A. H. Briggs. Handling uncertainty in cost-effectiveness models. Pharmacoeconomics,
17(5):479–500, 2000.
E. R. Brown and J. G. Ibrahim. A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics, 59(2):221–228, 2003a.
E. R. Brown and J. G. Ibrahim. Bayesian approaches to joint cure-rate and longitudinal
models with applications to cancer vaccine trials. Biometrics, 59(3):686–693, 2003b.
E. R. Brown, J. G. Ibrahim, and V. DeGruttola. A flexible B-spline model for multiple
longitudinal biomarkers and survival. Biometrics, 61(1):64–73, 2005.
A. Burton, D. G. Altman, P. Royston, and R. L. Holder. The design of simulation
studies in medical statistics. Stat Med, 25(24):4279–4292, 2006.
V. Carstairs and R. Morris. Deprivation and health in Scotland. Aberdeen University
Press, Aberdeen, 1991.
B. Carstensen. Who needs the Cox model anyway? Technical report, Steno Diabetes
Center, Denmark, 2004.
Y. B. Cheung, F. Gao, and K. S. Khoo. Age at diagnosis and the choice of survival
analysis methods in cancer epidemiology. J Clin Epidemiol, 56(1):38–43, 2003.
T. G. Clark, M. J. Bradburn, S. B. Love, and D. G. Altman. Survival analysis part I: basic concepts and first analyses. Br J Cancer, 89(2):232–238, 2003.
A. Clarke, R. Pulikottil-Jacob, A. Grove, K. Freeman, H. Mistry, A. Tsertsvadze, M. Connock, R. Court, N.-B. Kandala, M. Costa, G. Suri, D. Metcalfe, M. Crowther, S. Morrow, S. Johnson, and P. Sutcliffe. Total hip replacement and surface replacement for the treatment of pain and disability resulting from end stage arthritis of the hip (Review of technology appraisal guidance 2 and 44). Warwick Evidence, 2013.
D. Clayton and M. Hills. Statistical Methods in Epidemiology. Oxford University Press,
1993.
M. P. Coleman, P. Babb, P. Damiecki, P. Grosclaude, S. Honjo, J. Jones, G. Knerer,
A. Pitard, M. Quinn, A. Sloggett, and B. De Stavola. Cancer Survival Trends in
England and Wales, 1971–1995: Deprivation and NHS Region. Number 61 in Studies
in Medical and Population Subjects. London: The Stationery Office, 1999.
D. Collett. Modelling Survival Data in Medical Research. Chapman & Hall, 2003.
R. M. Conroy, K. Pyorala, A. P. Fitzgerald, S. Sans, A. Menotti, G. De Backer, D. De Bacquer, P. Ducimetiere, P. Jousilahti, U. Keil, I. Njolstad, R. G. Oganov, T. Thomsen, H. Tunstall-Pedoe, A. Tverdal, H. Wedel, P. Whincup, L. Wilhelmsen, I. M. Graham, and the SCORE project group. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J, 24(11):987–1003, 2003.
C. Cox, H. Chu, M. F. Schneider, and A. Munoz. Parametric Survival Analysis and
Taxonomy of Hazard Functions for the Generalized Gamma Distribution. Stat Med,
26(23):4352–4374, 2007.
D. R. Cox. Regression models and life-tables. J Roy Statist Soc Ser B, 34(2):187–220,
1972.
D. R. Cox. Partial likelihood. Biometrika, 62(2):269–276, 1975.
M. J. Crowther. SURVSIM: Stata module to simulate complex survival data. Statistical
Software Components, Boston College Department of Economics, 2011.
M. J. Crowther. STJM: Stata module to fit shared parameter joint models of longitudinal and survival data. Statistical Software Components, Boston College Department of Economics, 2012.
M. J. Crowther and P. C. Lambert. STMIX: Stata module to fit two-component parametric mixture survival models. Statistical Software Components, Boston College Department of Economics, 2011.
M. J. Crowther and P. C. Lambert. Simulating complex survival data. Stata J, 12(4):
674–687, 2012.
M. J. Crowther and P. C. Lambert. Simulating biologically plausible complex survival
data. Stat Med, 32(23):4118–4134, 2013a.
M. J. Crowther and P. C. Lambert. stgenreg: A Stata package for the general parametric analysis of survival data. J Stat Softw, 53(12), 2013b.
M. J. Crowther and P. C. Lambert. A general framework for parametric survival
analysis. Stat Med, In Press, 2014.
M. J. Crowther, K. R. Abrams, and P. C. Lambert. Flexible parametric joint modelling
of longitudinal and survival data. Stat Med, 31(30):4456–4471, 2012a.
M. J. Crowther, R. D. Riley, J. A. Staessen, J. Wang, F. Gueyffier, and P. C. Lambert. Individual patient data meta-analysis of survival data using Poisson regression models. BMC Med Res Methodol, 12(34), 2012b.
M. J. Crowther, K. R. Abrams, and P. C. Lambert. Joint modeling of longitudinal and
survival data. Stata J, 13(1):165–184, 2013a.
M. J. Crowther, P. C. Lambert, and K. R. Abrams. Adjusting for measurement error in
baseline prognostic biomarkers included in a time-to-event analysis: A joint modelling
approach. BMC Med Res Methodol, 13(146), 2013b.
M. J. Crowther, M. P. Look, and R. D. Riley. Multilevel mixed effects parametric survival models using adaptive Gauss-Hermite quadrature with application to recurrent events and individual participant data meta-analysis. Stat Med, 2014.
N. Demiris, D. Lunn, and L. D. Sharples. Survival extrapolation using the poly-Weibull model. Stat Methods Med Res, 2011.
E. Deslandes and S. Chevret. Assessing surrogacy from the joint modelling of multivariate longitudinal data and survival: application to clinical trial data on chronic lymphocytic leukaemia. Stat Med, 26(30):5411–5421, 2007.
P. W. Dickman, A. Sloggett, M. Hills, and T. Hakulinen. Regression models for relative
survival. Stat Med, 23(1):51–64, 2004.
P. Diggle, P. Heagerty, K.-Y. Liang, and S. Zeger. Analysis of Longitudinal Data.
Oxford University Press, New York, 2nd edition, 2002.
P. J. Diggle, I. Sousa, and A. G. Chetwynd. Joint modelling of repeated measurements and time-to-event outcomes: the fourth Armitage lecture. Stat Med, 27(16):2981–2998, 2008.
S. Durrleman and R. Simon. Flexible Regression Models with Cubic Splines. Stat Med,
8(5):551–561, 1989.
S. Eloranta, P. C. Lambert, T. M. Andersson, K. Czene, P. Hall, M. Bjorkholm, and
P. W. Dickman. Partitioning of excess mortality in population-based cancer patient
survival studies using flexible parametric survival models. BMC Med Res Methodol,
12(86), 2012.
K. Fall, F. Stromberg, J. Rosell, O. Andren, E. Varenhorst, and the SERPC Group. Reliability of death certificates in prostate cancer patients. Scand J Urol Nephrol, 42(4):352–357, 2008.
C. L. Faucett and D. C. Thomas. Simultaneously modelling censored survival data
and repeatedly measured covariates: a Gibbs sampling approach. Stat Med, 15(15):
1663–1685, 1996.
A. Gelfand and D. K. Dey. Bayesian model choice: asymptotics and exact calculations.
J Roy Statist Soc Ser B, 54:501–514, 1994.
R. Giorgi, M. Abrahamowicz, C. Quantin, P. Bolard, J. Esteve, J. Gouvernet, and
J. Faivre. A relative survival regression model using B-spline functions to model
non-proportional hazards. Stat Med, 22(17):2767–2784, 2003.
A. L. Gould, M. E. Boye, M. J. Crowther, J. G. Ibrahim, G. Quartey, S. Micallef, and
F. Y. Bois. Joint modeling of survival and longitudinal non-survival data: current
methods and issues. Report of the DIA Bayesian joint modeling working group. Stat
Med, 2014.
W. Gould, J. Pitblado, and B. Poi. Maximum Likelihood Estimation with Stata. Stata Press, 4th edition, 2010.
GPRD. General Practice Research Database. 2011.
P. M. Grambsch and T. M. Therneau. Proportional hazards tests and diagnostics based
on weighted residuals. Biometrika, 81:515–526, 1994.
X. Guo and B. P. Carlin. Separate and joint modeling of longitudinal and event time
data using standard computer packages. Am Stat, 58(1):16–24, 2004.
G. Han, M. J. Schell, and J. Kim. Improved survival modeling in cancer research using
a reduced piecewise exponential approach. Stat Med, 33(1):59–73, 2014.
R. Henderson, P. Diggle, and A. Dobson. Joint modelling of longitudinal measurements
and event time data. Biostatistics, 1(4):465–480, 2000.
R. Henderson, P. Diggle, and A. Dobson. Identification and efficacy of longitudinal
markers for survival. Biostatistics, 3(1):33–50, 2002.
S. R. Hinchliffe and P. C. Lambert. Flexible parametric modelling of cause-specific
hazards to estimate cumulative incidence functions. BMC Med Res Methodol, 13:13,
2013.
J. Hippisley-Cox, C. Coupland, Y. Vinogradova, J. Robson, M. May, and P. Brindle.
Derivation and validation of QRISK, a new cardiovascular disease risk score for the
United Kingdom: prospective open cohort study. BMJ, 335(7611):136, 2007.
F. Hsieh, Y.-K. Tseng, and J.-L. Wang. Joint modeling of survival and longitudinal
data: likelihood approach revisited. Biometrics, 62(4):1037–1043, 2006.
X. Huang, L. A. Stefanski, and M. Davidian. Latent-model robustness in joint models
for a primary endpoint and a longitudinal process. Biometrics, 65(3):719–727, 2009.
P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 221–233. University of California Press, 1967.
J. G. Ibrahim, M. Chen, and D. Sinha. Bayesian methods for joint modeling of longitudinal and survival data with applications to cancer vaccine trials. Stat Sinica, 14:863–883, 2004.
J. G. Ibrahim, H. Chu, and L. M. Chen. Basic concepts and methods for joint models
of longitudinal and survival data. J Clin Oncol, 28(16):2796–2801, 2010.
B. Jann. MOREMATA: Stata module (Mata) to provide various functions. Statistical
Software Components, Boston College Department of Economics, 2005.
I. Jatoi, W. F. Anderson, J.-H. Jeong, and C. K. Redmond. Breast cancer adjuvant
therapy: time to consider its time-dependent effects. J Clin Oncol, 29(17):2301–2304,
2011.
H. Jensen, R. Brookmeyer, P. Aaby, and P. K. Andersen. Shared frailty model for
left-truncated multivariate survival data. Technical report, 2004.
D. P. Jutte, L. L. Roos, and M. D. Brownell. Administrative record linkage as a tool
for public health research. Annu Rev Public Health, 32:91–108, 2011.
J. D. Kalbfleisch and R. L. Prentice. Statistical Analysis of Failure Time Data. New
York, Wiley, 2nd edition, 2002.
E. L. Kaplan and P. Meier. Nonparametric Estimation from Incomplete Observations.
Journal of the American Statistical Association, 53(282):457–481, 1958.
N. B. King, S. Harper, and M. E. Young. Use of relative and absolute effect measures
in reporting health inequalities: structured review. BMJ, 345:e5774, 2012.
C. Kooperberg and D. B. Clarkson. Hazard regression with interval-censored data.
Biometrics, 53(4):1485–1494, 1997.
C. Kooperberg, C. J. Stone, and Y. K. Truong. Hazard regression. J Amer Statist Assoc, 90(429):78–94, 1995.
P. C. Lambert and P. Royston. Further development of flexible parametric models for
survival analysis. Stata J, 9:265–290, 2009.
P. C. Lambert, P. W. Dickman, C. P. Nelson, and P. Royston. Estimating the crude
probability of death due to cancer and other causes using relative survival models.
Stat Med, 29(7-8):885–895, 2010a.
P. C. Lambert, P. W. Dickman, C. L. Weston, and J. R. Thompson. Estimating the
cure fraction in population-based cancer studies by using finite mixture models. J
Roy Statist Soc Ser C, 59(1):35–55, 2010b.
P. C. Lambert, L. Holmberg, F. Sandin, F. Bray, K. M. Linklater, A. Purushotham,
D. Robinson, and H. Møller. Quantifying differences in breast cancer survival between
England and Norway. Cancer Epidemiol, 35(6):526–533, 2011.
N. R. Latimer. Survival analysis for economic evaluations alongside clinical trials–
extrapolation with patient-level data: inconsistencies, limitations, and a practical
guide. Med Decis Making, 33(6):743–754, 2013.
N. R. Latimer, K. R. Abrams, P. C. Lambert, M. J. Crowther, A. J. Wailoo, J. P. Morden, R. L. Akehurst, and M. J. Campbell. Adjusting survival time estimates to account for treatment switching in randomized controlled trials–an economic evaluation context: Methods, limitations, and recommendations. Med Decis Making, 34(3):387–402, 2014.
L. M. Leemis. Variate generation for accelerated life and proportional hazards models. Operations Research, 35(6):892–894, 1987.
E. Lesaffre and B. Spiessens. On the effect of the number of quadrature points in a
logistic random effects model: an example. J Roy Statist Soc Ser C, 50(3):325–335,
2001.
X. Liao, D. M. Zucker, Y. Li, and D. Spiegelman. Survival analysis with error-prone
time-varying covariates: a risk set calibration approach. Biometrics, 67(1):50–58,
2011.
D. Y. Lin and L. J. Wei. The robust inference for the Cox proportional hazards model.
J Amer Statist Assoc, 84(408):1074–1078, 1989.
I. L. MacDonald. Numerical maximisation of likelihood: A neglected alternative to
EM? Int Stat Rev, 2014.
T. Mackenzie and M. Abrahamowicz. Marginal and hazard ratio specific random data
generation: Applications to semi-parametric bootstrapping. Stat Comp, 12:245–252,
2002.
J. J. McArdle, B. J. Small, L. Backman, and L. Fratiglioni. Longitudinal models of growth and survival applied to the early detection of Alzheimer's disease. J Geriatr Psychiatry Neurol, 18(4):234–241, 2005.
G. J. McLachlan and D. C. McGiffin. On the role of finite mixture models in survival
analysis. Stat Methods Med Res, 3(3):211–226, 1994.
B. Miladinovic, A. Kumar, R. Mhaskar, S. Kim, R. Schonwetter, and B. Djulbegovic. A
flexible alternative to the Cox proportional hazards model for assessing the prognostic
accuracy of hospice patient survival. PLoS One, 7(10):e47804, 2012.
T. S. Mok, Y.-L. Wu, S. Thongprasert, C.-H. Yang, D.-T. Chu, N. Saijo, P. Sunpaweravong, B. Han, B. Margono, Y. Ichinose, Y. Nishiwaki, Y. Ohe, J.-J. Yang, B. Chewaskulyong, H. Jiang, E. L. Duffield, C. L. Watkins, A. A. Armour, and M. Fukuoka. Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma. N Engl J Med, 361(10):947–957, 2009.
J. P. Morden, P. C. Lambert, N. Latimer, K. R. Abrams, and A. J. Wailoo. Assessing
methods for dealing with treatment switching in randomised controlled trials: a
simulation study. BMC Med Res Methodol, 11:4, 2011.
P. A. Murtaugh, E. R. Dickson, G. M. Van Dam, M. Malinchoc, and P. M. Grambsch. Primary biliary cirrhosis: Prediction of short-term survival based on repeated patient visits. Hepatology, 20:126–134, 1994.
J. C. Naylor and A. F. M. Smith. Applications of a method for the efficient computation
of posterior distributions. Applied Statistics, 31(3):214–225, 1982.
C. P. Nelson, P. C. Lambert, I. B. Squire, and D. R. Jones. Flexible Parametric Models
for Relative Survival, with Application in Coronary Heart Disease. Stat Med, 26(30):
5486–5498, 2007.
P. Philipson, I. Sousa, P. Diggle, P. Williamson, R. Kolamunnage-Dona, and R. Henderson. joineR - Joint Modelling of Repeated Measurements and Time-to-Event Data, 2012.
A. V. Piccorelli and M. D. Schluchter. Jointly modeling the relationship between
longitudinal and survival data subject to left truncation with applications to cystic
fibrosis. Stat Med, 31(29):3931–3945, 2012.
J. C. Pinheiro and D. M. Bates. Approximations to the log-likelihood function in the nonlinear mixed-effects model. J Comput Graph Statist, 4(1):12–35, 1995.
R. L. Prentice. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69(2):331–342, 1982.
R. L. Prentice, B. J. Williams, and A. V. Peterson. On the regression analysis of
multivariate failure time data. Biometrika, 68:373–379, 1981.
C. Proust-Lima and J. M. G. Taylor. Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics, 10(3):535–549, 2009.
H. Putter, M. Fiocco, and R. B. Geskus. Tutorial in biostatistics: competing risks and
multi-state models. Stat Med, 26(11):2389–2430, 2007.
R Development Core Team. R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria, 2012. ISBN 3-900051-07-0.
S. Rabe-Hesketh and A. Skrondal. Multilevel and Longitudinal Modeling Using Stata
Volume I: Continuous Responses. Stata Press, 3rd edition, 2012.
S. Rabe-Hesketh, A. Skrondal, and A. Pickles. Reliable estimation of generalized linear
mixed models using adaptive quadrature. Stata J, 2:1–21, 2002.
I. Rashid, L. Marcheselli, and M. Federico. Estimating survival in newly diagnosed
cancer patients: use of computer simulations to evaluate performances of different
approaches in a wide range of scenarios. Stat Med, 27(12):2145–2158, 2008.
N. Reid. A conversation with Sir David Cox. Statistical Science, 9:439–455, 1994.
L. Remontet, N. Bossard, A. Belot, J. Esteve, and FRANCIM. An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies. Stat Med, 26(10):2214–2228, 2007.
D. Rizopoulos. JM: An R Package for the Joint Modelling of Longitudinal and Time-
to-Event Data. J Stat Softw, 35(9):1–33, 7 2010.
D. Rizopoulos. Dynamic predictions and prospective accuracy in joint models for lon-
gitudinal and time-to-event data. Biometrics, 67(3):819–829, 2011.
D. Rizopoulos. Joint Models for Longitudinal and Time-to-Event Data With Applica-
tions in R. Chapman & Hall, 2012.
D. Rizopoulos and P. Ghosh. A Bayesian semiparametric multivariate joint model for
multiple longitudinal outcomes and a time-to-event. Stat Med, 30(12):1366–1380,
2011.
D. Rizopoulos, G. Verbeke, and G. Molenberghs. Shared parameter models under
random effects misspecification. Biometrika, 95(1):63–74, 2008.
D. Rizopoulos, G. Verbeke, and E. Lesaffre. Fully exponential Laplace approximations
for the joint modelling of survival and longitudinal data. J Roy Statist Soc Ser B, 71
(3):637–654, 2009.
D. Rizopoulos, G. Verbeke, and G. Molenberghs. Multiple-imputation-based residuals
and diagnostic plots for joint models of longitudinal and survival outcomes. Biomet-
rics, 66(1):20–29, 2010.
W. H. Rogers. sg17: Regression standard errors in clustered samples. Stata Tech Bull,
13:19–23, 1993.
V. Rondeau, D. Commenges, and P. Joly. Maximum penalized likelihood estimation in
a gamma-frailty model. Lifetime Data Anal, 9(2):139–153, Jun 2003.
V. Rondeau, J.-P. Pignon, and S. Michiels. A joint model for the dependence between
clustered times to tumour progression and deaths: A meta-analysis of chemotherapy
in head and neck cancer. Stat Methods Med Res, 2011.
V. Rondeau, Y. Mazroui, and J. R. Gonzalez. frailtypack: An R Package for the
Analysis of Correlated Survival Data with Frailty Models Using Penalized Likelihood
Estimation or Parametrical Estimation. Journal of Statistical Software, 47(4), 2012.
J. Rooney, S. Byrne, M. Heverin, B. Corr, M. Elamin, A. Staines, B. Goldacre, and O. Hardiman. Survival analysis of Irish amyotrophic lateral sclerosis patients diagnosed from 1995-2010. PLoS One, 8(9):e74733, 2013.
P. M. Rothwell, S. C. Howard, E. Dolan, E. O'Brien, J. E. Dobson, B. Dahlof, P. S. Sever, and N. R. Poulter. Prognostic significance of visit-to-visit variability, maximum systolic blood pressure, and episodic hypertension. Lancet, 375(9718):895–905, 2010.
P. Royston. Tools to simulate realistic censored survival-time distributions. Stata J, 12
(4):639–654, 2012.
P. Royston and D. G. Altman. Regression using fractional polynomials of continuous
covariates: Parsimonious parametric modelling. J Roy Statist Soc Ser C, 43(3):
429–467, 1994.
P. Royston and P. C. Lambert. Flexible Parametric Survival Analysis Using Stata:
Beyond the Cox Model. Stata Press, 2011.
P. Royston and M. K. B. Parmar. Flexible Parametric Proportional Hazards and Proportional Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects. Stat Med, 21(15):2175–2197, 2002.
P. Royston and W. Sauerbrei. Multivariable model-building: A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. Wiley, 2008.
M. J. Rutherford, M. J. Crowther, and P. C. Lambert. The use of restricted cubic
splines to approximate complex hazard functions in the analysis of time-to-event
data: a simulation study. J Statist Comput Simulation, 2014.
SAS Institute Inc. SAS/STAT Software, Version 9.2. Cary, NC, 2008.
W. Sauerbrei, P. Royston, and M. Look. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J, 49(3):453–473, 2007.
D. Schoenfeld. Partial Residuals for the Proportional Hazards Regression Model.
Biometrika, 69(1):239–241, 1982.
M. Schumacher, G. Bastert, H. Bojar, K. Hubner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R. L. Neumann, and H. F. Rauschecker. Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group. J Clin Oncol, 12(10):2086–2093, 1994.
S. Self and Y. Pawitan. AIDS Epidemiology: Methodological Issues, chapter Modeling
a marker of disease progression and onset of disease. Birkhauser, Boston, 1992.
F. Siannis, J. Copas, and G. Lu. Sensitivity analysis for informative censoring in
parametric survival models. Biostatistics, 6(1):77–91, 2005.
B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman &
Hall, London, 1992.
StataCorp. Statistical software: Release 12. College Station, TX: StataCorp LP, 2011.
StataCorp. Statistical software: Release 13. College Station, TX: StataCorp LP, 2013.
J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer, 3rd edition, 2002.
M. J. Sweeting and S. G. Thompson. Joint modelling of longitudinal and time-to-event
data with application to predicting abdominal aortic aneurysm growth and rupture.
Biom J, 53(5):750–763, 2011.
M.-P. Sylvestre and M. Abrahamowicz. Comparison of algorithms to generate event
times conditional on time-dependent covariates. Stat Med, 27(14):2618–2634, 2008.
J. M. G. Taylor, Y. Park, D. P. Ankerst, C. Proust-Lima, S. Williams, L. Kestin,
K. Bae, T. Pickles, and H. Sandler. Real-time individual predictions of prostate
cancer recurrence using joint models. Biometrics, 69(1):206–213, 2013.
T. Therneau. A Package for Survival Analysis in S, 2012. R package version 2.36-14.
T. M. Therneau and P. M. Grambsch. Modelling Survival Data: extending the Cox
model. Springer, 2000.
A. C. M. Thiebaut and J. Benichou. Choice of time-scale in Cox’s model analysis of
epidemiologic cohort data: a simulation study. Stat Med, 23(24):3803–3820, 2004.
Y.-K. Tseng, F. Hsieh, and J.-L. Wang. Joint modelling of accelerated failure time and
longitudinal data. Biometrika, 92(3):587–603, 2005.
A. A. Tsiatis and M. Davidian. A semiparametric estimator for the proportional hazards
model with longitudinal covariates measured with error. Biometrika, 88(2):447–458,
2001.
A. A. Tsiatis and M. Davidian. Joint modeling of longitudinal and time-to-event data:
an overview. Stat Sinica, 14:809–834, 2004.
F. Tuerlinckx, F. Rijmen, G. Verbeke, and P. D. Boeck. Statistical inference in generalized linear mixed models: a review. Br J Math Stat Psychol, 59(Pt 2):225–255, 2006.
Y. Wang and J. M. G. Taylor. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J Amer Statist Assoc, 96(455):895–905, 2001.
L. J. Wei, D. Y. Lin, and L. Weissfeld. Regression analysis of multivariate incomplete
failure time data by modeling marginal distributions. J Amer Statist Assoc, 84(408):
1065–1073, 1989.
M. C. Weinstein, B. O'Brien, J. Hornberger, J. Jackson, M. Johannesson, C. McCabe, and B. R. Luce. Principles of Good Practice for Decision Analytic Modeling in Health-Care Evaluation: Report of the ISPOR Task Force on Good Research Practices–Modeling Studies. Value in Health, 6(1):9–17, 2003.
H. White. A heteroskedasticity-consistent covariance matrix estimator and a direct test
for heteroskedasticity. Econometrica, 48(4):817–838, 1980.
H. White. Maximum likelihood estimation of misspecified models. Econometrica, 50:
1–25, 1982.
J. Whitehead. Fitting Cox’s regression model to survival data using GLIM. Applied
Statistics, 29:268–275, 1980.
M. Wolbers, A. Babiker, C. Sabin, J. Young, M. Dorrucci, G. Chene, C. Mussini, K. Porter, H. C. Bucher, and CASCADE. Pretreatment CD4 cell slope and progression to AIDS or death in HIV-infected patients initiating antiretroviral therapy–the CASCADE collaboration: a collaboration of 23 cohort studies. PLoS Med, 7(2):e1000239, 2010.
M. S. Wulfsohn and A. A. Tsiatis. A joint model for survival and longitudinal data
measured with error. Biometrics, 53(1):330–339, 1997.
W. Ye, X. Lin, and J. M. G. Taylor. Semiparametric modeling of longitudinal measurements and time-to-event data–a two-stage regression calibration approach. Biometrics, 64(4):1238–1246, 2008.
M. Yu, N. J. Law, J. M. G. Taylor, and H. Sandler. Joint longitudinal-survival-cure
models and their application to prostate cancer. Stat Sinica, 14:835–862, 2004.
D. M. Zucker. A pseudo partial likelihood method for semiparametric survival regression
with covariate errors. J Amer Statist Assoc, 100(472):1264–1277, 2005.