Development and application of
methodology for the parametric analysis
of complex survival and joint
longitudinal-survival data in biomedical
research
Thesis submitted for the degree of
Doctor of Philosophy
At the University of Leicester.
by
Michael James Crowther MMath MSc
Department of Health Sciences
University of Leicester.
1st November 2014.
Abstract
Development and application of methodology for the parametric analysis of complex
survival and joint longitudinal-survival data in biomedical research
M. J. Crowther
The occurrence of survival, or time-to-event, data is commonplace in medical research,
where interest lies in the time it takes from a given baseline for an event of interest
to occur, and the factors that are associated with it. For example, this could be the
effect of a treatment on the time to death since diagnosis of cardiovascular disease. The
primary aim of this thesis is to develop parametric methods for the analysis of complex
survival data, including the extension to joint models of longitudinal and survival data,
to provide a number of advantages over the commonly used semi-parametric Cox model.
New and current methodology is often assessed using simulation studies; however, in the
field of survival analysis these are often simplistic and fail to reflect biologically
plausible scenarios. In this thesis a general algorithm for simulating complex survival
data, from any given hazard function, is proposed and assessed. A general framework for
the parametric analysis of survival data is then developed, utilising numerical quadrature,
illustrated in detail using the special case of restricted cubic splines to model the
baseline hazard and time-dependent effects. Extensions to the framework including cluster
robust standard errors and excess mortality models are also considered. Finally, the
joint longitudinal-survival modelling framework is extended to incorporate the Royston-
Parmar survival model, and a mixture of two parametric distributions, both evaluated
through simulation, utilising the proposed simulation algorithm, showing advantages
over more simple parametric approaches. The estimation of joint models, using Gaussian
quadrature, is also evaluated through an extensive simulation study. Throughout
the thesis, user friendly software is developed to implement the methodological
components, allowing statisticians and non-statisticians alike to apply the methods
directly. A variety of clinical datasets in the areas of cancer, cardiovascular disease
and liver cirrhosis are used to exemplify the proposals.
Acknowledgements
I would first like to express my gratitude to my two main supervisors, Professor
Keith Abrams and Professor Paul Lambert, both of whom have been not only excellent
supervisors and mentors, but also friends. I would also like to thank Professor Harry
Hemingway for acting as my third supervisor, who I look forward to working with more
in the future.
Thanks also go to other members of the Department of Health Sciences for their
input and expertise, including fellow PhD students and junior researchers. In particular,
I’d like to thank Dr. Mark Rutherford for taking the time to read through a draft of this
thesis. Thanks also to my friends and colleagues at Karolinska Institutet, in particular
Dr. Therese Andersson and Dr. Sandra Eloranta.
I would like to thank all the employees of StataCorp who made my internship,
conducted during my PhD studies, such an enjoyable and invaluable experience. In
particular, I would like to thank Yulia Marchenko, Jeff Pitblado, Alan Riley and Vince
Wiggins. I would also like to thank all the users of my Stata commands who have
emailed over the years with questions, queries, bug reports and suggestions, that have
led to many improvements and made the experience so worthwhile.
I would also like to thank Dr. James Cook and Rickard Henrikson, purely for
keeping me sane throughout my PhD, predominantly through collaborative beer con-
sumption.
Finally, I would like to thank my parents, my sister, and my Grandpa, and the rest
of my friends, for their support over the past few years.
This PhD thesis has been prepared as part of a Doctoral Research Fellowship funded
by the National Institute for Health Research (NIHR) (DRF-2012-05-409).
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . II
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VI
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VIII
Chapter 1. Introduction . . . 1
1.1. Aims of the thesis . . . 1
1.2. Survival analysis . . . 2
1.3. Joint models for the analysis of longitudinal and survival data . . . 2
1.4. Layout of the thesis . . . 4
Chapter 2. Survival analysis . . . 6
2.1. Chapter outline . . . 6
2.2. Introduction . . . 6
2.3. Censoring . . . 6
2.4. Fundamental relationships in survival analysis . . . 8
2.5. Delayed entry (left truncation) . . . 9
2.6. Parametric survival analysis . . . 10
2.7. Cox proportional hazards model . . . 13
2.8. Royston-Parmar flexible parametric survival model . . . 14
2.9. Why a parametric approach? . . . 18
2.10. Maximum likelihood estimation . . . 20
2.11. Discussion . . . 23
Chapter 3. Clinical motivating examples . . . 24
3.1. Chapter outline . . . 24
3.2. German breast cancer dataset . . . 24
3.3. Breast cancer in England and Wales . . . 28
3.4. Bladder cancer recurrence . . . 31
3.5. Trial in prednisone treatment of patients with liver cirrhosis . . . 32
3.6. Cohort of obese patients with type 2 diabetes mellitus from the General Practice Research Database . . . 35
3.7. Trials of D-penicillamine treatment for patients with primary biliary cirrhosis . . . 38
3.8. Discussion . . . 41
Chapter 4. Simulation of complex survival data . . . 42
4.1. Chapter outline . . . 42
4.2. Introduction . . . 42
4.3. Motivating dataset - German breast cancer trial . . . 45
4.4. Simulating survival data . . . 47
4.5. A general framework for simulation of survival data . . . 48
4.6. Simulating from a complex baseline hazard function . . . 50
4.7. Simulating time-dependent effects . . . 57
4.8. Simulating time-varying covariates . . . 61
4.9. Simulating a censoring distribution . . . 64
4.10. survsim . . . 65
4.11. Discussion . . . 65
Chapter 5. A general framework for parametric survival analysis . . . 68
5.1. Chapter outline . . . 68
5.2. Introduction . . . 68
5.3. A general framework for the parametric analysis of survival data . . . 71
5.4. Time-dependent effects . . . 73
5.5. Excess mortality models . . . 73
5.6. Cluster robust standard errors . . . 75
5.7. Prediction . . . 76
5.8. Implementation in Stata: The stgenreg package . . . 77
5.9. Analysis of example datasets using stgenreg . . . 80
5.10. Improving the estimation procedure when using restricted cubic splines . . . 91
5.11. Further analysis of example datasets using strcs . . . 97
5.12. Discussion . . . 101
Chapter 6. Joint modelling of longitudinal and survival data . . . 104
6.1. Chapter outline . . . 104
6.2. Introduction . . . 104
6.3. Survival analysis with time-varying covariates . . . 105
6.4. Formulating the standard joint model . . . 108
6.5. Alternative association structures . . . 112
6.6. Estimation . . . 115
6.7. Prediction . . . 123
6.8. Discussion . . . 128
Chapter 7. Flexible parametric joint modelling of longitudinal and survival data . . . 130
7.1. Chapter outline . . . 130
7.2. Introduction . . . 130
7.3. Defining the joint model . . . 132
7.4. Simulation study . . . 134
7.5. Analysis of liver cirrhosis dataset . . . 138
7.6. Discussion . . . 143
Chapter 8. Joint modelling of longitudinal and survival data using finite mixture models . . . 152
8.1. Chapter outline . . . 152
8.2. Introduction . . . 152
8.3. Methods . . . 153
8.4. Simulation study . . . 158
8.5. Application to liver cirrhosis data . . . 160
8.6. Discussion . . . 162
Chapter 9. Adjusting for measurement error in baseline prognostic biomarkers: A joint modelling approach . . . 165
9.1. Chapter outline . . . 165
9.2. Introduction . . . 165
9.3. Joint model formulation . . . 167
9.4. Simulation study . . . 169
9.5. Analysis of GPRD cohort . . . 171
9.6. Discussion . . . 177
Chapter 10. Software for the joint modelling of longitudinal and survival data . . . 181
10.1. Chapter outline . . . 181
10.2. Introduction . . . 181
10.3. Data setup . . . 182
10.4. Model fitting . . . 184
10.5. Association structures . . . 187
10.6. Estimation . . . 188
10.7. Joint modelling of longitudinally measured serum bilirubin and time to death . . . 189
10.8. Post-estimation . . . 192
10.9. Discussion . . . 192
Chapter 11. Discussion . . . 194
11.1. Chapter outline . . . 194
11.2. Introduction . . . 194
11.3. Summary of the thesis . . . 194
11.4. Limitations . . . 197
11.5. Future work . . . 200
11.6. Conclusion . . . 201
Appendix A. Research Papers 1 and 2 - The Stata Journal and Statistics in Medicine . . . 203
Appendix B. Research Papers 3 and 4 - Journal of Statistical Software and Statistics in Medicine . . . 204
Appendix C. Research Paper 5 - Statistics in Medicine . . . 222
Appendix D. Research Paper 6 - Statistics in Medicine . . . 223
Appendix E. Research Paper 7 - BMC Medical Research Methodology . . . 224
Appendix F. Research Paper 8 - The Stata Journal . . . 233
References . . . 234
List of Tables
3.1 German breast cancer trial. Results from a Cox proportional hazards model, investigating effect of hormonal therapy on survival. . . . 25
3.2 German breast cancer trial. Results from a Royston-Parmar proportional hazards model with 4 degrees of freedom, investigating effect of hormonal therapy on survival. . . . 26
3.3 England and Wales breast cancer dataset. Results from a Cox proportional hazards model, investigating effect of deprivation status on survival. . . . 29
3.4 England and Wales breast cancer dataset. Results from a Royston-Parmar proportional hazards model, investigating effect of deprivation status on survival. . . . 30
3.5 Bladder cancer recurrence dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer. . . . 31
4.1 German breast cancer trial. Estimated treatment effects from Weibull, Royston-Parmar and Cox models. . . . 45
4.2 Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from both Weibull and mixture Weibull models. . . . 54
4.3 Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from Weibull and Royston-Parmar models. . . . 58
4.4 Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from proportional hazards Weibull and Royston-Parmar models, and non-proportional hazards Royston-Parmar model. . . . 62
5.1 Description of each component that can be included in the linear predictor of a parameter included in a stgenreg model. . . . 79
5.2 England and Wales breast cancer dataset. Results from optimised Weibull and general parametric models with varying number of nodes. . . . 82
5.3 Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure. . . . 90
5.4 Liver cirrhosis trial of prednisone treatment. Results from spline based proportional hazards model with time-varying prothrombin index. . . . 90
5.5 England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the fully numeric approach. . . . 94
5.6 England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the combined analytical/numeric approach. . . . 95
5.7 England and Wales breast cancer dataset. Results from combined analytic/numerical spline based survival model. . . . 96
5.8 England and Wales breast cancer dataset. Comparison of excess hazard ratios (and 95% confidence intervals) from models with the linear predictor on the log hazard scale and the log cumulative hazard scale. Both models have 6 knots with these placed evenly according to the distribution of log death times. . . . 98
5.9 Bladder cancer dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer. . . . 100
5.10 Bladder cancer dataset. Results from spline based and Cox models with cluster robust standard errors. . . . 101
6.1 Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure. . . . 107
7.1 Simulation results from Weibull scenario 1. Association is varied with α = −0.25, 0.25. . . . 148
7.2 Simulation results from mixture-Weibull scenario 2. Association is varied with α = −0.25, 0.25. . . . 149
7.3 Simulation results from Weibull scenario 3. Association is varied with α = −0.25, 0.25. . . . 150
7.4 Liver cirrhosis trial of prednisone treatment. Results from applying the RP restricted cubic spline and B-spline joint models. . . . 151
7.5 Liver cirrhosis trial of prednisone treatment. Results from joint models with varying knot locations. . . . 151
8.1 Estimates of bias and coverage from the simulation study. . . . 160
8.2 Liver cirrhosis trial of prednisone treatment. Parameter estimates and 95% Confidence Intervals (CI) for the Weibull, mixture-Weibull and RP joint models. . . . 164
9.1 Simulation results of bias and coverage of the association parameter, α. The standard deviation of the measurement error is σe. . . . 172
9.2 GPRD cohort of obese patients with type 2 diabetes mellitus. Results from applying a Royston-Parmar proportional hazards model adjusting for observed baseline systolic blood pressure, and a full joint model using the intercept association structure. All models are adjusted for age (years), sex (male) and BMI (kg/m2). . . . 175
9.3 GPRD cohort of obese patients with type 2 diabetes mellitus. Results from applying a joint model with random intercept and random linear slope in the longitudinal submodel, using the intercept association structure. . . . 176
List of Figures
2.1 Example hazard functions from the exponential, Weibull and Gompertz distributions. . . . 11
3.1 German breast cancer trial. Survival function is estimated using the Kaplan-Meier method, across treatment groups. . . . 25
3.2 German breast cancer trial. Hazard function estimated using the Royston-Parmar survival model. . . . 27
3.3 German breast cancer trial. Non-parametric estimates of the hazard function using the Epanechnikov and Epanechnikov 2 kernel density smoothers. Follow-up is restricted to 6 years. . . . 27
3.4 England and Wales breast cancer data, all ages, comparing least and most deprived. Survival function is estimated using the Kaplan-Meier method. . . . 29
3.5 England and Wales breast cancer dataset, all ages, comparing least and most deprived. Hazard function is estimated using a Royston-Parmar model. . . . 30
3.6 Liver cirrhosis trial investigating prednisone treatment. Survival function is estimated using the Kaplan-Meier method. . . . 32
3.7 Liver cirrhosis trial investigating prednisone treatment. Hazard function estimated using a Royston-Parmar model. . . . 33
3.8 Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid. . . . 34
3.9 Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid. . . . 34
3.10 GPRD cohort of obese patients with type 2 diabetes mellitus. Survival function is estimated using the Kaplan-Meier method. . . . 36
3.11 GPRD cohort of obese patients with type 2 diabetes mellitus. Hazard function is estimated using a Royston-Parmar model. . . . 36
3.12 GPRD cohort of obese patients with type 2 diabetes mellitus. Box plots of baseline SBP for patients who were censored or who suffered a stroke. . . . 37
3.13 GPRD cohort of obese patients with type 2 diabetes mellitus. Distribution of number of observed measurements of systolic blood pressure per patient. . . . 37
3.14 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Survival function is estimated using the Kaplan-Meier method. . . . 38
3.15 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Hazard function is estimated using a nonparametric kernel smoother. . . . 39
3.16 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) index for patients who were censored/died. Lowess smoother overlaid. . . . 40
3.17 Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) index for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid. . . . 40
4.1 German breast cancer trial. Predicted survival from Weibull and Royston-Parmar models overlaid on the Kaplan-Meier curves. . . . 46
4.2 German breast cancer trial. Predicted hazard functions from Weibull and Royston-Parmar survival models. . . . 46
4.3 Schematic flow diagram of simulation techniques, beginning at the top-left. . . . 49
4.4 Example two-component mixture Weibull baseline hazard functions. . . . 52
4.5 Example fractional polynomial baseline hazard function. . . . 55
4.6 Example time-dependent log hazard ratio representing a diminishing treatment effect. . . . 60
5.1 England and Wales breast cancer dataset. Comparison of fitted survival function and confidence intervals between analytic Weibull model and equivalent model fitted with quadrature. . . . 83
5.2 England and Wales breast cancer dataset. Predicted hazard function for the most affluent group with 95% confidence interval. . . . 87
5.3 England and Wales breast cancer dataset. Time-dependent hazard ratio for deprivation status. . . . 88
5.4 England and Wales breast cancer dataset. Kaplan-Meier estimates for the most affluent and most deprived groups, with predicted survival overlaid. The figure on the left shows predicted survival with a proportional effect of deprivation status, with the figure on the right allowing for non-proportional hazards in the effect of deprivation status. . . . 89
5.5 England and Wales breast cancer dataset. Excess hazard ratio comparing most deprived with least deprived group. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Shaded area represents the 95% confidence interval. . . . 99
5.6 England and Wales breast cancer dataset. Excess hazard ratios comparing most deprived with least deprived group. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Thicker lines are the estimated excess hazard ratios, with thinner lines associated 95% confidence intervals. . . . 99
5.7 Bladder cancer dataset. Baseline hazard rates for the four ordered events. . . . 101
6.1 Liver cirrhosis trial of prednisone treatment. Time-varying covariate within a survival analysis. . . . 108
6.2 Time-varying covariate within a joint model. . . . 110
6.3 Basis node locations for 2-dimensional 9 point simple Gauss-Hermite quadrature. . . . 120
6.4 Basis node locations for 2-dimensional 9 point simple Gauss-Hermite quadrature, transformed by Choleski decomposition of variance-covariance matrix. . . . 121
6.5 Conceptual example of adaptive Gauss-Hermite quadrature. . . . 124
7.1 Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid. . . . 138
7.2 Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's survival time. Lowess smoother overlaid. . . . 139
7.3 Liver cirrhosis trial of prednisone treatment. Marginal survival using 6 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval. . . . 141
7.4 Liver cirrhosis trial of prednisone treatment. Marginal survival using cubic B-splines with 3, 4 or 5 internal knots, resulting in 7, 8 or 9 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval. . . . 142
7.5 Liver cirrhosis trial of prednisone treatment. Predicted conditional survival probabilities for patients 98 and 253. . . . 143
7.6 Liver cirrhosis trial of prednisone treatment. Fitted marginal survival function from joint models with varying knot locations and number of internal knots. Left hand plot used 6 parameters to model the baseline log cumulative hazard function, right hand plot uses 4, 5 and 7 parameters. . . . 144
7.7 Simulated data. Fitted marginal survival function from joint models with either 1 or 5 degrees of freedom, overlaid on the Kaplan-Meier survival curve. . . . 146
8.1 Baseline hazard functions used in the simulation study. . . . 159
8.2 Liver cirrhosis trial of prednisone treatment. Fitted marginal survival functions. The left plot is from the Weibull based joint model, and the right from the mixture Weibull joint model. . . . 161
9.1 Example simulated observed longitudinal measurements with varying measurement error standard deviation. . . . 171
9.2 GPRD cohort of obese patients with type 2 diabetes mellitus. Longitudinal response measurements for SBP for 9 randomly selected patients who had at least 10 measurements. The dashed line represents the fitted longitudinal trajectories based on the joint model. . . . 174
9.3 Predicted survival from the Royston-Parmar flexible parametric model (FPM) and joint model, for a female, aged 60 years, BMI of 30, with SBP of 90, 130 or 200. . . . 176
10.1 Longitudinal profiles of log(serum bilirubin) for patients who were censored/died. Timescale is adjusted by taking away each patient's survival time. . . . 184
CHAPTER 1
Introduction
1.1. Aims of the thesis
The development of what is now known as the Cox proportional hazards model,
by Sir David Cox in 1972, is without doubt one of the most important developments
in medical and statistical research (Cox, 1972). With over 36,000 citations (Google
Scholar, February 2014) since publication, it is often the default choice for the analysis
of survival data; however, it is not without its disadvantages. The primary aim of this
thesis is to develop methodology for the parametric analysis of survival data, as an
alternative to the Cox model.
Statistical methods are often assessed using simulation studies, which generate
data under a known truth. To validate and assess the parametric methods developed
in this thesis, a further aim is to derive, in parallel, a general framework for
simulating more biologically plausible survival data, since simulation studies based
on current methods are often too simplistic. This provides a framework which can be
utilised in an extensive array of methodological areas.
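To make the idea of simulating "under a known truth" concrete, consider the simplest case: when a standard distribution is assumed, survival times can be drawn by inverting the survival function directly. For a Weibull hazard h(t) = λγt^(γ−1) with S(t) = exp(−λt^γ), setting S(t) = U for U ~ Uniform(0,1) gives t = (−log U / λ)^(1/γ). The sketch below, in Python purely for illustration (the software developed in this thesis is written in Stata), adds a proportional hazards treatment effect; the function name and all parameter values are arbitrary choices, not part of the thesis:

```python
import math
import random

def simulate_weibull(n, lam, gamma, log_hr, seed=1):
    """Simulate n survival times from a Weibull proportional hazards model:
    h(t | x) = lam * gamma * t**(gamma - 1) * exp(log_hr * x),
    where x is a binary treatment indicator. Returns (times, treatment)."""
    rng = random.Random(seed)
    times, treat = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0      # randomised 1:1 treatment arm
        rate = lam * math.exp(log_hr * x)       # PH effect scales the rate
        u = 1.0 - rng.random()                  # u in (0, 1], so log(u) is finite
        # Invert S(t) = exp(-rate * t**gamma)  =>  t = (-log(u) / rate)**(1/gamma)
        times.append((-math.log(u) / rate) ** (1.0 / gamma))
        treat.append(x)
    return times, treat

times, treat = simulate_weibull(1000, lam=0.1, gamma=1.5, log_hr=-0.5)
```

With a negative log hazard ratio the treated arm should, on average, survive longer, which provides a quick sanity check on any such simulation.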
Advanced statistical techniques, with the potential to be used in a variety of
clinical settings, are often developed and published in the methodological literature;
however, they can struggle to permeate through to applied biomedical research. A
further aim of this thesis is, in parallel to the methodological work, to create and
disseminate user friendly software which can be used by statisticians and non-
statisticians alike. The methods could be implemented in a variety of statistical
packages, such as R; however, I have chosen Stata.
Throughout the thesis, the methodology developed will be applied to real clinical
datasets in a variety of settings, such as liver cirrhosis and breast cancer, serving to
illustrate the wide variety of clinical areas that can benefit from the novel methodology,
and user friendly software.
1.2. Survival analysis
The occurrence of survival (or time-to-event) data is commonplace in medical re-
search, where interest lies in the time it takes from a given baseline, for an event of
interest to occur, and the factors that are associated with it. For example, this could be
the effect of a treatment on the time to death since diagnosis of cardiovascular disease.
The two main approaches to survival analysis are the semi-parametric approach
of Cox (1972), and fully parametric approaches assuming distributions such as the
exponential or Weibull (Collett, 2003). The Cox model does not assume
any functional form for the baseline hazard function, whereas a parametric approach
assumes a specific shape, estimated as part of the model. Both allow us to investigate
the influence that risk factors have on the rate of disease or mortality, for example.
In this thesis I concentrate on the parametric approach to survival analysis, in par-
ticular, deriving a general algorithm to simulate survival data under more biologically
plausible scenarios to better assess both methods used in practice, and novel models.
This then leads to the development of a general framework for parametric survival
analysis, motivated by wanting to incorporate greater flexibility than standard para-
metric models can provide, particularly in capturing complex baseline hazard functions
and time-dependent effects. The framework is extended to incorporate cluster robust
standard errors and relative survival, with an improved estimation routine when using
the special case of restricted cubic splines to model the baseline and time-dependent
effects, illustrated with applications in the areas of breast and bladder cancer.
1.3. Joint models for the analysis of longitudinal and survival data
It is often intrinsic in clinical practice that longitudinal outcomes, such as repeat-
edly measured systolic blood pressure, and survival data, such as time to cardiovascular
disease, are collected in parallel. Most often, these outcomes are analysed separately,
for example using linear mixed effect models to model the longitudinal response, and
as described above, the survival outcome is commonly analysed using a proportional
hazards model. When these two processes are associated, a more complex approach
to analysis is required. This has led to the field of joint modelling of longitudinal and
survival data, sometimes known as simultaneous modelling, receiving substantial at-
tention over the past 20 years, particularly within methodological statistical research.
The most common approach combines the linear mixed effect model with a propor-
tional hazards model, through shared random effects, which serve to characterise the
association between the two processes. A joint model can provide much greater in-
sights into both the longitudinal and survival outcomes, reducing bias and making
most efficient use of the data available, in an attempt to disentangle the underlying
relationships (Gould et al., 2014). Common examples include the association between
repeatedly measured prostate specific antigen (PSA) levels and recurrence of prostate
cancer (Proust-Lima and Taylor, 2009), the relationship between cluster of differenti-
ation 4 (CD4) cell counts in predicting the progression to acquired immune deficiency
syndrome (AIDS) in human immunodeficiency virus (HIV) infected patients (Wang and
Taylor, 2001), and modelling quality of life over time jointly with survival (Billingham
and Abrams, 2002). Their potential for use in clinical practice is quite substantial,
enabling prediction of the time to an event conditional upon a biomarker trajectory to
date (Rizopoulos, 2011).
Motivated by allowing more flexibility in the joint model framework, I extend the
frequentist shared parameter joint model to incorporate the Royston-Parmar (RP) flex-
ible parametric survival model, which uses restricted cubic splines on the log cumulative
hazard scale, and compare it to a previous proposal which used B-splines (Rizopoulos
et al., 2009), in application to a dataset of patients with liver cirrhosis, investigating
the association between repeated measures of prothrombin index and survival. The
proposal is further evaluated through an extensive simulation study, where I compare
adaptive and non-adaptive Gauss-Hermite quadrature, used to calculate the joint like-
lihood. I then move to the more traditional log hazard scale, incorporating a finite
mixture of parametric distributions as the survival submodel, comparing it to the RP
based joint model in application to the liver cirrhosis trial, and further investigating the
effect of misspecifying the baseline hazard function through simulation. I then describe
the novel use of a specific association structure within a joint model, linking baseline
systolic blood pressure (SBP) to the time to stroke in obese patients with type 2 dia-
betes mellitus, but utilising the often ignored repeated measures of SBP to account for
measurement error. This approach is further evaluated through simulation, to assess
the impact of measurement error. All of the joint models described in this thesis are
implemented in the extensive stjm package in Stata, written as a central component of
this thesis, which is illustrated using a further dataset of patients with primary biliary
cirrhosis, investigating the association between repeated measures of serum bilirubin
and survival.
1.4. Layout of the thesis
The thesis is organised as follows. Chapter 2 introduces the field of survival analysis,
and describes the fundamental relationships that form the basis of this thesis. Chapter
3 introduces six clinical datasets that are used throughout the thesis to motivate the
methodological developments, briefly describing the medical setting and summary sta-
tistics. Chapter 4 develops a framework for the simulation of complex survival data,
providing methods to simulate more biologically plausible data. Chapter 5 develops
a framework for the general parametric analysis of survival data, illustrated with a
variety of applications in breast cancer and liver cirrhosis. Chapter 6 introduces the
joint modelling of longitudinal and survival data, and Chapters 7 and 8 develop new
joint models to provide more flexible survival submodels to capture complex hazard
functions, incorporating the Royston-Parmar survival model and a finite mixture of
parametric survival models, respectively. Both proposals are illustrated by investigat-
ing the association between repeated measures of prothrombin index and survival in a
prednisone treatment trial of patients with liver cirrhosis. Chapter 9 describes the use
of a specific association structure within a joint model, linking baseline SBP to the time
to stroke in obese patients with type 2 diabetes mellitus. I then describe in Chapter 10
user friendly Stata software for fitting a variety of joint models. I conclude the thesis
in Chapter 11 with a discussion, outlining further work that could be undertaken.
CHAPTER 2
Survival analysis
2.1. Chapter outline
This chapter introduces the foundations of survival analysis, including commonly
used methods to analyse survival data. The framework described in this chapter will
form the basis for methodological extensions described in later chapters.
2.2. Introduction
Often in medical research, interest lies in the time it takes for an event of interest
to occur, from a given baseline. The analysis of such data is described as survival or
time-to-event analysis, where the term ‘failure’ is used to describe the occurrence of
the event of interest, and the ‘survival time’ is the time taken for the event of interest
to occur (Collett, 2003). Common examples of this include the time to death in a ran-
domised controlled trial of patients with cardiovascular disease, with the time origin
being the randomisation of treatment, or time to progression to acquired immunode-
ficiency syndrome (AIDS) in human immunodeficiency virus (HIV) positive patients,
with the time origin being diagnosis of HIV.
In the analysis of survival data, we are interested in the probability of an event and
the rate at which the event occurs, and how this rate varies between groups of patients,
for example, between males and females. However, the analysis is often complicated by
the fact that not all patients will experience the event of interest within the allocated
follow-up period. These patients are said to be censored.
2.3. Censoring
The event of interest will usually not occur, or be observed, for all subjects within
the time frame of a particular study. For example, consider a cohort of patients with
breast cancer, who enter the study at diagnosis, and the goal is to investigate the effect
of treatment A compared to treatment B on the time to death. We will not have
the time to death for all patients unless we wait for all patients to die. In the case of
breast cancer, some patients will survive over 30 years; as such, it is often not feasible to
continue a study beyond a certain time point. If patients have been under observation
for the duration of the study, but have not experienced the event of interest when the
study comes to an end, this is described as administrative censoring. Some patients
will be lost to follow-up (for example, a patient may move house and contact is lost
during the study period). When the study is concluded and the data is analysed, we
know that their unobserved time to death will be longer than their time in the study.
Censored patients cannot simply be removed from the analysis, as this would create a
missing-not-at-random process, leading to biased inferences.
The above described form of censoring is generally known as right censoring. Other
types, such as left censoring and interval censoring, can also occur. Left censoring
occurs when the event of interest happens before a patient comes under observation. For
example, consider patients followed up for recurrence of a particular cancer, following
removal of the primary tumour. If they are examined two months after the operation and
a recurrence is found, then the actual time of recurrence is less than two months, and
therefore their time to recurrence is left censored. Interval censoring occurs when the
event of interest occurs within a particular time window, but it is not known exactly
when. For example, continuing with the recurrence of cancer example, if a patient is
event free at two months, but a recurrence is found at the second visit occurring at four
months, then the actual time of recurrence is known to have occurred between two and
four months, which can be described as interval censored.
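These censoring mechanisms can be made concrete in code: under right censoring we observe the minimum of the event and censoring times, together with an indicator for which occurred. A minimal sketch (the function and variable names are my own, for illustration):

```python
# Right censoring: we observe t = min(T, C) and an event indicator
# d = 1 if the event occurred before censoring, 0 otherwise.
def apply_censoring(event_times, censoring_times):
    observed = []
    for T, C in zip(event_times, censoring_times):
        t = min(T, C)           # observed follow-up time
        d = 1 if T <= C else 0  # event indicator
        observed.append((t, d))
    return observed

# Three patients, with administrative censoring at the end of study (5 years)
data = apply_censoring([2.3, 7.1, 4.8], [5.0, 5.0, 5.0])
# -> [(2.3, 1), (5.0, 0), (4.8, 1)]
```

The second patient's true event time (7.1 years) is never observed; only the fact that it exceeds 5 years enters the analysis.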
2.3.1. Informative censoring
When patients are administratively censored, this can generally be considered to be
non-informative, i.e. the censoring process is not related to any factors which are
associated with a patient’s underlying survival time. When censoring is informative,
for example, if patients on a particular treatment being evaluated are withdrawn from
the study due to adverse effects of the particular treatment, and are subsequently
censored at this point, then standard analyses become invalid and care must be taken
when interpreting results. This can be viewed as a form of informative drop-out, which
will be considered in more detail in Chapters 6 to 10.
2.4. Fundamental relationships in survival analysis
The four fundamental functions in survival analysis are the cumulative probability
distribution function, F (t); survival function, S(t); the hazard function, h(t); and the
cumulative hazard function, H(t). Mathematically, they can all be written in terms of
one another. Let the continuous non-negative random variable T denote the observed
survival time, since entry to the study (t = 0). Suppose T has a probability distribution
with underlying probability density function f(t). The distribution function can then
be written as (Collett, 2003)
F(t) = P(T ≤ t) = ∫_0^t f(u) du,    (2.1)
which represents the probability that the survival time is less than some value t. Al-
ternatively,
S(t) = P (T > t) (2.2)
which is the proportion of the study population who have not experienced the event
of interest by time t, i.e. the probability of being event-free at time t. The hazard
function is defined as
h(t) = lim_{δt→0} P(t < T ≤ t + δt | T > t) / δt    (2.3)

     = −d log S(t) / dt    (2.4)
such that h(t) is the instantaneous failure rate at time t. I can further write
H(t) = ∫_0^t h(u) du    (2.5)

     = −log S(t)
where H(t) is the cumulative hazard function. The probability density function f(t) is
defined as,
f(t) = h(t)S(t) (2.6)
In any particular survival dataset, n patients will be observed, each with a time to event,
ti, and an event indicator, di, which takes the value of 1 if the ith patient experienced
the event, or 0 if they were censored.
Perhaps the most useful of these functions, in terms of communicating risk, are
the survival and hazard. The survival function allows us to make statements such as:
“The probability of experiencing the event within the next 5 years is A”, and the hazard
function allows us to make statements such as, “The rate of experiencing the event is B
per year.” From a patient’s perspective it is important to translate complex statistical
methods into communicable statements, easily understood by patient and clinician,
alike. Furthermore, these functions are required in the definition of the log-likelihood,
and subsequently required for estimation, which will be covered in Section 2.10.
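The relationships above can be verified numerically for any distribution with closed-form functions; a quick sketch using the exponential distribution (the parameter value is arbitrary):

```python
import math

# Exponential distribution with rate lam: check the fundamental
# identities f(t) = h(t)S(t) and H(t) = -log S(t).
lam = 0.3

def S(t): return math.exp(-lam * t)        # survival function
def h(t): return lam                       # constant hazard
def f(t): return lam * math.exp(-lam * t)  # density
def H(t): return lam * t                   # cumulative hazard

t = 2.0
assert abs(f(t) - h(t) * S(t)) < 1e-12     # f = h * S, Equation (2.6)
assert abs(H(t) + math.log(S(t))) < 1e-12  # H = -log S, Equation (2.5)
```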
2.5. Delayed entry (left truncation)
A patient does not always become at risk at time 0. Often in epidemiological
studies, age is used as the timescale, as it can be considered a better way of controlling
for age, compared to adjusting for it at baseline (Cheung et al., 2003; Thiebaut and
Benichou, 2004). Age at diagnosis, say, is then taken as the time at which a patient
becomes at risk of the event. Within a survival analysis, we must therefore condition
upon survival up to the entry time, and therefore the probability of surviving up to
time t, conditional on surviving up to time t0, can be written as
P(T > t | T > t_0) = S(t) / S(t_0)    (2.7)
which is the survival function evaluated at time t, divided by the survival function
evaluated at the entry time, t0. This must be accounted for when estimating a model,
which will be described further in Sections 2.6.2 and 2.10.
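Equation (2.7) is straightforward to evaluate once a survival function is specified; a sketch using a Weibull survival function (the parameter values are illustrative, not from any dataset in the thesis):

```python
import math

# Weibull survival S(t) = exp(-lam * t**gam); the conditional survival
# given entry at t0 is S(t) / S(t0), as in Equation (2.7).
lam, gam = 0.1, 1.2

def S(t):
    return math.exp(-lam * t ** gam)

def conditional_survival(t, t0):
    return S(t) / S(t0)

# Probability of surviving to t = 10, conditional on surviving to t0 = 5
p = conditional_survival(10.0, 5.0)
assert 0.0 < p < 1.0
assert conditional_survival(5.0, 5.0) == 1.0  # no extra time, probability 1
```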
2.6. Parametric survival analysis
To gain a greater insight into the observed data, we are generally interested in
fitting a statistical model. In survival analysis, this is usually conducted on the (log)
hazard scale, with covariates included in a linear predictor which is additive on the log
hazard scale. This then allows us to make statements such as by how much does being
exposed to a particular factor X increase/decrease the hazard rate, in relative terms.
It is also possible to model on alternative scales, such as the (log) cumulative hazard
scale, which I will return to in Section 2.8.2.
2.6.1. Exponential, Weibull and Gompertz distributions
By making assumptions about the shape of the hazard function we can begin to model
our time to event data.
2.6.1.1. Exponential distribution
The simplest assumption we can make is that the hazard is constant over time, and
therefore

h(t) = λ,    S(t) = exp(−λt),    f(t) = λ exp(−λt)
which gives f(t), the probability density function of the exponential distribution. There-
fore, if we assume that the hazard rate is constant we are assuming that the survival
times follow an exponential distribution.
2.6.1.2. Weibull distribution
A more flexible choice is the Weibull distribution, which assumes a hazard function
of the form
h(t) = λγt^{γ−1},

with

S(t) = exp(−λt^γ),    f(t) = λγt^{γ−1} exp(−λt^γ)
By assuming survival times come from a Weibull distribution, we incorporate more
flexibility for the hazard function, which can now be monotonically increasing or de-
creasing. It should be noted that if γ = 1, then the Weibull model reduces to the
exponential.
2.6.1.3. Gompertz distribution
An alternative parametric distribution is the Gompertz, which has been used exten-
sively to model mortality data. I define the hazard function
h(t) = λ exp(γt)
with corresponding survival and probability density functions,
S(t) = exp{−λγ^{−1}(e^{γt} − 1)},    f(t) = λ exp{γt − λγ^{−1}(e^{γt} − 1)}
This distribution is suitable for modelling hazard rates which increase or decrease ex-
ponentially with time.
I illustrate some of the shapes that the three above distributions can capture, in
terms of the hazard function, in Figure 2.1.
[Figure 2.1: hazard function (y-axis, 0.05 to 0.25) plotted against follow-up time
(x-axis, 0 to 4) for the exponential (λ = 0.1), Weibull (λ = 0.1, γ = 1.2 and γ = 0.8)
and Gompertz (λ = 0.1, γ = 0.2 and γ = −0.2) distributions.]

Figure 2.1. Example hazard functions from the exponential, Weibull and Gompertz
distributions.
Note from Figure 2.1, that the hazard functions which can be modelled using these
three distributions are either constant or monotonically increasing or decreasing. The
ability to incorporate a more flexible, but still parametric, function will be returned to
in Section 2.8.2 and throughout the thesis.
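The hazard shapes in Figure 2.1 can be reproduced directly from the three definitions above; a minimal sketch (parameter values chosen to match the figure):

```python
import math

# Hazard functions for the three standard parametric distributions
def h_exponential(t, lam):
    return lam  # constant in t

def h_weibull(t, lam, gam):
    return lam * gam * t ** (gam - 1)  # monotone in t

def h_gompertz(t, lam, gam):
    return lam * math.exp(gam * t)  # exponentially increasing/decreasing

# Weibull with gam > 1 is increasing, gam < 1 decreasing
assert h_weibull(2.0, 0.1, 1.2) > h_weibull(1.0, 0.1, 1.2)
assert h_weibull(2.0, 0.1, 0.8) < h_weibull(1.0, 0.1, 0.8)
# Weibull with gam = 1 reduces to the exponential
assert abs(h_weibull(2.0, 0.1, 1.0) - h_exponential(2.0, 0.1)) < 1e-12
```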
2.6.2. Log-likelihood function for a parametric model
The log-likelihood contribution of the ith patient, for a parametric survival model,
allowing for right-censoring, can be written as
log L_i = log{ f(t_i)^{d_i} S(t_i)^{1−d_i} }

        = d_i log[f(t_i)] + (1 − d_i) log[S(t_i)]    (2.8)
where ti and di are the event/censoring time and event indicator for the ith patient,
respectively. Alternatively, using Equation (2.6) I can write
log L_i = log{ h(t_i)^{d_i} S(t_i) }

        = d_i log[h(t_i)] + log[S(t_i)]    (2.9)
Equations (2.8) and (2.9) assume a patient becomes at risk at time 0. As described in
Section 2.5, this may not always be the case; a patient may not be at risk of the
event until some time after time 0. For example, this can occur when
age is used as the timescale, and a patient is not at risk until the age of diagnosis. To
incorporate delayed entry, Equation (2.9) is extended to
log L_i = d_i log[h(t_i)] + log[S(t_i)] − log[S(t_{0i})]    (2.10)
where t0i is the time at which the ith patient becomes at risk. The total likelihood is
therefore
L = ∏_{i=1}^{n} L_i

or alternatively the total log-likelihood is

log L = ∑_{i=1}^{n} log L_i
Maximising a log-likelihood function such as Equation (2.10) will be covered in Section
2.10.
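As an illustration of Equation (2.9), the code below evaluates the total Weibull log-likelihood under right censoring for a small invented dataset; a sketch only, not the estimation routine used in the thesis:

```python
import math

# Total Weibull log-likelihood under right censoring, Equation (2.9):
# log L_i = d_i * log h(t_i) + log S(t_i)
def weibull_loglik(lam, gam, times, events):
    total = 0.0
    for t, d in zip(times, events):
        log_h = math.log(lam * gam) + (gam - 1) * math.log(t)  # log hazard
        log_S = -lam * t ** gam                                # log survival
        total += d * log_h + log_S
    return total

times = [1.2, 3.4, 2.0, 5.0]
events = [1, 1, 0, 0]  # last two patients censored
ll = weibull_loglik(0.1, 1.2, times, events)
assert ll < 0
```

For an event (d_i = 1) the contribution matches the density form of Equation (2.8), since log f(t_i) = log h(t_i) + log S(t_i); a censored patient contributes only log S(t_i).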
2.7. Cox proportional hazards model
Interest is often in comparing between groups, for example, a clinical trial may
investigate how the hazard rate in a group of patients randomised to standard therapy,
compares to the hazard rate in a group of patients randomised to receive a new therapy.
To illustrate this, consider a binary covariate, X, with X = 0 representing standard
therapy, and X = 1 representing a new therapy. If I let h0(t) represent the hazard
function for a patient on standard therapy, I can write the hazard function for the ith
patient as
h_i(t) = h_0(t) exp(X_i β)    (2.11)
which is known as the proportional hazards model, with h0(t) known as the baseline
hazard function (i.e. the hazard function when all covariates are equal to zero), and
therefore
h_i(t) = h_0(t) exp(β)   if X_i = 1

h_i(t) = h_0(t)          if X_i = 0
To compare the two groups, I can calculate how much higher (or lower) the hazard rate
is in the new therapy group (X = 1), compared to standard therapy (X = 0)
HR = [h_0(t) exp(β)] / h_0(t) = exp(β)
which is known as the Hazard Ratio (HR), and equivalently, β is known as the log
hazard ratio. This of course generalises to a vector of covariates, Xi, and associated
vector of log hazard ratios, β.
Cox (1972) derived a partial likelihood function for the ith patient for a proportional
hazards model (assuming no ties) as
L_i(β) = [ exp(X_i β) / ∑_{l∈R(t_i)} exp(X_l β) ]^{d_i}    (2.12)
where R(ti) is the set of individuals who are at risk at time ti. The corresponding
log-likelihood function is written as
log L_i(β) = d_i [ X_i β − log ∑_{l∈R(t_i)} exp(X_l β) ]    (2.13)
Unlike the parametric models described in Section 2.6, this formulation does not require
specification of h0(t), the baseline hazard function, and therefore the distribution of T
does not have to be specified. Equation (2.12) is therefore not a full likelihood, but
known as a partial likelihood (Cox, 1975), as it does not make use of the observed
survival or censoring times. It can be maximised using standard procedures such as
Newton-Raphson, described later in Section 2.10.
The key assumption of Equation (2.11) is that of proportional hazards, in that
covariate effects remain constant over the entire follow-up period. Various diagnostics have
been proposed in the literature to assess the proportional hazards assumption (Gramb-
sch and Therneau, 1994), including graphical approaches such as log(− log(S(t))) plots,
and those based on residuals (Schoenfeld, 1982). Alternatively, the presence of non-
proportional hazards can be evaluated directly by including an interaction of the co-
variate of interest and a function of time (Sauerbrei et al., 2007). The simulation and
modelling of non-proportional hazards will be considered further in Chapters 4 and 5.
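The partial log-likelihood of Equation (2.13) can be computed directly for a small dataset; the sketch below assumes no tied event times, no delayed entry, and a single binary covariate (the data are invented):

```python
import math

# Cox partial log-likelihood for untied event times, Equation (2.13):
# each event contributes X_i*beta - log(sum over risk set of exp(X_l*beta)).
def cox_partial_loglik(beta, times, events, x):
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored observations contribute no event term
        # risk set R(t_i): everyone still under observation at t_i
        risk_sum = sum(math.exp(x[l] * beta)
                       for l in range(len(times)) if times[l] >= t_i)
        total += x[i] * beta - math.log(risk_sum)
    return total

times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
x = [0, 1, 0, 1]  # binary treatment indicator
# At beta = 0 each event contributes -log(risk set size): sets of 4, 3, 1
assert abs(cox_partial_loglik(0.0, times, events, x) + math.log(12)) < 1e-12
```

Note that no baseline hazard appears anywhere in the computation, reflecting the semi-parametric nature of the model.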
2.8. Royston-Parmar flexible parametric survival model
In the analysis of clinical trial data or population-based studies, the phenomenon
of a turning point in the underlying hazard function is often observed (Murtagh et al.,
1994; Lambert and Royston, 2009; Eloranta et al., 2012). For example, in cancer trials,
the mortality rate tends to peak soon after the beginning of the trial, before tailing off.
Although hazard ratios can generally be quite insensitive to a poorly specified baseline
hazard function (Rutherford et al., 2014), when interest lies in measures of absolute risk
it is vital to accurately capture the baseline (Royston and Lambert, 2011). Use of more
flexible, yet parametric, models is increasing in medical research as the benefits of a
parametric approach, such as the reporting of measures of absolute risk, prediction and
extrapolation, become recognised (King et al., 2012). One such class of survival models
is the Royston-Parmar flexible parametric survival model (Royston and Parmar, 2002).
This model framework relies on the use of restricted cubic splines to model the baseline
on a variety of scales.
2.8.1. Restricted cubic splines
The use of splines, piecewise smooth polynomials, to model nonlinear relationships in
statistics is commonplace (Durrleman and Simon, 1989). They provide an extremely
flexible framework to capture complex shapes in patterns of data. They are continuous
in nature, and have constraints imposed that ensure they are continuous at the places
where the piecewise polynomials join, called knots. Here I concentrate on restricted
cubic splines, as they will be used extensively in later chapters.
I begin by defining a cubic spline function, s_3(x), the cubic being the most commonly
used degree

s_3(x) = ∑_{j=0}^{3} β_{0j} x^j + ∑_{i=1}^{K} β_{i3} (x − k_i)_+^3    (2.14)

with

(u)_+ = u if u > 0, and (u)_+ = 0 if u ≤ 0
This can be extended to allow for a cubic interpolation with the added restriction
that the fitted function is forced to be linear beyond the boundary knots, known as a
restricted cubic spline function. I can therefore write a restricted cubic spline function
of x, with vector of knots k_0, as s{x | γ, k_0}. For example, with K knots (in
ascending order) a restricted cubic spline function can be expressed as:
s(x) = γ_0 + γ_1 z_1 + γ_2 z_2 + ⋯ + γ_{K−1} z_{K−1}    (2.15)
and
z_1 = x

z_p = (x − k_p)_+^3 − κ_p (x − k_1)_+^3 − (1 − κ_p)(x − k_K)_+^3,    p = 2, …, K − 1

where

κ_p = (k_K − k_p) / (k_K − k_1)
Choosing the number of knots, K, will be considered further in Section 2.8.3.
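The basis functions z_p defined above can be computed directly from a set of knots; a minimal sketch (the function name is mine):

```python
# Restricted cubic spline basis for a single value x, given knots
# k_1 < ... < k_K; returns [z_1, ..., z_{K-1}] as in Equation (2.15).
def rcs_basis(x, knots):
    def pos3(u):  # (u)_+^3
        return u ** 3 if u > 0 else 0.0
    K = len(knots)
    k1, kK = knots[0], knots[-1]
    z = [x]  # z_1 = x
    for p in range(1, K - 1):
        kappa = (kK - knots[p]) / (kK - k1)
        z.append(pos3(x - knots[p]) - kappa * pos3(x - k1)
                 - (1 - kappa) * pos3(x - kK))
    return z

knots = [0.0, 1.0, 2.0, 3.0]
# Below the first knot all cubic terms vanish: the function is linear in x
assert rcs_basis(-1.0, knots)[1:] == [0.0, 0.0]
# K knots give K - 1 basis terms
assert len(rcs_basis(1.5, knots)) == len(knots) - 1
```

The restriction also forces linearity beyond the final knot: for x above k_K the cubic and quadratic terms of each z_p cancel exactly, which is what stabilises the fitted function in the tails.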
2.8.2. The Royston-Parmar model
Following Royston and Parmar (2002), to derive the flexible parametric model, I begin
with the survival function for a Weibull distribution
S(t) = exp(−λt^γ)
Transforming to the log cumulative hazard scale,
log H(t) = log[−log S(t)] = log(λ) + γ log(t)
Therefore on the log cumulative hazard scale, this gives a linear function of log time.
Incorporating a vector of baseline covariates, X_i, gives

log H(t | X_i) = log(λ) + γ log(t) + X_i β    (2.16)
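The linearity just derived is easy to confirm numerically: for the Weibull, the log cumulative hazard is an exact linear function of log time. A quick check (parameter values arbitrary):

```python
import math

# For the Weibull, H(t) = lam * t**gam, so
# log H(t) = log(lam) + gam * log(t): linear in log(t).
lam, gam = 0.1, 1.2

def log_cum_hazard(t):
    return math.log(lam * t ** gam)

for t in [0.5, 1.0, 2.0, 4.0]:
    expected = math.log(lam) + gam * math.log(t)
    assert abs(log_cum_hazard(t) - expected) < 1e-12
```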
The fundamental idea of the Royston-Parmar model is to replace log(t) with a restricted
cubic spline function of log(t), to provide much greater flexibility. By relaxing the linear
relation between the log cumulative hazard and log time, it provides a much more
flexible modelling framework to capture non-linearities in the log cumulative hazard
function (and subsequently the hazard function). Furthermore, the (log) cumulative
hazard function can be considered a simpler function than the hazard function, and so
fewer spline terms are generally required to capture it, compared to modelling on the
(log) hazard scale (Royston and Lambert, 2011).
I now substitute the baseline function, log(λ) + γ log(t), in Equation (2.16), for a
restricted cubic spline function of log(t)
log H(t | X_i) = η_i(t) = s{log(t) | γ, k_0} + X_i β    (2.17)
and transforming to the hazard and survival scales, we have
h(t | X_i) = (1/t) · [d s{log(t) | γ, k_0} / d log(t)] · exp(η_i(t)),    S(t | X_i) = exp{−exp(η_i(t))}
Given the fully parametric nature of the model, the derivatives of the spline function
required in the definition of the hazard function can be easily calculated. For fur-
ther details on the Royston-Parmar survival model, I refer the reader to Royston and
Lambert (2011).
This ‘flexible parametric model’ has been gaining in popularity in recent years,
with a number of applied examples in the areas of cancer survival (Lambert et al.,
2011) and prognostic research (Miladinovic et al., 2012; Rooney et al., 2013), and
methodological extensions such as competing risks (Hinchliffe and Lambert, 2013),
incorporating a cure fraction (Andersson et al., 2011), and relative survival (Nelson
et al., 2007). Incorporating the Royston-Parmar survival model into a joint model
framework is the focus of Chapter 7.
When applying this form of survival model, knot locations are often defined based
on the distribution of event times, i.e. equally spaced centiles. The linearity assumption
before the first knot and after the final knot leads to stability in the estimated function
at the extremes of the data, with the first and final knots often chosen to be the
minimum and maximum observed (log) event times. The parametric nature of the
survival model ensures that smooth continuous time predictions can be obtained and
tailored at the individual level, allowing out of sample predictions to be made.
2.8.3. Selecting the degrees of freedom
A central component of the RP model is that a number of degrees of freedom (or
equivalently a number of knots), must be specified to model the baseline cumulative
hazard function. This is also true of modelling time-dependent effects using splines;
however, for brevity I concentrate only on the aspect of capturing the baseline.
A recent extensive simulation study that I was involved in, which directly utilised
the simulation methodology developed in Chapter 4, assessed standard model selection
criteria in selecting the optimum degrees of freedom (Rutherford et al., 2014), including
the Akaike Information Criterion (AIC) proposed by Akaike (1973), and the Bayesian
Information Criterion (BIC) proposed by Gelfand and Dey (1994).
Rutherford et al. (2014) concluded that restricted cubic splines provide an effective
way of capturing simple and complex baseline hazard functions, with the approxima-
tion not heavily reliant on the ‘correct’ selection of number of knots. We found that
if a sufficient number of knots are chosen, the hazard function can be very closely ap-
proximated. However, care must be taken not to specify too many degrees of freedom,
as this can lead to over-fitting where local deviations in the data are picked up.
The default way of choosing knot locations is based on equally spaced centiles of
the distribution of (log) event times, which was found to perform very well across all
scenarios. Of course, sensitivity analyses can be conducted to assess this on specific
clinical datasets, by varying the knot locations, where parameter estimates have been
found to be quite robust (Lambert et al., 2010b; Andersson et al., 2011). This can
be expected as hazard ratios are often quite insensitive to a poorly specified baseline
hazard, which can occur if too few degrees of freedom are chosen.
2.9. Why a parametric approach?
Despite the Cox model remaining the most popular survival model choice (Cox,
1972), there is growing interest in parametric survival models (King et al., 2012). In
particular, there are a number of benefits to undertaking a parametric approach in the
analysis of survival data. As described above, the Cox model does not directly estimate
the baseline hazard function, whereas a parametric model does. By directly modelling
the baseline hazard function, measures of absolute risk, such as hazard rates, can be
obtained directly, including quantifying the associated uncertainty. From epidemiolog-
ical and clinical perspectives, this is particularly useful in illustrating how hazard rates
change and evolve over time. If the baseline is correctly specified then a fully paramet-
ric approach can be more efficient, with smaller standard errors, than the equivalent
Cox model estimated using partial likelihood (Collett, 2003); however, this gain is often
very minor. The modelling of time-dependent effects (non-proportional hazards) can
be conducted much more easily within a parametric framework, and is often compu-
tationally more efficient when compared to a Cox model with time-dependent effects.
Modelling both the baseline hazard and time-dependent effects in continuous time has
the further benefit of allowing predictions to be made both in and out of sample, which
is particularly useful in prognostic modelling. Hazard ratios can also be translated back
to an absolute scale, which can be of use for calculating the number needed to treat.
Finally, within the economic decision modelling framework, parametric survival models
must be used to model survival data, as extrapolation is often required, frequently over
a lifetime horizon, in order to calculate projected treatment effects and costs (Latimer,
2013).
In particular, there is growing use of the Royston-Parmar flexible parametric sur-
vival model, described in the previous section (Royston and Parmar, 2002). Recently,
a number of applied publications have extolled the benefits of the Royston-Parmar ap-
proach (Miladinovic et al., 2012; Rooney et al., 2013). In direct comparison to the Cox
model, they found that “On balance we prefer the Royston-Parmar model over Cox as
it comes with the advantages of parametric models whilst closely matching the Cox es-
timates”, and concluded that “Royston-Parmar flexible parametric modeling proved a
powerful method” (Rooney et al., 2013), and “Researchers involved in prognosticating
patient survival are encouraged to consider the Royston-Parmar model as an alternative
to Cox” (Miladinovic et al., 2012).
2.10. Maximum likelihood estimation
The models described in Section 2.6 can be estimated using maximum likelihood
techniques (Gould et al., 2010). Once again, the log-likelihood contribution of the ith
patient, allowing for right censoring and delayed entry (left truncation), is written as
log L_i = log[ h(t_i)^{d_i} S(t_i) / S(t_{0i}) ]

        = d_i log{h(t_i)} + log{S(t_i)} − log{S(t_{0i})}    (2.18)
where t0i and ti are the observed entry and survival/censoring times, respectively, for
the ith patient. If delayed entry is not present then the third term in Equation (2.18)
can be dropped. Using Equation (2.18) the log-likelihood can be directly maximised if
using one of the common parametric survival models described in Section 2.6.1.
Finally, it should be noted that using Equation (2.5), Equation (2.18) can be written
only in terms of the hazard function
log L_i = d_i log{h(t_i)} − ∫_{t_{0i}}^{t_i} h(u) du    (2.19)
This directly implies that a model can be estimated using only the hazard function,
assuming the integral in Equation (2.19) can be calculated. Given that often the hazard
function is of most interest, specifying a complex hazard function can be advantageous.
I will return to this aspect of the log-likelihood function for parametric models in
Chapter 5.
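As a sketch of this idea, which Chapter 5 develops using numerical quadrature, the code below evaluates Equation (2.19) for a single patient, approximating the cumulative hazard integral with a hard-coded three-point Gauss–Legendre rule (a toy implementation of my own, not the thesis software):

```python
import math

# Three-point Gauss-Legendre nodes and weights on [-1, 1]
GL_NODES = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
GL_WEIGHTS = [5 / 9, 8 / 9, 5 / 9]

def cumulative_hazard(h, t0, t1):
    """Approximate the integral of h(u) du from t0 to t1 by quadrature."""
    mid, half = (t0 + t1) / 2, (t1 - t0) / 2
    return half * sum(w * h(mid + half * v)
                      for v, w in zip(GL_NODES, GL_WEIGHTS))

def loglik_i(h, t, d, t0=0.0):
    # Equation (2.19): d_i * log h(t_i) - integral_{t0i}^{ti} h(u) du
    return d * math.log(h(t)) - cumulative_hazard(h, t0, t)

# Weibull hazard with gamma = 2 is linear in t, so the rule is exact here
lam, gam = 0.1, 2.0
h = lambda t: lam * gam * t ** (gam - 1)
t, d = 3.0, 1
analytic = d * math.log(lam * gam * t) - lam * t ** gam
assert abs(loglik_i(h, t, d) - analytic) < 1e-12
```

Only the hazard function is ever specified; the survival function never appears explicitly, which is what makes complex user-defined hazard models tractable.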
More formally, I have the likelihood function with parameter vector θ ∈ Θ, given
the observed data vector, X
L(θ; X) = ∏_{i=1}^{n} L(θ; X_i)    (2.20)
with the assumptions that the observations are independent and identically distributed.
2.10.1. The Newton-Raphson algorithm
Maximising the likelihood in question can be conducted using a variety of techniques.
Here I briefly describe, based on Gould et al. (2010), the most commonly used approach,
as implemented in the ml suite in Stata (StataCorp., 2011). In this section I assume
interest in solving for a parameter vector, to maintain generality.
In essence, I wish to find the estimated parameter values, \hat{\theta}, which maximise a
likelihood function given the observed data, X:

L(\hat{\theta}; X) = \max_{\theta \in \Theta} L(\theta; X) \quad (2.21)

or equivalently

\log L(\hat{\theta}; X) = \max_{\theta \in \Theta} \log L(\theta; X) \quad (2.22)
To obtain the solution, I can attempt to solve this analytically by taking the first
derivative with respect to the parameter vector, and setting it equal to zero
\left. \frac{\partial \log L(\theta; X)}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0 \quad (2.23)
which represents finding the roots of the gradient vector.
Perhaps the most popular method of finding roots is Newton’s method. This is an
iterative procedure, which begins with a set of initial values, θi, and is updated under
the following scheme:
(1) Start with an initial guess θi
(2) Calculate a new guess \theta_{i+1} = \theta_i - H(\theta_i)^{-1} g(\theta_i), where g(\theta_i) is the
gradient vector, or score, and H(\theta_i) is the matrix of second derivatives, or Hessian.
(3) Repeat
The procedure will be repeated until specified convergence criteria are met. For
example, within Stata, convergence is declared when g(\theta_i) H(\theta_i)^{-1} g(\theta_i)' < \varepsilon_1,
where \varepsilon_1 = 1 \times 10^{-5}. One of the following two conditions must also be met before
convergence is declared:
(1) |\theta_{i+1} - \theta_i| < \varepsilon_2, where \varepsilon_2 = 1 \times 10^{-6},
(2) |\log L(\theta_{i+1}; X) - \log L(\theta_i; X)| < \varepsilon_3, where \varepsilon_3 = 1 \times 10^{-7}.
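The update and stopping rules above can be sketched as follows, using an exponential model with θ = log λ, for which the score and Hessian are available in closed form and the maximum likelihood estimate is the number of events divided by the total follow-up time. This is an illustrative Python sketch, not Stata's ml implementation; the event count D and person-time T are made-up values.

```python
import numpy as np

def newton_raphson(theta, grad, hess, eps1=1e-5, eps2=1e-6, max_iter=100):
    """Generic Newton-Raphson: theta_{i+1} = theta_i - H^{-1} g, stopping
    when the scaled gradient g' H^{-1} g is below eps1 (cf. Stata's
    criterion) and the parameter change is below eps2."""
    theta = np.atleast_1d(np.asarray(theta, float))
    for _ in range(max_iter):
        g = np.atleast_1d(grad(theta))
        H = np.atleast_2d(hess(theta))
        step = np.linalg.solve(H, g)       # H^{-1} g
        theta_new = theta - step
        if abs(g @ step) < eps1 and np.max(np.abs(theta_new - theta)) < eps2:
            return theta_new
        theta = theta_new
    raise RuntimeError("did not converge")

# Exponential model with theta = log(lambda): logL = D*theta - exp(theta)*T,
# so the MLE is lambda-hat = D / T (events over total follow-up time)
D, T = 10.0, 52.3
grad = lambda th: np.array([D - np.exp(th[0]) * T])
hess = lambda th: np.array([[-np.exp(th[0]) * T]])
theta_hat = newton_raphson([0.0], grad, hess)
print(np.exp(theta_hat[0]))   # should equal D / T
```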
2.10.2. Numerical derivatives
The algorithm described above requires the log-likelihood function, and its first and
second derivatives. If the score and Hessian cannot be calculated analytically, then
Stata can provide a numerical approximation. Even when analytic forms are available,
general purpose optimisers such as the ml suite in Stata can be considered powerful and
accurate enough that the numerical approach can be used instead of programming the
score and Hessian analytically (MacDonald, 2014).
An analytic derivative can be defined as

f'(z) = \frac{\mathrm{d} f(z)}{\mathrm{d} z} = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h} \quad (2.24)

which leads to the approximation

f'(z) \approx \frac{f(z + h) - f(z)}{h} \quad (2.25)
for an appropriate h. Stata actually uses a centred derivative calculation of the form
f ′(z) ≈ f(z + h/2)− f(z − h/2)
h
which, although it increases the computation time (since f(z) from Equation (2.25) is
already calculated in the likelihood function call), reduces the order of magnitude
of the error from O(h) to O(h^2). For more details on choosing h, and the numerical
accuracy of the approximation, I refer the reader elsewhere (Gould et al., 2010).
Similarly, the same method can be applied to calculate the required second deriva-
tives
f''(z) \approx \frac{f'(z + h/2) - f'(z - h/2)}{h}
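The centred difference scheme can be sketched and checked against a function with known derivatives. This is a Python illustration; the step sizes h are fixed illustrative values, not Stata's adaptive choices.

```python
import numpy as np

def central_diff(f, z, h=1e-5):
    """Centred first derivative, f'(z) ~ (f(z + h/2) - f(z - h/2)) / h,
    with error O(h^2) rather than the O(h) of the forward difference."""
    return (f(z + h / 2) - f(z - h / 2)) / h

def central_diff2(f, z, h=1e-3):
    """Second derivative, applying the same centred scheme to f'."""
    return (central_diff(f, z + h / 2, h) - central_diff(f, z - h / 2, h)) / h

# Check against a function with known derivatives: f(z) = exp(2z),
# so f'(z) = 2 exp(2z) and f''(z) = 4 exp(2z)
f = lambda z: np.exp(2 * z)
print(central_diff(f, 0.5), 2 * np.exp(1.0))
print(central_diff2(f, 0.5), 4 * np.exp(1.0))
```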
All methodological developments described in this thesis use this approach. Recently,
MacDonald (2014) has argued that, given the availability of general purpose numerical
maximisers, there is no need to go directly to the Expectation-Maximisation (EM)
algorithm when handling incomplete data.
2.11. Discussion
In this chapter I have introduced the key fundamentals of survival analysis and
introduced notation that will be used in succeeding chapters. The parametric distri-
butions described in this chapter will be used to simulate survival data in Chapter
4, a process often required to validate model assumptions and assess new methodol-
ogy, which will then be extended to simulate complex survival data, including joint
longitudinal-survival data to be used in Chapters 7, 8 and 9.
Researchers are encouraged to adopt new methodology, and move away from the
‘cottage industry’ that has developed around the Cox model (Reid, 1994). Given the
benefits of the parametric approach it is hoped that researchers move away from the de-
fault choice, and embrace the alternatives and the benefits that they can provide. This
motivates the methodological developments in Chapter 5, where I propose a general
framework for the parametric analysis of survival data.
In Chapter 7, I will extend the Royston-Parmar model to the joint longitudinal-
survival modelling framework, illustrating the benefits that a flexible parametric ap-
proach can have within the joint model framework.
CHAPTER 3
Clinical motivating examples
3.1. Chapter outline
In this chapter I describe the motivating clinical datasets used throughout the thesis.
The methodological developments described in subsequent chapters can be motivated
by a variety of datasets from varying clinical areas, not just limited to a particular
disease or setting of interest. Here I present summary statistics and exploratory data
analyses of datasets from areas such as cancer, cardiovascular disease and liver cirrhosis.
This includes non-parametric methods, such as Kaplan-Meier survival curves, which are
a standard way to present survival data (Kaplan and Meier, 1958), and Royston-
Parmar models to illustrate parametric estimates of the hazard function, with degrees
of freedom selected using the AIC and BIC, described in Chapter 2.
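As a reminder of how the Kaplan-Meier estimate is constructed, a minimal sketch follows (in Python, with a toy dataset; the analyses in this chapter are carried out in Stata).

```python
import numpy as np

def kaplan_meier(t, d):
    """Kaplan-Meier estimate: S(t) is the product over event times
    t_j <= t of (1 - d_j / n_j), with n_j the number at risk just
    before t_j and d_j the number of events at t_j."""
    t, d = np.asarray(t, float), np.asarray(d, int)
    times, surv, s = [], [], 1.0
    for tj in np.unique(t[d == 1]):          # distinct event times
        n_at_risk = np.sum(t >= tj)          # at risk just before tj
        n_events = np.sum((t == tj) & (d == 1))
        s *= 1.0 - n_events / n_at_risk
        times.append(tj)
        surv.append(s)
    return np.array(times), np.array(surv)

# Small worked example: events at times 2 and 5, censoring at 3 and 7
times, surv = kaplan_meier(t=[2, 3, 5, 7], d=[1, 0, 1, 0])
print(times, surv)   # S(2) = 3/4, S(5) = 3/4 * 1/2 = 3/8
```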
3.2. German breast cancer dataset
The first dataset I describe comes from a study of 686 women diagnosed with breast
cancer in Germany (Schumacher et al., 1994), with 246 patients randomised to receive
hormonal therapy and 440 to receive a placebo. The primary outcome is recurrence-free
survival, with 299 patients experiencing the event of interest, i.e. recurrence or death.
Figure 3.1 shows the Kaplan-Meier plot for recurrence-free survival, across treatment
groups, where baseline was defined as time of mastectomy.
An indication of a beneficial treatment effect can be seen from Figure 3.1, with
improved survival in the arm treated with hormonal therapy, throughout follow-up.
This can be formally investigated using a Cox proportional hazards model; therefore,
as an initial analysis, I fit such a model to the data, investigating the effect of
treatment, with results shown in Table 3.1.
Figure 3.1. German breast cancer trial. Survival function is estimated using the Kaplan-Meier method, across treatment groups.
Table 3.1. German breast cancer trial. Results from a Cox proportional hazards model, investigating effect of hormonal therapy on survival.
Variable      Hazard ratio   Std. Err.   95% CI
Treatment     0.695          0.087       0.544, 0.888
Table 3.1 shows the estimated effect of treatment, with a hazard ratio of 0.695
(95% CI: 0.544, 0.888), indicating a beneficial treatment effect reducing the rate of
breast cancer recurrence by 30.5% (11.2%, 45.6%). The fundamental assumption of
proportional hazards can be evaluated using a test of non-zero slope in a generalised
linear regression of the scaled Schoenfeld residuals on time (Grambsch and Therneau,
1994), with the null hypothesis of zero slope (equating to proportional hazards). The
test for proportional hazards in the effect of hormonal therapy gives a p-value of 0.639,
showing no statistically significant evidence against the null hypothesis of proportional
hazards.
As described in Chapter 2, obtaining an estimate of the hazard function can provide
clinically meaningful information, making such things as prediction and extrapolation
possible. It is much more convenient to do this through a parametric approach. I
therefore now apply a proportional hazards Royston-Parmar survival model to the
breast cancer data, investigating the effect of hormonal therapy, and modelling the
baseline hazard using restricted cubic splines with four degrees of freedom. Results are
presented in Table 3.2.
Table 3.2. German breast cancer trial. Results from a Royston-Parmar proportional hazards model with 4 degrees of freedom, investigating effect of hormonal therapy on survival.
Variable      Hazard ratio   Std. Err.   95% CI
Treatment     0.695          0.087       0.544, 0.888

Baseline      Coefficient    Std. Err.   95% CI
Spline 1      1.513          0.137       1.244, 1.782
Spline 2      0.483          0.134       0.221, 0.746
Spline 3      0.011          0.047       -0.080, 0.103
Spline 4      -0.037         0.017       -0.071, -0.003
Intercept     -1.173         0.082       -1.334, -1.012
From Table 3.2 I observe an identical estimate of the effect of hormonal therapy on
survival to that of the Cox model, shown in Table 3.1, with an estimated hazard ratio
of 0.695 (95% CI: 0.544, 0.888). In Table 3.2 I also present the estimated coefficients
for the spline terms used to model the baseline log cumulative hazard function, which
can therefore be directly used to produce Figure 3.2, which shows the estimated hazard
function for each treatment arm (confidence intervals are suppressed for illustration).
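For reference, the restricted cubic spline basis underlying such a fit can be sketched as follows. This follows the standard parameterisation used in Royston-Parmar models (a linear term in log time plus one truncated-cubic term per interior knot, constrained to linearity beyond the boundary knots); the knot locations below are illustrative values, not those estimated from the breast cancer data.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: for boundary knots k_min, k_max and
    each interior knot k_j, the basis function is
    (x - k_j)_+^3 - lam_j (x - k_min)_+^3 - (1 - lam_j)(x - k_max)_+^3,
    with lam_j = (k_max - k_j) / (k_max - k_min), giving a function that
    is linear beyond the boundary knots."""
    x = np.asarray(x, float)
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: np.maximum(u, 0.0) ** 3          # truncated cubic
    cols = [x]                                        # linear term
    for kj in knots[1:-1]:                            # interior knots
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(pos(x - kj) - lam * pos(x - kmin)
                    - (1.0 - lam) * pos(x - kmax))
    return np.column_stack(cols)

# Illustrative knots on the log-time scale (in practice, centiles of the
# log event times are a common choice)
knots = [-1.5, -0.3, 0.5, 1.4]
x = np.linspace(-3.0, 3.0, 121)
B = rcs_basis(x, knots)   # shape (121, 3): linear + 2 interior-knot terms
```

The cubic and quadratic terms cancel beyond the boundary knots by construction, which is what tames the spline's behaviour in the tails.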
Figure 3.2 indicates the presence of at least one turning point in the baseline hazard
function, which is often observed in cancer and cardiovascular trials, where the mortality
rate peaks soon after the start of the trial, before tailing off (Royston, 2012). Clearly,
a standard parametric distribution such as the exponential or Weibull, would fail to
capture such a complex hazard function, as I illustrated in the example hazard functions
shown in Figure 2.1 of Chapter 2. Non-parametric kernel smoothers are often used to
show the estimated hazard function; however, they can be particularly erratic (Clark
et al., 2003), with large differences occurring between alternative kernel smoothers. I
illustrate this in Figure 3.3, which shows two commonly used non-parametric estimates
Figure 3.2. German breast cancer trial. Hazard function estimated using the Royston-Parmar survival model.
of the hazard function, namely the Epanechnikov and Epanechnikov 2 kernel smoothers,
respectively (Silverman, 1992).
Figure 3.3. German breast cancer trial. Non-parametric estimates of the hazard function using the Epanechnikov and Epanechnikov 2 kernel density smoothers. Follow-up is restricted to 6 years.
Comparing non-parametric smoothers, it is clear that I obtain quite different esti-
mates of the hazard function, although in the non-parametric method I am not assuming
proportional hazards. In particular, I have restricted follow-up to 6 years due to erratic
behaviour in the tail when using the Epanechnikov 2 kernel smoother. Comparing to
Figure 3.2, the parametric approach produces a much more stable estimated function.
The breast cancer trial will be utilised in Chapter 4, building on the illustration de-
scribed here, that complex hazard functions, incorporating turning points, often occur
in clinical datasets. Therefore, within simulation studies designed to assess methods
used in practice, it is important that simulation scenarios accurately reflect such com-
plex settings.
3.3. Breast cancer in England and Wales
The second dataset used in this thesis consists of 115,331 women diagnosed with
breast cancer between 1986 and 1990 (inclusive) in England and Wales, with follow-
up to the end of 1995. In this case, the event of interest is death from any cause,
and follow-up is restricted to five years post diagnosis. Two covariates are available;
age at diagnosis (in years), and deprivation status categorised into 5 levels (1 = least
deprived, ..., 5 = most deprived), with quintile groups based on the Carstairs deprivation
index (Coleman et al., 1999). The Carstairs deprivation index is an area measure based
on postcode (Carstairs and Morris, 1991).
All analyses presented in this thesis are restricted to comparing the least and most
deprived groups of patients, resulting in 41,645 women. Furthermore, the analyses
presented in Chapter 5, except in Section 5.11.1 where I model relative survival, are
restricted to women less than 50 years of age, which therefore consists of 9,721 women.
Of the 9,721 women, 2,847 (29.29%) died within 5 years of diagnosis. As an initial ex-
ploration of the reduced dataset, Figure 3.4 shows the Kaplan-Meier survival function,
for the least and most deprived groups, with associated 95% confidence intervals.
Figure 3.4 indicates a marked difference in survival between deprivation groups,
indicating poorer survival in the most deprived group. I further investigate this by
fitting a Cox proportional hazards model, investigating the effect of deprivation status
on survival, with results presented in Table 3.3.
Figure 3.4. England and Wales breast cancer data, all ages, comparing least and most deprived. Survival function is estimated using the Kaplan-Meier method.
Table 3.3. England and Wales breast cancer dataset. Results from a Cox proportional hazards model, investigating effect of deprivation status on survival.
Variable        Hazard ratio   Std. Err.   95% CI
Most deprived   1.309          0.051       1.212, 1.414

Reference group = least deprived
Results from Table 3.3 indicate a statistically significant hazard ratio of 1.309 (95%
CI: 1.212, 1.414) for the effect of deprivation status, indicating a 30.9% higher mortal-
ity rate in the most deprived group compared to the least deprived. However, using
the same proportional hazards test as was used in Section 3.2, I obtain a p-value of
<0.001, indicating strong evidence against the null hypothesis of proportional hazards.
This motivates the investigation of the presence of a time-dependency in the effect of
deprivation status, which will be considered further in Chapter 5.
For illustration, I proceed assuming proportional hazards is valid, and fit a Royston-
Parmar survival model to estimate the baseline hazard function, using 5 degrees of
freedom, with results shown in Table 3.4.
Table 3.4. England and Wales breast cancer dataset. Results from a Royston-Parmar proportional hazards model, investigating effect of deprivation status on survival.
Variable        Hazard ratio   Std. Err.   95% CI
Most deprived   1.309          0.051       1.212, 1.414

Baseline        Coefficient    Std. Err.   95% CI
Spline 1        0.752          0.015       0.722, 0.782
Spline 2        -0.024         0.012       -0.047, -0.002
Spline 3        0.053          0.006       0.041, 0.066
Spline 4        0.007          0.003       0.001, 0.014
Spline 5        0.000          0.002       -0.003, 0.003
Intercept       -1.467         0.024       -1.514, -1.421
Comparing results from Tables 3.3 and 3.4, I observe complete agreement between
Cox and Royston-Parmar models, with a hazard ratio of 1.309 (95% CI: 1.212, 1.414)
comparing the most to the least deprived groups. I show the estimated hazard function
in Figure 3.5.
Figure 3.5. England and Wales breast cancer dataset, all ages, comparing least and most deprived. Hazard function is estimated using a Royston-Parmar model.
The high initial mortality rate is generally attributed to a combination of patients
diagnosed with late-stage breast cancer and patients who are diagnosed due to investi-
gation of a fatal condition, unrelated to the cancer (Royston and Lambert, 2011). This
dataset is used extensively in Chapter 5 to illustrate more flexible parametric stan-
dard survival models, where I return to investigating the presence of a time-dependent
effect due to deprivation status, and the extension to relative survival utilising the
unrestricted age range dataset.
3.4. Bladder cancer recurrence
In this thesis, I predominantly concentrate on single event survival analysis. How-
ever, patients can often be at risk of multiple events, for example if the event is non-fatal,
then it may occur multiple times. To illustrate this, I employ a commonly used example
in bladder cancer (Wei et al., 1989; Therneau and Grambsch, 2000), which consists of
86 patients who experience up to 4 recurrences of their bladder cancer. Tumours were
removed initially, before patients were randomised to treatment, and following recur-
rence, further tumours were removed at each repeat visit. The number of recurrences
that patients experience is shown in Table 3.5, with a maximum of 4. Some patients
experience a recurrence and then are immediately lost to follow-up, and hence are not
considered at risk of a subsequent recurrence. In total, 112 recurrences were observed.
Covariates of interest include treatment group (0 for placebo, 1 for thiotepa), initial
number of tumours (range 1 to 8, with 8 meaning 8 or more), and initial size of tumours
(in centimetres, with range 1 to 7).
Table 3.5. Bladder cancer recurrence dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer.
Recurrence number   Censored   Event   Total
1                   38         47      85
2                   17         29      46
3                   5          22      27
4                   6          14      20
This dataset will be utilised in Chapter 5 Section 5.11.2 to illustrate the use of
cluster robust standard errors to account for the correlation induced between recurrent
events, within a parametric survival analysis using restricted cubic splines to model the
baseline hazard function. This will allow the estimation of separate hazard functions
for each recurrence.
3.5. Trial in prednisone treatment of patients with liver cirrhosis
Here I describe a trial investigating the effect of prednisone treatment, in patients
with liver cirrhosis, previously described in Anderson et al. (1993) and others including
Henderson et al. (2002). The data consists of 488 patients with cirrhosis of the liver,
where 251 were randomised at diagnosis to receive treatment with prednisone, and 237
to receive a placebo. A total of 292 patients died during the study, with maximum follow-
up of just over 13 years. A further covariate of interest is prothrombin index, which was
measured repeatedly throughout the study. Measurements were scheduled at baseline,
3, 6, and 12 months, and then annually; however, actual time of measurement varied
substantially. Figure 3.6 presents the Kaplan-Meier survival function, across treatment
groups.
Figure 3.6. Liver cirrhosis trial investigating prednisone treatment. Survival function is estimated using the Kaplan-Meier method.
Figure 3.6 indicates little difference between treatment groups in survival during
the first 2-3 years since diagnosis, then the curves diverge indicating a possible im-
proved prognosis in the prednisone group, before the curves come together again at
approximately 9 years. I also conduct a preliminary investigation of the shape of the
baseline hazard, using a Royston-Parmar model with 5 degrees of freedom investigat-
ing the direct treatment effect on survival. The estimated hazard function is shown in
Figure 3.7.
Figure 3.7. Liver cirrhosis trial investigating prednisone treatment. Hazard function estimated using a Royston-Parmar model.
The Royston-Parmar model indicated a non-statistically significant hazard ratio of
0.909 (95% CI: 0.723, 1.144) comparing prednisone treatment to placebo. Here I am
interested not only in a possible treatment effect on survival, but also the association
between the repeatedly measured prothrombin index and survival, which will be covered
extensively in Chapters 7 and 8.
In Figures 3.8 and 3.9 I present the longitudinal profiles of prothrombin index over
time, combining treatment groups for purposes of illustration, with a lowess smoother
overlaid to give an indication of the average profile over time, and comparing between
patients who died and who were censored. Figure 3.8 begins to give an idea of the
relationship between the longitudinal and survival processes, which becomes more
pronounced in Figure 3.9, where the timescale is adjusted by taking away each patient's
observed survival time. Figure 3.9 indicates a trend in prothrombin index in the 2-3
year period before death, possibly indicating that lower values of prothrombin index
could be associated with an increased risk of death.
Figure 3.8. Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid.
Figure 3.9. Liver cirrhosis trial investigating prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid.
Figure 3.7 highlights another example of a complex underlying hazard function,
with multiple turning points, which cannot be captured using standard parametric
methods. It is this which motivates the incorporation of more flexible models, such as
the Royston-Parmar model and a finite mixture survival model, which will be developed
within the joint model framework in Chapters 7 and 8.
3.6. Cohort of obese patients with type 2 diabetes mellitus from the
General Practice Research Database
I now introduce an observational dataset, which comes from the General Practice
Research Database (GPRD). This dataset comes from a recent Health Technology As-
sessment (HTA) that I was involved in, investigating the clinical and cost effectiveness
of anti-obesity treatments in primary care (Ara et al., 2012). The dataset I describe
is a subset of the full 100,000 patient cohort analysed in the HTA report, and consists
of 4,850 obese patients with diabetes mellitus. Covariates of interest include gender
and age (in years) at baseline, and repeated measures of systolic blood pressure (SBP),
of which I have 107,347 measurements. The event of interest in this case is non-fatal
stroke, of which 278 were observed, with maximum follow-up of 22 years. Primary
interest in the analysis presented in Chapter 9, is the association between baseline SBP
and the risk of stroke. Baseline is defined as the time at which each patient entered
the cohort, i.e. time of first SBP measurement. Figure 3.10 shows the Kaplan-Meier
survival function.
The baseline hazard function is estimated using a Royston-Parmar model with two
degrees of freedom, with the estimated function and 95% confidence interval shown in
Figure 3.11.
In Figure 3.12 I illustrate the distribution of baseline SBP, separately for patients
who were censored and for those who suffered a stroke, indicating slightly higher values
of SBP for patients who had a stroke. Figure 3.13 illustrates the distribution of the
number of observed measurements of SBP per patient. The median number of measure-
ments is 17, with a range of 1 to 269.
This dataset will be used in Chapter 9 in which I investigate the use of a specific
association structure within a joint longitudinal-survival framework, to account for
Figure 3.10. GPRD cohort of obese patients with type 2 diabetes mellitus. Survival function is estimated using the Kaplan-Meier method.
Figure 3.11. GPRD cohort of obese patients with type 2 diabetes mellitus. Hazard function is estimated using a Royston-Parmar model.
measurement error when linking baseline SBP and the risk of stroke. With an observa-
tional dataset, the issue of appropriate choice of timescale must be raised. The baseline
time point used here is time of first measurement of SBP; however, when estimating
survival probabilities, this does not provide an intuitive, clinically meaningful way of
conveying survival to patients. More appropriate perhaps, would be to use age as the
timescale; however, this requires delayed entry to be incorporated into the joint model
Figure 3.12. GPRD cohort of obese patients with type 2 diabetes mellitus. Box plots of baseline SBP for patients who were censored or who suffered a stroke.
Figure 3.13. GPRD cohort of obese patients with type 2 diabetes mellitus. Distribution of number of observed measurements of systolic blood pressure per patient.
framework, which is not currently available in any software package. This extension
will be discussed further in Chapter 11.
3.7. Trials of D-penicillamine treatment for patients with primary biliary
cirrhosis
The final dataset I present is in the area of Primary Biliary Cirrhosis (PBC), which
is a long-term disease of the liver affecting the small bile ducts, leading to a build-up of
bile which subsequently damages the liver (Murtagh et al., 1994). The dataset consists
of 312 patients who were enrolled in two trials investigating the effect of D-penicillamine
treatment, of which 140 died. A total of 1945 measurements of serum bilirubin were
recorded and are available for analysis. Figure 3.14 presents the Kaplan-Meier survival
curve, across treatment groups.
Figure 3.14. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Survival function is estimated using the Kaplan-Meier method.
Figure 3.14 indicates very similar survival probabilities between the two treatment
groups. This is further illustrated by fitting a Royston-Parmar survival model with 1
degree of freedom (equivalent to a Weibull model), investigating the effect of treatment.
The estimated hazard function is shown in Figure 3.15.
The Royston-Parmar model indicated a non-statistically significant hazard ratio of 1.000
(95% CI: 0.718, 1.392), which is illustrated by the overlapping estimated hazard functions
shown in Figure 3.15. However, as in the liver cirrhosis example described in Section
Figure 3.15. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Hazard function is estimated using a Royston-Parmar survival model.
3.5, I am interested not only in the direct treatment effect on survival, but also the
association between a repeatedly measured biomarker, serum bilirubin, and survival.
Due to right skewness, in all analyses of the PBC dataset I work with the log of
serum bilirubin.
In Figures 3.16 and 3.17 I present the longitudinal profiles of log(serum bilirubin)
over time, with the lowess smoother overlaid to give an indication of the average profile
over time, and comparing between patients who died and who were censored. Figure
3.16 suggests a relationship between the longitudinal and survival processes, which
becomes more pronounced in Figure 3.17, where the timescale is adjusted by taking
away each patient's observed survival time. Figure 3.17 indicates an upward trend in
log(serum bilirubin) in the 5 month period before death, possibly indicating that higher
values of log(serum bilirubin) could be associated with an increased risk of death.
The PBC dataset can be considered a classic joint model dataset, and will be used
in Chapter 10 to illustrate some of the features of the stjm Stata package written as a
central component of this thesis.
Figure 3.16. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) for patients who were censored/died. Lowess smoother overlaid.
Figure 3.17. Primary biliary cirrhosis trials investigating D-penicillamine treatment. Longitudinal profiles of log(serum bilirubin) for patients who were censored/died. Timescale is adjusted by taking away each patient's observed survival time. Lowess smoother overlaid.
3.8. Discussion
In this chapter I have introduced and described a variety of clinical datasets which
will be used throughout the remainder of the thesis. The wide variety of example
datasets will be used to motivate and illustrate complex statistical techniques.
CHAPTER 4
Simulation of complex survival data
4.1. Chapter outline
In this chapter I describe and develop methods for the simulation of complex sur-
vival data. In order to evaluate statistical models, such as those described in Chapter
2, simulation studies can be conducted to establish the performance of a model when
generating data from a known ‘truth’. In other words, I can generate data under a
known model, apply a proposed model, and establish how well the proposal performs.
I begin by describing how to simulate survival times from a standard proportional
hazards model, assuming a parametric distribution such as the Weibull, and then de-
velop methodology and propose an algorithm which allows the simulation of survival
times from essentially any defined hazard function. This includes complex baseline haz-
ard functions, which will be utilised in Chapter 5, and joint longitudinal and survival
data, which will be utilised in Chapters 7, 8 and 9. The simulation framework is illus-
trated through a number of example simulation studies. The methods are implemented
in the publicly available survsim Stata package, which can be downloaded by typing
ssc install survsim in Stata.
This chapter is based on two recent publications; Crowther and Lambert (2012)
and Crowther and Lambert (2013a), copies of which can be found in Appendix A.
4.2. Introduction
Simulation studies are conducted to assess the performance of current and novel
statistical models in pre-defined scenarios. By simulating from a known ‘truth’, one can
directly evaluate how well a proposed model recovers that truth, something that cannot
be done using real clinical data. The quality and reporting of simulation
studies varies considerably, which has led to the establishment of general guidelines for
the development and reporting of simulation studies in medical research (Burton et al.,
2006). In order to establish certain properties, such as bias and coverage, or robustness
to deviations from underlying assumptions, it is often desirable that chosen simulation
scenarios accurately reflect a biologically plausible distribution. This is particularly
important in the framework of survival analysis, where distributions are chosen for
both the event time and the censoring time.
Previous studies have introduced a framework to simulate survival data for Cox
proportional hazards models (Leemis, 1987; Bender et al., 2005), known as the inver-
sion method, which can generate survival times from standard parametric distributions
including the exponential, Weibull and Gompertz amongst others, incorporating base-
line covariate effects under proportional hazards. This method is easily implemented
in any standard software with a random number generator. Bender et al. (2005) noted
that many simulation studies that generated survival data assumed an exponential dis-
tribution for the distribution of event times. Although many recent studies have gone
beyond the standard exponential choice to a slightly more complex Weibull distribu-
tion (Rashid et al., 2008; Belot et al., 2010), these choices are often not flexible enough
to fully reflect the underlying distributions frequently observed in clinical data, as I
illustrated in Chapter 3.
Often in clinical trials or population based studies, at least one turning point is
observed in the underlying hazard function (Murtagh et al., 1994; Lambert and Roys-
ton, 2009; Eloranta et al., 2012). Although hazard ratios can be insensitive to a poorly
specified baseline hazard function (Rutherford et al., 2014), it is vital to accurately
capture the baseline hazard function when interest lies in measures of absolute risk
(Royston and Lambert, 2011). Through a fully flexible parametric model, I can not only
accurately capture complex hazard functions, but also simulate biologically plausible
survival data. As discussed in Section 2.9, such methods are becoming more common-
place as the benefits of a parametric approach, such as the reporting of measures of
absolute risk, become recognised in applied research. I will return to parametric mod-
elling further in Chapter 5. In order to assess such parametric approaches, methods
are required to simulate survival data from a variety of complex distributions, beyond
standard distributions such as the exponential, Weibull and Gompertz.
Furthermore, a variety of extensions to the standard survival analysis framework,
such as incorporation of time-dependent effects (non-proportional hazards), the oc-
currence of time-varying covariates (which will be considered further in Chapter 6),
heterogeneous covariate effects and covariates measured with error, all require suitable
simulation techniques to assess statistical models developed for each setting. A further,
often observed, phenomenon in survival analysis is the presence of informative censoring.
Standard survival models make the assumption of no dependence between the
survival and censoring mechanisms. Assessing the robustness of methods to deviations
from this assumption is a key question in survival analysis (Siannis et al., 2005).
In this chapter, I develop a general algorithm for the simulation of survival times
(Crowther and Lambert, 2013a). In Section 4.3, I briefly describe again a dataset
of breast cancer patients, introduced in Section 3.2, which exhibits turning points in
the underlying hazard function, which cannot be simulated using standard parametric
distributions. I briefly describe the inversion method in Section 4.4, to simulate data
from standard parametric distributions with an analytically tractable and invertible
cumulative hazard function, which forms the basis for the simulation framework. In
Sections 4.5 and 4.6, I describe a range of simulation scenarios, culminating in a general
simulation algorithm to simulate survival data from complex distributions using root
finding techniques with nested numerical integration. In Section 4.7, I describe how
to incorporate time-dependent effects, both with standard and complex parametric
distributions. I then describe how to incorporate both binary and continuous time-
varying covariates in Section 4.8, whilst in Section 4.9, I describe how the techniques
can be applied to incorporate dependent censoring. The methods are implemented
using the publicly available survsim package in Stata (Crowther and Lambert, 2012;
Crowther, 2011). Finally, in Section 4.11, I conclude the chapter with a discussion.
4.3. Motivating dataset - German breast cancer trial
I begin by illustrating the commonly observed phenomenon of a turning point in the
baseline hazard function, as motivation for simulating more clinically plausible survival
data. I use the dataset described in Section 3.2. Briefly, this dataset consists of 686
women diagnosed with breast cancer in Germany (Schumacher et al., 1994), with 246
patients randomized to receive hormonal therapy and 440 to receive a placebo. Outcome
is recurrence-free survival, with 299 patients experiencing the event of interest.
I apply a Weibull proportional hazards model and a proportional hazards Royston-
Parmar (RP) model, as described in Section 2.8.2, investigating the effect of hormonal
therapy treatment. In preliminary modelling, the degrees of freedom used to model
the baseline of the RP model are selected using the AIC and BIC, as recommended
by Rutherford et al. (2014). The AIC selected four degrees of freedom, and the BIC
selected two. In this case I follow the AIC and choose four degrees of freedom, to ensure
sufficient flexibility. Estimates of the treatment effect from the Weibull and RP models
are shown in Table 4.1. For comparison, I also fit a Cox model.
Table 4.1. German breast cancer trial. Estimated treatment effects from Weibull, Royston-Parmar and Cox models

Model        Hazard ratio    95% Confidence Interval
Weibull      0.675           0.528 to 0.862
RP (df=4)    0.695           0.544 to 0.888
Cox          0.695           0.544 to 0.888
Table 4.1 indicates identical estimates between the RP and Cox models (to three
decimal places), with a minor difference in estimated hazard ratios between the Weibull
and RP models, with hazard ratios of 0.675 and 0.695, respectively, both showing a
statistically significant beneficial treatment effect. Figure 4.1 shows the fitted survival
curves from the Weibull and RP models overlaid on the Kaplan-Meier curves, by treat-
ment group, showing the much improved fit from the more flexible model.
[Figure 4.1. German breast cancer trial. Predicted survival from Weibull and Royston-Parmar models overlaid on the Kaplan-Meier curves, by treatment group (panels: Weibull model; flexible parametric model).]
Furthermore, in Figure 4.2 I present the fitted hazard functions, across treatment
group, for the Weibull and RP models, illustrating a marked difference in the estimated
underlying shapes, indicating the presence of two turning points in the hazard function.
[Figure 4.2. German breast cancer trial. Predicted hazard functions from Weibull and Royston-Parmar survival models (panels: Weibull model; flexible parametric model).]
4.4. Simulating survival data
4.4.1. Simulating survival times from standard parametric distributions
Bender et al. (2005) described a highly efficient, and easy to implement, technique
to generate survival times from a variety of standard parametric distributions, known
as the inversion method. Here, I briefly describe the method, as it forms the basis
for the extensions below. The hazard function of a proportional hazards model can be
expressed as
h(t|X) = h0(t) exp(Xβ)
where h0(t) is the baseline hazard function specified by some parametric distribution,
X is a vector of time-independent covariates with corresponding log hazard ratios,
β. The corresponding cumulative hazard, H(t|X), survival, S(t|X) and cumulative
distribution, F (t|X), functions, conditional on baseline covariates, are defined as follows
H(t|X) = H0(t) exp(Xβ), where H0(t) = ∫_0^t h0(u) du

S(t|X) = exp[−H(t|X)] and F(t|X) = 1 − exp[−H(t|X)]

If I let T be the simulated survival time, Bender et al. (2005) showed that by letting

F(T|X) = 1 − exp[−H(T|X)] = u, where u ∼ U(0, 1)

or alternatively

S(T|X) = 1 − u (or equivalently = u)   (4.1)

Thus, if h0(T) > 0, then Equation (4.1) can be re-arranged and directly solved for T,
as long as H0(T) can be directly inverted:

T = H0^(−1)(−log(u) / exp(Xβ))   (4.2)
The data generating process then only requires draws from a uniform distribution,
followed by application of Equation (4.2). The three standard choices for h0(T ) are the
exponential, Weibull and Gompertz distributions described in Section 2.6, with common
hazard function shapes shown in Figure 2.1. For example, to simulate survival times
from a Weibull distribution, with H0(t) = λt^γ, Equation (4.2) becomes

T = (−log(u) / (λ exp(Xβ)))^(1/γ)   (4.3)
The three standard distributions can be considered restrictive in terms of the shapes of
the baseline hazard function that can be generated. However, these distributions remain
appealing to researchers conducting simulation studies, perhaps because Equation (4.3)
has an analytically tractable form.
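As a concrete illustration, the Weibull case of Equation (4.3) takes only a few lines of code. The sketch below is a hypothetical Python translation (the methods in this chapter are implemented in the Stata survsim package); the parameter values λ = 0.1, γ = 1.5 and hazard ratio 0.7 are illustrative choices only.

```python
import numpy as np

def simulate_weibull_ph(n, lam, gamma, beta, X, seed=None):
    """Inversion method of Bender et al. (2005) for a Weibull baseline:
    H0(t) = lam * t**gamma, so T = (-log(u) / (lam * exp(X @ beta)))**(1/gamma)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                     # u ~ U(0, 1)
    return (-np.log(u) / (lam * np.exp(X @ beta))) ** (1.0 / gamma)

# Illustrative example: a binary treatment with hazard ratio 0.7
rng = np.random.default_rng(1)
X = rng.binomial(1, 0.5, size=10000).reshape(-1, 1)   # binary covariate
beta = np.array([np.log(0.7)])                        # log hazard ratio
t = simulate_weibull_ph(10000, lam=0.1, gamma=1.5, beta=beta, X=X, seed=2)
```

Since the treated group has the lower hazard, its simulated event times are systematically longer, as expected.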
4.5. A general framework for simulation of survival data
In this section, I give a broad overview of the simulation framework, before describ-
ing in more detail the statistical techniques used in the algorithms, in Section 4.6. The
inversion method relies on two properties: 1) the cumulative hazard function must be
analytically tractable, i.e. it can be written down in closed form; and 2) the cumulative
hazard function must be invertible, i.e. we must be able to directly solve for the
simulated survival time, T.
I will describe algorithms to first overcome the situation when I can write the cumu-
lative hazard function in closed form; however, cannot directly solve for the simulated
survival time. I then move to the general framework, where both the cumulative hazard
cannot be written in a closed form, and further it cannot be inverted to solve for the
simulated survival time. Figure 4.3 shows a schematic flow diagram illustrating the
general framework for simulating survival data from a defined hazard or cumulative
hazard function.
4.5.1. Scenario 1
Scenario 1 involves the setting of Bender et al. (2005) described in Section 4.4.1,
where the cumulative hazard function has a closed form expression and can be directly
inverted to solve for T , the simulated survival time, hence the inversion method can be
directly applied.
Does H0(t) have a closed form expression?
  Yes: Can you solve for T analytically?
    Yes: Scenario 1. Apply the method of Bender et al. (2005).
    No: Scenario 2. Use iterative root finding to solve for the simulated time, T.
  No: Scenario 3. Numerically integrate to obtain H0(t), within iterative root finding to solve for T.

Figure 4.3. Schematic flow diagram of simulation techniques, beginning at the top-left.
4.5.2. Scenario 2
Scenario 2 arises when I wish to use a more complex baseline hazard function to
simulate data under a proportional hazards model. In this case I assume the cumulative
hazard function has a closed form expression. However, if I choose a more complex
hazard and consequently cumulative hazard function, it may be that the cumulative
hazard function can no longer be directly inverted, and therefore cannot directly be
solved for T , the simulated survival time. In this situation I proceed by applying
iterative root finding techniques. I describe this in more detail, and through example,
in Section 4.6.1.
4.5.3. Scenario 3
Finally, Scenario 3 arises when I define a complex hazard function which cannot be
integrated analytically to obtain the cumulative hazard function. To accommodate this
setting, numerical integration techniques such as Gauss-Legendre quadrature (described
in Section 4.6.3) can be used. Following this, I once again have a cumulative hazard
function which cannot be directly inverted to solve for the simulated survival time, T ,
therefore requiring iterative root finding techniques as in Scenario 2. This results in
a general 2-stage algorithm involving numerical integration nested within an iterative
root finding procedure. I describe this in more detail in Section 4.6.3.
4.6. Simulating from a complex baseline hazard function
4.6.1. Root finding
The first extension I describe involves the situation where I wish to use a more complex
baseline hazard function, to simulate data under a proportional hazards model. In this
case I still assume that the cumulative hazard can be evaluated analytically for a given
hazard function.
The step between Equation (4.1) and Equation (4.2) is reliant on being able to
directly re-arrange Equation (4.1) to solve for T , the simulated survival time. When
this condition fails it requires iterative techniques to find the root of Equation (4.1). I
illustrate this situation through an example.
4.6.1.1. Example: 2-component mixture Weibull distribution
I now begin to introduce some complexity in the parametric distribution used to gen-
erate survival times. Motivation for going beyond the standard parametric approaches
described in Section 4.4.1 originates from the often observed situation of a turning point
in a dataset’s baseline hazard function, illustrated in Section 4.3. One such approach is
to use a mixture of parametric distributions (Crowther and Lambert, 2012; Rutherford
et al., 2014).
Here I define the overall baseline survival function of a two-component parametric
mixture model. Finite mixture survival models of this form have been used in standard
survival analysis (McLachlan and McGiffin, 1994), and mixture and non-mixture cure
models to obtain improved estimates of statistical cure (Lambert et al., 2010b). I define
two parametric components additive on the survival scale
S0(t) = pS01(t) + (1− p)S02(t) (4.4)
which defines a baseline survival function, S0(t), where S01(t) and S02(t) are the survival
function of any standard parametric distribution, and p represents the mixing parame-
ter where 0 ≤ p ≤ 1. For illustrative purposes I proceed by assuming a two-component
mixture Weibull distribution, with
S0(t) = p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)   (4.5)

where λ1, λ2 and γ1, γ2 are scale and shape parameters, respectively, with λ1, λ2,
γ1, γ2 > 0. Transforming to the cumulative hazard scale

H0(t) = −log[p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (4.6)

and differentiating with respect to t, I obtain the baseline hazard function

h0(t) = [λ1γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (4.7)

Proportional hazards can then be induced

h(t) = ([λ1γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]) exp(Xβ)   (4.8)
where X is a vector of time-independent covariates (and no intercept term), with
associated regression coefficients, β. This model can be used to simulate survival
data from a variety of functions with turning points, to better reflect observed clinical
datasets. I illustrate some examples in Figure 4.4, based on those seen in real datasets
(Anderson et al., 1993; Murtagh et al., 1994; Schumacher et al., 1994).
Equation (4.8) can be directly integrated with respect to t to obtain the cumulative
hazard function, and consequently the survival function. However, this leads to a
survival function that when substituted into Equation (4.1), produces an equation
which cannot be directly solved for t (Scenario 2 described in Section 4.5.2). I now
describe two root finding techniques to accommodate this situation.
[Figure 4.4. Example two-component mixture Weibull baseline hazard functions, for parameter sets (λ1, γ1, λ2, γ2, p) = (1, 1.5, 1, 0.5, 0.5), (0.1, 3, 0.1, 1.6, 0.8), (1.4, 1.3, 0.1, 0.5, 0.9) and (1.5, 0.2, 0.5, 0.1, 0.1).]
4.6.1.2. Brent’s univariate root-finding method
To generate survival times, the following needs to be solved for t,
g(t) = S(t)− U = 0 (4.9)
An efficient method to calculate the simulated survival times is to use Brent’s univariate
root finder. This algorithm combines the bisection method with linear or quadratic
interpolation (Jann, 2005). The algorithm is executed until a desired tolerance (I use
a default of 1E-08) is met.
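A minimal sketch of this step in Python (a hypothetical translation; the chapter's implementation is the Stata survsim package), using scipy's `brentq` to solve S(t) − u = 0 under the two-component mixture Weibull of Equation (4.5), with the parameter values used in the simulation study of Section 4.6.2:

```python
import numpy as np
from scipy.optimize import brentq

def mixture_weibull_surv(t, lam1, gam1, lam2, gam2, p, linpred=0.0):
    """Two-component mixture Weibull survival, Equation (4.5),
    raised to exp(linpred) to induce proportional hazards."""
    s0 = p * np.exp(-lam1 * t**gam1) + (1 - p) * np.exp(-lam2 * t**gam2)
    return s0 ** np.exp(linpred)

def simulate_one(u, upper=1000.0, **pars):
    # Brent's method on g(t) = S(t) - u, which has a single root since
    # S(t) decreases monotonically from 1 towards 0
    return brentq(lambda t: mixture_weibull_surv(t, **pars) - u,
                  1e-12, upper, xtol=1e-8)

pars = dict(lam1=0.3, gam1=2.5, lam2=0.025, gam2=1.9, p=0.3)
us = np.random.default_rng(42).uniform(size=500)
times = np.array([simulate_one(u, **pars) for u in us])
```

Each returned time t satisfies S(t) = u to within the root-finder tolerance, which is the defining property of the inversion method.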
4.6.1.3. Newton-Raphson root finder
An alternative method to Brent’s root finder is to use Newton-Raphson iterations,
which uses the first two terms of the Taylor series expansion of g(t), the objective
function. I begin with an initial guess, t0, and then execute the following formula
t_(n+1) = t_n − g(t_n) / g′(t_n)   (4.10)
Equation (4.10) is executed until a desired tolerance is met, for example
|t_(n+1) − t_n| < 1E-08. Royston (2012) proposed to use Newton-Raphson iterations to
simulate survival times using restricted cubic splines on the log cumulative hazard
scale. In my experience I have found Brent's method to be far superior in terms of
reliability and accuracy compared to Newton-Raphson iterations, which can have
convergence problems.
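For completeness, a bare-bones Newton-Raphson sketch follows (hypothetical Python code, not the thesis implementation). Since g′(t) = −h(t)S(t), the update (4.10) can be written in terms of the hazard and survival functions; a Weibull example, where the closed-form inverse of Equation (4.3) is known, is used purely to check the iterations.

```python
import math

def newton_survival_time(S, h, u, t0=1.0, tol=1e-8, max_iter=100):
    """Newton-Raphson for g(t) = S(t) - u; since g'(t) = -h(t)*S(t),
    the update (4.10) becomes t_(n+1) = t_n + (S(t_n) - u) / (h(t_n)*S(t_n))."""
    t = t0
    for _ in range(max_iter):
        t_new = t + (S(t) - u) / (h(t) * S(t))
        if t_new <= 0:              # crude guard against leaving the domain
            t_new = t / 2.0
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    raise RuntimeError("Newton-Raphson failed to converge")

# Check against the closed-form Weibull inverse: S(t) = exp(-lam * t**gam)
lam, gam, u = 0.1, 1.5, 0.37
t_nr = newton_survival_time(lambda t: math.exp(-lam * t**gam),
                            lambda t: lam * gam * t**(gam - 1), u)
t_exact = (-math.log(u) / lam) ** (1.0 / gam)
```

Even with the crude domain guard above, convergence depends on the starting value t0, which illustrates why Brent's bracketing method is more reliable in practice.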
4.6.2. Example simulation study
I illustrate the use of Brent’s root finding technique described in Section 4.6.1, applied
to the 2-component mixture Weibull distribution. In each of 1000 repetitions, I generate
1000 survival times from a two-component mixture Weibull baseline hazard function,
with parameters λ1 = 0.3, γ1 = 2.5, λ2 = 0.025, γ2 = 1.9 and p = 0.3. These
parameter values are chosen to closely approximate the observed hazard function seen
in the breast cancer dataset. I also include a binary treatment variable, drawn from
Xi ∼ Bin(1, 0.5), with associated hazard ratio exp(β) = 0.7, and apply administrative
censoring at 5 years. Computation time to generate the 1000 datasets was 34 seconds
on an Intel Core i5 2.5GHz CPU, illustrating the computational efficiency of the root
finding technique. To each simulated dataset I apply a Weibull survival model and the
two-component mixture Weibull model (Crowther and Lambert, 2011), i.e. the true
model, monitoring estimates of the log hazard ratio. Furthermore, I monitor estimates
of the survival probability and hazard rate in the reference group (Xi = 0), at time
points t = 1, 2, 3, 4, 5, to assess the ability of each model to capture the baseline
hazard and survival functions.
Results are presented in Table 4.2. The mixture Weibull model produces unbiased
estimates and good coverage probabilities in the log hazard ratio, and the estimates of
survival and hazard, indicating its ability to capture the complex underlying hazard
function. In comparison, estimates from the Weibull model indicate minor bias in the
log hazard ratio, with large bias observed in the estimates of survival and hazard, across
the 5 time points, indicating its inability to effectively capture the underlying shape.
Table 4.2. Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from both Weibull and mixture Weibull models.

                       Weibull                  Mixture Weibull
           Truth   Bias     95% Coverage    Bias     95% Coverage
β         -0.357  -0.013    92.3            -0.002   93.7
Survival
1 year     0.905  -0.019    43.9             0.001   94.7
2 year     0.693   0.055     5.4            -0.001   94.3
3 year     0.575   0.041    39.2            -0.000   95.2
4 year     0.494   0.004    93.7             0.001   94.3
5 year     0.411  -0.015    88.1            -0.000   94.1
Hazard
1 year     0.220  -0.067     0.0            -0.002   93.8
2 year     0.250  -0.066     0.1             0.001   95.4
3 year     0.146   0.059     0.0            -0.001   94.2
4 year     0.166   0.055     1.5            -0.001   95.1
5 year     0.202   0.033    51.0             0.009   95.1
4.6.3. Numerical integration
I now describe in more detail scenario 3, introduced in Section 4.5.3, where I define
a more general functional form for the hazard function, which will require numerical
integration techniques to evaluate the cumulative hazard function. This is then followed
by one of the root finding techniques described above. I once again illustrate this
through an example.
4.6.3.1. Example: Fractional polynomials
Fractional polynomials select powers from a pre-defined set, usually (-2, -1, -0.5, 0,
0.5, 1, 2, 3), which can be used to model continuous covariates which exhibit non-
linearity (Royston and Sauerbrei, 2008). Note that a power of 0 corresponds to log(x).
If a power is repeated, such as (0.5, 0.5), then if x is the covariate being transformed,
the repeated term becomes x^0.5 × log(x). In this case, the continuous ‘covariate’ of interest is survival time.
I once again define the proportional hazards model
h(t) = h0(t) exp(Xβ) (4.11)
where h0(t) is any general function which satisfies h0(t) > 0 for t > 0. Here I expand
log[h0(t)] into a function with 2 turning points, in this case a fractional polynomial
function with three powers (1, 0.5, 0.5):

log(h0(t)) = −18 + 7.3t − 11.5 t^0.5 log(t) + 9.5 t^0.5   (4.12)
The assumed hazard function is shown in Figure 4.5. This provides a reasonable fit to
the example dataset described in Section 4.3.
[Figure 4.5. Example fractional polynomial baseline hazard function.]
The next step to simulate survival times from this underlying hazard function is to
calculate the cumulative hazard function; however, when I substitute the exponential of
Equation (4.12) into Equation (4.11) and attempt to integrate, I obtain an analytically
intractable integral, therefore requiring numerical techniques in order to evaluate it.
4.6.3.2. Gaussian quadrature
Numerical integration techniques, such as Gaussian quadrature (Stoer and Burlirsch,
2002), provide an approximation to an analytically intractable integral. Gaussian quad-
rature turns an integral into a weighted summation of a function evaluated at a set of
pre-defined points called nodes. It performs well if the function being integrated can be
approximated by a polynomial of degree 2m− 1, where m is the number of quadrature
nodes. For example, integrating a hazard function

∫_0^t h(u) du   (4.13)

I first need to undertake a change of interval using

∫_0^t h(u) du = (t/2) ∫_(−1)^(1) h((t/2)z + (t/2)) dz   (4.14)

I can now numerically integrate, using for example Gauss-Legendre quadrature, resulting in

∫_0^t h(u) du ≈ (t/2) Σ_(i=1)^(m) w_i h((t/2)z_i + (t/2))   (4.15)

where w and z are vectors of weights and node locations, respectively. Under Gauss-
Legendre quadrature the weights are defined as

w_i = 2 / [(1 − z_i^2)(P′_m(z_i))^2]
approximation depends on the number of nodes, m. In my experience I have found
that often 30 nodes are sufficient; however, when the hazard is initially very high then
often more will be required. The accuracy can be assessed by setting a simulation seed
and simulating survival times with an increasing number of nodes, and comparing the
relative difference between times. Further forms of numerical integration for integrating
over an infinite domain are introduced in Chapter 6.
Now that I can calculate the cumulative hazard, I then apply one of the root finding
procedures described in Section 4.6.1. The iterative algorithm, however, in this case
now has multiple steps, including numerical integration nested within either Newton-
Raphson steps or Brent’s method.
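Putting the pieces together, the Scenario 3 algorithm (numerical integration nested within root finding) can be sketched as below, here for the fractional polynomial log hazard of Equation (4.12). This is hypothetical illustrative Python code; the chapter's implementation is the Stata survsim package.

```python
import numpy as np
from scipy.optimize import brentq

nodes, weights = np.polynomial.legendre.leggauss(30)   # 30-node Gauss-Legendre

def log_h0(t):
    # Fractional polynomial baseline log hazard, Equation (4.12)
    return -18 + 7.3 * t - 11.5 * t**0.5 * np.log(t) + 9.5 * t**0.5

def cum_hazard(t, linpred=0.0):
    # Equations (4.14)-(4.15): change of interval [0, t] -> [-1, 1],
    # then a weighted sum of hazard evaluations at the nodes
    z = 0.5 * t * nodes + 0.5 * t
    return 0.5 * t * np.sum(weights * np.exp(log_h0(z) + linpred))

def simulate_time(u, linpred=0.0, upper=15.0):
    # Brent's method on g(t) = exp(-H(t)) - u, with H(t) itself obtained
    # by quadrature at every function evaluation
    return brentq(lambda t: np.exp(-cum_hazard(t, linpred)) - u,
                  1e-8, upper, xtol=1e-8)

us = np.random.default_rng(7).uniform(size=200)
times = np.array([simulate_time(u) for u in us])
```

Each call to the root finder triggers repeated quadrature evaluations, which is why the computation times reported in the simulation studies of this chapter are longer than under the closed-form inversion method.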
4.6.4. Example simulation study
I now illustrate the algorithm by simulating survival data from the baseline hazard
function defined in Equation (4.12). In this case, I can use the Royston-Parmar survival
model, which models the baseline log cumulative hazard function using restricted cubic
splines, to assess how well it captures complex hazard functions. I apply a Royston-
Parmar model with 5 degrees of freedom and for comparison I also apply a Weibull
proportional hazards model. For each of 1000 repetitions I simulate 1000 survival
times, incorporating a binary and continuous covariate, representing gender, X1i ∼Bin(1, 0.5), and age, X2i ∼ N(65, 12), with associated log hazard ratios of β1 = −0.5
and β2 = 0.02, respectively. I assume administrative censoring at 5 years. Computation
time to generate the 1000 datasets was 144 seconds on an Intel Core i5 2.5GHz CPU.
In each repetition I monitor estimates of the log hazard ratios for the effects of gender
and age. Furthermore, I assess estimates for survival probabilities and hazard rates at
t = 1, 2, 3, 4, 5, estimated at X1i = 0 and X2i = 65.
Results are presented in Table 4.3. Under the Weibull model, I observe moderate
bias of -0.082 (-16.4%) for β1, the effect of gender, compared to essentially unbiased
estimates under the Royston-Parmar survival model. Estimates of the hazard and
survival functions are generally heavily biased under the Weibull model, with poor
coverage probabilities, compared to minimal bias and good coverage under the Royston-
Parmar approach. Note, there is a small amount of bias for the hazard at 5 years under
the Royston-Parmar survival model; however, given that this is not the true model, it
generally performs very well.
4.7. Simulating time-dependent effects
The presence of non-proportional hazards, i.e. time-dependent effects, is common-
place in the analysis of time to event data (Jatoi et al., 2011). This is often observed
in the analysis of registry based data sources where follow-up time can be over many
years (Lambert et al., 2011). Furthermore, evidence is often found of time-dependent
treatment effects (Mok et al., 2009).
4.7.1. Standard parametric distributions
Under standard parametric distributions, the inclusion of time-dependent effects can
be undertaken so as to ensure an analytically tractable and invertible cumulative hazard
function, allowing direct application of the inversion method. For example, under
Table 4.3. Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from Weibull and Royston-Parmar models.

                       Weibull                  RP
           Truth   Bias     95% Coverage    Bias     95% Coverage
β1        -0.500  -0.082    74.2            -0.001   93.1
β2         0.020   0.003    75.4             0.000   94.2
Survival
1 year     0.602   0.011    91.4             0.000   94.7
2 year     0.189   0.059     4.9             0.001   94.3
3 year     0.076   0.001    87.1            -0.000   94.2
4 year     0.037  -0.018    10.3            -0.001   94.4
5 year     0.018  -0.014     0.1             0.001   95.3
Hazard
1 year     1.105  -0.365     0.0            -0.001   95.6
2 year     1.064  -0.009    84.9            -0.005   94.7
3 year     0.796   0.503     0.0             0.022   94.4
4 year     0.681   0.824     0.0             0.013   96.1
5 year     0.709   0.979     0.0            -0.064   91.2
an exponential or a Gompertz baseline hazard function, the interaction of covariates
with linear time can be included, to result in a hazard function which can still be
directly integrated, and subsequently directly solved for the simulated survival time,
t. Similarly, under a Weibull distribution, an interaction can be formed between the
covariates and log time, equivalent to changing the shape parameter, γ.
For example, consider a binary covariate, X1, which takes values 0 or 1. Under a
Gompertz baseline hazard function, I can invoke non-proportional hazards by interact-
ing X1 with linear time, t:
h(t) = λ exp(γt+ β1X1 + β2X1t) (4.16)
Equation (4.16) can be re-arranged,
h(t) = λ exp [(γ + β2X1)t+ β1X1] (4.17)
integrated to obtain the cumulative hazard function

H(t) = ∫_0^t λ exp[(γ + β2X1)u + β1X1] du = [λ exp(β1X1) / (γ + β2X1)] {exp[(γ + β2X1)t] − 1}   (4.18)

I therefore have, from Equation (4.1), and letting T be the simulated survival time

U = exp(−[λ exp(β1X1) / (γ + β2X1)] {exp[(γ + β2X1)T] − 1})   (4.19)

which can be inverted and solved for T, the simulated survival time

T = [1 / (γ + β2X1)] log(−[(γ + β2X1) / (λ exp(β1X1))] log(U) + 1)   (4.20)
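Equation (4.20) can be applied directly; a short hypothetical Python sketch follows (the chapter's implementation is the Stata survsim package), with illustrative parameter values λ = 0.1, γ = 0.05, β1 = −0.5 and β2 = 0.1, which are not values used in the chapter.

```python
import numpy as np

def simulate_gompertz_tde(n, lam, gamma, beta1, beta2, seed=None):
    """Closed-form simulation from Equation (4.20): a Gompertz baseline with
    a binary covariate X1 interacted with linear time, Equation (4.16)."""
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(1, 0.5, size=n)
    u = rng.uniform(size=n)
    a = gamma + beta2 * x1            # effective Gompertz slope, gamma + beta2*X1
    t = np.log(-(a / (lam * np.exp(beta1 * x1))) * np.log(u) + 1.0) / a
    return t, x1, u

t, x1, u = simulate_gompertz_tde(5000, lam=0.1, gamma=0.05,
                                 beta1=-0.5, beta2=0.1, seed=3)
```

Substituting the simulated times back into the cumulative hazard of Equation (4.18) recovers the original uniform draws, confirming the inversion.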
This of course can be extended to multiple time-dependent effects; however, if I wish
to use a more complex distribution, or indeed a more complex function of time to
interact with covariates, I once again have analytically intractable and non-invertible
cumulative hazard functions.
4.7.2. Complex hazard functions
Incorporating time-dependent effects when simulating more complex hazard functions,
returns to the scenario where both numerical integration and iterative root-finding
procedures are required. For example, this arises when including a time-dependent
effect into the 2-component mixture Weibull model.
h(t) = h0(t) exp(β1(t)X) (4.21)
where β1(t) is a general function of time, t, such as a simple linear term, or something
more complex such as a fractional polynomial or spline function. This of course can
be extended to multiple time-dependent effects, with alternative complex functions of
time.
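This general setting can be sketched as follows (hypothetical Python code; the chapter uses Stata's survsim), with an illustrative time-dependent log hazard ratio β1(t) = −0.7 + 0.01t + 0.4 log(t) and an illustrative Weibull baseline, neither of which is prescribed by the framework itself.

```python
import numpy as np
from scipy.optimize import brentq

nodes, weights = np.polynomial.legendre.leggauss(30)

def simulate_tde(u, x,
                 beta1=lambda t: -0.7 + 0.01 * t + 0.4 * np.log(t),
                 h0=lambda t: 0.1 * 1.5 * t**0.5,
                 upper=200.0):
    """Simulate from h(t) = h0(t) * exp(beta1(t) * x), Equation (4.21),
    with Gauss-Legendre quadrature nested inside Brent's method."""
    def g(t):
        z = 0.5 * t * nodes + 0.5 * t              # map nodes onto (0, t)
        H = 0.5 * t * np.sum(weights * h0(z) * np.exp(beta1(z) * x))
        return np.exp(-H) - u
    return brentq(g, 1e-8, upper, xtol=1e-8)

t_treat = simulate_tde(0.5, x=1)      # diminishing protective effect
t_control = simulate_tde(0.5, x=0)    # plain Weibull baseline
```

Because the log hazard ratio is negative over early follow-up, the treated median event time exceeds the control median, as required.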
4.7.3. Example simulation study
I now conduct a simulation study assessing the performance of the Weibull and
Royston-Parmar models (using 5 degrees of freedom for the baseline) under propor-
tional hazards, when simulating a time-dependent diminishing treatment effect. I fur-
ther apply a Royston-Parmar model, with 5 degrees of freedom for the baseline, allowing
for a time-dependent treatment effect with 3 degrees of freedom. In all models I assess
estimates of the hazard and survival functions in each treatment group. Survival times
are simulated from the baseline hazard function shown in Equation (4.12).
For each of 1000 repetitions I simulate 1000 survival times, incorporating a binary
and continuous covariate, representing treatment, X1i ∼ Bin(1, 0.5) and age, X2i ∼N(65, 12), and assume administrative censoring at 5 years. I simulate a time-dependent
treatment effect under the following
β1(t) = −0.7 + 0.01t+ 0.4 log(t) (4.22)
and proportional age effect of β2 = 0.02. Figure 4.6 illustrates the underlying log
hazard ratio shown in Equation (4.22), indicating the diminishing treatment effect.
[Figure 4.6. Example time-dependent log hazard ratio representing a diminishing treatment effect.]
The true hazard ratio of the treatment effect at 1, 2 and 5 years is 0.502, 0.668 and
0.994, respectively, to represent a diminishing treatment effect. Computation time to
generate the 1000 datasets was 173 seconds on an Intel Core i5 2.5GHz CPU. In each
repetition I also monitor estimates of the log hazard ratios for the effect of age.
Results are presented in Table 4.4. As in Section 4.6.4, I observe very poor perfor-
mance when using a Weibull model, with large bias in the estimates of the hazard and
survival functions for both treatment groups. I observe improved performance under
the proportional hazards Royston-Parmar model; however, some bias and poor coverage
is seen in estimates of the hazard and survival functions, particularly in the treatment
group, but as this model assumes proportional hazards and the true model has non-
proportional hazards, this is to be expected. Under the Royston-Parmar model allowing
for a time-dependent treatment effect I observe much reduced bias and improved cover-
age probabilities, indicating that the model has captured the complex time-dependent
effect, even though I do not fit the true underlying model.
4.8. Simulating time-varying covariates
Time-varying covariates occur frequently in medical research. In cancer clinical
trials the occurrence of treatment switching or non-compliance, occurs when a patient
switches from, for example, the standard therapy to the new treatment, often around
the time of progression. An area of increasing interest in the biostatistical literature
is the joint modelling of longitudinal and survival data, where a repeatedly measured
biomarker is observed, and it is of interest to investigate the relationship of this time-
varying biomarker to survival. Joint modelling will be comprehensively described in
Chapters 6 to 10.
Recently Austin (2012) extended the methods of Bender et al. (2005) to simu-
late time-varying covariates of three types: first, a dichotomous time-varying covariate
that can change at most once; second, a continuous time-varying covariate; third, a
dichotomous time-varying covariate where subjects can switch groups multiple times.
Austin derived closed form expressions, including time-independent covariates, under
the exponential, Weibull and Gompertz distributions.
Table 4.4. Bias and coverage of the log hazard ratio and estimates of baseline survival and hazard at specific time points from proportional hazards Weibull and Royston-Parmar models, and non-proportional hazards Royston-Parmar model.

                     Weibull PH               RP PH                    RP NPH
         Truth   Bias     95% Coverage    Bias     95% Coverage    Bias     95% Coverage
β2       0.020   0.003    74.5             0.000   94.0             0.000   94.6
Survival (X1 = 0, X2 = 65)
1 year   0.602   0.035    39.0             0.030   61.7             0.000   94.9
2 year   0.189   0.062     5.0             0.006   91.3             0.001   94.0
3 year   0.076  -0.006    79.6            -0.010   78.9            -0.000   94.3
4 year   0.037  -0.022     0.7            -0.012   42.7            -0.001   95.5
5 year   0.018  -0.016     0.0            -0.008   38.4             0.001   95.7
Hazard (X1 = 0, X2 = 65)
1 year   1.105  -0.374     0.0            -0.058   90.1            -0.005   95.4
2 year   1.064   0.062    74.5             0.093   81.7            -0.012   94.8
3 year   0.796   0.654     0.0             0.224   10.6             0.025   95.0
4 year   0.681   1.054     0.0             0.268   17.9             0.025   95.6
5 year   0.709   1.286     0.0             0.223   54.0            -0.046   95.5
Survival (X1 = 1, X2 = 65)
1 year   0.803   0.180     0.0             0.168   61.7            -0.000   94.8
2 year   0.406   0.281     0.0             0.206   91.3             0.001   94.6
3 year   0.208   0.157     0.0             0.138   78.9            -0.001   94.8
4 year   0.112   0.061     0.0             0.086   42.7            -0.002   95.3
5 year   0.059   0.018     7.2             0.055   38.4             0.001   95.5
Hazard (X1 = 1, X2 = 65)
1 year   0.554  -0.707     0.0            -0.511    0.0             0.000   94.1
2 year   0.711  -0.451     0.0            -0.408    0.0            -0.008   94.2
3 year   0.632  -0.007    90.2            -0.218    0.1             0.020   93.6
4 year   0.612   0.263     0.0            -0.144   37.5             0.011   95.5
5 year   0.705   0.377     0.0            -0.181   38.1            -0.079   88.6

PH - proportional hazards, RP - Royston-Parmar model, NPH - non-proportional hazards
Under the simulation framework described above, I generalise the approach of
Austin (2012) to incorporate any combination of time-varying covariates, with a user-
defined baseline hazard function allowing more biologically realistic hazard functions,
although a standard parametric distribution can of course still be used.
4.8.1. Simulating treatment switching
In this scenario I wish to simulate a time-varying binary covariate. I define X1 to
represent initial treatment a patient is randomised to, with treatment A (X1 = 0) and
treatment B (X1 = 1). I assume patients were randomised to treatment arms at t = 0.
For simplicity I allow patients to switch arm at most once. I also include a binary
covariate which represents disease severity, X2, with each patient having a 40% chance
of having a bad prognosis (X2 = 1), which increases a patient’s event rate. Under a
general baseline hazard function, h0(t), the hazard function is written as
h(t) = h0(t) exp{β1[I(t ≤ ts)X1 + I(t > ts)(1 − X1)] + β2X2}   (4.23)
where β1 is the log hazard ratio for the effect of treatment, which in this case I assume
is the same regardless of whether patients switched or not. In this example I assume
that a patient initially randomised to treatment (X1 = 1) has a treatment effect of
exp(β1) until their switching time, ts, after which their hazard ratio is 1. Thus, the
time-dependence is introduced through the indicator functions I(t ≤ ts) and I(t > ts).
The switching times need to be generated; for example, I can generate the potential
switching times from a uniform distribution which depends on disease severity
(X2). Endogenous/non-ignorable treatment switching can be created if the variable
X2 is deleted and not available for analysis. This scenario can be easily extended
to allow for any number of switches. Alternatively, I could first generate a vector of
survival times, ts, to represent time to progression, using the simulation framework
described in the previous sections.
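The switching scenario above can be sketched numerically: draw U ~ U(0,1) and solve H(t) = −log U for t, evaluating the cumulative hazard by Gauss-Legendre quadrature. The sketch below is illustrative only, not the survsim implementation: the parameter values, the Weibull baseline, the upper search bound of 500, and the bisection root finder are all assumptions.

```python
import numpy as np

# Illustrative parameter values (assumed, not taken from the thesis)
beta1, beta2 = -0.5, 0.3

def h0(t):
    # Weibull baseline hazard with lambda = 0.1, gamma = 2 (illustrative)
    return 0.1 * 2.0 * t

def hazard(t, x1, x2, ts):
    # Equation (4.23): the treatment indicator switches at time ts
    tde = np.where(t <= ts, x1, 1 - x1)
    return h0(t) * np.exp(beta1 * tde + beta2 * x2)

nodes, weights = np.polynomial.legendre.leggauss(30)

def cumhaz(t, x1, x2, ts):
    # Gauss-Legendre approximation of H(t); the hazard is discontinuous at
    # ts, so accuracy could be improved further by integrating piecewise
    u = 0.5 * t * (nodes + 1)  # map nodes from [-1, 1] to [0, t]
    return 0.5 * t * np.sum(weights * hazard(u, x1, x2, ts))

def sim_time(x1, x2, ts, rng):
    # Invert S(t) = exp(-H(t)): solve H(t) = -log(U) by bisection
    target = -np.log(rng.uniform())
    lo, hi = 1e-8, 500.0  # assumed upper bound for the search
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cumhaz(mid, x1, x2, ts) < target else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
x1 = rng.integers(0, 2, 1000)      # randomised arm
x2 = rng.binomial(1, 0.4, 1000)    # 40% chance of poor prognosis
ts = rng.uniform(0.5, 3.0, 1000)   # potential switching times (assumed uniform)
times = np.array([sim_time(a, b, c, rng) for a, b, c in zip(x1, x2, ts)])
```

With the linear baseline hazard used here the quadrature is exact away from the switch point, so the generated times can be checked directly against the analytic cumulative hazard.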
4.8.2. Simulating a continuous time-varying biomarker and survival in-
corporating random effects and covariates measured with error
Here I wish to simulate a continuous biomarker, measured with error, exhibiting a
linear trend, under the following model
m(t) = β0i + β1it+ δui (4.24)
where
βi ∼ N(β,Σ) (4.25)
and ui is a vector of baseline covariates with associated coefficients δ. By including
the trajectory function, m(t), in the linear predictor of the survival model, multiplied
by an association parameter α, I can use the simulation algorithm described in Section
4.6.3 to directly simulate survival times under a joint model framework (Wulfsohn and
Tsiatis, 1997).
h(t) = h0(t) exp{ψᵀvi + αm(t)}   (4.26)
where h0(t) is our user defined baseline hazard function, vi is a vector of baseline
covariates with associated log hazard ratios ψ.
Following the simulation of survival times, I can then construct any measurement
schedule for the longitudinal outcome, using Equation (4.24), and subsequently calcu-
late the observed longitudinal measurements. To complete the joint model framework,
measurement error in the longitudinal outcome can be incorporated simply by drawing
the observed longitudinal values from N(m(t), σe²), where σe² is the measurement error
variance. This example further illustrates the ease with which random covariate
effects can be incorporated. This framework will be utilised in Chapters 7, 8 and
9 to simulate joint longitudinal-survival data.
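The joint-model scenario can be sketched in the same way: the trajectory m(t) enters the hazard through the association parameter α as in Equation (4.26), and measurement error is added after the survival times are generated. All numeric values below (baseline hazard, random-effects covariance, α, visit schedule, and the truncation of follow-up at t = 100 for subjects whose cumulative hazard never reaches the target) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
alpha, psi = 0.25, -0.4                      # association and treatment effect (assumed)
Sigma = np.array([[0.5, 0.1], [0.1, 0.05]])  # random-effects covariance (assumed)
b = rng.multivariate_normal([1.0, 0.2], Sigma, n)  # (beta0i, beta1i) per subject
trt = rng.integers(0, 2, n)                  # baseline covariate v_i

def traj(i, t):
    return b[i, 0] + b[i, 1] * t             # m_i(t) = beta0i + beta1i * t

nodes, weights = np.polynomial.legendre.leggauss(30)

def cumhaz(i, t):
    # h(t) = h0(t) exp(psi*trt + alpha*m_i(t)), with an illustrative Weibull h0
    u = 0.5 * t * (nodes + 1)
    h = 0.05 * 1.2 * u ** 0.2 * np.exp(psi * trt[i] + alpha * traj(i, u))
    return 0.5 * t * np.sum(weights * h)

def sim_time(i):
    # Bisection inversion of the cumulative hazard; subjects whose hazard
    # decays before reaching the target are truncated at t = 100 (assumed)
    target = -np.log(rng.uniform())
    lo, hi = 1e-8, 100.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cumhaz(i, mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

t_event = np.array([sim_time(i) for i in range(n)])

# Yearly measurement schedule up to the event, with measurement error
sigma_e = 0.3
visits = [np.arange(0.0, t_event[i], 1.0) for i in range(n)]
y = [traj(i, v) + rng.normal(0.0, sigma_e, v.size) for i, v in enumerate(visits)]
```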
4.9. Simulating a censoring distribution
In the previous examples I have assumed an administrative censoring time, i.e. a
maximum follow-up time for which each patient can be observed. In practice, intermittent
censoring is often observed, which may also be simulated. All of the scenarios and
techniques described above can be used to generate censoring times. By simulating a set
of event times and a second set of censoring times, for each patient, I can simply take the
minimum to obtain the observed survival time, and consequently the event indicator.
Furthermore, by making the censoring distribution dependent on covariates (be they
baseline covariates, with time-dependent effects, or time-varying), I can incorporate
informative censoring (Siannis et al., 2005). Alternatively, I could simulate survival
times, and then draw from a uniform distribution between the minimum and maximum
follow-up times to define a censoring fraction.
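Combining event and censoring times is a one-line operation; the sketch below is minimal and all distributions and the 5-year administrative limit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
event_t = rng.weibull(1.3, n) / 0.5  # illustrative event times
cens_t = rng.exponential(4.0, n)     # illustrative dropout (censoring) times
admin = 5.0                          # administrative censoring at 5 years (assumed)

# Observed time is the minimum of event, dropout and administrative times;
# the event indicator records whether the event time came first
obs_t = np.minimum(event_t, np.minimum(cens_t, admin))
died = (event_t <= np.minimum(cens_t, admin)).astype(int)
```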
4.10. survsim
The methodology developed in this chapter to simulate survival data has been
implemented in the user friendly Stata package survsim, which can be downloaded
from within Stata by typing ssc install survsim. Example code to simulate all of
the scenarios described in this chapter can be found in the associated publications, in
Appendix A.
4.11. Discussion
I have described a general framework for the generation of survival data, incorpo-
rating any combination of complex baseline hazard functions, time-dependent effects,
time-varying covariates, delayed entry, random effects and covariates measured with
error. This centres on scenarios where the simulated survival time cannot be defined
in a closed form expression.
Previous work on the simulation of survival data includes using standard baseline distri-
butions, such as the exponential, Weibull and Gompertz, with time-invariant covariates
(Bender et al., 2005). Mackenzie and Abrahamowicz (2002) described techniques to al-
low for time-dependent effects, and allowed specification of the marginal distribution
of event times and covariate distributions. A recent paper by Austin (2012) provided
closed form expressions to incorporate 3 types of time-varying covariates, which built
on work by Leemis (1987), who described techniques to invert the cumulative hazard
function with a single time-varying covariate. Furthermore, Sylvestre and Abrahamow-
icz (2008) describe two algorithms (permutation based and binomial model based) to
generate survival times with time-varying covariates. Finally, Royston (2012) provided
a method to simulate from parametric models that use restricted cubic splines on the log
cumulative hazard scale, using the Newton-Raphson root-finding technique described
in Section 4.6.1.3.
The general approach I have described relies on numerical integration to evaluate
analytically intractable hazard functions. In my experience, 15 to 20 Gauss-Legendre
quadrature points is often sufficient to provide accurate generation of survival times;
however, in the associated software I use 30 nodes as the default, as computation time is
often minimal. As with any estimation method which utilises numerical techniques, the
accuracy of the generation process can be assessed by defining a seed and changing the
tolerance of the root finder, and/or the number of quadrature nodes, and establishing
that the generated survival times do not change.
I have illustrated the simulation approach through a variety of simulation studies
and examples. In particular, by simulating from a complex underlying distribution, I
have shown that moderate bias can be observed in estimates of the log hazard ratio for
a treatment effect, when fitting a standard Weibull proportional hazards model.
Although in this chapter I have extolled the benefits of simulating from distribu-
tions beyond the standard choices, it must be stated that in many settings a simpler
distribution may be adequate. For example, if fitting Cox models under proportional
hazards and only the hazard ratio is of interest, then the baseline distribution used is
inconsequential and therefore a simpler distribution should take preference. However,
as described above, if evaluating parametric methods or incorporating time-dependent
effects, then using a more complex distribution can provide much more realistic sce-
narios in order to fully assess the methods being evaluated.
Further extensions not described in this thesis include incorporating a cure propor-
tion. This can be easily achieved by defining a mixture or non-mixture cure hazard
function. This framework can also be applied in the generation of competing risks data,
be it through cause-specific hazards or the approach of Beyersmann et al. (2009).
Given the inherent requirement of simulation studies to assess the statistical prop-
erties and performance of current and novel methods, I believe this framework can
play an important role in allowing the generation of more biologically realistic survival
data, incorporating much more complex scenarios. For example, the two-component
mixture distribution described in Section 4.6.1.1 has recently been used to simulate
joint model data from a baseline (cumulative) hazard function, to assess the use of
splines to capture complex baseline hazard functions (Crowther et al., 2012a), which
will be considered further in Chapter 7. Although I have concentrated on parametric
survival models in this chapter, the framework is entirely applicable to examining the
performance of the Cox model in any of the scenarios described (Cox, 1972).
CHAPTER 5
A general framework for parametric survival analysis
5.1. Chapter outline
In this chapter I develop a general framework for the parametric analysis of sur-
vival data. Building on the simulation framework described in Chapter 4, numerical
integration will be used to allow the estimation of a parametric survival model, with
essentially any general hazard or log hazard function. I will describe a variety of ex-
tensions such as complex time-dependent effects, relative survival (excess mortality
models) and cluster robust standard errors. Concentrating on the special case of using
restricted cubic splines, on the log hazard scale, I will further show how the estimation
routine can be substantially improved using a combined analytic/numeric approach.
Example Stata code, using the publicly available stgenreg package, will be used to
display the flexibility and use of the modelling framework in developing novel survival
models.
This chapter is based on two recent papers; the first published in the Journal
of Statistical Software (Crowther and Lambert, 2013b), and the second published in
Statistics in Medicine (Crowther and Lambert, 2014), copies of which can be found in
Appendix B.
5.2. Introduction
The most popular tool for analysing survival data remains the Cox proportional
hazards model (Cox, 1972), which avoids making any assumptions for the shape of
the baseline hazard function. One of the reasons the Cox model remains the preferred
choice over parametric models is that parametric models available in standard software
are often not flexible enough to capture the underlying shape of the hazard function
seen in real data, illustrated in Section 4.3 of Chapter 4. Not having to specify the
baseline hazard function can be seen as both a strength and a weakness. The virtue of
the Cox model is that no assumption need be made about the functional form of the
baseline hazard, since any assumption made could of course be incorrect. Conversely, Cox models
are estimated using partial likelihood, which is less efficient than full maximum
likelihood when the model is correctly specified (Collett, 2003), and when interest lies in
measures of absolute risk it is much more convenient to obtain them parametrically.
Parametric models, such as those described in Chapter 2, remain a standard tool
for the analysis of survival data. As described in Section 2.9, knowledge of the baseline
hazard or survival allows absolute risk predictions over time (e.g. in prognostic models),
enables hazards ratios to be translated back to the absolute scale (e.g. useful for cal-
culating the number needed to treat), and further clinically meaningful statistics such
as differences in survival proportions (Lambert et al., 2010a). Parametric models are
also useful where extrapolation is required, such as in the economic decision modelling
framework (Weinstein et al., 2003; Latimer, 2013), and when capturing time-dependent
effects (Lambert et al., 2011). In particular, as described in Section 2.8.2, many authors
have shown the benefits of the Royston-Parmar flexible parametric model, as hazard
ratio estimates closely match those from a Cox model, whilst gaining the advantages
of undertaking a parametric approach, where the flexible modelling of
the baseline log cumulative hazard function and of time-dependent effects are particu-
larly appealing aspects of the modelling framework (Miladinovic et al., 2012; Rooney
et al., 2013). However, modelling on the log cumulative hazard scale is not without
its disadvantages; when there are multiple time-dependent effects there are difficulties
in interpretation of time-dependent hazard ratios, since these will vary over different
covariate patterns, even with no interaction between these covariates (Royston and
Lambert, 2011). Moving to the more traditional (log) hazard scale, whilst maintaining
flexibility, is desirable.
The traditional approach to estimation of parametric models is through maximum
likelihood, as discussed in Section 2.10. This is relatively simple when using a known
probability distribution function, such as the Weibull or Gompertz. Many commonly
used parametric survival models are implemented in a variety of software packages,
such as the streg package in Stata (StataCorp., 2011), survreg (Therneau, 2012) in
R (R Development Core Team, 2012) and LIFEREG in SAS (SAS Institute Inc., 2008).
However, every parametric model makes underlying assumptions, differing in the
assumed shape of the hazard and in how covariate effects act (proportional hazards,
accelerated failure time); for example, the widely used Weibull proportional hazards model assumes a
monotonically increasing or decreasing baseline hazard rate. Such assumptions can be
considered restrictive, leading to the development of other more flexible approaches
(Royston and Parmar, 2002; Royston and Lambert, 2011).
In this chapter, I present a general framework for the parametric analysis of sur-
vival data. Through the use of numerical integration, introduced in Chapter 4, any
hazard or log hazard function which can be written in a closed form expression can be
fitted through maximum likelihood. This allows complex extensions to standard para-
metric models, for example, modelling the log baseline hazard function using splines or
fractional polynomials, as well as complex time-dependent effects; methods that are un-
available in standard software. I will further extend the framework to relative survival
(excess mortality models) and cluster robust standard errors.
Concentrating on the special case of using restricted cubic splines, on the log hazard
scale to ensure positivity of the hazard function, I will further show how the estimation
routine can be substantially improved using a combined analytic/numeric approach.
Various types of splines have been used in the analysis of survival data, predominantly
on the hazard scale which results in an analytically tractable cumulative hazard func-
tion. For example, M-splines, which by definition are non-negative can be directly
applied on the hazard scale, due to the positivity condition. Kooperberg et al. (1995)
proposed using various types of splines on the log hazard scale, such as piecewise linear
splines (Kooperberg et al., 1995; Kooperberg and Clarkson, 1997).
The framework will be illustrated using the publicly available Stata package stgenreg
(Crowther and Lambert, 2013b), to display the flexibility and use of the modelling
framework in developing novel survival models. A further package, strcs, implements
the combined analytic/numerical approach.
5.3. A general framework for the parametric analysis of survival data
I now return to the estimation of parametric survival models, first described in
Section 2.10. Once again, the log-likelihood contribution of the ith patient, allowing for
right censoring and delayed entry (left truncation), can be written as
logLi = log{ f(ti)^di S(ti)^(1−di) / S(t0i) }

      = di log f(ti) + (1 − di) log S(ti) − log S(t0i)   (5.1)
where t0i and ti are the observed entry and survival/censoring times for the ith patient.
Using Equation (5.1) I can directly maximize the log-likelihood if using known proba-
bility density and survival functions. Alternatively, as shown in Section 2.10, Equation
(5.1) can be written as
logLi = di log h(ti) − ∫_{t0i}^{ti} h(u) du   (5.2)
The important aspect to note from Equation (5.2) is that the log-likelihood can also be
maximised if only the hazard function is known. The maximisation of such a specified
hazard model relies on being able to evaluate the integral in Equation (5.2). If I
propose to use such functions as fractional polynomials or splines to model a complex
baseline (log) hazard function, or to incorporate complex time-dependent effects, then I
have a situation where this integral cannot always be evaluated analytically, motivating
alternative approaches.
5.3.1. Numerical integration
I propose to use numerical quadrature, introduced in Chapter 4, to evaluate the cu-
mulative hazard, and hence maximise the likelihood in Equation (5.2). This allows
the estimation of a parametric survival model, specifying any function for the baseline
hazard, satisfying h(t) > 0 for all t > 0.
As described in Section 4.6.3.2 of Chapter 4, Gaussian quadrature allows the eval-
uation of an analytically intractable integral through a weighted sum of a function
evaluated at a set of pre-defined points, known as nodes (Stoer and Burlirsch, 2002).
The integral over [t0i, ti] in Equation (5.2) must be changed to an integral over [−1, 1]
using the following change of interval rule
∫_{t0i}^{ti} h(x) dx = ((ti − t0i)/2) ∫_{−1}^{1} h( ((ti − t0i)/2) x + (t0i + ti)/2 ) dx
Gaussian quadrature can then be applied as follows
∫_{t0i}^{ti} h(x) dx ≈ ((ti − t0i)/2) Σ_{j=1}^{m} wj h( ((ti − t0i)/2) xj + (t0i + ti)/2 )
where xj is the jth quadrature node with associated weight, wj , where j = 1, . . . ,m.
This transformation allows the incorporation of delayed entry directly into the model
framework. As in Chapter 2, the default choice of quadrature is Gauss-Legendre.
Equation (5.2) then becomes
logLi ≈ di log h(ti) − ((ti − t0i)/2) Σ_{j=1}^{m} wj h( ((ti − t0i)/2) xj + (t0i + ti)/2 )   (5.3)
Equation (5.3) can then be maximised using standard methods described in Section
2.10. The accuracy of the numerical integral depends on the number of quadrature
nodes, m, with node locations dependent on the type of quadrature chosen. As with all
methods which use numerical integration, the stability of maximum likelihood estimates
should be established by using an increasing number of quadrature nodes. This is
explored further in Sections 5.9.1 and 5.10.1, and within the joint model framework in
Chapter 7.
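Equation (5.3) is straightforward to evaluate for any closed-form (log) hazard. The sketch below checks the quadrature likelihood contribution against the analytic answer for a Weibull hazard, where the cumulative hazard λt^γ is available in closed form; all parameter values are illustrative.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(30)

def loglik_i(log_h, t0, t, d):
    # Equation (5.3): d * log h(t) minus the Gauss-Legendre estimate of the
    # cumulative hazard over [t0, t], which handles delayed entry directly
    u = 0.5 * (t - t0) * nodes + 0.5 * (t0 + t)
    cumhaz = 0.5 * (t - t0) * np.sum(weights * np.exp(log_h(u)))
    return d * log_h(t) - cumhaz

# Weibull hazard h(t) = lam * gam * t^(gam-1) (illustrative values),
# for which H(t) - H(t0) = lam * t^gam - lam * t0^gam analytically
lam, gam = 0.2, 1.5
log_h = lambda t: np.log(lam * gam) + (gam - 1) * np.log(t)
t0, t, d = 0.5, 3.0, 1
exact = d * log_h(t) - (lam * t ** gam - lam * t0 ** gam)
```

Because the integrand is smooth over [t0, t], 30 Gauss-Legendre nodes reproduce the analytic value to near machine precision here.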
5.4. Time-dependent effects
Time-dependent effects were introduced in Section 4.7, where the effect of a co-
variate is allowed to vary over time, also commonly known as non-proportional haz-
ards. Time-dependent effects are incorporated seamlessly into this modelling frame-
work. Within the general hazard function, an interaction can be formed between a
covariate(s) and any function of time, to provide the desired level of flexibility.
h(t) = h0(t) exp [X1β1 +X2β2(t)] (5.4)
where h0(t) is the baseline hazard function which could be anything from a simple
exponential to something more complex such as a spline based function. Here, X1
is a vector of time-independent (baseline) covariates with associated log hazard ratios,
β1, and X2 is a further baseline covariate (usually ∈ X1), with β2(t) a function of
time, t, such as a simple linear term, or something more complex such as a fractional
polynomial or spline function. The extension to multiple time-dependent effects follows
naturally.
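As a sketch of Equation (5.4), take β2(t) = β20 + β21 log(t), a simple assumed form (the thesis framework allows splines or fractional polynomials here); the resulting hazard ratio for X2 then varies smoothly over time. The baseline hazard and all parameter values below are illustrative.

```python
import numpy as np

def hazard(t, x1, x2, beta1=-0.5, b20=0.2, b21=-0.3):
    # Equation (5.4) with beta2(t) = b20 + b21*log(t); the baseline is an
    # illustrative Weibull hazard (all parameter values assumed)
    h0 = 0.1 * 1.3 * t ** 0.3
    return h0 * np.exp(x1 * beta1 + x2 * (b20 + b21 * np.log(t)))

t = np.linspace(0.1, 5.0, 50)
hr_x2 = hazard(t, 0, 1) / hazard(t, 0, 0)  # time-dependent hazard ratio for X2
```

At t = 1 the hazard ratio is exp(β20), and with β21 < 0 it decreases over follow-up, a common pattern for effects that wane with time.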
5.5. Excess mortality models
In population-based studies where interest lies in mortality associated with a par-
ticular disease, it is not always possible to use cause of death information. This may
be due to this information not being available or it considered too unreliable to use
(Begg and Schrag, 2002; Fall et al., 2008). In these situations it is common to model
and estimate excess mortality by comparing the mortality experienced amongst a dis-
eased population to that expected amongst a disease free population. The methods
have most commonly been applied to population-based cancer studies, but have also
been used in studies of HIV (Bhaskaran et al., 2008) and cardiovascular disease (Nelson
et al., 2007). The total mortality (hazard) rate for the ith patient, hi(t), is partitioned
into the expected mortality rate, h∗i (t), and the excess mortality rate associated with
a diagnosis of disease, λi(t).
hi(t) = h∗i (t) + λi(t) (5.5)
The expected mortality rate, h∗i (t), is usually obtained from national or regional life
tables stratified by age, calendar year, sex and sometimes other covariates such as
socio-economic class (Coleman et al., 1999).
Transforming to the survival scale gives,
Si(t) = S∗i (t)Ri(t) (5.6)
where Ri(t) is known as the relative survival function and S∗i (t) is the expected survival
function. The effect of covariates on the excess mortality rate is usually considered to
be multiplicative and so covariates, Xi are modelled as,
hi(t) = h∗i (t) + λ0(t) exp(Xiβ) (5.7)
where h∗i (t) is the expected mortality rate at time t, λ0(t) is the baseline excess hazard
function and the β’s are log excess hazard ratios (also referred to as log excess mortality
rate ratios). This model assumes proportional excess hazards, but in population-based
cancer studies this assumption is rarely true and there has been substantial work on
methods to fit models that relax the assumption of proportionality (Bolard et al., 2002;
Giorgi et al., 2003; Dickman et al., 2004; Nelson et al., 2007). A further fundamental
assumption is that h∗i (t) and S∗i (t) are assumed known.
A common model for analysing excess mortality is an extension of Royston-Parmar
models (Nelson et al., 2007), which were described in Section 2.8.2. With multiple
time-dependent effects interpretation of hazard ratios can be complicated and so there
are advantages to modelling on the log hazard scale instead. For example, in a model
on the log cumulative excess hazard scale where both age group and sex are modelled as
time-dependent effects, but with no interaction between the covariates, the estimated
time-dependent hazard ratio for sex would be different in each of the age groups. In a
model on the log excess hazard scale, this would not be the case (Royston and Lambert,
2011). Within the relative survival field, previous work by Remontet et al. (2007) used
numerical integration, but used quadratic splines, limited to only two knots, with no
restriction on the splines. Here I generalise the approach.
The log-likelihood for an excess mortality model is defined as,
logLi = di log{h∗(ti) + λ(ti)} + log{S∗(ti)} + log{R(ti)} − log{S∗(t0i)} − log{R(t0i)}   (5.8)
Since the terms log S∗(ti) and log S∗(t0i) do not depend on any model parameters
they can be omitted from the log-likelihood function for purposes of estimation, which
becomes
logLi = di log{h∗(ti) + λ(ti)} + log{R(ti)} − log{R(t0i)}   (5.9)
This means that in order to estimate the model parameters the expected mortality rate
at the time of death, h∗(ti), is needed for subjects that experience an event.
Under the general parametric framework using numerical quadrature, Equation
(5.9) becomes
logLi = di log{h∗(ti) + λ(ti)} − ∫_{t0i}^{ti} λ(u) du

      ≈ di log{h∗(ti) + λ(ti)} − ((ti − t0i)/2) Σ_{j=1}^{m} wj λ( ((ti − t0i)/2) xj + (t0i + ti)/2 )   (5.10)
which provides a general framework for relative survival models (Crowther and Lam-
bert, 2014), whereby Equation (5.10) can be maximised using techniques described in
Section 2.10 of Chapter 2.
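A sketch of the likelihood contribution in Equation (5.10): only the excess hazard λ(t) requires integration, while the expected rate h∗(ti) enters as a known constant at the event time. The values below are illustrative, and a constant excess rate is chosen so the answer is available analytically for checking.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(30)

def loglik_excess(lam, h_star_ti, t0, t, d):
    # Equation (5.10): total hazard = expected + excess at the event time;
    # only the excess cumulative hazard needs numerical integration
    u = 0.5 * (t - t0) * nodes + 0.5 * (t0 + t)
    cum_excess = 0.5 * (t - t0) * np.sum(weights * lam(u))
    return d * np.log(h_star_ti + lam(t)) - cum_excess

# Constant excess rate of 0.1/year (assumed), expected rate 0.02 at death;
# analytically the contribution is log(0.12) - 0.1 * 4
lam = lambda t: 0.1 * np.ones_like(np.asarray(t, dtype=float))
ll = loglik_excess(lam, 0.02, 0.0, 4.0, 1)
```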
5.6. Cluster robust standard errors
In standard survival analysis the assumption of independent observations is gener-
ally made; however, in some circumstances observations may be correlated if a group
structure exists within the data. For example in the analysis of recurrent event data,
where individual patients can experience an event multiple times, resulting in multiple
observations per individual (Prentice et al., 1981). In these circumstances, correlation
between observations of the same group would be expected. Failing to account for this
sort of structure can lead to underestimated standard errors.
Given V , our standard estimate of the variance covariance matrix, which is the
inverse of the negative Hessian matrix evaluated at the maximum likelihood estimates,
I define the robust variance estimate developed by Huber (1967), White (1980) and
White (1982)
Vr = V ( Σ_{i=1}^{N} ui′ ui ) V   (5.11)
where ui is the contribution of the ith observation to ∂ logL/∂β, with N the total
number of observations.
This can be extended to allow for a clustered structure. Suppose the N observations
can be classified into M groups, which I denote by G1, . . . , GM , where groups are now
assumed independent rather than individual level observations. The robust estimate of
variance becomes
Vr = V ( Σ_{j=1}^{M} uj(G)′ uj(G) ) V   (5.12)

where uj(G) is the contribution of the jth group to ∂ logL/∂β. More specifically, Rogers
(1993) noted that if the log-likelihood is additive at the observation level, where
logL = Σ_{i=1}^{N} logLi (5.13)
then with ui = ∂ logLi/∂β, which is estimated numerically as described in Section
2.10.2 of Chapter 2, I have
uj(G) = Σ_{i∈Gj} ui (5.14)
I follow the implementation in Stata which also incorporates a finite sample adjustment
of Vr* = (M/(M − 1)) Vr.
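The calculation can be sketched directly: given observation-level score contributions u_i (here random numbers standing in for ∂ logL_i/∂β) and a model-based variance matrix V, both assumed for illustration, the cluster robust estimate sums scores within clusters before forming the sandwich.

```python
import numpy as np

def cluster_robust(V, scores, groups):
    # Sandwich estimator V * (sum_j u_j' u_j) * V, with observation-level
    # score rows summed within clusters, times the M/(M-1) adjustment
    groups = np.asarray(groups)
    labels = np.unique(groups)
    M = labels.size
    U = np.vstack([scores[groups == g].sum(axis=0) for g in labels])
    meat = U.T @ U
    return (M / (M - 1)) * V @ meat @ V

rng = np.random.default_rng(4)
scores = rng.normal(size=(100, 3))  # stand-in for d logL_i / d beta
V = np.eye(3) * 0.01                # illustrative model-based variance
Vr = cluster_robust(V, scores, rng.integers(0, 10, 100))
```

When every observation forms its own cluster, the estimator reduces to the ordinary robust variance of Equation (5.11) multiplied by N/(N − 1).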
5.7. Prediction
Following a model fit, a variety of predictions can be calculated to assess model
fit, and produce clinically meaningful and interpretable plots. Of course, given that
I use numerical integration to calculate the cumulative hazard, and subsequently the
survival function in the estimation of a model, this means that numerical integration
must be used to calculate predictions of the cumulative hazard and survival functions.
For example, the fitted survival function is given by
S(t) = exp( − ∫_0^t h(u) du )   (5.15)
When approximating the integral in the above equation using quadrature, this becomes
S(t) ≈ exp( − (t/2) Σ_{j=1}^{m} wj h( (t/2) xj + t/2 ) )   (5.16)
This approximation means that the calculation of confidence intervals does not follow
naturally; however, given that Equation (5.16) is simply the exponential of a linear
combination (weighted sum), I can apply the delta method (Carstensen, 2004).
This method provides a way of calculating confidence intervals for the cumulative haz-
ard and survival functions, for any specified (log) hazard function. I compare this
method of calculating confidence intervals to an analytical form in Section 5.9.1.
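A sketch of a delta-method interval computed on the log cumulative hazard scale, with the gradient obtained by finite differences; the Weibull parameterisation, parameter values, and covariance matrix below are all assumed for illustration rather than taken from a fitted model.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(30)

def log_cumhaz(theta, t):
    # Weibull hazard with theta = (log lambda, log gamma), so both
    # parameters are unrestricted; H(t) evaluated by quadrature
    lam, gam = np.exp(theta)
    u = 0.5 * t * (nodes + 1)
    return np.log(0.5 * t * np.sum(weights * lam * gam * u ** (gam - 1)))

def surv_ci(theta, Vtheta, t, eps=1e-6):
    # Delta method on log H(t): numeric gradient, then back-transform the
    # interval to the survival scale via S = exp(-exp(log H))
    g = np.array([(log_cumhaz(theta + eps * np.eye(2)[k], t)
                   - log_cumhaz(theta - eps * np.eye(2)[k], t)) / (2 * eps)
                  for k in range(2)])
    lh = log_cumhaz(theta, t)
    se = np.sqrt(g @ Vtheta @ g)
    lo, hi = lh - 1.96 * se, lh + 1.96 * se
    return np.exp(-np.exp(lh)), np.exp(-np.exp(hi)), np.exp(-np.exp(lo))

theta = np.array([np.log(0.2), np.log(2.0)])  # illustrative estimates
Vtheta = np.diag([0.01, 0.005])               # illustrative covariance
s, s_lo, s_hi = surv_ci(theta, Vtheta, 2.0)
```

Working on the log cumulative hazard scale keeps the interval within (0, 1) after back-transformation, a common motivation for this choice.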
5.8. Implementation in Stata: The stgenreg package
A central theme of the thesis is to not only develop methods for the analysis of
complex data, but to provide user friendly Stata software which puts the methods
directly into practice, to enable researchers to utilise the methodology described.
The methodological framework described above is implemented in the Stata package
stgenreg (Crowther and Lambert, 2013b). In this section I describe the implementation
of the program, to aid understanding of the flexibility of the approach, and how
it can be used, often through a single line of code, to fit novel survival models.
5.8.1. Program implementation and syntax
The log-likelihood shown in Equation (5.3) is maximized using the Newton-Raphson
algorithm, with first and second derivatives estimated using finite differences, as imple-
mented in the ml command in Stata (Gould et al., 2010), described in Section 2.10.
The evaluator program has been optimised using Stata’s matrix programming lan-
guage, Mata. This provides computational benefits and use of the wide array of math-
ematical functions available for the user to specify in the definition of the hazard or
log hazard function. In addition, I have implemented special cases of specific functions
which allow the incorporation of restricted cubic splines or fractional polynomials into
the hazard or log hazard function (Durrleman and Simon, 1989; Royston and Altman,
1994), making it extremely simple for the user to incorporate such complex and flexible
functions.
When using stgenreg one of the options loghazard() or hazard() must be de-
fined. These specify a user-defined log hazard or hazard function. The function must
be defined in Mata code, with parameters specified in square brackets, for example
[ln lambda]. The use of Mata means that mathematical operations require a colon
(:) prefix, for example :+ instead of +. Time must be coded as #t. The user can
specify covariates or functions of time within the linear predictor of any parameter,
providing a highly flexible framework.
For example, a Weibull distribution can be specified using either the log hazard or
hazard function. Each parameter should be parameterised to contain the entire real
number line, for example, both λ and γ are restricted to be positive by modelling on
the log scale.
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ (exp([ln_gamma]):-1):*log(#t))
A linear predictor can be defined for any of the parameters, with the name of the
option defined as the name of the parameter specified in the loghazard() or hazard()
option. For example, a proportional hazards Weibull model can be fitted with covari-
ates treatment, age and sex by adding the option ln lambda(treatment age sex), as
follows
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ (exp([ln_gamma]):-1):*log(#t))
> ln_lambda(treatment age sex)
One of the key advantages of stgenreg is that I can incorporate a variety of func-
tions (including functions of time) into the linear predictor of any parameter. For
example, parameter [ln lambda] has an available option ln lambda(comp1 | comp2
| ...| compn), which can contain a variety of component functions to increase com-
plexity. Each compj can contain a variety of functions described in Table 5.1.
Table 5.1. Description of each component that can be included in the linear predictor of a parameter included in a stgenreg model.

Component                  Description
varlist [, nocons]         the user may specify a standard variable list within a
                           component section, with an optional nocons option
g(#t)                      where g() is any user defined function of #t written
                           in Mata code, for example #t:^2
#rcs(options)              creates restricted cubic splines of either log time or
                           time. Options include df(int), the number of degrees
                           of freedom; noorthog, which turns off the default
                           orthogonalisation; time, which creates splines using
                           time rather than log time, the default; and
                           offset(varname), to include an offset when calculating
                           the splines. See rcsgen in Stata for more details.
#fp(numlist [, options])   creates fractional polynomials of time with powers
                           defined in numlist. If 0 is specified, log time is
                           generated. The only current option is offset(), which
                           is consistent with that described in #rcs() above.
varname:*f(#t)             to include time-dependent effects, where f(#t) is one
                           of #rcs(), #fp() or g().
Finally, all standard options of the ml suite in Stata can be used when fitting a
stgenreg model, such as constraints() which allow the user to constrain the value of
any coefficient to be a particular constant, and weights which can allow the extension
for use in case-cohort studies.
5.8.2. Predictions
A variety of predictions can be obtained following the estimation of a model. These
include the hazard, survival and cumulative hazard functions. The standard Stata syn-
tax to obtain predictions following a model fit is predict newvarname, statistic.
So for example, to obtain the fitted survival function, you simply specify predict
surv1, survival. Extended prediction options unavailable in standard software include:
zeros, which obtains baseline predictions; at(), which obtains predictions at specified
covariate patterns; and timevar(), which obtains predictions at specified times. These
options can be combined with standard choices of hazard, cumhazard and survival. Finally, the
ci option can be used to obtain confidence intervals using the delta method (Carstensen,
2004).
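The delta-method calculation behind the ci option can be sketched numerically. The Python fragment below is not part of stgenreg; it builds a pointwise confidence interval for a fitted Weibull survival probability from a numerical gradient, using the point estimates reported later in Section 5.9.1, and an assumed diagonal covariance matrix as a stand-in for the full estimated one.

```python
import math

def surv(theta, t):
    # Weibull survival S(t) = exp(-lambda * t^gamma), theta = (log lambda, log gamma)
    lam, gam = math.exp(theta[0]), math.exp(theta[1])
    return math.exp(-lam * t ** gam)

def delta_ci(theta, V, t, eps=1e-6):
    # delta method: var{S(t)} ~= grad' V grad, with a forward-difference gradient
    s = surv(theta, t)
    grad = []
    for i in range(len(theta)):
        th = list(theta)
        th[i] += eps
        grad.append((surv(th, t) - s) / eps)
    var = sum(grad[i] * V[i][j] * grad[j]
              for i in range(len(theta)) for j in range(len(theta)))
    se = math.sqrt(var)
    return s - 1.96 * se, s + 1.96 * se

theta = [-2.823, 0.045]                        # point estimates from Section 5.9.1
V = [[0.0372 ** 2, 0.0], [0.0, 0.0181 ** 2]]   # assumed diagonal covariance (sketch only)
print(delta_ci(theta, V, 5))
```

In practice the full covariance matrix of the maximum likelihood estimates, including off-diagonal terms, would be used.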
5.9. Analysis of example datasets using stgenreg
In this section I illustrate the use of the general modelling framework using the
England and Wales breast cancer dataset, described in Section 3.3 of Chapter 3, and
the dataset of patients with liver cirrhosis, described in Section 3.5 of Chapter 3. Briefly,
the England and Wales breast cancer dataset consists of 9721 women aged under 50 and
diagnosed with breast cancer in England and Wales between 1986 and 1990. The event
of interest is death from any cause, with follow-up restricted to 5 years. Deprivation
was categorised into 5 levels; however, I have restricted the analyses to comparing the
least and most deprived groups, for illustrative purposes. I therefore only consider a
binary covariate, dep5, with 0 for the least deprived and 1 for the most deprived group.
The second dataset consists of 488 patients with liver cirrhosis (Anderson et al., 1993).
A total of 251 patients were randomised to receive prednisone, with 237 randomised to
receive a placebo. Prothrombin index was measured repeatedly, with between 1 and
17 measurements per subject, resulting in 2968 observations. Outcome was all-cause
death.
5.9.1. Proof of concept - Weibull proportional hazards model
I begin by fitting a Weibull proportional hazards model to the breast cancer dataset,
investigating the effect of deprivation status. Given that Weibull models are available in
all standard statistical software, I first illustrate the concept showing that the estimates
agree with estimates derived using analytically tractable definitions of the hazard and
survival functions. The baseline hazard and log hazard functions have the following
form
h_i(t) = \lambda \gamma t^{\gamma - 1} \exp(X_i\beta)
and
\log(h_i(t)) = \log(\lambda) + \log(\gamma) + (\gamma - 1)\log(t) + X_i\beta
where Xi is a vector of covariates, with corresponding regression coefficients β. In this
case it is convenient to use the loghazard() option of stgenreg. I can investigate
covariate effects by including deprivation status in the linear predictor of log(λ), using
the option ln_lambda().
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ (exp([ln_gamma]):-1):*log(#t))
>     ln_lambda(dep5)

Log likelihood = -8808.3461                        Number of obs   =      9721

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_lambda    |
        dep5 |   .2698351   .0392018     6.88   0.000     .1930011    .3466692
       _cons |  -2.823244   .0371849   -75.92   0.000    -2.896125   -2.750363
-------------+----------------------------------------------------------------
ln_gamma     |
       _cons |   .0454263   .0181255     2.51   0.012     .0099009    .0809517
------------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 15 nodes
A log hazard ratio of 0.270 (95% CI: 0.193, 0.347) is observed, and consequently a
hazard ratio of 1.310 (95% CI: 1.213, 1.414), indicating a 31% increase in the mortality
rate in the most deprived group compared to the least deprived. I could further model
the γ parameter by deprivation status, simply by adding the option ln gamma(dep5).
When fitting models which rely on numerical integration, it is important to estab-
lish the stability of maximum likelihood estimates by using an increasing number of
quadrature nodes. In the case of a Weibull proportional hazards model, I can both
compare with the optimised model using streg in Stata (where integrals are obtained
analytically), and compare with an increasing number of quadrature nodes to evaluate
the numerical approximation. Here I present results from fitting the streg model and
stgenreg models with 15, 30, 50 and 100 nodes.
Table 5.2. England and Wales breast cancer dataset. Results from optimised Weibull and general parametric models with varying number of nodes.

                        Optimised            General parametric models
Variable                  Weibull   15 nodes   30 nodes   50 nodes  100 nodes
log(λ): Most deprived       0.270      0.270      0.270      0.270      0.270
  (standard error)          0.039      0.039      0.039      0.039      0.039
log(λ): Intercept          -2.825     -2.823     -2.825     -2.825     -2.825
  (standard error)          0.037      0.037      0.037      0.037      0.037
log(γ)                      0.047      0.045      0.046      0.047      0.047
  (standard error)          0.018      0.018      0.018      0.018      0.018
log-likelihood          -8808.085  -8808.346  -8808.149  -8808.108  -8808.091
From Table 5.2, consistent parameter estimates are observed to 3 decimal places
with 30 nodes, and accuracy improves as the number of nodes is increased. However,
computation time also increases with the number of nodes; for example, using 15 nodes
takes 7.4 seconds compared with 12.4 seconds using 100 nodes (on a HP laptop with
Intel i5 2.5GHz processor and 8GB of RAM). In comparison, the fully optimised streg
model took 0.4 seconds to converge. This difference is clearly expected,
as the stgenreg formulation of the Weibull model is not the most computationally effi-
cient, as there is no need to use numerical integration when using the standard Weibull
model. The performance of the quadrature approximation is more formally evaluated
in Section 5.10.1.
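The convergence behaviour in Table 5.2 reflects standard Gauss-Legendre quadrature. As a minimal numeric sketch (not part of the stgenreg code; it reuses the point estimates above), the quadrature approximation of the Weibull cumulative hazard can be compared with its analytic value as the number of nodes grows.

```python
import numpy as np

def cumhaz_quad(t, lam, gam, m):
    """Approximate H(t) = integral of lam*gam*u^(gam-1) over [0, t]
    with m-node Gauss-Legendre quadrature."""
    z, w = np.polynomial.legendre.leggauss(m)   # nodes/weights on [-1, 1]
    u = t / 2 * z + t / 2                       # map nodes to [0, t]
    return t / 2 * np.sum(w * lam * gam * u ** (gam - 1))

lam, gam, t = np.exp(-2.823), np.exp(0.045), 5.0
exact = lam * t ** gam                          # analytic Weibull cumulative hazard
for m in (15, 30, 100):
    print(m, abs(cumhaz_quad(t, lam, gam, m) - exact))
```

The absolute error shrinks as nodes are added, mirroring the stabilisation of the log-likelihood in Table 5.2.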
To illustrate the use of the delta method in calculating confidence intervals for
fitted cumulative hazard and survival functions, I present Figure 5.1, which compares
the fitted survival function and 95% confidence intervals from the analytic Weibull, and
the equivalent Weibull model fitted with quadrature.
Figure 5.1 indicates extremely close agreement between the confidence interval cal-
culated using the delta method and quadrature, described in Section 5.7, compared to
[Figure: survival function (y-axis, 0.7-1.0) against follow-up time in years (x-axis, 0-5); fitted survival and lower/upper 95% confidence limits shown for both the numeric and analytic models.]
Figure 5.1. England and Wales breast cancer dataset. Comparison of fitted survival function and confidence intervals between analytic Weibull model and equivalent model fitted with quadrature.
the analytic equivalent, illustrating the validity of the technique. This will be further
discussed in Section 5.12.
5.9.2. Generalised gamma proportional hazards model
The Generalised Gamma (GG) is a 3-parameter parametric model implemented in a
variety of statistical packages (Cox et al., 2007). However, it is parameterised as an
accelerated failure time model in Stata. I define the survival and density functions as
follows
S_{GG}(t) =
\begin{cases}
1 - I(\gamma, u) & \text{if } \kappa > 0 \\
1 - \Phi(z) & \text{if } \kappa = 0 \\
I(\gamma, u) & \text{if } \kappa < 0
\end{cases}
\quad (5.17)

and

f_{GG}(t) =
\begin{cases}
\dfrac{\gamma^{\gamma}}{\sigma t \sqrt{\gamma}\, \Gamma(\gamma)} \exp(z\sqrt{\gamma} - u) & \text{if } \kappa \neq 0 \\[1ex]
\dfrac{1}{\sigma t \sqrt{2\pi}} \exp(-z^{2}/2) & \text{if } \kappa = 0
\end{cases}
\quad (5.18)

where \gamma = |\kappa|^{-2}, z = \mathrm{sign}(\kappa)\{\log(t) - \mu\}/\sigma, u = \gamma \exp(|\kappa| z), \Phi(z) is the standard
normal cumulative distribution function, and I(a, x) is the incomplete gamma function.
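As a numeric sanity check of these definitions, note that the κ = 0 branch reduces to the log-normal distribution. The Python sketch below (with arbitrary illustrative values of µ and σ; not part of the thesis software) verifies that integrating the hazard f/S over an interval recovers the corresponding difference in log survival.

```python
import math

def surv_ln(t, mu, sigma):
    # kappa = 0 branch of (5.17): log-normal survival, 1 - Phi(z)
    z = (math.log(t) - mu) / sigma
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

def haz_ln(t, mu, sigma):
    # hazard = density / survival, using the kappa = 0 branch of (5.18)
    z = (math.log(t) - mu) / sigma
    f = math.exp(-z * z / 2) / (sigma * t * math.sqrt(2 * math.pi))
    return f / surv_ln(t, mu, sigma)

# trapezoidal integral of h over [t0, t1] should equal log S(t0) - log S(t1)
mu, sigma, t0, t1, n = 2.7, 1.2, 0.1, 5.0, 20000
ts = [t0 + (t1 - t0) * i / n for i in range(n + 1)]
H = sum((haz_ln(a, mu, sigma) + haz_ln(b, mu, sigma)) / 2 * (b - a)
        for a, b in zip(ts, ts[1:]))
print(H, math.log(surv_ln(t0, mu, sigma) / surv_ln(t1, mu, sigma)))
```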
Therefore, using Equation (2.6), I can write down the baseline hazard function as
the ratio of the probability density function to the survival function,

h_{GG}(t) = \frac{f_{GG}(t)}{S_{GG}(t)}

To invoke proportional hazards I can then simply multiply by the exponential of a
parameter, the linear predictor of which is our vector of covariates,

h_{GG}(t) = \frac{f_{GG}(t)}{S_{GG}(t)} \exp(X_i\beta)
\quad \text{or} \quad
\log(h_{GG}(t)) = \log\left(\frac{f_{GG}(t)}{S_{GG}(t)}\right) + X_i\beta

where \beta is a vector of log hazard ratios. In terms of implementation, I must specify
the nocons option in the linear predictor for the X_i\beta parameter to ensure there is no
intercept term, obtaining a proportional hazards formulation for the GG model.
I can build up the definition of this complex hazard function using Stata’s local
macros
. local mu [mu]

. local sigma exp([ln_sigma])

. local kappa [kappa]

. local gamma (abs(`kappa') :^ (-2))

. local z (sign(`kappa') :* (log(#t) :- `mu') :/ (`sigma'))

. local u ((`gamma') :* exp(abs(`kappa') :* (`z')))

. local surv1 (1 :- gammap(`gamma',`u')) :* (`kappa' :> 0)

. local surv2 (1 :- normal(`z')) :* (`kappa' :== 0)

. local surv3 gammap(`gamma',`u') :* (`kappa' :< 0)

. local pdf1 ((`gamma' :^ `gamma') :* exp(`z' :* sqrt(`gamma') :- `u') :/ ///
>     (`sigma' :* #t :* sqrt(`gamma') :* gamma(`gamma'))) :* (`kappa' :!= 0)

. local pdf2 (exp(-(`z' :^ 2) :/ 2) :/ (`sigma' :* #t :* sqrt(2 :* pi()))) ///
>     :* (`kappa' :== 0)

. local haz (`pdf1' :+ `pdf2') :/ (`surv1' :+ `surv2' :+ `surv3')
. stgenreg, hazard(exp([xb]) :* (`haz')) nodes(30) xb(dep5,nocons)

Log likelihood = -8801.2754                        Number of obs   =      9721

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
        dep5 |   .2694578   .0391992     6.87   0.000     .1926289    .3462868
-------------+----------------------------------------------------------------
kappa        |
       _cons |   .6752793   .0749985     9.00   0.000      .528285    .8222735
-------------+----------------------------------------------------------------
mu           |
       _cons |   2.710497    .032793    82.65   0.000     2.646224    2.774771
-------------+----------------------------------------------------------------
ln_sigma     |
       _cons |   .1727204   .0521935     3.31   0.001     .0704231    .2750178
------------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
I obtain a log hazard ratio for the most deprived compared to the least deprived of
0.269 (95% CI: 0.193, 0.346), in very close agreement with the Weibull model fitted in
Section 5.9.1. This is very much a conceptual example, but it illustrates just how
flexible the modelling framework is: by simply introducing an extra parameter I can
implement a model not available in any software package. As far as I am aware, a
proportional hazards extension of the generalised gamma model has not previously been
implemented.
5.9.3. Restricted cubic splines
I now introduce an alternative to the RP model, using restricted cubic splines but now
on the log hazard scale. I begin by assuming a proportional hazards model, modelling
the baseline log hazard function using restricted cubic splines
\log h_i(t) = \log h_0(t) + X_i\beta = s(\log(t) \mid \gamma, k_0) + X_i\beta \quad (5.19)
where Xi is a vector of baseline covariates with associated log hazard ratios β, and
s(log(t)|γ,k0) is a function of log(t) expanded into restricted cubic spline basis with
knot location vector, k0, and associated coefficient vector, γ.
Similarly to the RP model, in terms of knot locations, for the internal knots, I use
by default the centiles of the uncensored log survival times, and for the boundary knots
I use the minimum and maximum observed uncensored log survival times. Once more,
the choice of the number of spline terms (more spline terms allows greater flexibility),
is left to the user.
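The restricted spline basis itself is straightforward to write down. Below is a minimal Python sketch of an unorthogonalised restricted cubic spline basis in the Royston-Parmar form, offered as an illustration of the construction rather than a reimplementation of rcsgen; it checks that every basis column is linear beyond the boundary knots, the property exploited later in Section 5.10.1.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis: z_1(x) = x and, for each interior knot
    k_j, z(x) = (x-k_j)^3_+ - lam_j (x-k_min)^3_+ - (1-lam_j)(x-k_max)^3_+,
    with lam_j = (k_max - k_j)/(k_max - k_min)."""
    kmin, kmax = knots[0], knots[-1]
    cube = lambda u: max(u, 0.0) ** 3
    basis = [x]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        basis.append(cube(x - kj) - lam * cube(x - kmin) - (1 - lam) * cube(x - kmax))
    return basis

# beyond the boundary knots every column changes linearly in x
knots = [0.0, 1.0, 2.0, 3.0]
b4, b5, b6 = (rcs_basis(x, knots) for x in (4.0, 5.0, 6.0))
print([round(b5[j] - b4[j], 9) == round(b6[j] - b5[j], 9) for j in range(3)])
```

Below the first knot all truncated cubes vanish, so the basis reduces to the linear term alone; above the final knot the cubic and quadratic coefficients cancel by construction.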
Shorthand notation has been programmed within stgenreg to allow a very simple
syntax when fitting a restricted cubic spline based model, for example, to fit a model
on the log hazard scale with 5 spline parameters to model the baseline, I use the
#rcs(df()) notation as follows
. stgenreg, loghazard([xb]) xb(dep5 | #rcs(df(5))) nodes(30)
Variables _eq1_cp2_rcs1 to _eq1_cp2_rcs5 were created

initial:       log likelihood =  -39318.31
alternative:   log likelihood = -24641.326
rescale:       log likelihood = -9630.8486
Iteration 0:   log likelihood = -9630.8486
Iteration 1:   log likelihood = -8763.1988
Iteration 2:   log likelihood = -8756.2377
Iteration 3:   log likelihood = -8756.2213
Iteration 4:   log likelihood = -8756.2213
Log likelihood = -8756.2213 Number of obs = 9721
Coef. Std. Err. z P>|z| [95% Conf. Interval]
dep5 .2693634 .0392018 6.87 0.000 .1925293 .3461976
_eq1_cp2_rcs1 -.0621779 .0274602 -2.26 0.024 -.1159989 -.008357
_eq1_cp2_rcs2 .0784834 .0192975 4.07 0.000 .0406611 .1163057
_eq1_cp2_rcs3 .1158689 .0176746 6.56 0.000 .0812272 .1505106
_eq1_cp2_rcs4 -.0251518 .0143719 -1.75 0.080 -.0533202 .0030165
_eq1_cp2_rcs5 .0012793 .0134076 0.10 0.924 -.0249991 .0275576
_cons -2.910463 .0607005 -47.95 0.000 -3.029434 -2.791492
Quadrature method: Gauss-Legendre with 30 nodes
The model estimates a log hazard ratio for the most deprived compared to the least
deprived of 0.269 (95% CI: 0.193, 0.346), which I compare to that of 0.269 (95% CI:
0.193, 0.346) obtained from a proportional hazards Cox model, indicating extremely
good agreement between the parametric and semi-parametric approaches. The esti-
mated baseline hazard function from the spline-based model and associated 95% con-
fidence interval can be calculated by
. predict h1, hazard zeros ci
with the estimated function shown in Figure 5.2.
Figure 5.2 illustrates that the model predicts a highly complex underlying hazard
function, which clearly could not be captured by a simple parametric model such as
the Weibull.
5.9.3.1. Complex time-dependent effects
Continuing with the special case of using restricted cubic splines, I can incorporate
time-dependent effects into the model framework as follows
\log h_i(t) = s(\log(t) \mid \gamma_0, k_0) + X_i\beta + \sum_{p=1}^{P} x_{ip}\, s(\log(t) \mid \gamma_p, k_p) \quad (5.20)
[Figure: hazard rate (y-axis, 0.05-0.25) against follow-up time in years (x-axis, 0-5); baseline hazard rate with 95% confidence interval.]
Figure 5.2. England and Wales breast cancer dataset. Predicted hazard function for the most affluent group with 95% confidence interval.
where for the pth time-dependent effect, with p = 1, . . . , P , I have xp, the pth covariate,
multiplied by a spline function of log time, s(log(t)|γp,kp), with knot location vector,
kp, and coefficient vector, γp. Once again, degrees of freedom for each time-dependent
effect can be guided using model selection criteria, and/or the impact of different knot
locations assessed through sensitivity analysis. An advantage of modelling on the log
hazard scale is that when there are multiple time-dependent effects, the interpretation
of the time-dependent hazard ratios is simplified as they do not depend on values of
other covariates, which is the case when modelling on the cumulative hazard scale
(Royston and Lambert, 2011).
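This independence of the time-dependent hazard ratio from other covariates can be seen in a toy calculation. The sketch below uses arbitrary illustrative coefficients and collapses the spline of log time to a single linear term purely to keep the example short; the hazard ratio for x1 is unchanged across values of a second covariate x2.

```python
import math

def log_haz(t, x1, x2, b):
    # illustrative log hazard: baseline spline (reduced to one log-time term),
    # a proportional effect of x2, and a time-dependent effect of x1
    baseline = b["g0"] + b["g1"] * math.log(t)
    td_effect = x1 * (b["b1"] + b["d1"] * math.log(t))   # x1 * s(log t)
    return baseline + b["b2"] * x2 + td_effect

b = dict(g0=-2.9, g1=0.1, b1=0.2, d1=-0.15, b2=0.3)
t = 2.5
hr_given_x2_0 = math.exp(log_haz(t, 1, 0, b) - log_haz(t, 0, 0, b))
hr_given_x2_1 = math.exp(log_haz(t, 1, 1, b) - log_haz(t, 0, 1, b))
print(hr_given_x2_0, hr_given_x2_1)
```

On the log cumulative hazard scale the analogous ratio would involve the whole cumulative hazard, and hence the other covariates, which is the complication noted above.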
Returning to the England/Wales breast cancer data, I can investigate the presence
of a time-dependent effect due to deprivation status, by applying Equation (5.20). I
use 5 degrees of freedom to capture the baseline and use 3 degrees of freedom to model
the time-dependent effect of deprivation status, which can be fitted as follows
. stgenreg, loghazard([xb]) xb(dep5 | #rcs(df(5)) | dep5:*#rcs(df(3))) nodes(30)
Variables _eq1_cp2_rcs1 to _eq1_cp2_rcs5 were created
Variables _eq1_cp3_rcs1 to _eq1_cp3_rcs3 were created

initial:       log likelihood =  -39318.31
alternative:   log likelihood = -24641.326
rescale:       log likelihood = -9630.8486
Iteration 0:   log likelihood = -9630.8486
Iteration 1:   log likelihood = -8755.8859
Iteration 2:   log likelihood =  -8747.373
Iteration 3:   log likelihood = -8747.3275
Iteration 4:   log likelihood = -8747.3275
Log likelihood = -8747.3275 Number of obs = 9721
Coef. Std. Err. z P>|z| [95% Conf. Interval]
dep5 .0723415 .0924005 0.78 0.434 -.1087602 .2534433
_eq1_cp2_rcs1 -.0108058 .0309504 -0.35 0.727 -.0714673 .0498558
_eq1_cp2_rcs2 .0672877 .0224852 2.99 0.003 .0232177 .1113578
_eq1_cp2_rcs3 .1128672 .0207167 5.45 0.000 .0722634 .1534711
_eq1_cp2_rcs4 -.0261438 .0145455 -1.80 0.072 -.0546525 .002365
_eq1_cp2_rcs5 .0014202 .0134079 0.11 0.916 -.0248589 .0276992
_eq1_cp3_rcs1 -.1464002 .0443983 -3.30 0.001 -.2334194 -.0593811
_eq1_cp3_rcs2 .0425164 .0333753 1.27 0.203 -.022898 .1079307
_eq1_cp3_rcs3 .0135896 .0322604 0.42 0.674 -.0496396 .0768187
_cons -2.849318 .0649361 -43.88 0.000 -2.976591 -2.722046
Quadrature method: Gauss-Legendre with 30 nodes
Figure 5.3 shows the time-dependent hazard ratio, illustrating the decrease in the
effect of deprivation over time.
[Figure: hazard ratio (y-axis, 1-6) against follow-up time in years since diagnosis (x-axis, 0-5); time-dependent hazard ratio with 95% confidence interval.]
Figure 5.3. England and Wales breast cancer dataset. Time-dependent hazard ratio for deprivation status.
In Figure 5.4 I compare the fit of the models with either time-independent or
time-dependent hazard ratios for deprivation status, by overlaying the fitted survival
functions onto the Kaplan-Meier curve, for each deprivation group. It is clear from
Figure 5.4 that I obtain a much improved fit to the Kaplan-Meier curve when modelling
the time-dependent effect of deprivation group.
[Figure: two panels of survival (y-axis, 0.6-1.0) against follow-up time in years (x-axis, 0-5), labelled "Proportional hazards" (left) and "Non-proportional hazards" (right); Kaplan-Meier curves and stgenreg fitted survival shown for the affluent and deprived groups.]
Figure 5.4. England and Wales breast cancer dataset. Kaplan-Meier estimates for the most affluent and most deprived groups, with predicted survival overlaid. The figure on the left shows predicted survival with a proportional effect of deprivation status, with the figure on the right allowing for non-proportional hazards in the effect of deprivation status.
5.9.4. Time-varying covariates
Within a survival analysis framework, time-varying covariates occur when the value
of a covariate for individual patients can change at various points in follow-up. For
example in oncology clinical trials, patients will often switch treatment group when
their condition progresses (Morden et al., 2011), or biomarkers may be measured re-
peatedly over time, resulting in multiple records per subject (Wulfsohn and Tsiatis,
1997; Henderson et al., 2000). For this form of analysis the data is often set up into
start and stop times, and since delayed entry (left truncation) is allowed, this again
is incorporated into the described modelling framework. The analysis of repeatedly
measured biomarkers and survival data within a joint model framework is considered
in later chapters.
I now illustrate the data setup required for survival analysis incorporating a time-
varying covariate, usually referred to as multiple-record per subject. I return to the
liver cirrhosis dataset described above. Table 5.3 shows the observed data for 2 patients,
in multiple-records per subject structure.
Table 5.3. Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure.

ID    Prothrombin   Treatment    Enter time   Exit time   Event
1     38            placebo      0            0.244       0
1     31            placebo      0.244        0.381       0
1     27            placebo      0.381        0.413       1
111   59            prednisone   0            0.246       0
111   60            prednisone   0.246        0.498       0
111   87            prednisone   0.498        0.745       0
111   59            prednisone   0.745       1.128        0
111   35            prednisone   1.128       1.158        1
The ‘enter’ times refer to when the prothrombin index was measured, and the ‘exit’
time in the final row for each subject corresponds to that subject’s observed
event/censoring time. With the data set up in this form, any general parametric model
can then be applied; for example, I apply a model using restricted cubic splines for the
baseline, adjusting for treatment and the time-varying prothrombin index, using the following
. stgenreg, loghazard([xb]) xb(pro trt | #rcs(df(5))) nodes(30)
Results are presented in Table 5.4.
Table 5.4. Liver cirrhosis trial of prednisone treatment. Results from spline based proportional hazards model with time-varying prothrombin index.

Variable            log(HR)   Std. Err.   p-value        95% CI
Prothrombin index   -0.035    0.002       0.000     -0.040, -0.030
Treatment            0.133    0.118       0.262     -0.099,  0.364
I observe a log hazard ratio of -0.035 (95% CI: -0.040, -0.030), which equates to a
hazard ratio of 0.966 (95% CI: 0.961, 0.970) indicating lower values of the biomarker
are associated with an increased risk of death.
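The multiple-record structure of Table 5.3 can also be generated programmatically. The helper below is hypothetical (it is not part of stgenreg or strcs); it simply expands one subject's measurement times into start/stop rows, reproducing patient 1 of Table 5.3.

```python
def to_start_stop(measure_times, values, exit_time, event):
    """measure_times[j] is when values[j] was recorded; the final interval
    ends at the subject's observed event/censoring time."""
    bounds = list(measure_times) + [exit_time]
    return [(values[j], bounds[j], bounds[j + 1],
             int(event and j == len(values) - 1))   # event flag on last row only
            for j in range(len(values))]

rows = to_start_stop([0, 0.244, 0.381], [38, 31, 27], 0.413, True)
for r in rows:
    print(r)
```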
5.10. Improving the estimation procedure when using restricted cubic
splines
The very nature of the modelling framework described above implies that I can
specify practically any general function in the definition of the hazard or log hazard
function, given that it satisfies h(t) > 0 for all t > 0. As a development tool, this can
be seen as being very useful, in that novel models can often be fitted with a single
line of code. However, moving beyond the general framework and employing a more
model-specific implementation can provide a more efficient and better optimised
estimation routine. To illustrate this aspect, I concentrate on a particular flexible way of
modelling survival data, using restricted cubic splines (Durrleman and Simon, 1989)
on the log hazard scale, as opposed to the log cumulative hazard scale as implemented
in RP models, described in Section 2.8.2.
5.10.1. Improving estimation using a combined analytic/numeric ap-
proach
In the restricted cubic spline based models described above, the restricted nature of
the spline function forces the baseline log hazard function to be linear beyond the
boundary knots. In those areas the cumulative hazard function can actually be written
analytically, as the log hazard is a linear function of log time. Defining the boundary
knots to be k01, k0n, I need only conduct numerical integration between k01, k0n, using
the analytical form of the cumulative hazard function beyond the boundary knots.
I define δ0i and δ1i to be the intercept and slope of the log hazard function for the
ith patient before the first knot, k01, and φ0i and φ1i to be the intercept and slope of the
log hazard function for the ith patient beyond the final knot, k0n. If there are no time-
dependent effects then δ0i, δ1i, φ0i, φ1i are constant across patients. The cumulative
hazard function can then be defined in three components
H_i(t) = H_{1i}(t) + H_{2i}(t) + H_{3i}(t) \quad (5.21)
If I assume t_{0i} < k_{01} and t_i > k_{0n}, then before the first knot, I have

H_{1i}(t) = \frac{\exp(\delta_{0i})}{\delta_{1i} + 1} \left\{ \min(t_i, k_{01})^{\delta_{1i}+1} - t_{0i}^{\delta_{1i}+1} \right\} \quad (5.22)

and after the final knot,

H_{3i}(t) = \frac{\exp(\phi_{0i})}{\phi_{1i} + 1} \left\{ t_i^{\phi_{1i}+1} - \max(t_{0i}, k_{0n})^{\phi_{1i}+1} \right\} \quad (5.23)

and H_{2i}(t) becomes

H_{2i}(t) \approx \frac{k_{0n} - k_{01}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{k_{0n} - k_{01}}{2} z_j + \frac{k_{01} + k_{0n}}{2} \right) \quad (5.24)

where z_j is the jth quadrature node with associated weight, v_j. Alternatively, there
may be observations where k_{0n} > t_{0i} > k_{01} and t_i > k_{0n}; then

H_{1i}(t) = 0

H_{2i}(t) \approx \frac{k_{0n} - t_{0i}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{k_{0n} - t_{0i}}{2} z_j + \frac{t_{0i} + k_{0n}}{2} \right)

H_{3i}(t) = \frac{\exp(\phi_{0i})}{\phi_{1i} + 1} \left\{ t_i^{\phi_{1i}+1} - \max(t_{0i}, k_{0n})^{\phi_{1i}+1} \right\}

If t_{0i} < k_{01} and k_{01} < t_i < k_{0n}, then

H_{1i}(t) = \frac{\exp(\delta_{0i})}{\delta_{1i} + 1} \left\{ \min(t_i, k_{01})^{\delta_{1i}+1} - t_{0i}^{\delta_{1i}+1} \right\}

H_{2i}(t) \approx \frac{t_i - k_{01}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{t_i - k_{01}}{2} z_j + \frac{k_{01} + t_i}{2} \right)

H_{3i}(t) = 0

If k_{01} < t_{0i} < t_i < k_{0n}, then

H_{1i}(t) = 0

H_{2i}(t) \approx \frac{t_i - t_{0i}}{2} \sum_{j=1}^{m} v_j\, h_i\!\left( \frac{t_i - t_{0i}}{2} z_j + \frac{t_{0i} + t_i}{2} \right)

H_{3i}(t) = 0

If t_{0i} < t_i < k_{01}, then

H_{1i}(t) = \frac{\exp(\delta_{0i})}{\delta_{1i} + 1} \left\{ t_i^{\delta_{1i}+1} - t_{0i}^{\delta_{1i}+1} \right\}

H_{2i}(t) = 0

H_{3i}(t) = 0

Finally, if k_{0n} < t_{0i} < t_i, then

H_{1i}(t) = 0

H_{2i}(t) = 0

H_{3i}(t) = \frac{\exp(\phi_{0i})}{\phi_{1i} + 1} \left\{ t_i^{\phi_{1i}+1} - t_{0i}^{\phi_{1i}+1} \right\}
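The bookkeeping above can be sketched numerically. The Python fragment below uses a hypothetical hazard whose log is linear in log time outside the boundary knots (the structure the restricted splines impose; all coefficient values are illustrative, not estimates), evaluates the combined analytic-tails-plus-quadrature cumulative hazard, and checks it against brute-force quadrature over the whole interval.

```python
import numpy as np

# hypothetical log hazard: linear in log(t) before k1 and after kn,
# with a log-linear interpolation between the knots (illustrative values)
k1, kn = 0.5, 4.0
d0, d1 = -3.0, 0.4          # intercept/slope of log h before the first knot
p0, p1 = -2.6, 0.1          # intercept/slope of log h after the final knot
a1, a2 = d0 + d1 * np.log(k1), p0 + p1 * np.log(kn)

def log_h(t):
    mid = a1 + (a2 - a1) * (np.log(t) - np.log(k1)) / (np.log(kn) - np.log(k1))
    return np.where(t < k1, d0 + d1 * np.log(t),
                    np.where(t > kn, p0 + p1 * np.log(t), mid))

h = lambda t: np.exp(log_h(t))

def gauss_int(f, a, b, m=30):
    z, w = np.polynomial.legendre.leggauss(m)   # nodes/weights on [-1, 1]
    u = (b - a) / 2 * z + (a + b) / 2
    return (b - a) / 2 * np.sum(w * f(u))

def cumhaz_combined(t, m=30):
    # analytic tails (as in Equations 5.22 and 5.23, with t_0i = 0)
    # plus quadrature over [k1, kn] (Equation 5.24)
    H1 = np.exp(d0) / (d1 + 1) * min(t, k1) ** (d1 + 1)
    H2 = gauss_int(h, k1, min(t, kn), m) if t > k1 else 0.0
    H3 = np.exp(p0) / (p1 + 1) * (t ** (p1 + 1) - kn ** (p1 + 1)) if t > kn else 0.0
    return H1 + H2 + H3

print(cumhaz_combined(5.0), gauss_int(h, 0.0, 5.0, 500))
```

The combined version only ever integrates numerically over the smooth middle region, which is the source of the efficiency gain examined below.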
In this section I wish to establish the benefit of incorporating the analytic compo-
nents, described above, compared to the general method of only using numerical inte-
gration, described in Section 5.3. I use the general Stata software package, stgenreg,
described above, to fit the full quadrature based approach, and a newly developed Stata
package, strcs, which implements the combined analytic and numerical approach when
using splines on the log hazard scale. Returning to the England/Wales breast cancer
dataset, I apply the spline based models shown in Equation (5.19), with 5 degrees of
freedom (6 knots), i.e. 5 spline variables to capture the baseline, incorporating the pro-
portional effect of deprivation status, with an increasing number of quadrature points,
ranging from 10 to 1000.
Table 5.5. England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the fully numeric approach.

                                           Number of nodes
Parameter            10         20         30         40         50        100        250        500       1000
Most deprived    0.268560   0.269302   0.269363   0.269380   0.269386   0.269393   0.269395   0.269395   0.269395
                (0.039203) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202)
γ0              -2.916819  -2.912434  -2.910463  -2.909648  -2.909240  -2.908601  -2.908289  -2.908201  -2.908162
                (0.060860) (0.060749) (0.060701) (0.060682) (0.060673) (0.060659) (0.060651) (0.060648) (0.060647)
γ1              -0.085113  -0.066088  -0.062178  -0.060704  -0.059979  -0.058850  -0.058346  -0.058214  -0.058158
                (0.027644) (0.027508) (0.027460) (0.027442) (0.027432) (0.027416) (0.027408) (0.027405) (0.027404)
γ2               0.038085   0.072033   0.078483   0.080923   0.082146   0.084099   0.084980   0.085214   0.085314
                (0.019940) (0.019462) (0.019297) (0.019231) (0.019196) (0.019135) (0.019101) (0.019090) (0.019084)
γ3               0.147381   0.121891   0.115869   0.113473   0.112252   0.110276   0.109344   0.109088   0.108976
                (0.018258) (0.017899) (0.017675) (0.017569) (0.017509) (0.017398) (0.017333) (0.017311) (0.017299)
γ4              -0.040437  -0.027974  -0.025152  -0.024017  -0.023433  -0.022474  -0.022017  -0.021890  -0.021834
                (0.014469) (0.014429) (0.014372) (0.014343) (0.014327) (0.014296) (0.014277) (0.014270) (0.014267)
γ5               0.010185   0.003174   0.001279   0.000518   0.000133  -0.000481  -0.000775  -0.000857  -0.000893
                (0.013512) (0.013438) (0.013408) (0.013395) (0.013388) (0.013374) (0.013366) (0.013363) (0.013361)
log-likelihood -8739.9490 -8753.8333 -8756.2213 -8757.0858 -8757.5006 -8758.1249 -8758.3830 -8758.4444 -8758.4683

Standard errors in parentheses
Table 5.6. England and Wales breast cancer dataset. Comparison of estimates when using different numbers of nodes for the combined analytical/numeric approach.

                                           Number of nodes
Parameter            10         20         30         40         50        100        250        500       1000
Most deprived    0.269295   0.269376   0.269390   0.269393   0.269394   0.269395   0.269395   0.269395   0.269395
                (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202) (0.039202)
γ0              -2.906390  -2.908770  -2.908353  -2.908198  -2.908148  -2.908133  -2.908133  -2.908133  -2.908133
                (0.060656) (0.060663) (0.060650) (0.060648) (0.060647) (0.060647) (0.060647) (0.060647) (0.060647)
γ1              -0.061499  -0.059304  -0.058469  -0.058225  -0.058149  -0.058118  -0.058117  -0.058117  -0.058117
                (0.027397) (0.027411) (0.027405) (0.027404) (0.027404) (0.027403) (0.027403) (0.027403) (0.027403)
γ2               0.077581   0.083720   0.084902   0.085233   0.085337   0.085390   0.085390   0.085390   0.085390
                (0.019033) (0.019082) (0.019082) (0.019080) (0.019080) (0.019079) (0.019079) (0.019079) (0.019079)
γ3               0.112949   0.110410   0.109370   0.109043   0.108938   0.108889   0.108888   0.108888   0.108888
                (0.017117) (0.017279) (0.017291) (0.017290) (0.017289) (0.017288) (0.017288) (0.017288) (0.017288)
γ4              -0.024649  -0.022456  -0.021996  -0.021857  -0.021812  -0.021790  -0.021790  -0.021790  -0.021790
                (0.014188) (0.014258) (0.014263) (0.014263) (0.014263) (0.014263) (0.014263) (0.014263) (0.014263)
γ5              -0.000164  -0.000367  -0.000745  -0.000869  -0.000908  -0.000921  -0.000922  -0.000922  -0.000922
                (0.013428) (0.013363) (0.013360) (0.013360) (0.013360) (0.013360) (0.013360) (0.013360) (0.013360)
log-likelihood -8754.2660 -8757.6342 -8758.2559 -8758.4167 -8758.4634 -8758.4839 -8758.4840 -8758.4840 -8758.4840

Standard errors in parentheses
Table 5.5 presents parameter estimates and standard errors under the full numer-
ical approach, across varying number of quadrature nodes, and Table 5.6 presents the
equivalent results for the combined analytic/numeric approach. From Table 5.5, I still
observe variation in estimates and the log-likelihood to 5 or 6 decimal places between
500 and 1000 nodes, whilst for the combined approach shown in Table 5.6, the maxi-
mum difference between 100 and 1000 nodes is 0.000001. For the combined approach
the log-likelihood does not change to 3 decimal places between 100 and 1000 nodes,
whilst the log-likelihood for the full numerical approach is only the same to 1 decimal
place.
I found that the full numerical approach required 23 nodes and 50 nodes to establish
consistent estimates to 3 and 4 decimal places, respectively, compared with 18 and
27 nodes under the combined analytic and numerical approach. Final results for the
combined approach using 27 nodes are presented in Table 5.7.
Table 5.7. England and Wales breast cancer dataset. Results from combined analytic/numerical spline based survival model.

Variable        Hazard ratio        95% CI
Most deprived      1.309        1.212,  1.414

Baseline        Coefficient         95% CI
γ1                -0.059       -0.112, -0.005
γ2                 0.085        0.047,  0.122
γ3                 0.110        0.076,  0.143
γ4                -0.022       -0.050,  0.006
γ5                -0.001       -0.027,  0.025
Intercept         -2.908       -3.027, -2.789
From Table 5.7 I observe a statistically significant hazard ratio of 1.309 (95% CI:
1.212, 1.414), indicating an increased hazard rate in the most deprived group, compared
to the least deprived. Comparing computation time, the general approach with 49
quadrature nodes took 20.5 seconds on a HP laptop with Intel i5 2.5GHz processor and
8GB of RAM, compared to 17.5 seconds using the combined approach with 27 nodes.
For clarity, I can compare the estimated hazard ratio from the spline based approaches
to that obtained from a Weibull model, which gave an estimated hazard ratio of 1.310
(95% CI: 1.213, 1.414), indicating very close agreement.
5.11. Further analysis of example datasets using strcs
I now proceed with further illustration of the general framework; however, now
using the spline based model, on the log hazard scale, with the improved estimation
routine implemented in the Stata package strcs.
5.11.1. Excess mortality model
To illustrate the excess mortality model I use the same data source as in Section
5.10.1; the England and Wales breast cancer dataset. However, I now include women
aged over 50. Expected mortality is stratified by age, sex, calendar year, region and
deprivation quintile (Coleman et al., 1999). As for the previous analyses in this chapter,
I only include the least and most deprived groups for simplicity. Age is categorised into
5 groups: <50, 50-59, 60-69, 70-79, 80+. There are 41,645 subjects included in the
analysis.
5.11.1.1. Proportional excess hazards model
I initially fit a model where the excess mortality rate is assumed to be proportional be-
tween different covariate patterns. I compare the estimates to a model using restricted
cubic splines on the log cumulative hazard scale (Nelson et al., 2007). In both models
6 knots are used with these placed evenly according to the distribution of log death
times. Results are shown in Table 5.8.
From Table 5.8, I observe very similar hazard ratios and their 95% confidence
intervals between the models on different scales.
5.11.1.2. Time-dependent effects
A model is now fitted where the assumption of proportional excess hazards is relaxed
for all covariates. This is done by incorporating an interaction between each covariate
and a restricted cubic spline function of log time with 4 knots (3 degrees of freedom).
The knots are placed evenly according to the distribution of log death times. The
estimated excess hazard ratio for deprivation group can be seen in Figure 5.5. As there
Table 5.8. England and Wales breast cancer dataset. Comparison of excess hazard ratios (and 95% confidence intervals) from models with the linear predictor on the log hazard scale and the log cumulative hazard scale. Both models have 6 knots with these placed evenly according to the distribution of log death times.

Covariate         log hazard             log cumulative hazard
Most deprived     1.313 (1.265, 1.364)   1.313 (1.265, 1.364)
Age (50-59)       1.055 (0.998, 1.114)   1.055 (0.998, 1.114)
Age (60-69)       1.071 (1.014, 1.130)   1.071 (1.015, 1.131)
Age (70-79)       1.453 (1.372, 1.539)   1.454 (1.373, 1.540)
Age (80+)         2.647 (2.484, 2.822)   2.647 (2.484, 2.821)

Age <50 and least deprived are the reference groups
is no interaction between deprivation group and age group, this hazard ratio applies
to each of the 5 age groups. If the model were fitted on the log cumulative excess
hazard scale, this would not be the case. This is illustrated in Figure 5.6, where
the same linear predictor has been fitted for a model on the log cumulative excess
hazard scale; the estimated excess hazard ratio is shown for two age groups and
clearly differs between them.
5.11.2. Cluster robust standard errors
To illustrate the use of cluster robust standard errors combined with the numerical
integration approach, I use a dataset of 85 patients with bladder cancer (Prentice et al.,
1981; Therneau and Grambsch, 2000). I fit a model for recurrent event data, where the
event of interest is recurrence of bladder cancer. Each patient can experience a total of
4 events, shown in Table 5.9. A total of 112 events were observed. Covariates of interest
include treatment group (0 for placebo, 1 for thiotepa), initial number of tumors (range
1 to 8, with 8 meaning 8 or more), and initial size of tumors (in centimetres, with range
1 to 7).
Figure 5.5. England and Wales breast cancer dataset. Excess hazard ratio comparing most deprived with least deprived group. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Shaded area represents the 95% confidence interval.
Figure 5.6. England and Wales breast cancer dataset. Excess hazard ratios comparing most deprived with least deprived group, shown for the age <50 and age 70-79 groups. The model used 6 knots for the baseline and 4 knots for the time-dependent effect. Thicker lines are the estimated excess hazard ratios, with thinner lines the associated 95% confidence intervals.
To allow for the inherent hierarchical structure, i.e. events nested within patients,
I fit a parametric version of the Prentice-Williams-Peterson model, allowing for
cluster robust standard errors. This model uses non-overlapping time intervals;
thus, for example, a patient is not at risk of a second recurrence until after the
first has occurred.

Table 5.9. Bladder cancer dataset. Number of patients who were censored or experienced up to 4 recurrences of bladder cancer.

Recurrence number   Censored   Event   Total
1                   38         47      85
2                   17         29      46
3                   5          22      27
4                   6          14      20
I illustrate this by showing the data structure required:
. list id _t0 _t _d if id==14
id _t0 _t _d
21. 14 0 3 1
22. 14 3 9 1
23. 14 9 21 1
24. 14 21 23 0
The above shows the observed data for patient 14, who experienced three
recurrences and was at risk of a fourth. The variable _t0 indicates when the
patient becomes at risk of each event, _t indicates the recurrence time, and _d is
the event indicator.
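The construction of these non-overlapping risk intervals can be sketched in code. The function below is my own illustration (the thesis's analysis is in Stata), using patient 14's recurrence times from the listing above:

```python
def pwp_intervals(recurrence_times, end_time, end_event):
    """Build non-overlapping (enter, exit, event) intervals for a
    Prentice-Williams-Peterson model: a patient only becomes at risk
    of event k+1 once event k has occurred."""
    starts = [0] + list(recurrence_times)
    exits = list(recurrence_times) + [end_time]
    events = [1] * len(recurrence_times) + [end_event]
    return list(zip(starts, exits, events))

# Patient 14: recurrences at times 3, 9 and 21, censored at 23
print(pwp_intervals([3, 9, 21], 23, 0))
# → [(0, 3, 1), (3, 9, 1), (9, 21, 1), (21, 23, 0)]
```

The output reproduces the four records for patient 14 shown in the listing.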
The baseline hazard for each event is also allowed to vary, i.e. there is a stratification
factor by event. I use 5 knots for a shared baseline between the events, but allow
departures from this baseline using restricted cubic splines with 3 knots for each of the
subsequent events. For comparison, I also fit a Cox model, stratified by event number,
with cluster robust standard errors (Lin and Wei, 1989). Results are presented in Table
5.10.
From Table 5.10, I observe similar estimates from the spline based model, compared
to the Cox model with cluster robust standard errors. I can compare estimated baseline
hazard rates for each of the four ordered events from the spline based model, shown in
Figure 5.7. These are not directly estimated in the Cox model.
Table 5.10. Bladder cancer dataset. Results from spline based and Cox models with cluster robust standard errors.

           Spline hazard model                  Cox model
Variable   HR      Robust SE   95% CI           HR      Robust SE   95% CI
group      0.699   0.149       0.459, 1.063     0.716   0.148       0.478, 1.073
size       0.990   0.064       0.872, 1.123     0.992   0.061       0.878, 1.120
number     1.146   0.060       1.035, 1.269     1.127   0.058       1.018, 1.247
Figure 5.7. Bladder cancer dataset. Baseline hazard rates, with 95% confidence intervals, for the four ordered events.
It can be seen from Figure 5.7 that patients who go on to experience a third and
fourth event have a high initial hazard rate, suggesting that they are likely a
more severe subgroup.
5.12. Discussion
In this chapter I have presented a framework for the general, parametric analysis
of survival data, incorporating any combination of complex baseline hazard functions,
time-dependent effects, time-varying covariates, delayed entry (left truncation), robust
and cluster robust standard errors, and the extension to relative survival. Through
specification of a user-defined hazard function, I have illustrated how to implement
standard proportional hazards models, novel restricted cubic spline survival models
and a generalised gamma model with proportional hazards. In essence, this modelling
framework may be used to implement a parametric survival model defined by anything
from a very simple one parameter proportional hazards model, to models which contain
highly flexible functions of time, for both the baseline and time-dependent effects, and
also allowing for a hierarchical structure. Any parameter defined in the hazard function
can be dependent on complex functions of time, including fractional polynomials or
restricted cubic splines.
The choice of the number of quadrature nodes is left to the user. As in Chapter 4,
where simulated survival times should be checked for consistency by using an
increasing number of quadrature nodes, parameter estimates should likewise be
checked for stability under an increasing number of nodes.
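As an illustration of this node-stability check, the sketch below (my own, in Python rather than the Stata used for the analyses in this thesis) approximates a Weibull cumulative hazard by Gauss-Legendre quadrature and repeats the calculation with an increasing number of nodes:

```python
import numpy as np

def cumulative_hazard_gl(hazard, t, nodes=15):
    """Approximate H(t), the integral of h(u) over [0, t], by Gauss-Legendre
    quadrature, mapping the standard nodes on [-1, 1] onto [0, t]."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    u = 0.5 * t * (x + 1.0)
    return 0.5 * t * np.sum(w * hazard(u))

# Weibull hazard h(t) = lam * gam * t^(gam-1), with known H(t) = lam * t^gam
lam, gam = 0.1, 1.5
weibull_hazard = lambda t: lam * gam * t ** (gam - 1.0)

# Stability check: increase the number of nodes until the estimate settles
for n in (5, 15, 30):
    print(n, cumulative_hazard_gl(weibull_hazard, 5.0, nodes=n))
```

Because the Weibull cumulative hazard is known analytically, the quadrature approximation can be verified directly; for a user-defined hazard without a closed form, only the stability of successive estimates can be checked.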
As it is a general framework, it may not be the most computationally efficient;
however, it is a useful tool for the development of novel models. For example, it may
be useful to develop ideas and test new models, but then spend time developing more
computationally efficient methods for specific cases. In particular, when using restricted
cubic splines on the log hazard scale, I described how the general framework can be
optimised in special cases with respect to the estimation routine, utilising the restricted
nature of the splines to incorporate the analytic parts of the cumulative hazard function,
in combination with the Gaussian quadrature. This provided a much more efficient es-
timation process, requiring far fewer quadrature nodes to obtain consistent estimates,
providing computational benefits. In application to the England/Wales breast cancer
data, I showed that the general numerical approach requires a large number of quadra-
ture nodes, compared to the combined analytic/numeric approach, in order to obtain
consistent estimates. This is due to the numerical approach struggling to capture high
hazards at the beginning of follow-up time. Given that hazard ratios are usually only
reported to two/three decimal places, the large number of nodes used in Section 5.9
will often not be required. In further examples not shown, where the hazard is low at
the beginning of follow-up, often < 30 nodes are sufficient with the full numerical ap-
proach. I have chosen to use restricted cubic spline functions of log time, since in many
applications it has been found that this provides an equivalent or better fit, compared
to using splines of time. However, in studies with age as the timescale it may be more
appropriate to use spline functions of untransformed time.
I described in Section 5.7 that, in order to calculate standard errors and
subsequently confidence intervals of functions such as the cumulative hazard and
survival, the delta method combined with numerical quadrature can be used. I showed briefly in
Section 5.9.1 that there was excellent agreement between confidence intervals calculated
using this method, and the equivalent analytic method where numerical integration and
the delta method were not required. It would be useful to evaluate this approximation
in further detail through simulation, particularly to calculate coverage in estimates of
the survival function, as was conducted by Rutherford et al. (2014) within the context
of RP models.
Numerical quadrature such as Gauss-Legendre will be utilised in subsequent chap-
ters to estimate the cumulative hazard function within a joint model of longitudinal
and survival data, and furthermore, alternative forms of Gaussian quadrature will be
described and evaluated when calculating the joint likelihood.
CHAPTER 6
Joint modelling of longitudinal and survival data
6.1. Chapter outline
In this chapter I outline the foundations of joint modelling of longitudinal and
survival data, which has been a highly active area of biostatistical research in recent
years. I describe the motivation, model framework and estimation of joint models,
which form the basis of extensions and more in depth clinical analyses presented in
subsequent chapters. I was recently involved in a review of current methods and issues
within joint modelling, conducted by the Drug Information Association joint modelling
working group, published in Statistics in Medicine, which is included in Appendix C
(Gould et al., 2014).
6.2. Introduction
It is intrinsic to clinical practice that longitudinal and survival data are collected
in parallel. Biomarker levels such as blood pressure are recorded at more than one
time point as an indicator of patient status, as well as clinical event data such as time
to myocardial infarction, stroke or death. In many settings the two processes have
an underlying association, for example, repeated measurements of biomarkers such as
prostate specific antigen (PSA) have been used to predict recurrence of prostate cancer
(Proust-Lima and Taylor, 2009). Cluster of Differentiation 4 (CD4) counts have also
been used extensively to predict time to progression to Acquired Immune Deficiency
Syndrome (AIDS) in Human Immunodeficiency Virus (HIV) infected patients (Wang
and Taylor, 2001; Wolbers et al., 2010). The key question here is, how do changes in a
repeatedly measured biomarker profile impact the time to an event of interest?
Conversely, this problem can be viewed from a different perspective. Longitudinal
studies, which measure biomarkers repeatedly over time, are often affected by drop-out.
When this drop-out is associated with the underlying biomarker profile, then the miss-
ingness process can be considered missing not at random, and will affect conclusions
if ignored in the longitudinal analysis (McArdle et al., 2005). Naively removing such
patients that dropped-out through list-wise deletion can have severe consequences on
analyses. If this element of the data generating mechanism is combined with the in-
herent clustering structure of the longitudinal data, with measurements nested within
patients (and possible higher levels), then it requires a complex model framework to at-
tempt to disentangle the various elements of the generating biological processes (Diggle
et al., 2008).
Recent arguments have been made that much more importance should be placed on
attempting to evaluate patients’ perspectives in clinical research (Gould et al., 2014).
Within cancer or cardiovascular trials, such commonly available outcomes as quality
of life (QOL), measured repeatedly, are often considered as supplementary information
and not utilised to their full extent. The joint analysis of QOL and survival is a
particular area which can benefit from a joint modelling approach, where death can
act as an informative drop-out process, especially when the two processes are strongly
associated (Billingham and Abrams, 2002; Ibrahim et al., 2010).
Although the two scenarios described above can be approached with the same mod-
elling framework, in this thesis I concentrate on the first aspect; including a biomarker,
measured repeatedly on the same subject, and with error, in a time to event analysis.
6.3. Survival analysis with time-varying covariates
The occurrence of time-varying covariates within the survival analysis framework
was briefly covered in Chapters 2 and 5. Here I give a formal definition of exogenous
and endogenous time-varying covariates, following Kalbfleisch and Prentice (2002) and
Rizopoulos (2012).
6.3.1. Exogenous covariates
Let yi(t) represent the covariate at time t for the ith subject, with Yi(t) = {yi(s), 0 ≤ s < t} representing the entire covariate history up to time t. I can formally define an exogenous covariate, which must satisfy the following relation:

Pr{s ≤ Ti < s + ds | Ti ≥ s, Yi(s)} = Pr{s ≤ Ti < s + ds | Ti ≥ s, Yi(t)}   (6.1)
for all s, t such that 0 < s ≤ t, and ds → 0. Given this definition, yi(.) is associated
with the rate of event; however, note the independence between the occurrence of the
event at time s, and the future profile when t > s. Exogenous covariates can be thought
of as being external to the patients in a particular study; a common example is
the time of day.
6.3.2. Endogenous covariates
Endogenous covariates can often be considered to be patient specific biomarkers, in
that they are often recorded on the cohort of patients under study, i.e. they are internal
processes. Given this, endogenous covariates can only be measured when the patient
is alive, therefore when the event of interest is death, the trajectory of the internal
process can have direct information relating to the time that the failure occurs. This
can be formally defined as
Si(t|Yi(t)) = Pr(Ti > t|Yi(t)) = 1 (6.2)
and therefore by definition, the existence of a measurement at time t means the
patient is still alive. An inherent feature of endogenous covariates is that they
are typically measured with error, since they are generally biological
measurements made on particular patients.
Furthermore, they are often measured intermittently, i.e. measurements are only
recorded at specified measurement times. It is these features which must be accom-
modated in the analysis of endogenous covariates. Common examples include repeated
measures of lung function, systolic and diastolic blood pressure, and serum cholesterol.
6.3.3. Data set up for survival analysis with time-varying covariates
For illustration, a single time-varying covariate can be incorporated into the propor-
tional hazards framework as follows
hi(t|Xi, Yi(t)) = h0(t) exp(Xiβ + αyi(t)) (6.3)
where exp(α) represents the hazard ratio for a one-unit increase in the time-varying
covariate, yi(t), at time t. This model can be fitted in standard statistical software
using the enter and exit notation, first shown in Table 5.3, which I repeat here for ease
of exposition in Table 6.1.
Table 6.1. Liver cirrhosis trial of prednisone treatment. Example multiple-record per subject data structure.

ID    Prothrombin   Treatment    Enter time   Exit time   Event
1     38            placebo      0            0.244       0
1     31            placebo      0.244        0.381       0
1     27            placebo      0.381        0.413       1
111   59            prednisone   0            0.246       0
111   60            prednisone   0.246        0.498       0
111   87            prednisone   0.498        0.745       0
111   59            prednisone   0.745        1.128       0
111   35            prednisone   1.128        1.158       1
Using this data setup invokes the strict assumption that the time-varying
covariate changes only at observation times, with the value of the covariate held
constant following each measurement until a new measurement is taken. This is illustrated
in Figure 6.1. There are fundamental issues with this approach when modelling an en-
dogenous time-varying covariate. Firstly, the magnitude of the association between the
biomarker and the event will be underestimated due to measurement error (Prentice,
1982). This particular issue will be investigated in Chapter 9. Secondly, by assuming
a last value carried forward approach, I make the assumption that the time-varying
covariate values do not change between observations, which is highly likely to be
unrealistic for many processes. The resulting interpolations conducted between
possibly sparse recordings of the endogenous covariate serve to induce further
measurement error and dilute the estimate of association.

Figure 6.1. Liver cirrhosis trial of prednisone treatment. Time-varying covariate within a survival analysis: observed measurements recorded at follow-up visits, and the biomarker profile used in model fitting (last value carried forward).
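The construction of the multiple-record data in Table 6.1 under last value carried forward can be sketched as follows (my own illustration; the column names are hypothetical), reproducing the rows for patient 1:

```python
def lvcf_rows(times, values, event_time, event):
    """Expand one subject's intermittent measurements into start/stop rows,
    carrying each observed value forward until the next measurement."""
    rows = []
    for j, (t, y) in enumerate(zip(times, values)):
        exit_t = min(times[j + 1], event_time) if j + 1 < len(times) else event_time
        if exit_t <= t:
            break  # measurement at or after the end of follow-up
        d = event if exit_t == event_time else 0
        rows.append({"value": y, "enter": t, "exit": exit_t, "event": d})
    return rows

# Patient 1 from Table 6.1: prothrombin measured at 0, 0.244 and 0.381; event at 0.413
for row in lvcf_rows([0, 0.244, 0.381], [38, 31, 27], 0.413, 1):
    print(row)
```

Only the final interval carries the event indicator; all earlier intervals are, by construction, event-free.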
In the next Section I introduce a framework to address the inherent complications
when analysing an endogenous covariate within a survival analysis.
6.4. Formulating the standard joint model
In essence, a joint model consists of two component submodels: a model for the
trajectory of longitudinal measurements, and a model for the event occurrence, that
share one or more parameters. The form of joint model which has dominated the liter-
ature assumes that the association between the time-to-event and longitudinal marker
is characterised by shared random effects, and it is this approach which I adopt.
Building on the notation used in previous chapters and that of Rizopoulos (2012),
let Si be the true survival time of patient i = 1, . . . , n, and Ti = min(Si, Ci) the observed
survival time, with Ci the censoring time. Define an event indicator di, which takes
the value of 1 if Si ≤ Ci and 0 otherwise. Let yij = yi(tij), j = 1, . . . , ni denote the
observed longitudinal responses, with yi(tij) the jth observed longitudinal response of a
continuous biomarker for the ith patient taken at time tij . Each patient has ni repeated
measures. I further define a vector of time-independent baseline covariates, Ui.
6.4.1. Longitudinal submodel
The observed longitudinal responses, yij , are inherently measured intermittently, and
with error. To accommodate these features of the data, I can utilise the linear mixed
effects framework, to estimate the true underlying, and complete, subject-specific tra-
jectory function (Diggle et al., 2002). I therefore define
yi(t) = mi(t) + ei(t),   ei(t) ∼ N(0, σ²)   (6.4)

mi(t) = xiᵀ(t)β + ziᵀ(t)bi + uiᵀδ,   bi ∼ N(0, Σ)   (6.5)
with time-dependent design matrices xi(t) and zi(t) for the fixed β and random bi
effects, respectively. Furthermore, I also define a vector of time invariant baseline
covariates, ui ∈ Ui, and corresponding regression coefficients, δ. I assume the
measurement error, ei(t), is normally distributed with variance σ², is independent
of the random effects, and that cov(ei(t), ei(u)) = 0 for t ≠ u.
Through this formulation, mi(t) now represents the true unobserved longitudinal
values, essentially removing the measurement error at any time point t, not just
the times at which measurements were recorded, and tailored to individual patients
through the random effects. I can then relate the longitudinal trajectory (or
components of it, see Section 6.5) directly to the risk of event. Returning to the
prothrombin index example above, Figure 6.2 illustrates this concept, showing the
smooth estimated profile mi(t).
Flexibility in the longitudinal submodel can be incorporated through the use of
fractional polynomials of time, for example, which will often be sufficient to capture
the longitudinal trajectory (Royston and Altman, 1994). B-splines and restricted cubic
splines have also been used in the joint model context (Brown et al., 2005; Rizopoulos
and Ghosh, 2011). Restricted cubic splines will be used to capture variable longitudinal
profiles in Chapter 9.
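As a small numerical illustration of Equations (6.4) and (6.5), the sketch below evaluates a subject-specific trajectory mi(t) under a random intercept and slope model, and generates noisy observations from it (my own illustration; all parameter values are invented):

```python
import numpy as np

beta = np.array([75.0, -2.0])   # fixed effects: intercept and slope (hypothetical)
b_i = np.array([5.0, 0.8])      # subject i's random deviations, drawn from N(0, Sigma)
sigma = 4.0                     # residual (measurement error) standard deviation

def m_i(t):
    """True unobserved trajectory, Eq (6.5): x_i(t)' beta + z_i(t)' b_i.
    Here x_i(t) = z_i(t) = (1, t), so m_i(t) = 80 - 1.2 t for this subject."""
    design = np.array([1.0, t])
    return design @ (beta + b_i)

# Observed responses, Eq (6.4): y_i(t) = m_i(t) + e_i(t), e_i(t) ~ N(0, sigma^2)
rng = np.random.default_rng(1)
t_obs = np.array([0.0, 1.0, 2.5])
y_obs = np.array([m_i(t) for t in t_obs]) + rng.normal(0.0, sigma, size=3)
```

The random effects shift both the intercept and slope away from the population values, which is what tailors mi(t) to the individual patient.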
6.4.2. Survival submodel
The idea behind a joint model is to link the component processes together through
some shared parameters.

Figure 6.2. Time-varying covariate within a joint model: observed measurements recorded at follow-up visits, the biomarker profile used in model fitting (last value carried forward), and the estimated underlying biomarker profile from a joint model.

Given Equation (6.5), I can now incorporate mi(t), our true
unobserved patient specific longitudinal trajectory, to the risk of an event. I define
Mi(t) = {mi(s), 0 ≤ s ≤ t}, to be the true unobserved longitudinal profile up to
time t. I define the proportional hazards time-to-event submodel

h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + αmi(t)}   (6.6)
where h0(t) is the baseline hazard function, and vi ∈ Ui a set of baseline time-
independent covariates with associated vector of log hazard ratios, ψ. By including
the true unobserved trajectory function, mi(t), into the linear predictor of the pro-
portional hazards model, I now begin to link the component submodels to form the
joint modelling framework. Multiplying mi(t) by an association parameter, α, I can
interpret exp(α) as the hazard ratio for a one unit increase in mi(t), at time t. In this
formulation I assume the association is based on the current value of the longitudinal
response at time t. This, and other association structures, are described further in
Section 6.5.
The survival function follows naturally
S(t|Mi(t), vi) = exp( −∫_0^t h0(u) exp{ψᵀvi + αmi(u)} du )   (6.7)
Given this definition, I see that the survival function is dependent on the entire history
of the trajectory function, mi(t). The integral in Equation (6.7), due to the time-
dependent nature of the longitudinal process mi(t), often requires numerical integration
to evaluate, which complicates the estimation process. This will be described further
in Section 6.6.
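To make this concrete, the sketch below (my own illustration, with invented parameter values) evaluates Equation (6.7) by Gauss-Legendre quadrature for a Weibull baseline hazard and a linear subject-specific trajectory:

```python
import numpy as np

lam, gam = 0.1, 2.0            # Weibull baseline: h0(t) = lam * gam * t^(gam-1)
psi_v = 0.3                    # linear predictor contribution, psi' v_i
alpha = 0.25                   # association parameter
m = lambda t: 5.0 - 0.2 * t    # true subject-specific trajectory m_i(t)

def survival(t, nodes=30):
    """Evaluate S(t | M_i(t), v_i) of Eq (6.7): because the integrand depends
    on m_i(u), the cumulative hazard has no closed form in general."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    u = 0.5 * t * (x + 1.0)                       # map nodes from [-1, 1] to [0, t]
    h = lam * gam * u ** (gam - 1.0) * np.exp(psi_v + alpha * m(u))
    return np.exp(-0.5 * t * np.sum(w * h))

print(survival(1.0), survival(5.0))
```

Note that the whole trajectory up to t enters through the quadrature nodes, reflecting the dependence of the survival function on the history Mi(t).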
6.4.2.1. Choosing a baseline hazard function
The two fundamental papers that developed the joint modelling framework I have de-
scribed above, from a frequentist perspective, were that of Wulfsohn and Tsiatis (1997)
and Henderson et al. (2000). They both chose the Cox model as the survival submodel
of choice, which of course does not directly estimate the baseline hazard function (Cox,
1972; Wulfsohn and Tsiatis, 1997; Henderson et al., 2000). In general, this can be
seen as both a strength and a weakness. By leaving the baseline hazard function
unspecified, I avoid making any assumptions about the underlying functional form.
Conversely, as I have described in previous chapters, if interest lies in absolute
measures of risk, such as predicting the outcome for a particular patient, which
is very often precisely what is wanted, then it is much more convenient to work
within a parametric framework.
In particular, Hsieh et al. (2006) showed that incorporating the semi-parametric
Cox model into the joint model framework results in underestimation of the
parameter standard errors. Consequently, bootstrapping is required to obtain appropriate
standard errors. Given that computationally intense numerical integration is required
to fit the models (described in detail in Section 6.6), this is an undesirable aspect of
this particular joint model framework.
This issue is avoided when taking a parametric approach. Such standard para-
metric choices include the exponential, Weibull and Gompertz distributions. These
forms of survival model were described in Section 2.6 of Chapter 2. Of course, as
described in Chapters 4 and 5, assuming a standard survival distribution can be con-
sidered restrictive in the underlying baseline hazard functions that can be accurately
captured. Incorporating more flexibility into the survival submodel choice is evaluated
and discussed further in Chapters 7 and 8.
6.5. Alternative association structures
The standard joint model framework, described above, relates the true unobserved
longitudinal profile at time t, directly to the risk of event at time t. This is often referred
to as the current value parameterisation. Exploring alternative, clinically meaningful,
ways of linking the two processes expands the usefulness of the joint modelling frame-
work.
In this Section I describe various alternatives to the current value parameterisation,
in particular, concentrating on association structures that are available within the Stata
package stjm, developed as part of this thesis (described in detail in Chapter 10).
6.5.1. Interaction effects
The standard joint model association structure, seen in Equation (6.6), assumes the
same association between the true longitudinal value and the risk of event for all
patients. In some cases this is an unrealistic assumption, and it may be of
interest to allow different values of association for different patient subgroups.
This can be achieved by forming interactions between baseline covariates
and the true unobserved longitudinal trajectory function, as follows
h(t|Mi(t), vi1, vi2) = h0(t) exp[ψᵀvi1 + αᵀ{vi2 × mi(t)}]   (6.8)
where vi1,vi2 ∈ Ui. As in Equation (6.6), I have a vector of baseline covariates, vi1,
but now introduce a vector of interaction covariates, vi2, which is interacted with the
true longitudinal profile, mi(t). This then provides a vector of association parameters
α, providing different associations for different covariate patterns. If vi2 = 1, then
Equation (6.8) reduces to the standard current value parameterisation in Equation
(6.6).
6.5.2. Time-dependent slope
The association structures described above are parameterised to link the current value
of the true longitudinal response. Given that I am modelling the longitudinal outcome
over time, a natural extension to the association structures is to link the rate of change,
or slope,
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α1mi(t) + α2m′i(t)}   (6.9)

with

m′i(t) = dmi(t)/dt = d{xiᵀ(t)β + ziᵀ(t)bi}/dt   (6.10)
Inclusion of this further association structure, from a clinical perspective, could be
important where interest not only lies in the current values of, say, a biomarker level,
but also how the rate of change of the biomarker is associated with the risk of event. For
example, it may be of interest to know not only the effect that the level of the biomarker
has, but whether it is on an increasing or decreasing trend, at time t. This association
structure was incorporated into a two-stage regression calibration joint model by Ye
et al. (2008). Assessment of the added benefit of including the rate of change of CD4
trajectories within a joint model framework to model the risk of progression to AIDS
or death in HIV-positive patients was conducted by Wolbers et al. (2010).
6.5.3. Random effects parameterisation
Finally, I define a time-independent association structure, which includes only the
random effects in the linear predictor of the survival submodel
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + αᵀ(β + bi)}   (6.11)
Equation (6.11) includes both the population level mean of the random effect, β, plus
the subject specific deviation, bi. Alternatively, I can include only the subject specific
deviation, as follows
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + αᵀbi}   (6.12)
The association parameters in Equations (6.11) and (6.12) differ in their interpretation.
For example, consider a random intercept and random slope longitudinal submodel
mi(t) = (β0 + b0i) + (β1 + b1i)t (6.13)
Under Equation (6.11), I can investigate the association between patient specific values
of the true longitudinal outcome at t = 0 using
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α1(β0 + b0i)}   (6.14)
where exp(α1) is the hazard ratio for a one unit increase in the baseline value of the
longitudinal outcome i.e. the intercept. This particular association structure will be
discussed in more detail in Chapter 9. Alternatively, using Equation (6.12)
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α2b0i}   (6.15)
where now the association is assumed to be based only on the subject specific deviation
from the population mean intercept.
I also draw attention to an equivalence which can exist between association
structures under certain simple longitudinal models. For example, using the
trajectory model in Equation (6.13), the rate of change association structure
described in Section 6.5.2 is equivalent to a time-independent association
structure, as follows
h(t|Mi(t), vi) = h0(t) exp{ψᵀvi + α3 dmi(t)/dt} = h0(t) exp{ψᵀvi + α3(β1 + b1i)}
where exp(α3) now represents the hazard ratio for a one unit increase in the subject
specific slope of the true longitudinal outcome.
It is important to note, however, that interpretation under the time-independent
associations must be undertaken with caution. When a complex longitudinal trajectory
function is chosen for the random effects, such as fractional polynomials or splines, then
linking the random coefficients of such spline terms, for example, does not provide a
clinically interpretable association parameter.
From a computational perspective, however, the time-independent association struc-
tures are particularly useful, as the cumulative hazard function (and consequently the
survival function) required in the estimation process often requires numerical integra-
tion under commonly used log hazard scale parametric survival submodels and time-
dependent association structures. By using a time-independent association structure,
this numerical integration can be avoided as the cumulative hazard function has a
closed form, providing direct computational benefits. This is described in more detail
in the next section.
On a final note, incorporating more than one association structure into the linear
predictor can allow investigation of the effect of various aspects of the longitudinal
trajectory, such as in Equation (6.9) (Henderson et al., 2000). For example, combining
the current value parameterisation with the random intercept association can provide
an association framework which can allow for the time-dependent changes in the lon-
gitudinal response, whilst incorporating the often highly predictive component of the
baseline values.
6.6. Estimation
The estimation of a joint longitudinal and survival model is not a trivial task.
Motivated by the difficulty in estimation, and lack of available software, early work in
the field of joint modelling adopted a two-stage approach, in efforts to improve on a
standard survival analysis whereby the observed values of a time-varying covariate were
used, as described in Section 6.3.3. The two-stage approach first fitted a longitudinal
mixed effects model to the observed repeated measures, and then obtained subject-
specific predictions for the longitudinal marker, including empirical Bayes estimates of
the random effects, and subsequently used the predictions as a time-varying covariate
within a survival model (Self and Pawitan, 1992). The appeal of this approach is
in its general simplicity, in that it can be implemented in any standard statistical
software. However, many authors such as Tsiatis and Davidian (2001) and Sweeting
and Thompson (2011) have shown that although this method offers an improvement in
terms of bias compared to using the observed marker values, this two-stage approach
still produces substantial bias and poor coverage in a range of scenarios. In particular,
Sweeting and Thompson (2011) noted that any informative drop-out is ignored, and
the uncertainty in the estimation of the maximum likelihood estimates and empirical
Bayes contributions are ignored in the survival model, producing estimates that are too
precise.
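For intuition, stage one of the two-stage approach can be sketched for the simplest case of a random intercept model, y_ij = beta0 + b_i + e_ij, with known variance components. The empirical Bayes (BLUP) estimate of b_i shrinks the subject's raw mean deviation toward the population value; the function below is my own illustration, not code from the cited papers:

```python
def eb_intercept(y_i, beta0, sigma2, tau2):
    """Empirical Bayes (BLUP) estimate of the random intercept b_i, given
    subject i's observations y_i, residual variance sigma2 and random
    intercept variance tau2. The shrinkage factor n*tau2 / (n*tau2 + sigma2)
    tends to 1 as the number of observations (or tau2) grows."""
    n = len(y_i)
    shrink = n * tau2 / (n * tau2 + sigma2)
    return shrink * (sum(y_i) / n - beta0)

# Three measurements at 2.0 around a population intercept of 1.0:
# the raw deviation of 1.0 is shrunk toward zero
print(eb_intercept([2.0, 2.0, 2.0], beta0=1.0, sigma2=1.0, tau2=1.0))
# → 0.75
```

Treating these shrunken predictions as known, error-free covariate values in the second-stage survival model is precisely what ignores the estimation uncertainty noted above.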
Consequently, much of the recent research has concentrated on full likelihood ap-
proaches, where the longitudinal and survival processes are estimated simultaneously
within a joint framework, and it is this approach which I take in this thesis. Other
alternatives include a Bayesian approach, using Markov Chain Monte Carlo methods
(Faucett and Thomas, 1996; Brown and Ibrahim, 2003a), which is appealing as it avoids
the need for numerical integration to integrate out the random effects (as will be
described in subsequent sections under a likelihood approach), although it can
still be highly computationally intensive.
6.6.1. Deriving the joint likelihood
I now define the log-likelihood under the full joint model approach, for the joint dis-
tribution of Ti, di, yi, the survival time, event indicator, and continuous longitudinal
response, respectively. I define a vector of time-independent random effects, bi which
serve to characterise the longitudinal and survival processes. I further define an overall
parameter vector θ = {θt, θy, θb}, where θt is the parameter vector for the
survival process, θy is the vector of parameters for the longitudinal process, and
θb contains the elements of the variance-covariance matrix of the random effects.
The fundamental assumption of this joint model framework is that the random
effects account both for the association between the survival and longitudinal
processes and for the correlation between the repeated measurements of the
longitudinal outcome. Following Rizopoulos (2012), this can be expressed as
p(Ti, di,yi|bi;θ) = p(Ti, di|bi;θ)p(yi|bi;θ) (6.16)
p(yi|bi;θ) = ∏_{j=1}^{ni} p{yi(tij)|bi;θ} (6.17)
where j indexes the longitudinal measurements. Furthermore, I make the common
assumptions of independent censoring, and non-informative times of measurement.
Therefore, the log-likelihood for the ith patient can be written as
log p(Ti, di,yi;θ) = log ∫ p(Ti, di,yi, bi;θ) dbi
= log ∫ p(Ti, di|bi;θt) [∏_{j=1}^{ni} p{yi(tij)|bi;θy}] p(bi;θb) dbi (6.18)
where
p(Ti, di|bi;θt) = hi(Ti|Mi(Ti);θt)^di × Si(Ti|Mi(Ti);θt)
= [h0(Ti) exp{ψ^T vi + αmi(Ti)}]^di × exp( −∫_0^{Ti} h0(u) exp{ψ^T vi + αmi(u)} du ) (6.19)

p{yi(tij)|bi;θy} = (2πσ²)^{−1/2} exp( −[yi(tij) − mi(tij)]² / 2σ² ) (6.20)

and

p(bi;θb) = (2π)^{−qb/2} |Σ|^{−1/2} exp( −bi^T Σ^{−1} bi / 2 ) (6.21)
with a continuous longitudinal response, yi, assuming conditional normality, and nor-
mally distributed random effects with variance-covariance matrix Σ, where qb is the
dimension of the random effects.
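To make these components concrete, the following sketch evaluates the conditional likelihood contribution of Equations (6.19)-(6.20) for a single subject, given their random effects. The Weibull baseline hazard h0(t) = λγt^{γ−1}, the linear trajectory mi(t) = b0 + b1 t, the scalar linear predictor psi_v, and all numerical values are illustrative assumptions, not part of the framework itself; the cumulative hazard integral is approximated with Gauss-Legendre quadrature.

```python
import numpy as np

def cond_loglik_i(T, d, y, t_obs, b, theta):
    """Conditional log-likelihood contribution for one subject, following
    Equations (6.19)-(6.20). Assumes (illustratively) a Weibull baseline
    hazard h0(t) = lam*gam*t^(gam-1), a linear trajectory m_i(t) = b0 + b1*t,
    and a scalar linear predictor psi_v = psi'v_i."""
    lam, gam, psi_v, alpha, sigma = theta
    b0, b1 = b
    m = lambda t: b0 + b1 * t
    # hazard at the event/censoring time T
    log_haz_T = np.log(lam * gam) + (gam - 1) * np.log(T) + psi_v + alpha * m(T)
    # cumulative hazard via 30-point Gauss-Legendre quadrature on [0, T]
    nodes, wts = np.polynomial.legendre.leggauss(30)
    u = 0.5 * T * (nodes + 1.0)
    cumhaz = 0.5 * T * np.sum(
        wts * lam * gam * u ** (gam - 1) * np.exp(psi_v + alpha * m(u)))
    log_surv = d * log_haz_T - cumhaz
    # longitudinal component, Equation (6.20)
    resid = y - m(t_obs)
    log_long = np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                      - resid ** 2 / (2 * sigma ** 2))
    return log_surv + log_long
```

Setting α = 0 gives the closed-form cumulative hazard λT^γ exp(ψ^T vi), which provides a convenient check on the numerical integration.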
The estimation of full likelihood joint models has almost exclusively been through
(semi-parametric) maximum likelihood, implemented using Expectation-Maximisation
(EM) algorithms, where in the expectation step the unknown random effects are treated
as missing values (Wulfsohn and Tsiatis, 1997; Henderson et al., 2000). Furthermore,
many authors have proposed a Bayesian approach to estimation, through Markov Chain
Monte Carlo (MCMC) techniques (Brown and Ibrahim, 2003a,b; Wang and Taylor,
2001). In addition, Tsiatis and Davidian (2001) have proposed a conditional score
approach, which makes no distributional assumption on the random effects.
Alternatively, within the maximum likelihood framework, estimation can be con-
ducted via a direct maximisation of the observed data log-likelihood using standard
maximisation techniques such as the Newton-Raphson algorithm. It is this approach
which I adopt to fit the models. As has been discussed in Rizopoulos et al. (2009), the
score equations can be calculated; however, as with the log-likelihood, numerical integration is required to compute them. Within a generalised linear mixed effects model context, Lesaffre and Spiessens (2001) have shown that the integrals required for such derivatives are often more poorly approximated by quadrature than are the numerical estimates obtained using finite differences.
The important aspect to note from Equation (6.18) is that the (possibly multi-dimensional) integral is analytically intractable, and therefore requires numerical integration techniques to evaluate, and hence maximise, the likelihood. In the next section I describe methods to evaluate the likelihood when integrating over the random effects.
Furthermore, under a parametric survival submodel, the integral in Equation (6.19)
requires numerical integration to evaluate it due to the time-dependency of the longi-
tudinal submodel. Evaluation of this form of integral was described in Chapters 4 and
5. It is clear that the requirement of numerical integration to calculate the survival function, nested within (possibly multi-dimensional) numerical integration over the random effects, makes this a computationally demanding and challenging task.
6.6.2. Numerical integration over the infinite domain: Gauss-Hermite
quadrature
As described in earlier chapters, Gaussian quadrature is a numerical method to ap-
proximate analytically intractable integrals. In the case of the joint likelihood shown
in Equation (6.18), the (possibly multi-dimensional) integral over the infinite domain
must be evaluated using such techniques. In this Section I describe simple and adaptive
Gauss-Hermite quadrature, used to approximate the log-likelihood seen in Equation
(6.18), with my exposition based on Naylor and Smith (1982) and Tuerlinckx et al.
(2006).
6.6.2.1. Simple Gauss-Hermite quadrature
Gauss-Hermite quadrature is used to evaluate analytically intractable integrals of the
form (Naylor and Smith, 1982)
∫_{−∞}^{∞} e^{−x²} f(x) dx ≈ Σ_{q=1}^{m} wq f(xq) (6.22)
where xq and wq are the quadrature nodes and weights, with
wq = 2^{m−1} m! √π / ( m² [H_{m−1}(xq)]² ) (6.23)
and xq is the qth root of the Hermite polynomial Hm(x). The theoretical basis for Gauss-Hermite quadrature states that, with m nodes, the quadrature formulation will be exact if f(x) is a polynomial of degree at most 2m − 1. Following Naylor and Smith (1982)
and Tuerlinckx et al. (2006), if I replace the weighting function e^{−x²} in Equation (6.22) with a normal density, φ(·), with mean µ and standard deviation σ, I have

∫_{−∞}^{∞} f(x) φ(x|µ, σ²) dx = (1/(σ√(2π))) ∫_{−∞}^{∞} f(x) exp[ −(x − µ)² / 2σ² ] dx (6.24)
Then undertaking a change of variable, where x = µ + σ√2 r, and correspondingly r = (x − µ)/(σ√2), I have

∫_{−∞}^{∞} f(x) φ(x|µ, σ²) dx = (σ√2/(σ√(2π))) ∫_{−∞}^{∞} f(µ + σ√2 r) e^{−r²} dr (6.25)
≈ Σ_{q=1}^{m} f(µ + σ√2 xq) wq/√π (6.26)

which corresponds to a quadrature evaluation based on the normal kernel, with nodes µ + σ√2 xq, and weights wq/√π. If I use the standard normal kernel, then I have nodes, dq = √2 xq, and weights, vq = wq/√π.
So far I have considered the univariate case, where I wish to integrate over a single dimension, i.e. a single random effect. If I wish to move to the multivariate case, e.g. random intercept and slope, I can extend the univariate case to an integration over a multivariate normal distribution of dimension Q. I can therefore define the vector of nodes
for the multivariate standard normal as d_{q1,...,qQ} = (dq1, . . . , dqQ). However, this must be pre-multiplied by the Choleski decomposition of the variance-covariance matrix of the random effects, Σ^{1/2}. To illustrate the multivariate case, I present Figures 6.3 and 6.4. In Figure 6.3 I have the 2-dimensional basis node locations, d_{q1,q2} = (dq1, dq2), for 9-point Gauss-Hermite quadrature.

[Figure 6.3. Basis node locations for 2-dimensional 9-point simple Gauss-Hermite quadrature; axes show the quadrature points for, e.g., a random intercept and a random slope.]

These basis node locations are then pre-multiplied
by the Choleski decomposition of the random effects variance-covariance matrix. In
Figure 6.4, I have multiplied the basis node locations by Σ^{1/2} = (1, 0.5; 0.5, 1), to represent
positive correlation between the two random effects. The log-likelihood is then evalu-
ated at each of the ‘co-ordinates’, multiplied by the appropriate weights and summed
across dimensions.
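The construction of the transformed 2-dimensional grid can be sketched as follows, with the illustrative variance-covariance matrix used for Figure 6.4. As a check, the quadrature exactly recovers the covariance between the two random effects.

```python
import numpy as np

m = 9
x_q, w_q = np.polynomial.hermite.hermgauss(m)
d = np.sqrt(2.0) * x_q                     # standard normal basis nodes
v = w_q / np.sqrt(np.pi)                   # standard normal weights
D1, D2 = np.meshgrid(d, d, indexing="ij")  # 81 basis 'co-ordinates'
grid = np.stack([D1.ravel(), D2.ravel()])  # shape (2, 81)
V = np.outer(v, v).ravel()                 # product weights v_{q1} v_{q2}

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])  # illustrative Sigma, as in Fig. 6.4
L = np.linalg.cholesky(Sigma)               # lower-triangular Choleski factor
b = L @ grid                                # transformed node locations

# Check: quadrature recovers Cov(b1, b2) = 0.5 under N(0, Sigma)
cov12 = np.sum(V * b[0] * b[1])
print(cov12)
```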
Therefore, I can now write the log-likelihood seen in Equation (6.18), as
log p(Ti, di,yi;θ) = log ∫ p(Ti, di|bi;θt) [∏_{j=1}^{ni} p{yi(tij)|bi;θy}] p(bi;θb) dbi
[Figure 6.4. Basis node locations for 2-dimensional 9-point simple Gauss-Hermite quadrature, transformed by the Choleski decomposition of the variance-covariance matrix; axes show the quadrature points for, e.g., a random intercept and a random slope.]
≈ log Σ_{b1=1}^{m} · · · Σ_{bQ=1}^{m} [ p(Ti, di|Σ^{1/2} d_{b1,...,bQ};θt) {∏_{j=1}^{ni} p{yi(tij)|Σ^{1/2} d_{b1,...,bQ};θy}} × v_{b1} . . . v_{bQ} ] (6.27)
The approximation to the log-likelihood seen in Equation (6.27) depends on the number of quadrature points, m, and consequently on the closeness of the objective function to a polynomial of degree 2m − 1. Standard practice within the generalised linear mixed modelling framework, when utilising numerical quadrature, is to fit models with an increasing number of quadrature points, until the difference between estimates when using m and m + 1 nodes is sufficiently small. It is also important to note that the computational burden of such quadrature evaluations grows substantially with an increasing number of nodes, and grows exponentially with each increase in dimension, Q.
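This node-increasing practice can be illustrated on a simple non-polynomial Gaussian integral, E[exp(X)] for X ∼ N(0, 1), whose exact value is exp(1/2); the choice of integrand is illustrative only, standing in for a likelihood contribution.

```python
import numpy as np

def gh_mean_exp(m):
    """m-node Gauss-Hermite approximation to E[exp(X)], X ~ N(0, 1)."""
    x_q, w_q = np.polynomial.hermite.hermgauss(m)
    return np.sum(w_q / np.sqrt(np.pi) * np.exp(np.sqrt(2.0) * x_q))

exact = np.exp(0.5)
for m in (3, 5, 9, 15):
    # the error shrinks rapidly as m grows, until successive fits stabilise
    print(m, abs(gh_mean_exp(m) - exact))
```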
Under simple Gauss-Hermite quadrature, the same node locations are used for
each subject, centred at the zero vector and scaled by the Choleski decomposition of
the estimated variance-covariance matrix of the random effects. Clearly, this will not
place the nodes in the optimum locations of the likelihood function for each subject.
If the between-subject variance is large then it is likely that the use of common node
locations will miss key information from the likelihood function for specific subjects,
resulting in an inefficient algorithm.
6.6.2.2. Adaptive Gauss-Hermite quadrature
To improve on the simple Gauss-Hermite (GH) quadrature described above, Pinheiro
and Bates (1995) proposed an adaptive quadrature routine, which allows subject-specific
centring and scaling of the quadrature nodes, in an attempt to place the nodes at the
most appropriate positions of the integral function for each subject.
This is achieved by considering an alternative kernel distribution, that of φ(bi|b̂i, Ω̂i), as follows

log p(Ti, di,yi;θ) = log ∫ p(Ti, di|bi;θt) [∏_{j=1}^{ni} p{yi(tij)|bi;θy}] p(bi;θb) dbi
= log ∫ [ p(Ti, di|bi;θt) {∏_{j=1}^{ni} p{yi(tij)|bi;θy}} p(bi;θb) / φ(bi|b̂i, Ω̂i) ] φ(bi|b̂i, Ω̂i) dbi (6.28)
Equation (6.28) can then be evaluated by GH quadrature with this alternative normal kernel density, with the nodes appropriately transformed using ri = b̂i + Ω̂i^{1/2} d_{b1,...,bQ}. This gives

log p(Ti, di,yi;θ) (6.29)
≈ log [ (2π)^{Q/2} |Ω̂i|^{1/2} Σ_{b1=1}^{m} · · · Σ_{bQ=1}^{m} p(Ti, di|ri;θt) {∏_{j=1}^{ni} p{yi(tij)|ri;θy}} × φ(ri|0,Σ) exp( d^T_{b1,...,bQ} d_{b1,...,bQ} / 2 ) ∏_{q=1}^{Q} v_{bq} ] (6.30)
I illustrate the adaptive approach conceptually in Figure 6.5. In Figure 6.5A I have a
normal density with population mean, µ = 0, and standard deviation, σ, to represent
a random intercept for example, that I wish to integrate out. Overlaid are the nodes
for 5-point simple GH quadrature, where the height of the vertical bars represents the
size of the weight applied at that node location. They will be scaled by the current
estimate of population standard deviation of the random effect, σ. As described above,
if the variation between subjects is moderate then it is highly likely that the original
node locations will fail to capture important contributions to the likelihood of some
patients, as shown in Figure 6.5B. The adaptive nature of GH quadrature allows us,
between each iteration of the estimation process, to estimate how each subject differs
from the population. Figure 6.5C illustrates this ‘shift’ for 2 patients. In Figure 6.5D I can use the estimate, b̂1, to shift the original nodes, whilst in Figure 6.5E, I can then scale them using the estimated standard error of b̂1, to locate the quadrature nodes in the optimum location for subject 1. Further details on the implementation in Stata
can be found in Chapter 10.
This process of course extends to the multivariate case. The major benefit of this approach, from a computational efficiency point of view, is a substantial reduction in computation time, because far fewer quadrature nodes are generally required to obtain accurate estimates and standard errors, compared to the use of simple Gauss-Hermite quadrature. This will be explored through simulation in Chapter 7.
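The effect of recentring and rescaling can also be sketched numerically. Below, a hypothetical subject-level likelihood contribution, sharply peaked well away from the population mean, is integrated against the standard normal random effect density: simple GH with 5 nodes centred at zero misses essentially all of the mass, whereas recentring at the mode of the log integrand and rescaling by its curvature recovers the exact answer with the same 5 nodes. All numerical values are illustrative.

```python
import numpy as np

m = 5
x_q, w_q = np.polynomial.hermite.hermgauss(m)

# Hypothetical subject-level likelihood contribution, sharply peaked at b = 4
# (scale 0.1), multiplied by the standard normal random effect density
g = lambda b: (np.exp(-0.5 * ((b - 4.0) / 0.1) ** 2)
               * np.exp(-0.5 * b ** 2) / np.sqrt(2.0 * np.pi))
exact = np.sqrt(0.01 / 1.01) * np.exp(-16.0 / 2.02)   # closed form (Gaussian product)

# Simple GH: nodes centred at the population mean 0, scaled by sigma = 1
simple = np.sum(w_q / np.sqrt(np.pi)
                * np.exp(-0.5 * ((np.sqrt(2.0) * x_q - 4.0) / 0.1) ** 2))

# Adaptive GH: recentre at the mode of log g, rescale by its curvature
b_hat = 4.0 / 1.01            # mode of log g
omega = 1.0 / 101.0           # minus the inverse second derivative at the mode
tau = np.sqrt(omega)
r = b_hat + tau * np.sqrt(2.0) * x_q
adaptive = tau * np.sqrt(2.0) * np.sum(w_q * np.exp(x_q ** 2) * g(r))

print(simple, adaptive, exact)
```

Here the integrand happens to be Gaussian, so the adaptively placed kernel matches it exactly; in a real joint model the gain is smaller but the principle is the same.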
6.7. Prediction
In this Section I describe some of the clinically useful predictions that can be ob-
tained following the fitting of a joint model. The presence of both fixed and random
effects means that a variety of both simple and complex predictions can be made. Av-
erage predictions can be made simply by setting bi = 0 for all patients; however, for
patient-specific predictions I need estimates of the random effects, bi. There is growing interest in personalised medicine, with predictions tailored to individual patients being a clinically appealing goal. The joint model framework lends itself to this quite naturally.
[Figure 6.5. Conceptual example of adaptive Gauss-Hermite quadrature. Panels: (A) 5-point simple GH quadrature to evaluate the population level random intercept; (B) subject-specific deviations from the population mean; (C) estimate subject-specific shifts from the population mean; (D) centre nodes for subject 1 using the estimate of b̂1; (E) re-scale centred nodes using se(b̂1).]
6.7.1. Random effects
In order to obtain patient-specific predictions, I must obtain estimates of the random
effects vector, bi. This can be conducted within an empirical Bayes framework (Rabe-
Hesketh and Skrondal, 2012; Rizopoulos, 2012). I assume that the prior distribution
for the random effects is simply φ(bi;θ), the multivariate normal distribution specified
for the model above. This can then be combined with the likelihood, conditional on
the random effects vector, p(Ti, di|bi;θ)p(yi|bi;θ), to give (using Bayes’ theorem)
p(bi|Ti, di,yi;θ) = φ(bi;θ) p(Ti, di|bi;θ) p(yi|bi;θ) / p(Ti, di,yi;θ) (6.31)
∝ φ(bi;θ) p(Ti, di|bi;θ) p(yi|bi;θ) (6.32)
the posterior distribution, given the observed responses Ti, di and yi. The desired
empirical Bayes prediction is the mean of the posterior distribution, calculated at the parameter estimates θ̂, given by

b̂i = ∫ bi p(bi|Ti, di,yi; θ̂) dbi (6.33)

with variance

var(b̂i) = ∫ (bi − b̂i)² p(bi|Ti, di,yi; θ̂) dbi (6.34)
It should be noted that both integrals in Equations (6.33) and (6.34) remain analytically
intractable, requiring quadrature.
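For a toy normal-normal random intercept model, where the posterior happens to be available in closed form as a check, Equations (6.33) and (6.34) can be sketched via quadrature as follows; the data, variance values, and kernel centring are all illustrative.

```python
import numpy as np

# Toy random-intercept model: y_j = b + e_j, b ~ N(0, s_b^2), e_j ~ N(0, s_e^2)
y = np.array([1.2, 0.8, 1.5])     # illustrative measurements for one subject
s_b, s_e = 1.0, 1.0

def post_unnorm(b):
    """Unnormalised posterior: phi(b; 0, s_b^2) x prod_j p(y_j | b)."""
    prior = np.exp(-0.5 * (b / s_b) ** 2)
    lik = np.exp(-0.5 * ((y[:, None] - b[None, :]) ** 2).sum(axis=0) / s_e ** 2)
    return prior * lik

# Quadrature against a roughly centred/scaled normal kernel (centre and scale
# are deliberately rough guesses; enough nodes compensate)
x_q, w_q = np.polynomial.hermite.hermgauss(40)
c, tau = 0.8, 0.6
b = c + np.sqrt(2.0) * tau * x_q
core = w_q * np.exp(x_q ** 2) * post_unnorm(b)          # kernel-corrected terms
b_hat = np.sum(core * b) / np.sum(core)                 # Equation (6.33)
v_hat = np.sum(core * (b - b_hat) ** 2) / np.sum(core)  # Equation (6.34)

# Closed form here: posterior precision 3/s_e^2 + 1/s_b^2 = 4,
# posterior mean = (sum of y)/4 = 0.875, posterior variance = 1/4
print(b_hat, v_hat)
```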
6.7.2. Longitudinal and survival predictions
6.7.2.1. Longitudinal predictions
The longitudinal trajectory can be calculated based only on the fixed portion of the
model, using
mi(t) = xi(t)^T β + ui^T δ (6.35)
which can be used to calculate the population level trend over time, for a specific
covariate pattern, calculated at the mean of the random effects distribution, i.e. when
bi = 0. Confidence intervals follow naturally when the prediction is only based on
fixed effects. Including random effects in the prediction, by calculating subject-specific
values for the random effects as described in Section 6.7.1, means that subject-specific
longitudinal predictions can be calculated, from Equation (8.2), as
mi(t) = xi(t)^T β + zi(t)^T b̂i + ui^T δ (6.36)
One of the advantages of the shared parameter framework is the ability to make both
population level and subject-specific predictions.
6.7.2.2. Survival predictions
I can also make predictions for the survival outcome, such as the hazard, cumulative hazard, and survival functions, either using only the fixed effect estimates, or including patient-specific estimates of the random effects. Marginal survival predictions are also often calculated, by drawing from the estimated random-effects distribution and averaging across the predictions (Rizopoulos, 2010).
6.7.3. Conditional survival predictions of Rizopoulos (2011)
As interest in personalised medicine grows, so does the clinical usefulness of tailoring predictions to an individual patient. The time-dependent nature of the longitudinal process, and how changes in its evolution impact the survival outcome, provides an extremely appealing risk prediction framework (Rizopoulos, 2011; Proust-Lima and Taylor, 2009; Taylor et al., 2013).
Given an estimated joint model, for a specific patient, who could be a censored
observation, or indeed a new patient, it is of direct clinical interest to predict the time
to the event of interest. I now relate back to the description of endogenous covariates in Section 6.3: the fact that a patient has had a measurement at time t implies survival up to time t. Therefore, it is of interest to calculate the probability of survival at time u > t, conditional on survival up to time t. The following is based on Rizopoulos (2011).

πi(u|t) = P(Ti* ≥ u | Ti* > t, Yi(t), Dn; θ) (6.37)

where Dn = {Ti, di, yi; i = 1, . . . , n} represents the sample dataset that was used to fit the joint model of interest, which will be used to inform the conditional survival predictions.
P(Ti* ≥ u | Ti* > t, Yi(t); θ)
= ∫ P(Ti* ≥ u | Ti* > t, Yi(t), bi; θ) p(bi | Ti* > t, Yi(t); θ) dbi
= ∫ P(Ti* ≥ u | Ti* > t, bi; θ) p(bi | Ti* > t, Yi(t); θ) dbi
= ∫ [ S{u | Mi(u, bi, θ); θ} / S{t | Mi(t, bi, θ); θ} ] p(bi | Ti* > t, Yi(t); θ) dbi (6.38)
A first-order estimate has been proposed for πi(u|t)

π̂i(u|t) = S{u | Mi(u, b̂i, θ̂); θ̂} / S{t | Mi(t, b̂i, θ̂); θ̂} + O(ni^{−1}) (6.39)
However, calculating appropriate standard errors for this estimate is problematic, and
hence Monte Carlo sampling schemes have been proposed to account for the variability
in both the maximum likelihood estimates and the estimates of the random effects. Rizopoulos (2011) takes a Bayesian approach to this, and proposed the following Monte Carlo sampling scheme:

• For l = 1, . . . , L repetitions:
(1) Draw θ(l) ∼ N(θ̂, var(θ̂))
(2) Draw bi(l) ∼ {bi | Ti* > t, Yi(t), θ(l)}
(3) Calculate πi(l)(u|t) = S{u | Mi(u, bi(l), θ(l)); θ(l)} / S{t | Mi(t, bi(l), θ(l)); θ(l)}
The Monte Carlo samples can then be used to derive statistics such as the median and
mean, with percentile-based confidence intervals.
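A minimal sketch of the scheme above, for a single subject, is given below. It assumes an illustrative Weibull-type model in which the trajectory term acts multiplicatively on the cumulative hazard (a deliberate simplification so that the survival ratio has a closed form), normal approximations for both the parameter and random effect draws, and hypothetical values for all estimates and standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)

def cond_surv(u, t, b, theta):
    """Conditional survival ratio S(u)/S(t) under an illustrative model with
    cumulative hazard H(s) = lam * s^gam * exp(alpha * (b0 + b1*s))."""
    lam, gam, alpha = theta
    H = lambda s: lam * s ** gam * np.exp(alpha * (b[0] + b[1] * s))
    return np.exp(-(H(u) - H(t)))

theta_hat = np.array([0.1, 1.2, 0.25])      # hypothetical 'MLEs'
theta_sd = np.array([0.01, 0.05, 0.05])     # hypothetical standard errors
b_hat, b_sd = np.array([0.2, -0.1]), np.array([0.1, 0.05])

t, u = 1.0, 3.0
pi = np.empty(500)
for l in range(500):                         # l = 1, ..., L
    th_l = rng.normal(theta_hat, theta_sd)   # (1) draw theta^(l)
    b_l = rng.normal(b_hat, b_sd)            # (2) draw b^(l) (posterior approx.)
    pi[l] = cond_surv(u, t, b_l, th_l)       # (3) conditional survival ratio

print(np.median(pi), np.percentile(pi, [2.5, 97.5]))
```

The median and percentiles of the draws give a point estimate and a percentile-based interval for πi(u|t), as described above.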
6.7.4. Distribution of the random effects
Tsiatis and Davidian (2001) have proposed alternative joint model frameworks which make no distributional assumption on the random effects; however, Huang et al. (2009) and Rizopoulos et al. (2008) have shown, both theoretically and through simulation, that joint models are quite robust to deviations from the assumption of normally distributed random effects. The estimates of the random effects, described in Section 6.7.1, can be assessed for deviations from this assumption using standard techniques, such as Q-Q plots.
6.8. Discussion
The joint modelling of longitudinal and time-to-event data has received substantial
attention in the literature over the past 20 years (Wulfsohn and Tsiatis, 1997; Hen-
derson et al., 2000), with the ability to investigate the inter-relationships between the
joint processes being advocated in ever widening fields of study (Tseng et al., 2005;
Ibrahim et al., 2010). A number of excellent reviews of the field have been conducted
(Tsiatis and Davidian, 2004; Yu et al., 2004), most recently the review of Gould et al.
(2014) that I was involved in. It is now well established that joint modelling offers an
appropriate methodological framework when assessing the association between a lon-
gitudinal biomarker, measured with error, and a time to event of interest, be it from a
survival analysis context, or a longitudinal study adjusting for informative drop-out.
Joint modelling is well known to be a technically complex subject area, generally due to the calculations required in fitting the models; however, this is becoming less so with the increased availability of user-friendly software. A central theme of this thesis is to develop and disseminate user-friendly Stata software for the joint modelling of longitudinal and survival data, which will be comprehensively described in Chapter 10.
In this chapter I have described the methodological foundations of the standard joint
model framework. In the succeeding chapters I introduce some methodological exten-
sions and apply joint modelling to two datasets in the area of liver cirrhosis. Building
on the motivations described in Chapters 4 and 5, a particular limitation of joint models has been the lack of flexibility of the parametric survival submodel. Many authors proposed standard survival submodel choices, such as the Weibull distribution (Deslandes and Chevret, 2007), but failed to consider the implications of misspecifying the baseline hazard function. It is this which motivates the methodological extension
described in the next chapter. Furthermore, a comprehensive assessment of the commonly used numerical integration methods, described in Section 6.6.2, on which estimation within a frequentist framework is heavily reliant, had not been conducted. This will also be undertaken in the next chapter.
CHAPTER 7
Flexible parametric joint modelling of longitudinal and
survival data
7.1. Chapter outline
In this Chapter I propose to incorporate the Royston-Parmar (RP) model (Royston
and Parmar, 2002), introduced in Chapter 2, as the survival submodel within a joint modelling framework. Motivated by the need for a more flexible formulation of the survival submodel to capture complex hazard functions, whilst maintaining the benefits of a parametric approach, the proposed joint model is evaluated through an extensive simulation study, comparing the RP submodel, which uses restricted cubic splines, to a previously defined B-spline formulation for capturing complex hazard functions. I further evaluate and compare adaptive and non-adaptive Gauss-Hermite quadrature used to evaluate the joint likelihood, an often neglected issue.
This chapter is based on a recent publication in Statistics in Medicine, a copy of
which can be found in Appendix D (Crowther et al., 2012a).
7.2. Introduction
In Chapter 6, I introduced the main concepts of joint modelling of longitudinal and
survival data, describing the motivation for undertaking a joint model analysis and
derived the frequentist estimation framework. In this chapter, I propose a methodolog-
ical extension to the joint model framework to address the issue of capturing complex
hazard functions, raised in Chapters 4, 5 and 6.
As described in Chapters 2 and 5, the occurrence of more complex survival data
and underlying baseline hazard functions has motivated the development of more flex-
ible, parametric approaches to survival analysis. One such proposition is the model
of Royston and Parmar (2002), which is modelled on the log cumulative hazard scale
using restricted cubic splines, to provide a highly flexible, but still fully parametric,
survival modelling framework, described in Chapter 2.
Within a joint modelling framework, I propose to incorporate the RP survival
model, to provide a highly flexible framework to capture complex hazard functions.
Rizopoulos et al. (2009) developed a joint model using this form, but expanded log time using B-splines. I extend their approach by alternatively using restricted cubic
splines (Durrleman and Simon, 1989; Lambert and Royston, 2009), which impose the restriction that the fitted function is linear beyond the boundary knots, as described in Section 2.8.1 of Chapter 2. The number of parameters in a restricted
cubic spline specification is equal to the number of internal knots plus 2 (one of which
is an intercept). The number of parameters in a B-spline formulation is equal to the
order plus the number of internal knots.
There are 2 primary motivations for my approach. First, as described in Chapters
4 and 5, standard parametric survival models can often lack the flexibility to capture complex underlying hazard functions; for example, the Weibull assumes a monotonic shape, which will be insufficiently flexible to fully capture a hazard function with a
turning point. Patient-specific conditional survival predictions (Rizopoulos, 2011), a key output of the joint model framework, rely on sufficiently capturing, and consequently extrapolating, the baseline hazard function. Secondly, joint models can be considered computationally intensive; therefore, by modelling on the log cumulative hazard scale I avoid numerically integrating the hazard function, which is required to evaluate the joint likelihood when using a hazard scale parametric survival submodel, as described in Chapter 6.
I conduct an extensive simulation study not only to evaluate the proposed joint model, comparing it to the appropriate B-spline based model, but also to evaluate non-adaptive quadrature with varying numbers of nodes, and fully adaptive quadrature, an issue often neglected in the literature. The simulation study makes use of the general survival simulation algorithm described in Chapter 4.
The methods are illustrated using a dataset of 488 patients with liver cirrhosis, in-
troduced in Chapter 3 (Anderson et al., 1993). A total of 251 patients were randomised
to receive prednisone, with 237 assigned to a placebo. Prothrombin index was measured repeatedly, at baseline and at scheduled follow-up times of 3, 6 and 12 months and then annually; however, observed times of measurement varied substantially. A total of
2968 measurements were recorded. I investigate the effect of treatment after adjusting
for the relationship between prothrombin index and time to death.
7.3. Defining the joint model
In this section I define the joint model incorporating the Royston-Parmar survival
model as the survival submodel of choice. I use notation defined in Chapter 6. The
longitudinal submodel follows that defined in Section 6.4.1 of the previous chapter.
7.3.1. Survival submodel
I define the Royston-Parmar survival submodel, on the log cumulative hazard scale,
with Mi(t) = {mi(s), 0 ≤ s ≤ t} defined to be the true unobserved longitudinal profile up to time t

logHi(t|Mi(t),vi) = logH0(t) + ψ^T vi + αmi(t) (7.1)
where H0(t) is the cumulative baseline hazard function, α denotes the association pa-
rameter and ψ is a set of log (cumulative) hazard ratios associated with a set of baseline
covariates, vi, again a subset of Ui. For illustration, I present the model formulation
with the current value association structure, described in Section 6.4.2 of Chapter
6; however, note that the other association structures remain valid in this modelling
framework. Following Section 2.8.2, the log baseline cumulative hazard is written as a
restricted cubic spline function
logHi(t|Mi(t),vi) = ηi(t) = s{log(t)|γ, k0} + ψ^T vi + αmi(t) (7.2)
with corresponding hazard and survival functions

h(t|Mi(t),vi) = dH(t|Mi(t),vi)/dt
= dηi(t)/dt × exp{ηi(t)}
= d[ s{log(t)|γ, k0} + ψ^T vi + αmi(t) ]/dt × exp{ηi(t)}
= [ (1/t) ds{log(t)|γ, k0}/d log(t) + α dmi(t)/dt ] exp{ηi(t)} (7.3)

and

S(t|Mi(t),vi) = exp[ −exp{ηi(t)} ] (7.4)
Note that, as opposed to the parametric survival submodels described in the previous
chapter, the survival function defined in Equation (7.4) is analytically tractable, i.e.
can be written in closed form, providing substantial computational benefits.
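The computations in Equations (7.2)-(7.4) can be sketched as follows, using the Durrleman-Simon restricted cubic spline basis and its analytic derivative; the knot locations and spline coefficients γ are illustrative values, not fitted estimates, and the association term is omitted (α = 0) for simplicity.

```python
import numpy as np

knots = np.log([0.5, 2.0, 5.0])      # boundary knots and one internal knot
gamma = np.array([-1.5, 1.2, 0.15])  # intercept, linear and spline coefficients

def rcs(x, knots):
    """Restricted cubic spline basis columns for x (linear beyond boundaries)."""
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: np.maximum(u, 0.0)
    cols = [x]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(pos(x - kj) ** 3 - lam * pos(x - kmin) ** 3
                    - (1 - lam) * pos(x - kmax) ** 3)
    return np.stack(cols)

def rcs_deriv(x, knots):
    """Analytic derivative of each basis column with respect to x."""
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: np.maximum(u, 0.0)
    cols = [np.ones_like(x)]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(3 * (pos(x - kj) ** 2 - lam * pos(x - kmin) ** 2
                         - (1 - lam) * pos(x - kmax) ** 2))
    return np.stack(cols)

t = np.array([1.0, 3.0])
eta = gamma[0] + gamma[1:] @ rcs(np.log(t), knots)   # eta_i(t), Equation (7.2)
H = np.exp(eta)                                      # cumulative hazard
S = np.exp(-H)                                       # survival, Equation (7.4)
ds_dlogt = gamma[1:] @ rcs_deriv(np.log(t), knots)
h = ds_dlogt / t * np.exp(eta)                       # hazard, Equation (7.3)
print(S, h)
```

The analytic hazard can be checked against a finite-difference derivative of the cumulative hazard.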
7.3.1.1. A note on time-dependent association structures
Due to modelling on the log cumulative hazard scale, interpretation of the association parameter must be undertaken with caution. As can be seen in the definition of the hazard function in Equation (7.3), the association parameter α and longitudinal component mi(t) appear not only in the definition of ηi(t) but also within the derivative of the cumulative hazard function. This means that α, when interpreted as the log cumulative hazard ratio for a one-unit increase in the longitudinal outcome at time t, is not equivalent to the log hazard ratio for a one-unit increase in the longitudinal outcome at time t; i.e. the equivalence between cumulative hazard ratios and hazard ratios breaks down.
7.3.2. Likelihood
The joint likelihood described in Chapter 6 remains the same for the longitudinal
submodel components, and the distribution of the shared random effects. However, the
likelihood component for the survival submodel is now
p(Ti, di|bi,θ) = [ { (1/Ti) ds{log(Ti)|γ, k0}/d log(Ti) + α dmi(Ti)/dTi } exp{ηi(Ti)} ]^di × exp[ −exp{ηi(Ti)} ] (7.5)
As mentioned above, the important aspect to note from Equation (7.5) is that there
is no longer the integral which is present under a parametric survival model on the
log hazard scale, as described in Chapter 6. This directly avoids the need for nested
numerical integration to evaluate the cumulative hazard function, and subsequently
the survival function required for the likelihood, seen in Equation (6.19). This provides
direct computational benefits due to modelling on the log cumulative hazard scale.
7.4. Simulation study
I now evaluate the performance of the proposed joint model, comparing the use
of restricted cubic splines to the B-spline formulation of Rizopoulos et al. (2009), and
further evaluate and compare the numerical integration techniques of adaptive and
non-adaptive Gauss-Hermite quadrature, used to calculate the joint likelihood. Under
each scenario I apply both the proposed joint model with 5 degrees of freedom, plus an
intercept, resulting in 6 parameters to capture the baseline cumulative hazard function,
and a B-spline function of degree 3 (cubic splines for consistency) and 2 internal knots,
again resulting in 6 parameters to capture the baseline hazard, providing a fair compar-
ison. I also apply the proposed joint model with 1 degree of freedom (equivalent to a
Weibull based joint model). The proposed model is implemented in the stjm command
in Stata (Crowther, 2012). I use the JM package (Rizopoulos, 2010) version 0.4.0 in R
to fit the model of Rizopoulos et al. (2009). Both Stata and R packages use a tolerance
of 1E-04 for the parameter estimates, and Stata uses 1E-07 for the log-likelihood, with
R using 1.5E-8.
Furthermore, what has often been neglected in the literature is an assessment of the number of quadrature nodes required to obtain consistent estimates of effect parameters and sufficient coverage probabilities. For each scenario I use non-adaptive Gauss-Hermite quadrature to evaluate the joint likelihood of both the proposed model and
the B-spline model, with 5 and 15 nodes for each random effect dimension to assess performance. I also implement 5-point fully adaptive quadrature for the proposed model for comparison; however, this was not available for the B-spline model, which is a limitation of the R package.
For each scenario, 300 patients were included in each of 500 repetitions. The true longitudinal profile was generated from mi(t) = β0i + β1i t + δui, with β0i ∼ N(0, 1), β1i ∼ N(0, 0.25²) and correlation between (β0i, β1i) of 0.25. The observed longitudinal measurements were then generated from yi(t) ∼ N(mi(t), 0.5²). Times of measurement were fixed at (0, 1, 2, 3, 4). Survival times were generated from logH(t|mi(t), ui) = logH0(t) + αmi(t) + ψui, where H0(t) is detailed below. Administrative censoring was applied at 5 years. A binary treatment group variable was generated from ui ∼ Bin(1, 0.5). The direct treatment effect on the longitudinal response, δ, is fixed at −0.25, the direct treatment effect on the time-to-event, ψ, is fixed at 0.25, and the association parameter, α, is varied between −0.25 and 0.25.
7.4.1. Generating survival times
Here I put into practice the general survival simulation algorithm described in Chapter
4, in order to simulate survival times from complex baseline hazard functions incorpo-
rating turning points. This will allow a full assessment of the use of splines to capture
complex hazard functions. For example, the method of Rizopoulos et al. (2009) was
evaluated in a simulation study with survival times generated from an exponential dis-
tribution (however, the primary motivation of the simulation study was to evaluate the
Laplacian estimation method, not the survival submodel). I generate survival times
from a two-component mixture Weibull distribution (McLachlan and McGiffin, 1994),
with:
S0(t) = p exp{−λ1 t^γ1} + (1 − p) exp{−λ2 t^γ2} (7.6)

and

H0(t) = −log[ p exp{−λ1 t^γ1} + (1 − p) exp{−λ2 t^γ2} ] (7.7)

I now add the linear predictor for the association and time-independent covariates, on the log cumulative hazard scale:

logH(t) = log[ −log( p exp{−λ1 t^γ1} + (1 − p) exp{−λ2 t^γ2} ) ] + αmi(t) + ψui (7.8)
Therefore, given that I can write down the survival function as S(t) = exp [−H(t)], with
H(t) defined in Equation (7.8), but cannot invert it to solve for t, I can directly apply
simulation scenario 2, from Chapter 4 Section 4.5.2, utilising root finding techniques,
to simulate survival times.
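The following sketch applies this approach: for each subject, a uniform draw U is generated and S(t) = U is solved for t by bisection, with logH(t) built from the two-component mixture Weibull baseline of Equation (7.8). The baseline parameter values follow the mixture scenario described below, while the subject-level random effect values and the bisection settings are illustrative; note that with a sufficiently negative random slope the cumulative hazard can plateau, in which case the time is set to the upper bound (in the thesis simulations, administrative censoring is applied in any case).

```python
import numpy as np

rng = np.random.default_rng(2)
lam1, gam1, lam2, gam2, p = 0.1, 1.5, 0.1, 0.5, 0.9   # mixture Weibull baseline
alpha, psi = 0.25, 0.25

def logH(t, b0, b1, u):
    """log cumulative hazard, Equation (7.8), with m_i(t) = b0 + b1*t."""
    S0 = p * np.exp(-lam1 * t ** gam1) + (1 - p) * np.exp(-lam2 * t ** gam2)
    return np.log(-np.log(S0)) + alpha * (b0 + b1 * t) + psi * u

def sim_time(b0, b1, u, upper=1000.0, tol=1e-10):
    """Solve exp{-H(t)} = U for t by bisection on log S(t) - log U."""
    target = np.log(rng.uniform())                  # log U
    f = lambda t: -np.exp(logH(t, b0, b1, u)) - target
    lo, hi = 1e-12, upper
    if f(hi) > 0:                                   # S(upper) still above U:
        return upper                                # no root; return the bound
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

times = np.array([sim_time(rng.normal(0, 1), rng.normal(0, 0.25),
                           rng.integers(0, 2)) for _ in range(5)])
print(times)
```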
Three scenarios of baseline parameters were chosen: a standard Weibull with in-
creasing hazard function, λ1 = 0.1, γ1 = 1.5, and p = 1; a mixture Weibull with a
single turning point in the baseline hazard function, λ1 = 0.1, γ1 = 1.5, λ2 = 0.1,
γ2 = 0.5, and p = 0.9; and finally a Weibull distribution with λ1 = 1 × 10⁻⁵, γ1 = 6.1,
and p = 1. The final scenario assesses the validity of the approach when the hazard
is essentially zero for a portion of the follow-up time.
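The shapes of these scenarios can be checked numerically. This sketch recovers the baseline hazard as a numerical derivative of H0(t) from Equation (7.7), confirming the monotone hazard of scenario 1 and the turning point of scenario 2:

```python
import numpy as np

def baseline_hazard(t, p, l1, g1, l2, g2, eps=1e-6):
    # h0(t) as the numerical derivative of H0(t) from Equation (7.7)
    H0 = lambda s: -np.log(p * np.exp(-l1 * s**g1)
                           + (1 - p) * np.exp(-l2 * s**g2))
    return (H0(t + eps) - H0(t - eps)) / (2 * eps)

t = np.linspace(0.005, 5, 400)
h_scen1 = baseline_hazard(t, 1.0, 0.1, 1.5, 0.1, 0.5)  # monotone increasing
h_scen2 = baseline_hazard(t, 0.9, 0.1, 1.5, 0.1, 0.5)  # single turning point
```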
7.4.2. Results
Tables 7.1, 7.2 and 7.3 present bias and coverage estimates from all simulations gen-
erated under the three baseline hazard functions. Under the three scenarios, survival
submodel parameter estimates from the proposed model, i.e. the direct treatment effect
on survival (ψ) and the association parameter (α), appear to be unbiased. However,
under the B-spline approach, across all scenarios I observe consistent under-estimation
of the association parameter, α. This bias is eliminated in the restricted cubic spline
approach. For example, under scenario 2 with the true α = 0.25, the percentage bias
under the restricted cubic spline approach is -0.8% compared to -10% under the B-spline
approach. Coverage probabilities very closely approximate the desired 95% in all sce-
narios when using restricted cubic splines, even with a small number of non-adaptive
quadrature nodes. For the longitudinal submodel parameters, I observe generally unbiased
estimates; however, for the variance parameters this holds only when the number of
non-adaptive quadrature nodes is ≥ 15, or when fully adaptive quadrature with 5
nodes is used. Under non-adaptive quadrature, coverage estimates are generally below the de-
sired 95%, indicating marked under-estimation of the standard errors, compared with
optimum coverage probabilities across scenarios when fully adaptive quadrature with
5 nodes is used. Further simulations, not shown here, illustrated that 35 non-adaptive
quadrature nodes were required to provide optimum coverage probabilities. Standard
errors of variance parameters are not available in R, so coverage could not be assessed
for all parameters in the B-spline models.
The proposed model also produces moderate bias in the variance estimate of the
slope parameter when 5 point non-adaptive quadrature is used; however, this bias is
eliminated under both 15 point non-adaptive and 5 point adaptive quadrature. Com-
paring across degrees of freedom, I observe almost identical estimates of bias and cov-
erage probabilities between models.
Table 7.2 presents bias and coverage estimates for simulations generated from a
two-component mixture Weibull baseline hazard, described in Section 7.4.1. Results
appear entirely consistent with those found when generating under a standard Weibull
distribution. The underestimation of the standard errors of the longitudinal parameters
remains a problem when an insufficient number of quadrature nodes are used. Despite
generating data from a complex baseline hazard, the joint models fitted with only one
degree of freedom appear to estimate all parameters just as effectively as with 5 degrees
of freedom, specifically the three treatment effects. This is perhaps to be expected, as
the hazard ratio is often insensitive to the specification of the baseline hazard
function (Royston and Lambert, 2011).
I discuss the implications of the choice of the number of quadrature nodes, and the
insensitivity to the baseline hazard further in Section 7.6.
7.5. Analysis of liver cirrhosis dataset
In this section I apply the proposed joint model to the dataset described in Section
3.5 of Chapter 3, where primary interest is the effect of treatment after adjusting for
the repeatedly measured prothrombin index on the time to all-cause death. A total of
488 patients had their prothrombin index measured at baseline, with further scheduled
measurements at 3, 6, and 12 months, and annually thereafter. The median number of
measurements was 6 (range: 1 to 17). A total of 292 (59.8%) patients died during the
study. Patients were randomised to two treatment groups, namely prednisone or placebo.
For further details regarding the dataset, I refer the reader elsewhere (Anderson et al.,
1993). I begin with some exploration of the observed data, by plotting the observed
repeated measures of prothrombin index over time, splitting between patients who died
and who were censored, shown in Figure 7.1.
[Figure: two panels ('Censored', 'Event'); longitudinal response (0–200) against measurement time (years).]
Figure 7.1. Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Lowess smoother overlaid.
Figure 7.1 illustrates the observed biomarker measurements for all patients, with a
lowess smoother overlaid to give an indication of the average profile over time, indicat-
ing some evidence that patients who died had generally lower values of prothrombin
index. This is further investigated in Figure 7.2, which provides an exploration of the
relationship between prothrombin trajectory and the time (in years) to death by plot-
ting the observed longitudinal responses against observation time, where the timescale
is adjusted by taking away the observed censoring/event time. A lowess smoother is
overlaid.
[Figure: two panels ('Censored', 'Event'); longitudinal response (0–200) against time before censoring/event (−15 to 0 years).]
Figure 7.2. Liver cirrhosis trial of prednisone treatment. Longitudinal profiles of prothrombin index for patients who were censored/died. Timescale is adjusted by taking away each patient's survival time. Lowess smoother overlaid.
From Figure 7.2, it is apparent that patients who experienced the event, compared
to patients who were censored, had decreasing levels of the biomarker during the 2-3
year period before death. If I assume the association between the longitudinal and
survival models is based on the current value parameterisation discussed in Section
6.4.2, I would expect a negative association, indicating that a lower value of prothrombin
index is associated with an increased risk of death. This form of plot can be a useful exploratory tool
in the analysis of joint longitudinal and survival data, specifically indicating a current
value association structure between the two processes.
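The timescale adjustment behind Figure 7.2 amounts to a one-line transformation of long-format data; the toy data below are hypothetical:

```python
import pandas as pd

# Hypothetical long-format data: one row per prothrombin measurement
df = pd.DataFrame({
    "id":          [1, 1, 1, 2, 2],
    "meas_time":   [0.0, 1.0, 2.5, 0.0, 3.0],  # years since randomisation
    "surv_time":   [4.0, 4.0, 4.0, 6.0, 6.0],  # censoring/event time per patient
    "died":        [1, 1, 1, 0, 0],
    "prothrombin": [80.0, 70.0, 55.0, 90.0, 88.0],
})

# Adjusted timescale of Figure 7.2: measurement time minus survival time,
# so the censoring/event time sits at zero for every patient
df["time_before_event"] = df["meas_time"] - df["surv_time"]
```

Plotting `prothrombin` against `time_before_event`, split by `died`, reproduces the layout of Figure 7.2.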
I now apply the joint model described in Section 7.3 to the liver cirrhosis dataset. In
the longitudinal submodel I assume a random intercept with random effect of log(time),
and also adjust for the interaction between treatment and time. In preliminary analysis
log(time) showed an improved fit compared to a linear effect of time. In the survival
submodel I adjust for the direct effect of treatment. I model the association between
prothrombin index and time to death through the current value parameterisation. I
use 5 degrees of freedom to model the baseline cumulative hazard, equivalent to 4
internal knots. Boundary knots are placed at the 0th and 100th percentiles of the
uncensored log survival times. For comparison I also apply the model of Rizopoulos
et al. (2009). Under the B-spline model I use cubic splines with 2 internal knots to
provide a comparison of model fit with the same number of parameters used to model
the baseline cumulative hazard function. As adaptive quadrature is not available for
the B-spline model, I apply both models using 35 point non-adaptive quadrature.
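The restricted cubic spline basis underlying the survival submodel can be sketched as follows. This is the standard restricted cubic spline construction, with illustrative boundary knots and the internal knots quoted in Section 7.5.2; the exact Royston-Parmar implementation details (e.g. knot scaling) may differ:

```python
import numpy as np

def rcs_basis(x, knots):
    # Restricted cubic spline basis in x = log(t); knots = boundary +
    # internal knots, ascending. With k knots this returns k - 1 columns,
    # i.e. the 'degrees of freedom'; the basis is linear beyond the
    # boundary knots by construction.
    kmin, kmax = knots[0], knots[-1]
    cols = [x]                                  # linear term
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        cols.append(np.maximum(x - kj, 0) ** 3
                    - lam * np.maximum(x - kmin, 0) ** 3
                    - (1 - lam) * np.maximum(x - kmax, 0) ** 3)
    return np.column_stack(cols)

# Illustrative boundary knots plus the 4 internal knots of Section 7.5.2
# (0.424, 1.186, 2.894, 5.418 years), all on the log-time scale
knots = np.log([0.08, 0.424, 1.186, 2.894, 5.418, 13.9])
B = rcs_basis(np.log(np.linspace(0.1, 13, 100)), knots)  # 5 df
```

The restriction to linearity beyond the boundary knots is what later gives the 'sensible' basis for extrapolation discussed in Section 7.6.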
In Table 7.4, comparing the proposed approach with the B-spline model, I
generally observe similar parameter estimates. In particular, both models show a neg-
ative association between prothrombin index and time to death; for example, under
the RP approach I observe an association of -0.038 (95% CI: -0.045, -0.031), indicat-
ing that a lower value of prothrombin index increases the risk of death. I observe a non-
statistically significant direct effect of treatment on survival, with a log hazard ratio of
0.210 (95% CI: -0.038, 0.457).
I now return to the primary motivation of the approach, which is to effectively
capture complex hazard functions. I compare the fitted marginal survival functions
across models with the Kaplan-Meier estimates for the liver cirrhosis dataset, shown
in Figure 7.3. It is evident from Figure 7.3 that the restricted cubic spline approach
provides an improved fit compared to the B-spline approach, using the same number of
parameters to model the baseline cumulative hazard function. Indeed, in Figure 7.4 I
show the marginal survival function with an increased number of internal knots under
the B-spline approach, highlighting that I need to use 5 internal knots to achieve a
function which fits as closely as the restricted cubic splines approach. In other words,
I need to use 9 parameters under the B-spline approach compared to only 6 under the
restricted cubic spline approach to achieve a well-fitting function.
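A marginal survival curve of the kind compared with the Kaplan-Meier estimate can be obtained by averaging subject-specific curves over the random-effect distribution. The survival function below is a hypothetical stand-in for the fitted model:

```python
import numpy as np

rng = np.random.default_rng(7)

def subj_surv(t, b):
    # Hypothetical subject-specific survival given a random intercept b;
    # a Weibull stand-in for the fitted joint model's survival submodel
    return np.exp(-0.08 * t**1.3 * np.exp(0.03 * b))

# Marginal survival: average the subject-specific curves over the random-
# effect distribution (Monte Carlo over b), for comparison with the
# Kaplan-Meier curve as in Figures 7.3 and 7.4
b = rng.normal(0.0, 20.0, size=5000)
tgrid = np.linspace(0.0, 14.0, 60)
S_marginal = np.array([subj_surv(t, b).mean() for t in tgrid])
```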
[Figure: two panels ('Restricted cubic splines', 'B-splines'); survival against follow-up time (0–14 years); marginal survival overlaid on the Kaplan-Meier curve and its 95% confidence interval.]
Figure 7.3. Liver cirrhosis trial of prednisone treatment. Marginal survival using 6 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval.
7.5.1. Predictions
To illustrate the prognostic benefits of the joint modelling framework, conditional
survival predictions can be tailored at the individual level using the empirical Bayes
predictions from the random effects, and sampling schemes have been proposed to
calculate appropriate standard errors for these predictions (Rizopoulos, 2011). I adapt
the approach of Rizopoulos (2011) to calculate conditional survival predictions of 2
patients with similar baseline values of prothrombin index, using the fitted restricted
cubic spline based joint model, shown in Figure 7.5. Given the negative association
between prothrombin index and the risk of death, it can be seen from Figure
7.5 that patient 98 has a sharply increasing pattern of prothrombin index across follow-
up time, resulting in higher survival probabilities, conditional on survival at time of final
measurement, when compared to Patient 253. Patient 253 maintains lower values of
prothrombin index, resulting in lower survival predictions. The confidence intervals are
derived using the Monte Carlo sampling scheme described in Section 6.7.3 of Chapter 6,
where I have both the estimates of the fixed effects, and the subject-specific estimates
[Figure: three panels ('3 internal knots', '4 internal knots', '5 internal knots'); survival against follow-up time (0–14 years); marginal survival using B-splines overlaid on the Kaplan-Meier curve and its 95% confidence interval.]
Figure 7.4. Liver cirrhosis trial of prednisone treatment. Marginal survival using cubic B-splines with 3, 4 or 5 internal knots, resulting in 7, 8 or 9 parameters to model the baseline cumulative hazard function, overlaid on the Kaplan-Meier estimate and its associated 95% confidence interval.
of the random effects, feeding into the uncertainty. The reliance of these predictions
on accurately specifying the baseline hazard is discussed in Section 7.6.
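The conditional survival calculation, and the Monte Carlo scheme behind its pointwise intervals, can be sketched as follows. The survival function and the posterior for the linear predictor are hypothetical stand-ins for the fitted quantities:

```python
import numpy as np

rng = np.random.default_rng(11)

def surv(t, eta):
    # Hypothetical subject-specific survival function; a Weibull stand-in
    # for the fitted spline model, with subject linear predictor eta
    return np.exp(-0.1 * t**1.2 * np.exp(eta))

def conditional_surv(t, s, eta):
    # S(t | T > s, eta) = S(t | eta) / S(s | eta), for t >= s:
    # survival to t conditional on being alive at the last measurement s
    return surv(t, eta) / surv(s, eta)

# Monte Carlo draws of eta from an assumed approximate normal posterior,
# mimicking the sampling scheme used for the pointwise intervals
eta_draws = rng.normal(loc=0.2, scale=0.1, size=2000)
draws = conditional_surv(8.0, 5.0, eta_draws)
point = np.median(draws)
lo, hi = np.percentile(draws, [2.5, 97.5])
```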
7.5.2. Sensitivity to location and number of knots
In my experience, the default knot locations, based on the distribution
of uncensored event times, provide the most sensible approach to modelling using
spline formulations, as was found in Rizopoulos et al. (2009) and Rutherford et al.
(2014). This allows the data to be modelled more accurately in the areas of greatest
density. Previous work within the flexible parametric survival modelling framework
has shown insensitivity to knot placement (Lambert et al., 2010a). Using 5 degrees
of freedom (4 internal knots), the default knot locations are 0.424, 1.186, 2.894, 5.418.
[Figure: two panels ('Patient 98', 'Patient 253'); prothrombin index (0–200) and survival probability (0–1) against follow-up time (0–14 years); longitudinal response, predicted conditional survival, and 95% pointwise confidence interval shown.]
Figure 7.5. Liver cirrhosis trial of prednisone treatment. Predicted conditional survival probabilities for patients 98 and 253.
I choose 3 other sets of internal knot locations (on the original time scale) and compare
parameter estimates and predicted marginal survival curves. I have knot locations A
of 0.3, 1, 3, 5, locations B of 1, 3, 5, 8 and locations C of 0.2, 1, 2, 9. Table 7.5
contains the parameter estimates across models with differing knot choices, illustrat-
ing once again the robustness of parameter estimates when compared to the original
results in Table 7.4, with only minor differences observed in the third decimal place. Sim-
ilarly, the left plot in Figure 7.6 shows very stable predicted marginal survival curves
across knot choices. Furthermore, the right plot in Figure 7.6 illustrates the fitted
marginal survival function when using 2, 3 and 5 internal knots (with locations based
on equally spaced quantiles of the distribution of uncensored survival times), illustrat-
ing the stability of the proposed model. In comparison to Figure 7.4, I observe much
more variability in the marginal survival predictions when using B-splines with varying
number of knots.
7.6. Discussion
In this chapter I have described a highly flexible joint model for a single longitu-
dinal continuous biomarker and the time to an event of interest. The restricted cubic
[Figure: marginal survival sensitivity analysis; left panel 'Varying knot locations, 4 internal knots', right panel 'Locations based on quantiles, 2, 3 or 5 internal knots'; survival against follow-up time (0–14 years), overlaid on the Kaplan-Meier curve and its 95% confidence interval.]
Figure 7.6. Liver cirrhosis trial of prednisone treatment. Fitted marginal survival function from joint models with varying knot locations and number of internal knots. Left-hand plot uses 6 parameters to model the baseline log cumulative hazard function; right-hand plot uses 4, 5 and 7 parameters.
spline basis for the log cumulative baseline hazard function provides a flexible frame-
work where often the time-to-event is of primary interest. Flexibility in the longitudinal
submodel can be incorporated through the use of fixed and/or random fractional poly-
nomials of time, which can capture a variety of shapes (Royston and Altman, 1994).
The simulation study conducted to assess the proposed joint model raised three
important issues. Firstly, I observed consistent under-estimation of the association
parameter, α, under the B-spline approach. This bias was eliminated when using re-
stricted cubic splines, both with 1 and 5 degrees of freedom. Secondly, the choice of the
number of quadrature nodes can have a marked impact on both parameter estimates,
and the associated standard errors. Whether increasing the number of non-adaptive
quadrature nodes would eliminate the bias under the B-spline approach should be investigated further.
If interest is purely on the time-to-event then a lower number of quadrature nodes can
be used and will produce unbiased estimates with optimum coverage levels; however, if
the longitudinal submodel is of interest then the choice of quadrature nodes and method
is crucial. For example, in studies where quality of life is the longitudinal marker of
interest (Billingham and Abrams, 2002), the longitudinal response profile can be of
direct interest in order to be included into an economic decision model, where reliable
estimates of associated standard errors can be pivotal in assessing cost-effectiveness and
thus health policy decisions (Briggs, 2000). The simulation study highlighted the supe-
riority of fully adaptive Gauss-Hermite quadrature in the joint model setting. The use
of adaptive quadrature means I can use a much reduced number of quadrature nodes,
resulting in substantial computational benefits. Finally, the simulation study showed in
general how the estimates of covariate effects were insensitive to the specification of the
baseline hazard. This of course can be beneficial; however, one of the key benefits of the
joint model framework is the predictions that can be obtained. These predictions
will rely heavily on the accuracy of the model in estimating the baseline hazard func-
tion. I illustrate this in Figure 7.7, whereby data is simulated under a two-component
mixture Weibull baseline hazard function with a turning point. I apply joint models to
the single simulated dataset, firstly with 1 degree of freedom (equivalent to a Weibull
model), and then 5 degrees of freedom. I then predict the marginal survival function
and compare to the Kaplan-Meier survival curve. It is evident from Figure 7.7 that
only with a sufficient number of degrees of freedom can the baseline survival function
be adequately captured.
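The adaptive Gauss-Hermite quadrature discussed above can be sketched in one function. Centring and scaling the nodes (at the empirical Bayes mode and curvature in a joint model; here, for illustration, at a known mean and standard deviation) is what makes a small number of nodes sufficient:

```python
import numpy as np

def gh_expect(f, mu, sigma, n):
    # E[f(b)] for b ~ N(mu, sigma^2) via n-node Gauss-Hermite quadrature.
    # Centring and scaling the nodes at (mu, sigma) is the 'adaptive' step;
    # non-adaptive quadrature keeps the nodes fixed at the prior (0, 1).
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(mu + np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)

# A smooth likelihood-like factor; the exact value of E[exp(b/2)] for
# b ~ N(2, 0.5^2) is exp(2/2 + 0.5^2/8) = exp(1.03125)
f = lambda b: np.exp(0.5 * b)
approx = gh_expect(f, 2.0, 0.5, 5)   # 5 centred nodes already suffice
```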
In application to the liver cirrhosis dataset, it was found that the restricted cu-
bic spline approach provided improved flexibility in capturing complex baseline hazard
functions when compared to a B-spline formulation with the same number of param-
eters, implying that I can obtain greater flexibility with fewer parameters. Of course,
B-spline functions of other degrees may in fact provide well-fitting models; however,
the results presented in this chapter have shown that they can produce unstable fitted
functions.
In this chapter I also described patient specific conditional survival predictions (Ri-
zopoulos, 2011); a particularly clinically relevant tool of the joint model framework.
[Figure: two panels ('Marginal survival function - 1 degree of freedom', 'Marginal survival function - 5 degrees of freedom'); survival probability against follow-up time (0–5), overlaid on the Kaplan-Meier estimate and its 95% confidence interval.]
Figure 7.7. Simulated data. Fitted marginal survival function from joint models with either 1 or 5 degrees of freedom, overlaid on the Kaplan-Meier survival curve.
It is important to note that such predictions require extrapolation of both the base-
line hazard function, and the longitudinal profile (under a time-dependent association
structure), often beyond the range of the data. Concentrating on extrapolating the
baseline hazard function, the RP framework described here is particularly appealing,
as I have shown that the baseline can be captured well, providing a sufficient number of
degrees of freedom are chosen, but also the restricted nature of the splines beyond the
boundary knots immediately provides a somewhat ‘sensible’ basis for extrapolation.
There are a multitude of extensions to this joint model framework. For example,
adaptation to incorporate a cure fraction can be achieved simply due to the restricted
linear basis for the final spline function. Imposing the constraint that the final spline
function beyond the last knot is constant has been implemented to allow for a cure
fraction in population-based cancer studies within the flexible parametric framework
(Andersson et al., 2011). Furthermore, extension to the competing risks setting by
modelling cause-specific hazards can be accommodated, introducing cause-specific as-
sociation parameters. The generalised linear mixed effects framework for the longitu-
dinal measures submodel can be adapted to handle categorical responses (Rizopoulos
and Ghosh, 2011). Finally, a Bayesian approach to the proposed model could be inves-
tigated and contrasted (Guo and Carlin, 2004). Many of these extensions are discussed
further in Chapter 11.
In application to the liver cirrhosis dataset, a single term of observation time pro-
vided sufficient flexibility to capture the shape of subject specific longitudinal trajecto-
ries; however, further flexibility could be investigated through the use of splines (Brown
et al., 2005; Rizopoulos and Ghosh, 2011), which will be investigated in Chapter 9.
During the review process for the accompanying paper to this chapter, found in
Appendix D, concerns were raised about ensuring the monotonicity of the cumulative
hazard function. In my experience, including all scenarios of the simulation study,
this is not a practical issue. If at any point in the estimation process the hazard
function goes negative, then the algorithm will fail. This was not observed in any
simulations, ensuring that valid cumulative hazard and subsequently survival functions
were estimated.
Implementation of the model is facilitated through user friendly Stata software
(Crowther, 2012), developed as part of this thesis, and described in more detail in
Chapter 10.
Table 7.1. Simulation results from Weibull scenario 1. Association is varied with α = -0.25, 0.25.

Parameter (true value)  Model        NAQ 5 nodes      NAQ 15 nodes     AQ 5 nodes
                                     Bias    95% CP   Bias    95% CP   Bias    95% CP

α = 0.25:
β0 (0)       FPM (df=1)  -0.001  66.5   -0.003  84.4   -0.003  95.6
             FPM (df=5)  -0.005  67.7   -0.002  84.6   -0.003  95.6
             B-spline     0.013  52.4    0.019  81.5    -       -
σ0 (1)       FPM (df=1)  -0.046  59.8   -0.002  85.6   -0.004  94.8
             FPM (df=5)  -0.046  60.5   -0.004  85.4   -0.004  94.6
             B-spline     0.012  -      -0.002  -       -       -
β1 (0)       FPM (df=1)  -0.014  70.3   -0.002  87.2   -0.001  94.8
             FPM (df=5)  -0.015  69.7   -0.002  86.4   -0.001  94.2
             B-spline    -0.027  56.6   -0.024  73.0    -       -
σ1 (0.25)    FPM (df=1)  -0.248  14.6   -0.018  93.0   -0.009  94.2
             FPM (df=5)  -0.247  14.2   -0.021  92.3   -0.009  94.2
             B-spline     0.022  -      -0.028  -       -       -
σ01 (0.25)   FPM (df=1)   0.001  70.1   -0.020  72.8    0.001  94.8
             FPM (df=5)   0.001  70.5   -0.022  71.9    0.001  94.8
             B-spline    -0.040  -      -0.018  -       -       -
δ (-0.25)    FPM (df=1)  -0.017  60.3   -0.004  81.2    0.001  93.8
             FPM (df=5)  -0.012  60.9   -0.009  80.6    0.001  93.8
             B-spline     0.003  49.0   -0.007  80.9    -       -
σe (0.5)     FPM (df=1)   0.068  30.5    0.000  95.0   -0.001  95.0
             FPM (df=5)   0.068  30.3    0.000  95.0   -0.001  95.0
             B-spline     0.096  -       0.008  -       -       -
ψ (0.25)     FPM (df=1)   0.005  94.0    0.007  94.4    0.007  94.0
             FPM (df=5)   0.002  94.0    0.002  93.8    0.006  93.6
             B-spline    -0.004  95.2   -0.002  94.6    -       -
α (0.25)     FPM (df=1)   0.004  95.2    0.005  95.4    0.005  96.0
             FPM (df=5)  -0.002  93.2   -0.003  91.8    0.005  95.2
             B-spline    -0.019  94.8   -0.012  95.6    -       -

α = -0.25:
β0 (0)       FPM (df=1)   0.001  63.2    0.006  82.6    0.002  92.6
             FPM (df=5)   0.000  64.0    0.004  82.4    0.002  92.6
             B-spline    -0.019  56.6   -0.015  80.9    -       -
σ0 (1)       FPM (df=1)  -0.048  62.4   -0.004  85.6   -0.008  95.0
             FPM (df=5)  -0.048  62.2   -0.004  84.4   -0.008  95.0
             B-spline     0.009  -      -0.005  -       -       -
β1 (0)       FPM (df=1)   0.010  71.2    0.000  88.2   -0.001  95.2
             FPM (df=5)   0.010  72.0    0.001  88.2   -0.001  94.6
             B-spline     0.025  55.8    0.022  72.9    -       -
σ1 (0.25)    FPM (df=1)  -0.246  17.6   -0.016  92.8   -0.010  95.2
             FPM (df=5)  -0.245  17.4   -0.012  92.6   -0.009  95.2
             B-spline     0.019  -      -0.029  -       -       -
σ01 (0.25)   FPM (df=1)   0.001  71.6    0.000  76.6    0.016  96.4
             FPM (df=5)   0.002  71.4   -0.002  76.6    0.015  96.2
             B-spline    -0.031  -      -0.004  -       -       -
δ (-0.25)    FPM (df=1)  -0.007  61.2   -0.008  80.4   -0.005  94.8
             FPM (df=5)  -0.008  61.8   -0.010  80.8   -0.005  94.8
             B-spline    -0.012  49.7   -0.012  79.7    -       -
σe (0.5)     FPM (df=1)   0.069  29.8    0.001  93.0    0.000  94.8
             FPM (df=5)   0.070  29.8    0.001  92.6    0.000  94.8
             B-spline     0.095  -       0.007  -       -       -
ψ (0.25)     FPM (df=1)   0.001  94.0    0.002  94.0    0.002  94.2
             FPM (df=5)   0.002  93.0   -0.002  92.8    0.002  93.8
             B-spline    -0.001  93.7   -0.003  94.0    -       -
α (-0.25)    FPM (df=1)  -0.004  96.0   -0.002  96.4   -0.002  96.2
             FPM (df=5)   0.001  95.6    0.008  93.8   -0.001  96.2
             B-spline     0.021  92.1    0.014  94.6    -       -

95% CP - 95% Coverage Probability, df - degrees of freedom, NAQ - Non-Adaptive Quadrature, AQ - Adaptive Quadrature, FPM - Flexible Parametric Model using restricted cubic splines
Table 7.2. Simulation results from mixture-Weibull scenario 2. Association is varied with α = -0.25, 0.25.

Parameter (true value)  Model        NAQ 5 nodes      NAQ 15 nodes     AQ 5 nodes
                                     Bias    95% CP   Bias    95% CP   Bias    95% CP

α = 0.25:
β0 (0)       FPM (df=1)  -0.007  64.2    0.002  82.1    0.001  94.8
             FPM (df=5)  -0.004  64.8    0.003  80.9    0.002  94.7
             B-spline     0.028  52.5    0.022  80.4    -       -
σ0 (1)       FPM (df=1)  -0.050  58.2   -0.002  86.5   -0.004  95.2
             FPM (df=5)  -0.050  58.8   -0.002  86.4   -0.004  95.3
             B-spline     0.011  -      -0.001  -       -       -
β1 (0)       FPM (df=1)  -0.012  67.0   -0.001  86.9    0.000  94.8
             FPM (df=5)  -0.013  66.8   -0.001  85.6   -0.001  94.3
             B-spline    -0.024  57.6   -0.022  70.9    -       -
σ1 (0.25)    FPM (df=1)  -0.250  11.2   -0.020  93.2   -0.010  94.6
             FPM (df=5)  -0.252  10.4   -0.020  92.9   -0.011  94.3
             B-spline     0.014  -      -0.028  -       -       -
σ01 (0.25)   FPM (df=1)  -0.007  68.2   -0.008  75.5    0.012  95.6
             FPM (df=5)  -0.012  68.6   -0.007  74.8    0.012  95.5
             B-spline    -0.022  -      -0.002  -       -       -
δ (-0.25)    FPM (df=1)   0.001  62.0   -0.007  79.9   -0.005  96.6
             FPM (df=5)   0.000  61.0   -0.008  78.5   -0.005  96.6
             B-spline    -0.005  50.9   -0.006  78.1    -       -
σe (0.5)     FPM (df=1)   0.068  27.2    0.000  95.0   -0.001  96.4
             FPM (df=5)   0.068  27.2    0.000  95.5   -0.001  96.4
             B-spline     0.101  -       0.006  -       -       -
ψ (0.25)     FPM (df=1)  -0.002  95.4   -0.002  95.6   -0.002  95.4
             FPM (df=5)  -0.003  95.4   -0.004  94.9   -0.003  95.1
             B-spline    -0.011  95.8   -0.012  94.9    -       -
α (0.25)     FPM (df=1)   0.001  93.6    0.000  93.4    0.000  93.4
             FPM (df=5)  -0.002  93.0   -0.002  91.7   -0.001  93.3
             B-spline    -0.025  90.1   -0.020  91.7    -       -

α = -0.25:
β0 (0)       FPM (df=1)  -0.010  64.3    0.003  81.1   -0.002  94.4
             FPM (df=5)  -0.011  63.3   -0.001  81.7   -0.002  94.4
             B-spline    -0.040  46.6   -0.019  78.1    -       -
σ0 (1)       FPM (df=1)  -0.042  60.7   -0.001  84.5   -0.003  94.8
             FPM (df=5)  -0.043  60.7   -0.001  85.3   -0.003  94.8
             B-spline     0.012  -       0.003  -       -       -
β1 (0)       FPM (df=1)   0.009  74.4    0.000  87.6   -0.001  96.4
             FPM (df=5)   0.009  74.8    0.001  88.2   -0.001  96.2
             B-spline     0.023  58.1    0.022  75.1    -       -
σ1 (0.25)    FPM (df=1)  -0.247  14.0   -0.018  95.6   -0.008  94.0
             FPM (df=5)  -0.248  13.4   -0.018  95.2   -0.008  94.0
             B-spline     0.017  -      -0.028  -       -       -
σ01 (0.25)   FPM (df=1)   0.002  67.7   -0.002  69.9    0.016  94.2
             FPM (df=5)   0.003  67.1   -0.003  70.1    0.017  94.4
             B-spline    -0.017  -      -0.002  -       -       -
δ (-0.25)    FPM (df=1)   0.014  56.3   -0.002  81.9    0.002  95.0
             FPM (df=5)   0.014  57.7    0.001  81.9    0.002  95.0
             B-spline     0.023  43.8   -0.003  79.7    -       -
σe (0.5)     FPM (df=1)   0.069  25.7    0.000  92.8   -0.002  93.8
             FPM (df=5)   0.069  25.0    0.000  92.8   -0.001  94.0
             B-spline     0.100  -       0.006  -       -       -
ψ (0.25)     FPM (df=1)  -0.004  94.4   -0.001  94.4   -0.002  94.4
             FPM (df=5)  -0.004  94.2   -0.007  94.2   -0.002  94.8
             B-spline    -0.004  95.2   -0.004  95.0    -       -
α (-0.25)    FPM (df=1)  -0.008  96.2   -0.004  96.4   -0.005  96.4
             FPM (df=5)  -0.005  94.0    0.003  93.0   -0.005  95.2
             B-spline     0.020  94.0    0.014  94.2    -       -

95% CP - 95% Coverage Probability, df - degrees of freedom, NAQ - Non-Adaptive Quadrature, AQ - Adaptive Quadrature, FPM - Flexible Parametric Model using restricted cubic splines
Table 7.3. Simulation results from Weibull scenario 3. Association is varied with α = -0.25, 0.25.

Parameter (true value)  Model        NAQ 5 nodes      NAQ 15 nodes     AQ 5 nodes
                                     Bias    95% CP   Bias    95% CP   Bias    95% CP

α = 0.25:
β0 (0)       FPM (df=1)   0.006  57.3   -0.001  78.5   -0.001  95.0
             FPM (df=5)   0.009  57.1    0.000  78.0   -0.005  94.9
             B-spline    -0.002  42.7    0.006  74.6    -       -
σ0 (1)       FPM (df=1)  -0.063  49.9   -0.010  79.5   -0.010  96.2
             FPM (df=5)  -0.063  50.3   -0.010  79.4   -0.011  95.6
             B-spline     0.006  -      -0.006  -       -       -
β1 (0)       FPM (df=1)   0.000  79.8    0.001  85.7    0.000  94.6
             FPM (df=5)   0.001  78.2    0.001  85.3   -0.002  94.1
             B-spline     0.000  68.1    0.000  83.0    -       -
σ1 (0.25)    FPM (df=1)  -0.217   6.6   -0.016  92.6   -0.003  93.4
             FPM (df=5)  -0.218   6.8   -0.016  92.3   -0.005  93.9
             B-spline     0.012  -      -0.004  -       -       -
σ01 (0.25)   FPM (df=1)  -0.046  20.6   -0.020  68.5    0.011  95.2
             FPM (df=5)  -0.046  20.4   -0.023  69.7    0.009  94.7
             B-spline     0.040  -       0.012  -       -       -
δ (-0.25)    FPM (df=1)   0.006  95.8    0.008  77.3    0.000  96.6
             FPM (df=5)  -0.012  94.8    0.007  76.6   -0.001  96.6
             B-spline     0.013  35.3    0.000  71.3    -       -
σe (0.5)     FPM (df=1)   0.073   9.8    0.003  93.0    0.000  93.8
             FPM (df=5)   0.073  10.0    0.003  92.9    0.000  94.3
             B-spline     0.133  -       0.010  -       -       -
ψ (0.25)     FPM (df=1)   0.006  95.8    0.007  95.6    0.006  95.4
             FPM (df=5)  -0.012  94.8    0.006  95.6   -0.012  95.1
             B-spline    -0.005  95.4   -0.002  95.8    -       -
α (0.25)     FPM (df=1)   0.010  94.0    0.004  94.6    0.004  94.4
             FPM (df=5)   0.001  93.2    0.001  93.7   -0.016  90.7
             B-spline    -0.025  96.0   -0.018  96.4    -       -

α = -0.25:
β0 (0)       FPM (df=1)   0.008  56.6    0.006  73.4    0.003  94.8
             FPM (df=5)   0.009  56.8    0.006  74.0    0.005  94.8
             B-spline     0.009  39.1    0.002  70.4    -       -
σ0 (1)       FPM (df=1)  -0.061  45.6   -0.009  75.8   -0.006  95.8
             FPM (df=5)  -0.061  45.8   -0.009  76.2   -0.007  95.2
             B-spline     0.011  -      -0.003  -       -       -
β1 (0)       FPM (df=1)   0.000  80.0    0.000  87.6   -0.001  95.2
             FPM (df=5)   0.000  80.4    0.000  88.2    0.000  95.4
             B-spline     0.000  70.9    0.000  87.8    -       -
σ1 (0.25)    FPM (df=1)  -0.219   5.2   -0.019  93.6   -0.004  94.3
             FPM (df=5)  -0.219   4.8   -0.018  93.2   -0.005  93.6
             B-spline     0.010  -      -0.005  -       -       -
σ01 (0.25)   FPM (df=1)  -0.036  23.4   -0.021  64.5    0.010  94.1
             FPM (df=5)  -0.034  23.4   -0.020  64.8    0.009  94.4
             B-spline     0.037  -       0.009  -       -       -
δ (-0.25)    FPM (df=1)  -0.011  52.6   -0.015  74.6   -0.009  94.8
             FPM (df=5)  -0.010  52.6   -0.015  75.4   -0.009  94.8
             B-spline    -0.023  36.3   -0.015  68.8    -       -
σe (0.5)     FPM (df=1)   0.073   8.6    0.002  93.4    0.000  94.3
             FPM (df=5)   0.074   8.2    0.003  93.6   -0.001  94.4
             B-spline     0.133  -       0.010  -       -       -
ψ (0.25)     FPM (df=1)  -0.006  95.2   -0.003  95.2   -0.003  95.2
             FPM (df=5)  -0.010  95.4   -0.009  94.8   -0.019  94.2
             B-spline     0.000  95.4   -0.001  95.4    -       -
α (-0.25)    FPM (df=1)  -0.007  95.0   -0.001  95.0    0.000  94.8
             FPM (df=5)   0.002  93.4    0.000  94.2    0.013  92.0
             B-spline     0.028  94.2    0.021  95.0    -       -

95% CP - 95% Coverage Probability, df - degrees of freedom, NAQ - Non-Adaptive Quadrature, AQ - Adaptive Quadrature, FPM - Flexible Parametric Model using restricted cubic splines
Table 7.4. Liver cirrhosis trial of prednisone treatment. Results from applying the RP restricted cubic spline and B-spline joint models.

Parameter             Joint model: RCS             Joint model: B-spline
                      Estimate  95% CI             Estimate  95% CI

Longitudinal:
t1                     0.872   (0.388, 1.356)       1.169   (0.694, 1.643)
t1*treatment           0.272   (-0.354, 0.899)      0.299   (-0.327, 0.925)
Intercept             75.252   (73.033, 77.471)    75.901   (73.745, 78.056)
sd(t1)                 2.333   (1.953, 2.786)       2.176   -
sd(intercept)         21.839   (20.152, 23.667)    21.327   -
corr(t1,intercept)     0.610   (0.467, 0.722)       0.567   -
sd(Residual)          17.612   (17.092, 18.148)    17.658   -

Survival:
Association           -0.038   (-0.045, -0.031)    -0.039   (-0.046, -0.031)
Treatment              0.210   (-0.038, 0.457)      0.241   (-0.007, 0.489)

where t1 = log(time + 0.00273); RCS - Restricted Cubic Splines, CI - Confidence Interval
Table 7.5. Liver cirrhosis trial of prednisone treatment. Results from joint models with varying knot locations.

Parameter            Knot locations A            Knot locations B            Knot locations C
                     Estimate  95% CI            Estimate  95% CI            Estimate  95% CI

Longitudinal:
t1                    0.877  (0.393, 1.361)       0.873  (0.390, 1.357)       0.874  (0.391, 1.358)
t1*treatment          0.275  (-0.351, 0.902)      0.275  (-0.352, 0.901)      0.273  (-0.353, 0.900)
Intercept            75.304  (73.088, 77.519)    75.261  (73.044, 77.479)    75.262  (73.045, 77.481)
sd(t1)                2.333  (1.954, 2.786)       2.331  (1.952, 2.784)       2.331  (1.951, 2.784)
sd(intercept)        21.843  (20.132, 23.700)    21.830  (20.143, 23.658)    21.829  (20.142, 23.657)
corr(t1,intercept)    0.609  (0.466, 0.721)       0.609  (0.466, 0.722)       0.609  (0.466, 0.722)
sd(Residual)         17.611  (17.091, 18.147)    17.613  (17.093, 18.148)    17.612  (17.093, 18.148)

Survival:
Association          -0.038  (-0.045, -0.031)    -0.038  (-0.045, -0.031)    -0.038  (-0.045, -0.031)
Treatment             0.209  (-0.038, 0.456)      0.212  (-0.035, 0.459)      0.210  (-0.037, 0.457)

where t1 = log(time + 0.00273)
CHAPTER 8
Joint modelling of longitudinal and survival data using
finite mixture models
8.1. Chapter outline
In this chapter I explore a different class of parametric survival submodels, incorpo-
rating finite mixtures of parametric distributions to model the baseline hazard flexibly
and on the log hazard scale, as an alternative to the Royston-Parmar submodel de-
scribed in the previous chapter which is modelled on the log cumulative hazard scale.
Through simulation I investigate the effect of misspecifying the baseline hazard func-
tion when modelling on the log hazard scale, and compare the finite mixture approach
to the Royston-Parmar approach through application to the liver cirrhosis dataset used
in the previous chapter and introduced in Chapter 3.
8.2. Introduction
In the previous chapter, I developed a flexible parametric framework for the joint
analysis of a longitudinal continuous biomarker and a time to event. Although the
RP based survival submodel has advantages in terms of flexibility and estimation,
by modelling on the log cumulative hazard scale, the interpretation of the association
parameter under a time-dependent association structure is not equivalent to that obtained when mod-
elling on the log hazard scale. Moving to the more traditional log hazard scale, whilst
maintaining flexibility, can overcome this.
In this chapter I propose to incorporate a finite mixture of parametric distribu-
tions as the survival submodel within the joint modelling framework. Finite mixture
models of this form have been previously shown to increase the flexibility of standard
time-to-event models (McLachlan and McGiffin, 1994), and mixture and non-mixture
cure models to obtain improved estimates of statistical cure (Lambert et al., 2010b).
A finite mixture of two Weibull distributions was described in Chapter 4 as a way to
simulate survival times with more biologically plausible baseline hazard functions in-
corporating turning points. This was successfully utilised in an extensive simulation
study that I was involved in, evaluating the Royston-Parmar model (Rutherford et al.,
2014), where we wished to simulate from a complex baseline hazard, and apply flexible
approximations using splines, rather than the true underlying model (in this case the
equivalence between proportional hazards and proportional cumulative hazard holds).
This was further utilised in Chapter 7, simulating data on the log cumulative hazard
scale to evaluate the RP based joint model.
In particular, in this chapter I aim to explore the impact of misspecifying the baseline
hazard function, specifically in terms of estimating the association parameter within a
joint model. In Chapter 4 Section 4.6.2, I conducted a small simulation study which
simulated survival times under a proportional hazards two-component mixture Weibull
model, and applied a Weibull proportional hazards model and the underlying mixture
model, monitoring the estimates of the log hazard ratio due to a binary covariate. It
indicated some bias in the estimate of the log hazard ratio under a Weibull model,
compared to unbiased estimates under the true mixture model.
In Section 8.3 I derive the two-component mixture survival submodel and describe
the appropriate likelihood formulation for a joint longitudinal-survival model under
this model specification. In Section 8.4 I conduct a simulation study, evaluating the
impact of misspecifying the baseline hazard function. In Section 8.5 I compare the two-
component mixture survival submodel to the Royston-Parmar submodel presented in
the previous chapter, in application to the liver cirrhosis dataset. Finally, in Section
8.6, I conclude the chapter with a discussion.
8.3. Methods
For clarity, I briefly re-define the notation. For the ith patient, I observe time-to-
event data, longitudinal response data and covariate data. Let Si be the true survival
time of patient i = 1, . . . , n, and Ti = min(Si, Ci) the observed survival time, with Ci
the censoring time. Define an event indicator di, which takes the value of 1 if Si ≤ Ci
and 0 otherwise. Let yij = yi(tij), j = 1, . . . ,mi denote the jth longitudinal response
measurements of a biomarker for the ith patient taken at times tij . Furthermore, I define
shared random effects, bi, which underpin the survival and longitudinal processes. Each
sub-model can be dependent on a set of covariates, Ui, which can potentially differ
between sub-models. Both censoring and time of measurements are assumed to be
non-informative.
8.3.1. Longitudinal sub-model
As in Chapters 6 and 7, I once again assume a continuous longitudinal biomarker of
interest, postulating a linear mixed effects model for the longitudinal sub-model
yi(tij) = mi(tij) + eij , eij ∼ N(0, σ2e) (8.1)
mi(tij) = xTi (tij)β + zTi (tij)bi + uiδ (8.2)
where xi and zi are design matrices for the fixed (β) and random (bi) effects of time.
Here I can incorporate flexibility by allowing both xi and zi to contain, for example,
either fractional polynomials or restricted cubic spline functions of measurement time
(Royston and Altman, 1994; Durrleman and Simon, 1989). ui ∈ Ui is a vector of
covariates, with corresponding regression coefficients, δ.
8.3.2. Two-component mixture proportional hazards models
I now derive the two-component mixture proportional hazards model. I begin with a
general definition, with components additive on the survival scale
S0(t) = pS01(t) + (1− p)S02(t) (8.3)
where S0(t) is the overall baseline survival function, with two component survival func-
tions S01(t) and S02(t), which can each take the form of the survival function of any
particular distribution, such as the exponential or Weibull. The parameter p is known
as the mixture parameter, with 0 ≤ p ≤ 1. The baseline hazard function is then
calculated as follows
h0(t) = d/dt [−log(S0(t))]   (8.4)
Proportional hazards can then be invoked, where
h(t) = h0(t) exp(Xiβ) (8.5)
with the linear predictor, Xiβ, constrained to have no intercept term. In this chapter I
concentrate on using a two-component mixture Weibull baseline hazard function, which
is defined as follows
S0(t) = p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)   (8.6)

where λ1, λ2 and γ1, γ2 are scale and shape parameters, respectively, with λ1, λ2, γ1, γ2 > 0.
Once again, p represents the mixing parameter, with 0 ≤ p ≤ 1. Transforming to the
cumulative hazard scale
H0(t) = −log [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (8.7)
Differentiating with respect to t, I obtain the baseline hazard function:
h0(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)]   (8.8)
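As a numerical sanity check on (8.8), the hazard should equal the negative derivative of log S0(t) from (8.6). The thesis implementation is in Stata; the following Python sketch is purely illustrative, with function names and parameter values of my own choosing:

```python
import math

def mix_weibull_surv(t, l1, g1, l2, g2, p):
    """Two-component mixture Weibull baseline survival, equation (8.6)."""
    return p * math.exp(-l1 * t**g1) + (1 - p) * math.exp(-l2 * t**g2)

def mix_weibull_haz(t, l1, g1, l2, g2, p):
    """Baseline hazard, equation (8.8), i.e. -d/dt log S0(t)."""
    num = (l1 * g1 * t**(g1 - 1) * p * math.exp(-l1 * t**g1)
           + l2 * g2 * t**(g2 - 1) * (1 - p) * math.exp(-l2 * t**g2))
    return num / mix_weibull_surv(t, l1, g1, l2, g2, p)

# A central finite difference of -log S0(t) should agree with the closed form
pars = dict(l1=0.2, g1=0.8, l2=1.0, g2=1.2, p=0.3)   # illustrative values
t, eps = 2.0, 1e-6
fd = (math.log(mix_weibull_surv(t - eps, **pars))
      - math.log(mix_weibull_surv(t + eps, **pars))) / (2 * eps)
assert abs(fd - mix_weibull_haz(t, **pars)) < 1e-6
```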
As was shown in Chapter 4, this hazard function can capture a variety of shapes,
including multiple turning points, providing the desired flexibility. This can be used as
the baseline with covariates acting multiplicatively, with proportional hazards induced
as follows
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(Xs(t)β)   (8.9)
where Xs(t) is a matrix of time-independent and/or time-dependent covariates, with
associated log hazard ratios, β. This form of survival model could be useful in standard
survival analysis, providing a flexible baseline function beyond standard choices such as
the Weibull. This model, along with a mixture Weibull-exponential model, is available
through the stmix command in Stata (Crowther and Lambert, 2011).
Within the joint modelling framework, the association structures described in Sec-
tion 6.4 of Chapter 6 are all applicable within the two-component mixture model, for
example, if I assume the association between the longitudinal and survival models acts
through the current value of the longitudinal response, as modelled by the longitudinal
submodel, and furthermore, define a vector of time independent baseline covariates,
vi ∈ Ui, I have
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(ψTvi + α1 mi(t))   (8.10)
where α1 is our association parameter and estimates the strength of the association
between the longitudinal and survival components, and ψ is a vector of log hazard
ratios corresponding to baseline covariates, vi.
Alternatively, interest may be in the association between the rate of change (slope)
of the biomarker trajectory, and how it relates to the time to event. This association
structure is formulated as
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(ψTvi + α2 m′i(t))   (8.11)
where m′i(t) = d mi(t)/dt is the first derivative of the longitudinal submodel and α2
estimates the strength of the association between the rate of change of the biomarker
trajectory and survival.
Finally, I can incorporate random coefficients, such as the intercept, into the linear
predictor of the survival submodel
h(t) = [λ1 γ1 t^(γ1−1) p exp(−λ1 t^γ1) + λ2 γ2 t^(γ2−1) (1 − p) exp(−λ2 t^γ2)] / [p exp(−λ1 t^γ1) + (1 − p) exp(−λ2 t^γ2)] × exp(ψTvi + α3 (β0 + b0i))   (8.12)
where by including the fixed intercept, β0, α3 corresponds to the log hazard ratio for
a one unit increase in the patient specific baseline (t = 0) value of the biomarker. The
above association structures were described in Chapter 6, along with further types; here I
concentrate only on the most commonly used, for illustration.
8.3.3. Maximising the full joint likelihood
I re-define the full likelihood here, formulated for a continuous biomarker and the time
to an event of interest, as follows,
∏_{i=1}^{n} ∫_{−∞}^{∞} [ ∏_{j=1}^{mi} f(yi(tij)|bi, θ) ] f(bi|θ) f(Ti, di|bi, θ) dbi   (8.13)

with

f(yi(tij)|bi, θ) = (2πσe²)^(−1/2) exp{ −[yi(tij) − mi(tij)]² / (2σe²) },   (8.14)

f(bi|θ) = (2π)^(−q/2) |Σ|^(−1/2) exp( −biᵀ Σ^(−1) bi / 2 ),   (8.15)

and

f(Ti, di|bi, θ) = h(Ti|bi, θ)^di × S(Ti|bi, θ)   (8.16)

where

S(Ti|bi, θ) = exp( −∫_0^Ti h(u|bi, θ) du )   (8.17)
where θ is our full parameter vector. I assume the random effects follow a multi-
variate normal distribution with variance-covariance matrix Σ. The hazard function,
h(Ti|bi, θ), is defined in Equation (8.10). The integral in Equation (8.17) can be
evaluated using k-point Gauss-Legendre quadrature, which was utilised in Chapters 4
and 5. When evaluating the full joint likelihood in Equation (8.13), I showed in the
previous chapter that non-adaptive Gauss-Hermite quadrature (Pinheiro and Bates,
1995) can lead to quite substantial under-estimation of standard errors, particularly of
parameters in the longitudinal sub-model, unless a large number (> 30) of quadrature
points are used. This also has the effect of increasing computation time substantially.
Alternatively, I recommend the routine use of fully adaptive Gauss-Hermite quadra-
ture which was shown to perform well with as few as 5 quadrature points (Crowther
et al., 2012a). As with all methods which use numerical integration, it is important to
establish the stability of estimates with an increasing number of quadrature nodes.
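To make the Gauss-Legendre step concrete: the k nodes on [−1, 1] are mapped to [0, T] and the weighted hazard evaluations are summed. For the mixture Weibull, the closed form H0(T) = −log S0(T) of (8.7) gives a convenient check on both the quadrature and its stability over node counts. A Python sketch (illustrative parameter values; not the thesis's Stata implementation):

```python
import numpy as np

def cum_haz_gl(hazard, T, k=15):
    """Approximate H(T) = int_0^T h(u) du with k-point Gauss-Legendre."""
    nodes, weights = np.polynomial.legendre.leggauss(k)
    u = 0.5 * T * (nodes + 1.0)            # map nodes from [-1, 1] to [0, T]
    return 0.5 * T * float(np.sum(weights * hazard(u)))

# Two-component mixture Weibull pieces, equations (8.6)-(8.8); values illustrative
l1, g1, l2, g2, p = 0.1, 3.0, 0.1, 1.5, 0.6
S0 = lambda t: p * np.exp(-l1 * t**g1) + (1 - p) * np.exp(-l2 * t**g2)
h0 = lambda t: (l1 * g1 * t**(g1 - 1) * p * np.exp(-l1 * t**g1)
                + l2 * g2 * t**(g2 - 1) * (1 - p) * np.exp(-l2 * t**g2)) / S0(t)

# The quadrature estimate should converge to the closed form -log S0(T) of (8.7)
T = 5.0
exact = -float(np.log(S0(T)))
for k in (5, 10, 15):                       # stability over node counts
    print(k, round(cum_haz_gl(h0, T, k), 5), "closed form:", round(exact, 5))
```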
The likelihood in Equation (8.13) can once again be maximised using the Newton-
Raphson technique with the first and second derivatives estimated numerically, as im-
plemented in the ml command in Stata (Gould et al., 2010). I obtain variance estimates
using the inverse of the negative Hessian matrix evaluated at the maximum likelihood
estimates.
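The non-adaptive Gauss-Hermite rule itself is simple to sketch: for b ∼ N(0, σ²), E[g(b)] ≈ π^(−1/2) Σ_k wk g(√2 σ xk), with (xk, wk) the Hermite nodes and weights; the adaptive version additionally recentres and rescales the nodes around each subject's posterior mode. An illustrative Python check on a case with a known answer (not the full adaptive scheme used in the thesis):

```python
import math
import numpy as np

def gh_expectation(g, sigma, k=5):
    """E[g(b)] for b ~ N(0, sigma^2) via k-point (non-adaptive) Gauss-Hermite."""
    x, w = np.polynomial.hermite.hermgauss(k)    # rule for weight exp(-x^2)
    return float(np.sum(w * g(math.sqrt(2.0) * sigma * x)) / math.sqrt(math.pi))

# Known answer: E[exp(b)] = exp(sigma^2 / 2), the mean of a lognormal
approx5 = gh_expectation(np.exp, sigma=1.0, k=5)
approx15 = gh_expectation(np.exp, sigma=1.0, k=15)
exact = math.exp(0.5)
```

Even with 5 nodes the approximation is close for this smooth integrand; the accuracy gap between 5 and 15 nodes gives a feel for the node-count stability check described above.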
8.4. Simulation study
I evaluate the proposed model through simulation. In previous chapters I have em-
phasised the benefit of simulating time-to-event data using biologically plausible base-
line hazard functions when assessing current and novel methodology (Crowther et al.,
2012a; Crowther and Lambert, 2012). This is particularly important when assessing
flexible and parametric methods.
The primary objective of this simulation study is to assess the finite mixture based
joint model under complex baseline hazard functions. I assume a random intercept
and slope model for a continuous longitudinal outcome, including a binary treatment
group, mi(t) = β0i + β1i t + δui, where β0i ∼ N(0, 1), β1i ∼ N(0, 0.25²) and correlation
between (β0i, β1i) of 0.25. I define the direct effect of treatment on the longitudinal
biomarker, δ = 0.25, with ui ∼ Bin(1, 0.5).
For each scenario, 400 patients were included in each of 500 replications. The
observed longitudinal profile was generated from yi(tij) ∼ N(mi(tij), 0.5²). Times of
measurement, tij, were fixed at (0, 1, 2, 3, . . . , 9, 10). Survival times were generated
from log(h(t|bi, ui)) = log(h0(t)) + αmi(t) + ψui, with h0(t) defined in Equation (8.8),
i.e. a mixture Weibull baseline hazard. Simulating such survival times required the
general simulation algorithm described in Chapter 4. Administrative censoring was
applied at 10 years. The direct treatment effect on the time-to-event, ψ, is fixed at
0.25, and the association parameter, α, takes the values −0.25 and 0.25.
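The general simulation algorithm of Chapter 4 draws U ∼ U(0, 1) and solves S(t) = U, i.e. H(t) = −log U, for t, combining numerical quadrature for the cumulative hazard with iterative root finding. A simplified single-subject Python sketch of this idea (bisection in place of Brent's method, and all parameter values illustrative; not the thesis's Stata implementation):

```python
import math
import random

random.seed(7)

# Scenario-1-style mixture Weibull baseline hazard (illustrative values)
l1, g1, l2, g2, p = 0.2, 0.8, 1.0, 1.2, 0.3
def h0(t):
    s1 = p * math.exp(-l1 * t**g1)
    s2 = (1 - p) * math.exp(-l2 * t**g2)
    return (l1 * g1 * t**(g1 - 1) * s1 + l2 * g2 * t**(g2 - 1) * s2) / (s1 + s2)

alpha, psi = 0.25, 0.25              # association and direct treatment effect

def cum_haz(h, t, n=100):
    """Midpoint-rule approximation to int_0^t h(u) du (never evaluates u = 0)."""
    step = t / n
    return step * sum(h((j + 0.5) * step) for j in range(n))

def sim_one(b0, b1, u_trt, t_max=10.0):
    """Draw one survival time by inverting S(t) = U, i.e. solving H(t) = -log U."""
    h = lambda t: h0(t) * math.exp(alpha * (b0 + b1 * t) + psi * u_trt)
    target = -math.log(random.random())
    if cum_haz(h, t_max) < target:
        return t_max, 0              # administratively censored at t_max
    lo, hi = 0.0, t_max
    for _ in range(40):              # bisection on H(t) - target
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cum_haz(h, mid) < target else (lo, mid)
    return 0.5 * (lo + hi), 1

t, d = sim_one(b0=random.gauss(0, 1), b1=random.gauss(0, 0.25), u_trt=1)
```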
Two combinations of mixture Weibull parameters were used: a mixture Weibull
with a single turning point in the baseline hazard function, λ1 = 0.2, γ1 = 0.8, λ2 = 1,
γ2 = 1.2, p = 0.3, and a mixture Weibull with 2 turning points and generally increasing
baseline hazard function, λ1 = 0.1, γ1 = 3, λ2 = 0.1, γ2 = 1.5, p = 0.6. I illustrate
the 2 baseline hazard functions used in Figure 8.1. Scenarios 1 and 2 were chosen to
broadly reflect the observed shape of hazard functions found in two commonly used
joint model example datasets (Anderson et al., 1993; Murtagh et al., 1994).
[Figure: two panels, Scenario 1 and Scenario 2, each plotting the hazard rate (0.0 to 0.8) against follow-up time (0 to 10 years).]

Figure 8.1. Baseline hazard functions used in the simulation study.
To each simulated dataset I apply the two-component mixture Weibull based joint
model, assessing bias and coverage of parameter estimates. 15-point Gauss-Legendre
quadrature is used to evaluate the cumulative hazard, with 5-point adaptive Gauss-
Hermite quadrature used to evaluate the joint likelihood. For comparison, I also apply
a Weibull based joint model so that any potential bias in the estimated hazard ratios
can be evaluated when fitting an overly simple model.
8.4.1. Simulation results
Table 8.1 contains bias and coverage estimates for the three parameters of interest,
the direct effect of treatment on survival, the direct effect of treatment on the longi-
tudinal marker, and the association parameter. Results show some substantial bias
in the association parameter when using a Weibull joint model compared to minimal
bias under the two-component mixture Weibull joint model. This is observed in both
scenarios with either a positive or negative association. For example under Scenario
1, with α = −0.25 the bias observed under a Weibull model was -0.103 (41.2%) with
coverage of 29.1%, compared to bias of -0.007 (2.8%) and coverage of 96.4% under the
mixture model. Moderate bias is also observed in the direct effect of treatment on sur-
vival (ψ) when fitting a Weibull model, compared to minimal bias under the mixture
model. For example in scenario 1 with α = −0.25, the bias observed in the estimates
of ψ was 0.059 (23.6%), compared to 0.008 (3.2%) under the mixture model.
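The bias, percentage bias and coverage figures above follow the usual simulation-study definitions; a small Python helper makes these explicit (function and variable names are my own, and the replicate values shown are toy numbers, not thesis results):

```python
import statistics

def sim_performance(estimates, ses, true_value, z=1.96):
    """Bias, percentage bias and coverage across simulation replicates."""
    bias = statistics.mean(estimates) - true_value
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, ses)]
    return {"bias": bias,
            "pct_bias": 100.0 * bias / true_value,
            "coverage": 100.0 * sum(covered) / len(covered)}

# Toy replicates: truth -0.25, three well-behaved estimates with SE 0.02
perf = sim_performance([-0.24, -0.26, -0.25], [0.02, 0.02, 0.02], -0.25)
```

Note that with a negative true value, a negative bias gives a positive percentage bias, consistent with "-0.103 (41.2%)" quoted above for α = −0.25.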
Table 8.1. Estimates of bias and coverage from the simulation study.

Scenario  Parameter  True value   Weibull             Mixture Weibull
                                  Bias     Coverage   Bias     Coverage
1         α           0.25       -0.067    43.7        0.003   96.6
          δ           0.25       -0.008    96.4       -0.007   96.4
          ψ           0.25        0.060    85.6        0.009   95.6
1         α          -0.25       -0.103    29.1       -0.007   96.4
          δ           0.25       -0.006    96.0       -0.007   96.0
          ψ           0.25        0.059    86.4        0.008   94.2
2         α           0.25        0.033    72.4        0.003   95.2
          δ           0.25       -0.008    95.8       -0.006   95.8
          ψ           0.25        0.059    83.0        0.007   93.8
2         α          -0.25       -0.063    44.4        0.019   89.2
          δ           0.25       -0.006    96.2       -0.007   96.0
          ψ           0.25        0.050    85.6        0.002   94.0
8.5. Application to liver cirrhosis data
I now apply the proposed model to the liver cirrhosis dataset, which was also used to
illustrate the RP based joint model in the previous chapter, allowing direct comparisons
to be made. I also apply a Weibull based joint model, as in the simulation study above.
I use a random intercept and random slope trajectory model for prothrombin index
over time, using the log time scale. I adjust for treatment in the survival sub-model
and the interaction between measurement time (log) and treatment in the longitudinal
model.
Results are presented in Table 8.2. Consistent parameter estimates were obtained
in all three models with 5- and 6-point adaptive quadrature, to 4 decimal places. From
the mixture-Weibull joint model I observe a statistically significant (at the 5% level)
association of -0.039 (95% CI: -0.047, -0.032), indicating a lower value of prothrombin
index increases the risk of death. I find general consistency across the Weibull, mixture
Weibull models and RP joint models, in terms of the common parameters, and in
particular the association parameter.
Comparing the log hazard scale models, I illustrate the improved fit of the mixture
Weibull approach compared to a standard Weibull through plots of the fitted marginal
survival function, overlaid on the Kaplan-Meier curve, presented in Figure 8.2. The
left hand fitted marginal survival curve clearly illustrates that the Weibull based joint
model is not flexible enough to capture the underlying hazard function, compared to
the mixture Weibull in the right hand plot, showing improved fit to the Kaplan-Meier
curve. This is emphasised by comparing AIC and BIC values from the model fits,
shown in Table 8.2.
[Figure: two panels plotting survival (0.0 to 1.0) against follow-up time (0 to 15 years), each showing the Kaplan-Meier curve with its 95% CI and the fitted marginal survival function.]

Figure 8.2. Liver cirrhosis trial of prednisone treatment. Fitted marginal survival functions. The left plot is from the Weibull based joint model, and the right from the mixture Weibull joint model.
8.6. Discussion
In this chapter, I have proposed a new time to event submodel for the joint model
framework, incorporating a two-component parametric mixture model. This formula-
tion overcomes limitations of standard parametric choices such as the exponential and
Weibull, which often lack the flexibility to capture complex hazard functions observed
in clinical datasets. In application to the trial dataset, I observed a much improved
fit to the baseline hazard function when compared to a standard Weibull based joint
model.
The simulation study conducted illustrated excellent performance of the model
in terms of bias and coverage probabilities. The adaptive Gauss-Hermite quadrature
performed well with as few as 5 quadrature nodes, supporting the results found in
the previous chapter (Crowther et al., 2012a). Furthermore, when simulating from
more biologically plausible baseline hazard functions incorporating turning points, I
observed bias in estimates of the association parameter and the direct treatment effect
on survival, when applying a Weibull based joint model.
In the previous chapter, I proposed a joint model which incorporated the flexible
parametric survival model as the survival submodel. This survival submodel is mod-
elled on the (log) cumulative hazard scale, which has direct computational benefits
by avoiding the nested Gauss-Legendre quadrature described in Section 8.3. However,
under time-dependent association structures the equivalence between log hazard ratios
and log cumulative hazard ratios does not always hold. Therefore by directly modelling
on the more traditional log hazard scale, the approach taken in this chapter, I avoid
this issue. In application to the liver cirrhosis data, I found very consistent estimates
of all parameter estimates across the modelling frameworks.
The mixture approach is not without limitations. An issue with such a mixture
of functions is that there are multiple maxima that a model could converge to. For
example, the same baseline hazard function in the described model could be constructed
if p = 0.2, or alternatively if p = 0.8, by swapping the scale and shape parameters
between the two components (so-called label switching). Furthermore, the question of
‘how many components?’ should be raised. In the example described in this chapter, it
is likely that two components are sufficient to capture the complexity of the baseline
hazard function; however, there could be situations where more than two components
may be required. This issue has been
considered from a Bayesian perspective in which the number of mixtures is treated as
a further parameter to be estimated (Demiris et al., 2011).
An alternative to the mixture approach is to use restricted cubic splines on the log
hazard scale, which I would recommend, as opposed to the RP model. This will be
discussed further in Chapter 10.
Table 8.2. Liver cirrhosis trial of prednisone treatment. Parameter estimates and 95% Confidence Intervals (CI) for the Weibull, mixture-Weibull and RP joint models.

Parameter            Weibull joint model        Mixture-Weibull joint model   RP joint model
                     Estimate (95% CI)          Estimate (95% CI)             Estimate (95% CI)
Longitudinal:
t1                   0.924 (0.444, 1.404)       0.909 (0.428, 1.391)          0.877 (0.393, 1.361)
Treatment*t1         0.269 (-0.353, 0.891)      0.269 (-0.354, 0.893)         0.276 (-0.351, 0.902)
Intercept            75.482 (73.274, 77.690)    75.248 (73.220, 77.639)       75.304 (73.089, 77.519)
sd(t1)               2.282 (1.897, 2.746)       2.304 (1.920, 2.766)          2.333 (1.964, 2.786)
sd(Intercept)        21.796 (20.088, 23.648)    21.805 (20.098, 23.658)       21.843 (20.132, 23.700)
corr(t1,Intercept)   0.616 (0.468, 0.730)       0.616 (0.470, 0.730)          0.609 (0.466, 0.721)
sd(Residual)         17.633 (17.110, 18.172)    17.628 (17.106, 18.166)       17.611 (17.091, 18.147)
Survival:
Association          -0.038 (-0.045, -0.031)    -0.039 (-0.047, -0.032)       -0.038 (-0.045, -0.031)
Treatment            0.194 (-0.050, 0.437)      0.194 (-0.050, 0.438)         0.209 (-0.038, 0.457)
log(λ1)              1.082 (0.606, 1.558)       -0.105 (-0.913, 0.703)        -
log(γ1)              -0.042 (-0.138, 0.053)     0.428 (0.210, 0.646)          -
log(λ2)              -                          1.586 (1.163, 2.009)          -
log(γ2)              -                          0.134 (-0.078, 0.346)         -
logit(p)             -                          -3.081 (-5.122, -1.040)       -
log likelihood       -14041.504                 -14031.385                    -14029.375
AIC                  28105.007                  28090.769                     28088.751
BIC                  28145.451                  28142.244                     28143.902

where t1 = log(t + 0.002737)
CHAPTER 9
Adjusting for measurement error in baseline prognostic
biomarkers: A joint modelling approach
9.1. Chapter outline
In this chapter I explore the use of a specific association structure within the joint
model framework, describing how joint modelling can be used to account for measure-
ment error in a biomarker, even when only the baseline value of the biomarker is of
interest. This is a common occurrence in registry data sources, where often repeated
measurements exist but are simply ignored. The proposed specification is evaluated
through simulation and applied to data from the General Practice Research Database,
investigating the association between baseline Systolic Blood Pressure (SBP) and the
time to stroke in a cohort of obese patients with type 2 diabetes mellitus.
This chapter is based on a recent publication in BMC Medical Research Methodol-
ogy, a copy of which can be found in Appendix E (Crowther et al., 2013b).
9.2. Introduction
Many biomarkers such as systolic blood pressure (SBP) have been identified as
key prognostic factors in the development and validation of cardiovascular risk scores
(Conroy et al., 2003; Hippisley-Cox et al., 2007). However, often only baseline values of
these biomarkers are used, for example measured at diagnosis, despite the existence of
repeated measures, especially in registry sources such as the General Practice Research
Database (GPRD) (GPRD, 2011; Ara et al., 2012). Prognosis from diagnosis is a
relevant research question and of interest to both patients and clinicians. Furthermore,
biomarkers are often measured with error. Failing to adjust for such measurement error
leads to estimates being biased towards the null (Prentice, 1982).
As I have described in the previous chapters, a joint model of longitudinal and
survival data allows us to investigate the relationship between a repeatedly measured
biomarker, subject to measurement error, such as SBP, and the time to an event of
interest, such as time to non-fatal stroke. The approach which has dominated the
methodological literature involves linking the two component sub-models using shared
random effects (Wulfsohn and Tsiatis, 1997; Henderson et al., 2000). These methods
require computationally intensive numerical integration, which is difficult to implement.
However, due to the recent introduction of user-friendly software in R (Rizopoulos, 2010;
Philipson et al., 2012) and Stata (Crowther et al., 2013a), these models are starting to
find their place in applied research (Wolbers et al., 2010; Ibrahim et al., 2010; Gould
et al., 2014), but the potential uses of and forms of the association parameters, linking
the longitudinal and survival components, are yet to be fully explored.
The most commonly used association structures include the current value param-
eterisation (Wulfsohn and Tsiatis, 1997), whereby I directly link the value of the
biomarker, as estimated by the longitudinal sub-model, to survival; and the first deriv-
ative or slope (Ye et al., 2008; Wolbers et al., 2010), allowing the investigation of the
effect that the rate of change of the biomarker has on survival.
There is often interest in predicting prognosis based on an initial baseline measure-
ment (Conroy et al., 2003; Hippisley-Cox et al., 2007). In this chapter I investigate
the use of the joint model framework with a random intercept association structure
as an approach to adjust for measurement error, inherent in biomarkers such as SBP.
By incorporating the repeated measures I thus make the most efficient use of the data
available.
In Section 9.3 I briefly re-formulate the joint model, whilst in Section 9.4 I conduct
a simulation study assessing the validity of the approach, varying the magnitude of
measurement error in the assumed biomarker, the magnitude of the association between
the biomarker and survival, and compare the joint model to the standard approach of
adjusting for observed baseline biomarker values in a survival model. In Section 9.5
I illustrate the approach with application to a cohort of 4,850 patients with type 2
diabetes mellitus, who have repeated measures of SBP and a time-to-event outcome
of non-fatal stroke. Here I model the longitudinal trajectories flexibly using restricted
cubic splines, and assess the impact that using simpler longitudinal trajectory models,
based on polynomials, has on estimates of the association parameter. I conclude the chapter in
Section 9.6 with a discussion.
9.3. Joint model formulation
As introduced in Chapter 6, a joint model of longitudinal and survival data consists
of two component submodels, the longitudinal submodel and the survival submodel. I
define a set of baseline covariates, Ui, which can potentially differ between submodels.
The longitudinal submodel allows us to model the trajectory of a repeatedly measured
biomarker over time, adjusting for baseline covariates. The standard approach assumes
a linear mixed effects model. I observe
yi(tij) = mi(tij) + εij , εij ∼ N(0, σ2e) (9.1)
with
mi(tij) = xTi (tij)β + zTi (tij)bi + uiδ (9.2)
where yi(tij) is the observed longitudinal response for the ith patient measured at
the jth time point. mi(tij) is our true trajectory function consisting of design matri-
ces xTi (tij) and zTi (tij) for the fixed and random effects, β and bi, respectively, where
bi ∼ MVN(0,Σ). I can incorporate flexibility here by allowing both xTi (tij) and zTi (tij)
to contain restricted cubic spline functions of time (Durrleman and Simon, 1989), de-
scribed further in Section 9.5. I also have a vector of baseline covariates ui ∈ Ui, and corresponding regression coefficients, δ. Finally, εij is our normally distributed
measurement error with constant variance σ2e .
The time-to-event submodel usually takes the form of a proportional hazards model
hi(t) = h0(t) exp(α1mi(t) +ψTvi) (9.3)
with h0(t) the baseline hazard function and vi ∈ Ui is a vector of baseline covariates
with corresponding log hazard ratios, ψ. The parameter α1 is commonly named the
association parameter, indicating the strength of association between the longitudinal
biomarker and the time to event. This parameterisation assumes the hazard is de-
pendent on the biomarker through its current value. This form of association is one
of many ways to link the two component sub-models. The baseline hazard function,
h0(t), can be modelled using a parametric distribution, most frequently the Weibull,
or more flexibly using Royston-Parmar survival models (Royston and Lambert, 2011),
or of course can be left unspecified (Henderson et al., 2000).
For illustration, I let mi(tij), the longitudinal submodel, be a linear function of
time
mi(tij) = (β0 + b0i) + (β1 + b1i)tij (9.4)
giving a model with a random intercept and random linear slope. As an alternative
way of linking the component models to that of Equation (9.3), I may link elements
of the trajectory function, mi(tij), to the hazard directly. For example, I can link the
subject specific baseline biomarker values through the intercept association structure,
where
hi(t) = h0(t) exp[α2(β0 + b0i) + ψTvi]   (9.5)
where α2 now estimates the strength of the association between the patient specific
baseline biomarker values, as estimated by the longitudinal submodel, and the time to
event. This way I can let the risk of event depend directly on the subject specific value
of the biomarker at time t = 0.
If interest lies in prediction when a new patient is observed at baseline, the issue
of measurement error can be accounted for through this approach. A benefit of this
association structure also lies in the evaluation of the joint likelihood. Under most
parametric survival submodels (e.g. Weibull distribution) and time-dependent asso-
ciation structures (e.g. current value), numerical quadrature is required to integrate
out not only the random effects, but under Equation (9.3), nested quadrature is also
required to evaluate the cumulative hazard function. Under the time-independent as-
sociation structure of Equation (9.5), I avoid this nested quadrature as the cumulative
hazard function has an analytically tractable form, which provides computational ben-
efits. For example under a Weibull baseline hazard function, the cumulative hazard
and subsequently the survival under the association structure shown in Equation (9.5)
are written as
Hi(t) = λ t^γ exp[α2(β0 + b0i) + ψTvi]   (9.6)

and

Si(t) = exp( −λ t^γ exp[α2(β0 + b0i) + ψTvi] )   (9.7)
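Because the linear predictor in (9.6) does not depend on t, the Weibull cumulative hazard integrates in closed form, which is exactly the computational benefit described above. A short Python sketch verifying the closed form against brute-force numerical integration (all parameter values are illustrative):

```python
import math

lam, gam = 0.1, 1.5        # Weibull baseline parameters (illustrative)
alpha2 = 0.25              # intercept association parameter
beta0, b0i = 0.0, 0.8      # fixed intercept and one subject's random intercept
psi_v = -0.5               # psi^T v_i evaluated for this subject

lp = math.exp(alpha2 * (beta0 + b0i) + psi_v)   # time-constant multiplier

def H_closed(t):
    """Equation (9.6): closed-form cumulative hazard."""
    return lam * t**gam * lp

def H_numeric(t, n=2000):
    """Midpoint-rule integral of h(u) = lam*gam*u**(gam-1)*lp, as a check."""
    step = t / n
    return step * sum(lam * gam * ((j + 0.5) * step)**(gam - 1) * lp
                      for j in range(n))

S_closed = math.exp(-H_closed(5.0))             # survival, equation (9.7)
```

No quadrature is needed for the survival contribution here, unlike the nested scheme required under the current value association of (9.3).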
As discussed in Section 9.2, this model formulation can be an alternative to the standard
approach of using the observed baseline biomarker value
hi(t) = h0i(t) exp(α3Y0i +ψTvi) (9.8)
where Y0i is the observed baseline biomarker value and α3 is the log hazard ratio for
a one unit increase in the observed baseline biomarker value. Although simple to fit,
Equation (9.8) does not account for potential measurement error in Y0i.
Equation (9.5) assumes, to some extent, that the longitudinal submodel has been
correctly specified, in that it is capturing the variability over time and providing sen-
sible estimates of the subject-specific intercept to feed into the survival component.
Robustness to this will be investigated in application to the GPRD cohort, in Section
9.5.1.
9.4. Simulation study
In order to assess the performance of the standard approach of including observed
biomarker values, compared to the full joint model described above, I evaluate both
through simulation, allowing an assessment of performance when I can fit the true data-
generating model. For ease of exposition I assume a longitudinal model with random
intercept and slope, assuming a continuous biomarker of interest with
mi(tij) = (β0 + b0i) + (β1 + b1i)tij
where β0 = β1 = 0, and b0i ∼ N(0, 1), b1i ∼ N(0, 0.25²) with correlation between
(b0i, b1i) of 0.25. Observed measurements are then generated from Yij ∼ N(mi(tij), σ2e),
where tij is the time of the jth measurement for the ith patient. I vary σe over
0.1, 0.5 and 1. I assume a Weibull baseline hazard function with λ = 0.1 and γ = 1.5. A binary
variable, v1i to represent treatment group was generated from Bin(1, 0.5), with an
associated log hazard ratio of φ1 = −0.5. A continuous covariate, v2i, to represent age at
baseline was generated from N(65, 12) with an associated log hazard ratio of φ2 = 0.01.
I then generate survival times from a Weibull distribution where the hazard is defined
as h(t) = h0(t) exp(α2(β0 + b0i) + φ1v1i + φ2v2i), with α2 the association parameter,
indicating the effect of a one unit increase in the value of the subject specific intercept
on the risk of event. I vary α2 over −0.5, −0.25, 0.25 and 0.5. Each simulation contained 300
patients with up to 5 annual measurements (including baseline), and administrative
censoring at 5 years. This corresponds to an approximate 18.9% survival proportion at
5 years (calculated at the mean of covariate values, v1 = 1/2, v2 = 65 and β0i = 0). To
each dataset I fit a Weibull proportional hazards model including the observed baseline
measurement, and a Weibull-based joint model with the random intercept association
structure. I adjust for age and treatment in the survival submodel. Each scenario is
simulated 1000 times.
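The longitudinal part of this data-generating mechanism can be sketched as follows, drawing the correlated random intercept and slope from their bivariate normal distribution (Python for illustration; function and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(42)

def sim_longitudinal(n=300, times=(0, 1, 2, 3, 4), sigma_e=0.5):
    """Simulate y_ij = b0i + b1i * t_ij + e_ij with correlated random effects."""
    sd0, sd1, corr = 1.0, 0.25, 0.25
    cov = np.array([[sd0**2, corr * sd0 * sd1],
                    [corr * sd0 * sd1, sd1**2]])
    b = rng.multivariate_normal([0.0, 0.0], cov, size=n)    # rows: (b0i, b1i)
    t = np.asarray(times, dtype=float)
    traj = b[:, [0]] + b[:, [1]] * t                        # m_i(t_ij), n x T
    y = traj + rng.normal(0.0, sigma_e, size=traj.shape)    # add measurement error
    return b, y

b, y = sim_longitudinal()
```

At baseline the observed variance is the sum of the intercept and error variances, which is what drives the attenuation of the naive baseline-value analysis as σe grows.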
To illustrate the varying measurement error standard deviations used in the simula-
tion scenarios, I show in Figure 9.1 observed longitudinal measurements from the same
100 patients with σe = 0.1, 0.5, 1, and when α = 0.25. Figure 9.1 illustrates that
as the measurement error standard deviation increases, the variability in the observed
biomarker values increases.
[Figure: three panels of simulated longitudinal response against time of measurement
(years), for σe = 0.1, 0.5 and 1.]

Figure 9.1. Example simulated observed longitudinal measurements with varying
measurement error standard deviation.
9.4.1. Results
Bias and coverage estimates for the association parameter are presented in Table 9.1.
Under the standard Weibull model, I observe increasing bias in the estimates of the
association between baseline biomarker values and survival, as the magnitude of the
measurement error standard deviation, σe, increases. In parallel I observe very poor
coverage probabilities under the Weibull approach. For example, with α = 0.5 and
σe = 1, I observe bias of -0.261 (percentage bias of -52.2%) and coverage of 0.4%.
In contrast, under the joint modelling approach I observe minimal bias and coverage
probabilities close to 95% across all scenarios.
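The bias and coverage figures quoted above follow the usual Monte Carlo definitions, which can be sketched as follows (an illustrative helper with my own names, not the thesis code):

```python
import numpy as np

def bias_and_coverage(estimates, std_errs, truth):
    """Monte Carlo bias and 95% CI coverage across simulation replicates."""
    est = np.asarray(estimates)
    se = np.asarray(std_errs)
    bias = est.mean() - truth
    lower, upper = est - 1.96 * se, est + 1.96 * se
    coverage = 100.0 * ((lower <= truth) & (truth <= upper)).mean()
    return bias, coverage

# Estimates centred on the truth should give roughly 95% coverage
rng = np.random.default_rng(0)
est = rng.normal(0.5, 0.05, size=1000)
bias, cov = bias_and_coverage(est, np.full(1000, 0.05), truth=0.5)
```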
9.5. Analysis of GPRD cohort
I now present the analysis of the GPRD cohort, first introduced in Chapter 3.
The cohort consists of 4,850 obese patients diagnosed with type 2 diabetes mellitus.
There are 107,347 measurements of SBP, with maximum follow-up of 22 years. There
were 278 stroke events. In all analyses I use SBP/10 so that a unit increase in SBP/10
represents a clinically meaningful 10 mm Hg increase in SBP. The primary interest is the
Table 9.1. Simulation results of bias and coverage of the association
parameter, α. The standard deviation of the measurement error is σe.

True value of α    σe     Weibull model         Joint model
                          Bias     Coverage     Bias     Coverage
 0.50              0.1    -0.001   94.8          0.005   95.3
                   0.5    -0.105   65.4          0.005   95.6
                   1.0    -0.261    0.4          0.008   94.8
 0.25              0.1     0.002   94.4          0.005   94.3
                   0.5    -0.046   89.0          0.007   94.5
                   1.0    -0.123   34.1          0.010   94.8
-0.25              0.1     0.003   93.8          0.001   94.0
                   0.5     0.051   87.1          0.000   94.2
                   1.0     0.127   29.7         -0.002   94.6
-0.50              0.1     0.000   96.6         -0.005   95.9
                   0.5     0.104   66.7         -0.006   95.7
                   1.0     0.260    0.4         -0.010   94.5
association between baseline SBP and the risk of stroke. Baseline (tij = 0) corresponds
to when each patient entered the cohort, i.e. the time of first SBP measurement.
In Figure 9.2 I show the observed SBP measurements for 9 randomly selected
patients, who had at least 10 measurements, illustrating some non-linear trajectories.
To accommodate such non-linearities I can use restricted cubic splines in the linear
mixed effects submodel. In particular, I specify the following longitudinal trajectory
submodel
mi(tij) = (β0 + b0i) + β1agei + β2sexi + β3BMIi
+ βF sF (tij ; kF ) + bRsR(tij ; kR) (9.9)
where sF(tij; kF) is the restricted cubic spline basis of measurement time with
corresponding fixed effects, βF, with knot locations kF, and sR(tij; kR) is the restricted
cubic spline basis of measurement time with corresponding random effects, bR, and
knot locations kR.
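For illustration, a restricted cubic spline basis of the kind used in sF and sR can be constructed as below. This sketch uses the Durrleman-Simon parameterisation, one standard choice; software implementations often additionally rescale or orthogonalise the basis, so it should not be read as a replica of the thesis code:

```python
import numpy as np

def rcs_basis(t, knots):
    """Restricted cubic spline basis: a linear term plus K-2 cubic terms,
    constrained to be linear beyond the boundary knots."""
    t = np.asarray(t, dtype=float)
    k = np.asarray(knots, dtype=float)
    pos = lambda x: np.maximum(x, 0.0) ** 3   # truncated power function
    cols = [t]
    for j in range(len(k) - 2):
        lam = (k[-1] - k[j]) / (k[-1] - k[-2])
        cols.append(pos(t - k[j]) - lam * pos(t - k[-2])
                    + (lam - 1.0) * pos(t - k[-1]))
    return np.column_stack(cols)
```

The defining property is that the cubic and quadratic terms cancel beyond the last knot, leaving each basis function linear there.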
Preliminary modelling of the longitudinal data can be conducted to guide model
selection, in particular, the degrees of freedom for the spline terms capturing the un-
derlying longitudinal trajectory over time. I begin by assuming a random intercept and
selecting the degrees of freedom for the fixed spline terms using the AIC and BIC. In
this case, both select five degrees of freedom for sF (tij ; kF ). For the random splines
of time I assume a linear term, which equates to one spline term for sR(tij ; kR). This
allows a very flexible form to take into account the variation in SBP over time. I further
adjust for age (years), sex (female as the reference group) and Body-Mass Index (BMI
in kg/m2) at baseline.
To allow flexibility in the survival submodel I use the flexible parametric Royston-
Parmar survival submodel, developed in Chapter 7 (Royston and Lambert, 2011; Roys-
ton and Parmar, 2002; Crowther et al., 2012a), which models the baseline log cumula-
tive hazard function using restricted cubic splines. I can once again undertake separate
analysis of just the survival data to inform model selection. In particular, I use the
AIC and BIC to select the number of degrees of freedom to capture the baseline hazard
function, following Rutherford et al. (2014). In this case, both selected two degrees of
freedom, with an AIC of 2408.7173 and BIC of 2430.483. If one degree of freedom had
been selected, then this would be equivalent to a Weibull survival model. My final
joint model is then
logHi(t) = logH0(t) + φ1agei + φ2sexi + φ3BMIi + α2(β0 + b0i) (9.10)
where
logH0(t) = s(log(t);γ,kS) (9.11)
where the baseline log cumulative hazard function, log(H0(t)), is expanded into a re-
stricted cubic spline function of log(t), s(log(t);γ,kS), with knot locations kS and
coefficient vector, γ. In each submodel I adjust for the baseline effects of age, sex and
BMI. I fit the joint model with the random intercept association structure shown in
Equation (9.5). For comparison, I also apply the standard Royston-Parmar survival
model, adjusting for observed baseline SBP, age, sex and BMI.
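Survival predictions under the Royston-Parmar model follow directly from the log cumulative hazard specification in Equations (9.10) and (9.11): S(t) = exp(−exp(log H(t))). A minimal sketch, with an arbitrary stand-in for the fitted spline (illustrative names only):

```python
import numpy as np

def rp_survival(t, spline_fn, lin_pred):
    """Survival under a Royston-Parmar model, where log H(t) is a spline
    function of log(t) plus a linear predictor. spline_fn is any callable
    s(log t); here just a stand-in for the fitted restricted cubic spline."""
    logH = spline_fn(np.log(t)) + lin_pred
    return np.exp(-np.exp(logH))

# With s(x) = log(lam) + gam * x the model reduces to a Weibull,
# so S(t) should match exp(-lam * t**gam)
lam, gam = 0.1, 1.5
S = rp_survival(5.0, lambda x: np.log(lam) + gam * x, 0.0)
```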
[Figure: nine panels of SBP against follow-up time (years) for Patients 1-9, with
observed and predicted SBP overlaid.]

Figure 9.2. GPRD cohort of obese patients with type 2 diabetes mellitus.
Longitudinal response measurements for SBP for 9 randomly selected patients who
had at least 10 measurements. The dashed line represents the fitted longitudinal
trajectories based on the joint model.
Results are presented in Table 9.2. Under the standard Royston-Parmar survival
model I observe a hazard ratio for a ten mm Hg increase in baseline SBP of 1.111
(95% CI: 1.051, 1.172). Under a joint model I observe an increased hazard ratio of
1.198 (95% CI: 1.107, 1.298). The increased effect using a joint model is consistent
with that observed in the simulation study, i.e. that the bias in the standard survival
model is towards the null. The fitted trajectories seen in Figure 9.2 appear to capture
the subject-specific measurements well, although some patients appear to only require
a linear trend.
Table 9.2. GPRD cohort of obese patients with type 2 diabetes mellitus.
Results from applying a Royston-Parmar proportional hazards model adjusting
for observed baseline systolic blood pressure, and a full joint model using
the intercept association structure. All models are adjusted for age (years),
sex (male) and BMI (kg/m2).

                         Standard RP model            Joint model
                         Coefficient  95% CI          Coefficient  95% CI
Survival model:
  Baseline SBP/10 (α2)    0.105   0.050,  0.159        0.181   0.102,  0.261
  Age (years)             0.048   0.036,  0.060        0.050   0.038,  0.062
  Sex                     0.011  -0.233,  0.254       -0.010  -0.254,  0.234
  BMI (kg/m2)             0.011  -0.015,  0.037        0.013  -0.012,  0.039
Longitudinal model:
  Intercept                 -         -               13.006  12.629, 13.382
  Age (years)               -         -                0.025   0.022,  0.029
  Sex                       -         -               -0.252  -0.332, -0.171
  BMI (kg/m2)               -         -                0.003  -0.005,  0.011
  RCS1                      -         -               -0.080  -0.121, -0.039
  RCS2                      -         -               -0.006  -0.019,  0.006
  RCS3                      -         -               -0.001  -0.010,  0.007
  RCS4                      -         -                0.003   0.000,  0.006
  RCS5                      -         -                0.000  -0.001,  0.001
  σe                        -         -                1.522   1.515,  1.528

RCS - Restricted Cubic Spline
I illustrate the bias from the standard approach in Figure 9.3, showing predictions
from both models for a female patient aged 60, with low (90), medium (130) and high
(200) SBP baseline measurements. To quantify the differences, at 10 years under the
standard model I observe a survival probability of 0.881 for a SBP of 200, compared to
0.770 under the full joint model.
9.5.1. Robustness to model specification. The association structure used to
link baseline SBP to the risk of stroke, relies on the estimated subject-specific baseline
value of SBP, as modelled by the longitudinal submodel. This will clearly be influenced
by how the longitudinal trajectory over time has been specified. To investigate the
sensitivity of this association structure, I repeat the analysis described in Section 9.5;
however, now assume a random intercept and random linear slope. This model is
clearly inadequate to capture the variability in SBP over time, as was indicated by
the selection of a restricted cubic spline function with 5 degrees of freedom in the
above application. Results are presented in Table 9.3.

[Figure: predicted survival probability against follow-up time (years), comparing the
FPM and joint models at SBP = 90, 130 and 200.]

Figure 9.3. Predicted survival from the Royston-Parmar flexible parametric model
(FPM) and joint model, for a female, aged 60 years, BMI of 30, with SBP of 90,
130 or 200.
Table 9.3. GPRD cohort of obese patients with type 2 diabetes mellitus.
Results from applying a joint model with random intercept and random linear
slope in the longitudinal submodel, using the intercept association structure.

                         Coefficient  Std. Err.    95% CI
Survival model:
  Baseline SBP/10 (α2)     0.145      0.040        0.066,  0.224
  Age (years)              0.051      0.006        0.038,  0.063
  Sex                     -0.010      0.124       -0.253,  0.233
  BMI (kg/m2)              0.014      0.013       -0.012,  0.039
Longitudinal model:
  Time                    -0.060      0.004       -0.067, -0.053
  Age (years)              0.024      0.002        0.021,  0.028
  Sex                     -0.259      0.041       -0.340, -0.179
  BMI (kg/m2)              0.000      0.004       -0.007,  0.008
  Intercept               13.218      0.191       12.843, 13.593
Comparing results from Table 9.3 to Table 9.2, I observe very consistent estimates
for all covariate effects between models, except the association parameter. Under the
random intercept and random linear slope longitudinal trajectory, I obtain an estimate
of association of 0.145 (95% CI: 0.066, 0.224), representing a hazard ratio of 1.156 (95%
CI: 1.068, 1.251). This is interpreted as a 15.6% (95% CI: 6.8%, 25.1%) increase in
the rate of stroke for a 10 unit increase in baseline systolic blood pressure. I compare
this to a hazard ratio of 1.198 (95% CI: 1.107, 1.298), representing a 19.8% (95% CI:
10.7%, 29.8%) increase in the rate of stroke for a 10 unit increase in baseline SBP, from
the more flexible spline longitudinal submodel. This will be discussed in Section 9.6. To
highlight the different model fits, the AIC and BIC from the linear trajectory model
were 415,250.38 and 415,308.42, respectively, and from the spline based model were
414,627.73 and 414,700.28, respectively, indicating a much poorer fit from the linear
trajectory based joint model.
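The AIC and BIC comparisons above use the standard definitions, sketched below (with n the effective sample size, whose choice — subjects, observations or events — is a convention that varies between settings):

```python
import math

def aic_bic(loglik, k, n):
    """Standard information criteria: AIC = -2*loglik + 2k,
    BIC = -2*loglik + k*log(n), with k the number of free parameters."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)
```

Smaller values of either criterion indicate a preferred model, as used for the model comparisons in the text.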
9.6. Discussion
A wealth of patient data is becoming available in registry sources such as the Clinical
Practice Research Datalink (the successor to the GPRD), providing extensive
opportunities to utilise the joint modelling framework. I have shown that by incorporating
repeated measures of a biomarker within a unified joint model framework, the bias due
to measurement error is minimised, even when only the baseline level of the biomarker
is predictive of survival. As illustrated in the simulation study, ignoring measurement
error in biomarkers such as blood pressure can lead to a marked under-estimation of
covariate effects. In the application, through the use of restricted cubic splines in the
linear mixed effects submodel, I can model highly nonlinear trajectories over time,
compared to linear slope models. Furthermore, the Royston-Parmar survival submodel
can also capture complex baseline hazard functions, an important component when
predicting survival at the patient level (Crowther et al., 2012a).
Given that, to my knowledge, all current cardiovascular risk scores only use base-
line measures, with no adjustment for measurement error, the prospects of utilising this
framework to improve prognostic risk scores are quite substantial. Predicting survival
for a new patient using this framework follows naturally, as often only a first baseline
biomarker observation will be available. However, such a modelling approach also al-
lows a dynamic risk prediction approach to be adopted, whereby a patient’s estimated
future risk is updated as each new biomarker value is obtained in real-time using elec-
tronic health records (Rizopoulos, 2011). Such an approach could enable response to
treatment to be monitored and patients counselled accordingly.
In the analysis of the GPRD cohort, I incorporated flexibility in both the longitu-
dinal submodel through the use of restricted cubic splines, and the Royston-Parmar
survival submodel. Given that both submodels require choosing the number of degrees
of freedom, a simple sensitivity analysis can be undertaken to assess knot locations
and number of knots. I showed in Chapter 7 that the Royston-Parmar survival sub-
model is very robust to both knot placement and number of knots within a joint model
framework (Crowther et al., 2012a), and furthermore, an extensive simulation study,
described in Chapter 2, has been conducted by Rutherford et al. (2014), which showed
excellent performance of the Royston-Parmar model to capture simple and complex
baseline hazard functions. Furthermore, given that primary interest was in the sur-
vival component, and the estimate of association, often modelling the longitudinal
component with a sensible functional form will provide an improved estimate
compared to simplistic approaches of separate modelling.
In sensitivity analysis, I investigated the influence that the functional form for the
longitudinal trajectory over time, has on the estimate of the association parameter. I
found a marked difference between the estimate of association from the model which
was chosen through use of the AIC and BIC to select the functional form (selecting a
restricted cubic spline function with 5 degrees of freedom for the fixed effects of time,
and a random linear slope), and an intentionally more simplistic model which assumed
a random intercept and fixed and random linear effect of time. Although the AIC and
BIC indicated a much improved fit from the selected spline based model, compared to
the simplistic linear model, caution must be taken when assessing model fit for each
component of a joint model. This is an area which requires further research.
In this chapter I have concentrated on a specific association structure linking the
two component submodels; however, it may be of interest to investigate linking multiple
components of a biomarker's trajectory to the time to an event of interest. For example,
recent work by Rothwell et al. (2010) has shown associations between not only baseline
blood pressure, but also variability over time as important predictors of cardiovascular
events. Furthermore, I have only compared the standard approach of adjusting for
observed baseline biomarker values to the full joint model. It would be of interest
to compare alternative approaches for adjusting for measurement error, not only in
baseline biomarkers, but also under a time-dependent association structure (Zucker,
2005; Liao et al., 2011).
Extensions to the modelling framework include incorporating multiple biomarkers.
In particular, in this chapter I modelled SBP over time, whilst adjusting for baseline
BMI. It may be of interest to model not only SBP but also the inter-relationships
between different biomarkers such as BMI, and how they are related to an event of
interest (Rizopoulos and Ghosh, 2011). A further issue with an observational dataset
in particular, is the choice of appropriate timescale. The baseline time point used here
is time of first measurement of SBP; however, when estimating survival probabilities,
this does not provide an intuitive, clinically meaningful way of conveying survival to
patients. More appropriate perhaps, would be to use age as the timescale; however,
this requires delayed entry to be incorporated into the joint model framework, which
is not currently available in any software package. I discuss this extension further in
Chapter 11.
An issue which has perhaps made researchers apprehensive about adopting a joint
modelling approach, particularly to registry based datasets, is the computational bur-
den. To illustrate computational aspects of the framework, the presented joint model
applied to the cohort of 4,850 patients and a total of 107,347 observations took just
over 13 minutes to converge on an HP Intel Core i5 2.5GHz computer with 8GB of
RAM, which illustrates that this form of analysis is becoming entirely feasible.
CHAPTER 10
Software for the joint modelling of longitudinal and
survival data
10.1. Chapter outline
In this chapter I describe one of the central components of this thesis: the Stata
package stjm, which implements the class of joint models that have been described in
the previous four chapters.
The first version of the software was published in The Stata Journal, a copy of
which can be found in Appendix F (Crowther et al., 2013a). I describe the latest
version of the software, consisting of approximately 15,000 lines of code, which includes
developments and extensions since publication, with the software available from the
Statistical Software Components (SSC) archive, which can be downloaded by typing
ssc install stjm from within Stata.
10.2. Introduction
A central theme of this thesis is to not only develop novel methodology for the
analysis of complex survival data, but in parallel, develop and disseminate Stata soft-
ware packages that can be readily used by statistical and medical researchers. Within
the Stata user community, this is generally done by depositing packages at the Statisti-
cal Software Components (SSC) archive, or through publication in The Stata Journal.
Both of these resources allow any Stata user to download the required package.
The uptake of new methodology within applied research can often be hampered
through a lack of ‘user friendly’ software, with sufficient documentation that researchers
can apply the methods with confidence. In this chapter, I describe the Stata package
stjm, which has been developed alongside the work described in Chapters 6 to 9. It
allows the joint modelling of a longitudinal continuous response and the time to an event
of interest, using the linear mixed effects framework as the longitudinal submodel, and
the proportional hazards or cumulative hazards framework for the survival submodel.
10.3. Data setup
I illustrate stjm through application to a dataset of 312 patients with Primary
Biliary Cirrhosis (PBC), from Murtaugh et al. (1994), first introduced in Chapter 3. 158
patients were randomised to receive D-penicillamine, with 154 assigned to a placebo.
Serum bilirubin was measured repeatedly at intermittent time points. I investigate the
effect of treatment after adjusting for the relationship between serum bilirubin levels
and time to death. A total of 140 deaths were observed. Due to right skewness, in all
analyses I work with log(serum bilirubin).
The dataset must be correctly stset for use with stjm, through the use of start and
stop times. The purpose of the stset command is to declare it to be multiple-record
per subject survival data, which invokes sets of internal error checks to ensure the data
set has been set up correctly, and creates core variables such as _d, the event indicator;
_t0, the start times; and _t, the stop times. The dataset must be stset in order for
stjm to be used, which leaves less room for user error. Through this, stjm can use _t0
as the measurement times, and the final row of _t as the survival time. I illustrate the
data structure below:
. use fullpbc, clear
. stset stop, enter(start) f(event=1) id(id)
                id: id
     failure event: event == 1
obs. time interval: (stop[_n-1], stop]
 enter on or after: time start
 exit on or before: failure

     1945 total obs.
        0 exclusions

     1945 obs. remaining, representing
      312 subjects
      140 failures in single failure-per-subject data
 2000.307 total analysis time at risk, at risk from t = 0
                             earliest observed entry t = 0
                                  last observed exit t = 14.30566
. list id logb drug _t0 _t _d if id==3 | id==5, noobs sepby(id)
id logb drug _t0 _t _d
3 .3364722 D-penicil 0 .48187494 0
3 .0953102 D-penicil .48187494 .99660498 0
3 .4054651 D-penicil .99660498 2.0342789 0
3 .5877866 D-penicil 2.0342789 2.7707808 1
5 1.223776 placebo 0 .54484725 0
5 .6418539 placebo .54484725 1.070529 0
5 .9162908 placebo 1.070529 2.1054649 0
5 1.740466 placebo 2.1054649 3.0062425 0
5 1.648659 placebo 3.0062425 3.9836819 0
5 2.944439 placebo 3.9836819 4.1205783 0
Here I have 2 patients with 4 and 6 measurements of log(serum bilirubin), respectively.
The variable _t0 can now be used to denote the times that measurements were
taken, and the final row (for each patient) of _t represents the survival time. The
data can be explored using the supplementary command stjmgraph (which can be in-
stalled by typing ssc install stjmgraph), which creates joint longitudinal-survival
plots. The syntax is simple: I state the command name, the name of the longitudinal
outcome and the panel identifier variable. I also use the lowess option to overlay a
lowess smoother to aid interpretation, and the adjust option which scales the x-axis, by
taking away each patient’s observed survival/censoring time from the time of measure-
ments. A variety of further standard graphical options are also allowed in the syntax
of stjmgraph, to edit for example axis titles and labels.
. stjmgraph logb, panel(id) lowess adjust
Figure 10.1 displays all patients’ longitudinal trajectories against time, across event
status, with the timescale adjusted by subtracting each patient's survival/censoring
time. If the dataset was particularly large, I could restrict the plotted sample using
the if or in qualifiers. There appears to be a general increasing trend which is much
sharper in patients who died, compared to those who were censored. This is indicative
of a positive association between the longitudinal response and time to death, whereby
a higher level of the biomarker appears to be associated with a shorter time to death.
[Figure: two panels of longitudinal response against time before censoring/event,
for censored patients and for patients who died.]

Figure 10.1. Longitudinal profiles of log(serum bilirubin) for patients who were
censored/died. Timescale is adjusted by taking away each patient's survival time.
10.4. Model fitting
I now describe the main options which control the appropriate submodel specifi-
cations. A joint model can be fitted with essentially a single line of code, under the
following general syntax:
. stjm longdepvar [varlist] [if] [in] , panel(varname)
> survmodel(survsubmodel) [options]
The core components include declaring the longdepvar; the continuous longitudinal
response variable, panel(); the variable which contains a unique identifier for each
patient, and survmodel(); the type of survival submodel (see Section 10.4.2). Variables
specified in [varlist] pertain to baseline covariates to include in the longitudinal
submodel specification.
10.4.1. Longitudinal submodels
The specification of how to model the longitudinal response over time must be exclu-
sively handled through options, as under a time-dependent association structure the
cumulative hazard function is obtained using numerical integration, integrating over
follow-up time, and therefore stjm needs to know the exact specification of the time-
dependent fixed and random design matrices. To maintain flexibility, there are two
core ways to model the longitudinal outcome over time.
The first is to use polynomials using the options ffp(numlist) and rfp(numlist).
ffp() specifies power transformations of the time variable, to be included in the
longitudinal submodel as fixed covariates, with _t0 used as the time of measurements.
Values must be in −5,−4,−3,−2,−1,−0.5, 0, 0.5, 1, 2, 3, 4, 5. rfp() specifies power
transformations of the time variable, to be included in the longitudinal submodel as
fixed and random covariates. By default, a random intercept term is assumed in all
stjm models.
Alternatively, restricted cubic splines can be used through the options frcs(#) and
rrcs(#). By default, internal knot locations are placed at equally spaced centiles of
the distribution of observation times, with boundary knots placed at the minimum and
maximum observation times. frcs() specifies the degrees of freedom of the time vari-
able, expanding time into restricted cubic splines as fixed covariates. rrcs() specifies
the degrees of freedom of the time variable, expanding time into restricted cubic splines
as random covariates.
The variance-covariance structure of the random effects can be specified using the
covariance() option, with four choices available: independent specifies a distinct
variance for each random effect, with all covariances zero; exchangeable specifies
equal variances for all random effects and one common pairwise covariance; identity
specifies equal variances for all random effects, with all covariances zero; and
unstructured specifies that all variances and covariances are distinctly estimated. If
covariance() is left empty, then covariance(unstructured) is assumed.
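The four structures can be illustrated by constructing the implied variance-covariance matrices directly. This is a sketch with my own names; for the exchangeable and unstructured cases a single off-diagonal value is used purely for brevity:

```python
import numpy as np

def re_covariance(structure, sds, cov=0.0):
    """Illustrative construction of the four covariance() choices."""
    sds = np.asarray(sds, dtype=float)
    q = len(sds)
    diag = np.eye(q, dtype=bool)
    if structure == "identity":
        # equal variances, zero covariances
        return sds[0] ** 2 * np.eye(q)
    if structure == "independent":
        # distinct variances, zero covariances
        return np.diag(sds ** 2)
    if structure == "exchangeable":
        # equal variances, one common pairwise covariance
        return np.where(diag, sds[0] ** 2, cov)
    if structure == "unstructured":
        # distinct variances (off-diagonals here filled with one value
        # for illustration; in estimation each is a free parameter)
        return np.where(diag, sds ** 2, cov)
    raise ValueError("unknown structure: " + structure)
```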
Finally, interactions between baseline covariates and fixed effects of measurement
time can be specified using the timeinteraction(varlist) option. This allows dif-
ferent slopes for different covariate patterns.
10.4.2. Survival submodels
A variety of survival submodel choices are available, many of which were described in
earlier chapters. I briefly list them here for completeness, assuming the current value
parameterisation for illustration. Baseline covariates can be included in all survival
submodels, through the survcov(varlist) option.
10.4.2.1. Exponential
An exponential baseline hazard can be assumed under the following model
hi(t) = λ exp(ψTvi + αmi(t)) (10.1)
which can be specified using the survmodel(exp) option.
10.4.2.2. Weibull
A Weibull baseline hazard can be assumed under the following model
hi(t) = λγtγ−1 exp(ψTvi + αmi(t)) (10.2)
which can be specified using the survmodel(weibull) option.
10.4.2.3. Gompertz
A Gompertz baseline hazard can be assumed under the following model
hi(t) = λ exp(γt+ψTvi + αmi(t)) (10.3)
which can be specified using the survmodel(gompertz) option. I believe this was
the first implementation of the Gompertz model within a joint modelling framework
(Crowther et al., 2013a).
10.4.2.4. Restricted cubic splines on the log hazard scale
A restricted cubic spline baseline hazard can be assumed under the following model
hi(t) = exp(s(log(t)|k0,γ) +ψTvi + αmi(t)) (10.4)
which can be specified using the survmodel(rcs) option, with the degrees of freedom
defined using the df(#) option. By default, the spline terms are orthogonalised, which
can be suppressed using the noorthog option.
10.4.2.5. Flexible parametric model
The Royston-Parmar survival submodel described in Chapter 7 can be assumed
Hi(t) = exp(s(log(t)|k0,γ) +ψTvi + αmi(t)) (10.5)
which can be specified using the survmodel(fpm) option, with the degrees of freedom
defined using the df(#) option, or a vector of internal knots can be defined using
knots(numlist). By default, the spline terms are orthogonalised, which can be sup-
pressed using the noorthog option.
10.4.2.6. 2-component mixture models
The two-component mixture Weibull model, described in Chapter 8, is defined under
the following model
hi(t) = [p λ1γ1t^(γ1−1) exp(−λ1t^γ1) + (1 − p) λ2γ2t^(γ2−1) exp(−λ2t^γ2)]
        / [p exp(−λ1t^γ1) + (1 − p) exp(−λ2t^γ2)]
        × exp(Xiβ + αmi(t))                                            (10.6)
which can be specified using the survmodel(weibweib) option.
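As a numerical check of Equation (10.6), the mixture hazard can be coded directly; setting p = 1 collapses it to the standard Weibull hazard of Equation (10.2). This is an illustrative sketch, not the stjm implementation:

```python
import numpy as np

def mixture_weibull_hazard(t, lam1, gam1, lam2, gam2, p, lin_pred=0.0):
    """Two-component Weibull mixture hazard: the mixture density over the
    mixture survival function, scaled by exp(linear predictor)."""
    s1 = np.exp(-lam1 * t ** gam1)                 # component survivals
    s2 = np.exp(-lam2 * t ** gam2)
    num = (p * lam1 * gam1 * t ** (gam1 - 1.0) * s1
           + (1.0 - p) * lam2 * gam2 * t ** (gam2 - 1.0) * s2)
    den = p * s1 + (1.0 - p) * s2
    return num / den * np.exp(lin_pred)

# p = 1 gives the plain Weibull hazard lam1 * gam1 * t**(gam1 - 1)
h = mixture_weibull_hazard(2.0, 0.1, 1.5, 0.2, 0.5, 1.0)
```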
10.5. Association structures
All of the association structures that were introduced in Chapter 6 are implemented
in stjm. By default, stjm assumes the current value association structure; however,
this can be turned off by specifying the nocurrent option. Interaction effects, as
described in Section 6.5, can be specified using the assoccovariates(varlist) op-
tion, and the slope association can be used by specifying the derivassociation op-
tion. The random effects pameterisations can be specified using either the intassoc
option, which includes the subject-specific intercept, or the association(numlist)
option where numlist must be one of the numbers specified in rfp(numlist). By
default, intassoc and association() include the fixed effect (which generally leads
to a more interpretable association parameter); however, this can be turned off using
the nocoefficient option.
Any combination of the association structures described above can be specified
within the same joint model, providing a highly flexible framework with a variety of
ways to investigate the relationship between the two processes.
10.6. Estimation
As described in Chapter 6, evaluating the likelihood of a joint model is a computa-
tionally challenging task, requiring advanced numerical integration techniques. Within
Stata, similar techniques have been implemented within the Generalised Linear Latent
And Mixed Models (GLLAMM) package (Rabe-Hesketh et al., 2002), and are now part of
official Stata with the release of version 13 (StataCorp., 2013), with the introduction
of the new suite of me commands, implementing multilevel mixed effects models, and
the gsem suite to fit generalised structural equation models.
The default numerical integration technique is adaptive Gauss-Hermite (GH) quad-
rature, described in Chapter 6. Following Rabe-Hesketh et al. (2002), I implement the
adaptive GH quadrature at the beginning of each full Newton-Raphson iteration, under
the following scheme:
• Obtain starting values for θ0. These are obtained by fitting a separate linear
  mixed effects model, obtaining patient-specific predictions from the mixed
  effects model, and including them as a time-varying covariate within a standard
  survival model (this assumes the current value association, but is appropriately
  altered under different association structures). This is a computationally
  efficient way of getting excellent starting values for the full joint model.
• Repeat for k = 1, 2, . . . until the Newton-Raphson iterations have converged:
  – Predict the subject-specific posterior means and standard deviations,
    b_i^k and Ω_i^k:
    ∗ Predict the subject-specific posterior means and standard deviations,
      b_i^(k,0) and Ω_i^(k,0), using adaptive quadrature based on the previous
      estimates, b_i^(k-1) and Ω_i^(k-1)
    ∗ Repeat for j = 1, 2, . . . until convergence:
      · Predict the subject-specific posterior means and standard deviations,
        b_i^(k,j) and Ω_i^(k,j), using adaptive quadrature based on the
        previous estimates, b_i^(k,j-1) and Ω_i^(k,j-1)
  – Update the parameter estimates to θ^k using adaptive quadrature with
    b_i^k and Ω_i^k
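The basic (non-adaptive) Gauss-Hermite step underlying this scheme can be sketched as follows; the adaptive version recentres and rescales the nodes using the posterior quantities above (illustrative names, not the stjm code):

```python
import numpy as np

def gh_expectation(f, mu=0.0, sigma=1.0, n=15):
    """Gauss-Hermite approximation of E[f(b)] for b ~ N(mu, sigma^2),
    via the change of variable b = mu + sqrt(2)*sigma*x."""
    x, w = np.polynomial.hermite.hermgauss(n)
    nodes = mu + np.sqrt(2.0) * sigma * x
    return float((w * f(nodes)).sum() / np.sqrt(np.pi))

# Closed-form check: E[exp(b)] = exp(0.5) when b ~ N(0, 1)
val = gh_expectation(np.exp)
```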
Various options are available to control the maximisation process, including the
numerical approximations used to evaluate the joint likelihood. These include the
number of quadrature points used for each dimension of the random effects, gh(#), with
a default of 5 under adaptive quadrature and 15 under non-adaptive quadrature. The
tolerance used when updating the quadrature nodes can be controlled using atol(#)
(the j iterations in the above scheme), with a default of 1E-08.
10.6.1. A note on calculating the cumulative hazard function
As described in Chapter 6, under time-dependent association structures and a para-
metric baseline hazard function, the estimation of a joint model requires numerical
integration to evaluate the cumulative hazard function, and subsequently the survival
function required to calculate the joint likelihood. This adds to the computational de-
mands of the estimation process, given that this set of numerical integration is nested
within possibly multi-dimensional Gauss-Hermite quadrature, to integrate out the random
effects. However, this is not always necessary: if only time-independent association
structures are specified, stjm detects this and the nested numerical
integration is avoided when fitting a standard parametric baseline hazard function, such
as the exponential, Weibull or Gompertz. In testing on the liver cirrhosis trial dataset,
this can reduce computation time by up to half.
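As a concrete sketch of this inner integral, the following Python snippet (illustrative only; plain Gauss-Legendre in place of the Gauss-Kronrod rule used by stjm, and a Gompertz hazard chosen so the analytic answer is available for checking) evaluates the cumulative hazard and hence the survival function:

```python
import numpy as np

# 15-node Gauss-Legendre rule on [-1, 1]
x, w = np.polynomial.legendre.leggauss(15)

def cum_hazard(hazard, t):
    """Approximate H(t), the integral of h(u) over [0, t], by quadrature."""
    u = 0.5 * t * (x + 1.0)              # map nodes from [-1, 1] to [0, t]
    return 0.5 * t * np.sum(w * hazard(u))

lam, alpha = 0.1, 0.3                    # illustrative Gompertz parameters
hazard = lambda u: lam * np.exp(alpha * u)

t = 5.0
H = cum_hazard(hazard, t)                # numerical cumulative hazard
S = np.exp(-H)                           # survival function for the likelihood
H_exact = lam / alpha * (np.exp(alpha * t) - 1.0)
```

Under a time-dependent association this evaluation sits inside every Gauss-Hermite node, which is why detecting time-independent structures and skipping the nested integration is worthwhile.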
10.7. Joint modelling of longitudinally measured serum bilirubin and time
to death
To illustrate the ease with which a joint model can be fitted, I apply a joint model
to the PBC dataset. I model the longitudinal process using a linear trajectory model,
with random intercept and slope, adjusting for the interaction between fixed linear
189
time and treatment group. I model the survival process using a Weibull proportional
hazards survival submodel, adjusting for treatment group. I use the default current
value association and the default unstructured form for the random effects variance-
covariance matrix.
. stjm logb, panel(id) rfp(1) timeinterac(trt) survm(weibull) survcov(trt)
-> gen double _time_1 = X^(1)
-> gen double _time_1_trt = trt * _time_1
(where X = _t0)
Obtaining initial values:
Fitting full model:
-> Conducting adaptive Gauss-Hermite quadrature
Iteration 0: log likelihood = -1923.9358
Iteration 1: log likelihood = -1919.2078
Iteration 2: log likelihood = -1919.1856
Iteration 3: log likelihood = -1919.1855
Joint model estimates                        Number of obs.    =      1945
Panel variable: id                           Number of panels  =       312
Number of failures = 140
Log-likelihood = -1919.1855
Coef. Std. Err. z P>|z| [95% Conf. Interval]
Longitudinal
_time_1 .1826073 .0183264 9.96 0.000 .1466883 .2185264
_time_1_trt .0045744 .0244713 0.19 0.852 -.0433885 .0525373
_cons .4927346 .0582861 8.45 0.000 .3784959 .6069733
Survival
assoc:value
_cons 1.24083 .0931223 13.32 0.000 1.058314 1.423347
ln_lambda
trt .0407589 .179847 0.23 0.821 -.3117347 .3932525
_cons -4.409684 .2740596 -16.09 0.000 -4.946831 -3.872537
ln_gamma
_cons .0188928 .0827694 0.23 0.819 -.1433322 .1811178
Random effects Parameters Estimate Std. Err. [95% Conf. Interval]
id: Unstructured
sd(_time_1) .1806879 .0123806 .1579812 .2066583
sd(_cons) 1.002541 .0426659 .9223098 1.089751
corr(_time_1,_cons) .4256451 .0728762 .2730232 .5573664
sd(Residual) .3471453 .0066734 .334309 .3604745
Longitudinal submodel: Linear mixed effects model
Survival submodel: Weibull proportional hazards model
Integration method: Adaptive Gauss-Hermite quadrature using 5 nodes
Cumulative hazard: Gauss-Kronrod quadrature using 15 nodes
The first component of the output indicates the creation of core variables, gener-
ated by stjm, which are included in the linear predictor of the longitudinal submodel.
The next statement indicates that stjm is obtaining initial values, which means it is
currently fitting a two-stage model to get starting values for the full joint model. It
then proceeds to fit the full joint model using adaptive GH quadrature. The adaptive
iterations could be shown by specifying the showadapt option. The model converges
within three Newton-Raphson iterations and the results table is produced, showing the
longitudinal and the survival parameters.
The results table indicates a population average baseline value of the biomarker of
0.493 (95% CI: 0.378, 0.607), with an average statistically significant increasing linear
trend in the control group of 0.183 (95% CI: 0.147, 0.219). This increasing trend is
estimated to be slightly higher in the treatment group, though the difference is not statistically significant (estimated difference in linear trend of 0.005 (95% CI: -0.043, 0.053)).
The association parameter is labelled under assoc:value, indicating the current
value association structure, with an estimated coefficient of 1.241 (95% CI: 1.058,
1.423), which equates to a hazard ratio of 3.459 (95% CI: 2.882, 4.151) for a one-
unit increase in log serum bilirubin at time t, indicating a strong positive, statistically
significant association structure. I also observe a non-statistically significant direct ef-
fect of treatment on survival, with a log hazard ratio of 0.041 (95% CI: -0.312, 0.393).
As I fitted a Weibull baseline hazard function, baseline covariates are included in the
linear predictor of log(λ) under proportional hazards. The baseline parameters for the
Weibull survival submodel are presented on their parameterised scales (i.e. the log
scale), which can be exponentiated: λ = 0.012 and γ = 1.019.
The final part of the results table shows the estimated variance-covariance ma-
trix, which by default is presented showing standard deviations and correlation coeffi-
cients. It can be changed to show variances and covariances by simply entering stjm,
variance following a model fit to produce the output table again, or the variance
option when the model is fitted. The estimate of the residual measurement error is
shown as sd(Residual).
Finally, some summary text is presented to indicate which submodels have been
chosen, and the type and number of quadrature nodes for both the integrals over the
random effects, and the integral to calculate the cumulative hazard.
10.8. Post-estimation
A variety of predictions can be calculated following the fitting of a joint model using
stjm. Predictions follow the standard Stata syntax of:
. predict newvarname, statistic [options]
Fitted values for both the longitudinal response and the survival component (hazard,
cumhazard, survival) are available, either based on the fixed component of the model,
xb, or the fixed and empirical Bayes estimates of the random effects, fitted. A va-
riety of residuals can also be calculated: residuals, the response minus the fitted values for the longitudinal submodel; rstandard, standardised longitudinal residuals; martingale, martingale-like residuals for the survival component; and deviance, deviance residuals for the survival component. Empirical Bayes estimates of
the random effects can also be calculated using the reffects option, along with their
standard errors using the reses option.
Extended predictions can also be obtained using the at() option, which specifies
particular values of covariates at which to calculate predictions. This was used to create
Figure 9.3 in Chapter 9, predicting survival at specific covariate patterns. Furthermore,
baseline predictions can be calculated using the zeros option, which sets all baseline
covariates to be zero.
10.9. Discussion
In this chapter I have described the implementation in Stata of the joint mod-
elling framework described in earlier chapters. Through extensive model choices and
association structures, it provides a highly flexible framework for both the longitudinal
submodel, through the use of fractional polynomials or splines, and the survival sub-
model. Through the implementation of adaptive Gauss-Hermite quadrature, accurate
estimates of effect can be obtained using a much reduced number of quadrature nodes,
resulting in substantial computational benefits.
Although the methodological developments I have described throughout the thesis
are all implemented in Stata, some excellent alternatives to stjm exist in the statis-
tical software R. The first is joineR (Philipson et al., 2012), which implements joint
longitudinal-survival models using the Cox model as the survival model of choice, im-
plementing the model of Henderson et al. (2000) and provides a variety of plotting
functions and variograms. Alternatively, there is the extensive JM package, which im-
plements many of the standard joint models described in this thesis, and also allows the
Cox model to be used. It also includes many extensions such as the conditional survival
predictions of Rizopoulos (2011), and multiple-imputation based residuals (Rizopoulos
et al., 2010).
The stjm package is constantly being updated and improved, with extensions planned to include competing risks, a cure proportion, and categorical longitudinal
responses, to name a few. Future developments will be described in Chapter 11.
CHAPTER 11
Discussion
11.1. Chapter outline
In this chapter I conclude the thesis with a summary of the methodological de-
velopments described in previous chapters, whilst also considering the limitations of
the work. I further discuss the variety of future extensions, many currently under
investigation, that could build on the work that has been presented.
11.2. Introduction
In this thesis, I have developed methodology for both the simulation and analysis
of complex survival data, including the framework of joint models for longitudinal and
survival data. The methods have been assessed through simulation and application to
a variety of clinical datasets, illustrating the broad range of clinical topics that can
utilise the methodology.
In Section 11.3, I present a summary of the thesis, describing the main components
of each chapter. In Section 11.4, I place the work of the thesis in context by considering
the limitations of the work presented. In Section 11.5 I describe possible extensions
to the methodological frameworks described, including ongoing areas of research, mo-
tivated by clinically important questions. I make some final conclusions in Section
11.6.
11.3. Summary of the thesis
The following is a broad summary of the developments that have been presented
in this thesis. In Chapter 2 I described the foundations of survival analysis, which
formed the basis for extensions and developments in later chapters. Concentrating in
particular on parametric approaches to survival analysis, I described the fundamental
relationships which can provide clinically useful information in relating complex sta-
tistical concepts to patients. In Chapter 3 I described six motivating datasets, from
a variety of clinical areas, which were utilised throughout the thesis to illustrate the
methodological developments. Chapter 4 described the development of a general algo-
rithm to simulate survival times, providing a widely applicable framework with which to
simulate both simple and complex survival data. The framework was motivated by a
trial in breast cancer therapy, exhibiting a turning point in the baseline hazard function,
which is a commonly observed phenomenon in cancer trials, with mortality often peak-
ing shortly after the beginning of a trial before tailing off. The proposed algorithm can
simulate survival times from scenarios incorporating anything from extremely complex
underlying hazard functions, flexible time-dependent effects, time-varying covariates of
any type, and frailty terms, to name a few. This algorithm used numerical integration
nested within iterative root-finding, providing a computationally efficient framework for simulation in practice; for example, it has recently been used in a study I was involved in to investigate the performance of methods to adjust for treatment switching
in oncology trials (Latimer et al., 2014). Furthermore, I am involved in the extension
to simulate complex competing risks data, combining the general simulation algorithm
with the approach of Beyersmann et al. (2009). The numerical integration described
and used in Chapter 4, was then directly employed in Chapter 5, to develop a general
framework for the parametric analysis of survival data. Through the use of numerical
integration, a survival model can be fitted, with essentially any reasonable specification
for the hazard or log hazard function. This allowed the use of complex functions, such
as splines or fractional polynomials, to model not only the baseline hazard or log hazard
function, but time-dependent effects, and time-varying covariates. Delayed entry (left
truncation) was also accommodated within the framework. The first article relating to
Chapter 5 has been downloaded over 4,200 times since publication (Crowther and Lam-
bert, 2013b). The methods were applied to an observational dataset investigating the
effect of deprivation status on breast cancer survival, in both standard and relative sur-
vival frameworks, in particular highlighting the ability to model time-dependent effects
flexibly using restricted cubic splines. I further showed how the framework can be com-
bined with cluster robust standard errors to account for a hierarchical data structure,
in patients with multiple bladder cancer recurrences. The methods have since been
used in a National Institute for Health and Care Excellence (NICE) commissioned re-
port on total hip replacement and surface replacement for the treatment of pain and
disability resulting from end stage arthritis of the hip (Clarke et al., 2013), which I
was asked to collaborate on. The hip replacement example exemplifies the usefulness
of the general framework implemented in stgenreg as a development tool, whereby
we wished to fit a bathtub hazard model, a model unavailable in standard software, which could nevertheless be fitted in a single line of code using stgenreg. Subsequently I wrote
an improved estimation routine which implemented the bathtub hazard model analyt-
ically. This echoes the improvements I illustrated in Chapter 5, through the combined
analytic/numeric approach when using restricted cubic splines on the log hazard scale,
implemented in strcs. In Chapter 6 I introduced the field of joint modelling of lon-
gitudinal and survival data, focusing on the frequentist approach and the modelling
of a single continuous longitudinal response and the time to an event of interest. I
described the foundations of the field and approaches previously conducted in the lit-
erature which motivated the extensions described in later chapters. In particular, in
Chapter 7, I extended the joint model framework to incorporate the parametric survival
model of Royston-Parmar, which provided a highly flexible framework to capture com-
plex baseline hazard functions, and provided computational benefits by modelling on
the log cumulative hazard scale. I showed improved performance of the method when
compared to a previous equivalent formulation which used cubic B-splines. Further to
this, I conducted an extensive simulation study which evaluated and compared the per-
formance of non-adaptive and fully adaptive Gauss-Hermite quadrature in calculating
the joint likelihood. This showed the clear superiority of fully adaptive quadrature,
both in terms of precision but also computational efficiency. I illustrated the proposed
model by investigating the association between repeatedly measured prothrombin in-
dex and survival, in patients with liver cirrhosis, indicating the improved fit over more
simple parametric approaches, and improved stability in estimates when compared to
the B-spline approach. Although modelling on the log cumulative hazard scale has its
advantages, the disadvantage of losing the equivalence between log cumulative hazard
ratios and log hazard ratios under a time-dependent association structure, motivated
the model framework of Chapter 8. In this chapter, I incorporated a mixture of two
parametric distributions as the survival submodel, returning to the log hazard scale,
concentrating on a two-component mixture Weibull model. Through simulation I illus-
trated that misspecifying the baseline hazard function can lead to bias in the estimates
of the association parameter and log hazard ratios of baseline effects. In Chapter 9, I
described the use of a specific association structure, with a novel application to registry
based data from the General Practice Research Database, investigating the association
between baseline systolic blood pressure (SBP) and the time to stroke, utilising the
repeated measures of SBP that are often simply ignored. Through simulation I showed
the reduction in bias compared to using observed baseline values of a biomarker, pro-
viding an effective way to account for measurement error. Finally in Chapter 10, I
described the implementation of the Stata package stjm, written in parallel to the
methodological developments presented in Chapters 7 to 9, for the fitting of a joint
longitudinal-survival model. A variety of model specifications were described providing
the user with great flexibility to model both the longitudinal trajectory, the survival
outcome, and how they are linked.
11.4. Limitations
It is important to consider the limitations and possible pitfalls of any analytical
method, to place in context the appropriate use of advanced statistical techniques.
The techniques described throughout Chapters 4 to 10 require at least one form of
numerical integration, be it across the infinite or finite domain. By definition, numerical
integration is a way of approximating an analytically intractable integral, and any such approximation carries a degree of error. Within the simulation framework of Chapter
4, there are instances where complex scenarios can be chosen that can avoid the need
for numerical integration and root-finding, such as was proposed by Austin (2012), to
simulate time-varying covariates under particular functional forms and baseline hazard
functions. Furthermore, if interest is in relative effects through fitting a Cox model,
where the baseline hazard is not estimated, then simulating from a simple parametric
form for the baseline, such as an exponential, should be preferred.
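For reference, the combination of quadrature and root-finding under discussion can be sketched as follows: draw u ~ U(0, 1) and solve H(t) = -log(u) for t, with H(t) itself evaluated numerically. This is an illustrative Python sketch (bisection standing in for a Brent-type root-finder, and a Weibull hazard chosen so the inversion can be checked analytically), not the published implementation.

```python
import numpy as np

x, w = np.polynomial.legendre.leggauss(30)   # quadrature rule on [-1, 1]

def cum_hazard(h, t):
    """H(t) = integral of h(u) over [0, t], by Gauss-Legendre quadrature."""
    u = 0.5 * t * (x + 1.0)
    return 0.5 * t * np.sum(w * h(u))

def simulate_time(h, u, t_max=100.0, tol=1e-10):
    """Solve S(t) = u, i.e. H(t) = -log(u), by bisection on [0, t_max]."""
    target = -np.log(u)
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cum_hazard(h, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Weibull hazard with lambda = 0.1, gamma = 2, so the inversion is checkable:
# H(t) = 0.1 t^2, giving the analytic solution t = sqrt(-log(u) / 0.1)
h = lambda t: 0.1 * 2.0 * t
t = simulate_time(h, u=0.5)
```

In a simulation study u would be drawn repeatedly; the point of the nesting is that the same machinery applies unchanged however complex the hazard becomes.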
The general survival modelling framework developed in Chapter 5 also relies on in-
tegrating the hazard function using numerical quadrature to obtain the cumulative haz-
ard and consequently the survival function, for estimation. Alternatives to this, which
avoid the need for numerical integration include the reduced piecewise-exponential ap-
proach of Han et al. (2014), which combines a likelihood ratio test to identify shifts in
the hazard rate, a backward elimination procedure, and an optional presumed order
restriction on the hazard rate. However, this approach cannot model covariate effects,
as yet. Through fine splitting of the timescale, a piecewise exponential approach
can be advantageous, particularly as estimates closely match those from a Cox model
(Clayton and Hills, 1993; Crowther et al., 2012b). In particular, estimates will be iden-
tical to a Cox model if splitting at unique event times (Whitehead, 1980; Carstensen,
2004). A piecewise approach, however, loses the continuous-time modelling nature of
the approach I developed in Chapter 5, which has particular advantages in terms of
prediction and extrapolation. Investigating improved numerical integration techniques
would be advantageous, such as using non-parametric or simple parametric estimates of
the hazard function to inform the nodes and weights of the numerical quadrature in an
attempt to place and weight them more appropriately.
A criticism of the frequentist approach to joint modelling is its reliance on numerical
quadrature to integrate out the random effects, in order to calculate the likelihood. It
is always recommended that any analyses relying on such approximations ensure con-
sistent estimates by using an increasing number of quadrature nodes (Rabe-Hesketh
et al., 2002). Many authors have proposed a Bayesian approach to joint modelling,
which avoids the need for numerical quadrature (Faucett and Thomas, 1996; Brown
and Ibrahim, 2003a; Ibrahim et al., 2004; Taylor et al., 2013). Often the choice be-
tween frequentist and Bayesian approaches is a philosophical one; however, the ease of
implementation of a Bayesian joint model can be seen as an advantage over more dif-
ficult to implement frequentist approaches, with vague prior distributions chosen with
the aim of recovering estimates that closely match the maximum likelihood equivalent.
This approach does not fully exploit the possible benefits of a Bayesian approach, where
genuine prior information can be incorporated. This aspect will be discussed further
in Section 11.5.
The choice of how to link the longitudinal and survival components, within a joint
model, has so far largely been guided by clinical opinion, or standard model selection
criteria, such as the AIC and BIC. Given a joint model consists of two inter-related
models, it is of interest to explore selection techniques which partition the influence
that particular components have on each aspect of a fitted model. This issue was
highlighted in Chapter 9, showing the impact that an inappropriate functional form for
the longitudinal trajectory over time can have on estimates of association.
Throughout the thesis, I have emphasised the benefits of using restricted cubic
splines for modelling both the baseline hazard function, and time-dependent effects.
An open criticism of splines is in the selection of knot locations. I have adopted the
most commonly used approach, which is to use centiles of the appropriate distribution
of observed data, for example, centiles of the uncensored (log) survival times, with the
degrees of freedom chosen by the user (Rutherford et al., 2014). Sensitivity analyses to
such selection must be conducted in order to establish suitability of functional form, and
robustness to misspecification (Royston and Lambert, 2011). Alternative estimation
routines such as penalised likelihood have been developed to incorporate a trade-off
between the data fit and the smoothness of the functions (Rondeau et al., 2003, 2012),
which could be explored within the spline based models described in this thesis.
11.5. Future work
There are a variety of extensions to the work described in this thesis. The general
parametric survival framework described in Chapter 5 can be extended for competing
risks, and further to multi-state models. Competing risks occur commonly in medical
research, where patients are at risk of experiencing more than one type of event, with
the occurrence of one event preventing the occurrence of others (Putter et al., 2007).
A typical example is partitioning the mortality rate into death due to cancer, death
due to cardiovascular disease, and death due to other causes (Eloranta et al., 2012). In
this instance, the quantity that is generally of most interest is the cumulative incidence
function (CIF), which can be defined as the integral from time 0 to t, of the kth cause-
specific hazard multiplied by the overall survival function, representing the probability
of failing from cause k before time t. If utilising the general parametric framework,
which will provide smooth continuous time estimates of the CIF, then this function
will also require numerical integration to calculate it. This can be considered a post-
estimation task, as has been described previously (Lambert et al., 2010a; Hinchliffe and
Lambert, 2013). This extension is currently under investigation. Building on this is the
area of multi-state modelling, particularly for the analysis of large-scale cohort studies,
often using multiple linked databases, and how this could be used to more appropriately
populate decision models to allow for competing risks and correlation structures. A
further extension of interest is incorporating frailty terms into the general parametric
framework, to account for heterogeneity in baseline risk and covariate effects, which
are commonly used in areas such as individual patient data meta-analyses, and multi-
centre clinical trials. I have recently incorporated this extension into simple parametric
models and the Royston-Parmar survival model (Crowther et al., 2014).
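Since the CIF described above is itself an integral, CIF_k(t) = ∫_0^t h_k(u) S(u) du, its post-estimation computation is another quadrature problem. A minimal Python sketch follows (constant cause-specific hazards, chosen only so the analytic answer is available for checking; this is not the proposed implementation):

```python
import numpy as np

x, w = np.polynomial.legendre.leggauss(15)

def integrate(f, a, b):
    """Gauss-Legendre approximation of the integral of f over [a, b]."""
    u = 0.5 * (b - a) * (x + 1.0) + a
    return 0.5 * (b - a) * np.sum(w * f(u))

# Two constant cause-specific hazards (illustrative values only)
h1 = lambda u: np.full_like(u, 0.10)          # cause of interest
h2 = lambda u: np.full_like(u, 0.05)          # competing cause
S = lambda u: np.exp(-(0.10 + 0.05) * u)      # overall survival, analytic here

t = 5.0
cif1 = integrate(lambda u: h1(u) * S(u), 0.0, t)
# analytic check for constant hazards: (h1/(h1+h2)) * (1 - exp(-(h1+h2) t))
cif1_exact = (0.10 / 0.15) * (1.0 - np.exp(-0.15 * t))
```

In the general parametric framework S(u) would itself come from a (nested) quadrature of the total hazard, rather than the closed form used here.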
Despite the extensive methodological development of joint models within the bio-
statistical literature in the past 20 years, there are many aspects which require further
work. With the growing availability of registry based data sources (Jutte et al., 2011),
comes the potential for more complex and rigorous investigations of the relationships
between biomarkers and how they relate to prognosis. For example, I am currently
investigating the added benefit of jointly modelling the relationship between systolic
blood pressure over time and the time to cardiovascular outcomes, in a cohort of patients
who have experienced their first cardiovascular disease event. This will be extended to
incorporate multiple biomarkers, such as serum cholesterol, where I need to account for
the expected correlation between biomarkers over time. This motivates the extension
to handle multiple biomarkers, and furthermore allowing for competing risks, where I
could investigate the different associations across event types, such as death from my-
ocardial infarction or stroke. Undertaking a Bayesian approach could be particularly
useful in this context, where there may be prior knowledge about the expected cor-
relation between biomarkers over time. Generalising the longitudinal outcomes to the
generalised linear mixed effects framework is also desirable, to accommodate categorical
and count data outcomes.
Finally, as I discussed in Chapter 9, the incorporation of delayed entry (left trun-
cation) into the joint model framework has received little attention in the joint model
literature (Piccorelli and Schluchter, 2012). If age is to be used as the timescale, which
can be of particular use when modelling observational data, the presence of random effects makes the calculation of the joint likelihood more complex when delayed entry is present, as the random effects are conditional on a patient's entry time (Jensen et al.,
2004; Rondeau et al., 2011). I have recently implemented this extension, motivated by
a current project in breast cancer, jointly modelling repeated measures of breast den-
sity and how they are associated with survival, where patients have a minimum of two
breast density recordings, and as such are not at risk of the event until the time of the second measurement.
11.6. Conclusion
In this thesis, I have developed a general framework for the simulation of survival
data, which can be utilised in a wide range of settings within the field of survival
analysis, providing an extensive approach to assess and validate assumptions through
simulation. Furthermore, the general parametric survival analysis model framework
developed in this thesis is widely applicable, providing the benefits of a parametric
approach to gain greater insights into clinical datasets, particularly through prediction.
As joint modelling becomes a recognised appropriate modelling technique for the com-
bined analysis of longitudinal and survival data, the areas of application will only grow,
and availability of the methods will become crucial. The importance of providing well
documented and user-friendly statistical software, in tandem with methodological developments, cannot be overstated. For new techniques and improved ways of analysing
data to permeate from methodological to applied research, the methods must be acces-
sible to statisticians and non-statisticians, alike.
APPENDIX A
Research Papers 1 and 2 - The Stata Journal and Statistics in
Medicine
This appendix contains two papers which developed methodology and software for
the simulation of simple and complex survival data, published in The Stata Journal in
2012, and Statistics in Medicine in 2013. They are omitted from the e-thesis due to
copyright.
APPENDIX B
Research Papers 3 and 4 - Journal of Statistical Software and
Statistics in Medicine
This appendix contains two papers which developed methodology and software for
the general parametric analysis of survival data; the first published in the Journal of
Statistical Software in 2013, and the second published in Statistics in Medicine in 2014.
The second paper is omitted due to copyright.
JSS Journal of Statistical Software, May 2013, Volume 53, Issue 12. http://www.jstatsoft.org/
stgenreg: A Stata Package for General Parametric
Survival Analysis
Michael J. Crowther
University of Leicester

Paul C. Lambert
University of Leicester
Abstract
In this paper we present the Stata package stgenreg for the parametric analysis of survival data. Any user-defined hazard function can be specified, with the model estimated using maximum likelihood utilising numerical quadrature. Models that can be fitted range from the Weibull proportional hazards model to the generalized gamma model, mixture models, cure rate models, accelerated failure time models and relative survival models. We illustrate the features of stgenreg through application to a cohort of women diagnosed with breast cancer with outcome all-cause death.
Keywords: survival analysis, parametric models, numerical quadrature, maximum likelihood, Stata.
1. Introduction
Parametric models remain a standard tool for the analysis of survival data. Through a fully parametric approach, we can not only obtain relative effects, such as hazard ratios in a proportional hazards model, but also clinically relevant absolute measures of risk, such as differences in survival proportions (Lambert, Dickman, Nelson, and Royston 2010). Parametric models are also useful where extrapolation is required, such as in the economic decision modelling framework (Weinstein et al. 2003).
The most popular tool for analysing survival data remains the Cox proportional hazards model (Cox 1972), which avoids making any assumptions for the shape of the baseline hazard function. One of the reasons the Cox model remains the preferred choice over parametric models is that standard parametric models available in standard software are often not flexible enough to capture the underlying shape of the hazard function seen in real data.
The traditional approach to estimation of parametric models is through maximum likelihood. This is relatively simple when using a known probability distribution function, such as the
Weibull or Gompertz. Many commonly used parametric survival models are implemented in a variety of software packages, such as the streg package in Stata (StataCorp. 2011), survreg (Therneau 2012) in R (R Core Team 2013) and LIFEREG in SAS (SAS Institute Inc. 2008). However, every parametric model has underlying assumptions, for example, the widely used Weibull proportional hazards model assumes a monotonically increasing or decreasing baseline hazard rate. Such assumptions can be considered restrictive, leading to the development of other more flexible parametric approaches (Royston and Parmar 2002; Royston and Lambert 2011).
In this paper we present the Stata command stgenreg which enables the user to fit general parametric models through specifying any baseline hazard function which can be written in a standard analytical form. This is implemented through numerical integration of the user-defined hazard function. This allows complex extensions to standard parametric models, for example, modelling the log baseline hazard function using splines or fractional polynomials, as well as complex time-dependent effects; methods that are unavailable in standard software. Time-varying covariates can also be incorporated through using multiple records per subject. We do not consider frailty (unobserved heterogeneity) in this article.
One of the key advantages of such a general framework for survival analysis is in the development of new models, for example in one line of code a parametric survival model can be fitted rather than having to directly program the likelihood evaluator.
2. Parametric survival analysis
Let $T^*_i$ be the true event time of patient $i = 1, \dots, n$, and $T_i = \min(T^*_i, C_i)$ the observed survival time, with $C_i$ the censoring time. Define an event indicator $d_i$, which takes the value of 1 if $T^*_i \leq C_i$ and 0 otherwise. We define the probability density function of $T^*_i$ as
$$f(t) = \lim_{\delta \to 0} \frac{P(t \leq T^* \leq t + \delta)}{\delta}$$
where $f(t)$ is the unconditional probability of an event occurring in the interval $(t, t + \delta)$. We define the hazard and survival functions as
$$h(t) = \lim_{\delta \to 0} \frac{P(t \leq T^* \leq t + \delta \mid T^* \geq t)}{\delta} \qquad \text{and} \qquad S(t) = P(T^* \geq t)$$
such that $h(t)$ is the instantaneous failure rate at time $t$, and $S(t)$ is the probability of 'surviving' longer than time $t$. This leads to
$$f(t) = h(t)S(t) \tag{1}$$
We can further write
$$H(t) = \int_0^t h(u)\,\mathrm{d}u, \qquad S(t) = \exp\{-H(t)\} \tag{2}$$
where $H(t)$ is the cumulative hazard function. When the integral in Equation 2 is analytically intractable, we can use numerical integration techniques to derive the cumulative hazard and thus still calculate the survival function.
Journal of Statistical Software 3
2.1. Maximum likelihood estimation
The log-likelihood contribution of the i-th patient, allowing for right censoring and delayed entry (left truncation), using Equation 1 can be written as
l_i = log{ f(t_i)^{d_i} S(t_i)^{1−d_i} / S(t_{0i}) } = d_i log f(t_i) + (1 − d_i) log S(t_i) − log S(t_{0i})   (3)
where t_{0i} and t_i are the observed entry and survival/censoring times for the i-th patient. If delayed entry is not present then the third term in Equation 3 can be dropped. Using Equation 3 we can directly maximize the log-likelihood if using known probability density and survival functions. Alternatively, using Equation 1 we can write
l_i = log{ h(t_i)^{d_i} S(t_i) / S(t_{0i}) } = d_i log h(t_i) + log S(t_i) − log S(t_{0i})
and substituting Equation 2 this becomes
l_i = d_i log h(t_i) − ∫_{t_{0i}}^{t_i} h(u) du   (4)
We note from Equation 4 that the likelihood can also be maximized if only the hazard function is known. Of course, in standard parametric models all three functions are known; however, given that the hazard function is often of most interest, specifying a complex hazard function can be advantageous. The maximization of such a hazard model relies on being able to evaluate the integral in Equation 4. If we propose to use functions such as fractional polynomials or splines to model a complex baseline hazard function, or to incorporate complex time-dependent effects, then this integral cannot always be evaluated analytically, motivating alternative approaches.
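The hazard-only likelihood of Equation 4 can be sketched numerically. The following Python snippet (an illustrative sketch, not the stgenreg implementation) evaluates the log-likelihood from a hazard function alone, replacing Gauss-Legendre quadrature with composite Simpson's rule for brevity, and checks it against the closed form for an assumed Gompertz hazard:

```python
import math, random

# Minimal sketch: evaluate the Eq. 4 log-likelihood using only a hazard
# function, with numerical integration of the cumulative hazard.
# The Gompertz hazard h(t) = lam * exp(gam * t) is an illustrative choice.

def cum_hazard(h, t0, t1, m=100):
    """Composite Simpson's rule for the integral of h over [t0, t1]."""
    if t1 <= t0:
        return 0.0
    step = (t1 - t0) / (2 * m)
    total = h(t0) + h(t1)
    for i in range(1, 2 * m):
        total += (4 if i % 2 else 2) * h(t0 + i * step)
    return total * step / 3

def loglik(lam, gam, times, events):
    """Sum of d_i * log h(t_i) minus the integral of h over [0, t_i]."""
    h = lambda t: lam * math.exp(gam * t)
    return sum(d * math.log(h(t)) - cum_hazard(h, 0.0, t)
               for t, d in zip(times, events))

# Simulate right-censored Gompertz data with lam = 0.5, gam = 0.4
random.seed(1)
times, events = [], []
for _ in range(500):
    t = math.log(1 + 0.4 * random.expovariate(1) / 0.5) / 0.4
    c = random.uniform(0, 4)
    times.append(min(t, c))
    events.append(1 if t <= c else 0)

# Agrees with the closed form d*log h(t) + log S(t), where
# H(t) = lam * (exp(gam * t) - 1) / gam and S(t) = exp(-H(t)).
closed = sum(d * math.log(0.5 * math.exp(0.4 * t))
             - 0.5 * (math.exp(0.4 * t) - 1) / 0.4
             for t, d in zip(times, events))
print(abs(loglik(0.5, 0.4, times, events) - closed) < 1e-6)
```

The numeric and closed-form log-likelihoods agree to high precision, so the hazard-only formulation can be handed directly to any maximizer.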
2.2. Numerical integration
We propose to use numerical quadrature to evaluate the cumulative hazard, and hence maximize the likelihood in Equation 4, allowing the user to estimate a parametric survival model specifying any function for the baseline hazard that satisfies h(t) > 0 for all t > 0.
Gaussian quadrature allows us to evaluate an analytically intractable integral through a weighted sum of a function evaluated at a set of pre-defined points, known as nodes (Stoer and Bulirsch 2002). We have
∫_{−1}^{1} g(x) dx = ∫_{−1}^{1} W(x) g(x) dx ≈ Σ_{i=1}^{m} w_i g(x_i)
where W(x) is a known weighting function and g(x) can be approximated by a polynomial function. The integral over [t_{0i}, t_i] in Equation 4 must be changed to an integral over [−1, 1]
4 stgenreg: General Parametric Survival Analysis in Stata
using the following rule:

∫_{t_{0i}}^{t_i} h(x) dx = (t_i − t_{0i})/2 × ∫_{−1}^{1} h( (t_i − t_{0i})x/2 + (t_{0i} + t_i)/2 ) dx
                         ≈ (t_i − t_{0i})/2 × Σ_{j=1}^{m} w_j h( (t_i − t_{0i})x_j/2 + (t_{0i} + t_i)/2 )
This transformation allows the incorporation of delayed entry quite simply. The form of Gaussian quadrature depends on the choice of weighting function. The default within stgenreg is Gauss-Legendre quadrature, with weighting function W(x) = 1.
The accuracy of the numerical integral depends on the number of quadrature nodes, m, with node locations dependent on the type of quadrature chosen. As with all methods which use numerical integration, the stability of the maximum likelihood estimates should be established by using an increasing number of quadrature nodes.
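The interval transformation and node-stability check above can be illustrated with a short Python sketch (not stgenreg's Mata code), computing Gauss-Legendre nodes by Newton's method on the Legendre recurrence:

```python
import math

# Illustrative sketch of m-point Gauss-Legendre quadrature mapped from
# [-1, 1] to [t0, t1]; nodes/weights computed from scratch for clarity.

def leggauss(m):
    """Nodes and weights of m-point Gauss-Legendre quadrature on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))   # initial guess
        for _ in range(100):
            # Evaluate P_m(x) and its derivative via the recurrence relation
            p0, p1 = 1.0, x
            for n in range(2, m + 1):
                p0, p1 = p1, ((2 * n - 1) * x * p1 - (n - 1) * p0) / n
            dp = m * (x * p1 - p0) / (x * x - 1)
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2 / ((1 - x * x) * dp * dp))
    return nodes, weights

def cum_hazard(h, t0, t1, m):
    """Approximate the integral of h over [t0, t1] after mapping to [-1, 1]."""
    nodes, weights = leggauss(m)
    half, mid = (t1 - t0) / 2, (t0 + t1) / 2
    return half * sum(w * h(half * x + mid) for x, w in zip(nodes, weights))

# Node-stability check for a Weibull hazard: the true value of the
# integral of 1.3 * t**0.3 over [0, 2] is 2**1.3.
h = lambda t: 1.3 * t ** 0.3
for m in (5, 15, 30):
    print(m, round(cum_hazard(h, 0.0, 2.0, m), 6))
```

Re-running with an increasing m and comparing successive results mirrors the stability check recommended above.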
2.3. Time-dependent effects and time-varying covariates
The presence of non-proportional hazards, i.e., time-dependent effects, is common in the analysis of time-to-event data (Jatoi, Anderson, Jeong, and Redmond 2011). This is frequently observed in registry data sources where follow-up often extends over many years (Lambert et al. 2011). Similarly, time-dependent treatment effects are also observed in clinical trials (Mok et al. 2009). Time-dependent effects are incorporated seamlessly into our modelling framework by allowing the user to interact any covariate with a specified function of time. We illustrate this in Section 4.2.1.
Time-varying covariates are a further commonly observed scenario in the analysis of survival data, where the value of a covariate for an individual patient can change at various points during follow-up. For example, in oncology clinical trials patients will often switch treatment group when their condition progresses (Morden, Lambert, Latimer, Abrams, and Wailoo 2011), or biomarkers may be measured repeatedly over time, resulting in multiple records per subject. For this form of analysis the data are often set up as start and stop times, and since delayed entry (left truncation) is allowed, this is again incorporated into the described modelling framework. We illustrate this through example in Section 4.4.
3. The Stata package stgenreg
The Stata package stgenreg is implemented as three Stata ado files. The primary shell program, stgenreg.ado, handles the syntax options for the package and then calls the likelihood evaluator program stgenreg_d0.ado, described in Section 3.1. Finally, a variety of predictions can be obtained following estimation of a model using Stata's predict command, which calls the program stgenreg_pred.ado, described in Section 3.2.
3.1. Program implementation and syntax
The log-likelihood shown in Equation 4 is maximized using the Newton-Raphson algorithm, with first and second derivatives estimated numerically, as implemented in the ml command in Stata (Gould, Pitblado, and Poi 2010). As described in Section 2.2, the integral in Equation 4 is evaluated using m-point Gaussian quadrature.
The evaluator program has been optimized using Stata's matrix programming language, Mata. This provides computational benefits and access to the wide array of mathematical functions available for the user to specify in the hazard function. In addition, we have implemented specific functions which allow the incorporation of restricted cubic splines or fractional polynomials into the hazard or log hazard function (Durrleman and Simon 1989; Royston and Altman 1994).
When using stgenreg, one of the options loghazard() or hazard() must be defined. These specify a user-defined log hazard or hazard function. The function must be defined in Mata code, with parameters specified in square brackets, for example [ln_lambda]. The use of Mata means that mathematical operations require a colon (:) prefix, for example :+ instead of +. Time must be coded as #t. The user can specify covariates or functions of time within the linear predictor of any parameter, providing a highly flexible framework.
For example, we can specify a Weibull distribution using either the log hazard or the hazard function. Each parameter is defined on the entire real line; here both λ and γ are restricted to be positive by modelling on the log scale.
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] ///
> :+ (exp([ln_gamma]) :- 1) :* log(#t))
. stgenreg, hazard(exp([ln_lambda]) :* exp([ln_gamma]) :* ///
> #t :^ (exp([ln_gamma]) :- 1))
A linear predictor can be defined for any of the parameters, with the name of the option being the name of the parameter specified in the loghazard() or hazard() option. For example, a proportional hazards Weibull model can be fitted with covariates treatment, age and sex by adding the option ln_lambda(treatment age sex).
One of the key advantages of stgenreg is that we can incorporate a variety of functions (including functions of time) into the linear predictor of any parameter. For example, parameter [ln_lambda] has an available option ln_lambda(comp1 | comp2 | ... | compn), which can contain a variety of component functions to increase complexity. Each compj can contain a variety of the functions described in Table 1.
Additionally, excess mortality (relative survival) models (Nelson, Lambert, Squire, and Jones 2007) can be fitted by use of the bhazard(varname) option. In these models a known expected mortality rate, h*(t), is included in the model as follows:
h(t) = h∗(t) + λ(t)
Here the loghazard() and hazard() options refer to the modelling of λ(t). Note that it is the expected mortality rate at the event time that needs to be supplied to the bhazard() option.
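The role of the known expected rate in the likelihood can be sketched in a few lines of Python (an illustrative toy, not stgenreg internals): the cumulative expected hazard does not involve the model parameters, so dropping it shifts the log-likelihood by a constant and leaves the maximum unchanged. The toy excess hazard λ(t) = θt and the constant expected rate below are assumptions for display only.

```python
import math

# Toy excess-mortality log-likelihood contribution: overall hazard is
# h(t) = hstar(t) + lam(t), with hstar known. Only lam(t) needs integrating.

def loglik_i(theta, t, d, hstar, cum_hstar, drop_expected=True):
    lam = theta * t                       # excess hazard at the event time
    cum_lam = theta * t ** 2 / 2          # closed-form integral over [0, t]
    ll = d * math.log(hstar(t) + lam) - cum_lam
    if not drop_expected:
        ll -= cum_hstar(t)                # parameter-free term
    return ll

hstar = lambda u: 0.01                    # assumed constant expected rate
cum_hstar = lambda t: 0.01 * t

# The two versions differ by the same constant at every parameter value,
# so they share the same maximizer.
t, d = 3.0, 1
diffs = {round(loglik_i(th, t, d, hstar, cum_hstar, False)
               - loglik_i(th, t, d, hstar, cum_hstar, True), 10)
         for th in (0.1, 0.5, 1.0)}
print(diffs)
```

The set contains a single value, confirming the difference is constant in θ.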
Finally, all standard options of the ml suite in Stata can be used when fitting a stgenreg model, such as constraints(), which allows the user to constrain the value of any coefficient to a particular constant.
3.2. Predictions
A variety of predictions can be obtained following the estimation of a model. These include the hazard, survival and cumulative hazard functions.
Component — Description

varlist [, nocons] — The user may specify a standard variable list within a component section, with an optional nocons option.

g(#t) — Any user-defined function of #t written in Mata code, for example #t:^2.

#rcs(options) — Creates restricted cubic splines of either log time or time. Options include df(int), the number of degrees of freedom; noorthog, which turns off the default orthogonalisation; time, which creates splines using time rather than log time (the default); and offset(varname), to include an offset when calculating the splines. See rcsgen in Stata for more details.

#fp(numlist [, options]) — Creates fractional polynomials of time with powers defined in numlist. If 0 is specified, log time is generated. The only current option is offset(), which is consistent with that described for #rcs() above.

varname :* f(#t) — Includes time-dependent effects, where f(#t) is one of #rcs(), #fp() or g().

Table 1: Description of each component that can be included in the linear predictor of a parameter.
The standard Stata syntax to obtain predictions following a model fit is as follows
. predict newvarname, statistic
So for example, to obtain the fitted survival, hazard and cumulative hazard functions
. predict surv1, survival
. predict haz1, hazard
. predict cumhaz1, cumhazard
Extended prediction options unavailable in standard software include: zeros, which obtains baseline predictions; at(), which obtains predictions at specified covariate patterns; and timevar(), which obtains predictions at specified times. These options can be combined with the standard choices of hazard, cumhazard and survival. Finally, the ci option can be used to obtain confidence intervals.
4. Analysis of example datasets using stgenreg
We illustrate stgenreg through use of a dataset comprising 9721 women aged under 50 and diagnosed with breast cancer in England and Wales between 1986 and 1990. The event of interest is death from any cause, with follow-up restricted to 5 years. Deprivation was categorized into 5 levels; however, we have restricted the analyses to comparing the most affluent and most deprived groups, for illustrative purposes. We therefore only consider a binary covariate, dep5, with 0 for the most affluent and 1 for the most deprived group.
We further illustrate how to incorporate a time-varying covariate through use of a dataset of 488 patients with liver cirrhosis (Andersen, Borgan, Gill, and Keiding 1993). A total of 251 patients were randomized to receive prednisone, with 237 randomized to receive placebo. Prothrombin index was measured repeatedly, with between 1 and 17 measurements per subject, resulting in 2968 observations. The outcome was all-cause death.
4.1. Weibull proportional hazards model
We begin by fitting a Weibull proportional hazards model to the breast cancer dataset, investigating the effect of deprivation status. Given that Weibull models are available in all standard statistical software, we first illustrate the concept by showing that the estimates agree with estimates derived using analytically tractable definitions of the hazard and survival functions. The baseline hazard and log hazard functions have the following form
h(t) = λγt^(γ−1) exp(βX)
and
log(h(t)) = log(λ) + log(γ) + (γ − 1) log(t) + βX
where X is a vector of covariates, with corresponding regression coefficients β. In this case it is convenient to use the loghazard() option of stgenreg. We can investigate covariate effects by including deprivation status in the linear predictor of log(λ), using the option ln_lambda().
. stgenreg, loghazard([ln_lambda] :+ [ln_gamma] :+ ///
> (exp([ln_gamma]) :- 1) :* log(#t)) nodes(30) ln_lambda(dep5)
Log likelihood = -8808.149 Number of obs = 9721
----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ln_lambda |
dep5 | .2698633 .0392017 6.88 0.000 .1930293 .3466972
_cons | -2.824814 .0370151 -76.32 0.000 -2.897362 -2.752265
-----------+----------------------------------------------------------------
ln_gamma |
_cons | .0464514 .0179823 2.58 0.010 .0112068 .081696
----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
We observe a log hazard ratio of 0.270 (95% CI: 0.193, 0.347) and consequently a hazard ratio of 1.310 (95% CI: 1.213, 1.414), indicating a 31% increase in the mortality rate in the most deprived group compared with the most affluent. We could further adjust the γ parameter by deprivation status by adding the option ln_gamma(dep5).
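As a quick arithmetic check, the quoted hazard ratio and confidence interval can be reproduced from the reported coefficient and standard error by exponentiating (a small Python sketch):

```python
import math

# Back-transforming the reported log hazard ratio and standard error
# (0.2698633, 0.0392017) to the hazard-ratio scale quoted in the text.
b, se = 0.2698633, 0.0392017
hr = math.exp(b)
lo, hi = math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)
print(round(hr, 3), round(lo, 3), round(hi, 3))
```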
When fitting models which rely on numerical integration, it is important to establish the stability of the maximum likelihood estimates by using an increasing number of quadrature nodes. In the case of a Weibull proportional hazards model, we can both compare with the optimized model using streg in Stata, and compare across an increasing number of quadrature nodes. Here we present results from fitting the streg model and stgenreg models with 15, 30, 50 and 100 nodes.
-----------------------------------------------------------------------------
Variable | streg stgenreg15 stgenreg30 stgenreg50 stgenreg100
-----------+-----------------------------------------------------------------
#1 |
dep5 | .2698715 .26983514 .26986326 .26986899 .26987095
| .0392017 .03920178 .03920173 .03920172 .03920171
_cons | -2.8252423 -2.8232443 -2.8248136 -2.8251059 -2.8252139
| .03694985 .03718485 .03701515 .03697471 .03695639
-----------+-----------------------------------------------------------------
#2 |
_cons | .04673335 .04542627 .04645138 .04664313 .04671442
| .01792781 .01812554 .01798227 .01794843 .0179332
-----------+-----------------------------------------------------------------
Statistics |
ll | -8808.0854 -8808.3461 -8808.149 -8808.1075 -8808.0906
-----------------------------------------------------------------------------
We obtain parameter estimates consistent to 3 decimal places with 30 nodes, and accuracy improves as the number of nodes is increased. However, computation time will also increase with the number of nodes; for example, using 15 nodes takes 7.4 seconds compared with 12.4 seconds using 100 nodes (on an HP laptop with an Intel i5 2.5GHz processor and 8GB of RAM). In comparison, the fully optimized streg model took 0.4 seconds to converge. This difference is to be expected, as the stgenreg formulation of the Weibull model is not the most computationally efficient: there is no need to use numerical integration for the standard Weibull model.
4.2. Restricted cubic spline proportional hazards model
We now introduce a much more flexible proportional hazards survival model, modelling the baseline log hazard function using restricted cubic splines of log(time). We formulate the baseline log hazard function as
log(h(t)) = s(log(t)) +Xβ (5)
where s(log(t)) is a restricted cubic spline function of log(t). This can be implemented by using the #rcs component option. We use the default knot locations, based on the centiles of the distribution of uncensored survival times.
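A restricted cubic spline basis of this kind can be sketched in a few lines of Python. This is an illustrative construction in the style of Durrleman and Simon (1989), with hand-picked knot values and no orthogonalisation (stgenreg orthogonalises by default); the key property is that the fitted curve is constrained to be linear beyond the boundary knots.

```python
import math

# Illustrative restricted cubic spline basis on the log-time scale;
# knot values are arbitrary choices for display, not the paper's knots.

def rcs_basis(x, knots):
    """Basis [s1, ..., s_{K-1}] for K knots; linear beyond boundary knots."""
    kmin, kmax = knots[0], knots[-1]
    pos = lambda u: max(u, 0.0) ** 3          # truncated cubic (u)_+^3
    basis = [x]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        basis.append(pos(x - kj) - lam * pos(x - kmin)
                     - (1 - lam) * pos(x - kmax))
    return basis

knots = [math.log(t) for t in (0.1, 0.9, 2.0, 5.0)]   # illustrative knots
# Beyond the last knot every basis function is linear, so equally spaced
# x values give equal successive differences in each basis column.
xs = [knots[-1] + 0.5 * i for i in (1, 2, 3)]
cols = list(zip(*[rcs_basis(x, knots) for x in xs]))
print(all(abs((c[2] - c[1]) - (c[1] - c[0])) < 1e-9 for c in cols))
```

The check prints True, confirming linearity beyond the upper boundary knot; below the lower boundary knot all non-linear basis terms are exactly zero.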
This draws parallels with the flexible parametric model of Royston and Parmar (2002), implemented in Stata as the stpm2 command (Royston and Lambert 2011), which uses restricted cubic splines to model the log cumulative hazard function
log(H(t)) = s(log(t)) +Xβ (6)
Figure 1: Predicted hazard function for the most affluent group with 95% confidence interval.
An advantage of modelling on the log hazard scale is that when there are multiple time-dependent effects, the interpretation of the time-dependent hazard ratios is simplified, as they do not depend on the values of other covariates, which is the case when modelling on the cumulative hazard scale (Royston and Lambert 2011).
We apply the model in Equation 5 with 5 degrees of freedom, i.e., 4 internal knots placed at the 20th, 40th, 60th and 80th percentiles of the distribution of log event times, and 2 boundary knots placed at the 0th and 100th percentiles.
. stgenreg, loghazard([xb]) xb(dep5 | #rcs(df(5))) nodes(30)
Log likelihood = -8756.2213 Number of obs = 9721
-----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+--------------------------------------------------------------
dep5 | .2693634 .0392018 6.87 0.000 .1925293 .3461976
_eq1_cp2_rcs1 | -.0621779 .0274602 -2.26 0.024 -.1159989 -.008357
_eq1_cp2_rcs2 | .0784834 .0192975 4.07 0.000 .0406611 .1163057
_eq1_cp2_rcs3 | .1158689 .0176746 6.56 0.000 .0812272 .1505106
_eq1_cp2_rcs4 | -.0251518 .0143719 -1.75 0.080 -.0533202 .0030165
_eq1_cp2_rcs5 | .0012793 .0134076 0.10 0.924 -.0249991 .0275576
_cons | -2.910463 .0607005 -47.95 0.000 -3.029434 -2.791492
-----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
When using the component options, stgenreg will create variables labelled by the equation number (indexed from left to right in the log hazard or hazard specification) and the component number (again counting from left to right in each parameter option). So variables _eq1_cp2_* contain the spline basis variables defined by the #rcs(df(5)) component. The estimate of the log hazard ratio for the effect of deprivation is very similar to the Weibull-based estimate; however, we have now estimated 6 parameters to model the baseline hazard function: an intercept and 5 parameters associated with the spline terms. We can obtain the predicted baseline hazard function and 95% confidence interval as follows
. predict haz1, hazard ci zeros
We illustrate the fitted baseline hazard function in Figure 1.
Time-dependent effects
We now investigate the presence of a time-dependent effect of deprivation status. Within the framework of restricted cubic splines, this can be investigated using the component form varname :* #rcs(df(num)), i.e., an interaction between the effect of time (using splines) and the deprivation group. We use 3 degrees of freedom for illustration.
. stgenreg, loghazard([xb]) nodes(30) ///
> xb(dep5 | #rcs(df(5)) | dep5 :* #rcs(df(3)))
Log likelihood = -8747.3275 Number of obs = 9721
-----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+--------------------------------------------------------------
dep5 | .0723415 .0924005 0.78 0.434 -.1087602 .2534433
_eq1_cp2_rcs1 | -.0108058 .0309504 -0.35 0.727 -.0714673 .0498558
_eq1_cp2_rcs2 | .0672877 .0224852 2.99 0.003 .0232177 .1113578
_eq1_cp2_rcs3 | .1128672 .0207167 5.45 0.000 .0722634 .1534711
_eq1_cp2_rcs4 | -.0261438 .0145455 -1.80 0.072 -.0546525 .002365
_eq1_cp2_rcs5 | .0014202 .0134079 0.11 0.916 -.0248589 .0276992
_eq1_cp3_rcs1 | -.1464002 .0443983 -3.30 0.001 -.2334194 -.0593811
_eq1_cp3_rcs2 | .0425164 .0333753 1.27 0.203 -.022898 .1079307
_eq1_cp3_rcs3 | .0135896 .0322604 0.42 0.674 -.0496396 .0768187
_cons | -2.849318 .0649361 -43.88 0.000 -2.976591 -2.722046
-----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
In Figure 2 we compare the fit of the models with either time-independent or time-dependent hazard ratios for deprivation status, by overlaying the fitted survival functions on the Kaplan-Meier curves for each deprivation group. We observe a much improved fit to the Kaplan-Meier curves when modelling the time-dependent effect of deprivation group. We can predict the time-dependent hazard ratio using the partpred (Lambert 2010) command as follows.
Figure 2: Kaplan-Meier estimates for the most affluent and most deprived groups, with predicted survival overlaid. The figure on the left shows predicted survival with a proportional effect of deprivation status; the figure on the right allows for non-proportional hazards in the effect of deprivation status.
Figure 3: The estimated time-dependent hazard ratio for deprivation group, with associated 95% confidence interval.
. partpred hr, for(dep5 _eq1_cp3*) ci(hr_uci hr_lci) eform
This is then plotted in Figure 3, which shows that the relative increase in the mortality rate is much larger at the start of follow-up and decreases to around one by 5 years.
4.3. Generalized gamma proportional hazards model
The generalized gamma (GG) is a 3-parameter parametric model implemented in a variety of statistical packages (Cox, Chu, Schneider, and Munoz 2007); however, in Stata it is parameterized as an accelerated failure time model. We can write the survival and density functions as

S_GG(t) = 1 − I(γ, u)   if κ > 0
          1 − Φ(z)      if κ = 0
          I(γ, u)       if κ < 0          (7)

and

f_GG(t) = γ^γ exp(z√γ − u) / (σ t √γ Γ(γ))   if κ ≠ 0
          exp(−z²/2) / (σ t √(2π))            if κ = 0          (8)

where γ = |κ|^(−2), z = sign(κ)(log(t) − μ)/σ, u = γ exp(|κ|z), Φ(·) is the standard normal cumulative distribution function, Γ(·) is the gamma function, and I(a, x) is the regularized incomplete gamma function.
Therefore, using Equation 1, we can write down our baseline hazard function as the ratio of the probability density function to the survival function:

h_GG(t) = f_GG(t) / S_GG(t)
To invoke proportional hazards we can then simply multiply by the exponential of a further parameter, whose linear predictor contains our vector of covariates:

h_GG(t) = {f_GG(t) / S_GG(t)} exp(Xβ)   or   log(h_GG(t)) = log{f_GG(t) / S_GG(t)} + Xβ
where β is a vector of log hazard ratios. In terms of implementation, in the linear predictor for our Xβ parameter we must specify the nocons option to ensure no intercept term, obtaining a proportional hazards formulation of the GG model. As this is a complex function, we can use Stata's local macros to build it up.
. local mu [mu]
. local sigma exp([ln_sigma])
. local kappa [kappa]
. local gamma (abs(`kappa') :^ (-2))
. local z (sign(`kappa') :* (log(#t) :- `mu') :/ (`sigma'))
. local u ((`gamma') :* exp(abs(`kappa') :* (`z')))
. local surv1 (1 :- gammap(`gamma',`u')) :* (`kappa' :> 0)
. local surv2 (1 :- normal(`z')) :* (`kappa' :== 0)
. local surv3 gammap(`gamma',`u') :* (`kappa' :< 0)
. local pdf1 ((`gamma' :^ `gamma') :* exp(`z' :* sqrt(`gamma') :- `u') :/ ///
> (`sigma' :* #t :* sqrt(`gamma') :* gamma(`gamma'))) :* (`kappa' :!= 0)
. local pdf2 (exp(-(`z' :^ 2) :/ 2) :/ (`sigma' :* #t :* sqrt(2 :* pi())))///
> :* (`kappa' :== 0)
. local haz (`pdf1' :+ `pdf2') :/ (`surv1' :+ `surv2' :+ `surv3')
. stgenreg, hazard(exp([xb]) :* (`haz')) nodes(30) xb(dep5,nocons)
Log likelihood = -8801.2754 Number of obs = 9721
----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+--------------------------------------------------------------
xb |
dep5 | .2694578 .0391992 6.87 0.000 .1926289 .3462868
-------------+--------------------------------------------------------------
kappa |
_cons | .6752793 .0749985 9.00 0.000 .528285 .8222735
-------------+--------------------------------------------------------------
mu |
_cons | 2.710497 .032793 82.65 0.000 2.646224 2.774771
-------------+--------------------------------------------------------------
ln_sigma |
_cons | .1727204 .0521935 3.31 0.001 .0704231 .2750178
----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 30 nodes
Once again we obtain very similar estimates to the Weibull model, but now modelling the baseline with 3 parameters. This formulation illustrates a powerful tool: by simply introducing an extra parameter we can implement a model not available in any standard software package.
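The hazard built from Equations 7 and 8 can be checked numerically. Below is a Python sketch (not the Mata code above) with a basic implementation of the regularized lower incomplete gamma function; the parameter values are illustrative choices near the fitted estimates, and the sanity check verifies that f/S agrees with −d/dt log S by finite differences.

```python
import math

# Sketch of the generalized gamma hazard from Equations 7 and 8.
# gammainc_p is a standard series/continued-fraction implementation of
# the regularized lower incomplete gamma function P(a, x).

def gammainc_p(a, x, eps=1e-14):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0:
        return 0.0
    lg = math.lgamma(a)
    if x < a + 1:                          # series representation
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * eps:
            n += 1
            term *= x / n
            total += term
        return total * math.exp(-x + a * math.log(x) - lg)
    tiny = 1e-300                          # continued fraction for Q(a, x)
    b, c, d = x + 1 - a, 1 / tiny, 1 / (x + 1 - a)
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return 1 - math.exp(-x + a * math.log(x) - lg) * h

def gg_surv(t, mu, sigma, kappa):
    """S_GG(t) of Equation 7."""
    z = (math.log(t) - mu) / sigma
    if kappa == 0:
        return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    g = abs(kappa) ** -2
    u = g * math.exp(abs(kappa) * math.copysign(1, kappa) * z)
    p = gammainc_p(g, u)
    return 1 - p if kappa > 0 else p

def gg_hazard(t, mu, sigma, kappa):
    """f_GG(t) / S_GG(t) from Equations 7 and 8."""
    z = (math.log(t) - mu) / sigma
    if kappa == 0:
        pdf = math.exp(-z * z / 2) / (sigma * t * math.sqrt(2 * math.pi))
    else:
        g = abs(kappa) ** -2
        zs = math.copysign(1, kappa) * z
        u = g * math.exp(abs(kappa) * zs)
        pdf = (g ** g * math.exp(zs * math.sqrt(g) - u)
               / (sigma * t * math.sqrt(g) * math.gamma(g)))
    return pdf / gg_surv(t, mu, sigma, kappa)

# Sanity check: f/S agrees with -d/dt log S by central finite differences.
mu, sigma, kappa = 2.71, math.exp(0.173), 0.675   # illustrative values
t, eps = 2.0, 1e-5
fd = -(math.log(gg_surv(t + eps, mu, sigma, kappa))
       - math.log(gg_surv(t - eps, mu, sigma, kappa))) / (2 * eps)
print(abs(gg_hazard(t, mu, sigma, kappa) - fd) / fd < 1e-5)
```

This internal-consistency check is a useful safeguard whenever a hazard is assembled by hand from density and survival pieces, as in the Mata macros above.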
4.4. Time-varying covariates
We now illustrate the data setup required for a survival analysis incorporating a time-varying covariate, using the liver cirrhosis dataset described above. Here we use the enter() and id() options of stset in Stata to declare the data as multiple records per subject.
. stset stop, enter(start) id(id) failure(event=1)
id: id
failure event: event == 1
obs. time interval: (stop[_n-1], stop]
enter on or after: time start
exit on or before: failure
---------------------------------------------------------------------------
2968 total obs.
0 exclusions
---------------------------------------------------------------------------
2968 obs. remaining, representing
488 subjects
292 failures in single failure-per-subject data
1777.749 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 13.39393
We illustrate the data structure for 2 patients, where _t0 represents the entry times at which prothrombin was measured.
. list id pro trt _t0 _t _d if id==1 | id==111, noobs sepby(id)
+-----------------------------------------------------+
| id pro trt _t0 _t _d |
|-----------------------------------------------------|
| 1 38 placebo 0 .2436754 0 |
| 1 31 placebo .2436754 .38057169 0 |
| 1 27 placebo .38057169 .41342679 1 |
|-----------------------------------------------------|
| 111 59 prednisone 0 .24641332 0 |
| 111 60 prednisone .24641332 .49830249 0 |
| 111 87 prednisone .49830249 .74471581 0 |
| 111 59 prednisone .74471581 1.1280254 0 |
| 111 35 prednisone 1.1280254 1.1581426 1 |
+-----------------------------------------------------+
We can now fit a stgenreg model using restricted cubic splines to model the baseline, adjusting for the proportional effects of treatment and prothrombin index.
. stgenreg, loghazard([xb]) xb(pro trt | #rcs(df(3))) nolog
Variables _eq1_cp2_rcs1 to _eq1_cp2_rcs3 were created
Log likelihood = -588.17466 Number of obs = 2968
----------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+-------------------------------------------------------------
pro | -.0349754 .0024771 -14.12 0.000 -.0398304 -.0301205
trt | .1325576 .1182068 1.12 0.262 -.0991235 .3642388
_eq1_cp2_rcs1 | -.091006 .0579785 -1.57 0.116 -.2046419 .0226298
_eq1_cp2_rcs2 | -.1354551 .0431334 -3.14 0.002 -.219995 -.0509151
_eq1_cp2_rcs3 | -.2292129 .0499583 -4.59 0.000 -.3271295 -.1312964
_cons | .7376377 .1690535 4.36 0.000 .4062988 1.068977
----------------------------------------------------------------------------
Quadrature method: Gauss-Legendre with 15 nodes
We observe a log hazard ratio of −0.035 (95% CI: −0.040, −0.030), indicating that lower values of the biomarker are associated with an increased risk of death.
Alternatively, stgenreg can be used in conjunction with Stata's stsplit command to create the at-risk time intervals.
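The start/stop data structure shown in the listing above can be sketched in Python (an illustrative helper, not part of stgenreg or stset): each biomarker measurement opens a risk interval that closes at the next measurement or at exit, with the event indicator carried only on the final interval.

```python
# Illustrative construction of start/stop records for one subject with a
# time-varying covariate; names and the example times are abbreviations
# of the patient-1 listing above.

def to_start_stop(meas_times, values, exit_time, event):
    """Return (start, stop, value, d) records for one subject."""
    records = []
    for j, (t, v) in enumerate(zip(meas_times, values)):
        stop = meas_times[j + 1] if j + 1 < len(meas_times) else exit_time
        d = event if stop == exit_time else 0   # event only on last interval
        records.append((t, stop, v, d))
    return records

recs = to_start_stop([0.0, 0.244, 0.381], [38, 31, 27], 0.413, 1)
for r in recs:
    print(r)
```

Each record supplies the delayed-entry time _t0 (start), the exit time _t (stop), the covariate value in force over that interval, and the event indicator _d.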
5. Discussion
We have presented the stgenreg command in Stata, for the general parametric analysis of survival data. Through specification of a user-defined hazard function, we have illustrated how to implement standard proportional hazards models, novel restricted cubic spline survival models and a generalized gamma model with proportional hazards. In essence, stgenreg may be used to implement a parametric survival model defined by anything from a very simple one-parameter proportional hazards model, to models which contain highly flexible functions of time, for both the baseline and time-dependent effects. Any parameter defined in the hazard function can depend on complex functions of time, including fractional polynomials or restricted cubic splines.
The choice of the number of quadrature nodes is left to the user. An increasing number of quadrature nodes should be used to establish that the parameter estimates are stable.
As stgenreg is a general framework, it may not be the most computationally efficient tool; however, it is useful for the development of novel models. For example, it can be used to develop ideas and test new models, before time is spent developing more computationally efficient implementations for specific cases.
In future developments we aim to allow for interval censoring, the extension to incorporate frailty, and a post-estimation command to calculate the cumulative incidence function for competing risks. The package is available from the Statistical Software Components archive (Crowther and Lambert 2013) and can be installed from Stata by typing ssc install stgenreg.
Acknowledgments
Michael Crowther was funded by a National Institute for Health Research (NIHR) Doctoral Fellowship (DRF-2012-05-409).
The authors would like to thank two anonymous reviewers and an editor, whose comments greatly improved the paper.
References
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993). Statistical Models Based on Counting Processes. Springer-Verlag.

Cox C, Chu H, Schneider MF, Munoz A (2007). "Parametric Survival Analysis and Taxonomy of Hazard Functions for the Generalized Gamma Distribution." Statistics in Medicine, 26(23), 4352–4374.

Cox DR (1972). "Regression Models and Life-Tables." Journal of the Royal Statistical Society B, 34(2), 187–220.

Crowther MJ, Lambert P (2013). "stgenreg: Stata Module to Fit General Parametric Survival Models." Statistical Software Components, Boston College Department of Economics. URL http://ideas.repec.org/c/boc/bocode/s457579.html.

Durrleman S, Simon R (1989). "Flexible Regression Models with Cubic Splines." Statistics in Medicine, 8(5), 551–561.

Gould W, Pitblado J, Poi B (2010). Maximum Likelihood Estimation with Stata. 4th edition. Stata Press.

Jatoi I, Anderson WF, Jeong JH, Redmond CK (2011). "Breast Cancer Adjuvant Therapy: Time to Consider Its Time-Dependent Effects." Journal of Clinical Oncology, 29(17), 2301–2304.

Lambert P (2010). "partpred: Stata Module to Generate Partial Predictions." Statistical Software Components, Boston College Department of Economics. URL http://ideas.repec.org/c/boc/bocode/s457176.html.

Lambert PC, Dickman PW, Nelson CP, Royston P (2010). "Estimating the Crude Probability of Death due to Cancer and other Causes using Relative Survival Models." Statistics in Medicine, 29(7-8), 885–895.

Lambert PC, Holmberg L, Sandin F, Bray F, Linklater KM, Purushotham A, Robinson D, Møller H (2011). "Quantifying Differences in Breast Cancer Survival between England and Norway." Cancer Epidemiology, 35(6), 526–533.

Mok TS, Wu YL, Thongprasert S, Yang CH, Chu DT, Saijo N, Sunpaweravong P, Han B, Margono B, Ichinose Y, Nishiwaki Y, Ohe Y, Yang JJ, Chewaskulyong B, Jiang H, Duffield EL, Watkins CL, Armour AA, Fukuoka M (2009). "Gefitinib or Carboplatin-Paclitaxel in Pulmonary Adenocarcinoma." New England Journal of Medicine, 361(10), 947–957.

Morden JP, Lambert PC, Latimer N, Abrams KR, Wailoo AJ (2011). "Assessing Methods for Dealing with Treatment Switching in Randomised Controlled Trials: A Simulation Study." BMC Medical Research Methodology, 11, 4.

Nelson CP, Lambert PC, Squire IB, Jones DR (2007). "Flexible Parametric Models for Relative Survival, with Application in Coronary Heart Disease." Statistics in Medicine, 26(30), 5486–5498.

R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Royston P, Altman DG (1994). "Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling." Journal of the Royal Statistical Society C, 43(3), 429–467.

Royston P, Lambert PC (2011). Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model. Stata Press.

Royston P, Parmar MKB (2002). "Flexible Parametric Proportional Hazards and Proportional Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects." Statistics in Medicine, 21(15), 2175–2197.

SAS Institute Inc (2008). SAS/STAT Software, Version 9.2. Cary, NC. URL http://www.sas.com/.

StataCorp (2011). "Stata Data Analysis Statistical Software: Release 12." URL http://www.stata.com/.

Stoer J, Bulirsch R (2002). Introduction to Numerical Analysis. 3rd edition. Springer-Verlag.

Therneau T (2012). survival: A Package for Survival Analysis in S. R package version 2.36-14. URL http://CRAN.R-project.org/package=survival.

Weinstein MC, O'Brien B, Hornberger J, Jackson J, Johannesson M, McCabe C, Luce BR (2003). "Principles of Good Practice for Decision Analytic Modeling in Health-Care Evaluation: Report of the ISPOR Task Force on Good Research Practices—Modeling Studies." Value in Health, 6(1), 9–17.
Affiliation:
Michael J. Crowther
Department of Health Sciences
University of Leicester
Leicester, United Kingdom
E-mail: [email protected]
URL: http://www2.le.ac.uk/departments/health-sciences/research/biostats/staff-pages/mjc76/

Paul C. Lambert
Department of Health Sciences
University of Leicester
Leicester, United Kingdom
and
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet
Stockholm, Sweden
E-mail: [email protected]
URL: http://www2.le.ac.uk/Members/pl4/
Journal of Statistical Software http://www.jstatsoft.org/
published by the American Statistical Association http://www.amstat.org/
Volume 53, Issue 12, May 2013. Submitted: 2012-07-09; Accepted: 2013-01-08.
APPENDIX C
Research Paper 5 - Statistics in Medicine
This appendix contains a review of current methods and issues in the joint modelling of
longitudinal and survival data, conducted by the Drug Information Association joint
model working group, which I was invited to join. It has recently been published
in Statistics in Medicine. It is omitted from the e-thesis due to copyright.
APPENDIX D
Research Paper 6 - Statistics in Medicine
This appendix contains a methodological paper published in Statistics in Medicine,
describing the development of the Royston-Parmar-based joint model. It is omitted
from the e-thesis due to copyright.
APPENDIX E
Research Paper 7 - BMC Medical Research Methodology
This appendix contains a paper published in BMC Medical Research Methodology,
describing the novel use of the intercept association structure within a joint model,
described in Chapter 9.
Crowther et al. BMC Medical Research Methodology 2013, 13:146
http://www.biomedcentral.com/1471-2288/13/146

RESEARCH ARTICLE  Open Access

Adjusting for measurement error in baseline prognostic biomarkers included in a time-to-event analysis: a joint modelling approach

Michael J Crowther1*, Paul C Lambert1,2 and Keith R Abrams1
Abstract
Background: Methodological development of joint models of longitudinal and survival data has been rapid in recent years; however, their full potential in applied settings is yet to be fully explored. We describe a novel use of a specific association structure, linking the two component models through the subject-specific intercept, and thus extend joint models to account for measurement error in a biomarker, even when only the baseline value of the biomarker is of interest. This is a common occurrence in registry data sources, where repeated measurements often exist but are simply ignored.
Methods: The proposed specification is evaluated through simulation and applied to data from the General Practice Research Database, investigating the association between baseline systolic blood pressure (SBP) and the time to stroke in a cohort of obese patients with type 2 diabetes mellitus.
Results: By directly modelling the longitudinal component we reduce bias in the hazard ratio for the effect of baseline SBP on the time to stroke, showing the large potential to improve on previous prognostic models which use only observed baseline biomarker values.
Conclusions: The joint modelling of longitudinal and survival data is a valid approach to account for measurement error in the analysis of a repeatedly measured biomarker and a time-to-event. User-friendly Stata software is provided.
Background
Many biomarkers such as systolic blood pressure (SBP) have been identified as key prognostic factors in the development and validation of cardiovascular risk scores [1,2]. However, often only baseline values of these biomarkers are used, despite the existence of repeated measures, especially in registry sources such as the General Practice Research Database (GPRD) [3]. Furthermore, biomarkers are often measured with error. Failing to adjust for such measurement error leads to estimates being biased towards the null [4].

*Correspondence: [email protected]
1University of Leicester, Department of Health Sciences, Adrian Building, University Road, Leicester LE1 7RH, UK
Full list of author information is available at the end of the article

A joint model of longitudinal and survival data allows us to investigate the relationship between a repeatedly measured biomarker, subject to measurement error, such as SBP, and the time to an event of interest, such as time to non-fatal stroke. The approach which has dominated the methodological literature involves linking the two component submodels using shared random effects [5,6]. From a classical perspective, these methods require computationally intensive numerical integration, which is difficult to implement. However, due to the recent introduction of user-friendly software in R [7,8] and Stata [9], these models are starting to find their place in applied research [10,11], but the potential uses and forms of the association parameters, linking the longitudinal and survival components, are yet to be fully explored. Alternatively, many authors have proposed a Bayesian approach,
providing readily available BUGS code to implement the models [12,13].
The most commonly used association structures include the current value parameterisation [5], whereby we directly link the value of the biomarker, as estimated by the longitudinal submodel, to survival, and the first derivative or slope [10], allowing investigation of the effect that the rate of change of the biomarker has on survival.
There is often interest in predicting prognosis based on an initial baseline measurement [1,2]. In this paper we investigate the use of the joint model framework with a random intercept association structure as an approach to adjust for measurement error, inherent in biomarkers such as SBP. By incorporating the repeated measures we thus make the most efficient use of the data available. In particular, as a prognostic model for future patients, we describe how this framework can be used to predict survival for new patients who will only have baseline measurements.
Methods
A joint model of longitudinal and survival data consists of two component submodels: the longitudinal submodel and the survival submodel. We define a set of baseline covariates, U_i, which can potentially differ between submodels. The longitudinal submodel allows us to model the trajectory of a repeatedly measured biomarker over time, adjusting for baseline covariates. The standard approach assumes a linear mixed effects model [14]. We observe

Y_i(t_ij) = W_i(t_ij) + ε_ij,   ε_ij ∼ N(0, σ_e²)   (1)

with

W_i(t_ij) = X_i′(t_ij)β + Z_i′(t_ij)b_i + u_i δ   (2)

where Y_i(t_ij) is the observed longitudinal response for the ith patient measured at the jth time point. W_i(t_ij) is our true unobserved trajectory function consisting of design matrices X_i′(t_ij) and Z_i′(t_ij) for the fixed and random effects, β and b_i, respectively, where b_i ∼ MVN(0, Σ). We can incorporate flexibility here by allowing both X_i′(t_ij) and Z_i′(t_ij) to contain restricted cubic spline functions of measurement time [15]. We also have a vector of baseline covariates u_i ∈ U_i, and corresponding regression coefficients, δ. Finally, ε_ij is our normally distributed measurement error with constant variance σ_e². We further assume that the random effects and error term are independent, and that cov(ε_ij, ε_ik) = 0 (where j ≠ k).
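The random intercept and slope specification of Equations (1)-(2) can be sketched numerically. The following is an illustrative Python sketch, not the paper's Stata implementation; all parameter values, and the choice X = Z = (1, t), are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) fixed effects, baseline covariate effect, and
# random-effects covariance; sigma_e is the measurement-error SD.
beta = np.array([120.0, -0.5])       # fixed intercept and slope
delta = np.array([0.3])              # effect of one baseline covariate u_i
Sigma = np.array([[1.0, 0.1],
                  [0.1, 0.25**2]])   # cov(b_0i, b_1i)
sigma_e = 1.5

def trajectory(t, b_i, u_i):
    """True (error-free) trajectory W_i(t) = X'beta + Z'b_i + u_i*delta,
    here with X = Z = (1, t): a random intercept and random slope."""
    X = np.column_stack([np.ones_like(t), t])
    return X @ beta + X @ b_i + u_i @ delta

# One subject: draw random effects, then observed Y_ij = W_i(t_ij) + e_ij
t = np.array([0.0, 1.0, 2.0, 3.0])
b_i = rng.multivariate_normal(np.zeros(2), Sigma)
u_i = np.array([65.0])
W = trajectory(t, b_i, u_i)
Y = W + rng.normal(0.0, sigma_e, size=t.shape)
```

With annual measurement times, consecutive differences of W recover the subject-specific slope β_1 + b_1i, which is one way to check the design matrices are assembled correctly.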
The time-to-event submodel usually takes the form of a proportional hazards model

h_i(t) = h_0(t) exp(α_1 W_i(t) + φ v_i)   (3)

with h_0(t) the baseline hazard function and v_i ∈ U_i a vector of baseline covariates with corresponding log hazard ratios, φ. The parameter α_1 is commonly named the association parameter, indicating the strength of association between the longitudinal biomarker and the time to event. If α_1 = 0, then the joint model reduces to the two separate models and fitting a joint model will not prove advantageous. This parameterisation assumes the hazard is dependent on the biomarker through its current value. This form of association is one of many ways to link the two component submodels. The baseline hazard function, h_0(t), can be modelled using a parametric distribution, most frequently the Weibull, or less restrictively using flexible parametric survival models [16], or of course can be left unspecified [17]. However, an unspecified baseline hazard function leads to underestimation of the standard errors of parameter estimates [18], and consequently bootstrapping is required to obtain appropriate standard errors.
For illustration, we let W_i(t_ij), the longitudinal submodel, be a linear function of time where the intercept and slope vary between subjects

W_i(t_ij) = (β_0 + b_0i) + (β_1 + b_1i) t_ij   (4)

giving a model with a random intercept and random linear slope. As an alternative way of linking the component models to that of Equation (3), we may link elements of the trajectory function, W_i(t_ij), to the hazard directly. For example, we can link the subject-specific baseline biomarker values through the intercept association structure, where

h_i(t) = h_0(t) exp[α_2(β_0 + b_0i) + φ v_i]   (5)

In this expression α_2 now estimates the strength of the association between the patient-specific baseline biomarker values, as estimated by the longitudinal submodel, and the time-to-event. This way we can let the risk of an event depend directly on the subject-specific value of the biomarker at time t = 0.
If interest lies in prediction when a new patient is observed at baseline, the issue of measurement error can be accounted for through this approach. A benefit of this association structure also lies in the evaluation of the joint likelihood. Under most parametric survival submodels (e.g. the Weibull distribution) and time-dependent association structures (e.g. the current value), numerical quadrature is required to integrate out not only the random effects but, under Equation (3), nested quadrature is also required to evaluate the cumulative hazard function. Under the time-independent association structure of Equation (5), we avoid this nested quadrature as the cumulative hazard function has an analytically tractable form, which provides computational benefits.
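The tractability claim can be checked directly for a Weibull baseline: since the linear predictor is constant in time under Equation (5), H_i(t) = λ t^γ exp(α_2(β_0 + b_0i) + φ v_i) in closed form. Below is a sketch with assumed parameter values, comparing the closed form against Gauss-Legendre quadrature of the hazard (the kind of quadrature a time-dependent association would force on every likelihood evaluation).

```python
import numpy as np

# Assumed illustrative parameter values for the sketch
lam, gam = 0.1, 1.5                   # Weibull baseline: h0(t) = lam*gam*t**(gam-1)
alpha2, beta0, b0i = 0.25, 0.0, 0.4   # intercept association and subject intercept
phi_v = -0.5                          # covariate contribution phi * v_i

eta = alpha2 * (beta0 + b0i) + phi_v  # constant in t under Equation (5)

def hazard(t):
    return lam * gam * t**(gam - 1.0) * np.exp(eta)

t = 5.0
# Closed form: H(t) = lam * t**gam * exp(eta), since eta does not depend on t
H_closed = lam * t**gam * np.exp(eta)

# 15-point Gauss-Legendre quadrature of the hazard over (0, t)
nodes, weights = np.polynomial.legendre.leggauss(15)
u = 0.5 * t * (nodes + 1.0)           # map [-1, 1] -> [0, t]
H_quad = 0.5 * t * np.sum(weights * hazard(u))
```

The two agree to several decimal places, while the closed form costs a single expression per likelihood evaluation.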
Crowther et al. BMC Medical Research Methodology 2013, 13:146 Page 3 of 8http://www.biomedcentral.com/1471-2288/13/146
As discussed in the introduction, this model formulation can be an alternative to the standard approach of using the observed baseline biomarker value

h_i(t) = h_0(t) exp(α_3 Y_0i + φ v_i)   (6)

where Y_0i is the observed baseline biomarker value and α_3 is the log hazard ratio for a one unit increase in the observed baseline biomarker value. Although simple to fit, Equation (6) does not account for potential measurement error in Y_0i.
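The "biased towards the null" behaviour [4] has a simple analogue in linear regression with classical measurement error, where the slope is attenuated by the factor σ_b²/(σ_b² + σ_e²). A hypothetical Python sketch (all values assumed; this is a linear-regression illustration, not the paper's survival setting):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta_true = 0.5
sigma_b, sigma_e = 1.0, 1.0              # SDs of the true value and the error

x_true = rng.normal(0.0, sigma_b, n)               # true biomarker
x_obs = x_true + rng.normal(0.0, sigma_e, n)       # observed with classical error
y = beta_true * x_true + rng.normal(0.0, 1.0, n)   # outcome depends on the truth

# Regressing on the error-prone x attenuates the slope by
# lambda = sigma_b^2 / (sigma_b^2 + sigma_e^2) = 0.5 here
slope_obs = np.polyfit(x_obs, y, 1)[0]
slope_true = np.polyfit(x_true, y, 1)[0]
```

With these values the observed-data slope is roughly half the true slope, mirroring the attenuation of α_3 in Equation (6) relative to the association estimated by the joint model.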
Simulation study
In order to assess the performance of the standard approach of including observed biomarker values, compared to the full joint model described above, we evaluated both through simulation [19]. For ease of exposition we assume a longitudinal model with random intercept and slope, assuming a continuous biomarker of interest with

W_i(t_ij) = (β_0 + b_0i) + (β_1 + b_1i) t_ij

where β_0 = β_1 = 0, and b_0i ∼ N(0, 1), b_1i ∼ N(0, 0.25²), with correlation between (b_0i, b_1i) of 0.25. Observed measurements are then generated from Y_ij ∼ N(W_i(t_ij), σ_e²), where t_ij is the time of the jth measurement for the ith patient. We vary σ_e over 0.1, 0.5 and 1.
We assume a Weibull baseline hazard function with λ = 0.1 and γ = 1.5. A binary variable, X_1, to represent treatment group, was generated from Bin(1, 0.5), with an associated log hazard ratio of φ_1 = −0.5. A continuous covariate, X_2, to represent age at baseline, was generated from N(65, 12) with an associated log hazard ratio of φ_2 = 0.01. We then generate survival times from a Weibull distribution where the hazard is defined as h(t) = h_0(t) exp(α_2 β_0i + φ_1 X_1 + φ_2 X_2), with α_2 the association parameter, indicating the effect of a one unit increase in the value of the subject-specific intercept on the risk of event. We vary α_2 = −0.5, −0.25, 0.25, 0.5. Each simulation contained 300 patients with up to 5 annual measurements (including baseline), and administrative censoring at 5 years. This corresponds to an approximate 18.9% survival proportion at 5 years (calculated at the mean of covariate values, X_1 = 1/2, X_2 = 65 and β_0i = 0).
To each dataset we fit a Weibull proportional hazards model including the observed baseline measurement, and a Weibull-based joint model with the random intercept association structure. We adjust for age and treatment in the survival submodel. Each scenario is simulated 1000 times.
To illustrate the varying measurement error standard deviations used in the simulation scenarios, we show in Figure 1 observed longitudinal measurements from the same 100 patients with σ_e = 0.1, 0.5, 1, and when α = 0.25. Figure 1 illustrates that as the measurement error standard deviation increases, the variability in the observed biomarker values increases.
The GPRD cohort
The General Practice Research Database (GPRD) Group has obtained ethical approval from a Multi-centre Research Ethics Committee (MREC) for all purely observational research using GPRD data; namely, studies which do not include patient involvement. The core work of the GPRD is covered by MREC approval granted by the Trent Multi-centre Research Ethics Committee (REC reference number 05/MRE04/87) and this study was approved by the GPRD Independent Scientific Advisory Committee (ISAC) (Protocol number 09_094). This study is based in part on data from the GPRD obtained under licence from the UK Medicines and Healthcare products Regulatory Agency (MHRA). However, the interpretation and conclusions contained in this study are those of the authors alone.
The example cohort used to illustrate the methods in this paper consists of 4,850 obese patients diagnosed with type 2 diabetes mellitus. We have 107,347 measurements of SBP, with maximum follow-up of 22 years. There were 278 stroke events observed.
In Figure 2 we show the observed SBP measurements for 9 randomly selected patients, who had at least 10 measurements, illustrating some nonlinear trajectories. To accommodate such nonlinearities we can use restricted cubic splines in the linear mixed effects submodel. In particular, we specify the following longitudinal submodel
W_i(t_ij) = (β_0 + b_0i) + β_1 age_i + β_2 sex_i + β_3 BMI_i + (β_F s_F(t_ij; k_F) + b_R s_R(t_ij; k_R))   (7)

where s_F(t_ij; k_F) is the restricted cubic spline basis of measurement time with corresponding fixed effects, β_F, and knot locations k_F, and s_R(t_ij; k_R) is the restricted cubic spline basis of measurement time with corresponding random effects, b_R, and knot locations k_R.
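A restricted cubic spline basis of the kind used for s_F and s_R can be written down directly. The sketch below uses the Royston-Parmar style parameterisation (a linear term plus one term per interior knot), constrained to be linear beyond the boundary knots; the knot locations are assumptions for illustration.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: column 1 is x; each interior knot k_j
    contributes (x-k_j)+^3 - lam_j*(x-kmin)+^3 - (1-lam_j)*(x-kmax)+^3,
    with lam_j = (kmax-k_j)/(kmax-kmin), which enforces linearity
    beyond the boundary knots."""
    x = np.asarray(x, dtype=float)
    kmin, kmax = knots[0], knots[-1]
    pos = lambda v: np.maximum(v, 0.0)
    cols = [x]
    for kj in knots[1:-1]:
        lam_j = (kmax - kj) / (kmax - kmin)
        cols.append(pos(x - kj)**3
                    - lam_j * pos(x - kmin)**3
                    - (1.0 - lam_j) * pos(x - kmax)**3)
    return np.column_stack(cols)

knots = [0.0, 1.0, 3.0, 5.0]         # assumed boundary and interior knots
x = np.linspace(-1.0, 7.0, 200)
B = rcs_basis(x, knots)              # 200 x 3: linear term + 2 interior knots
```

A quick check of the linearity constraint is that second differences of every basis column vanish beyond the last knot.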
Preliminary modelling of the longitudinal data can be conducted to guide model selection, in particular, the degrees of freedom for the spline terms capturing the underlying longitudinal trajectory over time.
To allow flexibility in the survival submodel we use the flexible parametric survival model [16,20], which models the baseline log cumulative hazard function using restricted cubic splines. We can once again undertake separate analysis of just the survival data to inform model selection. In particular, we can use the AIC and BIC to guide the selection of the number of degrees of freedom to capture the baseline hazard function, following
[Figure 1: simulated longitudinal responses plotted against time of measurement (years), in three panels with σ_e = 0.1, 0.5 and 1.]
Figure 1 Example simulated observed longitudinal measurements with varying measurement error standard deviation.
[Figure 2: observed SBP plotted against follow-up time (years) for Patients 1 to 9, with predicted SBP overlaid.]
Figure 2 Longitudinal response measurements for SBP for 9 randomly selected patients who had at least 10 measurements. The dashed line represents the fitted longitudinal trajectories based on the joint model.
Rutherford et al. (2013) [21]. Our final joint model is then

log[H_i(t)] = log[H_0(t)] + φ_1 age_i + φ_2 sex_i + φ_3 BMI_i + α_2(β_0 + b_0i)   (8)

where

log[H_0(t)] = s(log(t); γ, k_S)   (9)

where the baseline log cumulative hazard function, log[H_0(t)], is expanded into a restricted cubic spline function of log(t), s(log(t); γ, k_S), with knot locations k_S and coefficient vector, γ. This framework has recently been incorporated into a joint model [22]. In each submodel we adjust for the baseline effects of age, sex and BMI. We fit the joint model with the random intercept association structure shown in Equation (5). For comparison, we also apply the standard flexible parametric survival model, adjusting for observed baseline SBP, age, sex and BMI.
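The survival function implied by modelling on the log cumulative hazard scale is S_i(t) = exp(−exp(·)). A minimal sketch: with one degree of freedom the spline in Equation (9) reduces to s(log t) = γ_0 + γ_1 log t, which is exactly a Weibull model with λ = exp(γ_0) and shape γ_1 (the values below are illustrative, not the fitted GPRD coefficients).

```python
import numpy as np

# One-df special case of the log cumulative hazard spline: g0 + g1*log(t),
# equivalent to a Weibull with lam = exp(g0), gam = g1 (assumed values).
g0, g1 = np.log(0.1), 1.5

def log_cum_hazard(t):
    return g0 + g1 * np.log(t)

def survival(t):
    return np.exp(-np.exp(log_cum_hazard(t)))

t = np.array([1.0, 2.0, 5.0])
S = survival(t)
S_weibull = np.exp(-0.1 * t**1.5)     # direct Weibull form for comparison

# A covariate enters additively on the log cumulative hazard scale, e.g. a
# log hazard ratio of 0.105 multiplies H(t) by exp(0.105) and lowers survival
S_shift = np.exp(-np.exp(log_cum_hazard(t) + 0.105))
```

With more degrees of freedom, extra spline terms in log(t) let the same expression capture non-monotonic baseline hazards while survival predictions remain a single closed-form evaluation.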
Results
Simulation study results
Bias and coverage estimates for the association parameter are presented in Table 1. Under the standard Weibull model, we observe increasing bias in the estimates of the association between baseline biomarker values and survival as the magnitude of the measurement error standard deviation, σ_e, increases. In parallel we observe very poor coverage probabilities under the Weibull approach. For example, with α = 0.5 and σ_e = 1, we observe bias of −0.261 (percentage bias of −52.2%) and coverage of 0.4%. In contrast, under the joint modelling approach we observe minimal bias and coverage probabilities close to 95% across all scenarios.
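The quantities reported in Table 1 can be computed from per-replicate estimates and standard errors as follows. This is a hypothetical sketch with toy inputs (the paper's actual simulation estimates are not reproduced here); `performance` is an illustrative helper name.

```python
import numpy as np

def performance(est, se, true, z=1.96):
    """Bias, percentage bias, MSE and coverage probability of a set of
    simulation estimates, the performance measures shown in Table 1."""
    est, se = np.asarray(est), np.asarray(se)
    bias = est.mean() - true
    pct_bias = 100.0 * bias / true
    mse = np.mean((est - true)**2)
    covered = (est - z * se <= true) & (true <= est + z * se)
    return bias, pct_bias, mse, covered.mean()

# Toy check: unbiased estimates with correct SEs should give ~95% coverage
rng = np.random.default_rng(4)
est = rng.normal(0.5, 0.08, 1000)
res = performance(est, np.full(1000, 0.08), true=0.5)
```

Applied to the Weibull fits in the simulation, the same calculation reproduces the pattern of growing negative bias and collapsing coverage as σ_e increases.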
Analysis of GPRD cohort
We now present the analysis of the GPRD cohort. In all analyses we use SBP/10 so that a unit increase in SBP/10 represents a clinically meaningful 10 unit increase in SBP. Our primary interest is the association between baseline SBP and the risk of stroke. Baseline (t_ij = 0) corresponds to when each patient entered the cohort, i.e. the time of first SBP measurement.
We began by assuming a random intercept and selecting the degrees of freedom for the fixed spline terms using the AIC and BIC. In this case, both selected five degrees of freedom for s_F(t_ij; k_F), with an AIC of 417565.8 and BIC of 417604.1. For the random splines of time we assumed a linear term, which equates to one spline term for s_R(t_ij; k_R). This allows a very flexible form to take into account the variation in SBP over time. We further adjust for age, sex and Body-Mass Index (BMI) at baseline.
For the flexible parametric survival submodel, both AIC and BIC selected two degrees of freedom, with an AIC of 2408.7173 and BIC of 2430.483. If one degree of freedom had been selected, then this would be equivalent to a Weibull survival model.
Results are presented in Table 2. Under the standard flexible parametric survival model we observe a hazard ratio for a ten unit increase in baseline SBP of 1.111 (95% CI: 1.051, 1.172). Under a joint model we observe an increased hazard ratio of 1.198 (95% CI: 1.107, 1.298). The
Table 1 Simulation results of the association parameter, α

                    |            Weibull             |           Joint model
True α   True σ_e   |   Bias   % bias   MSE    CP    |   Bias   % bias   MSE    CP
 0.50      0.1      |  -0.001   -0.2   0.006  94.8   |  0.005    0.9   0.006  95.3
           0.5      |  -0.105  -21.1   0.016  65.4   |  0.005    0.9   0.007  95.6
           1.0      |  -0.261  -52.1   0.071   0.4   |  0.008    1.6   0.012  94.8
 0.25      0.1      |   0.002    1.0   0.005  94.4   |  0.005    2.0   0.006  94.3
           0.5      |  -0.046  -18.5   0.007  89.0   |  0.007    2.7   0.007  94.5
           1.0      |  -0.123  -49.2   0.018  34.1   |  0.010    4.1   0.009  94.8
-0.25      0.1      |   0.003   -1.3   0.006  93.8   |  0.001   -0.2   0.006  94.0
           0.5      |   0.051  -20.6   0.007  87.1   |  0.000   -0.1   0.007  94.2
           1.0      |   0.127  -50.7   0.019  29.7   | -0.002    0.9   0.009  94.6
-0.50      0.1      |   0.000   -0.1   0.006  96.6   | -0.005    1.0   0.006  95.9
           0.5      |   0.104  -20.9   0.015  66.7   | -0.006    1.1   0.007  95.7
           1.0      |   0.260  -52.0   0.070   0.4   | -0.010    2.0   0.012  94.5

MSE, mean square error; CP, coverage probability; σ_e, standard deviation of the measurement error.
Table 2 Results from applying a flexible parametric proportional hazards model adjusting for observed baseline systolic blood pressure, and a full joint model using the intercept association structure

                        |     Standard FPSM        |      Joint model
                        | Coefficient   95% CI     | Coefficient   95% CI
Survival model:
 Baseline SBP/10 (α_2)  |   0.105    0.050, 0.159  |   0.181    0.102, 0.261
 Age (years)            |   0.048    0.036, 0.060  |   0.050    0.038, 0.062
 Sex (male)             |   0.011   -0.233, 0.254  |  -0.010   -0.253, 0.234
 BMI (kg/m2)            |   0.011   -0.015, 0.037  |   0.013   -0.012, 0.039
Longitudinal model:
 Intercept              |     -           -        |  13.006   12.629, 13.382
 Age (years)            |     -           -        |   0.025    0.022, 0.029
 Sex (male)             |     -           -        |  -0.252   -0.332, -0.171
 BMI (kg/m2)            |     -           -        |   0.003   -0.005, 0.011
 RCS1                   |     -           -        |  -0.080   -0.121, -0.039
 RCS2                   |     -           -        |  -0.006   -0.019, 0.006
 RCS3                   |     -           -        |  -0.001   -0.010, 0.007
 RCS4                   |     -           -        |   0.003    0.000, 0.006
 RCS5                   |     -           -        |   0.000   -0.001, 0.001
 σ_e                    |     -           -        |   1.522    1.515, 1.528

FPSM, flexible parametric survival model; RCS, restricted cubic spline.
increased effect using a joint model is consistent with that observed in the simulation study, i.e. that the bias in the standard survival model is towards the null. The fitted trajectories seen in Figure 2 appear to capture the subject-specific measurements well, although some panels appear to only require a linear trend.
We illustrate how the bias from the standard approach increases with SBP in Figure 3, showing predictions from both models for a female patient aged 60, with low (90), medium (130) and high (200) SBP baseline measurements. To quantify the differences, at 10 years under the standard model we observe a survival probability of 0.881 for a SBP of 200, compared to 0.816 under the full joint model.
Discussion
A wealth of patient data is becoming available in registry sources such as the GPRD, providing extensive opportunities to utilise the joint modelling framework. We have shown that by incorporating repeated measures of a biomarker within a unified joint model framework, we reduce bias due to measurement error, even when only the baseline level of the biomarker is predictive of survival. As illustrated in the simulation study, ignoring measurement error in biomarkers such as blood pressure can lead to a marked underestimation of covariate effects. In our application, through the use of restricted cubic splines in the linear mixed effects submodel, we can model highly nonlinear trajectories over time, compared to linear slope models. Furthermore, the flexible parametric survival submodel can also capture complex baseline hazard functions, an important component when predicting survival at the patient level [22].
Given that, to our knowledge, all current cardiovascular risk scores only use baseline measures, with no
[Figure 3: predicted survival probability against follow-up time (years) from the flexible parametric model and the joint model, for SBP = 90, 130 and 200.]
Figure 3 Predicted survival from the flexible parametric survival model and joint model, for a female, aged 60 years, BMI of 30, with SBP of 90, 130 or 200.
adjustment for measurement error, the prospect of utilising this framework to improve prognostic risk scores is substantial. Predicting survival for a new patient using this framework follows naturally, as often only a first baseline biomarker observation will be available. However, such a modelling approach also allows a dynamic risk prediction approach to be adopted, whereby a patient's estimated future risk is updated as each new biomarker value is obtained [23]. Such an approach could enable response to treatment to be monitored and patients counselled accordingly.
In the analysis of the GPRD cohort, we incorporated flexibility in both the longitudinal submodel, through the use of restricted cubic splines, and the flexible parametric survival submodel. Given that both submodels require choosing the number of degrees of freedom, a simple sensitivity analysis can be undertaken to assess knot locations and number of knots. We showed recently that the flexible parametric survival submodel is very robust to both knot placement and number of knots within a joint model framework [22], and furthermore, an extensive simulation study conducted by Rutherford et al. (2013) showed excellent performance of the flexible parametric model in capturing simple and complex baseline hazard functions [21]. Furthermore, given that primary interest was in the survival component and the estimate of association, modelling the longitudinal component with a sensible functional form will often provide an improved estimate compared to the simplistic approach of separate modelling.
In this paper we have concentrated on a specific association structure linking the two component submodels; however, it may be of interest to investigate linking multiple components of a biomarker's trajectory to the time to an event of interest. For example, recent work by Rothwell et al. (2010) [24] has shown that not only baseline blood pressure, but also its variability over time, is an important predictor of cardiovascular events. Furthermore, we have only compared the standard approach of adjusting for observed baseline biomarker values to the full joint model. It would be of interest to compare alternative approaches for adjusting for measurement error, not only in baseline biomarkers, but also under a time-dependent association structure [25,26].
Extensions to the modelling framework include incorporating multiple biomarkers. In particular, in our example we modelled SBP over time, whilst adjusting for baseline BMI. It may be of interest to model not only SBP but also the inter-relationships between different biomarkers such as BMI, and how they are related to an event of interest [13].
To facilitate the use of the methods in practice, user-friendly Stata software, written by the first author, is available, with a variety of survival model choices and association structures, including those discussed in this article [9,27]. To illustrate the computational aspects of the framework, the presented joint model applied to the cohort took just over 13 minutes to converge on a standard laptop computer.
Conclusion
The joint modelling of longitudinal and survival data is a valid approach to account for measurement error in the analysis of a repeatedly measured biomarker and a time to event. User-friendly Stata software is provided.
Competing interests
The authors declare that they have no competing interests.

Authors' contributions
All authors were involved in conception and design of the project. MJC conducted the simulation study, analysed the clinical dataset and wrote the first draft of the manuscript. PCL and KRA both revised the manuscript. All authors read and approved the final manuscript.

Acknowledgements
MJC is funded by a National Institute for Health Research (NIHR) Doctoral Fellowship (DRF-2012-05-409) and KRA is partially supported as an NIHR Senior Investigator (NI-51-0508-10061). The cohort of obese patients with type 2 diabetes mellitus was obtained from the General Practice Research Database (GPRD) under Independent Scientific Advisory Committee (ISAC)-approved Protocol 09_094, which was funded by a National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme Project Grant (07/85/02). The authors would like to thank Monica Hernández for initial preparatory work on the data from the GPRD cohort of patients, Roberta Ara for useful initial discussions regarding anti-obesity treatment, and finally the reviewers whose comments greatly improved the paper.

Author details
1University of Leicester, Department of Health Sciences, Adrian Building, University Road, Leicester LE1 7RH, UK. 2Karolinska Institutet, Department of Medical Epidemiology and Biostatistics, Box 281, S-171 77 Stockholm, Sweden.

Received: 8 July 2013  Accepted: 4 November 2013  Published: 1 December 2013
References
1. Conroy RM, Pyorala K, Fitzgerald AP, Sans S, Menotti A, De Backer G, De Bacquer D, Ducimetiere P, Jousilahti P, Keil U, Njolstad I, Oganov RG, Thomsen T, Tunstall-Pedoe H, Tverdal A, Wedel H, Whincup P, Wilhelmsen L, Graham IM, SCORE project group: Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003, 24(11):987–1003.
2. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P: Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ 2007, 335(7611):136. [http://dx.doi.org/10.1136/bmj.39261.471806.55]
3. Ara R, Blake L, Gray L, Hernandez M, Crowther M, Dunkley A, Warren F, Jackson R, Rees A, Stevenson M, Abrams K, Cooper N, Davies M, Khunti K, Sutton A: What is the clinical effectiveness and cost-effectiveness of using drugs in treating obese patients in primary care? A systematic review. Health Technol Assess 2012, 16(5):1–202. [http://dx.doi.org/10.3310/hta16050]
4. Prentice RL: Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 1982, 69(2):331–342. [http://www.jstor.org/stable/2335407]
5. Wulfsohn MS, Tsiatis AA: A joint model for survival and longitudinal data measured with error. Biometrics 1997, 53:330–339.
6. Henderson R, Diggle P, Dobson A: Joint modelling of longitudinal measurements and event time data. Biostatistics 2000, 1(4):465–480.
7. Rizopoulos D: JM: An R package for the joint modelling of longitudinal and time-to-event data. J Stat Softw 2010, 35(9):1–33. [http://www.jstatsoft.org/v35/i09]
8. Philipson P, Sousa I, Diggle P, Williamson P, Kolamunnage-Dona R, Henderson R: joineR - Joint modelling of repeated measurements and time-to-event data. 2012. [http://cran.r-project.org/web/packages/joineR/index.html]
9. Crowther MJ, Abrams KR, Lambert PC: Joint modeling of longitudinal and survival data. Stata J 2013, 13:165–184.
10. Wolbers M, Babiker A, Sabin C, Young J, Dorrucci M, Chêne G, Mussini C, Porter K, Bucher HC, CASCADE: Pretreatment CD4 cell slope and progression to AIDS or death in HIV-infected patients initiating antiretroviral therapy–the CASCADE collaboration: a collaboration of 23 cohort studies. PLoS Med 2010, 7(2):e1000239. [http://dx.doi.org/10.1371/journal.pmed.1000239]
11. Ibrahim JG, Chu H, Chen LM: Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol 2010, 28(16):2796–2801.
12. Guo X, Carlin BP: Separate and joint modeling of longitudinal and event time data using standard computer packages. Am Stat 2004, 58:16–24. [http://www.jstor.org/stable/27643494]
13. Rizopoulos D, Ghosh P: A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Stat Med 2011, 30(12):1366–1380.
14. Laird NM, Ware JH: Random-effects models for longitudinal data. Biometrics 1982, 38(4):963–974.
15. Durrleman S, Simon R: Flexible regression models with cubic splines. Stat Med 1989, 8(5):551–561.
16. Royston P, Lambert PC: Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model. College Station: Stata Press; 2011.
17. Cox DR: Regression models and life-tables. J R Stat Soc Ser B (Methodological) 1972, 34(2):187–220.
18. Hsieh F, Tseng YK, Wang JL: Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics 2006, 62(4):1037–1043. [http://dx.doi.org/10.1111/j.1541-0420.2006.00570.x]
19. Burton A, Altman DG, Royston P, Holder RL: The design of simulation studies in medical statistics. Stat Med 2006, 25(24):4279–4292.
20. Royston P, Parmar MKB: Flexible parametric proportional hazards and proportional odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med 2002, 21(15):2175–2197.
21. Rutherford MJ, Crowther MJ, Lambert PC: The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. J Stat Comput Simul 2013. [http://www.tandfonline.com/doi/abs/10.1080/00949655.2013.845890]
22. Crowther MJ, Abrams KR, Lambert PC: Flexible parametric joint modelling of longitudinal and survival data. Stat Med 2012, 31(30):4456–4471. [http://dx.doi.org/10.1002/sim.5644]
23. Rizopoulos D: Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 2011, 67(3):819–829. [http://dx.doi.org/10.1111/j.1541-0420.2010.01546.x]
24. Rothwell PM, Howard SC, Dolan E, O'Brien E, Dobson JE, Dahlöf B, Sever PS, Poulter NR: Prognostic significance of visit-to-visit variability, maximum systolic blood pressure, and episodic hypertension. Lancet 2010, 375(9718):895–905. [http://dx.doi.org/10.1016/S0140-6736(10)60308-X]
25. Zucker DM: A pseudo partial likelihood method for semiparametric survival regression with covariate errors. J Am Stat Assoc 2005, 100(472):1264–1277. [http://www.tandfonline.com/doi/abs/10.1198/016214505000000538]
26. Liao X, Zucker DM, Li Y, Spiegelman D: Survival analysis with error-prone time-varying covariates: a risk set calibration approach. Biometrics 2011, 67:50–58. [http://dx.doi.org/10.1111/j.1541-0420.2010.01423.x]
27. Crowther MJ: STJM: Stata module to fit shared parameter joint models of longitudinal and survival data. Stat Softw Components, Boston Coll Dep Econ 2012. [http://ideas.repec.org/c/boc/bocode/s457339.html]
doi:10.1186/1471-2288-13-146. Cite this article as: Crowther et al.: Adjusting for measurement error in baseline prognostic biomarkers included in a time-to-event analysis: a joint modelling approach. BMC Medical Research Methodology 2013, 13:146.
APPENDIX F
Research Paper 8 - The Stata Journal
This appendix contains a paper published in The Stata Journal, describing the Stata package stjm, which implements the joint models described in this thesis, among others. The paper is omitted from the e-thesis due to copyright.
References
H. Akaike. Information theory and an extension of the maximum likelihood principle,
volume 1, pages 267–281. Akademiai Kiado, 1973.
P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer, New York, 1993.
T. M. Andersson, P. W. Dickman, S. Eloranta, and P. C. Lambert. Estimating and
modelling cure in population-based cancer studies within the framework of flexible
parametric survival models. BMC Med Res Methodol, 11:96, 2011.
R. Ara, L. Blake, L. Gray, M. Hernandez, M. Crowther, A. Dunkley, F. Warren, R. Jackson, A. Rees, M. Stevenson, K. Abrams, N. Cooper, M. Davies, K. Khunti, and A. Sutton. What is the clinical effectiveness and cost-effectiveness of using drugs in treating obese patients in primary care? A systematic review. Health Technol Assess, 16(5):1–202, Feb 2012.
P. C. Austin. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat Med, 31(29):3946–3958, 2012.
C. B. Begg and D. Schrag. Attribution of deaths following cancer treatment. J Natl
Cancer Inst, 94(14):1044–1045, Jul 2002.
A. Belot, M. Abrahamowicz, L. Remontet, and R. Giorgi. Flexible modeling of competing risks in survival analysis. Stat Med, 29(23):2453–2468, 2010.
R. Bender, T. Augustin, and M. Blettner. Generating survival times to simulate Cox
proportional hazards models. Stat Med, 24(11):1713–1723, 2005.
J. Beyersmann, A. Latouche, A. Buchholz, and M. Schumacher. Simulating competing
risks data in survival analysis. Stat Med, 28(6):956–971, 2009.
K. Bhaskaran, O. Hamouda, M. Sannes, F. Boufassa, A. M. Johnson, P. C. Lambert,
K. Porter, and CASCADE Collaboration. Changes in the risk of death after HIV
seroconversion compared with mortality in the general population. JAMA, 300(1):
51–59, 2008.
L. J. Billingham and K. R. Abrams. Simultaneous analysis of quality of life and survival
data. Stat Methods Med Res, 11(1):25–48, 2002.
P. Bolard, C. Quantin, M. Abrahamowicz, J. Esteve, R. Giorgi, H. Chadha-Boreham,
C. Binquet, and J. Faivre. Assessing time-by-covariate interactions in relative survival
models using restrictive cubic spline functions. J Cancer Epidemiol Prev, 7(3):113–
122, 2002.
A. H. Briggs. Handling uncertainty in cost-effectiveness models. Pharmacoeconomics,
17(5):479–500, 2000.
E. R. Brown and J. G. Ibrahim. A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics, 59(2):221–228, 2003a.
E. R. Brown and J. G. Ibrahim. Bayesian approaches to joint cure-rate and longitudinal
models with applications to cancer vaccine trials. Biometrics, 59(3):686–693, 2003b.
E. R. Brown, J. G. Ibrahim, and V. DeGruttola. A flexible B-spline model for multiple
longitudinal biomarkers and survival. Biometrics, 61(1):64–73, 2005.
A. Burton, D. G. Altman, P. Royston, and R. L. Holder. The design of simulation
studies in medical statistics. Stat Med, 25(24):4279–4292, 2006.
V. Carstairs and R. Morris. Deprivation and health in Scotland. Aberdeen University
Press, Aberdeen, 1991.
B. Carstensen. Who needs the Cox model anyway? Technical report, Steno Diabetes
Center, Denmark, 2004.
Y. B. Cheung, F. Gao, and K. S. Khoo. Age at diagnosis and the choice of survival
analysis methods in cancer epidemiology. J Clin Epidemiol, 56(1):38–43, 2003.
T. G. Clark, M. J. Bradburn, S. B. Love, and D. G. Altman. Survival analysis part I: basic concepts and first analyses. Br J Cancer, 89(2):232–238, 2003.
A. Clarke, R. Pulikottil-Jacob, A. Grove, K. Freeman, H. Mistry, A. Tsertsvadze, M. Connock, R. Court, N.-B. Kandala, M. Costa, G. Suri, D. Metcalfe, M. Crowther, S. Morrow, S. Johnson, and P. Sutcliffe. Total hip replacement and surface replacement for the treatment of pain and disability resulting from end stage arthritis of the hip (Review of technology appraisal guidance 2 and 44). Warwick Evidence, 2013.
D. Clayton and M. Hills. Statistical Methods in Epidemiology. Oxford University Press,
1993.
M. P. Coleman, P. Babb, P. Damiecki, P. Grosclaude, S. Honjo, J. Jones, G. Knerer,
A. Pitard, M. Quinn, A. Sloggett, and B. De Stavola. Cancer Survival Trends in
England and Wales, 1971–1995: Deprivation and NHS Region. Number 61 in Studies
in Medical and Population Subjects. London: The Stationery Office, 1999.
D. Collett. Modelling Survival Data in Medical Research. Chapman & Hall, 2003.
R. M. Conroy, K. Pyorala, A. P. Fitzgerald, S. Sans, A. Menotti, G. De Backer, D. De Bacquer, P. Ducimetiere, P. Jousilahti, U. Keil, I. Njolstad, R. G. Oganov, T. Thomsen, H. Tunstall-Pedoe, A. Tverdal, H. Wedel, P. Whincup, L. Wilhelmsen, I. M. Graham, and the SCORE project group. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J, 24(11):987–1003, 2003.
C. Cox, H. Chu, M. F. Schneider, and A. Munoz. Parametric Survival Analysis and
Taxonomy of Hazard Functions for the Generalized Gamma Distribution. Stat Med,
26(23):4352–4374, 2007.
D. R. Cox. Regression models and life-tables. J Roy Statist Soc Ser B, 34(2):187–220,
1972.
D. R. Cox. Partial likelihood. Biometrika, 62(2):269–276, 1975.
M. J. Crowther. SURVSIM: Stata module to simulate complex survival data. Statistical
Software Components, Boston College Department of Economics, 2011.
M. J. Crowther. STJM: Stata module to fit shared parameter joint models of longitudinal and survival data. Statistical Software Components, Boston College Department of Economics, 2012.
M. J. Crowther and P. C. Lambert. STMIX: Stata module to fit two-component parametric mixture survival models. Statistical Software Components, Boston College Department of Economics, 2011.
M. J. Crowther and P. C. Lambert. Simulating complex survival data. Stata J, 12(4):
674–687, 2012.
M. J. Crowther and P. C. Lambert. Simulating biologically plausible complex survival
data. Stat Med, 32(23):4118–4134, 2013a.
M. J. Crowther and P. C. Lambert. stgenreg: A Stata package for the general parametric analysis of survival data. J Stat Softw, 53(12), 2013b.
M. J. Crowther and P. C. Lambert. A general framework for parametric survival
analysis. Stat Med, In Press, 2014.
M. J. Crowther, K. R. Abrams, and P. C. Lambert. Flexible parametric joint modelling
of longitudinal and survival data. Stat Med, 31(30):4456–4471, 2012a.
M. J. Crowther, R. D. Riley, J. A. Staessen, J. Wang, F. Gueyffier, and P. C. Lambert. Individual patient data meta-analysis of survival data using Poisson regression models. BMC Med Res Methodol, 12(34), 2012b.
M. J. Crowther, K. R. Abrams, and P. C. Lambert. Joint modeling of longitudinal and
survival data. Stata J, 13(1):165–184, 2013a.
M. J. Crowther, P. C. Lambert, and K. R. Abrams. Adjusting for measurement error in
baseline prognostic biomarkers included in a time-to-event analysis: A joint modelling
approach. BMC Med Res Methodol, 13(146), 2013b.
M. J. Crowther, M. P. Look, and R. D. Riley. Multilevel mixed effects parametric survival models using adaptive Gauss-Hermite quadrature with application to recurrent events and individual participant data meta-analysis. Stat Med, 2014.
N. Demiris, D. Lunn, and L. D. Sharples. Survival extrapolation using the poly-Weibull model. Stat Methods Med Res, 2011.
E. Deslandes and S. Chevret. Assessing surrogacy from the joint modelling of multivariate longitudinal data and survival: application to clinical trial data on chronic lymphocytic leukaemia. Stat Med, 26(30):5411–5421, 2007.
P. W. Dickman, A. Sloggett, M. Hills, and T. Hakulinen. Regression models for relative
survival. Stat Med, 23(1):51–64, 2004.
P. Diggle, P. Heagerty, K.-Y. Liang, and S. Zeger. Analysis of Longitudinal Data.
Oxford University Press, New York, 2nd edition, 2002.
P. J. Diggle, I. Sousa, and A. G. Chetwynd. Joint modelling of repeated measurements and time-to-event outcomes: the fourth Armitage lecture. Stat Med, 27(16):2981–2998, 2008.
S. Durrleman and R. Simon. Flexible Regression Models with Cubic Splines. Stat Med,
8(5):551–561, 1989.
S. Eloranta, P. C. Lambert, T. M. Andersson, K. Czene, P. Hall, M. Bjorkholm, and
P. W. Dickman. Partitioning of excess mortality in population-based cancer patient
survival studies using flexible parametric survival models. BMC Med Res Methodol,
12(86), 2012.
K. Fall, F. Stromberg, J. Rosell, O. Andren, E. Varenhorst, and the SERPC Group. Reliability of death certificates in prostate cancer patients. Scand J Urol Nephrol, 42(4):352–357, 2008.
C. L. Faucett and D. C. Thomas. Simultaneously modelling censored survival data
and repeatedly measured covariates: a Gibbs sampling approach. Stat Med, 15(15):
1663–1685, 1996.
A. Gelfand and D. K. Dey. Bayesian model choice: asymptotics and exact calculations.
J Roy Statist Soc Ser B, 54:501–514, 1994.
R. Giorgi, M. Abrahamowicz, C. Quantin, P. Bolard, J. Esteve, J. Gouvernet, and
J. Faivre. A relative survival regression model using B-spline functions to model
non-proportional hazards. Stat Med, 22(17):2767–2784, 2003.
A. L. Gould, M. E. Boye, M. J. Crowther, J. G. Ibrahim, G. Quartey, S. Micallef, and
F. Y. Bois. Joint modeling of survival and longitudinal non-survival data: current
methods and issues. Report of the DIA Bayesian joint modeling working group. Stat
Med, 2014.
W. Gould, J. Pitblado, and B. Poi. Maximum Likelihood Estimation with Stata. Stata Press, 4th edition, 2010.
GPRD. General Practice Research Database. 2011.
P. M. Grambsch and T. M. Therneau. Proportional hazards tests and diagnostics based
on weighted residuals. Biometrika, 81:515–526, 1994.
X. Guo and B. P. Carlin. Separate and joint modeling of longitudinal and event time
data using standard computer packages. Am Stat, 58(1):16–24, 2004.
G. Han, M. J. Schell, and J. Kim. Improved survival modeling in cancer research using
a reduced piecewise exponential approach. Stat Med, 33(1):59–73, 2014.
R. Henderson, P. Diggle, and A. Dobson. Joint modelling of longitudinal measurements
and event time data. Biostatistics, 1(4):465–480, 2000.
R. Henderson, P. Diggle, and A. Dobson. Identification and efficacy of longitudinal
markers for survival. Biostatistics, 3(1):33–50, 2002.
S. R. Hinchliffe and P. C. Lambert. Flexible parametric modelling of cause-specific
hazards to estimate cumulative incidence functions. BMC Med Res Methodol, 13:13,
2013.
J. Hippisley-Cox, C. Coupland, Y. Vinogradova, J. Robson, M. May, and P. Brindle.
Derivation and validation of QRISK, a new cardiovascular disease risk score for the
United Kingdom: prospective open cohort study. BMJ, 335(7611):136, 2007.
F. Hsieh, Y.-K. Tseng, and J.-L. Wang. Joint modeling of survival and longitudinal
data: likelihood approach revisited. Biometrics, 62(4):1037–1043, 2006.
X. Huang, L. A. Stefanski, and M. Davidian. Latent-model robustness in joint models
for a primary endpoint and a longitudinal process. Biometrics, 65(3):719–727, 2009.
P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 221–233. University of California Press, 1967.
J. G. Ibrahim, M. Chen, and D. Sinha. Bayesian methods for joint modeling of longitudinal and survival data with applications to cancer vaccine trials. Stat Sinica, 14:863–883, 2004.
J. G. Ibrahim, H. Chu, and L. M. Chen. Basic concepts and methods for joint models
of longitudinal and survival data. J Clin Oncol, 28(16):2796–2801, 2010.
B. Jann. MOREMATA: Stata module (Mata) to provide various functions. Statistical
Software Components, Boston College Department of Economics, 2005.
I. Jatoi, W. F. Anderson, J.-H. Jeong, and C. K. Redmond. Breast cancer adjuvant
therapy: time to consider its time-dependent effects. J Clin Oncol, 29(17):2301–2304,
2011.
H. Jensen, R. Brookmeyer, P. Aaby, and P. K. Andersen. Shared frailty model for
left-truncated multivariate survival data. Technical report, 2004.
D. P. Jutte, L. L. Roos, and M. D. Brownell. Administrative record linkage as a tool
for public health research. Annu Rev Public Health, 32:91–108, 2011.
J. D. Kalbfleisch and R. L. Prentice. Statistical Analysis of Failure Time Data. New
York, Wiley, 2nd edition, 2002.
E. L. Kaplan and P. Meier. Nonparametric Estimation from Incomplete Observations.
Journal of the American Statistical Association, 53(282):457–481, 1958.
N. B. King, S. Harper, and M. E. Young. Use of relative and absolute effect measures
in reporting health inequalities: structured review. BMJ, 345:e5774, 2012.
C. Kooperberg and D. B. Clarkson. Hazard regression with interval-censored data.
Biometrics, 53(4):1485–1494, 1997.
C. Kooperberg, C. J. Stone, and Y. K. Truong. Hazard regression. J Amer Statist Assoc, 90(429):78–94, 1995.
P. C. Lambert and P. Royston. Further development of flexible parametric models for
survival analysis. Stata J, 9:265–290, 2009.
P. C. Lambert, P. W. Dickman, C. P. Nelson, and P. Royston. Estimating the crude
probability of death due to cancer and other causes using relative survival models.
Stat Med, 29(7-8):885–895, 2010a.
P. C. Lambert, P. W. Dickman, C. L. Weston, and J. R. Thompson. Estimating the
cure fraction in population-based cancer studies by using finite mixture models. J
Roy Statist Soc Ser C, 59(1):35–55, 2010b.
P. C. Lambert, L. Holmberg, F. Sandin, F. Bray, K. M. Linklater, A. Purushotham,
D. Robinson, and H. Møller. Quantifying differences in breast cancer survival between
England and Norway. Cancer Epidemiol, 35(6):526–533, 2011.
N. R. Latimer. Survival analysis for economic evaluations alongside clinical trials–
extrapolation with patient-level data: inconsistencies, limitations, and a practical
guide. Med Decis Making, 33(6):743–754, 2013.
N. R. Latimer, K. R. Abrams, P. C. Lambert, M. J. Crowther, A. J. Wailoo, J. P. Morden, R. L. Akehurst, and M. J. Campbell. Adjusting survival time estimates to account for treatment switching in randomized controlled trials–an economic evaluation context: Methods, limitations, and recommendations. Med Decis Making, 34(3):387–402, 2014.
L. M. Leemis. Variate generation for accelerated life and proportional hazards models. Operations Research, 35(6):892–894, 1987.
E. Lesaffre and B. Spiessens. On the effect of the number of quadrature points in a
logistic random effects model: an example. J Roy Statist Soc Ser C, 50(3):325–335,
2001.
X. Liao, D. M. Zucker, Y. Li, and D. Spiegelman. Survival analysis with error-prone
time-varying covariates: a risk set calibration approach. Biometrics, 67(1):50–58,
2011.
D. Y. Lin and L. J. Wei. The robust inference for the Cox proportional hazards model.
J Amer Statist Assoc, 84(408):1074–1078, 1989.
I. L. MacDonald. Numerical maximisation of likelihood: A neglected alternative to
EM? Int Stat Rev, 2014.
T. Mackenzie and M. Abrahamowicz. Marginal and hazard ratio specific random data
generation: Applications to semi-parametric bootstrapping. Stat Comp, 12:245–252,
2002.
J. J. McArdle, B. J. Small, L. Backman, and L. Fratiglioni. Longitudinal models of growth and survival applied to the early detection of Alzheimer's disease. J Geriatr Psychiatry Neurol, 18(4):234–241, 2005.
G. J. McLachlan and D. C. McGiffin. On the role of finite mixture models in survival
analysis. Stat Methods Med Res, 3(3):211–226, 1994.
B. Miladinovic, A. Kumar, R. Mhaskar, S. Kim, R. Schonwetter, and B. Djulbegovic. A
flexible alternative to the Cox proportional hazards model for assessing the prognostic
accuracy of hospice patient survival. PLoS One, 7(10):e47804, 2012.
T. S. Mok, Y.-L. Wu, S. Thongprasert, C.-H. Yang, D.-T. Chu, N. Saijo, P. Sunpaweravong, B. Han, B. Margono, Y. Ichinose, Y. Nishiwaki, Y. Ohe, J.-J. Yang, B. Chewaskulyong, H. Jiang, E. L. Duffield, C. L. Watkins, A. A. Armour, and M. Fukuoka. Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma. N Engl J Med, 361(10):947–957, 2009.
J. P. Morden, P. C. Lambert, N. Latimer, K. R. Abrams, and A. J. Wailoo. Assessing
methods for dealing with treatment switching in randomised controlled trials: a
simulation study. BMC Med Res Methodol, 11:4, 2011.
P. A. Murtaugh, E. R. Dickson, G. M. Van Dam, M. Malinchoc, and P. M. Grambsch. Primary biliary cirrhosis: Prediction of short-term survival based on repeated patient visits. Hepatology, 20:126–134, 1994.
J. C. Naylor and A. F. M. Smith. Applications of a method for the efficient computation
of posterior distributions. Applied Statistics, 31(3):214–225, 1982.
C. P. Nelson, P. C. Lambert, I. B. Squire, and D. R. Jones. Flexible Parametric Models
for Relative Survival, with Application in Coronary Heart Disease. Stat Med, 26(30):
5486–5498, 2007.
P. Philipson, I. Sousa, P. Diggle, P. Williamson, R. Kolamunnage-Dona, and R. Henderson. joineR - Joint Modelling of Repeated Measurements and Time-to-Event Data, 2012.
A. V. Piccorelli and M. D. Schluchter. Jointly modeling the relationship between
longitudinal and survival data subject to left truncation with applications to cystic
fibrosis. Stat Med, 31(29):3931–3945, 2012.
J. C. Pinheiro and D. M. Bates. Approximations to the log-likelihood function in the nonlinear mixed-effects model. J Comput Graph Statist, 4(1):12–35, 1995.
R. L. Prentice. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69(2):331–342, 1982.
R. L. Prentice, B. J. Williams, and A. V. Peterson. On the regression analysis of
multivariate failure time data. Biometrika, 68:373–379, 1981.
C. Proust-Lima and J. M. G. Taylor. Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics, 10(3):535–549, 2009.
H. Putter, M. Fiocco, and R. B. Geskus. Tutorial in biostatistics: competing risks and
multi-state models. Stat Med, 26(11):2389–2430, 2007.
R Development Core Team. R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria, 2012. ISBN 3-900051-07-0.
S. Rabe-Hesketh and A. Skrondal. Multilevel and Longitudinal Modeling Using Stata
Volume I: Continuous Responses. Stata Press, 3rd edition, 2012.
S. Rabe-Hesketh, A. Skrondal, and A. Pickles. Reliable estimation of generalized linear
mixed models using adaptive quadrature. Stata J, 2:1–21, 2002.
I. Rashid, L. Marcheselli, and M. Federico. Estimating survival in newly diagnosed
cancer patients: use of computer simulations to evaluate performances of different
approaches in a wide range of scenarios. Stat Med, 27(12):2145–2158, 2008.
N. Reid. A conversation with Sir David Cox. Statistical Science, 9:439–455, 1994.
L. Remontet, N. Bossard, A. Belot, J. Esteve, and FRANCIM. An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies. Stat Med, 26(10):2214–2228, 2007.
D. Rizopoulos. JM: An R Package for the Joint Modelling of Longitudinal and Time-
to-Event Data. J Stat Softw, 35(9):1–33, 7 2010.
D. Rizopoulos. Dynamic predictions and prospective accuracy in joint models for lon-
gitudinal and time-to-event data. Biometrics, 67(3):819–829, 2011.
D. Rizopoulos. Joint Models for Longitudinal and Time-to-Event Data With Applica-
tions in R. Chapman & Hall, 2012.
D. Rizopoulos and P. Ghosh. A Bayesian semiparametric multivariate joint model for
multiple longitudinal outcomes and a time-to-event. Stat Med, 30(12):1366–1380,
2011.
D. Rizopoulos, G. Verbeke, and G. Molenberghs. Shared parameter models under
random effects misspecification. Biometrika, 95(1):63–74, 2008.
D. Rizopoulos, G. Verbeke, and E. Lesaffre. Fully exponential Laplace approximations
for the joint modelling of survival and longitudinal data. J Roy Statist Soc Ser B, 71
(3):637–654, 2009.
D. Rizopoulos, G. Verbeke, and G. Molenberghs. Multiple-imputation-based residuals
and diagnostic plots for joint models of longitudinal and survival outcomes. Biomet-
rics, 66(1):20–29, 2010.
W. H. Rogers. sg17: Regression standard errors in clustered samples. Stata Tech Bull,
13:19–23, 1993.
V. Rondeau, D. Commenges, and P. Joly. Maximum penalized likelihood estimation in
a gamma-frailty model. Lifetime Data Anal, 9(2):139–153, Jun 2003.
V. Rondeau, J.-P. Pignon, and S. Michiels. A joint model for the dependence between
clustered times to tumour progression and deaths: A meta-analysis of chemotherapy
in head and neck cancer. Stat Methods Med Res, 2011.
V. Rondeau, Y. Mazroui, and J. R. Gonzalez. frailtypack: An R Package for the
Analysis of Correlated Survival Data with Frailty Models Using Penalized Likelihood
Estimation or Parametrical Estimation. Journal of Statistical Software, 47(4), 2012.
J. Rooney, S. Byrne, M. Heverin, B. Corr, M. Elamin, A. Staines, B. Goldacre, and O. Hardiman. Survival analysis of Irish amyotrophic lateral sclerosis patients diagnosed from 1995-2010. PLoS One, 8(9):e74733, 2013.
P. M. Rothwell, S. C. Howard, E. Dolan, E. O'Brien, J. E. Dobson, B. Dahlof, P. S. Sever, and N. R. Poulter. Prognostic significance of visit-to-visit variability, maximum systolic blood pressure, and episodic hypertension. Lancet, 375(9718):895–905, 2010.
P. Royston. Tools to simulate realistic censored survival-time distributions. Stata J, 12
(4):639–654, 2012.
P. Royston and D. G. Altman. Regression using fractional polynomials of continuous
covariates: Parsimonious parametric modelling. J Roy Statist Soc Ser C, 43(3):
429–467, 1994.
P. Royston and P. C. Lambert. Flexible Parametric Survival Analysis Using Stata:
Beyond the Cox Model. Stata Press, 2011.
P. Royston and M. K. B. Parmar. Flexible Parametric Proportional Hazards and Proportional Odds Models for Censored Survival Data, with Application to Prognostic Modelling and Estimation of Treatment Effects. Stat Med, 21(15):2175–2197, 2002.
P. Royston and W. Sauerbrei. Multivariable model-building: A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. Wiley, 2008.
M. J. Rutherford, M. J. Crowther, and P. C. Lambert. The use of restricted cubic
splines to approximate complex hazard functions in the analysis of time-to-event
data: a simulation study. J Statist Comput Simulation, 2014.
SAS Institute Inc. SAS/STAT Software, Version 9.2. Cary, NC, 2008.
W. Sauerbrei, P. Royston, and M. Look. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J, 49(3):453–473, 2007.
D. Schoenfeld. Partial Residuals for the Proportional Hazards Regression Model.
Biometrika, 69(1):239–241, 1982.
M. Schumacher, G. Bastert, H. Bojar, K. Hubner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R. L. Neumann, and H. F. Rauschecker. Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group. J Clin Oncol, 12(10):2086–2093, 1994.
S. Self and Y. Pawitan. AIDS Epidemiology: Methodological Issues, chapter Modeling
a marker of disease progression and onset of disease. Birkhauser, Boston, 1992.
F. Siannis, J. Copas, and G. Lu. Sensitivity analysis for informative censoring in
parametric survival models. Biostatistics, 6(1):77–91, 2005.
B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman &
Hall, London, 1992.
StataCorp. Statistical software: Release 12. College Station, TX: StataCorp LP, 2011.
StataCorp. Statistical software: Release 13. College Station, TX: StataCorp LP, 2013.
J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer, 3rd edition, 2002.
M. J. Sweeting and S. G. Thompson. Joint modelling of longitudinal and time-to-event
data with application to predicting abdominal aortic aneurysm growth and rupture.
Biom J, 53(5):750–763, 2011.
M.-P. Sylvestre and M. Abrahamowicz. Comparison of algorithms to generate event
times conditional on time-dependent covariates. Stat Med, 27(14):2618–2634, 2008.
J. M. G. Taylor, Y. Park, D. P. Ankerst, C. Proust-Lima, S. Williams, L. Kestin,
K. Bae, T. Pickles, and H. Sandler. Real-time individual predictions of prostate
cancer recurrence using joint models. Biometrics, 69(1):206–213, 2013.
T. Therneau. A Package for Survival Analysis in S, 2012. R package version 2.36-14.
T. M. Therneau and P. M. Grambsch. Modelling Survival Data: extending the Cox
model. Springer, 2000.
A. C. M. Thiebaut and J. Benichou. Choice of time-scale in Cox’s model analysis of
epidemiologic cohort data: a simulation study. Stat Med, 23(24):3803–3820, 2004.
Y.-K. Tseng, F. Hsieh, and J.-L. Wang. Joint modelling of accelerated failure time and
longitudinal data. Biometrika, 92(3):587–603, 2005.
A. A. Tsiatis and M. Davidian. A semiparametric estimator for the proportional hazards
model with longitudinal covariates measured with error. Biometrika, 88(2):447–458,
2001.
A. A. Tsiatis and M. Davidian. Joint modeling of longitudinal and time-to-event data:
an overview. Stat Sinica, 14:809–834, 2004.
F. Tuerlinckx, F. Rijmen, G. Verbeke, and P. D. Boeck. Statistical inference in generalized linear mixed models: a review. Br J Math Stat Psychol, 59(Pt 2):225–255, 2006.
Y. Wang and J. M. G. Taylor. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J Amer Statist Assoc, 96(455):895–905, 2001.
L. J. Wei, D. Y. Lin, and L. Weissfeld. Regression analysis of multivariate incomplete
failure time data by modeling marginal distributions. J Amer Statist Assoc, 84(408):
1065–1073, 1989.
M. C. Weinstein, B. O'Brien, J. Hornberger, J. Jackson, M. Johannesson, C. McCabe, and B. R. Luce. Principles of Good Practice for Decision Analytic Modeling in Health-Care Evaluation: Report of the ISPOR Task Force on Good Research Practices–Modeling Studies. Value in Health, 6(1):9–17, 2003.
H. White. A heteroskedasticity-consistent covariance matrix estimator and a direct test
for heteroskedasticity. Econometrica, 48(4):817–838, 1980.
H. White. Maximum likelihood estimation of misspecified models. Econometrica, 50:
1–25, 1982.
J. Whitehead. Fitting Cox’s regression model to survival data using GLIM. Applied
Statistics, 29:268–275, 1980.
M. Wolbers, A. Babiker, C. Sabin, J. Young, M. Dorrucci, G. Chene, C. Mussini, K. Porter, H. C. Bucher, and CASCADE. Pretreatment CD4 cell slope and progression to AIDS or death in HIV-infected patients initiating antiretroviral therapy–the CASCADE collaboration: a collaboration of 23 cohort studies. PLoS Med, 7(2):e1000239, 2010.
M. S. Wulfsohn and A. A. Tsiatis. A joint model for survival and longitudinal data
measured with error. Biometrics, 53(1):330–339, 1997.
W. Ye, X. Lin, and J. M. G. Taylor. Semiparametric modeling of longitudinal measurements and time-to-event data–a two-stage regression calibration approach. Biometrics, 64(4):1238–1246, 2008.
M. Yu, N. J. Law, J. M. G. Taylor, and H. Sandler. Joint longitudinal-survival-cure
models and their application to prostate cancer. Stat Sinica, 14:835–862, 2004.
D. M. Zucker. A pseudo partial likelihood method for semiparametric survival regression
with covariate errors. J Amer Statist Assoc, 100(472):1264–1277, 2005.