
Statistics for Survival Data, Day 2

WBL 17–19

Alain Hauser [email protected]

2018-08-27

Alain Hauser Survival Analysis / WBL 17–19 2018-08-27 1 / 176

Part IV

Regression Models


Learning objectives

- Explain the model assumptions behind parametric regression models
- Fit a regression model in R
- Identify the fitted model from an R output, and interpret it
- Assess whether a fitted model is appropriate (model validation)
- Perform model or variable selection using forward or backward search


Section 1

Weibull regression


Repetition: Weibull model in logarithmic time scale

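Recall the Weibull model on the logarithmic time scale:
- Let $T$ be Weibull distributed, and set $Y := \log T$.
- We have seen that $Y$ belongs to a location-scale family.
- More precisely, $Y$ has probability density

$$ f_Y(y) = \alpha\, e^{\alpha (y - \log\lambda)} \exp\left(-e^{\alpha (y - \log\lambda)}\right) = \frac{1}{\sigma} \exp\left(\frac{y - \mu}{\sigma} - \exp\left(\frac{y - \mu}{\sigma}\right)\right) $$

  with $\sigma := 1/\alpha$, $\mu := \log\lambda$.
- Hence we can write $Y = \mu + \sigma Z$, where $Z$ has the standard extreme value distribution: $f_Z(z) = \exp(z - e^z)$.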
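The change of variables can be checked numerically. The following is a minimal sketch in base R; the parameter values and the function names `f_Y` and `f_Y_weib` are made up for this illustration, and R's `dweibull` is assumed to use `shape` for α and `scale` for λ.

```r
# Hypothetical parameter values, chosen only for illustration
alpha <- 1.5
lambda <- 400
sigma <- 1 / alpha
mu <- log(lambda)

# Density of Y = log(T) in location-scale form (formula from the slide)
f_Y <- function(y) (1 / sigma) * exp((y - mu) / sigma - exp((y - mu) / sigma))

# Same density obtained from the Weibull density of T by change of variables:
# f_Y(y) = f_T(exp(y)) * exp(y)
f_Y_weib <- function(y) dweibull(exp(y), shape = alpha, scale = lambda) * exp(y)

y <- seq(2, 8, by = 0.5)
max.diff <- max(abs(f_Y(y) - f_Y_weib(y)))
max.diff   # should be zero up to floating point error
```

Both ways of writing the density agree, which is all the location-scale representation claims.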


Weibull model for two groups

Recall the two-sample problem with the heroin data set:
- We fitted a Weibull model to both clinics, using the same shape, but different scale parameters.
- On the logarithmic time scale, this means we fitted individual $\mu$'s, but a common $\sigma$:
    Clinic 1: $Y_1 = \mu_1 + \sigma Z$
    Clinic 2: $Y_2 = \mu_2 + \sigma Z$
- Introduce a binary explanatory variable $X$, setting $X = 0$ for addicts in clinic 1, and $X = 1$ for addicts in clinic 2 (indicator variable for clinic 2).
- The model can be rewritten as $Y = \beta_0 + \beta_1 X + \sigma Z$, with $\beta_0 = \mu_1$, $\beta_1 = \mu_2 - \mu_1$.


Heroin example: adding more explanatory variables

Apart from the clinic, the heroin data set contains more explanatory variables:
- prison: indicates whether the patient has a prison record or not
- dose: patient's methadone dose (continuous variable)

They could also have an influence on the remission time $T$ (and on $Y = \log T$).
Hence we can extend our Weibull regression model:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \sigma Z, $$

$X_1$: indicator variable for clinic 2, $X_2$: indicator variable for prison record, $X_3$: methadone dose


Heroin example: Weibull regression in R

R syntax is no surprise:

    library(survival)
    addicts.weib.full <- survreg(Surv(survt, status) ~ clinic + prison + dose,
                                 data = addicts, dist = "weibull")
    summary(addicts.weib.full)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ##  Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238


General Weibull regression model I

Definition (Weibull regression model)
Let $T$ be an event time, and $X_1, \ldots, X_p$ explanatory variables. The general Weibull regression model looks as follows:

$$ Y := \log T, \qquad Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z, $$

where $Z$ has the standard extreme value distribution: $f_Z(z) = \exp(z - e^z)$.

- $\mu = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p$ is called the linear predictor.
- The model parameters $\beta_0, \ldots, \beta_p$ and $\sigma$ can be estimated by maximum likelihood.
- Maximum likelihood estimation (MLE) can account for censored data; therefore survival data should always be fitted with survreg, and not, e.g., with glm.


General Weibull regression model II

- The Weibull regression model is normally written on the logarithmic time scale (as before).
- Translated back to the original time scale, the survivor function reads

$$ S(t; x_1, \ldots, x_p) = P[T > t \mid X_1 = x_1, \ldots, X_p = x_p] = \exp\left\{-\exp\left[\alpha\left(\log(t) - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p\right)\right]\right\} $$

- The hazard function is hence

$$ h(t; x_1, \ldots, x_p) = -\frac{\frac{\partial S}{\partial t}(t; x_1, \ldots, x_p)}{S(t; x_1, \ldots, x_p)} = \alpha \exp\left[(\alpha - 1)\log(t) - \alpha(\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p)\right] = \alpha t^{\alpha - 1} \exp\left[-\alpha(\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p)\right] $$
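The intermediate step of this derivation can be written out as follows. This is a sketch in which the linear predictor is abbreviated as $\mu(x)$ (a shorthand introduced only here) and $\alpha = 1/\sigma$ as before.

```latex
\begin{align*}
\mu(x) &:= \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p, \\
\frac{\partial S}{\partial t}(t; x)
  &= -S(t; x) \cdot \frac{\alpha}{t}
     \exp\left[\alpha\left(\log(t) - \mu(x)\right)\right], \\
h(t; x)
  &= \frac{\alpha}{t} \exp\left[\alpha\left(\log(t) - \mu(x)\right)\right]
   = \alpha t^{\alpha - 1} \exp\left[-\alpha \mu(x)\right].
\end{align*}
```

For two subjects who differ by one unit in $x_j$ only, the ratio of the two hazards is therefore $\exp(-\alpha \beta_j)$, independent of $t$. With the rounded estimates of the heroin fit ($\beta_1 \approx 0.709$, $\sigma \approx 0.73$), this gives roughly $\exp(-0.709/0.73) \approx 0.38$ for clinic 2 vs. clinic 1.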

Predicting survival times

Heroin example: consider an addict without a prison record and with a methadone dose of 60 (in units of the data set).

Linear predictor for clinics 1 and 2:

    newdata <- data.frame(clinic = c("1", "2"),
                          prison = c("no", "no"), dose = c(60, 60))
    predict(addicts.weib.full, type = "lp", newdata = newdata)

    ##        1        2
    ## 6.279409 6.988450

Predicted median (50% quantile) of remission time for both clinics:

    predict(addicts.weib.full, type = "quantile",
            newdata = newdata, p = 0.5)

    ##        1        2
    ## 408.2661 829.6140
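The predicted quantiles can be reproduced by hand from the linear predictor. The sketch below assumes the Weibull quantile formula $t_p = \exp(\mu + \sigma \log(-\log(1 - p)))$, which follows from the survivor function of the model, and uses only rounded numbers copied from the outputs above; the variable names are made up for this illustration.

```r
# Rounded values copied from the R outputs above
lp <- c(clinic1 = 6.279409, clinic2 = 6.988450)  # linear predictors
sigma <- exp(-0.31495)                           # Scale = exp(Log(scale))
p <- 0.5

# Weibull p-quantile: solve S(t) = 1 - p for t
q <- exp(lp + sigma * log(-log(1 - p)))
q

# The two medians differ by the factor exp(beta_clinic2),
# whatever the other covariates are
time.ratio <- exp(0.70904)
unname(q["clinic2"] / q["clinic1"])
time.ratio
```

On the original time scale, a coefficient of the Weibull regression thus acts as a multiplicative factor on all quantiles of the remission time.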


Confidence intervals for quantiles

- With the argument se.fit = TRUE, predict also returns the estimated standard errors of the quantiles.
- To calculate confidence intervals for the quantiles, it is better to get quantiles and standard errors on the logarithmic time scale; use type = "uquantile" for this:

    uquant <- predict(addicts.weib.full, type = "uquantile",
                      newdata = newdata, p = 0.75, se.fit = TRUE)
    quant.ci <- data.frame(
        ci.lwr = exp(uquant$fit - qnorm(0.975)*uquant$se.fit),
        est    = exp(uquant$fit),
        ci.upr = exp(uquant$fit + qnorm(0.975)*uquant$se.fit))
    quant.ci

    ##      ci.lwr       est    ci.upr
    ## 1  571.0249  677.0832  802.8401
    ## 2 1001.2688 1375.8618 1890.5968
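A property of intervals built this way is that they are symmetric around the estimate in log time rather than in time, and that their bounds are always positive. A small check with the rounded numbers printed for clinic 1 (no new fit is needed; the variable names are made up for this illustration):

```r
# Rounded values copied from the first row of quant.ci
est <- 677.0832
lwr <- 571.0249
upr <- 802.8401

# Standard error on the log scale, recovered from the interval width
se.log <- (log(upr) - log(lwr)) / (2 * qnorm(0.975))

# Distances to the bounds: equal on the log scale, unequal on the time scale
dist.log  <- c(lower = log(est) - log(lwr), upper = log(upr) - log(est))
dist.time <- c(lower = est - lwr, upper = upr - est)
dist.log
dist.time
```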


Model validation: Tukey-Anscombe and Q-Q plot

Tukey-Anscombe plot:
- Plot of residuals vs. fitted values
- With survival data, we only know the residuals of non-censored individuals
- It usually makes sense to plot residuals and fitted values on the logarithmic time scale
- The plot should show residuals that have a similar distribution over all fitted values (no trend, no cone, etc.)

Q-Q plot of residuals:
- Plot of empirical quantiles of the residuals vs. theoretical quantiles expected under the error distribution (in our case the Weibull distribution, or the extreme value distribution on the logarithmic time scale)
- The Q-Q plot should show a straight line


TA plot and Q-Q plot in R

    library(car)
    par(mfrow = c(1, 2), cex = 0.6)
    # linear predictor and log-scale residuals of the non-censored subjects
    lin.pred <- predict(addicts.weib.full, type = "lp")[addicts$status == 1]
    log.resid <- log(addicts$survt[addicts$status == 1]) - lin.pred
    plot(lin.pred, log.resid, main = "TA plot",
         xlab = "log(fitted values)", ylab = "log(residuals)")
    # Q-Q plot on the original time scale against a Weibull distribution
    qqPlot(exp(log.resid), dist = "weibull",
           shape = 1/addicts.weib.full$scale,
           main = "Q-Q plot", xlab = "Theor. quantiles", ylab = "Emp. quantiles")

[Figure: left panel "TA plot", log(residuals) vs. log(fitted values); right panel "Q-Q plot", empirical vs. theoretical quantiles.]


Model validation for discrete explanatory variables

- Assume we have only discrete explanatory variables.
- Remember from the Weibull model without explanatory variables:

$$ \log(-\log(S)) = \alpha \log(t) - \alpha \log(\lambda) $$

- Here, we have $\log(\lambda) = \mu = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$.
- A plot of $\log(-\log(S))$ vs. $\log(t)$ should show straight, parallel lines:
  - one line per group (= set of subjects sharing the same levels of the explanatory variables)
  - intercept of each group's line given by the values of the explanatory variables
- In practice, we can plot $\log(-\log(S))$ vs. $\log(t)$ with $S$ estimated by the Kaplan-Meier estimator.
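The straight, parallel lines can be illustrated with exact Weibull survivor functions before looking at estimated ones. A minimal sketch in base R, with made-up parameter values for two hypothetical groups:

```r
# Hypothetical common shape and group-specific scales
alpha <- 1.4
lambda <- c(group1 = 300, group2 = 600)
t <- c(10, 50, 100, 500, 1000)

# Slope and intercept of log(-log(S)) regressed on log(t), per group
line.coef <- sapply(lambda, function(l) {
  S <- pweibull(t, shape = alpha, scale = l, lower.tail = FALSE)
  unname(coef(lm(log(-log(S)) ~ log(t))))
})
rownames(line.coef) <- c("intercept", "slope")
line.coef
```

Both groups have slope α; the intercepts are $-\alpha \log(\lambda)$ and hence differ between the groups.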


Graphical model validation: heroin example

Let's include only clinic and prison as explanatory variables, so that the Kaplan-Meier estimator is still reliable in all groups:

    addicts.km.strat <- survfit(Surv(survt, status) ~ clinic + prison,
                                data = addicts)
    addicts.table <- summary(addicts.km.strat)
    plot(log(addicts.table$time), log(-log(addicts.table$surv)),
         xlab = "log(t)", ylab = "log(-log(S))", pch = 20, col = addicts.table$strata)

[Figure: log(-log(S)) vs. log(t), one color per combination of clinic and prison.]


A closer look at the Weibull summary

The summary of a Weibull regression model automatically prints z and p values for the coefficients:

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ##  Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238

How are these p values calculated?


Recall: Fisher information matrix

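- Denote the vector of all model parameters by $\theta$: $\theta = (\beta_0, \beta_1, \ldots, \beta_p, \sigma)$
- Recall (Part III): the MLE $\hat\theta$ is asymptotically normally distributed around the true parameter vector $\theta_0$:

$$ \hat\theta \approx \mathcal{N}\left(\theta_0, I^{-1}(\theta_0)\right) $$

- $I(\theta_0)$ is the Fisher information matrix with entries

$$ \left(I(\theta)\right)_{jk} := -E\left[\frac{\partial^2}{\partial\theta_j\, \partial\theta_k} \log L(\theta)\right] $$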


Confidence intervals and tests for coefficients

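- Consequence 1: a component $\hat\theta_k$ of the estimated parameter vector $\hat\theta$ has standard error

$$ \mathrm{se}(\hat\theta_k) = \sqrt{\left(I(\theta_0)^{-1}\right)_{kk}}, $$

  which can be estimated as

$$ \widehat{\mathrm{se}}(\hat\theta_k) = \sqrt{\left(I(\hat\theta)^{-1}\right)_{kk}} $$

- Consequence 2: an approximate confidence interval for $\theta_k$ at the confidence level $1 - \alpha$ is given by

$$ \left[\hat\theta_k - \Phi^{-1}(1 - \alpha/2) \cdot \widehat{\mathrm{se}}(\hat\theta_k),\ \hat\theta_k + \Phi^{-1}(1 - \alpha/2) \cdot \widehat{\mathrm{se}}(\hat\theta_k)\right] $$

- Consequence 3: under the null hypothesis $H_0: \theta_k = 0$, the test statistic $Z = \hat\theta_k / \widehat{\mathrm{se}}(\hat\theta_k)$ is approximately standard normally distributed.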


Heroin example: manual calculation of p value

To illustrate the Z test, we can "manually" calculate the p value of the variable clinic2:

- Extract the estimated covariance matrix of the parameter vector (i.e. the inverse Fisher information matrix):

    I.inv <- addicts.weib.full$var

- Estimate the standard error of the coefficient for clinic2:

    (se <- sqrt(I.inv[2, 2]))

    ## [1] 0.1572246

- Calculate the Z statistic and the corresponding p value:

    (z <- coef(addicts.weib.full)[2]/se)

    ##  clinic2
    ## 4.509734

    2*(pnorm(abs(z), lower.tail = FALSE))

    ##      clinic2
    ## 6.490885e-06
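The same ingredients also give the confidence interval from Consequence 2. The sketch below does not need the fitted object; it uses the rounded estimate and standard error copied from the summary output, so the last digits can differ slightly from the values above.

```r
# Rounded values copied from the summary of addicts.weib.full (row clinic2)
est <- 0.70904
se  <- 0.15722

z <- est / se                                     # Z statistic
p.value <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p value
ci <- est + c(-1, 1) * qnorm(0.975) * se          # approximate 95% CI
z
p.value
ci
```

The interval does not contain 0, in agreement with the small p value.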


Section 2

Model Selection


Heroin example: remove variable prison

- From the summary of the full Weibull model for the heroin data set, we see that prison is not significant (at the 5% level).
- Can we remove the variable from the model?
- Compare the full and the reduced model with a likelihood ratio test:

    addicts.weib.red <- survreg(Surv(survt, status) ~ clinic + dose,
                                data = addicts, dist = "weibull")
    anova(addicts.weib.red, addicts.weib.full, test = "Chisq")

    ##                    Terms Resid. Df    -2*LL Test Df Deviance   Pr(>Chi)
    ## 1          clinic + dose       234 2172.503      NA       NA         NA
    ## 2 clinic + prison + dose       233 2168.953    =  1 3.549613 0.05955934


Likelihood ratio test I

- The likelihood ratio test can be used to compare arbitrary nested models.
- The test statistic is always

$$ D := -2 \log\left(\frac{\text{likelihood of null model}}{\text{likelihood of alternative model}}\right) $$

- When the alternative (larger) model has $k$ parameters more than the null (smaller) model, the test statistic has the asymptotic distribution

$$ D \approx \chi^2_k $$

  under the null hypothesis.


Heroin example: comparing nested models

We compare two Weibull models that differ by 2 explanatory variables:

    addicts.weib.red2 <- survreg(Surv(survt, status) ~ clinic,
                                 data = addicts, dist = "weibull")
    anova(addicts.weib.red2, addicts.weib.full, test = "Chisq")

    ##                    Terms Resid. Df    -2*LL Test Df Deviance     Pr(>Chi)
    ## 1                 clinic       235 2200.155      NA       NA           NA
    ## 2 clinic + prison + dose       233 2168.953    =  2 31.20223 1.676959e-07
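Both anova results can be reproduced by hand from the printed values of -2*LL. This is a sketch with rounded numbers copied from the two outputs above; the variable names are made up for this illustration.

```r
# -2 * log-likelihood values copied from the anova() outputs
m2ll.full <- 2168.953   # clinic + prison + dose
m2ll.red  <- 2172.503   # clinic + dose
m2ll.red2 <- 2200.155   # clinic only

# Full vs. model without prison: 1 parameter difference
D1 <- m2ll.red - m2ll.full
p1 <- pchisq(D1, df = 1, lower.tail = FALSE)

# Full vs. model with clinic only: 2 parameters difference
D2 <- m2ll.red2 - m2ll.full
p2 <- pchisq(D2, df = 2, lower.tail = FALSE)

c(D1 = D1, p1 = p1, D2 = D2, p2 = p2)
```

Since -2*LL is minus twice the log-likelihood, the test statistic D is simply the difference of the two printed values.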


Likelihood ratio test II

Formal representation of the likelihood ratio test:
1. Model: $Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z$, with $Z$ having the standard extreme value distribution
2. Null hypothesis: $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$
   Alternative hypothesis: $H_A$: at least one of $\beta_1, \beta_2, \ldots, \beta_k$ is $\neq 0$
3. Test statistic:

$$ D = -2 \log\left(\frac{\text{likelihood of null model}}{\text{likelihood of alternative model}}\right) $$

   Distribution of $D$ under $H_0$: $D \sim \chi^2_k$ ($\chi^2$ distribution with $k$ degrees of freedom)
4. Choose significance level: e.g. $\alpha = 5\%$
5. Range of rejection: $K = [q, \infty)$, where $q$ is the $(1 - \alpha)$-quantile of the $\chi^2$ distribution with $k$ degrees of freedom
6. Test decision: reject $H_0$ if $D \in K$


Akaike information criterion

- Alternative to the likelihood ratio test: Akaike information criterion (AIC)
- AIC assigns a score to every model: $\mathrm{AIC} = 2k - 2\log(\text{likelihood})$; $k$: number of parameters of the model
- AIC "penalizes complexity"
- Model selection procedure: from a given set of candidate models, take the one that minimizes the AIC
- Heroin example:

    AIC(addicts.weib.full)

    ## [1] 2178.953

    AIC(addicts.weib.red)

    ## [1] 2180.503

    AIC(addicts.weib.red2)

    ## [1] 2206.155
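The three AIC values follow directly from the definition. In the sketch below, the values of -2 log(likelihood) are copied from the anova outputs, and the parameter counts assume that the intercept, the regression coefficients and the scale parameter are all counted.

```r
# -2 * log-likelihood of the three Weibull models (copied from anova output)
m2ll <- c(full = 2168.953, red = 2172.503, red2 = 2200.155)

# Number of parameters: intercept + coefficients + scale
k <- c(full = 5, red = 4, red2 = 3)

aic <- 2 * k + m2ll
aic
names(which.min(aic))   # model preferred by AIC
```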


Likelihood ratio test and AIC: comparison

               LR test                   AIC
Advantage      p-value                   can compare non-nested models
Disadvantage   only for nested models    not clear how small differences of AIC must be interpreted


Model selection: overview

- When fitting a Weibull regression model with many explanatory variables, not all of them are significant in general (heroin example: prison).
- As with other regression techniques, we should perform model or variable selection to get rid of non-significant variables.
- Reasons:
  - avoid overfitting
  - improve interpretability of the model
  - improve predictive power of the model


Stepwise model selection I

- Theoretically best approach for model selection: exhaustive search:
  1. fit every possible model
  2. keep the "best" one according to some criterion (e.g., the one that minimizes the AIC, or the BIC, or similar)
- With $p$ explanatory variables, there are $2^p$ possible models ⇒ exhaustive search is infeasible even for moderate $p$
- Computationally feasible alternative: greedy or stepwise search, using either a likelihood ratio test or a model selection criterion such as AIC.


Stepwise model selection II

General idea of greedy search: instead of exhaustively searching the full model space, traverse it in small steps, adding or removing one explanatory variable at a time. Two approaches:

Backward selection
- start with the full model
- sequentially drop the variable whose removal reduces the AIC most
- stop when the AIC cannot be reduced further

Forward selection
- start with the empty model
- sequentially add the variable that reduces the AIC most
- stop when the AIC cannot be reduced further

Do these methods necessarily find the model with the lowest AIC?
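The backward search can be sketched in a few lines. The code below is only an illustration on simulated data (the data frame `d`, the helper `fit_aic` and all parameter values are made up), not the algorithm of any particular package. Because each step looks only at models that differ by one variable, such a greedy search is not guaranteed to reach the model with the globally lowest AIC.

```r
library(survival)

# Simulated Weibull data: only x1 has an effect on the event time
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.5))
t.event <- exp(5 + 0.8 * d$x1 + 0.7 * log(rexp(n)))  # log(Exp(1)): extreme value
t.cens <- rexp(n, rate = 1 / 400)
d$time <- pmin(t.event, t.cens)
d$status <- as.numeric(t.event <= t.cens)

# AIC of the Weibull model with the given explanatory variables
fit_aic <- function(vars) {
  rhs <- if (length(vars) > 0) paste(vars, collapse = " + ") else "1"
  f <- as.formula(paste("Surv(time, status) ~", rhs))
  AIC(survreg(f, data = d, dist = "weibull"))
}

# Backward selection: drop one variable at a time while the AIC decreases
vars <- c("x1", "x2", "x3")
aic.full <- fit_aic(vars)
aic.cur <- aic.full
repeat {
  if (length(vars) == 0) break
  aic.drop <- sapply(vars, function(v) fit_aic(setdiff(vars, v)))
  if (min(aic.drop) >= aic.cur) break
  aic.cur <- min(aic.drop)
  vars <- setdiff(vars, names(which.min(aic.drop)))
}
vars      # variables kept by the search
aic.cur   # AIC of the selected model
```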


Backward selection in R

    library(MASS)
    addicts.bw <- stepAIC(addicts.weib.full, direction = "backward",
                          trace = 0)
    summary(addicts.bw)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ##  Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238


Forward selection in R

    addicts.empty <- survreg(Surv(survt, status) ~ 1,
                             data = addicts, dist = "weibull")
    addicts.fw <- stepAIC(addicts.empty, direction = "forward",
                          scope = list(upper = ~ clinic + prison + dose), trace = 0)
    summary(addicts.fw)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ dose + clinic + prison,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ##  Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238


Section 3

Log-Normal and Log-Logistic Regression


Regression models for location-scale families

- Recall the model in Weibull regression: on the logarithmic time scale,

$$ Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z, $$

  where $Z$ has the standard extreme value distribution.
- Standard representation valid for all location-scale families: if $Y$ has density $f(y; \mu, \sigma)$ from a location-scale family, then $Y = \mu + \sigma Z$ with $Z$ having the "standard" distribution $f(z; 0, 1)$.
- Hence we use the same ansatz for regression models with log-normal and log-logistic distributions:

$$ Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z, $$

  where $Z$ has the standard normal (log-normal case) or standard logistic (log-logistic case) distribution.


Log-normal regression model

- If the event time $T$ has a log-normal distribution, we model its dependence on the explanatory variables as

$$ Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z, $$

  where $Z$ has the standard normal distribution: $Z \sim \mathcal{N}(0, 1)$.
- The survivor function has the form

$$ S(t \mid X_1, \ldots, X_p) = 1 - \Phi\left(\frac{1}{\sigma}\left(\log(t) - \beta_0 - \beta_1 X_1 - \ldots - \beta_p X_p\right)\right) $$


Log-normal regression in R

Fitting a log-normal model is completely analogous to fitting a Weibull model:

    addicts.lnorm.full <- survreg(Surv(survt, status) ~ clinic + prison + dose,
                                  data = addicts, dist = "lognormal")
    summary(addicts.lnorm.full)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "lognormal")
    ##                Value Std. Error     z      p
    ## (Intercept)  3.98333    0.34663 11.49 <2e-16
    ## clinic2      0.57649    0.17648  3.27 0.0011
    ## prisonyes   -0.30904    0.15431 -2.00 0.0452
    ## dose         0.03367    0.00568  5.93  3e-09
    ## Log(scale)   0.07476    0.05930  1.26 0.2074
    ##
    ## Scale= 1.08
    ##
    ## Log Normal distribution
    ## Loglik(model)= -1097.8   Loglik(intercept only)= -1123.7
    ##  Chisq= 51.85 on 3 degrees of freedom, p= 3.2e-11
    ## Number of Newton-Raphson Iterations: 4
    ## n= 238


Model validation: TA and Q-Q plot

Model validation is analogous to the Weibull case; it makes sense to make the Q-Q plot on the logarithmic time scale. R code for the Q-Q plot (rest as for Weibull regression):

    qqPlot(log.resid, dist = "norm",
           sd = addicts.lnorm.full$scale,
           main = "Q-Q plot", xlab = "Theor. quantiles (normal)", ylab = "Emp. quantiles")

[Figure: left panel "TA plot", log(residuals) vs. log(fitted values); right panel "Q-Q plot", empirical quantiles vs. theoretical quantiles (normal).]

Log-logistic regression model

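- If the event time $T$ has a log-logistic distribution, we model its dependence on the explanatory variables as

$$ Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z, $$

  where $Z$ has the standard logistic distribution: $f(z) = \dfrac{e^{-z}}{(1 + e^{-z})^2}$
- The survivor function has the form (with $\alpha = 1/\sigma$)

$$ S(t \mid X_1, \ldots, X_p) = \frac{1}{1 + \exp\left[\alpha\left(\log(t) - \beta_0 - \beta_1 X_1 - \ldots - \beta_p X_p\right)\right]} $$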
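The three survivor functions of this part share the same standardized argument $z = (\log(t) - \mu)/\sigma$ and can be compared with the distribution functions built into base R. A minimal sketch with made-up values for the linear predictor and the scale:

```r
# Hypothetical linear predictor and scale, chosen only for illustration
mu <- 6
sigma <- 0.6
alpha <- 1 / sigma
t <- c(50, 200, 400, 1000)
z <- (log(t) - mu) / sigma

# Survivor functions as given on the slides
S.weibull <- exp(-exp(z))
S.lnorm   <- 1 - pnorm(z)
S.llogis  <- 1 / (1 + exp(alpha * (log(t) - mu)))

# The same quantities from the built-in distribution functions
S.weibull.R <- pweibull(t, shape = alpha, scale = exp(mu), lower.tail = FALSE)
S.lnorm.R   <- plnorm(t, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
S.llogis.R  <- plogis(z, lower.tail = FALSE)

round(cbind(t, S.weibull, S.lnorm, S.llogis), 4)
```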


Log-logistic regression in R

Fitting a log-logistic model is completely analogous to fitting a Weibull model:

    addicts.llogis.full <- survreg(Surv(survt, status) ~ clinic + prison + dose,
                                   data = addicts, dist = "loglogistic")
    summary(addicts.llogis.full)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "loglogistic")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.14387    0.33829 12.25 < 2e-16
    ## clinic2      0.58060    0.17157  3.38 0.00071
    ## prisonyes   -0.29127    0.14396 -2.02 0.04305
    ## dose         0.03161    0.00552  5.73 1.0e-08
    ## Log(scale)  -0.53314    0.06863 -7.77 7.9e-15
    ##
    ## Scale= 0.587
    ##
    ## Log logistic distribution
    ## Loglik(model)= -1093.9   Loglik(intercept only)= -1120
    ##  Chisq= 52.18 on 3 degrees of freedom, p= 2.7e-11
    ## Number of Newton-Raphson Iterations: 4
    ## n= 238


Model validation: TA and Q-Q plot

Model validation is analogous to the Weibull case; it makes sense to make the Q-Q plot on the logarithmic time scale. R code for the Q-Q plot (rest as for Weibull regression):

    # compare the log-scale residuals with a logistic distribution
    qqPlot(log.resid, dist = "logis",
           scale = addicts.llogis.full$scale,
           main = "Q-Q plot", xlab = "Theor. quantiles (logistic)", ylab = "Emp. quantiles")

[Figure: left panel "TA plot", log(residuals) vs. log(fitted values); right panel "Q-Q plot", empirical quantiles vs. theoretical quantiles (logistic).]


Part V

Cox Proportional Hazards Model


Learning objectives

- Write the general form of a Cox proportional hazards model
- Explain the difference to parametric regression models
- Fit a Cox PH model in R
- Interpret the R output of a Cox PH fit, especially the hazard ratios
- Perform model validation with graphical methods and tests


Cox proportional hazards (PH) model

- Considered so far: parametric regression models for survival data
- In the absence of explanatory variables, we have seen the powerful, non-parametric Kaplan-Meier estimator
- The Cox proportional hazards (PH) model combines both the flexibility of non-parametric models and the interpretability of regression models
- The Cox PH model is one of the most popular models in survival analysis, especially in the analysis of medical data


Cox PH model

Definition (Cox PH model)
The Cox proportional hazards model assumes that the hazard at time $t$ has the following form:

$$ h(t; x_1, \ldots, x_p) = h_0(t) \cdot \exp(\beta_1 x_1 + \ldots + \beta_p x_p) $$

$h_0(t)$ is called the baseline hazard.


Notes on the Cox PH model

- The baseline hazard is not specified more precisely, i.e. it has no parametric form ⇒ the Cox PH model is called a semiparametric model
- The baseline hazard is the same for all subjects
- The explanatory variables are assumed to be time-independent
- The hazard function must, by definition, always be positive; the exponential function in the Cox PH model ensures this (comparable approach to logistic regression)

Why don't we need a parameter $\beta_0$, contrary to parametric regression models?


Partial likelihood

- The hazard of the Cox PH model is not fully parametric ⇒ no MLE possible
- Therefore, consider the partial likelihood $L_c$ instead of the likelihood:

$$ L_c(\beta) = \prod_{i:\, \delta_i = 1} L_i(\beta), $$

  $L_i(\beta)$ = probability that an individual with the covariates of subject $i$ has a failure at $t_i$, given that there is one failure in the risk set at $t_i$
- In the "normal" likelihood, there is no conditioning involved.
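For data without tied event times, the conditional probability $L_i(\beta)$ has the standard closed form $\exp(x_i^\top \beta) / \sum_{l \in R(t_i)} \exp(x_l^\top \beta)$, where $R(t_i)$ denotes the risk set at $t_i$. This formula is not stated on the slide and is taken here as an assumption. The sketch below evaluates it on a made-up toy data set with one covariate and compares the result with the log partial likelihood reported by coxph.

```r
library(survival)

# Hypothetical toy data without ties
toy <- data.frame(time   = c(2, 3, 5, 7, 11, 13),
                  status = c(1, 0, 1, 1, 0, 1),
                  x      = c(0.5, -1.0, 1.2, 0.3, -0.7, -0.2))

# Log partial likelihood: sum over events of
# eta_i - log(sum of exp(eta_l) over the risk set at t_i)
partial_loglik <- function(beta, d) {
  eta <- as.numeric(beta) * d$x
  sum(sapply(which(d$status == 1), function(i) {
    risk.set <- d$time >= d$time[i]
    eta[i] - log(sum(exp(eta[risk.set])))
  }))
}

fit <- coxph(Surv(time, status) ~ x, data = toy)

# fit$loglik holds the log partial likelihood at beta = 0 and at the estimate
c(manual = partial_loglik(0, toy), coxph = fit$loglik[1])
c(manual = partial_loglik(coef(fit), toy), coxph = fit$loglik[2])
```

The baseline hazard does not appear anywhere in the function, which is why it can be treated as a nuisance parameter.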


Fitting a Cox PH model

- Usual approach of estimating the model (Cox, 1972), implemented in the R function coxph: maximize the partial likelihood
- This approach makes the baseline hazard a nuisance parameter
- Hence, we only get an estimate of the coefficients $\beta$ in a first step
- This is normally sufficient since we are usually only interested in the coefficients: they define hazard ratios (see later).
- Consequence: if we want to plot the fitted survivor function, we must use a Kaplan-Meier estimator in addition to coxph


Estimating Cox PH model in R

Cox PH models can be fitted with the function coxph from the survival package:

    addicts.cox <- coxph(Surv(survt, status) ~ clinic + prison + dose,
                         data = addicts)
    summary(addicts.cox)

    ## Call:
    ## coxph(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts)
    ##
    ##   n= 238, number of events= 150
    ##
    ##                coef exp(coef)  se(coef)      z Pr(>|z|)
    ## clinic2   -1.009896  0.364257  0.214889 -4.700 2.61e-06 ***
    ## prisonyes  0.326555  1.386184  0.167225  1.953   0.0508 .
    ## dose      -0.035369  0.965249  0.006379 -5.545 2.94e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ##           exp(coef) exp(-coef) lower .95 upper .95
    ## clinic2      0.3643     2.7453    0.2391    0.5550
    ## prisonyes    1.3862     0.7214    0.9988    1.9238
    ## [...]


Lots of output. . .

    ## Call:
    ## coxph(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts)
    ##
    ##   n= 238, number of events= 150
    ##
    ##                coef exp(coef)  se(coef)      z Pr(>|z|)
    ## clinic2   -1.009896  0.364257  0.214889 -4.700 2.61e-06 ***
    ## prisonyes  0.326555  1.386184  0.167225  1.953   0.0508 .
    ## dose      -0.035369  0.965249  0.006379 -5.545 2.94e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ##           exp(coef) exp(-coef) lower .95 upper .95
    ## clinic2      0.3643     2.7453    0.2391    0.5550
    ## prisonyes    1.3862     0.7214    0.9988    1.9238
    ## dose         0.9652     1.0360    0.9533    0.9774
    ##
    ## Concordance= 0.665  (se = 0.026 )
    ## Rsquare= 0.238   (max possible= 0.997 )
    ## Likelihood ratio test= 64.56  on 3 df,   p=6e-14
    ## Wald test            = 54.12  on 3 df,   p=1e-11
    ## Score (logrank) test = 56.32  on 3 df,   p=4e-12

Annotations:
- Coefficients $\hat\beta_j$ (column coef)
- Hazard ratios: exponentiated coefficients $e^{\hat\beta_j}$ (column exp(coef))
- p-values for global significance tests (last three lines)
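The second table of the output can be reproduced from the first one. The sketch below uses the rounded coefficient and standard error of clinic2 copied from the output, and assumes that the 95% interval is a normal-theory interval for the coefficient, transformed with exp; the variable names are made up for this illustration.

```r
# Rounded values copied from the coxph summary (row clinic2)
coef.clinic2 <- -1.009896
se.clinic2   <- 0.214889

hr <- exp(coef.clinic2)                  # column exp(coef)
hr.inv <- exp(-coef.clinic2)             # column exp(-coef)
hr.ci <- exp(coef.clinic2 + c(-1, 1) * qnorm(0.975) * se.clinic2)

round(c(hr = hr, hr.inv = hr.inv, lower = hr.ci[1], upper = hr.ci[2]), 4)
```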


Hazard ratio I

- Consider two subjects $i$ and $j$ with explanatory variables $x_{i1}, \ldots, x_{ip}$ and $x_{j1}, \ldots, x_{jp}$, resp.
- Their hazard ratio is

$$ \mathrm{HR} = \frac{h(t; x_{i1}, \ldots, x_{ip})}{h(t; x_{j1}, \ldots, x_{jp})} $$

- The HR says how much more likely it is that subject $i$ has an event in the next time unit than subject $j$
- Heroin example: the HR between patients $i$ and $j$ says how much more likely it is that patient $i$ is released the next day than patient $j$
- If the event refers to death, the HR expresses how much higher the instantaneous death probability is for subject $i$ than for subject $j$


Hazard ratio II

- Why is the hazard ratio interesting?
- Heroin example: suppose subjects $i$ and $j$ both have no prison record ($x_{i2} = x_{j2} = 0$) and got the same methadone dose ($x_{i3} = x_{j3}$), but subject $i$ was treated in clinic 2 ($x_{i1} = 1$) and subject $j$ in clinic 1 ($x_{j1} = 0$).
- Then the hazard ratio of subjects $i$ and $j$ is

$$ \mathrm{HR} = \frac{h_0(t) \cdot \exp(\beta_1 x_{i1} + \ldots + \beta_p x_{ip})}{h_0(t) \cdot \exp(\beta_1 x_{j1} + \ldots + \beta_p x_{jp})} = \frac{e^{\beta_1 \cdot 1}}{e^{\beta_1 \cdot 0}} = e^{\beta_1} $$


Hazard ratio in a Cox PH model

Definition (Hazard ratio in Cox PH model)
In a Cox PH model, the hazard ratio of the $j$-th explanatory variable is defined as

$$ \mathrm{HR}_j = e^{\beta_j} $$

Example: heroin data. The HR of the variable clinic, $\mathrm{HR}_1 = e^{\beta_1}$, says how much more likely it is ...
- ... that a patient from clinic 2 is released the next day ...
- ... compared to a patient from clinic 1, ...
- ... given or assuming that they coincide in all other explanatory variables (prison record and methadone dose)


Heroin example: hazard ratio of clinic

- The estimated HR of the variable clinic is $\widehat{\mathrm{HR}}_1 = 0.3643$.
- What does this mean?
- Suppose that you, as a politician, have to decide whether clinic 1 or clinic 2 does a better job (i.e. is releasing patients earlier).
- Do you take the HR from a Cox PH model, or the log-rank test from before? Why?


Plotting a fitted Cox PH model I

- The output of coxph cannot be plotted directly:

    plot(addicts.cox)

  gives an error! (Why?)
- Example: estimate the survivor function for two patients with no prison record and a mean methadone dose, one in clinic 1 and one in clinic 2:

    sample.data <- data.frame(
        clinic = c("1", "2"), prison = rep("no", 2), dose = rep(mean(addicts$dose), 2))
    sample.surv <- survfit(addicts.cox, newdata = sample.data)


Plotting a fitted Cox PH model II

... and plot the fitted survivor function:

    plot(sample.surv, col = c(1, 2), conf.int = TRUE,
         xlab = "Time [days]", ylab = "S(t)")
    legend("bottomleft", bty = "n", lty = 1, col = 1:2,
           legend = sprintf("clinic %d", 1:2))

[Figure: fitted survivor functions S(t) vs. time [days] for clinic 1 and clinic 2, with confidence intervals.]


Cox PH assumption revisited I

- Recall the proportional hazards assumption:

$$ h(t; x_1, \ldots, x_p) = h_0(t) \cdot \exp(\beta_1 x_1 + \ldots + \beta_p x_p) $$

- Hence

$$ \log h(t; x_1, \ldots, x_p) = \log h_0(t) + \beta_1 x_1 + \ldots + \beta_p x_p $$

- Consequence: plots of $\log h(t; x_1, \ldots, x_p)$ for different groups (assuming discrete, or discretized explanatory variables) should show parallel lines
- In practice, we can use kernel estimates (see Part II) of the hazard functions


Cox PH assumption revisited II

- A more common approach for graphical model validation only involves the Kaplan-Meier estimator, and no estimate of the hazard functions
- The cumulative hazard function $H(t; x_1, \ldots, x_p) = \int_0^t h(u; x_1, \ldots, x_p)\, du$ can be decomposed as

$$ H(t; x_1, \ldots, x_p) = H_0(t) \cdot \exp(\beta_1 x_1 + \ldots + \beta_p x_p), \qquad H_0(t) = \int_0^t h_0(u)\, du $$

- Taking the logarithm yields

$$ \log(H(t; x_1, \ldots, x_p)) = \log(H_0(t)) + \sum_{j=1}^{p} \beta_j x_j $$

- Recall that $H(t; x_1, \ldots, x_p) = -\log S(t; x_1, \ldots, x_p)$


Model validation: does the PH assumption hold?

Practical consequence of the last line:

Graphical test for PH assumption
If the proportional hazards assumption holds, a plot of $\log(-\log(S))$ vs. $t$ for different groups of subjects shows parallel lines.

The PH assumption can be tested variable by variable.

[Figures: two example plots of −ln(−ln S) vs. time, one with the groups "Females" and "Males", one with the groups "Low", "Medium" and "High".]

(Figures from Kleinbaum and Klein, 2005)
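What "parallel" means here can be made concrete with exact survivor functions: under the PH assumption, the two curves differ by the constant β at every time point. A minimal sketch in base R; the baseline cumulative hazard and the value of β are made up for this illustration.

```r
beta <- -1                       # hypothetical coefficient of a binary covariate
t <- seq(50, 900, by = 50)
H0 <- (t / 400)^1.3              # hypothetical baseline cumulative hazard

S.group0 <- exp(-H0)             # covariate x = 0
S.group1 <- exp(-H0 * exp(beta)) # covariate x = 1

# Vertical distance between the two log(-log(S)) curves
gap <- log(-log(S.group1)) - log(-log(S.group0))
range(gap)
```

With estimated survivor functions the distance will fluctuate, so the graphical check asks whether it stays roughly constant over time.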


Heroin example: testing the PH assumption I

Plotting $\log(-\log(S))$ vs. $t$ for the covariate clinic:

    # Kaplan-Meier estimate per clinic
    addicts.km.clinic <-
        survfit(Surv(survt, status) ~ clinic,
                data = addicts)
    addicts.table <- summary(addicts.km.clinic)
    # time is plotted on the original scale, so the x axis is labelled "t"
    plot(addicts.table$time,
         log(-log(addicts.table$surv)),
         col = addicts.table$strata,
         xlab = "t", ylab = "log(-log(S))",
         pch = 20)

[Figure: log(-log(S)) vs. t, one color per clinic.]


Heroin example: testing the PH assumption II

- Simpler way: using the function survplot from the rms package.
- As survplot is not compatible with survfit output, we have to fit the KM estimator using npsurv (which does exactly the same thing as survfit ...):

    library(rms)
    survplot(npsurv(Surv(survt, status) ~ clinic, data = addicts),
             loglog = TRUE, xlab = "Time", ylab = "log(-log(S))")


Heroin example: testing the PH assumption III

Plots for all three covariates:

[Figure: three panels of log(-log(S)) vs. time. "Clinic": clinic=1 vs. clinic=2; "Prison": prison=no vs. prison=yes; "Dose": dose >= median(dose) FALSE vs. TRUE.]

clinic violates the Cox PH assumption!


Cox-Snell residuals I

- Plot of $\log(-\log(S))$ vs. $t$:
  - Advantages: easy to produce, easy to understand
  - Disadvantage: not so easy to see whether the lines are parallel: where they are steep, they seem to be closer than where they are flat.
- Cox-Snell residuals can be used for a different graphical model validation; it is more difficult to understand, but easier to judge


Cox-Snell residuals II

- Let $x_{i1}, \ldots, x_{ip}$ denote the covariates of subject $i$, and $Y_i = \min\{T_i, C_i\}$ its event or censoring time.
- Cox-Snell residual:

$$ r_{C_i} := \hat H_0(Y_i) \cdot \exp(\hat\beta_1 x_{i1} + \ldots + \hat\beta_p x_{ip}), \qquad H_0(t) := \int_0^t h_0(u)\, du. $$

- It can be shown that Cox-Snell residuals have an exponential distribution when the Cox PH assumptions are met.


Checking Cox-Snell residuals in R

In R, we get Cox-Snell residuals as follows:

    cox.snell <- abs(addicts$status - addicts.cox$residuals)

We can plot them against their cumulative hazard function as follows:

    qqPlot(cox.snell, dist = "exp", rate = mean(cox.snell))

[Figure: Q-Q plot of cox.snell vs. exp quantiles.]


Goodness of fit testing approach

- The goodness of fit testing approach is an alternative to graphical model validation
- Pro: provides a single p-value (⇒ more objective)
- Contra: tests for violations of the assumption ⇒ the test must be done the "wrong way", with an unknown type II error rate
- Rough idea of the goodness of fit test:
  - calculate residuals for each of the explanatory variables
  - check whether the residuals are not correlated to survival time


Schoenfeld residuals and goodness of fit test

Consider the heroin data set again
Suppose subject i has an event at time $t_{(j)}$. Then his or her Schoenfeld residual for the variable dose is the difference between his or her methadone dose and a weighted mean methadone dose of all individuals still at risk at time $t_{(j)}$
The mean is weighted by the hazard of the patients in the risk set
The goodness of fit test for the variable dose now tests the null hypothesis that the correlation coefficient between the Schoenfeld residuals for dose and the survival time is 0.
For categorical explanatory variables, the calculation of the Schoenfeld residual is a bit different, but similar.
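The description above can be checked by rebuilding one Schoenfeld residual by hand. This is a minimal sketch under assumptions: the veteran data set from the survival package stands in for the heroin data, the Karnofsky score karno plays the role of dose, and ties are handled with the Breslow method so that the simple weighted-mean formula applies. The names sch, at.risk, w and manual are illustrative.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ karno, data = veteran, ties = "breslow")
sch <- residuals(fit, type = "schoenfeld")   # one residual per event

# Take the first subject in the data with an event
i <- which(veteran$status == 1)[1]

# Risk set at that subject's event time, weighted by the relative hazard
at.risk <- veteran$time >= veteran$time[i]
w <- exp(coef(fit) * veteran$karno[at.risk])
weighted.mean.karno <- sum(w * veteran$karno[at.risk]) / sum(w)

# Schoenfeld residual: own covariate value minus weighted mean in the risk set
manual <- veteran$karno[i] - weighted.mean.karno

# The hand-computed value should appear among the residuals from R
any(abs(as.vector(sch) - manual) < 1e-6)
```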


Goodness of fit test in R

In R, the goodness of fit can be tested with the function cox.zph from the survival package:
cox.zph(addicts.cox)

##               rho chisq        p
## clinic2   -0.2578 11.19 0.000824
## prisonyes -0.0382  0.22 0.639369
## dose       0.0724  0.70 0.402749
## GLOBAL         NA 12.62 0.005546

We get separate p-values for each explanatory variable as well as a global p-value. Again: clinic seems to violate the PH assumption (p = 0.0008).
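The test can be complemented by a plot of the scaled Schoenfeld residuals against (transformed) time, which shows how an effect might change over time. The sketch below again assumes the veteran data set as a stand-in for addicts. As far as I am aware, versions 3.0 and later of the survival package print columns chisq, df and p instead of rho, chisq and p, so the output layout may differ from the one shown above.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ karno + age, data = veteran)
zph <- cox.zph(fit)
print(zph)

# Smoothed scaled Schoenfeld residuals for the first covariate (karno);
# a roughly horizontal curve is compatible with proportional hazards
plot(zph, var = 1)
```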


Example: survival of lung cancer patients

Data set from the Insel hospital: overall survival (variable os) was measured for 67 patients with non-small cell lung cancer (NSCLC)
Additional variables:
  - clinical variables (grade, stage, age, preop, other, rt, drug)
  - gene expression measurements (cda, gldc, rrm2, tk1, tyms)

Question: which genes or clinical variables are good predictors for the overall survival of patients?
We clearly have too many explanatory variables to fit a good model → variable selection


Variable selection for Cox models

Variable selection for Cox models works exactly as for parametric regression models.

Manual approach: iteratively remove least significant variable, compare larger and smaller model with likelihood-ratio test
Automatic approach: use forward or backward selection based on AIC


Lung cancer data set: manual variable selection

We demonstrate the first step of manual variable selection for the lung cancer data set:
nsclc.cox.full <- coxph(Surv(os, status) ~ ., data = nsclc)
summary(nsclc.cox.full)

The long output is omitted here; gldc is the least significant variable, hence we remove it:
nsclc.cox.red <- update(nsclc.cox.full, . ~ . - gldc)
# library(lmtest)
anova(nsclc.cox.red, nsclc.cox.full, test = "Chisq")

## Analysis of Deviance Table
##  Cox model: response is Surv(os, status)
##  Model 1: ~ grade + stage + age + preop + other + rt + drug + cda + rrm2 + tk1 + tyms
##  Model 2: ~ grade + stage + age + preop + other + rt + drug + cda + gldc + rrm2 + tk1 + tyms
##    loglik  Chisq Df P(>|Chi|)
## 1 -37.681
## 2 -37.677 0.0078  1    0.9295

The likelihood-ratio test is far from significant (p = 0.93), so we can indeed accept the simpler model.
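The likelihood-ratio test behind this table can be reproduced by hand from the two log partial likelihoods: the test statistic is twice their difference, compared with a chi-squared distribution whose degrees of freedom equal the number of removed coefficients. A minimal sketch, assuming the veteran data set from the survival package in place of nsclc (which is not publicly bundled); the models full and red are illustrative.

```r
library(survival)

full <- coxph(Surv(time, status) ~ karno + age + prior, data = veteran)
red  <- update(full, . ~ . - prior)

# loglik[2] is the log partial likelihood at the fitted coefficients
lrt <- 2 * (full$loglik[2] - red$loglik[2])
df  <- length(coef(full)) - length(coef(red))
p   <- pchisq(lrt, df = df, lower.tail = FALSE)

c(Chisq = lrt, Df = df, p = p)
anova(red, full)   # the same comparison done by R
```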


Lung cancer data set: automatic variable selection

The well-known function stepAIC is a more convenient way of doing variable selection, e.g. with backward selection:
library(MASS)
nsclc.cox.red <- stepAIC(nsclc.cox.full, direction = "backward", trace = 0)
summary(nsclc.cox.red)

## Call:
## coxph(formula = Surv(os, status) ~ grade + age + other + rt +
##     drug + tyms, data = nsclc)
##
##   n= 67, number of events= 13
##
##                coef  exp(coef)   se(coef)      z Pr(>|z|)
## grade     1.418e+00  4.131e+00  6.125e-01  2.316  0.02057 *
## age       7.679e-02  1.080e+00  3.383e-02  2.270  0.02320 *
## otheryes  2.830e+00  1.695e+01  9.676e-01  2.925  0.00344 **
## rtno     -2.244e+00  1.060e-01  1.152e+00 -1.948  0.05140 .
## rtyes    -2.176e+01  3.551e-10  8.298e+03 -0.003  0.99791
## drug      9.185e-01  2.506e+00  3.627e-01  2.533  0.01132 *
## tyms     -3.857e-01  6.800e-01  1.583e-01 -2.436  0.01485 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [...]
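For a Cox model, the AIC used in the search is −2 times the log partial likelihood plus 2 times the number of coefficients. The sketch below runs a backward search and recomputes the AIC of the selected model by hand. It assumes the veteran data set from the survival package as a stand-in for nsclc; the names full, sel and aic.by.hand are illustrative.

```r
library(survival)
library(MASS)

full <- coxph(Surv(time, status) ~ trt + karno + diagtime + age + prior,
              data = veteran)
sel <- stepAIC(full, direction = "backward", trace = 0)

# AIC = -2 * log partial likelihood + 2 * number of coefficients
aic.by.hand <- -2 * sel$loglik[2] + 2 * length(coef(sel))

# extractAIC() returns c(edf, AIC); the second entry should agree
c(by.hand = aic.by.hand, extractAIC = extractAIC(sel)[2])
formula(sel)   # variables kept by the backward search
```

Backward selection only removes a variable when doing so lowers the AIC, so the selected model never has a higher AIC than the full model.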



