Survival Analysis: A Brief Introduction


1. Survival Function, Hazard Function

In many medical studies, the primary endpoint is time until an event occurs (e.g. death, remission)

Data are typically subject to censoring (e.g. when a study ends before the event occurs)

Survival Function: a function describing the proportion of individuals surviving to or beyond a given time. Notation:

◦ T: survival time of a randomly selected individual

◦ t: a specific point in time

◦ Survival Function:

$$S(t) = P(T > t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$$


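Hazard Function/Rate

Hazard Function λ(t): the instantaneous failure rate at time t, given that the subject has survived up to time t. That is,

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t)}{\Delta t \, P(T > t)} = \lim_{\Delta t \to 0} \frac{S(t) - S(t + \Delta t)}{\Delta t} \cdot \frac{1}{S(t)} = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)}$$

Here f(t) is the probability density function of the survival time T. That is,

$$f(t) = \frac{d}{dt} F(t)$$

where F(t) is the cumulative distribution function of T:

$$F(t) = 1 - S(t) = P(T \le t)$$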


2. The Key Word is ‘Censoring’

Because of censoring, many common data analysis procedures cannot be adopted directly.

For example, one could use the logistic regression model to model the relationship between survival probability and some relevant covariates

◦ However, one should use the customized logistic regression procedures designed to account for censoring


Key Assumption: Independent Censoring

Those still at risk at time t in the study are a random sample of the population at risk at time t, for all t

This assumption means that the hazard function, λ(t), can be estimated in a fair/unbiased/valid way


3A. Kaplan-Meier (Product-Limit) Estimator of the Survival Curve

The Kaplan–Meier estimator is the nonparametric maximum likelihood estimate of S(t). It is a product of the form

$$\hat{S}(t) = \frac{r_1 - d_1}{r_1} \cdot \frac{r_2 - d_2}{r_2} \cdots \frac{r_k - d_k}{r_k}$$

taken over the ordered event times t_1 < t_2 < ... < t_k up to time t, where

◦ r_k is the number of subjects alive just before time t_k

◦ d_k denotes the number who died at time t_k


Kaplan-Meier Curve, Example

Time t_i    # at risk    # events    Ŝ(t_i)
0           20           0           1.00
5           20           2           [1-(2/20)]*1.00 = 0.90
6           18           0           [1-(0/18)]*0.90 = 0.90
10          15           1           [1-(1/15)]*0.90 = 0.84
13          14           2           [1-(2/14)]*0.84 = 0.72
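The arithmetic in the example table can be reproduced with a few lines of code. The following is a minimal Python sketch, not part of the original slides; the function name `kaplan_meier` and the tuple layout are illustrative assumptions. It multiplies the running estimate by (1 - d_i / r_i) at each listed time.

```python
def kaplan_meier(rows):
    """Product-limit estimate.

    rows: list of (time, number_at_risk, number_of_events), ordered by time.
    Returns a list of (time, estimated survival probability).
    """
    surv = 1.0
    curve = []
    for time, at_risk, events in rows:
        # Each time point contributes the factor (r_i - d_i) / r_i.
        surv *= 1.0 - events / at_risk
        curve.append((time, surv))
    return curve


# The five rows of the example table: (t_i, # at risk, # events).
rows = [(0, 20, 0), (5, 20, 2), (6, 18, 0), (10, 15, 1), (13, 14, 2)]

for time, surv in kaplan_meier(rows):
    print(f"t={time:>2}  S_hat={surv:.2f}")
```

Rows with zero events (such as t = 6) leave the estimate unchanged; censoring enters only through the shrinking number at risk.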


Kaplan Meier Curve

[Figure: Kaplan-Meier curve. X-axis: Survival Time (0 to 20). Y-axis: Proportion Surviving (95% Confidence), 0.6 to 1.0.]


Figure 1. Plot of survival distribution functions for the NCI and the SCI Groups. The Y-axis is the probability of not declining to GDS 3 or above. The X-axis is the time (in years) to decline. (Barry Reisberg et al., 2010; Alzheimer's & Dementia; in press.)


3B. Comparing Survival Functions

[Figure: Survival Distribution Function (0.00 to 1.00) against Time (0 to 60) for three groups: High, Medium, Low.]


Log-Rank Test

The log-rank test

• tests whether the survival functions are statistically equivalent

• is a large-sample chi-square test that uses the observed and expected cell counts across the event times

• has maximum power when the ratio of hazards is constant over time.


Wilcoxon Test

The Wilcoxon test

• weights the observed number of events minus the expected number of events by the number at risk across the event times

• can be biased if the pattern of censoring is different between the groups.


Log-rank versus Wilcoxon Test

Log-rank test

• is more sensitive than the Wilcoxon test to differences between groups in later points in time.

Wilcoxon test

• is more sensitive than the log-rank test to differences between groups that occur in early points in time.
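The two tests differ only in how each event time is weighted. The following is a minimal two-group Python sketch, not taken from the slides: the function name `weighted_logrank` and the toy data are illustrative assumptions, and the Wilcoxon variant is assumed to be the Gehan form, which weights by the number at risk as described above. Both statistics are referred to a chi-square distribution with 1 degree of freedom.

```python
def weighted_logrank(times, events, groups, wilcoxon=False):
    """Two-group chi-square statistic (1 degree of freedom).

    times:  observed times
    events: 1 = event, 0 = censored
    groups: 0 or 1
    wilcoxon=False uses weight 1 (log-rank);
    wilcoxon=True uses weight = number at risk (Gehan-type Wilcoxon).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)   # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)    # events, both groups
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        w = float(n) if wilcoxon else 1.0
        expected = d * n1 / n            # expected events in group 1 under H0
        numerator += w * (d1 - expected)
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            variance += w ** 2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return numerator ** 2 / variance


# Toy data (hypothetical): group 1 fails early, group 0 fails late, no censoring.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]

print(weighted_logrank(times, events, groups))                 # log-rank
print(weighted_logrank(times, events, groups, wilcoxon=True))  # Wilcoxon
```

Because the number at risk is largest at early event times, the Wilcoxon weights emphasize early differences, while the log-rank test treats all event times equally.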


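4. Two Parametric Distributions

Here we present the two most notable models for the distribution of T.

Exponential distribution: $\lambda(t) = \lambda$

Weibull distribution: $\lambda(t) = \lambda p (\lambda t)^{p-1} = p \lambda^p t^{p-1}$

◦ Its survival function:

$$S(t) = \exp\left(-\int_0^t p \lambda^p u^{p-1}\,du\right) = \exp\left(-(\lambda t)^p\right)$$

◦ Thus:

$$\ln(-\ln S(t)) = p \ln(\lambda) + p \ln(t)$$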
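The log-log relationship for the Weibull distribution can be checked numerically. The following is a minimal Python sketch under assumed parameter values (λ = 0.5 and p = 1.5 are illustrative, not from the slides); it evaluates ln(-ln S(t)) and p ln(λ) + p ln(t) at a few time points and shows that p = 1 reduces the hazard to the constant exponential hazard.

```python
import math


def weibull_hazard(t, lam, p):
    """Weibull hazard: p * lam**p * t**(p - 1)."""
    return p * lam ** p * t ** (p - 1)


def weibull_survival(t, lam, p):
    """Weibull survival function: exp(-(lam * t)**p)."""
    return math.exp(-((lam * t) ** p))


lam, p = 0.5, 1.5  # illustrative values only

for t in (0.5, 1.0, 2.0, 4.0):
    lhs = math.log(-math.log(weibull_survival(t, lam, p)))
    rhs = p * math.log(lam) + p * math.log(t)
    print(f"t={t}: ln(-ln S(t))={lhs:.6f}  p*ln(lam)+p*ln(t)={rhs:.6f}")

# With p = 1 the hazard does not depend on t (exponential distribution).
print(weibull_hazard(1.0, lam, 1.0), weibull_hazard(10.0, lam, 1.0))
```

In practice this relationship is what makes a plot of ln(-ln Ŝ(t)) against ln(t) a common visual check: an approximately straight line with slope p is consistent with a Weibull model.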


Weibull Hazard Function, Plot


5. Regression Models

The Exponential and the Weibull distributions inspired two parametric regression approaches:

1. Parametric proportional hazard model – this model can be generalized to a semi-parametric model: the Cox proportional hazard model

2. Accelerated failure time model


Proportional Hazard Model

In a regression model for survival analysis, one can try to model the dependence on the explanatory variables by taking the (new) hazard rate to be:

$$\lambda(t) = \lambda_0(t)\, c(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})$$

Because hazard rates are positive, it is natural to choose the function c such that c(β, x) is positive irrespective of the values of x.


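Proportional Hazard Model

Thus a good choice is:

$$c(\cdot) = \exp(\cdot)$$

The resulting proportional hazard model is:

$$\lambda(t) = \lambda_0(t) \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})$$

For the Weibull distribution we have:

$$\lambda(t) = p \lambda_0^p t^{p-1} \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})$$

For the Exponential distribution we have:

$$\lambda(t) = \lambda_0 \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})$$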


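Accelerated Failure Time Model

For the Weibull distribution (including the Exponential distribution), the proportional hazard model is equivalent to a log-linear model in survival time T:

$$\ln T_i = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \cdots + \alpha_k x_{ik} + \varepsilon_i$$

Here the error term ε_i can be shown to follow the 2-parameter Extreme Value distribution. The coefficients α_j of the log-linear model are in general not equal to the coefficients β_j of the proportional hazard model.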
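The equivalence stated above can be sketched in a few lines. The derivation below is an illustration, not taken from the slides; it assumes the Weibull proportional hazard model with hazard p λ_0^p t^{p-1} exp(η_i), writes η_i for the linear predictor, and uses α_j and σ as labels for the log-linear coefficients and scale.

```latex
\begin{align*}
S(t \mid x_i) &= \exp\!\left(-(\lambda_0 t)^p e^{\eta_i}\right),
  \qquad \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \\
W_i &= (\lambda_0 T_i)^p e^{\eta_i} \sim \text{Exponential}(1) \\
\ln T_i &= -\ln \lambda_0 - \frac{\eta_i}{p} + \frac{1}{p} \ln W_i \\
        &= \alpha_0 + \alpha_1 x_{i1} + \cdots + \alpha_k x_{ik} + \sigma \ln W_i ,
\qquad \alpha_0 = -\ln \lambda_0 - \frac{\beta_0}{p}, \quad
       \alpha_j = -\frac{\beta_j}{p}, \quad
       \sigma = \frac{1}{p}
\end{align*}
```

Since W_i is a unit exponential variable, ln W_i follows the standard extreme value (minimum) distribution, so σ ln W_i plays the role of the error term. The sign change in α_j = -β_j / p reflects that a covariate which raises the hazard shortens the survival time.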


Apply Both Models Simultaneously

If the underlying distribution for T is Weibull or Exponential, one can apply both regression models simultaneously to reflect different aspects of the survival process. That is:

◦ Prediction of degree of decline using the Weibull proportional hazard model

◦ Prediction of time of decline using the accelerated failure time model


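An Example

In a recent paper (Reisberg et al., 2010), we applied both regression models to a dementia study conducted at NYU:

$$\lambda(T) = \lambda_0(T) \exp(\beta_1 \cdot \text{Group} + \beta_2 \cdot \text{Age} + \beta_3 \cdot \text{Gender} + \beta_4 \cdot \text{Education} + \beta_5 \cdot \text{FollowUp})$$

$$\log T = \alpha_0 + \alpha_1 \cdot \text{Group} + \alpha_2 \cdot \text{Age} + \alpha_3 \cdot \text{Gender} + \alpha_4 \cdot \text{Education} + \alpha_5 \cdot \text{FollowUp} + \varepsilon$$

The results are shown next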


6. Cox Proportional Hazards Model


Parametric versus Nonparametric Models

Parametric models require that

• the distribution of survival time is known

• the hazard function is completely specified except for the values of the unknown parameters.

Examples include the Weibull model, the exponential model, and the log-normal model.


Parametric versus Nonparametric Models

Properties of nonparametric models are

• the distribution of survival time is unknown

• the hazard function is unspecified.

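An example is the Cox proportional hazards model, which is more precisely semiparametric: the baseline hazard is left unspecified, while the covariate effects are modeled parametrically.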


Cox Proportional Hazards Model

$$h_i(t) = h_0(t)\, e^{\beta_1 X_{i1} + \cdots + \beta_k X_{ik}}$$

• $h_0(t)$: Baseline Hazard function; involves time but not predictor variables

• $\beta_1 X_{i1} + \cdots + \beta_k X_{ik}$: Linear function of a set of predictor variables; does not involve time

β = 0 → hazard ratio = 1: the two groups have the same survival experience
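Because the baseline hazard does not involve the predictors, it cancels when two subjects are compared, leaving a hazard ratio that depends only on the coefficients and the covariate difference. The following is a minimal Python sketch; the function name `hazard_ratio`, the coefficient values, and the covariate profiles are hypothetical and serve only as an illustration.

```python
import math


def hazard_ratio(beta, x_a, x_b):
    """Hazard ratio h_a(t) / h_b(t) under a Cox model.

    The baseline hazard h_0(t) cancels, so the ratio is
    exp(sum_j beta_j * (x_aj - x_bj)) and does not depend on t.
    """
    return math.exp(sum(b * (xa - xb) for b, xa, xb in zip(beta, x_a, x_b)))


# Hypothetical coefficients for (group indicator, age) and two subjects
# who differ only in group membership.
beta = [0.7, -0.02]
treated = [1, 60]
control = [0, 60]

print(hazard_ratio(beta, treated, control))        # equals exp(0.7)
print(hazard_ratio([0.0, 0.0], treated, control))  # beta = 0 gives 1.0
```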


Popularity of the Cox Model

The Cox proportional hazards model

• provides the primary information desired from a survival analysis, hazard ratios and adjusted survival curves, with a minimum number of assumptions

• is a robust model where the regression coefficients closely approximate the results from the correct parametric model.


Partial Likelihood

Partial likelihood differs from maximum likelihood because

• it does not use the likelihoods for all subjects

• it only considers likelihoods for subjects that experience the event

• it considers subjects as part of the risk set until they are censored.


Partial Likelihood

Subject    Survival Time    Status
C          2.0              1
B          3.0              1
A          4.0              0
D          5.0              1
E          6.0              0

(Status: 1 = event observed, 0 = censored)


Partial Likelihood

$$L_c = \frac{h_c(2)}{h_c(2) + h_b(2) + h_a(2) + h_d(2) + h_e(2)}$$

$$L_b = \frac{h_b(3)}{h_b(3) + h_a(3) + h_d(3) + h_e(3)}$$

$$L_d = \frac{h_d(5)}{h_d(5) + h_e(5)}$$


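Partial Likelihood

$$L_d = \frac{h_d(5)}{h_d(5) + h_e(5)}$$

$$L_d = \frac{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}}}{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}} + h_0(5)\, e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \cdots + \beta_k X_{ek}}}$$

$$L_d = \frac{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}}}{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}} + e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \cdots + \beta_k X_{ek}}}$$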


Partial Likelihood

The overall likelihood is the product of the individual likelihoods. That is:

$$L = L_c \times L_b \times L_d$$
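The product above can be evaluated directly for the five-subject example. The following is a minimal Python sketch with a single covariate; the covariate values in `x` and the function name `partial_likelihood` are hypothetical (the slides give no covariate data), and status 1 is taken to mean that the event was observed.

```python
import math

# Subjects from the example table: (name, survival time, status).
subjects = [("C", 2.0, 1), ("B", 3.0, 1), ("A", 4.0, 0), ("D", 5.0, 1), ("E", 6.0, 0)]

# Hypothetical single covariate per subject (NOT from the slides).
x = {"A": 0.0, "B": 1.0, "C": 1.0, "D": 0.0, "E": 0.0}


def partial_likelihood(beta):
    """Product, over subjects with an event, of
    exp(beta * x_j) / sum over the risk set of exp(beta * x_l).

    The baseline hazard cancels in each ratio, so it never appears.
    """
    likelihood = 1.0
    for name, time, status in subjects:
        if status != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at this event time.
        risk_set = [n for n, t, _ in subjects if t >= time]
        denominator = sum(math.exp(beta * x[n]) for n in risk_set)
        likelihood *= math.exp(beta * x[name]) / denominator
    return likelihood


print(partial_likelihood(0.0))  # (1/5) * (1/4) * (1/2)
print(partial_likelihood(1.0))
```

With β = 0 every subject in a risk set is equally likely to fail, so the three factors are 1/5, 1/4, and 1/2, matching the risk-set sizes in L_c, L_b, and L_d. Subject A, censored at time 4, appears in the denominators for C and B but not for D.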


7. SAS Programs for Survival Analysis

There are three SAS procedures for analyzing survival data: LIFETEST, PHREG, and LIFEREG.

PROC LIFETEST is a nonparametric procedure for estimating the survivor function, comparing the underlying survival curves of two or more samples, and testing the association of survival time with other variables.

PROC PHREG is a semiparametric procedure that fits the Cox proportional hazards model and its extensions.

PROC LIFEREG is a parametric regression procedure for modeling the distribution of survival time with a set of concomitant variables.


Proc LIFETEST

The Kaplan-Meier (K-M) survival curves and related tests (Log-Rank, Wilcoxon) can be generated using SAS PROC LIFETEST

PROC LIFETEST DATA=SAS-data-set <options>;
  TIME variable <*censor(list)>;
  STRATA variable <(list)> <...variable <(list)>>;
  TEST variables;
RUN;


Proc PHREG

The Cox (proportional hazards) regression is performed using SAS PROC PHREG

proc phreg data=rsmodel.colon;

model surv_mm*status(0,2,4) = sex yydx / risklimits;

run;


Proc LIFEREG

The accelerated failure time regression is performed using SAS PROC LIFEREG

proc lifereg data=subset outest=OUTEST(keep=_scale_);
  model (lower, hours) = yrs_ed yrs_exp / d=normal;
  output out=OUT xbeta=Xbeta;
run;


Selected References

PD Allison (1995). Survival Analysis Using SAS: A Practical Guide. SAS Publishing.

JD Kalbfleisch and RL Prentice (2002). The Statistical Analysis of Failure Time Data. Wiley-Interscience.


Questions?