Quick Overview to Survival Analysis
Jinheum Kim
(*******@suwon.ac.kr)
Department of Applied Statistics
University of Suwon
2007. 6. 2
2
Outline
Survival data
Censoring & Truncation
Survivor function & Hazard function
Kaplan-Meier estimator
Log-rank test
Cox proportional hazards model
Illustration with an example
3
What is survival analysis?
Outcome variable $T$: time until an event occurs
Time origin (e.g., birth date, entry into a study, or diagnosis of a disease)
Time scale (e.g., years, months, weeks, or days)
Event (e.g., death, disease incidence, relapse from remission…)
4
Censoring
Censoring: the survival time is not known exactly
Why may censoring occur?
No event before the study ends
Lost to follow-up
Withdrawn from the study
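A right-censored observation is usually stored as an observed time plus an indicator of whether the event was actually seen. The sketch below shows one way to encode that. The `Observation` class, the `observe` helper, and the subject values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    time: float   # observed follow-up time
    event: bool   # True if the event was observed, False if right-censored

def observe(event_time, censor_time):
    """Observed time is the earlier of event and censoring; the flag records which came first."""
    return Observation(min(event_time, censor_time), event_time <= censor_time)

# Hypothetical subjects: (true event time, censoring time), both in weeks
subjects = [(5.0, 12.0), (20.0, 12.0), (9.0, 3.5)]
data = [observe(t, c) for t, c in subjects]

for obs in data:
    print(obs)
```

The second subject stands for "no event before the study ends" and the third for a subject who is lost or withdrawn before the event.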
5
A hypothetical example
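[Figure: follow-up timelines for six subjects (A to F) on a time axis in weeks (2 to 12). Annotations on the lines: "Study end" (twice), "Withdrawn", and "Lost". Survival-time labels, with their relational symbols lost in extraction: T 3, T 12, T 3.5, T 8, T 6, T 3.5.]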
6
Other types of censoring
Left censoring: $T \in [0, c)$, i.e., the event is observed to have occurred prior to time $c$
E.g., time to first use of marijuana
Q: When did you first use marijuana?
A: an exact age, “I never used it.” or “I have used it but cannot recall just when the first time was.”
Double censoring
Interval censoring: $T \in (a, b)$
E.g., time to cosmetic deterioration in breast cancer patients
7
Censoring vs. Truncation
When does truncation occur?
Only those individuals whose event time lies within a certain observational window $(Y_L, Y_R)$ are observed
In contrast to censoring, where there is at least partial information on each subject
Left truncation: when $Y_R = \infty$. E.g., life lengths of elderly residents of a retirement community
Right truncation: when $Y_L = 0$. E.g., waiting time from infection at transfusion to clinical onset of AIDS (sampled on June 30, 1986)
8
Illustration
Data on 137 bone marrow transplant patients
Risk factors: patient and donor age, sex, and CMV status, waiting time from diagnosis to transplantation, FAB, MTX
Three groups: AML low risk(54), AML high risk(45), ALL(38)
Survival times
$T_1$: time (in days) to death or end of study
$T_2$: disease-free survival time (time to relapse, death, or end of study)
$T_A$: time to acute GvHD
$T_C$: time to chronic GvHD
$T_P$: time to return of platelets to normal levels
9
Simplified recovery process from BMT
[Diagram: states TRANSPLANT, acute GvHD, platelet recovery, Relapse, and Death; the transition arrows did not survive extraction.]
10
Survivor function
(definition)
Probability that a person survives longer than $t$
$S(t) = P(T > t)$
(properties)
§ Non-increasing
§ $S(0) = 1$
§ Eventually nobody would survive: $S(\infty) = 0$
11
Hazard function
(definition)
Instantaneous potential per unit time for the event to occur, given that the individual has survived up to time $t$
$h(t) = \lim_{\Delta t \to 0} \dfrac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
(properties)
§ Non-negative
§ No upper bound
12
$S(t)$ vs. $h(t)$
Focus on not failing vs. failing
The higher $S(t)$ is, the smaller $h(t)$ is
Directly describes survival vs. gives insight about conditional failure rates
(relational formula)
$S(t) = \exp\left\{-\int_0^t h(u)\,du\right\}$ or $h(t) = -\dfrac{dS(t)/dt}{S(t)}$
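The two relational formulas can be checked numerically. The sketch below assumes a Weibull hazard with hypothetical parameters (not from the slides), recovers $S(t)$ by integrating the hazard, and recovers $h(t)$ by differentiating $S(t)$. All function names are illustrative.

```python
import math

SHAPE, SCALE = 1.5, 2.0  # hypothetical Weibull parameters, chosen only for illustration

def hazard(t):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def survivor_closed_form(t):
    """Weibull survivor function S(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / SCALE) ** SHAPE))

def survivor_from_hazard(t, steps=20000):
    """S(t) = exp(-integral_0^t h(u) du), with the integral by the trapezoidal rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * width) for i in range(1, steps))
    return math.exp(-total * width)

def hazard_from_survivor(t, eps=1e-5):
    """h(t) = -(dS/dt) / S(t), with the derivative by a central difference."""
    slope = (survivor_closed_form(t + eps) - survivor_closed_form(t - eps)) / (2 * eps)
    return -slope / survivor_closed_form(t)

print(survivor_from_hazard(3.0), survivor_closed_form(3.0))
print(hazard_from_survivor(2.0), hazard(2.0))
```

Each printed pair should agree up to numerical error, which is the content of the relational formula.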
13
Three goals of survival analysis
Estimate survivor and/or hazard functions
Compare survivor and/or hazard functions
Assess the relationship of explanatory variables to survival time
14
Kaplan-Meier estimator
(Distinct) observed survival times: $t_1 < t_2 < \cdots < t_k$, conventionally $t_0 = 0$, $t_{k+1} = \infty$
$d_j$ $(j = 0, \ldots, k)$: # of individuals who fail at $t_j$
$m_j$: # of individuals censored in $[t_j, t_{j+1})$
$n_j = (m_j + d_j) + \cdots + (m_k + d_k)$: # of individuals at risk just prior to $t_j$
$\hat{S}(t) = \prod_{j \mid t_j \le t} \left(1 - \dfrac{d_j}{n_j}\right)$: multiplying (1 - observed proportion of failures) at each survival time
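The product formula can be sketched in a few lines of plain Python. The function name `kaplan_meier` and the six observations are hypothetical. `n_j` and `d_j` are counted directly from the data, following the definitions on this slide.

```python
def kaplan_meier(times, events):
    """Return [(t_j, S_hat(t_j))] at each distinct observed event time t_j."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv, curve = 1.0, []
    for tj in event_times:
        n_j = sum(1 for t in times if t >= tj)                        # at risk just prior to t_j
        d_j = sum(1 for t, e in zip(times, events) if e and t == tj)  # failures at t_j
        surv *= 1.0 - d_j / n_j                                       # multiply by (1 - d_j / n_j)
        curve.append((tj, surv))
    return curve

# Hypothetical data: follow-up times and event flags (1 = failure, 0 = censored)
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for tj, s in curve:
    print(tj, round(s, 4))
```

In this toy data set the largest observed time is a failure, so the estimate drops to zero at the last step. With a censored largest time it would stay positive.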
15
Remarks on $\hat{S}(t)$
Never reduces to zero if $m_k > 0$; not defined for $t$ > (largest time recorded)
(estimated asymptotic variance)
$\widehat{\mathrm{var}}(\hat{S}(t)) = \hat{S}(t)^2 \sum_{j \mid t_j \le t} \dfrac{d_j}{n_j (n_j - d_j)}$: Greenwood’s formula
Pointwise 95% confidence interval for $S(t)$: $\hat{S}(t) \pm 1.96\, se(\hat{S}(t))$
linear and symmetric, but it may fall outside (0,1) and can have a low coverage rate with very small samples
Life table estimator: used for survival data grouped into convenient intervals
Nelson-Aalen estimator for the cumulative hazard function $\Lambda(t)$: $\hat{\Lambda}(t) = \sum_{j \mid t_j \le t} \dfrac{d_j}{n_j}$
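A minimal sketch of Greenwood's formula, the linear 95% interval, and the Nelson-Aalen sum, computed from a table of at-risk and failure counts. The function name `km_table` and the three-row risk table are hypothetical. The sketch assumes every row has more subjects at risk than failures, so the Greenwood term is defined.

```python
import math

def km_table(risk_table, z=1.96):
    """risk_table: (t_j, n_j, d_j) rows sorted by t_j, each with n_j > d_j."""
    surv, greenwood_sum, cum_hazard, rows = 1.0, 0.0, 0.0, []
    for t_j, n_j, d_j in risk_table:
        surv *= 1.0 - d_j / n_j                      # Kaplan-Meier product term
        greenwood_sum += d_j / (n_j * (n_j - d_j))   # Greenwood's running sum
        cum_hazard += d_j / n_j                      # Nelson-Aalen running sum
        se = surv * math.sqrt(greenwood_sum)
        rows.append({"t": t_j, "S": surv, "se": se,
                     "lower": surv - z * se, "upper": surv + z * se,
                     "cum_hazard": cum_hazard})
    return rows

# Hypothetical counts: (time, number at risk, number of failures)
rows = km_table([(2, 6, 1), (3, 5, 1), (5, 3, 1)])
for row in rows:
    print(row)
```

With only six subjects, the upper limit at the first event time exceeds 1, which illustrates the remark that the linear interval can fall outside (0,1).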
16
Illustration (revisited)
Survival time = time to relapse, death, or end of study, i.e., disease-free survival time
Estimated disease-free survival curves:
AML low risk > ALL > AML high risk
Estimated cumulative hazard rates
17
Survival curves for three disease groups
18
CI of survival for ALL group
19
CI of survival for high-risk AML group
20
CI of survival for low-risk AML group
21
Comparison of survivor functions
Test whether or not the survivor functions for two groups are equivalent
$t_1 < t_2 < \cdots < t_k$: (distinct) survival times obtained by pooling the samples from the two groups
$d_{ij}$: (observed) # of failures at $t_j$ in group $i$, $d_j = \sum_{i=1}^{2} d_{ij}$
$n_{ij}$: # of individuals at risk just prior to $t_j$ in group $i$, $n_j = \sum_{i=1}^{2} n_{ij}$
$i = 1, 2$; $j = 1, \ldots, k$
22
Comparison of survivor functions
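Idea: Based on $Z_i = \sum_{j=1}^{k} \left(d_{ij} - d_j \dfrac{n_{ij}}{n_j}\right) = \sum_{j=1}^{k} (O_{ij} - E_{ij})$
Log-rank statistic
$X^2 = \dfrac{\left[\sum_{j=1}^{k} (O_{1j} - E_{1j})\right]^2}{\sum_{j=1}^{k} \dfrac{n_{1j}}{n_j}\left(1 - \dfrac{n_{1j}}{n_j}\right)\dfrac{n_j - d_j}{n_j - 1}\, d_j}$
§ $X^2 \sim \chi^2_1$ under $H_0$
§ If $X^2 > \chi^2_1(\alpha)$, reject equality of the survivor functions at level $\alpha$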
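The statistic can be sketched directly from the observed-minus-expected idea. The function `logrank_two_groups` and the four observations are hypothetical. The variance term is the usual hypergeometric form for two groups, skipped when only one subject remains at risk. The p-value uses the identity that the upper tail of a chi-square with 1 degree of freedom equals `erfc(sqrt(x/2))`.

```python
import math

def logrank_two_groups(times1, events1, times2, events2):
    """Return (X^2, p-value) for the two-group log-rank test with weight w(t_j) = 1."""
    pooled = ([(t, e, 1) for t, e in zip(times1, events1)] +
              [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e, variance = 0.0, 0.0
    for tj in event_times:
        n1 = sum(1 for t, _, g in pooled if g == 1 and t >= tj)        # n_1j
        n = sum(1 for t, _, _ in pooled if t >= tj)                    # n_j
        d1 = sum(1 for t, e, g in pooled if g == 1 and e and t == tj)  # d_1j
        d = sum(1 for t, e, _ in pooled if e and t == tj)              # d_j
        o_minus_e += d1 - d * n1 / n                                   # O_1j - E_1j
        if n > 1:  # the variance term is undefined when a single subject is at risk
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    x2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(x2 / 2.0))  # upper tail of chi-square with 1 df
    return x2, p_value

# Hypothetical data: every subject fails, two subjects per group
x2, p_value = logrank_two_groups([1, 3], [1, 1], [2, 4], [1, 1])
print(round(x2, 4), round(p_value, 4))
```

The counting loops are quadratic in the sample size, which is fine for a sketch but not for large data sets.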
23
Remarks for log-rank test
Choice of weight function $w(t_j)$; in particular, $w(t_j) = 1$ for the log-rank test
Extension to three or more groups
Stratification on a set of covariates
Trend test for ordered alternatives: plugging in any set of scores
24
Illustration (revisited)
Test that the disease-free survival curves of the three groups are the same over $t \le 2{,}204$ days
Three types of test statistic
Log-rank: 13.8037 (p-value=0.0010) with $w(t_j) = 1$
Gehan: 16.2407 (p-value=0.0003) with $w(t_j) = n_j$
Tarone-Ware: 15.6529 (p-value=0.0004) with $w(t_j) = \sqrt{n_j}$
highly significant!
25
Cox proportional hazards model
Why are regression models needed? To assess the effect of covariates (or explanatory variables, risk factors) on time to event
Data: $(t_j, \delta_j, z_j)$; $t_j = \min(x_j, c_j)$, $\delta_j = I(t_j = x_j)$, $j = 1, 2, \ldots, n$
Cox model: $h(t \mid z) = h_0(t)\exp(\beta' z)$, the hazard rate at $t$ for an individual with risk vector $z$
A sort of semiparametric model: parametric for the covariate effect + nonparametric for the baseline hazard function
Why is it called PH?
RR (or HR) $= \dfrac{h(t \mid z)}{h(t \mid z^*)} = \exp\left\{\sum_k \beta_k (z_k - z_k^*)\right\}$ is constant against $t$
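The "proportional" part can be seen numerically: under the Cox form the baseline hazard cancels in the ratio, whatever its shape. In the sketch below the baseline hazard, the coefficients, and the two covariate vectors are all hypothetical.

```python
import math

def cox_hazard(t, z, beta, baseline):
    """h(t | z) = h_0(t) * exp(beta' z)."""
    return baseline(t) * math.exp(sum(b * x for b, x in zip(beta, z)))

def baseline(t):
    """Hypothetical baseline hazard; any positive function of t would do."""
    return 0.1 + 0.05 * t

beta = [0.7, -0.2]   # hypothetical coefficients
z = [1, 3.0]         # covariates of one individual
z_star = [0, 1.0]    # covariates of a reference individual

ratios = [cox_hazard(t, z, beta, baseline) / cox_hazard(t, z_star, beta, baseline)
          for t in (0.5, 1.0, 5.0, 20.0)]
expected = math.exp(sum(b * (x - y) for b, x, y in zip(beta, z, z_star)))
print(ratios, expected)
```

Every ratio equals $\exp\{\sum_k \beta_k (z_k - z_k^*)\}$ up to rounding, regardless of the time at which it is evaluated.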
26
Illustration (revisited)
Background: to adjust the comparisons of the three risk groups because this was not a randomized clinical trial
Fixed risk factors
$z_1$ = 1 if AML low-risk; $z_2$ = 1 if AML high-risk
$z_3$ = waiting time
$z_4$ = FAB
$z_5$ = MTX
$z_6$ = 1 if donor: male; $z_7$ = 1 if patient: male; $z_8 = z_6 \times z_7$ = 1 if donor & patient: male
$z_9$ = 1 if donor: CMV positive; $z_{10}$ = 1 if patient: CMV positive; $z_{11} = z_9 \times z_{10}$ = 1 if donor & patient: CMV positive
$z_{12}$ = donor age - 28; $z_{13}$ = patient age - 28; $z_{14} = z_{12} \times z_{13}$
27
ANOVA table for final model (fixed only)
Variable   Degrees of Freedom   b        SE(b)   Wald Chi Square   p-Value
$Z_1$      1                    -1.091   0.354    9.48             0.002
$Z_2$      1                    -0.404   0.363    1.24             0.265
$Z_4$      1                     0.837   0.279    9.03             0.003
$Z_{12}$   1                     0.004   0.018    0.05             0.831
$Z_{13}$   1                     0.007   0.020    0.12             0.728
$Z_{14}$   1                     0.003   0.001   11.01             0.001
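The Wald chi-square column is the squared ratio of b to SE(b), and exp(b) gives the hazard ratio implied by the Cox model. The sketch below recomputes both for two rows of the table (b = -1.091 with SE 0.354, and b = 0.837 with SE 0.279). The helper name is hypothetical, and because the table values are rounded, the recomputed statistics match the printed ones only approximately.

```python
import math

def wald_and_hazard_ratio(b, se):
    """Return (Wald chi-square, p-value on 1 df, hazard ratio exp(b))."""
    wald = (b / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))  # upper tail of chi-square with 1 df
    return wald, p_value, math.exp(b)

# Rounded b and SE(b) as printed in the table
wald_1, p_1, hr_1 = wald_and_hazard_ratio(-1.091, 0.354)
wald_4, p_4, hr_4 = wald_and_hazard_ratio(0.837, 0.279)
print(round(wald_1, 2), round(p_1, 3), round(hr_1, 3))
print(round(wald_4, 2), round(p_4, 3), round(hr_4, 3))
```

The hazard ratios are not printed on the slide; they are derived here only to show how a coefficient is read.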
28
Other regression models
Additive hazards model: $h(t \mid z) = h_0(t) + \beta' z$
Accelerated failure time model: $\log T = \beta' z + \varepsilon$
Focus on the direct relationship between $z$ and time to event
Effect of covariates is multiplicative on $t$ rather than on the hazard function
Parametric, but providing a good fit if correctly chosen
29
Refinements of Cox model
Stratification
§ When the PH assumption is violated for some covariate
§ $h_j(t \mid z) = h_{0j}(t)\exp(\beta' z)$, $j = 1, 2, \ldots, s$
Time-dependent covariates
§ E.g., BP, cholesterol, size of the tumor …
§ $h(t \mid z(t)) = h_0(t)\exp(\beta' z(t))$
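One common way to prepare a binary time-dependent covariate for a Cox fit is the counting-process layout, where a subject's follow-up is split at the time the covariate switches. This layout is an assumption about data preparation, not something the slides specify. The function name and example values are hypothetical.

```python
def split_follow_up(follow_up, event, switch_time=None):
    """Return (start, stop, event, z) rows, where z(t) = 1 once the covariate has switched on.

    If the switch never happens during follow-up, a single row with z = 0 is returned.
    """
    if switch_time is None or switch_time >= follow_up:
        return [(0.0, follow_up, event, 0)]
    return [
        (0.0, switch_time, False, 0),        # before the switch: no event recorded in this piece
        (switch_time, follow_up, event, 1),  # after the switch: carries the final event status
    ]

# Hypothetical subject: covariate switches on at day 20, event at day 100
rows_switched = split_follow_up(100.0, True, 20.0)
# Hypothetical subject: covariate never switches on, censored at day 50
rows_never = split_follow_up(50.0, False)
print(rows_switched)
print(rows_never)
```

Each row then contributes to the risk set only over its own interval, with the covariate value that applied during that interval.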
30
Illustration (revisited)
Time-dependent covariates
Whether or not aGvHD has occurred by time $t$: NS!
Whether or not cGvHD has occurred by time $t$: NS!
Whether or not the platelets have recovered by time $t$: Significant!
Final risk factors
Fixed-time effects: Disease group, FAB, Age
Time-dependent effect: Platelet recovery
Time-dependent interactions: Disease group × Platelet recovery, Age × Platelet recovery, FAB × Platelet recovery
31
Three regressions with a time-dependent covariate
Variable   Degrees of Freedom   b         SE(b)    Wald Chi Square   p-Value
$Z_1$      1                    -0.5516   0.2880    3.6690           0.0554
$Z_2$      1                     0.4338   0.2722    2.5400           0.1110
$Z_A(t)$   1                     0.3184   0.2851    1.2470           0.2642

$Z_1$      1                    -0.6225   0.2962    4.4163           0.0356
$Z_2$      1                     0.3657   0.2685    1.8548           0.1732
$Z_C(t)$   1                    -0.1948   0.2876    0.4588           0.4982

$Z_1$      1                    -0.4962   0.2892    2.9435           0.0862
$Z_2$      1                     0.3813   0.2676    2.0306           0.1542
$Z_P(t)$   1                    -1.1297   0.3280   11.8657           0.0006
32
ANOVA table for final model (fixed+time-variant)
Variable                               Degrees of Freedom   b         SE(b)    Wald Chi Square   p-Value
$Z_1$: AML low risk                    1                     1.3073   0.8186    2.550            0.1103
$Z_2$: AML high risk                   1                     1.1071   1.2242    0.818            0.3658
$Z_3$: AML with FAB Grade 4 or 5       1                    -1.2348   1.1139    1.229            0.2676
$Z_4$: Patient age - 28                1                    -0.1538   0.0545    7.948            0.0048
$Z_5$: Donor age - 28                  1                     0.1166   0.0434    7.229            0.0072
$Z_6 = Z_4 \times Z_5$                 1                     0.0026   0.0020    1.786            0.1814
$Z_P(t)$: Platelet Recovery            1                    -0.3062   0.6936    0.195            0.6589
$Z_7 = Z_1 \times Z_P(t)$              1                    -3.0374   0.9257   10.765            0.0010
$Z_8 = Z_2 \times Z_P(t)$              1                    -1.8675   1.2908    2.093            0.1479
$Z_9 = Z_3 \times Z_P(t)$              1                     2.4535   1.1609    4.467            0.0346
$Z_{10} = Z_4 \times Z_P(t)$           1                     0.1933   0.0588   10.821            0.0010
$Z_{11} = Z_5 \times Z_P(t)$           1                    -0.1470   0.0480    9.383            0.0022
$Z_{12} = Z_6 \times Z_P(t)$           1                     0.0001   0.0023    0.003            0.9561
33
Tests of PH assumption
Factor          Wald Chi Square   Degrees of Freedom   p-Value
Disease group   1.735             2                    0.4200
Waiting time    0.005             1                    0.9441
FAB             0.444             1                    0.5051
MTX             4.322             1                    0.0376
Sex             0.220             3                    0.9743
CMV status      1.687             3                    0.6398
Age             4.759             3                    0.1903
34
What have we covered so far?
How to define survival data
Censoring vs. truncation
Survivor function vs. hazard function and their relation
How to estimate the survival function: KM estimator
How to compare survival functions: Log-rank test
How to assess the effects of risk factors: Cox proportional hazards model with fixed and/or time-dependent covariates
Illustrations with BMT data
35
THANK YOU!