MH4513
Survival Analysis
Liming Xiang
MH4513 - Chapter 1
Chapter 1. Introduction
Survival analysis
A collection of statistical procedures for data analysis
for which the outcome variable of interest is time until
a certain event occurs
Methods include tools for
Summarizing and characterizing the distributions of such data
Testing differences between groups of individuals
Setting up regression models to analyze complex influences
of covariates on these duration data
1.1 Survival Data
Examples:
Time to death
Time it takes for a patient to respond to a therapy;
Time from response until disease relapse
Time: years, months, weeks or days from beginning of
follow-up of an individual to the occurrence of an
event
Event: death, disease incidence, relapse from remission,
recovery
Time to event T : Survival time or lifetime
Event: failure
The start and end points for measuring time are chosen
according to the context, so T may not represent the
entire life of an individual
e.g.,
Time of treatment initiation until death
Time to first recurrence of a tumor (i.e., length of remission)
after initial treatment
Lifetime of an electrical component until failure (“Reliability”)
Promotion times for employees
Time until the first transaction of a new house
Distinguishing feature of survival data
Clearly, all times satisfy T ≥ 0
T may not be observed for all individuals; instead, all
we know is that during a certain period of observation
there was no event → censored data
This feature is known as censoring
e.g.
A medical study terminated before some individuals
experienced their event (some cancer patients live a long
time)
Some individuals left the study before they experienced their
event
When a study is concerned with one particular cause of death,
individuals who die of other causes may be regarded as
censored
1.2 Examples
AML study: time in remission for leukemia patients
After reaching remission via chemotherapy treatment,
patients were divided into two groups:
group 1: received maintenance chemotherapy;
group 2: control group did not.
Table 1.1 Data for the AML maintenance study. A + indicates a censored value.
Group            Length of complete remission (in weeks)
Maintained       9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
Nonmaintained    5, 5, 8, 8, 12, 16+, 23, 27, 30, 33, 43, 45
Objective: to see if maintenance chemotherapy
prolonged the time until relapse
A naive descriptive analysis
I) Analysis of AML data after throwing out censored
observations
Measures Maintained Nonmaintained
Mean 25.1 21.7
Median 23.0 23.0
II) Analysis of AML data after treating censored observations as
exact
The distribution of group 1 is more skewed to the right than that of group 2
Measures Maintained Nonmaintained
Mean 38.5 21.3
Median 28.0 19.5
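The two naive summaries above can be reproduced directly from Table 1.1. The sketch below is only an illustration: the helper name `naive_summary` and the string encoding of the table are assumptions, not part of the course material.

```python
from statistics import mean, median

# Table 1.1; a trailing "+" marks a censored remission time (weeks).
MAINTAINED = "9 13 13+ 18 23 28+ 31 34 45+ 48 161+".split()
NONMAINTAINED = "5 5 8 8 12 16+ 23 27 30 33 43 45".split()


def naive_summary(observations, drop_censored):
    """Return (mean, median) under one of the two naive treatments.

    drop_censored=True  -> analysis I  (censored values thrown out)
    drop_censored=False -> analysis II (censored values treated as exact)
    """
    times = [
        float(obs.rstrip("+"))
        for obs in observations
        if not (drop_censored and obs.endswith("+"))
    ]
    return mean(times), median(times)


for label, group in (("Maintained", MAINTAINED), ("Nonmaintained", NONMAINTAINED)):
    for drop in (True, False):
        m, med = naive_summary(group, drop)
        print(label, "I" if drop else "II", "mean:", m, "median:", med)
```

Neither treatment is satisfactory: dropping censored values discards the longest remissions, while treating them as exact understates them.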
III) Analysis of AML data after accounting for the censoring
A nonparametric method is used here to estimate the mean and median.
The distributions of both groups appear quite symmetric.
Measures Maintained Nonmaintained
Mean 52.6 22.7
Median 31.0 23.0
31.8
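The slide does not name the nonparametric method. A minimal sketch, assuming it is the Kaplan-Meier (product-limit) estimator, with the median taken as the first time the estimated survival curve drops to 0.5 or below and the mean taken as the area under the curve up to the largest observation. The function names are hypothetical.

```python
# Table 1.1; a trailing "+" marks a censored remission time (weeks).
MAINTAINED = "9 13 13+ 18 23 28+ 31 34 45+ 48 161+".split()
NONMAINTAINED = "5 5 8 8 12 16+ 23 27 30 33 43 45".split()


def decode(observations):
    """Split '13+' style entries into times and event flags (1 = event seen)."""
    times = [float(obs.rstrip("+")) for obs in observations]
    events = [0 if obs.endswith("+") else 1 for obs in observations]
    return times, events


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (ASSUMED estimator)."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        leaving = sum(1 for x in times if x == t)  # events plus censorings at t
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def km_median(curve):
    """Smallest event time at which the estimated S(t) is <= 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # curve never reaches 0.5


def km_mean(curve, t_max):
    """Area under the step function from 0 to t_max."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (t_max - prev_t)


for label, group in (("Maintained", MAINTAINED), ("Nonmaintained", NONMAINTAINED)):
    times, events = decode(group)
    curve = kaplan_meier(times, events)
    print(label, round(km_mean(curve, max(times)), 1), km_median(curve))
```

Under these assumptions the sketch reproduces the tabulated means (52.6, 22.7) and medians (31.0, 23.0).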
CNS lymphoma data from a clinical study
Reference paper: Dahlborg et al. (1996, The Cancer Journal from Scientific
American, Vol. 2, 166-174)
Aim of study: to compare survival time between the
two groups
58 non-AIDS patients with central nervous system
(CNS) lymphoma were treated
Group 1: n1 = 19 patients received radiation prior to blood-brain
barrier disruption (BBBD) chemotherapy treatment
Group 0: n2 = 39 patients received BBBD treatment only
Radiographic tumor response and survival were evaluated. A
number of variables obtained for each patient are given below
Table 1.2 The variables in the CNS lymphoma example
1 PT.NUMBER: patient number
2 Group: 1=prior radiation; 0=no prior radiation with respect
to 1st blood-brain barrier disruption (BBBD) procedure
3 Sex: 1=female; 0=male
4 Age: at time of 1st BBBD, recorded in years
5 Status: 1=dead; 0=alive
6 DxtoB3: time from diagnosis to 1st BBBD in years
7 DxtoDeath: time from diagnosis to death in years
8 B3toDeath: time from 1st BBBD to death in years
9 KPS.PRE: Karnofsky performance score before 1st BBBD,
numerical value 0-100
10 LESSING: Lesions: single=0, multiple=1
11 LESDEEP: Lesions: superficial=0, deep=1
12 LESSUP: Lesions: supra=0, infra=1, both=2
13 PROC: Procedure: subtotal resection=1; biopsy=2; other=3
14 RAD4000: Radiation>4000: yes=1; no=0
15 CHEMOPRIOR: yes=1, no=0
16 RESPONSE: Tumor response to chemo: complete=1; partial=2; blanks represent missing data
Is Group 0 (no prior radiation) surviving as long or longer with
improved cognitive function?
Survival curves for the two groups are estimated:
Figure 1.1: Survival functions for CNS data (Primary CNS Lymphoma Patients). Horizontal axis: Survival Time in Years from First BBBD (0 to 12); vertical axis: Percent Surviving (0 to 100). Curves: no radiation prior to BBBD (n=39); radiation prior to BBBD (n=19); + = patient is censored.
Group 0's curve is always above that of Group 1, suggesting a
higher rate of survival and hence a longer average survival time for Group 0.
Radiation profoundly impairs cognitive functioning.
Next question: do any subsets of the covariates help to explain survival time?
E.g., does age at time of first treatment or gender increase or
decrease the relative risk of survival?
Answering this requires some kind of regression procedure.
Other examples of time-to-event data analysis arise in engineering and economics.
An application from industrial life-testing of springs (Cox and
Oakes 1984, Example 1.3)
Springs are tested under cycles of repeated loading; failure time is the number of cycles to failure.
Examples in reliability can be found in Lawless (1982).
An application to real estate finance (Cheung et al. 2004, Journal of Real Estate Finance and Economics 29:321-339)
Time to event = transaction duration time (number of days between two transactions).
The study aims to identify possible factors that determine the popularity of a residential unit by means of a repeated sales pattern.
1.3 Functions of Survival Time
Of course, survival time T is a positive random variable.
In routine data analysis, we may first present some summary
statistics: mean and standard error for the mean etc.
In analyzing survival data, however, the summary statistics may not
have the desired statistical properties, such as unbiasedness, due to
possible censoring.
Other methods of presenting survival data are therefore needed.
One way is
first to estimate the underlying true distribution, either parametrically
or non-parametrically,
then to estimate other quantities of interest, such as the mean and
median of the survival time.
The distribution of survival times can be described in
some equivalent ways, often characterized by 3
functions:
Probability density function (PDF)
Survival function
Hazard function (or hazard rate)
Let random variable T be the time to the event of interest.
Definition. Cumulative distribution function (CDF)
$F(t) = P(T \le t), \quad t \ge 0$ (1.1)
F(t) is right continuous, i.e., $\lim_{u \to t^+} F(u) = F(t)$.
Review of the probability density function (PDF)
Definition. Probability density function
$f(t) = \dfrac{dF(t)}{dt} = \lim_{\Delta t \to 0^+} \dfrac{P(t \le T < t + \Delta t)}{\Delta t}$ (1.2)
= Rate of occurrence of death at t
f(t) is the limit of the probability that an individual fails in the short time interval from t to
$t + \Delta t$, per unit time. It gives the rate of occurrence of failure at t.
In practice, without censoring, f(t) can be estimated as the proportion of subjects dying
in an interval per unit width:
$\hat{f}(t) = \dfrac{\#\{\text{subjects dying in the interval beginning at time } t\}}{\#\{\text{subjects}\} \times (\text{interval width})}$ (1.3)
When censoring is present, this estimate is not applicable.
$\int_a^b f(t)\,dt$ = proportion of individuals failing in the time interval (a, b).
Max{f(t)} = the peak of high frequency of failure.
For example, the exponential distribution has $f(t) = e^{-t}$.
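The estimate labelled (1.3) is a histogram scaled to integrate to one. A minimal sketch, using a small made-up set of uncensored death times; the data and the function name `density_estimate` are hypothetical.

```python
# HYPOTHETICAL, fully observed (uncensored) death times, in arbitrary units.
DEATH_TIMES = [1, 2, 2, 3, 5, 7, 8, 12, 15, 20]


def density_estimate(death_times, start, width):
    """Estimate f at `start`: deaths in [start, start + width) per subject per unit width."""
    dying = sum(1 for x in death_times if start <= x < start + width)
    return dying / (len(death_times) * width)


WIDTH = 5
for start in range(0, 25, WIDTH):
    print(f"[{start}, {start + WIDTH}):", density_estimate(DEATH_TIMES, start, WIDTH))
```

Summing the estimate times the width over intervals that cover all the data gives 1, as a density should. With censoring, the count of deaths in the numerator is incomplete, which is why the estimate breaks down.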
The survival function
In biomedical applications, it is common to use the survival function.
Definition. Survival function S(t)
$S(t) = P(T > t) = 1 - F(t)$. (1.4)
For survival time T, S(t) is the probability that a randomly selected individual will
survive to time t or beyond.
In the context of equipment or manufactured item failures, S(t) is referred to as the
reliability function.
Note that $S(t) = \int_t^\infty f(u)\,du$ and $f(t) = -\dfrac{dS(t)}{dt}$.
Example 1.1: For the Weibull distribution with pdf $f(t) = \alpha\lambda t^{\alpha-1} e^{-\lambda t^{\alpha}}$, the survival function is
$S(t) = \int_t^\infty \alpha\lambda x^{\alpha-1} e^{-\lambda x^{\alpha}}\,dx = e^{-\lambda t^{\alpha}}$.
Figure 1.3: Weibull survival functions, S1-S3 for (α, λ)=(1, 0.1), (0.5, 0.0693) and (3, 0.1277)
respectively. Horizontal axis: Time (0 to 15); vertical axis: 0.0 to 1.0.
Figure 1.4: Illustration that T1 is stochastically larger than T2. Horizontal axis: Time (0.5 to 15.5); vertical axis: Survival Probability (0.0 to 1.0). Curves: S1 Treatment group; S2 Control group.
Definition. The survival distribution for group 1 is stochastically larger than the survival
distribution for group 2 if $S_1(t) \ge S_2(t)$ for all $t \ge 0$, where $S_i(t)$ is the survival function
of group i.
Definition. If $T_i$ is the survival time corresponding to group i, then in this case T1 is said to be
stochastically larger than T2.
Characteristics of S(t):
S(t) is monotone non-increasing in t
S(0) = 1
$S(\infty) = \lim_{t \to \infty} S(t) = 0$
In general, the survival curve provides useful information, which is used to
find the median and other percentiles (e.g., 25th and 75th) of survival time
compare survival distributions of two or more groups
Definition. Mean survival time E(T) is used to describe the central tendency of a
distribution.
$E(T) = \int_0^\infty t f(t)\,dt = \int_0^\infty S(t)\,dt$ (1.5)
(why?)
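One way to answer this, sketched under the assumption that T is nonnegative with density f: write $t = \int_0^t du$ and swap the order of integration, which is permitted because the integrand is nonnegative.

```latex
\begin{aligned}
E(T) &= \int_0^\infty t\, f(t)\, dt
      = \int_0^\infty \left( \int_0^t du \right) f(t)\, dt \\
     &= \int_0^\infty \left( \int_u^\infty f(t)\, dt \right) du
      = \int_0^\infty S(u)\, du .
\end{aligned}
```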
With censored survival data, the sample mean of the observed survival times is no longer an
unbiased estimate of E(T). The median is often preferred, because a small number of
individuals with exceptionally long or short lifetimes can cause the mean survival
time to be disproportionately large or small.
Definition. Median survival time: the value m such that S(m)=0.5. If S(t) is not strictly
decreasing, m is the smallest value such that $S(m) \le 0.5$.
Definition. pth quantile of survival time (100pth percentile): the value $t_p$ such that
$S(t_p) = 1 - p$. If S(t) is not strictly decreasing, $t_p$ is the smallest value such that $S(t_p) \le 1 - p$.
Example 1.2: In the hypothetical population in Figure 1.5, 80% of
the individuals will survive 4.7 years ($t_{0.2} = 4.7$) and the median survival time is 6.8 years (i.e.,
50% of the population will survive at least 6.8 years).
Figure 1.5: The survival function for a hypothetical population
Definition. Mean residual life time (mrl). For individuals of age t0, mrl(t0) measures
their expected remaining lifetime.
$mrl(t_0) = E(T - t_0 \mid T > t_0)$ (1.6)
i.e., the average remaining survival time given that the individual has survived beyond t0.
It can be shown that
$mrl(t_0) = \dfrac{\int_{t_0}^\infty S(t)\,dt}{S(t_0)}$ (1.7)
(why?)
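A sketch of the argument, assuming T has density f and $S(t_0) > 0$: condition on survival past $t_0$, then write $t - t_0 = \int_{t_0}^{t} du$ and swap the order of integration.

```latex
\begin{aligned}
mrl(t_0) &= E(T - t_0 \mid T > t_0)
          = \frac{1}{S(t_0)} \int_{t_0}^\infty (t - t_0)\, f(t)\, dt \\
         &= \frac{1}{S(t_0)} \int_{t_0}^\infty \left( \int_u^\infty f(t)\, dt \right) du
          = \frac{\int_{t_0}^\infty S(u)\, du}{S(t_0)} .
\end{aligned}
```

Setting $t_0 = 0$ gives $mrl(0) = E(T)$, since $S(0) = 1$.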
The hazard function
Definition. The hazard function h(t).
$h(t) = \lim_{\Delta t \to 0^+} \dfrac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$ (1.8)
It can be expressed in terms of the survival function S(t) and PDF f(t):
$h(t) = \dfrac{f(t)}{S(t)} = -\dfrac{d \log[S(t)]}{dt}$ (1.9)
From the definition, $h(t) \ge 0$, and $h(t)\,\Delta t$ can be viewed as the “approximate” probability
of an individual of age t experiencing the failure in the next instant. Thus the hazard
rate gives the risk of failure per unit time during the aging process.
Estimation: In practice, when there are no censored observations, the hazard rate is estimated as the
proportion of individuals dying in an interval per unit time, given that they have survived to the
beginning of the interval:
$\hat{h}(t) = \dfrac{\#\{\text{individuals dying in the interval beginning at time } t\}}{\#\{\text{individuals surviving at } t\} \times (\text{interval width})}$ (1.10)
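The estimate labelled (1.10) differs from a density estimate only in its denominator: it divides by the number still alive at the start of the interval rather than by the total number of subjects. A minimal sketch on made-up uncensored data; the data and function name are hypothetical.

```python
# HYPOTHETICAL, fully observed (uncensored) death times, in arbitrary units.
DEATH_TIMES = [1, 2, 2, 3, 5, 7, 8, 12, 15, 20]


def hazard_estimate(death_times, start, width):
    """Deaths in [start, start + width) per survivor at `start`, per unit width."""
    at_risk = sum(1 for x in death_times if x >= start)
    dying = sum(1 for x in death_times if start <= x < start + width)
    return dying / (at_risk * width)


# Six subjects are alive at time 5 and three of them die in [5, 10).
print(hazard_estimate(DEATH_TIMES, 5, 5))
```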
Definition. Cumulative hazard function.
$H(t) = \int_0^t h(x)\,dx$ (1.11)
We can integrate both sides of (1.9) to get
$H(t) = -\log S(t)$ (1.12)
Thus,
$S(t) = \exp[-H(t)] = \exp\left[-\int_0^t h(x)\,dx\right]$ (1.13)
In addition, from (1.9) we have
$f(t) = h(t) S(t) = h(t) \exp\left[-\int_0^t h(x)\,dx\right]$ (1.14)
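The relations numbered (1.12) to (1.14) can be checked numerically. The sketch below assumes a Weibull model with survival function exp(-λ t^α) and hazard αλ t^(α-1); the parameter values are illustrative and are not taken from the figures.

```python
import math

ALPHA, LAM = 1.5, 0.2  # illustrative values only (ASSUMPTION)


def hazard(t):
    return ALPHA * LAM * t ** (ALPHA - 1)


def surv(t):
    return math.exp(-LAM * t ** ALPHA)


def cumulative_hazard(t, steps=10_000):
    """Midpoint-rule approximation of H(t), the integral of h from 0 to t."""
    dx = t / steps
    return sum(hazard((k + 0.5) * dx) for k in range(steps)) * dx


def density_from_surv(t, eps=1e-5):
    """f(t) = -dS/dt, approximated by a central difference."""
    return (surv(t - eps) - surv(t + eps)) / (2 * eps)


t = 3.0
print("H(t) numeric :", cumulative_hazard(t))
print("-log S(t)    :", -math.log(surv(t)))
print("h(t) * S(t)  :", hazard(t) * surv(t))
print("-dS/dt       :", density_from_surv(t))
```

The first two printed values should agree, as should the last two, up to numerical error.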
To summarize,
a) there are 1-1 relationships between any two of the
pdf, survival function and hazard rate.
Given any one of these three functions, the other two can be
easily derived.
b) the hazard rate is not a probability.
It is a probability rate. Therefore a hazard rate can exceed
one, in the same way that a density function
f(t) may exceed one.
Shapes of the hazard function:
increasing (often when there is natural aging or wear)
decreasing (occasionally for certain types of electronic devices or patients
experiencing certain types of transplants with a very early likelihood of failure)
constant
bathtub-shaped (common in populations followed from birth)
hump-shaped (hazard rate is increasing early and eventually declining; used in
modeling survival after successful surgery, where there is an initial increase in risk
due to infection, hemorrhaging or other complications, followed by a steady decline
in risk as the patient recovers)
Example 1.3: For the Weibull distribution, the hazard rates $h(t) = \alpha\lambda t^{\alpha-1}$ are plotted for the same
values of the parameters used in Example 1.1, which give constant, increasing and
decreasing hazards.
Figure 1.6: Weibull hazard functions, h1-h3 for (shape, scale)=(1, 0.1), (0.5, 0.0693) and
(3, 0.1277) respectively. Horizontal axis: Time (0 to 15); vertical axis: Hazard Rate (0.0 to 1.2).
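The three shapes can be checked with a short sketch, assuming the Weibull hazard has the form h(t) = αλ t^(α-1) and reading the caption's pairs as (α, λ). That reading is an assumption about the caption's parametrization, but the direction of the trend depends only on whether α is below, equal to, or above 1.

```python
def weibull_hazard(t, alpha, lam):
    """Weibull hazard in the ASSUMED form alpha * lam * t**(alpha - 1)."""
    return alpha * lam * t ** (alpha - 1)


def hazard_shape(alpha, lam, grid=(1.0, 2.0, 5.0, 10.0)):
    """Classify the hazard on a grid of time points."""
    values = [weibull_hazard(t, alpha, lam) for t in grid]
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(abs(d) < 1e-12 for d in diffs):
        return "constant"
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "other"


for alpha, lam in [(1, 0.1), (0.5, 0.0693), (3, 0.1277)]:
    print((alpha, lam), hazard_shape(alpha, lam))
```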
Example 1.4: Suppose that the survival time T of a population follows the exponential
distribution with parameter λ, i.e., pdf $f(t) = \lambda e^{-\lambda t}$, for $\lambda > 0$, $t \ge 0$.
The survival function is then
$S(t) = \int_t^\infty f(x)\,dx = -e^{-\lambda x}\Big|_t^\infty = e^{-\lambda t}$, $t \ge 0$
and the hazard function by (1.9) is
$h(t) = \dfrac{f(t)}{S(t)} = \lambda$, $t \ge 0$
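The mean survival time is given by
$E(T) = \int_0^\infty t f(t)\,dt = \int_0^\infty S(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \dfrac{1}{\lambda}$
Let $S(t_{0.5}) = e^{-\lambda t_{0.5}} = 0.5$. Then the median survival time is $t_{0.5} = \log 2/\lambda$.
By (1.7), the mean residual life time after t0 is
$mrl(t_0) = \dfrac{\int_{t_0}^\infty S(t)\,dt}{S(t_0)} = \dfrac{\int_{t_0}^\infty e^{-\lambda t}\,dt}{e^{-\lambda t_0}} = \dfrac{1}{\lambda} = E(T)$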
A special example is given in the textbook (p17) with parameter λ=1.
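The exponential results can be confirmed numerically. The sketch below uses an illustrative rate λ = 0.1 (an assumption, not the textbook's value) and a simple midpoint rule; the upper integration limit is a truncation point beyond which the tail is negligible.

```python
import math

LAM = 0.1           # illustrative rate (ASSUMPTION)
UPPER = 50.0 / LAM  # truncation point; the tail beyond it is of order e**-50


def surv(t):
    return math.exp(-LAM * t)


def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    dx = (b - a) / steps
    return sum(func(a + (k + 0.5) * dx) for k in range(steps)) * dx


def mrl(t0):
    """Mean residual life: integral of S from t0 onward, divided by S(t0)."""
    return integrate(surv, t0, UPPER) / surv(t0)


mean_survival = integrate(surv, 0.0, UPPER)
median_survival = math.log(2) / LAM

print("E(T)   :", mean_survival, "vs 1/lambda =", 1 / LAM)
print("median :", median_survival, "S(median) =", surv(median_survival))
print("mrl(5) :", mrl(5.0), " mrl(20):", mrl(20.0))
```

The mean residual life does not depend on t0, which is the memoryless property of the exponential distribution.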
1.4 Censoring
Censoring is a common feature of time-to-event data.
Important issues arise in clinical trials:
Some individuals are still alive (or disease-free) at the end of
the study or analysis, so the event of interest, namely death (or
disease), has not occurred.
Length of follow-up varies due to staggered entry, so we cannot
observe the event for those individuals with insufficient
follow-up time.
Loss to follow-up: patients stop coming to the clinic or move
away
Death from other causes: competing risks
In the above cases,
the exact survival times of these individuals are unknown
→ censored observations (or times)
If no censoring occurs, the survival data set is complete.
3 major categories of censoring:
Right censoring (Type I, II, and III censoring)
Left censoring
Interval censoring
Type I Censoring
Definition. The event is observed only if it occurs prior to some
prespecified time.
Example.
A typical animal study (or clinical trial) starts with a fixed number
of animals (or patients) to which a treatment is applied.
Due to time or cost considerations, the investigator will terminate the
study or report the results before all subjects realize their events.
In a data set under a type I censoring scheme,
exact (uncensored) observations: the survival times recorded for
subjects that experience the event during the study period are the
times from the start of the experiment to their death.
censored observations: the survival times of the sacrificed
subjects are not known exactly but are recorded as at least
the length of the study period.
If there are no accidental losses,
censored observations
= length of the study period
Example 1.5
An animal experiment for toxicological research.
Goal: to assess the effect of the carcinogen on tumor development.
Experiment: 6 rats are exposed to carcinogens by injecting tumor
cells. The experiment is terminated after 30 weeks. The survival time Ti is recorded for each subject at the end of the experiment.
Ti = the time to develop a tumor of a certain size.
Observations:
A, B and D developed tumors after 10, 15 and 25 weeks
respectively;
C and E had not developed tumors by the end of the experiment;
F died accidentally without tumors after 19 weeks.
+ indicates censored data
Rat A B C D E F
Ti (wk) 10 15 30+ 25 30+ 19+
Type II Censoring
Definition. Observation ceases after a predetermined
number of failures has been reached.
The type II censoring is a useful technique for
economical use of effort in animal studies and industrial
life testing.
Example 1.5 (continued)
Suppose the investigator decides to terminate the study after 4 of the 6 rats have developed tumors.
The survival or tumor-free times are then
Rat A B C D E F
Ti (wk) 10 15 35+ 25 35 19+
If there are no accidental losses, every censored observation equals the largest uncensored observation.
However, this is not true in this example: rat F was lost accidentally at 19 weeks.
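Both versions of the rat experiment can be generated from one set of underlying tumor times. In the sketch below the times for A, B, D and E come from the slides, while C's time (40 weeks) is hypothetical, since the slides only show that it exceeds 35 weeks. F never shows a tumor before its accidental death at week 19, so its time is set to infinity as a placeholder. All function names are hypothetical.

```python
import math

# Underlying tumor times in weeks. C's value is HYPOTHETICAL; F's is a placeholder.
TUMOR_TIME = {"A": 10, "B": 15, "C": 40, "D": 25, "E": 35, "F": math.inf}
ACCIDENTAL_LOSS = {"F": 19}  # week at which a rat was lost for other reasons


def observe(tumor_time, loss, stop):
    """Return {rat: (X, delta)}; delta = 1 if the tumor was seen, 0 if censored."""
    result = {}
    for rat, t in tumor_time.items():
        censor_at = min(stop, loss.get(rat, math.inf))
        result[rat] = (t, 1) if t <= censor_at else (censor_at, 0)
    return result


def type_one(tumor_time, loss, study_length):
    """Type I: stop at a prespecified time."""
    return observe(tumor_time, loss, study_length)


def type_two(tumor_time, loss, n_failures):
    """Type II: stop at the time of the n_failures-th observed failure."""
    seen = sorted(t for rat, t in tumor_time.items() if t <= loss.get(rat, math.inf))
    return observe(tumor_time, loss, seen[n_failures - 1])


def show(observed):
    return " ".join(f"{rat}:{x}{'' if d else '+'}" for rat, (x, d) in observed.items())


print("Type I  (30 weeks)  :", show(type_one(TUMOR_TIME, ACCIDENTAL_LOSS, 30)))
print("Type II (4 failures):", show(type_two(TUMOR_TIME, ACCIDENTAL_LOSS, 4)))
```

Under Type I the stopping time is fixed and the number of failures is random; under Type II the number of failures is fixed and the stopping time is random.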
Type III Censoring (Random/Progressive
Censoring)
Definition. The study period is fixed and subjects enter the
study at different times during the period.
Some subjects may withdraw or be lost to follow-up before
the end of the study.
The censoring time may be different for each subject.
Example 1.7
6 patients with acute leukemia in a clinical study during a total study
period of 1 year. All six respond to treatment and achieve remission.
Ti = remission time of subject i.
Patient B was lost to follow-up after 4 months.
D and F were still in remission at the end
of the study.
So these 3 observations are censored.
Patient A B C D E F
Ti (months) 4 4+ 6 8+ 3 3+
Right censoring
The above three types of censoring are all forms of right censoring.
Definition. The event of interest occurs after a certain time but the
exact failure time is not known by the end of the study.
The data from this censoring scheme can be represented by
pairs of random variables
(X, δ)
δ is a censoring indicator: δ=1 if the event is experienced, δ=0 if
censored
X=min(T, Cr), where Cr is a fixed censoring time and T is a
lifetime.
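The (X, δ) representation is what the + notation in the tables encodes. A minimal sketch that decodes the remission times of Example 1.7 into pairs; the function name `to_pair` is hypothetical.

```python
def to_pair(observation):
    """Decode '8+' into (8.0, 0) and '6' into (6.0, 1): X first, then delta."""
    observation = observation.strip()
    if observation.endswith("+"):
        return float(observation[:-1]), 0  # censored: only T > X is known
    return float(observation), 1           # event observed: T = X


PATIENTS = "A B C D E F".split()
REMISSION_MONTHS = "4 4+ 6 8+ 3 3+".split()  # table from Example 1.7

pairs = {p: to_pair(obs) for p, obs in zip(PATIENTS, REMISSION_MONTHS)}
n_censored = sum(1 for _, delta in pairs.values() if delta == 0)

for patient, (x, delta) in pairs.items():
    print(patient, x, delta)
print("censored observations:", n_censored)
```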
Left censoring
Definition. The event of interest occurred prior to a
certain time Cl, but the exact time of occurrence is
unknown.
The data from the left censoring scheme can be
represented by pairs of random variables
(X, δ)
where δ is a censoring indicator as in right censoring and
X=max(T, Cl)
Examples.
a)
An epidemiologist wishes to know the age at diagnosis in a follow-up
study of diabetic retinopathy.
A 50-year-old participant was found to have already developed
retinopathy, but there is no record of the exact time at which initial
evidence was found.
So the age of 50 is a left-censored observation.
b) Time to first use of marijuana
Data are collected through a survey asking
“When did you first use marijuana?” The answers are:
a. Exact age, _____
b. I never used it,
c. I used it but cannot remember when the first time was.
Which type of censoring does answer c represent?
Interval censoring
Definition. The event of interest is known only to have
occurred between two times a and b.
Example 1. Medical records indicate that at age 45, the
patient in the example above did not have retinopathy.
His age at diagnosis is between 45 and 50 years.
Example 2. Time to cosmetic deterioration of breast cancer
patients. To compare the cosmetic effect of two treatments on early
breast cancer patients:
(i) radiotherapy and
(ii) radiotherapy plus chemotherapy,
patients were observed in intervals.
The event of interest is the first time breast retraction is observed.
Breast Cancer data