STAT 7780: Survival Analysis
3. Nonparametric Estimation

Peng Zeng
Department of Mathematics and Statistics, Auburn University
Fall 2017

Source: webhome.auburn.edu/~carpedm/courses/stat7040/Survival/03-nonpar.pdf


Outline

1 Kaplan-Meier estimator

2 Nelson-Aalen estimator

3 Mean and median survival time

4 Left-truncated and right-censored data

Reference: Chapter 4 in Klein and Moeschberger (2003).


Kaplan-Meier estimator

Example: Leukemia Remission

The data contain the times to leukemia relapse for patients treated with the drug 6-MP. A trailing + marks a right-censored time.

10 7 32+ 23 22 6 16 34+ 32+ 25+ 11+
20+ 19+ 6 17+ 35+ 6 13 9+ 6+ 10+

Problem: find the estimates and standard errors for

survival function S(t)

cumulative hazard function H(t)

mean and median survival time

We also discuss confidence intervals and confidence bands.


Notation

A sample of right-censored survival data {(Ti, δi), i = 1, ..., n}.

Ti: a time on study (either lifetime or censoring time)

δi: whether Ti is an event time (= 1) or a censoring time (= 0)

Assume that the events occur at D distinct times

t1 < t2 < · · · < tD

At time ti, there are di events.

Yi is the number of individuals at risk at time ti (individuals who are alive at ti or experience the event at ti).


Example

For the leukemia example,

10 7 32+ 23 22 6 16 34+ 32+ 25+ 11+
20+ 19+ 6 17+ 35+ 6 13 9+ 6+ 10+

We have

ti    6    7   10   13   16   22   23
di    3    1    1    1    1    1    1
Yi   21   17   15   12   11    7    6
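The table can be rebuilt directly from the raw data. The following is a minimal sketch, not part of the original notes; the names `raw`, `sample` and `risk_table` are illustrative choices. It counts events at each distinct event time and treats an individual as at risk at ti when the observed time is at least ti.

```python
from collections import Counter

# Leukemia remission data from the example; a trailing "+" marks a censored time.
raw = ("10 7 32+ 23 22 6 16 34+ 32+ 25+ 11+ "
       "20+ 19+ 6 17+ 35+ 6 13 9+ 6+ 10+").split()

# Parse into (time, delta) pairs: delta = 1 for an event, 0 for censoring.
sample = [(int(x.rstrip("+")), 0 if x.endswith("+") else 1) for x in raw]


def risk_table(sample):
    """Return a list of (t_i, d_i, Y_i) rows for the distinct event times."""
    events = Counter(t for t, delta in sample if delta == 1)
    table = []
    for t_i in sorted(events):
        # At risk at t_i: everyone whose observed time is at least t_i.
        y_i = sum(1 for t, _ in sample if t >= t_i)
        table.append((t_i, events[t_i], y_i))
    return table


for t_i, d_i, y_i in risk_table(sample):
    print(t_i, d_i, y_i)
```

Note that an observation censored at ti still counts as at risk at ti, which is why Y drops from 21 to 17 between times 6 and 7 (three events and one censored time at 6).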


Kaplan-Meier Estimator: Heuristic

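The survival function S(ti) can be written as a telescoping product:

$$S(t_i) = \frac{S(t_i)}{S(t_{i-1})} \cdot \frac{S(t_{i-1})}{S(t_{i-2})} \cdots \frac{S(t_2)}{S(t_1)} \cdot \frac{S(t_1)}{S(0)} \cdot S(0)$$

If the time is discrete, for any i,

$$\frac{S(t_i)}{S(t_{i-1})} = \frac{P(T > t_i)}{P(T > t_{i-1})} = \frac{P(T > t_i)}{P(T \ge t_i)} = P(T > t_i \mid T \ge t_i)$$

and this conditional probability is estimated by

$$\hat{P}(T > t_i \mid T \ge t_i) = \frac{Y_i - d_i}{Y_i}$$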


Kaplan-Meier Estimator

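The Kaplan-Meier estimator is also called the product-limit estimator.

$$\hat{S}(t) = \begin{cases} 1, & \text{if } t < t_1 \\ \prod_{t_i \le t} (1 - d_i / Y_i), & \text{if } t_1 \le t \end{cases}$$

The estimated variance is

$$\hat{V}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}.$$

The estimated standard error is $\sqrt{\hat{V}[\hat{S}(t)]}$.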


Example

For the leukemia example, we have

ti    6    7   10   13   16   22   23
di    3    1    1    1    1    1    1
Yi   21   17   15   12   11    7    6

Then

$$\hat{S}(10) = \left(1 - \frac{3}{21}\right)\left(1 - \frac{1}{17}\right)\left(1 - \frac{1}{15}\right) = 0.7529$$

and

$$\hat{V}[\hat{S}(10)] = (0.7529)^2 \left( \frac{3}{(21)(18)} + \frac{1}{(17)(16)} + \frac{1}{(15)(14)} \right) = 0.00928$$

and the standard error is $\sqrt{0.00928} = 0.0963$.
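The hand calculation can be checked with a short script. This is a sketch under the assumption that the (ti, di, Yi) rows are already available; `kaplan_meier` is an illustrative name, not a library function.

```python
import math

# (t_i, d_i, Y_i) rows copied from the leukemia table.
table = [(6, 3, 21), (7, 1, 17), (10, 1, 15), (13, 1, 12),
         (16, 1, 11), (22, 1, 7), (23, 1, 6)]


def kaplan_meier(table, t):
    """Return (S_hat(t), estimated variance of S_hat(t)) from (t_i, d_i, Y_i) rows."""
    surv, sigma2 = 1.0, 0.0
    for t_i, d_i, y_i in table:
        if t_i > t:
            break
        surv *= 1.0 - d_i / y_i                 # product-limit factor
        sigma2 += d_i / (y_i * (y_i - d_i))     # running sum in the variance formula
    return surv, surv ** 2 * sigma2


s10, v10 = kaplan_meier(table, 10)
se10 = math.sqrt(v10)
print(s10, v10, se10)
```

For t below the first event time the loop never runs, so the function returns a survival estimate of 1 with variance 0, matching the first branch of the estimator.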


Pointwise Confidence Interval for S(t)

Let

$$\sigma_S^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$$

The most commonly used 100 × (1 − α)% pointwise confidence interval for S(t) at t = t0 is given by

$$\hat{S}(t_0) \pm z_{\alpha/2}\, \sigma_S(t_0)\, \hat{S}(t_0)$$

where $z_\alpha$ is the upper-tail α percentile of a standard normal distribution.

Better confidence intervals can be obtained by first transforming S(t0), for example, using a logarithm function or an arcsine-square root transformation.


Example

For the leukemia example, we already have $\hat{S}(10) = 0.7529$.

$$\sigma_S^2(10) = \frac{3}{(21)(18)} + \frac{1}{(17)(16)} + \frac{1}{(15)(14)} = 0.01637$$

The pointwise 95% confidence interval is

$$0.7529 \pm (1.96)(\sqrt{0.01637})(0.7529) = 0.7529 \pm 0.1888 = (0.5641, 0.9417).$$
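The interval above, and one transformed alternative, can be sketched in code. The linear interval follows the formula in these notes. The log-log form used here, with limits Ŝ^(1/θ) and Ŝ^θ where θ = exp(z σ_S / log Ŝ), is the standard log-log interval from the survival analysis literature and is an addition for illustration, not something derived in the notes. The function name `pointwise_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pointwise_ci(s_hat, sigma2, alpha=0.05):
    """Linear and log-log pointwise intervals for S(t0).

    s_hat  : Kaplan-Meier estimate at t0, assumed to satisfy 0 < s_hat < 1
    sigma2 : sum of d_i / (Y_i (Y_i - d_i)) over event times t_i <= t0
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)      # upper alpha/2 point, about 1.96
    sigma = math.sqrt(sigma2)

    # Linear interval: S_hat +/- z * sigma_S * S_hat (may leave [0, 1]).
    half = z * sigma * s_hat
    linear = (s_hat - half, s_hat + half)

    # Log-log interval: always stays inside (0, 1).
    theta = math.exp(z * sigma / math.log(s_hat))  # log(s_hat) < 0, so theta < 1
    loglog = (s_hat ** (1 / theta), s_hat ** theta)
    return linear, loglog


linear, loglog = pointwise_ci(0.7529, 0.01637)
print(linear)
print(loglog)
```

The linear interval is symmetric about the estimate, while the log-log interval is not and cannot cross 0 or 1.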


Confidence Band

For a given confidence level, we wish to find two random functions, L(t) and U(t), such that S(t) falls within the band for all t in some interval.

$$P[L(t) \le S(t) \le U(t) \text{ for all } t_L \le t \le t_U] = 1 - \alpha$$

We call [L(t), U(t)] a (1 − α)100% confidence band for S(t).

Question: What is the difference between a pointwise confidence interval and a confidence band?


Confidence Band: EP Bands

Equal probability (EP) bands are proportional to pointwise confidence intervals.

$$\hat{S}(t) \pm c_\alpha(a_L, a_U)\, \sigma_S(t)\, \hat{S}(t)$$

where $c_\alpha(a_L, a_U)$ is obtained using Table C.3 in the textbook,

$$a_L = \frac{n \sigma_S^2(t_L)}{1 + n \sigma_S^2(t_L)}, \qquad a_U = \frac{n \sigma_S^2(t_U)}{1 + n \sigma_S^2(t_U)},$$

where n is the sample size, tL < tU, and tL and tU are selected such that

tL ≥ the smallest observed event time

tU ≤ the largest observed event time


Example

For the leukemia example, we have already calculated

$$\hat{S}(10) = 0.7529, \qquad \sigma_S^2(10) = 0.01637.$$

Note that n = 21. Select tL = 6 and tU = 23. We have

$$\sigma_S^2(6) = 0.007937, \qquad \sigma_S^2(23) = 0.090184.$$

Therefore,

$$a_L = \frac{(21)(0.007937)}{1 + (21)(0.007937)} = 0.143, \qquad a_U = \frac{(21)(0.090184)}{1 + (21)(0.090184)} = 0.654$$

and $c_{0.01}(0.14, 0.66) = 3.4284$. The 99% confidence band at t = 10 is

$$0.7529 \pm (3.4284)(\sqrt{0.01637})(0.7529) = (0.4226, 1.0832)$$
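The arithmetic of this example can be reproduced as follows. This is a sketch only: the critical value 3.4284 is taken from the example as a tabulated constant and is not computed, and the helper names (`sigma2_S`, `km`, `a_value`, `ep_band`) are illustrative.

```python
import math

# (t_i, d_i, Y_i) rows from the leukemia table; n is the sample size.
table = [(6, 3, 21), (7, 1, 17), (10, 1, 15), (13, 1, 12),
         (16, 1, 11), (22, 1, 7), (23, 1, 6)]
n = 21


def sigma2_S(table, t):
    """sigma_S^2(t): sum of d_i / (Y_i (Y_i - d_i)) over event times t_i <= t."""
    return sum(d / (y * (y - d)) for t_i, d, y in table if t_i <= t)


def km(table, t):
    """Kaplan-Meier estimate at t."""
    return math.prod(1 - d / y for t_i, d, y in table if t_i <= t)


def a_value(n, sigma2):
    """a = n * sigma^2 / (1 + n * sigma^2), used to look up the critical value."""
    return n * sigma2 / (1 + n * sigma2)


a_L = a_value(n, sigma2_S(table, 6))    # t_L = 6
a_U = a_value(n, sigma2_S(table, 23))   # t_U = 23

c = 3.4284  # tabulated value quoted in the example; not computed here


def ep_band(table, t, c):
    """EP band limits at time t: S_hat(t) +/- c * sigma_S(t) * S_hat(t)."""
    s_hat = km(table, t)
    half = c * math.sqrt(sigma2_S(table, t)) * s_hat
    return s_hat - half, s_hat + half


lower, upper = ep_band(table, 10, c)
print(a_L, a_U, lower, upper)
```

As in the example, the upper limit at t = 10 exceeds 1 because the linear band is not constrained to [0, 1]; a caller may wish to clip it.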


SAS Code

ods graphics on;
proc lifetest data = your-SAS-data more-options;
   time T * delta(level-for-censoring);
run;

With ods graphics on;, SAS draws a graph of the estimated S(t). Useful options:

plots = S(CL CB = EP)    plots the pointwise confidence interval and the confidence band for S(t)

outsurv =    creates a data set for the estimated S(t) and more

conftype = linear    type of confidence interval; the default is loglog

confband = ep    type of confidence band

stderr    outputs the standard error of S(t) with outsurv =

alpha =    specifies α, giving 100(1 − α)% confidence limits


Example: Bone Marrow Transplant

The data contain the disease-free (no recurrence of leukemia) survival time for 137 patients who received a bone marrow transplant. There are three groups of patients:

ALL (group = 1)

AML low risk (group = 2)

AML high risk (group = 3)

Use the strata statement to compare the three groups.

There are two indicators:

δ1 = 1 for dead and = 0 for alive

δ2 = 1 for relapsed and = 0 for disease-free

We define an indicator δ = max(δ1, δ2) because we are considering disease-free survival time: the event occurs if the patient either dies or relapses.


Nelson-Aalen estimator

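Nelson-Aalen Estimator

The Nelson-Aalen estimator is

$$\hat{H}(t) = \begin{cases} 0, & \text{if } t < t_1 \\ \sum_{t_i \le t} d_i / Y_i, & \text{if } t_1 \le t. \end{cases}$$

The estimated variance is

$$\sigma_H^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i^2}.$$

For the leukemia example,

$$\hat{H}(10) = \frac{3}{21} + \frac{1}{17} + \frac{1}{15} = 0.2683$$

$$\sigma_H^2(10) = \frac{3}{21^2} + \frac{1}{17^2} + \frac{1}{15^2} = 0.0147$$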


Pointwise Confidence Interval

The 100(1 − α)% pointwise confidence interval for the cumulative hazard function is

$$\hat{H}(t_0) \pm z_{\alpha/2}\, \sigma_H(t_0)$$

For example, in the leukemia example, the 95% pointwise confidence interval is

$$\hat{H}(10) \pm z_{0.025}\, \sigma_H(10) = 0.2683 \pm (1.96)\sqrt{0.0147} = (0.0307, 0.5059).$$

Similar to the survival function, the pointwise confidence interval of H(t) can also be obtained using a transformation.
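The Nelson-Aalen estimate, its variance and the linear interval can be sketched in a few lines. The function name `nelson_aalen` and the variable names are illustrative, and only the untransformed interval from these notes is shown.

```python
import math
from statistics import NormalDist

# (t_i, d_i, Y_i) rows from the leukemia table.
table = [(6, 3, 21), (7, 1, 17), (10, 1, 15), (13, 1, 12),
         (16, 1, 11), (22, 1, 7), (23, 1, 6)]


def nelson_aalen(table, t):
    """Return (H_hat(t), sigma_H^2(t)) from (t_i, d_i, Y_i) rows."""
    h_hat = sum(d / y for t_i, d, y in table if t_i <= t)
    sigma2 = sum(d / y ** 2 for t_i, d, y in table if t_i <= t)
    return h_hat, sigma2


h10, var10 = nelson_aalen(table, 10)

# Linear 95% interval: H_hat +/- z * sigma_H (standard deviation, not variance).
z = NormalDist().inv_cdf(0.975)
ci = (h10 - z * math.sqrt(var10), h10 + z * math.sqrt(var10))
print(h10, var10, ci)
```

The interval uses the standard deviation σ_H(t0), the square root of the variance estimate, which is what the numerical example computes with √0.0147.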


SAS Code

By default, SAS only outputs the Kaplan-Meier estimator. With the option nelson, SAS also outputs the Nelson-Aalen estimator.

Use the statement ods output ProductLimitEstimates = mySASfile; to output the results of the Kaplan-Meier and Nelson-Aalen estimators.

Draw the cumulative hazard function using the step statement in proc sgplot.


Mean and median survival time

Mean Survival Time

The estimated mean survival time is

$$\hat{\mu}_\tau = \int_0^\tau \hat{S}(t)\, dt,$$

where τ is either the longest observed time or preassigned by the investigator.

Because $\hat{S}(t)$ is a step function, SAS estimates $\mu_\tau$ using

$$\hat{\mu} = \sum_{i=1}^{D} \hat{S}(t_{i-1}) (t_i - t_{i-1}),$$

where tD is the largest event time and t0 = 0.


Example

For the leukemia example,

ti        6       7      10      13      16      22      23
di        3       1       1       1       1       1       1
Yi       21      17      15      12      11       7       6
S(ti)  0.8571  0.8067  0.7529  0.6902  0.6275  0.5378  0.4482

The estimated mean is

$$(0.5378)(23 - 22) + (0.6275)(22 - 16) + \cdots + (1.0000)(6 - 0) = 17.9093$$
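The sum of rectangle areas can be sketched as a single pass over the table. The function name `restricted_mean` is illustrative, and the upper limit is taken to be the largest event time, as in the formula attributed to SAS in these notes.

```python
# (t_i, d_i, Y_i) rows from the leukemia table.
table = [(6, 3, 21), (7, 1, 17), (10, 1, 15), (13, 1, 12),
         (16, 1, 11), (22, 1, 7), (23, 1, 6)]


def restricted_mean(table):
    """Area under the Kaplan-Meier step function from 0 to the largest event time."""
    mean, prev_t, prev_s = 0.0, 0.0, 1.0
    for t_i, d_i, y_i in table:
        mean += prev_s * (t_i - prev_t)   # rectangle of height S_hat(t_{i-1})
        prev_s *= 1 - d_i / y_i           # step down at t_i
        prev_t = t_i
    return mean


mu_hat = restricted_mean(table)
print(mu_hat)
```

The result agrees with the hand calculation up to rounding; the slide works from survival estimates rounded to four decimals, so the last digit can differ slightly.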


Variance and Confidence Interval

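The variance of this estimator is

$$\hat{V}(\hat{\mu}_\tau) = \sum_{i=1}^{D} \left[ \int_{t_i}^{\tau} \hat{S}(t)\, dt \right]^2 \frac{d_i}{Y_i (Y_i - d_i)}.$$

A 100(1 − α)% confidence interval for the mean is

$$\hat{\mu}_\tau \pm z_{\alpha/2} \sqrt{\hat{V}(\hat{\mu}_\tau)}$$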
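The variance formula needs the area under the estimated survival curve to the right of each event time. A minimal sketch follows, assuming τ is the largest event time (so the last area is zero); `mean_and_variance` is an illustrative name and the numbers it produces are not taken from the notes.

```python
import math
from statistics import NormalDist

# (t_i, d_i, Y_i) rows from the leukemia table.
table = [(6, 3, 21), (7, 1, 17), (10, 1, 15), (13, 1, 12),
         (16, 1, 11), (22, 1, 7), (23, 1, 6)]


def mean_and_variance(table):
    """Restricted mean (tau = largest event time) and its estimated variance."""
    times = [t for t, _, _ in table]
    surv, s = [], 1.0
    for _, d, y in table:
        s *= 1 - d / y
        surv.append(s)                     # S_hat just after each event time

    # areas[i] = integral of S_hat from t_i to tau, built from right to left.
    areas = [0.0] * len(times)
    for i in range(len(times) - 2, -1, -1):
        areas[i] = areas[i + 1] + surv[i] * (times[i + 1] - times[i])

    mean = times[0] * 1.0 + areas[0]       # S_hat = 1 before the first event
    var = sum(a ** 2 * d / (y * (y - d))
              for a, (_, d, y) in zip(areas, table) if a > 0)
    return mean, var


mu_hat, var_mu = mean_and_variance(table)
z = NormalDist().inv_cdf(0.975)
ci = (mu_hat - z * math.sqrt(var_mu), mu_hat + z * math.sqrt(var_mu))
print(mu_hat, var_mu, ci)
```

Terms with zero area are skipped, which also avoids a division by zero when the last event time has Yi = di.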


Median Survival Time

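Recall that the pth quantile is

$$x_p = \inf\{t : S(t) \le 1 - p\}$$

In practice, find the smallest time $\hat{x}_p$ for which $\hat{S}(t)$ is less than or equal to 1 − p. The median corresponds to p = 0.5.

A 100(1 − α)% confidence interval for $x_p$ is the set of all time points t satisfying

$$-z_{\alpha/2} \le \frac{\hat{S}(t) - (1 - p)}{\hat{V}^{1/2}[\hat{S}(t)]} \le z_{\alpha/2}$$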
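The point estimate of a quantile can be read off the Kaplan-Meier steps. The sketch below uses the leukemia table from the earlier examples; `km_quantile` is an illustrative name, and the function returns None when the estimated curve never drops to 1 − p.

```python
# (t_i, d_i, Y_i) rows from the leukemia table.
table = [(6, 3, 21), (7, 1, 17), (10, 1, 15), (13, 1, 12),
         (16, 1, 11), (22, 1, 7), (23, 1, 6)]


def km_quantile(table, p=0.5):
    """Smallest event time t_i with S_hat(t_i) <= 1 - p, or None if there is none."""
    s = 1.0
    for t_i, d_i, y_i in table:
        s *= 1 - d_i / y_i
        if s <= 1 - p:
            return t_i
    return None


median = km_quantile(table, 0.5)
first_quartile = km_quantile(table, 0.25)
print(median, first_quartile)
```

For these data the estimated curve first falls to 0.5 or below at the last event time, since the estimate there is 0.4482. Quantiles with 1 − p below 0.4482 cannot be estimated from this sample.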


Left-truncated and right-censored data

Example: Channing House

Channing House is a retirement center located in Palo Alto, CA. The data contain information on 462 individuals (97 males and 365 females) who were in residence between 1964 and 1975.

age (in months): age at which the individual died or left the center

age entry (in months): age at which the individual entered the center

death: 1 = death and 0 = left

time: difference between age and age entry

gender: 1 = male and 2 = female


Notations

Compared with right-censored data, we associate with the jth individual a random age Lj at which he/she enters the study and a time Tj at which he/she either dies or is censored.

Let t1 < t2 < · · · < tD be the distinct death times.

Let di be the number of individuals who experience the event of interest at time ti.

Let Yi be the number of individuals who are at risk of experiencing the event of interest at time ti.

That is, Yi is the number of individuals who entered the study prior to time ti and who have a study time of at least ti: the number of individuals with Lj ≤ ti ≤ Tj.


Calculation Formulas

The previous formulas for right-censored data also apply to left-truncated and right-censored data. (Note: the Yi are defined differently.)

Pay attention to the interpretation. For example, the Kaplan-Meier estimator now estimates the probability of survival beyond t, conditional on survival to the smallest of the entry times L, or

$$P(X > t \mid X \ge L) = \frac{S(t)}{S(L)}.$$

Similar explanations apply to other quantities.


Caution: Number At Risk

It is possible that Yi is small for some time ti. If, for some ti, Yi and di are equal, then the KM estimator will be zero for all t beyond this point. In this case, estimate S(t) conditional on survival to a time where this will not happen.

[Figure: number at risk versus age in months]


Example

For males (gender = 1), the first individual entered the center at 751 months and died at 777 months. The second individual entered at 759 months and died at 781 months.

Hence

S(t) = 1 for t < 777

S(t) = 0.5 for 777 ≤ t < 781

S(t) = 0 for t ≥ 781.

We may estimate the conditional probability of survival beyond age t, given survival to age a. We estimate Sa(t) = P(X > t | X ≥ a) by considering only those deaths that occur after age a, that is

$$\hat{S}_a(t) = \prod_{a \le t_i \le t} \left( 1 - \frac{d_i}{Y_i} \right), \qquad t \ge a$$
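The effect of conditioning on age a can be illustrated with a small script. This is a sketch with mostly made-up data: only the first two records are the two men described in the example, the other four records are hypothetical, and `conditional_km` is an illustrative name. The risk set uses the definition from these notes, Lj ≤ ti ≤ Tj.

```python
# (entry age L_j, exit age T_j, death indicator), ages in months.
records = [
    (751, 777, 1),   # first man from the example
    (759, 781, 1),   # second man from the example
    (782, 909, 1),   # hypothetical
    (790, 850, 0),   # hypothetical
    (800, 900, 1),   # hypothetical
    (805, 960, 0),   # hypothetical
]


def conditional_km(records, a, t):
    """Estimate S_a(t) = P(X > t | X >= a) using only deaths with a <= t_i <= t."""
    death_times = sorted({x for _, x, dead in records if dead and a <= x <= t})
    surv = 1.0
    for t_i in death_times:
        # At risk at t_i: entered by t_i and still under observation at t_i.
        y_i = sum(1 for entry, x, _ in records if entry <= t_i <= x)
        d_i = sum(1 for _, x, dead in records if dead and x == t_i)
        surv *= 1 - d_i / y_i
    return surv


print(conditional_km(records, 0, 777))    # unconditional estimate at age 777
print(conditional_km(records, 0, 1000))   # stuck at zero after age 781
print(conditional_km(records, 782, 909))  # conditional on survival to age 782
```

Starting the product at a = 782 skips the two early deaths that occur when only one or two individuals are at risk, so the estimate is no longer forced to zero.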


SAS Code

proc lifetest does not support left truncation. We can use proc phreg instead.

proc phreg data = SAS-data-set;
   model survtime * delta(0) = / entry = left-truncation-variable;
   by category-variable;
   output out = output-data-set survival = S;
run;

You must sort the data set by category-variable.

Use entry = to specify a variable for time of entry.

SAS outputs a data set containing the estimated (conditional) survival probability.

Use the step statement in proc sgplot to draw the fitted survival function.
