43
Satistics 262 1 Statistics 262: Intermediate Biostatistics Jonathan Taylor and Kristin Cobb April 20, 2004: Introduction to Survival Analysis

Statistics 262: Intermediate Biostatistics

Embed Size (px)

DESCRIPTION

Statistics 262: Intermediate Biostatistics. April 20, 2004: Introduction to Survival Analysis. Jonathan Taylor and Kristin Cobb. What is survival analysis?. Statistical methods for analyzing longitudinal data on the occurrence of events. - PowerPoint PPT Presentation

Citation preview

Page 1: Statistics 262: Intermediate Biostatistics

Satistics 262 1

Statistics 262: Intermediate Biostatistics

Jonathan Taylor and Kristin Cobb

April 20, 2004: Introduction to Survival Analysis

Page 2: Statistics 262: Intermediate Biostatistics

Satistics 262 2

What is survival analysis? Statistical methods for analyzing

longitudinal data on the occurrence of events.

Events may include death, injury, onset of illness, recovery from illness (binary variables) or transition above or below the clinical threshold of a meaningful continuous variable (e.g. CD4 counts).

Accommodates data from randomized clinical trial or cohort study design.

Page 3: Statistics 262: Intermediate Biostatistics

Randomized Clinical Trial (RCT)

Target population

Intervention

Control

Disease

Disease-free

Disease

Disease-free

TIME

Random assignment

Disease-free, at-risk cohort

Page 4: Statistics 262: Intermediate Biostatistics

Target population

Treatment

Control

Cured

Not cured

Cured

Not cured

TIME

Random assignment

Patient population

Randomized Clinical Trial (RCT)

Page 5: Statistics 262: Intermediate Biostatistics

Target population

Treatment

Control

Dead

Alive

Dead

Alive

TIME

Random assignment

Patient population

Randomized Clinical Trial (RCT)

Page 6: Statistics 262: Intermediate Biostatistics

Cohort study (prospective/retrospective)

Target population

Exposed

Unexposed

Disease

Disease-free

Disease

Disease-free

TIME

Disease-free cohort

Page 7: Statistics 262: Intermediate Biostatistics

Satistics 262 7

Estimate time-to-event for a group of individuals, such as time until second heart-attack for a group of MI patients.

To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial.

To assess the relationship of co-variables to time-to-event, such as: does weight, insulin resistance, or cholesterol influence survival time of MI patients?

Note: expected time-to-event = 1/incidence rate

Objectives of survival analysis

Page 8: Statistics 262: Intermediate Biostatistics

Satistics 262 8

Examples of survival analysis in medicine

Page 9: Statistics 262: Intermediate Biostatistics

RCT: Women’s Health Initiative (JAMA, 2001)

On hormones

On placeboCumulative incidence

Page 10: Statistics 262: Intermediate Biostatistics

Satistics 262 10

Prospective cohort study:

From April 15, 2004 NEJM:

Use of Gene-Expression Profiling to Identify Prognostic Subclasses in Adult Acute Myeloid Leukemia

Page 11: Statistics 262: Intermediate Biostatistics

Satistics 262 11

Retrospective cohort study:From December 2003 BMJ: Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study

Page 12: Statistics 262: Intermediate Biostatistics

Satistics 262 12

Why use survival analysis?

1. Why not compare mean time-to-event between your groups using a t-test or linear regression?

-- ignores censoring 2. Why not compare proportion of

events in your groups using logistic regression?

--ignores time

Page 13: Statistics 262: Intermediate Biostatistics

Satistics 262 13

Cox regression vs.logistic regression

Distinction between rate and proportion:

Incidence (hazard) rate: number of new cases of disease per population at-risk per unit time (or mortality rate, if outcome is death)

Cumulative incidence: proportion of new cases that develop in a given time period

Page 14: Statistics 262: Intermediate Biostatistics

Satistics 262 14

Cox regression vs.logistic regression

Distinction between hazard/rate ratio and odds ratio/risk ratio:

Hazard/rate ratio: ratio of incidence rates

Odds/risk ratio: ratio of proportions

By taking into account time, you are taking into account more information than just binary yes/no.

Gain power/precision.

Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio

Page 15: Statistics 262: Intermediate Biostatistics

Satistics 262 15

Rates vs. risks

Relationship between risk and rates:

htetR 1)(

t

h

in time disease ofy probabilitR(t)

rate hazardconstant

Page 16: Statistics 262: Intermediate Biostatistics

Satistics 262 16

Rates vs. risks

For example, if rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:

0488.951.1)(

1)(

1)(05.

)10)(005(.

tR

etR

etRCompare to .005(10) = 5% The loss of persons

at risk because they have developed disease within the period of observation is small relative to the size of the total group.

Page 17: Statistics 262: Intermediate Biostatistics

Satistics 262 17

Rates vs. risks

If rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:

39.61.1)(

1)(

1)(5.

)10)(05(.

tR

etR

etRCompare to .05(10) = 50%

Page 18: Statistics 262: Intermediate Biostatistics

Satistics 262 18

Rates vs. risksRelationship between risk and rates (derivation):

hthtthut

hu eeeeduhetR 1)( 0

00

Exponential density function for waiting time until the event (constant hazard rate)

hthetr )(

Preview: Waiting time distribution will change if the hazard rate changes as a function of time: h(t)

Page 19: Statistics 262: Intermediate Biostatistics

Survival Analysis: Terms Time-to-event: The time from entry into a

study until a subject has a particular outcome

Censoring: Subjects are said to be censored if they are lost to follow up or drop out of the study, or if the study ends before ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study. If dropout is related to both outcome and

treatment, dropouts may bias the results

Page 20: Statistics 262: Intermediate Biostatistics

Satistics 262 20

Right Censoring (T>t)

Common examples Termination of the study Death due to a cause that is not the

event of interest Loss to follow-up

We know that subject survived at least to time t.

Page 21: Statistics 262: Intermediate Biostatistics

Satistics 262 21

Left censoring (T<t) The origin time, not the event time, is

known only to be less than some value. For example, if you are studying

menarche and you begin following girls at age 12, you may find that some of them have already begun menstruating. Unless you can obtain information about the start date for those girls, the age of menarche is left-censored at age 12.

*from:Allison, Paul. Survival Analysis. SAS Institute. 1995.

Page 22: Statistics 262: Intermediate Biostatistics

Satistics 262 22

Interval censoring (a<T<b) When we know the event has

occurred between two time points, but don’t know the exact dates.

For example, if you’re screening subjects for HIV infection yearly, you may not be able to determine the exact date of infection.*

*from:Allison, Paul. Survival Analysis. SAS Institute. 1995.

Page 23: Statistics 262: Intermediate Biostatistics

Satistics 262 23

Data Structure: survival analysis Time variable: ti = time at last

disease-free observation or time at event

Censoring variable: ci =1 if had the event; ci =0 no event by time ti

Page 24: Statistics 262: Intermediate Biostatistics

Satistics 262 24

Choice of origin

Page 25: Statistics 262: Intermediate Biostatistics

Satistics 262 25

Page 26: Statistics 262: Intermediate Biostatistics

Satistics 262 26

Describing survival distributions Ti the event time for an individual,

is a random variable having a probability distribution.

Different models for survival data are distinguished by different choice of distribution for Ti.

Page 27: Statistics 262: Intermediate Biostatistics

Satistics 262 27

Survivor function (cumulative distribution function)

)()( tTPtF

)(1)(1)( tFtTPtS

Survival analysis typically uses complement, or the survivor function:

Example: If t=100 years, S(t=100) = probability of surviving beyond 100 years.

Cumulative failure function

Page 28: Statistics 262: Intermediate Biostatistics

Satistics 262 28

Corresponding density function

dt

tdS

dt

tdFtf

)()()(

The probability of the failure time occurring at exactly time t (out of the whole range of possible t’s).

t

ttTtPtf

t

)(lim)(

0

Also written:

Page 29: Statistics 262: Intermediate Biostatistics

Satistics 262 29

Hazard function

t

tTttTtPth

t

)/(lim)(

0

In words: the probability that if you survive to t, you will succumb to the event in the next instant.

)(

)((t) :survival anddensity from Hazard

tS

tfh

)(

)(

)(

)(

)(

)&()/()(

tS

dttf

tTP

dttTtP

tTP

tTdttTtPtTdttTtPdtth

Derivation:

Page 30: Statistics 262: Intermediate Biostatistics

Satistics 262 30

Relating these functions:

dt

tdSt

)()(f :survival fromDensity

)(

)((t) :survival anddensity from Hazard

tS

tfh

t

duuh

e 0

))((

S(t) :hazard from Survival

)(lndt

d-(t) :survival from Hazard tSh

t

duuh

etht 0

))((

)()(f :hazard fromDensity

t

duuf )(S(t) :density from Survival

Page 31: Statistics 262: Intermediate Biostatistics

Satistics 262 31

Introduction to Kaplan-Meier Non-parametric estimate of

survivor function. Commonly used to describe

survivorship of study population/s. Commonly used to compare two

study populations. Intuitive graphical presentation.

Page 32: Statistics 262: Intermediate Biostatistics

Beginning of study End of study Time in months

Subject B

Subject A

Subject C

Subject D

Subject E

Survival Data (right-censored)

1. subject E dies at 4 months

X

Page 33: Statistics 262: Intermediate Biostatistics

100%

Time in months

Corresponding Kaplan-Meier Curve

Probability of surviving to just before 4 months is 100% = 5/5

Fraction surviving this death = 4/5

Subject E dies at 4 months

Page 34: Statistics 262: Intermediate Biostatistics

Beginning of study End of study Time in months

Subject B

Subject A

Subject C

Subject D

Subject E

Survival Data 2. subject A drops out after 6 months

1. subject E dies at 4 months

X

3. subject C dies at 7 monthsX

Page 35: Statistics 262: Intermediate Biostatistics

100%

Time in months

Corresponding Kaplan-Meier Curve

subject C dies at 7 months

Fraction surviving this death = 2/3

Page 36: Statistics 262: Intermediate Biostatistics

Beginning of study End of study Time in months

Subject B

Subject A

Subject C

Subject D

Subject E

Survival Data 2. subject A drops out after 6 months

4. Subjects B and D survive for the whole year-long study period

1. subject E dies at 4 months

X

3. subject C dies at 7 monthsX

Page 37: Statistics 262: Intermediate Biostatistics

100%

Time in months

Corresponding Kaplan-Meier Curve

Product limit estimate of survival = P(surviving/at-risk through failure 1) * P(surviving/at-risk through failure 2) =4/5 * 2/3= .5333

Page 38: Statistics 262: Intermediate Biostatistics

The product limit estimate The probability of surviving in the entire

year, taking into account censoring = (4/5) (2/3) = 53%

NOTE: 40% (2/5) because the one drop-out survived at least a portion of the year.

AND <60% (3/5) because we don’t know if the one drop-out would have survived until the end of the year.

Page 39: Statistics 262: Intermediate Biostatistics

Satistics 262 39

KM estimator, formally

]1[)ˆ(

at timeevent thehave number who theis

risk-at sindividual are there, each timeat

sevent timedistinct k

:

1

ttj j

j

jj

jj

kj

jn

dtS

td

nt

t...t t

Page 40: Statistics 262: Intermediate Biostatistics

Comparing 2 groups

Page 41: Statistics 262: Intermediate Biostatistics

Caveats Survival estimates can be

unreliable toward the end of a study when there are small numbers of subjects at risk of having an event.

Page 42: Statistics 262: Intermediate Biostatistics

WHI and breast cancer

Small numbers

left

Page 43: Statistics 262: Intermediate Biostatistics

Satistics 262 43

Overview of SAS PROCS LIFETEST - Produces life tables and Kaplan-

Meier survival curves. Is primarily for univariate analysis of the timing of events.

LIFEREG – Estimates regression models with censored, continuous-time data under several alternative distributional assumptions. Does not allow for time-dependent covariates.

PHREG– Uses Cox’s partial likelihood method to estimate regression models with censored data. Handles both continuous-time and discrete-time data and allows for time-dependent covariables