Introduction to Survival Analysis August 3 and 5, 2004


Page 1: Introduction to Survival Analysis August 3 and 5, 2004

Introduction to Survival Analysis

August 3 and 5, 2004

Page 2: Introduction to Survival Analysis August 3 and 5, 2004

Overview

• What is survival analysis?
• Introduction to Kaplan-Meier methods.
• Introduction to Cox proportional hazards methods (Thursday)
• Recommended reading in Walker: Chapters 21-22

Page 3: Introduction to Survival Analysis August 3 and 5, 2004

What is survival analysis?

Statistical methods for analyzing longitudinal data on the occurrence of events.

Events may include death, injury, onset of illness, recovery from illness (binary variables) or transition above or below the clinical threshold of a meaningful continuous variable (e.g. CD4 counts).

Accommodates data from randomized clinical trial or cohort study design.

Page 4: Introduction to Survival Analysis August 3 and 5, 2004

Randomized Clinical Trial (RCT)

[Diagram: Target population → Disease-free, at-risk cohort → Random assignment → Intervention or Control; each arm is followed over TIME to Disease or Disease-free]

Page 5: Introduction to Survival Analysis August 3 and 5, 2004

Randomized Clinical Trial (RCT)

[Diagram: Target population → Patient population → Random assignment → Treatment or Control; each arm is followed over TIME to Cured or Not cured]

Page 6: Introduction to Survival Analysis August 3 and 5, 2004

Randomized Clinical Trial (RCT)

[Diagram: Target population → Patient population → Random assignment → Treatment or Control; each arm is followed over TIME to Dead or Alive]

Page 7: Introduction to Survival Analysis August 3 and 5, 2004

Cohort study (prospective/retrospective)

[Diagram: Target population → Disease-free cohort → Exposed or Unexposed; each group is followed over TIME to Disease or Disease-free]

Page 8: Introduction to Survival Analysis August 3 and 5, 2004

Examples of survival analysis in medicine

Page 9: Introduction to Survival Analysis August 3 and 5, 2004

RCT: Women’s Health Initiative (JAMA, 2001)

[Figure: cumulative incidence over time, on hormones vs. on placebo]

Page 10: Introduction to Survival Analysis August 3 and 5, 2004

Retrospective cohort study: From December 2003 BMJ:

Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study

Page 11: Introduction to Survival Analysis August 3 and 5, 2004

Objectives of survival analysis

– To estimate time-to-event for a group of individuals, such as time until second heart attack for a group of MI patients.

– To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial.

– To assess the relationship of covariates to time-to-event, for example: do weight, insulin resistance, or cholesterol influence survival time of MI patients?

Note: expected time-to-event = 1/incidence rate (when the rate is constant over time)

Page 12: Introduction to Survival Analysis August 3 and 5, 2004

Survival Analysis: Terms

Time-to-event: The time from entry into a study until a subject has a particular outcome.

Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.

– If dropout is related to both outcome and treatment, dropouts may bias the results.

Page 13: Introduction to Survival Analysis August 3 and 5, 2004

Why use survival analysis?

1. Why not compare mean time-to-event between your groups using a t-test or linear regression?

-- ignores censoring

2. Why not compare proportion of events in your groups using odds ratios or logistic regression?

-- ignores time
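A small numerical sketch of both problems, using made-up event times (in a real study the true times of censored subjects are unknown; they are written out here only to show the size of the distortion). The variable names and the 12-month study length are assumptions for illustration.

```python
# Hypothetical true event times in months (normally unknown for censored subjects)
true_event_times = [4, 7, 15, 20, 30]
study_length = 12  # follow-up stops here, so later events are censored

# What the study actually records: time on study and whether the event was seen
observed_times = [min(t, study_length) for t in true_event_times]
had_event = [int(t <= study_length) for t in true_event_times]

# Treating censored times as event times pulls the mean downward
naive_mean = sum(observed_times) / len(observed_times)
true_mean = sum(true_event_times) / len(true_event_times)

# A simple proportion says nothing about when the events happened
proportion_with_event = sum(had_event) / len(had_event)

print(f"naive mean of observed times: {naive_mean:.1f} months")
print(f"mean of true event times:     {true_mean:.1f} months")
print(f"proportion with event:        {proportion_with_event:.2f}")
```

The naive mean can never exceed the true mean, because each observed time is the smaller of the event time and the censoring time.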

Page 14: Introduction to Survival Analysis August 3 and 5, 2004

Data Structure: survival analysis

Time variable: t_i = time at last disease-free observation or time at event

Censoring variable: c_i = 1 if had the event; c_i = 0 if no event by time t_i
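As a minimal sketch, this two-variable layout could be held in code as one record per subject. The `Subject` class, its field names, and the five records are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subject:
    label: str
    time: float  # t_i: months to the event or to the last disease-free observation
    event: int   # c_i: 1 if the event occurred at t_i, 0 if censored

# Hypothetical five-subject cohort (illustrative values only)
cohort = [
    Subject("A", 6.0, 0),   # dropped out at 6 months
    Subject("B", 12.0, 0),  # event-free at the end of a 12-month study
    Subject("C", 7.0, 1),   # event at 7 months
    Subject("D", 12.0, 0),
    Subject("E", 4.0, 1),   # event at 4 months
]

n_events = sum(s.event for s in cohort)
n_censored = len(cohort) - n_events
print(n_events, n_censored)
```

Every subject contributes a time, whether or not the event was observed; the indicator records which kind of time it is.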

Page 15: Introduction to Survival Analysis August 3 and 5, 2004

Choice of time of origin. Note varying start times.

Page 16: Introduction to Survival Analysis August 3 and 5, 2004

Count every subject’s time since their baseline data collection.

Page 17: Introduction to Survival Analysis August 3 and 5, 2004

Survival function

One goal of survival analysis is to estimate and compare survival experiences of different groups.

Survival experience is described by the survival function:

$$S(t) = P(T > t)$$

Gives the probability of surviving past a certain time.

For example, the probability of surviving beyond 10 years, 50 years, or 100 years.

Page 18: Introduction to Survival Analysis August 3 and 5, 2004

Introduction to Kaplan-Meier

Non-parametric estimate of the survival function.

Commonly used to describe survivorship of study population(s).

Commonly used to compare two study populations.

Intuitive graphical presentation.

Page 19: Introduction to Survival Analysis August 3 and 5, 2004

Survival Data (right-censored)

[Figure: follow-up timelines for Subjects A to E between the beginning and the end of the study; x-axis: time in months; X marks a death]

1. Subject E dies at 4 months

Page 20: Introduction to Survival Analysis August 3 and 5, 2004

Corresponding Kaplan-Meier Curve

[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months]

Subject E dies at 4 months.

Probability of surviving to 4 months is 100% = 5/5

Fraction surviving this death = 4/5

Page 21: Introduction to Survival Analysis August 3 and 5, 2004

Survival Data

[Figure: follow-up timelines for Subjects A to E between the beginning and the end of the study; x-axis: time in months; X marks a death]

1. Subject E dies at 4 months

2. Subject A drops out after 6 months

3. Subject C dies at 7 months

Page 22: Introduction to Survival Analysis August 3 and 5, 2004

Corresponding Kaplan-Meier Curve

[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months]

Subject C dies at 7 months.

Fraction surviving this death = 2/3

Page 23: Introduction to Survival Analysis August 3 and 5, 2004

Survival Data

[Figure: follow-up timelines for Subjects A to E between the beginning and the end of the study; x-axis: time in months; X marks a death]

1. Subject E dies at 4 months

2. Subject A drops out after 6 months

3. Subject C dies at 7 months

4. Subjects B and D survive for the whole year-long study period

Page 24: Introduction to Survival Analysis August 3 and 5, 2004

Corresponding Kaplan-Meier Curve

[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months]

Product limit estimate of survival = P(surviving event 1 | at risk up to failure 1) × P(surviving event 2 | at risk up to failure 2) = 4/5 × 2/3 = 0.5333

Page 25: Introduction to Survival Analysis August 3 and 5, 2004

The product limit estimate

The probability of surviving the entire year, taking into account censoring

= (4/5) (2/3) = 53%

NOTE: >40% (2/5) because the one drop-out survived at least a portion of the year.

AND <60% (3/5) because we don’t know if the one drop-out would have survived until the end of the year.
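The product-limit calculation for the five-subject example can be sketched in a few lines. This is an illustrative implementation, not code from the lecture: the function name `kaplan_meier` is hypothetical, and censoring subjects B and D at 12 months is an assumption based on the year-long study period.

```python
from fractions import Fraction

def kaplan_meier(times, events):
    """Return [(event_time, S(t))] using the product-limit estimate."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Group all subjects whose follow-up ends at time t
        j = i
        deaths = 0
        while j < len(data) and data[j][0] == t:
            deaths += data[j][1]
            j += 1
        if deaths:
            # Multiply by the fraction of the at-risk group surviving this time
            surv *= Fraction(n_at_risk - deaths, n_at_risk)
            curve.append((t, surv))
        # Deaths and censored subjects both leave the risk set after time t
        n_at_risk -= (j - i)
        i = j
    return curve

# Subjects A, B, C, D, E: months of follow-up and event indicator (1 = death)
times = [6, 12, 7, 12, 4]
events = [0, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"after month {t}: S = {s} = {float(s):.4f}")
```

Subject A's drop-out at 6 months never changes the curve directly; it only shrinks the risk set from 4 to 3, which is why the second factor is 2/3 rather than 3/4.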

Page 26: Introduction to Survival Analysis August 3 and 5, 2004

Comparing 2 groups

Use the log-rank test to test the null hypothesis of no difference between survival functions of the two groups.
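A minimal sketch of the two-group log-rank statistic, assuming the standard formulation: at each event time, compare observed deaths in one group with the number expected under the null, using a hypergeometric variance and a chi-square reference distribution with 1 degree of freedom. The function name and both data sets are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) on 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Deaths occurring exactly at time t
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: months of follow-up and event indicators (1 = death)
group_a = ([4, 6, 7, 12, 12], [1, 0, 1, 0, 0])
group_b = ([9, 12, 12, 12, 12], [1, 0, 0, 0, 0])

chi2, p_value = logrank_test(*group_a, *group_b)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

With samples this small the test has very little power; the example only shows the mechanics.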

Page 27: Introduction to Survival Analysis August 3 and 5, 2004

Caveats

Survival estimates can be unreliable toward the end of a study when there are small numbers of subjects at risk of having an event.

Page 28: Introduction to Survival Analysis August 3 and 5, 2004

WHI and breast cancer

[Figure annotation: small numbers left]

Page 29: Introduction to Survival Analysis August 3 and 5, 2004

Limitations of Kaplan-Meier

• Mainly descriptive
• Doesn’t control for covariates
• Requires categorical predictors
• Can’t accommodate time-dependent variables

Page 30: Introduction to Survival Analysis August 3 and 5, 2004

Introduction to Cox Regression

History

“Regression Models and Life-Tables” by D.R. Cox, published in 1972, is one of the most frequently cited journal articles in statistics and medicine

Page 31: Introduction to Survival Analysis August 3 and 5, 2004

Introduction to Cox Regression

Also called proportional hazards regression

Multivariate regression technique where time-to-event (taking into account censoring) is the dependent variable.

Estimates covariate-adjusted hazard ratios.

– A hazard ratio is a ratio of incidence, or hazard, rates

Page 32: Introduction to Survival Analysis August 3 and 5, 2004

Introduction to Cox Regression

Distinction between rate and proportion:

– Incidence rate: number of new cases of disease per population at-risk per unit time

– Hazard rate: instantaneous incidence rate; the probability per unit time that, given you survived disease-free up to time t, you succumb to the disease in the next instant.

– Cumulative incidence (or cumulative risk): proportion of the at-risk population that becomes a new case in a given time period
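The difference between a rate and a proportion shows up in the denominators. The sketch below uses a hypothetical eight-person cohort and assumes complete follow-up: everyone who stays disease-free is observed for the full 5 years, so the simple proportion is a valid cumulative incidence.

```python
# Hypothetical cohort: years of follow-up and whether disease developed
follow_up_years = [2.0, 5.0, 1.5, 5.0, 5.0, 5.0, 0.5, 5.0]
developed_disease = [1, 0, 1, 0, 0, 0, 1, 0]

cases = sum(developed_disease)
person_years = sum(follow_up_years)

# Rate: cases per unit of person-time at risk
incidence_rate = cases / person_years

# Proportion: cases per person who started the period at risk
cumulative_incidence = cases / len(follow_up_years)

print(f"incidence rate: {1000 * incidence_rate:.0f} cases per 1000 person-years")
print(f"5-year cumulative incidence: {cumulative_incidence:.1%}")
```

The rate carries units of 1/time and can exceed 1; the cumulative incidence is a unitless proportion between 0 and 1.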

Page 33: Introduction to Survival Analysis August 3 and 5, 2004

Rates vs. risks

Relationship between risk and rates:

$$R(t) = 1 - e^{-ht}$$

where R(t) = probability of disease in time t and h = constant hazard rate

Page 34: Introduction to Survival Analysis August 3 and 5, 2004

Rates vs. risks

For example, if rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:

$$R(t) = 1 - e^{-(.005)(10)} = 1 - e^{-.05} = 1 - .9512 = .0488$$

Compare to .005(10) = 5%. The two values are close because the loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.

Page 35: Introduction to Survival Analysis August 3 and 5, 2004

Rates vs. risks

If rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:

$$R(t) = 1 - e^{-(.05)(10)} = 1 - e^{-.5} = 1 - .61 = .39$$

Compare to .05(10) = 50%
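The two worked examples can be checked numerically. This sketch assumes a constant hazard over the whole period; the function name `risk_from_rate` is hypothetical.

```python
import math

def risk_from_rate(hazard, t):
    """Cumulative risk by time t under a constant hazard rate."""
    return 1.0 - math.exp(-hazard * t)

for cases_per_1000_py in (5, 50):
    h = cases_per_1000_py / 1000
    exact = risk_from_rate(h, 10)
    linear = h * 10  # rate x time, ignoring depletion of the at-risk group
    print(f"rate {h}: risk = {exact:.4f}, rate x time = {linear:.2f}")
```

The shortcut rate × time works well when the rate is low and overstates the risk when the rate is high, because it ignores that people who have already developed disease are no longer at risk.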

Page 36: Introduction to Survival Analysis August 3 and 5, 2004

Introduction to Cox Regression

Distinction between hazard/rate ratio and odds ratio/risk ratio:

Hazard ratio: ratio of hazard rates

Odds/risk ratio: ratio of odds or of proportions

By taking into account time, you are taking into account more information than just binary yes/no.

Gain power/precision.

Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio.

Page 37: Introduction to Survival Analysis August 3 and 5, 2004

Example: Study of publication bias

By Kaplan-Meier methods

From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)

Page 38: Introduction to Survival Analysis August 3 and 5, 2004

Univariate Cox regression

From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)

Table 4 Risk factors for time to publication using univariate Cox regression analysis

Characteristic           # not published   # published   Hazard ratio (95% CI)
Null                     29                23            1.00
Non-significant trend    16                4             0.39 (0.13 to 1.12)
Significant              47                99            2.32 (1.47 to 3.66)

Interpretation: Significant results have a 2-fold higher incidence of publication compared to null results.
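When reading a hazard ratio with its confidence interval, it can help to move to the log scale, where the interval is roughly symmetric. The sketch below back-calculates a standard error and z statistic for the "Significant" row (2.32, 1.47 to 3.66), assuming the interval is a Wald interval built on the log hazard ratio. That assumption is not stated in the source, so the results are approximate.

```python
import math

hr, lower, upper = 2.32, 1.47, 3.66  # hazard ratio and 95% CI from the table
z_crit = 1.959964                    # two-sided 95% normal quantile

log_hr = math.log(hr)
# Width of the CI on the log scale is 2 * z_crit * SE (Wald assumption)
se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
z = log_hr / se
# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"log HR = {log_hr:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

A confidence interval that excludes 1.00 corresponds to |z| above 1.96, which is the case for this row and not for the "Non-significant trend" row.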

Page 39: Introduction to Survival Analysis August 3 and 5, 2004

Example: Study of mortality in academy award winning screenwriters (multivariate)

Kaplan-Meier methods

From: Longevity of screenwriters who win an academy award: longitudinal study BMJ 2001;323:1491-1496 (22-29 December)

Page 40: Introduction to Survival Analysis August 3 and 5, 2004

Table 2. Death rates for screenwriters who have won an academy award. Values are percentages (95% confidence intervals) and are adjusted for the factor indicated

Analysis                        Relative increase in death rate for winners
Basic analysis                  37 (10 to 70)
Adjusted analysis
  Demographic:
    Year of birth               32 (6 to 64)
    Sex                         36 (10 to 69)
    Documented education        39 (12 to 73)
    All three factors           33 (7 to 65)
  Professional:
    Film genre                  37 (10 to 70)
    Total films                 39 (12 to 73)
    Total four star films       40 (13 to 75)
    Total nominations           43 (14 to 79)
    Age at first film           36 (9 to 68)
    Age at first nomination     32 (6 to 64)
    All six factors             40 (11 to 76)
  All nine factors              35 (7 to 70)

Basic analysis: HR=1.37; interpretation: 37% higher incidence of death for winners compared with nominees

All nine factors: HR=1.35; interpretation: 35% higher incidence of death for winners compared with nominees even after adjusting for potential confounders

Page 41: Introduction to Survival Analysis August 3 and 5, 2004

The model

$$h_i(t) = h_0(t)\, e^{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}}$$

Components:

•A baseline hazard function, $h_0(t)$, that is left unspecified but must be positive (= the hazard when all covariates are 0). It can take on any form.

•A linear function of a set of k fixed covariates that is exponentiated (= the relative risk).

On the log scale:

$$\log h_i(t) = \log h_0(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$

Page 42: Introduction to Survival Analysis August 3 and 5, 2004

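The model

The point is to compare the hazard rates of individuals who have different covariates:

$$HR = \frac{h_1(t)}{h_2(t)} = \frac{h_0(t)\, e^{\beta x_1}}{h_0(t)\, e^{\beta x_2}} = e^{\beta (x_1 - x_2)}$$

Hence, called proportional hazards: the baseline hazard cancels, so the ratio does not depend on t.

Hazard functions should be strictly parallel on the log scale.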
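The cancellation of the baseline hazard can be checked numerically. In this sketch the baseline hazard function, the coefficient, and the covariate values are all hypothetical; any positive baseline would give the same ratio.

```python
import math

def baseline_hazard(t):
    # Hypothetical baseline hazard; any positive function of t would do
    return 0.01 + 0.002 * t

def hazard(t, beta, x):
    # Proportional hazards form: baseline times exponentiated linear predictor
    return baseline_hazard(t) * math.exp(beta * x)

beta = 0.7          # hypothetical log hazard ratio per unit of x
x1, x2 = 1.0, 0.0   # two individuals who differ by one unit of x

ratios = [hazard(t, beta, x1) / hazard(t, beta, x2) for t in (1, 5, 10, 50)]
expected = math.exp(beta * (x1 - x2))

print([round(r, 4) for r in ratios], round(expected, 4))
```

Both hazards rise over time here, yet their ratio stays fixed at every time point. That constancy is what the proportional hazards assumption asserts.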

Page 43: Introduction to Survival Analysis August 3 and 5, 2004

Evaluation of proportional hazards assumption.

Page 44: Introduction to Survival Analysis August 3 and 5, 2004

Characteristics of Cox Regression

Cox models the effect of predictors and covariates on the hazard rate but leaves the baseline hazard rate unspecified.

Does NOT assume knowledge of absolute risk.

Estimates relative rather than absolute risk.
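One way to see why the baseline hazard can stay unspecified is the partial likelihood that Cox regression maximizes: at each event time it compares the subject who had the event with everyone still at risk, and the baseline hazard cancels out of that comparison. The sketch below is a simplified illustration with a single covariate, no tied event times, hypothetical data, and a coarse grid search in place of a proper optimizer.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Partial log-likelihood for one covariate, assuming no tied event times."""
    loglik = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i
        risk_set = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        loglik += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk_set))
    return loglik

# Hypothetical data: months of follow-up, event indicator, binary exposure
times = [2, 4, 5, 7, 9, 12]
events = [1, 1, 1, 1, 0, 0]
exposed = [1, 0, 1, 1, 0, 0]

# Coarse grid search over beta from -3.00 to 3.00 (illustration only)
grid = [i / 100 for i in range(-300, 301)]
beta_hat = max(grid, key=lambda b: cox_partial_loglik(b, times, events, exposed))

print(f"beta_hat = {beta_hat:.2f}, hazard ratio = {math.exp(beta_hat):.2f}")
```

Nothing in `cox_partial_loglik` refers to a baseline hazard, so the fit yields a relative measure (the hazard ratio) without any estimate of absolute risk.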

Page 45: Introduction to Survival Analysis August 3 and 5, 2004

Assumptions of Cox Regression

Proportional hazards assumption: the hazard for any individual is a fixed proportion of the hazard for any other individual

Multiplicative risk

Page 46: Introduction to Survival Analysis August 3 and 5, 2004

Survival analysis: Example

Page 47: Introduction to Survival Analysis August 3 and 5, 2004

<1800 g (n=15)

1800-2199 g (n=55)

≥2200 g (n=52)

Kaplan-Meier estimates of stress fracture-free survivorship by BMC at baseline

Page 48: Introduction to Survival Analysis August 3 and 5, 2004

<800 mg/day (n=22)

800-1499 mg/day (n=63)

1500+mg/day (n=36)

Kaplan-Meier estimates of stress fracture-free survivorship by levels of daily calcium intake at baseline

Page 49: Introduction to Survival Analysis August 3 and 5, 2004

Previous fracture (n=39)

No previous fracture (n=83)

Kaplan-Meier estimates of stress fracture-free survivorship by previous stress fracture

Page 50: Introduction to Survival Analysis August 3 and 5, 2004

Lowest quartile of lean mass

Highest quartile of lean mass

Middle two quartiles

Page 51: Introduction to Survival Analysis August 3 and 5, 2004
Page 52: Introduction to Survival Analysis August 3 and 5, 2004

Risk Factors

Risk factor                                            Hazard Ratio (95% CI)
History of menstrual irregularity prior to baseline    2.91 (0.81, 10.43)
BMC <1800 g                                            3.70 (1.31, 10.46)
Low calcium (<800 mg/d)                                3.60 (1.12, 11.59)
Stress fracture prior to baseline                      5.45 (1.48, 20.08)
Fat mass (per kg)                                      1.05 (0.91, 1.21)

**All analyses are stratified on site and menstrual status at baseline, and adjusted for age and spine Z-score at baseline using Cox regression.

Page 53: Introduction to Survival Analysis August 3 and 5, 2004

Other protective factors

Factor                                            Hazard Ratio (95% CI)
Spine BMD (per 1-standard deviation increase)     0.54 (0.30, 0.96)
Every 100-mg/d calcium (continuous)               0.90 (0.81, 0.99)
Lean mass (per kg), time-dependent                0.91 (0.81, 1.02)
Change in lean mass (per kg)                      0.83 (0.56, 1.24)
Menarche (per 1-year older)                       0.55 (0.34, 0.90)

**All analyses are stratified on site and menstrual status at baseline, and adjusted for age and spine Z-score at baseline (except spine Z-score) using Cox regression.