42
2 Kaplan- Meier Survival Curves and the Log-Rank Test D.G. Kleinbaum and M. Klein, Survival Analysis: A Self-Learning Text, Third Edition, Statistics for Biology and Health, DOI 10.1007/978-1-4419-6646-9_2, # Springer Science+Business Media, LLC 2012 55

[Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

  • Upload
    mitchel

  • View
    238

  • Download
    18

Embed Size (px)

Citation preview

Page 1: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

2 Kaplan-

Meier

Survival

Curves and

the Log-Rank

Test

D.G. Kleinbaum and M. Klein, Survival Analysis: A Self-Learning Text, Third Edition,Statistics for Biology and Health, DOI 10.1007/978-1-4419-6646-9_2,# Springer Science+Business Media, LLC 2012

55

Page 2: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Introduction We begin with a brief review of the purposes of survivalanalysis, basic notation and terminology, and the basicdata layout for the computer.

We then describe how to estimate and graph survivalcurves using the Kaplan-Meier (KM) method. The esti-mated survival probabilities are computed using a productlimit formula.

Next, we describe how to compare two or more survivalcurves using the log–rank test of the null hypothesis ofa common survival curve. For two groups, the log–rankstatistic is based on the summed observed minus expectedscore for a given group and its variance estimate. Forseveral groups, a computer should always be used becausethe log–rank formula is more complicated mathematically.The test statistic is approximately chi-square in largesamples with G � 1 degrees of freedom, where G denotesthe number of groups being compared.

Several alternatives to the log–rank test will be brieflydescribed. These tests are variations of the log rank testthat weigh each observation differently. They are also largesample chi-square tests with G � 1 degrees of freedom.

Finally, we describe how to compute confidence intervalsfor the KM curve and for the median survival time.

AbbreviatedOutline

The outline below gives the user a preview of the materialto be covered by the presentation. A detailed outline forreview purposes follows the presentation.

I. Review (pages 58–60)

II. An example of Kaplan-Meier curves (pages 61–65)

III. General features of KM curves (pages 66–67)

IV. The log-rank test for two groups (pages 67–71 )

V. The log-rank test for several groups (pages 71–73)

VI. Alternatives to the log rank test (pages 73–78)

VII. Confidence intervals for KM curves(pages 78–79)

VIII. Confidence intervals for the median survivaltime (page 80)

IX. Summary (page 81)

56 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 3: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Objectives Upon completing the chapter, the learner should be able to:

1. Compute Kaplan-Meier (KM) probabilities of survival,given survival time and failure status information on asample of subjects.

2. Interpret a graph of KM curves that compare two ormore groups.

3. Draw conclusions as to whether or not two or moresurvival curves are the same based on computer resultsthat provide a log–rank test and/or an alternative test.

4. Decide whether the log–rank test or one of thealternatives to this test is more appropriate for a givenset of survival data.

5. Compute a 95% confidence interval for a KM survivalprobability.

6. Compute a 95% confidence interval for the mediansurvival time obtained from a KM curve.

Objectives 57

Page 4: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

PresentationThis presentation describes how to plot andinterpret survival data using Kaplan-Meier(KM) survival curves and how to test whetheror not two or more KM curves are equivalentusing the log–rank test. We also describe alter-native tests to the log–rank test. Furthermore,we provide formulae for computing 95%confidence intervals for a KM curve and forthe median survival time.

I. Review We begin by reviewing the basics of survivalanalysis. Generally, survival analysis is a col-lection of statistical procedures for the analysisof data in which the outcome variable of inter-est is time until an event occurs. By event, wemean death, disease incidence, relapse fromremission, or any designated experience ofinterest that may happen to an individual.

When doing a survival analysis, we usuallyrefer to the time variable as survival time. Wealso typically refer to the event as a failure.

Most survival analyses consider a key data ana-lytical problemcalledcensoring. In essence, cen-soring occurs when we have some informationabout individual survival time, but we don’tknow the survival time exactly.

Most survival time data is right-censored,because the true survival time interval, whichwe don’t really know, has been cut off (i.e.,censored) at the right side of the observedtime interval, giving us an observed survivaltime that is shorter than the true survival time.We want to use the observed survival time todraw implications about the true survival time.

As notation, we denote by a capital T therandom variable for a person’s survival time.Next, we denote by a small letter t any specificvalue of interest for the variable T.

plot and interpretKM survival curves

test equivalence ofKM curves usinglog–rank testalternative tests9.5% CI for KM andmedium survival

FOCUS

Start EventTIME

Event: deathdiseaserelapse

Time ¼ survival time

Event ¼ failure

Censoring: Don’t know survivaltime exactly

True survival time

Observed survival time

Right-censored

NOTATION

T = survival time

random variablet = specific value for T

58 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 5: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

We let d denote a (0,1) random variableindicating either censorship or failure. A per-son who does not fail, that is, does not get theevent during the study period, must have beencensored either before or at the end of thestudy.

The survivor function, denoted by S(t), givesthe probability that the random variable Texceeds the specified time t.

Theoretically, as t ranges from 0 up to infinity,the survivor function is graphed as a decreas-ing smooth curve, which begins at S(t) ¼ 1 att ¼ 0 and heads downward toward zero ast increases toward infinity.

In practice, using data, we usually obtain esti-mated survivor curves that are step functions,as illustrated here, rather than smooth curves.

The hazard function, denoted by h(t), gives theinstantaneous potential per unit time forthe event to occur given that the individualhas survived up to time t.

In contrast to the survivor function, whichfocuses on not failing, the hazard functionfocuses on failing; in other words, the higherthe average hazard, the worse the impact onsurvival. The hazard is a rate, rather than aprobability. Thus, the values of the hazardfunction range between zero and infinity.

Regardless of which function S(t) or h(t) oneprefers, there is a clearly defined relation-ship between the two. In fact, if one knowsthe form of S(t), one can derive the corres-ponding h(t), and vice versa.

d ¼ 0, 1ð Þ random variable

¼ 1 if failure0 if censored

S tð Þ ¼ survivor function

¼ Pr T > tð Þ

1

0

Theoretical S(t)

t

S(t)

S(0)

S(¥)

¥

S(t)

S(t) in practice

t

1

0 Study end

h tð Þ ¼ hazard function

¼ instantaneous potential

given survival up to time t

Not failing

FailingS(t)

h(t)

h(t) is a rate: 0 to 1

h(t)S(t)

Presentation: I. Review 59

Page 6: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The general data layout for a survival analysisis given by the table shown here. The first col-umn of the table identifies the study subjects.The second column gives the observed survivaltime information. The third column gives theinformation for d, the dichotomous variablethat indicates censorship status. The remain-der of the information in the table gives valuesfor explanatory variables of interest.

An alternative data layout is shown here. Thislayout is the basis upon which Kaplan-Meiersurvival curves are derived. The first columnin the table gives ordered survival times fromsmallest to largest. The second column givesfrequency counts of failures at each distinct fail-ure time. The third column gives frequencycounts, denoted by qf, of those persons censoredin the time interval starting with failure time t(f)up to but not including the next failure time,denoted by t(fþ1). The last column gives therisk set, which denotes the collection of indivi-duals who have survived at least to time t(f).

To estimate the survival probability at a giventime, we make use of the risk set at that time toinclude the information we have on a censoredperson up to the time of censorship, ratherthan simply throw away all the informationon a censored person.

The actual computation of such a survivalprobability can be carried out using theKaplan-Meier (KM) method. We introduce theKM method in the next section by way of anexample.

General Data Layout:

Indiv. # t d X1 X2 . . . Xp

1 t1 d1 X11 X12 . . . X1p

2 t2 d2 X21 X22 . . . X2p

� � � � �� � � � �� � � � �n tn dn Xn1 Xn2 . . . Xnp

Alternative (ordered) datalayout:

Orderedfailuretimes,t(f )

# offailures

mf

# censored in[t(f ), t(fþ1)),

qf

Riskset,

R(t(f ))

t(0) ¼ 0 m0 ¼ 0 q0 R(t(0))t(1) m1 q1 R(t(1))t(2) m2 q2 R(t(2))� � � �� � � �� � � �t(k) mk qk R(t(k))

Table of ordered failures:

� Uses all information up to timeof censorship;

� S(t) is derived from R(t).

Survival probability:Use Kaplan-Meier (KM)method.

60 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 7: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

II. An Example ofKaplan-Meier Curves

The data for this example derive from a studyof the remission times in weeks for two groupsof leukemia patients, with 21 patients in eachgroup. Group 1 is the treatment group andgroup 2 is the placebo group. The basic questionof interest concerns comparing the survivalexperience of the two groups.

Of the 21 persons in group 1, 9 failed during thestudy period and 12 were censored. In contrast,none of the data in group 2 are censored; thatis, all 21 persons in the placebo group went outof remission during the study period.

In Chapter 1, we observed for this data set thatgroup 1 appears to have better survival progno-sis than group 2, suggesting that the treatmentis effective. This conclusion was supported bydescriptive statistics for the average survivaltime and average hazard rate shown. Note,however, that descriptive statistics provideoverall comparisons but do not compare thetwo groups at different times of follow-up.

EXAMPLE

The data: remission times (weeks) fortwo groups of leukemia patients

Group 1 (n ¼ 21)treatment

Group 2 (n ¼ 21)placebo

6, 6, 6, 7, 10, 1, 1, 2, 2, 3,13, 16, 22, 23, 4, 4, 5, 5,6þ, 9þ, 10þ, 11þ, 8, 8, 8, 8,17þ, 19þ, 20þ, 11, 11, 12, 12,25þ, 32þ, 32þ, 15, 17, 22, 2334þ, 35þ,

Note: þ denotes censored

# failed # censored Total

Group 1 9 12 21Group 2 21 0 21

Descriptive statistics:

�T1 ignoringþ 0sð Þ ¼ 17:1; �T2 ¼ 8:6

�h1 ¼ :025; �h2 ¼ :115;�h2�h1

¼ 4:6

Presentation: II. An Example of Kaplan-Meier Curves 61

Page 8: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

A table of ordered failure times is shown herefor each group. These tables provide the basicinformation for the computation of KM curves.

Each table begins with a survival time of zero,even though no subject actually failed at thestart of follow-up. The reason for the zero isto allow for the possibility that some subjectsmight have been censored before the earliestfailure time.

Also, each table contains a column denoted asnf that gives the number of subjects in the riskset at the start of the interval. Given that therisk set is defined as the collection of indivi-duals who have survived at least to time t(f),it is assumed that nf includes those personsfailing at time t(f). In other words, nf countsthose subjects at risk for failing instanta-neously prior to time t(f).

We now describe how to compute the KMcurve for the table for group 2. The computa-tions for group 2 are quite straightforwardbecause there are no censored subjects forthis group.

The table of ordered failure times for group2 is presented here again with the addition ofanother column that contains survival proba-bility estimates. These estimates are the KMsurvival probabilities for this group. We willdiscuss the computations of these probabilitiesshortly.

EXAMPLE: (continued)

Ordered failure times:

Group 1 (treatment)

t(f) nf mf qf

0 21 0 06 21 3 17 17 1 110 15 1 213 12 1 016 11 1 322 7 1 023 6 1 5>23 — — —

Group 2 (placebo)

t(f) nf mf qf

0 21 0 01 21 2 02 19 2 03 17 1 04 16 2 05 14 2 08 12 4 011 8 2 012 6 2 015 4 1 017 3 1 022 2 1 023 1 1 0

Group 2: no censored subjectsGroup 2 (placebo)

t(f) nf mf qf S (t(f))

0 21 0 0 11 21 2 0 19/21 ¼ .902 19 2 0 17/21 ¼ .813 17 1 0 16/21 ¼ .764 16 2 0 14/21 ¼ .675 14 2 0 12/21 ¼ .578 12 4 0 8/21 ¼ .38

11 8 2 0 6/21 ¼ .2912 6 2 0 4/21 ¼ .1915 4 1 0 3/21 ¼ .1417 3 1 0 2/21 ¼ .1022 2 1 0 1/21 ¼ .0523 1 1 0 0/21 ¼ .00

62 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 9: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

A plot of the KM survival probabilitiescorresponding to each ordered failure time isshown here for group 2. Empirical plots suchas this one are typically plotted as a step func-tion that starts with a horizontal line at a sur-vival probability of 1 and then steps down tothe other survival probabilities as we movefrom one ordered failure time to another.

We now describe how the survival probabilitiesfor the group 2 data are computed. Recall thata survival probability gives the probability thata study subject survives past a specified time.

Thus, considering the group 2 data, the pro-bability of surviving past zero is unity, as itwill always be for any data set.

Next, the probability of surviving past thefirst ordered failure time of 1 week is given by19/21 or (.90) because 2 people failed at 2 week,so that 19 people from the original 21 remainas survivors past 2 week.

Similarly, the next probability concerns sub-jects surviving past 2 weeks, which is 17/21(or. 81) because 2 subjects failed at 1 week and2 subjects failed at 2 weeks leaving 17 out of theoriginal 21 subjects surviving past 2 weeks.

The remaining survival probabilities in thetable are computed in the same manner, thatis, we count the number of subjects survivingpast the specified time being considered anddivide this number by 21, the number of sub-jects at the start of follow-up.

Recall that no subject in group 2 was censored,so the q column for group 2 consists entirelyof zeros. If some of the q’s had been nonzero,an alternative formula for computing survivalprobabilities would be needed. This alterna-tive formula is called the Kaplan-Meier (KM)approach and can be illustrated using thegroup 2 data even though all values of q arezero.

EXAMPLE: (continued)

KM Curve for Group 2 (Placebo)

1

0 5

.5

10 15 20Weeks

S(t)

S(t) ¼ Pr (T > t)

Group 2 (placebo)

t(f) nf mf qf S (t(f))

0 21 0 0 11 21 2 0 19/21 ¼ .902 19 2 0 17/21 ¼ .813 17 1 0 16/21 ¼ .764 16 2 0 14/21 ¼ .675 14 2 0 12/21 ¼ .578 12 4 0 8/21 ¼ .38

11 8 2 0 6/21 ¼ .2912 6 2 0 4/21 ¼ .1915 4 1 0 3/21 ¼ .1417 3 1 0 2/21 ¼ .1022 2 1 0 1/21 ¼ .0523 1 1 0 0/21 ¼ .00

Sðtðf ÞÞ ¼# surviving past tðf Þ

21

No censorship in group 2Alternative formula: KM approach

Presentation: II. An Example of Kaplan-Meier Curves 63

Page 10: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

For example, an alternative way to calculatethe survival probability of exceeding 4 weeksfor the group 2 data can be written using theKM formula shown here. This formula involvesthe product of conditional probability terms.That is, each term in the product is the proba-bility of exceeding a specific ordered failuretime t(f) given that a subject survives up to thatfailure time.

Thus, in the KM formula for survival past4 weeks, the term 19/21 gives the probabilityof surviving past the first ordered failure time,1 week, given survival up to the first week. Notethat all 21 persons in group 2 survived up to1 week, but that 2 failed at 1 week, leaving 19persons surviving past 1 week.

Similarly, the term 16/17 gives the probabilityof surviving past the third ordered failure timeat week 3, given survival up to week 3. Therewere 17 persons who survived up to week 3 and1 of these then failed, leaving 16 survivors pastweek 3. Note that the 17 persons in the deno-minator represents the number in the risk setat week 3.

Notice that the product terms in the KM for-mula for surviving past 4 weeks stop at the 4thweek with the component 14/16. Similarly, theKM formula for surviving past 8 weeks stopsat the eighth week.

More generally, any KM formula for a survivalprobability is limited to product terms up tothe survival week being specified. That iswhy the KM formula is often referred to as a“product-limit” formula.

Next, we consider the KM formula for the datafrom group 1, where there are several censoredobservations.

The estimated survival probabilities obtainedusing the KM formula are shown here forgroup 1.

The first survival estimate on the list is S(0) � 1,as it will always be, because this gives the prob-ability of surviving past time zero.

EXAMPLE

Sð4Þ ¼ 1� 19

21� 17

19� 16

17� 14

16¼ 14

21¼ :67

Pr (T > t(f) T � t(f))

Sð4Þ ¼ 1� 1921

� 17

19� 16

17� 14

16¼ 14

21¼ :67

19

21¼ Pr T > 1 T � 1jð Þ

16

17¼ Pr T > 3 T � 3jð Þ

17 ¼ # in risk set at week 3

Sð4Þ ¼ 1� 19

21� 17

19� 16

17� 14

16

Sð8Þ ¼ 1� 19

21� 17

19� 16

17� 14

16� 12

14� 8

12

KM formula ¼ product limitformula

Group 1 (treatment)

t(f) nf mf qf S (t(f))

0 21 0 0 1

6 21 3 1 1� 1821�

��

64 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 11: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The other survival estimates are calculated bymultiplying the estimate for the immediatelypreceding failure time by a fraction. For exam-ple, the fraction is 18/21 for surviving pastweek 6, because 21 subjects remain up toweek 6 and 3 of these subjects fail to survivepast week 6. The fraction is 16/17 for survivingpast week 7, because 17 people remain up toweek 7 and 1 of these fails to survive pastweek 7.The other fractions are calculated similarly.

For a specified failure time t(f), the fractionmaybe generally expressed as the conditional prob-ability of surviving past time t(f), given avail-ability (i.e., in the risk set) at time t(f). Thisis exactly the same formula that we previouslyused to calculate each product term in the pro-duct limit formula used for the group 2 data.

Note that a subject might not be available attime t(f) for one of two reasons: (1) either thesubject has failed prior to t(f), or (2) the subjecthas been censored prior to t(f). Group 1 hascensored observations, whereas group 2 doesnot. Thus, for group 1, censored observationshave to be taken into account when determin-ing the number available at t(f).

Plots of the KM curves for groups 1 and 2 areshown here on the same graph. Notice that theKM curve for group 1 is consistently higherthan the KM curve for group 2. These figuresindicate that group 1, which is the treatmentgroup, has better survival prognosis than group2, the placebo group. Moreover, as the numberof weeks increases, the two curves appear to getfarther apart, suggesting that the beneficialeffects of the treatment over the placebo aregreater the longer one stays in remission.

The KM plots shown above can be easilyobtained from most computer packages thatperform survival analysis, including SAS, Stata,SPSS, and R. All the user needs to do is providea KM computer program with the basic datalayout and then provide appropriate commandsto obtain plots.

EXAMPLE: (continued)

Group 1 (treatment)

t(f) nf mf qf S (t(f))

0 21 0 0 1

6 21 3 1 1� 1821

¼ :8571

7 17 1 1 :8571� 1617

¼ :8067

10 15 1 2 :8067� 14

15¼ :7529

13 12 1 0 :7529� 11

12¼ :6902

16 11 1 3 :6902� 10

11¼ :6275

22 7 1 0 :6275� 6

7¼ :5378

23 6 1 5 :5378� 5

6¼ :4482

Fraction at t(f ): Pr(T > t(f ) | T � t(f ))

Not available at t( f ): failed prior to t( f )orcensored prior to t( f )

group 1 only

KM Plots for Remission Data

Group 1 (treatment)

Group 2 (placebo)

0

1

0.8

0.6

0.4

0.2

08 16 24 32

Obtain KM plots fromcomputer package, e.g., SAS,

Stata,SPSSR

Presentation: II. An Example of Kaplan-Meier Curves 65

Page 12: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

III. General Featuresof KM Curves

The general formula for a KM survival probabil-ity at failure time t(f) is shown here. This formulagives the probability of surviving past the previ-ous failure time t(f�1), multiplied by the condi-tional probability of surviving past time t(f),given survival to at least time t(f).

The above KM formula can also be expressed asa product limit if we substitute for the survivalprobability S (t(f�1)), the product of all fractionsthat estimate the conditional probabilities forfailure times t(f�1) and earlier.

For example, the probability of survivingpast 10 weeks is given in the table for group 1(page 65) by .8067 times 14/15, which equals.7529. But the .8067 can be alternativelywritten as the product of the fractions 18/21and 16/17. Thus, the product limit formula forsurviving past 10 weeks is given by the tripleproduct shown here.

Similarly, the probability of surviving past16 weeks can be written either as .6902 � 10/11,or equivalently as the five-way product of frac-tions shown here.

The general expression for the product limitformula for the KM survival estimate is shownhere together with the general KM formulagiven earlier. Both expressions are equivalent.

A simple mathematical proof of the KM for-mula can be described in probability terms.One of the basic rules of probability is thatthe probability of a joint event, say A and B,is equal to the probability of one event, say A,times the conditional probability of the otherevent, B, given A.

General KM formula:

S t fð Þ� �

¼ S t f�1ð Þ� �� Pr T > t fð ÞjT � t fð Þ

� �KM formula ¼ product limit

formula

S tðf�1Þ� � ¼Yf�1

i¼1

Pr T> tðiÞ T � tðiÞ��� �

EXAMPLE

Sð10Þ ¼ :8067� 14

15¼ :7529

¼ 18 ×21

1617

� 14

15

Sð16Þ ¼ :6902� 10

11

¼ 18 ×21

1617

14× ×15

1112

� 10

11

S tðf Þ� � ¼ Yf

i¼1

Pr T > tðiÞ T � tðiÞ��� �

¼ Sðtðf�1ÞÞ� Pr T> tðf Þ T � tðf Þ

��� �Math proof:

Pr(A and B) ¼ Pr(A) � Pr(B | A)always

66 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 13: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

If we let A be the event that a subject survives toat least time t(f) and we let B be the event that asubject survives past time t(f), then the jointevent A and B simplifies to the event B, whichis inclusive of A. It follows that the probabilityof A and B equals the probability of survivingpast time t(f).

Also, because t(f) is the next failure time aftert(f�1), there can be no failures after time t(f�1)

and before time t(f). Therefore, the probabilityof A is equivalent to the probability of survivingpast the (f � 1)th ordered failure time.

Furthermore, the conditional probability of Bgiven A is equivalent to the conditional proba-bility in the KM formula.

Thus, using the basic rules of probability, theKM formula can be derived.

IV. The Log-Rank Testfor Two Groups

We now describe how to evaluate whether ornot KM curves for two or more groups arestatistically equivalent. In this section we con-sider two groups only. Themost popular testingmethod is called the log–rank test.

When we state that two KM curves are “statisti-cally equivalent,” we mean that, based on a test-ing procedure that compares the two curves insome “overall sense,” we do not have evidence toindicate that the true (population) survivalcurves are different.

A ¼ “T � t(f )” ! A and B ¼ BB ¼ “T > t(f )”Pr(A and B) ¼ Pr(B) ¼ S(t( f ))

No failures during t(f�1) < T < t(f )Pr(A) ¼ Pr(T > t(f�1))¼ S(t( f −1))

Pr(B|A) ¼ Pr(T > t( f )|T ≥ t( f ))

Thus, from Pr(A and B) formula,

Pr A and Bð Þ ¼ Pr Að Þ � Pr B j Að ÞS tð f Þ� � ¼ S tð f�1Þ

� �� Pr T > tð f ÞjT � tð f Þ

� �

Are KM curves statisticallyequivalent?

1.0

.8

.6

.4

.2

.08 16 24

Presentation: IV. The Log-Rank Test for Two Groups 67

Page 14: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The log–rank test is a large-sample chi-squaretest that uses as its test criterion a statistic thatprovides an overall comparison of the KMcurves being compared. This (log–rank) statis-tic, like many other statistics used in otherkinds of chi-square tests,makes use of observedversus expected cell counts over categories ofoutcomes. The categories for the log–rank sta-tistic are defined by each of the ordered failuretimes for the entire set of data being analyzed.

As an example of the information required forthe log–rank test, we again consider the com-parison of the treatment (group 1) and placebo(group 2) subjects in the remission data on42 leukemia patients.

Here, for each ordered failure time, t(f), in theentire set of data, we show the numbers ofsubjects (mif) failing at that time, separately bygroup (i), followed by the numbers of subjects(nif) in the risk set at that time, also separatelyby group.

Thus, for example, at week 4, no subjects failedin group 1, whereas two subjects failed ingroup 2. Also, at week 4, the risk set for group 1contains 21 persons, whereas the risk set forgroup 2 contains 16 persons.

Similarly, at week 10, 1 subject failed in group 1,and no subjects failed at group 2; the risk setsfor each group contain 15 and 8 subjects,respectively.

We now expand the previous table to includeexpected cell counts and observed minusexpected values for each group at each orderedfailure time. The formula for the expectedcell counts is shown here for each group. Forgroup 1, this formula computes the expectednumber at time f (i.e., e1f) as the proportion ofthe total subjects in both groups who are at riskat time f, that is, n1f /(n1f þ n2f), multiplied bythe total number of failures at that time forboth groups (i.e., m1f þ m2f). For group 2, e2fis computed similarly.

� Chi-square test� Overall comparison of KM

curves� Observed versus expected

counts� Categories defined by ordered

failure times

EXAMPLE

Remission data: n ¼ 42

# failures # in risk set

t(f) m1f m2f n1f n2f

1 0 2 21 212 0 2 21 193 0 1 21 174 0 2 21 165 0 2 21 146 3 0 21 127 1 0 17 128 0 4 16 12

10 1 0 15 811 0 2 13 812 0 2 12 613 1 0 12 415 0 1 11 416 1 0 11 317 0 1 10 322 1 1 7 223 1 1 6 1

Expected cell counts:

e1f ¼ n1fn1f þ n2f

� � m1f þ m2f

� �"

Proportion

in risk set

"# of failures over

both groups

e2f ¼ n2fn1f þ n2f

� � m1f þ m2f

� �

68 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 15: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

When two groups are being compared, thelog–rank test statistic is formed using the sumof the observed minus expected counts over allfailure times for one of the two groups. In thisexample, this sum is �10.26 for group 1 and10.26 for group 2. We will use the group 2 valueto carry out the test, but as we can see, exceptfor the minus sign, the difference is the samefor the two groups.

For the two-group case, the log–rank statistic,shown here at the left, is computed by dividingthe square of the summed observed minusexpected score for one of the groups — say,group 2 — by the variance of the summedobserved minus expected score.

EXAMPLE

Expanded Table (Remission Data)

# failures # in risk set # expected Observed-expected

f t(f) m1f m2f n1f n2f e1f e2f m1f�e1f m2f�e2f

1 1 0 2 21 21 (21/42) � 2 (21/42) � 2 �1.00 1.002 2 0 2 21 19 (21/40) � 2 (19/40) � 2 �1.05 1.053 3 0 1 21 17 (21/38) � 1 (17/38) � 1 �0.55 0.554 4 0 2 21 16 (21/37) � 2 (16/37) � 2 �1.14 1.145 5 0 2 21 14 (21/35) � 2 (14/35) � 2 �1.20 1.206 6 3 0 21 12 (21/33) � 3 (12/33) � 3 1.09 �1.097 7 1 0 17 12 (17/29) � 1 (12/29) � 1 0.41 �0.418 8 0 4 16 12 (16/28) � 4 (12/28) � 4 �2.29 2.299 10 1 0 15 8 (15/23) � 1 (8/23) � 1 0.35 �0.35

10 11 0 2 13 8 (13/21) � 2 (8/21) � 2 �1.24 1.2411 12 0 2 12 6 (12/18) � 2 (6/18) � 2 �1.33 1.3312 13 1 0 12 4 (12/16) � 1 (4/16) � 1 0.25 �0.2513 15 0 1 11 4 (11/15) � 1 (4/15) � 1 �0.73 0.7314 16 1 0 11 3 (11/14) � 1 (3/14) � 1 0.21 �0.2115 17 0 1 10 3 (10/13) � 1 (3/13) � 1 �0.77 0.7716 22 1 1 7 2 (7/9) � 2 (2/9) � 2 �0.56 0.5617 23 1 1 6 1 (6/7) � 2 (1/7) � 2 �0.71 0.71

Totals 9 21 19.26 10.74 �10.26 −10.26

# of failure times

Oi � Ei ¼X#17f¼1

mif � eif� �

;

i ¼ 1; 2

EXAMPLE

O1 � E1 ¼ �10.26O2 � E2 ¼ 10.26

Two groups:

O2�E2¼ summed observedminusexpected score for group 2

Log�rank statistic ¼ O2 � E2ð Þ2Var O2 � E2ð Þ

Presentation: IV. The Log-Rank Test for Two Groups 69

Page 16: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The expression for the estimated variance isshown here. For two groups, the variance for-mula is the same for each group. This varianceformula involves the number in the risk set ineach group (nif) and the number of failuresin each group (mif) at time f. The summationis over all distinct failure times.

The null hypothesis being tested is that there isno overall difference between the two survivalcurves. Under this null hypothesis, the log–-rank statistic is approximately chi-square withone degree of freedom. Thus, a P-value for thelog–rank test is determined from tables of thechi-square distribution.

Several computer programs are available forcalculating the log–rank statistic. For examplethe Stata package has a command called “ststest” that computes descriptive informationabout Kaplan-Meier curves, the log–rank sta-tistic, and alternative statistics to the log–rankstatistic, to be described later. Other packages,like SAS and SPSS, have procedures that pro-vide results similar to those of Stata. A compar-ison of Stata, SAS, SPSS and R procedures andoutput is provided in the Computer Appendix atthe back of this text.

For the remission data, the edited printoutfrom using the Stata “sts test” procedure isshown here. The log–rank statistic is 16.79and the corresponding P-value is zero to threedecimal places. This P-value indicates that thenull hypothesis should be rejected. We cantherefore conclude that the treatment and pla-cebo groups have significantly different KMsurvival curves.

Var Oi � Eið Þ

¼Xj

n1f n2f m1f þ m2f

� �n1f þ n2f � m1f � m2f

� �n1f þ n2f� �2

n1f þ n2f � 1� �

i ¼ 1; 2

H0: no difference between survivalcurves

Log-rank statistic �w2 with 1 dfunder H0

Computer programs:Stata’s “sts test”:

� descriptive statistics for KMcurves

� log–rank statistic� Alternative statistics to

log–rank statistic

EXAMPLE

Using Stata: Remission Data

GroupEventsobserved

Eventsexpected

1 9 19.252 21 10.75

Total 30 30.00

Log rank ¼ chi2(2) ¼ 16.79P-Value ¼ Pr > chi2 ¼ 0.000

70 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 17: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Although the use of a computer is the easiestway to calculate the log–rank statistic, we pro-vide here some of the details of the calculation.We have already seen from earlier computa-tions that the value of O2 � E2 is 10.26. Theestimated variance of O2 � E2 is computedfrom the variance formula above to be 6.2685.The log–rank statistic then is obtained by squar-ing 10.26 and dividing by 6.285, which yields16.793, as shown on the computer printout.

An approximation to the log–rank statistic,shown here, can be calculated using observedand expected values for each group withouthaving to compute the variance formula. Theapproximate formula is of the classic chi-squareform that sumsover eachgroupbeing comparedthe square of the observedminus expected valuedivided by the expected value.

The calculation of the approximate formula isshown here for the remission data. Theexpected values are 19.26 and 10.74 for groups1 and 2, respectively. The chi-square valueobtained is 15.276, which is slightly smallerthan the log–rank statistic of 16.793.

V. The Log-Rank Testfor Several Groups

The log–rank test can also be used to comparethree or more survival curves. The null hypo-thesis for this more general situation is that allsurvival curves are the same.

Although the same tabular layout can be usedto carry out the calculations when there aremore than two groups, the test statistic is morecomplicated mathematically, involving bothvariances and covariances of summed observedminus expected scores for each group. A con-venient mathematical formula can be given inmatrix terms.We present thematrix formula forthe interested reader in an Appendix at the endof this chapter.

EXAMPLE

O2 � E2 ¼ 10.26Var(O2 � E2) ¼ 6.2685

Log - rank statistic ¼ O2 � E2ð Þ2dVar O2 � E2ð Þ

¼ 10:26ð Þ26:2685

¼ 16:793

Approximate formula:

X2 �X# of groups

i

Oi � Eið Þ2Ei

EXAMPLE

X2 ¼ �10:26ð Þ219:26

þ 10:26ð Þ210:74

¼ 15:276

Log-rank statistic ¼ 16.793

H0: All survival curves are the same.

Log-rank statistics for > 2 groupsinvolves variances and covariancesof Oi � Ei.

Matrix formula: See Appendix atend of this chapter.

Presentation: V. The Log-Rank Test for Several Groups 71

Page 18: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

We will not describe further details about thecalculation of the log–rank statistic, because acomputer program can easily carry out thecomputations from the basic data file. Instead,we illustrate the use of this test with datainvolving more than two groups.

If the number of groups being compared is G(� 2), then the log–rank statistic has approxi-mately a large sample chi-square distributionwith G � 1 degrees of freedom. Therefore,the decision about significance is made usingchi-square tables with the appropriate degreesof freedom.

The approximate formula previously describedinvolving only observed and expected valueswithout variance or covariance calculationscan also be used when there are more thantwo groups being compared. However, practi-cally speaking, the use of this approximateformula is not required as long as a com-puter program is available to calculate theexact log–rank statistic.

We now provide an example to illustrate theuse of the log–rank statistic to compare morethan two groups.

The data set “vets.dat” considers survival timesin days for 137 patients from the Veteran’sAdministration Lung Cancer Trial cited byKalbfleisch and Prentice in their text (The Sta-tistical Analysis of Survival Time Data, JohnWiley, pp. 223–224, 1980). A complete list ofthe variables is shown here. Failure status isdefined by the status variable (column 11).

Among the variables listed, we now focus on theperformance status variable (column 7). Thisvariable is a continuous variable, so before wecan obtain KM curves and the log–rank test,we need to categorize this variable.

Use computer program forcalculations.

G (� 2) groups:log–rank statistic � w2 withG � 1 df

Approximation formula:

X2 �X#ofgroups

i

Oi � Eið Þ2Ei

Not required because computerprogram calculates the exact log–-rank statistic

EXAMPLE

vets.dat: survival time in days,

n 137

Veteran’s Administration Lung Cancer Trial

Column 1: Treatment (standard ¼ 1, test ¼ 2)Column 2: Cell type 1 (large ¼ 1, other ¼ 0)Column 3: Cell type 2 (adeno ¼ 1, other ¼ 0)Column 4: Cell type 3 (small ¼ 1, other ¼ 0)Column 5: Cell type 4 (squamous ¼ 1, other ¼ 0)Column 6: Survival time (days)

Column 7: Performance Status

(0 ¼ worst . . . 100 ¼ best)Column 8: Disease duration (months)Column 9: AgeColumn 10: Prior therapy (none ¼ 0, some ¼ 1)Column 11: Status (0 ¼ censored, 1 ¼ died)

72 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 19: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

If, for theperformance status variable,we choosethe categories 0–59, 60–74, and 75–100, weobtain three groups of sizes 52, 50, and 35,respectively.

The KM curves for each of three groups areshown here. Notice that these curves appearto be quite different. A test of significanceof this difference is provided by the log–rankstatistic.

An edited printout of descriptive informationabout the three KM curves together with thelog–rank test results are shown here. Theseresults were obtained using the Stata package.

Because three groups are being comparedhere, G ¼ 3 and the degrees of freedom for thelog–rank test is thus G � 1, or 2. The log–rankstatistic is computed to be 29.181, which has aP-value of zero to three decimal places. Thus,the conclusion from the log–rank test is thatthere is a highly significant difference amongthe three survival curves for the performancestatus groups.

VI. Alternatives to theLog Rank Test

There are several alternatives to the log rank testoffered by Stata, SAS, SPSS, and R designedto test the hypothesis that two or more sur-vival curves are equivalent. In this section wedescribe the Wilcoxon, the Tarone-Ware, thePeto, and the Flemington-Harrington test. Allof these tests are variations of the log rank testand are easily implemented in Stata.

EXAMPLE: (continued)

Performance Status Categories

Group # Categories Size

1 0–59 522 60–74 503 75–100 35

KM curves for performance statusgroups

0

0.0

0.5

1.0

100 200 300 400 500 600

1

1

111

1

22 3

333

22

22

2

GroupEventsobserved

Eventsexpected

1 50 26.302 47 55.173 31 46.53

Total 128 128.00Log-rank ¼ chi2(2) ¼ 29.18P-value ¼ Pr > chi2 ¼ 0.0000G ¼ 3 groups; df ¼ G � 1 ¼ 2

Log-rank test is highly significant.

Conclude significant differenceamong three survival curves.

Alternative tests supported by Stata

WilcoxenTarone-WarePetoFlemington-Harrington

Presentation: VI. Alternatives to the Log Rank Test 73

Page 20: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

In describing the differences among these tests,recall that the log rank test uses the summedobserved minus expected score O � E in eachgroup to form the test statistic. This simple sumgives the same weight – namely, unity – to eachfailure time when combining observed minusexpected failures in each group.

The Wilcoxon, Tarone-Ware, Peto, andFlemington-Harrington test statistics are varia-tions of the log rank test statistic and are derivedby applying different weights at the f-th failuretime (as shown on the left for two groups).

The Wilcoxon test (called the Breslow test inSPSS) weights the observed minus expectedscore at time tf by the number at risk nf, overall groups at time tf. Thus, the Wilcoxon testplaces more emphasis on the informationat the beginning of the survival curve wherethe number at risk is large allowing early fail-ures to receive more weight than later failures.This type of weighting may be used to assesswhether the effect of a treatment on survival isstrongest in the earlier phases of administra-tion and tends to be less effective over time.

The Tarone-Ware test statistic also applies moreweight to the early failure times byweighting theobserved minus expected score at time t(f) bythe square root of the number at risk

ffiffiffiffiffinf

p. The

Peto test weights the f-th failure time by thesurvival estimate ~s(t(f)) calculated over all groupscombined. This survival estimate ~s(t(f)) is similarbut not exactly equal to the Kaplan-Meier sur-vival estimate. The Flemington-Harrington testuses theKaplan-Meier survival estimate S(t) overall groups to calculate its weights for the f-thfailure time, S(t(f�1))

p[1� S (t(f�1))]q. The weights

for each of these test statistics are summarizedon the left.

Log rank uses

Oi � Ei ¼Xf

mif � eif� �

i ¼ group #f ¼ fth failure time

Weighting the test statistic for twogroups

Test statistic:

Pf

wðtðf ÞÞ mif � eif� � !2

varPjwðtðf ÞÞ mif � eif

� � !i ¼ 1, 2f ¼ fth failure timew(t(f)) ¼ weight at fth failure time

Wilcoxon Test

� w(tf) ¼ nf (number at risk)� Earlier failures receive more

weight� Appropriate if treatment effect

is strongest in earliest phases ofadministration

Weights Used for Various TestStatistics

Test Statistic w(t(f ))

Log rank 1Wilcoxon nf

Tarone-Wareffiffiffiffiffinf

p

Peto ~s(t(f ))

Flemington-Harrington

S(t(f�1))p

�[1� S(t(f�1))]q

74 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 21: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The Flemington-Harrington test allows themost flexibility in terms of the choice ofweights because the user provides the valuesof p and q. For example, if p ¼ 1 and q ¼ 0 thenw(t) ¼ S(t(f�1)) which gives more weight for theearlier survival times when S(t(f�1)) is close toone. However, if p¼ 0 and q¼ 1 then w(t)¼ 1�S(t(f�1)) in which case the later survival timesreceive more weight. If p ¼ 0 and q ¼ 0 thenw(t) ¼ 1, and the Flemington-Harrington testreduces to the log rank test.

On the left is a comparison of test results forthe effect of treatment (vs. placebo) using theremission data. The log rank chi-square statis-tic (also displayed previously in this chapter) isthe highest among these tests at 16.79. TheFlemington–Harrington (FH) test with p ¼ 3and q ¼ 1 yielded the lowest chi-square valueat 8.99, although with this weighting it is notimmediately obvious which part of the survivalcurve is getting the most weight. However, allthe test results are highly significant yielding asimilar conclusion to reject the null hypothesis.

On the left are comparisons of the log rank andWilcoxon tests for the 3-level performance sta-tus variable from the vets dataset discussed inthe previous section. The Wilcoxon test yields ahigher chi-square value (46.10) than the logrank test (29.18). In contrast, the log rank testfor the effect of treatment (RX) from the remis-sions data yields a higher chi-square value(16.79) than the Wilcoxon test (13.46). How-ever, both the Wilcoxon and log rank tests arehighly significant for both performance statusand for treatment variables.

Flemington-Harrington Test

w tð Þ ¼ S tðf�1Þ� �p

1� S tðf�1Þ� �� �q

if p ¼ 1 and q ¼ 0, w(t) ¼ S(t(f�1))if p ¼ 0 and q ¼ 1,w(t) ¼ 1 � S(t(f�1))

if p ¼ 0 and q ¼ 0,w(t) ¼ 1 (log rank test)

Comparisons of Test Results:Remission Data, TestingTreatment (RX)

TestChi-square(1 df) P-value

Log rank 16.79 0.0000Wilcoxon 13.46 0.0002Tarone-

Ware15.12 0.0001

Peto 14.08 0.0002FH (p¼ 3,

q ¼ 1)8.99 0.0027

FH (p¼ 1,q ¼ 3)

12.26 0.005

Vets Data, 3-Level PerformanceStatus

TestChi-square(2 df) P-value

Log rank 29.18 0.0000Wilcoxon 46.10 0.0000

Remission Data, 2-Level Treatment

TestChi-square(1 df) P-value

Log rank 16.79 0.0000Wilcoxon 13.46 0.0002

Presentation: VI. Alternatives to the Log Rank Test 75

Page 22: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

A comparison of survival curves gives insightinto why the Wilcoxon test yields a higher chi-square value than the log rank test for the 3-level performance status variable. The 3 curvesbeing compared are farthest apart in the earlypart of followup before becoming closer later.By contrast, a comparison of the 2 curves fortreatment shows the curves diverging over time.

In general, the various weightings should pro-vide similar results and will usually lead to thesame decision as to whether the null hypothe-sis is rejected. The choice of which weightingof the test statistic to use (e.g., log rank orWilcoxon) depends on which test is believed toprovide the greatest statistical power, whichin turn depends on how it is believed the nullhypothesis is violated.

If there is a clinical reason to believe the effectof an exposure is more pronounced toward thebeginning (or end) of the survival function,then it makes sense to use a weighted test sta-tistic. However, one should make an a prioridecision on which statistical test to userather than fish for a desired p-value. Fishingfor a desired result may lead to bias.

1

1

111

1

2

1.0

KM curves for performance status groups

KM Plots for Remission Data

0.5

0.0

1

0.8

Group 2 (Placebo)

Group 1 (treatment)

0.6

0.4

0.2

00 8 16 24 32

0 100 200 300 400 500 600

2 333

3

22

22

2

Choosing a Test

� Results of different weightingsusually lead to similarconclusions

� The best choice is test withmostpower

� Power depends on how the nullis violated

� Theremay be a clinical reason tochoose a particular weighting

� Choice of weighting should be apriori

76 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 23: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The stratified log rank test is another variationof the log rank test. With this test the summedobserved minus expected scores O � E are cal-culated within strata of each group and thensummed across strata. The stratified log ranktest provides a method of testing the equi-valence of survival curves controlling forthe stratified variable. An example of the stra-tified log rank test is presented next using theremission data.

On the left is Stata output from performing astratified log rank test for the effect of treat-ment (RX) stratified by a 3-level variable(LWBC3) indicating low, medium, or high logwhite blood cell count (coded 1, 2, and 3,respectively).

Within each stratum of LWBC3, the expectednumber of events is calculated for the treatedgroup (RX ¼ 0) and for the placebo group(RX ¼ 1). The total expected number of eventsfor the treated group is found by summing theexpected number of events over the threestrata: 2.91 þ 7.36 þ 6.11 ¼ 16.38. Similarlythe total expected number of events for the pla-cebo group is calculated: 1.09 þ 2.64 þ 9.89 ¼13.62. This compares to 9 observed cases fromthe treated group and 21 observed casesfrom the placebo group yielding a chi-squarevalue of 10.14 with 1 degree of freedom (for2 levels of treatment) and a correspondingp-value of 0.0014.

Recall that when we did not control for logwhite blood cell count, the log rank test for theeffect of treatment yielded a chi-square value of16.79 and a corresponding p-value roundedto 0.0000.

Stratified log rank test

� O � E scores calculated withinstrata

� O � E scores then summedacross strata

� Allows control of stratifiedvariable

Stratified log-rank test->1wbc3 = 1

->1wbc3 = 2

->1wbc3 = 3

rx

rx

01

04

2.91

0 5 7.36

0 4 6.11

0 9 16.381 21 13.62

1 12 9.89

1 5 2.64

Total 4 4.00

Total 10 10.00

Total

->Total

16 16.00

Total 30 30.00

1.09

Eventsobserved

Eventsobserved

Eventsexpected

rxEvents

observedEvents

expected

Eventsexpected

rxEvents

observed

Eventsexpected

(*)

Presentation: VI. Alternatives to the Log Rank Test 77

Page 24: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The only difference between the unstratifiedand stratified approaches is that for the unstrat-ified approach, the observed minus expectednumber of events for each failure time is sum-med over all failure times for each group (i).With the stratified approach, the observedminus expected number of events is summedover all failure times for each group withineach stratum and then summed over all strata.Either way, the null distribution is chi-squarewith G � 1 degrees of freedom, where G repre-sents the number of groups being compared(not the number of strata).

The stratified approach can also be applied toany of the weighted variations of the log ranktest (e.g., Wilcoxon). A limitation of the strati-fied approach is the reduced sample size withineach stratum. This is particularly problematicwith the remission dataset, which has a smallsample size to begin with.

We have shown how the stratified log rank testcan be used to test the effect of treatment whilecontrolling for log white blood cell count. Inthe next chapter we show how modeling canbe used to test an association of a predictorvariable while simultaneously controlling forother covariates.

VII. Confidence intervalsfor KM curves

We now describe how to calculate (95%)confidence intervals (CIs) for the estimatedKaplan–Meier (KM) curve.

The 95% CI formula for estimated KM proba-bility at any time point over follow-up has thegeneral large sample form shown on the left,where SKM(t)denotes the KM survival estimateat time t and Var[SKM(t)] denotes variance ofSKM(t). The most common approach used tocalculate this variance uses the Greenwood’sformula, also shown on the left.

Log rank unstratified

Oi � Ei ¼Xj

mif � eif� �

i ¼ group #, f ¼ fth failure time

Log rank stratified

Oi � Ei ¼Xs

Xf

mifs � eifs� �

i ¼ group #, f ¼ jth failure time,s ¼ stratum #

Stratified or unstratified (G groups)Under H0:

log rank statistic �w2 withG � 1 df

Can stratify with other testsWilcoxon, Tarone-Ware,Peto, Flemington-Harrington

LimitationSample-size may be small withinstrata

AlternativelyTest associations using modeling

� Can simultaneously controlcovariates

� Shown in next chapter

95% CI for the KM curve:

SKM(t) 1.96

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVar½SKM(t)]

qwhere Greenwood’s formula forVar[SKM(t)] is given by

Var½SKM(t)] ¼ SKM(t)� �2 X

f :tðf Þt

mf

nf ðnf � mf Þ� �

t(f) ¼ f-ordered failure timemf ¼ number of failures at t(f),nf ¼ number in the risk set at t(f),

78 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 25: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The summation component of Greenwood’sformula essentially gives at each failure time t(f),a weighted (by 1/nf) average of the conditionalrisk of failing at those failure times prior to t(f).Thus, the variance formula gives the square ofthe KM coordinate at each event time weightedby the cumulative estimate of the risk at time t.

We illustrate how Greenwood’s variance is cal-culated for the treatment group (Group 1) ofthe remission times data described earlier. Thelayout for this computation is shown n the left.

At 6 weeks, the estimate of the survival functionis 0.857. There were three events at 6 weeks and21 patients at risk. Therefore, mf/nf(nf � mf) ¼3/(21�18) ¼ 0.0079. As this is the only compo-nent of the sum, the variance is then 0.0079 �0.8572 ¼ 0.0058. The corresponding 95% confi-dence interval is shown on the left, where theupper level should be modified to 1.

At 10 weeks, the estimate of the survival func-tion is 0.753. There was 1 event at 10 monthsand 15 patients at risk. Therefore,mf/nf(nf-mf)¼1/(15�14) ¼ 0.0048.

There were two other risk components prior tothis time, 0.0079 at time 6 and 0.0037 at time 7and their sumis0.0164. The varianceat 10weeksis then 0.0164�0.7532, which equals 0.0093.

The 95% CI for the proportion of patients at10 weeks is shown on the left to have the limits(0.564, 0.942). Note that since the variance isonly defined at event times, the 95% confidenceinterval remains the same at 11 and 12 weeksalso.

On the left we show the KM curves and theircorresponding 95% confidence intervals forGroup 1 (treatment) and Group 2 (placebo)for the remission time data.

mf

nf

1nf (nf - mf)

mf

P[T > t(f ) | T ≥ t(f ) ]nf

1

weight Conditional risk

nf - mf

t n m

m

nðn�mÞP m

nðn�mÞ S(t) Var[S(t)]0 21 0 0 0 1 06 21 3 0.0079 0.0079 0.857 0.00587 17 1 0.0037 0.0116 0.807 0.0076

10 15 1 0.0048 0.0164 0.753 0.009313 12 1 0.0076 0.0240 0.690 0.011416 11 1 0.0091 0.0330 0.628 0.013022 7 1 0.0238 0.0569 0.538 0.016423 6 1 0.0333 0.0902 0.448 0.0181

10weeks:m3

n3ðn3 �m3Þ ¼1

15ð14Þ ¼ :0048

Xf:tðfÞt¼10

mf

nfðnf �mfÞ� �

¼ 0:0079 þ 0:0037 þ 0:0048 ¼ 0:0164:

Var½SKM(10)] = (0.753)2(.0164) = 0.0093

95% CI : 0:753 1.96ffiffiffiffiffiffiffiffiffiffiffiffiffiffi0:0093

p¼ ð0:564;0:942Þ

Same CI at t ¼11 and 12, since noevents occurred at those times.

REMISSION

Time

GROUP

Sur

viva

l pro

babi

lity

(%)

100

80

60

40

20

00 5 10 15 20 25 30 35

12

Presentation: VII. Confidence intervals for KM curves 79

Page 26: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

VIII. Confidence intervalsfor the mediansurvival time

Returning again to the remission time dataset,we now consider the calculation of the 95% CIfor the median of the remission times for theplacebo group. Recall that the median survivalfor this group is 8 weeks.

Brookmeyer and Crowley (1982) proposed asimple way to calculate the CI for the mediansurvival time based on the fact that the squareof a standardized function of the survival curvearound the true (unknown) median value (M)is asymptotically w2distributed. This relation-ship is shown mathematically on the left.

Using the above result about the standar-dized survival curve, a general formula for the95% CI for the median survival is providedby the inequality expression shown on the left.The times for which this inequality holds areplausible values of the true median, whilethe boundaries represent upper and lowertimes for the 95% CI for the median. Thelower boundarymay be 0 and the upper bound-ary may not always exist.

For the remission time data, the calculation ofthe CI around the median of 8 weeks is given inthe table shown on the left. Since the inequalityin the CI formula is satisfied in the range t¼4weeks to 8 weeks, the resulting 95% CI is (4,8).

Brookmeyer and Crowley (B&C) caution thatthe upper limit for these estimates shouldbe adjusted to reflect censoring. They recom-mend reporting semiopen intervals that extendone event beyond the event that satisfies theinequality; some packages (e.g., SAS, R) incor-porate this recommendation.

In this example, the 95% CI for the mediansurvival time obtained from SAS output isgiven by the interval (4, 11), with the upperlimit now extended beyond 8 weeks but notincluding the event at month 11.

Remission data example:Group 1 median ¼ 8 weeks95% CI ?

Formula for 95% CI derived from:

SKMðMÞ � 0:5� �2Var½SKM(M)]

� w21 where

M ¼ true (unknown) median sur-vival time, i.e., SKM(M) ¼ 0.5

SKM(M) ¼ estimated survival prob-ability fromKM curve at the truemedian survival time

Var½SKM(M)] uses Greenwood’sformula.

95% CI for median survival:

SKMðtÞ � 0:5� �2

< 3:84Var½SKMðtÞ�

t(i) S(t) (S(t)–0.5)2 3.84 Var(S(t)) Inequality

satisfied?ˆ ˆ ˆ

0

0.096 > 0.028 so the inequality is not satisfied

0.014 > 0.044 and the inequality is satisfied

123458111215172223

10.900.810.760.670.570.380.290.190.140.100.050.00

-0.016 N

NN

NNNNNN

YYY

0.0280.0330.0410.0450.0440.0380.0280.0220.0160.008

0

0.2500.1600.0960.0680.0290.0050.0140.440.0960.1300.1600.2030.050

Caution (ref B&C, 1982): upperlimit should be adjusted to reflectcensoring, e.g., SAS’s LIFETESTadjusts above 95% CI from (4,8)to (4, 11).

80 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 27: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

IX. Summary We now briefly summarize this presentation.First, we described how to estimate and graphsurvival curves using the Kaplan-Meier (KM)method.

To compute KM curves, we must form a datalayout that orders the failure times from smal-lest to largest. For each ordered failure time,the estimated survival probability is computedusing the product limit formula shown here.Alternatively, this estimate can be computed asthe product of the survival estimate for theprevious failure time multiplied by the condi-tional probability of surviving past the currentfailure time.

When survival curves are being compared, thelog–rank test gives a statistical test of the nullhypothesis of a common survival curve. Fortwo groups, the log–rank statistic is based onthe summed observed minus expected scoresfor a given group and its variance estimate. Forseveral groups, a computer should always beused since the log–rank formula is more com-plicated mathematically. The test statistic isapproximately chi-square in large sampleswith G � 1 degrees of freedom, where Gdenotes the number of groups being compared.

Large sample confidence intervals for KMcurves can be computed based on Greenwood’sformula, shown at the left, for the variance ofan estimated KM survival probability.

KM curves:

KM curves:

1

0.8

0.6

0.4

0.2

00 8 16 24 32

t(f): fth ordered failure time

S tðf Þ� � ¼ Yf

i¼1

Pr T > tðiÞ T � tðiÞ��� �

¼ S tðf�1Þ� �

� Pr T > tðf Þ T � tðf Þ��� �

Log-rank test:

H0: common survival curve forall groups

Log - rank statistic =O2 � E2ð Þ2

Var O2 � E2ð Þlog–rank statistic �w2 with G � 1 dfunder H0

G ¼ # of groups

Greenwood’s Variance formula:

Var SKMðtÞ� � ¼ SKMðtÞ� �2�X

f:tðfÞt

mf

nf nf - mfð Þ� �

Presentation: IX. Summary 81

Page 28: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

The expression for the 95% confidence intervalis shown below Greenwood’s formula.

A large sample confidence interval formula forthe median of the KM curve can also be com-puted using the inequality formula shown hereon the left. The upper and lower boundaries oft for which this inequality holds provide the95% confidence limits.

Chapters This presentation is now complete. You canreview this presentation using the detailedoutline that follows and then try the practiceexercises and test.

Chapter 3 introduces the Cox proportionalhazards (PH) model, which is the most popularmathematical modeling approach for estimat-ing survival curves when considering severalexplanatory variables simultaneously.

95% CI for KM:

SKMðtÞ 1:96ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVar SKMðtÞ� �q

95% CI for median survival:

SKMðtÞ � 0:5� �2

3:84Var SKMðtÞ� �

1. Introduction

32. Kaplan-Meier Survival Curvesand the Log-Rank Test

Next:

3. The Cox ProportionalHazards Model and ItsCharacteristics

82 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 29: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

DetailedOutline

I. Review (pages 58–60)

A. The outcome variable is (survival) time until anevent (failure) occurs.

B. Key problem: censored data, i.e., don’t knowsurvival time exactly.

C. Notation: T ¼ survival time random variable

t ¼ specific value of Td ¼ (0, 1) variable for failure/cen-

sorship statusS(t) ¼ survivor functionh(t) ¼ hazard function

D. Properties of survivor function:

i. theoretically, graph is smooth curve,decreasing from S (t) ¼ 1 at time t ¼ 0 to S(t) ¼ 0 at t ¼ 1;

ii. in practice, graph is step function.

E. Properties of h(t):

i. instantaneous potential for failing givensurvival up to time;

ii. h(t) is a rate; ranges from 0 to 1.

F. Relationship of S (t) to h(t): if you know one youcan determine the other.

G. Goals of survival analysis: estimation of survi-vor and hazard functions; comparisons andrelationships of explanatory variables to sur-vival.

H. Data layouts

i. for the computer;

ii. for understanding the analysis: involvesrisk sets.

II. AnExampleofKaplan-MeierCurves (pages 61–65)

A. Data are from study of remission times in weeksfor two groups of leukemia patients (21 in eachgroup).

B. Group 1 (treatment group) has several censoredobservations, whereas group 2 has no censoredobservations.

C. Table of ordered failure times is provided foreach group.

D. For group 2 (all noncensored), survival prob-abilities are estimated directly and plotted.Formula used is

S tðf Þ� � ¼ # surviving past tðf Þ

21:

Detailed Outline 83

Page 30: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

E. Alternative approach for group 2 is given by aproduct limit formula.

F. For group 1, survival probabilities calculated bymultiplying estimate for immediately precedingfailure time by a conditional probability of sur-viving past current failure time, i.e.,

Sðf Þ ¼ Sðf�1Þ Pr T > tðf Þ T � tðf Þ��� �

:

III. General Features of KM Curves (pages 66–67)

A. Two alternative general formulae:

Sðf Þ ¼Yfi¼1

Pr T > tðiÞ T � tðiÞ��� � (product limit

formula)

Sðf Þ ¼ Sðf�1Þ Pr T > tðf Þ T � tðf Þ��� �

B. Second formula derived from probability rule:

Pr A and Bð Þ ¼ Pr Að Þ � Pr BjAð ÞIV. The Log-Rank Test for Two Groups (pages 67–71)

A. Large sample chi-square test; provides overallcomparison of KM curves.

B. Uses observed versus expected counts over cate-gories of outcomes, where categories aredefined by ordered failure times for entire setof data.

C. Example provided using remission data involv-ing two groups:

i. expanded table described to show howexpected and observed minus expected cellcounts are computed.

ii. for ith group at time f, where i ¼ 1 or 2:

observed counts ¼ mif,

expected counts ¼ eif, where

expected counts ¼ (proportion in risk set) �(# failures over both groups),

i.e., eif ¼ nifn1f þ n2f

� m1f þ m2f

� �:

D. Log-rank statistic for two groups:

Oi � Eið Þ2Var Oi � Eið Þ ;

where i ¼ 1. 2,

84 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 31: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Oi � Ei ¼Xf

mif � eif� �

; and

Var Oi � Eið Þ

¼Xf

n1f n2f m1f þ m2f

� �n1f þ n2f � m1f � m2f

� �n1f þ n2f� �2

n1f þ n2f � 1� �

i ¼ 1; 2

E. H0: no difference between survival curves.

F. Log-rank statistic � w2 with 1 df under H0.

G. Approximate formula:

X2 ¼XGi¼1

Oi � Eið Þ2Ei

; where G ¼ 2 ¼ # of groups

H. Remission data example: Log-rank statistic ¼16.793, whereas X2 ¼ 15.276.

V. The Log-Rank Test for Several Groups(pages 71–73)

A. Involves variances and covariances; matrix for-mula in Appendix.

B. Use computer for calculations.

C. Under H0, log–rank statistic �w2 with G � 1 df,where G ¼ # of groups.

D. Example provided using vets.dat with intervalvariable “performance status”; this variable iscategorized into G ¼ 3 groups, so df for log–-rank test is G � 1 ¼ 2, log–rank statistic is29.181 (P ¼ 0.0).

VI. Alternatives to the Log-Rank Test (pages 73–78)

A. Alternative tests supported by Stata: Wilcoxen,Tarone-Ware, Peto, and Flemington-Harring-ton.

B. Alternative tests differ by applying differentweights at the j-th failure time.

C. The choice of alternative depends on the reasonfor the belief that the effect is more pronouncedtowards the beginning (or end) of the survivalfunction.

D. The stratified-log–rank test is a variation of thelog–rank test that controls for one or more stra-tified variables.

Detailed Outline 85

Page 32: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

VII. Confidence Intervals for KM Curves(pages 78–79)

A. General form of 95% CI:

SKM(t) 1.96

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiVar½SKM(t)]

qB. Var[SKM(t)]uses Greenwood’s formula:

Var½SKM(t)] ¼ SKM(t)� �2 X

f :tðf Þt

mf

nf ðnf � mf Þ� �

C. Example using Remission Time Data

VIII. Confidence Intervals for the Median SurvivalTime (page 80)

A. General form of 95% CI where values of t satis-fying the following inequality provide confi-dence limits:

SKMðtÞ � 0:5� �2

< 3:84Var½SKMðtÞ�B. Example using Remission Time Data

IX. Summary (page 81)

86 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 33: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

PracticeExercises

1. The following data are a sample from the 1967-1980Evans County study. Survival times (in years) are givenfor two study groups, eachwith 25 participants. Group 1has no history of chronic disease (CHR¼ 0), and group 2has a positive history of chronic disease (CHR ¼ 1):

Group 1 (CHR ¼ 0): 12.3þ, 5.4, 8.2, 12.2þ, 11.7, 10.0,5.7, 9.8, 2.6, 11.0, 9.2, 12.1þ,6.6, 2.2, 1.8, 10.2, 10.7, 11.1,5.3, 3.5, 9.2, 2.5, 8.7, 3.8, 3.0

Group 2 (CHR ¼ 1): 5.8, 2.9, 8.4, 8.3, 9.1, 4.2, 4.1, 1.8,3.1, 11.4, 2.4, 1.4, 5.9, 1.6, 2.8,4.9, 3.5, 6.5, 9.9, 3.6, 5.2, 8.8,7.8, 4.7, 3.9

a. Fill in the missing information in the followingtable of ordered failure times for groups 1 and 2:

Group 1 Group 2

t(f) nf mf qf S(t(f)) t(f) nf mf qf S(t(f))

0.0 25 0 0 1.00 0.0 25 0 0 1.001.8 25 1 0 .96 1.4 25 1 0 .962.2 24 1 0 .92 1.6 24 1 0 .922.5 23 1 0 .88 1.8 23 1 0 .882.6 22 1 0 .84 2.4 22 1 0 .843.0 21 1 0 .80 2.8 21 1 0 .803.5 20 2.9 20 1 0 .763.8 19 1 0 .72 3.1 19 1 0 .725.3 18 1 0 .68 3.5 18 1 0 .685.4 17 1 0 .64 3.6 17 1 0 .645.7 16 1 0 .60 3.96.6 15 1 0 .56 4.18.2 14 1 0 .52 4.28.7 13 1 0 .48 4.7 13 1 0 .489.2 4.9 12 1 0 .449.8 10 1 0 .36 5.2 11 1 0 .4010.0 9 1 0 .32 5.8 10 1 0 .3610.2 8 1 0 .28 5.9 9 1 0 .3210.7 7 1 0 .24 6.5 8 1 0 .2811.0 6 1 0 .20 7.8 7 1 0 .2411.1 5 1 0 .16 8.3 6 1 0 .2011.7 4 8.4 5 1 0 .16

8.8 4 1 0 .129.19.9

11.4 1 1 0 .00

b. Based on your results in part a, plot the KM curvesfor groups 1 and 2 on the same graph. Comment onhow these curves compare with each other.

Practice Exercises 87

Page 34: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

c. Fill in the following expanded table of ordered failuretimes to allow for the computation of expected andobserved minus expected values at each ordered fail-ure time. Note that your new table here should com-bine both groups of ordered failure times into onelisting and should have the following format:

t(f) m1f m2f n1f n2f e1f e2f m1f�e1f m2f�e2f

1.4 0 1 25 25 .500 .500 �.500 .5001.6 0 1 25 24 .510 .490 �.510 .5101.8 1 1 25 23 1.042 .958 �.042 .0422.2 1 0 24 22 .522 .478 .478 �.4782.4 0 1 23 22 .511 .489 �.511 .5112.5 1 0 23 21 .523 .477 .477 �.4772.6 1 0 22 21 .516 .484 .484 �.4842.8 0 1 21 21 .500 .500 �.500 .5002.9 0 1 21 20 .512 .488 �.512 .5123.0 1 0 21 19 .525 .475 .475 �.4753.13.53.63.83.9 0 1 18 16 .529 .471 �.529 .5294.1 0 1 18 15 .545 .455 �.545 .5454.2 0 1 18 14 .563 .437 �.563 .5634.7 0 1 18 13 .581 .419 �.581 .5814.9 0 1 18 12 .600 .400 �.600 .6005.2 0 1 18 11 .621 .379 �.621 .6215.3 1 0 18 10 .643 .357 .357 �.3575.4 1 0 17 10 .630 .370 .370 �.3705.7 1 0 16 10 .615 .385 .385 �.3855.8 0 1 15 10 .600 .400 �.600 .6005.9 0 1 15 9 .625 .375 �.625 .6256.5 0 1 15 8 .652 .348 �.652 .6526.6 1 0 15 7 .682 .318 .318 �.3187.8 0 1 14 7 .667 .333 �.667 .6678.2 1 0 14 6 .700 .300 .300 �.3008.3 0 1 13 6 .684 .316 �.684 .6848.4 0 1 13 5 .722 .278 �.722 .7228.7 1 0 13 4 .765 .235 .335 �.3358.8 0 1 12 4 .750 .250 �.750 .7509.1 0 1 12 3 .800 .200 �.800 .8009.29.89.9

10.0 1 0 9 1 .900 .100 .100 �.10010.2 1 0 8 1 .888 .112 .112 �.11210.7 1 0 7 1 .875 .125 .125 �.12511.0 1 0 6 1 .857 .143 .143 �.14311.1 1 0 5 1 .833 .167 .167 �.16711.4 0 1 4 1 .800 .200 �.800 .80011.7 1 0 4 0 1.000 .000 .000 .000

Totals 22 25 30.79 16.21

88 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 35: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

d. Use the results in part c to compute the log–rankstatistic. Use this statistic to carry out the log–ranktest for these data. What is your null hypothesis andhow is the test statistic distributed under this nullhypothesis? What are your conclusions from thetest?

2. The following data set called “anderson.dat” consistsof remission survival times on 42 leukemia patients,half of whom get a certain new treatment therapy andthe other half of whom get a standard treatmenttherapy. The exposure variable of interest is treatmentstatus (Rx ¼ 0 if new treatment, Rx ¼ 1 if standardtreatment). Two other variables for control as potentialconfounders are log white blood cell count (i.e.,logwbc) and sex. Failure status is defined by the relapsevariable (0 if censored, 1 if failure). The data set is listedas follows:

Subj Survt Relapse Sex log WBC Rx

1 35 0 1 1.45 02 34 0 1 1.47 03 32 0 1 2.20 04 32 0 1 2.53 05 25 0 1 1.78 06 23 1 1 2.57 07 22 1 1 2.32 08 20 0 1 2.01 09 19 0 0 2.05 0

10 17 0 0 2.16 011 16 1 1 3.60 012 13 1 0 2.88 013 11 0 0 2.60 014 10 0 0 2.70 015 10 1 0 2.96 016 9 0 0 2.80 017 7 1 0 4.43 018 6 0 0 3.20 019 6 1 0 2.31 020 6 1 1 4.06 021 6 1 0 3.28 022 23 1 1 1.97 123 22 1 0 2.73 124 17 1 0 2.95 125 15 1 0 2.30 126 12 1 0 1.50 127 12 1 0 3.06 128 11 1 0 3.49 1

(Continued on next page)

Practice Exercises 89

Page 36: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Subj Survt Relapse Sex log WBC Rx

29 11 1 0 2.12 130 8 1 0 3.52 131 8 1 0 3.05 132 8 1 0 2.32 133 8 1 1 3.26 134 5 1 1 3.49 135 5 1 0 3.97 136 4 1 1 4.36 137 4 1 1 2.42 138 3 1 1 4.01 139 2 1 1 4.91 140 2 1 1 4.48 141 1 1 1 2.80 142 1 1 1 5.00 1

a. Suppose we wish to describe KM curves for thevariable logwbc. Because logwbc is continuous,we need to categorize this variable before we com-pute KM curves. Suppose we categorize logwbcinto three categories - low, medium, and high - asfollows:

low (0–2.30), n ¼ 11;medium (2.31–3.00), n ¼ 14;high (>3.00), n ¼ 17.

Based on this categorization, compute and graphKM curves for each of the three categories oflogwbc. (You may use a computer program to assistyou or you can form three tables of ordered failuretimes and compute KM probabilities directly.)

b. Compare the three KM plots you obtained in part a.How are they different?

c. Below is an edited printout of the log–rank testcomparing the three groups.

GroupEvents

observedEventsexpected

1 4 13.062 10 10.723 16 6.21

Total 30 30.00

Log-rank ¼ chi2(2) ¼ 26.39P-value ¼ Pr > chi2 ¼ 0.0000

What do you conclude about whether or not thethree survival curves are the same?

90 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 37: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Test To answer the questions below, you will need to use acomputer program (from SAS, Stata, SPSS, R or anyother package you are familiar with) that computes andplots KM curves and computes the log–rank test. Freelydownloadable files can be obtained from weblink http://www.sph.emory.edu/dkleinb/surv3.htm.

1. For the vets.dat data set described in the presentation:

a. Obtain KM plots for the two categories of the vari-able cell type 1 (1 ¼ large, 0 ¼ other). Comment onhow the two curves compare with each other. Carryout the log–rank, and draw conclusions from thetest(s).

b. Obtain KM plots for the four categories of cell type -large, adeno, small, and squamous. Note that youwill need to recode the data to define a single vari-able which numerically distinguishes the four cate-gories (e.g., 1 ¼ large, 2 ¼ adeno, etc.). As in part a,compare the four KM curves. Also, carry out thelog–rank for the equality of the four curves anddraw conclusions.

2. The following questions consider a data set from astudy by Caplehorn et al. (“Methadone Dosage andRetention of Patients in Maintenance Treatment,”Med. J. Aust., 1991). These data comprise the times indays spent by heroin addicts from entry to departurefrom one of two methadone clinics. There are two fur-ther covariates, namely, prison record and methadonedose, believed to affect the survival times. The data setname is addicts.dat. A listing of the variables is givenbelow:

Column 1: Subject ID

Column 2: Clinic (1 or 2)

Column 3: Survival status (0 ¼ censored, 1 ¼ departedfrom clinic)

Column 4: Survival time in days

Column 5: Prison record (0 ¼ none, 1 ¼ any)

Column 6: Methadone dose (mg/day)

Test 91

Page 38: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

a. Compute andplot theKMplots for the two categoriesof the “clinic” variable and comment on the extent towhich they differ.

b. A printout of the log–rank and Wilcoxon tests (usingStata) is provided below. What are your conclusionsfrom this printout?

GroupEvents

observedEventsexpected

1 122 90.912 28 59.09

Total 150 150.00

Log-rank ¼ chi2(1) ¼ 27.89P-value ¼ Pr > chi2 ¼ 0.0000Wilcoxon ¼ chi2(1) ¼ 11.63P-value ¼ Pr > chi2 ¼ 0.0007

c. Compute and evaluate KM curves and the log–ranktest for comparing suitably chosen categories of thevariable “Methadone dose.” Explain how you deter-mined the categories for this variable.

92 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 39: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Answers toPracticeExercises

1. a.

Group 1 Group 2

t(f) nf mf qf S(t(f)) t(f) nf mf qf S(t(f))

0.0 25 0 0 1.00 0.0 25 0 0 1.001.8 25 1 0 .96 1.4 25 1 0 .962.2 24 1 0 .92 1.6 24 1 0 .922.5 23 1 0 .88 1.8 23 1 0 .882.6 22 1 0 .84 2.4 22 1 0 .843.0 21 1 0 .80 2.8 21 1 0 .803.5 20 1 0 .76 2.9 20 1 0 .763.8 19 1 0 .72 3.1 19 1 0 .725.3 18 1 0 .68 3.5 18 1 0 .685.4 17 1 0 .64 3.6 17 1 0 .645.7 16 1 0 .60 3.9 16 1 0 .606.6 15 1 0 .56 4.1 15 1 0 .568.2 14 1 0 .52 4.2 14 1 0 .528.7 13 1 0 .48 4.7 13 1 0 .489.2 12 2 0 .40 4.9 12 1 0 .449.8 10 1 0 .36 5.2 11 1 0 .40

10.0 9 1 0 .32 5.8 10 1 0 .3610.2 8 1 0 .28 5.9 9 1 0 .3210.7 7 1 0 .24 6.5 8 1 0 .2811.0 6 1 0 .20 7.8 7 1 0 .2411.1 5 1 0 .16 8.3 6 1 0 .2011.7 4 1 3 .12 8.4 5 1 0 .16

8.8 4 1 0 .129.1 3 1 0 .089.9 2 1 0 .04

11.4 1 1 0 .00

b. KM curves for CHR data:

2 222

22

22

22

22

11

11

11

11

1.0

0.5

0.0

0 2 4 6 8 10 12 14

Group 1 appears to have consistently better survivalprognosis than group 2. However, the KM curvesare very close during the first 4 years, but are quiteseparate after 4 years, although they appear tocome close again around 12 years.

c. Using the expanded table format, the followinginformation is obtained:

Answers to Practice Exercises 93

Page 40: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

t(f) m1f m2f n1f n2f e1f e2f m1f � e1f m2f � e2f

1.4 0 1 25 25 .500 .500 �.500 .5001.6 0 1 25 24 .510 .490 �.510 .5101.8 1 1 25 23 1.042 .958 �.042 .0422.2 1 0 24 22 .522 .478 .478 �.4782.4. 0 1 23 22 .511 .489 �.511 .5112.5. 1 0 23 21 .523 .477 .477 �.4772.6 1 0 22 21 .516 .484 .484 �.4842.8 0 1 21 21 .500 .500 �.500 .5002.9 0 1 21 20 .512 .488 �.512 .5123.0 1 0 21 19 .525 .475 .475 �.4753.1 0 1 20 19 .513 .487 �.513 .5133.5 1 1 20 18 1.053 .947 �.053 .0533.6 0 1 19 17 .528 .472 �.528 .5283.8 1 0 19 16 .543 .457 .457 �.4573.9 0 1 18 16 .529 .471 �.529 .5294.1 0 1 18 15 .545 .455 �.545 .5454.2 0 1 18 14 .563 .437 �.563 .5634.7 0 1 18 13 .581 .419 �.581 .5814.9 0 1 18 12 .600 .400 �.600 .6005.2 0 1 18 11 .621 .379 �.621 .6215.3 1 0 18 10 .643 .357 .357 �.3575.4 1 0 17 10 .630 .370 .370 �.3705.7 1 0 16 10 .615 .385 .385 �.3855.8 0 1 15 10 .600 .400 �.600 .6005.9 0 1 15 9 .625 .375 �.625 .6256.5 0 1 15 8 .652 .348 �.652 .6526.6 1 0 15 7 .682 .318 .318 �.3187.8 0 1 14 7 .667 .333 �.667 .6678.2 1 0 14 6 .700 .300 .300 �.3008.3 0 1 13 6 .684 .316 �.684 .6848.4 0 1 13 5 .722 .278 �.722 .7228.7 1 0 13 4 .765 .235 .335 �.3358.8 0 1 12 4 .750 .250 �.750 .7509.1 0 1 12 3 .800 .200 �.800 .8009.2 2 0 12 2 1.714 .286 .286 �.2869.8 1 0 10 2 .833 .167 .167 �.1679.9 0 1 9 2 .818 .182 �.818 .818

10.0 1 0 9 1 .900 .100 .100 �.10010.2 1 0 8 1 .888 .112 .112 �.11210.7 1 0 7 1 .875 .125 .125 �.12511.0 1 0 6 1 .857 .143 .143 �.14311.1 1 0 5 1 .833 .167 .167 �.16711.4 0 1 4 1 .800 .200 �.800 .80011.7 1 0 4 0 1.000 .000 .000 .000

Totals 22 25 30.79 16.21 �8.790 8.790

94 2. Kaplan-Meier Survival Curves and the Log-Rank Test

Page 41: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

d. The log–rank statistic can be computed from thetotals of the expanded table using the formulae:

log-rank statistic ¼ Oi � Eið Þ2dVar Oi � Eið ÞVar Oi � Eið Þ

¼Xi

n1f n2f m1f þ m2f

� �n1f þ n2f � m1f � m2f

� �n1f þ n2f� �2

n1f þ n2f � 1� �

The variance turns out to be 9.658, so that the log–-rank statistic is (8.79)2/9.658 ¼ 7.993.

Using Stata, the results for the log–rank test aregiven as follows:

GroupEvents

observedEventsexpected

1 22 30.792 25 16.21Total 47 47.00

Log-rank ¼ chi2(1) ¼ 7.99P-value ¼ Pr > chi2 ¼ 0.0047

The log–rank test gives highly significant results.This indicates that there is a significant differencein survival between the two groups.

2. a. For the Anderson dataset, the KM plots for the threecategories of log WBC are shown below:

00

0.2

0.4

0.6

0.8

1

8 16

Group 1 (log WBC 0-2.3)

Group 2 (log WBC 2.31-3.0)

Group 3 (log WBC > 3.0)

24 32Weeks

S(t)

b. The KM curves are quite different with group 1having consistently better survival prognosis thangroup 2, and group 2 having consistently bettersurvival prognosis than group 3. Note also that thedifference between group 1 and 2 is about the sameover time, whereas group 2 appears to diverge fromgroup 3 as time increases.

c. The log–rank statistic (26.391) is highly significantwith P-values equal to zero to three decimal places.These results indicate that there is some overalldifference between the three curves.

Answers to Practice Exercises 95

Page 42: [Statistics for Biology and Health] Survival Analysis || Kaplan-Meier Survival Curves and the Log-Rank Test

Appendix:MatrixFormulafor theLog-RankStatistic forSeveralGroups

For i ¼ 1, 2,. . ., G and f ¼ 1, 2,. . ., k, where G ¼ # of groupsand k ¼ # of distinct failure times,

nif ¼ # at risk in ith group at fth ordered failure time

mif ¼ observed # of failures in ith group at fth orderedfailure time

eif¼ expected # of failures in ith group at fth ordered failuretime

¼ nifn1f þ n2f

� m1f þ m2f

� �nf ¼

XGi¼1

nif

mf ¼XGi¼1

mif

Oi � Ei ¼Xkf¼1

mif � eif� �

Var Oi � Eið Þ ¼Xkf¼1

nif nf � nif� �

mif nf � mf

� �n2f nf � 1� � !

Cov Oi � Ei; Ol � Elð Þ ¼Xkf¼1

�nif nlf mf nf � mf

� �n2f nf � 1� � !

d ¼ O1 � E1; O2 � E2; . . . ; OG�1 � EG�1ð Þ0V ¼ vilð Þð Þwhere vii¼ Var (Oi� Ei) and vil¼ Cov (Oi� Ei,Ol� El) for i¼ 1, 2,. . ., G � 1; l ¼ 1, 2,. . ., G � 1.

Then, the log–rank statistic is given by the matrix productformula:

Log-rank statistic ¼ d0V�1d

which has approximately a chi-square distribution withG � 1 degrees of freedom under the null hypothesis thatall G groups have a common survival curve.

96 2. Kaplan-Meier Survival Curves and the Log-Rank Test