
Chapter 7: Hypothesis testing

Hypothesis testing is typically based on the cumulative hazard function. Here we’ll use the Nelson-Aalen estimate of the cumulative hazard. The survival function is used to weight differences between the observed and expected cumulative hazard.

Recall that the Nelson-Aalen estimate of the cumulative hazard is

$$\tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}$$

In a one-sample problem, you test whether the hazard rate $h(t)$ is equal to some reference hazard, $h_0(t)$. The null hypothesis is $H_0: h(t) = h_0(t)$. Under the null hypothesis, the expected hazard rate at time $t_i$ is $h_0(t_i)$.
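As a quick refresher, here is a minimal base-R sketch of the Nelson-Aalen estimate. The toy data and the function name `nelson_aalen` are made up for illustration and are not from the course data.

```r
# Hypothetical toy data: follow-up times and event indicators
# (1 = death, 0 = censored)
time   <- c(2, 3, 3, 5, 7, 8, 8, 10)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

nelson_aalen <- function(time, status) {
  t_i <- sort(unique(time[status == 1]))                        # distinct death times
  d_i <- sapply(t_i, function(t) sum(time == t & status == 1))  # deaths at t_i
  Y_i <- sapply(t_i, function(t) sum(time >= t))                # at risk just before t_i
  data.frame(time = t_i, d = d_i, Y = Y_i, H = cumsum(d_i / Y_i))
}

na <- nelson_aalen(time, status)
print(na)
```

Each row gives a death time, the number of deaths, the number at risk, and the running sum of d/Y, which is the estimated cumulative hazard at that time.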

SAS Programming March 6, 2015 1 / 43


Hypothesis testing: one sample

The idea is then to compare the observed and expected cumulative hazard rates at time $\tau$, the largest time in the study ($\tau = t_D$ if the largest time is a death time). The test statistic is then

$$Z(\tau) = O(\tau) - E(\tau) = \sum_{i=1}^{D} W(t_i)\,\frac{d_i}{Y_i} - \int_0^{\tau} W(s)\,h_0(s)\,ds$$

where $W(\cdot)$ is a weight function.

The variance is

$$V[Z(\tau)] = \int_0^{\tau} W^2(s)\,\frac{h_0(s)}{Y(s)}\,ds$$


Hypothesis testing

Under the null hypothesis, the expected value of $Z(\tau)$ is 0, so if we take a z-score of $Z(\tau)$ (subtracting the mean and dividing by the standard deviation), we get

$$\frac{Z(\tau)}{\sqrt{V[Z(\tau)]}}$$

which has an approximate standard normal distribution. This can be used for either a two-sided or one-sided test. For example, a one-sided test would be $H_1: h(t) > h_0(t)$, and you would reject only for large values of

$$\frac{Z(\tau)}{\sqrt{V[Z(\tau)]}}$$


Hypothesis testing

The most popular choice for a weighting function is $W(t) = Y(t)$, which leads to

$$O(\tau) = \sum_{i=1}^{D} Y(t_i)\,\frac{d_i}{Y_i} = \sum_{i=1}^{D} d_i$$

This is also called the log-rank test (not sure why).

Other weight functions are possible. For example

$$W(t) = Y(t)\,S_0(t)^p\,[1 - S_0(t)]^q$$

with $0 \le p, q \le 1$ (you don’t necessarily need $q = 1 - p$ here). The choice of $p$ affects whether you care more about the hazard not matching the hypothesized hazard for small $t$ or large $t$. For example, if $p$ is large, then more emphasis is placed on the estimated hazard matching the null hazard for small values of $t$.

$S_0(t)$ can be obtained from $S_0(t) = \exp[-H_0(t)]$.
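Here is a small sketch of the one-sample log-rank test in base R. It assumes $W(t) = Y(t)$, a constant reference hazard `h0` (a made-up value), and that everyone enters the study at time 0. Under those assumptions $E(\tau)$ reduces to the sum of $H_0(T_j) = h_0 T_j$ over subjects, and the variance equals $E(\tau)$. The data are hypothetical.

```r
# Hypothetical toy data (not from the slides)
time   <- c(2, 3, 3, 5, 7, 8, 8, 10)   # follow-up times
status <- c(1, 1, 0, 1, 0, 1, 1, 0)    # 1 = death, 0 = censored
h0     <- 0.05                         # assumed constant reference hazard

O <- sum(status)        # observed deaths: sum of d_i
E <- sum(h0 * time)     # expected deaths: integral of Y(s) h0(s) ds
z <- (O - E) / sqrt(E)  # variance equals E when W(t) = Y(t)

# One-sided test of H1: h(t) > h0(t), rejecting for large z
p_upper <- pnorm(z, lower.tail = FALSE)
cat("O =", O, " E =", E, " z =", round(z, 3), " p =", round(p_upper, 4), "\n")
```

With a time-varying reference hazard you would replace `h0 * time` by the reference cumulative hazard $H_0$ evaluated at each subject's follow-up time.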


Hypothesis testing

An example where you would use the one-sided hypothesis test is in testing whether some population has a higher hazard than a reference population, such as the psychiatric patients from Iowa. Recall that for this example, we looked at excess mortality previously.


Hypothesis testing: two or more samples

If you have two or more samples (e.g., mortality for three different treatments or three different risk groups), then the null and alternative hypotheses are similar to those for ANOVA:

$$H_0: h_1(t) = h_2(t) = \cdots = h_K(t), \quad \text{for all } t \le \tau$$

$$H_A: h_i(t) \ne h_j(t) \text{ for some } i \ne j \text{ and some } t \le \tau$$

where $\tau$ is the largest time at which all of the groups have at least one subject at risk.


Hypothesis testing: two or more samples

We now define $t_i$, $i = 1, \ldots, D$, as the unique death times for the pooled data (i.e., ignoring the group that each observation comes from), and again $t_D$ is the largest death time.

We observe $d_{ij}$ deaths at time $t_i$ in sample $j$, and there are $Y_{ij}$ individuals at risk at time $t_i$ in sample $j$. We let $d_i = \sum_{j=1}^{K} d_{ij}$ be the total number of deaths at time $t_i$ and $Y_i = \sum_{j=1}^{K} Y_{ij}$ be the total number of individuals at risk (that is, alive and under observation just before $t_i$).


Hypothesis testing: two or more samples

The idea for testing the hypothesis is that under the null hypothesis, the estimate of the hazard (and cumulative hazard) should be the same (in expectation) using the pooled data (ignoring the group the samples are from) and for the individual samples. We can think of the pooled data as providing a more precise estimate of the hazard for the $j$th sample than the $j$th sample itself, so using the idea of observed minus expected, we can write

$$Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i)\left(\frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i}\right), \quad j = 1, \ldots, K$$

If all of the $Z_j(\tau)$ terms are close to 0, then all of the sample estimated cumulative hazards are close to the pooled cumulative hazard, so they all must be close to each other, and this supports the null hypothesis.


Hypothesis testing: two or more samples

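The typical weight function used is $W_j(t_i) = Y_{ij}\,W(t_i)$, where $W(t_i)$ is a common weight shared by each group. For this weighting scheme,

$$Z_j(\tau) = \sum_{i=1}^{D} W(t_i)\left[d_{ij} - Y_{ij}\left(\frac{d_i}{Y_i}\right)\right], \quad j = 1, \ldots, K$$

$$V[Z_j(\tau)] = \sigma_{jj} = \sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{ij}}{Y_i}\left(1 - \frac{Y_{ij}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i, \quad j = 1, \ldots, K$$

$$\mathrm{cov}(Z_j(\tau), Z_k(\tau)) = \sigma_{jk} = -\sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{ij}}{Y_i}\,\frac{Y_{ik}}{Y_i}\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i, \quad j \ne k$$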
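To see these formulas in action, here is a rough base-R sketch that accumulates $Z_j(\tau)$ and the covariance matrix over the pooled death times and then forms a quadratic form in the first $K-1$ components. The data and the function name `k_sample_test` are hypothetical. For simplicity the sketch sums over all death times rather than stopping at $\tau$, and `W` is an assumed interface that maps $Y_i$ to the common weight $W(t_i)$.

```r
# Hypothetical toy data (not from the slides): three groups of four subjects
time   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1)   # 1 = death, 0 = censored
group  <- rep(1:3, times = 4)

k_sample_test <- function(time, status, group, W = function(Y) 1) {
  groups <- sort(unique(group))
  K <- length(groups)
  Z <- numeric(K)
  Sigma <- matrix(0, K, K)
  for (ti in sort(unique(time[status == 1]))) {     # pooled death times
    Yij <- sapply(groups, function(g) sum(time >= ti & group == g))
    dij <- sapply(groups, function(g) sum(time == ti & status == 1 & group == g))
    Yi <- sum(Yij)
    di <- sum(dij)
    w  <- W(Yi)
    Z  <- Z + w * (dij - Yij * di / Yi)             # observed minus expected
    if (Yi > 1) {
      p <- Yij / Yi
      # diagonal: p_j (1 - p_j); off-diagonal: -p_j p_k
      Sigma <- Sigma + w^2 * di * (Yi - di) / (Yi - 1) *
        (diag(p, nrow = K) - outer(p, p))
    }
  }
  idx <- seq_len(K - 1)                             # drop the last group
  chisq <- drop(Z[idx] %*% solve(Sigma[idx, idx, drop = FALSE], Z[idx]))
  list(Z = Z, Sigma = Sigma, chisq = chisq, df = K - 1,
       p.value = pchisq(chisq, df = K - 1, lower.tail = FALSE))
}

res <- k_sample_test(time, status, group)
print(res$Z)
print(res$chisq)
```

The components of `res$Z` sum to zero and each row of `res$Sigma` sums to zero, which is why only $K-1$ of the components are used in the quadratic form.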


Hypothesis testing: two or more samples

Based on the second formula for $Z_j(\tau)$, the sum $\sum_{j=1}^{K} Z_j(\tau)$ is equal to 0, meaning that the $Z_j(\tau)$ are not independent of one another. In particular, $Z_K(\tau)$ is a linear combination of $Z_1(\tau), \ldots, Z_{K-1}(\tau)$. Consequently, we construct a test statistic based on just the first $K - 1$ terms:

$$\chi^2 = (Z_1(\tau), \ldots, Z_{K-1}(\tau))\,\Sigma^{-1}\,(Z_1(\tau), \ldots, Z_{K-1}(\tau))'$$

where $(Z_1(\tau), \ldots, Z_{K-1}(\tau))$ is interpreted as a row vector of length $K - 1$ and $\Sigma$ is a $(K - 1) \times (K - 1)$ covariance matrix (if you had made a $K \times K$ matrix using all the variables, it wouldn’t be full rank, and therefore not invertible). Under the null hypothesis, the statistic has an approximate $\chi^2$ distribution with $K - 1$ degrees of freedom, and you can base the test on this distribution.


Hypothesis testing: two samples

Several weight functions are possible. $W(t) = 1$ for all $t$ leads to the two-sample log-rank test. $W(t_i) = Y_i$ (the Gehan test) and $W(t_i) = \sqrt{Y_i}$ (the Tarone-Ware test) have also been used.

In the case of $K = 2$ samples, the test statistic can be written as

$$Z = \frac{\sum_{i=1}^{D} W(t_i)\left[d_{i1} - Y_{i1}\left(\frac{d_i}{Y_i}\right)\right]}{\sqrt{\sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{i1}}{Y_i}\left(1 - \frac{Y_{i1}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i}}$$

Since we don’t have to square in this case, we can do one-sided as well as two-sided hypothesis tests based on a standard normal distribution instead of a $\chi^2$, or you can square the statistic and use a $\chi^2_1$ distribution.
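The two-sample statistic is easy to compute by hand. Below is a minimal base-R sketch with the three weight choices mentioned above; the function name `two_sample_z`, the `weight` argument, and the toy data are all made up for illustration, and groups are assumed to be coded 1 and 2.

```r
# Hypothetical toy data (not from the slides): two groups of five subjects
time   <- c(1, 3, 4, 6, 8,  2, 5, 7, 9, 10)
status <- c(1, 1, 1, 0, 1,  1, 0, 1, 1, 0)    # 1 = death, 0 = censored
group  <- rep(1:2, each = 5)

two_sample_z <- function(time, status, group,
                         weight = c("logrank", "gehan", "tarone-ware")) {
  weight <- match.arg(weight)
  num <- 0
  den <- 0
  for (ti in sort(unique(time[status == 1]))) {    # pooled death times
    Yi1 <- sum(time >= ti & group == 1)
    Yi  <- sum(time >= ti)
    di1 <- sum(time == ti & status == 1 & group == 1)
    di  <- sum(time == ti & status == 1)
    w <- switch(weight, logrank = 1, gehan = Yi, "tarone-ware" = sqrt(Yi))
    num <- num + w * (di1 - Yi1 * di / Yi)
    if (Yi > 1) {
      den <- den + w^2 * (Yi1 / Yi) * (1 - Yi1 / Yi) * (Yi - di) / (Yi - 1) * di
    }
  }
  num / sqrt(den)
}

z_lr    <- two_sample_z(time, status, group, "logrank")
z_gehan <- two_sample_z(time, status, group, "gehan")
z_tw    <- two_sample_z(time, status, group, "tarone-ware")
print(c(logrank = z_lr, gehan = z_gehan, tarone_ware = z_tw))
print(2 * pnorm(abs(z_lr), lower.tail = FALSE))   # two-sided p-value
```

Swapping the group labels flips the sign of the statistic, which is what makes one-sided tests possible here.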


Hypothesis testing: two samples

This example compares kidney dialysis patients with surgically implanted catheters versus percutaneous (needle-puncture) placement of the catheter. Even though the survival curves look fairly different after 1 year or so, the differences are not statistically significant. Note that there are also very few observations for the percutaneous sample.

Actually, the number of observations is fairly small for both samples, so the confidence intervals would be fairly wide.


Hypothesis testing: two samples

Different choices for the weight function affect the p-value. It is reassuring if a lot of weighting schemes give the same conclusion. The cases where the p-value was low were those where the weighting scheme gave a lot of weight to differences in the hazard for large values of $t_i$, which of course is where the curves appear different. Such weights can also be sensitive to differences in censoring patterns in the two samples, so they should be used cautiously.

A problem with using lots of weighting schemes arises when different weights conflict and you report only the schemes that give the results you want. This would be dishonest, so you should either pick a weighting scheme and stick to it, or report the results of all the weighting schemes that you used.


Hypothesis testing: weight functions

The most common weight functions are either flat, $W(t_i) = 1$, or decreasing, with $W(t_i) = Y_i$. A weight function that is increasing might be used to compare longer-term survival when early deaths might be due to complications rather than to the long-term effectiveness of a treatment.

An example is in comparing autologous transplants versus allogeneic transplants of bone marrow for leukemia. Allogeneic transplant patients (receiving bone marrow from a sibling) tend to have more complications early on, reducing early survival rates (and increasing early hazard rates), but if interest is in long-term survival, then a weight function could be used that emphasizes later times.


Hypothesis testing in R

To test the difference in survival curves in R, you can use survdiff() from the survival package. An example is with the allo- versus auto- patients in the leukemia data.

> library(survival)
> # columns as used in the call: V1 = time, V2 = event indicator, V3 = group
> x <- read.table("leukemia2.txt")
> a <- survdiff(Surv(x$V1,x$V2)~factor(x$V3))
> a
Call:
survdiff(formula = Surv(x$V1, x$V2) ~ factor(x$V3))

                N Observed Expected (O-E)^2/E (O-E)^2/V
factor(x$V3)=1 51       28     25.8     0.182     0.382
factor(x$V3)=2 50       22     24.2     0.195     0.382

 Chisq= 0.4  on 1 degrees of freedom, p= 0.537

The results suggest that the two groups had survival experiences that were not statistically significantly different from each other.


Hypothesis testing in R

To plot the two survival curves together you can use

> library(survival)
> x <- read.table("leukemia2.txt")
> # Kaplan-Meier fit for each group separately
> a <- survfit(Surv(x$V1[x$V3==1],x$V2[x$V3==1])~1)
> b <- survfit(Surv(x$V1[x$V3==2],x$V2[x$V3==2])~1)
> plot(a,conf.int=FALSE)
> points(b$time,b$surv,type="s",col="red",lwd=3)
> legend(20,1,legend=c("auto","allo"),col=c("black","red"),
+        lty=c(1,1),lwd=c(1,3),cex=1.3)


Hypothesis testing in R

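The survdiff() function in R has an optional parameter rho whose default is 0, which results in the log-rank test. The weights are $S(t)^{\rho}$, where $S(t)$ is the Kaplan-Meier estimate of survival for the pooled data, so rho = 1 gives the Peto and Peto modification of the Gehan-Wilcoxon test. Because $S(t)$ decreases over time, larger values of rho put more weight on earlier times, and the choice can have a big impact on the p-value.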
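Here is a short sketch comparing two values of rho. Since leukemia2.txt is not included here, the example substitutes the `aml` data set that ships with the survival package; that substitution is an assumption made only so the code runs on its own.

```r
library(survival)

# `aml` (columns time, status, x) stands in for the course data file
fit0 <- survdiff(Surv(time, status) ~ x, data = aml, rho = 0)  # log-rank
fit1 <- survdiff(Surv(time, status) ~ x, data = aml, rho = 1)  # Peto-Peto type weights

# Two groups, so each chi-square statistic has 1 degree of freedom
p0 <- pchisq(fit0$chisq, df = 1, lower.tail = FALSE)
p1 <- pchisq(fit1$chisq, df = 1, lower.tail = FALSE)
print(c(rho0 = p0, rho1 = p1))
```

If you try several values of rho, report all of them rather than only the one with the smallest p-value.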


Hypothesis testing in SAS

You can use PROC LIFETEST in SAS to do hypothesis testing. We’ll take a look at examples after the break.


Tests of trend

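For multiple samples ($K > 2$), a different alternative hypothesis is the following:

$$H_A: h_1(t) \le h_2(t) \le \cdots \le h_K(t), \quad \text{for } t \le \tau$$

where at least one inequality is strict. This ordering of the hazards implies the corresponding ordering of the survival functions:

$$H_A: S_1(t) \ge \cdots \ge S_K(t)$$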


Tests of trend

We construct the $Z_j(\tau)$s as before and use any weight functions $W_j(t_i)$. We also pick a new set of weights $a_j$, $j = 1, \ldots, K$, where $a_j = j$ is often used.

The test statistic is now

$$Z = \frac{\sum_{j=1}^{K} a_j Z_j(\tau)}{\sqrt{\sum_{j=1}^{K}\sum_{k=1}^{K} a_j a_k\,\sigma_{jk}}}$$

where $\Sigma = (\sigma_{jk})$ is the $K \times K$ covariance matrix. (It isn’t full rank, but we don’t need the inverse.) The test statistic can be compared to a standard normal.
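A tiny numerical sketch of the trend statistic in base R follows. The values of `Z` and `Sigma` are invented for illustration (chosen so that the components of `Z` sum to zero and each row of `Sigma` sums to zero, as they must), and `trend_z` is a hypothetical helper name.

```r
# Made-up inputs for K = 3 groups (not computed from real data)
Z <- c(-3.0, 0.5, 2.5)
Sigma <- matrix(c( 4, -2, -2,
                  -2,  5, -3,
                  -2, -3,  5), nrow = 3, byrow = TRUE)

trend_z <- function(Z, Sigma, a = seq_along(Z)) {
  # numerator: sum_j a_j Z_j; denominator: sqrt(sum_j sum_k a_j a_k sigma_jk)
  sum(a * Z) / sqrt(drop(a %*% Sigma %*% a))
}

z <- trend_z(Z, Sigma)                      # scores a_j = j
p_one_sided <- pnorm(z, lower.tail = FALSE)
print(c(z = z, p = p_one_sided))

# Shifting the scores by a constant does not change the statistic
z_shifted <- trend_z(Z, Sigma, a = 0:2)
```

The invariance to shifting the scores follows from the $Z_j(\tau)$ summing to zero, so only the spacing of the $a_j$ matters.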


Stratified tests

If the populations being compared differ in their covariates (age, sex, etc.), then ideally you could use a regression approach to survival analysis to adjust for covariates before comparing survival curves or hazard rates. This is done in Chapter 8.

If there are a small number of levels for a predictor, then you can use a stratified test instead.

Let

$$H_0: h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t), \quad s = 1, \ldots, M, \quad t \le \tau$$

The idea is that within each level of the covariate (indexed by $s$), the hazard rate should be the same across the $K$ samples. Typically, $M$ is small.


Stratified tests

For the stratified test, let

$$Z_{j\cdot}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau)$$

$$\sigma_{jk\cdot} = \sum_{s=1}^{M} \sigma_{jks}$$

Then the test statistic is as before with multiple samples:

$$(Z_{1\cdot}(\tau), \ldots, Z_{K-1,\cdot}(\tau))\,\Sigma_{\cdot}^{-1}\,(Z_{1\cdot}(\tau), \ldots, Z_{K-1,\cdot}(\tau))'$$

which is approximately $\chi^2$ with $K - 1$ degrees of freedom. Here we have $K$ samples and $M$ strata within each sample.
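In R, a stratified test can be requested by adding a strata() term to the survdiff() formula. The sketch below uses the `ovarian` data set that ships with the survival package as a stand-in example (an assumption, since the slides do not name a data set here): treatment `rx` defines the $K = 2$ samples and residual disease status `resid.ds` defines the $M = 2$ strata.

```r
library(survival)

# Unstratified log-rank test of treatment
fit_plain <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian)

# Stratified log-rank test: O - E and variances are summed over strata
fit_strat <- survdiff(Surv(futime, fustat) ~ rx + strata(resid.ds),
                      data = ovarian)

p_plain <- pchisq(fit_plain$chisq, df = 1, lower.tail = FALSE)
p_strat <- pchisq(fit_strat$chisq, df = 1, lower.tail = FALSE)
print(c(plain = p_plain, stratified = p_strat))
```

The degrees of freedom stay at $K - 1 = 1$ no matter how many strata are used.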


Renyi type tests

For a two-sample problem, if the hazard functions cross, then the previous tests might not detect much overall difference in the hazard rates. The overall survival experience might look similar even though the groups differ in one direction in the short term and in the other direction in the long term. If one group is more at risk in the short term, and the other in the long term, these changes of direction could cancel out, leading one to fail to reject the null hypothesis that the hazards are equal.

Renyi-type tests are based on the maximum absolute value of the differences between cumulative hazard rates rather than the summed differences.

The idea is similar to the Kolmogorov-Smirnov test for comparing two distributions, which uses the largest absolute value of the difference between the two empirical CDFs, but Renyi tests allow for censoring.


Renyi type tests

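To construct this test, let

$$Z(t_i) = \sum_{t_k \le t_i} W(t_k)\left[d_{k1} - Y_{k1}\left(\frac{d_k}{Y_k}\right)\right], \quad i = 1, \ldots, D$$

where as usual $d_k = d_{k1} + d_{k2}$ and $Y_k = Y_{k1} + Y_{k2}$ (i.e., $d_k$ and $Y_k$ are the pooled number of deaths and number at risk at time $t_k$ over both samples). The variance of $Z(\tau)$ is

$$\sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2\left(\frac{Y_{k1}}{Y_k}\right)\left(\frac{Y_{k2}}{Y_k}\right)\left(\frac{Y_k - d_k}{Y_k - 1}\right) d_k$$

where $\tau$ is the largest death time $t_k$ with $Y_{k1}, Y_{k2} > 0$, and the standard error is $\sigma(\tau)$.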


Renyi type tests

The test statistic is

$$Q = \sup\{|Z(t)|,\ t \le \tau\} / \sigma(\tau)$$

You can think of the supremum here as just the maximum of the absolute values of the $Z(t_i)$ values. Critical values are given in the Appendix, Table C.5, and are based on the theory of Brownian motion.
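Here is a rough base-R sketch of the Renyi-type statistic. The toy data and the names `renyi_test` and `renyi_p` are hypothetical, and groups are assumed to be coded 1 and 2. In place of a table lookup, the p-value uses the standard series for the distribution of the supremum of the absolute value of a Brownian motion on $[0, 1]$; treat that as an approximation that should agree with the tabulated critical values.

```r
# Hypothetical toy data (not from the slides), early deaths in group 1
time   <- c(1, 2, 3, 11, 12, 13,  5, 6, 7, 8, 9, 10)
status <- c(1, 1, 1, 1, 0, 1,     1, 1, 0, 1, 1, 1)   # 1 = death, 0 = censored
group  <- rep(1:2, each = 6)

# P(sup |B(t)| > y) for standard Brownian motion on [0, 1]
renyi_p <- function(y, terms = 50) {
  k <- 0:terms
  1 - (4 / pi) * sum((-1)^k / (2 * k + 1) *
                     exp(-pi^2 * (2 * k + 1)^2 / (8 * y^2)))
}

renyi_test <- function(time, status, group, W = function(Y) 1) {
  Z <- 0
  v <- 0
  path <- numeric(0)
  for (tk in sort(unique(time[status == 1]))) {
    Yk1 <- sum(time >= tk & group == 1)
    Yk2 <- sum(time >= tk & group == 2)
    if (Yk1 == 0 || Yk2 == 0) break        # past tau: one group has nobody at risk
    Yk  <- Yk1 + Yk2
    dk1 <- sum(time == tk & status == 1 & group == 1)
    dk  <- sum(time == tk & status == 1)
    w   <- W(Yk)
    Z <- Z + w * (dk1 - Yk1 * dk / Yk)     # running value Z(t_i)
    v <- v + w^2 * (Yk1 / Yk) * (Yk2 / Yk) * (Yk - dk) / (Yk - 1) * dk
    path <- c(path, Z)
  }
  Q <- max(abs(path)) / sqrt(v)
  list(Q = Q, z_final = Z / sqrt(v), p.value = renyi_p(Q))
}

res <- renyi_test(time, status, group)
print(unlist(res))
```

`z_final` is the usual two-sample statistic evaluated at $\tau$; `Q` can never be smaller than its absolute value, which is why the Renyi test can pick up differences that cancel out by the end of follow-up.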

[Slide content not recovered: Renyi type tests, finding the maximum $|Z(t_j)|$]

Testing based on a fixed point in time

Instead of testing survival and hazard rates over all time points, you might be interested in the 1-yr survival rate. Note that the time being tested should be chosen before doing the test. If you look at two survival curves and say, “Wow, they look really different at year 3, is that significant?” then the p-value will be biased too low.

It is similar to testing at many time points but then not adjusting for multiple comparisons. In practice, this is what happens all the time though. People look at a graph of the data, which is maybe meant to be descriptive, something jumps out at them as being unusual, and they say, “Wow, is that significant?” It’s extremely difficult to answer this type of question. A better approach in this type of case might be the Renyi type of test, because it accounts for the fact that you are looking at maximum differences over the entire time frame.


Testing based on a fixed point in time

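Here we want to test

$$H_0: S_1(t_0) = S_2(t_0)$$

against

$$H_A: S_1(t_0) \ne S_2(t_0)$$

for two survival curves. (The method can be generalized to more survival curves.) The test statistic is

$$Z = \frac{\hat{S}_1(t_0) - \hat{S}_2(t_0)}{\sqrt{\hat{V}[\hat{S}_1(t_0)] + \hat{V}[\hat{S}_2(t_0)]}}$$

where $\hat{S}_j(t_0)$ is the estimated survival function for sample $j$ at $t_0$ and $\hat{V}$ is its estimated variance. The statistic has an approximate standard normal distribution for large samples.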
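A minimal sketch of the fixed-time comparison in R follows, again using the `aml` data from the survival package as a stand-in (an assumption for illustration). The time point `t0 = 12` is arbitrary here; in practice it should be chosen before looking at the curves. summary() on a survfit object reports the Kaplan-Meier estimate and its standard error at the requested time for each group.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
t0  <- 12                                  # pre-specified time point
s   <- summary(fit, times = t0)            # one row per group at time t0

# Difference in estimated survival over the square root of the summed variances
z <- (s$surv[1] - s$surv[2]) / sqrt(sum(s$std.err^2))
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)
print(c(z = z, p = p_two_sided))
```

If `t0` lies beyond the last observed time in one of the groups, summary() drops that group by default, so check that `s$surv` has one value per group before forming the statistic.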


Testing based on a fixed point in time

If you want to test multiple fixed time points, such as the 1-yr and 5-yr survival rates, then you should adjust for multiple comparisons. For testing two time points, a Bonferroni adjustment could be made, meaning that you reject each hypothesis only if the p-value is less than $\alpha/2$. The more time points you check, the less power you will have to find significant differences.


Bonferroni adjustments

Probably the most popular and simplest adjustment for multiple testing is the Bonferroni adjustment. The idea is that to have $k$ tests at overall level $\alpha$ (meaning that if the null hypotheses are true for all $k$ tests, the chance of making an error on any of them is at most $\alpha$, for example 5%), you use a level of $\alpha/k$ for each test.

What is the rationale for doing this?


Bonferroni adjustments

There are several ways to justify Bonferroni adjustments. One is to look at the expected number of false positives under the null. Let $X_i = 1$ if you make an incorrect decision on test $i$ (rejecting a true null hypothesis), and otherwise $X_i = 0$. What type of variable is $X_i$? What is the probability that $X_i = 1$ if the null hypothesis (for experiment $i$) is true? What is the expected value of $X_i$?


Bonferroni adjustments

$X_i$ as defined previously is Bernoulli with $p = \alpha$ if testing using level $\alpha$. The expected value of a Bernoulli($p$) random variable is $p$ (why?), so the expected value of $X_i$ is $\alpha$.

If you do $k$ experiments, the expected number of false positives is

$$E\left[\sum_{i=1}^{k} X_i\right] = k\alpha$$

However, if you test at the $\alpha/k$ level, then the expected number of false positives is $k(\alpha/k) = \alpha$. Thus, the Bonferroni adjustment controls the expected number of false positives.
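A quick simulation makes the expected counts concrete. This sketch assumes all $k$ null hypotheses are true and that the tests are independent, so the p-values are independent uniforms; the values of `k`, `alpha`, and `n_sim` are arbitrary choices.

```r
set.seed(1)
k     <- 10       # number of tests per experiment
alpha <- 0.05
n_sim <- 20000    # number of simulated experiments

# Under the null, each p-value is Uniform(0, 1)
p <- matrix(runif(n_sim * k), nrow = n_sim)

mean_fp_unadj <- mean(rowSums(p < alpha))          # should be near k * alpha
mean_fp_bonf  <- mean(rowSums(p < alpha / k))      # should be near alpha
any_fp_bonf   <- mean(rowSums(p < alpha / k) > 0)  # chance of at least one error

print(c(unadjusted = mean_fp_unadj,
        bonferroni = mean_fp_bonf,
        any_error  = any_fp_bonf))
```

The first two numbers estimate the expected number of false positives with and without the adjustment, and the third estimates the probability of making at least one false positive when each test is run at level $\alpha/k$.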


Bonferroni adjustments

Another approach is to use something called Bonferroni’s inequality. Let $A_i$ be the event that you don’t reject the null hypothesis on test $i$. Suppose we set $P(A_i) = 1 - \alpha/k$ when the null is true. From the Inclusion-Exclusion formula

$$P(A_1 A_2) = P(A_1) + P(A_2) - P(A_1 \cup A_2) \ge P(A_1) + P(A_2) - 1$$

If we apply the formula again, setting $B = A_1 A_2$, we get

$$P(A_1 A_2 A_3) = P(B A_3) \ge P(B) + P(A_3) - 1 \ge [P(A_1) + P(A_2) - 1] + P(A_3) - 1 = P(A_1) + P(A_2) + P(A_3) - 2$$

In general for $k$ events

$$P(A_1 \cdots A_k) \ge \sum_{i=1}^{k} P(A_i) - (k - 1)$$


Bonferroni adjustments

If $P(A_i) = 1 - \alpha/k$, then we get

$$P(A_1 \cdots A_k) \ge k\left(1 - \frac{\alpha}{k}\right) - k + 1 = 1 - \alpha$$

Thus, the probability of all decisions being correct is at least $1 - \alpha$, and the probability of making any wrong decision is at most $\alpha$.


Bonferroni adjustments

Bonferroni’s inequality can be useful in other probabilistic arguments as well.
