
Equivalence Testing - University of Iowa (homepage.stat.uiowa.edu/.../Class_notes/equivalence_testing.pdf)


Equivalence Testing

• Consider the common hypothesis test...

$H_0: \mu_1 = \mu_2$ vs. $H_A: \mu_1 \neq \mu_2$

...and it turns out that the client is content or even happy when the null hypothesis is NOT rejected.

• This should send up a red flag in terms of the analysis that was done.

In reality, did the researcher NOT want to find significance?

Were they trying to show that the groups were the same?

When we don’t reject, what can be said about the group means?


The Dilemma of the Non-rejected Null

• Fail to reject ⇒ There is not statistically significant evidence that the population means are different.

Without more information, we usually consider these groups similar. But this is under the idea that you’re looking for evidence that they are different.

If the researcher wants to claim that they’re ‘similar’, it’s not enough to do the traditional hypothesis test and just ‘not reject’. (If that were the case, just sample 2 people from each group... voila! They’re the same!)

• Under our traditional hypothesis testing, we assume the null is true right from the start. (If you want to SHOW that the null is true, you certainly can’t ASSUME it to be true.)
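To make the "sample 2 people from each group" point concrete, here is a minimal sketch of the power of the usual two-sided z-test. It assumes a known common standard deviation `sigma`, and the numbers (a true difference of one standard deviation, group sizes 2, 20 and 200) are purely illustrative and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_power(true_diff, sigma, n1, n2, alpha=0.05):
    """Power of the two-sided z-test of H0: mu1 = mu2 (sigma assumed known)."""
    z = NormalDist()
    se = sigma * sqrt(1 / n1 + 1 / n2)      # standard error of ybar1 - ybar2
    crit = z.inv_cdf(1 - alpha / 2)         # two-sided critical value
    shift = true_diff / se                  # mean of the test statistic
    # Probability that the statistic lands in either rejection tail.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


# Hypothetical scenario: the means truly differ by one standard deviation.
for n in (2, 20, 200):
    print(n, round(two_sided_power(true_diff=1.0, sigma=1.0, n1=n, n2=n), 3))
```

With only 2 observations per group the test rarely rejects even though the means really differ, so a non-rejection on its own says little about similarity.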


• If you want to show that the groups are similar, first ASSUME that they are different, and then try to gather evidence to the contrary (i.e. evidence that suggests they are the same).

⇒ This is Equivalence Testing

$H_0: \mu_1 \neq \mu_2$ vs. $H_A: \mu_1 = \mu_2$

• The difficult question in these tests...

“How close is close enough to be considered ‘the same’”?

Can we specify a ‘Practical Equivalence’ value?

$\underbrace{H_0: |\mu_1 - \mu_2| > \Delta}_{\text{Inequivalent}}$ vs. $\underbrace{H_A: |\mu_1 - \mu_2| < \Delta}_{\text{Equivalent}}$


• In equivalence testing, the null hypothesis is a “difference of ∆ or more.”

Restating $H_0$...

$H_0: \mu_1 - \mu_2 < -\Delta$ or $\mu_1 - \mu_2 > \Delta$

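• This leads to the most basic form of equivalence testing, the two one-sided tests (TOST) procedure.

The test of $\mu_1 - \mu_2 < -\Delta$ rejects when

$$\frac{(\bar{y}_1 - \bar{y}_2) + \Delta}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > z_{1-\alpha},$$

and the test of $\mu_1 - \mu_2 > \Delta$ rejects when

$$\frac{(\bar{y}_1 - \bar{y}_2) - \Delta}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < -z_{1-\alpha}.$$

We declare the two group means equivalent at the α level if, and only if, both are rejected.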
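The two rejection rules can be sketched in a few lines of Python. This is only an illustration of the z-based TOST with known `sigma`; the function name `tost_z` and the example numbers are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def tost_z(ybar1, ybar2, sigma, n1, n2, delta, alpha=0.05):
    """Two one-sided z-tests for equivalence of two means (sigma assumed known)."""
    se = sigma * sqrt(1 / n1 + 1 / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # z_{1-alpha}
    diff = ybar1 - ybar2
    z_lower = (diff + delta) / se              # tests mu1 - mu2 < -delta
    z_upper = (diff - delta) / se              # tests mu1 - mu2 > +delta
    reject_lower = z_lower > z_crit
    reject_upper = z_upper < -z_crit
    return {
        "z_lower": z_lower,
        "z_upper": z_upper,
        # Equivalence is declared only if BOTH one-sided nulls are rejected.
        "equivalent": reject_lower and reject_upper,
    }


# Hypothetical data: same observed difference, two different sample sizes.
print(tost_z(10.2, 10.0, sigma=2.0, n1=50, n2=50, delta=1.0))
print(tost_z(10.2, 10.0, sigma=2.0, n1=10, n2=10, delta=1.0))
```

In this made-up example the same observed difference supports equivalence with 50 observations per group but not with 10, because the smaller sample cannot rule out a difference of ∆.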


• Confidence interval (CI) approach:

Construct the $100(1-2\alpha)\%$ CI for the difference in means: $\bar{Y}_1 - \bar{Y}_2 \pm z_{1-\alpha} \cdot \mathrm{se}(\bar{Y}_1 - \bar{Y}_2)$

If both one-sided tests are simultaneously rejected, this CI will be contained in the ±∆ interval.

If the CI for the difference is completely contained in the interval with endpoints −∆ and +∆, then we declare equivalence.

As opposed to classical testing, the researcher WANTS this confidence interval to contain zero.

We want to be able to say the difference is very likely to be practically zero, i.e. within ±∆ (beyond random chance).
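The CI rule is the same decision as the pair of one-sided tests, just rearranged: the lower CI endpoint exceeds −∆ exactly when the first test rejects, and the upper endpoint is below +∆ exactly when the second does. A minimal sketch, again assuming known `sigma` and using hypothetical helper names and numbers:

```python
from math import sqrt
from statistics import NormalDist


def equivalence_ci(ybar1, ybar2, sigma, n1, n2, alpha=0.05):
    """100(1 - 2*alpha)% z-interval for mu1 - mu2 (sigma assumed known)."""
    se = sigma * sqrt(1 / n1 + 1 / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # note: 1 - alpha, not 1 - alpha/2
    diff = ybar1 - ybar2
    return diff - z_crit * se, diff + z_crit * se


def is_equivalent(ci, delta):
    """Declare equivalence only if the whole interval lies inside (-delta, +delta)."""
    lower, upper = ci
    return -delta < lower and upper < delta


# Hypothetical data: a 90% interval corresponds to alpha = 0.05.
ci = equivalence_ci(10.2, 10.0, sigma=2.0, n1=50, n2=50)
print(ci, is_equivalent(ci, delta=1.0), is_equivalent(ci, delta=0.5))
```

The same interval can support equivalence for one choice of ∆ and not for a stricter one, which is why ∆ should be fixed before looking at the data.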

• In the cheese example, the client chose ∆ as 2 units (on a 20-point scale).


Even without accounting for multiple testing, none of the new cheeses could be declared equivalent to the milk fat cheese at the α = 0.05 level (i.e. controlling the probability of rejecting inequivalence when it was actually true at the 0.05 level).


This procedure was used to compare vaccination rates (as a percentage) among different races/ethnicities and different vaccines in Barker, et al. (2002).

[Figure panels: Black, Hispanic]

Figure. Equivalencies in early childhood immunization coverage (as %) by self-reported race/ethnicity for minority groups versus Whites, National Immunization Survey, 2000. (The researchers have chosen ∆ as 5%.)


• When has this come up?

– New food item meant to be a substitute

– New generic drug compared to old standard (bioequivalence)

• This process makes more sense logically because more samples give us more power for detecting ‘equivalence’.

• It may be a subtle difference to pick up from your client, but sometimes an equivalence test is more appropriate.
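The claim that more samples give more power for detecting equivalence can be checked numerically. The sketch below computes the power of the z-based TOST for equal group sizes, assuming known `sigma`; the function name `tost_power` and the scenario (truly identical means, ∆ equal to half a standard deviation) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def tost_power(true_diff, sigma, n, delta, alpha=0.05):
    """Probability that the z-based TOST declares equivalence (n per group)."""
    z = NormalDist()
    se = sigma * sqrt(2 / n)
    z_crit = z.inv_cdf(1 - alpha)
    # Equivalence is declared when -delta + z_crit*se < diff < delta - z_crit*se,
    # where diff ~ Normal(true_diff, se^2).
    upper = (delta - true_diff) / se - z_crit
    lower = (-delta - true_diff) / se + z_crit
    # If the region is empty (interval too wide), equivalence can never be declared.
    return max(0.0, z.cdf(upper) - z.cdf(lower))


# Hypothetical scenario: the means are truly identical.
for n in (2, 20, 200):
    print(n, round(tost_power(true_diff=0.0, sigma=1.0, n=n, delta=0.5), 3))
```

Under these assumptions a tiny sample has no chance of establishing equivalence, the opposite of what happens when "not rejecting" a classical test is read as similarity.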


I performed an equivalence test for submission for a biology collaborator.

Here is a plot of the length of the Organ of Corti (part of the inner ear) for two different mouse genotypes (wild type and double-conditional knock-out).

[Figure: OC length (mean +/- 1 SEM) versus Days (0, 21, 120) for WT and dCKO.]

They are obviously quite close within each time point, but can we consider them equivalent?


Confidence interval on the difference:

Simultaneous confidence intervals (no multiple comparison adjustment)

[Figure: OC Length Equivalence Testing between WT and dCKO, with confidence intervals of differences. x-axis: Difference in OC lengths between WT and dCKO (mm), from -1.0 to 1.0. y-axis: Time Point (P0, P21, Month 4).]

Simultaneous confidence intervals (with Bonferroni multiple comparison adjustment)

[Figure: OC Length Equivalence Testing between WT and dCKO, with confidence intervals of differences. x-axis: Difference in OC lengths between WT and dCKO (mm), from -1.0 to 1.0. y-axis: Time Point (P0, P21, Month 4).]
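One way to produce the two sets of intervals is sketched below. The slides do not say which interval formula was used for the mouse data, so this assumes z-based intervals and applies the Bonferroni adjustment by replacing α with α/m for m time points. The differences and standard errors are made up for illustration and are not the collaborator's data.

```python
from statistics import NormalDist


def equivalence_cis(diffs, ses, alpha=0.05, bonferroni=False):
    """100(1 - 2*alpha)% z-intervals, optionally Bonferroni-adjusted over all comparisons."""
    m = len(diffs)
    per_test_alpha = alpha / m if bonferroni else alpha
    z_crit = NormalDist().inv_cdf(1 - per_test_alpha)
    return [(d - z_crit * s, d + z_crit * s) for d, s in zip(diffs, ses)]


# Hypothetical estimates (mm) for three time points; NOT the real OC data.
labels = ["P0", "P21", "Month 4"]
diffs = [0.05, -0.10, 0.02]
ses = [0.15, 0.20, 0.18]

unadjusted = equivalence_cis(diffs, ses)
adjusted = equivalence_cis(diffs, ses, bonferroni=True)
for label, u, a in zip(labels, unadjusted, adjusted):
    print(label, [round(x, 3) for x in u], [round(x, 3) for x in a])
```

The adjusted intervals are always wider, so a time point can pass the ±∆ check without adjustment and fail it with adjustment.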


• Some references:

Barker, W.I., Luman, E.T., McCauley, M.M., and Chu, S.Y. (2002). Assessing Equivalence: An Alternative to the Use of Difference Tests for Measuring Disparities in Vaccination Coverage. Am J Epidemiol, 156:1056-1061.

Lauzon, C. and Caffo, B. (2009). Easy Multiplicity Control in Equivalence Testing Using Two One-Sided Tests. The American Statistician, 63:147-154.

Rogers, J.L., Howard, K.I., and Vessey, J.T. (1993). Using Significance Tests to Evaluate Equivalence Between Two Experimental Groups. Psychological Bulletin, 113:553-565.

Tempelman, R. (2004). Experimental Design and Statistical Methods for Classical and Bioequivalence Hypothesis Testing With an Application to Dairy Nutrition Studies. Journal of Animal Science, 82:162-172.

Wellek, S. (2003). Testing Statistical Hypotheses of Equivalence. Boca Raton, FL: Chapman and Hall/CRC.
