Week 9 Testing Hypotheses

Philosophy of Hypothesis Testing

Model

Data

Null hypothesis, H0 (and alternative, HA)

Test statistic, T

p-value = prob(T is as extreme as the observed value), if H0 is true

Interpret p-value

Examples of H0 and HA

Null hypothesis examples:

• There is no extrasensory perception.

• There is no difference between the mean pulse rates of men and women.

• There is no relationship between exercise intensity and the resulting aerobic benefit.

Alternative hypotheses examples:

• There is extrasensory perception.

• Men have lower mean pulse rates than women do.

• Increasing exercise intensity increases the resulting aerobic benefit.

Side-effects of generic drug

Pharmac considers replacing an expensive drug with a cheaper generic ‘equivalent’. It is known that 20% of patients experience side-effects from the original drug.

Null, H0: 20% (or fewer) will have side effects.

Altern, HA: More than 20% will have side effects.

Does the generic equivalent have more side-effects?

H0: π = 0.2

HA: π > 0.2

(π = proportion of patients who experience side-effects from the generic drug)

Logic of hypothesis testing

Like “Presumed innocent until proven guilty”

Assume H0 is true and evaluate the evidence against it

Assume H0 is true and find P(getting data like what was observed)

Psychic Powers

Person correctly guesses outcome of 100 coin flips.

Psychic powers?

(or cheating!!!)

Guessing?

Psychic Powers

Model: Each of 100 ‘guesses’ is independent

Each has prob π of being correct

Data: 100 guesses with all 100 correct

H0: π = 0.5 (guessing)

HA: π > 0.5 (psychic or cheating)

Test stat: number correct

Psychic Powers

p-value: P(all 100 correct) if guessing = (½)^100 ≈ 7.9 × 10^-31 = very small

Conclusion: Sample results inconsistent with H0 (guessing)

Extremely strong evidence of psychic powers (or cheating)

Interpreting p-value

p-value > 0.1: No evidence that H0 is wrong. Data could easily have occurred by chance with H0.

0.05 < p-value < 0.1: Only mild evidence that H0 is wrong.

0.01 < p-value < 0.05: Moderately strong evidence that H0 is wrong.

p-value < 0.01: Strong evidence that H0 is wrong. Data unlikely to have occurred if H0 were true.
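The evidence scale can be written as a small lookup. This is only a sketch in Python: the function name `interpret_p_value` is hypothetical, and the treatment of values exactly on a boundary (0.01, 0.05, 0.1) is an assumption, since the slide uses strict inequalities.

```python
def interpret_p_value(p):
    """Map a p-value to the verbal evidence scale from the slide."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumption: a value exactly on a boundary falls into the
    # stronger-evidence band (e.g. p = 0.05 is "moderately strong").
    if p > 0.1:
        return "no evidence that H0 is wrong"
    if p > 0.05:
        return "only mild evidence that H0 is wrong"
    if p > 0.01:
        return "moderately strong evidence that H0 is wrong"
    return "strong evidence that H0 is wrong"


for p in (0.186, 0.07, 0.015, 0.0001):
    print(p, "->", interpret_p_value(p))
```

The scale describes strength of evidence against H0; it never returns "H0 is true".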

Interpreting p-value

However large the p-value, you must never conclude that H0 is true. The best you can say is that there is no evidence H0 is false.

(You could never conclude that H0: π = 0.5, since π could just as easily be 0.50000001; you would have no chance of distinguishing.)

Even if p-value < 0.01, there is still some chance that H0 could be true.

Testing a proportion

Model: n independent trials

Each has prob π of being a success

Data: x successes

Possible null and alternative hypotheses:

H0: π = π0 vs HA: π ≠ π0

H0: π ≥ π0 vs HA: π < π0

H0: π ≤ π0 vs HA: π > π0

Testing a proportion

Test statistic: X = #success ~ binomial(n, π0) if H0 is true

p-value = prob(X as extreme as observed, if H0 is true)

Evaluate from binomial(n, π0) using Minitab

Side-effects of drug

Standard drug has p(side effect) = 0.2

Generic drug: x = 14 out of n = 50 with side-effect

H0: π = 0.2

HA: π > 0.2

p-value = prob(X ≥ 14) for binomial(n = 50, π = 0.2)

= 1 – prob(X ≤ 13)

= 1 – 0.8894

= 0.1106

No evidence of higher rate of side-effects with generic
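The slides evaluate this tail probability in Minitab. As an alternative, here is a minimal Python sketch that sums the binomial probabilities directly; the helper name `binomial_upper_tail` is hypothetical and not part of the course material.

```python
from math import comb


def binomial_upper_tail(x, n, pi0):
    """prob(X >= x) when X ~ binomial(n, pi0)."""
    return sum(
        comb(n, k) * pi0**k * (1 - pi0) ** (n - k)
        for k in range(x, n + 1)
    )


# Generic drug: 14 of 50 patients with side-effects, H0: pi = 0.2
p_value = binomial_upper_tail(14, 50, 0.2)
print(f"p-value = {p_value:.4f}")
```

The printed value should agree with the 1 – 0.8894 calculation on the slide, up to rounding.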

Testing a proportion

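Test statistic: X = #success ~ binomial(n, π0) if H0 is true

Big sample: nπ0 ≥ 10 and n(1 − π0) ≥ 10

Sample proportion p̂ = X/n (normal written as mean, standard deviation):

$$\hat{p} \sim \text{approx normal}\left(\pi_0,\ \sqrt{\frac{\pi_0(1-\pi_0)}{n}}\right)$$

$$z = \frac{\hat{p}-\pi_0}{\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}} \sim \text{approx normal}(0,\ 1)$$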

Testing a proportion

Test statistic

$$z = \frac{\hat{p}-\pi_0}{\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}} \sim \text{approx normal}(0,\ 1)$$

p-value = prob(Z as extreme as observed, if H0 is true)

Evaluate from normal(0, 1) using Minitab or tables

Side-effects of drug

Standard drug has p(side effect) = 0.2

Generic drug: x = 112 out of n = 400 with side-effect

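H0: π = 0.2

HA: π > 0.2

p̂ = 112/400 = 0.28

$$z = \frac{\hat{p}-\pi_0}{\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}} = \frac{0.28-0.2}{\sqrt{\dfrac{0.2\times 0.8}{400}}} = 4.0$$

p-value = prob(X ≥ 112) for binomial(n = 400, π = 0.2)

≈ prob(Z ≥ 4) for normal(0, 1)

= 0.0000 (to 4 decimal places)

Almost certain of higher rate of side-effects with generic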
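A short Python sketch of this large-sample calculation, using only the standard library. It also sums the exact binomial tail for comparison, which is an addition the slides do not make; the variable names are illustrative.

```python
from math import comb, erfc, sqrt

n, x, pi0 = 400, 112, 0.2

# Large-sample z statistic for a proportion
p_hat = x / n
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Upper-tail area of normal(0, 1): prob(Z >= z)
p_normal = 0.5 * erfc(z / sqrt(2))

# Exact binomial tail prob(X >= 112), for comparison only
p_exact = sum(
    comb(n, k) * pi0**k * (1 - pi0) ** (n - k) for k in range(x, n + 1)
)

print(f"z = {z:.2f}")
print(f"normal approximation: {p_normal:.6f}")
print(f"exact binomial tail:  {p_exact:.6f}")
```

Both tail areas are far below 0.01, so the conclusion does not depend on which one is used.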

Left and Right Foot Lengths

Are Left and Right Foot Lengths Equal or Different?

Let π = population proportion with a longer right foot.

H0: π = .5 versus HA: π ≠ .5

n = 112 students with unequal right and left foot size. x = 63 have longer right foot.

p̂ = 63/112 = .5625

$$z = \frac{\hat{p}-\pi_0}{\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}} = \frac{.5625-.5}{\sqrt{\dfrac{.5(1-.5)}{112}}} = 1.32$$

Left and Right Foot Lengths

Assuming π = 0.5 (H0), prob(Z ≥ 1.32) = 0.093 for normal(0, 1).

The test is two-tailed, so p-value = 2(0.093) = 0.186

Left and Right Foot Lengths

Conclusion in words

p-value = 0.186 so no evidence against the null hypothesis.

Although there was a tendency toward a longer right foot in the sample, there is insufficient evidence to conclude that the proportion in the population with a longer right foot is different from the proportion with a longer left foot.
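The two-tailed calculation for the foot-length data can be reproduced with a few lines of Python. This is a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x, pi0 = 112, 63, 0.5

p_hat = x / n
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Two-tailed test: double the area beyond |z| in one tail
one_tail = 1 - NormalDist().cdf(abs(z))
p_value = 2 * one_tail

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

Doubling the one-tail area is what makes the test two-tailed. For a one-tailed alternative, `one_tail` alone would be the p-value, provided the sample proportion falls on the side named by HA.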

Statistical significance vs Real importance

The p-value does not provide information about the magnitude of the effect.

The magnitude of a statistically significant effect can be so small that the practical effect is not important.

If the sample size is large enough, almost any null hypothesis can be rejected.
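To illustrate the effect of sample size, the sketch below holds a small effect fixed and varies n. The numbers are hypothetical, chosen only for illustration: an observed proportion of 0.51 tested against H0: π = 0.5.

```python
from math import erfc, sqrt


def two_sided_p(p_hat, pi0, n):
    """Two-tailed p-value for a proportion, large-sample z-test."""
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return erfc(abs(z) / sqrt(2))  # equals 2 * prob(Z >= |z|)


# Hypothetical: the same observed proportion at three sample sizes
results = {n: two_sided_p(0.51, 0.5, n) for n in (100, 10_000, 1_000_000)}
for n, p in results.items():
    print(f"n = {n:>9,}: p-value = {p:.3g}")
```

The observed difference (one percentage point) is identical in all three rows; only the sample size changes, yet the p-value moves from no evidence to overwhelming evidence against H0.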

Birth Month and Height

Austrian study of heights of 507,125 military recruits.

Men born in spring were, on average, about 0.6 cm taller than men born in fall (Weber et al., Nature, 1998, 391:754–755).

A small difference: 0.6 cm = about 1/4 inch.

Sample size so large that even a very small difference was statistically significant.

Headline: Spring Birthday Confers Height Advantage

Internet and Loneliness

A closer look: actual effects were quite small.

“one hour a week on the Internet was associated, on average, with an increase of 0.03, or 1 percent on the depression scale” (Harman, 30 August 1998, p. A3).

“greater use of the Internet was associated with declines in participants’ communication with family members in the household, declines in size of their social circle, and increases in their depression and loneliness” (Kraut et al., 1998, p. 1017)

Test statistic and p-value

Common definition

$$z = \frac{\text{statistic} - \text{null value}}{\text{standard error}}$$

p-value from normal(0, 1)

The normal(0, 1) approximation may be improved upon when the s.e. is estimated from the data

Testing Hypotheses About Mean

Model: Random sample from popn with mean μ. Popn is approx normal or reasonably large sample.

Hypotheses:

1. H0: μ = μ0 versus HA: μ ≠ μ0 (two-tailed)

2. H0: μ ≥ μ0 versus HA: μ < μ0 (one-tailed)

3. H0: μ ≤ μ0 versus HA: μ > μ0 (one-tailed)

Often write H0: μ = μ0 for all tests.

Always use μ = μ0 for finding p-values

Known σ (or large n)

Test statistic:

$$z = \frac{\text{statistic} - \text{null value}}{\text{standard error}} = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$$

p-value: From normal(0, 1)

Only if the value of σ is known, or the sample is big enough that we can estimate it well (e.g. n > 30)

Unknown σ: t-test

Test statistic:

$$t = \frac{\text{statistic} - \text{null value}}{\text{standard error}} = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$

t has standard distn if μ = μ0

t-distn with n – 1 degrees of freedom

p-value: Prob of more extreme t if μ = μ0 (i.e. H0). From t(n – 1) distribution

Finding the p-value

Normal Body Temperature

What is normal body temperature?

Less than 98.6 ºF (on average)?

Data: Random sample of n = 18 normal body temps

98.2 97.8 99.0 98.6 98.2 97.8 98.4 99.7 98.2
97.4 97.6 98.4 98.0 99.2 98.6 97.1 97.2 98.5

Model: Body temps are random sample from popn with mean μ. Popn is approx normal

Normal Body Temperature

Hypotheses:

H0: μ = 98.6

HA: μ < 98.6

Test statistic:

x̄ = 98.217

s = 0.684

$$\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.684}{\sqrt{18}} = 0.161$$

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{98.217-98.6}{0.161} = -2.38$$

Normal Body Temperature

H0: μ = 98.6

HA: μ < 98.6

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = -2.38$$

p-value:

If μ = 98.6, t has a t(n – 1) distn = t(17 d.f.) distn

Minitab or Excel can find the tail area: p-value = prob(t ≤ −2.38) = 0.015

Normal Body Temperature

H0: μ = 98.6

HA: μ < 98.6

Conclusion:

p-value = 0.015

Moderately strong evidence that the mean body temperature (of this population) is less than 98.6 ºF

Normal Body Temperature

Minitab: Does all the calculations for you

Test of mu = 98.600 vs mu < 98.600

Variable      N    Mean  StDev  SE Mean      T      P
Temperature  18  98.217  0.684    0.161  -2.38  0.015

p-value = 0.015

Moderately strong evidence that the mean body temperature (of this population) is less than 98.6 ºF
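The same one-sample t-test can be reproduced outside Minitab. The sketch below uses only the Python standard library, which has no t distribution, so the tail area is found by numerically integrating the t density with Simpson's rule. The helper names `t_pdf` and `t_cdf` are hypothetical, and the integration approach is an assumption of this sketch, not how Minitab computes the value.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """prob(T <= t), via Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


temps = [98.2, 97.8, 99.0, 98.6, 98.2, 97.8, 98.4, 99.7, 98.2,
         97.4, 97.6, 98.4, 98.0, 99.2, 98.6, 97.1, 97.2, 98.5]
mu0 = 98.6

n = len(temps)
x_bar = mean(temps)
s = stdev(temps)                 # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)
t_stat = (x_bar - mu0) / se
p_value = t_cdf(t_stat, n - 1)   # lower tail, because HA: mu < 98.6

print(f"N = {n}, mean = {x_bar:.3f}, s = {s:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

The lower tail is used because HA is μ < 98.6. For HA: μ > μ0 the p-value would be `1 - t_cdf(t_stat, n - 1)`.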

Paired data example

n = 10 pilots perform flight simulation (a) sober then (b) after alcohol.

Does useful performance time decrease with alcohol use?

Is the mean difference zero or greater than zero?

Paired Data and the Paired t-Test

Data: d = x1 – x2

Model: Differences are random sample from popn with mean μd

Is the mean difference in the population different from 0?

Parameter estimate: d̄, with

$$\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$$

Hypothesis test: Same analysis as before (CI & test) but using differences

Paired t-Test

Hypotheses: H0: μd = 0, HA: μd ≠ 0 (or μd < 0 or μd > 0)

(Take care picking HA.)

Test statistic

$$t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{d}-0}{s_d/\sqrt{n}}$$

p-value and conclusion: From t(n – 1 d.f.) distn. Make sure you use correct tails, especially 1-tailed

Effect of Alcohol

Does mean useful performance time decrease?

Data: 10 pilots do flight simulation (a) sober, then (b) after alcohol. Response = useful performance time

Model: Differences are random sample from popn with mean μd. Approx normal

Effect of Alcohol

Hypotheses:

H0: μd = 0

HA: μd > 0

Test statistic:

d̄ = 195.6

sd = 230.5

$$\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{230.5}{\sqrt{10}} = 72.9$$

$$t = \frac{\bar{d}-0}{s_d/\sqrt{n}} = \frac{195.6-0}{72.9} = 2.68$$

Effect of Alcohol

H0: μd = 0

HA: μd > 0

$$t = \frac{\bar{d}-0}{s_d/\sqrt{n}} = 2.68$$

p-value:

If μd = 0, t has a t(n – 1) distn = t(9 d.f.) distn

Minitab or Excel can find the tail area: p-value = prob(t ≥ 2.68) = 0.013

Effect of Alcohol

H0: μd = 0

HA: μd > 0

Conclusion:

p-value = 0.013

Even with a small experiment, there is moderately strong evidence that alcohol has decreased the mean performance time

Effect of Alcohol

Minitab: Does all the calculations for you

Test of mu = 0.0 vs mu > 0.0

Variable   N   Mean  StDev  SE Mean     T      P
Diff      10  195.6  230.5     72.9  2.68  0.013

p-value = 0.013

Even with a small experiment, there is moderately strong evidence that alcohol has decreased the mean performance time
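The slides give only summary statistics for the differences (n = 10, mean 195.6, standard deviation 230.5), so the sketch below works from those rather than from raw data. As the Python standard library has no t distribution, the upper-tail area is approximated by Simpson's rule on the t density; the helper names are hypothetical and the numerical method is an assumption of this sketch.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """prob(T >= t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Summary statistics of the paired differences, from the slides
n, d_bar, s_d = 10, 195.6, 230.5

se = s_d / sqrt(n)
t_stat = (d_bar - 0) / se
p_value = t_upper_tail(t_stat, n - 1)  # upper tail, because HA: mu_d > 0

print(f"SE = {se:.1f}, t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

A paired t-test is just a one-sample t-test applied to the differences, which is why only the mean and standard deviation of d are needed here.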

More about p-values

If H0 is true, you can still get small p-values:

Prob(p-value ≤ 0.05) = 0.05

Prob(p-value ≤ 0.01) = 0.01

Prob(p-value ≤ 0.001) = 0.001

If HA is true, small p-values are more likely, but big p-values are still possible

p-value is approx 0: conclude HA almost certainly true.

Never conclude H0 is true
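The first point can be checked by simulation. The sketch below repeatedly draws samples for which H0 really is true and records how often the p-value falls below each threshold. The setup is an assumption chosen for simplicity: a two-tailed z-test of H0: μ = 0 with known σ = 1, samples of size 20, and a fixed random seed.

```python
import random
from math import erfc, sqrt


def z_test_p_value(sample, mu0, sigma):
    """Two-tailed p-value for a mean when sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    return erfc(abs(z) / sqrt(2))  # equals 2 * prob(Z >= |z|)


rng = random.Random(1)      # fixed seed so the run is repeatable
n_sims, n = 20_000, 20

# Every sample is drawn with mu = 0, so H0: mu = 0 is true each time
p_values = [
    z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0, 1.0)
    for _ in range(n_sims)
]

fractions = {
    alpha: sum(p <= alpha for p in p_values) / n_sims
    for alpha in (0.05, 0.01, 0.001)
}
for alpha, frac in fractions.items():
    print(f"fraction of p-values <= {alpha}: {frac:.4f}")
```

Each fraction should land close to its threshold, up to simulation noise. This is why a single small p-value is evidence against H0 rather than proof.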

P-values for other tests

If you know:

H0 and HA

p-value (e.g. from Minitab output)

… then you should be able to report the conclusion from the test

e.g. Test for normality:

H0: sample from normal popn; HA: not normal

p-value = 0.163

No evidence that popn is not normal