Week 9
Testing Hypotheses
Philosophy of Hypothesis Testing
Model
Data
Null hypothesis, H0 (and alternative, HA)
Test statistic, T
p-value = prob(T is at least as extreme as observed), if H0 is true
Interpret p-value
Examples of H0 and HA
Null hypothesis examples:
• There is no extrasensory perception.
• There is no difference between the mean pulse rates of men and women.
• There is no relationship between exercise intensity and the resulting aerobic benefit.
Alternative hypotheses examples:
• There is extrasensory perception.
• Men have lower mean pulse rates than women do.
• Increasing exercise intensity increases the resulting aerobic benefit.
Side-effects of generic drug
Pharmac considers replacing an expensive drug with a cheaper generic ‘equivalent’. It is known that 20% of patients experience side-effects from the original drug.
Null, H0: 20% (or fewer) will have side effects.
Altern, HA: More than 20% will have side effects.
Does the generic equivalent have more side-effects?
H0: π = 0.2 (π = probability of side effects with the generic)
HA: π > 0.2
Logic of hypothesis testing
Like “Presumed innocent until proven guilty”
Assume H0 is true and evaluate the evidence against it
Assume H0 is true and find P(getting data like what was observed)
Psychic Powers
Person correctly guesses outcome of 100 coin flips.
Psychic powers?
(or cheating!!!)
Guessing?
Psychic Powers
Model: Each of 100 ‘guesses’ independent
Each has prob π of being correct
Data: 100 guesses with all 100 correct
H0: π = 0.5 (guessing)
HA: π > 0.5 (psychic or cheating)
Test stat: number correct
Psychic Powers
p-value: P(all 100 correct) if guessing = (½)^100 ≈ 7.9 × 10^-31 (very small)
Conclusion: Sample results inconsistent with H0 (guessing)
Extremely strong evidence of psychic powers (or cheating)
Interpreting p-value
p-value > 0.1: No evidence that H0 is wrong (data could easily have occurred by chance with H0)
0.05 < p-value < 0.1: Only mild evidence that H0 is wrong
0.01 < p-value < 0.05: Moderately strong evidence that H0 is wrong
p-value < 0.01: Strong evidence that H0 is wrong (data unlikely to have occurred if H0 were true)
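The evidence scale above can be written as a small lookup. This is only a sketch: the function name `describe_evidence` is made up for illustration, and the slide does not say what to do with a p-value exactly on a cut-off, so the code assumes it falls into the stronger-evidence band.

```python
def describe_evidence(p_value):
    """Translate a p-value into the wording of the interpretation scale."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumption: a value exactly on a cut-off goes to the stronger band.
    if p_value <= 0.01:
        return "strong evidence that H0 is wrong"
    if p_value <= 0.05:
        return "moderately strong evidence that H0 is wrong"
    if p_value <= 0.1:
        return "only mild evidence that H0 is wrong"
    return "no evidence that H0 is wrong"


for p in (0.3, 0.07, 0.03, 0.001):
    print(p, "->", describe_evidence(p))
```

Note that the weakest statement is "no evidence that H0 is wrong", never "H0 is true".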
Interpreting p-value
However large the p-value, you must never conclude that H0 is true. The best you can say is that there is no evidence H0 is false.
(You could never conclude that H0: π = 0.5 is true, since π could just as easily be 0.50000001; you would have no chance of distinguishing the two.)
Even if p-value < 0.01, there is still some chance that H0 could be true.
Testing a proportion
Model: n independent trials
Each has prob π of being success
Data: x successes
Possible null and alternative hypotheses:
H0: π = π0 vs HA: π ≠ π0
H0: π ≥ π0 vs HA: π < π0
H0: π ≤ π0 vs HA: π > π0
Testing a proportion
Test statistic: X = #successes ~ binomial(n, π0) if H0 is true
p-value = prob(X as extreme as observed, if H0 is true)
Evaluate from binomial(n, π0) using Minitab
Side-effects of drug
Standard drug has p(side effect) = 0.2
Generic drug: x = 14 out of n = 50 with side-effect
H0: π = 0.2
HA: π > 0.2
p-value = prob(X ≥ 14) for binomial(n=50, π=0.2)
= 1 – prob(X ≤ 13)
= 1 – 0.8894
= 0.1106
No evidence of higher rate of side-effects with generic
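The slides evaluate the binomial tail in Minitab. As a rough equivalent, the same upper-tail probability can be summed directly in Python. The helper name `binomial_upper_tail` is an illustrative choice, not part of the course material.

```python
from math import comb


def binomial_upper_tail(x, n, pi0):
    """P(X >= x) when X ~ binomial(n, pi0), summed term by term."""
    return sum(
        comb(n, k) * pi0**k * (1 - pi0) ** (n - k)
        for k in range(x, n + 1)
    )


# Generic drug: 14 of 50 patients with side effects, H0: pi = 0.2
p_value = binomial_upper_tail(14, 50, 0.2)
print(f"p-value = {p_value:.4f}")
```

The result should agree with the 0.1106 obtained above from 1 – prob(X ≤ 13).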
Testing a proportion
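Test statistic: X = #successes ~ binomial(n, π0) if H0 is true
Big sample: n π0 ≥ 10 and n(1 − π0) ≥ 10
Sample proportion p = X/n:
$$ p \sim \text{approx normal}\left(\text{mean} = \pi_0,\ \text{s.d.} = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}\right) $$
$$ z = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}} \sim \text{approx normal}(0, 1) $$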
Testing a proportion
Test statistic
$$ z = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}} \sim \text{approx normal}(0, 1) $$
p-value = prob(Z as extreme as observed, if H0 is true)
Evaluate from normal(0, 1) using Minitab or tables
Side-effects of drug
Standard drug has p(side effect) = 0.2
Generic drug: x = 112 out of n = 400 with side-effect
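H0: π = 0.2
HA: π > 0.2
p = 112/400 = 0.28
$$ z = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}} = \frac{0.28 - 0.2}{\sqrt{\frac{0.2 \times 0.8}{400}}} = 4.0 $$
p-value = prob(X ≥ 112) for binomial(n=400, π=0.2)
≈ prob(Z ≥ 4) for normal(0, 1)
= 0.0000 (to 4 decimal places)
Almost certain of higher rate of side-effects with generic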
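A minimal Python sketch of the large-sample calculation for the n = 400 example follows. The function names are hypothetical, and the normal tail is computed from the complementary error function rather than from tables.

```python
from math import erfc, sqrt


def proportion_z(x, n, pi0):
    """z statistic for H0: pi = pi0, using the sample proportion x / n."""
    p = x / n
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


n, x, pi0 = 400, 112, 0.2
# Big-sample check from the slides: n*pi0 >= 10 and n*(1 - pi0) >= 10
big_sample = n * pi0 >= 10 and n * (1 - pi0) >= 10

z = proportion_z(x, n, pi0)
p_value = normal_upper_tail(z)
print(f"big sample: {big_sample}, z = {z:.2f}, p-value = {p_value:.6f}")
```

The p-value is not exactly zero; it is roughly 3 in 100,000, which rounds to 0.0000 at four decimal places.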
Left and Right Foot Lengths
Are Left and Right Foot Lengths Equal or Different?
Let π = population proportion with a longer right foot.
H0: π = 0.5 versus HA: π ≠ 0.5
n = 112 students with unequal right and left foot size. x = 63 have longer right foot.
p = 63/112 = 0.5625
Assuming π = 0.5 (H0),
$$ z = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}} = \frac{0.5625 - 0.5}{\sqrt{\frac{0.5(1-0.5)}{112}}} = 1.32 $$
So p-value = 2(0.093) = 0.186
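Because the alternative here is two-sided, the tail area is doubled. The sketch below redoes the foot-length arithmetic in Python; the function name `two_sided_proportion_test` is an assumption made for this example.

```python
from math import erfc, sqrt


def two_sided_proportion_test(x, n, pi0):
    """Return (z, p_value) for H0: pi = pi0 versus HA: pi != pi0."""
    p = x / n
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
    one_tail = 0.5 * erfc(abs(z) / sqrt(2))  # P(Z >= |z|)
    return z, 2 * one_tail                   # both tails count as "extreme"


z, p_value = two_sided_proportion_test(63, 112, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Using `abs(z)` means the same function works whether the sample proportion falls above or below the null value.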
Left and Right Foot Lengths
p-value = 0.186, so no evidence against the null hypothesis.
Conclusion in words: Although there was a tendency toward a longer right foot in the sample, there is insufficient evidence to conclude that the proportion in the population with a longer right foot is different from the proportion with a longer left foot.
Statistical significance vs Real importance
The p-value does not provide information about the magnitude of the effect.
The magnitude of a statistically significant effect can be so small that the practical effect is not important.
If the sample size is large enough, almost any null hypothesis can be rejected.
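To see how sample size alone can drive significance, the sketch below holds a small effect fixed and lets n grow. The numbers are hypothetical and not from the slides: a sample proportion of 0.51 tested against π0 = 0.5 with a one-sided alternative.

```python
from math import erfc, sqrt


def one_sided_p(p_hat, pi0, n):
    """P(Z >= z) for the large-sample test of H0: pi = pi0 vs HA: pi > pi0."""
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return 0.5 * erfc(z / sqrt(2))


pi0, p_hat = 0.5, 0.51  # hypothetical: the same 1-point difference every time
p_values = {n: one_sided_p(p_hat, pi0, n) for n in (100, 10_000, 1_000_000)}
for n, p in p_values.items():
    print(f"n = {n:>9,}: p-value = {p:.4g}")
```

The effect size never changes, yet the verdict moves from "no evidence" at n = 100 to "strong evidence" at n = 1,000,000.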
Birth Month and Height
Austrian study of heights of 507,125 military recruits.
Men born in spring were, on average, about 0.6 cm taller than men born in fall (Weber et al., Nature, 1998, 391:754–755).
A small difference: 0.6 cm = about 1/4 inch.
Sample size so large that even a very small difference was statistically significant.
Headline: Spring Birthday Confers Height Advantage
Internet and Loneliness
A closer look: actual effects were quite small.
“one hour a week on the Internet was associated, on average, with an increase of 0.03, or 1 percent on the depression scale” (Harman, 30 August 1998, p. A3).
“greater use of the Internet was associated with declines in participants’ communication with family members in the household, declines in size of their social circle, and increases in their depression and loneliness” (Kraut et al., 1998, p. 1017)
Test statistic and p-value
Common definition
$$ z = \frac{\text{statistic} - \text{null value}}{\text{standard error}} $$
p-value from normal (0, 1)
The N(0, 1) approximation can be improved on when the s.e. is estimated
Testing Hypotheses About Mean
Model: Random sample from popn with mean μ
Popn is approx normal or reasonably large sample
Hypotheses:
1. H0: μ = μ0 versus HA: μ ≠ μ0 (two-tailed)
2. H0: μ ≥ μ0 versus HA: μ < μ0 (one-tailed)
3. H0: μ ≤ μ0 versus HA: μ > μ0 (one-tailed)
Often write H0: μ = μ0 for all tests.
Always use μ = μ0 for finding p-values
Known σ (or large n)
Test statistic:
$$ z = \frac{\text{statistic} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$
p-value: From normal (0,1)
Only if the value of σ is known, or the sample is big enough that we can estimate it well (e.g. n > 30)
Unknown σ: t-test
Test statistic:
$$ t = \frac{\text{statistic} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$
t has a standard distn if μ = μ0: the t-distn with n – 1 degrees of freedom
p-value: Prob of more extreme t if μ = μ0 (i.e. H0), from t(n – 1) distribution
Finding the p-value
Normal Body Temperature
What is normal body temperature?
Less than 98.6 ºF (on average)?
Data: Random sample of n = 18 normal body temps
98.2 97.8 99.0 98.6 98.2 97.8 98.4 99.7 98.2
97.4 97.6 98.4 98.0 99.2 98.6 97.1 97.2 98.5
Model: Body temps are random sample from popn with mean μ
Popn is approx normal
Normal Body Temperature
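Hypotheses:
H0: μ = 98.6
HA: μ < 98.6
Test statistic:
$$ \bar{x} = 98.217, \quad s = 0.684 $$
$$ \text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.684}{\sqrt{18}} = 0.161 $$
$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{98.217 - 98.6}{0.161} = -2.38 $$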
Normal Body Temperature
p-value:
H0: μ = 98.6
HA: μ < 98.6
$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = -2.38 $$
If μ = 98.6, t has a t(n – 1) distn = t(17 d.f.) distn
Minitab or Excel can find the tail area.
Normal Body Temperature
H0: μ = 98.6
HA: μ < 98.6
Conclusion:
p-value = 0.015
Moderately strong evidence that the mean body temperature (of this population) is less than 98.6 ºF
Normal Body Temperature
Minitab: Does all the calculations for you
Test of mu = 98.600 vs mu < 98.600
Variable      N    Mean   StDev  SE Mean      T      P
Temperature  18  98.217   0.684    0.161  -2.38  0.015
p-value = 0.015
Moderately strong evidence that the mean body temperature (of this population) is less than 98.6 ºF
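For readers without Minitab, the one-sample t-test on the 18 body temperatures can be approximated with the Python standard library. This is a sketch under stated assumptions: the standard library has no t distribution, so the helper `t_upper_tail` (a made-up name) integrates the t density numerically with Simpson's rule instead of calling a statistics package.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t): 0.5 minus the Simpson's-rule integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


temps = [98.2, 97.8, 99.0, 98.6, 98.2, 97.8, 98.4, 99.7, 98.2,
         97.4, 97.6, 98.4, 98.0, 99.2, 98.6, 97.1, 97.2, 98.5]
n = len(temps)
x_bar = mean(temps)
s = stdev(temps)            # sample standard deviation (divisor n - 1)
se = s / sqrt(n)
t = (x_bar - 98.6) / se     # null value mu0 = 98.6

# HA: mu < 98.6, so the p-value is the lower tail P(T <= t),
# which equals P(T >= -t) by symmetry of the t distribution.
p_value = t_upper_tail(-t, n - 1)
print(f"N = {n}, mean = {x_bar:.3f}, s = {s:.3f}, SE = {se:.3f}, "
      f"t = {t:.2f}, p-value = {p_value:.3f}")
```

The printed summary should line up with the Minitab output above, apart from possible small differences from numerical integration.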
Paired data example
n = 10 pilots perform flight simulation (a) sober then (b) after alcohol.
Does useful performance time decrease with alcohol use?
Is the mean difference zero or greater than zero?
Paired Data and the Paired t-Test
Data: d = x1 – x2
Model: Differences are random sample from popn with mean μd
Parameter estimate: $\bar{d}$, with $\text{s.e.}(\bar{d}) = s_d/\sqrt{n}$
Hypothesis test: Is the mean difference in the population different from 0?
Same analysis as before (CI & test) but using differences
Paired t-Test
Hypotheses: H0: μd = 0, HA: μd ≠ 0 (or μd < 0 or μd > 0)
(Take care picking HA.)
Test statistic
$$ t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{d} - 0}{s_d/\sqrt{n}} $$
p-value and conclusion: From t(n – 1 d.f.) distn. Make sure you use the correct tails, especially for 1-tailed tests.
Effect of Alcohol
Data: 10 pilots do flight simulation (a) sober, then (b) after alcohol
Response = useful performance time
Model: Differences are random sample from popn with mean μd
Approx normal
Does mean useful performance time decrease?
Effect of Alcohol
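Hypotheses:
H0: μd = 0
HA: μd > 0
Test statistic:
$$ \bar{d} = 195.6, \quad s_d = 230.5 $$
$$ \text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{230.5}{\sqrt{10}} = 72.9 $$
$$ t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{195.6 - 0}{72.9} = 2.68 $$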
Effect of Alcohol
p-value:
H0: μd = 0
HA: μd > 0
$$ t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = 2.68 $$
If μd = 0, t has a t(n – 1) distn = t(9 d.f.) distn
Minitab or Excel can find the tail area.
Effect of Alcohol
H0: μd = 0
HA: μd > 0
Conclusion:
p-value = 0.013
Even with a small experiment, there is moderately strong evidence that alcohol has decreased the mean performance time
Effect of Alcohol
Minitab: Does all the calculations for you
Test of mu = 0.0 vs mu > 0.0
Variable   N   Mean  StDev  SE Mean     T      P
Diff      10  195.6  230.5     72.9  2.68  0.013
p-value = 0.013
Even with a small experiment, there is moderately strong evidence that alcohol has decreased the mean performance time
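The slides give only summary statistics for the pilots' differences, so the paired t-test can be sketched from those alone. As an assumption of this sketch, the t tail area is found by numerical integration (Simpson's rule) because the Python standard library has no t distribution; `t_upper_tail` is an illustrative helper name.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t): 0.5 minus the Simpson's-rule integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Summary statistics of the differences d = sober - alcohol (from the slides)
d_bar, s_d, n = 195.6, 230.5, 10

se = s_d / sqrt(n)
t = (d_bar - 0) / se               # null value for the mean difference is 0
p_value = t_upper_tail(t, n - 1)   # HA: mu_d > 0, so use the upper tail
print(f"SE = {se:.1f}, t = {t:.2f}, p-value = {p_value:.3f}")
```

With the raw differences available, the same code would start from the list of d values and compute `d_bar` and `s_d` first; the test itself is just a one-sample t-test on the differences.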
More about p-values
If H0 is true, you can still get small p-values:
Prob(p-value ≤ 0.05) = 0.05
Prob(p-value ≤ 0.01) = 0.01
Prob(p-value ≤ 0.001) = 0.001
If HA is true, small p-values are more likely … but big p-values are still possible
p-value is approx 0: conclude HA almost certainly true.
Never conclude H0 is true
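The claim that Prob(p-value ≤ 0.05) = 0.05 when H0 is true can be checked by simulation. The setup below is hypothetical and chosen only for simplicity: samples of n = 10 from a normal population with mean 0 and known σ = 1, tested with a one-sided z-test of H0: μ = 0.

```python
import random
from math import erfc, sqrt


def simulate_p_values(reps=10_000, n=10, seed=1):
    """One-sided z-test p-values for samples drawn with H0 (mu = 0) true."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n - 0.0) / (1.0 / sqrt(n))  # sigma = 1 is known
        p_values.append(0.5 * erfc(z / sqrt(2)))       # P(Z >= z)
    return p_values


def fraction_at_most(p_values, cutoff):
    """Proportion of simulated p-values that are <= cutoff."""
    return sum(p <= cutoff for p in p_values) / len(p_values)


p_values = simulate_p_values()
for cutoff in (0.05, 0.01):
    print(f"fraction of p-values <= {cutoff}: "
          f"{fraction_at_most(p_values, cutoff):.4f}")
```

Each fraction should land close to its cut-off, which is why a small p-value on its own cannot prove that HA is true: about 1 test in 20 gives p ≤ 0.05 even when nothing is going on.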
P-values for other tests
If you know H0 and HA
p-value (e.g. from Minitab output)
… then you should be able to report the conclusion from the test
e.g. Test for normality:
H0: sample from normal popn
HA: not normal
p-value = 0.163
No evidence that popn is not normal