Introduction to the Analysis of Variance (ANOVA)
Computing One-Way Independent Measures (Between Subjects) ANOVAs
01:830:200:01-05 Fall 2014
Intro to ANOVA
The Analysis of Variance (ANOVA)
• The analysis of variance (ANOVA) is a statistical technique for testing for differences between the means of multiple (more than two) groups.
• It is probably the most prevalent statistical technique used in psychological research.
• The ANOVA is a flexible technique that can be used with a variety of different research designs.
• In today’s lecture, I will explain the logic behind the ANOVA and introduce the one-way between-groups ANOVA, which is an ANOVA in which the groups are defined along only one independent (or quasi-independent) variable.
The Analysis of Variance
• The purpose of ANOVA is much the same as that of the t tests presented in the preceding lectures:
– Are the mean differences obtained for sample data sufficiently large for us to conclude that there are mean differences between the populations from which the samples were obtained?
• The difference between ANOVA and the t tests is that ANOVA can be used in situations where two or more means are being compared, whereas the t tests are limited to situations where only two means are involved.
The Problem of Multiple Comparisons
• The ANOVA is necessary to protect researchers from an excessive experimentwise error rate in situations where a study is comparing more than two population means.
– Experimentwise error rate: the probability of making at least one Type I error across multiple comparisons
• These situations would require a series of several t tests to evaluate all of the mean differences. (Remember, a t test can compare only two means at a time.)
• So why not just use multiple t tests?
The Problem of Multiple Comparisons
• Why not just use multiple t tests?
– Although each t test can be evaluated using a specific α-level (risk of Type I error), the α-levels accumulate over a series of tests, so the final experimentwise (familywise) α-level can be quite large.
• Example:
– For 5 levels of the independent variable, there are 10 possible pairwise comparisons between group means:
• {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}
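With k groups there are k(k − 1)/2 distinct pairs, which is 10 when k = 5. A minimal Python sketch that lists them uses `itertools.combinations` from the standard library; the helper name `pairwise_comparisons` is our own.

```python
from itertools import combinations

def pairwise_comparisons(k):
    """Return every unordered pair of group labels 1..k."""
    return list(combinations(range(1, k + 1), 2))

pairs = pairwise_comparisons(5)
print(len(pairs))  # 10 == 5 * 4 // 2
print(pairs)
```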
The Problem of Multiple Comparisons
• Assume H0 is true and α = 0.05. Then the probability of correctly retaining H0 in a single pairwise comparison is:
$$P(\text{retain } H_0 \text{ in a single comparison}) = 1 - \alpha = 1 - 0.05 = 0.95$$
• However, we have to make 10 such comparisons. Using the multiplicative law of probability (remember that?), and assuming independent pairwise tests, the probability of correctly retaining the null in all 10 comparisons is:
$$P(\text{retain } H_0 \text{ in all 10}) = \underbrace{(1-\alpha)(1-\alpha)\cdots(1-\alpha)}_{10} = 0.95^{10} = 0.599$$
Therefore,
$$\alpha_{\text{experiment}} = 1 - P(\text{retain } H_0 \text{ in all 10}) = 1 - 0.599 = 0.401$$
We now have a 40% overall chance of making a Type I error!
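The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes, as the slide does, that the m tests are independent and that every null hypothesis is true; the function name `experimentwise_error` is ours.

```python
def experimentwise_error(alpha, m):
    """Probability of at least one Type I error across m independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    p_retain_all = (1 - alpha) ** m  # multiplicative law: retain H0 in all m tests
    return 1 - p_retain_all

for m in (1, 3, 10):
    print(m, round(experimentwise_error(0.05, m), 3))
```

With m = 10 this reproduces the 0.401 above; with m = 3 (the number of pairwise comparisons among three groups) the rate is already about 0.14.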
Null and Alternative Hypotheses in ANOVAs
• The omnibus null hypothesis is the null hypothesis in the ANOVA: that the population means of all groups being compared are equal
– i.e., for three groups, H0: μ1 = μ2 = μ3
• Alternative hypothesis: at least one population mean is different from the others.
Assumptions of the ANOVA
• Normality of scores – i.e., we assume that the scores in each of our group populations are normally distributed
– Because this matters primarily for the sampling distribution of the mean, the ANOVA is fairly robust to violations of this assumption, especially if the sample sizes are reasonably large.
• Homogeneity of variances – we assume that each population of scores has the same variance
– E.g., $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2_{\text{error}}$
– ANOVA is fairly robust to violations of this assumption
• Independence of observations – e.g., given the population parameters, knowing one person’s score tells you nothing about another person’s score.
– Violations of this assumption can have serious implications for an analysis.
[Figure: three populations (µ, σ unknown), one per instructor (Instructor 1, Instructor 2, Instructor 3), each with a sample drawn from it. Alternative hypothesis: µ1, µ2, and µ3 are not all equal]
Omnibus null hypothesis: µ1 = µ2 = µ3
The Logic of the Analysis of Variance
• The test statistic for ANOVA is an F-ratio, which is a ratio of two estimates of the population variance.
• In the context of ANOVA, these variance estimates are called mean squares, or MS values.
– The numerator, MSbetween, estimates variance using the sample means of the different treatment groups.
– The denominator, MSwithin (or MSerror), estimates variance using the sample variances within each treatment group.
$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{\text{variance including any treatment effects}}{\text{variance without any treatment effects}}$$
The Logic of the Analysis of Variance
Total variance is partitioned into two components:
• Between-treatments variance, which measures differences caused by:
– Systematic treatment effects
– Sampling and other nonsystematic errors
• Within-treatments variance, which measures differences caused by:
– Sampling and other nonsystematic errors
The Logic of the ANOVA
• Regardless of whether or not the null hypothesis is true, the assumption of homogeneity of variances implies that all population variances are equal.
• Thus, as we did for the independent-samples t test, we can estimate this shared population variance by taking the average of the sample variances (the pooled variance):
$$\hat{\sigma}^2_{\text{within}} = s_p^2 = \frac{s_1^2 + s_2^2 + s_3^2}{3} = \text{Avg}(s_1^2, s_2^2, s_3^2)$$
(assuming n1 = n2 = n3)
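The simple average works because the groups are the same size. A minimal Python sketch of the general pooled variance, which weights each sample variance by its degrees of freedom (n − 1), shows that it reduces to the plain average when n1 = n2 = n3. The function name is ours, and the variances are borrowed from the worked example later in this lecture (three samples of n = 20). With unequal group sizes the weighted form is the one to use; it is the same quantity as SSwithin/dfwithin.

```python
def pooled_variance(variances, ns):
    """Pool sample variances, weighting each by its degrees of freedom (n - 1)."""
    weighted = sum((n - 1) * s2 for s2, n in zip(variances, ns))
    total_df = sum(n - 1 for n in ns)
    return weighted / total_df

s2 = [12.18, 33.18, 63.52]
print(round(pooled_variance(s2, [20, 20, 20]), 2))  # df-weighted pooled variance
print(round(sum(s2) / len(s2), 2))                  # plain average: same value for equal n
```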
The Logic of the ANOVA
• However, if all the population means are equal (under H0), then we have a second way to estimate the population variance:
– we can estimate the population variance using the variance of the sample means
• Recall that the Central Limit Theorem tells us how to compute the variance of sample means from the population variance:
$$\sigma_M^2 = \frac{\sigma^2}{n}$$
• We can rearrange this formula to solve for the population variance given the variance of sample means:
$$\sigma^2 = n\,\sigma_M^2$$
The Logic of the ANOVA
• Of course, we don’t have the variance of sample means either. However, we can estimate it by computing the variance of our three group means:
$$\hat{\sigma}_M^2 = s_M^2 = \text{Var}(M_1, M_2, M_3)$$
• Plugging this into the previous equation, our second estimate of the population variance is
$$\hat{\sigma}^2_{\text{between}} = n\,s_M^2$$
The Logic of the ANOVA
We now have two estimates of the population variance:
• An estimate computed from the sample variances, which should estimate the population variance regardless of whether H0 is true:
$$\hat{\sigma}^2_{\text{within}} = s_p^2 = \text{Avg}(s_1^2, s_2^2, s_3^2)$$
• A second estimate computed from the sample means, which estimates the population variance only if H0 is true:
$$\hat{\sigma}^2_{\text{between}} = n\,s_M^2 = n\,\text{Var}(M_1, M_2, M_3)$$
The Logic of the ANOVA
• The F-ratio used as the test statistic for the ANOVA is simply the ratio of these two estimates of the population variance:
$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{\hat{\sigma}^2_{\text{between}}}{\hat{\sigma}^2_{\text{within}}} = \frac{n\,\text{Var}(M_1, M_2, M_3)}{\text{Avg}(s_1^2, s_2^2, s_3^2)}$$
• If H0 is true, then these two estimates should be equal (on average)
– In this case, the ratio should be close to 1.0
• However, if H0 is false, then the estimate in the numerator (which is based on the variability of sample means) will include the treatment effect in addition to the differences in sample means expected by chance
– In this case, the ratio should be greater than 1.0
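A small Monte Carlo sketch can make the "on average" claim concrete. Under our own arbitrary assumptions (k = 3 normal populations with σ = 10, n = 20 per group, 2,000 simulated experiments, fixed seed), it averages the two variance estimates defined above across experiments. When all population means are equal, both averages should land near σ² = 100; when the means differ, only the between-groups estimate grows. The helper names are hypothetical, and this illustrates the logic rather than any particular software’s ANOVA routine.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def variance_estimates(groups):
    """Return (sigma-hat^2 between, sigma-hat^2 within) for equal-sized groups."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    between = n * sample_variance(means)                              # n * Var(M1, ..., Mk)
    within = sum(sample_variance(g) for g in groups) / len(groups)    # Avg(s1^2, ..., sk^2)
    return between, within

def simulate(mus, sigma=10.0, n=20, reps=2000, seed=1):
    """Average each variance estimate over many simulated experiments."""
    rng = random.Random(seed)
    between_sum = within_sum = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        b, w = variance_estimates(groups)
        between_sum += b
        within_sum += w
    return between_sum / reps, within_sum / reps

print(simulate([50, 50, 50]))  # H0 true: both averages near sigma^2 = 100
print(simulate([45, 50, 55]))  # H0 false: between estimate inflated, within still near 100
```

In the second case the between-groups estimate should average about 600, because its expected value is σ² plus n times the variance of the population means: 100 + 20 × 25.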
The F distribution
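[Figure: the F distribution, with the retain-H0 region below the critical value and the reject-H0 region in the upper tail]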
The Logic of the ANOVA
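Sample 1 Sample 2 Sample 3
n = 20 n = 20 n = 20
M = 65.4 M = 70.95 M = 71.2
s² = 12.18 s² = 33.18 s² = 63.52
Within-groups estimate (average of the sample variances):
$$\hat{\sigma}^2_{\text{within}} = s_p^2 = \frac{s_1^2 + s_2^2 + s_3^2}{3} = \frac{12.18 + 33.18 + 63.52}{3} = 36.29$$
Between-groups estimate (from the k = 3 sample means):
M M²
65.40 4277.16
70.95 5033.90
71.20 5069.44
sum 207.55 14380.50
$$M_T = \frac{\sum M}{k} = \frac{207.55}{3} = 69.18$$
$$SS_M = \sum M^2 - \frac{\left(\sum M\right)^2}{k} = 14380.50 - \frac{207.55^2}{3} = 14380.50 - 14359.00 = 21.50$$
$$\hat{\sigma}^2_{\text{between}} = n\,s_M^2 = n\,\frac{SS_M}{df_M} = 20 \times \frac{21.50}{2} = 215.0$$
$$F = \frac{\hat{\sigma}^2_{\text{between}}}{\hat{\sigma}^2_{\text{within}}} = \frac{215.0}{36.29} = 5.92$$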
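The same two estimates can be computed directly from the group summaries. A minimal Python sketch, assuming equal group sizes (as in this example) and using a helper name of our own, reproduces the numbers above; it uses the definitional variance of the means rather than the SS_M shortcut, so the two routes can be checked against each other.

```python
def f_from_summary(means, variances, n):
    """F-ratio for k equal-sized groups, given group means and sample variances."""
    k = len(means)
    grand_mean = sum(means) / k
    var_means = sum((m - grand_mean) ** 2 for m in means) / (k - 1)  # s_M^2
    between = n * var_means               # sigma-hat^2 between = n * s_M^2
    within = sum(variances) / k           # sigma-hat^2 within = Avg(s^2)
    return between, within, between / within

b, w, f = f_from_summary([65.40, 70.95, 71.20], [12.18, 33.18, 63.52], n=20)
print(round(b, 1), round(w, 2), round(f, 2))  # 215.0 36.29 5.92
```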
Computations for the ANOVA
• In computing the terms required for the F-statistic, we won’t explicitly compute any sample variances or standard deviations.
• Instead, in all intermediate steps, we’ll deal exclusively with sums of squared deviations (SS) and means of squared deviations (MS).
• Computing the F-statistic from sample standard deviations or variances gets you the same answer, but requires more calculations.
Computing the F-statistic
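$$F(df_{\text{between}}, df_{\text{within}}) = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$
where
$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}}$$
and
$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}}$$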
Computations for the ANOVA: Preliminaries
• Start by computing $\sum x$ and $\sum x^2$ for each group, then compute:
• Grand total: the overall total, computed over all scores in all groups (samples)
$$\sum x_T = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}$$
• Total sum of squared scores: the sum of squared scores computed over all scores in all groups
$$\sum x_T^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}^2$$
Computations for the ANOVA: SS terms
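• SStotal: the sum of squared deviations of all observations from the grand mean
– Not strictly needed for computing the F ratio, but it makes computing the needed SS terms much easier
$$SS_{\text{total}} = \sum (x - M_T)^2 \quad \text{(conceptual)} \qquad SS_{\text{total}} = \sum x_T^2 - \frac{\left(\sum x_T\right)^2}{N} \quad \text{(computational)}$$
or
$$SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$$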
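Computations for the ANOVA: SS terms
[Figure: total variance partitioned into between-treatments variance and within-treatments variance]
$$SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}} \qquad df_{\text{total}} = df_{\text{between}} + df_{\text{within}}$$
$$SS_{\text{between}} = SS_{\text{total}} - SS_{\text{within}} \qquad df_{\text{between}} = df_{\text{total}} - df_{\text{within}}$$
$$SS_{\text{within}} = SS_{\text{total}} - SS_{\text{between}} \qquad df_{\text{within}} = df_{\text{total}} - df_{\text{between}}$$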
Computations for the ANOVA: SS terms
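• SSbetween: the sum of squared deviations of the sample means from the grand mean, each multiplied by the number of observations in that sample
$$SS_{\text{between}} = \sum_{i=1}^{k} n_i (M_i - M_T)^2 \quad \text{or} \quad SS_{\text{between}} = SS_{\text{total}} - SS_{\text{within}}$$
• SSwithin (SSerror): the sum of squared deviations within each sample
$$SS_{\text{within}} = \sum_{j=1}^{k} SS_j = SS_1 + SS_2 + \dots + SS_k \quad \text{or} \quad SS_{\text{within}} = SS_{\text{total}} - SS_{\text{between}}$$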
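The SS identities above are easy to check numerically. The Python sketch below (helper names ours) uses the raw GPA scores from the textbook example later in this lecture. It computes SStotal both conceptually and computationally, computes SSbetween and SSwithin directly, and confirms that the pieces add up. Because of floating-point arithmetic, the checks compare against a small tolerance rather than testing exact equality.

```python
def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    """Sum of squared deviations from the mean (conceptual form)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

groups = [
    [3.40, 3.20, 3.00, 3.00, 3.50, 3.80, 3.60, 4.00, 3.90, 2.90],  # no work
    [3.50, 3.60, 2.70, 3.50, 3.80, 2.90, 3.40, 3.20, 3.30, 3.10],  # part-time
    [2.90, 3.00, 2.60, 3.30, 3.70, 2.70, 2.40, 2.50, 3.30, 3.40],  # full-time
]
scores = [x for g in groups for x in g]
N = len(scores)
grand_mean = mean(scores)

ss_total_conceptual = ss(scores)
ss_total_computational = sum(x * x for x in scores) - sum(scores) ** 2 / N
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(ss(g) for g in groups)  # SS_1 + SS_2 + ... + SS_k

assert abs(ss_total_conceptual - ss_total_computational) < 1e-9
assert abs(ss_total_conceptual - (ss_between + ss_within)) < 1e-9
print(round(ss_total_conceptual, 2), round(ss_between, 2), round(ss_within, 2))  # 5.19 1.07 4.12
```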
Computations for the ANOVA: df terms
• dftotal = N − 1:
– degrees of freedom associated with SStotal
– N is the total number of scores
• dfbetween = k − 1:
– degrees of freedom associated with SSbetween
– k is the number of groups (samples)
• dfwithin (or dferror) = dftotal − dfbetween = N − k:
– degrees of freedom associated with SSwithin
– Can also be computed as:
$$df_{\text{within}} = df_1 + df_2 + \dots + df_k = (n_1 - 1) + (n_2 - 1) + \dots + (n_k - 1)$$
The One-Way ANOVA: Steps
1. State Hypotheses
2. Compute the F-ratio statistic:
$$F(df_{\text{between}}, df_{\text{within}}) = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$
– For data in which I give you raw scores, you will have to compute:
• Sample means
• SStotal, SSbetween, & SSwithin
• dftotal, dfbetween, & dfwithin
3. Use the F distribution table to find the critical F value that marks the boundary of the rejection region
4. Make a decision: does the F-statistic for your sample fall into the rejection region?
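Steps 2 through 4 can be collected into one small function. The sketch below is a hypothetical helper of our own (not a library API) written with the Python standard library only; it builds the summary-table quantities from raw scores using the SS and df formulas above. Because the standard library has no F distribution, the critical value is passed in from the F table rather than computed. The demo uses the instructor data from the second example in this lecture, with F_crit = 3.44 for (2, 22) df at α = .05 from the F table.

```python
def one_way_anova(groups, f_crit):
    """One-way between-subjects ANOVA summary from raw scores.

    groups: list of lists of scores (group sizes may differ).
    f_crit: critical F value looked up in an F table for (k - 1, N - k) df.
    """
    scores = [x for g in groups for x in g]
    N, k = len(scores), len(groups)
    grand_mean = sum(scores) / N

    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_between             # by subtraction

    df_between, df_within = k - 1, N - k          # df_total = N - 1
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f = ms_between / ms_within
    return {
        "df": (df_between, df_within, N - 1),
        "SS": (ss_between, ss_within, ss_total),
        "MS": (ms_between, ms_within),
        "F": f,
        "reject_H0": f > f_crit,
    }

instructors = [
    [73, 71, 76, 68, 65, 72, 75],              # Dr. M
    [62, 66, 66, 66, 58, 61, 67, 67, 67, 62],  # Dr. K
    [72, 68, 70, 62, 69, 66, 65, 68],          # Dr. A
]
result = one_way_anova(instructors, f_crit=3.44)
print(result["df"], round(result["F"], 2), result["reject_H0"])  # (2, 22, 24) 9.52 True
```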
The One-Way ANOVA: Textbook Example
A psychologist wants to determine whether having a job interferes with student academic performance. She measures academic performance using students’ GPAs. She selects a sample of 30 students.
• Of these students, 10 did not work, 10 worked part-time, and 10 worked full-time during the previous semester
• Conduct an ANOVA at a .05 level of significance testing the hypothesis that having a job interferes with student performance
Work Status
No Work Part-Time Full-Time
3.40 3.50 2.90
3.20 3.60 3.00
3.00 2.70 2.60
3.00 3.50 3.30
3.50 3.80 3.70
3.80 2.90 2.70
3.60 3.40 2.40
4.00 3.20 2.50
3.90 3.30 3.30
2.90 3.10 3.40
M1 = 3.43 M2 = 3.3 M3 = 2.98 MT = 3.24
n1 = 10 n2 = 10 n3 = 10 N = 30
$\sum x_T = 97.10$
$\sum x_T^2 = 319.47$
Source df SS MS F
Between
Within (error)
Total
Set up a summary ANOVA table:
1. Compute degrees of freedom
$$df_{\text{total}} = N - 1 = 29$$
$$df_{\text{between}} = k - 1 = 2$$
$$df_{\text{within}} = N - k = 27$$
Source df SS MS F
Between 2
Within (error) 27
Total 29
Set up a summary ANOVA table:
2. Compute SStotal
$$SS_{\text{total}} = \sum x_T^2 - \frac{\left(\sum x_T\right)^2}{N} = 319.47 - \frac{97.10^2}{30} = 319.47 - 314.28 = 5.19$$
Source df SS MS F
Between 2
Within (error) 27
Total 29 5.19
Set up a summary ANOVA table:
3. Compute SSbetween (or SSwithin) directly
$$SS_{\text{between}} = \sum n_i (M_i - M_T)^2 = 10(3.43 - 3.24)^2 + 10(3.3 - 3.24)^2 + 10(2.98 - 3.24)^2 = 0.361 + 0.036 + 0.676 = 1.07$$
Source df SS MS F
Between 2 1.07
Within (error) 27
Total 29 5.19
Set up a summary ANOVA table:
4. Compute the missing SS value
(SSbetween or SSwithin) via subtraction:
$$SS_{\text{within}} = SS_{\text{total}} - SS_{\text{between}} = 5.19 - 1.07 = 4.12$$
Source df SS MS F
Between 2 1.07
Within (error) 27 4.12
Total 29 5.19
Set up a summary ANOVA table:
5. Compute the MS values needed to
compute the F ratio:
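$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} = \frac{1.07}{2} = 0.535$$
$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{4.12}{27} = 0.153$$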
Source df SS MS F
Between 2 1.07 0.535
Within (error) 27 4.12 0.153
Total 29 5.19
Set up a summary ANOVA table:
6. Compute the F ratio:
$$F(df_{\text{between}}, df_{\text{error}}) = F(2, 27) = \frac{MS_{\text{between}}}{MS_{\text{error}}} = \frac{0.535}{0.153} = 3.50$$
1 2 3 4 5 6 7 8 9 10
1 161.45 199.50 215.71 224.58 230.16 233.99 236.77 238.88 240.54 241.88
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08
50 4.03 3.18 2.79 2.56 2.40 2.29 2.20 2.13 2.07 2.03
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99
120 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91
200 3.89 3.04 2.65 2.42 2.26 2.14 2.06 1.98 1.93 1.88
500 3.86 3.01 2.62 2.39 2.23 2.12 2.03 1.96 1.90 1.85
1000 3.85 3.00 2.61 2.38 2.22 2.11 2.02 1.95 1.89 1.84
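F table for α = 0.05 (columns: df numerator, i.e., dfbetween; rows: df error, i.e., dfwithin). An F statistic larger than the tabled value falls in the reject-H0 region.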
Source df SS MS F
Between 2 1.07 0.535 3.50
Within (error) 27 4.12 0.153
Total 29 5.19
Set up a summary ANOVA table:
7. Compare computed F statistic with
Fcrit and make a decision
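$$F = 3.50 > F_{\text{crit}}(2, 27) = 3.37 \;\Rightarrow\; \text{reject } H_0$$
(The F table has no row for df error = 27, so the value for the nearest smaller df, 26, is used; this is slightly conservative.)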
Conclusion: Having a job does significantly interfere with academic performance
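The slide rounds at every step (for example, MT = 3.24). A short Python sketch, using only the summary values printed on the slide (group means, n = 10 per group, ΣxT and Σx²T), repeats the calculation without intermediate rounding. It gives F ≈ 3.52 rather than 3.50; the decision against the tabled F_crit = 3.37 is unchanged.

```python
# Summary values from the GPA example (no work, part-time, full-time).
means = [3.43, 3.30, 2.98]
ns = [10, 10, 10]
sum_x, sum_x2 = 97.10, 319.47   # sum of all scores and sum of all squared scores
f_crit = 3.37                   # F table, alpha = .05, df = (2, 26): nearest row below 27

N, k = sum(ns), len(means)
grand_mean = sum_x / N                                    # unrounded M_T
ss_total = sum_x2 - sum_x ** 2 / N                        # computational formula
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
F = ms_between / ms_within
print(round(ss_total, 3), round(ss_between, 3), round(ss_within, 3))
print(round(F, 2), "reject H0" if F > f_crit else "retain H0")
```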
The One-way ANOVA: Example 2
Return to our running example:
Do test scores vary as a function of the instructor?
• x1 : sample scores from Dr. M’s class
• x2 : sample scores from Dr. K’s class
• x3 : sample scores from Dr. A’s class
• Null Hypothesis H0: µ1 = µ2 = µ3
• Research Hypothesis H1: at least one of the population means is different
• Do we accept or reject the null hypothesis? – Assume α = 0.05
Source df SS MS F
Between
Within (error)
Total 465.04
Set up a summary ANOVA table:
1. Compute degrees of freedom
$$df_{\text{total}} = N - 1 = 24$$
$$df_{\text{between}} = k - 1 = 2$$
$$df_{\text{within}} = N - k = 22$$
Instructor (scores listed by class; the classes differ in size)
Dr. M: 73 71 76 68 65 72 75
Dr. K: 62 66 66 66 58 61 67 67 67 62
Dr. A: 72 68 70 62 69 66 65 68
n1 = 7 n2 = 10 n3 = 8 N = 25
M1 = 71.43 M2 = 64.20 M3 = 67.50 MT = 67.28
SS1 = 89.71 SS2 = 91.60 SS3 = 68.00 SST = 465.04
Source df SS MS F
Between 2
Within (error) 22
Total 24 465.04
Set up a summary ANOVA table:
3. Compute SSwithin (or SSbetween) directly
(This time, we’ll compute SSwithin)
$$SS_{\text{within}} = \sum SS_j = 89.71 + 91.60 + 68.00 = 249.31$$
Set up a summary ANOVA table:
Source df SS MS F
Between 2
Within (error) 22 249.31
Total 24 465.04
4. Compute the missing SS value (SSbetween or SSwithin) via subtraction:
$$SS_{\text{between}} = SS_{\text{total}} - SS_{\text{within}} = 465.04 - 249.31 = 215.73$$
Set up a summary ANOVA table:
Source df SS MS F
Between 2 215.73
Within (error) 22 249.31
Total 24 465.04
5. Compute the MS values needed to compute the F ratio:
$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} = \frac{215.73}{2} = 107.87$$
$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{249.31}{22} = 11.33$$
Set up a summary ANOVA table:
Source df SS MS F
Between 2 215.73 107.87
Within (error) 22 249.31 11.33
Total 24 465.04
6. Compute the F ratio:
$$F(df_{\text{between}}, df_{\text{error}}) = F(2, 22) = \frac{MS_{\text{between}}}{MS_{\text{error}}} = \frac{107.87}{11.33} = 9.52$$
Set up a summary ANOVA table:
Source df SS MS F
Between 2 215.73 107.87 9.52
Within (error) 22 249.31 11.33
Total 24 465.04
7. Compare computed F statistic with Fcrit and make a decision
$$F = 9.52 > F_{\text{crit}}(2, 22) = 3.44 \;\Rightarrow\; \text{reject } H_0$$
Conclusion: Student test scores do vary across instructors
Effect Size for the One-Way ANOVA
• For ANOVAs, effect sizes are usually indicated using the R²-family measure eta-squared (η²)
• R²-family measures indicate the effect size in terms of the proportion of variance accounted for by the treatment effect(s):
$$\eta^2 = R^2 = \frac{\text{variability explained by treatment effect}}{\text{total variability}} = \frac{SS_{\text{between}}}{SS_{\text{total}}}$$
For our example:
$$\eta^2 = \frac{215.73}{465.04} = 0.46$$
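η² is just a ratio of two entries from the summary table. The small Python helper below is ours; it takes SSbetween and SStotal from the instructor example and reports the proportion of variance explained.

```python
def eta_squared(ss_between, ss_total):
    """Proportion of total variability accounted for by the treatment effect."""
    if ss_total <= 0:
        raise ValueError("SS_total must be positive")
    return ss_between / ss_total

eta2 = eta_squared(215.73, 465.04)
print(f"eta^2 = {eta2:.2f}")
```

In words, about 46% of the variability in test scores in this sample is associated with which instructor a student had.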
Post-hoc Tests for Multiple Comparisons
• Rejecting H0 only tells us that the omnibus null hypothesis (that all population means are equal) is false
• However, we are often interested in knowing which particular means differ from each other
• Evaluating differences (usually pairwise) beyond the omnibus null hypothesis requires post-hoc testing
Post-hoc Tests
• The challenge in constructing a post-hoc multiple comparison test is keeping the experimentwise α low while maximizing the power of the test
– Power refers to the ability of a statistical test to detect true differences between population means
• Researchers use many different post-hoc tests tailored to particular families of comparisons. Most of these tests are based on the t test.
• We will cover two such tests:
– Fisher’s LSD (protected t test)
– The Bonferroni procedure
Fisher’s Least Significant Difference (LSD) Test
• Fisher’s LSD (protected t) test was the first proposed method for post-hoc pairwise comparisons
• It is nearly identical to the independent measures t-test. The only differences are that the denominator uses MSerror in place of pooled variance and uses dferror as the degrees of freedom for the t-statistic
• The t is “protected” in that the omnibus null hypothesis must be rejected for this test to be valid
– The test is fairly liberal, producing a higher than intended experimentwise α for post-hoc tests involving more than 3 pairwise comparisons
$$t_{df_{\text{error}}} = \frac{M_1 - M_2}{\sqrt{\dfrac{MS_{\text{error}}}{n_1} + \dfrac{MS_{\text{error}}}{n_2}}}$$
The Bonferroni Procedure
• The Bonferroni procedure simply adjusts the pairwise alpha for a group of comparisons to ensure that, in the worst-case scenario, the experimentwise alpha will never exceed 0.05
• The worst case occurs when false rejections of H0 in the different pairwise comparisons are mutually exclusive events
• In this case, via the additive rule, the probability of falsely rejecting H0 in m comparisons is α1 + … + αm = mα
The Bonferroni Procedure
• Thus, the Bonferroni procedure requires that you divide the desired experimentwise alpha by the number of comparisons to get the pairwise alpha.
– For example, if you wanted to make 10 pairwise comparisons at a desired experimentwise α of 0.05, you would choose the rejection region using a pairwise criterion of α/10 = 0.005
• The Bonferroni procedure is a very conservative test. It is guaranteed to keep the experimentwise Type I error rate at or below α, but it is more likely to lead to Type II errors (retaining H0 when it is false).
• The t statistic for the Bonferroni procedure is computed exactly as for Fisher’s LSD test; only the critical value changes, because it is based on the corrected pairwise α.
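A minimal Python sketch (helper names ours) makes the trade-off concrete. It computes the Bonferroni pairwise α and, for comparison, the experimentwise error rate that would result if the m tests were independent (the multiplicative-law calculation from earlier in this lecture). The additive bound mα is never exceeded, and for independent tests the actual rate falls slightly below it, which is one reason the procedure is described as conservative.

```python
def bonferroni_alpha(alpha_ew, m):
    """Pairwise alpha that caps the experimentwise alpha at alpha_ew over m comparisons."""
    return alpha_ew / m

def ew_rate_independent(alpha_pc, m):
    """Experimentwise Type I error rate if the m tests were independent."""
    return 1 - (1 - alpha_pc) ** m

for m in (3, 10):
    a = bonferroni_alpha(0.05, m)
    print(m, round(a, 4), round(ew_rate_independent(a, m), 4))
```

For 10 comparisons, for example, the pairwise α is 0.005 and the independent-test experimentwise rate works out to about 0.049.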
Post hoc tests: Example (Fisher’s LSD)
ANOVA Summary Table
Source df SS MS F
Between 2 1.07 0.535 3.50
Within (error) 27 4.12 0.153
Total 29 5.19
Group means: M1 = 3.43, M2 = 3.3, M3 = 2.98
Let’s do all possible comparisons: {1,2}, {1,3}, {2,3}
t-statistic for Fisher’s LSD test when comparing {A,B}:
$$t_{df_{\text{error}}} = \frac{M_A - M_B}{\sqrt{\dfrac{MS_{\text{error}}}{n_A} + \dfrac{MS_{\text{error}}}{n_B}}}$$
First, note that the denominator is the same for all comparisons, because n1 = n2 = n3 = 10:
$$t_{27} = \frac{M_A - M_B}{\sqrt{\dfrac{0.153}{10} + \dfrac{0.153}{10}}} = \frac{M_A - M_B}{\sqrt{0.0306}} = \frac{M_A - M_B}{0.175}$$
Post hoc tests: Example (Fisher’s LSD)
Let’s do all possible comparisons: {1,2}, {1,3}, {2,3}
Now we simply apply this formula to all comparisons (M1 = 3.43, M2 = 3.3, M3 = 2.98):
{1,2}: $t_{27} = \dfrac{M_1 - M_2}{0.175} = \dfrac{3.43 - 3.3}{0.175} = \dfrac{0.13}{0.175} = 0.74$
{1,3}: $t_{27} = \dfrac{M_1 - M_3}{0.175} = \dfrac{3.43 - 2.98}{0.175} = \dfrac{0.45}{0.175} = 2.571$
{2,3}: $t_{27} = \dfrac{M_2 - M_3}{0.175} = \dfrac{3.3 - 2.98}{0.175} = \dfrac{0.32}{0.175} = 1.83$
Post hoc tests: Example (Bonferroni)
Let’s do all possible comparisons: {1,2}, {1,3}, {2,3}
The t statistics are exactly the same as for Fisher’s LSD: t27 = 0.74 for {1,2}, 2.571 for {1,3}, and 1.83 for {2,3}. Only the criterion changes.
We have three comparisons, so the Bonferroni-corrected α is
$$\alpha_{\text{pairwise}} = \frac{0.05}{\#\,\text{comparisons}} = \frac{0.05}{3} = .017$$
Level of significance for one-tailed test
0.25 0.2 0.15 0.1 0.05 0.025 0.01 0.005 0.0005
Level of significance for two-tailed test
df 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
1 1.000 1.376 1.963 3.078 6.314 12.706 31.821 63.657 636.619
2 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.599
3 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 12.924
4 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 6.869
6 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 5.408
8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.587
11 0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 4.318
13 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 4.073
16 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.850
21 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.768
24 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.725
26 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.690
28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.659
30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.646
40 0.681 0.851 1.050 1.303 1.684 2.021 2.423 2.704 3.551
50 0.679 0.849 1.047 1.299 1.676 2.009 2.403 2.678 3.496
100 0.677 0.845 1.042 1.290 1.660 1.984 2.364 2.626 3.390
t-Distribution Table
[Figure: one-tailed test with area α beyond t; two-tailed test with area α/2 beyond each of −t and t]
Post hoc tests: Example
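Let’s do all possible comparisons: {1,2}, {1,3}, {2,3}
Applying the formula to each comparison gives t27 = 0.74 for {1,2}, 2.571 for {1,3}, and 1.83 for {2,3}. Compare each against the two-tailed critical value from the t table at df = 27:
Fisher’s LSD (α = 0.05, tcrit = 2.052):
{1,2}: 0.74 < 2.052, retain H0
{1,3}: 2.571 > 2.052, reject H0
{2,3}: 1.83 < 2.052, retain H0
Bonferroni (α = 0.017, tcrit ≈ 2.473):
{1,2}: 0.74 < 2.473, retain H0
{1,3}: 2.571 > 2.473, reject H0
{2,3}: 1.83 < 2.473, retain H0
(The t table has no column for α = 0.017, so 2.473, the value for two-tailed α = 0.02, is used as an approximation. This is slightly liberal; the exact critical value is about 2.55, which t = 2.571 still exceeds, so the decisions are unchanged.)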
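The post hoc arithmetic can be reproduced with a short Python sketch. The helper `lsd_t` is our own; it uses MSerror = 0.153 and n = 10 per group from the summary table. The two-tailed critical values are taken from the t table at df = 27 (2.052 for α = 0.05, and 2.473 from the α = 0.02 column as the stand-in for the Bonferroni α = 0.017, as on the slide), because the Python standard library has no t distribution.

```python
from itertools import combinations
from math import sqrt

means = {1: 3.43, 2: 3.30, 3: 2.98}
n = {1: 10, 2: 10, 3: 10}
ms_error = 0.153
critical = {"Fisher's LSD": 2.052, "Bonferroni": 2.473}  # t table, df = 27, two-tailed

def lsd_t(a, b):
    """Fisher's LSD t statistic: MS_error takes the place of the pooled variance."""
    se = sqrt(ms_error / n[a] + ms_error / n[b])
    return (means[a] - means[b]) / se

for method, t_crit in critical.items():
    print(method)
    for a, b in combinations(sorted(means), 2):
        t = lsd_t(a, b)
        decision = "reject H0" if abs(t) > t_crit else "retain H0"
        print(f"  {{{a},{b}}}: t = {t:.2f}, {decision}")
```

Both procedures reject H0 only for {1,3} (no work vs. full-time) here; the Bonferroni criterion would matter for a comparison whose t fell between 2.052 and 2.473.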