Page 1: Introduction to the Analysis of Variance

Introduction to the Analysis of Variance

Chapter 14

Page 2: Introduction to the Analysis of Variance

Chapter Topics

• Basic ANOVA Concepts
  • The omnibus null hypothesis
  • Why ANOVA?
  • Linear Model Equation
  • Sums of Squares
  • Mean Squares & Degrees of Freedom

• The Completely Randomized Design
  • Computational Formulae and Procedures
  • Assumptions

Page 3: Introduction to the Analysis of Variance

Chapter Topics

• Multiple Comparison Procedures
  • Contrasts
  • A Posteriori Multiple Comparison Tests
  • A Priori Multiple Comparison Tests

• Practical Significance

Page 4: Introduction to the Analysis of Variance

The Omnibus Null Hypothesis

$H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4 = \dots = \mu_p$

• Each element denotes the mean of a different population.
• The alternative states that at least two of the population means are not equal.
• If we reject, we don’t know which of the means are different.

Page 5: Introduction to the Analysis of Variance

Why ANOVA?

Why not tons of t tests? Consider the case when we have five means. Our omnibus null hypothesis looks like:

$H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$

We would have to compute ten different t tests in order to compare each mean with every other mean in the group.

• The number of two-sample tests that can be formed from p means is given by:

$\binom{p}{2} = \frac{p(p-1)}{2} = \frac{5(5-1)}{2} = 10$

• The type I error rate for these ten tests collectively is close to:

$1 - (1 - \alpha)^C = 1 - (1 - .05)^{10} \approx .40$
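As a quick check of this arithmetic, here is a minimal Python sketch (not part of the original slides) that reproduces both numbers:

```python
from math import comb

p = 5          # number of group means
alpha = 0.05   # per-test type I error rate

num_tests = comb(p, 2)                      # p(p-1)/2 = 10 pairwise t tests
familywise = 1 - (1 - alpha) ** num_tests   # P(at least one type I error), assuming independent tests

print(num_tests)             # 10
print(round(familywise, 3))  # 0.401
```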

Page 6: Introduction to the Analysis of Variance

Why ANOVA?

ANOVA allows us to . . .
• . . . test as many means as we would like in a single, quick test
• . . . control the type I error rate at an acceptable level

Page 7: Introduction to the Analysis of Variance

The Linear Model Equation

Each score can be thought of as a composite consisting of three separate parameters:

$X_{ij} = \mu + \alpha_j + \varepsilon_{ij}$

• $X_{ij}$ is the score for the $i$th person in the $j$th population
• $\mu$ is the grand mean of all scores in the experiment
• $\alpha_j$ is the effect of the $j$th population on this subject’s score
• $\varepsilon_{ij}$ is the random error effect on this subject’s score

This may make more sense when we see how to estimate these parameters.

Page 8: Introduction to the Analysis of Variance

The Linear Model Equation

Estimating the Parameters

$X_{ij} = \bar{X} + (\bar{X}_j - \bar{X}) + (X_{ij} - \bar{X}_j)$

• $\bar{X}$, the sample grand mean, estimates $\mu$
• $\bar{X}_j - \bar{X}$, the population (“treatment”) effect, estimates $\alpha_j$
• $X_{ij} - \bar{X}_j$, the error effect, estimates $\varepsilon_{ij}$

Example: Suppose I was one of 40 people taking an IQ test, and I received a score of 131. The average of all 40 people was 100. The average of people in my group (grad students) was 124. Then:

131 = 100 + (124 – 100) + (131 – 124)

where (124 – 100) = 24 is the “treatment” effect and (131 – 124) = 7 is the error effect.
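A tiny Python sketch of this decomposition, using the numbers from the example (the variable names are ours, not the slides’):

```python
# The IQ example: score = grand mean + treatment effect + error effect
score      = 131    # X_ij: my score
grand_mean = 100    # X-bar: mean of all 40 test-takers
group_mean = 124    # X-bar_j: mean of my group (grad students)

treatment_effect = group_mean - grand_mean   # 124 - 100 = 24
error_effect     = score - group_mean        # 131 - 124 = 7

assert score == grand_mean + treatment_effect + error_effect   # 131 = 100 + 24 + 7
```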

Page 9: Introduction to the Analysis of Variance

Sums of Squares

For this design, we are going to have three terms for sums of squares:

• SSTO (Sums of Squares Total): measures total variability among scores in an experiment
• SSBG (Sums of Squares Between Groups): measures variability between treatment levels
• SSWG (Sums of Squares Within Groups): measures variability within treatment levels

Page 10: Introduction to the Analysis of Variance

Sums of Squares

$SSTO = \sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij} - \bar{X})^2$

$SSBG = n\sum_{j=1}^{p}(\bar{X}_j - \bar{X})^2$

$SSWG = \sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij} - \bar{X}_j)^2$

Page 11: Introduction to the Analysis of Variance

Mean Squares

$MSTO = \frac{\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij} - \bar{X})^2}{np - 1}$

This is just another name for the total sample variance!

$MSWG = \frac{\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij} - \bar{X}_j)^2}{p(n-1)}$

This is the average variability within treatment levels.

$MSBG = \frac{n\sum_{j=1}^{p}(\bar{X}_j - \bar{X})^2}{p - 1}$

This is the average variability between treatment levels.
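To make the definitional formulas on the last two slides concrete, here is a short NumPy sketch; the data are illustrative, not from the chapter:

```python
import numpy as np

# Illustrative data (not from the chapter): p = 3 treatment levels, n = 4 scores each
scores = np.array([[2.0, 4.0, 3.0, 5.0],
                   [6.0, 7.0, 5.0, 6.0],
                   [9.0, 8.0, 10.0, 9.0]])
p, n = scores.shape

grand_mean  = scores.mean()          # X-bar
group_means = scores.mean(axis=1)    # X-bar_j for each treatment level

ssto = ((scores - grand_mean) ** 2).sum()             # total variability
ssbg = n * ((group_means - grand_mean) ** 2).sum()    # between-group variability
sswg = ((scores - group_means[:, None]) ** 2).sum()   # within-group variability
assert np.isclose(ssto, ssbg + sswg)                  # the partition always holds

msto = ssto / (n * p - 1)     # the total sample variance
msbg = ssbg / (p - 1)         # average variability between treatment levels
mswg = sswg / (p * (n - 1))   # average variability within treatment levels
print(msto, msbg, mswg)
```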

Page 12: Introduction to the Analysis of Variance

Mean Square Between Groups

$MSBG = \frac{n\sum_{j=1}^{p}(\bar{X}_j - \bar{X})^2}{p - 1}$

Interpreting MSBG:
• If MSBG is close to zero, there is not a lot of variability between the treatment levels. If there is little or no variation between the treatment levels but a great deal of overall variation (MSTO), it must be that the variability within treatment levels accounts for most of the total variation.
• An increase in MSBG indicates a higher amount of variability between treatment levels. If there is high variation between the treatment levels relative to the overall variation, it must be that the variability between treatment levels accounts for most of the total variation.

Page 13: Introduction to the Analysis of Variance

Mean Square Within Groups

$MSWG = \frac{\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij} - \bar{X}_j)^2}{p(n-1)}$

Interpreting MSWG:
• If MSWG is close to zero, there is not a lot of variability within the treatment levels. If there is little or no variation within the treatment levels relative to overall variation (MSTO), it must be that the variability between treatment levels accounts for most of the total variation.
• An increase in MSWG indicates a higher amount of variability within treatment levels. If there is high variation within the treatment levels relative to the overall variation, it must be that the variability between treatment levels does not account for an appreciable portion of the total variation.

Page 14: Introduction to the Analysis of Variance

Summarizing SS & MS – ANOVA Table

Source           Sums of Squares   df           Mean Squares   F
Between Groups   SSBG              p − 1        MSBG           MSBG/MSWG
Within Groups    SSWG              p(n − 1)     MSWG
Total            SSTO              np − 1       MSTO

Page 15: Introduction to the Analysis of Variance

Measuring Variability

What we want as researchers is for our “treatment levels” to account for more variation than does random error. Enter the F statistic.

• Recall from previous chapters that the F statistic is defined as the ratio of two independent variances.
• MSBG and MSWG are both measures of variance, and they are independent of one another. Forming a ratio of these two numbers yields an F statistic:

$F = \frac{MSBG}{MSWG}$

• Because we want our “treatment levels” to account for most of the variation, we want this statistic to be as large as possible.
• How large is large enough?

Page 16: Introduction to the Analysis of Variance

Measuring Variability (continued)

Expected Mean Squares

Recall the assumptions:
• Random sampling / random assignment
• Normally distributed populations
• Equal variances
• Equal means?????

If the means are equal in each of the populations, it can be shown that the expected value of BOTH of the mean squares terms is:

$E(MSBG) = E(MSWG) = \sigma_\varepsilon^2$

This is the population error variance. If the means are not equal in each of the populations, it can be shown that the expected values of the mean squares are:

$E(MSBG) = \sigma_\varepsilon^2 + \frac{n\sum_{j=1}^{p}\alpha_j^2}{p - 1} \qquad\qquad E(MSWG) = \sigma_\varepsilon^2$
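A small Monte Carlo sketch (our illustration, not from the slides) makes the first claim tangible: when the population means are equal, both mean squares hover around $\sigma_\varepsilon^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, sigma = 4, 8, 1.0    # equal population means; error variance sigma**2 = 1

msbg_vals, mswg_vals = [], []
for _ in range(10_000):
    x = rng.normal(0.0, sigma, size=(p, n))   # H0 true: every group drawn from the same population
    group_means = x.mean(axis=1)
    msbg_vals.append(n * ((group_means - x.mean()) ** 2).sum() / (p - 1))
    mswg_vals.append(((x - group_means[:, None]) ** 2).sum() / (p * (n - 1)))

# Both averages should sit near sigma**2 = 1 when the means are equal
print(np.mean(msbg_vals), np.mean(mswg_vals))
```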

Page 17: Introduction to the Analysis of Variance

Measuring Variability (continued)

If the population means are not equal . . .
• . . . we have the expected values displayed above.
• . . . when we form the F statistic, we see that the ratio will become larger than one as the size of the “treatment effects” grows.

If the population means are equal . . .
• . . . the F ratio will be close to one.

Sooooo . . . how far away from one is far enough to say the means are different? Enter the F table with p − 1 degrees of freedom in the numerator and p(n − 1) degrees of freedom in the denominator.

Page 18: Introduction to the Analysis of Variance

The Completely Randomized Design

So far, we’ve been talking about ANOVA in general. The Completely Randomized Design (CR-p) is one of many designs. It is characterized by:
• One treatment with p levels
• N = n·p participants
• Participants are randomly assigned to the treatment levels
  • We usually want to restrict this so that each treatment level has the same number of participants

Differences from other designs:
• Each participant is randomly assigned to only one treatment level. Participants are not administered more than one treatment.
• We don’t have to worry about participant matching, independent samples, repeated measures, etc.

Page 19: Introduction to the Analysis of Variance

Computational Procedures for CR-p

$SSTO = \sum_{j=1}^{p}\sum_{i=1}^{n}X_{ij}^2 - \frac{\left(\sum_{j=1}^{p}\sum_{i=1}^{n}X_{ij}\right)^2}{np}$

$SSBG = \sum_{j=1}^{p}\frac{\left(\sum_{i=1}^{n}X_{ij}\right)^2}{n} - \frac{\left(\sum_{j=1}^{p}\sum_{i=1}^{n}X_{ij}\right)^2}{np}$

$SSWG = \sum_{j=1}^{p}\sum_{i=1}^{n}X_{ij}^2 - \sum_{j=1}^{p}\frac{\left(\sum_{i=1}^{n}X_{ij}\right)^2}{n}$

Consider an example where we are interested in the effects of sleep deprivation on hand-steadiness. That is, we want to know whether the amount of sleep deprivation experienced has an effect on hand-steadiness. We, as researchers, decide to have four treatment levels:

• a1 – 12 hours of sleep deprivation
• a2 – 18 hours of sleep deprivation
• a3 – 24 hours of sleep deprivation
• a4 – 30 hours of sleep deprivation

We have a total of 32 subjects, so we need to randomly assign them to one of these four groups.

Page 20: Introduction to the Analysis of Variance

Computational Procedures for CR-p

We ran the experiment with the 32 subjects randomly assigned so that eight were in each group. For each subject, the dependent variable we recorded was the number of times during a two-minute interval that a stylus made contact with the side of a half-inch hole. The data are:

Treatment Levels

a1 a2 a3 a4

4 4 5 3

6 5 6 5

3 4 5 6

3 3 4 5

1 2 3 6

3 3 4 7

2 4 3 8

2 3 4 10

(continued)

Page 21: Introduction to the Analysis of Variance

Computational Procedures for CR-p (continued)

Using the data above, we compute the sums of squares in two steps. First, the three bracket terms:

$[X] = \frac{\left(\sum_{j=1}^{p}\sum_{i=1}^{n}X_{ij}\right)^2}{np} = \frac{(136)^2}{(8)(4)} = 578$

$[A] = \sum_{j=1}^{p}\frac{\left(\sum_{i=1}^{n}X_{ij}\right)^2}{n} = \frac{(24)^2}{8} + \frac{(28)^2}{8} + \frac{(34)^2}{8} + \frac{(50)^2}{8} = 627$

$[AS] = \sum_{j=1}^{p}\sum_{i=1}^{n}X_{ij}^2 = 4^2 + 6^2 + \dots + 10^2 = 688$

Then:

$SSTO = [AS] - [X] = 688 - 578 = 110$

$SSBG = [A] - [X] = 627 - 578 = 49$

$SSWG = [AS] - [A] = 688 - 627 = 61$
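The bracket-term computation can be verified with a few lines of NumPy (our sketch; the data are the slides’):

```python
import numpy as np

# The sleep-deprivation data from the slides; one row per treatment level (a1..a4)
data = np.array([[4, 6, 3, 3, 1, 3, 2, 2],      # a1
                 [4, 5, 4, 3, 2, 3, 4, 3],      # a2
                 [5, 6, 5, 4, 3, 4, 3, 4],      # a3
                 [3, 5, 6, 5, 6, 7, 8, 10]],    # a4
                dtype=float)
p, n = data.shape

X  = data.sum() ** 2 / (n * p)            # [X]  = 136**2 / 32 = 578
A  = (data.sum(axis=1) ** 2 / n).sum()    # [A]  = 627
AS = (data ** 2).sum()                    # [AS] = 688

print(AS - X)   # SSTO = 110.0
print(A - X)    # SSBG = 49.0
print(AS - A)   # SSWG = 61.0
```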

Page 22: Introduction to the Analysis of Variance

Computational Procedures for CR-p (continued)

We use these numbers to begin our ANOVA table:

Source           SS    df             MS       F
Between Groups   49    p − 1 = 3      16.333   7.50*
Within Groups    61    p(n − 1) = 28   2.179
Total            110   np − 1 = 31

*p < .001

A few things to note about the ANOVA table:
• SSBG + SSWG = SSTO; if it doesn’t, you’ve made a mistake.
• dfBG + dfWG = dfTO; if it doesn’t, you’ve made a mistake.
• MSBG = SSBG/dfBG; MSWG = SSWG/dfWG; MSBG + MSWG ≠ MSTO; if it does, you’ve probably made a mistake.
• For a CR-p design, F = MSBG/MSWG.
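The whole table can also be verified in software. SciPy’s standard one-way ANOVA routine reproduces the F and p reported here (a sketch assuming SciPy is available):

```python
from scipy import stats

a1 = [4, 6, 3, 3, 1, 3, 2, 2]
a2 = [4, 5, 4, 3, 2, 3, 4, 3]
a3 = [5, 6, 5, 4, 3, 4, 3, 4]
a4 = [3, 5, 6, 5, 6, 7, 8, 10]

result = stats.f_oneway(a1, a2, a3, a4)
print(result.statistic)   # ~7.4973  (= MSBG / MSWG = 16.333 / 2.179)
print(result.pvalue)      # ~0.0008
```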

Page 23: Introduction to the Analysis of Variance

CR-p Procedures in JMP

Analyze | Fit Y by X | Measurement in Y box (Continuous) | Grouping Variable in X box (Nominal) | Means/Anova

This is the exact same sequence as for a t test for independent samples.

Page 24: Introduction to the Analysis of Variance

CR-p Procedures in JMP

[Figure: Oneway Analysis of Count By Treatment – plot of Count (0–12) by Treatment (a1–a4)]

Summary of Fit
Rsquare                      0.445455
Rsquare Adj                  0.386039
Root Mean Square Error       1.475998
Mean of Response             4.25
Observations (or Sum Wgts)   32

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Treatment    3         49.00000       16.3333    7.4973     0.0008
Error       28         61.00000        2.1786
C. Total    31        110.00000

Means for Oneway Anova
Level   Number   Mean      Std Error   Lower 95%   Upper 95%
a1           8   3.00000     0.52184      1.9311      4.0689
a2           8   3.50000     0.52184      2.4311      4.5689
a3           8   4.25000     0.52184      3.1811      5.3189
a4           8   6.25000     0.52184      5.1811      7.3189

Std Error uses a pooled estimate of error variance.

Page 25: Introduction to the Analysis of Variance

More on the F statistic

$F = \frac{MSBG}{MSWG}$

Using the critical value approach, we need to find the point of the F distribution with (3, 28) degrees of freedom that cuts off an area of α in the upper tail: $F_{.05;3,28} = 2.95$.

If our computed F exceeds this number, we reject the null hypothesis. Our computed F = 7.50 does, so we reject the null.

[Figure: F(3, 28) density with the rejection region beyond F.05;3,28 = 2.95; the computed F = 7.50 falls well inside it]
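Both the critical value and the exact tail area can be recovered from SciPy’s F distribution (our sketch, not part of the slides):

```python
from scipy import stats

df_between, df_within = 3, 28   # p - 1 and p(n - 1) for the sleep example

f_crit = stats.f.ppf(1 - 0.05, df_between, df_within)   # critical value, ~2.95
p_value = stats.f.sf(7.4973, df_between, df_within)     # exact tail area, ~0.0008

print(f_crit, p_value)
```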

Page 26: Introduction to the Analysis of Variance

More on the F statistic

(continued)

We determine the p-value in the same manner:
• Through JMP (exact): p = 0.0008
• Through tables (approximate): p < .01

[Figure: the same F(3, 28) density; the computed F = 7.50 lies beyond the critical value F.05;3,28 = 2.95]

Page 27: Introduction to the Analysis of Variance

Assumptions Associated with the CR-p Design

• The model equation reflects all the sources of variation that affect each score. If an experiment contains two treatments, the CR-p design is not appropriate.
• Participants are random samples from the respective populations, or the participants have been randomly assigned to treatment levels. This helps distribute idiosyncratic characteristics of participants randomly over the treatment levels.
• The p populations are normally distributed. The F test is robust with respect to departures from normality, especially when the populations are symmetric and the n’s are equal.
• The variances of the p populations are equal. The F test is robust with respect to heterogeneity of variances provided there is an equal number of observations in each treatment level, the populations are normal, and the ratio of the largest to smallest sample variance does not exceed three.

Page 28: Introduction to the Analysis of Variance

Multiple Comparisons

• If an omnibus null hypothesis is rejected, we don’t know which means differ. Multiple comparisons are a method of determining which means differ.
• Definitions:
  • Contrast: a difference among the means
  • A priori tests: when a researcher wishes to test a specific set of null hypotheses prior to gathering the data
  • A posteriori tests: when the data suggest sets of null hypotheses that are of interest to the researcher (also called post-hoc tests)

Page 29: Introduction to the Analysis of Variance

Contrasts

Contrasts are typically denoted by $\psi_i$ and $\hat{\psi}_i$. Contrasts can take a number of forms:

$\hat{\psi}_1 = \bar{X}_1 - \bar{X}_2 \qquad\qquad \hat{\psi}_4 = \bar{X}_1 - \frac{\bar{X}_2 + \bar{X}_3}{2}$

$\hat{\psi}_2 = \bar{X}_2 - \bar{X}_3 \qquad\qquad \hat{\psi}_5 = \bar{X}_2 - \frac{\bar{X}_1 + \bar{X}_3}{2}$

$\hat{\psi}_3 = \bar{X}_1 - \bar{X}_3 \qquad\qquad \hat{\psi}_6 = \bar{X}_3 - \frac{\bar{X}_1 + \bar{X}_2}{2}$

• Those in the left column are pairwise contrasts; that is, they compare two means.
• Those in the right column are non-pairwise contrasts.
• Each contrast has coefficients associated with it: the numbers multiplying the means in the contrast.
• The coefficients sum to 0; the sum of the absolute values of the coefficients is 2.

Page 30: Introduction to the Analysis of Variance

A Posteriori Multiple Comparison Tests

Tukey’s Multiple Comparison Test and Confidence Interval
• Tukey’s HSD (Honestly Significant Difference)
• Used for pairwise contrasts (two-tailed) when the n’s are equal
• Test statistic given by:

$q = \frac{\bar{X}_j - \bar{X}_{j'}}{\sqrt{MSWG/n}}$

• A test of the omnibus null hypothesis is not required beforehand.
• If |q| exceeds the critical value in Table D.10, we reject the null hypothesis and conclude that the two population means are significantly different.
• The confidence interval for the contrast is given by:

$\bar{X}_j - \bar{X}_{j'} \pm q_{\alpha;\,p,\nu}\sqrt{\frac{MSWG}{n}}$

Page 31: Introduction to the Analysis of Variance

A Posteriori Multiple Comparison Tests (continued)

Tukey’s Multiple Comparison Test and Confidence Interval

Inspection of our data suggests we might be interested in a contrast between the first and fourth means:

$q = \frac{6.25 - 3.00}{\sqrt{2.1786/8}} = 6.23 \qquad\qquad q_{.05;4,28} \approx q_{.05;4,24} = 2.92$

Because |q| > 2.92, we reject the hypothesis and conclude that the two population means are different.

Some of the other pairwise contrasts from these data are:

$q = \frac{4.25 - 3.50}{\sqrt{2.1786/8}} = 1.44 \qquad q = \frac{6.25 - 3.50}{\sqrt{2.1786/8}} = 5.27 \qquad q = \frac{4.25 - 3.00}{\sqrt{2.1786/8}} = 2.40$
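A sketch of the same comparisons in Python (assuming SciPy 1.8+ for tukey_hsd; the hand computation mirrors the slides’ q for a4 versus a1):

```python
import numpy as np
from scipy import stats

a1 = [4, 6, 3, 3, 1, 3, 2, 2]
a2 = [4, 5, 4, 3, 2, 3, 4, 3]
a3 = [5, 6, 5, 4, 3, 4, 3, 4]
a4 = [3, 5, 6, 5, 6, 7, 8, 10]

# Hand computation of q for the a4-versus-a1 contrast, mirroring the slide
mswg, n = 61 / 28, 8
q = (np.mean(a4) - np.mean(a1)) / np.sqrt(mswg / n)
print(q)   # ~6.23

# scipy's Tukey HSD reports every pairwise difference with an adjusted p-value
print(stats.tukey_hsd(a1, a2, a3, a4))
```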

Page 32: Introduction to the Analysis of Variance

A Posteriori Multiple Comparison Tests (continued)

Tukey’s Multiple Comparison Test and Confidence Interval
• Assumptions for Tukey’s Multiple Comparison Test:
  • Random sampling or random assignment of participants
  • The p populations are normally distributed
  • The p populations achieve homogeneity of variance
  • The sample n’s are equal
• When the sample n’s are unequal, use the Fisher-Hayter Multiple Comparison Test.
• When the populations have heterogeneous variances, use another procedure (another class).
• When the populations are not normal, use another procedure (another class).

Page 33: Introduction to the Analysis of Variance

A Posteriori Multiple Comparison Tests (continued)

Fisher-Hayter Multiple Comparison Test
• Two-step procedure for pairwise comparisons:
  • Test the omnibus null hypothesis.
  • If it is rejected, continue to the multiple comparisons.
• Two advantages over Tukey’s test:
  • Does not require equal n’s
  • More powerful than Tukey’s test for most data
• Test statistic is given by:

$q_{FH} = \frac{\bar{X}_j - \bar{X}_{j'}}{\sqrt{\frac{MSWG}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}}$

• Reject the null hypothesis if $|q_{FH}| > q_{\alpha;\,p-1,\nu}$.

Page 34: Introduction to the Analysis of Variance

A Posteriori Multiple Comparison Tests (continued)

Fisher-Hayter Multiple Comparison Test
• Assumptions:
  • Random sampling or random assignment of participants
  • The p populations are normally distributed
  • The variances of the p populations are homogeneous

Scheffé’s Multiple Comparison Test
• Should be used if any of the contrasts are non-pairwise.
• Preceding these tests with a test of the omnibus null hypothesis is unnecessary.
• Test statistic is given by:

$F_S = \frac{\left(c_1\bar{X}_1 + c_2\bar{X}_2 + \dots + c_p\bar{X}_p\right)^2}{MSWG\left(\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2} + \dots + \frac{c_p^2}{n_p}\right)}$

• Hypotheses are rejected if $F_S > (p-1)F_{\alpha;\,\nu_1,\nu_2}$.
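As an illustration (the contrast is our hypothetical choice, not one worked in the slides), here is Scheffé’s $F_S$ for the non-pairwise contrast $\psi = \mu_4 - (\mu_1 + \mu_2 + \mu_3)/3$ on the sleep data:

```python
import numpy as np
from scipy import stats

# Group means and MSWG from the sleep example
means = np.array([3.00, 3.50, 4.25, 6.25])
mswg, n, p = 61 / 28, 8, 4

# Hypothetical non-pairwise contrast: psi = mu4 - (mu1 + mu2 + mu3)/3
c = np.array([-1/3, -1/3, -1/3, 1.0])   # coefficients sum to 0; |c| sums to 2

psi_hat = c @ means                               # ~2.67
f_s = psi_hat ** 2 / (mswg * (c ** 2 / n).sum())  # F_S ~ 19.6

f_crit = (p - 1) * stats.f.ppf(0.95, p - 1, p * (n - 1))   # (p-1)*F_{.05;3,28} ~ 8.84
print(f_s, f_s > f_crit)    # the contrast is significant
```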

Page 35: Introduction to the Analysis of Variance

A Priori Multiple Comparison Tests

Dunn-Šidák Test
• An a priori test statistic for pairwise or non-pairwise, one- or two-sided contrast hypotheses.
• Divides the level of significance equally among a set of C tests by conducting each test at level $1 - (1 - \alpha)^{1/C}$; the probability of one or more type I errors is then less than $\alpha$.
• The test statistic is given by:

$t_{DS} = \frac{c_1\bar{X}_1 + c_2\bar{X}_2 + \dots + c_p\bar{X}_p}{\sqrt{MSWG\left(\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2} + \dots + \frac{c_p^2}{n_p}\right)}}$

• A non-directional hypothesis is rejected if $|t_{DS}| > t_{DS\,\alpha/2;\,C,\nu}$.
• A directional hypothesis is rejected if $t_{DS} > t_{DS\,\alpha;\,C,\nu}$.
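The per-test level is easy to compute; a minimal sketch (C = 6 here stands for, e.g., all pairwise contrasts among p = 4 means):

```python
alpha = 0.05   # desired familywise error rate
C = 6          # number of planned contrasts (6 = all pairwise contrasts among p = 4 means)

alpha_per_test = 1 - (1 - alpha) ** (1 / C)   # ~0.0085 per test
familywise = 1 - (1 - alpha_per_test) ** C    # recovers 0.05 exactly
print(alpha_per_test, familywise)
```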

Page 36: Introduction to the Analysis of Variance

Multiple Comparison Tests – Summary of Assumptions

• All multiple comparison tests we’ve discussed require:
  • Normal populations
  • Populations with homogeneous variances
  • Random sampling or random assignment
• Tukey’s procedure additionally requires equal sample n’s.
• For pairwise comparisons, we can use:
  • Tukey’s procedure (a posteriori – non-directional)
  • Fisher-Hayter procedure (a posteriori – non-directional)
  • Scheffé’s procedure (a posteriori – non-directional)
  • Dunn-Šidák procedure (a priori – directional or non-directional)
• For non-pairwise comparisons, we can use:
  • Scheffé’s procedure (a posteriori – non-directional)
  • Dunn-Šidák procedure (a priori – directional or non-directional)

Page 37: Introduction to the Analysis of Variance

Multiple Comparison Tests – Summary of Assumptions (continued)

• We can compute confidence intervals for:
  • Tukey’s (a posteriori – non-directional)
  • Scheffé’s (a posteriori – non-directional)
  • Dunn-Šidák (a priori – directional or non-directional)
• The omnibus null hypothesis must be tested before using:
  • Fisher-Hayter (a posteriori – non-directional)

Page 38: Introduction to the Analysis of Variance

Practical Significance

• Recall the difference between practical significance and statistical significance.
• For the ANOVA F test, strength of association can be measured with omega-squared, which is given by:

$\hat{\omega}^2 = \frac{SSBG - (p-1)MSWG}{SSTO + MSWG}$

• The values are interpreted as follows:
  • 0.010 – small association
  • 0.059 – medium association
  • 0.138 – large association
• We can define the effect size for contrasts as follows:

$g = \frac{\bar{X}_j - \bar{X}_{j'}}{\sqrt{MSWG}}$
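Plugging the sleep example’s numbers into these formulas (our sketch; the values come from the ANOVA table above):

```python
import numpy as np

# Values from the sleep example's ANOVA table
ssbg, ssto, mswg, p = 49.0, 110.0, 61 / 28, 4

omega_sq = (ssbg - (p - 1) * mswg) / (ssto + mswg)
print(omega_sq)   # ~0.38, a large association by the guidelines above

# Effect size g for the a4-versus-a1 contrast
g = (6.25 - 3.00) / np.sqrt(mswg)
print(g)          # ~2.20
```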

Page 39: Introduction to the Analysis of Variance

Chapter Review

• Basic ANOVA Concepts
  • The omnibus null hypothesis
  • Why ANOVA?
  • Linear Model Equation
  • Sums of Squares
  • Mean Squares & Degrees of Freedom

• The Completely Randomized Design
  • Computational Formulae and Procedures
  • Assumptions

Page 40: Introduction to the Analysis of Variance

Chapter Review

• Multiple Comparison Procedures
  • Contrasts
  • A Posteriori Multiple Comparison Tests
  • A Priori Multiple Comparison Tests

• Practical Significance