Psych 5510/6510, Chapter 11: One-way ANOVA: Models with a Single Categorical Predictor (Spring, 2009)

Page 1:

Psych 5510/6510

Chapter 11

One-way ANOVA: Models with a Single Categorical Predictor

Spring, 2009

Page 2:

Categorical Predictors

Using predictor variables that represent ‘nominal’ measures (scores that reflect qualitative or categorical differences).

This will allow us to extend our approach to designs that are traditionally analyzed with t tests (2 groups) or ANOVAs (2 or more groups).

Page 3:

Example

We want a model that predicts a team's batting average (Yi) based upon whether it is in the National or American League (Xi: our categorical predictor variable).

If National League: Xi = -1

If American League: Xi = +1

As ‘league’ is a nominal variable, any two numbers could be used (e.g. Xi = 23 and 512).

Page 4:

Symbols

'k' will stand for the level of the predictor variable (i.e. which group of scores we are talking about, group 1 or group 2).

λk (lambda) will be the value assigned to that variable for that group. In this case: λ1 = -1 (i.e. Xi =-1 for the scores in group 1)

λ2 = +1 (i.e. Xi =+1 for the scores in group 2)

‘m’ will stand for how many groups there are (2 in this case)

Page 5:

Team Batting Average (Y) League (X)

Atlanta 272 -1

Chicago 261 -1

... ... ...

Baltimore 269 +1

Boston 270 +1

... ... ...

Categorical variable 'league' has two values (i.e. m = 2). For level 1 (i.e. k=1) we will use a code of -1 (i.e. λ1 = -1) and for level 2 (i.e. k=2) we will use a code of +1 (i.e. λ2 = +1).

Page 6:

Contrast Codes

Contrast codes (which we will be using) are simply one of the many possible coding schemes for numerically representing categorical data. Contrast codes have two conditions:

1) The sum of the lambdas equals zero: Σλk = 0, summing over k = 1 to m. In this case, -1 + 1 = 0.

2) They are orthogonal (covered later).

Page 7:

Estimation and Inference

MODEL C: Ŷi = β0

Where β0 is the mean batting average for all teams.

MODEL A: Ŷi = β0 + β1Xi

Where Xi indicates to which league the team belongs (Xi = -1 or 1)

Page 8:

Least Squares Estimates

MODEL C: Ŷi = 260.78

MODEL A: Ŷi = 260.38 + 5.05Xi

260.78 is the mean batting average of all the teams.

260.38 and 5.05 are the intercept and slope of the regression equation for predicting Y based upon X (see next slide)
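The arithmetic behind these estimates can be sketched in a few lines of Python. The data below are hypothetical (the full set of team batting averages is not reproduced in this handout); with -1/+1 coding and equal group sizes, the intercept comes out to the mean of the two group means and the slope to half the difference between them.

    import numpy as np

    # Hypothetical batting averages, coded -1 (National) and +1 (American)
    y = np.array([250., 255., 260., 265., 262., 270., 268., 274.])
    x = np.array([-1., -1., -1., -1., 1., 1., 1., 1.])

    X = np.column_stack([np.ones_like(y), x])          # intercept column + league code
    b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]      # least squares estimates

    mean_nat, mean_am = y[x == -1].mean(), y[x == 1].mean()
    print(b0, (mean_nat + mean_am) / 2)   # intercept = unweighted mean of the group means
    print(b1, (mean_am - mean_nat) / 2)   # slope = half the difference between the means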

Page 9:

Scatter Plot

Ŷi = 260.38 + 5.05Xi

[Scatter plot of team batting average (Y) against league code (National = -1, American = +1), with the regression line shown.]

Page 10:

Analysis

SSE(C) = 3266.61
SSE(A) = 2608.10
SSR = 3266.61 - 2608.10 = 658.51

PRE to F Method:

PRE = SSR/SSE(C) = 658.51/3266.61 = .202

F*(1,24) = [PRE/(PA-PC)] / [(1-PRE)/(N-PA)] = (.202/1) / ((1-.202)/24) = 6.06

p = .02
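Here is a minimal sketch of the PRE-to-F computation, using the SSE values and degrees of freedom from this example (N = 26 teams, PA = 2, PC = 1); scipy's F distribution supplies the p value. It should reproduce roughly F = 6.06 and p = .02.

    from scipy import stats

    SSE_C, SSE_A = 3266.61, 2608.10
    N, PA, PC = 26, 2, 1

    SSR = SSE_C - SSE_A                                   # 658.51
    PRE = SSR / SSE_C                                     # about .202
    F = (PRE / (PA - PC)) / ((1 - PRE) / (N - PA))        # about 6.06
    p = stats.f.sf(F, PA - PC, N - PA)                    # about .02
    print(SSR, PRE, F, p)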

Page 11:

Compare to Chapters 6 & 7

So far everything is the same as in Chapters 6 & 7.

1) The intercept is still the predicted value of Y when X is zero.

2) The slope is still the amount the predicted value of Y changes when X changes by one.

3) And PRE is the measure of how much we gain by moving from MODEL C (just using the mean of all the scores) to MODEL A (incorporating the information of which group the score belonged to).

The use of categorical predictors, however, adds some additional interpretations.

Page 12:

Additional Interpretations

1) The predicted value of each score is the mean of its group.

For a National League team: Ŷi = 260.38 + 5.05(-1) = 255.33, which is the mean of the National League teams.

For an American League team: Ŷi = 260.38 + 5.05(+1) = 265.43, which is the mean of the American League teams.

2) The intercept (260.38) is the mean of the two group means. This is an unweighted grand mean.

Page 13:

Unweighted Grand Mean

The unweighted grand mean is simply the mean of the group means; it is not necessarily the mean of all the scores unless each group has the same number of scores.

E.g. Group 1: 100,100,100,100 Group 2: 0, 0

unweighted grand mean = 50 mean of the 6 scores = 66.67

Page 14:

Unweighted vs Weighted Grand Means

Unweighted grand mean = (Ȳ1 + Ȳ2)/2 = (100 + 0)/2 = 50

Weighted grand mean = (n1Ȳ1 + n2Ȳ2)/(n1 + n2) = (4(100) + 2(0))/(4 + 2) = 66.67
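A quick check of these two quantities for the example above (group 1 has four scores of 100, group 2 has two scores of 0):

    import numpy as np

    g1 = np.array([100., 100., 100., 100.])
    g2 = np.array([0., 0.])

    unweighted = (g1.mean() + g2.mean()) / 2        # 50.0: mean of the group means
    weighted = np.concatenate([g1, g2]).mean()      # 66.67: mean of all 6 scores
    print(unweighted, weighted)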

Page 15:

Additional Interpretations (cont)

3) The slope can be used to determine the difference between the two group means. The slope is how much the predicted value of Y changes as X changes by one. There is a difference of 2 between X=-1 and X=+1, so the difference between the two group means is the slope*2.

The slope = 5.05, so 5.05 x 2 = 10.10 which is the difference between the mean batting average of the two leagues.

Page 16:

H0 and HA

Look back at the scatter plot: if the two group means were identical then the slope would be zero. The F test for the regression model:

MODEL C: Ŷi = β0

MODEL A: Ŷi = β0 + β1Xi

Can be stated as either:

H0: β1 = 0   HA: β1 ≠ 0, or as

H0: μ1 = μ2   HA: μ1 ≠ μ2, or as

H0: μ1 - μ2 = 0   HA: μ1 - μ2 ≠ 0

Thus this is a way to test whether a statistically significant difference exists between the two group means. This is testing a non-directional alternative hypothesis (one that simply predicts the means will be different but does not predict which mean will be greater and which will be less).

Page 17:

And More...

While in this chapter we are focusing on how to do a 'traditional t test' for the difference between group means using multiple regression, don't forget that this could be part of a larger model that can be tested using the model comparison approach. For example, our model of Y might include which group the person was in (variable X1) along with their score on some pretest (X2) and their height (X3). We will look at such models in a later chapter.

Page 18:

SSE

MODEL C: Ŷi = b0 = Ȳ = 260.78

SSE(C) = Σ(Yi - Ŷi)² = Σ(Yi - Ȳ)² = 3266.61, a.k.a. "SS Total"

MODEL A: Ŷi = b0 + b1Xi

SSE(A) = Σ(Yi - Ŷi)² = Σ(Yi - Ȳk)² = 2608.10, a.k.a. "SS Within Groups"

Page 19:

SSE (Cont), & df

SSR = Σ(Ŷi - Ȳ)² = Σ(Ȳk - Ȳ)² = 658.52, or
SSR = SSE(C) - SSE(A) = 3266.62 - 2608.10 = 658.52
a.k.a. "SS Between Groups"

d.f.:

dfTotal = N - PC ... this goes with SSTotal, which is SSE(C)

dfWithin = N - PA ... this goes with SSWithin, which is SSE(A)

dfBetween = PA - PC ... this goes with SSBetween, which is SSR

Page 20:

Summary table (MS to F Method)

Source (Text)   Source (SPSS)   Source (ANOVA)   SS        df   MS       F      p
SSR             Regression      Between          658.52    1    658.52   6.06   .02
SSE(A)          Residual        Within           2608.10   24   108.67
SSE(C)          Total           Total            3266.62   25

PRE = SSR/SSE(C) = .202

Remember: 1) a ‘MS’ is a SS divided by its d.f., and 2) F in this case is MSBetween/MSWithin

Page 21:

't' and 'F'

• The analysis of Model A provides a value of F* that is equivalent to what you would obtain by performing a traditional one-factor, independent groups ANOVA.

• If you take the square root of the F* you get the value of t* (the t obtained value if you analyzed this as a t test), the square root of Fcritical = tcritical, and the ‘p value’ for ‘t’ and that for ‘F’ are identical when you test a non-directional alternative hypothesis.
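A small scipy demonstration of this equivalence, using made-up scores for two independent groups (any two samples would do): the one-way ANOVA F equals the squared two-sample t, and the two-tailed p values are identical.

    import numpy as np
    from scipy import stats

    g1 = np.array([250., 255., 260., 265., 258.])
    g2 = np.array([262., 270., 268., 274., 266.])

    t, p_t = stats.ttest_ind(g1, g2)     # traditional independent-groups t test
    F, p_F = stats.f_oneway(g1, g2)      # traditional one-way ANOVA on the same data
    print(t ** 2, F)                     # t squared equals F
    print(p_t, p_F)                      # same (non-directional) p value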

Page 22:

Directional Hypotheses

• Testing a directional alternative hypothesis can also be done using this approach.

• First, remember that in a test of a directional hypothesis the predicted direction of the difference between the two group means is expressed in HA. For example, you are testing a theory which predicts that group 1 will have a greater mean than group 2:

H0: μ1 ≤ μ2

HA: μ1 > μ2

Page 23:

Directional (cont.)

H0: μ1 ≤ μ2

HA: μ1 > μ2

Now analyze the data and look at the group means:

1. If they support HA then the p value of that difference is the p value from the non-directional hypothesis divided by 2.

2. If they support H0 then do not reject H0 (the p value is actually 1- p/2, which is much greater than .5, let alone .05).

Page 24:

Alternative Coding Scheme: ‘Dummy Codes’

‘Dummy codes’:

λ1 = 0

λ2 = +1

Least squares estimates using dummy coding:

MODEL A: Ŷi = 255.33 + 10.10Xi

Some things stay the same: the predicted values of Y stay the same; they still come out to be the mean of the group the score is in. Thus the error of the model is the same, as are the values of the SS's, df's, MS's, F, PRE, p, and the decision regarding H0.

Page 25:

Dummy Codes (cont.)

MODEL A: Ŷi = 255.33 + 10.10Xi

While dummy coding does not change the results of the analysis, the interpretations of the slope and intercept do change:

The intercept is the mean of group 1 (as we set the value of X to λ1 = 0 for scores in group 1).

The slope is now equal to the difference between the two group means as the difference between λ1 (i.e. 0) and λ2 (i.e. 1) is one (remember that the slope is how much the prediction of Y changes as the value of X changes by one).
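A sketch comparing the two coding schemes on the same hypothetical two-group data: the predicted values (and therefore the SSE, F, PRE, and p) are identical, but the intercept and slope take on the different interpretations described above.

    import numpy as np

    y = np.array([250., 255., 260., 265., 262., 270., 268., 274.])
    group = np.array([1, 1, 1, 1, 2, 2, 2, 2])

    x_contrast = np.where(group == 1, -1.0, 1.0)   # contrast codes: -1 / +1
    x_dummy = np.where(group == 1, 0.0, 1.0)       # dummy codes: 0 / 1

    def fit(x):
        X = np.column_stack([np.ones_like(y), x])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        return b, X @ b                             # (intercept, slope), predicted values

    b_c, yhat_c = fit(x_contrast)
    b_d, yhat_d = fit(x_dummy)
    print(np.allclose(yhat_c, yhat_d))   # True: same predictions, hence same error
    print(b_c)   # intercept = mean of the group means, slope = half the mean difference
    print(b_d)   # intercept = group 1 mean, slope = the full mean difference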

Page 26:

Assumptions

With regression analysis we have the same assumptions about the populations as we did with the t test last semester.

1. The scores are normally distributed around the regression line (in this case the regression line passes through the group data at the mean of the group).

2. The variance of the scores doesn't change along the regression line (in this case the variance of group 1 around its mean is the same as the variance of group 2 around its mean).

3. The scores are independent.

In addition, outliers can affect the regression line exactly the same way they affect the means in the t test.

Page 27:

Assumption of Normality

[Scatter plot of the dependent variable against group code (-1, +1) with the fitted regression line; R Sq Linear = 0.176]

The scores are normally distributed around the regression line (i.e.the residuals are normally distributed). From last semester we know that the analysis is ‘robust’ in terms of normality if N is large.

Page 28:

Assumption of Homogeneity of Variance

[Scatter plot of the dependent variable against group code (-1, +1) with the fitted regression line; R Sq Linear = 0.176]

The variance of the scores around the regression line (i.e. the residuals) is the same for both groups. From last semester we know that this assumption is not important if we have an equal N in the groups.

Page 29:

Measure of Effect Size

Last semester we used Cohen’s d as a measure of the size of the effect of the independent variable when performing a t test for independent groups. This semester we are using PRE as our measure of effect size. PRE is a measure of how much the differences between the scores can be accounted for by knowing which group they were in (National League or American League), which is a way of measuring the size of the effect of the independent variable. PRE=.202

Page 30:

Categorical Predictors with More Than Two Levels

Conditions:

Group 1               Group 2          Group 3
'Failure Feedback'    'No Feedback'    'Success Feedback'
2                     4                4
2                     3                6
2                     4                5
3                     5                4
4                     5                6
4                     2                4
3                     4                3
4                     3                3

Ȳ1 = 3.00             Ȳ2 = 3.75        Ȳ3 = 4.375

Type of feedback given on some pre-task, then similar task done again and performance measured.

Page 31:

Traditional ANOVA

Here is the overall F test to see if a difference exists somewhere between the means, using the ANOVA techniques covered last semester.

H0: μ1 = μ2 = μ3

Ha: at least one μ is different from the rest.

ANOVA (dependent variable: Y)

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   7.583            2    3.792         3.406   .052
Within Groups    23.375           21   1.113
Total            30.958           23

Page 32:

Traditional ANOVA

To perform an ANOVA in SPSS we create a dummy variable to indicate to which group each score belongs. SPSS knows that the dummy variable is a nominal scale and that it simply serves to indicate group membership.

Group Name     Data (Yi)   Group (dummy variable)
Failure        2           1
               3           1
               ...         ...
No Feedback    5           2
               4           2
               ...         ...
Success        4           3
               6           3
               ...         ...

Page 33:

Multiple Regression

This dummy variable won't work with multiple regression, for it will be treated as a cardinal variable (i.e. that the 'no feedback' group is 1 more than the 'failure' group, and that the 'success' group is 1 more than the 'no feedback' group). 'Group' really is a nominal variable and needs to be treated as such.

Group Name     Data (Yi)   Group (dummy variable)
Failure        2           1
               3           1
               ...         ...
No Feedback    5           2
               4           2
               ...         ...
Success        4           3
               6           3
               ...         ...

Page 34:

Solution: Contrast Codes

With more than two groups one variable cannot be used to code the independent variable. With ‘m’ number of groups we need to come up with m-1 contrast codes to completely code the independent variable so that we can analyze the data using multiple regression.

That gets us to the second condition of contrast codes:

1) Σλk = 0, summing over k = 1 to m (the first condition, covered earlier).

2) If you have two or more contrast codes then they all must be orthogonal to each other.

Page 35:

Comparisons

Before we start looking at whether or not the contrast codes are orthogonal, let's consider what each contrast code is doing: each code is making a 'comparison' between two aspects of our data (essentially asking a question that involves comparing two means).

Page 36:

Selecting Contrast Codes

Last semester we looked at the use of comparisons to answer specific questions in designs that have more than two groups. For each comparison we dropped groups, or added groups together, until we had just two things to compare. If 'm' is the number of groups, we knew that we could come up with m-1 orthogonal comparisons. We are doing exactly the same thing again, but this time we are doing it at the beginning of the analysis rather than after the overall F test has been completed. As you will see, there are some theoretical advantages to this new approach.

Page 37:

Our Contrast Codes

We have three groups (i.e. m=3) in our experiment, so we need to come up with m-1 (i.e. 2) contrast codes that are orthogonal to each other. One possible set is given below (we will let λjk represent the value we plug in for X when using the jth contrast code on the scores in group k):

Code 1: λ11 = -2   λ12 = 1    λ13 = 1

Code 2: λ21 = 0    λ22 = -1   λ23 = 1

Where these codes came from will be explained in a minute, first I want to make sure you understand how they will be implemented.

Page 38:

Implementation

We are going to regress Y on X1 and X2; together X1 and X2 completely code our independent variable (type of feedback). The values of X1 and X2 implement our contrast codes.

Group          Data (Yi)   Xi1   Xi2
Failure        2           -2    0
               3           -2    0
               ...         ...   ...
No Feedback    5           1     -1
               4           1     -1
               ...         ...   ...
Success        4           1     1
               6           1     1
               ...         ...   ...

Page 39:

Comparison One

Let's say we are interested in whether the mean of group 1 (the 'failure feedback' group) is different from the mean of the other two groups combined (a group consisting of the 'no feedback' and 'success feedback' groups). The hypotheses for this comparison would be:

H0: μ1 = (μ2 + μ3 )/2 or equivalently μ1 - (μ2 + μ3 )/2=0

The contrast codes could be any of the following:

λ11 = -1   λ12 = .5    λ13 = .5    or

λ11 = 1    λ12 = -.5   λ13 = -.5   or

λ11 = -2   λ12 = 1     λ13 = 1     etc.

I will be using this latter form, where whole numbers are used rather than decimals; this will have certain advantages.

Page 40:

Comparison Two

Let's say we are also interested in whether the mean of group 2 (the 'no feedback' group) is different from the mean of group 3 (the 'success feedback' group). The hypotheses for this comparison would be:

H0: μ2 = μ3 or equivalently μ2 - μ3 = 0

The contrast codes could be any of the following:

λ21 = 0   λ22 = 1    λ23 = -1   or

λ21 = 0   λ22 = -1   λ23 = 1    etc.

Page 41:

Determining Orthogonality (cont)

Code 1:λ11 = -2 λ12 = 1 λ13 = 1

Code 2:λ21 = 0 λ22 = -1 λ23 = 1

Both codes fit the first condition of contrast codes, in that in both cases the values of lambda add to zero.

Before we check computationally for orthogonality, think about it conceptually. If we compare group 1 with groups 2 and 3 combined, would that help us predict what we would find if we subsequently compared group 2 with group 3?

Page 42:

Determining Orthogonality (cont)

The second condition of contrast codes (that they be orthogonal to each other) is determined as follows:

1) For each group, multiply the lambda of one code by the lambda of the other code (i.e. multiplying downward).

2) The two codes are orthogonal if the sum of those products is zero.

              Failure   No Feedback   Success
λ1k           -2        1             1
λ2k           0         -1            1
(λ1k)(λ2k)    0         -1            1

The two codes are orthogonal because 0 + -1 + 1 = 0
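A small helper function (hypothetical, not part of the text) that checks both conditions at once: each code sums to zero and every pair of codes has a zero sum of products.

    import numpy as np
    from itertools import combinations

    def is_contrast_set(codes):
        """codes: one list of lambdas per contrast, each list giving a value per group."""
        codes = [np.asarray(c, dtype=float) for c in codes]
        sums_to_zero = all(np.isclose(c.sum(), 0.0) for c in codes)
        pairwise_orthogonal = all(np.isclose(np.dot(a, b), 0.0)
                                  for a, b in combinations(codes, 2))
        return sums_to_zero and pairwise_orthogonal

    print(is_contrast_set([[-2, 1, 1], [0, -1, 1]]))   # True: the set used above
    print(is_contrast_set([[-1, 0, 1], [0, -1, 1]]))   # False: sum of products is 1, not 0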

Page 43:

Another example

We have three groups (i.e. m=3) in our experiment, so we need to come up with m-1 (i.e. 2) contrast codes that are orthogonal to each other. Here is a possible set of codes:

Code 1:λ11 = -1 λ12 = 0 λ13 = 1

Code 2:λ21 = 0 λ22 = -1 λ23 = 1

Code 1 compares the mean of group 1 with the mean of group 3.

Code 2 compares the mean of group 2 with the mean of group 3.

Page 44:

Determining Orthogonality (cont)

Code 1:λ11 = -1 λ12 = 0 λ13 = 1

Code 2:λ21 = 0 λ22 = -1 λ23 = 1

Both codes fit the first condition of contrast codes, in that in both cases the values of lambda add to zero.

Before we check computationally for orthogonality, think about it conceptually. If, for example, we found a significant difference between group 1 and group 3, would that influence your guess about whether or not there may also be a difference between group 2 and group 3?

Page 45:

Determining Orthogonality (cont)

The second condition of contrast codes (that they be orthogonal to each other):

              Failure   No Feedback   Success
λ1k           -1        0             1
λ2k           0         -1            1
(λ1k)(λ2k)    0         0             1

The two codes are not orthogonal because 0 + 0 + 1 doesn’t equal 0.

Page 46:

Coding Categorical Variables

Contrast Coding: To completely code a categorical variable for multiple regression requires m-1 contrast codes. Thus a categorical variable with 3 levels (as in the previous example) can be completely coded with 2 contrast codes, a categorical variable with 4 levels can be completely coded with 3 contrast codes, etc.

Dummy Coding: You can code a categorical variable with any number of groups using just one dummy-coded variable (X=1 if in group 1, X=2 if in group 2, X=3 if in group 3, etc.) if the only purpose of the variable is to inform your statistical program which scores go into which group in an ANOVA. If you want to use multiple regression to analyze the data, then go with contrast codes. We will always use contrast coding in this class.

Page 47:

Coding 4 levels

       Level 1 (k=1)   Level 2 (k=2)   Level 3 (k=3)   Level 4 (k=4)
λ1k    -3              1               1               1
λ2k    0               -2              1               1
λ3k    0               0               -1              1

Four levels require 4-1=3 contrast codes to completely code the variable. The requirement is that every pair of codes be orthogonal (i.e. λ1k is orthogonal to λ2k, λ1k is orthogonal to λ3k, and λ2k is orthogonal to λ3k).

It is impossible to come up with a set of more than m-1 codes that are all orthogonal to each other.

One possible set of orthogonal codes

Page 48:

There are many possible sets of codes to use

       Level 1 (k=1)   Level 2 (k=2)   Level 3 (k=3)   Level 4 (k=4)
λ1k    1               -3              1               1
λ2k    1               0               -2              1
λ3k    1               0               0               -1

Another set of orthogonal codes. This has the same general pattern as the previous set; it may be easier to simply use the previous set and then think about which group you want to call 'Level 1', which group to call 'Level 2', etc.

Page 49:

There are many possible sets of codes to use

       Level 1 (k=1)   Level 2 (k=2)   Level 3 (k=3)   Level 4 (k=4)
λ1k    1               1               -1              -1
λ2k    1               -1              0               0
λ3k    0               0               1               -1

A very different set of orthogonal codes. Note that once you have determined all but the last code there is no freedom; there will be only one code left that will complete the orthogonal set.

Page 50:

How to Select a Set of Codes

There are often many different sets of contrast codes from which you may choose. Select the one that contains the comparisons that best fit your a priori questions. For example, if you think that the ‘failure’ level should lead to worse performance than the ‘no feedback’ or ‘success’ level, then select a set of contrast codes that contains the contrast which compares the ‘failure level’ with the average of the other two.

Page 51:

The Big Picture

Where we are heading with this: we will be able to do the equivalent of an overall F test to determine whether a difference exists somewhere between the levels, but we are also going to be able to test the significance of specific contrasts, thus answering more specific questions at the same time.

Page 52:

More Options

       Level 1   Level 2   Level 3   Level 4   Level 5
λ1k    4         -1        -1        -1        -1
λ2k    0         3         -1        -1        -1
λ3k    0         0         2         -1        -1
λ4k    0         0         0         1         -1

If you have no a priori reason to select a specific set of contrast codes you can always follow this general pattern (which we have seen in previous examples), which can be adapted to fit any number of levels. This pattern is known as a set of Helmert contrast codes.
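A sketch of a small generator (hypothetical, not from the text) that builds this Helmert-style pattern for any number of levels and confirms that the resulting codes are pairwise orthogonal.

    import numpy as np

    def helmert_codes(m):
        """Return an (m-1) x m array: code j contrasts level j+1 with all later levels."""
        codes = np.zeros((m - 1, m))
        for j in range(m - 1):
            codes[j, j] = m - 1 - j     # e.g. 4, 3, 2, 1 down the diagonal for m = 5
            codes[j, j + 1:] = -1
        return codes

    C = helmert_codes(5)
    print(C)            # reproduces the five-level pattern shown above
    print(C @ C.T)      # off-diagonal entries are zero, so every pair is orthogonal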

Page 53:

More Options: Trends

If your independent variable is an ordinal scale, rather than a nominal scale, then you can select 'orthogonal polynomial contrast codes' to check whether various sorts of orderings (e.g. linear, quadratic, cubic, etc.) exist between your categories. What you can test for depends upon how many categories you have: two categories can only detect a linear relationship, three categories can detect a linear or a quadratic relationship, etc. In the following slides note that the contrast codes are indeed orthogonal.

Two-Group Experiment

Group 1 Group 2

Linear -1 1

Page 54:

Three-Group Experiment

Group 1 Group 2 Group 3

Linear -1 0 1

Quadratic -1 2 -1

Four-Group Experiment

Group 1 Group 2 Group 3 Group 4

Linear -3 -1 1 3

Quadratic 1 -1 -1 1

Cubic -1 3 -3 1

Page 55:

Level 1 Level 2 Level 3 Level 4 Level 5

linear -2 -1 0 1 2

quadratic 2 -1 -2 -1 2

cubic -1 2 0 -2 1

quartic 1 -4 6 -4 1

Five-Group Experiment
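A quick numpy check that the five-group polynomial codes above really are contrast codes: each row sums to zero, and every pair of rows has a zero sum of products.

    import numpy as np

    poly5 = np.array([
        [-2, -1,  0,  1,  2],    # linear
        [ 2, -1, -2, -1,  2],    # quadratic
        [-1,  2,  0, -2,  1],    # cubic
        [ 1, -4,  6, -4,  1],    # quartic
    ], dtype=float)

    print(poly5.sum(axis=1))     # each code sums to zero
    print(poly5 @ poly5.T)       # off-diagonal zeros: the codes are pairwise orthogonal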

Page 56:

Parameter Estimation with a Multilevel Predictor Variable

Back to our example with a three level predictor variable (‘failure’, ‘no feedback’, ‘success’).

Code 1: λ11 = -2   λ12 = 1    λ13 = 1

Code 2: λ21 = 0    λ22 = -1   λ23 = 1

The first contrast compares the mean of 'failure' with the average of the other two means; the second contrasts 'no feedback' with 'success'.

Page 57:

Data

Group          Data (Yi)   Xi1   Xi2
Failure        2           -2    0
               3           -2    0
               ...         ...   ...
No Feedback    5           1     -1
               4           1     -1
               ...         ...   ...
Success        4           1     1
               6           1     1
               ...         ...   ...

The data and the two contrast codes.

Page 58:

The following parameters result when we regress Y on X1 and X2:

MODEL A: Ŷi = 3.71 + .35Xi1 + .31Xi2

If you use a complete set of m-1 contrast codes then:

1) The predicted value for each Y is the mean of the group it is in.

2) The intercept is the unweighted grand mean of the group means.

3) The slopes can be used to determine the difference in the means involved in each contrast.

1) X1 compares the mean of group 1 with the mean of the other two groups combined. The values of X1 are -2 and 1. The effect on Y of X1 changing by 1 is .35; since X1 changes by 3 when it moves from group 1 to the other two groups combined, the difference between the means in that contrast is equal to .35 x 3 = 1.05.

2) X2 is a little easier to understand: it simply compares the mean of group 2 with the mean of group 3. The values of X2 are 0, -1, and 1. The effect on Y of X2 changing by 1 is .31; since X2 changes by 2 when it moves from group 2 to group 3, the difference between those means is equal to 2 x .31 = .62.
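A sketch of this regression using the 24 feedback scores listed earlier and the two contrast codes; the least squares estimates should come out to approximately 3.71, .35, and .31, and each predicted value equals its group's mean.

    import numpy as np

    y = np.array([2, 2, 2, 3, 4, 4, 3, 4,      # failure feedback group
                  4, 3, 4, 5, 5, 2, 4, 3,      # no feedback group
                  4, 6, 5, 4, 6, 4, 3, 3],     # success feedback group
                 dtype=float)
    x1 = np.repeat([-2., 1., 1.], 8)           # contrast 1: failure vs. the other two combined
    x2 = np.repeat([0., -1., 1.], 8)           # contrast 2: no feedback vs. success

    X = np.column_stack([np.ones_like(y), x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    print(b)          # approximately [3.71, 0.35, 0.31]
    print(X @ b)      # predictions are the group means: 3.00, 3.75, 4.375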

Page 59:

Inference with a Multilevel Predictor Variable

The first test we will look at is the overall test comparing a model that just uses the mean of Y to a model that uses both contrasts. Remember that the two contrasts completely code the three groups in the experiment, thus this is the equivalent to testing to see if the independent variable had an effect.

MODEL C: Ŷi = β0

MODEL A: Ŷi = β0 + β1Xi1 + β2Xi2

Model A (for any complete set of m-1 contrast codes) uses the mean of the group as a prediction for each score. If that reduces error compared to using the mean of all the scores (Model C) then that tells us that at least one group mean must differ from the mean of all the scores. Thus H0 can be written two ways:

H0: β1 = β2 = 0, (model comparison approach), or equivalently…

H0: μ1 = μ2 = μ3 (ANOVA approach)

Page 60:

Summary Table

Note that dfBetween = 2 (i.e. 1 d.f. for each contrast, or m-1). Also note that the test for an overall effect of the independent variable is not statistically significant. The SS, df, MS, F, and p values are identical to when we analyzed the data with an ANOVA (see the earlier slide).

Source (Text)   Source (SPSS)   Source (ANOVA)   SS      df   MS     F      PRE    p
SSR             Regression      Between          7.58    2    3.79   3.41   .245   .052
SSE(A)          Residual        Within           23.38   21   1.11
SSE(C)          Total           Total            30.96   23

Page 61:

A More Detailed Analysis

The previous analysis is the equivalent of the 'overall F test' to see if a difference exists somewhere among the group means. This has two disadvantages:

1. It doesn’t tell us where that difference lies.

2. It lacks power. By plugging all of the contrast codes in at once we are weakening our ability to see if one or more were worthwhile by themselves. I.e. Model A in the overall analysis has one parameter for each contrast; the PRE per parameter added may be less than the PRE for any individual contrast.

It will be more informative and more powerful to look at each contrast individually.

Page 62:

Testing Contrast #1

The general approach will be to use the full set of contrast codes as the augmented model, and all but the contrast code of interest as the compact model. Remember that in this contrast X1 compares the mean of group 1 with the mean of groups 2 and 3 combined.

Test 1:

Model C: Ŷi = β0 + β2Xi2 Ŷi = 3.71 + .31Xi2

Model A: Ŷi = β0 + β2Xi2 + β1Xi1 Ŷi = 3.71 + .31Xi2 + .35Xi1

H0: β1 =0, or alternatively,

H0: μ1 = (μ2 + μ3 )/2 or μ1 - (μ2 + μ3 )/2=0

Note:

1) PA - PC = 1; thus this is a 'single-degree-of-freedom test'.

2) The intercept and the slope for X2 don't change from Model C to Model A. This is because the lambdas for X1 and X2 are orthogonal (i.e. contrast codes were used and the groups have equal n's), which causes X1 and X2 to be completely nonredundant predictors.

3) The statistical analysis here is simply that of the partial regression coefficient.

Page 63:

From SPSS

SPSS will give us the following for Contrast 1 (i.e. X1):

1. Partial regression coefficient (i.e. 'b1') = 0.35
2. Partial correlation coefficient = .453
3. PRE = .453² = .205
4. t = 2.326
5. p = .03

Remember, this p value is for many equivalent tests: whether the partial regression coefficient = 0; whether the partial correlation coefficient = 0; whether the PRE is statistically significant. Contrast 1 is statistically significant (worth adding to our model of Y).

From this we can also compute:

1. F = t² = 2.326² = 5.41
2. SPSS does not give us the SSR for the contrast but it can easily be computed: SScontrast = MSResidual x Fcontrast = 1.113 x 5.41 = 6.02
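The same single-degree-of-freedom test can be sketched directly as a compact-versus-augmented comparison of sums of squared errors (same data and codes as above); it should reproduce roughly SScontrast = 6.02, F = 5.41, and p = .03.

    import numpy as np
    from scipy import stats

    y = np.array([2, 2, 2, 3, 4, 4, 3, 4,
                  4, 3, 4, 5, 5, 2, 4, 3,
                  4, 6, 5, 4, 6, 4, 3, 3], dtype=float)
    x1 = np.repeat([-2., 1., 1.], 8)           # the contrast being tested
    x2 = np.repeat([0., -1., 1.], 8)           # the other contrast stays in both models

    def sse(X):
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        return np.sum((y - X @ b) ** 2)

    ones = np.ones_like(y)
    SSE_C = sse(np.column_stack([ones, x2]))          # compact model: X2 only
    SSE_A = sse(np.column_stack([ones, x1, x2]))      # augmented model: X1 and X2

    F = (SSE_C - SSE_A) / (SSE_A / (len(y) - 3))      # PA - PC = 1, N - PA = 21
    p = stats.f.sf(F, 1, len(y) - 3)
    print(SSE_C - SSE_A, F, p)                        # about 6.02, 5.41, .03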

Page 64:

Testing Contrast #2

Contrast X2 compares the mean of group 2 with the mean of group 3.

Test 2:

Model C: Ŷi = β0 + β1Xi1 Ŷi = 3.71 + .35Xi1

Model A: Ŷi = β0 + β1Xi1 + β2Xi2 Ŷi = 3.71 + .35Xi1 + .31Xi2

H0: β2 =0, or alternatively,

H0: μ2 = μ3 or μ2 - μ3 = 0

From SPSS:

1. Partial regression coefficient (i.e. 'b2') = 0.31
2. Partial correlation coefficient = .25
3. PRE = .25² = .063
4. Fcontrast = t²contrast = 1.185² = 1.40
5. SScontrast = MSresidual x Fcontrast = 1.113 x 1.40 = 1.56
6. p = .249 (not statistically significant)

Page 65:

Full Summary Table

Note that what we have done is to partition the SSbetween groups (7.58 = 6.02 + 1.56) into that which is accounted for by contrast 1 and that which is accounted for by contrast 2. SSX1 + SSX2 = SSbetween only when the contrasts are orthogonal.

Source     b     SS       df   MS     F      PRE    p
Between          7.58     2    3.79   3.41   .245   .052
  X1       .35   6.02     1    6.02   5.41   .205   .03
  X2       .31   1.56     1    1.56   1.40   .063   .249
Within           23.375   21   1.11
Total            30.96    23

Page 66:

Full Summary Table

Source     b     SS       df   MS     F      PRE    p
Between          7.58     2    3.79   3.41   .245   .052
  X1       .35   6.02     1    6.02   5.41   .205   .03
  X2       .31   1.56     1    1.56   1.40   .063   .249
Within           23.375   21   1.11
Total            30.96    23

The overall F test was not statistically significant (p=.052), Contrast 1 was (p=.03), and Contrast 2 was not (p=.249). The overall F test had the highest PRE, but remember that the F value is based upon the PRE per parameter added; the overall F test had two parameters (b1 and b2), and the second parameter (b2) watered down the effect of the first parameter (b1).

Page 67:

Thoughts

Using multiple regression to perform an overall ANOVA requires us to first come up with a set of contrasts; this has a couple of advantages.

The first advantage is that each comparison has more power than the overall F as the comparison involves testing the addition of just one parameter. In our example if we had only performed the overall F test we would have simply concluded that the effect of our independent variable was not statistically significant.

Page 68:

Thoughts

The second advantage is conceptual. If we are satisfied in knowing that our independent variable had some sort of effect (i.e. H0 is rejected in the overall F test) then we are not doing very good science. Having to come up with a set of comparisons requires us to think about our experiment in a much more detailed way, leading to much more specificity in our theory building.

Page 69:

Other Contrast Codes

Source     b     SS       df   MS     F      p
Between          7.58     2    3.79   3.41
  X1       ...   ...      1    ...    ...
  X2       ...   ...      1    ...    ...
Within           23.375   21   1.11
Total            30.96    23

If we had selected a different set of contrast codes then the only thing that would have changed is the information in the table indicated by "...". The choice of which set of contrast codes to use should be based upon which set best answers your a priori questions.

Page 70:

Contrast Codes with Unequal N

A slight complication arises if you do not have an equal number of scores in each level of your categorical variable (i.e. the group n’s are not equal).

With unequal n’s your contrast coded predictor variables become somewhat redundant.

Page 71:

Unequal n's (cont.)

How this shows up:

1) The SS for the contrast codes no longer add up to equal SSbetween.

2) The value of the parameters change as you add more contrast coded predictors (e.g. the value of β1 may change from Model C to Model A).

3) You lose a little power, for redundancy takes away from the power of adding a new predictor.

Page 72:

Unequal n's (cont.)

Bottom Line:

1) You lose a little power, so it is better to have equal n's if you can.

2) Everything concerning how you proceed and how you interpret the results stays the same, just don’t expect the SS of your contrast codes to sum up to equal the SS between groups (because of redundancies).

Page 73:

Nonorthogonal Comparisons

It might be the case that you can’t find one set of contrast codes that contain all of the comparisons you would like to perform (i.e. some of the comparisons you want to perform are not orthogonal with other comparisons you also want to perform).

Page 74:

Nonorthogonal Contrasts: The Problem of Error Rate

The problem with performing nonorthogonal comparisons is that of increasing error rate (Type 1 errors). The requirement that comparisons be orthogonal limits error rate in two ways:

1) It limits the number of contrasts being made to m-1.

2) The requirement of orthogonality decreases the ability of one weird mean to lead to many Type 1 errors.

Page 75:

Nonorthogonal Comparisons: Step 1

Write down all of the comparisons in which you have an a priori interest. Find a set of orthogonal contrast codes that contains the greatest number of those comparisons (or perhaps the set that contains the most important comparisons you would like to make). Proceed as previously described; don't make any adjustments; interpret normally.

Now look at the comparisons you have left to perform…

Page 76:

Nonorthogonal Comparisons: Step 2

For every comparison that is not in your original set of contrast codes, create a complete* set of contrast codes that contains that comparison. Run that set like you did your original (getting a value of F and p for the contrast of interest), and then make one of the following adjustments...

* Make sure when you run this new analysis that your new Model A has a complete set of m-1 orthogonal contrast codes. If you can find a set of contrast codes that contains more than one of these left-over desired comparisons, that would be great.

Page 77:

Nonorthogonal Comparisons: Step 3a

If this comparison (the one that was nonorthogonal to the original set) is a 'planned comparison' then adjust the significance level you use for it as follows:

k = the total number of contrasts being made on the data (include the contrasts made in the original set)

Then use (your significance level) / k as significance level for that contrast. For example if you are making 3 contrasts altogether, then the p value for the nonorthogonal contrast must be less than or equal to .05/3 = .017 to be statistically significant.

a.k.a.: Bonferroni method or Dunn method

Page 78:

Nonorthogonal Comparisons: Step 3b

If the nonorthogonal comparison is a ‘post hoc comparison’ then adjust the value of Fcritical for this contrast using the following formula:

new Fcritical = (m-1) x Fcritical with (m-1, N-PA) df

Note: if you make every possible comparison using this new Fcritical value, the probability of making at least one Type 1 error is equal to your significance level. Also, if the overall F test is significant then at least one comparison will be; if not, then no comparison will be.

a.k.a.: Scheffé method
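A minimal sketch of both adjustments, assuming a .05 significance level, k = 3 total contrasts, m = 3 groups, and N - PA = 21 error degrees of freedom (the numbers from this chapter's example); scipy's F distribution supplies the unadjusted critical value.

    from scipy import stats

    alpha, k = .05, 3       # significance level and total number of contrasts being made
    m, df_error = 3, 21     # number of groups and N - PA

    # Planned comparison (Bonferroni/Dunn): test the extra contrast at alpha / k
    print(alpha / k)                                   # .017, as in the example above

    # Post hoc comparison (Scheffe): new Fcritical = (m-1) x Fcritical with (m-1, N-PA) df
    F_crit = stats.f.ppf(1 - alpha, m - 1, df_error)
    print((m - 1) * F_crit)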