
Contrast Coding
Or: One of These Levels is Not Like the Others

Scott Fraundorf (and Tuan Lam)
MLM Reading Group – 03.10.11

Administrivia

● 3/10 (TODAY): Contrast coding overview
● 4/7: Simple vs main effects
● 4/21: Principal components analysis
● 1st week of May: Harald Baayen visit

Outline

● Why use contrast coding?
● Example contrasts
● Contrast estimates
● Contrasts in R
● Multiple comparisons
● How does it work?
● Other kinds of coding
● Interactions

Why Use Contrast Coding?

● Scott's example study:
● Examining recall memory for spoken discourse as a function of:
– Location of disfluencies (categorical variable)
– Prior story knowledge (continuous variable)

[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

Why Use Contrast Coding?
● Regression equation: Predicts values
● Could use this to predict whether or not something will be remembered
● But in cognitive psych:
– Often interested in the effect of specific levels
– Test which ones differ significantly


Contrast Coding

● Example: Fluent vs. disfluencies in typical locations vs. in atypical locations

● Which ones differ significantly?

[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions]

Contrast Coding
● Contrasts: Test differences between specific levels
– Same as a planned comparison in an ANOVA
– Also analogous to a post-hoc test
● Planned comparisons vs post-hoc tests
– If we are deciding tests post-hoc, greater chance of capitalizing on chance / spurious effect
– Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
– We are basically on the honor system here: there is no way to prove the comparison was planned ahead of time

Contrasts!

● Contrasts are like weighted sums of means
– In multiple regression / MLM context, also subject to other variables in the model

● Using your scale to test what's different

[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions]

Contrast Coding
It looks like the Fluent stories might not be remembered as well.
Let's use a contrast to test this.

Contrasts

TYPICAL ATYPICAL FLUENT

Question 1: Do disfluencies affect recall?

Contrasts
Contrast weights are assigned:

TYPICAL   ATYPICAL   FLUENT
.33       .33        -.66

One side positive. One side negative.
This determines which levels are being compared (+ versus -).
Doesn't really matter which side you choose as the + side. It just affects the sign of the result, but not magnitude or statistical significance.

Contrasts
Contrast weights are assigned:

TYPICAL   ATYPICAL   FLUENT
.33       .33        -.66

Codes add up to zero.
Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.)
abs(.33) + abs(-.66) ≈ 1 (the codes are 1/3 and -2/3, rounded)

Contrasts
Can conceptualize the comparison as:
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)
(holding other variables constant)

TYPICAL   ATYPICAL   FLUENT
.33       .33        -.66

Does the contrast differ significantly from zero?
If so, the difference between levels is significant.
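A minimal sketch of the weighted-sum idea in R. The condition means below are hypothetical placeholders (not the study's data), and the exact fractions 1/3 and 2/3 are used in place of the rounded .33 and .66.

```r
# Hypothetical condition means (illustrative values only, not the study's data)
cell_means <- c(Typical = 0.70, Atypical = 0.66, Fluent = 0.56)

# Contrast 1 weights; 1/3 and 2/3 are the exact values behind .33 and .66
w <- c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)

codes_sum <- sum(w)                                  # codes add up to zero
abs_sum   <- sum(abs(w[c("Typical", "Fluent")]))     # |+ code| + |- code| = 1

# Value of the contrast as a weighted sum of the means
contrast1 <- sum(w * cell_means)

# Dividing by the sum of squared weights gives
# mean(Typical, Atypical) - Fluent, which is what the regression estimate
# equals in a balanced design with orthogonal codes
rescaled <- contrast1 / sum(w^2)

print(c(weighted_sum = contrast1, rescaled = rescaled))
```

If the weighted sum is reliably different from zero, the two sides of the contrast (+ versus -) differ.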

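Contrasts
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)

[Diagram: TYPICAL (.33) and ATYPICAL (.33) on one side versus FLUENT (-.66) on the other, marked significant (*)]

Contrast Coding
[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions, with a significance marker (*)]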

Our first contrast reveals that fluent stories are remembered worse.

Now let's look at Typical vs Atypical.

We always have j – 1 contrasts, where j = the # of levels of the factor.
So, here 2 contrasts are needed to fully describe the differences among the 3 levels.

Contrasts

TYPICAL ATYPICAL

Question 2: Does location of disfluencies matter?

Contrasts
Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)

TYPICAL   ATYPICAL   FLUENT
.50       -.50       0 (zeroed out here!)

One side positive. One side negative.
Codes add up to zero.
Sum of absolute values of codes is 1.

Contrast Coding
[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions, with one comparison marked * and one marked n.s.]

One Important Point!
● Choice of contrasts doesn't affect total variance accounted for by variable
● Only about differences between levels
● Can divide this up in multiple different ways and still account for same total variance

[Diagram label: LOCATION IN STORY]
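One way to check this point is to fit the same data under two different codings and compare the variance accounted for. The sketch below uses simulated data (an assumption for illustration only) and lm() rather than lmer to stay within base R.

```r
set.seed(1)
# Simulated data (purely illustrative, not the study's data)
lv <- c("Typical", "Atypical", "Fluent")
d <- data.frame(location = factor(rep(lv, each = 20), levels = lv))
true_means <- c(Typical = 0.70, Atypical = 0.66, Fluent = 0.56)
d$recall <- unname(true_means[as.character(d$location)]) +
  rnorm(nrow(d), sd = 0.1)

# Coding A: the two contrasts from the slides (exact fractions)
dA <- d
contrasts(dA$location) <- cbind(c(1/3, 1/3, -2/3), c(.5, -.5, 0))

# Coding B: treatment (dummy) coding
dB <- d
contrasts(dB$location) <- contr.treatment(3)

r2A <- summary(lm(recall ~ location, data = dA))$r.squared
r2B <- summary(lm(recall ~ location, data = dB))$r.squared

# Same total variance accounted for, however it is divided up
print(all.equal(r2A, r2B))
```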


Why -.5 and .5?
● Why [-.5 .5] instead of [-1 1]?
● Doesn't affect significance test
● Does affect β weight (estimate)
– Std error is also scaled accordingly

[Model output shown for FILLER LOCATION coded [-1 1] and for FILLER LOCATION coded [-.5 .5]]

Contrast Estimates

[Diagram: CONTRAST CODE axis with TYPICAL LOCATION and ATYPICAL LOCATION coded .5 and -.5; the two codes are 1 unit apart]

Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant

In this case, a 1-unit change in contrast IS the difference between the levels' codes

Thus, the contrast correctly represents .04825 as the difference between the conditions

Contrast Estimates

[Diagram: CONTRAST CODE axis with TYPICAL LOCATION and ATYPICAL LOCATION coded 1 and -1; the two codes are 2 units apart]

Here, the total difference between the levels' codes is 2

So, a 1-unit change in the contrast is only HALF the difference between the levels' codes

Thus, the estimate of the contrast is .024 … only half the difference between the conditions

Contrast Estimates

Beta weight (estimate) represents the effect of a 1-unit change in the contrast

● Codes .5 and -.5 (1 unit apart): a 1-unit change in contrast IS the difference between levels (.04825 in this case)
● Codes 1 and -1 (2 units apart): a 1-unit change in contrast is only half the difference between levels
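The scaling can be seen directly by fitting one data set under both codings. This is a sketch on simulated two-condition data (illustrative values, not the study's), using lm() so that it runs in base R.

```r
set.seed(2)
# Simulated two-level data (illustrative only)
lv <- c("Typical", "Atypical")
d <- data.frame(location = factor(rep(lv, each = 30), levels = lv))
d$recall <- ifelse(d$location == "Typical", 0.70, 0.65) +
  rnorm(nrow(d), sd = 0.1)

d_half <- d
contrasts(d_half$location) <- cbind(c(.5, -.5))   # codes 1 unit apart
d_one <- d
contrasts(d_one$location) <- cbind(c(1, -1))      # codes 2 units apart

row_half <- summary(lm(recall ~ location, data = d_half))$coefficients[2, ]
row_one  <- summary(lm(recall ~ location, data = d_one))$coefficients[2, ]

# Raw difference between the two condition means
cond_means <- tapply(d$recall, d$location, mean)
mean_diff <- unname(cond_means["Typical"] - cond_means["Atypical"])

# Estimate and std error are halved under [-1 1]; the t value is unchanged
print(rbind(half = row_half, one = row_one))
```

With [-.5 .5] the estimate equals the raw difference between the condition means; with [-1 1] it is half of that, and the test statistic does not change.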


So Why -.5 and .5?
● The estimate tells you directly about the difference in means!

– The actual difference between conditions is .048

– It would be perfectly correct to describe .024 as half the difference between levels and you could even put a CI around it … it's just less intuitive for your readers

● Both contrasts would account for the same amount of variance

● This is just another case of deciding the scale of a variable

– Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales

Imbalanced Designs

● You may have an unequal number of observations per cell
– e.g. some data lost, or responses not codable
● Correct for this in your contrast codes if you want things centered
– Ask Tuan or Scott about how to do this :)


Contrasts in R
● To check what the current contrasts are:
– contrasts(YourDataFrame$VariableName)
● To set the contrasts:
– contrasts(YourDataFrame$VariableName) = cbind(c(.33,.33,-.66),c(.50,-.50,0))
● Each c(xx,yy,zz) is the weights for one of the contrasts you want to run
– e.g. c(.33, .33, -.66) is one contrast
– Weights are matched to the levels in the order shown by levels(YourDataFrame$VariableName)
● After setting contrasts, run lmer model to get the results of the contrasts
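A runnable sketch of these steps. The data frame contents, the column names Location and Recall, and the contrast labels are all placeholders invented for illustration; lm() stands in for lmer so the example needs only base R.

```r
set.seed(3)
# Simulated stand-in for YourDataFrame (illustrative values only)
YourDataFrame <- data.frame(
  Location = rep(c("Typical", "Atypical", "Fluent"), each = 20),
  stringsAsFactors = FALSE
)
YourDataFrame$Recall <- rnorm(60, mean = 0.6, sd = 0.1)

# Set the level order explicitly. By default R sorts levels alphabetically
# (Atypical, Fluent, Typical), which would misalign the weights below.
YourDataFrame$Location <- factor(YourDataFrame$Location,
                                 levels = c("Typical", "Atypical", "Fluent"))

# One column per contrast; the column names become part of the coefficient names
contrasts(YourDataFrame$Location) <- cbind(
  DisfluentVsFluent = c(.33, .33, -.66),
  TypicalVsAtypical = c(.50, -.50, 0)
)
print(contrasts(YourDataFrame$Location))   # check what was set

# lm() keeps the sketch dependency-free; with lme4 the same factor would
# go into an lmer() formula together with the random effects.
fit <- lm(Recall ~ Location, data = YourDataFrame)
print(coef(fit))
```

Naming the columns of the contrast matrix is optional, but it makes the model output easier to read than the default numbered labels.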

Contrasts in R

● Should have j – 1 contrasts, where j = # of levels of the factor

● If using a subset of data, some levels of the factor may no longer be present

– e.g. you dropped a condition

– But, R still “remembers” that these levels exist and will get mad you didn't specify enough contrasts

– Fix this by reconverting to a factor:
● YourDataFrame$Variable = factor(YourDataFrame$Variable)
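A small illustration of the leftover-level problem and the fix, using a made-up four-row data frame.

```r
# Toy data (illustrative only)
d <- data.frame(Location = factor(c("Typical", "Atypical", "Fluent", "Typical")))

# Drop the Fluent condition
sub <- d[d$Location != "Fluent", , drop = FALSE]
n_before <- nlevels(sub$Location)   # the empty "Fluent" level is still there

# Reconvert to a factor, as on the slide
sub$Location <- factor(sub$Location)
n_after <- nlevels(sub$Location)

print(c(before = n_before, after = n_after))
```

Base R's droplevels() is an alternative that removes unused levels from a factor or from every factor in a data frame.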

Another R Tip

● To see the mean of each level of an I.V.:
– tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
– Could also do median, sd, etc.
● For a 2-way (or more!) table
– tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)

● Doesn't work if you have missing values: any cell containing an NA comes back as NA

– But Tuan has made a version of tapply that fixes this problem
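For reference, base R can also handle the missing values if na.rm = TRUE is passed through to mean. This is a sketch with made-up numbers and is not Tuan's version of tapply.

```r
# Toy data with one missing value (illustrative only)
d <- data.frame(
  IV = factor(c("Typical", "Typical", "Atypical", "Atypical", "Fluent", "Fluent")),
  DV = c(0.7, NA, 0.6, 0.7, 0.5, 0.6)
)

# Without na.rm, the cell containing the NA comes back as NA
means_raw <- tapply(d$DV, d$IV, mean)

# Extra arguments after the function are passed on to it
means_ok <- tapply(d$DV, d$IV, mean, na.rm = TRUE)

print(means_raw)
print(means_ok)
```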


Multiple Comparisons (Here Comes Trouble!)

Multiple Comparisons

● Lots of comparisons you can run
● Suppose we tested both young & older adults on the disfluency task:

[Diagram: six cells: FLUENT / YOUNGER, FLUENT / OLDER, TYPICAL / YOUNGER, TYPICAL / OLDER, ATYPICAL / YOUNGER, ATYPICAL / OLDER]

Multiple Comparisons

● Some comparisons are (wholly or partially) redundant
● Suppose we find typical > fluent, but typical and atypical don't reliably differ
– Should expect atypical > fluent (to at least some degree)
● Or, we find a main effect of age
– Would expect to find an effect of age within at least some conditions if we looked at them individually

Multiple Comparisons

● Some comparisons are (wholly or partially) redundant
● j – 1 contrasts actually describe everything
– j = # of levels

[Diagram: FLUENT vs. MEAN OF Typical and Atypical = .35730; TYPICAL vs. ATYPICAL = .04825]

Can calculate all differences between levels based on this!
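As a worked example of that last point, the pairwise differences can be recovered from the two estimates. The sketch assumes that .35730 is the Contrast 1 estimate in the direction mean(Typical, Atypical) minus Fluent, and that .04825 is Typical minus Atypical; the slide does not state the directions explicitly.

```r
# Estimates as read off the slide; the directions are assumptions
c1 <- 0.35730   # assumed: mean(Typical, Atypical) - Fluent
c2 <- 0.04825   # assumed: Typical - Atypical

# (T + A)/2 - F, plus or minus (T - A)/2, gives T - F and A - F
typical_vs_fluent   <- c1 + c2 / 2
atypical_vs_fluent  <- c1 - c2 / 2
typical_vs_atypical <- c2

print(c(typical_vs_fluent   = typical_vs_fluent,
        atypical_vs_fluent  = atypical_vs_fluent,
        typical_vs_atypical = typical_vs_atypical))
```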

Multiple Comparisons
● Want to avoid multiple comparisons
● Error rate increases if you run overlapping, redundant tests
● Suppose we have the wrong value for one of the means (due to sampling error, etc.)
● In a single test, we set alpha so there is a 5% chance of incorrectly rejecting H0 (α = .05)

Multiple Comparisons
● But now we run a 2nd test comparing that same “bad” condition to another condition
● Outcome of this test is correlated with the previous one since they both refer to one of the same conditions
● Not an independent 5% chance of error
● Multiple tests compound Type I error rate

Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
– Matrix of contrasts is made up of orthogonal vectors
– Can think of this as the contrasts being uncorrelated with each other

Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0

            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL     .33         x  .50         =  .165
ATYPICAL    .33         x  -.50        =  -.165
FLUENT      -.66        x  0           =  0
                                  SUM  =  0

Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0

            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL     .50         x  .50         =  .25
ATYPICAL    -.50        x  0           =  0
FLUENT      0           x  -.50        =  0
                                  SUM  =  .25 (not 0, so these two contrasts are not orthogonal)
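The two tables above can be checked in a couple of lines of R. The helper name is_orthogonal is made up for this sketch.

```r
# Hypothetical helper: two contrasts are orthogonal if the products of
# their weights sum to zero (within a small numerical tolerance)
is_orthogonal <- function(a, b, tol = 1e-12) abs(sum(a * b)) < tol

# Weights in the order Typical, Atypical, Fluent
disfluent_vs_fluent <- c(.33, .33, -.66)
typical_vs_atypical <- c(.50, -.50, 0)
typical_vs_fluent   <- c(.50, 0, -.50)

sum_first  <- sum(disfluent_vs_fluent * typical_vs_atypical)  # first table
sum_second <- sum(typical_vs_atypical * typical_vs_fluent)    # second table

print(c(first = sum_first, second = sum_second))
print(is_orthogonal(disfluent_vs_fluent, typical_vs_atypical))
print(is_orthogonal(typical_vs_atypical, typical_vs_fluent))
```

For more than two contrasts, crossprod() applied to the contrast matrix gives every pairwise sum of products at once; the off-diagonal entries should all be 0.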

Corrections

● “But, Scott, I really want to do more than j – 1 comparisons”

● Can apply corrections to control Type I error

● Bonferroni: Multiply p value by # of comparisons

– Worst case scenario

● Less conservative corrections may be available
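A short sketch of applying these corrections with base R's p.adjust(). The p values are invented for illustration, and Holm is shown only as one example of a less conservative correction, not necessarily the one the slide has in mind.

```r
# Hypothetical uncorrected p values from three comparisons
p_raw <- c(0.010, 0.020, 0.040)

# Bonferroni: multiply each p value by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: a step-down procedure that is never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")

print(rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm))
```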


How Does it Work?

[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

Behind the scenes...

How Does it Work?

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$

● Each categorical factor gets coded as j - 1 variables
– j = number of levels in that factor
– Number of contrasts you have

How Does it Work?

● Each coded variable represents one of your contrasts

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$

CONTRAST 1:
$X_2 = \begin{cases} .33 & \text{if typical location for disfluencies} \\ .33 & \text{if atypical} \\ -.66 & \text{if fluent} \end{cases}$

Value of contrast: $\beta_2$

● Sig. difference between levels if $\beta_2$ differs from 0
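The coded variables can be inspected with model.matrix(), which shows the columns R builds behind the scenes. This is a sketch with one row per condition; the labels C1 and C2 are placeholders, and the column positions will not necessarily line up with the slide's X2 and X3 numbering, which depends on what else is in the model.

```r
# One observation per condition, just to display the coding
lv <- c("Typical", "Atypical", "Fluent")
location <- factor(lv, levels = lv)
contrasts(location) <- cbind(C1 = c(.33, .33, -.66),
                             C2 = c(.50, -.50, 0))

# Intercept column plus j - 1 = 2 coded variables, one per contrast
X <- model.matrix(~ location)
print(X)
```

Each row of X holds the values of the coded variables for one condition, so the coefficient on the C1 column is the value of Contrast 1.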


Other Kinds of Coding
● Dummy/Treatment Coding
– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(

            X2    X3
Typical     1     0
Atypical    0     1
Fluent      0     0

Other Kinds of Coding
● Sum/Effects Coding
– Test whether each level differs from overall mean or from chance
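Both codings are available as built-in functions in R. In this sketch, base = 3 is chosen so that Fluent is the baseline, matching the dummy-coding table; R's own default baseline is the first level.

```r
lv <- c("Typical", "Atypical", "Fluent")

# Dummy/treatment coding with Fluent (the 3rd level) as the baseline
dummy <- contr.treatment(lv, base = 3)

# Sum/effects coding: the last level is coded -1 on every column
effects <- contr.sum(lv)

print(dummy)
print(effects)
```

Either matrix can be assigned with contrasts(YourDataFrame$VariableName) in the same way as hand-written weights.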


Contrasts & Interactions

● Contrasts also apply in cases where we have interactions between variables

● Interaction term represents whether the value of the contrast depends on another variable

● We'll see some examples on the next slides

Interaction Example
● Suppose we also sampled different age groups in the disfluency experiment
– 3 x 2 design
● What are possible patterns of results?

Group \ Story Type   FLUENT           TYPICAL                        ATYPICAL
YOUNG ADULTS         Fluent, young    Typical disfluencies, young    Atypical disfluencies, young
OLDER ADULTS         Fluent, older    Typical disfluencies, older    Atypical disfluencies, older
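To make the interaction terms concrete, here is a sketch of the 3 x 2 design coded in R. The age codes (.5 for young, -.5 for older) and the labels C1, C2, and Age are assumptions for illustration; the slides do not say how age was coded.

```r
# All six cells of the 3 x 2 design
story_levels <- c("Typical", "Atypical", "Fluent")
age_levels <- c("Young", "Older")
design <- expand.grid(
  location = factor(story_levels, levels = story_levels),
  age      = factor(age_levels, levels = age_levels)
)

contrasts(design$location) <- cbind(C1 = c(1/3, 1/3, -2/3),
                                    C2 = c(.5, -.5, 0))
contrasts(design$age) <- cbind(Age = c(.5, -.5))   # assumed coding for age

X <- model.matrix(~ location * age, data = design)
print(colnames(X))

# Each interaction column is the product of a contrast code and the age code
product_c1 <- X[, "locationC1"] * X[, "ageAge"]
product_c2 <- X[, "locationC2"] * X[, "ageAge"]
print(cbind(design, C1xAge = product_c1, C2xAge = product_c2))
```

A reliable coefficient on an interaction column means the value of that contrast differs between the age groups.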

Possible Result 1

● Contrast 1 significant
– Effect of disfluencies
● Contrast 2 non-sig.
– Location irrelevant
● No effect of age at all in this case
– Everything the same for both age groups

[Figure: two bar-chart panels (YOUNG, OLDER); x-axis labels as extracted: Before Plot Point, After Plot Point, Rest of Story; y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    no
AGE           no
C1 x AGE      no
C2 x AGE      no

Possible Result 2

● Contrast 2 is now significant
– Typical > atypical
● Still no effect of AGE

[Figure: two bar-chart panels (YOUNG, OLDER); x-axis labels as extracted: Before Plot Point, After Plot Point, Rest of Story; y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           no
C1 x AGE      no
C2 x AGE      no

Possible Result 3

● Now, AGE effect
– Older adults remember more across the board
● But, no interaction
– Disfluency effect is the same in both age groups

[Figure: two bar-chart panels (YOUNG, OLDER); x-axis labels as extracted: Before Plot Point, After Plot Point, Rest of Story; y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      no

Possible Result 4

● Contrast 1 interacts with AGE
– Effect of the presence of disfluencies differs across age
– Effect only for young adults
● Contrast 2 (location) still same in all cases

[Figure: two bar-chart panels (YOUNG, OLDER); x-axis labels as extracted: Before Plot Point, After Plot Point, Rest of Story; y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      no

Possible Result 5

● Now, Contrast 2 also interacts with AGE
– Reversal of Typical vs Atypical effect across age

[Figure: two bar-chart panels (YOUNG, OLDER); x-axis labels as extracted: Before Plot Point, After Plot Point, Rest of Story; y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      yes

Possible Result 6

● Contrast 2 interaction but not Contrast 1
– Typical vs Atypical comparison does depend on age
– Overall effect of having fillers does not

[Figure: two bar-chart panels (YOUNG, OLDER); x-axis labels as extracted: Before Plot Point, After Plot Point, Rest of Story; y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      yes

Interactions in R
● Implementing interactions in an R model formula (lmer or otherwise):
– A + B
● Main effects of A and B, no interaction
– A * B
● All possible interactions and main effects of A and B
– A : B
● Interaction of A and B, no main effect (unless you add it separately)

● In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
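The three operators can be compared by asking R which terms each formula expands to. A, B, and y are placeholder names, and term_labels is a small helper made up for this sketch.

```r
# Hypothetical helper: list the terms a model formula expands to
term_labels <- function(f) attr(terms(f), "term.labels")

additive    <- term_labels(y ~ A + B)   # main effects only
crossed     <- term_labels(y ~ A * B)   # main effects plus interaction
interaction <- term_labels(y ~ A:B)     # interaction only

print(additive)
print(crossed)
print(interaction)
```

Writing y ~ A + B + A:B by hand is equivalent to y ~ A * B, which is how individual interactions can be included or left out in a model with many predictors.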
