
Page 1: Assumptions

Assumptions

Page 2: Assumptions

“Essentially, all models are wrong, but some are useful”

George E.P. Box

Your model has to be wrong… but that’s o.k. if it’s illuminating!

Page 3: Assumptions

Linear Model Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence

Page 4: Assumptions

Linear Model Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence

Page 5: Assumptions

Absence of Collinearity

Baayen (2008: 182)

Page 6: Assumptions

Absence of Collinearity

Baayen (2008: 182)

Page 7: Assumptions

Where does collinearity come from?

…most often, correlated predictor variables

Demo
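
A minimal sketch of the kind of demo meant here, in R (the simulated predictors and numbers are illustrative, not from the slides): when two predictors are strongly correlated, their coefficient estimates become unstable and their standard errors inflate.

set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)         # x2 is nearly a copy of x1
y  <- 2 * x1 + rnorm(n)

cor(x1, x2)                           # correlation close to 1
summary(lm(y ~ x1 + x2))              # large standard errors, unstable coefficients

r2 <- summary(lm(x1 ~ x2))$r.squared  # variance inflation factor by hand:
1 / (1 - r2)                          # a VIF far above 10 signals severe collinearity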

Page 8: Assumptions

What to do?

Page 9: Assumptions

Linear Model Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence

Page 10: Assumptions

Baayen (2008: 189-190)

Leverage

Page 11: Assumptions
Page 12: Assumptions
Page 13: Assumptions
Page 14: Assumptions
Page 15: Assumptions

DFbeta (…and much more)

Leave-one-out Influence Diagnostics
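
A minimal sketch of these leave-one-out diagnostics in base R (the data and the planted outlier are simulated for illustration, not from the slides):

set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
x[30] <- 4; y[30] <- -5            # plant one high-leverage, off-the-line point
xmdl <- lm(y ~ x)

dfbeta(xmdl)                       # change in each coefficient when a point is left out
cooks.distance(xmdl)               # overall leave-one-out influence of each point
summary(influence.measures(xmdl)) # DFBETAS, DFFITS, leverage (hat values), and more

# conventional flag for standardized DFBETAS: |value| > 2 / sqrt(n)
which(abs(dfbetas(xmdl)[, "x"]) > 2 / sqrt(nobs(xmdl)))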

Page 16: Assumptions

Winter & Matlock (2013)

Page 17: Assumptions

Linear Model Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence

Page 18: Assumptions

Normality of Error

The error (not the data!) is assumed to be normally distributed

So, the residuals should be normally distributed

Page 19: Assumptions

xmdl = lm(y ~ x)         # fit the linear model
hist(residuals(xmdl))    # histogram of the residuals

Page 20: Assumptions

qqnorm(residuals(xmdl))
qqline(residuals(xmdl))

Page 21: Assumptions

qqnorm(residuals(xmdl))
qqline(residuals(xmdl))

Page 22: Assumptions

Linear Model Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence

Page 23: Assumptions

Homoskedasticity of Error

The error (not the data!) is assumed to have equal variance across the predicted values

So, the residuals should have equal variance across the predicted values
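
A minimal sketch of the usual visual check, assuming the model xmdl = lm(y ~ x) from the earlier slides:

plot(fitted(xmdl), residuals(xmdl),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# the spread of the residuals should look roughly constant across the
# fitted values, with no funnel or fan shape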

Page 24: Assumptions
Page 25: Assumptions

Page 26: Assumptions

Page 27: Assumptions

Page 28: Assumptions

WHAT TO DO IF NORMALITY/HOMOSKEDASTICITY IS VIOLATED?

Either: nothing + report the violation

Or: report the violation + transformations

Page 29: Assumptions

Two types of transformations

Linear Transformations: leave the shape of the distribution intact (centering, scaling)

Nonlinear Transformations: change the shape of the distribution
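
A minimal sketch of the contrast in R (the skewed variable x is simulated for illustration; the log is just one common nonlinear transformation):

set.seed(1)
x <- rexp(100)                        # a right-skewed variable

x_centered <- x - mean(x)             # linear: centering (shape unchanged)
x_scaled   <- (x - mean(x)) / sd(x)   # linear: standardizing (shape unchanged)
x_logged   <- log(x)                  # nonlinear: log transform (shape changes)

par(mfrow = c(1, 3))
hist(x, main = "raw")
hist(x_scaled, main = "centered + scaled")
hist(x_logged, main = "log-transformed")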

Page 30: Assumptions
Page 31: Assumptions
Page 32: Assumptions

Before transformation

Page 33: Assumptions

After transformation

Still bad… but better!

Page 34: Assumptions

Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence

Page 35: Assumptions

Normality of Errors

Homoskedasticity of Errors

(Histogram of Residuals)

Q-Q plot of Residuals

Residual Plot

Assumptions

Page 36: Assumptions

Absence of Collinearity

No influential data points

Independence

Normality of Errors

Homoskedasticity of Errors

Assumptions

Page 37: Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence

Assumptions

Page 38: Assumptions
Page 39: Assumptions

What is independence?

Page 40: Assumptions

Common experimental data: one Subject, several Items (Item #1, Item …, Item …), each with repetitions (Rep 1, Rep 2, Rep 3)

Page 41: Assumptions

Common experimental data: one Subject, several Items, each with repetitions (Rep 1, Rep 2, Rep 3)

Pseudoreplication = Disregarding Dependencies

Page 42: Assumptions

Subject1 Item1
Subject1 Item2
Subject1 Item3
… …

Subject2 Item1
Subject2 Item2
Subject2 Item3
… …

Machlis et al. (1985): “pooling fallacy”

Hurlbert (1984): “pseudoreplication”

Page 43: Assumptions

Hierarchical data is everywhere

• Typological data (e.g., Bell 1978, Dryer 1989, Perkins 1989; Jaeger et al., 2011)

• Organizational data

• Classroom data

Page 44: Assumptions

German, French, English, Spanish, Italian, Swedish, Norwegian, Finnish, Hungarian, Turkish, Romanian

Page 45: Assumptions

German, French, English, Spanish, Italian, Swedish, Norwegian, Finnish, Hungarian, Turkish, Romanian

Page 46: Assumptions

Class 1 Class 2

Hierarchical data is everywhere

Page 47: Assumptions

Class 1 Class 2

Hierarchical data is everywhere

Page 48: Assumptions

Class 1 Class 2

Hierarchical data is everywhere

Page 49: Assumptions

Hierarchical data is everywhere

Page 50: Assumptions

Intraclass Correlation (ICC)

Hierarchical data is everywhere
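
A minimal sketch of the ICC for a balanced one-way design (n subjects, k observations each), using the classic ANOVA-based estimator; the function name icc1 and the simulated data are illustrative, not from the slides:

icc1 <- function(y, group) {
  k   <- length(y) / length(unique(group))   # observations per group (balanced design)
  tab <- anova(lm(y ~ factor(group)))
  msb <- tab["factor(group)", "Mean Sq"]     # between-subject mean square
  msw <- tab["Residuals", "Mean Sq"]         # within-subject mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

set.seed(1)
subj <- rep(1:16, each = 10)
y    <- rnorm(16)[subj] + rnorm(160)         # true ICC = 0.5
icc1(y, subj)                                # estimate should come out near 0.5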

Page 51: Assumptions

Simulation for 16 subjects: Type I error rate under pseudoreplication vs. an items analysis
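
A minimal sketch of this kind of simulation (not the original one; the group sizes, trial counts, and noise levels are illustrative): under the null hypothesis, analyzing every trial as if it were independent inflates the Type I error rate, while aggregating to one value per subject keeps it near the nominal 5%.

set.seed(1)
n_subj <- 16; n_trials <- 20; n_sims <- 1000
p_pseudo <- p_bysubj <- numeric(n_sims)
for (i in 1:n_sims) {
  subj  <- rep(1:n_subj, each = n_trials)
  group <- rep(rep(c("A", "B"), each = n_subj / 2), each = n_trials)
  y     <- rnorm(n_subj)[subj] + rnorm(n_subj * n_trials)   # no true group effect
  # pseudoreplication: all trials treated as independent observations
  p_pseudo[i] <- summary(lm(y ~ group))$coefficients[2, 4]
  # by-subject analysis: average over each subject's trials first
  y_subj <- as.numeric(tapply(y, subj, mean))
  g_subj <- factor(tapply(group, subj, function(g) g[1]))
  p_bysubj[i] <- summary(lm(y_subj ~ g_subj))$coefficients[2, 4]
}
mean(p_pseudo < 0.05)   # Type I error rate well above 0.05
mean(p_bysubj < 0.05)   # close to the nominal 0.05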

Page 52: Assumptions

Interpretational Problem: What’s the population for inference?

Page 53: Assumptions

Violating the independence assumption makes the p-value…

…meaningless

Page 54: Assumptions

S1

S2

Page 55: Assumptions

S1

S2

Page 56: Assumptions

That’s it (for now)