I231B QUANTITATIVE METHODS Analysis of Variance (ANOVA)


Syllabus Changes

Thursday, April 24: Regression

April 29: Multivariate Regression

May 1: Regression Diagnostics

May 6: Logistic Regression

May 8: Display of some advanced topics; Course Review


Analysis of Variance

In its simplest form, ANOVA is used to compare the means of a metric variable across three or more categories. Example:

Income (metric) and Marital Status (many categories)

Relies on the F-distribution. Just like the t-distribution and chi-square distribution, the F-distribution is a family of sampling distributions, one for each combination of degrees of freedom (the F-distribution has two: numerator df and denominator df).

What is ANOVA?

If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run a t-test for every pair of categories (3 t-tests for 3 categories). The problem is that these tests would not be independent of each other: once two of the pairwise mean differences are known, the third is already determined.

A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error)
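Written out, this comparison rests on splitting the total variability into a between-group part and a within-group part. The block below is a standard one-way formulation, sketched with notation assumed here rather than taken from the slides: $k$ groups, $n_j$ observations in group $j$, $N = \sum_j n_j$, group means $\bar{y}_j$ and grand mean $\bar{y}$. Each mean square is its sum of squares divided by its degrees of freedom.

```latex
\begin{aligned}
SS_{total} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij}-\bar{y})^2 = SS_{bg} + SS_{wg} \\
SS_{bg}    &= \sum_{j=1}^{k} n_j(\bar{y}_j-\bar{y})^2,
             \qquad MS_{bg} = \frac{SS_{bg}}{k-1} \\
SS_{wg}    &= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_j)^2,
             \qquad MS_{wg} = \frac{SS_{wg}}{N-k}
\end{aligned}
```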

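The F-ratio

$$F = \frac{MS_{bg}}{MS_{wg}}$$

where MS = mean square, bg = between groups, and wg = within groups.

Numerator df: $df_{bg} = k - 1$, where k is the number of categories. Denominator df: $df_{wg} = N - k$, where N is the total number of observations.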
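To make the F-ratio concrete, here is a minimal from-scratch sketch in plain Python (standard library only). The function name `one_way_anova` and the income figures are hypothetical, invented for illustration; they are not the class data set. It uses numerator df = k − 1 and denominator df = N − k.

```python
# One-way ANOVA F-ratio computed from scratch (standard library only).
# The income figures below are HYPOTHETICAL, made up for illustration.

def one_way_anova(groups):
    """Return (F, df_bg, df_wg) for a list of groups of numbers."""
    k = len(groups)                        # number of categories
    n_total = sum(len(g) for g in groups)  # N, total observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group size times the squared distance
    # of each group mean from the grand mean.
    ss_bg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: squared distance of each observation
    # from its own group mean.
    ss_wg = 0.0
    for g in groups:
        m = sum(g) / len(g)
        ss_wg += sum((y - m) ** 2 for y in g)

    df_bg = k - 1
    df_wg = n_total - k
    ms_bg = ss_bg / df_bg  # mean square between groups
    ms_wg = ss_wg / df_wg  # mean square within groups
    return ms_bg / ms_wg, df_bg, df_wg

# Hypothetical incomes (in $1,000s) by marital status.
married = [52, 61, 58, 49, 66]
single = [38, 44, 41, 50, 35]
divorced = [45, 40, 52, 47, 43]

F, df_bg, df_wg = one_way_anova([married, single, divorced])
print(f"F({df_bg}, {df_wg}) = {F:.2f}")
```

With three groups of five, the ratio is reported as F(2, 12); the pair of df values is what selects the particular F-distribution used for the test.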

Interpreting the F-ratio

Generally, the F-ratio measures how different the group means are relative to the variability within each sample.

Larger values of F → greater likelihood that the differences between means are not due to chance alone.

Null Hypothesis in ANOVA

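If there is no difference between the population means ($H_0: \mu_1 = \mu_2 = \dots = \mu_k$), then the between-group mean square should be roughly equal to the within-group mean square, so the F-ratio should be close to 1:

$$F = \frac{MS_{bg}}{MS_{wg}} \approx 1$$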
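One way to see why F hovers around 1 when the null hypothesis is true is to simulate it. This sketch, under assumed settings (3 groups of 10, all drawn from a Normal(50, 10) population, 2,000 replications, seed 1, all arbitrary choices), computes F for each simulated data set and averages the results.

```python
import random
import statistics

def f_ratio(groups):
    """One-way ANOVA F = MS_bg / MS_wg for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.fmean(g) for g in groups]
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_wg = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    return (ss_bg / (k - 1)) / (ss_wg / (n_total - k))

random.seed(1)
K, N_PER_GROUP, REPS = 3, 10, 2000  # arbitrary illustration settings

f_values = []
for _ in range(REPS):
    # H0 is true by construction: every group comes from Normal(mean=50, sd=10).
    groups = [[random.gauss(50, 10) for _ in range(N_PER_GROUP)] for _ in range(K)]
    f_values.append(f_ratio(groups))

mean_f = statistics.fmean(f_values)
# Theory: the mean of F(2, 27) is 27 / (27 - 2) = 1.08, i.e. close to 1.
print(f"average F over {REPS} samples where H0 is true: {mean_f:.2f}")
```

Individual F values still scatter widely above and below 1; only their average settles near 1, which is why a single large F is what counts as evidence against H0.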

Visual ANOVA and the F-ratio

http://www.psych.utah.edu/stat/introstats/anovaflash.html


F-distribution

A right-skewed distribution. It is the ratio of two independent chi-square variables, each divided by its degrees of freedom.
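The chi-square ratio can be simulated directly. This is a sketch, assuming Python's `random.gammavariate` as the chi-square generator (a chi-square variable with df degrees of freedom is a Gamma variable with shape df/2 and scale 2); the df pair (2, 27), the seed, and the draw count are arbitrary. A mean well above the median is a simple sign of the long right tail.

```python
import random
import statistics

def chi_square(df):
    # chi-square(df) is the same distribution as Gamma(shape=df/2, scale=2).
    return random.gammavariate(df / 2, 2)

def f_draw(d1, d2):
    # F = (X1 / d1) / (X2 / d2), with X1 ~ chi-square(d1) and X2 ~ chi-square(d2)
    # drawn independently.
    return (chi_square(d1) / d1) / (chi_square(d2) / d2)

random.seed(2)
draws = [f_draw(2, 27) for _ in range(20000)]

mean_f = statistics.fmean(draws)
median_f = statistics.median(draws)
# A mean larger than the median indicates a long right tail (right skew).
print(f"mean = {mean_f:.2f}, median = {median_f:.2f}")
```

Because both chi-square variables are non-negative, F can never be negative, which is one reason the distribution piles up near zero and stretches to the right.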

F-distribution

F-test is always a one-tailed test. Why?

Relationship to the t-test

Why not just run t-tests on all possible pairs of categories? As the number of comparisons grows, some ‘significant’ differences are expected by chance alone, but they do not necessarily indicate an overall difference.
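The growth in chance findings can be put in numbers. This sketch counts the pairwise t-tests for k categories (k choose 2) and computes the probability of at least one false positive at α = 0.05. It assumes the tests are independent, which is a simplification: pairwise tests on the same groups are not independent, so treat the figures as a rough guide (by Bonferroni's inequality the probability can never exceed m × α).

```python
ALPHA = 0.05

def n_pairs(k):
    # Number of distinct pairs among k categories: k choose 2.
    return k * (k - 1) // 2

def familywise_error(m, alpha=ALPHA):
    # P(at least one "significant" result among m tests when H0 holds),
    # ASSUMING the m tests are independent.
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5, 6):
    m = n_pairs(k)
    print(f"k = {k}: {m:2d} t-tests, P(at least one false positive) ~ {familywise_error(m):.2f}")
```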

Still, t-tests become important after an ANOVA so that we can find out which pairs are significantly different.

Certain ‘corrections’ can be applied to such post-hoc t-tests to account for multiple comparisons (e.g., the Bonferroni correction, which divides the significance level α by the number of comparisons being made, or equivalently multiplies each p-value by that number).
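A minimal sketch of the Bonferroni adjustment for three post-hoc comparisons. The pair labels and p-values are hypothetical placeholders, not output from anova.do; the point is only to show that comparing p to α/m and comparing p × m to α lead to the same decisions.

```python
ALPHA = 0.05

# HYPOTHETICAL p-values from three pairwise t-tests run after an ANOVA.
p_values = {
    "married vs single": 0.004,
    "married vs divorced": 0.030,
    "single vs divorced": 0.200,
}

m = len(p_values)       # number of comparisons
alpha_adj = ALPHA / m   # Bonferroni: compare raw p-values to alpha / m ...

results = {}
for pair, p in p_values.items():
    p_adj = min(1.0, p * m)  # ... or, equivalently, compare p * m to alpha
    verdict = "significant" if p < alpha_adj else "not significant"
    results[pair] = verdict
    print(f"{pair}: p = {p:.3f}, adjusted p = {p_adj:.3f} -> {verdict}")
```

Note that "married vs divorced" (p = 0.030) would pass an uncorrected 0.05 threshold but not the corrected one; the correction trades some power for protection against chance findings.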

Logic of ANOVA

Conceptual Intro to ANOVA

Class Example: anova.do and sm96_compressed.dta

http://faculty.vassar.edu/lowry/ch13pt1.html

