I231B QUANTITATIVE METHODS
Analysis of Variance (ANOVA)
Syllabus Changes
Thursday, April 24: Regression
April 29: Multivariate Regression
May 1: Regression Diagnostics
May 6: Logistic Regression
May 8: Display of some advanced topics; Course Review
Analysis of Variance
In its simplest form, ANOVA is used to compare means for three or more categories.
Example: Income (metric) and Marital Status (many categories)
Relies on the F-distribution.
Just like the t-distribution and chi-square distribution, the F-distribution is a family: there is a different sampling distribution for each possible value of df.
What is ANOVA?
If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests.
The problem is that the 3 tests would not be independent of each other: each group's data is reused in more than one comparison, so the tests share information.
A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error).
The F-ratio
$$F = \frac{MS_{bg}}{MS_{wg}}$$
MS = mean square
bg = between groups
wg = within groups
df = number of categories - 1, i.e., $k - 1$
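The F-ratio above can be sketched by hand in a few lines of Python. This is a minimal illustration, not the class's anova.do: the income figures are hypothetical, the function name `one_way_anova` is made up for this example, and the within-groups df (N - k) is the standard definition rather than something stated on the slide.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: group means around the grand mean
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: observations around their own group mean
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_bg = k - 1          # number of categories - 1
    df_wg = n_total - k    # total observations - number of categories
    ms_bg = ss_bg / df_bg  # mean square between groups
    ms_wg = ss_wg / df_wg  # mean square within groups
    return ms_bg / ms_wg, df_bg, df_wg


# Hypothetical incomes (in $1000s) by marital status, for illustration only
incomes = {
    "married": [52, 60, 58, 65, 55],
    "single": [40, 45, 38, 50, 42],
    "divorced": [48, 44, 51, 47, 45],
}
f_value, df_bg, df_wg = one_way_anova(list(incomes.values()))
print(f"F({df_bg}, {df_wg}) = {f_value:.2f}")
```

The function only computes the test statistic; turning it into a p-value requires the F-distribution with (df_bg, df_wg) degrees of freedom.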
Interpreting the F-ratio
Generally, an F-ratio is a measure of how different the means are relative to the variability within each sample.
The larger the value of F, the greater the likelihood that the differences between means are not due to chance alone.
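To see why F is "relative to the variability within each sample", the sketch below compares two made-up data sets whose group means are identical (10, 15, 20) but whose within-group spread differs. The data and the helper name `f_ratio` are assumptions for illustration.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio: MS between groups / MS within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_bg / (k - 1)) / (ss_wg / (n_total - k))


# Same group means in both data sets; only the within-group spread changes
tight_groups = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
noisy_groups = [[5, 10, 15], [10, 15, 20], [15, 20, 25]]

f_tight = f_ratio(tight_groups)
f_noisy = f_ratio(noisy_groups)
print(f"Low within-group variability:  F = {f_tight:.1f}")
print(f"High within-group variability: F = {f_noisy:.1f}")
```

The between-groups mean square is the same in both cases, so the drop in F comes entirely from the larger within-groups mean square in the noisy data.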
Null Hypothesis in ANOVA
If there is no difference between the means, then the between-group mean square should be approximately equal to the within-group mean square, so F should be close to 1.
$$F = \frac{MS_{bg}}{MS_{wg}}$$
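A small simulation can make the null hypothesis concrete. In this sketch, all three groups are drawn from the same normal population, so the null hypothesis is true by construction. The population mean and SD, the group size, the seed, and the number of repetitions are all arbitrary choices for illustration.

```python
import random


def f_ratio(groups):
    """One-way ANOVA F-ratio: MS between groups / MS within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_bg / (k - 1)) / (ss_wg / (n_total - k))


rng = random.Random(231)  # fixed seed so the run is repeatable
simulated_f = []
for _ in range(2000):
    # Three groups of 30 from the SAME population: no true mean difference
    groups = [[rng.gauss(50, 10) for _ in range(30)] for _ in range(3)]
    simulated_f.append(f_ratio(groups))

average_f = sum(simulated_f) / len(simulated_f)
print(f"Average F over {len(simulated_f)} null samples: {average_f:.2f}")
```

Individual samples still produce F values well above or below 1; it is the long-run average that sits near 1 when the null hypothesis holds.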
Visual ANOVA and F-ratio
http://www.psych.utah.edu/stat/introstats/anovaflash.html
F-distribution
A right-skewed distribution.
It is the ratio of two chi-square distributions, each divided by its degrees of freedom.
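The ratio-of-chi-squares description can be checked with a rough simulation. The sketch below builds each chi-square draw as a sum of squared standard normals, then forms the ratio after dividing each by its df. The choice of df (2 and 12), the seed, and the function names are assumptions for illustration.

```python
import random
from statistics import mean, median


def chi_square_draw(df, rng):
    """One chi-square draw: the sum of df squared standard normal values."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))


def f_draw(df1, df2, rng):
    """One F draw: ratio of two chi-squares, each divided by its own df."""
    return (chi_square_draw(df1, rng) / df1) / (chi_square_draw(df2, rng) / df2)


rng = random.Random(231)  # fixed seed so the run is repeatable
draws = [f_draw(2, 12, rng) for _ in range(5000)]

print(f"min    = {min(draws):.3f}")
print(f"median = {median(draws):.3f}")
print(f"mean   = {mean(draws):.3f}")
```

The draws are never negative, and the mean sits above the median, which is the signature of a right-skewed distribution.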
F-distribution
The F-test is always a one-tailed test. Why?
Relationship to t-test
Why not just run many t-tests between all possible combinations?
As the number of comparisons grows, some significant differences are expected by chance alone, but they do not necessarily indicate an overall difference.
Still, t-tests become important after an ANOVA so that we can find out which pairs are significantly different.
Certain 'corrections' can be applied to such post-hoc t-tests so that we account for multiple comparisons (e.g., the Bonferroni correction, which divides the significance level α by the number of comparisons being made).
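The arithmetic behind the multiple-comparisons problem can be sketched as follows. The familywise error formula used here assumes the tests are independent, which pairwise t-tests on shared groups are not, so treat the figures as a rough guide only. The function names and α = 0.05 are assumptions for illustration.

```python
from math import comb


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-comparison significance level after a Bonferroni correction."""
    return alpha / n_tests


for k in (3, 4, 5, 10):
    n_tests = comb(k, 2)  # number of distinct pairs among k groups
    print(
        f"{k} groups: {n_tests} pairwise tests, "
        f"familywise error ~ {familywise_error(n_tests):.3f}, "
        f"Bonferroni alpha = {bonferroni_alpha(n_tests):.4f}"
    )
```

With 3 groups there are 3 pairwise tests, so each post-hoc t-test would be judged against 0.05 / 3 rather than 0.05.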
Logic of the ANOVA
Conceptual Intro to ANOVA
Class Example: anova.do and sm96_compressed.dta