Lectures 7: One-Way ANOVA
Junshu Bao
University of Pittsburgh
Table of contents
Review of ANOVA
Example: Cortisol Levels and Psychiatric Disorders
Data Exploration
ANOVA Test
Post-hoc Tests
Non-parametric Method
Review of ANOVA
Introduction
- ANOVA stands for ANalysis Of VAriance. Contrary to what the name suggests, we will be primarily concerned with comparing the means of the data, not their variances.
- One-way ANOVA compares mean values between multiple independent groups.
  - Independent variable: categorical variable (> 2 levels)
  - Dependent variable: continuous outcome variable
  - Extension of the t-test (2 groups)
ANOVA Hypothesis Testing
- Null: all the population means are equal, i.e.
  $$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$$
- Alternative: not all the $\mu_i$ are equal, i.e.
  $$H_a: \mu_i \neq \mu_j \quad \text{for some } i, j.$$
One-Way ANOVA Model
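The statistical model for one-way ANOVA is
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
- $Y_{ij}$ is the $j$th observation of the $i$th treatment (group).
- $\mu$ is the overall mean level.
- $\alpha_i$ is the differential effect of the $i$th treatment.
- The $\alpha_i$ are normalized: $\sum_{i=1}^{I} \alpha_i = 0$.
- $\varepsilon_{ij}$ is the random error, with $\varepsilon_{ij} \sim N(0, \sigma^2)$.
The mean of the $i$th treatment group is $\mu_i = \mu + \alpha_i$. It follows that the null hypothesis can be written as
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0$$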
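The step from equal group means to all-zero effects relies on the normalization constraint. A short derivation, where $c$ denotes the common value of the group means (notation introduced here for illustration, not taken from the slides):

```latex
\begin{align*}
\mu_1 = \mu_2 = \cdots = \mu_I = c
  &\implies \alpha_i = \mu_i - \mu = c - \mu \quad \text{for every } i, \\
\sum_{i=1}^{I} \alpha_i = 0
  &\implies I\,(c - \mu) = 0 \implies c = \mu, \\
  &\implies \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0 .
\end{align*}
```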
The F Test
The analysis of variance is based on the following identity:
$$\sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 + J\sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2$$
and the identity may be symbolically expressed as
$$SST = SSE + SSG$$
- SST is the total sum of squares (total variation).
- SSE is the error sum of squares (variation within groups).
- SSG is the sum of squares among groups (variation among groups).
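The identity can be verified by splitting each deviation from the grand mean into a within-group part and a between-group part. A sketch of the argument, writing $\bar{Y}_{i.}$ for the mean of group $i$ and $\bar{Y}_{..}$ for the grand mean (the bars are assumed to have been lost in the notation above):

```latex
\begin{align*}
Y_{ij} - \bar{Y}_{..}
  &= (Y_{ij} - \bar{Y}_{i.}) + (\bar{Y}_{i.} - \bar{Y}_{..}), \\
\sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2
  &= \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2
   + \sum_{i=1}^{I}\sum_{j=1}^{J} (\bar{Y}_{i.} - \bar{Y}_{..})^2 \\
  &\quad + 2\sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})
     \underbrace{\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})}_{=\,0} \\
  &= \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2
   + J\sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2 .
\end{align*}
```

The cross term vanishes because deviations from a group's own mean sum to zero within that group, and the last term picks up the factor $J$ because the summand does not depend on $j$.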
The F Test (cont.)
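- Theorem:
  $$E(SSE) = I(J-1)\sigma^2 \implies E\left[\frac{SSE}{I(J-1)}\right] = \sigma^2$$
  $$E(SSG) = J\sum_{i=1}^{I} \alpha_i^2 + (I-1)\sigma^2$$
  Under $H_0$, $E(SSG) = (I-1)\sigma^2$ and $E[SSG/(I-1)] = \sigma^2$.
- Theorem:
  $$SSG/\sigma^2 \sim \chi^2_{I-1} \quad \text{under } H_0$$
  $$SSE/\sigma^2 \sim \chi^2_{I(J-1)}$$
  Moreover, SSE and SSG are independent. Consequently, under $H_0$,
  $$F = \frac{SSG/(I-1)}{SSE/[I(J-1)]} = \frac{MSG}{MSE} \sim F_{I-1,\; I(J-1)}$$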
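The expectations in the first theorem also indicate why large values of $F$ count as evidence against $H_0$. A brief worked step, using only the quantities defined above:

```latex
\begin{align*}
E(MSG) &= \frac{E(SSG)}{I-1}
        = \sigma^2 + \frac{J}{I-1}\sum_{i=1}^{I} \alpha_i^2
        \;\ge\; \sigma^2 = E(MSE), \\
\text{with equality} &\iff \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0 .
\end{align*}
```

So when $H_0$ fails, the numerator of $F$ tends to be inflated relative to the denominator, and the test rejects for values of $F$ in the upper tail of the $F_{I-1,\; I(J-1)}$ distribution.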
Example: Cortisol Levels and Psychiatric Disorders
Example
A group of psychiatrists wanted to better understand the link between a common stress hormone and different psychiatric disorders. They enrolled random samples of patients diagnosed with
1. 'normal' psychiatric status (as a control group)
2. major depression
3. schizophrenia
4. bipolar disorder
5. 'atypical' psychiatric status.
The cortisol level of each patient was measured at their study visit.
Research question: Are the levels of cortisol significantly different for patients with differing psychiatric status?
Analysis Plan
To examine the relationship between cortisol level and psychiatric status, we will perform the following analysis:
- Data exploration
  - Descriptive statistics
  - Side-by-side boxplot of cortisol levels by psychiatric status
- ANOVA
- Assumption checking
- Post-hoc tests (if appropriate)
Reading Data
Read the data and create formats.
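/* Read the group code and cortisol level for each patient */
data cort;
  infile 'C:\STAT 1301\data_supp\cortisol.dat';
  input group cortisol;
  label group = 'Psychiatric status';
run;
/* Define a format that maps the numeric group codes to labels */
proc format;
  value groupform
    1 = 'Normal'
    2 = 'Major Depression'
    3 = 'Bipolar Depression'
    4 = 'Schizophrenia'
    5 = 'Atypical';
run;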
Notice that 'group' is originally read as a numeric variable with values 1, 2, 3, 4, and 5. The format procedure defines a format, groupform, that displays these numeric values as descriptive labels.
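One quick way to confirm that the format behaves as intended is to print a few observations with it applied. This is a minimal sketch, assuming the cort data set and the groupform format have been created as above; the choice of five observations is arbitrary.

```sas
/* Print the first five observations, showing group through the
   groupform format and using variable labels as column headers. */
proc print data=cort(obs=5) label;
  format group groupform.;
run;
```

The format only changes how group is displayed; the stored values remain the numeric codes 1 to 5.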
Data Exploration
Compare descriptive statistics: mean and standard deviation.
proc tabulate data=cort;
class group;
format group groupform.;
var cortisol;
table group,
cortisol*(mean std n);
run;
- The 'Major Depression' group has the highest sample mean of cortisol.
- The 'Major Depression' group also has the largest standard deviation.
Data Exploration (2)
Create a side-by-side boxplot.
proc sort data=cort;
by group;
run;
proc boxplot data=cort;
plot (cortisol)*group;
run;
- To create the boxplot, the data must first be sorted by the grouping variable.
- The boxplot suggests that the second group, 'Major Depression', differs markedly from the other groups.
ANOVA Test
ANOVA F Test
proc anova data=cort;
class group;
model cortisol=group;
run;
ANOVA Table:
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     4             1426           356     22.32   <0.0001
Error    66             1054            16
Total    70             2480
- $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$ versus $H_a: \mu_i \neq \mu_j$ for some $i$ and $j$.
- Test statistic: F = 22.32
- p-value < 0.0001
- Decision: Reject $H_0$. Not all of the population means are equal.
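The entries of the ANOVA table can be checked by hand from the sums of squares and degrees of freedom. A worked sketch of the arithmetic (the table appears to report the mean squares rounded to whole numbers, while F is computed from the unrounded values):

```latex
\begin{align*}
SST &= SSG + SSE = 1426 + 1054 = 2480, \\
MSG &= \frac{SSG}{\text{DF}_{\text{Model}}} = \frac{1426}{4} = 356.5, \\
MSE &= \frac{SSE}{\text{DF}_{\text{Error}}} = \frac{1054}{66} \approx 15.97, \\
F   &= \frac{MSG}{MSE} \approx \frac{356.5}{15.97} \approx 22.32 .
\end{align*}
```

The degrees of freedom also add up: 4 + 66 = 70, which corresponds to 5 groups and 71 observations in total.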
Checking Assumptions
- The dependent variable is normally distributed within each group.
  - ANOVA is fairly robust against violations of normality.
  - Test by running proc univariate on the variable or the residuals by group.
- Homogeneity of variance: variances are equal across groups.
  - Use Levene's test.
- Independence of observations: a study design issue.
If assumptions are violated (normality or homogeneity of variance), we can try a transformation to fix the issue or run a non-parametric alternative (Kruskal-Wallis).
Checking Normality Assumption
proc univariate data=cort normal;
var cortisol;
by group;
run;
Test results
- Group 1: significant
- Group 2: not significant
- Group 3: significant
- Group 4: not significant
- Group 5: marginally significant
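The check above tests the raw variable within each group. The residual-based alternative mentioned earlier could be sketched as follows. This is an illustration under assumptions, not the procedure used in the lecture: it fits the same one-way model with proc glm (which, unlike proc anova, has an output statement), and the names cort_resid and resid are hypothetical.

```sas
/* Fit the one-way model and save the residuals
   (each observation minus its group mean). */
proc glm data=cort;
  class group;
  model cortisol = group;
  output out=cort_resid residual=resid;  /* cort_resid, resid: illustrative names */
run;
quit;

/* Test the pooled residuals for normality. */
proc univariate data=cort_resid normal;
  var resid;
run;
```

Pooling the residuals gives a single normality test on all observations instead of one test per group, which can help when some groups are small.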
Log Transformation of Response
Since the normality test is significant for some groups, we can take the log of the cortisol level and see whether the data become more 'normal'.
data cort2; set cort; logcort = log(cortisol); run;
proc sort data=cort2; by group; run;
proc univariate data=cort2 normal;
var logcort; by group; run;
Test results
- Group 1: significant
- Group 2: significant
- Group 3: not significant
- Group 4: marginally significant
- Group 5: not significant
Normality is not improved by the log transformation.
Checking Equal-Variance Assumption (1)
Levene's test
- $H_0$: the variances in each group are equal. The assumption is 'met' if we fail to reject the null hypothesis, so we want a non-significant p-value.
- $H_a$: the variances are not all equal.
We can ask SAS for Levene's test using the hovtest option in the means statement.
proc anova data=cort;
class group;
model cortisol=group;
means group/ hovtest;
run;
The p-value is less than 0.0001, so the test is significant: the equal-variance assumption is violated for the raw cortisol levels.
Checking Equal-Variance Assumption (2)
Let us check the homogeneous variance assumption for the log-transformed data.
proc anova data=cort2;
class group;
model logcort=group;
means group/ hovtest; *asks for Levene's test;
run;
The p-value is 0.0856, so we fail to reject the null hypothesis. The variances are not significantly different among groups for log(cortisol), so we will use the ANOVA test result on the log-transformed response.
ANOVA F Test for log(cortisol)
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     4            36.13          9.03     20.63   <0.0001
Error    66            28.90          0.44
Total    70            65.03
- $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$ versus $H_a: \mu_i \neq \mu_j$ for some $i$ and $j$. Note that $\log(\mu_i) = \log(\mu_j) \implies \mu_i = \mu_j$.
- Test statistic: F = 20.63
- p-value < 0.0001
- Decision: Reject $H_0$. Not all of the population means are equal.
Post-hoc Tests
If the overall F-test is significant, we only know that at least one pair of means differs, but we do not know which pairs differ.
- Pairwise tests (I = 3):
  $$H_{01}: \mu_1 = \mu_2$$
  $$H_{02}: \mu_1 = \mu_3$$
  $$H_{03}: \mu_2 = \mu_3$$
- There are three different methods of performing pairwise tests. All three adjust for the inflated type I error rate when performing multiple comparisons.
  - Tukey's test (use when sample sizes are the same); Tukey-Kramer test (for unequal sample sizes)
  - Bonferroni's test (tends to be conservative)
  - Scheffé's test (equal or unequal sample sizes)
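The size of the multiple-comparison problem can be made concrete. The sketch below counts the pairwise tests for $I$ groups and shows the Bonferroni per-comparison level; the familywise level $\alpha = 0.05$ is an assumption chosen for illustration, and the unadjusted error rate is computed as if the tests were independent, which pairwise comparisons are not exactly.

```latex
\begin{align*}
m &= \binom{I}{2} = \frac{I(I-1)}{2}
  && I = 3 \Rightarrow m = 3, \qquad I = 5 \Rightarrow m = 10, \\
\text{unadjusted familywise error} &\approx 1 - (1 - \alpha)^m
  && 1 - 0.95^{3} \approx 0.143, \qquad 1 - 0.95^{10} \approx 0.401, \\
\text{Bonferroni level per test} &= \frac{\alpha}{m}
  && \frac{0.05}{3} \approx 0.0167, \qquad \frac{0.05}{10} = 0.005 .
\end{align*}
```

With five groups, as in the cortisol example, there are ten pairwise comparisons, so each Bonferroni test would be run at the 0.005 level under this assumed $\alpha$.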
Pairwise comparisons
The bon, tukey, and scheffe options on the means statement request the three pairwise tests. The cldiff option presents results of the tests as confidence intervals for all pairwise differences between means. It is the default for unequal cell sizes.
proc anova data=cort2;
class group;
model logcort=group;
means group/tukey bon scheffe cldiff;
run;
Summary of test results:
- Tukey and Bonferroni give the same results:
  - Significantly different pairs: (1,2), (1,4), (2,3), (2,4), (2,5), (3,4)
- Scheffé's test:
  - Significantly different pairs: (2,1), (2,3), (2,4), (2,5)
Non-parametric Method
A Non-parametric Method - The Kruskal-Wallis Test
The Kruskal-Wallis test is a generalization of the Mann-Whitney test. It is a non-parametric alternative to the ANOVA F test.
proc npar1way data=cort;
class group;
var cortisol;
run;
Summary of test results:
- Test statistic: $\chi^2 = 31.7657$
- p-value < 0.0001
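For reference, the statistic reported above is built from ranks instead of the raw values. A sketch of its standard form without ties, where $n_i$ is the size of group $i$, $N = \sum_i n_i$, and $\bar{R}_i$ is the mean rank of group $i$ when all $N$ observations are ranked together (symbols introduced here, not in the slides):

```latex
\begin{align*}
H &= \frac{12}{N(N+1)} \sum_{i=1}^{I} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2
  \;\overset{\text{approx.}}{\sim}\; \chi^2_{I-1}
  \quad \text{under } H_0 .
\end{align*}
```

With $I = 5$ groups the reference distribution has 4 degrees of freedom, whose upper 5% critical value is about 9.49, so a statistic of 31.7657 is far into the rejection region. When ties are present, a correction factor is applied to $H$.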
Non-Parametric Pairwise Comparison
If you specify the dscf option, proc npar1way computes the Dwass, Steel, Critchlow-Fligner (DSCF) multiple comparison analysis, which is based on pairwise two-sample Wilcoxon comparisons.
proc npar1way data=cort dscf;
class group;
var cortisol;
run;
Significantly different pairs:
- (1,2), (1,4), (2,3), (2,4), (3,4)
The results are similar to those of Tukey-Kramer and Bonferroni.