august-simon
Topic 25: Inference for Two-Way ANOVA
Outline
• Two-way ANOVA
–Data, models, parameter estimates
• ANOVA table, EMS
• Analytical strategies
• Regression approach
Data
• Response written Yijk where
– i denotes the level of the factor A
– j denotes the level of the factor B
–k denotes the kth observation in cell (i,j)
• i = 1, . . . , a levels of factor A
• j = 1, . . . , b levels of factor B
• k = 1, . . . , n observations in cell (i,j)
Cell means model
• Yijk = μij + εijk
–where μij is the theoretical mean or expected value of all observations in cell (i,j)
– the εijk are iid N(0, σ²)
–This means Yijk ~ N(μij, σ²) and independent
Factor effects model
• μij = μ + αi + βj + (αβ)ij
• Consider μ to be the overall mean
• αi is the main effect of A
• βj is the main effect of B
• (αβ)ij is the interaction between A and B
Constraints for this interpretation
• α. = Σiαi = 0 (df = a-1)
• β. = Σjβj = 0 (df = b-1)
• (αβ).j = Σi (αβ)ij = 0 for all j
• (αβ)i. = Σj (αβ)ij = 0 for all i
df = (a-1)(b-1)
SAS GLM Constraints
• αa = 0 (1 constraint)
• βb = 0 (1 constraint)
• (αβ)aj = 0 for all j (b constraints)
• (αβ)ib = 0 for all i (a constraints)
• The total is 1+1+a+b-1 = a+b+1 (the constraint (αβ)ab = 0 is counted twice in the last two bullets above)
Parameters and constraints
• The cell means model has ab parameters for the means
• The factor effects model has (1+a+b+ab) parameters
–An intercept (1)
–Main effect of A (a)
–Main effect of B (b)
–Interaction of A and B (ab)
Factor effects model
• There are 1+a+b+ab parameters
• There are 1+a+b constraints
• There are ab unconstrained parameters (or sets of parameters), the same number of parameters for the means in the cell means model
• While certain parameters depend on choice of constraints, others do not
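As a quick check of this count, the subtraction can be written out. It is evaluated here for a = 3 and b = 2, the values in the KNNL bread example; this is only a worked restatement of the bullets above.

```latex
\[
\underbrace{(1 + a + b + ab)}_{\text{parameters}}
 - \underbrace{(1 + a + b)}_{\text{constraints}}
 = ab,
\qquad
a = 3,\ b = 2:\quad 12 - 6 = 6 .
\]
```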
KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3 levels: bottom, middle, top
• B is the width of the shelf display, b=2: regular, wide
• n=2 stores for each of the 3x2 treatment combinations
Proc GLM with solution
proc glm data=a1;
  class height width;
  model sales=height width height*width / solution;
  means height*width;
run;
Solution output
Parameter            Estimate
Intercept              44.0 B
height 1               -1.0 B
height 2               25.0 B
height 3                0.0 B
width 1                -4.0 B
width 2                 0.0 B
height*width 1 1        6.0 B
height*width 1 2        0.0 B
height*width 2 1        0.0 B
height*width 2 2        0.0 B
height*width 3 1        0.0 B
height*width 3 2        0.0 B
Means
height  width  Mean
1       1      45 = 44 - 1 - 4 + 6
1       2      43 = 44 - 1 + 0 + 0
2       1      65 = 44 + 25 - 4 + 0
2       2      69 = 44 + 25 + 0 + 0
3       1      40 = 44 + 0 - 4 + 0
3       2      44 = 44 + 0 + 0 + 0
Based on estimates from previous two pages
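The arithmetic in the means table can be cross-checked with a small sketch. It is written in Python rather than SAS, purely as a check. The estimates are copied from the solution output, and the names `intercept`, `height_eff`, `width_eff`, `inter_eff`, and `cell_mean` are illustrative assumptions, not SAS output names.

```python
# Estimates copied from the /solution output (SAS GLM sets the last level of
# each factor, and every interaction term involving a last level, to 0).
intercept = 44.0
height_eff = {1: -1.0, 2: 25.0, 3: 0.0}
width_eff = {1: -4.0, 2: 0.0}
inter_eff = {(1, 1): 6.0, (1, 2): 0.0, (2, 1): 0.0,
             (2, 2): 0.0, (3, 1): 0.0, (3, 2): 0.0}


def cell_mean(i, j):
    """Fitted mean for height level i and width level j."""
    return intercept + height_eff[i] + width_eff[j] + inter_eff[(i, j)]


cell_means = {(i, j): cell_mean(i, j) for i in (1, 2, 3) for j in (1, 2)}
for (i, j), m in sorted(cell_means.items()):
    print(f"height={i} width={j} mean={m:.1f}")
```

Each fitted value is intercept + height + width + height*width, which is the cell means model rewritten with the SAS GLM constraints.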
Check normality
Alternative way to form QQplot
proc glm data=a1;
  class height width;
  model sales=height width height*width;
  output out=a2 r=resid;
proc rank data=a2 out=a3 normal=blom;
  var resid;
  ranks zresid;
Normal Quantile plot
proc sort data=a3;
  by zresid;
symbol1 v=circle i=sm70;
proc gplot data=a3;
  plot resid*zresid / frame;
run;
The plot
Note, dfE is only 6
ANOVA Table
Source  df           SS    MS    F
A       a-1          SSA   MSA   MSA/MSE
B       b-1          SSB   MSB   MSB/MSE
AB      (a-1)(b-1)   SSAB  MSAB  MSAB/MSE
Error   ab(n-1)      SSE   MSE
Total   abn-1        SSTO
Expected Mean Squares
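• $E(MSE) = \sigma^2$
• $E(MSA) = \sigma^2 + nb\,\dfrac{\sum_i \alpha_i^2}{a-1}$
• $E(MSB) = \sigma^2 + na\,\dfrac{\sum_j \beta_j^2}{b-1}$
• $E(MSAB) = \sigma^2 + n\,\dfrac{\sum_i \sum_j (\alpha\beta)_{ij}^2}{(a-1)(b-1)}$
• Here, αi, βj, and (αβ)ij are defined with the usual factor effects constraints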
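These expectations are what justify the F ratios in the ANOVA table. The sketch below covers factor A, assuming the fixed-effects model with the sum-to-zero constraints; the same argument applies to B and AB with their own mean squares and degrees of freedom.

```latex
\[
H_0:\ \alpha_1 = \cdots = \alpha_a = 0
\;\Longrightarrow\;
E(MSA) = \sigma^2 = E(MSE),
\qquad
F^* = \frac{MSA}{MSE} \sim F_{\,a-1,\ ab(n-1)} \ \text{under } H_0 .
\]
```

When some αi differ from zero, E(MSA) exceeds σ², so large values of F* are evidence against H0.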
An analytical strategy
• Run the model with main effects and the two-way interaction
• Plot the data, the means, and look at the normal quantile plot and residual plots
• If the assumptions seem reasonable, check the significance of the test for the interaction
AB interaction not sig
• If the AB interaction is not statistically significant
–Possibly rerun the analysis without the interaction (See pooling §19.10)
–Potential Type II errors when pooling
–For a main effect with more than two levels that is significant, use the means statement with the Tukey multiple comparison procedure
GLM Output
Source  DF    SS      MS      F      Pr > F
Model    5  1580  316.00  30.58  0.0003
Error    6    62   10.33
Total   11  1642
Note that there are 6 cells in this design.
Output ANOVA
Type I or Type III
Source  DF    SS   MS      F   Pr > F
height   2  1544  772  74.71   <.0001
width    1    12   12   1.16   0.3226
h*w      2    24   12   1.16   0.3747
Note: Type I and Type III analyses are the same because the cell size n is constant
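For a balanced design, these sums of squares can be recovered from the cell means alone. The sketch below is in Python rather than SAS, as a cross-check only. It assumes equal cell size n = 2, takes the cell means from the means slide and SSE = 62 from the GLM output, and all variable names are illustrative.

```python
# Cell means from the means table; n = 2 stores per cell (balanced design).
n = 2
means = {(1, 1): 45, (1, 2): 43, (2, 1): 65, (2, 2): 69, (3, 1): 40, (3, 2): 44}
a_levels, b_levels = (1, 2, 3), (1, 2)
a, b = len(a_levels), len(b_levels)

# Marginal and grand means (simple averages are valid because n is constant).
grand = sum(means.values()) / (a * b)
row = {i: sum(means[i, j] for j in b_levels) / b for i in a_levels}
col = {j: sum(means[i, j] for i in a_levels) / a for j in b_levels}

ssa = n * b * sum((row[i] - grand) ** 2 for i in a_levels)
ssb = n * a * sum((col[j] - grand) ** 2 for j in b_levels)
ssab = n * sum((means[i, j] - row[i] - col[j] + grand) ** 2
               for i in a_levels for j in b_levels)

# SSE cannot be recovered from the means; 62 is copied from the GLM output.
sse, dfe = 62, a * b * (n - 1)
mse = sse / dfe
f_a = (ssa / (a - 1)) / mse
f_b = (ssb / (b - 1)) / mse
f_ab = (ssab / ((a - 1) * (b - 1))) / mse

print(ssa, ssb, ssab)
print(round(f_a, 2), round(f_b, 2), round(f_ab, 2))
```

With unequal cell sizes these formulas no longer apply, and Type I and Type III sums of squares would generally differ.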
Rerun without interaction
proc glm data=a1;
  class height width;
  model sales=height width;
  means height / tukey lines;
run;
ANOVA output
Source  DF   MS      F   Pr > F
height   2  772  71.81   <.0001
width    1   12   1.12   0.3216
MS(height) and MS(width) have not changed. The MSE, F* values, and P-values have changed because of pooling.
Comparison of MSEs
Model                      Source  DF  SS  MS
Model with interaction     Error    6  62  10.33
Model without interaction  Error    8  86  10.75
Little change in MSE here. Often we pool only when the error df is small.
Pooling SS
• Data = Model + Residual
• When we remove a term from the 'model', we put this variation and the associated df into 'residual'
• This is called pooling
• A benefit is that we have more df for error and a simpler model
• Potential Type II errors
• Beneficial only in small experiments
Pooling SSE and SSAB
• For model with interaction
• SSAB=24, dfAB=2
• SSE=62, dfE=6
•MSE=10.33
• For the model with main effects only
• SSE=62+24=86, dfE=6+2=8
•MSE=10.75
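The pooling arithmetic, and its effect on the main-effect F statistics, can be sketched as follows. This is Python rather than SAS, for checking only; the inputs are copied from the output slides and the variable names are illustrative.

```python
# Values copied from the model with interaction.
ss_ab, df_ab = 24, 2
sse, dfe = 62, 6
ms_height, ms_width = 772, 12

# Pooling moves the interaction SS and df into the error term.
pooled_sse = sse + ss_ab
pooled_dfe = dfe + df_ab
pooled_mse = pooled_sse / pooled_dfe

# The main-effect mean squares are unchanged; only the denominator changes.
f_height = ms_height / pooled_mse
f_width = ms_width / pooled_mse

print(pooled_sse, pooled_dfe, pooled_mse)
print(round(f_height, 2), round(f_width, 2))
```

The resulting F values agree with the rerun without the interaction.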
Tukey Output
Grouping    Mean  N  height
A         67.000  4  2
B         44.000  4  1
B
B         42.000  4  3
Plot of the means
Regression Approach
• Similar to what we did for one-way
• Use a-1 variables for A
• Use b-1 variables for B
• Multiply each of the a-1 variables for A times each of the b-1 for B to get (a-1)(b-1) for AB
• You can use the test statement in Proc reg to perform F tests
Create Variables
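data a4;
  set a1;
  * a-1 = 2 effect-coded variables for height (level 3 coded -1);
  X1 = (height eq 1) - (height eq 3);
  X2 = (height eq 2) - (height eq 3);
  * b-1 = 1 effect-coded variable for width (level 2 coded -1);
  X3 = (width eq 1) - (width eq 2);
  * (a-1)(b-1) = 2 interaction variables;
  X13 = X1*X3;
  X23 = X2*X3;
run;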
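To see what this coding produces for each treatment combination, the data step expressions can be mirrored in a short Python sketch (not SAS). The function name `effect_code` and the dictionary `design` are illustrative assumptions.

```python
def effect_code(height, width):
    """Mirror the data step: a comparison such as (height eq 1) is 1 if true, else 0."""
    x1 = int(height == 1) - int(height == 3)
    x2 = int(height == 2) - int(height == 3)
    x3 = int(width == 1) - int(width == 2)
    return (x1, x2, x3, x1 * x3, x2 * x3)


# One row of (X1, X2, X3, X13, X23) per cell of the 3x2 design.
design = {(h, w): effect_code(h, w) for h in (1, 2, 3) for w in (1, 2)}
for cell, row in sorted(design.items()):
    print(cell, row)
```

In a balanced design each column sums to zero over the cells, which is the regression counterpart of the sum-to-zero factor effects constraints.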
Run Proc Reg
proc reg data=a4;
model sales= X1 X2 X3 X13 X23 / ss1;
height: test X1, X2;
width: test X3;
interaction: test X13, X23;
run;
SAS Output
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             5      1580.00000    316.00000    30.58  0.0003
Error             6        62.00000     10.33333
Corrected Total  11      1642.00000
Same basic ANOVA table
SAS Output
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept   1            51.00000         0.92796    54.96    <.0001      31212
X1          1            -7.00000         1.31233    -5.33    0.0018    8.00000
X2          1            16.00000         1.31233    12.19    <.0001  1536.0000
X3          1            -1.00000         0.92796    -1.08    0.3226   12.00000
X13         1             2.00000         1.31233     1.52    0.1783   18.00000
X23         1            -1.00000         1.31233    -0.76    0.4749    6.00000
SS Results
• SS(Height) = SS(X1)+SS(X2|X1)
1544 = 8.0 + 1536
• SS(Width) = SS(X3|X1,X2)
12 = 12
• SS(Height*Width) = SS(X13|X1,X2,X3) + SS(X23|X1, X2,X3,X13)
24 = 18 + 6
Test Results
Test height Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     2     772.0000    74.71  <.0001
Denominator   6     10.33333
Test interaction Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     2       12.000     1.16  0.3747
Denominator   6       10.333
Test width Results for Dependent Variable sales
Source       DF  Mean Square  F Value  Pr > F
Numerator     1      12.0000     1.16  0.3226
Denominator   6      10.3333
Interpreting Estimates
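\[
\begin{aligned}
\hat\mu_{1\cdot} &= 51 + (-7) = 44 \\
\hat\mu_{2\cdot} &= 51 + 16 = 67 \\
\hat\mu_{3\cdot} &= 51 - (-7) - 16 = 42 \\
\hat\mu_{\cdot 1} &= 51 + (-1) = 50 \qquad \hat\mu_{\cdot 2} = 51 - (-1) = 52 \\
\hat\mu_{11} &= 51 + (-7) + (-1) + 2 = 45 \\
\hat\mu_{22} &= 51 + 16 - (-1) - (-1) = 69
\end{aligned}
\]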
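The same interpretation can be carried through for every cell. The Python sketch below (a cross-check, not part of the SAS program) takes the Proc Reg estimates as sum-to-zero factor effects, fills in the effects that have no regression variable from the constraints, and rebuilds the marginal and cell means. All names are illustrative.

```python
# Parameter estimates from Proc Reg: intercept, X1, X2, X3, X13, X23.
mu = 51.0
alpha = {1: -7.0, 2: 16.0}
beta = {1: -1.0}
ab = {(1, 1): 2.0, (2, 1): -1.0}

# Sum-to-zero constraints give the effects that have no regression variable.
alpha[3] = -(alpha[1] + alpha[2])
beta[2] = -beta[1]
ab[(3, 1)] = -(ab[(1, 1)] + ab[(2, 1)])
for i in (1, 2, 3):
    ab[(i, 2)] = -ab[(i, 1)]

height_mean = {i: mu + alpha[i] for i in (1, 2, 3)}
width_mean = {j: mu + beta[j] for j in (1, 2)}
cell = {(i, j): mu + alpha[i] + beta[j] + ab[(i, j)]
        for i in (1, 2, 3) for j in (1, 2)}

print(height_mean)
print(width_mean)
print(cell)
```

The cell means agree with those obtained from the Proc GLM solution, even though the individual parameter estimates differ, because the two analyses use different constraints.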
Last slide
• Finish reading KNNL Chapter 19
• Topic25.sas contains the SAS commands for these slides
• We will now focus more on the strategies needed to handle a two- or more factor ANOVA