brian-davis
1
Experimental Statistics - week 6
Chapter 15: Randomized Complete Block Design (15.3)
Factorial Models (15.5)
2
Caution: Chapter 15 introduces some new notation - i.e. changes notation already defined
3
Recall: Sum-of-Squares Identity 1-Factor ANOVA
$$\sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2 = n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$$
Notation: TSS = SSB + SSW
In words:
Total SS = SS between samples + within sample SS
6
Recall: Sum-of-Squares Identity 1-Factor ANOVA
$$\sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2 = n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$$
Notation: TSS = SST + SSE (new notation for Chapter 15)
In words:
Total SS = SS for “treatments” + SS for “error”
7
Revised ANOVA Table for 1-Factor ANOVA (Ch. 15 terminology - p.857)
Source        SS    df      MS                   F
Treatments    SST   t - 1   MST = SST/(t - 1)    MST/MSE
Error         SSE   N - t   MSE = SSE/(N - t)
Total         TSS   N - 1
N = n_1 + n_2 + ... + n_t = total # of observations (N = nt if equal # obs.)
8
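Recall CRD Model for Gasoline Data
$$y_{ij} = \mu_i + \varepsilon_{ij}$$
or
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
- $y_{ij}$: observed octane
- $\mu_i$: mean for ith gasoline
- $\varepsilon_{ij}$: unexplained part -- car-to-car differences -- temperature -- etc.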
9
Gasoline Data
Question:
What if car differences are obscuring gasoline differences?
Similar to diet t-test example: Recall: person-to-person differences obscured effect of diet
10
Possible Alternative Design:
Test all 5 gasolines on the same car
- in essence we test the gasoline effect directly and remove effect of car-to-car variation
Question: How would you randomize an experiment with 4 cars?
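One possible answer to the question above, sketched in Python: within each car, test all 5 gasolines in an independently shuffled order. The function name `randomize_rcb`, the car labels, and the seed are illustrative assumptions, not part of the original slides.

```python
import random

def randomize_rcb(treatments, blocks, seed=None):
    """Return a dict mapping each block to a random ordering of all treatments."""
    rng = random.Random(seed)  # seed is only for reproducibility of the sketch
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)     # a separate, independent shuffle for every block
        plan[block] = order
    return plan

gasolines = ["A", "B", "C", "D", "E"]
cars = ["Car 1", "Car 2", "Car 3", "Car 4"]   # hypothetical labels
plan = randomize_rcb(gasolines, cars, seed=6)
for car, order in plan.items():
    print(car, "test order:", " ".join(order))
```

Every gasoline appears exactly once per car; only the test order within a car is random.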
11
Blocking an Experiment
- dividing the observations into groups (called blocks) where the observations in each block are collected under relatively similar conditions
- comparisons can often be made more precisely this way
12
Terminology is based on Agricultural Experiments
Consider the problem of testing fertilizers on a crop
- t fertilizers
- n observations on each
13
Completely Randomized Design
[Figure: field of 15 plots with fertilizers A, B, and C scattered at random, 5 plots each]
t = 3 fertilizers
n = 5 replications
- randomly select 15 plots
- randomly assign fertilizers to the 15 plots
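The completely randomized assignment described above can be sketched as follows. This is a minimal illustration; the function name `randomize_crd` and the seed are assumptions made for the example.

```python
import random

def randomize_crd(treatments, n_reps, seed=None):
    """Assign each treatment to n_reps plots, with no restriction on where they fall."""
    rng = random.Random(seed)
    labels = [trt for trt in treatments for _ in range(n_reps)]
    rng.shuffle(labels)        # one shuffle over all plots at once
    return labels

layout = randomize_crd(["A", "B", "C"], n_reps=5, seed=13)
for plot, fertilizer in enumerate(layout, start=1):
    print(f"plot {plot:2d}: fertilizer {fertilizer}")
```

Unlike a blocked design, nothing prevents one fertilizer from landing mostly on the better plots by chance.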
14
Randomized Complete Block Strategy
B | A | C
A | C | B
C | A | B
A | B | C
C | B | A
t = 3 fertilizers
- select 5 “blocks”
- randomly assign the 3 treatments to each block
Note: The 3 “plots” within each block are similar - similar soil type, sun, water, etc.
15
Randomized Complete Block Design
Randomly assign each treatment once to every block
Car Example
Car 1: randomly assign each gas to this car
Car 2: ....
etc.
Agricultural Example
Randomly assign each fertilizer to one of the 3 plots within each block
16
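Model For Randomized Complete Block (RCB) Design
$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad \sum_{i=1}^{t}\alpha_i = \sum_{j=1}^{b}\beta_j = 0$$
- $\alpha_i$: effect of ith treatment (gasoline)
- $\beta_j$: effect of jth block (car)
- $\varepsilon_{ij}$: unexplained error. As before: -- temperature -- etc.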
17
18
Back to CAR data:
Suppose that instead of 20 cars, there were only 4 cars, and we tested each gasoline on each car.
Old Data Format
Gas   Octane
A     91.7  91.2  90.9  90.6
B     91.7  91.9  90.9  90.9
C     92.4  91.2  91.6  91.0
D     91.8  92.2  92.0  91.4
E     93.1  92.9  92.4  92.4
“Restructured” CAR Data
      Car
Gas   1     2     3     4
A     91.7  91.2  90.9  90.6
B     91.7  91.9  90.9  90.9
C     92.4  91.2  91.6  91.0
D     91.8  92.2  92.0  91.4
E     93.1  92.9  92.4  92.4
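As a rough illustration of the RCB model y_ij = mu + alpha_i + beta_j + e_ij applied to the restructured table, the sketch below estimates each gasoline effect as (gas mean - overall mean) and each car effect as (car mean - overall mean). The variable names are assumptions made for the example.

```python
# Rows are gasolines A-E, columns are cars 1-4 (restructured CAR data).
octane = {
    "A": [91.7, 91.2, 90.9, 90.6],
    "B": [91.7, 91.9, 90.9, 90.9],
    "C": [92.4, 91.2, 91.6, 91.0],
    "D": [91.8, 92.2, 92.0, 91.4],
    "E": [93.1, 92.9, 92.4, 92.4],
}
t, b = len(octane), 4

grand_mean = sum(sum(row) for row in octane.values()) / (t * b)

# Estimated treatment effects: gas mean minus overall mean.
gas_effect = {gas: sum(row) / b - grand_mean for gas, row in octane.items()}

# Estimated block effects: car mean minus overall mean.
car_effect = [sum(octane[gas][j] for gas in octane) / t - grand_mean
              for j in range(b)]

print("overall mean:", round(grand_mean, 2))
print("gas effects :", {g: round(e, 2) for g, e in gas_effect.items()})
print("car effects :", [round(e, 2) for e in car_effect])
```

Both sets of estimated effects sum to zero, matching the constraints in the model.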
20
Recall: Sum-of-Squares Identity 1-Factor ANOVA
$$\sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2 = n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$$
Notation: TSS = SST + SSE (new notation for Chapter 15)
In words:
Total SS = SS for “treatments” + SS for “error”
21
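A New Sum-of-Squares Identity
$$\sum_{i=1}^{t}\sum_{j=1}^{b}(y_{ij}-\bar{y}_{..})^2 = b\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2 + t\sum_{j=1}^{b}(\bar{y}_{.j}-\bar{y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{b}(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}_{..})^2$$
Notation: TSS = SST + SSB + SSE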
In words:
Total SS = SS for treatments + SS for blocks + SS for error
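The identity can be checked numerically on the restructured CAR data. The sketch below computes each sum of squares directly from its definition; the variable names are assumptions made for the example.

```python
# Rows: gasolines A-E (treatments); columns: cars 1-4 (blocks).
y = [
    [91.7, 91.2, 90.9, 90.6],
    [91.7, 91.9, 90.9, 90.9],
    [92.4, 91.2, 91.6, 91.0],
    [91.8, 92.2, 92.0, 91.4],
    [93.1, 92.9, 92.4, 92.4],
]
t, b = len(y), len(y[0])

grand = sum(map(sum, y)) / (t * b)
trt_mean = [sum(row) / b for row in y]
blk_mean = [sum(y[i][j] for i in range(t)) / t for j in range(b)]

TSS = sum((y[i][j] - grand) ** 2 for i in range(t) for j in range(b))
SST = b * sum((m - grand) ** 2 for m in trt_mean)
SSB = t * sum((m - grand) ** 2 for m in blk_mean)
SSE = sum((y[i][j] - trt_mean[i] - blk_mean[j] + grand) ** 2
          for i in range(t) for j in range(b))

print(f"TSS = {TSS:.3f}")
print(f"SST + SSB + SSE = {SST:.3f} + {SSB:.3f} + {SSE:.3f} = {SST + SSB + SSE:.3f}")
```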
22
Hypotheses:
To test for treatment effects - i.e. gas differences - we test
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t$$
To test for block effects - i.e. car differences (not usually the research hypothesis) - we test
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_b$$
23
Randomized Complete Block Design ANOVA Table
Source        SS    df               MS                            F
Treatments    SST   t - 1            MST = SST/(t - 1)             MST/MSE
Blocks        SSB   b - 1            MSB = SSB/(b - 1)             MSB/MSE
Error         SSE   (b - 1)(t - 1)   MSE = SSE/((b - 1)(t - 1))
Total         TSS   bt - 1
See page 866
24
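Test for Treatment Effects
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t \qquad H_a: \text{at least one } \alpha_i \neq 0$$
We reject $H_0$ at significance level $\alpha$ if
$$F = \frac{MST}{MSE} > F_{\alpha}(t-1,\,(b-1)(t-1))$$
Note: MSE estimates $\sigma^2$, and MST estimates
$$\sigma^2 + \frac{b}{t-1}\sum_{i=1}^{t}\alpha_i^2$$
- if no treatment effects, we expect $F \approx 1$
- if treatment effects, we expect $F > 1$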
25
Test for Block Effects
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_b \qquad H_a: \text{at least one } \beta_j \neq 0$$
We reject $H_0$ at significance level $\alpha$ if
$$F = \frac{MSB}{MSE} > F_{\alpha}(b-1,\,(b-1)(t-1))$$
26
“Restructured” CAR Data - SAS Format
The first variable (A - E) indicates gas as it did with the Completely Randomized Design. The second variable (B1 - B4) indicates car.
A B1 91.7
A B2 91.2
A B3 90.9
A B4 90.6
B B1 91.7
B B2 91.9
B B3 90.9
B B4 90.9
C B1 92.4
C B2 91.2
C B3 91.6
C B4 91.0
D B1 91.8
D B2 92.2
D B3 92.0
D B4 91.4
E B1 93.1
E B2 92.9
E B3 92.4
E B4 92.4
27
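SAS file - Randomized Complete Block Design for CAR Data
* Excerpt: the DATA statement and the data lines are not shown on this slide;
* gas and block are read as character variables, octane as numeric;
INPUT gas$ block$ octane;
PROC GLM;
  CLASS gas block;
  MODEL octane=gas block;
  TITLE 'Gasoline Example - Randomized Complete Block Design';
  MEANS gas/LSD;
RUN;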
28
CRD ANOVA Table Output - car data
Source              SS      df   MS      F      p-value
Gas (treatments)    6.108    4   1.527   6.80   0.0025
Error               3.370   15   0.225
Totals              9.478   19
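The CRD table above can be reproduced from the raw data by treating the 20 readings as coming from 20 different cars, so that only gasoline is in the model. This is a sketch with assumed variable names; it stops at the F statistic and does not compute the p-value.

```python
# One-factor (CRD) analysis: each gasoline has 4 readings, car is ignored.
octane = {
    "A": [91.7, 91.2, 90.9, 90.6],
    "B": [91.7, 91.9, 90.9, 90.9],
    "C": [92.4, 91.2, 91.6, 91.0],
    "D": [91.8, 92.2, 92.0, 91.4],
    "E": [93.1, 92.9, 92.4, 92.4],
}
t = len(octane)
N = sum(len(obs) for obs in octane.values())
grand = sum(sum(obs) for obs in octane.values()) / N

SST = sum(len(obs) * (sum(obs) / len(obs) - grand) ** 2 for obs in octane.values())
SSE = sum((y - sum(obs) / len(obs)) ** 2 for obs in octane.values() for y in obs)
TSS = SST + SSE

MST = SST / (t - 1)      # treatment mean square, df = t - 1
MSE = SSE / (N - t)      # error mean square, df = N - t
F = MST / MSE

print(f"SST = {SST:.3f}, SSE = {SSE:.3f}, TSS = {TSS:.3f}")
print(f"MST = {MST:.3f}, MSE = {MSE:.3f}, F = {F:.2f}")
```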
29
RCB ANOVA Table Output - car data
Source              SS      df   MS      F       p-value
Gas (treatments)    6.108    4   1.527   15.58   0.0001
Cars (blocks)       2.194    3   0.731    7.46   0.0044
Error               1.176   12   0.098
Totals              9.478   19
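A short sketch of where the extra power of the RCB analysis comes from, using only the sums of squares printed in the two tables. The CRD error sum of squares (3.370) splits into a car component (2.194) and a smaller RCB error (1.176), so the same treatment mean square is divided by a smaller error mean square. Variable names are assumptions made for the example.

```python
# Sums of squares taken from the RCB ANOVA table for the car data.
SST, SSB, SSE = 6.108, 2.194, 1.176
t, b = 5, 4                      # 5 gasolines, 4 cars

MST = SST / (t - 1)
MSB = SSB / (b - 1)
MSE = SSE / ((b - 1) * (t - 1))

F_gas = MST / MSE                # test for treatment effects
F_cars = MSB / MSE               # test for block effects

# The CRD error sum of squares is the block SS plus the RCB error SS.
SSE_crd = SSB + SSE

print(f"MST = {MST:.3f}, MSB = {MSB:.3f}, MSE = {MSE:.3f}")
print(f"F (gas) = {F_gas:.2f}, F (cars) = {F_cars:.2f}")
print(f"CRD error SS = {SSE_crd:.3f}")
```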
30
SAS Output -- RCB CAR Data
Dependent Variable: OCTANE
                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               7    8.30200000    1.18600000     12.10   0.0001
Error              12    1.17600000    0.09800000
Corrected Total    19    9.47800000

R-Square       C.V.   Root MSE   OCTANE Mean
0.875923   0.341347  0.3130495     91.710000

Source   DF     Anova SS   Mean Square   F Value   Pr > F
GAS       4   6.10800000    1.52700000     15.58   0.0001
BLOCK     3   2.19400000    0.73133333      7.46   0.0044
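Multiple Comparisons in RCB Analysis
$\bar{y}_{1.}$ and $\bar{y}_{2.}$ are significantly different if
$$|\bar{y}_{1.} - \bar{y}_{2.}| \geq t_{\alpha/2}\sqrt{\frac{2\,MSE}{b}} \qquad \text{(LSD)}$$
$$|\bar{y}_{1.} - \bar{y}_{2.}| \geq t_{\alpha/(2m)}\sqrt{\frac{2\,MSE}{b}} \qquad \text{(Bonferroni)}$$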
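A sketch of both cutoffs for the RCB car data, assuming alpha = 0.05, error df = (b - 1)(t - 1) = 12, MSE = 0.098, and m = 10 pairwise comparisons among the 5 gasolines. The t critical values are obtained by numerically integrating the t density (Simpson's rule) and bisecting, so no statistics library is needed; the helper names are assumptions made for the example.

```python
import math
from itertools import combinations

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

def t_crit(p, df):
    """Value c with P(T > c) = p, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

gas_means = {"A": 91.10, "B": 91.35, "C": 91.55, "D": 91.85, "E": 92.70}
mse, b, error_df, alpha = 0.098, 4, 12, 0.05   # alpha = 0.05 is an assumption
m = len(list(combinations(gas_means, 2)))       # number of pairwise comparisons

se_diff = math.sqrt(2 * mse / b)
lsd = t_crit(alpha / 2, error_df) * se_diff
bon = t_crit(alpha / (2 * m), error_df) * se_diff

def different(cutoff):
    """Pairs of gasolines whose means differ by at least the cutoff."""
    return [(g, h) for g, h in combinations(gas_means, 2)
            if abs(gas_means[g] - gas_means[h]) >= cutoff]

lsd_pairs = different(lsd)
bon_pairs = different(bon)
print(f"LSD cutoff = {lsd:.3f}:", lsd_pairs)
print(f"Bonferroni cutoff = {bon:.3f}:", bon_pairs)
```

The Bonferroni cutoff is larger, so it declares fewer pairs different than LSD does.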
32
33
CAR Data -- LSD Results
CRD Analysis
t Grouping      Mean   N   gas
     A       92.7000   4   E
     B       91.8500   4   D
  C  B       91.5500   4   C
  C  B       91.3500   4   B
  C          91.1000   4   A
RCB Analysis
t Grouping      Mean   N   gas
     A       92.7000   4   E
     B       91.8500   4   D
  C  B       91.5500   4   C
  C          91.3500   4   B
  C          91.1000   4   A
34
CAR Data -- Bonferroni Results
CRD Analysis
Bon Grouping      Mean   N   gas
     A         92.7000   4   E
  B  A         91.8500   4   D
  B            91.5500   4   C
  B            91.3500   4   B
  B            91.1000   4   A
RCB Analysis
Bon Grouping      Mean   N   gas
     A         92.7000   4   E
     B         91.8500   4   D
     B         91.5500   4   C
     B         91.3500   4   B
     B         91.1000   4   A
35
STIMULUS EXAMPLE:
Personal computer presents stimulus, and person responds.
Study of how RESPONSE TIME is affected by a WARNING given prior to the stimulus:
2 factors of interest:
Warning Type --- auditory or visual
Time between warning and stimulus -- 5 sec, 10 sec, or 15 sec.
36
          Warning
Time      Auditory   Visual
5 sec     .204       .257
          .170       .279
          .181       .269
10 sec    .167       .283
          .182       .235
          .187       .260
15 sec    .202       .256
          .198       .281
          .236       .258
Note: “Sort of like RCB” -- what is the difference?
Question: How would you randomize? - 18 subjects - 1 subject
37
Stimulus Data
Observed data (response time): $y_{ijk}$
- i: Level of Factor A (warning type)
- j: Level of Factor B (time)
- k: Replication
38
2-Factor ANOVA Data
[Figure: data layout cross-classified by the levels of Factor A and Factor B]
39
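- $\bar{y}_{ij.}$: mean of the observations in cell (i, j)
- $\bar{y}_{i..}$: mean at level i of Factor A
- $\bar{y}_{.j.}$: mean at level j of Factor B
- $\bar{y}_{...}$: overall mean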
40
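A Possible Model for STIMULUS Data
$$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$$
i = type, j = time
Note: $\mu_{ij} = \mu + \alpha_i + \beta_j$, so according to this model
$$\mu_{1j} - \mu_{2j} = \alpha_1 - \alpha_2$$
is the difference between types 1 and 2 at time j, and it does not depend on j.
Note: The model assumes that the difference between types is the same for all times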
41
Hypothetical Cell Means
[Figure: cell means plotted against time (5, 10, 15), one line each for Auditory and Visual]
42
Similarly
$$\mu_{ij} - \mu_{ij'} = \beta_j - \beta_{j'}$$
i.e. the model says the difference between times $j$ and $j'$ is the same for all types
We may not want to make these assumptions!!
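One informal way to look at these assumptions is to compute the observed cell means of the stimulus data and compare the Visual minus Auditory difference at each time. Under the additive model the three differences would agree up to random error. This is only a descriptive sketch with assumed variable names, not a formal test.

```python
from statistics import mean

# Response times keyed by (warning type, seconds between warning and stimulus).
response = {
    ("A", 5):  [.204, .170, .181],
    ("A", 10): [.167, .182, .187],
    ("A", 15): [.202, .198, .236],
    ("V", 5):  [.257, .279, .269],
    ("V", 10): [.283, .235, .260],
    ("V", 15): [.256, .281, .258],
}

cell_mean = {cell: mean(obs) for cell, obs in response.items()}

# Difference between warning types at each time.
type_diff = {time: cell_mean[("V", time)] - cell_mean[("A", time)]
             for time in (5, 10, 15)}

for time, diff in type_diff.items():
    print(f"time {time:2d} sec: Visual - Auditory = {diff:.4f}")
```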
43
Hypothetical Cell Means
[Figure: two panels of cell means plotted against time (5, 10, 15), each with one line for Auditory and one for Visual]
44
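Model for 2-factor Design
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where
$$\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = \sum_{i=1}^{a}(\alpha\beta)_{ij} = \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$$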
45
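Sum-of-Squares Breakdown (2-factor ANOVA)
$$\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk}-\bar{y}_{...})^2
={}& bn\sum_{i=1}^{a}(\bar{y}_{i..}-\bar{y}_{...})^2 && \text{(SSA)}\\
&+ an\sum_{j=1}^{b}(\bar{y}_{.j.}-\bar{y}_{...})^2 && \text{(SSB)}\\
&+ n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij.}-\bar{y}_{i..}-\bar{y}_{.j.}+\bar{y}_{...})^2 && \text{(SSAB)}\\
&+ \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk}-\bar{y}_{ij.})^2 && \text{(SSE)}
\end{aligned}$$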
46
2-Factor ANOVA Table (2-Factor Completely Randomized Design)
Source          SS     df               MS                             F
Main Effects
  A             SSA    a - 1            MSA = SSA/(a - 1)              MSA/MSE
  B             SSB    b - 1            MSB = SSB/(b - 1)              MSB/MSE
Interaction
  AB            SSAB   (a - 1)(b - 1)   MSAB = SSAB/((a - 1)(b - 1))   MSAB/MSE
Error           SSE    ab(n - 1)        MSE = SSE/(ab(n - 1))
Total           TSS    abn - 1
See page 900
47
*****************************************************
* Two-Way ANOVA using PROC GLM                      *
* showing Interaction Plots                         *
*****************************************************;
data stimulus;
input type$ time response;
datalines;
A 5 .204
A 5 .170
A 5 .181
A 10 .167
A 10 .182
A 10 .187
A 15 .202
A 15 .198
A 15 .236
V 5 .257
V 5 .279
V 5 .269
V 10 .283
V 10 .235
V 10 .260
V 15 .256
V 15 .281
V 15 .258
;
* Two-factor model with interaction;
PROC GLM;
  CLASSES type time;
  MODEL response=type time type*time;
  TITLE 'Stimulus Data';
run;
* Cell means are saved in dataset CELLS;
PROC SORT; BY type time;
PROC MEANS; BY type time;
  OUTPUT OUT=cells MEAN=response;
RUN;
* OUTPUT MEAN INTERACTION PLOTS;
PROC GPLOT;
  PLOT response*type=time;
  SYMBOL1 V=CIRCLE I=JOIN C=BLACK;
  SYMBOL2 V=DOT I=JOIN C=BLACK;
  SYMBOL3 V=BOX I=JOIN C=BLACK;
RUN;
PROC GPLOT;
  PLOT response*time=type;
  SYMBOL1 V=CIRCLE I=JOIN C=BLACK;
  SYMBOL2 V=DOT I=JOIN C=BLACK;
RUN;
PROC PRINT;
RUN;
48
49
GLM Output
Stimulus Data
The GLM Procedure
Dependent Variable: response
                             Sum of
Source             DF       Squares   Mean Square   F Value   Pr > F
Model               5    0.02554894    0.00510979     17.66   <.0001
Error              12    0.00347200    0.00028933
Corrected Total    17    0.02902094

R-Square   Coeff Var   Root MSE   response Mean
0.880362    7.458622   0.017010        0.228056

Source       DF    Type I SS   Mean Square   F Value   Pr > F
type          1   0.02354450    0.02354450     81.38   <.0001
time          2   0.00115811    0.00057906      2.00   0.1778
type*time     2   0.00084633    0.00042317      1.46   0.2701
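The sums of squares in the GLM output can be cross-checked against the 2-factor breakdown formulas. The sketch below computes SSA (type), SSB (time), SSAB, and SSE directly from the stimulus data; because the design is balanced, these should agree with the Type I SS column up to rounding. Variable names are assumptions made for the example, and no p-values are computed.

```python
from statistics import mean

response = {
    ("A", 5):  [.204, .170, .181],
    ("A", 10): [.167, .182, .187],
    ("A", 15): [.202, .198, .236],
    ("V", 5):  [.257, .279, .269],
    ("V", 10): [.283, .235, .260],
    ("V", 15): [.256, .281, .258],
}
types, times = ("A", "V"), (5, 10, 15)
a, b, n = len(types), len(times), 3

grand = mean(y for obs in response.values() for y in obs)
cell = {key: mean(obs) for key, obs in response.items()}
type_mean = {i: mean(cell[(i, j)] for j in times) for i in types}
time_mean = {j: mean(cell[(i, j)] for i in types) for j in times}

SSA = b * n * sum((type_mean[i] - grand) ** 2 for i in types)
SSB = a * n * sum((time_mean[j] - grand) ** 2 for j in times)
SSAB = n * sum((cell[(i, j)] - type_mean[i] - time_mean[j] + grand) ** 2
               for i in types for j in times)
SSE = sum((y - cell[key]) ** 2 for key, obs in response.items() for y in obs)
TSS = sum((y - grand) ** 2 for obs in response.values() for y in obs)

MSE = SSE / (a * b * (n - 1))
F_type = (SSA / (a - 1)) / MSE
F_time = (SSB / (b - 1)) / MSE
F_inter = (SSAB / ((a - 1) * (b - 1))) / MSE

print(f"SSA = {SSA:.8f}  SSB = {SSB:.8f}  SSAB = {SSAB:.8f}  SSE = {SSE:.8f}")
print(f"TSS = {TSS:.8f}")
print(f"F: type = {F_type:.3f}, time = {F_time:.3f}, type*time = {F_inter:.3f}")
```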
50
PROC SORT; BY type time;
PROC MEANS; BY type time;
  OUTPUT OUT=cells MEAN=response;
RUN;
51
PROC MEANS Output
The MEANS Procedure
Analysis Variable : response
type   time   N        Mean     Std Dev     Minimum     Maximum
A         5   3   0.1850000   0.0173494   0.1700000   0.2040000
A        10   3   0.1786667   0.0104083   0.1670000   0.1870000
A        15   3   0.2120000   0.0208806   0.1980000   0.2360000
V         5   3   0.2683333   0.0110151   0.2570000   0.2790000
V        10   3   0.2593333   0.0240069   0.2350000   0.2830000
V        15   3   0.2650000   0.0138924   0.2560000   0.2810000
52
PROC SORT; BY type time;
PROC MEANS; BY type time;
  OUTPUT OUT=cells MEAN=response;
RUN;
PROC PRINT;
RUN;
PROC PRINT Output (dataset CELLS)
Stimulus Data
Obs   type   time   _TYPE_   _FREQ_   response
  1   A         5        0        3    0.18500
  2   A        10        0        3    0.17867
  3   A        15        0        3    0.21200
  4   V         5        0        3    0.26833
  5   V        10        0        3    0.25933
  6   V        15        0        3    0.26500
53
Stimulus Data
PROC PRINT data=stimulus;
RUN;
PROC PRINT Output (dataset STIMULUS)
Obs   type   time   response
  1   A         5      0.204
  2   A         5      0.170
  3   A         5      0.181
  4   A        10      0.167
  5   A        10      0.182
  6   A        10      0.187
  7   A        15      0.202
  8   A        15      0.198
  9   A        15      0.236
 10   V         5      0.257
 11   V         5      0.279
 12   V         5      0.269
 13   V        10      0.283
 14   V        10      0.235
 15   V        10      0.260
 16   V        15      0.256
 17   V        15      0.281
 18   V        15      0.258
54
Stimulus Data
* OUTPUT MEAN INTERACTION PLOTS;
PROC GPLOT DATA=cells;
  PLOT response*type=time;
  SYMBOL1 V=CIRCLE I=JOIN C=BLACK;
  SYMBOL2 V=DOT I=JOIN C=BLACK;
  SYMBOL3 V=BOX I=JOIN C=BLACK;
RUN;
55
Stimulus Data
* OUTPUT MEAN INTERACTION PLOTS;
PROC GPLOT DATA=cells;
  PLOT response*time=type;
  SYMBOL1 V=CIRCLE I=JOIN C=BLACK;
  SYMBOL2 V=DOT I=JOIN C=BLACK;
RUN;
56
Stimulus Data
57
Lab Assignment
Run a 2-factor ANOVA on the Pilot Plant Data
a. Print ANOVA Table
b. Graph Interaction Plots
c. Plot Histogram and Probability Plot of Residuals
i. using standard GCHART and UNIVARIATE procedures
ii. using Analyst
SAS Online Resources
http://support.sas.com/techsup/sample/sample_graph.html
http://support.sas.com/documentation/onlinedoc/sas9doc.html
58
Pilot Plant Data
Variable = Chemical Yield
Factors: A – Temperature (160, 180)
         B – Catalyst (C1, C2)
160 C1 59
160 C1 61
160 C1 50
160 C1 58
180 C1 74
180 C1 70
180 C1 69
180 C1 67
160 C2 50
160 C2 54
160 C2 46
160 C2 44
180 C2 81
180 C2 85
180 C2 79
180 C2 81