Chapter 13
Analysis of Multi-factor Experiments
Dec 6th 2007
Our Group Members
Part I: Background Introduction
1. Ruirui Pan: Why do we work on this topic?
2. Xuanti Ying: Introduction to related technology
Part II: Theoretical Derivation
3. Parameter estimation: Ji-Young Yun
4. Theory of two-factor experiments: Mingyi Hong
5. Theory of 2^k experiments: Zheng Zhao
Part III: Data Analysis
6. Data analysis of 2^2 experiment: Wei Hu
7. Data analysis of 2^3 experiment: Hao Zhang
8. Data analysis of 2^k experiment: Ti Zhou
Part IV: Model Analysis and Conclusion
9. Model diagnostics and SAS programming: Jun Huang
10. Regression approach and conclusion: Wenbin Zhang
Why do we work on this topic?
by Ruirui Pan
What is multifactor experiment?
• In statistics, a multifactor experiment (also called factorial experiment) is an experiment whose design consists of two or more factors, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such factors.
Basic Concepts
• The primary purpose of an experiment is to evaluate how a set of predictor variables (called factors in experimental design jargon) affect a response variable.
• The different possible values of a factor are called its levels.
• Each treatment is a particular combination of the levels of different treatment factors.
Example
Factors
• A factor is a linked set of experimental conditions we may wish to compare, e.g.
Levels of temperature
Different methods of solving a problem
Pressure of a bicycle tire
• Two types:
Treatment factors
Nuisance factors
Example
• Suppose an engineer wishes to study the total power used by each of two different motors, A and B, running at each of two different speeds, 2000 or 3000 RPM.
So the factorial experiment would consist of 8 experimental units: motor A at 2000 RPM, motor B at 2000 RPM, motor A at 3000 RPM, and motor B at 3000 RPM. Each combination of a single level selected from every factor is present twice.
• Single factor experiment ---one-way ANOVA
• Two-factor Experiments with Fixed Crossed Factors--- when Factor A with a levels and Factor B with b levels are crossed, there are a*b treatment combinations
• 2^3 Factorial Experiments---3 factors with 2 levels each, so there are 8 treatment combinations
• 2^k Factorial Experiments--- k factors with 2 levels each, so there are 2^k treatment combinations
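The counts in the list above can be checked by enumerating the treatment combinations directly. The following is a minimal sketch; the function name `treatment_combinations` and the dictionary layout are illustrative assumptions, not part of the original slides.

```python
from itertools import product

def treatment_combinations(levels_per_factor):
    """Return every treatment combination of a full factorial design.

    levels_per_factor maps a factor name to the list of its levels
    (hypothetical helper, for illustration only).
    """
    names = list(levels_per_factor)
    return [dict(zip(names, combo))
            for combo in product(*(levels_per_factor[name] for name in names))]

# The motor example: 2 motors x 2 speeds gives 4 treatment combinations.
motor_design = treatment_combinations({"motor": ["A", "B"], "rpm": [2000, 3000]})

# A 2^3 design: 3 factors with 2 levels each gives 8 treatment combinations.
design_2_3 = treatment_combinations({name: ["-", "+"] for name in "ABC"})

print(len(motor_design), len(design_2_3))
```

Running each combination twice, as in the motor example, doubles the number of experimental units without changing the number of treatment combinations.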
The importance of multifactor experiments
Introduction to related technology
By Xuanti Ying
related technology
• ANOVA
Introduction to ANOVA
• Analysis of variance (ANOVA) is used to test hypotheses about differences between two or more means. The t-test based on the standard error of the difference between two means can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using t-tests. However, conducting multiple t-tests can lead to severe inflation of the Type I error rate. Analysis of variance can be used to test differences among several means for significance without increasing the Type I error rate.
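To get a feel for how quickly the Type I error rate inflates, the sketch below computes the chance of at least one false rejection when every pair of means is tested at level alpha. It assumes the pairwise tests are independent, which is not exactly true for t-tests sharing the same groups, so the numbers are only a rough illustration. The function name is hypothetical.

```python
from math import comb

def familywise_error_rate(num_means, alpha=0.05):
    """Approximate chance of at least one false rejection across all
    pairwise tests, ASSUMING the tests are independent."""
    num_tests = comb(num_means, 2)          # number of pairwise comparisons
    return num_tests, 1 - (1 - alpha) ** num_tests

for num_means in (3, 5, 10):
    num_tests, rate = familywise_error_rate(num_means)
    print(f"{num_means} means -> {num_tests} t-tests, error rate about {rate:.3f}")
```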
Who Developed this Technology
• The initial techniques of the analysis of variance were developed by the statistician and geneticist R. A. Fisher in the 1920s and 1930s.
The Significance of ANOVA
One important reason for using ANOVA methods rather than multiple two-group studies analyzed via t-tests is that the former method is more efficient, and with fewer observations we can gain more information.
• Controlling for factors
• Detects interactive effects (The term interaction was first used by Fisher, 1926.)
Logic of ANOVA
• Partitioning of the sum of squares: The fundamental technique is a partitioning of the total sum of squares into components related to the effects used in the model.
• The F-test: The F-test is used for comparisons of the components of the total deviation. The F statistic is the mean square for each main effect or interaction effect divided by the within-group variance (the mean square error).
Several Types of ANOVA
• One way ANOVA is used to test for differences among two or more independent groups.
• Factorial ANOVA (two-way ANOVA when there are two factors) is used when we want to study the effects of two or more treatment variables. (our case)
• Mixed-design ANOVA
Tests Supplementing ANOVA
• All pairwise t-tests
• Fisher’s LSD (Least Significant Difference Method)
• Tukey’s HSD (Honestly Significant Difference Test, proposed by the statistician John Tukey)
• Newman-Keuls method
• Duncan’s Procedure (similar to the Newman-Keuls method)
Parameter Estimation by Ji-Young Yun
• EXAMPLE
Consider a grade experiment to evaluate the effects of sleeping hours and the percentage of class attendance on students' grades.
• Factor A: sleeping hours
• Levels
• enough (if sleeping hours ≥ 8)
• normal (if 6 ≤ sleeping hours < 8)
• lack (if sleeping hours < 6)
• Factor B: the percentage of attendance
• High (the percentage ≥ 50%)
• Low (the percentage < 50%)
A grade treatment experiment to evaluate the effects of A and B
Factor A level   Factor B: 1 (high)   Factor B: 2 (low)
1 (enough)       80, 85, 60           70, 72, 62
2 (normal)       90, 92, 94           80, 70, 90
3 (lack)         100, 80, 90          90, 80, 70
• The number of students (observations) in each treatment combination is 3
• a = 3
• b = 2
• n = 3
• N = (3)(2)(3) =18
A grade treatment experiment to evaluate the effects of A and B: cell means
Factor A level   Factor B: 1 (high)   Factor B: 2 (low)   Row mean
1 (enough)       75                   68                  71.5
2 (normal)       92                   80                  86
3 (lack)         90                   80                  85
Column mean      85.67                76                  80.83
Parameters & Estimates
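$$Y_{ijk} = \mu_{ij} + \epsilon_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk}$$
$y_{ijk}$: the kth observation on the (i, j)th treatment combination
$\mu_{ij}$: the mean of cell (i, j)
$\epsilon_{ijk}$: i.i.d. random error, normally distributed $N(0, \sigma^2)$
$\tau_i$: ith row main effect
$\beta_j$: jth column main effect
$(\tau\beta)_{ij}$: (i, j)th row-column interaction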
Parameters & Estimates
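$$\mu = \bar{\mu}_{..}, \quad \tau_i = \bar{\mu}_{i.} - \bar{\mu}_{..}, \quad \beta_j = \bar{\mu}_{.j} - \bar{\mu}_{..}$$
$$(\tau\beta)_{ij} = \mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..}$$
$\bar{y}_{ij.}$: sample mean of the (i, j)th cell; the least squares estimate of $\mu_{ij}$
$$\hat{\mu} = \bar{y}_{...}, \quad \hat{\tau}_i = \bar{y}_{i..} - \bar{y}_{...}, \quad \hat{\beta}_j = \bar{y}_{.j.} - \bar{y}_{...}$$
$$\widehat{(\tau\beta)}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}$$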
Analysis of Variance
ANOVA Table for Crossed Two-Way Layout
• SST = 2240.5
• SSA = 787.0
• SSB = 420.5
• SSAB = 19.0
• SSE = 1014.0
• SST = SSA + SSB + SSAB + SSE
• MSA = SSA/2 = 393.5 FA = MSA/MSE = 4.66
• MSB = SSB/1 = 420.5 FB = MSB/MSE = 4.98
• MSAB = SSAB/2 = 9.5 FAB = MSAB/MSE = 0.112
• MSE = SSE/12 = 84.5
• The main effects of sleeping hours and of the percentage of attendance are both significant at the .05 level (the critical values are about 3.89 for 2 and 12 d.f. and 4.75 for 1 and 12 d.f.), but the interaction between sleeping hours and the percentage of attendance is NOT significant at the .1 level.
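The sums of squares for the grade example can be recomputed from the raw scores. The sketch below is one straightforward way to do this for a balanced two-way layout; the function and variable names are illustrative assumptions. Because it works from unrounded means, its results can differ from figures obtained by hand with rounded row or column means.

```python
# data[i][j] holds the n = 3 scores for sleeping-hours level i
# (enough, normal, lack) and attendance level j (high, low).
data = [
    [[80, 85, 60], [70, 72, 62]],
    [[90, 92, 94], [80, 70, 90]],
    [[100, 80, 90], [90, 80, 70]],
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def two_way_anova(data):
    """Sums of squares, d.f., mean squares and F ratios for a balanced layout."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = mean(y for row in data for cell in row for y in cell)
    row_means = [mean(y for cell in row for y in cell) for row in data]
    col_means = [mean(y for row in data for y in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in data]
    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in row_means),
        "B": a * n * sum((m - grand) ** 2 for m in col_means),
        "AB": n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "E": sum((y - cell_means[i][j]) ** 2
                 for i in range(a) for j in range(b) for y in data[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    ms = {source: ss[source] / df[source] for source in ss}
    f = {source: ms[source] / ms["E"] for source in ("A", "B", "AB")}
    return ss, df, ms, f

ss, df, ms, f = two_way_anova(data)
for source in ("A", "B", "AB"):
    print(f"{source}: SS={ss[source]:.2f} df={df[source]} "
          f"MS={ms[source]:.2f} F={f[source]:.3f}")
print(f"Error: SS={ss['E']:.2f} df={df['E']} MS={ms['E']:.2f}")
```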
Theory Derivation for Two Factor Experiment
Mingyi Hong
Overview
• Sometimes a researcher might want to simultaneously examine the effects of two treatments.
• Examples
The effect of sex and race on wage
The effect of the level of pollution and the level of city services on housing prices
Data from a Balanced Two-way Layout
Factor A levels   Factor B levels
                  1    2    3    4    5
1                 …    …    …    …    …
2                 …    …    …    …    …
3                 …    …    …    …    …
The Model
The Sum of Squares
The Chi Square Distributions
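• Each of the preceding sums of squares, divided by the error variance $\sigma^2$, follows a chi-square distribution (for the effect sums of squares, under the corresponding null hypothesis). For example, $SSE/\sigma^2 \sim \chi^2_{ab(n-1)}$.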
The ANOVA Identity
Total DF = Row DF + Column DF + Interaction DF + Error DF
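For reference, the decomposition behind this identity can be written out explicitly. The block below uses the standard definitions for a balanced two-way layout with a rows, b columns, and n observations per cell (N = abn), in the dot notation of the parameter estimation slides; it is a textbook sketch rather than a transcription of the slide.

```latex
\begin{aligned}
SSA  &= bn \sum_{i=1}^{a} (\bar{y}_{i..} - \bar{y}_{...})^2, \qquad
SSB   = an \sum_{j=1}^{b} (\bar{y}_{.j.} - \bar{y}_{...})^2, \\
SSAB &= n \sum_{i=1}^{a} \sum_{j=1}^{b}
        (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2, \qquad
SSE   = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{ij.})^2, \\
SST  &= \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{...})^2
      = SSA + SSB + SSAB + SSE, \\
N - 1 &= (a-1) + (b-1) + (a-1)(b-1) + ab(n-1).
\end{aligned}
```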
The Tests
Multiple Comparisons Between Rows and/or Columns
• Pairwise comparisons between the row main effects and/or between the column main effects are generally of interest only when the interactions are nonsignificant.
• The Tukey method to determine 100(1-α)% simultaneous confidence intervals is as follows.
Theory derivation about 2^k experiment
Zheng Zhao
One Factor Experiment with Two levels
Level of A   Data                             Mean
A+           y11, y12, y13, y14, …, y1n1      ȳ1
A-           y21, y22, y23, y24, …, y2n2      ȳ2
2² Experiment—The Introduction of Factor B with Two levels
• Factor A = High (+) or Low (-)
• Factor B = High (+) or Low (-)
Four Treatment Combinations
• ab = (A High, B High)
• a = (A High, B Low)
• b = (A Low, B High)
• (1) = (A Low, B Low)
Yij ~ N (μi, σ²), i = 1, 2, …,2^k; j = 1,2,…,n
Factor A   Factor B   Treatment Combination   Data
Low (-)    Low (-)    (1)                     y(1),1, …, y(1),n
High (+)   Low (-)    a                       ya,1, …, ya,n
Low (-)    High (+)   b                       yb,1, …, yb,n
High (+)   High (+)   ab                      yab,1, …, yab,n
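Main Effect and Interaction Effect of 2² Experiment
$$\text{Main effect of } A = \frac{(\mu_{ab} - \mu_{b}) + (\mu_{a} - \mu_{(1)})}{2} = \frac{\mu_{ab} + \mu_{a}}{2} - \frac{\mu_{b} + \mu_{(1)}}{2}$$
$$\text{Main effect of } B = \frac{(\mu_{ab} - \mu_{a}) + (\mu_{b} - \mu_{(1)})}{2} = \frac{\mu_{ab} + \mu_{b}}{2} - \frac{\mu_{a} + \mu_{(1)}}{2}$$
$$\text{Interaction } AB = \frac{(\mu_{ab} - \mu_{b}) - (\mu_{a} - \mu_{(1)})}{2} = \frac{\mu_{ab} + \mu_{(1)}}{2} - \frac{\mu_{a} + \mu_{b}}{2}$$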
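Estimated Effects
• Est. Main Effect A $= \dfrac{(\bar{y}_{ab} - \bar{y}_{b}) + (\bar{y}_{a} - \bar{y}_{(1)})}{2}$
• Est. Main Effect B $= \dfrac{(\bar{y}_{ab} - \bar{y}_{a}) + (\bar{y}_{b} - \bar{y}_{(1)})}{2}$
• Est. Interaction AB $= \dfrac{(\bar{y}_{ab} - \bar{y}_{b}) - (\bar{y}_{a} - \bar{y}_{(1)})}{2}$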
2³ Experiment: One More Factor Considered
Factor A
Factor B
Factor C
2³ = 8 Treatment Combinations
• (1): Low, Low, Low
• a: High, Low, Low
• b: Low, High, Low
• ab: High, High, Low
• c: Low, Low, High
• ac: High, Low, High
• bc: Low, High, High
• abc: High, High, High
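Estimated effects for the 2³ experiment:
$$A = \frac{(\bar{y}_{abc} - \bar{y}_{bc}) + (\bar{y}_{ac} - \bar{y}_{c}) + (\bar{y}_{ab} - \bar{y}_{b}) + (\bar{y}_{a} - \bar{y}_{(1)})}{4}$$
$$B = \frac{(\bar{y}_{abc} - \bar{y}_{ac}) + (\bar{y}_{bc} - \bar{y}_{c}) + (\bar{y}_{ab} - \bar{y}_{a}) + (\bar{y}_{b} - \bar{y}_{(1)})}{4}$$
$$C = \frac{(\bar{y}_{abc} - \bar{y}_{ab}) + (\bar{y}_{bc} - \bar{y}_{b}) + (\bar{y}_{ac} - \bar{y}_{a}) + (\bar{y}_{c} - \bar{y}_{(1)})}{4}$$
$$AB = \frac{\{(\bar{y}_{abc} - \bar{y}_{bc}) - (\bar{y}_{ac} - \bar{y}_{c})\} + \{(\bar{y}_{ab} - \bar{y}_{b}) - (\bar{y}_{a} - \bar{y}_{(1)})\}}{4}$$
$$AC = \frac{\{(\bar{y}_{abc} - \bar{y}_{bc}) - (\bar{y}_{ab} - \bar{y}_{b})\} + \{(\bar{y}_{ac} - \bar{y}_{c}) - (\bar{y}_{a} - \bar{y}_{(1)})\}}{4}$$
$$BC = \frac{\{(\bar{y}_{abc} - \bar{y}_{ac}) - (\bar{y}_{ab} - \bar{y}_{a})\} + \{(\bar{y}_{bc} - \bar{y}_{c}) - (\bar{y}_{b} - \bar{y}_{(1)})\}}{4}$$
$$ABC = \frac{\{(\bar{y}_{abc} - \bar{y}_{bc}) - (\bar{y}_{ac} - \bar{y}_{c})\} - \{(\bar{y}_{ab} - \bar{y}_{b}) - (\bar{y}_{a} - \bar{y}_{(1)})\}}{4}$$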
2^2 experiment
Wei Hu
Calculate the estimated main effects A and B, and the interaction AB.
              B Low      B High     Row mean
A Low         y11=10     y12=15     ȳ1.=12.5
A High        y21=20     y22=35     ȳ2.=27.5
Column mean   ȳ.1=15     ȳ.2=25     ȳ..=20
The estimated main effects are:
A ={(y22-y12)+(y21-y11)}/2
={(35-15)+(20-10)}/2 = 15
B ={(y22-y21)+(y12-y11)}/2
={(35-20)+(15-10)}/2 = 10
The estimated interaction effect is:
AB ={(y22-y12)-(y21-y11)}/2
={(35-15)-(20-10)}/2 = 5
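$$SST = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{..})^2 = (10-20)^2 + (15-20)^2 + (20-20)^2 + (35-20)^2 = 350$$
$$SSA = \sum_{i=1}^{I}\sum_{j=1}^{J}(\bar{y}_{i.} - \bar{y}_{..})^2 = J\sum_{i=1}^{I}(\bar{y}_{i.} - \bar{y}_{..})^2 = 2\{(12.5-20)^2 + (27.5-20)^2\} = 225$$
$$SSB = \sum_{i=1}^{I}\sum_{j=1}^{J}(\bar{y}_{.j} - \bar{y}_{..})^2 = I\sum_{j=1}^{J}(\bar{y}_{.j} - \bar{y}_{..})^2 = 2\{(15-20)^2 + (25-20)^2\} = 100$$
$$SSAB = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2 = (10-12.5-15+20)^2 + (15-12.5-25+20)^2 + (20-27.5-15+20)^2 + (35-27.5-25+20)^2 = 25$$
$$SSE = SST - SSA - SSB - SSAB = 350 - 225 - 100 - 25 = 0$$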
ANOVA Table (Two-Way Layout with Fixed Factors)
Source   d.f.         SS                                                                                   MS                   F
A        I-1          $J\sum_{i=1}^{I}(\bar{y}_{i.} - \bar{y}_{..})^2$                                    SSA/(I-1)            MSA/MSAB
B        J-1          $I\sum_{j=1}^{J}(\bar{y}_{.j} - \bar{y}_{..})^2$                                    SSB/(J-1)            MSB/MSAB
AB       (I-1)(J-1)   $\sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$   SSAB/((I-1)(J-1))
___________________________________________
Total    N-1          $\sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{..})^2$
Analysis of Variance
(Two-Way Layout with Fixed Factors)
Source d.f. SS MS F
A 1 225 225 9
B 1 100 100 4
AB 1 25 25
Total 3 350
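The hand calculation for this 2^2 example can be double-checked in a few lines. The sketch below assumes one observation per cell (n = 1), so each sum of squares equals n * 2^(k-2) * (estimated effect)^2 = (estimated effect)^2; the function name is a hypothetical helper.

```python
def effects_2x2(y_1, y_a, y_b, y_ab):
    """Estimated effects for a 2^2 design from the four cell values
    (1) = A low/B low, a = A high/B low, b = A low/B high, ab = both high."""
    effect_a = ((y_ab - y_b) + (y_a - y_1)) / 2
    effect_b = ((y_ab - y_a) + (y_b - y_1)) / 2
    effect_ab = ((y_ab - y_b) - (y_a - y_1)) / 2
    return effect_a, effect_b, effect_ab

# y11 = 10 is (1), y21 = 20 is a, y12 = 15 is b, y22 = 35 is ab.
effect_a, effect_b, effect_ab = effects_2x2(10, 20, 15, 35)

# With n = 1 and k = 2 the multiplier n * 2^(k-2) is 1.
ss_a, ss_b, ss_ab = effect_a ** 2, effect_b ** 2, effect_ab ** 2
print(effect_a, effect_b, effect_ab)   # estimated effects
print(ss_a, ss_b, ss_ab)               # sums of squares
```

The three sums of squares add up to SST, which is why no degrees of freedom are left for error in an unreplicated 2^2 design.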
2^3 Experiment
Hao Zhang
DESIGN AND CALCULATION MATRIX
RUN   Comb     I   X1   X2   X1X2   X3   X1X3   X2X3   X1X2X3
1     (1)      +   -    -    +      -    +      +      -
2     x1       +   +    -    -      -    -      +      +
3     x2       +   -    +    -      -    +      -      +
4     x1x2     +   +    +    +      -    -      -      -
5     x3       +   -    -    +      +    -      -      +
6     x1x3     +   +    -    -      +    +      -      -
7     x2x3     +   -    +    -      +    -      +      -
8     x1x2x3   +   +    +    +      +    +      +      +
Example:
• To study the effects of bicycle seat height, generator use, and tire pressure on the time taken to make a half-block uphill run.
• The levels of the factors were as follows:
Seat height 26” (-) 30” (+)
Generator Off (-) On (+)
Tire pressure 40 psi (-) 55 psi (+)
BICYCLE DATA:
Travel Times from Bicycle Experiment
Factor Time ( Secs.)
A B C Run 1 Run 2 Mean
- - - 51 54 52.5
+ - - 41 43 42
- + - 54 60 57
+ + - 44 43 43.5
- - + 50 48 49
+ - + 39 39 39
- + + 53 51 52
+ + + 41 44 42.5
CALCULATION OF THE EFFECTS
X1={(42.5-52)+(39-49)+(43.5-57)+(42-52.5)}/4=-10.875
X2={(42.5-39)+(52-49)+(43.5-42)+(57-52.5)}/4=+3.125
X3={(42.5-43.5)+(52-57)+(39-42)+(49-52.5)}/4=-3.125
X1X2={(42.5-52)-(39-49)}/4+{(43.5-57)-(42-52.5)}/4=-0.625
X1X3={(42.5-52)-(43.5-57)}/4+{(39-49)-(42-52.5)}/4=+1.125
X2X3={(42.5-39)-(43.5-42)}/4+{(52-49)-(57-52.5)}/4=+0.125
X1X2X3={(42.5-52)-(39-49)}/4-{(43.5-57)-(42-52.5)}/4=+0.875
CONCLUSIONS:
♦ Only the main effects are large; all interactions are small in comparison.
♦ The X1 and X3 main effects are negative, implying that to reduce the travel time the high levels of these factors (30” seat height, 55 psi tire pressure) should be used.
♦ The X2 main effect is positive, implying that to reduce the travel time the low level of X2 (generator off) should be used.
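The seven effects above can also be obtained mechanically from the sign columns of the design matrix. The following sketch assumes the treatment means are listed in standard order ((1), x1, x2, x1x2, x3, ...); `factorial_effects` is an illustrative name, not something from the slides.

```python
from itertools import combinations
from math import prod

def factorial_effects(means, factor_names):
    """Estimated effects of a 2^k design from treatment means in standard order."""
    k = len(factor_names)
    if len(means) != 2 ** k:
        raise ValueError("expected 2^k treatment means")
    # Factor m is at its high level in run i when bit m of i is set.
    signs = [[1 if (i >> m) & 1 else -1 for m in range(k)] for i in range(2 ** k)]
    effects = {}
    for order in range(1, k + 1):
        for subset in combinations(range(k), order):
            # Interaction signs are products of the main-effect signs.
            contrast = sum(prod(signs[i][m] for m in subset) * means[i]
                           for i in range(2 ** k))
            label = "".join(factor_names[m] for m in subset)
            effects[label] = contrast / 2 ** (k - 1)
    return effects

# Mean travel times from the bicycle experiment, in standard order.
bicycle_means = [52.5, 42.0, 57.0, 43.5, 49.0, 39.0, 52.0, 42.5]
effects = factorial_effects(bicycle_means, ["X1", "X2", "X3"])
for label, value in effects.items():
    print(f"{label}: {value:+.3f}")
```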
Data Analysis of 2^k Experiment
By Ti Zhou
Based on the preceding discussion, we now generalize to k factors and work through an example.
• Assumption: n i.i.d. observations $y_{ij}$ (j = 1, 2, …, n) at the ith treatment combination
• Denote their sample means by $\bar{y}_i$ (i = 1, 2, …, $2^k$)
• Estimated effect $= \dfrac{\text{Contrast}}{2^{k-1}} = \dfrac{\sum_{i=1}^{2^k} c_i \bar{y}_i}{2^{k-1}}$
• $c_i$ is the contrast coefficient for the main effect: $c_i = +1$ if the factor is at its high level, $c_i = -1$ if the factor is at its low level
For example when k=3, the contrast coefficients are as follows:
Run   A   B   C   Treatment
1     -   -   -   (1)
2     +   -   -   a
3     -   +   -   b
4     +   +   -   ab
5     -   -   +   c
6     +   -   +   ac
7     -   +   +   bc
8     +   +   +   abc
The contrast coefficients for interactions are obtained by taking term-by-term products of the contrast vectors of corresponding main effects.
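A short sketch of the term-by-term product for k = 3 follows. The variable names and the `interaction` helper are assumptions made for illustration; the runs are generated in the standard order of the table above, with A changing fastest.

```python
from itertools import product

# Runs in standard order: (1), a, b, ab, c, ac, bc, abc.
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
col_a = [run[0] for run in runs]
col_b = [run[1] for run in runs]
col_c = [run[2] for run in runs]

def interaction(*columns):
    """Term-by-term product of the given contrast vectors."""
    result = [1] * len(columns[0])
    for column in columns:
        result = [x * y for x, y in zip(result, column)]
    return result

col_ab = interaction(col_a, col_b)
col_abc = interaction(col_a, col_b, col_c)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

print("AB :", col_ab)
print("ABC:", col_abc)
# Every pair of distinct contrast vectors is orthogonal in a full 2^k design.
print(dot(col_a, col_ab), dot(col_ab, col_abc), dot(col_a, col_b))
```

The same helper extends to a 2^4 design by generating four main-effect columns and multiplying any subset of them.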
The contrast coefficient for 2^4 Experiment
2^k Design Example
Problem Statement: Generally there are three important factors in designing a computer.
1. Memory Size (A)
2. Cache Size (B)
3. Number of Processors (C)
A manufacturer wants to study the effects of these three factors, and of their interactions, on the performance of computers. The levels of each factor are as follows:
Factor                       Level -1   Level +1
A: Memory Size               4 MB       16 MB
B: Cache Size                1 kB       2 kB
C: Number of Processors      1          2
Computer Design Experiment
Coded factors and benchmark scores:
Run   Treatment Combination   A    B    C    Replicate 1   Replicate 2
1     (1)                     -1   -1   -1   16.7          14.8
2     a                        1   -1   -1   24.5          23.3
3     b                       -1    1   -1   13            11.6
4     ab                       1    1   -1   34.2          33.6
5     c                       -1   -1    1   45.1          46.3
6     ac                       1   -1    1   59.2          57.3
7     bc                      -1    1    1   51.4          49.3
8     abc                      1    1    1   81.9          84.6
Factor levels:
Factor     Low (-1)   High (+1)
A (MB)     4          16
B (KB)     1          2
C (Unit)   1          2
Go back to Yates‘ Algorithm
• The estimated effect for each factor and interaction can be calculated as follows:
$$\text{Est. effect} = \frac{\text{Contrast}}{2^{k-1}} = \frac{\sum_{i=1}^{2^k} c_i \bar{y}_i}{2^{k-1}}$$
• For example, the main effect of A =
$(-\bar{y}_{(1)} + \bar{y}_{a} - \bar{y}_{b} + \bar{y}_{ab} - \bar{y}_{c} + \bar{y}_{ac} - \bar{y}_{bc} + \bar{y}_{abc})/4$
= (-15.75+23.9-12.3+33.9-45.7+58.25-50.35+83.25)/4 = 18.8.
To summarize, all the effects are calculated:
Effect        A      B      C        AB     AC      BC      ABC
Est. Effect   18.8   9.05   37.925   8.45   3.925   5.775   1.725
Since n=2, we can get SS.effect = 4*(Est.effect)^2
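• Statistical Inference for the experiment
$$SSE = \sum_{i=1}^{2^k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_i)^2$$
$$MSE = s^2 = \frac{SSE}{2^k(n-1)}$$
$$MS_{\text{effect}} = SS_{\text{effect}} = n\,2^{k-2}(\text{Est. effect})^2$$
$$F_{\text{effect}} = \frac{MS_{\text{effect}}}{MSE} = \frac{n\,2^{k-2}(\text{Est. effect})^2}{s^2} \sim f_{1,\,2^k(n-1)} \text{ under the null hypothesis of no effect}$$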
ANOVA Table for the Computer Design Experiment
Factor                     Est. effect   SS effect (= MS effect)   DF   Mean Sq.    F ratio    P-value
Memory Size (A)            18.8          1413.76                   1    1413.76     937.8176   <.0001
Cache Size (B)             9.05          327.61                    1    327.61      217.3201   <.0001
Number of Processors (C)   37.925        5753.2225                 1    5753.2225   3816.4     <.0001
AB                         8.45          285.61                    1    285.61      189.4594   <.0001
AC                         3.925         61.6225                   1    61.6225     40.87728   0.0002
BC                         5.775         133.4025                  1    133.4025    88.49254   <.0001
ABC                        1.725         11.9025                   1    11.9025     7.895522   0.0228
Error                                    12.06                     8    MSE=1.5075
Total                                    7999.19                   15
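The error line and the F ratio for Memory Size can be reproduced from the raw benchmark scores. This is a minimal sketch for effect A only, with variable names chosen for illustration; the other rows follow by swapping in the corresponding sign column.

```python
# Benchmark scores (replicate 1, replicate 2) in standard order:
# (1), a, b, ab, c, ac, bc, abc.
scores = [(16.7, 14.8), (24.5, 23.3), (13.0, 11.6), (34.2, 33.6),
          (45.1, 46.3), (59.2, 57.3), (51.4, 49.3), (81.9, 84.6)]
k, n = 3, 2

means = [sum(cell) / n for cell in scores]

# Pure-error sum of squares: spread of the replicates around their cell mean.
sse = sum((y - m) ** 2 for cell, m in zip(scores, means) for y in cell)
mse = sse / (2 ** k * (n - 1))          # error d.f. = 2^k (n - 1) = 8

# Contrast signs for A: +1 where A is high (a, ab, ac, abc), -1 otherwise.
signs_a = [-1, 1, -1, 1, -1, 1, -1, 1]
effect_a = sum(s * m for s, m in zip(signs_a, means)) / 2 ** (k - 1)
ss_a = n * 2 ** (k - 2) * effect_a ** 2  # one d.f., so MS equals SS
f_a = ss_a / mse

print(f"SSE={sse:.2f} MSE={mse:.4f}")
print(f"A: effect={effect_a:.3f} SS={ss_a:.2f} F={f_a:.2f}")
```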
Conclusion
• Intuitively, an increase in any of the three factors (memory, cache, and number of processors) should have a positive effect on the performance of a computer.
• The p-values for all main effects and interaction effects are small, indicating that all the effects are significant at the 0.05 level, which is consistent with intuition.
• The F ratio for factor C (number of processors) is the largest of all the results, indicating that it is the most important factor in the performance of a computer.
Yates' Algorithm
Frank Yates (1902-1994) devised a systematic algorithm to perform the above calculations. We use the earlier Computer Design experiment to explain the algorithm.
1. List all the treatment combinations and their sample means in standard order.
2. Add each of the 2^(k-1) successive pairs of means, then subtract within each pair (second minus first). Store the sums, followed by the differences, in the column labeled I.
3. Repeat the same operation on column I to obtain column II, and so on, until k columns have been produced.
4. Divide the first entry in column k by 2^k and the remaining entries by 2^(k-1). This gives the grand mean and all the estimated effects.
Treatment     Treatment                                 Estimated   SS for
Combination   Mean        I        II       III        Effect      Effect
(1)           15.75       39.65    85.85    323.4      40.425
a             23.9        46.2     237.55   75.2       18.8        1413.76
b             12.3        103.95   29.75    36.2       9.05        327.61
ab            33.9        133.6    45.45    33.8       8.45        285.61
c             45.7        8.15     6.55     151.7      37.925      5753.223
ac            58.25       21.6     29.65    15.7       3.925       61.6225
bc            50.35       12.55    13.45    23.1       5.775       133.4025
abc           83.25       32.9     20.35    6.9        1.725       11.9025
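The table can be reproduced with a few lines of code. The sketch below is one possible implementation of the add-then-subtract passes described above; the function name `yates` and its return format are assumptions made for this illustration.

```python
def yates(means):
    """Apply Yates' algorithm to 2^k treatment means given in standard order.

    Returns (grand_mean, effects); effects follow the input order without
    the first entry, i.e. a, b, ab, c, ac, bc, abc for k = 3.
    """
    size = len(means)
    k = size.bit_length() - 1
    if size < 2 or 2 ** k != size:
        raise ValueError("expected 2^k treatment means with k >= 1")
    column = list(means)
    for _ in range(k):
        pairs = [(column[i], column[i + 1]) for i in range(0, size, 2)]
        # Sums of successive pairs first, then differences (second minus first).
        column = [x + y for x, y in pairs] + [y - x for x, y in pairs]
    grand_mean = column[0] / size
    effects = [entry / (size // 2) for entry in column[1:]]
    return grand_mean, effects

means = [15.75, 23.9, 12.3, 33.9, 45.7, 58.25, 50.35, 83.25]
grand_mean, effects = yates(means)
for label, effect in zip(["a", "b", "ab", "c", "ac", "bc", "abc"], effects):
    print(f"{label}: {effect:.3f}")
print(f"grand mean: {grand_mean:.3f}")
```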
RELATED SAS PROGRAM & MODEL DIAGNOSTICS
BY TONY
SAS-----
•A more efficient way
•Very interesting
One Suggestion?
This experiment employed a 2^3 factorial experimental design with two quantitative factors, temperature T and concentration C, and a single qualitative factor, type of catalyst K. Each treatment combination was run twice, and the recorded responses y1 and y2 are the yields of the two duplicate runs. The data are as follows:
Example: The Pilot Plant Investigation
Run Number Temperature Concentration Catalyst Yield1 Yield2
T(C) C(%) K(A or B) y1(%) y2(%)
1 160 20 A 59 61
2 180 20 A 74 70
3 160 40 A 50 58
4 180 40 A 69 67
5 160 20 B 50 54
6 180 20 B 81 85
7 160 40 B 46 44
8 180 40 B 79 81
Run Number Temperature Concentration Catalyst Yield1 Yield2
T(C) C(%) K(A or B) y1(%) y2(%)
1 -1 -1 -1 59 61
2 1 -1 -1 74 70
3 -1 1 -1 50 58
4 1 1 -1 69 67
5 -1 -1 1 50 54
6 1 -1 1 81 85
7 -1 1 1 46 44
8 1 1 1 79 81
data plant;
input t c k result @@;
tc=t*c;
tk=t*k;
ck=c*k;
tck=t*c*k;
datalines;
-1 -1 -1 59 -1 -1 -1 61
1 -1 -1 74 1 -1 -1 70
-1 1 -1 50 -1 1 -1 58
1 1 -1 69 1 1 -1 67
-1 -1 1 50 -1 -1 1 54
1 -1 1 81 1 -1 1 85
-1 1 1 46 -1 1 1 44
1 1 1 79 1 1 1 81
;
run;
Q:Can you input the data in a more concise way?
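data plant;
  /* Loop over the coded levels so only the yields need to be typed in. */
  do k = -1 to 1 by 2;
    do c = -1 to 1 by 2;
      do t = -1 to 1 by 2;
        do r = 1 to 2;
          input result @@;
          /* Interaction columns are products of the coded factors. */
          ck = k*c; tk = k*t; tc = c*t; tck = k*c*t;
          output;
        end;
      end;
    end;
  end;
datalines;
59 61 74 70 50 58 69 67
50 54 81 85 46 44 79 81
;
run;
Obs. k c t r Res. ck tk tc tck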
1 -1 -1 -1 1 59 1 1 1 -1
2 -1 -1 -1 2 61 1 1 1 -1
3 -1 -1 1 1 74 1 -1 -1 1
4 -1 -1 1 2 70 1 -1 -1 1
5 -1 1 -1 1 50 -1 1 -1 1
6 -1 1 -1 2 58 -1 1 -1 1
7 -1 1 1 1 69 -1 -1 1 -1
proc glm data=plant;
class t c k tc tk ck tck;
model result = t c k tc tk ck tck;
run;
OUTPUT: The SAS System 19:23 Wednesday, December 1, 2007 30
The GLM Procedure
Class Level Information
Class Levels Values
t 2 -1 1
c 2 -1 1
k 2 -1 1
tc 2 -1 1
tk 2 -1 1
ck 2 -1 1
tck 2 -1 1
Number of Observations Read 16
Number of Observations Used 16
The SAS System 19:23 Wednesday, December 1, 2007 31
The GLM Procedure
Dependent Variable: result
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 7 2635.000000 376.428571 47.05 <.0001
Error 8 64.000000 8.000000
Corrected Total 15 2699.000000
R-Square Coeff Var Root MSE result Mean
0.976288 4.402221 2.828427 64.25000
Source DF Type I SS Mean Square F Value Pr > F
t 1 2116.000000 2116.000000 264.50 <.0001
c 1 100.000000 100.000000 12.50 0.0077
k 1 9.000000 9.000000 1.13 0.3198
tc 1 9.000000 9.000000 1.13 0.3198
tk 1 400.000000 400.000000 50.00 0.0001
ck 1 0.000000 0.000000 0.00 1.0000
tck 1 1.000000 1.000000 0.13 0.7328
Source DF Type III SS Mean Square F Value Pr > F
t 1 2116.000000 2116.000000 264.50 <.0001
c 1 100.000000 100.000000 12.50 0.0077
k 1 9.000000 9.000000 1.13 0.3198
tc 1 9.000000 9.000000 1.13 0.3198
tk 1 400.000000 400.000000 50.00 0.0001
ck 1 0.000000 0.000000 0.00 1.0000
tck 1 1.000000 1.000000 0.13 0.7328
Conclusion:
At the significance level 0.05, the main effects T, C and TK interaction are all significant.
REVISION:
proc glm data=plant;
  class t c k tk;
  model result = t c k tk;
  output out=plant2 r=resid p=predic;
run;
proc plot data=plant2;
  plot resid*predic;
run;
OUTPUT:
Source DF Type III SS Mean Square F Value Pr > F
t 1 2116.000000 2116.000000 314.54 <.0001
c 1 100.000000 100.000000 14.86 0.0027
k 1 9.000000 9.000000 1.34 0.2719
tk 1 400.000000 400.000000 59.46 <.0001
Constant Variance?
[Plot of resid*predic: residuals (vertical axis, from -4.5 to 3.5) against predicted values (horizontal axis, from 45 to 85). Legend: A = 1 obs, B = 2 obs, etc.]
CONCLUSION
From the plot of residuals against fitted values, we find that most points are fairly evenly dispersed. Although the dispersion of some points appears uneven, it does not seem serious enough to reject the constant variance assumption.
Is the population normal?
proc univariate data=plant2 plot;
var resid;
run;
[Normal Probability Plot of resid: residuals (vertical axis, from -4.5 to 3.5) against normal quantiles (horizontal axis, from -2 to +2)]
CONCLUSION
The data come from a population which is normal or approximately normal.
Regression Approach & Summary
Wenbin Zhang
• Purpose of the Regression Approach
• Regression Approach to Two-Factor Experiments
• Regression Approach to 2^k Experiments
• An Example to Regression on 2^3 Experiments
• Summary on Chapter 13
Purpose of the Regression Approach
• To provide a unified approach to the analysis of balanced or unbalanced designs
• To predict the responses at specified combinations of the levels of the experimental factors
Regression Approach to Two-Factor Experiments
• Define indicator variables:
For i = 1, …, a-1
$u_i$ = +1 if the observation is from the i-th row
$u_i$ = -1 if the observation is from the a-th row
$u_i$ = 0 otherwise
For j = 1, …, b-1
$v_j$ = +1 if the observation is from the j-th column
$v_j$ = -1 if the observation is from the b-th column
$v_j$ = 0 otherwise
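Regression Approach to Two-Factor Experiments
• Regression Model:
$$Y = \mu + \sum_{i=1}^{a-1}\tau_i u_i + \sum_{j=1}^{b-1}\beta_j v_j + \sum_{i=1}^{a-1}\sum_{j=1}^{b-1}(\tau\beta)_{ij} u_i v_j + \epsilon$$
– Indicator variables $u_i, v_j$ are predictor variables
– $\mu, \tau_i, \beta_j, (\tau\beta)_{ij}$ are unknown parameters
• The Regression Model is equivalent to the model we have examined:
$$Y_{ijk} = \mu_{ij} + \epsilon_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk}$$
$$\mu = \bar{\mu}_{..}, \quad \tau_i = \bar{\mu}_{i.} - \bar{\mu}_{..}, \quad \beta_j = \bar{\mu}_{.j} - \bar{\mu}_{..}, \quad (\tau\beta)_{ij} = \mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..}$$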
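Regression Approach to 2^k Experiments
• For 2^2 experiments, define indicator variables x1 and x2 to represent the levels of A and B:
$x_1 = -1$ if A is low, $x_1 = +1$ if A is high
$x_2 = -1$ if B is low, $x_2 = +1$ if B is high
• Regression Model:
$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$
• Estimated with main effects and interaction effects:
$$\hat{\beta}_0 = \bar{y}, \quad \hat{\beta}_1 = \frac{\text{Est. } A}{2}, \quad \hat{\beta}_2 = \frac{\text{Est. } B}{2}, \quad \hat{\beta}_{12} = \frac{\text{Est. } AB}{2}$$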
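Regression Approach to 2^k Experiments
• Similarly, the full regression model for k=3 is:
$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \beta_{123} x_1 x_2 x_3$$
• Substitute with main effects and interaction effects (A, B, …, ABC denote the estimated effects):
$$\hat{y} = \bar{y} + \frac{A}{2}x_1 + \frac{B}{2}x_2 + \frac{C}{2}x_3 + \frac{AB}{2}x_1 x_2 + \frac{AC}{2}x_1 x_3 + \frac{BC}{2}x_2 x_3 + \frac{ABC}{2}x_1 x_2 x_3$$
• The reduced model after dropping all interactions from the full model:
$$\hat{y} = \bar{y} + \frac{A}{2}x_1 + \frac{B}{2}x_2 + \frac{C}{2}x_3$$
– The remaining parameter estimates will be unchanged because of the orthogonal nature of the design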
Regression Approach to 2^k Experiments
• To predict the response at specified combinations of the levels of the experimental factors:
– For numerical factors, use the interpolation formula:
$$x_i = \frac{\text{Specified level} - \text{Average level}}{\text{Range}/2} = \frac{\text{Specified level} - (\text{High} + \text{Low})/2}{(\text{High} - \text{Low})/2}$$
– For nominal factors (e.g., either on or off): no interpolation is needed
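The interpolation formula is a one-line function. In the sketch below the name `coded_level` is a hypothetical helper; it maps the low level to -1, the high level to +1, and anything in between linearly.

```python
def coded_level(specified, low, high):
    """Convert an actual factor level to the coded scale used in the
    regression model (low -> -1, high -> +1)."""
    average = (high + low) / 2
    half_range = (high - low) / 2
    return (specified - average) / half_range

print(coded_level(26, 26, 30))   # the low level itself
print(coded_level(30, 26, 30))   # the high level itself
print(coded_level(28, 26, 30))   # the midpoint of the range
```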
An Example to Regression on 2^3 Experiments
• The effects of bicycle seat height, generator use, and tire pressure on the time taken to make a half-block uphill run.
• The levels of the factors:
Seat height (Factor A): 26”(-), 30”(+)
Generator (Factor B): Off(-), On(+)
Tire pressure (Factor C): 40psi(-), 55psi(+)
• The data are shown in the next slide.
• Questions:
– 1) Predict the minimum travel time using the regression model
– 2) Predict the travel time when seat height = 27”, generator is off, and tire pressure = 50psi.
An Example to Regression on 2^3 Experiments
Travel Times from Bicycle Experiment
Factor Time (Secs.)
A B C Run 1 Run 2 Mean
- - - 51 54 52.5
+ - - 41 43 42.0
- + - 54 60 57.0
+ + - 44 43 43.5
- - + 50 48 49.0
+ - + 39 39 39.0
- + + 53 51 52.0
+ + + 41 44 42.5
An Example to Regression on 2^3 Experiments
• Because all interactions were found nonsignificant, we only consider main effects:
$$A = \frac{(42.5-52.0) + (39.0-49.0) + (43.5-57.0) + (42.0-52.5)}{4} = -10.875$$
$$B = \frac{(42.5-39.0) + (52.0-49.0) + (43.5-42.0) + (57.0-52.5)}{4} = 3.125$$
$$C = \frac{(42.5-43.5) + (52.0-57.0) + (39.0-42.0) + (49.0-52.5)}{4} = -3.125$$
$$\bar{y} = 47.1875$$
$$\hat{y} = 47.1875 + \frac{-10.875}{2}x_1 + \frac{3.125}{2}x_2 + \frac{-3.125}{2}x_3 = 47.1875 - 5.4375x_1 + 1.5625x_2 - 1.5625x_3$$
An Example to Regression on 2^3 Experiments
• 1) The minimum predicted travel time (at $x_1 = +1$, $x_2 = -1$, $x_3 = +1$) is:
$$\hat{y} = 47.1875 - 5.4375(+1) + 1.5625(-1) - 1.5625(+1) = 38.625 \text{ sec}$$
• 2) When seat height = 27”, generator is off, and tire pressure = 50psi, we have
$$x_1 = \frac{27 - (30+26)/2}{(30-26)/2} = -0.5, \quad x_2 = -1, \quad x_3 = \frac{50 - (55+40)/2}{(55-40)/2} = 0.333$$
$$\hat{y} = 47.1875 - 5.4375(-0.5) + 1.5625(-1) - 1.5625(0.333) = 47.823 \text{ sec}$$
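Both predictions can be checked numerically. The sketch below hard-codes the reduced main-effects model from the bicycle example and searches the eight corner points for the minimum; the function name `predict_time` is an assumption made for illustration.

```python
from itertools import product

def predict_time(x1, x2, x3):
    """Reduced (main-effects only) regression model for the bicycle data,
    with x1, x2, x3 on the coded -1..+1 scale."""
    return 47.1875 - 5.4375 * x1 + 1.5625 * x2 - 1.5625 * x3

# Question 1: the model is linear, so its minimum over the design region
# is attained at one of the 2^3 corner points.
best_corner = min(product((-1, 1), repeat=3), key=lambda x: predict_time(*x))
min_time = predict_time(*best_corner)

# Question 2: seat height 27 in, generator off, tire pressure 50 psi.
x1 = (27 - (30 + 26) / 2) / ((30 - 26) / 2)
x3 = (50 - (55 + 40) / 2) / ((55 - 40) / 2)
time_q2 = predict_time(x1, -1, x3)

print(best_corner, min_time)
print(round(time_q2, 3))
```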
Summary
This chapter covers the design of multifactor experiments and the methods for analyzing the data collected from them.
1. The designs considered here are complete factorial designs, as opposed to fractional factorial designs.
2. A 2^k experiment is better than a one-factor-at-a-time experiment because it can detect interactions.
3. We discussed two-factor experiments with an arbitrary number of levels per factor and k-factor experiments with two levels per factor.
• Two-factor experiments are considered for balanced designs and unbalanced designs.
• 2^k experiments are considered only for balanced designs.
Summary (cont.)
4. Once we have the data from our experiment, we just need to estimate the parameters of our linear model and partition the total variation:
SST = SS(all single-factor effects) + SS(all interaction effects) + SSE (noise)
5. The ANOVA table gives us a convenient way to analyze the data and to assess the significance of each factor.
6. We can use the regression approach to estimate the parameters of the model above.
Thank you!
Any Questions?