Chapter 13 Analysis of Multi-factor Experiments Dec 6 th 2007


Our Group Members

Part I: Background Introduction

1. Ruirui Pan: Why do we work on this topic?

2. Xuanti Ying: Introduction to related technology

Part II: Theoretical Derivation

3. Ji-Young Yun: Parameter estimation

4. Mingyi Hong: Theory of two-factor experiments

5. Zheng Zhao: Theory of 2^k experiments

Part III: Data Analysis

6. Wei Hu: Data analysis of a 2^2 experiment

7. Hao Zhang: Data analysis of a 2^3 experiment

8. Ti Zhou: Data analysis of a 2^k experiment

Part IV: Model Analysis and Conclusion

9. Jun Huang: Model diagnostics and SAS programming

10. Wenbin Zhang: Regression approach and conclusion

Why do we work on this topic?

by Ruirui Pan

What is a multifactor experiment?

• In statistics, a multifactor experiment (also called factorial experiment) is an experiment whose design consists of two or more factors, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such factors.

Basic Concepts

• The primary purpose of an experiment is to evaluate how a set of predictor variables (called factors in experimental design jargon) affect a response variable.

• The different possible values of a factor are called its levels.

• Each treatment is a particular combination of the levels of different treatment factors.


Factors

• A factor is a linked set of experimental conditions we may wish to compare, e.g.:

Levels of temperature

Different methods of solving a problem

Pressure of a bicycle tire

• Two types:

Treatment factors

Nuisance factors

Example

• Suppose an engineer wishes to study the total power used by each of two different motors, A and B, running at each of two different speeds, 2000 or 3000 RPM.

The four treatment combinations are motor A at 2000 RPM, motor B at 2000 RPM, motor A at 3000 RPM, and motor B at 3000 RPM. If each combination is run twice, the factorial experiment consists of 8 experimental units.

• Single-factor experiments: analyzed by one-way ANOVA

• Two-factor experiments with fixed crossed factors: factor A with a levels and factor B with b levels are crossed, giving a*b treatment combinations

• 2^3 factorial experiments: 3 factors with 2 levels each, giving 8 treatment combinations

• 2^k factorial experiments: k factors with 2 levels each, giving 2^k treatment combinations
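To make the counting above concrete, here is a small Python sketch (not part of the original presentation) that enumerates full-factorial treatment combinations with the standard-library `itertools.product`. The helper name `treatment_combinations` is a hypothetical choice for illustration.

```python
from itertools import product

def treatment_combinations(*factor_levels):
    """All combinations of one level from each factor (a full factorial)."""
    return list(product(*factor_levels))

# Motor example: 2 motors x 2 speeds = 4 treatment combinations.
motor_runs = treatment_combinations(["A", "B"], [2000, 3000])

# Running each combination twice gives 8 experimental units.
replicated_units = [combo for combo in motor_runs for _ in range(2)]

# A 2^3 design: three factors, each at a low (-1) and a high (+1) level.
two_cubed = treatment_combinations([-1, 1], [-1, 1], [-1, 1])

print(len(motor_runs), len(replicated_units), len(two_cubed))  # 4 8 8
print(motor_runs)  # [('A', 2000), ('A', 3000), ('B', 2000), ('B', 3000)]
```

The same helper covers the general a*b case: two factors with 3 and 2 levels give 6 combinations.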

The importance of multifactor experiments

Introduction to Related Technology

By Xuanti Ying

Related technology

• ANOVA

Introduction to ANOVA

• Analysis of variance (ANOVA) is used to test hypotheses about differences between two or more means. The t-test based on the standard error of the difference between two means can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using t-tests. However, conducting multiple t-tests can lead to severe inflation of the Type I error rate. Analysis of variance can be used to test differences among several means for significance without increasing the Type I error rate.
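A quick way to see the Type I error inflation is to compute the familywise error rate for all pairwise t-tests among g groups. The sketch below assumes the tests are independent, which is only an approximation (pairwise comparisons share group means), so the numbers are illustrative; `familywise_error` is a hypothetical helper name.

```python
from math import comb

def familywise_error(num_groups, alpha=0.05):
    """Chance of at least one false rejection when every pair of groups is
    compared with a level-alpha t-test, treating the tests as independent."""
    num_tests = comb(num_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** num_tests

for groups in (2, 3, 5):
    print(groups, round(familywise_error(groups), 3))
# 2 0.05
# 3 0.143
# 5 0.401
```

Even with only five groups, the chance of at least one spurious "significant" difference is roughly 40% under this approximation, which is the problem a single ANOVA F-test avoids.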

Who Developed this Technology

• The initial techniques of the analysis of variance were developed by the statistician and geneticist R. A. Fisher in the 1920s and 1930s.

http://en.wikipedia.org/wiki/Image:RonaldFisher.jpg

The Significance of ANOVA

One important reason for using ANOVA methods rather than multiple two-group studies analyzed via t-tests is that the former method is more efficient, and with fewer observations we can gain more information.

• Controlling for factors

• Detects interactive effects (The term interaction was first used by Fisher, 1926.)

Logic of ANOVA

• Partitioning of the sum of squares: the fundamental technique is a partitioning of the total sum of squares into components related to the effects used in the model.

• The F-test: the F-test is used to compare the components of the total deviation. The F statistic for each main effect and for the interaction is its mean square divided by the within-cell (error) mean square.

Several Types of ANOVA

• One way ANOVA is used to test for differences among two or more independent groups.

• Factorial ANOVA (e.g., two-way ANOVA) is used when we want to study the effects of two or more treatment variables (our case).

• Mixed-design ANOVA

Tests Supplementing ANOVA

• All pairwise t-tests

• Fisher's LSD (Least Significant Difference) method

• Tukey's HSD (Honestly Significant Difference) test, proposed by the statistician John Tukey

• Newman-Keuls method

• Duncan's procedure (similar to the Newman-Keuls method)

Parameter Estimation by Ji-Young Yun

• EXAMPLE

Consider an experiment on students' grades that evaluates the effects of sleeping hours and the percentage of class attendance.

• Factor A: sleeping hours. Levels:

• enough (if sleeping hours ≥ 8)

• normal (if 6 ≤ sleeping hours < 8)

• lack (if sleeping hours < 6)

• Factor B: the percentage of attendance. Levels:

• high (the percentage ≥ 50%)

• low (the percentage < 50%)

A grade treatment experiment to evaluate the effects of A and B (observed grades)

Factor A level    B: 1 (high)    B: 2 (low)

1 (enough)    80, 85, 60    70, 72, 62

2 (normal)    90, 92, 94    80, 70, 90

3 (lack)    100, 80, 90    90, 80, 70

• Number of students in each treatment combination: n = 3

• a = 3 (levels of factor A)

• b = 2 (levels of factor B)

• N = abn = (3)(2)(3) = 18

A grade treatment experiment to evaluate the effects of A and B (cell means)

Factor A level    B: 1 (high)    B: 2 (low)    Row mean

1 (enough)    75    68    71.5

2 (normal)    92    80    86

3 (lack)    90    80    85

Column mean    85.67    76    80.83 (grand mean)

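Parameters & Estimates

$$Y_{ijk} = \mu_{ij} + \epsilon_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk}$$

• $y_{ijk}$: the kth observation on the (i, j)th treatment combination

• $\mu_{ij}$: the mean of cell (i, j)

• $\epsilon_{ijk}$: i.i.d. random errors with a $N(0, \sigma^2)$ distribution

• $\tau_i$: ith row main effect; $\beta_j$: jth column main effect

• $(\tau\beta)_{ij}$: (i, j)th row-column interaction

The parameters are defined from the cell means:

$$\mu = \bar\mu_{\cdot\cdot}, \quad \tau_i = \bar\mu_{i\cdot} - \bar\mu_{\cdot\cdot}, \quad \beta_j = \bar\mu_{\cdot j} - \bar\mu_{\cdot\cdot}, \quad (\tau\beta)_{ij} = \mu_{ij} - \bar\mu_{i\cdot} - \bar\mu_{\cdot j} + \bar\mu_{\cdot\cdot}$$

$\bar y_{ij\cdot}$, the sample mean of the (i, j)th cell, is the least squares estimate of $\mu_{ij}$, which gives

$$\hat\tau_i = \bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}, \quad \hat\beta_j = \bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}, \quad \widehat{(\tau\beta)}_{ij} = \bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot}$$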
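As a sketch, the estimates above can be computed directly from the grade data in Python. The nested-list layout `data[i][j]` (level i of factor A, level j of factor B) and the variable names are assumptions of this example; the row, column, and grand means are taken as averages of cell means, which is valid here because the design is balanced.

```python
# Grade data: data[i][j] holds the n = 3 observations for level i of factor A
# (enough, normal, lack) and level j of factor B (high, low).
data = [
    [[80, 85, 60], [70, 72, 62]],
    [[90, 92, 94], [80, 70, 90]],
    [[100, 80, 90], [90, 80, 70]],
]
a, b = len(data), len(data[0])

def mean(values):
    return sum(values) / len(values)

cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]  # y-bar_ij.
row = [mean(cell[i]) for i in range(a)]                          # y-bar_i..
col = [mean([cell[i][j] for i in range(a)]) for j in range(b)]   # y-bar_.j.
grand = mean(row)                                                # y-bar_...

tau_hat = [row[i] - grand for i in range(a)]
beta_hat = [col[j] - grand for j in range(b)]
inter_hat = [[cell[i][j] - row[i] - col[j] + grand for j in range(b)]
             for i in range(a)]

print([round(t, 3) for t in tau_hat])   # [-9.333, 5.167, 4.167]
print([round(x, 3) for x in beta_hat])  # [4.833, -4.833]
```

The estimated row effects, column effects, and each row of interaction estimates all sum to zero, matching the constraints implied by the parameter definitions.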

Analysis of Variance: ANOVA Table for the Crossed Two-Way Layout

Source d.f. SS MS F

A (sleeping hours) 2 787.0 393.5 4.66

B (attendance) 1 420.5 420.5 4.98

AB 2 19.0 9.5 0.11

Error 12 1014.0 84.5

Total 17 2240.5

• SST = SSA + SSB + SSAB + SSE, and the degrees of freedom add the same way: 17 = 2 + 1 + 2 + 12.

• F_A = MSA/MSE = 4.66 exceeds f(2, 12, .05) = 3.89 and F_B = MSB/MSE = 4.98 exceeds f(1, 12, .05) = 4.75, so the main effects of sleeping hours and of the percentage of attendance are both significant at the .05 level. F_AB = MSAB/MSE = 0.11 is far below f(2, 12, .10) = 2.81, so the interaction between sleeping hours and the percentage of attendance is NOT significant at the .1 level.
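The following Python sketch recomputes the two-way ANOVA sums of squares, mean squares, and F ratios from the raw grade data with the balanced-design formulas. It uses only the standard library, so p-values are not computed; variable names are illustrative.

```python
# Two-way ANOVA for the grade data (balanced: a = 3, b = 2, n = 3).
data = [
    [[80, 85, 60], [70, 72, 62]],
    [[90, 92, 94], [80, 70, 90]],
    [[100, 80, 90], [90, 80, 70]],
]
a, b, n = len(data), len(data[0]), len(data[0][0])

def mean(values):
    return sum(values) / len(values)

cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
row = [mean(cell[i]) for i in range(a)]
col = [mean([cell[i][j] for i in range(a)]) for j in range(b)]
grand = mean(row)

ssa = b * n * sum((r - grand) ** 2 for r in row)
ssb = a * n * sum((c - grand) ** 2 for c in col)
ssab = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(a) for j in range(b))
sse = sum((y - cell[i][j]) ** 2
          for i in range(a) for j in range(b) for y in data[i][j])
sst = sum((y - grand) ** 2
          for i in range(a) for j in range(b) for y in data[i][j])

df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
mse = sse / df["E"]
for name, ss in (("A", ssa), ("B", ssb), ("AB", ssab)):
    ms = ss / df[name]
    print(f"{name}: SS={ss:.2f} df={df[name]} MS={ms:.2f} F={ms / mse:.3f}")
print(f"Error: SS={sse:.2f} df={df['E']} MS={mse:.2f}; Total SS={sst:.2f}")
# A: SS=787.00 df=2 MS=393.50 F=4.657
# B: SS=420.50 df=1 MS=420.50 F=4.976
# AB: SS=19.00 df=2 MS=9.50 F=0.112
# Error: SS=1014.00 df=12 MS=84.50; Total SS=2240.50
```

The error term has ab(n - 1) = 12 degrees of freedom, so MSE = SSE/12.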

Theory Derivation for Two Factor Experiment

Mingyi Hong

Overview

• Sometimes a researcher might want to simultaneously examine the effects of two treatments.

• Examples

The effect of sex and race on wage

The effect of the level of pollution and the level of city services on housing prices

Data from a Balanced Two-way Layout

Rows: levels 1, 2, 3 of factor A. Columns: levels 1 to 5 of factor B. Each cell (i, j) holds the n observations $y_{ij1}, \dots, y_{ijn}$.

The Model

The Sum of Squares
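For reference, here is a sketch of the usual textbook sums of squares for a balanced layout with a levels of A, b levels of B, and n observations per cell, written in the notation of the parameter estimation section (an assumption about the form this slide used):

$$SSA = bn\sum_{i=1}^{a}(\bar y_{i\cdot\cdot}-\bar y_{\cdot\cdot\cdot})^2, \qquad SSB = an\sum_{j=1}^{b}(\bar y_{\cdot j\cdot}-\bar y_{\cdot\cdot\cdot})^2$$

$$SSAB = n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar y_{ij\cdot}-\bar y_{i\cdot\cdot}-\bar y_{\cdot j\cdot}+\bar y_{\cdot\cdot\cdot})^2$$

$$SSE = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk}-\bar y_{ij\cdot})^2, \qquad SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk}-\bar y_{\cdot\cdot\cdot})^2$$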

The Chi Square Distributions

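• Each of the sums of squares above, divided by σ², has a chi-square distribution. For example, $SSE/\sigma^2 \sim \chi^2_{ab(n-1)}$, and when there are no row main effects ($\tau_1 = \dots = \tau_a = 0$), $SSA/\sigma^2 \sim \chi^2_{a-1}$; similarly, $SSB/\sigma^2$ and $SSAB/\sigma^2$ are $\chi^2_{b-1}$ and $\chi^2_{(a-1)(b-1)}$ under their null hypotheses.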

The ANOVA Identity

$$SST = SSA + SSB + SSAB + SSE$$

Total DF = Row DF + Column DF + Interaction DF + Error DF, i.e.

$$abn - 1 = (a-1) + (b-1) + (a-1)(b-1) + ab(n-1)$$

The Tests
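A sketch of the standard fixed-effects F tests, in the notation above: each effect mean square is compared with the error mean square $MSE = SSE/[ab(n-1)]$, and the null hypothesis of no effect is rejected at level α when

$$F_A = \frac{SSA/(a-1)}{MSE} > f_{a-1,\,ab(n-1),\,\alpha}, \qquad F_B = \frac{SSB/(b-1)}{MSE} > f_{b-1,\,ab(n-1),\,\alpha}$$

$$F_{AB} = \frac{SSAB/[(a-1)(b-1)]}{MSE} > f_{(a-1)(b-1),\,ab(n-1),\,\alpha}$$

It is usual to test the interaction first, since main effects are hard to interpret when there is a significant interaction.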

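Multiple Comparisons Between Rows and/or Columns

• Pairwise comparisons between the row main effects and/or between the column main effects are generally of interest only when the interactions are nonsignificant.

• The Tukey method gives 100(1 - α)% simultaneous confidence intervals for all pairwise differences:

$$\tau_i - \tau_{i'} \in \bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot} \pm q_{a,\,\nu,\,\alpha}\,\frac{s}{\sqrt{bn}}, \qquad \beta_j - \beta_{j'} \in \bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot} \pm q_{b,\,\nu,\,\alpha}\,\frac{s}{\sqrt{an}}$$

where $s = \sqrt{MSE}$, $\nu = ab(n-1)$ is the error d.f., and $q_{m,\nu,\alpha}$ is the upper α critical point of the studentized range distribution.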
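A minimal Python sketch of the Tukey row intervals, assuming the caller supplies the studentized-range critical point (it is not computed here). The example plugs in the grade data, where MSE = SSE/[ab(n-1)] = 1014/12 = 84.5, with q(3, 12, 0.05) taken as 3.77 from standard tables; `tukey_row_intervals` is a hypothetical name.

```python
from math import sqrt

def tukey_row_intervals(row_means, mse, b, n, q_crit):
    """Tukey simultaneous intervals for tau_i - tau_j (balanced two-way layout).

    q_crit is the studentized-range critical point q_{a, ab(n-1), alpha},
    supplied by the caller from a table.
    """
    half_width = q_crit * sqrt(mse / (b * n))
    intervals = {}
    for i in range(len(row_means)):
        for j in range(i + 1, len(row_means)):
            diff = row_means[i] - row_means[j]
            intervals[(i + 1, j + 1)] = (diff - half_width, diff + half_width)
    return half_width, intervals

# Grade data: a = 3, b = 2, n = 3, MSE = 84.5 on 12 d.f.; q(3, 12, 0.05) ~ 3.77.
half_width, intervals = tukey_row_intervals(
    [71.5, 86.0, 85.0], mse=84.5, b=2, n=3, q_crit=3.77)
for pair, (low, high) in intervals.items():
    print(pair, round(low, 2), round(high, 2))
```

With these inputs the interval for τ1 - τ2 lies just below zero, while the intervals for τ1 - τ3 and τ2 - τ3 contain zero.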

Theory derivation about 2^k experiment

Zheng Zhao

One Factor Experiment with Two levels

Level Data Mean

A+ $y_{11}, y_{12}, y_{13}, \dots, y_{1n_1}$ $\bar y_1$

A- $y_{21}, y_{22}, y_{23}, \dots, y_{2n_2}$ $\bar y_2$

2² Experiment—The Introduction of Factor B with Two levels

• Factor A = High (+) or Low (-)

• Factor B = High (+) or Low (-)

Four Treatment Combinations

• ab = (A High, B High)

• a = (A High, B Low)

• b = (A Low, B High)

• (1) = (A Low, B Low)

Yij ~ N (μi, σ²), i = 1, 2, …,2^k; j = 1,2,…,n

Factor A    Factor B    Treatment Combination    Data

Low (-) Low (-) (1) y(1), 1,……, y(1),n

High (+) Low (-) a ya, 1,……, ya,n

Low (-) High (+) b yb, 1,……, yb,n

High (+) High (+) ab yab, 1,……, yab,n

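Main Effect and Interaction Effect of 2² Experiment

$$A = \frac{(\mu_{ab} - \mu_{b}) + (\mu_{a} - \mu_{(1)})}{2} = \frac{\mu_{ab} + \mu_{a}}{2} - \frac{\mu_{b} + \mu_{(1)}}{2}$$

$$B = \frac{(\mu_{ab} - \mu_{a}) + (\mu_{b} - \mu_{(1)})}{2} = \frac{\mu_{ab} + \mu_{b}}{2} - \frac{\mu_{a} + \mu_{(1)}}{2}$$

$$AB = \frac{(\mu_{ab} - \mu_{b}) - (\mu_{a} - \mu_{(1)})}{2} = \frac{\mu_{ab} + \mu_{(1)}}{2} - \frac{\mu_{a} + \mu_{b}}{2}$$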

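Estimated Effects

• Est. main effect A: $\hat A = \dfrac{(\bar y_{ab} - \bar y_{b}) + (\bar y_{a} - \bar y_{(1)})}{2}$

• Est. main effect B: $\hat B = \dfrac{(\bar y_{ab} - \bar y_{a}) + (\bar y_{b} - \bar y_{(1)})}{2}$

• Est. interaction AB: $\widehat{AB} = \dfrac{(\bar y_{ab} - \bar y_{b}) - (\bar y_{a} - \bar y_{(1)})}{2}$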

2³ Experiment: One More Factor Considered

Factors A, B, and C, each at two levels, give 2³ = 8 treatment combinations:

• (1): Low, Low, Low

• a: High, Low, Low

• b: Low, High, Low

• ab: High, High, Low

• c: Low, Low, High

• ac: High, Low, High

• bc: Low, High, High

• abc: High, High, High

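Estimated effects in the 2³ experiment:

$$\hat A = \frac{(\bar y_{abc}-\bar y_{bc}) + (\bar y_{ac}-\bar y_{c}) + (\bar y_{ab}-\bar y_{b}) + (\bar y_{a}-\bar y_{(1)})}{4}$$

$$\hat B = \frac{(\bar y_{abc}-\bar y_{ac}) + (\bar y_{bc}-\bar y_{c}) + (\bar y_{ab}-\bar y_{a}) + (\bar y_{b}-\bar y_{(1)})}{4}$$

$$\hat C = \frac{(\bar y_{abc}-\bar y_{ab}) + (\bar y_{bc}-\bar y_{b}) + (\bar y_{ac}-\bar y_{a}) + (\bar y_{c}-\bar y_{(1)})}{4}$$

$$\widehat{AB} = \frac{\{(\bar y_{abc}-\bar y_{bc}) - (\bar y_{ac}-\bar y_{c})\} + \{(\bar y_{ab}-\bar y_{b}) - (\bar y_{a}-\bar y_{(1)})\}}{4}$$

$$\widehat{AC} = \frac{\{(\bar y_{abc}-\bar y_{bc}) - (\bar y_{ab}-\bar y_{b})\} + \{(\bar y_{ac}-\bar y_{c}) - (\bar y_{a}-\bar y_{(1)})\}}{4}$$

$$\widehat{BC} = \frac{\{(\bar y_{abc}-\bar y_{ac}) - (\bar y_{ab}-\bar y_{a})\} + \{(\bar y_{bc}-\bar y_{c}) - (\bar y_{b}-\bar y_{(1)})\}}{4}$$

$$\widehat{ABC} = \frac{\{(\bar y_{abc}-\bar y_{bc}) - (\bar y_{ac}-\bar y_{c})\} - \{(\bar y_{ab}-\bar y_{b}) - (\bar y_{a}-\bar y_{(1)})\}}{4}$$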

2^2 experiment

Wei Hu

Calculate the estimated main effects A and B, and the interaction AB.

B = Low    B = High    Row mean

A = Low    y11 = 10    y12 = 15    ȳ1. = 12.5

A = High    y21 = 20    y22 = 35    ȳ2. = 27.5

Column mean    ȳ.1 = 15    ȳ.2 = 25    ȳ.. = 20

The estimated main effects are:

A ={(y22-y12)+(y21-y11)}/2

={(35-15)+(20-10)}/2 = 15

B ={(y22-y21)+(y12-y11)}/2

={(35-20)+(15-10)}/2 = 10

The estimated interaction effect is:

AB ={(y22-y12)-(y21-y11)}/2

={(35-15)-(20-10)}/2 = 5

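$$SST = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar y_{\cdot\cdot})^2 = \{(10-20)^2+(15-20)^2+(20-20)^2+(35-20)^2\} = 350$$

$$SSA = \sum_{i=1}^{I}\sum_{j=1}^{J}(\bar y_{i\cdot}-\bar y_{\cdot\cdot})^2 = J\sum_{i=1}^{I}(\bar y_{i\cdot}-\bar y_{\cdot\cdot})^2 = 2\{(12.5-20)^2+(27.5-20)^2\} = 225$$

$$SSB = \sum_{i=1}^{I}\sum_{j=1}^{J}(\bar y_{\cdot j}-\bar y_{\cdot\cdot})^2 = I\sum_{j=1}^{J}(\bar y_{\cdot j}-\bar y_{\cdot\cdot})^2 = 2\{(15-20)^2+(25-20)^2\} = 100$$

$$SSAB = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y_{\cdot\cdot})^2 = \{(10-12.5-15+20)^2+(15-12.5-25+20)^2+(20-27.5-15+20)^2+(35-27.5-25+20)^2\} = 25$$

$$SSE = SST - SSA - SSB - SSAB = 350 - 225 - 100 - 25 = 0$$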

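ANOVA Table (Two-Way Layout with Fixed Factors)

$$
\begin{array}{lcccc}
\text{Source} & \text{d.f.} & \text{SS} & \text{MS} & F \\
\hline
A & I-1 & J\sum_{i=1}^{I}(\bar y_{i\cdot}-\bar y_{\cdot\cdot})^2 & MSA = \frac{SSA}{I-1} & \frac{MSA}{MSAB} \\
B & J-1 & I\sum_{j=1}^{J}(\bar y_{\cdot j}-\bar y_{\cdot\cdot})^2 & MSB = \frac{SSB}{J-1} & \frac{MSB}{MSAB} \\
AB & (I-1)(J-1) & \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y_{\cdot\cdot})^2 & MSAB = \frac{SSAB}{(I-1)(J-1)} & \\
\hline
\text{Total} & N-1 & \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij}-\bar y_{\cdot\cdot})^2 & &
\end{array}
$$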

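Analysis of Variance (Two-Way Layout with Fixed Factors)

Source d.f. SS MS F

A 1 225 225 9

B 1 100 100 4

AB 1 25 25

Total 3 350

With one observation per cell, SSE = 0 on 0 d.f., so MSAB takes the place of MSE in the F ratios (F_A = 225/25 = 9, F_B = 100/25 = 4). This is justified only if there is no interaction between A and B.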
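A short Python sketch of the 2² calculation above, using the slide's data and the effect formulas; the SS shortcut SS = n·2^(k-2)·(effect)², standard for 2^k designs, is assumed here and reduces to (effect)² when n = 1 and k = 2.

```python
# 2^2 example with one observation per cell; y_ij = response at level i of A
# and level j of B (1 = low, 2 = high).
y11, y12, y21, y22 = 10, 15, 20, 35

A = ((y22 - y12) + (y21 - y11)) / 2
B = ((y22 - y21) + (y12 - y11)) / 2
AB = ((y22 - y12) - (y21 - y11)) / 2

# For a 2^k design with n observations per cell, SS_effect = n * 2^(k-2) * effect^2.
n, k = 1, 2
ss = {name: n * 2 ** (k - 2) * eff ** 2
      for name, eff in (("A", A), ("B", B), ("AB", AB))}

# n = 1 leaves no error d.f., so MSAB stands in for MSE (assumes no interaction).
F_A = ss["A"] / ss["AB"]
F_B = ss["B"] / ss["AB"]
print(A, B, AB)          # 15.0 10.0 5.0
print(ss, F_A, F_B)      # {'A': 225.0, 'B': 100.0, 'AB': 25.0} 9.0 4.0
```

The three effect sums of squares add up to SST = 350, which is why SSE is zero here.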

2^3 Experiment

Hao Zhang

DESIGN AND CALCULATION MATRIX

RUN Comb I X1 X2 X1X2 X3 X1X3 X2X3 X1X2X3

1 (1) + - - + - + + -

2 x1 + + - - - - + +

3 x2 + - + - - + - +

4 x1x2 + + + + - - - -

5 x3 + - - + + - - +

6 x1x3 + + - - + + - -

7 x2x3 + - + - + - + -

8 x1x2x3 + + + + + + + +

Example:

• To study the effects of bicycle seat height, generator use, and tire pressure on the time taken to make a half-block uphill run.

• The levels of the factors were as follows:

Seat height 26” (-) 30” (+)

Generator Off (-) On (+)

Tire pressure 40 psi (-) 55 psi (+)

BICYCLE DATA:

Travel Times from Bicycle Experiment

Factor Time ( Secs.)

A B C Run 1 Run 2 Mean

- - - 51 54 52.5

+ - - 41 43 42

- + - 54 60 57

+ + - 44 43 43.5

- - + 50 48 49

+ - + 39 39 39

- + + 53 51 52

+ + + 41 44 42.5

CALCULATION OF THE EFFECTS

X1={(42.5-52)+(39-49)+(43.5-57)+(42-52.5)}/4=-10.875

X2={(42.5-39)+(52-49)+(43.5-42)+(57-52.5)}/4=+3.125

X3={(42.5-43.5)+(52-57)+(39-42)+(49-52.5)}/4=-3.125

X1X2={(42.5-52)-(39-49)}/4+{(43.5-57)-(42-52.5)}/4=-0.625

X1X3={(42.5-52)-(43.5-57)}/4+{(39-49)-(42-52.5)}/4=+1.125

X2X3={(42.5-39)-(43.5-42)}/4+{(52-49)-(57-52.5)}/4=+0.125

X1X2X3={(42.5-52)-(39-49)}/4-{(43.5-57)-(42-52.5)}/4=+0.875
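The seven hand calculations above can be checked with a short Python sketch that multiplies each run's mean by the sign of the effect (the product of the coded levels of the factors involved) and divides by 2^(k-1) = 4. It assumes the means are listed in standard order and takes the (1) mean as (51 + 54)/2 = 52.5; the helper name `effect` is hypothetical.

```python
from itertools import product

# Bicycle travel-time means in standard order (1), a, b, ab, c, ac, bc, abc;
# A = seat height (X1), B = generator (X2), C = tire pressure (X3).
means = [52.5, 42.0, 57.0, 43.5, 49.0, 39.0, 52.0, 42.5]

# Coded (A, B, C) levels in standard order: A alternates fastest, then B, then C.
runs = [(xa, xb, xc) for xc, xb, xa in product((-1, 1), repeat=3)]

def effect(name):
    """Estimated effect = sum(sign * mean) / 2^(k-1), with k = 3.
    The sign of an interaction is the product of its factors' coded levels."""
    total = 0.0
    for (xa, xb, xc), mean in zip(runs, means):
        coded = {"A": xa, "B": xb, "C": xc}
        sign = 1
        for factor in name:
            sign *= coded[factor]
        total += sign * mean
    return total / 4

for name in ("A", "B", "C", "AB", "AC", "BC", "ABC"):
    print(name, effect(name))
# A -10.875, B 3.125, C -3.125, AB -0.625, AC 1.125, BC 0.125, ABC 0.875
```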


CONCLUSIONS:

♦ Only the main effects are large; all interactions are small in comparison.

♦ The X1 and X3 main effects are negative, implying that to reduce the travel time the high levels of these factors should be used.

♦ The X2 main effect is positive, implying that to reduce the travel time the low level of X2 should be used.

Data Analysis of 2^k Experiment

By Ti Zhou

Based on the former discussion, we now generalize to k factors, each at two levels, and work through an example.

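• Assumption: n i.i.d. observations $y_{ij}$ (j = 1, 2, ..., n) at each ith treatment combination (i = 1, 2, ..., 2^k)

• Denote their sample means by $\bar y_i$ (i = 1, 2, ..., 2^k)

• Estimated effect:

$$\text{Est. effect} = \frac{\text{Contrast}}{2^{k-1}} = \frac{1}{2^{k-1}}\sum_{i=1}^{2^k} c_i\,\bar y_i$$

• $c_i$ is the contrast coefficient for the main effect: $c_i = +1$ if the factor is at its high level in treatment combination i, and $c_i = -1$ if it is at its low level.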

For example when k=3, the contrast coefficients are as follows:

Run A B C Treatment

1 - - - (1)

2 + - - a

3 - + - b

4 + + - ab

5 - - + c

6 + - + ac

7 - + + bc

8 + + + abc

The contrast coefficients for interactions are obtained by taking term-by-term products of the contrast vectors of corresponding main effects.

The contrast coefficients for the 2^4 experiment
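Since the rule just stated (interaction coefficients are term-by-term products of main-effect coefficients) fully determines the table, it can be generated for any k. The Python sketch below builds the sign columns for every effect of a 2^k design, with runs in standard order; `sign_table` is a hypothetical name, and letters A, B, C, ... are used for up to 10 factors.

```python
from itertools import combinations, product

def sign_table(k):
    """Contrast coefficients (+1/-1) for every main effect and interaction
    of a 2^k design, with the runs listed in standard order."""
    factors = "ABCDEFGHIJ"[:k]            # letter labels: supports k <= 10
    # Standard order: the first factor alternates fastest.
    runs = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]
    labels = []
    for run in runs:
        label = "".join(f.lower() for f, x in zip(factors, run) if x == 1)
        labels.append(label or "(1)")
    table = {}
    for size in range(1, k + 1):
        for effect in combinations(range(k), size):
            name = "".join(factors[i] for i in effect)
            column = []
            for run in runs:
                sign = 1
                for i in effect:
                    sign *= run[i]        # interaction = product of main-effect signs
                column.append(sign)
            table[name] = column
    return labels, table

labels, table = sign_table(4)
print(labels[:4], len(labels))   # ['(1)', 'a', 'b', 'ab'] 16
print(table["ABCD"][:4])         # [1, -1, -1, 1]
```

For k = 3 the same function reproduces the design and calculation matrix of the 2^3 section (for example, the AB column is + - - + + - - +).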

2^k Design Example

Problem statement: Generally, there are three important factors in designing a computer:

1. Memory size (A)

2. Cache size (B)

3. Number of processors (C)

A manufacturer wants to study the effects of these three factors, and their interactions, on the performance of computers. The levels of each factor are as follows:

Factor Level -1 Level 1

A: Memory size 4 MB 16 MB

B: Cache size 1 kB 2 kB

C: Number of processors 1 2

Computer Design Experiment

Run Treatment combination A B C (coded) Benchmark score: Replica 1 Replica 2

1 (1) -1 -1 -1 16.7 14.8

2 a 1 -1 -1 24.5 23.3

3 b -1 1 -1 13 11.6

4 ab 1 1 -1 34.2 33.6

5 c -1 -1 1 45.1 46.3

6 ac 1 -1 1 59.2 57.3

7 bc -1 1 1 51.4 49.3

8 abc 1 1 1 81.9 84.6

Factor levels: A (MB): Low (-1) = 4, High (+1) = 16; B (kB): 1, 2; C (units): 1, 2.


• The estimated effect for each factor and interaction can be calculated as follows:

$$\text{Est. effect} = \frac{\text{Contrast}}{2^{k-1}} = \frac{1}{2^{k-1}}\sum_{i=1}^{2^k} c_i\,\bar y_i$$

• For example, the main effect of A is

$$\hat A = \frac{-\bar y_{(1)} + \bar y_{a} - \bar y_{b} + \bar y_{ab} - \bar y_{c} + \bar y_{ac} - \bar y_{bc} + \bar y_{abc}}{4} = \frac{-15.75 + 23.9 - 12.3 + 33.9 - 45.7 + 58.25 - 50.35 + 83.25}{4} = 18.8$$

To summarize, all the effects are calculated:

Effect A B C AB AC BC ABC

Est. effect 18.8 9.05 37.925 8.45 3.925 5.775 1.725

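Since each effect has one degree of freedom, SS_effect = MS_effect = n 2^(k-2) (Est. effect)^2; with n = 2 and k = 3 this is SS_effect = 4 (Est. effect)^2.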

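• Statistical inference for the experiment:

$$SSE = \sum_{i=1}^{2^k}\sum_{j=1}^{n}(y_{ij}-\bar y_i)^2, \qquad MSE = s^2 = \frac{SSE}{2^k(n-1)}$$

$$SS_{\text{effect}} = MS_{\text{effect}} = n\,2^{k-2}(\text{Est. effect})^2$$

$$F = \frac{MS_{\text{effect}}}{MSE} = \frac{n\,2^{k-2}(\text{Est. effect})^2}{s^2} \sim f_{1,\,2^k(n-1)} \text{ under } H_0$$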

ANOVA Table for the Experiment of Design of Computer

Factors Est. effect SS effect (= MS effect) DF F ratio P-value

Memory Size (A) 18.8 1413.76 1 937.8176 <.0001

Cache Size (B) 9.05 327.61 1 217.3201 <.0001

Number of Processors (C) 37.925 5753.2225 1 3816.4 <.0001

AB 8.45 285.61 1 189.4594 <.0001

AC 3.925 61.6225 1 40.87728 0.0002

BC 5.775 133.4025 1 88.49254 <.0001

ABC 1.725 11.9025 1 7.895522 0.0228

Error 12.06 8 MSE = 1.5075

Total 7999.19 15
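As a cross-check of the table, this Python sketch pools the within-cell variation of the two replicates into SSE and then forms each effect's SS and F ratio from the estimated effects. P-values need the F distribution's survival function (for example `scipy.stats.f.sf(F, 1, 8)`), which is left out to keep the sketch dependency-free.

```python
# Replicate benchmark scores (replica 1, replica 2) in standard order
# (1), a, b, ab, c, ac, bc, abc, from the computer design table.
replicates = [
    (16.7, 14.8), (24.5, 23.3), (13.0, 11.6), (34.2, 33.6),
    (45.1, 46.3), (59.2, 57.3), (51.4, 49.3), (81.9, 84.6),
]
estimated_effects = {"A": 18.8, "B": 9.05, "C": 37.925, "AB": 8.45,
                     "AC": 3.925, "BC": 5.775, "ABC": 1.725}
n, k = 2, 3

# SSE pools the squared deviations of each replicate from its cell mean.
sse = 0.0
for cell in replicates:
    cell_mean = sum(cell) / n
    sse += sum((y - cell_mean) ** 2 for y in cell)
df_error = 2 ** k * (n - 1)
mse = sse / df_error

for name, est in estimated_effects.items():
    ss = n * 2 ** (k - 2) * est ** 2      # 1 d.f. per effect, so MS = SS
    print(f"{name}: SS = {ss:.4f}, F = {ss / mse:.2f}")
print(f"SSE = {sse:.2f} on {df_error} d.f., MSE = {mse:.4f}")
```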

Conclusion

• Increasing each of the three factors (memory, cache, and processors) has a positive estimated effect on the performance of a computer, as intuition suggests.

• The p-values for all main effects and interaction effects are small: every effect is significant at the 0.05 level, and all except ABC (p = 0.0228) have p-values of 0.0002 or less.

• The F ratio for factor C (number of processors) is by far the largest, indicating that it is the most important factor for the performance of a computer in this experiment.

Yates' Algorithm

Frank Yates (1902-1994) devised a systematic algorithm to perform the above calculations. We use the earlier computer design experiment to explain it.

1. List all the treatment combinations and their sample means in standard order.

2. Take the 2^(k-1) successive pairs of means; add each pair, then subtract the first member from the second. The sums fill the top half and the differences the bottom half of the column labeled I.

3. Repeat the same calculation on column I to form column II, and so on, k times in all; the last column is labeled k (column III here, since k = 3).

4. Divide the first entry in the last column by 2^k and the remaining entries by 2^(k-1). This gives the grand mean and all the estimated effects.

Treatment combination    Treatment mean    I    II    III    Estimated effect    SS for effect

(1) 15.75 39.65 85.85 323.4 40.425 (grand mean)

a 23.9 46.2 237.55 75.2 18.8 1413.76

b 12.3 103.95 29.75 36.2 9.05 327.61

ab 33.9 133.6 45.45 33.8 8.45 285.61

c 45.7 8.15 6.55 151.7 37.925 5753.2225

ac 58.25 21.6 29.65 15.7 3.925 61.6225

bc 50.35 12.55 13.45 23.1 5.775 133.4025

abc 83.25 32.9 20.35 6.9 1.725 11.9025
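The four steps of Yates' algorithm translate directly into a few lines of Python. This is a sketch assuming the treatment means are supplied in standard order; the function name `yates` is hypothetical, and the SS column uses n·2^(k-2)·(effect)² with n = 2 replicates as in the example.

```python
def yates(means):
    """Yates' algorithm for a 2^k factorial with treatment means in standard order.

    Returns the last column, the grand mean, and the estimated effects in
    standard order (A, B, AB, C, AC, BC, ABC, ... for k = 3).
    """
    size = len(means)
    k = size.bit_length() - 1
    if size < 2 or 2 ** k != size:
        raise ValueError("need 2^k treatment means with k >= 1")
    column = list(means)
    for _ in range(k):                        # one pass per column I, II, ..., k
        pairs = [(column[i], column[i + 1]) for i in range(0, size, 2)]
        sums = [first + second for first, second in pairs]
        diffs = [second - first for first, second in pairs]
        column = sums + diffs                 # sums on top, differences below
    grand_mean = column[0] / 2 ** k
    effects = [value / 2 ** (k - 1) for value in column[1:]]
    return column, grand_mean, effects

# Computer design example: treatment means in standard order.
means = [15.75, 23.9, 12.3, 33.9, 45.7, 58.25, 50.35, 83.25]
column, grand_mean, effects = yates(means)
print(round(grand_mean, 3))                  # 40.425
for name, est in zip(["A", "B", "AB", "C", "AC", "BC", "ABC"], effects):
    print(name, round(est, 3), round(2 * 2 ** (3 - 2) * est ** 2, 4))  # n = 2
```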

RELATED SAS PROGRAM & MODEL DIAGNOSTICS

BY TONY

SAS-----

•A more efficient way

•Very interesting

One Suggestion?

This experiment employed a 2^3 factorial design with two quantitative factors, temperature T and concentration C, and a single qualitative factor, type of catalyst K. Each run was duplicated, and the recorded data are the response yields y1 and y2 of the two duplicate runs. The data are as follows:

Example: The Pilot Plant Investigation

Run Number    Temperature T (°C)    Concentration C (%)    Catalyst K (A or B)    Yield y1 (%)    Yield y2 (%)

1 160 20 A 59 61

2 180 20 A 74 70

3 160 40 A 50 58

4 180 40 A 69 67

5 160 20 B 50 54

6 180 20 B 81 85

7 160 40 B 46 44

8 180 40 B 79 81

Coded levels:

Run Number    T    C    K    y1 (%)    y2 (%)

1 -1 -1 -1 59 61

2 1 -1 -1 74 70

3 -1 1 -1 50 58

4 1 1 -1 69 67

5 -1 -1 1 50 54

6 1 -1 1 81 85

7 -1 1 1 46 44

8 1 1 1 79 81

data plant;
input t c k result @@;   /* coded t, c, k and the yield; two observations per line */
tc=t*c;                  /* interaction columns are products of the coded levels */
tk=t*k;
ck=c*k;
tck=t*c*k;
datalines;
-1 -1 -1 59 -1 -1 -1 61
1 -1 -1 74 1 -1 -1 70
-1 1 -1 50 -1 1 -1 58
1 1 -1 69 1 1 -1 67
-1 -1 1 50 -1 -1 1 54
1 -1 1 81 1 -1 1 85
-1 1 1 46 -1 1 1 44
1 1 1 79 1 1 1 81
;
run;

Q: Can you input the data in a more concise way?

data plant;
do k=-1 to 1 by 2; do c=-1 to 1 by 2; do t=-1 to 1 by 2; do r=1 to 2;
   input result @@;   /* the two duplicate yields of each run; t varies fastest */
   ck=k*c; tk=k*t; tc=c*t; tck=k*c*t;
   output;
end; end; end; end;
datalines;
59 61 74 70 50 58 69 67 50 54 81 85 46 44 79 81
;
run;

Obs. k c t r Res. ck tk tc tck

1 -1 -1 -1 1 59 1 1 1 -1

2 -1 -1 -1 2 61 1 1 1 -1

3 -1 -1 1 1 74 1 -1 -1 1

4 -1 -1 1 2 70 1 -1 -1 1

5 -1 1 -1 1 50 -1 1 -1 1

6 -1 1 -1 2 58 -1 1 -1 1

7 -1 1 1 1 69 -1 -1 1 -1

proc glm data=plant;
class t c k tc tk ck tck;            /* coded factors and interaction columns */
model result = t c k tc tk ck tck;
run;

OUTPUT: The SAS System 19:23 Wednesday, December 1, 2007 30

The GLM Procedure

Class Level Information

Class Levels Values

t 2 -1 1

c 2 -1 1

k 2 -1 1

tc 2 -1 1

tk 2 -1 1

ck 2 -1 1

tck 2 -1 1

Number of Observations Read 16

Number of Observations Used 16


The GLM Procedure

Dependent Variable: result

Sum of

Source DF Squares Mean Square F Value Pr > F

Model 7 2635.000000 376.428571 47.05 <.0001

Error 8 64.000000 8.000000

Corrected Total 15 2699.000000

R-Square Coeff Var Root MSE result Mean

0.976288 4.402221 2.828427 64.25000

Source DF Type I SS Mean Square F Value Pr > F

t 1 2116.000000 2116.000000 264.50 <.0001

c 1 100.000000 100.000000 12.50 0.0077

k 1 9.000000 9.000000 1.13 0.3198

tc 1 9.000000 9.000000 1.13 0.3198

tk 1 400.000000 400.000000 50.00 0.0001

ck 1 0.000000 0.000000 0.00 1.0000

tck 1 1.000000 1.000000 0.13 0.7328

Source DF Type III SS Mean Square F Value Pr > F

t 1 2116.000000 2116.000000 264.50 <.0001

c 1 100.000000 100.000000 12.50 0.0077

k 1 9.000000 9.000000 1.13 0.3198

tc 1 9.000000 9.000000 1.13 0.3198

tk 1 400.000000 400.000000 50.00 0.0001

ck 1 0.000000 0.000000 0.00 1.0000

tck 1 1.000000 1.000000 0.13 0.7328

Conclusion:

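At the 0.05 significance level, the main effects of T and C and the TK interaction are significant. The revised model below drops the nonsignificant TC, CK, and TCK terms but keeps the main effect of K, which is involved in the significant TK interaction.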
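The Type I/III sums of squares in the GLM output can be cross-checked by hand: for a 2^k design with n replicates, SS_effect = (contrast of cell totals)² / (n·2^k), and each duplicate pair contributes (y1 - y2)²/2 to SSE. The Python sketch below applies this to the pilot plant data; `ss_effect` is a hypothetical helper, intended as a check rather than a replacement for PROC GLM.

```python
from itertools import product

# Duplicate yields for the pilot plant runs, in standard order
# (temperature t changes fastest, then concentration c, then catalyst k).
yields = [(59, 61), (74, 70), (50, 58), (69, 67),
          (50, 54), (81, 85), (46, 44), (79, 81)]
runs = [(t, c, k) for k, c, t in product((-1, 1), repeat=3)]
n, k_factors = 2, 3

def ss_effect(effect):
    """SS for one effect: (contrast of cell totals)^2 / (n * 2^k)."""
    contrast = 0
    for (t, c, k), pair in zip(runs, yields):
        coded = {"t": t, "c": c, "k": k}
        sign = 1
        for factor in effect:          # e.g. "tk" -> coded t times coded k
            sign *= coded[factor]
        contrast += sign * sum(pair)
    return contrast ** 2 / (n * 2 ** k_factors)

# Pure error: each duplicate pair contributes (y1 - y2)^2 / 2 on 1 d.f.
sse = sum((y1 - y2) ** 2 / 2 for y1, y2 in yields)
mse = sse / (2 ** k_factors * (n - 1))

for effect in ("t", "c", "k", "tc", "tk", "ck", "tck"):
    ss = ss_effect(effect)
    print(f"{effect}: SS={ss:.1f}  F={ss / mse:.2f}")
print(f"SSE={sse:.1f}  MSE={mse:.1f}")
```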

REVISION:

proc glm data=plant;
class t c k tk;
model result = t c k tk;
output out=plant2 r=resid p=predic;   /* save residuals and predicted values */
run;
proc plot data=plant2;
plot resid*predic;
run;

OUTPUT:

Source DF Type III SS Mean Square F Value Pr > F

t 1 2116.000000 2116.000000 314.54 <.0001

c 1 100.000000 100.000000 14.86 0.0027

k 1 9.000000 9.000000 1.34 0.2719

tk 1 400.000000 400.000000 59.46 <.0001

Constant Variance? Plot of resid*predic (legend: A = 1 obs, B = 2 obs, etc.): residuals on the vertical axis (from -4.5 to 3.5) plotted against predicted values on the horizontal axis (from 45 to 85).

CONCLUSION

From the plot of residuals against fitted values, we find that the residuals are fairly evenly scattered. Although the spread of some points appears uneven, the unevenness does not seem serious enough to reject the constant variance assumption.
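To see where the points in the residual plot come from, this Python sketch refits the reduced model (t, c, k, tk) and lists the fitted values and residuals. It relies on the ±1 coding making the design columns orthogonal in this balanced experiment, so each least-squares coefficient is simply the average of (column × yield); it is a shortcut for this design, not a general regression routine.

```python
from itertools import product

# Pilot plant yields in standard order (t fastest, then c, then k).
yields = [(59, 61), (74, 70), (50, 58), (69, 67),
          (50, 54), (81, 85), (46, 44), (79, 81)]
runs = [(t, c, k) for k, c, t in product((-1, 1), repeat=3)]
obs = [(t, c, k, y) for (t, c, k), pair in zip(runs, yields) for y in pair]
N = len(obs)

# Orthogonal +/-1 columns: each coefficient is mean(column * y),
# i.e. half of the corresponding estimated effect.
b0 = sum(y for _, _, _, y in obs) / N
bt = sum(t * y for t, _, _, y in obs) / N
bc = sum(c * y for _, c, _, y in obs) / N
bk = sum(k * y for _, _, k, y in obs) / N
btk = sum(t * k * y for t, _, k, y in obs) / N

fitted = [b0 + bt * t + bc * c + bk * k + btk * t * k for t, c, k, _ in obs]
residuals = [row[3] - f for row, f in zip(obs, fitted)]

for f, r in sorted(zip(fitted, residuals)):
    print(f"predicted={f:6.2f}  residual={r:5.2f}")
```

The residual sum of squares is 74 on 11 d.f., which is the error term behind the F values in the revised output.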

Normal Probability Plot of the residuals: vertical axis from -4.5 to 3.5, horizontal axis (normal quantiles) from -2 to +2; * marks the residuals and + marks the reference line.

CONCLUSION: The residuals appear to come from a population that is normal or approximately normal.

proc univariate data=plant2 plot;   /* PLOT adds stem-and-leaf, box, and normal probability plots */
var resid;
run;

Is the population normal?

Regression Approach & Summary

Wenbin Zhang

• Purpose of the Regression Approach

• Regression Approach to Two-Factor Experiments

• Regression Approach to 2^k Experiments

• An Example of Regression on 2^3 Experiments

• Summary of Chapter 13

Purpose of the Regression Approach

• To provide a unified approach to the analysis of balanced or unbalanced designs

• To predict the responses at specified combinations of the levels of the experimental factors

Regression Approach to Two-Factor Experiments

• Define indicator variables:

For i = 1, ..., a-1: $u_i = +1$ if the observation is from the ith row, $u_i = -1$ if it is from the ath row, and $u_i = 0$ otherwise.

For j = 1, ..., b-1: $v_j = +1$ if the observation is from the jth column, $v_j = -1$ if it is from the bth column, and $v_j = 0$ otherwise.

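Regression Approach to Two-Factor Experiments

• Regression model:

$$Y = \mu + \sum_{i=1}^{a-1}\tau_i u_i + \sum_{j=1}^{b-1}\beta_j v_j + \sum_{i=1}^{a-1}\sum_{j=1}^{b-1}(\tau\beta)_{ij}u_i v_j + \epsilon$$

– The indicator variables $u_i, v_j$ are the predictor variables; $\mu, \tau_i, \beta_j, (\tau\beta)_{ij}$ are unknown parameters.

• The regression model is equivalent to the model we have examined (for an observation in the last row, the -1 codes give $-\sum_{i<a}\tau_i = \tau_a$, since the $\tau_i$ sum to zero; likewise for the last column):

$$Y_{ijk} = \mu_{ij} + \epsilon_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk}$$

$$\mu = \bar\mu_{\cdot\cdot}, \quad \tau_i = \bar\mu_{i\cdot} - \bar\mu_{\cdot\cdot}, \quad \beta_j = \bar\mu_{\cdot j} - \bar\mu_{\cdot\cdot}, \quad (\tau\beta)_{ij} = \mu_{ij} - \bar\mu_{i\cdot} - \bar\mu_{\cdot j} + \bar\mu_{\cdot\cdot}$$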

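Regression Approach to 2^k Experiments

• For 2^2 experiments, define indicator variables x1 and x2 to represent the levels of A and B: x1 = -1 if A is low and +1 if A is high; x2 = -1 if B is low and +1 if B is high.

• Regression model:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$

• Because moving a factor from its low to its high level changes its coded variable by 2, each coefficient is half the corresponding effect:

$$\beta_0 = \mu, \quad \beta_1 = \frac{A}{2}, \quad \beta_2 = \frac{B}{2}, \quad \beta_{12} = \frac{AB}{2}$$

• Estimated with the main effects and the interaction effect:

$$\hat\beta_0 = \bar y, \quad \hat\beta_1 = \frac{\hat A}{2}, \quad \hat\beta_2 = \frac{\hat B}{2}, \quad \hat\beta_{12} = \frac{\widehat{AB}}{2}$$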

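Regression Approach to 2^k Experiments

• Similarly, the full regression model for k = 3 is:

$$E(Y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_{12}x_1x_2 + \beta_{13}x_1x_3 + \beta_{23}x_2x_3 + \beta_{123}x_1x_2x_3$$

• Substituting the estimated main effects and interaction effects:

$$\hat y = \bar y + \frac{\hat A}{2}x_1 + \frac{\hat B}{2}x_2 + \frac{\hat C}{2}x_3 + \frac{\widehat{AB}}{2}x_1x_2 + \frac{\widehat{AC}}{2}x_1x_3 + \frac{\widehat{BC}}{2}x_2x_3 + \frac{\widehat{ABC}}{2}x_1x_2x_3$$

• The reduced model after dropping all interactions from the full model:

$$\hat y = \bar y + \frac{\hat A}{2}x_1 + \frac{\hat B}{2}x_2 + \frac{\hat C}{2}x_3$$

– The parameter estimates are unchanged because of the orthogonal nature of the design.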

Regression Approach to 2^k Experiments

• To predict the response at specified combinations of the levels of the experimental factors:

– For numerical factors, use the interpolation formula

$$x_i = \frac{\text{Specified level} - \text{Average level}}{\text{Range}/2} = \frac{\text{Specified level} - (\text{High} + \text{Low})/2}{(\text{High} - \text{Low})/2}$$

– For nominal factors (e.g., either on or off), no interpolation is needed: $x_i$ is simply -1 or +1.
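The interpolation formula is a one-line function; this Python sketch (with the hypothetical name `coded_level`) maps an actual setting of a numerical factor onto the coded scale, using the bicycle factor ranges for illustration.

```python
def coded_level(specified, low, high):
    """Convert an actual factor setting to the coded -1..+1 scale:
    x = (specified - average level) / (range / 2)."""
    average = (high + low) / 2
    half_range = (high - low) / 2
    return (specified - average) / half_range

print(coded_level(27, 26, 30))   # seat height 27" between 26" and 30": -0.5
print(coded_level(50, 40, 55))   # tire pressure 50 psi between 40 and 55 psi: about 0.333
```

The low and high levels map to -1 and +1 exactly; settings outside the experimental range would give |x| > 1, i.e. an extrapolation.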

An Example of Regression on 2^3 Experiments

• The effects of bicycle seat height, generator use, and tire pressure on the time taken to make a half-block uphill run.

• The levels of the factors:

Seat height (Factor A): 26”(-), 30”(+)

Generator (Factor B): Off(-), On(+)

Tire pressure (Factor C): 40psi(-), 55psi(+)

• The data are shown in the next slide.

• Questions:

– 1) Predict the minimum travel time using the regression model.

– 2) Predict the travel time when seat height = 27”, the generator is off, and tire pressure = 50psi.

An Example of Regression on 2^3 Experiments

Travel Times from Bicycle Experiment

Factor Time (Secs.)

A B C Run 1 Run 2 Mean

- - - 51 54 52.5

+ - - 41 43 42.0

- + - 54 60 57.0

+ + - 44 43 43.5

- - + 50 48 49.0

+ - + 39 39 39.0

- + + 53 51 52.0

+ + + 41 44 42.5

An Example of Regression on 2^3 Experiments

• Because all interactions were found nonsignificant, we only consider main effects:

$$\hat A = \frac{(42.5-52.0)+(39.0-49.0)+(43.5-57.0)+(42.0-52.5)}{4} = -10.875$$

$$\hat B = \frac{(42.5-39.0)+(52.0-49.0)+(43.5-42.0)+(57.0-52.5)}{4} = 3.125$$

$$\hat C = \frac{(42.5-43.5)+(52.0-57.0)+(39.0-42.0)+(49.0-52.5)}{4} = -3.125$$

$$\bar y = 47.1875$$

$$\hat y = \bar y + \frac{\hat A}{2}x_1 + \frac{\hat B}{2}x_2 + \frac{\hat C}{2}x_3 = 47.1875 - 5.4375x_1 + 1.5625x_2 - 1.5625x_3$$

An Example of Regression on 2^3 Experiments

• 1) The minimum predicted travel time is obtained at x1 = +1, x2 = -1, x3 = +1:

$$\hat y = 47.1875 - 5.4375(+1) + 1.5625(-1) - 1.5625(+1) = 38.625 \text{ sec}$$

• 2) When seat height = 27”, the generator is off, and tire pressure = 50psi, we have

$$x_1 = \frac{27 - (30+26)/2}{(30-26)/2} = -0.5, \quad x_2 = -1, \quad x_3 = \frac{50 - (55+40)/2}{(55-40)/2} = 0.333$$

$$\hat y = 47.1875 - 5.4375(-0.5) + 1.5625(-1) - 1.5625(0.333) = 47.823 \text{ sec}$$
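Both predictions can be reproduced with a small Python sketch of the main-effects model. It assumes prediction stays within the experimental region (coded levels in [-1, 1]); since the model is linear, its minimum over that cube is at one of the 8 corners, so checking the corners suffices. The names `predict`, `YBAR`, and `EFFECT_*` are illustrative.

```python
from itertools import product

# Main-effects model from the bicycle example (coded x1, x2, x3 in [-1, 1]):
# y_hat = ybar + (A/2) x1 + (B/2) x2 + (C/2) x3
YBAR = 47.1875
EFFECT_A, EFFECT_B, EFFECT_C = -10.875, 3.125, -3.125

def predict(x1, x2, x3):
    """Predicted travel time (sec) at coded levels x1, x2, x3."""
    return YBAR + EFFECT_A / 2 * x1 + EFFECT_B / 2 * x2 + EFFECT_C / 2 * x3

# 1) A linear model reaches its minimum over the design region at a corner,
#    so it is enough to check the 8 coded corner points.
best = min(product((-1, 1), repeat=3), key=lambda x: predict(*x))
print(best, predict(*best))                    # (1, -1, 1) 38.625

# 2) Seat height 27" -> x1 = -0.5, generator off -> x2 = -1, 50 psi -> x3 = 1/3.
print(round(predict(-0.5, -1, 1 / 3), 3))      # 47.823
```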

Summary

This chapter showed how to design multifactor experiments and how to analyze the data collected from them.

1. The designs studied here are complete factorial designs, as opposed to fractional factorial designs.

2. A 2^k experiment is better than a one-factor-at-a-time experiment because it can detect interactions.

3. We discussed two-factor experiments with an arbitrary number of levels per factor and k-factor experiments with two levels per factor. Two-factor experiments are considered for both balanced and unbalanced designs; 2^k experiments are considered only for balanced designs.

Summary (cont.)

4. Once the data of our experiment are collected, we estimate the parameters of our linear model and partition the total sum of squares:

SST = SS(all main effects) + SS(all interaction effects) + SSE (noise)

5. The ANOVA table organizes this partition and gives us the significance of each factor and interaction.

6. We can also use the regression approach to estimate the parameters of the model above.

Thank you!

Any Questions?
