ANOVA


The same slopes, but different intercepts

[Figure: scatter plot of sepal.length (y-axis, 4.5 to 8.0) against sepal.width (x-axis, 2.0 to 4.0) for the three iris species: setosa, versicolor, and virginica.]

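Model with a common slope and a separate intercept for each species:

$$Y_{kj} = \mu_k + \beta X_{kj} + \varepsilon_{kj}, \qquad j = 1, 2, \ldots, n \ (n = 50), \quad k = 1, 2, 3 \ (K = 3), \qquad \varepsilon_{kj} \overset{iid}{\sim} N(0, \sigma^2)$$

Writing $\delta_k = \mu_k - \mu_1$ and taking setosa as the base level:

$$Y_{1j} = \mu_1 + \beta X_{1j} + \varepsilon_{1j} \quad \text{(for setosa)}$$
$$Y_{2j} = \mu_1 + \delta_2 + \beta X_{2j} + \varepsilon_{2j} \quad \text{(for versicolor)}$$
$$Y_{3j} = \mu_1 + \delta_3 + \beta X_{3j} + \varepsilon_{3j} \quad \text{(for virginica)}$$

In matrix form (each row pattern is repeated for $j = 1, \ldots, n$):

$$\begin{pmatrix} Y_{1j} \\ Y_{2j} \\ Y_{3j} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & X_{1j} \\ 1 & 1 & 0 & X_{2j} \\ 1 & 0 & 1 & X_{3j} \end{pmatrix} \begin{pmatrix} \mu_1 \\ \delta_2 \\ \delta_3 \\ \beta \end{pmatrix} + \begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \\ \varepsilon_{3j} \end{pmatrix}$$

The second and third columns of the design matrix are the dummy variables that encode the factor (categorical) variable Species (setosa, versicolor, virginica).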

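The dummy-variable coding above can be inspected directly in R. The following is a minimal sketch, assuming the built-in iris data; the object names fit, X, b, and intercepts are illustrative and do not come from the slides.

```r
# Common slope for Sepal.Width, separate intercept for each species.
fit <- lm(Sepal.Length ~ Species + Sepal.Width, data = iris)

# Design matrix: intercept, two dummy variables, and the covariate.
X <- model.matrix(fit)
print(colnames(X))
print(X[c(1, 51, 101), ])   # one row from each species

# Intercept of each fitted line: base level (setosa) plus the dummy coefficient.
b <- coef(fit)
intercepts <- c(setosa     = unname(b["(Intercept)"]),
                versicolor = unname(b["(Intercept)"] + b["Speciesversicolor"]),
                virginica  = unname(b["(Intercept)"] + b["Speciesvirginica"]))
print(intercepts)
```

R takes the first factor level (setosa, alphabetically first) as the base level, so its dummy columns are all zero.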

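ANOVA: Analysis of Variance

A statistical method for comparing the (population) means of several groups.

For comparing two groups, the t-test is applicable. In this view, ANOVA is a generalization of the t-test. Despite its name, ANOVA does not aim to compare variances.

One-way model:

$$Y_{kj} = \mu_k + \varepsilon_{kj}$$

or, with group 1 as the base level and $\delta_k = \mu_k - \mu_1$:

$$Y_{1j} = \mu_1 + \varepsilon_{1j}, \qquad Y_{2j} = \mu_1 + \delta_2 + \varepsilon_{2j}, \qquad Y_{3j} = \mu_1 + \delta_3 + \varepsilon_{3j}$$

Equivalence of group means:

$$H_0: \mu_1 = \mu_2 = \mu_3 \qquad H_1: \text{not } (\mu_1 = \mu_2 = \mu_3)$$

Equivalent form in terms of the differences from the base level (R adopts this convention):

$$H_0: \delta_2 = \delta_3 = 0 \qquad H_1: \text{not } (\delta_2 = \delta_3 = 0)$$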

> is.factor(iris$Species)
[1] TRUE
> rout <- lm(Sepal.Length ~ Species, data = iris)
> anova(rout)
Analysis of Variance Table

Response: Sepal.Length
           Df Sum Sq Mean Sq F value    Pr(>F)
Species     2 63.212  31.606  119.26 < 2.2e-16 ***
Residuals 147 38.956   0.265
> summary(rout)

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         5.0060     0.0728  68.762  < 2e-16 ***
Speciesversicolor   0.9300     0.1030   9.033 8.77e-16 ***
Speciesvirginica    1.5820     0.1030  15.366  < 2e-16 ***

Residual standard error: 0.5148 on 147 degrees of freedom
Multiple R-squared: 0.6187, Adjusted R-squared: 0.6135
F-statistic: 119.3 on 2 and 147 DF, p-value: < 2.2e-16

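These results support $H_1: \text{not } (\delta_2 = \delta_3 = 0)$, where $\delta_2$ and $\delta_3$ are the differences of the versicolor and virginica group means from the setosa mean (the Speciesversicolor and Speciesvirginica coefficients). In particular, $\delta_2 \neq 0$ and $\delta_3 \neq 0$.

That is, the group means of Sepal.Length are not all the same (at least one group has a different mean from the other two groups).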
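As a quick check of this parameterization, the coefficients of the one-way fit can be compared with the raw group means. This is a minimal sketch using the built-in iris data; the names means, b, and diffs are illustrative.

```r
# One-way ANOVA as a linear model with setosa as the base level.
rout  <- lm(Sepal.Length ~ Species, data = iris)
b     <- coef(rout)

# Raw group means of Sepal.Length.
means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(means)

# The intercept is the base-level mean; the other coefficients are
# differences of each group mean from the base-level mean.
diffs <- means[c("versicolor", "virginica")] - means["setosa"]
print(rbind(coefficient = unname(b[2:3]), mean_difference = unname(diffs)))
```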

Tomato data

    weight  treatment
 1    1.50  water
 2    1.90  water
 3    1.30  water
 4    1.50  water
 5    2.40  water
 6    1.50  water
 7    1.50  Nutrient
 8    1.20  Nutrient
 9    1.20  Nutrient
10    2.10  Nutrient
11    2.90  Nutrient
12    1.60  Nutrient
13    1.90  Nutrient+24D
14    1.60  Nutrient+24D
15    0.80  Nutrient+24D
16    1.15  Nutrient+24D
17    0.90  Nutrient+24D
18    1.60  Nutrient+24D

Comparison of the weights of tomatoes according to the treatment (trt).

There are 3 treatment groups: water, nutrient, and nutrient with the 2,4-D component.

The aim of this study is to see whether the nutrient and the 2,4-D increase (or decrease) the weight of tomatoes.

> x <- c(1.5, 1.9, 1.3, 1.5, 2.4, 1.5, 1.5, 1.2, 1.2, 2.1, 2.9, 1.6,
+        1.9, 1.6, 0.8, 1.15, 0.9, 1.6)
> tx <- rep(c("water", "Nutrient", "Nutrient+24D"), c(6, 6, 6))
> # factor() is explicit because data.frame() no longer converts strings to factors in R >= 4.0
> ( tomato <- data.frame(weight = x, trt = factor(tx)) )

> stripchart(weight ~ trt, pch = 16, cex = 1.4, col = "red", data = tomato)
> with(tomato, points(weight, trt, pch = 16, cex = 0.6, col = "yellow"))

[Figure: strip chart of weight (x-axis, 1.0 to 2.5) by treatment (water, Nutrient, Nutrient+24D) for the tomato data.]

> is.factor(tomato$trt)
[1] TRUE

> (sout1 <- lm(weight ~ trt, data = tomato))

Call:
lm(formula = weight ~ trt, data = tomato)

Coefficients:
    (Intercept)  trtNutrient+24D         trtwater
        1.75000         -0.42500         -0.06667

> tomato$trt <- relevel(tomato$trt, ref = "water")
> (sout2 <- lm(weight ~ trt, data = tomato))

Coefficients:
    (Intercept)      trtNutrient  trtNutrient+24D
        1.68333          0.06667         -0.35833

> anova(sout1)
Analysis of Variance Table

Response: weight
          Df Sum Sq Mean Sq F value Pr(>F)
trt        2 0.6269 0.31347  1.2019  0.328
Residuals 15 3.9121 0.26081

sout1 and sout2 are the same analysis, but use different base levels.
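The claim that changing the base level does not change the analysis can be checked numerically. The sketch below rebuilds the tomato data from the values listed above; tomato2, same_fit, and same_F are illustrative names.

```r
# Tomato data as listed in the slides.
x  <- c(1.5, 1.9, 1.3, 1.5, 2.4, 1.5, 1.5, 1.2, 1.2, 2.1, 2.9, 1.6,
        1.9, 1.6, 0.8, 1.15, 0.9, 1.6)
tx <- rep(c("water", "Nutrient", "Nutrient+24D"), c(6, 6, 6))
tomato <- data.frame(weight = x, trt = factor(tx))

# Default base level: first level in alphabetical order ("Nutrient").
sout1 <- lm(weight ~ trt, data = tomato)

# Same model with "water" as the base level.
tomato2 <- transform(tomato, trt = relevel(trt, ref = "water"))
sout2   <- lm(weight ~ trt, data = tomato2)

# Fitted values and the overall F statistic do not depend on the base level.
same_fit <- all.equal(unname(fitted(sout1)), unname(fitted(sout2)))
same_F   <- all.equal(anova(sout1)[["F value"]][1], anova(sout2)[["F value"]][1])
print(c(same_fit = same_fit, same_F = same_F))
print(coef(sout2))
```

Only the interpretation of the individual coefficients changes: each is a difference from whichever level is chosen as the base.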

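Two parameterizations of the same one-way model (group 1: water, group 2: nutrient, group 3: nutrient+24D).

With group 1 as the base level:

$$Y_{1j} = \mu_1 + \varepsilon_{1j} \ \text{(for water)}, \qquad Y_{2j} = \mu_1 + \delta_2 + \varepsilon_{2j} \ \text{(for nutrient)}, \qquad Y_{3j} = \mu_1 + \delta_3 + \varepsilon_{3j} \ \text{(for nutrient+24D)}$$

$$\begin{pmatrix} Y_{1j} \\ Y_{2j} \\ Y_{3j} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \delta_2 \\ \delta_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \\ \varepsilon_{3j} \end{pmatrix}$$

With group 3 as the base level:

$$Y_{1j} = \mu_3 + \delta_1 + \varepsilon_{1j}, \qquad Y_{2j} = \mu_3 + \delta_2 + \varepsilon_{2j}, \qquad Y_{3j} = \mu_3 + \varepsilon_{3j}$$

$$\begin{pmatrix} Y_{1j} \\ Y_{2j} \\ Y_{3j} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \mu_3 \\ \delta_1 \\ \delta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \\ \varepsilon_{3j} \end{pmatrix}$$

In both matrix forms each row pattern is repeated for $j = 1, \ldots, n$.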

> summary(sout1)

Residuals:
    Min      1Q  Median      3Q     Max
-0.5500 -0.3500 -0.1792  0.2750  1.1500

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.75000    0.20849   8.394 4.74e-07 ***
trtNutrient+24D -0.42500    0.29485  -1.441    0.170
trtwater        -0.06667    0.29485  -0.226    0.824

> summary(sout2)

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.68333    0.20849   8.074 7.69e-07 ***
trtNutrient      0.06667    0.29485   0.226    0.824
trtNutrient+24D -0.35833    0.29485  -1.215    0.243

Residual standard error: 0.5107 on 15 degrees of freedom
Multiple R-squared: 0.1381, Adjusted R-squared: 0.02321
F-statistic: 1.202 on 2 and 15 DF, p-value: 0.328

Everything is the same for sout1 and sout2 except the base levels.

There is no clear evidence that either non-baseline treatment coefficient differs from zero (neither $\delta_1 \neq 0$ nor $\delta_2 \neq 0$ is supported, writing $\delta_1$ and $\delta_2$ for the two non-baseline coefficients).

> anova(sout2)
Analysis of Variance Table

Response: weight
          Df Sum Sq Mean Sq F value Pr(>F)
trt        2 0.6269 0.31347  1.2019  0.328
Residuals 15 3.9121 0.26081

ANOVA table

Source             Sum of Sq.  df    MS    F         P-value
Treatment          SSTR        K-1   MSTR  MSTR/MSE  F(K-1, N-K)
Error (Residuals)  SSE         N-K   MSE
Total              SST         N-1

$$SST = SSTR + SSE$$
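The entries of the ANOVA table can be computed by hand. The following is a minimal sketch for the tomato data, with illustrative variable names (gmean, n_k, Fval, pval); it follows the table above and is not the internal algorithm of anova().

```r
# Tomato data as listed in the slides.
weight <- c(1.5, 1.9, 1.3, 1.5, 2.4, 1.5, 1.5, 1.2, 1.2, 2.1, 2.9, 1.6,
            1.9, 1.6, 0.8, 1.15, 0.9, 1.6)
trt <- factor(rep(c("water", "Nutrient", "Nutrient+24D"), c(6, 6, 6)))

N <- length(weight)   # total number of observations
K <- nlevels(trt)     # number of treatment groups

grand <- mean(weight)
gmean <- tapply(weight, trt, mean)     # group means
n_k   <- tapply(weight, trt, length)   # group sizes

SSTR <- sum(n_k * (gmean - grand)^2)                  # between-group variation
SSE  <- sum((weight - gmean[as.character(trt)])^2)    # within-group variation
SST  <- sum((weight - grand)^2)                       # total variation

MSTR <- SSTR / (K - 1)
MSE  <- SSE / (N - K)
Fval <- MSTR / MSE
pval <- pf(Fval, K - 1, N - K, lower.tail = FALSE)    # upper tail of F(K-1, N-K)

print(c(SSTR = SSTR, SSE = SSE, SST = SST, F = Fval, p = pval))
```

The printed values should agree with the anova(sout1) table shown earlier.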

A company specializing in preparing students for college entrance exams had the business objective of improving its ACT preparatory course. Two factors of interest to the company are the length of the course (a condensed 10-day period or a regular 30-day period) and the type of course (traditional classroom or online distance learning). The company collected data by randomly assigning 10 clients to each of the 4 cells formed by the combinations of the two factors. What are the effects of the type of course and the length of the course on ACT scores?

ACT score data (artificial data)

                 Condensed (C)                            Regular (R)
Traditional (T)  26, 18, 27, 24, 25, 19, 21, 20, 21, 18   34, 28, 24, 21, 35, 23, 31, 29, 28, 26
Online (O)       27, 21, 29, 32, 30, 20, 24, 28, 30, 29   24, 21, 16, 19, 22, 19, 20, 24, 23, 25

> y <- c(26,18,34,28,27,24,24,21,25,19,35,23,21,20,31,29,21,18,28,26,27,21,
+        24,21,29,32,16,19,30,20,22,19,24,28,20,24,30,29,23,25)
> ltx <- c("C","C","R","R","C","C","R","R","C","C","R","R","C","C","R","R",
+          "C","C","R","R","C","C","R","R","C","C","R","R","C","C","R","R","C","C",
+          "R","R","C","C","R","R")
> tpx <- c("T","T","T","T","T","T","T","T","T","T","T","T","T","T","T","T",
+          "T","T","T","T","O","O","O","O","O","O","O","O","O","O","O","O","O","O",
+          "O","O","O","O","O","O")
> act <- data.frame(score=y, length=ltx, type=tpx)

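Two-way model for the ACT data. The first index is the length of the course (1: Condensed, 2: Regular), the second index is the type of course (1: Online, 2: Traditional), and $j = 1, 2, \ldots, 10$ $(n = 10)$.

$$Y_{11j} = \mu_{11} + \varepsilon_{11j} \quad \text{(for Condensed \& Online)}$$
$$Y_{12j} = \mu_{11} + \delta_{12} + \varepsilon_{12j} \quad \text{(for Condensed \& Traditional)}$$
$$Y_{21j} = \mu_{11} + \delta_{21} + \varepsilon_{21j} \quad \text{(for Regular \& Online)}$$
$$Y_{22j} = \mu_{11} + \delta_{21} + \delta_{12} + \varepsilon_{22j} \quad \text{(for Regular \& Traditional)}$$

When an interaction effect is assumed:

$$Y_{22j} = \mu_{11} + \delta_{21} + \delta_{12} + \delta_{22} + \varepsilon_{22j}$$

In matrix form, for the interaction model (each row pattern is repeated for $j = 1, \ldots, n$):

$$\begin{pmatrix} Y_{11j} \\ Y_{12j} \\ Y_{21j} \\ Y_{22j} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \mu_{11} \\ \delta_{12} \\ \delta_{21} \\ \delta_{22} \end{pmatrix} + \begin{pmatrix} \varepsilon_{11j} \\ \varepsilon_{12j} \\ \varepsilon_{21j} \\ \varepsilon_{22j} \end{pmatrix}$$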

No interaction model for ACT data

> head(act)
  score length type
1    26      C    T
2    18      C    T
3    34      R    T
4    28      R    T
5    27      C    T
6    24      C    T

This model does not appear adequate.

> aout1 <- aov(score~length+type, data=act)
> summary(aout1)
            Df Sum Sq Mean Sq F value Pr(>F)
length       1   0.22   0.225  0.0098 0.9217
type         1   5.63   5.625  0.2448 0.6237
Residuals   37 850.13  22.976
> summary.lm(aout1)

Call:
aov(formula = score ~ length + type, data = act)

Residuals:
   Min     1Q Median     3Q    Max
-8.225 -3.862 -0.225  3.250 10.025

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   24.075      1.313  18.340   <2e-16 ***
lengthR        0.150      1.516   0.099    0.922
typeT          0.750      1.516   0.495    0.624
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.793 on 37 degrees of freedom
Multiple R-squared: 0.006834, Adjusted R-squared: -0.04685
F-statistic: 0.1273 on 2 and 37 DF, p-value: 0.8808

Two-way ANOVA without interaction model

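This result shows that the no-interaction model is not very meaningful for the ACT data. In this model, we may accept that both main-effect coefficients (lengthR and typeT) are zero: $\delta_{21} = \delta_{12} = 0$.

In the interaction model, by contrast, all the effects look significant. That is, the lengthR and typeT coefficients are clearly negative and the lengthR:typeT coefficient is clearly positive.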

> aout2 <- aov(score~length*type, data=act)
> summary(aout2)
            Df Sum Sq Mean Sq F value    Pr(>F)
length       1   0.22    0.22  0.0159    0.9002
type         1   5.63    5.63  0.3987    0.5318
length:type  1 342.22  342.22 24.2569 1.888e-05 ***
Residuals   36 507.90   14.11
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary.lm(aout2)

Call:
aov(formula = score ~ length * type, data = act)

Residuals:
   Min     1Q Median     3Q    Max
-7.000 -2.450  0.100  2.775  7.100

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     27.000      1.188  22.731  < 2e-16 ***
lengthR         -5.700      1.680  -3.393  0.00169 **
typeT           -5.100      1.680  -3.036  0.00444 **
lengthR:typeT   11.700      2.376   4.925 1.89e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.756 on 36 degrees of freedom
Multiple R-squared: 0.4066, Adjusted R-squared: 0.3572
F-statistic: 8.224 on 3 and 36 DF, p-value: 0.0002686

Two-way ANOVA with interaction model

In R, length*type = length+type+length:type

In R, length:type means interaction effect between length and type.

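This model fits well: the interaction term is highly significant, and the adjusted R-squared rises from -0.047 (no-interaction model) to 0.357. The coefficients of typeT and lengthR ($\delta_{12}$, $\delta_{21}$) are negative and the interaction coefficient ($\delta_{22}$) is positive.

Conclusion: For the traditional (T) type, the course of regular (R) length gave better scores; for the online (O) type, the course of condensed (C) length gave better scores.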
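The conclusion can be read off the four cell means, which the interaction model reproduces exactly. The sketch below rebuilds the ACT data cell by cell from the table above (so the row order differs from the act data frame in the slides); cell_means and b are illustrative names.

```r
# ACT scores entered cell by cell (10 clients per cell).
act <- data.frame(
  score  = c(26, 18, 27, 24, 25, 19, 21, 20, 21, 18,   # Traditional, Condensed
             34, 28, 24, 21, 35, 23, 31, 29, 28, 26,   # Traditional, Regular
             27, 21, 29, 32, 30, 20, 24, 28, 30, 29,   # Online, Condensed
             24, 21, 16, 19, 22, 19, 20, 24, 23, 25),  # Online, Regular
  length = factor(rep(c("C", "R", "C", "R"), each = 10)),
  type   = factor(rep(c("T", "O"), each = 20))
)

# Observed mean score in each length-by-type cell.
cell_means <- with(act, tapply(score, list(length, type), mean))
print(cell_means)

# Interaction model: base cell is Condensed & Online.
b <- coef(lm(score ~ length * type, data = act))
print(b)

# The interaction coefficient is a difference of differences of cell means.
dd <- (cell_means["R", "T"] - cell_means["C", "T"]) -
      (cell_means["R", "O"] - cell_means["C", "O"])
print(dd)
```

Within the Traditional column the Regular course has the higher mean, and within the Online column the Condensed course has the higher mean, which is the crossing pattern behind the significant interaction.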

ACT score data

> with(act, interaction.plot(length, type,score, xlab="First factor"))

[Figure: interaction plot of mean of score (y-axis, 22 to 28) against the first factor, length (C, R), with one line for each type (T, O).]

Decompositions of variations in two-way ANOVA

No interaction model:

$$SST = SSTRA + SSTRB + SSE$$

Interaction model:

$$SST = SSTRA + SSTRB + SSTRAB + SSE$$

ANOVA table, no interaction model:

Source             Sum of Sq.  df         MS     F          P-value
Treatment A        SSTRA       KA-1       MSTRA  MSTRA/MSE  F(KA-1, N-KA-KB+1)
Treatment B        SSTRB       KB-1       MSTRB  MSTRB/MSE  F(KB-1, N-KA-KB+1)
Error (Residuals)  SSE         N-KA-KB+1  MSE
Total              SST         N-1

ANOVA table, interaction model (the error degrees of freedom are N-KA·KB, so the F distributions use N-KA·KB as well):

Source             Sum of Sq.  df            MS      F           P-value
Treatment A        SSTRA       KA-1          MSTRA   MSTRA/MSE   F(KA-1, N-KA·KB)
Treatment B        SSTRB       KB-1          MSTRB   MSTRB/MSE   F(KB-1, N-KA·KB)
Treatment A:B      SSTRAB      (KA-1)(KB-1)  MSTRAB  MSTRAB/MSE  F((KA-1)(KB-1), N-KA·KB)
Error (Residuals)  SSE         N-KA·KB       MSE
Total              SST         N-1
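The decomposition and the degrees of freedom of the interaction model can be verified on any balanced two-way layout. The sketch below uses R's built-in warpbreaks data (2 wool types by 3 tension levels, 9 observations per cell) purely as a stand-in; it is not one of the datasets in these slides.

```r
# Balanced two-way layout with interaction.
fit <- aov(breaks ~ wool * tension, data = warpbreaks)
tab <- summary(fit)[[1]]   # ANOVA table as a data frame
print(tab)

N  <- nrow(warpbreaks)
KA <- nlevels(warpbreaks$wool)
KB <- nlevels(warpbreaks$tension)

# Total sum of squares computed directly from the response.
SST <- sum((warpbreaks$breaks - mean(warpbreaks$breaks))^2)

# SSTRA + SSTRB + SSTRAB + SSE should reproduce SST.
SS_sum <- sum(tab[["Sum Sq"]])

# Degrees of freedom expected from the interaction-model table.
df_expected <- c(KA - 1, KB - 1, (KA - 1) * (KB - 1), N - KA * KB)
print(rbind(from_aov = tab[["Df"]], expected = df_expected))
```

For unbalanced data the sequential sums of squares still add up to SST, but their values then depend on the order of the terms in the formula.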

Yesterday, YD discovered the secret diary written by R. A. Fisher.

R. A. Fisher made a note on his iris data in the diary. He mentioned that he collected the data over five days. On each day he picked 10 irises of each of the 3 species (varieties) at random from his garden, and measured the lengths of the sepals and petals of the selected 30 flowers.

From the note in the diary, YD recovered a new variable, date, which records the day on which R. A. Fisher measured each flower, and added it to the iris data. The new dataset is named irix.

The sizes of sepals and petals vary with conditions that change from day to day, such as temperature and humidity.

> dtx <- c(3,5,4,5,1,5,4,1,4,4,5,2,3,3,3,1,3,4,3,5,5,3,4,2,2,5,4,1,5,1,2,3,2,5,1,5,
+          1,2,4,3,4,2,2,2,1,1,1,4,2,3,3,2,1,2,1,4,3,1,3,5,1,1,2,2,4,5,4,2,3,2,1,2,2,2,3,1,
+          5,5,4,4,5,1,4,4,1,4,3,3,3,3,4,2,5,4,5,1,5,5,3,5,2,2,5,4,4,1,4,5,2,2,5,1,3,4,3,3,
+          3,3,5,4,1,3,1,4,4,5,2,4,5,5,1,1,2,5,4,1,3,3,5,2,5,1,2,1,3,3,1,2,4,2)
> irix <- data.frame(iris, date=factor(dtx))
> names(irix) <- c("sl","sw","pl","pw","spc","date")
> head(irix)
   sl  sw  pl  pw    spc date
1 5.1 3.5 1.4 0.2 setosa    3
2 4.9 3.0 1.4 0.2 setosa    5
3 4.7 3.2 1.3 0.2 setosa    4
4 4.6 3.1 1.5 0.2 setosa    5

> aout1 <- aov(sl~spc, data=irix)
> summary(aout1)
             Df Sum Sq Mean Sq F value    Pr(>F)
spc           2 63.212  31.606  119.26 < 2.2e-16 ***
Residuals   147 38.956   0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> aout2 <- aov(sl~spc+date, data=irix)
> summary(aout2)
             Df Sum Sq Mean Sq  F value    Pr(>F)
spc           2 63.212  31.606 136.3884 < 2.2e-16 ***
date          4  5.818   1.455   6.2765 0.0001108 ***
Residuals   143 33.138   0.232
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

When the date variable is introduced into the model, the MSE is slightly lowered (from 0.265 to 0.232), which increases the F value for the species effect (from 119.26 to 136.39). That is, by removing the variation due to the date of observation, inference on the effects of interest can be made more precisely. In this example, the date of observation acts as a block.

Blocking to "remove" the effect of nuisance factors

For randomized block designs, there are factors or variables that are of primary interest. However, there are also several other nuisance factors. Nuisance factors are those that may affect the measured result, but are not of primary interest. For example, in applying a treatment, nuisance factors might be the specific operator who prepared the treatment, the time of day the experiment was run, and the room temperature. A nuisance factor is used as a blocking factor if every level of the primary factor occurs the same number of times with each level of the nuisance factor. The analysis of the experiment will focus on the effect of varying levels of the primary factor within each block of the experiment.

In the analysis of the irix data, the date variable is used as the blocking factor; each day of observation is a block. R. A. Fisher randomly selected the flowers on each day (in a block), while keeping the number of flowers of each species balanced (10 flowers for each species). This is an example of a (balanced) randomized block design.

Blocks of irix data

1st day | 2nd day | 3rd day | 4th day | 5th day (five column groups, left to right, in each row below)

id sl sw pl pw spc id sl sw pl pw spc id sl sw pl pw spc id sl sw pl pw spc id sl sw pl pw spc

5 5 3.6 1.4 0.2 S 12 4.8 3.4 1.6 0.2 S 1 5.1 3.5 1.4 0.2 S 3 4.7 3.2 1.3 0.2 S 2 4.9 3 1.4 0.2 S

8 5 3.4 1.5 0.2 S 24 5.1 3.3 1.7 0.5 S 13 4.8 3 1.4 0.1 S 7 4.6 3.4 1.4 0.3 S 4 4.6 3.1 1.5 0.2 S

16 5.7 4.4 1.5 0.4 S 25 4.8 3.4 1.9 0.2 S 14 4.3 3 1.1 0.1 S 9 4.4 2.9 1.4 0.2 S 6 5.4 3.9 1.7 0.4 S

28 5.2 3.5 1.5 0.2 S 31 4.8 3.1 1.6 0.2 S 15 5.8 4 1.2 0.2 S 10 4.9 3.1 1.5 0.1 S 11 5.4 3.7 1.5 0.2 S

30 4.7 3.2 1.6 0.2 S 33 5.2 4.1 1.5 0.1 S 17 5.4 3.9 1.3 0.4 S 18 5.1 3.5 1.4 0.3 S 20 5.1 3.8 1.5 0.3 S

35 4.9 3.1 1.5 0.2 S 38 4.9 3.6 1.4 0.1 S 19 5.7 3.8 1.7 0.3 S 23 4.6 3.6 1 0.2 S 21 5.4 3.4 1.7 0.2 S

37 5.5 3.5 1.3 0.2 S 42 4.5 2.3 1.3 0.3 S 22 5.1 3.7 1.5 0.4 S 27 5 3.4 1.6 0.4 S 26 5 3 1.6 0.2 S

45 5.1 3.8 1.9 0.4 S 43 4.4 3.2 1.3 0.2 S 32 5.4 3.4 1.5 0.4 S 39 4.4 3 1.3 0.2 S 29 5.2 3.4 1.4 0.2 S

46 4.8 3 1.4 0.3 S 44 5 3.5 1.6 0.6 S 40 5.1 3.4 1.5 0.2 S 41 5 3.5 1.3 0.3 S 34 5.5 4.2 1.4 0.2 S

47 5.1 3.8 1.6 0.2 S 49 5.3 3.7 1.5 0.2 S 50 5 3.3 1.4 0.2 S 48 4.6 3.2 1.4 0.2 S 36 5 3.2 1.2 0.2 S

53 6.9 3.1 4.9 1.5 V 52 6.4 3.2 4.5 1.5 V 51 7 3.2 4.7 1.4 V 56 5.7 2.8 4.5 1.3 V 60 5.2 2.7 3.9 1.4 V

55 6.5 2.8 4.6 1.5 V 54 5.5 2.3 4 1.3 V 57 6.3 3.3 4.7 1.6 V 65 5.6 2.9 3.6 1.3 V 66 6.7 3.1 4.4 1.4 V

58 4.9 2.4 3.3 1 V 63 6 2.2 4 1 V 59 6.6 2.9 4.6 1.3 V 67 5.6 3 4.5 1.5 V 77 6.8 2.8 4.8 1.4 V

61 5 2 3.5 1 V 64 6.1 2.9 4.7 1.4 V 69 6.2 2.2 4.5 1.5 V 79 6 2.9 4.5 1.5 V 78 6.7 3 5 1.7 V

62 5.9 3 4.2 1.5 V 68 5.8 2.7 4.1 1 V 75 6.4 2.9 4.3 1.3 V 80 5.7 2.6 3.5 1 V 81 5.5 2.4 3.8 1.1 V

71 5.9 3.2 4.8 1.8 V 70 5.6 2.5 3.9 1.1 V 87 6.7 3.1 4.7 1.5 V 83 5.8 2.7 3.9 1.2 V 93 5.8 2.6 4 1.2 V

76 6.6 3 4.4 1.4 V 72 6.1 2.8 4 1.3 V 88 6.3 2.3 4.4 1.3 V 84 6 2.7 5.1 1.6 V 95 5.6 2.7 4.2 1.3 V

82 5.5 2.4 3.7 1 V 73 6.3 2.5 4.9 1.5 V 89 5.6 3 4.1 1.3 V 86 6 3.4 4.5 1.6 V 97 5.7 2.9 4.2 1.3 V

85 5.4 3 4.5 1.5 V 74 6.1 2.8 4.7 1.2 V 90 5.5 2.5 4 1.3 V 91 5.5 2.6 4.4 1.2 V 98 6.2 2.9 4.3 1.3 V

96 5.7 3 4.2 1.2 V 92 6.1 3 4.6 1.4 V 99 5.1 2.5 3 1.1 V 94 5 2.3 3.3 1 V 100 5.7 2.8 4.1 1.3 V

106 7.6 3 6.6 2.1 G 101 6.3 3.3 6 2.5 G 113 6.8 3 5.5 2.1 G 104 6.3 2.9 5.6 1.8 G 103 7.1 3 5.9 2.1 G

112 6.4 2.7 5.3 1.9 G 102 5.8 2.7 5.1 1.9 G 115 5.8 2.8 5.1 2.4 G 105 6.5 3 5.8 2.2 G 108 7.3 2.9 6.3 1.8 G

121 6.9 3.2 5.7 2.3 G 109 6.7 2.5 5.8 1.8 G 116 6.4 3.2 5.3 2.3 G 107 4.9 2.5 4.5 1.7 G 111 6.5 3.2 5.1 2 G

123 7.7 2.8 6.7 2 G 110 7.2 3.6 6.1 2.5 G 117 6.5 3 5.5 1.8 G 114 5.7 2.5 5 2 G 119 7.7 2.6 6.9 2.3 G

131 7.4 2.8 6.1 1.9 G 127 6.2 2.8 4.8 1.8 G 118 7.7 3.8 6.7 2.2 G 120 6 2.2 5 1.5 G 126 7.2 3.2 6 1.8 G

132 7.9 3.8 6.4 2 G 133 6.4 2.8 5.6 2.2 G 122 5.6 2.8 4.9 2 G 124 6.3 2.7 4.9 1.8 G 129 6.4 2.8 5.6 2.1 G

136 7.7 3 6.1 2.3 G 140 6.9 3.1 5.4 2.1 G 137 6.3 3.4 5.6 2.4 G 125 6.7 3.3 5.7 2.1 G 130 7.2 3 5.8 1.6 G

142 6.9 3.1 5.1 2.3 G 143 5.8 2.7 5.1 1.9 G 138 6.4 3.1 5.5 1.8 G 128 6.1 3 4.9 1.8 G 134 6.3 2.8 5.1 1.5 G

144 6.8 3.2 5.9 2.3 G 148 6.5 3 5.2 2 G 145 6.7 3.3 5.7 2.5 G 135 6.1 2.6 5.6 1.4 G 139 6 3 4.8 1.8 G

147 6.3 2.5 5 1.9 G 150 5.9 3 5.1 1.8 G 146 6.7 3 5.2 2.3 G 149 6.2 3.4 5.4 2.3 G 141 6.7 3.1 5.6 2.4 G

sl: Sepal.Length, sw: Sepal.Width, pl: Petal.Length, pw: Petal.Width, S: setosa, V: versicolor, G: virginica.

In each block, the measurements are taken in random order.

ID Control Treatment

1 0.7 1.9

2 -1.6 0.8

3 -0.2 1.1

4 -1.2 0.1

5 -0.1 -0.1

6 3.4 4.4

7 3.7 5.5

8 0.8 1.6

9 0.0 4.6

10 2.0 3.4

Student’s sleep data

> sleep
   extra group ID
1    0.7     1  1
2   -1.6     1  2
3   -0.2     1  3
4   -1.2     1  4
5   -0.1     1  5
6    3.4     1  6
7    3.7     1  7
8    0.8     1  8
9    0.0     1  9
10   2.0     1 10
11   1.9     2  1
12   0.8     2  2
13   1.1     2  3
14   0.1     2  4
15  -0.1     2  5
16   4.4     2  6
17   5.5     2  7
18   1.6     2  8
19   4.6     2  9
20   3.4     2 10

Each subject (ID) is a block; ID is the blocking factor.


> t.test(extra ~ group, paired=T, data = sleep)

Paired t-test

data: extra by group
t = -4.0621, df = 9, p-value = 0.002833

> summary(aov(extra~group+ID, data=sleep))
            Df Sum Sq Mean Sq F value   Pr(>F)
group        1 12.482  12.482 16.5009 0.002833 **
ID           9 58.078   6.453  8.5308 0.001901 **
Residuals    9  6.808   0.756

The paired-sample t-test can also be carried out by ANOVA, by using the subject variable (personal variation) as a blocking factor. Note that the p-value 0.002833 for the group effect in the ANOVA table is the same as that of the paired t-test.
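The equivalence can be checked numerically: with two groups, the F statistic for the group effect is the square of the paired t statistic. This is a minimal sketch with the built-in sleep data; it passes the two groups as vectors, on the assumption that some recent R versions no longer accept paired = TRUE in the formula interface of t.test(). The names g1, g2, tt, and tab are illustrative.

```r
# The rows of sleep are ordered by ID within each group, so g1[i] and g2[i]
# belong to the same subject.
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Paired t-test on the two vectors.
tt <- t.test(g1, g2, paired = TRUE)

# Two-way ANOVA with the subject (ID) as a blocking factor.
tab     <- summary(aov(extra ~ group + ID, data = sleep))[[1]]
F_group <- tab[["F value"]][1]
p_group <- tab[["Pr(>F)"]][1]

print(c(t_squared = unname(tt$statistic)^2, F = F_group))
print(c(p_ttest = tt$p.value, p_anova = p_group))
```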

Root dry mass and shoot dry mass of rice were recorded for two varieties, wild type (wt) and modified type (ANU843), and three types of fertilizer (F10, NH4Cl, NH4NO3) used in cultivation. Two lots of fields (blocks) were used in raising the rice.

Rice data

The aim of this study is to see the effects of varieties of rice and the fertilizers

ID  PlantNo  Block  RootDryMass  ShootDryMass  trt             fert    variety
 1     1       1        56           132       F10             F10     wt
 2     2       1        66           120       F10             F10     wt
 …     …       …         …             …       …               …       …
11    11       2        44            37       F10             F10     wt
12    12       2        41           109       F10             F10     wt
13     1       1        12            45       NH4Cl           NH4Cl   wt
14     2       1        20            60       NH4Cl           NH4Cl   wt
 …     …       …         …             …       …               …       …
23    11       2        13            55       NH4Cl           NH4Cl   wt
24    12       2         7            34       NH4Cl           NH4Cl   wt
25     1       1        12            71       NH4NO3          NH4NO3  wt
26     2       1        18            78       NH4NO3          NH4NO3  wt
 …     …       …         …             …       …               …       …
35    11       2        11            51       NH4NO3          NH4NO3  wt
36    12       2        20            64       NH4NO3          NH4NO3  wt
37     1       1         6             8       F10 +ANU843     F10     ANU843
38     2       1         4             6       F10 +ANU843     F10     ANU843
 …     …       …         …             …       …               …       …
47    11       2        12            15       F10 +ANU843     F10     ANU843
48    12       2         7             8       F10 +ANU843     F10     ANU843
49     1       1         4            22       NH4Cl +ANU843   NH4Cl   ANU843
50     2       1        10            36       NH4Cl +ANU843   NH4Cl   ANU843
 …     …       …         …             …       …               …       …
59    11       2         8            59       NH4Cl +ANU843   NH4Cl   ANU843
60    12       2        14            61       NH4Cl +ANU843   NH4Cl   ANU843
61     1       1        19            75       NH4NO3 +ANU843  NH4NO3  ANU843
62     2       1        18            75       NH4NO3 +ANU843  NH4NO3  ANU843
 …     …       …         …             …       …               …       …
71    11       2         7            47       NH4NO3 +ANU843  NH4NO3  ANU843
72    12       2        15            79       NH4NO3 +ANU843  NH4NO3  ANU843

> library(DAAG)
> rice

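[Figure: dot plot of Shoot dry mass (g) (x-axis, 0 to 100) for each treatment: F10, NH4Cl, NH4NO3, F10 +ANU843, NH4Cl +ANU843, NH4NO3 +ANU843.]

[Figure: interaction plot of mean of ShootDryMass (y-axis, 20 to 100) against the level of the first factor, fert (F10, NH4Cl, NH4NO3), with one line for each variety (wt, ANU843).]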

> library(lattice)
> myfun <- function(x, y, ...){
+   panel.dotplot(x, y, pch=1, col="gray40")
+   panel.average(x, y, type="p", col="black", pch=3, cex=1.25)
+ }
> dotplot(trt ~ ShootDryMass, data=rice, panel=myfun, xlab="Shoot dry mass (g)")
> with(rice, interaction.plot(fert, variety, ShootDryMass, xlab="Level of first factor"))

> aout <- aov(ShootDryMass ~ Block + variety * fert, data=rice)

> summary(aout)
             Df Sum Sq Mean Sq F value    Pr(>F)
Block         1   3528  3528.0  10.902  0.001563 **
variety       1  22684 22684.5  70.100 6.366e-12 ***
fert          2   7019  3509.4  10.845 8.625e-05 ***
variety:fert  2  38622 19311.2  59.676 1.933e-15 ***
Residuals    65  21034   323.6

> summary.lm(aout)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               129.333      8.211  15.752  < 2e-16 ***
Block                     -14.000      4.240  -3.302  0.00156 **
varietyANU843            -101.000      7.344 -13.753  < 2e-16 ***
fertNH4Cl                 -58.083      7.344  -7.909 4.24e-11 ***
fertNH4NO3                -35.000      7.344  -4.766 1.10e-05 ***
varietyANU843:fertNH4Cl    97.333     10.386   9.372 1.10e-13 ***
varietyANU843:fertNH4NO3   99.167     10.386   9.548 5.42e-14 ***

Residual standard error: 17.99 on 65 degrees of freedom
Multiple R-squared: 0.7736, Adjusted R-squared: 0.7527
F-statistic: 37.01 on 6 and 65 DF, p-value: < 2.2e-16

Two-way ANOVA with interaction and block effect

penetrometer

Apple firmness data

> apple <- c(6.8,7.3,7.2,7.3,7.4,7.3,6.8,7.6,7.2,6.5,7.7,7.7,7.4,7.0,7.2,7.6,6.7,6.7,7.2,6.8)
> data.frame(firmness=apple)
   firmness
1       6.8
2       7.3
3       7.2
4       7.3
5       7.4
6       7.3
7       6.8
8       7.6
9       7.2
10      6.5
11      7.7
12      7.7
13      7.4
14      7.0
15      7.2
16      7.6
17      6.7
18      6.7
19      7.2
20      6.8

Purpose: to check whether there is a significant difference between the two testers.

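Select 10 (or 5) apples randomly from a box of apples, and assign them to the testers randomly. The same 20 readings are arranged in four possible layouts:

Table I (5 fruits, each measured by both testers; mean of two readings)

fruit       1     2     3     4     5
Tester A  7.05  7.25  7.35  7.20  6.85
Tester B  7.70  7.20  7.40  6.70  7.00

Table II (5 fruits, each measured twice by both testers)

fruit            1    2    3    4    5
Tester A  1st  6.8  7.2  7.4  6.8  7.2
          2nd  7.3  7.3  7.3  7.6  6.5
Tester B  1st  7.7  7.4  7.2  6.7  7.2
          2nd  7.7  7.0  7.6  6.7  6.8

Table III (10 fruits, fruits 1 to 5 measured by tester A and fruits 6 to 10 by tester B; mean of two readings)

fruit      1     2     3     4     5     6     7     8     9    10
Tester     A     A     A     A     A     B     B     B     B     B
mean    7.05  7.25  7.35  7.20  6.85  7.70  7.20  7.40  6.70  7.00

Table IV (10 fruits as in Table III, with both readings)

fruit      1     2     3     4     5     6     7     8     9    10
Tester     A     A     A     A     A     B     B     B     B     B
1st      6.8   7.2   7.4   6.8   7.2   7.7   7.4   7.2   6.7   7.2
2nd      7.3   7.3   7.3   7.6   6.5   7.7   7.0   7.6   6.7   6.8
mean    7.05  7.25  7.35  7.20  6.85  7.70  7.20  7.40  6.70  7.00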

> apple <- c(6.8,7.3,7.2,7.3,7.4,7.3,6.8,7.6,7.2,6.5,7.7,7.7,7.4,7.0,7.2,7.6,6.7,6.7,7.2,6.8)
> apple.mean <- apply(matrix(apple,2,),2,mean)
> tx1 <- tx3 <- rep(c("A","B"), each=5); tx2 <- tx4 <- rep(c("A","B"), each=10)
> fx1 <- factor(rep(1:5,2)); fx2 <- rep(fx1, each=2)
> fx3 <- factor(1:10); fx4 <- rep(fx3, each=2)
> apple1 <- data.frame(firmness=apple.mean, tester=tx1, fruit=fx1)
> apple2 <- data.frame(firmness=apple, tester=tx2, fruit=fx2)
> apple3 <- data.frame(firmness=apple.mean, tester=tx3, fruit=fx3)
> apple4 <- data.frame(firmness=apple, tester=tx4, fruit=fx4)

Tables III and IV indexed by tester i and by fruit j within tester:

Table III

j              1st   2nd   3rd   4th   5th
i = A (fruit)    1     2     3     4     5
               7.05  7.25  7.35  7.20  6.85
i = B (fruit)    6     7     8     9    10
               7.70  7.20  7.40  6.70  7.00

Table IV

j              1st   2nd   3rd   4th   5th
i = A (fruit)    1     2     3     4     5
                6.8   7.2   7.4   6.8   7.2
                7.3   7.3   7.3   7.6   6.5
i = B (fruit)    6     7     8     9    10
                7.7   7.4   7.2   6.7   7.2
                7.7   7.0   7.6   6.7   6.8

Nested notation: j vs. i(j)

In Tables III and IV, 10 apples are tested, but they may be indexed by a variable j that runs only from 1 to 5. The meaning of "the j-th apple" then changes with the value of the variable i (the tester). In this case we need the nested notation i(j) instead of simply j.
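In R, the difference between the two indexings shows up in how the fruit factor is coded. The following is a minimal sketch with illustrative names (tester, j, fruit_crossed, fruit_nested); it only builds the labels for the layout of Table III and does not fit any model.

```r
# Layout of Table III: 5 fruits per tester, indexed j = 1..5 within each tester.
tester <- rep(c("A", "B"), each = 5)
j      <- rep(1:5, times = 2)

# Using j alone treats "fruit 1 of A" and "fruit 1 of B" as the same apple.
fruit_crossed <- factor(j)

# Combining tester and j gives one label per physical apple (nested coding).
fruit_nested <- interaction(tester, j, drop = TRUE)

print(data.frame(tester, j, fruit_nested))
print(c(crossed_levels = nlevels(fruit_crossed),
        nested_levels  = nlevels(fruit_nested)))
```

The nested labels carry the same information as numbering the fruits 1 to 10, which is how fx3 and fx4 are coded in the slides.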

Random effect

Points of interest:

- Effects of the specific testers A and B on the measurement (O)
- Effects of the specific apples (X)

Each measurement varies randomly.

Each apple also has its effect on the measurement, but the apples tested are randomly selected ones.

The effects of the apples are random (random effect).

Fixed effect and random effect

Points of interest     Tested 10 apples    Conclusion                                                            Effect
The 10 apples          Taking all apples   There is (no) difference in the firmness of the 10 apples.           Fixed effect
A box of apples        Selecting randomly  There is (no) difference in the firmness of apples in the box.       Random effect
An orchard of apples   Selecting randomly  There is (no) difference in the firmness of apples in the orchard.   Random effect

Table I:

fruit       1     2     3     4     5
Tester A  7.05  7.25  7.35  7.20  6.85
Tester B  7.70  7.20  7.40  6.70  7.00

> summary(aov(firmness~tester+fruit, data=apple1))
            Df Sum Sq Mean Sq F value Pr(>F)
tester       1  0.009 0.00900  0.1056 0.7615
fruit        4  0.391 0.09775  1.1466 0.4489
Residuals    4  0.341 0.08525
> summary(aov(firmness~tester+Error(fruit), data=apple1))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  0.391 0.09775

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.009 0.00900  0.1056 0.7615
Residuals  4  0.341 0.08525

> t.test(firmness~tester, data=apple1)

Welch Two Sample t-test

data: firmness by tester
t = -0.3136, df = 5.962, p-value = 0.7645

> t.test(firmness~tester, paired=T, data=apple1)

Paired t-test

data: firmness by tester
t = -0.3249, df = 4, p-value = 0.7615

Two-way ANOVA

Two-way ANOVA with a random effect

Paired sample t-test

Two sample t-test

(X)

(O)

(O)

( )

Table III:

fruit      1     2     3     4     5     6     7     8     9    10
Tester     A     A     A     A     A     B     B     B     B     B
mean    7.05  7.25  7.35  7.20  6.85  7.70  7.20  7.40  6.70  7.00

> summary(aov(firmness~tester+fruit, data=apple3))
            Df Sum Sq Mean Sq
tester       1  0.009  0.0090
fruit        8  0.732  0.0915
> summary(aout3 <- aov(firmness~tester+Error(fruit), data=apple3))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.009  0.0090  0.0984 0.7618
Residuals  8  0.732  0.0915
> summary(aov(firmness~tester, data=apple3))
            Df Sum Sq Mean Sq F value Pr(>F)
tester       1  0.009  0.0090  0.0984 0.7618
Residuals    8  0.732  0.0915
> coef(aout3)
(Intercept)
       7.17
fruit :
testerB
   0.06

> t.test(firmness~tester, var.equal=T, data=apple3)
Two Sample t-test
t = -0.3136, df = 8, p-value = 0.7618
> t.test(firmness~tester, data=apple3)
Welch Two Sample t-test
t = -0.3136, df = 5.962, p-value = 0.7645

In the two-way ANOVA, the effect of fruit and the random error cannot be separated: each fruit is measured by only one tester, so no residual degrees of freedom remain.

The remedy is to declare the fruit effect to be random.

For Table III, the two-sample t-test assuming equal variances is equivalent to the ANOVA.
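This equivalence can be confirmed numerically: for two groups, the one-way ANOVA F statistic equals the square of the equal-variance t statistic, and the p-values coincide. The sketch below enters the Table III means directly; the names tt and tab are illustrative.

```r
# Table III: mean firmness of 10 fruits, 5 per tester.
firmness <- c(7.05, 7.25, 7.35, 7.20, 6.85, 7.70, 7.20, 7.40, 6.70, 7.00)
tester   <- factor(rep(c("A", "B"), each = 5))

# Two-sample t-test assuming equal variances.
tt <- t.test(firmness ~ tester, var.equal = TRUE)

# One-way ANOVA on the same data.
tab <- summary(aov(firmness ~ tester))[[1]]

print(c(t_squared = unname(tt$statistic)^2, F = tab[["F value"]][1]))
print(c(p_ttest = tt$p.value, p_anova = tab[["Pr(>F)"]][1]))
```

The Welch test reported in the slides gives a slightly different p-value because it does not pool the two variances.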

( )

Table II:

fruit            1    2    3    4    5
Tester A  1st  6.8  7.2  7.4  6.8  7.2
          2nd  7.3  7.3  7.3  7.6  6.5
Tester B  1st  7.7  7.4  7.2  6.7  7.2
          2nd  7.7  7.0  7.6  6.7  6.8

> summary(aov(firmness~tester+fruit, data=apple2))
            Df Sum Sq Mean Sq F value Pr(>F)
tester       1  0.018 0.01800  0.1554 0.6994
fruit        4  0.782 0.19550  1.6874 0.2086
Residuals   14  1.622 0.11586

> summary(aout2 <- aov(firmness~tester+Error(fruit), data=apple2))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  0.782  0.1955

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.018 0.01800  0.1554 0.6994
Residuals 14  1.622 0.11586

(O)

Check!
> coef(aout2)
> summary.lm(aout2$Within)

( )

Table IV:

fruit      1     2     3     4     5     6     7     8     9    10
Tester     A     A     A     A     A     B     B     B     B     B
1st      6.8   7.2   7.4   6.8   7.2   7.7   7.4   7.2   6.7   7.2
2nd      7.3   7.3   7.3   7.6   6.5   7.7   7.0   7.6   6.7   6.8

> summary(aov(firmness~tester+Error(fruit), data=apple4))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.018   0.018  0.0984 0.7618
Residuals  8  1.464   0.183

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 10   0.94   0.094
> summary(aov(firmness~fruit, data=apple4))
            Df Sum Sq Mean Sq F value Pr(>F)
fruit        9  1.482 0.16467  1.7518 0.1975
Residuals   10  0.940 0.09400

Compare with the result for Table III:

> summary(aov(firmness~tester+Error(fruit), data=apple3))

Error: fruit
          Df Sum Sq Mean Sq F value Pr(>F)
tester     1  0.009  0.0090  0.0984 0.7618
Residuals  8  0.732  0.0915


> t.test(extra ~ group, paired=T, data = sleep)
Paired t-test
data: extra by group
t = -4.0621, df = 9, p-value = 0.002833

> summary(aov(extra~group+ID, data=sleep))
            Df Sum Sq Mean Sq F value   Pr(>F)
group        1 12.482  12.482 16.5009 0.002833 **
ID           9 58.078   6.453  8.5308 0.001901 **
Residuals    9  6.808   0.756

> summary(aov(extra~group+Error(ID), data=sleep))

Error: ID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9 58.078  6.4531

Error: Within
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 12.482 12.4820  16.501 0.002833 **
Residuals  9  6.808  0.7564

Here the block (ID) is declared to be random, but in a simple example of this kind it makes no practical difference to the test of the group effect.

Rat data : Sokal, R. R., and Rohlf, F. J. (1995) Biometry. W. H. Freeman and Co., New York.

A: control, B: compound 217, C: compound 217 + sugar

> install.packages("asbio")
> library(asbio)
> data(rat)
> ?rat
> ratx <- rat
> names(ratx) <- tolower(names(rat))
> ratx$treatment <- factor(ratx$treatment)
> ratx$rat <- factor(ratx$rat)
> ratx$liver <- factor(ratx$liver)

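Structure of the experiment:

Treatment A: Rat 1, Rat 2
Treatment B: Rat 3, Rat 4
Treatment C: Rat 5, Rat 6

For each rat (shown for Rat 1):
  Liver tissue 1: Glycogen reading 1, Glycogen reading 2
  Liver tissue 2: Glycogen reading 1, Glycogen reading 2
  Liver tissue 3: Glycogen reading 1, Glycogen reading 2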

> gly <- c(131,130,131,125,136,142,150,148,140,143,160,150,157,145,154,142,147,153,
+          151,155,147,147,162,152,134,125,138,138,135,136,138,140,139,138,134,127)
> trt <- rep(LETTERS[1:3], each=12)
> rx1 <- factor(rep(rep(1:2, each=6), 3))
> rx2 <- factor(rep(1:6, each=6))
> lx <- factor(rep(rep(1:3, each=2), 6))
> ratx <- data.frame(glycogen=gly, treatment=trt, rat=rx1, liver=lx)
> raty <- data.frame(glycogen=gly, treatment=trt, rat=rx2, liver=lx)

Contents of ratx and raty (the two data frames are identical except for the rat column):

   glycogen treatment rat (ratx) rat (raty) liver
1       131         A          1          1     1
2       130         A          1          1     1
3       131         A          1          1     2
4       125         A          1          1     2
5       136         A          1          1     3
6       142         A          1          1     3
7       150         A          2          2     1
8       148         A          2          2     1
9       140         A          2          2     2
10      143         A          2          2     2
11      160         A          2          2     3
12      150         A          2          2     3
13      157         B          1          3     1
14      145         B          1          3     1
15      154         B          1          3     2
16      142         B          1          3     2
17      147         B          1          3     3
18      153         B          1          3     3
19      151         B          2          4     1
20      155         B          2          4     1
21      147         B          2          4     2
22      147         B          2          4     2
23      162         B          2          4     3
24      152         B          2          4     3
25      134         C          1          5     1
26      125         C          1          5     1
27      138         C          1          5     2
28      138         C          1          5     2
29      135         C          1          5     3
30      136         C          1          5     3
31      138         C          2          6     1
32      140         C          2          6     1
33      139         C          2          6     2
34      138         C          2          6     2
35      134         C          2          6     3
36      127         C          2          6     3

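The data as provided in R correspond to ratx, in which the rats are numbered 1 and 2 within each treatment, but the meaning of the data implies raty, in which the six rats are distinct.

In ratx, the variable rat is nested in treatment: rat 1 under treatment A is not the same animal as rat 1 under treatment B.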

Usage of aov for nested block structure (ratx data)

> summary(aov(glycogen~treatment+Error(rat/liver), data=ratx))

Error: rat
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  1 413.44  413.44

Error: rat:liver
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4 164.44  41.111

Error: Within
          Df Sum Sq Mean Sq F value    Pr(>F)
treatment  2 1557.6  778.78  18.251 8.437e-06 ***
Residuals 28 1194.8   42.67

Layout implied by this analysis (R1 to R36 are the 36 readings):

             rat 1                        rat 2
             Liver 1  Liver 2  Liver 3    Liver 1  Liver 2  Liver 3
Treatment A  R1 R2    R3 R4    R5 R6      R7 R8    R9 R10   R11 R12
Treatment B  R13 R14  R15 R16  R17 R18    R19 R20  R21 R22  R23 R24
Treatment C  R25 R26  R27 R28  R29 R30    R31 R32  R33 R34  R35 R36

This analysis treats the experiment as if the same two rats, rat 1 and rat 2, had received all three treatments. More precise inference would be possible in that case, if such an experiment were feasible.

Usage of aov for nested block structure (raty data)

> summary(aov(glycogen~treatment+Error(rat/liver), data=raty))

Error: rat
          Df  Sum Sq Mean Sq F value Pr(>F)
treatment  2 1557.56  778.78   2.929 0.1971
Residuals  3  797.67  265.89

Error: rat:liver
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 12    594    49.5

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 18    381  21.167

Layout implied by this analysis (R1 to R36 are the 36 readings):

                      Liver 1   Liver 2   Liver 3
Treatment A   Rat 1   R1 R2     R3 R4     R5 R6
              Rat 2   R7 R8     R9 R10    R11 R12
Treatment B   Rat 3   R13 R14   R15 R16   R17 R18
              Rat 4   R19 R20   R21 R22   R23 R24
Treatment C   Rat 5   R25 R26   R27 R28   R29 R30
              Rat 6   R31 R32   R33 R34   R35 R36

The liver factor is also a nested factor. R recognizes this automatically, because the upper-level factor rat is a nested factor. In fact, 18 pieces of liver tissue were taken. The sum of the df in the rat and rat:liver strata is 2+3+12 = 17, which is 18-1.
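Because the design is balanced, the treatment test in the rat stratum can be reproduced from the six rat means alone. The following is a minimal sketch under that assumption; rat_means and tab are illustrative names, and the data are the glycogen readings listed in the slides with the raty coding of rats.

```r
# Glycogen readings with six distinct rats (raty coding), 6 readings per rat.
gly <- c(131,130,131,125,136,142,150,148,140,143,160,150,157,145,154,142,147,153,
         151,155,147,147,162,152,134,125,138,138,135,136,138,140,139,138,134,127)
raty <- data.frame(glycogen  = gly,
                   treatment = factor(rep(LETTERS[1:3], each = 12)),
                   rat       = factor(rep(1:6, each = 6)))

# Average all readings within each rat: one value per experimental unit.
rat_means <- aggregate(glycogen ~ treatment + rat, data = raty, FUN = mean)
print(rat_means)

# One-way ANOVA on the rat means: treatment is tested against rat-to-rat variation.
tab <- summary(aov(glycogen ~ treatment, data = rat_means))[[1]]
print(tab)
```

The F value and p-value should match the treatment line of the "Error: rat" stratum above, which shows that the rats, not the individual readings, are the units that carry information about the treatments.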

Thank you !!
