
Chapter 12: Analysis of Variance

Tillage           Rep I    Rep II   Rep III   Average
Plow              118.3    125.6    123.8     122.6
V-chisel          115.8    122.5    118.9     119.1
Coulter chisel    124.1    118.5    113.3     118.6
Std. chisel       109.2    114.0    122.5     115.2
Hvy. disk         118.1    117.5    121.4     119.0
Lt. disk          118.3    113.7    113.7     115.2


Chapter Goals

• Test a hypothesis about several means.

• Consider the analysis of variance technique (ANOVA).

• Restrict the discussion to single-factor ANOVA.


12.1: Introduction to the Analysis of Variance Technique

• Compare several means simultaneously.

• The analysis of variance technique allows us to test the null hypothesis that all means are equal against the alternative hypothesis that at least one mean value is different, with a specified value of $\alpha$.


Example: A study was conducted to determine if the drying time for a certain paint is affected by the type of applicator used. The data in the table below represents the drying time (in minutes) for 3 different applicators when the paint was applied to standard wallboard. Is there any evidence to suggest the type of applicator has a significant effect on the paint drying time at the 0.05 level?

Note:

1. Each type of applicator is a level of the test factor.

2. The data values from repeated samplings are called replicates.


Sample Results:

                      Applicator (Level)
                      Brush      Roller     Pad
                      (i = 1)    (i = 2)    (i = 3)
                      39.1       31.6       32.7
                      39.4       33.4       33.2
                      31.1       30.2       28.7
                      33.7       41.8       29.2
                      30.5       33.9       25.8
                      34.6                  31.4
                                            26.7
                                            29.5
Sum ($C_i$)           208.4      170.9      237.2
Mean ($\bar{x}_i$)    34.73      34.18      29.65


Note:

1. The drying time is measured by the mean value: $\bar{x}_i$ is the mean drying time for level i, i = 1, 2, 3.

2. There is a certain amount of variation among the means.

3. Some variation can be expected, even if all three population means are equal.

4. Consider the question: “Is the variation among the sample means due to chance, or is it due to the effect of applicator on drying time?”

5. You might consider a dotplot of the data to see whether the graph suggests a difference among the levels.


Solution:

1. The Set-up:

a. Population parameter of concern: The mean at each level of the test factor. Here, the mean drying time for each applicator.

b. The null and the alternative hypothesis:

$H_0: \mu_1 = \mu_2 = \mu_3$

The mean drying time is the same for each applicator.

$H_a: \mu_i \neq \mu_j$ for some $i \neq j$

Not all drying time means are equal.


2. The Test Criteria:

a. Assumptions: The data was randomly collected and all observations are independent. The effects due to chance and untested factors are assumed to be normally distributed.

b. Test statistic: F test statistic (see below).

c. Level of significance: $\alpha$ = 0.05

3. The Sample Evidence:

a. Sample information: Data listed in the given table.

b. Calculate the value of the test statistic:

The F statistic is a ratio of two variances.

Separate the variance in the entire data set into two parts.


Partition the Total Sum of Squares:

Consider the numerator of the fraction used to define the sample variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

The numerator of this fraction is called the sum of squares, or total sum of squares.

Notation:

$C_i$ = total for column i

$k_i$ = number of observations for level i

$n = \sum k_i$ = total number of observations


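Total variation in the data:

$$\text{SS(total)} = \sum (x^2) - \frac{(\sum x)^2}{n}$$

Variation between factor levels:

$$\text{SS(factor)} = \left( \frac{C_1^2}{k_1} + \frac{C_2^2}{k_2} + \frac{C_3^2}{k_3} + \cdots \right) - \frac{(\sum x)^2}{n}$$

Variation within the levels (columns):

$$\text{SS(error)} = \sum (x^2) - \left( \frac{C_1^2}{k_1} + \frac{C_2^2}{k_2} + \frac{C_3^2}{k_3} + \cdots \right) = \text{SS(total)} - \text{SS(factor)}$$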


Calculations:

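$$\text{SS(total)} = \sum (x^2) - \frac{(\sum x)^2}{n} = 20316.69 - \frac{(616.5)^2}{19} = 20316.69 - 20003.80 = 312.89$$

$$\text{SS(factor)} = \left( \frac{C_1^2}{k_1} + \frac{C_2^2}{k_2} + \frac{C_3^2}{k_3} \right) - \frac{(\sum x)^2}{n} = \left( \frac{208.4^2}{6} + \frac{170.9^2}{5} + \frac{237.2^2}{8} \right) - \frac{(616.5)^2}{19} = 20112.77 - 20003.80 = 108.97$$

$$\text{SS(error)} = \text{SS(total)} - \text{SS(factor)} = 312.89 - 108.97 = 203.92$$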
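The arithmetic above can be checked with a short script. This is a minimal sketch, assuming the drying times are entered by applicator as in the data table; the function name `sums_of_squares` is illustrative and not part of the original slides.

```python
# Drying times (minutes) from the applicator example, keyed by level.
data = {
    "Brush":  [39.1, 39.4, 31.1, 33.7, 30.5, 34.6],
    "Roller": [31.6, 33.4, 30.2, 41.8, 33.9],
    "Pad":    [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5],
}


def sums_of_squares(levels):
    """Return (SS_total, SS_factor, SS_error) using the shortcut formulas."""
    all_x = [x for sample in levels.values() for x in sample]
    n = len(all_x)
    correction = sum(all_x) ** 2 / n  # (sum of x)^2 / n
    ss_total = sum(x * x for x in all_x) - correction
    # Each column contributes C_i^2 / k_i.
    ss_factor = sum(sum(s) ** 2 / len(s) for s in levels.values()) - correction
    ss_error = ss_total - ss_factor
    return ss_total, ss_factor, ss_error


ss_total, ss_factor, ss_error = sums_of_squares(data)
print(f"SS(total)  = {ss_total:.2f}")
print(f"SS(factor) = {ss_factor:.2f}")
print(f"SS(error)  = {ss_error:.2f}")
```

Rounded to two decimals, the three values should agree with the hand calculation (312.89, 108.97, and 203.92).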


An ANOVA table is often used to record the sums of squares and to organize the rest of the calculations.

Format for the ANOVA Table:

Source    df    SS        MS
Factor          108.97
Error           203.92
Total           312.89


Degrees of freedom, df, associated with each of the three sources of variation:

1. df(factor): one less than the number of levels (columns), c, for which the factor is tested.

df(factor) = c - 1

2. df(total): one less than the total number of observations, n.

df(total) = n - 1, where n = k1 + k2 + k3 + ...

3. df(error): sum of the degrees of freedom for all levels tested. Each column has ki - 1 degrees of freedom.

df(error) = (k1 - 1) + (k2 - 1) + (k3 - 1) + ... = n - c


Calculations:

df(factor) = df(applicator) = c - 1 = 3 - 1 = 2

df(total) = n - 1 = 19 - 1 = 18

df(error) = n - c = 19 - 3 = 16

Note:

The sums of squares and the degrees of freedom must check.

SS(factor) + SS(error) = SS(total)

df(factor) + df(error) = df(total)
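The degrees-of-freedom check can be written out as a few lines of code. This is a small sketch, assuming only the sample sizes from the applicator example (6, 5, and 8 observations); the variable names are illustrative.

```python
# Number of observations k_i at each level of the applicator example.
k = {"Brush": 6, "Roller": 5, "Pad": 8}

c = len(k)           # number of levels (columns)
n = sum(k.values())  # total number of observations

df_factor = c - 1
df_total = n - 1
# Each column contributes k_i - 1 degrees of freedom.
df_error = sum(k_i - 1 for k_i in k.values())

# The degrees of freedom must check.
assert df_error == n - c
assert df_factor + df_error == df_total
print(df_factor, df_error, df_total)
```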


Mean Square:

The mean square for the factor being tested and for the error is obtained by dividing the sum-of-square value by the corresponding number of degrees of freedom.

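$$\text{MS(factor)} = \frac{\text{SS(factor)}}{\text{df(factor)}} \qquad \text{MS(error)} = \frac{\text{SS(error)}}{\text{df(error)}}$$

Calculations:

$$\text{MS(factor)} = \frac{\text{SS(factor)}}{\text{df(factor)}} = \frac{108.97}{2} = 54.49$$

$$\text{MS(error)} = \frac{\text{SS(error)}}{\text{df(error)}} = \frac{203.92}{16} = 12.75$$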


The Complete ANOVA Table:

Source    df    SS        MS
Factor     2    108.97    54.49
Error     16    203.92    12.75
Total     18    312.89

The Test Statistic:

$$F^* = \frac{\text{MS(factor)}}{\text{MS(error)}} = \frac{54.49}{12.75} = 4.27$$

Numerator degrees of freedom = df(factor)

Denominator degrees of freedom = df(error)
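The mean squares and the test statistic follow mechanically from the sums of squares and degrees of freedom. The sketch below assumes the rounded values from the applicator example as inputs; the helper name `mean_squares_and_f` is hypothetical.

```python
def mean_squares_and_f(ss_factor, ss_error, df_factor, df_error):
    """Return (MS_factor, MS_error, F*) for a single-factor ANOVA table."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor, ms_error, ms_factor / ms_error


# Sums of squares and degrees of freedom from the applicator example.
ms_factor, ms_error, f_star = mean_squares_and_f(108.97, 203.92, 2, 16)
print(ms_factor, ms_error, f_star)
```

Because the script does not round the mean squares before dividing, its F* can differ from the hand-calculated 4.27 in the last reported digit.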


4. The Probability Distribution (Classical Approach):

a. Critical value: F(2, 16, 0.05) = 3.63

b. F* is in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value:

Table 9: 0.025 < P < 0.05; By computer: P = 0.033

b. The p-value is smaller than the level of significance, $\alpha$.

5. The Results:

a. Decision: Reject H0.

b. Conclusion: There is evidence to suggest the three population means are not all the same. The type of applicator has a significant effect on the paint drying time.
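Both the critical value and the p-value for this example can be reproduced without tables. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, where P(F > f) = (1 + 2f/d2)^(-d2/2); for other numerator degrees of freedom a statistics library would be needed. The function names are illustrative.

```python
def f_upper_tail_df1_is_2(f, d2):
    """P(F > f) for an F distribution with (2, d2) degrees of freedom.

    Valid only for numerator df = 2.
    """
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


def f_critical_df1_is_2(alpha, d2):
    """Value f with P(F > f) = alpha for (2, d2) df (inverse of the above)."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


f_star = 4.27  # test statistic from the applicator example
p_value = f_upper_tail_df1_is_2(f_star, 16)
critical = f_critical_df1_is_2(0.05, 16)

print(f"critical value = {critical:.2f}")
print(f"p-value        = {p_value:.3f}")
print("reject H0" if f_star > critical else "fail to reject H0")
```

The two approaches always agree: F* exceeds the critical value exactly when the p-value is below the level of significance.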


12.2: The Logic Behind ANOVA

• Many experiments are conducted to determine the effect that different levels of some test factor have on a response variable.

• Single-factor ANOVA: obtain independent random samples at each of several levels of the factor being tested.

• Draw a conclusion concerning the effect that the levels of the test factor have on the response variable.


The Logic of the Analysis of Variance Technique:

1. In order to compare the means of the levels of the test factor, a measure of the variation between the levels (columns), the MS(factor), is compared to a measure of the variation within the levels, MS(error).

2. If the MS(factor) is significantly larger than the MS(error), then the means for each of the factor levels are not all the same.

This implies the factor being tested has a significant effect on the response variable.

3. If the MS(factor) is not significantly larger than the MS(error), we cannot reject the null hypothesis that all means are equal.
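To make the comparison of MS(factor) with MS(error) concrete, the sketch below computes F* for two made-up data sets that share the same sample means (25, 30, 35) but differ in within-level spread. The numbers are hypothetical and chosen only for illustration, and the sums of squares use the definitional (deviation) form rather than the shortcut formulas.

```python
def f_statistic(samples):
    """F* = MS(factor) / MS(error) for a list of samples, one per level."""
    all_x = [x for s in samples for x in s]
    n, c = len(all_x), len(samples)
    grand_mean = sum(all_x) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-level variation: level means around the grand mean.
    ss_factor = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-level variation: observations around their own level mean.
    ss_error = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ss_factor / (c - 1)) / (ss_error / (n - c))


# Hypothetical data: identical level means, different within-level spread.
tight = [[24, 25, 26], [29, 30, 31], [34, 35, 36]]
wide = [[15, 25, 35], [20, 30, 40], [25, 35, 45]]

f_tight = f_statistic(tight)
f_wide = f_statistic(wide)
print(f_tight, f_wide)
```

With little within-level variation the ratio is large; with the same means but widely spread observations the ratio falls below 1, so the same differences among sample means no longer stand out from chance variation.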


Example: Do the box-and-whisker plots below show sufficient evidence to indicate a difference in the three population means?

[Figure: box-and-whisker plots of Time for Level 1, Level 2, and Level 3; horizontal axis from 20 to 40.]


Solution:

1. The box-and-whisker plots show the relationship among the three samples.

2. The plots suggest the three sample means are different from each other.

3. This suggests the population means are different.

4. There is relatively little within-sample variation, but a relatively large amount of between-sample variation.


Example: Do the box-and-whisker plots below show sufficient evidence to indicate a difference in the four population means?

[Figure: box-and-whisker plots of Speed for Level 1, Level 2, Level 3, and Level 4; horizontal axis from 60 to 140.]


Solution:

1. The box-and-whisker plots show the relationship among the four samples.

2. The plots suggest the four sample means are not different from each other.

3. There is relatively little between-sample variation, but a relatively large amount of within-sample variation.

The data values within each sample cover a relatively wide range of values.


Assumptions:

1. Goal: to investigate the effect of various levels of a factor on a response variable.

a. We would like to know which level is most advantageous.

b. Probably want to reject H0 in favor of Ha.

c. A follow-up study might determine the “best” level of the factor.

2. a. The effects due to chance and due to untested factors are normally distributed.

b. The variance is constant throughout the experiment.

3. a. All observations are independent.

b. The data is gathered (or tests are conducted) in a randomized order to ensure independence.


12.3: Applications of Single-Factor ANOVA

• Consider the notation used in ANOVA.

• Each observation has two subscripts: the first indicates the column number (test factor level); the second identifies the replicate (row) number.

• The column totals: $C_i$

• The grand total (sum of all x’s): T


Notation used in ANOVA:

                   Factor Levels
Replication        Sample from    Sample from    Sample from    ...    Sample from
                   Level 1        Level 2        Level 3               Level c
k = 1              $x_{1,1}$      $x_{2,1}$      $x_{3,1}$      ...    $x_{c,1}$
k = 2              $x_{1,2}$      $x_{2,2}$      $x_{3,2}$      ...    $x_{c,2}$
k = 3              $x_{1,3}$      $x_{2,3}$      $x_{3,3}$      ...    $x_{c,3}$
Column Totals      $C_1$          $C_2$          $C_3$          ...    $C_c$          T

T = grand total = sum of all x's = $\sum x = \sum C_i$


Mathematical Model for Single-Factor ANOVA:

$$x_{c,k} = \mu + F_c + \epsilon_{k(c)}$$

1. $\mu$: mean value for all the data without respect to the test factor.

2. $F_c$: effect of factor (level) c on the response variable.

3. $\epsilon_{k(c)}$: experimental error that occurs among the k replicates in each of the c columns.
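The model can be illustrated by splitting each observation into its three parts. The sketch below uses the applicator data from Section 12.1 and one common choice of estimates, stated here as an assumption: the grand mean for $\mu$, the level mean minus the grand mean for $F_c$, and the leftover deviation for $\epsilon_{k(c)}$. The variable names are illustrative.

```python
from statistics import mean

# Drying times (minutes) from the applicator example, keyed by level.
data = {
    "Brush":  [39.1, 39.4, 31.1, 33.7, 30.5, 34.6],
    "Roller": [31.6, 33.4, 30.2, 41.8, 33.9],
    "Pad":    [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5],
}

all_x = [x for sample in data.values() for x in sample]
mu_hat = mean(all_x)  # estimate of mu: mean of all the data

# Estimated effect of each level: level mean minus grand mean.
effect_hat = {lvl: mean(s) - mu_hat for lvl, s in data.items()}

# Estimated experimental error: what is left after mu and the level effect.
residuals = {
    lvl: [x - mu_hat - effect_hat[lvl] for x in s] for lvl, s in data.items()
}

# Each observation is recovered as mu + F_c + error.
rebuilt = {
    lvl: [mu_hat + effect_hat[lvl] + e for e in residuals[lvl]] for lvl in data
}

# The squared residuals add up to SS(error).
ss_resid = sum(e * e for errs in residuals.values() for e in errs)
print(round(mu_hat, 2), {k: round(v, 2) for k, v in effect_hat.items()})
print(round(ss_resid, 2))
```

Under these estimates the squared errors sum to the SS(error) of the ANOVA table, which ties the model back to the partition of the total sum of squares.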


Example: A study was conducted to determine the effectiveness of various drugs on post-operative pain. The purpose of the experiment was to decide if there is any difference in length of pain relief due to drug. Eighty patients with similar operations were selected at random and split into four groups. Each patient was given one of four drugs and checked regularly. The length of pain relief (in hours) was recorded for each patient. At the 0.05 level of significance, is there any evidence to reject the claim that the four drugs are equally effective?

Note:

1. The data is omitted here.

2. The ANOVA table is given in a later slide.


Solution:

1. The Set-up:

a. Population parameter of interest: The mean time of pain relief for each factor (drug).

b. The null and alternative hypothesis:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

Ha: the means are not all equal.

2. The Hypothesis Test Criteria:

a. Assumptions: The patients were randomly assigned to drug and their times are independent of each other. The effects due to chance and untested factors are assumed to be normally distributed.

b. Test statistic: F* with df(numerator) = df(factor) = 3 and df(denominator) = df(error) = 80 - 4 = 76

c. Level of significance: $\alpha$ = 0.05


3. The Sample Evidence:

a. Sample information: The ANOVA table:

Source    df    SS        MS
Factor     3     70.84    23.61
Error     76    226.05     2.97
Total     79    296.89

b. Calculate the value of the test statistic:

$$F^* = \frac{\text{MS(factor)}}{\text{MS(error)}} = \frac{23.61}{2.97} = 7.95$$
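When only the ANOVA table is available, as in this example, its internal consistency can still be checked before the test statistic is used. The sketch below assumes the table entries and the critical value of about 2.72 reported for this example; the dictionary layout is illustrative.

```python
# ANOVA table for the pain-relief example: source -> (df, SS).
table = {
    "Factor": (3, 70.84),
    "Error": (76, 226.05),
    "Total": (79, 296.89),
}

df_factor, ss_factor = table["Factor"]
df_error, ss_error = table["Error"]
df_total, ss_total = table["Total"]

# The sums of squares and the degrees of freedom must check.
assert df_factor + df_error == df_total
assert abs(ss_factor + ss_error - ss_total) < 1e-6

ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_star = ms_factor / ms_error

critical_value = 2.72  # approximate F(3, 76, 0.05) as reported for this example
print(f"F* = {f_star:.2f}")
print("reject H0" if f_star > critical_value else "fail to reject H0")
```

Without rounding the mean squares first, the ratio comes out near 7.94 rather than 7.95; either way it is well inside the critical region.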


4. The Probability Distribution (Classical Approach):

a. Critical value: F(3, 76, 0.05) $\approx$ 2.72

b. F* is in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value:

P = P(F* > 7.95, with dfn = 3, dfd = 76) < 0.01

By computer: P $\approx$ 0.0001

b. The p-value is smaller than the level of significance, $\alpha$.

5. The Results:

a. Decision: Reject H0.

b. Conclusion: There is evidence to suggest that not all drugs have the same effect on length of pain relief.