Analysis of Variance Chapter 15


15.1 Introduction

• Analysis of variance compares two or more populations of interval data.

• Specifically, we are interested in determining whether differences exist between the population means.

• The procedure works by analyzing the sample variance.

• The analysis of variance is a procedure that tests whether differences exist between two or more population means.

• To do this, the technique analyzes the sample variances.

15.2 One Way Analysis of Variance

• Example 15.1
  – An apple juice manufacturer is planning to develop a new product, a liquid concentrate.
  – The marketing manager has to decide how to market the new product.
  – Three strategies are considered:
    • Emphasize the convenience of using the product.
    • Emphasize the quality of the product.
    • Emphasize the product’s low price.

• Example 15.1 (continued)
  – An experiment was conducted as follows:
    • An advertising campaign was launched in each of three cities.
    • In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
    • The weekly sales were recorded for twenty weeks following the beginning of the campaigns.

Weekly sales (see file Xm15-01)

Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532

• Solution
  – The data are interval.
  – The problem objective is to compare sales in three cities.
  – We hypothesize that the three population means are equal.

Defining the Hypotheses

• Solution

$H_0: \mu_1 = \mu_2 = \mu_3$

$H_1$: At least two means differ

To build the statistic needed to test the hypotheses, use the following notation:

Notation

Independent samples are drawn from k populations (treatments).

               Treatment 1   Treatment 2   ...   Treatment k
               x_{11}        x_{12}        ...   x_{1k}
               x_{21}        x_{22}        ...   x_{2k}
               ...           ...                 ...
               x_{n_1,1}     x_{n_2,2}     ...   x_{n_k,k}
Sample size    n_1           n_2           ...   n_k
Sample mean    \bar{x}_1     \bar{x}_2     ...   \bar{x}_k

Here x_{11} is the first observation of the first sample, and x_{22} is the second observation of the second sample.

X is the “response variable”. The variable’s values are called “responses”.

Terminology

• In the context of this problem:
  – Response variable: weekly sales.
  – Responses: actual sales values.
  – Experimental unit: the weeks in the three cities when we record sales figures.
  – Factor: the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
  – Factor levels: the population (treatment) names. In this problem the factor levels are the marketing strategies.

The rationale of the test statistic

Two types of variability are employed when testing for the equality of the population means.

Graphical demonstration: employing two types of variability

[Figure: two plots of three samples (Treatment 1, Treatment 2, Treatment 3) with the same sample means in both panels, $\bar{x}_1 = 10$, $\bar{x}_2 = 15$, $\bar{x}_3 = 20$; the observations lie close to the sample means in one panel and are widely spread in the other.]

• A small variability within the samples makes it easier to draw a conclusion about the population means.

• The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.


The rationale behind the test statistic – I

• If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).

• If the alternative hypothesis is true, at least some of the sample means would differ.

• Thus, we measure variability between sample means.

Variability between sample means

• The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean.

This sum is called the Sum of Squares for Treatments (SST).

In our example, treatments are represented by the different advertising strategies.

Sum of squares for treatments (SST)

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$

where k is the number of treatments, $n_j$ is the size of sample j, $\bar{x}_j$ is the mean of sample j, and $\bar{\bar{x}}$ is the grand mean.

Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.

Sum of squares for treatments (SST)

• Solution (continued): calculate SST

The sample means are $\bar{x}_1 = 577.55$, $\bar{x}_2 = 653.00$, $\bar{x}_3 = 608.65$.

The grand mean is calculated by

$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$$

which gives $\bar{\bar{x}} = 613.07$ here, so that

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2 = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$$
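The SST arithmetic can be checked with a few lines of Python. This is only a sketch: the function names `grand_mean` and `sum_of_squares_treatments` are illustrative, not part of the original slides. The sample sizes and means are the ones quoted for Example 15.1.

```python
# Sample sizes and sample means for Example 15.1
# (convenience, quality, price), as quoted in the slides.
sizes = [20, 20, 20]
means = [577.55, 653.00, 608.65]


def grand_mean(sizes, means):
    """Weighted average of the sample means; the weights are the sample sizes."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def sum_of_squares_treatments(sizes, means):
    """SST = sum over treatments of n_j * (mean_j - grand mean)^2."""
    g = grand_mean(sizes, means)
    return sum(n * (m - g) ** 2 for n, m in zip(sizes, means))


g = grand_mean(sizes, means)
sst = sum_of_squares_treatments(sizes, means)
print(f"grand mean = {g:.2f}, SST = {sst:.2f}")
```

The unrounded grand mean is used inside the function, so the result agrees with the slide value to within rounding.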

Is SST = 57,512.23 large enough to reject H0 in favor of H1?

See next.

The rationale behind the test statistic – II

• Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.

• Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the “within samples variability”.

Within samples variability

• The variability within samples is measured by adding all the squared distances between observations and their sample means.

This sum is called the Sum of Squares for Error (SSE).

In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all three cities).

Sum of squares for errors (SSE)

• Solution (continued): calculate SSE

The sample variances are $s_1^2 = 10{,}775.00$, $s_2^2 = 7{,}238.11$, $s_3^2 = 8{,}670.24$.

$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$

$$= (20 - 1)(10{,}775.00) + (20 - 1)(7{,}238.11) + (20 - 1)(8{,}670.24) \approx 506{,}983.50$$

Is SST = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that all the means are equal?

The mean sum of squares

To perform the test we need to calculate the mean squares as follows:

Calculation of MST (Mean Square for Treatments)

$$MST = \frac{SST}{k - 1} = \frac{57{,}512.23}{3 - 1} = 28{,}756.12$$

Calculation of MSE (Mean Square for Error)

$$MSE = \frac{SSE}{n - k} = \frac{506{,}983.50}{60 - 3} = 8{,}894.45$$

Calculation of the test statistic

$$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$

with the following degrees of freedom: $\nu_1 = k - 1$ and $\nu_2 = n - k$.

Required Conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
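The whole chain from raw data to the F statistic can be reproduced in plain Python. This is a minimal sketch: the helper name `one_way_anova` and the dictionary keys are assumptions made for illustration, and the three lists are transcribed from the weekly sales table of Example 15.1.

```python
from statistics import mean

# Weekly sales for the three campaigns (Example 15.1, file Xm15-01).
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]


def one_way_anova(samples):
    """Return SST, SSE, MST, MSE, F and the degrees of freedom (k-1, n-k)."""
    k = len(samples)                      # number of treatments
    n = sum(len(s) for s in samples)      # total number of observations
    grand = sum(sum(s) for s in samples) / n
    sample_means = [mean(s) for s in samples]

    # Between-treatments variation.
    sst = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, sample_means))
    # Within-treatments variation.
    sse = sum((x - m) ** 2 for s, m in zip(samples, sample_means) for x in s)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "df": (k - 1, n - k)}


result = one_way_anova([convenience, quality, price])
for name, value in result.items():
    print(name, value)
```

The function does not check the two required conditions (normality and equal variances); those have to be assessed separately.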

The F test rejection region

And finally the hypothesis test:

$H_0: \mu_1 = \mu_2 = \dots = \mu_k$

$H_1$: At least two means differ

Test statistic: $F = \dfrac{MST}{MSE}$

R.R.: $F > F_{\alpha,\,k-1,\,n-k}$

The F test

$H_0: \mu_1 = \mu_2 = \mu_3$

$H_1$: At least two means differ

Test statistic: $F = \dfrac{MST}{MSE} = \dfrac{28{,}756.12}{8{,}894.45} = 3.23$

R.R.: $F > F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,3-1,\,60-3} \approx 3.15$

Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favor of H1 and to conclude that at least one of the mean sales differs from the others.

The F test p-value

• Use Excel to find the p-value
  – fx Statistical FDIST(3.23,2,57) ≈ .047

p-value = P(F > 3.23) ≈ .047

[Figure: density of the F distribution, horizontal axis from 0 to 4, illustrating the p-value as the upper-tail area P(F > 3.23).]
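The p-value can also be checked without Excel. When the numerator degrees of freedom equal 2, as in this example (k - 1 = 2), the upper-tail probability of the F distribution has the closed form P(F > f) = (1 + 2f/ν2)^(-ν2/2). The sketch below relies on that special case only; the function name is illustrative. For other numerator degrees of freedom a statistics library would be needed, for instance `scipy.stats.f.sf`.

```python
def f_upper_tail_df1_equals_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with (2, df2) degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    if f_stat < 0:
        raise ValueError("The F statistic cannot be negative.")
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# Unrounded F statistic from Example 15.1: MST / MSE.
f_stat = (57512.23 / 2) / (506983.50 / 57)
p_value = f_upper_tail_df1_equals_2(f_stat, 57)
print(f"F = {f_stat:.4f}, p-value = {p_value:.4f}")
```

With the unrounded F statistic, the result should agree with the P-value that Excel reports in its single factor ANOVA output, up to rounding.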

Excel single factor ANOVA (Xm15-01.xls)

Anova: Single Factor

SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.55    10775.00
Quality       20      13060   653.00    7238.11
Price         20      12173   608.65    8670.24

ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        57512    2    28756   3.23   0.0468    3.16
Within Groups         506984   57   8894
Total                 564496   59

SS(Total) = SST + SSE

15.3 Analysis of Variance Experimental Designs

• Several elements may distinguish one experimental design from another.
  – The number of factors.
    • Each characteristic investigated is called a factor.
    • Each factor has several levels.

[Figure: one-way ANOVA (single factor), where each treatment is one level of the factor, compared with two-way ANOVA (two factors, Factor A and Factor B, each with several levels); the vertical axis in both panels is the response.]

Independent samples or blocks

• Groups of matched observations are formed into blocks, in order to remove the effects of “unwanted” variability.

• By doing so we improve the chances of detecting the variability of interest.

Models of Fixed and Random Effects

• Fixed effects
  – If all possible levels of a factor are included in our analysis, we have a fixed-effect ANOVA.
  – The conclusion of a fixed-effect ANOVA applies only to the levels studied.

• Random effects
  – If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA.
  – The conclusion of the random-effect ANOVA applies to all the levels (not only those studied).

Models of Fixed and Random Effects

• In some ANOVA models the test statistic of the fixed-effects case may differ from the test statistic of the random-effects case.

• Fixed and random effects: examples
  – Fixed effects: the advertisement example (15.1). All the levels of the marketing strategies were included.
  – Random effects: to determine whether there is a difference in the production rate of 50 machines, four machines are randomly selected and their production is recorded.

15.4 Randomized Blocks (Two-way) Analysis of Variance

• The purpose of designing a randomized block experiment is to reduce the within-treatments variation, thus increasing the relative amount of between-treatments variation.

• This helps in detecting differences between the treatment means more easily.

Randomized Blocks

Block all the observations with some commonality across treatments.

[Figure: observations under Treatment 1 through Treatment 4 grouped into Block 1, Block 2, and Block 3.]

Randomized Blocks

Block all the observations with some commonality across treatments.

                 Treatment
Block            1              2              ...   k              Block mean
1                x_{11}         x_{12}         ...   x_{1k}         \bar{x}[B]_1
2                x_{21}         x_{22}         ...   x_{2k}         \bar{x}[B]_2
...
b                x_{b1}         x_{b2}         ...   x_{bk}         \bar{x}[B]_b
Treatment mean   \bar{x}[T]_1   \bar{x}[T]_2   ...   \bar{x}[T]_k

Partitioning the total variability

• The total sum of squares is partitioned into three sources of variation:
  – Treatments
  – Blocks
  – Within samples (Error)

SS(Total) = SST + SSB + SSE

where SST is the sum of squares for treatments, SSB is the sum of squares for blocks, and SSE is the sum of squares for error.

Recall: for the independent samples design we have SS(Total) = SST + SSE.

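Calculating the sums of squares

• Formulas for the calculation of the sums of squares

Let $x_{ij}$ be the observation in block i (i = 1, ..., b) under treatment j (j = 1, ..., k), $\bar{x}[T]_j$ the mean of treatment j, $\bar{x}[B]_i$ the mean of block i, and $\bar{\bar{x}}$ the grand mean.

$$SS(Total) = \sum_{j=1}^{k}\sum_{i=1}^{b} (x_{ij} - \bar{\bar{x}})^2 = (x_{11} - \bar{\bar{x}})^2 + (x_{21} - \bar{\bar{x}})^2 + \dots + (x_{bk} - \bar{\bar{x}})^2$$

$$SST = \sum_{j=1}^{k} b\,(\bar{x}[T]_j - \bar{\bar{x}})^2 = b(\bar{x}[T]_1 - \bar{\bar{x}})^2 + b(\bar{x}[T]_2 - \bar{\bar{x}})^2 + \dots + b(\bar{x}[T]_k - \bar{\bar{x}})^2$$

$$SSB = \sum_{i=1}^{b} k\,(\bar{x}[B]_i - \bar{\bar{x}})^2 = k(\bar{x}[B]_1 - \bar{\bar{x}})^2 + k(\bar{x}[B]_2 - \bar{\bar{x}})^2 + \dots + k(\bar{x}[B]_b - \bar{\bar{x}})^2$$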

Calculating the sums of squares (continued)

With $x_{ij}$ the observation in block i under treatment j, $\bar{x}[T]_j$ the mean of treatment j, $\bar{x}[B]_i$ the mean of block i, and $\bar{\bar{x}}$ the grand mean, the sum of squares for error is

$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{b} (x_{ij} - \bar{x}[T]_j - \bar{x}[B]_i + \bar{\bar{x}})^2$$

$$= (x_{11} - \bar{x}[T]_1 - \bar{x}[B]_1 + \bar{\bar{x}})^2 + (x_{21} - \bar{x}[T]_1 - \bar{x}[B]_2 + \bar{\bar{x}})^2 + \dots + (x_{bk} - \bar{x}[T]_k - \bar{x}[B]_b + \bar{\bar{x}})^2$$
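The randomized block sums of squares can be sketched in Python as follows. The function name `randomized_block_sums` and the small 3 x 3 data table are hypothetical and serve only to illustrate the formulas; they are not taken from Example 15.2. Rows are blocks and columns are treatments.

```python
def randomized_block_sums(table):
    """Sums of squares for a randomized block design.

    table[i][j] is the response in block i (row) under treatment j (column).
    """
    b = len(table)        # number of blocks
    k = len(table[0])     # number of treatments
    grand = sum(sum(row) for row in table) / (b * k)

    block_means = [sum(row) / k for row in table]
    treatment_means = [sum(table[i][j] for i in range(b)) / b for j in range(k)]

    sst = b * sum((m - grand) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    sse = sum(
        (table[i][j] - treatment_means[j] - block_means[i] + grand) ** 2
        for i in range(b)
        for j in range(k)
    )
    return {"SST": sst, "SSB": ssb, "SSE": sse, "SS(Total)": ss_total,
            "df_error": b * k - k - b + 1}


# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
example = [
    [10.0, 12.0, 15.0],
    [11.0, 14.0, 17.0],
    [9.0, 13.0, 14.0],
]
sums = randomized_block_sums(example)
print(sums)
```

Computing SS(Total) directly, rather than as SST + SSB + SSE, makes the partition identity usable as a check on the other three sums.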

Mean Squares

To perform hypothesis tests for treatments and blocks we need:

• Mean square for treatments: $MST = \dfrac{SST}{k - 1}$
• Mean square for blocks: $MSB = \dfrac{SSB}{b - 1}$
• Mean square for error: $MSE = \dfrac{SSE}{n - k - b + 1}$

Test statistics for the randomized block design ANOVA

Test statistic for treatments: $F = \dfrac{MST}{MSE}$

Test statistic for blocks: $F = \dfrac{MSB}{MSE}$

The F test rejection regions

• Testing the mean responses for treatments: $F > F_{\alpha,\,k-1,\,n-k-b+1}$

• Testing the mean responses for blocks: $F > F_{\alpha,\,b-1,\,n-k-b+1}$

Randomized Blocks ANOVA - Example

• Example 15.2
  – Are there differences in the effectiveness of cholesterol reduction drugs?
  – To answer this question the following experiment was organized:
    • 25 groups of men with high cholesterol were matched by age and weight. Each group consisted of 4 men.
    • Each person in a group received a different drug.
    • The cholesterol level reduction in two months was recorded.
  – Can we infer from the data in Xm15-02 that there are differences in mean cholesterol reduction among the four drugs?

Randomized Blocks ANOVA - Example

• Solution
  – Each drug can be considered a treatment.
  – The four records in each group can be blocked, because they are matched by age and weight.
  – This procedure eliminates the variability in cholesterol reduction related to different combinations of age and weight.
  – This helps detect differences in the mean cholesterol reduction attributed to the different drugs.

Randomized Blocks ANOVA - Example

ANOVA
Source of Variation    SS       df          MS       F                   P-value   F crit
Rows (blocks)          3848.7   24 (b-1)    160.36   10.11 (MSB / MSE)   0.0000    1.67
Columns (treatments)   196.0    3 (k-1)     65.32    4.12 (MST / MSE)    0.0094    2.73
Error                  1142.6   72          15.87
Total                  5187.2   99

Conclusion: At the 5% significance level there is sufficient evidence to infer that the mean cholesterol reductions achieved by at least two of the drugs differ.
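The entries of this ANOVA table can be cross-checked from the sums of squares and the design sizes alone. The sketch below assumes b = 25 blocks and k = 4 treatments, as described in Example 15.2; the variable names are illustrative.

```python
# Design of Example 15.2: 25 matched groups (blocks), 4 drugs (treatments).
b, k = 25, 4
n = b * k

# Sums of squares as reported in the ANOVA table.
ss = {"blocks": 3848.7, "treatments": 196.0, "error": 1142.6}

# Degrees of freedom for each source of variation.
df = {"blocks": b - 1, "treatments": k - 1, "error": n - k - b + 1}

# Mean squares: each sum of squares divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# F statistics for blocks and treatments, both relative to MSE.
f_blocks = ms["blocks"] / ms["error"]
f_treatments = ms["treatments"] / ms["error"]

print(df)
print({source: round(value, 2) for source, value in ms.items()})
print(round(f_blocks, 2), round(f_treatments, 2))
```

Small differences in the last digit relative to the Excel output are expected, because the table reports sums of squares rounded to one decimal place.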