Analysis of Variance

Dr. Mohammed Alahmed
Ph.D. in BioStatistics
[email protected]
(011) 4674108


Introduction

• Analysis of variance (ANOVA), as the name implies, is a statistical technique that analyzes variability in data in order to infer whether population means are unequal.

• The purpose of ANOVA is much the same as that of the t-tests presented in the preceding sections.

• The goal is to determine whether the mean differences that are obtained for sample data are sufficiently large to justify a conclusion that there are mean differences between the populations from which the samples were obtained.


• The difference between ANOVA and the t-tests is that ANOVA can be used in situations where there are more than two means being compared, whereas the t-tests are limited to situations where only two means are involved.

• If more than two means are compared, repeated use of the independent-samples t-test inflates the overall (familywise) Type I error rate above the level set for each individual t-test. This is known as the multiple comparison problem.
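A quick way to see the multiple comparison problem is to compute the chance of at least one false rejection across all pairwise t-tests. The Python sketch below treats the m = k(k - 1)/2 tests as independent, which is an idealization (pairwise tests that share a group are correlated), so the numbers are only approximate; `familywise_error` is a hypothetical helper name.

```python
from math import comb


def familywise_error(alpha: float, k: int) -> float:
    """Approximate P(at least one false rejection) when all pairwise
    t-tests among k groups are run, each at level alpha.

    Assumes the tests are independent, which is only an idealization."""
    m = comb(k, 2)  # number of pairwise comparisons, k*(k-1)/2
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5):
    print(f"k={k}: {comb(k, 2)} tests, familywise error ~ {familywise_error(0.05, k):.3f}")
```

With k = 3 groups (3 pairwise tests at α = 0.05), the approximate chance of at least one false rejection is about 0.14, nearly three times the nominal level.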


The basic ANOVA situation

• Variables in ANOVA:
  – The dependent variable is metric (quantitative).
  – The independent variable(s) are nominal with two or more levels, also called a treatment, manipulation, or factor.

• One-Way ANOVA:
  – Two variables: 1 categorical, 1 quantitative.
  – Main question: does the mean of the quantitative variable depend on which group (given by the categorical variable) the individual is in?
  – If the categorical variable has only 2 values, this reduces to the 2-sample t-test.
  – ANOVA allows for 3 or more groups.

ANOVA Hypotheses

• The null hypothesis: the means for all groups are the same (equal).
  H0: $\mu_1 = \mu_2 = \cdots = \mu_k$

• The alternative hypothesis: the means are different for at least one pair of groups.
  H1: $\mu_i \neq \mu_j$ for at least one pair $(i, j)$

• If we reject H0, how do we determine which means are significantly different?

• Before we begin, we must consider the assumptions required to use ANOVA:
  – The underlying distributions of the populations are normal.
  – The variance of each group is equal (this is critical for ANOVA).

• If all of the groups had the same means, the distributions for all of the populations would look exactly the same: their graphs would lie on top of one another.

• Now, if the means of the populations were different, the picture would look like this. Notice that the variability between the groups is much greater than the variability within a group.

Sources of variance

• When we take samples from each group, there will be two sources of variability:
  – Within-group variability: when we sample from a group, there will be variability from person to person in the same group.
  – Between-group variability: the difference from group to group.

• If the between-group variability is large relative to the within-group variability, the means of the groups are likely not to be the same.

• We can use the two types of variability to determine whether the means are likely different.

• Blue arrow: within-group variability; red arrow: between-group variability.

• Notice that when the distributions are separated, the between-group variability is much greater than the within-group variability.

Notation for ANOVA

All groups:
• n = number of individuals altogether
• k = number of groups
• $\bar{x}$ = mean for the entire data set

Group i has:
• $n_i$ = number of individuals in group i
• $x_{ij}$ = value for individual j in group i
• $\bar{x}_i$ = mean for group i
• $s_i$ = standard deviation for group i
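As a minimal sketch of this notation, the Python snippet below computes n, k, $n_i$, $\bar{x}_i$, $s_i$ and $\bar{x}$ for the blood-pressure data used in the example later in these notes. It assumes $s_i$ is the usual sample standard deviation (divisor $n_i - 1$), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Blood-pressure reductions from the worked example in these notes.
groups = {
    "Diet": [5, 9, 12, 8, 4],
    "Exercise": [6, 8, 3, 0, 2],
    "Medication": [10, 12, 9, 15, 13],
}

k = len(groups)                                      # number of groups
n = sum(len(xs) for xs in groups.values())           # individuals altogether
grand_mean = mean(x for xs in groups.values() for x in xs)  # x-bar, whole data set

for name, xs in groups.items():
    n_i = len(xs)       # individuals in group i
    xbar_i = mean(xs)   # mean for group i
    s_i = stdev(xs)     # sample standard deviation for group i (n_i - 1 divisor)
    print(f"{name:<10} n_i={n_i}  mean={xbar_i:.2f}  sd={s_i:.3f}")

print(f"k={k}  n={n}  grand mean={grand_mean:.3f}")
```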

Sources of variability

• ANOVA measures two sources of variation in the data and compares their relative sizes:
  1. Variation BETWEEN groups: for each data value, look at the squared difference between its group mean and the overall mean, $(\bar{x}_i - \bar{x})^2$.
  2. Variation WITHIN groups: for each data value, look at the squared difference between that value and the mean of its group, $(x_{ij} - \bar{x}_i)^2$.
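Each data value's deviation from the overall mean splits exactly into these two pieces, $x_{ij} - \bar{x} = (\bar{x}_i - \bar{x}) + (x_{ij} - \bar{x}_i)$. The sketch below checks this for every value in the example blood-pressure data and accumulates the squared pieces, which become the between-group and within-group sums of squares.

```python
from statistics import mean

# Blood-pressure reductions from the worked example (Diet, Exercise, Medication).
groups = [[5, 9, 12, 8, 4], [6, 8, 3, 0, 2], [10, 12, 9, 15, 13]]
grand_mean = mean(x for g in groups for x in g)

between_sq = 0.0  # running sum of (group mean - overall mean)^2, one term per value
within_sq = 0.0   # running sum of (value - group mean)^2
for g in groups:
    g_mean = mean(g)
    for x in g:
        between = g_mean - grand_mean  # variation BETWEEN groups
        within = x - g_mean            # variation WITHIN groups
        # The two pieces always add back to the value's deviation from the overall mean.
        assert abs((between + within) - (x - grand_mean)) < 1e-9
        between_sq += between ** 2
        within_sq += within ** 2

print(round(between_sq, 4), round(within_sq, 4))
```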

F-statistic

• The F-statistic assesses whether you can conclude that statistically significant differences are present somewhere among the group means.

• The F-statistic is the ratio of the between-group variation to the within-group variation:

$$F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}} = \frac{MS_B}{MS_W}$$

• This test statistic is compared to an F-table with k-1 and n-k degrees of freedom.

• A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

ANOVA Table

Source of variation   SS    df      MS                  F         p-value
Between               SSB   k - 1   MSB = SSB/(k - 1)   MSB/MSW
Within                SSW   n - k   MSW = SSW/(n - k)
Total                 SST   n - 1
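The table can be assembled directly from the raw samples. The function below (`one_way_anova_table`, a hypothetical name) is a minimal Python sketch that fills in the SS, df, MS and F columns; it leaves out the p-value, which needs the F distribution (an F table or a statistics library). It is applied here to the example data.

```python
from statistics import mean


def one_way_anova_table(groups):
    """Return SS, df, MS and F for each row of a one-way ANOVA table.

    `groups` is a list of samples, one list of numbers per group."""
    k = len(groups)                                 # number of groups
    n = sum(len(g) for g in groups)                 # total sample size
    grand = mean(x for g in groups for x in g)      # overall mean
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    msb, msw = ssb / df_b, ssw / df_w
    return {
        "Between": {"SS": ssb, "df": df_b, "MS": msb, "F": msb / msw},
        "Within": {"SS": ssw, "df": df_w, "MS": msw},
        "Total": {"SS": sst, "df": n - 1},
    }


# Example data: Diet, Exercise, Medication.
table = one_way_anova_table([[5, 9, 12, 8, 4], [6, 8, 3, 0, 2], [10, 12, 9, 15, 13]])
for source, row in table.items():
    print(source, {key: round(value, 3) for key, value in row.items()})
```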

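Total sum of squares (SST) = Within-group sum of squares (SSW) + Between-group sum of squares (SSB):

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i-\bar{x})^2$$

The last double sum simplifies to $\sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2$, since its terms do not depend on j.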
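The identity SST = SSW + SSB holds for any data set, balanced or not, because the cross term $2\sum_i(\bar{x}_i-\bar{x})\sum_j(x_{ij}-\bar{x}_i)$ vanishes (each inner sum over j is zero). The sketch below checks it numerically on randomly generated, hypothetical data with unequal group sizes, evaluating each sum straight from its double-sum definition.

```python
import random
from statistics import mean

random.seed(0)
# Hypothetical data: 4 groups of unequal size drawn from normal distributions.
groups = [[random.gauss(mu, 2.0) for _ in range(size)]
          for mu, size in [(10, 4), (12, 6), (9, 5), (14, 7)]]

grand = mean(x for g in groups for x in g)

# Each sum follows its double-sum definition: i runs over groups, j over members.
sst = sum((x - grand) ** 2 for g in groups for x in g)
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
ssb = sum((mean(g) - grand) ** 2 for g in groups for _ in g)  # = sum of n_i*(xbar_i - xbar)^2

print(f"SST = {sst:.6f}, SSW + SSB = {ssw + ssb:.6f}")
assert abs(sst - (ssw + ssb)) < 1e-9 * sst
```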

Example

• A researcher wishes to try three different techniques to lower the blood pressure of individuals diagnosed with high blood pressure.

• The subjects are randomly assigned to three groups; the first group takes medication, the second group exercises, and the third group follows a special diet.

• After four weeks, the reduction in each person’s blood pressure is recorded.


• The data are:

Diet   Exercise   Medication
5      6          10
9      8          12
12     3          9
8      0          15
4      2          13

• At α = 0.05, test the claim that there is no difference among the means.

• The hypotheses to be tested are:
  H0: μ1 = μ2 = μ3
  H1: At least one mean is different from the others.

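In SPSS, choose Analyze → Compare Means → One-Way ANOVA.

In the One-Way ANOVA window, place “Bp” in the Dependent List box and “Treatment” in the Factor box. (The data are entered in two columns: Bp holds each person’s reduction and Treatment records the group.)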

• To complete the process described in the text, select OK in this window without doing anything else. The resulting output is the ANOVA table shown below.

Since p < α, the null hypothesis is rejected.

• As mentioned in the text, this result allows us only to conclude that at least one (true) treatment mean differs from the others; we can say nothing about the relative sizes of the (true) treatment means.

• Further tests can be performed to determine which treatment mean(s) differ and, consequently, which (true) treatment mean(s) might have the highest (or lowest) values.
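The F-test for this example can be reproduced from the raw data as a cross-check. The Python sketch below computes F and its p-value using only the standard library; to avoid a statistics package it uses the closed form $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, which is valid only when the numerator degrees of freedom equal 2 (three groups, as here).

```python
from statistics import mean

# Blood-pressure reductions from the example (Diet, Exercise, Medication).
groups = [[5, 9, 12, 8, 4], [6, 8, 3, 0, 2], [10, 12, 9, 15, 13]]
alpha = 0.05

k = len(groups)
n = sum(len(g) for g in groups)
grand = mean(x for g in groups for x in g)
ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
df1, df2 = k - 1, n - k
f_stat = (ssb / df1) / (ssw / df2)  # F = MSB / MSW

# Upper-tail F probability. This closed form is valid ONLY for df1 == 2 (k == 3);
# for other k, use an F table or a statistics library.
assert df1 == 2
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

For these data the sketch gives F(2, 12) ≈ 9.17 with p ≈ 0.004, so H0 is rejected at α = 0.05, in line with the conclusion above.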

• Verifying the Assumptions for the One-Way ANOVA F-test

• The assumptions for the one-way ANOVA F-test, as expressed in the text, are:
  1. The populations from which the samples were obtained must be normally or approximately normally distributed.
  2. The samples must be independent of one another.
  3. The variances of the populations must be equal.

Assessing the normality and constant variance assumptions


Analyze → Descriptive Statistics → Explore

The “Tests of Normality” table provides results of the test of the following hypotheses:

H0: The population random variable is normally distributed.
H1: The population random variable is not normally distributed.

A large p-value (greater than α) means that normality is not rejected.

In an ideal Normal Q-Q Plot, the plotted points approximately fit a straight line.
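The coordinates behind a normal Q-Q plot can be computed directly: sort the sample and pair each value with the standard normal quantile at its plotting position. This Python sketch uses Blom's plotting positions (i - 3/8)/(n + 1/4), which is an assumption since several conventions exist and SPSS's exact choice is not stated here, and it reports the correlation of the points as a rough numeric measure of linearity. It works group by group on the example data; `normal_qq_points` and `pearson_r` are hypothetical helper names.

```python
from statistics import NormalDist, mean


def normal_qq_points(sample):
    """Return (theoretical normal quantile, observed value) pairs.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4), an assumed convention."""
    xs = sorted(sample)
    n = len(xs)
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(z, xs))


def pearson_r(pairs):
    """Correlation of the Q-Q points; values near 1 suggest a near-linear plot."""
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in pairs)
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


data = {"Diet": [5, 9, 12, 8, 4], "Exercise": [6, 8, 3, 0, 2], "Medication": [10, 12, 9, 15, 13]}
for name, sample in data.items():
    pts = normal_qq_points(sample)
    print(name, [(round(zq, 2), x) for zq, x in pts], "r =", round(pearson_r(pts), 3))
```

With only five values per group, the plot and the correlation are rough guides rather than decisive checks.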

If the error bars are close to each other in length, as appears to be the case here, one might expect the constant variance assumption to be approximately valid.

The second useful table is the “Test of Homogeneity of Variances” table shown below.

The test to use here is the one “Based on Median” (Levene’s test computed from deviations about each group’s median). This table provides results of the test of the following hypotheses:

H0: The population variances are equal.
H1: The population variances are not equal.

The p-value given in the last column is large (greater than α = 0.05), so the hypothesis of equal variances is not rejected: the constant variance assumption may be assumed valid.
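Levene’s test “Based on Median” (often called the Brown-Forsythe test) is a one-way ANOVA F-test applied to the absolute deviations of each value from its group median. The sketch below computes that statistic for the example data; `levene_median` is a hypothetical helper name, and the p-value step again uses the F closed form that is valid only for 2 numerator degrees of freedom (three groups). It is meant to mirror the “Based on Median” row of the SPSS table, not to reproduce SPSS output exactly.

```python
from statistics import mean, median


def levene_median(groups):
    """Levene statistic based on the median: a one-way ANOVA F computed on
    z_ij = |x_ij - median_i|. Returns (statistic, df1, df2)."""
    z = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    grand = mean(v for g in z for v in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ssw = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k)), k - 1, n - k


groups = [[5, 9, 12, 8, 4], [6, 8, 3, 0, 2], [10, 12, 9, 15, 13]]
w, df1, df2 = levene_median(groups)
# Upper-tail F probability; this closed form holds only for df1 == 2 (three groups).
assert df1 == 2
p = (1 + 2 * w / df2) ** (-df2 / 2)
print(f"W = {w:.3f}, df = ({df1}, {df2}), p = {p:.3f}")
```

For these data the statistic is about 0.20 with p ≈ 0.82, well above 0.05, which is consistent with not rejecting equal variances.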

Verifying the validity of the independence assumption: the validity of the independence assumption can be difficult to assess. The best approach is to ensure independence of the samples through proper sampling and data collection practices, such as the random assignment of subjects to groups used in this example.

Pair-wise Comparisons of Treatment Means

• Where’s the difference?
  – Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?
  – We can do pairwise comparisons to determine which specific means are different, but we must still take into account the multiple comparison problem.

In the One-Way ANOVA window, click Post Hoc to choose a pairwise comparison procedure.

We conclude that there is a significant difference between the medication group and the exercise group, but no significant difference between the medication and diet groups or between the exercise and diet groups.
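The notes do not say which procedure is selected under Post Hoc; as one common choice, the sketch below applies Scheffé’s test to all three pairs. Each pair’s statistic $F_s = (\bar{x}_i - \bar{x}_j)^2 / [MS_W(1/n_i + 1/n_j)]$ is compared with the critical value $F' = (k - 1)\,F_{\alpha}(k - 1, n - k)$; the F critical value uses a closed form that is valid only for k = 3 (2 numerator degrees of freedom).

```python
from itertools import combinations
from statistics import mean

groups = {
    "Diet": [5, 9, 12, 8, 4],
    "Exercise": [6, 8, 3, 0, 2],
    "Medication": [10, 12, 9, 15, 13],
}
alpha = 0.05

k = len(groups)
n = sum(len(g) for g in groups.values())
df1, df2 = k - 1, n - k
# Pooled within-group variance (MSW from the ANOVA table).
msw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g) / df2

# F critical value for (2, df2); this closed form is valid only when df1 == 2.
assert df1 == 2
f_crit = df2 / 2 * (alpha ** (-2 / df2) - 1)
scheffe_crit = df1 * f_crit  # Scheffe critical value F' = (k - 1) * F_crit

results = {}
for a, b in combinations(groups, 2):
    ga, gb = groups[a], groups[b]
    fs = (mean(ga) - mean(gb)) ** 2 / (msw * (1 / len(ga) + 1 / len(gb)))
    results[(a, b)] = fs > scheffe_crit
    print(f"{a} vs {b}: F_s = {fs:.2f}, significant = {fs > scheffe_crit}")
```

For the example data only the Exercise vs Medication comparison exceeds the critical value (about 7.77), matching the conclusion above.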
