
Analysis of variance (ANOVA)-the General Linear Model (GLM)

Kazimieras Pukėnas

INTRODUCTION

Analysis of variance (ANOVA) is used to uncover the main and interaction effects of categorical independent variables (called "factors") on an interval dependent variable. The General Linear Model is "general" in the sense that one may implement both regression and ANOVA models.

The GLM Univariate procedure provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables. The factor variables divide the population into groups. Using this GLM procedure, you can test null hypotheses about the effects of other variables on the means of various groupings of a single dependent variable. You can investigate interactions between factors as well as the effects of individual factors. In addition, the effects of covariates and covariate interactions with factors can be included.


The GLM Multivariate procedure provides analysis of variance for multiple dependent variables by one or more factor variables or covariates.

The GLM Repeated Measures procedure provides analysis of variance when the same measurement is made several times on each subject or case. If between-subjects factors are specified, they divide the population into groups. Using this general linear model procedure, you can test null hypotheses about the effects of both the between-subjects factors and the within-subjects factors. You can investigate interactions between factors as well as the effects of individual factors. In addition, the effects of constant covariates and covariate interactions with the between-subjects factors can be included.


GLM Univariate, one-way ANOVA

One-way ANOVA tests differences in a single interval dependent variable among two, three, or more groups formed by the categories of a single categorical independent variable (factor).

Data requirements: In all GLM models, the dependent variable(s) X1…Xk is/are continuous. The independent variables may be categorical factors (including both numeric and string types) or quantitative covariates.

The data are a random sample from a normal population. The variance(s) of the dependent variable(s) is/are assumed to be the same for each cell formed by the categories of the factor(s). Analysis of variance is robust to departures from normality, although the data should be symmetric. To check the assumptions, you can use tests of homogeneity of variances.


One-way ANOVA can be explained very briefly and informally as follows.

The idea of the analysis of variance is to take a summary of the variability in all the observations and partition it into separate sources. The total sum of squares, SST, is partitioned into two separate, additive pieces: a sum of squares among (between) groups, SSA (also written SSB), and a sum of squares within groups, SSW:

$$SST = SSA + SSW,$$

where

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X} \right)^2; \qquad SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_i \right)^2; \qquad SSA = \sum_{i=1}^{k} n_i \left( \bar{X}_i - \bar{X} \right)^2.$$


Here $X_{ij}$ is the jth observation in the ith group; $\bar{X}$ is the overall mean of all samples; $\bar{X}_i$ is the sample mean for the ith group; k is the number of independent groups (populations); $n_i$ is the size of the ith group.

The ratio MSA/MSW, where MSA = SSA/(k − 1) and MSW = SSW/(n − k) are the among-groups and within-groups mean squares and n is the total number of observations, serves as a measure of the statistical significance of the differences among the group means, because MSA ≈ MSW if the null hypothesis is true, i.e. $\mu_1 = \mu_2 = \dots = \mu_k$ (the homogeneity of variances is assumed).
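The partition of the total sum of squares and the resulting F ratio can be sketched in a few lines of Python. This is only an illustration on made-up data, not the SPSS implementation: the function name `one_way_anova` and the sample values are assumptions, and the mean squares are taken as MSA = SSA/(k − 1) and MSW = SSW/(n − k), with n the total number of observations.

```python
from statistics import fmean


def one_way_anova(groups):
    """Partition SST into SSA + SSW and return the F ratio MSA/MSW.

    `groups` holds one list of observations per group (hypothetical data).
    """
    k = len(groups)                               # number of groups
    n = sum(len(g) for g in groups)               # total number of observations
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]

    # Total, within-groups and among-groups sums of squares.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

    msa = ssa / (k - 1)                           # among-groups mean square
    msw = ssw / (n - k)                           # within-groups mean square
    return {"SST": sst, "SSA": ssa, "SSW": ssw, "F": msa / msw}


# Made-up example: three groups of three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print({name: round(value, 3) for name, value in result.items()})
```

For these made-up data the partition is SST = 32, SSA = 26 and SSW = 6, which gives F = 13. The p-value would additionally require the F distribution with (k − 1, n − k) degrees of freedom, which is left out of this sketch.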


The statistical hypotheses under consideration:

$$H_0: \mu_1 = \mu_2 = \dots = \mu_k; \qquad H_1: \text{at least two means differ}.$$

Decision rule: the null hypothesis $H_0$ is rejected (not all means are equal) if $p < \alpha$; the null hypothesis $H_0$ is not rejected (no significant difference between the means is found) if $p \ge \alpha$, where $\alpha$ is the significance level.


GLM Univariate, two-way ANOVA

Two-way ANOVA analyzes one interval dependent variable in terms of the categories (groups) formed by two independent variables (factors), one of which may be conceived as a control variable, and tests the interaction of the two independent variables.

Data requirements are similar to those of one-way ANOVA: the data are a random sample from a normal population; in the population, all cell variances are the same.

Analysis of variance is robust to departures from normality, although the data should be symmetric.


The two-way ANOVA tests three hypotheses: the main effect of factor A; the main effect of factor B; the interaction effect of the two factors. For interval-scale dependent variables

$$X_{ij} \sim N(\mu_{ij}, \sigma^2), \qquad i = 1, \dots, a; \quad j = 1, \dots, b,$$

with unknown means $\mu_{ij}$ and variance $\sigma^2$, where a is the number of categories of factor A and b is the number of categories of factor B, we can test the hypotheses:

$$H_0: \mu_{1\Sigma} = \mu_{2\Sigma} = \dots = \mu_{a\Sigma}; \qquad H_1: \text{at least two means differ},$$

where the null hypothesis $H_0$ is that factor A has no influence on the response variable;


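$$H_0: \mu_{\Sigma 1} = \mu_{\Sigma 2} = \dots = \mu_{\Sigma b}; \qquad H_1: \text{at least two means differ},$$

where the null hypothesis $H_0$ is that factor B has no influence on the response variable;

$$H_0: \mu_{ij} - \mu_{i\Sigma} - \mu_{\Sigma j} + \mu = 0 \ \text{(for all } i, j\text{)}; \qquad H_1: \mu_{ij} - \mu_{i\Sigma} - \mu_{\Sigma j} + \mu \ne 0 \ \text{(for at least one pair } i, j\text{)},$$

where the null hypothesis $H_0$ assumes that there is no interaction effect of the two factors. Here

$$\mu_{i\Sigma} = \frac{1}{b} \sum_{j=1}^{b} \mu_{ij}; \qquad \mu_{\Sigma j} = \frac{1}{a} \sum_{i=1}^{a} \mu_{ij};$$

and $\mu$ is the overall mean.

Each null hypothesis $H_0$ is rejected if $p < \alpha$, where $\alpha$ is the significance level.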
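The interaction hypothesis may be easier to read with numbers. The sketch below computes the quantity μij − μiΣ − μΣj + μ for every cell of a table of cell means. The function name `interaction_terms` and both tables are made up for illustration. The code works on given means only, so it is descriptive: the actual test compares mean squares through an F ratio.

```python
def interaction_terms(cell_means):
    """Return mu_ij - mu_i. - mu_.j + mu for every cell of an a x b table.

    `cell_means[i][j]` is the mean for level i of factor A and level j of
    factor B (hypothetical values, not taken from the slides).
    """
    a = len(cell_means)
    b = len(cell_means[0])
    row_means = [sum(row) / b for row in cell_means]                   # factor A margins
    col_means = [sum(cell_means[i][j] for i in range(a)) / a
                 for j in range(b)]                                    # factor B margins
    overall = sum(row_means) / a                                       # overall mean
    return [[cell_means[i][j] - row_means[i] - col_means[j] + overall
             for j in range(b)]
            for i in range(a)]


additive = [[10, 12], [13, 15]]        # factor B adds 2 in both rows: no interaction
non_additive = [[10, 12], [13, 19]]    # factor B adds 2 in one row and 6 in the other

print(interaction_terms(additive))
print(interaction_terms(non_additive))
```

In the additive table every term is zero, which is exactly the situation described by the interaction null hypothesis; in the second table the terms are +1 and −1, so the effect of one factor depends on the level of the other.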


The one-way and two-way ANOVA procedures in SPSS are performed in a similar manner; therefore, we present step-by-step instructions for performing a two-way ANOVA only.

Open the data file to be analyzed. From the menus choose: Analyze > General Linear Model > Univariate...

Select a dependent variable in the Univariate dialog box (Fig. 1) and select variables for Fixed Factor(s), Random Factor(s), and Covariate(s), as appropriate for your data. Covariates are interval-level independent variables. They are commonly used as control variables, so that the main and interaction effects of the categorical variables on a continuous dependent variable are tested while controlling for the effects of selected other continuous variables that covary with the dependent variable.


Fig. 1. Univariate dialog box


Leave the default Full Factorial model in the Univariate: Model dialog box, i.e. you can skip Model... and Contrasts...

Click Plots... and, in the Univariate: Profile Plots dialog box (Fig. 2), specify a plot by selecting factors for the horizontal axis and, optionally, factors for separate lines and separate plots; the plot must be added to the Plots list. A profile plot is a line plot in which each point indicates the estimated marginal mean of a dependent variable at one level of a factor. A profile plot of one factor shows whether the estimated marginal means are increasing or decreasing across levels. For two factors, parallel lines indicate that there is no interaction between the factors; nonparallel lines indicate an interaction.

Click Continue.


Fig. 2. Univariate: Profile Plots dialog box


Click Post Hoc... to select post hoc tests in the Univariate: Post Hoc Multiple Comparisons for Observed Means dialog box (Fig. 3). Once you have determined that differences exist among the means and the factor has more than two levels, post hoc range tests and pairwise multiple comparisons can determine which means differ.

The Bonferroni test and Tukey's honestly significant difference test are commonly used multiple comparison tests. However, the Bonferroni test becomes overly conservative when a factor has many levels, because the number of pairwise comparisons grows quickly.

Move the corresponding variables (factors) into the Post Hoc Tests for box, check Tukey's test and click Continue.
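Why the Bonferroni correction becomes strict as the number of levels grows can be seen from simple arithmetic. The sketch below lists all pairwise comparisons for a factor and divides the significance level by their number, which is the usual Bonferroni adjustment; the function name `bonferroni_alpha` and the level labels are illustrative assumptions.

```python
from itertools import combinations


def bonferroni_alpha(levels, alpha=0.05):
    """Return all pairs of factor levels and the per-comparison alpha.

    With k levels there are k * (k - 1) / 2 pairwise comparisons, and the
    Bonferroni adjustment tests each of them at alpha divided by that number.
    """
    if len(levels) < 2:
        raise ValueError("at least two levels are needed for a comparison")
    pairs = list(combinations(levels, 2))
    return pairs, alpha / len(pairs)


# Three levels give 3 comparisons; ten levels already give 45.
pairs, per_test = bonferroni_alpha(["level 1", "level 2", "level 3"])
print(len(pairs), round(per_test, 4))

pairs, per_test = bonferroni_alpha([f"level {i}" for i in range(1, 11)])
print(len(pairs), round(per_test, 4))
```

With three levels each comparison is tested at about 0.0167, while with ten levels the threshold drops to about 0.0011, so real differences become harder to detect.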


Fig. 3. Univariate: Post Hoc Multiple Comparisons for Observed Means dialog box


Click Options... At the top of the Univariate: Options dialog box (Fig. 4) you can ask for Estimated Marginal Means to be displayed by moving the variables (factors and interactions) to the right-hand box Display Means for. This is used when you want to adjust the means to remove the effect of a covariate. When there is no covariate, the Estimated Marginal Means will be the same as the means from your sample, which are displayed using the Descriptive Statistics option at the bottom of the Univariate: Options dialog box.


Fig. 4. Univariate: Options dialog box


Click the box next to Estimates of effect size. This option gives a Partial Eta-Squared value for each effect and each parameter estimate. The eta-squared statistic $\eta^2$ describes the proportion of total variability attributable to a factor, $0 \le \eta^2 \le 1$; the partial version relates the sum of squares of an effect to the sum of the effect and error sums of squares.

Select Observed power. Observed power is the probability of finding a significant difference between groups in a sample of the given size when the difference between groups in the population equals the one observed in the sample. In other words, observed power is the probability of correctly rejecting a false null hypothesis and is equal to 1 − β, where β is the probability of a Type II error. Conventionally, a test with a power greater than 0.8 (or β ≤ 0.2) is considered statistically powerful.

Select Homogeneity tests. This option produces Levene's test of the homogeneity of variance for each dependent variable across all level combinations of the between-subjects factors (for between-subjects factors only).


Example

Example. Data are gathered for individual swimmers in the senior swimming championship over several years. The time in which each swimmer finishes is the dependent variable. The factors include the date of the championship and age (categorical). You might find that age and date of championship each have a significant effect and that the interaction of age with date is significant. It is assumed that different individuals participated in different championships, i.e., the samples are independent. A fragment of the data file is shown in Fig. 5.

The following basic tables are obtained from the GLM Univariate output.


Fig. 5. Data View


The table Between-Subjects Factors (Fig. 6) contains general information about the independent variables (influence factors).

Levene's test of homogeneity of variance is computed by SPSS to test the GLM Univariate assumption that each group (category) of the independent variable(s) has the same variance. In our example, the resulting p-value of Levene's test is greater than the significance level (0.05), as shown in the table Levene's Test of Equality of Error Variances (Fig. 6). That is, the homogeneity-of-variance assumption is met. Note that Levene's test is robust to departures from normality.
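The idea behind Levene's test can be sketched as a one-way ANOVA on the absolute deviations of the observations from their group means. The code below computes only the test statistic W, assuming the classic mean-centred form of the test; the function name `levene_w` and the data are made up, and whether a given SPSS version uses exactly this variant is not verified here.

```python
from statistics import fmean


def levene_w(groups):
    """Levene's W statistic based on absolute deviations from group means.

    W is the F ratio of a one-way ANOVA applied to z_ij = |y_ij - mean_i|;
    large values indicate unequal variances across the groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    z = [[abs(x - fmean(g)) for x in g] for g in groups]     # absolute deviations
    z_means = [fmean(zi) for zi in z]
    z_grand = fmean([v for zi in z for v in zi])

    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n - k) / (k - 1) * between / within


# Same spread in both groups versus a doubled spread in the second group.
print(levene_w([[1, 2, 3], [11, 12, 13]]))
print(levene_w([[1, 2, 3], [10, 12, 14]]))
```

The first call returns zero because both groups have identical spread; the second returns a positive value (0.8 for these numbers). The p-value is obtained by comparing W with the F distribution with (k − 1, n − k) degrees of freedom, which is omitted here.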


Fig. 6. The main outputs of GLM Univariate


The Tests of Between-Subjects Effects table (Fig. 7) gives us information about the main and interaction effects. This table shows that for the Age main effect, p (Sig.) = 0.000 (that is, p < 0.001), with a Partial Eta-Squared effect size of 0.300 and Observed Power 1.000. Since p < 0.05, we reject H0. There is a significant Age main effect on the dependent variable, Time.

This table also shows that for the Championship main effect, p = 0.000, with a Partial Eta-Squared effect size of 0.475 and Observed Power 1.000. Since p < 0.05, we reject H0. There is a significant Championship main effect on the dependent variable, Time.

Finally, the table shows that for the Age*Championship interaction, p = 0.000, with a Partial Eta-Squared effect size of 0.319 and Observed Power 1.000. Since p < 0.05, we reject H0. There is a significant interaction between Age and Championship.


Fig. 7. The main outputs of GLM Univariate

Tests of Between-Subjects Effects

Dependent Variable: Time

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared | Noncent. Parameter | Observed Power (b) |
|---|---|---|---|---|---|---|---|---|
| Corrected Model | 332.352 (a) | 8 | 41.544 | 32.618 | .000 | .649 | 260.944 | 1.000 |
| Intercept | 572190.678 | 1 | 572190.678 | 449251.797 | .000 | 1.000 | 449251.797 | 1.000 |
| Age | 77.069 | 2 | 38.535 | 30.255 | .000 | .300 | 60.510 | 1.000 |
| Championship | 162.200 | 2 | 81.100 | 63.675 | .000 | .475 | 127.350 | 1.000 |
| Age * Championship | 83.953 | 4 | 20.988 | 16.479 | .000 | .319 | 65.915 | 1.000 |
| Error | 179.585 | 141 | 1.274 | | | | | |
| Total | 575540.688 | 150 | | | | | | |
| Corrected Total | 511.937 | 149 | | | | | | |

a. R Squared = .649 (Adjusted R Squared = .629)

b. Computed using alpha = .05
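The Mean Square, F and Partial Eta Squared columns of the Tests of Between-Subjects Effects table can be re-derived from the sums of squares and degrees of freedom alone. The sketch below does this for the three effects, using the values reported in the table (written with decimal points). It assumes the usual definitions F = MS_effect / MS_error and partial η² = SS_effect / (SS_effect + SS_error); the function name `effect_summary` is illustrative.

```python
# Sums of squares and degrees of freedom copied from the table (Fig. 7).
SS_ERROR, DF_ERROR = 179.585, 141
EFFECTS = {
    "Age": (77.069, 2),
    "Championship": (162.200, 2),
    "Age * Championship": (83.953, 4),
}


def effect_summary(ss_effect, df_effect, ss_error=SS_ERROR, df_error=DF_ERROR):
    """Return mean square, F ratio and partial eta squared for one effect."""
    mean_square = ss_effect / df_effect
    f_ratio = mean_square / (ss_error / df_error)           # MS_effect / MS_error
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    return mean_square, f_ratio, partial_eta_sq


for name, (ss, df) in EFFECTS.items():
    ms, f, eta = effect_summary(ss, df)
    print(f"{name}: MS = {ms:.3f}, F = {f:.3f}, partial eta squared = {eta:.3f}")
```

Up to rounding, the printed values agree with the table, which confirms that the .319 in the interaction row is the effect size and not the p-value. In this table the Noncent. Parameter column also equals df × F for each effect (for example 2 × 30.255 = 60.510 for Age).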


The table Estimated Marginal Means (Fig. 8) shows the mean of the dependent variable (Time) for each level of Age and Championship, along with the standard error of the estimate of the mean.

The Post Hoc Tests Multiple Comparisons table (Fig. 9) for the Tukey test displays all pairwise comparisons between the groups of the independent variable Age.

Significant differences in Time scores were found between the age groups 25-29 years and 35-39 years, and also between the age groups 30-34 years and 35-39 years. No significant difference was found between the age groups 25-29 years and 30-34 years. Each pair appears in the table twice, once in each order, so all results are repeated.


Fig. 8. The main outputs of GLM Univariate


Fig. 9. The main outputs of GLM Univariate: Post Hoc Tests, Age


The table Homogeneous Subsets (Fig. 10) also shows that there are two significantly different homogeneous subsets.

Similar results are obtained across the levels of the second independent variable (factor), Championship (not shown here).


Fig. 10. The main outputs of GLM Univariate: Homogeneous Subsets


Profile plots are an easy way to visualize the relationship of the factors to the dependent variable and to each other. The profile plot Estimated Marginal Means (Fig. 11) shows the marginal means of the continuous dependent variable Time for the groups of the factor Championship, using the values of the other factor, Age, as the X axis (the Y axis is the magnitude of the mean). The profile plot lines are not parallel and the curves differ fundamentally in shape, which suggests an interaction between Championship and Age; the final conclusion, however, is based on the Tests of Between-Subjects Effects table.


Fig. 11. The main outputs of GLM Univariate