Regression and Correlation GTECH 201 Lecture 18


Regression and Correlation

GTECH 201 Lecture 18


ANOVA

Analysis of Variance

Continuation from matched-pair difference of means tests, but now for 3+ cases

We still check whether samples come from one or more distinct populations

Variance is a descriptive parameter

ANOVA compares group means and checks whether they differ sufficiently to reject H0


ANOVA H0 and HA
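
For k groups, the hypotheses of a one-way ANOVA are conventionally written as below. The symbol \mu_i for the population mean of group i is an assumption here, since only the slide title is given.

```latex
H_0:\ \mu_1 = \mu_2 = \dots = \mu_k
\qquad
H_A:\ \text{at least one } \mu_i \text{ differs from the others}
```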


ANOVA Test Statistic

$$F = \frac{MS_B}{MS_W}$$

$MS_B$ = between-group mean squares

$MS_W$ = within-group mean squares

Between-group variability is calculated in three steps:

1. Calculate the overall mean as a weighted average of the sample means

2. Calculate the between-group sum of squares

3. Calculate the between-group mean squares ($MS_B$)


Between-group Variability

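1. Total or overall mean

$$\bar{X}_T = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{N}$$

2. Between-group sum of squares

$$SS_B = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X}_T)^2 = \sum_{i=1}^{k} n_i \bar{X}_i^2 - N \bar{X}_T^2$$

3. Between-group mean squares

$$MS_B = \frac{SS_B}{df_B} = \frac{SS_B}{k - 1}$$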
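
The three between-group steps can be sketched in Python as follows. This is a minimal illustration, not part of the lecture: the function name `between_group_mean_squares` and the toy data are hypothetical.

```python
def between_group_mean_squares(groups):
    """Return (overall_mean, ss_between, ms_between) for a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(sizes)

    # Step 1: overall mean as the size-weighted average of the group means
    overall_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Step 2: between-group sum of squares
    ss_between = sum(n * (m - overall_mean) ** 2 for n, m in zip(sizes, means))
    # Step 3: divide by the between-group degrees of freedom, k - 1
    ms_between = ss_between / (k - 1)
    return overall_mean, ss_between, ms_between


# Hypothetical toy data: three small groups (not from the lecture)
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
overall_mean, ss_b, ms_b = between_group_mean_squares(toy_groups)
print(overall_mean, ss_b, ms_b)
```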


Within-group Variability

1. Within-group sum of squares

$$SS_W = \sum_{i=1}^{k} (n_i - 1) s_i^2$$

2. Within-group mean squares

$$MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k}$$
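
A matching sketch for the within-group steps, assuming s_i² is the usual sample variance (divisor n_i − 1), which is what `statistics.variance` computes. The function name and toy data are hypothetical.

```python
from statistics import variance


def within_group_mean_squares(groups):
    """Return (ss_within, ms_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Step 1: sum of (n_i - 1) * s_i^2 over all groups
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    # Step 2: divide by the within-group degrees of freedom, N - k
    ms_within = ss_within / (n_total - k)
    return ss_within, ms_within


# Hypothetical toy data: each group has sample variance 1
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_w, ms_w = within_group_mean_squares(toy_groups)
print(ss_w, ms_w)
```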


Kruskal-Wallis Test

Nonparametric equivalent of ANOVA

Extension of the Wilcoxon rank sum W test to 3+ cases

Average rank is $R_i / n_i$

Then the Kruskal-Wallis H test statistic is

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

with $N = n_1 + n_2 + \dots + n_k$ = total number of observations, and

$R_i$ = sum of ranks in sample i
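
The H statistic can be computed directly from the rank sums. The sketch below is an illustration under two assumptions: tied values receive the average of their ranks, and no tie correction is applied to H (the formula as given has none). The function name and data are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from pooled ranks, without a tie correction."""
    # Pool all observations, remembering which group each came from
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)

    pos = 0
    while pos < n_total:
        # Find the run of tied values starting at pos
        end = pos
        while end + 1 < n_total and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        # Ranks are 1-based; tied values share the average rank of the run
        avg_rank = (pos + 1 + end + 1) / 2
        for j in range(pos, end + 1):
            rank_sums[pooled[j][1]] += avg_rank
        pos = end + 1

    weighted = sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical toy data: three fully separated groups
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h)
```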


ANOVA Example

House prices by neighborhood, in thousands of dollars

  A     B     C     D
 175   151   127   174
 147   183   142   182
 138   174   124   210
 156   181   150   191
 184   193   180
 148   205
       196


ANOVA Example, continued

Sample statistics

          n    mean (X̄)     s
A         6     158.00    17.83
B         7     183.29    17.61
C         5     144.60    22.49
D         4     189.25    15.48
Total    22     168.68    24.85

Now fill in the six steps of the ANOVA calculation


The Six Steps

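1. Overall mean

$$\bar{X}_T = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{N} = \frac{6(158.00) + 7(183.29) + 5(144.60) + 4(189.25)}{22} = 168.68$$

2. Between-group sum of squares

$$SS_B = \sum_{i=1}^{k} n_i \bar{X}_i^2 - N \bar{X}_T^2 = 6(158.00)^2 + 7(183.29)^2 + 5(144.60)^2 + 4(189.25)^2 - 22(168.68)^2 = 6769.394$$

3. Between-group mean squares

$$MS_B = \frac{SS_B}{df_B} = \frac{SS_B}{k - 1} = \frac{6769.394}{3} = 2256.465$$

4. Within-group sum of squares

$$SS_W = \sum_{i=1}^{k} (n_i - 1) s_i^2 = 5(17.83)^2 + 6(17.61)^2 + 4(22.49)^2 + 3(15.48)^2 = 6193.379$$

5. Within-group mean squares

$$MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k} = \frac{6193.379}{22 - 4} = 344.077$$

6. F statistic

$$F = \frac{MS_B}{MS_W} = \frac{2256.465}{344.077} = 6.558, \quad p = .003$$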
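
The six steps can be checked against the raw house prices. The sketch below recomputes each quantity from the unrounded data, which is why it matches the results shown even though the displayed means and standard deviations are rounded. Variable names are hypothetical, and the p-value is not computed because that would need the F distribution.

```python
from statistics import mean, variance

# House prices in thousands of dollars, from the example table
prices = {
    "A": [175, 147, 138, 156, 184, 148],
    "B": [151, 183, 174, 181, 193, 205, 196],
    "C": [127, 142, 124, 150, 180],
    "D": [174, 182, 210, 191],
}
groups = list(prices.values())
k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # N

# Step 1: overall mean as weighted average of the group means
overall_mean = sum(len(g) * mean(g) for g in groups) / n_total
# Step 2: between-group sum of squares (computational form)
ss_b = sum(len(g) * mean(g) ** 2 for g in groups) - n_total * overall_mean ** 2
# Step 3: between-group mean squares
ms_b = ss_b / (k - 1)
# Step 4: within-group sum of squares
ss_w = sum((len(g) - 1) * variance(g) for g in groups)
# Step 5: within-group mean squares
ms_w = ss_w / (n_total - k)
# Step 6: F ratio
f_stat = ms_b / ms_w

print(f"mean={overall_mean:.2f} SS_B={ss_b:.3f} MS_B={ms_b:.3f}")
print(f"SS_W={ss_w:.3f} MS_W={ms_w:.3f} F={f_stat:.3f}")
```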


Correlation

Co-relatedness between 2+ variables

As the values of one variable go up, those of the other change proportionally

Two-step approach:

1. Graphically: scatterplot

2. Numerically: correlation coefficients


Is There a Correlation?


Scatterplots

Exploratory analysis


Pearson’s Correlation Index

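Based on the concept of covariance

$$CV_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$$

$CV_{XY}$ = covariation between X and Y

$(X - \bar{X})$ = deviation of X from its mean

$(Y - \bar{Y})$ = deviation of Y from its mean

Pearson’s correlation coefficient

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y}) / N}{S_X S_Y}$$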
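
A minimal Python sketch of this coefficient follows. It assumes S_X and S_Y are standard deviations computed with the same divisor N as the covariance, so that r falls between −1 and +1. The function name and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r as (covariation / N) divided by (S_X * S_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariation: sum of products of paired deviations from the means
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Standard deviations with divisor N (assumption, see lead-in)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return (covariation / n) / (s_x * s_y)


# Hypothetical toy data
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```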


Sample and Population

r is the sample correlation coefficient

Applying the t distribution, we can infer the correlation for the whole population

Test statistic for Pearson’s r

$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$
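
As a quick numerical illustration, the test statistic can be evaluated as below. The function name and the values r = 0.6, n = 18 are hypothetical. The result is compared against a t distribution with n − 2 degrees of freedom.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for testing whether a sample correlation r differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Hypothetical example: r = 0.6 from n = 18 paired observations
t_value = t_for_r(0.6, 18)
degrees_of_freedom = 18 - 2
print(t_value, degrees_of_freedom)
```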


Correlation Example

Lake effect snow


Spearman’s Rank Correlation

Non-parametric alternative to Pearson

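Logic similar to Kruskal-Wallis and Wilcoxon

Spearman’s rank correlation coefficient

$$r_s = 1 - \frac{6 \sum d^2}{N^3 - N}$$

where d is the difference between the two ranks of each observation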
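
The rank-based calculation can be sketched as follows, taking d as the difference between an observation's rank on X and its rank on Y. This simple version assumes there are no tied values, since the shortcut formula is exact only in that case. The function names and data are hypothetical.

```python
def ranks(values):
    """1-based ranks of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rs(xs, ys):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (N^3 - N)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n ** 3 - n)


# Hypothetical toy data without ties
rs = spearman_rs([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(rs)
```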


Regression

In correlation we observe degrees of association but no causal or functional relationship

In regression analysis, we distinguish an independent from a dependent variable

Many forms of functional relationships: bivariate or multivariate; linear or non-linear (curvilinear)


Graphical Representation

In correlation analysis either variable could be depicted on either axis

In regression analysis, the independent variable is always on the X axis

Bivariate relationship is described by a best-fitting line through the scatterplot


Least-Square Regression

Objective: minimize $\sum d_i^2$

$$Y = a + bX$$


Regression Equation

Y = a + bX

$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y - b \sum X}{n}$$
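
The slope and intercept formulas translate directly into code. The following is a minimal sketch with a hypothetical function name and toy data, using only the raw sums that appear in the formulas.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the raw-sum formula, then intercept from the slope
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical toy data
a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b)
```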


Strength of Relationship

How much is explained by the regression equation?


Coefficient of Determination

Total variation of Y (all the bucket water)

$$\sum y^2 = \sum (Y - \bar{Y})^2$$

Large ‘Y’ = dependent variable

Small ‘y’ = deviation of each value of Y from its mean

$$\sum y^2 = \sum y_e^2 + \sum y_u^2$$

e = explained; u = unexplained


Explained Variation

Ratio of the square of the covariation between X and Y to the variation in X

$$\sum y_e^2 = \frac{\left(\sum xy\right)^2}{\sum x^2}$$

where $\sum xy$ = covariation between X and Y

$\sum x^2$ = total variation of X

Coefficient of determination

$$r^2 = \frac{\sum y_e^2}{\sum y^2}$$
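
To make the decomposition concrete, the sketch below computes the explained variation, the unexplained variation (taken as total minus explained), and r² from deviations about the means. The function name and data are hypothetical.

```python
def variation_breakdown(xs, ys):
    """Return (explained, unexplained, r_squared) for Y regressed on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Lowercase x and y denote deviations from the respective means
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)

    explained = sum_xy ** 2 / sum_x2
    unexplained = sum_y2 - explained
    return explained, unexplained, explained / sum_y2


# Hypothetical toy data
explained, unexplained, r_squared = variation_breakdown(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
)
print(explained, unexplained, r_squared)
```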


Error Analysis

r² tells us what proportion of the variation in Y is accounted for by the independent variable

This then allows us to infer the standard error of our estimate

which tells us, on average, how far off our prediction would be in measurement units

$$SE = \sqrt{\frac{\sum y_u^2}{n - 2}}$$

where $\sum y_u^2$ is the unexplained variation
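
A sketch of the standard error of the estimate follows, assuming the numerator is the unexplained variation, that is, the sum of squared residuals about the fitted line Y = a + bX, and the divisor is n − 2. The function name and data are hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(xs, ys):
    """sqrt(sum of squared residuals / (n - 2)) for the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Least-squares slope and intercept
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    # Unexplained variation: squared vertical distances from the line
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(unexplained / (n - 2))


# Hypothetical toy data; the result is in the units of Y
se = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(se, 4))
```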