PSY 1950 Correlation November 5, 2008



Definition

• Correlation quantifies the strength and direction of a linear relationship between two variables


History


The First Scatterplot (Galton, 1885)


Importance

• Prior to correlation, “there was no way to discuss -- let alone measure -- the association between variables that lacked a cause-effect relationship”

• Correlation underlies many advanced statistical techniques
  – Factor analysis
  – Structural equation modeling

• Correlation informs
  – Prediction of an unknown variable
  – Validity of a measure
  – Reliability of a measure
  – Validity of a theory


Covariance

• Covariance measures how much two variables change together
  – The more they change together, the higher the covariance
  – Variance is a special case of covariance (the covariance of a variable with itself)
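A minimal sketch of the idea, assuming the sample (n - 1) form of covariance, which the slide does not specify. The function name and the height/weight values are made up for illustration.

```python
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance: sum of deviation products divided by n - 1 (assumed form)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Made-up illustrative values, not from the lecture
heights = [150, 160, 170, 180]
weights = [52, 58, 69, 77]

cov_hw = covariance(heights, weights)   # positive: the two rise together
var_h = covariance(heights, heights)    # covariance of a variable with itself

print(cov_hw)
print(var_h, variance(heights))         # the two values should agree
```

Passing the same variable twice turns every deviation product into a squared deviation, which is why variance falls out as a special case.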


[Figure: two scatterplots of Y (axis 3 to 9) against X (axis 0 to 6)]

  Score      Deviation
  X   Y      X    Y    Product
  1   4     -2   -2       4
  2   5     -1   -1       1
  3   6      0    0       0
  4   7      1    1       1
  5   8      2    2       4

  Score      Deviation
  X   Y      X    Y    Product
  1   8     -2    2      -4
  2   5     -1   -1       1
  3   4      0   -2       0
  4   5      1   -1      -1
  5   8      2    2       4
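The Product columns in the two tables can be reproduced with a few lines of Python. This is only a sketch; the helper name `deviation_products` is made up for illustration.

```python
x = [1, 2, 3, 4, 5]
y1 = [4, 5, 6, 7, 8]   # first table: Y rises with X
y2 = [8, 5, 4, 5, 8]   # second table: U-shaped, no linear trend

def deviation_products(x, y):
    """Product of each pair's deviations from the respective means."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return [(a - mx) * (b - my) for a, b in zip(x, y)]

products1 = deviation_products(x, y1)
products2 = deviation_products(x, y2)

# The sum of the products is the numerator of the covariance
print(sum(products1))   # positive for the first table
print(sum(products2))   # the positive and negative products cancel
```

In the first table every product is non-negative, so they accumulate; in the second, the products cancel and the covariance is zero even though Y clearly depends on X.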


The Problem with Covariation

• It reflects not only the degree of a bivariate relationship, but also the variation of each variable

• In other words, its units depend on the units of the variables

[Figure: two scatterplots against X (axis 0 to 6), one with Y axis 3 to 21 and one with Y axis 3 to 9]
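The unit dependence can be illustrated with a small sketch. The data and the rescaling factor of 3 are made up, and the sample (n - 1) covariance is an assumption.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance (n - 1 denominator, assumed)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4]                  # made-up data
y = [2, 1, 4, 3]
y_rescaled = [3 * v for v in y]   # same variable in units one third the size

print(covariance(x, y), covariance(x, y_rescaled))   # covariance triples
print(pearson_r(x, y), pearson_r(x, y_rescaled))     # r does not change
```

Changing the units of Y changes the covariance by the same factor, while r is unaffected. That is the motivation for standardizing.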


Pearson Product-Moment Correlation (r)

• Special case of covariance
  – Standardized covariance
  – Covariance of standardized variables
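Both readings give the same number. A minimal sketch with made-up data, assuming sample (n - 1) statistics throughout:

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance (n - 1 denominator, assumed)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def standardize(values):
    """Convert raw scores to z-scores using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 1, 4, 3, 5]

# Reading 1: covariance divided by the product of the standard deviations
r_standardized_cov = covariance(x, y) / (stdev(x) * stdev(y))

# Reading 2: covariance of the z-scores
r_cov_of_z = covariance(standardize(x), standardize(y))

print(r_standardized_cov, r_cov_of_z)
```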


Example


Interpreting r

• Things to consider carefully
  – Correlation versus causation
  – Restricted range
  – Group sampling
  – Outliers
  – Linearity
  – Size
  – Homoscedasticity
  – Significance


Correlation versus Causation


Restriction of Range

• When the bivariate range is artificially limited
  – In the case of a linear relationship, the correlation is usually spuriously attenuated
  – In the case of a curvilinear relationship, restriction can result in a spuriously large correlation

• Possibly a grouping/selection effect
  – e.g., the correlation between height and basketball ability among NBA players

• http://www.ruf.rice.edu/~lane/stat_sim/restricted_range/index.html
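A small deterministic sketch of attenuation under range restriction. The data are constructed for illustration (a linear trend plus a fixed alternating disturbance) and the cutoff of 16 is arbitrary.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviation products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up data: y follows x, pushed up or down by 3 on alternating points
x = list(range(1, 21))
y = [v + (3 if v % 2 == 0 else -3) for v in x]

# Keep only the top of the X range (arbitrary cutoff)
kept = [(a, b) for a, b in zip(x, y) if a >= 16]
x_top = [a for a, _ in kept]
y_top = [b for _, b in kept]

r_full = pearson_r(x, y)
r_restricted = pearson_r(x_top, y_top)
print(r_full, r_restricted)   # the restricted r is noticeably smaller
```

The disturbance is the same size in both samples, but in the restricted sample X varies much less, so the trend accounts for a smaller share of the variability in Y.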


Grouping

• Combining heterogeneous groups (either a priori via sampling or a posteriori via data segregation) can inflate correlation
  – e.g., the correlation between height and basketball ability among small people and tall people
  – e.g., the correlation between height and weight in men and women
    • For men, r = .60; for women, r = .49
    • Together, r = .78

• http://www.ruf.rice.edu/~lane/stat_sim/restricted_range/index.html
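The inflation can be reproduced with two made-up groups that differ in their means on both variables. The numbers below are purely illustrative and are not the height/weight data cited on the slide.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviation products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Two made-up groups with the same modest within-group relationship
group_a_x, group_a_y = [1, 2, 3, 4], [2, 1, 4, 3]
group_b_x, group_b_y = [11, 12, 13, 14], [12, 11, 14, 13]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(r_a, r_b, r_pooled)   # the pooled r exceeds both within-group values
```

Most of the pooled correlation comes from the gap between the group means, not from the relationship within either group.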


Outliers

• Correlation is very sensitive to outliers
  – For all three plots, r, means, and SDs are equal
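A sketch of that sensitivity, using a different demonstration from the slide's three plots (which did not survive extraction): a single made-up extreme point is added to data whose correlation is zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviation products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [8, 5, 4, 5, 8]          # U-shaped: no linear relationship

r_without = pearson_r(x, y)

# Add one extreme point (made up for illustration)
r_with = pearson_r(x + [15], y + [20])

print(r_without, r_with)     # one point moves r from zero to a large positive value
```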


Linearity


Size

• The magnitude of r

• The magnitude of r²
  – The coefficient of determination
  – The proportion of variability in one variable accounted for by variability in the other variable
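One way to see the "proportion of variability" reading is to fit a least-squares line and compare the leftover variability with the total. This sketch uses made-up data and assumes simple linear regression of Y on X; the variable names are illustrative.

```python
from math import sqrt

x = [1, 2, 3, 4]   # made-up data
y = [2, 1, 4, 3]

mx, my = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)   # total variability in Y

r = sxy / sqrt(sxx * syy)

# Least-squares line predicting Y from X
slope = sxy / sxx
intercept = my - slope * mx
residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

proportion_explained = 1 - residual_ss / syy
print(r ** 2, proportion_explained)   # the two values should agree
```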


Homoscedasticity

• Same as the homogeneity of variance assumption

• Variance of Y does not depend on the value of X, and vice versa


Significance

• To test the null hypothesis that the population correlation, ρ (“rho”), equals 0, use: [formula shown as an image in the original slide; not recovered]
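The slide's formula did not survive extraction. A common test of ρ = 0 is the t statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom; the sketch below assumes that this is the intended test. The function name and the example values are made up.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic and degrees of freedom for testing rho = 0 (assumed standard test)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r * r)
    return t, df

# Made-up example: r = .60 observed in a sample of 27 pairs
t, df = correlation_t(0.6, 27)
print(t, df)
```

The resulting t is compared against a t distribution with the returned degrees of freedom, for example from a printed table.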


Other measures of correlation

• Computationally identical to r
  – Point-biserial
    • One dichotomous variable
  – Phi
    • Two dichotomous variables
  – Spearman
    • Both variables on ordinal scale
    • Tests monotonicity of relationship
    • As X increases, so does Y
    • No accurate significance test

• Computationally novel techniques
  – e.g., Kendall’s tau
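"Computationally identical to r" can be taken literally: recode the data, then apply the Pearson formula. A sketch with made-up data, assuming tied scores receive the average of their ranks; the helper names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviation products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

# Spearman: Pearson r computed on ranks (made-up monotonic, non-linear data)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
r_pearson = pearson_r(x, y)
r_spearman = pearson_r(ranks(x), ranks(y))

# Point-biserial: Pearson r with one variable coded 0/1 (made-up data)
group = [0, 0, 0, 1, 1, 1]
score = [3, 4, 5, 6, 7, 8]
r_point_biserial = pearson_r(group, score)

print(r_pearson, r_spearman, r_point_biserial)
```

Because Y always increases with X, the Spearman coefficient is 1 even though the relationship is not linear and the Pearson r on the raw scores is below 1.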