Social Statistics: Correlation
What is correlation? How to compute? How to interpret?
This week
The relationship between two variables: how the value of one variable changes when the value of another variable changes.
A correlation coefficient is a numerical index that reflects the relationship between two variables.
Range: -1 ~ +1
Bivariate correlation (for two variables)
Correlation Coefficients
Parametric: Pearson product-moment correlation (named for its inventor, Karl Pearson)
Non-parametric: Spearman's rank correlation; Kendall tau rank correlation coefficient
Correlation Coefficients
For two variables that are continuous in nature: height, age, test score, income
But not for discrete or categorical variables: race, political affiliation, social class, rank
Pearson correlation coefficient
rxy is the correlation between variable X and variable Y
Direct correlation (positive correlation): If both variables change in the same direction
Indirect correlation (negative correlation): If the two variables change in opposite directions
Types of correlation coefficients
Below is a correlation report for different currency exchange rates on November 13, 2014 (source: Bloomberg Terminal)
Which is stronger: -0.8 or 0.5?
Types of correlation coefficients
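$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$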
Pearson product-moment correlation coefficient
rxy: the correlation coefficient between X and Y
n: the size of the sample
X: the individual's score on the X variable
Y: the individual's score on the Y variable
XY: the product of each X score times its corresponding Y score
X²: the individual X score, squared
Y²: the individual Y score, squared
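The raw-score formula can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the sample lists `hours` and `scores` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the raw-score sums: X, Y, XY, X squared, Y squared."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 scores")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator

# Hypothetical scores, chosen only to illustrate the call.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 3))  # 0.775
```

Each intermediate sum corresponds to one term of the formula, so the code can be checked against a hand calculation line by line.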
Calculate the Pearson correlation coefficient for US school enrollment (in thousands) at selected time points over the previous 50 years. (Source: United States Census Bureau)
Exercise
1. Select two columns of data. Are they correlated?
2. What does this correlation mean?
Year  G9-12 Public  G9-12 Private  College-Public  College-Private
1965  11610         1400           3970            1951
1970  13336         1311           6428            2153
1975  14304         1300           8836            2350
1980  13231         1339           9457            2640
1985  12388         1362           9479            2768
1990  11341         1136           10845           2974
1995  12502         1163           11092           3169
2000  13517         1264           11753           3560
2005  14909         1349           13022           4466
CORREL function or PEARSON function
Using Excel to calculate
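For readers without Excel, the exercise can also be sketched in Python. The helper name `correl` is hypothetical (it borrows the Excel function's name) and computes the Pearson coefficient from deviations about the means, which is algebraically the same quantity. The figures are copied from the enrollment table.

```python
from itertools import combinations
from math import sqrt

# Enrollment in thousands, 1965 to 2005 in five-year steps (from the table).
enrollment = {
    "G9-12 Public": [11610, 13336, 14304, 13231, 12388, 11341, 12502, 13517, 14909],
    "G9-12 Private": [1400, 1311, 1300, 1339, 1362, 1136, 1163, 1264, 1349],
    "College-Public": [3970, 6428, 8836, 9457, 9479, 10845, 11092, 11753, 13022],
    "College-Private": [1951, 2153, 2350, 2640, 2768, 2974, 3169, 3560, 4466],
}

def correl(xs, ys):
    """Pearson r computed from deviations about the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# One coefficient per pair of columns (6 pairs for 4 columns).
pair_r = {
    (a, b): correl(enrollment[a], enrollment[b])
    for a, b in combinations(enrollment, 2)
}
for (a, b), r in pair_r.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

The sign and size of each printed coefficient answer question 1; question 2 still requires judgment about what the relationship means.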
[Chart: enrollment (in thousands, axis 0 to 16000) by year (axis 1960 to 2010) for G9-12 Public, G9-12 Private, College-Public, and College-Private]
Scatterplot or scattergram
Visualizing a correlation
[Figure: scatterplot axes, X and Y]
X  Y
2  3
4  2
5  6
6  5
4  3
7  6
8  5
5  4
6  4
7  5
Visualizing a correlation
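As a worked example (not in the original slides), the Pearson product-moment formula can be applied to the ten (X, Y) pairs plotted here: (2, 3), (4, 2), (5, 6), (6, 5), (4, 3), (7, 6), (8, 5), (5, 4), (6, 4), (7, 5). The sums below were computed by hand from those pairs.

```latex
\begin{aligned}
n &= 10, \quad \sum X = 54, \quad \sum Y = 43, \quad \sum XY = 247, \quad \sum X^2 = 320, \quad \sum Y^2 = 201 \\
r_{xy} &= \frac{10(247) - (54)(43)}{\sqrt{\left[10(320) - 54^2\right]\left[10(201) - 43^2\right]}}
        = \frac{148}{\sqrt{(284)(161)}} \approx 0.692
\end{aligned}
```

The result is positive, so the points in the scatterplot should trend upward from left to right.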
r = 1: a perfect direct (or positive) correlation
In real-life cases, 0.7 or 0.8 may be the highest you will see
Direct (positive) correlation
[Figure: scatterplot showing a direct (positive) correlation; x-axis 0 to 9, y-axis 0 to 10]
Strength and direction are important
Indirect (or negative) correlation
[Figure: scatterplot showing an indirect (negative) correlation; x-axis 0 to 9, y-axis 0 to 10]
Excel Scatterplot
Four sets of data with the same correlation of 0.816
Linear correlation means that the X and Y values fall along a straight line
Curvilinear correlation: for example, age and memory
Linear correlation
income  education  attitude  vote
74190   13         1         1
80931   12         3         2
81314   11         4         2
73089   11         5         2
62023   11         3         2
61217   10         4         2
84526   11         5         1
87251   11         4         1
62659   12         5         2
76450   10         6         2
70512   12         7         2
78858   9          6         1
78628   13         7         1
86212   14         8         2
74962   9          8         2
58828   11         9         4
61471   10         8         5
78621   12         7         5
60071   9          8         4
More than 2 variables?
How to calculate the correlation coefficient?
1. CORREL()
2. Correlation in the Data Analysis toolset
Correlation matrix
More than 2 variables?
           Income  Education  Attitude  Vote
Income     1.00    0.35       -0.19     -0.51
Education          1.00       -0.21     -0.20
Attitude                      1.00      0.55
Vote                                    1.00
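A correlation matrix of this kind can be built by computing r for every pair of columns. The sketch below does this in Python for the 19 income, education, attitude, and vote rows listed earlier. The helper name `correl` is hypothetical, and the code illustrates the idea rather than reproducing Excel's Data Analysis tool.

```python
from math import sqrt

# The 19 survey rows listed earlier: (income, education, attitude, vote).
rows = [
    (74190, 13, 1, 1), (80931, 12, 3, 2), (81314, 11, 4, 2), (73089, 11, 5, 2),
    (62023, 11, 3, 2), (61217, 10, 4, 2), (84526, 11, 5, 1), (87251, 11, 4, 1),
    (62659, 12, 5, 2), (76450, 10, 6, 2), (70512, 12, 7, 2), (78858, 9, 6, 1),
    (78628, 13, 7, 1), (86212, 14, 8, 2), (74962, 9, 8, 2), (58828, 11, 9, 4),
    (61471, 10, 8, 5), (78621, 12, 7, 5), (60071, 9, 8, 4),
]
names = ["Income", "Education", "Attitude", "Vote"]
columns = list(zip(*rows))  # transpose rows into one tuple per variable

def correl(xs, ys):
    """Pearson r computed from deviations about the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

matrix = [[correl(cx, cy) for cy in columns] for cx in columns]

print(" " * 10 + "".join(f"{name:>10}" for name in names))
for i, name in enumerate(names):
    # Print only the upper triangle, because r(X, Y) equals r(Y, X).
    cells = [f"{matrix[i][j]:10.2f}" if j >= i else " " * 10
             for j in range(len(names))]
    print(f"{name:<10}" + "".join(cells))
```

The diagonal is always 1.00 because every variable correlates perfectly with itself, which is why only one triangle of the matrix needs to be read.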
Data Analysis tool - correlation
Excel
Correlation value: - finite number ~ + finite number
Correlation coefficient value: -1.00 ~ +1.00
Meaning of Correlation coefficient
rxy value   Interpretation
0.8 ~ 1.0   Very strong relationship (share most things in common)
0.6 ~ 0.8   Strong relationship (share many things in common)
0.4 ~ 0.6   Moderate relationship (share something in common)
0.2 ~ 0.4   Weak relationship (share a little in common)
0.0 ~ 0.2   Weak or no relationship (share very little or nothing in common)
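The table can be turned into a small lookup, sketched below in Python. The function name `interpret` is hypothetical. Two assumptions go beyond the slide: strength is judged on the absolute value of r (so direction is ignored), and a boundary value such as 0.8 is placed in the higher band, since the table's ranges overlap at their endpoints.

```python
def interpret(r):
    """Map a correlation coefficient to the table's verbal label, using |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    strength = abs(r)  # the sign gives direction, not strength
    # Assumption: a boundary value such as 0.8 falls in the higher band.
    if strength >= 0.8:
        return "Very strong relationship"
    if strength >= 0.6:
        return "Strong relationship"
    if strength >= 0.4:
        return "Moderate relationship"
    if strength >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"

for r in (-0.8, 0.5, 0.1):
    print(r, "->", interpret(r))
```

Under these assumptions, -0.8 is labeled a stronger relationship than 0.5, because only the magnitude determines the band.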
Coefficient of determination: the percentage of variance in one variable that is accounted for by the variance in the other variable. It equals the square of the correlation coefficient.
Coefficient of determination
$$r_{\text{GPA} \cdot \text{Time}} = 0.70, \qquad r^2_{\text{GPA} \cdot \text{Time}} = 0.49$$
49% of the variance in GPA can be explained by the variance in studying time
The amount of unexplained variance is called the coefficient of nondetermination (coefficient of alienation)
Coefficient of nondetermination
correlation  determination  interpretation
0            0
0.5          0.25
0.9          0.81
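The arithmetic behind the table can be sketched in a few lines of Python. The function names `determination` and `nondetermination` are hypothetical. The unexplained share is taken to be 1 minus r squared, following the definition of unexplained variance given on this slide.

```python
def determination(r):
    """Share of variance explained: the square of the correlation coefficient."""
    return r ** 2

def nondetermination(r):
    """Share of variance left unexplained: 1 minus the coefficient of determination."""
    return 1 - r ** 2

for r in (0.0, 0.5, 0.7, 0.9):
    print(f"r = {r:.2f}  explained = {determination(r):.2f}  "
          f"unexplained = {nondetermination(r):.2f}")
```

Note how quickly the explained share drops: a correlation of 0.5 accounts for only a quarter of the variance.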
In a small town in Greece, the local police found a direct correlation between ice cream and crime
Ice cream and crime
A correlation represents the association between two or more variables
It does not establish causality: a correlation between two variables does not by itself show that one causes the other
Ice cream and crime are correlated, but ice cream does not cause crime
Correlation vs. causality
Correlation vs. causality
Summer
Summer is when people get together. More specifically, casual drinkers and drug users are more likely to go to bars or parties on weekends and evenings, as opposed to a Tuesday morning. These people in the social mix, flooding the city’s streets and neighborhood bars, feed the peak times for murder, experts say.