Basic Statistics
Correlation
[Diagram: several variables (Var) linked to one another]
Relationships
Associations
Information
Covary?
In Research
Dependent variable
Independent variables
X1
X2
X3
Y
The Concept of Correlation
Association or relationship between two variables
X and Y
Covary: go together
Co-relate: relation
Patterns of Covariation
Positive correlation
Negative correlation
Zero or no correlation
Scatter plots allow us to visualize the relationships
Scatter Plots
The chief purpose of the scatter diagram is to study the nature of the relationship between two variables
Linear/curvilinear relationship
Direction of relationship
Magnitude (size) of relationship
Each point represents both the X and Y scores
[Scatter Plot A: an illustration of a perfect positive correlation. Axes: Variable X and Variable Y, each running from low to high. Each X corresponds to an exact value of Y.]
[Scatter Plot B: an illustration of a positive correlation. Axes: Variable X and Variable Y, each running from low to high. Each X gives only an estimated Y value.]
[Scatter Plot C: an illustration of a perfect negative correlation. Axes: Variable X and Variable Y, each running from low to high. Each X corresponds to an exact value of Y.]
[Scatter Plot D: an illustration of a negative correlation. Axes: Variable X and Variable Y, each running from low to high. Each X gives only an estimated Y value.]
[Scatter Plot E: an illustration of a zero correlation. Axes: Variable X and Variable Y, each running from low to high.]
[Scatter Plot F: an illustration of a curvilinear relationship. Axes: Variable X and Variable Y, each running from low to high.]
The Measurement of Correlation
The degree of correlation between two variables can be described by such terms as “strong,” “low,” “positive,” or “moderate,” but these terms are not very precise.
If a correlation coefficient is computed between two sets of scores, the relationship can be described more accurately.
The Correlation Coefficient
A statistical summary of the degree and direction of relationship or association between two variables can be computed
Pearson’s Product-Moment Correlation Coefficient r
-1.00     -.50     0     +.50     +1.00
Direction of relationship: Sign (+ or –)
Magnitude: 0 through +1 or 0 through -1
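Negative correlation (r < 0)     No relationship (r = 0)     Positive correlation (r > 0)
\[
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}
         {\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}}
\]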
The Pearson Product-Moment Correlation Coefficient
Recall that the formula for a variance is:
\[
S_x^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum (X - \bar{X})(X - \bar{X})}{n - 1}
\]
If we replaced the second X that was squared with a second variable, Y, it would be:
\[
S_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}
\]
This is called a covariance and is an index of the relationship between X and Y.
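The step from variance to covariance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `covariance` and the two short score lists are made up for the example and do not come from the slides. The sketch checks that the covariance of a variable with itself is simply its variance.

```python
from statistics import mean, variance


def covariance(xs, ys):
    """Sample covariance: sum of (X - mean X)(Y - mean Y), divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (len(xs) - 1)


# Hypothetical scores, invented purely for illustration.
x = [2, 4, 6, 8]
y = [9, 7, 4, 2]

# Replacing Y with X turns the covariance back into the variance of X.
print(covariance(x, x), variance(x))

# The sign carries the direction: negative here, since Y falls as X rises.
print(covariance(x, y))
```

Note that the size of a covariance depends on the units of X and Y, which is why it is only an index of the relationship and not yet a standardized coefficient.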
Conceptual Formula for Pearson r
\[
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
\]
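As a rough sketch, the conceptual formula translates almost term for term into Python. The function name `pearson_r` and the data below are assumptions chosen for illustration (two artificial data sets that fall exactly on a straight line), not material from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed from deviation scores (the conceptual formula)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of the deviations from each mean.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for X and for Y.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: Y rises (or falls) by a fixed amount for every step in X.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # points on a rising straight line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # points on a falling straight line
```

The two calls correspond to the perfect positive and perfect negative cases, so the results sit at the two ends of the scale, +1 and -1.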
This formula may be rewritten to reflect the actual method of calculation
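\[
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}
         {\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}}
\]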
Calculation of Pearson r
Notice that this formula is simply the sum of cross-products for X and Y (the covariance counterpart of a sum of squares) divided by the square root of the product of the sums of squares for X and Y.
Formulae for Sums of Squares
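\[
SS_x = \sum X^2 - \frac{(\sum X)^2}{n}
\]
\[
SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n}
\]
\[
SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n}
\]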
Therefore, the formula for calculating r may be rewritten as:
Calculation of r Using Sums of Squares
\[
r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}}
\]
An Example
Suppose that a college statistics professor is interested in how the number of hours that students spend studying is related to how many errors they make on the mid-term examination. To determine the relationship, the professor collects the following data:
The Stats Professor’s Data
Student   Hours Studied (X)   Errors (Y)   X²          Y²          XY
1         4                   15           16          225         60
2         4                   12           16          144         48
3         5                   9            25          81          45
4         6                   10           36          100         60
5         7                   8            49          64          56
6         7                   4            49          16          28
7         7                   6            49          36          42
8         9                   2            81          4           18
9         9                   4            81          16          36
10        12                  3            144         9           36
Total     ΣX = 70             ΣY = 73      ΣX² = 546   ΣY² = 695   ΣXY = 429
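The Data Needed to Calculate the Sums of Squares
From the table totals, with n = 10: ΣX = 70, ΣY = 73, ΣX² = 546, ΣY² = 695, ΣXY = 429
\[
SS_x = \sum X^2 - \frac{(\sum X)^2}{n} = 546 - \frac{70^2}{10} = 546 - 490 = 56
\]
\[
SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 695 - \frac{73^2}{10} = 695 - 532.9 = 162.1
\]
\[
SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 429 - \frac{(70)(73)}{10} = 429 - 511 = -82
\]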
Calculating the Correlation Coefficient
\[
r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{-82}{\sqrt{(56)(162.1)}} = -0.86
\]
Thus, the correlation between hours studied and errors made on the mid-term examination is -0.86, indicating that more time spent studying is related to fewer errors on the mid-term examination. Hopefully this was an obvious conclusion, but it is now a statistical one as well!
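The arithmetic in this example can be checked with a short Python sketch. The hours and errors are the ten pairs from the professor's table; the variable names are illustrative choices, and the calculation follows the sums-of-squares route used above.

```python
from math import sqrt

# Data from the professor's table: hours studied (X) and errors (Y).
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]
n = len(hours)

# Column totals, matching the bottom row of the table.
sum_x = sum(hours)
sum_y = sum(errors)
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in errors)
sum_xy = sum(x * y for x, y in zip(hours, errors))

# Sums of squares and cross-products.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Pearson r from the sums of squares.
r = ss_xy / sqrt(ss_x * ss_y)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(ss_x, 1), round(ss_y, 1), round(ss_xy, 1))
print(round(r, 2))
```

Rounded to two decimals, the result agrees with the hand calculation of -0.86.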
Pearson Product-Moment Correlation Coefficient r
-1          0          +1
Perfect negative correlation (r = -1)
Negative correlation (r between -1 and 0)
Zero correlation (r = 0)
Positive correlation (r between 0 and +1)
Perfect positive correlation (r = +1)
Numerical values
[Figure: number line for r running from negative correlation through zero correlation to positive correlation, annotated with the example values .35 and .73 and the strength labels Perfect, Strong, and Moderate]
The Pearson r and Marginal Distribution
The marginal distribution of X is simply the distribution of the X’s; the marginal distribution of Y is the frequency distribution of the Y’s.
Bivariate Normal Distribution
Bivariate relationship
The marginal distributions of X and Y are precisely the same shape.
[Figure: bivariate normal distribution with axes X variable and Y variable]
Interpreting r, the Correlation Coefficient
Recall that r includes two types of information:
The direction of the relationship (+ or -)
The magnitude of the relationship (0 to 1)
However, there is a more precise way to use the correlation coefficient, r, to interpret the magnitude of a relationship: the square of the correlation coefficient, or r².
The square of r tells us what proportion of the variance of Y can be explained by X or vice versa.
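One way to see this is sketched below in Python, reusing the hours-studied and errors data from the professor's example. The sketch assumes that Y is estimated from X with a least-squares straight line; the slides do not spell out the estimation method, so that choice and all variable names are assumptions. It compares r² with the share of the variance in Y that the line accounts for.

```python
# Data from the professor's example: hours studied (X) and errors (Y).
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]
n = len(hours)

x_bar = sum(hours) / n
y_bar = sum(errors) / n

# Sums of squares and cross-products from deviation scores.
ss_x = sum((x - x_bar) ** 2 for x in hours)
ss_y = sum((y - y_bar) ** 2 for y in errors)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, errors))

# Squared correlation coefficient.
r_squared = ss_xy ** 2 / (ss_x * ss_y)

# Assumed estimation method: least-squares line, estimated Y = a + b * X.
b = ss_xy / ss_x
a = y_bar - b * x_bar

# Variation in Y left over after estimating Y from X (still "free to vary").
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, errors))

# Proportion of the variation in Y accounted for by the line.
explained = 1 - ss_residual / ss_y

print(round(r_squared, 2), round(explained, 2))
```

Both quantities come out the same, about .74 for these data, which is the sense in which r² is read as the proportion of variance explained.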
[Figure: scatter plot of Variable Y against Variable X, each running from low to high. An illustration of how the squared correlation accounts for variance in X, r = .7, r² = .49]
How does correlation explain variance?
Suppose you wish to estimate Y for a given value of X.
49% of the variance is explained; the remaining 51% is free to vary.
Now, let’s look at some correlation coefficients and their corresponding scatter plots.
What is your estimate of r?
[Scatter plot: Current Salary (0 to 120000) against Beginning Salary (0 to 70000)]
r = .87, r² = .76 = 76%
What is your estimate of r?
[Scatter plot: Y against X, on axes labeled Current Salary (0 to 120000) and Beginning Salary (0 to 70000)]
r = -1.00, r² = 1.00 = 100%
What is your estimate of r?
[Scatter plot: Y against X, on axes labeled Current Salary (0 to 120000) and Beginning Salary (0 to 70000)]
r = +1.00, r² = 1.00 = 100%
What is your estimate of r?
[Scatter plot: Beginning Salary (0 to 70000) against Months since Hire (60 to 100)]
r = .04, r² = .002 = .2%
What is your estimate of r?
[Scatter plot: Vehicle Weight in lbs. (1000 to 6000) against Time to Accelerate from 0 to 60 mph in sec (0 to 30)]
r = -.44, r² = .19 = 19%