Correlation and Regression
PS397 Testing and Measurement
January 16, 2007
Thanh-Thanh Tieu
Canoe.ca
Title of article implies that when women are depressed, they tend to drink more
Correlational relationship between drinking and depression in women
http://lifewise.canoe.ca/Living/2007/01/05/3176991-cp.html
Scatter Diagram
Visual display of the relationship between two variables
Bivariate distribution: two scores for each individual
Each point shows where an individual scores on both x and y
E.g., relationship between high school average and university average
Participant 11: 3.2 high school GPA, 3.3 university GPA
Correlation
What does one variable tell us about the other?
Looks at how the two variables covary
Changes in one correspond to changes in the other
Correlation coefficient tells us the direction and magnitude of the relationship
i.e., how the variables are related (+/-) and the strength of the relationship
Correlation Coefficient
[Figure: scatter diagrams showing positive correlation, negative correlation, and no correlation]
Correlation Coefficient
Correlation coefficient varies from -1.0 (perfect negative relationship) to +1.0 (perfect positive relationship)
Accounts for the individual’s deviation above and below the group mean on each variable
Above the mean on both variables = 2 positive standard scores
Below the mean on both variables = 2 negative standard scores
In both cases the product of the two standard scores is positive
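Correlation Coefficient
Pearson correlation coefficient is the mean of the products of each individual’s two standard scores:
$$r = \frac{1}{N} \sum \left(\frac{x - \bar{x}}{s_x}\right) \left(\frac{y - \bar{y}}{s_y}\right)$$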
Positive value: the two standard scores tend to have the same sign and be of approximately equal size
Negative value: standard score is above the mean on one variable and below the mean on the other (cross product is negative)
No correlation: some products are positive and some are negative, so they roughly cancel out
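The "mean of products of standard scores" definition can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the GPA values are invented, and the population standard deviation (dividing by N) is assumed so that the result matches the 1/N form of the formula.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson r as the mean of the products of paired standard scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mx, my = mean(xs), mean(ys)
    # Population SD (divide by N), to match the 1/N in the formula.
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return mean(products)


# Hypothetical high school and university GPAs (not real data).
high_school = [2.0, 2.5, 3.0, 3.5, 4.0]
university = [2.2, 2.4, 3.1, 3.3, 3.9]

r = pearson_r(high_school, university)
print(round(r, 3))
```

Students who are above the mean on one GPA here are also above it on the other, so most products are positive and r comes out close to +1.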
Regression
If you had no other information, what is the best prediction for a person’s grade in a course?
Often we have other information (e.g., grades in other courses, midterm grades, etc.)
If variables are correlated with the variable of interest, this information can help us improve our prediction
This process is called regression
Regression
Regression line: best fitting straight line through a set of points in a scatter diagram
Principle of Least Squares: the regression line is the one that minimizes the sum of squared deviations of the points from the line
Regression Line
Y’ = a + bX
Y’ = predicted score
a = intercept, the value of Y when X is 0; the point where the regression line crosses the Y axis
b = regression coefficient, the slope of the regression line
X = known score
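As a rough sketch of how a and b can be obtained under the least-squares principle, the snippet below uses the standard closed-form solution for one predictor. The function name `fit_line` and the five data points are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variability, so the slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean X, mean Y)
    return a, b


# Hypothetical paired scores (not real data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))
```

The intercept is computed from the slope, which reflects the fact that the least-squares line always passes through the point of the two means.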
Regression Line
Y’ = a + bX
Y’ = 20 + .1X
Where Y’ = predicted grade for course, X = SAT score
slope = .1
intercept = 20
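To see what this equation predicts, here is a small sketch that plugs a few SAT scores into Y’ = 20 + .1X. The function name `predict_grade` and the SAT values are hypothetical; only the slope and intercept come from the slide.

```python
def predict_grade(sat_score, intercept=20.0, slope=0.1):
    """Predicted course grade from Y' = a + bX with a = 20 and b = .1."""
    return intercept + slope * sat_score


# Hypothetical SAT scores chosen only to illustrate the equation.
for sat in (400, 500, 600):
    print(sat, round(predict_grade(sat), 1))
```

Each additional SAT point adds .1 to the predicted grade, so a 100-point difference in SAT corresponds to a 10-point difference in predicted grade.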
Regression
What if there were no correlation between X and Y?
What would the regression line look like?
Regression
The larger the absolute value of b, the more information we have about Y by knowing X
Regression
What happens if both variables are in terms of standard scores?
Y’ = a + bX
a = 0
b = r, the correlation between X and Y
Regression equation would be: $Z_{Y'} = r Z_X$
Correlation: special case of regression where both variables are in standard scores
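The claim that a = 0 and b = r for standard scores can be checked numerically. The sketch below uses made-up data and hypothetical helper names (`z_scores`, `fit_line`); it converts both variables to standard scores, fits a least-squares line, and compares the slope with r.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to standard scores (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical paired scores (not real data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

zx, zy = z_scores(xs), z_scores(ys)
r = mean([p * q for p, q in zip(zx, zy)])  # mean of products of standard scores
a_z, b_z = fit_line(zx, zy)

print(round(r, 4), round(a_z, 4), round(b_z, 4))
```

Up to floating-point rounding, the intercept is 0 and the slope equals r, which is the sense in which correlation is a special case of regression.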
Regression Problems
Break into groups of 3 people and complete the problems on the handout
Terms Used in Correlation & Regression
Residual: difference between predicted and observed values, Y - Y’
Σ residuals = 0
Standard Error of the Estimate: standard deviation of the residuals, a kind of average size of the residuals
A measure of accuracy of prediction
Smaller = more accurate predictions, because the differences between Y and Y’ are small
$$S_{yx} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}}$$
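A short sketch may help make residuals and the standard error of the estimate concrete. The data are invented, the line is fitted with the usual least-squares formulas, and the N - 2 denominator follows the formula on this slide.

```python
import math
from statistics import mean

# Hypothetical paired scores (not real data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares slope and intercept for Y' = a + bX.
mx, my = mean(xs), mean(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # Y - Y'

# Standard error of the estimate, with N - 2 in the denominator.
n = len(xs)
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print([round(e, 2) for e in residuals])
print(round(s_yx, 4))
```

The residuals sum to zero (apart from rounding error), which is why their squares, not the residuals themselves, are used to measure prediction accuracy.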
Terms Used in Correlation & Regression
Coefficient of Determination (r²): % of total variation in one set of scores that we can account for from information about the other set
Cross Validation: calculate the standard error of estimate in a group of participants other than the one used to get the equation
Restricted Range: when restrictions on the sample inhibit variability, the observed correlation will likely be deflated
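One way to read the coefficient of determination is as the share of the total variation in Y that is not left over in the residuals. The sketch below, on made-up data, computes r² both by squaring r and as 1 minus the ratio of residual variation to total variation; the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical paired scores (not real data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)   # total variation in Y

r = sxy / (sxx * syy) ** 0.5           # Pearson correlation
b = sxy / sxx                          # least-squares slope
a = my - b * mx                        # least-squares intercept

# Variation in Y left unexplained by the regression line.
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared_from_r = r ** 2
r_squared_from_fit = 1 - ss_residual / syy

print(round(r_squared_from_r, 3), round(r_squared_from_fit, 3))
```

The two values agree, so for these invented scores about 60% of the variation in Y is accounted for by X.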
Terms Used in Correlation & Regression
Correlation-Causation Problem: a correlation between two variables does not necessarily mean that one causes the other
E.g., aggression and TV viewing
Third Variable Explanation: the possibility that a third variable that hasn’t been measured causes both
E.g., poor social adjustment may cause both aggression and TV viewing
Multiple Regression
Looks at the relationship among three or more variables
E.g., predicting course grade from SAT scores and average from previous year
$$y' = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$
Where k = # of predictor variables
Example: predicting law school GPA from undergrad GPA, professors’ ratings, age
Law school GPA = .8 (Z score of undergrad GPA) + .24 (Z score of profs’ ratings) + .03 (Z score of age)
Multiple Regression
$$y' = a + .8x_1 + .24x_2 + .03x_3$$
When variables are expressed in Z-units, the weights are standardized regression coefficients
Also called B’s or betas
If the variables are not in Z-units, the weights are raw regression coefficients
Also called b’s
Need to be careful when predictor variables are highly correlated
Best when predictor variables are uncorrelated
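The law school example can be turned into a small calculation. In the sketch below the three weights come from the slide, while the applicant's Z scores, the dictionary keys, and the function name `predict_law_gpa_z` are invented for illustration. It assumes the intercept a is 0 because every variable is in Z-units, so the prediction is itself a Z score.

```python
# Standardized weights (betas) from the law school GPA example.
BETAS = {"undergrad_gpa": 0.8, "prof_ratings": 0.24, "age": 0.03}


def predict_law_gpa_z(z_scores):
    """Predicted law school GPA in Z-units; intercept assumed to be 0."""
    return sum(BETAS[name] * z_scores[name] for name in BETAS)


# Hypothetical applicant: 1 SD above the mean on undergrad GPA,
# half an SD above on professors' ratings, 1 SD below the mean on age.
applicant = {"undergrad_gpa": 1.0, "prof_ratings": 0.5, "age": -1.0}

predicted_z = predict_law_gpa_z(applicant)
print(round(predicted_z, 2))
```

Because the predictors are on the same Z scale, the sizes of the weights can be compared directly: undergrad GPA contributes far more to this prediction than age does.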
Teaching Evaluation
For: Thanh-Thanh Tieu
Date: January 16, 2007
Class: Correlation & Regression, PS397
Strengths of the Lecture
Suggestions for Improvement
Additional Comments