clement-ward
Intro to Stats: Linear Regression
Uses correlations
Predicts the value of one variable from the value of another
***Computes UNKNOWN outcomes from present, known outcomes
If we know the correlation between two variables and one value, we can predict the other value
In other words, what value on Y would be predicted by a score on X?
Linear Regression
You are examining a relationship between continuous variables
You wish to predict scores on one variable from scores on the other
When to use it:
Fit a line between the two variables that best captures the scores
◦ Minimal distance between each data point and the line (least squares: the sum of squared vertical distances is as small as possible)
◦ Allows for the best guess at a score on the second variable given some data point on the first
◦ Error in prediction: distance from each point to the regression line
◦ If the correlation were perfect, every data point would fall exactly on the line (a 45-degree line when both variables are standardized)
Line of best fit – regression equation
Standard Error
Y’ = bX + a
Y’ = predicted score of Y based on X
b = slope of the line
a = point where the line crosses the Y-axis
X = score used as the predictor
Formula for a Line
Parts of the linear equation
b
◦ The value of b is the slope
◦ From this we can tell how much the Y variable will change when X increases by 1 point
a
◦ The Y-intercept
◦ This tells us what Y would be if X = 0
◦ This is where the line crosses the Y-axis
b = [ΣXY – (ΣX ΣY / n)] / [ΣX² – ((ΣX)² / n)]
First, b
a = (ΣY – bΣX) / n
Second, a
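The two computational formulas can be checked with a short sketch. The function name `fit_line` and the data are hypothetical (they are not from the slides); the points were chosen to lie exactly on Y = 2X + 1 so the result is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Return (b, a) for Y' = bX + a using the computational formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # First, b: [ΣXY - (ΣX ΣY / n)] / [ΣX² - ((ΣX)² / n)]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Second, a: (ΣY - bΣX) / n
    a = (sum_y - b * sum_x) / n
    return b, a


# Hypothetical scores that fall exactly on the line Y = 2X + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b, a = fit_line(xs, ys)
print("slope b =", b)
print("intercept a =", a)
```

Because b is needed to compute a, the slope is always calculated first.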
Can examine how closely the actual Y values approximate the predicted Y values
If averaged across all data points, this is the standard error of the estimate
◦ Estimates the imprecision of the line
Standard Error
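A minimal sketch of this idea follows. It assumes the usual textbook form of the standard error of the estimate, the square root of the sum of squared residuals divided by n – 2; the slides only say "averaged", so that divisor is an assumption. The function name and the data are hypothetical; b = 0.8 and a = 0.5 are the least-squares values for these four points.

```python
import math


def standard_error_of_estimate(xs, ys, b, a):
    """Typical distance between actual Y and predicted Y' = bX + a."""
    residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
    ss_residual = sum(r * r for r in residuals)
    # Assumption: divide by n - 2 (two estimated values, b and a)
    return math.sqrt(ss_residual / (len(xs) - 2))


# Hypothetical data; its least-squares line is Y' = 0.8X + 0.5
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

see = standard_error_of_estimate(xs, ys, b=0.8, a=0.5)
print("standard error of the estimate =", round(see, 3))
```

The larger this value, the further the actual scores sit from the line and the less precise the predictions.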
1. State hypotheses
◦ Null hypothesis: no relationship between years of education and income
  H0: β = 0
◦ Research hypothesis: years of education predict income
  H1: β ≠ 0
Example 1
We’ll use SPSS output to test whether X significantly predicts changes in Y
Partitions variance into variance accounted for by the predictors
◦ And variance unaccounted for by the predictors (the residual)
◦ The output will include a significance test of whether the variance accounted for differs significantly from zero (an F-statistic)
Hypothesis Test
5. Use SPSS output
Example 1
ANOVA (b)
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   301.042          1    301.042       37.290   .004 (a)
   Residual     32.292           4    8.073
   Total        333.333          5
a. Predictors: (Constant), yrsed
b. Dependent Variable: income
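The entries in the ANOVA table can be reproduced from the sums of squares and degrees of freedom. The sketch below uses only numbers from the output; the variable names are illustrative, and the proportion of variance accounted for (R²) is an extra quantity that this table does not print.

```python
# Values copied from the ANOVA output
ss_regression = 301.042
ss_residual = 32.292
df_regression = 1
df_residual = 4

# Mean square = sum of squares / degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = variance accounted for / variance unaccounted for
f_stat = ms_regression / ms_residual

# Proportion of total variance accounted for by the predictor
r_squared = ss_regression / (ss_regression + ss_residual)

print("MS residual =", round(ms_residual, 3))
print("F =", round(f_stat, 2))
print("R squared =", round(r_squared, 3))
```

A large F means the variance accounted for is large relative to the residual variance, which is what the Sig. column evaluates.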
5. Use SPSS output for the standardized beta and the test statistic
Example 1
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    12.708     2.091                                          6.077    .004
   yrsed         3.542      .580               .950                        6.107    .004
6. The output indicates that b = 3.54 and β = .95, with p < .05 (actually p < .01)
◦ So it does exceed the critical value
7. If over the critical value, reject the null and conclude that years of education significantly predict income
Example 1
In results
◦ Years of education significantly predicted income, b = 3.54, t = 6.11, p < .05, such that more years of education predicted greater income.
◦ Could further say that: for every additional year of education, participants made an additional $35,400 per year (3.54 × 10,000 dollars, with income recorded in units of $10,000).
Example 1
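The coefficients from the output can be plugged into Y’ = bX + a to make predictions. This is a sketch only: the function name and the years-of-education values passed in are hypothetical, and the $10,000 unit for income is taken from the interpretation on the results slide.

```python
# Coefficients copied from the SPSS output (Model 1)
B_CONSTANT = 12.708   # a, the Y-intercept
B_YRSED = 3.542       # b, the slope for years of education
SE_YRSED = 0.580      # standard error of the slope


def predict_income(years_of_education):
    """Predicted income (in units of $10,000): Y' = bX + a."""
    return B_YRSED * years_of_education + B_CONSTANT


# The t statistic is the coefficient divided by its standard error
t_yrsed = B_YRSED / SE_YRSED

# Each additional year changes the prediction by exactly b
difference = predict_income(13) - predict_income(12)

print("t for yrsed =", round(t_yrsed, 3))
print("change per extra year (x $10,000) =", round(difference, 3))
print("change per extra year in dollars =", round(difference * 10_000))
```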
Predict an outcome Y-value with multiple predictor X-values
**This is the real advantage over a correlation coefficient
Determine whether each predictor makes a unique improvement to the prediction of Y
Multiple Regression
Example 1
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    12.708     2.091                                          6.077    .004
   yrsed         3.542      .580               .950                        6.107    .004
2  (Constant)    4.673      .996                                           4.692    .018
   yrsed         -.541      .468               -.145                       -1.156   .331
   pincome       1.031      .114               1.137                       9.063    .003
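With two predictors the equation becomes Y’ = b1·X1 + b2·X2 + a, and each predictor gets its own significance test. The sketch below uses the Model 2 values from the output; the function name, the .05 cutoff variable, and the input scores are hypothetical. The t values are recomputed as B / Std. Error, so they can differ slightly from the printed ones because the table is rounded.

```python
ALPHA = 0.05  # assumed significance cutoff

# Model 2 values copied from the output: (B, Std. Error, Sig.)
model_2 = {
    "(Constant)": (4.673, 0.996, 0.018),
    "yrsed": (-0.541, 0.468, 0.331),
    "pincome": (1.031, 0.114, 0.003),
}


def predict_income(yrsed, pincome):
    """Predicted income from both predictors: Y' = b1*X1 + b2*X2 + a."""
    a = model_2["(Constant)"][0]
    b_yrsed = model_2["yrsed"][0]
    b_pincome = model_2["pincome"][0]
    return b_yrsed * yrsed + b_pincome * pincome + a


# Does each predictor make a unique improvement to the prediction?
unique_predictors = []
for name, (b, std_error, sig) in model_2.items():
    t = b / std_error
    print(f"{name}: t = {t:.3f}, Sig. = {sig}")
    if name != "(Constant)" and sig < ALPHA:
        unique_predictors.append(name)

print("significant unique predictors:", unique_predictors)
print("prediction for hypothetical scores:", round(predict_income(12, 5), 3))
```

In this output only pincome is significant once both predictors are in the model; yrsed no longer adds a unique improvement (Sig. = .331), even though it was significant on its own in Model 1.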