Intro to Stats: Linear Regression

Uses correlations

Predicts value of one variable from the value of another

Computes UNKNOWN outcomes from present, known outcomes

If we know correlation between two variables and one value, we can predict other value

In other words, what value on Y would be predicted by a score on X?

Linear Regression

You are examining a relationship between continuous variables

You wish to predict scores on one variable from scores on the other

When to use it:

Fit a line between the two variables that best captures the scores
◦ Minimal distance between each data point and the line
◦ Allows for the best guess at a score on the second variable given some data point on the first
◦ Error in prediction: distance from each point to the regression line
◦ If the correlation were perfect, every data point would fall exactly on the line (a 45-degree angle when both variables are standardized)

Line of best fit – regression equation

Standard Error

Y’ = bX + a

Y’ = predicted score of Y based on X
b = slope of the line
a = point where line crosses the y-axis
X = score used as the predictor

Formula for a Line

Parts of the linear equation

b
◦ The value of b is the slope
◦ From this we can tell how much the Y variable will change when X increases by 1 point

a
◦ The Y-intercept
◦ This tells us what Y would be if X = 0
◦ This is where the line crosses the Y axis
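The roles of b and a can be checked with a small sketch. The function name `predict` and the slope and intercept values are illustrative assumptions, not numbers from the slides.

```python
def predict(x, b, a):
    """Return the predicted score Y' = bX + a."""
    return b * x + a

# Hypothetical slope and intercept, chosen only for illustration.
b, a = 2.0, 5.0

# At X = 0 the prediction equals the intercept a.
at_zero = predict(0, b, a)

# Raising X by 1 point changes the prediction by exactly b.
change_per_point = predict(4, b, a) - predict(3, b, a)

print(at_zero, change_per_point)
```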

b = [ΣXY - (ΣX ΣY / n)] / [ΣX² - (ΣX)² / n]

First, b

a = (ΣY - bΣX) / n

Second, a
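As a rough sketch, the two computational formulas can be applied in order: first b from the sums, then a using that b. The function name `fit_line` and the five data points are hypothetical, made up only to show the arithmetic.

```python
def fit_line(xs, ys):
    """Compute slope b and intercept a from the raw-score sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # First, b: [ΣXY - (ΣX ΣY / n)] / [ΣX² - (ΣX)² / n]
    b = (sum_xy - (sum_x * sum_y) / n) / (sum_x2 - sum_x ** 2 / n)

    # Second, a: (ΣY - bΣX) / n
    a = (sum_y - b * sum_x) / n
    return b, a

# Hypothetical scores, not data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_line(xs, ys)
print(round(b, 3), round(a, 3))
```

Because a depends on b, the order matters: the slope has to be computed before the intercept.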

Can examine how closely the actual Y values approximate the predicted Y values

If averaged across all data points, this is the standard error of the estimate
◦ Estimates the imprecision of the line

Standard Error
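One way to sketch the standard error of the estimate is to take the residuals (actual Y minus predicted Y), square them, and average before taking the square root. This sketch assumes the conventional n - 2 denominator, which the slides do not spell out. The data, the line (b = 0.6, a = 2.2), and the function name are hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys, b, a):
    """Root of the summed squared residuals divided by n - 2 (assumed convention)."""
    residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
    sum_sq = sum(r * r for r in residuals)
    return math.sqrt(sum_sq / (len(xs) - 2))

# Hypothetical scores and the least-squares line that fits them.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2

see = standard_error_of_estimate(xs, ys, b, a)
print(round(see, 3))
```

A smaller value means the actual Y values sit closer to the line, so predictions are less imprecise.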

1. State hypotheses
◦ Null hypothesis: no relationship between years of education and income
  H0: β = 0
◦ Research hypothesis: years of education predicts income
  H1: β ≠ 0

Example 1

We’ll use SPSS output to test whether X significantly predicts changes in Y

Partitions variance into variance accounted for by predictors
◦ And variance unaccounted for by predictors (the residual)
◦ The output will include a significance test of whether the variance accounted for significantly differs from zero (an F-statistic)

Hypothesis Test

5. Use SPSS output

Example 1

ANOVA (b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression          301.042    1       301.042   37.290   .004 (a)
   Residual             32.292    4         8.073
   Total               333.333    5

a. Predictors: (Constant), yrsed

b. Dependent Variable: income
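The F-statistic in the table can be reproduced from the sums of squares and degrees of freedom. The sketch below uses only the values printed in the SPSS output. The proportion of variance accounted for (R²) is not shown in the table and is derived here as an extra step; the variable names are illustrative.

```python
# Values copied from the ANOVA table.
ss_regression, df_regression = 301.042, 1
ss_residual, df_residual = 32.292, 4

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F compares variance accounted for with residual variance.
f_stat = ms_regression / ms_residual

# Derived, not in the table: share of total variance accounted for.
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(ms_residual, 3), round(f_stat, 2), round(r_squared, 3))
```

With a single predictor, the square root of this proportion is about .95, which agrees with the standardized Beta reported for yrsed.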

5. Use SPSS output for the standardized beta and the test statistic

Example 1

                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    12.708     2.091                                          6.077    .004
   yrsed          3.542      .580              .950                        6.107    .004

6. The output indicates that b = 3.54 and β = .95, with p < .05 (actually p < .01)
◦ So it does exceed the critical value

7. If over the critical value, reject the null and conclude that years of education significantly predicts income
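The decision in steps 6 and 7 can be sketched by recomputing the test statistic as B divided by its standard error and comparing it with a critical value. The critical value 2.776 is the standard two-tailed t-table entry for α = .05 with df = 4 (the residual df in the ANOVA table); using a table lookup rather than the exact p-value from SPSS is an assumption of this sketch.

```python
# Values copied from the coefficients table for yrsed.
b = 3.542
se_b = 0.580

# The t statistic is the coefficient divided by its standard error.
t_stat = b / se_b

# Two-tailed critical t for alpha = .05 and df = 4, from a standard t table.
t_critical = 2.776

# Reject the null hypothesis (beta = 0) if |t| exceeds the critical value.
reject_null = abs(t_stat) > t_critical

print(round(t_stat, 3), reject_null)
```

With one predictor, squaring this t gives approximately the F-statistic from the ANOVA table, so the two tests lead to the same conclusion.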

Example 1

In results
◦ Years of education significantly predicted income, b = 3.54, t = 6.11, p < .05, such that more years of education predicted greater income.
◦ Could further say that: for every additional year of education, participants made an additional $35,400 per year (3.54 × 10,000 dollars).

Example 1

Predict an outcome Y-value with multiple predictor X-values

**This is the real advantage over a correlation coefficient

Determine whether each predictor makes a unique improvement to the prediction of Y

Multiple Regression

Example 1

                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    12.708     2.091                                          6.077    .004
   yrsed          3.542      .580              .950                        6.107    .004
2  (Constant)     4.673      .996                                          4.692    .018
   yrsed          -.541      .468              -.145                      -1.156    .331
   pincome        1.031      .114              1.137                       9.063    .003
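As a sketch of how Model 2 turns several predictors into one predicted value, the unstandardized coefficients from the table can be plugged into Y’ = a + b1(yrsed) + b2(pincome). The function name and the input values are hypothetical, and the slides do not state the measurement units, so the numbers only illustrate the arithmetic.

```python
# Unstandardized coefficients copied from Model 2 in the table.
INTERCEPT = 4.673
B_YRSED = -0.541
B_PINCOME = 1.031

def predict_income(yrsed, pincome):
    """Predicted income from both predictors, using the Model 2 coefficients."""
    return INTERCEPT + B_YRSED * yrsed + B_PINCOME * pincome

# Hypothetical inputs, in whatever units the original data used.
predicted = predict_income(yrsed=4, pincome=10)

# Holding pincome constant, one more unit of yrsed shifts the prediction by B_YRSED.
unique_change = predict_income(5, 10) - predict_income(4, 10)

print(round(predicted, 3), round(unique_change, 3))
```

Each coefficient describes the change in predicted Y for one predictor while the other is held fixed, which is how the table shows yrsed dropping to Sig. = .331 once pincome is in the model.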
