Chapter 17: Introduction to Regression

Introduction to Linear Regression

• The Pearson correlation measures the degree to which a set of data points forms a straight-line relationship.

• Regression is a statistical procedure that determines the equation for the straight line that best fits a specific set of data.
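
As a minimal sketch of the first of these ideas, the Pearson correlation for a small set of points can be computed with NumPy (the data are made up for illustration and are reused in the sketches that follow):

    import numpy as np

    # Five made-up (X, Y) points, chosen only for illustration.
    X = np.array([1, 2, 3, 4, 5], dtype=float)
    Y = np.array([2, 3, 5, 4, 6], dtype=float)

    # Pearson r: how closely the points cluster around a straight line.
    r = np.corrcoef(X, Y)[0, 1]
    print(f"Pearson r = {r:.2f}")   # 0.90 for these points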

Introduction to Linear Regression (cont.)

• Any straight line can be represented by an equation of the form Y = bX + a, where b and a are constants.

• The value of b is called the slope constant and determines the direction and degree to which the line is tilted.

• The value of a is called the Y-intercept and determines the point where the line crosses the Y-axis.
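
A minimal sketch of the role of the two constants, using an arbitrary slope and intercept chosen only for illustration:

    # Arbitrary constants for illustration: slope b and Y-intercept a.
    b, a = 2.0, 1.0

    def line(x):
        return b * x + a      # any straight line has the form Y = bX + a

    print(line(0))   # 1.0 -> the line crosses the Y-axis at a
    print(line(3))   # 7.0 -> each one-unit increase in X adds b to Y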

Introduction to Linear Regression (cont.)

• How well a set of data points fits a straight line can be measured by calculating the distance between the data points and the line.

• The total error between the data points and the line is obtained by squaring each distance and then summing the squared values.

• The regression equation is designed to produce the minimum sum of squared errors.
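
A minimal sketch of this least-squares idea, reusing the made-up points from above. The values 0.9 and 1.3 are the least-squares slope and intercept for those points (computed in the next sketch); any other line produces a larger total:

    import numpy as np

    X = np.array([1, 2, 3, 4, 5], dtype=float)
    Y = np.array([2, 3, 5, 4, 6], dtype=float)

    def sum_squared_error(b, a):
        # square each point's vertical distance from the line, then sum
        predicted = b * X + a
        return np.sum((Y - predicted) ** 2)

    print(sum_squared_error(1.2, 0.5))   # about 2.85 for an arbitrary line
    print(sum_squared_error(0.9, 1.3))   # about 1.90, the minimum here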

Introduction to Linear Regression (cont.)

The equation for the regression line is:

Ŷ = bX + a

where the slope constant is b = SP/SSX and the Y-intercept is a = MY - bMX (SP is the sum of products of the X and Y deviations, and SSX is the sum of squared deviations for X).
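
A minimal NumPy sketch of these formulas, reusing the made-up points from above:

    import numpy as np

    X = np.array([1, 2, 3, 4, 5], dtype=float)
    Y = np.array([2, 3, 5, 4, 6], dtype=float)

    # SP and SSX from deviation scores, then the two regression constants.
    SP  = np.sum((X - X.mean()) * (Y - Y.mean()))
    SSX = np.sum((X - X.mean()) ** 2)

    b = SP / SSX                   # slope
    a = Y.mean() - b * X.mean()    # Y-intercept
    print(f"Y-hat = {b:.2f}X + {a:.2f}")   # Y-hat = 0.90X + 1.30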

Introduction to Linear Regression (cont.)

• The ability of the regression equation to accurately predict the Y values is measured by first computing the proportion of the Y-score variability that is predicted by the regression equation and the proportion that is not predicted.
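
In the usual notation, the predicted portion is r² times SSY and the unpredicted portion is (1 - r²) times SSY. A minimal sketch with the same made-up points:

    import numpy as np

    X = np.array([1, 2, 3, 4, 5], dtype=float)
    Y = np.array([2, 3, 5, 4, 6], dtype=float)

    r   = np.corrcoef(X, Y)[0, 1]
    SSY = np.sum((Y - Y.mean()) ** 2)

    SS_regression = r**2 * SSY         # predicted variability, about 8.1
    SS_residual   = (1 - r**2) * SSY   # unpredicted variability, about 1.9
    print(SS_regression, SS_residual)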

Introduction to Linear Regression (cont.)

• The unpredicted variability can be used to compute the standard error of estimate, which is a measure of the average distance between the actual Y values and the predicted Y values.
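
A minimal sketch, carrying forward the unpredicted SS from the running example (df = n - 2 with a single predictor):

    import numpy as np

    SS_residual, n = 1.9, 5    # values from the running example
    standard_error = np.sqrt(SS_residual / (n - 2))
    print(f"standard error of estimate = {standard_error:.3f}")   # about 0.796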

Introduction to Linear Regression (cont.)

• Finally, the overall significance of the regression equation can be evaluated by computing an F-ratio.

• A significant F-ratio indicates that the equation predicts a significant portion of the variability in the Y scores (more than would be expected by chance alone).

• To compute the F-ratio, you first calculate a variance or MS for the predicted variability and for the unpredicted variability:

MS_regression = SS_regression / df_regression
MS_residual = SS_residual / df_residual

• With one predictor variable, df_regression = 1 and df_residual = n - 2, and the significance test is F = MS_regression / MS_residual.
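
A minimal sketch of this F-ratio for the running example, using the SS values computed in the earlier sketch:

    SS_regression, SS_residual, n = 8.1, 1.9, 5   # running-example values

    MS_regression = SS_regression / 1        # df_regression = 1 predictor
    MS_residual   = SS_residual / (n - 2)    # df_residual = n - 2
    F = MS_regression / MS_residual
    print(f"F(1, {n - 2}) = {F:.2f}")        # about 12.79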

Introduction to Multiple Regression with Two Predictor Variables

• In the same way that linear regression produces an equation that uses values of X to predict values of Y, multiple regression produces an equation that uses two different variables (X1 and X2) to predict values of Y.

• The equation is determined by a least-squared-error solution that minimizes the squared distances between the actual Y values and the predicted Y values.
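
A minimal sketch of such a fit using NumPy's lstsq; the arrays below are hypothetical, made up for illustration:

    import numpy as np

    X1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
    X2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
    Y  = np.array([3, 4, 5, 7, 9, 10], dtype=float)

    # Columns for X1, X2, and the constant a; lstsq finds the values
    # that minimize the sum of squared prediction errors.
    design = np.column_stack([X1, X2, np.ones_like(X1)])
    (b1, b2, a), *_ = np.linalg.lstsq(design, Y, rcond=None)
    print(f"Y-hat = {b1:.2f}*X1 + {b2:.2f}*X2 + {a:.2f}")
    # about 1.42*X1 + 0.08*X2 + 1.08 for these made-up numbers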

Introduction to Multiple Regression with Two Predictor Variables (cont.)

• For two predictor variables, the general form of the multiple regression equation is:

Ŷ = b1X1 + b2X2 + a

• The ability of the multiple regression equation to accurately predict the Y values is measured by first computing the proportion of the Y-score variability that is predicted by the regression equation and the proportion that is not predicted.
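
A minimal sketch of that predicted proportion (R²), refitting the hypothetical two-predictor data from above:

    import numpy as np

    X1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
    X2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
    Y  = np.array([3, 4, 5, 7, 9, 10], dtype=float)

    design   = np.column_stack([X1, X2, np.ones_like(X1)])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    Y_hat = design @ coef

    SSY         = np.sum((Y - Y.mean()) ** 2)
    SS_residual = np.sum((Y - Y_hat) ** 2)
    R2 = 1 - SS_residual / SSY   # proportion of Y variability predicted
    print(f"R^2 = {R2:.3f}")     # about 0.983; made-up data fit tightly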

Introduction to Multiple Regression with Two Predictor Variables (cont.)

• As with linear regression, the unpredicted variability (SS and df) can be used to compute a standard error of estimate that measures the standard distance between the actual Y values and the predicted values.
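
A minimal sketch, carrying forward the residual SS (about 0.67) from the hypothetical two-predictor fit above; with two predictors the df is n - 3:

    import numpy as np

    SS_residual, n = 0.667, 6      # values from the two-predictor sketch
    standard_error = np.sqrt(SS_residual / (n - 3))
    print(f"standard error of estimate = {standard_error:.2f}")   # about 0.47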

Introduction to Multiple Regression with Two Predictor Variables (cont.)

• In addition, the overall significance of the multiple regression equation can be evaluated with an F-ratio:

F = MS_regression / MS_residual, with df = 2, n - 3

Partial Correlation

• A partial correlation measures the relationship between two variables (X and Y) while eliminating the influence of a third variable (Z).

• Partial correlations are used to reveal the real, underlying relationship between two variables when researchers suspect that the apparent relation may be distorted by a third variable.
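
A minimal sketch of one standard way to compute it, from the three pairwise Pearson correlations (the function name is an illustrative choice, not something prescribed by the slides):

    import numpy as np

    def partial_correlation(x, y, z):
        # Correlation between x and y with z held constant, computed
        # from the three pairwise Pearson correlations.
        r_xy = np.corrcoef(x, y)[0, 1]
        r_xz = np.corrcoef(x, z)[0, 1]
        r_yz = np.corrcoef(y, z)[0, 1]
        return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))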

Partial Correlation (cont.)

• For example, there probably is no underlying relationship between weight and mathematics skill for elementary school children.

• However, both of these variables are positively related to age: Older children weigh more and, because they have spent more years in school, have higher mathematics skills.

Partial Correlation (cont.)

• As a result, weight and mathematics skill will show a positive correlation for a sample of children that includes several different ages.

• A partial correlation between weight and mathematics skill, holding age constant, would eliminate the influence of age and show the true correlation, which is near zero.
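
A minimal simulation of this example (all numbers made up): age drives both weight and mathematics skill, so the raw correlation comes out clearly positive while the partial correlation falls to near zero:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical sample of 200 children of mixed ages.
    age    = rng.uniform(6, 12, size=200)
    weight = 8 + 3.2 * age + rng.normal(0, 4, size=200)   # grows with age
    skill  = 10 * age + rng.normal(0, 8, size=200)        # grows with age

    r_ws = np.corrcoef(weight, skill)[0, 1]   # inflated by age
    r_wa = np.corrcoef(weight, age)[0, 1]
    r_sa = np.corrcoef(skill, age)[0, 1]

    # Partial correlation of weight and skill, holding age constant.
    r_partial = (r_ws - r_wa * r_sa) / np.sqrt((1 - r_wa**2) * (1 - r_sa**2))
    print(f"raw r = {r_ws:.2f}, partial r = {r_partial:.2f}")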
