
AP STATISTICS LESSON 3 – 3 LEAST – SQUARES REGRESSION


Regression Line

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable.

LSRL is the abbreviation for least-squares regression line. The LSRL is a mathematical model of the relationship between x and y.


Least-squares Regression Line

Error = observed − predicted

To find the most effective model, we square the errors and add them up. The best line is the one that makes this sum of squared errors as small as possible.
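The "square the errors and sum them" step can be sketched in a few lines of Python. This is a minimal sketch. The data set and the helper names `errors` and `sum_squared_errors` are hypothetical, invented for illustration, and do not come from the lesson. It scores one guessed line, ŷ = 1 + 1x, by computing its errors (observed − predicted) and then the sum of their squares.

```python
# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]   # explanatory variable
y = [2, 4, 5, 4, 5]   # response variable

def errors(x, y, a, b):
    """Return observed - predicted for each point, using the line y-hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def sum_squared_errors(x, y, a, b):
    """Square each error and add them up."""
    return sum(e ** 2 for e in errors(x, y, a, b))

# Score a guessed line: y-hat = 1 + 1*x
print(errors(x, y, 1, 1))              # [0, 1, 1, -1, -1]
print(sum_squared_errors(x, y, 1, 1))  # 4
```

A different line can be compared the same way. The line with the smaller sum of squared errors fits these points better in the least-squares sense.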


The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
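This definition can be checked directly, without any formula, by trying many lines and keeping the one with the smallest sum of squared vertical distances. This is a minimal sketch on hypothetical data, with a hypothetical helper `sse`. The brute-force grid search only illustrates the definition. It is not how the LSRL is computed in practice.

```python
# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sse(a, b):
    """Sum of squared vertical distances from the points to the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Brute-force search (illustration only): every intercept a from 0.0 to 5.0
# and every slope b from -2.0 to 2.0, in steps of 0.1.
candidates = ((ai / 10, bi / 10) for ai in range(0, 51) for bi in range(-20, 21))
best_a, best_b = min(candidates, key=lambda ab: sse(*ab))
print(best_a, best_b)  # 2.2 0.6
```

For these points the search settles on ŷ = 2.2 + 0.6x. The line that the least-squares formulas give for the same data happens to lie exactly on this grid.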


Equation of the LSRL

We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ, the standard deviations sx and sy, and their correlation r. The least-squares regression line is the line ŷ = a + bx with slope

$$b = r\frac{s_y}{s_x}$$

and intercept

$$a = \bar{y} - b\bar{x}$$
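The recipe "compute x̄, ȳ, sx, sy and r, then the slope and intercept" translates almost line for line into Python. This is a minimal sketch on hypothetical data. It uses only the standard `statistics` module (`mean`, and `stdev` for the sample standard deviation). The helper `correlation` is hypothetical: it computes r as the sum of products of z-scores divided by n − 1.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def correlation(x, y):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

r = correlation(x, y)
b = r * stdev(y) / stdev(x)   # slope: b = r * sy / sx
a = mean(y) - b * mean(x)     # intercept: a = y-bar - b * x-bar
print(round(b, 4), round(a, 4))  # 0.6 2.2
```

So the LSRL for these points is ŷ = 2.2 + 0.6x.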


What happened to y = mx + b?

In statistics, the regression line is written ŷ = a + bx, where b is the slope (the m in y = mx + b) and a is the y-intercept (the b in y = mx + b).

y represents the observed (actual) values for y, and ŷ represents the predicted values for y. We use ŷ ("y hat") in the equation of the regression line to emphasize that the line gives predicted values for any x.

When you are solving regression problems, be sure to distinguish between y and ŷ.

Hot tip: (x̄, ȳ) is always a point on the regression line!
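The difference between y and ŷ, and the hot tip about (x̄, ȳ), can both be seen in a short Python sketch. The data are hypothetical. The intercept a = 2.2 and slope b = 0.6 are assumed to be the LSRL for these points, as obtained from b = r·sy/sx and a = ȳ − b·x̄.

```python
from statistics import mean

# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]      # observed values y
a, b = 2.2, 0.6          # assumed LSRL for these points: y-hat = 2.2 + 0.6x

y_hat = [a + b * xi for xi in x]   # predicted values y-hat
for xi, yi, yh in zip(x, y, y_hat):
    print(f"x = {xi}: observed y = {yi}, predicted y-hat = {yh:.1f}")

# Hot tip: at x = x-bar, the line predicts y-bar.
print(f"{a + b * mean(x):.1f} {mean(y):.1f}")  # 4.0 4.0
```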


LESSON 3 – 3 (DAY 2)

The role of r² in regression


Essential Question:

How is r² used to determine the reliability of a linear regression line?

Objectives:

To calculate r².

To find SST and SSE, and compute r² from them.


Definitions and Abbreviations

r² – coefficient of determination: the proportion of the total sample variability in y that is explained by the least-squares regression of y on x.

LSRL – Least-squares regression line.

SST – Total sum of squares:

$$SST = \sum (y - \bar{y})^2$$

SSE – Sum of squares of errors:

$$SSE = \sum (y - \hat{y})^2$$
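Both sums can be computed directly from their definitions. This is a minimal sketch on hypothetical data, assuming ŷ = 2.2 + 0.6x is the LSRL for these points. SST measures the variation of y about its mean ȳ. SSE measures the variation that is left over about the regression line.

```python
from statistics import mean

# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6          # assumed LSRL for these points: y-hat = 2.2 + 0.6x

y_bar = mean(y)
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation of y about y-bar
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # variation of y about the line
print(sst, round(sse, 4))  # 6 2.4
```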


Exercises

Small r² and Large r²

Page 158: Example 3.10 (small r²)

Page 160: Example 3.11 (large r²)


r² in Regression

The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

$$r^2 = \frac{SST - SSE}{SST}$$
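Here is a worked example with hypothetical numbers that are not taken from the textbook examples. Suppose a data set has SST = 6 and SSE = 2.4. Then

$$r^2 = \frac{SST - SSE}{SST} = \frac{6 - 2.4}{6} = \frac{3.6}{6} = 0.6$$

So 60% of the variation in y is explained by the least-squares regression of y on x. The remaining 40% (SSE/SST = 2.4/6) is variation left over about the line.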


Facts about Least-squares Regression

Fact 1: The distinction between explanatory and response variables is essential in regression.

Fact 2: There is a close connection between correlation and the slope of the least-squares line: the slope is b = r·sy/sx. So a change of one standard deviation in x corresponds to a change of r standard deviations in ŷ.
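Fact 2 can be checked numerically. Move x up by one standard deviation sx, and measure how much ŷ changes in units of sy. This is a minimal sketch on hypothetical data. The helper `predict` is hypothetical, and r is computed from the sum of products of deviations divided by (n − 1)·sx·sy.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Sample correlation r
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((len(x) - 1) * s_x * s_y)

b = r * s_y / s_x        # LSRL slope
a = y_bar - b * x_bar    # LSRL intercept

def predict(x_value):
    """Predicted y-hat on the LSRL."""
    return a + b * x_value

# Change in y-hat, in standard deviations of y, when x increases by one s_x
change_in_sds = (predict(x_bar + s_x) - predict(x_bar)) / s_y
print(round(change_in_sds, 4), round(r, 4))  # 0.7746 0.7746
```

Algebraically the change is b·sx/sy = r, which is why the two printed numbers agree.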


Facts of Regression (continued)

Fact 3. The least-squares regression line always passes through the point (x̄, ȳ).

Fact 4. The square of the correlation, r2, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
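Fact 4 ties the correlation to the sums of squares. The square of r should equal (SST − SSE)/SST for the least-squares line. This is a minimal sketch on hypothetical data that computes both quantities independently and compares them.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Correlation and the LSRL built from it
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((len(x) - 1) * s_x * s_y)
b = r * s_y / s_x
a = y_bar - b * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)                        # total sum of squares
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))     # sum of squared errors

print(round(r ** 2, 4), round((sst - sse) / sst, 4))  # 0.6 0.6
```

The agreement holds only for the least-squares line. For any other line, SSE would be larger and (SST − SSE)/SST would be smaller than r².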


LESSON 3 – 3 (DAY 3)

RESIDUALS


ESSENTIAL QUESTION:

What is a residual and what can a residual graph tell us about linear regression lines?

Objective: To define and use residuals in the analysis of linear regression lines.


Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,

residual = observed y − predicted y = y − ŷ
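Computing residuals is one subtraction per data point. This is a minimal sketch on hypothetical data, assuming ŷ = 2.2 + 0.6x is the LSRL for these points.

```python
# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6          # assumed LSRL for these points: y-hat = 2.2 + 0.6x

# residual = observed y - predicted y-hat
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 4) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

A positive residual means the point lies above the line. A negative residual means it lies below.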


Residual Facts

The mean of the least-squares residuals is always zero.

When you add up residuals reported by software, the sum may not be exactly 0, because the software rounded each residual (for example, to four decimal places). This is roundoff error.

The horizontal line in a residual plot is drawn at zero, the mean of the residuals.
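Both residual facts can be seen in a short sketch on a different hypothetical data set, chosen so that the residuals are not short decimals. The slope is computed as Sxy/Sxx, which equals r·sy/sx. The residuals average to zero up to floating-point error. After rounding each residual to four decimal places, the sum may drift slightly from 0, the same roundoff effect described above.

```python
from statistics import mean

# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 3, 4, 6, 8]
y = [2, 3, 6, 7, 11]

x_bar, y_bar = mean(x), mean(y)
# LSRL slope as (sum of products of deviations) / (sum of squared x deviations),
# which is algebraically the same as r * sy / sx
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(abs(mean(residuals)) < 1e-9)  # True

# Rounding each residual to four decimals can leave a small nonzero sum (roundoff error)
rounded = [round(e, 4) for e in residuals]
print(sum(rounded))
```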


Residual Plots

A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line.

If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern. The residual plot will then look like the simplified pattern: a uniform scatter of the points about the fitted line (the horizontal line at zero), with no unusual individual observations.
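A residual plot can be sketched even without plotting software. This is a minimal text-only sketch on hypothetical data, assuming ŷ = 2.2 + 0.6x is the LSRL. The helper `text_residual_plot` and its `width` and `scale` parameters are hypothetical. The plot is turned on its side: each row is one x value, `|` marks residual = 0, and `*` marks the residual.

```python
# Hypothetical data, invented for illustration (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6          # assumed LSRL for these points: y-hat = 2.2 + 0.6x

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

def text_residual_plot(x, residuals, width=21, scale=10):
    """Return one text row per x value: '|' is the zero line, '*' is the residual.
    Each character step across the row represents 1/scale units of residual."""
    center = width // 2
    rows = []
    for xi, e in zip(x, residuals):
        col = center + round(e * scale)
        col = max(0, min(width - 1, col))   # keep the marker inside the row
        row = [" "] * width
        row[center] = "|"
        row[col] = "*"                      # overwrites '|' if the residual is ~0
        rows.append(f"x={xi}: " + "".join(row))
    return rows

for line in text_residual_plot(x, residuals):
    print(line)
```

With real data you would normally use a graphing calculator or plotting library. The reading is the same either way: look for an unstructured scatter about the zero line, not a curve or a fan shape.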