AP STATISTICS
LESSON 3 – 3
LEAST – SQUARES REGRESSION
Regression Line
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable.
LSRL – the abbreviation for least-squares regression line. The LSRL is a mathematical model.
Least-squares Regression Line
Error = observed – predicted
To find the best-fitting model, we square the errors and add them up, then look for the line with the smallest sum of squared errors.
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
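The idea of "smallest sum of squared errors" can be checked numerically. The sketch below uses a small made-up data set (not from the lesson) and a hypothetical helper, sum_squared_errors, to compare the least-squares line for those data with a few arbitrarily chosen alternatives.

```python
# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sum_squared_errors(intercept, slope):
    """Sum of squared vertical distances from the data points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# For these particular data the least-squares line is y-hat = 2.2 + 0.6x.
sse_lsrl = sum_squared_errors(2.2, 0.6)

# Other candidate lines, picked arbitrarily for comparison.
candidates = [(2.0, 0.6), (2.2, 0.7), (1.0, 1.0), (4.0, 0.0)]
for a, b in candidates:
    print(f"y-hat = {a} + {b}x  ->  sum of squared errors = {sum_squared_errors(a, b):.2f}")
print(f"LSRL                ->  sum of squared errors = {sse_lsrl:.2f}")
```

Every alternative line gives a larger sum of squared errors than the least-squares line, which is what the definition promises.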
Equation of the LSRL
We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ, the standard deviations s_x and s_y, and the correlation r.
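The equation itself did not survive on this slide. The standard textbook form, assumed here, is ŷ = a + bx with slope b = r · (s_y / s_x) and intercept a = ȳ – b · x̄. A minimal sketch of that calculation, using made-up data and illustrative variable names:

```python
from statistics import mean, stdev

# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)

# Correlation r: sum of products of standardized values, divided by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Assumed standard formulas for the least-squares line y-hat = a + b*x.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

def predict(x):
    """Predicted value y-hat for a given x."""
    return intercept + slope * x

print(f"r = {r:.4f}, slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Because the intercept is defined as ȳ – b · x̄, predict(x_bar) returns ȳ, so the point (x̄, ȳ) lies on the line.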
What happened to y = mx + b?
y represents the observed (actual) values of y, and ŷ ("y hat") represents the predicted values of y. We use ŷ in the equation of the regression line to emphasize that the line gives a predicted value for any x.
When you are solving regression problems, be sure to distinguish between y and ŷ.
Hot tip: (x̄, ȳ) is always a point on the regression line!
AP STATISTICS
LESSON 3 – 3 (DAY 2)
The role of r² in regression
Essential Question:
How is r² used to determine the reliability of a linear regression line?
To calculate r².
To find SST and SSE, and to find r² from them.
Definitions and Abbreviations
r² = coefficient of determination (the proportion of the total sample variability that is explained by the least-squares regression of y on x).
LSRL – Least-squares regression line.
SST – (Total Sum of Squares)
SST = ∑ (y – ȳ)²
SSE – (Sum of squares of errors)
SSE = ∑ (y – ŷ)²
Exercises
Small r² and Large r²
Page 158: Example 3.10 SMALL r²
Page 160: Example 3.11 LARGE r²
r² in Regression
The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
r² = (SST – SSE) / SST
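As a worked check of these definitions, the sketch below computes SST, SSE, and r² for a small made-up data set and compares the result with the square of the correlation computed directly. The data and variable names are illustrative assumptions, and the line is fitted with the usual least-squares formulas.

```python
# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line y-hat = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * x for x in xs]

# SST: total variation of y about its mean.
sst = sum((y - y_bar) ** 2 for y in ys)
# SSE: variation of y about the regression line.
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))

r_squared = (sst - sse) / sst

# Correlation computed directly, for comparison with r_squared.
r = sxy / (sxx * sst) ** 0.5

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, r^2 = {r_squared:.2f}, r = {r:.4f}")
```

For these data SST = 6 and SSE = 2.4, so r² = 0.6: the line accounts for 60% of the variation in y.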
Facts about Least-squares Regression
Fact 1: The distinction between explanatory and response variables is essential in regression.
Fact 2: There is a close connection between correlation and the slope of the least-squares line. Along the line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
Fact 3: The least-squares regression line always passes through the point (x̄, ȳ).
Fact 4: The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
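Facts 1 to 3 can be checked numerically. The sketch below uses made-up data and a hypothetical helper, least_squares, that fits the line of one variable on another. It is an illustration under those assumptions, not part of the lesson.

```python
from statistics import mean, stdev

# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def least_squares(explanatory, response):
    """Return (intercept, slope) of the least-squares line of response on explanatory."""
    u_bar, v_bar = mean(explanatory), mean(response)
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(explanatory, response))
    suu = sum((u - u_bar) ** 2 for u in explanatory)
    slope = suv / suu
    return v_bar - slope * u_bar, slope

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)

# Correlation computed independently of the fitted line.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((len(xs) - 1) * s_x * s_y)

# Fact 1: swapping the roles of x and y gives a different line.
a_yx, b_yx = least_squares(xs, ys)  # y on x
a_xy, b_xy = least_squares(ys, xs)  # x on y
# Slope of the x-on-y line when redrawn with y on the vertical axis.
slope_xy_in_y_terms = 1 / b_xy

# Fact 2: moving one s_x along the line changes y-hat by r * s_y.
change_in_y_hat = b_yx * s_x

# Fact 3: the line passes through (x-bar, y-bar).
y_hat_at_x_bar = a_yx + b_yx * x_bar

print(f"slope of y on x: {b_yx:.2f}; x-on-y line in the same axes: {slope_xy_in_y_terms:.2f}")
print(f"change in y-hat per s_x: {change_in_y_hat:.4f}; r * s_y: {r * s_y:.4f}")
print(f"y-hat at x-bar: {y_hat_at_x_bar:.2f}; y-bar: {y_bar}")
```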
AP STATISTICS
LESSON 3 – 3 (DAY 3)
RESIDUALS
ESSENTIAL QUESTION:
What is a residual and what can a residual graph tell us about linear regression lines?
Objective: To define and use residuals in the analysis of linear regression lines.
Residuals
A residual is the difference between an observed value of the response variable and the value predicted by the regression line.
That is, residual = observed y – predicted y = y – ŷ
Residual Facts
The mean of the least-squares residuals is always zero.
In the example output, the sum is not exactly 0 because the software rounded the residuals to four decimal places. This is roundoff error.
The horizontal line of the residual plot is at zero.
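A short sketch of these facts, using made-up data and illustrative names: it fits the least-squares line, computes each residual as y – ŷ, and confirms that the residuals sum to zero. Floating-point arithmetic can leave a tiny nonzero sum, which is the same kind of roundoff error described above.

```python
# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)
residual_mean = residual_sum / n

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.4f}")
print(f"sum of residuals = {residual_sum:.10f}")
```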
Residual Plots
A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line.
If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern. The residual plot will then show a uniform scatter of points about the horizontal line at zero (which corresponds to the fitted line), with no unusual individual observations.
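To see what a systematic pattern looks like, the sketch below fits a straight line to made-up data that actually follow a curve (y = x²) and lists the residuals in order of x. The data are an assumption chosen only to make the pattern obvious, and the residuals are printed rather than plotted to keep the example self-contained.

```python
# Made-up curved data: y = x squared, so a straight line is the wrong model.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# List the residuals against x, as they would appear left to right in a residual plot.
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")
```

The residuals are positive at both ends and negative in the middle. That U-shaped pattern signals that a straight line misses the curvature in the data, even though the residuals still sum to zero.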