Chapters 8, 9, 10: Least Squares Regression Line (Fitting a Line to Bivariate Data)
Slide 3
Suppose there is a relationship between two numerical
variables. Data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). Let x be
the amount spent on advertising and y be the amount of sales for
the product during a given period. You might want to predict
product sales for a month (y) when the amount spent on advertising
is $10,000 (x). The letter y is used to denote the variable you
want to predict, called the response variable. The other variable,
denoted by x, is the explanatory variable.
Slide 4
Simplest Relationship. The simplest equation that describes the
dependence of variable y on variable x is the linear equation
y = b0 + b1x. The slope b1 is the amount by which y changes when
x increases by 1 unit. The y-intercept b0 is where the line crosses
the y-axis; that is, the value of y when x = 0.
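As a minimal sketch of the slope and intercept, the coefficients b0 = 4 and b1 = 2.5 below are taken from the candidate line y = 4 + 2.5x that appears on a later slide; any straight line behaves the same way:

```python
# Minimal sketch of the linear equation y = b0 + b1*x.
# b0 = 4 and b1 = 2.5 come from the line y = 4 + 2.5x used later in the deck.
def line(x, b0=4.0, b1=2.5):
    """Value of the line y = b0 + b1*x at the point x."""
    return b0 + b1 * x

# Slope: increasing x by 1 unit changes y by exactly b1 = 2.5.
print(line(3) - line(2))  # 2.5
# Intercept: the value of y when x = 0 is b0 = 4.
print(line(0))            # 4.0
```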
Slide 5
[Figure: graph of the line y = b0 + b1x, showing the intercept b0
at x = 0 and the slope b1 = rise/run.]
Slide 6
How do you find an appropriate line for describing a bivariate
data set? Two candidate lines: y = 10 + 2x and y = 4 + 2.5x. Let's
look at only the blue line, y = 10 + 2x. To assess the fit of a
line, we look at how the points deviate vertically from the line.
What is the meaning of a negative deviation? (The point lies below
the line.) The point (15, 44) has a deviation of +4, since
44 − (10 + 2·15) = 4. To assess the fit of a line, we need a way
to combine the n deviations into a single measure of fit.
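The deviation arithmetic above can be sketched in code; the point (10, 27) is a hypothetical extra point, added only to illustrate a negative deviation:

```python
# Sketch: vertical deviations from the blue line y = 10 + 2x.
b0, b1 = 10, 2

def deviation(x, y):
    """Observed y minus the line's y at x (positive: point lies above the line)."""
    return y - (b0 + b1 * x)

print(deviation(15, 44))  # 4  : the point from the slide, above the line
print(deviation(10, 27))  # -3 : a hypothetical point below the line
```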
Slide 7
The deviations are referred to as residuals and denoted e_i: the
residual for the i-th point is e_i = y_i minus the value of y on
the line at x_i.
Slide 8
Residuals: graphically
Slide 9
The Least Squares (Regression) Line. A good line is one that
minimizes the sum of squared differences between the points and the
line.
Slide 10
The Least Squares (Regression) Line. Let us compare two lines on
the four points (1, 2), (2, 4), (3, 1.5), (4, 3.2). For the first
line, the sum of squared differences = (2 − 1)² + (4 − 2)² +
(1.5 − 3)² + (3.2 − 4)² = 7.89. The second line is horizontal at
y = 2.5; its sum of squared differences = (2 − 2.5)² + (4 − 2.5)² +
(1.5 − 2.5)² + (3.2 − 2.5)² = 3.99. The smaller the sum of squared
differences, the better the fit of the line to the data.
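The two sums of squared differences can be checked numerically. The slide does not state the first line's equation explicitly; y = x is assumed here because it reproduces the slide's term-by-term differences (1, 2, −1.5, −0.8):

```python
# Sketch: compare two candidate lines by their sum of squared differences.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_sq_diff(points, predict):
    """Sum of squared vertical deviations of the points from a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

s1 = sum_sq_diff(points, lambda x: x)    # first line, assumed to be y = x
s2 = sum_sq_diff(points, lambda x: 2.5)  # horizontal line y = 2.5
print(round(s1, 2), round(s2, 2))  # 7.89 3.99 -- here the horizontal line fits better
```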
Slide 11
Criterion for choosing what line to draw: the method of least
squares. The method of least squares chooses the line that makes
the sum of squares of the residuals as small as possible. This line
has slope b1 and intercept b0 that minimize

    Σ e_i² = Σ (y_i − (b0 + b1·x_i))².
Slide 12
Least Squares Line y = b0 + b1x: the slope and intercept that
minimize the sum of squared residuals are

    b1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²,    b0 = ȳ − b1·x̄.
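A sketch of these formulas in code, applied to the car-weight and fuel-consumption data that appears on the following slides:

```python
# Sketch: least squares slope b1 and intercept b0 from the formulas above.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # car weight
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]  # fuel consumption

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # sum (xi - xbar)(yi - ybar)
sxx = sum((x - xbar) ** 2 for x in xs)                      # sum (xi - xbar)^2

b1 = sxy / sxx          # slope
b0 = ybar - b1 * xbar   # intercept
print(round(b1, 3), round(b0, 3))  # approximately 1.639 and -0.363
```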
Slide 13
Scatterplot with least squares prediction line. Data (x_i, y_i):
(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
(2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9).
Slide 14
Observed y vs. predicted y: the predicted y when x = 2.7 is
ŷ = b0 + b1·2.7.
Slide 15
Car Weight, Fuel Consumption Example, cont. Data (x_i, y_i):
(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
(2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9).
The least squares line always goes through (x̄, ȳ) =
(2.9, 4.39).
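This property follows directly from the intercept formula b0 = ȳ − b1·x̄; a quick numerical check with the car data:

```python
# Sketch: the least squares line passes through (xbar, ybar) by construction,
# since b0 = ybar - b1*xbar implies b0 + b1*xbar = ybar.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

print(round(xbar, 1), round(ybar, 2))        # 2.9 4.39
print(abs((b0 + b1 * xbar) - ybar) < 1e-12)  # True: the line hits (xbar, ybar)
```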
Slide 20
Using the least squares line for prediction: what is the fuel
consumption of a 3,000 lb car? (x = 3)
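A sketch of the prediction, with the slope and intercept recomputed from the car data (the slides do not print the fitted coefficients, so the values here come from the standard least squares formulas):

```python
# Sketch: predict fuel consumption for a 3,000 lb car (x = 3).
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

y_hat = b0 + b1 * 3  # predicted fuel consumption at x = 3
print(round(y_hat, 2))  # approximately 4.55
```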
Slide 21
Be careful! Fuel consumption of a 500 lb car? (x = 0.5)
x = 0.5 is outside the range of the x-data that we used to
determine the least squares line, so the prediction is an
extrapolation and may be unreliable.
Slide 22
Avoid GIGO! Evaluating the least squares line:
1. Create a scatterplot. Is the relationship approximately linear?
2. Calculate r², the square of the correlation coefficient.
3. Examine the residual plot.
Slide 23
r²: The Variation Accounted For. The square of the correlation
coefficient r gives important information about the usefulness of
the least squares line.
Slide 24
r²: important information for evaluating the usefulness of the
least squares line. The square of the correlation coefficient, r²,
is the fraction of the variation in y that is explained by the
least squares regression of y on x; that is, by differences in x.
Since −1 ≤ r ≤ 1, it follows that 0 ≤ r² ≤ 1.
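A sketch of r² computed directly from the definition r = Sxy / √(Sxx·Syy), using the car-weight and fuel-consumption data from the earlier slides:

```python
# Sketch: r^2, the fraction of variation in y explained by x.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

r_squared = sxy ** 2 / (sxx * syy)  # equals r*r, so 0 <= r_squared <= 1
print(round(r_squared, 3))  # approximately 0.954
```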
Slide 25
March Madness: S(k) = Sagarin rating of the k-th seeded team;
Y_ij = Vegas point spread between seed i and seed j, i