Chapters 8, 9, 10 Least Squares Regression Line Fitting a Line to Bivariate Data

  • Slide 1
  • Slide 2
  • Chapters 8, 9, 10 Least Squares Regression Line Fitting a Line to Bivariate Data
  • Slide 3
  • Suppose there is a relationship between two numerical variables. Data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). Let x be the amount spent on advertising and y be the amount of sales for the product during a given period. You might want to predict product sales for a month (y) when the amount spent on advertising is $10,000 (x). The letter y denotes the variable you want to predict, called the response variable. The other variable, denoted by x, is the explanatory variable.
  • Slide 4
  • Simplest Relationship: the simplest equation describing the dependence of variable y on variable x is the linear equation y = b_0 + b_1x. The slope b_1 is the amount by which y changes when x increases by 1 unit. The y-intercept b_0 is where the line crosses the y-axis; that is, the value of y when x = 0.
  • Slide 5
  • [Figure: graph of the line y = b_0 + b_1x, showing the y-intercept b_0 and the slope b_1 = rise/run.]
  • Slide 6
  • How do you find an appropriate line for describing a bivariate data set? Two candidate lines are shown: y = 10 + 2x and y = 4 + 2.5x. Let's look at only the blue line. To assess the fit of a line, we look at how the points deviate vertically from the line; for example, the point (15, 44) has a deviation of +4. What is the meaning of a negative deviation? (The point lies below the line.) To assess the fit of a line, we need a way to combine the n deviations into a single measure of fit.
  • Slide 7
  • The deviations are referred to as residuals and denoted e_i; the residual for point i is e_i = y_i − (b_0 + b_1x_i), the observed y minus the y-value the line predicts.
  • Slide 8
  • Residuals: graphically
  • Slide 9
  • The Least Squares (Regression) Line: a good line is one that minimizes the sum of squared differences between the points and the line.
  • Slide 10
  • The Least Squares (Regression) Line: let us compare two lines for the four points (1, 2), (2, 4), (3, 1.5), (4, 3.2). For the first line, the sum of squared differences is (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89. For the second line, the horizontal line y = 2.5, the sum of squared differences is (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99. The smaller the sum of squared differences, the better the fit of the line to the data.
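As a quick arithmetic check, here is a minimal Python sketch. It assumes the first line is y = x, since its predicted values 1, 2, 3, 4 match the deviations quoted above; the function name sse is my own, not from the slides.

```python
# Sum of squared vertical deviations of the four points from a candidate line.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sse(predict):
    """Sum of squared differences between observed y and predict(x)."""
    return sum((y - predict(x)) ** 2 for x, y in points)

print(sse(lambda x: x))    # ≈ 7.89: deviations 1, 2, -1.5, -0.8
print(sse(lambda x: 2.5))  # ≈ 3.99: the horizontal line y = 2.5 fits better
```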
  • Slide 11
  • Criterion for choosing what line to draw: the method of least squares. The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible. This line has slope b_1 and intercept b_0 that minimize Σ e_i² = Σ (y_i − (b_0 + b_1x_i))².
  • Slide 12
  • Least Squares Line y = b_0 + b_1x: the slope is b_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)², and the intercept is b_0 = ȳ − b_1x̄.
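These formulas translate directly into code. The sketch below (the function name least_squares is mine, not from the slides) computes both coefficients from paired data:

```python
def least_squares(xs, ys):
    """Intercept b0 and slope b1 of the least squares line y = b0 + b1*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # Σ(x−x̄)(y−ȳ)
    sxx = sum((x - xbar) ** 2 for x in xs)                      # Σ(x−x̄)²
    b1 = sxy / sxx          # slope
    b0 = ybar - b1 * xbar   # intercept: the line passes through (x̄, ȳ)
    return b0, b1
```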
  • Slide 13
  • Scatterplot with least squares prediction line. Data (x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9).
  • Slide 14
  • Observed y and predicted y: the predicted y when x = 2.7 is ŷ = b_0 + b_1x = b_0 + b_1(2.7).
  • Slide 15
  • Car Weight, Fuel Consumption Example, cont. Data (x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9).
  • Slide 16
  • Computation table for the car data (x = weight in 1000s of lb, y = fuel consumption; x̄ = 2.9, ȳ = 4.39):

        Wt (x)   Fuel (y)   x − x̄   (x − x̄)²   y − ȳ    (y − ȳ)²   (x − x̄)(y − ȳ)
        3.4      5.5         0.5     0.25        1.11     1.2321      0.555
        3.8      5.9         0.9     0.81        1.51     2.2801      1.359
        4.1      6.5         1.2     1.44        2.11     4.4521      2.532
        2.2      3.3        -0.7     0.49       -1.09     1.1881      0.763
        2.6      3.6        -0.3     0.09       -0.79     0.6241      0.237
        2.9      4.6         0.0     0.00        0.21     0.0441      0.000
        2.0      2.9        -0.9     0.81       -1.49     2.2201      1.341
        2.7      3.6        -0.2     0.04       -0.79     0.6241      0.158
        1.9      3.1        -1.0     1.00       -1.29     1.6641      1.290
        3.4      4.9         0.5     0.25        0.51     0.2601      0.255
        ------------------------------------------------------------------
        29.0     43.9        0.0     5.18        0.00    14.589       8.490   (column sums)
  • Slide 17
  • Calculations: b_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² = 8.49 / 5.18 ≈ 1.639; b_0 = ȳ − b_1x̄ = 4.39 − (1.639)(2.9) ≈ −0.363. The least squares line is ŷ ≈ −0.363 + 1.639x.
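A short sketch, reusing least_squares from above, reproduces these coefficients from the raw car data:

```python
# Car weights (x, in 1000s of lb) and fuel consumption (y) from the table.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

b0, b1 = least_squares(xs, ys)  # least_squares() from the earlier sketch
print(b1)  # 8.49 / 5.18 ≈ 1.639
print(b0)  # 4.39 - 1.639 * 2.9 ≈ -0.363
```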
  • Slide 18
  • Scatterplot with least squares prediction line
  • Slide 19
  • The least squares line always goes through (x̄, ȳ). Here (x̄, ȳ) = (2.9, 4.39), and indeed −0.363 + 1.639(2.9) ≈ 4.39.
  • Slide 20
  • Using the least squares line for prediction: what is the fuel consumption of a 3,000 lb car (x = 3)? ŷ ≈ −0.363 + 1.639(3) ≈ 4.55.
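Continuing the same sketch, the prediction is just the fitted line evaluated at x = 3:

```python
# Predicted fuel consumption for a 3,000 lb car, using b0, b1 from above.
print(b0 + b1 * 3.0)  # ≈ 4.55
```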
  • Slide 21
  • Be careful! Fuel consumption of a 500 lb car (x = 0.5)? x = 0.5 is outside the range of the x-data that we used to determine the least squares line, so this prediction is an extrapolation and cannot be trusted.
  • Slide 22
  • Avoid GIGO! Evaluating the least squares line (a code sketch of the residual check follows this list):
    1. Create a scatterplot. Is the relationship approximately linear?
    2. Calculate r², the square of the correlation coefficient.
    3. Examine a residual plot.
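A minimal sketch of check 3, reusing xs, ys, b0, b1 from the sketches above:

```python
# Residuals e_i = y_i - (b0 + b1 * x_i) for the car data. Plotting them
# against x_i should show no systematic pattern if a line is appropriate.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)
```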
  • Slide 23
  • r²: The Variation Accounted For. The square of the correlation coefficient r gives important information about the usefulness of the least squares line.
  • Slide 24
  • r²: important information for evaluating the usefulness of the least squares line. The square of the correlation coefficient, r², is the fraction of the variation in y that is explained by the least squares regression of y on x, that is, by differences in x. Since −1 ≤ r ≤ 1, it follows that 0 ≤ r² ≤ 1.
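For the car data, r can be computed directly from the column sums tabulated earlier (S_xy = 8.49, S_xx = 5.18, S_yy = 14.589); a short sketch:

```python
import math

# r = S_xy / sqrt(S_xx * S_yy), using the column sums from the table.
r = 8.49 / math.sqrt(5.18 * 14.589)
print(r)       # ≈ 0.977
print(r ** 2)  # ≈ 0.954: about 95% of the variation in y is explained by x
```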
  • Slide 25
  • March Madness: S(k) is the Sagarin rating of the kth-seeded team; Y_ij is the Vegas point spread between seed i and seed j, i < j.