The Coefficient of Determination
Lecture 46, Section 13.9
Robb T. Koether, Hampden-Sydney College
Tue, Apr 13, 2010


Outline

1 The Regression Identity

2 Sums of Squares on the TI-83

3 Explaining Variation

4 TI-83 - The Coefficient of Determination

5 Assignment


Explaining the Variation in y

Statisticians use regression models to “explain” y.
More specifically, through the model they use variation in x to explain variation in y.


For example, why do some people weigh more than other people?
One explanation is that some people weigh more than others because they are taller.
That is, there is variation in weight because there is variation in height and because weight and height are correlated.
But that is only a partial explanation.


Statisticians want to quantify how much of the variation in y is explained by the variation in x.


The Regression Identity

As always, variation is measured by calculating a sum of squared deviations.
There are three different deviations that we can measure.

  Deviations of $y$ from $\bar{y}$ (variation in the data).
  Deviations of $\hat{y}$ from $\bar{y}$ (variation in the model).
  Deviations of $y$ from $\hat{y}$ (difference between the data and the model).


Variation in the data (total sum of squares):

$$\mathrm{SST} = \sum (y - \bar{y})^2.$$

Variation in the model (regression sum of squares):

$$\mathrm{SSR} = \sum (\hat{y} - \bar{y})^2.$$

Residuals (sum of squared errors):

$$\mathrm{SSE} = \sum (y - \hat{y})^2.$$


Example - SST, SSR, and SSE

The following data represent the heights and weights of 10 adult males.

Height (x, in.)   Weight (y, lb)
      70               185
      65               140
      71               180
      76               220
      68               150
      67               170
      68               185
      72               200
      74               210
      69               160


The regression line is

$$\hat{y} = -310 + 7x.$$

The model predicts, for example, that if a person is 70 inches tall, he will weigh 180 pounds.
The model also predicts that a person will weigh an additional 7 pounds for each additional inch of height.
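As a check on this line, the least-squares slope and intercept can be recomputed from the ten data points. The sketch below uses plain Python rather than the TI-83; the names `heights`, `weights`, and `fit_line` are illustrative and do not come from the lecture.

```python
# Heights (inches) and weights (pounds) of the 10 adult males in the example.
heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


a, b = fit_line(heights, weights)
print(a, b)        # -310.0 7.0
print(a + b * 70)  # 180.0, the predicted weight at 70 inches
```

For this data set the means are whole numbers (70 inches and 180 pounds), which is why the slope and intercept come out exact.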


Compute the predicted weight: Y1(L1)→L3.

Height (x)   Weight (y)   Pred. Wgt. ($\hat{y}$)
    70          185          180
    65          140          145
    71          180          187
    76          220          222
    68          150          166
    67          170          159
    68          185          166
    72          200          194
    74          210          208
    69          160          173


[Figure: The regression line.]

[Figure: The deviations of $y$ from $\bar{y}$.]

[Figure: The deviations of $\hat{y}$ from $\bar{y}$.]

[Figure: The deviations of $y$ from $\hat{y}$.]


Example

Compute SST. On the TI-83: L2-ȳ, then Ans², then sum(Ans).

  x     y    $y - \bar{y}$   $(y - \bar{y})^2$
 70   185         5               25
 65   140       −40             1600
 71   180         0                0
 76   220        40             1600
 68   150       −30              900
 67   170       −10              100
 68   185         5               25
 72   200        20              400
 74   210        30              900
 69   160       −20              400

Sum of the last column: SST = 5950


Compute SSR. On the TI-83: Y1(L1)→L3, then L3-ȳ, then Ans², then sum(Ans).

  x     y    $\hat{y}$   $\hat{y} - \bar{y}$   $(\hat{y} - \bar{y})^2$
 70   185     180             0                      0
 65   140     145           −35                   1225
 71   180     187             7                     49
 76   220     222            42                   1764
 68   150     166           −14                    196
 67   170     159           −21                    441
 68   185     166           −14                    196
 72   200     194            14                    196
 74   210     208            28                    784
 69   160     173            −7                     49

Sum of the last column: SSR = 4900


Compute SSE. On the TI-83: Y1(L1)→L3, then L2-L3→L4, then Ans², then sum(Ans).

  x     y    $\hat{y}$   $y - \hat{y}$   $(y - \hat{y})^2$
 70   185     180           5                25
 65   140     145          −5                25
 71   180     187          −7                49
 76   220     222          −2                 4
 68   150     166         −16               256
 67   170     159          11               121
 68   185     166          19               361
 72   200     194           6                36
 74   210     208           2                 4
 69   160     173         −13               169

Sum of the last column: SSE = 1050


We have now found that

SSR = 4900.

SSE = 1050.

SST = 5950.

We see that SSR + SSE = SST.

This is called the regression identity.
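The identity can be checked numerically for the height and weight data. This is a minimal Python sketch, assuming the fitted line ŷ = −310 + 7x from the example; the variable names are illustrative.

```python
# Heights (inches) and weights (pounds) from the example.
heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]

y_bar = sum(weights) / len(weights)          # mean weight
predicted = [-310 + 7 * x for x in heights]  # y-hat from the regression line

# The three sums of squared deviations.
sst = sum((y - y_bar) ** 2 for y in weights)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(weights, predicted))

print(sst, ssr, sse)     # 5950.0 4900.0 1050
print(ssr + sse == sst)  # True
```

The identity relies on the line being the least-squares line. With a different slope or intercept, the three sums generally no longer balance.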


TI-83 - Finding SSR, SSE, and SST

TI-83 SSR, SSE, and SST
Put the x values into L1 and the y values into L2.
Use LinReg(a+bx) L1,L2,Y1.
Enter Y1(L1)→L3.
To get SSR, evaluate sum((L3-ȳ)²).
To get SSE, evaluate sum((L2-L3)²).
To get SST, evaluate sum((L2-ȳ)²).


Explaining Variation

One goal of regression is to “explain” the variation in y.
For example, if y were weight, how would we explain the variation in weight?
That is, why do some people weigh more than others?
A partial answer is that some people weigh more because they are taller.
That is, an explanatory variable is height x.
What are some other partial answers?


How much of the variation in weight is explained by variation in height?
The total variation in weight is SST.
The linear model (the regression line) explains some of the variation.
The model predicts the variation SSR.
The remainder is SSE, the variation not predicted by the model.


Statisticians consider the predicted variation SSR to be the amount of variation in y that is explained by the model.
The residual variation SSE is the remaining variation in y that is not explained by the model.
It all checks out because SST = SSR + SSE.
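A brief sketch of why the identity holds, assuming the line is the least-squares line. The derivation is not in the slides; it splits each deviation from the mean into a residual part and a model part:

```latex
\begin{align*}
\sum (y - \bar{y})^2
  &= \sum \bigl[(y - \hat{y}) + (\hat{y} - \bar{y})\bigr]^2 \\
  &= \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2
     + 2 \sum (y - \hat{y})(\hat{y} - \bar{y}).
\end{align*}
```

For the least-squares line, the residuals $y - \hat{y}$ sum to zero and are uncorrelated with $x$, so the cross term vanishes and what remains is SST = SSE + SSR.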


Variation Explained by the Model

[Figure: The regression line.]

[Figure: The total variation in y (SST).]

[Figure: The variation in y that is explained by the model (SSR).]

[Figure: The variation in y that is unexplained by the model (SSE).]


Explaining Variation

It can be shown that

$$r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}$$

and, therefore,

$$1 - r^2 = \frac{\mathrm{SSE}}{\mathrm{SST}}.$$

Therefore, $r^2$ is the proportion of variation in y that is explained by the model. It is called the coefficient of determination.
$1 - r^2$ is the proportion that is not explained by the model.
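For the height and weight example, the relationship between r and SSR/SST can be checked directly. The sketch below computes r from its usual definition and compares its square with the ratio of sums of squares; the variable names are illustrative and the code stands in for the TI-83 steps.

```python
from math import sqrt

# Heights (inches) and weights (pounds) from the example.
heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

s_xx = sum((x - x_bar) ** 2 for x in heights)
s_yy = sum((y - y_bar) ** 2 for y in weights)  # this is SST
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

# Correlation coefficient from its definition.
r = s_xy / sqrt(s_xx * s_yy)

# SSR from the least-squares predictions.
slope = s_xy / s_xx
predicted = [y_bar + slope * (x - x_bar) for x in heights]
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sst = s_yy

print(round(r ** 2, 4))     # 0.8235
print(round(ssr / sst, 4))  # 0.8235
```

So in this example about 82% of the variation in weight is explained by the variation in height, and the remaining 18% or so is not.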


TI-83 - Coefficient of Determination

TI-83 Coefficient of Determination
To calculate $r^2$ on the TI-83, follow the procedure that produces the regression line and r.
In the same window, the TI-83 reports the value of $r^2$.


TI-83 - Finding SSR, SSE, and SST

Practice
The data below represent crude oil prices [a] (x) vs. gasoline prices [b] (y).
Draw the scatter plot.
Find the equation of the regression line.
Perform the residual analysis.
Find the correlation coefficient.
Find the coefficient of determination.
Compute SST, SSR, and SSE.

[a] http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_WCO_K_W.xls
[b] http://tonto.eia.doe.gov/oog/ftparea/wogirs/xls/pswrgvwrec.xls


Date      Crude Oil   Date      Gasoline
Jan 16      40.98     Jan 19     1.833
Jan 23      41.05     Jan 26     1.833
Jan 30      42.07     Feb 2      1.894
Feb 6       41.77     Feb 9      1.926
Feb 13      43.04     Feb 16     1.970
Feb 20      39.87     Feb 23     1.924
Feb 27      40.22     Mar 2      1.942
Mar 6       42.85     Mar 9      1.936
Mar 13      42.91     Mar 16     1.921
Mar 20      44.90     Mar 23     1.950
Mar 27      50.10     Mar 30     2.048
Apr 3       48.09     Apr 9      2.044

Find SST, SSR, and SSE.
Find $r^2$ and interpret the value.


Assignment

Homework
Read Section 13.9, pages 868 - 869.
Work the practice problem above.


Answers to Even-Numbered Exercises

SST = 0.0490, SSR = 0.0321, SSE = 0.0169.
$r^2$ = 0.6544. About 65.44% of the variation in gas prices is explained by variation in oil prices.
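These answers can be reproduced from the practice data. The following Python sketch is one way to do so; `sums_of_squares` is a hypothetical helper written for this illustration, and the printed values should agree with the answers above up to rounding.

```python
# Crude oil prices (x) and gasoline prices (y) from the practice table.
crude = [40.98, 41.05, 42.07, 41.77, 43.04, 39.87,
         40.22, 42.85, 42.91, 44.90, 50.10, 48.09]
gasoline = [1.833, 1.833, 1.894, 1.926, 1.970, 1.924,
            1.942, 1.936, 1.921, 1.950, 2.048, 2.044]


def sums_of_squares(xs, ys):
    """Fit the least-squares line and return (SST, SSR, SSE)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(crude, gasoline)
r_squared = ssr / sst
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

Because the data are decimals, the regression identity holds here only up to floating-point rounding rather than exactly.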
