Regression

The Basic Problem

• How do we predict one variable from another?

• How does one variable change as the other changes?

• Cause and effect

An Example

• Cigarettes and CHD Mortality from Chapter 9

• Data repeated on next slide

• We want to predict the level of CHD mortality in a country averaging 10 cigarettes per day.

The Data

Cigarette Consumption and Coronary Heart Disease Mortality for 21 Countries

Cig.  11   9   9   9   8   8   8   6   6   5   5
CHD   26  21  24  21  19  13  19  11  23  15  13

Cig.   5   5   5   5   4   4   4   3   3   3
CHD    4  18  12   3  11  15   6  13   4  14

Cig. = Cigarettes per adult per day
CHD = Coronary heart disease mortality per 10,000 population

[Figure: scatterplot of CHD Mortality per 10,000 against Cigarette Consumption per Adult per Day]

Why Do We Care?

• We may want to make a prediction.

• More likely, we want to understand the relationship. How fast does CHD mortality rise with a one-unit increase in smoking?

Note: we speak about predicting, but we often don't actually predict.

Regression Line

• Formula

$$\hat{Y} = bX + a$$

$\hat{Y}$ = the predicted value of Y (CHD mortality)

X = smoking incidence for that country

Regression Coefficients

• “Coefficients” are a and b

• b = slope: the change in predicted Y for a one-unit change in X

• a = intercept: the value of $\hat{Y}$ when X = 0

Calculation

• Slope

$$b = \frac{\mathrm{cov}_{XY}}{s_X^2}$$

• Intercept

$$a = \bar{Y} - b\bar{X}$$
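
The two formulas translate directly into code. Below is a minimal Python sketch, assuming plain lists of numbers and the sample (N - 1) versions of covariance and variance; the helper names `sample_cov` and `fit_line` are hypothetical, chosen for illustration. Because the N - 1 denominators cancel in the ratio, using N for both would give the same slope.

```python
from statistics import mean


def sample_cov(xs, ys):
    """Sample covariance of xs and ys, using N - 1 in the denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y-hat = bX + a."""
    b = sample_cov(xs, ys) / sample_cov(xs, xs)  # slope: cov_XY / s_X^2
    a = mean(ys) - b * mean(xs)                  # intercept: Y-bar - b * X-bar
    return b, a


# Toy points lying exactly on Y = 2X + 1, so the fit should recover b = 2, a = 1
b, a = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b, a)  # 2.0 1.0
```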

For Our Data

• $\mathrm{cov}_{XY} = 11.13$

• $s_X^2 = 2.33^2 = 5.43$

• $b = 11.13/5.43 = 2.04$

• $a = \bar{Y} - b\bar{X} = 14.52 - 2.04 \times 5.95 = 2.37$

• See SPSS printout on next slide.

Answers are not exact because of rounding error and a desire to match SPSS.
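
As a check on the hand calculation, the sketch below recomputes these quantities from the 21 data pairs, using only Python's standard library. It assumes the values were transcribed correctly into the illustrative lists `cig` and `chd`. Carried at full precision, $s_X^2$ comes out near 5.45 rather than 5.43, because the slide squares the rounded value 2.33; the slope and intercept still round to 2.04 and 2.37.

```python
from statistics import mean, variance

# Cigarettes per adult per day (X) and CHD deaths per 10,000 (Y) for 21 countries,
# in the same order as the data table
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cig)
mean_x, mean_y = mean(cig), mean(chd)

# Sample covariance (N - 1 denominator) and sample variance of X
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, chd)) / (n - 1)
var_x = variance(cig)

b = cov_xy / var_x          # slope
a = mean_y - b * mean_x     # intercept

print(f"mean X = {mean_x:.2f}, mean Y = {mean_y:.2f}")  # mean X = 5.95, mean Y = 14.52
print(f"cov = {cov_xy:.2f}, s2_X = {var_x:.2f}")        # cov = 11.13, s2_X = 5.45
print(f"b = {b:.2f}, a = {a:.2f}")                      # b = 2.04, a = 2.37
```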

SPSS Printout

Note:

• The values we obtained are shown on the printout.

• The intercept is labeled “constant.”

• The slope is labeled by the name of the predictor variable.

Making a Prediction

• Assume that we want CHD mortality when cigarette consumption is 6.

$$\hat{Y} = bX + a = 2.04X + 2.37$$

$$\hat{Y} = 2.04 \times 6 + 2.37 = 14.61$$

• We predict that 14.61 people per 10,000 in that country will die of coronary heart disease.
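
The prediction step is just the line evaluated at a chosen X. In the minimal sketch below, `predict` is a hypothetical helper that uses the rounded coefficients from the slides (b = 2.04, a = 2.37), so its output matches the hand calculation rather than full-precision software output. The second call applies the same line to the question posed in the example: a country averaging 10 cigarettes per day.

```python
def predict(x, b=2.04, a=2.37):
    """Predicted CHD deaths per 10,000 from Y-hat = bX + a (rounded slide coefficients)."""
    return b * x + a


print(round(predict(6), 2))   # 14.61
print(round(predict(10), 2))  # 22.77
```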

Accuracy of Prediction

• Finnish smokers smoke 6 cigarettes/adult/day

• We predict 14.61 deaths/10,000

• They actually have 23 deaths/10,000

• Our error (“residual”) = 23 - 14.61 = 8.39, a large error
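
A residual is the observed value minus the predicted value, so a positive residual means a country has more CHD deaths than the line predicts. A small sketch of the Finnish case, with `residual` as an illustrative helper name and the rounded line from the slides:

```python
def residual(observed, predicted):
    """Residual: observed Y minus predicted Y-hat."""
    return observed - predicted


y_hat = 2.04 * 6 + 2.37                # prediction for 6 cigarettes/adult/day
print(round(y_hat, 2))                 # 14.61
print(round(residual(23, y_hat), 2))   # 8.39
```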

[Figure: scatterplot of CHD Mortality per 10,000 against Cigarette Consumption per Adult per Day, with a Prediction and its Residual labeled]

Errors of Prediction

• Residual variance: the variability of the observed values around the predicted values

$$s^2_{Y-\hat{Y}} = \frac{\sum (Y - \hat{Y})^2}{N - 2}$$

• Standard error of estimate: the standard deviation of the observed values around the predicted values

Standard Error of Estimate

• A common measure of the accuracy of our predictions. We want it to be as small as possible.

$$s_{Y-\hat{Y}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - 2}}$$
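
To make the formula concrete, the sketch below fits the line to the 21-country data at full precision and then computes the residual variance and the standard error of estimate. The divisor is N - 2 because two parameters (a and b) were estimated from the data. The list names are assumptions for illustration, and the result (about 4.82 deaths per 10,000) is computed here rather than quoted from the printout.

```python
import math
from statistics import mean

# Cigarettes per adult per day (X) and CHD deaths per 10,000 (Y), 21 countries
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cig)
mean_x, mean_y = mean(cig), mean(chd)

# Least-squares slope and intercept (the N - 1 terms cancel in the ratio)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, chd)) / sum((x - mean_x) ** 2 for x in cig)
a = mean_y - b * mean_x

# Sum of squared residuals, divided by N - 2
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(cig, chd))
residual_variance = ss_residual / (n - 2)          # s^2_{Y - Y-hat}
std_error_estimate = math.sqrt(residual_variance)  # s_{Y - Y-hat}

print(round(std_error_estimate, 2))  # 4.82
```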

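$r^2$ as % Predictable Variability

• Define Sums of Squares: $SS_Y = \sum (Y - \bar{Y})^2$ (the original error), $SS_{residual} = \sum (Y - \hat{Y})^2$ (the error remaining after regression), and $SS_{regression} = \sum (\hat{Y} - \bar{Y})^2$

• $r^2$ is the reduction in error divided by the original error:

$$r^2 = \frac{SS_Y - SS_{residual}}{SS_Y} = \frac{SS_{regression}}{SS_Y}$$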
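
The identity can be checked numerically. This sketch, again on the 21-country data with illustrative list names, computes $r^2$ both as the proportional reduction in error and as $SS_{regression}/SS_Y$, then compares the result with the square of the Pearson correlation.

```python
import math
from statistics import mean

# Cigarettes per adult per day (X) and CHD deaths per 10,000 (Y), 21 countries
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

mean_x, mean_y = mean(cig), mean(chd)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, chd))  # sum of cross-products
ss_x = sum((x - mean_x) ** 2 for x in cig)
ss_y = sum((y - mean_y) ** 2 for y in chd)                        # original error

b = sp / ss_x
a = mean_y - b * mean_x
predicted = [b * x + a for x in cig]

ss_residual = sum((y - yh) ** 2 for y, yh in zip(chd, predicted))  # error left after regression
ss_regression = sum((yh - mean_y) ** 2 for yh in predicted)

r2_reduction = (ss_y - ss_residual) / ss_y
r2_regression = ss_regression / ss_y
r = sp / math.sqrt(ss_x * ss_y)  # Pearson correlation

print(round(r, 3), round(r2_reduction, 3), round(r2_regression, 3))  # 0.713 0.508 0.508
```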

For Our Data

• r = .713

• $r^2 = .713^2 = .508$

• Approximately 50% of the variability in the incidence of CHD mortality is associated with variability in smoking.

• Elaborate on what this means.

Hypothesis Testing

• Null hypotheses

b* = 0

a* = 0

population correlation (ρ) = 0

• Define b* and a*: the population slope and intercept

• We saw how to test the last one in Chapter 9.

Testing Slope and Intercept

• These are given in computer printout as a t test.
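
For simple regression, these t statistics can be reproduced by hand: each coefficient is divided by its standard error, with N - 2 degrees of freedom. The sketch below assumes the standard textbook formulas $s_b = s_{Y-\hat{Y}}/\sqrt{SS_X}$ and $s_a = s_{Y-\hat{Y}}\sqrt{1/N + \bar{X}^2/SS_X}$, and computes only the t values, since converting them to p values needs a t-distribution function that Python's standard library does not provide. It also computes the test on r, $t = r\sqrt{N-2}/\sqrt{1-r^2}$, which gives the same value as the slope test. With 19 df, a slope t near 4.43 and an intercept t near 0.80 fit the pattern of a significant slope and a nonsignificant intercept.

```python
import math
from statistics import mean

# Cigarettes per adult per day (X) and CHD deaths per 10,000 (Y), 21 countries
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cig)
mean_x, mean_y = mean(cig), mean(chd)
ss_x = sum((x - mean_x) ** 2 for x in cig)
ss_y = sum((y - mean_y) ** 2 for y in chd)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, chd))

b = sp / ss_x
a = mean_y - b * mean_x

# Standard error of estimate, with N - 2 degrees of freedom
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(cig, chd))
s_est = math.sqrt(ss_residual / (n - 2))

# Standard errors of slope and intercept, and the corresponding t statistics
se_b = s_est / math.sqrt(ss_x)
se_a = s_est * math.sqrt(1 / n + mean_x ** 2 / ss_x)
t_b = b / se_b
t_a = a / se_a

# The t test on r gives the same value as the slope test
r = sp / math.sqrt(ss_x * ss_y)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t(slope) = 4.43, t(intercept) = 0.80, t(r) = 4.43
print(f"t(slope) = {t_b:.2f}, t(intercept) = {t_a:.2f}, t(r) = {t_r:.2f}")
```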

Testing

• The t values in the second column from the right are tests on the slope and intercept.

• The associated p values are next to them.

• The slope is significantly different from zero, but the intercept is not.

• Why do we care?

Cont.

Testing, cont.

• What does it mean if the slope is not significant? How does that relate to the test on r?

• What if the intercept is not significant?

• Does a significant slope mean we predict quite well?