anabel-mason
Regression
The Basic Problem
• How do we predict one variable from another?
• How does one variable change as the other changes?
• Cause and effect
An Example
• Cigarettes and CHD Mortality from Chapter 9
• Data repeated on next slide
• We want to predict the level of CHD mortality in a country averaging 10 cigarettes per day.
The Data
Cigarette Consumption and Coronary Heart Disease Mortality for 21 Countries
Cig.  11   9   9   9   8   8   8   6   6   5   5
CHD   26  21  24  21  19  13  19  11  23  15  13

Cig.   5   5   5   5   4   4   4   3   3   3
CHD    4  18  12   3  11  15   6  13   4  14
Cig. = Cigarettes per adult per day
CHD = Coronary Heart Disease Mortality per 10,000 population
[Figure: scatterplot of CHD Mortality per 10,000 (vertical axis, 0 to 30) against Cigarette Consumption per Adult per Day (horizontal axis, 2 to 12)]
Why Do We Care?
• We may want to make a prediction.
• More likely, we want to understand the relationship.
  How fast does CHD mortality rise with a one unit increase in smoking?
  Note we speak about predicting, but often don’t actually predict.
Regression Line
• Formula
  $\hat{Y} = bX + a$
  $\hat{Y}$ = the predicted value of Y (CHD mortality)
  X = smoking incidence for that country
Regression Coefficients
• “Coefficients” are a and b
• b = slope
  Change in predicted Y for one unit change in X
• a = intercept
  value of $\hat{Y}$ when X = 0
Calculation
• Slope
  $b = \frac{\mathrm{cov}_{XY}}{s_X^2}$
• Intercept
  $a = \bar{Y} - b\bar{X}$
For Our Data
• $\mathrm{cov}_{XY}$ = 11.13
• $s_X^2$ = 2.33² = 5.43
• b = 11.13/5.43 = 2.04
• a = 14.52 - 2.04*5.95 = 2.37
• See SPSS printout on next slide
Answers are not exact due to rounding error and desire to match SPSS.
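The slope and intercept calculation can be checked directly from the 21 data pairs. The following is a minimal sketch in plain Python, not the SPSS procedure itself; the variable and function names (`cig`, `chd`, `covariance`) are chosen here for illustration, and the sample covariance and variance are assumed to divide by N - 1.

```python
# Data transcribed from the slide "The Data" (21 countries).
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance (divides by N - 1)."""
    mean_x, mean_y = mean(x), mean(y)
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross_products / (len(x) - 1)


cov_xy = covariance(cig, chd)
var_x = covariance(cig, cig)      # variance is the covariance of X with itself
b = cov_xy / var_x                # slope
a = mean(chd) - b * mean(cig)     # intercept

print(f"cov_XY = {cov_xy:.2f}, s2_X = {var_x:.2f}")
print(f"b = {b:.2f}, a = {a:.2f}")
```

Working from unrounded intermediate values, the variance comes out near 5.45 rather than the hand-rounded 5.43, while the slope and intercept still round to 2.04 and 2.37.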
SPSS Printout
Note:
• The values we obtained are shown on the printout.
• The intercept is labeled “constant.”
• Slope is labeled by name of predictor variable.
Making a Prediction
• Assume that we want CHD mortality when cigarette consumption is 6.
  $\hat{Y} = bX + a = 2.04X + 2.37$
  $\hat{Y} = 2.04 \times 6 + 2.37 = 14.61$
• We predict 14.61 people/10,000 in that country will die of coronary heart disease.
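The prediction step amounts to evaluating the fitted line at a chosen X. A small sketch, assuming the rounded coefficients from the slides (b = 2.04, a = 2.37); the function name `predict_chd` is made up for this example.

```python
def predict_chd(cigarettes_per_day, slope=2.04, intercept=2.37):
    """Predicted CHD mortality per 10,000 from the fitted line Y-hat = bX + a."""
    return slope * cigarettes_per_day + intercept


print(round(predict_chd(6), 2))   # the slide's worked example: 14.61
print(round(predict_chd(10), 2))  # the country averaging 10 cigarettes per day
```

The second call applies the same line to the question posed at the start, a country averaging 10 cigarettes per day.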
Accuracy of Prediction
• Finnish smokers smoke 6 cigarettes/adult/day
• We predict 14.61 deaths/10,000
• They actually have 23 deaths/10,000
• Our error (“residual”) = 23 - 14.61 = 8.39
  a large error
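The same subtraction can be carried out for every country at once. This sketch assumes the rounded coefficients from the slides and looks up the example country by its data pair (6 cigarettes, 23 deaths), since the data table does not carry country names.

```python
# Data transcribed from the slide "The Data" (21 countries).
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

B, A = 2.04, 2.37  # rounded slope and intercept from the slides

# Residual = observed Y minus predicted Y, one per country.
pairs = list(zip(cig, chd))
residuals = [y - (B * x + A) for x, y in pairs]

# Locate the example country by its (cigarettes, mortality) pair.
example_index = pairs.index((6, 23))
print(round(residuals[example_index], 2))  # 8.39

# Residuals of both signs occur: some countries fall above the line, some below.
print(sum(1 for e in residuals if e > 0), "above,", sum(1 for e in residuals if e < 0), "below")
```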
[Figure: scatterplot of CHD Mortality per 10,000 against Cigarette Consumption per Adult per Day, with the Prediction and the Residual marked]
Errors of Prediction
• Residual variance
  The variability of the observed values around the predicted values
  $s_{Y-\hat{Y}}^2 = \frac{\sum (Y - \hat{Y})^2}{N - 2}$
• Standard error of estimate
  The standard deviation of the observed values around the predicted values
Standard Error of Estimate
• A common measure of the accuracy of our predictions
  We want it to be as small as possible.
  $s_{Y-\hat{Y}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - 2}}$
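Both quantities follow from the squared residuals. A minimal sketch, assuming an ordinary least-squares fit to the 21 data pairs; the helper `fit_line` is written here for illustration and is not taken from the slides.

```python
import math

# Data transcribed from the slide "The Data" (21 countries).
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]


def fit_line(x, y):
    """Least-squares slope and intercept for Y-hat = bX + a."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sp_xy / ss_x
    return slope, mean_y - slope * mean_x


b, a = fit_line(cig, chd)
n = len(cig)

# Sum of squared residuals, then divide by N - 2 degrees of freedom.
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(cig, chd))
residual_variance = ss_residual / (n - 2)
standard_error = math.sqrt(residual_variance)

print(f"residual variance = {residual_variance:.2f}")
print(f"standard error of estimate = {standard_error:.2f}")
```

The divisor is N - 2 rather than N - 1 because two coefficients, the slope and the intercept, were estimated from the data.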
r² as % Predictable Variability
• Define Sum of Squares
  $SS_Y = \sum (Y - \bar{Y})^2$ and $SS_{residual} = \sum (Y - \hat{Y})^2$
• The reduction in error divided by the original error
  $r^2 = \frac{SS_Y - SS_{residual}}{SS_Y}$
For Our Data
• r = .713
• r² = .713² = .508
• Approximately 50% of the variability in incidence of CHD mortality is associated with variability in smoking.
• Elaborate on what this means.
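One way to see what r² measures is to compute it from sums of squares and compare it with the squared correlation. This is a sketch under the assumption that the error left after regression is the sum of squared residuals around the least-squares line; all variable names are illustrative.

```python
import math

# Data transcribed from the slide "The Data" (21 countries).
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cig)
mean_x, mean_y = sum(cig) / n, sum(chd) / n
ss_x = sum((x - mean_x) ** 2 for x in cig)
ss_y = sum((y - mean_y) ** 2 for y in chd)          # original error around the mean
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, chd))

b = sp_xy / ss_x
a = mean_y - b * mean_x
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(cig, chd))  # error left after regression

r = sp_xy / math.sqrt(ss_x * ss_y)                  # Pearson correlation
r_squared = (ss_y - ss_residual) / ss_y             # proportional reduction in error

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

The two routes agree: squaring the correlation gives the same number as the proportional reduction in squared error.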
Hypothesis Testing
• Null hypotheses
  b* = 0
  a* = 0
  • Define b* and a*
  population correlation (ρ) = 0
• We saw how to test the last one in Chapter 9.
Testing Slope and Intercept
• These are given in the computer printout as a t test.
Testing
• The t values in the second from right column are tests on slope and intercept.
• The associated p values are next to them.
• The slope is significantly different from zero, but the intercept is not.
• Why do we care?
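The t values on a regression printout can be approximated by hand as each coefficient divided by its standard error. The sketch below uses the usual simple-regression standard error formulas, which the slides do not show, and compares against a two-tailed .05 critical value of 2.093 for 19 degrees of freedom taken from a standard t table; both are assumptions of this example rather than part of the printout.

```python
import math

# Data transcribed from the slide "The Data" (21 countries).
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cig)
mean_x, mean_y = sum(cig) / n, sum(chd) / n
ss_x = sum((x - mean_x) ** 2 for x in cig)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, chd))

b = sp_xy / ss_x
a = mean_y - b * mean_x

# Standard error of estimate, on N - 2 degrees of freedom.
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(cig, chd))
s_est = math.sqrt(ss_residual / (n - 2))

# Standard errors of the slope and the intercept.
se_b = s_est / math.sqrt(ss_x)
se_a = s_est * math.sqrt(1 / n + mean_x ** 2 / ss_x)

t_b = b / se_b
t_a = a / se_a

T_CRITICAL = 2.093  # assumed: two-tailed .05 critical t for 19 df, from a t table

print(f"slope:     t = {t_b:.2f}, significant = {abs(t_b) > T_CRITICAL}")
print(f"intercept: t = {t_a:.2f}, significant = {abs(t_a) > T_CRITICAL}")
```

Under these assumptions the slope's t exceeds the critical value and the intercept's does not, which matches the conclusion stated on the slide.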
Testing--cont.
• What does it mean if the slope is not significant?
  How does that relate to the test on r?
• What if the intercept is not significant?
• Does a significant slope mean we predict quite well?
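The question of how the slope test relates to the test on r can be explored numerically. This sketch assumes the standard t test for a correlation, t = r·sqrt(N - 2)/sqrt(1 - r²), and the usual standard error of the slope; neither formula is printed on these slides, and the variable names are illustrative.

```python
import math

# Data transcribed from the slide "The Data" (21 countries).
cig = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cig)
mean_x, mean_y = sum(cig) / n, sum(chd) / n
ss_x = sum((x - mean_x) ** 2 for x in cig)
ss_y = sum((y - mean_y) ** 2 for y in chd)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, chd))

# t test on the slope: b divided by its standard error.
b = sp_xy / ss_x
a = mean_y - b * mean_x
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(cig, chd))
se_b = math.sqrt(ss_residual / (n - 2)) / math.sqrt(ss_x)
t_from_slope = b / se_b

# t test on the correlation.
r = sp_xy / math.sqrt(ss_x * ss_y)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"t from slope = {t_from_slope:.3f}")
print(f"t from r     = {t_from_r:.3f}")
```

With a single predictor the two statistics coincide, so a nonsignificant slope and a nonsignificant correlation are the same finding. A significant slope still says nothing by itself about how small the standard error of estimate is.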