M23- Residuals & Minitab 1 Department of ISM, University of Alabama, 1992-2003
Residuals
A continuation of regression analysis
Lesson Objectives
Continue to build on regression analysis.
Learn how residual plots help identify problems with the analysis.
Example 1: Sample of n = 5 students, Y = Weight in pounds, X = Height in inches.
Case X Y
1 73 175
2 68 158
3 67 140
4 72 207
5 62 115
Prediction equation: $\widehat{Wt} = -332.73 + 7.189\,Ht$
r-square = ?
Std. error = ?
To be found later.
continued …
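The slides state the prediction equation without the arithmetic behind it. Below is a minimal Python sketch of the usual least squares formulas (slope $b = S_{xy}/S_{xx}$, intercept $a = \bar{y} - b\bar{x}$) applied to the five cases; the function name `least_squares` is an illustrative choice, not something from the course.

```python
# Example 1 data: X = height (inches), Y = weight (pounds)
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

a, b = least_squares(heights, weights)
print(f"Wt-hat = {a:.2f} + {b:.3f} Ht")
```

On these data it gives a slope of about 7.189 and an intercept of about -332.74; the slide's -332.73 differs only in the last digit.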
Example 1, continued
[Figure: scatterplot of WEIGHT vs. HEIGHT with the fitted line $\hat{Y} = -332.7 + 7.189X$.]
Residuals = distance from point to line, measured parallel to the Y-axis.
Calculation: For the ith case,
$e_i = y_i - \hat{y}_i$
residual = observed value - estimated mean
Compute the fitted value and residual for the 4th person in the sample; i.e., X = 72 inches, Y = 207 lbs.
Fitted value = $\hat{y}_4 = -332.73 + 7.189(\_\_\_) =$ _________
Residual = $e_4 = y_4 - \hat{y}_4$
= ______ - ______
= __________
Example 1, continued
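The same case-4 calculation can be sketched in Python using the rounded coefficients from the prediction equation; `fitted` and `residual` are hypothetical helper names chosen for illustration.

```python
# Rounded coefficients from the prediction equation
INTERCEPT = -332.73
SLOPE = 7.189

def fitted(x):
    """Estimated mean weight (y-hat) for height x."""
    return INTERCEPT + SLOPE * x

def residual(x, y):
    """Residual e = observed y minus fitted y-hat."""
    return y - fitted(x)

# Case 4: X = 72 inches, Y = 207 lbs
y_hat_4 = fitted(72)
e_4 = residual(72, 207)
print(f"y-hat_4 = {y_hat_4:.2f}, e_4 = {e_4:+.2f}")
```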
Residual Plots
Scatterplot of residuals vs. the predicted means of Y, $\hat{Y}$, or vs. an X-variable.
Example 1, continued: $e_4 = +22.12$.
[Figure: WEIGHT vs. HEIGHT with the fitted line $\hat{Y} = -332.7 + 7.189X$, annotated with $e_4 = +22.12$.]
Residuals = distance from point to line, measured parallel to the Y-axis.
Example 1, continued
[Figure: Residual Plot, Residuals vs. HEIGHT.]
$e_4$, the residual for the 4th case, is +22.12.
The regression line from the previous plot is rotated to horizontal.
Residual Plot
Scatterplot of residuals versus the predicted means of Y, $\hat{Y}$, or an X-variable, or time.
Expect random dispersion around a horizontal line at zero.
Problems occur if:
• Unusual patterns
• Unusual cases
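The points of a residual plot are just (X, residual) pairs. Here is a rough text-only sketch for Example 1, assuming the rounded coefficients. Unlike the slides, it lists X down the rows and the residual across the page, with `|` marking zero. The names `residual_points` and `text_plot` are hypothetical.

```python
# Example 1 data: X = height (in), Y = weight (lb)
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]
INTERCEPT, SLOPE = -332.73, 7.189  # rounded prediction equation

def residual_points(xs, ys):
    """Return (x, residual) pairs sorted by x: the points of a residual-vs-X plot."""
    return sorted((x, y - (INTERCEPT + SLOPE * x)) for x, y in zip(xs, ys))

def text_plot(points, scale=2.0, width=15):
    """Print one row per case; '|' marks residual = 0 and '*' marks the residual.

    Each character step is `scale` pounds; residuals beyond `width` steps are clipped.
    """
    for x, e in points:
        offset = max(-width, min(width, round(e / scale)))
        row = [" "] * (2 * width + 1)
        row[width] = "|"
        row[width + offset] = "*"
        print(f"{x:>3} {e:+7.2f} {''.join(row)}")

pts = residual_points(heights, weights)
text_plot(pts)
```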
Residuals versus X
[Figure: residuals vs. X (or time) scattered evenly around 0: a good random pattern.]
Residuals versus X
[Figure: residuals vs. X (or time) with points far from 0: outliers?]
Next step: ________ to determine if a recording error has occurred.
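One way to make "look for outliers" concrete is to compare each residual with the standard error of regression. The sketch below assumes a common, though not universal, rule of thumb: a residual more than 2 standard errors from zero deserves a check. The function `flag_large_residuals` is a hypothetical name. With only five cases, none of the Example 1 residuals is flagged.

```python
import math

heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]
INTERCEPT, SLOPE = -332.73, 7.189  # rounded prediction equation

def flag_large_residuals(xs, ys, cutoff=2.0):
    """Return (flagged 1-based case numbers, s).

    A case is flagged when |residual| / s > cutoff, where s is the standard
    error of regression. The cutoff of 2 is an assumed rule of thumb; flagged
    cases should be checked against the records, not deleted automatically.
    """
    resid = [y - (INTERCEPT + SLOPE * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in resid) / (len(resid) - 2))
    flagged = [i + 1 for i, e in enumerate(resid) if abs(e) / s > cutoff]
    return flagged, s

flagged, s = flag_large_residuals(heights, weights)
print(f"s = {s:.2f}; flagged cases: {flagged}")
```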
Residuals versus X
[Figure: residuals vs. X (or time) showing a nonlinear relationship.]
Next step: Add a “quadratic term,” or use “______.”
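To see why a curved residual pattern points to a missing quadratic term, the sketch below fits a straight line to hypothetical data lying exactly on $y = x^2$. The helper name `least_squares` is illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]  # hypothetical curved data
a, b = least_squares(xs, ys)
resid = [y - (a + b * x) for x, y in zip(xs, ys)]
print(resid)
```

The residuals are positive at both ends and negative in the middle. That U-shape is the signature of curvature that a straight line cannot capture.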
Residuals versus X
[Figure: residuals vs. X (or time) whose spread around 0 grows: variance is increasing.]
Next step: Stabilize variance by using “________.”
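A crude numeric companion to the "variance is increasing" picture is to split the cases at the median X and compare the spread of the residuals in each half. The residuals below are hypothetical and were built to fan out. `spread_by_half` is an illustrative name, and this is only a rough check, not a formal test.

```python
import statistics

def spread_by_half(xs, residuals):
    """Return (SD of residuals in the low-X half, SD in the high-X half).

    Cases are sorted by X; with an odd count the middle case is left out.
    """
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return statistics.stdev(low), statistics.stdev(high)

# Hypothetical residuals that fan out as X grows
xs = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0]
low_sd, high_sd = spread_by_half(xs, resid)
print(f"low-X SD = {low_sd:.2f}, high-X SD = {high_sd:.2f}")
```

Here the high-X half has four times the standard deviation of the low-X half.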
Residual plots help identify:
• Unusual patterns: possible curvature in the data; variances that are not constant as X changes.
• Unusual cases: outliers, high leverage cases, influential cases.
Three properties of residuals, illustrated with some computations.
Y = Weight, X = Height; $\hat{Y} = -332.73 + 7.189X$
Property 1. Find the sum of the residuals: $\sum e_i = .01$, which is round-off error.
Properties of Least Squares Line
1. Residuals always sum to zero: $\sum e_i = 0$.
Y = Weight, X = Height; $\hat{Y} = -332.73 + 7.189X$

X    Y     Ŷ        e = Y - Ŷ    e²
73   175   192.07   -17.07       291.38
68   158   156.12     1.88         3.53
67   140   148.93    -8.93        79.74
72   207   184.88    22.12       489.29
62   115   112.99     2.01         4.04
Sum                    .01       867.98

Property 2. Find the sum of squares of the residuals.
Properties of Least Squares Line
1. Residuals always sum to zero.
2. This “least squares” line produces a smaller “sum of squared residuals” than any other straight line can:
$\sum e_i^2 = SSE = 867.98 <$ “SSE for any other line”.
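Property 2 can be checked numerically by computing SSE for the unrounded least squares line and for a few other lines. The competing lines here are arbitrary, hypothetical choices.

```python
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

def sse(intercept, slope, xs, ys):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Unrounded least squares coefficients
n = len(heights)
x_bar, y_bar = sum(heights) / n, sum(weights) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

best = sse(intercept, slope, heights, weights)

# Hypothetical competing lines: shifted up, steeper, and a round-number line
others = {
    "shifted up 5 lb": sse(intercept + 5, slope, heights, weights),
    "slope + 0.5": sse(intercept, slope + 0.5, heights, weights),
    "y = -350 + 7.4x": sse(-350, 7.4, heights, weights),
}
print(f"least squares SSE = {best:.2f}")
for name, value in others.items():
    print(f"{name}: SSE = {value:.2f}")
```

The unrounded line gives SSE of about 868.04. The table's 867.98 comes from squaring residuals that were already rounded.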
[Figure: WEIGHT vs. HEIGHT with the fitted line passing through the point $(\bar{x}, \bar{y}) = (68.4, 159)$.]
Property 3.
Properties of Least Squares Line
1. Residuals always sum to zero.
2. This “least squares” line produces a smaller “sum of squared residuals” than any other straight line can.
3. The line always passes through the point $(\bar{x}, \bar{y})$.
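Properties 1 and 3 can be checked the same way. With unrounded coefficients, the residuals sum to zero up to floating-point round-off, rather than to the .01 seen with rounded values.

```python
import math

heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

# Unrounded least squares coefficients
n = len(heights)
x_bar, y_bar = sum(heights) / n, sum(weights) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]

# Property 1: residuals sum to zero (up to floating-point round-off)
print(f"sum of residuals = {sum(residuals):.10f}")
# Property 3: the line passes through (x-bar, y-bar)
print(f"line at x-bar = {intercept + slope * x_bar:.4f}, y-bar = {y_bar:.4f}")
```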
Illustration of unusual cases:
Outliers
Leverage
Influential
[Figure: Y vs. X with one outlier near the X-mean.]
“Unusual point” does not follow the pattern. It is near the X-mean, so the entire line is pulled toward it.
[Figure: Y vs. X with one outlier.]
“Unusual point” does not follow the pattern. The line is pulled down and twisted slightly.
[Figure: Y vs. X with one high leverage point.]
“Unusual point” is far from the X-mean, but still follows the pattern.
[Figure: Y vs. X with one point that is both high leverage and an outlier: influential.]
“Unusual point” is far from the X-mean, but does not follow the pattern. The line really twists!
Definitions:
High leverage case: an extreme X value relative to the other X values.
Outlier: an unusual Y value relative to the pattern of the other cases. Usually has a large residual.
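For simple regression, leverage has a closed form that grows with distance from the X-mean:

$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$

The slides do not give this formula. It is the standard one, shown here in a sketch on the Example 1 heights; `leverages` is an illustrative name.

```python
heights = [73, 68, 67, 72, 62]  # Example 1 X values (inches)

def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

h = leverages(heights)
for case, (x, hi) in enumerate(zip(heights, h), start=1):
    print(f"case {case}: X = {x}, leverage = {hi:.3f}")
```

Case 5 (62 inches, the height farthest from the mean of 68.4) has the largest leverage, about 0.73. In simple regression the leverages always add up to 2, the number of estimated coefficients.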
Definitions, continued:
Influential case: has an unusually large effect on the slope of the least squares line.
Definitions, continued
Conclusion:
High leverage → potentially influential.
High leverage & outlier → influential!
The least squares regression line is not resistant to unusual cases.
Why do we care about identifying unusual cases?
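Here is a small numeric illustration of this non-resistance, using hypothetical points on the line y = 2x. One extra case far from the X-mean barely matters if it follows the pattern, but it flips the sign of the slope if it does not.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
_, slope_clean = least_squares(xs, ys)

# High leverage case that follows the pattern: (20, 40)
_, slope_leverage = least_squares(xs + [20], ys + [40])

# High leverage case that does NOT follow the pattern: (20, 0), influential
_, slope_influential = least_squares(xs + [20], ys + [0])

print(f"clean: {slope_clean:.3f}, leverage only: {slope_leverage:.3f}, "
      f"leverage + outlier: {slope_influential:.3f}")
```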
Regression Analysis in Minitab
Lesson Objectives
Learn two ways to use Minitab to run a regression analysis.
Learn how to read output from Minitab.
Can height be predicted using shoe size?
Example 3, continued …
Step 1?
DTDP
Can height be predicted using shoe size? Example 3, continued …
[Figure: scatterplot of Height vs. Shoe Size, Female and Male cases shown separately; “jitter” added in the X-direction. Minitab: Graph > Plot …]
The scatter for each subpopulation is about the same; i.e., there is “constant variance.”
Example 3, continued …
Method 1: Stat > Regression > Regression …  (fits Y = a + bX)
Regression Analysis: Height versus Shoe Size

The regression equation is
Height = 50.5 + 1.87 Shoe Size

Predictor      Coef   SE Coef      T      P
Constant    50.5230    0.5912  85.45  0.000
Shoe Siz    1.87241   0.06033  31.04  0.000

S = 1.947   R-Sq = 79.1%   R-Sq(adj) = 79.0%

Analysis of Variance

Source       DF      SS      MS       F      P
Regression    1  3650.0  3650.0  963.26  0.000
Error       255   966.3     3.8
Total       256  4616.3

Can height be predicted using shoe size? Example 3, continued …
Copied from the Minitab “Session Window.”
Can height be predicted using shoe size? Example 3, continued …
Coef column: the least squares estimated coefficients.
Total “Degrees of Freedom” = number of cases - 1, so this sample has 256 + 1 = 257 cases.
Can height be predicted using shoe size? Example 3, continued …
$R\text{-}Sq = \frac{SSR}{TSS} = \frac{3650.0}{4616.3} = 0.791$, i.e., 79.1%
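The R-Sq arithmetic, plus Minitab's adjusted version, can be reproduced from the ANOVA table. The adjusted formula $1 - \frac{SSE/(n-2)}{TSS/(n-1)}$ is the standard definition and does not appear on the slides.

```python
# Values from the Minitab ANOVA table for Height versus Shoe Size
ss_regression = 3650.0
ss_error = 966.3
ss_total = 4616.3
df_error, df_total = 255, 256

r_sq = ss_regression / ss_total
# Adjusted R-Sq divides each sum of squares by its degrees of freedom
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

This reproduces the 79.1% and 79.0% printed by Minitab.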
Can height be predicted using shoe size? Example 3, continued …
$S = \sqrt{MSE} = \sqrt{966.3/255} \approx 1.947$
S is the standard error of regression: a measure of variation around the regression line.
MS for Error is the Mean Squared Error (MSE = 3.8).
SS for Error = sum of squared residuals (966.3).
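A quick check of S from the Error line of the ANOVA table:

```python
import math

# Error line of the Minitab ANOVA table
ss_error, df_error = 966.3, 255
mse = ss_error / df_error   # Mean Squared Error
s = math.sqrt(mse)          # standard error of regression
print(f"MSE = {mse:.1f}, S = {s:.3f}")
```

The small difference from the S = 1.94659 on the fitted line plot is likely due to SS Error being rounded to 966.3 in the printed table.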
Can height be predicted using shoe size? Example 3, continued …
[Figure: Residuals Versus Shoe Size (response is Height); no “jitter” added.]
Are there any problems visible in this plot? ___________
Can height be predicted using shoe size? Example 3, continued …
Least squares regression equation:
Height = 50.52 + 1.872 Shoe
r-square = 79.1%, Std. error = 1.947 inches
These are the two summary measures that should always be given with the equation.
Method 2: Stat > Regression > Fitted Line Plot …  (fits Y = a + bX)
Can height be predicted using shoe size? Example 3, continued …
This program gives a scatterplot with the regression line superimposed on it.
[Figure: Regression Plot of Height vs. Shoe Size with the fitted line Height = 50.5230 + 1.87241 Shoe Size; S = 1.94659, R-Sq = 79.1 %, R-Sq(adj) = 79.0 %.]
Can height be predicted using shoe size? Example 3, continued …
The fit looks ________.
Can height be predicted using shoe size? Example 3, continued …
What information do the T and P values in the Minitab output provide?
How do you determine if the X-variable is a useful predictor?
Use the “t-statistic” or the F-statistic.
“t” measures how many standard errors the estimated coefficient is from “zero.”
$F = t^2$ for simple regression.
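Both relationships can be checked against the printed output. Small differences in the last digits come from rounding in the printed Coef, SE Coef, and SS values.

```python
# Values from the Minitab output for Height versus Shoe Size
coef, se_coef = 1.87241, 0.06033
ms_regression = 3650.0
ms_error = 966.3 / 255  # unrounded MSE

t = coef / se_coef                   # standard errors between the estimate and zero
f_from_t = t ** 2                    # simple regression: F = t^2
f_anova = ms_regression / ms_error   # F = MS Regression / MS Error
print(f"t = {t:.2f}, t^2 = {f_from_t:.2f}, F = {f_anova:.2f}")
```

Both routes give an F of about 963, matching the printed 963.26 up to rounding.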
How do you determine if the X-variable is a useful predictor?
A “P-value” is associated with “t” and “F”.
The further “t” is from zero, in either direction (equivalently, the larger “F” is), the smaller the corresponding P-value will be.
P-value: the probability of getting a “t” at least this far from zero if the true coefficient WERE ZERO. A small P-value is evidence that the true coefficient is not zero.
If the P-value IS SMALL (typically “< 0.10”), then conclude:
1. It is unlikely that the true coefficient is really zero, and therefore,
2. The X variable IS a useful predictor for the Y variable. Keep the variable!
If the P-value is NOT SMALL (i.e., “> 0.10”), then conclude:
1. For all practical purposes the true coefficient MAY BE ZERO; therefore
2. The X variable IS NOT a useful predictor of the Y variable. Don’t use it.
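A sketch of the decision rule, with one loudly labeled assumption: the P-value is approximated with the standard normal distribution instead of the t distribution. That approximation is reasonable when the error degrees of freedom are large (255 here), but it is not how Minitab computes P. The names `two_sided_p_normal` and `keep_predictor` are hypothetical. The 0.10 cutoff follows the slides.

```python
import math

def two_sided_p_normal(t):
    """Approximate two-sided P-value, P(|Z| >= |t|), using the standard normal.

    Assumption: adequate only for large error degrees of freedom.
    """
    return math.erfc(abs(t) / math.sqrt(2))

def keep_predictor(p_value, alpha=0.10):
    """Slide rule: keep the X variable if its P-value is small (below alpha)."""
    return p_value < alpha

p_shoe = two_sided_p_normal(31.04)
print(f"P = {p_shoe:.3f}; keep Shoe Size? {keep_predictor(p_shoe)}")
```

For contrast, a t of 0.5 would give P of about 0.62, and the rule would say not to use that predictor.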
P-value: the probability of getting a “t” at least this far from zero if the true coefficient were “zero.”
“t” measures how many standard errors the estimated coefficient is from “zero.”
Can height be predicted using shoe size? Example 3, continued …
Could “shoe size” have a true coefficient that is actually “zero”?
The P-value for Shoe Size IS SMALL (< 0.10).
Conclusion: The “shoe size” coefficient is very unlikely to be zero. “Shoe size” IS a useful predictor of the mean of “height”.
The logic just explained
is statistical inference.
This will be covered in more detail during the last three weeks of the course.