Upload
pauline-carroll
View
259
Download
1
Tags:
Embed Size (px)
Citation preview
Statistics
Measures of Regression and Prediction Intervals
Warm-up What value of “r” below best describes the scatterplot below?
a) 0.9
b) -0.9
c) 0.3
d) -0.3
e) 0
Warm-up A least squares regression equation was created from last year’s students data to predict Exam 3 scores based on Exam 1 scores. The equation was:
a) 69.70
b) 79.89
c) 85.53
d) 89.33
e) 90.53
xy 4845.057.50ˆ Predict the score on Exam 3 for a student that scored an 80 on Exam 1.
Objectives
Interpret the three types of variation about a regression line
Find and interpret the coefficient of determination
Find and interpret the standard error of the estimate for a regression line
Variation About a Regression Line
Three types of variation about a regression line Total variation Explained variation Unexplained variation
To find the total variation, you must first calculate The total deviation The explained deviation The unexplained deviation
Variation About a Regression Line
iy yˆiy y
ˆi iy y
(xi, ŷi)
x
y (xi, yi)
(xi, yi)
Unexplained deviation
ˆi iy yTotal
deviationiy y Explained
deviationˆiy y
y
x
Total Deviation = Explained Deviation = Unexplained Deviation =
Total variation The sum of the squares of the differences
between the y-value of each ordered pair and the mean of y.
Explained variation The sum of the squares of the differences
between each predicted y-value and the mean of y.
Variation About a Regression Line
2iy y Total variation
=
Explained variation = 2ˆiy y
Unexplained variation The sum of the squares of the differences
between the y-value of each ordered pair and each corresponding predicted y-value.
Variation About a Regression Line
2ˆi iy y Unexplained variation =
The sum of the explained and unexplained variation is equal to the total variation.
Total variation = Explained variation + Unexplained variation
Coefficient of Determination
Coefficient of determination The ratio of the explained variation to the
total variation. Denoted by r2
2 Explained variationTotal variation
r
Example: Coefficient of Determination
22 (0.913)
0.834
r
About 83.4% of the variation in the company sales can be explained by the variation in the advertising expenditures. About 16.9% of the variation is unexplained.
The correlation coefficient for the advertising expenses and company sales data as calculated is r ≈ 0.913. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation?
Solution:
The Standard Error of Estimate
Standard error of estimate The standard deviation of the observed yi -values about
the predicted ŷ-value for a given xi -value.
Denoted by se.
The closer the observed y-values are to the predicted y-
values, the smaller the standard error of estimate will be.
2( )ˆ2
i ie
y ys
n
n is the number of ordered pairs in the data set
The Standard Error of Estimate
2
, , , ( ), ˆ ˆ( )ˆ
i i i i i
i i
x y y y yy y
1. Make a table that includes the column heading shown.
2. Use the regression equation to calculate the predicted y-values.
3. Calculate the sum of the squares of the differences between each observed y-value and the corresponding predicted y-value.
4. Find the standard error of estimate.
ˆ iy mx b
2 ( )ˆi iy y
2( )ˆ2
i ie
y ys
n
In Words In Symbols
Example: Standard Error of Estimate
The regression equation for the advertising expenses and company sales data is
ŷ = 50.729x + 104.061
Find the standard error of estimate.
Solution:Use a table to calculate the sum of the squared differences of each observed y-value and the corresponding predicted y-value.
Solution: Standard Error of Estimate
x y ŷ i (yi – ŷ i)2
2.4 225 225.81 (225 – 225.81)2 = 0.6561
1.6 184 185.23 (184 – 185.23)2 = 1.5129
2.0 220 205.52 (220 – 205.52)2 = 209.6704
2.6 240 235.96 (240 – 235.96)2 = 16.3216
1.4 180 175.08 (180 – 175.08)2 = 24.2064
1.6 184 185.23 (184 – 185.23)2 = 1.5129
2.0 186 205.52 (186 – 205.52)2 = 381.0304
2.2 215 215.66 (215 – 215.66)2 = 0.4356Σ = 635.3463
unexplained variation
Standard error of Estimate
Related to unexplained variation (residuals) It is a measurement of the variation of the
points about the regression line. se = 0, r = 1 or -1, a perfect relation exists.
The larger the se, the more variability exists and the lower the quality of the relationship.
Solution: Standard Error of Estimate
• n = 8, Σ(yi – ŷ i)2 = 635.3463
2( )ˆ2
i ie
y ys
n
The standard error of estimate of the company sales for a specific advertising expense is about $10.29.
635.3463 10.2908 2
Summary
Interpreted the three types of variation about a regression line
Found and interpreted the coefficient of determination
Found and interpreted the standard error of the estimate for a regression line
Homework
Pg 490-492; # 2-16 even