Chapter 7Simple Linear
Regression
Business Statistic
Models
Models Representation of some phenomenon Mathematical model is a
mathematical expression of some phenomenon
Often describe relationships between variables
Types Deterministic models Probabilistic models
Deterministic Models Hypothesize exact
relationships Suitable when prediction error
is negligible Example: force is exactly
mass times acceleration F = m·a
Probabilistic Models Hypothesize two components
DeterministicRandom error
Example: sales volume (y) is 10 times advertising spending (x) + random error
y = 10x + Random error may be due to factors
other than advertising
Types of Probabilistic Models
ProbabilisticModels
Regression Models
Correlation Models
Regression Models
Types of Probabilistic Models
ProbabilisticModels
Regression Models
Correlation Models
Regression Models• Answers ‘What is the relationship between the
variables?’• Equation used
– One numerical dependent (response) variable What is to be predicted
– One or more numerical or categorical independent (explanatory) variables
• Used mainly for prediction and estimation
Regression Modeling Steps
1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of
random error term• Estimate standard deviation of error
4. Evaluate model5. Use model for prediction and estimation
Model Specification
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters3. Specify probability distribution of
random error term• Estimate standard deviation of error
4. Evaluate model5. Use model for prediction and estimation
Specifying the Model
1. Define variables• Conceptual (e.g., Advertising, price)• Empirical (e.g., List price, regular price) • Measurement (e.g., $, Units)
2. Hypothesize nature of relationship• Expected effects (i.e., Coefficients’ signs)• Functional form (linear or non-linear)• Interactions
Model Specification Is Based on Theory
Theory of field (e.g., Sociology)
Mathematical theory Previous research ‘Common sense’
Advertising
Sales
Advertising
Sales
Advertising
Sales
Advertising
Sales
Thinking Challenge: Which Is More Logical?
Simple
1 ExplanatoryVariable
Types of Regression Models
RegressionModels
2+ ExplanatoryVariables
Multiple
Linear Non-Linear Linear Non-
Linear
Linear Regression Model
Types of Regression Models
Simple
1 ExplanatoryVariable
RegressionModels
2+ ExplanatoryVariables
Multiple
Linear Non-Linear Linear Non-
Linear
y x 0 1
Linear Regression Model
Relationship between variables is a linear function
Dependent (Response) Variable
Independent (Explanatory) Variable
Population Slope
Population y-intercept
Random Error
y
E(y) = β0 + β1x (line of means)
β0 = y-interceptx
Change in y
Change in xβ1 = Slope
Line of Means
Population & Sample Regression Models
Population
$ $
$
$ $
Unknown Relationship
0 1y x
Random Sample
$$ $$
$$ $$
0 1ˆ ˆ ˆy x
y
x
Population Linear Regression Model
0 1i i iy x
0 1E y x
Observedvalue
Observed value
i = Random error
y
x
0 1ˆ ˆ ˆi i iy x
Sample Linear Regression Model
0 1ˆ ˆˆi iy x
Unsampled observation
i = Random error
Observed value
^
Estimating Parameters:Least Squares Method
Regression Modeling Steps
1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of random
error term• Estimate standard deviation of error
4. Evaluate model5. Use model for prediction and estimation
Scattergram1. Plot of all (xi, yi) pairs2. Suggests how well model will fit
0204060
0 20 40 60
x
y
0204060
0 20 40 60x
y
Thinking Challenge
• How would you draw a line through the points?• How do you determine which line ‘fits best’?
Least Squares ‘Best fit’ means difference
between actual y values and predicted y values are a minimum
But positive differences off-set negative 2 2
1 1
ˆ ˆn n
ii ii i
y y
• Least Squares minimizes the Sum of the Squared Differences (SSE)
Least Squares Graphically
2
y
x
1 3
4
^^
^^
2 0 1 2 2ˆ ˆ ˆy x
0 1ˆ ˆˆi iy x
2 2 2 2 21 2 3 4
1
ˆ ˆ ˆ ˆ ˆLS minimizes n
ii
Coefficient Equations
1 1
11 2
12
1
ˆ
n n
i ini i
i ixy i
nxx
ini
ii
x yx ySS n
SSx
xn
Slope
y-intercept
0 1ˆ ˆy x Prediction Equation
0 1ˆ ˆy x
Computation Table
xi2
x12
x22
: :xn
2
Σxi2
yi
y1
y2
:yn
Σyi
xi
x1
x2
:xn
Σxi
2
2
2
2
2
yi
y1
y2
:yn
Σyi
xiyi
Σxiyi
x1y1
x2y2
xnyn
Interpretation of Coefficients
^
^^1. Slope (1)
• Estimated y changes by 1 for each 1unit increase in x— If 1 = 2, then Sales (y) is expected to increase by 2
for each 1 unit increase in Advertising (x)
2. Y-Intercept (0)• Average value of y when x = 0
— If 0 = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0
^
^
Least Squares ExampleYou’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ Sales (Units)1 12 13 24 25 4
Find the least squares line relatingsales and advertising.
01234
0 1 2 3 4 5
Scattergram Sales vs. Advertising
Sales
Advertising
Parameter Estimation Solution Table
2 2
1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20
15 10 55 26 37
xi yi xi yi xiyi
Parameter Estimation Solution
1 1
11 2 2
12
1
15 1037
5ˆ .7015
555
n n
i ini i
i ii
n
ini
ii
x yx y
n
xx
n
ˆ .1 .7y x
0 1ˆ ˆ 2 .70 3 .10y x
Parameter Estimation Computer Output
Parameter Estimates
Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354
0^
1^
ˆ .1 .7y x
Coefficient Interpretation Solution1. Slope (1)
• Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)
^
2. Y-Intercept (0)• Average value of Sales Volume (y) is -.10 units
when Advertising (x) is 0— Difficult to explain to marketing manager— Expect some sales without advertising
^
01234
0 1 2 3 4 5
Regression Line Fitted to the Data
Sales
Advertising
ˆ .1 .7y x
Least Squares Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:Fertilizer (lb.)Yield (lb.)
4 3.0 6 5.510 6.512 9.0
Find the least squares line relatingcrop yield and fertilizer.
02468
10
0 5 10 15
Scattergram Crop Yield vs. Fertilizer*
Yield (lb.)
Fertilizer (lb.)
Parameter Estimation Solution Table*
4 3.0 16 9.00 12 6 5.5 36 30.25 3310 6.5 100 42.25 6512 9.0 144 81.00 10832 24.0 296 162.50 218
xi2yi xi yi xiyi
2
Parameter Estimation Solution*
1 1
11 2 2
12
1
32 24218
4ˆ .6532
2964
n n
i ini i
i ii
n
ini
ii
x yx y
n
xx
n
0 1ˆ ˆ 6 .65 8 .80y x
ˆ .8 .65y x
Coefficient Interpretation Solution*
2. Y-Intercept (0)• Average Crop Yield (y) is expected to be 0.8 lb.
when no Fertilizer (x) is used
^
^1. Slope (1)• Crop Yield (y) is expected to increase by .65 lb.
for each 1 lb. increase in Fertilizer (x)
Regression Line Fitted to the Data*
02468
10
0 5 10 15
Yield (lb.)
Fertilizer (lb.)
ˆ .8 .65y x
Probability Distribution of Random Error
Regression Modeling Steps
1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of
random error term• Estimate standard deviation of error
4. Evaluate model5. Use model for prediction and estimation
Linear Regression Assumptions
1. Mean of probability distribution of error, ε, is 0
2. Probability distribution of error has constant variance
3. Probability distribution of error, ε, is normal
4. Errors are independent
Error Probability Distribution
x1 x2 x3
yE(y) = β0 + β1x
x
Random Error Variation
• Measured by standard error of regression model– Sample standard deviation of : s^
• Affects several factors– Parameter significance– Prediction accuracy
^• Variation of actual y from predicted y, y
y
xxi
Variation Measures
0 1ˆ ˆˆi iy x
yi 2ˆ( )i iy y
Unexplained sum of squares
2( )iy yTotal sum of squares
2ˆ( )iy yExplained sum of squares
y
Estimation of σ2
22 ˆ2 i i
SSEs where SSE y yn
2
2SSEs sn
Calculating SSE, s2, s ExampleYou’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ Sales (Units)1 12 13 24 25 4
Find SSE, s2, and s.
Calculating SSE Solution
1 1
2 1
3 2
4 2
5 4
xi yi
.6
1.3
2
2.7
3.4
ˆy y
.4
-.3
0
-.7
.6
2ˆ( )y yˆ .1 .7y x
.16
.09
0
.49
.36
SSE=1.1
Calculating s2 and s Solution
2 1.1 .366672 5 2
SSEsn
.36667 .6055s
Testing for Significance
Evaluating the Model
Regression Modeling Steps
1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of
random error term• Estimate standard deviation of error
4. Evaluate model5. Use model for prediction and estimation
Test of Slope Coefficient Shows if there is a linear relationship
between x and y Involves population slope 1
Hypotheses H0: 1 = 0 (No Linear Relationship) Ha: 1 0 (Linear Relationship)
Theoretical basis is sampling distribution of slope
y
Population Linex
Sample 1 Line
Sample 2 Line
Sampling Distribution of Sample Slopes
1
Sampling Distribution
11
11SS
^
^
All Possible Sample Slopes
Sample 1: 2.5Sample 2: 1.6 Sample 3: 1.8Sample 4: 2.1 : :
Very large number of sample slopes
Slope Coefficient Test Statistic
1
1 1
ˆ
2
12
1
ˆ ˆ2
wherexx
n
ini
xx ii
t df nsSSS
xSS x
n
Test of Slope Coefficient Example
You’re a marketing analyst for Hasbro Toys. You find β0 = –.1, β1 = .7 and s = .6055.
Ad $ Sales (Units)1 12 13 24 25 4
Is the relationship significant at the .05 level of significance?
^^
Test of Slope Coefficient Solution
H0:Ha: df Critical
Value(s):
Test Statistic:
Decision:
Conclusion:
t0 3.182-3.182
.025
Reject H0 Reject H0
.025
1 = 01 0.055 - 2 = 3
Solution Table
xi yi xi2 yi
2 xiyi
1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20
15 10 55 26 37
Test StatisticSolution
1
1
ˆ 2
1
ˆ
.6055 .191415
555
ˆ .70 3.657.1914
xx
SSSS
tS
Test of Slope Coefficient Solution
H0: 1 = 0Ha: 1 0 .05df 5 - 2 = 3Critical
Value(s):
Test Statistic:
Decision:
Conclusion:
1
1
ˆ
ˆ .70 3.657.1914
tS
Reject at = .05
There is evidence of a relationshipt0 3.182-3.182
.025
Reject H0 Reject H0
.025
Test of Slope CoefficientComputer Output
Parameter Estimates Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354
t = 1 / S
P-Value
S1 1 1
^^^^
Correlation Models
Types of Probabilistic Models
ProbabilisticModels
Regression Models
Correlation Models
Correlation Models Answers ‘How strong is the linear
relationship between two variables?’ Coefficient of correlation
Sample correlation coefficient denoted r
Values range from –1 to +1 Measures degree of association Does not indicate cause–effect
relationship
Coefficient of Correlation
xy
xx yy
SSr
SS SS
2
2
2
2
xx
yy
xy
xSS x
n
ySS y
nx y
SS xyn
where
Coefficient of Correlation Values
–1.0 +1.00–.5 +.5
No Linear Correlation
Perfect Negative
Correlation
Perfect Positive
Correlation
Increasing degree of positive correlation
Increasing degree of negative correlation
Coefficient of Correlation Example
You’re a marketing analyst for Hasbro Toys. Ad $ Sales (Units)
1 12 13 24 25 4
Calculate the coefficient ofcorrelation.
Solution Table
xi yi xi2 yi
2 xiyi
1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20
15 10 55 26 37
Coefficient of Correlation Solution
22
2
22
2
(15)55 105
(10)26 65
(15)(10)37 75
xx
yy
xy
xSS x
n
ySS y
nx y
SS xyn
7 .90410 6
xy
xx yy
SSr
SS SS
Coefficient of Correlation Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:Fertilizer (lb.)Yield (lb.)
4 3.0 6 5.510 6.512 9.0
Find the coefficient of correlation.
Solution Table*
4 3.0 16 9.00 12 6 5.5 36 30.25 3310 6.5 100 42.25 6512 9.0 144 81.00 10832 24.0 296 162.50 218
xi2yi xi yi xiyi
2
Coefficient of Correlation Solution*
22
2
22
2
(32)296 404
(24)162.5 18.54
(32)(24)218 264
xx
yy
xy
xSS x
n
ySS y
nx y
SS xyn
26 .95640 18.5
xy
xx yy
SSr
SS SS
Coefficient of DeterminationProportion of variation ‘explained’ by relationship between x and y
2 Explained VariationTotal Variation
yy
yy
SS SSEr
SS
0 r2 1r2 = (coefficient of correlation)2
Coefficient of Determination Example
You’re a marketing analyst for Hasbro Toys. You know r = .904.
Ad $ Sales (Units)1 12 13 24 25 4
Calculate and interpret thecoefficient of determination.
Coefficient of Determination Solution
r2 = (coefficient of correlation)2
r2 = (.904)2
r2 = .817
Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.
r2 Computer Output
Root MSE 0.60553 R-square 0.8167 Dep Mean 2.00000 Adj R-sq 0.7556 C.V. 30.27650
r2 adjusted for number of explanatory variables & sample size
r2
Using the Model for Prediction & Estimation
Regression Modeling Steps
1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of random
error term• Estimate standard deviation of error
4. Evaluate model5. Use model for prediction and
estimation
Prediction With Regression Models
Types of predictions Point estimates Interval estimates
What is predicted Population mean response E(y)
for given x Point on population regression line
Individual response (yi) for given x
What Is Predicted
Mean y, E(y)
y
^yIndividual
Prediction, y
E(y) = x
^x
xP
^ ^y i =
x
Confidence Interval Estimate for Mean Value of y at x = xp
xx
p
SSxx
nSty
2
2/1ˆ
df = n – 2
Factors Affecting Interval Width
1. Level of confidence (1 – )• Width increases as confidence increases
2. Data dispersion (s)• Width increases as variation increases
3. Sample size• Width decreases as sample size
increases4. Distance of xp from meanx
• Width increases as distance increases
Why Distance from Mean?
Sample 2 Line
y
xx1 x2
ySample 1 Line
Greater dispersion than x1
x
Confidence Interval Estimate Example
You’re a marketing analyst for Hasbro Toys.You find β0 = -.1, β 1 = .7 and s = .6055.
Ad $ Sales (Units)1 12 13 24 25 4
Find a 95% confidence interval forthe mean sales when advertising is $4.
^^
Solution Table
x i y i x i2 y i
2 x iy i
1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 2015 10 55 26 37
Confidence Interval Estimate Solution
2
/ 2
2
1ˆ
ˆ .1 .7 4 2.7
4 312.7 3.182 .60555 10
1.645 ( ) 3.755
p
xx
x xy t s
n SS
y
E Y
x to be predicted
Prediction Interval of Individual Value of y at x = xp
2
/ 21ˆ 1 p
xx
x xy t S
n SS
Note!
df = n – 2
Why the Extra ‘S’?
Expected(Mean) y
yy we're trying topredict
Prediction, y
xxp
E(y) = x
^^ ^
y i = x i
Prediction Interval ExampleYou’re a marketing analyst for Hasbro Toys.You find β0 = -.1, β 1 = .7 and s = .6055.
Ad $ Sales (Units)1 12 13 24 25 4
Predict the sales when advertising is $4. Use a 95% prediction interval.
^^
Solution Table
x i y i x i2 y i
2 x iy i
1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 2015 10 55 26 37
Prediction Interval Solution
2
/ 2
2
4
1ˆ 1
ˆ .1 .7 4 2.7
4 312.7 3.182 .6055 15 10
.503 4.897
p
xx
x xy t s
n SS
y
y
x to be predicted
Interval Estimate Computer Output
Dep Var Pred Std Err Low95% Upp95% Low95% Upp95%Obs SALES Value Predict Mean Mean Predict Predict 1 1.000 0.600 0.469 -0.892 2.092 -1.837 3.037 2 1.000 1.300 0.332 0.244 2.355 -0.897 3.497 3 2.000 2.000 0.271 1.138 2.861 -0.111 4.111 4 2.000 2.700 0.332 1.644 3.755 0.502 4.897 5 4.000 3.400 0.469 1.907 4.892 0.962 5.837
Predicted y when x = 4
Confidence Interval
SYPrediction
Interval
Confidence Intervals versus Prediction Intervals
x
y
x
^ ^ ^y i =
x i