Chapter 10
Simple Regression
Null Hypothesis
The analysis of business and economic processes makes extensive use of relationships between variables.
$$Y = f(X)$$
Correlation Analysis
The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables.
Correlation Analysis
The sample correlation coefficient:
$$r = \frac{s_{xy}}{s_x s_y}$$
where:
$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$
Correlation Analysis
The null hypothesis of no linear association,
$$H_0: \rho = 0$$
is tested using the random variable
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
which, when $H_0$ is true, follows a Student's t distribution with (n - 2) degrees of freedom.
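As a minimal sketch of the two formulas above, the following Python computes the sample correlation coefficient and the corresponding t statistic. The five data pairs and the function names are illustrative assumptions and do not come from the chapter.

```python
import math

def sample_correlation(x, y):
    """Return r = s_xy / (s_x * s_y) for paired samples x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

def correlation_t_statistic(r, n):
    """Return t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom under H0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)
t = correlation_t_statistic(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f}")  # r = 0.7746, t = 2.1213
```

With only five observations the statistic has 3 degrees of freedom, so even a fairly large r produces a modest t value.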
Tests for Zero Population Correlation
Let r be the sample correlation coefficient, calculated from a random sample of n pairs of observations from a joint normal distribution. The following tests of the null hypothesis
$$H_0: \rho = 0$$
have significance level $\alpha$:
1. To test $H_0$ against the alternative
$$H_1: \rho > 0$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} > t_{n-2,\alpha}$$
Tests for Zero Population Correlation (continued)
2. To test $H_0$ against the alternative
$$H_1: \rho < 0$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} < -t_{n-2,\alpha}$$
Tests for Zero Population Correlation (continued)
3. To test $H_0$ against the two-sided alternative
$$H_1: \rho \neq 0$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} < -t_{n-2,\alpha/2} \text{ or } \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} > t_{n-2,\alpha/2}$$
Here, $t_{n-2,\alpha}$ is the number for which
$$P(t_{n-2} > t_{n-2,\alpha}) = \alpha$$
where the random variable $t_{n-2}$ follows a Student's t distribution with (n - 2) degrees of freedom.
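The three decision rules can be sketched in a few lines of Python. The inputs r = 0.60 and n = 27 are hypothetical, the function names are assumptions made for illustration, and the critical values for 25 degrees of freedom (1.708 and 2.060) are read from a standard Student's t table rather than computed.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def zero_correlation_decisions(t, t_alpha, t_half_alpha):
    """Return True where H0: rho = 0 is rejected under each alternative.

    t_alpha and t_half_alpha are the upper-tail critical values
    t(n-2, alpha) and t(n-2, alpha/2), supplied from a t table.
    """
    return {
        "upper": t > t_alpha,                                # H1: rho > 0
        "lower": t < -t_alpha,                               # H1: rho < 0
        "two_sided": t < -t_half_alpha or t > t_half_alpha,  # H1: rho != 0
    }

# Hypothetical sample: r = 0.60 from n = 27 pairs, tested at alpha = 0.05.
t = correlation_t_statistic(0.60, 27)
decisions = zero_correlation_decisions(t, t_alpha=1.708, t_half_alpha=2.060)
print(f"t = {t:.2f}")  # t = 3.75
print(decisions)       # {'upper': True, 'lower': False, 'two_sided': True}
```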
Linear Regression Model (Example 10.2)
Year   Income (x)   Retail Sales (y)
1      9098         5492
2      9138         5540
3      9094         5305
4      9282         5507
5      9229         5418
6      9347         5320
7      9525         5538
8      9756         5692
9      10282        5871
10     10662        6157
11     11019        6342
12     11307        5907
13     11432        6124
14     11449        6186
15     11697        6224
16     11871        6496
17     12018        6718
18     12523        6921
19     12053        6471
20     12088        6394
21     12215        6555
22     12494        6755
Linear Regression Model (Figure 10.1)
[Figure 10.1: scatter plot of Retail Sales per Household vs Per Capita Disposable Income, with Income on the horizontal axis and Retail Sales on the vertical axis. Fitted line: y = 0.3815x + 1922.4, $R^2$ = 0.9192.]
Linear Regression Model
LINEAR REGRESSION POPULATION EQUATION MODEL
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where $\beta_0$ and $\beta_1$ are the population model coefficients and $\varepsilon_i$ is a random error term.
Linear Regression Outcomes
Linear regression provides two important results:
1. Predicted values of the dependent or endogenous variable as a function of an independent or exogenous variable.
2. Estimated marginal change in the endogenous variable that results from a one unit change in the independent or exogenous variable.
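Both outcomes can be illustrated with the fitted line printed on Figure 10.1. This is only a sketch: the coefficients are the rounded values shown on the figure, while the income level of 10000 and the function name are assumptions made for the example.

```python
# Coefficients as printed on Figure 10.1 (rounded in the source).
B0 = 1922.4
B1 = 0.3815

def predict_retail_sales(income):
    """Predicted retail sales for a given income, using the fitted line."""
    return B0 + B1 * income

income = 10000  # hypothetical income level

# Outcome 1: predicted value of the dependent variable.
predicted = predict_retail_sales(income)

# Outcome 2: estimated change in the prediction for a one-unit change in income.
marginal = predict_retail_sales(income + 1) - predicted

print(f"{predicted:.1f}")  # 5737.4
print(f"{marginal:.4f}")   # 0.3815
```

The marginal change equals the slope at every income level, because the model is linear.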
Least Squares Procedure
The least-squares procedure obtains estimates of the linear equation coefficients $b_0$ and $b_1$ in the model
$$\hat{y}_i = b_0 + b_1 x_i$$
by minimizing the sum of the squared residuals $e_i$:
$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$
This results in a procedure stated as:
Choose $b_0$ and $b_1$ so that the quantity
$$SSE = \sum e_i^2 = \sum \left(y_i - (b_0 + b_1 x_i)\right)^2$$
is minimized. We use differential calculus to obtain the coefficient estimators that minimize SSE.
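The calculus step can be sketched as follows. This is the standard derivation, written in the notation of the slides and not quoted from them: set both partial derivatives of SSE to zero and solve.

```latex
\begin{aligned}
\frac{\partial SSE}{\partial b_0}
  &= -2\sum_{i=1}^{n}\bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  &&\Rightarrow\quad b_0 = \bar{Y} - b_1\bar{X}, \\
\frac{\partial SSE}{\partial b_1}
  &= -2\sum_{i=1}^{n} x_i\bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  &&\Rightarrow\quad b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{X})(y_i - \bar{Y})}{\sum_{i=1}^{n}(x_i - \bar{X})^2}.
\end{aligned}
```

The second result follows after substituting $b_0 = \bar{Y} - b_1\bar{X}$ into the second condition and using the fact that deviations from a mean sum to zero.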
Least-Squares Derived Coefficient Estimators
The slope coefficient estimator is
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{X})^2} = r_{xy}\frac{s_Y}{s_X}$$
and the constant or intercept estimator is
$$b_0 = \bar{Y} - b_1 \bar{X}$$
We also note that the regression line always goes through the point of means $(\bar{X}, \bar{Y})$.
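As a check on these estimators, the following Python sketch applies them to the Example 10.2 data. The variable and function names are assumptions made for illustration; the data are the 22 (income, retail sales) pairs from the example.

```python
income = [
    9098, 9138, 9094, 9282, 9229, 9347, 9525, 9756, 10282, 10662, 11019,
    11307, 11432, 11449, 11697, 11871, 12018, 12523, 12053, 12088, 12215, 12494,
]
sales = [
    5492, 5540, 5305, 5507, 5418, 5320, 5538, 5692, 5871, 6157, 6342,
    5907, 6124, 6186, 6224, 6496, 6718, 6921, 6471, 6394, 6555, 6755,
]

def least_squares(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(income, sales)
print(f"b0 = {b0:.1f}, b1 = {b1:.4f}")  # b0 = 1922.4, b1 = 0.3815

# The fitted line passes through the point of means.
x_bar = sum(income) / len(income)
y_bar = sum(sales) / len(sales)
print(abs((b0 + b1 * x_bar) - y_bar) < 1e-6)  # True
```

The result agrees with the fitted line y = 0.3815x + 1922.4 shown on Figure 10.1.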
Standard Assumptions for the Linear Regression Model
The following assumptions are used to make inferences about the population linear model by using the estimated coefficients:
1. The x's are fixed numbers, or they are realizations of a random variable X that are independent of the error terms $\varepsilon_i$. In the latter case, inference is carried out conditionally on the observed values of the x's.
2. The error terms are random variables with mean 0 and the same variance $\sigma^2$. The latter is called homoscedasticity or uniform variance:
$$E[\varepsilon_i] = 0 \text{ and } E[\varepsilon_i^2] = \sigma^2 \quad \text{for } i = 1, \ldots, n$$
3. The random error terms $\varepsilon_i$ are not correlated with one another, so that
$$E[\varepsilon_i \varepsilon_j] = 0 \quad \text{for all } i \neq j$$
Regression Analysis for Retail Sales Analysis (Figure 10.5)
            Coefficients   Standard Error   t Stat        P-value
Intercept   1922.392694    274.9493737      6.99180605    8.74464E-07
X Income    0.38151672     0.025293061      15.08384918   2.17134E-12
The regression equation is
Y Retail Sales = 1922 + 0.382 X Income
where $b_0$ = 1922 and $b_1$ = 0.382.
Analysis of Variance
The total variability in a regression analysis, SST, can be partitioned into a component explained by the regression, SSR, and a component due to unexplained error, SSE:
$$SST = SSR + SSE$$
with the components defined as follows.
Total sum of squares:
$$SST = \sum_{i=1}^{n} (y_i - \bar{Y})^2$$
Error sum of squares:
$$SSE = \sum_{i=1}^{n} \left(y_i - (b_0 + b_1 x_i)\right)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
Regression sum of squares:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{Y})^2 = b_1^2 \sum_{i=1}^{n} (x_i - \bar{X})^2$$
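The partition can be verified numerically. The sketch below, with illustrative variable names, fits the line to the Example 10.2 data and computes the three sums of squares directly from their definitions.

```python
income = [
    9098, 9138, 9094, 9282, 9229, 9347, 9525, 9756, 10282, 10662, 11019,
    11307, 11432, 11449, 11697, 11871, 12018, 12523, 12053, 12088, 12215, 12494,
]
sales = [
    5492, 5540, 5305, 5507, 5418, 5320, 5538, 5692, 5871, 6157, 6342,
    5907, 6124, 6186, 6224, 6496, 6718, 6921, 6471, 6394, 6555, 6755,
]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(sales) / n

# Least-squares coefficients and fitted values.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, sales))
s_xx = sum((x - x_bar) ** 2 for x in income)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in income]

# Sums of squares from their definitions.
sst = sum((y - y_bar) ** 2 for y in sales)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, fitted))
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(abs(sst - (ssr + sse)) < 1e-3)  # True: SST = SSR + SSE
```

The three values should agree, up to rounding, with the SS column of the analysis of variance table in Figure 10.7.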
Regression Analysis for Retail Sales Analysis (Figure 10.7)
The regression equation is
Y Retail Sales = 1922 + 0.382 X Income
Analysis of Variance
             df   SS            MS            F            Significance F
Regression   1    4961434.406   4961434.406   227.522506   2.17134E-12
Residual     20   436126.9127   21806.34563
Total        21   5397561.318
Coefficient of Determination, $R^2$
The Coefficient of Determination for a regression equation is defined as
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
This quantity varies from 0 to 1, and higher values indicate a better regression. Caution should be used in making general interpretations of $R^2$ because a high value can result from either a small SSE or a large SST or both.
Correlation and $R^2$
The multiple coefficient of determination, $R^2$, for a simple regression is equal to the simple correlation squared:
$$R^2 = r_{xy}^2$$
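A short sketch, using the sums of squares reported in Figure 10.7, shows both forms of the definition and recovers the correlation from $R^2$. The variable names are assumptions; taking the positive square root relies on the fitted slope being positive.

```python
import math

# Sums of squares as reported in the analysis of variance table (Figure 10.7).
SSR = 4961434.406
SSE = 436126.9127
SST = 5397561.318

r_squared = SSR / SST
r_squared_alt = 1 - SSE / SST

# The fitted slope (0.382) is positive, so r_xy is the positive square root.
r_xy = math.sqrt(r_squared)

print(f"R^2 = {r_squared:.4f}, r = {r_xy:.4f}")  # R^2 = 0.9192, r = 0.9587
```

The value 0.9192 matches the $R^2$ printed on Figure 10.1.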
Estimation of Model Error Variance
The quantity SSE is a measure of the total squared deviation about the estimated regression line, and $e_i$ is the residual. An estimator for the variance of the population model error is
$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{SSE}{n - 2}$$
Division by n - 2 instead of n - 1 results because the simple regression model uses two estimated parameters, $b_0$ and $b_1$, instead of one.
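For the retail sales example this estimator can be evaluated directly from the residual sum of squares in Figure 10.7. The sketch below uses assumed variable names and the reported values SSE = 436126.9127 and n = 22.

```python
import math

SSE = 436126.9127  # residual sum of squares from Figure 10.7
n = 22             # number of observations in Example 10.2

s_e_squared = SSE / (n - 2)   # estimated model error variance
s_e = math.sqrt(s_e_squared)  # standard error of the estimate

print(f"s_e^2 = {s_e_squared:.2f}, s_e = {s_e:.2f}")  # s_e^2 = 21806.35, s_e = 147.67
```

The variance estimate is the Residual entry in the MS column of the same table.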
Sampling Distribution of the Least Squares Coefficient Estimator
If the standard least squares assumptions hold, then $b_1$ is an unbiased estimator of $\beta_1$ and has a population variance
$$\sigma_{b_1}^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{X})^2} = \frac{\sigma^2}{(n - 1)s_X^2}$$
and an unbiased sample variance estimator
$$s_{b_1}^2 = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{X})^2} = \frac{s_e^2}{(n - 1)s_X^2}$$
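The sample variance estimator can be evaluated for the retail sales example as a rough check. The sketch below assumes the Income column of Example 10.2 and the residual mean square 21806.34563 from Figure 10.7 as the value of $s_e^2$; the variable names are illustrative.

```python
import math

income = [
    9098, 9138, 9094, 9282, 9229, 9347, 9525, 9756, 10282, 10662, 11019,
    11307, 11432, 11449, 11697, 11871, 12018, 12523, 12053, 12088, 12215, 12494,
]
s_e_squared = 21806.34563  # residual mean square from Figure 10.7

n = len(income)
x_bar = sum(income) / n
s_xx = sum((x - x_bar) ** 2 for x in income)  # sum of squared deviations of x
s_x_squared = s_xx / (n - 1)                  # sample variance of x

# Both forms of the estimator give the same standard error of the slope.
s_b1 = math.sqrt(s_e_squared / s_xx)
s_b1_alt = math.sqrt(s_e_squared / ((n - 1) * s_x_squared))

print(f"s_b1 = {s_b1:.6f}")  # s_b1 = 0.025293
```

This reproduces the Standard Error reported for X Income in Figure 10.5.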
Basis for Inference About the Population Regression Slope
Let $\beta_1$ be a population regression slope and $b_1$ its least squares estimate based on n pairs of sample observations. Then, if the standard regression assumptions hold and it can also be assumed that the errors $\varepsilon_i$ are normally distributed, the random variable
$$t = \frac{b_1 - \beta_1}{s_{b_1}}$$
is distributed as Student's t with (n - 2) degrees of freedom. In addition, the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal distributions and large sample sizes, n.
Excel Output for Retail Sales Model (Figure 10.9)
Regression Statistics
Multiple R          0.958748803
R Square            0.919199267
Adjusted R Square   0.91515923
Standard Error      147.6697181
Observations        22
Analysis of Variance
             df   SS            MS            F            Significance F
Regression   1    4961434.406   4961434.406   227.522506   2.17134E-12
Residual     20   436126.9127   21806.34563
Total        21   5397561.318
            Coefficients   Standard Error   t Stat        P-value       Lower 95%
Intercept   1922.392694    274.9493737      6.99180605    8.74464E-07   1348.858617
X Income    0.38151672     0.025293061      15.08384918   2.17134E-12   0.328756343
The regression equation is
Y Retail Sales = 1922 + 0.382 X Income
Callouts in the figure: SSR, SSE, and SST label the SS column; MSR and MSE label the MS column; $b_0$ and $b_1$ label the coefficients; $s_{b_1}$ and $t_{b_1}$ label the Standard Error and t Stat for X Income; $s_e$ labels the Standard Error in the regression statistics.
Tests of the Population Regression Slope
If the regression errors $\varepsilon_i$ are normally distributed and the standard least squares assumptions hold (or if the distribution of $b_1$ is approximately normal), the following tests have significance level $\alpha$:
1. To test either null hypothesis
$$H_0: \beta_1 = \beta_1^* \quad \text{or} \quad H_0: \beta_1 \leq \beta_1^*$$
against the alternative
$$H_1: \beta_1 > \beta_1^*$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} > t_{n-2,\alpha}$$
Tests of the Population Regression Slope (continued)
2. To test either null hypothesis
$$H_0: \beta_1 = \beta_1^* \quad \text{or} \quad H_0: \beta_1 \geq \beta_1^*$$
against the alternative
$$H_1: \beta_1 < \beta_1^*$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} < -t_{n-2,\alpha}$$
Tests of the Population Regression Slope (continued)
3. To test the null hypothesis
$$H_0: \beta_1 = \beta_1^*$$
against the two-sided alternative
$$H_1: \beta_1 \neq \beta_1^*$$
the decision rule is
$$\text{Reject } H_0 \text{ if } \frac{b_1 - \beta_1^*}{s_{b_1}} < -t_{n-2,\alpha/2} \text{ or } \frac{b_1 - \beta_1^*}{s_{b_1}} > t_{n-2,\alpha/2}$$
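Applied to the retail sales model, the three rules can be sketched as below for the hypothesized value $\beta_1^* = 0$. The estimates come from Figure 10.5; the function names are assumptions, and the critical values for 20 degrees of freedom at $\alpha$ = 0.05 (1.725 and 2.086) are taken from a standard Student's t table.

```python
def slope_t_statistic(b1, beta1_star, s_b1):
    """t = (b1 - beta1*) / s_b1, with n - 2 degrees of freedom."""
    return (b1 - beta1_star) / s_b1

def slope_test_decisions(t, t_alpha, t_half_alpha):
    """Return True where the null hypothesis is rejected under each alternative."""
    return {
        "upper": t > t_alpha,                                # H1: beta1 > beta1*
        "lower": t < -t_alpha,                               # H1: beta1 < beta1*
        "two_sided": t < -t_half_alpha or t > t_half_alpha,  # H1: beta1 != beta1*
    }

# Slope estimate and its standard error from Figure 10.5.
b1 = 0.38151672
s_b1 = 0.025293061

t = slope_t_statistic(b1, beta1_star=0.0, s_b1=s_b1)
decisions = slope_test_decisions(t, t_alpha=1.725, t_half_alpha=2.086)
print(f"t = {t:.2f}")  # t = 15.08
print(decisions)       # {'upper': True, 'lower': False, 'two_sided': True}
```

The computed statistic matches the t Stat reported for X Income, and it is far beyond either critical value.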
Confidence Intervals for the Population Regression Slope $\beta_1$
If the regression errors $\varepsilon_i$ are normally distributed and the standard regression assumptions hold, a $100(1 - \alpha)\%$ confidence interval for the population regression slope $\beta_1$ is given by
$$b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}$$
where $t_{n-2,\alpha/2}$ is the number for which
$$P(t_{n-2} > t_{n-2,\alpha/2}) = \alpha/2$$
and the random variable $t_{n-2}$ follows a Student's t distribution with (n - 2) degrees of freedom.
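A 95% interval for the retail sales slope can be sketched as follows. The estimates are those reported in the Excel output; the critical value 2.086 for 20 degrees of freedom is a rounded table value, so the limits agree with the Excel output only to about four decimal places.

```python
# Slope estimate and its standard error from the Excel output.
b1 = 0.38151672
s_b1 = 0.025293061

# Rounded table value of t(20, 0.025); an assumption of this sketch.
t_crit = 2.086

half_width = t_crit * s_b1
lower = b1 - half_width
upper = b1 + half_width

print(f"{lower:.4f} < beta1 < {upper:.4f}")  # 0.3288 < beta1 < 0.4343
```

The lower limit is consistent with the Lower 95% value of 0.328756343 shown for X Income in Figure 10.9.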
F test for Simple Regression Coefficient
We can test the hypothesis
$$H_0: \beta_1 = 0$$
against the alternative
$$H_1: \beta_1 \neq 0$$
by using the F statistic
$$F = \frac{MSR}{MSE} = \frac{SSR}{s_e^2}$$
The decision rule is
$$\text{Reject } H_0 \text{ if } F > F_{1,n-2,\alpha}$$
We can also show that the F statistic is
$$F = t_{b_1}^2$$
for any simple regression analysis.
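The equality $F = t_{b_1}^2$ can be checked against the Excel output for the retail sales model. In this sketch the mean squares and the t statistic are the reported values from Figure 10.9, and the critical value 4.35 for F with 1 and 20 degrees of freedom at $\alpha$ = 0.05 is taken from a standard F table.

```python
# Values reported in the Excel output (Figure 10.9).
MSR = 4961434.406
MSE = 21806.34563
t_b1 = 15.08384918

# Table value of F(1, 20) at alpha = 0.05; an assumption of this sketch.
F_CRIT = 4.35

F = MSR / MSE
reject_h0 = F > F_CRIT

print(f"F = {F:.2f}, t^2 = {t_b1 ** 2:.2f}")  # F = 227.52, t^2 = 227.52
print(reject_h0)                               # True
```

Because the two statistics are equivalent, the F test and the two-sided t test of a zero slope always reach the same conclusion in a simple regression.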
Key Words
Analysis of Variance
Assumptions for the Least Squares Coefficient Estimators
Basis for Inference About the Population Regression Slope
Coefficient of Determination, $R^2$
Confidence Intervals for Predictions
Confidence Intervals for the Population Regression Slope $b_1$
Correlation and $R^2$
Estimation of Model Error Variance
F test for Simple Regression Coefficient
Least-Squares Procedure
Linear Regression Outcomes
Key Words (continued)
Linear Regression Population Equation Model
Population Model
Sampling Distribution of the Least Squares Coefficient Estimator
Tests for Zero Population Correlation
Tests of the Population Regression Slope