QUANTITATIVE ANALYSIS FOR BUSINESS
Lecture 3
July 12th, 2010
Saksarun (Jay) Mativachranon
ERROR IN REGRESSION MODEL
ASSUMPTIONS OF THE REGRESSION MODEL
If we make certain assumptions about the errors in a regression model, we can perform statistical tests to determine if the model is useful
1. Errors are independent
2. Errors are normally distributed
3. Errors have a mean of zero
4. Errors have a constant variance
A plot of the residuals (errors) will often highlight any glaring violations of these assumptions
RESIDUAL PLOTS
A random plot of residuals
[Figure 4.4A: residual plot, Error (vertical axis) against X (horizontal axis)]
RESIDUAL PLOTS
Nonconstant error variance
[Figure 4.4B: residual plot, Error (vertical axis) against X (horizontal axis)]
RESIDUAL PLOTS
Nonlinear relationship
[Figure 4.4C: residual plot, Error (vertical axis) against X (horizontal axis)]
ANALYSIS OF VARIANCE
ANALYSIS OF VARIANCE (ANOVA)
Analysis of Variance (ANOVA): a statistical procedure for analyzing the total variability of a data set
ANALYSIS OF VARIANCE (ANOVA)
Sum of squares total (SST)
Measures the total variation in the dependent variable
$$SST = \sum (Y - \bar{Y})^2$$
Sum of squares of regression (SSR)
Measures the variation in the dependent variable explained by the independent variable
$$SSR = \sum (\hat{Y} - \bar{Y})^2$$
Sum of squares of errors (SSE)
Measures the unexplained variation
$$SSE = \sum e^2 = \sum (Y - \hat{Y})^2$$
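The three sums of squares are linked by a standard identity. It is stated here as a reminder rather than as part of the original slide, and it assumes the fitted values come from a least-squares line that includes an intercept:

$$SST = SSR + SSE \quad\Longleftrightarrow\quad \sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$$

In words, total variation splits into the part explained by the regression and the part left unexplained.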
ESTIMATING THE VARIANCE
Errors are assumed to have a constant variance ($\sigma^2$), but we usually don’t know this
It can be estimated using the mean squared error (MSE), $s^2$
$$s^2 = MSE = \frac{SSE}{n - k - 1}$$
where
n = number of observations in the sample
k = number of independent variables
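THE STANDARD ERROR OF ESTIMATE
The standard error of estimate (SEE), or the standard error of the regression
Measures the uncertainty in the relationship between the independent and dependent variables
With one independent variable (k = 1):
$$SEE = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}$$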
THE F STATISTIC
The F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$
where
MSR = mean regression sum of squares
MSE = mean squared error
k = the number of slope parameters (k = 1 for simple linear regression)
n = number of observations
F-STATISTIC LINEAR REGRESSION
For linear regression, the hypotheses for the validity of the model are:
H0: b1 = 0
Ha: b1 ≠ 0
To determine if b1 is statistically significant, the calculated F-statistic is compared with the critical F-value, Fc, at the appropriate level of significance.
F-STATISTIC LINEAR REGRESSION
The degrees of freedom (df) for the numerator and denominator with one independent variable are:
df numerator = k = 1
df denominator = n – k – 1 = n – 2
Decision for F-test
Reject H0 if F > Fc
COMPANY A DATA
Sales of Company A ($) | Man Hour (Hour)
6                      | 3
8                      | 4
9                      | 6
5                      | 4
4.5                    | 2
9.5                    | 5
COMPANY A EXAMPLE
Y   | X | $(Y - \bar{Y})^2$     | $\hat{Y}$           | $(Y - \hat{Y})^2$ | $(\hat{Y} - \bar{Y})^2$
6   | 3 | (6 – 7)^2 = 1         | 2 + 1.25(3) = 5.75  | 0.0625            | 1.563
8   | 4 | (8 – 7)^2 = 1         | 2 + 1.25(4) = 7.00  | 1                 | 0
9   | 6 | (9 – 7)^2 = 4         | 2 + 1.25(6) = 9.50  | 0.25              | 6.25
5   | 4 | (5 – 7)^2 = 4         | 2 + 1.25(4) = 7.00  | 4                 | 0
4.5 | 2 | (4.5 – 7)^2 = 6.25    | 2 + 1.25(2) = 4.50  | 0                 | 6.25
9.5 | 5 | (9.5 – 7)^2 = 6.25    | 2 + 1.25(5) = 8.25  | 1.5625            | 1.563
$\sum (Y - \bar{Y})^2 = 22.5$, $\sum (Y - \hat{Y})^2 = 6.875$, $\sum (\hat{Y} - \bar{Y})^2 = 15.625$
$\bar{Y} = 7$, SST = 22.5, SSE = 6.875, SSR = 15.625
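The arithmetic in the table can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the coefficients 2 and 1.25 are taken from the fitted line used in the table rather than re-estimated here.

```python
# Company A data (sales in $, man hours) as listed in the lecture.
sales = [6, 8, 9, 5, 4.5, 9.5]       # Y
man_hours = [3, 4, 6, 4, 2, 5]       # X

# Coefficients of the fitted line used in the table: Y_hat = 2 + 1.25 X
b0, b1 = 2.0, 1.25

y_bar = sum(sales) / len(sales)                  # mean of Y
y_hat = [b0 + b1 * x for x in man_hours]         # fitted values

sst = sum((y - y_bar) ** 2 for y in sales)                 # total variation
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))    # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)               # explained variation

print(y_bar, sst, sse, ssr)  # 7.0 22.5 6.875 15.625
```

The printed values agree with the column totals, and SSR + SSE equals SST.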
ESTIMATING THE VARIANCE
For Company A
$$s^2 = MSE = \frac{SSE}{n - k - 1} = \frac{6.8750}{6 - 1 - 1} = \frac{6.8750}{4} = 1.7188$$
We can estimate the standard deviation, s
This is also called the standard error of the estimate or the standard deviation of the regression
$$s = \sqrt{MSE} = \sqrt{1.7188} = 1.31$$
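As a quick check of these two numbers, the same calculation can be sketched in Python. The variable names are illustrative; SSE, n and k are the Company A values from the lecture.

```python
import math

sse = 6.875   # sum of squared errors for Company A
n = 6         # number of observations
k = 1         # number of independent variables

mse = sse / (n - k - 1)   # estimate of the error variance, s^2
s = math.sqrt(mse)        # standard error of the estimate

print(f"MSE = {mse:.4f}, s = {s:.2f}")
```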
TESTING THE MODEL FOR SIGNIFICANCE
When the sample size is too small, you can get good values for MSE and $r^2$ even if there is no relationship between the variables
Testing the model for significance helps determine if the values are meaningful
We do this by performing a statistical hypothesis test
TESTING THE MODEL FOR SIGNIFICANCE
We start with the general linear model
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
The null hypothesis is that there is no linear relationship between X and Y ($\beta_1 = 0$)
The alternate hypothesis is that there is a linear relationship ($\beta_1 \neq 0$)
If the null hypothesis can be rejected, we have statistical evidence that there is a linear relationship
We use the F statistic for this test
TESTING THE MODEL FOR SIGNIFICANCE
The F statistic is based on the MSE and MSR
$$MSR = \frac{SSR}{k}$$
where
k = number of independent variables in the model
The F statistic is
$$F = \frac{MSR}{MSE}$$
This describes an F distribution with
degrees of freedom for the numerator = df1 = k
degrees of freedom for the denominator = df2 = n – k – 1
TESTING THE MODEL FOR SIGNIFICANCE
If there is very little error, the MSE would be small and the F-statistic would be large, indicating the model is useful
If the F-statistic is large, the significance level (p-value) will be low, indicating it is unlikely this would have occurred by chance
So when the F-value is large, we can reject the null hypothesis and accept that there is a linear relationship between X and Y, and that the values of the MSE and $r^2$ are meaningful
STEPS IN A HYPOTHESIS TEST
1. Specify null and alternative hypotheses
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
2. Select the level of significance ($\alpha$). Common values are 0.01 and 0.05
3. Calculate the value of the test statistic using the formula
$$F = \frac{MSR}{MSE}$$
STEPS IN A HYPOTHESIS TEST
4. Make a decision using one of the following methods
a) Reject the null hypothesis if the test statistic is greater than the F-value from the table. Otherwise, do not reject the null hypothesis:
$$\text{Reject if } F_{\text{calculated}} > F_{\alpha, df_1, df_2}$$
$$df_1 = k$$
$$df_2 = n - k - 1$$
b) Reject the null hypothesis if the observed significance level, or p-value, is less than the level of significance ($\alpha$). Otherwise, do not reject the null hypothesis:
$$p\text{-value} = P(F > \text{calculated test statistic})$$
$$\text{Reject if } p\text{-value} < \alpha$$
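The two decision rules in step 4 can be written as small Python functions. This is only a sketch: the function names are hypothetical, and the numbers in the usage lines are made-up illustrative values, not results from the lecture.

```python
def reject_by_critical_value(f_calculated, f_critical):
    """Method (a): reject H0 when the calculated F exceeds the table value."""
    return f_calculated > f_critical


def reject_by_p_value(p_value, alpha):
    """Method (b): reject H0 when the p-value is below the significance level."""
    return p_value < alpha


# Illustrative values only (not from the lecture).
print(reject_by_critical_value(f_calculated=5.0, f_critical=4.0))  # True
print(reject_by_p_value(p_value=0.20, alpha=0.05))                 # False
```

For the same test statistic and the same level of significance, both methods lead to the same decision.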
COMPANY A
Step 1.
H0: $\beta_1 = 0$ (no linear relationship between X and Y)
H1: $\beta_1 \neq 0$ (linear relationship exists between X and Y)
Step 2.
Select $\alpha = 0.05$
Step 3.
Calculate the value of the test statistic
$$MSR = \frac{SSR}{k} = \frac{15.6250}{1} = 15.6250$$
$$F = \frac{MSR}{MSE} = \frac{15.6250}{1.7188} = 9.09$$
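The test statistic for Company A can be reproduced with a short Python sketch. The function name `f_statistic` is hypothetical; the inputs are the Company A sums of squares from the lecture.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE, with MSR = SSR / k and MSE = SSE / (n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


# Company A: SSR = 15.625, SSE = 6.875, n = 6 observations, k = 1 variable.
f_calc = f_statistic(ssr=15.625, sse=6.875, n=6, k=1)
print(f"F = {f_calc:.2f}")
```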
COMPANY A
Step 4.
Reject the null hypothesis if the test statistic is greater than the F-value
df1 = k = 1
df2 = n – k – 1 = 6 – 1 – 1 = 4
The value of F associated with a 5% level of significance and with degrees of freedom 1 and 4 is
F0.05,1,4 = 7.71
Fcalculated = 9.09
Reject H0 because 9.09 > 7.71
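[Figure: F distribution with the 0.05 rejection region to the right of F = 7.71; the calculated value 9.09 falls inside the rejection region]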
COMPANY A
We can conclude there is a statistically significant relationship between X and Y
The $r^2$ value of 0.69 means about 69% of the variability in sales (Y) is explained by Man Hour (X)
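The 0.69 figure can be traced back to the sums of squares. The relation below is the usual definition of the coefficient of determination, which this part of the lecture does not restate, so treat it as a reminder rather than a quotation from the slides:

$$r^2 = \frac{SSR}{SST} = \frac{15.625}{22.5} \approx 0.694$$

Taking the square root gives the correlation coefficient, $r = \sqrt{0.694} \approx 0.833$, positive here because the fitted slope (1.25) is positive.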
LIMITATION OF REGRESSION ANALYSIS
Linear relationships can change over time
This is referred to as parameter instability
Even if the model is accurate, its usefulness will be limited if other market participants are also aware of and act on this model
If the assumptions do not hold, the interpretation and tests of hypotheses may not be valid
USING SOFTWARE FOR REGRESSION
Correlation coefficient is called Multiple R in Excel
ANALYSIS OF VARIANCE (ANOVA) TABLE
When software is used to develop a regression model, an ANOVA table is typically created that shows the observed significance level (p-value) for the calculated F value
This can be compared to the level of significance ($\alpha$) to make a decision
           | DF        | SS  | MS                    | F       | SIGNIFICANCE
Regression | k         | SSR | MSR = SSR/k           | MSR/MSE | P(F > MSR/MSE)
Residual   | n - k - 1 | SSE | MSE = SSE/(n - k - 1) |         |
Total      | n - 1     | SST |                       |         |
Table 4.4
ANOVA FOR COMPANY A
P(F > 9.0909) = 0.0394
Because this probability is less than 0.05, we reject the null hypothesis of no linear relationship and conclude there is a linear relationship between X and Y
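The p-value reported by the software can be approximated without a statistics package. The sketch below is one possible approach, not the method the software uses: it relies on the identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1·f), where I is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_upper_tail` is hypothetical, and the code assumes df2 ≥ 2 and f > 0 so the integrand stays finite.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=2000):
    """Approximate P(F > f_value) for an F(df1, df2) distribution.

    Assumes df2 >= 2 and f_value > 0; `steps` must be even (Simpson's rule).
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)   # upper limit of integration, 0 < x < 1

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Composite Simpson's rule on [0, x].
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3

    # Complete beta function B(a, b) normalizes the integral.
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


p_value = f_upper_tail(9.0909, df1=1, df2=4)
print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

For Company A this gives a p-value of roughly 0.039, in line with the 0.0394 shown above, so the decision matches the critical-value method.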