Business Quantitative Lecture 3

QUANTITATIVE ANALYSIS FOR BUSINESS

Lecture 3

July 12th, 2010

Saksarun (Jay) Mativachranon

ERROR IN REGRESSION MODEL

ASSUMPTIONS OF THE REGRESSION MODEL

If we make certain assumptions about the errors in a regression model, we can perform statistical tests to determine if the model is useful

1. Errors are independent
2. Errors are normally distributed
3. Errors have a mean of zero
4. Errors have a constant variance

A plot of the residuals (errors) will often highlight any glaring violations of these assumptions

RESIDUAL PLOTS

Figure 4.4A: A random plot of residuals (axes: Error versus X)

RESIDUAL PLOTS

Figure 4.4B: Nonconstant error variance (axes: Error versus X)

RESIDUAL PLOTS

Figure 4.4C: Nonlinear relationship (axes: Error versus X)
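The residuals behind such a plot are simply the observed values minus the fitted values. The sketch below computes them for the Company A data and the fitted line Ŷ = 2 + 1.25X that appear later in this lecture; the variable and function names are illustrative, not part of the slides. With only six points, it shows the mechanics rather than offering a real diagnostic.

```python
# Company A data and fitted line, taken from the example later in this lecture.
sales = [6, 8, 9, 5, 4.5, 9.5]     # Y, sales ($)
man_hours = [3, 4, 6, 4, 2, 5]     # X, man-hours


def predict(x):
    """Fitted line from the Company A example: Y-hat = 2 + 1.25 X."""
    return 2 + 1.25 * x


# Residual = observed Y minus predicted Y.
residuals = [y - predict(x) for x, y in zip(man_hours, sales)]

# List the residuals in order of X, which is what a residual plot displays.
for x, e in sorted(zip(man_hours, residuals)):
    print(f"X = {x}: residual = {e:+.2f}")

# For a least squares line with an intercept, the residuals average to zero.
print("mean residual:", sum(residuals) / len(residuals))
```

A residual plot would place each X on the horizontal axis and its residual on the vertical axis; a funnel or curved shape would hint at the violations shown in Figures 4.4B and 4.4C.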

ANALYSIS OF VARIANCE

ANALYSIS OF VARIANCE (ANOVA)

Analysis of variance (ANOVA) is a statistical procedure for analyzing the total variability of a data set

ANALYSIS OF VARIANCE (ANOVA)

Sum of squares total (SST): measures the total variation in the dependent variable

$SST = \sum (Y - \bar{Y})^2$

Sum of squares of regression (SSR): measures the variation in the dependent variable explained by the independent variable

$SSR = \sum (\hat{Y} - \bar{Y})^2$

Sum of squares of errors (SSE): measures the unexplained variation

$SSE = \sum e^2 = \sum (Y - \hat{Y})^2$
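For a least squares line that includes an intercept, the three sums of squares are linked by a standard decomposition. It is not stated on the slide, but it is a well-known result and a useful check on hand calculations:

```latex
\underbrace{\sum (Y - \bar{Y})^2}_{SST}
  = \underbrace{\sum (\hat{Y} - \bar{Y})^2}_{SSR}
  + \underbrace{\sum (Y - \hat{Y})^2}_{SSE}
```

In words, total variation equals explained variation plus unexplained variation.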

ESTIMATING THE VARIANCE

Errors are assumed to have a constant variance ($\sigma^2$), but we usually don’t know this

It can be estimated using the mean squared error (MSE), $s^2$

$s^2 = MSE = \frac{SSE}{n - k - 1}$

where
n = number of observations in the sample
k = number of independent variables

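THE STANDARD ERROR OF ESTIMATE

The standard error of estimate (SEE), also called the standard error of the regression, measures the uncertainty in the relationship between the independent and dependent variables

$SEE = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}$

The denominator n – 2 applies to regression with one independent variable (k = 1, so n – k – 1 = n – 2)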

THE F STATISTIC

The F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable

$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$

where
MSR = mean regression sum of squares
MSE = mean squared error
k = the number of slope parameters (k = 1 for linear regression with one independent variable)
n = number of observations

F-STATISTIC LINEAR REGRESSION

For linear regression, the hypotheses for the validity of the model are:
H0: b1 = 0
Ha: b1 ≠ 0

To determine if b1 is statistically significant, the calculated F-statistic is compared with the critical F-value, Fc, at the appropriate level of significance.

F-STATISTIC LINEAR REGRESSION

The degrees of freedom (df) for the numerator and denominator with one independent variable are:
df numerator = k = 1
df denominator = n – k – 1 = n – 2

Decision for the F-test: reject H0 if F > Fc

COMPANY A DATA

Sales of Company A ($)   Man-Hours (hours)
6                        3
8                        4
9                        6
5                        4
4.5                      2
9.5                      5
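The example that follows uses the fitted line Ŷ = 2 + 1.25X without showing where it comes from. As a minimal sketch, assuming the usual least squares formulas (slope = Sxy / Sxx, intercept = Ȳ minus slope times X̄), the coefficients can be recovered from the data above. The variable names are illustrative.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]     # Y, sales ($)
man_hours = [3, 4, 6, 4, 2, 5]     # X, man-hours

n = len(sales)
x_bar = sum(man_hours) / n         # mean of X
y_bar = sum(sales) / n             # mean of Y

# Sum of cross-products and sum of squared deviations of X.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(man_hours, sales))
s_xx = sum((x - x_bar) ** 2 for x in man_hours)

b1 = s_xy / s_xx                   # slope
b0 = y_bar - b1 * x_bar            # intercept

print(f"Y-hat = {b0} + {b1} X")    # prints: Y-hat = 2.0 + 1.25 X
```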

COMPANY A EXAMPLE

Y     X   (Y – Ȳ)²             Ŷ                    (Y – Ŷ)²   (Ŷ – Ȳ)²
6     3   (6 – 7)² = 1         2 + 1.25(3) = 5.75   0.0625     1.5625
8     4   (8 – 7)² = 1         2 + 1.25(4) = 7.00   1          0
9     6   (9 – 7)² = 4         2 + 1.25(6) = 9.50   0.25       6.25
5     4   (5 – 7)² = 4         2 + 1.25(4) = 7.00   4          0
4.5   2   (4.5 – 7)² = 6.25    2 + 1.25(2) = 4.50   0          6.25
9.5   5   (9.5 – 7)² = 6.25    2 + 1.25(5) = 8.25   1.5625     1.5625

∑(Y – Ȳ)² = 22.5
∑(Y – Ŷ)² = 6.875
∑(Ŷ – Ȳ)² = 15.625

Ȳ = 7, SST = 22.5, SSE = 6.875, SSR = 15.625
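The column sums in this table can be checked with a few lines of Python. This is a sketch that assumes the fitted line Ŷ = 2 + 1.25X from the table; the variable names are illustrative.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]     # Y
man_hours = [3, 4, 6, 4, 2, 5]     # X

y_bar = sum(sales) / len(sales)                 # mean of Y
y_hat = [2 + 1.25 * x for x in man_hours]       # fitted values

sst = sum((y - y_bar) ** 2 for y in sales)                   # total variation
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))      # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                 # explained

print(f"SST = {sst}, SSE = {sse}, SSR = {ssr}")
```

The three values also satisfy SST = SSR + SSE, which is a quick way to catch arithmetic slips in a hand-built table.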

ESTIMATING THE VARIANCE

For Company A

$s^2 = MSE = \frac{SSE}{n - k - 1} = \frac{6.8750}{6 - 1 - 1} = \frac{6.8750}{4} = 1.7188$

We can estimate the standard deviation, s

This is also called the standard error of the estimate or the standard deviation of the regression

$s = \sqrt{MSE} = \sqrt{1.7188} = 1.31$
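The same arithmetic as a short sketch, with SSE, n, and k taken from the Company A example (variable names are illustrative):

```python
import math

sse = 6.875   # sum of squared errors for Company A
n = 6         # number of observations
k = 1         # number of independent variables

mse = sse / (n - k - 1)   # estimate of the error variance, s^2
s = math.sqrt(mse)        # standard error of the estimate

print(f"MSE = {mse:.4f}, s = {s:.2f}")
```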

TESTING THE MODEL FOR SIGNIFICANCE

When the sample size is too small, you can get good values for MSE and r² even if there is no relationship between the variables

Testing the model for significance helps determine if the values are meaningful

We do this by performing a statistical hypothesis test

TESTING THE MODEL FOR SIGNIFICANCE

We start with the general linear model $Y = \beta_0 + \beta_1 X + \epsilon$

The null hypothesis is that there is no relationship between X and Y ($\beta_1 = 0$)

The alternate hypothesis is that there is a linear relationship ($\beta_1 \neq 0$)

If the null hypothesis can be rejected, we have statistical evidence that there is a relationship

We use the F statistic for this test


The F statistic is based on the MSE and MSR

$MSR = \frac{SSR}{k}$

where k = number of independent variables in the model

The F statistic is

$F = \frac{MSR}{MSE}$

This describes an F distribution with
degrees of freedom for the numerator = df1 = k
degrees of freedom for the denominator = df2 = n – k – 1


If there is very little error, the MSE would be small and the F-statistic would be large indicating the model is useful

If the F-statistic is large, the significance level (p-value) will be low, indicating it is unlikely this would have occurred by chance

So when the F-value is large, we can reject the null hypothesis and conclude that there is a linear relationship between X and Y and that the values of the MSE and r² are meaningful

STEPS IN A HYPOTHESIS TEST

1. Specify null and alternative hypotheses

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

2. Select the level of significance ($\alpha$). Common values are 0.01 and 0.05

3. Calculate the value of the test statistic using the formula

$F = \frac{MSR}{MSE}$

STEPS IN A HYPOTHESIS TEST

4. Make a decision using one of the following methods

a) Reject the null hypothesis if the test statistic is greater than the F-value from the table. Otherwise, do not reject the null hypothesis:

Reject if $F_{calculated} > F_{\alpha, df_1, df_2}$

$df_1 = k$

$df_2 = n - k - 1$

b) Reject the null hypothesis if the observed significance level, or p-value, is less than the level of significance ($\alpha$). Otherwise, do not reject the null hypothesis:

$p\text{-value} = P(F > \text{calculated test statistic})$

Reject if $p\text{-value} < \alpha$

COMPANY A

Step 1.
H0: $\beta_1 = 0$ (no linear relationship between X and Y)
H1: $\beta_1 \neq 0$ (linear relationship exists between X and Y)

Step 2.
Select $\alpha = 0.05$

Step 3.
Calculate the value of the test statistic

$MSR = \frac{SSR}{k} = \frac{15.6250}{1} = 15.6250$

$F = \frac{MSR}{MSE} = \frac{15.6250}{1.7188} = 9.09$

COMPANY A

Step 4.
Reject the null hypothesis if the test statistic is greater than the F-value

df1 = k = 1
df2 = n – k – 1 = 6 – 1 – 1 = 4

The value of F associated with a 5% level of significance and with degrees of freedom 1 and 4 is

$F_{0.05,1,4} = 7.71$

$F_{calculated} = 9.09$

Reject H0 because 9.09 > 7.71

[Figure: F distribution with the 0.05 rejection region to the right of F = 7.71; the calculated value 9.09 lies in that region]
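Steps 3 and 4 for Company A can be sketched as a small function. The critical value 7.71 is the table value quoted on the slide and is hard-coded here rather than computed; the function and variable names are illustrative.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k independent variables."""
    msr = ssr / k                 # mean regression sum of squares
    mse = sse / (n - k - 1)       # mean squared error
    return msr, mse, msr / mse


# Step 3: test statistic from the Company A sums of squares.
msr, mse, f_calculated = f_statistic(ssr=15.625, sse=6.875, n=6, k=1)

# Step 4a: compare with the table value F(0.05; df1 = 1, df2 = 4).
f_critical = 7.71
reject_h0 = f_calculated > f_critical

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_calculated:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```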

COMPANY A

We can conclude there is a statistically significant relationship between X and Y

The r² value of 0.69 means about 69% of the variability in sales (Y) is explained by man-hours (X)
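The 0.69 figure is not derived on this slide. Assuming the usual definition r² = SSR / SST, it follows directly from the Company A sums of squares, as this short sketch shows:

```python
ssr = 15.625   # explained variation for Company A
sst = 22.5     # total variation for Company A

# Coefficient of determination: share of total variation explained by X.
r_squared = ssr / sst

print(f"r^2 = {r_squared:.4f}")   # about 0.69
```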

LIMITATIONS OF REGRESSION ANALYSIS

Linear relationships can change over time. This is referred to as parameter instability

Even if the model is accurate, its usefulness will be limited if other market participants are also aware of and act on this model

If the assumptions do not hold, the interpretation and tests of hypotheses may not be valid

USING SOFTWARE FOR REGRESSION

The correlation coefficient is called Multiple R in Excel

ANALYSIS OF VARIANCE (ANOVA) TABLE

When software is used to develop a regression model, an ANOVA table is typically created that shows the observed significance level (p-value) for the calculated F value

This can be compared to the level of significance ($\alpha$) to make a decision

Table 4.4

             DF          SS    MS                      F         SIGNIFICANCE
Regression   k           SSR   MSR = SSR/k             MSR/MSE   P(F > MSR/MSE)
Residual     n - k - 1   SSE   MSE = SSE/(n - k - 1)
Total        n - 1       SST

ANOVA FOR COMPANY A

P(F > 9.0909) = 0.0394

Because this probability is less than 0.05, we reject the null hypothesis of no linear relationship and conclude there is a linear relationship between X and Y
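Software reports this p-value directly. For readers who want to see where it comes from, the sketch below approximates the upper-tail F probability using only the Python standard library. It relies on the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1·f), where I is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_p_value` and the step count are assumptions made for illustration, and this is not how Excel computes the value.

```python
import math


def f_p_value(f_value, df1, df2, steps=2000):
    """Approximate P(F > f_value) for an F(df1, df2) distribution.

    Assumes f_value > 0 and df2 >= 2 so the integrand stays finite on
    [0, x]; steps must be even for Simpson's rule.
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    # Composite Simpson's rule on [0, x].
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    # Normalize by the complete beta function B(a, b).
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


p_value = f_p_value(9.0909, df1=1, df2=4)
print(f"p-value = {p_value:.4f}")          # about 0.0394, as on the slide
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

Evaluating the same function at the table value 7.71 gives a probability very close to 0.05, which ties the two decision methods in step 4 together.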
