Econ 140, Lecture 8

Classical Regression II

The story so far...

• We learned how to compute least squares estimates

• We talked about the assumptions underlying the CLRM:

1) Y and e are random variables

2) $X_i$ is nonrandom (it’s given)

3) $E(e_i) = E(e_i \mid X_i) = 0$

4) $V(e_i) = V(e_i \mid X_i) = \sigma^2$

5) $\mathrm{Cov}(e_i, e_j) = 0$ for $i \neq j$

Be clear about the difference between the error $e_i$ and the residual $\hat{e}_i$.

Note that $\hat{a}$ and $\hat{b}$ (also written a^ and b^) are estimates of a and b; they are also random variables and have sampling distributions.
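The claim that $\hat{a}$ and $\hat{b}$ are random variables with sampling distributions can be illustrated by simulation. The Python sketch below holds X fixed (assumption 2), redraws only iid normal errors with mean zero and constant variance (assumptions 3 to 5), re-estimates the slope in each simulated sample, and compares the spread of the estimates with $\sigma / \sqrt{\sum x^2}$. The true values a = 2.6, b = 0.2, σ = 0.6 and the X values are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def ols(xs, ys):
    """Least squares estimates (a_hat, b_hat) for Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sum_xy / sum_x2
    return y_bar - b_hat * x_bar, b_hat


def simulate_slopes(a, b, sigma, xs, reps, seed=0):
    """Draw `reps` samples with X held fixed and return the slope estimate from each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        # Only the errors are redrawn: e_i ~ iid N(0, sigma^2)
        ys = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(ols(xs, ys)[1])
    return slopes


def theoretical_sd(sigma, xs):
    """Standard deviation of b_hat under the CLRM: sigma / sqrt(sum x^2)."""
    x_bar = sum(xs) / len(xs)
    return sigma / math.sqrt(sum((x - x_bar) ** 2 for x in xs))


xs = [8, 10, 12, 12, 14, 16, 16, 18]  # hypothetical fixed X values
slopes = simulate_slopes(a=2.6, b=0.2, sigma=0.6, xs=xs, reps=5000)
print(f"mean of b_hat: {statistics.mean(slopes):.3f} (true b = 0.2)")
print(f"sd of b_hat:   {statistics.stdev(slopes):.4f} "
      f"(theory: {theoretical_sd(0.6, xs):.4f})")
```

Rerunning with a different `seed` produces a different set of estimates with a similar mean and spread; that distribution of $\hat{b}$ across samples is its sampling distribution.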

Today’s Plan

• Inference with the classical linear regression model

– Calculating the standard error

– Calculating the t-ratio

– Root-mean square error

– 95% confidence intervals

– ANOVA tables (ANOVA stands for analysis of variance)

Variation around the regression line

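• The errors $e_i$ are iid and assumed normal:

[Figure: the regression line in the (X, Y) plane, with the distributions of Y1, Y2 and Y3 drawn around the line at X1, X2 and X3]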

Sum of Squares Identity

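• Let’s take one point, $X_1$, and look at it graphically:

[Figure: at $X_1$, the vertical distances between the observed $Y$, the fitted $\hat{Y}$ and the mean $\bar{Y}$]

1. Total: $(Y - \bar{Y})$

2. Model, explained: $(\hat{Y} - \bar{Y})$

3. Residual, unexplained: $(Y - \hat{Y})$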

Sum of Squares Identity (2)

• The Sum of Squares Identity is

Total = Explained + Unexplained

or

$$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$$
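The identity can be checked numerically. This Python sketch fits the least squares line to a small hypothetical dataset (not the Illinois data), computes the three sums of squares from the fitted values $\hat{Y}$, and confirms that the total equals explained plus unexplained up to floating-point rounding.

```python
def sums_of_squares(xs, ys):
    """Return (total, explained, unexplained) sums of squares for the LS line of Y on X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    a_hat = y_bar - b_hat * x_bar
    y_fit = [a_hat + b_hat * x for x in xs]                        # Y-hat
    total = sum((y - y_bar) ** 2 for y in ys)                      # sum (Y - Y_bar)^2
    explained = sum((yf - y_bar) ** 2 for yf in y_fit)             # sum (Y_hat - Y_bar)^2
    unexplained = sum((y - yf) ** 2 for y, yf in zip(ys, y_fit))   # sum (Y - Y_hat)^2
    return total, explained, unexplained


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
total, explained, unexplained = sums_of_squares(xs, ys)
print(f"{total:.3f} = {explained:.3f} + {unexplained:.3f}")
```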

Sum of Squares Identity (3)

• $\sum (\hat{Y} - \bar{Y})^2$ reveals how much of the variation is explained by the regression line

• $\sum (Y - \hat{Y})^2$ reveals how much of the variation is not explained by the regression line, or is left over

– Notice that this is also equal to $\sum \hat{e}_i^2$

• $\sum (Y - \bar{Y})^2$ reveals how much total variation there is

– Remember that in a previous lecture we said that $\sum (Y - \bar{Y}) = 0$, which is why the deviations are squared before summing

How to calculate sum of squares

• We can write the total sum of squares as

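$$\sum (Y - \bar{Y})^2 = \sum y^2, \quad \text{where } y = Y - \bar{Y}$$

– We’re given the Y values, so we can compute $\bar{Y}$

• We can write the explained sum of squares as $\sum (\hat{Y} - \bar{Y})^2$

– Calculating the ESS: because $\hat{Y} - \bar{Y} = \hat{b}x$, the ESS is $\hat{b}^2 \sum x^2 = \hat{b} \sum xy$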

How to calculate sum of squares (4)

• We can calculate the unexplained variation (the unexplained sum of squares) as the difference between the total and the explained sum of squares:

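$$\sum (Y - \hat{Y})^2 = \sum y^2 - \hat{b} \sum xy$$

• Because we have to account for degrees of freedom when calculating each variance term, we divide each sum of squares in the identity by its degrees of freedom:

$$\frac{\sum (Y - \bar{Y})^2}{n - 1}, \qquad \frac{\sum (\hat{Y} - \bar{Y})^2}{1}, \qquad \frac{\sum (Y - \hat{Y})^2}{n - 2}$$

– The degrees of freedom add up, $n - 1 = 1 + (n - 2)$, but the resulting variance terms do not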
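Putting each sum of squares next to its degrees of freedom gives the ANOVA (analysis of variance) table. A minimal Python sketch, assuming a small hypothetical dataset, using the deviation-form shortcuts $\hat{b}\sum xy$ for the explained sum of squares and $\sum y^2 - \hat{b}\sum xy$ for the unexplained one. The row labels and column layout are illustrative choices, not any particular package's output format.

```python
def anova_table(xs, ys):
    """Bivariate ANOVA rows (source, SS, df, MS) built from deviation-form sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    b_hat = sum_xy / sum_x2
    ess = b_hat * sum_xy   # explained SS = b_hat * sum(xy)
    rss = sum_y2 - ess     # unexplained SS = sum(y^2) - b_hat * sum(xy)
    rows = [("Model", ess, 1), ("Residual", rss, n - 2), ("Total", sum_y2, n - 1)]
    # Mean square = sum of squares divided by its degrees of freedom
    return [(source, ss, df, ss / df) for source, ss, df in rows]


rows = anova_table([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.1, 5.9, 8.2, 9.8])
for source, ss, df, ms in rows:
    print(f"{source:<8} SS={ss:8.3f} df={df:2d} MS={ms:8.3f}")
```

The Model and Residual SS and df columns each add up to the Total row; the MS column does not.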

How to calculate sum of squares (5)

• The residual variance of the regression line is

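$$\hat{\sigma}^2_{yx} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum y^2 - \hat{b} \sum xy}{n - 2}$$

• If we take the square root we get the root mean square error (root MSE):

$$\hat{\sigma}_{yx} = \sqrt{\hat{\sigma}^2_{yx}}$$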
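As a quick check of the residual variance and root MSE formulas, this Python sketch plugs in the summary figures from the Illinois example later in this lecture ($\sum y^2 = 16.11$, $\hat{b} = 0.235$, $\sum xy = 26.30$, n = 30). The function names are illustrative.

```python
import math


def residual_variance(sum_y2, b_hat, sum_xy, n):
    """sigma_hat^2_{yx} = (sum y^2 - b_hat * sum xy) / (n - 2), deviation form."""
    return (sum_y2 - b_hat * sum_xy) / (n - 2)


def root_mse(sum_y2, b_hat, sum_xy, n):
    """Root MSE: the square root of the residual variance."""
    return math.sqrt(residual_variance(sum_y2, b_hat, sum_xy, n))


# Illinois example figures: sum y^2 = 16.11, b_hat = 0.235, sum xy = 26.30, n = 30
s2 = residual_variance(16.11, 0.235, 26.30, 30)
s = root_mse(16.11, 0.235, 26.30, 30)
print(f"residual variance: {s2:.4f}, root MSE: {s:.4f}")
```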

Calculating test statistics

• We can calculate test statistics from the sum of squares statistics

• The variance of $\hat{b}$, the slope coefficient, is

$$\hat{\sigma}^2_{\hat{b}} = \frac{\hat{\sigma}^2_{yx}}{\sum x^2}$$

– where $x = (X - \bar{X})$

Calculating test statistics (2)

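• The standard error of $\hat{b}$ is

$$\hat{\sigma}_{\hat{b}} = \sqrt{\frac{\hat{\sigma}^2_{yx}}{\sum x^2}}$$

• The variance of the intercept $\hat{a}$ is

$$\hat{\sigma}^2_{\hat{a}} = \hat{\sigma}^2_{yx} \, \frac{\sum X^2}{n \sum x^2}$$

• The standard error of $\hat{a}$ is

$$\hat{\sigma}_{\hat{a}} = \sqrt{\hat{\sigma}^2_{yx} \, \frac{\sum X^2}{n \sum x^2}}$$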
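The two standard-error formulas translate directly into code. This Python sketch evaluates them with the summary figures from the Illinois example later in this lecture ($\hat{\sigma}^2_{yx} = 0.355$, $\sum x^2 = 111.84$, $\sum X^2 = 4480$, n = 30). Note that $\sum X^2$ is the raw sum of squares while $\sum x^2$ uses deviations from the mean. The function names are illustrative.

```python
import math


def slope_se(sigma2_yx, sum_x2):
    """Standard error of b_hat: sqrt(sigma_hat^2_{yx} / sum x^2), with x = X - X_bar."""
    return math.sqrt(sigma2_yx / sum_x2)


def intercept_se(sigma2_yx, sum_X2_raw, n, sum_x2):
    """Standard error of a_hat: sqrt(sigma_hat^2_{yx} * sum X^2 / (n * sum x^2))."""
    return math.sqrt(sigma2_yx * sum_X2_raw / (n * sum_x2))


# Summary figures from the Illinois example
se_b = slope_se(0.355, 111.84)
se_a = intercept_se(0.355, 4480, 30, 111.84)
print(f"se(b_hat) = {se_b:.4f}, se(a_hat) = {se_a:.4f}")
```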

Confidence intervals

• Once we have the standard errors, we can do two things:

– form a confidence interval

– perform a hypothesis test

• A confidence interval for b:

$$\hat{b} \pm t_{\alpha/2,\, df} \, \hat{\sigma}_{\hat{b}}$$

– where df in a bivariate model is $n - 2$

• As in the univariate case, we can calculate a confidence interval for b in the bivariate case
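A minimal Python sketch of the interval $\hat{b} \pm t_{\alpha/2,\, df}\,\hat{\sigma}_{\hat{b}}$. The critical value is passed in as an argument rather than computed; the usage line assumes the Illinois example figures from later in this lecture ($\hat{b} = 0.235$, $\hat{\sigma}_{\hat{b}} = 0.056$, and the tabled $t_{0.025,\,28} = 2.048$).

```python
def confidence_interval(b_hat, se_b, t_crit):
    """Two-sided interval b_hat +/- t_{alpha/2, n-2} * se(b_hat)."""
    half_width = t_crit * se_b
    return b_hat - half_width, b_hat + half_width


# Illinois example: b_hat = 0.235, se = 0.056, t_{0.025, 28} = 2.048 from a t table
low, high = confidence_interval(0.235, 0.056, 2.048)
print(f"{low:.3f} < b < {high:.3f}")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 28)` returns the same tabled value (about 2.048) instead of reading it from a table.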

Hypothesis testing

• Set up your null hypothesis and alternative

• Determine the critical region: choose a significance level ($\alpha$)

• Using the relevant distribution, determine your critical (tabled) value ($Z_{\alpha/2}$ or $t_{\alpha/2}$ for the moment; $F_{df_1, df_2}$ and $\chi^2_n$ soon).

• For a given sample, compute the numeric value of the test statistic: $Z^*$, $t^*$, $F^*$ or $\chi^{2*}$.

• Given the decision rule, determine whether or not to reject the null hypothesis.

Hypothesis testing (2)

• In standard statistical packages, the default null hypothesis is that the population parameter is zero, or

Ho : b = 0

• Most of the time we only have a sample and an estimate $\hat{b}$;

– we don’t know the actual population value

• Sometimes the value of b is dictated by economic theory

– in that case, a value will be imposed on b, such as b=1:

Ho : b = 1

Hypothesis testing (3)

• The standard t-ratio or t statistic is

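$$t = \frac{\hat{b} - b}{\hat{\sigma}_{\hat{b}}}$$

• So if the null hypothesis dictates b = 0, the t-ratio becomes

$$t = \frac{\hat{b} - 0}{\hat{\sigma}_{\hat{b}}} = \frac{\hat{b}}{\hat{\sigma}_{\hat{b}}}$$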
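The t-ratio and the two-sided decision rule fit in a few lines. This Python sketch takes the null value of b as a parameter, so it covers both Ho: b = 0 and a theory-imposed value such as Ho: b = 1. The inputs are the Illinois example figures from later in this lecture ($\hat{b} = 0.235$, $\hat{\sigma}_{\hat{b}} = 0.056$, critical value 2.048 for 28 df); the function names are illustrative.

```python
def t_ratio(b_hat, se_b, b_null=0.0):
    """t = (b_hat - b) / se(b_hat), where b is the value set by the null hypothesis."""
    return (b_hat - b_null) / se_b


def reject_null(t_stat, t_crit):
    """Two-sided decision rule: reject Ho when |t| exceeds the tabled t_{alpha/2, df}."""
    return abs(t_stat) > t_crit


t0 = t_ratio(0.235, 0.056)               # Ho: b = 0
t1 = t_ratio(0.235, 0.056, b_null=1.0)   # Ho: b = 1
print(f"Ho: b = 0 -> t = {t0:.2f}, reject: {reject_null(t0, 2.048)}")
print(f"Ho: b = 1 -> t = {t1:.2f}, reject: {reject_null(t1, 2.048)}")
```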

Example

• Data on female earnings in Illinois {spreadsheet L8.xls}

• The variables include earnings, earnings weights, and years of education

• In this example, the first three columns represent the ‘population’. Select two samples of 30 at random from that population. For the first sample:

– Create log earnings (ln Y)

– Compute the means of X and Y

– Multiply ln Y by years of education (XY)

– Square years of education ($X^2$)

– Sum XY and $X^2$

• This provides all the statistics you need to calculate the least squares line
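The spreadsheet steps can be mirrored in code. This Python sketch uses a made-up sample of six (earnings, years of education) pairs, not the L8.xls data, and applies the raw-sum form of the slope, $\hat{b} = (\sum XY - n\bar{X}\bar{Y}) / (\sum X^2 - n\bar{X}^2)$, followed by $\hat{a} = \bar{Y} - \hat{b}\bar{X}$. The sample values and function name are hypothetical.

```python
import math


def least_squares_from_sums(xs, ys):
    """Spreadsheet-style (a_hat, b_hat) computed from means, Sum(XY) and Sum(X^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Sum(XY)
    sum_x2 = sum(x * x for x in xs)               # Sum(X^2)
    b_hat = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Hypothetical sample of (annual earnings, years of education): NOT the L8.xls data
sample = [(18000, 10), (22000, 12), (25000, 12), (31000, 14), (40000, 16), (52000, 18)]
educ = [x for _, x in sample]
log_earn = [math.log(y) for y, _ in sample]       # ln Y
a_hat, b_hat = least_squares_from_sums(educ, log_earn)
print(f"ln Y = {a_hat:.3f} + {b_hat:.3f} X")
```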

Example (2)

• I have also included an example of how to use Excel’s LINEST to calculate the regression line

• On the web you’ll find some output from Stata using the population and sample regressions from the Illinois data. Try the LINEST function and check that your output agrees with the output from Stata

• Let’s look at a graph of the sample and population regression lines

Example (3)

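• From the spreadsheet we calculated the following:

$\sum X^2 = 4480$, $\bar{Y} = 5.48$, $\bar{X} = 12.07$, $\sum XY = 2010.61$

Sample size: n = 30

• We use these numbers to calculate $\hat{b}$:

$$\hat{b} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{2010.61 - 30(12.07)(5.48)}{4480 - 30(145.61)} = \frac{26.30}{111.84} = 0.235$$

– The numerator and denominator are $\sum xy = 26.30$ and $\sum x^2 = 111.84$ in deviation form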

Example (4)

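• And to calculate $\hat{a}$:

$$\hat{a} = \bar{Y} - \hat{b}\bar{X} = 5.48 - 0.235(12.07) \approx 2.645$$

• Compare our estimates with the Stata output

• Now let’s use the numbers from the spreadsheet to calculate the regression line variance:

$$\hat{\sigma}^2_{yx} = \frac{\sum y^2 - \hat{b} \sum xy}{n - 2} = \frac{16.11 - 0.235(26.30)}{28} \approx 0.355$$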

Example (5)

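• The variance of $\hat{b}$ is

$$\hat{\sigma}^2_{\hat{b}} = \frac{\hat{\sigma}^2_{yx}}{\sum x^2} = \frac{0.355}{111.84} \approx 0.0032$$

• Thus the standard error of $\hat{b}$ is

$$\hat{\sigma}_{\hat{b}} = \sqrt{\frac{0.355}{111.84}} \approx 0.056$$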

Example (6)

• We can calculate a confidence interval for b:

$$\hat{b} \pm t_{\alpha/2,\, df} \, \hat{\sigma}_{\hat{b}} = 0.235 \pm 2.048(0.056)$$

• For a 95% confidence interval, b is bounded between

0.120 < b < 0.350

Example (7)

• Now the hypothesis test: the Stata output gives a t-ratio of 4.06. Our null and alternative hypotheses are

Ho: b = 0    Ha: b ≠ 0

• Our t statistic:

$$t = \frac{\hat{b} - b}{\hat{\sigma}_{\hat{b}}} = \frac{0.235 - 0}{0.056} \approx 4.20$$

• Since $|t| > t_{\alpha/2,\, df} = 2.048$, we reject the null hypothesis.

• Thus, at the 5% significance level, b is statistically different from zero

A word on modeling

• The model we’ve been using is Y = a+bX

• In our spreadsheet example, our model is lnY = a + bX

• This suggests an underlying model of $Y = e^{a + bX}$

• Sometimes it is better to take logs of variables to make the relationship between Y and X linear

• Because of outliers, the relationship in levels will sometimes look more like an upward-sloping curve

• Taking the log of earnings and then plotting it against years of education gives a far more linear relationship; it does not change your conclusions

A word on modeling (2)

• We are asking the question:

• What is the increase in earnings for an additional year of education?

• It is the derivative

$$\frac{d(\ln Y)}{dX} = b$$

• More simply, we can write

$$\ln Y_1 = a + bX_1$$

$$\ln Y_2 = a + bX_2$$

$$\ln Y_2 - \ln Y_1 = (a - a) + b(X_2 - X_1)$$

A word on modeling (3)

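• The difference between $X_1$ and $X_2$ is a discrete change in years of education, so the difference will be one: $\Delta X = X_2 - X_1 = 1$

• So we can write, with $\%\Delta Y = (Y_2 - Y_1)/Y_1$ expressed as a proportion:

$$\Delta \ln Y = \ln Y_2 - \ln Y_1 = b \, \Delta X$$

$$\ln(1 + \%\Delta Y) = b \, \Delta X$$

$$\%\Delta Y = e^{b \Delta X} - 1 = e^{b} - 1$$

• On the spreadsheet, calculate the effect of an additional year of school:

$\%\Delta Y = e^{0.235} - 1 \approx 0.26$, or approximately 26%

Enter into Excel: =exp(0.235)-1

• So in the semi-log equation $\ln Y = a + bX$, b is approximately the proportional change in Y for a one-unit change in X; the exact change is $e^{b} - 1$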
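The exact and approximate readings of the semi-log slope can be compared in a few lines. This Python sketch computes $e^{b} - 1$ for the Illinois slope $\hat{b} = 0.235$ (the same calculation as the Excel formula above), then cross-checks it by evaluating $Y = e^{a + bX}$ at two adjacent years of education. The starting point of 12 years is an arbitrary, hypothetical choice; the intercept 2.645 is the example's estimate, and it cancels out of the ratio.

```python
import math


def pct_change_semilog(b, delta_x=1.0):
    """Exact proportional change in Y when X rises by delta_x in ln Y = a + bX."""
    return math.exp(b * delta_x) - 1.0


b_hat = 0.235                       # slope from the Illinois example
exact = pct_change_semilog(b_hat)   # same calculation as Excel's =exp(0.235)-1
print(f"exact change: {exact:.1%}; approximation b: {b_hat:.1%}")

# Cross-check through the model itself: the intercept cancels out of Y2 / Y1
a, x1 = 2.645, 12.0                 # x1 = 12 years is an arbitrary starting point
y1 = math.exp(a + b_hat * x1)
y2 = math.exp(a + b_hat * (x1 + 1))
print(f"(Y2 - Y1) / Y1 = {(y2 - y1) / y1:.1%}")
```

The gap between b and $e^{b} - 1$ grows with b, so the approximation is best for small slopes.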