The Classical Two-Variable Regression Model II


  • 1/33

    EC114 Introduction to Quantitative Economics
    16. The Classical Two-Variable Regression Model II

    Marcus Chambers

    Department of Economics, University of Essex

    21/23 February 2012


  • 2/33

    Outline

    1 Introduction

    2 Inference in the CLRM

    3 Summary of Computational Procedure

    4 Non-Linear Models

    5 Prediction

    Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, sections 12.3–12.5.


  • Introduction 3/33

    We have seen that the Classical Linear Regression Model (CLRM) consists of a (population) regression equation,

    $$Y_i = \alpha + \beta X_i + \epsilon_i, \quad i = 1, \ldots, n,$$

    and a set of assumptions concerning $X$ and $\epsilon$. The assumptions are:

    IA (non-random X): $X$ is non-stochastic (non-random);

    IB (fixed X): the values of $X$ are fixed in repeated samples;

    IIA (zero mean): $E(\epsilon_i) = 0$ for all $i$;

    IIB (constant variance): $V(\epsilon_i) = \sigma^2 = \text{constant}$ for all $i$;

    IIC (zero covariance): $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$;

    IID (normality): each $\epsilon_i$ is normally distributed.


  • Introduction 4/33

    Different combinations of assumptions yield different properties for the ordinary least squares (OLS) estimators, $a$ and $b$, of $\alpha$ and $\beta$:

    Property       Assumptions
    Linearity      IA, IB
    Unbiasedness   IA, IB, IIA
    BLUness        IA, IB, IIA, IIB, IIC
    Efficiency     IA, IB, IIA, IIB, IIC, IID
    Normality      IA, IB, IIA, IIB, IIC, IID

    It is the last of these properties that will form the basis for inference (hypothesis testing) concerning the population parameters $\alpha$ and $\beta$.


  • Inference in the CLRM 5/33

    Under Assumptions IA, IB, IIA, IIB, IIC, and IID, the OLS estimators are BLUE (best linear unbiased estimators) as well as normally distributed.

    We have seen that the sampling distributions of $a$ and $b$ are given by

    $$a \sim N(\alpha, \sigma_a^2), \qquad b \sim N(\beta, \sigma_b^2),$$

    where

    $$V(a) = \sigma_a^2 = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2}, \qquad V(b) = \sigma_b^2 = \frac{\sigma^2}{\sum x_i^2}.$$

    These distributions provide a basis for making inferences about $\alpha$ and $\beta$.


  • Inference in the CLRM 6/33

    Standardising, we obtain

    $$\frac{a - \alpha}{\sigma_a} \sim N(0, 1), \qquad \frac{b - \beta}{\sigma_b} \sim N(0, 1),$$

    suggesting that the $N(0, 1)$ distribution can be used for inference.

    However, the problem is that we don't know $\sigma^2$ and, hence, we can't compute $\sigma_a^2$ and $\sigma_b^2$. We therefore need to estimate $\sigma^2$.

    An unbiased estimator of $\sigma^2$ is

    $$s^2 = \frac{\sum e_i^2}{n - 2},$$

    i.e. $E(s^2) = \sigma^2$.
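    As a minimal sketch (assuming a NumPy array of OLS residuals; the function name is illustrative, not from the lecture), the estimator divides the residual sum of squares by $n - 2$:

```python
import numpy as np

def s_squared(residuals):
    """Unbiased estimator of the disturbance variance sigma^2 in the
    two-variable CLRM: s^2 = sum(e_i^2) / (n - 2)."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(e**2) / (e.size - 2)
```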


  • Inference in the CLRM 7/33

    Note that the denominator of $s^2$ involves $n - 2$ and not $n - 1$. This is because, in constructing the $e_i$, we have had to estimate two parameters ($\alpha$ and $\beta$) and have therefore used up two degrees of freedom.

    The estimated variances of $a$ and $b$ are then

    $$s_a^2 = \frac{s^2 \sum X_i^2}{n \sum x_i^2}, \qquad s_b^2 = \frac{s^2}{\sum x_i^2}.$$

    The estimated standard errors are given by the square roots of $s_a^2$ and $s_b^2$ and are denoted $s_a$ and $s_b$ respectively.


  • Inference in the CLRM 8/33

    The corresponding standardised versions of $a$ and $b$ are

    $$\frac{a - \alpha}{s_a} \sim t_{n-2}, \qquad \frac{b - \beta}{s_b} \sim t_{n-2}.$$

    These distributions are Student's $t$ because we have had to estimate $\sigma^2$ using $s^2$. The distributions have $n - 2$ degrees of freedom because we have lost two degrees of freedom through estimating $\alpha$ and $\beta$.

    The standardised variables above are used to construct confidence intervals and to test hypotheses concerning $\alpha$ and $\beta$ using the $t_{n-2}$ distribution.


  • Inference in the CLRM 9/33

    95% confidence intervals (CIs) for $\alpha$ and $\beta$ can be constructed as follows:

    for $\alpha$, the 95% CI is $a \pm t_{0.025}\, s_a$;
    for $\beta$, the 95% CI is $b \pm t_{0.025}\, s_b$;

    where $t_{0.025}$ is the value from the $t_{n-2}$ distribution that puts 2.5% of the distribution into each tail.

    The interpretation is that we are 95% confident that $\alpha$ lies in the interval

    $$[a - t_{0.025}\, s_a, \; a + t_{0.025}\, s_a],$$

    while we are 95% confident that $\beta$ lies in the interval

    $$[b - t_{0.025}\, s_b, \; b + t_{0.025}\, s_b].$$
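    As an illustrative sketch (assuming SciPy is available; the function and argument names are my own, not the lecture's), a CI of this form can be computed as:

```python
from scipy import stats

def t_confidence_interval(estimate, std_error, n, level=0.95):
    """Two-sided CI for a CLRM coefficient, using t with n - 2 df."""
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)  # e.g. t_0.025 for 95%
    return estimate - t_crit * std_error, estimate + t_crit * std_error
```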


  • Inference in the CLRM 10/33

    Example. In Lecture 12 we looked at the demand for money using data for 30 countries in 1985, obtaining

    $$\hat{Y} = 0.0212 + 0.17485\, X,$$

    where $Y$ is money stock and $X$ is GDP.

    Let's work out the standard errors of $a$ and $b$ and, hence, 95% confidence intervals. We shall need the following sample statistics:

    $$\sum y_i^2 = 26.403, \quad \sum x_i^2 = 666.86, \quad \sum x_i y_i = 116.60, \quad \sum X_i^2 = 1274.66.$$


  • Inference in the CLRM 11/33

    We first need to compute $s^2$; for this we need

    $$\sum e_i^2 = \sum y_i^2 - b \sum x_i y_i = 26.403 - (0.17485 \times 116.60) = 6.0155.$$

    It follows that

    $$s^2 = \frac{\sum e_i^2}{n - 2} = \frac{6.0155}{28} = 0.2148.$$

    We then obtain

    $$s_a^2 = \frac{s^2 \sum X_i^2}{n \sum x_i^2} = \frac{0.2148 \times 1274.66}{30 \times 666.86} = 0.0136,$$

    $$s_b^2 = \frac{s^2}{\sum x_i^2} = \frac{0.2148}{666.86} = 0.0003221.$$


  • Inference in the CLRM 12/33

    The resulting standard errors are

    $$s_a = \sqrt{0.0136} = 0.1170, \qquad s_b = \sqrt{0.0003221} = 0.01795.$$

    We can use these to form the confidence intervals; the value $t_{0.025}$ for the $t_{28}$ distribution is 2.048. Hence for $\alpha$ the 95% CI is

    $$a \pm t_{0.025}\, s_a = 0.0212 \pm (2.048 \times 0.1170) = 0.0212 \pm 0.2396,$$

    so the interval is $[-0.2184, 0.2608]$, while for $\beta$

    $$b \pm t_{0.025}\, s_b = 0.17485 \pm (2.048 \times 0.01795) = 0.17485 \pm 0.03676,$$

    yielding the interval $[0.1381, 0.2116]$.
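    These calculations can be reproduced directly from the reported sample statistics; the following is a rough sketch (values taken from the slides, so rounding may differ slightly in the last digit):

```python
import numpy as np
from scipy import stats

n = 30
sum_y2, sum_x2, sum_xy, sum_X2 = 26.403, 666.86, 116.60, 1274.66
a, b = 0.0212, 0.17485

sum_e2 = sum_y2 - b * sum_xy                 # 6.0155
s2 = sum_e2 / (n - 2)                        # 0.2148
s_a = np.sqrt(s2 * sum_X2 / (n * sum_x2))    # approx 0.1170
s_b = np.sqrt(s2 / sum_x2)                   # approx 0.01795
t025 = stats.t.ppf(0.975, df=n - 2)          # approx 2.048

ci_alpha = (a - t025 * s_a, a + t025 * s_a)  # approx [-0.218, 0.261]
ci_beta = (b - t025 * s_b, b + t025 * s_b)   # approx [0.138, 0.212]
```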


  • Inference in the CLRM 13/33

    Turning to hypothesis testing, a common test is that of significance, i.e. a test of whether a population regression parameter is zero or not.

    For example, in the model

    $$Y_i = \alpha + \beta X_i + \epsilon_i, \quad i = 1, \ldots, n,$$

    it is common to test

    $$H_0: \beta = 0 \quad \text{against} \quad H_A: \beta \neq 0.$$

    Why is this of interest? Note that:

    Under $H_0$: $Y_i = \alpha + \epsilon_i$;

    Under $H_A$: $Y_i = \alpha + \beta X_i + \epsilon_i$.


  • Inference in the CLRM 14/33

    Hence, under $H_0$, $X$ does not determine $Y$, so this is known as a test of significance. Put another way, it is a test of whether $X$ is a significant determinant of $Y$.

    We can also test $H_0: \alpha = 0$ against $H_A: \alpha \neq 0$, in which case:

    Under $H_0$: $Y_i = \beta X_i + \epsilon_i$;

    Under $H_A$: $Y_i = \alpha + \beta X_i + \epsilon_i$.

    These null hypotheses concerning $\alpha$ and $\beta$ can be tested separately using t-tests.


  • Inference in the CLRM 15/33

    When the null hypothesis is $H_0: \beta = 0$, we have

    $$TS = \frac{b}{s_b} \sim t_{n-2} \quad \text{under } H_0.$$

    When the null hypothesis is $H_0: \alpha = 0$, we have

    $$TS = \frac{a}{s_a} \sim t_{n-2} \quad \text{under } H_0.$$

    Let $t_{0.025}$ denote the critical value from the $t_{n-2}$ distribution for a 5% level two-tail test. In either case the decision rule is:

    If $|TS| > t_{0.025}$, reject $H_0$ in favour of $H_A$;
    if $|TS| < t_{0.025}$, do not reject $H_0$ (reserve judgment).
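    A hedged sketch of this decision rule in code (illustrative names; SciPy is assumed for the critical value):

```python
from scipy import stats

def significance_test(estimate, std_error, n, alpha=0.05):
    """Two-tail t-test of H0: coefficient = 0 in the two-variable CLRM."""
    ts = estimate / std_error
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return ts, t_crit, abs(ts) > t_crit  # last element: reject H0?
```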


  • Inference in the CLRM 16/33

    Example (continued). The demand-for-money example suggested a proportional relationship between $Y$ (money stock) and $X$ (GDP) of the form $Y = kX$.

    We shall first test $H_0: \alpha = 0$ against $H_A: \alpha \neq 0$ at the 5% level of significance; the 5% critical value from the $t_{28}$ distribution is 2.048. The test statistic is

    $$TS = \frac{a}{s_a} = \frac{0.0212}{0.1170} = 0.1812.$$

    As $|TS| = 0.1812 < 2.048$ we do not reject $H_0: \alpha = 0$, i.e. there is insufficient evidence to reject $H_0$ in favour of $H_A$. This supports the theory of a proportional relationship (we cannot reject the hypothesis that the intercept is zero).


  • Inference in the CLRM 17/33

    Turning to $H_0: \beta = 0$ against $H_A: \beta \neq 0$, the test statistic is

    $$TS = \frac{b}{s_b} = \frac{0.17485}{0.01795} = 9.7409.$$

    Clearly $|TS| = 9.7409 > 2.048$ and so we reject $H_0: \beta = 0$ in favour of $H_A: \beta \neq 0$, i.e. there is evidence to suggest that $X$ (GDP) is a significant determinant of $Y$ (money stock).

    This result supports our earlier analysis based on the sample correlation coefficient and the coefficient of determination, which suggested a positive correlation between these variables.


  • Inference in the CLRM 18/33

    From economic theory we would expect $\beta > 0$, and so we might want to test $H_0: \beta = 0$ against $H_A: \beta > 0$.

    We have the same test statistic under the null, but the critical value changes because this is an upper one-tail test. At the 5% level we want 5% of the $t_{n-2}$ distribution in the upper tail rather than the 2.5% that we have for a two-tail test.

    We find that now $t_{0.05} = 1.701$, and so the conclusion of our test is unchanged: we reject $H_0: \beta = 0$, but this time in favour of $H_A: \beta > 0$ rather than just $\beta \neq 0$.
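    To see where the two critical values come from, here is a small sketch (assuming SciPy; not part of the lecture):

```python
from scipy import stats

df = 28
two_tail_5pct = stats.t.ppf(0.975, df)  # approx 2.048: 2.5% in each tail
one_tail_5pct = stats.t.ppf(0.95, df)   # approx 1.701: 5% in the upper tail
```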


  • Inference in the CLRM 19/33

    When reporting regression results it is usual to include the standard errors of the estimates in parentheses below the estimates themselves. For the money-demand example we have

    $$\hat{Y} = \underset{(0.1170)}{0.0212} + \underset{(0.0180)}{0.1749}\, X,$$

    where figures in parentheses denote standard errors.

    It is important to state what the numbers in parentheses actually are, because sometimes the t-ratios (for testing whether the coefficient is zero) are reported instead. For example:

    $$\hat{Y} = \underset{(0.1812)}{0.0212} + \underset{(9.7409)}{0.1749}\, X,$$

    where figures in parentheses denote t-ratios.


  • Inference in the CLRM 20/33

    The advantage of reporting standard errors is that they can be used to test other hypotheses, e.g. $H_0: \beta = 1$, without having to derive the standard error from the t-ratio.

    The advantage of reporting t-ratios is that you can test immediately for significance by comparing the t-ratio with the appropriate critical value. The disadvantage of reporting t-ratios is that to test other hypotheses you need to derive the standard error.

    For example, if we want the standard error from the t-ratio, we obtain it as follows:

    $$TS = \frac{b}{s_b} \;\Longrightarrow\; s_b = \frac{b}{TS}.$$

    (In the money-demand example, $s_b = 0.17485 / 9.7409 = 0.01795$.)


  • Summary of Computational Procedure 21/33

    We can summarise the computations required for OLS estimation of the two-variable regression model as follows:

    Step 1. Compute the quantities

    $$\sum X_i, \quad \sum Y_i, \quad \sum X_i^2, \quad \sum Y_i^2, \quad \sum X_i Y_i.$$

    Step 2. Compute similar quantities for the deviations from sample means:

    $$\sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n},$$

    $$\sum y_i^2 = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n},$$

    $$\sum x_i y_i = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}.$$


  • Summary of Computational Procedure 22/33

    Step 3. Compute the OLS estimates:

    $$b = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad a = \frac{\sum Y_i}{n} - b\, \frac{\sum X_i}{n}.$$

    Step 4. Compute the sum of squared residuals and $s^2$:

    $$\sum e_i^2 = \sum y_i^2 - b \sum x_i y_i, \qquad s^2 = \frac{\sum e_i^2}{n - 2}.$$

    Step 5. Compute the estimated variances of $a$ and $b$:

    $$s_a^2 = \frac{s^2 \sum X_i^2}{n \sum x_i^2}, \qquad s_b^2 = \frac{s^2}{\sum x_i^2}.$$

    Step 6. Compute $R^2$:

    $$R^2 = \frac{b^2 \sum x_i^2}{\sum y_i^2} \qquad \text{or} \qquad R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2}.$$
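    As a minimal end-to-end sketch of Steps 1–6 (assuming NumPy arrays X and Y; the function name and return structure are illustrative, not part of the lecture):

```python
import numpy as np

def ols_two_variable(X, Y):
    """Steps 1-6 of the computational procedure for the two-variable CLRM."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n = X.size
    # Steps 1-2: raw sums and deviations from sample means
    sum_x2 = np.sum(X**2) - np.sum(X)**2 / n
    sum_y2 = np.sum(Y**2) - np.sum(Y)**2 / n
    sum_xy = np.sum(X * Y) - np.sum(X) * np.sum(Y) / n
    # Step 3: OLS estimates
    b = sum_xy / sum_x2
    a = np.sum(Y) / n - b * np.sum(X) / n
    # Step 4: residual sum of squares and s^2
    sum_e2 = sum_y2 - b * sum_xy
    s2 = sum_e2 / (n - 2)
    # Step 5: estimated variances of a and b
    s2_a = s2 * np.sum(X**2) / (n * sum_x2)
    s2_b = s2 / sum_x2
    # Step 6: coefficient of determination
    r2 = 1 - sum_e2 / sum_y2
    return {"a": a, "b": b, "s2": s2,
            "se_a": np.sqrt(s2_a), "se_b": np.sqrt(s2_b), "R2": r2}
```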


  • Non-Linear Models 23/33

    The above inferential procedures apply equally well to non-linear models that have been linearised in an appropriate way.

    For example, in Lecture 13 we used data on the price ($X$) of, and demand ($Y$) for, carrots, obtaining

    $$\widehat{\ln(Y)} = 6.73 - 0.744 \ln(X).$$

    The estimated price elasticity of demand is $-0.744$, but suppose we want to test the hypothesis that it is equal to $-1$ against the alternative that it is greater than $-1$. How do we go about doing this?


  • Non-Linear Models 24/33

    Writing the population regression as

    $$\ln(Y) = \alpha + \beta \ln(X) + \epsilon,$$

    we therefore wish to test

    $$H_0: \beta = -1 \quad \text{against} \quad H_A: \beta > -1.$$

    Denoting the OLS estimate of $\beta$ by $b$, we use the statistic

    $$TS = \frac{b - (-1)}{s_b} = \frac{b + 1}{s_b} \sim t_{n-2} \quad \text{under } H_0.$$

    Here, $n = 30$ and the upper one-tail 5% critical value from the $t_{28}$ distribution is 1.701.


  • Non-Linear Models 25/33

    The decision rule is:

    If $TS > 1.701$, reject $H_0$ in favour of $H_A$;

    if $TS < 1.701$, do not reject $H_0$ (reserve judgment).

    We find that $s_b = 0.0896$ and so

    $$TS = \frac{-0.744 + 1}{0.0896} = 2.857.$$

    Clearly $TS = 2.857 > 1.701$ and so we reject $H_0: \beta = -1$ in favour of $H_A: \beta > -1$, i.e. there is evidence to suggest that the price elasticity of demand for carrots is greater than $-1$ (so that the demand for carrots is price inelastic).
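    A short sketch of this one-tail test (the numbers are taken from the slides; SciPy is assumed for the critical value):

```python
from scipy import stats

b, s_b, n = -0.744, 0.0896, 30
beta_0 = -1.0                          # value of beta under H0
ts = (b - beta_0) / s_b                # (b + 1)/s_b, approx 2.857
t_crit = stats.t.ppf(0.95, df=n - 2)   # upper one-tail 5% value, approx 1.701
reject_h0 = ts > t_crit                # True: elasticity greater than -1
```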


  • Non-Linear Models 26/33

    Inference for non-linear models therefore proceeds in the same way as for linear models, the only difference being that the linearised model is written in terms of transformed variables.

    For example, if $Y = AX^{\beta}$ then $\ln(Y) = \ln(A) + \beta \ln(X)$, and so the regression equation is

    $$\ln(Y) = \alpha + \beta \ln(X) + \epsilon \qquad \text{or} \qquad Y^* = \alpha + \beta X^* + \epsilon,$$

    where $Y^* = \ln(Y)$ and $X^* = \ln(X)$.

    It is important that all computations are carried out using the transformed variables, $Y^*$ and $X^*$, that appear in the linearised model.


  • Prediction 27/33

    Suppose we estimate a linear regression between two variables, $Y$ and $X$, using a sample of $n$ observations. The population regression is

    $$Y_i = \alpha + \beta X_i + \epsilon_i, \quad i = 1, \ldots, n,$$

    while our sample regression is

    $$\hat{Y}_i = a + b X_i, \quad i = 1, \ldots, n.$$

    Now suppose a new observation on $X$ becomes available, but not the corresponding value of $Y$. Can we use our sample regression to predict the corresponding value of $Y$? If so, what are the properties of the prediction?


  • Prediction 28/33

    For simplicity, denote the new value of $X$ by $X_0$. Our predicted value of $Y_0$ is obtained by plugging $X_0$ into the sample regression:

    $$\hat{Y}_0 = a + b X_0.$$

    However, the true but unknown value of $Y_0$ will be determined (by assumption) by the population regression:

    $$Y_0 = \alpha + \beta X_0 + \epsilon_0.$$

    The difference between $\hat{Y}_0$ and $Y_0$ is called the prediction error or forecast error, and is denoted $f$.


  • Prediction 29/33

    The value of $f$ is given by

    $$f = \hat{Y}_0 - Y_0 = a + b X_0 - (\alpha + \beta X_0 + \epsilon_0) = (a - \alpha) + (b - \beta) X_0 - \epsilon_0.$$

    It therefore depends on three things:
    (i) the estimation errors, $a - \alpha$ and $b - \beta$;
    (ii) the regressor, $X_0$; and
    (iii) the disturbance, $\epsilon_0$.

    Only the second of these (the value of $X_0$) is known, although the distributional properties of the remainder are known (under Classical assumptions).


  • Prediction 30/33

    Under the full set of Classical assumptions it can be shown that $f \sim N(0, \sigma_f^2)$, where

    $$\sigma_f^2 = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right]$$

    and $\bar{X}$ and $\sum x_i^2$ are obtained from the original $n$ observations.

    The variance can be estimated using

    $$s_f^2 = s^2 \left[ 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2} \right],$$

    where $s^2$ is the usual estimator of $\sigma^2$.


  • Prediction 31/33

    Note that

    $$\frac{f}{\sigma_f} = \frac{\hat{Y}_0 - Y_0}{\sigma_f} \sim N(0, 1) \qquad \text{and} \qquad \frac{f}{s_f} = \frac{\hat{Y}_0 - Y_0}{s_f} \sim t_{n-2},$$

    the latter forming a basis for inference.

    For example, a 95% confidence interval for $f$ is of the form

    $$0 \pm t_{0.025}\, s_f \qquad \text{or} \qquad [-t_{0.025}\, s_f, \; t_{0.025}\, s_f],$$

    while for $Y_0$ it is of the form

    $$\hat{Y}_0 \pm t_{0.025}\, s_f \qquad \text{or} \qquad [\hat{Y}_0 - t_{0.025}\, s_f, \; \hat{Y}_0 + t_{0.025}\, s_f].$$

    The confidence intervals get wider as $X_0$ gets further from $\bar{X}$ (the term $(X_0 - \bar{X})^2$ enters $s_f^2$), as depicted in the next diagram:


  • Prediction 32/33

    It is apparent from the diagram that the accuracy of the prediction is greatest for values of $X_0$ closest to $\bar{X}$, because here the width of the CI is smallest.
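    A minimal sketch of the prediction interval for $Y_0$ (assuming the quantities $a$, $b$, $s^2$, $\bar{X}$ and $\sum x_i^2$ have already been computed from the original sample; the function name and signature are illustrative):

```python
import numpy as np
from scipy import stats

def prediction_interval(x0, a, b, s2, n, x_bar, sum_x2, level=0.95):
    """Point prediction and CI for Y0 at a new regressor value x0."""
    y0_hat = a + b * x0
    s_f = np.sqrt(s2 * (1 + 1 / n + (x0 - x_bar) ** 2 / sum_x2))
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return y0_hat, (y0_hat - t_crit * s_f, y0_hat + t_crit * s_f)
```

    The interval returned here widens as x0 moves away from x_bar, matching the fan shape described above.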


  • Summary 33/33

    Summary

    Inference in the CLRM
    Non-linear models and prediction

    Next week: Multiple linear regression

