ECO375 Tutorial 7 Heteroscedasticity Matt Tudball University of Toronto Mississauga November 9, 2017 Matt Tudball (University of Toronto) ECO375H5 November 9, 2017 1 / 24


ECO375 Tutorial 7: Heteroscedasticity

Matt Tudball

University of Toronto Mississauga

November 9, 2017


Review: Heteroscedasticity

Consider again the standard multiple regression model in which an outcome $y_i$ is linearly related to some explanatory variables $x_{i1}, x_{i2}, \dots, x_{ik}$ such that

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$$

Recall that an assumption needed for efficiency of the OLS estimator and for the construction of t-tests, F-tests and valid confidence intervals was MLR.5:

MLR.5 The error $u_i$ is homoscedastic, i.e. it has the same variance for all values of the explanatory variables: $\mathrm{Var}(u_i \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma^2$

This says that the unobservable error term $u_i$ has the same variance for all observations $i$.


This is a fairly strong assumption and there are many circumstances in which it will not hold in practice.

A classic example where this assumption may fail is the relationship between income and food expenditure. Poor people will generally purchase inexpensive food and so the variance of their food expenditure is low. Rich people, however, may purchase expensive or inexpensive food depending on their tastes and so the variance of their food expenditure is high.

So for $\mathit{foodexp}_i = \beta_0 + \beta_1 \mathit{income}_i + u_i$ we will generally find that $\mathrm{Var}(u_i \mid \mathit{income}_i)$ becomes larger as $\mathit{income}_i$ increases.


Let’s give this notion a formal definition.

We define heteroscedasticity as allowing the variance of $u_i$ to vary across $x_i$ such that $\mathrm{Var}(u_i \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma_i^2$, where the $i$ subscript is what allows the variance to differ across observations.

While heteroscedasticity does not affect the consistency or unbiasedness of the OLS estimator, it does make the OLS estimator inefficient. It also invalidates the usual OLS standard errors, which means that our t-statistics and F-statistics may not follow their respective distributions.
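To see where the usual standard errors go wrong, it may help to look at the simple regression case $y_i = \beta_0 + \beta_1 x_i + u_i$. The following is only a sketch: it additionally assumes random sampling, so that the errors are uncorrelated across observations, and $SST_x = \sum_{i=1}^n (x_i - \bar{x})^2$ is notation introduced here for illustration.

```latex
\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x})\,u_i}{SST_x}
\quad\Longrightarrow\quad
\mathrm{Var}(\hat{\beta}_1 \mid x) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2\,\sigma_i^2}{SST_x^2}
```

Only when $\sigma_i^2 = \sigma^2$ for every $i$ does this collapse to the familiar $\sigma^2 / SST_x$ on which the usual OLS standard error is based.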

Matt Tudball (University of Toronto) ECO375H5 November 9, 2017 5 / 24

Page 6: ECO375 Tutorial 7 Heteroscedasticity - … Tutorial 7 Heteroscedasticity Matt Tudball University of Toronto Mississauga November 9, 2017 Matt Tudball (University of Toronto) ECO375H5

Generalised Least Squares

So what can we do about this?

Consider the base model

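$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i \qquad (1)$$

where $\mathrm{Var}(u_i \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma_i^2$.

We can think that there may exist a transformation $p_i$ such that a “transformed” error term $u_i^* = p_i u_i$ has homoscedastic variance

$$\mathrm{Var}(u_i^* \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma^2$$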


Let’s say that we know the form of $p_i$ for each observation $i$.

Then in principle we could transform our data by multiplying both sides of the base model (1) by $p_i$ such that

$$p_i y_i = \beta_0 p_i + \beta_1 p_i x_{i1} + \dots + \beta_k p_i x_{ik} + p_i u_i$$

$$y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \dots + \beta_k x_{ik}^* + u_i^*$$

where $x_{i0}^* = p_i$, $x_{ij}^* = p_i x_{ij}$ and $\mathrm{Var}(u_i^* \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma^2$, which satisfies MLR.5.

The estimator obtained by running OLS on this transformed model is known as the Generalised Least Squares (GLS) estimator.

We will often assume that $p_i = 1/\sqrt{h_i}$ for some function $h_i$ since this allows us to assume a form of the heteroscedasticity $\sigma_i^2 = \sigma^2 h_i$ (note that this is purely for notational convenience).


Generalised Least Squares: Example

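Consider a model which explores the relationship between house prices $\mathit{price}_i$, square footage $\mathit{sqrft}_i$, lot size $\mathit{lotsize}_i$ and number of bedrooms $\mathit{bdrms}_i$

$$\mathit{price}_i = \beta_0 + \beta_1 \mathit{sqrft}_i + \beta_2 \mathit{lotsize}_i + \beta_3 \mathit{bdrms}_i + u_i \qquad (2)$$

and consider a possible form of heteroscedasticity $\sigma_i^2 = \sigma^2 \mathit{sqrft}_i$.

This indicates that the variance of $\mathit{price}_i$ given $\mathit{sqrft}_i$, $\mathit{lotsize}_i$ and $\mathit{bdrms}_i$ is increasing in $\mathit{sqrft}_i$.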


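Now consider the transformation $p_i = 1/\sqrt{\mathit{sqrft}_i}$. Then if we divide both sides of (2) by $\sqrt{\mathit{sqrft}_i}$

$$\frac{\mathit{price}_i}{\sqrt{\mathit{sqrft}_i}} = \frac{\beta_0}{\sqrt{\mathit{sqrft}_i}} + \beta_1 \sqrt{\mathit{sqrft}_i} + \beta_2 \frac{\mathit{lotsize}_i}{\sqrt{\mathit{sqrft}_i}} + \beta_3 \frac{\mathit{bdrms}_i}{\sqrt{\mathit{sqrft}_i}} + \frac{u_i}{\sqrt{\mathit{sqrft}_i}}$$

$$\mathit{price}_i^* = \frac{\beta_0}{\sqrt{\mathit{sqrft}_i}} + \beta_1 \mathit{sqrft}_i^* + \beta_2 \mathit{lotsize}_i^* + \beta_3 \mathit{bdrms}_i^* + u_i^*$$

such that, conditional on the regressors,

$$\mathrm{Var}(u_i^*) = \mathrm{Var}\!\left(\frac{u_i}{\sqrt{\mathit{sqrft}_i}}\right) = \frac{\mathrm{Var}(u_i)}{\mathit{sqrft}_i} = \frac{\sigma^2 \mathit{sqrft}_i}{\mathit{sqrft}_i} = \sigma^2$$

which satisfies MLR.5.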
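The transformation above can be sketched in Stata as follows. This is only an illustration: it assumes HPRICE1.dta (the dataset used in the in-class exercises) is in the working directory, and the variable names `w`, `price_s`, `sqrft_s`, `lotsize_s` and `bdrms_s` are made up here. The last line uses Stata's analytic weights, which should reproduce the coefficient estimates and standard errors of the transformed regression (the reported R-squared can differ).

```stata
* Assumption: HPRICE1.dta is in the current working directory
use HPRICE1.dta, clear

* Manual GLS: multiply every term, including the constant, by 1/sqrt(sqrft)
gen w         = 1/sqrt(sqrft)
gen price_s   = price*w
gen sqrft_s   = sqrft*w
gen lotsize_s = lotsize*w
gen bdrms_s   = bdrms*w
reg price_s w sqrft_s lotsize_s bdrms_s, nocons

* Same estimator via analytic weights proportional to 1/Var(u|x)
reg price sqrft lotsize bdrms [aweight = 1/sqrft]
```

In the manual version the coefficient on `w` is the estimate of $\beta_0$, because the original constant has been multiplied by $1/\sqrt{\mathit{sqrft}_i}$.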


Relationship with Weighted Least Squares

GLS estimators belong to a broader class of so-called Weighted Least Squares (WLS) estimators.

We can think of the transformation $p_i$ as assigning a “weight” to each observation $i$.

Observations with more variance in $u_i$ are going to be weighted down by $p_i$ and observations with less variance are going to be weighted up.

This makes intuitive sense from an efficiency perspective since observations with lower variance in $u_i$ are going to contain more precise information compared to observations with higher variance.

The objective function of the WLS estimator with weights $p_i$ is

$$\sum_{i=1}^n p_i^2 \left(y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik}\right)^2$$
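One way to see the link between the weighting view and the transformed model, sketched under the assumption $p_i = 1/\sqrt{h_i}$ made earlier: the sum of squared residuals of the transformed model is exactly the weighted objective.

```latex
\sum_{i=1}^n \left(p_i y_i - \beta_0 p_i - \beta_1 p_i x_{i1} - \dots - \beta_k p_i x_{ik}\right)^2
= \sum_{i=1}^n p_i^2 \left(y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik}\right)^2
= \sum_{i=1}^n \frac{1}{h_i} \left(y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik}\right)^2
```

So running OLS on the transformed data and minimising the weighted sum of squares give the same estimates, with each squared residual weighted by the inverse of its relative variance $h_i$.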


Feasible Generalised Least Squares: Idea

An obvious drawback of GLS is that the transformation $p_i = 1/\sqrt{h_i}$ which eliminates heteroscedasticity is almost always unknown.

One approach to implementing GLS in practice is to assume a particular functional form for $h_i$ as a function of the explanatory variables $x_i$, usually denoted by $h(x_i)$.

A popular functional form is

$$h(x_i) = \exp(\delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik})$$

This implies a form of the variance of $u_i$ such that

$$\mathrm{Var}(u_i \mid x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik})$$

We can therefore write a model for the variance of the form

$$u_i^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik})\, v_i$$

$$\ln(u_i^2) = \underbrace{\ln(\sigma^2) + \delta_0}_{\alpha_0} + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + e_i$$

where $v_i$ is a multiplicative error term and $e_i = \ln(v_i)$.


Feasible Generalised Least Squares: Implementation

Let’s describe how to implement the Feasible Generalised Least Squares (FGLS) estimator described in the previous slide in practice.

1. Begin by estimating the original OLS model and recover the sample residuals $\hat{u}_i$.

2. Estimate the model

   $$\ln(\hat{u}_i^2) = \alpha_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + e_i$$

3. Set

   $$\hat{h}_i = \exp(\hat{\alpha}_0 + \hat{\delta}_1 x_{i1} + \dots + \hat{\delta}_k x_{ik})$$

4. Transform the original model using the weights

   $$\hat{p}_i = 1/\sqrt{\hat{h}_i}$$

   and estimate by OLS.


In-Class Exercise 1

In this exercise we are going to implement FGLS in Stata. Download the dataset HPRICE1.dta from my website (matthewtudball.com).

    * Step 1: original OLS regression and residuals
    reg price sqrft lotsize bdrms
    predict uhat, residuals
    * Step 2: regress log squared residuals on the regressors
    gen log_uhat2 = ln(uhat^2)
    reg log_uhat2 sqrft lotsize bdrms
    * Step 3: fitted variance function and weights
    predict reshat, xb
    gen hhat = exp(reshat)
    gen phat = 1/sqrt(hhat)
    * Step 4: transform the data and run OLS on the transformed model
    gen pricep = price*phat
    gen sqrftp = sqrft*phat
    gen lotsizep = lotsize*phat
    gen bdrmsp = bdrms*phat
    reg pricep phat sqrftp lotsizep bdrmsp, nocons

Why do I include the variable phat in our final regression? Why do I have nocons as an option in the final line, indicating that I do not want to include a constant term? (Hint: Look at Slide 9, where the house price model is transformed.)


Robust (White) Standard Errors

An alternative to using GLS is to keep the OLS estimates and use so-called robust (White) standard errors.

The idea behind robust standard errors is to use the squared residual $\hat{u}_i^2$ as an estimator for observation $i$’s true variance $\sigma_i^2$.

When this is used to construct an estimator for the variance of $\hat{\beta}_j$, the estimator is consistent.
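For reference, a sketch of the resulting variance estimator. This is the standard heteroscedasticity-robust formula rather than anything stated on the slides, and the symbols are introduced here for illustration: $\hat{r}_{ij}$ is the $i$-th residual from regressing $x_j$ on all other explanatory variables, and $SSR_j$ is the sum of squared residuals from that regression.

```latex
\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^n \hat{r}_{ij}^2\, \hat{u}_i^2}{SSR_j^2}
```

The square root of this quantity is the robust standard error of $\hat{\beta}_j$.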

How does this compare to using GLS?

When the functional form of $h_i$ is correctly specified, GLS is more efficient than robust standard errors.

While robust standard errors may be less efficient, they provide consistent estimators without relying on functional form assumptions.

In practice always use robust standard errors.

To implement robust standard errors in Stata in the regression in In-Class Exercise 1, simply type

    reg price sqrft lotsize bdrms, robust
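A short sketch of the same idea using the more explicit `vce()` syntax, assuming HPRICE1.dta is in the working directory. `vce(robust)` requests the same standard errors as the `robust` option; `vce(hc3)` is shown only as an example of a small-sample variant and is not something the slides discuss.

```stata
* Assumption: HPRICE1.dta is in the current working directory
use HPRICE1.dta, clear

* Usual (non-robust) standard errors, for comparison
reg price sqrft lotsize bdrms

* Heteroscedasticity-robust standard errors (same as the robust option)
reg price sqrft lotsize bdrms, vce(robust)

* A small-sample variant of the robust variance estimator
reg price sqrft lotsize bdrms, vce(hc3)
```

The coefficient estimates are identical across the three regressions; only the standard errors, t-statistics and confidence intervals change.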


Testing for Heteroscedasticity

Since robust standard errors may be inefficient and GLS relies on assumptions that are hard to justify, we generally want to use OLS if we can rule out heteroscedasticity.

Let’s return again to the standard multiple regression model

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$$

We ultimately want to test

$$H_0 : E(u_i^2) = \sigma^2$$

$$H_1 : E(u_i^2) = \sigma_i^2$$

In order to test this in practice, we will need to impose some structure on the form of $\sigma_i^2$ (note that this may weaken the strength of these tests).


Consider instead the alternative hypothesis

$$H_1 : E(u_i^2) = \delta_0 + \delta_1 z_{i1} + \dots + \delta_p z_{ip}$$

where the variables $z_{i1}, \dots, z_{ip}$ remain unspecified for now.

Then we have a more structured form of the test in the previous slide

$$H_0 : \delta_1 = \dots = \delta_p = 0$$

$$H_1 : \delta_j \neq 0 \text{ for some } j = 1, \dots, p$$

Note that since the null hypothesis has multiple restrictions this will ultimately be an F-test.


Let’s consider how to implement a test of this form.

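1. Estimate the restricted model (OLS with homoscedasticity) and obtain the sample residuals $\hat{u}_i$.

2. Use OLS to estimate

   $$\hat{u}_i^2 = \delta_0 + \delta_1 z_{i1} + \dots + \delta_p z_{ip} + e_i$$

   and record its R-squared, $R^2_{\hat{u}^2}$.

3. Under the null hypothesis

   $$\frac{R^2_{\hat{u}^2}/p}{(1 - R^2_{\hat{u}^2})/(n - p - 1)} \sim F_{p,\, n-p-1} \quad \text{and} \quad n R^2_{\hat{u}^2} \sim \chi^2_p$$

   where both distributions hold approximately in large samples.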


Breusch-Pagan (BP) Test for Heteroscedasticity

Which set of $z_i$’s should we use?

The Breusch-Pagan (BP) Test simply uses the set of explanatory variables $x_i$ such that the alternative hypothesis takes the form

$$H_1 : E(u_i^2) = \delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik}$$

and the auxiliary regression is

$$\hat{u}_i^2 = \delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + e_i$$

with the test statistic $n R^2_{\hat{u}^2} \sim \chi^2_k$ under the null hypothesis.


White Test for Heteroscedasticity

Another approach allows for a more flexible relationship between $x_i$ and $\sigma_i^2$ by specifying polynomials in the $x_i$’s and interactions between them.

This is known as the White Test.

Consider the simple model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i$$

White’s version of the test uses the auxiliary regression

$$\hat{u}_i^2 = \delta_0 + \delta_1 x_{i1} + \delta_2 x_{i2} + \delta_3 x_{i1}^2 + \delta_4 x_{i2}^2 + \delta_5 x_{i1} x_{i2} + e_i$$

with the test statistic $n R^2_{\hat{u}^2} \sim \chi^2_5$.


With many regressors, the number of square and interaction terms increases quickly, and the auxiliary regression soon runs into a dimensionality problem.

There is an alternative version of this same test that utilises a restricted combination of squares and interactions of the regressors.

Let the OLS fitted values be

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_k x_{ik}$$

The auxiliary regression then becomes

$$\hat{u}_i^2 = \delta_0 + \delta_1 \hat{y}_i + \delta_2 \hat{y}_i^2 + e_i$$

$$H_0 : \delta_1 = \delta_2 = 0, \qquad n R^2_{\hat{u}^2} \sim \chi^2_2$$

This is “restricted” in the sense that the square and interaction terms of $x_i$ contained in $\hat{y}_i^2$ are assumed to have an identical effect on $\hat{u}_i^2$ (namely, $\delta_2$).

Note that failing to reject the null is not proof of homoscedasticity.


In-Class Exercise 2

In this exercise we will implement both the Breusch-Pagan Test and White Test for heteroscedasticity. Load in the HPRICE1.dta dataset used in the last exercise.

To run the Breusch-Pagan Test type the following code

    reg price sqrft lotsize bdrms
    predict uhat, residuals
    gen uhat2 = uhat^2
    reg uhat2 sqrft lotsize bdrms

Do you reject the null hypothesis of homoscedasticity?
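One way to reach a decision, sketched under the assumption that HPRICE1.dta is in the working directory: compute the $nR^2$ statistic from the auxiliary regression by hand and compare it with Stata's built-in test. The scalar names `lm` and `pval` are made up for this illustration.

```stata
* Assumption: HPRICE1.dta is in the current working directory
use HPRICE1.dta, clear
reg price sqrft lotsize bdrms
predict uhat, residuals
gen uhat2 = uhat^2

* Auxiliary regression; its overall F-statistic is the F version of the test
reg uhat2 sqrft lotsize bdrms

* LM version: n*R-squared, chi-squared with k = e(df_m) degrees of freedom
scalar lm   = e(N)*e(r2)
scalar pval = chi2tail(e(df_m), lm)
display "LM = " lm "   p-value = " pval

* Built-in check: re-fit the original model, then request the N*R2 version
* of the test using the right-hand-side variables
quietly reg price sqrft lotsize bdrms
estat hettest, rhs iid
```

A small p-value is evidence against the null hypothesis of homoscedasticity.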


To run version 1 of the White Test type the following code

    gen sqrft2 = sqrft^2
    gen lotsize2 = lotsize^2
    gen bdrms2 = bdrms^2
    reg uhat2 c.sqrft##c.lotsize c.sqrft##c.bdrms c.lotsize##c.bdrms ///
        sqrft2 lotsize2 bdrms2

Do you reject the null hypothesis of homoscedasticity?

To run version 2 of the White Test type the following code

    reg price sqrft lotsize bdrms
    predict yhat, xb
    gen yhat2 = yhat^2
    reg uhat2 yhat yhat2

Do you reject the null hypothesis of homoscedasticity?
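A sketch of how both versions of the White Test could be checked, again assuming HPRICE1.dta is in the working directory. The scalar name `lm2` is made up here. Version 2 is computed by hand from the auxiliary regression; for version 1, Stata's `estat imtest, white` runs White's general test with all squares and cross products after the original regression.

```stata
* Assumption: HPRICE1.dta is in the current working directory
use HPRICE1.dta, clear
quietly reg price sqrft lotsize bdrms
predict uhat, residuals
predict yhat, xb
gen uhat2 = uhat^2
gen yhat2 = yhat^2

* Version 2: n*R-squared from regressing uhat2 on yhat and yhat2, chi2(2)
quietly reg uhat2 yhat yhat2
scalar lm2 = e(N)*e(r2)
display "White (version 2) LM = " lm2 "   p-value = " chi2tail(2, lm2)

* Version 1: built-in White test after re-fitting the original model
quietly reg price sqrft lotsize bdrms
estat imtest, white
```

With three regressors, version 1 uses nine auxiliary terms (three levels, three squares, three interactions), so its statistic is compared with a chi-squared distribution with nine degrees of freedom.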


In-Class Exercise 3

This is Computer Exercise 8 from Wooldridge Chapter 8. Download the dataset GPA1.dta from my website (matthewtudball.com).

1. Use OLS to estimate a model relating colGPA to hsGPA, ACT, skipped and PC. Obtain (i.e. predict) the OLS residuals.

2. Compute version 2 of the White Test for heteroscedasticity. In the regression of $\hat{u}_i^2$ on $\widehat{\mathit{colGPA}}_i$ and $\widehat{\mathit{colGPA}}_i^2$, obtain the fitted values, say $\hat{h}_i$.


3. Verify that the fitted values from part 2 are all strictly positive. Then, obtain the weighted least squares estimates using weights $1/\hat{h}_i$. Compare the weighted least squares estimates for the effect of skipping lectures and the effect of PC ownership with the corresponding OLS estimates. What about their statistical significance?

4. In the WLS estimation from part 3, obtain the robust standard errors. In other words, allow for the fact that the variance function estimated in part 2 might be misspecified. Do the standard errors change much from part 3?
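A possible outline of the Stata commands for this exercise, offered as a sketch rather than a model answer. It assumes GPA1.dta is in the working directory and contains the variables named in the exercise; the generated variable names (`uhat`, `yhat`, `uhat2`, `yhat2`, `hhat`) are made up here.

```stata
* Assumption: GPA1.dta is in the current working directory
use GPA1.dta, clear

* Part 1: OLS, residuals and fitted values
reg colGPA hsGPA ACT skipped PC
predict uhat, residuals
predict yhat, xb

* Part 2: version 2 of the White Test; keep the fitted values hhat
gen uhat2 = uhat^2
gen yhat2 = yhat^2
reg uhat2 yhat yhat2
predict hhat, xb

* Part 3: check that all fitted variances are strictly positive, then WLS
count if hhat <= 0
reg colGPA hsGPA ACT skipped PC [aweight = 1/hhat]

* Part 4: WLS with heteroscedasticity-robust standard errors
reg colGPA hsGPA ACT skipped PC [aweight = 1/hhat], vce(robust)
```

If `count` reports zero observations, all fitted values are strictly positive and the weights $1/\hat{h}_i$ are well defined.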
