
HETEROSCEDASTICITY: WEIGHTED AND LOGARITHMIC REGRESSIONS

This sequence presents two methods for dealing with the problem of heteroscedasticity. We will start with the general case, where the variance of the distribution of the disturbance term in observation $i$ is $\sigma_{u_i}^2$.

$$Y_i = \beta_1 + \beta_2 X_i + u_i, \qquad \operatorname{var}(u_i) = \sigma_{u_i}^2, \text{ not constant for all } i$$

If we knew $\sigma_{u_i}$ in each observation, we could derive a homoscedastic model by dividing the equation through by it.

$$\frac{Y_i}{\sigma_{u_i}} = \beta_1 \frac{1}{\sigma_{u_i}} + \beta_2 \frac{X_i}{\sigma_{u_i}} + \frac{u_i}{\sigma_{u_i}}$$

The population variance of the disturbance term in the revised model is now equal to 1 in all observations, and so the disturbance term is homoscedastic.


$$\operatorname{var}\left(\frac{u_i}{\sigma_{u_i}}\right) = \frac{1}{\sigma_{u_i}^2}\operatorname{var}(u_i) = \frac{\sigma_{u_i}^2}{\sigma_{u_i}^2} = 1$$

In the revised model, we regress $Y'$ on $X'$ and $H$, as defined. Note that there is no intercept in the revised model. $\beta_1$ becomes the slope coefficient of the artificial variable $1/\sigma_{u_i}$.

$$Y_i' = \frac{Y_i}{\sigma_{u_i}}, \qquad X_i' = \frac{X_i}{\sigma_{u_i}}, \qquad H_i = \frac{1}{\sigma_{u_i}}, \qquad u_i' = \frac{u_i}{\sigma_{u_i}}$$

$$Y_i' = \beta_1 H_i + \beta_2 X_i' + u_i'$$

The revised model is described as a weighted regression model because we are weighting observation $i$ by a factor $1/\sigma_{u_i}$. Note that we are automatically giving the highest weights to the most reliable observations (those with the lowest values of $\sigma_{u_i}$).
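The weighting can be sketched in a few lines of Python. This is only an illustration, assuming that the standard deviation of the disturbance term is known for every observation; the function name `weighted_fit` and all the numbers are hypothetical and do not come from the slides. Each variable is divided by the standard deviation and the revised model is fitted without an intercept.

```python
def weighted_fit(y, x, sigma):
    """Fit Y' = b1*H + b2*X' with no intercept, where every variable is the
    original one divided by the (assumed known) standard deviation sigma_i."""
    h = [1.0 / s for s in sigma]                 # H_i  = 1 / sigma_i
    xp = [xi / s for xi, s in zip(x, sigma)]     # X'_i = X_i / sigma_i
    yp = [yi / s for yi, s in zip(y, sigma)]     # Y'_i = Y_i / sigma_i

    # Normal equations for two regressors and no intercept.
    shh = sum(a * a for a in h)
    sxx = sum(a * a for a in xp)
    shx = sum(a * b for a, b in zip(h, xp))
    shy = sum(a * b for a, b in zip(h, yp))
    sxy = sum(a * b for a, b in zip(xp, yp))
    det = shh * sxx - shx * shx
    b1 = (shy * sxx - sxy * shx) / det
    b2 = (sxy * shh - shy * shx) / det
    return b1, b2


# Hypothetical data: observations with larger sigma receive less weight.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sigma = [0.5, 0.8, 1.0, 1.5, 2.5, 3.0]
u = [0.2, -0.5, 0.9, -1.6, 2.0, -2.1]            # made-up disturbances
y = [10.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

b1, b2 = weighted_fit(y, x, sigma)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The coefficient on `h` plays the role of the original intercept, and the coefficient on the scaled X variable is the original slope.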



Of course, in practice we do not know the value of $\sigma_{u_i}$ in each observation. However, it may be reasonable to suppose that it is proportional to some measurable variable, $Z_i$.

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Assumption: $\sigma_{u_i} = \lambda Z_i$

If this is the case, we can make the model homoscedastic by dividing through by $Z_i$.

$$\frac{Y_i}{Z_i} = \beta_1 \frac{1}{Z_i} + \beta_2 \frac{X_i}{Z_i} + \frac{u_i}{Z_i}$$

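The disturbance term in the revised model has constant variance $\lambda^2$. We do not need to know the value of $\lambda^2$. The crucial point is that, by assumption, it is constant.

$$\operatorname{var}\left(\frac{u_i}{Z_i}\right) = \frac{1}{Z_i^2}\operatorname{var}(u_i) = \frac{\sigma_{u_i}^2}{Z_i^2} = \frac{\lambda^2 Z_i^2}{Z_i^2} = \lambda^2$$

$$Y_i' = \beta_1 H_i + \beta_2 X_i' + u_i', \qquad Y_i' = \frac{Y_i}{Z_i}, \quad X_i' = \frac{X_i}{Z_i}, \quad H_i = \frac{1}{Z_i}, \quad u_i' = \frac{u_i}{Z_i}$$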
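A small simulation can make the effect of scaling by $Z_i$ concrete. The sketch below is an assumption-laden illustration, not part of the slides: the value of lambda, the range of Z, the sample size and the helper `mean_square` are all made up. It draws disturbances whose standard deviation is lambda times Z, then compares the spread in the high-Z and low-Z halves of the sample before and after dividing by Z.

```python
import random

rng = random.Random(0)      # fixed seed so that the run is repeatable
lam = 2.0                   # hypothetical value of lambda
n = 10000

z = sorted(rng.uniform(1.0, 10.0) for _ in range(n))
u = [rng.gauss(0.0, lam * zi) for zi in z]        # sd of u_i is lambda * Z_i
u_scaled = [ui / zi for ui, zi in zip(u, z)]      # disturbance of the revised model


def mean_square(values):
    """Average squared value: estimates the variance of a zero-mean series."""
    return sum(v * v for v in values) / len(values)


half = n // 2
ratio_raw = mean_square(u[half:]) / mean_square(u[:half])
ratio_scaled = mean_square(u_scaled[half:]) / mean_square(u_scaled[:half])

print(f"high-Z / low-Z variance ratio, original disturbance: {ratio_raw:.2f}")
print(f"high-Z / low-Z variance ratio, scaled disturbance:   {ratio_scaled:.2f}")
print(f"variance of scaled disturbance: {mean_square(u_scaled):.2f}, lambda^2 = {lam ** 2}")
```

The first ratio should be well above 1 and the second close to 1, with the variance of the scaled disturbance close to lambda squared.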

We will illustrate this procedure with the UNIDO data on manufacturing output and GDP. We will try scaling by population. A regression of manufacturing output per capita on GDP per capita is less likely to be subject to heteroscedasticity.

[Figure: scatter diagram of manufacturing output against GDP]

Here is the revised scatter diagram. Does it look homoscedastic? Actually, no. This is still a classic pattern of heteroscedasticity.

[Figure: scatter diagram of manufacturing output per capita against GDP per capita]


RSS2 is much larger than RSS1.

$RSS_1 = 5{,}378{,}000 \qquad RSS_2 = 17{,}362{,}000$


However, the subsamples are small and high ratios can occur on a pure chance basis. The null hypothesis of homoscedasticity is only just rejected at the 5% level.

$$F(n_2 - k,\, n_1 - k) = \frac{RSS_2/(n_2 - k)}{RSS_1/(n_1 - k)} = \frac{17{,}362{,}000/9}{5{,}378{,}000/9} = 3.23 \qquad F_{\text{crit},\,5\%}(9, 9) = 3.18$$
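The arithmetic of the test statistic can be reproduced directly from the reported residual sums of squares. In this sketch the function name `gq_f` is hypothetical; the RSS values, the 9 degrees of freedom in each subsample and the 5 percent critical value of 3.18 are the ones quoted in the slides.

```python
def gq_f(rss_num, rss_den, df_num, df_den):
    """Ratio of two residual sums of squares, each divided by its degrees of freedom."""
    return (rss_num / df_num) / (rss_den / df_den)


rss1, rss2 = 5_378_000, 17_362_000    # reported for the per capita regression
df = 9                                # n1 - k = n2 - k = 9
f_crit_5pct = 3.18                    # critical value quoted in the slides

f_stat = gq_f(rss2, rss1, df, df)
print(f"F({df},{df}) = {f_stat:.2f}")
print("homoscedasticity rejected at 5%:", f_stat > f_crit_5pct)
```

With equal subsample sizes the degrees of freedom cancel, so the statistic is simply the ratio of the two residual sums of squares.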

Often the X variable itself is a suitable scaling variable. After all, the Goldfeld–Quandt test assumes that the standard deviation of the disturbance term is proportional to it.

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Assumption: $\sigma_{u_i} = \lambda X_i$

Note that when we scale through by it, the $\beta_2$ term becomes the intercept in the revised model.

$$\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + \frac{u_i}{X_i}$$

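It follows that when we interpret the regression results, the slope coefficient is an estimate of $\beta_1$ in the original model and the intercept is an estimate of $\beta_2$.

$$\operatorname{var}\left(\frac{u_i}{X_i}\right) = \frac{1}{X_i^2}\operatorname{var}(u_i) = \frac{\sigma_{u_i}^2}{X_i^2} = \frac{\lambda^2 X_i^2}{X_i^2} = \lambda^2$$

$$Y_i' = \beta_2 + \beta_1 H_i + u_i', \qquad Y_i' = \frac{Y_i}{X_i}, \quad H_i = \frac{1}{X_i}, \quad u_i' = \frac{u_i}{X_i}$$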
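The exchange of roles between intercept and slope is easy to verify numerically. The sketch below uses made-up parameter values and omits the disturbance term so that only the algebra is on show; `simple_ols` is a hypothetical helper, not a function from any library.

```python
def simple_ols(x, y):
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


beta1, beta2 = 600.0, 0.19              # hypothetical original intercept and slope
x = [1000.0, 2000.0, 5000.0, 10000.0, 40000.0]
y = [beta1 + beta2 * xi for xi in x]    # disturbance omitted on purpose

h = [1.0 / xi for xi in x]                      # H  = 1 / X
y_scaled = [yi / xi for yi, xi in zip(y, x)]    # Y' = Y / X

intercept, slope = simple_ols(h, y_scaled)
print(f"intercept of scaled regression = {intercept:.4f} (original slope = {beta2})")
print(f"slope of scaled regression     = {slope:.1f} (original intercept = {beta1})")
```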

Here is the corresponding scatter diagram. Is there any evidence of heteroscedasticity?

[Figure: scatter diagram of manufacturing/GDP against 1/GDP × 1,000,000]


No longer. The residual sums of squares for the two subsamples are almost identical, indeed closer than one would usually expect on a pure chance basis under the null hypothesis.

$RSS_1 = 0.065 \qquad RSS_2 = 0.070$


As a consequence, the F statistic is not significant. The heteroscedasticity has been eliminated.

$$F(n_2 - k,\, n_1 - k) = \frac{RSS_2/(n_2 - k)}{RSS_1/(n_1 - k)} = \frac{0.070/9}{0.065/9} = 1.08 \qquad F_{\text{crit},\,5\%}(9, 9) = 3.18$$
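The whole sequence (test, scale through by X, test again) can be sketched on synthetic data. Everything here is an assumption made for illustration: the 28 observations, the subsamples of 11, the parameter values and the fixed disturbance pattern are invented, and the helpers `ols_rss` and `goldfeld_quandt` are hypothetical. The disturbance is built to be proportional to X, so the unscaled regression should fail the test and the scaled one should pass.

```python
def ols_rss(x, y):
    """Residual sum of squares from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(order_by, regressor, y, n_sub):
    """RSS2 / RSS1 for the n_sub observations with the largest and the smallest
    values of order_by. The subsamples are the same size, so the degrees of
    freedom cancel in the ratio."""
    rows = sorted(zip(order_by, regressor, y))
    low, high = rows[:n_sub], rows[-n_sub:]
    rss1 = ols_rss([r[1] for r in low], [r[2] for r in low])
    rss2 = ols_rss([r[1] for r in high], [r[2] for r in high])
    return rss2 / rss1


# Synthetic sample: the disturbance is 0.1 * X times a fixed repeating pattern,
# so its spread is proportional to X by construction.
pattern = [1.0, -1.0, 0.5, -0.5]
x = [10.0 + 5.0 * i for i in range(28)]
u = [0.1 * xi * pattern[i % 4] for i, xi in enumerate(x)]
y = [5.0 + 0.2 * xi + ui for xi, ui in zip(x, u)]

f_original = goldfeld_quandt(x, x, y, n_sub=11)

# Scale through by X: regress Y/X on 1/X, keeping the subsamples ordered by X.
h = [1.0 / xi for xi in x]
y_scaled = [yi / xi for yi, xi in zip(y, x)]
f_scaled = goldfeld_quandt(x, h, y_scaled, n_sub=11)

print(f"F before scaling: {f_original:.2f}")
print(f"F after scaling:  {f_scaled:.2f}")
```

With 11 observations and 2 parameters in each subsample, both statistics have 9 and 9 degrees of freedom and can be compared with the critical value of 3.18 used in the slides.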

We will now consider an alternative approach to the problem. It is possible that the heteroscedasticity has been caused by an inappropriate mathematical specification. Suppose, in particular, that the true relationship is in fact logarithmic.

[Figure: scatter diagram of manufacturing output against GDP]


Here is the corresponding scatter diagram. No sign of heteroscedasticity.

[Figure: scatter diagram of log manufacturing output against log GDP]


We confirm this with the Goldfeld–Quandt test. In this case there is no point in calculating the conventional test statistic. RSS2 is smaller than RSS1, so it cannot be significantly greater than RSS1.

$RSS_1 = 2.140 \qquad RSS_2 = 1.037$


In this situation we should test whether there is evidence that the standard deviation of the disturbance term is inversely proportional to the X variable. For this purpose, the F statistic is the inverse of the conventional one.


The null hypothesis of homoscedasticity is not rejected.

$$F(n_1 - k,\, n_2 - k) = \frac{RSS_1/(n_1 - k)}{RSS_2/(n_2 - k)} = \frac{2.140/9}{1.037/9} = 2.06 \qquad F_{\text{crit},\,5\%}(9, 9) = 3.18$$

Now an additive disturbance term in the logarithmic model is equivalent to a multiplicative one in the original model.

$$\log Y = \beta_1 + \beta_2 \log X + u$$

$$Y = e^{\beta_1} X^{\beta_2} e^{u}$$
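The equivalence of the two forms, and what it implies for the size of the disturbance in natural units, can be checked with a short calculation. The parameter values and the single disturbance value below are hypothetical, chosen only to illustrate the algebra.

```python
import math

beta1, beta2 = -1.7, 1.0     # hypothetical parameters of the logarithmic model
u = 0.3                      # one fixed value of the additive log disturbance

rows = []
for x in (1_000.0, 100_000.0):
    y_from_log = math.exp(beta1 + beta2 * math.log(x) + u)   # additive form, exponentiated
    y_mult = math.exp(beta1) * x ** beta2 * math.exp(u)      # multiplicative form
    y_no_u = math.exp(beta1) * x ** beta2                    # same point with u = 0
    rows.append((x, y_from_log, y_mult, y_mult - y_no_u))

for x, y_from_log, y_mult, effect in rows:
    print(f"X = {x:>9.0f}: Y = {y_from_log:10.1f} = {y_mult:10.1f}, "
          f"absolute effect of u = {effect:8.1f}")
```

The same log disturbance shifts Y by a fixed proportion, so its absolute effect is 100 times larger at the larger value of X in this example.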

This means that the absolute size of the effect of the disturbance term is large for large values of the X variable and small for small ones, when the scatter diagram is redrawn with the variables in their original form.

[Figure: scatter diagram of manufacturing output against GDP]

For example, Singapore and South Korea have relatively large manufacturing sectors, and Greece and Mexico relatively small ones.

[Figure: scatter diagram of log manufacturing output against log GDP, with South Korea, Mexico, Singapore and Greece labelled]


The variations for these countries are similar when plotted on the logarithmic scale, but those for South Korea and Mexico are much larger when the variables are plotted in natural units.

[Figure: scatter diagram of manufacturing output against GDP, with South Korea, Mexico, Singapore and Greece labelled]


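Here is a summary of the regressions using the four alternative specifications of the model (standard errors in parentheses).

$$\widehat{MANU} = \underset{(5700)}{604} + \underset{(0.013)}{0.194}\,GDP \qquad R^2 = 0.89$$

$$\widehat{\frac{MANU}{POP}} = \underset{(1371)}{612}\,\frac{1}{POP} + \underset{(0.016)}{0.182}\,\frac{GDP}{POP} \qquad R^2 = 0.70$$

$$\widehat{\frac{MANU}{GDP}} = \underset{(0.019)}{0.189} + \underset{(841)}{533}\,\frac{1}{GDP} \qquad R^2 = 0.02$$

$$\widehat{\log MANU} = \underset{(0.785)}{-1.694} + \underset{(0.066)}{0.999}\,\log GDP \qquad R^2 = 0.90$$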


The first regression suggests that, for every increase of $1 million in GDP, manufacturing output increases by $194,000. Thus, at the margin, manufacturing accounts for 0.19 of GDP. The intercept does not have any plausible meaning.


However, this regression was subject to severe heteroscedasticity. Although the estimate of the coefficient of GDP is unbiased, it is likely to be relatively inaccurate. Also, and this is a separate effect of heteroscedasticity, the standard errors, t tests and F test are invalid.


In the second regression, the estimate of the slope coefficient was a little lower. However for this regression also the null hypothesis of homoscedasticity was rejected, but only at the 5% level.


In the third regression the model was scaled through by GDP. As a consequence, the intercept became an estimator of the original slope coefficient, and vice versa.


For this model the null hypothesis of homoscedasticity was not rejected. In principle, therefore, it should yield more accurate estimates of the coefficients than the first two, and we are able to perform tests.


For the logarithmic model also the null hypothesis of homoscedasticity was not rejected. So we have two models which survive the Goldfeld–Quandt test. Which do you prefer? Think about it.


You probably went for the logarithmic model, attracted by the high $R^2$. However, in this example, there is little to choose between the third and fourth models. Substantively, they have the same interpretation.


In the third model, 1/GDP has a low t statistic and appears to be an irrelevant variable. The model is telling us that manufacturing output, as a proportion of GDP, is constant. Because it is constant, $R^2$ is effectively 0.
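As a rough check, the t statistics can be worked out from the coefficients and standard errors reported for the third regression. The comparison with a critical value of roughly 2 is an approximation for a sample of this size, not a figure taken from the slides.

```latex
t_{1/GDP} = \frac{533}{841} \approx 0.63,
\qquad
t_{\text{intercept}} = \frac{0.189}{0.019} \approx 9.9
```

The coefficient of 1/GDP is far from significant, while the intercept, which estimates the original slope coefficient, is highly significant.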


The fourth model is telling us that the elasticity of manufacturing output with respect to GDP is equal to 1. In other words, manufacturing output increases proportionally with GDP and remains a constant proportion of it.
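The claim of unit elasticity can be checked informally with a t statistic for the hypothesis that the coefficient of log GDP equals 1, using the estimate and standard error reported for the logarithmic regression. This worked calculation is an addition for illustration and does not appear in the slides.

```latex
t = \frac{0.999 - 1}{0.066} \approx -0.015
```

The estimate is a tiny fraction of one standard error away from 1, so the hypothesis of unit elasticity would not be rejected at any conventional significance level.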


Converting the logarithmic equation back into natural units, you obtain the equation shown. Like the third equation, it implies that manufacturing output accounts for a little over 0.18 of GDP, at the margin.

$$\widehat{MANU} = e^{-1.694}\,GDP^{0.999} = 0.184\,GDP^{0.999}$$
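The conversion and the marginal share can be reproduced with a short calculation. The intercept and elasticity are the estimates reported for the logarithmic regression; the GDP levels at which the margin is evaluated and the function name `marginal_share` are illustrative assumptions.

```python
import math

intercept, elasticity = -1.694, 0.999    # fitted logarithmic regression

scale = math.exp(intercept)              # multiplicative constant in natural units
print(f"MANU = {scale:.3f} * GDP^{elasticity}")


def marginal_share(gdp):
    """Derivative of scale * GDP**elasticity with respect to GDP."""
    return scale * elasticity * gdp ** (elasticity - 1.0)


for gdp in (10_000.0, 100_000.0, 1_000_000.0):    # illustrative GDP levels
    print(f"GDP = {gdp:>11,.0f}: marginal share = {marginal_share(gdp):.4f}")
```

Because the elasticity is so close to 1, the marginal share barely changes across the range of GDP.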



Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 7.3 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.

2012.11.10
