HETEROSCEDASTICITY: WEIGHTED AND LOGARITHMIC REGRESSIONS
This sequence presents two methods for dealing with the problem of heteroscedasticity. We will start with the general case, where the variance of the distribution of the disturbance term in observation i is $\sigma_{u_i}^2$.
$$Y_i = \beta_1 + \beta_2 X_i + u_i, \qquad \operatorname{var}(u_i) = \sigma_{u_i}^2, \text{ not constant for all } i$$
If we knew $\sigma_{u_i}$ in each observation, we could derive a homoscedastic model by dividing the equation through by it.
$$\frac{Y_i}{\sigma_{u_i}} = \beta_1 \frac{1}{\sigma_{u_i}} + \beta_2 \frac{X_i}{\sigma_{u_i}} + \frac{u_i}{\sigma_{u_i}}$$
The population variance of the disturbance term in the revised model is now equal to 1 in all observations, and so the disturbance term is homoscedastic.
$$\operatorname{var}\left(\frac{u_i}{\sigma_{u_i}}\right) = \frac{1}{\sigma_{u_i}^2}\operatorname{var}(u_i) = \frac{\sigma_{u_i}^2}{\sigma_{u_i}^2} = 1$$
In the revised model, we regress $Y'$ on $X'$ and $H$, as defined. Note that there is no intercept in the revised model. $\beta_1$ becomes the slope coefficient of the artificial variable $H_i = 1/\sigma_{u_i}$.
$$Y_i' = \frac{Y_i}{\sigma_{u_i}}, \quad X_i' = \frac{X_i}{\sigma_{u_i}}, \quad H_i = \frac{1}{\sigma_{u_i}}, \quad u_i' = \frac{u_i}{\sigma_{u_i}}$$
$$Y_i' = \beta_1 H_i + \beta_2 X_i' + u_i'$$
The revised model is described as a weighted regression model because we are weighting observation i by a factor $1/\sigma_{u_i}$. Note that we are automatically giving the highest weights to the most reliable observations (those with the lowest values of $\sigma_{u_i}$).
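The weighting can be sketched in a few lines of code. The following is a minimal illustration, not the author's implementation: the function names, the data and the values of sigma are all hypothetical, and sigma is assumed to be known, which is rarely the case in practice. Each observation is divided by its sigma, and the transformed Y is then regressed on H and the transformed X with no intercept.

```python
from typing import Sequence, Tuple


def ols_no_intercept(
    y: Sequence[float], h: Sequence[float], x: Sequence[float]
) -> Tuple[float, float]:
    """Regress y on h and x without an intercept (2x2 normal equations)."""
    shh = sum(a * a for a in h)
    sxx = sum(a * a for a in x)
    shx = sum(a * b for a, b in zip(h, x))
    shy = sum(a * b for a, b in zip(h, y))
    sxy = sum(a * b for a, b in zip(x, y))
    det = shh * sxx - shx * shx
    b1 = (shy * sxx - shx * sxy) / det  # coefficient of h
    b2 = (shh * sxy - shx * shy) / det  # coefficient of x
    return b1, b2


def weighted_regression(
    y: Sequence[float], x: Sequence[float], sigma: Sequence[float]
) -> Tuple[float, float]:
    """Divide every observation by its sigma, then fit Y' = b1*H + b2*X'."""
    y_t = [yi / s for yi, s in zip(y, sigma)]
    x_t = [xi / s for xi, s in zip(x, sigma)]
    h = [1.0 / s for s in sigma]
    return ols_no_intercept(y_t, h, x_t)


# Hypothetical, noise-free data generated from Y = 2 + 3X.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]
# Hypothetical standard deviations, assumed known for the illustration.
sigma = [1.0, 1.0, 2.0, 2.0, 4.0]

b1, b2 = weighted_regression(y, x, sigma)
print(b1, b2)
```

Because the data contain no disturbance term, the sketch recovers the original intercept and slope; with real data the estimates would differ from the unweighted ones.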
Of course, in practice we do not know the value of $\sigma_{u_i}$ in each observation. However, it may be reasonable to suppose that it is proportional to some measurable variable, $Z_i$.
Assumption: $\sigma_{u_i} = \lambda Z_i$
If this is the case, we can make the model homoscedastic by dividing through by Zi.
$$\frac{Y_i}{Z_i} = \beta_1 \frac{1}{Z_i} + \beta_2 \frac{X_i}{Z_i} + \frac{u_i}{Z_i}$$
The disturbance term in the revised model has constant variance $\lambda^2$. We do not need to know the value of $\lambda^2$. The crucial point is that, by assumption, it is constant.
$$\operatorname{var}\left(\frac{u_i}{Z_i}\right) = \frac{1}{Z_i^2}\sigma_{u_i}^2 = \frac{\lambda^2 Z_i^2}{Z_i^2} = \lambda^2$$
$$Y_i' = \beta_1 H_i + \beta_2 X_i' + u_i', \qquad Y_i' = \frac{Y_i}{Z_i}, \quad X_i' = \frac{X_i}{Z_i}, \quad H_i = \frac{1}{Z_i}, \quad u_i' = \frac{u_i}{Z_i}$$
We will illustrate this procedure with the UNIDO data on manufacturing output and GDP. We will try scaling by population. A regression of manufacturing output per capita on GDP per capita is less likely to be subject to heteroscedasticity.
[Figure: scatter diagram of manufacturing output against GDP]
Here is the revised scatter diagram. Does it look homoscedastic? Actually, no. This is still a classic pattern of heteroscedasticity.
[Figure: scatter diagram of manufacturing output per capita against GDP per capita]
RSS2 is much larger than RSS1.
RSS1 = 5,378,000
RSS2 = 17,362,000
However, the subsamples are small and high ratios can occur on a pure chance basis. The null hypothesis of homoscedasticity is only just rejected at the 5% level.
$$F(n_2 - k, n_1 - k) = \frac{RSS_2/(n_2 - k)}{RSS_1/(n_1 - k)} = \frac{17{,}362{,}000/9}{5{,}378{,}000/9} = 3.23, \qquad F_{\text{crit},\,5\%}(9, 9) = 3.18$$
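The arithmetic of the Goldfeld–Quandt statistic for the per capita regression can be checked with a short sketch. The function name is hypothetical. The residual sums of squares, the 9 degrees of freedom in each subsample and the 5% critical value of 3.18 are taken from the text, not computed here.

```python
def goldfeld_quandt_f(
    rss_num: float, rss_den: float, df_num: int, df_den: int
) -> float:
    """Ratio of the two subsample RSS, each divided by its degrees of freedom."""
    return (rss_num / df_num) / (rss_den / df_den)


# Values quoted in the text for the per capita regression.
RSS1 = 5_378_000.0
RSS2 = 17_362_000.0
DF = 9                   # n - k in each subsample, as in the text
F_CRIT_5PCT_9_9 = 3.18   # critical value quoted in the text

f_stat = goldfeld_quandt_f(RSS2, RSS1, DF, DF)
reject_homoscedasticity = f_stat > F_CRIT_5PCT_9_9

print(round(f_stat, 2))           # 3.23
print(reject_homoscedasticity)    # True, but only just
```

The statistic exceeds the critical value by a small margin, which is why the rejection is described as marginal.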
Often the X variable itself is a suitable scaling variable. After all, the Goldfeld–Quandt test assumes that the standard deviation of the disturbance term is proportional to it.
Assumption: $\sigma_{u_i} = \lambda X_i$
Note that when we scale through by it, the $\beta_2$ term becomes the intercept in the revised model.
$$\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + \frac{u_i}{X_i}$$
It follows that when we interpret the regression results, the slope coefficient is an estimate of $\beta_1$ in the original model and the intercept is an estimate of $\beta_2$.
$$\operatorname{var}\left(\frac{u_i}{X_i}\right) = \frac{1}{X_i^2}\sigma_{u_i}^2 = \frac{\lambda^2 X_i^2}{X_i^2} = \lambda^2$$
$$Y_i' = \beta_1 H_i + \beta_2 + u_i', \qquad Y_i' = \frac{Y_i}{X_i}, \quad H_i = \frac{1}{X_i}, \quad u_i' = \frac{u_i}{X_i}$$
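The swap of roles between intercept and slope can be seen in a small sketch. The parameter values and the X values below are hypothetical and the data are noise-free, so this only illustrates the algebra. Y/X is regressed on 1/X with an intercept. The fitted intercept then corresponds to the original slope, and the fitted slope to the original intercept.

```python
from typing import Sequence, Tuple


def ols_simple(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float]:
    """Simple regression of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical parameters of the original model Y = beta1 + beta2 * X.
BETA1 = 600.0
BETA2 = 0.19

x = [1_000.0, 5_000.0, 20_000.0, 100_000.0, 400_000.0]
y = [BETA1 + BETA2 * xi for xi in x]

# Scale through by X: regress Y/X on H = 1/X.
y_scaled = [yi / xi for yi, xi in zip(y, x)]
h = [1.0 / xi for xi in x]

intercept, slope = ols_simple(y_scaled, h)
print(intercept)  # estimate of beta2, the original slope
print(slope)      # estimate of beta1, the original intercept
```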
Here is the corresponding scatter diagram. Is there any evidence of heteroscedasticity?
[Figure: scatter diagram of Manufacturing/GDP against 1/GDP x 1,000,000]
No longer. The residual sums of squares for the two subsamples are almost identical, indeed closer than one would usually expect on a pure chance basis under the null hypothesis.
RSS1 = 0.065
RSS2 = 0.070
As a consequence, the F statistic is not significant. The heteroscedasticity has been eliminated.
$$F(n_2 - k, n_1 - k) = \frac{RSS_2/(n_2 - k)}{RSS_1/(n_1 - k)} = \frac{0.070/9}{0.065/9} = 1.08, \qquad F_{\text{crit},\,5\%}(9, 9) = 3.18$$
We will now consider an alternative approach to the problem. It is possible that the heteroscedasticity has been caused by an inappropriate mathematical specification. Suppose, in particular, that the true relationship is in fact logarithmic.
[Figure: scatter diagram of manufacturing output against GDP]
Here is the corresponding scatter diagram. No sign of heteroscedasticity.
[Figure: scatter diagram of log manufacturing output against log GDP]
We confirm this with the Goldfeld–Quandt test. In this case there is no point in calculating the conventional test statistic. RSS2 is smaller than RSS1, so it cannot be significantly greater than RSS1.
RSS1 = 2.140
RSS2 = 1.037
In this situation we should test whether there is evidence that the standard deviation of the disturbance term is inversely proportional to the X variable. For this purpose, the F statistic is the inverse of the conventional one.
The null hypothesis of homoscedasticity is not rejected.
$$F(n_1 - k, n_2 - k) = \frac{RSS_1/(n_1 - k)}{RSS_2/(n_2 - k)} = \frac{2.140/9}{1.037/9} = 2.06, \qquad F_{\text{crit},\,5\%}(9, 9) = 3.18$$
Now an additive disturbance term in the logarithmic model is equivalent to a multiplicative one in the original model.
$$\log Y = \beta_1 + \beta_2 \log X + u$$
$$Y = e^{\beta_1} X^{\beta_2} e^{u}$$
This means that the absolute size of the effect of the disturbance term is large for large values of the X variable and small for small ones, when the scatter diagram is redrawn with the variables in their original form.
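A small numerical sketch may make the point concrete. The parameter values, the value of the disturbance term and the two X values below are hypothetical, chosen only for illustration. The same disturbance u is applied at a small and a large value of X. The deviation in logarithms is identical, while the deviation in natural units grows with X.

```python
import math

# Hypothetical parameters of log Y = beta1 + beta2 * log X + u.
BETA1 = -1.7
BETA2 = 1.0
U = 0.3  # the same disturbance applied to both observations


def fitted(x: float) -> float:
    """Value of Y on the regression line, in natural units."""
    return math.exp(BETA1) * x ** BETA2


def actual(x: float, u: float) -> float:
    """Value of Y with a multiplicative disturbance e**u."""
    return fitted(x) * math.exp(u)


x_small, x_large = 10_000.0, 1_000_000.0

# Deviations in natural units: proportional to the fitted value.
dev_small = actual(x_small, U) - fitted(x_small)
dev_large = actual(x_large, U) - fitted(x_large)

# Deviations in logarithms: equal to u in both cases.
log_dev_small = math.log(actual(x_small, U)) - math.log(fitted(x_small))
log_dev_large = math.log(actual(x_large, U)) - math.log(fitted(x_large))

print(dev_small, dev_large)
print(log_dev_small, log_dev_large)
```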
For example, Singapore and South Korea have relatively large manufacturing sectors, and Greece and Mexico relatively small ones.
[Figure: scatter diagram of log manufacturing output against log GDP, with South Korea, Mexico, Singapore and Greece labelled]
The variations for these countries are similar when plotted on the logarithmic scale, but those for South Korea and Mexico are much larger when the variables are plotted in natural units.
[Figure: scatter diagram of manufacturing output against GDP in natural units, with South Korea, Mexico, Singapore and Greece labelled]
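Here is a summary of the regressions using the four alternative specifications of the model (standard errors in parentheses).
$$\widehat{MANU} = \underset{(5700)}{604} + \underset{(0.013)}{0.194}\, GDP, \qquad R^2 = 0.89$$
$$\widehat{\frac{MANU}{POP}} = \underset{(1371)}{612}\, \frac{1}{POP} + \underset{(0.016)}{0.182}\, \frac{GDP}{POP}, \qquad R^2 = 0.70$$
$$\widehat{\frac{MANU}{GDP}} = \underset{(0.019)}{0.189} + \underset{(841)}{533}\, \frac{1}{GDP}, \qquad R^2 = 0.02$$
$$\widehat{\log MANU} = \underset{(0.785)}{-1.694} + \underset{(0.066)}{0.999}\, \log GDP, \qquad R^2 = 0.90$$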
The first regression suggests that, for every increase of $1 million in GDP, manufacturing output increases by $194,000. Thus, at the margin, manufacturing accounts for 0.19 of GDP. The intercept does not have any plausible meaning.
However, this regression was subject to severe heteroscedasticity. Although the estimate of the coefficient of GDP is unbiased, it is likely to be relatively inaccurate. Also, and this is a separate effect of heteroscedasticity, the standard errors, t tests and F test are invalid.
In the second regression, the estimate of the slope coefficient was a little lower. However for this regression also the null hypothesis of homoscedasticity was rejected, but only at the 5% level.
In the third regression the model was scaled through by GDP. As a consequence, the intercept became an estimator of the original slope coefficient, and vice versa.
For this model the null hypothesis of homoscedasticity was not rejected. In principle, therefore, it should yield more accurate estimates of the coefficients than the first two, and we are able to perform tests.
For the logarithmic model also the null hypothesis of homoscedasticity was not rejected. So we have two models which survive the Goldfeld–Quandt test. Which do you prefer? Think about it.
You probably went for the logarithmic model, attracted by the high $R^2$. However, in this example, there is little to choose between the third and fourth models. Substantively, they have the same interpretation.
In the third model, 1/GDP has a low t statistic and appears to be an irrelevant variable. The model is telling us that manufacturing output, as a proportion of GDP, is constant. Because it is constant, $R^2$ is effectively 0.
The fourth model is telling us that the elasticity of manufacturing output with respect to GDP is equal to 1. In other words, manufacturing output increases proportionally with GDP and remains a constant proportion of it.
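The t statistics behind these two readings can be checked from the reported coefficients and standard errors. The sketch below is illustrative only: the function name is hypothetical, and the figures are those read from the regression summary (533 with standard error 841 for 1/GDP in the third model, 0.999 with standard error 0.066 for log GDP in the fourth).

```python
def t_statistic(
    estimate: float, std_error: float, hypothesized: float = 0.0
) -> float:
    """t statistic for H0: coefficient equals the hypothesized value."""
    return (estimate - hypothesized) / std_error


# Third model: is the coefficient of 1/GDP different from zero?
t_inv_gdp = t_statistic(533.0, 841.0)

# Fourth model: is the elasticity different from one?
t_elasticity = t_statistic(0.999, 0.066, hypothesized=1.0)

print(round(t_inv_gdp, 2))     # 0.63
print(round(t_elasticity, 3))  # -0.015
```

Both statistics are far below any conventional critical value in absolute terms, which is consistent with treating 1/GDP as irrelevant in the third model and the elasticity as equal to 1 in the fourth.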
Converting the logarithmic equation back into natural units, you obtain the equation shown. Like the third equation, it implies that manufacturing output accounts for a little over 0.18 of GDP, at the margin.
$$\widehat{MANU} = e^{-1.694}\, GDP^{0.999} = 0.184\, GDP^{0.999}$$
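The conversion can be verified numerically. In this sketch the coefficients come from the logarithmic regression in the text, while the function name and the three GDP values are hypothetical, chosen to lie within the range shown on the scatter diagrams. The marginal share is the derivative of the converted equation with respect to GDP.

```python
import math

# Coefficients of the logarithmic regression, as reported in the text.
LOG_INTERCEPT = -1.694
ELASTICITY = 0.999

# Multiplicative constant in natural units.
scale = math.exp(LOG_INTERCEPT)


def marginal_share(gdp: float) -> float:
    """d MANU / d GDP for MANU = scale * GDP**ELASTICITY."""
    return scale * ELASTICITY * gdp ** (ELASTICITY - 1.0)


# Hypothetical GDP values for illustration.
gdp_values = [10_000.0, 100_000.0, 1_000_000.0]
shares = [marginal_share(g) for g in gdp_values]

print(round(scale, 3))  # 0.184
print([round(s, 3) for s in shares])
```

Because the elasticity is so close to 1, the marginal share barely changes over this range.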
Copyright Christopher Dougherty 2012.
These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.
The content of this slideshow comes from Section 7.3 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.
2012.11.10