ON THE RELATIVE EFFICIENCY OF ESTIMATORS WHICH INCLUDE THE INITIAL OBSERVATIONS IN THE ESTIMATION OF SEEMINGLY UNRELATED REGRESSIONS WITH FIRST ORDER AUTOREGRESSIVE DISTURBANCES

H.E. Doran and W.E. Griffiths

No. 13 - June 1981

A substantial part of this work was done while Doran was a Fellow of the Alexander von Humboldt Foundation at the University of Bonn.

ISSN 0157-0188
ISBN 0 85834 388 6
ABSTRACT
The generalized least squares estimator for a seemingly unrelated
regressions model with first order vector autoregressive disturbances
is outlined, and its efficiency is compared with that of an approximate
generalized least squares estimator which ignores the first observation.
A scalar index for the loss of efficiency is developed and applied to
a special case where the matrix of autoregressive parameters is diagonal
and the regressors are smooth. Also, for a more general model, a
Monte Carlo study is used to investigate the relative efficiencies of
the various estimators. The results suggest that Maeshiro (1980) has
overstated the case for the exact generalized least squares estimator,
because, in many circumstances, it is only marginally better than the
approximate generalized least squares estimator.
1. Introduction
Since the work of Prais and Winsten (1954) a great deal of attention
[Kadiyala (1968), Poirier (1978), Beach and Mackinnon (1978), Spitzer (1979),
Chipman (1979), Maeshiro (1979), Park and Mitchell (1980), Doran (1981)] has
been paid to the relative efficiency advantages of retaining the initial
observation when estimating a regression model with a first order autoregressive
disturbance. Despite this work on the single equation model, the same problem
in seemingly unrelated regressions with first order autoregressive disturbances
[Parks (1967), Guilkey and Schmidt (1973)] does not appear to have attracted
much interest. Two exceptions are the papers by Beach and Mackinnon (1979)
and Maeshiro (1980). Beach and Mackinnon, under the assumption of normally
distributed disturbances, advocate maximum likelihood estimation which retains
the initial observations and which, through the Jacobian term, constrains the
parameter estimates to lie within the stationary region. However, as they
point out, except in a very special case, it is impossible to concentrate any
parameters out of the likelihood function, and numerical maximization of the
function with respect to all the parameters is likely to be computationally
difficult. An alternative procedure which retains the initial observations,
but which does not ensure the estimates satisfy the stationary constraint, is
to use a properly constructed generalized least squares (GLS) estimator.
Computationally, this estimator involves more steps than the approximate GLS
estimators suggested by Parks (1967) and Guilkey and Schmidt (1973), but it is
likely to be less of a problem than the maximum likelihood estimator. If the
single equation results and the evidence provided by Maeshiro (1980) are any
guide, it could lead to some substantial gains in efficiency. Using a two
equation model with one explanatory variable in each equation, the nonstationary
disturbance assumptions employed by Parks (1967), and a known autoregressive
process, Maeshiro (1980) finds that the approximate GLS estimator which ignores
the initial observations can be considerably worse than ordinary least squares,
particularly when the explanatory variables are trended.
The purpose of this paper is similar to that of Maeshiro’s, namely, to
investigate the relative efficiency of "GLS estimators" which use, and do not
use, the initial observations. However, our approach is different. We assume
that the disturbances are stationary¹ and that the autoregressive process is
not necessarily diagonal; we use a larger model (three equations with two
explanatory variables in each equation); and, using Monte Carlo experiments,
we concentrate on the situation occurring in practice where the autocorrelation
and variance parameters must be estimated. Also, under the assumption that
the autocorrelation and variance parameters are known, we follow Doran (1981)
and develop a scalar index for the loss of efficiency resulting from omission
of the initial observations. This index can be used for a model with any
number of equations and any number of explanatory variables in each equation.
For the special case of a two equation model with a diagonal autoregressive
process, we examine the index in detail, and this provides some valuable insights
into the various factors which influence the loss of efficiency.
In Section 2 the model and appropriate estimators are discussed. In
Section 3 the scalar index of loss of efficiency is derived and used to
investigate the loss of efficiency for the case where there are two equations
and the autoregressive process is diagonal. Results of the Monte Carlo
experiments using the more general model are reported in Section 4, followed
by some concluding remarks in Section 5.
2. The Model and Estimators
Following Guilkey and Schmidt (1973) we consider N seemingly unrelated
regression equations of the form

    y_i = X_i β_i + u_i,   i = 1, 2, ..., N,   (2.1)

where y_i is a (T x 1) vector of observations on the i-th dependent variable,
X_i is a (T x K_i) nonstochastic matrix of observations on K_i explanatory
variables, β_i is a (K_i x 1) vector of coefficients to be estimated and u_i is
a (T x 1) vector of random disturbances. The system can be written more
compactly as

    y = Xβ + u,   (2.2)

where y' = (y_1', y_2', ..., y_N'), X is block diagonal of dimension
(NT x K), where K = Σ_{i=1}^{N} K_i, β' = (β_1', β_2', ..., β_N') and
u' = (u_1', u_2', ..., u_N').

Let u(t) be an (N x 1) vector containing a disturbance from each equation
for the t-th time period and suppose

    u(t) = R u(t-1) + ε(t),   t = 0, ±1, ±2, ...,   (2.3)

where R is an (N x N) matrix with (i, j)-th element given by ρ_ij and ε(t) is
an (N x 1) vector of random disturbances such that E[ε(t)] = 0 and

    E[ε(t) ε(s)'] = Σ  if t = s,
                    0  if t ≠ s.   (2.4)

We assume the process in equation (2.3) is stationary. This is true if the
roots of |zI − R| = 0 are less than one in absolute value. Also, it implies
that the covariance matrix for u(t) does not depend on t. Let this covariance
matrix be given by E[u(t) u(t)'] = Θ, and let the covariance matrix for the
complete (NT x 1) disturbance vector be given by E[uu'] = Ω.
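The stationarity condition can be checked numerically, since the roots of |zI − R| = 0 are just the eigenvalues of R. A minimal sketch (the two-equation R below is hypothetical, not taken from the paper):

```python
import numpy as np

# Hypothetical (N = 2) autoregressive coefficient matrix; illustrative only.
R = np.array([[0.6, 0.2],
              [0.1, 0.4]])

# u(t) = R u(t-1) + e(t) is stationary when every root of |zI - R| = 0,
# i.e. every eigenvalue of R, is less than one in absolute value.
eigenvalues = np.linalg.eigvals(R)
is_stationary = bool(np.all(np.abs(eigenvalues) < 1.0))
```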
If R and Σ are known the GLS estimator for β can be calculated by first
finding a matrix P such that PΩP' = Σ ⊗ I_T and then applying the seemingly
unrelated regressions estimator [Zellner (1962)] to the transformed model

    y* = X*β + u*,   (2.5)

where y* = Py, X* = PX and u* = Pu. This GLS estimator is

    β̂ = (X'P'(Σ⁻¹ ⊗ I_T)PX)⁻¹ X'P'(Σ⁻¹ ⊗ I_T)Py,   (2.6)

and its covariance matrix is

    V(β̂) = (X'P'(Σ⁻¹ ⊗ I_T)PX)⁻¹.   (2.7)

A suitable matrix P is made up of (T x T) blocks,

        | P_11  P_12  ...  P_1N |
    P = | P_21  P_22  ...  P_2N |
        |  .     .          .   |
        | P_N1  P_N2  ...  P_NN |,   (2.8)

where

           |  a_ii    0      0    ...    0     0 |
           | -ρ_ii    1      0    ...    0     0 |
    P_ii = |   0    -ρ_ii    1    ...    0     0 |
           |   .      .      .           .     . |
           |   0      0      0    ...  -ρ_ii   1 |

and, for i ≠ j,

           |  a_ij    0      0    ...    0     0 |
           | -ρ_ij    0      0    ...    0     0 |
    P_ij = |   0    -ρ_ij    0    ...    0     0 |
           |   .      .      .           .     . |
           |   0      0      0    ...  -ρ_ij   0 |,

and the elements a_ij of the (N x N) matrix A are chosen such that Σ = AΘA'.

Later we shall return to how the elements of Θ and A can be derived given
R and Σ. First we shall note the form of the transformed variables in
equation (2.5). Transformed observations on the dependent variable, for
example, are given by

    y*_i1 = Σ_{j=1}^{i} a_ij y_j1,   i = 1, 2, ..., N,   (2.9)

and

    y*_it = y_it − Σ_{j=1}^{N} ρ_ij y_j,t-1,   i = 1, 2, ..., N,  t = 2, 3, ..., T.   (2.10)
An important difference between the transformations in (2.9) and (2.10) is
that the transformed initial observations in equation (2.9) will, as we shall
see, depend on both the ρ_ij and the elements of Σ, while in equation (2.10)
the transformed observations depend only on the ρ_ij.

Because of this, rather than use the GLS estimator given in equation (2.6),
it is easier to just use the observations in equation (2.10) to obtain an
approximate GLS estimator. Let P_ijo be the ((T−1) x T) matrix obtained by
deleting the first row in P_ij and let P_o be the (N(T−1) x NT) matrix containing
P_ijo in the (i, j)-th block. The approximate GLS estimator is given by

    β̂_o = (X'P_o'(Σ⁻¹ ⊗ I_{T-1})P_oX)⁻¹ X'P_o'(Σ⁻¹ ⊗ I_{T-1})P_oy   (2.11)

and its covariance matrix is

    V(β̂_o) = (X'P_o'(Σ⁻¹ ⊗ I_{T-1})P_oX)⁻¹.   (2.12)

We are interested in evaluating, for a number of alternative X, R and Σ,
how much "less" V(β̂) is than V(β̂_o). In addition, and more importantly, we
wish to determine whether or not the efficiency difference persists when, in
β̂ and β̂_o, the ρ_ij and the σ_ij (the elements in Σ) are replaced by estimates.
Before turning to these questions we shall indicate how the a_ij can be
obtained from the ρ_ij and σ_ij, and define estimators b and b_o which use
estimates of the ρ_ij and σ_ij and which are the counterparts of β̂ and β̂_o
respectively.
Following Guilkey and Schmidt (1973) we can use the fact that Θ = RΘR' + Σ
to solve for the elements of Θ from

    vec(Θ) = (I − R ⊗ R)⁻¹ vec(Σ),   (2.13)

where vec(.) is a vector obtained by stacking the columns of a matrix. Then,
to obtain A we find lower triangular matrices H and B such that Σ = HH' and
Θ = BB' [Graybill (1969, p.299)] and, from these, A = HB⁻¹.³
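The construction of Θ and A from R and Σ can be sketched numerically; a minimal example with illustrative (not the paper's) parameter values, assuming a diagonal R:

```python
import numpy as np

# Illustrative N = 2 parameter values (not from the paper).
R = np.diag([0.8, 0.5])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])
N = R.shape[0]

# Equation (2.13): vec(Theta) = (I - R kron R)^{-1} vec(Sigma),
# where vec(.) stacks the columns of a matrix.
vec_Sigma = Sigma.flatten(order='F')
vec_Theta = np.linalg.solve(np.eye(N * N) - np.kron(R, R), vec_Sigma)
Theta = vec_Theta.reshape((N, N), order='F')

# Lower triangular factors Sigma = HH' and Theta = BB', then A = H B^{-1};
# by construction A Theta A' = Sigma, as required for the first-row blocks of P.
H = np.linalg.cholesky(Sigma)
B = np.linalg.cholesky(Theta)
A = H @ np.linalg.inv(B)
```

The Cholesky factors have positive diagonal elements, which is the normalization referred to in footnote 3.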
To obtain the estimator b we can use one of the methods described by
Guilkey and Schmidt (1973), modified to include the initial observations.
The steps are:

(i) Apply least squares to each equation and obtain the least squares
residuals

    û_i = (I_T − X_i(X_i'X_i)⁻¹X_i')y_i,   i = 1, 2, ..., N.

(ii) For each equation regress û_it on (û_1,t-1, û_2,t-1, ..., û_N,t-1).
This yields estimates ρ̂_i1, ρ̂_i2, ..., ρ̂_iN, i = 1, 2, ..., N, or, in matrix
notation, R̂.

(iii) Let P̂_o be the matrix P_o with the ρ_ij replaced by the estimates
obtained in step (ii) and regress P̂_o y on P̂_o X to obtain the (N(T−1) x 1)
residual vector (ε̂_12, ε̂_13, ..., ε̂_1T, ε̂_22, ..., ε̂_2T, ..., ε̂_N2, ..., ε̂_NT)'.

(iv) Estimate Σ by Σ̂ with elements

    σ̂_ij = Σ_{t=2}^{T} ε̂_it ε̂_jt / (T − 1).   (2.14)

(v) Use the elements of Σ̂ and R̂ to find estimates Θ̂ and Â of Θ and A
respectively and use P̂_o and Â to form P̂, the complete transformation matrix.

(vi) Apply the seemingly unrelated regressions estimator to the
observations P̂y and P̂X. This yields the estimated generalized least squares
(EGLS) estimator

    b = (X'P̂'(Σ̂⁻¹ ⊗ I_T)P̂X)⁻¹ X'P̂'(Σ̂⁻¹ ⊗ I_T)P̂y.   (2.15)

The approximate EGLS estimator suggested by Guilkey and Schmidt can be
obtained by omitting steps (v) and (vi) and using instead

    b_o = (X'P̂_o'(Σ̂⁻¹ ⊗ I_{T-1})P̂_oX)⁻¹ X'P̂_o'(Σ̂⁻¹ ⊗ I_{T-1})P̂_oy.   (2.16)
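Steps (i) and (ii) above, equation-by-equation least squares followed by regressing each residual on all equations' lagged residuals to build R̂, can be sketched as follows. All data are simulated and every numerical value is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented two-equation system: one regressor per equation, T = 100.
N, T = 2, 100
R_true = np.array([[0.6, 0.2],
                   [0.1, 0.4]])

# Vector AR(1) disturbances: u(t) = R u(t-1) + e(t).
u = np.zeros((T, N))
for t in range(1, T):
    u[t] = R_true @ u[t - 1] + rng.standard_normal(N)

X = [rng.standard_normal((T, 1)) for _ in range(N)]
beta = [np.array([2.0]), np.array([-1.0])]
y = [X[i] @ beta[i] + u[:, i] for i in range(N)]

# Step (i): single-equation least squares residuals.
resid = np.column_stack([
    y[i] - X[i] @ np.linalg.lstsq(X[i], y[i], rcond=None)[0]
    for i in range(N)
])

# Step (ii): regress each equation's residual at time t on all equations'
# residuals at t-1; the coefficient rows stack into the estimate of R.
lagged, current = resid[:-1], resid[1:]
R_hat = np.linalg.lstsq(lagged, current, rcond=None)[0].T
```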
3. An Index of Loss of Efficiency
In this Section we develop an index for the loss of efficiency which occurs
when the approximate GLS estimator β̂_o is used instead of the GLS estimator
β̂. Following Doran (1981) we define a scalar index of loss of efficiency by

    α = |V_o| |V|⁻¹ − 1,   (3.1)

where V_o = V(β̂_o) and V = V(β̂) are given by (2.12) and (2.7) respectively.
It is shown in Appendix A that

    α = (−1)^N f(−1) − 1,   (3.2)

where f(μ) is the characteristic function of an N x N matrix φ. That is,

    f(μ) = |μI_N − φ|,   (3.3)

where

    φ = (QA)(x'V_o x)(QA)',   (3.4)

    Q'Q = Σ⁻¹,   (3.5)

and

    x' = diag(x_1', x_2', ..., x_N'),

with x_i' being the K_i-row vector of the first observation on the regressor
variables of the i-th equation. The matrix A has already been defined in
(2.8).

If we expand the characteristic function f(μ) in the form

    f(μ) = μ^N + Σ_{i=1}^{N} (−1)^i h_i μ^{N-i},   (3.6)

then from (3.2) we have that

    α = Σ_{i=1}^{N} h_i.   (3.7)
The coefficients h_i may be obtained using the rules for the differentiation
of determinants [see, for example, Dhrymes (1978, pp.470-471 and pp.533-534)].
It is readily shown that

    h_1 = tr(φ),  h_2 = {[tr(φ)]² − tr(φ²)}/2,  ...,  h_N = det(φ).

Thus, when⁴

    N = 1,  α = tr(φ) = det(φ) = φ = σ⁻²(1 − ρ²) x_1' V_o x_1 = (1 − ρ²) x_1'(X'P_o'P_oX)⁻¹x_1,

    N = 2,  α = tr(φ) + det(φ),

    N = 3,  α = tr(φ) + {[tr(φ)]² − tr(φ²)}/2 + det(φ).

From the definition (3.4) of φ it is clear that the sample size T only enters
through the variance-covariance matrix V_o. Hence, under the usual assumption
that lim T⁻¹X'Ω⁻¹X is a finite nonsingular matrix, φ is of order T⁻¹.
Considering the expression for α when N = 3, it is clear that

    h_1 = O(T⁻¹),  h_2 = O(T⁻²)  and  h_3 = O(T⁻³).

It seems reasonable to infer that for general N, h_i = O(T⁻ⁱ). Thus, even for
fairly small T, the dominant term in α will be h_1 = tr(φ). We will thus
adopt as our measure of inefficiency α, defined from now on by

    α = tr(φ).   (3.8)
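For a single equation (N = 1, K = 1) the truncated index coincides with the exact one, so α = tr(φ) can be checked directly against the determinant definition (3.1). A minimal numerical sketch (arbitrary regressor values; σ² = 1, so Q = 1 and A = (1 − ρ²)^½ as in footnote 2):

```python
import numpy as np

# Single equation, one regressor; all values are arbitrary illustrations.
T, rho = 10, 0.7
rng = np.random.default_rng(1)
x = rng.standard_normal((T, 1))

# Prais-Winsten style transformation matrix P (sigma^2 = 1, a_11 = sqrt(1 - rho^2)).
P = np.eye(T)
P[0, 0] = np.sqrt(1.0 - rho ** 2)
for t in range(1, T):
    P[t, t - 1] = -rho
P_o = P[1:, :]                               # drop the first transformed row

V = np.linalg.inv(x.T @ P.T @ P @ x)         # GLS covariance, cf. (2.7)
V_o = np.linalg.inv(x.T @ P_o.T @ P_o @ x)   # approximate GLS covariance, cf. (2.12)

# phi = (QA)(x_1' V_o x_1)(QA)' with Q = 1 and A = sqrt(1 - rho^2).
phi = (1.0 - rho ** 2) * x[0, 0] ** 2 * V_o
alpha = np.trace(phi)                        # alpha = tr(phi), eq. (3.8)
alpha_direct = np.linalg.det(V_o) / np.linalg.det(V) - 1.0
```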
3.1 Loss of efficiency when R is diagonal and the regressors are smooth
In this subsection we consider the special case where R is diagonal and
the regressors are "smooth". The assumption of a diagonal R has been popular
in the literature [Parks (1967), Kmenta and Gilbert (1970), Maeshiro (1980)]
and, if, in addition, we assume that the regressors are "smooth", it is possible
to write e in a form which provides valuable insights into the factors affecting
it.
We will assume that the regressors are "smooth" in the sense that, for
observations on a regressor x_k,

    x_k,t − ρ x_k,t-1 ≈ (1 − ρ) x_k,t.
This approximation, and the fact that R is diagonal, mean that P_oX will be
block diagonal with i-th diagonal block given by (1 − ρ_ii)X_io, where X_io is
X_i with the first row deleted. This in turn implies that

    P_oX = X_oW,

where X_o is a [N(T−1) x K] block diagonal matrix with X_io as the i-th block,
and W is a (K x K) block diagonal matrix with (1 − ρ_ii)I_{K_i} as the i-th
block. Then we have

    φ = (QA)(x'V_o x)(QA)'
      = (QA)[x'(X'P_o'(Q' ⊗ I_{T-1})(Q ⊗ I_{T-1})P_oX)⁻¹x](QA)'
      = (QA)[x'W⁻¹(X_o'(Q' ⊗ I_{T-1})(Q ⊗ I_{T-1})X_o)⁻¹W⁻¹x](QA)'
      = (QA)[x*'(Z'Z)⁻¹x*](QA)',   (3.9)

where

    x*' = diag(x_1*', x_2*', ..., x_N*'),

    x_i*' = (1 − ρ_ii)⁻¹ x_i',

    Z = (Q ⊗ I_{T-1})X_o.
For example, if N = 2, and we use the fact⁵ that if the regressors are
appropriately scaled and R is diagonal, Σ may be interpreted as a correlation
matrix, we have

    Σ = | 1  r |
        | r  1 |.

An appropriate⁶ matrix Q, defined by (3.5), is

    Q = 2^{-½} |  λ_1  λ_1 |
               | -λ_2  λ_2 |,   (3.10)

where λ_1 = (1 + r)^{-½} and λ_2 = (1 − r)^{-½} are the eigenvalues of Σ^{-½}.
Then,

    Z'Z = ½ | (λ_1² + λ_2²)X_10'X_10   (λ_1² − λ_2²)X_10'X_20 |
            | (λ_1² − λ_2²)X_20'X_10   (λ_1² + λ_2²)X_20'X_20 |.   (3.11)
We may now proceed in an analogous way to Doran (1981). Suppose
δ_1, δ_2, ..., δ_K are the eigenvalues of Z'Z. Then there exists an orthogonal
matrix C such that

    Z'Z = CDC',

where D = diag(δ_1, δ_2, ..., δ_K). If we define the scaled eigenvalues

    δ_i* = δ_i / Σ_{j=1}^{K} δ_j = δ_i / tr(Z'Z),

then

    (Z'Z)⁻¹ = CD*⁻¹C' / tr(Z'Z),   (3.12)

where D* = diag(δ_1*, δ_2*, ..., δ_K*).
Now by (3.11),

    tr(Z'Z) = ½(λ_1² + λ_2²)[tr(X_10'X_10) + tr(X_20'X_20)]
            = (T−1)[|x̄_1|² + |x̄_2|²]/(1 − r²),   (3.13)

where

    |x̄_i|² = (T−1)⁻¹ Σ_{j=2}^{T} |x_ji|²   (i = 1, 2)

and x_ji' is the j-th observation on the regressors in equation i.

We are now in a position to decompose (3.9) into components similar to
those obtained by Doran (1981). Substituting (3.12) and (3.13) into (3.9),
after some elementary algebra we obtain

    φ = {K(1 − r²)/[(T−1)(|x̄_1|² + |x̄_2|²)]} (QA)L(QA)',   (3.14)

where

    ℓ_ij = (L)_ij = |x_i| |x_j| m_ij / [(1 − ρ_ii)(1 − ρ_jj)]   (i, j = 1, 2),   (3.15)

and the elements m_ij are of the form

    m_ij = K⁻¹ Σ_{ℓ=1}^{K} w_ijℓ / δ_ℓ*,

where 0 ≤ |w_ijℓ| ≤ 1. The numbers w_ijℓ depend only on the orientation of
the vectors x_i to the principal directions of Z'Z. Thus, the m_ij do not
depend on ρ_11 or ρ_22.
Equations (3.14) and (3.15) illustrate how the loss in efficiency depends on

(i) the number of regressors relative to the number of observations,
K/(T−1);

(ii) the length of the vectors containing the initial observations
relative to the average squared length of the vectors containing the remaining
observations, |x_i| |x_j| / [|x̄_1|² + |x̄_2|²];

(iii) the "multicollinearity factor", m_ij.

The influence of ρ_11, ρ_22 and r, and the precise dependence under (ii), will
also depend on the matrix QA. In Appendix B an explicit expression is derived
for tr[(QA)L(QA)'], and we obtain finally that α = tr(φ) is given by

    α = {K(1−r²)/[(T−1)(|x̄_1|² + |x̄_2|²)]} Σ_{i=1}^{2} Σ_{j=1}^{2} f_ij (1−ρ_ii²)^½(1−ρ_jj²)^½ |x_i| |x_j| m_ij / [(1−ρ_ii)(1−ρ_jj)],   (3.16)

where f_ij = f_ij(ρ_11, ρ_22, r) (see Appendix B) and

    f_11, f_22 → 1  and  f_12 → 0  as  ρ_11 or ρ_22 → 1.
It is clear that α will become large when either ρ_11 or ρ_22 is close
to unity. As α is symmetric with respect to ρ_11 and ρ_22, to gain a feel for
(3.16) we need only consider the case ρ_11 → 1, when (approximately)

    α ≈ K(1−r²)(1−ρ_11²) |x_1|² m_11 / {(1−ρ_11)² (T−1)[|x̄_1|² + |x̄_2|²]}.

Recalling that α is expressed in scaled variables, we obtain (reverting to the
original variables) that, as ρ_11 → 1,

    α ≈ K(1−r²)(1−ρ_11²)(|x_1|²/σ_11) m_11 / {(1−ρ_11)² (T−1)[|x̄_1|²/σ_11 + |x̄_2|²/σ_22]}.   (3.17)
This expression for α gives rise to the following remarks on loss of efficiency
when approximate GLS is used instead of GLS:

(i) The correlation r between the disturbances enters multiplicatively
through the factor 1 − r². Thus, if |r| is reasonably close to 1, the
loss of efficiency will not be as severe. Otherwise, α is not
very sensitive to the value of r.

(ii) The effects of large differences in |x̄_1| and |x̄_2| are
easily identified. If |x̄_2|²/σ_22 << |x̄_1|²/σ_11, then apart from
the factor (1−r²), loss of efficiency is dominated by the first
equation alone. This case is discussed in Doran (1981). In
particular, α will be almost independent of the value of ρ_22. On
the other hand, if |x̄_1|²/σ_11 << |x̄_2|²/σ_22, α will tend to be
proportional to (|x̄_1|²/σ_11)/(|x̄_2|²/σ_22) and will therefore be less.
The relatively large loss in efficiency incurred by ρ_11 being close
to unity is offset by this factor.
The above remarks give some additional insights on Maeshiro's (1980,
Table 1) results. Firstly, he only tabulates results for the case r = 0.65, but
not those for r = 0.0 or r = 0.975. Because of the multiplicative factor (1−r²)
we suspect that his results would show ARTZEL (that is, approximate GLS in our
terminology) to be even less efficient when r = 0.0 (but not too much), but
almost as efficient as SURGLS (our GLS) when r = 0.975. Secondly, an examination
of the results for the trended case shows asymmetry as ρ_11 → 1 and ρ_22 → 1.
For Maeshiro's data, |x̄_2|²/σ_22 is less than .005|x̄_1|²/σ_11, and according to (3.16)
we would expect ARTZEL to be more efficient as ρ_22 → 1. This effect is very
obvious in his results. Thirdly, again because of the relative magnitudes of
|x̄_1|²/σ_11 and |x̄_2|²/σ_22, we would expect the efficiency of ARTZEL relative to
SURGLS to be almost independent of ρ_22 when ρ_11 is large. From the last column
of Table 1 it can be seen that as ρ_22 varies from 0.1 to 0.9, the efficiency
of ARTZEL only varies from 0.41 to 0.44.

Finally, we must take exception to Maeshiro's statement that "the
retention of the first observations in the estimation of a SUR model is as
critical as in the case of a single equation model". We believe that this
statement is far too sweeping, as the relative efficiency of approximate GLS
will depend on the relationship of |x̄_1|²/σ_11, |x̄_2|²/σ_22 and r.
4. Monte Carlo Experiments
The analysis of the previous section gives some useful insights about
when approximate GLS is likely to be inefficient, but is limited in three respects.
(i) The parameter matrices R and Σ have been assumed known.

(ii) R has been assumed to be diagonal.

(iii) The regressors have been assumed to be smooth.

Monte Carlo experiments were conducted to examine the relative efficiency
of a number of competing estimators in less restrictive conditions. Of special
interest were the questions of efficiency when R and Σ were not known, and the
effects of a misspecification of R.
The model consisted of three equations (N = 3) and two explanatory variables
plus a constant in each equation, so that β was of dimension (9 x 1). The
explanatory variables were fixed in repeated samples and two settings were used:
X1: X's independent and uniformly distributed,

    X_t = Z_t x scale factor;

X2: X's trended and with correlations between 0.7 and 0.9,

    X_t = [(1.05)^t (1 + (Z_t − 0.5)/z)] x scale factor.

In the above, Z_t is a uniform random variable on the interval zero to one, z was
set to give a fair degree of multicollinearity, and the scale factor was set to
make the "true R²'s" about 0.85. Based on the results of Doran (1981) and
Maeshiro (1980), one would expect the loss of efficiency from omitting the
first observation to be greater for the setting X2, where the variables are
trended and exhibit some multicollinearity, than for the setting X1. One
sample size (T = 20) was used.⁷
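The two regressor settings can be sketched as follows; z and the scale factor are hypothetical placeholders (the paper set them to produce a fair degree of multicollinearity and "true R²'s" about 0.85):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 20
scale, z = 10.0, 4.0      # hypothetical values, chosen only for illustration
t = np.arange(1, T + 1)

# X1: independent uniform regressors, X_t = Z_t * scale factor.
x1 = rng.uniform(size=T) * scale

# X2: trended regressors, X_t = (1.05)^t * (1 + (Z_t - 0.5)/z) * scale factor.
x2 = (1.05 ** t) * (1.0 + (rng.uniform(size=T) - 0.5) / z) * scale
```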
Three settings for the disturbance covariance matrix Σ were employed,
namely correlation matrices with all off-diagonal elements equal to r,
with r = 0.0, 0.2 and 0.8. A total of seven settings for R were used and,
classifying them into three categories, these were
(i) R diagonal with elements
(0.9, 0.6, 0.4), (0.5, 0.4, 0.3), (-0.9, 0.6, 0.4).
(ii) R non-diagonal, constructed to have eigenvalues
(μ_1, ρ_1 + iρ_2, ρ_1 − iρ_2). The triplet (μ_1, ρ_1, ρ_2) was given values
(0.9, 0.6, 0.6), (0.5, 0.6, 0.6) and (-0.9, 0.6, 0.6).
(iii) R = 0. This is the case of zero autoregression.
The form of R under (ii) was a convenient way of setting up a nondiagonal R with
an eigenvalue close to the boundary of the stationary region (±0.9). It was
suspected that, with such a setting of R, the dropping of the first observation
would lead to a greater efficiency loss than for settings of R where the
eigenvalues were well within the stationary region.
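A real matrix with one real root μ_1 and a complex pair ρ_1 ± iρ_2 can be built in more than one way; the block form below is an illustration (not necessarily the authors' construction), using the triplet (0.9, 0.6, 0.6):

```python
import numpy as np

mu1, rho1, rho2 = 0.9, 0.6, 0.6

# Real block form: a 1x1 block carrying the real root and a 2x2
# rotation-scaling block carrying the complex pair rho1 +/- i*rho2.
R = np.array([[mu1, 0.0,   0.0 ],
              [0.0, rho1, -rho2],
              [0.0, rho2,  rho1]])

eigs = np.linalg.eigvals(R)
# |rho1 + i*rho2| = sqrt(0.72) ~ 0.849, so all roots lie inside the unit
# circle even though the real root 0.9 is near the stationarity boundary.
stationary = bool(np.all(np.abs(eigs) < 1.0))
```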
For each combination of regressor variables, R and Σ (42 combinations),
one hundred samples were generated, and the means and mean square errors of
eight estimators were estimated. Those considered were
    β̂(1) = (X'X)⁻¹X'y   (OLS)

    β̂(2) = (X'(Σ̂⁻¹ ⊗ I_T)X)⁻¹ X'(Σ̂⁻¹ ⊗ I_T)y   (SUR)

    β̂_o = β̂(3) = (X'P_o'(Σ⁻¹ ⊗ I_{T-1})P_oX)⁻¹ X'P_o'(Σ⁻¹ ⊗ I_{T-1})P_oy   (approx. GLS)

    β̂ = β̂(4) = (X'P'(Σ⁻¹ ⊗ I_T)PX)⁻¹ X'P'(Σ⁻¹ ⊗ I_T)Py   (GLS)

    b_o = β̂(5) = (X'P̂_o'(Σ̂⁻¹ ⊗ I_{T-1})P̂_oX)⁻¹ X'P̂_o'(Σ̂⁻¹ ⊗ I_{T-1})P̂_oy   (approx. EGLS)

    b = β̂(6) = (X'P̂'(Σ̂⁻¹ ⊗ I_T)P̂X)⁻¹ X'P̂'(Σ̂⁻¹ ⊗ I_T)P̂y   (EGLS)

    β̂(7) = β̂(5) with R assumed to be diagonal

    β̂(8) = β̂(6) with R assumed to be diagonal
To summarize the relative efficiency of each estimator we took the ratio of the
mean square error of each coefficient for each estimator to that of GLS and,
for each estimator, these were averaged over all coefficients.⁸
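The summary measure can be sketched as follows (the per-coefficient mean square errors are invented for illustration):

```python
import numpy as np

# Invented per-coefficient MSEs for one estimator and for GLS.
mse_estimator = np.array([0.12, 0.30, 0.06])
mse_gls = np.array([0.10, 0.25, 0.05])

# Coefficient-wise MSE ratio relative to GLS, averaged over coefficients.
avg_ratio = float(np.mean(mse_estimator / mse_gls))
```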
The results for r = 0.8 are presented in Table 1. We chose not to report
the results for r = 0.0 and r = 0.2 since, in line with the analytical results
of the previous section, the effect of different values of r turned out to be
fairly weak.⁹ In Table 1, the rows and columns are numbered for ease of reference
with the row numbers corresponding to the eight estimators. For example, entry
(6, 11) refers to the case of the EGLS estimator applied to the model in which
R is non-diagonal with roots (-0.9, 0.6, 0.6) and the regressors are not trending.
The following conclusions emerge:
1. Some efficiency is lost by ignoring the first observations, but not
very much. The loss is greatest when a root of R is close to +1 and the
regressors are trending. (Compare rows 3 and 4.)

2. The loss of efficiency in moving from the infeasible estimators GLS and
approximate GLS to the feasible estimators EGLS and approximate EGLS
is very large. This is particularly so when R is non-diagonal, and
when a root of R lies near +1. (Compare rows 4, 6 and 8; and rows 3, 5
and 7.)

3. The inefficiency of approximate EGLS relative to EGLS is negligible.

4. When R and Σ are estimated, it is better to assume R is diagonal, even
when it is not (compare rows 5 and 7, and rows 6 and 8, for columns 7 to 12).
When R = 0, assuming a diagonal R results in only a slight loss in
efficiency (compare rows 2 and 7, and rows 2 and 8, for columns 13 and 14).

5. In no case can the use of OLS be justified.
5. Concluding Remarks
In this study we have examined the efficiency of approximate GLS procedures
(both feasible and infeasible) relative to the corresponding GLS procedures
for seemingly unrelated regression models with a vector autoregressive disturbance.
In the case of a single equation, application of GLS involves negligible extra
effort relative to approximate GLS. However, in the case of multiple equations
this is not the case. Our interest is in whether the extra efficiency obtained
by using GLS is worth the considerable extra labour.

This question has also been considered by Maeshiro (1980). Arguing on
the basis of results obtained when the autoregressive parameters are known, he
concludes that "the only viable and reliable multiequation estimator of a SUR
model with AR(1) disturbances is SURGLS".

The findings of this study strongly disagree with this statement. With
the three equation model with two explanatory variables in each equation, we
did not find any cases where approximate (infeasible) GLS was considerably
inferior to (infeasible) GLS. This is despite the fact that we had setups
where the explanatory variables were trending and the roots of R were close
to ±1. The worst cases were when a root of R was close to +1 (columns 1, 2,
7 and 8 of Table 1); but even in these cases approximate GLS was always at
least 78 per cent efficient. When we moved from the infeasible to the feasible
estimators we found that the resulting loss in efficiency completely swamps the
gain in efficiency obtained from using (feasible) GLS rather than approximate
(feasible) GLS. Also, relative to (feasible) GLS, the approximate estimator
was in all cases at least 85 per cent efficient, and in most cases 95 per cent
efficient.

Our experiments also indicated that the best strategy is always to assume
a diagonal autoregressive structure. When there is no autoregression there
is very little loss in efficiency. When there is non-diagonal autoregression,
this mis-specification seems to result in substantial gains in precision.
Appendix A
The loss of efficiency α is defined by

    1 + α = |V_o| |V|⁻¹,

where

    V⁻¹ = X'P'(Σ⁻¹ ⊗ I_T)PX,

    V_o⁻¹ = X'P_o'(Σ⁻¹ ⊗ I_{T-1})P_oX.

Consider

    M = P'(Σ⁻¹ ⊗ I_T)P − P_o'(Σ⁻¹ ⊗ I_{T-1})P_o

and let M_ij denote the (i, j) T x T block. Now the (i, j) blocks of P and
P_o are related through P_ij' = (p_ij, P_oij'), where p_ij' = (a_ij, 0, ..., 0).
Thus, terms like

    σ^{kℓ} P_ki' P_ℓj = σ^{kℓ}(p_ki p_ℓj' + P_oki' P_oℓj)

will differ from σ^{kℓ} P_oki' P_oℓj only in the (1, 1) element. This implies
that M_ij must be zero everywhere except in the (1, 1) element. Defining the
T-vector

    ι_T' = [1, 0, 0, ..., 0],

we have that

    M_ij = λ_ij ι_T ι_T',

where λ_ij is some scalar. Thus,

    M = Λ ⊗ ι_T ι_T',   (A1)

where (Λ)_ij = λ_ij. By appropriately partitioning the matrices P, P_o,
Σ⁻¹ ⊗ I_T and Σ⁻¹ ⊗ I_{T-1}, it can be easily shown that

    Λ = A'Σ⁻¹A.   (A2)

Pre-multiplying (A1) by X' and post-multiplying by X,

    X'MX = V⁻¹ − V_o⁻¹ = X'(Λ ⊗ ι_T ι_T')X = xΛx',

where

    x' = diag(x_1', x_2', ..., x_N')   (A3)

and x_i' is the K_i-row vector of the first observation on the regressors of
the i-th equation. Thus

    V⁻¹ = V_o⁻¹ + xΛx'.   (A4)

Taking the determinant of (A4),

    1 + α = |I_K + V_o xΛx'|.   (A5)

The K x K matrix V_o xΛx' is of rank N at most and N < K, which implies that
V_o xΛx' is singular. In order to express (A5) in terms of non-singular
matrices we use the following standard matrix algebra result [see for example
Dhrymes (1978, p.472)]. Suppose A is K x N and B is N x K with N < K. Then

    |μI_K − AB| = μ^{K−N} |μI_N − BA|.

Thus, setting μ = −1, A = V_o x and B = Λx', (A5) may be written in the form

    1 + α = |I_N + Λx'V_o x|.   (A6)

The N x N matrix Λx'V_o x is non-singular. If we define a matrix Q to satisfy
Q'Q = Σ⁻¹, then Λ = (QA)'(QA) and (A6) may be written as

    1 + α = |I_N + φ|,

where

    φ = (QA)(x'V_o x)(QA)'   (A7)

[see Dhrymes (1978, p.470)]. Thus, defining the characteristic function f(μ)
of φ by

    f(μ) = |μI_N − φ|,

it follows that

    α = (−1)^N f(−1) − 1.   (A8)
Appendix B

In Section 2 the method for obtaining A is outlined. The steps are

(i) Solve Θ = RΘR' + Σ for Θ;

(ii) Factorize Σ = HH' and Θ = BB', where H and B are lower triangular;

(iii) A = HB⁻¹.

When R is diagonal, application of this procedure is particularly simple.
Using the scaled version of Σ, given by

    Σ = | 1  r |
        | r  1 |,

we obtain

    Θ = | (1−ρ_11²)⁻¹       r(1−ρ_11ρ_22)⁻¹ |
        | r(1−ρ_11ρ_22)⁻¹   (1−ρ_22²)⁻¹     |,

    H = | 1       0      |
        | r   (1−r²)^½   |,

and

    B⁻¹ = | (1−ρ_11²)^½                           0                    |
          | −rΔ^{-½}(1−ρ_11²)^½(1−ρ_11ρ_22)⁻¹     Δ^{-½}(1−ρ_11²)^{-½} |,

where

    Δ ≡ det Θ = [(1−ρ_11ρ_22)² − r²(1−ρ_11²)(1−ρ_22²)] / [(1−ρ_11²)(1−ρ_22²)(1−ρ_11ρ_22)²].   (B1)

Thus,

    A = HB⁻¹.

Denoting by a_ij the elements of A (with a_12 = 0), we obtain

    [(QA)L(QA)']_11 = ½λ_1²[(a_11+a_21)²ℓ_11 + 2a_22(a_11+a_21)ℓ_12 + a_22²ℓ_22],

    [(QA)L(QA)']_22 = ½λ_2²[(a_11−a_21)²ℓ_11 − 2a_22(a_11−a_21)ℓ_12 + a_22²ℓ_22].   (B2)

As λ_1² + λ_2² = 2(1−r²)⁻¹ and λ_1² − λ_2² = −2r(1−r²)⁻¹,

    tr[(QA)L(QA)'] = (1−r²)⁻¹[(a_11² + a_21² − 2ra_11a_21)ℓ_11 + a_22²ℓ_22 + 2a_22(a_21 − ra_11)ℓ_12].

On substitution of the a_ij from A = HB⁻¹ we obtain

    tr[(QA)L(QA)'] = Σ_{i=1}^{2} Σ_{j=1}^{2} f_ij (1−ρ_ii²)^½(1−ρ_jj²)^½ |x_i| |x_j| m_ij / [(1−ρ_ii)(1−ρ_jj)],

where f_ij ≡ f_ij(ρ_11, ρ_22, r) and

    f_11(ρ_11, ρ_22, r) = 1 + r²(1−ρ_11²)(1−ρ_22²) / [(1−ρ_11ρ_22)² − r²(1−ρ_11²)(1−ρ_22²)],   (B3)

    f_22(ρ_11, ρ_22, r) = (1−ρ_11ρ_22)² / [(1−ρ_11ρ_22)² − r²(1−ρ_11²)(1−ρ_22²)],   (B4)

    f_12(ρ_11, ρ_22, r) = −r(1−ρ_11²)^½(1−ρ_22²)^½(1−ρ_11ρ_22) / [(1−ρ_11ρ_22)² − r²(1−ρ_11²)(1−ρ_22²)].   (B5)

As ρ_11, ρ_22 → 1, f_11 → 1, f_22 → 1 and f_12 → 0.
References
Beach, C.M. and J.G. Mackinnon, 1978, "A Maximum Likelihood Procedure for
Regression with Autocorrelated Errors", Econometrica, 46, 51-58.
Beach, C.M. and J.G. Mackinnon, 1979, "Maximum Likelihood Estimation of
Singular Equation Systems with Autoregressive Disturbances",
International Economic Review, 20, 459-464.
Chipman, J.S., 1979, "Efficiency of Least Squares Estimation of Linear Trend
when Residuals are Autocorrelated", Econometrica, 47, 115-128.
Dhrymes, P.J., 1978, Introductory Econometrics, New York: Springer-Verlag.
Doran, H.E., 1981, "Omission of an Observation from a Regression Analysis:
A Discussion on Efficiency Loss, with Applications", Journal of
Econometrics, forthcoming.
Graybill, F.A., 1969, Introduction to Matrices with Applications in Statistics,
Belmont, Calif: Wadsworth.
Guilkey, D.K. and P. Schmidt, 1973, "Estimation of Seemingly Unrelated
Regressions with Vector Autoregressive Errors", Journal of the American
Statistical Association, 68, 642-647.
Kadiyala, K.R., 1968, "A Transformation Used to Circumvent the Problem of
Autocorrelation", Econometrica, 36, 93-96.
Kmenta, J. and R.F. Gilbert, 1970, "Estimation of Seemingly Unrelated Regressions
with Autoregressive Disturbances", Journal of the American Statistical
Association, 65, 196-197.
Maeshiro, A., 1979, "On the Retention of the First Observation in Serial
Correlation Adjustment", International Economic Review, 20, 259-265.
Maeshiro, A., 1980, "New Evidence on the Small Sample Properties of Estimators
of SUR Models with Autocorrelated Disturbances: Things done half-way
may not be done right", Journal of Econometrics, 12, 177-188.
Park, R.E. and B.M. Mitchell, 1980, "Estimating the Autocorrelated Error Model
with Trended Data", Journal of Econometrics, 13, 185-202.
Parks, R.W., 1967, "Efficient Estimation of a System of Regression Equations
when Disturbances are Both Serially and Contemporaneously Correlated",
Journal of the American Statistical Association, 62, 500-509.
Poirier, D.J., 1978, "The Effect of the First Observation in Regression Models
with First-Order Autoregressive Disturbances", Applied Statistics, 27,
67-68.
Prais, S.J. and C.B. Winsten, 1954, "Trend Estimators and Serial Correlation",
Chicago: Cowles Commission Discussion Paper No. 383.
Spitzer, J.J., 1979, "Small-Sample Properties of Nonlinear Least Squares and
Maximum Likelihood Estimators in the Context of Autocorrelated Errors",
Journal of the American Statistical Association, 74, 41-47.
Zellner, A., 1962, "An Efficient Method of Estimating Seemingly Unrelated
Regressions and Tests of Aggregation Bias", Journal of the American
Statistical Association, 57, 348-368.
Footnotes
¹Maeshiro avoided the assumption of stationary disturbances because he thought
that, under this assumption, calculation of the covariance matrix of the GLS
estimator would require a large matrix inversion. However, as demonstrated in
Section 2, this is not the case.
²When N = 1, A, Θ and Σ are scalars and it is easily seen that A = (1 − ρ²)^½.
³The diagonal elements of H and B are chosen to be positive, ensuring the
uniqueness of A.
⁴The expression for α when N = 1 is the same as that in Doran (1981).
⁵It is straightforward (though tedious) to show that if X_i (i = 1, 2, ..., N)
is transformed to X_i/σ_ii^½, α is unchanged and Σ becomes a correlation matrix.
⁶Q is not unique. However, it is easily verified that, regardless of the
choice of Q satisfying (3.5), the characteristic function (3.3) is unique.
⁷In the preliminary stages of the development of the computer program we also
examined a model with two equations, and two explanatory variables plus a
constant in each equation. Two sample sizes (T = 20 and T = 40) were employed,
and our findings were consistent with those reported below.
⁸This measure of efficiency differs from that used to derive the analytical
results of Section 3, but it was more convenient from the standpoint of the
Monte Carlo experiment.
⁹Interested readers may obtain the full set of results from the authors.