Upload
ehab-zuiter
View
60
Download
2
Embed Size (px)
Citation preview
Information Criteria, Model Selection
Uncertainty and the Determination of
Cointegration Rank
George Kapetanios
National Institute of Economic and Social Research
2 Dean Trench Street, Smith Square,
London SW1P 3HE, UK
May 23, 2000
1 Introduction
The determination of the cointegration rank of a multivariate cointegrated
system has attracted considerable attention in the econometric literature for
the past �fteen years. The most widely used procedures for determining
cointegration rank are those proposed by Johansen (1988). Alternative test-
ing procedures have been suggested among others by Phillips and Ouliaris
(1988), Stock and Watson (1988), Snell (1999) and Bierens (1997).
Nevertheless, the work carried out in the above papers assumes that the
lag order of the vector autoregressive process representing the multivariate
system is known. Since such an assumption is unlikely to hold in reality,
the question of how the lag order selection may a�ect the determination of
the cointegration rank arises. This paper aims to address this question in
a Monte Carlo framework. Additionally, we provide a formal justi�cation
for the application of model selection criteria in selecting the cointegration
rank. More speci�cally we will show that the standard necessary and suÆ-
cient conditions for a criterion to be weakly consistent in lag order selection
extend to the determination of the cointegration rank.
The Monte Carlo investigation tries to shed light on which criterion per-
forms best in determining the cointegration rank in terms of minimising the
1
root mean square error of the rank estimate. The Johansen maximum eigen-
value and trace procedures are also included in the comparison. The results
indicate that not knowing the lag order a�ects signi�cantly the performance
of all methods.
The paper is organised as follows: Section 2 discusses the conditions for
weak consistency of rank determination for information criteria and related
theoretical results. Section 3 discusses alternative methods for jointly de-
termining lag order and cointegration rank in the context of setting up the
Monte Carlo experiments. Section 4 presents the Monte Carlo results. Fi-
nally, Section 5 concludes.
2 Weak Consistency of Information Criteria
We assume that the multivariate system may be written as anm-dimensional
VAR(k) process given by
yt = �+�1yt�1 + : : :+�kyt�k + �t t = 1; : : : ; T (1)
where the error term �t is a zero mean i.i.d. vector with �nite positive de�nite
covariance matrix. This VAR(k) process will be referred to as cointegrated
of rank r if � = I � �1 � : : : � �k has rank r. In this case the matrix �
may be decomposed as � = ��0 where � and � are matrices of dimension
m� r. The error correction representation1 of the system is given by
�yt = ���yt�1 +1�yt�1 + : : :+k�1�yt�k+1 (2)
Standard analysis considers the problem of determining r while assuming
that k is known. Clearly this assumption is not satis�ed in most practical
situations and the usual practice involves the determination of the lag order
of the VAR preocess using model selection criteria such as the Akaike (1974)
(AIC), Schwarz (1978) (BIC) or Hannan and Quinn (1979) (HQ) information
criteria prior to carrying out standard cointegration analysis.
The general form of the loss function minimised by information criteria
is given by
IC(s) = �lT (�) + cT (s) (3)
where lt(�) is the loglikelihood of the model, s is the number of free parame-
ters and ct(s) is a penalty term promoting model parsimony. For the special
1We assume that the error correction representation exists by imposing extra conditions
such as e.g. condition (iii) of Phillips and Chao (1999).
2
case of lag selection in VAR models the loss function takes the form.
IC(k) = T ln j�̂(k)j+ cT (k) (4)
where �̂(k) is the pseudo maximum likelihood2 estimate of the covariance
matrix of the error term �t from a VAR model with lag order equal to k.
The �rst term of IC(k) is simply twice the pseudo-loglikelihood of a VAR(k)
model. cT (k) is a penalty term which di�ers from criterion to criterion and
depends on the number of free parameters and may also depend on the sam-
ple size T . For the three common information criteria the penalty terms are:
2km (AIC), k ln(T ) (BIC) and 2k ln(ln(T )) (HQ). The lag order chosen is
that for which IC(k) is minimised. Paulsen (1984) has shown that for any
VAR process with s unit roots where s < mp and mp � s roots outside the
unit circle, lag order selection with information criteria will be weakly con-
sistent i� cT (k)!1 and cT (k)=T ! 0 as T !1 and cT (k) is bounded in
k. Clearly whereas BIC and HQ are weakly consistent for lag order selection,
AIC is not. This is a well known result for AIC (see e.g. Shibata (1976)).
Note that we choose to have a general expression for the penalty term to
accommodate other less widely used criteria such as e.g. the generalised in-
formation criterion (see Konishi and Kitagawa (1996)).
We will show that for known lag order k0, information criteria whose
penalty terms satisfy these conditions are consistent in selecting the rank
of a cointegrated VAR(k) process. The condition that k is known is of no
consequence asymptotically since it is possible to choose the lag length con-
sistently through information criteria prior to using the same criteria to pick
the cointegration rank consistently.
Phillips and Chao (1999) suggest that the selection of the lag order and
the cointegration rank be carried out jointly and prove the consistency of
a new criterion called the posterior information criterion in this joint de-
termination. In the Monte Carlo study that follows we will contrast the
performance of the criteria in joint and sequential lag and cointegration rank
determination.
Theorem 1 Let yt follow a VAR(k0) process of the form given in 1 with
cointegration rank 0 � r0� m. Then, a criterion with penalty term cT (k
0; r)
will provide an estimate, r̂ which will be weakly consistent for r0 i� cT (k0; r)!
1 and cT (k0; r)=T ! 0 as T !1 and cT (k
0; r) is bounded in k
0 and r.
2Equivalent to the OLS estimate
3
The proof of the theorem is straightforward once use of the results derived
by Johansen (1988) is made. To simplify matters we will assume that the
VAR process has zero mean and that no constant is incuded in the estimation.
Extension to models with deterministic terms are straightforward. More
speci�cally we need to show that the rise of the loglikelihood coming from
increasing r dominates the fall of the penalty term when r < r0 and vice
versa when r > r0. Let �Y = (�y0; : : : ;�yT ), Y �p = (y1�p; : : : ;yT�p),
M = I��Y 0(�Y �Y 0)�1�Y ,R0 = �YM ,R1 = Y �pM , Sij = RiR0
j=T .
Finally, letG be the lower tringular Cholesky decomposition of S11 and �̂1 �
: : : � �̂m be the eigenvalues of GS10S�1
00 S01G0. Then by, say, Proposition
11.1 of L�utkepohl (1991) the di�erence in the loglikelihood of the cointegrated
VAR(k0) for cointegration ranks r1, r0, r1 > r0 is equal to �T
2
Pr1i=r0+1
ln(1�
�̂i). For r0 < r1 < r0 it is clear that T
2
Pr1i=r0+1
ln(1� �̂i) = Op(T ) providing
the required result for r < r0. For r > r
0 we �rstly have that �̂r!
p 0 by
Lemma 4 of Johansen (1988). By a simple expansion we then have that
T ln(1� �̂r) = �T �̂r + op(1). But by Lemma 6 of Johansen (1988) we have
that T �̂r0+1; : : : ; T �̂m converge to the ordered eigenvalues of the equation�����Z
1
0
WW0
du�
Z1
0
W dW0
Z1
0
dWW0
���� = 0
But by the discussion of the distribution of the sum of those eigenvalues in
Larsson (1999) we see that T �̂r0+1; : : : ; T �̂m will asymptotically have a distri-
bution of unbounded support. Therefore, for r0 > r1 > r0, T
2
Pr1i=r0+1
ln(1�
�̂i) = Op(1) providing the required result for r > r0.
Following Pesaran and Pesaran (1997) we specify the number of free
parameters for a model with no intercept or time trend to be equal to
s = m2(k0 � 1) + 2mr � r
2. The penalty terms for AIC, BIC and HQ
are then given respectively by 2s, s ln(T ) and 2s ln(ln(T )) or equivalently by
2~s, ~s ln(T ) and 2~s ln(ln(T )) where ~s = �(m� r)2.
It is clear that AIC is not consistent in rank determination. Nevertheless
it is also clear that the probability of picking a rank which is lower than the
true rank goes to zero asymptotically. In fact the following theorem provides
the means for determining the asymptotic probabilities that AIC will pick a
higher rank than the true one.
Theorem 2 Consider the VAR model of (1). The asymptotic distribution
4
of the rank estimate obtained though AIC, is given by
P (r̂AIC = r) =
�0 if r < r
0
pr if r � r0
and pr are given by expression 7 below.
For r < r0 the result is obvious. Therefore we concentrate on r > r
0. We
will denote the loss function used by AIC by AIC(r) = �lT (r) + 2~s where
lT (r) is the loglikelihood of the model for cointegration rank, r. For r � r0
we have that
P (r̂AIC = r) = P (AIC(r) � AIC(u); r0 � u � m)
Disregarding constant terms with respect to r the loglikelihood is given by
lT (r) = �TPr
i=1ln(1� �̂i). Then,
P (AIC(r) � AIC(u); r0 � u � m) = (5)
P
T
rXi=u+1
ln(1� �̂i) � 2(k � r)2 � 2(k � u)2 (r0 � u < r) and
T
uXi=r+1
ln(1� �̂i) � 2(k � u)2 � 2(k � r)2 (r � u < m)
!
But T ln(1 � �̂r) = T �̂r + op(1). Therefore, for all T larger than a positive
integer M the di�erence between the probability on the RHS of (5) and
P
T
rXi=u+1
�̂i � 2(k � r)2 � 2(k � u)2 (r0 � u < r) and (6)
T
uXi=r+1
�̂i � 2(k � u)2 � 2(k � r)2 (r � u < m)
!
is less that � for any � > 0. De�ne Sl = (T�r0+1 � 2k + 1) + : : :+ (T�l+r0 �
2k + 2l � 1), 0 � l � m � r0 and S0 = 0. Then , the probability in (6) is
equal to
P�Sr�r0 � Su > 0 (1 � u < r � r
0) and (7)
Su � Sr�r0 < 0 (r � r0� u < m� r
0)�
The joint probability distribution of Sl may easily be obtained by simulation
using the standard results of Johansen (1988). For illustrative purposes we
note that for m = 5 and r0 = 2 the probabilities that AIC will choose a
5
rank of 2,3,4 and 5 are respectively 0.502, 0.353, 0.119 and 0.026 for model
with no constant. If a constant is included the above probabilities change to
0.224, 0.398, 0.325 and 0.053. Clearly the existence of deterministic terms
is crucial. The well-known results of Shibata (1976) on the extent of the
asymptotic overestimation of the lag order in stable univariate AR process
show that the probabilitities of selecting a higher lag order than the true
one were not that big. This is not the case here. The chances of selecting a
higher rank are substantial.
3 Monte Carlo Setup
Following the theoretical results presented above we see that the widely used
BIC and HQ criteria are consistent for determining the rank of a cointegrated
system and their use is therefore justi�ed. On the other hand AIC is not con-
sistent. This in it self may be of limited signi�cance since most reduced form
time series models are abstractions from the more complex true data gen-
eration process. Nevertheless, unlike the case of lag selection in stable AR
processes where the chance of picking a higher lag order is not that large,
our theoretical analysis has suggested that AIC will pick a higher rank with
high probability. In fact, the illustrative results we have presented show that
when a deterministic term is included in the VAR model the probability of
picking certain higher ranks that the true rank is larger that the probability
of picking the true rank.
This result is made more poignant if we consider the e�ect of impos-
ing too few unit roots on the system and then use it for forecasting. As
Christo�ersen and Diebold (1998) have pointed out the crucial factor for the
forecasting performance of a VAR model is imposing enough unit roots. The
loss in performance from imposing too many unit roots is smaller than that
of imposing too few.
We now concentrate on the small sample performance of the information
criteria and compare them with the standard testing procedures developed
by Johansen. The performance criterion will be the mean square error of the
rank estimates. We choose this measure because it incorporates information
about both the mean and the variance of the estimates. One major task of
the paper is to evaluate the e�ect of lag order selection on the estimation of
the cointegration rank. To gauge this e�ect we �rstly vary the maximum lag
order permitted and secondly we consider a number of alternative selection
procedures. We use the AIC, BIC and HQ criteria both in a sequential proce-
6
dure where the lag order is determined �rst followed by the determination of
the rank and in a joint procedure where all posible combinations of lag orders
and cointegration ranks are considered. This gives six procedures which will
be respectively denoted by AICAIC, BICBIC, HQHQ, AIC BIC and HQ. We
also consider all combinations of the three information criteria for lag order
selection and the two testing procedures for rank determination. These pro-
cedures will be denoted by AIC TR, AIC MAX, BIC TR, BIC MAX, HQ
TR and HQ MAX. In total, we have 12 alternative procedures to investigate.
Duw to space restrictions and similarity of results, combinations such as the
use of one criterion for lag selection and another for rank determination are
not reported. We consider six possible maximum lag orders ranging from 1
to 6.
We consider a number of data generation processes (DGP) to obtain as
widely relevant results as possible. ALL DGPs are of dimension 5. A system
of such dimension is relatively large but not excessively so. Initial experi-
mentation suggested that the true rank is not a very signi�cant determinant
of performance. We therefore �x it to 2.
The �rst two DGPs are VAR(1) processes with smaller and large stable
eigenvalues respectively. The third and fourth DGPs are VAR(2) processes
with stable dynamics of variable persistence. The same applies to the �fth
and sixth DGPs which are VAR(3) processes. The absolute values of the
eigenvalues of the DGPs are given in Table 1.
The data are generated as follows: We pick appropriate eigenvalues for
the coeÆcent matrices of the error correction representation of the system
such that when the system is writen in VAR representation its eigenvalues
will be those given in 1. Then to construct each of the coeÆcient matrices of
the error correction representation, generically denoted by , the following
procedure is followed: We construct an m � m matrix of pseudo-random
numbers from the standard normal distribution whose columns are then or-
thogonalised via the Gram-Schmidt process to give a matrix A. Then
is given by the Schur decomposition, A�A, where � is a diagonal matrix
whose diagonal elements are the eigenvalues of A. The error terms are stan-
dard normal pseudo-random variables. We consider samples of 50, 60, 70,
..., 190, 200 observations. The absolute value of the mean square error is not
very informative for comparative purposes. Therefore, we obtain the mean
square error of each procedure when no lag selection occurs, i.e. the lag order
is set to the true lag order. Then, for each sample size we get the minimum
of these mean square errors from all procedures and use this to normalise the
7
Table 1: Absolute Values of Eigenvalues of VAR DGPs
DGP
1 2 3 4 5 6
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
0.7 0.95 0.5 0.774 0.713 0.894
0.4 0.9 0.447 0.774 0.713 0.894
0.447 0.774 0.675 0.863
0.4 0.774 0.675 0.863
0.2 0.6 0.558 0.863
0.2 0.6 0.558 0.856
0.2 0.6 0.558 0.856
0.439 0.543
0.393 0.5
0.358 0.463
0.358 0.463
0.358 0.463
mean square errors from the complete procedures where lag order selection
is undertaken. The normalised mean square errors are presented in Figures 1
to 6 at the end of the paper.
4 Monte Carlo Results
The Monte Carlo results point clearly to a number of conclusions. Firstly,
AIC performs worse than all other procedures in almost all cases. The only
exception is DGP 2 where the cointegration e�ects are weak and, therefore,
AIC with its tendency to estimate a higher rank captures them. All proce-
dures seem to be adversely a�ected by an increase in the value of the maxi-
mum lag order considered. It is always the case that considering a higher lag
order than the true one increases the relative mean square error. This e�ect
is more apparent for DGP with less persistent stable dynamics. In fact, in
a number of cases where the lag order is higher than 1 the best outcome is
when the maximum lag order considered is equal to the true lag order but
considering a higher maximum lag order is as detrimental as considering a
lower one. HQ seems to be performing slightly better than BIC. It is ad-
ditionally performing better than MAX and TR in a number of occasions.
The information criterion used to determine the lag order before the TR and
8
MAX procedures are applied seems to be largely irrelevant, although it makes
some di�erence when DGPs with persistent stable dynamics are considered.
Finally, a joint search for lag order and rank (AIC, BIC, and HQ) has a bet-
ter performance compared to a sequential search (AICAIC, BICBIC, HQHQ)
for DGPs with persistent dynamics, but the opposite holds for AIC and HQ,
small samples and for DGPs with less persistent dynamics.
5 Conclusion
This paper has considered the asymptotic validity of using information cri-
teria for the determination of the cointegration rank of nonstationary multi-
variate systems. It was shown that the standard conditions for consistency
of lag order selection extend to this framework. As a result Akaike's informa-
tion criterion was found to be inconsistent. On its own this result has limited
relevance. However in the context of cointegrated model it was analytically
shown that AIC will choose a higher rank with high probability unlike the
case of lag order selection in AR models.
Additionally, the e�ect of lag order selection in the issue of determining
the cointegration rank has been investigated using Monte Carlo techniques.
We found a very signi�cant deterioration in the performance of the proce-
dures compared to the usually assumed case of a known lag order. AIC
performed very poorly in this context unlike BIC, HQ and the testing pro-
cedures by Johansen.
References
Akaike, H. (1974): \A New Look at the Statistical Model Identi�cation,"
I.E.E.E. Trans.Auto. Control.
Bierens, H. (1997): \Nonparametric Cointegration Analysis," Journal of
Econometrics, 77, 379{404.
Christoffersen, P. F., and F. X. Diebold (1998): \Cointegration and
Long Horizon Forecasting," Journal of Business and Economic Statistics,
16, 450{458.
Hannan, E. J., and B. G. Quinn (1979): \The Determination of the Order
of an Autoregression," Journal of the Royal Statistical Society (Series B),
41, 190{195.
9
Johansen, S. (1988): \Statistical Analysis of Cointegration Vectors," Jour-
nal of Economic Dynamics and Control, 12, 231{254.
Konishi, S., and G. Kitagawa (1996): \Generalised Information Criteria
in Model Selection," Biometrika, 83(4), 875{890.
Larsson, R. (1999): \Approximation of the Asymptotic Distribution of the
Log-Likelihood Ration Test for Cointegration," Econometric Theory, 15,
789{813.
L�utkepohl, H. (1991): Introduction to Multiple Time Series Analysis.
Springer, Berlin.
Paulsen, J. (1984): \Order Determination of Multivariate Autoregressive
Time Series with Unit Roots," Journal of Time Series Analysis, 5, 115{
127.
Pesaran, M. H., and B. Pesaran (1997): Micro�t 4.0. Oxford University
Press.
Phillips, P. C. B., and J. C. Chao (1999): \Model Selection in Par-
tially Nonstationary Vector Autoregressive Processes wiith Reduced Rank
Structure," Journal of Econometrics, 91, 227{272.
Phillips, P. C. B., and S. Ouliaris (1988): \Testing for Cointegration
Using Principle Components Methods," Journal of Economic Dynamics
and Control, 12, 205{230.
Schwarz, G. (1978): \Estimating the Dimension of a Model," Annals of
Statistics, pp. 461{464.
Shibata, R. (1976): \Selection of the Order of an Autoregressive Model by
Akaike's Information Criterion," Biometrika, 63(1), 117{126.
Snell, A. (1999): \Testing for r versus r�1 Cointegrating Vectors," Journal
of Econometrics, 88, 151{191.
Stock, J., and M. Watson (1988): \Testing for Common Trends," Jour-
nal of Americal Statistical Association, 83, 1097{1107.
10
11
12
13
14
15
16