AIC BIC HQ

Information Criteria, Model Selection

Uncertainty and the Determination of

Cointegration Rank

George Kapetanios

National Institute of Economic and Social Research

2 Dean Trench Street, Smith Square,

London SW1P 3HE, UK

May 23, 2000

1 Introduction

The determination of the cointegration rank of a multivariate cointegrated

system has attracted considerable attention in the econometric literature for

the past �fteen years. The most widely used procedures for determining

cointegration rank are those proposed by Johansen (1988). Alternative test-

ing procedures have been suggested among others by Phillips and Ouliaris

(1988), Stock and Watson (1988), Snell (1999) and Bierens (1997).

Nevertheless, the work carried out in the above papers assumes that the

lag order of the vector autoregressive process representing the multivariate

system is known. Since such an assumption is unlikely to hold in reality,

the question of how the lag order selection may a�ect the determination of

the cointegration rank arises. This paper aims to address this question in

a Monte Carlo framework. Additionally, we provide a formal justi�cation

for the application of model selection criteria in selecting the cointegration

rank. More speci�cally we will show that the standard necessary and suÆ-

cient conditions for a criterion to be weakly consistent in lag order selection

extend to the determination of the cointegration rank.

The Monte Carlo investigation tries to shed light on which criterion per-

forms best in determining the cointegration rank in terms of minimising the

1

root mean square error of the rank estimate. The Johansen maximum eigen-

value and trace procedures are also included in the comparison. The results

indicate that not knowing the lag order a�ects signi�cantly the performance

of all methods.

The paper is organised as follows: Section 2 discusses the conditions for

weak consistency of rank determination for information criteria and related

theoretical results. Section 3 discusses alternative methods for jointly de-

termining lag order and cointegration rank in the context of setting up the

Monte Carlo experiments. Section 4 presents the Monte Carlo results. Fi-

nally, Section 5 concludes.

2 Weak Consistency of Information Criteria

We assume that the multivariate system may be written as anm-dimensional

VAR(k) process given by

yt = �+�1yt�1 + : : :+�kyt�k + �t t = 1; : : : ; T (1)

where the error term �t is a zero mean i.i.d. vector with �nite positive de�nite

covariance matrix. This VAR(k) process will be referred to as cointegrated

of rank r if � = I � �1 � : : : � �k has rank r. In this case the matrix �

may be decomposed as � = ��0 where � and � are matrices of dimension

m� r. The error correction representation1 of the system is given by

�yt = ��yt�1 +1�yt�1 + : : :+k�1�yt�k+1 (2)

Standard analysis considers the problem of determining r while assuming

that k is known. Clearly this assumption is not satis�ed in most practical

situations and the usual practice involves the determination of the lag order

of the VAR preocess using model selection criteria such as the Akaike (1974)

(AIC), Schwarz (1978) (BIC) or Hannan and Quinn (1979) (HQ) information

criteria prior to carrying out standard cointegration analysis.

The general form of the loss function minimised by information criteria

is given by

IC(s) = �lT (�) + cT (s) (3)

where lt(�) is the loglikelihood of the model, s is the number of free parame-

ters and ct(s) is a penalty term promoting model parsimony. For the special

1We assume that the error correction representation exists by imposing extra conditions

such as e.g. condition (iii) of Phillips and Chao (1999).

2

case of lag selection in VAR models the loss function takes the form.

IC(k) = T ln j�̂(k)j+ cT (k) (4)

where �̂(k) is the pseudo maximum likelihood2 estimate of the covariance

matrix of the error term �t from a VAR model with lag order equal to k.

The �rst term of IC(k) is simply twice the pseudo-loglikelihood of a VAR(k)

model. cT (k) is a penalty term which di�ers from criterion to criterion and

depends on the number of free parameters and may also depend on the sam-

ple size T . For the three common information criteria the penalty terms are:

2km (AIC), k ln(T ) (BIC) and 2k ln(ln(T )) (HQ). The lag order chosen is

that for which IC(k) is minimised. Paulsen (1984) has shown that for any

VAR process with s unit roots where s < mp and mp � s roots outside the

unit circle, lag order selection with information criteria will be weakly con-

sistent i� cT (k)!1 and cT (k)=T ! 0 as T !1 and cT (k) is bounded in

k. Clearly whereas BIC and HQ are weakly consistent for lag order selection,

AIC is not. This is a well known result for AIC (see e.g. Shibata (1976)).

Note that we choose to have a general expression for the penalty term to

accommodate other less widely used criteria such as e.g. the generalised in-

formation criterion (see Konishi and Kitagawa (1996)).

We will show that for known lag order k0, information criteria whose

penalty terms satisfy these conditions are consistent in selecting the rank

of a cointegrated VAR(k) process. The condition that k is known is of no

consequence asymptotically since it is possible to choose the lag length con-

sistently through information criteria prior to using the same criteria to pick

the cointegration rank consistently.

Phillips and Chao (1999) suggest that the selection of the lag order and

the cointegration rank be carried out jointly and prove the consistency of

a new criterion called the posterior information criterion in this joint de-

termination. In the Monte Carlo study that follows we will contrast the

performance of the criteria in joint and sequential lag and cointegration rank

determination.

Theorem 1 Let yt follow a VAR(k0) process of the form given in 1 with

cointegration rank 0 � r0� m. Then, a criterion with penalty term cT (k

0; r)

will provide an estimate, r̂ which will be weakly consistent for r0 i� cT (k0; r)!

1 and cT (k0; r)=T ! 0 as T !1 and cT (k

0; r) is bounded in k

0 and r.

2Equivalent to the OLS estimate

3

The proof of the theorem is straightforward once use of the results derived

by Johansen (1988) is made. To simplify matters we will assume that the

VAR process has zero mean and that no constant is incuded in the estimation.

Extension to models with deterministic terms are straightforward. More

speci�cally we need to show that the rise of the loglikelihood coming from

increasing r dominates the fall of the penalty term when r < r0 and vice

versa when r > r0. Let �Y = (�y0; : : : ;�yT ), Y �p = (y1�p; : : : ;yT�p),

M = I��Y 0(�Y �Y 0)�1�Y ,R0 = �YM ,R1 = Y �pM , Sij = RiR0

j=T .

Finally, letG be the lower tringular Cholesky decomposition of S11 and �̂1 �

: : : � �̂m be the eigenvalues of GS10S�1

00 S01G0. Then by, say, Proposition

11.1 of L�utkepohl (1991) the di�erence in the loglikelihood of the cointegrated

VAR(k0) for cointegration ranks r1, r0, r1 > r0 is equal to �T

2

Pr1i=r0+1

ln(1�

�̂i). For r0 < r1 < r0 it is clear that T

2

Pr1i=r0+1

ln(1� �̂i) = Op(T ) providing

the required result for r < r0. For r > r

0 we �rstly have that �̂r!

p 0 by

Lemma 4 of Johansen (1988). By a simple expansion we then have that

T ln(1� �̂r) = �T �̂r + op(1). But by Lemma 6 of Johansen (1988) we have

that T �̂r0+1; : : : ; T �̂m converge to the ordered eigenvalues of the equation��Z

1

0

WW0

du�

Z1

0

W dW0

Z1

0

dWW0

�� = 0

But by the discussion of the distribution of the sum of those eigenvalues in

Larsson (1999) we see that T �̂r0+1; : : : ; T �̂m will asymptotically have a distri-

bution of unbounded support. Therefore, for r0 > r1 > r0, T

2

Pr1i=r0+1

ln(1�

�̂i) = Op(1) providing the required result for r > r0.

Following Pesaran and Pesaran (1997) we specify the number of free

parameters for a model with no intercept or time trend to be equal to

s = m2(k0 � 1) + 2mr � r

2. The penalty terms for AIC, BIC and HQ

are then given respectively by 2s, s ln(T ) and 2s ln(ln(T )) or equivalently by

2~s, ~s ln(T ) and 2~s ln(ln(T )) where ~s = �(m� r)2.

It is clear that AIC is not consistent in rank determination. Nevertheless

it is also clear that the probability of picking a rank which is lower than the

true rank goes to zero asymptotically. In fact the following theorem provides

the means for determining the asymptotic probabilities that AIC will pick a

higher rank than the true one.

Theorem 2 Consider the VAR model of (1). The asymptotic distribution

4

of the rank estimate obtained though AIC, is given by

P (r̂AIC = r) =

�0 if r < r

0

pr if r � r0

and pr are given by expression 7 below.

For r < r0 the result is obvious. Therefore we concentrate on r > r

0. We

will denote the loss function used by AIC by AIC(r) = �lT (r) + 2~s where

lT (r) is the loglikelihood of the model for cointegration rank, r. For r � r0

we have that

P (r̂AIC = r) = P (AIC(r) � AIC(u); r0 � u � m)

Disregarding constant terms with respect to r the loglikelihood is given by

lT (r) = �TPr

i=1ln(1� �̂i). Then,

P (AIC(r) � AIC(u); r0 � u � m) = (5)

P

T

rXi=u+1

ln(1� �̂i) � 2(k � r)2 � 2(k � u)2 (r0 � u < r) and

T

uXi=r+1

ln(1� �̂i) � 2(k � u)2 � 2(k � r)2 (r � u < m)

!

But T ln(1 � �̂r) = T �̂r + op(1). Therefore, for all T larger than a positive

integer M the di�erence between the probability on the RHS of (5) and

P

T

rXi=u+1

�̂i � 2(k � r)2 � 2(k � u)2 (r0 � u < r) and (6)

T

uXi=r+1

�̂i � 2(k � u)2 � 2(k � r)2 (r � u < m)

!

is less that � for any � > 0. De�ne Sl = (T�r0+1 � 2k + 1) + : : :+ (T�l+r0 �

2k + 2l � 1), 0 � l � m � r0 and S0 = 0. Then , the probability in (6) is

equal to

P�Sr�r0 � Su > 0 (1 � u < r � r

0) and (7)

Su � Sr�r0 < 0 (r � r0� u < m� r

0)�

The joint probability distribution of Sl may easily be obtained by simulation

using the standard results of Johansen (1988). For illustrative purposes we

note that for m = 5 and r0 = 2 the probabilities that AIC will choose a

5

rank of 2,3,4 and 5 are respectively 0.502, 0.353, 0.119 and 0.026 for model

with no constant. If a constant is included the above probabilities change to

0.224, 0.398, 0.325 and 0.053. Clearly the existence of deterministic terms

is crucial. The well-known results of Shibata (1976) on the extent of the

asymptotic overestimation of the lag order in stable univariate AR process

show that the probabilitities of selecting a higher lag order than the true

one were not that big. This is not the case here. The chances of selecting a

higher rank are substantial.

3 Monte Carlo Setup

Following the theoretical results presented above we see that the widely used

BIC and HQ criteria are consistent for determining the rank of a cointegrated

system and their use is therefore justi�ed. On the other hand AIC is not con-

sistent. This in it self may be of limited signi�cance since most reduced form

time series models are abstractions from the more complex true data gen-

eration process. Nevertheless, unlike the case of lag selection in stable AR

processes where the chance of picking a higher lag order is not that large,

our theoretical analysis has suggested that AIC will pick a higher rank with

high probability. In fact, the illustrative results we have presented show that

when a deterministic term is included in the VAR model the probability of

picking certain higher ranks that the true rank is larger that the probability

of picking the true rank.

This result is made more poignant if we consider the e�ect of impos-

ing too few unit roots on the system and then use it for forecasting. As

Christo�ersen and Diebold (1998) have pointed out the crucial factor for the

forecasting performance of a VAR model is imposing enough unit roots. The

loss in performance from imposing too many unit roots is smaller than that

of imposing too few.

We now concentrate on the small sample performance of the information

criteria and compare them with the standard testing procedures developed

by Johansen. The performance criterion will be the mean square error of the

rank estimates. We choose this measure because it incorporates information

about both the mean and the variance of the estimates. One major task of

the paper is to evaluate the e�ect of lag order selection on the estimation of

the cointegration rank. To gauge this e�ect we �rstly vary the maximum lag

order permitted and secondly we consider a number of alternative selection

procedures. We use the AIC, BIC and HQ criteria both in a sequential proce-

6

dure where the lag order is determined �rst followed by the determination of

the rank and in a joint procedure where all posible combinations of lag orders

and cointegration ranks are considered. This gives six procedures which will

be respectively denoted by AICAIC, BICBIC, HQHQ, AIC BIC and HQ. We

also consider all combinations of the three information criteria for lag order

selection and the two testing procedures for rank determination. These pro-

cedures will be denoted by AIC TR, AIC MAX, BIC TR, BIC MAX, HQ

TR and HQ MAX. In total, we have 12 alternative procedures to investigate.

Duw to space restrictions and similarity of results, combinations such as the

use of one criterion for lag selection and another for rank determination are

not reported. We consider six possible maximum lag orders ranging from 1

to 6.

We consider a number of data generation processes (DGP) to obtain as

widely relevant results as possible. ALL DGPs are of dimension 5. A system

of such dimension is relatively large but not excessively so. Initial experi-

mentation suggested that the true rank is not a very signi�cant determinant

of performance. We therefore �x it to 2.

The �rst two DGPs are VAR(1) processes with smaller and large stable

eigenvalues respectively. The third and fourth DGPs are VAR(2) processes

with stable dynamics of variable persistence. The same applies to the �fth

and sixth DGPs which are VAR(3) processes. The absolute values of the

eigenvalues of the DGPs are given in Table 1.

The data are generated as follows: We pick appropriate eigenvalues for

the coeÆcent matrices of the error correction representation of the system

such that when the system is writen in VAR representation its eigenvalues

will be those given in 1. Then to construct each of the coeÆcient matrices of

the error correction representation, generically denoted by , the following

procedure is followed: We construct an m � m matrix of pseudo-random

numbers from the standard normal distribution whose columns are then or-

thogonalised via the Gram-Schmidt process to give a matrix A. Then

is given by the Schur decomposition, A�A, where � is a diagonal matrix

whose diagonal elements are the eigenvalues of A. The error terms are stan-

dard normal pseudo-random variables. We consider samples of 50, 60, 70,

..., 190, 200 observations. The absolute value of the mean square error is not

very informative for comparative purposes. Therefore, we obtain the mean

square error of each procedure when no lag selection occurs, i.e. the lag order

is set to the true lag order. Then, for each sample size we get the minimum

of these mean square errors from all procedures and use this to normalise the

7

Table 1: Absolute Values of Eigenvalues of VAR DGPs

DGP

1 2 3 4 5 6

1 1 1 1 1 1

1 1 1 1 1 1

1 1 1 1 1 1

0.7 0.95 0.5 0.774 0.713 0.894

0.4 0.9 0.447 0.774 0.713 0.894

0.447 0.774 0.675 0.863

0.4 0.774 0.675 0.863

0.2 0.6 0.558 0.863

0.2 0.6 0.558 0.856

0.2 0.6 0.558 0.856

0.439 0.543

0.393 0.5

0.358 0.463

0.358 0.463

0.358 0.463

mean square errors from the complete procedures where lag order selection

is undertaken. The normalised mean square errors are presented in Figures 1

to 6 at the end of the paper.

4 Monte Carlo Results

The Monte Carlo results point clearly to a number of conclusions. Firstly,

AIC performs worse than all other procedures in almost all cases. The only

exception is DGP 2 where the cointegration e�ects are weak and, therefore,

AIC with its tendency to estimate a higher rank captures them. All proce-

dures seem to be adversely a�ected by an increase in the value of the maxi-

mum lag order considered. It is always the case that considering a higher lag

order than the true one increases the relative mean square error. This e�ect

is more apparent for DGP with less persistent stable dynamics. In fact, in

a number of cases where the lag order is higher than 1 the best outcome is

when the maximum lag order considered is equal to the true lag order but

considering a higher maximum lag order is as detrimental as considering a

lower one. HQ seems to be performing slightly better than BIC. It is ad-

ditionally performing better than MAX and TR in a number of occasions.

The information criterion used to determine the lag order before the TR and

8

MAX procedures are applied seems to be largely irrelevant, although it makes

some di�erence when DGPs with persistent stable dynamics are considered.

Finally, a joint search for lag order and rank (AIC, BIC, and HQ) has a bet-

ter performance compared to a sequential search (AICAIC, BICBIC, HQHQ)

for DGPs with persistent dynamics, but the opposite holds for AIC and HQ,

small samples and for DGPs with less persistent dynamics.

5 Conclusion

This paper has considered the asymptotic validity of using information cri-

teria for the determination of the cointegration rank of nonstationary multi-

variate systems. It was shown that the standard conditions for consistency

of lag order selection extend to this framework. As a result Akaike's informa-

tion criterion was found to be inconsistent. On its own this result has limited

relevance. However in the context of cointegrated model it was analytically

shown that AIC will choose a higher rank with high probability unlike the

case of lag order selection in AR models.

Additionally, the e�ect of lag order selection in the issue of determining

the cointegration rank has been investigated using Monte Carlo techniques.

We found a very signi�cant deterioration in the performance of the proce-

dures compared to the usually assumed case of a known lag order. AIC

performed very poorly in this context unlike BIC, HQ and the testing pro-

cedures by Johansen.

References

Akaike, H. (1974): \A New Look at the Statistical Model Identi�cation,"

I.E.E.E. Trans.Auto. Control.

Bierens, H. (1997): \Nonparametric Cointegration Analysis," Journal of

Econometrics, 77, 379{404.

Christoffersen, P. F., and F. X. Diebold (1998): \Cointegration and

Long Horizon Forecasting," Journal of Business and Economic Statistics,

16, 450{458.

Hannan, E. J., and B. G. Quinn (1979): \The Determination of the Order

of an Autoregression," Journal of the Royal Statistical Society (Series B),

41, 190{195.

9

Johansen, S. (1988): \Statistical Analysis of Cointegration Vectors," Jour-

nal of Economic Dynamics and Control, 12, 231{254.

Konishi, S., and G. Kitagawa (1996): \Generalised Information Criteria

in Model Selection," Biometrika, 83(4), 875{890.

Larsson, R. (1999): \Approximation of the Asymptotic Distribution of the

Log-Likelihood Ration Test for Cointegration," Econometric Theory, 15,

789{813.

L�utkepohl, H. (1991): Introduction to Multiple Time Series Analysis.

Springer, Berlin.

Paulsen, J. (1984): \Order Determination of Multivariate Autoregressive

Time Series with Unit Roots," Journal of Time Series Analysis, 5, 115{

127.

Pesaran, M. H., and B. Pesaran (1997): Micro�t 4.0. Oxford University

Press.

Phillips, P. C. B., and J. C. Chao (1999): \Model Selection in Par-

tially Nonstationary Vector Autoregressive Processes wiith Reduced Rank

Structure," Journal of Econometrics, 91, 227{272.

Phillips, P. C. B., and S. Ouliaris (1988): \Testing for Cointegration

Using Principle Components Methods," Journal of Economic Dynamics

and Control, 12, 205{230.

Schwarz, G. (1978): \Estimating the Dimension of a Model," Annals of

Statistics, pp. 461{464.

Shibata, R. (1976): \Selection of the Order of an Autoregressive Model by

Akaike's Information Criterion," Biometrika, 63(1), 117{126.

Snell, A. (1999): \Testing for r versus r�1 Cointegrating Vectors," Journal

of Econometrics, 88, 151{191.

Stock, J., and M. Watson (1988): \Testing for Common Trends," Jour-

nal of Americal Statistical Association, 83, 1097{1107.

10

11

12

13

14

15

16

Documents

AIC BIC HQ