
A likelihood based estimator for vector autoregressive processes




Statistical Methodology 6 (2009) 304–319


Anindya Roy a,*, Wayne A. Fuller b, YanYan Zhou c

a Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD 21250, United States
b Department of Statistics, Iowa State University, Ames, IA 50011, United States
c Department of Statistics, California State University, East Bay, Hayward, CA 94542, United States

Article info

Article history: Received 26 December 2007; Received in revised form 28 October 2008; Accepted 24 November 2008.

Keywords: Unconditional maximum likelihood; Vector autoregressive process; Yule–Walker estimator.

Abstract

A one-step estimator, which is an approximation to the unconditional maximum likelihood estimator (MLE) of the coefficient matrices of a Gaussian vector autoregressive process, is presented. The one-step estimator is easy to compute and can be computed using standard software. Unlike the computation of the unconditional MLE, the computation of the one-step estimator does not require any iterative optimization and the computation is numerically stable. In finite samples the one-step estimator generally has smaller mean square error than the ordinary least squares estimator. In a simple model, where the unconditional MLE can be computed, numerical investigation shows that the one-step estimator is slightly worse than the unconditional MLE in terms of mean square error but superior to the ordinary least squares estimator. The limiting distribution of the one-step estimator for processes with some unit roots is derived.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

This paper provides a closed form approximation for the unconditional maximum likelihood estimator of the coefficient matrices of a Gaussian vector autoregressive (VAR) process. Unconditional maximum likelihood estimation for a vector autoregression involves maximization of the stationary likelihood of the data, including the stationary distribution of the initial observation, with respect to the parameters of the coefficient matrix and those of the error variance matrix. This involves maximization of a highly nonlinear function of a possibly large number of arguments subject to nonlinear constraints.

* Corresponding author. Tel.: +1 410 455 2435; fax: +1 410 455 1066. E-mail address: [email protected] (A. Roy).

1572-3127/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.stamet.2008.11.002


Even with modern computing power this optimization can be quite intractable due to the nonlinear nature of the constraints. The instability of the maximization procedure is more severe near the boundary of the parameter space. Thus, in practice, estimation for vector autoregressions has been largely restricted to conditional likelihood estimators such as the ordinary least squares estimator.

In view of the computational difficulty associated with the unconditional maximum likelihood estimator, it is reasonable to seek approximations to the maximum likelihood estimator that perform well in small to moderate sample sizes and can be expressed in closed form. Such approximations can be based on regression type estimators or on considerations of efficient estimation in the context of local asymptotic normality of autoregressive moving average (ARMA) processes. In the class of regression type estimators, the weighted symmetric estimator [1, p. 415] and the modified weighted symmetric estimator [2] provide close approximations to the unconditional maximum likelihood estimator in the univariate case. Deo [3] proposed a modification of the weighted symmetric estimator for the vector case. The Deo [3] estimator has mixed performance compared to the ordinary least squares estimator, being superior for some parameters and inferior for others. One-step estimation also has a long history in likelihood-based estimation of ARMA processes. In locally asymptotically normal (LAN) families, one-step estimation is known to produce locally asymptotically optimal (locally minimax) estimators. See [4–7] for investigations of the LAN structure in the univariate case. Garel and Hallin [8] establish the LAN structure, and hence a one-step estimation method, for the vector case.

In this paper, we take the regression type approach to approximate the unconditional MLE for VAR processes. Our main objective is to provide a simple form of the maximum likelihood equations for VAR processes such that linearization of the equations is possible by replacing part of the equations with easily obtainable consistent estimators. We give a one-step estimator of the coefficient matrices of a vector autoregressive process, derived as a solution to approximate likelihood equations. The one-step estimator has a closed form and its computation only requires a regression program that can compute the ordinary least squares estimator and the Yule–Walker estimator of the VAR coefficients. Such procedures are available in most commercial software. We show that the one-step estimator is consistent and derive its limiting distribution in the presence of unit roots. In simple models where the likelihood maximization is possible, Monte Carlo simulation indicates that the finite sample properties of the one-step estimator are closer to those of the unconditional maximum likelihood estimator than to those of the least squares estimator.

A brief review of the literature reveals that Tsay and Tiao [9], Ahn and Reinsel [10], Phillips and Durlauf [11], Phillips [12], Fountis [13], Fountis and Dickey [14], and Johansen [15,16] have discussed properties of least squares type estimators. In the univariate case, Gonzalez-Farias [17], Elliott et al. [18], Elliott [19], Pantula, Gonzalez-Farias and Fuller [20], Cox [21] and Shin [22] have studied the unconditional maximum likelihood estimator. These studies indicate that treating the initial observation as fixed can result in a loss of efficiency in finite samples. More recently, Hong and Tsay [23] have built estimation methods for the vector autoregressive model using Bayesian techniques. We complement this literature by proposing an estimator that is directly based on the unconditional likelihood equations of a Gaussian vector autoregressive process.

The paper is organized as follows. In Section 2, we develop the one-step estimator for the coefficients of a vector AR(p) model. Section 3 provides limiting distribution results for the one-step estimator. Monte Carlo simulation results comparing the properties of the one-step estimator with those of the ordinary least squares estimator and of the unconditional maximum likelihood estimator for the coefficient matrix of a first order vector autoregressive process are given in Section 4. Section 5 contains simulation results for higher order vector autoregressive processes. In Section 6, we summarize the findings.

2. A likelihood-based estimator

Consider the stationary k-dimensional pth order autoregressive process

Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + ... + A_p Y_{t-p} + ε_t,   (1)

where the coefficients A_1, A_2, ..., A_p are k × k matrices and ε_t ∼ NI(0, Σ_εε).


Let X_t' = (Y_t', Y_{t-1}', ..., Y_{t-p+1}') and A = (A_1 : A_2 : ... : A_p). Also let B' = (A' : M'), where M = (I : 0) is a k(p-1) × kp matrix with the first k(p-1) × k(p-1) block equal to the identity matrix. Then we can write the pth order autoregressive process as a first order process

X_t = B X_{t-1} + e_t,   (2)

where e_t' = (ε_t', 0, ..., 0), e_t ∼ NI(0, Σ_ee), and Σ_ee is a block diagonal matrix whose upper k × k block equals Σ_εε and whose remaining blocks are zero. The stationary variance matrix, V, of X_t is a solution to the equation

V = B V B' + Σ_ee.
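As a concrete illustration of the companion form and of the stationary variance equation, the following minimal Python/NumPy sketch (our own illustration; the coefficient matrices A_1, A_2 and Σ_εε in the example are hypothetical, not taken from the paper) stacks the coefficient matrices into B, embeds Σ_εε into Σ_ee, and solves V = B V B' + Σ_ee through the vec identity Vec(V) = (I ⊗ I − B ⊗ B)^{-1} Vec(Σ_ee).

```python
import numpy as np

def companion(A_list, Sigma_eps):
    """Stack the k x k coefficient matrices A_1..A_p into the kp x kp companion
    matrix B and embed Sigma_eps into the kp x kp error covariance Sigma_ee."""
    k = A_list[0].shape[0]
    p = len(A_list)
    B = np.zeros((k * p, k * p))
    B[:k, :] = np.hstack(A_list)          # top block row (A_1 : A_2 : ... : A_p)
    B[k:, :-k] = np.eye(k * (p - 1))      # the M = (I : 0) block
    Sigma_ee = np.zeros((k * p, k * p))
    Sigma_ee[:k, :k] = Sigma_eps
    return B, Sigma_ee

def stationary_variance(B, Sigma_ee):
    """Solve V = B V B' + Sigma_ee via vec(V) = (I - B kron B)^{-1} vec(Sigma_ee)."""
    m = B.shape[0]
    vecV = np.linalg.solve(np.eye(m * m) - np.kron(B, B),
                           Sigma_ee.reshape(-1, order="F"))
    return vecV.reshape((m, m), order="F")

# Illustrative (hypothetical) bivariate AR(2) with all roots inside the unit circle.
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Sigma_eps = np.eye(2)
B, Sigma_ee = companion([A1, A2], Sigma_eps)
V = stationary_variance(B, Sigma_ee)
print(np.allclose(V, B @ V @ B.T + Sigma_ee))   # True: V satisfies the stationary equation
```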

The maximum likelihood estimator of A is a solution to the likelihood equations, where the equations are obtained by setting the partial derivatives with respect to the parameters equal to zero. We concentrate on the first set of equations, which involves the derivative with respect to the coefficient matrix. Before proceeding further, we define some notation. For k × k matrices A and B, let L(A) = Vec(A) and (A ⊗ B)_+ = (A' ⊗ B'_{*1}, ..., A' ⊗ B'_{*k})', where B'_{*i} is the ith column of B' and A ⊗ B is the Kronecker product of A and B. Differentiation gives

∂L(V)'/∂L(A) = {[(BV)' ⊗ (I : 0)]P + [(I : 0) ⊗ (BV)']_+}(I ⊗ I − B' ⊗ B')^{-1},

where P is a permutation matrix which puts the zero columns in ∂L(V)'/∂L(A) to the right part of the matrix. The zero columns occur because the derivatives of the constant matrix M = (I : 0) with respect to the elements of A are zero. Also

∂(X_p' V^{-1} X_p)/∂L(V) = −L[2 V^{-1} X_p X_p' V^{-1} − diag(V^{-1} X_p X_p' V^{-1})],

and ∂ log|V|/∂L(V) = L[2 V^{-1} − diag(V^{-1})], where diag(A) denotes a diagonal matrix whose diagonal elements are those of A. Letting Q(A) = Σ_{t=p+1}^{n} (Y_t − A X_{t-1})' Σ_εε^{-1} (Y_t − A X_{t-1}), we have

∂Q(A)/∂L(A) = −2 L[Σ_εε^{-1}(Σ_{t=p+1}^{n} Y_t X_{t-1}')] + 2 L[Σ_εε^{-1} A (Σ_{t=p+1}^{n} X_{t-1} X_{t-1}')].

Let

z = (1/2)(I ⊗ I − B' ⊗ B')^{-1} L[2 V^{-1} − diag(V^{-1}) − 2 V^{-1} X_p X_p' V^{-1} + diag(V^{-1} X_p X_p' V^{-1})]   (3)

and w = Pz. Then the likelihood equations are

2(I : 0) L^{-1}(z) B V + Σ_εε^{-1} A S_{11} = Σ_εε^{-1}(Σ_{t=p+1}^{n} Y_t X_{t-1}')   (4)

or

2 Σ_ee L^{-1}(z) B V + B S_{11} = S_{01},   (5)

where S_{ij} = Σ_{t=p+1}^{n} X_{t-i} X_{t-j}', i, j = 0, 1. The likelihood equations are highly nonlinear equations in B. One way to construct an approximation to the solution is to replace some of the parametric functions with consistent estimators. A first approximation to the solution of the likelihood equations is the solution to the resulting system. We seek substitutions such that the resulting approximate likelihood equations are linear. In the expression (5), we replace V with the consistent estimator n^{-1} S_{11} and replace Σ_ee with the consistent estimator Σ̂_ee, where Σ̂_ee is the estimated error covariance matrix in the ordinary least squares regression of X_t on X_{t-1}. We replace z by a consistent estimator ẑ based on the Yule–Walker estimator of B.


Let B̂_yw, S_{11,yw} and Σ̂_{ee,yw} be the Yule–Walker regression estimator of A, the sum of squares and products matrix from the Yule–Walker regression, and the normalized sum of squares and products matrix of the residuals from the Yule–Walker regression, respectively. We estimate z by replacing B with its consistent estimator B̂_yw. Note that I ⊗ I − B̂_yw ⊗ B̂_yw has all positive roots. Using the likelihood equation (5), the one-step estimator of B is

B̂_OS = [I + 2 n^{-1} Σ̂_ee L^{-1}(ẑ)]^{-1} B̂_OLS,   (6)

where

ẑ = (1/2)(I ⊗ I − B̂'_yw ⊗ B̂'_yw)^{-1} L[2 S_{11,yw}^{-1} − diag(S_{11,yw}^{-1}) − 2 S_{11,yw}^{-1} X_p X_p' S_{11,yw}^{-1} + diag(S_{11,yw}^{-1} X_p X_p' S_{11,yw}^{-1})].

Often, the estimators are corrected for a non-zero mean of the Y_t process. The mean-corrected version of (6) is obtained when the variables Y_t, Y_{t-1}, ..., Y_{t-p} in Eq. (1) are replaced by the mean-corrected variables y_{t-j} = Y_{t-j} − Ȳ_{(j)}, where Ȳ_{(j)} = (n − p)^{-1} Σ_{t=p+1}^{n} Y_{t-j}. Because the estimator in (6) is of the form

B̂'_OS = (Â'_OS : M'),   (7)

Eq. (7) defines the one-step estimator, Â_OS, of the coefficient matrix A.
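To make the computation of (6) concrete, the following sketch is a minimal Python/NumPy illustration for the first order case p = 1, so that X_t = Y_t and B = A. The function name, the divisor used for Σ̂_ee, the overall-mean correction, and the simulated example are our own choices, not the paper's implementation, and the stabilizing transformation T of Section 4 is omitted here. For p > 1 the same recipe applies after stacking the data into the companion form (2).

```python
import numpy as np

def one_step_var1(Y):
    """One-step estimator (Eq. (6)) for a mean-corrected first order VAR, a sketch
    that follows (6) directly without the transformation T used in Section 4."""
    n, k = Y.shape
    y = Y - Y.mean(axis=0)                         # mean correction by the overall mean

    # Ordinary least squares regression of y_t on y_{t-1}.
    S11 = y[:-1].T @ y[:-1]
    S01 = y[1:].T @ y[:-1]
    A_ols = S01 @ np.linalg.inv(S11)
    resid = y[1:] - y[:-1] @ A_ols.T
    Sigma_ee = resid.T @ resid / (n - 1 - k - 1)   # for k = 2 this matches the divisor (n - 4) in (15)

    # Yule-Walker estimator from sample autocovariances (divisor n).
    G0 = y.T @ y / n
    G1 = y[1:].T @ y[:-1] / n
    A_yw = G1 @ np.linalg.inv(G0)
    S11_yw = n * G0                                # Yule-Walker sum of squares and products

    # Plug-in z-hat from (3), with B, V, X_p replaced by estimators (x_p = y_1 here).
    S_inv = np.linalg.inv(S11_yw)
    xp = y[0][:, None]
    M = (2 * S_inv - np.diag(np.diag(S_inv))
         - 2 * S_inv @ xp @ xp.T @ S_inv
         + np.diag(np.diag(S_inv @ xp @ xp.T @ S_inv)))
    z = 0.5 * np.linalg.solve(np.eye(k * k) - np.kron(A_yw.T, A_yw.T),
                              M.reshape(-1, order="F"))
    Lz = z.reshape((k, k), order="F")              # L^{-1}(z-hat)

    # One-step estimator, Eq. (6).
    A_os = np.linalg.solve(np.eye(k) + 2 / n * Sigma_ee @ Lz, A_ols)
    return A_ols, A_os

# Hypothetical example: a bivariate AR(1) with one root near one.
rng = np.random.default_rng(0)
A = np.diag([0.95, 0.4])
Y = np.zeros((100, 2))
for t in range(1, 100):
    Y[t] = A @ Y[t - 1] + rng.standard_normal(2)
A_ols, A_os = one_step_var1(Y)
print(np.round(A_ols, 3), np.round(A_os, 3))
```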

3. Limiting distribution

Even though the estimator in (6) is constructed with the stationary likelihood equations, one can evaluate the estimator for models with unit roots. We now derive the limiting distribution of the one-step estimator for the canonical form of the system. We describe below the canonical form for the coefficient matrices. The elements of the coefficient matrices of any other representation are linear combinations of the elements of the canonical coefficient matrices. We study only processes where the portion of the Jordan canonical form associated with the unit roots is a g-dimensional identity matrix. Our canonical form for the pth order k-dimensional autoregressive process with g unit roots is

Y_t = H_1 Y_{t-1} + Σ_{i=2}^{p} H_i ΔY_{t-i+1} + e_t,   (8)

where (Y_{1t}', Y_{2t}') = Y_t', the vector Y_{1t} is composed of the first g elements of Y_t, the upper left g × g portion of Σ_ee is I_g, H_1 = block diag(I_g, H_{1,22}), and H_{1,22} − I_{k-g} is nonsingular. The conformable partition of e_t in (8) is (e_{1t}', e_{2t}')', where e_{1t} is a g-dimensional vector. We can write

e_{2t}' = e_{1t}' Σ_{ee12} + a_{2t}',

where Σ_{ee12} is the upper g × (k − g) part of Σ_ee, a_{2t} is uncorrelated with e_{1t}, and

E[a_{2t} a_{2t}'] = Σ_{aa22}.   (9)

We also write the first order representation of the process in (2) in a canonical form. The Jordan block associated with the unit roots of B is diagonal if there exists a nonsingular matrix Γ such that

B = Γ^{-1} diag(I, B_{22}) Γ.

Thus, if X_t is a first order vector autoregressive process satisfying (2), Γ is a nonsingular matrix, and the transformed process Γ X_t is again denoted by X_t, we say that the process X_t is in the canonical form if X_t satisfies

X_t = [ I  0 ; 0  B_{22} ] X_{t-1} + [ e_{1t} ; e_{2t} ],    E[ e_{1t}e_{1t}'  e_{1t}e_{2t}' ; e_{2t}e_{1t}'  e_{2t}e_{2t}' ] = [ Σ_{ee,11}  Σ_{ee,12} ; Σ_{ee,21}  Σ_{ee,22} ],   (10)

where Σ_{ee,11} is positive definite. The forms (8) and (10) are equivalent in the sense that the process Y_t can be reduced to (8) if and only if the process X_t can be reduced to (10).


The following theorem gives the limiting distribution of the one-step estimator of the coefficient matrices of the canonical form in the first order representation. We give the distribution for the mean adjusted estimator. Before proceeding we define some notation. Let D_n = diag(n, n, ..., n, n^{1/2}, ..., n^{1/2}), and

G = ∫_0^1 W*(r) W*'(r) dr,   Υ = ∫_0^1 W*(r) dW*'(r),   ζ = ∫_0^1 W(r) dr,

where W(r) is the standard vector Wiener process, W*(r) = W(r) − ∫_0^1 W(s) ds, G_gg is the upper left g × g portion of G, and the first g rows of Υ consist of (Υ_11, Υ_12). Let Ψ_2. be a vector of zero mean normal random variables with

Var[Vec(Ψ_2.)] = Σ_ee ⊗ V_{LL11}^{-1},

where V_{LL11} = E[X_{2,t-1} X_{2,t-1}']. Also, let ϕ_gg be the matrix composed of the first g rows of 2 L^{-1}(ϕ) Σ_ee, where the specific form of ϕ is given in Lemma 3 of the Appendix. We now state our main theorem.

Theorem 1. Let Y_t be a k-dimensional pth order autoregressive process reducible to the canonical form (8). Let X_t be the corresponding first order process written in the canonical form (10). Assume the e_t are iid(0, Σ_ee) random variables, or that the e_t are martingale differences with respect to sigma-fields A_t generated by e_1, e_2, ..., e_t satisfying

E[(e_t, e_t e_t') | A_{t-1}] = (0, Σ_ee) a.s.

and

E[|e_t|^{2+δ} | A_{t-1}] < M_1 < ∞ a.s.

for some δ > 0. Partition X_t' as (X_{1t}', X_{2t}'), where X_{1t} is the g-dimensional vector corresponding to the g unit roots. Assume that (ΔX_{1t}', X_{2t}')' is a stationary process with E[ΔX_{1t}] = 0 and X_{1,0} = 0. Let the one-step estimator of B be the mean-corrected estimator defined by (6) and the discussion immediately following (6). Then

D_n(B̂'_OS − B') →_L [ G_gg^{-1}(Υ_11, Υ_11 Σ_{ee12} + Υ_12 Σ_{aa22}^{1/2}) ; Ψ_2. ] + [ ϕ_gg ; 0 ].

The results of Theorem 1 are what one would expect. For the components of the VAR associated with roots less than one, the unconditional MLE, the ordinary least squares estimator and the one-step estimator all have the same limiting distribution. However, in the presence of unit roots, the limiting distributions of the three estimators are different. Simulation results in the later sections indicate that the limiting distribution of the one-step estimator in the presence of unit roots may be closer to that of the unconditional MLE than to that of the ordinary least squares estimator.

4. Simulation results: First order process

In this section, we present some Monte Carlo results for a particular form of the estimator (6) for the first order process

Y_t − µ = A(Y_{t-1} − µ) + e_t,   e_t ∼ NI(0, Σ_ee),   (11)

where Y_t and µ are 2-dimensional column vectors. Because we can always diagonalize at least one of the coefficient matrix and the error covariance matrix, we choose the coefficient matrix to be diagonal. The parameter matrices are

µ = [ 0 ; 0 ],   A = [ a_11  0 ; 0  a_22 ].   (12)

In our simulation |a_11| ≥ |a_22|. The error vectors, e_t, are distributed as NI(0, Σ_ee), where

Σ_ee = [ 1  ρ ; ρ  1 ],   ρ = 0.00, 0.70.


The observations are created with Y_{i0} = 0 if a_{ii} = 1, i = 1, 2, and as zero mean stationary observations when |a_11| < 1. The ordinary least squares estimator

Â_OLS = [ â_{11,OLS}  â_{12,OLS} ; â_{21,OLS}  â_{22,OLS} ]

is computed as the ordinary least squares regression of Y_t on Y_{t-1} with an intercept term. That is,

Â_OLS = [ Σ_{t=2}^{n} (Y_t − Ȳ_{(0)})(Y_{t-1} − Ȳ_{(1)})' ] [ Σ_{t=2}^{n} (Y_{t-1} − Ȳ_{(1)})(Y_{t-1} − Ȳ_{(1)})' ]^{-1},   (13)

where

Ȳ_{(j)} = (n − 1)^{-1} Σ_{t=2}^{n} Y_{t-j},   j = 0, 1.   (14)

The estimated error covariance matrix is

Σ̂_ee = (n − 4)^{-1} Σ_{t=2}^{n} [(Y_t − Ȳ_{(0)}) − Â_OLS(Y_{t-1} − Ȳ_{(1)})][(Y_t − Ȳ_{(0)}) − Â_OLS(Y_{t-1} − Ȳ_{(1)})]'.   (15)

The computation of the unconditional maximum likelihood estimator of A involves maximization of the stationary likelihood

L(Y | A, µ, Σ_ee) = C − (n/2) log|Σ_ee| − (1/2) log|V| − (1/2)(Y_1 − µ)' V^{-1}(Y_1 − µ)
  − (1/2) Σ_{t=2}^{n} ((Y_t − µ) − A(Y_{t-1} − µ))' Σ_ee^{-1} ((Y_t − µ) − A(Y_{t-1} − µ)),   (16)

where Y = (Y_1, Y_2, ..., Y_n) is the data matrix. For the first order bivariate autoregressive process, one needs to maximize the likelihood (16) with respect to nine parameters. The argument function is highly nonlinear and the parameters satisfy nonlinear constraints. None of the standard software packages offer any direct computation of this maximum. The nonlinear optimization routines are unstable when the parameter space is of moderate size. For example, the nonlinear optimization subroutines in PROC IML of the SAS software package warn of poor performance when the dimension of the parameter space is greater than six. Our simulations show that when we try to maximize the likelihood with respect to the parameters in A and Σ_ee, assuming the value of µ is known, the computations suffer from convergence problems and local maxima problems. The problems are more severe when the process has roots close to one, in which case the function is more nonlinear and the parameters are near the boundary.

We report here the empirical properties of an estimator (referred to as the maximum likelihood estimator from here onward) that is obtained by maximizing the likelihood (16) with respect to A and substituting Ȳ and Σ̂_ee for µ and Σ_ee, respectively, where Ȳ = n^{-1} Σ_{t=1}^{n} Y_t. Thus the maximum likelihood estimator of A is

Â_MLE = [ â_{11,MLE}  â_{12,MLE} ; â_{21,MLE}  â_{22,MLE} ],

where

Â_MLE = argmax_A L(Y_1, Y_2, ..., Y_n | A, Ȳ, Σ̂_ee).
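For reference, the stationary likelihood (16) is straightforward to evaluate numerically once V is obtained from V = A V A' + Σ_ee. The sketch below is our own illustration, not the SAS/IML code used in the paper, and the function name is hypothetical; it evaluates (16), up to the constant C, for the bivariate first order model (11). Plugging in Ȳ and Σ̂_ee and maximizing over A with a constrained optimizer corresponds to the estimator Â_MLE described above.

```python
import numpy as np

def stationary_loglik(A, mu, Sigma_ee, Y):
    """Stationary Gaussian log-likelihood (16), up to the constant C, for the
    first order model Y_t - mu = A(Y_{t-1} - mu) + e_t.  Requires all
    eigenvalues of A to be less than one in modulus so that V exists."""
    n, k = Y.shape
    # Stationary variance V solves V = A V A' + Sigma_ee (vec identity).
    vecV = np.linalg.solve(np.eye(k * k) - np.kron(A, A),
                           Sigma_ee.reshape(-1, order="F"))
    V = vecV.reshape((k, k), order="F")
    y = Y - mu
    ll = -0.5 * n * np.log(np.linalg.det(Sigma_ee)) - 0.5 * np.log(np.linalg.det(V))
    ll -= 0.5 * y[0] @ np.linalg.solve(V, y[0])            # initial observation term
    resid = y[1:] - y[:-1] @ A.T
    ll -= 0.5 * np.sum(resid @ np.linalg.inv(Sigma_ee) * resid)
    return ll
```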

For computational convenience, prior to the calculation of Eq. (6) the data are transformed using a nonsingular transformation T such that

T S_{11,yw} T' = I   and   T Σ̂_ee T' = Λ,


where Λ is a diagonal matrix. Suppose P Δ P' and Q Λ Q' are the spectral representations of S_{11,yw} and Δ^{-1/2} P' Σ̂_ee P Δ^{-1/2}, respectively, where P and Q are orthogonal matrices. Then T = Q' Δ^{-1/2} P'. The transformed model is

T(Y_t − µ) = Ã T(Y_{t-1} − µ) + T e_t,

where Ã = T A T^{-1}. The one-step estimator

Â_OS = [ â_{11,OS}  â_{12,OS} ; â_{21,OS}  â_{22,OS} ]

is computed as

Â_OS = T^{-1} Ã_OS T,   (17)

where Ã_OS is

Ã_OS = [I + 2 n^{-1} Λ L^{-1}(z̃)]^{-1} Ã_OLS,
Ã_OLS = T Â_OLS T^{-1},
z̃ = (1/2)(I − Ã'_yw ⊗ Ã'_yw)^{-1} L[ I − 2 T y_1 y_1' T' + diag(T y_1 y_1' T') ],

y_1 = Y_1 − Ȳ and Ȳ = n^{-1} Σ_{t=1}^{n} Y_t. The estimator Ã_yw is the Yule–Walker estimator computed using the transformed observations T(Y_t − Ȳ).
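The transformation T can be computed directly from the two spectral decompositions described above. The following sketch is a minimal illustration (our own construction) under the assumption that S_{11,yw} and Σ̂_ee are symmetric and positive definite; the matrices in the example are hypothetical. It verifies that T S_{11,yw} T' = I and that T Σ̂_ee T' is diagonal.

```python
import numpy as np

def stabilizing_transform(S11_yw, Sigma_ee):
    """Return T with T S11_yw T' = I and T Sigma_ee T' diagonal, built as
    T = Q' Delta^{-1/2} P' from the two spectral decompositions."""
    d, P = np.linalg.eigh(S11_yw)                      # S11_yw = P Delta P'
    Dinv_half = np.diag(1.0 / np.sqrt(d))
    middle = Dinv_half @ P.T @ Sigma_ee @ P @ Dinv_half
    lam, Q = np.linalg.eigh(middle)                    # middle = Q Lambda Q'
    T = Q.T @ Dinv_half @ P.T
    return T, np.diag(lam)

# Quick check with hypothetical positive definite matrices.
rng = np.random.default_rng(1)
M1 = rng.standard_normal((2, 2)); S11 = M1 @ M1.T + 2 * np.eye(2)
M2 = rng.standard_normal((2, 2)); See = M2 @ M2.T + np.eye(2)
T, Lam = stabilizing_transform(S11, See)
print(np.allclose(T @ S11 @ T.T, np.eye(2)), np.allclose(T @ See @ T.T, Lam))
```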

squares estimator (13) the dependent variable Yt and the explanatory variable Yt−1 were adjusted byY(0) and Y(1), respectively, where Y(0) and Y(1) are defined in (14). The ordinary least squares estimator(15) of the error covariance matrix Σee was computed as the sum of squares of residuals from theordinary least squares regression of (Yt − Y(0)) on (Yt−1 − Y(1)).The computations for the maximum likelihood estimator are done with the Nelder–Mead

nonlinear optimization subroutine, ‘‘nlpnms’’, in PROC IML of the SAS software. The ‘‘nlpnms’’subroutine allows for nonlinear constraints. The parameter A is restricted to have eigenvalues lessthan 0.999999 in absolute value. The Yule–Walker estimator of A, Ayw is used as the initial estimator.The iterations failed to converge 8 times out of 1000 iterations for the model with non-diagonalerror covariance matrix. We also tried the constrained optimization over the full nine dimensionalparameter space, i.e., over the space forΘ = (µ,A,Σee). The iterations failed to converge more oftenand the results were poor. Analytical derivatives of the objective function are very complicated andnearly intractable and no gradient based optimization routine could be used. Numerical computationof the derivatives suffer from computational difficulty similar to that faced in computation of thefunction. In fact the difficulty of computation are more severe for the derivatives as they are highlynonlinear.The computation for the onestep estimator (17) requires computation of the Yule–Walker

estimator Ayw and also requires the initial observation Y1. The observations used for computation ofthe Yule–Walker regression and the initial observation Y1, were mean adjusted by Y = n−1

∑nt=1 Yt .

In the stationary case, the mean adjusted estimator of A is asymptotically equivalent to the estimatorcomputed with the original observations (Yt − µ). The work of Park [24] and Pantula [25] indicatethat when the process has a root close to one in absolute value and the sample size is small, the meanadjusted estimators are preferred to the ones calculated with the original observations (Yt − µ).Table 1 gives empirical properties of the ordinary least squares estimator, themaximum likelihood

estimator and the onestep estimator of a11, for various values of a11 and a22. The sample size, n, is 100and the results are for 10,000 Monte Carlo replications. The biases of the onestep estimator and themaximum likelihood estimator are, in general, slightly smaller than that of the least squares estimator.For large values of a11 the bias of the onestep estimator is about 85% of the bias of the ordinary leastsquares estimator.The mean square errors of the ordinary least squares estimator, the onestep estimator and the

maximum likelihood estimator of a11 are given in the last three columns.When a11 is close to or equal


Table 1. Empirical properties of the one-step estimator and the least squares estimator of a_11 for model (11) with ρ = 0.00 (n = 100). For each (a_11, a_22) pair, the columns give the median, the mean, and n × MSE of the OLS, one-step (OS), and maximum likelihood (MLE) estimators.

a11  a22 | Median (OLS, OS, MLE) | Mean (OLS, OS, MLE) | n×MSE (OLS, OS, MLE)
0.00 0.00 | −0.005 −0.006 −0.006 | −0.007 −0.007 −0.007 | 1.021 1.022 1.021
0.40 0.00 | 0.380 0.381 0.381 | 0.376 0.377 0.376 | 0.973 0.977 0.975
0.40 0.40 | 0.374 0.374 0.374 | 0.371 0.371 0.371 | 0.984 0.983 0.980
0.70 0.00 | 0.676 0.675 0.674 | 0.668 0.669 0.668 | 0.707 0.701 0.703
0.70 0.40 | 0.671 0.671 0.671 | 0.664 0.665 0.665 | 0.718 0.708 0.710
0.70 0.70 | 0.666 0.666 0.665 | 0.660 0.661 0.660 | 0.775 0.776 0.772
0.90 0.00 | 0.868 0.871 0.870 | 0.858 0.861 0.860 | 0.497 0.461 0.465
0.90 0.40 | 0.866 0.867 0.866 | 0.858 0.860 0.859 | 0.484 0.455 0.461
0.90 0.70 | 0.864 0.867 0.864 | 0.853 0.856 0.855 | 0.589 0.548 0.558
0.90 0.90 | 0.859 0.860 0.861 | 0.849 0.852 0.851 | 0.621 0.600 0.590
0.95 0.00 | 0.916 0.919 0.918 | 0.907 0.910 0.909 | 0.435 0.389 0.393
0.95 0.40 | 0.915 0.919 0.918 | 0.906 0.911 0.909 | 0.425 0.391 0.396
0.95 0.70 | 0.912 0.916 0.914 | 0.902 0.907 0.905 | 0.495 0.441 0.454
0.95 0.90 | 0.907 0.912 0.911 | 0.898 0.904 0.901 | 0.561 0.512 0.516
0.95 0.95 | 0.902 0.907 0.906 | 0.895 0.899 0.897 | 0.596 0.568 0.559
1.00 0.00 | 0.956 0.965 0.963 | 0.948 0.957 0.953 | 0.443 0.354 0.367
1.00 0.40 | 0.955 0.964 0.962 | 0.944 0.954 0.950 | 0.513 0.411 0.428
1.00 0.70 | 0.954 0.967 0.963 | 0.944 0.956 0.950 | 0.515 0.400 0.426
1.00 0.90 | 0.951 0.964 0.959 | 0.941 0.955 0.947 | 0.564 0.446 0.480
1.00 0.95 | 0.947 0.959 0.953 | 0.936 0.948 0.941 | 0.662 0.558 0.577
1.00 1.00 | 0.941 0.948 0.946 | 0.931 0.941 0.937 | 0.709 0.634 0.623

When a_11 is close to or equal to one, the one-step estimator and the maximum likelihood estimator have a smaller variance than the ordinary least squares estimator. For such cases the mean square error of the one-step estimator of a_11 is about 80%–90% of that of the ordinary least squares estimator. The properties of the one-step estimator for a_11 in the case a_11 = 1 and a_22 = 0 are close to the properties of the weighted symmetric estimator of a unit autoregressive coefficient in a univariate first order autoregression. When both a_11 and a_22 are far from one, the properties of the three estimators are very similar, with the maximum likelihood estimator having slightly superior performance.

Table 2 gives the empirical properties of the estimators of a_22. When both a_11 and a_22 are large, the one-step estimator is better than the ordinary least squares estimator but worse than the maximum likelihood estimator with respect to the mean square error criterion. When a_22 is far from one, the estimators have similar behavior.

The performances of the one-step estimators of a_21 and a_12 relative to the corresponding ordinary least squares estimators are similar to the performances of the one-step estimators of a_11 and a_22 relative to the corresponding ordinary least squares estimators. The same relative relationships hold for the maximum likelihood estimators. Hence the empirical properties of the estimators of a_21 and a_12 are not reported here.

Tables 3 and 4 give the empirical properties of the estimators of a_11 and a_22, respectively, for the model with the non-diagonal error covariance matrix. The one-step estimator still outperforms the ordinary least squares estimator. However, the maximum likelihood estimator is substantially better for models with large roots.

The performance of the one-step estimator relative to that of the ordinary least squares estimator for sample sizes n = 50 and n = 500, not reported here, is similar to that for n = 100.

5. Simulation results: Higher order process

For higher order processes the attempts to compute the maximum likelihood estimator yielded unsatisfactory results, especially when the parameters are near the boundary of the parameter space. Thus for the higher order processes we only provide the empirical properties of the ordinary least squares estimator and the one-step estimator. We present simulation results for the two dimensional AR(2) process. The results are based on 10,000 samples of size 100.


Table 2. Empirical properties of the one-step estimator and the least squares estimator of a_22 for model (11) with ρ = 0.00 (n = 100). For each (a_11, a_22) pair, the columns give the median, the mean, and n × MSE of the OLS, one-step (OS), and maximum likelihood (MLE) estimators.

a11  a22 | Median (OLS, OS, MLE) | Mean (OLS, OS, MLE) | n×MSE (OLS, OS, MLE)
0.00 0.00 | −0.009 −0.010 −0.010 | −0.010 −0.010 −0.010 | 0.990 0.990 0.989
0.40 0.00 | −0.015 −0.015 −0.015 | −0.014 −0.014 −0.014 | 1.005 1.006 1.005
0.40 0.40 | 0.382 0.380 0.381 | 0.374 0.374 0.374 | 0.936 0.930 0.931
0.70 0.00 | −0.012 −0.012 −0.012 | −0.014 −0.014 −0.014 | 0.975 0.977 0.976
0.70 0.40 | 0.373 0.373 0.373 | 0.370 0.371 0.370 | 0.965 0.968 0.967
0.70 0.70 | 0.668 0.667 0.667 | 0.661 0.661 0.661 | 0.758 0.764 0.756
0.90 0.00 | −0.018 −0.018 −0.018 | −0.018 −0.018 −0.018 | 1.006 1.005 1.004
0.90 0.40 | 0.369 0.369 0.369 | 0.367 0.366 0.366 | 1.010 1.004 1.006
0.90 0.70 | 0.661 0.660 0.661 | 0.655 0.655 0.656 | 0.810 0.795 0.795
0.90 0.90 | 0.857 0.859 0.859 | 0.849 0.851 0.851 | 0.616 0.598 0.589
0.95 0.00 | −0.023 −0.023 −0.023 | −0.023 −0.023 −0.023 | 1.039 1.037 1.036
0.95 0.40 | 0.367 0.367 0.367 | 0.365 0.365 0.365 | 0.974 0.972 0.973
0.95 0.70 | 0.661 0.662 0.662 | 0.656 0.656 0.656 | 0.802 0.794 0.794
0.95 0.90 | 0.855 0.855 0.857 | 0.845 0.847 0.847 | 0.690 0.677 0.664
0.95 0.95 | 0.904 0.908 0.908 | 0.895 0.900 0.898 | 0.591 0.557 0.546
1.00 0.00 | −0.019 −0.019 −0.019 | −0.020 −0.020 −0.020 | 0.995 0.995 0.995
1.00 0.40 | 0.367 0.368 0.369 | 0.363 0.362 0.362 | 1.048 1.044 1.047
1.00 0.70 | 0.658 0.657 0.658 | 0.651 0.651 0.652 | 0.859 0.858 0.853
1.00 0.90 | 0.851 0.850 0.854 | 0.842 0.843 0.846 | 0.700 0.687 0.663
1.00 0.95 | 0.897 0.897 0.900 | 0.887 0.889 0.891 | 0.741 0.716 0.679
1.00 1.00 | 0.940 0.948 0.947 | 0.930 0.939 0.936 | 0.738 0.655 0.646

Table 3. Empirical properties of the one-step estimator and the least squares estimator of a_11 for model (11) with ρ = 0.70 (n = 100). For each (a_11, a_22) pair, the columns give the median, the mean, and n × MSE of the OLS, one-step (OS), and maximum likelihood (MLE) estimators.

a11  a22 | Median (OLS, OS, MLE) | Mean (OLS, OS, MLE) | n×MSE (OLS, OS, MLE)
0.00 0.00 | −0.012 −0.011 −0.012 | −0.010 −0.011 −0.010 | 1.932 1.933 1.922
0.40 0.00 | 0.379 0.381 0.382 | 0.374 0.374 0.376 | 1.605 1.611 1.661
0.40 0.40 | 0.372 0.373 0.373 | 0.372 0.372 0.372 | 1.850 1.847 1.833
0.70 0.00 | 0.673 0.672 0.673 | 0.664 0.665 0.666 | 0.938 0.926 0.928
0.70 0.40 | 0.663 0.665 0.665 | 0.658 0.659 0.659 | 1.149 1.132 1.133
0.70 0.70 | 0.665 0.667 0.666 | 0.658 0.658 0.657 | 1.373 1.375 1.357
0.90 0.00 | 0.865 0.868 0.868 | 0.856 0.858 0.859 | 0.547 0.509 0.508
0.90 0.40 | 0.862 0.864 0.864 | 0.854 0.856 0.856 | 0.589 0.564 0.557
0.90 0.70 | 0.856 0.859 0.859 | 0.846 0.850 0.849 | 0.828 0.794 0.787
0.90 0.90 | 0.860 0.863 0.863 | 0.852 0.855 0.854 | 0.947 0.923 0.895
0.95 0.00 | 0.915 0.918 0.918 | 0.905 0.908 0.908 | 0.464 0.420 0.417
0.95 0.40 | 0.913 0.917 0.917 | 0.904 0.906 0.907 | 0.491 0.464 0.457
0.95 0.70 | 0.907 0.912 0.912 | 0.898 0.902 0.901 | 0.598 0.555 0.545
0.95 0.90 | 0.904 0.911 0.907 | 0.893 0.901 0.897 | 0.802 0.741 0.735
0.95 0.95 | 0.903 0.905 0.904 | 0.894 0.898 0.897 | 0.915 0.903 0.864
1.00 0.00 | 0.956 0.964 0.963 | 0.948 0.955 0.953 | 0.446 0.369 0.367
1.00 0.40 | 0.955 0.962 0.962 | 0.944 0.951 0.950 | 0.522 0.445 0.424
1.00 0.70 | 0.954 0.962 0.963 | 0.944 0.951 0.950 | 0.548 0.465 0.445
1.00 0.90 | 0.951 0.963 0.961 | 0.939 0.951 0.946 | 0.640 0.546 0.524
1.00 0.95 | 0.945 0.957 0.953 | 0.932 0.945 0.939 | 0.833 0.732 0.697
1.00 1.00 | 0.940 0.947 0.945 | 0.932 0.941 0.938 | 0.952 0.920 0.829

The vector Y_t is composed of two components, where the ith component is defined as

Y_{i,t} = α_{1i} Y_{i,t-1} + α_{2i} Y_{i,t-2} + e_{i,t},   i = 1, 2,   (18)

where

α_{1i} = m_{1i} + m_{2i},   α_{2i} = −m_{1i} m_{2i},

|m_{1i}| ≤ 1, |m_{2i}| < 1 for i = 1, 2, and (m_{1i}, m_{2i}) are the roots of the process. When |m_{1i}| < 1 and |m_{2i}| < 1, the initial two observations are drawn from the appropriate stationary distribution.


Table 4. Empirical properties of the one-step estimator and the least squares estimator of a_22 for model (11) with ρ = 0.70 (n = 100). For each (a_11, a_22) pair, the columns give the median, the mean, and n × MSE of the OLS, one-step (OS), and maximum likelihood (MLE) estimators.

a11  a22 | Median (OLS, OS, MLE) | Mean (OLS, OS, MLE) | n×MSE (OLS, OS, MLE)
0.00 0.00 | −0.009 −0.008 −0.008 | −0.006 −0.006 −0.006 | 2.021 2.023 2.013
0.40 0.00 | −0.009 −0.009 −0.012 | −0.009 −0.009 −0.011 | 1.756 1.760 1.800
0.40 0.40 | 0.378 0.378 0.377 | 0.373 0.373 0.372 | 1.886 1.879 1.872
0.70 0.00 | −0.001 −0.001 −0.003 | −0.002 −0.002 −0.004 | 1.390 1.390 1.429
0.70 0.40 | 0.388 0.388 0.387 | 0.384 0.385 0.384 | 1.464 1.468 1.473
0.70 0.70 | 0.668 0.668 0.668 | 0.663 0.664 0.663 | 1.325 1.335 1.316
0.90 0.00 | −0.002 −0.002 −0.003 | −0.000 −0.000 −0.001 | 1.075 1.072 1.074
0.90 0.40 | 0.398 0.398 0.397 | 0.391 0.391 0.390 | 1.184 1.179 1.186
0.90 0.70 | 0.685 0.684 0.682 | 0.675 0.675 0.674 | 1.131 1.117 1.130
0.90 0.90 | 0.853 0.858 0.857 | 0.847 0.849 0.848 | 0.987 0.970 0.942
0.95 0.00 | 0.000 0.000 −0.001 | −0.000 −0.000 −0.002 | 1.044 1.045 1.048
0.95 0.40 | 0.395 0.396 0.395 | 0.392 0.392 0.391 | 1.012 1.016 1.022
0.95 0.70 | 0.687 0.688 0.688 | 0.680 0.681 0.679 | 0.867 0.865 0.862
0.95 0.90 | 0.865 0.864 0.867 | 0.856 0.855 0.856 | 0.957 0.952 0.931
0.95 0.95 | 0.903 0.906 0.906 | 0.895 0.900 0.899 | 0.882 0.854 0.816
1.00 0.00 | −0.006 −0.006 −0.009 | −0.005 −0.005 −0.009 | 1.073 1.073 1.093
1.00 0.40 | 0.390 0.391 0.387 | 0.386 0.388 0.385 | 0.967 0.962 0.977
1.00 0.70 | 0.679 0.682 0.682 | 0.673 0.676 0.675 | 0.781 0.781 0.774
1.00 0.90 | 0.872 0.875 0.875 | 0.865 0.869 0.867 | 0.647 0.643 0.636
1.00 0.95 | 0.918 0.918 0.921 | 0.909 0.913 0.912 | 0.668 0.669 0.631
1.00 1.00 | 0.938 0.948 0.944 | 0.930 0.939 0.935 | 0.991 0.954 0.893

In the case of a unit root, the Y_{i,t} process is defined by

ΔY_{i,t} = m_{2i} ΔY_{i,t-1} + e_{i,t},   i = 1, 2.

Y_{i,0} is set to zero, and Y_{i,1} = ΔY_{i,1} is drawn from the stationary distribution. The errors e_t are drawn from a bivariate normal distribution with identity covariance matrix. We present the simulation results for the process written in the canonical form

Y_t = H_1 Y_{t-1} + H_2 ΔY_{t-1} + e_t,   (19)

where H_1 = A_1 + A_2, H_2 = −A_2 and A_j = diag(α_{j1}, α_{j2}), j = 1, 2. Let h_{i,lk} denote the (l, k)th element of H_i. The performances of the one-step estimator and the least squares estimator of the elements of H_2 are very similar. Also, the relative performance of the one-step estimators and the least squares estimators of h_{1,21} and h_{1,12} is similar to the relative performance of the estimators of h_{1,11} and h_{1,22}, respectively. Hence results for those estimators are not reported here.
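As an illustration of the data generating scheme in (18) and of the canonical coefficients H_1 and H_2 in (19), the following sketch (our own, with a hypothetical choice of roots; the stationary start-up values of the paper's design are replaced by zero start-up values for brevity) simulates one bivariate sample and forms H_1 and H_2.

```python
import numpy as np

def simulate_bivariate_ar2(m1, m2, n=100, seed=0):
    """Simulate the two univariate AR(2) components of (18) with roots
    (m1[i], m2[i]) and standard normal errors with identity covariance.
    Start-up values are set to zero, a simplification of the paper's design."""
    rng = np.random.default_rng(seed)
    alpha1 = np.array(m1) + np.array(m2)      # alpha_{1i} = m_{1i} + m_{2i}
    alpha2 = -np.array(m1) * np.array(m2)     # alpha_{2i} = -m_{1i} m_{2i}
    Y = np.zeros((n, 2))
    e = rng.standard_normal((n, 2))
    for t in range(2, n):
        Y[t] = alpha1 * Y[t - 1] + alpha2 * Y[t - 2] + e[t]
    A1, A2 = np.diag(alpha1), np.diag(alpha2)
    H1, H2 = A1 + A2, -A2                     # canonical coefficients in (19)
    return Y, H1, H2

# Hypothetical example: one unit root (m11 = 1) and the remaining roots 0.80, 0.40, 0.00.
Y, H1, H2 = simulate_bivariate_ar2(m1=[1.0, 0.40], m2=[0.80, 0.00])
print(H1, H2)
```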

Table 5 gives the ratio of the mean square error of the one-step estimator of h_{1,11} to that of the corresponding least squares estimator for various combinations of the roots of the process. When at least one of the roots of the autoregressive polynomial is close to or equal to one, the one-step estimator of h_{1,11} outperforms the least squares estimator of h_{1,11} with respect to the mean square error criterion. When one of the roots is equal to one and all other roots are zero, the one-step estimator has a mean square error that is about 20% smaller than that of the ordinary least squares estimator. The gain is similar to that in the two dimensional first order process with one root equal to one and the other root equal to zero, or that in the first order univariate process with a unit root. When all the roots are far from one the two estimators have similar performance.

Table 6 gives the ratio of the mean square error of the one-step estimator of h_{1,22} to that of the corresponding least squares estimator. When the process has roots close to one, the one-step estimator is, in general, more efficient. However, for some of the parameter values investigated, the least squares estimator performs marginally better than the one-step estimator. The maximum estimated relative efficiency of the least squares estimator for the parameter values investigated is about 1.015. The one-step estimator is about 15% more efficient than the ordinary least squares estimator when the process has two unit roots.


Table 5. Ratio of the mean square error of the one-step estimator to that of the least squares estimator of h_{1,11} for model (19) (n = 100). Rows are indexed by (m_11, m_21); the six ratio columns correspond to (m_12, m_22) = (0.80, 0.80), (0.80, 0.00), (0.80, −0.80), (0.00, 0.00), (0.00, −0.80) and (−0.80, −0.80).

m11  m21 | ratios for the six (m12, m22) combinations

1.00 1.00 | 0.871 0.872 0.871 0.850 0.836 0.832
1.00 0.95 | 0.844 0.837 0.829 0.826 0.807 0.814
1.00 0.90 | 0.829 0.841 0.839 0.837 0.801 0.818
1.00 0.70 | 0.843 0.833 0.836 0.799 0.811 0.800
1.00 0.40 | 0.835 0.827 0.821 0.785 0.803 0.802
1.00 0.00 | 0.838 0.828 0.826 0.785 0.790 0.798
0.95 0.95 | 0.877 0.881 0.881 0.912 0.900 0.932
0.95 0.90 | 0.889 0.868 0.876 0.897 0.910 0.918
0.95 0.70 | 0.882 0.875 0.875 0.891 0.911 0.927
0.95 0.40 | 0.881 0.872 0.875 0.900 0.900 0.925
0.95 0.00 | 0.869 0.860 0.858 0.888 0.907 0.920
0.90 0.90 | 0.897 0.911 0.904 0.940 0.945 0.972
0.90 0.70 | 0.904 0.903 0.925 0.945 0.959 0.965
0.90 0.40 | 0.904 0.889 0.910 0.932 0.938 0.991
0.90 0.00 | 0.914 0.888 0.885 0.932 0.946 0.985
0.70 0.70 | 0.950 0.957 0.955 0.998 1.005 1.009
0.70 0.40 | 0.951 0.948 0.957 1.000 1.005 1.010
0.70 0.00 | 0.958 0.963 0.965 1.009 0.991 1.015
0.40 0.40 | 0.966 0.977 0.985 1.012 1.005 1.006
0.40 0.00 | 0.977 0.968 0.964 1.015 1.011 1.007

0.00 0.00 0.989 0.986 0.988 1.010 1.006 0.999

Table 6. Ratio of the mean square error of the one-step estimator to that of the least squares estimator of h_{1,22} for model (19) (n = 100). Rows are indexed by (m_11, m_21); the six ratio columns correspond to (m_12, m_22) = (0.80, 0.80), (0.80, 0.00), (0.80, −0.80), (0.00, 0.00), (0.00, −0.80) and (−0.80, −0.80).

m11  m21 | ratios for the six (m12, m22) combinations

1.00 1.00 | 0.880 0.871 0.854 0.867 0.845 0.833
1.00 0.95 | 0.891 0.911 0.922 0.909 0.917 0.899
1.00 0.90 | 0.936 0.941 0.962 0.942 0.950 0.929
1.00 0.70 | 0.961 0.988 0.997 0.994 0.987 1.006
1.00 0.40 | 0.968 1.005 1.004 1.005 1.011 1.002
1.00 0.00 | 0.948 1.002 0.991 1.013 1.006 0.994
0.95 0.95 | 0.890 0.912 0.906 0.909 0.916 0.896
0.95 0.90 | 0.913 0.948 0.937 0.950 0.930 0.956
0.95 0.70 | 0.933 0.972 0.977 0.988 1.002 1.004
0.95 0.40 | 0.945 1.001 0.993 0.991 1.010 0.995
0.95 0.00 | 0.947 1.002 0.993 1.019 0.989 1.006
0.90 0.90 | 0.898 0.931 0.931 0.946 0.973 0.955
0.90 0.70 | 0.945 0.973 0.988 0.978 0.991 1.001
0.90 0.40 | 0.962 0.995 1.002 1.006 1.008 1.006
0.90 0.00 | 0.961 1.000 1.007 0.991 1.006 1.005
0.70 0.70 | 0.953 0.966 0.984 1.004 1.010 1.004
0.70 0.40 | 0.940 1.002 1.006 1.003 1.012 1.005
0.70 0.00 | 0.966 1.008 1.004 0.998 1.008 1.000
0.40 0.40 | 0.973 1.015 1.002 1.008 0.995 1.006
0.40 0.00 | 0.968 1.012 0.997 1.005 1.003 1.003

0.00 0.00 0.984 0.997 1.005 1.000 1.001 1.003

6. Summary

A one-step estimator, which is an approximation to the maximum likelihood estimator of the parameters of a Gaussian vector autoregressive process, is suggested.


The one-step estimator is given in closed form and is easy to compute. The numerical calculations for the one-step estimator are stable, even when the process has roots close to or equal to one, and can be done using standard software. When the process is stationary, the limiting distribution of the one-step estimator is the same as that of the ordinary least squares estimator. However, in the presence of unit roots the one-step estimator has a different limiting distribution from that of the least squares estimator. In the univariate case the one-step estimator reduces to the weighted symmetric estimator. In finite samples the one-step estimator is superior to the ordinary least squares estimator with respect to the mean square error criterion for most of the parameter space, and is especially superior when the autoregressive process has roots close to or equal to one.

Acknowledgments

We would like to thank the anonymous referees for their detailed comments and suggestions.

Appendix

We first prove some lemmas that will be needed in the proof of our main theorem.

Lemma 1. Let Σ be an n × n positive definite matrix. Let c_1, c_2, ..., c_n be n complex numbers with 0 < Re(c_1) ≤ Re(c_2) ≤ ... ≤ Re(c_n), where Re(c_i) denotes the real part of c_i. Let c_ij = (c_i + c̄_j)^{-1}, where c̄_i denotes the complex conjugate of c_i. Let C be the matrix whose ijth element is c_ij. Then the Hadamard product C ∘ Σ is positive definite.

Proof of Lemma 1. Let z = (z_1, z_2, ..., z_n)' be any nonzero vector. Then

z' C z̄ = Σ_{i,j=1}^{n} z_i z̄_j c_ij = Σ_{i,j=1}^{n} ∫_0^1 z_i z̄_j x^{c_i + c̄_j − 1} dx = ∫_0^1 | Σ_{i=1}^{n} z_i x^{c_i − 1/2} |^2 dx ≥ 0,

where the integral exists because Re(c_i) > 0. Thus C is a nonnegative definite matrix. Therefore there exists a matrix P such that C = PP'. Then

z'(C ∘ Σ)z = Σ_{i,j=1}^{n} Σ_{k=1}^{n} σ_ij z_i p_ki z_j p_kj = Σ_{k=1}^{n} Σ_{i,j=1}^{n} σ_ij z_i p_ki z_j p_kj = Σ_{k=1}^{n} (z_1 p_k1, ..., z_n p_kn) Σ (z_1 p_k1, ..., z_n p_kn)'.

At least one of the vectors (z_1 p_k1, ..., z_n p_kn) is nonzero; otherwise some column of P is zero and we have a contradiction. Therefore C ∘ Σ is positive definite. □
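As a quick numerical illustration of Lemma 1 (our own check, not part of the paper), the sketch below builds C from randomly drawn complex numbers with positive real parts and a random positive definite Σ, and confirms that the Hadamard product C ∘ Σ has strictly positive eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# Complex numbers with strictly positive real parts.
c = rng.uniform(0.1, 2.0, n) + 1j * rng.standard_normal(n)
C = 1.0 / (c[:, None] + np.conj(c)[None, :])          # c_ij = (c_i + conj(c_j))^{-1}
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)                       # a positive definite Sigma
H = C * Sigma                                         # Hadamard (elementwise) product
eigvals = np.linalg.eigvalsh((H + H.conj().T) / 2)    # H is Hermitian; symmetrize for safety
print(eigvals.min() > 0)                              # True: C o Sigma is positive definite
```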

Lemma 2. Let c_{i,n}, i = 1, 2, ..., g, be g sequences of complex numbers such that Re(c_{i,n}) > 0 for all i, n, lim_{n→∞} c_{i,n} = c_i, i = 1, 2, ..., g, and Re(c_i) > 0, i = 1, 2, ..., g. Let Σ_n be any sequence of positive semidefinite kp × kp matrices defined by

Σ_n = [ Σ_{11,n}  Σ_{12,n} ; Σ_{21,n}  Σ_{22,n} ],

where kp is a positive integer, Σ_{11,n} is the upper left g × g block of Σ_n, and Σ_{11,n} is positive definite. Assume lim_{n→∞} Σ_n = Σ, where Σ is a positive semidefinite matrix and the upper left g × g block, Σ_11, is positive definite. Let A_n be a sequence of matrices, with partition conformable to Σ, defined as

A_n = [ Λ_n  0 ; 0  A_{22,n} ] + [ o(n^{-1})  O(n^{-1/2}) ; O(n^{-1/2})  O(n^{-1/2}) ],

where Λ_n = diag(λ_{1,n}, λ_{2,n}, ..., λ_{g,n}), with λ_{i,n} = 1 − n^{-1} c_{i,n}. Assume that all roots, possibly complex, of the upper left g × g block of A_n are less than one in absolute value. A_{22,n} is a sequence of matrices with all roots less than one in absolute value, and lim_{n→∞} A_{22,n} = A_{22}, where all roots of A_{22} are less than 1 − r in absolute value for some r ∈ (0, 1). Let V_n be the sequence of matrices satisfying the equation V_n = A_n V_n A_n' + Σ_n. Let f(A_n, Σ_n) := (I − A_n' ⊗ A_n')^{-1}(V_n^{-1} ⊗ I). Then

lim_{n→∞} f(A_n, Σ_n) = H_∞(c_1, c_2, ..., c_g, A_{22}, Σ),

where H_∞(c_1, c_2, ..., c_g, A_{22}, Σ) is a finite matrix.

Proof of Lemma 2. Let Δ_n = diag(I − Λ_n Λ_n*, I), where Λ_n* is the complex conjugate of Λ_n. Now

f(A_n, Σ_n) = (I − A_n' ⊗ A_n')^{-1}(Δ_n ⊗ I)(Δ_n^{-1} ⊗ I)(V_n^{-1} ⊗ I).

We will show that the limits of (I − A_n' ⊗ A_n')^{-1}(Δ_n ⊗ I) and (Δ_n^{-1} ⊗ I)(V_n^{-1} ⊗ I) exist and are finite. By assumption,

A_n ⊗ A_n = [ Λ_n  0 ; 0  A_{22,n} ] ⊗ [ Λ_n  0 ; 0  A_{22,n} ] + [ o_p(n^{-1})  O_p(n^{-1/2}) ; O_p(n^{-1/2})  O_p(n^{-1/2}) ].

By the definition of λ_{i,n},

lim_{n→∞} (1 − λ_i λ_j)^{-1}(1 − |λ_i|^2) = 2(c_i + c_j)^{-1} Re(c_i),

and lim_{n→∞}(I − λ_i A_{22,n}) = I − A_{22} is nonsingular. From this it follows that

lim_{n→∞} (I − A_n' ⊗ A_n')^{-1}(Δ_n ⊗ I) =: H_{1,∞}(c_1, c_2, ..., c_g, A_{22})   (A.1)

exists and is a finite matrix. We will now show that lim_{n→∞}(Δ_n^{-1} ⊗ I)(V_n^{-1} ⊗ I) exists and is finite. We can write V_n as

V_n = Σ_{k=0}^{∞} A_n^k Σ_n A_n^{k'}.   (A.2)

Partition V_n as

V_n = [ V_{11,n}  V_{12,n} ; V_{21,n}  V_{22,n} ].

Then by (A.2), V_{11,n} = Σ_{k=0}^{∞} Λ_n^k Σ_{11,n} Λ_n^{k*} + o(n). Therefore the (ij)th element of V_{11,n} is (1 − λ_i λ_j)^{-1} σ_{11,n,ij} + o(n), where σ_{11,n,ij} is the (ij)th element of Σ_{11,n}. Also by (A.2),

V_{21,n} = Σ_{k=0}^{∞} A_{22,n}^k Σ_{21,n} Λ_n^{k*} + O(n^{1/2}),   and   V_{12,n} = Σ_{k=0}^{∞} Λ_n^k Σ_{12,n} A_{22,n}^{k'} + O(n^{1/2}).

Now lim_{n→∞} Λ_n = I, lim_{n→∞} A_{22,n} = A_{22}, and all the roots of A_{22} are less than one in absolute value. Therefore, by the dominated convergence theorem,

lim_{n→∞} V_{21,n} = (I − A_{22})^{-1} Σ_{21},   and   lim_{n→∞} V_{12,n} = Σ_{12}(I − A_{22}')^{-1}.

Using (A.2), V_{22,n} = Ṽ_{22,n} + O(n^{-1/2}), where Ṽ_{22,n} is a positive definite matrix obtained as a solution to Ṽ_{22,n} = A_{22,n} Ṽ_{22,n} A_{22,n}' + Σ_{22,n}. Therefore

lim_{n→∞} Δ_n V_n = [ V̄_{11}  0 ; (I − A_{22})^{-1} Σ_{21}  V̄_{22} ].

The (ij)th element of V̄_{11} is 2 Re(c_i)(c_i + c_j)^{-1} σ_{11,ij} and V̄_{22} is the positive definite matrix satisfying V̄_{22} = A_{22} V̄_{22} A_{22}' + Σ_{22}. Then V̄_{11} = 2 diag[Re(c_1), Re(c_2), ..., Re(c_g)] C ∘ Σ_{11},


where C ∘ Σ_{11} is defined in Lemma 1. Because Σ_{11} is positive definite and Re(c_i) > 0, i = 1, 2, ..., g, by Lemma 1, V̄_{11} is nonsingular. Thus

lim_{n→∞} Δ_n^{-1} V_n^{-1} = [ V̄_{11}  0 ; (I − A_{22})^{-1} Σ_{21}  V̄_{22} ]^{-1}.

Then

lim_{n→∞} (Δ_n^{-1} ⊗ I)(V_n^{-1} ⊗ I) =: H_{2,∞}(c_1, c_2, ..., c_g, A_{22}, Σ),   (A.3)

where H_{2,∞}(c_1, c_2, ..., c_g, A_{22}, Σ) is a finite matrix. Combining (A.1) and (A.3), we have the result. □

Lemma 3. Let Y_t be a k-dimensional pth order autoregressive process written in the first order canonical form (8). Let all the assumptions of Theorem 1 hold. Let z, defined by (3), be estimated by

ẑ = (1/2)(I − B̂'_yw ⊗ B̂'_yw)^{-1} L[2 S_{11,yw}^{-1} − diag(S_{11,yw}^{-1}) − 2 S_{11,yw}^{-1} x_p x_p' S_{11,yw}^{-1} + diag(S_{11,yw}^{-1} x_p x_p' S_{11,yw}^{-1})],

where B̂_yw and S_{11,yw} are defined just before (6). Then

ẑ →_L ϕ,

where the distribution of ϕ is the distribution of (1/2) H_∞(r_1, r_2, ..., r_g, B_{22}, Σ_ee) L(2I − W), r_1, ..., r_g are the roots of

|rI − Ξ_gg| = 0   (A.4)

in absolute value, where Ξ_gg is the upper g × g block of the matrix random variable Ξ which has the same distribution as lim_{n→∞} D_n(B̂_yw − B)', H_∞(·) is defined in Lemma 2,

W = diag(W_1^{-1}) W_1 + 2 W_1^{-1} W_2 − diag(W_1^{-1} W_2 W_1^{-1}) W_1,

W_1 = diag(G_gg, Σ_{ee11}), W_2 = diag(Σ_{ee11}^{1/2} ζ ζ' Σ_{ee11}^{1/2}, Ψ_1 Ψ_1'), G is defined in Theorem 1, and Ψ_1 is a mean zero normal random variable with Var[Ψ_1] = (I − B_{22})^{-1} Σ_{ee,22} (I − B_{22}')^{-1}.

Proof of Lemma 3. Let z_{1,n} = (1/2)(I − B̂'_yw ⊗ B̂'_yw)^{-1}(S_{11,yw}^{-1} ⊗ I) and let

z_{2,n} = L[2I − diag(S_{11,yw}^{-1}) S_{11,yw} − 2 S_{11,yw}^{-1} x_p x_p' + diag(S_{11,yw}^{-1} x_p x_p' S_{11,yw}^{-1}) S_{11,yw}].

Then ẑ = z_{1,n} z_{2,n} and

B̂_yw = [ B̂_{11,n}  B̂_{12,n} ; B̂_{21,n}  B̂_{22,n} ] = [ Λ_n  0 ; 0  B_{22,n} ] + [ o_p(n^{-1})  O_p(n^{-1/2}) ; O_p(n^{-1})  o_p(n^{-1/2}) ],

where Λ_n = diag(λ_{1,n}, λ_{2,n}, ..., λ_{g,n}) and λ_{i,n} = 1 − n^{-1} r_i, i = 1, 2, ..., g. By Lemma 1, the r_i are complex random variables with strictly positive real parts. By Lemma 2, p lim_{n→∞} B_{22,n} = B_{22}. From the Yule–Walker regression it can be seen that S_{11,yw} satisfies the equation S_{11,yw} = B̂_yw S_{11,yw} B̂'_yw + Σ̂_{ee,yw}, and p lim_{n→∞} Σ̂_{ee,yw} = Σ_ee. Therefore

[n(Λ_n − I), B_{22,n}, B̂_{12,n}, B̂_{21,n}, Σ̂_{ee,yw}] →_L [diag(r_1, r_2, ..., r_g), B_{22}, 0, 0, Σ_ee].

By Skorohod's device we can obtain a probability space (Ω, F, P) on which are defined random variables (Λ̃_n, B̃_n, Σ̃_{ee,n}) and (r̃_1, r̃_2, ..., r̃_g, Σ̃_ee) such that

B̃_n = [ B̃_{11,n}  B̃_{12,n} ; B̃_{21,n}  B̃_{22,n} ] = [ Λ̃_n  0 ; 0  B̃_{22,n} ] + [ o_p(n^{-1})  O_p(n^{-1/2}) ; O_p(n^{-1})  o_p(n^{-1/2}) ],

[n(Λ_n − I), B̂_yw, Σ̂_{ee,yw}] ~_L [n(Λ̃_n − I), B̃_n, Σ̃_{ee,n}],


where B̃ = block diag(I, B̃_{22}) and

lim_{n→∞}[n(Λ̃_n − I), B̃_{22,n}, B̃_{12,n}, B̃_{21,n}, Σ̃_{ee,n}] = [diag(r̃_1, r̃_2, ..., r̃_g), B̃_{22}, 0, 0, Σ̃_ee]

for all ω in B with P(B) = 1, and ~_L means equal in distribution. Also on (Ω, F, P) there exists a random variable Ṽ_{11,n} such that Ṽ_{11,n} satisfies Ṽ_{11,n} = B̃_n Ṽ_{11,n} B̃_n' + Σ̃_{ee,n} with probability one. If z̃_{1,n} is defined by z̃_{1,n} = (1/2)(I − B̃_n' ⊗ B̃_n')^{-1}(Ṽ_{11,n}^{-1} ⊗ I), then

(Ṽ_{11,n}, z̃_{1,n}) ~_L (S_{11,yw}, z_{1,n}).

Then for any subsequence n_m there exists a further subsequence n_{m_k} such that

lim_{k→∞}[n_{m_k}(Λ̃_{n_{m_k}} − I), B̃_{n_{m_k}}, Σ̃_{ee,n_{m_k}}] = [diag(r̃_1, r̃_2, ..., r̃_g), B̃, Σ̃_ee]

for all ω in C with P(C) = 1. Let D = B ∩ C. Then P(D) = 1, and for all ω in D, (Λ̃_{n_{m_k}}, B̃_{n_{m_k}}, Σ̃_{ee,n_{m_k}}) satisfies all the assumptions of Lemma 2. Therefore, by Lemma 2, we have

lim_{k→∞} z̃_{1,n_{m_k}} = (1/2) H_∞(r̃_1, r̃_2, ..., r̃_g, B̃_{22}', Σ̃_ee)

for all ω in D. Because the limit does not depend on the chosen subsequence, we have

lim_{n→∞} z̃_{1,n} = (1/2) H_∞(r̃_1, r̃_2, ..., r̃_g, B̃_{22}', Σ̃_ee).

Thus

z̃_{1,n} →_L (1/2) H_∞(r̃_1, r̃_2, ..., r̃_g, B̃_{22}', Σ̃_ee).

Since (1/2) H_∞(r_1, r_2, ..., r_g, B_{22}', Σ_ee) ~_L (1/2) H_∞(r̃_1, r̃_2, ..., r̃_g, B̃_{22}', Σ̃_ee), we have

z_{1,n} →_L (1/2) H_∞(r_1, r_2, ..., r_g, B_{22}', Σ_ee).

To show that z_{2,n} →_L L(2I − W), note that S_{11,yw} and x_p x_p' have the same rate of convergence to limiting random variables. From Lemma 10.3.2 in [1], the sum of squares and products of the sample mean of the nonstationary part of X_t converges in distribution to Σ_{ee,11}^{1/2} ζ ζ' Σ_{ee,11}^{1/2}, and by Theorems 4.4.1 and 6.1.2 in [1], the sample mean of the stationary part converges in distribution to Ψ_1, where

Var[Ψ_1] = 2π f(0) = (I − B_{22})^{-1} Σ_{ee,22} (I − B_{22}')^{-1},

where f(ω) is the spectral density of the stationary component of X_t. Then, using Theorem 10.3.3 in [1], we have the result. □

Proof of Theorem 1. By Lemma 3, 2 n^{-1} Σ̂_{ee,OLS} L^{-1}(ẑ) is O_p(n^{-1}). Expanding [I + 2 n^{-1} Σ̂_{ee,OLS} L^{-1}(ẑ)]^{-1} in (6), we have

D_n(B̂_OS − B)' = D_n(B̂_OLS − B)' − 2 n^{-1} D_n B̂'_OLS L^{-1}(ẑ) Σ̂_{ee,OLS}[I + 2 n^{-1} Σ̂_{ee,OLS} L^{-1}(ẑ)]^{-1}.

Modifying Theorem 10.3.3 in [1] for the mean corrected case, we have

D_n(B̂_OLS − B)' →_L [ G_gg^{-1}(Υ_11, Υ_11 Σ_{ee12} + Υ_12 Σ_{aa22}^{1/2}) ; Ψ_2. ].


Because p lim_{n→∞} B̂_OLS = [ I  0 ; 0  B_{22} ] and p lim_{n→∞} Σ̂_{ee,OLS} = Σ_ee, by Lemma 3, we have

D_n(B̂_OS − B)' →_L [ G_gg^{-1}(Υ_11, Υ_11 Σ_{ee12} + Υ_12 Σ_{aa22}^{1/2}) ; Ψ_2. ] + [ ϕ_gg ; 0 ]. □

References

[1] W.A. Fuller, Introduction to Statistical Time Series, 2nd ed., John Wiley, New York, 1996.
[2] A. Roy, W.A. Fuller, Estimation for autoregressive time series with a root near one, J. Bus. Econom. Statist. 19 (2001) 482–493.
[3] R.S. Deo, Tests for unit roots in multivariate autoregressive processes, Ph.D. Dissertation, Iowa State University, Ames, Iowa, 1995.
[4] A.R. Swensen, The asymptotic distribution of the likelihood ratio for autoregressive time series with a regression trend, J. Multivariate Anal. 16 (1985) 54–70.
[5] J.P. Kreiss, On adaptive estimation in stationary ARMA processes, Ann. Statist. 15 (1987) 112–133.
[6] P. Jeganathan, Some aspects of asymptotic theory with application to time series models, Econometric Theory 11 (1995) 818–887.
[7] F.C. Drost, C.A.J. Klaassen, B.J.M. Werker, Adaptive estimation in time-series models, Ann. Statist. 25 (1997) 786–817.
[8] B.B. Garel, M. Hallin, Local asymptotic normality of multivariate ARMA processes with a linear trend, Ann. Inst. Statist. Math. 47 (1995) 147–169.
[9] R.S. Tsay, G.C. Tiao, Asymptotic properties of multivariate nonstationary processes with applications to autoregression, Ann. Statist. 18 (1990) 220–250.
[10] S.K. Ahn, G.C. Reinsel, Estimation of partially nonstationary multivariate autoregressive models, J. Amer. Statist. Assoc. 85 (1990) 813–823.
[11] P.C.B. Phillips, S.N. Durlauf, Multiple time series regression with integrated processes, Rev. Econom. Stud. 53 (1986) 473–495.
[12] P.C.B. Phillips, Fully modified ordinary least squares and vector autoregression, Econometrica 63 (1995) 1023–1078.
[13] N.G. Fountis, Testing for unit roots in multivariate autoregression, Ph.D. Thesis, North Carolina State University, Raleigh, North Carolina, 1983.
[14] N.G. Fountis, D.A. Dickey, Testing for a unit root nonstationarity in multivariate autoregressive time series, Ann. Statist. 17 (1989) 419–428.
[15] S. Johansen, Statistical analysis of cointegration vectors, J. Econom. Dynam. Control 12 (1988) 231–254.
[16] S. Johansen, Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models, Econometrica 59 (1991) 1551–1580.
[17] G. Gonzalez-Farias, A new unit root test for autoregressive time series, Ph.D. Dissertation, North Carolina State University, Raleigh, North Carolina, 1992 (unpublished).
[18] G. Elliott, T.J. Rothenberg, J.H. Stock, Efficient tests for an autoregressive unit root, Econometrica 64 (1996) 813–836.
[19] G. Elliott, Efficient tests for a unit root when the initial observation is drawn from its unconditional distribution, Internat. Econom. Rev. 40 (1999) 767–783.
[20] S.G. Pantula, G. Gonzalez-Farias, W.A. Fuller, A comparison of unit-root test criteria, J. Bus. Econom. Statist. 12 (1994) 449–459.
[21] D.D. Cox, Gaussian likelihood estimation for nearly nonstationary AR(1) processes, Ann. Statist. 19 (1991) 1129–1142.
[22] K.-I. Shin, A unit root test for multivariate autoregressive time series, Ph.D. Dissertation, North Carolina State University, Raleigh, North Carolina, 1992 (unpublished).
[23] L. Hong, R.S. Tsay, A unified approach to identifying multivariate time series models, J. Amer. Statist. Assoc. 93 (1998) 770–782.
[24] H.J. Park, Alternative estimators of the parameters of the autoregressive process, Ph.D. Dissertation, Iowa State University, Ames, Iowa, 1990.
[25] S.G. Pantula, Testing for unit roots in time series data, Econometric Theory 5 (1989) 256–271.