Near-Optimal Unit Root Tests with Stationary Covariates ...4 shocks to yt.R2 is the multiple coherence of (1−ρL)yt with xt at frequency zero (Brillinger (2001), p. 296) and it measures

Near-Optimal Unit Root Tests with Stationary Covariates with BetterFinite Sample Size.

Elena PesaventoEmory University

May 2005Preliminary, do not cite without permission

Abstract. Numerous tests for integration and cointegration have beenproposed in the literature. Since Elliott, Rothemberg and Stock (1996) thesearch for tests with better power has moved in the direction of Þnding testswith some optimality properties both in univariate and multivariate models.Although the optimal tests constructed so far have asymptotic power that isindistinguishable from the power envelope, it is well known that they can havesevere size distortions in Þnite samples. This paper proposes a simple andpowerful test that can be used to test for unit root or for no cointegration whenthe cointegration vector is known. Similarly to Hansen (1995), Elliott andJansson (2003), Zivot (2000), and Elliott, Jansson and Pesavento (2003) theproposed test achieves higher power by using additional information containedin covariates correlated with the variable we are testing. The test is constructedby applying Hansens test to variables that are detrended under the alternativein regression augmented with leads and lags of the stationary covariates. Usinglocal to unity parametrization, this paper analytically compute the asymptoticdistribution of the test under the null and the local alternative. I show that,although this test is not optimal in the sense of Elliott and Jansson (2003), it hasbetter Þnite sample size properties while having asymptotic power curves thatare indistinguishable from the power curves of optimal tests.

Keywords: Unit Root Test, GLS detrending.

JEL Classification: C32.

I thank Michael Jansson as the idea of this paper came after a conversationwith him.

Corresponding author: Elena Pesavento, Department of Economics, EmoryUniversity, Emory GA30322, USA. Phone: (404) 712 9297. E-mail: [email protected].

1

1. IntroductionSince the work of Fuller (1976) and Dickey and Fuller (1979) a large number of testshave been developed for the hypothesis that a variable is integrated of order oneagainst the hypothesis that it is integrated of order zero. Motivating this considerablebody of literature is the knowledge that a root equal to one can have a signiÞcantimpact on the analysis of the long- and short-run dynamics of economic variables.Unit root testing is therefore considered an important step in economic modeling1.

The seminal paper of Elliott, Rothenberg and Stock (1996) (ERS thereafter)marked the point at which to stop the search for unit root tests with better powerin an univariate setting. They show that no uniformly most powerful test for thisproblem exits, they compute the power envelope for point-optimal tests of a unit rootin an univariate model, and they derive a family of feasible tests (PT thereafter) thathave asymptotic power close to the power envelope. In fact the asymptotic power ofthe ERS test is never below the envelope and it is tangent to the power envelope atone point . In this sense, the ERS tests are approximately most powerful.

One feature of the ERS approach is that the variables are detrended under thealternative (or GLS detrended). ERS also propose a version of the ADF t-test wherethe variable have been GLS detrended before running the augmented Dickey Fullerregression (ADF −GLS test). Although the ADF −GLS does not have the sameoptimality justiÞcations of the PT test, it performs similarly in term of power whilehaving better size properties. In fact, practitioners use the ADF −GLS more oftenthat the PT test, presumably because it is more intuitive to understand the origin ofthe increase in power while being simple to compute.

The search for tests with better power is now moving in the directions of multivari-ate models. Hansen (1995) shows that additional information contained in stationarycovariates that are correlated with the variables of interest can be exploited to ob-tain tests that have higher power than univariate tests. Hansen (1995) computesthe power envelope for unit root tests in the presence of stationary covariates in amodel with no deterministic terms, while Elliott and Jansson (2003) generalize theresults to the case in which the model include constants and/or time trends. Bothpapers illustrate the signiÞcant increase in the asymptotic power envelope in mul-tivariate models that include stationary covariates. To implement a feasible test,Hansen (1995) proposes covariate augmented Dickey-Fuller (CADF ) tests computedas t−tests in a ADF regression augmented by leads and lags of the stationary covari-ates. Elliott and Jansson (2003) construct a family of tests (EJ thereafter), similarin spirit to the PT tests, that are feasible and that attain the power envelope at apoint. Both Hansen (1995) and Elliott and Jansson (2003) tests are generalizationof the ADF and PT tests and in fact they have the same asymptotical distributionof ADF and PT when there is no information in the stationary covariates, i.e. the

1Exceptions are Rossi (2004), Rossi (2005), Pesavento Rossi (2004), and Jansson and Moreira(2005), where inference in robust to the presence of not of exact unit roots.

2

correlation between the stationary covariate and the variable we are testing is zero.Both the CADF and EJ tests have power than is higher that the power of ADFand PT when the correlation is different than zero with gains that get larger as thecorrelation increases. Not only both the CADF and EJ tests outperforms univari-ate tests, but there are also signiÞcant differences between them The differences aresimilar to the differences between ADF and PT in univariate models. This is to beexpected as ADF and PT are special cases of CADF and EJ when no stationarycovariates are included. Elliott and Jansson (2003) show that EJ can signiÞcantlyoutperform CADF in term of power although it can be slightly worse in term of sizedistortions.

The goal of this paper is to propose a generalization of the CADF test that issimilar to the GLS generalization of the ADF test, and that apply to a model withstationary covariates. The test is constructed by applying GLS detrending to eachvariable according to assumption on the deterministic terms and then estimatingan augmented regression with lags and leads of the stationary covariates. To keepwith the Hansens notation, I will call this test CADF − GLS. Similarly to theADF −GLS test, the proposed test is intuitive and it is easy to compute. Section2 analytically computes the asymptotic distribution of the test under the null andthe local alternative. Using Monte Carlo simulations i show that, although this testis not optimal in the sense of Elliott and Jansson (2003), it has better Þnite samplesize properties while having asymptotic power curves that are indistinguishable fromthe power curves of optimal tests. Although the general model of Section 2 doesnot allow for cointegration2, Elliott, Jansson and Pesavento (2005) show that theproblem of testing for the null of no cointegration in cases in which there is onlyone cointegration vector that is known a-priori is isomorphic to the unit root testingproblem studied in Elliott and Jansson (2003). Section XX, study the behavior ofCADF −GLS tests when used to test for the null of no cointegration.

Section XX concludes.

2. Model: no cointegration

I consider the case where a researcher observes an (m+ 1)-dimensional vector timeseries zt = (yt, x0t)

0 generated by the triangular model

xt = µx + τxt+ ux,t (1)

yt = µy + τyt+ uy,t (2)

2The case in which a cointegration vector is present should be modeled to take account of coin-tegration and it is outside the scope of this paper

3

and

Φ (L)

µux,t

(1− ρL)uy,t¶= εt, (3)

where yt is univariate, xt is of dimension m × 1, Φ (L) is a matrix polynomial ofpossible inÞnite order in the lag operator L with Þrst element equal to the identitymatrix.

Assumption 1: max−k≤t≤0°°°¡ux,t, u

0y,t

¢0°°° = Op (1) ,where k·k is the Euclidean norm.Assumption 2: |Φ (r)| = 0 has roots outside the unit circle.Assumption 3: Et−1 (εt) = 0 (a.s.), Et−1 (εtε0t) = Σ (a.s.), and suptE kεtk2+δ <∞

for some δ > 0, where Σ is positive deÞnite, Et−1 (·) refers to the expectationconditional on εt−1, εt−2, ... .

I am interested in the problem of testing for the presence of a unit root in yt:

H0 : ρ = 1 vs. H1 : −1 < ρ < 1.

Assumptions 1-3 are fairly standard and are the same as (A1)-(A3) of Elliott andJansson (2003). Assumption 1 ensures that the initial values are asymptoticallynegligible, Assumption 2 is a stationarity condition, and Assumption 3 implies thatεt satisÞes a functional central limit theorem (e.g. Phillips and Solo (1992)). Thisis the same problem analyzed by Elliott and Jansson (2003). Following their notation,deÞne ut (ρ) =

£u0x,t, (1− ρL)uy,t

¤= Φ (L)−1 εt. Assumption 1-3 mean that

T−1/2P[T.]t−1 ut (ρ)⇒ Ω1/2W (·)

where Ω = Φ (1)−1ΣΦ (1)0−1 is 2π times the spectral density at frequency zero ofut (ρ). Partition Ω and Φ (L) conformably to zt as

Ω =

·Ωxx ωxyωyx ωyy

¸and

Φ (L) =

·Φxx (L) Φxy (L)Φyx (L) Φyy (L)

¸and deÞne R2 = δ0δ where δ = Ω−1/2xx ω0xyω

−1/2yy is a vector containing the the bivariate

zero frequency correlations between the shocks to xt and the quasi-difference of the

4

shocks to yt. R2 is the multiple coherence of (1− ρL) yt with xt at frequency zero(Brillinger (2001), p. 296) and it measures the extent to which the quasi-difference ytis determinable from the m−vector valued xt by linear time invariant operations. R2lies between zero and one, and represents the contribution of the stationary variableas it is zero when there is no long run correlation between xt and the quasi-differenceof yt. As in Elliott and Jansson (2003), R2 is assumed to be strictly less than one,thus ruling out the possibility that under the null, the partial sum of xt cointegrateswith yt. The case in which a cointegration vector is present should be modeled totake account of cointegration and it is outside the scope of this paper. The onlyexception is when there is only one cointegration vector that is known a-priori as inElliott, Jansson and Pesavento (2005). This case is analyzed in section XX.

As in Elliott and Jansson (2003) I will consider Þve cases for the deterministicpart of the model:

Case 1: µx = µy = 0 and τx = τy = 0.Case 2: µx = 0 and τx = τy = 0.Case 3: τx = τy = 0.Case 4: τx = 0.Case 5: no restrictions.

These cases represent a fairly general set of models that are relevant in empiricalapplications.

Both Hansen (1995) and Elliott and Jansson (2003) show that in the triangularmodel (1) − (3), no uniformly most powerful test exists. When R2 is differentthan zero, the stationary covariate xt contains information that can be exploited toobtain unit root tests that have power higher than standard univariate tests. Hansen(1995) suggests a covariate augmented Dickey-Fuller test (CADF ) while Elliott andJansson (2003) constructs tests (EJ) that are feasible with data and that are closeto the power envelope constructed from a point optimal family of tests.

Model (1− 3) is slightly more general than Hansen (1995) and Elliott and Jansson(2003) as it allows for a short run dynamics of unknown and possibly inÞnite order.DeÞne Γ (k) = E [ut (ρ)ut+k (ρ)] the autocovariance function of ut (ρ). Implicit inA1-A3 is the summability condition

P+∞j=−∞ kΓ (k)k <∞ and the condition that the

spectral density of ut (ρ) , fu(ρ)u(ρ) (λ) is bounded away from zero (check this). Itis well known that when those two conditions hold (Saikonnen, 1991) we can write

uy,t (ρ) =+∞Xj=−∞

eπ0x,jux,t−j (ρ) + ηt (4)

where the summability conditionP+∞j=−∞ keπx,jk < ∞ holds and ηt is a serially cor-

related stationary process such that E¡u0x,tηt+k

¢= 0 for any k = 0,±1,±2, ... The

5

spectral density of ηt is fηη (λ) = fuy(ρ)uy(ρ) (λ)− fuy(ρ)ux (λ) fuxux (λ)−1 fuxuy(ρ) (λ)so

2πfηη (0) = ωy.x = ωyy − ωyxΩ−1xxωxySubstituting (4) into (1) we have that

∆yt = dt + (ρ− 1) yt−1 ++∞Xj=−∞

eπ0x,jxt−j + ηt (5)

where

dt = (1− ρ)µy + (1− ρ) τyt− eπx (1)0 µx − eπx (1)0 τxt− τx +∞Xj=−∞

jeπ0x,jwith eπx (1) = P+∞

j=−∞ eπ0x,j Since the sequence eπx,j is absolute summable eπx,j ≈0 for |j| > k for k large enough and we can approximate (5) with

∆yt = dt + αyt−1 ++kXj=−k

eπ0x,jxt−j + ηtk (6)

where α = (ρ− 1), and ηtk = ηt +P|j|>k eπ0x,jxt−j. The deterministic term dt is

such that

Case 1 : dt = 0

Case 2 : dt = (1− ρ)µyCase 3 : dt = (1− ρ)µy − eπx (1)0 µxCase 4 : dt = (1− ρ)µy + (1− ρ) τ yt− eπx (1)0 µxCase 5 : dt = (1− ρ)µy + (1− ρ) τ yt− eπx (1)0 µx − eπx (1)0 τxt− τx +∞X

j=−∞jeπ0x,j

Notice that ηtk in (6) is uncorrelated at all leads and lags with xt but it is seriallycorrelated and the asymptotic distribution of tests on α in (6) will depend on nuisanceparameters. ModiÞed version of the tests can still be constructed as in Phillipsand Perron (1988) by using non parametric estimates of the nuisance parameters.Alternatively, the regression can be augmented with lags of ∆yt to obtain errors thatare white noise as in Hansen (1995). Hansens CADF test is then based on the t-statistics on an augmented regression in which lagged, contemporaneous and futurevalues of the stationary covariate are included:

6

∆yt = edt + ϕyt−1 +Pkj=−k π

0x,jxt−j +

Pkj=1 πy,j∆yt−j + ξtk (7)

The intuition behind this approach is that the correlation between yt and xtcan help in reducing the error variance thus resulting in more precisely regressionparameters estimates. Hansen (1995) shows that the asymptotic distribution forthe t-statistics on ϕ is different that the distribution of the ADF test and that asigniÞcant increase in the asymptotic power for local alternative can be obtainedwith the inclusion of the covariate and shows, for the case with no mean, that theasymptotic power is close to the power envelope. Hansen (1995) considers casesequivalent to Case1-Case 4 in this paper, where the regression is estimated with nomean for Case 1, with a mean for Case 2 and Case 3, and with mean and trend forCase 4. As in Elliott and Jansson (2003) I also consider the general case in whichthere are no restrictions on the deterministic terms.

Model (1)−(3) allows for autoregressive processes of inÞnite order and a conditionon the expansion rate of the truncation lag k is necessary. The following conditionis assumed throughout the paper:

Assumption 4: T−1/3k→ 0 and k→∞ as T →∞.

The condition in Assumption 4 speciÞes an upper bound for the rate at whichthe value k is allowed to tend to inÞnity with the sample size. Ng and Perron (1995)show that conventional model selection criteria like AIC and BIC yield k = Op (logT ),which satisÞes Assumption 4. Often, a second condition is also assumed to impose alower bound on k. The lower bound condition is only necessary to obtain consistencyof the parameters on the stationary variables, and it is sufficient but not necessaryto prove the limiting distribution of the relevant test statistics (Ng and Perron, 1995,Lutkephol and Saikonnen, 1999). Since I am only interested in the t-ratio statistics,Assumption 4 is necessary and sufficient to prove the asymptotic distribution of thetests.

Theorem 1 generalizes the results proved by Hansen (1995) in the context ofmodel (1)− (3) for more general cases for the deterministic terms and for an inÞniteorder polynomial, which is approximated by a Þnite lag length k chosen by a datadependent criteria.

Theorem 1 [OLS Detrending]. When the model is generated according to (1)−(3) ,withT (ρ− 1) = c, and Assumption 1 to 4 are valid, then, as T →∞:

tϕ ⇒ c¡RJd2xyc

¢−1/2 +¡RJd2xyc

¢−1 ¡RJdxycdW2

¢¡RJd2xyc

¢−1/2

7

where Jxyc (r) is a Ornstein-Uhlenbeck process such that

Jxyc (r) =Wxy (r) + c

Z 1

0e(λ−s)cWxy (s) ds,

Wxy (r) =q

R2

1−R2Wx (r) + Wy (r), Wx (r) and Wy (r) are independent standard

Brownian Motions, and Jdxyc (r) = Jxyc (r) if no deterministic terms are includedin the regression, Jdxyc (r) = Jxyc (r) −

RJxyc (s)ds if a mean is included in the re-

gression, and Jdxyc (r) = Jxyc (r)− (4− 6r)RJxyc (s) ds− (12r − 6)

RsJxyc (s) ds if a

mean and trend are included in the regression.

The asymptotic distribution of the test is the same as Hansen (1995) in his specialcase in which the errors terms in equation (7) are uncorrelated with xt−k, which holdsin well-speciÞed dynamic regressions.3

Elliott and Jansson (2003) follow the general methods of King (1980, 1988) andexamine Neyman-Pearson type of tests in the context of model (1) − (3). ElliottRothenberg and Stock (1996) Þrst applied this methodology to construct point opti-mal tests in unit root testing. The additional complication in the context of model(1)−(3) is that the likelihood ratio test is computed using the entire system. Elliottand Jansson (2003)4 compute the power envelope for the family of point optimal testsfor each possible case of the deterministic terms (Case 1 to Case 5) under some simpli-fying assumptions, and they construct feasible general tests that are asymptoticallyequivalent to the power envelope and that are valid under more general assumptions.In both Hansen (1995) and Elliott and Jansson (2003), the asymptotic power undera local alternative ρ = 1+(c/T ) depend on the parameter R2: as R2 increases, thereis a larger gain in using the information contained in the stationary covariate overan univariate test. For this reason both the CADF and EJ tests have power thatis equal to the power of ADF and PT tests respectively when R2 is zero. When R2

is different than zero, the gain from using a multivariate test over an univariate testgets larger as R2 increases.

Although both CADF and EJ have power that is larger than univariate tests,Elliott and Jansson (2003) show that the gain in term of power from using an optimaltest over the standard t-test in Hansen (1995) can be quite large. In some cases (de-pending on the deterministic case considered) the power of EJ can be up to 2-3 timeslarger than the power of CADF . The difference between EJ and CADF is similarto the difference between ADF and PT in the univariate case. In fact when R2 is zeroand there is no gain in using the stationary covariate, the asymptotic distributions of

3Note that R2 in this paper corresponds to 1− ρ2 in Hansen(1995)s notation.4Hansen (1995) also computes the power envelope for the less general case in which there are not

deterministic terms present.

8

CADF and EJ are equivalent respectively to the asymptotic distributions of ADFand PT .

In the context of univariate tests, Elliott, Rothenberg and Stock (1996) show that,although PT has higher power that ADF , the ADF test has slightly smaller size dis-tortions. Interestingly, Elliott, Rothenberg and Stock (1996) propose an alternativetest that is computed by Þrst detrending the variable under a local alternative andthen applying the ADF with no mean (ADF -GLS). Although this test does nothave the same optimality justiÞcation of the PT test, simulations show that it hassimilar power properties while having slightly better size properties. As the ADF -GLS is easier to implement, while having similar properties, often practitioner preferto use the ADF -GLS over PT tests.

This paper proposes a test that it similar in spirit to the ADF -GLS test, in thecontext of the multivariate model (1)−(3). The test is derived by applying HansensCADF test to variables that have been previously detrended under the alternative.

Theorem 2 [GLS Detrending]. When the model is generated according to (1)−(3) ,withT (ρ− 1) = c, and Assumption 1 to 4 are valid, then, as T →∞:

tϕ ⇒ c¡RJd2xyc

¢−1/2 +¡RJd2xyc

¢−1 ¡RJdxycdW2

¢¡RJd2xyc

¢−1/2where Jxyc (r) and Wxy (r) are as defined in Theorem 1, and Jdxyc (r) = Jxyc (r) ifno deterministic terms are included in the regression, and if a mean is included inthe regression, while Jdxyc (r) = Jxyc (r)−

£λJxyc (1) + (1− λ) 3

RsJxyc (s) ds

¤r where

λ = 1−c1−c_c2/3

if a mean and trend are included in the regression.

3. Size and Power Comparison3.1. Power Comparison. All three tests studied in this paper have asymptoticpower that depends on the nuisance parameter R2. Although the particular estimatorused to estimate the nuisance parameters does not affect the asymptotic distributionsunder the local alternatives, the Þnite sample properties of tests for a unit root canbe sensitive to the choice of the estimation method. To study the small samplebehavior of the proposed test, we simulate equation with scalar xt :

xt = ux,t, (8)

∆yt = (ρ− 1) yt−1 + uy,t

9

To study the small sample power of the tests we simply allow the error process

to be i.i.d. N (0,Σ), so that Ω = Σ where we choose Ω =

µ1 RR 1

¶for R2 =

0, 0.3, 0.5, 0.7. To be able to make a meaningful comparison or the rejection ratesunder the alternative Table 5, reports the size adjusted power5.

TABLES 5- ABOUT HERE

For Case 1 no deterministic terms are included in the regression, for Case 2-Case3, the regressions are estimated with a mean, and for Case 4-5 the regressionsinclude mean and trend. In each case R2 is estimated as suggested by Elliott andJansson (2003) and Hansen (1995)6 Table 5 presents the rejection rates over 10, 000replications of samples of length 100. As Table 5 shows the all three tests have powerthat is increasing with R2. The relative ranking of the tests in Þnite samples isconsistent with the asymptotic results. As documented in Elliott and Jansson (2003),EJ can have signiÞcant higher power than CADF while the power of CADF −GLSis higher that the power of CADF and it is very similar to the power of EJ.

3.2. Size Comparison. To compare the tests in term of size distortions moredynamic in the error terms is allowed. The error process ut = (uy,t, ux,t)

0 is generatedby the VARMA(1,1) model (I2 −AL)ut = (I2 +ΘL) εtwhere

A =

µa1 a2a2 a1

¶, Θ =

µθ1 θ2θ2 θ1

¶,

and εt ∼ i.i.d. N (0,Σ) , where Σ is chosen in such a way that the long-run variancecovariance matrix of ut satisÞes

Ω = (I2 −A)−1 (I2 +Θ)Σ (I2 +Θ)0 (I2 −A)−10 =µ1 RR 1

¶, R ∈ [0, 1) .

The number of lags and leads is estimated by BIC on a VAR on the Þrst differences(under the null) with a maximum of 8 lags. The same lag length chosen by BICis used for all three tests. For case 2-Case3, the regressions are estimated with amean. For Case 4-5 the model is estimated with mean and trend. The sample sizeis T = 100 and 10, 000 replications are used.

TABLES 1-4 ABOUT HERE

5The small sample power is computed from the tests. It would also be interesting to simulatedthe asymptotic power directly from the asymptotic distribution and see if the two are different. Thiscomparison will be available in a future version of the paper.

6 In the former test R2 is estimated parametrically while in the latter it is estimated non-parametrically.

10

Tables XX and XX compare the small sample size of Elliott and Janssons (2003) test,Hansens (1995) CADF test, and CADF -GLS test for various values of Θ and A.To compute the critical values in each case we estimate the value of R2 as suggestedby Elliott and Jansson (2003) and Hansen (1995). Overall the Elliott and Jansson(2003) test is worse in term of size performance than the CADF test. This is thesame type of difference found between thePT and DF tests in the univariate case, sois not surprising given that these methods are extensions of the two univariate testsrespectively. The difference between the two tests is more evident for large valuesof R2 and for the case with no trend. When Θ is nonzero both tests present sizedistortions that are severe in the presence of a large negative moving average root(as is the case for unit root tests), emphasizing the need of proper modeling of theserial correlation present in the data7

7Ng and Perron (2001) provide a method for better lag selection in the univariate case.

11

Table 1: Small Sample Size, Deterministic Case 2.A a1 0 0.2 0.8 0.2 0 0 0 0 0.2

a2 0 0 0 0.5 0 0 0 0 0Θ θ1 0 0 0 0 -0.2 0.8 -0.5 -0.8 -0.5

θ2 0 0 0 0 0 0 0 0 0

R2 = 0 0.059 0.059 0.076 0.121 0.102 0.102 0.165 0.346 0.158EJ R2 = 0.3 0.065 0.067 0.098 0.104 0.099 0.128 0.144 0.296 0.142

R2 = 0.5 0.060 0.071 0.119 0.097 0.094 0.164 0.141 0.285 0.137R2 = 0.7 0.066 0.083 0.153 0.102 0.099 0.260 0.159 0.325 0.152

R2 = 0 0.051 0.045 0.076 0.103 0.075 0.056 0.100 0.256 0.105CADF R2 = 0.3 0.055 0.056 0.082 0.073 0.070 0.061 0.082 0.197 0.086

R2 = 0.5 0.057 0.064 0.086 0.056 0.062 0.067 0.068 0.144 0.070R2 = 0.7 0.053 0.065 0.089 0.047 0.055 0.069 0.056 0.092 0.056

R2 = 0 0.072 0.058 0.089 0.086 0.111 0.070 0.152 0.295 0.156CADF -GLS R2 = 0.3 0.078 0.072 0.093 0.075 0.099 0.076 0.117 0.210 0.121

R2 = 0.5 0.072 0.071 0.090 0.075 0.078 0.073 0.092 0.156 0.093R2 = 0.7 0.066 0.072 0.087 0.076 0.067 0.071 0.066 0.094 0.068

Lags by BIC with a maximum of 8 (should use MAIC for CADF -GLS), T = 100, NMC =10, 000.

12


a2 0 0 0 0.5 0 0 0 0 0Θ θ1 0 0 0 0 -0.2 0.8 -0.5 -0.8 -0.5

θ2 0 0 0 0 0 0 0 0 0

R2 = 0EJ R2 = 0.3

R2 = 0.5R2 = 0.7

R2 = 0CADF R2 = 0.3

R2 = 0.5R2 = 0.7

R2 = 0CADF -GLS R2 = 0.3

R2 = 0.5R2 = 0.7

13


a2 0 0 0 0.5 0 0 0 0 0Θ θ1 0 0 0 0 -0.2 0.8 -0.5 -0.8 -0.5

θ2 0 0 0 0 0 0 0 0 0

R2 = 0EJ R2 = 0.3

R2 = 0.5R2 = 0.7

R2 = 0CADF R2 = 0.3

R2 = 0.5R2 = 0.7

R2 = 0CADF -GLS R2 = 0.3

R2 = 0.5R2 = 0.7

14


a2 0 0 0 0.5 0 0 0 0 0Θ θ1 0 0 0 0 -0.2 0.8 -0.5 -0.8 -0.5

θ2 0 0 0 0 0 0 0 0 0

R2 = 0EJ R2 = 0.3

R2 = 0.5R2 = 0.7

R2 = 0CADF R2 = 0.3

R2 = 0.5R2 = 0.7

R2 = 0CADF -GLS R2 = 0.3

R2 = 0.5R2 = 0.7

15

Table 5: Size adjusted, Small Sample Power.Case 2 Case 3

R2\c -2 -5 -10 -15 -2 -5 -10 -15

0 0.119 0.292 0.654 0.873EJ 0.3 0.148 0.438 0.847 0.965

0.5 0.208 0.653 0.965 0.9960.7 0.270 0.858 0.998 1.000

0 0.065 0.115 0.310 0.606CADF 0.3 0.090 0.202 0.553 0.859

0.5 0.126 0.339 0.782 0.9610.7 0.208 0.604 0.948 0.997

0 0.119 0.286 0.664 0.902CADF -GLS 0.3 0.151 0.409 0.796 0.945

0.5 0.237 0.601 0.911 0.9800.7 0.381 0.796 0.974 0.996

Case 4 Case 5R2\c -2 -5 -10 -15 -2 -5 -10 -15

0EJ 0.3

0.50.7

0CADF 0.3

0.50.7

0CADF -GLS 0.3

0.50.7

Table reports the size adjusted power for each test. The rEJection rate for c=0 is 5% in all

cases and it is not reported. T=100, NMC=10,000

16

4. ReferencesBerk, K.N., 1974, Consistent Autoregressive Spectral Estimates, Annals of Statis-tics 2, 486-502.

Brillinger, D.R., 2001, Time Series: Data Analysis and Theory.

Chan, N.H. and C. Z. Wei, 1988: Limiting Distributions of the Least SquaresEstimates on Unstable Autoregressive Processes, Annals of Statistics, 16, 367-401.

Elliott, G. and M. Jansson, 2003: Testing for Unit Roots with Stationary Co-variates, Journal of Econometrics, 115, 75-89.

Elliott, G., M. Jansson, and E. Pesavento, 2005: Optimal Power for Testing Po-tential Cointegrating Vectors with Known Parameters for Nonstationarity, Journalof Business & Economic Statistics, Vol. 23,no.1,pp.34-48.

Elliott, G., T. J. Rothenberg and J. Stock, 1996: Efficient Tests for an Autore-gressive Unit Root, Econometrica, 64, 813-836.

Hansen, B.E., 1995: Rethinking the Univariate Approach to Unit Root Testing:Using Covariates to Increase Power, Econometric Theory, 11,1148-1172.

Jansson, M. and M. Moreira, 2995: Optimal Inference in Regression Models withNearly Integrated Regressors, mimeo.

Lutkephol, H. and P. Saikkonen, 1999, Order Selection in Testing for the Coin-tegrating Rank of VAR Process, in: R.F. Engle and H. White, eds., Cointegration,Causality and Forecasting, (Oxford University Press, Oxford), 168-199.

Ng, S, and P. Perron, 1995, Unit Root Tests in ARMA Models with Data-Dependent Methods for Selection of the Truncation Lag, Journal of the AmericanStatistical Association, 90, 268-281.

Pesavento, Elena and Barbara Rossi (2004) Small Sample ConÞdence Intervalsfor Multivariate Impulse Response Functions at Long Horizons, Duke UniversityWorking Paper #03-19.

Phillips, P.C.B., 1987a: Toward and UniÞed Asymptotic Theory for Autoregres-sion, Biometrica, 74, 535-547.

Phillips, P.C.B. and V. Solo, 1992, Asymptotics of Linear Processes, The AnnalsOf Statistics 20, 971-1001.

17

Rossi B. 2004, ConÞdence intervals for half-life deviations from Purchasing PowerParity, Journal of Business & Economic Statistics, forthcoming.

Rossi B. 2005. Expectations Hypotheses Tests at Long Horizons, mimeo.

Said, E.D. and D.A. Dickey, 1984, Testing for Unit Roots in AutoregressiveMoving Average Models of Unknown Order, Biometrika 71, 599-607.

Saikonnen, P, 1991, Asymptotically Efficient Estimation of Cointegration Re-gressions, Econometric Theory 7, 1-21.

Sims, C.A., Stock J.H. and M.W. Watson, 1990, Inference in Linear Time SeriesModels with some Unit Roots, Econometrica 58, 113-144.

Wooldridge, J., 1994, Estimation and Inference for Dependent Processes, in D.Mc Fadden and R.F. Engle, eds., Handbook of Econometrics, vol.4, (North-Holland,Amsterdam), 2639-2738.

18

5. AppendixNotation used: kk is the standard Euclidean norm, ⇒ denotes weak convergence, ...

Lemma 1. When the model is generated according to (1) − (3) ,with T (ρ− 1) = c,then, as T →∞ :

(i) ω−1/2y.x T−1/2uy[T ·] =⇒ Jdxyc (·)Jxyc (r) is a Ornstein-Uhlenbeck process such that

Jxyc (r) =Wxy (r) + c

Z 1

0e(λ−s)cWxy (s) ds

Wxy (r) =q

R2

1−R2Wx (r) + Wy (r), Wx (r) and Wy (r) are independent standard

Brownian Motions, and Jdxyc (r) = Jxyc (r) if no deterministic terms are includedin the regression, Jdxyc (r) = Jxyc (r) −

RJxyc (s)ds if a mean is included in the re-

gression, and Jdxyc (r) = Jxyc (r)− (4− 6r)RJxyc (s)ds− (12r − 6)

RsJxyc (s) ds if a

mean and trend are included in the regression.

Proof. (i)Assumption A1-A3 imply T−1/2P[T ·]t=1 ut (ρ)⇒ Ω1/2W (·) where Ω1/2 ="

Ω1/2xx 0

ωyxΩ−1/2xx ω

1/2y.x

#, ω

1/2y.x = ωyy − ωyxΩ−1xxωxy and W 0 =

h fW 0x Wy

i0. Using this

notation T−1/2P[T ·]t=1 ut (ρ) ⇒ ωyxΩ

−1/2xx

fW 0x + ω

1/2y.xWy. DeÞne δ0 = ω

−1/2y.x ωyxΩ−1xx

so that δ0δ = R2

1−R2 ; then from Phillips (1987 a,b) and the multivariate Functional

Central Limit Theorem, ω−1/2y.x T−1/2uy[T ·] ⇒ δ0fW 0

x +Wy =q

R2

1−R2Wx +Wy whereWx is an univariate standard Brownian Motion independent of Wy.

(ii) and (iii) Follows directly from the Chan and Wei (1988) and Phillips (1987a).

Proof also in Pesavento (2004)

Recall equation (5) ∆yt = dt + (ρ− 1) yt−1 +P+∞j=−∞ eπ0x,jxt−j + ηt. ηt is uncor-

related at all leads and lags with xt but it is serially correlated. Assume for examplethat ψ (L) ηt = ξt where ξt is white noise and The regression (5) can be augmentedwith lags of ∆yt to obtain errors that are white noise as in Hansen (1995). The testsuggested by Hansen (1995) is then based on the t-statistics on a augmented regres-sion in which lagged, contemporaneous and future values of the stationary covariateincluded:

∆yt = edt + ϕyt−1 +P∞j=−∞ π

0x,jxt−j +

P∞j=1 πy,j∆yt−j + ξt

where ϕ = ψ (1) (ρ− 1). The deterministic is such that there is no mean inCase 1, a mean in Case 2 and 3 and a mean and trend in Cases 4 and 5. Given

19

the absolute summability condition we can approximate the regression with a Þnitenumber of lags k :

∆yt = edt + ϕyt−1 +Pkj=−k π

0x,jxt−j +

Pkj=1 πy,j∆yt−j + ξtk (9)

where ξtk = ξt +P|j|>k π

0x,jxt−j +

Pj>k π

0y,j∆yt−j .

To prove Theorem 1 lets Þrst prove some auxiliary results. For easiness ofnotation I will consider the case in which there are no deterministic terms. Theextension to the other cases is straightforward. as they are applications of the sameapproach to demeaned or demeaned and detrended variables.

Following the same methodology of Sims, Stock and Watson (1990) rewrite (9) as

∆yt = Π0wtk + ξtk

whereΠ0 =£ϕ π0

¤=

£ϕ π0x,−k ... π0x,k πy,1 ... πy,k

¤, w0tk =

£yt−1 X 0 ¤

,

X 0 =£x0t+k ... x0t ... x0t−k ∆yt−1 ... ∆yt−k

¤. The proof follows closely

Berk (1974), Said and Sickey (1984) and Saikonnen (1991). As Berk (1974) I usethe standard Euclidean norm kzk = (z0z)1/2 of a column vector z to deÞne a matrixnorm kBk such that kBk = sup kBzk : z < 1. Notice that kBk2 ≤ P

ij bij andthat kBk is dominated by the largest modulus of the eigenvalues of B.

Let Υ denote the diagonal matrix of dimensions m (2k + 1) + (k + 1):

Υ = diagh(T − 2k) (T − 2k)1/2 Im ... (T − 2k)1/2 Im (T − 2k)1/2 ... (T − 2k)1/2

iand R = Υ−1

³PT−kt=k+1wtkw

0tk

´Υ−1. We are interested in the difference between

R and R = diagh(T − 2k)−2PT−k

t=k+1 y2t−1 ΓX

iwith ΓX = E [XX 0].

Lemma 2.°°° R−R°°° = Op ¡

k2/T¢

Proof. Denote Q = [qij] = R−R . By deÞnition q11 = 0. When i > 1 and j > 1Dickey and Fuller (1984) show that (T − 2k)E

³q2ij

´≤ C for some C, where 0 < C <

∞ and it is indipendent of i, j and T . Since Q has dimensions m (2k + 1) + 1 + k,

E³kQk2

´≤ m(2k+1)+1+k

T−2k so if k2/T → 0, kQk converges in probability ot zero.

Lemma 3.°°R−1°° = Op (1)

20

Proof. Since R−1 is black diagonal,°°R−1°° is bounded by the sum of the norms of

the diagonal blocks. Under Lemma 1, and if k/T → 0, (T − 2k)−2 PT−kt=k+1 y

2t−1 ⇒

ωy.xRJ2xyc while the lower right corner of R

−1 is Γ−1X which is bounded since all theelements of X are stationary.

Lemma 4.°°° R−1 −R−1°°° = Op ¡

k/T 1/2¢

Proof. The proof follows directly from Dickey and Fuller (1984).

Denote et =P|j|>k π

0x,jxt−j +

Pj>k π

0y,j∆yt−j so that εtk = et + εt. Note that

E ketk2 ≤ CµP

|j|>k°°°π0x,j°°°2 +P

j>k

°°°π0y,j°°°2¶Lemma 5.

°°°Υ−1 PT−kt=k+1wtket

°°° = Op ¡k1/2

¢

Proof. E°°°Υ−1 PT−k

t=k+1wtket

°°°2 = E °°°(T − 2k)−1PT−kt=k+1 yt−1et

°°°2+E °°°(T − 2k)−1/2 PT−kt=k+1Xet

°°°2.E

°°°(T − 2k)−1 PT−kt=k+1 yt−1et

°°°2 = E h(T − 2k)−1 PT−k

t=k+1 uy,t−1eti2. Under Lemma

1, if k/T → 0, (T − 2k)−1 PT−kt=k+1 uy,t−1et = Op (1) andE

°°°(T − 2k)−1 PT−kt=k+1 yt−1et

°°°isOp (1). Additionally, E

°°°(T − 2k)−1/2 PT−kt=k+1Xet

°°°2 ≤ (T − 2k)−1 PT−kt=k+1E kXetk2 ≤

(T − 2k)−1 PT−kt=k+1E kXk2E ketk2 ≤ Ctr (ΓX)

µP|j|>k

°°°π0x,j°°°2 +Pj>k

°°°π0y,j°°°2¶ ≤

(C (2k + 1) tr (Γx) + ktr (Γ∆y))

µP|j|>k

°°°π0x,j°°°2 +Pj>k

°°°π0y,j°°°2¶. Under Assump-

tion 1,P|j|>k

°°°π0x,j°°°2 andPj>k

°°°π0y,j°°°2 are bounded, E °°°(T − 2k)−1/2 PT−kt=k+1Xet

°°°2is Op (k) , E

°°°Υ−1PT−kt=k+1wtket

°°°2 = 0p (k) and °°°Υ−1 PT−kt=k+1wtket

°°° = Op ¡k1/2

¢

Lemma 6.°°°Υ−1 PT−k

t=k+1wtkξt

°°° = Op ¡k1/2

¢

Proof. E°°°Υ−1 PT−k

t=k+1wtkξt

°°°2 = E °°°(T − 2k)−1 PT−kt=k+1 yt−1ξt

°°°2+E °°°(T − 2k)−1/2PT−kt=k+1Xξt

°°°2.E

°°°(T − 2k)−1 PT−kt=k+1 yt−1et

°°°2 = E h(T − 2k)−1 PT−k

t=k+1 yt−1eti2which is Op (1) un-

der Lemma 1 and given that k/T → 0. Additionally, because all the elements ofX are

21

stationary and uncorrelated at all leads and lags with εt, E°°°(T − 2k)−1/2 PT−k

t=k+1Xξt

°°°2 ≤(T − 2k)−1 PT−k

t=k+1E kXk2E kξtk2 = (C (2k + 1) tr (Γx) + ktr (Γ∆y))σ2ξ = Op (k)

and°°°Υ−1PT−k

t=k+1wtkξt

°°° = Op ¡k1/2

¢by Markov inequality.

Now we have all the results necessary to prove Theorem 1.

Proof. [Theorem 1] Υ³Π−Π

´= R−1Υ−1

PT−kt=k+1wtkξtk =

=³R−1 −R−1

´Υ−1

PT−kt=k+1wtkξtk +R

−1Υ−1PT−kt=k+1wtkξtk =

=³R−1 −R−1

´Υ−1

PT−kt=k+1wtkξt−

³R−1 −R−1

´Υ−1

PT−kt=k+1wtket+R

−1Υ−1PT−kt=k+1wtkξtk =

=E1 +E2 +E3. By Lemma 4, 5,6 and if k3/T → 0, both kE1k and kE2k are oforder op (1) . Because R−1 is black diagonal

(T − 2k) (ϕ− ϕ) =³(T − 2k)−2PT−k

t=k+1 y2t−1

´−1 ³(T − 2k)−1 PT−k

t=k+1 yt−1ξtk´+

op (1) with (T − 2k)−1PT−kt=k+1 yt−1ξtk = (T − 2k)−1

PT−kt=k+1 yt−1ξt+op (1). See also

Ng and Perron (1995) p. 278. Under Assumption 1, by Chan andWei (1988), Phillips(1987) and Lemma 1, it is easy to show that if k/T → 0, (T − 2k)−1PT−k

t=k+1 yt−1ξt ⇒ωy.xψ (1)

RJxycdW2 as ξt = ψ (L) ηt and 2π times the spectral density at frequency

zero of ηt is ωy.x = ωyy − ωyxΩ−1xxωxy. Recalling that (T − 2k)−2PT−kt=k+1 y

2t−1 ⇒

ωy.xRJ2xyc, (T − 2k) (ϕ− ϕ)⇒ ψ (1)

¡RJ2xyc

¢−1 ¡RJxycdW2

¢. DeÞne s2ξtk = (T − 2k)−1

PT−kt=k+1

ξ2tk,

sξtk converges in probability to the variance of ξtk which is ω1/2y.xψ (1) and (T − 2k)SE (ϕ)⇒

ψ (1)¡RJ2xyc

¢−1/2. Because tϕ = (T−2k)ϕ

(T−2k)SE(ϕ) +(T−2k)(ϕ−ϕ)(T−2k)SE(ϕ) =

(T−2k)(ρ−1)ψ(1)(T−2k)SE(ϕ) +

(T−2k)(ϕ−ϕ)(T−2k)SE(ϕ) =

(T−2k)ψ(1)(T−2k)SE(ϕ)+

(T−2k)(ϕ−ϕ)(T−2k)SE(ϕ)+

(T−2k)(c/T )ψ(1)(T−2k)SE(ϕ) , tϕ ⇒ c

(RJ2xyc)

−1/2+(RJ2xyc)

−1(RJxycdW2)

(RJ2xyc)

−1/2

as long as kT → 0.

If the deterministic terms are not zero, the same proof can be applied to demeanedvariables for Cases 2 and 3 and to demeaned and detrended variables for Cases 4 and5.

Proof. [Theorem 2] The proof of Theorem 2 follows directly from the proof ofTheorem 1, with different Brownian Motions deriving from the different detrendingprocedure.

Documents

Near-Optimal Unit Root Tests with Stationary Covariates ...4 shocks to yt.R2 is the multiple coherence of (1−ρL)yt with xt at frequency zero (Brillinger (2001), p. 296) and it measures