A RESIDUAL-BASED TEST FOR STOCHASTIC COINTEGRATION

A Residual-Based Test for Stochastic Cointegration

Brendan McCabeSchool of Management,University of Liverpool

Stephen LeybourneDepartment of Economics,University of Nottingham

David HarrisDepartment of Economics,University of Melbourne

May 2005

Abstract

We consider the problem of hypothesis testing in a modified version of the stochastic integration andcointegration framework of Harris, McCabe and Leybourne (2002). This nonlinear setup allows for volatilityin excess of that catered for by the standard integration/cointegration paradigm through the introductionof nonstationary heteroscedasticity. We propose a test for stochastic cointegration against the alternativeof no cointegration and a secondary test for stationary cointegration against the heteroscedastic alternative.Asymptotic distributions of these tests under their respective null hypotheses are derived and consistencyunder their respective alternatives is established. Monte Carlo evidence suggests the tests will perform wellin practice. An empirical application to the term structure of interest rates is also given.

We are most grateful to the Associate Editor and two anonymous referees for providing helpful commentson earlier versions of this paper.

1

1 IntroductionThe cointegration framework of Engle and Granger (1987) (EG) is characterized by two widely held stylizedempirical facts. The first is that, of the set of economic time series that exhibit trending behaviour, manyare adequately modelled by processes that are integrated, usually of order one, I(1). The second is that,despite this trending behaviour, such series often tend to co-move over time according to a stationary, orI(0), process i.e. they are cointegrated. Many empirical tests of important economic hypotheses are carriedout within the EG framework, for example the relationship between long run and short run interest rates- the term structure. The EG approach has, perhaps surprisingly however, uncovered only very limitedempirical evidence in support of the term structure (see Campbell and Shiller (1987)). An explanationoften put forward for this is that bond market series tend to be too volatile to be compatible with theI(1)/I(0) framework. That is, the individual series often appear visually to be more volatile, or less smooth,than would be consistent with I(1) and when co-movements between series are analyzed (most simply byexamining the spreads) these also tend to display periods of volatility in excess of that which could beassociated with stationary behaviour. In the words of Campbell and Shiller, the spreads tend to “movetoo much”.One possible approach to dealing with the presence of extra volatility is within the stochastic integration

and cointegration framework of Harris, McCabe and Leybourne (2002) (HMLa). Here, the restrictivestationarity requirement of first differences of individual series and cointegrating error terms of the EGsetup is replaced with a looser condition that these are stochastically trendless; that is, they are simply freeof I(1) stochastic trends. This notion, of course, encompasses the EG setup as a special case. We outlinethis framework in the Section 2.In Section 3 we turn to the issue of hypothesis testing in a regression model representation. The central

hypothesis of interest is whether series are stochastically cointegrated (either stationary or heteroscedastic),or not cointegrated. We suggest a residual-based statistic to test the null of stochastic cointegration. Withinstochastic cointegration, we also consider the hypothesis that the cointegration is stationary against thealternative that it is nonstationary heteroscedastic and we suggest a second statistic to test this. Moreover,when applied to first differences of an individual series, this same statistic can also be used to test the nullof I(1) against heteroscedastic integration. The asymptotic null distributions of these two test statisticsare derived under weak regularity conditions. Both are shown to have normal limit distributions that,unlike most cointegration tests, do not depend on the number of regressors involved. Their consistencyproperties under associated alternative hypotheses are also established.Some Monte Carlo studies, which examine the finite sample size and power characteristics of the new

tests, along with those of their conventional counterparts, are provided in Section 4. These highlightclearly the benefits to be gained by adopting the new test procedures, together with the shortcomingsof using conventional ones, in the stochastic cointegration framework. Finally, in Section 5 we apply ourtests to bond market data from several major economies. Our new testing framework uncovers supportingevidence in favour of the term structure the bond market, in the same situation where conventional testsyield inconsistent results. Notably, for all the interest rate series we consider here, we conclude they arebetter modelled by heteroscedastically integrated, rather than I(1), processes.

2 Stochastic Integration and CointegrationWe first consider a variant of the model introduced in HMLa

zt = µ+ δt+Πwt + εt +Vtht (1)

wt = wt−1 + ηt,

ht = ht−1 + υt

for t = 1, . . . , T . Here zt, µ, δ and εt are m × 1 vectors; wt and ηt are n × 1 vectors; ht and υt arep × 1 vectors; Π and Vt are m × n and m × p matrices, respectively. Only the process zt is observed.The disturbances εt, ηt, υt and Vt are mean zero stationary processes, which may be correlated with oneanother, wt and ht are vectors of integrated processes. So, apart from deterministics, zt consists of anintegrated component,Πwt, together with a shock term, εt+Vtht. This latter term has a linear component,εt, and a nonlinear component Vtht that is nonstationary heteroscedastic through its dependence on the

2

I(1) process ht. Note that it is entirely possible throughout our analysis that wt and ht contain identicalprocesses, though we do not enforce this restriction.1

As regards the statistical properties of the disturbance terms in(1), we make the following linear processassumption. This allows for general forms of serial correlation, cross-correlation and endogeneity.

Assumption LP.Let ζt = [υ

0t, vec(Vt)

0,η0t, ε0t]0 be generated by the vector linear process ζt =

P∞j=0Cjξt−j , where

1.P∞

j=0 j kCjk <∞ with C0 having full rank.2

2. ξt is an i.i.d. sequence.

3. E¡ξtξ

0t

¢= I.

4. For all i, E(ξ16it ) is bounded.

To examine the properties of the model more clearly, we make the temporary simplifying assumptionthat µ = δ = 0. Next, let ei be an m × 1 vector with 1 in its i’th position and 0 elsewhere, so thate0izt = zit, the i’th element of the vector zt. Then, from (1), we have

zit = e0iΠwt + e

0i(εt +Vtht)

and if e0iΠ 6= 0 then zit is said to stochastically integrated. If, in addition, e0iE(VtV0t)ei > 0, zit is said to

be heteroscedastically integrated (HI) due to the term e0iVtht; whilst if e0iVt = 0 then zit is simply I(1).So, a stochastically integrated variable encompasses both ordinary and heteroscedastic integration.To model linear relationships between the variables in zt, let c be a non-zero m× 1 vector and consider

c0zt = c0Πwt + c

0(εt +Vtht).

If c0Π = 0 then the variables of zt are said to be stochastically cointegrated. Under stochastic cointegrationc0zt = c

0(εt +Vtht) behaves like a stochastically integrated process net of its stochastic trend componentand we refer to such a process as being stochastically trendless.3 This terminology is adopted because,under Assumption LP, we can show that as s→∞ (with t fixed)

E(εt+s +Vt+sht+s|=t)−E(εt+s +Vt+sht+s)p→ 0.

In other words, the behaviour of the process up to time t has negligible effect on its behaviour into theinfinite future.4 Therefore, even though the disturbances υt have an infinitely persistent effect on ht+s,their effect on the level of Vt+sht+s is only transitory. This implies that the product process Vtht isstochastically trendless, even if Vt is correlated with υt. While it is the case that Vtht is nonstationaryheteroscedastic, as it can be shown to exhibit linearly trending in variance, it is the stochastically trendlessnature of c0zt = c0(εt +Vtht) which bestows meaning to co-movement of a nonstationary heteroscedastickind.When c0E(VtV

0t)c = 0, then c0zt = c0εt is stationary. If, in addition, Vt = 0, the variables are all

integrated and cointegrated in the standard EG sense. Because of the stationary behaviour of c0zt ineither case, we simply refer to this as stationary cointegration. When c0E(VtV

0t)c > 0, the variables zt

1 In HMLa, ht = wt. Here if any element of wt is identical to that of ht we would simply delete the corresponding elementof υt from Assumption LP below.

2Here and throughout kAk = tr (A0A).3More formally, a vector stochastic process, ut, to said to be stochastically trendless if, as s→∞ (t fixed)

E(ut+s|=t)−E(ut+s)p→ 0

where =t is the sigma field of information of all the elements in the vector up to time t. This implies that the (MSE)optimal s step ahead forecasts of a stochastically trendless process converge to the unconditional mean of the process as theforecast horizon s increases. Following the Beveridge and Nelson (1981) definition, such a process has no stochastic trend (orpermanant component), hence the terminology “stochastically trendless”. An analogous definition has also been used in theliterature on economic convergence; see Bernard and Durlauf (1996). Trendlessness is similar to the concept of a mixingaleand the associated notion of asymptotic unpredictability, with the practically minor difference that the convergence of theconditional expectation in our definition is in probability rather than in an Lp norm.

4A proof of this result is available upon request.

3

are said to be heteroscedastically cointegrated. Thus, stochastic cointegration encompasses both stationarycointegration (possibly of the EG kind) and heteroscedastic cointegration.To further position our concept of heteroscedastic cointegration, note that I(1), HI and the closely

related stochastic unit root processes all share the properties of having linear trends in their variances whilstnot being stochastic trendless.5 When these models of nonstationarity are extended to the multivariatecointegration setting, standard cointegration implies that a certain linear combination of the series becomesstochastically trendless and removes the linear trend in variance. Hence, our definition of heteroscedasticcointegration effectively provides a halfway point to standard cointegration, since the linear combinationbecomes stochastically trendless, while the linear trend in variance remains.

3 Hypothesis Tests and Test StatisticsOur primary goal is to determine if the system is stochastically cointegrated. This null, and alternative ofnon-cointegration, may be stated as H0 : c0Π = 0 and H1 : c0Π 6= 0. Within stochastic cointegration, wemay wish to know whether stationary or heteroscedastic cointegration pertains. The null of stationary coin-tegration against the heteroscedastic alternative may be tested by partitioning H0 as H0

0 : c0E(VtV

0t)c = 0

and H01 : c

0E(VtV0t)c > 0.

It proves convenient to interpret these hypotheses within a regression model. Partition zt into a scalaryt and an (m − 1) × 1 vector xt as zt = [yt,xt0]0 then partitioning (1) conformably, and rearranging, weobtain ∙

ytxt

¸=

∙µyµx

¸+

∙δyδx

¸t+

∙π0yΠx

¸wt +

∙εytεxt

¸+

∙ν0ytVxt

¸ht (2)

where yt, µy, δy and εyt are scalars, xt, µx, δx and εxt are (m− 1)× 1 vectors, π0y and ν0yt are 1× n and1×p vectors, respectively, whileΠx and Vxt are (m−1)×n and (m−1)×p matrices. Letting c = [1,−β0]0,α = µy−β0µx, κ = δy−β0δx, et = εyt−β0εxt = c0εt, q0 = π0y−β0Πx = c

0Π and ν 0t = ν 0yt−β0Vxt= c0Vt,

then we have

yt = α+ κt+ x0tβ + ut, (3)

ut = et + q0wt + ν0tht. (4)

Thus, the regression error term ut is composed of the stationary term et, the integrated term q0wt andthe heteroscedastic component ν 0tht. Note that ut need not have zero mean so that α is not an interceptin the usual sense. In the regression framework we assume there is only one cointegrating vector so thatrank(Πx) = m− 1, which imposes the restriction that n ≥ m− 1.This implies that further sub-relationships among the xt variables in (3) are excluded.6 The null

hypothesis of stochastic cointegration against alternative of non-cointegration can now be expressed via(3) as H0 : q = 0 and H1 : q 6= 0. Within H0, the null hypothesis of stationary cointegration against theheteroscedastic alternative is H0

0 : E (ν0tνt) = 0 against H

01 : E (ν

0tνt) > 0.

For later use, we also define the lag covariances for an arbitrary process at by

γj (at) = T−1TX

s=j+1

asas−j

and define a HAC estimator of the long run variance (LRV) by

ω2 (at) = γ0 (at) + 2lX

j=1

λ(j/l)γj (at) (5)

5There is a growing body of evidence that many economic and financial time series previously considered to be I(1) aremore appropriately modelled as HI or stochastic unit root processes. See the results in Section 5 of this paper and, interalia, Hansen (1992a), Leybourne McCabe and Tremayne (1996), Granger and Swanson (1997), Wu and Chen (1997) andPsaradakis et al. (2001).

6A special case of this model is studied by Hansen (1992a). When q = 0 and Vxt= 0, (3) corresponds to a regressionmodel when the regressors variables are all I(1) and the error term is heteroscedastic, so that the regressand and regressorsand are treated asymmetrically.

4

where λ (.) is a window with lag truncation parameter l. We also assume that assume that AssumptionKN below holds.

Assumption KN (Kernel and Lag Length)

1. λ (0) = 1.

2. 0 ≤ λ (x) ≤ 1 for 0 ≤ x < 1.

3. λ (x) is continuous and of bounded variation on [0, 1] .

4. l→∞ as T →∞.

3.1 Testing H0 Against H1

To test stochastic cointegration against non-cointegration we need to test whether q = 0 in

ut = et + q0wt + ν0tht.

Here, the null hypothesis is composite, encompassing both stationary and heteroscedastic cointegration;while the alternative is I(1) or heteroscedastic integration. It is far from clear how to go about constructingan optimal test statistic at this level of generality, let alone determine its limit null distribution, even if werestrict ourselves to making Gaussian i.i.d. assumptions about the distributions of the unobserved variables.Such complications, therefore, lead us to examine instead a simple statistic for which we can at least finda tractable limiting null distribution free of nuisance parameters, and also establish consistency. To thisend, we consider

Snc =XT

t=k+1utut−k. (6)

In the situation where all the disturbance terms are i.i.d., Snc with k = 1 would test for zero autocorre-lation in ut against the correlation induced by the I(1) term q0wt. When the disturbance terms are moregeneral than i.i.d., Snc needs to be modified to eliminate the nuisance parameters that result from theautocorrelation and from the presence of ν0tht. This is accomplished by allowing k to increase with T .7

Under the cointegrating null, H0, the statistic Snc (when standardized with a HAC variance estimator) isasymptotically N(0, 1) and is consistent under the alternative of no cointegration, H1. This is the contentof Theorem 1 below. Because of the linear process representation, letting k become large eliminates anycorrelation between ut and ut−k under H0, while the HAC variance estimator takes care of the term ν0tht.

8

Under the alternative, because of the I(1) term q0wt, allowing k to grow does not eliminate correlationbetween ut and ut−k. This distinction is the source of consistency of the test.Since yt and xt are observed, we estimate b = [α, κ,β0]0 of (3) by means of the estimator bk =

[αk, κk, β0k]0 given by

bk = (TX

t=k+1

Xt−k X0t)−1

TXt=k+1

Xt−kyt (7)

where Xt = [1, t,x0t]0. This estimator, described in HMLa, is called an asymptotic IV estimator (AIV).

Under H0, a minor modification of the proof of HMLa shows that βk is consistent as k and T → ∞;in contrast with the OLS estimator which is not consistent under heteroscedastic cointegration unless xtconsists entirely of I(1) processes. We now construct (6) using the AIV residuals:

but = yt − αk − κkt− x0tβk. (8)

We then have the following result.

7The form of the statistic Snc was earlier considered by Harris, McCabe and Leybourne (2003) in the context of stationaritytesting in a deterministic regression.

8Cointegrating versions of KPSS stationarity tests, such as that of Shin (1994), suffer from the fact that it is not possibleto remove the effects of nuisance parameters in the partial sum process of ut under the null of heteroscedastic cointegration,leading to incorrect size. The simulation studies of Section 4 confirm this.

5

Theorem 1 Assume the model (3), Assumption LP and Assumption KN hold. If k = O(T 1/2), l = o (k)and l < k, then(i) under H0, bSnc = T−1/2

PTt=k+1 butbut−kp

ω2(butbut−k) d→ N(0, 1),

(ii) under H1, the distribution of¯ bSnc ¯ diverges as T →∞.

Here but is defined in (8) using (7); ω2(.) is defined in (5).The first part of this proposition states that a properly standardized statistic, bSnc, is asymptotically

normal under stationary cointegration (which includes EG) and also under heteroscedastic cointegration;the second part shows that the test is consistent under H1. The same results arise if linear trends areexcluded from (3) and the fitted model.

3.2 Testing H00 Against H

01

In decomposing the composite hypothesis H0 into the null of stationary cointegration against the het-eroscedastic alternative, we need to test whether E (ν0tνt) = 0 in (4), maintaining q = 0. Under thetemporary assumption that et, νt, ηt and υt are all jointly Gaussian i.i.d. and uncorrelated with eachother, it follows from a straightforward application of McCabe and Leybourne (2000) that a locally mostpowerful test of H0

0 against H01 is given by

Shc =XT

t=1tu2t . (9)

We then have the following result.

Theorem 2 Under the conditions of Theorem 1,(i) under H0

0 , bShc = (1/12)1/2 T−3/2PTt=1 t

¡bu2t − σ2u¢q

ω2¡bu2t − σ2u

¢ d→ N(0, 1),

(ii) under H01 , the distribution of

¯ bShc ¯ diverges as T →∞.Here σ2u = T−1

PTt=1 bu2t ; ω2(.) is defined in (5).

Notice that bShc is calculated using bu2t − σ2u, rather than simply bu2t , as (9) might suggest. This alterationis needed to centre the statistic and render it invariant to the variance of ut under H0

0 .The structure of Shc can also be used to test the null of I(1) against the alternative of HI for any given

individual series, by simply constructing bShc by redefining but as but = ∆yt − δy where δy is an estimator ofthe trend coefficient δy given by δy = T−1

PTt=1∆yt . We denote this statistic bShi. It is a straightforward

special case of our results to show that bShi d→ N(0, 1) if yt is I(1) and¯ bShi ¯ diverges if yt is HI. The same

results arise if linear trends are excluded from (3), in which case but = ∆yt.94 Simulation ResultsIn this section we investigate, via Monte Carlo simulation, the finite sample behaviour of our new tests,comparing these with tests applied assuming the conventional paradigm. To test for the null of conventionalcointegration we applying the KPSS cointegration test, specifically the adaptation of Shin (1994) of theKwiatkowski et al. (1992) stationarity test. This test is uses an efficient OLS estimator in which [T 1/4]([.] denoting the integer part) lead and lag terms in ∆xt are added into the regression equation of yt onxt; see Saikkonen (1991) for details. We denote this test Kc. The tests bSnc, bShc, bShi and Kc all requirethe use of a kernel and a lag truncation parameter in their respective variance estimators. For all testswe use the Bartlett kernel for λ (.). As regards choice of l, we allow two schemes. The first simply fixes

9Analogous statistics can of course be constructed for each element of the vector xt.

6

https://www.researchgate.net/publication/4852085_A_Residual_Based_Test_of_the_Null_of_Cointegration_Against_the_Alternative_of_No_Cointegration?el=1_x_8&enrichId=rgreq-a4fe5eb3-1022-4fc0-93ed-83b464fb0e3a&enrichSource=Y292ZXJQYWdlOzIzNTY0OTkxO0FTOjI4ODc2MDYwNTQ5NTI5OEAxNDQ1ODU3MzAxNDI0

https://www.researchgate.net/publication/4852085_A_Residual_Based_Test_of_the_Null_of_Cointegration_Against_the_Alternative_of_No_Cointegration?el=1_x_8&enrichId=rgreq-a4fe5eb3-1022-4fc0-93ed-83b464fb0e3a&enrichSource=Y292ZXJQYWdlOzIzNTY0OTkxO0FTOjI4ODc2MDYwNTQ5NTI5OEAxNDQ1ODU3MzAxNDI0

l = [12(T/100)1/4], which is a fairly mainstream choice in the literature, while the second is the automaticdata-dependent selection method of Newey and West (1994).10 Here we enforce the restriction that, for ournew tests, l < k when l is chosen automatically. Regarding the choice of k, we wish to avoid choosing k ina data dependent manner as the O(T 1/2) rate is designed to deal with all processes covered by AssumptionLP. Of course, O(T 1/2) is not uniquely defined and so different possibilities need to be considered. Here weexamine three candidates. These are k = [0.75T 1/2], k = [T 1/2] and k = [1.25T 1/2]. While obviously notexhaustive, these choices nonetheless prove sufficient for us to be able to gauge the finite sample influenceof different choices of k, and also for us to recommend a value for use in practice.The simulation model we examine is (2) with m = n = p = 2. Specifically, our DGP is∙

ytxt

¸=

∙1 01 d1

¸ ∙w1tw2t

¸+

∙εytεxt

¸+

∙νyt 00 νxt

¸ ∙h1th2t

¸(10)

and the stochastic processes of (10) are generated according to

εyt = φε,yεyt−1 + 1t, εxt = φε,xεxt−1 + 2t,

νyt = φv,yνyt−1 + d2 3t, νxt = φv,xνxt−1 + d3 4t,

∆w1t = 5t, ∆w2t = 6t,

∆h1t = 7t, ∆h2t = 8t

with ( 1t, 2t, 3t, 4t, 5t, 6t, 7t, 8t)0 a multivariate standard normal white noise process. Here the di,

i = 1, 2, 3 are constants. Within this setup, if d1 = d2 = d3 = 0, thenH00 is true and stationary cointegration

between two I(1) series pertains, whilst if d1 6= 0, H1 is true and yt and xt are not cointegrated in anysense (irrespective of the status of d2 and d3). If d1 = 0 with d2 6= 0 and/or d3 6= 0, there is heteroscedasticcointegration. This may exist either between two HI series (d2 6= 0 and d3 6= 0) or between an I(1) andHI series (e.g. d2 = 0 and d3 6= 0). The model is generated over t = −99, ....0, 1, ..T , with the first 100start-up values discarded. We consider sample sizes of T = 200, 400, 600 and the number of replicationsfor all experiments is 10, 000. Tables entries represent empirical rejection frequencies of the various tests,based on regressions allowing constants but not trends, at the nominal asymptotic 0.05-level (these beingtwo tailed tests in the case of bShi, bSnc and bShc). For brevity, we only report results for the bShi tests appliedto yt. In terms of notation in the tables, if φi,j is not explicitly given, its value is set to zero. Variants ofthe tests based on the automatic lag selection are superscripted with an a.In Table 1 we have d1 = d2 = d3 = 0 throughout, so that H0

0 is true - stationary cointegration betweentwo I(1) series. The bShi test has near nominal size, indicating that I(1) rather than HI series are present,and any additional serial correlation in the form of non-zero values of φε,y clearly has little effect on its size.

As regards the test bSnc, its size is well controlled apart from when φε,y = 0.9 and φε,y = φε,x = 0.9. Here,when k = [0.75T 1/2] it is moderately oversized and so too frequently indicates absence of cointegration.However, setting k = [T 1/2] or k = [1.25T 1/2] virtually removes the oversizing problems, especially ifthe automated variants are considered. When we examine the test bShc, we find that the choice of k hasfar less effect on the size. For φε,y = 0.9 and φε,y = φε,x = 0.9, all three choices (whether based onautomated variants or not) produce oversized tests, and so indicate spurious heteroscedastic cointegration,although the degree of oversizing is not particularly serious and is mostly ameliorated as the sample sizeincreases. On the basis of these results then, specifically those pertaining to bSnc, we would conclude thatsetting k = [0.75T 1/2] is realistically too low to maintain reliable finite sample size. Notice that the non-automated KPSS cointegration test, Kc, is quite badly oversized when φε,y = 0.9 and φε,y = φε,x = 0.9, andautomating the lag choice struggles to correct this to a satisfactory degree. Interestingly, the automatedKc test can be badly oversized in the presence of negative autocorrelation, unless the sample size is large.None of the other tests, however, appear to be adversely affected by negative autocorrelation.Table 2 examines the size and power of the tests under six different models of heteroscedastic cointe-

gration, H01 . In the first four, both yt and xt are HI (d2 6= 0 and d3 6= 0); in the fifth yt is HI (d2 6= 0),

xt is I(1) (d3 = 0), with these roles being reversed in the sixth model. The size issue relates to bSnc and itis clear that the test does not appear particularly sensitive to k, with size being controlled reasonably wellfor all choices, across all model specifications. If anything, setting k = [0.75T 1/2] sometimes leads to slight

10 In the context of stationarity testing, this has been demonstrated by Hobijn, Franses and Ooms (1998) to remove muchof the well-documented oversizing problems associated with KPSS tests.

7

oversizing; setting k = [1.25T 1/2] occasionally yields slight undersizing. When considering the power of bShc,both fixed and automated variants exhibit consistency. The power does not appear to change particularlydramatically across model specifications either. Power does tend to decrease monotonically as k increases,although the rate of decrease is fairly low. The test bShi is also seen to be consistent (aside obviously fromwhen yt is I(1)). The behaviour of the Kc test is much less predictable, however. This is because, asmentioned in Section 3, the distribution of Kc in the HI case depends on nuisance parameters. This testcan have very low or reasonably high power to reject its null of stationary cointegration, depending onthe nature of the heteroscedastic cointegration. For example, if xt is I(1) as in the fifth case, its power istrivial. If, on the other hand, if xt is HI and νxt is persistent, as in the second or sixth cases, it can rejectstationary cointegration very frequently. This differing behaviour is due to the inconsistency of the OLSestimator of β (= 1) whenever xt is HI.11

In Table 3, we examine the power of the tests under the case of no cointegration, H1, here betweentwo I(1) series (bShi is not included now). Consistency of bSnc is clearly evident, as is the role of k indetermining its power. The power is seen to fall fairly rapidly with increasing k for both fixed andautomated variants.12 Notice also that power of bSnc often exceeds that of Kc. There is no contradictionhere, however: the optimality properties associated with the raw form of the KPSS statistic, on which Kc

is based, do not necessarily carry over to the current empirical version of the statistic, which needs to berobustified both to serial correlation and endogeneity. It is also apparent that the power of Kc drops quitesharply when moving from the fixed to automated lag selection.In unreported simulations, we also examined the properties of the tests when some endogeneity is

introduced. The first case revisited H00 , stationary cointegration between two I(1) series, where we set

cor( 1, 5) = −0.7; cor( 2, 5) = 0.7; such that the increment processes of εyt and εxt are correlated withthat of the random walk w1t. The sizes of bSnc and bShc were largely unaffected by introducing suchcorrelation. A second case revisited H0

1 , heteroscedastic cointegration between two HI series. Here wemade w1t and h1t identical random walks, so that the I(1) process driving part of the heteroscedasticityalso drove the level of the processes. In addition, we set cor( 4, 8) = 0.7, such that the increment processof νxt was correlated with that of the random walk h2t it multiplies into. Again, the size of bSnc remainedreasonably accurate, and consistency of bShc (and bShi) appeared unaffected. Full details of these simulationsare available upon request.All the above simulation results concerning bSnc, bShc and bShi are pretty much in line with what we would

expect given our theoretical results of Section 3 regarding asymptotic normality of the tests, their robustnessto serial correlation and endogeneity, and their consistency. They all detect the appropriate departuresfrom their respective null hypotheses. The choice of k remains an issue, however. Predominantly led by thebehaviour of bSnc, the facts are that setting k too low can, in certain situations, induce size distortions (c.f.Table 1), while setting k too high leads to a loss of power (c.f. Table 3). Moreover, it seems rather unlikelysuch a trade-off can be entirely avoided however k is chosen. As such, a reasonable compromise wouldappear to be the middle value of the three we have considered, and so we recommend setting k = [T 1/2] asa matter of practice. Whether l is selected using a fixed or automated method does not appear particularlycrucial to our tests performance and we would not favour one approach over the other.Our results also highlight the problems of using OLS-based procedures for testing for cointegration,

such as Kc. Because of the inconsistency of the OLS estimator whenever the heteroscedastic cointegrationinvolves xt that is HI, causing the test to reject, this essentially means that it is unable to discern betweenthis situation (i.e. when series “differ” by a heteroscedastic, but stochastically trendless term) and non-cointegration (i.e. when series “differ” by a stochastic trend term). Of course, we may take the view thatsince neither situation represents a stationary cointegrating relation, a rejection of the null of stationarycointegration is the appropriate outcome. However, if the heteroscedastic cointegration involves xt that isI(1), the same test tends to no longer reject this null, which clearly cannot also represent an appropriateoutcome. This of course means that the inference drawn can become crucially dependent on the orderingof the I(1) and HI variables, even asymptotically. Such considerations do not apply to our new tests astheir asymptotic distributions are free of nuisance parameters. It is also important to remember that whenapplying bSnc and bShc, we never actually need to distinguish between which series are I(1) and HI. That

11Busetti and Taylor (2003) demonstrate that the KPSS tests applied to an individual series with heteroscedastic errorscan over-reject the null of stationarity. In the current context, the cointegrating KPSS statistic actually diverges, due to theinconsistency of the OLS estimator when xt is HI.12These observations also apply to Shc, though it has rather less power than Snc since it is not constructed to detect this

alternative.

8

https://www.researchgate.net/publication/4723894_Variance_Shifts_Structural_Breaks_and_Stationarity_Tests?el=1_x_8&enrichId=rgreq-a4fe5eb3-1022-4fc0-93ed-83b464fb0e3a&enrichSource=Y292ZXJQYWdlOzIzNTY0OTkxO0FTOjI4ODc2MDYwNTQ5NTI5OEAxNDQ1ODU3MzAxNDI0

is, we never actually need to calculate the test bShi for individual series. In this respect, the only rationalefor calculating bShi is that it may provide early warning of situations where it would be unwise to applyconventional cointegration tests.

5 An Empirical Example: The Term Structure of Interest RatesA necessary empirical condition for the expectations theory of the term structure of interest rates is thatlong run and short run interest rates cointegrate. We test this empirically using monthly data from theUnited States, Canada, United Kingdom and Japan, taken from the OECD/MEI database. A single longrun interest rate, Lt, and a variety of short run rates, Sit, are used for each country, and we considerbivariate regressions of Lt on Sit, and also the re-ordered regression of Sit on Lt.13 We calculate the samearray of statistics as in Section 4, where again the regressions include constants but not trends. We alsocalculate the standard KPSS stationarity test allowing a constant (denoted Ks), to test individual seriesfor stationarity. Both fixed and automatic lag selection procedures are employed for all tests and, in viewof the results of the previous section, we set k = [T 1/2].The results are given in Table 4, where the entries are p-values of the tests based on the asymptotic

distribution. Bold print indicates a p-value of 0.05 or less and in the current context we will consider thisto represent a rejection of the associated null hypothesis. As regards the individual series, we first notethat the KPSS test, Ks, indicates rejection of I(0) for every one of the seventeen individual interest rateseries considered. Further to this, the Shi test shows that all of these interest rate series appears to be HIrather than I(1).14

Turning now to the bivariate regression results, first Lt on Sit, we see that according to bSnc, stochasticcointegration is not rejected for eight of the thirteen pairs. In both Canada and the United Kingdom, thenon-rejection is unambiguous. In the case of the United States the evidence is mixed; rejections are foundfor two of the four pairs considered. No evidence of stochastic cointegration at all is found for Japan,though the peculiar nature of Japanese short run interest rates in recent times (being effectively zero) maypartly explain this finding. According to the bShc test of the eight pairwise regressions that do not rejectstochastic cointegration, five represent stationary cointegration between HI series (three for Canada, twofor the United Kingdom) and three represent heteroscedastic cointegration between HI series (two for theUnited States, one for the United Kingdom). This pattern of results is the same whether the lag selectionis fixed or automated. When we consider the regressions of Sit on Lt, qualitatively, the results for Canada,United Kingdom and Japan are unchanged. The United States now shows two less rejections of stochasticcointegration, with one of the four being stationary cointegration, one being heteroscedastic cointegrationand two indeterminate. This makes the total of non rejections now ten out of the thirteen pairs. Thus,there is certainly a reasonable degree of support for the term structure of interest rates in these data(particularly so if the somewhat anomalous case of Japan is excluded from consideration).A rather less coherent picture emerges if we examine the outcomes from the KPSS cointegration test,

Kc. For regressions of Lt on Sit, conventional cointegration is rejected for every one of the thirteen pairsof long and short run rates if a fixed lag selection is used (this drops to four rejections if lag selection isautomated though we have seen that the power of the automated lag test can be a good deal lower thanthat of the fixed lag test). In contrast, no rejections whatsoever are obtained for regressions of Sit on Lt.Hence, the differing volatilities of long and short run interest rates appear to have a huge influence on theoutcomes for conventional OLS-based cointegration tests, to the extent that contradictory inference arises,dependent on variable ordering. In contrast, the new procedures we suggest here are designed to yieldinference that is rather more robust in this kind of setting.

6 ReferencesBernard, A.B. and S.N. Durlauf (1996) Interpreting tests of the convergence hypothesis. Journal of Econo-metrics, 71, 161—73.

Beveridge, S. and C. R. Nelson (1981) A New Approach to Decomposition of Economic Time Series intoPermanent and Transitory Components with Particular Attention to

13See the note to Table 4 for a full description of the data.14 It is easily shown that the KPSS stationarity test is consistent when the alternative is HI.

9

Measurement of the Business Cycle. Journal of Monetary Economics 7 (1981), 151—174.Busetti, F. and A.M.R. Taylor (2003) Variance shifts, structural breaks and stationarity tests. Journal ofBusiness and Economic Statistics, 21, 510-31.

Campbell, J.Y. and R.J. Shiller (1987) Cointegration and tests of present value models. Journal of PoliticalEconomy, 95, 1062-88.

Engle, R.F. and C.W.J. Granger (1987) Cointegration and error correction: representation, estimation andtesting. Econometrica, 55, 251—76.

Granger, C.W.J. and N.R. Swanson (1997) An introduction to stochastic unit root processes. Journal ofEconometrics, 80, 35-62.

Hansen, B.E. (1992a) Heteroscedastic cointegration. Journal of Econometrics, 54, 139-58.Hansen, B.E. (1992b) Convergence to stochastic integrals for dependent heterogeneous processes. Econo-metric Theory, 8, 489-500.

Harris, D., B.P.M. McCabe and S.J. Leybourne (2002) Stochastic cointegration: estimation and inference.Journal of Econometrics, 111, 2, 363-84.

Harris, D., B.P.M. McCabe and S.J. Leybourne (2003) Some limit theory for infinite order autocovariances.Econometric Theory, 19, 829-64.

Hobijn, B., P.H. Franses and M. Ooms (1998) Generalizations of the KPSS-test for stationarity. Econo-metric Institute working paper, Erasmus University, Rotterdam.

Kwiatkowski, D., P.C.B. Phillips, P. Schmidt and Y. Shin (1992) Testing the null of stationarity againstthe alternative of a unit root: how sure are we that economic time series have a unit root ? Journal ofEconometrics, 54, 159-78.

Leybourne, S.J., B.P.M. McCabe and A.R. Tremayne (1996) Can economic time series be differenced tostationarity ? Journal of Business and Economic Statistics, 14, 435—46.

McCabe, B.P.M. and S.J. Leybourne (2000) A general method of testing for random parameter variation instatistical models in Innovations in Multivariate Statistical Analysis: a Festschrift for Heinz Neudecker,eds. Heijmans, R.D.H, D.S.G. Pollock and A. Satorra, 75-85, Kluwer.

Newey, W.K. and K.D. West (1994) Automatic lag selection in covariance estimation. Review of EconomicStudies, 61, 631-54.

Phillips, P.C.B. and V. Solo (1992) Asymptotics for linear processes. Annals of Statistics 20, 971—1001.Psaradakis, Z., M. Sola and F. Spagnolo (2001) A simple procedure for detecting periodically collapsingrational bubbles. Economics Letters, 72, 317-23.

Saikkonen, P. (1991) Asymptotically efficient estimation of cointegration regressions. Econometric Theory,7, 1-21.

Shin, Y. (1994) A residual based test of the null of cointegration against the alternative of no cointegration.Econometric Theory, 10, 91-115.

Wu, J-L. and S-L. Chen (1997) Can nominal exchange rates be differenced to stationarity ? EconomicsLetters, 55, 397-402.

10

7 Proofs

7.1 Notation and Conventions

In what follows we assume that Assumptions LP and KN, the model (3) and k = O¡T 1/2

¢hold. For

the model specified by equations (1), (2) and (3), with ζt = [υ0t, vec(Vt)0,η0t, ε

0t]0, let C =

P∞j=0Cj , and

Gj=P∞

i=j∨0 (Ci−j ⊗Ci) and define covariance matrices Ω11= CC0 and Ω22=

P∞j=−∞GjG

0j . Also define

St to be the partial sum of the ζt i.e. ∆St = ζt. Selector matrices Rυ,Rν ,Rη and Rε are definedimplicitly such that υt = R0

υζt, νt = R0νζt, ηt = R0

ηζt and εt = R0εζt. When taking expectations

through an infinite summation sign, we generally do not remark on the operation when obviously squaresummable linear processes are involved.For transparency, we analyse the regression model without a time trend included, though all our results

can be shown to extend to the trend case. We also make repeated use of the following representations

ut = ut − (bk − b)0 Xt, (11)

utut−k = utut−k − (bk − b)0Xtut−k − utX0t−k(bk − b) + (bk − b)

0XtX

0t−k(bk − b)

= utut−k + zk,t (12)

with Xt = [1,x0t]0, bk − b = [(αk − α) , (βk − β)

0]0 and where zk,t is defined implicitly.

When dealing with LRV terms it is convenient to utilize the following results. First, in manipulatingexpressions involving kernels we adopt the notation λ+(j/l) = 2λ(j/l), j > 0, λ+(0) = 1. Next, for anysequences at and bt define

γj(a, b) = T−1TX

s=j+1

asbs−j

and we use the convention that γj(a) = γj(a, a). Then for the sequence at + bt we have

γj(a+ b) = γj(a) + γj(a, b) + γj(b, a) + γj(b).

Also define for any sequences at and bt

ω(a, b) =lX

j=0

λ+(j/l)γj(a, b)

again with the convention that ω2(a) = ω(a, a). So, we have for the sequence at + bt,

ω2(a+ b) = ω2(a) + ω(a, b) + ω(b, a) + ω2(b).

Thus, for δ > 0 we can write¯T−δω2(a+ b)− ω2(a)

¯≤¯T−δω(a, b)

¯+¯T−δω(b, a)

¯+ T−δω2(b). (13)

Note too that ¯T−δω(a, b)

¯≤

lXj=0

λ+(j/l).

vuutT−(δ+1)TXt=1

a2t .

vuutT−(δ+1)TXt=1

b2t (14)

with the obvious modification for a = b.In our applications at is often a product sequence, at = ctct−k, say. The summation in s starts at

k + j + 1 and in t starts at t = k + 1. Then, (14) yields

¯T−δω(a, b)

¯≤

lXj=0

λ+(j/l).

vuutT−(δ+1)TX

t=k+1

(ctct−k)2.

vuutT−(δ+1)TXt=1

b2t

≤lX

j=0

λ+(j/l).

vuutT−(δ+1)TXt=1

c4t .

vuutT−(δ+1)TXt=1

b2t . (15)

11

7.2 Proof of Theorems

We also appeal to the following lemmata in establishing the results of Theorems 1 and 2.

Lemma 1 Under H00 , ω

2 (utut−k)p→ ω2e1 where

ω2e1 = limT→∞

T−1 var(TX

t=k+1

etet−k) = c0R0

εΩ22Rεc.

Proof. In this case ut = et and utut−k = etet−k + zk,t. Setting δ = 0, at = etet−k and bt = zk,t wehave that

¯ω2 (utut−k)− ω2 (etet−k)

¯is bounded by (13). The first term in (13) is bounded by (15). That

is

|ω(etet−k, zk,t)| ≤lX

j=0

λ+ (j/l) .

vuutT−1TXt=1

e4t .

vuutT−1TX

t=k+1

z2k,t

where the order of the first RHS term is O(l) (Assumption KN.2) and the second term is Op (1), independentof k, by Markov’s inequality and Assumption LP. As for the third term, recalling the expression for zk,t in(12), note that T 1/2DT (bk −b) is Op(1) where DT = diag[1,

√T Im] as follows from HMLa. Thus, in (12),

the quadratic form in (bk−b) is of a lower order than the two linear terms in (bk−b). The linear terms are ofthe same order. So the two dominant terms in T−1

PTt=k+1 z

2k,t are T

−1PTt=k+1(bk−b)

0Xtu2t−kX

0t(bk−b)

and T−1PT

t=k+1(bk − b)0Xt−ku

2tX

0t−k(bk − b). But°°°°°T−1

TXt=k+1

(bk − b)0Xtu2t−kX

0t(bk − b)

°°°°°≤ T−1

°°°T 1/2DT (bk − b)°°°2vuutT−1

TXt=1

°°D−1T XtX 0tD−1T

°°2.vuutT−1

TXt=1

u4t

= T−1.Op(1).Op(1).Op(1)

and it is clear that the second dominant term is of the same order. So,qT−1

PTt=k+1 z

2k,t = Op(T

−1/2).Hence

|ω(etet−k, zk,t)| = Op(lT−1/2).

The same method of proof shows that |ω(zk,t, etet−k)| and ω2(zk,t) are also Op(lT−1/2).

Thus, ¯ω2 (utut−k)− ω2 (etet−k)

¯= Op(lT

−1/2).

Applying Theorem LRV of Harris, McCabe and Leybourne (2003) (HMLb) (with n = 1, α = 2 and µ = 0)then shows that ω2 (etet−k)

p→ ω2e1.

Lemma 2 Under H01 , T

−2ω2 (utut−k) = T−2ω2¡ν0thth

0t−kνt−k

¢+Op

¡lT−1/2

¢.

Proof. Now ut = et + ν0tht and utut−k = utut−k + zk,t. Setting δ = 2, at = utut−k and bt = zk,t wehave that

¯T−2ω2 (utut−k)− ω2 (utut−k)

¯is bounded by (13). The first term in (13) is bounded by (15).

That is ¯T−2ω(utut−k, zk,t)

¯≤

lXj=0

λ+ (j/l) .

vuutT−3TXt=1

u4t .

vuutT−3TX

t=k+1

z2k,t

12

where the first RHS term is O(l) and the second Op(1). The dominant term of T−3PT

t=k+1 z2k,t is°°°°°T−3

TXt=k+1

(bk − b)0Xtu2t−kX

0t(bk − b)

°°°°°≤ T−1

°°°DT (bk − b)°°°2vuutT−1

TXt=1

°°D−1T XtX 0tD−1T

°°2.vuutT−3

TXt=1

u4t

= T−1Op(1).Op(1).Op(1)

where the first two Op(1) results can be shown to hold via a simple modification of the approach of HMLa.Thus

T−3TX

t=k+1

z2k,t = Op(T−1)

and so¯T−2ω(utut−k, zk,t)

¯is bounded by anOp

¡lT−1/2

¢variable. That

¯T−2ω(zk,t, utut−k)

¯and T−2ω2(zk,t)

are also bounded by an Op

¡lT−1/2

¢variable follows similarly. Combining these results gives¯

T−2ω2 (utut−k)− ω2 (utut−k)¯= Op(lT

−1/2).

Since et is of a lower order of magnitude than ν 0tht it follows by similar arguments that¯T−2ω2 (utut−k)− ω2

¡ν0thth

0t−kνt−k

¢¯= Op(lT

−1/2).

Lemma 3 Under H01 , T

−2ω2¡ν0thth

0t−kνt−k

¢ d→R 10(W ⊗W )

0ΩPP (W ⊗W ) where

ΩPP=(Rν ⊗Rν)0Ω22 (Rν ⊗Rν) ,W = R0

υB1

with B1 a Brownian motion process.

Proof. Write

T−2ω2¡ν 0thth

0t−kνt−k

¢=

lXj=0

λ+ (j/l)T−3TX

t=k+j+1

(ht−k ⊗ ht)0 vec¡νtν

0t−k¢vec

¡νt−jν

0t−k−j

¢0(ht−k−j ⊗ ht−j)

=lX

j=0

λ+ (j/l)T−3TX

t=k+j+1

(St−k ⊗ St)0 (Rυ ⊗Rυ) (Rν ⊗Rν)0 vec(ζtζ0t−k) vec(ζt−jζ0t−k−j)0(16)

× (Rν ⊗Rν) (Rυ ⊗Rυ)0(St−k−j ⊗ St−j)

= T−3T−kXt=l+1

(St ⊗ St)0 (Rυ ⊗Rυ)

× (Rν ⊗Rν)0

⎡⎣ lXj=0

λ+ (j/l)Evec(ζtζ0t−j)Evec(ζtζ0t−j)0⎤⎦ (Rν ⊗Rν) (17)

× (Rυ ⊗Rυ)0 (St ⊗ St) + op (1)

d→Z 1

0

(B1 ⊗B1)0 (Rυ ⊗Rυ) (Rν ⊗Rν)

0Ω22 (Rν ⊗Rν) (Rυ ⊗Rυ)0 (B1 ⊗B1)

=

Z 1

0

(W ⊗W )0ΩPP (W ⊗W ) .

The key to the proof lies in replacing vec(ζtζ0t−k) vec(ζt−jζ

0t−k−j)

0 in (16) byEvec(ζtζ0t−j)Evec(ζtζ0t−j)0in (17). This means that the convergence in square brackets is nonstochastic and thus the CMT is sufficient

13

to deduce the asymptotic distribution. Also the quantity in square brackets converges to Ω22 because itcan be shown to be a consistent estimate of the long run variance of vec

¡ζtζ

0t−k¢, which is the definition

of Ω22, i.e. Ω22 = limT→∞ varT−1/2PT

t=k+1 vec¡ζtζ

0t−k¢. Then ΩPP = (Rν ⊗Rν)

0Ω22 (Rν ⊗Rν) by

definition.The validity of replacing vec(ζtζ

0t−k) vec(ζt−jζ

0t−k−j)

0 by the double expectation involves establishingthe following sequence of results (expressed in the scalar case for simplicity). That is,

lXj=0

λ+ (j/l)T−3TX

t=k+j+1

StSt−jSt−kSt−k−jζtζt−jζt−kζt−k−j

=lX

j=0

λ+ (j/l)T−3TX

t=k+j+1

S4t−kζtζt−jζt−kζt−k−j + op (1)

=lX

j=0

λ+ (j/l)T−3TX

t=k+j+1

S4t−kEt−k¡ζtζt−jζt−kζt−k−j

¢+ op (1)

=lX

j=0

λ+ (j/l)T−3T−kXt=j+1

S4t E¡ζtζt−j

¢2 + op (1)

=

⎡⎣ lXj=0

λ+ (j/l) E(ζtζt−j)2⎤⎦T−3 T−k−lX

t=1

S4t + op (1)

d→ ω22

Z 1

0

B41 .

The complete proofs of these steps are available from the authors on request. Notice that penultimateline above shows the virtue of using the expectation device as the CMT then delivers the result in a verystraightforward way.

Lemma 4 Under H00 , ω

2¡u2t − σ2u

¢ p→ ω2e2 where

ω2e2 = limT→∞

T−1 varTXt=1

¡e2t − σ2e

¢

and σ2e = E(e2t ).

The proof is similar to that of Lemma 1 and is thus omitted.

Lemma 5 Let ζt satisfy Assumption LP and let k = O(T 1/2). Then, as T →∞,⎡⎣T−3/2 TXt=k+1

(ht−k ⊗ ht)0 vec(νtν0t−k), T−1h[Ts]−k ⊗ h[Ts]−1, T−1/2[Ts]X

t=k+1

vec(νtν0t−k)

⎤⎦⇒

∙Z 1

0

(W ⊗W )0 dP,W ⊗W,P

¸where W = R0

υB1 and P = (Rν ⊗Rν)0B2 where B1 and B2 are independent Brownian motion processes.

Proof. First rewrite using ∆St = ζt, so that

T−3/2X

(ht−k⊗ht)0 vec¡νtν

0t−k¢

= T−3/2X¡

R0υSt−k⊗R0

υSt−1¢0vec

¡R0νζtζ

0t−kRν

¢+ T−3/2

Xh0t−kνt−kν

0tυt.

The proof proceeds by applying the Beveridge-Nelson decomposition to the first term and showing thatthe second term is asymptotically negligible. We use the notation

ζtζt−k =mk,t+rk,t−∆rk,t

14

where

mk,t =∞Xj=1

Gk,k−j vec¡ξtξ

0t−j¢, rk,t =

∞Xi=−∞

Fk,k−i (L) vec¡ξtξ

0t−i¢,

rk,t =∞Xi=1

Gk,k−i (L) vec¡ξtξ

0t−i¢

and the coefficients are defined by

Gk,r (L) =k−1Xj=r∨0

(Cj−r⊗Cj)Lj , Fk,r (L) =

∞Xj=r∨k

(Cj−r⊗Cj)Lj ,

Gk,k−i (L) =k−2Xj=0

Gk,k−i,jLj , Gk,k−i,j =

k−1Xr=(k−i)∨(j+1)

Cr−k+i⊗Cr.

Apply Theorem BN of HMLb to vec¡ζtζ

0t−k¢to get a martingale approximation, mk,t, a remainder

term rk,t and an over-differenced factor ∆rk,t. The idea is that the martingale term is dominant and thatthe dependence on k is absorbed into its variance. In this way the proof of convergence to a stochasticintegral can be treated by conventional methods of analysis. Thus,

T−3/2X


0t−k¢

= T−3/2X¡

R0υSt−k⊗R0

υSt−1¢0(Rν⊗Rν)

0mk,t + T−3/2X¡

R0υSt−k⊗R0


0 (rk,t−∆rk,t)

+T−3/2X

h0t−kνt−kν0tυt.

We find

T−3/2X¡

R0υSt−k⊗R0


0 rk,tp→ 0,

T−3/2X¡

R0υSt−k⊗R0


0∆rk,tp→ 0,

T−3/2X

h0t−kνt−kν0tυt

p→ 0.

The first result follows directly from Theorem SI of HMLb and the second is established along very similarlines. The last follows by writing

T−3/2X

h0t−kνt−kν0tυt = T−3/2

Xh0t−kat −Et−k(at)+ T−3/2

Xh0t−kEt−k(at)

where at = νt−kν0tυt. The first term can be shown to disappear on exploiting the properties of the

increment process i.e. that Et−kat − Et−k(at) = 0; the second term disappears by applying Theorem3.3 of Hansen (1992b).Thus,

T−3/2X


0t−k¢= T−3/2

X¡R0υSt−k⊗R0


0mk,t + op(1).

Now, since k = o (T ) , it follows from the Theorem FCLT of HMLb that

UT,[Ts] ≡ T−1¡R0υS[Ts]−k⊗R0

υS[Ts]−1¢⇒ R0

υB1⊗R0υB1,

jointly with

NT,[Ts] = T−1/2[Ts]X

vec¡νtν

0t−k¢=MT,[Ts] + op(1)⇒ (Rν⊗Rν)

0B2,

whereMT,[Ts] = T−1/2P[Ts]

mk,t. Thus Theorem SI of HMLb applies and setting BQ ≡ (Rν⊗Rν)0B2 =

P and U ≡ R0υB1 ⊗R0

υB1 =W ⊗W we have thathT−1/2

XT−1

¡R0υSt−k⊗R0


0mk,t,UT,[Ts],MT,[Ts]

i⇒∙Z 1

0

(W ⊗W )0 dP,U, P

¸.

15

7.3 Proof of Theorem 1

Part (i) (null distribution).Sections (a) and (b) below derive the asymptotic null distribution of bSnc under H0

0 and H01 , respectively.

(a) Under H00 , ut = et and from HMLa, (bk − b)0 = [Op

¡T−1/2

¢, Op

¡T−1

¢] and

T−2TX

t=k+1

xtx0t−k, T

−1/2TX

t=k+1

et, T−1

TXt=k+1

xt−ket, T−1

TXt=k+1

xtet−k, T−3/2

TXt=k+1

xt−k

are all Op (1). Consequently, using (12) we find

T−1/2TX

t=k+1

utut−k = T−1/2TX

t=k+1

etet−k +Op

³T−1/2

´.

Since et = c0εt is a linear combination of a vector linear process, it follows from an application of TheoremFCLT of HMLb that

T−1/2TX

t=k+1

etet−kd→ N

¡0, ω2e1

¢where by Lemma 1, ω2 (utut−k)

p→ ω2e1. Thus,

bSnc = T−1/2PT

t=k+1 butbut−kω(butbut−k) d→ N (0, 1) .

(b) Under H01 , ut = et + ν0tht and from a minor modification to the results of HMLa, (bk − b)0 =

[Op (1) , Op

¡T−1/2

¢], T−1

PTt=k+1 ut and T−3/2

PTt=k+1 xtut−k are Op(1). Hence, using (12) we find

T−3/2TX

t=k+1


t=k+1

utut−k +Op

³T−1/2

´.

Now, substituting ut = et + ν0tht, we can write

T−3/2TX

t=k+1


t=k+1

ν0thth0t−kνt−k + T−3/2

TXt=k+1

eth0t−kνt−k

+T−3/2TX

t=k+1

ν 0thtet−k + T−3/2TX

t=k+1

etet−k +Op

³T−1/2

´

= T−3/2TX

t=k+1

ν0thth0t−kνt−k +Op

³T−1/2

´

= T−3/2TX

t=k+1

(ht−k ⊗ ht)0 vec(νtν0t−k) +Op

³T−1/2

´d→Z 1

0

(W ⊗W )0dP

where W = R0υB1 and P = (Rν ⊗ Rν)

0B2 and B1 and B2 are independent Brownian motions withcovariance matrices Ω11 and Ω22. The weak convergence follows from Lemma 5. The covariance matrix ofP is ΩPP=(Rν ⊗Rν)

0Ω22 (Rν ⊗Rν).

Combining the results of Lemmas 2 and 3 shows that

T−2ω2 (utut−k)d→Z 1

0

(W ⊗W )0ΩPP (W ⊗W ) .

16

We now require the distribution of the ratio of T−3/2PT

t=k+1 utut−k to T−1ω (utut−k). As shown inLemma 5 ⎡⎣T−3/2 TX

t=k+1

(ht−k ⊗ ht)0 vec(νtν0t−k), T−1h[Ts]−k ⊗ h[Ts]−1, T−1/2[Ts]X

t=k+1

vec(νtν0t−k)

⎤⎦0

⇒∙Z 1

0

(W ⊗W )0 dP,W ⊗W,P

¸.

Next the CMT, with the above vector as argument and the ratio as the map, applies to conclude that

T−3/2PT

t=k+1 utut−k

T−1ω (utut−k)d→

R 10(W ⊗W )0 dPqR 1

0(W ⊗W )

0ΩPP (W ⊗W )

. (18)

As Z 1

0

(W ⊗W )0 dP ∼ N

µ0,

Z 1

0

(W ⊗W )0ΩPP (W ⊗W )

¶conditional on W , the distribution in (18) is unconditionally N (0, 1).

Part (ii) (consistency).Under H1, ut = et + q

0wt + ν 0tht where q 6= 0. Here, it is easy to show that αk − α = Op

¡T 1/2

¢and

βk − β = Op (1) and, using (12), this implies that utut−k is of the same order in probability as utut−k. Itis then straightforward to deduce that

T−1/2TX

t=k+1

utut−k = Op(T3/2).

Now we require a bound for the order of probability of ω2 (utut−k), which again is the same as the orderof probability of ω2 (utut−k). Setting a = b = utut−k and δ = 2 in (15) yields

l−1T−2ω2 (utut−k) ≤ l−1lX

j=0

λ+ (j/l) .T−3TXt=1

u4t

= O(1).Op(1).

Thus we conclude that ω2 (utut−k) = Op

¡lT 2¢at most. Hence the distribution of

¯ bSnc ¯ diverges at leastas fast as Op(

pT/l).

7.4 Proof of Theorem 2

Part (i) (null distribution).Under H0

0 , ut = et we have αk − α = Op

¡T−1/2

¢and βk − β = Op

¡T−1

¢. Then, it follows from (11)

that

T−3/2TXt=1

t¡u2t − σ2u

¢= T−1/2

TXt=1

(t

T− 12− 1

2T)¡e2t − σ2e

¢+Op(T

−1/2)

where σ2e = E(e2t ). Write

T−1/2TXt=1

(t

T− 12− 1

2T)¡e2t − σ2e

¢=

Z 1

0

(s− 12)dFT (s) + op(1)

here FT (s) is the partial sum process of e2t−σ2e which weakly converges to F (s) by Theorem 3.8 of Phillipsand Solo (1992) .Then, noting by integration by parts that

R 10(s− 1

2)dFT (s) =12FT (1)−

R 10FT (s) ds, we

can use the CMT to deduce,

T−3/2TXt=1

t¡u2t − σ2u

¢ d→Z 1

0

(s− 12)dF (s)

17

where F (s) is a Brownian motion with variance ω2e2, as defined in Lemma 4. Hence,R 10

¡s− 1

2

¢dF (s) is

normally distributed with mean zero and variance

ω2e2

Z 1

0

(s− 12)2ds = 12ω2e2

which shows

T−3/2TXt=1

t¡u2t − σ2u

¢ d→ N(0, 12ω2e2).

From Lemma 4, ω2¡u2t − σ2u

¢ p→ ω2e2 and so the result follows.

Part (ii) (consistency).Under H0

1 , ut = et + ν0tht we have αk − α = Op (1) and βk − β = Op

¡T−1/2

¢. We may write

T−3/2TXt=1

t¡u2t − σ2u

¢= T−1/2

TXt=1

(t

T− 12− 1

2T)u2t .

From (11), ut is of the same order in probability as ut and it is then straightforward to show that

T−1/2TXt=1

(t

T− 12− 1

2T)u2t = Op

³T 3/2

´and hence

T−3/2TXt=1

t¡u2t − σ2u

¢= Op

³T 3/2

´.

In the denominator, ω2¡u2t − σ2u

¢and ω2

¡u2t − σ2u

¢(where σ2u = T−1

PTt=1 u

2t ) are of the same order in

probability . Setting a = b = u2t − σ2u and δ = 2 in (14) yields

l−1T−2ω2¡u2t − σ2u

¢≤ l−1

lXj=0

λ+ (j/l) .T−3TXt=1

¡u2t − σ2u

¢2= O(1).T−3

TXt=1

u4t − (T−2TXt=1

u2t )2.

It is easily shown that both T−3PT

t=1 u4t and T

−2PTt=1 u

2t are Op(1). Hence ω2

¡u2t − σ2u

¢and consequently

ω2¡u2t − σ2u

¢are Op

¡lT 2¢at most. So, the distribution of

¯ bShc ¯ diverges at least as fast as Op(pT/l).

18

Table1.SizeoftheTestsunderStationaryCointegration;d1=d2=d3=0.

k=[0.75T

1/2]

k=[T1/2]

k=[1.25T

1/2]

Shi

Sa hi

Snc

Sa nc

Shc

Sa hc

Snc

Sa nc

Shc

Sa hc

Snc

Sa nc

Shc

Sa hc

Kc

Ka c

T(yt)

φε,y=0.5

200

.049

.046

.038

.047

.052

.057

.037

.044

.056

.059

.030

.033

.061

.061

.056

.056

400

.051

.045

.044

.048

.048

.051

.049

.051

.049

.052

.044

.045

.049

.051

.053

.051

600

.052

.048

.049

.051

.052

.053

.048

.048

.051

.051

.046

.049

.052

.052

.059

.064

φε,y=φε,x=0.5

200

.049

0.49

.044

.060

.053

.060

.040

.048

.054

.059

.038

.047

.066

.071

.063

.064

400

.051

.045

.044

.051

.047

.051

.049

.052

.051

.054

.047

.052

.051

.057

.053

.053

600

.050

.048

.047

.052

.054

.057

.047

.050

.050

.056

.053

.052

.057

.056

.062

.060

φε,y=−0.7

200

.060

.054

.038

.049

.055

.060

.038

.044

.056

.059

.038

.038

.059

.061

.057

.157

400

.049

.052

.050

.056

.052

.054

.045

.045

.051

.052

.046

.050

.052

.055

.052

.063

600

.048

.052

.052

.056

.050

.053

.053

.053

.050

.052

.046

.048

.049

.052

.049

.051

φε,y=φε,x=−0.7

200

.058

.057

.040

.055

.054

.062

.041

.046

.054

.061

.043

.046

.052

.061

.054

.307

400

.048

.051

.045

.052

.052

.054

.046

.050

.052

.051

.044

.047

.051

.052

.048

.190

600

.047

.051

.056

.058

.051

.054

.054

.053

.051

.052

.046

.046

.052

.053

.047

.052

φε,y=0.9

200

.049

.046

.121

.170

.088

.102

.053

.055

.092

.088

.044

.035

.105

.092

.171

.119

400

.048

.046

.130

.137

.085

.089

.065

.056

.087

.076

.054

.039

.094

.075

.143

.085

600

.054

.053

.134

.134

.082

.082

.066

.057

.082

.068

.073

.056

.083

.068

.154

.090

φε,y=φε,x=0.9

200

.048

.046

.138

.193

.100

.115

.054

.055

.098

.098

.044

.034

.116

.100

.238

.150

400

.048

.047

.145

.155

.081

.081

.065

.054

.083

.072

.051

.037

.090

.081

.215

.126

600

.058

.055

.136

.136

.074

.076

.070

.058

.077

.067

.065

.053

.081

.068

.218

.113

Table2.Size/PoweroftheTestsunderHeteroscedasticCointegration.

k=[0.75T

1/2]

k=[T1/2]

k=[1.25T

1/2]

Shi

Sa hi

Snc

Sa nc

Shc

Sa hc

Snc

Sa nc

Shc

Sa hc

Snc

Sa nc

Shc

Sa hc

Kc

Ka c

T(yt)

d1=0;

d2=d3=0.11

/2;φv,y=0.8

200

.461

.424

.022

.037

.400

.442

.028

.026

.412

.418

.026

.026

.388

.385

.123

.107

400

.593

.536

.042

.049

.598

.603

.039

.042

.595

.577

.044

.038

.588

.558

.117

.086

600

.661

.563

.048

.049

.698

.698

.044

.043

.692

.670

.045

.035

.690

.645

.119

.108

d1=0;

d2=d3=0.11

/2;φv,x=0.8

200

.444

.416

.068

.081

.402

.441

.049

.051

.381

.388

.038

.033

.374

.360

.461

.378

400

.578

.527

.055

.061

.570

.575

.051

.046

.557

.547

.053

.047

.547

.514

.601

.478

600

.647

.557

.063

.034

.664

.664

.056

.053

.660

.635

.061

.057

.638

.590

.694

.560

d1=0;

d2=d3=0.11

/2;φv,y=0.6,

φv,x=−0.6;

φε,y=0.6,φε,x=−0.6

200

.452

.409

.033

.043

.460

.491

.030

.033

.447

.456

.029

.031

.432

.427

.083

.083

400

.600

.545

.040

.045

.638

.648

.040

.041

.632

.620

.038

.037

.617

.595

.075

.070

600

.662

.551

.047

.049

.723

.726

.044

.042

.720

.698

.039

.039

.715

.678

.078

.077

d1=0;

d2=d3=0.11

/2;φv,y=−0.6,

φv,x=0.6;

φε,y=−0.6,φε,x=0.6

200

.402

.383

.034

.043

.432

.465

.030

.034

.430

.459

.030

.032

.419

.416

.365

.334

400

.536

.512

.038

.042

.642

.649

.040

.041

.635

628

.040

.040

.620

.596

.457

.383

600

.604

.558

.045

.046

.715

.717

.047

.046

.715

.704

.044

.043

.701

.668

.522

.430

d1=0;

d2=0.11/2;d3=0;

φv,y=0.8

200

.447

.416

.030

.041

.405

.440

.030

.036

.393

.400

.028

.026

.373

.386

.089

.077

400

.585

.535

.040

.044

.526

.525

.049

.047

.512

.500

.039

.035

.501

.477

.090

.073

600

.670

.577

.046

.047

.620

.620

.047

.046

.614

.606

.043

.038

.596

.547

.090

.070

d1=0;

d2=0;

d3=0.11

/2;φv,x=0.8

200

.045

.044

.077

.086

.377

.367

.054

.058

.356

.367

.041

.036

.348

.338

.506

.405

400

.045

.047

.069

.074

.488

.492

.062

.059

.483

.473

.061

.051

.464

.447

.667

.511

600

.050

.052

.062

.063

.600

.587

.060

.057

.588

.568

.063

.055

.574

.533

.739

.566

Table3.PoweroftheTestsunderNoCointegration.

k=[0.75T

1/2]

k=[T1/2]

k=[1.25T

1/2]

Snc

Sa nc

Shc

Sa hc

Snc

Sa nc

Shc

Sa hc

Snc

Sa nc

Shc

Sa hc

Kc

Ka c

Td1=0.2;

d2=d3=0

200

.330

.389

.126

.141

.229

.238

.130

.134

.166

.150

.145

.138

.502

.393

400

.775

.788

.231

.242

.656

.623

.241

.223

.530

.447

.233

.212

.733

.522

600

.931

.932

.319

.319

.877

.853

.324

.286

.774

.700

.336

.265

.836

.538

d1=0.4;

d2=d3=0

200

.543

.614

.186

.228

.360

.363

.192

.194

.247

.204

.203

.183

.547

.391

400

.914

.919

.297

.305

.778

.741

.307

.270

.635

.532

.318

.243

.764

.509

600

.984

.984

.352

.352

.937

.918

.366

.311

.847

.755

.367

.277

.848

.532

d1=0.6;

d2=d3=0

200

.605

.693

.301

.245

.382

.383

.218

.219

.266

.218

.231

.202

.546

.375

400

.935

.938

.308

.319

.809

.762

.321

.274

.652

.549

.334

.252

.767

.514

600

.989

.989

.376

.376

.945

.925

.382

.328

.859

.770

.388

.295

.855

.511

Table4.ApplicationtoBondMarketData;

p-valuesofTests.

Ks

Ka s

Shi

Sa hi

Snc

Sa nc

Shc

Sa hc

Kc

Ka c

Snc

Sa nc

Shc

Sa hc

Kc

Ka c

Lton

Sitregression

Siton

Ltregression

UnitedStates

Lt

.00

.01

.01

.01

S1t

.00

.01

.04

.04

.08

.08

.03

.04

.02

.08

.11

.13

.05

.07

.14

.28

S2t

.00

.02

.04

.05

.05

.05

.04

.05

.00

.03

.12

.13

.06

.07

.13

.28

S3t

.00

.01

.03

.03

.06

.07

.03

.04

.02

.09

.09

.11

.05

.07

.14

.28

S4t

.00

.01

.03

.03

.05

.05

.02

.03

.02

.09

.08

.10

.04

.04

.11

.23

Canada

Lt

.00

.01

.00

.00

S1t

.00

.01

.02

.02

.19

.19

0.52

0.52

.03

.08

.25

.25

.75

.75

.37

.52

S2t

.00

.01

.00

.00

.19

.19

0.25

0.25

.03

.08

.25

.25

.36

.36

.39

.55

S3t

.00

.01

.00

.00

.13

.13

0.19

0.19

.01

.04

.22

.22

.40

.40

.30

.47

UnitedKingdom

Lt

.00

.01

.00

.00

S1t

.00

.01

.00

.00

.32

.34

.08

.09

.01

.05

.36

.37

.13

.14

.42

.61

S2t

.00

.01

.00

.00

.54

.54

.03

.03

.01

.03

.58

.58

.04

.04

.44

.61

S3t

.00

.01

.00

.00

.32

.33

.15

.16

.01

.06

.36

.37

.21

.22

.44

.64

Japan

Lt

.00

.01

.03

.03

S1t

.00

.02

.00

.00

.02

.02

.44

.44

.03

.12

.02

.02

.16

.15

.12

.22

S2t

.00

.01

.02

.02

.03

.02

.41

.41

.04

.10

.02

.02

.12

.12

.10

.19

S3t

.00

.01

.01

.00

.02

.02

.27

.26

.04

.11

.03

.02

.27

.31

.11

.20

Note.ThedatausedinTable2aredefinedasfollows.

UnitedStates(1978:1-2002:12):Lt=Governmentcompositebondyield(>10years);S1t=Federal

fundsrate;S2t=Primerate;S3t=Rateoncertificatesofdeposit;S4t=USDollarinLondon,3month

depositrate.

Canada(1982:6-2002:12):Lt=Benchmarkbondyield(10years);S1t=Officialdiscountrate;S2t=

Overnightmoneymarketrate;S3t=Rateon90daydeposits;

UnitedKingdom

(1978:1-2002:12):Lt=Yieldon10yearGovernmentbonds;S1t=Londonclearing

banksrate;S2t=Overnightinterbankrate;S3t=Rateon3monthinterbankloans.

Japan(1989:1-2002:12):Lt=YieldoninterestbearingGovernmentbonds(10years);S1t=Official

discountrate;S2t=Uncollateralizedovernightrate;S3t=Rateon90daycertificatesofdeposit.

Documents

A RESIDUAL-BASED TEST FOR STOCHASTIC COINTEGRATION