Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Department of Economics
Working Paper
System-Equation ADL Test for Threshold Cointegration
with an Application to the Term Structure of Interest Rates
Jing Li
Miami University
2013
Working Paper # - 2013-05
System-Equation ADL Test For Threshold Cointegration With an
Application to the Term Structure Of Interest Rates
Jing Li∗
Miami University
Abstract
This paper proposes a new system-equation test for threshold cointegration based
on a threshold vector autoregressive distributed lag (ADL) model. The new test can be
applied when the cointegrating vector is unknown and when weak exogeneity fails. The
asymptotic null distribution of the new test is derived, critical values are tabulated,
and finite-sample properties are examined. In particular, the new test is shown to have
good size, so the bootstrap is not required. The new test is illustrated using the long
term and short term interest rates. We show that the system-equation model can shed
light on both asymmetric adjustment speeds and asymmetric adjustment roles. The
latter is unavailable in the Enders-Siklos single-equation testing strategy.
Keywords:
Autoregressive Distributed Lag Model; Simultaneous Equation Model; Asymmetry;
Weak Exogeneity
JEL Classification: C22, C12, C13
∗Jing Li, Department of Economics, Miami University, Oxford, OH 45056, USA. Phone: 001.513.529.4393,Fax: 001.513.529.6992, Email: [email protected].
1
Introduction
Some economic theories imply that variables are related only within certain regime. For
instance, according to the modified law of one price, spatial price linkage exists only in the
regime where price differential exceeds transaction cost. Outside that regime the arbitrage
becomes non-profitable, so the price linkage breaks. Empirical findings are provided in
Giovanini (1988), Taylor and Peel (2000), Sephton (2003) and Jang et al. (2007), among
others. This type of asymmetric adjustment toward equilibrium is the key idea of the
threshold cointegration introduced by Balke and Fomby (1997). In practice the interest
is often on testing threshold cointegration, namely, the existence of long run relationship
with regime-dependent error-correcting speeds.
This paper makes contribution to the literature by proposing a new system-equation test
for the threshold cointegration based on the autoregressive distributed lag (ADL) model.
The new test improves existing tests as follows. Enders and Siklos (2001) (ES) first propose
an Engle-Granger type residual based single-equation test for the threshold cointegration.
The ES test can capture only asymmetric adjustment speeds. In contrast, the new ADL test
is based on a system-equation model, so can accommodate not only asymmetric speeds, but
also asymmetric roles. By asymmetric roles, we mean some variables have error-correcting
speeds significantly different from other variables. The asymmetric roles are illustrated in
the application of the new test.
Next Seo (2006) develops a system-equation test using a generalized error correction
model (ECM). The ECM test requires predetermined cointegrating vectors. But in practice
the cointegrating vector can be misspecified. To make it worse, many economic theories
are not numerically specific, resulting in unknown cointegrating vectors. That means the
application of the ECM test can be limited. The new ADL test is not as restrictive, and can
be applied even when the cointegrating vector is unknown. This improvement is attributed
to the fact that in the ADL model the lagged regressand, rather than the error correction
2
term, is used as the regressor. Using the lagged regressand has another advantage. The ADL
test is shown to be insensitive to a poorly-estimated or misspecified cointegrating vector.
The ADL test conducts a more efficient grid search for the threshold value than the
ECM test. Both tests use the error correction term as the threshold variable, which is
nonstationary under the null hypothesis of no cointegration. However, the ECM test searches
the unknown threshold parameter over a range of fixed values. Because the probability of
a nonstationary series being less than a fixed value approaches zero asymptotically, the
threshold parameter is absent in the limiting distribution of the ECM test. This implies
that the information obtained in the grid search is asymptotically unexploited. In contrast,
the ADL test treats a percentile of the empirical distribution of the error correction term
as the threshold parameter. As a result, the grid search of the ADL test is over a range of
percentiles, rather than a range of fixed values. By doing so, the searching result is fully
utilized regardless of the sample size. In the end the limiting distribution of the ADL test can
be expressed as functionals of the two-parameter Brownian motion; the threshold parameter
is one of the two parameters.
Another improvement over the ECM test is that the ADL test allows for a nonzero
deterministic component in the series. By adding the trend in the testing regression, the ADL
test is able to distinguish difference stationarity from trend stationarity. This generalization
is important given that many economic series are trending. Following Sims et al. (1990), we
develop the asymptotic theory based on the canonical form of the trend-augmented model.
A single-equation ADL test is considered in Li and Lee (2010). The single-equation test
is easy to compute. However, the validity of that test hinges on the assumption of weak
exogeneity, i.e., only one variable is error-correcting; others are not. See Boswijk (1994)
for more discussion about weak exogeneity. Weak exogeneity becomes questionable when
a system is of high dimension. By assuming away weak exogeneity the new test is less
restrictive than the single-equation ADL test.
3
Finally, the new test extends the linear ADL cointegration test of Banerjee et al. (1986).
The linear test assumes regime-invariant error-correcting speeds. Early works such as Pip-
penger and Goering (1993) have shown the linear test is prone to losing power in the presence
of regime switching. The power loss does not occur to the new test because it explicitly ac-
counts for asymmetric adjustment speeds.
Testing for the threshold cointegration is nonstandard because the threshold parameter
cannot be identified under the null hypothesis. Following Hansen (1996) we adopt the sup
Wald statistic to address the so called Davies’ problem, c.f., Davies (1977, 1987). The
limiting null distribution of the new test is derived as functionals of the two-parameter
Brownian motion. The new distribution extends Horvath and Watson (1995) to threshold
processes.
We conduct Monte Carlo experiments to compare the finite-sample performance. Re-
markably, the simulation shows that the size distortion of the new test is much smaller than
the ECM test. Thus the bootstrap is not required for the new test, a good message for
practitioners. The simulation also shows misspecification in the error correction term can
lead to deterioration in the power of the ECM test, but not the ADL test. We provide an ap-
plication of the new test, in which the relationship of U.S. short term and long term interest
rates is investigated. The focal point is the asymmetry in the process of error correction.
Throughout the paper 1( .) denotes the indicator function, | .| the Euclidean norm, [x]
the integer part of x, ⇒ the weak convergence with respect to the uniform metric on [0, 1]2.
All mathematical proofs are in the appendix.
Threshold Vector ADL Model
Consider a two-regime threshold vector autoregressive distributed lag model (TVADLM)
α(L)∆yt = (c11 + c12t+ δ1yt−1)11t−1 + (c21 + c22t+ δ2yt−1)12t−1 + et, (1)
4
where yt = (y1t, . . . , ynt)′ is an n × 1 data vector from a sample of size T, ∆yt is the first
difference, et is an i.i.d n × 1 vector of error terms, and δ1 and δ2 are n × n matrixes.
Intercepts and trends are included in (1), so the model can be applied to trending series. The
deterministic terms are regime-dependent in order to account for the possible discontinuity at
the threshold parameter. The short-run dynamics is represented by α(L) ≡ In−∑p
j=1 αjLj,
a p-th order matrix polynomial in the lag operator L. In practice choosing p can be guided
by information criteria of AIC and BIC.
Economic factors such as transaction cost and policy intervention can lead to threshold
behavior (regime-switching) in an economic system. Accordingly, two indicators are defined
as
11t−1 ≡ 1(zt−1 < τ), 12t−1 ≡ 1(zt−1 ≥ τ). (2)
In words, in regime one the value of the lagged threshold variable, zt−1, is less than the
threshold parameter τ, whereas greater than or equal to τ in regime two. Alternative specifi-
cations are possible. For example, zt−1 can be replaced with zt−d, where the delay parameter
d can be obtained by the grid search. A three-regime model can be specified with three
indicators: 11t−1 = 1(zt−1 < τ1), 12t−1 = 1(τ2 ≤ zt−1 ≤ τ2) and 13t−1 = 1(zt−1 > τ2). Testing
the number of regimes is an unsolved issue. This paper focuses on the two-regime model (1)
for ease of exposition. To develop the asymptotic theory, we assume
Assumption 1 (a) zt−1 has a marginal uniform distribution, zt−1 ∼ U(0, 1). (b) et is i.i.d,
Eet = 0, Eete′t = Ω, and E|et|4 < ∞. (c) All roots of |α(L)| = 0 are outside the unit circle.
Assumption 1-(c) is typically imposed in the literature. Assumption 1-(a) and (b) are made
so that the following weak convergence holds. Under Assumption 1-(a) and (b), as T → ∞,
T−1/2
[Tr]∑t=1
11t−1et ⇒ PW (r, τ), (3)
5
where W (r, τ) denotes the multivariate version of the two-parameter Brownian motion
(TPBM) introduced by Caner and Hansen (2001). When τ = 1, W (r, 1) becomes the
standard multivariate one-parameter Brownian motion. Matrix P is the Cholesky factor of
Ω, i.e., Ω = PP ′. Basically (3) is a simple extension of Theorem 1 of Caner and Hansen
(2001) to the multivariate case. Because the TPBM depends on the threshold parameter
τ, searching that parameter is relevant for our asymptotic theory, which is not the case for
the ECM test. This difference is due to the choice of the searching range for the threshold
parameter. To see this, let γ denote the n× 1 cointegrating vector and
xt−1 ≡ γ′yt−1 (4)
denote the lagged error correction term. The literature usually lets the regime-switching
be governed by xt−1 since it measures the deviation from the equilibrium. The ECM test
grid searches the threshold value over a range of fixed values, despite the fact that xt−1
is integrated of order one or nonstationary under the null hypothesis of no cointegration.
The outcome is undesirable: the searching result is lost asymptotically and the threshold
parameter vanishes in the limiting distribution, see Theorem 2 of Seo (2006). Furthermore,
severe size distortion is reported for the ECM test, and the bootstrap is recommended as a
remedy.
In light of Assumption 1-(a) the ADL test defines the threshold variable as
zt−1 ≡ cdf(xt−1) (5)
where cdf( .) denotes the empirical distribution function1. By the property of the empir-
ical distribution function, cdf(xt−1) ∼ U(0, 1), so Assumption 1-(a) holds. Since cdf is a
1The empirical distribution function is non-decreasing, and is a step function that jumps for 1/T at eachof the T data points.
6
monotonic transformation, it follows that
1(zt−1 < τ) = 1(cdf(xt−1) < τ) = 1(xt−1 < xt−1(τ)) (6)
where xt−1 is xt−1 sorted in an ascending order, and xt−1(τ) denotes the τth element of
xt−1. In words the threshold parameter τ in this case represents a percentile of the empirical
distribution of the error correction term. The searching range for τ is the parameter space
of percentiles, which is always restricted to lie between 0 and 1. Searching the percentile has
the benefit that the parameter τ always matters, no matter in finite samples or in the limit,
and no matter xt−1 is stationary or not. More discussion about using the percentile as the
threshold value can be found in Li and Lee (2010).
The link between model (1) and the error correction model becomes apparent if we rewrite
δ1 = β1γ′, δ2 = β2γ
′, (7)
where γ is n× 1 cointegrating vector, and β1 and β2 are n× 1 vectors of adjustment speeds
in the two regimes. Here we assume the number of cointegration relationships is at most one
so that γ has column rank of 1. The issue of testing q against q + 1 threshold cointegration
relationships is left to future research. Notice that γ has no subscript in (7). This is because
most economic theories imply regime-invariant long run relationship, which is quantified by
γ. It is β, the error-correcting speed, that varies across regimes in short run. There is also a
technical reason to assume constant γ. There would be two error correction terms with two
gammas. Ambiguity would arise since we are not sure about searching which one of them.
Seo (2006) implicitly imposes regime-invariant γ and one cointegration at maximum.
Factorization (7) generalizes the one used to derive the traditional linear error correction
model. By virtue of (4) and (7), model (1) can be rewritten as a two-regime threshold vector
7
error correction model (TVECM)
α(L)∆yt = (c11 + c12t+ β1xt−1)11t−1 + (c21 + c22t+ β2xt−1)12t−1 + et. (8)
By allowing for β1 = β2, model (8) extends the traditional linear error correction model. In
a similar fashion, model (1) extends the linear ADL model by allowing for δ1 = δ2.
The ECM test of Seo (2006) is concerned with the null hypothesis H0 : β1 = β2 = 0
using a testing regression slightly different from (8). First, the trend term is excluded, so
the ECM test cannot be applied to trending series. Second, the ECM test grid searches τ
over a range of fixed values.
Given that the error correction term xt−1 appears on the right side of (8), the ECM test
requires that the cointegrating vector γ be pre-determined. This limits the application of
the ECM test as γ is unknown in many cases. Some practitioners would apply the ECM test
after estimating γ, but rigorously speaking, the asymptotic theory of Seo (2006) becomes
invalid when γ is estimated. This situation is similar to the Enger-Granger test, which
follows the Dickey Fuller distribution if the cointegrating vector is known, but Phillips-
Ouliaris distribution if the cointegrating vector is estimated, see Phillips and Ouliaris (1990)
for details.
The asymptotic theory for the proposed ADL test is valid no matter γ is known or
estimated. This is partly because the regressor of model (1) involves yt−1, the lagged regres-
sand. The error correction term only has indirect effect on model (1) through the indicator.
Thanks to the indirect role, poor estimation or misspecification of the cointegrating vector
has minimal effect on the ADL test. In contrast, xt−1 appears directly on the right hand size
of (8). As a result, a misspecified cointegrating vector will adversely affect the ECM test
through both the indicator and the regressor. In this regard TVADLM (1) is more robust
to the misspecification than TVECM (8). The subsequent simulation will confirm that.
8
The ES test of Enders and Siklos (2001) is concerned with the null hypothesis H0 : β∗1 =
β∗2 = 0 based on the single regression
α∗(L)∆xt = β∗1xt−111t−1 + β∗
2xt−112t−1 + e∗t . (9)
Regression (9) essentially replaces the regressand ∆y1t (with normalization) in the first equa-
tion of (8) with ∆xt. To justify this replacement the ES test imposes the restriction that the
short-run dynamics of components of yt involves one common factor. Kremers et al. (1992)
call this common factor restriction, in the context of the Engle-Granger testing regression.
The ADL test is based on the system model (1), so is free of the common factor restriction.
It is instructive to highlight that model (1) can shed light on two types of asymmetry.
First, asymmetric adjustment speeds are allowed for by δ1 = δ2. Second, asymmetric adjust-
ment roles can be revealed by the fact that the coefficients of yt−111t−1 and yt−112t−1 in one
regression are different from other regressions. In the extreme case called weak exogeneity,
a variable plays no role of error correction when both coefficients are zero. The univariate
model (9) can illustrate only asymmetric adjustment speeds, not asymmetric adjustment
roles. We will revisit this issue in the application.
System-Equation ADL Test for Threshold Cointegration
This paper proposes a new test for the null hypothesis
H0 : δ1 = δ2 = 0 (10)
based on TVADLM (1), called the system-equation ADL test for threshold cointegration.
Because of (7), testing (10) is equivalent to testing β1 = β2 = 0 in (8). So the ADL test and
ECM test are closely related. Li and Lee (2010) show that testing (10) can be undertaken
9
in a single-equation framework if each of β1 and β2 contains only one non-zero element (i.e.,
only one variable is error-correcting). This condition is called weak exogeneity, which can
be restrictive for a high-dimensional system. The system-equation ADL test is more general
than the single-equation ADL test by assuming away weak exogeneity.
The alternative hypotheses can be
H11 : δ1 = 0, δ2 < 0; H2
1 : δ1 < 0, δ2 = 0; H31 : δ1 < 0, δ2 < 0. (11)
Strictly speaking H11 or H2
1 implies that yt is cointegrated only in one regime. When (10) is
rejected, we conclude that series are cointegrated, but not necessarily in both regimes. The
null hypothesis (10) can also be rejected if yt is linearly cointegrated, i.e., δ1 = δ2 = 0. In this
sense the threshold cointegration test extends the linear cointegration test. Pippenger and
Goering (1993) report that the linear cointegration test can suffer power loss in the presence
of asymmetric adjustment speeds. Here the alternative hypotheses H11 and H2
1 clear show
the asymmetry is allowed for by the threshold cointegration test.
Testing (10) is nonstandard because under (10) the indicators disappear and τ cannot
be identified. Davies (1977, 1987) first raises this issue, called Davies’ problem. We follow
Hansen (1996) and calculate the sup-Wald statistic2 as follows. First, we estimate the cointe-
grating vector γ by OLS if it is unknown, and compute the error correction term (4). Then,
for given τ we define the indicators (2) using (5), fit (1) by OLS, obtain the estimated coeffi-
cients δτ = (δ1(τ), δ2(τ)), the residual et(τ), and the variance matrix Ωτ = T−1∑
et(τ)et(τ)′.
We stack matrices by letting Yt = (yt−111t−1, yt−112t−1), Yτ = [Y1Y2 . . . YT ]′, e = [e1e2 . . . eT ]
′.
Define the annihilation matrix Q ≡ I − Z(Z ′Z)−1Z ′ where Z stacks (∆yt−1, . . . ,∆yt−p, dt).
Similar to the Dickey-Fuller unit root test, the distribution of the ADL test depends on the
specification of the deterministic term dt. There are three cases:
2This paper focuses on the sup test due to its simplicity. See Andrews and Ploberger (1994) for alterna-tives.
10
(Case I, no intercept or trend): c11 = c12 = c21 = c22 = 0 in (1), so dt is null;
(Case II, intercept only): c11 = 0, c21 = 0, c12 = c22 = 0 in (1), so dt = (11t−1, 12t−1);
(Case III, intercept and trend): c11 = 0, c21 = 0, c12 = 0, c22 = 0 in (1), so dt = (11t−1, t11t−1, 12t−1, t12t−1).
Let F (τ), F d(τ) and F t(τ) denote the Wald statistics for cases I, II and III, respectively.
They are given by
F (τ), F d(τ), F t(τ) ≡ [vec(δτ )]′[var(vec(δτ ))
−1][vec(δτ )] (12)
= [vec(δτ )]′[(Y ′
τQYτ )−1 ⊗ Ωτ
]−1
[vec(δτ )] (13)
= [vec(e′QYτ )]′[(Y ′
τQYτ )−1 ⊗ Ω−1
τ
][vec(e′QYτ )] (14)
= trace[Ω−1/2
τ (e′QYτ )(Y′τQYτ )
−1(Y ′τQe)Ω−1/2′
τ
], (15)
where Ω−1/2τ is the Cholesky factor of Ωτ , vec denotes the vec operator and ⊗ the Kronecker
product operator. Note that the definition of the Wald statistics is (12), but in practice we
calculate the Wald statistics using (13). The asymptotic theory is built upon (15). Finally,
we search τ over a range, and the sup Wald test is given by
supτ∈Θ
F (τ), supτ∈Θ
F d(τ), supτ∈Θ
F t(τ), (16)
where the searching range Θ ⊂ [0, 1] is a compact set. Symmetric sets such as Θ = [0.15, 0.85]
are recommended in the literature. Note that some observations are discarded in the grid
search in order to avoid divergent asymptotic distributions.
11
Asymptotic Theory
Let W (r, τ) denote the n-dimensional two-parameter Brownian motion on (r, τ) ∈ [0, 1]2.
For notational economy we write∫ 1
0as∫. By holding τ constant, the integration is over r
for the stochastic integrations such as∫W (r, 1)dW (r, τ)′. Following the literature, we make
an auxiliary assumption for cases I and II.
Assumption 2 Under the null hypothesis (10), yt is generated by (1) with c11 = c12 = c21 =
c22 = 0.
Assumption 2 states that the true process under the null is vector random walk without
drift. The limiting null distributions of the system-equation ADL tests (16) for cases I and
II are
Theorem 1 (Case I) Under (10) and Assumptions 1, 2, as T → ∞,
supτ∈Θ
F (τ) ⇒ supτ∈Θ
trace[J ′0J
−11 J0
], (17)
where
J0 =
∫W (r, 1)dW (r, τ)′∫
W (r, 1)dW (r, 1)′ −∫W (r, 1)dW (r, τ)′
,
J1 =
τ∫W (r, 1)W (r, 1)′dr 0
0 (1− τ)∫W (r, 1)W (r, 1)′dr
.
Theorem 2 (Case II) Under (10) and Assumptions 1, 2, as T → ∞,
supτ∈Θ
F d(τ) ⇒ supτ∈Θ
trace[Jd′
0 (Jd1 )
−1Jd0
], (18)
12
where
Jd0 = J0 −
∫W (r, 1)drW (1, τ)∫
W (r, 1)dr[W (1, 1)−W (1, τ)]
,
Jd1 = J1 −
τ∫W (r, 1)dr
∫W ′(r, 1)dr 0
0 (1− τ)∫W (r, 1)dr
∫W ′(r, 1)dr
.
There are some remarks. First of all, the distributions in (17) and (18) are in terms of
the TPBM, which includes the threshold parameter τ. In contrast Theorem 2 of Seo (2006)
shows that τ is absent in the limiting distribution of the ECM test. Second, the new dis-
tributions depend on n, the dimensional parameter, and Θ, the searching range. Finally,
the distributions are free of nuisance parameters such as the Cholesky factor P and serial
correlation characterized by α(L). Therefore we can tabulate in Table 1 the critical values
of the limiting distributions for various n and Θ. Those critical values are computed as the
empirical quantiles from 5, 000 independent draws from the simulations for the asymptotic
formula in Theorems 1 and 2.
Table 1 shows that the critical value increases as the searching range Θ gets wider. This
is indicative of the divergent behavior of the sup Wald test. The critical value also increases
as the dimensional parameter n rises. Finally, the critical values of supτ∈Θ F (τ) (reported
under column w/o constant in Table 1) are less than those of supτ∈Θ F d(τ) (reported under
column w/ constant). As expected, the intercept term shifts the limiting distribution.
For case III we make the auxiliary assumption that
Assumption 3 Under the null hypothesis (10), yt is generated by (1) with c12 = c22 =
0, c11 = c21 = c0.
Assumption 3 states that under the null hypothesis yt is vector random walk with drift
component of c0. In this case we need to derive the canonical form introduced by Sims et al.
13
(1990) for (1). To see why, notice that under Assumption 3 α(L)∆yt = c0+et. So yt consists
of a stochastic trend and a deterministic trend, and the latter is correlated with the trend
term in the testing regression. Therefore it is necessary to avoid collinearity by removing
the deterministic trend from yt. Toward that end we consider the canonical form given by
α(L)∆yt = (c∗11 + c∗12t+ δ∗1ζt−1)11t−1 + (c∗21 + c∗22t+ δ∗2ζt−1)12t−1 + et. (19)
where ζt−1 = yt−1 − c0(t− 1), δ∗1 = δ1, δ∗2 = δ2, c
∗11 = c11 − c0, c
∗12 = c12 + c0, c
∗21 = c21 − c0,
and c∗22 = c22 + c0. Now the new regressor ζt−1 in (19) contains only the stochastic trend.
The hypothetical model (19) is just the algebra rearrangement of (1), so testing δ1 = δ2 = 0
in (1) is equivalent to testing δ∗1 = δ∗2 = 0 in (19). The computation of the ADL test, denoted
by supτ∈Θ F t(τ) in this case, is still based on (1). The limiting distribution of supτ∈Θ F t(τ)
is derived based on (19), and is given in the following theorem.
Theorem 3 (Case III) Under (10) and Assumptions 1, 3, as T → ∞,
supτ∈Θ
F t(τ) ⇒ supτ∈Θ
trace[J t′
0 (Jt1)
−1Jdt
], (20)
where
J t0 = J0 − C∗
21C−122 C
∗23 − C∗
31C−132 C
∗33,
J t1 = J1 − C∗
21C−122 C
∗′21 − C∗
31C−132 C
∗′31,
C21 =
τ∫W (r, 1)dr τ
∫rW (r, 1)dr
0 0
, C22 =
τ τ∫rdr
τ∫rdr τ
∫r2dr
, C23 =
W (1, τ)∫rdW (r, τ)′
C31 =
0 0
(1− τ)∫W (r, 1)dr (1− τ)
∫rW (r, 1)dr
, C32 =
1− τ (1− τ)∫rdr
(1− τ)∫rdr (1− τ)
∫r2dr
,
14
C33 =
[W (1, 1)−W (1, τ)][∫rdW (r, 1)′ −
∫rdW (r, τ)′
] .
Note that the distribution does not depend on c0, the drift term. The critical values of
the distribution are reported under the column w/trend in Table 1. In practice we can use
supτ∈Θ F t(τ) test to distinguish difference-stationary threshold series from trend-stationary
threshold series. The ECM test cannot do this since it excludes the trend term.
Monte Carlo Experiments
This section carries out the size and power comparisons in finite samples for the proposed
system-equation ADL test, the ECM of Seo (2006) and the ES test of Enders and Siklos
(2001). We first compare the rejection frequency under the null hypothesis (size) at the 5%
nominal level. For tractability, a bivariate data generating process (DGP) is considered:
∆y1t = c1∆y1t−1 + e1t (21)
∆y2t = c2∆y2t−1 + e2t (22)
xt−1 = y1t−1 − γy2t−1 (23)
xt−1 = y1t−1 − γy2t−1 (24) e1t
e2t
∼ iidn(0,Ω), Ω =
1 d1
d1 1
, (25)
where γ = 0.1, and γ is the OLS estimate. The short-run dynamics is controlled by c1 and
c2, and the correlation between the error terms is determined by d1. Their default values are
c1 = 0.1, c2 = 0.5, and d1 = 0.7. When running the simulation, we let one parameter vary
while keeping other parameters at their default values. The null hypothesis (10) is imposed
since neither y1t nor y2t is error-correcting, i.e., the error correction term (23) is absent in
(21) and (22). The number of iteration is 5000, and the sample size is 100. The initial
15
values of y1t and y2t are set as random numbers.
We drop the intercept and include one lag in the testing regressions. The corresponding
critical values at the 5% level are 30.2 for the new ADL test, 7.08 for the ES test (denoted
by Φ∗ in Table 5 of Enders and Siklos (2001)), and 10.968 for the ECM test (denoted by
supW in Table 1 of Seo (2006)). The ADL and ES tests use the estimated error correction
term (24), while the ECM test uses the pre-determined error correction term (23). Panels
A, B, C of Figure 1 plot the size against varying c1, c2 and d1, respectively.
We can summarize Figure 1 as follows. Among the three tests, the size of the ADL
test (marked by circle) is closest to the nominal level 0.05, and is largely invariant to c1,
c2 and d1. This finding verifies that the limiting distribution in Theorem 1 is indeed free of
those nuisance parameters. It also implies that the bootstrap is not needed for the ADL
test. In contrast, a size as large as 0.4 is found for the ECM test (marked by square). Size
distortion of this magnitude is also reported in Table 2 of Seo (2006). Because of the severe
size distortion, the ECM test (marked by triangle) has to be used with the bootstrap. The
size of the ES test is slightly under the nominal level.
Next we compare the rejection frequency under the alternative hypothesis (power). In
light of the size distortion, we focus on the size-adjusted power by using the 5% bootstrap
critical value obtained under the null hypothesis. The new DGP is:
∆y1t = −0.1xt−11(xt−1 < 0) + k1xt−11(xt−1 ≥ 0) + 0.1∆y1t−1 + e1t (26)
∆y2t = 0.5∆y2t−1 + e2t (27)
xt−1 = y1t−1 − (γ + f)y2t−1 (28)
(23) and (25). Now the error correction term (23) enters (26), so y1t is error-correcting and
the null hypothesis is violated. The parameter k1 determines the error-correcting speed in the
second regime. When k1 = −0.1, the system is cointegrated linearly, i.e., the error-correcting
16
speeds become constant.
Because the ADL and ES tests can use either predetermined cointegrating vector or
estimated cointegrating vector, and because the power depends on which cointegrating vector
is used, we need to be careful to make sure the tests are compared on equal footing. So we
denote the ADL and ES tests using the predetermined error correction term (28) by ADLP
(marked by diamond) and ESP (marked by star), and the tests using the estimated error
correction term (24) by ADL and ES. By allowing for f = 0 in (28), the predetermined
error correction term (28) can differ from the true error correction term (23). Then we can
evaluate the effect of misspecified cointegrating vector on the power. Figures 2 plots the
power against k1. In Panel A, f = 0, so the error correction term is correctly pre-specified.
Misspecification is introduced in panel B with f = 0.05, and in Panel C with f = 0.1.
There are two main findings. First, the powers of all tests are minimized when k1 = −0.1.
The powers start to rise as k1 moves toward −0.8. This result makes sense because k1
measures both the degree of asymmetry and strength of error-correction. Second, the powers
of ECM and ESP are sensitive to the misspecification in the cointegrating vector. For
instance, the ECM test has the greatest power in Panel A; but its power becomes the
second lowest in Panel C. In contrast, the power of the ADLP test is almost invariant to
the misspecification. This difference is expected since the misspecified error correction term
has marginal affect on the ADL test only through the indicator. But the same misspecified
error correction term is used as regressor in the ECM and ESP tests, which amplifies the
adversary effect of misspecification.
We need to interpret the power performance of the ECM test with caution. Remember
due to the nature of the Monte Carlo experiment the cointegrating vector (1,−γ) happens
to be known. That is why the ECM test can be applied here. Along with the favorable
assumption that there is no misspecification, then the ECM test is shown to dominate other
tests. In practice the condition of a correctly predetermined cointegrating vector is subject
17
to violation. On the other hand, the power of the proposed ADL test is less than the ECM
test in Panel A. But in panel C their ranking reverses. Overall, the stability of the power
indicates that the ADL test can be a reliable alternative to the ECM test. In terms of
power, the ES test consistently outperforms the ADL test. However, the ES test has the
disadvantage that it captures fewer types of asymmetry than the ADL test, as illustrated by
the next application.
An Application
This section investigates the relationship between U.S. 10-year treasury constant maturity
rate (10ybond, y1t) and the effective federal funds rate (Fedfunds, y2t). The goal is to provide
the latest empirical finding about the asymmetry in the term structure of interest rates. The
monthly data from January 1962 to August 2013 are downloaded from FRED Economic
Data at http://research.stlouisfed.org/fred2/. The sample size is 620.
Panel A of Figure 3 presents the time series plot. Both interest rates appear highly persis-
tent with long swing. With the lag selected by AIC, the augmented Dicky-Fuller t statistics
are -1.97 for Fedfunds and -1.30 for 10ybond. Therefore both series are nonstationary. Panel
A also shows the two series tend to move together over time. Such co-movement is predicted
by economic theories such as the expectation hypothesis. Empirically we are interested in
the long run relationship:
y1t = γ0 + γ1y2t + xt, (29)
where xt is the error correction term. Since γ0 and γ1 are both unknown, this is an example
where the economic theory does not quantify the cointegrating vector. The OLS estimation
of (29) is
xt = y1t − 2.81− 0.68y2t.
So on average the long term rate 10ybond exceeds the short term rate Fedfunds by 2.81
18
when the short term rate equals zero. Panel B of Figure 3 plots the error correction term
xt. The fact that xt shows less persistence or more choppiness than y1t and y2t is suggestive
of cointegration. Actually the Engle-Granger linear cointegration test applied to xt is -3.41,
rejecting the null hypothesis of no cointegration at the 5% level.
The Engle-Granger test assumes constant adjustment speeds. However, intuition tells
us the adjustment speed may be non-constant. More explicitly, the speed may depend on
whether xt, which loosely speaking measures the gap between the two series, is big or small.
Big gap (in absolute value) supposedly leads to fast adjustment.
The key question is, which value of xt will trigger the fast adjustment? Or in other
words, how to define “big” gap? In order to answer that, we grid search the threshold value
τ by minimizing the determinant3 of Ω = T−1∑
ete′t, where et = (e1t, e2t) denotes the OLS
residual of the two-regime threshold vector ADL model:
α1(L)∆y1t = (c1,11 + δ11yt−1)11t−1 + (c1,21 + δ12yt−1)12t−1 + e1t (30)
α2(L)∆y2t = (c2,11 + δ21yt−1)11t−1 + (c2,21 + δ22yt−1)12t−1 + e2t, (31)
where the deterministic term include the intercept. Trend is excluded since Figure 3 shows
no trending pattern. We follow the principle of parsimony and start with one lag (let p = 1)
in α1(L) and α2(L). The model with one lag seems adequate as the Ljung-Box Q test with
one lag finds no serial correlation in e1t and e2t.
The threshold value is estimated as τ = −1.25, and a dash line corresponding to that
value is drawn in Panel B of Figure 3. It seems that below the dash line the error series xt
moves more quickly than above the line. This asymmetry is consistent with our prior belief,
and signifies regime-switching.
Table 2 reports the estimation results for (30) and (31). There are two main findings.
3Minimizing the trace yields similar results.
19
First, asymmetric adjustment speeds are confirmed by the fact that the absolute values of the
coefficients in lower regime (with indicator 11t−1 = 1(zt−1 < −1.25)) are different from those
in upper regime (with indicator 12t−1 = 1(zt−1 ≥ −1.25)). For instance, in equation (31),
the estimated coefficient of y1t−1 is 0.22 and significant at the 10% level in lower regime, but
-0.03 and insignificant in upper regime. To interpret this result, notice that in lower regime
the error correction term is more negative than -1.25. That means the yield curve is likely to
be inverted in lower regime. An inverted yield curve is unusual, so the market reacts quickly.
For instance, in Panel A of Figure 3 we see the Fedfunds tops 10ybond and the yield curve
becomes inverted after the 200-th observation. During that period (from September 1978 to
May 1980) the adjustment speed of Fedfunds is much faster than periods with normal yield
curves.
The second main finding is, the absolute values of some coefficients in (31) are significantly
greater than (30). Take the coefficient for the regressor y2t−111t−1. It is -0.23 and significant
in (31), but -0.03 and insignificant in (30). Recall that the regressands are differenced
Fedfunds in (31) and differenced 10ybond in (30). Hence the second finding indicates that
the two interest rates play asymmetric roles in the adjustment process. In particular, the
short term rate Fedfunds has two big coefficients (0.22 and -0.23), and therefore plays bigger
role than the long term rate 10ybond4. This finding is intuitive. Fedfunds plays bigger
role just because it is a monetary policy instrument frequently manipulated by the Federal
Reserve.
Note that the two main findings are unavailable in the Engle-Granger linear cointegration
model because it pre-excludes asymmetry. The single-equation model used by Enders and
Siklos (2001) can detect only the asymmetric adjustment speeds, but not the asymmetric
roles. This application provides an example showing the system-equation threshold model
is inherently more informative than the single-equation model.
4In (30) the coefficients of y1t−112t−1 and y2t−112t−1 are statistically significant, but close to zero.
20
Finally we test the null hypothesis of no threshold cointegration, i.e., H0 : δ11 = δ12 =
δ21 = δ22 = 0 in (30) and (31), using the proposed system-equation ADL test. The ECM test
of Seo (2006) is inapplicable in this case because γ0 and γ1 in (29) are not pre-determined.
The single-equation ADL test of Li and Lee (2010) is also invalid since both y1t and y2t are
error-correcting and weak exogeneity fails. We let the searching range be Θ = [0.15, 0.85]. By
including the intercept (case II), the ADL test supτ∈Θ F d(τ) equals 128.04, greater than the
5% critical value 42.5. The Enders-Siklos test is 18.62, rejecting no threshold cointegration
as well.
We can summarize the application as follows. First, the long term and short term interest
rates are cointegrated. Second, there is evidence that fast adjustment occurs when the yield
curve is inverted. Finally, the short term rate plays bigger adjustment role than the long
term rate.
Conclusion
This paper develops a new system-equation ADL test for the threshold cointegration. Using
the new test we can resolve some problems found in the existing tests. In particular, the
new test can be used when the cointegrating vector is unknown and when weak exogeneity
fails. The new test can be applied to trending series, and the grid-searching information is
efficiently utilized by the new test irrespective of sample sizes.
The limiting null distribution of the new test can be tabulated since no nuisance param-
eters are involved. Monte Carlo simulations show that the size distortion of the new test
is negligible, so the bootstrap is not needed. The power of the new test is comparable to
the existing tests. We provide an application of the new test, in which the long run equilib-
rium between long term and short term interest rates is found and twofold asymmetries are
detected. Noticeably, statistical evidence is provided for the asymmetric adjustment roles;
such evidence is eluded in the single-equation model.
21
Appendix: Mathematical Proof
The asymptotic theory is based on the following lemma. Suppose et is an i.i.d (n×1) sequence
with mean zero, finite fourth moments, and Eete′t = Ω = PP ′ where P is the Cholesky
factor. Let ut = D(L)et denote a stationary linear vector sequence with∑∞
s=0 s|Dsij| <
∞ for each i, j = 1, . . . , n. Define ξt =∑t
j=1 uj, Λ = (D0 + D1 + . . .)P = D(1)P, and
Λ1 =∑∞
i=1E(1(zt−1 < τ)1(zt−1−i < τ)utu′t−i). Let W (r, τ) denote the n-dimensional two-
parameter Brownian motion; W (r, 1) the n-dimensional one-parameter Brownian motion in
r.
Lemma 1
a : T−1/2
[Tr]∑t=1
1(zt−1 < τ)et ⇒ PW (r, τ)
b : T−1
T∑t=1
ξt−11(zt−1 < τ)e′t ⇒ Λ
[∫W (r, 1)dW (r, τ)′
]P ′
c : T−2
T∑t=1
ξt−1ξ′t−11(zt−1 < τ) ⇒ τΛ
[∫W (r, 1)W (r, 1)′dr
]Λ′
Proof of Lemma 1-a: Let vt be an i.i.d (n× 1) vector with mean zero and unity variance
Evtv′t = I. Notice that 1(zt−1 < τ)vit, i = (1, . . . , n), is a strictly stationary and ergodic
martingale difference process with variance E(1(zt−1 < τ)vit)2 = τ. For any (r, τ), the
martingale difference central limit theorem implies that
T−1/2
[Tr]∑t=1
1(zt−1 < τ)vit →d N(0, rτI),
and the asymptotic covariance kernel is
E
T−1/2
[Tr1]∑t=1
1(zt−1 < τ1)vit
T−1/2
[Tr2]∑t=1
1(zt−1 < τ2)vit
= (r1 ∧ r2)(τ1 ∧ τ2).
22
The proof of the stochastic equicontinuity for each component of T−1/2∑[Tr]
t=1 1(zt−1 < τ)vit
is lengthy, and can be found in Caner and Hansen (2001). Because the sequences of marginal
probability measures are tight, we deduce that the univariate stochastic equicontinuity car-
ries over to the vector, see Lemma A.3 of Phillips and Durlauf (1986). The final result is
deduced after applying the Cramer-Wold device and noticing that et = Pvt.
Proof of Lemma 1-b: We apply the Beveridge-Nelson decomposition to ut and obtain
ut = D(1)et + et−1 − et where et = D(L)et, D(L) =∑∞
j=0 djLj and dj =
∑∞s=j+1 ds. The
stated result is obtained by applying Theorem 2.2 in Kurtz and Protter (1991) and Lemma
1-a above.
Proof of Lemma 1-c: This is a trivial extension of result 3 of Theorem 3 of Caner and
Hansen (2001) to the multivariate case.
Proof of Theorem 1: We prove the result when the testing regression is a TVADLM
with one lag:
∆yt = δ1yt−111t−1 + δ2yt−112t−1 + α1∆yt−1 + et.
The proof can be easily extended to a general TVADLM with p lags. Under the null
hypothesis δ1 = δ2 = 0, it follows that α(L)∆yt = et, α(L) = I − α1L. Define ut =
D(L)et, D(L) = α(L)−1, then yt−1 = ξt−1 in Lemma 1. Let yt−1 = (y′t−111t−1, y′t−112t−1)
′,
wt−1 = (y′t−1,∆y′t−1)′, δ = (δ1, δ2), and θ = (δ, α1). Consider the scaling matrix
Υ = diag(TIn, T1/2In).
23
We have
Υ(θ − θ
)=
[Υ−1
(T∑t=1
wt−1w′t−1
)Υ−1
]−1 [Υ−1
T∑t=1
wt−1et
],
where
Υ−1
(T∑t=1
wt−1w′t−1
)Υ−1 ⇒ B =
B11 B12
B′12 B22
,
B11 =
τΛ[∫
W (r, 1)W (r, 1)′dr]Λ′ 0
0 (1− τ)Λ[∫
W (r, 1)W (r, 1)′dr]Λ′
,
B12 = op(1),
B22 = E∆yt−1∆y′t−1,
and
Υ−1
T∑t=1
wt−1et ⇒ C =
C1
C2
,
C1 =
Λ[∫
W (r, 1)dW (r, τ)′]P ′
Λ[∫
W (r, 1)dW (r, 1)′ −∫W (r, 1)dW (r, τ)′
]P ′
C2 = Op(1).
In order to obtain B11 recall that 11t−1 = 1(zt−1 < τ), 12t−1 = 1 − 1(zt−1 < τ), and
11t−112t−1 = 0. The off-diagonal elements of B11 are zeros because of the orthogonality of
11t−1 and 12t−1. The upper-left term of B11 is due to Lemma 1-c; the lower-right term is
due to the facts that T−2∑T
t=1 yt−1y′t−1 ⇒ Λ
[∫W (r, 1)W (r, 1)′dr
]Λ′, cf. Lemma 3.1.(b) of
Phillips and Durlauf (1986), and
T−2
T∑t=1
yt−1y′t−112t−1 = T−2
T∑t=1
yt−1y′t−1(1−1(zt−1 < τ)) ⇒ (1−τ)Λ
[∫W (r, 1)W (r, 1)′dr
]Λ′.
24
B12 = op(1) since T−3/2∑T
t=1 yt−1∆y′t−1 = T−1/2T−1∑T
t=1 yt−1∆y′t−1 = T−1/2Op(1) = op(1).
B22 is due to the Weak Law of Large Number.
We obtain C1 by Lemma 1-b, and because T−1∑T
t=1 yt−1e′t ⇒ Λ
[∫W (r, 1)dW (r, 1)′
]P ′,
cf. Proposition 18.1.(f) of Hamilton (1994), and
T−1
T∑t=1
yt−112t−1e′t =
T−2
T∑t=1
yt−1(1− 1(zt−1 < τ))e′t ⇒ Λ
[∫W (r, 1)dW (r, 1)′ −
∫W (r, 1)dW (r, τ)′
]P ′.
C2 is due to the Central Limit Theorem.
Because Υ−1(∑T
t=1wt−1w′t−1
)Υ−1 is asymptotically block-diagonal, it follows that
Υ(θ − θ
)⇒
B−111 C1
B−122 C2
.
In particular, B−111 C1 indicates that
TIn(δ1 − δ1) ⇒ τ−1Λ′−1
[∫W (r, 1)W (r, 1)′dr
]−1 [∫W (r, 1)dW (r, τ)′
]P ′,
and
TIn(δ2 − δ2) ⇒
(1− τ)−1Λ′−1
[∫W (r, 1)W (r, 1)′dr
]−1 [∫W (r, 1)dW (r, 1)′ −
∫W (r, 1)dW (r, τ)′
]P ′.
This implies the super-consistency of δ, and the consistency of Ωτ . For given τ, let Yτ and
e be the matrices stacking (yt−111t−1, yt−112t−1) and et, respectively. Define Q as projection
25
onto the orthogonal space of the lagged term ∆yt−1. Write the OLS estimates for (δ1, δ2) as
δτ = (δ1, δ2). Then the Wald statistic for H0 : δ1 = δ2 = 0 is
F (τ) = vec(δτ )]′[(Y ′
τQYτ )−1 ⊗ Ωτ
]−1
[vec(δτ )]
= trace[Ω−1/2
τ (e′QYτ )(Y′τQYτ )
−1(Y ′τQe)Ω−1/2′
τ
].
We can show that
T−1Y ′τQe = T−1Σyt−1e
′t − T−1/2
(T−1Σyt−1∆y′t−1
) (T−1Σ∆yt−1∆y′t−1
)−1 (T−1/2Σ∆yt−1e
′t
)= T−1Σyt−1e
′t + op(1) ⇒ C1,
and
T−2Y ′τQYτ = T−2Σyt−1y
′t−1 − T−1
(T−1Σyt−1∆y′t−1
) (T−1Σ∆yt−1∆y′t−1
)−1 (T−1Σ∆yt−1y
′t−1
)= T−2Σyt−1y
′t−1 + op(1) ⇒ B11
Combining with Ω−1/2τ = P−1 + op(1) we can show
F (τ) = trace[Ω−1/2
τ (e′QYτ )(Y′τQYτ )
−1(Y ′τQe)Ω−1/2′
τ
]= trace
[Ω−1/2
τ C ′1B
−111 C1Ω
−1/2′
τ
]+ op(1)
⇒ trace[J ′0J
−11 J0
],
26
where
J0 =
∫W (r, 1)dW (r, τ)′∫
W (r, 1)dW (r, 1)′ −∫W (r, 1)dW (r, τ)′
J1 =
τ[∫
W (r, 1)W (r, 1)′dr]
0
0 (1− τ)[∫
W (r, 1)W (r, 1)′dr] .
The remaining steps for the convergence of supF follow from the continuous mapping the-
orem. End of Proof.
Proof of Theorem 2: Only a sketch of the proof is provided since it is similar to that
of Theorem 1. The testing regression is a TVADLM with one lag and regime-dependent
intercept terms:
∆yt = (c11 + δ1yt−1)11t−1 + (c21 + δ2yt−1)12t−1 + α1∆yt−1 + et.
Under the null hypothesis δ1 = δ2 = 0 and Assumption 3 c11 = c21 = 0 it follows that yt−1 =
ξt−1 in Lemma 1. For given τ, let Yτ and e be the matrices stacking (yt−111t−1, yt−112t−1) and
et, respectively. Define Qd as projection onto the orthogonal space of (11t−1, 12t−1,∆yt−1).
Because of the orthogonality between 11t−1 and 12t−1, and E∆yt−1 = 0, Qd = I− (Qd1+Qd
2+
Qd3), where Q
d1 is projection onto the space of 11t−1, Q
d2 is projection onto the space of 12t−1,
27
and Qd3 is projection onto the space of ∆yt−1. It follows that
T−1Y ′τQ
de = T−1Y ′τ e− T−1Y ′
τQd1e− T−1Y ′
τQd2e− T−1Y ′
τQd3e
T−1Y ′τ e ⇒ C1
T−1Y ′τQ
d1e ⇒ τ−1
τΛ∫W (r, 1)dr
0
W (1, τ)P ′
T−1Y ′τQ
d2e ⇒ (1− τ)−1
0
(1− τ)Λ∫W (r, 1)dr
[W (1, 1)−W (1, τ)]P ′
T−1Y ′τQ
d3e = op(1).
To summarize, we have
T−1Y ′τQ
de = C1 − Λ
∫W (r, 1)drW (1, τ)∫
W (r, 1)dr[W (1, 1)−W (1, τ)]
P ′ + op(1).
Similarly we can show that
T−2Y ′τQ
dYτ = B11−Λ
τ∫W (r, 1)dr
∫W ′(r, 1)dr 0
0 (1− τ)∫W (r, 1)dr
∫W ′(r, 1)dr
Λ′+op(1).
Finally, it follows that
F d(τ) = trace[Ω−1/2
τ (e′QdYτ )(Y′τQ
dYτ )−1(Y ′
τQde)Ω−1/2′
τ
]⇒ trace
[Jd′
0 (Jd1 )
−1Jd0
],
28
where
Jd0 = J0 −
∫W (r, 1)drW (1, τ)∫
W (r, 1)dr[W (1, 1)−W (1, τ)]
Jd1 = J1 −
τ∫W (r, 1)dr
∫W ′(r, 1)dr 0
0 (1− τ)∫W (r, 1)dr
∫W ′(r, 1)dr
.
End of Proof.
Proof of Theorem 3: Consider the testing regression given by
∆yt = (c11 + c12t+ δ1yt−1)11t−1 + (c21 + c22t+ δ2yt−1)12t−1 + et.
For easy of exposition, the above regression excludes the lagged term ∆yt−1 since it is asymp-
totically irrelevant, as shown by the proofs of Theorems 1 and 2. Under the null hypothe-
sis (10) and Assumption 3, yt = y0 + c0t + ξt where ξt =∑t
j=1 uj, ut = α(L)−1et. Because
yt is collinear with the time trend, we consider the hypothetical regression in the canonical
form
∆yt = (c∗11 + c∗12t+ δ∗1ζt−1)11t−1 + (c∗21 + c∗22t+ δ∗2ζt−1)12t−1 + et,
where ζt−1 = yt−1 − c0(t− 1), δ∗1 = δ1, δ∗2 = δ2, c
∗11 = c11 − c0, c
∗21 = c21 − c0, c
∗12 = c12 + c0,
c∗22 = c22 + c0. For given τ, stacking (ζt−111t−1, ζt−112t−1) as Xτ , (11t−1, t11t−1) as M1, and
(12t−1, t12t−1) as M2. Because 11t−112t−1 = 0, ∀t, the projection onto the orthogonal space of
(11t−1, t11t−1, 12t−1, t12t−1), denoted by Qt, is
Qt = I −M1(M′1M1)
−1M ′1 −M2(M
′2M2)
−1M ′2.
29
We can show that
T−1X ′τQ
te = T−1X ′τ [I −M1(M
′1M1)
−1M ′1 −M2(M
′2M2)
−1M ′2]e,
T−1X ′τe ⇒ C1,
T−1X ′τM1(M
′1M1)
−1M ′1e ⇒ C21C
−122 C23,
T−1X ′τM2(M
′2M2)
−1M ′2e ⇒ C31C
−132 C33,
where
C21 =
τΛ∫W (r, 1)dr τΛ
∫rW (r, 1)dr
0 0
≡ ΛC∗21,
C22 =
τ τ∫rdr
τ∫rdr τ
∫r2dr
,
C23 =
W (1, τ)P ′∫rdW (r, τ)′P ′
≡ C∗23P
′.
C31 =
0 0
(1− τ)Λ∫W (r, 1)dr (1− τ)Λ
∫rW (r, 1)dr
≡ ΛC∗31,
C32 =
1− τ (1− τ)∫rdr
(1− τ)∫rdr (1− τ)
∫r2dr
,
C33 =
[W (1, 1)−W (1, τ)]P ′[∫rdW (r, 1)′ −
∫rdW (r, τ)′
]P ′
≡ C∗33P
′.
To summarize,
T−1X ′τQ
te ⇒ C1 − C21C−122 C23 − C31C
−132 C33.
30
In a similar fashion we can show
T−2X ′τQ
tXτ ⇒ B11 − C21C−122 C
′21 − C31C
−132 C
′31.
Finally,
F t(τ) = trace[Ω−1/2
τ (e′QtXτ )(X′τQ
tXτ )−1(X ′
τQte)Ω−1/2′
τ
]⇒ trace
[J t′
0 (Jt1)
−1Jdt
],
where
J t0 = J0 − C∗
21C−122 C
∗23 − C∗
31C−132 C
∗33
J t1 = J1 − C∗
21C−122 C
∗′21 − C∗
31C−132 C
∗′31.
End of Proof.
31
References
Andrews, D.W.K., and Ploberger, W. (1994), “Optimal tests when a nuisance parameter is
present only under the alternative,” Econometrica, 62, 1383–1414.
Balke, N., and Fomby, T. (1997), “Threshold cointegration,” International Economic Review,
38, 627–645.
Banerjee, A., Dolado, J.J., Hendry, D., and Smith, G.W. (1986), “Exploring equilibrium
relationships in econometrics through static models: Some monte carlo evidence,” Oxford
Bulletin of Economics and Statistics, 48, 253–277.
Boswijk, H. P. (1994), “Testing for an unstable root in conditional and structural error
correction models,” Journal of Econometrics, 63, 37–60.
Caner, M., and Hansen, B. E. (2001), “Threshold autoregression with a unit root,” Econo-
metrica, 69, 1555–1596.
Davies, R. B. (1977), “Hypothesis testing when a nuisance parameter is present only under
the alternative,” Biometrika, 64, 247–254.
(1987), “Hypothesis testing when a nuisance parameter is only identified under the alter-
native,” Biometrika, 74, 33–43.
Enders, W., and Siklos, P. L. (2001), “Cointegration and threshold adjustment,” Journal of
Business & Economic Statistics, 19, 166–176.
Giovanini, A. (1988), “Exchange rates and traded goods prices,” Journal of International
Economics, 24, 45–68.
Hamilton, J. D. (1994), Time Series Analysis, Princeton: Princeton University Press.
32
Hansen, B. E. (1996), “Inference when a nuisance parameter is not identified under the null
hypothesis,” Econometrica, 64, 413–430.
Horvath, M. T., and Watson, M. W. (1995), “Testing for cointegration when some of the
cointegrating vectors are prespecified,” Econometric Theory, 11, 984–1014.
Jang, B-G., Koo, H. K., Liu, H., and Loewenstein, M. (2007), “Liquidity premia and trans-
action costs,” Journal of Finance, 62, 2329–2366.
Kremers, J. J. M., Ericsson, N. R., and Dolado, J. J. (1992), “The power of cointegration
tests,” Oxford Bulletin of Economics and Statistics, 54, 325–348.
Kurtz, T., and Protter, P. (1991), “Weak limit theorems for stochastic integrals and stochas-
tic differential equations,” Annals of Probability, 19, 1035–1070.
Li, J., and Lee, J. (2010), “ADL tests for threshold cointegration,” Journal of Time Series
Analysis, 31, 241–254.
Phillips, P. C. B., and Durlauf, S. (1986), “Multiple time series regression with integrated
processes,” Review of Economic Studies, 53, 473–496.
Phillips, P. C. B., and Ouliaris, S. (1990), “Asymptotic properties of residual based tests for
cointegration,” Econometrica, 58, 165–193.
Pippenger, M. K., and Goering, G. E. (1993), “A note on the empirical power of unit root
tests under threshold processes,” Oxford Bulletion of Economics and Statistics, 55, 473–
481.
Seo, M. (2006), “Bootstrap testing for the null of no cointegration in a threshold vector
error,” Journal of Econometrics, 134, 129–150.
33
Sephton, P. S. (2003), “Spatial market arbitrage and threshold cointegration,” American
Journal of Agricultural Economics, 85, 1041—1046.
Sims, C. A., Stock, J. H., and Watson, M. W. (1990), “Inference in linear time series models
with some unit roots,” Econometrica, 58, 113–144.
Taylor, M. P., and Peel, D. A. (2000), “Nonlinear adjustment, long-run equilibrium and
exchange rate fundamentals,” Journal of International Money and Finance, 19, 33—53.
34
Table 1: Asymptotic Critical Values
Φ = [0.10, 0.90] w/o constant w/ constant w/ trend1% 5% 10% 1% 5% 10% 1% 5% 10%
n = 2 35.0 30.6 28.3 49.0 43.7 40.6 62.4 54.8 51.5n = 3 61.4 55.3 52.1 78.2 71.7 67.9 95.1 87.3 83.5n = 4 95.5 88.1 83.9 114.8 107.5 103.2 135.5 126.9 122.5n = 5 138.0 128.8 123.8 160.2 151.4 146.8 184.5 175.0 169.7
Φ = [0.15, 0.85] w/o constant w/ constant w/ trend1% 5% 10% 1% 5% 10% 1% 5% 10%
n = 2 35.5 30.2 27.9 48.2 42.5 39.9 60.0 54.3 51.1n = 3 61.3 55.2 52.1 77.3 70.4 67.2 95.1 86.9 82.8n = 4 95.4 87.7 83.6 116.6 106.9 102.6 136.1 126.7 122.2n = 5 137.2 127.9 123.4 161.3 151.0 145.5 184.1 174.2 169.0
Φ = [0.20, 0.80] w/o constant w/ constant w/ trend1% 5% 10% 1% 5% 10% 1% 5% 10%
n = 2 35.4 29.6 27.3 48.0 42.0 38.9 59.4 53.5 50.4n = 3 60.7 54.2 50.9 77.6 70.3 66.7 94.2 86.1 82.4n = 4 95.9 86.8 82.7 112.4 105.6 101.4 135.8 126.0 121.4n = 5 135.8 125.9 121.6 158.9 149.4 144.7 185.4 173.4 168.2
35
Table 2: Estimation Results
xt y1t − 2.81− 0.68y2tτ -1.25
RegressorsEquation (30) y1t−111t−1 y1t−112t−1 y2t−111t−1 y2t−112t−1 ∆y1t−1 ∆y2t−1 11t−1 12t−1
Coefficients 0.06 -0.05 -0.03 0.03 -0.03 0.25 -0.15 0.18T Values 0.77 −3.98∗∗ -0.53 3.04∗∗ -1.32 5.91∗∗ -0.99 4.03∗∗
Ljung-Box Q Test 1.13
RegressorsEquation (31) y1t−111t−1 y1t−112t−1 y2t−111t−1 y2t−112t−1 ∆y1t−1 ∆y2t−1 11t−1 12t−1
Coefficients 0.22 -0.03 -0.23 0.03 0.40 0.30 0.19 0.06T Values 1.93∗ -1.45 −3.43∗∗ 1.97∗∗ 10.75∗∗ 5.10∗∗ 0.88 0.90Ljung-Box Q Test 0.52
Tests for Threshold CointegrationADL Test supτ∈[0.15,0.85] F
d(τ) = 128.04∗∗
Enders-Siklos Test 18.62∗∗
Note: ∗∗ significant at the 5% level, ∗ significant at the 10% level.
36
0 0.5 10
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5Panel A
c1
Size
ADLESECM
0 0.5 10
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5Panel B
c2
Size
ADLESECM
0 0.5 10
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5Panel C
d1
Size
ADLESECM
Figure 1: Rejection Frequency of Various Tests under the Null Hypothesis at the 5% Level.
37
−1 −0.5 00
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1Panel A, f=0
k1
Powe
r
ADLESECMADLPESP
−1 −0.5 00
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1Panel B, f=0.05
k1
Powe
r
ADLESECMADLPESP
−1 −0.5 00
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1Panel C, f=0.1
k1
Powe
r
ADLESECMADLPESP
Figure 2: Rejection Frequency of Various Tests under the Alternative Hypothesis at the 5%
Level.
38
0 100 200 300 400 500 6000
5
10
15
20Panel A: Time Series Plot
Obs
Fedfunds10ybond
0 100 200 300 400 500 600−4
−2
0
2
4Panel B: Error Correction Term
Obs
Figure 3: Time Series Plot
39