Testing a large set of zero restrictions in regression models,
with an application to mixed frequency Granger causality∗
Eric Ghysels†
UNC Chapel Hill
Jonathan B. Hill‡
UNC Chapel Hill
Kaiji Motegi§
Kobe University
This draft: November 10, 2019
Abstract
This paper proposes a new test for a large set of zero restrictions in regression models based
on a seemingly overlooked, but simple, dimension reduction technique. The procedure in-
volves multiple parsimonious regression models where key regressors are split across simple
regressions. Each parsimonious regression model has one key regressor and other regressors
not associated with the null hypothesis. The test is based on the maximum of the squared
parameters of the key regressors. Parsimony ensures sharper estimates and therefore improves power in small samples. We present the general theory of our test and focus on mixed
frequency Granger causality as a prominent application involving many zero restrictions.
Keywords: dimension reduction, Granger causality test, max test, Mixed Data Sampling
(MIDAS), parsimonious regression models.
JEL classification: C12, C22, C51.
∗We are grateful to the editor, Marine Carrasco, and two anonymous referees for their helpful comments. We also thank Manfred Deistler, Shigeyuki Hamori, Peter R. Hansen, Atsushi Inoue, Helmut Lütkepohl, Massimiliano Marcellino, Yasumasa Matsuda, Daisuke Nagakura, Eric Renault, Peter Robinson, seminar participants at NYU Stern, Université Libre de Bruxelles, Keio University, and UNC Chapel Hill, and participants at the 2014 NBER-NSF Time Series Conference, 25th EC2 Conference, SETA 2015, 11th World Congress of the Econometric Society, 2015 JJSM, Recent Developments in Time Series and Related Fields, Workshop on Advances in Econometrics 2017, and 2019 JJSM for helpful comments. The third author is grateful for financial support from JSPS KAKENHI (Grant Number 16K17104), the Kikawada Foundation, the Mitsubishi UFJ Trust Scholarship Foundation, the Nomura Foundation, and the Suntory Foundation.
†Corresponding author. Department of Economics and Department of Finance, Kenan-Flagler Business School, University of North Carolina at Chapel Hill, and CEPR. Address: Chapel Hill, NC 27599, USA. E-mail: [email protected]
‡Department of Economics, University of North Carolina at Chapel Hill. Address: Chapel Hill, NC 27599, USA. E-mail: [email protected]
§Graduate School of Economics, Kobe University. Address: Kobe, Hyogo 657-8501, Japan. E-mail: [email protected]
1 Introduction
We propose a new test designed for a large set of zero restrictions in regression models. The test
is based on a seemingly overlooked, but simple, dimension reduction technique for regression
models. Suppose that the underlying data generating process for a scalar yt is
yt = z′ta + x′tb + ϵt,
where zt is assumed to have a small dimension p while xt may have a large but finite dimension
h. We want to test the null hypothesis H0 : b = 0 against the general alternative hypothesis H1 : b ≠ 0.
A classical approach for testing H0 relies on what we call a full regression model yt = z′tα + x′tβ + ut and a Wald test. This approach may result in imprecise inference when b has a large
dimension relative to sample size n. The asymptotic χ2-test may suffer from size distortions in
small sample due to parameter proliferation. A bootstrap method can be employed to improve
empirical size, but this generally results in the size-corrected bootstrap test having low power
due to large size distortions (cf. Davidson and MacKinnon (2006)). A shrinkage estimator can
be used, including Lasso, Adaptive Lasso, or Ridge Regression, but these are valid under a
sparsity assumption. This means that when applied to our setting, the shrinkage estimator may
not be valid under the alternative because the high dimensional parameter set being tested may
have all non-zero components, contradicting the sparsity assumption.1
To circumvent the issue of parameter proliferation, we propose to split the key regressors xt = [x1t, . . . , xht]′ across separate regression models. This approach leads to what
we call parsimonious regression models:
yt = z′tαi + βixit + uit for i = 1, . . . , h.
The ith parsimonious regression model has zt and the ith element of xt only, so that parameter
proliferation is not an issue. We then compute a max test statistic:
Tn = max{(√n βn1)², . . . , (√n βnh)²}.
The asymptotic distribution of Tn is non-standard under H0 : b = 0, but it is straightforward to approximate a p-value by drawing directly from an asymptotically valid approximation of the asymptotic distribution. We will prove that, under H1 : b ≠ 0, at least one of {βn1, . . . , βnh} has a nonzero probability limit under fairly weak conditions. This key result ensures that,
although the proposed approach does not deliver a consistent estimator of b, we can reject the
null hypothesis with probability approaching one for any direction b ≠ 0 under the alternative
hypothesis. The max test is therefore a consistent test.
The maximum of a sequence of statistics with (or without) pointwise Gaussian limits has
1Under the null, the sparsity assumption is true by construction. Even if evidence points to the null, a Type II error may be occurring, in which case the test statistic need not have its intended asymptotic properties.
a long history (e.g. Gnedenko (1943)), including the maximum correlation over an increasing sequence of integer displacements; see, e.g., Berman (1964) and Hannan (1974). A max statistic across econometric models is used in White's (2000) predictive model selection criterion. Evidently ours is the first application of a maximum statistic to a regression model as a core test statistic for zero restrictions.
We demonstrate via Monte Carlo simulations that in comparison to the bootstrapped Wald
test, the max test is more powerful particularly as h increases and/or there is a stronger corre-
lation among regressors. In some cases, the max test is more than three times as powerful as
the bootstrapped Wald test. An intuitive explanation for the dominance of the max test is as
follows. Performing the Wald test requires the computation of Γ̂ = (1/n) ∑_{t=1}^{n} XtX′t, where Xt = [z′t,x′t]′ is the vector of regressors in the full regression model. The resulting Γ̂ becomes a less precise estimator of E[XtX′t] as the dimension p + h increases or as the correlation among the regressors strengthens. Moreover, the bootstrapped Wald test tends to lose finite sample power under such circumstances.
The max test, by contrast, bypasses the computation of Γ̂ since it only requires the computation of Γ̂ii = (1/n) ∑_{t=1}^{n} XitX′it for i = 1, . . . , h, where Xit = [z′t, xit]′ is the (p + 1) × 1 vector of regressors in the ith parsimonious regression model. Since Γ̂ii has a much smaller dimension than Γ̂, its accuracy improves considerably. Hence the max test remains powerful as the dimension h or the contemporaneous correlation within Xt increases. Since large dimensionality and
strong correlation among regressors (i.e. multicollinearity) are indeed an issue in many economic
applications, the max test is a promising alternative to the Wald test.
After presenting the general theory of the max test, we focus on Mixed Data Sampling (MI-
DAS) Granger causality as a prominent application involving many zero restrictions. Classical
tests for Granger’s (1969) causality are typically designed for single-frequency data (i.e. they
aggregate data with possibly different frequencies to the common lowest frequency). A popular
Granger causality test is a Wald test based on multi-horizon vector autoregressive (VAR) mod-
els. An advantage of the VAR approach is the capability of handling causal chains among more
than two variables.2
Time series are often sampled at different frequencies, and it is well known that temporal
aggregation generates or conceals Granger causality.3 Standard VAR models are designed for
single-frequency data, and causality tests based on those models may produce misleading results
of spurious (non-)causality. To alleviate the adverse impact of temporal aggregation, Ghysels,
Hill, and Motegi (2016) develop Granger causality tests that explicitly take advantage of mixed
frequency data. They extend Dufour, Pelletier, and Renault’s (2006) VAR-based causality test,
exploiting Ghysels’ (2016) mixed frequency vector autoregressive (MF-VAR) models.4
A challenge not addressed by Ghysels, Hill, and Motegi (2016) is parameter proliferation in
MF-VAR models which has a negative impact on the finite sample performance of their tests.
2See Lütkepohl (1993), Dufour and Renault (1998), and Dufour, Pelletier, and Renault (2006) among others.
3See Ghysels, Hill, and Motegi (2016) and references therein for the consequences of temporal aggregation on Granger causality.
4Foroni, Ghysels, and Marcellino (2013) provide a survey of MF-VAR models.
Indeed, if we let m be the ratio of high and low frequencies (e.g. m = 3 in mixed monthly and
quarterly data), then for bivariate mixed frequency settings the MF-VAR is of dimension m + 1.
Parameter proliferation occurs when m is large, and worsens as the lag length of VAR increases.
In such a case, an asymptotic Wald test results in size distortions, while a bootstrapped Wald
test features low size-corrected power, a common consequence when a size-distorted asymptotic
test is bootstrapped (cf. Davidson and MacKinnon (2006)).5
The max test proposed in the present paper is a useful solution to the large dimensionality
associated with mixed frequency Granger causality. This paper focuses on testing for Granger
causality from a high frequency variable xH to a low frequency variable xL.6 We show via Monte
Carlo simulations that the max test dominates the Wald test in terms of empirical power. In
some cases, the max test is more than four times as powerful as the bootstrapped Wald test.
As an empirical application, we analyze Granger causality from a weekly interest rate spread
to real gross domestic product (GDP) growth in the United States. Based on a rolling window
analysis, the max test with mixed frequency data yields the intuitive result that the interest rate spread causes GDP growth until the 1990s, after which the causality vanishes. The Wald test with mixed frequency data yields the counter-intuitive result that significant causality emerges only recently.
The remainder of the paper is organized as follows. In Section 2, we present the general theory
of the parsimonious regression models and max tests. In Section 3, we focus on mixed fre-
quency Granger causality tests as a specific application. In Section 4, we perform Monte Carlo
simulations for a generic linear regression framework and MIDAS frameworks. The empirical
application on interest rate spread and GDP is presented in Section 5, and Section 6 concludes
the paper. In Technical Appendices, we provide omitted proofs and some technical details.
2 General theory
Consider a data generating process (DGP) in which a univariate time series {yt} depends linearly on two groups of regressors zt = [z1t, . . . , zpt]′ and xt = [x1t, . . . , xht]′. Define the σ-field Ft = σ(Yτ : τ ≤ t + 1) with all variables Yt = [yt−1,X′t]′, where Xt = [z′t,x′t]′. Thus, Ft−1 contains information on {yt−1, yt−2, . . .} and the regressors {(zt,xt), (zt−1,xt−1), . . .}.
Assumption 2.1. The true DGP is

yt = ∑_{k=1}^{p} akzkt + ∑_{i=1}^{h} bixit + ϵt. (2.1)

The error {ϵt} is a stationary martingale difference sequence (mds) with respect to the increasing σ-field filtration Ft ⊂ Ft+1, and σ² ≡ E[ϵt²] > 0.
5Götz, Hecq, and Smeekes (2016) propose reduced rank regression and Bayesian estimation to improve the finite sample performance of Ghysels, Hill, and Motegi's (2016) test under a large-dimensional MF-VAR.
6Granger causality from xL to xH poses a harder challenge due to the difficulty of ensuring parsimony under the null and identifying parameters under the alternative. See Section 3.2 for discussion.
Remark 2.1. The mds assumption allows for conditional heteroskedasticity of unknown form,
including GARCH-type processes. We can also easily allow for stochastic volatility or other
random volatility errors by expanding the definition of the σ-field Ft. The mds assumption
can be relaxed at the expense of more technical proofs, and a more complicated asymptotic
variance structure and subsequent estimator. Since this is all well known, we do not consider
such generalizations in the present paper.
Let n denote the sample size, and define the n × (p + h) matrix of regressors X = [X1, . . . ,Xn]′.
Assumption 2.2 is a standard assumption that rules out perfect multicollinearity among the
regressors.
Assumption 2.2. X is of full column rank p+ h almost surely.
Assumption 2.3. {Xt, ϵt} are strictly stationary and ergodic. Xt is square integrable.
Using standard vector notation, e.g., a = [a1, . . . , ap]′, DGP (2.1) is rewritten as

yt = z′ta + x′tb + ϵt, (2.2)
where {zt} are auxiliary regressors whose coefficients are not our main target. The number
of auxiliary regressors, p, is assumed to be relatively small. We want to test the zero restrictions with respect to the main regressors {xt}:
H0 : b = 0.
The number of zero restrictions, h, is assumed to be finite but potentially large relative to sample
size n.
A classical approach to testing H0 : b = 0 is the Wald test based on what we call a full regression model:

yt = ∑_{k=1}^{p} αkzkt + ∑_{i=1}^{h} βixit + ut, (2.3)
where we take it for granted that model (2.3) is correctly specified relative to the true DGP (2.1)
in order to focus ideas. Given model (2.3), it is straightforward to compute a Wald statistic
with respect to H0 : b = 0. The statistic has an asymptotic χ2 distribution with h degrees of
freedom under Assumptions 2.1-2.3. A potential problem of this approach is that the asymptotic
approximation may be poor when there are many zero restrictions relative to sample size. A
parametric or wild bootstrap can be used to control for the size of the test, but this typically
leads to low size-corrected power. It is therefore useful to propose a new test that achieves sharper size and higher power.
We propose parsimonious regression models:

yt = ∑_{k=1}^{p} αkizkt + βixit + uit, i = 1, . . . , h. (2.4)
A crucial difference between the full regression model (2.3) and the parsimonious regression models (2.4) is that the latter comprises h equations, where only the key regressor xit, along with the auxiliary regressors {z1t, . . . , zpt}, appears in the ith equation. The parameters {α1i, . . . , αpi} may differ across the equations i = 1, . . . , h, and unless the null hypothesis is true, they are generally
not equal to the true values {a1, . . . , ap}. We run least squares for each parsimonious regression
model to get {βn1, . . . , βnh}. Then we formulate the max test statistic

Tn = max{(√n βn1)², . . . , (√n βnh)²}. (2.5)
Equations (2.4) and (2.5) form a core part of our approach. The number of regressors in each
parsimonious regression model is p+ 1, which is much smaller than p+ h in the full regression
model. As a result, the precision of βni as an estimator of its probability limit β∗i = plim_{n→∞} βni improves for each i.
Under H0 : b = 0, each parsimonious regression model is correctly specified, and therefore βni →p β∗i = bi = 0 straightforwardly. The asymptotic distribution of Tn under H0 is non-standard, but we can compute a simulation-based p-value by drawing directly from an asymptotically valid approximation to the asymptotic distribution (see Theorems 2.1-2.3 below). Under H1 : b ≠ 0, each parsimonious regression model is in general misspecified due to an omitted regressor, and therefore βni →p β∗i ≠ bi. A key result, proven in Theorems 2.4 and 2.5, is that at least one of {β∗1, . . . , β∗h} must be nonzero under H1 : b ≠ 0. Hence the max test achieves consistency, Tn →p ∞, although βni itself may not converge in probability to the true value bi.
We can potentially generalize the test statistic (2.5) by adding a weight to each term:

Tn = max{(√n wn1βn1)², . . . , (√n wnhβnh)²},

where {wn1, . . . , wnh} is a sequence of possibly stochastic L2-bounded positive scalar weights with non-random positive probability limits. We restrict ourselves to the equal weights wni = 1 in order to focus on the key implications of (2.4) and (2.5). Exploring non-equal weights is a possible future task.7
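To make the construction concrete, the two steps above (run h parsimonious regressions, then take the maximum squared scaled slope) can be sketched in a few lines. This is purely illustrative: the simulated DGP, parameter values, and function name below are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DGP purely for illustration: p = 2 auxiliary regressors z_t,
# h = 10 key regressors x_t, n = 200 observations, with H0 : b = 0 imposed.
n, p, h = 200, 2, 10
z = rng.standard_normal((n, p))
x = rng.standard_normal((n, h))
y = z @ np.array([0.5, -0.3]) + rng.standard_normal(n)

def max_test_stat(y, z, x):
    """T_n = max_i (sqrt(n) * beta_ni)^2, where beta_ni is the OLS slope on
    x_it in the i-th parsimonious regression of y_t on (z_t, x_it)."""
    n, h = x.shape
    betas = np.empty(h)
    for i in range(h):
        X_i = np.column_stack([z, x[:, i]])   # only p + 1 regressors
        betas[i] = np.linalg.lstsq(X_i, y, rcond=None)[0][-1]
    return n * np.max(betas ** 2), betas

T_n, betas = max_test_stat(y, z, x)
```

Each least squares fit involves only p + 1 parameters, which is the dimension reduction the paper exploits.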
2.1 Asymptotic theory under the null hypothesis
We derive the asymptotic distribution of Tn in (2.5) under H0 : b = 0. Rewrite each parsimonious regression model (2.4) as

yt = X′itθi + uit, i = 1, . . . , h, (2.6)

where Xit = [z1t, . . . , zpt, xit]′ and θi = [α′i, βi]′ = [α1i, . . . , αpi, βi]′. Stack all parameters across the h models as θ = [θ′1, . . . ,θ′h]′. Define a selection matrix R that selects β = [β1, . . . , βh]′ from
7A natural choice would be the inverted standard error V̂ni^{−1/2}, where V̂ni is a consistent estimator for the asymptotic variance of √n βni under the null. This allows for a control of different sampling and asymptotic dispersions of the estimators across parsimonious regression models, which may improve empirical and local power. For the sake of compactness, we do not explore such generality here.
θ. R is an h × (p + 1)h full row rank matrix such that β = Rθ, hence

R = [ 01×p 1 01×p 0 · · · 01×p 0
      01×p 0 01×p 1 · · · 01×p 0
       ⋮   ⋮   ⋮   ⋮   ⋱   ⋮   ⋮
      01×p 0 01×p 0 · · · 01×p 1 ]. (2.7)
Theorem 2.1. Under H0 : b = 0, we have Tn →d max{N1², . . . ,Nh²} as n → ∞, where N = [N1, . . . ,Nh]′ is distributed N(0,V) with covariance matrix

V ≡ RSR′ ∈ Rh×h, (2.8)

where S = [Σij]i,j ∈ R(p+1)h×(p+1)h with

Γij = E[XitX′jt], Λij = E[ϵt²XitX′jt], Σij = Γ−1ii Λij Γ−1jj, i, j ∈ {1, . . . , h}. (2.9)
All proofs appear in Technical Appendix A.
Remark 2.2. Under Assumption 2.1, the error {ϵt} is an adapted martingale difference. Suppose also that {ϵt² − σ²} is an adapted martingale difference with σ² = E[ϵt²] (i.e. conditional homoskedasticity). Then we get the well-known simplification Λij = σ²Γij.
Remark 2.3. {X1t, . . . ,Xht} share the common regressors zt. It is therefore easy to prove that
S is a positive semi-definite and singular matrix.8
Remark 2.4. We do not have a general proof that the asymptotic covariance matrix V is
nonsingular, except for a simple case with (p, h) = (1, 2). This is irrelevant for performing
the max test since we do not need to invert V . As we show below, it is easy to compute
an asymptotically valid p-value by drawing from an asymptotically valid approximation to the
distribution N(0,V ).
2.2 Simulated p-value
Let V̂n be a consistent estimator for V, and draw M samples of vectors {N(1), . . . ,N(M)} independently from N(0, V̂n). Now compute the artificial test statistics Tn(j) = max{(N1(j))², . . . , (Nh(j))²} for j = 1, . . . ,M. An asymptotic p-value approximation for Tn is

pn,M = (1/M) ∑_{j=1}^{M} I(Tn(j) > Tn),

8Observe that Σij = E[ϵt²X̃itX̃′jt], where X̃it = (E[XitX′it])−1Xit. Now define the set of all (non-unique) regressors across all parsimonious regression models: X̃t = [X̃′1t, . . . , X̃′ht]′ ∈ R(p+1)h. Let λ = [λ′1, . . . ,λ′h]′, where λi = [λi1, . . . , λi,p+1]′ is (p + 1) × 1 and λ′λ = 1. Then λ′Sλ = ∑_{i=1}^{h} ∑_{j=1}^{h} E[ϵt²λ′iX̃itX̃′jtλj] = E[ϵt²(λ′X̃t)²] ≥ 0. Since each X̃it contains zt, it is possible to find a non-zero λ such that λ′X̃t = 0, e.g. λ1,p+1 = 0, λ2 = −λ1, and all other λi = 0. Therefore S is singular and positive semi-definite.
where I(A) is the indicator function that equals 1 if A occurs and 0 otherwise. Since the N(j) are i.i.d. over j, and M can be made arbitrarily large, by the Glivenko-Cantelli Theorem pn,M can be made arbitrarily close to P(Tn(1) > Tn). The proposed max test rejects H0 at level α when pn,Mn < α and accepts H0 when pn,Mn ≥ α, where {Mn}n≥1 is a sequence of positive integers satisfying Mn → ∞.9
Define the max test limit distribution under H0 as F0(c) = P(max1≤i≤h (Ni(1))² ≤ c). The asymptotic p-value is therefore F̄0(Tn) ≡ 1 − F0(Tn) = P(max1≤i≤h (Ni(1))² ≥ Tn). By an argument identical to Hansen (1996: Theorem 2), we have the following link between the p-value approximation P(Tn(1) > Tn) and the asymptotic p-value for Tn.
Theorem 2.2. Let {Mn}n≥1 be a sequence of positive integers, Mn → ∞. Under Assumptions 2.1-2.3, P(Tn(1) > Tn) = F̄0(Tn) + op(1), hence pn,Mn = F̄0(Tn) + op(1). Therefore under H0, P(pn,Mn < α) → α for any α ∈ (0, 1).
A consistent estimator V̂n is computed as follows. Run least squares for each parsimonious regression model to get θni = [α′ni, βni]′ and residuals ûit = yt − X′itθni. Define Γ̂ij = (1/n) ∑_{t=1}^{n} XitX′jt, Λ̂ij = (1/n) ∑_{t=1}^{n} ûit²XitX′jt, Σ̂ij = Γ̂−1ii Λ̂ij Γ̂−1jj, Ŝ = [Σ̂ij]i,j, and V̂n = RŜR′.
Theorem 2.3. Under Assumptions 2.1-2.3, V̂n →p V∗ where V∗ is some matrix that satisfies ||V∗|| < ∞. Specifically, V∗ = V ≡ RSR′ under H0, while under H1, V∗ = RS∗R′ where S∗ = [Γ−1ii Λ∗ij Γ−1jj]i,j.
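A minimal sketch of the simulated p-value of Section 2.2, combining the covariance estimator with the Monte Carlo draws. This is our illustration, not the authors' code; one deliberate deviation, flagged in the comments, is that the Λ̂ blocks are weighted by ûit·ûjt rather than the paper's ûit², so that the estimated covariance matrix is symmetric positive semi-definite by construction (the two weightings agree asymptotically under H0).

```python
import numpy as np

def max_test_pvalue(y, z, x, M=10000, seed=0):
    """Simulated p-value for the max test (a sketch of Section 2.2).

    Fits the h parsimonious regressions of y on (z, x_i), forms
    T_n = max_i (sqrt(n) * beta_ni)^2, estimates V = R S R', draws M
    vectors N^(j) ~ N(0, V_n), and returns the share of draws whose
    max_i (N_i^(j))^2 exceeds T_n, together with T_n itself.
    """
    rng = np.random.default_rng(seed)
    n, h = x.shape
    betas = np.empty(h)
    resid = np.empty((n, h))
    Xi = [np.column_stack([z, x[:, i]]) for i in range(h)]
    Ginv = []
    for i in range(h):
        theta, *_ = np.linalg.lstsq(Xi[i], y, rcond=None)
        betas[i] = theta[-1]
        resid[:, i] = y - Xi[i] @ theta
        Ginv.append(np.linalg.inv(Xi[i].T @ Xi[i] / n))
    # V[i, j] is the (beta_i, beta_j) entry of G_ii^-1 Lam_ij G_jj^-1;
    # we weight by u_it * u_jt (not the paper's u_it^2) so that V is
    # symmetric PSD by construction -- equivalent under H0.
    V = np.empty((h, h))
    for i in range(h):
        for j in range(h):
            Lam = (Xi[i] * (resid[:, i] * resid[:, j])[:, None]).T @ Xi[j] / n
            V[i, j] = (Ginv[i] @ Lam @ Ginv[j])[-1, -1]
    V = (V + V.T) / 2.0                      # enforce exact symmetry
    T_n = n * np.max(betas ** 2)
    draws = rng.multivariate_normal(np.zeros(h), V, size=M)
    return float(np.mean(np.max(draws ** 2, axis=1) > T_n)), float(T_n)
```

Only the beta-row/column of each Σ̂ij block is needed, which is what indexing [-1, -1] extracts in place of an explicit selection matrix R.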
2.3 Limits under the null and alternative hypotheses
Under the alternative hypothesis H1 : b ≠ 0, βni in general does not converge in probability to the true value bi due to omitted regressors. Let β∗i = plim_{n→∞} βni be the so-called pseudo-true value of βi. The same notation applies to α∗i and θ∗i = [α∗′i, β∗i]′. Let βn = [βn1, . . . , βnh]′ and β∗ = [β∗1, . . . , β∗h]′. We can characterize θ∗i as follows.
Theorem 2.4. Let Assumptions 2.1-2.3 hold. Let Γii = E[XitX′it] ∈ R(p+1)×(p+1) and Ci = E[Xitx′t] ∈ R(p+1)×h. Then θn →p θ∗ = [θ∗′1, . . . ,θ∗′h]′, where

θ∗i = [α∗1i, . . . , α∗pi, β∗i]′ = [a1, . . . , ap, 0]′ + Γ−1ii Cib, i = 1, . . . , h. (2.10)

Further, βn →p β∗ = Rθ∗ by construction.

9The computation of pn,M is simple and fast. The latter follows since the N(j) i.i.d.∼ N(0, V̂n) are easily computed by standard modern software packages. The test statistics Tn(j) are then trivially easy to compute.
Theorem 2.4 provides useful insight into the relationship between the underlying coefficient b and the pseudo-true value β∗. First, it is clear from (2.10) that β∗ = 0 whenever b = 0. This is an intuitive result since each parsimonious regression model is correctly specified under H0 : b = 0. Second, as the next theorem proves, b = 0 whenever β∗ = 0. This is a key result that allows us to identify the null and alternative hypotheses exactly. Of course, our approach cannot identify all of {b1, . . . , bh} under H1. We can, however, identify that at least one of {b1, . . . , bh} must be non-zero, which is sufficient for rejecting H0 : b = 0.
Theorem 2.5. Let Assumptions 2.1-2.3 hold. Then β∗ = 0 implies b = 0, hence β∗ = 0 if and only if b = 0. Therefore βn →p 0 if and only if b = 0.

The proof of Theorem 2.5 exploits the non-singularity of E[ztz′t] and E[xtx′t], which is ensured by Assumption 2.2. We next present a simple example which illustrates Theorem 2.5.
Example 2.1 (Identification). Suppose that the true DGP is

yt = a1 + b1x1t + b2x2t + ϵt

and we run two parsimonious regression models

yt = α11 + β1x1t + u1t and yt = α12 + β2x2t + u2t.

Notice that we have only one common regressor z1t = 1 and two main regressors {x1t, x2t}. Assume for simplicity that E[xit] = 0 and E[xit²] = 1 for i = 1, 2. Finally, since Assumption 2.2 rules out perfect multicollinearity, we have ρ12 = E[x1tx2t] ∈ (−1, 1).
In this setting, it follows that Xit = [1, xit]′, xt = [x1t, x2t]′, and

Γii = E[XitX′it] = [1 0; 0 1], C1 = E[X1tx′t] = [0 0; 1 ρ12], C2 = E[X2tx′t] = [0 0; ρ12 1].

Substitute these quantities into (2.10) to get

β∗1 = b1 + ρ12b2 and β∗2 = b2 + ρ12b1. (2.11)
Now we verify Theorem 2.5 by showing that β∗ = 0 =⇒ b = 0. Assume β∗1 = β∗2 = 0. If ρ12 = 0, then it is trivial from (2.11) that b1 = b2 = 0. If ρ12 ≠ 0, then (2.11) implies that

[(1 + ρ12)(1 − ρ12)/ρ12] × b1 = 0.

Since ρ12 ∈ (−1, 1), it must be the case that b1 = 0 and therefore b2 = 0.
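The identification argument of Example 2.1 is easy to check numerically: with standardized, correlated regressors, the OLS slopes of the two parsimonious regressions converge to the pseudo-true values in (2.11). The coefficient values below are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

# Hypothetical coefficients; under the stated moments the parsimonious OLS
# slopes should converge to beta1* = b1 + rho*b2 and beta2* = b2 + rho*b1.
rng = np.random.default_rng(1)
n, rho, a1, b1, b2 = 200_000, 0.6, 1.0, 0.5, -0.4
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
x = rng.standard_normal((n, 2)) @ L.T      # E[x_it]=0, E[x_it^2]=1, corr rho
y = a1 + b1 * x[:, 0] + b2 * x[:, 1] + rng.standard_normal(n)

slopes = []
for i in range(2):
    X = np.column_stack([np.ones(n), x[:, i]])   # intercept + one regressor
    slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
# slopes[0] is close to b1 + rho*b2 = 0.26; slopes[1] to b2 + rho*b1 = -0.10
```

With both pseudo-true values nonzero, the max test would reject even though neither slope equals its true coefficient, exactly as Theorem 2.5 predicts.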
Theorems 2.1-2.5 together imply that the max test statistic has its intended limit properties under either hypothesis. First, observe from the construction (2.5) that Tn →p ∞ at rate n if and only if β∗ ≠ 0, and by Theorems 2.4 and 2.5, βn →p β∗ ≠ 0 under H1 : b ≠ 0. In conjunction with the p-value approximation consistency in Theorem 2.2, the consistency of the max test follows.

Theorem 2.6. Let Assumptions 2.1-2.3 hold. Then Tn →p ∞ at rate n, and therefore P(pn,Mn < α) → 1 for any α ∈ (0, 1), if and only if H1 : b ≠ 0 is true. In particular, H1 : b ≠ 0 is true if and only if Tn/n →p max{(β∗1)², . . . , (β∗h)²} > 0.
An immediate consequence of Theorems 2.1, 2.2, 2.5, and 2.6 is that the limiting null distri-
bution arises if and only if H0 is true.
Corollary 2.7. Let {Mn}n≥1 be a sequence of positive integers, Mn → ∞. Let Assumptions 2.1-2.3 hold. Then Tn →d max{N1², . . . ,Nh²} as n → ∞, and therefore P(pn,Mn < α) → α for any α ∈ (0, 1), if and only if H0 : b = 0 is true.
To conclude this section, we briefly discuss a misspecification problem. Consider the case where the true DGP has h̄ regressors {x1t, . . . , xh̄t}, while we run only h parsimonious regression models with h < h̄, so that {xh+1,t, . . . , xh̄t} are not used. In such a case, the max test is generally inconsistent, just as the Wald test is. Observe that (2.10) still holds for i = 1, . . . , h when there is under-specification (see the proof of Theorem 2.4 for the derivation). When h < h̄, having β∗1 = · · · = β∗h = 0 does not imply that b = 0. Below is a simple counter-example.
Example 2.2 (Non-identification due to Under-specification). Continue Example 2.1, except that now h = 1: we run only one regression model, yt = α11 + β1x1t + u1t. In this simple set-up, the parsimonious regression approach is identical to the full regression approach with h = 1. By (2.11), β∗1 = b1 + ρ12b2. For any given ρ12 ∈ (−1, 1), having β∗1 = 0 does not imply b1 = b2 = 0 because (b1, b2) can take any value as long as b1 = −ρ12b2.
3 Mixed frequency Granger causality
In this section, we discuss testing for Granger causality between bivariate mixed frequency time
series as a prominent example involving a large set of zero restrictions. We restrict ourselves
to the bivariate case where we have a high frequency variable xH and a low frequency variable
xL.10
10The trivariate case involves causal chains whose complexity would distract us from the main theme of dimension reduction. See Dufour and Renault (1998), Dufour, Pelletier, and Renault (2006), and Hill (2007) for further discussion of causal chains.
We first set up a DGP of xH and xL. Let m denote the ratio of sampling frequencies, i.e.
the number of high frequency time periods in each low frequency time period τ ∈ Z. We assume
throughout the paper that m is fixed (e.g. m = 3 months per quarter) in order to focus ideas and
reduce notation. All of our main results would carry over to a time-varying sequence {m(τ)} (e.g. daily xH and monthly xL) in a straightforward way.
Example 3.1 (Empirical Example of Mixed Frequency Data). Mixed frequency data arise
when we analyze a monthly variable xH and a quarterly variable xL, where m = 3. Suppose
that xH(τ, 1) is the first monthly observation in quarter τ, xH(τ, 2) is the second, and xH(τ, 3)
is the third. A leading example in macroeconomics is quarterly real GDP growth xL(τ), where
existing analyses on Granger causality use unemployment, oil prices, inflation, interest rates,
etc., often aggregated into quarters (see Hill (2007) for references). Consider monthly infla-
tion in quarter τ, denoted as [xH(τ, 1), xH(τ, 2), xH(τ, 3)]′. The resulting stacked system is
{xH(τ, 1), xH(τ, 2), xH(τ, 3), xL(τ)}. The assumption that xL(τ) is observed after xH(τ,m) is
merely a convention.
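The stacking convention of Example 3.1 amounts to reshaping the high frequency series into m columns and appending the low frequency series as a final column. A toy sketch with made-up numbers (the data are purely illustrative):

```python
import numpy as np

# Toy monthly series for 4 quarters (m = 3) and a toy quarterly series.
m = 3
x_H_monthly = np.arange(1.0, 13.0)          # 12 monthly observations
x_L_quarterly = np.array([10.0, 20.0, 30.0, 40.0])

# Row tau of X is the K = m + 1 vector
# [x_H(tau,1), x_H(tau,2), x_H(tau,3), x_L(tau)].
X = np.column_stack([x_H_monthly.reshape(-1, m), x_L_quarterly])
```

The first row of X is then [1, 2, 3, 10]: the three monthly observations of the first quarter followed by that quarter's low frequency observation.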
In the bivariate case, we have a K × 1 mixed frequency vector
X(τ) = [xH(τ, 1), . . . , xH(τ,m), xL(τ)]′,
where K = m + 1. Define the σ-field Fτ ≡ σ(X(τ ′) : τ ′ ≤ τ). We assume as in Ghysels (2016)
and Ghysels, Hill, and Motegi (2016) that E[X(τ)|Fτ−1] has a version that is almost surely
linear in {X(τ − 1), . . . ,X(τ − p)} for some finite p ≥ 1.
Assumption 3.1. The mixed frequency vector X(τ) is governed by a MF-VAR(p) for finite p ≥ 1:

X(τ) = ∑_{k=1}^{p} AkX(τ − k) + ϵ(τ), (3.1)

where X(τ) = [xH(τ, 1), . . . , xH(τ,m), xL(τ)]′, ϵ(τ) = [ϵH(τ, 1), . . . , ϵH(τ,m), ϵL(τ)]′, and each K × K coefficient matrix is

Ak = [ d11,k · · · d1m,k  c(k−1)m+1
         ⋮    ⋱     ⋮        ⋮
       dm1,k · · · dmm,k  ckm
       bkm  · · · b(k−1)m+1  ak ].

The error {ϵ(τ)} is a strictly stationary martingale difference sequence (mds) with respect to the increasing filtration Fτ ⊂ Fτ+1, with a positive definite covariance matrix E[ϵ(τ)ϵ(τ)′].
Remark 3.1. The present paper borrows notation from Ghysels, Hill, and Motegi (2016). There
it is assumed that all variables follow a high frequency VAR process, but only some are observable
in a lower frequency. A central contribution is the recovery of true high frequency causal patterns
based on a mixed frequency model like (3.1). The focus here is decidedly different, with an aim
for improved inference under parameter proliferation as in mixed frequency models. We therefore
do not work with a high frequency model, nor consider the link between inference under mixed
versus high frequency models.
Remark 3.2. Model (3.1) captures bi-directional mixed frequency Granger causality. It therefore does not allow instantaneous causality or now-casting (e.g. Götz and Hecq (2014)).
The coefficients d and a in (3.1) govern the autoregressive properties of xH and xL, re-
spectively. The coefficients b and c are relevant for Granger causality, so we explain how
they are labeled. b1 represents the impact of the most recent past observation of xH (i.e.
xH(τ − 1,m)) on xL(τ), b2 represents the impact of the second most recent past observation of
xH (i.e. xH(τ − 1,m− 1)) on xL(τ), and so on through bpm. In general, bj represents the impact
of xH on xL with j high frequency lags.
Similarly, c1 represents the impact of xL(τ−1) on the nearest observation of xH (i.e. xH(τ, 1)),
c2 represents the impact of xL(τ−1) on the second nearest observation of xH (i.e. xH(τ, 2)), cm+1
represents the impact of xL(τ−2) on the (m+1)-th nearest observation of xH (i.e. xH(τ, 1)), and
so forth. Finally, cpm represents the impact of xL(τ − p) on xH(τ,m). In general, cj represents
the impact of xL on xH with j high frequency lags.
We impose the usual stationarity and ergodicity conditions.
Assumption 3.2. All roots of the polynomial det(IK − ∑_{k=1}^{p} Akz^k) = 0 lie outside the unit circle, where det(·) denotes the determinant.
Assumption 3.3. X(τ) and ϵ(τ) are ergodic.
Granger causality from xH to xL is what we call high-to-low causality, while the reverse
direction is what we call low-to-high causality. Since high-to-low causality and low-to-high
causality pose fundamentally different challenges, we treat them separately in Sections 3.1 and
3.2.
3.1 Granger causality from high to low frequency
Pick the last row of the entire system (3.1):

xL(τ) = ∑_{k=1}^{p} akxL(τ − k) + ∑_{i=1}^{pm} bixH(τ − 1,m + 1 − i) + ϵL(τ), ϵL(τ) ∼ mds(0, σL²), σL² > 0. (3.2)
The index i ∈ {1, . . . , pm} is in high frequency terms, and the second argument m + 1 − i of
xH can be less than 1 since i > m occurs when p > 1. Allowing any integer value in the second
argument of xH , including those smaller than 1 or larger than m, does not cause any confusion,
and simplifies equations below. It is understood, for example, that xH(τ, 0) = xH(τ − 1,m), xH(τ,−1) = xH(τ − 1,m − 1), and xH(τ,m + 1) = xH(τ + 1, 1). More generally, we interchangeably write xH(τ − τ′, i) = xH(τ, i − mτ′) for any τ, τ′, i ∈ Z. See Technical Appendix B for details.
Based on the classic theory of Dufour and Renault (1998) and the mixed frequency extension
made by Ghysels, Hill, and Motegi (2016), we know that xH does not Granger cause xL given
the mixed frequency information set Fτ = σ(X(τ ′) : τ ′ ≤ τ) if and only if
H0 : b1 = · · · = bpm = 0.
A well-known issue here is that the number of zero restrictions, pm, may be very large in some
applications, depending on the ratio of sampling frequencies m. Consider a case of weekly versus
quarterly data for instance.11 Each quarter has approximately m = 12 weeks, and the VAR
lag length is p quarters. Then pm = 36 when p = 3, and pm = 48 when p = 4, etc. Running
parsimonious regression models and the max test is our proposed solution to the parameter
proliferation arising from a large m.
There is a clear correspondence between the general linear DGP (2.1) and the mixed frequency DGP (3.2). The regressand yt is xL(τ); the common regressors {z1t, . . . , zpt} are the low frequency lags {xL(τ − 1), . . . , xL(τ − p)}; the main regressors split across parsimonious regression models {x1t, . . . , xht} are the high frequency lags {xH(τ − 1, m + 1 − 1), . . . , xH(τ − 1, m + 1 − pm)}. The parsimonious regression models are therefore written as

xL(τ) = ∑_{k=1}^{p} αk,i xL(τ − k) + βi xH(τ − 1, m + 1 − i) + uL,i(τ),   i = 1, . . . , pm.
Here we are using h = pm models, which matches the true high frequency lag length. If h < pm,
then there is under-specification and the max test loses consistency (cf. Example 2.2). If h ≥ pm,
then consistency is ensured although finite sample performance may be poorer due to redundant
regressors.
Assumptions 3.1-3.3 imply Assumptions 2.1-2.3. Thus, under Assumptions 3.1-3.3, Theorems
2.1-2.6 and Corollary 2.7 carry over to the max test for high-to-low causality.
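To fix ideas, the high-to-low max test can be sketched as follows. The code simulates a bivariate mixed frequency system under high-to-low non-causality, runs the pm parsimonious OLS regressions, and computes the max statistic; all sizes (p = 2, m = 3, n = 500), the AR coefficient 0.3, and the use of plain `numpy` least squares are illustrative choices, not values or methods from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 2, 3, 500            # illustrative sizes, not those of the paper
h = p * m                      # pm zero restrictions under H0

# Simulate under high-to-low NON-causality: xL depends only on its own lags.
hf = rng.standard_normal((n + p) * m)          # high frequency series, flat indexing
xL = np.zeros(n + p)
for t in range(1, n + p):
    xL[t] = 0.3 * xL[t - 1] + rng.standard_normal()

taus = np.arange(p, n + p)                     # usable low frequency periods
y = xL[taus]
low_lags = np.column_stack([xL[taus - k] for k in range(1, p + 1)])

# pm parsimonious regressions: common low frequency lags plus ONE high frequency lag.
betas = np.empty(h)
for i in range(1, h + 1):
    xi = hf[taus * m - i]                      # xH(tau - 1, m + 1 - i) in flat indexing
    X = np.column_stack([np.ones(len(taus)), low_lags, xi])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    betas[i - 1] = coef[-1]                    # key coefficient beta_i

T_n = np.max(n * betas ** 2)                   # max test statistic
```

The p-value would then be obtained by comparing T_n against draws from the limit distribution under the null, as described in Section 2.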
3.2 Granger causality from low to high frequency
We now consider Granger causality from xL to xH . Recall from (3.1) that the DGP is MF-
VAR(p). As defined in Ghysels, Hill, and Motegi (2016), the null hypothesis of low-to-high
non-causality is written as H0 : c1 = · · · = cpm = 0 or compactly c = [c1, . . . , cpm]′ = 0.
The first m equations of (3.1) motivate us to run the following pm parsimonious regressions
to test for H0 : c = 0:
xH(τ, 1) = ∑_{j=1}^{pm} δ1j,i xH(τ − 1, m + 1 − j) + γ1i xL(τ − i) + uH,i(τ, 1),   i = 1, . . . , p,   (3.3)

⋮

xH(τ, m) = ∑_{j=1}^{pm} δmj,i xH(τ − 1, m + 1 − j) + γmi xL(τ − i) + uH,i(τ, m),   i = 1, . . . , p.   (3.4)
[11] In Section 5, we indeed analyze Granger causality from the weekly interest rate spread to quarterly GDP growth in the U.S.
In each of the pm parsimonious regression models (3.3)-(3.4), we put one high frequency τ-period series on the left-hand side and one low frequency lag on the right-hand side, augmented with lagged high frequency regressors. Each of the pm models is correctly specified relative to (3.1) under low-to-high non-causality H0 : c = 0.
For each fixed high frequency increment k ∈ {1, . . . , m}, Assumptions 2.1-2.3 and hence Theorems 2.1-2.3 carry over, so we can easily run the max test with respect to γk1 = · · · = γkp = 0. Low-to-high non-causality, however, implies joint zero restrictions over all increments k ∈ {1, . . . , m}. Deriving the asymptotic distribution of the max test statistic over all pm models under the null is left as a future task.
Another open question is the characterization of the pseudo-true values of the key parameters {γ11, . . . , γ1p, . . . , γm1, . . . , γmp}. To prove the consistency of the proposed max test for low-to-high causality, we would need to show that at least one of those γ's has a nonzero pseudo-true value under H1 : c ≠ 0.
Last but not least, the parsimonious regression models (3.3)-(3.4) may suffer from parameter proliferation even under H0 : c = 0. The number of regressors in each model is pm + 1, which may be large enough to distort the finite sample performance of the max test. A possible remedy is to impose a MIDAS polynomial on the lagged high frequency variables in order to reduce the dimension of the common regressors, while keeping the low-to-high causality part unrestricted.[12] An issue with this approach is potential misspecification of the MIDAS polynomials. If the MIDAS polynomials are misspecified, the resulting estimators of the γ's are not guaranteed to converge in probability to 0 under H0 : c = 0.
4 Monte Carlo simulations
In this section, we perform Monte Carlo simulations and compare the finite sample performance
of the max tests and Wald tests. We consider three scenarios in order to highlight practical
advantages of the max tests in generic and MIDAS frameworks. In Section 4.1, we consider
generic DGPs that can include cross-section and single-frequency time series data. In Section
4.2, we consider a variety of DGPs and models with mixed frequency data. In Section 4.3, we
study a more specific MIDAS framework that resembles the empirical application in Section 5.
4.1 Generic framework
4.1.1 Simulation design
We consider a generic DGP yt = x′t b + ϵt with ϵt ∼ i.i.d. N(0, 1). For simplicity, we do not include any common regressors zt (i.e. p = 0) and include only the key regressors xt. The dimension of xt is h ∈ {10, 100, 200}, and the sample size is n ∈ {10h, 50h}. We assume that xt ∼ i.i.d. N(0, Γ),
[12] See Ghysels, Santa-Clara, and Valkanov (2006) and Ghysels, Sinko, and Valkanov (2007), among others, for further discussion of MIDAS polynomials. Various software packages, including the MIDAS Matlab Toolbox (Ghysels (2013)), the R package midasr (Ghysels, Kvedaras, and Zemlys (2016)), EViews, and Gretl, cover a variety of polynomial specifications.
where the diagonal elements of Γ are all equal to 1 and the off-diagonal elements are all equal
to ρ ∈ {0.0, 0.3, 0.6}. The parameter ρ measures the contemporaneous correlation among the
regressors. We consider four patterns for the true coefficients b = [b1, . . . , bh]′:
DGP-1 bi = 0 for all i ∈ {1, . . . , h}.

DGP-2 bi = I(i ≤ h/10) × 10(5h)−1 for all i ∈ {1, . . . , h}.

DGP-3 bi = I(i ≤ h/2) × 2(5h)−1 for all i ∈ {1, . . . , h}.

DGP-4 bi = (5h)−1 for all i ∈ {1, . . . , h}.
DGP-1 is designed to investigate the empirical size of the max and Wald tests. Nonzero coefficients are put on the first 10% of the regressors under DGP-2, on the first half under DGP-3, and on all regressors under DGP-4. For each of the latter three DGPs, the coefficients sum to ∑_{i=1}^{h} bi = 1/5.
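Assuming the inclusive thresholds i ≤ h/10 and i ≤ h/2 (which are what make the coefficients sum to 1/5), the four patterns can be written compactly as follows; the sketch verifies the sum for each DGP and dimension.

```python
import numpy as np

def coeffs(dgp, h):
    """True coefficient vectors for the four generic DGPs of Section 4.1.1."""
    i = np.arange(1, h + 1)
    if dgp == 1:
        return np.zeros(h)
    if dgp == 2:
        return (i <= h / 10) * 10 / (5 * h)    # first 10% of regressors nonzero
    if dgp == 3:
        return (i <= h / 2) * 2 / (5 * h)      # first half nonzero
    return np.full(h, 1 / (5 * h))             # DGP-4: all regressors nonzero

# Coefficients sum to 1/5 under DGP-2, DGP-3, and DGP-4 for every h considered.
for h in (10, 100, 200):
    for dgp in (2, 3, 4):
        assert abs(coeffs(dgp, h).sum() - 0.2) < 1e-12
```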
We run the parsimonious regression models yt = βi xit + uit for i = 1, . . . , h, and compute the max test statistic Tn = max{(√n βn1)², . . . , (√n βnh)²}. To compute a p-value, we draw M = 5000 samples from the asymptotic distribution under H0 : b = 0. We use the asymptotic distribution without the assumption of conditional homoskedasticity (i.e. Λij = E[ϵt² Xit X′jt]).[13]
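The p-value computation can be sketched as follows: draw M samples of the scaled coefficient vector from its limit distribution under H0, take the maximum of squares in each draw, and report the exceedance frequency. The covariance matrix V and the observed statistic below are hypothetical placeholders; in practice V is estimated from the data (the robust version uses Λij = E[ϵt² Xit X′jt]).

```python
import numpy as np

rng = np.random.default_rng(5)
h, M = 10, 5000

# Hypothetical limit covariance of [sqrt(n)*beta_1, ..., sqrt(n)*beta_h] under H0;
# an equicorrelated matrix is used purely for illustration.
V = 0.3 * np.ones((h, h)) + 0.7 * np.eye(h)

draws = rng.multivariate_normal(np.zeros(h), V, size=M)
T_sim = (draws ** 2).max(axis=1)               # M draws of the null max statistic

T_obs = 4.0                                    # hypothetical observed statistic
p_value = (T_sim >= T_obs).mean()              # simulated p-value
```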
For comparison, we also run the full regression model yt = x′t β + ut and perform asymptotic and bootstrapped Wald tests. The asymptotic Wald test is the well-known χ²-test with h degrees of freedom. For the bootstrapped Wald test, we generate B = 1000 bootstrap samples based on Goncalves and Kilian's (2004, Sec. 3.1) recursive-design wild bootstrap with standard normal innovations. An advantage of this bootstrap is that it is robust to conditional heteroskedasticity of unknown form.
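For intuition, a minimal bootstrapped Wald test might look as follows. This is a fixed-design wild bootstrap with a homoskedastic Wald form, kept deliberately simple; the paper uses Goncalves and Kilian's (2004) recursive-design version, which additionally regenerates the lagged regressors recursively. All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, B = 200, 5, 499
X = rng.standard_normal((n, h))
y = rng.standard_normal(n)                     # data generated under H0: b = 0

XtX = X.T @ X
bhat = np.linalg.solve(XtX, X.T @ y)
resid = y - X @ bhat

def wald(b, r):
    """Homoskedastic Wald statistic b' (s^2 (X'X)^{-1})^{-1} b = b' X'X b / s^2."""
    s2 = (r @ r) / (n - h)
    return float(b @ XtX @ b) / s2

W = wald(bhat, resid)

# Wild bootstrap: rescale residuals by i.i.d. N(0,1) draws and re-estimate.
exceed = 0
for _ in range(B):
    ystar = X @ bhat + resid * rng.standard_normal(n)
    bstar = np.linalg.solve(XtX, X.T @ ystar)
    rstar = ystar - X @ bstar
    exceed += wald(bstar - bhat, rstar) > W    # recenter at bhat under the bootstrap
p_value = exceed / B
```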
For each test, we draw J = 1000 Monte Carlo samples and compute rejection frequencies
with respect to nominal size α = 0.05.
4.1.2 Simulation results
See Table 1 for rejection frequencies with ρ = 0 and Table 2 for rejection frequencies with ρ = 0.3. We refrain from reporting results with ρ = 0.6 since they are qualitatively similar to those with ρ = 0.3.[14]
Focusing on DGP-1, the asymptotic Wald test suffers from severe size distortions. See, for
example, the case with (ρ, h, n) = (0, 200, 2000), where the empirical size of the asymptotic Wald
test is 0.320 (Table 1). The max test and the bootstrapped Wald test are correctly sized for all
cases.
We now compare the empirical power of the max test and the bootstrapped Wald test. When ρ = 0, the bootstrapped Wald test is slightly more powerful than the max test in most cases. See, for example, DGP-2 with (ρ, h, n) = (0, 100, 5000), where the empirical power is 0.234 for the max test and 0.339 for the Wald test (Table 1).

[13] We omit the max test with the assumption of conditional homoskedasticity (i.e. Λij = σ²E[Xit X′jt]), since it produces almost the same rejection frequencies as Λij = E[ϵt² Xit X′jt] for all cases considered (see (2.8), (2.9), and Remark 2.2).

[14] The results with ρ = 0.6 are available upon request.
When ρ ∈ {0.3, 0.6}, the max test dominates the Wald test. The empirical power of the max test is sometimes more than three times as high as that of the Wald test. See, for example, DGP-4 with (ρ, h, n) = (0.3, 100, 1000), where the empirical power is 0.815 for the max test and 0.227 for the Wald test (Table 2). We observe similarly overwhelming results in many other cases with ρ ∈ {0.3, 0.6}.

An intuitive explanation for the remarkably high power of the max test is as follows. Performing the Wald test requires the computation of Γ̂ = (1/n) ∑_{t=1}^{n} xt x′t. The resulting Γ̂ becomes an increasingly imprecise estimator of Γ as the dimension h grows or the correlation among the regressors strengthens. The bootstrapped Wald test tends to lose finite sample power in those situations. The max test, in contrast, bypasses the computation of Γ̂ since it only requires the scalars Γ̂ii = (1/n) ∑_{t=1}^{n} xit² for i = 1, . . . , h. Since each Γ̂ii has a much smaller dimension than Γ̂, the accuracy of estimation improves considerably. Hence the max test retains high power when the dimension h or the correlation within xt is relatively large. High dimensionality and strong correlation among regressors (i.e. multicollinearity) are indeed an issue in many economic applications. The max test is a promising alternative to the Wald test in those challenging cases.
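This intuition is easy to check numerically: with equicorrelated regressors, the diagonal entries of Γ̂ (all the max test needs) remain precise, while the full matrix the Wald test must invert becomes badly conditioned. The sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, h, rho = 500, 100, 0.6

# Equicorrelation matrix: unit diagonal, off-diagonal entries rho.
Gamma = np.full((h, h), rho)
np.fill_diagonal(Gamma, 1.0)
x = rng.standard_normal((n, h)) @ np.linalg.cholesky(Gamma).T

Gamma_hat = x.T @ x / n                        # full matrix needed by the Wald test
max_diag_err = np.abs(np.diag(Gamma_hat) - 1.0).max()   # error of the h scalars Gamma_ii
cond_full = np.linalg.cond(Gamma_hat)          # conditioning of the full estimate
cond_pop = (1 + (h - 1) * rho) / (1 - rho)     # population condition number
```

Each diagonal element is a simple sample second moment and stays close to 1, whereas the condition number of the full sample matrix exceeds even the already large population value, so inverting it is numerically delicate.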
4.2 MIDAS framework
4.2.1 Simulation design
We slightly modify the simulation design of Section 4.1 in order to set up a MIDAS framework.
Suppose that there are one high frequency variable xH and one low frequency variable xL, and
that the ratio of their sampling frequencies is m. Assume that xH follows a univariate high
frequency AR(1):

xH(τ, i) = ϕ · xH(τ, i − 1) + ϵH(τ, i),   τ = 1, . . . , n,   i = 1, . . . , m,

where ϕ ∈ {0.8, 0.95} and ϵH(τ, i) ∼ i.i.d. N(0, 1). The choice of ϕ ∈ {0.8, 0.95} is reasonable since many economic and financial time series, especially those observed at a very high frequency, exhibit substantial persistence in levels or even after taking first differences.
The DGP of xL is given by

xL(τ) = ∑_{i=1}^{h} bi xH(τ − 1, m + 1 − i) + ϵL(τ),   τ = 1, . . . , n,

where ϵL(τ) ∼ i.i.d. N(0, 1). The dimension of the regressors is h ∈ {10, 100, 200}, and the low frequency sample size is n ∈ {10h, 50h}.

The true coefficients b = [b1, . . . , bh]′ represent (unidirectional) high-to-low Granger causality. We consider four causal patterns:
DGP-1 bi = 0 for all i ∈ {1, . . . , h}.

DGP-2 bi = I(i ≤ h/10) × 10(20h)−1 for all i ∈ {1, . . . , h}.

DGP-3 bi = I(i ≤ h/2) × 2(20h)−1 for all i ∈ {1, . . . , h}.

DGP-4 bi = (20h)−1 for all i ∈ {1, . . . , h}.
DGP-1 is considered in order to investigate the empirical size of each causality test. The first 10% of the h high frequency lags of xH have nonzero impacts on xL under DGP-2; the first half have nonzero impacts under DGP-3; all lags have nonzero impacts under DGP-4. For each of the latter three DGPs, the magnitude of causality sums to ∑_{i=1}^{h} bi = 1/20.[15]
We choose m = 12 since it occurs in many empirical applications involving (i) monthly and
annual data and (ii) weekly and quarterly data. In extra simulations not reported here, we also
considered other common cases of m = 3 (i.e. monthly and quarterly mixture) and m = 22 (i.e.
daily and monthly mixture). We found that rejection frequencies are not sensitive to the choice
of m given our simulation design.
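The data generation in this design can be sketched as follows, using flat chronological indexing for the high frequency series so that xH(τ − 1, m + 1 − i) is simply the i-th most recent high frequency observation before low frequency period τ. The sizes here are smaller than those in the simulations, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, m, n, h = 0.8, 12, 100, 10
b = np.full(h, 1 / (20 * h))                   # DGP-4 pattern, sums to 1/20

# High frequency AR(1), generated in flat (chronological) indexing.
T = (n + 1) * m                                # one extra LF period for initial lags
xH = np.zeros(T)
for t in range(1, T):
    xH[t] = phi * xH[t - 1] + rng.standard_normal()

# Low frequency series driven by the h most recent high frequency lags.
xL = np.empty(n)
for tau in range(n):
    start = (tau + 1) * m                      # first HF index of LF period tau
    lags = xH[start - 1 : start - h - 1 : -1]  # xH(tau-1, m+1-i), i = 1..h
    xL[tau] = b @ lags + rng.standard_normal()
```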
We test H0 : b = 0 via the max test without the assumption of conditional homoskedasticity. We run the parsimonious regression models xL(τ) = βi xH(τ − 1, m + 1 − i) + uL,i(τ) for i = 1, . . . , h, and compute the max test statistic Tn = max{(√n βn1)², . . . , (√n βnh)²}. A p-value is computed with M = 5000 draws from the asymptotic distribution under H0.
For comparison, we also run the full regression model xL(τ) = ∑_{i=1}^{h} βi xH(τ − 1, m + 1 − i) + uL(τ) and perform asymptotic and bootstrapped Wald tests. For the bootstrapped Wald test, we generate B = 1000 bootstrap samples based on Goncalves and Kilian's (2004) recursive-design wild bootstrap with standard normal innovations.
For each test, we draw J = 1000 Monte Carlo samples and compute rejection frequencies
with respect to nominal size α = 0.05.
4.2.2 Simulation results
See Table 3 for rejection frequencies with ϕ = 0.8 and Table 4 for ϕ = 0.95. Our results are essentially similar to those in the generic framework (Section 4.1). First, the asymptotic Wald test suffers from severe size distortions. Second, the max test and the bootstrapped Wald test are correctly sized in all cases.
Third, and importantly, the max test becomes increasingly more powerful than the bootstrapped Wald test as the persistence in xH increases. When ϕ = 0.8, the max test is roughly as powerful as the Wald test in many cases and much more powerful in some cases. See, for example, DGP-2 with (ϕ, h, n) = (0.8, 100, 5000), where the empirical power is 0.856 for the max test and 0.349 for the Wald test (Table 3). When ϕ = 0.95, the max test dominates the Wald test. The empirical power of the max test is sometimes more than four times as high as that of the Wald test. See, for example, DGP-3 with (ϕ, h, n) = (0.95, 200, 2000), where the empirical power is 0.794 for the max test and 0.192 for the Wald test (Table 4). We observe equally overwhelming results in many other cases.

[15] One might be tempted to consider an extra case where, for example, the last 10% of the h high frequency lags of xH have nonzero impacts on xL. Rejection frequencies in such a case are virtually identical to those under DGP-2, since xH follows a simple AR(1) process in our setting.
4.3 Empirical framework
In this section, we consider a more specific MIDAS framework which resembles our empirical
application in Section 5. Consider a low frequency variable xL and a high frequency variable
xH with m = 12. In our empirical application, xL is quarterly U.S. GDP growth and xH is
the weekly interest rate spread in the U.S. Assume that xH follows a univariate high frequency AR(1): xH(τ, i) = ϕH · xH(τ, i − 1) + ϵH(τ, i) with ϵH(τ, i) ∼ i.i.d. N(0, 1). The main DGP is xL(τ) = ϕL · xL(τ − 1) + ∑_{i=1}^{24} bi xH(τ − 1, m + 1 − i) + ϵL(τ) with ϵL(τ) ∼ i.i.d. N(0, 1). We choose ϕH = 0.95 and ϕL = 0.85 to mimic the strong persistence of the interest rate spread and GDP growth. The high frequency lag length of h = 24 weeks also matches our empirical analysis.
As in Section 4.2, we consider four cases for b = [b1, . . . , b24]′. We assume that bi = 0 under
DGP-1; bi = (1/30)× I(i ≤ 3) under DGP-2; bi = (1/120)× I(i ≤ 12) under DGP-3; bi = 1/240
under DGP-4. The low frequency sample size is set to be n ∈ {80, 200} in line with our empirical
application. The smaller value of n = 80 quarters matches a rolling window analysis, while the
larger value of n = 200 quarters resembles a full sample analysis.
We perform the max test without the assumption of conditional homoskedasticity, the asymptotic Wald test, and the bootstrapped Wald test across J = 1000 Monte Carlo samples. The parsimonious regression models used for the max test are xL(τ) = α0i + ∑_{k=1}^{2} αki xL(τ − k) + βi xH(τ − 1, m + 1 − i) + uL,i(τ) for i ∈ {1, . . . , 24}. The full regression model used for the Wald tests is xL(τ) = α0 + ∑_{k=1}^{2} αk xL(τ − k) + ∑_{i=1}^{24} βi xH(τ − 1, m + 1 − i) + uL(τ). Residual diagnostics in the empirical section suggest that these are reasonable specifications given our data.
For comparison, we aggregate the high frequency variable as xH(τ) = xH(τ, m) and rerun the max and Wald tests. This aggregation scheme is called stock sampling since it picks the last high frequency observation in each low frequency time period τ. In our empirical application, we implement stock sampling since the term spread is a stock variable.
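Stock sampling is a one-line operation once the high frequency data are arranged by low frequency period; the flow (average) alternative below is shown only for contrast and is not part of the paper's design.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 12
xH = rng.standard_normal((n, m))               # xH(tau, i), tau = 1..n, i = 1..m

xH_stock = xH[:, -1]                           # stock sampling: last HF obs per period
xH_flow = xH.mean(axis=1)                      # flow (average) sampling, for contrast
```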
The parsimonious regression models used for the low frequency max test are xL(τ) = α0i + ∑_{k=1}^{2} αki xL(τ − k) + βi xH(τ − i) + uL,i(τ) for i ∈ {1, 2, 3}. The full regression model used for the low frequency Wald test is xL(τ) = α0 + ∑_{k=1}^{2} αk xL(τ − k) + ∑_{i=1}^{3} βi xH(τ − i) + uL(τ). Since the lag i is now in low frequency terms, a relatively small lag length hLF = 3 is enough to capture the causality from xH to xL. The nominal size is 5% for each test.
See Table 5 for rejection frequencies. Focusing on the mixed frequency models, the asymp-
totic Wald test suffers from severe size distortions as expected. Its empirical size is 0.491 for
n = 80 and 0.164 for n = 200. The max test and the bootstrapped Wald test have correct size
for both sample sizes.
The max test achieves much higher empirical power than the bootstrapped Wald test for each sample size and causal pattern. See, for example, DGP-3 with n = 80, where the empirical power is 0.538 for the max test and 0.180 for the bootstrapped Wald test. These results highlight the remarkable advantage of the max test in small samples with many parameters.
Focusing on the low frequency models, all tests, including the asymptotic Wald test, are correctly sized. That is not surprising since the hMF = 24 high frequency lags of xH are now replaced with hLF = 3 low frequency lags of the aggregated xH. Given our causal patterns, the max and Wald tests attain equally high power under DGP-2, DGP-3, and DGP-4. The stock aggregation of xH does not result in spurious non-causality under these DGPs.
5 Empirical application
As an empirical application, we test for Granger causality from a weekly term spread (i.e.
difference between long-term and short-term interest rates) to quarterly real GDP growth in the
United States. A decline in the interest rate spread has historically been regarded as a strong predictor of a recession, but recent events cast doubt on its use for such prediction.[16] Recall that, in 2005, the interest rate spread fell substantially due to a relatively constant long-term rate and a soaring short-term rate (known as "Greenspan's Conundrum"), yet a recession did not follow immediately. The subprime mortgage crisis started nearly two years later, in December 2007, and therefore may not be directly related to the 2005 plummet in the term spread. It is therefore of interest to test for causality from the term spread to GDP.
5.1 Data and preliminary analysis
We use seasonally-adjusted quarterly real GDP growth as a business cycle measure. In order
to remove potential seasonal effects remaining after seasonal adjustment, we use annual growth
(i.e. log xL(τ)− log xL(τ − 4)). The short and long term interest rates used for the term spread
are respectively the effective federal funds (FF) rate and 10-year Treasury constant maturity
rate. We aggregate each daily series into a weekly series by picking the last observation in each week (recall that interest rates are stock variables). The sample period covers January 5, 1962 to December 31, 2013, which contains 2,736 weeks or 208 quarters.[17]
Figure 1 shows time series plots of weekly FF rate, weekly 10-year rate, their spread (10Y−FF),
and quarterly GDP growth. The shaded areas represent recession periods defined by the Na-
tional Bureau of Economic Research (NBER). In the first half of the sample period, a sharp
decline of the spread seems to be immediately followed by a recession. In the second half of the
sample period, there appears to be a weaker association, and a larger time lag between a spread
drop and a recession.
See Table 6 for sample statistics. The 10-year rate is about one percentage point higher than the FF rate on average, while average GDP growth is 3.15%. It is interesting that both the short-term and long-term interest rates are positively skewed while their difference has a large negative skewness of −1.198. The term spread has a relatively large kurtosis of 5.611, whereas GDP growth has a smaller kurtosis of 3.543. Because of the large skewness and kurtosis, both the Kolmogorov-Smirnov (KS) test and the Anderson-Darling (AD) test reject the null hypothesis of normality for the FF rate, the 10-year rate, and their spread. The normality hypothesis for GDP growth is accepted by the KS test but rejected by the AD test.

[16] See Stock and Watson (2003) for a survey of the historical relationship between the term spread and the business cycle.

[17] All data are downloaded from the Federal Reserve Economic Data maintained by the Federal Reserve Bank of St. Louis.
In Table 7, we perform Phillips and Perron’s (1988) unit root tests for the weekly term spread
and quarterly GDP growth. We consider a regression model with a constant and drift as well as
a regression model with a constant, drift, and linear time trend. The number of lags included
in the Newey-West long-run variance estimator is l ∈ {2, 4}. The null hypothesis of a unit root is rejected at the 5% level for each variable, model specification, and lag length.[18] Hence the weekly term spread and quarterly GDP growth are likely to be stationary. Note, however, that they exhibit strong persistence: the estimated AR(1) parameter is approximately 0.95 for the spread and 0.87 for GDP growth. In view of our simulation results in Section 4, the max test is expected to be more powerful than the bootstrapped Wald test when the variables of interest are persistent.
The number of weeks contained in each quarter τ , denoted by m(τ), is not constant. In our
sample period, 13 quarters have 12 weeks each, 150 quarters have 13 weeks each, and 45 quarters
have 14 weeks each. While the max test can be applied with time-varying m(τ), we simplify
the analysis by taking a sample average at the end of each quarter, resulting in the following modified term spread {x*H(τ, i)}_{i=1}^{12}:

x*H(τ, i) = xH(τ, i) for i = 1, . . . , 11,   x*H(τ, 12) = (1/(m(τ) − 11)) ∑_{k=12}^{m(τ)} xH(τ, k).

This modification gives us a dataset with n = 208 quarters, m = 12, and therefore T = m × n = 2,496 weeks.
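A sketch of this modification follows; the function name is chosen for illustration, and the 13-week example quarter is hypothetical data.

```python
import numpy as np

def modify_spread(weeks, m_tau):
    """Collapse an m(tau)-week quarter to exactly 12 values: keep weeks 1-11 and
    replace week 12 by the average of weeks 12..m(tau), as in Section 5.1."""
    assert m_tau >= 12 and len(weeks) == m_tau
    out = np.empty(12)
    out[:11] = weeks[:11]                      # x*_H(tau, i) = x_H(tau, i), i <= 11
    out[11] = np.mean(weeks[11:])              # average of the last m(tau) - 11 weeks
    return out

q13 = modify_spread(np.arange(1.0, 14.0), 13)  # a hypothetical 13-week quarter
```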
5.2 Models and tests
Our main interest lies in a rolling window analysis, since we want to observe the dynamic evolution of Granger causality from the term spread to GDP. We set the window size to 80 quarters (i.e. 20 years), so that there are 129 windows in total. The first subsample covers the first quarter of 1962 through the fourth quarter of 1981 (written as 1962:I-1981:IV), the second covers 1962:II-1982:I, and the last covers 1994:I-2013:IV. As a supplemental analysis, we also perform a full sample analysis using all 208 quarters (i.e. 52 years).
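The window count follows from 208 − 80 + 1 = 129. A sketch using 0-based start indices:

```python
# Rolling windows of 80 quarters over 208 quarters give 208 - 80 + 1 = 129 windows.
n_total, window = 208, 80
windows = [(start, start + window) for start in range(n_total - window + 1)]

first = windows[0]     # corresponds to 1962:I-1981:IV in the paper's labeling
last = windows[-1]     # corresponds to 1994:I-2013:IV
```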
We use the same model specifications for both the full sample analysis and the rolling window analysis. First, the full regression model with mixed frequency (MF) data is specified as

xL(τ) = α0 + ∑_{k=1}^{2} αk xL(τ − k) + ∑_{i=1}^{24} βi x*H(τ − 1, 12 + 1 − i) + uL(τ).   (5.1)

[18] The unit root hypothesis is also rejected when the lag length is l = 8. The result with l = 8 is omitted to save space, but is available upon request.
We are regressing the quarterly GDP growth xL(τ) onto a constant, p = 2 quarters of lagged
GDP growth, and hMF = 24 weeks of lagged interest rate spread. We choose p = 2 since the
resulting residuals appear to be serially uncorrelated. We perform the Wald test based on model
(5.1). P-values are computed after generating B = 1000 bootstrap samples based on Goncalves
and Kilian’s (2004) recursive-design wild bootstrap with standard normal innovations. The
asymptotic test is not performed since it is subject to serious size distortions in view of our
simulation results.
Naturally, the parsimonious regression models with MF data are specified as

xL(τ) = α0i + ∑_{k=1}^{2} αki xL(τ − k) + βi x*H(τ − 1, 12 + 1 − i) + uL,i(τ),   i = 1, . . . , 24.   (5.2)
We perform the max test based on model (5.2). P-values are computed based on the robust
covariance matrix with M = 100000 draws from an approximation to the limit distribution
under non-causality.
For comparison, we also perform low frequency (LF) analysis by aggregating the weekly
term spread to the quarterly level. Since the term spread is a stock variable, we aggregate it as
x*H(τ) = x*H(τ, m). The full regression model with LF data is specified as

xL(τ) = α0 + ∑_{k=1}^{2} αk xL(τ − k) + ∑_{i=1}^{3} βi x*H(τ − i) + uL(τ).   (5.3)
This model has p = 2 quarters of lagged xL and hLF = 3 quarters of lagged x*H. We perform the Wald test based on model (5.3), again with Goncalves and Kilian's (2004) recursive-design wild bootstrap.
Similarly, the parsimonious regression models with LF data are specified as

xL(τ) = α0i + ∑_{k=1}^{2} αki xL(τ − k) + βi x*H(τ − i) + uL,i(τ),   i = 1, 2, 3.
We perform the max test based on this model with M = 100000 draws from an approximation
to the limit distribution under non-causality.
5.3 Empirical results
Figure 2 plots rolling window p-values for each causality test over the 129 windows. Unless
otherwise stated, the nominal size is fixed at 5%. All tests except for the MF Wald test find
significant causality from term spread to GDP in early periods. Significant causality is detected
up until 1979:I-1998:IV by the MF max test; up until 1975:III-1995:II by the LF max test;
up until 1974:III-1994:II by the LF Wald test. The MF max test has the longest period of
significant causality. These three tests all agree that there is non-causality in recent periods,
possibly reflecting some structural change in the middle of the entire sample.
The MF Wald test, in contrast, suggests that there is significant causality only after subsample 1990:IV-2010:III, which is somewhat counter-intuitive. This result may stem from parameter proliferation: the full regression model with mixed frequency data has many more parameters than any other model. The results of the MF max test thus seem more intuitive and preferable to those of the MF Wald test.
As a diagnostic check, we perform the Ljung-Box Q-test of serial uncorrelatedness of the
least squares residuals from the MF full regression model (5.1) and LF full regression model
(5.3) in order to check whether these models are well specified. Since the true innovations may
not be serially independent, we use Horowitz, Lobato, Nankervis and Savin’s (2006) double
blocks-of-blocks bootstrap with block size b ∈ {4, 10, 20}. The number of bootstrap samples is
{M1,M2} = {999, 249} for the first and second stages. We perform the Q-tests with 4, 8, or 12
lags for each window and model.
When the block size is b = 4, the null hypothesis of residual uncorrelatedness in the MF case
is rejected at the 5% level in only {13, 5, 1} windows out of 129 for {4, 8, 12} lags, suggesting
that the MF model is well specified. In the LF case, the null hypothesis is rejected at the 5%
level in {51, 23, 33} windows with {4, 8, 12} lags, suggesting that the LF model may not be well
specified. The MF model generally produces fewer rejections than the LF model under larger
block sizes b ∈ {10, 20}. Overall, the MF model seems to yield a better fit than the LF model
in terms of residual uncorrelatedness.
As a brief supplemental analysis, we perform a full sample analysis using all 208 quarters. We
consider the same model specifications and testing procedures as in the rolling window analysis
for easy comparison. Resulting p-values are 0.000 for the MF max test, 0.359 for the MF Wald
test, 0.000 for the LF max test, and 0.008 for the LF Wald test. Only the MF Wald test fails
to reject non-causality, the same result repeatedly observed in the first half of rolling windows.
It again suggests that the MF Wald test is less robust against parameter proliferation.
6 Conclusions
We propose a new test designed for a large set of zero restrictions in regression models. A classical Wald test may have poor finite sample performance when the number of zero restrictions is relatively large. Our proposed solution to the dimensionality problem is to split the key regressors across many parsimonious regression models. The i-th parsimonious regression model contains the small-dimensional common regressors and the i-th key regressor only, so that parameter proliferation is less of an issue. We then take the maximum of the squared estimators across all parsimonious regression models.
The asymptotic distribution of our max test statistic is non-standard under the null hypothesis, but an approximate p-value is readily available by drawing directly from an asymptotically valid approximation to that distribution. Under the alternative hypothesis, at least one of the key estimators has a nonzero probability limit under fairly weak conditions. The max test is therefore consistent.
After presenting the general theory of the max test, the paper focuses on mixed frequency
Granger causality as a prominent application involving many zero restrictions. Granger causality
from a high frequency variable to a low frequency variable is straightforward to handle. The
reverse direction poses a harder challenge in terms of deriving the asymptotic distribution under
the null, identifying the pseudo-true values of key parameters under the alternative, and keeping
parsimony under the null.
We perform Monte Carlo simulations in generic and MIDAS settings. The asymptotic Wald test leads to serious size distortions, while the max test and the bootstrapped Wald test have sharp size. The max test becomes increasingly more powerful than the bootstrapped Wald test as the dimension of the model or the correlation among regressors increases. In many cases, the max test is three or four times as powerful as the bootstrapped Wald test. An essential reason for the remarkably high power of the max test is that, unlike the Wald test, it bypasses the computation of the sample covariance matrix of all regressors.
As an empirical application, we test for Granger causality from a weekly interest rate spread
to real GDP growth in the U.S. The max test with mixed frequency data yields an intuitive
result that the interest rate spread causes GDP growth until the 1990s, after which causal-
ity vanishes. The Wald test with mixed frequency data yields a counter-intuitive result that
significant causality emerges only recently.
We close this paper by discussing a broad range of future applications of the max test. The max test can easily be generalized to an increasing number of parameters, and would therefore apply to, for example, nonparametric regression models using Fourier flexible forms (Gallant and Souza (1991)); Chebyshev, Laguerre, or Hermite polynomials (see e.g. Draper, Smith, and Pownell (1966)); and splines (Rice and Rosenblatt (1983), Friedman (1991)), where our test can detect redundant terms. Another application is a max-correlation white noise test for weakly dependent time series (see e.g. Xiao and Wu (2014), Hill and Motegi (2019a), and Hill and Motegi (2019b)). These are only a few of many examples involving a large, possibly infinite, set of zero restrictions. We leave them as areas of future research.
References
Andersen, N. T., and V. Dobric (1987): “The central limit theorem for stochastic processes,”Annals of Probability, 15, 164–177.
Berman, S. M. (1964): “Limit theorems for the maximum term in stationary sequences,”Annals of Mathematical Statistics, 35, 502–516.
22
Billingsley, P. (1961): "The Lindeberg-Levy theorem for martingales," Proceedings of the American Mathematical Society, 12, 788–792.

Davidson, R., and J. G. MacKinnon (2006): "The power of bootstrap and asymptotic tests," Journal of Econometrics, 133, 421–441.

Draper, N. R., H. Smith, and E. Pownell (1966): Applied Regression Analysis. Wiley, New York.

Dudley, R. M. (1978): "Central limit theorems for empirical measures," Annals of Probability, 6, 899–929.

Dufour, J. M., D. Pelletier, and E. Renault (2006): "Short run and long run causality in time series: Inference," Journal of Econometrics, 132, 337–362.

Dufour, J. M., and E. Renault (1998): "Short run and long run causality in time series: Theory," Econometrica, 66, 1099–1125.

Foroni, C., E. Ghysels, and M. Marcellino (2013): "Mixed-frequency vector autoregressive models," in VAR Models in Macroeconomics – New Developments and Applications: Essays in Honor of Christopher A. Sims, ed. by T. B. Fomby, L. Kilian, and A. Murphy, vol. 32, pp. 247–272. Emerald Group Publishing Limited.

Friedman, J. H. (1991): "Multivariate adaptive regression splines," Annals of Statistics, 19, 1–67.

Gallant, A. R., and G. Souza (1991): "On the asymptotic normality of Fourier flexible form estimates," Journal of Econometrics, 50, 329–353.

Ghysels, E. (2013): "Matlab Toolbox for mixed sampling frequency data analysis using MIDAS regression models," Available at http://www.mathworks.com/matlabcentral/fileexchange/45150-midas-matlab-toolbox.

Ghysels, E. (2016): "Macroeconomics and the reality of mixed frequency data," Journal of Econometrics, 193, 294–314.

Ghysels, E., J. B. Hill, and K. Motegi (2016): "Testing for Granger causality with mixed frequency data," Journal of Econometrics, 192, 207–230.

Ghysels, E., V. Kvedaras, and V. Zemlys (2016): "Mixed frequency data sampling regression models: The R package midasr," Journal of Statistical Software, 72, 1–35.

Ghysels, E., P. Santa-Clara, and R. Valkanov (2006): "Predicting volatility: Getting the most out of return data sampled at different frequencies," Journal of Econometrics, 131, 59–95.

Ghysels, E., A. Sinko, and R. Valkanov (2007): "MIDAS regressions: Further results and new directions," Econometric Reviews, 26, 53–90.

Gnedenko, B. V. (1943): "Sur la distribution limite du terme maximum d'une série aléatoire," Annals of Mathematics, 44, 423–453.

Gonçalves, S., and L. Kilian (2004): "Bootstrapping autoregressions with conditional heteroskedasticity of unknown form," Journal of Econometrics, 123, 89–120.
Götz, T. B., and A. Hecq (2014): "Nowcasting causality in mixed frequency vector autoregressive models," Economics Letters, 122, 74–78.

Götz, T. B., A. Hecq, and S. Smeekes (2016): "Testing for Granger causality in large mixed-frequency VARs," Journal of Econometrics, 193, 418–432.

Granger, C. W. J. (1969): "Investigating causal relations by econometric models and cross-spectral methods," Econometrica, 37, 424–438.

Hannan, E. J. (1974): "The uniform convergence of autocovariances," Annals of Statistics, 2, 803–806.

Hansen, B. (1996): "Inference when a nuisance parameter is not identified under the null hypothesis," Econometrica, 64, 413–430.

Hill, J. B. (2007): "Efficient tests of long-run causation in trivariate VAR processes with a rolling window study of the money-income relationship," Journal of Applied Econometrics, 22, 747–765.

Hill, J. B., and K. Motegi (2019a): "A max-correlation white noise test for weakly dependent time series," Econometric Theory, forthcoming.

Hill, J. B., and K. Motegi (2019b): "Testing the white noise hypothesis of stock returns," Economic Modelling, 76, 231–242.

Horowitz, J., I. N. Lobato, J. C. Nankervis, and N. E. Savin (2006): "Bootstrapping the Box-Pierce Q test: A robust test of uncorrelatedness," Journal of Econometrics, 133, 841–862.

Lütkepohl, H. (1993): "Testing for causation between two variables in higher-dimensional VAR models," in Studies in Applied Econometrics, ed. by H. Schneeweiß and K. Zimmermann, pp. 75–91. Physica-Verlag, Heidelberg.

Phillips, P. C. B., and P. Perron (1988): "Testing for a unit root in time series regression," Biometrika, 75, 335–346.

Rice, J., and M. Rosenblatt (1983): "Smoothing splines: Regression, derivatives and deconvolution," Annals of Statistics, 11, 141–156.

Stock, J. H., and M. W. Watson (2003): "Forecasting output and inflation: The role of asset prices," Journal of Economic Literature, 41, 788–829.

White, H. (2000): "A reality check for data snooping," Econometrica, 68, 1097–1126.

Xiao, H., and W. B. Wu (2014): "Portmanteau test and simultaneous inference for serial covariances," Statistica Sinica, 24, 577–600.
Technical Appendices
A Proofs of main results
In this section we prove Theorems 2.1–2.5. Proofs of Theorem 2.6 and Corollary 2.7 are omitted since they are straightforward.
A.1 Proof of Theorem 2.1
Recall from (2.6) that the $i$th parsimonious regression model is written as $y_t = X_{it}'\theta_i + u_{it}$, where $X_{it} = [z_{1t}, \ldots, z_{pt}, x_{it}]'$ and $\theta_i = [\alpha_i', \beta_i]' = [\alpha_{1i}, \ldots, \alpha_{pi}, \beta_i]'$. Let $\hat{\theta}_{ni} = [\hat{\alpha}_{ni}', \hat{\beta}_{ni}]' = [\hat{\alpha}_{n1i}, \ldots, \hat{\alpha}_{npi}, \hat{\beta}_{ni}]'$ be the least squares estimator of $\theta_i$.

In order to characterize the limit distribution of $\mathcal{T}_n = \max_{1 \le i \le h} (\sqrt{n}\hat{\beta}_{ni})^2$, we must show convergence of the finite dimensional distributions of $\{\sqrt{n}\hat{\beta}_{ni}\}_{i=1}^{h}$ and stochastic equicontinuity (e.g. Dudley (1978) and Andersen and Dobric (1987)). By discreteness of $i$, note that $\forall (\epsilon, \eta) > 0$ there exists $\delta \in (0,1)$ such that $\sup_{1 \le i, \tilde{i} \le h :\, |i - \tilde{i}| \le \delta} |\sqrt{n}\hat{\beta}_{ni} - \sqrt{n}\hat{\beta}_{n\tilde{i}}| = 0$ a.s. Therefore $\lim_{n \to \infty} P(\sup_{1 \le i, \tilde{i} \le h :\, |i - \tilde{i}| \le \delta} |\sqrt{n}\hat{\beta}_{ni} - \sqrt{n}\hat{\beta}_{n\tilde{i}}| > \eta) \le \epsilon$ for some $\delta > 0$, hence $\{\sqrt{n}\hat{\beta}_{ni}\}_{i=1}^{h}$ is stochastically equicontinuous.

Let $\beta = [\beta_1, \ldots, \beta_h]'$ and $\hat{\beta}_n = [\hat{\beta}_{n1}, \ldots, \hat{\beta}_{nh}]'$. Stack all parameters across the $h$ parsimonious regression models as $\theta = [\theta_1', \ldots, \theta_h']'$ and $\hat{\theta}_n = [\hat{\theta}_{n1}', \ldots, \hat{\theta}_{nh}']'$. Define an $h \times (p+1)h$ full row rank selection matrix $R$ such that $\hat{\beta}_n = R\hat{\theta}_n$. See (2.7) for the precise construction of $R$.

To prove Theorem 2.1, it suffices to show that $\sqrt{n}\hat{\beta}_n \xrightarrow{d} N(0, V)$ under $H_0 : b = 0$, where $V$ is defined in (2.8). The main statement that $\mathcal{T}_n = \max_{1 \le i \le h} (\sqrt{n}\hat{\beta}_{ni})^2 \xrightarrow{d} \max_{1 \le i \le h} \mathcal{N}_i^2$, where $\mathcal{N} = [\mathcal{N}_1, \ldots, \mathcal{N}_h]' \sim N(0, V)$, then follows instantly from the continuous mapping theorem since $\max\{\cdot\}$ is a continuous function.

Stack the underlying coefficients as $\theta_{0i} = [a_1, \ldots, a_p, b_i]'$ and $\theta_0 = [\theta_{01}', \ldots, \theta_{0h}']'$. It is sufficient to show that $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, S)$ under $H_0 : b = 0$, where $S$ is defined in (2.9). This result suffices since $\sqrt{n}\hat{\beta}_n = R \times \sqrt{n}(\hat{\theta}_n - \theta_0)$ under $H_0$ and $V = RSR'$ by construction.

Under $H_0$, we have that $\theta_{0i} = [a_1, \ldots, a_p, 0]'$ for $i = 1, \ldots, h$. Hence the $i$th parsimonious regression model includes the true DGP as $y_t = X_{it}'\theta_{0i} + \epsilon_t$. Define $\Gamma_{ij} = E[X_{it}X_{jt}']$, $\Lambda_{ij} = E[\epsilon_t^2 X_{it}X_{jt}']$, $\Sigma_{ij} = \Gamma_{ii}^{-1}\Lambda_{ij}\Gamma_{jj}^{-1}$, and $S = [\Sigma_{ij}]$ for $i, j \in \{1, \ldots, h\}$, as in (2.9).

Under Assumption 2.3, $X_t$ is stationary and ergodic. Coupled with square integrability, the ergodic theorem yields

$$\hat{\Gamma}_{ii} = \frac{1}{n}\sum_{t=1}^{n} X_{it}X_{it}' \xrightarrow{p} \Gamma_{ii}, \tag{A.1}$$

where $\Gamma_{ii}$ is a positive definite matrix under Assumption 2.2. By (A.1),

$$\sqrt{n}(\hat{\theta}_{ni} - \theta_{0i}) = \sqrt{n}\left(\sum_{t=1}^{n} X_{it}X_{it}'\right)^{-1}\sum_{t=1}^{n} X_{it}\epsilon_t = \Gamma_{ii}^{-1} \times \frac{1}{\sqrt{n}}\sum_{t=1}^{n} X_{it}\epsilon_t + o_p(1).$$

Pick any $\lambda = [\lambda_1', \ldots, \lambda_h']'$ with $\lambda_i \in \mathbb{R}^{p+1}$ and $\lambda_i'\lambda_i = 1$. Define $\mathcal{X}_t(\lambda) = \sum_{i=1}^{h} \lambda_i'\Gamma_{ii}^{-1}X_{it}$. We have

$$\lambda' \times \sqrt{n}(\hat{\theta}_n - \theta_0) = \sum_{i=1}^{h} \lambda_i'\sqrt{n}(\hat{\theta}_{ni} - \theta_{0i}) = \sum_{i=1}^{h} \lambda_i'\left(\Gamma_{ii}^{-1} \times \frac{1}{\sqrt{n}}\sum_{t=1}^{n} X_{it}\epsilon_t\right) + o_p(1)$$
$$= \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\left(\sum_{i=1}^{h} \lambda_i'\Gamma_{ii}^{-1}X_{it}\right)\epsilon_t + o_p(1) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \mathcal{X}_t(\lambda)\epsilon_t + o_p(1).$$

Observe that

$$E[\mathcal{X}_t(\lambda)^2\epsilon_t^2] = \sum_{i=1}^{h}\sum_{j=1}^{h} \lambda_i'\Gamma_{ii}^{-1}\Lambda_{ij}\Gamma_{jj}^{-1}\lambda_j = \lambda' S\lambda < \infty.$$

Under Assumptions 2.1–2.3, $\{\mathcal{X}_t(\lambda)\epsilon_t\}$ is a stationary, ergodic, square integrable martingale difference. Therefore Billingsley's (1961) central limit theorem applies to yield $(1/\sqrt{n})\sum_{t=1}^{n} \mathcal{X}_t(\lambda)\epsilon_t \xrightarrow{d} N(0, \lambda' S\lambda)$. By the Cramér–Wold theorem, $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, S)$. Hence

$$\sqrt{n}\hat{\beta}_n = R \times \sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, V)$$

and therefore

$$\mathcal{T}_n = \max\{(\sqrt{n}\hat{\beta}_{n1})^2, \ldots, (\sqrt{n}\hat{\beta}_{nh})^2\} \xrightarrow{d} \max\{\mathcal{N}_1^2, \ldots, \mathcal{N}_h^2\}. \quad \text{QED}$$
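The construction underlying $\mathcal{T}_n$ is easy to illustrate numerically. The sketch below is our own illustration, not the authors' code (all variable and function names are ours): it runs the $h$ parsimonious OLS regressions on simulated data satisfying $H_0 : b = 0$ and computes the max statistic.

```python
import numpy as np

def max_test_statistic(y, Z, X):
    """T_n = max_i (sqrt(n) * beta_ni)^2, where beta_ni is the OLS
    coefficient on the single key regressor x_it in the i-th
    parsimonious regression of y_t on [z_t', x_it]."""
    n, h = X.shape
    stats = []
    for i in range(h):
        W = np.column_stack([Z, X[:, i]])            # regressors [z_t', x_it]
        theta, *_ = np.linalg.lstsq(W, y, rcond=None)
        stats.append(n * theta[-1] ** 2)             # (sqrt(n) * beta_ni)^2
    return max(stats)

rng = np.random.default_rng(0)
n, p, h = 500, 2, 10
Z = rng.standard_normal((n, p))                      # regressors not under test
X = rng.standard_normal((n, h))                      # key regressors, split across models
y = Z @ np.array([0.5, -0.3]) + rng.standard_normal(n)   # H0: b = 0 holds
T_n = max_test_statistic(y, Z, X)
```

Under Theorem 2.1, $\mathcal{T}_n$ would then be compared against simulated quantiles of $\max_i \mathcal{N}_i^2$ with $\mathcal{N} \sim N(0, \hat{V}_n)$; that step is omitted here.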
A.2 Proof of Theorem 2.2
$\mathcal{T}_n = \max_{1 \le i \le h} (\sqrt{n}\hat{\beta}_{ni})^2$ operates on a discrete-valued stochastic function $g_n(i) \equiv \hat{\beta}_{ni}$. As shown in the proof of Theorem 2.1, this implies that weak convergence of $\{g_n(1), \ldots, g_n(h)\}$ is identical to convergence of the finite dimensional distributions of $\{g_n(1), \ldots, g_n(h)\}$. The proof of Hansen's (1996) Theorem 2 therefore carries over to the present theorem. QED
A.3 Proof of Theorem 2.3
Under Assumptions 2.1–2.3, $\{y_t, X_{it}, \epsilon_t\}$ are square integrable, stationary, and ergodic. Hence $\hat{\Gamma}_{ij} \xrightarrow{p} \Gamma_{ij}$ and $\hat{\theta}_{ni} \xrightarrow{p} (E[X_{it}X_{it}'])^{-1}E[X_{it}y_t] \equiv \theta_i^*$. Under $H_0$, the parsimonious regression model is identically $y_t = X_{it}'\theta_{0i} + \epsilon_t$ with $\theta_{0i} = [a_1, \ldots, a_p, 0]'$ for $i = 1, \ldots, h$. Hence $\hat{\theta}_{ni} \xrightarrow{p} \theta_{0i}$ and $\theta_i^* = \theta_{0i}$.

Combined with stationarity, ergodicity, and square integrability, $\hat{\Lambda}_{ij} \xrightarrow{p} \Lambda_{ij}^* \equiv E[(y_t - X_{it}'\theta_i^*)^2 X_{it}X_{jt}']$ with $\|\Lambda_{ij}^*\| < \infty$. Also, $\Lambda_{ij}^* = \Lambda_{ij}$ under $H_0$. Therefore $\hat{V}_n \xrightarrow{p} V \equiv RSR'$ under $H_0$. Under $H_1 : b \neq 0$, it follows that $\hat{V}_n \xrightarrow{p} RS^*R'$ with $S^* = [\Gamma_{ii}^{-1}\Lambda_{ij}^*\Gamma_{jj}^{-1}]_{i,j}$ and $\|RS^*R'\| < \infty$. QED
A.4 Proof of Theorem 2.4
Recall that the $i$th parsimonious regression model is $y_t = X_{it}'\theta_i + u_{it}$. In view of stationarity, square integrability, and ergodicity, the least squares estimator satisfies $\hat{\theta}_{ni} \xrightarrow{p} \theta_i^*$, where $\theta_i^* = (E[X_{it}X_{it}'])^{-1}E[X_{it}y_t]$. Recall from (2.2) that the DGP is $y_t = z_t'a + x_t'b + \epsilon_t$. Therefore

$$\theta_i^* = \left(E[X_{it}X_{it}']\right)^{-1}E[X_{it}(z_t'a + x_t'b + \epsilon_t)] = \left(E[X_{it}X_{it}']\right)^{-1}\left(E[X_{it}z_t']a + E[X_{it}x_t']b + E[X_{it}\epsilon_t]\right)$$
$$= \left(E[X_{it}X_{it}']\right)^{-1}\left(E[X_{it}z_t']a + E[X_{it}x_t']b\right) = \Gamma_{ii}^{-1}\left(E[X_{it}z_t']a + C_i b\right), \tag{A.2}$$

where the third equality holds by the mds assumption on $\epsilon_t$. Note that $z_t = [I_p, 0_{p \times 1}]X_{it}$ for $i \in \{1, \ldots, h\}$ by construction. Hence

$$E[X_{it}z_t'] = E[X_{it}X_{it}'] \times \begin{bmatrix} I_p \\ 0_{1 \times p} \end{bmatrix} = \Gamma_{ii} \times \begin{bmatrix} I_p \\ 0_{1 \times p} \end{bmatrix}. \tag{A.3}$$

Substitute (A.3) into (A.2) to get the desired result (2.10). QED
A.5 Proof of Theorem 2.5
Pick the last row of (2.10). The lower left block of $\Gamma_{ii}^{-1}$ is

$$-n_i^{-1}E[x_{it}z_t']\left(E[z_tz_t']\right)^{-1},$$

while the lower right block is simply $n_i^{-1}$, where

$$n_i = E[x_{it}^2] - E[x_{it}z_t']\left(E[z_tz_t']\right)^{-1}E[z_tx_{it}].$$

Hence, the last row of $\Gamma_{ii}^{-1}C_i$ appearing in (2.10) is $n_i^{-1}d_i'$, where

$$d_i = E[x_tx_{it}] - E[x_tz_t']\left(E[z_tz_t']\right)^{-1}E[z_tx_{it}]. \tag{A.4}$$

Impose $\beta^* = 0$ hereafter. Then $n_i^{-1}d_i'b = 0$ for any $i \in \{1, \ldots, h\}$ in view of (2.10). Since $n_i$ is a nonzero finite scalar for any $i \in \{1, \ldots, h\}$ by the nonsingularity of $E[X_{it}X_{it}']$, we have that $d_i'b = 0$ for all $i \in \{1, \ldots, h\}$. Stack these equations to get $Db = 0$, where $D = [d_1, \ldots, d_h]' \in \mathbb{R}^{h \times h}$. To prove the main statement that $b = 0$, it is sufficient to show that $D$ is nonsingular. Equation (A.4) implies that

$$D = E[x_tx_t'] - E[x_tz_t']\left(E[z_tz_t']\right)^{-1}E[z_tx_t'].$$

Now define

$$\Gamma = E\left[\begin{bmatrix} z_t \\ x_t \end{bmatrix}\begin{bmatrix} z_t' & x_t' \end{bmatrix}\right].$$

$\Gamma$ is trivially nonsingular by Assumption 2.2. Note that $D$ is the Schur complement of $\Gamma$ with respect to $E[z_tz_t']$. Therefore, by the classic argument of partitioned matrix inversion, $D$ is nonsingular, as desired. Thus $\beta^* = 0$ implies $b = 0$. QED
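The Schur complement step can be checked numerically: if the covariance matrix $\Gamma$ of $[z_t', x_t']'$ is positive definite, then $D$, its Schur complement with respect to $E[z_tz_t']$, is itself positive definite and hence nonsingular. A minimal sketch with an arbitrary positive definite $\Gamma$ (the matrix and names are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
p, h = 3, 4
A = rng.standard_normal((p + h, p + h))
Gamma = A @ A.T + (p + h) * np.eye(p + h)    # positive definite by construction

Ezz = Gamma[:p, :p]                          # E[z_t z_t'] block
Ezx = Gamma[:p, p:]                          # E[z_t x_t'] block
Exx = Gamma[p:, p:]                          # E[x_t x_t'] block

# D = E[x_t x_t'] - E[x_t z_t'] (E[z_t z_t'])^{-1} E[z_t x_t'],
# the Schur complement of Gamma with respect to E[z_t z_t']
D = Exx - Ezx.T @ np.linalg.inv(Ezz) @ Ezx

min_eig = np.linalg.eigvalsh(D).min()        # positive iff D is positive definite
```

A positive smallest eigenvalue confirms that $D$ is nonsingular, exactly as the partitioned-inverse argument asserts.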
B Double time indices
In this section, we introduce useful notation for handling mixed frequency data. Consider a high frequency variable $x_H$ and a low frequency variable $x_L$. Let $\tau \in \mathbb{Z}$ be a low frequency time period, and suppose that each low frequency time period contains $m$ high frequency time periods. In period $\tau$, we observe $\{x_H(\tau, 1), x_H(\tau, 2), \ldots, x_H(\tau, m), x_L(\tau)\}$ sequentially.
It is often useful to adopt a notational convention that allows the second argument of $x_H$ to be an arbitrary integer. It is understood, for example, that $x_H(\tau, 0) = x_H(\tau - 1, m)$, $x_H(\tau, -1) = x_H(\tau - 1, m - 1)$, and $x_H(\tau, m + 1) = x_H(\tau + 1, 1)$. In general, what we call a high frequency simplification operates as follows:

$$x_H(\tau, i) = \begin{cases} x_H\left(\tau - \left\lceil \frac{1-i}{m} \right\rceil,\; m\left\lceil \frac{1-i}{m} \right\rceil + i\right) & \text{if } i \le 0, \\[4pt] x_H\left(\tau + \left\lfloor \frac{i-1}{m} \right\rfloor,\; i - m\left\lfloor \frac{i-1}{m} \right\rfloor\right) & \text{if } i \ge m + 1. \end{cases} \tag{B.1}$$

Here $\lceil x \rceil$ is the smallest integer not smaller than $x$, while $\lfloor x \rfloor$ is the largest integer not larger than $x$. By applying the high frequency simplification, any integer in the second argument of $x_H$ is transformed into a natural number between 1 and $m$, with the first argument modified appropriately. One can indeed verify that $m\lceil (1-i)/m \rceil + i \in \{1, \ldots, m\}$ when $i \le 0$, and $i - m\lfloor (i-1)/m \rfloor \in \{1, \ldots, m\}$ when $i \ge m + 1$.

Since the high frequency simplification allows both arguments of $x_H$ to be any integer, the following low frequency simplification is well defined:

$$x_H(\tau - \tau', i) = x_H(\tau, i - m\tau'), \quad \forall \tau, \tau', i \in \mathbb{Z}. \tag{B.2}$$

Equation (B.2) states that any lag or lead $\tau'$ appearing in the first argument can be removed by modifying the second argument appropriately. The second argument may then fall below 1 or above $m$, but such cases are covered by the high frequency simplification (B.1). Equations (B.1) and (B.2) are of practical use when writing DGPs or regression models involving mixed frequency data.
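A small helper makes the two simplification rules concrete. The sketch below (the function name is ours) maps any index pair $(\tau, i)$ to its normalized form with second argument in $\{1, \ldots, m\}$, implementing (B.1); applying it after subtracting $m\tau'$ from the second argument implements (B.2).

```python
import math

def simplify(tau, i, m):
    """Normalize x_H(tau, i) so the second argument lies in {1, ..., m},
    following the high frequency simplification (B.1)."""
    if i <= 0:
        k = math.ceil((1 - i) / m)          # shift back k low frequency periods
        return tau - k, m * k + i
    if i >= m + 1:
        k = math.floor((i - 1) / m)         # shift forward k low frequency periods
        return tau + k, i - m * k
    return tau, i                           # already in {1, ..., m}

# Examples from the text, with m = 12:
print(simplify(5, 0, 12))    # x_H(5, 0)  = x_H(4, 12)
print(simplify(5, -1, 12))   # x_H(5, -1) = x_H(4, 11)
print(simplify(5, 13, 12))   # x_H(5, 13) = x_H(6, 1)
```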
Table 1: Rejection frequencies under generic framework ($\rho = 0$)

DGP-1: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = 0$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .029      .032        .040       .051        .049       .052
    Wald (asy)     .110      .051        .193       .081        .320       .084
    Wald (boot)    .054      .043        .062       .054        .061       .063

DGP-2: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = I(i \le h/10) \times 10(5h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .175      .949        .057       .234        .049       .128
    Wald (asy)     .306      .896        .302       .410        .393       .339
    Wald (boot)    .199      .874        .088       .339        .082       .275

DGP-3: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = I(i \le h/2) \times 2(5h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .051      .145        .049       .073        .051       .052
    Wald (asy)     .155      .225        .227       .133        .335       .111
    Wald (boot)    .080      .206        .075       .098        .067       .084

DGP-4: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = (5h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .048      .084        .038       .065        .041       .045
    Wald (asy)     .118      .130        .205       .096        .307       .089
    Wald (boot)    .069      .120        .050       .079        .045       .072

Rejection frequencies of the max test without the assumption of conditional homoskedasticity, the asymptotic Wald test, and the bootstrapped Wald test across J = 1000 Monte Carlo samples. The dimension of the regressors is $h \in \{10, 100, 200\}$, and the sample size is $n \in \{10h, 50h\}$. Any pair of regressors has correlation coefficient $\rho = 0$. The nominal size is 5%.
Table 2: Rejection frequencies under generic framework ($\rho = 0.3$)

DGP-1: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = 0$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .031      .051        .046       .054        .047       .033
    Wald (asy)     .101      .052        .210       .085        .320       .083
    Wald (boot)    .066      .050        .043       .058        .062       .061

DGP-2: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = I(i \le h/10) \times 10(5h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .232      .949        .833      1.000        .992      1.000
    Wald (asy)     .320      .879        .537       .994        .748      1.000
    Wald (boot)    .235      .877        .258       .991        .368      1.000

DGP-3: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = I(i \le h/2) \times 2(5h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .137      .609        .813      1.000        .987      1.000
    Wald (asy)     .203      .484        .489       .980        .730      1.000
    Wald (boot)    .123      .464        .210       .970        .306       .998

DGP-4: $y_t = \sum_{i=1}^{h} b_i x_{it} + \epsilon_t$ with $b_i = (5h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .113      .582        .815      1.000        .993      1.000
    Wald (asy)     .175      .436        .527       .978        .715      1.000
    Wald (boot)    .106      .420        .227       .963        .313      1.000

Rejection frequencies of the max test without the assumption of conditional homoskedasticity, the asymptotic Wald test, and the bootstrapped Wald test across J = 1000 Monte Carlo samples. The dimension of the regressors is $h \in \{10, 100, 200\}$, and the sample size is $n \in \{10h, 50h\}$. Any pair of regressors has correlation coefficient $\rho = 0.3$. The nominal size is 5%.
Table 3: Rejection frequencies under MIDAS framework ($\phi = 0.8$)

DGP-1: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = 0$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .042      .033        .040       .050        .057       .048
    Wald (asy)     .108      .048        .217       .075        .301       .074
    Wald (boot)    .068      .040        .047       .056        .041       .046

DGP-2: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = I(i \le h/10) \times 10(20h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .073      .270        .163       .856        .157       .870
    Wald (asy)     .131      .208        .296       .409        .414       .407
    Wald (boot)    .075      .187        .097       .349        .084       .312

DGP-3: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = I(i \le h/2) \times 2(20h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .070      .252        .081       .217        .059       .163
    Wald (asy)     .141      .165        .239       .130        .308       .142
    Wald (boot)    .081      .149        .060       .098        .056       .092

DGP-4: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = (20h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .076      .204        .052       .123        .046       .095
    Wald (asy)     .122      .123        .253       .121        .317       .103
    Wald (boot)    .069      .104        .064       .087        .066       .060

Rejection frequencies of the max test without the assumption of conditional homoskedasticity, the asymptotic Wald test, and the bootstrapped Wald test across J = 1000 Monte Carlo samples. The dimension of the regressors is $h \in \{10, 100, 200\}$, and the low frequency sample size is $n \in \{10h, 50h\}$. $x_H$ follows a univariate high frequency AR(1): $x_H(\tau, i) = \phi \times x_H(\tau, i - 1) + \epsilon_H(\tau, i)$ with $\phi = 0.8$. The nominal size is 5%.
Table 4: Rejection frequencies under MIDAS framework ($\phi = 0.95$)

DGP-1: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = 0$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .043      .039        .044       .042        .037       .051
    Wald (asy)     .121      .045        .222       .086        .306       .075
    Wald (boot)    .067      .035        .062       .060        .064       .046

DGP-2: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = I(i \le h/10) \times 10(20h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .274      .910        .969      1.000        .999      1.000
    Wald (asy)     .241      .701        .663      1.000        .872      1.000
    Wald (boot)    .150      .684        .386      1.000        .536      1.000

DGP-3: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = I(i \le h/2) \times 2(20h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .258      .915        .800      1.000        .794      1.000
    Wald (asy)     .231      .649        .486       .986        .592       .978
    Wald (boot)    .147      .627        .206       .979        .192       .951

DGP-4: $x_L(\tau) = \sum_{i=1}^{h} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$, $b_i = (20h)^{-1}$

                       h = 10                h = 100                h = 200
                 n = 100   n = 500    n = 1000   n = 5000    n = 2000   n = 10000
    Max            .258      .870        .550       .999        .451      1.000
    Wald (asy)     .232      .605        .376       .835        .464       .733
    Wald (boot)    .151      .573        .129       .790        .113       .645

Rejection frequencies of the max test without the assumption of conditional homoskedasticity, the asymptotic Wald test, and the bootstrapped Wald test across J = 1000 Monte Carlo samples. The dimension of the regressors is $h \in \{10, 100, 200\}$, and the low frequency sample size is $n \in \{10h, 50h\}$. $x_H$ follows a univariate high frequency AR(1): $x_H(\tau, i) = \phi \times x_H(\tau, i - 1) + \epsilon_H(\tau, i)$ with $\phi = 0.95$. The nominal size is 5%.
Table 5: Rejection frequencies under empirical framework

Mixed frequency models

                     DGP-1             DGP-2             DGP-3             DGP-4
                 n = 80  n = 200   n = 80  n = 200   n = 80  n = 200   n = 80  n = 200
    Max           .042     .049     .557     .949     .538     .948     .486     .900
    Wald (asy)    .491     .164     .791     .838     .743     .766     .666     .678
    Wald (boot)   .043     .047     .218     .643     .180     .523     .174     .465

Low frequency models

                     DGP-1             DGP-2             DGP-3             DGP-4
                 n = 80  n = 200   n = 80  n = 200   n = 80  n = 200   n = 80  n = 200
    Max           .044     .045     .533     .955     .455     .890     .381     .868
    Wald (asy)    .066     .053     .584     .946     .506     .871     .424     .846
    Wald (boot)   .052     .005     .506     .939     .427     .860     .356     .836

Rejection frequencies of the max test without the assumption of conditional homoskedasticity, the asymptotic Wald test, and the bootstrapped Wald test across J = 1000 Monte Carlo samples. $x_H$ follows a univariate high frequency AR(1): $x_H(\tau, j) = 0.95 \times x_H(\tau, j - 1) + \epsilon_H(\tau, j)$. The main DGP is $x_L(\tau) = 0.85 \times x_L(\tau - 1) + \sum_{i=1}^{24} b_i x_H(\tau - 1, m + 1 - i) + \epsilon_L(\tau)$ with $m = 12$, where $b_i = 0$ under DGP-1; $b_i = (1/30) \times I(i \le 3)$ under DGP-2; $b_i = (1/120) \times I(i \le 12)$ under DGP-3; $b_i = 1/240$ under DGP-4. The sample size in terms of low frequency is $n \in \{80, 200\}$. Parsimonious regression models used for the MF max test are $x_L(\tau) = \alpha_{0i} + \sum_{k=1}^{2} \alpha_{ki} x_L(\tau - k) + \beta_i x_H(\tau - 1, m + 1 - i) + u_{L,i}(\tau)$ for $i \in \{1, \ldots, 24\}$. The full regression model used for the MF Wald test is $x_L(\tau) = \alpha_0 + \sum_{k=1}^{2} \alpha_k x_L(\tau - k) + \sum_{i=1}^{24} \beta_i x_H(\tau - 1, m + 1 - i) + u_L(\tau)$. Parsimonious regression models used for the LF max test are $x_L(\tau) = \alpha_{0i} + \sum_{k=1}^{2} \alpha_{ki} x_L(\tau - k) + \beta_i x_H(\tau - i, m) + u_{L,i}(\tau)$ for $i \in \{1, 2, 3\}$. The full regression model used for the LF Wald test is $x_L(\tau) = \alpha_0 + \sum_{k=1}^{2} \alpha_k x_L(\tau - k) + \sum_{i=1}^{3} \beta_i x_H(\tau - i, m) + u_L(\tau)$. The nominal size is 5%.
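The mixed frequency DGP described in the note above can be sketched as follows. This is a simplified illustration with our own function name and burn-in choice, not the authors' simulation code:

```python
import numpy as np

def simulate_empirical_dgp(n, m=12, h=24, phi=0.95, rho=0.85, b=None, seed=0):
    """Simulate x_H(tau, j) = phi * x_H(tau, j-1) + eps_H(tau, j) and
    x_L(tau) = rho * x_L(tau-1) + sum_i b_i x_H(tau-1, m+1-i) + eps_L(tau)."""
    rng = np.random.default_rng(seed)
    if b is None:
        b = np.zeros(h)                      # DGP-1: b_i = 0 (non-causality)
    burn = 50
    T = (n + burn) * m
    xH = np.zeros(T)                         # high frequency series, flattened
    for t in range(1, T):
        xH[t] = phi * xH[t - 1] + rng.standard_normal()
    xL = np.zeros(n + burn)
    for tau in range(2, n + burn):
        end = tau * m                        # position just past x_H(tau-1, m)
        lags = xH[end - h:end][::-1]         # x_H(tau-1, m+1-i) for i = 1..h
        xL[tau] = rho * xL[tau - 1] + lags @ b + rng.standard_normal()
    return xL[burn:], xH[burn * m:].reshape(n, m)

xL, xH = simulate_empirical_dgp(n=200, b=np.full(24, 1 / 240))   # DGP-4
```

Setting b to $(1/30) \times I(i \le 3)$ or $(1/120) \times I(i \le 12)$ instead reproduces DGP-2 and DGP-3, and b = 0 gives DGP-1.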
Table 6: Sample statistics of U.S. interest rates and real GDP growth
# Obs. Mean Median Stdev Min Max Skew Kurt p-KS p-AD
FF 2736 5.563 5.250 3.643 0.040 22.00 0.928 4.615 0.000 0.001
10Y 2736 6.555 6.210 2.734 1.470 15.68 0.781 3.488 0.000 0.001
SPR 2736 0.991 1.160 1.800 −9.570 4.440 −1.198 5.611 0.000 0.001
GDP 208 3.151 3.250 2.349 −4.100 8.500 −0.461 3.543 0.177 0.002
FF: Weekly effective federal funds rate. 10Y: Weekly 10-year Treasury constant maturity rate. SPR: Weekly term spread (10Y − FF). GDP: Percentage growth rate of quarterly GDP. For each variable, we report the number of observations, sample mean, median, standard deviation, minimum, maximum, skewness, kurtosis, the p-value of the Kolmogorov-Smirnov test for normality, and the p-value of the Anderson-Darling test for normality. The sample period covers January 5, 1962 through December 31, 2013.
Table 7: Results of Phillips-Perron unit root tests
Weekly term spread Quarterly GDP growth
Drift Drift + Trend Drift Drift + Trend
Lag l l = 2 l = 4 l = 2 l = 4 l = 2 l = 4 l = 2 l = 4
ϕn 0.958 0.958 0.952 0.952 0.878 0.878 0.867 0.867
se(ϕn) 0.006 0.006 0.006 0.006 0.032 0.032 0.034 0.034
Sn −5.849 −5.709 −6.381 −6.263 −4.591 −4.682 −4.760 −4.861
5% c.v. −2.861 −2.864 −3.414 −3.414 −2.876 −2.876 −3.434 −3.434
R/A Reject Reject Reject Reject Reject Reject Reject Reject
We perform Phillips and Perron’s (1988) unit root tests for weekly term spread and quarterly GDP
growth. We consider a regression model with a constant and drift (Drift) as well as a regression model
with a constant, drift, and linear time trend (Drift + Trend). The number of lags included in the Newey-
West long-run variance estimator is l ∈ {2, 4}. ϕn is an OLS estimate for the AR(1) parameter; se(ϕn)
is the standard error of ϕn; Sn is the test statistic; “5% c.v.” is the 5% critical value; “R/A” signifies the
rejection or acceptance of H0 : ϕ = 1. The sample period covers January 5, 1962 through December 31,
2013, which contains 2,736 weeks or 208 quarters.
Figure 1: Time series plots of U.S. interest rates and real GDP growth
[Figure: four time series panels. (a) Weekly federal funds rate; (b) Weekly 10-year bond yield; (c) Weekly yield spread; (d) Quarterly GDP growth.]
In this figure we present time series plots of (a) weekly effective federal funds rate, (b) weekly 10-year
Treasury constant maturity rate, (c) their spread (10Y−FF), and (d) quarterly real GDP growth from
previous year. The sample period covers January 5, 1962 through December 31, 2013, which contains
2,736 weeks or 208 quarters. The shaded areas represent recession periods defined by the National Bureau
of Economic Research (NBER).
Figure 2: P-values for tests of non-causality from interest rate spread to GDP
[Figure: four panels of rolling window p-values. (a) Mixed frequency max test; (b) Mixed frequency Wald test; (c) Low frequency max test; (d) Low frequency Wald test.]
Rolling window p-values based on (a) the mixed frequency max test, (b) the mixed frequency Wald test, (c) the
low frequency max test, and (d) the low frequency Wald test. The mixed frequency tests use weekly term spread
and quarterly GDP growth, while the low frequency tests use quarterly term spread and GDP growth. The
sample period covers January 5, 1962 through December 31, 2013, which contains 2,736 weeks or 208 quarters.
The window size is 80 quarters. Each point of the x-axis represents the beginning date of a given window. The
shaded area is [0, 0.05], hence any p-value in that range indicates rejection of non-causality from the interest rate
spread to GDP growth at the 5% level in a given window.