Uncovering regimes in predictive regressions: A model selection
based approach
Jean-Yves Pitarakis
University of Southampton
Economics Division
Southampton SO17 1BJ, UK.
March 2012
Abstract
A model selection based approach for detecting the presence of threshold effects in predictive
regressions with a highly persistent regressor and possible endogeneity is proposed. The method
is shown to lead to the selection of the true model asymptotically and to be flexible enough
to accommodate threshold variables that are either stationary or highly persistent, including
the predictor itself. An analytical approximation to the probability of wrongly selecting the
threshold model over the linear specification is also obtained across alternative penalty term
choices. An extensive simulation study documents its excellent finite sample properties and
an application to the predictability of stock returns with dividend payout ratios illustrates its
practical usefulness. Results indicate a strong degree of predictability that is present solely
during particular states of the economy.
Key Words: Predictive Regressions, Threshold, Model Selection, AIC, BIC.
JEL: C22, C32.
1 Financial support from the ESRC is gratefully acknowledged. Address for Correspondence: Jean-Yves Pitarakis,
University of Southampton, School of Social Sciences, Economics Division, Southampton, SO17 1BJ, United
Kingdom. Email: [email protected]. Tel: +44-23-80592631
1 Introduction
Predictive regressions refer to simple regression models in which future values of a stationary
variable are predicted with the current value of a highly persistent predictor. A widely studied
empirical example involves the predictability of stock returns with Dividend Yields, Dividend Payouts
or Price to Earnings ratios. Typically, such predictors have sample autocorrelations that cluster
close to one, while the variables being predicted are substantially noisier. An additional layer of
complication that arises in this setting is the endogeneity induced by the correlation between the
shocks to the regressand and past values of the predictors. Over the past twenty years a very
fruitful agenda in both the finance and econometrics literature has aimed at improving inferences
in such settings. Research in this area has mainly been concerned with improving the properties
of tests designed to detect the statistical significance of the estimated slope coefficients associated
with the persistent predictors (see Stambaugh (1999), Lewellen (2003), Campbell and Yogo (2006),
Jansson and Moreira (2006), Cochrane (2008) amongst numerous others).
More recently there has also been a recognition that such models may benefit from more flexible
structures that allow their parameters to change over time or according to economically meaningful
episodes (e.g. recessions versus expansions). One strand of this literature has sought to use
the existing toolkit for detecting structural breaks (e.g. Andrews' SupWald type tests, Andrews
(1993)) within this predictive regression setting (see Rapach and Wohar (2006), Paye and Timmermann
(2006) amongst others) while more recently Gonzalo and Pitarakis (2012) adapted the Sup
Wald based tests developed in the statistical threshold model literature (Hansen (1996), Caner and
Hansen (2001)) to the predictive regression framework, specifically taking into consideration the
highly persistent nature of the predictor variables and the presence of endogeneity. The shortcomings
of this test based approach are its reliance on nonstandard asymptotics and its occasional poor
finite sample performance when the sample sizes considered are small. Perhaps more importantly
existing methods are also not designed to handle cases where the regime switches are themselves
triggered by a highly persistent variable.
Borrowing from the early work on the use of model selection criteria for detecting the presence
of single and multiple threshold effects developed in a purely stationary and ergodic framework
with strict exogeneity in Gonzalo and Pitarakis (2002), we propose to adapt a similar approach
to a predictive regression setting with threshold effects. Unlike traditional test based methods an
important feature of our approach is that it is flexible enough to allow the threshold variable to
be either stationary or highly persistent (e.g. nearly integrated) and it may even be the persistent
regressor itself. This is of considerable importance in applied work if one wishes to explore the
formation of regimes that are driven by quantities such as interest rates or valuation ratios.
Introduced to the literature through the work of Tong (1983, 1990) and Tong and Lim (1980) (see
Hansen (2011), Tong (2011) and references therein) such models lead to a piecewise linear predictive
regression setting with the switch between regimes being triggered by an observable variable
exceeding or falling below an unknown threshold. As discussed in Gonzalo and Pitarakis (2012) a
particularly useful feature that such specifications may help capture is the notion that the strength
of predictability may alternate over time according to particular episodes.
The plan of the paper is as follows. Section 2 introduces our model and assumptions and
investigates the large sample properties of our model selection approach. Section 3 provides an
extensive simulation exercise demonstrating the theory as well as providing a practical overview
of how our approach is performing in finite samples. Section 4 applies our methodology to the
predictability of stock returns for both the aggregate US market as well as Value and Growth
portfolios. Section 5 concludes.
2 The Models and The Selection Procedure
Our goal is to use a model selection procedure to distinguish between the following linear predictive
regression model M1

$$y_{t+1} = \alpha + \beta x_t + e_{t+1} \tag{1}$$

and the predictive regression model with threshold effects M2 formulated as

$$y_{t+1} = (\alpha_1 + \beta_1 x_t) I(z_t \le \gamma) + (\alpha_2 + \beta_2 x_t) I(z_t > \gamma) + e_{t+1} \tag{2}$$

with $e_t$ denoting the random disturbance term, $z_t$ the threshold variable and $x_t$ the predictor.
Following common practice in the literature we parameterise $x_t$ as the nearly integrated process

$$x_t = \left(1 - \frac{c}{T}\right) x_{t-1} + v_t \tag{3}$$

with $v_t$ denoting another stationary disturbance and $c > 0$ the near integration parameter.
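As an illustration of the recursion in (3), the following minimal Python sketch simulates a nearly integrated predictor. The function names, the zero starting value, the Gaussian shocks and the optional AR(1) structure for the shocks are assumptions made for this illustration only.

```python
import numpy as np

def simulate_predictor(T, c, rng, phi_v=0.0):
    """Simulate x_t = (1 - c/T) x_{t-1} + v_t, with v_t = phi_v v_{t-1} + eps_t.

    Starting values are set to zero (an assumption of this sketch).
    """
    rho = 1.0 - c / T                      # local-to-unity autoregressive root
    eps = rng.standard_normal(T)
    v = np.zeros(T)
    x = np.zeros(T)
    for t in range(T):
        v[t] = (phi_v * v[t - 1] if t > 0 else 0.0) + eps[t]
        x[t] = (rho * x[t - 1] if t > 0 else 0.0) + v[t]
    return x

def first_order_autocorr(x):
    """Sample first-order autocorrelation of a series."""
    xc = x - x.mean()
    return float(xc[1:] @ xc[:-1] / (xc @ xc))

rng = np.random.default_rng(0)
x = simulate_predictor(T=1000, c=5.0, rng=rng)
print(first_order_autocorr(x))             # expected to lie close to one
```

With c fixed and T large the autoregressive root 1 - c/T is close to unity, which is what generates the sample autocorrelations near one that are typical of valuation ratios.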
Instead of proceeding with a traditional test based approach we here view the problem of testing
linearity within (2) as a model selection problem. The selection procedure is performed through the
optimisation of an objective function formulated as a standard information theoretic criterion such
as the AIC or BIC (Akaike (1973, 1974), Schwarz (1978)). The first component of the criterion is
a function of the residual variance estimated from each model, which is decreasing in the model
dimension, and its second component penalises the increase in the number of
parameters as we move from M1 to M2. More specifically we let

$$IC(\gamma) = \ln \hat{\sigma}^2(\gamma) + \frac{4 c_T}{T} \tag{4}$$

denote our model selection criterion associated with M2, which has 4 estimated parameters, and where
$c_T$ is the deterministic penalty term. When $c_T = 2$ we refer to (4) as an AIC type of criterion while
$c_T = \ln T$ corresponds to the BIC. Note that since we are not assuming that $\gamma$ is known, $IC(\gamma)$
is evaluated at all possible values of $\gamma \in \Gamma$. Here $\Gamma = [\underline{\gamma}, \overline{\gamma}]$ is a subset of the threshold variable
sample space and following common practice it is understood that $\underline{\gamma}$ and $\overline{\gamma}$ are set by trimming a
fixed fraction of the top and bottom of the sample of $z_t$.
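One possible way of constructing the set of candidate thresholds by trimming is sketched below. Using the sorted sample values of the threshold variable as the grid, and the name and default of the `trim` argument, are assumptions of this illustration rather than choices stated in the text.

```python
import numpy as np

def threshold_grid(z, trim=0.10):
    """Candidate thresholds: sorted sample values of z after discarding the
    bottom and top `trim` fraction of observations."""
    z_sorted = np.sort(np.asarray(z, dtype=float))
    n = z_sorted.size
    lo = int(np.floor(trim * n))            # index of the lower trimming point
    hi = int(np.ceil((1.0 - trim) * n))     # index just above the upper trimming point
    return z_sorted[lo:hi]

grid = threshold_grid(np.arange(100.0), trim=0.10)
print(grid.min(), grid.max(), grid.size)
```

Trimming guarantees that each regime contains a non-negligible share of the sample at every candidate threshold, so that both sets of regime coefficients remain estimable.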
We also write the model selection criterion associated with the linear specification as

$$IC = \ln \hat{\sigma}^2 + \frac{2 c_T}{T} \tag{5}$$

where $\hat{\sigma}^2$ now refers to the residual variance estimated from the linear model. Our model selection
rule can now be stated as leading to the selection of the linear model if

$$IC < \min_{\gamma \in \Gamma} IC(\gamma) \tag{6}$$

and to the choice of M2 otherwise, i.e. when there is a $\gamma \in \Gamma$ such that $IC > IC(\gamma)$. It is also convenient
to formulate the above rule as pointing to the choice of M2 if

$$\max_{\gamma \in \Gamma} T \ln\left(\hat{\sigma}^2 / \hat{\sigma}^2(\gamma)\right) > 2 c_T \tag{7}$$
which highlights the similarities between our model selection based approach and a conventional
test based approach. It is also important to point out that similar model selection based approaches
have been advocated in a wide range of other time series based contexts. In Chen and Gupta (1997)
for instance the authors designed a model selection based approach for detecting the presence of
structural breaks in the variance of a series. In Gonzalo and Pitarakis (1998) the authors introduced
a model selection approach for the specification of vector error correction models. Phillips
(2008) advocated the use of a model selection based approach for distinguishing between a stationary
model and a random walk. In Gonzalo and Pitarakis (2002) the authors proposed a model
selection approach for the determination of the number of thresholds in a multiple threshold model.
Unlike our present context however Gonzalo and Pitarakis (2002) restricted their setting to strictly
stationary and ergodic specifications.
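The selection rule can be sketched in a few lines of Python. Subtracting the threshold criterion from the linear one shows that IC > IC(γ) holds exactly when T ln(σ̂²/σ̂²(γ)) > 2cT, so comparing the two criteria and checking the likelihood ratio type statistic against 2cT are the same decision. The function names, the grid of sorted sample values and the simulated data (a random walk standing in for the persistent predictor, with an intercept shift of 3) are assumptions of this illustration.

```python
import numpy as np

def resid_var(y, X):
    """Residual variance (SSR / T) from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid) / y.size

def select_model(y_next, x, z, c_T, trim=0.10):
    """Compare the linear and threshold criteria; return the choice and diagnostics."""
    T = y_next.size
    ones = np.ones(T)
    ic_lin = np.log(resid_var(y_next, np.column_stack([ones, x]))) + 2.0 * c_T / T
    grid = np.sort(z)[int(trim * T): int((1.0 - trim) * T)]
    best_ic, best_gamma = np.inf, np.nan
    for gamma in grid:
        i1 = (z <= gamma).astype(float)
        X = np.column_stack([i1, x * i1, 1.0 - i1, x * (1.0 - i1)])
        ic = np.log(resid_var(y_next, X)) + 4.0 * c_T / T
        if ic < best_ic:
            best_ic, best_gamma = ic, gamma
    choice = "M1" if ic_lin < best_ic else "M2"
    return choice, ic_lin, best_ic, best_gamma

# Illustrative data with a large intercept shift at gamma = 0 (hypothetical values).
rng = np.random.default_rng(42)
T = 400
x = np.cumsum(rng.standard_normal(T))      # random walk used as a stand-in predictor
z = rng.standard_normal(T)
y_next = np.where(z <= 0.0, 1.0 + 0.25 * x, 4.0 + 0.25 * x) + rng.standard_normal(T)
choice, ic_lin, ic_thr, gamma_hat = select_model(y_next, x, z, c_T=np.log(T))
print(choice, round(gamma_hat, 3))
```

Because the penalty does not depend on γ, minimising IC(γ) over the grid is the same as minimising the residual variance, so the minimiser also serves as a least squares type estimate of the threshold.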
Before proceeding further we summarise our operating framework and assumptions. For notational
simplicity we rewrite (1) and (2) as $y = X\beta + e$ and $y = X_1\beta_1 + X_2\beta_2 + e$. Here $X$
stacks the elements of $(1 \;\; x_t)$ while $X_1$ and $X_2$ stack $(I_{1t} \;\; x_t I_{1t})$ and $(I_{2t} \;\; x_t I_{2t})$ respectively, with $I_{1t} \equiv I(z_t \le \gamma)$ and
$I_{2t} \equiv I(z_t > \gamma)$. The dependence of $X_1$ and $X_2$ on $\gamma$ is omitted for notational simplicity.
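The stacked notation can be made concrete with a small sketch that builds the two regime-specific regressor matrices for a given threshold. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def regime_design(x, z, gamma):
    """Return (X1, X2): the regressors (1, x_t) interacted with
    I(z_t <= gamma) and I(z_t > gamma) respectively."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    i1 = (z <= gamma).astype(float)
    i2 = 1.0 - i1
    X1 = np.column_stack([i1, x * i1])
    X2 = np.column_stack([i2, x * i2])
    return X1, X2

x_demo = np.array([0.5, -1.0, 2.0, 3.0])
z_demo = np.array([-0.2, 0.4, 0.0, 1.5])
X1, X2 = regime_design(x_demo, z_demo, gamma=0.0)
print(X1.T @ X2)                            # a 2 x 2 matrix of zeros
```

Since each observation belongs to exactly one regime, X1 + X2 reproduces the linear model regressors and the cross product of X1 and X2 is zero, which is why the threshold model can be estimated regime by regime for a given threshold.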
In what follows we will operate under a set of standard primitive assumptions on et while the
rest of the probabilistic structure characterising (1)-(2) will be framed within a group of high level
assumptions. This will allow us to highlight the broad scope and applicability of our method. For
later use we let DT denote a deterministic normalising matrix defined as DT = diag(√T , T ). We
also refer to Ft as the sigma field generated by {wt−j , j ≥ 0} with wt = (et, vt, zt).
Assumptions A: $e_t$ is such that $E[e_t | F_{t-1}] = 0$, $E[e_t^2 | F_{t-1}] = \sigma_e^2 < \infty$ and $\sup_t E[|e_t|^{2+\delta} | F_{t-1}] < \infty$
for some $\delta > 0$.
The above restrictions are the norm in the linear predictive regression literature (see for instance
Campbell and Yogo (2006) and references therein) and have also been used in Gonzalo and Pitarakis
(2012). The assumptions on et ensure that a functional CLT applies to et (see Chan and Wei (1987)).
The mean independence assumption is in a way similar to a standard least squares setting requiring
regressors to be orthogonal to the error term. Since our models are commonly used in the context
of the predictability of stock returns the m.d.s. setting makes intuitive sense. Note however that
within (1)-(2) the shocks to yt may be contemporaneously correlated with either zt and/or xt.
Following the predictive regression literature, endogeneity in the context of (1)-(2) refers to the
fact that the long run covariances of Brownian Motions associated with et, vt and the shocks to zt
may be nondiagonal due to their contemporaneous correlations. Next, we introduce the following
set of associated high level assumptions.
Assumptions B: (i) $D_T^{-1} X_i' e = O_p(1)$ and (ii) $D_T^{-1} X_i' X_i D_T^{-1} = O_p(1)$ for $i = 1, 2$.
The above expressions involve normalised versions of sample moments such as $\sum I_{i,t-1} e_t$,
$\sum x_{t-1} I_{i,t-1} e_t$, $\sum I_{i,t-1}$ and $\sum x_{t-1}^2 I_{i,t-1}$ and can be viewed as FCLT type of conditions. Note for instance that
under the primitive assumptions considered in Gonzalo and Pitarakis (2012) all of the above conditions
are satisfied when the threshold variable $z_t$ is taken as a strictly stationary and ergodic
process. More generally however the above also holds if the threshold variable is nearly integrated,
say $z_t = x_t$. In this latter instance the stochastic boundedness of the above quantities follows from
Seo (2008).
Proposition 1. Under Assumptions A and B, the model selection procedure with a penalty
term satisfying (i) cT → ∞ and (ii) cT /T → 0 as T → ∞ is such that P (M2|M1) → 0 and
P (M1|M2)→ 0.
The above proposition implies that our model selection procedure will point to the correct
model asymptotically provided that a suitable penalty term is used. It is interesting to note that the
presence of a highly persistent regressor (the nearly integrated process xt) has not affected the large
sample properties of our model selection based approach compared to its implementation within a
purely stationary multiple threshold context in Gonzalo and Pitarakis (2002). More importantly
the fact that our highly persistent regressor is parameterised with an unknown c that cannot be
estimated from the data is of no consequence on the ability of our model selection procedure to
reach a correct decision asymptotically. This is an important advantage of our approach since under
most frameworks a nearly integrated parameterisation typically translates into having limiting
distributions that depend on the unknown noncentrality parameter c. The above results suggest
that a BIC type penalty will ensure that the correct model is selected asymptotically while with
an AIC type of penalty the probability of wrongly selecting M2 when M1 is true does not vanish
asymptotically. On the other hand since cT /T → 0 for both penalties, both the BIC and AIC do
not underfit asymptotically.
Under further restrictions on the stochastic properties of the threshold variable $z_t$, and strict
stationarity and ergodicity of $z_t$ in particular, it is also possible to gain further insight into the role
of the penalty term in model selection based inferences using the testing analogy in (7). The
limiting distribution of conventional Wald, LR or LM type test statistics for testing linearity within
(2) follows directly from Gonzalo and Pitarakis (2012), where the authors established the validity
of a Brownian Bridge type of limit of the form $Q(\lambda) = BB(\lambda)'BB(\lambda)/[\lambda(1-\lambda)]$ with $\lambda \in (0, 1)$ and
$BB(\lambda)$ denoting a two dimensional standard Brownian Bridge. It is also common practice to take
the supremum of $Q(\lambda)$ over a closed subset of $(0, 1)$ (see Andrews (1993)) by trimming the top and
bottom of the threshold variable sample space. Assuming equal trimming on both ends we have
$\lambda \in [\lambda_0, 1-\lambda_0] \equiv \Lambda$ with $\lambda_0$ denoting the trimming fraction (e.g. 10%).
It now follows that under the additional assumption that $z_t$ is strictly stationary and ergodic, in
large samples we can approximate the model selection based probability of selecting M2 when M1
is true as $P[\sup_{\lambda \in \Lambda} Q(\lambda) > 2c_T]$. This observation offers a very valuable setting within which to
explore the impact of the magnitude of our chosen penalty terms on our model selection procedure
since very accurate analytical approximations to such probabilities are readily available (see DeLong
(1981), James, Ling and Siegmund (1987) and Estrella (2003)).
Formally, we are interested in the probability that the squared Brownian Bridge process crosses
the boundary given by $\lambda(1-\lambda) 2c_T$. Letting $z$ denote a generic cutoff, following Estrella (2003), we
can write

$$P\left[\sup_{\lambda} Q(\lambda) > z\right] = P\left[\sup_{1<\lambda<r} \frac{\|BB(r)\|}{\sqrt{r}} > z\right] = z\, e^{-z/2} \left[\frac{1}{2}\left(1 - \frac{2}{z}\right) \log r + \frac{2}{z} + o(z^{-2})\right] \tag{8}$$

with $r = ((1-\lambda_0)/\lambda_0)^2$ (see equation (26) in James, Ling and Siegmund (1987) specialised to a
bivariate setting). For a given $\lambda_0$ and a choice of $z$, expression (8) allows us to evaluate
$P[M2|M1]$ analytically. From our equation (7) we are particularly interested in boundaries given by $z = 2c_T$
so that (8) allows us to write

$$P[M2|M1] = 2 c_T e^{-c_T} \left[\frac{1}{2}\left(1 - \frac{1}{c_T}\right) \log r + \frac{1}{c_T} + o(c_T^{-2})\right]. \tag{9}$$
The above makes it clear that as $c_T \to \infty$ the probability of overfitting vanishes in large samples, a
requirement that is not satisfied by the AIC criterion. Under $c_T = 2$ and setting $\lambda_0 = 0.1$ the above
approximation leads to $P[M2|M1] \approx 87\%$. This figure is in fact close to our simulation based
estimates documented in Table 1 below, which suggest that when M1 is the true model the AIC
based model selection procedure points to M1 about 20% of the time and to M2 about 80% of
the time. The discrepancy of about 7% between our simulations and the above approximation is
due to the fact that expressions such as (8) are valid as $z \to \infty$. Indeed, as we move to a larger BIC
type penalty, the accuracy of (8) becomes immediately clear. With $c_T = \ln T$ and a sample size of
$T = 1000$ we have $c_T = 6.91$ which leads to $P[M2|M1] \approx 2.5\%$, which is remarkably close to the
figures we obtained in Table 1. Interestingly, this latter estimate also suggests that our BIC based
model selection procedure is analogous to a test based approach that uses a 2.5% nominal size.
Finally (8) also highlights the influence of the ad-hoc choice of λ0 on P [M2|M1]. Using λ0 = 0.25
instead of λ0 = 0.10 for instance leads to P [M2|M1] ≈ 57% under the AIC penalty highlighting
the important influence of the choice of a trimming parameter on inferences.
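The leading term of approximation (9) is straightforward to evaluate numerically. The sketch below drops the remainder term, so it is only an approximation to the approximation. The function name and the `trim` argument are hypothetical.

```python
import math

def overfit_prob_approx(c_T, trim=0.10):
    """Leading term of approximation (9) for P[M2|M1]; the o(c_T^-2) remainder is dropped."""
    r = ((1.0 - trim) / trim) ** 2
    bracket = 0.5 * (1.0 - 1.0 / c_T) * math.log(r) + 1.0 / c_T
    return 2.0 * c_T * math.exp(-c_T) * bracket

p_aic_10 = overfit_prob_approx(2.0, trim=0.10)             # AIC penalty, 10% trimming
p_aic_25 = overfit_prob_approx(2.0, trim=0.25)             # AIC penalty, 25% trimming
p_bic = overfit_prob_approx(math.log(1000), trim=0.10)     # BIC penalty, T = 1000
print(round(p_aic_10, 2), round(p_aic_25, 2))              # 0.87 0.57
```

Under the AIC penalty the leading term reproduces the 87% and 57% figures quoted for the two trimming fractions, while under the BIC penalty with T = 1000 it gives a probability of a few percent. The exponential factor makes the overfitting probability fall quickly as the penalty grows.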
3 Model Selection in Finite Samples
We are initially interested in assessing the ability of our proposed model selection procedure not to
overfit by wrongly selecting the threshold specification when the correct model is given by the linear
model M1. Throughout all our simulations we consider two of the most commonly used penalty
terms in the wider literature, given by $c_T = 2$ and $c_T = \ln T$. Note that $c_T = 2$ does not satisfy
one of the conditions in our Proposition 1 and hence under this penalty the probability of selecting
M2 when M1 is true will not vanish asymptotically. Another important feature considered in our
simulations is the sensitivity of our methodology to the magnitude of the noncentrality parameter $c$.
For this purpose all our experiments are run for $c \in \{1, 5, 10\}$ so that we cover regions that are near
the edge of the unit root as well as regions that are moderately far away from the unit circle.
Our first DGP is M1. We distinguish between two alternative scenarios about the threshold
variable $z_t$. The latter is taken to either follow an AR(1) process, say $z_t = \phi_z z_{t-1} + e_{zt}$ with
$|\phi_z| < 1$, or set to equal $x_t$ itself. We also let $v_t$ follow the AR(1) process $v_t = \phi_v v_{t-1} + e_{vt}$ and let
the correlation structure of the disturbance vector $\{e_t, e_{vt}, e_{zt}\}$ be such that all three disturbances
may be jointly correlated through the covariance matrix

$$\Sigma_1 = \begin{pmatrix} 1 & -0.5 & 0.4 \\ -0.5 & 1 & 0.4 \\ 0.4 & 0.4 & 1 \end{pmatrix} \tag{10}$$

along with the uncorrelatedness scenario $\Sigma_0 = Id_3$. All random disturbances are generated as
normally distributed random variables throughout. Finally, we set $(\alpha, \beta) = (1, 0.25)$ and $(\phi_z, \phi_v) =
(0.50, 0.50)$.
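A data generating process of this type could be coded along the following lines. This is a sketch under stated assumptions: zero starting values, the timing convention pairing y at t+1 with x and z at t, and the function and variable names are all choices made for this illustration.

```python
import numpy as np

SIGMA_1 = np.array([[1.0, -0.5, 0.4],
                    [-0.5, 1.0, 0.4],
                    [0.4, 0.4, 1.0]])

def simulate_m1(T, c, rng, sigma=SIGMA_1, alpha=1.0, beta=0.25, phi_z=0.5, phi_v=0.5):
    """Simulate (y_{t+1}, x_t, z_t) under the linear model with AR(1) v_t and z_t."""
    shocks = rng.multivariate_normal(np.zeros(3), sigma, size=T + 1)
    e, ev, ez = shocks[:, 0], shocks[:, 1], shocks[:, 2]
    v = np.zeros(T + 1)
    z = np.zeros(T + 1)
    x = np.zeros(T + 1)
    for t in range(1, T + 1):
        v[t] = phi_v * v[t - 1] + ev[t]
        z[t] = phi_z * z[t - 1] + ez[t]
        x[t] = (1.0 - c / T) * x[t - 1] + v[t]
    y_next = alpha + beta * x[:-1] + e[1:]   # y_{t+1} = alpha + beta x_t + e_{t+1}
    return y_next, x[:-1], z[:-1]

rng = np.random.default_rng(0)
y_next, x, z = simulate_m1(T=200, c=1.0, rng=rng)
print(y_next.shape, x.shape, z.shape)
```

Passing a 3 x 3 identity matrix as `sigma` gives the uncorrelated scenario, and replacing the returned z by x gives the case in which the predictor itself acts as the threshold variable.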
Under the scenario where $z_t = x_t$ the disturbance vector is given by $\{e_t, e_{vt}\}$ whose covariance
we set as

$$\tilde{\Sigma}_1 = \begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix} \tag{11}$$

together with the scenario $\tilde{\Sigma}_0 = Id_2$.
Note that the above framework allows a great degree of generality that goes beyond what is
typically assumed in the traditional test based literature. Table 1 presents the correct decision
frequencies associated with our model selection procedure under the stationary threshold scenario
while Table 2 focuses on the case where zt = xt.
Table 1: Correct Decision Frequencies (%) under M1 and a Stationary Threshold Variable.

                    BIC, Σ0                    BIC, Σ1
           c=1      c=5      c=10     c=1      c=5      c=10
T=200      91.40    91.60    91.70    91.40    91.90    92.40
T=400      95.00    94.90    95.20    95.40    95.20    95.10
T=1000     97.60    97.50    97.50    97.10    97.60    97.62

                    AIC, Σ0                    AIC, Σ1
           c=1      c=5      c=10     c=1      c=5      c=10
T=200      22.70    22.50    22.60    22.10    21.50    22.30
T=400      22.20    22.40    22.50    22.00    22.60    22.35
T=1000     21.50    20.20    19.50    21.00    21.25    22.00

Our results unequivocally highlight the reliability and strong performance of a BIC type penalty.
Even under a moderately small sample size such as T = 200 the correct decision frequencies
associated with our BIC based model selection approach lie between roughly 87% and 92%, and they
reach figures close to 100% for larger sample sizes. Perhaps more interestingly comparing the frequencies in
Tables 1 and 2 suggests that the performance of our model selection procedure remains equally
strong when the threshold variable is taken to be the highly persistent regressor itself.

Table 2: Correct Decision Frequencies (%) under M1 and a Nearly Integrated Threshold Variable.

                    BIC, Σ0                    BIC, Σ1
           c=1      c=5      c=10     c=1      c=5      c=10
T=200      89.50    90.15    89.80    86.85    88.40    88.55
T=400      93.05    93.25    93.75    92.50    93.00    93.25
T=1000     96.65    97.05    97.45    96.60    96.75    96.82

                    AIC, Σ0                    AIC, Σ1
           c=1      c=5      c=10     c=1      c=5      c=10
T=200      13.70    14.30    16.10    13.50    14.35    16.15
T=400      10.40    14.10    13.95    10.40    13.65    14.10
T=1000     12.30    10.40    11.55    10.50    11.60    13.20

We also note that the degree of persistence of $x_t$ has little influence on the correct decision
frequencies when the true model is linear. Looking at the magnitudes of the correct decision
frequencies under exogeneity versus endogeneity (i.e. $\Sigma_0$ versus $\Sigma_1$ or $\tilde{\Sigma}_0$ versus $\tilde{\Sigma}_1$) we can also
infer that the nature of the joint interactions between $e_t$, $e_{vt}$ and $e_{zt}$ has virtually no impact on
correct decision frequencies. Finally we continue to note the inadequacy of an AIC type penalty.
The AIC based model selection criterion points to the less parsimonious nonlinear specification
too often, only being able to select the true model about 22% of the time under a stationary
threshold variable and about 10% of the time when $z_t = x_t$. This is expected from our theoretical
results since under a constant penalty the probability of overfitting does not vanish asymptotically.
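A scaled-down version of such an experiment can be sketched as follows for the linear DGP with uncorrelated shocks and a stationary threshold variable. The number of replications, the seed, the zero starting values and the grid of sorted sample values are assumptions of this illustration, so the resulting frequencies should only be expected to be in the neighbourhood of the tabulated ones.

```python
import numpy as np

def log_resid_var(y, X):
    """Log of the OLS residual variance SSR / T."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return np.log(r @ r / y.size)

def lr_statistic(y, x, z, trim=0.10):
    """Maximum over the trimmed grid of T * ln(sigma2_linear / sigma2(gamma))."""
    T = y.size
    lin = log_resid_var(y, np.column_stack([np.ones(T), x]))
    grid = np.sort(z)[int(trim * T): int((1.0 - trim) * T)]
    best = np.inf
    for g in grid:
        i1 = (z <= g).astype(float)
        X = np.column_stack([i1, x * i1, 1.0 - i1, x * (1.0 - i1)])
        best = min(best, log_resid_var(y, X))
    return T * (lin - best)

def correct_decision_freq_m1(T, c, n_rep, seed=0, phi=0.5):
    """Share of replications selecting M1 when M1 is true, for AIC and BIC penalties."""
    rng = np.random.default_rng(seed)
    penalties = {"AIC": 2.0, "BIC": np.log(T)}
    hits = dict.fromkeys(penalties, 0)
    for _ in range(n_rep):
        e, ev, ez = rng.standard_normal((3, T + 1))    # independent shocks
        v = np.zeros(T + 1)
        z = np.zeros(T + 1)
        x = np.zeros(T + 1)
        for t in range(1, T + 1):
            v[t] = phi * v[t - 1] + ev[t]
            z[t] = phi * z[t - 1] + ez[t]
            x[t] = (1.0 - c / T) * x[t - 1] + v[t]
        y_next = 1.0 + 0.25 * x[:-1] + e[1:]
        stat = lr_statistic(y_next, x[:-1], z[:-1])
        for name, c_T in penalties.items():
            hits[name] += int(stat < 2.0 * c_T)        # statistic below 2 c_T: M1 is kept
    return {name: h / n_rep for name, h in hits.items()}

freq = correct_decision_freq_m1(T=200, c=1.0, n_rep=50)
print(freq)
```

Since the penalty does not depend on the threshold, a single search over the grid per replication serves both criteria; only the cutoff 2cT differs between AIC and BIC.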
Our next goal is to explore the ability of our model selection approach to reach correct decisions
when the correct specification is given by the threshold model M2. We also use our experiments
to illustrate the fact that although the AIC displays a strong tendency to overfit, its associated
probability of wrongly pointing to M1 when M2 is true vanishes asymptotically. Given the strong
performance of the BIC documented in Table 1 it is also important to explore whether this is due
to its ability to detect the correct model or whether it arises spuriously due to the strength of its
penalty. While maintaining the same probabilistic structure as in Table 1 we here consider the
following additional parameterisations for the $\alpha_i$'s and $\beta_i$'s in (2). We set $(\alpha_1, \beta_1) = (1.00, 0.25)$
and move $\alpha_2$ and $\beta_2$ away from $(\alpha_1, \beta_1)$ in increments. Having demonstrated the robustness of the
model selection approach to alternative magnitudes of $c$ we conduct our power experiments setting
$c = 1$ throughout. Table 3 below summarises and labels our alternative parameterisations. We
must also stress that across all of the above parameterisations the threshold variable is assumed
to follow either an AR(1) process or be given by $x_t$ itself as in our earlier experiments and we set
$\gamma = 0$ throughout.

Table 3: Model Parameterisation.

DGP    α1      α2      β1      β2
A      1.00    1.20    0.25    0.25
B      1.00    1.40    0.25    0.25
C      1.00    1.00    0.25    0.50
D      1.00    1.50    0.25    0.25
E      1.00    1.20    0.25    0.50
F      1.00    1.20    0.25    0.35

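The parameterisations can be encoded compactly and used to generate the threshold response for given predictor, threshold variable and shocks. The dictionary and function names, and the toy inputs, are hypothetical and serve only to illustrate how model (2) maps onto the table.

```python
import numpy as np

# (alpha1, alpha2, beta1, beta2) for each labelled DGP, as listed in the table.
DGPS = {
    "A": (1.00, 1.20, 0.25, 0.25),
    "B": (1.00, 1.40, 0.25, 0.25),
    "C": (1.00, 1.00, 0.25, 0.50),
    "D": (1.00, 1.50, 0.25, 0.25),
    "E": (1.00, 1.20, 0.25, 0.50),
    "F": (1.00, 1.20, 0.25, 0.35),
}

def threshold_response(x, z, e_next, params, gamma=0.0):
    """y_{t+1} from model (2): regime 1 applies when z_t <= gamma, regime 2 otherwise."""
    a1, a2, b1, b2 = params
    return np.where(z <= gamma, a1 + b1 * x, a2 + b2 * x) + e_next

x_demo = np.array([1.0, 2.0, 3.0, 4.0])
z_demo = np.array([-1.0, 1.0, 0.0, 2.0])
y_demo = threshold_response(x_demo, z_demo, np.zeros(4), DGPS["C"])
print(y_demo)                               # [1.25 2.   1.75 3.  ]
```

Scenarios A, B and D shift only the intercept, C shifts only the slope, and E and F shift both, which is the distinction that matters when comparing detection frequencies across rows.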
Table 4: Correct Decision Frequencies (%) under M2 and a Stationary Threshold Variable (γ = 0.0).

              Σ0, BIC                    Σ1, BIC                    Σ0, AIC                    Σ1, AIC
DGP    T=200   T=400   T=1000     T=200   T=400   T=1000     T=200   T=400   T=1000     T=200   T=400   T=1000
A      21.10   26.60    52.00     18.30   23.60    51.10     87.50   93.60    99.40     84.90   92.40    99.00
B      61.20   85.90   100.00     56.30   84.65    98.25     97.90  100.00   100.00     99.25  100.00   100.00
C     100.00  100.00   100.00    100.00  100.00   100.00    100.00  100.00   100.00    100.00  100.00   100.00
D      80.90   98.20   100.00     77.20   98.80   100.00     99.50  100.00   100.00     99.15  100.00   100.00
E     100.00  100.00   100.00    100.00  100.00   100.00    100.00  100.00   100.00    100.00  100.00   100.00
F      98.60  100.00   100.00    100.00  100.00   100.00    100.00  100.00   100.00    100.00  100.00   100.00

Our results in Table 4 continue to suggest that the BIC based model selection approach has
a good and often excellent ability to detect deviations from M1 even under scenarios in which
the two models are very close. Note for instance that scenario C keeps intercepts constant while
allowing the slope parameter β to switch by an increment of 0.25 units. Even under T=400 the
correct decision frequencies cluster at 100% under most scenarios. Interestingly and as in our M1
based experiments we note the little impact that the exogeneity versus endogeneity distinction has
on correct decision frequencies. This is an important and powerful feature of our model selection
approach when compared with traditional tests.

Table 5: Correct Decision Frequencies (%) under M2 and a Nearly Integrated Threshold Variable (γ = 0.0).

              Σ0, BIC                    Σ1, BIC                    Σ0, AIC                    Σ1, AIC
DGP    T=200   T=400   T=1000     T=200   T=400   T=1000     T=200   T=400   T=1000     T=200   T=400   T=1000
A      13.40   11.50    13.00     14.75   10.90    12.95     86.95   91.60    94.25     88.30   90.60    94.40
B      23.75   29.80    50.85     23.70   28.10    49.25     91.50   96.00    98.10     91.15   95.65    98.50
C      71.15   81.00    86.90     72.25   80.35    85.10     97.10   98.82    99.10     96.35   98.35    99.15
D      30.90   39.00    67.90     31.10   38.75    66.05     93.00   97.10    98.50     92.65   97.10    99.00
E      71.98   81.28    86.65     72.10   80.25    87.50     97.15   99.00    99.00     97.05   98.55    99.40
F      46.10   63.72    78.95     45.70   63.10    79.70     94.50   96.10    99.10     94.85   96.90    98.80

For comparison purposes we have also implemented the SupWaldA test developed in Gonzalo
and Pitarakis (2012) on the same set of experiments labelled as A − F and with the covariance
structure given by Σ1. Under T=200 and a 5% significance level the empirical powers associated
with models A−F were 12.1%, 58.3%, 100%, 70.8%, 100% and 98.3% respectively compared with
18.3%, 56.3%, 100%, 77.2%, 100% and 100% for our BIC based model selection procedure.
Comparing the correct decision frequencies in Table 4 with those in Table 5 it is also clear
that the nature of the threshold variable has an important impact on the magnitude of the correct
decision frequencies when the true model is given by the threshold specification. We observe an
important drop in power when the regime switches are triggered by a nearly integrated threshold
variable as opposed to a purely stationary one. Under some scenarios such as A and B under which
only the intercept parameters shift across regimes power is reduced by more than half. It is only
when the DGPs are characterised by switches in both their intercept and slope parameters that
correct decision frequencies reach good levels in smaller samples.
Regarding the behaviour of the AIC, Tables 1 and 2 have made it clear that it is not a suitable
criterion in the present context. Even when the true model is linear AIC based inferences tend
to over select M2 which suggests that the correct decision frequencies documented in Tables 4-5
are mainly due to the inherent nature of the AIC to point to the least parsimonious specification.
Tables 4-5 also make it clear that when the true model is M2 AIC based inferences do not point
to M1 as established in Proposition 1. Indeed since AIC’s penalty satisfies the requirement that
cT /T → 0 the associated probability of underfitting vanishes asymptotically.
4 Dividend Payouts and Returns Across Regimes
We use our methodology to explore the presence of threshold effects in the predictability of aggregate
stock returns with an emphasis on predictability induced by dividend payout ratios defined as the
log difference between dividends and earnings.
Lamont (1998) documented a strong positive and statistically significant association between
future returns on the S&P 500 index and current dividend payout ratios while Bali, Demirtas and
Tehranian (2008) demonstrate the lack of robustness of this result across different sample periods.
More recently, using a novel methodology and monthly CRSP value weighted returns Kostakis,
Magdalinos and Stamatogiannis (2010) further document the absence of any predictive power for
the dividend payout ratio.
Here we aim to illustrate our methodology by using it to document the significant impact that
the inclusion of regimes may have on predictability and on the above debate. Our regimes take
the form of good versus bad times by using the growth in industrial production as our threshold
variable. This then allows us to assess whether the potential predictability of returns with dividend
payouts is affected by the state of the economy.
Our dependent variable is the monthly value weighted returns series (inclusive of distributions)
for NYSE, AMEX and NASDAQ obtained from the CRSP database and covering the period 1950-
2007. Excess returns are subsequently obtained using 90-day treasury bill rates. In addition to
the aggregate returns our analysis below also considers the predictability of returns to five value
weighted portfolios constructed by sorting the universe of stocks into book-to-market (BM) quintiles
so as to distinguish between portfolios made up of so called value (or cheap) stocks and their growth
(expensive) counterparts. This latter data is publicly available through Kenneth French’s data
library.
We let erdt denote the excess return series corresponding to the aggregate market while the
excess returns associated with the five BM based quintile portfolios are denoted erbm1t, erbm2t,
erbm3t, erbm4t and erbm5t. Note that erbm1t refers to the returns to the first quintile portfolio
(i.e. low BM portfolio) while erbm5t refers to the returns to the top quintile (i.e. high BM
portfolio). In what follows we let det refer to our predictor variable and ipgrt to the growth rate
in the seasonally adjusted industrial production index. Our baseline threshold model is as in (2)
with yt = {erdt, erbm1t, . . . , erbm5t}, xt = det and zt = ipgrt.
Before proceeding with the implementation of our model selection procedure we have estimated
the linear predictive regression specification yt+1 = α + βxt + et+1 associated with each series so
as to assess whether the dividend payout variable displays any linear predictive power within our
sample. Table 6 displays the estimated slope coefficients associated with det together with the
t-statistics and the R² of each estimated regression model.

Table 6: Linear Predictability

            β̂          t(β=0)     R²
erdt        0.0050     0.658      0.001
erbm1t      0.0077     0.885      0.001
erbm2t     -0.0005    -0.068      0.000
erbm3t      0.0029     0.414      0.000
erbm4t      0.0011     0.159      0.000
erbm5t      0.0019     0.238      0.000

The above results strongly and unequivocally point towards a complete absence of any linear
predictability induced by the dividend payout variable. Results are also robust across both the
aggregate market and the returns to the individual BM based portfolios.
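The quantities reported for each linear predictive regression (slope, t-statistic and R²) can be computed as in the sketch below. Conventional homoskedastic OLS standard errors are assumed here, since the text does not state how the t-statistics were obtained. The function name and the simulated inputs are hypothetical.

```python
import numpy as np

def predictive_ols(y_next, x):
    """OLS of y_{t+1} on (1, x_t): returns the slope, its conventional t-statistic and R^2."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y_next, rcond=None)
    resid = y_next - X @ coef
    T, k = X.shape
    s2 = resid @ resid / (T - k)                       # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)                  # homoskedastic covariance matrix
    t_stat = coef[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - resid @ resid / np.sum((y_next - y_next.mean()) ** 2)
    return float(coef[1]), float(t_stat), float(r2)

# Illustrative data only (not the CRSP series used in the application).
rng = np.random.default_rng(3)
x_demo = rng.standard_normal(300)
y_demo = 2.0 + 3.0 * x_demo + 0.1 * rng.standard_normal(300)
slope, t_stat, r2 = predictive_ols(y_demo, x_demo)
print(round(slope, 2))
```

In a regression with a single predictor the squared t-statistic equals R²(T - 2)/(1 - R²), so a t-statistic well below one necessarily goes hand in hand with an R² that is close to zero.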
We next explore the potential presence of threshold nonlinearities in the relationship linking
returns and dividend payouts through the implementation of our model selection procedure.
Results are presented in Table 7 which displays the magnitudes of the model selection criteria for the
two competing models.
Interestingly the threshold model appears to be supported by the data for the aggregate series as
well as the low book-to-market portfolios while for the last two high book-to-market quintiles the
13
Table 7: Model Selection
IC IC(γ̂) Model
erdt -6.332 -6.345 M2
erbm1t -6.114 -6.122 M2
erbm2t -6.283 -6.289 M2
erbm3t -6.392 -6.400 M2
erbm4t -6.362 -6.359 M1
erbm5t -6.096 -6.092 M1
linear model has stronger support. In this latter case and based on our results in Table 6 it is also
clear that dividend payouts have no predictive power for future returns on high BM portfolios,
a result in line with Kostakis, Magdalinos and Stamatogiannis (2010). Given the support for model
M2 for the remaining portfolios' returns, however, we next aim to document the explicit influence of
regimes on det-induced predictability. This is achieved through the estimation of (2) for both the
aggregate series and the first three BM based portfolios. Results presented in Table 8 are striking
when compared with the total absence of any statistical significance documented in Table 6 under
linearity. Across all four portfolios we observe a substantial reversal in the statistical significance
of $\hat{\beta}_1$ associated with the impact of det during the low growth regimes. Dividend payout ratios are
strongly and positively associated with future returns but solely during low growth periods. The
switch in the goodness of fit across regimes is also remarkable, moving from 0% to magnitudes
close to 15%. Our results further suggest that if dividend payout based predictability is to be
interpreted as time varying expected returns consistent with the efficient markets hypothesis, it will
be important to rationalise the reasons why this may be true for some but not all portfolios.

Table 8: Threshold Models

Columns $\hat{\beta}_1$ to $T_1$ refer to the regime ipgrt $\le \hat{\gamma}$; columns $\hat{\beta}_2$ to $T_2$ refer to the regime ipgrt $> \hat{\gamma}$.

| Series | $\hat{\beta}_1$ | $t_{\beta_1=0}$ | $R^2_1$ | $T_1$ | $\hat{\beta}_2$ | $t_{\beta_2=0}$ | $R^2_2$ | $T_2$ | $\hat{\gamma}$ |
|---|---|---|---|---|---|---|---|---|---|
| erdt | 0.094 | 4.660 | 0.149 | 131 | -0.012 | -1.340 | 0.003 | 564 | -0.0036 |
| erbm1t | 0.101 | 4.243 | 0.142 | 128 | -0.009 | -1.022 | 0.002 | 567 | -0.0036 |
| erbm2t | 0.081 | 4.265 | 0.118 | 119 | -0.015 | -1.897 | 0.005 | 576 | -0.0040 |
| erbm3t | 0.082 | 4.738 | 0.127 | 131 | -0.012 | -1.587 | 0.004 | 564 | -0.0035 |
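The regime-specific quantities reported in Table 8 (slope, t-ratio, R² and number of observations in each regime) can be sketched in Python as follows. This is a minimal illustration that assumes the threshold specification amounts to separate OLS fits with an intercept and slope on the two subsamples defined by the threshold variable. The function names, the conventional homoskedastic standard errors and the simulated data (with an i.i.d. rather than persistent predictor, for brevity) are assumptions of the example and not the procedure used to produce the table.

```python
import numpy as np


def regime_ols(y, x, mask):
    """OLS of y on a constant and x over the observations selected by mask."""
    y_r, x_r = y[mask], x[mask]
    n = int(mask.sum())
    X = np.column_stack([np.ones(n), x_r])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y_r)
    resid = y_r - X @ beta
    ssr = float(resid @ resid)
    # Conventional homoskedastic standard error (an assumption of this sketch).
    se_slope = np.sqrt(ssr / (n - 2) * XtX_inv[1, 1])
    r2 = 1.0 - ssr / float(np.sum((y_r - y_r.mean()) ** 2))
    return {"beta": float(beta[1]), "t": float(beta[1] / se_slope), "R2": r2, "T": n}


def threshold_summary(y, x, q, gamma):
    """Regime-specific slope, t-ratio, R-squared and sample size, split at gamma."""
    low = q <= gamma
    return regime_ols(y, x, low), regime_ols(y, x, ~low)


# Simulated example: predictability only when q is at or below gamma = 0.
rng = np.random.default_rng(0)
n_obs = 1000
x = rng.standard_normal(n_obs)
q = rng.standard_normal(n_obs)
y = 0.5 * x * (q <= 0.0) + 0.1 * rng.standard_normal(n_obs)

low_regime, high_regime = threshold_summary(y, x, q, gamma=0.0)
print(low_regime)
print(high_regime)
```

In the simulated data the slope and R² are sizeable in the low regime and close to zero in the high regime, which is the qualitative pattern described above for the empirical results.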
5 Conclusions
In this paper we have introduced a model selection based approach for uncovering the presence
of threshold effects in predictive regressions. We have shown it to display favourable large sample
properties and more importantly an excellent performance in finite samples. We further illustrated
its usefulness through an application to the predictability of stock returns. Besides the simplicity
of its implementation an important feature of our approach is its flexibility when it comes to the
stochastic properties of the threshold variables.
REFERENCES
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In:
Petrov, B.N., Csaki, F. (Eds.), Second International Symposium on Information Theory. Akademiai
Kiado, Budapest, pp. 267-281.
Akaike, H. (1974). ‘A new look at the statistical model identification’, IEEE Transactions on
Automatic Control, 19, 716-723.
Andrews, D.W.K. (1993). ‘Tests for parameter instability and structural change with unknown
change point’, Econometrica, 61, 821-856.
Bali, T. G., Demirtas, K. O. and Tehranian, H. (2008). ‘Aggregate Earnings, Firm-Level
Earnings, and Expected Stock Returns’, Journal of Financial and Quantitative Analysis, 43, 657-684.
Caner, M. and Hansen, B.E. (2001). ‘Threshold autoregression with a unit root’, Econometrica,
69, 1555-1596.
Campbell, J. Y. and Yogo, M. (2006). ‘Efficient tests of stock return predictability’, Journal of
Financial Economics, 81, 27-60.
Chan, N. H. and Wei, C. Z. (1987). ‘Limiting Distributions of Least Squares Estimates of
Unstable Autoregressive Processes’, Annals of Statistics, 15, 1050-1063.
Chen, J. and Gupta, A. K. (1997). ‘Testing and Locating Variance Changepoints with Application
to Stock Prices’, Journal of the American Statistical Association, 92, 739-747.
Cochrane, J. H. (2008). ‘The Dog That Did Not Bark: A Defense of Return Predictability’,
Review of Financial Studies, 21, 1533-1575.
DeLong, D. M. (1981). ‘Crossing Probabilities for a square root boundary by a Bessel Process’,
Communications in Statistics, A10, 2197-213.
Estrella, A. (2003). ‘Critical Values and P Values of Bessel Process Distributions: Computation
and Application to Structural Break Tests’, Econometric Theory, 19, 1128-1143.
Gonzalo, J. and Pitarakis, J. (1998). ‘Specification via model selection in vector error correction
models’, Economics Letters, 60, 321-328.
Gonzalo, J. and Pitarakis, J. (2002). ‘Estimation and Model Selection Based Inference in Simple
and Multiple Threshold Models’, Journal of Econometrics, 110, 319-352.
Gonzalo, J. and Pitarakis, J. (2012). ‘Regime Specific Predictability in Predictive Regressions’,
Journal of Business and Economic Statistics, 30, 229-241.
Hansen, B. E. (2011). ‘Threshold Autoregression in Economics’, Statistics and Its Interface, 4,
123-127.
Hansen, B.E. (1996). ‘Inference when a nuisance parameter is not identified under the null
hypothesis’, Econometrica, 64, 413-430.
Jansson, M. and Moreira, M. J. (2006). ‘Optimal Inference in Regression Models with Nearly
Integrated Regressors’, Econometrica, 74, 681-714.
James, B., Ling, J. and Siegmund, D. (1987). ‘Test for a change-point’, Biometrika, 74, 71-83.
Kostakis, A., Magdalinos, A. and Stamatogiannis, M. (2010). ‘Robust econometric inference
for stock return predictability’, Unpublished Manuscript, University of Southampton.
Lamont, O. (1998). ‘Earnings and Expected Returns’, Journal of Finance, 53, 1563-1587.
Lewellen, J. (2003). ‘Predicting Returns with Financial Ratios’, Journal of Financial
Economics, 74, 209-235.
Paye, B. S. and Timmermann, A. (2006). ‘Instability of Return Prediction Models’, Journal of
Empirical Finance, 13, 274-315.
Phillips, P. C. B. (2008). ‘Unit Root Model Selection’, Journal of the Japanese Statistical
Society, 38, 65-74.
Rapach, D. E. and Wohar, M. E. (2006). ‘Structural Breaks and Predictive Regression Models
of Aggregate US Stock Returns’, Journal of Financial Econometrics, 4, 238-274.
Schwarz, G. (1978). ‘Estimating the dimension of a model’, Annals of Statistics, 6, 461-464.
Seo, M. H. (2008). ‘Unit Root Test in a Threshold Autoregression: Asymptotic Theory and
Residual-Based Block Bootstrap’, Econometric Theory, 24, 1699-1716.
Stambaugh, R. F. (1999). ‘Predictive Regressions’, Journal of Financial Economics, 54, 375-421.
Tong, H. (1983). Threshold Models in Non-linear Time Series Analysis. Lecture Notes in
Statistics, Vol. 21. Springer, Berlin.
Tong, H. (1990). Non-Linear Time Series: A Dynamical System Approach. Oxford University
Press, Oxford.
Tong, H. and Lim, K.S. (1980). ‘Threshold autoregression, limit cycles and cyclical data’,
Journal of The Royal Statistical Society Series B, 42, 245-292.
Tong, H. (2011). ‘Threshold models in time-series analysis - 30 years on’, Statistics and Its
Interface, 4, 107-136.
APPENDIX
PROOF OF PROPOSITION 1. We initially consider the case where the true model is given by
M2 while the fitted model is M1. It now suffices to show that $P[IC < IC(\gamma_0)] \to 0$ as $T \to \infty$,
hence implying that our model selection procedure does not point to M1 asymptotically. Since the
true model is given by $y = X_1^0\beta_1 + X_2^0\beta_2 + e$, standard algebra gives

$$
\hat{\sigma}^2(\gamma_0) = \frac{e'e}{T} - \frac{e'X_1^0 (X_1^{0\prime}X_1^0)^{-1} X_1^{0\prime}e}{T} - \frac{e'X_2^0 (X_2^{0\prime}X_2^0)^{-1} X_2^{0\prime}e}{T} \tag{12}
$$

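so that from our Assumptions A and B we have $\hat{\sigma}^2(\gamma_0) = e'e/T + o_p(1)$ and $IC(\gamma_0) = \ln(e'e/T) + 4c_T/T + o_p(1)$.
When the fitted model is M1, however, so that $\hat{\sigma}^2 = (y'y - y'X(X'X)^{-1}X'y)/T$
with $y = X_1^0\beta_1 + X_2^0\beta_2 + e$, lengthy but standard algebra gives

$$
\begin{aligned}
\hat{\sigma}^2 ={}& \frac{e'e}{T} - \frac{1}{T}\, e'X(X'X)^{-1}X'e + \frac{1}{T}\,(\beta_1-\beta_2)'X_1^{0\prime}X_1^0 (X'X)^{-1} X_2^{0\prime}X_2^0 (\beta_1-\beta_2) \\
&+ \frac{1}{T}\Big[\, 2\beta_1'X_1^{0\prime}e + 2\beta_2'X_2^{0\prime}e - 2\beta_1'(X_1^{0\prime}X_1^0)(X'X)^{-1}X'e - 2\beta_2'(X_2^{0\prime}X_2^0)(X'X)^{-1}X'e \,\Big].
\end{aligned}
\tag{13}
$$
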
From our Assumptions A and B it immediately follows that the third term on the right hand side
of (13) is

$$
\frac{1}{T}\,(\beta_1-\beta_2)'X_1^{0\prime}X_1^0 D_T^{-1}\,(D_T^{-1}X'XD_T^{-1})^{-1}\,D_T^{-1}X_2^{0\prime}X_2^0(\beta_1-\beta_2) = O_p(T) \tag{14}
$$

since $D_T^{-1}X'XD_T^{-1} = O_p(1)$ and $D_T^{-1}X_i'X_iD_T^{-1} = O_p(1)$. Furthermore, since $D_T^{-1}X'e = O_p(1)$,
$D_T^{-1}X_i'e = O_p(1)$ and $e'e/T \overset{p}{\to} \sigma^2$, the remaining terms in (13) are $O_p(1)$, implying that
$\hat{\sigma}^2 - \hat{\sigma}^2(\gamma_0) = O_p(T)$ and thus leading to the required result provided that $c_T/T \to 0$, since
$P[IC < IC(\gamma_0)] = P[\ln\hat{\sigma}^2 < \ln\hat{\sigma}^2(\gamma_0) + 2c_T/T]$.
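Next, we concentrate on the case where the true model is given by the linear specification M1
and note that $P[\text{M2} \mid \text{M1}] \to 0$ is equivalent to $P[IC > IC(\gamma)] \to 0$ for some $\gamma \in \Gamma$. This allows us
to write

$$
\begin{aligned}
P[IC > IC(\gamma)] &\le P\big[IC > \min_{\gamma} IC(\gamma)\big] \\
&= P\Big[\max_{\gamma}\, T \ln\Big(\frac{\hat{\sigma}^2}{\hat{\sigma}^2(\gamma)}\Big) > 2c_T\Big].
\end{aligned}
\tag{15}
$$
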
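The equality in (15) can be checked directly, assuming the two criteria take the forms used earlier in this proof, namely $IC = \ln\hat{\sigma}^2 + 2c_T/T$ for M1 and $IC(\gamma) = \ln\hat{\sigma}^2(\gamma) + 4c_T/T$ for M2. For any given $\gamma$,

$$
IC > IC(\gamma)
\;\Longleftrightarrow\;
\ln\hat{\sigma}^2 - \ln\hat{\sigma}^2(\gamma) > \frac{4c_T}{T} - \frac{2c_T}{T}
\;\Longleftrightarrow\;
T \ln\Big(\frac{\hat{\sigma}^2}{\hat{\sigma}^2(\gamma)}\Big) > 2c_T ,
$$

and since $IC$ does not depend on $\gamma$, the event $IC > \min_{\gamma} IC(\gamma)$ coincides with the event $\max_{\gamma} T\ln(\hat{\sigma}^2/\hat{\sigma}^2(\gamma)) > 2c_T$.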
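Under M1 we have

$$
\hat{\sigma}^2(\gamma) = \frac{e'e}{T} - \frac{1}{T}\sum_{i=1}^{2} e'X_iD_T^{-1}\,(D_T^{-1}X_i'X_iD_T^{-1})^{-1}\,D_T^{-1}X_i'e \tag{16}
$$

so that from Assumptions A and B it follows that $\hat{\sigma}^2(\gamma) \overset{p}{\to} \sigma_e^2$ and $T(\hat{\sigma}^2 - \hat{\sigma}^2(\gamma)) = O_p(1)$, leading
to $\max_{\gamma} T\ln(\hat{\sigma}^2/\hat{\sigma}^2(\gamma)) = O_p(1)$. Thus under $c_T \to \infty$ it follows that the right hand side of (15)
vanishes asymptotically as required.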