
Uncovering regimes in predictive regressions: A model selection

based approach

Jean-Yves Pitarakis

University of Southampton

Economics Division

Southampton SO17 1BJ, UK.

March 2012

Abstract

A model selection based approach for detecting the presence of threshold effects in predictive

regressions with a highly persistent regressor and possible endogeneity is proposed. The method

is shown to lead to the selection of the true model asymptotically and to be flexible enough

to accommodate threshold variables that are either stationary or highly persistent, including

the predictor itself. An analytical approximation to the probability of wrongly selecting the

threshold model over the linear specification is also obtained across alternative penalty term

choices. An extensive simulation study documents its excellent finite sample properties and

an application to the predictability of stock returns with dividend payout ratios illustrates its

practical usefulness. Results indicate a strong degree of predictability that is present solely

during particular states of the economy.

Key Words: Predictive Regressions, Threshold, Model Selection, AIC, BIC.

JEL: C22, C32.

1Financial support from the ESRC is gratefully acknowledged. Address for Correspondence: Jean-Yves Pitarakis,

University of Southampton, School of Social Sciences, Economics Division, Southampton, SO17 1BJ, United-

Kingdom. Email: [email protected]. Tel: +44-23-80592631

1 Introduction

Predictive regressions refer to simple regression models in which future values of a stationary

variable are predicted with the current value of a highly persistent predictor. A widely studied em-

pirical example involves the predictability of stock returns with Dividend Yields, Dividend Payouts

or Price to Earnings ratios. Typically, such predictors have sample autocorrelations that cluster

close to one, while the variables being predicted are substantially noisier. An additional layer of

complication that arises in this setting is the endogeneity induced by the correlation between the

shocks to the regressand and past values of the predictors. Over the past twenty years a very

fruitful agenda in both the finance and econometrics literature has aimed at improving inferences

in such settings. Research in this area has mainly been concerned with improving the properties

of tests designed to detect the statistical significance of the estimated slope coefficients associated

with the persistent predictors (see Stambaugh (1999), Lewellen (2003), Campbell and Yogo (2006),

Jansson and Moreira (2006), Cochrane (2008) amongst numerous others).

More recently there was also a recognition that such models may benefit from more flexible

structures that allow their parameters to change over time or according to economically meaningful

episodes (e.g. recessions versus expansions). One strand of this literature has sought to use

the existing toolkit for detecting structural breaks (e.g. Andrews SupWald type tests, Andrews

(1993)) within this predictive regression setting (see Rapach and Wohar (2006), Paye and Timmer-

mann (2006) amongst others) while more recently Gonzalo and Pitarakis (2012) adapted the Sup

Wald based tests developed in the statistical threshold model literature (Hansen (1996), Caner and

Hansen (2001)) to the predictive regression framework, specifically taking into consideration the

highly persistent nature of the predictor variables and the presence of endogeneity. The shortcom-

ings of this test based approach are its reliance on nonstandard asymptotics and its occasional poor

finite sample performance when sample sizes being considered are small. Perhaps more importantly

existing methods are also not designed to handle cases where the regime switches are themselves

triggered by a highly persistent variable.

Borrowing from the early work on the use of model selection criteria for detecting the presence

of single and multiple threshold effects developed in a purely stationary and ergodic framework


with strict exogeneity in Gonzalo and Pitarakis (2002), we propose to adapt a similar approach

to a predictive regression setting with threshold effects. Unlike traditional test based methods an

important feature of our approach is that it is flexible enough to allow the threshold variable to

be either stationary or highly persistent (e.g. nearly integrated) and it may even be the persistent

regressor itself. This is of considerable importance in applied work if one wishes to explore the

formation of regimes that are driven by quantities such as interest rates or valuation ratios. In-

troduced to the literature through the work of Tong (1983, 1990) and Tong and Lim (1980) (see

Hansen (2011), Tong (2011) and references therein) such models lead to a piecewise linear predic-

tive regression setting with the switch between regimes being triggered by an observable variable

exceeding or falling below an unknown threshold. As discussed in Gonzalo and Pitarakis (2012) a

particularly useful feature that such specifications may help capture is the notion that the strength

of predictability may be alternating over time according to particular episodes.

The plan of the paper is as follows. Section 2 introduces our model and assumptions and

investigates the large sample properties of our model selection approach. Section 3 provides an

extensive simulation exercise demonstrating the theory as well as providing a practical overview

of how our approach is performing in finite samples. Section 4 applies our methodology to the

predictability of stock returns for both the aggregate US market as well as Value and Growth

portfolios. Section 5 concludes.

2 The Models and The Selection Procedure

Our goal is to use a model selection procedure to distinguish between the following linear predictive

regression model M1

$$y_{t+1} = \alpha + \beta x_t + e_{t+1} \tag{1}$$

and the predictive regression model with threshold effects M2 formulated as

$$y_{t+1} = (\alpha_1 + \beta_1 x_t) I(z_t \le \gamma) + (\alpha_2 + \beta_2 x_t) I(z_t > \gamma) + e_{t+1} \tag{2}$$


with et denoting the random disturbance term, zt the threshold variable and xt the predictor.

Following common practice in the literature we parameterise xt as the nearly integrated process

$$x_t = \left(1 - \frac{c}{T}\right) x_{t-1} + v_t \tag{3}$$

with vt denoting another stationary disturbance and c > 0 the near integration parameter.
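
To fix ideas, the following minimal Python sketch generates a path from (3). It is an illustration only, not the code behind the results reported in this paper: the function name `simulate_near_integrated`, the zero initial condition and the Gaussian AR(1) shocks with coefficient `phi_v` are assumptions made for the example.

```python
import numpy as np

def simulate_near_integrated(T, c, phi_v=0.0, rng=None):
    """Simulate x_t = (1 - c/T) x_{t-1} + v_t with AR(1) shocks v_t."""
    rng = np.random.default_rng() if rng is None else rng
    rho = 1.0 - c / T                 # local-to-unity autoregressive root
    eps = rng.standard_normal(T)      # Gaussian innovations (assumption)
    v = np.zeros(T)
    x = np.zeros(T)
    v[0] = eps[0]
    x[0] = v[0]                       # assumes a zero value before the first shock
    for t in range(1, T):
        v[t] = phi_v * v[t - 1] + eps[t]
        x[t] = rho * x[t - 1] + v[t]
    return x

x = simulate_near_integrated(T=400, c=5.0, phi_v=0.5, rng=np.random.default_rng(0))
```

Smaller values of `c` push the root `1 - c/T` closer to one, so the simulated path becomes more persistent for a given sample size.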

Instead of proceeding with a traditional test based approach we here view the problem of testing

linearity within (2) as a model selection problem. The selection procedure is performed through the

optimisation of an objective function formulated as a standard information theoretic criterion such

as the AIC and BIC (Akaike (1973, 1974), Schwarz (1978)). The first component of the criterion is

given by a function of the residual variance estimated from each model and which is a decreasing

function of the model dimension and its second component penalises the increase in the number of

parameters as we move from M1 to M2. More specifically we let

$$IC(\gamma) = \ln \hat{\sigma}^2(\gamma) + \frac{4 c_T}{T} \tag{4}$$

denote our model selection criterion associated with M2 with 4 estimated parameters and where

cT is the deterministic penalty term. When cT = 2 we refer to (4) as an AIC type of criterion while

cT = ln T corresponds to the BIC. Note that since we are not assuming that γ is known IC(γ)

is evaluated at all possible values of γ ∈ Γ. Here $\Gamma = [\underline{\gamma}, \overline{\gamma}]$ is a subset of the threshold variable

sample space and following common practice it is understood that $\underline{\gamma}$ and $\overline{\gamma}$ are set by trimming a

fixed fraction of the top and bottom of zt.

We also write the model selection criterion associated with the linear specification as

$$IC = \ln \hat{\sigma}^2 + \frac{2 c_T}{T} \tag{5}$$

where here σ̂2 refers to the residual variance estimated from the linear model. Our model selection

rule can now be stated as leading to the selection of the linear model if

$$IC < \min_{\gamma \in \Gamma} IC(\gamma) \tag{6}$$

and to the choice of M2 otherwise i.e. there is a γ ∈ Γ such that IC > IC(γ). It is also convenient

to formulate the above rule as pointing to the choice of M2 if

$$\max_{\gamma \in \Gamma} T \ln\left(\hat{\sigma}^2 / \hat{\sigma}^2(\gamma)\right) > 2 c_T \tag{7}$$


which highlights the similarities between our model selection based approach and a conventional

test based approach. It is also important to point out that similar model selection based approaches

have been advocated in a wide range of other time series based contexts. In Chen and Gupta (1997)

for instance the authors designed a model selection based approach for detecting the presence of

structural breaks in the variance of a series. In Gonzalo and Pitarakis (1998) the authors intro-

duced a model selection approach for the specification of vector error correction models. Phillips

(2008) advocated the use of a model selection based approach for distinguishing between a sta-

tionary model and a random walk. In Gonzalo and Pitarakis (2002) the authors proposed a model

selection approach for the determination of the number of thresholds in a multiple threshold model.

Unlike our present context however Gonzalo and Pitarakis (2002) restricted their setting to strictly

stationary and ergodic specifications.
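
The equivalence between the rules in (6) and (7) can be verified in one line. The steps below are a worked restatement rather than an additional result, and they read the penalties in (4) and (5) as $4c_T/T$ and $2c_T/T$ respectively.

```latex
\begin{align*}
IC > IC(\gamma)
  &\iff \ln\hat{\sigma}^2 + \frac{2c_T}{T} > \ln\hat{\sigma}^2(\gamma) + \frac{4c_T}{T} \\
  &\iff T\ln\!\left(\hat{\sigma}^2/\hat{\sigma}^2(\gamma)\right) > 2c_T .
\end{align*}
```

M2 is selected as soon as this inequality holds for some γ ∈ Γ, which is the same as requiring the maximum over Γ of the left hand side to exceed 2cT.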

Before proceeding further we summarise our operating framework and assumptions. For notational

simplicity we rewrite (1) and (2) as y = Xβ + e and y = X1β1 + X2β2 + e. Here X

stacks the elements of (1 xt) while X1 and X2 stack (I1t xtI1t) and (I2t xtI2t) respectively, with I1t ≡ I(zt ≤ γ) and

I2t ≡ I(zt > γ). The dependence of X1 and X2 on γ is omitted for notational simplicity.
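
The selection rule built from these regressor matrices can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions rather than the implementation used for our results: the names `resid_var` and `select_model`, the use of observed values of zt between the trimming quantiles as the candidate grid, and the toy data at the end are all hypothetical. The input arrays are assumed to be aligned so that `y_next[t]` is paired with `x[t]` and `z[t]`.

```python
import numpy as np

def resid_var(y, X):
    """Residual variance (SSR / T) from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / len(y)

def select_model(y_next, x, z, c_T, trim=0.10):
    """Return ('M1' or 'M2', gamma_hat); gamma_hat minimises IC(gamma) in both cases."""
    T = len(y_next)
    X = np.column_stack([np.ones(T), x])
    ic_lin = np.log(resid_var(y_next, X)) + 2.0 * c_T / T       # linear criterion
    # Candidate thresholds: observed z values between the trimming quantiles.
    lo, hi = np.quantile(z, [trim, 1.0 - trim])
    grid = np.unique(z[(z >= lo) & (z <= hi)])
    best_ic, gamma_hat = np.inf, None
    for g in grid:
        low = (z <= g).astype(float)                            # regime indicator I(z_t <= g)
        X12 = np.column_stack([low, x * low, 1.0 - low, x * (1.0 - low)])
        ic_g = np.log(resid_var(y_next, X12)) + 4.0 * c_T / T   # threshold criterion
        if ic_g < best_ic:
            best_ic, gamma_hat = ic_g, g
    return ("M1" if ic_lin < best_ic else "M2"), gamma_hat

# Toy data with a deliberately large intercept shift at z = 0 (illustrative only).
rng = np.random.default_rng(0)
T = 200
x, z = rng.standard_normal(T), rng.standard_normal(T)
e = 0.1 * rng.standard_normal(T)
y_thr = np.where(z <= 0.0, 1.0 + 0.25 * x, 6.0 + 0.25 * x) + e
model, gamma_hat = select_model(y_thr, x, z, c_T=np.log(T))
```

Passing `c_T=2.0` gives the AIC type rule and `c_T=np.log(T)` the BIC type rule. The grid search costs one least squares fit per candidate threshold, which is adequate for the sample sizes considered here.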

In what follows we will operate under a set of standard primitive assumptions on et while the

rest of the probabilistic structure characterising (1)-(2) will be framed within a group of high level

assumptions. This will allow us to highlight the broad scope and applicability of our method. For

later use we let DT denote a deterministic normalising matrix defined as DT = diag(√T , T ). We

also refer to Ft as the sigma field generated by {wt−j , j ≥ 0} with wt = (et, vt, zt).

Assumptions A: $e_t$ is such that $E[e_t|F_{t-1}] = 0$, $E[e_t^2|F_{t-1}] = \sigma_e^2 < \infty$ and $\sup_t E[|e_t|^{2+\delta}|F_{t-1}] < \infty$

for some $\delta > 0$.

The above restrictions are the norm in the linear predictive regression literature (see for instance

Campbell and Yogo (2006) and references therein) and have also been used in Gonzalo and Pitarakis

(2012). The assumptions on et ensure that a functional CLT applies to et (see Chan and Wei (1977)).

The mean independence assumption is in a way similar to a standard least squares setting requiring

regressors to be orthogonal to the error term. Since our models are commonly used in the context

of the predictability of stock returns the m.d.s. setting makes intuitive sense. Note however that

within (1)-(2) the shocks to yt may be contemporaneously correlated with either zt and/or xt.


Following the predictive regression literature, endogeneity in the context of (1)-(2) refers to the

fact that the long run covariances of Brownian Motions associated with et, vt and the shocks to zt

may be nondiagonal due to their contemporaneous correlations. Next, we introduce the following

set of associated high level assumptions.

Assumptions B: (i) $D_T^{-1} X_i' e = O_p(1)$ and (ii) $D_T^{-1} X_i' X_i D_T^{-1} = O_p(1)$ for $i = 1, 2$.

The above expressions involve normalised versions of sample moments such as $\sum I_{it-1} e_t$,

$\sum x_{t-1} I_{it-1} e_t$, $\sum I_{it-1}$ and $\sum x_{t-1}^2 I_{it-1}$ and can be viewed as FCLT type of conditions. Note for instance that

under the primitive assumptions considered in Gonzalo and Pitarakis (2012) all of the above con-

ditions are satisfied when the threshold variable zt is taken as a strictly stationary and ergodic

process. More generally however the above also holds if the threshold variable is nearly integrated,

say zt = xt. In this latter instance the stochastic boundedness of the above quantities follows from

Seo (2008).

Proposition 1. Under Assumptions A and B, the model selection procedure with a penalty

term satisfying (i) cT → ∞ and (ii) cT /T → 0 as T → ∞ is such that P (M2|M1) → 0 and

P (M1|M2)→ 0.

The above proposition implies that our model selection procedure will point to the correct

model asymptotically provided that a suitable penalty term is used. It is interesting to note that the

presence of a highly persistent regressor (the nearly integrated process xt) has not affected the large

sample properties of our model selection based approach compared to its implementation within a

purely stationary multiple threshold context in Gonzalo and Pitarakis (2002). More importantly

the fact that our highly persistent regressor is parameterised with an unknown c that cannot be

estimated from the data is of no consequence on the ability of our model selection procedure to

reach a correct decision asymptotically. This is an important advantage of our approach since under

most frameworks a nearly integrated parameterisation typically translates into having limiting

distributions that depend on the unknown noncentrality parameter c. The above results suggest

that a BIC type penalty will ensure that the correct model is selected asymptotically while with

an AIC type of penalty the probability of wrongly selecting M2 when M1 is true does not vanish

asymptotically. On the other hand since cT /T → 0 for both penalties, both the BIC and AIC do

not underfit asymptotically.


Under further restrictions on the stochastic properties of the threshold variable zt and strict

stationarity and ergodicity of zt in particular it is also possible to gain further insight on the role

of the penalty term on model selection based inferences using the testing analogy in (7), since the

limiting distribution of Wald, LR or LM type conventional test statistics for testing linearity within

(2) follows directly from Gonzalo and Pitarakis (2012) where the authors established the validity

of a Brownian Bridge type of limit of the form Q(λ) = BB(λ)′BB(λ)/λ(1− λ) with λ ∈ (0, 1) and

BB(λ) denoting a two dimensional standard Brownian Bridge. It is also common practice to take

the supremum of Q(λ) over a closed subset of (0, 1) (see Andrews (1993)) by trimming the top and

bottom of the threshold variable sample space. Assuming equal trimming on both ends we have

λ ∈ [λ0, 1− λ0] ≡ Λ with λ0 denoting the trimming fraction (e.g. 10%).

It now follows that under the additional assumption that zt is strictly stationary and ergodic, in

large samples we can approximate the model selection based probability of selecting M2 when M1

is true as $P[\sup_{\lambda \in \Lambda} Q(\lambda) > 2c_T]$. This observation offers a very valuable setting within which to

explore the impact of the magnitude of our chosen penalty terms on our model selection procedure

since very accurate analytical approximations to such probabilities are readily available (see DeLong

(1981), James, Ling and Siegmund (1987) and Estrella (2003)).

Formally, we are interested in the probability that the squared Brownian Bridge process crosses

the boundary given by λ(1−λ)2cT . Letting z denote a generic cutoff, following Estrella (2003), we

can write

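$$P\left[\sup_{\lambda} Q(\lambda) > z\right] = P\left[\sup_{1<\lambda<r} \frac{\|BB(r)\|}{\sqrt{r}} > z\right] = z\, e^{-z/2} \left[\frac{1}{2}\left(1 - \frac{2}{z}\right)\log r + \frac{2}{z} + o(z^{-2})\right] \tag{8}$$

with $r = ((1 - \lambda_0)/\lambda_0)^2$ (see equation (26) in James, Ling and Siegmund (1987) specialised to a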

bivariate setting). For a given λ0 and a choice of z, expression (8) allows us to evaluate analytically

P [M2|M1]. From our equation (7) we are particularly interested in boundaries given by z = 2cT

so that (8) allows us to write

$$P[\mathrm{M2} \mid \mathrm{M1}] = 2 c_T\, e^{-c_T} \left[\frac{1}{2}\left(1 - \frac{1}{c_T}\right)\log r + \frac{1}{c_T} + o(c_T^{-2})\right]. \tag{9}$$

The above makes it clear that as cT → ∞ the probability of overfitting vanishes in large samples, a

requirement that is not satisfied by the AIC criterion. Under cT = 2 and setting λ0 = 0.1 the above


approximation leads to P[M2|M1] ≈ 87%. This figure is in fact close to our simulation based

estimates documented in Table 1 below which suggest that when M1 is the true model the AIC

based model selection procedure points to M1 about 20% of the time and to M2 about 80% of

the time. The discrepancy of about 7% between our simulations and the above approximation is

due to the fact that expressions such as (8) are valid as z →∞. Indeed, as we move to a larger BIC

type penalty, the accuracy of (8) becomes immediately clear. With cT = lnT and a sample size of

T = 1000 we have cT = 6.91 which leads to P [M2|M1] ≈ 2.5% which is remarkably close to the

figures we obtained in Table 1. Interestingly, this latter estimate also suggests that our BIC based

model selection procedure is analogous to a test based approach that uses a 2.5% nominal size.

Finally (8) also highlights the influence of the ad-hoc choice of λ0 on P [M2|M1]. Using λ0 = 0.25

instead of λ0 = 0.10 for instance leads to P [M2|M1] ≈ 57% under the AIC penalty highlighting

the important influence of the choice of a trimming parameter on inferences.
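
The arithmetic behind these figures is easy to reproduce. The short Python sketch below evaluates the leading terms of (9), dropping the o(cT^-2) remainder; the function name `overfit_prob` and its argument names are assumptions introduced for the illustration.

```python
import math

def overfit_prob(c_T, trim=0.10):
    """Leading-order approximation to P[M2 | M1] for penalty c_T and trimming fraction trim."""
    r = ((1.0 - trim) / trim) ** 2
    bracket = 0.5 * (1.0 - 1.0 / c_T) * math.log(r) + 1.0 / c_T
    return 2.0 * c_T * math.exp(-c_T) * bracket

p_aic_10 = overfit_prob(2.0, trim=0.10)   # AIC type penalty, 10% trimming
p_aic_25 = overfit_prob(2.0, trim=0.25)   # AIC type penalty, 25% trimming
p_bic = overfit_prob(math.log(1000))      # BIC type penalty with T = 1000
```

Under these assumptions `p_aic_10` and `p_aic_25` round to 0.87 and 0.57, in line with the figures quoted above, while `p_bic` falls between 2% and 3%.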

3 Model Selection in Finite Samples

We are initially interested in assessing the ability of our proposed model selection procedure not to

overfit by wrongly selecting the threshold specification when the correct model is given by the linear

model M1. Throughout all our simulations we consider two of the most commonly used penalty

terms in the wider literature and given by cT = 2 and cT = lnT . Note that cT = 2 does not satisfy

one of the conditions in our Proposition 1 and hence under this penalty the probability of selecting

M2 when M1 is true will not vanish asymptotically. Another important feature considered in our

simulations is the sensitivity of our methodology to the magnitude of c the noncentrality parameter.

For this purpose all our experiments are run for c ∈ {1, 5, 10} so that we cover regions that are near

the edge of the unit root as well as regions that are moderately far away from the unit circle.

Our first DGP is M1. We distinguish between two alternative scenarios about the threshold

variable zt. The latter is taken to either follow an AR(1) process, say zt = φzzt−1 + ezt with

|φz| < 1 or set to equal xt itself. We also let vt follow the AR(1) process vt = φvvt−1 + evt and let

the correlation structure of the disturbance vector {et, evt, ezt} be such that all three disturbances


may be jointly correlated through the covariance matrix

$$\Sigma_1 = \begin{pmatrix} 1 & -0.5 & 0.4 \\ -0.5 & 1 & 0.4 \\ 0.4 & 0.4 & 1 \end{pmatrix} \tag{10}$$

along with the uncorrelatedness scenario Σ0 = Id3. All random disturbances are generated as

normally distributed random variables throughout. Finally, we set (α, β) = (1, 0.25) and (φz, φv) =

(0.50, 0.50).

Under the scenario where zt = xt the disturbance vector is given by {et, evt} whose covariance

we set as

$$\tilde{\Sigma}_1 = \begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix} \tag{11}$$

together with the scenario Σ̃0 = Id2.
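
As an illustration of this design, the following Python sketch draws one sample from M1 under the stationary threshold scenario with covariance matrix Σ1. It is a sketch under stated assumptions and not the simulation code behind the tables: the function name `simulate_m1`, the zero initial conditions and the use of a Cholesky factor to correlate the Gaussian shocks are choices made for the example.

```python
import numpy as np

# Covariance matrix of (e_t, e_vt, e_zt) under the endogenous scenario.
SIGMA_1 = np.array([[1.0, -0.5, 0.4],
                    [-0.5, 1.0, 0.4],
                    [0.4, 0.4, 1.0]])

def simulate_m1(T, c, sigma=SIGMA_1, alpha=1.0, beta=0.25,
                phi_z=0.5, phi_v=0.5, rng=None):
    """Draw (y_next, x, z) from the linear DGP M1 with an AR(1) threshold variable."""
    rng = np.random.default_rng() if rng is None else rng
    # Rows of `shocks` are (e_t, e_vt, e_zt) with covariance matrix `sigma`.
    shocks = rng.standard_normal((T + 1, 3)) @ np.linalg.cholesky(sigma).T
    e, e_v, e_z = shocks.T
    v = np.zeros(T + 1)
    x = np.zeros(T + 1)
    z = np.zeros(T + 1)
    for t in range(1, T + 1):          # zero initial conditions (assumption)
        v[t] = phi_v * v[t - 1] + e_v[t]
        x[t] = (1.0 - c / T) * x[t - 1] + v[t]
        z[t] = phi_z * z[t - 1] + e_z[t]
    y_next = alpha + beta * x[:-1] + e[1:]   # y_{t+1} = alpha + beta x_t + e_{t+1}
    return y_next, x[:-1], z[:-1]

y_next, x, z = simulate_m1(T=200, c=5.0, rng=np.random.default_rng(0))
```

Replacing `sigma` with the identity matrix gives the uncorrelated scenario, and returning `x[:-1]` in place of `z[:-1]` would mimic the case in which the predictor itself acts as the threshold variable.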

Note that the above framework allows a great degree of generality that goes beyond what is

typically assumed in the traditional test based literature. Table 1 presents the correct decision

frequencies associated with our model selection procedure under the stationary threshold scenario

while Table 2 focuses on the case where zt = xt.

Table 1: Correct Decision Frequencies (%) under M1 and a Stationary Threshold Variable.

                  BIC, Σ0                    BIC, Σ1
            c=1     c=5    c=10        c=1     c=5    c=10
T=200     91.40   91.60   91.70      91.40   91.90   92.40
T=400     95.00   94.90   95.20      95.40   95.20   95.10
T=1000    97.60   97.50   97.50      97.10   97.60   97.62

                  AIC, Σ0                    AIC, Σ1
            c=1     c=5    c=10        c=1     c=5    c=10
T=200     22.70   22.50   22.60      22.10   21.50   22.30
T=400     22.20   22.40   22.50      22.00   22.60   22.35
T=1000    21.50   20.20   19.50      21.00   21.25   22.00

Our results unequivocally highlight the reliability and strong performance of a BIC type penalty.

Even under a moderately small sample size such as T = 200 the correct decision frequencies

associated with our BIC based model selection approach exceed 91% under a stationary threshold

variable (Table 1) and reach figures close to 100% for larger sample sizes. Perhaps more interestingly

comparing the frequencies in Tables 1 and 2 suggests that the performance of our model selection

procedure remains equally strong when the threshold variable is taken to be the highly persistent

regressor itself.

Table 2: Correct Decision Frequencies (%) under M1 and a Nearly Integrated Threshold Variable.

                  BIC, Σ0                    BIC, Σ1
            c=1     c=5    c=10        c=1     c=5    c=10
T=200     89.50   90.15   89.80      86.85   88.40   88.55
T=400     93.05   93.25   93.75      92.50   93.00   93.25
T=1000    96.65   97.05   97.45      96.60   96.75   96.82

                  AIC, Σ0                    AIC, Σ1
            c=1     c=5    c=10        c=1     c=5    c=10
T=200     13.70   14.30   16.10      13.50   14.35   16.15
T=400     10.40   14.10   13.95      10.40   13.65   14.10
T=1000    12.30   10.40   11.55      10.50   11.60   13.20

We also note that the degree of persistence of xt has little influence on the correct decision

frequencies when the true model is linear. Looking at the magnitudes of the correct decision

frequencies under exogeneity versus endogeneity (i.e. Σ0 versus Σ1 or Σ̃0 versus Σ̃1) we can also

infer that the nature of the joint interactions between et, evt and ezt has virtually no impact on

correct decision frequencies. Finally we continue to note the inadequacy of an AIC type penalty.

The AIC based model selection criterion is pointing to the less parsimonious nonlinear specification

too often, only being able to select the true model about 22% of the times under a stationary

threshold variable and about 10% of the times when zt = xt. This is expected from our theoretical

results since under a constant penalty the probability of overfitting does not vanish asymptotically.

Our next goal is to explore the ability of our model selection approach to reach correct decisions

when the correct specification is given by the threshold model M2. We also use our experiments

to illustrate the fact that although the AIC displays a strong tendency to overfit its associated

probability of wrongly pointing to M1 when M2 is true vanishes asymptotically. Given the strong

performance of the BIC documented in Table 1 it is also important to explore whether this is due

to its ability to detect the correct model or whether it arises spuriously due to the strength of its

penalty. While maintaining the same probabilistic structure as in Table 1 we here consider the

following additional parameterisations for the αi's and βi's in (2). We set (α1, β1) = (1.00, 0.25)

and move α2 and β2 away from (α1, β1) in increments. Having demonstrated the robustness of the

model selection approach to alternative magnitudes of c we conduct our power experiments setting

c = 1 throughout. Table 3 below summarises and labels our alternative parameterisations.

Table 3: Model Parameterisation.

DGP      α1      α2      β1      β2
A      1.00    1.20    0.25    0.25
B      1.00    1.40    0.25    0.25
C      1.00    1.00    0.25    0.50
D      1.00    1.50    0.25    0.25
E      1.00    1.20    0.25    0.50
F      1.00    1.20    0.25    0.35

We must also stress that across all of the above parameterisations the threshold variable is assumed

to follow either an AR(1) process or be given by xt itself as in our earlier experiments and we set

γ = 0 throughout.
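
For readers wishing to replicate these designs, the parameterisations in Table 3 can be encoded compactly. The Python sketch below is illustrative only; the dictionary `DGPS` and the helper `threshold_mean` are hypothetical names, and the function returns the conditional mean in (2) to which the disturbance et+1 would then be added.

```python
import numpy as np

# Table 3 parameterisations, stored as (alpha1, alpha2, beta1, beta2).
DGPS = {
    "A": (1.00, 1.20, 0.25, 0.25),
    "B": (1.00, 1.40, 0.25, 0.25),
    "C": (1.00, 1.00, 0.25, 0.50),
    "D": (1.00, 1.50, 0.25, 0.25),
    "E": (1.00, 1.20, 0.25, 0.50),
    "F": (1.00, 1.20, 0.25, 0.35),
}

def threshold_mean(x, z, dgp, gamma=0.0):
    """Conditional mean of y_{t+1} under M2 for a labelled Table 3 design."""
    a1, a2, b1, b2 = DGPS[dgp]
    return np.where(z <= gamma, a1 + b1 * x, a2 + b2 * x)

# Two observations with the same predictor value but different regimes.
means_c = threshold_mean(np.array([2.0, 2.0]), np.array([-1.0, 1.0]), "C")
```

Under design C the two regimes differ only through the slope, so `means_c` contains 1.5 for the observation with zt ≤ 0 and 2.0 for the one with zt > 0.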

Table 4: Correct Decision Frequencies (%) under M2 and a Stationary Threshold Variable.

γ = 0.0
             Σ0, BIC                 Σ1, BIC                 Σ0, AIC                 Σ1, AIC
DGP    T=200   T=400  T=1000   T=200   T=400  T=1000   T=200   T=400  T=1000   T=200   T=400  T=1000
A      21.10   26.60   52.00   18.30   23.60   51.10   87.50   93.60   99.40   84.90   92.40   99.00
B      61.20   85.90  100.00   56.30   84.65   98.25   97.90  100.00  100.00   99.25  100.00  100.00
C     100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00
D      80.90   98.20  100.00   77.20   98.80  100.00   99.50  100.00  100.00   99.15  100.00  100.00
E     100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00
F      98.60  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00

Our results in Table 4 continue to suggest that the BIC based model selection approach has

a good and often excellent ability to detect deviations from M1 even under scenarios in which

the two models are very close. Note for instance that scenario C keeps intercepts constant while

allowing the slope parameter β to switch by an increment of 0.25 units. Even under T=400 the

correct decision frequencies cluster at 100% under most scenarios. Interestingly and as in our M1

based experiments we note the little impact that the exogeneity versus endogeneity distinction has

on correct decision frequencies. This is an important and powerful feature of our model selection

approach when compared with traditional tests.

Table 5: Correct Decision Frequencies (%) under M2 and a Nearly Integrated Threshold Variable.

γ = 0.0
             Σ0, BIC                 Σ1, BIC                 Σ0, AIC                 Σ1, AIC
DGP    T=200   T=400  T=1000   T=200   T=400  T=1000   T=200   T=400  T=1000   T=200   T=400  T=1000
A      13.40   11.50   13.00   14.75   10.90   12.95   86.95   91.60   94.25   88.30   90.60   94.40
B      23.75   29.80   50.85   23.70   28.10   49.25   91.50   96.00   98.10   91.15   95.65   98.50
C      71.15   81.00   86.90   72.25   80.35   85.10   97.10   98.82   99.10   96.35   98.35   99.15
D      30.90   39.00   67.90   31.10   38.75   66.05   93.00   97.10   98.50   92.65   97.10   99.00
E      71.98   81.28   86.65   72.10   80.25   87.50   97.15   99.00   99.00   97.05   98.55   99.40
F      46.10   63.72   78.95   45.70   63.10   79.70   94.50   96.10   99.10   94.85   96.90   98.80

For comparison purposes we have also implemented the SupWaldA test developed in Gonzalo

and Pitarakis (2012) on the same set of experiments labelled as A − F and with the covariance

structure given by Σ1. Under T=200 and a 5% significance level the empirical powers associated

with models A−F were 12.1%, 58.3%, 100%, 70.8%, 100% and 98.3% respectively compared with

18.3%, 56.3%, 100%, 77.2%, 100% and 100% for our BIC based model selection procedure.

Comparing the correct decision frequencies in Table 4 with those in Table 5 it is also clear

that the nature of the threshold variable has an important impact on the magnitude of the correct

decision frequencies when the true model is given by the threshold specification. We observe an

important drop in power when the regime switches are triggered by a nearly integrated threshold

variable as opposed to a purely stationary one. Under some scenarios such as A and B under which

only the intercept parameters shift across regimes power is reduced by more than half. It is only

when the DGPs are characterised by switches in both their intercept and slope parameters that

correct decision frequencies reach good levels in smaller samples.

Regarding the behaviour of the AIC, Tables 1 and 2 have made it clear that it is not a suitable

criterion in the present context. Even when the true model is linear AIC based inferences tend

to over select M2 which suggests that the correct decision frequencies documented in Tables 4-5

are mainly due to the inherent nature of the AIC to point to the least parsimonious specification.


Tables 4-5 also make it clear that when the true model is M2 AIC based inferences do not point

to M1 as established in Proposition 1. Indeed since AIC’s penalty satisfies the requirement that

cT /T → 0 the associated probability of underfitting vanishes asymptotically.

4 Dividend Payouts and Returns Across Regimes

We use our methodology to explore the presence of threshold effects in the predictability of aggregate

stock returns with an emphasis on predictability induced by dividend payout ratios defined as the

log difference between dividends and earnings.

Lamont (1998) documented a strong positive and statistically significant association between

future returns on the SP500 index and current dividend payout ratios while Bali, Demirtas and

Tehranian (2008) demonstrate the lack of robustness of this result across different sample periods.

More recently, using a novel methodology and monthly CRSP value weighted returns Kostakis,

Magdalinos and Stamatogiannis (2010) further document the absence of any predictive power for

the dividend payout ratio.

Here we aim to illustrate our methodology by using it to document the significant impact that

the inclusion of regimes may have on predictability and on the above debate. Our regimes take

the form of good versus bad times by using the growth in industrial production as our threshold

variable. This then allows us to assess whether the potential predictability of returns with dividend

payouts is affected by the state of the economy.

Our dependent variable is the monthly value weighted returns series (inclusive of distributions)

for NYSE, AMEX and NASDAQ obtained from the CRSP database and covering the period 1950-

2007. Excess returns are subsequently obtained using 90-day treasury bill rates. In addition to

the aggregate returns our analysis below also considers the predictability of returns to five value

weighted portfolios constructed by sorting the universe of stocks into book-to-market (BM) quintiles

so as to distinguish between portfolios made up of so called value (or cheap) stocks and their growth

(expensive) counterparts. This latter data is publicly available through Kenneth French’s data

library.


We let erdt denote the excess return series corresponding to the aggregate market while the

excess returns associated with the five BM based quintile portfolios are denoted erbm1t, erbm2t,

erbm3t, erbm4t and erbm5t. Note that erbm1t refers to the returns to the first quintile portfolio

(i.e. low BM portfolio) while erbm5t refers to the returns to the top quintile (i.e. high BM

portfolio). In what follows we let det refer to our predictor variable and ipgrt to the growth rate

in the seasonally adjusted industrial production index. Our baseline threshold model is as in (2)

with yt = {erdt, erbm1t, . . . , erbm5t}, xt = det and zt = ipgrt.

Before proceeding with the implementation of our model selection procedure we have estimated

the linear predictive regression specification yt+1 = α + βxt + et+1 associated with each series so

as to assess whether the dividend payout variable displays any linear predictive power within our

sample. Table 6 displays the estimated slope coefficients associated with det together with the

t-statistics and the R2 of each estimated regression model.

Table 6: Linear Predictability

               β̂     t(β=0)      R2
erdt        0.0050    0.658    0.001
erbm1t      0.0077    0.885    0.001
erbm2t     -0.0005   -0.068    0.000
erbm3t      0.0029    0.414    0.000
erbm4t      0.0011    0.159    0.000
erbm5t      0.0019    0.238    0.000

The above results strongly and unequivocally point towards a complete absence of any linear

predictability induced by the dividend payout variable. Results are also robust across both the

aggregate market and the returns to the individual BM based portfolios.
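
The quantities reported for each series in Table 6 can be obtained from a standard least squares fit, sketched below in Python. The sketch assumes conventional homoskedastic OLS standard errors, which may differ from the standard errors used for the table, and the function name `predictive_ols` together with the four-point toy data are hypothetical.

```python
import numpy as np

def predictive_ols(y_next, x):
    """OLS of y_{t+1} on (1, x_t): returns the slope, its t-ratio and the R^2."""
    T = len(y_next)
    X = np.column_stack([np.ones(T), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y_next
    resid = y_next - X @ beta
    s2 = resid @ resid / (T - 2)                 # homoskedastic variance estimate
    se_slope = np.sqrt(s2 * xtx_inv[1, 1])
    r2 = 1.0 - resid @ resid / np.sum((y_next - y_next.mean()) ** 2)
    return beta[1], beta[1] / se_slope, r2

# Tiny toy sample, used only to show the call signature.
slope, t_ratio, r2 = predictive_ols(np.array([0.0, 1.0, 1.0, 2.0]),
                                    np.array([0.0, 1.0, 2.0, 3.0]))
```

With a highly persistent and endogenous predictor the t-ratio computed this way need not follow its usual distribution, which is the inference problem addressed by the literature cited in the introduction.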

We next explore the potential presence of threshold nonlinearities in the relationship linking

returns and dividend payouts through the implementation of our model selection procedure. Results

are presented in Table 7 which displays the magnitudes of the model selection criteria for the

two competing models.

Table 7: Model Selection

              IC     IC(γ̂)    Model
erdt       -6.332   -6.345     M2
erbm1t     -6.114   -6.122     M2
erbm2t     -6.283   -6.289     M2
erbm3t     -6.392   -6.400     M2
erbm4t     -6.362   -6.359     M1
erbm5t     -6.096   -6.092     M1

Interestingly the threshold model appears to be supported by the data for the aggregate series as

well as the low book-to-market portfolios while for the last two high book-to-market quintiles the

linear model has stronger support. In this latter case and based on our results in Table 6 it is also

clear that dividend payouts have no predictive power for future returns to high BM portfolios,

a result in line with Kostakis, Magdalinos and Stamatogiannis (2010). Given the support for model

M2 for the remaining portfolios' returns however we next aim to document the explicit influence of

regimes on det induced predictability. This is achieved through the estimation of (2) for both the

aggregate series and the first three BM based portfolios.

Table 8: Threshold Models

                     ipgrt ≤ γ̂                          ipgrt > γ̂
             β̂1   t(β1=0)    R2_1     T1       β̂2   t(β2=0)    R2_2     T2        γ̂
erdt       0.094    4.660   0.149    131   -0.012   -1.340   0.003    564   -0.0036
erbm1t     0.101    4.243   0.142    128   -0.009   -1.022   0.002    567   -0.0036
erbm2t     0.081    4.265   0.118    119   -0.015   -1.897   0.005    576   -0.0040
erbm3t     0.082    4.738   0.127    131   -0.012   -1.587   0.004    564   -0.0035

Results presented in Table 8 are striking

when compared with the total absence of any statistical significance documented in Table 6 under

linearity. Across all four portfolios we observe a substantial reversal in the statistical significance

of β̂1 associated with the impact of det during the low growth regimes. Dividend payout ratios are

strongly and positively associated with future returns but solely during low growth periods. The

switch in the goodness of fit across regimes is also remarkable, moving from 0% to magnitudes

close to 15%. Our results further suggest that if dividend payout based predictability is to be

interpreted as time varying expected returns consistent with the efficient markets hypothesis it will

be important to rationalise the reasons why this may be true for some but not all portfolios.


5 Conclusions

In this paper we have introduced a model selection based approach for uncovering the presence

of threshold effects in predictive regressions. We have shown it to display favourable large sample

properties and more importantly an excellent performance in finite samples. We further illustrated

its usefulness through an application to the predictability of stock returns. Besides the simplicity

of its implementation an important feature of our approach is its flexibility when it comes to the

stochastic properties of the threshold variables.


REFERENCES

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In:

Petrov, B.N., Csaki, F. (Eds.), Second International Symposium on Information Theory. Akademiai

Kiado, Budapest, pp. 267-281.

Akaike, H. (1974). ‘A new look at the statistical model identification’, IEEE Transactions on

Automatic Control, 19, 716-723.

Andrews, D.W.K. (1993). ‘Test for parameter instability and structural change with unknown

change point’, Econometrica, 61, 821-856.

Bali, T. G., Demirtas, K. O. and Tehranian, H. (2008). ‘Aggregate Earnings, Firm-Level

Earnings, and Expected Stock Returns’, Journal of Financial and Quantitative Analysis, 43, 657-

684.

Caner, M. and Hansen, B.E. (2001). ‘Threshold autoregression with a unit root’, Econometrica,

69, 1555-1596.

Campbell, J. Y. and Yogo, M. (2006). ‘Efficient tests of stock return predictability’, Journal of

Financial Economics, 81, 27-60.

Chan, N. H. and Wei, C. Z. (1987). ‘Limiting Distributions of Least Squares Estimates of

Unstable Autoregressive Processes’, Annals of Statistics, 15, 1050-1063.

Chen, J. and Gupta, A. K. (1997). ‘Testing and Locating Variance Changepoints with Application to Stock Prices’, Journal of the American Statistical Association, 92, 739-747.

Cochrane, J. H. (2008). ‘The Dog That Did Not Bark: A Defense of Return Predictability’,

Review of Financial Studies, 21, 1533-1575.

DeLong, D. M. (1981). ‘Crossing Probabilities for a square root boundary by a Bessel Process’,

Communications in Statistics, A10, 2197-2213.

Estrella, A. (2003). ‘Critical Values and P Values of Bessel Process Distributions: Computation

and Application to Structural Break Tests’, Econometric Theory, 19, 1128-1143.


Gonzalo, J. and Pitarakis, J. (1998). ‘Specification via model selection in vector error correction

models’, Economics Letters, 60, 321-328.

Gonzalo, J. and Pitarakis, J. (2002). ‘Estimation and Model Selection Based Inference in Simple

and Multiple Threshold Models’, Journal of Econometrics, 110, 319-352.

Gonzalo, J. and Pitarakis, J. (2012). ‘Regime Specific Predictability in Predictive Regressions’,

Journal of Business and Economic Statistics, 30, 229-241.

Hansen, B. E. (2011). ‘Threshold Autoregression in Economics’, Statistics and Its Interface, 4,

123-127.

Hansen, B.E. (1996). ‘Inference when a nuisance parameter is not identified under the null

hypothesis’, Econometrica, 64, 413-430.

Jansson, M. and Moreira, M. J. (2006). ‘Optimal Inference in Regression Models with Nearly

Integrated Regressors’, Econometrica, 74, 681-714.

James, B., Ling, J. and Siegmund, D. (1987). ‘Test for a change-point’, Biometrika, 74, 71-83.

Kostakis, A., Magdalinos, A. and Stamatogiannis, M. (2010). ‘Robust econometric inference

for stock return predictability’, Unpublished Manuscript, University of Southampton.

Lamont, O. (1998). ‘Earnings and Expected Returns’, Journal of Finance, 53, 1563-1587.

Lewellen, J. (2003). ‘Predicting Returns with Financial Ratios’, Journal of Financial Economics, 74, 209-235.

Paye, B. S. and Timmermann, A. (2006). ‘Instability of Return Prediction Models’, Journal of

Empirical Finance, 13, 274-315.

Phillips, P. C. B. (2008). ‘Unit Root Model Selection’, Journal of the Japanese Statistical

Society, 38, 65-74.

Rapach, D. E. and Wohar, M. E. (2006). ‘Structural Breaks and Predictive Regression Models

of Aggregate US Stock Returns’, Journal of Financial Econometrics, 4, 238-274.

Schwarz, G. (1978). ‘Estimating the dimension of a model’, Annals of Statistics, 6, 461-464.


Seo, M. H. (2008). ‘Unit Root Test in a Threshold Autoregression: Asymptotic Theory and

Residual-Based Block Bootstrap’, Econometric Theory, 24, 1699-1716.

Stambaugh, R. F. (1999). ‘Predictive Regressions’, Journal of Financial Economics, 54, 375-

421.

Tong, H. (1983). Threshold Models in Non-linear Time Series Analysis. Lecture Notes in

Statistics, Vol. 21. Springer, Berlin.

Tong, H. (1990). Non-Linear Time Series: A Dynamical System Approach. Oxford University

Press, Oxford.

Tong, H. and Lim, K.S. (1980). ‘Threshold autoregression, limit cycles and cyclical data’,

Journal of The Royal Statistical Society Series B, 42, 245-292.

Tong, H. (2011). ‘Threshold models in time-series analysis - 30 years on’, Statistics and Its Interface, 4, 107-136.


APPENDIX

PROOF OF PROPOSITION 1. We initially consider the case where the true model is given by M2 while the fitted model is M1. It now suffices to show that $P[IC < IC(\gamma_0)] \to 0$ as $T \to \infty$, hence implying that our model selection procedure does not point to M1 asymptotically. Since the true model is given by $y = X_1^0\beta_1 + X_2^0\beta_2 + e$, standard algebra gives

$$
\hat{\sigma}^2(\gamma_0) = \frac{e'e}{T} - \frac{e'X_1^0 (X_1^{0\prime}X_1^0)^{-1} X_1^{0\prime}e}{T} - \frac{e'X_2^0 (X_2^{0\prime}X_2^0)^{-1} X_2^{0\prime}e}{T} \tag{12}
$$

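so that from our Assumptions A and B we have $\hat{\sigma}^2(\gamma_0) = e'e/T + o_p(1)$ and $IC(\gamma_0) = \ln(e'e/T) + 4c_T/T + o_p(1)$. When the fitted model is M1, however, so that $\hat{\sigma}^2 = (y'y - y'X(X'X)^{-1}X'y)/T$ with $y = X_1^0\beta_1 + X_2^0\beta_2 + e$, lengthy but standard algebra gives

$$
\begin{aligned}
\hat{\sigma}^2 ={}& \frac{e'e}{T} - \frac{1}{T}\, e'X(X'X)^{-1}X'e + \frac{1}{T}(\beta_1-\beta_2)' X_1^{0\prime}X_1^0 (X'X)^{-1} X_2^{0\prime}X_2^0 (\beta_1-\beta_2) \\
&+ \frac{1}{T}\left[ 2\beta_1' X_1^{0\prime}e + 2\beta_2' X_2^{0\prime}e - 2\beta_1'(X_1^{0\prime}X_1^0)(X'X)^{-1}X'e - 2\beta_2'(X_2^{0\prime}X_2^0)(X'X)^{-1}X'e \right].
\end{aligned} \tag{13}
$$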
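The decomposition in (13) is an exact algebraic identity, which makes it easy to check numerically. The following Python sketch does so on simulated data; the random-walk regressor, the threshold at zero, the parameter values and all variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = np.cumsum(rng.standard_normal(T))        # persistent regressor (random walk)
q = rng.standard_normal(T)                   # stationary threshold variable
X = np.column_stack([np.ones(T), x])
low = (q <= 0.0)[:, None]                    # regime indicator, true threshold 0
X1, X2 = X * low, X * ~low                   # regime-specific regressors, X = X1 + X2
b1 = np.array([0.1, 0.5])
b2 = np.array([-0.2, 0.0])
e = rng.standard_normal(T)
y = X1 @ b1 + X2 @ b2 + e                    # data generated under M2

XtX_inv = np.linalg.inv(X.T @ X)
A1, A2 = X1.T @ X1, X2.T @ X2
d = b1 - b2

# Left-hand side: residual variance from fitting the linear model M1.
lhs = (y @ y - y @ X @ XtX_inv @ X.T @ y) / T

# Right-hand side: the four groups of terms in (13).
rhs = (e @ e / T
       - e @ X @ XtX_inv @ X.T @ e / T
       + d @ A1 @ XtX_inv @ A2 @ d / T
       + (2 * b1 @ X1.T @ e + 2 * b2 @ X2.T @ e
          - 2 * b1 @ A1 @ XtX_inv @ X.T @ e
          - 2 * b2 @ A2 @ XtX_inv @ X.T @ e) / T)

print(np.isclose(lhs, rhs))
```

The check relies on the two regimes being disjoint, so that $X_1^{0\prime}X_2^0 = 0$ and $X'X = X_1^{0\prime}X_1^0 + X_2^{0\prime}X_2^0$; the quadratic form in $\beta_1 - \beta_2$ is the term that keeps the misspecified linear fit away from the true error variance.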

From our Assumptions A and B it immediately follows that the third term in the right hand side of (13) is

$$
\frac{1}{T}(\beta_1-\beta_2)' X_1^{0\prime}X_1^0 D_T^{-1} (D_T^{-1}X'XD_T^{-1})^{-1} D_T^{-1} X_2^{0\prime}X_2^0 (\beta_1-\beta_2) = O_p(T) \tag{14}
$$

since $D_T^{-1}X'XD_T^{-1} = O_p(1)$ and $D_T^{-1}X_i'X_iD_T^{-1} = O_p(1)$. Furthermore, since $D_T^{-1}X'e = O_p(1)$, $D_T^{-1}X_i'e = O_p(1)$ and $e'e/T \xrightarrow{p} \sigma^2$, the remaining terms in (13) are $O_p(1)$, implying that $\hat{\sigma}^2 - \hat{\sigma}^2(\gamma_0) = O_p(T)$ and thus leading to the required result provided that $c_T/T \to 0$, since $P[IC < IC(\gamma_0)] = P[\ln\hat{\sigma}^2 < \ln\hat{\sigma}^2(\gamma_0) + 2c_T/T]$.

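Next, we concentrate on the case where the true model is given by the linear specification M1 and note that $P[M2|M1] \to 0$ is equivalent to $P[IC > IC(\gamma)] \to 0$ for some $\gamma \in \Gamma$. This allows us to write

$$
\begin{aligned}
P[IC > IC(\gamma)] &\le P[IC > \min_{\gamma} IC(\gamma)] \\
&= P\left[\max_{\gamma}\, T \ln\left(\frac{\hat{\sigma}^2}{\hat{\sigma}^2(\gamma)}\right) > 2c_T\right].
\end{aligned} \tag{15}
$$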

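Under M1 we have

$$
\hat{\sigma}^2(\gamma) = \frac{e'e}{T} - \frac{1}{T}\sum_{i=1}^{2} e'X_i D_T^{-1} (D_T^{-1}X_i'X_iD_T^{-1})^{-1} D_T^{-1}X_i'e \tag{16}
$$

so that from Assumptions A and B it follows that $\hat{\sigma}^2(\gamma) \xrightarrow{p} \sigma_e^2$ and $T(\hat{\sigma}^2 - \hat{\sigma}^2(\gamma)) = O_p(1)$, leading to $\max_{\gamma} T\ln(\hat{\sigma}^2/\hat{\sigma}^2(\gamma)) = O_p(1)$. Thus under $c_T \to \infty$ it follows that the right hand side of (15) vanishes asymptotically as required.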
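The second half of the argument can be illustrated with a small Monte Carlo experiment that estimates how often the statistic in (15) exceeds $2c_T$ when the data are generated by the linear model. The sketch below is only indicative and rests on several assumptions not taken from the paper: a BIC-type penalty $c_T = \ln T$, a linear model with zero slope, no endogeneity, an i.i.d. Gaussian threshold variable, and a coarse grid of 41 quantiles in place of a full search over $\Gamma$. The function names are hypothetical.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def false_selection_rate(T, reps, seed=0, rho=0.98):
    """Share of replications in which M2 is chosen although M1 is true."""
    rng = np.random.default_rng(seed)
    c_T = np.log(T)                              # BIC-type penalty (assumption)
    wrong = 0
    for _ in range(reps):
        v = rng.standard_normal(T)
        x = np.zeros(T)
        for t in range(1, T):
            x[t] = rho * x[t - 1] + v[t]         # highly persistent predictor
        q = rng.standard_normal(T)               # stationary threshold variable
        y = rng.standard_normal(T)               # linear model with zero slope
        X = np.column_stack([np.ones(T), x])
        s2_lin = rss(y, X) / T
        max_stat = 0.0
        for gamma in np.quantile(q, np.linspace(0.10, 0.90, 41)):
            low = q <= gamma
            s2_thr = (rss(y[low], X[low]) + rss(y[~low], X[~low])) / T
            max_stat = max(max_stat, T * np.log(s2_lin / s2_thr))
        wrong += int(max_stat > 2.0 * c_T)       # event inside (15)
    return wrong / reps

rate_small = false_selection_rate(T=100, reps=200)
rate_large = false_selection_rate(T=400, reps=200)
print(rate_small, rate_large)
```

Since the maximised statistic is $O_p(1)$ under M1 while $2c_T$ diverges, the estimated frequencies should shrink as $T$ grows, although with a logarithmic penalty the decline is slow and Monte Carlo noise can mask it at these sample sizes.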
