Development and Application of Hidden Markov Models in the Bayesian Framework
by
Yong Song
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Economics
University of Toronto
Copyright © 2011 by Yong Song
Abstract
Development and Application of Hidden Markov Models in the Bayesian Framework
Yong Song
Doctor of Philosophy
Graduate Department of Economics
University of Toronto
2011
This thesis develops new hidden Markov models and applies them to financial market
and macroeconomic time series.
Chapter 1 proposes a probabilistic model of the return distribution with rich and
heterogeneous intra-regime dynamics. It focuses on the characteristics and dynamics of
bear market rallies and bull market corrections, including, for example, the probability
of transition from a bear market rally into a bull market versus back to the primary bear
state. A Bayesian estimation approach accounts for parameter and regime uncertainty
and provides probability statements regarding future regimes and returns. A Value-at-
Risk example illustrates the economic value of our approach.
Chapter 2 develops a new efficient approach to model and forecast time series data
with an unknown number of change-points. The key is to assume a conjugate prior for the time-varying parameters that characterize each regime and to treat the regime duration as a state variable. Conditional on this prior and the time-invariant parameters, the predictive density and the posterior of the change-points have closed forms. The conjugate prior is further modeled as hierarchical to exploit the information across regimes. This framework allows breaks in the variance, the regression coefficients or both. In addition to the time-invariant structural change probability, one extension assumes the regime duration has a Poisson distribution. A new Markov chain Monte Carlo sampler
draws the parameters from the posterior distribution efficiently. The model is applied to
Canadian inflation time series.
Chapter 3 proposes an infinite dimension Markov switching model to accommodate regime switching, structural break dynamics, or a combination of both in a Bayesian framework. Two parallel hierarchical structures, one governing the transition probabilities and another governing the parameters of the conditional data density, keep the model parsimonious and improve forecasts. This nonparametric approach allows for regime persistence and estimates the number of states automatically. A global identification algorithm for structural changes versus regime switching is presented. Applications to U.S. real interest rates and inflation compare the new model to existing parametric alternatives. Besides identifying episodes of regime switching and structural breaks, the hierarchical distribution governing the parameters of the conditional data density provides significant gains in forecasting precision.
Dedication
This thesis is dedicated to my parents, Deren Song and Jiwei Li, who brought me into the world and have always supported my decision to pursue an academic career.
It is also dedicated to my aunt Jihong Li, who treats me like her own son.
Last and most importantly, it is dedicated to my wife, Mei Dong, who always encourages me when I face difficulties. She is my super woman.
Acknowledgements
I cannot overstate my gratitude to my supervisor, Professor John Maheu. He guided me not only through this thesis but also through my Ph.D. years. For five years, he has taught me everything from the very basics to the research frontier. His knowledge and sharp intuition in economics and econometrics helped me avoid many detours in my research. His enthusiasm and inspiration helped me keep up my morale during difficult times. Professor Maheu is the nicest and fairest person I have ever met, and I can feel his warmth whenever I talk with him, whether about research or personal life. Econometrics is very challenging for me since I do not have a strong mathematics background. His personality was as important as his expertise in keeping me going and helping me finish my Ph.D. degree. Thanks to him, I am very happy and excited to work in my research field.
I am grateful to Professor Thomas McCurdy and Professor Martin Burda, who are on my thesis committee. Professor McCurdy is one of my coauthors on the first chapter of this thesis. I learned a lot from him about research methodology during our collaboration. He is very supportive and has given me many opportunities to present my work. Professor Burda gave me helpful advice to improve my presentation skills. Although they are very busy with their own work, both of them were always available to talk with me.
I thank Professor Gary Koop for being a very helpful external examiner and for flying all the way from Scotland to Toronto to attend my oral exam. His report on this thesis raises many sharp questions for me to consider and sheds much light on my future research.
I am indebted to Professor Christian Gourieroux for a thorough proofreading of my job market paper and many helpful comments. I would also like to thank Professor John Geweke for many detailed comments on my job market paper.
I also want to thank Robert Kohn, James Morley, Christos Ntantamis, Daniel Smith,
Rodney Strachan, Hao Zhou and seminar participants at Australian National University,
the Bachelier Finance Society 6th World Congress, the Bank of Canada, Canadian Economics Association Annual Conference, Canadian Econometric Study Group, CenSoC at University of Technology Sydney, Econometric Society Summer Meeting, Northern Finance Association Annual Meeting, Rimini Centre for Economic Analysis, University of Melbourne, University of New South Wales, University of Toronto, Third Risk Management Conference Mont Tremblant and Wilfrid Laurier University.
I want to express my regards to my primary school teacher Jing Zhou, who is a great and responsible teacher.
I also appreciate my student colleagues for stimulating a competitive and friendly
environment, particularly Xin Jin, Tat-kei Lai, Wei Liu, Brian McCaig, Ling Sun and
Simiao Zhou. I also thank my friends Jingjing Zhang, Haiying Kang, Yong Han, Shaoyan
Zhu in Toronto and Zhiguang Li, Zhihong Si, Jinghuan Wang in Tianjin.
The first chapter is based on joint work with John Maheu and Tom McCurdy.
Contents
1 Components of Bull and Bear Markets: Bull Corrections and Bear Rallies
1.1 Introduction
1.2 Data
1.3 Bull and Bear Dating Algorithms
1.4 Models
1.5 Estimation and Model Comparison
1.6 Results
1.7 Conclusion
1.8 Appendix
2 An Efficient Approach to Estimate and Forecast in the Presence of an Unknown Number of Change-points
2.1 Introduction
2.2 Maheu-Gordon Model with Conjugate Prior
2.3 Hierarchical Structural Break Model
2.4 Extension
2.5 Application to Canada Inflation
2.6 Conclusion
2.7 Appendix
3 Modeling Regime Switching and Structural Breaks with an Infinite Dimension Markov Switching Model
3.1 Introduction
3.2 Dirichlet process
3.3 Sticky double hierarchical Dirichlet process hidden Markov model
3.4 Estimation, inference and forecasting
3.5 Simulation evidence
3.6 Application to U.S. real interest rate
3.7 Application to U.S. inflation
3.8 Conclusion
3.9 Appendix
Bibliography
Chapter 1
Components of Bull and Bear Markets: Bull Corrections and Bear Rallies
Chapter 1. Components of Bull and Bear Markets 1
1.1 Introduction
There is a widespread belief among investors, policy makers and academics that low-frequency trends exist in the stock market. Traditionally, these positive and negative low-frequency trends have been labelled bull and bear markets, respectively. If these trends exist, then it is important to extract them from the data to analyse their properties and consider their use as inputs into investment decisions and risk assessment.
We propose a model that provides answers to typical questions such as ‘Are we in a bull market or a bear market rally?’ or ‘Will this bull market correction become a bear market?’
Traditional methods of identifying bull and bear markets are based on an ex post
assessment of the peaks and troughs of the price index. Formal dating algorithms based
on a set of rules for classification are found, for example, in Gonzalez et al. (2005),
Lunde and Timmermann (2004) and Pagan and Sossounov (2003). Some of this work is
related to the dating methods used to identify turning points in the business cycle (Bry
and Boschan (1971)). A drawback is that a turning point can only be identified several
observations after it occurs.
Ex post dating algorithms sort returns into a particular regime with probability zero
or one. The data provides more information; investors may be interested in estimated
probabilities associated with particular states. Such information can be used to answer
questions such as ‘How likely is it that the market will turn into a bear market next month?’
Further, ex post dating methods cannot be used for statistical inference on returns or for
investment decisions which require more information from the return distribution, such
as changing risk assessments. For adequate risk management and investment decisions,
we need a probability model for returns and one for which the distribution of returns
changes over time.
For time series that tend to be cyclical, for example, due to business cycles, a popular
model has been a two-state regime-switching model in which the states are latent and the
mixing parameters are estimated from the available data. One popular parameterization
is a Markov-switching (MS) model for which transitions between states are governed by
a Markov chain. Hamilton (1989a) applied a two-state MS model to quarterly U.S. GNP
growth rates in order to identify business cycles and estimate 1st-order Markov transition
probabilities associated with the expansion and recession phases of those cycles.
Stock markets are also perceived to have a cyclical pattern which can be captured with regime-switching models. For example, Hamilton and Lin (1996) relate business cycles and stock market regimes, while Chauvet and Potter (2000) and Maheu and McCurdy (2000a) use a Markov-switching parameterization to analyze properties of bull and bear market regimes extracted from aggregate stock market returns.¹ The latter paper allows duration-dependent transition probabilities, as well as duration-dependent intra-state dynamics for returns and volatilities. Lunde and Timmermann (2004) study duration dependence after sorting stock returns into either a bull or bear market using their dating algorithm. Ntantamis (2009) explores potential explanatory variables for the duration of stock market regimes.
In a related literature that investigates cyclical patterns in a broader class of assets, Guidolin and Timmermann (2005) use a 3-state regime-switching model to identify bull and bear markets in monthly UK stock and bond returns and analyze implications for predictability and optimal asset allocation. Guidolin and Timmermann (2006) add a further state in order to model the nonlinear joint dynamics of monthly returns on small and large cap stocks and long-term bonds.
In contrast to the existing literature, our objective is to use higher-frequency weekly data and to provide a real-time approach to identifying phases of the market that relate to investors’ perceptions of primary and secondary trends in aggregate stock returns. Existing approaches do not explicitly model bull market corrections and bear market rallies. Yet short-term reversals within the primary trend are an important empirical regularity of higher-frequency market returns, and a model must separate them from the primary trend if it is to account for market dynamics.
¹ There are many other applications of regime-switching models to forcing processes for asset pricing models and to asset returns. Cecchetti et al. (1990), Kandel and Stambaugh (1990), Gordon and St-Amour (2000), Calvet and Fisher (2007), Lettau et al. (2008), Guidolin and Timmermann (2008) and David and Veronesi (2009), among others, derive implications of regime switching for equilibrium asset prices. Examples for interest rates include Garcia and Perron (1996b) and Ang and Bekaert (2002c). Applications that explore the implications of nonlinearities due to regime switches for asset allocation and/or predictability of returns include Turner et al. (1989), van Norden and Schaller (1997), Maheu and McCurdy (2000b), Perez-Quiros and Timmermann (2001), Ang and Bekaert (2002b) and Guidolin and Timmermann (2007).
We propose a latent 4-state Markov-switching model for weekly stock returns. Our
focus is on modeling the component states of bull and bear market regimes in order to
identify and forecast bull, bull correction, bear and bear rally states. The bear and bear
rally states govern the bear regime; the bull correction and bull states govern the bull
regime. The model can accommodate short-term reversals (secondary trends) within each
regime of the market. For example, in the bull regime it is possible to have a series of
persistent negative returns (a bull correction), despite the fact that the expected long-run
return (primary trend) is positive in that regime. Analogously, bear markets often exhibit
persistent rallies which are subsequently reversed as investors take the opportunity to sell
with the result that the average return in that regime is still negative.
It is important to note that our additional states allow for both intra- and inter-regime transitions. A bear rally is allowed to move back to the bear state or to exit the bear regime by moving to the bull state. Likewise, a bull correction can move back to the bull state or exit the bull regime by transitioning to the bear state. This richer structure allows regimes to feature several episodes of their component states. For example, a bull regime can be characterized by a combination of bull states and bull corrections. Similarly, a bear regime can consist of several episodes of the bear state and the bear rally state, exactly as many investors feel we observe in the data. Because the realization of states within a regime differs over time, bull and bear regimes can be heterogeneous over time. These important intra- and inter-regime dynamics are absent in the existing literature.
Our Bayesian estimation approach accounts for parameter and regime uncertainty and
provides probability statements regarding future regimes and returns. As noted above,
each bear and bull regime has two states. We identify the model by constraining the long-run mean return to be negative in the bear regime and positive in the bull regime, while allowing for very different dynamics within each regime. We consider several versions of
the model in which the variance dynamics are decoupled from the mean dynamics. We
find that a model in which the states associated with the first and second moment are
coupled provides the best fit to the data.
Applied to 125 years of data, our model provides superior identification of trends in stock prices. One important difference with our specification is that the richer dynamics in each regime, facilitated by our 4-state model, allow us to extract bull and bear markets in higher-frequency data. As we show, a problem with a two-state Markov-switching model applied to higher-frequency data is that it results in too many switches between the high and low return states. In other words, it is incapable of extracting the low-frequency trends in the market. In high-frequency data it is important to allow for short-term reversals in the regime of the market. Relative to a two-state model, we find that market regimes are more persistent and there is less erratic switching. According to Bayes factors, our 4-state model of bull and bear markets is strongly favored over several alternatives, including a two-state model and specifications with different variance dynamics.
Our results include probabilistic identification of bear, bear rally, bull correction and bull states, as well as the characteristics of the associated bear and bull market regimes. For instance, bull regimes have an average duration of just under 5 years, while a bull correction lasts 4 months on average and a bear rally just over half a year. The mean cumulative return of the bull market state is 7.88%, but bull corrections offset this by 2.13% on average. The average cumulative return in the bear market state is -12.4%, but bear market rallies counteract that steep decline by yielding a cumulative return of 7.1% on average. Note that these states are combined into bull and bear market regimes in heterogeneous patterns over time, yielding an average cumulative return in the
bull market regime of 33% while that for the bear market regime is about -10%. Also,
although the average cumulative return in the bear rally state is not much less than
that in the bull market state, the ex post Sharpe ratio for the latter is about 2.5 times
larger. This result highlights the importance of also considering assessments of volatility
associated with the alternative states, for example, when identifying bear market rallies
versus bull markets.
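Durations and cumulative returns like those above are statistics of spells, i.e. runs of consecutive weeks spent in the same state or regime. The following is a minimal sketch of how such spell summaries could be tallied from one state path. It assumes (our assumption, not a description of the chapter's procedure) that a spell's cumulative return is the sum of its weekly continuously compounded percentage returns; `REGIME` and `spell_summary` are hypothetical names, and in the Bayesian setting one would repeat the calculation for each posterior draw of the states and average the results.

```python
from collections import defaultdict

# States 1-2 form the bear regime and states 3-4 the bull regime (Section 1.4.2).
REGIME = {1: "bear", 2: "bear", 3: "bull", 4: "bull"}


def spell_summary(labels, returns):
    """Average spell duration and average cumulative return for each label.

    labels  -- sequence of state or regime labels, one per week
    returns -- weekly continuously compounded returns in percent, same length
    """
    spells = defaultdict(list)
    start = 0
    for t in range(1, len(labels) + 1):
        # A spell ends when the label changes or the sample ends.
        if t == len(labels) or labels[t] != labels[start]:
            spells[labels[start]].append((t - start, sum(returns[start:t])))
            start = t
    return {
        lab: (sum(d for d, _ in sp) / len(sp), sum(c for _, c in sp) / len(sp))
        for lab, sp in spells.items()
    }


# Hypothetical nine-week state path and returns.
states = [1, 1, 2, 2, 1, 4, 4, 3, 4]
rets = [-1.0, -1.0, 2.0, 1.0, -3.0, 1.0, 1.0, -1.0, 2.0]
print(spell_summary(states, rets))                        # state-level spells
print(spell_summary([REGIME[s] for s in states], rets))   # regime-level spells
```

In this toy path the regime-level call reports a single 5-week bear spell with cumulative return -2 and a single 4-week bull spell with cumulative return 3, even though the bear spell mixes bear and bear rally states, which is the heterogeneity discussed above.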
Of primary importance is the fact that our model can tell us the probabilities of
market states in real time, unlike dating algorithms. It can also produce out-of-sample
forecasts. For example, the model identifies in real time a transition from a bull market
correction to a bear market in early October 2008. The bear rally and bull correction
states are critical to modeling turning points between regimes; our results show that most
transitions between bull and bear regimes occur through these states. This is consistent
with investors’ perceptions.² Further, we find asymmetries in intra-regime dynamics; for
example, a bull market correction returns to the bull market state more often than a bear
market rally reverts to the bear state. These are important features that the existing
literature on bull and bear markets ignores.
Our Markov-switching structure provides a full description of the return distribution.
In an out-of-sample application, the probability statements concerning the predictive
density of returns are used to generate Value-at-Risk forecasts. This provides a simple
example of the economic value of our proposed model.
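Because the one-step-ahead predictive density of returns in a Gaussian Markov-switching model is a mixture of normals, a Value-at-Risk forecast is a quantile of that mixture. The sketch below finds the α-quantile by bisection on the mixture CDF. It conditions on a single set of parameter values, whereas the full Bayesian predictive would also average over posterior draws; the function names, the filtered probabilities, the transition matrix and the state parameters are all hypothetical illustrations, not estimates from this chapter.

```python
import math


def predictive_state_probs(filtered, P):
    """p(s_{T+1} = j | I_T) from filtered probabilities p(s_T = i | I_T)."""
    K = len(P)
    return [sum(filtered[i] * P[i][j] for i in range(K)) for j in range(K)]


def mixture_cdf(x, probs, mu, sigma2):
    """CDF of sum_k probs[k] * N(mu[k], sigma2[k]) evaluated at x."""
    return sum(p * 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0 * s2)))
               for p, m, s2 in zip(probs, mu, sigma2))


def mixture_quantile(alpha, probs, mu, sigma2, tol=1e-10):
    """alpha-quantile of the mixture, found by bisection on its CDF."""
    sd = math.sqrt(max(sigma2))
    lo, hi = min(mu) - 20.0 * sd, max(mu) + 20.0 * sd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, probs, mu, sigma2) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs; states ordered bear, bear rally, bull correction, bull.
P = [[0.90, 0.07, 0.00, 0.03],
     [0.10, 0.85, 0.00, 0.05],
     [0.04, 0.00, 0.86, 0.10],
     [0.01, 0.00, 0.02, 0.97]]
filtered = [0.05, 0.05, 0.10, 0.80]
mu, sigma2 = [-0.8, 0.6, -0.3, 0.4], [9.0, 6.0, 4.0, 2.0]
probs = predictive_state_probs(filtered, P)
q = mixture_quantile(0.01, probs, mu, sigma2)
print(f"1% VaR of next week's return: {-q:.2f}%")  # VaR reported as a positive loss
```

Bisection is used only because it is simple and robust for a monotone CDF; any root finder would do.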
This chapter is organized as follows. The next section describes the data. Section 1.3 discusses two alternative ex post market regime dating algorithms. We use one of these algorithms to sort actual data and data simulated from our candidate models in order to determine whether the latter can match commonly perceived features of bull and bear markets. Section 1.4 summarizes the benchmark 2-state model and develops our proposed 4-state specification. Estimation and model comparison are discussed in Section 1.5. Section 1.6 presents results, including parameter estimates, probabilistic identification of the market states and regimes, and Value-at-Risk forecasts. Section 1.7 concludes.
² A Google search turns up such headlines as: ‘Bull Market or Bear-Market Rally?’, ‘Genuine bull market, not a bear market rally’, ‘A bear rally in bull’s clothing?’, ‘Bear market rally/Bull market beginning?’, and many more.
1.2 Data
We begin with 125 years of daily capital gain returns on a broad market equity index.
Our source for the period 1926-2008 inclusive is the value-weighted return excluding
dividends associated with the CRSP S&P 500 index.³ The 1885:02-1925 daily capital
gain returns are courtesy of Bill Schwert (see Schwert (1990)). For 2009-2010, we use the
daily rates of change of the S&P 500 index level (SPX) obtained from Reuters.
Returns are converted to daily continuously compounded returns from which we
construct weekly continuously compounded returns by cumulating daily returns from
Wednesday close to Wednesday close of the following week. If a Wednesday is missing,
we use the Tuesday close. If the Tuesday is also missing, we use the Thursday close. Weekly realized
variance (RV) is computed as the sum of daily (intra-week) squared returns.
Weekly returns are scaled by 100 so they are percentage returns. Unless otherwise
indicated, henceforth returns refer to weekly continuously compounded returns expressed
as a percentage. We have 6498 weekly observations covering the period February 25, 1885
to January 20, 2010. Summary statistics are shown in Table 1.1.
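The weekly construction described above can be sketched in plain Python. Several details are our assumptions rather than the chapter's: the input is a date-sorted list of (date, simple daily return) pairs; weeks are Monday-based calendar weeks; a week with no Wednesday, Tuesday or Thursday trading simply rolls into the following week; and realized variance is computed from the percentage daily returns, so it is in squared percent. The function name `weekly_returns` is hypothetical.

```python
import math
from datetime import date, timedelta


def weekly_returns(daily):
    """Weekly percentage log returns and realized variance.

    daily -- list of (datetime.date, simple daily return), sorted by date.
    Returns a list of (week-end date, weekly return, weekly RV).
    """
    # Daily continuously compounded returns, scaled by 100 (percent).
    logret = [(d, 100.0 * math.log1p(r)) for d, r in daily]
    trading = {d for d, _ in logret}

    # Week-end anchor for each calendar week: Wednesday close, else Tuesday,
    # else Thursday.
    anchors = set()
    for monday in sorted({d - timedelta(days=d.weekday()) for d in trading}):
        for offset in (2, 1, 3):  # Wed, Tue, Thu
            day = monday + timedelta(days=offset)
            if day in trading:
                anchors.add(day)
                break

    out, ret, rv, started = [], 0.0, 0.0, False
    for d, x in logret:
        if started:
            ret += x        # cumulate daily log returns within the week
            rv += x * x     # realized variance: sum of squared daily returns
        if d in anchors:
            if started:     # the first anchor only opens the first week
                out.append((d, ret, rv))
            ret, rv, started = 0.0, 0.0, True
    return out


# Toy calendar (January 2024) with a daily log return of exactly 1% each day.
days = [date(2024, 1, d) for d in (1, 2, 3, 4, 5, 8, 9, 11, 12, 15, 18, 19)]
weeks = weekly_returns([(d, math.expm1(0.01)) for d in days])
```

In this toy calendar, 10 January is missing, so the second week closes on Tuesday 9 January; 16 and 17 January are missing, so the third closes on Thursday 18 January. Each completed week accumulates four daily returns.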
1.3 Bull and Bear Dating Algorithms
Ex post sorting methods for classifying stock returns into bull and bear phases are called dating algorithms. Such algorithms use a sequence of rules to isolate patterns in the data. A popular algorithm is that used by Bry and Boschan (1971) to identify turning points of business cycles. Pagan and Sossounov (2003) adapted this algorithm to study the characteristics of bull/bear regimes in monthly stock prices. First, a criterion for identifying potential peaks and troughs is applied; then censoring rules are used to impose minimum duration constraints on both phases and complete cycles. Finally, an exception to the rule for the minimum length of a phase is allowed to accommodate ‘sharp movements’ in stock prices.
³ Note that this is the S&P 90 prior to March 4, 1957.
The Pagan and Sossounov (2003) BB algorithm is summarized in the appendix. There
are alternative dating algorithms or filters for identifying turning points. For example,
the Lunde and Timmermann (2004) (LT) algorithm identifies bull and bear markets using
a cumulative return threshold of 20% to locate peaks and troughs moving forward.⁴ They
define a binary market indicator variable $I_t$, which takes the value 1 if the stock market
is identified by their algorithm to be in a bull state at time $t$ and 0 if it is in a bear state.
Our application of this LT dating algorithm is also summarized in the appendix.
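A simplified threshold filter in this spirit is sketched below. It is not the exact procedure summarized in the appendix: it assumes symmetric 20% thresholds and an arbitrary starting phase (bull), and it labels the final, not yet confirmed phase with the current state. A peak is confirmed once the price falls 20% below its running maximum, and a trough once the price rises 20% above its running minimum. The function name `threshold_indicator` is hypothetical.

```python
def threshold_indicator(prices, up=0.20, down=0.20, start_bull=True):
    """Bull/bear indicator I_t (1 = bull, 0 = bear) from a price-index series."""
    n = len(prices)
    ind = [None] * n
    bull = start_bull
    seg_start = 0   # first index of the current (unconfirmed) phase
    ext = 0         # index of the running maximum (bull) or minimum (bear)
    for t in range(1, n):
        if bull:
            if prices[t] > prices[ext]:
                ext = t
            elif prices[t] <= (1.0 - down) * prices[ext]:
                # Peak confirmed at ext: seg_start..ext was a bull phase.
                ind[seg_start:ext + 1] = [1] * (ext + 1 - seg_start)
                seg_start, bull, ext = ext + 1, False, t
        else:
            if prices[t] < prices[ext]:
                ext = t
            elif prices[t] >= (1.0 + up) * prices[ext]:
                # Trough confirmed at ext: seg_start..ext was a bear phase.
                ind[seg_start:ext + 1] = [0] * (ext + 1 - seg_start)
                seg_start, bull, ext = ext + 1, True, t
    # The unconfirmed final phase takes the current state.
    ind[seg_start:] = [1 if bull else 0] * (n - seg_start)
    return ind


ind = threshold_indicator([100, 110, 120, 90, 80, 100, 105])  # -> [1, 1, 1, 0, 0, 1, 1]
```

Note that the peak at 120 is only confirmed when the index reaches 90, which illustrates why such filters date turning points several observations after they occur.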
The classification of our data into bull and bear regimes using these two filters is
found in Table 1.2. There are several features to note. First, the sorting of the data is
broadly similar but with important differences. For example, during the 1930s the BB
approach finds many more switches between market phases than does the LT algorithm.
More recently, both identify 1987-12 as a trough but the subsequent bull phase ends in
1990-06 for LT but 2000-03 for BB. The average bear duration is similar (66 weeks), while
the average bull duration is quite different, 117.0 weeks (BB) versus 166.7 weeks (LT). In other
words, the different parameters and assumptions in the filtering methods can result in a
different classification of market phases.
Although the ex post dating algorithms can filter the data to locate different regimes, they cannot be used for forecasting or inference. In addition, since the sorting rule focuses on the first moment, it does not characterize the full distribution of returns. The latter is required if we wish to derive features of the regimes that are useful for measuring and forecasting risk. Also, as noted above, ex post dating algorithms sort returns into a particular regime with probability zero or one. However, the data provides more information, allowing one to estimate probabilities associated with particular states.
⁴ Lunde and Timmermann (2004) explore alternative thresholds and also asymmetric thresholds for switching from bull versus from bear markets. For this description we use a threshold of 20%.
Nevertheless, the dating algorithms are still very useful. For example, we use the
LT algorithm to sort data simulated from our candidate parametric models in order to
determine whether the latter can match commonly perceived features of bull and bear
markets.
1.4 Models
In this section, we briefly review a benchmark two-state model, our proposed 4-state
model, and some alternative specifications of the latter used to evaluate the robustness of
our best model.
1.4.1 Two-State Markov-Switching Model
The concept of bull and bear markets suggests cycles or trends that get reversed. Since
those regimes are not observable, as discussed in Section 1.1, two-state latent-variable
MS models have been applied to stock market data. A two-state 1st-order Markov model
can be written
$$r_t \mid s_t \sim N(\mu_{s_t}, \sigma^2_{s_t}), \tag{1}$$
$$p_{ij} = p(s_t = j \mid s_{t-1} = i), \quad i = 1, 2, \; j = 1, 2. \tag{2}$$
We impose $\mu_1 < 0$ and $\mu_2 > 0$ so that $s_t = 1$ is the bear market and $s_t = 2$ is the bull market.
Modeling the latent regimes, regime probabilities, and state transition probabilities allows explicit model estimation and inference. In addition, in contrast to dating algorithms or filters, forecasts are possible. Investors can base their investment decisions on the posterior state probabilities or on the whole forecast density.
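The filtered regime probabilities that make real-time inference and forecasting possible can be computed with the filter of Hamilton (1989a). Below is a minimal pure-Python sketch for a K-state Gaussian model such as (1)-(2), conditioning on fixed, hypothetical parameter values (states are indexed from 0 in the code). Besides the filtered probabilities, it returns the one-step-ahead state probabilities and the log-likelihood; it makes no attempt to guard against numerical underflow for extreme observations.

```python
import math


def normal_pdf(x, mu, sigma2):
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)


def hamilton_filter(returns, mu, sigma2, P, pi0):
    """Filtered probabilities p(s_t = k | I_t) for a K-state Gaussian MS model.

    P[i][j] = p(s_t = j | s_{t-1} = i); pi0 is the state distribution for t = 1.
    Returns (filtered, predicted probabilities for T + 1, log-likelihood).
    """
    K = len(mu)
    pred = list(pi0)                 # p(s_t = k | I_{t-1})
    filtered, loglik = [], 0.0
    for r in returns:
        joint = [pred[k] * normal_pdf(r, mu[k], sigma2[k]) for k in range(K)]
        dens = sum(joint)            # p(r_t | I_{t-1})
        loglik += math.log(dens)
        filt = [x / dens for x in joint]
        filtered.append(filt)
        # Propagate through the Markov chain to get p(s_{t+1} = k | I_t).
        pred = [sum(filt[i] * P[i][k] for i in range(K)) for k in range(K)]
    return filtered, pred, loglik


# Hypothetical two-state example: index 0 = bear, index 1 = bull.
filtered, next_probs, ll = hamilton_filter(
    returns=[-3.0, -3.0, 1.0],
    mu=[-1.0, 1.0], sigma2=[1.0, 1.0],
    P=[[0.9, 0.1], [0.1, 0.9]], pi0=[0.5, 0.5])
```

After two returns of -3, the filtered probability of the bear state exceeds 0.99; the third, positive return then shifts some probability back toward the bull state.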
1.4.2 MS-4 to allow Bull Corrections and Bear Rallies
Consider the following general $K$-state first-order Markov-switching model for returns
$$r_t \mid s_t \sim N(\mu_{s_t}, \sigma^2_{s_t}), \tag{3}$$
$$p_{ij} = p(s_t = j \mid s_{t-1} = i), \quad i, j = 1, \ldots, K. \tag{4}$$
We explore a 4-state model, $K = 4$, in order to focus on modeling potential phases of the aggregate stock market. Without any additional restrictions we cannot identify the model and relate it to market phases. Therefore, we consider the following restrictions. First, the states $s_t = 1, 2$ are assumed to govern the bear market; we label these states the bear regime. The states $s_t = 3, 4$ are assumed to govern the bull market; these states are labeled the bull regime. Each regime has 2 states, which allows for positive and negative periods of price growth within each regime. In particular, we impose
$$\begin{aligned} \mu_1 &< 0 \quad \text{(bear market state)}, \\ \mu_2 &> 0 \quad \text{(bear market rally)}, \\ \mu_3 &< 0 \quad \text{(bull market correction)}, \\ \mu_4 &> 0 \quad \text{(bull market state)}. \end{aligned} \tag{5}$$
This structure can capture short-term reversals in market trends. Each state can have a
different variance and can accommodate autoregressive heteroskedasticity in returns. In
addition, conditional heteroskedasticity within each regime can be captured.
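Consistent with the 2 states in each regime, the full transition matrix is
$$P = \begin{bmatrix} p_{11} & p_{12} & 0 & p_{14} \\ p_{21} & p_{22} & 0 & p_{24} \\ p_{31} & 0 & p_{33} & p_{34} \\ p_{41} & 0 & p_{43} & p_{44} \end{bmatrix}. \tag{6}$$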
This structure allows for several important features that are excluded from the smaller Markov-switching models in the literature. First, a bear regime can feature several episodes of the bear state and bear rally state, exactly as many investors feel we observe in the data. Similarly, the bull regime can be characterized by a combination of bull states and bull corrections. Because the realization of states within a regime differs over time, both bull and bear regimes will tend to look heterogeneous to some extent. For instance, based on returns, a bear regime lasting 6 periods made of the states
$$s_t = 1,\ s_{t+1} = 1,\ s_{t+2} = 1,\ s_{t+3} = 2,\ s_{t+4} = 2,\ s_{t+5} = 2,\ s_{t+6} = 4$$
will tend to look very different from
$$s_t = 1,\ s_{t+1} = 1,\ s_{t+2} = 1,\ s_{t+3} = 2,\ s_{t+4} = 1,\ s_{t+5} = 1,\ s_{t+6} = 4.$$
A second important feature is that a bear rally is allowed to move either into the bull state or back to the bear state; analogously, a bull correction can move to the bear state or back to the bull state. These important inter- and intra-regime dynamics are absent in the existing literature.
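Under the zero pattern in (6), every exit from a regime lands in the primary state of the other regime: there are no moves into a bear rally from the bull regime or into a bull correction from the bear regime. The sketch below simulates a state path from a transition matrix with that pattern and maps states to regimes. The probabilities are hypothetical, the names are illustrative, and states 1 to 4 are stored at indices 0 to 3. It also checks that the two example paths above produce the same regime sequence while differing in their states.

```python
import random

BEAR, RALLY, CORRECTION, BULL = 0, 1, 2, 3   # states 1..4 of the text

# Hypothetical transition matrix with the zero pattern of (6).
P = [[0.90, 0.07, 0.00, 0.03],
     [0.10, 0.85, 0.00, 0.05],
     [0.04, 0.00, 0.86, 0.10],
     [0.01, 0.00, 0.02, 0.97]]


def simulate_states(P, T, s0=BEAR, seed=0):
    """Simulate T states of a first-order Markov chain with transition matrix P."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(T - 1):
        # Zero-probability transitions are never drawn by random.choices.
        path.append(rng.choices(range(len(P)), weights=P[path[-1]])[0])
    return path


def regime(state):
    return "bear" if state in (BEAR, RALLY) else "bull"


path = simulate_states(P, 1000)
regimes = [regime(s) for s in path]

# The two bear-regime examples from the text, shifted to 0-based states.
ex1 = [0, 0, 0, 1, 1, 1, 3]
ex2 = [0, 0, 0, 1, 0, 0, 3]
same_regimes = [regime(s) for s in ex1] == [regime(s) for s in ex2]
```

Because regimes are functions of the states, two paths can share a regime sequence while implying quite different return dynamics, which is the heterogeneity the 4-state structure is designed to capture.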
The unconditional probabilities associated with $P$ can be solved for as (Hamilton (1994))
$$\pi = (A'A)^{-1}A'e, \tag{7}$$
where $A = \begin{bmatrix} P' - I \\ \iota' \end{bmatrix}$ (equivalently, $A' = [P - I, \iota]$), $e' = [0, 0, 0, 0, 1]$ and $\iota = [1, 1, 1, 1]'$.
Using the vector of unconditional state probabilities given by (7), we impose the following conditions on long-run returns in the bear and bull regimes, respectively,⁵
$$E[r_t \mid \text{bear regime}, s_t = 1, 2] = \frac{\pi_1}{\pi_1 + \pi_2}\mu_1 + \frac{\pi_2}{\pi_1 + \pi_2}\mu_2 < 0, \tag{8}$$
$$E[r_t \mid \text{bull regime}, s_t = 3, 4] = \frac{\pi_3}{\pi_3 + \pi_4}\mu_3 + \frac{\pi_4}{\pi_3 + \pi_4}\mu_4 > 0. \tag{9}$$
We do not impose any constraint on the variances.
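Equation (7) and the restrictions (8)-(9) are straightforward to evaluate for a candidate transition matrix. The sketch below builds the 5 × 4 matrix $A$ whose first four rows are $P' - I$ and whose last row is $\iota'$, so that $A\pi = e$ holds exactly for the stationary $\pi$; it then solves the normal equations $(A'A)\pi = A'e$ by Gaussian elimination and computes the two regime means. The transition matrix and state means are hypothetical. Using such a check to discard sampler draws that violate (8)-(9) is our assumption about one possible implementation, not a description of the chapter's code.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def unconditional_probs(P):
    """pi = (A'A)^{-1} A'e with A = [P' - I; iota'] and e = (0, ..., 0, 1)'."""
    K = len(P)
    # Row i of P' - I is column i of P minus the i-th unit vector.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(K)] for i in range(K)]
    A.append([1.0] * K)
    e = [0.0] * K + [1.0]
    AtA = [[sum(A[r][i] * A[r][j] for r in range(K + 1)) for j in range(K)]
           for i in range(K)]
    Ate = [sum(A[r][i] * e[r] for r in range(K + 1)) for i in range(K)]
    return solve(AtA, Ate)


def regime_means(pi, mu):
    """Long-run mean return in the bear (states 1-2) and bull (states 3-4) regimes."""
    bear = (pi[0] * mu[0] + pi[1] * mu[1]) / (pi[0] + pi[1])
    bull = (pi[2] * mu[2] + pi[3] * mu[3]) / (pi[2] + pi[3])
    return bear, bull


P = [[0.90, 0.07, 0.00, 0.03],
     [0.10, 0.85, 0.00, 0.05],
     [0.04, 0.00, 0.86, 0.10],
     [0.01, 0.00, 0.02, 0.97]]
mu = [-0.8, 0.6, -0.3, 0.4]
pi = unconditional_probs(P)
bear_mean, bull_mean = regime_means(pi, mu)
identified = bear_mean < 0 < bull_mean   # restrictions (8) and (9)
```

With these illustrative values the bear rally mean is positive and the bull correction mean is negative, yet both regime-level restrictions hold.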
Equations (5) and (6), along with equations (8) and (9), serve to identify⁶ the bull and bear regimes. The bull (bear) regime has a positive (negative) long-run return. Each market regime can display short-term reversals that differ from its long-run mean. For example, a bear regime can display a bear market rally (a temporary period of positive returns), even though its long-run return is negative. Similarly, a bull regime can display a bull market correction (a temporary period of negative returns), even though its long-run return is positive.
1.4.3 Other Models for Robustness Checks
Besides the 4-state model, we consider several other specifications and provide model comparisons among them. The dependence in the variance of returns is the most prominent feature of the data, and this structure may dominate, and thereby distort, the dynamics of the conditional mean. The following specifications are included to investigate this issue.
⁵ Note that at this point we are abstracting from an equilibrium model of investor behavior. Investors cannot identify states with probability 1, so modeling investors’ expected returns at each point is beyond the scope of this chapter. Regimes or states may have negative expected returns for some limited period for a variety of reasons, such as changes in risk premiums due to learning following breaks (Pastor and Stambaugh (2001)), different investment horizons (Guidolin and Timmermann (2005)), etc.
⁶ Discrete mixtures of distributions are subject to identification issues. Label switching occurs when the states and parameters are permuted but the likelihood stays the same. Our prior restrictions avoid this issue and identify the model. For more discussion, see Fruhwirth-Schnatter (2006).
Restricted 4-State Model
This is identical to the 4-state model in Section 1.4.2 except that inside a regime the return innovations are homoskedastic. That is, $\sigma_1^2 = \sigma_2^2$ and $\sigma_3^2 = \sigma_4^2$. In this case, the variance within each regime is restricted to be constant, although the overall variance of returns can change over time due to switches between regimes.
Markov-Switching Mean and i.i.d. Variance Model
In this model, the mean and variance dynamics are decoupled. This is a robustness
check to determine to what extent the variance dynamics might be driving the regime
transitions. This specification is identical to the Markov-switching model in Section 1.4.2
except that only the conditional mean follows the Markov chain while the variance follows
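an independent i.i.d. mixture. That is,
$$r_t \mid s_t = \mu_{s_t} + z_t, \tag{10}$$
$$z_t \sim \sum_{i=1}^{L} \eta_i\, N(0, \sigma_i^2), \qquad \eta_i \ge 0, \qquad \sum_{i=1}^{L} \eta_i = 1, \tag{11}$$
$$p_{ij} = p(s_t = j \mid s_{t-1} = i), \qquad i, j = 1, \ldots, K. \tag{12}$$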
For identification, $\sigma_1^2 < \sigma_2^2 < \cdots < \sigma_L^2$ is imposed along with the constraints used for the conditional mean in the previous section. We focus on the case $K = 4$ and $L = 4$, again to allow us to capture at least four phases of cycles for aggregate stock returns.
1.5 Estimation and Model Comparison
1.5.1 Estimation
In this section we discuss Bayesian estimation for the most general model introduced in
Section 1.4.2 assuming there are K states, k = 1, ..., K. The other models are estimated
in a similar way with minor modifications.
There are 3 groups of parameters: $M = \{\mu_1, \ldots, \mu_K\}$, $\Sigma = \{\sigma_1^2, \ldots, \sigma_K^2\}$, and the elements of the transition matrix $P$. Let $\theta = \{M, \Sigma, P\}$. Given data $I_T = \{r_1, \ldots, r_T\}$, we augment the parameter space to include the states $S = \{s_1, \ldots, s_T\}$ so that we sample from the full posterior $p(\theta, S \mid I_T)$. Assuming conditionally conjugate priors $\mu_i \sim N(m_i, n_i^2)$ and $\sigma_i^{-2} \sim G(v_i/2, w_i/2)$, with each row of $P$ following a Dirichlet distribution, allows a Gibbs sampling approach following Chib (1996). Given startup parameter values for $M$, $\Sigma$ and $P$, Gibbs sampling iterates on sampling from the following conditional densities:
S|M,Σ, P
M |Σ, P, S
Σ|M,P, S
P |M,Σ, S
Sequentially sampling from each of these conditional densities results in one iteration of the Gibbs sampler. After dropping an initial set of draws to remove any dependence on startup values, the remaining draws $\{S^{(j)}, M^{(j)}, \Sigma^{(j)}, P^{(j)}\}_{j=1}^{N}$ are collected to estimate features of the posterior density. Simulation-consistent estimates can be obtained as sample averages of the draws. For example, the posterior means of the state-dependent mean and standard deviation of returns are estimated as
$$\frac{1}{N}\sum_{j=1}^{N} \mu_k^{(j)}, \qquad \frac{1}{N}\sum_{j=1}^{N} \sigma_k^{(j)}, \tag{13}$$
for $k = 1, \ldots, K$; these are simulation-consistent estimates of $E[\mu_k \mid I_T]$ and $E[\sigma_k \mid I_T]$, respectively.
The first sampling step of S|M,Σ, P involves a joint draw of all the states. Chib (1996)
shows that this can be done by a so-called forward and backward smoother through the
identity
$$p(S \mid \theta, I_T) = p(s_T \mid \theta, I_T) \prod_{t=1}^{T-1} p(s_t \mid s_{t+1}, \theta, I_t). \tag{14}$$
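The forward pass computes the Hamilton (1989a) filter for $t = 1, \ldots, T$:
$$p(s_t = k \mid \theta, I_{t-1}) = \sum_{l=1}^{K} p(s_{t-1} = l \mid \theta, I_{t-1})\, p_{lk}, \qquad k = 1, \ldots, K, \tag{15}$$
$$p(s_t = k \mid \theta, I_t) = \frac{p(s_t = k \mid \theta, I_{t-1})\, f(r_t \mid I_{t-1}, s_t = k)}{\sum_{l=1}^{K} p(s_t = l \mid \theta, I_{t-1})\, f(r_t \mid I_{t-1}, s_t = l)}, \qquad k = 1, \ldots, K. \tag{16}$$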
Note that $f(r_t \mid I_{t-1}, s_t = k)$ is the normal pdf $N(\mu_k, \sigma_k^2)$. Finally, Chib (1996) has shown that a joint draw of the states can be taken sequentially from
$$p(s_t \mid s_{t+1}, \theta, I_t) \propto p(s_t \mid \theta, I_t)\, p(s_{t+1} \mid s_t, P), \tag{17}$$
where the first term on the right-hand side is from (16) and the second term is from the transition matrix. This is the backward step: it starts with a draw of $s_T$ from $p(s_T = k \mid \theta, I_T)$, $k = 1, \ldots, K$, and then runs over $t = T-1, T-2, \ldots, 1$.
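As an illustration of the forward-filtering, backward-sampling step in (14)-(17), the following is a minimal NumPy sketch, not the implementation used for this chapter. It assumes Gaussian returns within each state, 0-based state labels, and a user-supplied distribution `p0` for the state before the first observation, which the text does not specify here. The function names are illustrative.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """N(mu, sigma^2) density, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def ffbs(r, mu, sigma, P, p0, rng):
    """Joint draw of S = (s_1, ..., s_T) from p(S | theta, I_T), eqs. (14)-(17).

    States are 0-based. p0 is the distribution of s_0 used to start the
    filter (an assumption; the chapter does not specify it here).
    """
    T, K = len(r), len(mu)
    filt = np.empty((T, K))
    prev = np.asarray(p0, dtype=float)
    for t in range(T):
        pred = prev @ P                        # eq. (15): p(s_t = k | theta, I_{t-1})
        joint = pred * normal_pdf(r[t], mu, sigma)
        filt[t] = joint / joint.sum()          # eq. (16): p(s_t = k | theta, I_t)
        prev = filt[t]
    S = np.empty(T, dtype=int)
    S[-1] = rng.choice(K, p=filt[-1])          # s_T ~ p(s_T | theta, I_T)
    for t in range(T - 2, -1, -1):             # backward step, eq. (17)
        w = filt[t] * P[:, S[t + 1]]
        S[t] = rng.choice(K, p=w / w.sum())
    return S, filt
```

Renormalizing at each step, as in (16), keeps the filtered probabilities on a stable scale over long samples such as the weekly data used here.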
The second and third sampling steps are straightforward and use results from the linear regression model. Conditional on $S$, we select the data in state $k$ and denote the number of observations with $s_t = k$ by $T_k$. Then $\mu_k \mid \Sigma, P, S \sim N(a_k, A_k)$, with
$$a_k = A_k\Big(\sigma_k^{-2} \sum_{t \in \{t \mid s_t = k\}} r_t + n_k^{-2} m_k\Big), \qquad A_k = \big(\sigma_k^{-2} T_k + n_k^{-2}\big)^{-1}. \tag{18}$$
A draw of the variance is taken from
$$\sigma_k^{-2} \mid M, P, S \sim G\left(\frac{T_k + v_k}{2}, \frac{\sum_{t \in \{t \mid s_t = k\}} (r_t - \mu_k)^2 + w_k}{2}\right). \tag{19}$$
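Steps (18) and (19) amount to standard conjugate updates. The sketch below assumes that G(a, b) denotes a gamma distribution with shape a and rate b, which is consistent with the G(v_i/2, w_i/2) prior being updated to G((T_k + v_k)/2, ·); NumPy's gamma sampler is parameterized by scale, hence the `1.0 / rate`. The function name and argument layout are illustrative.

```python
import numpy as np


def draw_mu_and_sigma2(r, S, sigma2, m, n2, v, w, rng):
    """Draw mu_k from eq. (18), then sigma_k^{-2} from eq. (19), for every state k.

    Assumes G(a, b) is a gamma with shape a and rate b (NumPy uses scale = 1 / b).
    m, n2: prior means and variances of mu_k; v, w: precision prior hyperparameters.
    """
    K = len(m)
    mu = np.empty(K)
    new_sigma2 = np.empty(K)
    for k in range(K):
        rk = r[S == k]                                 # returns with s_t = k
        Tk = rk.size
        A = 1.0 / (Tk / sigma2[k] + 1.0 / n2[k])       # A_k
        a = A * (rk.sum() / sigma2[k] + m[k] / n2[k])  # a_k
        mu[k] = rng.normal(a, np.sqrt(A))
        shape = 0.5 * (Tk + v[k])                      # uses the mu_k just drawn
        rate = 0.5 * (np.sum((rk - mu[k]) ** 2) + w[k])
        new_sigma2[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
    return mu, new_sigma2
```

Under this reading, prior (22), G(0.5, 0.05), corresponds to v_i = 1 and w_i = 0.1.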
Given the conjugate Dirichlet prior on each row of P , the final step is to sample
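$P \mid M, \Sigma, S$ from the Dirichlet distribution (Geweke (2005)): each row's posterior is Dirichlet with parameters equal to the prior parameters plus the number of transitions out of that state implied by $S$.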
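The sketch below encodes the Dirichlet priors (23)-(24), in which the excluded transitions p13, p23, p32 and p42 carry no prior mass; drawing only over the free entries keeps those elements exactly zero. It is a hypothetical helper, not the author's code, and it does not enforce the remaining prior restrictions referred to below, which are handled in the chapter by discarding violating draws.

```python
import numpy as np

# Dirichlet parameters from priors (23)-(24); zeros mark the excluded transitions
# p13, p23, p32, p42 (rows/columns are states 1-4, stored 0-based).
PRIOR = np.array([[8.0, 1.5, 0.0, 0.5],
                  [1.5, 8.0, 0.0, 0.5],
                  [0.5, 0.0, 8.0, 1.5],
                  [0.5, 0.0, 1.5, 8.0]])


def draw_P(S, prior, rng):
    """Draw each row of P from Dirichlet(prior row + transition counts).

    Entries with a zero prior parameter are treated as structurally zero and
    stay exactly zero; S is assumed not to contain such transitions.
    """
    K = prior.shape[0]
    counts = np.zeros((K, K))
    np.add.at(counts, (S[:-1], S[1:]), 1.0)   # n_ij = #{t: s_{t-1} = i, s_t = j}
    P = np.zeros((K, K))
    for i in range(K):
        free = prior[i] > 0
        P[i, free] = rng.dirichlet(prior[i, free] + counts[i, free])
    return P
```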
An important byproduct of Gibbs sampling is an estimate of the smoothed state probabilities $p(s_t \mid I_T)$:
$$\hat p(s_t = i \mid I_T) = \frac{1}{N}\sum_{j=1}^{N} 1_{\{s_t = i\}}(S^{(j)}) \tag{20}$$
for $i = 1, \ldots, K$.
At each step, if a parameter draw violates any of the prior restrictions in (5), (6), (8)
and (9), then it is discarded. For the 4-state model we set the independent priors as
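$$\mu_1 \sim N(-0.7, 1), \quad \mu_2 \sim N(0.2, 1), \quad \mu_3 \sim N(-0.2, 1), \quad \mu_4 \sim N(0.3, 1), \tag{21}$$
$$\sigma_i^{-2} \sim G(0.5, 0.05), \quad i = 1, 2, 3, 4, \tag{22}$$
$$(p_{11}, p_{12}, p_{14}) \sim \mathrm{Dir}(8, 1.5, 0.5), \quad (p_{21}, p_{22}, p_{24}) \sim \mathrm{Dir}(1.5, 8, 0.5), \tag{23}$$
$$(p_{31}, p_{33}, p_{34}) \sim \mathrm{Dir}(0.5, 8, 1.5), \quad (p_{41}, p_{43}, p_{44}) \sim \mathrm{Dir}(0.5, 1.5, 8). \tag{24}$$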
These priors are informative but cover a wide range of empirically relevant parameter
values.7
1.5.2 Model Comparison
If the marginal likelihood can be computed for a model it is possible to compare models
based on Bayes factors. Non-nested models can be compared as well as specifications with
a different number of states. Note that the Bayes factor penalizes over-parameterized
models that do not deliver improved predictions.8 For the general Markov-switching
7We checked prior sensitivity by scaling the variance of the µi’s by 10, increasing the variance of the precision $\sigma_i^{-2}$ by a factor of 10, giving the transition probabilities uniform distributions, and using all combinations of these priors. The sorting of the data into bull and bear regimes is robust, and the model comparison results are consistent.
8This is referred to as an Ockham’s razor effect. See Kass and Raftery (1995b) for a discussion of the benefits of Bayes factors.
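model with K states, the marginal likelihood for model $M_i$ is defined as
$$p(r \mid M_i) = \int p(r \mid M_i, \theta)\, p(\theta \mid M_i)\, d\theta, \tag{25}$$
which integrates out parameter uncertainty. Here $p(\theta \mid M_i)$ is the prior and
$$p(r \mid M_i, \theta) = \prod_{t=1}^{T} f(r_t \mid I_{t-1}, \theta) \tag{26}$$
is the likelihood, which has $S$ integrated out according to
$$f(r_t \mid I_{t-1}, \theta) = \sum_{k=1}^{K} f(r_t \mid I_{t-1}, \theta, s_t = k)\, p(s_t = k \mid \theta, I_{t-1}). \tag{27}$$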
The term p(st = k|θ, It−1) is available from the Hamilton filter. Chib (1995) shows how
to estimate the marginal likelihood for MS models. His estimate is based on re-arranging
Bayes’ theorem as
$$p(r \mid M_i) = \frac{p(r \mid M_i, \theta^*)\, p(\theta^* \mid M_i)}{p(\theta^* \mid r, M_i)}, \tag{28}$$
where θ∗ is a point of high mass in the posterior pdf. The terms in the numerator are
directly available above while the denominator can be estimated using additional Gibbs
sampling runs.9
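The numerator of (28) requires the likelihood (26) at θ*, with the states summed out as in (27). A minimal sketch of that term via the Hamilton filter follows, assuming Gaussian states and an assumed initial distribution `p0`; the prior ordinate and the posterior ordinate p(θ*|r, M_i), which Chib (1995) estimates from additional Gibbs runs, are not covered. The function name is illustrative.

```python
import numpy as np


def ms_log_likelihood(r, mu, sigma, P, p0):
    """log p(r | M_i, theta) from eqs. (26)-(27), states summed out by the Hamilton filter.

    p0 is an assumed distribution for s_0 (for example, the stationary distribution of P).
    """
    prev = np.asarray(p0, dtype=float)
    loglik = 0.0
    for x in r:
        pred = prev @ P                                   # p(s_t = k | theta, I_{t-1})
        dens = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
        f = pred @ dens                                   # eq. (27)
        loglik += np.log(f)
        prev = pred * dens / f                            # filtered probabilities, eq. (16)
    return float(loglik)
```

On the log scale, the estimate of log p(r|M_i) is then this value plus log p(θ*|M_i) minus the estimated log p(θ*|r, M_i).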
A log-Bayes factor between models $M_i$ and $M_j$ is defined as
$$\log(BF_{ij}) = \log p(r \mid M_i) - \log p(r \mid M_j). \tag{29}$$
Kass and Raftery (1995b) suggest interpreting the evidence for Mi versus Mj as: not
worth more than a bare mention for 0 ≤ log(BFij) < 1; positive for 1 ≤ log(BFij) < 3;
9The integrating constant in the prior pdf is estimated by simulation.
strong for 3 ≤ log(BFij) < 5; and very strong for log(BFij) ≥ 5.
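The interpretation scale above translates directly into code. A small sketch with illustrative function names; the example line uses the log marginal likelihoods of the MS-4 and MS-2 models reported in Section 1.6.2.

```python
def log_bayes_factor(log_ml_i, log_ml_j):
    """Eq. (29): log(BF_ij) from two log marginal likelihoods."""
    return log_ml_i - log_ml_j


def kass_raftery_label(log_bf):
    """Evidence for M_i over M_j using the thresholds quoted above (natural logs)."""
    if log_bf < 0:
        return "favors M_j"   # swap the models and classify -log_bf instead
    if log_bf < 1:
        return "not worth more than a bare mention"
    if log_bf < 3:
        return "positive"
    if log_bf < 5:
        return "strong"
    return "very strong"


lbf = log_bayes_factor(-13740.4, -13903.3)   # MS-4 versus MS-2
print(round(lbf, 1), kass_raftery_label(lbf))
```

The last line prints `162.9 very strong`.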
1.5.3 Predictive Density
An important feature of our probabilistic approach is that we can compute a predictive density of future returns that integrates out all uncertainty regarding states and parameters.
The predictive density for the return at time t, based on the information available at time t − 1, is computed as
$$p(r_t \mid I_{t-1}) = \int f(r_t \mid I_{t-1}, \theta)\, p(\theta \mid I_{t-1})\, d\theta, \tag{30}$$
which integrates out both state and parameter uncertainty using the posterior distribution $p(\theta \mid I_{t-1})$. From the Gibbs sampling draws $\{S^{(j)}, M^{(j)}, \Sigma^{(j)}, P^{(j)}\}_{j=1}^{N}$ based on data $I_{t-1}$, we approximate the predictive density as
$$p(r_t \mid I_{t-1}) \approx \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} f(r_t \mid I_{t-1}, \theta^{(i)}, s_t = k)\, p(s_t = k \mid s_{t-1}^{(i)}, \theta^{(i)}), \tag{31}$$
where $f(r_t \mid I_{t-1}, \theta^{(i)}, s_t = k)$ is the $N(\mu_k^{(i)}, \sigma_k^{2(i)})$ density and $p(s_t = k \mid s_{t-1}^{(i)}, \theta^{(i)})$ is the transition probability.
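Equation (31) is an average, over posterior draws, of K-component normal mixtures weighted by the transition probabilities out of the sampled s_{t-1}. A vectorized sketch follows, assuming the draws are stored as NumPy arrays with the shapes given in the docstring (a hypothetical layout).

```python
import numpy as np


def predictive_density(x, mu_draws, sigma_draws, P_draws, s_prev_draws):
    """Monte Carlo estimate of p(r_t = x | I_{t-1}) as in eq. (31).

    mu_draws, sigma_draws: (N, K) arrays; P_draws: (N, K, K);
    s_prev_draws: (N,) integer draws of s_{t-1} (0-based states).
    """
    N = len(s_prev_draws)
    trans = P_draws[np.arange(N), s_prev_draws]   # (N, K): p(s_t = k | s_{t-1}^(i), theta^(i))
    dens = np.exp(-0.5 * ((x - mu_draws) / sigma_draws) ** 2) / (np.sqrt(2.0 * np.pi) * sigma_draws)
    return float(np.mean(np.sum(trans * dens, axis=1)))
```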
The predictive mean of a future state $s_t$ can also be estimated easily: for each state and parameter draw $\{s_{t-1}^{(j)}, \theta^{(j)}\}$, simulate a state $s_t^{(j)}$ from $p(s_t = k \mid s_{t-1}^{(j)}, \theta^{(j)})$. The average of these draws, $\{s_t^{(j)}\}_{j=1}^{N}$, is an estimate of $E[s_t \mid I_{t-1}]$.
1.6 Results
1.6.1 Parameter Estimates and Implied Distributions
Model estimates for the 2-state Markov-switching (MS-2) model are found in Table 1.3.
State 1 has a negative conditional mean along with a high conditional variance whereas
state 2 displays a high conditional mean with a low conditional variance. Both regimes
are very persistent. These results are consistent with the sorting of bull and bear regimes
in Maheu and McCurdy (2000a) and Guidolin and Timmermann (2005).
Estimates for our proposed 4-state model (MS-4) are found in Table 1.4. All parameters are precisely estimated, indicating that the data are quite informative. Recall that
states st = 1, 2 capture the bear regime while states st = 3, 4 capture the bull regime.
Each regime contains a state with a positive and a negative conditional mean. We label
states 1 and 2 the bear and bear rally states respectively; states 3 and 4 are the bull
correction and bull states.
Consistent with the MS-2 model, volatility is highest in the bear regime. In particular,
the highest volatility occurs in the bear regime in state 1. This state also delivers the
lowest average return. The highest average return and lowest volatility are in state 4, which is part of the bull regime. The bear rally state (st = 2) delivers a conditional mean of 0.23 and conditional standard deviation of 2.63; however, this mean is lower, and the volatility higher, than in the bull positive growth state (st = 4). Analogously, the
bull correction state (st = 3) has a larger conditional mean (−0.13 > −0.94) and smaller
volatility (2.18 < 6.01) than the bear state 1.
All states display high persistence (pii is high for all i). However, the transition
probabilities display some asymmetries. For example, the probability of a bear rally
moving back to the bear state 1 (p21 = 0.015) is a little lower than the probability of changing regime to a bull market (p24 = 0.019). On the other hand, the probability of a bull correction returning to the bull state (p34 = 0.051) is considerably higher than the probability of changing regime to
the bear state (p31 = 0.010).
Figure 1.1 displays the density of each of the 4 states. The differences in the illustrated
densities are in accord with the parameter estimates in Table 1.4. Differences in the
spreads of the densities are most apparent but the locations are also different. There is
no suggestion from these plots that states 1 and 2 are the same or that states 3 and 4
are the same, as a two-state Markov-switching model would assume.
Combining states 1 and 2 gives the bear regime, and doing the same for states 3 and 4 produces the bull regime. These densities are shown in Figure 1.2. The bear regime has a mean slightly below 0 but a much larger variance than the bull regime. The implied unconditional density of returns is a mixture of these two regimes and is displayed in the middle of the figure. Table 1.5 reports the unconditional probabilities of the states. On average the market spends a fraction 0.157 of the time in a bear rally and 0.304 in a bull correction. The most time is spent in the bull growth state 4. The unconditional probability of the bull regime is 0.773.
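Unconditional state probabilities like those in Table 1.5 are properties of the transition matrix: they form the stationary distribution π satisfying π′P = π′ with elements summing to one. A sketch of computing π for a given P; whether the chapter evaluates this at a point estimate or averages it over posterior draws is not stated here, so treat that step as an assumption.

```python
import numpy as np


def stationary_distribution(P):
    """Unconditional state probabilities pi with pi' P = pi' and sum(pi) = 1."""
    K = P.shape[0]
    A = np.vstack([P.T - np.eye(K), np.ones((1, K))])  # balance equations plus adding-up row
    b = np.append(np.zeros(K), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi
```

With states ordered 1 to 4 (stored 0-based), the unconditional probability of the bull regime is `pi[2] + pi[3]`.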
A comparison of the regime statistics implied by the parameter estimates for the MS-2
and MS-4 models is found in Table 1.6. The expected duration of regimes is much longer
in the 4-state model. That is, by allowing heterogeneity within a regime in our 4-state
model, we switch between bull and bear markets less frequently. For instance, in a MS-2
parameterization the bull market has a duration of only 82.6 weeks, about 19 months,
while the richer MS-4 model has a bull duration of just under 5 years. As we will see
below, there is much more switching between regimes in the MS-2 model.
In the 2-state model, the expected return and variance are fixed within a regime. In
this case, the only source of intra-regime variance is return innovations. For example for
the bear regime in the MS-2 model, the expected variance is E[Var(rt|st = 1)] = 19.6.
In contrast, the average variance for each regime in the 4-state model can be attributed
to changes in the conditional mean as well as to the average conditional variance of the
return innovations. For instance, the average variance of returns in the bear regime can
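be decomposed as
$$\mathrm{Var}(r_t \mid s_t \in \{1, 2\}) = \mathrm{Var}(E[r_t \mid s_t] \mid s_t \in \{1, 2\}) + E[\mathrm{Var}(r_t \mid s_t) \mid s_t \in \{1, 2\}] = 0.31 + 16.1,$$
with a similar result for the bull regime. In both regimes the mean dynamics account for only a small share of the total variance: about 2% in the bear regime and about 1.4% in the bull regime.10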
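The decomposition above is the law of total variance applied within a regime. The sketch below includes a plug-in illustration for the bear regime, using the state means and volatilities quoted earlier (−0.94 and 6.01 for state 1; 0.23 and 2.63 for state 2) and within-regime weights implied by Table 1.5 (π2 = 0.157 and π1 = 1 − 0.773 − 0.157 = 0.070). The function name is illustrative.

```python
import numpy as np


def regime_variance_decomposition(weights, means, sds):
    """Split Var(r_t | regime) into Var(E[r_t | s_t]) + E[Var(r_t | s_t)] within a regime.

    weights: probabilities of the component states given the regime (summing to one);
    means, sds: conditional means and standard deviations of those states.
    """
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    s = np.asarray(sds, dtype=float)
    regime_mean = w @ m
    between = w @ (m - regime_mean) ** 2   # variation of the conditional mean
    within = w @ s ** 2                    # average conditional variance
    return float(between), float(within)


# Plug-in values for the bear regime (states 1 and 2).
pi1, pi2 = 0.070, 0.157
between, within = regime_variance_decomposition(
    [pi1 / (pi1 + pi2), pi2 / (pi1 + pi2)], [-0.94, 0.23], [6.01, 2.63])
```

These plug-in values give roughly 0.29 and 15.9, close to the reported 0.31 and 16.1. The gap presumably arises because the reported figures are posterior averages rather than functions of point estimates; the text does not state this, so it is an assumption.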
The MS-2 model assumes normality in both market regimes, while the MS-4 model shows that the data are at odds with this assumption. Skewness is present in bear markets, while excess kurtosis is found in both bull and bear regimes. Overall, the bear market deviates more from a normal distribution; it has thicker tails and captures more extreme events.
Table 1.7 summarizes features of the MS-4 parameterization for both the regimes and
their component states derived from the posterior parameter estimates. The bear regime
duration is 77.8 weeks, much shorter than the bull regime duration of 256.0 weeks. The
average cumulative return in the bear (bull) regime11 is -9.94 (33.0). The volatility in
the bear market is more than twice that in the bull market. The third panel provides
a breakdown of cumulative return means in each of the component states of the market
regimes. The bear rally yields a cumulative return of 7.10 on average which partially
offsets the average decline of -12.4 in state 1. On the other hand the bull correction has
a cumulative return mean of -2.13 which diminishes the average cumulative return of
7.88 in state 4. Note that these states are combined into bull and bear market regimes
in heterogeneous patterns over time, yielding the statistics for regimes summarized in the
first two panels of Table 1.7.
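Expected regime durations such as the 77.8 and 256.0 weeks above follow from the transition matrix through the formula in footnote 11. Below is a sketch of that formula for an arbitrary set of states, with illustrative names. Since the exit vector equals (I − Q)1 when the rows of P sum to one, the expression simplifies to q′(I − Q)⁻¹1, and for a one-state regime, as in the MS-2 model, it reduces to 1/(1 − p_ii).

```python
import numpy as np


def expected_regime_duration(P, regime, pi):
    """Expected number of periods spent in a regime before leaving it (footnote 11).

    Computes q' (I - Q)^{-2} e, where Q is the block of P for the states in `regime`,
    e holds the probabilities of leaving the regime from each of those states, and q
    is pi restricted to the regime and renormalized.
    """
    idx = np.asarray(regime)
    Q = P[np.ix_(idx, idx)]
    e = 1.0 - Q.sum(axis=1)            # exit probabilities, e.g. (p14, p24) for the bear regime
    q = pi[idx] / pi[idx].sum()
    M = np.linalg.inv(np.eye(len(idx)) - Q)
    return float(q @ M @ M @ e)
```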
Although the stock market spends most of the time in the bull regime (states 3 and
4), in terms of individual states it is state 2 that has the longest duration while the
shortest is state 1. The final panel of Table 1.7 records the conditional mean divided by
the associated conditional standard deviation for each state, that is, estimates of µi/σi
from Table 1.4. This is analogous to an ex post Sharpe ratio. State 4 provides the most
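10This is computed as 0.31/(0.31 + 16.1) and 0.04/(0.04 + 2.89).
11This is equal to the expected return for the bear regime, given by Equation (8), times the expected duration for that regime, which is
$$\begin{pmatrix} \frac{\pi_1}{\pi_1+\pi_2} \\ \frac{\pi_2}{\pi_1+\pi_2} \end{pmatrix}' \left[ I_2 - \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix} \right]^{-2} \begin{pmatrix} p_{14} \\ p_{24} \end{pmatrix}.$$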
favorable risk-return tradeoff followed by state 2, 3 and 1. Note that the Sharpe ratio
in the bull state 4 is approximately 2.5 times larger than in the bear rally (state 2). In
other words, even though the bear rally delivers a positive expected return, that return
is much more variable than in the bull state.
1.6.2 Model Comparisons
One can conduct formal model comparisons based on the marginal likelihoods reported
in Table 1.8. The constant mean and variance model performs the worst (has the lowest
marginal likelihood). The next model has a constant mean but allows the variances
to follow a 4-state i.i.d. mixture. Following this are models with a 2-state versus a 4-
state Markov-switching conditional mean – both combined with a 4-state i.i.d. variance
as in Section 1.4.3. In both cases, the additional dynamics that are introduced to the
conditional mean of returns provide a significant improvement over the constant mean
case with the same 4-state i.i.d. variance. However, all of these specifications are strongly
dominated by their counterparts which allow a common 2 (or 4) state Markov chain
to direct both conditional moments. These specifications capture persistence in the
conditional variance.
Note that the log-Bayes factor in favor of the restricted 4-state model of Section 1.4.3, whose conditional variance takes only two values, over the 2-state MS model is large at 53.4 = −13849.9 − (−13903.3). This improvement comes from adding conditional mean dynamics (going from 2 to 4 states) to the basic 2-state MS
model. The best model is the 4-state Markov-switching model. The log-Bayes factor in
support of the 4-state versus the 2-state model is 162.9 = −13740.4 − (−13903.3). The
zero restrictions in the transition matrix (6) are also strongly supported by the data. For
instance, the log-Bayes factor is 6.9 = −13740.4 − (−13747.3) in support of the MS-4
model with P matrix (6) as compared to a 4 state model with an unrestricted transition
matrix (all 16 elements of P are estimated).
Overall, there is very strong evidence that the 4-state specification of Section 1.4.2
provides the best fit to weekly returns. The comparisons also show that this improvement comes from a better fit to both the conditional mean and the conditional variance. Not only does our
MS-4 model provide a better economic characterization of differences in stock market
cycles but the model statistically dominates other alternatives.
The Markov-switching models specify a latent variable that directs low frequency
trends in the data. As such, the regime characteristics from the population model are
not directly comparable to the dating algorithms of Section 1.3. Instead, we consider
the dating algorithm as a lens to view both the S&P500 data and data simulated from
our preferred MS-4 model. Using parameter draws from the Gibbs sampler, we simulate
return data from the model and then apply the LT dating algorithm to those simulated
returns. This is done many times12 and the average and 0.70 density intervals of these
statistics are reported in Table 1.9 along with the statistics from the S&P500 data.
Although our model provides a richer 4-state description of bull and bear markets, it also accounts for all of the data statistics associated with a simpler 2-state view of the market under the LT dating algorithm.
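The simulation exercise can be sketched as follows: draw a path of states from P, draw returns from the state-specific normals, and cumulate the returns into a price index to which a dating algorithm is then applied. The parameter values below are hypothetical, and treating the returns as continuously compounded percentages (so that prices are 100·exp(cumsum(r/100))) is an assumption based on Table 1.1.

```python
import numpy as np


def simulate_ms_returns(mu, sigma, P, T, rng, s0=None):
    """Simulate T returns and states from a Gaussian Markov-switching model.

    s0 is the state before the first period; if None it is drawn uniformly
    (an assumption; drawing it from the stationary distribution is another option).
    """
    K = len(mu)
    s = np.empty(T, dtype=int)
    state = rng.integers(K) if s0 is None else s0
    for t in range(T):
        state = rng.choice(K, p=P[state])
        s[t] = state
    r = rng.normal(mu[s], sigma[s])
    return r, s


# Hypothetical 2-state parameters; one path of the sample length used in the chapter.
rng = np.random.default_rng(3)
r, s = simulate_ms_returns(np.array([-0.5, 0.3]), np.array([4.0, 2.0]),
                           np.array([[0.98, 0.02], [0.01, 0.99]]), 6498, rng)
prices = 100.0 * np.exp(np.cumsum(r / 100.0))   # assumes r is in percent
```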
1.6.3 Identification of Historical Turning Points in the Market
The dating of the market regimes using the LT dating algorithm is shown in the top panel
of Figure 1.3. The shaded portions under the cumulative return denote bull markets while
the white portions of the figure are the bear markets. Below this panel is the smoothed
probability of a bull market, p(st = 3|IT ) + p(st = 4|IT ) for the 4-state model. The final
plot in Figure 1.3 is the smoothed probability of a bull market, p(st = 2|IT ) from the
2-state model. The 4-state model produces less erratic shifts between market regimes,
closely matches the trends in prices, and generally corresponds to the dating algorithm.
The 2-state model is less able to extract the low frequency trends in the market. In
12We use 10,000 simulations, each of 6498 observations.
high frequency data it is important to allow intra-regime dynamics, such as short-term
reversals.
Note that the success of our model should not be judged by how well it matches
the results from dating algorithms. Rather this comparison is done to show that the
latent-state MS models can identify bull and bear markets with similar features to those
identified by conventional dating algorithms. Beyond that, the Markov-switching models
presented in this chapter provide a superior approach to modeling stock market trends
as they deliver a full specification of the distribution of returns along with latent market
dynamics. Such an approach permits out-of-sample forecasting which we turn to in
Section 1.6.4.
The following subsections discuss how our model identifies sub-regime dynamics using
examples from various subperiods. There are several important points revealed by this
discussion. First, bear (bull) markets are persistent but consist of frequent transitions between states 1 and 2 (3 and 4). Second, in each of the examples the move between
regimes occurs through either the bear rally or the bull correction state. In other words,
these additional dynamics are critical to fully capturing turning points in stock market
cycles. This is also borne out by our model estimates. The most likely route for a bear
market to go to a bull market is through the bear rally state. Given that a bull market
has just started, the probability is 0.9342 that the previous state was a bear rally13, and
only 0.0658 that it was a bear state. Similarly, given that a bear market has just started,
the probability is 0.8663 that the previous state was a bull correction, and only 0.1337
that it was a bull state. The following subperiod descriptions provide examples of this
richer specification of turning points plus frequent reversals within a regime.
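The 0.9342 figure follows footnote 13. Because p13 = p23 = 0, a bull market can only begin in state 4, so the posterior odds of the preceding state compare p24π2 with p14π1. The sketch below uses illustrative inputs only; the estimates of p14 and of the π's come from Tables 1.4 and 1.5, and the function name is hypothetical.

```python
def prob_bear_rally_before_bull(p14, p24, pi1, pi2):
    """P(s_t = 2 | s_{t+1} = 4, s_t in {1, 2}), footnote 13 (states numbered 1-4).

    A bull market can only begin in state 4, so the weights are p24 * pi2
    (from the bear rally) and p14 * pi1 (from the bear state); the common
    factor 1 / (pi1 + pi2) cancels.
    """
    from_rally = p24 * pi2
    from_bear = p14 * pi1
    return from_rally / (from_rally + from_bear)
```

The bull-to-bear figure of 0.8663 is analogous, comparing p31π3 (bull correction) with p41π4 (bull state), since p32 = p42 = 0 means a bear market can only begin in state 1.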
13$p(s_t = 2 \mid s_{t+1} = 4, s_t \in \{1, 2\}) \propto p_{24}\frac{\pi_2}{\pi_1+\pi_2}$ and $p(s_t = 1 \mid s_{t+1} = 4, s_t \in \{1, 2\}) \propto p_{14}\frac{\pi_1}{\pi_1+\pi_2}$.
1927-1939
Figure 1.4 displays the log-price and the realized volatility (square root of realized vari-
ance) in the top panel, the smoothed states of the MS-4 model in the second panel, and
the posterior probability of the bull market, p(st = 3|IT )+p(st = 4|IT ), in the last panel.
Just before the crash of 1929 the model identifies a bull correction state. The tran-
sition from a bull to bear market occurs as a move from a bull market state to a bull
correction state and then into the bear regime. For the week ending October 16, 1929,
there was a return of -3.348 and the market transitioned from the bull correction state
into the bear market state with p(st = 1|IT ) = 0.63. This is further reinforced so that
the next 5 weeks have essentially probability 1 for state 1.
As this figure shows, the remainder of this subperiod is decisively a bear market,
but displays considerable heterogeneity in that there are several short-lived bear rallies.
The high levels of realized volatility coincide with the high volatility in the bear market
states. Periods of somewhat lower volatility are associated with the bear rally states.
In Figure 1.4, a strong bear rally begins in late November 1933 and lasts until August
25, 1937, at which time there is a move back into the bear market state. Realized
volatility increases with this move into state 1.
1980-1985
In Figure 1.5, the market displays several moves between the bull market state and the
bull correction state before a short-term move into a bear market in August of 1982.
Once again the transition from a bull to bear market is through a bull correction state.
However, the bear market that emerges remains in state 1 for only about 4 weeks. This is followed by a bear rally that results in increased prices accompanied by substantial volatility. The bear rally turns into a bull market in late April of 1983, after which the market alternates between the bull market state and bull corrections.
1987 crash
Prior to the 1987 crash there is a dramatic run-up in stock prices with generally low
volatility, as illustrated in the top panel of Figure 1.6. It is interesting to note that the
model shows a great deal of uncertainty about the state of the market well before the
crash. In the first week of October, just before the crash, the most likely state is the bull
correction with p(st = 3|IT ) = 0.37. The bear state which starts the following week lasts
for about 5 weeks after which a strong bear rally quickly emerges as of the week ending
November 18, 1987. It is the bear rally state that exits into a bull market during the
week of August 17, 1988. Prices resume their strong increase until they plateau with a
bull correction beginning the week of October 4, 1989.
2006-2010
We conclude with an analysis of recent market activity in Figure 1.7. The bull market
state turned into a bull correction in mid-July 2007, which persisted until an abrupt move
into the bear market state in early September 2008. This transition was accompanied
by a dramatic increase in realized volatility. According to our model, the bear market
became a bear market rally in the third week of March 2009 where it stayed until mid-
November 2009 when it moved into the bull market state. As noted earlier, the positive
trend in returns during a bear market rally is not interpreted as a bull market until
the market volatility declines to levels more typical of bull markets.
1.6.4 Example Application
An industry-standard measure of potential portfolio loss is the Value-at-Risk (VaR).
VaR(α),t is defined as the 100α percent quantile of the portfolio value or return distribution
given information at time t− 1. We compute VaR(α),t from the predictive density of the
MS-4 model as
$$p(r_t < \mathrm{VaR}_{(\alpha),t} \mid I_{t-1}) = \alpha. \tag{32}$$
Given a correctly specified model, the probability of a return of VaR(α),t or less is α.
To compute the Value-at-Risk from the MS-4 model, we first take N draws from the predictive density: for each draw, θ and st−1 are taken from the Gibbs sampler, a future state st is simulated based on P, and rt is drawn from rt|st ∼ N(µst, σ²st). The details are discussed in Section 1.5.3. Among the resulting draws, the rt with rank [Nα] is an estimate of VaR(α),t.
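The simulation recipe above, together with the normal benchmark used in Figure 1.8, can be sketched as follows. The sketch assumes the posterior draws are stored as arrays, reads [Nα] as the integer part of Nα, and uses the unbiased sample variance for the benchmark; these are illustrative choices, and the function names are hypothetical. Python's `statistics.NormalDist` supplies the normal quantile.

```python
import numpy as np
from statistics import NormalDist


def ms_var(mu_draws, sigma_draws, P_draws, s_prev_draws, alpha, rng):
    """VaR_(alpha),t from one predictive draw of r_t per posterior draw.

    For draw j: s_t ~ row s_{t-1}^(j) of P^(j), then r_t ~ N(mu_{s_t}, sigma_{s_t}^2).
    [N alpha] is read as the integer part of N * alpha (at least 1).
    """
    N, K = mu_draws.shape
    rows = np.arange(N)
    trans = P_draws[rows, s_prev_draws]                                  # (N, K)
    u = rng.random(N)[:, None]
    s_t = np.minimum((u > np.cumsum(trans, axis=1)).sum(axis=1), K - 1)  # inverse-CDF draw
    r = rng.normal(mu_draws[rows, s_t], sigma_draws[rows, s_t])
    rank = max(int(N * alpha), 1)
    return float(np.sort(r)[rank - 1])                                   # draw with rank [N alpha]


def normal_benchmark_var(past_returns, alpha):
    """Benchmark N(0, sigma^2), with sigma^2 the sample variance of returns up to t - 1."""
    sd = float(np.std(past_returns, ddof=1))
    return NormalDist(0.0, sd).inv_cdf(alpha)
```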
Figure 1.8 displays the conditional VaR from January 3, 2007 to January 20, 2010
predicted by the MS-4 model, as well as that implied by the normal benchmark for
α = 0.05. At each point the model is estimated based on information up to t − 1.
Similarly, the benchmark, N(0, σ2), sets σ2 to the sample variance using It−1.
The normal benchmark overestimates the VaR for the early part of this subsample but
starts to understate it at times, beginning in mid-2007, and then severely underestimates it
in the last few months of 2008. The MS-4 model provides a very different VaR(.05),t over
time because it takes into account the predicted regime, as indicated by the middle and
bottom panels of Figure 1.8 which show forecasts of the states and regimes respectively.
Note that the potential losses, shown in the top panel, increase considerably in September
and October 2008 as the model identifies a move from a bull to a bear market.
Real-time Identification of the Bear Market
This out-of-sample application also gives us an opportunity to assess in real time when
our model identified a move into the bear regime. In Section 1.6.3 this was discussed in
the context of the full sample smoothed estimates. We now consider the identification
process that would have been historically available to investors using the model forecasts.
This will differ from the previous results as we are using a smaller sample and updating
estimates as new data arrives.
The second and third panels of Figure 1.8 report the predictive mean of the states
and regimes. Prior to 2008, forecasts of the bull states occur the most, including some
short episodes of bull corrections. In the first week of October 2008, the probability of
a bull regime drops from 0.85 to essentially zero and remains there for some time. In
other words, the model in real time detects a turning point in the first week of October
2008 from the bull to the bear regime. The first half of the bear regime that follows is
characterized by the bear state while the second half is largely classified as a bear rally.
Toward the end of our sample there is a move from the bear market rally state to a
bull market. In real time, in early December 2009 the model forecasts a move from the
bear rally to the bull market state. For the week ending December 9, we have p(st =
1|It−1) = 0.02, p(st = 2|It−1) = 0.17, p(st = 3|It−1) = 0.14 and p(st = 4|It−1) = 0.67.
The evidence for a bull market regime gradually strengthens; the last observation in our
sample, January 20, 2010, has probabilities 0.01, 0.11, 0.07 and 0.81 for states 1,2,3 and
4, with the bull market state being the most likely.
1.7 Conclusion
This chapter proposes a new 4-state Markov-switching model to identify the components
of bull and bear market regimes in weekly stock market data. Bull correction and bull
states govern the bull regime; bear rally and bear states govern the bear regime. Our
probability model fully describes the return distribution while treating bull and bear
regimes and their component states as unobservable.
A bear rally is allowed to move back to the bear state or to exit the bear regime
by moving to a bull state. Likewise, a bull correction can move back to the bull state
or exit the bull regime by transitioning to a bear state. This implies that regimes can
feature several episodes of their component states. For example, a bull regime can be
characterized by a combination of bull states and bull corrections. Similarly, a bear
regime can consist of several episodes of the bear state and the bear rally state. Because
the realization of states in a regime will differ over time, bull and bear regimes can be heterogeneous over time. This structure, including both intra-regime and inter-regime dynamics, results in a richer characterization of market cycles.
Probability statements on regimes and future returns are available. Model comparisons show that the 4-state specification of bull and bear markets is strongly favored over several alternatives, including a two-state model as well as various alternative specifications for variance dynamics.
For example, relative to a two-state model, there is less erratic switching so that market
regimes are more persistent.
We find that bull corrections and bear rallies are empirically important for out-of-
sample forecasts of turning points and VaR predictions. For these out-of-sample appli-
cations, the model provides probability statements concerning the predictive density of
returns. The probabilities are used in an example application that compares VaR fore-
casts to a normal benchmark model. The latter overestimates the VaR for much of the
sample and then tends to understate it from mid-2007 to late 2009. The MS-4 specifi-
cation has a very different VaR(.05),t over time because it takes into account forecasts of
regime changes. The potential losses increased considerably in September and October
of 2008 as the model identifies a move from a bull to a bear market.
1.8 Appendix
The Pagan and Sossounov (2003) adaptation of the Bry-Boschan (BB) algorithm can be
summarized as follows:
1. Identify the peaks and troughs by using a window of 8 months.
2. Enforce alternation of phases by deleting the lower of adjacent peaks and the higher
of adjacent troughs.
3. Eliminate phases less than 4 months unless changes exceed 20%.
4. Eliminate cycles less than 16 months.
Window width and phase duration constraints will depend on the particular series and
will obviously be different for smoothed business cycle data than for stock prices. Pagan
and Sossounov (2003) provide a detailed discussion of their choices for these constraints.
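Steps 1 and 2 of the procedure can be sketched for a monthly price series as below. The function name is hypothetical, the window is assumed to extend 8 months on each side of a candidate point, and the censoring rules in steps 3 and 4 are omitted, so this is only a partial illustration of the Pagan and Sossounov (2003) procedure.

```python
import numpy as np


def bb_candidate_turning_points(prices, window=8):
    """Steps 1-2 above: local peaks/troughs in a +/- `window` month window, then alternation.

    Returns a list of (index, "P") for peaks and (index, "T") for troughs.
    """
    p = np.asarray(prices, dtype=float)
    n = len(p)
    points = []
    for t in range(window, n - window):
        seg = p[t - window:t + window + 1]
        if p[t] == seg.max():
            points.append((t, "P"))
        elif p[t] == seg.min():
            points.append((t, "T"))
    # Step 2: among adjacent points of the same type, keep the higher peak or lower trough.
    alternating = []
    for t, kind in points:
        if alternating and alternating[-1][1] == kind:
            prev_t = alternating[-1][0]
            better = p[t] > p[prev_t] if kind == "P" else p[t] < p[prev_t]
            if better:
                alternating[-1] = (t, kind)
        else:
            alternating.append((t, kind))
    return alternating
```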
The Lunde and Timmermann (2004) dating algorithm defines a binary market indi-
cator variable It which takes the value 1 if the stock market is in a bull state at time t
and 0 if it is in a bear state. The stock price at the end of period t is labelled Pt. Our
application of their dating algorithm can be summarized as: use a 6-month window to
locate the initial local maximum or minimum.
Suppose we have a local maximum at time $t_0$, in which case we set $P^{\max}_{t_0} = P_{t_0}$.
1. Define stopping-time variables associated with a bull market as
$$\tau_{\max}(P^{\max}_{t_0}, t_0 \mid I_{t_0} = 1) = \inf\{\tau \ge 1 : P_{t_0+\tau} \ge P^{\max}_{t_0}\},$$
$$\tau_{\min}(P^{\max}_{t_0}, t_0 \mid I_{t_0} = 1) = \inf\{\tau \ge 1 : P_{t_0+\tau} \le 0.8\,P^{\max}_{t_0}\}.$$
2. One of the following happens:
• If $\tau_{\max} < \tau_{\min}$, the bull market continues: update the new peak value $P^{\max}_{t_0+\tau_{\max}} = P_{t_0+\tau_{\max}}$ and set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\max}} = 1$. Update $t_0 = t_0 + \tau_{\max}$, still a local maximum, and continue with step 1 above.
• If $\tau_{\max} > \tau_{\min}$, we find a trough at time $t_0 + \tau_{\min}$ and we have been in a bear market from $t_0 + 1$ to $t_0 + \tau_{\min}$. Set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\min}} = 0$. Record the value $P^{\min}_{t_0+\tau_{\min}} = P_{t_0+\tau_{\min}}$ and update $t_0 = t_0 + \tau_{\min}$ as a local minimum. Go to step 3 below since $t_0$ is now a local minimum.
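When $t_0$ is a local minimum:
3. The bear market stopping times are
$$\tau_{\min}(P^{\min}_{t_0}, t_0 \mid I_{t_0} = 0) = \inf\{\tau \ge 1 : P_{t_0+\tau} \le P^{\min}_{t_0}\}$$
$$\tau_{\max}(P^{\min}_{t_0}, t_0 \mid I_{t_0} = 0) = \inf\{\tau \ge 1 : P_{t_0+\tau} \ge 1.2\,P^{\min}_{t_0}\}$$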
4. One of the following happens:
• If $\tau_{\min} < \tau_{\max}$, the bear market continues. Update the new trough value $P^{\min}_{t_0+\tau_{\min}} = P_{t_0+\tau_{\min}}$ and set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\min}} = 0$. Update $t_0 = t_0 + \tau_{\min}$ and continue with step 3.
• If $\tau_{\min} > \tau_{\max}$, we find a peak at time $t_0 + \tau_{\max}$ and we have been in a bull market from $t_0 + 1$ to $t_0 + \tau_{\max}$. Set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\max}} = 1$. Record the value $P^{\max}_{t_0+\tau_{\max}} = P_{t_0+\tau_{\max}}$ and update $t_0 = t_0 + \tau_{\max}$ as a local maximum. Go to step 1 above since $t_0$ is now a local maximum.
This process is repeated until the last data point. All periods with $I_t = 1$ are in the bull regime and all periods with $I_t = 0$ are in the bear regime.
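The stopping-time rules above can be sketched in Python as a single loop over the price series. This is a minimal sketch under stated assumptions: the initial local extremum `t0` and whether it is a peak are passed in rather than located with the 6-month window, observations after the last completed stopping time are assigned to the current state (the text does not specify this case), and the name `lt_dating` is hypothetical.

```python
def lt_dating(prices, t0, at_peak, down=0.8, up=1.2):
    """Bull (1) / bear (0) indicators following the stopping-time rules above.

    prices  : sequence of positive prices P_t
    t0      : index of the initial local extremum (located separately)
    at_peak : True if t0 is a local maximum, False if it is a local minimum
    Entries before t0 are left as None.
    """
    n = len(prices)
    ind = [None] * n
    bull = at_peak
    ind[t0] = 1 if bull else 0
    ref = prices[t0]  # P^max_{t0} in a bull state, P^min_{t0} in a bear state
    t = t0
    while t < n - 1:
        tau = 1
        while t + tau < n:
            p = prices[t + tau]
            if bull:
                if p >= ref:             # tau_max first: bull market continues
                    state, switch = 1, False
                    break
                if p <= down * ref:      # tau_min first: bear from t+1 to t+tau
                    state, switch = 0, True
                    break
            else:
                if p <= ref:             # tau_min first: bear market continues
                    state, switch = 0, False
                    break
                if p >= up * ref:        # tau_max first: bull from t+1 to t+tau
                    state, switch = 1, True
                    break
            tau += 1
        else:
            # Neither stopping time occurs before the end of the sample
            # (assumption: keep the current state for the remaining periods).
            for s in range(t + 1, n):
                ind[s] = 1 if bull else 0
            break
        for s in range(t + 1, t + tau + 1):
            ind[s] = state
        t += tau
        ref = prices[t]                  # new local maximum or minimum
        if switch:
            bull = not bull
    return ind
```

For instance, starting from a peak at index 2, `lt_dating([100, 110, 120, 90, 95, 80, 100, 130], 2, True)` labels the 25% fall as a bear phase, extends it to the new low of 80, and switches back to a bull phase once the price rises 20% above that trough.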
Table 1.1: Weekly Return Statistics (1885-2010)ᵃ
N      Mean    Standard deviation    Skewness    Kurtosis    J-Bᵇ
6498   0.085   2.40                  -0.49       11.2        18475.5∗
ᵃ Continuously compounded returns
ᵇ Jarque-Bera normality test: p-value = 0.00000
Table 1.2: BB and LT Dating Algorithm Turning Points
Troughs Peaks Troughs Peaks
BBᵃ LTᵇ BB LT BB LT BB LT
1885-02 1940-06 1940-11
1885-04 1942-05 1942-05 1943-07
1886-12 1943-12 1946-05 1946-05
1888-06 1890-06 1890-06 1948-02 1948-02 1948-06
1890-12 1890-12 1892-03 1892-03 1949-06 1952-12
1893-08 1893-08 1895-09 1895-09 1953-09 1956-07 1956-07
1896-08 1896-08 1897-09 1957-12 1957-12 1959-07
1898-03 1899-04 1960-10 1961-12
1900-07 1902-09 1902-09 1962-07 1962-07 1966-02 1966-02
1903-10 1903-10 1906-01 1966-10 1966-10 1968-12 1968-12
1906-10 1970-06 1970-06 1971-04
1907-11 1907-11 1909-08 1909-08 1971-12 1973-01 1973-01
1910-08 1912-10 1974-10 1974-10 1976-09
1914-12 1914-12 1916-11 1916-11 1978-03 1978-09
1917-12 1917-12 1919-07 1919-07 1980-04 1980-11 1980-11
1921-06 1921-06 1929-09 1929-09 1982-08 1982-08 1983-06
1929-11 1930-04 1984-08 1987-08 1987-08
1932-06 1932-06 1932-09 1987-12 1987-12 1990-06
1933-03 1933-07 1933-07 1990-10 2000-03 2000-03
1933-10 1934-02 2002-10 2002-10 2007-10 2007-10
1935-03 1935-03 1937-03 1937-03 2009-03 2009-03 2010-01 2010-01
1938-04 1938-04 1938-11
1939-04 1939-10 1939-10
ᵃ BB: Bry and Boschan algorithm using Pagan and Sossounov parameters
ᵇ LT: Lunde and Timmermann algorithm
Table 1.3: MS-2-State Model Estimates
mean median std 0.95 DI
µ1 -0.46 -0.46 0.14 (-0.73, -0.20)
µ2 0.20 0.20 0.02 ( 0.16, 0.25)
σ1 4.42 4.42 0.13 ( 4.18, 4.69)
σ2 1.64 1.64 0.02 ( 1.59, 1.69)
p11 0.94 0.94 0.01 ( 0.92, 0.96)
p22 0.99 0.99 0.002 ( 0.98, 0.99)
This table reports the posterior mean, median, standard deviation and 0.95 density intervals for model parameters.
Table 1.4: MS-4-State Model Estimates
mean median std 95% DI
µ1 -0.94 -0.92 0.27 (-1.50, -0.45)
µ2 0.23 0.23 0.10 ( 0.04, 0.43)
µ3 -0.13 -0.12 0.08 (-0.31, -0.01)
µ4 0.30 0.29 0.04 (0.22, 0.38)
σ1 6.01 5.98 0.35 (5.41, 6.77)
σ2 2.63 2.61 0.18 (2.36, 3.08)
σ3 2.18 2.19 0.12 (1.94, 2.39)
σ4 1.30 1.30 0.04 (1.20, 1.37)
p11 0.921 0.923 0.020 (0.877, 0.955)
p12 0.076 0.074 0.020 (0.042, 0.120)
p14 0.003 0.001 0.004 (3e-6, 0.013)
p21 0.015 0.014 0.007 (0.005, 0.031)
p22 0.966 0.967 0.009 (0.945, 0.980)
p24 0.019 0.018 0.006 (0.009, 0.034)
p31 0.010 0.009 0.003 (0.004, 0.017)
p33 0.939 0.943 0.018 (0.899, 0.965)
p34 0.051 0.048 0.017 (0.027, 0.088)
p41 0.001 0.0003 0.0007 (6e-7, 0.002)
p43 0.039 0.037 0.012 (0.024, 0.067)
p44 0.960 0.963 0.012 (0.933, 0.976)
The posterior mean, median, standard deviation and 0.95 density intervals for model parameters.
Table 1.5: Unconditional State Probabilities
mean 0.95 DI
π1 0.070 (0.035, 0.117)
π2 0.157 (0.073, 0.270)
π3 0.304 (0.216, 0.397)
π4 0.469 (0.346, 0.579)
The posterior mean and 0.95 density intervals associated with the posterior distribution for π from Equation (7).
Table 1.6: Posterior Regime Statistics for MS-2 and MS-4 Models
MS-2 MS-4
bear mean -0.46 -0.13
(-0.73, -0.20) (-0.367, -0.005)
bear duration 18.2 77.8
(13.2, 25.0) (44.4, 134.6)
bear standard deviation 4.42 4.04
(4.18, 4.69) (3.51, 4.73)
bear variance from $\mathrm{Var}(E[r_t \mid s_t] \mid s_t = 1, 2)$ 0.00 0.31
(0.07, 0.68)
bear variance from $E[\mathrm{Var}(r_t \mid s_t) \mid s_t = 1, 2]$ 19.6 16.1
(17.5, 22.0) (12.1, 22.0)
bear skewness 0 -0.42
(-0.68, -0.20)
bear kurtosis 3 5.12
(4.37, 5.93)
bull mean 0.20 0.13
(0.16, 0.25) (0.07, 0.18)
bull duration 82.6 256.0
(59.1, 115.9) (123.5, 509.6)
bull standard deviation 1.64 1.71
(1.59, 1.69) (1.59, 1.83)
bull variance from $\mathrm{Var}(E[r_t \mid s_t] \mid s_t = 3, 4)$ 0.00 0.04
(0.02, 0.09)
bull variance from $E[\mathrm{Var}(r_t \mid s_t) \mid s_t = 3, 4]$ 2.69 2.89
(2.54, 2.85) (2.47, 3.30)
bull skewness 0 0.04
(-0.11, 0.16)
bull kurtosis 3 3.77
(3.51, 4.03)
The posterior mean and 0.95 density interval for regime statistics.
Table 1.7: Posterior State Statistics for the MS-4 Model
mean median std 95% DI
Bear mean -0.13 -0.11 0.10 (-0.367, -0.005)
Bear duration 77.8 74.0 23.1 (44.4, 134.6)
Bear cumulative return -9.94 -8.28 7.89 (-29.6, -0.41)
Bear std 4.04 4.01 0.31 (3.51, 4.73)
Bull mean 0.13 0.13 0.03 (0.07, 0.18)
Bull duration 256.0 235.6 100.9 (123.5, 509.6)
Bull cumulative return 33.0 30.0 14.9 (12.9, 70.3)
Bull std 1.71 1.71 0.06 (1.59, 1.83)
s=1: cumulative return -12.4 -11.8 4.49 (-23.0, -5.45)
s=2: cumulative return 7.10 6.97 3.10 (1.47, 13.8)
s=3: cumulative return -2.13 -2.07 1.09 (-4.46, -0.27)
s=4: cumulative return 7.88 7.75 1.67 (5.02, 11.6)
s=1: duration 13.5 13.0 3.63 (8.13, 22.2)
s=2: duration 31.2 30.1 8.39 (18.3, 51.0)
s=3: duration 17.9 17.4 4.80 (9.91, 28.8)
s=4: duration 27.2 26.9 6.75 (14.9, 41.4)
s=1: µ1/σ1 -0.16 -0.15 0.04 (-0.25, -0.07)
s=2: µ2/σ2 0.09 0.09 0.04 (0.02, 0.17)
s=3: µ3/σ3 -0.06 -0.05 0.04 (-0.14, -0.01)
s=4: µ4/σ4 0.23 0.22 0.04 (0.17, 0.31)
This table reports posterior statistics for various population moments.
Table 1.8: Log Marginal Likelihoods: Alternative Models
Model log f(Y | Model)
Constant mean with constant variance -14924.1
Constant mean with 4-state i.i.d. variance -14256.7
MS-4-state mean with 4-state i.i.d. variance ((10) with K = 4) -14036.4
MS-2-state (1) -13903.3
MS-4-state mean with constant intra-regime variance ($\sigma_1^2 = \sigma_2^2$, $\sigma_3^2 = \sigma_4^2$) -13849.9
MS-4-state (3) with unrestricted transition matrix P -13747.3
MS-4-state (3)-(6) -13740.4
Table 1.9: Dating-algorithm filtering of data and simulated data
S&P MS-4
Avg. number of bears 29 31.7
(22, 42)ᵃ
Avg. bear duration 63.1 55.9
(40.5, 74.7)
Avg. bear amplitudeᵇ -45.0 -43.4
(-52.7, -35.8)
Avg. bear return -0.71 -0.80
(-1.08, -0.57)
Avg. bear std 3.16 3.15
(2.60, 3.73)
Avg. number of bulls 28 31.4
(22, 42)
Avg. bull duration 166.7 158.5
(103.0, 235.3)
Avg. bull amplitude 66.4 60.2
(46.3, 80.0)
Avg. bull return 0.40 0.39
(0.31, 0.48)
Avg. bull std 2.53 2.42
(1.97, 2.91)
ᵃ 70% density interval
ᵇ Aggregate return over one regime
Figure 1.1: MS-4-States, State Densities (density curves for s=1, s=2, s=3, s=4; horizontal axis from -10 to 10)
Figure 1.2: MS-4-States, Regime Densities (Bear, Bull and Unconditional density curves; horizontal axis from -10 to 10)
Figure 1.3: LT algorithm, MS-4 and MS-2 (panels: LT decomposition; MS4: Bull Probabilities; MS2: Bull Probabilities; 1885-02 to 2010-01)
Figure 1.4: MS-4, 1927-1939 (panels: Log price Index with Return and RV; State Probabilities for s=1, s=2, s=3, s=4; Bull Probabilities)
Figure 1.5: MS-4, 1980-1985 (panels: Log price Index with Return and RV; State Probabilities for s=1, s=2, s=3, s=4; Bull Probabilities)
Figure 1.6: MS-4, 1985-1990 (panels: Log price Index with Return and RV; State Probabilities for s=1, s=2, s=3, s=4; Bull Probabilities)
Figure 1.7: MS-4, 2006-2010 (panels: Log price Index with Return and RV; State Probabilities labelled bull correction, bull, bear and bear rally, with marked dates mid-July-07, early-Sep-08, late-Mar-09 and mid-Nov-09; Bull Probabilities)
Figure 1.8: Value-at-Risk from MS-4 and Benchmark Normal distribution (panels: Return with MS-4 and Normal; Forecast State Probabilities for s=1, s=2, s=3, s=4; Forecast of Probability in Bull; 2007-01 to 2009-12)
Chapter 2
An Efficient Approach to Estimate and Forecast in the Presence of an Unknown Number of Change-points
Chapter 2. A New Change-point Model 49
2.1 Introduction
Accounting for structural instability in macroeconomic and financial time series modeling
and forecasting is important. Applications to the time series data including the Phillips
curve, US real interest rates and inflation have confirmed the necessity of modeling the
parameters which characterize the conditional data density as time-varying. Failing to
do so usually produces inferior out-of-sample forecasts, because the data before and after
a structural break have different implications for the most recent regime. Suppose an
abrupt structural break happens and the parameters in the new regime are independent
of the past, the estimation of the parameters in the current regime will be contaminated
if the data before the break point is used. The forecast is further distorted by the biased
estimation.
A model with a single change-point is not adequate to describe the structural instability found in empirical studies. Multiple change-point models that allow the data dynamics to change out of sample are helpful for forecasting in many applications. This chapter focuses on multiple change-point models in the Bayesian framework, because the Bayesian approach provides finite-sample inference and integrates out estimation uncertainty in out-of-sample forecasting. Markov chain Monte Carlo (MCMC) sampling methods make the model estimation straightforward.
A popular Bayesian model for multiple structural breaks is Chib (1998). He models structural breaks as a Markov chain and estimates it with a fixed number of regimes. His approach is not appropriate for out-of-sample forecasting, because the last regime is absorbing, so no further breaks can occur out of sample. Pesaran et al. (2006) extend Chib's (1998) model with a hierarchical prior for the parameters that characterize each regime. Koop and Potter (2007) further model a hierarchical distribution for regime durations, which implies that the structural change probabilities are duration dependent. Gerlach et al. (2000) introduce the mixture innovation model in a state space representation to allow an unknown number of regimes, and Giordani and Kohn (2008) apply an adaptive method to improve computational efficiency. Both Koop and Potter's (2007)
and Gerlach et al.'s (2000) methodologies nest the time-varying parameter (TVP) model, which assumes that the parameters change in every period (see Stock and Watson (1991, 1996) and Primiceri (2005)).
In another direction, Inclan (1993) and Wang and Zivot (2000) use conjugate priors to obtain the posterior distribution of the change-points in analytic form, conditional on the number of regimes. Their approaches evaluate every possible location of the structural breaks. If the length of a time series is $T$ and the number of change-points is $M$, the total number of different combinations of change-points is
$$\binom{T}{M} = \frac{T!}{(T-M)!\,M!}.$$
If $T = 1000$ and $M = 3$, 166,167,000 marginal likelihoods must be computed.¹ Even if $T = 100$, 161,700 marginal likelihoods are still required. In general, their methodology is impractical if the number of regimes is larger than 3. Maheu and Gordon (2008) avoid this problem by introducing a real-time forecasting model that concentrates on the last regime. It includes many sub-models, each indexed by the duration of the last regime. They report the filtered distribution of the change-points but do not discuss the smoothed distribution.
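The counts quoted above follow directly from the binomial coefficient. A quick Python check (the helper name `n_changepoint_configurations` is illustrative only):

```python
from math import comb

def n_changepoint_configurations(T, M):
    """Number of ways to place M change-points among T periods: T! / ((T-M)! M!)."""
    return comb(T, M)

for T in (100, 1000):
    print(T, n_changepoint_configurations(T, 3))
# 100 161700
# 1000 166167000
```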
This chapter extends Maheu and Gordon (2008) and Maheu and McCurdy (2009) in five directions. First, I use a conjugate prior for the parameters that characterize each regime. Conditional on this prior and the time-invariant parameters, the predictive density has a closed form, so the computational burden is reduced compared with Maheu and Gordon (2008), who assume a non-conjugate prior.² Second, a hierarchical structure for the conjugate prior is introduced to allow learning and sharing of information across regimes, as in Pesaran et al. (2006). In the presence of a structural break, the new parameters are drawn independently from the hierarchical prior. Third, one extension of the new approach models the regime duration with a Poisson distribution, which implies duration-dependent break probabilities. Fourth, this chapter shows how to produce the smoothed distribution of the change-points. Lastly, the framework nests different types of break dynamics: breaks in the variance, in the regression coefficients, or in both.
¹ The methods developed in this chapter can deal with a sample size of 1000.
² Maheu and Gordon (2008) assume a conditional conjugate prior and use Gibbs sampling to compute the predictive density.
The differences between this chapter and Koop and Potter's (2007) method are as follows. First, Koop and Potter (2007) assume a heterogeneous distribution for the duration in each regime. Their approach augments the state space with regime durations, so there are $O(T^2)$ states, which implies a large transition matrix. In contrast, I assume that the regime durations are drawn from the same distribution, so the total number of states in the new model is $O(T)$. Second, Koop and Potter (2007) assume that after a structural change, the parameters in the new regime are related to those in the previous regime through a random walk. Instead, I assume that the parameters in each regime are drawn independently from a hierarchical prior. This reflects an abrupt change in the parameters and is convenient for computation. Lastly, this chapter introduces a new MCMC sampler that jointly draws all the parameters, including the hierarchical prior, the parameters of the durations, the change-points and the parameters characterizing each regime, from their posterior distribution. By the argument of Casella and Robert (1996), this posterior sampler is efficient. I also expect the new approach to be computationally fast.
Four versions of the model are proposed in this new framework. The first allows breaks in the regression coefficients and the variance simultaneously. The second allows the regression coefficients to change but keeps the variance constant. The third keeps the regression coefficients constant while allowing breaks in the variance. All three of these versions assume that the structural change probability is time-invariant. The last version models the regime duration with a Poisson distribution, which implies duration-dependent break probabilities.
The new MCMC sampler is applicable to all four versions of the model. It decomposes the parameter space into the time-varying parameters and the time-invariant
parameters and samples them jointly by taking advantage of the analytic form of the predictive density. The sampler first draws the time-invariant parameters from their posterior distribution by a Metropolis-Hastings step. The proposal distribution used to sample the time-invariant parameters is the conditional posterior distribution implied by a Gibbs sampler. Then the time-varying parameters, including the change-points, are drawn from the posterior distribution conditional on the time-invariant parameters. This approach is efficient because the sampler draws all the parameters jointly.
These versions of the model are applied to a Canadian inflation series to investigate its dynamic stability. The log marginal likelihood is used as the criterion for model comparison. The best model is the hierarchical model that allows breaks in the regression coefficients and the variance simultaneously. It identifies 4 major change-points in Canadian inflation dynamics. The model comparison also shows that duration-dependent break probability is not a significant feature of the data. After controlling for the structural breaks, adding extra lags as explanatory variables does not improve out-of-sample forecasts.
This chapter is organized as follows. Section 2.2 introduces the Maheu and Gordon (2008) model and revises it with conjugate priors. The modified model has a closed form for the predictive density conditional on the structural break probabilities. A new Markov Chain Monte Carlo method is proposed to sample from the posterior distribution efficiently. Section 2.3 extends the non-hierarchical prior to a hierarchical one in order to exploit the information across regimes. Different extensions of the hierarchical model are introduced in Section 2.4, including models with breaks only in the variance or only in the regression coefficients. Duration-dependent break probability is also discussed by assuming a Poisson distribution for the regime durations. Section 2.5 applies this framework to a Canadian inflation time series. Section 2.6 concludes.
2.2 Maheu-Gordon Model with Conjugate Prior
This section briefly reviews Maheu and Gordon's (2008) model and its key assumption. Then the advantage of using conjugate priors is described. I also provide the method to calculate the predictive density and to estimate the model with a Markov Chain Monte Carlo sampler.
In Maheu and Gordon's (2008) model, there are $t$ sub-models at each time $t$. The sub-model $M_i$, where $i = 1, \ldots, t$, is indexed by the most recent break point $i$. For example, the sub-model $M_1$ assumes that no break happens throughout the whole sample. For a sub-model $M_i$, the parameters of the data density are the same for all $\tau \in [i, t]$ and differ from those for $\tau < i$. The sub-model $M_t$ implies that a structural break is present and the parameters that characterize the data density change at time $t$. As time $i$ is the starting point of the most recent regime, they assume that the data before time $i$ are not informative for the posterior of the parameters $\theta$ in model $M_i$. Defining $Y_{i,t} = (y_i, \ldots, y_t)$ for $1 \le i \le t$, this assumption is equivalent to $p(\theta \mid M_i, Y_{1,t}) = p(\theta \mid M_i, Y_{i,t})$.
This model was originally designed for forecasting. As time grows, the number of sub-models increases and each sub-model $M_i$ needs to be estimated at each time $t$. The filtered distribution of the sub-models is updated by Bayes' rule. In detail:
1. At time 1, there is only one sub-model, so the filtered distribution of sub-models is degenerate: $p(M_1 \mid y_1) = 1$.
2. At time $t$, compute the predictive likelihood for every sub-model by
$$p(y_{t+1} \mid Y_{1,t}, M_i) = \int p(y_{t+1} \mid Y_{1,t}, \theta)\, p(\theta \mid M_i, Y_{i,t})\, d\theta$$
where the posterior $p(\theta \mid M_i, Y_{i,t}) \propto p(\theta)\, p(Y_{i,t} \mid \theta)$ for $i = 1, \ldots, t$. When $i = t+1$, the predictive likelihood depends only on the prior $p(\theta)$.
3. Compute the filtered distribution of the sub-models at time $t+1$. Define $\lambda_t$ as
the break probability at time $t$ and $\Lambda_{1,t} = (\lambda_1, \ldots, \lambda_t)$. This allows sub-models based on different information sets to be combined. In particular, the sub-model probability is
$$p(M_i \mid Y_{1,t+1}, \Lambda_{1,t}) \propto \begin{cases} (1-\lambda_t)\, p(M_i \mid Y_{1,t}, \Lambda_{1,t})\, p(y_{t+1} \mid Y_{1,t}, M_i) & \text{if } i = 1, \ldots, t \\ \lambda_t\, p(y_{t+1} \mid Y_{1,t}, M_{t+1}) & \text{if } i = t+1 \end{cases}$$
The predictive density is calculated by integrating out all the sub-models and the uncertainty about the structural break:
$$p(y_{t+1} \mid Y_{1,t}, \Lambda_{1,t}) = \lambda_t\, p(y_{t+1} \mid Y_{1,t}, M_{t+1}) + (1-\lambda_t)\sum_{i=1}^{t} p(y_{t+1} \mid Y_{1,t}, M_i)\, p(M_i \mid Y_{1,t}, \Lambda_{1,t})$$
4. Repeat steps 2-3 until the last period $T$.
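Steps 2-3 amount to reweighting the previous sub-model probabilities by their predictive likelihoods and appending one new sub-model. A minimal NumPy sketch of one such update, assuming the predictive likelihoods $p(y_{t+1} \mid Y_{1,t}, M_i)$ have already been computed (for example by Gibbs sampling in the original model); the function name `mg_update` is hypothetical:

```python
import numpy as np

def mg_update(prev_probs, pred_lik, lam):
    """One step of the sub-model recursion (steps 2-3 above).

    prev_probs : p(M_i | Y_{1,t}, Lambda_{1,t}) for i = 1..t          (length t)
    pred_lik   : p(y_{t+1} | Y_{1,t}, M_i) for i = 1..t+1              (length t+1);
                 the last entry is the new sub-model M_{t+1}, which uses only the prior
    lam        : break probability lambda_t
    Returns (p(y_{t+1} | Y_{1,t}, Lambda_{1,t}), p(M_i | Y_{1,t+1}, Lambda_{1,t}) for i = 1..t+1).
    """
    prev_probs = np.asarray(prev_probs, dtype=float)
    pred_lik = np.asarray(pred_lik, dtype=float)
    t = prev_probs.size
    weights = np.empty(t + 1)
    weights[:t] = (1.0 - lam) * prev_probs * pred_lik[:t]  # no break at t+1
    weights[t] = lam * pred_lik[t]                         # break at t+1: new sub-model
    pred_density = weights.sum()                           # mixture over sub-models
    return pred_density, weights / pred_density
```

With one existing sub-model, `mg_update([1.0], [0.2, 0.1], 0.1)` gives a predictive density of 0.9 × 0.2 + 0.1 × 0.1 = 0.19.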
The key assumption of Maheu and Gordon (2008) is that the data before the most recent break point are uninformative about the current regime. The predictive density at time $t+1$ for the sub-model $M_i$ depends only on the index $i$ and the data from time $i$ to $t$. In this framework, the index of the sub-models can be regarded as the state variable, since it contains all the information needed to compute the predictive density after integrating out the sub-model parameter $\theta$. If this assumption is dropped in favor of Koop and Potter's (2007) approach, in which the parameters after a break are related to those in the previous regime, then the steps described above become impractical, because the whole path of change-points is needed to obtain the predictive density. In other words, the state space expands from $O(T)$ to $O(2^T)$.
Pesaran et al. (2006) argue for the importance of a hierarchical prior that uses information across different regimes. Maheu and Gordon (2008) adopt a non-hierarchical prior because of the heavy computational burden. From step 2 above, there are $O(T^2)$ predictive likelihoods to compute, and each of these computations involves the estimation
of the posterior distribution of the parameter $\theta$. Since they do not use a conjugate prior, the estimation is done numerically by Gibbs sampling, so additionally estimating a hierarchical prior is computationally infeasible.
Now, I will show how to apply the conjugate prior to Maheu and Gordon's (2008) model to improve computational efficiency. The next section discusses hierarchical priors.
Notice that the most recent break at time $i$ has a one-to-one relationship with the duration of the last regime up to time $t$: if $d_t$ is the duration, then $d_t = t - i + 1$ by definition. The duration is used in this chapter for two reasons. First, the model in this chapter addresses not only the forecasting problem but also the ex-post analysis of multiple change-points. Since Maheu and Gordon (2008) focus only on the filtered distribution of sub-models and do not consider the smoothed distribution of breaks, their sub-model notation $M_i$ omits the subscript $t$ for the current time period, which is implicit in the real-time setting. However, for a past time period $\tau < t$, the notation must also record the date $\tau$ at which we stand. In contrast, the new notation $d_t$ carries the subscript $t$, and its value is the duration up to time $t$. Second, the new approach nests duration-dependent break probabilities, so using the duration is natural and simplifies the presentation.
Formally, I define $d_t$ as the duration of the most recent regime up to time $t$, so $d_t \in \{1, \ldots, t\}$ by construction. If a break happens at time $t$, then $d_t = 1$. If $d_t = t$, then there is no break throughout the whole sample up to time $t$. The predictive density conditional on the duration is given by
$$p(y_{t+1} \mid d_{t+1}, Y_{1,t}) = \int p(y_{t+1} \mid \theta, Y_{1,t})\, p(\theta \mid d_{t+1}, Y_{1,t})\, d\theta = \int p(y_{t+1} \mid \theta, Y_{1,t})\, p(\theta \mid Y_{t-d_{t+1}+2,t})\, d\theta$$
$\theta$ is the collection of parameters that characterize the most recent regime, which is associated with the data $y_{t+1}$ and has duration $d_{t+1}$. The second equality follows from the assumption that the data before a break point are uninformative about the regime after it. If $\tau > t$, $Y_{\tau,t}$ is the empty set; for example, if $d_{t+1} = 1$, $p(\theta \mid Y_{t-d_{t+1}+2,t})$ reduces to the prior $p(\theta)$.
The conditional distribution of $y_{t+1} \mid \theta, Y_{1,t}$ is a linear model with an i.i.d. normal error term. The prior is assumed to be a Normal-Gamma distribution, which is conjugate to this model. By conjugacy, the posterior distribution of $\theta \mid Y_{t-d_{t+1}+2,t}$ is also Normal-Gamma, and integrating out $\theta$ yields a Student-t predictive density $p(y_{t+1} \mid d_{t+1}, Y_{1,t})$. Conditional on the duration, both the posterior distribution and the predictive density have analytic forms.
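If we assume a constant structural break probability $\lambda_t = \pi$, the Maheu and Gordon (2008) model with the conjugate prior can be written as follows:
$$d_t = \begin{cases} d_{t-1} + 1 & \text{w.p. } 1-\pi \\ 1 & \text{w.p. } \pi \end{cases}$$
$$(\beta_t, \sigma_t^{-2}) \sim 1(d_t = 1)\, NG(\beta, H^{-1}, \chi/2, \nu/2) + 1(d_t > 1)\, \delta_{(\beta_{t-1}, \sigma_{t-1}^{-2})} \qquad (1)$$
$$y_t \mid \beta_t, \sigma_t, Y_{1,t-1} \sim N(x_t'\beta_t, \sigma_t^2)$$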
The covariate $x_t$ can include exogenous or lagged dependent variables. In this chapter, I set $x_t = (1, y_{t-1}, \ldots, y_{t-q})'$, which implies an AR($q$) model in each regime. Define $\theta_t \equiv (\beta_t, \sigma_t^2)$ as the collection of parameters that characterize the data density at time $t$. If a break happens ($d_t = 1$), $\theta_t$ is drawn independently from the prior $NG(\beta, H^{-1}, \chi/2, \nu/2)$, where $NG$ denotes a Normal-Gamma distribution. In detail, the precision (inverse of the variance) $\sigma_t^{-2}$ is drawn from a Gamma distribution $G(\chi/2, \nu/2)$, where $\chi/2$ is the rate parameter and $\nu/2$ is the shape parameter. Its prior mean is $\nu/\chi$ and its prior variance is $2\nu/\chi^2$. This also implies that the prior mean of the variance $\sigma_t^2$ is $\chi/(\nu-2)$ for $\nu > 2$. Conditional on the variance, the vector of regression coefficients $\beta_t$ is drawn from a multivariate normal
distribution $N(\beta, H^{-1}\sigma_t^2)$. $\delta_\theta$ denotes a degenerate distribution with point mass at $\theta$. If there is no break ($d_t > 1$), all parameters are the same as those in the previous period.
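To make the data-generating process in (1) concrete, here is a minimal NumPy simulation sketch. Assumptions, not taken from the text: presample values $y_{1-q}, \ldots, y_0$ are set to zero, the first period always starts a regime, and the name `simulate_model1` and its argument names are hypothetical. NumPy's Gamma sampler is parameterized by shape and scale, so scale $= 2/\chi$ corresponds to rate $\chi/2$.

```python
import numpy as np

def simulate_model1(T, pi, beta0, H_inv, chi, nu, q=1, seed=0):
    """Simulate durations d_t and data y_t from model (1) with AR(q) regressions.

    Presample values y_{1-q}, ..., y_0 are set to zero (assumption).
    Precision sigma_t^{-2} ~ Gamma(shape nu/2, rate chi/2);
    beta_t | sigma_t^2 ~ N(beta0, H_inv * sigma_t^2), with len(beta0) == q + 1.
    """
    rng = np.random.default_rng(seed)
    beta0 = np.asarray(beta0, dtype=float)
    y = np.zeros(T + q)                 # y[t + q] holds y_t (0-based t)
    d = np.zeros(T, dtype=int)
    for t in range(T):
        if t == 0 or rng.random() < pi:
            d[t] = 1                    # break: draw fresh parameters from the prior
            precision = rng.gamma(shape=nu / 2, scale=2 / chi)
            sigma2 = 1.0 / precision
            beta = rng.multivariate_normal(beta0, H_inv * sigma2)
        else:
            d[t] = d[t - 1] + 1         # no break: parameters carried over
        x = np.concatenate(([1.0], y[t:t + q][::-1]))   # (1, y_{t-1}, ..., y_{t-q})
        y[t + q] = x @ beta + rng.normal(scale=np.sqrt(sigma2))
    return y[q:], d
```

The returned durations always satisfy either $d_t = 1$ or $d_t = d_{t-1} + 1$, which is the only state structure the filtering steps below rely on.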
By conjugacy of the prior, the posterior distribution of the parameters that characterize the data density at time $t$ is still Normal-Gamma conditional on the duration $d_t$:
$$\beta_t, \sigma_t^{-2} \mid d_t, Y_{1,t} \sim NG(\bar\beta, \bar H^{-1}, \bar\chi/2, \bar\nu/2)$$
with
$$\bar\beta = \bar H^{-1}(H\beta + X_{t-d_t+1,t}'Y_{t-d_t+1,t})$$
$$\bar H = H + X_{t-d_t+1,t}'X_{t-d_t+1,t}$$
$$\bar\chi = \chi + Y_{t-d_t+1,t}'Y_{t-d_t+1,t} + \beta'H\beta - \bar\beta'\bar H\bar\beta$$
$$\bar\nu = \nu + d_t$$
where $X_{t-d_t+1,t} = (x_{t-d_t+1}, \ldots, x_t)'$.
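The update above translates directly into a few lines of NumPy. This is a minimal sketch assuming the prior quantities $(\beta, H, \chi, \nu)$ are passed as `beta0, H0, chi0, nu0` and the regime's data as a $d_t \times k$ matrix `X` and vector `Y`; the function name `ng_posterior` is hypothetical. With no data it returns the prior unchanged, matching the $d_{t+1} = 1$ case.

```python
import numpy as np

def ng_posterior(X, Y, beta0, H0, chi0, nu0):
    """Conjugate Normal-Gamma update for the observations of one regime.

    Prior:  sigma^{-2} ~ G(chi0/2, nu0/2) (rate chi0/2, shape nu0/2),
            beta | sigma^2 ~ N(beta0, H0^{-1} sigma^2)
    X (d x k) and Y (d,) hold the d observations of the regime; d may be 0.
    Returns (beta_bar, H_bar, chi_bar, nu_bar).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    H_bar = H0 + X.T @ X
    beta_bar = np.linalg.solve(H_bar, H0 @ beta0 + X.T @ Y)
    chi_bar = chi0 + Y @ Y + beta0 @ H0 @ beta0 - beta_bar @ H_bar @ beta_bar
    nu_bar = nu0 + len(Y)
    return beta_bar, H_bar, chi_bar, nu_bar
```

An equivalent form that is handy as a check is $\bar\chi = \chi + (Y - X\bar\beta)'(Y - X\bar\beta) + (\bar\beta - \beta)'H(\bar\beta - \beta)$.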
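If there is no break at time $t+1$, the duration increases by 1 ($d_{t+1} = d_t + 1$) and the parameters that characterize the data dynamics stay the same as in the last period ($\theta_{t+1} = \theta_t$). The posterior distribution of $\theta_t$ is used to compute the predictive density:
$$p(y_{t+1} \mid d_{t+1} = d_t + 1, Y_{1,t}) = \int p(y_{t+1} \mid \theta_t, Y_{1,t})\, p(\theta_t \mid Y_{t-d_t+1,t})\, d\theta_t \propto \left(1 + \frac{(y_{t+1} - x_{t+1}'\bar\beta)^2}{\bar\chi(x_{t+1}'\bar H^{-1}x_{t+1} + 1)}\right)^{-\frac{\bar\nu+1}{2}}.$$
The last expression is the kernel of a Student-t distribution, so
$$y_{t+1} \mid d_{t+1} = d_t + 1, Y_{1,t} \sim t\left(x_{t+1}'\bar\beta,\ \frac{\bar\chi(x_{t+1}'\bar H^{-1}x_{t+1} + 1)}{\bar\nu},\ \bar\nu\right),$$
where $t(\mu, s^2, \nu)$ denotes a Student-t distribution with location $\mu$, squared scale $s^2$ and $\nu$ degrees of freedom.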
For the special case of $d_{t+1} = 1$, a structural change happens at time $t+1$, so the data before $t+1$ are uninformative about the predictive density. Simply replacing the above
filtered distribution of the parameters by the prior produces
$$y_{t+1} \mid d_{t+1} = 1, Y_{1,t} \sim t\left(x_{t+1}'\beta,\ \frac{\chi(x_{t+1}'H^{-1}x_{t+1} + 1)}{\nu},\ \nu\right).$$
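Both predictive densities share one formula: pass the posterior quantities $(\bar\beta, \bar H, \bar\chi, \bar\nu)$ when the regime continues, or the prior $(\beta, H, \chi, \nu)$ when $d_{t+1} = 1$. A minimal NumPy sketch of the log density under the $t(\mu, s^2, \nu)$ parameterization above; the function name `predictive_logpdf` is hypothetical, and `y_next` may be a scalar or an array.

```python
import math
import numpy as np

def predictive_logpdf(y_next, x_next, beta_bar, H_bar, chi_bar, nu_bar):
    """Log Student-t predictive density t(x'beta_bar, chi_bar (x'H_bar^{-1} x + 1) / nu_bar, nu_bar)."""
    x_next = np.asarray(x_next, dtype=float)
    loc = x_next @ beta_bar
    # Squared scale s^2 = chi_bar (x' H_bar^{-1} x + 1) / nu_bar
    scale2 = chi_bar * (x_next @ np.linalg.solve(H_bar, x_next) + 1.0) / nu_bar
    z2 = (np.asarray(y_next, dtype=float) - loc) ** 2 / scale2
    return (math.lgamma((nu_bar + 1) / 2) - math.lgamma(nu_bar / 2)
            - 0.5 * math.log(nu_bar * math.pi * scale2)
            - 0.5 * (nu_bar + 1) * np.log1p(z2 / nu_bar))
```

The kernel term `log1p(z2 / nu_bar)` equals $\log\bigl(1 + (y_{t+1} - x_{t+1}'\bar\beta)^2 / (\bar\chi(x_{t+1}'\bar H^{-1}x_{t+1} + 1))\bigr)$, so it matches the kernel derived above.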
By integrating out the model parameters, the predictive density depends on the duration $d_{t+1}$ and the past information $Y_{1,t}$. Now Chib's (1996) method can be applied to sample $D_{1,T} = (d_1, \ldots, d_T)$ jointly. In detail, first use the forward-filtering method to calculate the filtered distribution of the duration $d_t$ for $t = 1, \ldots, T$.
1. At $t = 1$, the distribution of the duration is $p(d_1 = 1 \mid y_1) = 1$ by assumption.
2. The forecasting step:
$$p(d_{t+1} = j \mid Y_{1,t}) = \begin{cases} p(d_t = j-1 \mid Y_{1,t})(1-\pi) & \text{for } j = 2, \ldots, t+1 \\ \pi & \text{for } j = 1 \end{cases}$$
3. The updating step:
$$p(d_{t+1} = j \mid Y_{1,t+1}) = \frac{p(y_{t+1} \mid d_{t+1} = j, Y_{1,t})\, p(d_{t+1} = j \mid Y_{1,t})}{p(y_{t+1} \mid Y_{1,t})}$$
for $j = 1, \ldots, t+1$. The first term in the numerator on the right-hand side is the Student-t density derived above using the conjugate prior. The second term is obtained from step 2. The predictive likelihood is computed by summing over all values of the duration $d_{t+1}$:
$$p(y_{t+1} \mid Y_{1,t}) = \sum_{j=1}^{t+1} p(y_{t+1} \mid d_{t+1} = j, Y_{1,t})\, p(d_{t+1} = j \mid Y_{1,t})$$
4. Iterate over steps 2 and 3 until the last period $T$.
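The forward-filtering steps can be sketched as follows. This is an illustration, not the author's code: the predictive densities are supplied through a hypothetical callable `pred_dens(t, j)` (in practice, the Student-t density above evaluated with the data of the last $j-1$ periods), and the function also accumulates $\log p(Y_{1,T} \mid \pi)$, including the first observation's prior predictive term. Storing all filtered vectors takes $O(T^2)$ memory.

```python
import numpy as np

def forward_filter(pred_dens, pi, T):
    """Forward filtering of the duration d_t with a constant break probability pi.

    pred_dens(t, j) must return p(y_{t+1} | d_{t+1} = j, Y_{1,t}) for t = 0..T-1 and
    j = 1..t+1 (t = 0 gives the prior predictive p(y_1 | d_1 = 1)).
    Returns (filt, log_lik): filt[t - 1][j - 1] = p(d_t = j | Y_{1,t}) and
    log_lik = log p(Y_{1,T} | pi).
    """
    filt = [np.array([1.0])]                 # step 1: p(d_1 = 1 | y_1) = 1
    log_lik = np.log(pred_dens(0, 1))        # log p(y_1)
    for t in range(1, T):
        prior = np.empty(t + 1)              # step 2: p(d_{t+1} = j | Y_{1,t})
        prior[0] = pi                        # j = 1: break at t + 1
        prior[1:] = (1.0 - pi) * filt[-1]    # j = 2..t+1: regime continues
        lik = np.array([pred_dens(t, j) for j in range(1, t + 2)])
        joint = lik * prior                  # step 3: numerator of the update
        p_y = joint.sum()                    # p(y_{t+1} | Y_{1,t})
        log_lik += np.log(p_y)
        filt.append(joint / p_y)
    return filt, log_lik
```

As a sanity check, if every predictive density equals the same constant, the data carry no information about the durations and each filtered vector simply reproduces the forecasting step, e.g. $(\pi, 1-\pi)$ at $t = 2$.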
Then, use the backward-sampling method to draw the vector of durations $D_{1,T} = (d_1, \ldots, d_T)$
jointly.
1. Sample the last-period duration $d_T$ from $d_T \mid Y_{1,T}$, which is obtained from the last iteration of the forward-filtering step.
2. If $d_t > 1$, then $d_{t-1} = d_t - 1$ by construction.
3. If $d_t = 1$, then sample $d_{t-1}$ from the distribution $d_{t-1} \mid Y_{1,t-1}$. This is because $d_t = 1$ implies a structural change at time $t$. Hence, for any $\tau \ge t$, the data $y_\tau$ belong to a new regime and are uninformative about $d_{t-1}$. More rigorously, $d_{t-1} \mid d_t = 1, Y_{1,t-1}$ is equivalent to $d_{t-1} \mid d_t = 1, Y_{1,T}$.
4. Iterate steps 2 and 3 until the first period $t = 1$.
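The backward-sampling steps need only the filtered distributions, stored as a list where entry $t-1$ holds $p(d_t = j \mid Y_{1,t})$ for $j = 1, \ldots, t$ (the layout is an assumption of this sketch). A minimal NumPy version; the name `backward_sample` is hypothetical:

```python
import numpy as np

def backward_sample(filt, rng):
    """Draw D_{1,T} jointly from the filtered distributions (steps 1-4 above).

    filt[t - 1][j - 1] = p(d_t = j | Y_{1,t}), as produced by forward filtering.
    Returns an integer array (d_1, ..., d_T).
    """
    T = len(filt)
    d = np.empty(T, dtype=int)
    # Step 1: sample d_T from p(d_T | Y_{1,T}).
    d[T - 1] = rng.choice(len(filt[T - 1]), p=filt[T - 1]) + 1
    for t in range(T - 1, 0, -1):      # 0-based index t holds d_{t+1}
        if d[t] > 1:
            d[t - 1] = d[t] - 1        # step 2: same regime, one period shorter
        else:
            # Step 3: a break here, so later data are uninformative about d_{t-1}.
            d[t - 1] = rng.choice(len(filt[t - 1]), p=filt[t - 1]) + 1
    return d
```

Only the periods immediately before a sampled break require a fresh draw; every other duration is determined by the one after it.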
Assuming the conjugate prior in Maheu and Gordon's (2008) model has several advantages. First, the computational burden is negligible compared with the original model with non-conjugate priors, while the memory required to store the predictive likelihoods is $O(T^2)$, which is manageable for sample sizes up to several thousand. Second, the number of regimes is estimated in a straightforward way: it is sampled from the posterior distribution and equals the number of periods $t$ with duration $d_t = 1$. Define $K$ as the number of regimes implied by one posterior draw of the vector of durations $D_{1,T}$; then $K = \sum_{t=1}^{T} 1(d_t = 1)$. The posterior distribution of $K - 1$ is the distribution of the number of change-points. Third, the posterior sampler is efficient by the argument of Casella and Robert (1996), because the parameters $\Theta_{1,T} = \{\theta_t\}_{t=1}^{T}$ are integrated out.
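Counting regimes from a posterior draw of $D_{1,T}$ is simple bookkeeping: a new regime starts wherever $d_t = 1$. A minimal Python sketch that maps a duration draw to regime indices $s_t$ and tabulates the posterior distribution of the number of change-points $K - 1$ across draws (both function names are hypothetical):

```python
from collections import Counter

def durations_to_regimes(d):
    """Map durations D_{1,T} to regime indices S_{1,T}; a new regime starts where d_t = 1."""
    s, k = [], 0
    for dt in d:
        if dt == 1:
            k += 1
        s.append(k)
    return s

def changepoint_posterior(duration_draws):
    """Posterior probabilities of the number of change-points K - 1 across MCMC draws."""
    counts = Counter(sum(1 for dt in d if dt == 1) - 1 for d in duration_draws)
    n = len(duration_draws)
    return {m: c / n for m, c in sorted(counts.items())}
```

For instance, `durations_to_regimes([1, 2, 3, 1, 2, 1, 2, 3, 4])` returns `[1, 1, 1, 2, 2, 3, 3, 3, 3]`, i.e. $K = 3$ regimes and two change-points.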
2.2.1 Estimation and Inference
The parameters in the non-hierarchical prior, $(\beta, H, \chi, \nu)$, are fixed. In the case of a constant break probability, the prior of the break probability $\pi$ is assumed to be a Beta distribution, $B(\pi_a, \pi_b)$. Because the analytic conditional marginal likelihood $p(Y_{1,T} \mid \pi)$
exists, $\pi$ can be sampled in a Metropolis-Hastings framework by integrating out the time-varying parameters $\Theta_{1,T}$ and the regime durations $D_{1,T}$. To obtain an efficient proposal distribution, I exploit the information in the previous draw of the regime durations $D_{1,T}$ in the Markov chain. This is motivated by the fact that $\pi$ can be sampled in a Gibbs step conditional on $D_{1,T}$. Instead of accepting every such Gibbs draw of $\pi$, this method uses the conditional posterior distribution from the Gibbs sampler as a proposal distribution and accepts the draw with the probability implied by the Metropolis-Hastings algorithm.
In general, a Gibbs sampler would alternately draw random samples from $p(\pi \mid \Theta_{1,T}, D_{1,T}, Y_{1,T})$ and $p(\Theta_{1,T}, D_{1,T} \mid \pi, Y_{1,T})$. In contrast, the Metropolis-Hastings step samples from $p(\pi \mid Y_{1,T})$ first and then from $p(\Theta_{1,T}, D_{1,T} \mid \pi, Y_{1,T})$, which is equivalent to sampling from the joint posterior distribution $p(\pi, \Theta_{1,T}, D_{1,T} \mid Y_{1,T})$.
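1. Sample $\pi \mid Y_{1,T}$ from a proposal distribution:
$$\pi^{(i)} \mid Y_{1,T} \sim \mathrm{Beta}(\pi_a + K^{(i-1)} - 1,\ \pi_b + T - K^{(i-1)})$$
where $K^{(i-1)}$ is the number of regimes implied by the previous draw $D_{1,T}^{(i-1)}$. Accept $\pi^{(i)}$ with probability
$$\min\left\{1,\ \frac{p(\pi^{(i)} \mid \pi_a, \pi_b)}{p(\pi^{(i-1)} \mid \pi_a, \pi_b)} \cdot \frac{p(Y_{1,T} \mid \pi^{(i)})}{p(Y_{1,T} \mid \pi^{(i-1)})} \cdot \frac{p(\pi^{(i-1)} \mid \pi_a + K^{(i-1)} - 1, \pi_b + T - K^{(i-1)})}{p(\pi^{(i)} \mid \pi_a + K^{(i-1)} - 1, \pi_b + T - K^{(i-1)})}\right\}$$
If the draw is not accepted, $\pi^{(i)}$ is set equal to $\pi^{(i-1)}$.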
2. Sample $\Theta_{1,T}, D_{1,T} \mid \pi, Y_{1,T}$:
(a) Sample $D_{1,T} \mid \pi, Y_{1,T}$ using the forward-filtering backward-sampling method described above. Calculate the number of regimes $K$ and index the regimes by $1, \ldots, K$. Use an auxiliary variable $s_t$ to represent the regime index at time $t$. Set $s_1 = 1$ and $s_t = 1$ for $t > 1$ until the first time $\tau$ with $d_\tau = 1$, which implies there is a break
Chapter 2. A New Change-point Model 61
and the data is in a new regime. Then set sτ = 2 at this break point, and
iterate until the last period with sT = K.
For example, if D1,T = (1, 2, 3, 1, 2, 1, 2, 3, 4), we can infer there are K =
3 regimes and the time series of regime indicators is S1,T = (s1, . . . , sT ) =
(1, 1, 1, 2, 2, 3, 3, 3, 3). As we can see, there is a one-to-one relationship between
D1,T and S1,T .
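(b) To sample $\Theta_{1,T} \mid D_{1,T}, \pi, Y_{1,T}$, we only need to sample $K$ sets of parameters, because the parameters are constant within each regime. Define $\beta_i^*, \sigma_i^*$ as the distinct parameters that characterize the $i$th regime, $i = 1, \ldots, K$. Then
$$\beta_i^*, \sigma_i^{*-2} \sim NG\left(\bar{\beta}_i, \bar{H}_i, \bar{\chi}_i/2, \bar{\nu}_i/2\right)$$
with
$$\bar{H}_i = \underline{H} + X_i'X_i, \qquad \bar{\beta}_i = \bar{H}_i^{-1}\left(\underline{H}\,\underline{\beta} + X_i'Y_i\right),$$
$$\bar{\chi}_i = \underline{\chi} + Y_i'Y_i + \underline{\beta}'\underline{H}\,\underline{\beta} - \bar{\beta}_i'\bar{H}_i\bar{\beta}_i, \qquad \bar{\nu}_i = \underline{\nu} + D_i.$$
Here $X_i = (x_{t_0}, \ldots, x_{t_1})'$ and $Y_i = (y_{t_0}, \ldots, y_{t_1})'$, where $s_t = i$ if and only if $t_0 \le t \le t_1$, so $X_i$ and $Y_i$ collect the data in the $i$th regime, and $D_i = t_1 - t_0 + 1$ is the duration of the $i$th regime.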
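The two conditional steps above can be sketched in Python: `durations_to_regimes` converts a draw of $D_{1,T}$ into the regime indicators $S_{1,T}$, and `ng_regime_posterior` computes $(\bar\beta_i, \bar H_i, \bar\chi_i, \bar\nu_i)$ for one regime before a draw of $(\beta_i^*, \sigma_i^{*-2})$. The sketch assumes the usual conjugate form consistent with the $\bar\chi_i$ update, in which $\sigma_i^{*-2}$ is Gamma with shape $\bar\nu_i/2$ and rate $\bar\chi_i/2$ and $\beta_i^* \mid \sigma_i^*$ is Normal with mean $\bar\beta_i$ and covariance $\sigma_i^{*2}\bar H_i^{-1}$; the function names are illustrative.

```python
import numpy as np

def durations_to_regimes(durations):
    """Map durations d_t to 1-based regime indices s_t and the number of regimes K."""
    regimes, k = [], 0
    for d in durations:
        if d == 1:              # d_t = 1 marks the first period of a new regime
            k += 1
        regimes.append(k)
    return regimes, k

def ng_regime_posterior(X, y, beta0, H0, chi0, nu0):
    """Normal-Gamma posterior parameters for one regime's data (X_i, Y_i)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    H_bar = H0 + X.T @ X
    beta_bar = np.linalg.solve(H_bar, H0 @ beta0 + X.T @ y)
    chi_bar = chi0 + y @ y + beta0 @ H0 @ beta0 - beta_bar @ H_bar @ beta_bar
    nu_bar = nu0 + len(y)
    return beta_bar, H_bar, chi_bar, nu_bar

def draw_regime_parameters(beta_bar, H_bar, chi_bar, nu_bar, rng):
    """Draw (beta_i*, sigma_i*^{-2}) from the Normal-Gamma posterior."""
    precision = rng.gamma(shape=nu_bar / 2.0, scale=2.0 / chi_bar)  # rate chi_bar / 2
    cov = np.linalg.inv(H_bar) / precision                           # sigma^2 * H_bar^{-1}
    return rng.multivariate_normal(beta_bar, cov), precision
```

For the example in step (a), `durations_to_regimes([1, 2, 3, 1, 2, 1, 2, 3, 4])` gives the indicators `[1, 1, 1, 2, 2, 3, 3, 3, 3]` and `K = 3`.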
The Markov chain is run for $N_0 + N$ iterations and the first $N_0$ iterations are discarded as burn-in. The remaining draws $\{\pi^{(i)}, \Theta_{1,T}^{(i)}, D_{1,T}^{(i)}\}_{i=1}^N$ are used for inference and forecasting as if they were drawn from the posterior distribution. For example, the posterior mean of the break probability is estimated by the sample average $E(\pi \mid Y_{1,T}) \approx \frac{1}{N}\sum_{i=1}^N \pi^{(i)}$, and the posterior mean of the variance at time $t$ by $E(\sigma_t^2 \mid Y_{1,T}) \approx \frac{1}{N}\sum_{i=1}^N \sigma_t^{2(i)}$. Similarly, we can obtain the predictive density at time $T+1$. Because
$$p(y_{T+1} \mid Y_{1,T}) = E\left[p(y_{T+1} \mid d_T, \pi, Y_{1,T}) \mid Y_{1,T}\right],$$
it can be estimated from the posterior draws as
$$\hat{p}(y_{T+1} \mid Y_{1,T}) = \frac{1}{N}\sum_{i=1}^N \left[p(y_{T+1} \mid d_{T+1} = d_T^{(i)} + 1, Y_{1,T})\,(1 - \pi^{(i)}) + p(y_{T+1} \mid d_{T+1} = 1, Y_{1,T})\,\pi^{(i)}\right].$$
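Once the burn-in draws are discarded, the posterior means and the predictive mixture above are plain sample averages. The sketch below assumes the two conditional predictive densities have already been evaluated at $y_{T+1}$ for each retained draw; the array and function names are illustrative.

```python
import numpy as np

def discard_burn_in(draws, n_burn):
    """Drop the first N0 = n_burn iterations of the chain."""
    return np.asarray(draws)[n_burn:]

def posterior_mean(draws):
    """Monte Carlo estimate of a posterior mean (per element for vector draws)."""
    return np.mean(np.asarray(draws, float), axis=0)

def predictive_density(pi_draws, dens_continue, dens_break):
    """Average of p(y | d_{T+1} = d_T + 1)(1 - pi) + p(y | d_{T+1} = 1) pi over draws.

    dens_continue[i] and dens_break[i] are the two conditional predictive
    densities evaluated at y_{T+1} for posterior draw i.
    """
    pi = np.asarray(pi_draws, float)
    mix = np.asarray(dens_continue, float) * (1.0 - pi) + np.asarray(dens_break, float) * pi
    return float(mix.mean())
```

Each draw contributes a two-component mixture: continue the current regime with probability $1 - \pi^{(i)}$ or start a new one with probability $\pi^{(i)}$.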
2.3 Hierarchical Structural Break Model
2.3.1 Hierarchical Distribution
Maheu and Gordon (2008) require a careful choice of prior for forecasting, because in the presence of a break the parameters of the new regime depend only on the prior. They do not learn about this prior distribution from the parameters of the individual regimes. In contrast, Pesaran et al. (2006) propose estimating the prior to improve forecasts by exploiting the information across regimes. This section introduces a hierarchical prior for the structural break model. This is computationally feasible only with the conjugate prior of the previous section. The model is referred to as the hierarchical SB-LSV model, where SB stands for structural break and LSV means that the level, the slope and the variance are all subject to breaks. The model of the previous section is labeled the non-hierarchical SB-LSV model.
In detail, the prior parameters $\underline{\beta}, \underline{H}, \underline{\chi}, \underline{\nu}$ of the previous section are no longer fixed but are given a prior. The hierarchical SB-LSV model is:
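$$
\begin{aligned}
\underline{\beta}, \underline{H} &\sim N\text{-}W(m_0, \tau_0^{-1}, A_0, a_0)\\
\underline{\chi} &\sim G(d_0/2, c_0/2)\\
\underline{\nu} &\sim \text{Exp}(\rho_0)\\
d_t &= \begin{cases} d_{t-1} + 1 & \text{w.p. } 1 - \pi\\ 1 & \text{w.p. } \pi \end{cases}\\
(\beta_t, \sigma_t^{-2}) &\sim 1(d_t = 1)\, NG(\underline{\beta}, \underline{H}, \underline{\chi}/2, \underline{\nu}/2) + 1(d_t > 1)\,\delta_{(\beta_{t-1}, \sigma_{t-1}^{-2})}\\
y_t \mid \beta_t, \sigma_t, Y_{1,t-1} &\sim N(x_t'\beta_t, \sigma_t^2)
\end{aligned}
\tag{2}
$$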
The positive definite matrix $\underline{H}$ has a Wishart distribution $W(A_0, a_0)$, where $A_0$ is a positive definite matrix and $a_0$ is a positive scalar. The prior mean of $\underline{H}$ is $a_0A_0$, and the prior variance of $\underline{H}_{ij}$ is $a_0(A_{0,ij}^2 + A_{0,ii}A_{0,jj})$, where the subscript $ij$ denotes the element in the $i$th row and $j$th column. Conditional on $\underline{H}$, $\underline{\beta}$ is multivariate Normal, $N(m_0, \tau_0^{-1}\underline{H}^{-1})$, where $\tau_0$ is a positive scalar. $\underline{\chi}$ has a Gamma distribution with prior mean $c_0/d_0$ and prior variance $2c_0/d_0^2$. $\underline{\nu}$ has an Exponential distribution with prior mean $\rho_0$ and prior variance $\rho_0^2$.
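Conditional on the number of regimes $K$ and the distinct parameter values $\{\beta_i^*, \sigma_i^*\}_{i=1}^K$, the posterior distribution of the hierarchical parameters $\underline{\beta}$ and $\underline{H}$ is still Normal-Wishart:
$$\underline{\beta}, \underline{H} \mid \{\beta_i^*, \sigma_i^*\}_{i=1}^K \sim N\text{-}W(m_1, \tau_1^{-1}, A_1, a_1)$$
with
$$
\begin{aligned}
m_1 &= \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{i=1}^K \sigma_i^{*-2}\beta_i^*\right), \qquad \tau_1 = \tau_0 + \sum_{i=1}^K \sigma_i^{*-2},\\
A_1 &= \left(A_0^{-1} + \sum_{i=1}^K \sigma_i^{*-2}\beta_i^*\beta_i^{*\prime} + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1}, \qquad a_1 = a_0 + K.
\end{aligned}
$$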
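The conditional posterior of $\underline{\chi} \mid \underline{\nu}, K, \{\sigma_i^*\}_{i=1}^K$ is a Gamma distribution,
$$\underline{\chi} \mid \underline{\nu}, \{\sigma_i^*\}_{i=1}^K \sim G(d_1/2, c_1/2)$$
with
$$d_1 = d_0 + \sum_{i=1}^K \sigma_i^{*-2}, \qquad c_1 = c_0 + K\underline{\nu}.$$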
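The conditional posterior parameters of the Normal-Wishart and Gamma updates can be computed directly from the $K$ distinct regime parameters. A minimal numpy sketch, with illustrative function names, passing the $\beta_i^*$ as rows of an array and the precisions $\sigma_i^{*-2}$ as weights:

```python
import numpy as np

def normal_wishart_update(beta_star, prec_star, m0, tau0, A0, a0):
    """(m1, tau1, A1, a1) for (beta, H) given {beta_i*, sigma_i*^{-2}}, i = 1..K."""
    beta_star = np.atleast_2d(np.asarray(beta_star, float))  # K x p
    w = np.asarray(prec_star, float)                           # weights sigma_i*^{-2}
    K = len(w)
    tau1 = tau0 + w.sum()
    m1 = (tau0 * np.asarray(m0, float) + w @ beta_star) / tau1
    scatter = (beta_star * w[:, None]).T @ beta_star           # sum_i w_i beta_i beta_i'
    A1_inv = (np.linalg.inv(A0) + scatter
              + tau0 * np.outer(m0, m0) - tau1 * np.outer(m1, m1))
    return m1, tau1, np.linalg.inv(A1_inv), a0 + K

def chi_update(prec_star, nu, d0, c0):
    """(d1, c1) such that chi | nu, {sigma_i*} ~ G(d1/2, c1/2)."""
    w = np.asarray(prec_star, float)
    return d0 + w.sum(), c0 + len(w) * nu
```

Both updates only need the regime-level parameters, so their cost grows with the number of regimes $K$ rather than with the sample size $T$.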
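The conditional posterior of $\underline{\nu} \mid \underline{\chi}, K, \{\sigma_i^*\}_{i=1}^K$ does not have a convenient form:
$$p(\underline{\nu} \mid \underline{\chi}, K, \{\sigma_i^*\}_{i=1}^K) \propto \left(\frac{(\underline{\chi}/2)^{\underline{\nu}/2}}{\Gamma(\underline{\nu}/2)}\right)^K \left(\prod_{i=1}^K \sigma_i^{*-2}\right)^{\underline{\nu}/2} \exp\left(-\frac{\underline{\nu}}{\rho_0}\right).$$
It is sampled by a Metropolis-Hastings step with a random-walk proposal distribution.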
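This conditional can be evaluated on the log scale and updated with a random-walk Metropolis-Hastings step. In the sketch below, the Normal random-walk step size `step` is an assumed tuning parameter, proposals with $\underline{\nu} \le 0$ are rejected because they fall outside the support of the Exponential prior, and the function names are illustrative.

```python
import math
import random

def log_cond_nu(nu, chi, prec_star, rho0):
    """Log of p(nu | chi, K, {sigma_i*}) up to an additive constant, for nu > 0."""
    K = len(prec_star)
    sum_log_prec = sum(math.log(w) for w in prec_star)
    return (K * (0.5 * nu * math.log(chi / 2.0) - math.lgamma(nu / 2.0))
            + 0.5 * nu * sum_log_prec - nu / rho0)

def update_nu(nu, chi, prec_star, rho0, step=0.5, rng=None):
    """One random-walk MH update; the symmetric proposal needs no Hastings correction."""
    rng = rng or random.Random()
    proposal = nu + rng.gauss(0.0, step)
    if proposal <= 0.0:                 # outside the support: reject
        return nu
    log_ratio = (log_cond_nu(proposal, chi, prec_star, rho0)
                 - log_cond_nu(nu, chi, prec_star, rho0))
    return proposal if rng.random() < math.exp(min(0.0, log_ratio)) else nu
```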
Similar to the sampling of the break probability $\pi$ in the non-hierarchical SB-LSV model, I use a new MCMC sampler that draws the time-invariant parameters, including the hierarchical parameters, by using the conditional posterior distributions of a Gibbs sampler as the proposal distribution. To implement the sampler, define $\Psi = (\pi, \underline{\beta}, \underline{H}, \underline{\chi}, \underline{\nu})$ as the collection of the break probability and the parameters of the hierarchical prior, all of which are time-invariant. Since the marginal likelihood $p(Y_{1,T} \mid \Psi)$ has an analytic form, the joint sampler draws $\Psi$ from a proposal distribution and accepts the new draw with the probability implied by the Metropolis-Hastings algorithm. It then samples the regime durations $D_{1,T}$ and the time-varying parameters $\Theta_{1,T}$ conditional on $\Psi$ and the data $Y_{1,T}$. The details are in the appendix.
After discarding the burn-in samples, the remaining draws are used for posterior inference as in the non-hierarchical model. The predictive likelihood $p(y_{T+1} \mid Y_{1,T})$ is estimated by
$$\frac{1}{N}\sum_{i=1}^N \left[p(y_{T+1} \mid d_{T+1} = d_T^{(i)} + 1, \Psi^{(i)}, Y_{1,T})\,(1 - \pi^{(i)}) + p(y_{T+1} \mid d_{T+1} = 1, \Psi^{(i)}, Y_{1,T})\,\pi^{(i)}\right].$$
2.4 Extension
This new approach rests on two crucial assumptions. The first is the conjugate prior for the regime-dependent parameters that characterize the conditional data density. The second is that, conditional on the time-invariant parameters, the data before a break point are uninformative about the regime after it. Both are necessary for the predictive density to have an analytic form. Without the conjugate prior, each predictive density $p(y_{t+1} \mid d_{t+1}, Y_{1,t})$ would have to be estimated numerically. If the second assumption is violated, the data before a break carry information about the regime after it, and the duration $d_t$ alone is no longer sufficient for the predictive density given the time-invariant parameters. For example, in Koop and Potter's (2007) model, integrating out the parameters of the most recent regime requires the whole sample path of durations $D_{1,t} = (d_1, \ldots, d_t)$. Since the vector of durations $D_{1,t}$ can take on the order of $2^t$ values, it is computationally infeasible to calculate the predictive likelihood for every case, whereas in the new model it is feasible.
This section extends the model while preserving the two assumptions. The first extension allows structural breaks only in the variance $\sigma_t^2$ or only in the regression coefficients $\beta_t$. The second extension considers duration-dependent break probabilities. Because the predictive density retains an analytic form even with duration-dependent break probabilities, the approach remains computationally straightforward. Since modeling a duration-dependent break probability is equivalent to modeling the regime duration, this extension assumes a Poisson distribution for the duration of each regime.
2.4.1 Breaks in the Variance
The model with breaks only in the variance is referred to as the hierarchical SB-V model. It assumes a time-invariant vector of regression coefficients $\beta$, while the time-varying variances $\sigma_t^2$ are drawn from a hierarchical prior. In detail:
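$$
\begin{aligned}
\underline{\chi} &\sim G(d_0/2, c_0/2)\\
\underline{\nu} &\sim \text{Exp}(\rho_0)\\
\beta &\sim N(\underline{\beta}, \underline{H}^{-1})\\
d_t &= \begin{cases} d_{t-1} + 1 & \text{w.p. } 1 - \pi\\ 1 & \text{w.p. } \pi \end{cases}\\
\sigma_t^{-2} &\sim 1(d_t = 1)\, G(\underline{\chi}/2, \underline{\nu}/2) + 1(d_t > 1)\,\delta_{\sigma_{t-1}^{-2}}\\
y_t \mid \beta, \sigma_t, Y_{1,t-1} &\sim N(x_t'\beta, \sigma_t^2)
\end{aligned}
\tag{3}
$$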
The prior for the regression coefficients $\beta$ is not hierarchical, since $\beta$ is constant across all regimes; its prior parameters $\underline{\beta}$ and $\underline{H}$ are fixed. The prior for the variance $\sigma_t^2$, on the other hand, is hierarchical in order to share information across regimes. Since the regression coefficients $\beta$ are the same in all regimes, the data before a break point are informative about the regime after it. Hence the duration of the most recent regime, $d_{t+1}$, is not sufficient for computing the posterior of the parameters in that regime. More rigorously, $p(\theta_t \mid d_t, Y_{1,T}) \neq p(\theta_t \mid d_t, Y_{1,t})$. And the predictive density $p(y_{t+1} \mid d_{t+1}, Y_{1,t})$ is no longer a Student-t distribution as in the non-hierarchical SB-LSV model.
Notice that the second key assumption still holds, because the vector of regression coefficients $\beta$ is time-invariant in the hierarchical SB-V model. Conditional on $\beta$, if a break occurs, the volatility is drawn independently from the hierarchical prior and the previous information is not useful for the current regime. Although $p(\theta_t \mid d_t, Y_{1,T}) \neq p(\theta_t \mid d_t, Y_{1,t})$, we still have $p(\theta_t \mid d_t, \beta, Y_{1,T}) = p(\theta_t \mid d_t, \beta, Y_{1,t})$.
Meanwhile, conditional on $\beta$, the prior for the variance is conjugate, so the model can be estimated with a method similar to that for the hierarchical SB-LSV model. Specifically, define the collection of time-invariant parameters as $\Psi = (\pi, \beta, \underline{\chi}, \underline{\nu})$. The MCMC sampler first draws $\Psi \mid Y_{1,T}$ using the conditional posteriors of a Gibbs sampler as the proposal distribution and accepts the draw with the probability implied by the Metropolis-Hastings algorithm. Then, conditional on $\Psi$ and the data $Y_{1,T}$, it draws the regime durations $D_{1,T}$ and the time-varying parameters $\Theta_{1,T}$. In the hierarchical SB-V model, $\Theta_{1,T} = \{\sigma_t\}_{t=1}^T$, because the time-invariant regression coefficients $\beta \in \Psi$ are sampled in the first step. The details are in the appendix.
2.4.2 Breaks in the Regression Coefficients
We can also keep the variance $\sigma^2$ time-invariant and only allow the regression coefficients to change over time. This model is named the hierarchical SB-LS model, since breaks only occur in the level and the slopes. Conditional on the variance $\sigma^2$, the data before a break are not informative about the current regime. Also, a conjugate prior exists for the regression coefficients $\beta_t$ in each regime. Since the two key assumptions are satisfied, the hierarchical SB-LS model can be estimated in the same way as the hierarchical SB-LSV and SB-V models.
In detail, the model is:
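$$
\begin{aligned}
\underline{\beta}, \underline{H} &\sim N\text{-}W(m_0, \tau_0^{-1}, A_0, a_0)\\
\sigma^{-2} &\sim G(\underline{\chi}/2, \underline{\nu}/2)\\
d_t &= \begin{cases} d_{t-1} + 1 & \text{w.p. } 1 - \pi\\ 1 & \text{w.p. } \pi \end{cases}\\
\beta_t &\sim 1(d_t = 1)\, N(\underline{\beta}, \underline{H}^{-1}) + 1(d_t > 1)\,\delta_{\beta_{t-1}}\\
y_t \mid \beta_t, \sigma, Y_{1,t-1} &\sim N(x_t'\beta_t, \sigma^2)
\end{aligned}
\tag{4}
$$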
The posterior sampler draws the time-invariant parameters $\Psi = (\pi, \underline{\beta}, \underline{H}, \sigma)$ from their posterior distribution using an MCMC sampler. It then samples the regime durations $D_{1,T}$ and the time-varying parameters $\Theta_{1,T} = \{\beta_t\}_{t=1}^T$ conditional on $\Psi$ and the data $Y_{1,T}$. The details are in the appendix.
2.4.3 Duration Dependent Break Probability
So far, the time-invariant break probability $\pi$ has been used in the prediction step to compute $p(d_{t+1} = j \mid Y_{1,t})$, from which the filtered probabilities and the predictive density $p(y_{t+1} \mid Y_{1,t})$ are constructed. If the break probability depends on the regime duration, define the break probability $p(d_{t+1} = 1 \mid d_t = j)$ as $\pi_j$. Then $p(d_{t+1} = j \mid Y_{1,t})$ is calculated as
$$
p(d_{t+1} = j \mid Y_{1,t}) = \begin{cases} p(d_t = j - 1 \mid Y_{1,t})\,(1 - \pi_{j-1}) & \text{for } j = 2, \ldots, t+1,\\ \sum_{k=1}^{t} p(d_t = k \mid Y_{1,t})\,\pi_k & \text{for } j = 1. \end{cases}
$$
The updating step of the forward filter and the backward sampling step are not affected. Conditional on the durations $D_{1,T}$, the posterior of the parameters that characterize each regime is unchanged as well. So the estimation remains computationally straightforward.
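The prediction step with a duration-dependent break probability is a shift of the filtered duration probabilities plus a pooled break term. In this sketch, `filtered[j-1]` holds $p(d_t = j \mid Y_{1,t})$ and `hazard[j-1]` holds $\pi_j$ for $j = 1, \ldots, t$; the function name is illustrative. With a constant hazard it reduces to the time-invariant case.

```python
import numpy as np

def predict_durations(filtered, hazard):
    """Return p(d_{t+1} = j | Y_{1,t}) for j = 1, ..., t+1.

    filtered[j-1] = p(d_t = j | Y_{1,t}); hazard[j-1] = pi_j = p(d_{t+1} = 1 | d_t = j).
    """
    filtered = np.asarray(filtered, float)
    hazard = np.asarray(hazard, float)
    pred = np.empty(len(filtered) + 1)
    pred[0] = np.sum(filtered * hazard)         # a break: a new regime starts
    pred[1:] = filtered * (1.0 - hazard)        # no break: the duration grows by one
    return pred
```

Only this prediction step changes; the update step still multiplies these probabilities by the conditional predictive densities and normalizes.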
A Poisson distribution is assumed for the regime durations in this extension, and its hazard rate gives the duration-dependent break probabilities³. The duration distribution is $P(\text{Duration} = d \mid \lambda) = e^{-\lambda}\lambda^{d-1}/(d-1)!$ for $d \ge 1$, that is, $\text{Duration} - 1$ is Poisson with mean $\lambda$. The implied break probability is
$$
\begin{aligned}
P(d_{t+1} = 1 \mid d_t = j, \lambda) &= P(\text{Duration} = j \mid \text{Duration} \ge j, \lambda)\\
&= \frac{P(\text{Duration} = j \mid \lambda)}{P(\text{Duration} \ge j \mid \lambda)}\\
&= \frac{e^{-\lambda}\lambda^{j-1}}{(j-1)\,\gamma(j-1, \lambda)},
\end{aligned}
$$
where $\gamma(x, y) = \int_0^y t^{x-1}e^{-t}\,dt$ is the lower incomplete gamma function. The last expression applies for $j \ge 2$; for $j = 1$ the break probability is $e^{-\lambda}$. The no-break probability $P(d_{t+1} = j + 1 \mid d_t = j)$ is simply $1 - P(d_{t+1} = 1 \mid d_t = j, \lambda)$. The priors for the other parameters are the same as in the hierarchical SB-LSV model. This extension is labeled the hierarchical DDSB-LSV model, where DD stands for duration dependent.
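The Poisson hazard can be evaluated with the standard library by noting that $P(\text{Duration} \ge j) = 1 - \sum_{d=1}^{j-1} P(\text{Duration} = d)$, which avoids the incomplete gamma function; this is an implementation choice of the sketch, not necessarily the author's. It also reproduces the statement in the application that at $\lambda = 28.9$ the break probability stays below $10^{-5}$ for durations below 10. Function names are illustrative.

```python
import math

def poisson_duration_pmf(d, lam):
    """P(Duration = d | lambda) = exp(-lam) lam^(d-1) / (d-1)!, for d >= 1."""
    return math.exp(-lam + (d - 1) * math.log(lam) - math.lgamma(d))

def poisson_break_prob(j, lam):
    """P(d_{t+1} = 1 | d_t = j, lambda): hazard of the shifted Poisson duration."""
    survival = 1.0 - sum(poisson_duration_pmf(d, lam) for d in range(1, j))  # P(Duration >= j)
    return poisson_duration_pmf(j, lam) / survival
```

For large $j$ the subtraction inside `survival` loses precision, so a production version would compute the upper tail directly.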
To estimate the hierarchical DDSB-LSV model, note that the set of time-invariant parameters is now $\Psi = (\lambda, \underline{\beta}, \underline{H}, \underline{\chi}, \underline{\nu})$. The posterior sampler draws $\Psi$ from its posterior distribution with a Metropolis-Hastings step. Then the time-varying parameters $\Theta_{1,T}$ and the regime durations $D_{1,T}$ are sampled conditional on $\Psi$ and the data $Y_{1,T}$. This is still a joint sampler, as in the hierarchical SB-LSV model with a time-invariant break probability. Details are in the appendix.
2.5 Application to Canada Inflation
The new approach is applied to a Canadian quarterly inflation series to investigate the instability of its dynamics. The data are constructed from the quarterly CPI, downloaded from CANSIM⁴. The quarterly inflation rate is calculated as 100 times the log difference of the CPI. The sample runs from 1961Q1 to 2009Q4, with 196 observations in total. The summary statistics are in Table 2.1.

³ In general, any hazard function from survival analysis can be applied to model the duration.
The hierarchical models used are the SB-LSV, SB-V, SB-LS and DDSB-LSV models. Two non-hierarchical SB-LSV models are also estimated: one estimates the break probability $\pi$ and the other fixes $\pi = 0.01$. Linear autoregressive models are used as benchmarks for model comparison. For all the structural break models, I assume that the explanatory variables in each regime are an intercept and the first lag of the dependent variable, so the data follow an AR(1) process within each regime.
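The data construction and the AR(1) regressors can be sketched as follows. The CPI series itself is not reproduced here, so the example uses made-up index values; `quarterly_inflation` and `ar_design` are illustrative names, and treating the first $q$ observations as fixed initial conditions is an assumption of the sketch.

```python
import numpy as np

def quarterly_inflation(cpi):
    """Quarterly inflation: 100 times the log difference of a price index."""
    cpi = np.asarray(cpi, float)
    return 100.0 * np.diff(np.log(cpi))

def ar_design(y, q=1):
    """Regressors x_t = (1, y_{t-1}, ..., y_{t-q})' and targets y_t for t = q+1, ..., T."""
    y = np.asarray(y, float)
    T = len(y)
    lags = [y[q - k:T - k] for k in range(1, q + 1)]   # column k holds y_{t-k}
    X = np.column_stack([np.ones(T - q)] + lags)
    return X, y[q:]
```

With `q=1` each row of `X` is $x_t = (1, y_{t-1})'$, matching the intercept-plus-one-lag specification used in every regime.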
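The prior of the hierarchical SB-LSV model is:
$$
\begin{aligned}
\pi &\sim B(1, 9)\\
\underline{H} &\sim W\left(\begin{pmatrix} 0.2 & 0\\ 0 & 0.2 \end{pmatrix}, 5\right)\\
\underline{\beta} \mid \underline{H} &\sim N\left(\begin{pmatrix} 0\\ 0 \end{pmatrix}, \underline{H}^{-1}\right)\\
\underline{\chi} &\sim G(2, 2)\\
\underline{\nu} &\sim \text{Exp}(2)
\end{aligned}
$$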
This prior is informative but covers a wide range of empirically realistic values. The prior mean of the break probability is $E(\pi) = 0.1$, which implies infrequent breaks. The inverse of the variance in each regime is drawn from a Gamma distribution whose degree-of-freedom parameter $\underline{\nu}$ is centered at 2 and whose multiplier $\underline{\chi}$ is centered at 1. Conditional on these values, the variance in each regime has a mean of 1.0 and an infinite variance. The prior mean of $\underline{\beta}$ is a vector of zeros and the prior mean of $\underline{H}^{-1}$ is 2.5 times the identity matrix. Conditional on these values and on the variance in a regime, the intercept and the autoregressive coefficient of the AR(1) process in that regime are drawn independently from a Normal distribution with mean 0 and variance equal to 2.5 times the variance in that regime.

⁴ TABLE NUMBER: 3800003. TABLE TITLE: GROSS DOMESTIC PRODUCT (GDP) INDEXES. Data Sources: IMDB (Integrated Meta Data Base) Numbers: 1901 - NATIONAL INCOME AND EXPENDITURE ACCOUNTS. SERIES TITLE: CANADA; IMPLICIT PRICE INDEXES 2002=100; PERSONAL EXPENDITURE ON CONSUMER GOODS AND SERVICES. SERIES FREQUENCY: Quarterly
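The prior moments quoted above can be checked with a few lines of arithmetic. The sketch reads $G(x, y)$ as a Gamma distribution with rate $x$ and shape $y$, which is the reading consistent with the stated prior mean $c_0/d_0$ for $G(d_0/2, c_0/2)$, and uses the standard Wishart results $E(\underline{H}) = a_0A_0$ and $E(\underline{H}^{-1}) = A_0^{-1}/(a_0 - p - 1)$ for a $p \times p$ matrix. Function names are illustrative.

```python
import numpy as np

def wishart_prior_moments(A0, a0):
    """Mean and elementwise variance of H ~ W(A0, a0), and E[H^{-1}] (needs a0 > p + 1)."""
    A0 = np.asarray(A0, float)
    p = A0.shape[0]
    diag = np.diag(A0)
    mean_H = a0 * A0
    var_H = a0 * (A0 ** 2 + np.outer(diag, diag))   # a0 (A_ij^2 + A_ii A_jj)
    mean_H_inv = np.linalg.inv(A0) / (a0 - p - 1)
    return mean_H, var_H, mean_H_inv

def gamma_moments(rate, shape):
    """Mean and variance of a Gamma distribution with the given rate and shape."""
    return shape / rate, shape / rate ** 2

mean_H, var_H, mean_H_inv = wishart_prior_moments(0.2 * np.eye(2), 5)
prior_mean_pi = 1.0 / (1.0 + 9.0)             # B(1, 9)
chi_mean, chi_var = gamma_moments(2.0, 2.0)   # chi ~ G(2, 2)
nu_mean, nu_var = 2.0, 2.0 ** 2               # Exponential with mean 2
```

The resulting prior means, $E(\underline{H}) = I_2$, $E(\underline{H}^{-1}) = 2.5 I_2$, $E(\pi) = 0.1$, $E(\underline{\chi}) = 1$ and $E(\underline{\nu}) = 2$, agree with the prior columns of Table 2.2.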
The prior and posterior summaries of the parameters are in Table 2.2. The posterior mean of the break probability $\pi$ is 0.04, less than its prior mean of 0.1, and implies an average regime duration of 25 quarters, or 6 years and 1 quarter. The 95% posterior density interval is narrower than that of the prior, because the data are informative about $\pi$. The prior mean of the precision matrix $\underline{H}$ is the identity matrix, which is consistent with its posterior mean, so the data carry little information about $\underline{H}$. On the other hand, the prior and posterior means of the intercept $\underline{\beta}_0$ are 0 and 0.82, respectively, and the posterior 95% density interval of $\underline{\beta}_0$ does not cover 0. We can conclude that $\underline{\beta}$ in the hierarchical structure learns from the information across regimes. The prior and posterior means of $\underline{\chi}$ are 1.0 and 1.01, so the data do not shift its location; however, its density interval shrinks, which means the data confirm the prior assumption. Lastly, $\underline{\nu}$ learns from the data: its prior mean is 2.0, its posterior mean is 6.04, and the 95% posterior density interval does not include the prior mean.
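Under a constant break probability, each period ends the current regime with probability $\pi$, so regime durations are geometric with mean $1/\pi$ quarters. The sketch converts the posterior mean $\pi = 0.04$ into the 6 years and 1 quarter quoted above and checks the geometric mean by simulation; plugging in the posterior mean of $\pi$, rather than averaging $1/\pi$ over posterior draws, is a simplification of the sketch.

```python
import math
import random

def expected_duration(pi):
    """Mean of a geometric duration on {1, 2, ...} with break probability pi."""
    return 1.0 / pi

def as_years_quarters(quarters):
    """Split a number of quarters into (years, quarters)."""
    return divmod(round(quarters), 4)

def simulate_mean_duration(pi, n, rng):
    """Monte Carlo mean of geometric durations via inverse-CDF sampling."""
    total = 0
    for _ in range(n):
        u = 1.0 - rng.random()                     # u in (0, 1]
        total += 1 + math.floor(math.log(u) / math.log1p(-pi))
    return total / n
```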
The posterior means of the regression coefficients $E(\beta_t \mid Y_{1,T})$, the standard deviations $E(\sigma_t \mid Y_{1,T})$ and the break probabilities $p(d_t = 1 \mid Y_{1,T})$ for $t = 1, \ldots, T$ are plotted in Figure 2.1. The top panel shows the data. The second panel plots the break probabilities over time. The middle panel plots the intercept $\beta_{t,0}$. The AR(1) coefficient $\beta_{t,1}$ is plotted in the fourth panel and is labeled persistence. The standard deviation $\sigma_t$ is in the bottom panel. From the plot of the break probabilities, we can visually identify 4 major breaks in the inflation process. The first, in the mid-1960s, features an increase in the inflation level. The second, in the early 1970s, is associated with the oil crisis and characterized by increases in persistence and volatility. In the mid-1980s, a structural change decreased both persistence and volatility, consistent with the Great Moderation. The last break occurred in the early 1990s and features decreases in both the inflation level and its volatility. Figure 2.1 shows that each break brings a different dynamic pattern to the inflation process.
The non-hierarchical SB-LSV model fixes the prior parameters at $\underline{\beta} = (0, 0)'$, $\underline{H} = I_2$, $\underline{\chi} = 1$ and $\underline{\nu} = 2$, which are the prior means of the hierarchical SB-LSV model. The break probability $\pi$ has the same prior as in the hierarchical model, $B(1, 9)$. The posterior mean of $\pi$ is 0.01, with a 95% density interval of (0.002, 0.029). The non-hierarchical SB-LSV model therefore implies longer regime durations than the hierarchical SB-LSV model.
The posterior means of the time-varying parameters, $E(\beta_t \mid Y_{1,T})$ and $E(\sigma_t \mid Y_{1,T})$, and the break probabilities $p(d_t = 1 \mid Y_{1,T})$ are plotted in Figure 2.2. Each panel has the same interpretation as in Figure 2.1. Three points are worth noting. First, there is only one spike in the break probabilities, in the early 1990s. It captures the decrease in the variance and is consistent with the last change-point identified by the hierarchical SB-LSV model. Second, although it is not visible in the second panel, the middle and fourth panels show a gradual increase in persistence and a decrease in the intercept between the mid-1960s and the early 1970s. However, the location of a change-point in that period is highly uncertain. Lastly, the non-hierarchical SB-LSV model fails to identify the Great Moderation in the mid-1980s.
As an alternative to the time-invariant break probability, the regime duration is modeled with a Poisson distribution. The prior of the duration parameter $\lambda$ is an Exponential distribution with mean 50. The other priors are the same as for the hierarchical SB-LSV model. For simplicity, the first period of the sample is assumed to be the first period of its regime. Table 2.3 shows the posterior summary of the parameters. The learning about $\underline{\beta}$ is similar to that in the hierarchical SB-LSV model, but the results for $\underline{\chi}$ and $\underline{\nu}$ differ. The estimate of $\lambda$ implies that a regime lasts about 7 years and a quarter, comparable to the 6 years and a quarter implied by the hierarchical SB-LSV model. Figure 2.3 shows that the change-points identified by the duration-dependent model are consistent with Figure 2.1. The dynamic patterns in the two figures are similar except for the last 10 years: the hierarchical DDSB-LSV model indicates some uncertainty about a structural change around the year 2000, after which the volatility increased. Although the smoothed parameters of the hierarchical DDSB-LSV model are similar to those of the hierarchical SB-LSV model, the model comparison below strongly rejects the Poisson duration. This is because the duration-dependent break probability implied by the Poisson distribution is very small when the regime duration is short. For example, if the duration parameter $\lambda$ equals its posterior mean of 28.9, the break probability $p(d_{t+1} = 1 \mid d_t)$ is less than $10^{-5}$ whenever $d_t < 10$. This feature makes the model slower to learn about regime changes than a model with a constant break probability.
For the hierarchical SB-V model, which only allows breaks in the variance, the prior of the time-invariant regression coefficient vector $\beta$ is
$$\beta \sim N\left(\begin{pmatrix} 0\\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}\right).$$
Its mean and precision matrix equal the prior means of $\underline{\beta}$ and $\underline{H}$ in the hierarchical SB-LSV model. The priors of $\pi$, $\underline{\chi}$ and $\underline{\nu}$ are the same as in the hierarchical SB-LSV model. The posterior summary is in Table 2.4. The most prominent feature is that the posterior mean of the break probability $\pi$ is 0.16, much higher than in either the hierarchical or the non-hierarchical SB-LSV model.
The frequent changes in volatility are shown in Figure 2.4. The middle panel shows the break probabilities, from which we can see that the process is characterized by many breaks in the variance. This frequent-break pattern resembles the ARCH effects of Engle (1982, 1983). The bottom panel plots the posterior means of the standard deviations $E(\sigma_t \mid Y_{1,T})$. Although some episodes, such as the mid-1980s to the early 1990s, are more stable, there is no general pattern in the evolution of volatility. In practice, overly frequent structural changes are undesirable, because they leave less data to estimate the most recent regime. The frequent-break pattern of Canadian inflation estimated by the hierarchical SB-V model reflects model misspecification: the more general hierarchical SB-LSV model nests the hierarchical SB-V model and does not find as many breaks as the latter.
On the other hand, the hierarchical SB-LS model allows breaks only in the regression coefficients and keeps the variance constant. The prior of the inverse of the variance is
$$\sigma^{-2} \sim G(1, 0.5).$$
The values of the multiplier and the degree of freedom in this prior are the means implied by the prior of the hierarchical SB-LSV model. The priors for $\pi$, $\underline{\beta}$ and $\underline{H}$ are the same as in the hierarchical SB-LSV model. The posterior summary is in Table 2.5. The posterior of the break probability is similar to that of the hierarchical SB-LSV model. Figure 2.5 plots the posterior means of the regression coefficients and the break probabilities. Surprisingly, the hierarchical SB-LS model locates the same change-points as the hierarchical SB-LSV model does in Figure 2.1.
These results raise several questions. Are changes in volatility important for the Canadian inflation series? Is the Great Moderation a feature of the data? Can a duration-dependent break probability improve out-of-sample forecasts? This chapter uses log marginal likelihoods for model comparison to answer these questions.
Let $i$ index the models. The marginal likelihood of model $M_i$ is
$$p(Y_{1,T} \mid M_i) = \prod_{t=1}^T p(y_t \mid Y_{1,t-1}, M_i).$$
This decomposition shows that the marginal likelihood is intrinsically a comparison based on out-of-sample forecasts, which automatically penalizes over-parameterized models. An improvement in the marginal likelihood implies better forecasting ability over the whole sample.
The log marginal likelihood is calculated as $\sum_{t=1}^T \log p(y_t \mid Y_{1,t-1}, M_i)$. The one-period predictive likelihood $p(y_t \mid Y_{1,t-1}, M_i)$ is calculated by estimating the model with the data up to $t-1$ and evaluating the predictive density at the realized value $y_t$. For the first period, the prior is used in place of the posterior. Kass and Raftery (1995a) propose comparing models $M_i$ and $M_j$ by the log Bayes factor $\log(BF_{ij})$, where $BF_{ij} = p(Y_{1,T} \mid M_i)/p(Y_{1,T} \mid M_j)$ is the ratio of the marginal likelihoods. A positive value of $\log(BF_{ij})$ supports model $M_i$ against $M_j$. Quantitatively, Kass and Raftery (1995a) suggest that the evidence is barely worth a mention for $0 \le \log(BF_{ij}) < 1$, positive for $1 \le \log(BF_{ij}) < 3$, strong for $3 \le \log(BF_{ij}) < 5$, and very strong for $\log(BF_{ij}) \ge 5$.
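Applying these thresholds to two log marginal likelihoods is a subtraction followed by a lookup. The sketch below is illustrative; the usage values are the log marginal likelihoods reported later in this section.

```python
def kass_raftery(log_ml_i, log_ml_j):
    """Return log BF_ij, the favoured model, and the evidence category."""
    log_bf = log_ml_i - log_ml_j
    favoured = "M_i" if log_bf >= 0 else "M_j"
    strength = abs(log_bf)
    if strength < 1:
        label = "barely worth a mention"
    elif strength < 3:
        label = "positive"
    elif strength < 5:
        label = "strong"
    else:
        label = "very strong"
    return log_bf, favoured, label
```

For example, comparing the hierarchical SB-LSV model ($-122.5$) with the hierarchical SB-LS model ($-140.4$) gives a log Bayes factor of 17.9, which falls in the very strong category, while the SB-LS model against the AR(2) model ($-144.2$) gives 3.8, which is strong.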
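Table 2.6 shows the log marginal likelihoods of the different models. Autoregressive models are also estimated as benchmarks:
$$y_t \mid \beta, \sigma, Y_{1,t-1} \sim N(\beta_0 + \beta_1 y_{t-1} + \cdots + \beta_q y_{t-q}, \sigma^2). \tag{5}$$
The prior is Normal-Gamma,
$$(\beta, \sigma^{-2}) \sim NG(\underline{\beta}, \underline{H}, \underline{\chi}/2, \underline{\nu}/2),$$
with $\underline{\beta} = 0_{(q+1)\times 1}$, $\underline{H} = I_{q+1}$, $\underline{\chi} = 1$ and $\underline{\nu} = 2$. If $q = 1$, the model is an AR(1) process and these values are the same as in the non-hierarchical SB-LSV model.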
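For these conjugate AR($q$) benchmarks, each one-step predictive density is Student-t, so the log marginal likelihood can be accumulated exactly as the sum of one-step predictive log densities. The sketch assumes the conjugate form implied by the regime posterior updates earlier in the chapter ($\beta \mid \sigma^2$ Normal with covariance $\sigma^2\underline{H}^{-1}$, $\sigma^{-2}$ Gamma with shape $\underline{\nu}/2$ and rate $\underline{\chi}/2$), conditions on the first $q$ observations, and uses the standard Normal-Gamma predictive scale $(\bar\chi/\bar\nu)(1 + x_t'\bar H^{-1}x_t)$, which is not stated in the text. Function names are illustrative.

```python
import math
import numpy as np

def student_t_logpdf(y, loc, scale2, df):
    """Log density of a Student-t with location loc, squared scale scale2, df degrees of freedom."""
    z2 = (y - loc) ** 2 / scale2
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi * scale2)
            - 0.5 * (df + 1.0) * math.log1p(z2 / df))

def ng_log_marginal_likelihood(X, y, beta0, H0, chi0, nu0):
    """Sum of one-step-ahead predictive log densities under a Normal-Gamma prior."""
    beta_bar = np.asarray(beta0, float).copy()
    H_bar = np.asarray(H0, float).copy()
    chi_bar, nu_bar = float(chi0), float(nu0)
    total = 0.0
    for x_t, y_t in zip(np.asarray(X, float), np.asarray(y, float)):
        # Predictive for y_t given data up to t-1 (the prior when t = 1).
        loc = x_t @ beta_bar
        scale2 = chi_bar / nu_bar * (1.0 + x_t @ np.linalg.solve(H_bar, x_t))
        total += student_t_logpdf(y_t, loc, scale2, nu_bar)
        # Conjugate update with observation t; the chi update telescopes to the batch formula.
        H_new = H_bar + np.outer(x_t, x_t)
        beta_new = np.linalg.solve(H_new, H_bar @ beta_bar + x_t * y_t)
        chi_bar += y_t ** 2 + beta_bar @ H_bar @ beta_bar - beta_new @ H_new @ beta_new
        nu_bar += 1.0
        beta_bar, H_bar = beta_new, H_new
    return total
```

Because the sum is an exact chain-rule decomposition, it equals the closed-form Normal-Gamma marginal likelihood of the same data, which makes a convenient correctness check.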
The hierarchical SB-V, the hierarchical DDSB-LSV, the non-hierarchical models and the AR(1) model perform worst, with log marginal likelihoods below $-155$. The duration-dependent break probability is thus not appropriate for the Canadian inflation dynamics in this application. The AR(2) and AR(3) models improve performance by adding lags. The hierarchical SB-LS model has a log marginal likelihood of $-140.4$, larger than those of the AR(2) and AR(3) models by $-140.4 - (-144.2) = 3.8$ and $-140.4 - (-144.7) = 4.3$, respectively. So keeping the AR(1) dynamics but allowing breaks in the regression coefficients improves the marginal likelihood more than adding extra lags. The best model is the hierarchical SB-LSV model, with a log marginal likelihood of $-122.5$, which strongly dominates the other models.
As a robustness check, I estimate each break model assuming an AR(2) or AR(3) process in each regime. For the hierarchical SB-LSV model, the log marginal likelihoods are $-126.3$ and $-129.7$ for the AR(2) and AR(3) cases, both lower than under the basic AR(1) assumption. The largest log marginal likelihood among the remaining break models with AR(2) or AR(3) regimes is $-144.4$. The best model is still the hierarchical SB-LSV model with an AR(1) process in each regime. Hence, after controlling for structural breaks, adding lags does not improve the marginal likelihood, that is, the forecasting performance measured by predictive likelihoods.
To check the sensitivity to the prior on the break probability $\pi$ in the SB-LSV, SB-LS and SB-V models, the alternative priors $B(1, 19)$, $B(1, 99)$ and $B(1, 999)$ were used. For the DDSB-LSV model, the prior mean of $\lambda$ was set to 10 or 100. The posterior means of the time-varying parameters in Figures 2.1 to 2.5 are similar, and the model comparison results are consistent with the original ones. The priors of the hierarchical parameters are kept the same, since they cover a reasonably wide range of the parameter space.
2.6 Conclusion
A new approach is introduced to estimate and forecast time series with multiple change-points. The methodology obtains an analytic form for the predictive density by exploiting the conjugate prior for the parameters that characterize each regime. The prior is modeled hierarchically to exploit information across regimes and improve forecasts.
This approach allows breaks in the variance, the regression coefficients or both. It also accommodates duration-dependent break probabilities; one extension assumes that the regime duration has a Poisson distribution.
A new Markov chain Monte Carlo sampler is introduced to draw the parameters from the posterior distribution efficiently. It uses the conditional posterior distribution of a Gibbs sampler as a proposal distribution and accepts the draw with a Metropolis-Hastings step. The sampler is efficient because the parameters are sampled jointly.
The new model and its extensions are applied to a Canadian inflation series, with the log marginal likelihood as the criterion for model comparison. The best model is the hierarchical model that allows breaks in the regression coefficients and the variance simultaneously. It identifies 4 major change-points in Canadian inflation dynamics. The model comparison also shows that duration-dependent break probabilities are not a feature of the data, and that after controlling for structural breaks, adding extra lags as explanatory variables does not improve out-of-sample forecasts.
2.7 Appendix
2.7.1 Hierarchical SB-LSV Model
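1. Sample $\pi^{(i)}, \underline{\beta}^{(i)}, \underline{H}^{(i)}, \underline{\chi}^{(i)}, \underline{\nu}^{(i)} \mid Y_{1,T}$ from the following proposal distribution.
(a) Sample $\pi^{(i)} \mid K^{(i-1)} \sim B(\pi_a + K^{(i-1)} - 1, \pi_b + T - K^{(i-1)})$ as in the non-hierarchical model.
(b) Sample $\underline{H}^{(i)} \mid \{\beta_k^{*(i-1)}, \sigma_k^{*(i-1)}\}_{k=1}^K \sim W(A_1, a_1)$.
(c) Sample $\underline{\beta}^{(i)} \mid \underline{H}^{(i)}, \{\beta_k^{*(i-1)}, \sigma_k^{*(i-1)}\}_{k=1}^K \sim N\left(m_1, (\tau_1\underline{H}^{(i)})^{-1}\right)$.
(d) Sample $\underline{\chi}^{(i)} \mid \underline{\nu}^{(i-1)}, \{\sigma_k^{*(i-1)}\}_{k=1}^K \sim G(d_1/2, c_1/2)$.
(e) Sample $\underline{\nu}^{(i)} \mid \underline{\nu}^{(i-1)} \sim G(\zeta/\underline{\nu}^{(i-1)}, \zeta)$.
with
$$
\begin{aligned}
m_1 &= \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{k=1}^K \sigma_k^{*-2}\beta_k^*\right), \qquad \tau_1 = \tau_0 + \sum_{k=1}^K \sigma_k^{*-2},\\
A_1 &= \left(A_0^{-1} + \sum_{k=1}^K \sigma_k^{*-2}\beta_k^*\beta_k^{*\prime} + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1}, \qquad a_1 = a_0 + K,\\
d_1 &= d_0 + \sum_{k=1}^K \sigma_k^{*-2}, \qquad c_1 = c_0 + K\underline{\nu}^{(i-1)}.
\end{aligned}
$$
Accept the whole set $\Psi^{(i)} = (\pi^{(i)}, \underline{\beta}^{(i)}, \underline{H}^{(i)}, \underline{\chi}^{(i)}, \underline{\nu}^{(i)})$ with probability
$$\min\left\{1,\ \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})} \cdot \frac{p(Y_{1,T} \mid \Psi^{(i)})}{p(Y_{1,T} \mid \Psi^{(i-1)})} \cdot \frac{p_{prop}(\Psi^{(i-1)})}{p_{prop}(\Psi^{(i)})}\right\},$$
where $p(\Psi)$ is the prior density and $p_{prop}(\Psi)$ is the proposal density.
2. Sample $\{s_t, \beta_t, \sigma_t\}_{t=1}^T \mid \Psi$ as in the non-hierarchical structural break model.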
2.7.2 Hierarchical SB-V Model
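The predictive likelihood is computed as
$$p(y_t \mid d_t, Y_{1,t-1}, \beta) \propto \left(1 + \frac{(y_t - x_t'\beta)^2}{\bar{\chi}}\right)^{-\frac{\bar{\nu}+1}{2}},$$
or
$$y_t \mid d_t, Y_{1,t-1}, \beta \sim t\left(x_t'\beta, \frac{\bar{\chi}}{\bar{\nu}}, \bar{\nu}\right)$$
with mean $x_t'\beta$ and variance $\bar{\chi}/(\bar{\nu} - 2)$, where $d_t$ is the duration of the current regime and
$$\bar{\chi} = \underline{\chi} + E_{t-d_t+1,t-1}'E_{t-d_t+1,t-1}, \qquad \bar{\nu} = \underline{\nu} + d_t - 1.$$
$E_{t-d_t+1,t-1} = (e_{t-d_t+1}, \ldots, e_{t-1})'$ is the residual vector with $e_t = y_t - x_t'\beta$. The posterior sampling scheme is the following: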
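1. Sample $\pi^{(i)}, \beta^{(i)}, \underline{\chi}^{(i)}, \underline{\nu}^{(i)} \mid Y_{1,T}$ from the following proposal distribution.
(a) Sample $\pi^{(i)} \mid K^{(i-1)} \sim B(\pi_a + K^{(i-1)} - 1, \pi_b + T - K^{(i-1)})$ as in the non-hierarchical model.
(b) Sample $\beta^{(i)} \mid \{\sigma_k^{*(i-1)}\}_{k=1}^K, S_{1,T} \sim N(\bar{\beta}, \bar{H}^{-1})$.
(c) Sample $\underline{\chi}^{(i)} \mid \underline{\nu}^{(i-1)}, \{\sigma_k^{*(i-1)}\}_{k=1}^K \sim G(d_1/2, c_1/2)$.
(d) Sample $\underline{\nu}^{(i)} \mid \underline{\nu}^{(i-1)} \sim G(\zeta/\underline{\nu}^{(i-1)}, \zeta)$.
with
$$
\begin{aligned}
\bar{\beta} &= \bar{H}^{-1}\left(\underline{H}\,\underline{\beta} + \sum_{t=1}^T \frac{x_t y_t}{\sigma_t^2}\right), \qquad \bar{H} = \underline{H} + \sum_{t=1}^T \frac{x_t x_t'}{\sigma_t^2},\\
d_1 &= d_0 + \sum_{k=1}^K \sigma_k^{*-2}, \qquad c_1 = c_0 + K\underline{\nu}^{(i-1)}.
\end{aligned}
$$
Accept the whole set $\Psi^{(i)} = (\pi^{(i)}, \beta^{(i)}, \underline{\chi}^{(i)}, \underline{\nu}^{(i)})$ with probability
$$\min\left\{1,\ \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})} \cdot \frac{p(Y_{1,T} \mid \Psi^{(i)})}{p(Y_{1,T} \mid \Psi^{(i-1)})} \cdot \frac{p_{prop}(\Psi^{(i-1)})}{p_{prop}(\Psi^{(i)})}\right\},$$
where $p(\Psi)$ is the prior density and $p_{prop}(\Psi)$ is the proposal density.
2. Sample $\{s_t, \sigma_t\}_{t=1}^T \mid \Psi$ similarly to the non-hierarchical structural break model.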
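The SB-V predictive density above is a Student-t whose degrees of freedom grow with the number of earlier observations in the current regime. A minimal sketch, assuming the residuals $e_s = y_s - x_s'\beta$ of the $d_t - 1$ earlier periods of the current regime have already been collected (an empty list corresponds to $d_t = 1$); the function name is illustrative.

```python
import math
import numpy as np

def sbv_predictive_logpdf(y_t, x_t, beta, resid_regime, chi0, nu0):
    """log p(y_t | d_t, Y_{1,t-1}, beta) in the SB-V model.

    resid_regime holds e_s = y_s - x_s' beta for the d_t - 1 earlier periods
    of the current regime; chi0 and nu0 are the hierarchical chi and nu.
    """
    e = np.asarray(resid_regime, float)
    chi_bar = chi0 + e @ e
    nu_bar = nu0 + len(e)
    loc = float(np.dot(x_t, beta))
    # Student-t with location loc, squared scale chi_bar / nu_bar, nu_bar degrees of freedom.
    return (math.lgamma((nu_bar + 1.0) / 2.0) - math.lgamma(nu_bar / 2.0)
            - 0.5 * math.log(math.pi * chi_bar)
            - 0.5 * (nu_bar + 1.0) * math.log1p((y_t - loc) ** 2 / chi_bar))
```

With $\underline{\chi} = 1$, $\underline{\nu} = 2$ and a new regime, the density at the mean is exactly 0.5.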
2.7.3 Hierarchical SB-LS Model
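The predictive likelihood of $y_t \mid d_t, Y_{1,t-1}, \sigma$ is
$$y_t \mid d_t, Y_{1,t-1}, \sigma \sim N\left(x_t'\bar{\beta},\ x_t'\bar{H}^{-1}x_t + \sigma^2\right),$$
where $\bar{\beta} = \bar{H}^{-1}\left(\underline{H}\,\underline{\beta} + \sigma^{-2}X_{t-d_t+1,t-1}'Y_{t-d_t+1,t-1}\right)$ and $\bar{H} = \underline{H} + \sigma^{-2}X_{t-d_t+1,t-1}'X_{t-d_t+1,t-1}$.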
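The posterior sampler is:
1. Sample $\pi^{(i)}, \underline{\beta}^{(i)}, \underline{H}^{(i)}, \sigma^{(i)} \mid Y_{1,T}$ from the following proposal distribution.
(a) Sample $\pi^{(i)} \mid K^{(i-1)} \sim B(\pi_a + K^{(i-1)} - 1, \pi_b + T - K^{(i-1)})$ as in the non-hierarchical model.
(b) Sample $\underline{H}^{(i)} \mid \{\beta_k^{*(i-1)}\}_{k=1}^K \sim W(A_1, a_1)$.
(c) Sample $\underline{\beta}^{(i)} \mid \underline{H}^{(i)}, \{\beta_k^{*(i-1)}\}_{k=1}^K \sim N\left(m_1, (\tau_1\underline{H}^{(i)})^{-1}\right)$.
(d) Sample $\sigma^{-2(i)} \mid \{\beta_k^{*(i-1)}\}_{k=1}^K, S_{1,T} \sim G(\chi_1/2, \nu_1/2)$.
with
$$
\begin{aligned}
m_1 &= \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{k=1}^K \beta_k^*\right), \qquad \tau_1 = \tau_0 + K,\\
A_1 &= \left(A_0^{-1} + \sum_{k=1}^K \beta_k^*\beta_k^{*\prime} + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1}, \qquad a_1 = a_0 + K,\\
\chi_1 &= \chi_0 + \sum_{t=1}^T (y_t - x_t'\beta_t)^2, \qquad \nu_1 = \nu_0 + T.
\end{aligned}
$$
Accept the whole set $\Psi^{(i)} = (\pi^{(i)}, \underline{\beta}^{(i)}, \underline{H}^{(i)}, \sigma^{(i)})$ with probability
$$\min\left\{1,\ \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})} \cdot \frac{p(Y_{1,T} \mid \Psi^{(i)})}{p(Y_{1,T} \mid \Psi^{(i-1)})} \cdot \frac{p_{prop}(\Psi^{(i-1)})}{p_{prop}(\Psi^{(i)})}\right\},$$
where $p(\Psi)$ is the prior density and $p_{prop}(\Psi)$ is the proposal density.
2. Sample $\{s_t, \beta_t\}_{t=1}^T \mid \Psi$ similarly to the non-hierarchical structural break model.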
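Similarly, the SB-LS predictive density is Normal once $\sigma$ is given. The sketch computes $\bar{\beta}$, $\bar{H}$ and the predictive mean and variance from the earlier observations of the current regime; the function and argument names are illustrative.

```python
import numpy as np

def sbls_predictive(x_t, X_regime, Y_regime, beta0, H0, sigma2):
    """Mean and variance of y_t | d_t, Y_{1,t-1}, sigma in the SB-LS model.

    X_regime, Y_regime hold the d_t - 1 earlier observations of the current
    regime (empty when d_t = 1); beta0 and H0 are the hierarchical mean and precision.
    """
    x_t = np.asarray(x_t, float)
    X = np.asarray(X_regime, float).reshape(-1, x_t.size)
    Y = np.asarray(Y_regime, float)
    H0 = np.asarray(H0, float)
    H_bar = H0 + X.T @ X / sigma2
    beta_bar = np.linalg.solve(H_bar, H0 @ np.asarray(beta0, float) + X.T @ Y / sigma2)
    mean = float(x_t @ beta_bar)
    var = float(x_t @ np.linalg.solve(H_bar, x_t)) + sigma2
    return mean, var
```

When $d_t = 1$ the regime data are empty and the predictive reduces to $N(x_t'\underline{\beta}, x_t'\underline{H}^{-1}x_t + \sigma^2)$.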
2.7.4 Hierarchical DDSB-LSV Model
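1. Sample $\lambda^{(i)}, \beta^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)} \mid Y_T$ from the following proposal distribution.
(a) Sample $\lambda^{(i)}$ from a random walk proposal distribution.
(b) Sample $H^{(i)} \mid \{\beta_k^{(i-1)}, \sigma_k^{(i-1)}\}_{k=1}^{K} \sim \mathcal{W}(A_1, a_1)$
(c) Sample $\beta^{(i)} \mid H^{(i)}, \{\beta_k^{(i-1)}, \sigma_k^{(i-1)}\}_{k=1}^{K} \sim N\left(m_1, (\tau_1 H^{(i)})^{-1}\right)$
(d) Sample $\chi^{(i)} \mid \nu^{(i-1)}, \{\sigma_k^{(i-1)}\}_{k=1}^{K} \sim \mathcal{G}(d_1/2, c_1/2)$
(e) Sample $\nu^{(i)} \mid \nu^{(i-1)} \sim \mathcal{G}(\zeta/\nu^{(i-1)}, \zeta)$
with
$$m_1 = \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{i=1}^{K}\sigma_i^{-2}\beta_i\right), \qquad \tau_1 = \tau_0 + \sum_{i=1}^{K}\sigma_i^{-2},$$
$$A_1 = \left(A_0^{-1} + \sum_{i=1}^{K}\sigma_i^{-2}\beta_i\beta_i' + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1}, \qquad a_1 = a_0 + K,$$
$$d_1 = d_0 + \sum_{i=1}^{K}\sigma_i^{-2}, \qquad c_1 = c_0 + K\nu^{(i-1)}.$$
Accept the whole set $\Psi^{(i)} = (\lambda^{(i)}, \beta^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)})$ with probability
$$\min\left\{1,\ \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})}\cdot\frac{p(Y_T \mid \Psi^{(i)})}{p(Y_T \mid \Psi^{(i-1)})}\cdot\frac{p_{\mathrm{prop}}(\Psi^{(i-1)})}{p_{\mathrm{prop}}(\Psi^{(i)})}\right\}$$
where $p(\Psi)$ is the prior density and $p_{\mathrm{prop}}(\Psi)$ is the proposal density.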
2. Sample $\{s_t, \beta_t, \sigma_t\}_{t=1}^{T} \mid \Psi$ as in the non-hierarchical structural break model.
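The acceptance step is usually computed on the log scale, because the marginal likelihoods $p(Y_T \mid \Psi)$ are far too small to ratio directly. The sketch below is a generic Metropolis-Hastings step, not the author's code, and its helper names are hypothetical. Reading the proposal in step (e) as a gamma with shape $\zeta$ and mean $\nu^{(i-1)}$, its density depends on the current value, so under the standard Metropolis-Hastings form the $\nu$ term is evaluated in both directions, $q(\nu^{(i-1)} \mid \nu^{(i)})$ and $q(\nu^{(i)} \mid \nu^{(i-1)})$.

```python
import math
import random


def log_gamma_pdf(x, rate, shape):
    """Log density of a gamma distribution with the given rate and shape."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1.0) * math.log(x) - rate * x


def log_q_nu(nu_to, nu_from, zeta):
    """Proposal nu_to | nu_from ~ G(zeta / nu_from, zeta): rate zeta / nu_from, shape zeta, mean nu_from."""
    return log_gamma_pdf(nu_to, zeta / nu_from, zeta)


def log_accept_prob(log_target_new, log_target_old, log_q_old_given_new, log_q_new_given_old):
    """log min{1, target ratio * reverse / forward proposal ratio}.

    log_target = log prior density + log marginal likelihood p(Y_T | Psi).
    """
    log_r = (log_target_new - log_target_old) + (log_q_old_given_new - log_q_new_given_old)
    return min(0.0, log_r)


def accept(log_alpha, rng):
    """Accept with probability exp(log_alpha); 1 - U lies in (0, 1], so the log is finite."""
    return math.log(1.0 - rng.random()) < log_alpha
```

A proposal $\nu^{(i)}$ can be generated with `random.Random.gammavariate(zeta, nu_old / zeta)`, whose second argument is a scale, so the draw has mean `nu_old`.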
2.7.5 Tables
Table 2.1: Summary statistics of Canada inflation

| Statistic | Value |
|---|---|
| Mean | 1.01 |
| Min | -0.54 |
| Max | 3.12 |
| Variance | 0.69 |
| Skewness | 0.83 |
| Excess kurtosis | 0.09 |

Canada quarterly inflation rate from 1961Q1-2009Q4. There are 196 observations in total. The data is scaled by 100 to represent quarterly percentage change. Data Sources: IMDB (Integrated Meta Data Base), TABLE NUMBER: 3800003. Numbers: 1901
Table 2.2: Posterior summary of the hierarchical SB-LSV model

| Parameter | Prior mean | Prior 0.95 DI | Posterior mean | Posterior Sd | Posterior 0.95 DI |
|---|---|---|---|---|---|
| $\pi$ | 0.1 | (0.003, 0.34) | 0.04 | 0.02 | (0.01, 0.07) |
| $\beta_0$ | 0.0 | (-3.08, 3.08) | 0.82 | 0.20 | (0.45, 1.23) |
| $\beta_1$ | 0.0 | (-3.08, 3.08) | -0.05 | 0.17 | (-0.40, 0.29) |
| $H_{00}$ | 1.0 | (0.16, 2.52) | 1.01 | 0.45 | (0.35, 2.08) |
| $H_{01}$ | 0.0 | (-0.91, 0.91) | -0.01 | 0.37 | (-0.77, 0.70) |
| $H_{11}$ | 1.0 | (0.16, 2.52) | 1.30 | 0.58 | (0.45, 2.68) |
| $\chi$ | 1.0 | (0.12, 2.79) | 1.01 | 0.37 | (0.43, 1.87) |
| $\nu$ | 2.0 | (0.05, 7.38) | 6.04 | 2.27 | (2.45, 11.1) |

Canada quarterly inflation rate from 1961Q1-2009Q4. There are 196 observations in total. The data is scaled by 100 to represent quarterly percentage change. Data Sources: IMDB (Integrated Meta Data Base), TABLE NUMBER: 3800003. Numbers: 1901
Table 2.3: Posterior summary of the hierarchical DDSB-LSV model

| Parameter | Prior mean | Prior 0.95 DI | Posterior mean | Posterior Sd | Posterior 0.95 DI |
|---|---|---|---|---|---|
| $\lambda$ | 50.0 | (1.27, 184.4) | 28.9 | 7.32 | (14.70, 44.22) |
| $\beta_0$ | 0.0 | (-3.08, 3.08) | 0.74 | 0.16 | (0.44, 1.08) |
| $\beta_1$ | 0.0 | (-3.08, 3.08) | -0.03 | 0.15 | (-0.33, 0.28) |
| $H_{00}$ | 1.0 | (0.16, 2.52) | 1.00 | 0.35 | (0.43, 1.80) |
| $H_{01}$ | 0.0 | (-0.91, 0.91) | -0.06 | 0.28 | (-0.61, 0.48) |
| $H_{11}$ | 1.0 | (0.16, 2.52) | 1.19 | 0.39 | (0.56, 2.11) |
| $\chi$ | 1.0 | (0.12, 2.79) | 3.86 | 1.76 | (1.22, 8.26) |
| $\nu$ | 2.0 | (0.05, 7.38) | 26.1 | 12.2 | (8.03, 56.7) |

Canada quarterly inflation rate from 1961Q1-2009Q4. There are 196 observations in total. The data is scaled by 100 to represent quarterly percentage change. Data Sources: IMDB (Integrated Meta Data Base), TABLE NUMBER: 3800003. Numbers: 1901
Table 2.4: Posterior summary of the hierarchical SB-V model

| Parameter | Prior mean | Prior 0.95 DI | Posterior mean | Posterior Sd | Posterior 0.95 DI |
|---|---|---|---|---|---|
| $\pi$ | 0.1 | (0.003, 0.34) | 0.16 | 0.07 | (0.05, 0.32) |
| $\beta_0$ | 0.0 | (-1.96, 1.96) | 0.16 | 0.05 | (0.09, 0.28) |
| $\beta_1$ | 0.0 | (-1.96, 1.96) | -0.05 | 0.17 | (-0.40, 0.29) |
| $\chi$ | 1.0 | (0.12, 2.79) | 0.96 | 0.42 | (0.38, 1.99) |
| $\nu$ | 2.0 | (0.05, 7.38) | 4.65 | 1.59 | (2.22, 8.42) |

Canada quarterly inflation rate from 1961Q1-2009Q4. There are 196 observations in total. The data is scaled by 100 to represent quarterly percentage change. Data Sources: IMDB (Integrated Meta Data Base), TABLE NUMBER: 3800003. Numbers: 1901
Table 2.5: Posterior summary of the hierarchical SB-LS model

| Parameter | Prior mean | Prior 0.95 DI | Posterior mean | Posterior Sd | Posterior 0.95 DI |
|---|---|---|---|---|---|
| $\pi$ | 0.1 | (0.003, 0.34) | 0.03 | 0.01 | (0.01, 0.07) |
| $\sigma^2$ | - | (0.14, 19.7) | 0.17 | 0.02 | (0.14, 0.21) |
| $\beta_0$ | 0.0 | (-3.08, 3.08) | 0.53 | 0.36 | (-0.16, 1.21) |
| $\beta_1$ | 0.0 | (-3.08, 3.08) | -0.27 | 0.32 | (-0.93, 0.33) |
| $H_{00}$ | 1.0 | (0.16, 2.52) | 1.63 | 0.67 | (0.62, 3.22) |
| $H_{01}$ | 0.0 | (-0.91, 0.91) | 0.12 | 0.53 | (-0.98, 1.14) |
| $H_{11}$ | 1.0 | (0.16, 2.52) | 2.10 | 0.89 | (0.76, 4.42) |

Canada quarterly inflation rate from 1961Q1-2009Q4. There are 196 observations in total. The data is scaled by 100 to represent quarterly percentage change. Data Sources: IMDB (Integrated Meta Data Base), TABLE NUMBER: 3800003. Numbers: 1901
Table 2.6: Log marginal likelihoods

| Model | Log marginal likelihood |
|---|---|
| Hierarchical SB-LSV | -122.5 |
| Hierarchical SB-V | -159.8 |
| Hierarchical SB-LS | -140.4 |
| Hierarchical DDSB-LSV | -155.6 |
| Non-hierarchical SB-LSV | -158.4 |
| Non-hierarchical SB-LSV with $\pi = 0.01$ | -156.6 |
| AR(1) | -160.8 |
| AR(2) | -144.2 |
| AR(3) | -144.7 |

Canada quarterly inflation rate from 1961Q1-2009Q4. There are 196 observations in total. The data is scaled by 100 to represent quarterly percentage change. Data Sources: IMDB (Integrated Meta Data Base), TABLE NUMBER: 3800003. Numbers: 1901
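The log marginal likelihoods in Table 2.6 translate directly into log Bayes factors, $\log BF_{AB} = \log p(Y_T \mid A) - \log p(Y_T \mid B)$. The short Python snippet below (names are illustrative) computes the log Bayes factor of the highest-ranked model against each alternative, using the values copied from the table.

```python
# Log marginal likelihoods copied from Table 2.6.
log_ml = {
    "Hierarchical SB-LSV": -122.5,
    "Hierarchical SB-V": -159.8,
    "Hierarchical SB-LS": -140.4,
    "Hierarchical DDSB-LSV": -155.6,
    "Non-hierarchical SB-LSV": -158.4,
    "Non-hierarchical SB-LSV with pi = 0.01": -156.6,
    "AR(1)": -160.8,
    "AR(2)": -144.2,
    "AR(3)": -144.7,
}


def log_bayes_factors(log_ml):
    """Return the best model and log BF(best vs. m) for every model m."""
    best = max(log_ml, key=log_ml.get)
    return best, {m: log_ml[best] - v for m, v in log_ml.items()}


best, lbf = log_bayes_factors(log_ml)
for model, value in sorted(lbf.items(), key=lambda kv: kv[1]):
    print(f"{model:40s} {value:6.1f}")
```

For instance, the log Bayes factor of the hierarchical SB-LSV model against AR(2), the best of the constant-parameter benchmarks, is $-122.5 - (-144.2) = 21.7$.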
2.7.6 Figures
[Figure: panels show the inflation rate, break probability, intercept, persistence and standard deviation over time.]
Figure 2.1: Posterior mean of the regression coefficients, the standard deviations and the break probabilities from the hierarchical SB-LSV model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.
[Figure: panels show the inflation rate, break probability, intercept, persistence and standard deviation over time.]
Figure 2.2: Posterior mean of the regression coefficients, the standard deviations and the break probabilities from the non-hierarchical SB-LSV model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.
[Figure: panels show the inflation rate, break probability, intercept, persistence and standard deviation over time.]
Figure 2.3: Posterior mean of the regression coefficients, the standard deviations and the break probabilities from the hierarchical DDSB-LSV model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.
[Figure: panels show the inflation rate, break probability and standard deviation over time.]
Figure 2.4: Posterior mean of the standard deviations and the break probabilities from the hierarchical SB-V model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.
[Figure: panels show the inflation rate, break probability, intercept and persistence over time.]
Figure 2.5: Posterior mean of the regression coefficients and the break probabilities from the hierarchical SB-LS model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.
Chapter 3
Modeling Regime Switching and Structural Breaks with an Infinite Dimension Markov Switching Model
3.1 Introduction
This chapter contributes to the current literature by accommodating regime switching and structural break dynamics in a unified framework. Current regime switching models are not suitable for capturing instability of dynamics because they assume a finite number of states and that the future is like the past. Structural break models allow the dynamics to change over time; however, they may lose estimation precision because past states cannot recur and the parameters in each state are estimated separately. An infinite dimension Markov switching model is proposed to accommodate both types of model and to provide much richer dynamics. I show how to globally identify structural breaks versus regime switching. In applications to U.S. real interest rates and inflation, the new model performs better than the alternative parametric regime switching models and the structural break models in terms of in-sample fit and out-of-sample forecasts. Model estimation and forecasting are carried out in a Bayesian framework.
Regime switching models were first applied by Hamilton (1989b) to U.S. GNP data. Regime switching is an important methodology for modelling nonlinear dynamics and is widely applied to economic data, including business cycles (Hamilton, 1989b), bull and bear markets (Maheu et al., 2010), interest rates (Ang and Bekaert, 2002a) and inflation (Evans and Wachtel, 1993). Geweke and Amisano (2011) use a hierarchical mixture structure to capture the dynamics of financial asset returns. These models share two features. First, past states can recur over time. Second, the number of states is finite (usually 2 and at most 4). In the rest of this chapter, a regime switching model is assumed to have both features. However, the second feature may bias out-of-sample forecasts if the dynamics change suddenly.
In contrast to regime switching models, structural break models can capture dynamic instability by assuming an infinite or a much larger number of states, at the cost of extra restrictions. For example, Koop and Potter (2007) proposed a structural break model with an infinite number of states. If there is a change in the data dynamics, it will be captured by a new state. The restriction in their model is that the parameters in a new state are different from those in the previous ones. This condition is imposed for estimation tractability. However, it prevents data segments separated by break points from sharing the same model parameters, which could reduce estimation precision. In the current literature, structural break models such as Chib (1998), Wang and Zivot (2000), Pesaran et al. (2006) and Maheu and Gordon (2008) share this feature with Koop and Potter (2007), namely, that states cannot recur. In the rest of this chapter, a structural break model is assumed to have non-recurring states and an infinite or a large number of states.
As we can see, regime switching and structural break dynamics have different implications for data fitting and forecasting. What is missing in the current literature is a method to reconcile them. In practice, applications to specific problems use one approach or the other. Levin and Piger (2004) modelled U.S. inflation as a structural break process, while Evans and Wachtel (1993) assumed a two-regime Markov switching model. Which feature is more important for inflation analysis: regime switching, structural breaks or both? Garcia and Perron (1996a) used a three-regime Markov switching model for U.S. real interest rates, while Wang and Zivot (2000) applied a model with structural breaks in mean and volatility. In 1981, did the real interest rate enter a state with distinct dynamics, or return to a historical state with the same dynamics? Existing econometric models have difficulty answering these questions.
This chapter provides a solution by proposing an infinite dimension Markov switching model, which incorporates regime switching and structural break dynamics in a unified framework. Recurring states are allowed, to improve estimation and forecasting precision. An unknown number of states is embedded in the infinite dimension structure and estimated endogenously to capture dynamic instability. Unlike Bayesian model averaging, which forms a weighted average of models, this model combines different dynamics nonlinearly.
The proposed model builds on and extends Fox et al. (2008). They used a Dirichlet process¹ as a prior on the transition probabilities of an infinite hidden Markov switching model. The key innovation in their work is a sticky parameter that favours state persistence and avoids the saturation of states. Their model is denoted by FSJW in the rest of this chapter. Jochmann (2010) applies FSJW to investigate structural breaks in U.S. inflation dynamics.
The contributions of this chapter are as follows. First, in addition to the hierarchical structure of FSJW, a second hierarchical structure is introduced to allow learning and sharing of information about the parameters of the conditional data density across states. This approach is labelled the sticky double hierarchical Dirichlet process hidden Markov model (SDHDP-HMM). Second, I present an algorithm to globally identify structural breaks versus regime switching dynamics.² This is done by avoiding the label switching problem and focusing on label invariant posterior statistics. Lastly, this chapter provides a detailed comparison of the new SDHDP-HMM against existing alternative regime switching and structural change models by out-of-sample density forecasting, through a simulation study and two empirical applications to U.S. real interest rates and inflation. The results show that the SDHDP-HMM is robust to model uncertainty and superior in forecasting, and that the hierarchical structure on the conditional data density parameters improves out-of-sample performance significantly.
In the application to U.S. real interest rates, the SDHDP-HMM is compared to the regime switching model of Garcia and Perron (1996a) in a Bayesian framework and to the structural break model of Wang and Zivot (2000) with minor modifications. The results of the SDHDP-HMM support Garcia and Perron's (1996a) finding that the switching points occurred at the beginning of 1973 (the oil crisis) and the middle of 1981 (the federal budget deficit), rather than Huizinga and Mishkin's (1986) finding of October 1979 and October 1982 (both monetary policy changes). The SDHDP-HMM also identifies two of the three turning points found by Wang and Zivot (2000). The model comparison based on the predictive likelihood shows that regime switching dynamics dominate structural break dynamics for U.S. real interest rates.

¹ The Dirichlet process is a commonly used prior in Bayesian nonparametric models.
² Jochmann (2010) proposes to identify structural breaks, but ignores recurring states in the posterior inference.
The second application is to U.S. inflation. The SDHDP-HMM is compared to the regime switching model of Evans and Wachtel (1993) in a Bayesian framework and to the structural break model of Chib (1998). This application shows that inflation has features of both regime switching and structural breaks. The SDHDP-HMM can capture both features and provide richer dynamics than existing parametric models. The predictive likelihoods further confirm that it is robust to model uncertainty and superior in forecasting.
The rest of this chapter is organized as follows. Section 3.2 introduces the Dirichlet process to make this chapter self-contained. Section 3.3 outlines the sticky double hierarchical Dirichlet process hidden Markov model and discusses its model structure and implications. Section 3.4 sketches the posterior sampling algorithm, explains how to identify the regime switching and the structural break dynamics, and describes the forecasting method. Section 3.5 compares the SDHDP-HMM to regime switching and structural break models through simulation. Section 3.6 studies the dynamics of U.S. real interest rates by revisiting the Markov switching model of Garcia and Perron (1996a) in the Bayesian framework and the structural break model of Wang and Zivot (2000) with minor modifications, and comparing them to the SDHDP-HMM using an extended data set. Section 3.7 applies the SDHDP-HMM to U.S. inflation, and compares it to Evans and Wachtel's (1993) Markov switching model in a Bayesian framework, Chib's (1998) structural break model and Fox et al.'s (2008) model. Section 3.8 concludes.
3.2 Dirichlet process
Before introducing the Dirichlet process, we first define the Dirichlet distribution:
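Definition. The Dirichlet distribution is denoted by $\mathrm{Dir}(\alpha)$, where $\alpha$ is a $K$-dimensional vector of positive values. Each sample $x$ from $\mathrm{Dir}(\alpha)$ is a $K$-dimensional vector with $x_i \in (0, 1)$ and $\sum_{i=1}^{K}x_i = 1$. The probability density function is:
$$p(x \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{K}\alpha_i\right)}{\prod_{i=1}^{K}\Gamma(\alpha_i)}\prod_{i=1}^{K}x_i^{\alpha_i - 1}.$$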
A special case is the Beta distribution denoted by B(α1, α2), which is a Dirichlet
distribution with K = 2.
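Define $\alpha_0 = \sum_{i=1}^{K}\alpha_i$. Then $X_i$, the $i$th element of a random vector $X \sim \mathrm{Dir}(\alpha)$, has mean $\frac{\alpha_i}{\alpha_0}$ and variance $\frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)}$. Hence, we can further decompose $\alpha$ into two parts: a shape parameter $G_0 = \left(\frac{\alpha_1}{\alpha_0}, \ldots, \frac{\alpha_K}{\alpha_0}\right)$ and a concentration parameter $\alpha_0$. The shape parameter $G_0$ represents the centre of the random vector $X$, and the concentration parameter $\alpha_0$ controls how close $X$ is to $G_0$: writing $G_{0i} = \alpha_i/\alpha_0$, the variance equals $\frac{G_{0i}(1 - G_{0i})}{\alpha_0 + 1}$, which shrinks as $\alpha_0$ grows.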
The Dirichlet distribution is conjugate to the multinomial distribution in the following sense: if
$$X \sim \mathrm{Dir}(\alpha), \qquad \beta = (n_1, \ldots, n_K) \mid X \sim \mathrm{Mult}(X),$$
where $n_i$ is the number of occurrences of $i$ in a sample of $n = \sum_{i=1}^{K}n_i$ points from the discrete distribution on $\{1, \ldots, K\}$ defined by $X$, then
$$X \mid \beta = (n_1, \ldots, n_K) \sim \mathrm{Dir}(\alpha + \beta).$$
This relationship is used in Bayesian statistics to estimate the hidden parameters X,
given a collection of n samples. Intuitively, if the prior is represented as Dir(α), then
Dir(α + β) is the posterior following a sequence of observations with histogram β.
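The moment formulas and the conjugate update can be checked numerically. The NumPy sketch below uses illustrative function names and labels the categories $0, \ldots, K-1$ rather than $1, \ldots, K$.

```python
import numpy as np


def dirichlet_moments(alpha):
    """Mean alpha_i / alpha_0 and variance alpha_i (alpha_0 - alpha_i) / (alpha_0^2 (alpha_0 + 1))."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    return alpha / a0, alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1.0))


def dirichlet_posterior(alpha, draws, K):
    """Dir(alpha) prior plus the histogram (n_1, ..., n_K) of the draws gives Dir(alpha + n)."""
    counts = np.bincount(np.asarray(draws), minlength=K)
    return np.asarray(alpha, dtype=float) + counts


rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 2.0])
mean, var = dirichlet_moments(alpha)
sample = rng.dirichlet(alpha, size=100_000)
print(mean, sample.mean(axis=0))                         # analytic vs Monte Carlo mean
print(var, sample.var(axis=0))                           # analytic vs Monte Carlo variance
print(dirichlet_posterior(alpha, [0, 2, 2, 1, 2], K=3))  # -> [2. 2. 5.]
```

Doubling $\alpha$ with $G_0$ fixed leaves the mean unchanged and multiplies every variance by $(\alpha_0 + 1)/(2\alpha_0 + 1)$, which is the sense in which $\alpha_0$ is a concentration parameter.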
The Dirichlet process was introduced by Ferguson (1973) as the extension of the
Dirichlet distribution from a finite dimension to an infinite dimension. It is a distribution
of distributions and has two parameters: the shape parameter G0 is a distribution over
a sample space Ω, and the concentration parameter α0 is a positive scalar. They have
similar interpretations as their counterparts in the Dirichlet distribution. The formal
definition is the following:
Definition. The Dirichlet process over a set $\Omega$ is a stochastic process whose sample path is a probability distribution over $\Omega$. For a random distribution $F$ distributed according to a Dirichlet process $\mathrm{DP}(\alpha_0, G_0)$, given any finite measurable partition $A_1, A_2, \ldots, A_K$ of the sample space $\Omega$, the random vector $(F(A_1), \ldots, F(A_K))$ is distributed as a Dirichlet distribution with parameters $(\alpha_0 G_0(A_1), \ldots, \alpha_0 G_0(A_K))$.
Using the results for the Dirichlet distribution, for any measurable set $A$ the random variable $F(A)$ has mean $G_0(A)$ and variance $\frac{G_0(A)(1 - G_0(A))}{\alpha_0 + 1}$. The mean implies that the shape parameter $G_0$ represents the centre of a random distribution $F$ drawn from a Dirichlet process $\mathrm{DP}(\alpha_0, G_0)$. Let $a_i \sim F$ denote an observation drawn from the distribution $F$. Because by definition $P(a_i \in A \mid F) = F(A)$, we can derive $P(a_i \in A \mid G_0) = E(P(a_i \in A \mid F) \mid G_0) = E(F(A) \mid G_0) = G_0(A)$. Hence, the shape parameter $G_0$ is also the marginal distribution of an observation $a_i$. The variance implies that the concentration parameter $\alpha_0$ controls how close the random distribution $F$ is to the shape parameter $G_0$: the larger $\alpha_0$ is, the more likely $F$ is to be close to $G_0$, and vice versa.
Suppose there are $n$ observations, $a = (a_1, \ldots, a_n)$, drawn from the distribution $F$. Use $\sum_{i=1}^{n}\delta_{a_i}(A_j)$ to represent the number of $a_i$ in set $A_j$, where $A_1, \ldots, A_K$ is a measurable partition of the sample space $\Omega$ and $\delta_{a_i}(A_j)$ is the Dirac measure:
$$\delta_{a_i}(A_j) = \begin{cases} 1 & \text{if } a_i \in A_j \\ 0 & \text{if } a_i \notin A_j. \end{cases}$$
Conditional on $(F(A_1), \ldots, F(A_K))$, the vector $\left(\sum_{i=1}^{n}\delta_{a_i}(A_1), \ldots, \sum_{i=1}^{n}\delta_{a_i}(A_K)\right)$ has a multinomial distribution. By the conjugacy of the Dirichlet distribution to the multinomial distribution, the posterior distribution of $(F(A_1), \ldots, F(A_K))$ is still a Dirichlet distribution:
$$(F(A_1), \ldots, F(A_K)) \mid a \sim \mathrm{Dir}\left(\alpha_0 G_0(A_1) + \sum_{i=1}^{n}\delta_{a_i}(A_1), \ldots, \alpha_0 G_0(A_K) + \sum_{i=1}^{n}\delta_{a_i}(A_K)\right)$$
Because this result is valid for any finite measurable partition, the posterior of $F$ is still a Dirichlet process by definition, with new parameters $\alpha_0^*$ and $G_0^*$, where
$$\alpha_0^* = \alpha_0 + n, \qquad G_0^* = \frac{\alpha_0}{\alpha_0 + n}G_0 + \frac{n}{\alpha_0 + n}\sum_{i=1}^{n}\frac{\delta_{a_i}}{n}.$$
The posterior shape parameter $G_0^*$ is a mixture of the prior shape $G_0$ and the empirical distribution of the observations. As $n \to \infty$, the posterior shape parameter converges to the empirical distribution. Because the posterior concentration parameter $\alpha_0^* \to \infty$ at the same time, the posterior of $F$ converges to the empirical distribution with probability one. Ferguson (1973) showed that a random distribution drawn from a Dirichlet process is almost surely discrete, even if the shape parameter $G_0$ is continuous. Thus, the Dirichlet process can model continuous distributions only approximately.
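The mixture form of $G_0^*$ gives a simple way to draw the next observation given $a_1, \ldots, a_n$ (the Blackwell-MacQueen Pólya urn): with probability $\alpha_0/(\alpha_0 + n)$ draw a fresh value from $G_0$, otherwise copy one of the existing observations uniformly at random. The sketch below uses hypothetical function names and a standard normal $G_0$ chosen only for illustration; it also makes the discreteness result visible, because repeated values appear even though $G_0$ is continuous.

```python
import numpy as np


def draw_next(a, alpha0, base_sampler, rng):
    """Draw a_{n+1} | a_1..a_n from G0* = alpha0/(alpha0+n) G0 + 1/(alpha0+n) sum_i delta_{a_i}."""
    n = len(a)
    if rng.random() < alpha0 / (alpha0 + n):
        return base_sampler(rng)       # new value from the shape G0
    return a[rng.integers(n)]          # copy an existing observation


def polya_urn(n, alpha0, base_sampler, rng):
    """Sequentially draw n observations with the joint law of a_i ~ F, F ~ DP(alpha0, G0)."""
    a = []
    for _ in range(n):
        a.append(draw_next(a, alpha0, base_sampler, rng))
    return np.array(a)


rng = np.random.default_rng(0)
draws = polya_urn(500, alpha0=2.0, base_sampler=lambda r: r.standard_normal(), rng=rng)
print(len(draws), len(np.unique(draws)))   # far fewer distinct values than draws
```

A larger $\alpha_0$ sends more draws to $G_0$ and so produces more distinct values, consistent with $F$ being closer to $G_0$.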
For a random distribution $F \sim \mathrm{DP}(\alpha_0, G_0)$, because $F$ is almost surely discrete, it can be represented by two parts: distinct values $\theta_i$ and their corresponding probabilities $p_i$, where $i = 1, 2, \ldots$. Sethuraman (1994) found the stick breaking representation of the Dirichlet process by writing $F \equiv (\theta, p)$, where $\theta \equiv (\theta_1, \theta_2, \ldots)'$ and $p \equiv (p_1, p_2, \ldots)'$ with $p_i > 0$ and $\sum_{i=1}^{\infty}p_i = 1$. A draw $F \sim \mathrm{DP}(\alpha_0, G_0)$ can be generated by
$$V_i \overset{iid}{\sim} \mathrm{Beta}(1, \alpha_0) \tag{1}$$
$$p_i = V_i\prod_{j=1}^{i-1}(1 - V_j) \tag{2}$$
$$\theta_i \overset{iid}{\sim} G_0 \tag{3}$$
where $i = 1, 2, \ldots$. In this representation, $p$ and $\theta$ are generated independently. The process generating $p$, (1) and (2), is called the stick breaking process. The name comes from the way the $p_i$ are generated: for each $i$, a proportion $V_i$ of the remaining probability, $1 - \sum_{j=1}^{i-1}p_j$, is broken off and given to $p_i$, like breaking a stick an infinite number of times. This chapter uses the notation $p \sim \mathrm{SBP}(\alpha_0)$ for this process.
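A truncated version of (1)-(3) is easy to simulate. The sketch below stops after $L$ sticks and sets $V_L = 1$ so that the weights sum to one; this truncation is a common approximation device introduced here only for illustration, and the function name and the choice $G_0 = N(0, 1)$ are hypothetical.

```python
import numpy as np


def stick_breaking(alpha0, L, rng):
    """Truncated SBP(alpha0): p_i = V_i * prod_{j<i} (1 - V_j), i = 1..L."""
    V = rng.beta(1.0, alpha0, size=L)
    V[-1] = 1.0                                                      # last stick takes the rest
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))   # 1 - sum_{j<i} p_j
    return V * remaining


rng = np.random.default_rng(0)
p = stick_breaking(alpha0=2.0, L=50, rng=rng)
theta = rng.standard_normal(50)   # theta_i iid from an illustrative G0 = N(0, 1)
print(p[:5], p.sum())             # weights decay quickly and sum to one
```

Since $E(V_1) = 1/(1 + \alpha_0)$, a small $\alpha_0$ puts most of the mass on the first few sticks, while a large $\alpha_0$ spreads it over many.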
The Dirichlet process was not widely used for continuous random variables until West et al. (1994) and Escobar and West (1995) proposed the Dirichlet process mixture (DPM) model. A simple DPM model assumes the distribution of the random variable $y$ is an infinite mixture of different distributions:
$$p \sim \mathrm{SBP}(\alpha_0) \tag{4}$$
$$\theta_i \overset{iid}{\sim} G_0 \quad \text{for } i = 1, 2, \ldots \tag{5}$$
$$g(y) = \sum_{i=1}^{\infty}p_i f(y \mid \theta_i) \tag{6}$$
where $g(y)$ is the probability density function of $y$ and $f(y \mid \theta_i)$ is some probability density function depending on $\theta_i$. For example, if $f(y \mid \theta_i)$ is the normal density function and $\theta_i$ represents the mean and variance, $y$ is distributed as an infinite mixture of normal distributions. Hence, continuous random variables can be modelled nonparametrically by the DPM model.
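Combining truncated stick-breaking weights with normal kernels gives one random draw of the DPM density $g(y)$ in (6). In the sketch below the truncation level and the $G_0$ used for the component means and standard deviations are arbitrary illustrative choices, not taken from this chapter, and the function names are hypothetical.

```python
import numpy as np


def stick_breaking(alpha0, L, rng):
    """Truncated stick-breaking weights; the last stick takes the remaining mass."""
    V = rng.beta(1.0, alpha0, size=L)
    V[-1] = 1.0
    return V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))


def draw_dpm_density(alpha0, L, rng):
    """Return g(y) = sum_i p_i N(y | mu_i, sd_i^2) for one truncated draw of (4)-(6)."""
    p = stick_breaking(alpha0, L, rng)
    mu = rng.normal(0.0, 2.0, size=L)      # illustrative G0 for the means
    sd = rng.uniform(0.5, 2.0, size=L)     # illustrative G0 for the standard deviations

    def g(y):
        y = np.asarray(y, dtype=float)[..., None]
        kern = np.exp(-0.5 * ((y - mu) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)
        return (p * kern).sum(axis=-1)

    return g


rng = np.random.default_rng(0)
g = draw_dpm_density(alpha0=1.0, L=100, rng=rng)
grid = np.linspace(-30.0, 30.0, 12001)
print((g(grid) * (grid[1] - grid[0])).sum())   # Riemann sum, close to 1
```

Each call returns a different smooth density, which is how the DPM places a prior on continuous distributions despite the discreteness of each Dirichlet process draw.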
3.3 Sticky double hierarchical Dirichlet process hidden Markov model
The DPM model is used for cross-sectional data in West et al. (1994), Escobar and West (1995) and Shahbaba and Neal (2009) because the observations are exchangeable. However, it is not appropriate for time series modelling because it lacks state persistence. This chapter extends the work of Fox et al. (2008) to propose the sticky double hierarchical Dirichlet process hidden Markov model as follows:
$$\pi_0 \sim \mathrm{SBP}(\gamma) \tag{7}$$
$$\pi_i \mid \pi_0 \sim \mathrm{DP}\left(c, (1-\rho)\pi_0 + \rho\delta_i\right) \tag{8}$$
$$\lambda \sim G \tag{9}$$
$$\theta_i \overset{iid}{\sim} G_0(\lambda) \tag{10}$$
$$s_t \mid s_{t-1} = i \sim \pi_i \tag{11}$$
$$y_t \mid s_t = j, Y_{t-1} \sim f(y_t \mid \theta_j, Y_{t-1}) \tag{12}$$
where $i, j = 1, 2, \ldots$, and $Y_t = (y_1, \ldots, y_t)'$ represents the data up to time $t$.
(7) and (8) comprise the first hierarchical structure, which governs the transition probabilities. $\pi_0$ is the hierarchical distribution drawn from the stick breaking process with parameter $\gamma$ and represents a discrete distribution with support on the natural numbers. Each infinite dimensional vector $\pi_i$ is drawn from a Dirichlet process with concentration parameter $c$ and shape parameter $(1-\rho)\pi_0 + \rho\delta_i$, which is a convex combination of the hierarchical distribution $\pi_0$ and a degenerate distribution at integer $i$. Three points are worth noting. First, because the shape parameter $(1-\rho)\pi_0 + \rho\delta_i$ has support only on the natural numbers and each number is associated with non-zero probability, the random distribution $\pi_i$ can only take natural numbers as values, and by the stick breaking representation each value receives positive probability. After combining atoms with the same value and sorting them in ascending order, each $\pi_i$ has an element $\pi_{ij}$ representing the probability of taking integer $j$. So we can use the vector $\pi_i = (\pi_{i1}, \pi_{i2}, \ldots)'$ to represent a distribution drawn from $\mathrm{DP}(c, (1-\rho)\pi_0 + \rho\delta_i)$. Second, by (11), $\pi_i$ is the infinite dimensional vector of transition probabilities given the past state $s_{t-1} = i$; the probability of transition from state $i$ to state $j$ is $\pi_{ij}$. Stacking the row vectors $\pi_i'$ gives the infinite dimensional transition matrix $P$, whose $i$th row is $\pi_i'$; this is the hidden Markov model representation. Lastly, the larger $\rho$ is, the more probability $\pi_i$ is expected to place on integer $i$. This implies that $s_t$, the state at time $t$, is more likely to be the same as $s_{t-1}$. Hence, $\rho$ captures state persistence. In the rest of this chapter, $\rho$ is referred to as the sticky coefficient.
(9) and (10) comprise the second hierarchical structure, which governs the parameters of the conditional data density. $G_0(\lambda)$ is the hierarchical distribution from which the state dependent parameter $\theta_i$ is drawn independently; $G$ is the prior of $\lambda$. This structure provides a way of learning $\lambda$ from past values of $\theta_i$ to improve estimation and forecasting. If a new state is born, its conditional data density parameter $\theta_{new}$ is drawn from $G_0(\lambda)$. Without the second hierarchical structure, the new draw $\theta_{new}$ depends on some fixed assumed prior. Pesaran et al. (2006) argued for the importance of modelling the hierarchical distribution of the conditional data density parameters in the presence of structural breaks. This chapter adopts their method to estimate the hierarchical distribution $G_0(\lambda)$.
In comparison to the SDHDP-HMM, FSJW consists of (7)-(8) and (10)-(12). Because it has only one hierarchical structure, on the transition probabilities, FSJW does not fully exploit the stick breaking representation of the Dirichlet process. In fact, the stick breaking representation (1)-(3) decomposes the generation of a distribution $F$ from a Dirichlet process into two independent parts: the probabilities are generated from a stick breaking process, and the parameter values are independently generated from the shape $G_0(\lambda)$. The SDHDP-HMM takes fuller advantage of this structure than FSJW by modelling two parallel hierarchical structures.
The SDHDP-HMM can be summarized as an infinite dimension Markov switching model with a specific prior. Conditional on the hierarchical distribution $\pi_0$ and the sticky coefficient $\rho$, the mean of the transition matrix is
$$E(P \mid \pi_0, \rho) = (1-\rho)\begin{pmatrix} \pi_{01} & \pi_{02} & \pi_{03} & \cdots \\ \pi_{01} & \pi_{02} & \pi_{03} & \cdots \\ \pi_{01} & \pi_{02} & \pi_{03} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} + \rho\begin{pmatrix} 1 & 0 & 0 & \cdots \\ 0 & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
The sticky coefficient $\rho$ captures state persistence by adding weight to the diagonal elements of the transition matrix. The concentration parameter $c$ controls how close $P$ is to $E(P \mid \pi_0, \rho)$.
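For a truncation to $L$ states (an assumption of this sketch), the expected transition matrix and a random draw of $P$ can be computed as below. Each row is drawn from $\mathrm{Dir}(c((1-\rho)\pi_0 + \rho\delta_i))$, which is what a Dirichlet process with a shape supported on $L$ points reduces to. The function names are hypothetical.

```python
import numpy as np


def expected_transition(pi0, rho):
    """E(P | pi0, rho) = (1 - rho) * 1 pi0' + rho * I, truncated to L = len(pi0) states."""
    pi0 = np.asarray(pi0, dtype=float)
    L = pi0.size
    return (1.0 - rho) * np.tile(pi0, (L, 1)) + rho * np.eye(L)


def draw_transition(pi0, rho, c, rng):
    """Row i ~ Dir(c * ((1 - rho) * pi0 + rho * delta_i)); requires all pi0 > 0."""
    pi0 = np.asarray(pi0, dtype=float)
    L = pi0.size
    P = np.empty((L, L))
    for i in range(L):
        alpha = (1.0 - rho) * c * pi0
        alpha[i] += rho * c                 # extra mass on staying in state i
        P[i] = rng.dirichlet(alpha)
    return P


pi0 = np.array([0.5, 0.3, 0.2])
EP = expected_transition(pi0, rho=0.8)
print(EP)                           # diagonal (1 - rho) * pi0_i + rho
print(1.0 / (1.0 - np.diag(EP)))    # mean durations if the chain used E(P) itself
```

With $\rho = 0.8$ the expected probabilities of staying put are 0.90, 0.86 and 0.84, so a chain governed by $E(P \mid \pi_0, \rho)$ would remain in each state for 6 to 10 periods on average; a larger $c$ keeps the drawn $P$ closer to this mean.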
The common practice for setting the prior on the transition matrix of a Markov switching model assumes each row of the transition matrix is drawn independently from a Dirichlet distribution. If extended to the infinite dimension, each row $\pi_i$ would be drawn from a stick breaking process. However, Teh et al. (2006) argued that, without a hierarchical structure similar to (7) and (8), this prior may be over-parametrized, because it prevents the rows $\pi_i$ from sharing information with one another. In terms of parsimony, the SDHDP-HMM needs only one stick breaking process, for the hierarchical distribution $\pi_0$, instead of an infinite number of stick breaking processes for the whole transition matrix $P$. In other words, the hierarchical structure on the transition probabilities reduces the problem of setting a prior on the infinite dimensional matrix $P$ to that of setting a prior on the infinite dimensional vector $\pi_0$.
The SDHDP-HMM is also related to the DPM model (4)-(6), because integrating out $s_t$ in (11)-(12) allows (12) to be replaced by
$$y_t \mid s_{t-1} = i, Y_{t-1} \sim \sum_{j=1}^{\infty}\pi_{ij}f(y_t \mid \theta_j, Y_{t-1}). \tag{13}$$
On one hand, the DPM representation implies the SDHDP-HMM is nonparametric. On the other hand, in contrast to the DPM model, the mixture probability $\pi_{ij}$ is state dependent. This feature allows the SDHDP-HMM to capture time varying dynamics.
In summary, the SDHDP-HMM is an infinite state space Markov switching model
with a specific form of prior to capture state persistence. Two parallel hierarchical
structures are proposed to provide parsimony and improve forecasting. It preserves the
nonparametric methodology of the DPM model but has state dependent probabilities in
its mixture components.
3.4 Estimation, inference and forecasting
In the following simulation study and applications, the conditional dynamics $y_t \mid \theta_j, Y_{t-1}$ in (12) are specified as a Gaussian AR($q$) process:
$$y_t \mid \theta_j, Y_{t-1} \sim N\left(\phi_{j0} + \phi_{j1}y_{t-1} + \cdots + \phi_{jq}y_{t-q},\ \sigma_j^2\right).$$
By definition, the conditional data density parameter is $\theta_i = (\phi_i', \sigma_i)'$ with $\phi_i = (\phi_{i0}, \phi_{i1}, \ldots, \phi_{iq})'$. The hierarchical distribution $G_0(\lambda)$ in (10) is assumed to be the normal-gamma distribution that is standard in the Bayesian literature.³ The conditional data density parameter $\theta_i$ is generated as follows:
$$\sigma_i^{-2} \sim \mathcal{G}(\chi/2, \nu/2), \qquad \phi_i \mid \sigma_i \sim N(\phi, \sigma_i^2 H^{-1}) \tag{14}$$

³ For example, see Geweke (2009).
By definition, $\lambda = (\phi, H, \chi, \nu)$: $\phi$ is a $(q+1)\times 1$ vector, $H$ is a $(q+1)\times(q+1)$ positive definite matrix, and $\chi$ and $\nu$ are positive scalars. This is a standard conjugate prior for linear models. The precision parameter $\sigma_i^{-2}$ is drawn from a gamma distribution with degree of freedom $\nu/2$ and multiplier $\chi/2$. Given the hierarchical distribution parameter $\lambda$, the conditional mean and variance of $\sigma_i^{-2}$ are $\nu/\chi$ and $2\nu/\chi^2$, respectively. Then $\phi_i \mid \sigma_i$ is drawn from a multivariate normal distribution with mean $\phi$ and covariance matrix $\sigma_i^2 H^{-1}$.
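A draw of $\theta_i = (\phi_i', \sigma_i)'$ from (14) needs care with the gamma parametrization: NumPy's `Generator.gamma` takes a shape and a scale, so a gamma with degree of freedom $\nu/2$ and multiplier $\chi/2$ becomes `shape=nu/2, scale=2/chi`, which reproduces the mean $\nu/\chi$ and variance $2\nu/\chi^2$ stated above. A minimal sketch with hypothetical names:

```python
import numpy as np


def draw_theta(phi_bar, H, chi, nu, rng):
    """One draw of (phi_i, sigma_i) from the normal-gamma distribution (14)."""
    prec = rng.gamma(shape=nu / 2.0, scale=2.0 / chi)      # sigma_i^{-2}, mean nu / chi
    sigma = 1.0 / np.sqrt(prec)
    C = np.linalg.cholesky(H)                               # H = C C'
    z = rng.standard_normal(len(phi_bar))
    phi = phi_bar + sigma * np.linalg.solve(C.T, z)         # covariance sigma_i^2 H^{-1}
    return phi, sigma


rng = np.random.default_rng(0)
draws = [draw_theta(np.zeros(2), np.eye(2), chi=2.0, nu=4.0, rng=rng) for _ in range(20000)]
prec = np.array([1.0 / s ** 2 for _, s in draws])
print(prec.mean(), prec.var())   # close to nu / chi = 2 and 2 nu / chi^2 = 2
```

Scaling the normal draw by $\sigma_i$ is what makes the prior conjugate: regimes with a large variance also allow their AR coefficients to stray further from $\phi$.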
The prior on the hierarchical parameters $\lambda$ in (9) follows Pesaran et al. (2006):
$$H \sim \mathcal{W}(A_0, a_0) \tag{15}$$
$$\phi \mid H \sim N(m_0, \tau_0 H^{-1}) \tag{16}$$
$$\chi \sim \mathcal{G}(d_0/2, c_0/2) \tag{17}$$
$$\nu \sim \mathrm{Exp}(\rho_0) \tag{18}$$
$H$ is drawn from a Wishart distribution with parameters given by a $(q+1)\times(q+1)$ positive definite matrix $A_0$ and a positive scalar $a_0$. Samples from this distribution are positive definite matrices. The expected value of $H$ is $a_0 A_0$. The variance of $H_{ij}$, the element in the $i$th row and $j$th column of $H$, is $a_0(A_{ij}^2 + A_{ii}A_{jj})$, where $A_{ij}$ is the element in the $i$th row and $j$th column of $A_0$. $m_0$ is a $(q+1)\times 1$ vector representing the mean of $\phi$, and $\tau_0$ is a positive scalar that controls the prior belief about the dispersion of $\phi$. $\chi$ has a gamma distribution with multiplier $d_0/2$ and degree of freedom $c_0/2$. $\nu$ has an exponential distribution with parameter $\rho_0$.
The posterior sampling is based on the block sampler of Fox et al. (2008). It approximates the infinite number of states by a large but finite number of states, which is more efficient than the individual sampler.⁴

⁴ Consistency of the approximation was proved by Ishwaran and Zarepour (2000, 2002). Ishwaran and James (2001) compared the individual sampler with the block sampler and found the latter to be more efficient in terms of mixing.
In order to apply the block sampler following Fox et al. (2008), the SDHDP-HMM is approximated by a finite number of states as follows:
$$\pi_0 \sim \mathrm{Dir}\left(\frac{\gamma}{L}, \ldots, \frac{\gamma}{L}\right) \tag{19}$$
$$\pi_i \mid \pi_0 \sim \mathrm{Dir}\left((1-\rho)c\pi_{01}, \ldots, (1-\rho)c\pi_{0i} + \rho c, \ldots, (1-\rho)c\pi_{0L}\right) \tag{20}$$
$$\lambda \sim G \tag{21}$$
$$\theta_i \overset{iid}{\sim} G_0(\lambda) \tag{22}$$
$$s_t \mid s_{t-1} = i \sim \pi_i \tag{23}$$
$$y_t \mid s_t = j, Y_{t-1} \sim N\left(\phi_{j0} + \phi_{j1}y_{t-1} + \cdots + \phi_{jq}y_{t-q},\ \sigma_j^2\right) \tag{24}$$
where $L$ is the maximal number of states in the approximation and $i = 1, 2, \ldots, L$. The hierarchical distribution $G_0(\lambda)$ and its prior are set as in (14) and (15)-(18), respectively.
From the empirical point of view, the essence of the SDHDP-HMM is not only its
infinite dimension, but also its sensible hierarchical structure of the prior. If L is large
enough, the finite approximation (19)-(24) is equivalent to the original model (7)-(12) in
practice.
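To see what the finite approximation implies for the data, the sketch below simulates (19), (20), (23) and (24) with $q = 1$ for given state parameters $\theta_i$; drawing $\lambda$ and $\theta_i$ from (21)-(22) is omitted for brevity. The initial state distribution ($s_1 \sim \pi_0$), the starting value $y_0 = 0$ and all names are assumptions of this sketch, not specifications from the chapter.

```python
import numpy as np


def simulate_sdhdp_hmm(T, L, gamma, c, rho, phi, sigma, rng):
    """Simulate states and data from the L-state approximation with AR(1) regimes.

    phi: (L, 2) array of (phi_j0, phi_j1); sigma: (L,) standard deviations.
    Assumptions of this sketch: s_1 ~ pi0 and y_0 = 0.
    """
    pi0 = rng.dirichlet(np.full(L, gamma / L))                 # (19)
    P = np.empty((L, L))
    for i in range(L):                                         # (20)
        alpha = (1.0 - rho) * c * pi0
        alpha[i] += rho * c
        P[i] = rng.dirichlet(alpha)
    s = np.empty(T, dtype=int)
    y = np.empty(T)
    y_prev = 0.0
    for t in range(T):
        probs = pi0 if t == 0 else P[s[t - 1]]                 # (23)
        s[t] = rng.choice(L, p=probs)
        j = s[t]
        y[t] = phi[j, 0] + phi[j, 1] * y_prev + sigma[j] * rng.standard_normal()   # (24)
        y_prev = y[t]
    return s, y, P
```

With a large sticky coefficient the simulated path stays in each state for long spells, and because any row of $P$ can return to a previously visited state, the same path can display both recurring regimes and apparently new ones.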
3.4.1 Estimation
Appendix 3.9 shows the detailed posterior sampling algorithm. The parameter space is partitioned into four blocks: $(S, I)$, $(\Theta, P, \pi_0)$, $(\phi, H, \chi)$ and $\nu$. $S$, $I$ and $\Theta$ are the collections of $s_t$, a binary auxiliary variable $I_t$,5 and $\theta_i$, respectively. Each block is sampled conditional on the other blocks and the data $Y$ as follows:
1. Sample $(S, I) \mid \Theta, P, Y$.
(a) Sample $S \mid \Theta, P, Y$ by the forward filtering and backward sampling smoother of Chib (1996).
(b) Sample $I \mid S$ by a Pólya urn scheme.
2. Sample $(\Theta, P, \pi_0) \mid S, I, Y$.
(a) Sample $\Theta \mid S, Y$ by the standard linear model result.
(b) Sample $\pi_0 \mid I$ from a Dirichlet distribution.
(c) Sample $P \mid \pi_0, S$ from Dirichlet distributions.
3. Sample $(\phi, H, \chi) \mid S, \Theta, \nu$.
(a) Sample $(\phi, H) \mid S, \Theta$ by conjugacy of the Normal-Wishart distribution.
(b) Sample $\chi \mid \nu, S, \Theta$ from a gamma distribution.
4. Sample $\nu \mid \chi, S, \Theta$ by a Metropolis-Hastings algorithm.
5 $I_t$ is an auxiliary variable used to sample $\pi_0$. The details are in Appendix 3.9.
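Step 1(a) can be sketched in Python as a generic forward-filtering backward-sampling routine in the spirit of Chib (1996). This is an illustrative assumption, not the chapter's implementation: it takes the per-state log densities $\log f(y_t \mid \theta_k, Y_{t-1})$ as a precomputed $T \times L$ array, the initial state distribution is supplied by the caller, and the name `ffbs` is hypothetical.

```python
import numpy as np


def ffbs(loglik, P, init, rng):
    """Draw one state path s_1, ..., s_T given the per-state log densities.

    loglik: (T, L) array, loglik[t, k] = log f(y_t | theta_k, Y_{t-1})
    P:      (L, L) transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i)
    init:   (L,) distribution of the first state
    """
    T, L = loglik.shape
    filt = np.empty((T, L))
    pred = np.asarray(init, dtype=float)
    for t in range(T):
        # Forward step: prediction times likelihood (rescaled by the row max
        # for numerical stability), normalised to Pr(s_t | Y_t).
        w = pred * np.exp(loglik[t] - loglik[t].max())
        filt[t] = w / w.sum()
        pred = filt[t] @ P  # Pr(s_{t+1} | Y_t)

    s = np.empty(T, dtype=int)
    s[-1] = rng.choice(L, p=filt[-1])
    for t in range(T - 2, -1, -1):
        # Backward step: Pr(s_t = i | s_{t+1}, Y_T) is proportional to
        # Pr(s_t = i | Y_t) * P[i, s_{t+1}]
        w = filt[t] * P[:, s[t + 1]]
        s[t] = rng.choice(L, p=w / w.sum())
    return s
```

Drawing the whole path jointly, rather than one $s_t$ at a time, is what lets a block sampler of this kind mix well when states are persistent.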
After initializing the parameter values, the algorithm is applied iteratively many times to obtain a large sample of the model parameters. The first block of draws is discarded to remove dependence on the initial values. The remaining draws, $\{S^{(i)}, \Theta^{(i)}, P^{(i)}, \pi_0^{(i)}, \phi^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)}\}_{i=1}^{N}$, are used for inference as if they were drawn from the posterior distribution. Simulation-consistent posterior statistics are computed as sample averages. For example, the posterior mean of $\phi$, $E(\phi \mid Y)$, is estimated by $\frac{1}{N}\sum_{i=1}^{N} \phi^{(i)}$.
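The burn-in and averaging step can be sketched as below. This is a minimal illustration, assuming the draws of one parameter (for example $\phi$) are stacked along the first axis; the function name `posterior_mean` is hypothetical.

```python
import numpy as np


def posterior_mean(draws, burn_in):
    """Average the draws kept after discarding the first `burn_in` iterations.

    draws: array of shape (n_iterations, ...) holding one parameter, e.g. phi.
    """
    kept = np.asarray(draws, dtype=float)[burn_in:]
    if kept.shape[0] == 0:
        raise ValueError("burn_in leaves no draws")
    return kept.mean(axis=0)
```

For instance, if the first 100 draws sit near a poor starting value and the rest settle at 1, dropping those 100 leaves a posterior mean of 1.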
Fox et al. (2008) did not consider the label switching problem, which is an issue in mixture models.6 For example, swapping the values of $(\theta_j, \pi_j)$ and $(\theta_k, \pi_k)$ together with the labels $s_t = j$ and $s_t = k$, while keeping the other parameters unchanged, results in the same likelihood value in the finite approximation of the SDHDP-HMM. Inferences on a label-dependent statistic such as $\theta_j$ are therefore misleading without extra constraints. Geweke (2007) showed that inappropriate constraints can also result in misleading inferences.
To identify regime switching and structural breaks, this chapter uses label-invariant statistics, so the posterior sampling algorithm can be implemented without modification, as suggested by Geweke (2007).
6 See Celeux et al. (2000), Frühwirth-Schnatter (2001) and Geweke (2007).
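The difference between a label-dependent and a label-invariant statistic can be illustrated with a made-up draw. The sketch below (hypothetical names, illustrative numbers) relabels the states of one draw: $\theta_1$ changes, but the path $\theta_{s_t}$, whose posterior mean $E(\theta_{s_t} \mid Y)$ the chapter plots, does not.

```python
import numpy as np


def parameter_path(theta, s):
    """Label-invariant statistic: the parameter in force at each date, theta_{s_t}."""
    return np.asarray(theta)[np.asarray(s)]


# Illustrative (made-up) draw: three state-specific parameters and a state path
theta = np.array([0.5, -1.0, 2.0])
s = np.array([0, 0, 1, 1, 2, 0])

# Relabel the states: old label k becomes perm[k]
perm = np.array([2, 0, 1])
theta_relabelled = np.empty_like(theta)
theta_relabelled[perm] = theta  # new label perm[k] carries the old theta_k
s_relabelled = perm[s]
```

Averaging `parameter_path` across posterior draws is therefore unaffected by label switching, whereas averaging `theta[0]` is not.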
3.4.2 Identification of regime switching and structural breaks
A heuristic illustration of how an SDHDP-HMM nests different dynamics, including
regime switching and structural breaks, is plotted in Figure 3.1. Each path comprised
by arrows is one sample path of state S in an SDHDP-HMM. Figure 3.1a represents the
no state change case (the Gaussian AR(q) model from the assumption). Figures 3.1b-
3.1d are the regime switching, structural break and frequent parameter change cases,
respectively. Figure 3.1e captures more complicated dynamics, in which some states are
only visited for one consecutive period while others are not.
The current literature has not studied the identification of regime switching and structural breaks in infinite dimension Markov switching models. This chapter proposes a global identification algorithm that distinguishes regime switching from structural breaks based on whether a state is recurrent. Specifically, if a state appears for only one consecutive period, it is classified as a non-recurrent state; otherwise, it is defined as recurrent. The starting time of a recurrent (non-recurrent) state is identified as a regime switching (structural break) point. In Figure 3.2, states 1 and 4, marked with circles, are non-recurrent states, and the starting points of these two segments are identified as structural breaks. States 2 and 3, marked with triangles, are recurrent states, and the starting time of each of their consecutive periods is identified as a regime switching point.
Formally, if there exist times $t_0 \le t_1$ such that $s_t = j$ if and only if $t_0 \le t \le t_1$, then state $j$ is non-recurrent and $t_0$ is identified as a break point. On the other hand, if $s_{t_0} \neq s_{t_0 - 1}$ and $t_0$ is not a break point, then $t_0$ is identified as a regime switching point.
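The rule above can be sketched in Python for one sampled state path. This is our illustrative reading, not the chapter's code: the function names are hypothetical, the first period of the sample is not treated as a change point (an assumption, since no change can be observed there), and averaging the classification over posterior draws gives break and switch probabilities by date.

```python
import numpy as np


def classify_change_points(s):
    """Return (breaks, switches) for one sampled state path s.

    A state occupying a single consecutive run is non-recurrent and the start of
    that run is a structural break; every other state change is a regime switch.
    The first period is not treated as a change point (assumption).
    """
    s = np.asarray(s)
    starts = [0] + [t for t in range(1, len(s)) if s[t] != s[t - 1]]
    runs = {}  # state label -> number of consecutive runs it occupies
    for t in starts:
        runs[int(s[t])] = runs.get(int(s[t]), 0) + 1
    breaks = [t for t in starts[1:] if runs[int(s[t])] == 1]
    switches = [t for t in starts[1:] if runs[int(s[t])] > 1]
    return breaks, switches


def change_point_probabilities(paths):
    """Posterior break and switch probabilities at each date from sampled paths."""
    T = len(paths[0])
    p_break, p_switch = np.zeros(T), np.zeros(T)
    for s in paths:
        breaks, switches = classify_change_points(s)
        p_break[np.asarray(breaks, dtype=int)] += 1.0
        p_switch[np.asarray(switches, dtype=int)] += 1.0
    return p_break / len(paths), p_switch / len(paths)
```

On the path `[0, 0, 1, 1, 2, 2, 1, 1, 3, 3]`, state 1 recurs, so its entries at dates 2 and 6 are switches, while states 2 and 3 appear once, so dates 4 and 8 are breaks. Because only comparisons between labels within a draw are used, the result is label-invariant.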
This identification criterion reflects the fact that, in general, states are recurrent in regime switching models but non-recurrent in structural break models. Two points are worth noting. First, in probability theory, a recurrent (non-recurrent) state of a Markov chain is one to which the chain returns with probability one (less than one). This chapter instead defines recurrence (non-recurrence) as a statistic of one realized posterior sample path of the state variable $S$. Because the probabilistic definition cannot be applied to estimation with a finite sample, the two concepts should not be confused. Second, a true path of states from a regime switching model can contain non-recurrent states because of randomness or a small sample size. For example, states 2, 3 and 4 in Figure 3.2 could be generated by a three-regime switching model. The algorithm identifies state 4 as a non-recurrent state, and its starting point is classified as a break point. Hence, this identification approach may label a switching point of a regime switching model as a structural break even if the true states were observed. However, this is simply accidental: as more data are observed, an embedded regime switching model will have all its states identified as recurrent.
More importantly, the purpose of the identification is not to decompose the infinite dimension Markov switching model into several regime switching and structural break sub-models (there is no unique way to do so even if we wanted to), but to study richer dynamics that allow recurrent states while accommodating structural breaks. Even if a non-recurrent state was generated from a regime switching model, it usually has different implications from the recurrent states of the same model.
Hence, separating recurrent and non-recurrent states is both empirically reasonable and theoretically consistent with the definitions of regime switching and structural breaks in the existing respective models. In the rest of this chapter, the SDHDP-HMM associates regime switching and structural breaks with recurrent and non-recurrent states, respectively.
3.4.3 Forecast and model comparison
Predictive likelihood is used to compare the SDHDP-HMM to the existing regime switching and structural break models. It is similar to the marginal likelihood of Kass and Raftery (1995a). Conditional on an initial data set $Y_t$, the predictive likelihood of $Y_{t+1}^{T} = (y_{t+1}, \ldots, y_T)$ under model $M_i$ is calculated as
$$p(Y_{t+1}^{T} \mid Y_t, M_i) = \prod_{\tau=t+1}^{T} p(y_\tau \mid Y_{\tau-1}, M_i). \qquad (25)$$
It is equivalent to the marginal likelihood $p(Y_T \mid M_i)$ if $t = 0$.
The one-period predictive likelihood of model $M_i$, $p(y_t \mid Y_{t-1}, M_i)$, is calculated as
$$p(y_t \mid Y_{t-1}, M_i) = \frac{1}{N}\sum_{i=1}^{N} f(y_t \mid \Upsilon^{(i)}, Y_{t-1}, M_i), \qquad (26)$$
where $\Upsilon^{(i)}$ is one draw of the parameters from the posterior distribution conditional on the historical data $Y_{t-1}$. For the SDHDP-HMM, (26) is
$$p(y_t \mid Y_{t-1}) = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{L} \pi^{(i)}_{jk}\, f\big(y_t \mid \theta^{(i)}_k, s^{(i)}_{t-1} = j, Y_{t-1}\big).$$
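The SDHDP-HMM version of (26) is a double average: over posterior draws, and within each draw over the $L$ states reachable from the sampled $s^{(i)}_{t-1}$. A minimal Python sketch, assuming Gaussian AR(q) state densities as in (24) and draws stored as `(P, phi, sigma, s_last)` tuples (a hypothetical layout, as are the function names):

```python
import numpy as np


def normal_pdf(y, mean, sd):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)


def one_step_predictive(y_t, y_lags, draws):
    """Monte Carlo estimate of p(y_t | Y_{t-1}) for the SDHDP-HMM, as in (26).

    y_lags: (y_{t-1}, ..., y_{t-q})
    draws:  list of (P, phi, sigma, s_last) posterior draws given Y_{t-1}:
            P is (L, L), phi is (L, q + 1) with intercepts in column 0,
            sigma is (L,), and s_last is the sampled state s_{t-1}.
    """
    x = np.concatenate(([1.0], np.asarray(y_lags, dtype=float)))
    total = 0.0
    for P, phi, sigma, s_last in draws:
        # Mix the L state densities with the transition probabilities out of s_{t-1}
        dens = normal_pdf(y_t, np.asarray(phi) @ x, np.asarray(sigma))
        total += np.asarray(P)[s_last] @ dens
    return total / len(draws)
```

Weighting by row $s^{(i)}_{t-1}$ of $P^{(i)}$ is what lets the forecast anticipate a switch to any state, including states last visited long ago.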
After computing the one-period predictive likelihood $p(y_t \mid Y_{t-1})$, the data set is updated by adding the observation $y_t$, and the model is re-estimated to predict the next period. This is repeated until the last predictive likelihood, $p(y_T \mid Y_{T-1})$, is obtained.
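The recursion behind (25) is an expanding-window loop. In the sketch below the re-estimation step is abstracted into a caller-supplied function `predictive_density(history, y_next)`, a hypothetical interface (for example, a wrapper that reruns the sampler on `history` and applies (26)); the loop only accumulates the logs of the one-step predictive densities. With `t0 = 0`, and a predictive density that handles an empty history, the sum would be the log marginal likelihood.

```python
import math


def log_predictive_likelihood(y, t0, predictive_density):
    """log p(y_{t0+1}, ..., y_T | Y_{t0}) as the sum of one-step log predictives (25).

    y is 0-indexed, so y[:tau] is the history before observation y[tau].
    predictive_density(history, y_next) is a user-supplied (hypothetical) callable
    that re-estimates the model on `history` and returns p(y_next | history).
    """
    total = 0.0
    for tau in range(t0, len(y)):
        total += math.log(predictive_density(y[:tau], y[tau]))
    return total


def iid_standard_normal(history, y_next):
    # Toy predictive density that ignores the history, used only for illustration
    return math.exp(-0.5 * y_next ** 2) / math.sqrt(2.0 * math.pi)
```

Summing logs instead of multiplying densities avoids numerical underflow over long evaluation samples such as the 200 observations used below.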
Kass and Raftery (1995a) compared models $M_i$ and $M_j$ by the difference of their log marginal likelihoods, $\log(BF_{ij}) = \log p(Y \mid M_i) - \log p(Y \mid M_j)$, where $BF_{ij}$ is referred to as the Bayes factor of $M_i$ versus $M_j$. They suggested interpreting the evidence for $M_i$ versus $M_j$ as: not worth more than a bare mention for $0 \le \log(BF_{ij}) < 1$; positive for $1 \le \log(BF_{ij}) < 3$; strong for $3 \le \log(BF_{ij}) < 5$; and very strong for $\log(BF_{ij}) \ge 5$. This chapter applies the same thresholds to differences in log predictive likelihoods. Geweke and Amisano (2010) showed that the interpretation is the same as in Kass and Raftery (1995a) if the initial data $Y_t$ are regarded as a training sample.
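The thresholds above translate directly into a small helper. The sketch below (hypothetical name `kass_raftery`) returns the favoured model, $|\log BF|$, the verbal category and the heuristic odds $\exp(|\log BF|)$ used later in the chapter; the usage values in the test are the log predictive likelihoods reported in the simulation study.

```python
import math


def kass_raftery(log_pl_i, log_pl_j):
    """Classify the evidence between M_i and M_j from their log (predictive) likelihoods.

    Returns (favoured model, |log BF|, verbal category, heuristic odds exp(|log BF|)).
    """
    log_bf = log_pl_i - log_pl_j
    favoured = "M_i" if log_bf >= 0 else "M_j"
    b = abs(log_bf)
    if b < 1:
        label = "not worth more than a bare mention"
    elif b < 3:
        label = "positive"
    elif b < 5:
        label = "strong"
    else:
        label = "very strong"
    return favoured, b, label, math.exp(b)
```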
3.5 Simulation evidence
To investigate how the SDHDP-HMM reconciles the regime switching and structural break models, this section provides simulation evidence based on three models: the SDHDP-HMM, a finite Markov switching model and a structural break model. Each model simulates a data set of 1000 observations, and all three data sets are estimated by an SDHDP-HMM with the same prior. First, I plot the posterior means of the conditional data density parameters, $E(\theta_{s_t} \mid Y_T)$, and the true values $\theta_{s_t}$ over time. If the SDHDP-HMM fits the data well, the posterior means should be close to the true values. Second, a more rigorous comparison is based on the predictive likelihoods. Each of the three models is estimated on each of the three simulated data sets, and the last 100 observations are used to calculate the predictive likelihood. If the SDHDP-HMM is able to accommodate the other two models, its predictive likelihood on data simulated from an alternative model should be close to the predictive likelihood of the true model; and if the SDHDP-HMM provides richer dynamics than the other two models, its predictive likelihood on data simulated from the SDHDP-HMM should strongly dominate the predictive likelihoods of the other two models.
The parameters of the SDHDP-HMM in the simulation are set as: γ = 3, c = 10, ρ =
0.9, χ = 2, ν = 2, φ = 0 and H = I. The number of AR lags is set as 2. The simulation is
done through the Polya-Urn scheme without approximation as in Fox et al. (2009). The
simulated data are plotted in Figure 3.3.
The first competitor is a $K$-state Markov switching model:
$$(p_{i1}, \ldots, p_{iK}) \sim \mathrm{Dir}(a_{i1}, \ldots, a_{iK}) \qquad (27)$$
$$(\phi_i, \sigma_i) \overset{iid}{\sim} G_0 \qquad (28)$$
$$\Pr(s_t = j \mid s_{t-1} = i) = p_{ij} \qquad (29)$$
$$y_t \mid s_t = j, Y_{t-1} \sim N(\phi_{j0} + \phi_{j1} y_{t-1} + \cdots + \phi_{jq} y_{t-q}, \sigma_j^2) \qquad (30)$$
where $i, j = 1, \ldots, K$. Each AR process uses 2 lags, as in the SDHDP-HMM. The number of states, $K$, is set to 3. The conditional data density parameters are $\phi_1 = (0, 0.8, 0)$, $\phi_2 = (1, -0.5, 0.2)$, $\phi_3 = (2, 0.1, 0.3)$ and $(\sigma_1, \sigma_2, \sigma_3) = (1, 0.5, 2)$. The transition matrix is set as
$$P = \begin{pmatrix} 0.96 & 0.02 & 0.02 \\ 0.02 & 0.96 & 0.02 \\ 0.02 & 0.02 & 0.96 \end{pmatrix}.$$
The simulated data are plotted in Figure 3.4. The prior of each row of the transition matrix, $(p_{j1}, \ldots, p_{jK})$, is set as an independent Dirichlet distribution $\mathrm{Dir}(1, \ldots, 1)$. The prior of the conditional data density parameters, $G_0$, is set as the normal-gamma distribution, where $\sigma_i^{-2} \sim G(1, 1)$ and $\phi_i \mid \sigma_i \sim N(0, \sigma_i^2 I)$.
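The data generating process for this competitor can be sketched with the parameter values just listed. The simulator below is an illustration, not the chapter's code: the function name is hypothetical, states are labelled 0 to $K-1$, and the initial state and the zero pre-sample lags are assumptions.

```python
import numpy as np


def simulate_ms_ar(P, phi, sigma, T, rng, s0=0):
    """Simulate a K-state Markov switching AR(q) process, as in (29)-(30).

    phi is (K, q + 1) with intercepts in column 0. The initial state s0 and the
    q pre-sample zeros used as starting lags are assumptions.
    """
    K, q = len(sigma), phi.shape[1] - 1
    y = np.zeros(T + q)  # first q entries are pre-sample lags
    s = np.empty(T, dtype=int)
    state = s0
    for t in range(T):
        state = rng.choice(K, p=P[state])
        s[t] = state
        lags = y[t:t + q][::-1]  # (y_{t-1}, ..., y_{t-q})
        y[t + q] = phi[state, 0] + phi[state, 1:] @ lags + sigma[state] * rng.standard_normal()
    return s, y[q:]


# Parameter values stated in the text for the three-state simulation
P = np.array([[0.96, 0.02, 0.02],
              [0.02, 0.96, 0.02],
              [0.02, 0.02, 0.96]])
phi = np.array([[0.0, 0.8, 0.0],
                [1.0, -0.5, 0.2],
                [2.0, 0.1, 0.3]])
sigma = np.array([1.0, 0.5, 2.0])
s, y = simulate_ms_ar(P, phi, sigma, T=1000, rng=np.random.default_rng(42))
```

With self-transition probability 0.96, each regime lasts $1/(1 - 0.96) = 25$ periods on average, and every regime is revisited many times in 1000 observations, which is what the identification rule would classify as recurrent.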
The second competitor is a $K$-state structural break model from Chib (1998):
$$p \sim B(a_p, b_p) \qquad (31)$$
$$\Pr(s_t = i \mid s_{t-1} = i) = \begin{cases} p & \text{if } i < K \\ 1 & \text{if } i = K \end{cases} \qquad (32)$$
$$\Pr(s_t = i + 1 \mid s_{t-1} = i) = 1 - p \quad \text{if } i < K \qquad (33)$$
$$(\phi_i, \sigma_i) \overset{iid}{\sim} G_0 \quad \text{for } i = 1, \ldots, K \qquad (34)$$
$$y_t \mid s_t = i, Y_{t-1} \sim N(\phi_{i0} + \phi_{i1} y_{t-1} + \cdots + \phi_{iq} y_{t-q}, \sigma_i^2) \qquad (35)$$
where $i = 1, 2, \ldots, K$ is the state indicator. The break probability $1 - p$ and the number of AR lags are set to 0.003 and 2, respectively. In the simulation, $K = 4$ and the parameters of the conditional data density are $\phi_1 = (0, 0.8, 0)$, $\phi_2 = (1, -0.5, 0.2)$, $\phi_3 = (0.5, 0.1, 0.3)$, $\phi_4 = (0, 0.5, 0.2)$ and $(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (1, 0.5, 1, 0.5)$. The simulated data are plotted in Figure 3.5. $K = 5$ is used in the estimation to nest the true data generating process. The prior of $p$ is set as a beta distribution $B(9, 1)$, and $G_0$ is set in the same way as in the Markov switching model (28).
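The restricted transition matrix (32)-(33) can be built explicitly, which makes the contrast with the Markov switching model visible: no state can be revisited. A minimal sketch with a hypothetical function name and 0-based state labels:

```python
import numpy as np


def break_transition_matrix(K, p):
    """K x K transition matrix of the structural break model, (32)-(33).

    Regime i < K stays with probability p and moves to regime i + 1 with
    probability 1 - p; the last regime is absorbing. States are 0-based here.
    """
    P = np.zeros((K, K))
    for i in range(K - 1):
        P[i, i] = p
        P[i, i + 1] = 1.0 - p
    P[K - 1, K - 1] = 1.0
    return P


P = break_transition_matrix(K=4, p=1 - 0.003)
expected_duration = 1.0 / (1.0 - 0.997)  # mean length of a non-terminal regime
```

With $1 - p = 0.003$, each non-terminal regime lasts about $1/(1 - p) \approx 333$ periods on average, so roughly three breaks are expected in 1000 observations, in line with the $K = 4$ regimes used in the simulation.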
All three simulated data sets are estimated by the SDHDP-HMM. The parameters $\gamma$, $c$, $\rho$ and the number of AR lags are set in the same way as in the SDHDP-HMM used in the simulation. The maximal number of states, $L$, is set to 10. The priors on the other parameters are weakly informative: $H \sim W(0.2 I, 5)$, $\phi \mid H \sim N(0, H^{-1})$, $\chi \sim G(0.5, 0.5)$ and $\nu \sim \mathrm{Exp}(1)$.
The intercept, the persistence parameter (sum of AR coefficients), the standard deviation and the cumulative number of active states of the data simulated from the SDHDP-HMM are plotted over time in Figure 3.6 using solid lines. The posterior means of those parameters from the estimation are plotted in the same figure using dashed lines for comparison. It is not surprising that the estimated values track the true ones closely and sharply identify the change points. Because the estimation is based on the finite approximation, while the simulation is based on the true data generating process, the results support the validity of the block sampler.
Figure 3.7 plots the true values of the intercept, the persistence, the volatility and the cumulative number of switches of the data simulated from the Markov switching model over time using solid lines. It also includes the posterior means of these parameters estimated from the SDHDP-HMM, marked with dashed lines. Figure 3.8 plots the true and the posterior mean of the regime switching and structural break probabilities implied by the SDHDP-HMM. The SDHDP-HMM sharply identifies almost all the switching points. As the middle panel shows, the global identification does not find prominent structural breaks.
Figure 3.9 plots the true parameters from the data simulated from the structural break
model using solid lines and the posterior mean of those parameters estimated from the
SDHDP-HMM using dashed lines. Again, the SDHDP-HMM tracks different parameters
closely. Figure 3.10 plots the true and the posterior mean of the structural break and
regime switching probabilities. The SDHDP-HMM identifies all the break points. The
bottom panel shows some small probabilities of regime switching around the structural
break points. Those values are very small compared to the structural break probabilities.
A more rigorous model comparison is given in Table 3.1. It shows the log predictive likelihoods of the last 100 observations estimated by each of the three models on each of the three simulated data sets. The SDHDP-HMM is robust to model misspecification: it is not strongly rejected against the true model by the log predictive likelihoods. For example, if the true data generating process is the Markov switching model, the log predictive likelihoods computed by the true model and the SDHDP-HMM are −208.10 and −208.32, respectively. The difference is only −208.10 − (−208.32) = 0.22 < 1, which is not worth more than a bare mention. On the other hand, the Markov switching model and the structural break model are each strongly rejected when the other one is the true model. For example, if the structural break model is the data generating process, the log predictive likelihoods calculated by the true model and the Markov switching model are −178.41 and −187.26. Their difference is −178.41 − (−187.26) = 8.85 > 5, which is very strong evidence against the misspecified model.
In addition to its robustness, the SDHDP-HMM is also able to capture more complicated dynamics than the Markov switching model and the structural break model. If the SDHDP-HMM is the true data generating process, the Markov switching model and the structural break model are both strongly rejected. The log predictive likelihood of the SDHDP-HMM is 12.75 larger than that of the Markov switching model and 91.4 larger than that of the structural break model; both differences exceed 5.
In summary, the simulation evidence shows that the SDHDP-HMM is robust to model uncertainty: it closely tracks both the Markov switching model and the structural break model. Meanwhile, the SDHDP-HMM provides richer dynamics than the other two types of models.
3.6 Application to U.S. real interest rate
The first application is to U.S. real interest rates. Previous studies by Fama (1975), Rose (1988) and Walsh (1987) tested the stability of their dynamics. While Fama (1975) found the ex ante real interest rate to be constant, Rose (1988) and Walsh (1987) could not reject the existence of an integrated component. Garcia and Perron (1996a) reconciled these results using a three-regime Markov switching model and found switching points at the beginning of 1973 (the oil crisis) and the middle of 1981 (the federal budget deficit), using the quarterly U.S. real interest rates of Huizinga and Mishkin (1986) from 1961Q1-1986Q3. The real interest rate dynamics in each state are characterized by a Gaussian AR(2) process. Wang and Zivot (2000) used the same data to investigate structural breaks and found support for four states (3 breaks) by Bayes factors.
This chapter constructs U.S. quarterly real interest rates in the same way as Huizinga
and Mishkin (1986) and extends their data set to a total of 252 observations from 1947Q1
to 2009Q4. The last 200 observations are used for predictive likelihood calculation.
Alternative models for comparison include the Markov switching model of Garcia and
Perron (1996a) put in a Bayesian framework, the structural break model of Wang and
Zivot (2000) with minor modifications and linear AR models. All but the linear model
have the Gaussian AR(2) process in each state as in Garcia and Perron (1996a) and
Wang and Zivot (2000).
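The priors of the SDHDP-HMM are set as follows:
$$\pi_0 \sim \mathrm{Dir}(1/L, \ldots, 1/L)$$
$$\pi_i \mid \pi_0 \sim \mathrm{Dir}(\pi_{01}, \ldots, \pi_{0i} + 9, \ldots, \pi_{0L})$$
$$H \sim W(0.2 I, 5)$$
$$\phi \mid H \sim N(0, H^{-1})$$
$$\chi \sim G(0.5, 2.5)$$
$$\nu \sim \mathrm{Exp}(5)$$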
where $i = 1, \ldots, L$. The block sampler uses the truncation $L = 10$.7 To check prior sensitivity, I re-estimated the model with $\gamma$ set to 5 and 10, $c$ set to 1 and 20, and $\rho$ set to 0.5. I also assumed a continuous prior on $(\gamma, c, \rho)$ to estimate these values. The posterior means of the time-varying parameters are similar, and the model comparison results are consistent with the original ones. The priors for the second hierarchical parameters are kept the same, since they cover a reasonably wide range of the parameter space.
7 $L = 10$ is chosen to represent a potentially large number of states while keeping the amount of computation reasonable. Larger values of $L$ produce similar results.
The Markov switching model used is (27)-(30). Garcia and Perron (1996a) estimated the model with a classical approach, and this chapter revisits their paper in the Bayesian framework. The prior of each row of the transition matrix, $(p_{i1}, \ldots, p_{iK})$, is set as $\mathrm{Dir}(1, \ldots, 1)$. The priors of $\phi_i$ and $\sigma_i$ are $\sigma_i^{-2} \sim G(2.5, 0.5)$ and $\phi_i \mid \sigma_i \sim N(0, \sigma_i^2 I)$.
The structural break model is (31)-(35). The model used in this chapter allows simultaneous breaks in the intercept, the AR coefficients and the volatility, while Wang and Zivot (2000) only allowed the intercept and the volatility to change. The prior of $p$ is a beta distribution $B(9, 1)$, and the parameters $\phi_i$ and $\sigma_i$ have the same priors as in the Markov switching model.
A linear AR model is applied as a benchmark for model comparison:
$$(\phi, \sigma) \sim G_0 \qquad (36)$$
$$y_t \mid Y_{t-1} \sim N(\phi_0 + \phi_1 y_{t-1} + \cdots + \phi_q y_{t-q}, \sigma^2) \qquad (37)$$
where the prior of $\sigma$ is set in the same way as in the Markov switching model and the structural break model. The prior of $\phi \mid \sigma$ is $N(0, \sigma^2 I)$, where the dimensions of the zero vector and the identity matrix $I$ depend on the number of lags $q$ in the AR model.
Table 3.2 shows the log predictive likelihoods of the different models. First, all linear models are dominated by nonlinear models. Second, the log predictive likelihoods strongly support the Markov switching models against the structural break models. The log predictive likelihood of the four-regime or five-regime Markov switching model is larger than that of any $K$-regime structural break model by more than 5, which is very strong evidence according to Kass and Raftery (1995a). Lastly, although the
SDHDP-HMM does not strongly dominate the Markov switching models, it still performs
the best among all the models. This is consistent with the simulation evidence that the
SDHDP-HMM can provide robust forecasts by optimally combining regime switching
and structural breaks in the Bayesian framework.
The whole sample is estimated by the SDHDP-HMM with the same prior as in the predictive likelihood calculation. Figure 3.11 plots the posterior means of different parameters over time, including the regime switching and structural break probabilities. There is no sign of structural breaks in the bottom panel, so the regime switching dynamics prevail over the structural break dynamics, which is consistent with the predictive likelihood results in Table 3.2. Three important regimes are found in the figure: one has high volatility and high persistence, one has low volatility and intermediate persistence, and the last has intermediate volatility and low persistence.
Figure 3.12 plots the posterior mean of the cumulative number of active states over
time. A state is defined as active if it is occupied by data. The posterior mean of
the total number of active states is 3.4. Compared to the truncation of L = 10 in the
estimation, this value implies that the finite truncation restriction is not binding, so the
nonparametric flavor is preserved.
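The active-state count is a label-invariant statistic of each sampled path: a state is active in a draw if at least one observation is assigned to it. A minimal sketch of how a posterior mean such as 3.4, and the cumulative curve in Figure 3.12, could be computed from stored paths (hypothetical function names; paths given as integer label sequences):

```python
import numpy as np


def cumulative_active_states(s):
    """Number of distinct states visited up to each date on one sampled path."""
    seen, counts = set(), []
    for state in s:
        seen.add(int(state))
        counts.append(len(seen))
    return np.array(counts)


def active_state_summary(paths):
    """Posterior means of the total and the cumulative number of active states."""
    cumulative = np.array([cumulative_active_states(s) for s in paths], dtype=float)
    return cumulative[:, -1].mean(), cumulative.mean(axis=0)
```

Comparing the posterior mean total with the truncation $L$ is the check described above: a value well below $L$ suggests the truncation does not bind.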
Garcia and Perron (1996a) found switching points at the beginning of 1973 and the middle of 1981. In the SDHDP-HMM, the probability of regime switching in 1973Q1 is 0.39, which is consistent with their finding. From 1980Q2 to 1981Q1, the probabilities of regime switching are 0.18, 0.13, 0.32 and 0.19, respectively, so the exact timing of the switch in this period is uncertain. However, it is quite likely that the state changed in one of these quarters, which is only slightly earlier than in Garcia and Perron (1996a). On the other hand, Huizinga and Mishkin (1986) identified October 1979 and October 1982 as the turning points. The probabilities of regime switching or structural breaks in 1979Q3 and 1979Q4 are less than 0.02 and 0.04, respectively, while in 1982Q3 and 1982Q4 they are both less than 0.01. Thus, the SDHDP-HMM supports Garcia and Perron (1996a) against Huizinga and Mishkin (1986).
To locate potential state changes, I define a time at which the sum of the regime switching and structural break probabilities exceeds 0.3 as a candidate turning point. There are 9 such points in total: 1952Q1, 1952Q3, 1956Q2, 1958Q2, 1973Q1, 1980Q4, 1986Q2, 2002Q1 and 2005Q3. Among them, 1973Q1 and 1980Q4 are consistent with Garcia and Perron (1996a). Wang and Zivot (2000) found 1970Q3, 1980Q2 and 1985Q4 as structural break points, and 1980Q4 and 1986Q2 are close to their findings. However, the SDHDP-HMM does not identify late 1970 as either a break or a switching point, which contradicts their result.
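Because a given date on one sampled path is either a break, a switch or neither, the two probabilities can be added to give the posterior probability of any state change at that date. The screening rule above then reduces to a threshold, sketched here with a hypothetical function name and illustrative, made-up inputs (not the chapter's estimates):

```python
import numpy as np


def candidate_turning_points(dates, p_switch, p_break, threshold=0.3):
    """Dates at which the switching plus break probability exceeds `threshold`."""
    total = np.asarray(p_switch) + np.asarray(p_break)
    return [d for d, p in zip(dates, total) if p > threshold]
```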
In summary, by using a larger sample, U.S. real interest rates are better described by
a regime switching model than a structural break one. The robustness of the SDHDP-
HMM to model uncertainty is supported by the predictive likelihoods. The SDHDP-
HMM performs better than all the parametric alternatives in forecasting.
3.7 Application to U.S. inflation
The second application is to U.S. inflation. Ang et al. (2007) studied the forecasting performance of different methods, including time series models, Phillips curve based models, asset pricing models and surveys; the regime switching model performs best in their most recent sub-sample. Evans and Wachtel (1993) applied a two-regime Markov switching model to explain systematic inflation forecast bias. Their model incorporated the random walk model of Stock and Watson (1991) in one regime and a stationary AR(1) model in the other. Structural breaks in inflation were studied by Groen et al. (2009), Levin and Piger (2004) and Duffy and Engle-Warnick (2006). Applying the SDHDP-HMM can reconcile these two types of models and provide a richer description of inflation dynamics.
Monthly inflation rates are constructed from the CPI-U published by the U.S. Bureau of Labor Statistics. There are 1152 observations from Feb 1914 to Jan 2010. They are computed as annualized monthly CPI-U growth rates scaled by 100. The alternative models for comparison include FSJW, the regime switching model of Evans and Wachtel (1993), a structural break model from Chib (1998) and linear Gaussian AR(q) models.
For the SDHDP-HMM, each state has Gaussian AR(1) dynamics, $L = 10$, and the priors are:
$$\pi_0 \sim \mathrm{Dir}(1/L, \ldots, 1/L)$$
$$\pi_i \mid \pi_0 \sim \mathrm{Dir}(\pi_{01}, \ldots, \pi_{0i} + 9, \ldots, \pi_{0L})$$
$$H \sim W(0.2 I, 5)$$
$$\phi \mid H \sim N(0, H^{-1})$$
$$\chi \sim G(0.5, 2.5)$$
$$\nu \sim \mathrm{Exp}(5)$$
with $i = 1, \ldots, L$.8
8 The prior sensitivity check is the same as in the real interest rate application. The posterior means of the time-varying parameters are similar, and the model comparison results are consistent with the original ones.
In FSJW, each state has Gaussian AR(1) dynamics and the number of states is $L = 10$, as in the SDHDP-HMM, so that the block sampler can be used. The priors of the transition probabilities are the same as in the SDHDP-HMM. The prior on the parameters of the conditional data density is normal-gamma: $\sigma_i^{-2} \sim G(0.5, 0.5)$ and $\phi_i \mid \sigma_i \sim N(0, \sigma_i^2 I)$.
For comparison, the structural break model (31)-(35) is also applied with one AR lag. The prior of $p$ is a beta distribution $B(9, 1)$, and the priors of $\phi_i$ and $\sigma_i$ are the same as in FSJW.
Another alternative model is the regime switching model of Evans and Wachtel (1993):
$$P(s_t = i \mid s_{t-1} = i) = p_i$$
$$(\phi_0, \sigma_0) \sim G_0$$
$$\sigma_1 \sim G_1$$
$$y_t \mid s_t = 0, Y_{t-1} \sim N(\phi_{00} + \phi_{01} y_{t-1}, \sigma_0^2)$$
$$y_t \mid s_t = 1, Y_{t-1} \sim N(y_{t-1}, \sigma_1^2)$$
where $i = 0, 1$. The prior of the self-transition probability $p_i$ is a beta distribution $B(9, 1)$. $\phi_0$, $\sigma_0$ and $\sigma_1$ have the same priors as in FSJW and the structural break model.
The linear AR model (36)-(37) is applied as a benchmark for model comparison. The prior of $\sigma$ is the same as in FSJW, the Markov switching model and the structural break model. The prior of $\phi \mid \sigma$ is $N(0, \sigma^2 I)$, where the dimensions of the zero vector and the identity matrix $I$ depend on the number of lags $q$ in the AR model.
The last 200 observations are used to calculate the log predictive likelihoods. The
results are shown in Table 3.3. First, the linear models are strongly dominated by the
nonlinear models. Second, the regime switching model of Evans and Wachtel (1993)
strongly dominates the structural break models. Third, FSJW strongly dominates all
the other parametric alternatives, including the regime switching model. The difference between the log predictive likelihoods of FSJW and the regime switching model is −82.45 − (−92.50) = 10.05, which implies, heuristically, that FSJW is exp(10.05) ≈ 2.3 × 10^4 times better than the Evans and Wachtel (1993) model. Last, the SDHDP-HMM is the best
model in terms of the log predictive likelihood. The difference between the log predictive likelihoods of the SDHDP-HMM and FSJW is −74.07 − (−82.45) = 8.38, which implies that the SDHDP-HMM is exp(8.38) ≈ 4359 times better than FSJW. Because the SDHDP-HMM nests the parametric alternatives, its dominance can be attributed to the fact that both the regime switching and the structural break dynamics are important for inflation, and no single type of parametric model can capture these dynamics alone.
The models are estimated on the whole sample. The posterior summary statistics are reported in Table 3.4. The posterior mean of the persistence parameter is 0.97 with a 95% density interval of (0.742, 1.199), which implies that the inflation dynamics are likely to be persistent in a new state. In contrast, FSJW draws the parameters of the conditional data density for each new state from the prior. This key difference contributes to the superior forecasting ability of the SDHDP-HMM relative to FSJW.
The smoothed means of conditional data density parameters, break probabilities and
switching probabilities over time for the SDHDP-HMM are in Figure 3.13. The instability
of the dynamics is consistent with Jochmann (2010). The last panel plots the structural
breaks and regime switching probabilities at different times. There are two major breaks
at 1920-07 and 1930-05. The structural break and regime switching probabilities of
1920-07 are 0.3 and 0.5, respectively. There is quite a large chance for this time to have
unique dynamics different from other periods. For 1930-05, the structural break and
regime switching probabilities are 0.13 and 0.09. This implies that if the state changed
at this time, it would be more likely to be a structural break.
To illustrate the dominance of the regime switching dynamics over the structural break dynamics, Figure 3.14 plots the probability that the state at each past date is the same as the state in the last period, Jan 2010, that is, $p(s_\tau = s_{2010\text{-}01} \mid Y)$. Most of the positive probabilities occur before 1955. This emphasizes the importance of modelling recurrent states in forecasting. Structural break models perform worse than the SDHDP-HMM and the regime switching model because they discard much useful information.
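The statistic in Figure 3.14 only compares labels within each posterior draw, so it is label-invariant. A minimal sketch of how it could be estimated from stored state paths (hypothetical function name; paths as an $N \times T$ integer array):

```python
import numpy as np


def prob_same_state_as_last(paths):
    """Pr(s_tau = s_T | Y) for every tau, estimated from posterior state paths.

    paths: (N, T) integer array of sampled state paths. Comparing labels within a
    draw makes the statistic label-invariant.
    """
    paths = np.asarray(paths)
    return (paths == paths[:, -1:]).mean(axis=0)
```

Under a structural break model this probability is zero for every date before the last break, which is one way to see why such models cannot reuse information from earlier episodes.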
Figure 3.15 plots the smoothed regression coefficients, standard deviations and break probabilities over time estimated by the structural break model with $K = 10$. The structural breaks occur in the first half of the sample; therefore, the recent regime switching implied by the SDHDP-HMM is not identified.
Figure 3.16 plots the smoothed probabilities of the random walk state and the smoothed volatility over time, estimated by the regime switching model of Evans and Wachtel (1993). The random walk dynamics dominate after 1953. Recently, inflation dynamics have entered the stationary AR(1) state. This is consistent with the SDHDP-HMM evidence in Figure 3.14 that the most recent episodes are associated with data before 1955. In other words, there is a regime change back to a state visited in the past.
In Figure 3.17, all regime switching and structural break probabilities are plotted
for comparison. The first panel is the regime switching model; the second panel is the
structural break model and the last is the SDHDP-HMM. Two features can be summa-
rized from the figure. First, the state changes identified by the structural break model
and the regime switching model are associated with the state changes identified by the
SDHDP-HMM. Second, the SDHDP-HMM estimates more turning points than each of
the alternative models. This implies that it captures some dynamics that cannot be identified by the regime switching or the structural break models alone. Together with the
log predictive likelihood results in Table 3.3, inflation shows both regime switching and
structural break features.
In summary, the regime switching and the structural break dynamics are both impor-
tant for inflation modelling and forecasting. The SDHDP-HMM is able to capture both
of these features. In the SDHDP-HMM, the parameters of the conditional data density in
each state can provide information for the learning of the hierarchical distribution G0(λ)
and significantly improve forecasting.
3.8 Conclusion
This chapter proposes applying an infinite dimension Markov switching model, the sticky double hierarchical Dirichlet process hidden Markov model (SDHDP-HMM), to
accommodate regime switching and structural break dynamics. Two parallel hierarchical
structures, one governing the transition probabilities and the other governing the param-
eters of the conditional data density, are imposed for parsimony and to improve forecasts.
An algorithm for the global identification of regime switching and structural breaks is
proposed based on label invariant statistics. A simulation study shows the SDHDP-HMM
is robust to model uncertainty and able to capture more complicated dynamics than the
regime switching and the structural break models.
Applications to U.S. real interest rates and inflation show the SDHDP-HMM is robust
to model uncertainty and provides better forecasts than regime switching and structural
break models. The second hierarchical structure on the data density parameters provides
significant improvement in inflation forecasting. From both the predictive likelihood
results and the posterior probabilities of regime switching and structural breaks, U.S.
real interest rates are better described by a regime switching model while inflation has
both features of regime switching and structural breaks.
3.9 Appendix
3.9.1 Sample (S, I) | Θ, P, Y
S | Θ, P, Y is sampled with the forward-filtering, backward-sampling algorithm of Chib (1996).
I is introduced to facilitate the sampling of $\pi_0$. From (19) and (20), the filtered distribution of $\pi_i$ conditional on $S_t = (s_1, \ldots, s_t)$ and $\pi_0$ is a Dirichlet distribution:
$$\pi_i \mid S_t, \pi_0 \sim \mathrm{Dir}\left(c(1-\rho)\pi_{01} + n^{(t)}_{i1}, \ldots, c(1-\rho)\pi_{0i} + c\rho + n^{(t)}_{ii}, \ldots, c(1-\rho)\pi_{0L} + n^{(t)}_{iL}\right)$$
where $n^{(t)}_{ij}$ is the number of $\tau \le t$ with $s_\tau = j$ and $s_{\tau-1} = i$. Integrating out $\pi_i$, the conditional distribution of $s_{t+1}$ given $S_t$ and $\pi_0$ is:
$$p(s_{t+1} = j \mid s_t = i, S_t, \pi_0) \propto c(1-\rho)\pi_{0j} + c\rho\,\delta_i(j) + n^{(t)}_{ij}$$
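As an illustration of the collapsed transition probability above, the Python/NumPy sketch below evaluates it for every destination state $j$ from a matrix of transition counts. It assumes zero-based state labels and that `n_t[i, j]` holds $n^{(t)}_{ij}$; the function name `predictive_transition` is hypothetical and this is not the thesis's own code.

```python
import numpy as np


def predictive_transition(i, n_t, pi0, c, rho):
    """Collapsed p(s_{t+1} = j | s_t = i, S_t, pi0) for j = 0, ..., L-1.

    n_t[i, j]: number of i -> j transitions observed up to time t.
    pi0: top-level weights (sum to one); c: concentration; rho: stickiness.
    Labels are zero-based, an assumption of this sketch.
    """
    pi0 = np.asarray(pi0, dtype=float)
    # c(1 - rho) * pi0_j + n_ij for every destination j
    weights = c * (1.0 - rho) * pi0 + np.asarray(n_t, dtype=float)[i]
    # + c * rho * delta_i(j): extra mass on staying in the current state
    weights[i] += c * rho
    return weights / weights.sum()


# Example: two states, no transitions observed yet, currently in state 0.
probs = predictive_transition(0, np.zeros((2, 2)), [0.5, 0.5], c=1.0, rho=0.5)
```

With no counts, the self-transition receives the extra sticky mass $c\rho$, which is why `probs` puts more weight on state 0 than the uniform $\pi_0$ alone would.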
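Construct a variable $I_{t+1}$ with a Bernoulli distribution:
$$p(I_{t+1} \mid s_t = i, S_t, \pi_0) \propto \begin{cases} c\rho + \sum_{j=1}^{L} n^{(t)}_{ij} & \text{if } I_{t+1} = 0 \\ c(1-\rho) & \text{if } I_{t+1} = 1 \end{cases}$$
and the conditional distributions:
$$p(s_{t+1} = j \mid I_{t+1} = 0, s_t = i, S_t, \pi_0) \propto n^{(t)}_{ij} + c\rho\,\delta_i(j)$$
$$p(s_{t+1} = j \mid I_{t+1} = 1, s_t = i, S_t, \pi_0) \propto \pi_{0j}$$
Because $\sum_{j=1}^{L} \pi_{0j} = 1$, marginalizing over $I_{t+1}$ gives weight $n^{(t)}_{ij} + c\rho\,\delta_i(j) + c(1-\rho)\pi_{0j}$ to $s_{t+1} = j$, so this construction preserves the conditional distribution of $s_{t+1}$ given $S_t$ and $\pi_0$. To sample I | S, use the Bernoulli distribution:
$$I_{t+1} \mid s_t = i, s_{t+1} = j, \pi_0 \sim \mathrm{Ber}\left(\frac{c(1-\rho)\pi_{0j}}{n^{(t)}_{ij} + c\rho\,\delta_i(j) + c(1-\rho)\pi_{0j}}\right).$$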
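The auxiliary draw can be sketched in the same way. The hypothetical `sample_aux_indicators` function below walks along one sampled state path, keeps the running counts $n^{(t)}_{ij}$, and draws each $I_{t+1}$ from the Bernoulli distribution above before adding the transition $s_t \to s_{t+1}$ to the counts. Setting $I_1 = 0$ for the initial period is an assumption of the sketch, as are zero-based labels.

```python
import numpy as np


def sample_aux_indicators(states, pi0, c, rho, rng):
    """Sequentially draw I_{t+1} | s_t = i, s_{t+1} = j, pi0 along one state path.

    states: zero-based labels (s_1, ..., s_T). Returns an int array I with
    I[0] = 0, because no transition leads into s_1 (a convention of this sketch).
    """
    states = np.asarray(states)
    pi0 = np.asarray(pi0, dtype=float)
    L = pi0.size
    n = np.zeros((L, L))                 # running counts n^{(t)}_{ij}
    I = np.zeros(states.size, dtype=int)
    for t in range(states.size - 1):
        i, j = states[t], states[t + 1]
        from_pi0 = c * (1.0 - rho) * pi0[j]           # weight of I_{t+1} = 1
        from_counts = n[i, j] + c * rho * (i == j)    # weight of I_{t+1} = 0
        p_one = from_pi0 / (from_counts + from_pi0)
        I[t + 1] = int(rng.random() < p_one)
        n[i, j] += 1                     # move from n^{(t)} to n^{(t+1)}
    return I
```

A move into a different state that has never been observed from $s_t$ has $n^{(t)}_{ij} + c\rho\,\delta_i(j) = 0$, so it is attributed to $\pi_0$ with probability one.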
3.9.2 Sample (Θ, P, π0) | S, I, Y
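After sampling I and S, write $m_i = \sum_{t:\, s_t = i} I_t$. By construction, the conditional posterior of $\pi_0$ given S and I depends on them only through the counts $m_1, \ldots, m_L$ and, by conjugacy, is a Dirichlet distribution:
$$\pi_0 \mid S, I \sim \mathrm{Dir}\left(\frac{\gamma}{L} + m_1, \ldots, \frac{\gamma}{L} + m_L\right)$$
This way of sampling $\pi_0$ is simpler than that of Fox et al. (2009).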
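Given the indicators, the $\pi_0$ update reduces to counting. A minimal NumPy sketch, assuming zero-based labels and a hypothetical function name, computes $m_i$ as the number of periods with $I_t = 1$ spent in state $i$ and then draws from the Dirichlet posterior:

```python
import numpy as np


def sample_pi0(states, I, gamma, L, rng):
    """Draw pi0 | S, I ~ Dir(gamma/L + m_1, ..., gamma/L + m_L).

    m_i is the sum of I_t over periods with s_t = i, i.e. the number of
    times state i was reached through the pi0 component.
    """
    states = np.asarray(states)
    I = np.asarray(I)
    m = np.bincount(states[I == 1], minlength=L)  # counts m_1, ..., m_L
    return rng.dirichlet(gamma / L + m), m
```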
Conditional on $\pi_0$ and S, sampling $\pi_i$ is straightforward by conjugacy:
$$\pi_i \mid \pi_0, S \sim \mathrm{Dir}\left(c(1-\rho)\pi_{01} + n_{i1}, \ldots, c(1-\rho)\pi_{0i} + c\rho + n_{ii}, \ldots, c(1-\rho)\pi_{0L} + n_{iL}\right)$$
where $n_{ij}$ is the number of $\tau$ with $s_\tau = j$ and $s_{\tau-1} = i$.
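The row-by-row update of the transition matrix can be sketched as follows, assuming zero-based labels; `transition_counts` and `sample_transition_matrix` are hypothetical names. Each row adds the counts $n_{ij}$ to the prior weights $c(1-\rho)\pi_{0j}$, plus the sticky mass $c\rho$ on the diagonal:

```python
import numpy as np


def transition_counts(states, L):
    """n[i, j] = number of periods tau with s_{tau-1} = i and s_tau = j."""
    states = np.asarray(states)
    n = np.zeros((L, L))
    np.add.at(n, (states[:-1], states[1:]), 1)
    return n


def sample_transition_matrix(states, pi0, c, rho, rng):
    """Draw each row pi_i | pi0, S from its conjugate Dirichlet posterior."""
    pi0 = np.asarray(pi0, dtype=float)
    L = pi0.size
    n = transition_counts(states, L)
    P = np.empty((L, L))
    for i in range(L):
        alpha = c * (1.0 - rho) * pi0 + n[i]
        alpha[i] += c * rho               # sticky extra mass on self-transition
        P[i] = rng.dirichlet(alpha)
    return P
```

NumPy's Dirichlet sampler requires strictly positive parameters, so a component of $\pi_0$ that underflows to zero with no observed transitions would need separate handling.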
Sampling Θ | S, Y uses the standard results for linear regression models. The prior is:
$$(\phi_i, \sigma_i^{-2}) \sim \text{N-G}(\phi, H, \chi, \nu).$$
By conjugacy, the posterior is:
$$(\phi_i, \sigma_i^{-2}) \mid S, Y \sim \text{N-G}(\bar\phi_i, \bar H_i, \bar\chi_i, \bar\nu_i)$$
with:
$$\bar\phi_i = \bar H_i^{-1}(H\phi + X_i'Y_i)$$
$$\bar H_i = H + X_i'X_i$$
$$\bar\chi_i = \chi + Y_i'Y_i + \phi'H\phi - \bar\phi_i'\bar H_i\bar\phi_i$$
$$\bar\nu_i = \nu + n_i$$
where $Y_i$ is the collection of $y_t$ in state $i$, $x_t = (1, y_{t-1}, \ldots, y_{t-q})$ is the regressor in the AR(q) model, and $X_i$ and $n_i$ are the collection of $x_t$ and the number of observations in state $i$, respectively.
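A sketch of the state-specific update in Python/NumPy follows. It assumes the usual Normal-Gamma reading of N-G(φ, H, χ, ν): $\phi_i \mid \sigma_i \sim N(\phi, \sigma_i^2 H^{-1})$ and $\sigma_i^{-2}$ Gamma with shape ν/2 and rate χ/2, which is the reading under which the posterior expressions above hold. The function names are hypothetical; in a sampler, `X_i` and `Y_i` would be the rows of `ar_design(y, q)` whose target period is in state $i$.

```python
import numpy as np


def ar_design(y, q):
    """Rows x_t = (1, y_{t-1}, ..., y_{t-q}) and targets y_t for t = q, ..., T-1."""
    y = np.asarray(y, dtype=float)
    T = y.size
    lags = [y[q - k:T - k] for k in range(1, q + 1)]
    X = np.column_stack([np.ones(T - q)] + lags)
    return X, y[q:]


def ng_posterior(X_i, Y_i, phi, H, chi, nu):
    """Conjugate Normal-Gamma update for one state's AR(q) regression."""
    H_bar = H + X_i.T @ X_i
    phi_bar = np.linalg.solve(H_bar, H @ phi + X_i.T @ Y_i)
    chi_bar = chi + Y_i @ Y_i + phi @ H @ phi - phi_bar @ H_bar @ phi_bar
    nu_bar = nu + Y_i.size
    return phi_bar, H_bar, chi_bar, nu_bar


def draw_state_params(phi_bar, H_bar, chi_bar, nu_bar, rng):
    """Draw sigma_i^{-2} ~ Gamma(shape nu_bar/2, rate chi_bar/2), then
    phi_i | sigma_i ~ N(phi_bar, sigma_i^2 * H_bar^{-1}). Returns (phi_i, sigma_i)."""
    prec = rng.gamma(shape=nu_bar / 2.0, scale=2.0 / chi_bar)
    phi_i = rng.multivariate_normal(phi_bar, np.linalg.inv(H_bar) / prec)
    return phi_i, 1.0 / np.sqrt(prec)
```

With a diffuse prior (small `H`), `phi_bar` is close to the least-squares estimate on the state's observations and `chi_bar / nu_bar` is close to the residual variance.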
3.9.3 Sample (φ,H, χ) | S,Θ, ν
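The conditional posterior is:
$$\phi, H \mid \{\phi_i, \sigma_i\}_{i=1}^{K} \sim \text{N-W}(m_1, \tau_1, A_1, a_1)$$
where $K$ is the number of active states and $\phi_i$ and $\sigma_i$ are the parameters associated with these states:
$$m_1 = \frac{1}{\tau_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}}\left(\tau_0^{-1} m_0 + \sum_{i=1}^{K} \sigma_i^{-2}\phi_i\right)$$
$$\tau_1 = \frac{1}{\tau_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}}$$
$$A_1 = \left(A_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}\phi_i\phi_i' + \tau_0^{-1} m_0 m_0' - \tau_1^{-1} m_1 m_1'\right)^{-1}$$
$$a_1 = a_0 + K.$$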
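The hyperparameter update can be sketched with NumPy and SciPy as below (hypothetical function names). The Wishart draw uses `scipy.stats.wishart` with `df=a1` and `scale=A1`, i.e. a Wishart with mean $a_1 A_1$; that this matches the thesis's N-W parameterization is an assumption. The second function then draws φ | H from $N(m_1, \tau_1 H^{-1})$, which is assumed to be the conditional that the expressions for $m_1$ and $\tau_1$ describe.

```python
import numpy as np
from scipy.stats import wishart


def nw_posterior(phis, sig2inv, m0, tau0, A0, a0):
    """Update (m1, tau1, A1, a1) given the K active states' phi_i and sigma_i^{-2}."""
    phis = np.atleast_2d(np.asarray(phis, dtype=float))   # shape (K, p)
    w = np.asarray(sig2inv, dtype=float)                   # sigma_i^{-2}, shape (K,)
    m0 = np.asarray(m0, dtype=float)
    K = w.size
    tau1 = 1.0 / (1.0 / tau0 + w.sum())
    m1 = tau1 * (m0 / tau0 + w @ phis)                     # weighted mean of phi_i and m0
    S = (w[:, None] * phis).T @ phis                       # sum_i w_i phi_i phi_i'
    A1_inv = np.linalg.inv(A0) + S + np.outer(m0, m0) / tau0 - np.outer(m1, m1) / tau1
    return m1, tau1, np.linalg.inv(A1_inv), a0 + K


def draw_phi_H(m1, tau1, A1, a1, rng):
    """H ~ Wishart(df=a1, scale=A1), then phi | H ~ N(m1, tau1 * H^{-1})."""
    H = wishart(df=a1, scale=A1).rvs(random_state=rng)
    phi = rng.multivariate_normal(m1, tau1 * np.linalg.inv(H))
    return phi, H
```

The matrix inside the inverse equals $A_0^{-1} + \sum_i \sigma_i^{-2}(\phi_i - m_1)(\phi_i - m_1)' + \tau_0^{-1}(m_0 - m_1)(m_0 - m_1)'$, so $A_1$ is positive definite whenever $A_0$ is.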
The conditional posterior of χ is:
$$\chi \mid \nu, \{\sigma_i\}_{i=1}^{K} \sim G(d_1/2, c_1/2)$$
with $d_1 = d_0 + \sum_{i=1}^{K} \sigma_i^{-2}$ and $c_1 = c_0 + K\nu$.
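The χ draw is a single Gamma variate. The sketch below assumes $\sigma_i^{-2} \mid \chi, \nu$ is Gamma with shape ν/2 and rate χ/2 (the form that produces the density of ν in Section 3.9.4) and a Gamma prior on χ with shape $c_0/2$ and rate $d_0/2$; conjugacy then gives shape $c_1/2$ and rate $d_1/2$. Shape and rate are stated explicitly because NumPy's `Generator.gamma` takes a shape and a scale; the function name is hypothetical.

```python
import numpy as np


def sample_chi(sig2inv, nu, c0, d0, rng, size=None):
    """Draw chi | nu, {sigma_i} with shape c1/2 and rate d1/2, where
    c1 = c0 + K * nu and d1 = d0 + sum_i sigma_i^{-2} (assumed prior as above)."""
    sig2inv = np.asarray(sig2inv, dtype=float)
    c1 = c0 + sig2inv.size * nu
    d1 = d0 + sig2inv.sum()
    # NumPy's gamma uses scale = 1 / rate = 2 / d1
    return rng.gamma(shape=c1 / 2.0, scale=2.0 / d1, size=size)
```

Under these assumptions the conditional mean of χ is $c_1/d_1$.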
3.9.4 Sample ν | χ, S,Θ
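The conditional posterior of ν has no standard density form:
$$p(\nu \mid \chi, \{\sigma_i\}_{i=1}^{K}) \propto \left(\frac{(\chi/2)^{\nu/2}}{\Gamma(\nu/2)}\right)^{K}\left(\prod_{i=1}^{K} \sigma_i^{-2}\right)^{\nu/2} \exp\left(-\frac{\nu}{\rho_0}\right).$$
The Metropolis-Hastings method is applied to sample ν. Draw a new ν from the proposal distribution
$$\nu \mid \nu' \sim G(\zeta_\nu \nu', \zeta_\nu)$$
and accept it with probability
$$\min\left\{1, \frac{p(\nu \mid \chi, \{\sigma_i\}_{i=1}^{K})\, f_G(\nu'; \zeta_\nu \nu, \zeta_\nu)}{p(\nu' \mid \chi, \{\sigma_i\}_{i=1}^{K})\, f_G(\nu; \zeta_\nu \nu', \zeta_\nu)}\right\},$$
where ν′ is the value from the previous sweep and $f_G(\cdot\,; a, b)$ is the density of $G(a, b)$. $\zeta_\nu$ is tuned to give an acceptance rate of around 0.5, as suggested by Roberts et al. (1997) and Muller (1991).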
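The Metropolis-Hastings step can be sketched with NumPy and SciPy as below; the function names are hypothetical. The sketch reads $G(\zeta_\nu\nu', \zeta_\nu)$ as a Gamma with shape $\zeta_\nu\nu'$ and rate $\zeta_\nu$, so the proposal is centred at the previous value with variance $\nu'/\zeta_\nu$, and a larger $\zeta_\nu$ gives smaller moves. Because this proposal is not symmetric, both proposal densities enter the acceptance ratio, as in the formula above.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma as gamma_dist


def log_target_nu(nu, chi, sig2inv, rho0):
    """log p(nu | chi, {sigma_i}) up to an additive constant."""
    K = len(sig2inv)
    return (K * (0.5 * nu * np.log(chi / 2.0) - gammaln(nu / 2.0))
            + 0.5 * nu * np.sum(np.log(sig2inv))
            - nu / rho0)


def log_proposal(x, centre, zeta):
    """log f_G(x; zeta * centre, zeta), read as shape zeta*centre and rate zeta."""
    return gamma_dist.logpdf(x, a=zeta * centre, scale=1.0 / zeta)


def mh_step_nu(nu_prev, chi, sig2inv, rho0, zeta, rng):
    """One Metropolis-Hastings update of nu; returns (nu, accepted)."""
    nu_new = rng.gamma(shape=zeta * nu_prev, scale=1.0 / zeta)
    log_ratio = (log_target_nu(nu_new, chi, sig2inv, rho0)
                 + log_proposal(nu_prev, nu_new, zeta)     # f_G(nu'; zeta*nu, zeta)
                 - log_target_nu(nu_prev, chi, sig2inv, rho0)
                 - log_proposal(nu_new, nu_prev, zeta))    # f_G(nu; zeta*nu', zeta)
    if np.log(rng.random()) < log_ratio:
        return nu_new, True
    return nu_prev, False
```

Tuning then amounts to adjusting `zeta` over a pilot run until the fraction of accepted moves is near the target rate.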
3.9.5 Tables
Table 3.1: Log predictive likelihoods in the simulation study

                      Estimated model
DGP            SDHDP-HMM        MS        SB
SDHDP-HMM        -170.55   -183.30   -264.65
MS               -208.32   -208.10   -212.07
SB               -179.51   -187.26   -178.41

The SDHDP-HMM is (7)-(12); the MS is the 3-state Markov switching model of (27)-(30); and the SB is the 4-state structural break model of (31)-(35). 1000 observations are simulated from each model and the last 100 are used to calculate the predictive likelihoods. The first column shows the data generating processes (DGP) and the first row shows the estimated models.
Table 3.2: Log predictive likelihoods of U.S. real interest rates

AR(q)        q=2       q=3       q=4
             -457.62   -451.07   -455.97
MS(K)        K=3       K=4       K=5
             -433.09   -426.62   -424.51
SB(K)        K=3       K=4       K=5       K=10      K=15      K=20
             -450.82   -451.62   -437.28   -433.50   -432.69   -434.24
SDHDP-HMM    -423.50

There are 252 observations from 1947Q1 to 2009Q4 for the U.S. quarterly real interest rate. The last 200 observations are used to calculate the predictive likelihoods. MS(K) is the K-state Markov switching model of (27)-(30) and SB(K) is the K-state structural break model of (31)-(35). For the SDHDP-HMM, MS(K) and SB(K), each state has Gaussian AR(2) dynamics.
Table 3.3: Log predictive likelihoods of U.S. inflation

AR(q)        q=1       q=2       q=3
             -185.06   -173.17   -173.42
MS           -92.50
SB(K)        K=3       K=5       K=10
             -125.50   -98.69    -101.18
FSJW         -82.45
SDHDP-HMM    -74.07

There are 1153 observations from Feb 1914 to Jan 2010 for the U.S. monthly inflation rate. The last 200 observations are used to calculate the predictive likelihoods. MS is the 2-state Markov switching model of Evans and Wachtel (1993); SB(K) is the K-state structural break model of (31)-(35); and FSJW is the model of Fox et al. (2008), i.e. the SDHDP-HMM without the hierarchical structure of G0 on the conditional data density parameters. For the SDHDP-HMM, FSJW, MS and SB(K), each state has Gaussian AR(1) dynamics.
Table 3.4: Posterior summary of the SDHDP-HMM parameters estimated from U.S. inflation

        Mean    Std     95% DI
φ0      0.03    0.20    (-0.376, 0.432)
φ1      0.97    0.11    (0.742, 1.199)
H00     0.77    0.42    (0.225, 1.788)
H01     0.02    0.35    (-0.692, 0.734)
H11     2.06    0.84    (0.768, 4.047)
χ       0.19    0.12    (0.034, 0.488)
ν       1.21    0.50    (0.496, 2.414)

There are 1153 observations from Feb 1914 to Jan 2010 for the U.S. monthly inflation rate. Each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \sigma_{s_t}\varepsilon_t$. The parameters $\phi_i$ and $\sigma_i$ are drawn from the hierarchical distribution: $\sigma_i^{-2} \sim G(\chi/2, \nu/2)$ and $\phi_i \mid \sigma_i \sim N(\phi, \sigma_i^2 H^{-1})$.
3.9.6 Figures
[Diagram, panels (a) to (e): states labelled 1 to 8 across time]
Figure 3.1: The horizontal dimension, from left to right, represents time and the vertical circles represent different states. The numbers in the circles are the labels of the states and they are exchangeable. The SDHDP-HMM nests: (a) no state change, (b) regime switching, (c) structural breaks, (d) frequent parameter change and (e) regime switching and structural breaks.
[Plot: time on the horizontal axis, state labels 1 to 4 on the vertical axis; legend: structural break, regime switching]
Figure 3.2: Example of the global identification of regime switching and structural breaks. All the points represent one sample of the states $(s_1, \ldots, s_T)$ from the posterior samples. The circles are non-recurrent states, which appear for only one consecutive period, and the triangles are recurrent states. The solid arrows point to the break points and the dashed arrows point to the switching points.
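The recurrent/non-recurrent distinction in Figure 3.2 depends only on how many separate consecutive runs each state has in a sampled path, so it does not change if the states are relabelled. A minimal Python sketch for one posterior draw is below (function names hypothetical). It only computes these ingredients; the rule that then labels each change point as a break or a switch is defined in the main text of the chapter and is not reproduced here.

```python
def runs_per_state(path):
    """Count the separate consecutive runs (visits) of each state label in one path."""
    runs = {}
    for t, s in enumerate(path):
        if t == 0 or s != path[t - 1]:
            runs[s] = runs.get(s, 0) + 1
    return runs


def recurrent_states_and_changes(path):
    """Split states into recurrent (more than one run) and non-recurrent
    (exactly one consecutive run), and list the periods where the state changes."""
    runs = runs_per_state(path)
    recurrent = {s for s, r in runs.items() if r > 1}
    non_recurrent = set(runs) - recurrent
    changes = [t for t in range(1, len(path)) if path[t] != path[t - 1]]
    return recurrent, non_recurrent, changes
```

For example, in the path 1, 1, 2, 2, 1, 1, 3, 3, 4 only state 1 is recurrent, and the state changes at periods 2, 4, 6 and 8 (zero-based).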
[Plot: simulated series against time, t = 0 to 1000]
Figure 3.3: Data simulated by an SDHDP-HMM. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$.
[Plot: simulated series against time, t = 0 to 1000]
Figure 3.4: Data simulated by a 3-state Markov switching model of (27)-(30). Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$.
[Plot: simulated series against time, t = 0 to 1000]
Figure 3.5: Data simulated by a structural break model of (31)-(35). Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$.
[Plot panels: Intercept, Persistence, Std and Number of states against time (50 to 950); legend: True, Estimated]
Figure 3.6: The SDHDP-HMM is estimated on the data from Figure 3.3. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$. The solid lines are the true values and the dashed lines are the posterior means of those values estimated by the SDHDP-HMM. The top-left panel plots the intercepts $\phi_{s_t,0}$; the top-right panel plots the persistence parameters $\phi_{s_t,1} + \phi_{s_t,2}$; the bottom-left panel plots the conditional standard deviations $\sigma_{s_t}$ and the bottom-right panel plots the cumulative number of active states (a state is active if it has been visited at least once).
[Plot panels: Intercept, Persistence, Std and Number of switching against time (50 to 950); legend: True, Estimated]
Figure 3.7: The SDHDP-HMM is estimated on the data in Figure 3.4. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$. The solid lines are the true values and the dashed lines are the posterior means of those values estimated by the SDHDP-HMM. The top-left panel plots the intercepts $\phi_{s_t,0}$; the top-right panel plots the persistence parameters $\phi_{s_t,1} + \phi_{s_t,2}$; the bottom-left panel plots the conditional standard deviations $\sigma_{s_t}$ and the bottom-right panel plots the cumulative number of regime switches.
[Plot panels: True probabilities of regime switching, Break probabilities, Switching probabilities, against time (0 to 1000)]
Figure 3.8: Globally identified smoothed probabilities of structural breaks and regime switching. The data are those in Figure 3.4, which are simulated by a 3-state Markov switching model and estimated by the SDHDP-HMM. The top panel shows the true switching points; the middle panel shows the probabilities of structural breaks and the bottom panel shows the probabilities of regime switching.
[Plot panels: Intercept, Persistence, Std and Number of states against time (50 to 950); legend: True, Estimated]
Figure 3.9: The SDHDP-HMM is estimated on the data in Figure 3.5. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$. The solid lines are the true values and the dashed lines are the posterior means of those values estimated by the SDHDP-HMM. The top-left panel plots the intercepts $\phi_{s_t,0}$; the top-right panel plots the persistence parameters $\phi_{s_t,1} + \phi_{s_t,2}$; the bottom-left panel plots the conditional standard deviations $\sigma_{s_t}$ and the bottom-right panel plots the cumulative number of states.
[Plot panels: True probabilities of structural breaks, Break probabilities, Switching probabilities, against time (0 to 1000)]
Figure 3.10: Globally identified smoothed probabilities of structural breaks and regime switching. The data are those in Figure 3.5, which are simulated by a 4-state structural break model and estimated by the SDHDP-HMM. The top panel shows the true break points; the middle panel shows the probabilities of structural breaks and the bottom panel shows the probabilities of regime switching.
[Plot panels: Real interest rate, intercept, persistence, std, probability (legend: break prob, switch prob), from 1950-2 to 2006-3]
Figure 3.11: There are 252 observations from 1947Q1 to 2009Q4 for the U.S. quarterly real interest rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$. The first panel plots the data and the remaining panels plot the posterior means of different parameters: the second panel plots the intercepts $\phi_{s_t,0}$, the third panel plots the persistence parameters $\phi_{s_t,1} + \phi_{s_t,2}$, the fourth panel plots the conditional standard deviations $\sigma_{s_t}$ and the last panel plots the probabilities of regime switching and structural breaks.
[Plot panels: Data, Number of States, from 1950-2 to 2000-2]
Figure 3.12: There are 252 observations from 1947Q1 to 2009Q4 for the U.S. quarterly real interest rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \phi_{s_t,2} y_{t-2} + \sigma_{s_t}\varepsilon_t$. The top panel plots the data and the bottom panel plots the posterior mean of the cumulative number of active states (a state is active if it has been visited at least once).
[Plot panels: Inflation, intercept, persistence, std, probability (legend: break prob, switch prob), from 1918-10 to 2005-03]
Figure 3.13: There are 1153 observations from Feb 1914 to Jan 2010 for the U.S. monthly inflation rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \sigma_{s_t}\varepsilon_t$. The first panel plots the data and the remaining panels plot the posterior means of different parameters: the second panel plots the intercepts $\phi_{s_t,0}$, the third panel plots the persistence parameters $\phi_{s_t,1}$, the fourth panel plots the conditional standard deviations $\sigma_{s_t}$ and the last panel plots the probabilities of regime switching and structural breaks.
[Plot: Probabilities, from 1919-12 to 2003-12]
Figure 3.14: There are 1153 observations from Feb 1914 to Jan 2010 for the U.S. monthly inflation rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \sigma_{s_t}\varepsilon_t$. This figure plots the smoothed probabilities that past states of U.S. inflation are the same as in Jan 2010, i.e. $p(z_\tau = z_{201001} \mid Y)$.
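Because the probability in Figure 3.14 only asks whether a past state coincides with the final state within the same posterior draw, it can be estimated from sampled state paths without resolving label switching. A minimal NumPy sketch, assuming the draws are stored as an (M, T) integer array with one sampled path per row (the function name is hypothetical):

```python
import numpy as np


def prob_same_state_as_last(draws):
    """Monte Carlo estimate of p(s_tau = s_T | Y) for every tau.

    draws: (M, T) array, each row one posterior draw of the state path.
    Labels are compared only within a draw, so relabelling does not matter.
    """
    draws = np.asarray(draws)
    return (draws == draws[:, -1:]).mean(axis=0)
```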
[Plot panels: Inflation, Mean, Persistence, std, Prob of break, from 1918-10 to 2005-03]
Figure 3.15: There are 1153 observations from Feb 1914 to Jan 2010 for the U.S. monthly inflation rate. The data are estimated by the structural break model of Chib (1998) and each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t,0} + \phi_{s_t,1} y_{t-1} + \sigma_{s_t}\varepsilon_t$. The first panel plots the data and the remaining panels plot the posterior means of different parameters: the second panel plots the intercepts $\phi_{s_t,0}$, the third panel plots the persistence parameters $\phi_{s_t,1}$, the fourth panel plots the conditional standard deviations $\sigma_{s_t}$ and the last panel plots the probabilities of structural breaks.
[Plot panels: Inflation, Prob of random walk, std, from 1918-10 to 2005-03]
Figure 3.16: There are 1153 observations from Feb 1914 to Jan 2010 for the U.S. monthly inflation rate. The data are estimated by the 2-state Markov switching model of Evans and Wachtel (1993). The first panel plots the data and the remaining panels plot the posterior means of different parameters: the second panel plots the probabilities of being in the random walk state and the last panel plots the conditional standard deviations.
[Plot panels: Markov Switching Model, Structural Break Model, SDHDP-HMM (legend: Structural Break, Regime Switching), from 1918-10 to 2005-03]
Figure 3.17: There are 1153 observations from Feb 1914 to Jan 2010 for the U.S. monthly inflation rate. The first panel plots the posterior probabilities of regime switching from the 2-state Markov switching model of Evans and Wachtel (1993); the second panel plots the posterior probabilities of structural breaks from the structural break model of Chib (1998); and the last panel plots the posterior probabilities of regime switching and structural breaks from the SDHDP-HMM.
Bibliography
Ang, A. and Bekaert, G. Regime switches in interest rates. Journal of Business &
Economic Statistics, 20(2):163–182, 2002a.
Ang, A., Bekaert, G., and Wei, M. Do macro variables, asset markets, or surveys forecast
inflation better? Journal of Monetary Economics, 54(4):1163–1212, 2007.
Ang, Andrew and Bekaert, Geert. International asset allocation with regime shifts.
Review of Financial Studies, 15:1137–1187, 2002b.
Ang, Andrew and Bekaert, Geert. Regime switches in interest rates. Journal of Business
& Economic Statistics, 20:163–182, 2002c.
Bry, G. and Boschan, C. Cyclical Analysis of Time Series: Selected Procedures and
Computer Programs. NBER, New York, 1971.
Calvet, Laurent and Fisher, Adlai. Multifrequency news and stock returns. Journal of
Financial Economics, 86(1):178–212, 2007.
Casella, G. and Robert, C.P. Rao-Blackwellisation of sampling schemes. Biometrika, 83
(1):81, 1996.
Cecchetti, S., Lam, P., and Mark, Nelson. Mean reversion in equilibrium asset prices.
American Economic Review, 80:398–418, 1990.
Celeux, G., Hurn, M., and Robert, C.P. Computational and Inferential Difficulties with
Mixture Posterior Distributions. Journal of the American Statistical Association, 95
(451), 2000.
Chauvet, Marcelle and Potter, Simon. Coincident and leading indicators of the stock
market. Journal of Empirical Finance, 7:87–111, 2000.
Chib, S. Marginal likelihood from the Gibbs output. Journal of the American Statistical
Association, 90(432):1313–1321, 1995.
Chib, S. Calculating posterior distributions and modal estimates in Markov mixture
models. Journal of Econometrics, 75(1):79–97, 1996.
Chib, S. Estimation and comparison of multiple change-point models. Journal of Econo-
metrics, 86(2):221–241, 1998.
David, Alexander and Veronesi, Pietro. What ties return volatilities to price valuations
and fundamentals? Chicago Booth Research Working Paper No. 10-05, 2009.
Duffy, J. and Engle-Warnick, J. Multiple regimes in US monetary policy? A nonpara-
metric approach. Journal of Money Credit and Banking, 38(5):1363, 2006.
Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance
of United Kingdom inflation. Econometrica: Journal of the Econometric Society, 50
(4):987–1007, 1982.
Engle, R.F. Estimates of the Variance of US Inflation Based upon the ARCH Model.
Journal of Money, Credit and Banking, 15(3):286–301, 1983.
Escobar, M.D. and West, M. Bayesian density estimation and inference using mixtures.
Journal of the American Statistical Association, 90, 1995.
Evans, M. and Wachtel, P. Inflation regimes and the sources of inflation uncertainty.
Journal of Money, Credit and Banking, pages 475–511, 1993.
Fama, E.F. Short-term interest rates as predictors of inflation. The American Economic
Review, 65(3):269–282, 1975.
Ferguson, T.S. A Bayesian analysis of some nonparametric problems. The Annals of Statistics,
1(2):209–230, 1973.
Fox, E.B., Sudderth, E.B., Jordan, M.I., and Willsky, A.S. An HDP-HMM for systems
with state persistence. In Proceedings of the 25th International Conference on Machine
Learning, pages 312–319. ACM, 2008.
Fox, E.B., Sudderth, E.B., Jordan, M.I., and Willsky, A.S. The Sticky HDP-HMM:
Bayesian Nonparametric Hidden Markov Models with Persistent States. arXiv preprint
arXiv:0905.2592, 2009.
Fruhwirth-Schnatter, S. Markov Chain Monte Carlo Estimation of Classical and Dynamic
Switching and Mixture Models. Journal of the American Statistical Association, 96
(453), 2001.
Fruhwirth-Schnatter, Sylvia. Finite Mixture and Markov Switching Models. Springer
Series in Statistics. New York/Berlin/Heidelberg, 2006.
Garcia, R. and Perron, P. An analysis of the real interest rate under regime shifts. The
Review of Economics and Statistics, 78(1):111–125, 1996a.
Garcia, Rene and Perron, Pierre. An analysis of real interest rates under regime shifts.
Review of Economics and Statistics, pages 111–125, 1996b.
Gerlach, R., Carter, C., and Kohn, R. Efficient Bayesian Inference for Dynamic Mixture
Models. Journal of the American Statistical Association, 95(451), 2000.
Geweke, J. Interpretation and inference in mixture models: Simple MCMC works. Com-
putational Statistics & Data Analysis, 51(7):3529–3550, 2007.
Geweke, J. Complete and Incomplete Econometric Models. Princeton University Press, 2009.
Geweke, J. and Amisano, G. Comparing and evaluating Bayesian predictive distributions
of asset returns. International Journal of Forecasting, 2010.
Geweke, J. and Amisano, G. Hierarchical Markov normal mixture models with applica-
tions to financial asset returns. Journal of Applied Econometrics, 26(1):1–19, 2011.
Geweke, John. Contemporary Bayesian Econometrics and Statistics. Wiley, 2005.
Giordani, P. and Kohn, R. Efficient Bayesian inference for multiple change-point and
mixture innovation models. Journal of Business and Economic Statistics, 26(1):66–77,
2008.
Gonzalez, Liliana, Powell, John G., Shi, Jing, and Wilson, Antony. Two centuries of bull
and bear market cycles. International Review of Economics and Finance, 14:469–486,
2005.
Gordon, S. and St-Amour, P. A preference regime model of bull and bear markets.
American Economic Review, 90(4):1019–1033, 2000.
Groen, J.J.J., Paap, R., and Ravazzolo, F. Real-time inflation forecasting in a changing
world. http://hdl.handle.net/1765/16709, 2009.
Guidolin, Massimo and Timmermann, Allan. Economic implications of bull and bear
regimes in UK stock and bond returns. The Economic Journal, 115:111–143, 2005.
Guidolin, Massimo and Timmermann, Allan. An econometric model of nonlinear dynam-
ics in the joint distribution of stock and bond returns. Journal of Applied Econometrics,
21(1):1–22, 2006.
Guidolin, Massimo and Timmermann, Allan. Asset allocation under multivariate regime
switching. Journal of Economic Dynamics and Control, 31(11):3503–3544, 2007.
Guidolin, Massimo and Timmermann, Allan. International asset allocation under skew
and kurtosis preferences. Review of Financial Studies, 21(2):889–935, 2008.
Hamilton, J. D. A new approach to the economic analysis of non-stationary time series
and the business cycle. Econometrica, 57:357–384, 1989a.
Hamilton, J. D. and Lin, G. Stock market volatility and the business cycle. Journal of
Applied Econometrics, 11:573–593, 1996.
Hamilton, James D. Time Series Analysis. Princeton University Press, Princeton, New
Jersey, 1994.
Hamilton, J.D. A new approach to the economic analysis of nonstationary time series and
the business cycle. Econometrica: Journal of the Econometric Society, 57(2):357–384,
1989b.
Huizinga, J. and Mishkin, F.S. Monetary policy regime shifts and the unusual behavior
of real interest rates, 1986.
Inclan, C. Detection of multiple changes of variance using posterior odds. Journal of
Business & Economic Statistics, 11(3):289–300, 1993.
Ishwaran, H. and James, L.F. Gibbs Sampling Methods for Stick-Breaking Priors. Journal
of the American Statistical Association, 96(453), 2001.
Ishwaran, H. and Zarepour, M. Markov chain Monte Carlo in approximate Dirichlet and
beta two-parameter process hierarchical models. Biometrika, 87(2):371, 2000.
Ishwaran, H. and Zarepour, M. Dirichlet prior sieves in finite normal mixtures. Statistica
Sinica, 12(3):941–963, 2002.
Jochmann, M. Modeling U.S. Inflation Dynamics: A Bayesian Nonparametric Approach.
Working Paper Series, 2010.
Kandel, Shmuel and Stambaugh, Robert. Expectations and volatility of consumption
and asset returns. Review of Financial Studies, 3:207–232, 1990.
Kass, R.E. and Raftery, A.E. Bayes factors. Journal of the American Statistical Associ-
ation, 90(430):773–795, 1995a.
Kass, Robert E. and Raftery, Adrian E. Bayes factors. Journal of the American Statistical
Association, 90(430):773–795, 1995b.
Koop, G. and Potter, S.M. Estimation and forecasting in models with multiple breaks.
Review of Economic Studies, 74(3):763, 2007.
Lettau, Martin, Ludvigson, Sydney C., and Wachter, Jessica A. The declining equity
premium: What role does macroeconomic risk play. Review of Financial Studies, 21
(4):1653–1687, 2008.
Levin, A.T. and Piger, J.M. Is inflation persistence intrinsic in industrial economies?
2004.
Lunde, Asger and Timmermann, Allan G. Duration dependence in stock prices: An
analysis of bull and bear markets. Journal of Business & Economic Statistics, 22(3):
253–273, 2004.
Maheu, J. M. and McCurdy, T. H. Identifying bull and bear markets in stock returns.
Journal of Business & Economic Statistics, 18(1):100–112, 2000a.
Maheu, J. M. and McCurdy, T. H. Volatility dynamics under duration-dependent mixing.
Journal of Empirical Finance, 7(3-4):345–372, 2000b.
Maheu, J.M. and Gordon, S. Learning, forecasting and structural breaks. Journal of
Applied Econometrics, 23(5):553–583, 2008.
Maheu, J.M. and McCurdy, T.H. How useful are historical data for forecasting the long-
run equity return distribution? Journal of Business and Economic Statistics, 27(1):
95–112, 2009.
Maheu, J.M., McCurdy, T.H., and Song, Y. Components of bull and bear markets: bull
corrections and bear rallies. Working Papers, 2010.
Muller, P. A generic approach to posterior integration and Gibbs sampling. Technical
report 91-09, 1991.
Ntantamis, Christos. A duration hidden Markov model for the identification of regimes
in stock market returns. University of Aarhus - CREATES, Available at SSRN:
http://ssrn.com/abstract=1343726, 2009.
Pagan, Adrian R. and Sossounov, Kirill A. A simple framework for analysing bull and
bear markets. Journal of Applied Econometrics, 18(1):23–46, 2003.
Pastor, Lubos and Stambaugh, Robert F. The equity premium and structural breaks.
Journal of Finance, 4:1207–1231, 2001.
Perez-Quiros, G. and Timmermann, A. Business cycle asymmetries in stock returns: Ev-
idence from higher order moments and conditional densities. Journal of Econometrics,
103(1-2):259–306, 2001.
Pesaran, M.H., Pettenuzzo, D., and Timmermann, A. Forecasting time series subject to
multiple structural breaks. Review of Economic Studies, 73(4):1057–1084, 2006.
Primiceri, G.E. Time varying structural vector autoregressions and monetary policy.
Review of Economic Studies, 72(3):821–852, 2005.
Roberts, G.O., Gelman, A., and Gilks, W.R. Weak convergence and optimal scaling of
random walk Metropolis algorithms. The Annals of Applied Probability, 7(1):110–120,
1997.
Rose, A.K. Is the real interest rate stable? Journal of Finance, 43(5):1095–1112, 1988.
Schwert, G. William. Indexes of U.S. stock prices from 1802 to 1987. Journal of Business,
63(3):399–426, 1990.
Sethuraman, J. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650,
1994.
Shahbaba, B. and Neal, R.M. Nonlinear models using Dirichlet process mixtures. Journal
of Machine Learning Research, 10:1829–1850, 2009.
Stock, J.H. and Watson, M.W. A probability model of the coincident economic indicators.
Leading Economic indicators: new approaches and forecasting records, 66, 1991.
Stock, J.H. and Watson, M.W. Evidence on structural instability in macroeconomic time
series relations. Journal of Business & Economic Statistics, 14(1):11–30, 1996.
Teh, Y.W., Jordan, M.I., Beal, M.J., and Blei, D.M. Hierarchical Dirichlet processes.
Journal of the American Statistical Association, 101(476):1566–1581, 2006.
Turner, C., Startz, R., and Nelson, C. A Markov model of heteroskedasticity, risk, and
learning in the stock market. Journal of Financial Economics, 25:3–22, 1989.
van Norden, Simon and Schaller, Huntley. Regime switching in stock market returns.
Applied Financial Economics, 7:177–191, 1997.
Walsh, C.E. Three questions concerning nominal and real interest rates. Economic
Review, (Fall):5–19, 1987.
Wang, J. and Zivot, E. A Bayesian time series model of multiple structural changes in
level, trend, and variance. Journal of Business & Economic Statistics, 18(3):374–386,
2000.
West, M., Muller, P., and Escobar, M.D. Hierarchical priors and mixture models, with
application in regression and density estimation. Aspects of uncertainty: A Tribute to
DV Lindley, pages 363–386, 1994.