
Development and Application of Hidden Markov Models in the Bayesian Framework

by

Yong Song

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Economics

University of Toronto

Copyright © 2011 by Yong Song

Abstract

Development and Application of Hidden Markov Models in the Bayesian Framework

Yong Song

Doctor of Philosophy

Graduate Department of Economics

University of Toronto

2011

This thesis develops new hidden Markov models and applies them to financial market

and macroeconomic time series.

Chapter 1 proposes a probabilistic model of the return distribution with rich and

heterogeneous intra-regime dynamics. It focuses on the characteristics and dynamics of

bear market rallies and bull market corrections, including, for example, the probability

of transition from a bear market rally into a bull market versus back to the primary bear

state. A Bayesian estimation approach accounts for parameter and regime uncertainty

and provides probability statements regarding future regimes and returns. A Value-at-

Risk example illustrates the economic value of our approach.

Chapter 2 develops a new efficient approach to model and forecast time series data

with an unknown number of change-points. The key is assuming a conjugate prior for

the time-varying parameters which characterize each regime and treating the regime

duration as a state variable. Conditional on this prior and the time-invariant parameters,

the predictive density and the posterior of the change-points have closed forms. The

conjugate prior is further modeled as hierarchical to exploit the information across regimes.

This framework allows breaks in the variance, the regression coefficients or both. In

addition to the time-invariant structural change probability, one extension assumes the

regime duration has a Poisson distribution. A new Markov Chain Monte Carlo sampler

draws the parameters from the posterior distribution efficiently. The model is applied to

Canadian inflation time series.

Chapter 3 proposes an infinite dimension Markov switching model to accommodate

regime switching and structural break dynamics or a combination of both in a Bayesian

framework. Two parallel hierarchical structures, one governing the transition

probabilities and another governing the parameters of the conditional data density, keep the

model parsimonious and improve forecasts. This nonparametric approach allows for

regime persistence and estimates the number of states automatically. A global

identification algorithm for structural changes versus regime switching is presented. Applications

to U.S. real interest rates and inflation compare the new model to existing

parametric alternatives. Besides identifying episodes of regime switching and structural breaks,

the hierarchical distribution governing the parameters of the conditional data density

provides significant gains to forecasting precision.

Dedication

This thesis is dedicated to my parents, Deren Song and Jiwei Li, who gave birth to me

and always support my decision to pursue my academic career.

It is also dedicated to my aunt Jihong Li, who treats me like her own son.

Lastly and most importantly, it is dedicated to my wife, Mei Dong, who always

encourages me when I meet with difficulties. She is my super woman.

Acknowledgements

I cannot overstate my gratitude to my supervisor, Professor John Maheu. He guided me

through not only this thesis but also my Ph.D. years. For five years, he has taught me

everything from the very basics to the research frontier. His knowledge and sharp intuition in

economics and econometrics helped me avoid many detours in my research. His enthusiasm

and inspiration encouraged me to keep up my morale during difficult times. Professor

Maheu is the nicest and fairest person I have ever met and I can feel his warmth whenever

I talk with him about either research or personal life. Econometrics is very challenging

for me since I do not have a strong mathematics background. His personality has been as

important as his expertise in keeping me going and finishing my Ph.D. degree. Thanks to

him, I am very happy and excited to work in my research field.

I am grateful to professor Thomas McCurdy and professor Martin Burda, who are

on my thesis committee. Professor McCurdy is one of my coauthors for the first chapter

of this thesis. I have learned a lot from him about research methodology during our

collaboration. He is very supportive and provides me with many opportunities to present

my work. Professor Burda gave me helpful advice to improve my presentation skills.

Although they are very busy in their own work, both of them were always available to

talk with me.

I thank professor Gary Koop for being a very helpful external examiner and flying

all the way from Scotland to Toronto to attend my oral exam. His report on this thesis

provides many sharp questions for me to consider and sheds much light on my future

research.

I am indebted to professor Christian Gourieroux for a thorough proof-reading of my

job market paper and many helpful comments. And I would like to thank professor John

Geweke for providing many detailed comments on my job market paper.

I also want to thank Robert Kohn, James Morley, Christos Ntantamis, Daniel Smith,

Rodney Strachan, Hao Zhou and seminar participants at Australian National University,

the Bachelier Finance Society 6th World Congress, the Bank of Canada, Canadian

Economics Association Annual Conference, Canadian Econometric Study Group, CenSoC

at University of Technology Sydney, Econometric Society Summer Meeting, Northern

Finance Association Annual Meeting, Rimini Centre for Economic Analysis,

University of Melbourne, University of New South Wales, University of Toronto, Third Risk

Management Conference Mont Tremblant and Wilfrid Laurier University.

I want to express my regard for my primary school teacher Jing Zhou, who is a great

teacher with responsibilities.

I also appreciate my student colleagues for stimulating a competitive and friendly

environment, particularly Xin Jin, Tat-kei Lai, Wei Liu, Brian McCaig, Ling Sun and

Simiao Zhou. I also thank my friends Jingjing Zhang, Haiying Kang, Yong Han, Shaoyan

Zhu in Toronto and Zhiguang Li, Zhihong Si, Jinghuan Wang in Tianjin.

The first chapter is based on joint work with John Maheu and Tom McCurdy.

Contents

1 Components of Bull and Bear Markets: Bull Corrections and Bear Rallies
1.1 Introduction
1.2 Data
1.3 Bull and Bear Dating Algorithms
1.4 Models
1.5 Estimation and Model Comparison
1.6 Results
1.7 Conclusion
1.8 Appendix

2 An Efficient Approach to Estimate and Forecast in the Presence of an Unknown Number of Change-points
2.1 Introduction
2.2 Maheu-Gordon Model with Conjugate Prior
2.3 Hierarchical Structural Break Model
2.4 Extension
2.5 Application to Canada Inflation
2.6 Conclusion
2.7 Appendix

3 Modeling Regime Switching and Structural Breaks with an Infinite Dimension Markov Switching Model
3.1 Introduction
3.2 Dirichlet process
3.3 Sticky double hierarchical Dirichlet process hidden Markov model
3.4 Estimation, inference and forecasting
3.5 Simulation evidence
3.6 Application to U.S. real interest rate
3.7 Application to U.S. inflation
3.8 Conclusion
3.9 Appendix

Bibliography

Chapter 1

Components of Bull and Bear

Markets: Bull Corrections and Bear

Rallies

1.1 Introduction

There is a widespread belief among investors, policy makers and academics that low

frequency trends do exist in the stock market. Traditionally these positive and negative

low frequency trends have been labelled as bull and bear markets respectively. If these

trends do exist, then it is important to extract them from the data to analyse their

properties and consider their use as inputs into investment decisions and risk assessment.

We propose a model that provides answers to typical questions such as, ’Are we in a

bull market or a bear market rally?’ or ’Will this bull market correction become a bear

market?’.

Traditional methods of identifying bull and bear markets are based on an ex post

assessment of the peaks and troughs of the price index. Formal dating algorithms based

on a set of rules for classification are found, for example, in Gonzalez et al. (2005),

Lunde and Timmermann (2004) and Pagan and Sossounov (2003). Some of this work is

related to the dating methods used to identify turning points in the business cycle (Bry

and Boschan (1971)). A drawback is that a turning point can only be identified several

observations after it occurs.

Ex post dating algorithms sort returns into a particular regime with probability zero

or one. The data provides more information; investors may be interested in estimated

probabilities associated with particular states. Such information can be used to answer

questions such as ’How likely is it that the market could turn into a bear next month?’.

Further, ex post dating methods cannot be used for statistical inference on returns or for

investment decisions which require more information from the return distribution, such

as changing risk assessments. For adequate risk management and investment decisions,

we need a probability model for returns and one for which the distribution of returns

changes over time.

For time series that tend to be cyclical, for example, due to business cycles, a popular

model has been a two-state regime-switching model in which the states are latent and the


mixing parameters are estimated from the available data. One popular parameterization

is a Markov-switching (MS) model for which transitions between states are governed by

a Markov chain. Hamilton (1989a) applied a two-state MS model to quarterly U.S. GNP

growth rates in order to identify business cycles and estimate 1st-order Markov transition

probabilities associated with the expansion and recession phases of those cycles.

Stock markets are also perceived to have a cyclical pattern which can be captured

with regime-switching models. For example, Hamilton and Lin (1996) relate business

cycles and stock market regimes, while Chauvet and Potter (2000) and Maheu and McCurdy

(2000a) use a Markov-switching parameterization to analyze properties of bull and bear

market regimes extracted from aggregate stock market returns.¹ The latter paper allows

duration-dependent transition probabilities, as well as duration-dependent intra-state

dynamics for returns and volatilities. Lunde and Timmermann (2004) study duration

dependence after sorting stock returns into either a bull or bear market using their dating

algorithm. Ntantamis (2009) explores potential explanatory variables for stock market

regimes’ duration.

In a related literature that investigates cyclical patterns in a broader class of

assets, Guidolin and Timmermann (2005) use a 3-state regime-switching model to identify

bull and bear markets in monthly UK stock and bond returns and analyze implications

for predictability and optimal asset allocation. Guidolin and Timmermann (2006) add

an additional state in order to model the nonlinear joint dynamics of monthly returns

associated with small and large cap stocks and long-term bonds.

In contrast to the existing literature, our objective is to use higher-frequency weekly

¹ There are many other applications of regime-switching models to forcing processes for asset pricing models and to asset returns. Cecchetti et al. (1990), Kandel and Stambaugh (1990), Gordon and St-Amour (2000), Calvet and Fisher (2007), Lettau et al. (2008), Guidolin and Timmermann (2008) and David and Veronesi (2009), among others, derive implications of regime-switching for equilibrium asset prices. Examples for interest rates include Garcia and Perron (1996b) and Ang and Bekaert (2002c). Applications that explore the implications of nonlinearities due to regime switches for asset allocation and/or predictability of returns include Turner et al. (1989), van Norden and Schaller (1997), Maheu and McCurdy (2000b), Perez-Quiros and Timmermann (2001), Ang and Bekaert (2002b) and Guidolin and Timmermann (2007).


data and to provide a real-time approach to identifying phases of the market that relate

to investors’ perceptions of primary and secondary trends in aggregate stock returns.

Existing approaches do not explicitly model bull market corrections and bear market

rallies. Short-term reversals of the primary trend in high-frequency market

returns are an important empirical regularity that a model must capture if it is

to account for market dynamics.

We propose a latent 4-state Markov-switching model for weekly stock returns. Our

focus is on modeling the component states of bull and bear market regimes in order to

identify and forecast bull, bull correction, bear and bear rally states. The bear and bear

rally states govern the bear regime; the bull correction and bull states govern the bull

regime. The model can accommodate short-term reversals (secondary trends) within each

regime of the market. For example, in the bull regime it is possible to have a series of

persistent negative returns (a bull correction), despite the fact that the expected long-run

return (primary trend) is positive in that regime. Analogously, bear markets often exhibit

persistent rallies which are subsequently reversed as investors take the opportunity to sell

with the result that the average return in that regime is still negative.

It is important to note that our additional states allow for both intra- and inter-regime

transitions. A bear rally is allowed to move back to the bear state or to exit the bear

regime by moving to a bull state. Likewise, a bull correction can move back to the bull

state or exit the bull regime by transitioning to a bear state. This richer structure allows

regimes to feature several episodes of their component states. For example, a bull regime

can be characterized by a combination of bull states and bull corrections. Similarly, a

bear regime can consist of several episodes of the bear state and the bear rally state,

exactly as many investors feel we observe in the data. Because the realization of states

in a regime will differ over time, bull and bear regimes can be heterogeneous over time.

These important intra- and inter-regime dynamics are absent in the existing literature.

Our Bayesian estimation approach accounts for parameter and regime uncertainty and


provides probability statements regarding future regimes and returns. As noted above,

each bear and bull regime has two states. We identify the model by imposing the long-run

mean of returns to be negative in the bear regime and positive in the bull regime; while

allowing for very different dynamics within each regime. We consider several versions of

the model in which the variance dynamics are decoupled from the mean dynamics. We

find that a model in which the states associated with the first and second moment are

coupled provides the best fit to the data.

Applied to 125 years of data our model provides superior identification of trends in

stock prices. One important difference with our specification is that the richer dynamics

in each regime, facilitated by our 4-state model, allow us to extract bull and bear markets

in higher frequency data. As we show, a problem with a two-state Markov-switching

model applied to higher frequency data is that it results in too many switches between

the high and low return states. In other words, it is incapable of extracting the low

frequency trends in the market. In high frequency data it is important to allow for

short-term reversals in the regime of the market. Relative to a two-state model we find

that market regimes are more persistent and there is less erratic switching. According to

Bayes factors, our 4-state model of bull and bear markets is strongly favored over several

alternatives including a two-state model, and different variance dynamics.

Our results include probabilistic identification of bear, bear rally, bull correction and

bull states – as well as the characteristics of the associated bear and bull market regimes.

For instance, bull regimes have an average duration of just under 5 years, while the

duration of a bull correction is 4 months on average and a bear rally is just over half a

year. The cumulative return mean of the bull market state is 7.88% but bull corrections

offset this by 2.13% on average. Average cumulative return in the bear market state

is -12.4% but bear market rallies counteract that steep decline by yielding a cumulative

return of 7.1% on average. Note that these states are combined into bull and bear market

regimes in heterogeneous patterns over time yielding an average cumulative return in the


bull market regime of 33% while that for the bear market regime is about -10%. Also,

although the average cumulative return in the bear rally state is not much less than

that in the bull market state, the ex post Sharpe ratio for the latter is about 2.5 times

larger. This result highlights the importance of also considering assessments of volatility

associated with the alternative states, for example, when identifying bear market rallies

versus bull markets.

Of primary importance is the fact that our model can tell us the probabilities of

market states in real time, unlike dating algorithms. It can also produce out-of-sample

forecasts. For example, the model identifies in real time a transition from a bull market

correction to a bear market in early October 2008. The bear rally and bull correction

states are critical to modeling turning points between regimes; our results show that most

transitions between bull and bear regimes occur through these states. This is consistent

with investors’ perceptions.² Further, we find asymmetries in intra-regime dynamics; for

example, a bull market correction returns to the bull market state more often than a bear

market rally reverts to the bear state. These are important features that the existing

literature on bull and bear markets ignores.

Our Markov-switching structure provides a full description of the return distribution.

In an out-of-sample application, the probability statements concerning the predictive

density of returns are used to generate Value-at-Risk forecasts. This provides a simple

example of the economic value of our proposed model.

This chapter is organized as follows. The next section describes the data. Section 1.3

discusses two alternative ex post market regime dating algorithms. We use one of these

algorithms to sort actual data and data simulated from our candidate models in order to

determine whether the latter can match commonly perceived features of bull and bear

markets. Section 1.4 summarizes the benchmark 2-state model and develops our proposed

² A Google search turns up such headlines as: ’Bull Market or Bear-Market Rally?’, ’Genuine bull market, not a bear market rally’, ’A bear rally in bull’s clothing?’, ’Bear market rally/Bull market beginning?’, and many more.


4-state specification. Estimation and model comparison are discussed in Section 1.5.

Section 1.6 presents results including: parameter estimates; probabilistic identification

of the market states and regimes; and Value-at-Risk forecasts. Section 1.7 concludes.

1.2 Data

We begin with 125 years of daily capital gain returns on a broad market equity index.

Our source for the period 1926-2008 inclusive is the value-weighted return excluding

dividends associated with the CRSP S&P 500 index.³ The 1885:02-1925 daily capital

gain returns are courtesy of Bill Schwert (see Schwert (1990)). For 2009-2010, we use the

daily rates of change of the S&P 500 index level (SPX) obtained from Reuters.

Returns are converted to daily continuously compounded returns from which we

construct weekly continuously compounded returns by cumulating daily returns from

Wednesday close to Wednesday close of the following week. If a Wednesday is missing,

we use Tuesday close. If the Tuesday is also missing, we use Thursday. Weekly realized

variance (RV) is computed as the sum of daily (intra-week) squared returns.

Weekly returns are scaled by 100 so they are percentage returns. Unless otherwise

indicated, henceforth returns refer to weekly continuously compounded returns expressed

as a percentage. We have 6498 weekly observations covering the period February 25, 1885

to January 20, 2010. Summary statistics are shown in Table 1.1.
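The weekly aggregation described above can be sketched as follows. This is a minimal illustration rather than the code used for this chapter: the function name `weekly_returns`, the dictionary input format, and the choice to compute realized variance from daily returns that are already expressed in percent are all assumptions made for the example.

```python
from datetime import date, timedelta

def weekly_returns(daily):
    """daily: dict mapping a trading date to that day's continuously
    compounded return (in decimals). Returns a list of tuples
    (week_end, weekly_return_pct, realized_variance)."""
    days = sorted(daily)

    def week_end(wed):
        # Prefer the Wednesday close, then Tuesday, then Thursday.
        for cand in (wed, wed - timedelta(days=1), wed + timedelta(days=1)):
            if cand in daily:
                return cand
        return None  # no usable close around this Wednesday

    first = days[0]
    wed = first + timedelta(days=(2 - first.weekday()) % 7)  # Monday is 0
    ends = []
    while wed <= days[-1] + timedelta(days=1):
        end = week_end(wed)
        if end is not None:
            ends.append(end)
        wed += timedelta(days=7)

    out = []
    for prev, cur in zip(ends, ends[1:]):
        # Daily percentage returns after the previous close, up to this close.
        pct = [100.0 * daily[d] for d in days if prev < d <= cur]
        out.append((cur, sum(pct), sum(x * x for x in pct)))
    return out

# Hypothetical input: six trading days with a constant daily return.
example = {date(2010, 1, d): 0.001 for d in (6, 7, 8, 11, 12, 13)}
print(weekly_returns(example))
```

Summing daily log returns gives the weekly log return exactly, which is why cumulation rather than compounding is used here.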

1.3 Bull and Bear Dating Algorithms

Ex post sorting methods for classification of stock returns into bull and bear phases

are called dating algorithms. Such algorithms attempt to use a sequence of rules to

isolate patterns in the data. A popular algorithm is that used by Bry and Boschan

³ Note that this is the S&P 90 prior to March 4, 1957.


(1971) to identify turning points of business cycles. Pagan and Sossounov (2003) adapted

this algorithm to study the characteristics of bull/bear regimes in monthly stock prices.

First a criterion for identifying potential peaks and troughs is applied; then censoring

rules are used to impose minimum duration constraints on both phases and complete

cycles. Finally, an exception to the rule for the minimum length of a phase is allowed to

accommodate ’sharp movements’ in stock prices.

The Pagan and Sossounov (2003) BB algorithm is summarized in the appendix. There

are alternative dating algorithms or filters for identifying turning points. For example,

the Lunde and Timmermann (2004) (LT) algorithm identifies bull and bear markets using

a cumulative return threshold of 20% to locate peaks and troughs moving forward.⁴ They

define a binary market indicator variable $I_t$ which takes the value 1 if the stock market

is identified by their algorithm to be in a bull state at time $t$ and 0 if it is in a bear state.

Our application of this LT dating algorithm is also summarized in the appendix.
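To make the idea of a threshold filter concrete, the following is a simplified sketch in the spirit of the LT approach. It is not the algorithm as applied in this chapter (that version is the one summarized in the appendix and may differ in its details); the function name `threshold_dating`, the symmetric 20% defaults, and the assumption that the sample starts in a bull market are illustrative choices.

```python
def threshold_dating(prices, up=0.20, down=0.20, start_bull=True):
    """Return a list I with I[t] = 1 if period t is dated as bull, 0 if bear."""
    bull = start_bull
    indicator = [1 if bull else 0]
    ext = 0  # index of the running peak (bull) or running trough (bear)
    for t in range(1, len(prices)):
        if bull:
            if prices[t] >= prices[ext]:
                ext = t  # new running peak
            elif prices[t] <= (1.0 - down) * prices[ext]:
                # Price has fallen by `down` from the peak: the bear market
                # is dated from the period after the peak.
                for u in range(ext + 1, t):
                    indicator[u] = 0
                bull, ext = False, t
        else:
            if prices[t] <= prices[ext]:
                ext = t  # new running trough
            elif prices[t] >= (1.0 + up) * prices[ext]:
                # Price has risen by `up` from the trough: the bull market
                # is dated from the period after the trough.
                for u in range(ext + 1, t):
                    indicator[u] = 1
                bull, ext = True, t
        indicator.append(1 if bull else 0)
    return indicator

# Hypothetical price path with one peak (index 2) and one trough (index 4).
print(threshold_dating([100, 110, 120, 100, 90, 95, 110, 115]))
```

Note that a switch is only confirmed some periods after the turning point, at which stage earlier observations are relabelled. This is the sense in which such filters are ex post.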

The classification of our data into bull and bear regimes using these two filters is

found in Table 1.2. There are several features to note. First, the sorting of the data is

broadly similar but with important differences. For example, during the 1930s the BB

approach finds many more switches between market phases than does the LT algorithm.

More recently, both identify 1987-12 as a trough but the subsequent bull phase ends in

1990-06 for LT but 2000-03 for BB. The average bear duration is similar (66 weeks) while

the average bull duration is quite different, 117.0 weeks (BB) versus 166.7 (LT). In other

words, the different parameters and assumptions in the filtering methods can result in a

different classification of market phases.

Although the ex post dating algorithms can filter the data to locate different regimes,

they cannot be used for forecasting or inference. In addition, since the sorting rule

focuses on the first moment, it does not characterize the full distribution of returns.

⁴ Lunde and Timmermann (2004) explore alternative thresholds and also asymmetric thresholds for switching from bull versus from bear markets. For this description we use a threshold of 20%.


The latter is required if we wish to derive features of the regimes that are useful for

measuring and forecasting risk. Also, as noted above, ex post dating algorithms sort

returns into a particular regime with probability zero or one. However, the data provides

more information allowing one to estimate probabilities associated with particular states.

Nevertheless, the dating algorithms are still very useful. For example, we use the

LT algorithm to sort data simulated from our candidate parametric models in order to

determine whether the latter can match commonly perceived features of bull and bear

markets.

1.4 Models

In this section, we briefly review a benchmark two-state model, our proposed 4-state

model, and some alternative specifications of the latter used to evaluate robustness of

our best model.

1.4.1 Two-State Markov-Switching Model

The concept of bull and bear markets suggests cycles or trends that get reversed. Since

those regimes are not observable, as discussed in Section 1.1, two-state latent-variable

MS models have been applied to stock market data. A two-state 1st-order Markov model

can be written

$$r_t \mid s_t \sim N(\mu_{s_t}, \sigma^2_{s_t}) \tag{1}$$

$$p_{ij} = p(s_t = j \mid s_{t-1} = i) \tag{2}$$

for $i = 1, 2$ and $j = 1, 2$. We impose $\mu_1 < 0$ and $\mu_2 > 0$ so that $s_t = 1$ is the bear market and

$s_t = 2$ is the bull market.

Modeling of the latent regimes, regime probabilities, and state transition probabilities

allows explicit model estimation and inference. In addition, in contrast to dating

algorithms or filters, forecasts are possible. Investors can base their investment decisions

on the posterior states or the whole forecast density.
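As an illustration of how state probabilities can be updated in real time, the sketch below runs a forward filter for fixed parameter values, returning $p(s_t = j \mid r_1, \dots, r_t)$. It is only a sketch: in the Bayesian approach of this chapter the parameters are themselves uncertain, and the numerical values of `mu`, `sigma`, `P` and the initial distribution below are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def forward_filter(returns, mu, sigma, P, init):
    """Filtered probabilities p(s_t = j | r_1, ..., r_t) for fixed parameters.
    P[i][j] = p(s_t = j | s_{t-1} = i); init is the distribution of s_0."""
    K = len(mu)
    prev, path = list(init), []
    for r in returns:
        # One-step-ahead state probabilities implied by the Markov chain.
        pred = [sum(prev[i] * P[i][j] for i in range(K)) for j in range(K)]
        # Weight by the state-conditional normal density and renormalize.
        joint = [pred[j] * normal_pdf(r, mu[j], sigma[j]) for j in range(K)]
        total = sum(joint)
        prev = [x / total for x in joint]
        path.append(prev)
    return path

# Hypothetical values: state 1 is the bear market, state 2 the bull market.
mu, sigma = [-0.5, 0.3], [3.0, 1.5]
P = [[0.95, 0.05], [0.02, 0.98]]
probs = forward_filter([0.4, -8.0, -5.0], mu, sigma, P, [0.5, 0.5])
print(probs[-1])
```

The same recursion applies unchanged to a model with more states, since `K` is taken from the length of `mu`.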

1.4.2 MS-4 to allow Bull Corrections and Bear Rallies

Consider the following general K-state first-order Markov-switching model for returns

$$r_t \mid s_t \sim N(\mu_{s_t}, \sigma^2_{s_t}) \tag{3}$$

$$p_{ij} = p(s_t = j \mid s_{t-1} = i) \tag{4}$$

for $i = 1, \dots, K$ and $j = 1, \dots, K$. We explore a 4-state model, $K = 4$, in order to focus on modeling

potential phases of the aggregate stock market. Without any additional restrictions

we cannot identify the model and relate it to market phases. Therefore, we consider the

following restrictions. First, the states $s_t = 1, 2$ are assumed to govern the bear market;

we label these states as the bear regime. The states $s_t = 3, 4$ are assumed to govern the

bull market; these states are labeled the bull regime. Each regime has 2 states which

allows for positive and negative periods of price growth within each regime. In particular

we impose

$$\begin{aligned}
\mu_1 &< 0 \quad \text{(bear market state)}, \\
\mu_2 &> 0 \quad \text{(bear market rally)}, \\
\mu_3 &< 0 \quad \text{(bull market correction)}, \\
\mu_4 &> 0 \quad \text{(bull market state)}.
\end{aligned} \tag{5}$$

This structure can capture short-term reversals in market trends. Each state can have a

different variance and can accommodate autoregressive heteroskedasticity in returns. In

addition, conditional heteroskedasticity within each regime can be captured.


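Consistent with the 2 states in each regime the full transition matrix is

$$P = \begin{pmatrix}
p_{11} & p_{12} & 0 & p_{14} \\
p_{21} & p_{22} & 0 & p_{24} \\
p_{31} & 0 & p_{33} & p_{34} \\
p_{41} & 0 & p_{43} & p_{44}
\end{pmatrix}. \tag{6}$$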

This structure allows for several important features that are excluded in the smaller

Markov-switching models in the literature. First, a bear regime can feature several

episodes of the bear state and bear rally state, exactly as many investors feel we observe

in the data. Similarly, the bull regime can be characterized by a combination of bull

states and bull corrections. Because the realization of states in a regime will differ over

time, both bull and bear regimes will tend to look heterogeneous to some extent. For

instance, based on returns, a bear regime lasting 6 periods made up of the states

$$s_t = 1,\ s_{t+1} = 1,\ s_{t+2} = 1,\ s_{t+3} = 2,\ s_{t+4} = 2,\ s_{t+5} = 2,\ s_{t+6} = 4$$

will tend to look very different than

$$s_t = 1,\ s_{t+1} = 1,\ s_{t+2} = 1,\ s_{t+3} = 2,\ s_{t+4} = 1,\ s_{t+5} = 1,\ s_{t+6} = 4.$$

A second important contribution is that a bear rally is allowed to move either into

the bull state or back to the bear state; analogously, a bull correction can move to a bear

state or back to the bull state. These important inter- and intra-regime dynamics are

absent in the existing literature.
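The restricted transition structure can be illustrated by simulating a state path and mapping states to regimes. The sketch below uses a hypothetical transition matrix that shares the zero pattern of (6); the numerical entries, the starting state and the seed are assumptions chosen only for illustration and are not estimates from this chapter.

```python
import random

# States: 1 bear, 2 bear rally, 3 bull correction, 4 bull.
REGIME = {1: "bear", 2: "bear", 3: "bull", 4: "bull"}

# Hypothetical transition matrix with the zero pattern of (6).
P = [
    [0.90, 0.08, 0.00, 0.02],
    [0.05, 0.90, 0.00, 0.05],
    [0.02, 0.00, 0.90, 0.08],
    [0.01, 0.00, 0.04, 0.95],
]

def simulate_states(P, T, start=4, seed=0):
    """Simulate T states from the first-order Markov chain with matrix P."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(T - 1):
        row = P[states[-1] - 1]
        states.append(rng.choices([1, 2, 3, 4], weights=row)[0])
    return states

states = simulate_states(P, 2000)
regimes = [REGIME[s] for s in states]
print(states[:20])
print(regimes[:20])
```

Because of the zeros in the third column of rows 1 and 2 and in the second column of rows 3 and 4, a simulated bear regime can only be left through the bull state, and a bull regime only through the bear state.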

The unconditional probabilities associated with $P$ can be solved (Hamilton (1994))

$$\pi = (A'A)^{-1}A'e \tag{7}$$

where $A' = [P' - I, \iota]$, $e' = [0, 0, 0, 0, 1]$ and $\iota = [1, 1, 1, 1]'$.

Using the vector of unconditional state probabilities given by (7), we impose the

following conditions on long-run returns in the bear and bull regimes respectively,⁵

$$E[r_t \mid \text{bear regime}, s_t = 1, 2] = \frac{\pi_1}{\pi_1 + \pi_2}\mu_1 + \frac{\pi_2}{\pi_1 + \pi_2}\mu_2 < 0 \tag{8}$$

$$E[r_t \mid \text{bull regime}, s_t = 3, 4] = \frac{\pi_3}{\pi_3 + \pi_4}\mu_3 + \frac{\pi_4}{\pi_3 + \pi_4}\mu_4 > 0. \tag{9}$$

We do not impose any constraint on the variances.

Equations (5) and (6), along with equations (8) and (9), serve to identify⁶ bull

and bear regimes. The bull (bear) regime has a long-run positive (negative) return. Each

market regime can display short-term reversals that differ from its long-run mean. For

example, a bear regime can display a bear market rally (temporary period of positive

returns), even though its long-run return is negative. Similarly for the bull market.
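The sign restrictions in (5) and the long-run conditions (8) and (9) are simple to check for a given parameter vector. The sketch below does so; the function names and the numerical values are hypothetical, and how the restrictions are enforced during estimation is not shown here.

```python
def regime_means(pi, mu):
    """Long-run mean return in the bear and bull regimes, as in (8) and (9).
    pi and mu are ordered as bear, bear rally, bull correction, bull."""
    bear = (pi[0] * mu[0] + pi[1] * mu[1]) / (pi[0] + pi[1])
    bull = (pi[2] * mu[2] + pi[3] * mu[3]) / (pi[2] + pi[3])
    return bear, bull

def satisfies_identification(pi, mu):
    bear, bull = regime_means(pi, mu)
    signs_ok = mu[0] < 0 < mu[1] and mu[2] < 0 < mu[3]  # restrictions (5)
    return signs_ok and bear < 0 < bull                  # conditions (8), (9)

# Hypothetical unconditional probabilities and state means.
pi = [0.2, 0.1, 0.2, 0.5]
print(regime_means(pi, [-0.9, 0.2, -0.1, 0.3]))
print(satisfies_identification(pi, [-0.9, 0.2, -0.1, 0.3]))
```

A parameter vector can satisfy (5) and still violate (8), for example when the bear rally mean is large relative to the bear state mean, which is why both sets of conditions are needed.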

1.4.3 Other Models for Robustness Checks

Besides the 4-state model we consider several other specifications and provide model

comparisons among them. Dependence in the variance of returns is the most

dominant feature of the data. This structure may adversely dominate the dynamics of the

conditional mean. The following specifications are included to investigate this issue.

⁵ Note that at this point we are abstracting from an equilibrium model of investor behavior. Investors cannot identify states with probability 1 so modeling investors’ expected returns at each point is beyond the scope of this chapter. Regimes or states may have negative expected returns for some limited period for a variety of reasons such as changes in risk premiums due to learning following breaks (Pastor and Stambaugh (2001)), different investment horizons (Guidolin and Timmermann (2005)), etc.

⁶ Discrete mixtures of distributions are subject to identification issues. Label switching occurs when the states and parameters are permuted but the likelihood stays the same. Our prior restrictions avoid this issue and identify the model. For more discussion on this see Fruhwirth-Schnatter (2006).


Restricted 4-State Model

This is identical to the 4-state model in Section 1.4.2 except that inside a regime the

return innovations are homoskedastic. That is, $\sigma_1^2 = \sigma_2^2$ and $\sigma_3^2 = \sigma_4^2$. In this case, the

variance within each regime is restricted to be constant although the overall variance of

returns can change over time due to switches between regimes.

Markov-Switching Mean and i.i.d. Variance Model

In this model, the mean and variance dynamics are decoupled. This is a robustness

check to determine to what extent the variance dynamics might be driving the regime

transitions. This specification is identical to the Markov-switching model in Section 1.4.2

except that only the conditional mean follows the Markov chain while the variance follows

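an independent i.i.d. mixture. That is,

$$r_t \mid s_t = \mu_{s_t} + z_t \tag{10}$$

$$z_t \sim \sum_{i=1}^{L} \eta_i N(0, \sigma_i^2), \qquad \eta_i \ge 0, \qquad \sum_{i=1}^{L} \eta_i = 1 \tag{11}$$

$$p_{ij} = p(s_t = j \mid s_{t-1} = i), \qquad i, j = 1, \dots, K \tag{12}$$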

For identification, $\sigma_1^2 < \sigma_2^2 < \cdots < \sigma_L^2$ is imposed along with the constraints used for the

conditional mean in the previous section. We focus on the case K = 4 and L = 4, again

to allow us to capture at least four phases of cycles for aggregate stock returns.
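As a minimal sketch of equations (10)-(12), the following simulates returns whose conditional mean follows a Markov chain while the innovation is drawn from an independent normal mixture. The function name, the uniform initial state and all parameter values are illustrative assumptions, not estimates from this chapter.

```python
import numpy as np

def simulate_ms_mean_iid_var(T, mu, P, eta, sigma, rng):
    """Simulate r_t = mu[s_t] + z_t, eqs. (10)-(12).

    s_t follows a K-state Markov chain with transition matrix P;
    z_t is an i.i.d. L-component normal mixture, independent of s_t.
    """
    mu, P = np.asarray(mu, float), np.asarray(P, float)
    eta, sigma = np.asarray(eta, float), np.asarray(sigma, float)
    K, L = len(mu), len(eta)
    s = np.empty(T, dtype=int)
    s[0] = rng.integers(K)                 # assumption: uniform initial state
    for t in range(1, T):
        s[t] = rng.choice(K, p=P[s[t - 1]])
    comp = rng.choice(L, size=T, p=eta)    # mixture component, drawn i.i.d.
    z = rng.normal(0.0, sigma[comp])
    return mu[s] + z, s

# Hypothetical parameter values for K = 4 and L = 4 (states coded 0..3).
mu = [-0.9, 0.2, -0.1, 0.3]
P = [[0.90, 0.08, 0.00, 0.02],
     [0.015, 0.966, 0.00, 0.019],
     [0.01, 0.00, 0.94, 0.05],
     [0.01, 0.00, 0.05, 0.94]]
eta = [0.4, 0.3, 0.2, 0.1]
sigma = [1.0, 2.0, 3.0, 6.0]               # increasing, as the identification requires
r, s = simulate_ms_mean_iid_var(500, mu, P, eta, sigma, np.random.default_rng(0))
```

Because the mixture component is drawn independently of the state, any persistence in the simulated variance is absent by construction, which is what makes this specification a useful robustness check.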

1.5 Estimation and Model Comparison

1.5.1 Estimation

In this section we discuss Bayesian estimation for the most general model introduced in

Section 1.4.2 assuming there are K states, k = 1, ..., K. The other models are estimated


in a similar way with minor modifications.

There are 3 groups of parameters: $M = \{\mu_1, \dots, \mu_K\}$, $\Sigma = \{\sigma_1^2, \dots, \sigma_K^2\}$, and the elements of the transition matrix $P$. Let $\theta = \{M, \Sigma, P\}$ and, given data $I_T = \{r_1, \dots, r_T\}$, we augment the parameter space to include the states $S = \{s_1, \dots, s_T\}$ so that we sample from the full posterior $p(\theta, S \mid I_T)$. Assuming conditionally conjugate priors $\mu_i \sim N(m_i, n_i^2)$, $\sigma_i^{-2} \sim G(v_i/2, w_i/2)$ and each row of $P$ following a Dirichlet distribution allows for a

Gibbs sampling approach following Chib (1996). Gibbs sampling iterates on sampling

from the following conditional densities given startup parameter values for M , Σ and P :

S|M,Σ, P

M |Σ, P, S

Σ|M,P, S

P |M,Σ, S

Sequentially sampling from each of these conditional densities results in one iteration of

the Gibbs sampler. Dropping an initial set of draws to remove any dependence from

startup values, the remaining draws $\{S^{(j)}, M^{(j)}, \Sigma^{(j)}, P^{(j)}\}_{j=1}^{N}$ are collected to estimate

features of the posterior density. Simulation consistent estimates can be obtained as

sample averages of the draws. For example, the posterior means of the state dependent mean and standard deviation of returns are estimated as

$$\frac{1}{N} \sum_{j=1}^{N} \mu_k^{(j)}, \qquad \frac{1}{N} \sum_{j=1}^{N} \sigma_k^{(j)}, \tag{13}$$

for k = 1, ..., K and are simulation consistent estimates of $E[\mu_k \mid I_T]$ and $E[\sigma_k \mid I_T]$ respectively.

The first sampling step of S|M,Σ, P involves a joint draw of all the states. Chib (1996)

shows that this can be done by a so-called forward and backward smoother through the


identity

$$p(S \mid \theta, I_T) = p(s_T \mid \theta, I_T) \prod_{t=1}^{T-1} p(s_t \mid s_{t+1}, \theta, I_t). \tag{14}$$

The forward pass is to compute the Hamilton (1989a) filter for t = 1, ..., T

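$$p(s_t = k \mid \theta, I_{t-1}) = \sum_{l=1}^{K} p(s_{t-1} = l \mid \theta, I_{t-1})\, p_{lk}, \qquad k = 1, \dots, K, \tag{15}$$

$$p(s_t = k \mid \theta, I_t) = \frac{p(s_t = k \mid \theta, I_{t-1})\, f(r_t \mid I_{t-1}, s_t = k)}{\sum_{l=1}^{K} p(s_t = l \mid \theta, I_{t-1})\, f(r_t \mid I_{t-1}, s_t = l)}, \qquad k = 1, \dots, K. \tag{16}$$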

Note that $f(r_t \mid I_{t-1}, s_t = k)$ is the normal pdf $N(\mu_k, \sigma_k^2)$. Finally, Chib (1996) has shown

that a joint draw of the states can be taken sequentially from

p(st|st+1, θ, It) ∝ p(st|θ, It)p(st+1|st, P ), (17)

where the first term on the right-hand side is from (16) and the second term is from the

transition matrix. This is the backward step and runs from t = T − 1, T − 2, ..., 1. The

draw of sT is taken according to p(sT = k|θ, IT ), k = 1, ..., K.
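The forward and backward passes in (15)-(17) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the code used for the reported results: states are coded 0, ..., K−1, and the initial distribution `p0` (playing the role of the state probabilities before the first observation) is an input that the text does not specify.

```python
import numpy as np

def normal_pdf(r, mu, sigma):
    """Normal density N(mu_k, sigma_k^2) evaluated at r for every state k."""
    return np.exp(-0.5 * ((r - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def forward_filter(r, mu, sigma, P, p0):
    """Hamilton filter: returns p(s_t = k | theta, I_t) as a (T, K) array."""
    T, K = len(r), len(mu)
    filt = np.empty((T, K))
    pred = p0 @ P                          # eq. (15) for the first observation
    for t in range(T):
        joint = pred * normal_pdf(r[t], mu, sigma)
        filt[t] = joint / joint.sum()      # eq. (16)
        pred = filt[t] @ P                 # eq. (15) for the next observation
    return filt

def backward_sample(filt, P, rng):
    """Joint draw of s_1, ..., s_T using eq. (17), running from T-1 down to 1."""
    T, K = filt.shape
    s = np.empty(T, dtype=int)
    s[-1] = rng.choice(K, p=filt[-1])      # draw s_T from p(s_T | theta, I_T)
    for t in range(T - 2, -1, -1):
        w = filt[t] * P[:, s[t + 1]]       # p(s_t | I_t) * p(s_{t+1} | s_t, P)
        s[t] = rng.choice(K, p=w / w.sum())
    return s
```

Drawing the whole state path in one block, rather than one state at a time, is what distinguishes this step from a single-move sampler.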

The second and third sampling steps are straightforward and use results from the

linear regression model. Conditional on S we select the data in regime k and let the

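number of observations of st = k be denoted as Tk. Then $\mu_k \mid \Sigma, P, S \sim N(a_k, A_k)$, where

$$a_k = A_k \left( \sigma_k^{-2} \sum_{t \in \{t \mid s_t = k\}} r_t + n_k^{-2} m_k \right), \qquad A_k = \left( \sigma_k^{-2} T_k + n_k^{-2} \right)^{-1}. \tag{18}$$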

A draw of the variance is taken from

$$\sigma_k^{-2} \mid M, P, S \sim G\left( \frac{T_k + v_k}{2},\; \frac{\sum_{t \in \{t \mid s_t = k\}} (r_t - \mu_k)^2 + w_k}{2} \right). \tag{19}$$

Given the conjugate Dirichlet prior on each row of P , the final step is to sample


P |M,Σ, S from the Dirichlet distribution (Geweke (2005)).
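The three parameter draws can be sketched as below. The sketch assumes that G(a, b) in (19) is parameterized by shape a and rate b (so NumPy's scale argument is 1/b), that states are coded 0, ..., K−1, and that every Dirichlet parameter is strictly positive. It does not handle the zero restrictions on P or the rejection of draws that violate the prior restrictions; the helper names are hypothetical.

```python
import numpy as np

def mu_posterior(rk, sigma2_k, m_k, n2_k):
    """Posterior mean a_k and variance A_k of mu_k from eq. (18)."""
    A = 1.0 / (len(rk) / sigma2_k + 1.0 / n2_k)
    a = A * (np.sum(rk) / sigma2_k + m_k / n2_k)
    return a, A

def draw_mu(r, s, k, sigma2_k, m_k, n2_k, rng):
    """Draw mu_k | Sigma, P, S using the observations with s_t = k."""
    a, A = mu_posterior(r[s == k], sigma2_k, m_k, n2_k)
    return rng.normal(a, np.sqrt(A))

def draw_sigma2(r, s, k, mu_k, v_k, w_k, rng):
    """Draw sigma_k^2 by sampling the precision from eq. (19)."""
    rk = r[s == k]
    shape = (len(rk) + v_k) / 2.0
    rate = (np.sum((rk - mu_k) ** 2) + w_k) / 2.0
    precision = rng.gamma(shape, 1.0 / rate)   # NumPy uses scale = 1 / rate
    return 1.0 / precision

def draw_P(s, alpha, rng):
    """Draw each row of P from Dirichlet(prior row + transition counts)."""
    K = alpha.shape[0]
    counts = np.zeros((K, K))
    np.add.at(counts, (s[:-1], s[1:]), 1.0)    # count i -> j transitions in S
    return np.vstack([rng.dirichlet(alpha[i] + counts[i]) for i in range(K)])
```

One sweep of the sampler would call these after the joint state draw, cycling over k = 1, ..., K for the means and variances.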

An important byproduct of Gibbs sampling is an estimate of the smoothed state

probabilities $p(s_t \mid I_T)$ which can be estimated as

$$p(s_t = i \mid I_T) = \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}_{s_t = i}(S^{(j)}) \tag{20}$$

for i = 1, ..., K.
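Equation (20) amounts to counting, at each date, the share of retained state draws that equal i. A minimal sketch, assuming the draws are stored as an N × T integer array with states coded 0, ..., K−1 (the storage layout is an assumption):

```python
import numpy as np

def smoothed_probs(S_draws, K):
    """Estimate p(s_t = i | I_T) as the frequency of state i across Gibbs draws."""
    S_draws = np.asarray(S_draws)          # shape (N, T)
    return np.stack([(S_draws == k).mean(axis=0) for k in range(K)], axis=1)

# Toy example with N = 4 draws of a T = 3 state path.
S_draws = [[0, 0, 1],
           [0, 1, 1],
           [0, 1, 1],
           [1, 1, 1]]
probs = smoothed_probs(S_draws, K=2)       # shape (T, K); each row sums to 1
```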

At each step, if a parameter draw violates any of the prior restrictions in (5), (6), (8)

and (9), then it is discarded. For the 4-state model we set the independent priors as

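$$\mu_1 \sim N(-0.7, 1), \quad \mu_2 \sim N(0.2, 1), \quad \mu_3 \sim N(-0.2, 1), \quad \mu_4 \sim N(0.3, 1) \tag{21}$$

$$\sigma_i^{-2} \sim G(0.5, 0.05) \quad \text{for } i = 1, 2, 3, 4 \tag{22}$$

$$\{p_{11}, p_{12}, p_{14}\} \sim \mathrm{Dir}(8, 1.5, 0.5), \qquad \{p_{21}, p_{22}, p_{24}\} \sim \mathrm{Dir}(1.5, 8, 0.5) \tag{23}$$

$$\{p_{31}, p_{33}, p_{34}\} \sim \mathrm{Dir}(0.5, 8, 1.5), \qquad \{p_{41}, p_{43}, p_{44}\} \sim \mathrm{Dir}(0.5, 1.5, 8). \tag{24}$$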

These priors are informative but cover a wide range of empirically relevant parameter

values.7

1.5.2 Model Comparison

If the marginal likelihood can be computed for a model it is possible to compare models

based on Bayes factors. Non-nested models can be compared as well as specifications with

a different number of states. Note that the Bayes factor penalizes over-parameterized

models that do not deliver improved predictions.8 For the general Markov-switching

7 We checked the prior sensitivity. This includes scaling the variance of the µi’s by 10, increasing the variance of the precision $\sigma_i^{-2}$ by 10 times, setting the transition probabilities to have uniform distributions, and all combinations of these priors. The sorting of the data into bull and bear regimes is robust and the model comparison results are consistent.

8 This is referred to as an Ockham’s razor effect. See Kass and Raftery (1995b) for a discussion on the benefits of Bayes factors.


model with K states, the marginal likelihood for model Mi is defined as

$$p(r \mid M_i) = \int p(r \mid M_i, \theta)\, p(\theta \mid M_i)\, d\theta \tag{25}$$

which integrates out parameter uncertainty. p(θ|Mi) is the prior and

$$p(r \mid M_i, \theta) = \prod_{t=1}^{T} f(r_t \mid I_{t-1}, \theta) \tag{26}$$

is the likelihood which has S integrated out according to

$$f(r_t \mid I_{t-1}, \theta) = \sum_{k=1}^{K} f(r_t \mid I_{t-1}, \theta, s_t = k)\, p(s_t = k \mid \theta, I_{t-1}). \tag{27}$$

The term p(st = k|θ, It−1) is available from the Hamilton filter. Chib (1995) shows how

to estimate the marginal likelihood for MS models. His estimate is based on re-arranging

Bayes’ theorem as

$$p(r \mid M_i) = \frac{p(r \mid M_i, \theta^*)\, p(\theta^* \mid M_i)}{p(\theta^* \mid r, M_i)} \tag{28}$$

where θ∗ is a point of high mass in the posterior pdf. The terms in the numerator are

directly available above while the denominator can be estimated using additional Gibbs

sampling runs.9
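The likelihood in (26)-(27) and the log form of identity (28) can be sketched as follows. The initial state distribution `p0` is an assumed input, and the posterior ordinate at θ∗ is taken as given, since estimating it requires the additional Gibbs runs mentioned above, which are not shown.

```python
import numpy as np

def log_likelihood(r, mu, sigma, P, p0):
    """log p(r | M_i, theta) with the states integrated out, eqs. (26)-(27)."""
    pred = p0 @ P                          # p(s_1 = k | theta, I_0)
    ll = 0.0
    for rt in r:
        dens = np.exp(-0.5 * ((rt - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        ft = np.sum(dens * pred)           # eq. (27)
        ll += np.log(ft)
        pred = (dens * pred / ft) @ P      # filter update, then one-step prediction
    return ll

def chib_log_marginal(loglik_star, logprior_star, logpost_star):
    """Log of eq. (28), with every term evaluated at theta*."""
    return loglik_star + logprior_star - logpost_star
```

Working in logs avoids underflow, which matters here because the likelihood is a product over several thousand weekly observations.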

A log-Bayes factor between model Mi and Mj is defined as

log(BFij) = log(p(r|Mi))− log(p(r|Mj)). (29)

Kass and Raftery (1995b) suggest interpreting the evidence for Mi versus Mj as: not

worth more than a bare mention for 0 ≤ log(BFij) < 1; positive for 1 ≤ log(BFij) < 3;

9 The integrating constant in the prior pdf is estimated by simulation.


strong for 3 ≤ log(BFij) < 5; and very strong for log(BFij) ≥ 5.
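As a small illustration of (29) and the interpretation scale above, the sketch below grades a log-Bayes factor. The function names are hypothetical; the two log marginal likelihoods in the usage lines are the values for the 4-state and 2-state models quoted in Section 1.6.2.

```python
def log_bayes_factor(log_ml_i, log_ml_j):
    """log(BF_ij) = log p(r | M_i) - log p(r | M_j), eq. (29)."""
    return log_ml_i - log_ml_j

def evidence_label(log_bf):
    """Grade the evidence for M_i versus M_j on the scale quoted in the text."""
    if log_bf < 0:
        return "favors M_j (swap the models to grade the evidence)"
    if log_bf < 1:
        return "not worth more than a bare mention"
    if log_bf < 3:
        return "positive"
    if log_bf < 5:
        return "strong"
    return "very strong"

log_bf = log_bayes_factor(-13740.4, -13903.3)   # MS-4 versus MS-2
label = evidence_label(log_bf)
```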

1.5.3 Predictive Density

An important feature of our probabilistic approach is that a predictive density of future returns can be computed that integrates out all uncertainty regarding states and

parameters.

The predictive density for future returns based on current information at time t is

computed as

$$p(r_t \mid I_{t-1}) = \int f(r_t \mid I_{t-1}, \theta)\, p(\theta \mid I_{t-1})\, d\theta \tag{30}$$

which involves integrating out both state and parameter uncertainty using the posterior distribution $p(\theta \mid I_{t-1})$. From the Gibbs sampling draws $\{S^{(i)}, M^{(i)}, \Sigma^{(i)}, P^{(i)}\}_{i=1}^{N}$ based on data $I_{t-1}$ we approximate the predictive density as

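$$p(r_t \mid I_{t-1}) = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} f(r_t \mid I_{t-1}, \theta^{(i)}, s_t = k)\, p(s_t = k \mid s_{t-1}^{(i)}, \theta^{(i)}) \tag{31}$$

where $f(r_t \mid I_{t-1}, \theta^{(i)}, s_t = k)$ follows $N(\mu_k^{(i)}, \sigma_k^{2(i)})$ and $p(s_t = k \mid s_{t-1}^{(i)}, \theta^{(i)})$ is the transition probability.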

The predictive mean of a future state $s_t$ can also be easily estimated by simulating from the distribution $p(s_t = k \mid s_{t-1}^{(i)}, \theta^{(i)})$ a state $s_t^{(i)}$ for each state and parameter draw $\{s_{t-1}^{(i)}, \theta^{(i)}\}$. The average of these draws, $\{s_t^{(i)}\}_{i=1}^{N}$, is an estimate of $E[s_t \mid I_{t-1}]$.
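The approximation in (31) can be sketched as an average of transition-weighted normal mixtures, one per Gibbs draw. The storage layout (lists of per-draw arrays, states coded 0, ..., K−1) and the two toy draws below are assumptions for illustration only.

```python
import numpy as np

def predictive_density(r_grid, mu_draws, sigma_draws, P_draws, s_prev_draws):
    """Evaluate eq. (31) at each point of r_grid.

    Draw i contributes sum_k N(r | mu_k, sigma_k^2) * p(s_t = k | s_{t-1}, P),
    using that draw's parameters and its sampled state s_{t-1}.
    """
    r_grid = np.atleast_1d(np.asarray(r_grid, float))
    N = len(s_prev_draws)
    dens = np.zeros_like(r_grid)
    for i in range(N):
        mu_i = np.asarray(mu_draws[i], float)
        sig_i = np.asarray(sigma_draws[i], float)
        probs = np.asarray(P_draws[i], float)[s_prev_draws[i]]   # row of P
        z = (r_grid[:, None] - mu_i[None, :]) / sig_i[None, :]
        pdf = np.exp(-0.5 * z ** 2) / (sig_i[None, :] * np.sqrt(2.0 * np.pi))
        dens += pdf @ probs
    return dens / N

# Two hypothetical Gibbs draws for a 2-state model.
mu_draws = [np.array([-0.5, 0.2]), np.array([-0.4, 0.25])]
sigma_draws = [np.array([4.0, 1.6]), np.array([4.5, 1.7])]
P_draws = [np.array([[0.94, 0.06], [0.01, 0.99]])] * 2
s_prev_draws = [1, 0]
grid = np.linspace(-40.0, 40.0, 4001)
dens = predictive_density(grid, mu_draws, sigma_draws, P_draws, s_prev_draws)
```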


1.6 Results

1.6.1 Parameter Estimates and Implied Distributions

Model estimates for the 2-state Markov-switching (MS-2) model are found in Table 1.3.

State 1 has a negative conditional mean along with a high conditional variance whereas

state 2 displays a high conditional mean with a low conditional variance. Both regimes

are very persistent. These results are consistent with the sorting of bull and bear regimes

in Maheu and McCurdy (2000a) and Guidolin and Timmermann (2005).

Estimates for our proposed 4-state model (MS-4) are found in Table 1.4. All param-

eters are precisely estimated indicating that the data are quite informative. Recall that

states st = 1, 2 capture the bear regime while states st = 3, 4 capture the bull regime.

Each regime contains a state with a positive and a negative conditional mean. We label

states 1 and 2 the bear and bear rally states respectively; states 3 and 4 are the bull

correction and bull states.

Consistent with the MS-2 model, volatility is highest in the bear regime. In particular,

the highest volatility occurs in the bear regime in state 1. This state also delivers the

lowest average return. The highest average return and lowest volatility are in state 4

which is part of the bull regime. The bear rally state (st = 2) delivers a conditional

mean of 0.23 and conditional standard deviation of 2.63. However, this mean is lower

and the volatility higher than the bull positive growth state (st = 4). Analogously, the

bull correction state (st = 3) has a larger conditional mean (−0.13 > −0.94) and smaller

volatility (2.18 < 6.01) than the bear state 1.

All states display high persistence (pii is high for all i). However, the transition

probabilities display some asymmetries. For example, the probability of a bear rally

moving back to the bear state 1 (p21 = 0.015) is a little lower than changing regime to

a bull market (p24 = 0.019). On the other hand, the probability of a bull correction

returning to a bull market (p34 = 0.051) is considerably higher than changing regime to


the bear state (p31 = 0.010).

Figure 1.1 displays the density of each of the 4 states. The differences in the illustrated

densities are in accord with the parameter estimates in Table 1.4. Differences in the

spreads of the densities are most apparent but the locations are also different. There is

no suggestion from these plots that states 1 and 2 are the same or that states 3 and 4

are the same, as a two-state Markov-switching model would assume.

Integrating over states 1 and 2 gives the bear regime and doing the same for states 3 and 4

produces the bull regime. These densities are shown in Figure 1.2. The bear regime has a

mean slightly below 0 but with a much larger variance than the bull regime. The implied

unconditional density of returns is a mixture of these two regimes and displayed in the

middle of the figure. Table 1.5 reports the unconditional probabilities for the states. On

average the market spends a fraction 0.157 of the time in a bear rally and 0.304 in a bull correction.

The most time is spent in the bull growth state 4. The unconditional probability of the

bull regime is 0.773.

A comparison of the regime statistics implied by the parameter estimates for the MS-2

and MS-4 models is found in Table 1.6. The expected duration of regimes is much longer

in the 4-state model. That is, by allowing heterogeneity within a regime in our 4-state

model, we switch between bull and bear markets less frequently. For instance, in an MS-2

parameterization the bull market has a duration of only 82.6 weeks, about 18 months,

while the richer MS-4 model has a bull duration of just under 5 years. As we will see

below, there is much more switching between regimes in the MS-2 model.

In the 2-state model, the expected return and variance are fixed within a regime. In

this case, the only source of intra-regime variance is return innovations. For example, for

the bear regime in the MS-2 model, the expected variance is E[Var(rt|st = 1)] = 19.6.

In contrast, the average variance for each regime in the 4-state model can be attributed

to changes in the conditional mean as well as to the average conditional variance of the

return innovations. For instance, the average variance of returns in the bear regime can


be decomposed as Var(rt|st = 1, 2) = Var(E[rt|st]|st = 1, 2) + E[Var(rt|st)|st = 1, 2] =

0.31 + 16.1, with a similar result for the bull regime. For both the bull and bear regimes, the

mean dynamics account for only a small share, about 2% or less, of the total variance.10
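The decomposition for the bear regime can be checked approximately from the rounded numbers quoted in this section. In the sketch below, the state means and standard deviations are those reported for states 1 and 2, the bear rally probability 0.157 is as reported, and the state 1 probability 0.070 is derived here as 1 − 0.773 − 0.157. That derivation and the function name are assumptions, so the output only approximates the reported 0.31 and 16.1.

```python
import numpy as np

def regime_variance_decomposition(pi, mu, sigma):
    """Split Var(r | regime) into variance of state means and mean of state variances."""
    pi, mu, sigma = (np.asarray(x, float) for x in (pi, mu, sigma))
    w = pi / pi.sum()                      # state probabilities within the regime
    regime_mean = w @ mu
    between = w @ (mu - regime_mean) ** 2  # Var(E[r_t | s_t] | regime)
    within = w @ sigma ** 2                # E[Var(r_t | s_t) | regime]
    return between, within

# Bear regime: state 1 (bear) and state 2 (bear rally), rounded inputs.
between, within = regime_variance_decomposition(
    pi=[0.070, 0.157], mu=[-0.94, 0.23], sigma=[6.01, 2.63])
share = between / (between + within)       # share of variance due to mean dynamics
```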

The MS-2 model assumes normality in both market regimes while the MS-4 shows

that the data is at odds with this assumption. Skewness is present in bear markets while

excess kurtosis is found in both bull and bear regimes. Overall the bear market deviates

more from a normal distribution; it has thicker tails and captures more extreme events.

Table 1.7 summarizes features of the MS-4 parameterization for both the regimes and

their component states derived from the posterior parameter estimates. The bear regime

duration is 77.8 weeks, much shorter than the bull regime duration of 256.0 weeks. The

average cumulative return in the bear (bull) regime11 is -9.94 (33.0). The volatility in

the bear market is more than twice that in the bull market. The third panel provides

a breakdown of cumulative return means in each of the component states of the market

regimes. The bear rally yields a cumulative return of 7.10 on average which partially

offsets the average decline of -12.4 in state 1. On the other hand the bull correction has

a cumulative return mean of -2.13 which diminishes the average cumulative return of

7.88 in state 4. Note that these states are combined into bull and bear market regimes

in heterogeneous patterns over time yielding the statistics for regimes summarized in the

first two panels of Table 1.7.

Although the stock market spends most of the time in the bull regime (states 3 and

4), in terms of individual states it is state 2 that has the longest duration while the

shortest is state 1. The final panel of Table 1.7 records the conditional mean divided by

the associated conditional standard deviation for each state, that is, estimates of µi/σi

from Table 1.4. This is analogous to an ex post Sharpe ratio. State 4 provides the most

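10 This is computed as 0.31/(0.31+16.1) and 0.04/(0.04+2.89).

11 This is equal to the expected return for the bear regime, given by Equation (8), times the expected duration for that regime which is

$$\begin{pmatrix} \frac{\pi_1}{\pi_1+\pi_2} \\ \frac{\pi_2}{\pi_1+\pi_2} \end{pmatrix}' \left[ I_2 - \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix} \right]^{-2} \begin{pmatrix} p_{14} \\ p_{24} \end{pmatrix}.$$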


favorable risk-return tradeoff followed by state 2, 3 and 1. Note that the Sharpe ratio

in the bull state 4 is approximately 2.5 times larger than in the bear rally (state 2). In

other words, even though the bear rally delivers a positive expected return, that return

is much more variable than in the bull state.
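The expected regime duration formula in footnote 11 can be sketched numerically as below. Here `Q` is the within-regime block of the transition matrix, `exit_probs` holds the probabilities of leaving the regime from each state, and `entry_probs` is the weighting vector from the footnote. In the usage lines, p21 = 0.015 and p24 = 0.019 are the reported values, while the first row of `Q`, the first exit probability and the state 1 probability 0.070 are illustrative assumptions, so the result is not expected to reproduce Table 1.7.

```python
import numpy as np

def expected_regime_duration(entry_probs, Q, exit_probs):
    """entry' (I - Q)^{-2} exit: expected number of periods spent in the regime."""
    Q = np.asarray(Q, float)
    M = np.linalg.inv(np.eye(len(Q)) - Q)
    return np.asarray(entry_probs, float) @ M @ M @ np.asarray(exit_probs, float)

# Bear regime block (states 1 and 2); first row is hypothetical.
Q = [[0.90, 0.09],
     [0.015, 0.966]]
exit_probs = [0.01, 0.019]                 # (p14, p24); p14 is hypothetical
pi = np.array([0.070, 0.157])              # (pi_1, pi_2); pi_1 is an assumption
duration = expected_regime_duration(pi / pi.sum(), Q, exit_probs)
```

For a single state with staying probability q the formula collapses to 1/(1 − q), the familiar expected duration of a geometric spell.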

1.6.2 Model Comparisons

One can conduct formal model comparisons based on the marginal likelihoods reported

in Table 1.8. The constant mean and variance model performs the worst (has the lowest

marginal likelihood). The next model has a constant mean but allows the variances

to follow a 4-state i.i.d. mixture. Following this are models with a 2-state versus a 4-

state Markov-switching conditional mean – both combined with a 4-state i.i.d. variance

as in Section 1.4.3. In both cases, the additional dynamics that are introduced to the

conditional mean of returns provide a significant improvement over the constant mean

case with the same 4-state i.i.d. variance. However, all of these specifications are strongly

dominated by their counterparts which allow a common 2 (or 4) state Markov chain

to direct both conditional moments. These specifications capture persistence in the

conditional variance.

Note that the log-Bayes factor between the 2-state MS and the 4-state MS in the

conditional mean restricted to have only a 2-state conditional variance (Section 1.4.3)

is large at 53.4 = −13849.9 − (−13903.3). This improved fit comes when additional

conditional mean dynamics (going from 2 to 4 states) are added to the basic 2-state MS

model. The best model is the 4-state Markov-switching model. The log-Bayes factor in

support of the 4-state versus the 2-state model is 162.9 = −13740.4 − (−13903.3). The

zero restrictions in the transition matrix (6) are also strongly supported by the data. For

instance, the log-Bayes factor is 6.9 = −13740.4 − (−13747.3) in support of the MS-4

model with P matrix (6) as compared to a 4 state model with an unrestricted transition

matrix (all 16 elements of P are estimated).


Overall, there is very strong evidence that the 4-state specification of Section 1.4.2

provides the best fit to weekly returns. The comparisons also show that this improved fit

comes from improved fit to both the conditional mean and variance. Not only does our

MS-4 model provide a better economic characterization of differences in stock market

cycles but the model statistically dominates other alternatives.

The Markov-switching models specify a latent variable that directs low frequency

trends in the data. As such, the regime characteristics from the population model are

not directly comparable to the dating algorithms of Section 1.3. Instead, we consider

the dating algorithm as a lens to view both the S&P500 data and data simulated from

our preferred MS-4 model. Using parameter draws from the Gibbs sampler, we simulate

return data from the model and then apply the LT dating algorithm to those simulated

returns. This is done many times12 and the average and 0.70 density intervals of these

statistics are reported in Table 1.9 along with the statistics from the S&P500 data.

Although our model provides a richer 4 state description of bull and bear markets it does

account for all of the data statistics associated with a simpler 2 state view of the market

using the LT dating algorithm.

1.6.3 Identification of Historical Turning Points in the Market

The dating of the market regimes using the LT dating algorithm is found in the top panel

of Figure 1.3. The shaded portions under the cumulative return denote bull markets while

the white portions of the figure are the bear markets. Below this panel is the smoothed

probability of a bull market, p(st = 3|IT ) + p(st = 4|IT ) for the 4-state model. The final

plot in Figure 1.3 is the smoothed probability of a bull market, p(st = 2|IT ) from the

2-state model. The 4-state model produces less erratic shifts between market regimes,

closely matches the trends in prices, and generally corresponds to the dating algorithm.

The 2-state model is less able to extract the low frequency trends in the market. In

12 10,000 simulations, each of 6498 observations.


high frequency data it is important to allow intra-regime dynamics, such as short-term

reversals.

Note that the success of our model should not be based on how well it matches

the results from dating algorithms. Rather this comparison is done to show that the

latent-state MS models can identify bull and bear markets with similar features to those

identified by conventional dating algorithms. Beyond that, the Markov-switching models

presented in this chapter provide a superior approach to modeling stock market trends

as they deliver a full specification of the distribution of returns along with latent market

dynamics. Such an approach permits out-of-sample forecasting which we turn to in

Section 1.6.4.

The following subsections discuss how our model identifies sub-regime dynamics using

examples from various subperiods. There are several important points revealed by this

discussion. First, bear (bull) markets are persistent but are made up of many regular transitions

between states 1 and 2 (3 and 4). Second, in each of the examples the move between

regimes occurs through either the bear rally or the bull correction state. In other words,

these additional dynamics are critical to fully capturing turning points in stock market

cycles. This is also borne out by our model estimates. The most likely route for a bear

market to go to a bull market is through the bear rally state. Given that a bull market

has just started, the probability is 0.9342 that the previous state was a bear rally13, and

only 0.0658 that it was a bear state. Similarly, given that a bear market has just started,

the probability is 0.8663 that the previous state was a bull correction, and only 0.1337

that it was a bull state. The following subperiod descriptions provide examples of this

richer specification of turning points plus frequent reversals within a regime.

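13 $p(s_t = 2 \mid s_{t+1} = 4, s_t = 1 \text{ or } 2) \propto p_{24} \frac{\pi_2}{\pi_1+\pi_2}$, $\quad p(s_t = 1 \mid s_{t+1} = 4, s_t = 1 \text{ or } 2) \propto p_{14} \frac{\pi_1}{\pi_1+\pi_2}$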
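The calculation in footnote 13 is a direct application of Bayes' rule and can be sketched as follows. The common factor 1/(π1 + π2) cancels on normalization. In the usage lines, p24 = 0.019 and π2 = 0.157 are reported in this chapter, whereas p14 = 0.003 and π1 = 0.070 are illustrative assumptions, so the output should not be read as a reproduction of the probabilities quoted above.

```python
def previous_state_probs(pi, p_exit):
    """p(s_t = i | s_{t+1} = target), proportional to p_exit[i] * pi[i]."""
    weights = [p * q for p, q in zip(pi, p_exit)]
    total = sum(weights)
    return [w / total for w in weights]

# Which bear-regime state preceded the start of a bull market (state 4)?
prob_bear, prob_rally = previous_state_probs(
    pi=[0.070, 0.157],        # (pi_1, pi_2); pi_1 is an assumption
    p_exit=[0.003, 0.019])    # (p14, p24); p14 is an assumption
```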


1927-1939

Figure 1.4 displays the log-price and the realized volatility (square root of realized vari-

ance) in the top panel, the smoothed states of the MS-4 model in the second panel, and

the posterior probability of the bull market, p(st = 3|IT )+p(st = 4|IT ), in the last panel.

Just before the crash of 1929 the model identifies a bull correction state. The tran-

sition from a bull to bear market occurs as a move from a bull market state to a bull

correction state and then into the bear regime. For the week ending October 16 1929,

there was a return of -3.348 and the market transitioned from the bull correction state

into the bear market state with p(st = 1|IT ) = 0.63. This is further reinforced so that

the next 5 weeks have essentially probability 1 for state 1.

As this figure shows, the remainder of this subperiod is decisively a bear market,

but displays considerable heterogeneity in that there are several short-lived bear rallies.

The high levels of realized volatility coincide with the high volatility in the bear market

states. Periods of somewhat lower volatility are associated with the bear rally states.

In Figure 1.4, a strong bear rally begins in late November 1933 and lasts until August

25, 1937, at which time there is a move back into the bear market state. Realized

volatility increases with this move into state 1.

1980-1985

In Figure 1.5, the market displays several moves between the bull market state and the

bull correction state before a short-term move into a bear market in August of 1982.

Once again the transition from a bull to bear market is through a bull correction state.

However, the bear market that emerges has state 1 that lasts only about 4 weeks. This

is followed by a bear rally that results in increased prices accompanied with substantial

volatility. The bear rally turns into a bull market in late April of 1983, thereafter are

periods of the bull market state and bull corrections.


1987 crash

Prior to the 1987 crash there is a dramatic run-up in stock prices with generally low

volatility, as illustrated in the top panel of Figure 1.6. It is interesting to note that the

model shows a great deal of uncertainty about the state of the market well before the

crash. In the first week of October, just before the crash, the most likely state is the bull

correction with p(st = 3|IT ) = 0.37. The bear state which starts the following week lasts

for about 5 weeks after which a strong bear rally quickly emerges as of the week ending

November 18, 1987. It is the bear rally state that exits into a bull market during the

week of August 17, 1988. Prices resume their strong increase until they plateau with a

bull correction beginning the week of October 4, 1989.

2006-2010

We conclude with an analysis of recent market activity in Figure 1.7. The bull market

state turned into a bull correction in mid-July 2007, which persisted until an abrupt move

into the bear market state in early September 2008. This transition was accompanied

by a dramatic increase in realized volatility. According to our model, the bear market

became a bear market rally in the third week of March 2009 where it stayed until mid-

November 2009 when it moved into the bull market state. As noted earlier, the positive

trend in returns during a bear market rally does not get interpreted as a bull market until

the market volatility declines to levels more typical of bull markets.

1.6.4 Example Application

An industry standard measure of potential portfolio loss is the Value-at-Risk (VaR).

VaR(α),t is defined as the 100α percent quantile of the portfolio value or return distribution

given information at time t− 1. We compute VaR(α),t from the predictive density of the


MS-4 model as

p(rt < VaR(α),t|It−1) = α. (32)

Given a correctly specified model, the probability of a return of VaR(α),t or less is α.

To compute the Value-at-Risk from the MS-4 model we do the following. First, N draws from the predictive density are taken as follows: draw θ and $s_{t-1}$ from the Gibbs sampler, simulate a future state $s_t$ based on P, and then draw $r_t \mid s_t \sim N(\mu_{s_t}, \sigma_{s_t}^2)$. The details are discussed in Section 1.5.3. Second, from the resulting draws, the $r_t$ with rank [Nα] is an estimate of VaR(α),t.
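The simulation steps above can be sketched as follows. The sketch reads [Nα] as the integer part of Nα and assumes the posterior draws are stored as lists of per-draw arrays with states coded 0, ..., K−1; both are assumptions about details the text leaves open.

```python
import numpy as np

def simulate_var(alpha, mu_draws, sigma_draws, P_draws, s_prev_draws, rng):
    """Estimate VaR_(alpha),t from one simulated return per posterior draw."""
    N = len(s_prev_draws)
    r = np.empty(N)
    for i in range(N):
        probs = np.asarray(P_draws[i], float)[s_prev_draws[i]]
        s_t = rng.choice(len(probs), p=probs)       # simulate the future state
        r[i] = rng.normal(mu_draws[i][s_t], sigma_draws[i][s_t])
    r.sort()
    rank = max(int(np.floor(N * alpha)), 1)         # rank [N * alpha], at least 1
    return r[rank - 1]
```

Because each simulated return uses a different parameter and state draw, the resulting quantile reflects parameter and regime uncertainty as well as return innovations.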

Figure 1.8 displays the conditional VaR from January 3, 2007 to January 20, 2010

predicted by the MS-4 model, as well as that implied by the normal benchmark for

α = 0.05. At each point the model is estimated based on information up to t − 1.

Similarly, the benchmark, $N(0, \sigma^2)$, sets $\sigma^2$ to the sample variance using $I_{t-1}$.

The normal benchmark overestimates the VaR for the early part of this subsample but

starts to understate it at times, beginning in mid-2007, and then severely underestimates it

in the last few months of 2008. The MS-4 model provides a very different VaR(.05),t over

time because it takes into account the predicted regime, as indicated by the middle and

bottom panels of Figure 1.8 which show forecasts of the states and regimes respectively.

Note that the potential losses, shown in the top panel, increase considerably in September

and October 2008 as the model identifies a move from a bull to a bear market.

Real-time Identification of the Bear Market

This out-of-sample application also gives us an opportunity to assess in real time when

our model identified a move into the bear regime. In Section 1.6.3 this was discussed in

the context of the full sample smoothed estimates. We now consider the identification

process that would have been historically available to investors using the model forecasts.


This will differ from the previous results as we are using a smaller sample and updating

estimates as new data arrives.

The second and third panels of Figure 1.8 report the predictive mean of the states

and regimes. Prior to 2008, forecasts of the bull states occur the most, including some

short episodes of bull corrections. In the first week of October 2008, the probability of

a bull regime drops from 0.85 to essentially zero and remains there for some time. In

other words, the model in real time detects a turning point in the first week of October

2008 from the bull to the bear regime. The first half of the bear regime that follows is

characterized by the bear state while the second half is largely classified as a bear rally.

Toward the end of our sample there is a move from the bear market rally state to a

bull market. In real time, in early December 2009 the model forecasts a move from the

bear rally to the bull market state. For the week ending December 9, we have p(st =

1|It−1) = 0.02, p(st = 2|It−1) = 0.17, p(st = 3|It−1) = 0.14 and p(st = 4|It−1) = 0.67.

The evidence for a bull market regime gradually strengthens; the last observation in our

sample, January 20, 2010, has probabilities 0.01, 0.11, 0.07 and 0.81 for states 1,2,3 and

4, with the bull market state being the most likely.

1.7 Conclusion

This chapter proposes a new 4-state Markov-switching model to identify the components

of bull and bear market regimes in weekly stock market data. Bull correction and bull

states govern the bull regime; bear rally and bear states govern the bear regime. Our

probability model fully describes the return distribution while treating bull and bear

regimes and their component states as unobservable.

A bear rally is allowed to move back to the bear state or to exit the bear regime

by moving to a bull state. Likewise, a bull correction can move back to the bull state

or exit the bull regime by transitioning to a bear state. This implies that regimes can


feature several episodes of their component states. For example, a bull regime can be

characterized by a combination of bull states and bull corrections. Similarly, a bear

regime can consist of several episodes of the bear state and the bear rally state. Because

the realization of states in a regime will differ over time, bull and bear regimes can be

heterogeneous over time. This richer structure, including both intra-regime and inter-

regime dynamics, results in a richer characterization of market cycles.

Probability statements on regimes and future returns are available. Our model

strongly dominates other alternatives. Model comparisons show that the 4-state speci-

fication of bull and bear markets is strongly favored over several alternatives including

a two-state model, as well as various alternative specifications for variance dynamics.

For example, relative to a two-state model, there is less erratic switching so that market

regimes are more persistent.

We find that bull corrections and bear rallies are empirically important for out-of-

sample forecasts of turning points and VaR predictions. For these out-of-sample appli-

cations, the model provides probability statements concerning the predictive density of

returns. The probabilities are used in an example application that compares VaR fore-

casts to a normal benchmark model. The latter overestimates the VaR for much of the

sample and then tends to understate it from mid-2007 to late 2009. The MS-4 specifi-

cation has a very different VaR(.05),t over time because it takes into account forecasts of

regime changes. The potential losses increased considerably in September and October

of 2008 as the model identifies a move from a bull to a bear market.


1.8 Appendix

The Pagan and Sossounov (2003) adaptation of the Bry-Boschan (BB) algorithm can be

summarized as follows:

1. Identify the peaks and troughs by using a window of 8 months.

2. Enforce alternation of phases by deleting the lower of adjacent peaks and the higher

of adjacent troughs.

3. Eliminate phases less than 4 months unless changes exceed 20%.

4. Eliminate cycles less than 16 months.

Window width and phase duration constraints will depend on the particular series and

will obviously be different for smoothed business cycle data than for stock prices. Pagan

and Sossounov (2003) provide a detailed discussion of their choices for these constraints.

The Lunde and Timmermann (2004) dating algorithm defines a binary market indi-

cator variable It which takes the value 1 if the stock market is in a bull state at time t

and 0 if it is in a bear state. The stock price at the end of period t is labelled Pt. Our

application of their dating algorithm can be summarized as: use a 6-month window to

locate the initial local maximum or minimum.

Suppose we have a local maximum at time $t_0$, in which case we set $P^{\max}_{t_0} = P_{t_0}$.

1. Define stopping-time variables associated with a bull market as

$$\tau_{\max}(P^{\max}_{t_0}, t_0 \mid I_{t_0} = 1) = \inf\{t_0 + \tau : P_{t_0+\tau} \ge P^{\max}_{t_0}\}$$

$$\tau_{\min}(P^{\max}_{t_0}, t_0 \mid I_{t_0} = 1) = \inf\{t_0 + \tau : P_{t_0+\tau} \le 0.8\, P^{\max}_{t_0}\}$$

2. One of the following happens:

• If $\tau_{\max} < \tau_{\min}$, the bull market continues: update the new peak value $P^{\max}_{t_0+\tau_{\max}} = P_{t_0+\tau_{\max}}$ and set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\max}} = 1$. Update $t_0 = t_0 + \tau_{\max}$, still as a local maximum, and continue with step 1 above.

• If $\tau_{\max} > \tau_{\min}$, we find a trough at time $t_0 + \tau_{\min}$ and we have been in a bear market from $t_0 + 1$ to $t_0 + \tau_{\min}$. Set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\min}} = 0$. Record the value $P^{\min}_{t_0+\tau_{\min}} = P_{t_0+\tau_{\min}}$ and update $t_0 = t_0 + \tau_{\min}$ as a local minimum. Go to step 3 below since $t_0$ is a local minimum now.

When $t_0$ is a local minimum:

3. Bear market stopping times are

$$\tau_{\min}(P^{\min}_{t_0}, t_0 \mid I_{t_0} = 0) = \inf\{t_0 + \tau : P_{t_0+\tau} \le P^{\min}_{t_0}\}$$

$$\tau_{\max}(P^{\min}_{t_0}, t_0 \mid I_{t_0} = 0) = \inf\{t_0 + \tau : P_{t_0+\tau} \ge 1.2\, P^{\min}_{t_0}\}$$

4. One of the following happens:

• If $\tau_{\min} < \tau_{\max}$, the bear market continues: update the new trough value $P^{\min}_{t_0+\tau_{\min}} = P_{t_0+\tau_{\min}}$ and set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\min}} = 0$. Update $t_0 = t_0 + \tau_{\min}$ and continue with step 3.

• If $\tau_{\min} > \tau_{\max}$, we find a peak at time $t_0 + \tau_{\max}$ and we have been in a bull market from $t_0 + 1$ to $t_0 + \tau_{\max}$. Set $I_{t_0+1} = \cdots = I_{t_0+\tau_{\max}} = 1$. Record the value $P^{\max}_{t_0+\tau_{\max}} = P_{t_0+\tau_{\max}}$ and update $t_0 = t_0 + \tau_{\max}$ as a local maximum. Go to step 1 above since $t_0$ is a local maximum now.

This process is repeated until the last data point. All periods with It = 1 are in bull

regime and It = 0 are in bear regime.
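The stopping-time rules above can be sketched in a few lines of Python. This is only an illustrative reading of the four steps, not the code used for the results in this chapter. The function name `lt_dating` and its arguments are hypothetical, the initial turning point is assumed to be given (the 6-month window search is not reproduced), and periods after the last detected stopping time are assumed to keep the current state.

```python
def lt_dating(prices, start_is_peak, bear_drop=0.8, bull_rise=1.2):
    """Label each period 1 (bull) or 0 (bear) from a list of positive prices.

    Assumption: index 0 is already known to be a local peak
    (start_is_peak=True) or a local trough (start_is_peak=False).
    """
    n = len(prices)
    in_bull = start_is_peak
    labels = [None] * n
    labels[0] = 1 if in_bull else 0
    t0, extreme = 0, prices[0]          # extreme is P^max in a bull, P^min in a bear
    while t0 < n - 1:
        stop, switch = None, False
        for t in range(t0 + 1, n):
            p = prices[t]
            if in_bull:
                new_extreme = p >= extreme              # tau_max reached first
                reversal = p <= bear_drop * extreme     # tau_min reached first
            else:
                new_extreme = p <= extreme              # tau_min reached first
                reversal = p >= bull_rise * extreme     # tau_max reached first
            if new_extreme or reversal:
                stop, switch = t, reversal
                break
        if stop is None:
            stop = n - 1                # assumption: the tail keeps the current state
        elif switch:
            in_bull = not in_bull       # periods t0+1..stop belong to the new regime
        for t in range(t0 + 1, stop + 1):
            labels[t] = 1 if in_bull else 0
        extreme = prices[stop]
        t0 = stop
    return labels


# Toy series: a peak at 110, a drop of more than 20%, then a rise of more than 20%.
print(lt_dating([100, 110, 105, 85, 80, 90, 100, 120], start_is_peak=True))
```

The thresholds 0.8 and 1.2 are the ones stated in steps 1 and 3; they are exposed as arguments only so that the sketch can be tried with other values.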


Table 1.1: Weekly Return Statistics (1885-2010) [a]

N      Mean    Standard deviation   Skewness   Kurtosis   J-B [b]
6498   0.085   2.40                 -0.49      11.2       18475.5∗

[a] Continuously compounded returns
[b] Jarque-Bera normality test: p-value = 0.00000


Table 1.2: BB and LT Dating Algorithm Turning Points

Troughs Peaks Troughs Peaks

BBa LTb BB LT BB LT BB LT

1985-02 1940-06 1940-11

1885-04 1942-05 1942-05 1943-07

1886-12 1943-12 1946-05 1946-05

1888-06 1890-06 1890-06 1948-02 1948-02 1948-06

1890-12 1890-12 1892-03 1892-03 1949-06 1952-12

1893-08 1893-08 1895-09 1895-09 1953-09 1956-07 1956-07

1896-08 1896-08 1897-09 1957-12 1957-12 1959-07

1898-03 1899-04 1960-10 1961-12

1900-07 1902-09 1902-09 1962-07 1962-07 1966-02 1966-02

1903-10 1903-10 1906-01 1966-10 1966-10 1968-12 1968-12

1906-10 1970-06 1970-06 1971-04

1907-11 1907-11 1909-08 1909-08 1971-12 1973-01 1973-01

1910-08 1912-10 1974-10 1974-10 1976-09

1914-12 1914-12 1916-11 1916-11 1978-03 1978-09

1917-12 1917-12 1919-07 1919-07 1980-04 1980-11 1980-11

1921-06 1921-06 1929-09 1929-09 1982-08 1982-08 1983-06

1929-11 1930-04 1984-08 1987-08 1987-08

1932-06 1932-06 1932-09 1987-12 1987-12 1990-06

1933-03 1933-07 1933-07 1990-10 2000-03 2000-03

1933-10 1934-02 2002-10 2002-10 2007-10 2007-10

1935-03 1935-03 1937-03 1937-03 2009-03 2009-03 2010-01 2010-01

1938-04 1938-04 1938-11

1939-04 1939-10 1939-10

a BB: Bry and Boschan algorithm using Pagan and Sossounov parameters
b LT: Lunde and Timmermann algorithm


Table 1.3: MS-2-State Model Estimates

mean median std 0.95 DI

µ1 -0.46 -0.46 0.14 (-0.73, -0.20)

µ2 0.20 0.20 0.02 ( 0.16, 0.25)

σ1 4.42 4.42 0.13 ( 4.18, 4.69)

σ2 1.64 1.64 0.02 ( 1.59, 1.69)

p11 0.94 0.94 0.01 ( 0.92, 0.96)

p22 0.99 0.99 0.002 ( 0.98, 0.99)

This table reports the posterior mean, median, standard deviation and 0.95 density intervals for model parameters.


Table 1.4: MS-4-State Model Estimates

mean median std 95% DI

µ1 -0.94 -0.92 0.27 (-1.50, -0.45)

µ2 0.23 0.23 0.10 ( 0.04, 0.43)

µ3 -0.13 -0.12 0.08 (-0.31, -0.01)

µ4 0.30 0.29 0.04 (0.22, 0.38)

σ1 6.01 5.98 0.35 (5.41, 6.77)

σ2 2.63 2.61 0.18 (2.36, 3.08)

σ3 2.18 2.19 0.12 (1.94, 2.39)

σ4 1.30 1.30 0.04 (1.20, 1.37)

p11 0.921 0.923 0.020 (0.877, 0.955)

p12 0.076 0.074 0.020 (0.042, 0.120)

p14 0.003 0.001 0.004 (3e-6, 0.013)

p21 0.015 0.014 0.007 (0.005, 0.031)

p22 0.966 0.967 0.009 (0.945, 0.980)

p24 0.019 0.018 0.006 (0.009, 0.034)

p31 0.010 0.009 0.003 (0.004, 0.017)

p33 0.939 0.943 0.018 (0.899, 0.965)

p34 0.051 0.048 0.017 (0.027, 0.088)

p41 0.001 0.0003 0.0007 (6e-7, 0.002)

p43 0.039 0.037 0.012 (0.024, 0.067)

p44 0.960 0.963 0.012 (0.933, 0.976)

The posterior mean, median, standard deviation and 0.95 density intervals for model parameters.


Table 1.5: Unconditional State Probabilities

mean 0.95 DI

π1 0.070 (0.035, 0.117)

π2 0.157 (0.073, 0.270)

π3 0.304 (0.216, 0.397)

π4 0.469 (0.346, 0.579)

The posterior mean and 0.95 density intervals associated with the posterior distribution for π from Equation (7).


Table 1.6: Posterior Regime Statistics for MS-2 and MS-4 Models

                                               MS-2             MS-4
bear mean                                      -0.46            -0.13
                                               (-0.73, -0.20)   (-0.367, -0.005)
bear duration                                  18.2             77.8
                                               (13.2, 25.0)     (44.4, 134.6)
bear standard deviation                        4.42             4.04
                                               (4.18, 4.69)     (3.51, 4.73)
bear variance from Var(E[rt|st]|st = 1, 2)     0.00             0.31
                                                                (0.07, 0.68)
bear variance from E[Var(rt|st)|st = 1, 2]     19.6             16.1
                                               (17.5, 22.0)     (12.1, 22.0)
bear skewness                                  0                -0.42
                                                                (-0.68, -0.20)
bear kurtosis                                  3                5.12
                                                                (4.37, 5.93)
bull mean                                      0.20             0.13
                                               (0.16, 0.25)     (0.07, 0.18)
bull duration                                  82.6             256.0
                                               (59.1, 115.9)    (123.5, 509.6)
bull standard deviation                        1.64             1.71
                                               (1.59, 1.69)     (1.59, 1.83)
bull variance from Var(E[rt|st]|st = 3, 4)     0.00             0.04
                                                                (0.02, 0.09)
bull variance from E[Var(rt|st)|st = 3, 4]     2.69             2.89
                                               (2.54, 2.85)     (2.47, 3.30)
bull skewness                                  0                0.04
                                                                (-0.11, 0.16)
bull kurtosis                                  3                3.77
                                                                (3.51, 4.03)

The posterior mean and 0.95 density interval for regime statistics.


Table 1.7: Posterior State Statistics for the MS-4 Model

mean median std 95% DI

Bear mean -0.13 -0.11 0.10 (-0.367, -0.005)

Bear duration 77.8 74.0 23.1 (44.4, 134.6)

Bear cumulative return -9.94 -8.28 7.89 (-29.6, -0.41)

Bear std 4.04 4.01 0.31 (3.51, 4.73)

Bull mean 0.13 0.13 0.03 (0.07, 0.18)

Bull duration 256.0 235.6 100.9 (123.5, 509.6)

Bull cumulative return 33.0 30.0 14.9 (12.9, 70.3)

Bull std 1.71 1.71 0.06 (1.59, 1.83)

s=1: cumulative return -12.4 -11.8 4.49 (-23.0, -5.45)

s=2: cumulative return 7.10 6.97 3.10 (1.47, 13.8)

s=3: cumulative return -2.13 -2.07 1.09 (-4.46, -0.27)

s=4: cumulative return 7.88 7.75 1.67 (5.02, 11.6)

s=1: duration 13.5 13.0 3.63 (8.13, 22.2)

s=2: duration 31.2 30.1 8.39 (18.3, 51.0)

s=3: duration 17.9 17.4 4.80 (9.91, 28.8)

s=4: duration 27.2 26.9 6.75 (14.9, 41.4)

s=1: µ1/σ1 -0.16 -0.15 0.04 (-0.25, -0.07)

s=2: µ2/σ2 0.09 0.09 0.04 (0.02, 0.17)

s=3: µ3/σ3 -0.06 -0.05 0.04 (-0.14, -0.01)

s=4: µ4/σ4 0.23 0.22 0.04 (0.17, 0.31)

This table reports posterior statistics for various population moments.


Table 1.8: Log Marginal Likelihoods: Alternative Models

Model log f(Y | Model)

Constant mean with constant variance -14924.1

Constant mean with 4-state i.i.d variance -14256.7

MS-4-state mean with 4-state i.i.d. variance (10 with K = 4) -14036.4

MS-2-state (1) -13903.3

MS-4-state mean with constant intra-regime variance ($\sigma_1^2 = \sigma_2^2$, $\sigma_3^2 = \sigma_4^2$) -13849.9

MS-4-state (3) with unrestricted transition matrix P -13747.3

MS-4-state (3)-(6) -13740.4


Table 1.9: Dating-algorithm filtering of data and simulated data

                            S&P      MS-4
Avg. number of bears        29       31.7
                                     (22, 42) [a]
Avg. bear duration          63.1     55.9
                                     (40.5, 74.7)
Avg. bear amplitude [b]     -45.0    -43.4
                                     (-52.7, -35.8)
Avg. bear return            -0.71    -0.80
                                     (-1.08, -0.57)
Avg. bear std               3.16     3.15
                                     (2.60, 3.73)
Avg. number of bulls        28       31.4
                                     (22, 42)
Avg. bull duration          166.7    158.5
                                     (103.0, 235.3)
Avg. bull amplitude         66.4     60.2
                                     (46.3, 80.0)
Avg. bull return            0.40     0.39
                                     (0.31, 0.48)
Avg. bull std               2.53     2.42
                                     (1.97, 2.91)

[a] 70% density interval
[b] Aggregate return over one regime


[Figure 1.1 omitted: density curves for states s=1, s=2, s=3 and s=4; horizontal axis from −10 to 10, vertical axis "density" from 0.00 to 0.35.]

Figure 1.1: MS-4-States, State Densities


[Figure 1.2 omitted: density curves labelled Bear, Bull and Unconditional; horizontal axis from −10 to 10, vertical axis "density" from 0.00 to 0.30.]

Figure 1.2: MS-4-States, Regime Densities


[Figure 1.3 omitted: three panels, "LT decomposition" (axis from 100 to 700), "MS4: Bull Probabilities" (0.0 to 1.0) and "MS2: Bull Probabilities" (0.0 to 1.0); time axis from 1885-02 to 2010-01.]

Figure 1.3: LT algorithm, MS-4 and MS-2


[Figure 1.4 omitted: four panels, "Log price Index" (with the labels Return and RV), "RV", "State Probabilities" (s=1, s=2, s=3, s=4; 0.0 to 1.0) and "Bull Probabilities" (0.0 to 1.0); time axis from 1927-01 to 1939-11.]

Figure 1.4: MS-4, 1927-1939


[Figure 1.5 omitted: four panels, "Log price Index" (with the labels Return and RV), "RV", "State Probabilities" (s=1, s=2, s=3, s=4; 0.0 to 1.0) and "Bull Probabilities" (0.0 to 1.0); time axis from 1980-01 to 1985-12.]

Figure 1.5: MS-4, 1980-1985


[Figure 1.6 omitted: four panels, "Log price Index" (with the labels Return and RV), "RV", "State Probabilities" (s=1, s=2, s=3, s=4; 0.0 to 1.0) and "Bull Probabilities" (0.0 to 1.0); time axis from 1985-01 to 1990-12.]

Figure 1.6: MS-4, 1985-1990


[Figure 1.7 omitted: four panels, "Log price Index" (with the labels Return and RV), "RV", "State Probabilities" (labelled bull correction, bull, bear and bear rally, with marked dates mid-July-07, early-Sep-08, late-Mar-09 and mid-Nov-09) and "Bull Probabilities" (0.0 to 1.0); time axis from 2006-01 to 2010-01.]

Figure 1.7: MS-4, 2006-2010


[Figure 1.8 omitted: three panels, "Return" (axis from −15 to 10, with the labels MS-4 and Normal), "Forecast State Probabilities" (s=1, s=2, s=3, s=4; 0.0 to 1.0) and "Forecast of Probability in Bull" (0.0 to 1.0); time axis from 2007-01 to 2009-12.]

Figure 1.8: Value-at-Risk from MS-4 and Benchmark Normal distribution

Chapter 2

An Efficient Approach to Estimate and Forecast in the Presence of an Unknown Number of Change-points


2.1 Introduction

Accounting for structural instability in macroeconomic and financial time series modeling

and forecasting is important. Applications to the time series data including the Phillips

curve, US real interest rates and inflation have confirmed the necessity of modeling the

parameters which characterize the conditional data density as time-varying. Failing to

do so usually produces inferior out-of-sample forecasts, because the data before and after

a structural break have different implications for the most recent regime. If an abrupt structural break happens and the parameters in the new regime are independent of the past, then the estimates of the parameters in the current regime will be contaminated if data from before the break point are used, and the biased estimates in turn distort the forecast.

A model with a single change-point is not adequate to describe the structural instability found in empirical studies. Multiple change-point models that allow the data dynamics to change out-of-sample are helpful for forecasting in many applications. This chapter focuses on multiple change-point models in the Bayesian framework, because the Bayesian approach provides inference based on a finite sample and integrates out estimation uncertainty in out-of-sample forecasting. Markov chain Monte Carlo (MCMC) sampling methods make the model estimation straightforward.

A popular Bayesian model for multiple structural breaks is Chib (1998). He models

structural breaks as a Markov chain and estimates it with a fixed number of regimes. His

approach is not appropriate for out-of-sample forecasting. Pesaran et al. (2006) extend

Chib’s (1998) model with a hierarchical prior for the parameters which characterize each

regime. Koop and Potter (2007) further model a hierarchical distribution for regime du-

rations, which implies that the structural change probabilities are duration dependent.

Gerlach et al. (2000) introduce the mixture innovation model in a state space represen-

tation to allow an unknown number of regimes and Giordani and Kohn (2008) apply an

adaptive method to improve computational efficiency. Both Koop and Potter’s (2007) and Gerlach et al.’s (2000) methodologies nest the time varying parameter (TVP) model, which assumes that the parameters change in every time period (see Stock and Watson (1991, 1996) and Primiceri (2005)).

In another direction, Inclan (1993) and Wang and Zivot (2000) suggest obtaining the analytic form of the posterior distribution of the change-points, conditional on the number of regimes, by using conjugate priors. Their approaches explore all the possible periods in which the structural breaks can occur. If the length of a time series is $T$ and the number of change-points is $M$, the total number of different combinations of change-points is $\frac{T!}{(T-M)!\,M!}$. If $T = 1000$ and $M = 3$, this requires 166,167,000 marginal likelihoods to be computed.[1] Even if $T = 100$, it still requires 161,700 marginal likelihoods. In general, their methodology is impractical if the number of regimes is larger than 3. Maheu and Gordon (2008) avoid this problem by introducing a real-time forecasting model that concentrates on the last regime. It includes many sub-models, each indexed by the duration of the last regime. They report the filtered distribution of the change-points but do not discuss the smoothed distribution.
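As a quick check of the counts quoted above, the binomial coefficient can be evaluated directly. The helper name `n_configurations` is hypothetical and the snippet is only an arithmetic illustration of why exhaustive enumeration becomes costly.

```python
from math import comb


def n_configurations(T, M):
    """Number of ways to place M change-points among T periods: T! / ((T-M)! M!)."""
    return comb(T, M)


# The two cases discussed in the text.
print(n_configurations(1000, 3))   # 166167000
print(n_configurations(100, 3))    # 161700
```

Since the count grows roughly like $T^M / M!$, each additional change-point multiplies the work by a factor of order $T$.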

This chapter extends Maheu and Gordon (2008) and Maheu and McCurdy (2009)

in five directions. First, I use a conjugate prior for the parameters which characterize

each regime. Conditional on this prior and the time-invariant parameters, the predictive

density has a closed form. The computational burden is reduced compared to Maheu

and Gordon (2008), in which a non-conjugate prior is assumed.[2] Second, a hierarchical structure for the conjugate prior is introduced to allow learning and sharing of the

information across regimes as in Pesaran et al. (2006). In the presence of a structural

break, the new parameters are drawn independently from the hierarchical prior. Third,

one extension of the new approach models the regime duration as a Poisson distribution,

which implies duration dependent break probabilities. Fourth, this chapter shows how to produce the smoothed distribution of the change-points. Lastly, different types of break dynamics, including breaks in the variance, the regression coefficients or both, are nested in this framework.

[1] The methods developed in this chapter can deal with a sample size of 1000.

[2] Maheu and Gordon (2008) assume a conditional conjugate prior and use Gibbs sampling to compute the predictive density.

The differences between this chapter and Koop and Potter’s (2007) method are as follows. First, Koop and Potter (2007) assume a heterogeneous distribution for the duration in each regime. Their approach augments the state space by regime durations, so there are $O(T^2)$ states, which implies a large transition matrix. In contrast, I assume that the regime durations are drawn from the same distribution, so the total number of states is $O(T)$ in the new model. Second, Koop and Potter (2007) assume that after a

structural change, the parameters in the new regime are related to those in the previous

regime through a random walk. Instead, I assume that in each regime the parameters

are drawn independently from a hierarchical prior. This reflects an abrupt change of

the parameters and is convenient for computation. Lastly, this chapter introduces a

new MCMC sampler to draw all the parameters including the hierarchical prior, the

parameters of the durations, the change-points and the parameters characterizing each

regime from their posterior distribution jointly. Based on Casella and Robert (1996),

this posterior sampler is efficient. I also expect the new approach to be very fast in

computation.

Four versions of the model are proposed in this new framework. The first type allows

breaks in the regression coefficients and the variance simultaneously. The second allows

the regression coefficients to change but keeps the variance constant. The third one

keeps the regression coefficient constant while allowing breaks in the variance. All of the

above three versions assume that the structural change probability is time-invariant. The

last type models the regime duration as a Poisson distribution, which implies duration

dependent break probabilities.

The new MCMC sampler is applicable to all of the four versions of the models. It de-

composes the parameter space into the time-varying parameters and the time-invariant


parameters and samples them jointly by taking advantage of the analytic form of the

predictive density. The sampler first draws the time-invariant parameters from their posterior distribution by a Metropolis-Hastings step. The proposal distribution used to sample

the time-invariant parameters is the conditional posterior distribution implied by a Gibbs

sampler. Then, the time-varying parameters including the change-points are drawn from

the posterior distribution conditional on the time-invariant parameters. This approach

is efficient because the sampler draws all the parameters jointly.

These different versions of the model are applied to a Canadian inflation series to investigate its dynamic stability. The log marginal likelihood is used as the criterion for model comparison. The best model is the hierarchical model which allows breaks in the regression coefficients and the variance simultaneously. It identifies 4 major change-points in the Canadian inflation dynamics. The model comparison also shows that the duration

dependent break probability is not a significant feature of the data. After controlling for

the structural breaks, adding extra lags as the explanatory variables does not improve

the out-of-sample forecasting.

This chapter is organized as follows. Section 2.2 introduces the Maheu and

Gordon (2008) model and revises it with conjugate priors. The modified model has a

closed form for the predictive density conditional on the structural break probabilities.

A new Markov Chain Monte Carlo method is proposed to sample from the posterior

distribution efficiently. Section 2.3 extends the non-hierarchical prior to a hierarchical one

in order to exploit the information across regimes. Different extensions of the hierarchical

model are introduced in Section 2.4, including a model with breaks only in the variance

or in the regression coefficients. Duration dependent break probability is also discussed

by assuming a Poisson distribution for the regime durations. Section 2.5 applies this

framework to a Canada inflation time series. Section 2.6 concludes.


2.2 Maheu-Gordon Model with Conjugate Prior

This section briefly reviews Maheu and Gordon’s (2008) model and its key assumption.

Then the advantage of using the conjugate priors is described. I also provide the method

to calculate the predictive density and to estimate the model by a Markov Chain Monte

Carlo sampler.

In Maheu and Gordon’s (2008) model, there are $t$ sub-models at each time $t$. The sub-model $M_i$, where $i = 1, \dots, t$, is indexed by the most recent break point $i$. For example, the sub-model $M_1$ assumes that no break happens throughout the whole sample. For a sub-model $M_i$, the parameters of the data density are the same for time $\tau \in [i, t]$ and different from those at time $\tau < i$. The sub-model $M_t$ implies that a structural break is present and that the parameters which characterize the data density change at time $t$. As time $i$ is the starting point of the most recent regime, they assume that the data before time $i$ are not informative for the posterior of the parameters $\theta$ in model $M_i$. Defining $Y_{i,t} = (y_i, \dots, y_t)$ for $1 \le i \le t$, the previous statement is equivalent to $p(\theta \mid M_i, Y_{1,t}) = p(\theta \mid M_i, Y_{i,t})$.

This model was originally designed for forecasting. As time passes, the number of sub-models increases and each sub-model $M_i$ needs to be estimated at each time $t$. The filtered distribution of the sub-models is updated by Bayes’ rule. In detail:

1. At time 1, there is only one sub-model, so the filtered distribution of sub-models is degenerate: $p(M_1 \mid y_1) = 1$.

2. At time $t$, compute the predictive likelihood for every sub-model by

$$p(y_{t+1} \mid Y_{1,t}, M_i) = \int p(y_{t+1} \mid Y_{1,t}, \theta)\, p(\theta \mid M_i, Y_{i,t})\, d\theta$$

where the posterior $p(\theta \mid M_i, Y_{i,t}) \propto p(\theta)\, p(Y_{i,t} \mid \theta)$ for $i = 1, \dots, t$. When $i = t+1$, the predictive likelihood only depends on the prior $p(\theta)$.

3. Compute the filtered distribution of the sub-models at time $t+1$. Define $\lambda_t$ as the break probability at time $t$ and $\Lambda_{1,t} = (\lambda_1, \dots, \lambda_t)$. This allows sub-models based on different information sets to be combined. In particular, the sub-model probability is

$$p(M_i \mid Y_{1,t+1}, \Lambda_{1,t}) \propto \begin{cases} (1-\lambda_t)\, p(M_i \mid Y_{1,t}, \Lambda_{1,t})\, p(y_{t+1} \mid Y_{1,t}, M_i) & \text{if } i = 1, \dots, t \\ \lambda_t\, p(y_{t+1} \mid Y_{1,t}, M_{t+1}) & \text{if } i = t+1 \end{cases}$$

The predictive density is calculated by integrating out all the sub-models and the uncertainty of the structural break:

$$p(y_{t+1} \mid Y_{1,t}, \Lambda_{1,t}) = \lambda_t\, p(y_{t+1} \mid Y_{1,t}, M_{t+1}) + (1-\lambda_t) \sum_{i=1}^{t} p(y_{t+1} \mid Y_{1,t}, M_i)\, p(M_i \mid Y_{1,t}, \Lambda_{1,t})$$

4. Repeat steps 2-3 until the last period $T$.

The key assumption of Maheu and Gordon (2008) is that the data before the most

recent break point is uninformative to the current regime. The predictive density at time

t+ 1 for the sub-model Mi only depends on the index i and the data from time i to t. In

this framework, the index of the sub-models can be regarded as the state variable, since

it contains all the information needed to compute the predictive density after integrating

out the sub-model parameter $\theta$. If this assumption is dropped and Koop and Potter’s (2007) approach is adopted, by assuming that the parameters after a break are related to those in the previous regime, then it is impractical to use the previously described steps, because the whole path of change-points is needed to obtain the predictive density. In other words, the state space is expanded from $O(T)$ to $O(2^T)$.

Pesaran et al. (2006) argued the importance of modeling a hierarchical prior to use

information across different regimes. Maheu and Gordon (2008) adopt a non-hierarchical

prior because of the heavy computational burden. From step 2 above, there are $O(T^2)$ predictive likelihoods to compute. Each computation involves the estimation of the posterior distribution of the parameter $\theta$. Since they do not use a conjugate prior, the estimation is done numerically by Gibbs sampling. Estimating a hierarchical prior on top of this is therefore computationally infeasible.

Now, I will show how to apply the conjugate prior to Maheu and Gordon’s (2008) model to improve computational efficiency. The next section discusses the hierarchical priors.

Notice that the most recent break at time i has a one-to-one relationship to the

duration of the last regime up to time t. If dt is the duration, then dt = t − i + 1 by

definition. The duration is used in this chapter for two reasons. First, the model in

this chapter studies not only the forecasting problem but also the ex-post analysis of

multiple change-points. Since Maheu and Gordon (2008) do not consider the smoothed

distribution of breaks and only focus on the filtered distribution of sub-models, their

notation of the sub-model (Mi) drops a subscript t, which represents the current time

period and is implicitly assumed in the real-time setting. However, if we are interested in a past time period $\tau < t$, $M_i$ needs additional notation to represent the date $\tau$ at which we are standing. On the other hand, the new notation $d_t$ has a subscript $t$ representing

the time and the value of dt is the duration up to the time t. Second, the new approach

introduced nests the duration dependent break probabilities. So using the duration is

natural and easier for presentation.

Formally, I define $d_t$ as the duration of the most recent regime up to time $t$, with $d_t \in \{1, \dots, t\}$ by construction. If a break happens at time $t$, then $d_t = 1$. If $d_t = t$, then there is no break throughout the whole sample. The predictive density conditional on the duration is given by

$$p(y_{t+1} \mid d_{t+1}, Y_{1,t}) = \int p(y_{t+1} \mid \theta, Y_{1,t})\, p(\theta \mid d_{t+1}, Y_{1,t})\, d\theta = \int p(y_{t+1} \mid \theta, Y_{1,t})\, p(\theta \mid Y_{t-d_{t+1}+2,\,t})\, d\theta$$

Here $\theta$ is the collection of parameters which characterize the most recent regime, which is associated with the data point $y_{t+1}$ and has duration $d_{t+1}$. The second equality comes from the assumption that the data before a break point are uninformative about the regime after it. If $\tau > t$, $Y_{\tau,t}$ is an empty set. For example, if $d_{t+1} = 1$, $p(\theta \mid Y_{t-d_{t+1}+2,\,t})$ is equivalent to the prior $p(\theta)$.

The conditional distribution of $y_{t+1} \mid \theta, Y_{1,t}$ is a linear model with an i.i.d. normal error term. The prior is assumed to be a Normal-Gamma distribution, which is conjugate to the model. By conjugacy, the posterior distribution $\theta \mid Y_{t-d_{t+1}+2,\,t}$ is also Normal-Gamma. The predictive density $p(y_{t+1} \mid d_{t+1}, Y_{1,t})$ is a Student-t distribution if we integrate out $\theta$. Conditional on the duration, the posterior distribution and the predictive density have analytic forms.

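If we assume a constant structural break probability $\lambda_t = \pi$, the Maheu and Gordon (2008) model with the conjugate prior can be written as follows:

$$d_t = \begin{cases} d_{t-1} + 1 & \text{w.p. } 1 - \pi \\ 1 & \text{w.p. } \pi \end{cases}$$

$$(\beta_t, \sigma_t^{-2}) \sim \mathbf{1}(d_t = 1)\, NG(\beta, H, \chi/2, \nu/2) + \mathbf{1}(d_t > 1)\, \delta_{(\beta_{t-1},\, \sigma_{t-1}^{-2})} \qquad (1)$$

$$y_t \mid \beta_t, \sigma_t, Y_{1,t-1} \sim N(x_t'\beta_t, \sigma_t^2)$$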

The covariate $x_t$ can include exogenous or lagged dependent variables. In this chapter, I set $x_t = (1, y_{t-1}, \dots, y_{t-q})'$, which implies an AR($q$) model in each regime. Define $\theta_t \equiv (\beta_t, \sigma_t^2)$ as the collection of the parameters which characterize the data density at time $t$. If a break happens ($d_t = 1$), $\theta_t$ is drawn independently from the prior $NG(\beta, H, \chi/2, \nu/2)$, where $NG$ represents a Normal-Gamma distribution. In detail, the precision (inverse of the variance) $\sigma_t^{-2}$ is drawn from a Gamma distribution $G(\chi/2, \nu/2)$, where $\chi/2$ is the multiplier and $\nu/2$ is the degree of freedom. Its prior mean is $\nu/\chi$ and its prior variance is $2\nu/\chi^2$. It also implies that the prior mean of the variance $\sigma_t^2$ is $\chi/(\nu-2)$ for $\nu > 2$. Conditional on the variance, the vector of regression coefficients $\beta_t$ is drawn from a multivariate normal distribution $N(\beta, H^{-1}\sigma_t^2)$. $\delta_\theta$ represents a degenerate distribution with a mass point at $\theta$. If there is no break ($d_t > 1$), all parameters are the same as those in the previous period.

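By conjugacy of the prior, the posterior distribution of the parameters which characterize the data density at time $t$ is still Normal-Gamma conditional on the duration $d_t$:

$$\beta_t, \sigma_t^{-2} \mid d_t, Y_{1,t} \sim NG(\bar\beta, \bar H^{-1}, \bar\chi/2, \bar\nu/2)$$

with

$$\bar\beta = \bar H^{-1}\left(H\beta + X'_{t-d_t+1,t}\, Y_{t-d_t+1,t}\right)$$

$$\bar H = H + X'_{t-d_t+1,t}\, X_{t-d_t+1,t}$$

$$\bar\chi = \chi + Y'_{t-d_t+1,t}\, Y_{t-d_t+1,t} + \beta' H \beta - \bar\beta' \bar H \bar\beta$$

$$\bar\nu = \nu + d_t$$

where $X_{t-d_t+1,t} = (x_{t-d_t+1}, \dots, x_t)'$.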
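The four updating equations above translate almost line by line into code. The following is a minimal sketch, assuming NumPy is available; the function name `ng_posterior` and its argument names are hypothetical, and `X` and `y` stand for the regressors and observations of the most recent regime only.

```python
import numpy as np


def ng_posterior(beta0, H0, chi0, nu0, X, y):
    """Normal-Gamma posterior hyperparameters for one regime.

    beta0, H0, chi0, nu0 play the role of the prior (beta, H, chi, nu);
    X has one row per observation in the regime, y the matching responses.
    """
    beta0 = np.asarray(beta0, dtype=float)
    H0 = np.asarray(H0, dtype=float)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    H_bar = H0 + X.T @ X
    beta_bar = np.linalg.solve(H_bar, H0 @ beta0 + X.T @ y)
    chi_bar = chi0 + y @ y + beta0 @ H0 @ beta0 - beta_bar @ H_bar @ beta_bar
    nu_bar = nu0 + len(y)          # nu + d_t, the regime duration
    return beta_bar, H_bar, chi_bar, nu_bar


# Intercept-only toy example with two observations in the regime.
print(ng_posterior([0.0], [[1.0]], 1.0, 1.0, [[1.0], [1.0]], [1.0, 3.0]))
```

Because the update is conjugate, feeding the observations one at a time (using each posterior as the next prior) gives the same hyperparameters as a single batch update, which is what makes recursive evaluation over durations cheap.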

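If there is no break at time $t+1$, the duration increases by 1 ($d_{t+1} = d_t + 1$) and the parameters which characterize the data dynamics stay the same as in the last period ($\theta_{t+1} = \theta_t$). The posterior distribution of $\theta_t$ is used to compute the predictive density. Since $y_{t+1}$ depends on the regressor $x_{t+1} = (1, y_t, \dots, y_{t+1-q})'$, which is known at time $t$,

$$p(y_{t+1} \mid d_{t+1} = d_t + 1, Y_{1,t}) = \int p(y_{t+1} \mid \theta_t, Y_{1,t})\, p(\theta_t \mid Y_{t-d_t+1,t})\, d\theta_t \propto \left(1 + \frac{(y_{t+1} - x_{t+1}'\bar\beta)^2}{\bar\chi\,(x_{t+1}' \bar H^{-1} x_{t+1} + 1)}\right)^{-\frac{\bar\nu+1}{2}}.$$

The last expression is the kernel of a Student-t distribution, so

$$y_{t+1} \mid d_{t+1} = d_t + 1, Y_{1,t} \sim t\left(x_{t+1}'\bar\beta,\ \frac{\bar\chi\,(x_{t+1}' \bar H^{-1} x_{t+1} + 1)}{\bar\nu},\ \bar\nu\right).$$

For the special case of $d_{t+1} = 1$, a structural change happens at time $t+1$, so the data before $t+1$ are uninformative for the predictive density. Simply replacing the above filtered distribution of the parameters with the prior produces

$$y_{t+1} \mid d_{t+1} = 1, Y_{1,t} \sim t\left(x_{t+1}'\beta,\ \frac{\chi\,(x_{t+1}' H^{-1} x_{t+1} + 1)}{\nu},\ \nu\right).$$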
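The Student-t predictive density can be evaluated on the log scale with a few lines of code. This is a sketch under stated assumptions: NumPy is assumed to be available, the function names `student_t_logpdf` and `predictive_logpdf` are hypothetical, and the three arguments of $t(\cdot,\cdot,\cdot)$ are read as location, squared scale and degrees of freedom. Passing the prior hyperparameters gives the break case ($d_{t+1} = 1$); passing posterior hyperparameters gives the no-break case.

```python
import math

import numpy as np


def student_t_logpdf(y, loc, scale2, df):
    """Log density of a Student-t with location loc, squared scale scale2, df degrees of freedom."""
    z2 = (y - loc) ** 2 / (df * scale2)
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi * scale2)
            - 0.5 * (df + 1) * math.log1p(z2))


def predictive_logpdf(y_next, x_next, beta, H, chi, nu):
    """Log predictive density of y_next given Normal-Gamma hyperparameters (beta, H, chi, nu)."""
    x = np.asarray(x_next, dtype=float)
    beta = np.asarray(beta, dtype=float)
    H = np.asarray(H, dtype=float)
    loc = float(x @ beta)
    # squared scale: chi * (x' H^{-1} x + 1) / nu
    scale2 = chi * (float(x @ np.linalg.solve(H, x)) + 1.0) / nu
    return student_t_logpdf(y_next, loc, scale2, nu)


# Intercept-only toy example.
print(math.exp(predictive_logpdf(0.0, [1.0], [0.0], [[1.0]], 1.0, 2.0)))
```

Working with log densities avoids underflow when the filter multiplies many small predictive likelihoods.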

By integrating out the model parameters, the predictive density depends only on the duration $d_{t+1}$ and the past information $Y_{1,t}$. Now Chib’s (1996) method can be applied to sample $D_{1,T} = (d_1, \dots, d_T)$ jointly. In detail, first use the forward-filtering method to calculate the filtered distribution of the duration $d_t$ for $t = 1, \dots, T$.

1. At $t = 1$, the distribution of the duration is $p(d_1 = 1 \mid y_1) = 1$ by assumption.

2. The forecasting step:

$$p(d_{t+1} = j \mid Y_{1,t}) = \begin{cases} p(d_t = j-1 \mid Y_{1,t})\,(1-\pi) & \text{for } j = 2, \dots, t+1 \\ \pi & \text{for } j = 1 \end{cases}$$

3. The updating step:

$$p(d_{t+1} = j \mid Y_{1,t+1}) = \frac{p(y_{t+1} \mid d_{t+1} = j, Y_{1,t})\, p(d_{t+1} = j \mid Y_{1,t})}{p(y_{t+1} \mid Y_{1,t})}$$

for $j = 1, \dots, t+1$. The first term in the numerator on the right hand side is the Student-t density derived above using the conjugate prior. The second term is obtained from step 2. The predictive likelihood in the denominator is computed by summing over all the values of the duration $d_{t+1}$:

$$p(y_{t+1} \mid Y_{1,t}) = \sum_{j=1}^{t+1} p(y_{t+1} \mid d_{t+1} = j, Y_{1,t})\, p(d_{t+1} = j \mid Y_{1,t})$$

4. Iterate over steps 2 and 3 until the last period $T$.
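The forward-filtering recursion can be sketched as follows. The function name `forward_filter` and the callback `pred_lik` are hypothetical: `pred_lik(t_next, j)` is assumed to return $p(y_{t+1} \mid d_{t+1} = j, Y_{1,t})$ for `t_next = t + 1`, for instance from the Student-t formula above. The sketch stores one probability vector per period, which is where the $O(T^2)$ memory requirement comes from.

```python
import math


def forward_filter(pred_lik, T, pi):
    """Filtered duration probabilities and log predictive likelihoods.

    Returns (filtered, log_pred) where filtered[t-1][j-1] = p(d_t = j | Y_{1,t})
    and log_pred[t-1] = log p(y_{t+1} | Y_{1,t}) for t = 1, ..., T-1.
    """
    filtered = [[1.0]]                      # step 1: p(d_1 = 1 | y_1) = 1
    log_pred = []
    for t in range(1, T):
        prev = filtered[-1]                 # p(d_t = j | Y_{1,t}), j = 1..t
        # step 2 (forecasting): break with prob. pi, otherwise duration grows by one
        prior = [pi] + [p * (1.0 - pi) for p in prev]
        # step 3 (updating): weight by the predictive likelihood and normalise
        joint = [pred_lik(t + 1, j) * prior[j - 1] for j in range(1, t + 2)]
        norm = sum(joint)                   # p(y_{t+1} | Y_{1,t})
        filtered.append([v / norm for v in joint])
        log_pred.append(math.log(norm))
    return filtered, log_pred


# With an uninformative (constant) likelihood the filter returns the prior on durations.
probs, _ = forward_filter(lambda t_next, j: 0.5, T=3, pi=0.2)
print(probs[-1])
```

Summing `log_pred` gives the log of $p(y_2, \dots, y_T \mid y_1, \pi)$, the quantity needed later when $\pi$ is sampled.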

Then, use the backward-sampling method to draw the vector of durations $D_{1,T} = (d_1, \dots, d_T)$ jointly.

1. Sample the last period duration $d_T$ from $d_T \mid Y_{1,T}$, which is obtained from the last iteration of the forward-filtering step.

2. If $d_t > 1$, then $d_{t-1} = d_t - 1$ by construction.

3. If $d_t = 1$, then sample $d_{t-1}$ from the distribution $d_{t-1} \mid Y_{1,t-1}$. This is because $d_t = 1$ implies a structural change at time $t$. Hence, for any $\tau \ge t$, the data $y_\tau$ is in a new regime and uninformative about $d_{t-1}$. More rigorously, $d_{t-1} \mid d_t = 1, Y_{1,t-1}$ is equivalent to $d_{t-1} \mid d_t = 1, Y_{1,T}$.

4. Iterate steps 2 and 3 until the first period $t = 1$.
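The backward pass is short once the filtered distributions are stored. A minimal sketch, assuming `filtered[t-1][j-1]` holds $p(d_t = j \mid Y_{1,t})$ as produced by a forward filter; the function name `backward_sample` is hypothetical.

```python
import random


def backward_sample(filtered, rng):
    """Draw one path (d_1, ..., d_T) given the filtered duration distributions."""
    T = len(filtered)

    def draw(probs):
        # durations are 1-based: probs[j-1] is the probability of duration j
        return rng.choices(range(1, len(probs) + 1), weights=probs)[0]

    d = [0] * T
    d[-1] = draw(filtered[-1])              # step 1: d_T | Y_{1,T}
    for t in range(T - 1, 0, -1):           # list index t holds d_{t+1}
        if d[t] > 1:
            d[t - 1] = d[t] - 1             # step 2: no break, duration is deterministic
        else:
            d[t - 1] = draw(filtered[t - 1])  # step 3: break, use the filtered distribution
    return d


# Degenerate example: a break in the last period is certain.
print(backward_sample([[1.0], [0.0, 1.0], [1.0, 0.0, 0.0]], random.Random(0)))
```

Only the periods immediately before a break require a random draw, so one pass costs at most $T$ draws.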

Assuming the conjugate prior in Maheu and Gordon’s (2008) model has several features. First, the computational burden is negligible compared to the original model with non-conjugate priors. The computer memory required by the predictive likelihoods is $O(T^2)$, which is manageable for a sample size of up to several thousand. Second, the optimal number of regimes is estimated in a straightforward way. This number is sampled from the posterior distribution and is equal to the number of periods $t$ with duration $d_t = 1$. Define $K$ as the number of regimes implied by one sample of the vector of durations $D_{1,T}$ from the posterior distribution; then $K = \sum_{t=1}^{T} \mathbf{1}(d_t = 1)$. The posterior distribution of $K - 1$ is the distribution of the number of change-points. Third, the posterior sampler is efficient in the sense of Casella and Robert (1996), because the parameters $\Theta_{1,T} = \{\theta_t\}_{t=1}^{T}$ are integrated out.

2.2.1 Estimation and Inference

The parameters in the non-hierarchical prior, (β,H, χ, ν), are fixed. In the case of the

constant break probability, the prior of the break probability π is assumed as a Beta

distribution, B(πa, πb). Because the analytic conditional marginal likelihood p(Y1,T | π)


exists, π can be sampled through a Metropolis-Hastings framework by integrating out the

time-varying parameters Θ1,T and the regime durations D1,T . For an efficient proposal

sampling distribution, I exploit the information from the previous sample of the regime

durations D1,T in the Markov chain. This is motivated by the fact that the sampling of π

can be done in a Gibbs sampler conditional on $D_{1,T}$. Instead of using the Gibbs sampler to draw $\pi$ and always accepting the draw, this method uses the conditional posterior distribution from the Gibbs sampler as a proposal distribution and accepts the draw with the probability implied by the Metropolis-Hastings algorithm.

In general, a Gibbs sampler would alternately draw random samples from $p(\pi \mid \Theta_{1,T}, D_{1,T}, Y_{1,T})$ and $p(\Theta_{1,T}, D_{1,T} \mid \pi, Y_{1,T})$. In contrast, the Metropolis-Hastings step samples from $p(\pi \mid Y_{1,T})$ first and then from $p(\Theta_{1,T}, D_{1,T} \mid \pi, Y_{1,T})$, which is equivalent to sampling from the joint posterior distribution $p(\pi, \Theta_{1,T}, D_{1,T} \mid Y_{1,T})$.

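1. Sample $\pi \mid Y_{1,T}$ from a proposal distribution:

$$\pi^{(i)} \mid Y_{1,T} \sim \text{Beta}\left(\pi_a + K^{(i-1)} - 1,\ \pi_b + T - K^{(i-1)}\right)$$

$K^{(i-1)}$ is the number of regimes implied by the previous sample $D^{(i-1)}_{1,T}$. Accept $\pi^{(i)}$ with probability

$$\min\left\{1,\ \frac{p(\pi^{(i)} \mid \pi_a, \pi_b)}{p(\pi^{(i-1)} \mid \pi_a, \pi_b)}\ \frac{p(Y_{1,T} \mid \pi^{(i)})}{p(Y_{1,T} \mid \pi^{(i-1)})}\ \frac{p(\pi^{(i-1)} \mid \pi_a + K^{(i-1)} - 1,\ \pi_b + T - K^{(i-1)})}{p(\pi^{(i)} \mid \pi_a + K^{(i-1)} - 1,\ \pi_b + T - K^{(i-1)})}\right\}$$

If not accepted, $\pi^{(i)}$ is set equal to $\pi^{(i-1)}$.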

2. Sample Θ1,T , D1,T | π, Y1,T :

(a) Sample D1,T | π, Y1,T from the previously described forward-backward method.

Calculate the number of regimes K and index the regimes by 1, · · · , K. Use

an auxiliary variable st to represent the regime index at time t. Define s1 = 1

and st = 1 for t > 1 until time τ with dτ = 1, which implies there is a break


and the data is in a new regime. Then set sτ = 2 at this break point, and

iterate until the last period with sT = K.

For example, if D1,T = (1, 2, 3, 1, 2, 1, 2, 3, 4), we can infer there are K =

3 regimes and the time series of regime indicators is S1,T = (s1, . . . , sT ) =

(1, 1, 1, 2, 2, 3, 3, 3, 3). As we can see, there is a one-to-one relationship between

D1,T and S1,T .

(b) To sample Θ1,T | D1,T , π, Y1,T , we only need to sample K different sets of

parameters because their values are constant in each regime. Define β∗i , σ∗i as

the distinct parameters which characterize the ith regime, where i = 1, . . . , K.

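\[
\beta_i^{*}, \sigma_i^{*-2} \sim \mathrm{NG}(\bar\beta_i, \bar H_i^{-1}, \bar\chi_i/2, \bar\nu_i/2)
\]

with

\begin{align*}
\bar\beta_i &= \bar H_i^{-1}(H\beta + X_i' Y_i) \\
\bar H_i &= H + X_i' X_i \\
\bar\chi_i &= \chi + Y_i' Y_i + \beta' H \beta - \bar\beta_i' \bar H_i \bar\beta_i \\
\bar\nu_i &= \nu + D_i
\end{align*}

Here the bars denote posterior quantities and β, H, χ, ν are the prior parameters. $X_i = (x_{t_0}, \ldots, x_{t_1})'$ and $Y_i = (y_{t_0}, \ldots, y_{t_1})'$, where $s_t = i$ if and only if $t_0 \le t \le t_1$. So, $X_i$ and $Y_i$ represent the data in the ith regime. $D_i = t_1 - t_0 + 1$ is the duration of the ith regime.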
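The mapping from the durations D1,T to the regime indicators S1,T in step 2(a) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `durations_to_regimes` and `regime_bounds` are hypothetical and do not come from the thesis, and time indices in the returned bounds are 0-based.

```python
def durations_to_regimes(durations):
    """Map durations d_1..d_T to regime indices s_1..s_T (1-based regimes)."""
    regimes = []
    current = 0
    for t, d in enumerate(durations):
        if d == 1:
            current += 1  # d_t = 1 marks the first period of a new regime
        elif t == 0 or d != durations[t - 1] + 1:
            # Durations must start at 1 and then either grow by one or reset.
            raise ValueError("invalid duration path at position %d" % t)
        regimes.append(current)
    return regimes


def regime_bounds(regimes):
    """Return inclusive 0-based (t0, t1) index pairs, one pair per regime."""
    bounds = []
    start = 0
    for t in range(1, len(regimes) + 1):
        if t == len(regimes) or regimes[t] != regimes[start]:
            bounds.append((start, t - 1))
            start = t
    return bounds


D = [1, 2, 3, 1, 2, 1, 2, 3, 4]
S = durations_to_regimes(D)
K = max(S)                  # number of regimes
bounds = regime_bounds(S)   # data ranges used to build X_i and Y_i
```

The bounds give the rows of the data that enter X_i and Y_i for each regime, and the length of each range is the duration D_i.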

The Markov chain is run for N0 + N iterations and the first N0 iterations are discarded
as burn-in samples. The rest of the samples of the parameters
$\{\pi^{(i)}, \Theta_{1,T}^{(i)}, D_{1,T}^{(i)}\}_{i=1}^{N}$ are
used for inferences and forecasting as if they were drawn from the posterior distribution.

For example, the posterior mean of the break probability is computed as the sample
average of $\pi^{(i)}$:

\[
E(\pi \mid Y_{1,T}) = \frac{1}{N} \sum_{i=1}^{N} \pi^{(i)}.
\]

The posterior mean of the volatility at time t is

\[
E(\sigma_t^2 \mid Y_{1,T}) = \frac{1}{N} \sum_{i=1}^{N} \sigma_t^{2(i)}.
\]

Similarly, we can also obtain the predictive density at time T + 1. Because we know

\[
p(y_{T+1} \mid Y_{1,T}) = E\left( p(y_{T+1} \mid d_T, Y_{1,T}) \mid Y_{1,T} \right),
\]

using the posterior distribution, this can be estimated as

\[
p(y_{T+1} \mid Y_{1,T}) = \frac{1}{N} \sum_{i=1}^{N} \left[ p(y_{T+1} \mid d_{T+1} = d_T^{(i)} + 1, Y_{1,T})(1 - \pi^{(i)}) + p(y_{T+1} \mid d_{T+1} = 1, Y_{1,T})\,\pi^{(i)} \right].
\]
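The Monte Carlo estimate of the predictive density is a simple average of two-component mixtures, one per posterior draw. The sketch below assumes the two conditional densities have already been evaluated for each draw; the function name and argument names are hypothetical.

```python
def predictive_density_estimate(dens_continue, dens_break, pi_draws):
    """Average, over posterior draws, the mixture of the no-break and break densities.

    dens_continue[i]: p(y_{T+1} | d_{T+1} = d_T^(i) + 1, Y_{1,T}) for draw i
    dens_break[i]:    p(y_{T+1} | d_{T+1} = 1, Y_{1,T}) for draw i
    pi_draws[i]:      break probability pi^(i) for draw i
    """
    n = len(pi_draws)
    if n == 0 or len(dens_continue) != n or len(dens_break) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p_cont, p_brk, pi in zip(dens_continue, dens_break, pi_draws):
        total += p_cont * (1.0 - pi) + p_brk * pi
    return total / n
```

With two draws, densities (0.4, 0.2) and (0.1, 0.1), and break probabilities (0.5, 0.0), the estimate is ((0.4 × 0.5 + 0.1 × 0.5) + 0.2) / 2 = 0.225.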

2.3 Hierarchical Structural Break Model

2.3.1 Hierarchical Distribution

Maheu and Gordon (2008) require a careful choice of the prior in forecasting, because the

parameters in a new regime only depend on the prior in the presence of a break. They

do not learn about this prior distribution from the parameters in each regime. In
contrast, Pesaran et al. (2006) proposed to estimate the prior to improve forecasting by
exploiting the information across regimes. This section introduces a hierarchical prior for
the structural break model. This is computationally feasible only if using the conjugate
prior as in the previous section. The model is referred to as the hierarchical SB-LSV model:

SB means structural break and LSV means that the level, the slope and the variance are

subject to breaks. The model in the previous section is labeled as the non-hierarchical

SB-LSV model.

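In detail, the previous prior parameters β, H, χ, ν are not fixed any more but given a
prior. The hierarchical SB-LSV model is the following:

\begin{equation}
\begin{aligned}
\beta, H &\sim \mathrm{N\text{-}W}(m_0, \tau_0^{-1}, A_0, a_0) \\
\chi &\sim G(d_0/2, c_0/2) \\
\nu &\sim \mathrm{Exp}(\rho_0) \\
d_t &= \begin{cases} d_{t-1} + 1 & \text{w.p. } 1 - \pi \\ 1 & \text{w.p. } \pi \end{cases} \\
(\beta_t, \sigma_t^{-2}) &\sim 1(d_t = 1)\,\mathrm{NG}(\beta, H, \chi/2, \nu/2) + 1(d_t > 1)\,\delta_{(\beta_{t-1}, \sigma_{t-1}^{-2})} \\
y_t \mid \beta_t, \sigma_t, Y_{1,t-1} &\sim N(x_t'\beta_t, \sigma_t^2)
\end{aligned}
\tag{2}
\end{equation}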

The positive definite matrix H has a Wishart distribution W(A0, a0), where A0 is a

positive definite matrix and a0 is a positive scalar. The prior mean of H is $a_0 A_0$. The
prior variance of $H_{ij}$ is $a_0(A_{ij}^2 + A_{ii}A_{jj})$, where the subscript ij means the ith row and the
jth column. β | H is a multivariate Normal $N(m_0, \tau_0^{-1}H^{-1})$, where $\tau_0$ is a positive scalar.
χ has a Gamma distribution with a prior mean of $c_0/d_0$ and a prior variance of $2c_0/d_0^2$.
ν has an Exponential distribution with a prior mean of $\rho_0$ and a prior variance of $\rho_0^2$.

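Conditional on the number of regimes K and the distinct parameter values $\{\beta_i^{*}, \sigma_i^{*}\}_{i=1}^{K}$,
the posterior distribution of the hierarchical parameters β and H is still Normal-Wishart:

\[
\beta, H \mid \{\beta_i^{*}, \sigma_i^{*}\}_{i=1}^{K} \sim \mathrm{N\text{-}W}(m_1, \tau_1^{-1}, A_1, a_1)
\]

with

\begin{align*}
m_1 &= \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{i=1}^{K} \sigma_i^{*-2}\beta_i^{*}\right) \\
\tau_1 &= \tau_0 + \sum_{i=1}^{K} \sigma_i^{*-2} \\
A_1 &= \left(A_0^{-1} + \sum_{i=1}^{K} \sigma_i^{*-2}\beta_i^{*}\beta_i^{*\prime} + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1} \\
a_1 &= a_0 + K
\end{align*}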

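The posterior of $\chi \mid \nu, K, \{\sigma_i^{*}\}_{i=1}^{K}$ is a Gamma distribution.

\[
\chi \mid \nu, \{\sigma_i^{*}\}_{i=1}^{K} \sim G(d_1/2, c_1/2)
\]

with

\begin{align*}
d_1 &= d_0 + \sum_{i=1}^{K} \sigma_i^{*-2} \\
c_1 &= c_0 + K\nu
\end{align*}

The posterior of $\nu \mid \chi, K, \{\sigma_i^{*}\}_{i=1}^{K}$ does not have a convenient form,

\[
p(\nu \mid \chi, K, \{\sigma_i^{*}\}_{i=1}^{K}) \propto \left(\frac{(\chi/2)^{\nu/2}}{\Gamma(\nu/2)}\right)^{K} \left(\prod_{i=1}^{K} \sigma_i^{*-2}\right)^{\nu/2} \exp\left\{-\frac{\nu}{\rho_0}\right\}.
\]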

It is sampled by a Metropolis-Hastings algorithm using a random walk as the proposal

distribution.

Similar to the sampling of the break probability π in the non-hierarchical SB-LSV

model, I use a new MCMC sampler to draw the time-invariant parameters including

the hierarchical parameters by using the conditional posterior distributions in a Gibbs sampler as
the proposal distribution. To implement the sampler, define Ψ = (π, β, H, χ, ν) as the
collection of the break probability and the parameters of the hierarchical prior, which are
all time-invariant. Since the analytic form of the marginal likelihood p(Y1,T | Ψ) exists,
the joint sampler draws Ψ from a proposal distribution and accepts the new draw with
a probability implied by the Metropolis-Hastings algorithm. Then, sample the regime

durations D1,T and the time-varying parameters Θ1,T conditional on Ψ and the data Y1,T .

The details are in the appendix.

After discarding the burn-in samples, the rest of the sample is used to draw inferences

from the posterior as in the non-hierarchical model. The predictive likelihood, p(yT+1 |
Y1,T ), is estimated by

\[
\frac{1}{N} \sum_{i=1}^{N} \left[ p(y_{T+1} \mid d_{T+1} = d_T^{(i)} + 1, \Psi^{(i)}, Y_{1,T})(1 - \pi^{(i)}) + p(y_{T+1} \mid d_{T+1} = 1, \Psi^{(i)}, Y_{1,T})\,\pi^{(i)} \right].
\]

2.4 Extension

This new approach has two crucial assumptions. One is the conjugate prior for the regime

dependent parameters which characterize the conditional data density. The other is that

the data before a break point is uninformative to the regime after it conditional on the

time-invariant parameters. Both are necessary for the analytic form of the predictive

density. If we do not use the conjugate prior, each predictive density p(yt+1 | dt+1, Y1,t)

has to be estimated numerically. If the second assumption is violated, so that the data before
the break provide information about the regime after it, the duration dt alone is not
sufficient for the predictive density given the time-invariant parameters. For example, in
Koop and Potter’s (2007) model, in order to integrate out the parameters in the most
recent regime, we need to know the whole sample path of the durations D1,t = (d1, . . . , dt).
However, since the vector of durations D1,t takes $2^t$ values, it is computationally infeasible
to calculate the predictive likelihood for every case, while in the new model it is feasible.

This section extends the model while preserving the two assumptions. The first exten-


sion allows the structural breaks only in the variance σ2t or in the regression coefficients

βt. The second extension considers the duration dependent break probabilities. Because

we have an analytic form for the predictive density even with duration dependent break

probabilities, our approach continues to be computationally straightforward. Since mod-

eling the duration dependent break probability is equivalent to modeling the duration,

the extension assumes a Poisson distribution for each regime.

2.4.1 Breaks in the Variance

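The model with breaks only in the variance is referred to as the hierarchical SB-V model.
It assumes a time-invariant vector of the regression coefficients β. The time-varying
variances $\sigma_t^2$ are drawn from a hierarchical prior. In detail:

\begin{equation}
\begin{aligned}
\chi &\sim G(d_0/2, c_0/2) \\
\nu &\sim \mathrm{Exp}(\rho_0) \\
\beta &\sim N(\underline{\beta}, \underline{H}^{-1}) \\
d_t &= \begin{cases} d_{t-1} + 1 & \text{w.p. } 1 - \pi \\ 1 & \text{w.p. } \pi \end{cases} \\
\sigma_t^{-2} &\sim 1(d_t = 1)\,G(\chi/2, \nu/2) + 1(d_t > 1)\,\delta_{\sigma_{t-1}^{-2}} \\
y_t \mid \beta, \sigma_t, Y_{1,t-1} &\sim N(x_t'\beta, \sigma_t^2)
\end{aligned}
\tag{3}
\end{equation}

The prior for the regression coefficients β is not modelled as hierarchical since β is
constant across all regimes. The parameters of its prior, the mean $\underline{\beta}$ and the precision $\underline{H}$, are fixed. On the other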

hand, the prior for the variance σ2t is modelled as hierarchical to share the information

across regimes. Since the regression coefficient β is the same in all regimes, the data

before a break point is informative to the regime after it. So the duration of the most

recent regime dt+1 is not sufficient for computing the posterior of the parameters in that


regime. More rigorously, p(θt | dt, Y1,T ) ≠ p(θt | dt, Y1,t), and the predictive density
p(yt+1 | dt+1, Y1,t) is no longer a Student-t distribution as in the non-hierarchical

SB-LSV model.

Notice that the second key assumption is still preserved because the vector of the

regression coefficients β is time-invariant for the hierarchical SB-V model. Conditional

on β, if a break happens, the volatility is independently drawn from the hierarchical

prior and the previous information is not useful for the current regime. Although p(θt |

dt, Y1,T ) ≠ p(θt | dt, Y1,t), we still have p(θt | dt, β, Y1,T ) = p(θt | dt, β, Y1,t).

Meanwhile, conditional on β, the prior for the variance is conjugate. So the model

can be estimated using the method similar to that in the hierarchical SB-LSV model.

Specifically, define the collection of the time-invariant parameters as Ψ = (π, β, χ, ν). The

posterior MCMC sampler first draws Ψ | Y1,T using the conditional posterior distributions
in a Gibbs sampler as the proposal and accepts it with the probability implied by a Metropolis-Hastings
algorithm. Then, conditional on Ψ and the data Y1,T , it draws the regime durations D1,T
and the time-varying parameters Θ1,T . In the hierarchical SB-V model, $\Theta_{1,T} = \{\sigma_t\}_{t=1}^{T}$,

because the time-invariant regression coefficients β ∈ Ψ are sampled in the first step.

The details are in the appendix.

2.4.2 Breaks in the Regression Coefficients

We can also fix the variance σ2 as time-invariant and only allow the regression coefficients

to change over time. This model is named the hierarchical SB-LS model since the breaks

only happen for the level and slopes. Conditional on the variance σ2, the data before a

break is not informative to the current regime. Also, the conjugate prior exists for the

regression coefficient βt in each regime. Since the two key assumptions are satisfied, the
hierarchical SB-LS model can be estimated in the same way as the hierarchical SB-LSV or SB-V model.


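In detail, the model is:

\begin{equation}
\begin{aligned}
\beta, H &\sim \mathrm{N\text{-}W}(m_0, \tau_0^{-1}, A_0, a_0) \\
\sigma^{-2} &\sim G(\chi/2, \nu/2) \\
d_t &= \begin{cases} d_{t-1} + 1 & \text{w.p. } 1 - \pi \\ 1 & \text{w.p. } \pi \end{cases} \\
\beta_t &\sim 1(d_t = 1)\,N(\beta, H^{-1}) + 1(d_t > 1)\,\delta_{\beta_{t-1}} \\
y_t \mid \beta_t, \sigma, Y_{1,t-1} &\sim N(x_t'\beta_t, \sigma^2)
\end{aligned}
\tag{4}
\end{equation}

The posterior sampler draws the time-invariant parameters Ψ = (π, β, H, σ)
from their posterior distribution using an MCMC sampler. Then it samples the
regime durations D1,T and the time-varying parameters $\Theta_{1,T} = \{\beta_t\}_{t=1}^{T}$ conditional on
the time-invariant parameters Ψ and the data Y1,T . The details are in the appendix.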

2.4.3 Duration Dependent Break Probability

Previously, the time-invariant structural break probability π is used in the forecasting step

to compute p(dt+1 = j | Y1,t) in order to construct the filtered probability p(dt = j | Y1,t)

and the predictive density p(yt+1 | Y1,t). If the break probability depends on the regime

duration, define the break probability p(dt+1 = 1 | dt = j) as πj. Then p(dt+1 = j | Y1,t)

is calculated as

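\[
p(d_{t+1} = j \mid Y_{1,t}) =
\begin{cases}
p(d_t = j - 1 \mid Y_{1,t})(1 - \pi_{j-1}) & \text{for } j = 2, \cdots, t + 1 \\[4pt]
\sum_{k=1}^{t} p(d_t = k \mid Y_{1,t})\,\pi_k & \text{for } j = 1
\end{cases}
\]

so that the probabilities over j = 1, · · · , t + 1 sum to one.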

The updating step of the forward filtering procedure and the backward sampling procedure
are not affected. Conditional on the durations D1,T , the posterior of the parameters
which characterize each regime is not changed, either. So the estimation is still
computationally straightforward.
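The forecasting step with duration dependent break probabilities can be sketched as follows. This is a minimal illustration, not the thesis code: the function name `predict_duration_probs` and the callable `break_prob` (returning π_k for a duration k) are assumptions made for the example. The break case j = 1 is computed as the probability-weighted sum of the π_k, so that the predicted probabilities sum to one.

```python
def predict_duration_probs(filtered, break_prob):
    """One forecasting step of the duration filter.

    filtered[k-1] = p(d_t = k | Y_{1,t}) for k = 1..t
    break_prob(k) = pi_k = p(d_{t+1} = 1 | d_t = k)
    Returns pred with pred[j-1] = p(d_{t+1} = j | Y_{1,t}) for j = 1..t+1.
    """
    t = len(filtered)
    pred = [0.0] * (t + 1)
    for k in range(1, t + 1):
        p_k = filtered[k - 1]
        pi_k = break_prob(k)
        pred[0] += p_k * pi_k          # break: duration resets to 1
        pred[k] = p_k * (1.0 - pi_k)   # no break: duration k becomes k + 1
    return pred


# Example with t = 2 and hypothetical break probabilities pi_1 = 0.1, pi_2 = 0.2.
pred = predict_duration_probs([0.25, 0.75], lambda k: 0.1 * k)
```

A constant function such as `lambda k: 0.04` recovers the time-invariant break probability as a special case.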

A Poisson distribution is assumed as the distribution for regime durations in this
extension. The hazard rate represents the duration dependent break probabilities³. The
Poisson distribution function is $P(\text{Duration} = d \mid \lambda) = e^{-\lambda}\frac{\lambda^{d-1}}{(d-1)!}$, where d ≥ 1. The
implied break probability is

\begin{align*}
P(d_{t+1} = 1 \mid d_t = j, \lambda) &= P(\text{Duration} = j \mid \text{Duration} \ge j, \lambda) \\
&= \frac{P(\text{Duration} = j \mid \lambda)}{P(\text{Duration} \ge j \mid \lambda)} \\
&= \frac{e^{-\lambda}\lambda^{j-1}}{(j-1)\,\gamma(j-1, \lambda)}
\end{align*}

where γ(x, y) is the incomplete gamma function with $\gamma(x, y) = \int_0^y t^{x-1}e^{-t}\,dt$. The no-break
probability P (dt+1 = j + 1 | dt = j) is simply 1 − P (dt+1 = 1 | dt = j, λ). The

priors for the other parameters are set the same as the hierarchical SB-LSV model. This

extension is labeled as the hierarchical DDSB-LSV model, where DD means duration

dependent.
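The break probability implied by the Poisson duration can be evaluated numerically as a hazard rate. The sketch below is an illustration with hypothetical function names; it computes the survival probability by direct summation of the probability mass function rather than through the incomplete gamma function, which is adequate for short durations but loses accuracy when the survival probability is very close to zero.

```python
import math


def duration_pmf(d, lam):
    """P(Duration = d | lam) = exp(-lam) * lam**(d-1) / (d-1)!, for d >= 1."""
    # math.lgamma(d) equals log((d-1)!) for a positive integer d.
    return math.exp(-lam + (d - 1) * math.log(lam) - math.lgamma(d))


def break_probability(j, lam):
    """Hazard P(d_{t+1} = 1 | d_t = j, lam) = P(Duration = j) / P(Duration >= j)."""
    survival = 1.0 - sum(duration_pmf(d, lam) for d in range(1, j))
    return duration_pmf(j, lam) / survival


# Break probabilities for short regimes when lam is set to 28.9.
short_regime_hazards = [break_probability(j, 28.9) for j in range(1, 10)]
```

For j = 1 the survival probability is one, so the hazard reduces to exp(−λ). Hazards this small for short durations mean a new regime is almost never allowed to end within its first few periods.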

To estimate the hierarchical DDSB-LSV model, notice that the set of the time-

invariant parameters Ψ now is (λ, β,H, χ, ν). The posterior sampler draws Ψ from its

posterior distribution by a Metropolis-Hastings sampler. Then the time-varying param-

eters Θ1,T and the regime durations D1,T are sampled conditional on the time-invariant

parameter Ψ and the data Y1,T . This is still a joint sampler as in the hierarchical SB-LSV

with the time-invariant break probability. Details are in the appendix.

2.5 Application to Canada Inflation

The new approach is applied to a Canada quarterly inflation time series to investigate
the instability of its dynamics. The data are constructed from the quarterly CPI, which is
downloaded from CANSIM⁴. The quarterly inflation rate is calculated as the log difference of
the CPI data and scaled by 100. It starts from 1961Q1 and ends at 2009Q4 with 196
observations in total. The summary statistics are in Table 2.1.

³ In general, any hazard function in the survival analysis can be applied to model the duration.

The hierarchical models used are the SB-LSV, SB-V, SB-LS and DDSB-LSV models. Two
non-hierarchical SB-LSV models are also applied: one estimates the break probability π
and the other fixes π = 0.01. Linear autoregressive models are used as benchmarks for
model comparison. For all the structural break models, I assume that the explanatory
variables in each regime include an intercept and the one-period lag of the dependent
variable. So the data follow an AR(1) process in each regime.

The prior of the hierarchical SB-LSV model is:

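\begin{align*}
\pi &\sim B(1, 9) \\
H &\sim W\left(\begin{pmatrix} 0.2 & 0 \\ 0 & 0.2 \end{pmatrix}, 5\right) \\
\beta \mid H &\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, H^{-1}\right) \\
\chi &\sim G(2, 2) \\
\nu &\sim \mathrm{Exp}(2)
\end{align*}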

This prior is informative but covers a wide range of empirically realistic values. The

prior mean of the break probability is E(π) = 0.1, which implies infrequent breaks. The
inverse of the variance in each regime is drawn from a Gamma distribution, which has
a degree of freedom centered at 2 and a multiplier centered at 1. Conditional on these
values, the variance in each regime has a mean of 1.0 and a variance of infinity. The
prior mean of β is a vector of 0’s and the prior mean of $H^{-1}$ is 2.5 times an identity
matrix. Conditional on these values and the variance in a regime, the intercept and
the autoregressive coefficient of the AR(1) process in the same regime are both drawn
independently from a Normal distribution with a mean of 0 and a variance equal to 2.5
times the variance in that regime.

⁴ CANSIM table number 3800003. Table title: Gross Domestic Product (GDP) Indexes. Data sources: IMDB (Integrated Meta Data Base) numbers: 1901, National Income and Expenditure Accounts. Series title: Canada; implicit price indexes 2002=100; personal expenditure on consumer goods and services. Series frequency: quarterly.

The prior and the posterior summary of the parameters are in Table 2.2. The posterior

mean of the structural change probability π is 0.04, which is less than its prior mean of

0.1. The posterior mean implies an average duration of 6 years and 1 quarter. The 95%

density interval is narrower than that of the prior, because the data provide information

to shrink the interval. The prior mean of the precision matrix H is the identity matrix,

which is consistent with its posterior mean. So we do not learn much information from

the data for H. On the other hand, the prior and the posterior mean for the intercept β0
are 0 and 0.82, respectively, and the posterior 95% density interval of β0 does not cover
0. We can conclude that the β in the hierarchical structure learns from the information

across regimes. χ does not learn from the data since its prior and posterior mean are 1.0

and 1.01, respectively. However, its density interval shrinks, which implies that the data

confirms the prior assumption. Lastly, ν learns from the data because its prior mean

is 2.0 while its posterior mean is 6.04 and the 95% posterior density interval does not

include the prior mean.

The posterior means of the regression coefficients E(βt | Y1,T ), the standard deviations

E(σt | Y1,T ) and the structural change probabilities p(dt = 1 | Y1,T ) for t = 1, . . . , T , are

plotted in Figure 2.1. The top panel is the data. The second panel plots the break

probabilities over time. The middle panel plots the intercept βt,0 over time. The AR(1)

coefficient βt,1 is plotted in the fourth panel and is labeled as persistence. The standard

deviation σt is in the bottom panel. From the plot of the break probabilities, we can

visually identify 4 major breaks in the inflation process. The first is in the mid-60’s,
which is characterized by an increase in the inflation level. The second is in the early 70’s,
which is associated with the oil crisis and characterized by an increase in the persistence and
the volatility. In the mid 80’s, a structural change decreased both the
persistence and the volatility, which is consistent with the great moderation. The last
break happened in the early 90’s and is characterized by a decrease in both the inflation
level and its volatility. Figure 2.1 shows that each break brings different dynamic
patterns to the inflation process.

The non-hierarchical SB-LSV model fixes the parameters of the priors at β = (0, 0)′, H =

I2, χ = 1, ν = 2, which are the prior means of the hierarchical SB-LSV model. The break

probability π has the same prior as that of the hierarchical model, which is B(1, 9).

The posterior mean of π equals 0.01. Its 95% density interval is (0.002, 0.029). The

non-hierarchical SB-LSV model implies a longer regime duration than the hierarchical

SB-LSV model.

The posterior means of the time varying parameters E(βt | Y1,T ), E(σt | Y1,T ) and

the probabilities of breaks, p(dt = 1 | Y1,T ), are plotted in Figure 2.2. Each panel has

the same interpretation as in Figure 2.1. There are three points worth noticing. First,

there is only one spike in the break probabilities, which is in the early 90’s. It captures

the decrease of the variance and is consistent with the last change-point identified by

the hierarchical SB-LSV model. Second, although it is not visually identifiable in the

second panel, from the middle and the fourth panel, we can observe a gradual increase of

the persistence and a decrease of the intercept between the mid 60’s and the early 70’s.

However, there are many uncertainties for the identification of the change-point in that

period. Lastly, the non-hierarchical SB-LSV model fails to identify the great moderation

in the mid 80’s.

As one alternative to the time-invariant break probability, the regime duration is modeled
with a Poisson distribution to fit the inflation dynamics. The prior of the duration parameter
λ is assumed to be an exponential distribution with a mean of 50. The other priors are set
the same as those of the hierarchical SB-LSV model. For simplicity, the first period
is assumed to be the first period of its regime. Table 2.3 shows the posterior summary
of the parameters. The learning of β is similar to that in the hierarchical SB-LSV
model, but that of χ and ν is different. The estimate of λ implies that one regime lasts about 7
years and a quarter, which is comparable to the length of 6 years and a quarter implied
by the hierarchical SB-LSV model. Figure 2.3 shows that the change-points identified
by the duration dependent model are consistent with Figure 2.1. The dynamic patterns
of these two figures are similar except for the last 10 years. The hierarchical DDSB-LSV
model suggests that some structural change uncertainty exists around the year 2000, after
which the volatility increased. Although the smoothed parameters for the hierarchical
DDSB-LSV model are similar to those of the hierarchical SB-LSV model, the later model
comparison shows that the Poisson duration is strongly rejected. This is attributed to the
fact that the duration dependent break probability implied by the Poisson distribution
is very small if the regime duration is short. For example, if the duration parameter λ
equals the posterior mean of 28.9, the break probability p(dt+1 = 1 | dt) is less than
1.0e−5 if the duration dt < 10. This feature causes the model to learn regime changes
more slowly than a constant break probability model.

For the hierarchical SB-V model, which only allows breaks in the variance, the prior

of the time-invariant regression coefficient vector β is

\[
\beta \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right)
\]

Its mean and the precision matrix are the prior means in the hierarchical SB-LSV model.

The priors of π, χ and ν are the same as the hierarchical SB-LSV model. The posterior

summary is in Table 2.4. The most prominent feature is that the posterior mean of the

break probability π is 0.16, which is much higher than that of the hierarchical or the

non-hierarchical SB-LSV model.


The frequent change of volatilities is shown in Figure 2.4. The middle panel is the

break probability, from which we can observe that the process is characterized by many

breaks in the variance. This frequent break pattern is similar to the ARCH effects in Engle

(1982, 1983). The bottom panel plots the posterior means of the standard deviations

E(σt | YT ). Although we can see some episodes such as from the mid 80’s to the early

90’s are more stable, there is no general pattern about the volatility evolution. In practice,

it is not desirable to have too frequent structural changes, which implies that less data

can be used to estimate the most recent regime. The frequent break pattern of Canada
inflation estimated by the hierarchical SB-V model reflects model misspecification,
because the more general hierarchical SB-LSV model nests the hierarchical SB-V model
and does not find as many breaks as the latter does.

On the other hand, the hierarchical SB-LS model allows the breaks to happen only

in the regression coefficients and keeps the variance constant. The prior of the inverse of

the variance is:

σ−2 ∼ G(1, 0.5)

The values of the multiplier and the degree of freedom in this prior are the means implied

by the prior for the hierarchical SB-LSV model. The priors for π, β and H are set the same
as those of the hierarchical SB-LSV model. The posterior summary is in Table 2.5. The

posterior for the break probability is similar to that of the hierarchical SB-LSV model.

Figure 2.5 plots the posterior means of the regression coefficients and the probabilities

of breaks. Surprisingly, the hierarchical SB-LS model locates the same change-points as

the hierarchical SB-LSV model does in Figure 2.1.

Some questions are raised from the above results. Are changes in volatility important

for the Canada inflation series? Is the great moderation a feature of data? Can a duration

dependent break probability improve the out-of-sample forecasting? This chapter uses

the log marginal likelihoods for model comparison to answer these questions.


Using i as the index of a model, the marginal likelihood of the model Mi is

\[
p(Y_{1,T} \mid M_i) = \prod_{t=1}^{T} p(y_t \mid Y_{1,t-1}, M_i)
\]

This decomposition shows that the marginal likelihood is intrinsically the comparison

based on the out-of-sample forecasts, which automatically penalizes the over-parameterized

model. An improvement on the marginal likelihood implies better forecasting ability over

the whole sample.

The log marginal likelihood is calculated as $\sum_{t=1}^{T} \log p(y_t \mid Y_{1,t-1}, M_i)$. The one-period
predictive likelihood p(yt | Y1,t−1, Mi) is calculated by using the data up to t − 1 to
estimate the model and plugging the value of yt into the predictive density function.
For the first period, the prior is used in place of the posterior estimates. Kass and Raftery
(1995a) propose to compare the models Mi and Mj by the log Bayes factor log(BFij),
where $BF_{ij} = \frac{p(Y_{1,T} \mid M_i)}{p(Y_{1,T} \mid M_j)}$ is the ratio of the marginal likelihoods. In short, a positive
value of log(BFij) supports model Mi against Mj. Quantitatively, Kass and Raftery
(1995a) suggest that the evidence is barely worth a mention for 0 ≤ log(BFij) < 1; positive for
1 ≤ log(BFij) < 3; strong for 3 ≤ log(BFij) < 5; and very strong for log(BFij) ≥ 5.
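The model comparison rule can be illustrated with a short sketch. The function names are hypothetical, the thresholds are the ones quoted in the paragraph above, and the two log marginal likelihoods used in the example are taken from Table 2.6.

```python
import math


def log_marginal_likelihood(predictive_likelihoods):
    """Sum of log one-period-ahead predictive likelihoods p(y_t | Y_{1,t-1}, M)."""
    return sum(math.log(p) for p in predictive_likelihoods)


def log_bayes_factor(log_ml_i, log_ml_j):
    """log(BF_ij): difference of the log marginal likelihoods of M_i and M_j."""
    return log_ml_i - log_ml_j


def evidence_category(log_bf):
    """Verbal scale for the evidence in favour of M_i, as quoted in the text."""
    if log_bf < 0:
        return "supports the other model"
    if log_bf < 1:
        return "barely worth a mention"
    if log_bf < 3:
        return "positive"
    if log_bf < 5:
        return "strong"
    return "very strong"


# Hierarchical SB-LS (-140.4) against AR(2) (-144.2), values from Table 2.6.
log_bf = log_bayes_factor(-140.4, -144.2)
category = evidence_category(log_bf)
```

Working on the log scale avoids numerical underflow, since the marginal likelihood itself is a product of many small numbers.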

Table 2.6 shows the log marginal likelihoods of different models. The autoregressive

models are also applied as benchmarks.

yt | β, σ, Y1,t−1 ∼ N(β0 + β1yt−1 + . . .+ βqyt−q, σ2) (5)

The prior is set as Normal-Gamma

(β, σ−2) ∼ NG(β,H, χ/2, ν/2)

The prior parameters are set to $\beta = 0_{(q+1)\times 1}$, $H = I_{q+1}$, χ = 1, ν = 2. If q = 1, it is an AR(1) process

and the values are the same as in the non-hierarchical SB-LSV model.


The hierarchical SB-V, the hierarchical DDSB-LSV, the non-hierarchical models and

the AR(1) model perform the worst and have log marginal likelihoods less than −155.

The duration dependent break probability is not appropriate for the Canada inflation

dynamics in the application. The AR(2) and the AR(3) model improve the performance

by adding more lags. The hierarchical SB-LS model has the log marginal likelihood

of −140.4, which is larger than that of the AR(2) and the AR(3) model by −140.4 −

(−144.2) = 3.8 and −140.4 − (−144.7) = 4.3, respectively. So, keeping the AR(1)

dynamics but allowing the breaks in the regression coefficients improves the marginal

likelihood more than adding extra lags. The optimal choice is the hierarchical SB-LSV

model with the log marginal likelihood of −122.5, which dominates the other models

strongly.

For a robustness check, I estimate each of the break models by assuming an AR(2) or

AR(3) in each regime. For the hierarchical SB-LSV model, the log marginal likelihoods

are −126.3 and −129.7 for the AR(2) or AR(3) case, which are less than that with the

basic AR(1) assumption. The largest log marginal likelihood of the rest of the break

models using AR(2) or AR(3) in each regime is −144.4. The optimal model is still the

hierarchical SB-LSV with an AR(1) process in each regime. Hence, after controlling for the
structural breaks, adding extra lags does not improve the marginal likelihood

or forecasting in terms of the predictive likelihoods.

To check the prior sensitivity of the break probability π in SB-LSV, SB-LS and SB-V

models, the alternative priors B(1, 19), B(1, 99) and B(1, 999) were used. For the DDSB-LSV
model, the prior mean of λ is set to 10 or 100. The posterior means of the time-varying
parameters in Figures 2.1-2.5 are similar. The result of the model comparison is consistent

with the original one. The priors of the hierarchical parameters are kept the same, since

they cover a reasonably wide range of the parameter space.


2.6 Conclusion

A new approach is introduced to estimate and forecast time series with multiple change-

points. This methodology obtains the analytic form of the predictive density by taking

advantage of the conjugate prior for the parameters that characterize each regime. The

prior is modeled as hierarchical to exploit the information across regimes to improve
forecasts.

This approach allows the breaks in the variance, the regression coefficients or both. It

also nests the duration dependent break probabilities. One extension assumes the regime

duration has a Poisson distribution.

A new Markov Chain Monte Carlo sampler is introduced to draw the parameters from

the posterior distribution efficiently. This methodology uses the conditional posterior

distribution in the Gibbs sampler as a proposal distribution and accepts the random draw

by a Metropolis-Hastings algorithm. This approach is efficient because the parameters

are sampled jointly.

This new model and its extensions are applied to a Canada inflation series. The log

marginal likelihood is used as the criterion for model comparison. The best model is the

hierarchical model which allows the breaks in the regression coefficients and the variance

simultaneously. It identifies 4 major change-points in the Canada inflation dynamics.

The model comparison also shows that the duration dependent break probability is not

a feature of the data. It further shows that after controlling for the structural breaks,

adding extra lags as the explanatory variables does not improve the out-of-sample
forecast.


2.7 Appendix

2.7.1 Hierarchical SB-LSV Model

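1. Sample $\pi^{(i)}, \beta^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)} \mid Y_T$ from the following proposal distribution.

(a) Sample $\pi^{(i)} \mid K^{(i-1)} \sim B(\pi_a + K^{(i-1)} - 1,\ \pi_b + T - K^{(i-1)})$ as in the non-hierarchical model.

(b) Sample $H^{(i)} \mid \{\beta_k^{(i-1)}, \sigma_k^{(i-1)}\}_{k=1}^{K} \sim W(A_1, a_1)$

(c) Sample $\beta^{(i)} \mid H^{(i)}, \{\beta_k^{(i-1)}, \sigma_k^{(i-1)}\}_{k=1}^{K} \sim N(m_1, (\tau_1 H^{(i)})^{-1})$

(d) Sample $\chi^{(i)} \mid \nu^{(i-1)}, \{\sigma_k^{(i-1)}\}_{k=1}^{K} \sim G(d_1/2, c_1/2)$

(e) Sample $\nu^{(i)} \mid \nu^{(i-1)} \sim G\left(\frac{\zeta}{\nu^{(i-1)}}, \zeta\right)$

with

\begin{align*}
m_1 &= \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{i=1}^{K} \sigma_i^{-2}\beta_i\right) \\
\tau_1 &= \tau_0 + \sum_{i=1}^{K} \sigma_i^{-2} \\
A_1 &= \left(A_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}\beta_i\beta_i' + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1} \\
a_1 &= a_0 + K \\
d_1 &= d_0 + \sum_{i=1}^{K} \sigma_i^{-2} \\
c_1 &= c_0 + K\nu^{(i-1)}
\end{align*}

Accept the whole set $\Psi^{(i)} = (\pi^{(i)}, \beta^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)})$ with probability

\[
\min\left\{1,\; \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})} \cdot \frac{p(Y_T \mid \Psi^{(i)})}{p(Y_T \mid \Psi^{(i-1)})} \cdot \frac{p_{prop}(\Psi^{(i-1)})}{p_{prop}(\Psi^{(i)})}\right\}
\]

where p(Ψ) is the prior density and $p_{prop}(\Psi)$ is the proposal density.

2. Sample $\{s_t, \beta_t, \sigma_t\}_{t=1}^{T} \mid \Psi$ as in the non-hierarchical structural break model.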
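The accept/reject decision in step 1 is usually evaluated on the log scale to avoid underflow in the marginal likelihood. The following is a generic sketch of that calculation, not the thesis implementation; all function and argument names are hypothetical, and the six log densities are assumed to have been computed elsewhere.

```python
import math
import random


def log_acceptance_ratio(log_prior_new, log_prior_old,
                         log_lik_new, log_lik_old,
                         log_prop_new, log_prop_old):
    """Log of the ratio inside min{1, .}: prior ratio times marginal likelihood
    ratio times the reversed ratio of proposal densities."""
    return ((log_prior_new - log_prior_old)
            + (log_lik_new - log_lik_old)
            + (log_prop_old - log_prop_new))


def accept(log_ratio, rng=random):
    """Return True with probability min(1, exp(log_ratio))."""
    if log_ratio >= 0.0:
        return True
    # log_ratio is negative here, so exp() cannot overflow.
    return rng.random() < math.exp(log_ratio)


def metropolis_step(psi_old, psi_new, log_ratio, rng=random):
    """Keep the proposed parameter set if accepted, otherwise repeat the old one."""
    return psi_new if accept(log_ratio, rng) else psi_old
```

If the proposal is rejected, the chain keeps the previous value of the whole set Ψ, exactly as π(i) is set equal to π(i−1) in the non-hierarchical sampler.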

2.7.2 Hierarchical SB-V Model

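The predictive likelihood is computed as:

\[
p(y_t \mid s_t, Y_{t-1}, \beta) \propto \left(1 + \frac{(y_t - x_t'\beta)^2}{\bar\chi}\right)^{-\frac{\bar\nu + 1}{2}}
\]

or

\[
y_t \mid s_t, Y_{t-1}, \beta \sim t\left(x_t'\beta, \frac{\bar\chi}{\bar\nu}, \bar\nu\right)
\]

with the mean $x_t'\beta$ and the variance $\frac{\bar\chi}{\bar\nu - 2}$, where

\begin{align*}
\bar\chi &= \chi + E_{t-s_t+1,t-1}' E_{t-s_t+1,t-1} \\
\bar\nu &= \nu + s_t - 1
\end{align*}

$E_{t-s_t+1,t-1} = (e_{t-s_t+1}, \ldots, e_{t-1})'$ is the residual vector with $e_t = y_t - x_t'\beta$. The posterior
sampling scheme is the following:

1. Sample $\pi^{(i)}, \beta^{(i)}, \chi^{(i)}, \nu^{(i)} \mid Y_T$ from the following proposal distribution.

(a) Sample $\pi^{(i)} \mid K^{(i-1)} \sim B(\pi_a + K^{(i-1)} - 1,\ \pi_b + T - K^{(i-1)})$ as in the non-hierarchical model.

(b) Sample $\beta^{(i)} \mid \{\sigma_k^{(i-1)}\}_{k=1}^{K}, S_T \sim N(\bar\beta, \bar H^{-1})$

(c) Sample $\chi^{(i)} \mid \nu^{(i-1)}, \{\sigma_k^{(i-1)}\}_{k=1}^{K} \sim G(d_1/2, c_1/2)$

(d) Sample $\nu^{(i)} \mid \nu^{(i-1)} \sim G\left(\frac{\zeta}{\nu^{(i-1)}}, \zeta\right)$

with

\begin{align*}
\bar\beta &= \bar H^{-1}\left(\underline{H}\,\underline{\beta} + \sum_{t=1}^{T} \frac{x_t y_t}{\sigma_t^2}\right) \\
\bar H &= \underline{H} + \sum_{t=1}^{T} \frac{x_t x_t'}{\sigma_t^2} \\
d_1 &= d_0 + \sum_{i=1}^{K} \sigma_i^{-2} \\
c_1 &= c_0 + K\nu^{(i-1)}
\end{align*}

where $\underline{\beta}$ and $\underline{H}$ are the fixed prior mean and precision of β.

Accept the whole set $\Psi^{(i)} = (\pi^{(i)}, \beta^{(i)}, \chi^{(i)}, \nu^{(i)})$ with probability

\[
\min\left\{1,\; \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})} \cdot \frac{p(Y_T \mid \Psi^{(i)})}{p(Y_T \mid \Psi^{(i-1)})} \cdot \frac{p_{prop}(\Psi^{(i-1)})}{p_{prop}(\Psi^{(i)})}\right\}
\]

2. Sample $\{s_t, \sigma_t\}_{t=1}^{T} \mid \Psi$ similar to the non-hierarchical structural break model.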
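The Student-t predictive likelihood stated at the beginning of this subsection can be evaluated on the log scale as sketched below. The function name and arguments are hypothetical. The normalising constant is the standard one for a Student-t density with the given degrees of freedom and squared scale equal to the updated multiplier divided by the updated degrees of freedom; it is added here because the text gives the density only up to proportionality.

```python
import math


def sbv_log_predictive(y, x, beta, past_residuals, chi, nu):
    """Log predictive density of y_t given beta and the current regime's history.

    past_residuals holds e_{t-s_t+1}, ..., e_{t-1}, the s_t - 1 residuals already
    observed in the current regime (an empty list when a break has just occurred).
    """
    chi_bar = chi + sum(e * e for e in past_residuals)  # updated multiplier
    nu_bar = nu + len(past_residuals)                   # updated degrees of freedom
    resid = y - sum(xj * bj for xj, bj in zip(x, beta)) # y_t - x_t' beta
    log_norm = (math.lgamma((nu_bar + 1) / 2.0)
                - math.lgamma(nu_bar / 2.0)
                - 0.5 * math.log(math.pi * chi_bar))
    return log_norm - 0.5 * (nu_bar + 1) * math.log1p(resid * resid / chi_bar)
```

With χ = 1, ν = 1 and no past residuals the density is standard Cauchy, which gives a quick sanity check: the value at the location is 1/π.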

2.7.3 Hierarchical SB-LS Model

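The predictive likelihood of $y_t \mid s_t, Y_{t-1}, \sigma$ is

\[
y_t \mid s_t, Y_{t-1}, \sigma \sim N(x_t'\bar\beta,\ x_t'\bar H^{-1}x_t + \sigma^2)
\]

where $\bar\beta = \bar H^{-1}(H\beta + \sigma^{-2}X_{t-s_t+1,t-1}'Y_{t-s_t+1,t-1})$ and $\bar H = H + \sigma^{-2}X_{t-s_t+1,t-1}'X_{t-s_t+1,t-1}$.
The posterior sampler is

1. Sample $\pi^{(i)}, \beta^{(i)}, H^{(i)}, \sigma^{(i)} \mid Y_T$ from the following proposal distribution.

(a) Sample $\pi^{(i)} \mid K^{(i-1)} \sim B(\pi_a + K^{(i-1)} - 1,\ \pi_b + T - K^{(i-1)})$ as in the non-hierarchical model.

(b) Sample $H^{(i)} \mid \{\beta_k^{(i-1)}\}_{k=1}^{K} \sim W(A_1, a_1)$

(c) Sample $\beta^{(i)} \mid H^{(i)}, \{\beta_k^{(i-1)}\}_{k=1}^{K} \sim N(m_1, (\tau_1 H^{(i)})^{-1})$

(d) Sample $\sigma^{-2(i)} \mid \{\beta_k^{(i-1)}\}_{k=1}^{K}, S_T \sim G(\chi_1/2, \nu_1/2)$

with

\begin{align*}
m_1 &= \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{i=1}^{K} \beta_i\right) \\
\tau_1 &= \tau_0 + K \\
A_1 &= \left(A_0^{-1} + \sum_{i=1}^{K} \beta_i\beta_i' + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1} \\
a_1 &= a_0 + K \\
\chi_1 &= \chi_0 + \sum_{t=1}^{T} (y_t - x_t'\beta_t)^2 \\
\nu_1 &= \nu_0 + T
\end{align*}

Accept the whole set $\Psi^{(i)} = (\pi^{(i)}, \beta^{(i)}, H^{(i)}, \sigma^{(i)})$ with probability

\[
\min\left\{1,\; \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})} \cdot \frac{p(Y_T \mid \Psi^{(i)})}{p(Y_T \mid \Psi^{(i-1)})} \cdot \frac{p_{prop}(\Psi^{(i-1)})}{p_{prop}(\Psi^{(i)})}\right\}
\]

2. Sample $\{s_t, \beta_t\}_{t=1}^{T} \mid \Psi$ similar to the non-hierarchical structural break model.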

2.7.4 Hierarchical DDSB-LSV Model

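1. Sample $\lambda^{(i)}, \beta^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)} \mid Y_T$ from the following proposal distribution.

(a) Sample $\lambda^{(i)}$ by a random walk proposal distribution

(b) Sample $H^{(i)} \mid \{\beta_k^{(i-1)}, \sigma_k^{(i-1)}\}_{k=1}^{K} \sim W(A_1, a_1)$

(c) Sample $\beta^{(i)} \mid H^{(i)}, \{\beta_k^{(i-1)}, \sigma_k^{(i-1)}\}_{k=1}^{K} \sim N(m_1, (\tau_1 H^{(i)})^{-1})$

(d) Sample $\chi^{(i)} \mid \nu^{(i-1)}, \{\sigma_k^{(i-1)}\}_{k=1}^{K} \sim G(d_1/2, c_1/2)$

(e) Sample $\nu^{(i)} \mid \nu^{(i-1)} \sim G\left(\frac{\zeta}{\nu^{(i-1)}}, \zeta\right)$

with

\begin{align*}
m_1 &= \frac{1}{\tau_1}\left(\tau_0 m_0 + \sum_{i=1}^{K} \sigma_i^{-2}\beta_i\right) \\
\tau_1 &= \tau_0 + \sum_{i=1}^{K} \sigma_i^{-2} \\
A_1 &= \left(A_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}\beta_i\beta_i' + \tau_0 m_0 m_0' - \tau_1 m_1 m_1'\right)^{-1} \\
a_1 &= a_0 + K \\
d_1 &= d_0 + \sum_{i=1}^{K} \sigma_i^{-2} \\
c_1 &= c_0 + K\nu^{(i-1)}
\end{align*}

Accept the whole set $\Psi^{(i)} = (\lambda^{(i)}, \beta^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)})$ with probability

\[
\min\left\{1,\; \frac{p(\Psi^{(i)})}{p(\Psi^{(i-1)})} \cdot \frac{p(Y_T \mid \Psi^{(i)})}{p(Y_T \mid \Psi^{(i-1)})} \cdot \frac{p_{prop}(\Psi^{(i-1)})}{p_{prop}(\Psi^{(i)})}\right\}
\]

2. Sample $\{s_t, \beta_t, \sigma_t\}_{t=1}^{T} \mid \Psi$ as in the non-hierarchical structural break model.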


2.7.5 Tables

Table 2.1: Summary statistics of Canada inflation

Statistic          Value
Mean                1.01
Min                -0.54
Max                 3.12
Variance            0.69
Skewness            0.83
Excess Kurtosis     0.09

Canada quarterly inflation rate from 1961Q1-2009Q4. There are 196 observations in total. The data is scaled by 100 to represent quarterly percentage change. Data Sources: IMDB (Integrated Meta Data Base) TABLE NUMBER: 3800003. Numbers: 1901


Table 2.2: Posterior summary of the hierarchical SB-LSV model

        Prior                     Posterior
        Mean   0.95 DI            Mean    Sd     0.95 DI
π       0.1    (0.003, 0.34)      0.04    0.02   (0.01, 0.07)
β0      0.0    (-3.08, 3.08)      0.82    0.20   (0.45, 1.23)
β1      0.0    (-3.08, 3.08)     -0.05    0.17   (-0.40, 0.29)
H00     1.0    (0.16, 2.52)       1.01    0.45   (0.35, 2.08)
H01     0.0    (-0.91, 0.91)     -0.01    0.37   (-0.77, 0.70)
H11     1.0    (0.16, 2.52)       1.30    0.58   (0.45, 2.68)
χ       1.0    (0.12, 2.79)       1.01    0.37   (0.43, 1.87)
ν       2.0    (0.05, 7.38)       6.04    2.27   (2.45, 11.1)


Table 2.3: Posterior summary of the hierarchical DDSB-LSV model

        Prior                     Posterior
        Mean    0.95 DI           Mean     Sd      0.95 DI
λ       50.0    (1.27, 184.4)     28.9     7.32    (14.70, 44.22)
β0      0.0     (-3.08, 3.08)     0.74     0.16    (0.44, 1.08)
β1      0.0     (-3.08, 3.08)    -0.03     0.15    (-0.33, 0.28)
H00     1.0     (0.16, 2.52)      1.00     0.35    (0.43, 1.80)
H01     0.0     (-0.91, 0.91)    -0.06     0.28    (-0.61, 0.48)
H11     1.0     (0.16, 2.52)      1.19     0.39    (0.56, 2.11)
χ       1.0     (0.12, 2.79)      3.86     1.76    (1.22, 8.26)
ν       2.0     (0.05, 7.38)      26.1     12.2    (8.03, 56.7)



Table 2.4: Posterior summary of the hierarchical SB-V model

        Prior                     Posterior
        Mean    0.95 DI           Mean     Sd      0.95 DI
π       0.1     (0.003, 0.34)     0.16     0.07    (0.05, 0.32)
β0      0.0     (-1.96, 1.96)     0.16     0.05    (0.09, 0.28)
β1      0.0     (-1.96, 1.96)    -0.05     0.17    (-0.40, 0.29)
χ       1.0     (0.12, 2.79)      0.96     0.42    (0.38, 1.99)
ν       2.0     (0.05, 7.38)      4.65     1.59    (2.22, 8.42)


Table 2.5: Posterior summary of the hierarchical SB-LS model

        Prior                     Posterior
        Mean    0.95 DI           Mean     Sd      0.95 DI
π       0.1     (0.003, 0.34)     0.03     0.01    (0.01, 0.07)
σ2      -       (0.14, 19.7)      0.17     0.02    (0.14, 0.21)
β0      0.0     (-3.08, 3.08)     0.53     0.36    (-0.16, 1.21)
β1      0.0     (-3.08, 3.08)    -0.27     0.32    (-0.93, 0.33)
H00     1.0     (0.16, 2.52)      1.63     0.67    (0.62, 3.22)
H01     0.0     (-0.91, 0.91)     0.12     0.53    (-0.98, 1.14)
H11     1.0     (0.16, 2.52)      2.10     0.89    (0.76, 4.42)



Table 2.6: Log marginal likelihoods

Model                                      Log marginal likelihood
Hierarchical SB-LSV                        -122.5
Hierarchical SB-V                          -159.8
Hierarchical SB-LS                         -140.4
Hierarchical DDSB-LSV                      -155.6
Non-hierarchical SB-LSV                    -158.4
Non-hierarchical SB-LSV with π = 0.01      -156.6
AR(1)                                      -160.8
AR(2)                                      -144.2
AR(3)                                      -144.7



2.7.6 Figures

[Figure 2.1 image not reproduced. Panels: inflation rate, break probability, intercept, persistence, standard deviation. Time axis labelled from 196309 to 200706.]

Figure 2.1: Posterior mean of the regression coefficients, the standard deviations and the break probabilities from the hierarchical SB-LSV model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.


[Figure 2.2 image not reproduced. Panels: inflation rate, break probability, intercept, persistence, standard deviation. Time axis labelled from 196309 to 200706.]

Figure 2.2: Posterior mean of the regression coefficients, the standard deviations and the break probabilities from the non-hierarchical SB-LSV model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.


[Figure 2.3 image not reproduced. Panels: inflation rate, break probability, intercept, persistence, standard deviation. Time axis labelled from 196309 to 200706.]

Figure 2.3: Posterior mean of the regression coefficients, the standard deviations and the break probabilities from the hierarchical DDSB-LSV model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.


[Figure 2.4 image not reproduced. Panels: inflation rate, break probability, standard deviation. Time axis labelled from 196309 to 200706.]

Figure 2.4: Posterior mean of the standard deviations and the break probabilities from the hierarchical SB-V model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.


[Figure 2.5 image not reproduced. Panels: inflation rate, break probability, intercept, persistence. Time axis labelled from 196309 to 200706.]

Figure 2.5: Posterior mean of the regression coefficients and the break probabilities from the hierarchical SB-LS model applied to a Canada quarterly inflation series from 1961Q1-2009Q4.

Chapter 3

Modeling Regime Switching and Structural Breaks with an Infinite Dimension Markov Switching Model


3.1 Introduction

This chapter contributes to the current literature by accommodating regime switching

and structural break dynamics in a unified framework. Current regime switching models

are not suitable for capturing instability of dynamics because they assume a finite number

of states and that the future is like the past. Structural break models allow the dynamics

to change over time; however, they may incur a loss in estimation precision because the

past states cannot recur and the parameters in each state are estimated separately. An

infinite dimension Markov switching model is proposed to accommodate both types of

model and provide much richer dynamics. I show how to globally identify structural

breaks versus regime switching. In applications to U.S. real interest rates and inflation,

the new model performs better than the alternative parametric regime switching models

and the structural break models in terms of in-sample fit and out-of-sample forecasts.

The model estimation and forecasting are based on a Bayesian framework.

Regime switching models were first applied by Hamilton (1989b) to U.S. GNP data.

They are an important methodology for modelling nonlinear dynamics and are widely applied to

economic data including business cycles (Hamilton, 1989b), bull and bear markets (Maheu

et al., 2010), interest rates (Ang and Bekaert, 2002a) and inflation (Evans and Wachtel,

1993). Geweke and Amisano (2011) use a hierarchical mixture structure to capture the

dynamics of financial asset returns. There are two common features of these models.

First, past states can recur over time. Second, the number of states is finite (it is usually

2 and at most 4). In the rest of this chapter, a regime switching model is assumed to have

both features. However, the second feature may cause biased out-of-sample forecasts if

sudden changes of the dynamics exist.

In contrast to regime switching models, structural break models can capture dynamic

instability by assuming an infinite or a much larger number of states at the cost of extra

restrictions. For example, Koop and Potter (2007) proposed a structural break model

with an infinite number of states. If there is a change in the data dynamics, it will


be captured by a new state. The restriction in their model is that the parameters in

a new state are different from those in the previous ones. This condition is imposed

for estimation tractability. However, it prevents the data divided by break points from

sharing the same model parameter, and could incur some loss in estimation precision.

In the current literature, structural break models such as Chib (1998), Wang and Zivot

(2000), Pesaran et al. (2006) and Maheu and Gordon (2008) have the same feature as

Koop and Potter (2007); namely that the states cannot recur. In the rest of this chapter,

a structural break model is assumed to have non-recurring states and an infinite or a

large number of states.

As we can see, regime switching and structural break dynamics have different implications

for data fitting and forecasting. What is missing in the current literature is a

method to reconcile them. For instance, a common practice is to use one approach or the

other in applications to specific problems. Levin and Piger (2004) modelled U.S. inflation

as a structural break process while Evans and Wachtel (1993) assumed a two-regime

Markov switching model. Which feature is more important for inflation analysis: regime

switching, structural breaks or both? Garcia and Perron (1996a) used a three-regime

Markov switching model for U.S. real interest rates while Wang and Zivot (2000) applied

a model with structural breaks in mean and volatility. Did the real interest rates in 1981

have distinct dynamics or return to a historical state with the same dynamics? Existing

econometric models have difficulty answering these questions.

This chapter provides a solution by proposing an infinite dimension Markov switching

model. It incorporates regime switching and structural break dynamics in a unified

framework. Recurring states are allowed in order to improve estimation and forecasting precision.

An unknown number of states is embedded in the infinite dimension structure and

estimated endogenously to capture the dynamic instability. Unlike the Bayesian

model averaging methodology, this model combines different dynamics nonlinearly.

The proposed model builds on and extends Fox et al. (2008). They used a Dirichlet


process1 as a prior on the transition probabilities of an infinite hidden Markov switching

model. The key innovation in their work is introducing a sticky parameter that favours

state persistence and avoids the saturation of states. Their model is denoted by FSJW

in the rest of this chapter. Jochmann (2010) applies FSJW to investigate the structural

breaks in the U.S. inflation dynamics.

The contributions of this chapter are as follows. First, a second hierarchical structure

in addition to FSJW is introduced to allow learning and sharing of information for the

parameter of the conditional data density in each state. This approach is labelled as

the sticky double hierarchical Dirichlet process hidden Markov model (SDHDP-HMM).

Second, I present an algorithm to globally define structural breaks versus regime switching

dynamics.2 This is done by avoiding the label switching problem and focusing on label

invariant posterior statistics. Lastly, this chapter provides a detailed comparison of the

new SDHDP-HMM against existing alternative regime switching and structural change

models by out-of-sample density forecasting through a simulation study and two empirical

applications to U.S. real interest rates and inflation. The results show that the SDHDP-

HMM is robust to model uncertainty and superior in forecasting, and the hierarchical

structure on the conditional data density parameters improves out-of-sample performance

significantly.

In the application to U.S. real interest rates, the SDHDP-HMM is compared to the

regime switching model by Garcia and Perron (1996a) in a Bayesian framework and the

structural break model by Wang and Zivot (2000) with minor modifications. The results

of the SDHDP-HMM support Garcia and Perron’s (1996a) finding that the switching

points occurred at the beginning of 1973 (the oil crisis) and the middle of 1981 (the

federal budget deficit) instead of Huizinga and Mishkin’s (1986) finding of October 1979

and October 1982 (both are monetary policy changes). The SDHDP-HMM also identifies

two of the three turning points found by Wang and Zivot (2000). The model comparison

based on the predictive likelihood shows that regime switching dynamics dominate structural

break dynamics for U.S. real interest rates.

1 The Dirichlet process is a commonly used prior in Bayesian nonparametric models.

2 Jochmann (2010) proposes to identify structural breaks, but ignores recurring states in the posterior inference.

The second application is to U.S. inflation. The SDHDP-HMM is compared to the

regime switching model by Evans and Wachtel (1993) in a Bayesian framework and a

structural break model by Chib (1998). This application shows that inflation has features

of both regime switching and structural breaks. The SDHDP-HMM can capture

both features and provide richer dynamics than existing parametric models. The predictive

likelihoods further confirm that it is robust to model uncertainty and superior in

forecasting.

The rest of this chapter is organized as follows: Section 3.2 introduces the Dirichlet

process to make this chapter self-contained. Section 3.3 outlines the sticky double

hierarchical Dirichlet process hidden Markov model and discusses its model structure

and implications. Section 3.4 sketches the posterior sampling algorithm, explains how

to identify the regime switching and the structural break dynamics, and describes the

forecasting method. Section 3.5 compares the SDHDP-HMM to regime switching and

structural break models through simulation. Section 3.6 studies the dynamics of U.S.

real interest rates by revisiting the Markov switching model of Garcia and Perron (1996a)

in the Bayesian framework and the structural break model of Wang and Zivot (2000)

with minor modification, and comparing them to the SDHDP-HMM using an extended

data set. Section 3.7 applies the SDHDP-HMM to U.S. inflation, and compares it to

Evans and Wachtel’s (1993) Markov switching model in a Bayesian framework, Chib’s

(1998) structural break model and Fox et al.’s (2008) model. Section 3.8 concludes.


3.2 Dirichlet process

Before introducing the Dirichlet process, we first define the Dirichlet distribution:

Definition The Dirichlet distribution is denoted by Dir(α), where α is a K-dimensional

vector of positive values. Each sample x from Dir(α) is a K-dimensional vector with

$x_i \in (0, 1)$ and $\sum_{i=1}^{K} x_i = 1$. The probability density function is:

$$p(x \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}.$$

A special case is the Beta distribution denoted by B(α1, α2), which is a Dirichlet

distribution with K = 2.

Define $\alpha_0 = \sum_{i=1}^{K} \alpha_i$. Then $X_i$, the ith element of the random vector X from a Dirichlet

distribution Dir(α), has mean $\alpha_i/\alpha_0$ and variance $\frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)}$. Hence, we can further decompose

α into two parts: a shape parameter $G_0 = (\alpha_1/\alpha_0, \cdots, \alpha_K/\alpha_0)$ and a concentration parameter

$\alpha_0$. The shape parameter G0 represents the center of the random vector X and the

concentration parameter α0 controls how close X is to G0.

The Dirichlet distribution is conjugate to the multinomial distribution in the following

sense: if

$$X \sim Dir(\alpha), \qquad \beta = (n_1, \ldots, n_K) \mid X \sim Mult(X),$$

where $n_i$ is the number of occurrences of i in a sample of $n = \sum_{i=1}^{K} n_i$ points from the

discrete distribution on $\{1, \cdots, K\}$ defined by X, then

$$X \mid \beta = (n_1, \ldots, n_K) \sim Dir(\alpha + \beta).$$

This relationship is used in Bayesian statistics to estimate the hidden parameters X,

given a collection of n samples. Intuitively, if the prior is represented as Dir(α), then

Dir(α + β) is the posterior following a sequence of observations with histogram β.
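As a small numerical illustration of this conjugacy, the Python sketch below adds a histogram of counts to the prior parameter vector and reports the posterior mean. The prior vector, the counts and the function names are made up for the example.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior parameter alpha + beta after observing the histogram counts."""
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean vector alpha_i / alpha_0 of a Dir(alpha) distribution."""
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]


alpha = [1.0, 1.0, 1.0]   # prior Dir(1, 1, 1) over K = 3 categories
counts = [5, 2, 0]        # histogram of n = 7 observations
posterior = dirichlet_posterior(alpha, counts)
posterior_mean = dirichlet_mean(posterior)
```

Here the posterior is Dir(6, 3, 1), so the third category keeps positive posterior mean 0.1 even though it was never observed.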

The Dirichlet process was introduced by Ferguson (1973) as the extension of the

Dirichlet distribution from a finite dimension to an infinite dimension. It is a distribution

of distributions and has two parameters: the shape parameter G0 is a distribution over

a sample space Ω, and the concentration parameter α0 is a positive scalar. They have

similar interpretations as their counterparts in the Dirichlet distribution. The formal

definition is the following:

Definition The Dirichlet process over a set Ω is a stochastic process whose sample path

is a probability distribution over Ω. For a random distribution F distributed according to

a Dirichlet process DP(α0, G0), given any finite measurable partition A1, A2, · · · , AK of

the sample space Ω, the random vector (F (A1), · · · , F (AK)) is distributed as a Dirichlet

distribution with parameters (α0G0(A1), · · · , α0G0(AK)).

Using the results for the Dirichlet distribution, for any measurable set A, the random

variable F(A) has mean $G_0(A)$ and variance $\frac{G_0(A)(1 - G_0(A))}{\alpha_0 + 1}$. The mean implies the shape

parameter G0 represents the centre of a random distribution F drawn from a Dirichlet

process DP(α0, G0). We define ai ∼ F as an observation drawn from the distribution

F . Because by definition P (ai ∈ A | F ) = F (A), we can derive P (ai ∈ A | G0) =

E(P (ai ∈ A | F ) | G0) = E(F (A) | G0) = G0(A). Hence, the shape parameter G0 is also

the marginal distribution of an observation ai. The variance implies the concentration

parameter α0 controls how close the random distribution F is to the shape parameter


G0. The larger α0 is, the more likely F is close to G0, and vice versa.

Suppose there are n observations, a = (a1, · · · , an), drawn from the distribution F .

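Use $\sum_{i=1}^{n} \delta_{a_i}(A_j)$ to represent the number of $a_i$ in set $A_j$, where $A_1, \cdots, A_K$ is a measurable

partition of the sample space Ω and $\delta_{a_i}(A_j)$ is the Dirac measure, where

$$\delta_{a_i}(A_j) = \begin{cases} 1 & \text{if } a_i \in A_j \\ 0 & \text{if } a_i \notin A_j. \end{cases}$$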

Conditional on $(F(A_1), \cdots, F(A_K))$, the vector $\left(\sum_{i=1}^{n} \delta_{a_i}(A_1), \cdots, \sum_{i=1}^{n} \delta_{a_i}(A_K)\right)$ has a multinomial

distribution. By the conjugacy of the Dirichlet distribution to the multinomial

distribution, the posterior distribution of $(F(A_1), \cdots, F(A_K))$ is still a Dirichlet distribution:

$$(F(A_1), \cdots, F(A_K)) \mid a \sim Dir\left(\alpha_0 G_0(A_1) + \sum_{i=1}^{n} \delta_{a_i}(A_1), \cdots, \alpha_0 G_0(A_K) + \sum_{i=1}^{n} \delta_{a_i}(A_K)\right)$$

Because this result is valid for any finite measurable partition, the posterior of F is still

a Dirichlet process by definition, with new parameters $\alpha_0^*$ and $G_0^*$, where

$$\alpha_0^* = \alpha_0 + n$$

$$G_0^* = \frac{\alpha_0}{\alpha_0 + n} G_0 + \frac{n}{\alpha_0 + n} \cdot \frac{\sum_{i=1}^{n} \delta_{a_i}}{n}$$

The posterior shape parameter, G∗0, is the mixture of the prior and the empirical

distribution implied by observations. As n → ∞, the shape parameter of the posterior

converges to the empirical distribution. The concentration parameter α∗0 → ∞ implies

the posterior of F converges to the empirical distribution with probability one. Ferguson

(1973) showed that a random distribution drawn from a Dirichlet process is almost surely

discrete, although the shape parameter G0 can be continuous. Thus, the Dirichlet process


can only be used to model continuous distributions with approximation.

For a random distribution F ∼ DP(α0, G0), because F is almost surely discrete, it can

be represented by two parts: different values $\theta_i$ and their corresponding probabilities

$p_i$, where i = 1, 2, · · · . Sethuraman (1994) derived the stick breaking representation of the

Dirichlet process by writing F ≡ (θ, p), where θ ≡ (θ1, θ2, · · · )′, p ≡ (p1, p2, · · · )′ with

$p_i > 0$ and $\sum_{i=1}^{\infty} p_i = 1$. A draw F ∼ DP(α0, G0) can be generated by

$$V_i \overset{iid}{\sim} Beta(1, \alpha_0) \tag{1}$$

$$p_i = V_i \prod_{j=1}^{i-1} (1 - V_j) \tag{2}$$

$$\theta_i \overset{iid}{\sim} G_0 \tag{3}$$

where i = 1, 2, · · · . In this representation, p and θ are generated independently. The

process generating p, (1) and (2), is called the stick breaking process. The name comes

from the way the $p_i$ are generated. For each i, the remaining probability, $1 - \sum_{j=1}^{i-1} p_j$, is sliced by a

proportion of $V_i$ and given to $p_i$. It is like breaking a stick an infinite number of times.

This chapter uses the notation p ∼ SBP(α0) for this process.
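The stick breaking construction (1)-(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the infinite sequence is cut off after a fixed number of atoms, a standard normal stands in for the shape parameter G0, and the function name is hypothetical.

```python
import random


def stick_breaking(alpha0, n_atoms, rng):
    """First n_atoms weights of p ~ SBP(alpha0), plus the unassigned mass."""
    weights = []
    remaining = 1.0  # 1 - sum_{j<i} p_j, the part of the stick still unbroken
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha0)  # V_i ~ Beta(1, alpha0), equation (1)
        weights.append(v * remaining)     # p_i = V_i * prod_{j<i} (1 - V_j), equation (2)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking(alpha0=2.0, n_atoms=50, rng=rng)
# theta_i ~ G0, equation (3); N(0, 1) is an assumed stand-in for G0.
atoms = [rng.gauss(0.0, 1.0) for _ in weights]
```

The weights and the leftover mass always sum to one, and a smaller α0 puts more of the stick on the first few atoms.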

The Dirichlet process was not widely used for continuous random variables until West

et al. (1994) and Escobar and West (1995) proposed the Dirichlet process mixture model

(DPM). A simple DPM model assumes the distribution of the random variable y is an

infinite mixture of different distributions.

$$p \sim SBP(\alpha_0) \tag{4}$$

$$\theta_i \overset{iid}{\sim} G_0 \quad \text{for } i = 1, 2, \cdots \tag{5}$$

$$g(y) = \sum_{i=1}^{\infty} p_i f(y \mid \theta_i) \tag{6}$$


where g(y) is the probability density function of y and f(y | θi) is some probability

density function depending on θi. For example, if f(y | θi) is the normal distribution

density function and θi represents the mean and variance, y is distributed as an infinite

mixture of normal distributions. Hence, continuous random variables can be modelled

non-parametrically by the DPM model.
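A truncated version of the mixture density (6) with normal components can be evaluated as in the Python sketch below. The truncation level, the choice of G0 (normal means and uniformly distributed standard deviations) and all function names are assumptions made for the illustration, not part of the model in this chapter.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def truncated_sbp(alpha0, n_atoms, rng):
    """Stick breaking weights, with the last atom absorbing the remaining mass."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha0)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def dpm_density(y, weights, params):
    """g(y) = sum_i p_i f(y | theta_i) with theta_i = (mean_i, sd_i)."""
    return sum(p * normal_pdf(y, m, s) for p, (m, s) in zip(weights, params))


rng = random.Random(0)
weights = truncated_sbp(alpha0=2.0, n_atoms=30, rng=rng)
# Assumed G0: mean ~ N(0, 3^2), standard deviation ~ Uniform(0.5, 1.5).
params = [(rng.gauss(0.0, 3.0), rng.uniform(0.5, 1.5)) for _ in weights]
density_at_zero = dpm_density(0.0, weights, params)
```

Because the truncated weights sum to one and each component is a proper density, the resulting g(y) integrates to one.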

3.3 Sticky double hierarchical Dirichlet process hidden Markov model

The DPM model is used for cross sectional data in West et al. (1994), Escobar and West

(1995) and Shahbaba and Neal (2009) because of the exchangeability of the observations.

However, it is not appropriate for time series modelling because of its lack of state

persistence. This chapter extends the work of Fox et al. (2008) to propose the sticky

double hierarchical Dirichlet process hidden Markov model as follows:

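$$\pi_0 \sim SBP(\gamma) \tag{7}$$

$$\pi_i \mid \pi_0 \sim DP(c, (1 - \rho)\pi_0 + \rho\delta_i) \tag{8}$$

$$\lambda \sim G \tag{9}$$

$$\theta_i \overset{iid}{\sim} G_0(\lambda) \tag{10}$$

$$s_t \mid s_{t-1} = i \sim \pi_i \tag{11}$$

$$y_t \mid s_t = j, Y_{t-1} \sim f(y_t \mid \theta_j, Y_{t-1}) \tag{12}$$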

where i, j = 1, 2, · · · , and Yt = (y1, · · · , yt)′ represents the data up to time t.

(7) and (8) comprise the first hierarchical structure which governs the transition

probabilities. π0 is the hierarchical distribution drawn from the stick breaking process

with parameter γ and represents a discrete distribution with support on the natural

numbers. Each infinite dimensional vector πi is drawn from a Dirichlet process with


the concentration parameter c and the shape parameter (1 − ρ)π0 + ρδi, which is a

convex combination of the hierarchical distribution π0 and a degenerate distribution at

integer i. There are three points worth noting. First, because the shape

parameter (1 − ρ)π0 + ρδi has support only on natural numbers and each number is

associated with non-zero probability, the random distribution πi can only take values of

the natural numbers and each value will receive positive probability by the stick breaking

representation. When combining the same values and sorting them in ascending order,

each πi will have πij representing the probability of taking integer j. So we can use the

vector πi = (πi1, πi2, · · · )′ to represent a distribution drawn from DP(c, (1− ρ)π0 + ρδi).

Second, πi is the infinite dimension vector of transition probabilities given the past state

st−1 = i by (11); the probability of transition from state i to state j is πij. Stacking

πis to construct the infinite dimensional transition matrix P = (π′1, π′2, · · · )′ gives the

hidden Markov model representation. Lastly, if ρ is larger, πi is expected to have a larger

probability at integer i. This implies st, the state at time t, is more likely to be the same

as $s_{t-1}$. Hence, ρ captures state persistence. In the rest of this chapter, ρ is referred to as

the sticky coefficient.

(9) and (10) comprise the second hierarchical structure which governs the parameters

of the conditional data density. G0(λ) is the hierarchical distribution from which the

state dependent parameter θi is drawn independently; G is the prior of λ. This structure

provides a way of learning λ from past values of θi to improve estimation and forecasting.

If a new state is born, the conditional data density parameter θnew is drawn from G0(λ).

Without the second hierarchical structure, the new draw θnew depends on some assumed

prior. Pesaran et al. (2006) argued for the importance of modelling the hierarchical

distribution for the conditional data density parameters in the presence of structural breaks.

This chapter adopts their method to estimate the hierarchical distribution G0(λ).

In comparison to the SDHDP-HMM, FSJW is comprised of (7)-(8) and (10)-(12).

The stick breaking representation of the Dirichlet process is not fully explored by FSJW,


since it has only one hierarchical structure on the transition probabilities. In fact, the

stick breaking representation (1)-(3) decomposes the generation of a distribution F from

a Dirichlet process into two independent parts: the probabilities are generated from a

stick breaking process and the parameter values are independently generated from the

shape G0(λ). The SDHDP-HMM takes fuller advantage of this structure than FSJW by

modelling two parallel hierarchical structures.

The SDHDP-HMM can be summarized as an infinite dimension Markov switching

model with a specific prior. Conditional on the hierarchical distribution π0 and the

sticky coefficient ρ, the mean of the transition matrix is

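$$E(P \mid \pi_0, \rho) = (1 - \rho) \cdot \begin{pmatrix} \pi_{01} & \pi_{02} & \pi_{03} & \cdots \\ \pi_{01} & \pi_{02} & \pi_{03} & \cdots \\ \pi_{01} & \pi_{02} & \pi_{03} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} + \rho \cdot \begin{pmatrix} 1 & 0 & 0 & \cdots \\ 0 & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$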

The sticky coefficient ρ captures the state persistence by adding weights to the diagonal

elements of the transition matrix. The concentration parameter c controls how close P

is to $E(P \mid \pi_0, \rho)$.
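For a finite number of states the expected transition matrix is easy to write down explicitly. The Python sketch below does so for three states; the vector π0, the value of ρ and the function name are illustrative assumptions.

```python
def expected_transition_matrix(pi0, rho):
    """E(P | pi0, rho): every row is (1 - rho) * pi0 plus rho on the diagonal."""
    n_states = len(pi0)
    return [[(1.0 - rho) * pi0[j] + (rho if i == j else 0.0)
             for j in range(n_states)]
            for i in range(n_states)]


pi0 = [0.5, 0.3, 0.2]   # a finite stand-in for the hierarchical distribution
EP = expected_transition_matrix(pi0, rho=0.8)
```

With ρ = 0.8 the first row is approximately (0.9, 0.06, 0.04): the sticky coefficient moves most of the mass onto the self transition, while the off-diagonal entries keep the proportions of π0.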

The common practice of setting the prior on the transition matrix of a Markov switch-

ing model assumes each row of the transition matrix is drawn from a Dirichlet distribution

independently. If extended to the infinite dimension, each row πi should be drawn from

a stick breaking process. However, Teh et al. (2006) argued this prior may have an

over-parametrization problem without a hierarchical structure similar to (7) and (8), because

it precludes the $\pi_i$ from sharing information with each other. In terms of parsimony,

the SDHDP-HMM only needs one stick breaking process for the hierarchical distribution

$\pi_0$, instead of assuming an infinite number of stick breaking processes for the

whole transition matrix P. In other words, the hierarchical structure on the transition

probabilities reduces the problem of setting a prior on the infinite dimension matrix P to

that of setting a prior on the infinite dimension vector $\pi_0$.

The SDHDP-HMM is also related to the DPM model (4)-(6), because (12) can be

replaced by

$$y_t \mid s_{t-1} = i, Y_{t-1} \sim \sum_{j=1}^{\infty} \pi_{ij} f(y_t \mid \theta_j, Y_{t-1}). \tag{13}$$

On one hand, the DPM representation implies the SDHDP-HMM is nonparametric. On

the other hand, in contrast to the DPM model, the mixture probability πij is state

dependent. This feature allows the SDHDP-HMM to capture time varying dynamics.

In summary, the SDHDP-HMM is an infinite state space Markov switching model

with a specific form of prior to capture state persistence. Two parallel hierarchical

structures are proposed to provide parsimony and improve forecasting. It preserves the

nonparametric methodology of the DPM model but has state dependent probabilities in

its mixture components.

3.4 Estimation, inference and forecasting

In the following simulation study and applications, the conditional dynamics yt | θj, Yt−1

in (12) is set as a Gaussian AR(q) process:

$$y_t \mid \theta_j, Y_{t-1} \sim N(\phi_{j0} + \phi_{j1} y_{t-1} + \cdots + \phi_{jq} y_{t-q}, \sigma_j^2).$$

By definition, the conditional data density parameter is $\theta_i = (\phi_i', \sigma_i)'$ with $\phi_i = (\phi_{i0}, \phi_{i1}, \cdots, \phi_{iq})'$.

The hierarchical distribution G0(λ) in (10) is assumed to be the standard normal-gamma

distribution of the Bayesian literature.3 The conditional data density parameter θi is

generated as follows:

$$\sigma_i^{-2} \sim G(\chi/2, \nu/2), \qquad \phi_i \mid \sigma_i \sim N(\phi, \sigma_i^2 H^{-1}) \tag{14}$$

3 For example, see Geweke (2009).


By definition, λ = (φ, H, χ, ν). φ is a (q + 1) × 1 vector, H is a (q + 1) × (q + 1) positive

definite matrix, and χ and ν are positive scalars. It is a standard conjugate prior for

linear models. The precision parameter $\sigma_i^{-2}$ is drawn from a gamma distribution with degree

of freedom ν/2 and multiplier χ/2. Given the hierarchical distribution parameter λ, the

conditional mean and variance of $\sigma_i^{-2}$ are $\nu/\chi$ and $2\nu/\chi^2$, respectively. And $\phi_i \mid \sigma_i$

is drawn from a multivariate normal distribution with mean φ and covariance matrix

$\sigma_i^2 H^{-1}$.
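The gamma convention used here (multiplier first, degree of freedom second) differs from the shape and scale arguments of most software, so a quick Monte Carlo check can be useful. The Python sketch below draws precisions with rate χ/2 and shape ν/2 and compares the sample moments with ν/χ and 2ν/χ². The parameter values and the function name are illustrative assumptions, and only the precision part of (14) is simulated.

```python
import random


def draw_precision(chi, nu, rng):
    """Draw sigma^{-2} ~ G(chi/2, nu/2): shape nu/2 and rate chi/2."""
    # random.gammavariate takes (shape, scale); scale = 1 / rate = 2 / chi.
    return rng.gammavariate(nu / 2.0, 2.0 / chi)


rng = random.Random(0)
chi, nu = 1.0, 6.0
draws = [draw_precision(chi, nu, rng) for _ in range(100000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
# Theory for these values: mean nu / chi = 6 and variance 2 * nu / chi**2 = 12.
```

If the two arguments were swapped by mistake, the sample mean would land near χ/ν instead, which makes this a cheap sanity check before running a sampler.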

The prior on the hierarchical parameters λ in (9) follows Pesaran et al. (2006):

$$H \sim W(A_0, a_0) \tag{15}$$

$$\phi \mid H \sim N(m_0, \tau_0 H^{-1}) \tag{16}$$

$$\chi \sim G(d_0/2, c_0/2) \tag{17}$$

$$\nu \sim Exp(\rho_0). \tag{18}$$

H is drawn from a Wishart distribution with parameters of a (q + 1) × (q + 1) positive

definite matrix $A_0$ and a positive scalar $a_0$. Samples from this distribution are positive

definite matrices. The expected value of H is $A_0 a_0$. The variance of $H_{ij}$, the ith row

and jth column element of H, is $a_0(A_{ij}^2 + A_{ii}A_{jj})$, where $A_{ij}$ is the ith row and jth

column element of $A_0$. $m_0$ is a (q + 1) × 1 vector representing the mean of φ, and $\tau_0$ is a

positive scalar, which controls the prior belief of the dispersion of φ. χ is distributed as

a gamma distribution with the multiplier $d_0/2$ and the degree of freedom $c_0/2$. ν has an

exponential distribution with parameter $\rho_0$.

The posterior sampling is based on the block sampler of Fox et al. (2008). It approximates

the infinite number of states by a large but finite number of states, which is more

efficient than the individual sampler.4

4 Consistency of the approximation was proved by Ishwaran and Zarepour (2000), and Ishwaran and Zarepour (2002). Ishwaran and James (2001) compared the individual sampler with the block sampler and found the latter to be more efficient in terms of mixing.


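In order to apply the block sampler following Fox et al. (2008), the SDHDP-HMM is

approximated by a model with a finite number of states as follows:

$$\pi_0 \sim Dir\left(\frac{\gamma}{L}, \cdots, \frac{\gamma}{L}\right) \tag{19}$$

$$\pi_i \mid \pi_0 \sim Dir\left((1 - \rho)c\pi_{01}, \cdots, (1 - \rho)c\pi_{0i} + \rho c, \cdots, (1 - \rho)c\pi_{0L}\right) \tag{20}$$

$$\lambda \sim G \tag{21}$$

$$\theta_i \overset{iid}{\sim} G_0(\lambda) \tag{22}$$

$$s_t \mid s_{t-1} = i \sim \pi_i \tag{23}$$

$$y_t \mid s_t = j, Y_{t-1} \sim N(\phi_{j0} + \phi_{j1} y_{t-1} + \cdots + \phi_{jq} y_{t-q}, \sigma_j^2) \tag{24}$$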

where L is the maximal number of states in the approximation and i = 1, 2, · · · , L. The

hierarchical distribution G0(λ) and its prior are set as (14) and (15)-(18), respectively.

From the empirical point of view, the essence of the SDHDP-HMM is not only its

infinite dimension, but also its sensible hierarchical structure of the prior. If L is large

enough, the finite approximation (19)-(24) is equivalent to the original model (7)-(12) in

practice.
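To make the finite approximation concrete, the Python sketch below draws π0 and the rows of the transition matrix from (19) and (20) and then simulates a state path from (23). The values of L, γ, c and ρ, the small floor on the Dirichlet parameters (added only to keep the gamma sampler well defined) and the function names are all assumptions for the illustration.

```python
import random


def dirichlet(alphas, rng):
    """Draw from Dir(alphas) by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def draw_transition_prior(n_states, gamma, c, rho, rng):
    """Draw pi0 from (19) and each row pi_i of P from (20)."""
    pi0 = dirichlet([gamma / n_states] * n_states, rng)
    P = []
    for i in range(n_states):
        alphas = [max((1.0 - rho) * c * pi0[j] + (rho * c if j == i else 0.0), 1e-12)
                  for j in range(n_states)]
        P.append(dirichlet(alphas, rng))
    return pi0, P


def simulate_states(P, n_periods, rng, s0=0):
    """Simulate s_t | s_{t-1} = i ~ pi_i as in (23), starting from state s0."""
    states = [s0]
    for _ in range(n_periods - 1):
        states.append(rng.choices(range(len(P)), weights=P[states[-1]])[0])
    return states


rng = random.Random(0)
pi0, P = draw_transition_prior(n_states=10, gamma=3.0, c=10.0, rho=0.9, rng=rng)
states = simulate_states(P, n_periods=200, rng=rng)
```

With a large ρ most rows of P put the bulk of their mass on the diagonal, so the simulated path tends to stay in a state for long stretches.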

3.4.1 Estimation

Appendix 3.9 shows the detailed posterior sampling algorithm. The parameter space

is partitioned into four parts: (S, I), (Θ, P, π0), (φ,H, χ) and ν. S, I and Θ are the

collections of $s_t$, a binary auxiliary variable $I_t$ and $\theta_i$, respectively.5 Each part is sampled

conditional on the other parts and the data Y as follows:

1. Sample (S, I) | Θ, P, Y

(a) Sample S | Θ, P, Y by the forward and backward smoother in Chib (1996).

(b) Sample I | S by a Polya Urn scheme.

5 $I_t$ is an auxiliary variable for sampling π0. The details are in Appendix 3.9.


2. Sample (Θ, P, π0) | S, I, Y

(a) Sample Θ | S, Y using standard linear model results.

(b) Sample π0 | I by a Dirichlet distribution.

(c) Sample P | π0, S by Dirichlet distributions.

3. Sample (φ,H, χ) | S,Θ, ν

(a) Sample (φ,H) | S,Θ by conjugacy of the Normal-Wishart distribution.

(b) Sample χ | ν, S,Θ by a gamma distribution.

4. Sample ν | χ, S,Θ by a Metropolis-Hastings algorithm.
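Step 1(a) is a forward filtering, backward sampling pass over the hidden states. The Python sketch below shows the general shape of such a pass for a finite-state model. It is a simplified illustration and not the algorithm of Appendix 3.9: each state has a constant mean instead of an AR(q) regression, the initial state distribution is supplied by the user, and all names and numbers are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def ffbs(y, means, sds, P, init, rng):
    """Forward filter, then sample the state path backwards from T to 1."""
    n_periods, n_states = len(y), len(means)
    filtered = []   # filtered[t][j] = p(s_t = j | y_1, ..., y_t)
    predicted = init
    for t in range(n_periods):
        w = [predicted[j] * normal_pdf(y[t], means[j], sds[j]) for j in range(n_states)]
        total = sum(w)
        f = [x / total for x in w]
        filtered.append(f)
        # One-step-ahead prediction p(s_{t+1} = j | y_1, ..., y_t).
        predicted = [sum(f[i] * P[i][j] for i in range(n_states)) for j in range(n_states)]
    states = [0] * n_periods
    states[-1] = rng.choices(range(n_states), weights=filtered[-1])[0]
    for t in range(n_periods - 2, -1, -1):
        # p(s_t = i | s_{t+1}, y_1..y_t) is proportional to filtered[t][i] * P[i][s_{t+1}].
        w = [filtered[t][i] * P[i][states[t + 1]] for i in range(n_states)]
        states[t] = rng.choices(range(n_states), weights=w)[0]
    return states, filtered


rng = random.Random(0)
y = [0.1, -0.2, 0.0, 5.1, 4.8, 5.2]
P = [[0.9, 0.1], [0.1, 0.9]]
states, filtered = ffbs(y, means=[0.0, 5.0], sds=[1.0, 1.0], P=P, init=[0.5, 0.5], rng=rng)
```

For long series the products of densities can underflow, so a practical implementation would typically work with log densities or rescale at every step; this sketch omits that for brevity.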

After initializing the parameter values, the algorithm is applied iteratively many times to

obtain a large sample of the model parameters. The first block of samples is discarded to

remove dependence on the initial values. The rest of the sample, $\{S^{(i)}, \Theta^{(i)}, P^{(i)}, \pi_0^{(i)}, \phi^{(i)}, H^{(i)}, \chi^{(i)}, \nu^{(i)}\}_{i=1}^{N}$,

is used for inference as if it were drawn from the posterior distribution. Simulation

consistent posterior statistics are computed as sample averages. For example, the

posterior mean of φ, E(φ | Y), is calculated by $\frac{1}{N}\sum_{i=1}^{N} \phi^{(i)}$.

Fox et al. (2008) did not consider the label switching problem, which is an issue in

mixture models.6 For example, switching the values of (θj, πj) and (θk, πk), swapping the

values of state st for st = j, k, while keeping the other parameters unchanged, will result in

the same likelihood value in the finite approximation of the SDHDP-HMM. Inferences on

a label dependent statistic such as θj are misleading without extra constraints. Geweke

(2007) showed that inappropriate constraints can also result in misleading inferences.

To identify regime switching and structural breaks, this chapter uses label invariant

statistics. So the posterior sampling algorithm can be implemented without modification

as suggested by Geweke (2007).

6 See Celeux et al. (2000), Frühwirth-Schnatter (2001) and Geweke (2007).


3.4.2 Identification of regime switching and structural breaks

A heuristic illustration of how an SDHDP-HMM nests different dynamics, including

regime switching and structural breaks, is plotted in Figure 3.1. Each path composed

of arrows is one sample path of state S in an SDHDP-HMM. Figure 3.1a represents the

no state change case (the Gaussian AR(q) model from the assumption). Figures 3.1b-

3.1d are the regime switching, structural break and frequent parameter change cases,

respectively. Figure 3.1e captures more complicated dynamics, in which some states are

only visited for one consecutive period while others are not.

The current literature does not study the identification of regime switching and structural

breaks in infinite dimension Markov switching models. This chapter proposes a

global identification algorithm to identify regime switching and structural breaks based

on whether a state is recurrent or not. Specifically, if a state appears for only one consecutive

period, it is classified as a non-recurrent state. Otherwise, it is defined as recurrent.

The starting time of a recurrent (non-recurrent) state is identified as a regime switching

(structural break) point. In Figure 3.2, states 1 and 4, marked with circles, are non-

recurrent states and the starting points of these two segments are identified as structural

breaks. States 2 and 3, marked with triangles, are recurrent states. The starting time of

each consecutive period is identified as a regime switching point.

Formally, if there exist times $t_0$ and $t_1$ (without loss of generality, let $t_0 \le t_1$) such that $s_t = j$ if and only if $t_0 \le t \le t_1$, then state j is non-recurrent and $t_0$ is identified as a break point. On the other hand, if $s_{t_0} \neq s_{t_0 - 1}$ and $t_0$ is not a break point, then $t_0$ is identified as a regime switching point.
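The classification rule can be sketched in a few lines of Python. This is a minimal illustration for a single realized state path; the function name `classify_change_points` and the 0-based time index are assumptions of the example. The start of a state that occupies exactly one consecutive period is a break point, and any other change of state at t ≥ 1 is a regime switching point.

```python
def classify_change_points(states):
    """Return (break_points, switch_points) as lists of 0-based time indices."""
    # Start index of every maximal run of identical consecutive states.
    starts = [t for t in range(len(states)) if t == 0 or states[t] != states[t - 1]]
    # Number of separate consecutive periods occupied by each state.
    n_runs = {}
    for t in starts:
        n_runs[states[t]] = n_runs.get(states[t], 0) + 1
    # A state with a single run is non-recurrent: its start is a break point.
    breaks = [t for t in starts if n_runs[states[t]] == 1]
    # A change of state at t >= 1 into a recurrent state is a switching point.
    switches = [t for t in starts if n_runs[states[t]] > 1 and t > 0]
    return breaks, switches

# Illustrative path: states 1 and 4 appear once, states 2 and 3 recur.
path = [1, 1, 2, 2, 3, 3, 2, 2, 3, 3, 4, 4]
break_points, switch_points = classify_change_points(path)
```

In a posterior simulation the rule would be applied to each sampled path, and the break and switching probabilities at time t would be the fractions of draws in which t falls in each list.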

This identification criterion is adopted because, in general, states are recurrent in the

regime switching models but non-recurrent in structural break models. There are two

points worth noticing. First, in terms of mathematical statistics, a recurrent (non-

recurrent) state in a Markov chain is defined as a state which will be visited with probabil-

ity one (less than one) in the future. This chapter defines the recurrence (non-recurrence)


as a statistic on one realized posterior sample path of the state variable S. Because the

mathematical definition is not applicable to the estimation with a finite sample size, there

should be no confusion between these two concepts. Second, a true path of states from a

regime switching model can have non-recurrent states because of randomness or a small

sample size. For example, states 2, 3 and 4 in Figure 3.2 can be generated from a three-

regime switching model. The algorithm identifies state 4 as a non-recurrent state, and

its starting point is classified as a break point. Hence, this identification approach may

label a switching point of a regime switching model as a structural break even if the true

states were observed. However, this is simply accidental. As more data are observed, an

embedded regime switching model will have all its states identified as recurrent.

More importantly, the purpose of the identification is not to decompose the infinite

dimension Markov switching model into several regime switching and structural break

sub-models (there is no unique way even if we wanted to), but to study the richer dy-

namics which allow recurrent states while accommodating structural breaks. Even if a

non-recurrent state was generated from a regime switching model, it usually has different

implications from the recurrent states of the same model.

Hence, separating the recurrent and non-recurrent states is both empirically reason-

able and theoretically consistent with the definition of regime switching and structural

breaks of the existing respective models. In the rest of this chapter, the SDHDP-HMM

associates regime switching and structural breaks to recurrent and non-recurrent states.

3.4.3 Forecast and model comparison

Predictive likelihood is used to compare the SDHDP-HMM to the existing regime switch-

ing and structural break models. It is similar to the marginal likelihood by Kass and

Raftery (1995a). Conditional on an initial data set Yt, the predictive likelihood of


$Y_{t+1}^{T} = (y_{t+1}, \cdots, y_T)$ by model $M_i$ is calculated as

$$p(Y_{t+1}^{T} \mid Y_t, M_i) = \prod_{\tau=t+1}^{T} p(y_\tau \mid Y_{\tau-1}, M_i). \qquad (25)$$

It is equivalent to the marginal likelihood p(YT |Mi) if t = 0.

The calculation of one-period predictive likelihood of model Mi, p(yt | Yt−1,Mi), is

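$$p(y_t \mid Y_{t-1}, M_i) = \frac{1}{N} \sum_{i=1}^{N} f(y_t \mid \Upsilon^{(i)}, Y_{t-1}, M_i) \qquad (26)$$

where $\Upsilon^{(i)}$ is one sample of parameters from the posterior distribution conditional on the historical data $Y_{t-1}$. For the SDHDP-HMM, (26) is

$$p(y_t \mid Y_{t-1}) = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{L} \pi_{jk}^{(i)} f(y_t \mid \theta_k^{(i)}, s_{t-1}^{(i)} = j, Y_{t-1}).$$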

After the calculation of the one-period predictive likelihood, p(yt | Yt−1), the data is

updated by adding one observation, yt, and the model is re-estimated for the prediction

of the next period. This is repeated until the last predictive likelihood, p(yT | YT−1), is

obtained.
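The one-period predictive likelihood for the SDHDP-HMM can be sketched as follows, assuming the posterior draws are already available as arrays. The function name, the array layout and the use of a log-sum-exp for numerical stability are choices made for this illustration, not part of the original algorithm description. For each draw, the row of the transition matrix selected by the last sampled state weights the Gaussian AR densities of the L states.

```python
import numpy as np

def log_one_step_predictive(y_t, x_t, pi, s_prev, phi, sigma):
    """Log of the Monte Carlo estimate of p(y_t | Y_{t-1}).

    pi     : (N, L, L) transition matrices, one per posterior draw
    s_prev : (N,) integer state at time t-1 in each draw
    phi    : (N, L, q+1) AR coefficients (intercept first) per draw and state
    sigma  : (N, L) standard deviations per draw and state
    x_t    : (q+1,) regressor (1, y_{t-1}, ..., y_{t-q})
    """
    n_draws = pi.shape[0]
    rows = pi[np.arange(n_draws), s_prev]      # (N, L): row j = s_{t-1} of each draw
    mean = phi @ x_t                           # (N, L): conditional means
    log_dens = (-0.5 * np.log(2 * np.pi) - np.log(sigma)
                - 0.5 * ((y_t - mean) / sigma) ** 2)
    log_terms = np.log(rows) + log_dens        # log of pi_jk * f(y_t | theta_k)
    m = log_terms.max()
    # Sum over states and draws, then divide by the number of draws.
    return m + np.log(np.exp(log_terms - m).sum()) - np.log(n_draws)
```

Summing this quantity over the forecast periods, with the model re-estimated after each new observation, gives the log of (25).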

Kass and Raftery (1995a) compared models Mi and Mj by the difference of their log marginal likelihoods: log(BFij) = log p(Y | Mi) − log p(Y | Mj). They suggested interpreting the evidence for Mi versus Mj as: not worth more than a bare mention for 0 ≤ log(BFij) < 1; positive for 1 ≤ log(BFij) < 3; strong for 3 ≤ log(BFij) < 5; and very strong for log(BFij) ≥ 5. BFij is referred to as the Bayes factor of Mi versus Mj. This chapter uses these criteria for model comparison by predictive likelihood. Geweke

and Amisano (2010) showed the interpretation is the same as Kass and Raftery (1995a)

if we regard the initial data Yt as a training sample.
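The evidence scale can be written as a small helper. The function names below are hypothetical; the thresholds are the ones quoted above, and the two example inputs are log predictive likelihoods reported in Table 3.1.

```python
def log_bayes_factor(log_pl_i, log_pl_j):
    """Difference of log (predictive or marginal) likelihoods of M_i and M_j."""
    return log_pl_i - log_pl_j

def evidence_category(log_bf):
    """Verbal category of the evidence for M_i against M_j, for log_bf >= 0."""
    if log_bf < 0:
        raise ValueError("swap the two models so that log_bf is non-negative")
    if log_bf < 1:
        return "not worth more than a bare mention"
    if log_bf < 3:
        return "positive"
    if log_bf < 5:
        return "strong"
    return "very strong"

# Markov switching DGP: true model against the SDHDP-HMM.
weak = evidence_category(log_bayes_factor(-208.10, -208.32))
# Structural break DGP: true model against the Markov switching model.
decisive = evidence_category(log_bayes_factor(-178.41, -187.26))
```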


3.5 Simulation evidence

To investigate how the SDHDP-HMM reconciles the regime switching and the structural

break models, this section provides some simulation evidence based on three models: the

SDHDP-HMM, a finite Markov switching model, and a structural break model. Each

model simulates a data set of 1000 observations, and all three data sets are estimated by

a SDHDP-HMM with the same prior. First, I plot the posterior means of the conditional

data density parameters E(θst | YT ) and the true values θst over time. If the SDHDP-

HMM fits the model well, the posterior means should be close to the true ones. Second,

a more rigorous study is based on the predictive likelihoods. Each of the three models is

estimated on each of the three simulated data sets. The last 100 observations are used to

calculate the predictive likelihood. If the SDHDP-HMM is able to accommodate the other

two models, its predictive likelihood based on the data simulated from the alternative

model should be close to the predictive likelihood estimated by the true model; and if

the SDHDP-HMM provides richer dynamics than the other two models, its predictive

likelihood based on the data simulated from the SDHDP-HMM should strongly dominate

the predictive likelihoods calculated by the other two models.

The parameters of the SDHDP-HMM in the simulation are set as: γ = 3, c = 10, ρ =

0.9, χ = 2, ν = 2, φ = 0 and H = I. The number of AR lags is set as 2. The simulation is

done through the Polya-Urn scheme without approximation as in Fox et al. (2009). The

simulated data are plotted in Figure 3.3.

The first competitor is a K-state Markov switching model as follows:

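$$(p_{i1}, \cdots, p_{iK}) \sim \mathrm{Dir}(a_{i1}, \cdots, a_{iK}) \qquad (27)$$

$$(\phi_i, \sigma_i) \overset{iid}{\sim} G_0 \qquad (28)$$

$$\Pr(s_t = j \mid s_{t-1} = i) = p_{ij} \qquad (29)$$

$$y_t \mid s_t = j, Y_{t-1} \sim N(\phi_{j0} + \phi_{j1} y_{t-1} + \cdots + \phi_{jq} y_{t-q}, \sigma_j^2) \qquad (30)$$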


where i, j = 1, · · · , K. Each AR process uses 2 lags as in the SDHDP-HMM. The

number of states, K, is set as 3. Conditional data density parameters are φ1 = (0, 0.8, 0),

φ2 = (1,−0.5, 0.2), φ3 = (2, 0.1, 0.3) and (σ1, σ2, σ3) = (1, 0.5, 2). The transition matrix

is set as $P = \begin{pmatrix} 0.96 & 0.02 & 0.02 \\ 0.02 & 0.96 & 0.02 \\ 0.02 & 0.02 & 0.96 \end{pmatrix}$. The simulated data are plotted in Figure 3.4. The

prior of each row of the transition matrix, (pj1, · · · , pjK), is set as independent Dirichlet

distribution Dir(1, · · · , 1). The prior of the conditional data density parameters G0 is

set as the normal-gamma distribution, where $\sigma_i^{-2} \sim G(1, 1)$ and $\phi_i \mid \sigma_i \sim N(0, \sigma_i^2 I)$.
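A minimal sketch of how data could be simulated from the Markov switching model (27)-(30) with the parameter values stated above. The function name, the zero start-up values for the lags, the uniformly drawn initial state and the seed are assumptions of this example, so it will not reproduce the data set plotted in Figure 3.4.

```python
import numpy as np

def simulate_ms_ar(T, P, phi, sigma, rng):
    """Simulate T observations from a K-state Markov switching AR(q) model."""
    K, q = P.shape[0], phi.shape[1] - 1
    y = np.zeros(T + q)                  # first q entries are zero start-up values
    s = np.zeros(T, dtype=int)
    state = rng.integers(K)              # initial state drawn uniformly (assumption)
    for t in range(T):
        if t > 0:
            state = rng.choice(K, p=P[state])
        s[t] = state
        lags = y[t:t + q][::-1]          # (y_{t-1}, ..., y_{t-q})
        mean = phi[state, 0] + phi[state, 1:] @ lags
        y[t + q] = mean + sigma[state] * rng.normal()
    return y[q:], s

P = np.array([[0.96, 0.02, 0.02],
              [0.02, 0.96, 0.02],
              [0.02, 0.02, 0.96]])
phi = np.array([[0.0, 0.8, 0.0],
                [1.0, -0.5, 0.2],
                [2.0, 0.1, 0.3]])
sigma = np.array([1.0, 0.5, 2.0])
y_ms, s_ms = simulate_ms_ar(1000, P, phi, sigma, np.random.default_rng(42))
```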

The second competitor is a K-state structural break model from Chib (1998):

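$$p \sim B(a_p, b_p) \qquad (31)$$

$$\Pr(s_t = i \mid s_{t-1} = i) = \begin{cases} p & \text{if } i < K \\ 1 & \text{if } i = K \end{cases} \qquad (32)$$

$$\Pr(s_t = i + 1 \mid s_{t-1} = i) = 1 - p \quad \text{if } i < K \qquad (33)$$

$$(\phi_i, \sigma_i) \overset{iid}{\sim} G_0 \quad \text{for } i = 1, \cdots, K \qquad (34)$$

$$y_t \mid s_t = i, Y_{t-1} \sim N(\phi_{i0} + \phi_{i1} y_{t-1} + \cdots + \phi_{iq} y_{t-q}, \sigma_i^2) \qquad (35)$$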

where i = 1, 2, · · · , K is the state indicator. The break probability 1 − p and the

number of AR lags are set as 0.003 and 2, respectively. In the simulation, K = 4 and

the parameters of the conditional data density are φ1 = (0, 0.8, 0), φ2 = (1,−0.5, 0.2),

φ3 = (0.5, 0.1, 0.3), φ4 = (0, 0.5, 0.2) and (σ1, σ2, σ3, σ4) = (1, 0.5, 1, 0.5). The simulated

data are plotted in Figure 3.5. K = 5 is used in the estimation to nest the true data

generating process. The prior of p is set as a beta distribution B(9, 1), and G0 is set in

the same way as the Markov switching model of (28).
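The structural break data generating process (31)-(35) can be sketched in the same way. As before, the function name, the zero start-up values and the seed are assumptions of the example. With a break probability of 0.003 and 1000 observations, a given simulated path need not visit all four states.

```python
import numpy as np

def simulate_sb_ar(T, p_stay, phi, sigma, rng):
    """Simulate T observations from a K-state structural break AR(q) model."""
    K, q = phi.shape[0], phi.shape[1] - 1
    y = np.zeros(T + q)                  # first q entries are zero start-up values
    s = np.zeros(T, dtype=int)
    state = 0                            # the chain starts in the first state
    for t in range(T):
        # Move to the next state with probability 1 - p_stay; the last state absorbs.
        if t > 0 and state < K - 1 and rng.random() > p_stay:
            state += 1
        s[t] = state
        lags = y[t:t + q][::-1]          # (y_{t-1}, ..., y_{t-q})
        mean = phi[state, 0] + phi[state, 1:] @ lags
        y[t + q] = mean + sigma[state] * rng.normal()
    return y[q:], s

phi_sb = np.array([[0.0, 0.8, 0.0],
                   [1.0, -0.5, 0.2],
                   [0.5, 0.1, 0.3],
                   [0.0, 0.5, 0.2]])
sigma_sb = np.array([1.0, 0.5, 1.0, 0.5])
y_sb, s_sb = simulate_sb_ar(1000, 1 - 0.003, phi_sb, sigma_sb, np.random.default_rng(7))
```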

All of the three simulated data sets are estimated by the SDHDP-HMM. The pa-

rameters γ, c, ρ and the number of AR lags are set in the same way as in the SDHDP-


HMM used in the simulation. The maximal number of states, L, is assumed as 10.

The priors on the other parameters are weakly informative as follows: $H \sim W(0.2I, 5)$, $\phi \mid H \sim N(0, H^{-1})$, $\chi \sim G(0.5, 0.5)$ and $\nu \sim \mathrm{Exp}(1)$.

The intercept, the persistence parameter (sum of AR coefficients), the standard devi-

ation and the cumulative number of active states of the simulated data from the SDHDP-

HMM over time are plotted in Figure 3.6 using solid lines. The posterior means of those

parameters from the estimation are also plotted for comparison in the same figure using

dashed lines. It is not surprising that the estimated values track the true ones closely

and sharply identify the change points. Because the estimation is based on the finite

approximation, while the simulation is based on the true data generating process, the

results support the validity of the block sampler.

Figure 3.7 plots the true values of the intercept, the persistence, the volatility and the

cumulative number of switches in the simulated data from the Markov switching model

over time using solid lines. It also includes the posterior means of these parameters esti-

mated from the SDHDP-HMM marked with dashed lines. Figure 3.8 plots the true and

the posterior mean of the regime switching and structural break probabilities implied

by the SDHDP-HMM. The SDHDP-HMM sharply identifies almost all the switching

points. From the middle panel, the global identification does not find prominent structural breaks.

Figure 3.9 plots the true parameters from the data simulated from the structural break

model using solid lines and the posterior mean of those parameters estimated from the

SDHDP-HMM using dashed lines. Again, the SDHDP-HMM tracks different parameters

closely. Figure 3.10 plots the true and the posterior mean of the structural break and

regime switching probabilities. The SDHDP-HMM identifies all the break points. The

bottom panel shows some small probabilities of regime switching around the structural

break points. Those values are very small compared to the structural break probabilities.

A more rigorous model comparison can be found in Table 3.1. It shows the log


predictive likelihoods of the last 100 observations estimated by all of the above three

models on all of the three simulated data sets. The SDHDP-HMM is robust to model

misspecification because it is not strongly rejected against the true model by the log

predictive likelihoods. For example, if the true data generating process is the Markov

switching model, the log predictive likelihoods computed by the true model and the

SDHDP-HMM are −208.10 and −208.32, respectively. The difference is only −208.10−

(−208.32) = 0.22 < 1, which is not worth more than a bare mention. On the other hand,

both the Markov switching model and the structural break model are strongly rejected

if the other one is the true model. For example, if the structural break model is the

data generating process, the log predictive likelihoods calculated by the true model and

the Markov switching model are −178.41 and −187.26. Their difference is −178.41 −

(−187.26) = 8.85 > 5, which is very strong against the misspecified model.

In addition to its robustness, the SDHDP-HMM is also able to capture more compli-

cated dynamics than the Markov switching model and the structural break model. If the

SDHDP-HMM is the true data generating process, the Markov switching model and the

structural break model are both rejected strongly. The log predictive likelihood of the

SDHDP-HMM is 12.75 larger than that of the Markov switching model and 94.10 larger than that of the

structural break model. Both values are greater than 5.

In summary, the simulation evidence shows the SDHDP-HMM is robust to model

uncertainty. Both the Markov switching model and the structural break model can be

tracked closely. Meanwhile, the SDHDP-HMM provides richer dynamics than the other two

types of models.

3.6 Application to U.S. real interest rate

The first application is to U.S. real interest rates. Previous studies by Fama (1975); Rose

(1988) and Walsh (1987) tested the stability of their dynamics. While Fama (1975) found


the ex ante real interest rate as a constant, Rose (1988) and Walsh (1987) cannot reject

the existence of an integrated component. Garcia and Perron (1996a) reconciled these

results using a three-regime Markov switching model and found switching points at the

beginning of 1973 (the oil crisis) and the middle of 1981 (the federal budget deficit) using

quarterly U.S. real interest rates of Huizinga and Mishkin (1986) from 1961Q1-1986Q3.

The real interest rate dynamics in each state are characterized by a Gaussian AR(2)

process. Wang and Zivot (2000) used the same data to investigate structural breaks and

found support of four states (3 breaks) by Bayes factors.

This chapter constructs U.S. quarterly real interest rates in the same way as Huizinga

and Mishkin (1986) and extends their data set to a total of 252 observations from 1947Q1

to 2009Q4. The last 200 observations are used for predictive likelihood calculation.

Alternative models for comparison include the Markov switching model of Garcia and

Perron (1996a) put in a Bayesian framework, the structural break model of Wang and

Zivot (2000) with minor modifications and linear AR models. All but the linear model

have the Gaussian AR(2) process in each state as in Garcia and Perron (1996a) and

Wang and Zivot (2000).

The priors of the SDHDP-HMM are set as follows:

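$$\pi_0 \sim \mathrm{Dir}(1/L, \cdots, 1/L)$$

$$\pi_i \mid \pi_0 \sim \mathrm{Dir}(\pi_{01}, \cdots, \pi_{0i} + 9, \cdots, \pi_{0L})$$

$$H \sim W(0.2\,I, 5)$$

$$\phi \mid H \sim N(0, H^{-1})$$

$$\chi \sim G(0.5, 2.5)$$

$$\nu \sim \mathrm{Exp}(5)$$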

where i = 1, · · · , L. The block sampler uses the truncation of L = 10.7 For prior sensitivity, I investigated the model estimates with values of 5 and 10 for γ, 1 and 20 for c and 0.5 for ρ. I also assumed a continuous prior on (γ, c, ρ) to estimate these values. The posterior means of the time-varying parameters are similar and the results of the model comparison are consistent with the original one. The priors for the second hierarchical parameters are kept the same, since they cover a reasonably wide range of the parameter space.

7 L = 10 is chosen to represent a potentially large number of states and keep a reasonable amount of computation. Larger values of L produce similar results.

The Markov switching model used is (27)-(30). Garcia and Perron (1996a) estimated the model in the classical approach and this chapter revisits their paper in the Bayesian framework. The prior of each row of the transition matrix, $(p_{i1}, \cdots, p_{iK})$, is set as Dir(1, · · · , 1). The priors of $\phi_i$ and $\sigma_i$ are $\sigma_i^{-2} \sim G(2.5, 0.5)$ and $\phi_i \mid \sigma_i \sim N(0, \sigma_i^2 \cdot I)$.

The structural break model is (31)-(35). The model proposed in this chapter allows simultaneous breaks of the intercept, the AR coefficients and the volatility, while Wang and Zivot (2000) only allowed the intercept and the volatility to change. The prior of p is a beta distribution B(9, 1), and parameters $\phi_i$ and $\sigma_i$ have the same priors as the Markov switching model.

A linear AR model is applied as a benchmark for model comparison:

$$(\phi, \sigma) \sim G_0 \qquad (36)$$

$$y_t \mid Y_{t-1} \sim N(\phi_0 + \phi_1 y_{t-1} + \cdots + \phi_q y_{t-q}, \sigma^2) \qquad (37)$$

where the prior of σ is set in the same way as in the Markov switching model and the structural break model. The prior of φ | σ is $N(0, \sigma^2 \cdot I)$, where the dimensions of the vector 0 and the identity matrix I depend on the number of lags q in the AR model.

Table 3.2 shows the log predictive likelihoods of the different models. Firstly, the table shows that all linear models are dominated by nonlinear models. Secondly, the log predictive likelihoods strongly support the Markov switching models against the structural break models. The log predictive likelihood of the four-regime or five-regime Markov

switching model is larger than that of any K-regime structural break model by more

than 5, which is very strong based on Kass and Raftery (1995a). Lastly, although the

SDHDP-HMM does not strongly dominate the Markov switching models, it still performs

the best among all the models. This is consistent with the simulation evidence that the

SDHDP-HMM can provide robust forecasts by optimally combining regime switching

and structural breaks in the Bayesian framework.

The whole sample is estimated by the SDHDP-HMM with the same prior as in the

predictive likelihood calculation. Figure 3.11 plots the posterior mean of different param-

eters over time, including the regime switching and structural break probabilities. There

is no sign of structural breaks from the bottom panel, so the regime switching dynamics

prevail over the structural break dynamics, which is consistent with Table 3.2 based on

the predictive likelihoods. Three important regimes are found in the figure: one has high

volatility and high persistence, one has low volatility and intermediate persistence and

the last one has intermediate volatility and low persistence.

Figure 3.12 plots the posterior mean of the cumulative number of active states over

time. A state is defined as active if it is occupied by data. The posterior mean of

the total number of active states is 3.4. Compared to the truncation of L = 10 in the

estimation, this value implies that the finite truncation restriction is not binding, so the

nonparametric flavor is preserved.
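The cumulative number of active states is a label invariant statistic, so it can be averaged directly over the sampled state paths. A minimal sketch, with hypothetical function names and made-up paths:

```python
import numpy as np

def cumulative_active_states(path):
    """Number of distinct states visited up to and including each time point."""
    seen, out = set(), []
    for s in path:
        seen.add(s)
        out.append(len(seen))
    return np.array(out)

def posterior_mean_cumulative_active(paths):
    """Average the cumulative count over posterior draws of the state path."""
    return np.mean([cumulative_active_states(p) for p in paths], axis=0)

# Two illustrative posterior draws of a five-period state path.
draws = [[0, 0, 1, 0, 2],
         [3, 3, 3, 1, 1]]
mean_active = posterior_mean_cumulative_active(draws)
```

The last element of the result is the posterior mean of the total number of active states, the quantity compared with the truncation level L in the text.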

Garcia and Perron (1996a) found switching points at the beginning of 1973 and the

middle of 1981. In the SDHDP-HMM, the probability of regime switching in 1973Q1 is

0.39, which is consistent with their finding. From 1980Q2 to 1981Q1, the probabilities of

regime switching are 0.18, 0.13, 0.32 and 0.19, respectively. There are many uncertainties

in the switching point identification at these times. However, it is quite likely that the

state changed in one of these episodes, which is only slightly earlier than in Garcia and

Perron (1996a). On the other hand, Huizinga and Mishkin (1986) identified October 1979


and October 1982 as the turning points. Probabilities of regime switching or structural

breaks in 1979Q3 and Q4 are less than 0.02 and 0.04 respectively, while in 1982Q3 and

1982Q4 they are both less than 0.01. Thus, the SDHDP-HMM supports Garcia and

Perron (1996a) against Huizinga and Mishkin (1986).

As an attempt to locate potential state changing points, I define a time with the sum of

regime switching and structural break probability greater than 0.3 as a candidate turning

point. There are 9 points in total: 1952Q1, 1952Q3, 1956Q2, 1958Q2, 1973Q1, 1980Q4,

1986Q2, 2002Q1, and 2005Q3. Among those points, 1973Q1 and 1980Q4 are consistent

with Garcia and Perron (1996a). Wang and Zivot (2000) found 1970Q3, 1980Q2 and

1985Q4 as structural break points. 1980Q4 and 1986Q2 are close to their finding. However, the SDHDP-HMM does not identify late 1970 as either a break or a switching

point, which contradicts their result.

In summary, by using a larger sample, U.S. real interest rates are better described by

a regime switching model than a structural break one. The robustness of the SDHDP-

HMM to model uncertainty is supported by the predictive likelihoods. The SDHDP-

HMM performs better than all the parametric alternatives in forecasting.

3.7 Application to U.S. inflation

The second application is to U.S. inflation. Ang et al. (2007) studied the performance

of different methods including time series models, Phillips curve based models, asset

pricing models and surveys. The regime switching model is the best in their most recent

sub-sample. Evans and Wachtel (1993) applied a two-regime Markov switching model

to explain consistent inflation forecast bias. Their model incorporated a random walk

model of Stock and Watson (1991) in one regime and a stationary AR(1) model in

another. Structural breaks in inflation were studied by Groen et al. (2009); Levin and

Piger (2004) and Duffy and Engle-Warnick (2006). Application of the SDHDP-HMM


can reconcile these two types of models and provide more description of the inflation

dynamics.

Monthly inflation rates are constructed from U.S. Bureau of Labor Statistics based

on CPI-U. There are 1152 observations from Feb 1914 to Jan 2010. They are computed

as annualized monthly CPI-U growth rates scaled by 100. The alternative models for

comparison include the FSJW, the regime switching model of Evans and Wachtel (1993),

a structural break model from Chib (1998) and linear Gaussian AR(q) models.

For the SDHDP-HMM, each state has Gaussian AR(1) dynamics. L = 10 and the

priors are:

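$$\pi_0 \sim \mathrm{Dir}(1/L, \cdots, 1/L)$$

$$\pi_i \mid \pi_0 \sim \mathrm{Dir}(\pi_{01}, \cdots, \pi_{0i} + 9, \cdots, \pi_{0L})$$

$$H \sim W(0.2\,I, 5)$$

$$\phi \mid H \sim N(0, H^{-1})$$

$$\chi \sim G(0.5, 2.5)$$

$$\nu \sim \mathrm{Exp}(5)$$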

with i = 1, · · · , L.8

In FSJW, each state has Gaussian AR(1) dynamics and the number of states L =

10, as in the SDHDP-HMM, to use the block sampler. The priors of the transition

probabilities are the same as in the SDHDP-HMM. The prior on the parameters of

conditional data density is normal-gamma: $\sigma_i^{-2} \sim G(0.5, 0.5)$ and $\phi_i \mid \sigma_i \sim N(0, \sigma_i^2 I)$.

For comparison, the structural break model of (31)-(35) is also applied with the number of the AR lags equal to 1. The prior of p is a beta distribution B(9, 1); and the priors of $\phi_i$ and $\sigma_i$ are the same as in FSJW.

8 The prior sensitivity check is the same as in the real interest rate application. The posterior means of the time varying parameters are similar and the model comparison results are consistent with the original one.

Another alternative model is the regime switching model of Evans and Wachtel (1993):

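$$P(s_t = i \mid s_{t-1} = i) = p_i$$

$$(\phi_0, \sigma_0) \sim G_0$$

$$\sigma_1 \sim G_1$$

$$y_t \mid s_t = 0, Y_{t-1} \sim N(\phi_{00} + \phi_{01} y_{t-1}, \sigma_0^2)$$

$$y_t \mid s_t = 1, Y_{t-1} \sim N(y_{t-1}, \sigma_1^2)$$

where i = 0, 1. The prior of the self-transition probability, $p_i$, is a beta distribution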

B(9, 1). φ0, σ0, and σ1 have the same priors as FSJW and the structural break model.

The linear AR model of (36) and (37) is applied as a benchmark for model comparison.

The prior of σ is set the same as in FSJW, the Markov switching model and the structural

break model. The prior of φ | σ is N(0, σ2 · I), where the dimension of the vector 0 and

the identity matrix I depend on the number of lags q in the AR model.

The last 200 observations are used to calculate the log predictive likelihoods. The

results are shown in Table 3.3. First, the linear models are strongly dominated by the

nonlinear models. Second, the regime switching model of Evans and Wachtel (1993)

strongly dominates the structural break models. Third, FSJW strongly dominates all

the other parametric alternatives including the regime switching model. The difference between the log predictive likelihoods of FSJW and the regime switching model is −82.45 − (−92.50) = 10.05, which implies heuristically FSJW is exp(10.05) ≈ 23,156 times better than the Evans and Wachtel (1993) model. Last, the SDHDP-HMM is the best model in terms of the log predictive likelihood. The difference of the log predictive likelihoods of the SDHDP-HMM and FSJW is −74.07 − (−82.45) = 8.38, which implies the SDHDP-HMM is exp(8.38) ≈ 4359 times better than FSJW. Because the SDHDP-HMM nests the parametric alternatives, its dominance can be attributed to the fact that both the regime switching and the structural break dynamics are important for inflation, and each single type of parametric model alone cannot capture its dynamics.

The models are estimated on the whole sample. The posterior summary statistics

are located in Table 3.4. The posterior mean of the persistence parameter is 0.97 with

a 95% density interval of (0.742, 1.199), which implies the inflation dynamics are likely

to be persistent in a new state. On the other hand, FSJW draws the parameters of

the conditional data density for each new state from the prior assumption. This key

difference contributes to the superior forecasting ability of the SDHDP-HMM over FSJW.

The smoothed means of conditional data density parameters, break probabilities and

switching probabilities over time for the SDHDP-HMM are in Figure 3.13. The instability

of the dynamics is consistent with Jochmann (2010). The last panel plots the structural

breaks and regime switching probabilities at different times. There are two major breaks

at 1920-07 and 1930-05. The structural break and regime switching probabilities of

1920-07 are 0.3 and 0.5, respectively. There is quite a large chance for this time to have

unique dynamics different from other periods. For 1930-05, the structural break and

regime switching probabilities are 0.13 and 0.09. This implies that if the state changed

at this time, it would be more likely to be a structural break.

To illustrate the dominance of the regime switching dynamics over the structural

break dynamics, Figure 3.14 plots the probabilities of past states to be the same as

the last period, Jan 2010, or $p(z_\tau = z_{201001} \mid Y)$. Most of the positive probabilities are

before 1955. This emphasizes the importance of modelling recurrent states in forecasting.

Structural break models perform worse than the SDHDP-HMM and the regime switching

model because they drop much useful information.
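The probability that a past period shares its state with the last period is again label invariant and can be estimated as a simple frequency over posterior draws of the state path. A minimal sketch, with a hypothetical function name and made-up draws:

```python
import numpy as np

def prob_same_state_as_last(paths):
    """Fraction of posterior draws in which each period shares the final period's state."""
    paths = np.asarray(paths)
    # Compare every column with the last column, draw by draw, then average over draws.
    return (paths == paths[:, [-1]]).mean(axis=0)

# Two illustrative posterior draws of a four-period state path.
draws = [[0, 1, 1, 0],
         [2, 2, 3, 2]]
same_as_last = prob_same_state_as_last(draws)
```

A structural break model forces these probabilities to zero for every period before the last break, which is the loss of information referred to above.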

Figure 3.15 plots the smoothed regression coefficients, standard deviations and break

probabilities over time estimated by the structural break model with K = 10. Structural

breaks happened in the first half of the sample, therefore the recent regime switching

implied by the SDHDP-HMM is not identified.


Figure 3.16 plots the smoothed probabilities of the random walk state and the smoothed

volatility estimated by the regime switching model of Evans and Wachtel (1993) over time.

The random walk dynamics dominate after 1953. In recent times, inflation dynamics en-

tered into the stationary AR(1) state. This is consistent with the SDHDP-HMM evidence

shown in Figure 3.14 that the most recent episodes are associated with data before 1955.

In other words, there is a regime change back to the same state in the past.

In Figure 3.17, all regime switching and structural break probabilities are plotted

for comparison. The first panel is the regime switching model; the second panel is the

structural break model and the last is the SDHDP-HMM. Two features can be summa-

rized from the figure. First, the state changes identified by the structural break model

and the regime switching model are associated with the state changes identified by the

SDHDP-HMM. Second, the SDHDP-HMM estimates more turning points than each of

the alternative models. This implies it captures some dynamics that cannot be identified by the regime switching or the structural break models alone. Together with the

log predictive likelihood results in Table 3.3, inflation shows both regime switching and

structural break features.

In summary, the regime switching and the structural break dynamics are both impor-

tant for inflation modelling and forecasting. The SDHDP-HMM is able to capture both

of these features. In the SDHDP-HMM, the parameters of the conditional data density in

each state can provide information for the learning of the hierarchical distribution G0(λ)

and significantly improve forecasting.

3.8 Conclusion

This chapter proposes to apply an infinite dimension Markov switching model labelled as

the sticky double hierarchical Dirichlet process hidden Markov model (SDHDP-HMM) to

accommodate regime switching and structural break dynamics. Two parallel hierarchical


structures, one governing the transition probabilities and the other governing the param-

eters of the conditional data density, are imposed for parsimony and to improve forecasts.

An algorithm for the global identification of regime switching and structural breaks is

proposed based on label invariant statistics. A simulation study shows the SDHDP-HMM

is robust to model uncertainty and able to capture more complicated dynamics than the

regime switching and the structural break models.

Applications to U.S. real interest rates and inflation show the SDHDP-HMM is robust

to model uncertainty and provides better forecasts than regime switching and structural

break models. The second hierarchical structure on the data density parameters provides

significant improvement in inflation forecasting. From both the predictive likelihood

results and the posterior probabilities of regime switching and structural breaks, U.S.

real interest rates are better described by a regime switching model while inflation has

both features of regime switching and structural breaks.


3.9 Appendix

3.9.1 Sample (S, I) | Θ, P, Y

S | Θ, P, Y is sampled by the forward and backward smoother in Chib (1996).

I is introduced to facilitate the π0 sampling. From (19) and (20), the filtered distribution of $\pi_i$ conditional on $S_t = (s_1, \cdots, s_t)$ and $\pi_0$ is a Dirichlet distribution:

$$\pi_i \mid S_t, \pi_0 \sim \mathrm{Dir}\left(c(1-\rho)\pi_{01} + n_{i1}^{(t)}, \cdots, c(1-\rho)\pi_{0i} + c\rho + n_{ii}^{(t)}, \cdots, c(1-\rho)\pi_{0L} + n_{iL}^{(t)}\right)$$

where $n_{ij}^{(t)}$ is the number of $\tau \le t$ such that $s_\tau = j$ and $s_{\tau-1} = i$. Integrating out $\pi_i$, the conditional distribution of $s_{t+1}$ given $S_t$ and $\pi_0$ is:

$$p(s_{t+1} = j \mid s_t = i, S_t, \pi_0) \propto c(1-\rho)\pi_{0j} + c\rho\,\delta_i(j) + n_{ij}^{(t)}$$

Construct a variable It with a Bernoulli distribution:

$$p(I_{t+1} \mid s_t = i, S_t, \pi_0) \propto \begin{cases} c\rho + \sum_{j=1}^{L} n_{ij}^{(t)} & \text{if } I_{t+1} = 0 \\ c(1-\rho) & \text{if } I_{t+1} = 1 \end{cases}$$

Construct the conditional distribution:

$$p(s_{t+1} = j \mid I_{t+1} = 0, s_t = i, S_t, \pi_0) \propto n_{ij}^{(t)} + c\rho\,\delta_i(j)$$

$$p(s_{t+1} = j \mid I_{t+1} = 1, s_t = i, S_t, \pi_0) \propto \pi_{0j}$$

This construction preserves the same conditional distribution of st+1 given St and π0.

To sample I | S, use the Bernoulli distribution:

$$I_{t+1} \mid s_t = i, s_{t+1} = j, \pi_0 \sim \mathrm{Ber}\left(\frac{c(1-\rho)\pi_{0j}}{n_{ij}^{(t)} + c\rho\,\delta_i(j) + c(1-\rho)\pi_{0j}}\right).$$
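A minimal sketch of the draw of I given S, assuming states are coded 0, ..., L−1 and that the transition counts are accumulated sequentially along the path. The function name and the convention of leaving the first element of I at zero are assumptions of this example.

```python
import numpy as np

def sample_auxiliary_I(s, pi0, c, rho, rng):
    """Draw I_{t+1} for every transition of the state path s."""
    L = len(pi0)
    counts = np.zeros((L, L))            # n_ij^{(t)}: transitions i -> j observed so far
    I = np.zeros(len(s), dtype=int)      # I[0] is unused and left at 0 (assumed convention)
    for t in range(len(s) - 1):
        i, j = s[t], s[t + 1]
        new = c * (1 - rho) * pi0[j]                            # mass from the base measure pi0
        old = counts[i, j] + (c * rho if i == j else 0.0)       # mass from counts and stickiness
        I[t + 1] = rng.random() < new / (old + new)
        counts[i, j] += 1
    return I

rng = np.random.default_rng(0)
I_example = sample_auxiliary_I([0, 1, 1, 0, 2, 2], np.array([0.2, 0.3, 0.5]),
                               c=10.0, rho=0.9, rng=rng)
```

In this sketch the first time a transition between two different states is observed, the count and the sticky term are both zero, so the indicator equals one with certainty: such a transition can only have come from π0.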


3.9.2 Sample (Θ, P, π0) | S, I, Y

After sampling I and S, write $m_i = \sum_{s_t = i} I_t$. By construction, the conditional posterior of π0 given S and I only depends on I and is a Dirichlet distribution by conjugacy:

$$\pi_0 \mid S, I \sim \mathrm{Dir}\left(\frac{\gamma}{L} + m_1, \ldots, \frac{\gamma}{L} + m_L\right)$$

This approach of sampling π0 is simpler than Fox et al. (2009).

Conditional on π0 and S, the sampling of πi is straightforward by conjugacy:

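$$\pi_i \mid \pi_0, S \sim \mathrm{Dir}\left(c(1-\rho)\pi_{01} + n_{i1}, \cdots, c(1-\rho)\pi_{0i} + c\rho + n_{ii}, \cdots, c(1-\rho)\pi_{0L} + n_{iL}\right)$$

where $n_{ij}$ is the number of τ such that $s_\tau = j$ and $s_{\tau-1} = i$.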
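The two conjugate Dirichlet draws can be sketched with NumPy's `Generator.dirichlet`. The function names and the 0-based state coding are assumptions of this example; the concentration parameters follow the two expressions above.

```python
import numpy as np

def sample_pi0(s, I, gamma, L, rng):
    """Draw pi0 | S, I ~ Dir(gamma/L + m_1, ..., gamma/L + m_L)."""
    m = np.zeros(L)
    for t in range(1, len(s)):
        m[s[t]] += I[t]                  # m_i: sum of I_t over periods with s_t = i
    return rng.dirichlet(gamma / L + m)

def sample_transition_rows(s, pi0, c, rho, rng):
    """Draw every row pi_i | pi0, S of the transition matrix."""
    L = len(pi0)
    n = np.zeros((L, L))
    for t in range(1, len(s)):
        n[s[t - 1], s[t]] += 1           # n_ij: number of transitions i -> j
    P = np.empty((L, L))
    for i in range(L):
        alpha = c * (1 - rho) * pi0 + n[i]
        alpha[i] += c * rho              # sticky mass on the self-transition
        P[i] = rng.dirichlet(alpha)
    return P

rng = np.random.default_rng(0)
s_path = [0, 0, 1, 1, 2, 0]
I_path = [0, 0, 1, 0, 1, 0]
pi0_draw = sample_pi0(s_path, I_path, gamma=3.0, L=3, rng=rng)
P_draw = sample_transition_rows(s_path, pi0_draw, c=10.0, rho=0.9, rng=rng)
```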

Sampling Θ | S, Y uses the results of regular linear models. The prior is:

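$$(\phi_i, \sigma_i^{-2}) \sim \mathrm{N\text{-}G}(\phi, H, \chi, \nu).$$

By conjugacy, the posterior is:

$$(\phi_i, \sigma_i^{-2}) \mid S, Y \sim \mathrm{N\text{-}G}(\bar\phi_i, \bar H_i, \bar\chi_i, \bar\nu_i)$$

with:

$$\bar\phi_i = \bar H_i^{-1}(H\phi + X_i' Y_i)$$

$$\bar H_i = H + X_i' X_i$$

$$\bar\chi_i = \chi + Y_i' Y_i + \phi' H \phi - \bar\phi_i' \bar H_i \bar\phi_i$$

$$\bar\nu_i = \nu + n_i$$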

where Yi is the collection of yt in state i. xt = (1, yt−1, · · · , yt−q) is the regressor in the


AR(q) model. Xi and ni are the collection of xt and the number of observations in state

i, respectively.
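A sketch of the conjugate update and draw for one state. It assumes the normal-gamma parameterization in which the precision σ_i^{-2} has a gamma distribution with shape ν/2 and rate χ/2, and φ_i given σ_i is normal with mean φ and covariance σ_i² H^{-1}; this reading of N-G(φ, H, χ, ν), like the function names, is an assumption of the example.

```python
import numpy as np

def normal_gamma_posterior(X, y, phi0, H0, chi0, nu0):
    """Posterior normal-gamma parameters for the observations assigned to one state."""
    H_bar = H0 + X.T @ X
    phi_bar = np.linalg.solve(H_bar, H0 @ phi0 + X.T @ y)
    chi_bar = chi0 + y @ y + phi0 @ H0 @ phi0 - phi_bar @ H_bar @ phi_bar
    nu_bar = nu0 + len(y)
    return phi_bar, H_bar, chi_bar, nu_bar

def draw_state_parameters(phi_bar, H_bar, chi_bar, nu_bar, rng):
    """Draw (phi_i, sigma_i) from the normal-gamma posterior."""
    # Precision ~ Gamma(shape = nu/2, rate = chi/2); NumPy uses scale = 1/rate.
    precision = rng.gamma(shape=nu_bar / 2, scale=2 / chi_bar)
    cov = np.linalg.inv(H_bar) / precision
    phi = rng.multivariate_normal(phi_bar, cov)
    return phi, 1 / np.sqrt(precision)

rng = np.random.default_rng(3)
X_i = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])   # illustrative regressors
y_i = X_i @ np.array([1.0, -0.5, 0.2]) + 0.5 * rng.normal(size=20)
post = normal_gamma_posterior(X_i, y_i, np.zeros(3), np.eye(3), 2.0, 2.0)
phi_draw, sigma_draw = draw_state_parameters(*post, rng)
```

With no observations in a state the update returns the prior parameters, which is how parameters of inactive states would be drawn under this reading.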

3.9.3 Sample (φ,H, χ) | S,Θ, ν

The conditional posterior is:

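$$\phi, H \mid \{\phi_i, \sigma_i\}_{i=1}^{K} \sim \mathrm{N\text{-}W}(m_1, \tau_1, A_1, a_1)$$

where K is the number of active states. $\phi_i$ and $\sigma_i$ are the parameters associated with these states:

$$m_1 = \frac{1}{\tau_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}} \left(\tau_0^{-1} m_0 + \sum_{i=1}^{K} \sigma_i^{-2} \phi_i\right)$$

$$\tau_1 = \frac{1}{\tau_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2}}$$

$$A_1 = \left(A_0^{-1} + \sum_{i=1}^{K} \sigma_i^{-2} \phi_i \phi_i' + \tau_0^{-1} m_0 m_0' - \tau_1^{-1} m_1 m_1'\right)^{-1}$$

$$a_1 = a_0 + K.$$

The conditional posterior of χ is:

$$\chi \mid \nu, \{\sigma_i\}_{i=1}^{K} \sim G(d_1/2, c_1/2)$$

with $d_1 = d_0 + \sum_{i=1}^{K} \sigma_i^{-2}$ and $c_1 = c_0 + K\nu$.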


3.9.4 Sample ν | χ, S,Θ

The conditional posterior of ν has no regular density form:

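$$p(\nu \mid \chi, \{\sigma_i\}_{i=1}^{K}) \propto \left(\frac{(\chi/2)^{\nu/2}}{\Gamma(\nu/2)}\right)^{K} \left(\prod_{i=1}^{K} \sigma_i^{-2}\right)^{\nu/2} \exp\left\{-\frac{\nu}{\rho_0}\right\}.$$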

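The Metropolis-Hastings method is applied to sample ν. Draw a new ν from a proposal distribution:

$$\nu \mid \nu' \sim G\left(\frac{\zeta_\nu}{\nu'}, \zeta_\nu\right)$$

with acceptance probability

$$\min\left\{1, \frac{p(\nu \mid \chi, \{\sigma_i\}_{i=1}^{K})\, f_G(\nu'; \zeta_\nu/\nu, \zeta_\nu)}{p(\nu' \mid \chi, \{\sigma_i\}_{i=1}^{K})\, f_G(\nu; \zeta_\nu/\nu', \zeta_\nu)}\right\},$$

where ν′ is the value from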

the previous sweep. ζν is fine tuned to produce a reasonable acceptance rate around 0.5,

as suggested by Roberts et al. (1997) and Muller (1991).
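One Metropolis-Hastings update of ν can be sketched in pure Python as below, working on the log scale for numerical stability. Two points are assumptions rather than statements from the text: the proposal is read as a gamma distribution with shape ζν ν′ and rate ζν, so that its mean is the previous value ν′, and all function names are hypothetical.

```python
import math
import random

def log_target(nu, chi, precisions, rho0):
    """Log of the unnormalised conditional posterior of nu."""
    K = len(precisions)
    log_prod = sum(math.log(w) for w in precisions)  # log prod_i sigma_i^{-2}
    return (K * (0.5 * nu * math.log(chi / 2.0) - math.lgamma(nu / 2.0))
            + 0.5 * nu * log_prod - nu / rho0)

def log_gamma_pdf(x, shape, rate):
    """Log density of a gamma distribution in the (shape, rate) parametrisation."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def mh_step_nu(nu_prev, chi, precisions, rho0, zeta, rng=random):
    """One update of nu; returns the new value and whether the proposal was accepted."""
    # Assumed proposal: shape zeta * nu_prev, rate zeta (mean nu_prev)
    nu_new = rng.gammavariate(zeta * nu_prev, 1.0 / zeta)
    if nu_new <= 0.0:                      # guard against numerical underflow
        return nu_prev, False
    log_ratio = (log_target(nu_new, chi, precisions, rho0)
                 + log_gamma_pdf(nu_prev, zeta * nu_new, zeta)
                 - log_target(nu_prev, chi, precisions, rho0)
                 - log_gamma_pdf(nu_new, zeta * nu_prev, zeta))
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return (nu_new if accept else nu_prev), accept
```

Running the step repeatedly and recording the share of accepted proposals gives the acceptance rate that `zeta` is tuned against; a larger `zeta` concentrates the proposal around the previous value and usually raises that rate.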


3.9.5 Tables

Table 3.1: Log predictive likelihoods in simulation study

                 Estimated model
DGP              SDHDP-HMM    MS         SB
SDHDP-HMM        -170.55      -183.30    -264.65
MS               -208.32      -208.10    -212.07
SB               -179.51      -187.26    -178.41

The SDHDP-HMM is (7)-(12); the MS is the 3-state Markov switching model of (27)-(30); and the SB is the 4-state structural break model of (31)-(35). 1000 observations are simulated from each model and the last 100 are used to calculate the predictive likelihoods. The first column shows the names of the data generating processes. The first row shows the names of the estimated models.


Table 3.2: Log predictive likelihoods of U.S. real interest rates

AR(q)        q=2        q=3        q=4
             -457.62    -451.07    -455.97

MS(K)        K=3        K=4        K=5
             -433.09    -426.62    -424.51

SB(K)        K=3        K=4        K=5        K=10       K=15       K=20
             -450.82    -451.62    -437.28    -433.50    -432.69    -434.24

SDHDP-HMM    -423.50

There are 252 observations from 1947Q1 to 2009Q4 for U.S. quarterly real interest rate. The last 200 observations are used to calculate the predictive likelihoods. MS(K) is the K-state Markov switching model of (27)-(30) and SB(K) is the K-state structural break model of (31)-(35). For the SDHDP-HMM, MS(K) and SB(K), each state has Gaussian AR(2) dynamics.

Table 3.3: Log predictive likelihoods of U.S. inflation

AR(q)        q=1        q=2        q=3
             -185.06    -173.17    -173.42

MS           -92.50

SB(K)        K=3        K=5        K=10
             -125.50    -98.69     -101.18

FSJW         -82.45

SDHDP-HMM    -74.07

There are 1153 observations from Feb 1914 to Jan 2010 for U.S. monthly inflation rate. The last 200 observations are used to calculate the predictive likelihoods. MS is the 2-state Markov switching model of Evans and Wachtel (1993); SB(K) is the K-state structural break model of (31)-(35); and the FSJW is Fox et al.'s (2008) model (or the SDHDP-HMM without the hierarchical structure of G0 on the conditional data density parameters). For the SDHDP-HMM, FSJW, MS and SB(K), each state has Gaussian AR(1) dynamics.
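Because the entries are log predictive likelihoods over the same 200 observations, the difference between two entries can be read as a log predictive Bayes factor. The short sketch below tabulates these differences for the SDHDP-HMM against the best-performing specification of each competitor in Table 3.3; the dictionary keys and the function name are illustrative choices, not notation from the thesis.

```python
# Log predictive likelihoods copied from Table 3.3
# (best q for AR, best K for SB).
log_pl = {
    "AR(2)": -173.17,
    "MS": -92.50,
    "SB(5)": -98.69,
    "FSJW": -82.45,
    "SDHDP-HMM": -74.07,
}

def log_bayes_factors(table, reference):
    """Log predictive Bayes factor of `reference` against every other model."""
    return {name: round(table[reference] - value, 2)
            for name, value in table.items() if name != reference}

lbf = log_bayes_factors(log_pl, "SDHDP-HMM")
```

For instance, the entry for FSJW is -74.07 - (-82.45) = 8.38, the gain in log predictive likelihood from adding the hierarchical structure on the data density parameters.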


Table 3.4: Posterior summary of the SDHDP-HMM parameters estimated from U.S. inflation

         mean    Std     95% DI
φ0       0.03    0.20    (-0.376, 0.432)
φ1       0.97    0.11    (0.742, 1.199)
H00      0.77    0.42    (0.225, 1.788)
H01      0.02    0.35    (-0.692, 0.734)
H11      2.06    0.84    (0.768, 4.047)
χ        0.19    0.12    (0.034, 0.488)
ν        1.21    0.50    (0.496, 2.414)

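There are 1153 observations from Feb 1914 to Jan 2010 for U.S. monthly inflation rate. Each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \sigma_{s_t} \varepsilon_t$. The parameters $\phi_i$ and $\sigma_i$ are drawn from the hierarchical distribution: $\sigma_i^{-2} \sim \mathrm{G}(\chi/2, \nu/2)$ and $\phi_i \mid \sigma_i \sim \mathrm{N}(\phi, \sigma_i^{2} H^{-1})$.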


3.9.6 Figures


[Figure 3.1 image: five panels, (a) to (e), each showing columns of circles labelled 1 to 8 arranged over time.]

Figure 3.1: The horizontal dimension from left to right represents time and the vertical circles represent different states. The numbers in the circles are the labels of the states and they are exchangeable. The SDHDP-HMM nests: (a) no state change, (b) regime switching, (c) structural breaks, (d) frequent parameter change and (e) regime switching and structural breaks.


[Figure 3.2 image: states plotted against time, with arrows annotated "structural break" and "regime switching".]

Figure 3.2: Example of the global identification of regime switching and structural breaks. All the points represent one sample of the states $(s_1, \ldots, s_T)$ from the posterior samples. The circles are non-recurrent states which only appear for one consecutive period and the triangles are recurrent states. The solid arrows point to the break points and the dashed arrows point to the switching points.


[Figure 3.3 image: simulated series plotted against time, 0 to 1000.]

Figure 3.3: Data simulated by an SDHDP-HMM. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$.


[Figure 3.4 image: simulated series plotted against time, 0 to 1000.]

Figure 3.4: Data simulated by a 3-state Markov switching model of (27)-(30). Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$.


[Figure 3.5 image: simulated series plotted against time, 0 to 1000.]

Figure 3.5: Data simulated by a structural break model of (31)-(35). Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$.


[Figure 3.6 image: four panels plotted against Time (Intercept, Persistence, Std, Number of states), each with True and Estimated lines.]

Figure 3.6: The SDHDP-HMM is estimated on the data from Figure 3.3. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$. The solid lines are the true values and the dashed lines are the posterior means of those values estimated by the SDHDP-HMM. The top-left panel plots the intercepts $\phi_{s_t 0}$; the top-right panel plots the persistence parameters $\phi_{s_t 1} + \phi_{s_t 2}$; the bottom-left panel plots the conditional standard deviations $\sigma_{s_t}$; and the bottom-right panel plots the cumulative number of active states (a state is active once it has been visited at least once).


[Figure 3.7 image: four panels plotted against Time (Intercept, Persistence, Std, Number of switching), each with True and Estimated lines.]

Figure 3.7: The SDHDP-HMM is estimated on the data in Figure 3.4. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$. The solid lines are the true values and the dashed lines are the posterior means of those values estimated by the SDHDP-HMM. The top-left panel plots the intercepts $\phi_{s_t 0}$; the top-right panel plots the persistence parameters $\phi_{s_t 1} + \phi_{s_t 2}$; the bottom-left panel plots the conditional standard deviations $\sigma_{s_t}$; and the bottom-right panel plots the cumulative number of regime switches.


[Figure 3.8 image: three panels over time, 0 to 1000, with vertical axes from 0 to 1: "True probabilities of regime switching", "Break probabilities" and "Switching probabilities".]

Figure 3.8: Globally identified smoothed probabilities of structural breaks and regime switching. The data are in Figure 3.4; they are simulated by a 3-state Markov switching model and estimated by the SDHDP-HMM. The top panel shows the switching points; the middle panel shows the probabilities of structural breaks; and the bottom panel shows the probabilities of regime switching.


[Figure 3.9 image: four panels plotted against Time (Intercept, Persistence, Std, Number of states), each with True and Estimated lines.]

Figure 3.9: The SDHDP-HMM is estimated on the data in Figure 3.5. Each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$. The solid lines are the true values and the dashed lines are the posterior means of those values estimated by the SDHDP-HMM. The top-left panel plots the intercepts $\phi_{s_t 0}$; the top-right panel plots the persistence parameters $\phi_{s_t 1} + \phi_{s_t 2}$; the bottom-left panel plots the conditional standard deviations $\sigma_{s_t}$; and the bottom-right panel plots the cumulative number of states.


[Figure 3.10 image: three panels over time, 0 to 1000, with vertical axes from 0 to 1: "True probabilities of structural breaks", "Break probabilities" and "Switching probabilities".]

Figure 3.10: Globally identified smoothed probabilities of structural breaks and regime switching. The data are in Figure 3.5; they are simulated by a 4-state structural break model and estimated by the SDHDP-HMM. The top panel is the switching points; the middle panel is the probabilities of structural breaks; and the bottom panel is the probabilities of regime switching.


[Figure 3.11 image: five stacked panels from 1950 to 2006 (Real interest rate, intercept, persistence, std, probability), the last with "break prob" and "switch prob" lines.]

Figure 3.11: There are 252 observations from 1947Q1 to 2009Q4 for U.S. quarterly real interest rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$. The first panel plots the data and the remaining panels plot the posterior means of different parameters: the second panel plots the intercepts $\phi_{s_t 0}$, the third panel plots the persistence parameters $\phi_{s_t 1} + \phi_{s_t 2}$, the fourth panel plots the conditional standard deviations $\sigma_{s_t}$ and the last panel plots the probabilities of regime switching and structural breaks.


[Figure 3.12 image: two panels from 1950 to 2000, "Data" and "Number of States".]

Figure 3.12: There are 252 observations from 1947Q1 to 2009Q4 for U.S. quarterly real interest rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(2) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \phi_{s_t 2} y_{t-2} + \sigma_{s_t} \varepsilon_t$. The top panel plots the data and the bottom panel plots the posterior mean of the cumulative number of active states (a state is active once it has been visited at least once).


[Figure 3.13 image: five stacked panels from Oct 1918 to Mar 2005 (Inflation, intercept, persistence, std, probability), the last with "break prob" and "switch prob" lines.]

Figure 3.13: There are 1153 observations from Feb 1914 to Jan 2010 for U.S. monthly inflation rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \sigma_{s_t} \varepsilon_t$. The first panel plots the data and the remaining panels plot the posterior means of different parameters: the second panel plots the intercepts $\phi_{s_t 0}$, the third panel plots the persistence parameters $\phi_{s_t 1}$, the fourth panel plots the conditional standard deviations $\sigma_{s_t}$ and the last panel plots the probabilities of regime switching and structural breaks.


[Figure 3.14 image: one panel, "Probabilities" (0 to 0.8) from Dec 1919 to Dec 2003.]

Figure 3.14: There are 1153 observations from Feb 1914 to Jan 2010 for U.S. monthly inflation rate. The data are estimated by the SDHDP-HMM and each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \sigma_{s_t} \varepsilon_t$. This figure plots the smoothed probabilities of past states of U.S. inflation to be the same as Jan 2010, or $p(z_\tau = z_{201001} \mid Y)$.


[Figure 3.15 image: five stacked panels from Oct 1918 to Mar 2005 (Inflation, Mean, Persistence, std, Prob of break).]

Figure 3.15: There are 1153 observations from Feb 1914 to Jan 2010 for U.S. monthly inflation rate. The data are estimated by the structural break model of Chib (1998) and each state has Gaussian AR(1) dynamics: $y_t = \phi_{s_t 0} + \phi_{s_t 1} y_{t-1} + \sigma_{s_t} \varepsilon_t$. The first panel plots the data and the remaining panels plot the posterior means of different parameters: the second panel plots the intercepts $\phi_{s_t 0}$, the third panel plots the persistence parameters $\phi_{s_t 1}$, the fourth panel plots the conditional standard deviations $\sigma_{s_t}$ and the last panel plots the probabilities of structural breaks.


[Figure 3.16 image: three stacked panels from Oct 1918 to Mar 2005 (Inflation, Prob of random walk, std).]

Figure 3.16: There are 1153 observations from Feb 1914 to Jan 2010 for U.S. monthly inflation rate. The data are estimated by the 2-state Markov switching model of Evans and Wachtel (1993). The first panel plots the data and the remaining panels plot posterior means: the second panel plots the probabilities of being in the random walk state and the last panel plots the conditional standard deviations.


[Figure 3.17 image: three panels from Oct 1918 to Mar 2005 with vertical axes from 0 to 1: "Markov Switching Model", "Structural Break Model", and a third panel with "Structural Break" and "Regime Switching" lines.]

Figure 3.17: There are 1153 observations from Feb 1914 to Jan 2010 for U.S. monthly inflation rate. The first panel plots the posterior probabilities of regime switching by the 2-state Markov switching model of Evans and Wachtel (1993); the second panel plots the posterior probabilities of structural breaks by the structural break model of Chib (1998) and the last panel plots the posterior probabilities of regime switching and structural breaks by the SDHDP-HMM.

Bibliography

Ang, A. and Bekaert, G. Regime switches in interest rates. Journal of Business &

Economic Statistics, 20(2):163–182, 2002a.

Ang, A., Bekaert, G., and Wei, M. Do macro variables, asset markets, or surveys forecast

inflation better? Journal of Monetary Economics, 54(4):1163–1212, 2007.

Ang, Andrew and Bekaert, Geert. International asset allocation with regime shifts.

Review of Financial Studies, 15:1137–1187, 2002b.

Ang, Andrew and Bekaert, Geert. Regime switches in interest rates. Journal of Business

& Economic Statistics, 20:163–182, 2002c.

Bry, G. and Boschan, C. Cyclical Analysis of Time Series: Selected Procedures and

Computer Programs. NBER, New York, 1971.

Calvet, Laurent and Fisher, Adlai. Multifrequency news and stock returns. Journal of

Financial Economics, 86(1):178–212, 2007.

Casella, G. and Robert, C.P. Rao-Blackwellisation of sampling schemes. Biometrika, 83

(1):81, 1996.

Cecchetti, S., Lam, P., and Mark, Nelson. Mean reversion in equilibrium asset prices.

American Economic Review, 80:398–418, 1990.


Celeux, G., Hurn, M., and Robert, C.P. Computational and Inferential Difficulties with

Mixture Posterior Distributions. Journal of the American Statistical Association, 95

(451), 2000.

Chauvet, Marcelle and Potter, Simon. Coincident and leading indicators of the stock

market. Journal of Empirical Finance, 7:87–111, 2000.

Chib, S. Marginal likelihood from the Gibbs output. Journal of the American Statistical

Association, 90(432):1313–1321, 1995.

Chib, S. Calculating posterior distributions and modal estimates in Markov mixture

models. Journal of Econometrics, 75(1):79–97, 1996.

Chib, S. Estimation and comparison of multiple change-point models. Journal of

Econometrics, 86(2):221–241, 1998.

David, Alexander and Veronesi, Pietro. What ties return volatilities to price valuations

and fundamentals? Chicago Booth Research Working Paper No. 10-05, 2009.

Duffy, J. and Engle-Warnick, J. Multiple regimes in US monetary policy? A

nonparametric approach. Journal of Money Credit and Banking, 38(5):1363, 2006.

Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance

of United Kingdom inflation. Econometrica: Journal of the Econometric Society, 50

(4):987–1007, 1982.

Engle, R.F. Estimates of the Variance of US Inflation Based upon the ARCH Model.

Journal of Money, Credit and Banking, 15(3):286–301, 1983.

Escobar, MD and West, M. Bayesian density estimation and inference using mixtures.

Journal of the American Statistical Association, 90, 1995.

Evans, M. and Wachtel, P. Inflation regimes and the sources of inflation uncertainty.

Journal of Money, Credit and Banking, pages 475–511, 1993.


Fama, E.F. Short-term interest rates as predictors of inflation. The American Economic

Review, 65(3):269–282, 1975.

Ferguson, T.S. A Bayesian analysis of some nonparametric problems. The Annals of Statistics,

1(2):209–230, 1973.

Fox, E.B., Sudderth, E.B., Jordan, M.I., and Willsky, A.S. An HDP-HMM for systems

with state persistence. In Proceedings of the 25th international conference on Machine

learning, pages 312–319. ACM, 2008.

Fox, E.B., Sudderth, E.B., Jordan, M.I., and Willsky, A.S. The Sticky HDP-HMM:

Bayesian Nonparametric Hidden Markov Models with Persistent States. Arxiv preprint

arXiv:0905.2592, 2009.

Fruhwirth-Schnatter, S. Markov Chain Monte Carlo Estimation of Classical and Dynamic

Switching and Mixture Models. Journal of the American Statistical Association, 96

(453), 2001.

Fruhwirth-Schnatter, Sylvia. Finite Mixture and Markov Switching Models. Springer

Series in Statistics. New York/Berlin/Heidelberg, 2006.

Garcia, R. and Perron, P. An analysis of the real interest rate under regime shifts. The

Review of Economics and Statistics, 78(1):111–125, 1996a.

Garcia, Rene and Perron, Pierre. An analysis of real interest rates under regime shifts.

Review of Economics and Statistics, pages 111–125, 1996b.

Gerlach, R., Carter, C., and Kohn, R. Efficient Bayesian Inference for Dynamic Mixture

Models. Journal of the American Statistical Association, 95(451), 2000.

Geweke, J. Interpretation and inference in mixture models: Simple MCMC works.

Computational Statistics & Data Analysis, 51(7):3529–3550, 2007.


Geweke, J. Complete and Incomplete Econometric Models. Princeton Univ Pr, 2009.

Geweke, J. and Amisano, G. Comparing and evaluating Bayesian predictive distributions

of asset returns. International Journal of Forecasting, 2010.

Geweke, J. and Amisano, G. Hierarchical Markov normal mixture models with

applications to financial asset returns. Journal of Applied Econometrics, 26(1):1–19, 2011.

Geweke, John. Contemporary Bayesian Econometrics and Statistics. Wiley, 2005.

Giordani, P. and Kohn, R. Efficient Bayesian inference for multiple change-point and

mixture innovation models. Journal of Business and Economic Statistics, 26(1):66–77,

2008.

Gonzalez, Liliana, Powell, John G., Shi, Jing, and Wilson, Antony. Two centuries of bull

and bear market cycles. International Review of Economics and Finance, 14:469–486,

2005.

Gordon, S. and St-Amour, P. A preference regime model of bull and bear markets.

American Economic Review, 90(4):1019–1033, 2000.

Groen, J.J.J., Paap, R., and Ravazzolo, F. Real-time inflation forecasting in a changing

world. http://hdl.handle.net/1765/16709, 2009.

Guidolin, Massimo and Timmermann, Allan. Economic implications of bull and bear

regimes in UK stock and bond returns. The Economic Journal, 115:111–143, 2005.

Guidolin, Massimo and Timmermann, Allan. An econometric model of nonlinear

dynamics in the joint distribution of stock and bond returns. Journal of Applied Econometrics,

21(1):1–22, 2006.

Guidolin, Massimo and Timmermann, Allan. Asset allocation under multivariate regime

switching. Journal of Economic Dynamics and Control, 31(11):3503–3504, 2007.


Guidolin, Massimo and Timmermann, Allan. International asset allocation under skew

and kurtosis preferences. Review of Financial Studies, 21(2):889–935, 2008.

Hamilton, J. D. A new approach to the economic analysis of non-stationary time series

and the business cycle. Econometrica, 57:357–384, 1989a.

Hamilton, J. D. and Lin, G. Stock market volatility and the business cycle. Journal of

Applied Econometrics, 11:573–593, 1996.

Hamilton, James D. Time Series Analysis. Princeton University Press, Princeton, New

Jersey, 1994.

Hamilton, J.D. A new approach to the economic analysis of nonstationary time series and

the business cycle. Econometrica: Journal of the Econometric Society, 57(2):357–384,

1989b.

Huizinga, J. and Mishkin, F.S. Monetary policy regime shifts and the unusual behavior

of real interest rates, 1986.

Inclan, C. Detection of multiple changes of variance using posterior odds. Journal of

Business & Economic Statistics, 11(3):289–300, 1993.

Ishwaran, H. and James, L.F. Gibbs Sampling Methods for Stick-Breaking Priors. Journal

of the American Statistical Association, 96(453), 2001.

Ishwaran, H. and Zarepour, M. Markov chain Monte Carlo in approximate Dirichlet and

beta two-parameter process hierarchical models. Biometrika, 87(2):371, 2000.

Ishwaran, H. and Zarepour, M. Dirichlet prior sieves in finite normal mixtures. Statistica

Sinica, 12(3):941–963, 2002.

Jochmann, M. Modeling U S Inflation Dynamics: A Bayesian Nonparametric Approach.

Working Paper Series, 2010.


Kandel, Shmuel and Stambaugh, Robert. Expectations and volatility of consumption

and asset returns. Review of Financial Studies, 3:207–232, 1990.

Kass, R.E. and Raftery, A.E. Bayes factors. Journal of the American Statistical

Association, 90(430):773–795, 1995a.

Kass, Robert E. and Raftery, Adrian E. Bayes factors. Journal of the American Statistical

Association, 90(420):773–795, 1995b.

Koop, G. and Potter, S.M. Estimation and forecasting in models with multiple breaks.

Review of Economic Studies, 74(3):763, 2007.

Lettau, Martin, Ludvigson, Sydney C., and Wachter, Jessica A. The declining equity

premium: What role does macroeconomic risk play. Review of Financial Studies, 21

(4):1653–1687, 2008.

Levin, A.T. and Piger, J.M. Is inflation persistence intrinsic in industrial economies?

2004.

Lunde, Asger and Timmermann, Allan G. Duration dependence in stock prices: An

analysis of bull and bear markets. Journal of Business & Economic Statistics, 22(3):

253–273, 2004.

Maheu, J. M. and McCurdy, T. H. Identifying bull and bear markets in stock returns.

Journal of Business & Economic Statistics, 18(1):100–112, 2000a.

Maheu, J. M. and McCurdy, T. H. Volatility dynamics under duration-dependent mixing.

Journal of Empirical Finance, 7(3-4):345–372, 2000b.

Maheu, J.M. and Gordon, S. Learning, forecasting and structural breaks. Journal of

Applied Econometrics, 23(5):553–583, 2008.

Maheu, J.M. and McCurdy, T.H. How useful are historical data for forecasting the

long-run equity return distribution? Journal of Business and Economic Statistics, 27(1):

95–112, 2009.

Maheu, J.M., McCurdy, T.H., and Song, Y. Components of bull and bear markets: bull

corrections and bear rallies. Working Papers, 2010.

Muller, P. A generic approach to posterior integration and Gibbs sampling. Rapport

technique, pages 91–09, 1991.

Ntantamis, Christos. A duration hidden Markov model for the identification of regimes

in stock market returns. University of Aarhus - CREATES, Available at SSRN:

http://ssrn.com/abstract=1343726, 2009.

Pagan, Adrian R. and Sossounov, Kirill A. A simple framework for analysing bull and

bear markets. Journal of Applied Econometrics, 18(1):23–46, 2003.

Pastor, Lubos and Stambaugh, Robert F. The equity premium and structural breaks.

Journal of Finance, 4:1207–1231, 2001.

Perez-Quiros, G. and Timmermann, A. Business cycle asymmetries in stock returns:

Evidence from higher order moments and conditional densities. Journal of Econometrics,

103(1-2):259–306, 2001.

Pesaran, M.H., Pettenuzzo, D., and Timmermann, A. Forecasting time series subject to

multiple structural breaks. Review of Economic Studies, 73(4):1057–1084, 2006.

Primiceri, G.E. Time varying structural vector autoregressions and monetary policy.

Review of Economic Studies, 72(3):821–852, 2005.

Roberts, GO, Gelman, A., and Gilks, WR. Weak convergence and optimal scaling of

random walk Metropolis algorithms. The Annals of Applied Probability, 7(1):110–120,

1997.


Rose, A.K. Is the real interest rate stable? Journal of Finance, 43(5):1095–1112, 1988.

Schwert, G. William. Indexes of U.S. stock prices from 1802 to 1987. Journal of Business,

63(3):399–426, 1990.

Sethuraman, J. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650,

1994.

Shahbaba, B. and Neal, R.M. Nonlinear models using Dirichlet process mixtures. Journal

of Machine Learning Research, 10:1829–1850, 2009.

Stock, J.H. and Watson, M.W. A probability model of the coincident economic indicators.

Leading Economic indicators: new approaches and forecasting records, 66, 1991.

Stock, J.H. and Watson, M.W. Evidence on structural instability in macroeconomic time

series relations. Journal of Business & Economic Statistics, 14(1):11–30, 1996.

Teh, Y.W., Jordan, M.I., Beal, M.J., and Blei, D.M. Hierarchical Dirichlet processes.

Journal of the American Statistical Association, 101(476):1566–1581, 2006.

Turner, C., Startz, R., and Nelson, C. A Markov model of heteroskedasticity, risk, and

learning in the stock market. Journal of Financial Economics, 25:3–22, 1989.

van Norden, Simon and Schaller, Huntley. Regime switching in stock market returns.

Applied Financial Economics, 7:177–191, 1997.

Walsh, C.E. Three questions concerning nominal and real interest rates. Economic

Review, (Fall):5–19, 1987.

Wang, J. and Zivot, E. A Bayesian time series model of multiple structural changes in

level, trend, and variance. Journal of Business & Economic Statistics, 18(3):374–386,

2000.


West, M., Muller, P., and Escobar, M.D. Hierarchical priors and mixture models, with

application in regression and density estimation. Aspects of uncertainty: A Tribute to

DV Lindley, pages 363–386, 1994.
