
Increasing Overreaction and Excess Volatility of Long Rates

Daniele d’Arienzo∗

January 2020

Please click here for the latest version. Do not circulate.

Abstract

Giglio and Kelly (2017) find that the volatility of long-term rates is too large relative

to that of short-term rates for a large class of rational expectations models. I assess the

possibility that such excess volatility may come from investor beliefs. I use survey data

on analyst expectations and data on market beliefs recovered from observed yields using

the methodology of Ross (2015). I obtain three main findings. First, the two datasets

reveal a remarkably similar pattern of horizon dependent departures from rationality:

expectations about long rates over-react relative to expectations about short rates.

Second, a model of diagnostic expectations rationalizes these horizon dependent belief

distortions and generates excess volatility of long term rates. Third, when calibrated

to the data, this model accounts for roughly 80% of the excess volatility puzzle for a

reasonable value of the diagnosticity parameter.

∗Bocconi University, Department of Finance, via Rontgen 1, 20136 Milan, Italy, [email protected]. I am deeply indebted to Nicola Gennaioli for invaluable guidance. I thank for helpful comments Pedro Bordalo, John Campbell, Francesco Corielli, Massimo Guidolin, Sam Hanson, Christian Skov Jensen, Eben Lazarus, Peter Maxted, Fulvio Ortu, Andrei Shleifer, Claudio Tebaldi, Yijun Zhou (discussant) and seminar participants at Bocconi University, Harvard University, Boston College, TADC 2019 and Sloan-Nomis 2019. All errors are my own.


https://www.dropbox.com/s/1xs6n8j1abbewjw/d%27Arienzo_jmp.pdf?dl=0


1 Introduction

Since Shiller’s (1981) celebrated analysis of the stock market, many studies have documented

the so-called excess volatility of asset prices relative to measures of fundamentals (De Bondt

and Thaler (1985), LeRoy and Porter (1981) and Campbell and Shiller (1987) among others).

There are two main explanations for this finding. One of them assumes rational expectations

and emphasizes the role of discount rate variation (Campbell (2003), Cochrane (2011)). The

other one relaxes rationality and views price volatility as reflecting excess volatility of beliefs.

Consistent with the latter view, a growing body of work documents excess volatility of beliefs

using survey data, e.g. Gennaioli and Shleifer (2018).

A recent paper by Giglio and Kelly (2017) assesses these hypotheses in the realm of the

term structure of interest rates. This setting is appropriate because rational term structure

models impose precise constraints on the amount of covariation of asset prices at different

maturities, even after adjusting for discount rate variation. The key finding of this analysis is

that long term interest rates are significantly more volatile with respect to short term interest

rates relative to what rational models predict. This finding indicates that beliefs may be a

key driver of excess volatility in an important setting such as the bond market where riskless

rates are determined. This analysis raises three questions. First, can one go beyond assessing

volatility in excess of discount rates and measure beliefs in the bond market? Second, if the

answer is affirmative, can one assess which departure from rationality, if any, these beliefs

display? Third, what is the psychological foundation of this departure from rationality, and

can it account for the excess volatility of long term rates, not only qualitatively but also

quantitatively? This paper seeks to address these questions.

One immediate way to make progress is to proxy the beliefs of the bond market by using

expectations data. Figure (1) reports a preliminary analysis using data from Blue Chip on

professional forecasts of future rates at maturities of 1y, 2y, 5y, 10y, 20y and 30y (monthly

frequency). For each maturity, I report the pooled correlation between the forecast revision

made by the analyst¹ and his subsequent forecast error. Two features stand out in the data. First, at any maturity the average analyst over-reacts to news: when he revises his forecast of future rates up, reality systematically falls below his revised expectation. This is captured by the fact that in Figure (1) all correlations are negative. Second, there is more over-reaction for longer term interest rates, namely the coefficient becomes more negative for longer maturities. This latter aspect especially resonates with the possibility that long-term expectations may move too much with news, causing excess volatility in long-term rates relative to short rates.

¹ The time $t$ forecast revision of a variable $X_{t+m}$ ($m > 0$) is defined as the difference between the time $t$ forecast of $X_{t+m}$ and the time $t-1$ forecast of $X_{t+m}$.

Figure 1: Sensitivity of forecast error to past forecast revision using monthly professional forecasts of (annualized) interest rates.

To analyze this possibility systematically, in Section 2 I introduce non-rational beliefs

into an otherwise standard affine term structure model. In particular, I allow for distortions

in belief formation that are maturity dependent. In this analysis, I show that stronger over-

reaction of beliefs for longer maturities, as displayed in Figure (1), indeed gives rise to excess

volatility of equilibrium long term rates relative to short term ones. I also show that, within

the class of affine models, there is a precise mapping between excess volatility of interest

rates at a given maturity and the correlation between forecast revisions and forecast errors


at the same maturity. That is, there is a sort of “duality” between the estimates in Figure

(1) and the amount of interest rate volatility measured from equilibrium yields.

The remainder of the paper systematically studies this connection. One limitation in

the analysis of Figure (1) concerns the use of survey data. On the one hand, such data are

available only for a small subset of maturities, frequencies, and time periods. One would

ideally want to have a richer term structure to fully assess the volatility of beliefs. On the

other hand, the beliefs of professional forecasters may not be representative of the beliefs

of bond traders. Ideally, one would want to measure the beliefs of the marginal investor, and

this is clearly not available in conventional datasets.

To address these measurement issues, in Section 3 I use a method developed by Ross

(2015) to extract market beliefs from equilibrium yields. This method rests on two assumptions:

i) the dynamics of risk factors moving interest rates is stationary, and ii) the stochastic

discount factor is path independent. Borovicka et al. (2016) criticized assumption ii), showing

that it does not hold for asset pricing models that contain long run risks (Bansal and Yaron

(2004)) or that more generally include a martingale component in the SDF. Against this

critique, I show that even if recovered beliefs are misspecified in the Borovicka et al. (2016)

sense, the use of Ross’ method is still informative with respect to the maturity dependent

over- (or under-) reaction that Figure (1) exemplifies. The intuition is that maturity depen-

dent information processing distortions entail a violation of the law of iterated expectations,

and such violation cannot be obtained under any rational asset pricing model, including those

incorporating long run risks or more generally martingale components in the SDF.

I thus proceed to implement Ross’ method for the yield curve, and describe the main

patterns of this data in Section 4. The recovered beliefs exhibit, similarly to the professional

forecasts, increasing over-reaction at long maturities (greater than 10y). To validate the

use of Ross recovered beliefs, I systematically compare the latter to professional forecasts.

The two measures of beliefs display a remarkably strong and positive correlation, which

suggests that recovered beliefs are not noise, and instead capture systematic features of

market expectations. Interestingly, I find that, compared to the cross section of professional

forecasts, Ross recovered beliefs weigh more heavily unbiased and accurate beliefs (relative

to equally weighted averages). This property can be traced back to the working of arbitrage


capital (Buraschi et al. (2018)), which partially (but not fully) offsets individual beliefs

distortions.

Having validated Ross recovered beliefs, and having confirmed the broad pattern of maturity

increasing over-reaction, in Section 5 I ask: what is the psychology of maturity increasing

over-reaction? And can this form of belief distortions account quantitatively for excess volatil-

ity in long term rates? To answer the first question, I adapt to a term structure setting the

diagnostic expectations model of Bordalo et al. (2017). This model features over-reaction

and it is grounded on the Tversky and Kahneman (1974) representativeness heuristic. The

basic principle of diagnostic beliefs is “kernel of truth”: beliefs move in the right direction

but exaggerate true features of the data generating process. I show that in a term structure

model the kernel of truth logic naturally yields maturity increasing over-reaction. The in-

tuition is that a diagnostic agent will over-react more strongly to a given signal when there

is large fundamental uncertainty, which is indeed the case for longer term rates. The model

provides therefore a foundation for increasingly over-reacting beliefs and for the violation of

the law of iterated expectations, and in a way that is disciplined by the parameters of the

data generating process. The latter property proves useful for quantification.

I then conclude the analysis by calibrating the diagnostic expectations model and by

quantifying its ability to capture excess volatility in beliefs and in interest rates. I find

that: i) the diagnosticity distortion parameter calibrated from recovered beliefs is consistent

with previous literature (Bordalo et al. (2018a)), ii) such parameter matches fairly well the

documented average over-reaction of analyst forecasts, and iii) it accounts for roughly 80%

of the excess volatility in interest rates documented by Giglio and Kelly (2017).

My paper contributes to a growing body of research aimed at testing the rational expectation

assumption with beliefs data. Bordalo et al. (2018a) find evidence of over-reaction

to information in several individual macroeconomic and financial time series. Piazzesi and

Schneider (2011) find that traditional factors (level and slope) are perceived as more persistent

by financial analysts and Cieslak (2018) finds that systematic errors in short rate expectations

drive bond return predictability as opposed to time varying risk premia. Brooks et al.

(2018) show that beliefs of professional forecasters over-react to FOMC announcements,

generating post-announcement drift and excess sensitivity of long term rates. Relative to these


papers, I use Ross’ method to recover market beliefs, offer a parsimonious characterization

of belief distortions based on diagnostic expectations, and quantify the model’s ability to

account for the volatility anomaly.

This paper also relates to the debate about the recovery theorem. Martin and Ross

(2019) investigates the recovery theorem in fixed income markets, where the assumption of

stationary and Markovian state variables may be more plausible, relative to equity markets.

My setting is an empirical counterpart of Martin and Ross (2019), where I test for the

rationality of recovered beliefs. Jensen et al. (2019) generalize the recovery theorem to non

Markovian as well as non stationary settings. Qin et al. (2018) propose an empirical test for

the degeneracy of the martingale component of the SDF in the context of US treasury bonds

and reject it, but the test assumes rational expectations. Similarly to Qin et al. (2018), I

implement the Ross recovery theorem using the pricing measure from the estimation of a

standard Q-affine term structure model. This class of models is flexible since it does not

entail assumptions on the physical measure (Le et al. (2010)). Close to the intuition of the

recovery theorem, Augenblick and Lazarus (2017) provide evidence of excess movements of

stock market prices, particularly strong for long horizons.

My contribution to this literature is to consider a pattern, maturity increasing over-

reaction, that is robust to the Borovicka et al. (2016) misspecification. Furthermore, while

work on recovery is mostly methodological, I offer a systematic characterization of belief

formation and account for the excess volatility pattern.

The paper unfolds as follows: in Section 2, I introduce increasingly over-reacting beliefs

into an otherwise standard affine model for the term structure, and I show that in this

economy the error predictability of Figure (1) and the excess volatility of Giglio and Kelly

(2017) are closely related. In Section 3, I discuss the recovery theorem and its empirical

implementation. In Section 4, I compare recovered beliefs with survey data. In Section

5, I introduce a model of beliefs formation and I quantitatively assess the increasingly over-

reacting beliefs channel for excess volatility. In Section 6, I provide robustness checks. Section

7 concludes. All proofs are in the Appendices.


2 Over-reaction to news and excess volatility

Figure (1) shows that analysts’ expectations about future interest rates tend to over-react to

news, more strongly so for longer maturities. This section formally shows that, within the

conventional class of Q-affine models, there is a direct link between such horizon-dependent

over-reaction in beliefs and the excess volatility of long term interest rates documented by

Giglio and Kelly (2017). This connection informs my empirical analysis, aimed at: i) mea-

suring market beliefs, ii) characterizing their departure from rationality, and iii) quantifying

the latter’s impact on the excess volatility of interest rates. I illustrate the basic logic in a

one factor economy, while, for the empirical analysis, I consider multiple factors.

Consider a frictionless market, where zero coupon bonds with different maturities are traded. $P_{t,m}$ denotes the price at time $t$ of a zero coupon bond with time to maturity $m$. The yield to maturity, $y_{t,m}$, is defined as:

$$P_{t,m} = e^{-m \cdot y_{t,m}},$$

and the yield curve at time $t$ equals the collection of yields $\{y_{t,m}\}_{m \geq 0}$.

Denote the one period interest rate prevailing at time $s$ by $r_s$ (short rate henceforth). Then, the average one period interest rate obtained on an investment at maturity $m$ is equal to $r_{t,m} := \frac{1}{m}\sum_{i=0}^{m-1} r_{t+i}$. In a world with deterministic interest rates, the yield obtained from investing at maturity $m$ is simply equal to the interest rate at that maturity: $y_{t,m} = r_{t,m}$. If instead the short rate is exposed to risk factors (e.g. GDP growth) the interest rate obtained from an investment at maturity $m$ is not known with certainty. Specifically, assume that the short rate $r_s$ is a function of the risk factor at time $s$, which is denoted by $X_s$. Then, while at time $t$ the current short rate $r_t$ is known, longer maturity rates $r_{t,m}$ are not known because the dynamics of the risk factor is stochastic. As a result, the price of the bond and hence the yield to maturity $y_{t,m}$ at $t$ is influenced both by expectations about future rates and by risk aversion. We now characterize these expectations, and in particular their departure from rationality. Next we show how the same expectations affect, together with risk aversion, the yield curve in equilibrium.
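The definition of the yield and the deterministic benchmark can be checked numerically. The following is a minimal sketch, not part of the paper: the function names and the path of short rates are hypothetical, and the bond price under known rates is taken to be the exponential of minus the sum of the one period rates.

```python
import math


def yield_from_price(price: float, m: int) -> float:
    """Invert P = exp(-m * y) to obtain the yield to maturity y."""
    return -math.log(price) / m


def price_with_known_rates(short_rates: list[float]) -> float:
    """Zero coupon price when the path of one period rates is known in advance."""
    return math.exp(-sum(short_rates))


# Hypothetical path of one period rates r_t, ..., r_{t+m-1}.
short_rates = [0.01, 0.02, 0.03, 0.04]
m = len(short_rates)

average_rate = sum(short_rates) / m
bond_yield = yield_from_price(price_with_known_rates(short_rates), m)

# With deterministic rates the yield coincides with the average short rate.
print(average_rate, bond_yield)
```

With stochastic rates the two numbers would differ, which is where expectations and risk aversion enter.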


Increasingly Over-Reacting Expectations

As in conventional affine models, the short rate is an affine function of the risk factor $X_t$, which for simplicity we assume to follow a stationary AR(1). Then, the dynamics of the short rate is fully described by:

$$\begin{aligned} r_t &= \delta_0 + \delta_1 X_t \\ X_t &= \rho^{P} X_{t-1} + \sigma^{P}\varepsilon^{P}_t, \end{aligned} \tag{1}$$

where $\varepsilon^{P}_t$ is an i.i.d. shock. In this notation, superscript $P$ captures the so-called "physical measure", or data generating process. As a consequence, the dynamics of interest rates at maturity $m$, $r_{t,m} := \frac{1}{m}\sum_{i=0}^{m-1} r_{t+i}$, reads:

$$r_{t,m} = \delta_0 + b^{P}_m X_t + \varepsilon^{P}_{t,m},$$

where $b^{P}_m := \frac{\delta_1}{m}\sum_{i=0}^{m-1}(\rho^{P})^i$ and $\varepsilon^{P}_{t,m} := \frac{\delta_1}{m}\sum_{k=1}^{m-1}\sum_{i=1}^{k}(\rho^{P})^{k-i}\sigma^{P}\varepsilon^{P}_{t+i}$. The coefficients $b^{P}_m$ capture the sensitivity of interest rates at maturity $m$, $r_{t,m}$, to current information ($X_t$) under the physical measure. This sensitivity geometrically decays to zero as the maturity increases. $\varepsilon^{P}_{t,m}$ captures fundamental risk about interest rates at maturity $m$.
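To make the maturity profile of the loadings concrete, the sketch below evaluates $b^{P}_m$ both as the finite sum in its definition and through the closed form of a geometric series, $\delta_1 (1 - \rho^m) / (m (1 - \rho))$. The persistence value and the function names are assumptions chosen only for illustration.

```python
import numpy as np


def loading_by_sum(rho: float, m: int, delta1: float = 1.0) -> float:
    """b_m = (delta1 / m) * sum_{i=0}^{m-1} rho**i, computed term by term."""
    return delta1 / m * float(np.sum(rho ** np.arange(m)))


def loading_closed_form(rho: float, m: int, delta1: float = 1.0) -> float:
    """Same loading using the geometric series formula (requires rho != 1)."""
    return delta1 * (1.0 - rho**m) / (m * (1.0 - rho))


RHO_P = 0.9  # hypothetical persistence under the physical measure
maturities = [1, 2, 5, 10, 20, 30]
loadings = [loading_by_sum(RHO_P, m) for m in maturities]

for m, b in zip(maturities, loadings):
    print(m, round(b, 4), round(loading_closed_form(RHO_P, m), 4))
```

For a persistence between zero and one the loading is an average of a decreasing sequence, so it falls as the maturity grows.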

We allow expectations of future interest rates to depart from rationality. In particular, we allow them to over-react to the current state in a maturity dependent fashion, as suggested by Figure (1). Formally, we assume that at time $t$ the market perceives interest rates at maturity $m$ to be:

$$r^{\theta}_{t,m} = \delta_0 + b^{P}_m(1 + \psi^{\theta}_m)X_t + \sigma^{\theta}_m\varepsilon^{P}_{t,m}, \tag{2}$$

where $\theta$ denotes departures from rational expectations. There are two such departures. First, the variance of fundamental shocks is potentially distorted, with $\sigma^{\theta}_m \neq 1$. Second, and more important, beliefs of future rates may display excess sensitivity ($\psi^{\theta}_m \geq 0$) to the current state. The beliefs in Equation (2) arise when agents perceive interest rate shocks $\varepsilon^{P^\theta}_{t,m} := \sigma^{\theta}_m\varepsilon^{P}_{t,m} + \psi^{\theta}_m b^{P}_m X_t$, so that perceived news is distorted toward the current state $X_t$. $P^\theta$ denotes a distorted data generating process measure for the factor, and the derivation of Equation (2) from the distorted factor dynamics is performed in Appendix A.


Equation (2) implies that expectations formed at time $t$ about interest rates at maturity $m$ take the convenient intuitive form:

$$E^{P}_t\left[r^{\theta}_{t,m}\right] =: \underbrace{E^{P^\theta}_t[r_{t,m}]}_{\text{distorted expectations}} = \underbrace{E^{P}_t[r_{t,m}]}_{\text{rational expectations}} + \psi^{\theta}_m\underbrace{\left(E^{P}_t[r_{t,m}] - E^{P}[r_{t,m}]\right)}_{\text{news relative to the average}}. \tag{3}$$

The distorted expectation $E^{P^\theta}_t[\cdot]$ is equal to the rational expectation plus an adjustment in the direction of the news, where the latter is captured by $E^{P}_t[r_{t,m}] - E^{P}[r_{t,m}]$.
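The step from Equation (2) to Equation (3) can be written out explicitly. The lines below are a sketch under two assumptions that are consistent with the stationary AR(1) in Equation (1) but are not spelled out in the text: the factor has zero unconditional mean under $P$, and the future shocks collected in $\varepsilon^{P}_{t,m}$ have zero conditional mean at time $t$.

```latex
\begin{align*}
E^{P}_t[r_{t,m}] &= \delta_0 + b^{P}_m X_t,
\qquad
E^{P}[r_{t,m}] = \delta_0, \\
E^{P}_t[r^{\theta}_{t,m}]
  &= \delta_0 + b^{P}_m (1 + \psi^{\theta}_m) X_t \\
  &= \left(\delta_0 + b^{P}_m X_t\right) + \psi^{\theta}_m\, b^{P}_m X_t \\
  &= E^{P}_t[r_{t,m}] + \psi^{\theta}_m \left( E^{P}_t[r_{t,m}] - E^{P}[r_{t,m}] \right).
\end{align*}
```

Under these assumptions the news term is simply $b^{P}_m X_t$, so the distortion scales the response to the current state by $1 + \psi^{\theta}_m$.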

Departures from rationality at maturity $m$ are parameterized by the coefficient $\psi^{\theta}_m \geq 0$. When $\psi^{\theta}_m = 0$, expectations are rational. When $\psi^{\theta}_m > 0$ agents over-react to news. That is, they over-estimate future rates in states truly indicative of higher than average future rates, $E^{P}_t[r_{t,m}] > E^{P}[r_{t,m}]$. By contrast, agents under-estimate future rates in states truly indicative of lower than average future rates, $E^{P}_t[r_{t,m}] < E^{P}[r_{t,m}]$.²

² The comparison with average information can be generalized to the comparison with a weighted sum of past predictions. This case includes the model of diagnostic expectations of Bordalo et al. (2018b) and it is discussed in Section 6.

I allow the distortion parameter $\psi^{\theta}_m$ to be maturity-dependent. For now, I leave such dependence unspecified. In Section 5, however, I show that under Gaussian noise, Equation (3) naturally follows from the diagnostic expectation model of Bordalo et al. (2018b). Such a model also yields horizon dependent distortion parameters $\psi^{\theta}_m$ that are in turn functions of a more primitive distortion parameter $\theta$. This is why I denote distorted expectations using $\theta$.

2.1 Risk Aversion and the Yield Curve

Consider a market forming expectations according to Equation (2). What does the yield curve look like? To answer this question, one must also consider investors’ risk aversion, which affects, together with beliefs, required rates of return. To capture risk aversion, consider a stochastic discount factor (SDF) $M_{t,m}$ discounting more heavily future cash flows occurring in states in which the representative investor is poorer. Then the price of a maturity $m$ zero coupon bond, and hence yields $y_{t,m}$, is pinned down by the discounted expected payoff³:

$$P_{t,m} = E^{P^\theta}_t[M_{t,m}] := E^{Q^\theta}_t\left[e^{-m \cdot r_{t,m}}\right], \tag{4}$$

where the so-called risk neutral probability measure $Q^\theta$ inflates the perceived probability of states in which the representative investor is poor. By using $Q^\theta$ the economic analyst can account for risk aversion while using the convenient analytics prevailing under risk neutrality. By the fundamental asset pricing equation, given a stochastic discount factor $M_{t,m}$, the risk neutral measure density $f^{Q^\theta}(X_{t+m}|X_t)$ associated with it is implicitly defined by the condition:

$$f^{Q^\theta}(X_{t+m}|X_t)\,e^{-m \cdot r_{t,m}} = M_{t,m}\,f^{P^\theta}(X_{t+m}|X_t),$$

which captures the optimality condition for a market with distorted beliefs captured by $P^\theta$. Here I emphasize that under $P^\theta$ the factor is still an AR(1) with maturity dependent distorted persistence and volatility (the full analytics is performed in Appendix A). To use the machinery of affine term structure models, we assume that the stochastic discount factor is such that the distorted and risk neutral dynamics $Q^\theta$ for the factor is AR(1), with maturity dependent persistence and volatility. In this case, the risk adjustment to beliefs $P^\theta$ yields the following distorted and risk neutral dynamics for interest rates at maturity $m$:

$$r^{\theta}_{t,m} = \delta_0 + b^{Q}_m X_t + \psi^{\theta}_m b^{Q}_m X_t + \varepsilon^{Q}_{t,m},$$

where $b^{Q}_m := \frac{\delta_1}{m}\sum_{i=0}^{m-1}(\rho^{Q})^i$ and $\varepsilon^{Q}_{t,m} := \frac{\delta_1}{m}\sum_{k=2}^{m}\sum_{i=0}^{k-2}(\rho^{Q})^i\sigma^{P}\varepsilon^{Q}_{t+1+i}$. $\rho^{Q} \neq \rho^{P}$ is the risk neutral persistence of the factor under rational expectations, while $\varepsilon^{Q}_{t+1}$ is a zero mean shock with risk neutral variance $\sigma^{Q} \neq \sigma^{P}$, again under rational expectations. $b^{Q}_m$ is the risk neutral sensitivity of future rates to $X_t$ and $\varepsilon^{Q}_{t,m} := \frac{\delta_1}{m}\sum_{k=1}^{m-1}\sum_{i=1}^{k-1}(\rho^{Q})^{k-1}\sigma^{Q}\varepsilon^{Q}_{t+i}$ is the risk neutral shock at maturity $m$, both under rational expectations. Importantly, under the belief distortions of Equation (2) the risk adjusted dynamics of interest rates still exhibit over-reaction ($\psi^{\theta}_m$) to the current state $X_t$.

³ Note that under the maturity dependent over-reaction model the law of iterated expectations fails, in the sense that $E^{P^\theta}_t[M_{t,t+2}] \neq E^{P^\theta}_t\left[M_{t,t+1}E^{P^\theta}_{t+1}[M_{t+1,t+2}]\right]$. We therefore need to take a stance about valuation. In line with the logic of Figure (1), we assume that the market forecasts future rates and prices future states with a "buy and hold" valuation approach, namely we set $P_{t,t+2} = E^{P^\theta}_t[M_{t,t+2}]$.

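Proposition 1. Assume homoskedastic $Q$-shocks. The yield curve under over-reacting beliefs is equal to:

$$y^{\theta}_{t,m} = -\frac{1}{m}\log E^{Q^\theta}_t\left[e^{-m \cdot r_{t,m}}\right] = a^{\theta}_m + \left(b^{Q}_m + \psi^{\theta}_m b^{P}_m\right)X_t, \tag{5}$$

where:

$$b^{Q}_m = \frac{\delta_1}{m}\sum_{i=0}^{m-1}(\rho^{Q})^i, \qquad a^{\theta}_m = \delta_0 - \frac{1}{m}\log E^{Q}\left[e^{-m \cdot \sigma^{\theta}_m\varepsilon^{Q}_{t,m}}\right].$$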

When expectations are rational, namely $\psi^{\theta}_m = 0$, Equation (5) is the signature affine term structure model (see Duffee (2013) for a review). The coefficients $a_m$ and $b^{Q}_m$ are maturity dependent. Interest rate volatility is shaped by the sensitivity $b^{Q}_m$ of yields to changes in the factor $X_t$. The more persistent is the factor, namely the higher is $\rho^{Q}$, the more sensitive is the interest rate at any given maturity to news, namely the higher is $b^{Q}_m$. On the other hand, because the factor is stationary, $|\rho^{Q}| < 1$, long term rates should be less sensitive than short term rates, namely $b^{Q}_m$ declines with $m$. The Giglio and Kelly (2017) excess volatility puzzle is rooted in the maturity profile of the $b^{Q}_m$ terms: they decay too slowly relative to what is implied by the persistence $\rho^{Q}$ of the factors. The issue is then: can the horizon dependent over-reaction of Figure (1) rationalize this finding?⁴

⁴ A different yet important issue is the extent to which Q-affine models correctly capture the yield curve dynamics. In this regard, it is worth noting that: i) empirically, at each maturity $m$, the same risk factors explain the variability of yields at that maturity, $y_{t,m}$, with $R^2$ close to one as shown in Section 3, ii) Giglio and Kelly (2017) show that neither quadratic Q-specifications, stationary long memory processes, nor regime switching models can account for the excess volatility, and iii) Q-affine models allow highly non-linear P-dynamics as well as non-standard SDFs, provided that their product yields Q-affinity.

With over-reacting beliefs, $\psi^{\theta}_m > 0$, equilibrium yields retain a tractable form. Over-reaction in beliefs enhances the sensitivity of yields to the risk factor relative to a rational world, namely $b^{Q}_m \to b^{Q}_m + \psi^{\theta}_m b^{P}_m$. This formula connects over-reaction in beliefs and excess volatility in interest rates.
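The effect of the shifted loading on relative volatility can be illustrated numerically. In a one factor affine model the variance of a yield is its squared loading times the variance of the factor, so the variance of the maturity $m$ yield relative to the one period yield is a squared ratio of loadings. The sketch below is only illustrative: the persistence values and the linear profile for the over-reaction coefficient are hypothetical choices, not estimates from the paper, and the benchmark sets the distortion to zero at every maturity.

```python
import numpy as np


def loading(rho: float, m: int, delta1: float = 1.0) -> float:
    """b_m = (delta1 / m) * sum_{i=0}^{m-1} rho**i."""
    return delta1 / m * float(np.sum(rho ** np.arange(m)))


def variance_ratio(m: int, rho_q: float, rho_p: float, psi) -> float:
    """Variance of the m period yield relative to the one period yield.

    `psi` maps a maturity to its over-reaction coefficient; the yield loading
    is b^Q_m + psi(m) * b^P_m, as in the affine yield formula.
    """
    numerator = loading(rho_q, m) + psi(m) * loading(rho_p, m)
    denominator = loading(rho_q, 1) + psi(1) * loading(rho_p, 1)
    return (numerator / denominator) ** 2


# Hypothetical parameter values, chosen only for illustration.
RHO_Q, RHO_P = 0.95, 0.90


def no_distortion(m: int) -> float:
    return 0.0


def increasing_distortion(m: int) -> float:
    return 0.05 * m  # assumed linear, maturity increasing profile


MATURITIES = (2, 5, 10, 30)
for m in MATURITIES:
    rational = variance_ratio(m, RHO_Q, RHO_P, no_distortion)
    distorted = variance_ratio(m, RHO_Q, RHO_P, increasing_distortion)
    print(m, round(rational, 4), round(distorted, 4))
```

For these values the distorted ratio exceeds the rational one at every maturity shown, which is the sense in which maturity increasing over-reaction slows the decay of long rate volatility.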

To see this connection more precisely, consider expectations about future rates, computed using the distorted physical measure $P^\theta$, and consider the measured volatility of interest rates. I obtain the following result.

Theorem 1. (Over-reaction and excess volatility).

Under expectations (3) and the affine setting:

i) (Increasing Over-reaction) the CG coefficients $\beta_m$ obtained by regressing the forecast error made at maturity $m$ using $P^\theta$ on the forecast revision under $P^\theta$ about the same maturity are equal to:

$$\beta_m = -c\,\frac{\psi^{\theta}_m}{1 + \psi^{\theta}_m},$$

where $c$ is a positive, maturity independent, constant;

ii) (Excess Volatility) the volatility of yields at maturity $m$ relative to the volatility of the short rate is equal to:

$$\frac{V^{P}[y^{\theta}_{t,m}]}{V^{P}[y^{\theta}_{t,1}]} = \left(\frac{b^{Q}_m + \psi^{\theta}_m b^{P}_m}{b^{Q}_1 + \psi^{\theta}_1 b^{P}_1}\right)^2 > \frac{V^{P}[y_{t,m}]}{V^{P}[y^{\theta}_{t,1}]},$$

where $V^{P}[y^{\theta}_{t,m}]$ is the measured volatility of yields while $V^{P}[y_{t,m}]$ is the volatility arising under rational expectations. Over-reaction to news yields both the pattern in Figure (1) and excess volatility of long term rates if and only if $\psi^{\theta}_m$ is positive and increases in maturity $m$.

This result conveys two important messages. First, over-reaction to news increasing in the horizon $m$ reconciles the observed patterns in forecast errors and excess volatility in long term rates. This is a general result, and holds beyond the more restrictive assumptions of this Section. As I show in the proof of Theorem (1), such connection holds under general Q-affine models, where $P$ is Markovian and $Q$ is AR(1), provided expectations about future interest rates over-react according to Equation (3).


Second, and crucially, Theorem (1) says that in the conventional class of Q-affine term structure models, when the data generating process $P$ for the risk factors is itself AR(1), the maturity dependent over-reacting beliefs in Equation (2) create a precise link between the over-reaction coefficient measured using beliefs data and the excess volatility detected from prices. They are both pinned down by the same maturity increasing distortion $\psi^{\theta}_m$.
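The sign and size of the regression coefficient can be illustrated with a small simulation. This is a deliberately stripped down sketch rather than the setting of the theorem: the forecaster predicts the factor itself at a fixed horizon, the same over-reaction coefficient is assumed to apply to the forecast made at time $t$ and to the one made at $t-1$, and all parameter values are hypothetical. Under these simplifying assumptions the population slope works out to $-\psi/(1+\psi)$, so the constant is one in this special case.

```python
import numpy as np


def simulate_cg_slope(psi: float, rho: float = 0.9, sigma: float = 1.0,
                      h: int = 4, n: int = 200_000, seed: int = 0) -> float:
    """OLS slope of forecast errors on forecast revisions for an AR(1) factor.

    The distorted forecast of X_{t+h} made at time t is (1 + psi) * rho**h * X_t.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)

    # Simulate the stationary AR(1), starting from a stationary draw.
    x = np.empty(n)
    x[0] = sigma * eps[0] / np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + sigma * eps[t]

    t_idx = np.arange(1, n - h)
    forecast_now = (1.0 + psi) * rho**h * x[t_idx]             # made at t
    forecast_prev = (1.0 + psi) * rho**(h + 1) * x[t_idx - 1]  # made at t-1
    revision = forecast_now - forecast_prev
    error = x[t_idx + h] - forecast_now

    return float(np.cov(error, revision)[0, 1] / np.var(revision, ddof=1))


PSI = 0.5  # hypothetical over-reaction coefficient
slope = simulate_cg_slope(PSI)
print(slope, -PSI / (1.0 + PSI))
```

Setting the coefficient to zero gives a slope close to zero, the rational benchmark in which forecast errors are unpredictable from revisions.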

The remainder of the paper assesses these predictions empirically. In Sections 3 and 4 I start tackling this challenge by offering a measure of market beliefs. The survey evidence in Figure (1) has two pitfalls. First, it has only a limited number of maturities. Second, and most important, the beliefs of professional forecasters may not be representative of the beliefs of market participants. To overcome these issues, I use methods developed by Ross to extract information about beliefs from asset prices in conventional affine models⁵.

In Section 5 I add the assumption that the physical measure $P$ is a Gaussian AR(1). In this case, the over-reaction parameters $\psi^{\theta}_m$ can be conveniently grounded in the diagnostic expectations model (Bordalo et al. (2018b)). This allows me to obtain an estimate of their distortion parameter $\theta$ that can be benchmarked to existing estimates. Furthermore, Theorem (1) highlights a duality between over-reaction in beliefs and excess volatility in interest rates, so that $\psi^{\theta}_m$ can be estimated both from beliefs data and directly from yields. I use this duality to assess which conclusions about $\psi^{\theta}_m$ are most robust.

3 Beyond survey data: Ross Recovery Theorem and the term structure of beliefs

The method of Ross (2015) rests on two assumptions: i) the underlying state of the economy follows a stationary Markovian process, both under $P$ and under $Q$, and ii) the SDF $M_{t,t+m}$ is path independent, namely it depends only on the final value $X_{t+m}$ and on the initial value $X_t$ of the state variable, and not on the path from $t$ to $t+m$. Assumption i) is an approximation, but of significant empirical power: it is widely recognized that few state variables drive the yield curve (see Duffee (2013)) and stationarity is not rejected in the data (see Giglio and Kelly (2017) and Martin and Ross (2019)). Assumption ii) is more controversial: Borovicka et al. (2016) show that path independence is not met in "long run risk" asset pricing models. In this case, the method of Ross (2015) does not recover market beliefs, but beliefs adjusted for a martingale component. This is an important criticism but, as I show below, my analysis is immune to it because the increasingly over-reacting beliefs in (3) can still be recovered from the data.

⁵ I will consider prices of US treasury bonds which are directly related to interest rates and yields.

To grasp how Ross’s method works, consider an Arrow-Debreu security that pays one dollar if next period’s state is $j$ (assume for simplicity that there is a finite number $N$ of states, which correspond to factor values in my setting). Under rational expectations, if the current state is $i$, the price of such Arrow-Debreu security is equal to:

$$A_{ij} = M_{ij}P_{ij},$$

where $P_{ij} := P(X_{t+1} = j \mid X_t = i)$ is the physical probability of transitioning from state $i$ to state $j$ and $M_{ij}$ is the stochastic discount factor (SDF) capturing the marginal rate of substitution between current consumption in $i$ and future consumption in $j$. Due to risk aversion, $M_{ij}$ overweights bad states, attaching a higher price to a safe bond because it pays out in them.

In the presence of non-rational beliefs, the fundamental asset pricing equation implies that the price of the same Arrow-Debreu security is equal to:

$$A^{\theta}_{ij} = M_{ij}P^{\theta}_{ij}.$$

An econometrician observing Arrow-Debreu prices may recover market beliefs, be they rational $P_{ij}$ or distorted $P^{\theta}_{ij}$, by performing an "inverse risk adjustment" to the prices themselves. Ross has shown that such inverse risk adjustment is indeed possible, so that market beliefs can be recovered from prices, under assumptions i) and ii) above. Ross’s method has so far been used under the assumption of rational expectations.

As displayed in Figure (2), when market beliefs may not be rational, the use of Ross’ method entails an ambiguity: the econometrician does not know if Arrow-Debreu prices are $A$ (they reflect rational beliefs) or if they are $A^{\theta}$ (they reflect non-rational beliefs). Here I proceed as follows: I use Ross’ method, and then perform on the recovered market beliefs some statistical tests of rationality, considering in particular the predictability of forecast errors. The outcome of this test tells us if we are in the top or bottom row of Figure (2).

[Figure 2: diagram with nodes $P$, $A$, $P^\theta$ and $A^\theta$, and arrows labeled "Risk" and "Belief distortions".]

Figure 2: The blue line adjusts the data generating process $P$ for preferences, which determines Arrow-Debreu prices $A$. The red line distorts the data generating process $P$, thus defining beliefs $P^\theta$, thus generating Arrow-Debreu prices $A^\theta$.

Before carrying out the analysis, consider again the critical path independence assumption ii). When the SDF is path independent, there exist a constant $\delta$ and a function $z$ of the state such that the one period SDF can be written as:

$$M_{ij} = \delta\,\frac{z_i}{z_j}.$$

This property is satisfied by conventional CRRA preferences (over stationary variables). However, this assumption has been criticized by Borovicka et al. (2016), who show that it assumes away the kind of long-run risk adjustments from the SDF that are embedded in many conventional consumption based asset pricing models (CRRA when consumption exhibits stochastic trends, long run risk models as well as habit formation models)⁶. As they show, assuming path independence in these cases is akin to neglecting the fact that the SDF also contains a martingale component that changes over time. Does such misspecification invalidate the detection of the horizon dependent over-reacting beliefs that I consider here?

⁶ However, Walden (2017) explicitly showed that relabeling the state variables may lead to the assumption being met, in the case of CRRA utility with non-stationary consumption.

To answer this question, suppose that the SDF has a martingale component. Then, by applying Ross’ method we would not recover the true market beliefs $P^\theta$ but a version of $P^\theta$ contaminated by misspecification. I then obtain the following result.

Theorem 2. Consider the Q-affine setting and non-rational expectations as in (3), and suppose that the martingale component of the SDF is non-degenerate, so that the econometrician detects a contaminated version of $\mathbb{P}^{\theta}$, not $\mathbb{P}^{\theta}$ itself. Then, the following are equivalent:

i) the CG coefficients estimated by regressing forecast errors of Ross recovered beliefs on forecast revisions of Ross recovered beliefs vary with maturity m;

ii) $\mathbb{E}^{\theta}[\cdot]$ violates the LIE.

Moreover, the CG coefficients obtained from Ross recovered beliefs are negative and decreasing:

$$\beta^{\theta}_m = -c'\,\frac{\psi^{\theta}_m}{1 + \psi^{\theta}_m},$$

where c′ > 0 is a maturity independent constant.

With misspecification à la Borovicka et al. (2016), the econometrician applying Ross' method no longer recovers exact beliefs. Crucially, however, rationality tests still allow her to recover the horizon dependent over-reaction that is critical for my analysis and that is detected in Figure (1) for professional forecasters. The intuition is that horizon-dependent distortions entail a strong violation of rationality, namely a violation of the law of iterated expectations.7 As such, they cannot be accounted for by any rational expectations model, including those featuring long run risk. In this respect, the Ross recovery theorem remains useful for spotting horizon dependent belief distortions.

3.1 Empirical implementation

To apply Ross' method, we need Arrow-Debreu prices. These are not directly observed but they can be inferred from market prices. To do so, note that one period ahead Arrow-Debreu prices can also be written as state by state discounted risk neutral probabilities:

$$A^{\theta}_{ij} = Q^{\theta}_{ij}\, e^{-r(i)},$$

where $Q^{\theta}_{ij}$ is the risk neutral probability of transitioning from state i to state j. The above formula includes both the case of rational expectations (θ = 0) and that of non-rational ones (θ ≠ 0).

7 Formally, the LIE is a mathematical theorem which holds true for every probability measure. The inconsistency of Pθ at different maturities is mathematically due to the fact that Pθ is a different probability measure for each maturity, as defined by condition (3).

To compute the risk neutral probabilities Qθij I first estimate the risk neutral parameters

of a three factor affine model. Then, I discretize the state space using the Rouwenhorst

method (Cooley (1995)), compute the discretized transition probabilities Qθij and finally I

discount them by the known short rate. This procedure yields Aθij.
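The last step of this procedure, and the recovery it feeds into, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the three-state example and the use of power iteration to find the dominant eigenvector are hypothetical choices, not the implementation used in the paper. The sketch relies on path independence, under which the Arrow-Debreu matrix satisfies A z = δ z for a positive vector z, and beliefs follow as P_ij = A_ij z_j / (δ z_i).

```python
import math


def arrow_debreu_prices(Q, r):
    """State prices A[i][j] = Q[i][j] * exp(-r[i]) from risk neutral probabilities."""
    return [[q * math.exp(-r[i]) for q in row] for i, row in enumerate(Q)]


def dominant_eigenpair(A, max_iter=10_000, tol=1e-14):
    """Power iteration for the dominant eigenvalue and positive eigenvector
    of a matrix with strictly positive entries."""
    n = len(A)
    z = [1.0 / n] * n  # positive start vector, normalized to sum to one
    eigenvalue = 0.0
    for _ in range(max_iter):
        w = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(w)  # valid estimate because z sums to one
        new_z = [v / eigenvalue for v in w]
        change = max(abs(a - b) for a, b in zip(new_z, z))
        z = new_z
        if change < tol:
            break
    return eigenvalue, z


def recover_beliefs(A):
    """Ross-style recovery: solve A z = delta z, then
    P[i][j] = A[i][j] * z[j] / (delta * z[i])."""
    delta, z = dominant_eigenpair(A)
    n = len(A)
    P = [[A[i][j] * z[j] / (delta * z[i]) for j in range(n)] for i in range(n)]
    return delta, z, P


# Hypothetical three-state example (all numbers are made up for illustration).
Q = [[0.6, 0.3, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
r = [0.01, 0.02, 0.03]  # one-period short rate in each state
A = arrow_debreu_prices(Q, r)
delta, z, P = recover_beliefs(A)
row_sums = [sum(row) for row in P]  # each recovered row sums to one
```

Power iteration is used here only to keep the sketch dependency-free; any eigenvalue routine that returns the dominant eigenpair of A would serve the same purpose.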

I follow conventional methods and consider as factors the first three principal components of the yield curve (see Duffee (2013)). The construction of the factors is discussed in Appendix B. The three factor $\mathbb{Q}^{\theta}$-dynamics takes the form:

$$r_t = \delta_0 + \delta_1^{\top} X_t$$

$$X_{t+1} = \rho^{\mathbb{Q}^{\theta}} X_t + \Sigma_C\, \varepsilon^{\mathbb{Q}^{\theta}}_{t+1},$$

where $\varepsilon^{\mathbb{Q}^{\theta}}_{t}$ are i.i.d. shocks. The matrix $\rho^{\mathbb{Q}^{\theta}}$ is assumed to be diagonal, $\rho^{\mathbb{Q}^{\theta}} = \mathrm{diag}(\rho^{\mathbb{Q}^{\theta}}_1, \rho^{\mathbb{Q}^{\theta}}_2, \rho^{\mathbb{Q}^{\theta}}_3)$.8 $\Sigma_C$ is the Cholesky decomposition of the variance-covariance matrix of the residuals, Σ. Following Cochrane and Piazzesi (2009), I assume that Σ coincides under $\mathbb{P}^{\theta}$ and $\mathbb{Q}^{\theta}$, so I can estimate it with the variance-covariance matrix of the residuals in a VAR(1) for $X_t$. Finally, the entries of the diagonal matrix $\rho^{\mathbb{Q}^{\theta}}$ are estimated by matching observed yields.

Due to my emphasis on maturity dependent over-reaction, I perform this estimation ma-

turity by maturity independently, since I do not want to impose restrictions across maturities.

Summarizing the procedure:

1. δ0, δ1 are the OLS estimates of the regression of the short rate on the three factors.

2. Σ is estimated as the variance covariance matrix of the residuals of a VAR(1) for the

factors. ΣC is the corresponding Cholesky decomposition.

3. The risk neutral parameters of the factor dynamics are estimated, maturity by maturity, as:

$$\rho^{\mathbb{Q}^{\theta}}_m := \arg\min_{\rho^{\mathbb{Q}^{\theta}}} \frac{1}{T}\sum_t \left(\hat{y}_{t,m} - y_{t,m}\right)^2 = \arg\min_{\rho^{\mathbb{Q}^{\theta}}} \frac{1}{T}\sum_t \left(a^{\theta}_m + (b^{\mathbb{Q}^{\theta}}_m)^{\top} X_t - y_{t,m}\right)^2,$$

where $b^{\mathbb{Q}^{\theta}}_m$ is a function of the parameters $\rho^{\mathbb{Q}^{\theta}}$, whose analytic expression is computed in Appendix A.

4. One period ahead Arrow-Debreu prices are finally estimated using yields to maturity m only, as in fitting the average yield curve:

$$A^{(m)}_{ij} = Q^{(m)}_{ij}\, e^{-r(i)}.$$

8 This restriction is necessary to achieve identification of the matrix $\rho^{\mathbb{Q}^{\theta}}$ assuming that the noise is Gaussian. In this case, a Gaussian transition probability density is in fact characterized by 9 independent parameters, which parametrize the mean and the covariance matrix (see Hamilton and Wu (2012)).

A final step is needed before recovering beliefs. The Recovery Theorem entails an eigenvalue problem for the Arrow-Debreu matrix, while the affine specification relies on continuous variables. As discussed above, to tackle this issue I discretize the continuous state space of $X_t$ using the Rouwenhorst method (Cooley (1995)), which represents the state of the art in approximating an AR(1) process with a finite state space Markov chain. The method generates a Markov chain which matches the mean, variance and autocorrelation of the original AR(1) process. These are the moments I am interested in: the mean of the factor determines the behavior of the average yield curve, while the autocorrelation and the variance determine the term structure of volatilities. Further details about the implementation are discussed in Appendix B.
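For concreteness, the discretization step can be sketched as below for a single zero-mean AR(1) factor, x' = ρ x + σ ε. This is a minimal illustration, not the paper's code: the function name, the symmetric choice p = q = (1 + ρ)/2 and the grid half-width √(n − 1) σ_x are the textbook version of the Rouwenhorst construction and are assumed here.

```python
import math


def rouwenhorst(n, rho, sigma):
    """Discretize x' = rho * x + sigma * eps (eps standard normal) into an
    n-state Markov chain; returns (grid, transition_matrix). Requires n >= 2."""
    p = (1.0 + rho) / 2.0
    theta = [[p, 1.0 - p], [1.0 - p, p]]  # two-state building block
    for size in range(3, n + 1):
        new = [[0.0] * size for _ in range(size)]
        for i in range(size - 1):
            for j in range(size - 1):
                v = theta[i][j]
                new[i][j] += p * v                  # top-left block
                new[i][j + 1] += (1.0 - p) * v      # top-right block
                new[i + 1][j] += (1.0 - p) * v      # bottom-left block
                new[i + 1][j + 1] += p * v          # bottom-right block
        for i in range(1, size - 1):                # interior rows were counted twice
            new[i] = [v / 2.0 for v in new[i]]
        theta = new
    sigma_x = sigma / math.sqrt(1.0 - rho ** 2)     # unconditional std of x
    half_width = math.sqrt(n - 1) * sigma_x
    step = 2.0 * half_width / (n - 1)
    grid = [-half_width + i * step for i in range(n)]
    return grid, theta


# Hypothetical parameters for one persistent factor.
grid, transition = rouwenhorst(5, 0.9, 0.1)
```

With this construction the conditional mean from grid point x_i equals ρ x_i and the conditional variance equals σ², which is why the chain reproduces the mean, variance and autocorrelation of the AR(1) process.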

4 Recovered Beliefs, Survey data and Rationality tests

I now study recovered beliefs and compare those with survey data from professional forecast-

ers. The latter step helps me validate the analysis because the two sources of data are highly

independent.


4.1 Data

US treasury yields

Gurkaynak et al. (2007) provide (and keep updated) nominal, annualized, zero coupon bond

yields with yearly maturities from 1 year to 30 years. Gurkaynak et al. (2007) infer the yield

curves time series from observed prices of fixed income instruments. The data are jointly

available at all maturities from 11-25-1985 to 12-31-2016, at the daily frequency.

The Blue Chip Survey of Professional forecasters

The Blue Chip survey of professional forecasters contains forecasts of yields with maturities of 1, 2, 5, 10, 20 and 30 years from leading financial institutions, which are flagged in the dataset. Forecasts with maturities of 1, 2, 5 and 10 years are available from January 1984, forecasts with a maturity of 30 years are available starting from January 2000, while forecasts with a maturity of 20 years are available starting from January 2004. Forecasts about the next quarter yield curve are reported at the monthly frequency, so the prediction horizon oscillates between 2 and 6 months.9 I remove from the sample those forecasters who reply to the survey for a short time period only (less than 5 years). At each time t, I remove those replies to the survey which contain answers for fewer than 3 prediction horizons, since what I am mostly interested in are forecasts at different horizons. Finally, for each prediction horizon, I remove outliers, defined as observations in the first and last percentile of the distribution of forecasts.

4.2 Tests of rationality

Tests of rationality involve the predictability of the forecast error, defined, for each available maturity, as:

$$FE_{t+1}[y_{t+1,m}] = y_{t+1,m} - y_{t+1,m|t},$$

where $y_{t+1,m}$ is the observed yield and $y_{t+1,m|t}$ is the prediction, made at time t, of the yield with maturity m at time t + 1. I compute forecast errors using both survey data and recovered beliefs. In the former case, each $y_{t+1,m|t}$ has multiple observations across different forecasters and the forecast error is forecaster specific. In the latter case, the forecast is computed as:

$$y_{t+1,m|t} := \sum_{X_{t+1} \in \text{Grid}} \underbrace{\mathbb{P}^{\theta}(X_{t+1}|X_t)}_{\text{Recovered beliefs}} \times \underbrace{\left(a^{\theta}_m + (b^{\mathbb{Q}^{\theta}}_m)^{\top} X_{t+1}\right)}_{\text{Empirical affine mapping}}.$$

9 The unpredictability of the forecast error which is implied by the rational expectation hypothesis is in principle unaltered by the moving forecast horizon. However, to ameliorate concerns regarding changes in expectations due to changes of the prediction horizon, I consider time fixed effects in the robustness section.

Expectations about future yields are conveniently decomposed into distorted expectations about factors (identified using the recovery theorem) and the pricing function (empirical affine mappings). The affine mapping is known because $a^{\theta}_m$ and $b^{\mathbb{Q}^{\theta}}_m$ have been estimated as discussed in Section 3. To simplify notation, $y_{t+1,m|t}$ does not carry the superscript θ.
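The recovered-belief forecast is a probability-weighted sum of affine yields over the grid, which the following sketch makes explicit. The function names and all numerical inputs are hypothetical and serve only to illustrate the formula; the grid, belief row and affine coefficients would in practice come from the estimation steps of Section 3.

```python
def affine_yield(a_m, b_m, x):
    """Affine pricing map a_m + b_m' x for a factor vector x."""
    return a_m + sum(b * xi for b, xi in zip(b_m, x))


def recovered_forecast(belief_row, grid, a_m, b_m):
    """Forecast of next-period yield: sum over grid states of
    recovered belief times the affine yield in that state."""
    return sum(p * affine_yield(a_m, b_m, x) for p, x in zip(belief_row, grid))


# Hypothetical inputs: three grid points for a two-factor state vector.
grid = [(-1.0, 0.5), (0.0, 0.0), (1.0, -0.5)]
belief_row = [0.2, 0.5, 0.3]      # recovered P(X_{t+1} | X_t) for the current state
a_m, b_m = 0.03, (0.01, 0.002)    # made-up affine coefficients for maturity m
forecast = recovered_forecast(belief_row, grid, a_m, b_m)
```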

Under the null hypothesis of full information rational expectations, the forecast error should not be predictable on the basis of past information. But what is past information? Following Coibion and Gorodnichenko (2015), I define information at time t by the forecast revision:

$$FR_t[y_{t+1,m}] := y_{t+1,m|t} - y_{t+1,m|t-1}.$$

The logic underlying this definition is that information moves beliefs: if no information is observed at time t, then there is no revision in beliefs, i.e. $FR_t[y_{t+1,m}] = 0$.

For each available maturity, I then run the regression:

$$FE_{t+1}[y_{t+1,m}] = \alpha_m + \beta_m\, FR_t[y_{t+1,m}] + \varepsilon_{t+1,m}.$$

I call βm ”CG coefficient” from Coibion and Gorodnichenko (2015). A positive βm > 0 is

interpreted as under-reaction to information at maturity m. In this case, when beliefs about

yields at horizon m are revised upward, they systematically under-estimate realized yields.

That is, beliefs are not revised enough. On the contrary, βm < 0 is interpreted as over-

reaction to information at maturity m: when beliefs about yields at horizon m are revised

upward, they systematically over-estimate realized yields. That is, beliefs are revised too

much. The CG coefficients obtained with the Ross recovered beliefs for maturities of 2, 3, . . . , 30 years10 and by pooled estimation on professional forecasters data are shown in Figure 3.

10 The 1y yield is used as the short rate and therefore predictions for this maturity cannot be computed.

Figure 3: Slope of FE on FR using recovered beliefs (red) and using the Blue Chip dataset (pooled OLS). Confidence intervals are computed at the 5% level.
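The rationality test described above amounts to a univariate OLS regression of forecast errors on forecast revisions, run maturity by maturity. A minimal sketch follows; the function names and the short made-up series are illustrative assumptions, and the sketch omits standard errors, pooling across forecasters and fixed effects.

```python
def forecast_errors_and_revisions(realized, forecast_t, forecast_tm1):
    """FE_{t+1} = y_{t+1,m} - y_{t+1,m|t} and FR_t = y_{t+1,m|t} - y_{t+1,m|t-1}."""
    fe = [y - f for y, f in zip(realized, forecast_t)]
    fr = [f - g for f, g in zip(forecast_t, forecast_tm1)]
    return fe, fr


def cg_regression(fe, fr):
    """OLS of forecast errors on forecast revisions; returns (alpha, beta)."""
    n = len(fe)
    mean_fe = sum(fe) / n
    mean_fr = sum(fr) / n
    cov = sum((x - mean_fr) * (y - mean_fe) for x, y in zip(fr, fe))
    var = sum((x - mean_fr) ** 2 for x in fr)
    beta = cov / var
    alpha = mean_fe - beta * mean_fr
    return alpha, beta


def interpret(beta, tol=1e-12):
    """Sign convention used in the text: beta > 0 is under-reaction, beta < 0 over-reaction."""
    if beta > tol:
        return "under-reaction"
    if beta < -tol:
        return "over-reaction"
    return "no predictability"


# Made-up yields (in percent) for one maturity, built so that FE = -0.5 * FR.
forecast_tm1 = [2.0, 2.1, 1.9, 2.2, 2.0]
forecast_t = [2.2, 2.0, 2.1, 2.1, 2.3]
realized = [2.1, 2.05, 2.0, 2.15, 2.15]
fe, fr = forecast_errors_and_revisions(realized, forecast_t, forecast_tm1)
alpha, beta = cg_regression(fe, fr)
label = interpret(beta)
```

In this constructed example upward revisions are followed by negative forecast errors, so the slope is negative and the beliefs are classified as over-reacting.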

Recovered beliefs also exhibit maturity dependent reaction to information. Just like the

beliefs of professional forecasters, they over-react more for long term yields than for short

term ones. Thus, the key message of excess reaction to information of long run beliefs relative

to short run beliefs is consistent across survey data and Ross recovered beliefs. The main

difference between the two datasets arises because Ross recovered beliefs exhibit a pattern

of under-reaction to information at the short end of the yield curve and over-reaction to

information at the long end. Professional forecaster data, on the contrary, exhibit over-reaction only.11,12

Ross recovered beliefs capture market beliefs, and the marginal investor is potentially different from the average forecaster. For this reason the qualitative similarity of predictability patterns across the two datasets is surprising: there is no mechanical reason for it, and this suggests that stronger over-reaction for longer horizons may be a robust feature of beliefs. We can compare Ross recovered beliefs and professional forecasts even more directly by correlating forecast revisions and the level of forecasts across the two datasets (considering the mean as well as the median forecast). Figure (4) shows that the two datasets are quite aligned along both criteria.13

11 In the robustness Section, I consider different specifications of the regression performed with professional forecasters data, including time fixed effects, forecaster fixed effects as well as single and double clustering of standard errors. The results are consistent.

12 In Bordalo et al. (2018a), the authors find under-reaction at maturities shorter than one year with Blue Chip data, at the quarterly frequency. Here, I do not consider those maturities for the sake of comparison with recovered beliefs.

Figure 4: Left panel: correlation between mean forecasts (red triangle), median forecasts (blue circle) and Ross recovered forecasts as a function of the maturity. Right panel: correlation between mean forecast revisions (red triangle), median forecast revisions (blue circle) and Ross recovered forecast revisions as a function of the maturity.

The strong, positive correlation between Ross recovered beliefs and professional forecasts

is important. It indicates that the two types of beliefs data are not noise, and thus they

capture significant features of market beliefs. Of course, the benefit of Ross recovered beliefs

is that, in addition to being more tightly linked to the marginal investor, they are available

for all maturities and frequencies.

Are market beliefs, as measured with the Ross method, better or worse than the beliefs of analysts? There are two possible metrics to assess this: the predictability of the forecast error (which captures biases in forecasting) and the mean square error (which captures accuracy in forecasting, i.e. bias plus precision). In Figure (5), I plot the estimated distribution of distortions (biases) using a Gaussian kernel density estimator.

13 For the comparison, I aggregated recovered beliefs from the daily to the monthly frequency.

The distribution of forecaster distortions is not symmetric around rationality (i.e. β = 0): the majority of forecasters over-react to news. This is reflected in the fact that the average bias, captured by the pooled regression in the introduction and also reported in Figure (5), is negative. Figure (5) also reports as a benchmark the CG coefficient of the consensus forecast, defined as the predictability of errors for the median analyst forecast. Note that the consensus always under-reacts to news, a fact that has been previously documented in Bordalo et al. (2018a), where the authors also reconcile it with individual analyst over-reaction.14

The distortion of the Ross recovered forecast lies in between the median forecaster and the average bias, and it is closer to rationality (β = 0) than both. Thus, regarding the bias dimension, this analysis suggests that the Ross recovered beliefs put more weight on unbiased forecasts than the average professional forecaster does. This may be due to arbitrage capital moving partly, though not fully, towards less biased investors.

Second, I investigate the accuracy of forecasters, as measured by the mean square error in prediction. In Figure (5), I consider two groups of forecasters: the top 25% most accurate forecasters and the bottom 25% least accurate forecasters. In the data, the former are rational, namely, their CG coefficients are statistically indistinguishable from zero, while the worst 25% are highly over-reacting. Thus, there is a positive relation between imprecision and bias in forecasting. A direct comparison between the correlations of Ross recovered forecasts with the top and bottom 25% of forecasters (ranked according to the MSE criterion) is offered in Figure (6).

Figure (6) suggests that the Ross recovered forecast slightly over-weights more accurate views. Overall, this analysis suggests that Ross forecasts put more weight on both more unbiased and more accurate forecasters.

14 The intuition is that when forecasters observe different noisy signals, stemming for instance from heterogeneous information sets, each analyst over-reacts to his own news, but does not react at all to the signals of other analysts. This second effect can be so strong that the consensus forecast under-reacts to the consensus revision even if each analyst over-reacts to his own information.

Figure 5: Density of forecasters' distortions (CG coefficients) for different maturities. Distortions from recovered beliefs are shown in black, pooled distortions in blue, consensus (median) distortions in purple, best forecasters' distortions (top quartile according to the mean square error criterion) in green, worst forecasters' distortions (bottom quartile according to the mean square error criterion) in red. Confidence intervals are computed at the 5% level.

Figure 6: Correlation between the mean forecast of the top 25% of forecasters (ranked according to the MSE criterion) and Ross recovered forecasts (blue triangle), and correlation between the mean forecast of the worst 25% of forecasters (ranked according to the MSE criterion) and Ross recovered forecasts (orange triangle).

Broadly speaking, this Section conveys the following messages. First, market beliefs recovered using Ross' method display the same maturity increasing over-reaction displayed by

survey data. Second, market beliefs and survey data are highly positively correlated. Third,

market beliefs are less biased and more accurate than the average professional forecaster.

This indicates that our recovered market beliefs capture systematic patterns in beliefs, and

that arbitrage helps reduce the impact of highly biased forecasters on asset prices, consistent

with basic asset pricing theory.

Having validated the recovered beliefs, several questions emerge. First, why do forecasts

over-react in this maturity dependent way? What is the psychological foundation? Second,

can the over-reaction detected from beliefs account for the excess volatility of interest rates

along the lines of Theorem (1)? To make progress, I specify a realistic model of belief

formation, based on the diagnostic expectation model of Bordalo et al. (2018b), and I show

that it offers an answer to both questions: it yields maturity dependent over-reaction, it can quantitatively account for belief distortions and this in turn captures a big chunk of the excess volatility of interest rates documented by Giglio and Kelly (2017).

5 The Horizon Dependent Diagnostic Expectations Model

Using the diagnostic expectations model of Bordalo et al. (2018b), I now rationalize maturity-

increasing over-reaction using a single parsimonious departure of beliefs from rationality,

captured by the diagnosticity parameter θ that I estimate and use to quantify the explanatory

power of the model for excess volatility of interest rates. As argued in Section 2, the duality

between belief distortions and over-reaction in interest rates offers two separate methods

for estimating θ, one based on beliefs data, another based on yields. I show that these

two independent strategies yield similar estimates for θ that are consistent with previous

estimates, and that can quantitatively account for excess volatility of interest rates and for

the predictability of the forecast errors.

5.1 Diagnostic Expectations

Diagnostic expectations are based on Kahneman and Tversky's representativeness heuristic in probability judgments (Tversky and Kahneman (1974)). Representativeness captures the idea that, when making a conditional probabilistic assessment about a group, humans typically over-weight representative (or diagnostic) traits, defined as the traits that are objectively more frequent in that group relative to a comparison group. A conventional example is the exaggeration of the probability that an Irish person is red haired, because this hair color is relatively more frequent in Ireland than elsewhere (although even in Ireland it is unlikely in absolute terms). This heuristic has been widely documented in the psychology and cognitive science literature since the seminal work of Tversky and Kahneman (1974), and it has recently been adapted to a dynamic setting by Bordalo et al. (2018b). When forecasting, economic agents exaggerate objectively positive news relative to a benchmark prediction, shaped by past information.

To capture this idea in my setting, consider the following diagnostic distribution at time t of interest rates with maturity m, $r_{t,m} = \frac{1}{m}\sum_{i=1}^{m} r_{t+i}$:

$$f^{\theta}_{\mathbb{P}}(r_{t,m}|X_t) \propto f_{\mathbb{P}}(r_{t,m}|X_t)\left(\frac{f_{\mathbb{P}}(r_{t,m}|X_t)}{f_{\mathbb{P}}(r_{t,m})}\right)^{\theta},$$

where $f_{\mathbb{P}}(\cdot)$ is the unconditional or long run distribution of $r_{t,m}$.

The diagnostic distribution of $r_{t,m}$ at time t re-weights the correct density $f_{\mathbb{P}}(\cdot|X_t)$ via the likelihood ratio $\frac{f_{\mathbb{P}}(r_{t,m}|X_t)}{f_{\mathbb{P}}(r_{t,m})}$ raised to the power θ. Investors over-weight future values of interest rates that are more likely under current information relative to the average information, where the latter is captured by the long run distribution. The parameter θ > 0 in the diagnostic distribution quantifies the degree of over-reaction. The larger θ is, the more the outcomes whose likelihood has increased are over-weighted in beliefs.

My model differs from the model of Bordalo et al. (2018b) along two dimensions. First, Bordalo et al. (2018b) compare rational forecasts at time t with rational forecasts at time t − 1, while I use the average information as the comparison. This is technically convenient

because my model preserves Markovianity, which greatly simplifies the identification of re-

covered beliefs in Section 3. In Section 6, I show, theoretically, that results are qualitatively

robust to the specification used in Bordalo et al. (2018b). Second, and more important, the

benchmark distribution in Bordalo et al. (2018b) is assumed to have the same volatility as

the rational forecast. This assumption is innocuous when considering a fixed horizon as in

Bordalo et al. (2018b) and it greatly simplifies the math. However, this assumption misses

an important intuition: namely that diagnostic distortions should depend on the underlying

uncertainty about the economic environment, as discussed in ?. I now show that allowing

for this role of uncertainty is highly relevant here, because it implies the violation of the law

of iterated expectations taking the form of maturity increasing over-reaction.

The diagnostic distribution can be conveniently computed for linear and Gaussian dynamics. Assume that the $\mathbb{P}$ dynamics of the factors is:

$$r_t = \delta_0 + \delta_1^{\top} X_t$$

$$X_t = \rho^{\mathbb{P}} X_{t-1} + \Sigma_C\, \varepsilon^{\mathbb{P}}_t,$$

where $X_t \overset{\mathbb{P}}{\sim} VAR(1)(\rho^{\mathbb{P}}, \Sigma)$, $\Sigma_C$ is a lower triangular matrix, $\varepsilon^{\mathbb{P}}_t$ are i.i.d. Gaussian shocks and $\Sigma := \Sigma_C \Sigma_C^{\top}$ is the one period variance-covariance matrix. Here, in order to derive a parsimonious expression for increasing over-reaction, I impose additional assumptions relative to the Q-affine setting (the latter is the only assumption I needed so far). Specifically, I assume VAR(1) Gaussian dynamics for the factors under the physical measure $\mathbb{P}$. Then, the diagnostic distribution of interest rates at maturity m is characterized as follows.

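Theorem 3. (Diagnostic distribution) Given $X_t \overset{\mathbb{P}}{\sim} VAR(1)(\rho^{\mathbb{P}}, \Sigma)$, under Gaussian noise, the diagnostic distribution of interest rates to maturity m, $f^{\theta}_{\mathbb{P}}(r_{t,m}|X_t)$, is Gaussian, with mean:

$$\mathbb{E}^{\mathbb{P}^{\theta}}_t[r_{t,m}] = \mathbb{E}^{\mathbb{P}}_t[r_{t,m}] + \psi^{\theta}_m\left(\mathbb{E}^{\mathbb{P}}_t[r_{t,m}] - \mathbb{E}^{\mathbb{P}}[r_{t,m}]\right) \tag{6}$$

and variance:

$$\mathbb{V}^{\mathbb{P}^{\theta}}_t[r_{t,m}] = \left(\frac{\theta + 1}{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]} - \frac{\theta}{\mathbb{V}^{\mathbb{P}}[r_{t,m}]}\right)^{-1}, \tag{7}$$

where

$$\psi^{\theta}_m := \frac{\theta\,\dfrac{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]}{\mathbb{V}^{\mathbb{P}}[r_{t,m}]}}{1 + \theta - \theta\,\dfrac{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]}{\mathbb{V}^{\mathbb{P}}[r_{t,m}]}}, \tag{8}$$

$$\mathbb{E}^{\mathbb{P}}_t[r_{t,m}] = \frac{1}{m}\left(\delta_0 + \delta_1^{\top}\left(\sum_{i=0}^{m-1} (\rho^{\mathbb{P}})^{i}\right)X_t\right), \tag{9}$$

$$\mathbb{E}^{\mathbb{P}}[r_{t,m}] = \delta_0, \tag{10}$$

$$\mathbb{V}^{\mathbb{P}}_t[r_{t,m}] = \frac{1}{m^2}\,\delta_1^{\top}\left(\sum_{i=0}^{2}\left(\sum_{i=0}^{m-2} (\rho^{\mathbb{P}})^{2i}\,\Sigma\right)\right)\delta_1 \tag{11}$$

and

$$\mathbb{V}^{\mathbb{P}}[r_{t,m}] = \mathbb{E}\left[\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]\right] + \mathbb{V}\left[\mathbb{E}^{\mathbb{P}}_t[r_{t,m}]\right] \tag{12}$$

$$= \mathbb{V}^{\mathbb{P}}_t[r_{t,m}] + \frac{1}{m^2}\,\delta_1^{\top}\left(\sum_{i=0}^{m-1} (\rho^{\mathbb{P}})^{i}\right)\Sigma\left(\sum_{i=0}^{m-1} (\rho^{\mathbb{P}})^{i}\right)\delta_1. \tag{13}$$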

As evident from Equation (6), the diagnostic model endogenizes the maturity increasing over-reaction assumed in reduced form in Section 2. The coefficient $\psi^{\theta}_m$ in formula (8), which characterizes the departures from rationality in Section 2, is now pinned down by: i) the scalar parameter θ, ii) the maturity m, and iii) the true conditional and unconditional variances $\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]$ and $\mathbb{V}^{\mathbb{P}}[r_{t,m}]$. Belief distortions start from zero at m = 0, then become positive at m = 1 and increase monotonically as m → ∞, approaching a finite limiting value equal to θ.

The intuition for this result is simple: as the maturity m increases, fundamental uncer-

tainty is higher. As a result, the tails of the distribution are prominent, so they more easily

come to mind after information makes them more likely. That is, after good news, the right

tail becomes very representative for long maturities and it is highly over-weighted. As a

result, expectations are too optimistic. After bad news, the left tail becomes very represen-

tative for long maturities and is highly overweighted. As a result, expectations become too

pessimistic. In both cases, beliefs over-react and they do so more for longer maturities.
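The mechanics of formulas (6) to (8) can be illustrated numerically. The sketch below is a hedged illustration only: the function names are hypothetical, and the variance ratios are made-up numbers standing in for the ratio of conditional to unconditional variance, which rises towards one as maturity grows.

```python
def psi(theta, var_cond, var_uncond):
    """Distortion coefficient of formula (8): depends on theta and on the
    ratio of conditional to unconditional variance."""
    ratio = var_cond / var_uncond
    return theta * ratio / (1.0 + theta - theta * ratio)


def diagnostic_moments(theta, mean_cond, var_cond, mean_uncond, var_uncond):
    """Mean (6) and variance (7) of the Gaussian diagnostic distribution."""
    mean = mean_cond + psi(theta, var_cond, var_uncond) * (mean_cond - mean_uncond)
    variance = 1.0 / ((theta + 1.0) / var_cond - theta / var_uncond)
    return mean, variance


# Hypothetical variance ratios, increasing with maturity and reaching one in the limit.
theta = 0.7
ratios = [0.1, 0.3, 0.6, 0.9, 1.0]
psis = [psi(theta, ratio, 1.0) for ratio in ratios]

# Rational forecast of 3 against a long run mean of 2, with theta = 1 and
# conditional variance equal to unconditional variance (the long maturity limit).
long_run_mean, long_run_var = diagnostic_moments(1.0, 3.0, 1.0, 2.0, 1.0)
```

The list `psis` increases with the variance ratio and reaches θ when the two variances coincide, which is the maturity increasing over-reaction described in the text.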

Theorem (3) also shows that in reduced form, for a given value of $\psi^{\theta}_m$, the diagnostic model yields the same expectations for interest rates assumed in Section 2 and it therefore yields the same rule for equilibrium yields as in Theorem (1):

$$y^{\theta}_{t,m} = a^{\theta}_m + (b^{\mathbb{Q}}_m)^{\top} X_t + \psi^{\theta}_m\,(b^{\mathbb{P}}_m)^{\top} X_t,$$

where, however, we consider the realistic multi-factor setting and the coefficients $b^{\mathbb{P}}_m$ and $b^{\mathbb{Q}}_m$ are now vectors, with different entries for different factors. Of course, the distortion $\psi^{\theta}_m$ now depends on the data generating process, on maturity m, and on diagnosticity θ. This aspect places restrictions on the quantitative analysis.

5.2 Calibration

Theorem (1) suggests two independent routes to estimate the same primitive parameter θ.

First, I retrieve θ using the profile of CG coefficients βm, obtained with recovered beliefs. I

match βm with their theoretical counterparts, as derived in Theorem (1). I can then assess

the fitting ability of such estimated θ on excess volatility. Second, I follow the mirror route:

I estimate θ by matching excess volatility, and then assess its ability in fitting the profile of

CG coefficients.

For the first exercise, I estimate the θ that best matches the profile of CG coefficients. Note that the fit here cannot be perfect: at the short end of the curve Ross recovered beliefs under-react, a pattern which cannot be rationalized under diagnostic expectations, namely θ > 0. The estimated value obtained when using this method fulfills:

$$\theta = \arg\min_{\theta} \sum_{m=2}^{30}\left(\beta_m - \beta^{\theta}_m\right)^2 \approx 0.7.$$
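A least squares calibration of this kind can be sketched with a simple grid search. Everything in the sketch is an assumption made for illustration: the function names, the grid step, the value c′ = 1 for the constant in Theorem 2, and the synthetic variance ratios and CG coefficients, which are generated from a known θ rather than taken from the data.

```python
def psi(theta, variance_ratio):
    """Distortion coefficient as a function of theta and the ratio of
    conditional to unconditional variance at a given maturity."""
    return theta * variance_ratio / (1.0 + theta - theta * variance_ratio)


def model_cg_coefficient(theta, variance_ratio, c_prime=1.0):
    """Model-implied CG coefficient: -c' * psi / (1 + psi)."""
    p = psi(theta, variance_ratio)
    return -c_prime * p / (1.0 + p)


def calibrate_theta(betas, variance_ratios, c_prime=1.0, step=0.01, theta_max=2.0):
    """Grid search for the theta minimizing the sum of squared gaps between
    observed and model-implied CG coefficients."""
    best_theta, best_loss = 0.0, float("inf")
    n_steps = int(round(theta_max / step))
    for k in range(n_steps + 1):
        theta = k * step
        loss = sum((b - model_cg_coefficient(theta, v, c_prime)) ** 2
                   for b, v in zip(betas, variance_ratios))
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta, best_loss


# Synthetic check: maturities 2..30 with made-up variance ratios, and CG
# coefficients generated from theta = 0.7 so the search has a known answer.
variance_ratios = [0.2 + 0.8 * (m - 2) / 28 for m in range(2, 31)]
synthetic_betas = [model_cg_coefficient(0.7, v) for v in variance_ratios]
theta_hat, loss = calibrate_theta(synthetic_betas, variance_ratios)
```

A grid search is chosen here because the objective is one-dimensional and cheap to evaluate; a numerical optimizer would be an equally valid choice.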

Second, I consider excess volatility. The diagnostic parameter θ disciplines excess volatility: the excess sensitivity of yields to factors is controlled by $\psi^{\theta}_m$, which is a function of the rational conditional variance of interest rates with maturity m and the rational unconditional variance of interest rates with the same maturity. The variance of yields in a diagnostic world relates to the variance of yields in a rational world as:

$$\underbrace{\mathbb{V}^{\mathbb{P}}[y^{\theta}_{t,m}]}_{\text{data}} = \underbrace{\mathbb{V}^{\mathbb{P}}[y_{t,m}]}_{\text{RE model}} + \psi^{\theta}_m\,(b^{\mathbb{P}}_m)^{\top}\Sigma\, b^{\mathbb{P}}_m.$$

From this observation, I can calibrate the diagnostic parameter θ from the excess volatility of yields itself. Indeed, the variance of rational yields on the right hand side can be estimated by fitting the short end of the yield curve as in Giglio and Kelly (2017), while $b^{\mathbb{P}}_m$ can be computed using the estimated persistence $\rho^{\mathbb{P}}$ obtained from a VAR(1) estimation of the P-factor dynamics.15 Therefore, I estimate θ as:

$$\theta = \arg\min_{\theta} \sum_{m=1}^{30}\left(\mathbb{V}^{\mathbb{P}}[y^{\theta}_{t,m}] - \mathbb{V}^{\mathbb{P}}[y^{\theta=0}_{t,m}] - \psi^{\theta}_m\,(b^{\mathbb{P}}_m)^{\top}\Sigma\, b^{\mathbb{P}}_m\right)^2 \approx 0.47.$$

15 I used the methodology of Section 3 and imposed consistency across short maturities, namely: $\rho^{\mathbb{Q}} := \arg\min_{\rho^{\mathbb{Q}}}\left(\frac{1}{TM}\sum_{t,m}\left(\hat{y}_{t,m} - y_{t,m}\right)^2\right)$. M is set equal to 5: it quantifies how short the short end of the yield curve is. Robustness checks relative to the parameters to be included in the short end show that the results are consistent.

The two estimates differ, but remarkably they are close to previous estimates of the parameter θ obtained using different data and different methodologies. For instance, Bordalo et al. (2018a) use survey forecasts of professional forecasters for many macro-financial variables and show that typically there is over-reaction with magnitude θ ∼ 0.6, and equal to θ ∼ 0.48 for medium to long term interest rates. This is an additional confirmation that in my setting Ross recovery captures robust patterns of beliefs. To grasp the quantitative meaning of the estimated values of θ, consider the benchmark θ = 1. In this case, distorted forecasts of long maturity rates are equal to the rational forecast plus the revision. Assuming that the baseline rational forecast for the long run rate is around 2%, after the arrival of news indicative of a higher rational forecast of 3%, the diagnostic forecast will then be 4%. Distortions are thus sizable. This back-of-the-envelope calculation shows that the numbers at play are economically relevant.
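The arithmetic behind this example can be written out from Equation (6), under the assumption that at long maturities the distortion coefficient is at its limiting value, so that ψ equals θ = 1 (the symbol r below is shorthand for the long maturity rate):

```latex
\mathbb{E}^{\mathbb{P}^{\theta}}_t[r]
  = \mathbb{E}^{\mathbb{P}}_t[r] + \psi\left(\mathbb{E}^{\mathbb{P}}_t[r] - \mathbb{E}^{\mathbb{P}}[r]\right)
  = 3\% + 1 \times (3\% - 2\%)
  = 4\%.
```

With the calibrated values instead of the benchmark, the same limiting calculation would give 3% + 0.47 × 1% = 3.47% and 3% + 0.7 × 1% = 3.7%.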

Having estimated values for the distortion parameter θ, we can now evaluate the accuracy of

the diagnostic model for excess volatility in interest rates. Figure (7) reports the volatilities

of yields at different maturities obtained under a three factor affine rational model, namely

a counter-factual model setting θ = 0, together with the variance of yields obtained from the

same model in which we plug the estimated θ ≈ 0.47 and θ ≈ 0.7.

Figure 7: Excess volatility in the data (blue up triangle) versus the fit of a rational expectations affine model (red down triangle) and the diagnostic expectations affine model with θ ≈ 0.47 (green right triangle) and θ ≈ 0.7 (purple left triangle).

Figure (7) shows that diagnostic expectations capture much of the variation of the yield curve. At the longest maturity available, diagnostic expectations fit roughly $\sqrt{\mathbb{V}^{\mathbb{P}}[y^{\theta}_{t,30}]_{\text{model}} \,/\, \mathbb{V}^{\mathbb{P}}[y^{\theta}_{t,30}]_{\text{data}}} \approx 82\%$ of excess volatility. Considering the distortion parameter estimated from the profile of CG coefficients, θ ≈ 0.7, provides an explanation of ≈ 55% of the excess volatility. This latter case shows that the information implied by CG coefficients actually does help explain excess volatility.

The symmetric question is, as suggested by Theorem (1), how much of the predictability

of the forecast error can be explained from a distorting parameter θ which is inferred from

excess volatility? The link between the forecast error predictability (CG coefficients) and the

over-reaction is the one in Theorem (2), where the coefficients ψθm have now been derived

from the diagnostic expectations model.

Figure 8: CG coefficients from Blue Chip data (blue up triangle), recovered beliefs (red right triangle) and implied by the calibrated diagnostic expectation model for θ ≈ 0.47 (green left triangle) and θ ≈ 0.7 (purple down triangle).

Figure (8) shows the term structure of CG coefficients, in the data (survey data in Blue,

Ross recovered beliefs in red) as well as the βm profiles implied by the calibrated parameters

θ ≈ 0.47 from excess volatility (green) and the calibrated distorted parameter θ ≈ 0.7 from

CG coefficients (purple). The fit is not expected to be perfect because under-reaction cannot

be reconciled directly with the diagnostic expectation model. Indeed, within this model, the

CG coefficients should be negative. The presence of under-reaction, particularly at short

maturities, has been previously documented by different authors (Bouchaud et al. (2019),

Bordalo et al. (2018a)). Forces other than representativeness may be at play, such as disagreement and informational frictions, which need to be further investigated. When considering the whole term structure, however, there is a clear pattern of increasing reaction to news and, in particular, of over-reaction at long maturities.

6 Robustness

In this Section, I discuss theoretical and empirical robustness to the recovery theorem. Then,

I consider different specifications of the forecast error predictability tests. Finally, I consider

alternative specifications of the diagnostic expectations model.

6.1 Empirical implementation of the recovery theorem

6.1.1 Discretization and sampling frequency

Figure (9) shows that the number of states chosen for the discretization for each factor

(N = 25, 50, 75, 100) and the sampling frequency (daily versus monthly) do not qualitatively

affect the CG coefficients curve.

Figure 9: Left panel: CG coefficients at the monthly frequency. Right panel: CG coefficients at the daily frequency.

6.1.2 Error predictability in different sub-samples

The level of the nominal yield curve is close to a unit root process, especially at high frequency (e.g. daily). Does possible non-stationarity relate to under/over-reaction to news? Figure (??) shows that the decreasing pattern in CG coefficients survives in different subsamples.

6.2 Predictability of the forecast error, different tests

In the diagnostic expectation model used, news has been defined as the difference between the rational forecast and the average forecast. Here I consider CG-like tests in which the forecast revision is defined as the difference between the rational forecast and the long-run forecast. The figure shows qualitative agreement, both using survey data and using recovered beliefs.
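As a minimal illustration of a CG-type test, the following Python sketch simulates an AR(1) fundamental, builds forecasts that over-react to the latest news, and regresses forecast errors on forecast revisions. The function names (`cg_coefficient`, `simulate_cg`), the single-factor AR(1) setting, the lagged-forecast reference and the constant distortion `psi` are illustrative assumptions, not the specification estimated in the paper.

```python
import numpy as np

def cg_coefficient(forecast_error, forecast_revision):
    """OLS slope (with intercept) of forecast errors on forecast revisions."""
    fe = np.asarray(forecast_error, dtype=float)
    fr = np.asarray(forecast_revision, dtype=float)
    fr_c = fr - fr.mean()
    return float(fr_c @ (fe - fe.mean()) / (fr_c @ fr_c))

def simulate_cg(psi, rho=0.9, T=20000, seed=0):
    """CG slope for x_{t+1} when forecasts over-react by psi to news (assumed setup)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + eps[t]
    t = np.arange(2, T - 1)
    # Distorted one-step forecast made at t: rational forecast plus psi times the news.
    f_t = rho * x[t] + psi * (rho * x[t] - rho**2 * x[t - 1])
    # Distorted two-step forecast of the same target made at t-1.
    f_tm1 = rho**2 * x[t - 1] + psi * (rho**2 * x[t - 1] - rho**3 * x[t - 2])
    return cg_coefficient(x[t + 1] - f_t, f_t - f_tm1)

beta_rational = simulate_cg(psi=0.0)
beta_overreact = simulate_cg(psi=0.5)
print(beta_rational, beta_overreact)
```

Under these assumptions the slope should be close to zero for `psi = 0` and negative for `psi > 0`, which is the sign pattern the text associates with over-reaction.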

6.3 Diagnostic Expectations: different benchmark distributions

One important degree of freedom in the specification of the diagnostic expectation model is

the choice of the comparison distribution. Here, I have chosen the unconditional or long run

distribution of future rates. This is a convenient choice because the diagnostic distribution

remains Markovian, namely rt,m depends on Xt but not on Xt−1,Xt−2, . . . . It is however

important to investigate what changes with different benchmark distributions.


Figure 10: Left panel: Blue Chip Financial. Right panel: Ross recovered beliefs

Over-reaction relative to time t− 1 prediction

Consider the following specification, inspired by Bordalo et al. (2018b)16:

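$$ f^{\mathbb{P}_\theta}(r_{t,m} \mid X_t) \propto f^{\mathbb{P}}(r_{t,m} \mid X_t) \left( \frac{f^{\mathbb{P}}(r_{t,m} \mid X_t)}{f^{\mathbb{P}}(r_{t,m} \mid X_{t-1})} \right)^{\theta}. $$

The diagnostic distribution of $r_{t,m}$ in this case is still Gaussian, with mean:

$$ \mathbb{E}^{\mathbb{P}_\theta}_t[r_{t,m}] = \mathbb{E}^{\mathbb{P}}_t[r_{t,m}] + \psi^\theta_m \left( \mathbb{E}^{\mathbb{P}}_t[r_{t,m}] - \mathbb{E}^{\mathbb{P}}_{t-1}[r_{t,m}] \right) $$

and variance:

$$ \mathbb{V}^{\mathbb{P}_\theta}_t[r_{t,m}] = \left( \frac{\theta + 1}{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]} - \frac{\theta}{\mathbb{V}^{\mathbb{P}}_{t-1}[r_{t,m}]} \right)^{-1}, $$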

16 In Bordalo et al. (2018b), the authors consider the convenient benchmark distribution $f^{\mathbb{P}}(r_{t,m} \mid X_t := X_{t-1})$. This simplifies the algebra of diagnostic expectations: only the first moment is distorted and the distortion is independent of the maturity. This assumption is innocuous from a single horizon perspective, which is the setting of Bordalo et al. (2018b), yet it forces the law of iterated expectations to hold, differently from slight perturbations of the benchmark distribution. Therefore, this simplifying assumption is not appropriate for my setting.


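where

$$ \psi^\theta_m := \frac{\theta \, \dfrac{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]}{\mathbb{V}^{\mathbb{P}}_{t-1}[r_{t,m}]}}{1 + \theta - \theta \, \dfrac{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]}{\mathbb{V}^{\mathbb{P}}_{t-1}[r_{t,m}]}}. $$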
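To see how a distortion coefficient of this form behaves across horizons, the sketch below evaluates $\theta R / (1 + \theta - \theta R)$, with $R$ the ratio of the two conditional variances, for a single AR(1) factor. It is only an illustration: the helper names (`psi`, `ar1_cond_var`), the parameter values, and the use of the variance of the terminal value rather than of the average rate $r_{t,m}$ are assumptions made here for simplicity.

```python
import numpy as np

def psi(theta, var_t, var_ref):
    """Distortion coefficient theta*R / (1 + theta - theta*R), with R = var_t / var_ref."""
    ratio = np.asarray(var_t, dtype=float) / np.asarray(var_ref, dtype=float)
    return theta * ratio / (1.0 + theta - theta * ratio)

def ar1_cond_var(horizon, rho, sigma=1.0):
    """Variance of an AR(1) value `horizon` steps ahead, given today's state."""
    horizon = np.asarray(horizon, dtype=float)
    return sigma**2 * (1.0 - rho ** (2.0 * horizon)) / (1.0 - rho**2)

theta, rho = 0.7, 0.95          # hypothetical values
m = np.arange(0, 400)
# Information dated t-1 is one step further away from the target than information dated t.
psi_m = psi(theta, ar1_cond_var(m, rho), ar1_cond_var(m + 1, rho))
print(psi_m[0], psi_m[-1])
```

In this simplified setting the coefficient is zero at the shortest horizon, rises with the horizon, and levels off near `theta`.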

Also in this case, the distortion coefficients $\psi^\theta_m$ start at zero (namely $\lim_{m\to 0} \psi^\theta_m = 0$), asymptotically approach $\theta$ (namely $\lim_{m\to\infty} \psi^\theta_m = \theta$) and are increasing in $m$. Moving to risk neutral distorted expectations, the diagnostic yield curve reads:

$$ y^\theta_{t,m} = a^\theta_m + (b^{\mathbb{Q}}_m)^\top X_t + \psi^\theta_m \left( (b^{\mathbb{P}}_m)^\top X_t - (b^{\mathbb{P}}_{m+1})^\top X_{t-1} \right). $$

In this case, the volatility of yields in a diagnostic world, relative to the rational one, is modified by the forecast revision of interest rates. After good news yields are higher, and after bad news they are lower, relative to the RE case. Unconditionally, yields display higher variance in the diagnostic case, since the extra term $\psi^\theta_m \left( (b^{\mathbb{P}}_m)^\top X_t - (b^{\mathbb{P}}_{m+1})^\top X_{t-1} \right)$ is positively correlated with $(b^{\mathbb{Q}}_m)^\top X_t$. This is so because $b^{\mathbb{P}}_{m+1} < b^{\mathbb{P}}_m$ and the $\mathbb{P}$ persistence is on average smaller than one (in the non-linear case it is assumed to be smaller than one on average).

Over-reaction relative to time t− k prediction (k > 1)

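In this case the logic of the case k = 1 still goes through with:

$$ \psi^\theta_m := \frac{\theta \, \dfrac{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]}{\mathbb{V}^{\mathbb{P}}_{t-k}[r_{t,m}]}}{1 + \theta - \theta \, \dfrac{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]}{\mathbb{V}^{\mathbb{P}}_{t-k}[r_{t,m}]}} $$

and:

$$ y^\theta_{t,m} = a^\theta_m + \psi^\theta_m \left( (b^{\mathbb{Q}}_m)^\top X_{t+1} - (b^{\mathbb{P}}_{m+k})^\top X_{t-k} \right). $$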

Over-reaction relative to a weighted average of past predictions

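Yesterday's information and average information are two useful benchmarks, yet one may consider a more "colorful" memory. I consider the following specification, inspired by Bordalo et al. (2018b, Internet Appendix):

$$ f^{\mathbb{P}_\theta}(r_{t,m} \mid X_t) \propto f^{\mathbb{P}}(r_{t,m} \mid X_t) \prod_{k=1}^{M} \left( \frac{f^{\mathbb{P}}(r_{t,m} \mid X_t)}{f^{\mathbb{P}}(r_{t,m} \mid X_{t-k})} \right)^{\theta a_k}, $$

where $0 \le a_k \le 1$ are non-negative weights on past information such that $\sum_k a_k = 1$ and $1 \le M \le \infty$.17 The diagnostic distribution of $r_{t,m}$ in this case is still Gaussian, with mean:

$$ \mathbb{E}^{\mathbb{P}_\theta}_t[r_{t,m}] = \mathbb{E}^{\mathbb{P}}_t[r_{t,m}] + \psi^\theta_m \left( \mathbb{E}^{\mathbb{P}}_t[r_{t,m}] - \sum_{k=1}^{M} a_k \, \mathbb{E}^{\mathbb{P}}_{t-k}[r_{t,m}] \right) $$

and variance:

$$ \mathbb{V}^{\mathbb{P}_\theta}_t[r_{t,m}] = \left( \frac{\theta + 1}{\mathbb{V}^{\mathbb{P}}_t[r_{t,m}]} - \theta \sum_{k=1}^{M} \frac{a_k}{\mathbb{V}^{\mathbb{P}}_{t-k}[r_{t,m}]} \right)^{-1}, $$

where

$$ \psi^\theta_m := \frac{\theta \, \mathbb{V}^{\mathbb{P}}_t[r_{t,m}] \sum_{k=1}^{M} \dfrac{a_k}{\mathbb{V}^{\mathbb{P}}_{t-k}[r_{t,m}]}}{1 + \theta - \theta \, \mathbb{V}^{\mathbb{P}}_t[r_{t,m}] \sum_{k=1}^{M} \dfrac{a_k}{\mathbb{V}^{\mathbb{P}}_{t-k}[r_{t,m}]}}. $$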

17 When M = ∞, suitable regularity conditions need to be assumed for convergence.

Also in this case, the distortion coefficients $\psi^\theta_m$ start at zero (namely $\lim_{m\to 0} \psi^\theta_m = 0$), asymptotically approach $\theta$ (namely $\lim_{m\to\infty} \psi^\theta_m = \theta$) and are increasing in $m$. Moving to risk neutral distorted expectations, the diagnostic yield curve reads:

$$ y^\theta_{t,m} = a^\theta_m + \psi^\theta_m \left( (b^{\mathbb{Q}}_m)^\top X_{t+1} - \sum_{k=1}^{M} a_k \, (b^{\mathbb{P}}_{m+k})^\top X_{t-k} \right). $$

The diagnostic yield curve is highly non-Markovian in this case, yet it still features the excess volatility pattern.

7 Conclusions

This paper shows empirically that belief distortions increase with maturity (decreasing CG coefficients) and that this is tightly linked to the excess volatility in the term structure of asset prices documented by Giglio and Kelly (2017). The crucial property that beliefs violate in the data, and whose failure accounts for excess volatility, is the law of iterated expectations. I show that a diagnostic affine model can quantitatively capture the variation in excess volatility and the distortions in individual beliefs. In the model, agents over-react differently for different levels of fundamental uncertainty, which naturally varies in the context of the term structure. This approach is a first step toward two research directions: the investigation of the effects of higher moments in belief distortions and the incorporation of non-rational beliefs into quantitative finance models. At the aggregate level, I show that belief dynamics mix under-reaction at short maturities and over-reaction at long maturities. A foundation for such a cross-over needs further investigation. Methodologically, this paper implements the Ross recovery theorem empirically in the context of the term structure of interest rates and shows that, despite the identification problem raised by Borovicka et al. (2016), recovered beliefs can spot inter-temporal belief inconsistencies. This is important for augmenting survey data with asset price information, and it is portable to different domains, such as the study of the equity term structure.

References

Augenblick, N. and Lazarus, E. (2017). Restrictions on asset-price movements under rational

expectations: Theory and evidence. Technical report, Working Paper.

Bansal, R. and Yaron, A. (2004). Risks for the long run: A potential resolution of asset

pricing puzzles. The Journal of Finance, 59(4):1481–1509.

Bordalo, P., Gennaioli, N., Ma, Y., and Shleifer, A. (2018a). Overreaction in macroeconomic

expectations. Technical report, Working Paper.

Bordalo, P., Gennaioli, N., Porta, R. L., and Shleifer, A. (2017). Diagnostic expectations

and stock returns. Technical report, National Bureau of Economic Research.

Bordalo, P., Gennaioli, N., and Shleifer, A. (2018b). Diagnostic expectations and credit

cycles. The Journal of Finance, 73(1):199–227.

Borovicka, J., Hansen, L. P., and Scheinkman, J. A. (2016). Misspecified recovery. The

Journal of Finance, 71(6):2493–2544.

Bouchaud, J.-P., Krueger, P., Landier, A., and Thesmar, D. (2019). Sticky expectations and

the profitability anomaly. The Journal of Finance, 74(2):639–674.

Brooks, J., Katz, M., and Lustig, H. (2018). Post-FOMC announcement drift in US bond

markets. Technical report, National Bureau of Economic Research.

Buraschi, A., Piatti, I., and Whelan, P. (2018). Rationality and subjective bond risk premia.

Campbell, J. Y. (2003). Consumption-based asset pricing. Handbook of the Economics of

Finance, 1:803–887.

Campbell, J. Y. and Shiller, R. J. (1987). Cointegration and tests of present value models.

Journal of Political Economy, 95(5):1062–1088.

Cieslak, A. (2018). Short-rate expectations and unexpected returns in treasury bonds. The

Review of Financial Studies, 31(9):3265–3306.

Cochrane, J. H. (2011). Presidential address: Discount rates. The Journal of Finance,

66(4):1047–1108.

Cochrane, J. H. and Piazzesi, M. (2009). Decomposing the yield curve. In AFA 2010 Atlanta

Meetings Paper.

Coibion, O. and Gorodnichenko, Y. (2015). Information rigidity and the expectations formation process: A simple framework and new facts. American Economic Review, 105(8):2644–2678.

Cooley, T. F. (1995). Frontiers of business cycle research. Princeton University Press.

De Bondt, W. F. and Thaler, R. (1985). Does the stock market overreact? The Journal of Finance, 40(3):793–805.

Duffee, G. (2013). Forecasting interest rates. In Handbook of economic forecasting, volume 2,

pages 385–426. Elsevier.

Gennaioli, N. and Shleifer, A. (2018). A Crisis of Beliefs: Investor Psychology and Financial

Fragility. Princeton University Press.

Giglio, S. and Kelly, B. (2017). Excess volatility: Beyond discount rates. The Quarterly

Journal of Economics, 133(1):71–127.

Gurkaynak, R. S., Sack, B., and Wright, J. H. (2007). The US Treasury yield curve: 1961 to the present. Journal of Monetary Economics, 54(8):2291–2304.

Hamilton, J. D. and Wu, J. C. (2012). Identification and estimation of Gaussian affine term

structure models. Journal of Econometrics, 168(2):315–331.

Jensen, C. S., Lando, D., and Pedersen, L. H. (2019). Generalized recovery. Journal of

Financial Economics, 133(1):154–174.

Le, A., Singleton, K. J., and Dai, Q. (2010). Discrete-time affine-Q term structure models

with generalized market prices of risk. The Review of Financial Studies, 23(5):2184–2227.


LeRoy, S. F. and Porter, R. D. (1981). The present-value relation: Tests based on implied

variance bounds. Econometrica: Journal of the Econometric Society, pages 555–574.

Martin, I. W. and Ross, S. A. (2019). Notes on the yield curve. Journal of Financial

Economics.

Meyer, C. D. (2000). Matrix analysis and applied linear algebra, volume 71. SIAM.

Piazzesi, M. and Schneider, M. (2011). Trend and cycle in bond premia. Manuscript, Stanford

Univ., http://www. stanford. edu/ piazzesi/trend cycle. pdf.

Qin, L., Linetsky, V., and Nie, Y. (2018). Long forward probabilities, recovery, and the term

structure of bond risk premiums. The Review of Financial Studies, 31(12):4863–4883.

Ross, S. (2015). The recovery theorem. The Journal of Finance, 70(2):615–648.

Shiller, R. J. (1981). The use of volatility measures in assessing market efficiency. The Journal

of Finance, 36(2):291–304.

Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases.

Science, 185(4157):1124–1131.

Walden, J. (2017). Recovery with unbounded diffusion processes. Review of Finance,

21(4):1403–1444.


A Affine term structure models with overreacting beliefs

Consider first the standard Q-affine term structure models, which are defined by the following two ingredients. First, a few factors (or state variables) $X_t = (X_{1,t}, X_{2,t}, \cdots)$ drive the short rate in an affine fashion and, second, the Q-dynamics (assuming no arbitrage) of the factors is a VAR(1) with homoskedastic shocks:

$$ r_t = \delta_0 + \delta_1^\top X_t, $$

$$ X_t = \rho^{\mathbb{Q}} X_{t-1} + \Sigma^{\mathbb{Q}} \varepsilon^{\mathbb{Q}}_t. $$

This defines the class of Q-affine models I consider, together with sufficient regularity conditions for the quantities computed to be well defined. The convenient affine specification and linear dynamics imply that prices are exponentially affine in the factors:

$$ y_{t,m} = -\frac{1}{m} \log P_{t,m} = -\frac{1}{m} \log \mathbb{E}^{\mathbb{Q}}_t\left[ e^{-m \cdot r_{t,m}} \right] = \delta_0 - \log \mathbb{E}^{\mathbb{Q}}_t\left[ e^{-\varepsilon^{\mathbb{Q}}_{t,m}} \right] + \frac{\delta_1^\top}{m} \sum_{i=0}^{m-1} (\rho^{\mathbb{Q}})^i X_t, $$

where $b^{\mathbb{Q}}_m = \frac{\delta_1^\top}{m} \sum_{i=0}^{m-1} (\rho^{\mathbb{Q}})^i$ and $\varepsilon^{\mathbb{Q}}_{t,m} = \sum_{l=1}^{m-1} \sum_{i=1}^{l} (\rho^{\mathbb{Q}})^{l-1} \Sigma^{\mathbb{Q}} \varepsilon_{t+i}$. The previous expression is affine since Q shocks are independent of time t information, and therefore the cumulant generating function in the previous expression is maturity dependent but not state dependent. This setting is quite general, since I make no assumptions about the physical measure or about the SDF other than technical regularity conditions, which are worked out in Le et al. (2010).
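A short numerical sketch may help fix ideas about the sensitivity coefficients $b^{\mathbb{Q}}_m$. The code below evaluates $\frac{\delta_1^\top}{m}\sum_{i=0}^{m-1}(\rho^{\mathbb{Q}})^i$ for a diagonal persistence matrix. The function name `q_loadings` is hypothetical, and using the Y1 slopes of Table 1 as $\delta_1$ together with the average estimated $\rho^{\mathbb{Q}}$ (with one period per maturity unit) is an assumption made purely for illustration.

```python
import numpy as np

def q_loadings(delta1, rho_q, m):
    """b_m = delta1' * (1/m) * sum_{i=0}^{m-1} rho^i for a diagonal persistence matrix."""
    delta1 = np.asarray(delta1, dtype=float)
    rho_q = np.asarray(rho_q, dtype=float)              # diagonal entries of rho^Q
    powers = rho_q[None, :] ** np.arange(m)[:, None]    # shape (m, n_factors)
    return delta1 * powers.sum(axis=0) / m

# Illustrative inputs (assumed): Y1 slopes from Table 1 and the average estimated rho^Q.
delta1 = np.array([0.21, 0.52, 0.25])
rho_q = np.array([0.968190, 0.92868686, 0.911450])

for m in (1, 5, 10, 30):
    print(m, q_loadings(delta1, rho_q, m))
```

With persistence below one, each loading shrinks as the maturity grows, and less persistent factors lose weight faster.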

First, I discuss how this setting relates to the empirical analysis of Section 3 and Section 4

and then how this setting relates to the theoretical model of beliefs formation developed in

Section 2. I do so by incrementally adding structure and assumptions needed, relative to the

Q-affine benchmark so far discussed.

In Section 3, I apply the recovery theorem, independently estimating the Q measure at different maturities. This amounts to detecting distortions in the sensitivity coefficients $b^{\mathbb{Q}}_m = \frac{\delta_1^\top}{m} \sum_{i=0}^{m-1} (\rho^{\mathbb{Q}})^i$. Theorem (1) shows that overreacting beliefs can both generate such distortions, which, in turn, account for the ? excess volatility puzzle, and explain the predictability of forecast errors documented with survey data.

How do beliefs generate such distortions? Assume that, under the physical measure P, factors evolve in a Markovian fashion:

$$ X_{t+m} = f^{(m)}(X_t) X_t + \Sigma^{\mathbb{P}} \varepsilon^{\mathbb{P}}_{t+m,C}, $$

where $f^{(m)}(X_{t-1}) = \underbrace{f(f(\cdots}_{m \text{ times}}(X_{t-1})))$ denotes a Markovian, yet possibly non-linear, dynamics for the factors and $\varepsilon^{\mathbb{P}}_{t+m,C} = \sum_{i=0}^{m-2} f^{(i+1)}(X_t) \Sigma^{\mathbb{P}} \varepsilon^{\mathbb{P}}_{t+1+i}$ denotes the cumulated shock to maturity m, which is heteroskedastic if the P-dynamics is non-linear. Then, assume that, when considering horizon m, the market distorts the dynamics of the factors in a maturity-dependent fashion. Formally, $\forall\, l \le m$:

$$ X_{t+l} = (1 + \psi^\theta_m) f^{(l)}(X_t) X_t + \sigma^\theta_m \varepsilon^{\mathbb{P}}_{t+l,C}. $$

The local persistences of all fundamentals up to time t + m are inflated if $\psi^\theta_m > 0$. Equivalently, cumulated shocks on the factors at time t + l when forming expectations at maturity $m \ge l$ are distorted as:

$$ \varepsilon^{\mathbb{P}_\theta}_{t+l,C} = \sigma^\theta_m \varepsilon^{\mathbb{P}}_{t+l,C} + \psi^\theta_m f^{(l)}(X_t) X_t. $$

Then, the distorted interest rate to maturity m at time t reads:

$$ r_{t,m} = \frac{1}{m} \sum_{i=0}^{m} r_{t+i} = \delta_0 + (1 + \psi^\theta_m) b^{\mathbb{P}}_m(X_t) X_t + \sigma^\theta_m \varepsilon^{\mathbb{P}}_{t+m,C}, $$

where $b^{\mathbb{P}}_m$ as well as cumulated shocks to interest rates are defined as in Section 2. Then, I assume that the SDF is such that:

$$ X_{t+l} = (\rho^{\mathbb{Q}})^l X_t + \psi^\theta_m f^{(l)}(X_t) X_t + \sigma^\theta_m \varepsilon^{\mathbb{Q}}_{t+l,C}. $$

Equivalently, shocks on the factors at time t + l when forming risk adjusted expectations at maturity $m \ge l$ are distorted as:

$$ \varepsilon^{\mathbb{Q}_\theta}_{t+l,C} = \sigma^\theta_m \varepsilon^{\mathbb{Q}}_{t+l,C} + \psi^\theta_m b^{\mathbb{P}}_m(X_t) X_t, $$

which implies that:

$$ r_{t,m} = \frac{1}{m} \sum_{i=0}^{m} r_{t+i} = \delta_0 + b^{\mathbb{Q}}_m X_t + \psi^\theta_m b^{\mathbb{P}}_m(X_t) X_t + \sigma^\theta_m \varepsilon^{\mathbb{Q}}_{t,m}. $$

This expression sets the ground for testing both excess volatility of yields ($b^{\mathbb{P}}_m(X_t)$ contributing positively, and increasingly in m, to the volatility of yields) and increasingly over-reacting beliefs ($\psi^\theta_m > 0$ and increasing in m).

B Estimation

B.1 Factor construction

I use the data available from Gurkaynak et al. (2007), who provide a daily estimate of the yield curve, from ??, with maturities 1y, 2y, . . . , 30y.

Factors are constructed as follows. First, I estimate the variance-covariance matrix as:

$$ \frac{1}{T} \sum_{t=1}^{T} \underbrace{y_t y_t^\top}_{30 \times 30}. $$

Then, I consider the first three principal components of the variance-covariance matrix, PC1, PC2 and PC3 (each of dimension 30 × 1). Principal components are unconditionally orthogonal. For i = 1, 2, 3, I build factor $X_i$ as the projection of the yield curve onto $PC_i$:

$$ X_{t,i} := \langle PC_i, y_t \rangle. $$

Principal components and factors are plotted in Figure ??.
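The construction can be sketched in a few lines of Python. The example below follows the steps as described (uncentered second-moment matrix, leading three eigenvectors, projection), but runs on synthetic yields generated here for illustration only; the function name `build_factors` and all simulated inputs are assumptions, not the author's code or data.

```python
import numpy as np

def build_factors(yields, n_factors=3):
    """Project yields (T x N) on the leading eigenvectors of (1/T) * sum_t y_t y_t'."""
    y = np.asarray(yields, dtype=float)
    second_moment = y.T @ y / y.shape[0]            # N x N, uncentered as in the text
    eigval, eigvec = np.linalg.eigh(second_moment)  # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]
    pcs = eigvec[:, order]                          # N x n_factors, orthonormal columns
    return y @ pcs, pcs

# Synthetic yields, for illustration only (not the Gurkaynak et al. data).
rng = np.random.default_rng(0)
T, N = 500, 30
maturities = np.arange(1, N + 1)
level = 5.0 + np.cumsum(rng.normal(0, 0.05, T))
slope = rng.normal(0, 0.5, T)
yields = (level[:, None] + slope[:, None] * (maturities / N)
          + rng.normal(0, 0.02, (T, N)))

factors, pcs = build_factors(yields)
print(factors.shape, pcs.shape)
```

Because the principal components are orthonormal eigenvectors of the same matrix, the sample second-moment matrix of the resulting factors is diagonal, which is the sense in which the factors are unconditionally orthogonal.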

B.2 Parameter Estimation

We want to estimate the parameters of the risk neutral dynamics:

$$ y_{t,1} = \delta_0 + \delta_1^\top X_t, $$

$$ X_t = \rho X_{t-1} + \Sigma_C \varepsilon^{\mathbb{Q}}_t, $$

where $\Sigma_C$ is the upper triangular Cholesky decomposition of the one period variance-covariance matrix $\Sigma$. This is assumed not to be stochastic and to be the same under Q and under P. Therefore it can be estimated via OLS.

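In order to estimate the remaining parameters $\rho_1$, $\rho_2$ and $\rho_3$, let me compute the affine relation for yields. The price at time t of a bond with time to maturity $m \ge 1$ is:

$$ P_{t,m} = e^{-y_{t,m} m} = \mathbb{E}^{\mathbb{Q}}_t\left[ e^{-(r_{t+1} + \cdots + r_{t+m})} \right] = \exp\left\{ -\delta_0 m - \sum_{i=1}^{3} \left( \delta_{1,i} \sum_{k=0}^{m-1} \rho_i^k \right) X_{t,i} \right\} \mathbb{E}^{\mathbb{Q}}_t\left[ \exp\left\{ -\sum_{i=0}^{m-2} \delta_1^\top \Sigma_C \rho^i \right\} \right]. $$

Moving from prices to yields:

$$ y_{t,m} = -\frac{\log P_{t,m}}{m} = \delta_0 + \frac{\delta_1^\top \sum_{k=0}^{m-1} \rho^k}{m} X_t - \frac{\log \mathbb{E}^{\mathbb{Q}}_t\left[ \exp\left\{ -\sum_{i=0}^{m-2} \delta_1^\top \Sigma_C \rho^i \right\} \right]}{m}. $$

Note that the maturity m needs to be expressed in the same units as the sampling frequency. For example, for monthly data and yearly maturities from 1y to 30y, the maturities should be expressed in months, m = 12 × 1, . . . , 12 × 30.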

Let us now discuss how to estimate the model's parameters via OLS. First, we get $\delta_0$, $\delta_1^\top$ via OLS estimation of:

$$ y_{1,t} := r_t = \delta_0 + \delta_1^\top X_t. $$

Then we can estimate $\rho_{i,\tau}$ as:

$$ \rho := \arg\min_{\rho_{i,\tau}} \left( \frac{1}{T} \sum_{t=1}^{T} \sum_{m=1}^{30} y_{t,m}(\rho) - y_{m,t} \right)^2, $$

where $y_{t,m}(\rho)$ is computed via the affine relation, while $y_{m,t}$ denotes the data. Similarly, to obtain maturity-by-maturity parameters, I compute:

$$ \rho_m := \arg\min_{\rho_{i,\tau}} \left( \frac{1}{T} \sum_{t=1}^{T} y_{t,m}(\rho) - y_{m,t} \right)^2. $$

B.3 Discretization

I need to discretize the state space in order to compute the Arrow-Debreu price matrix. There are well known methods to efficiently discretize an AR(1). In the case of multivariate processes with non-trivial correlations, the following transformation provides a set of uncorrelated equations:

$$ X_t \longrightarrow Z_t := \Sigma_C^{-1} X_t. $$

Now, consider the VAR(1) Q-dynamics:

$$ X_t = \rho X_{t-1} + \Sigma_C \varepsilon^{\mathbb{Q}}_t. $$

Multiplying both sides by $\Sigma_C^{-1}$, I get:

$$ Z_t = \rho Z_{t-1} + \varepsilon^{\mathbb{Q}}_t. $$

This is a system of three independent AR(1) processes that can be independently discretized. After the discretization, I go back to the original factors by applying the inverse transformation to the discretized version of $Z_t$, say $Z^D_t$:

$$ X^D_t \longleftarrow \Sigma_C Z^D_t. $$

I rely on the Rouwenhorst discretization method, which matches conditional and unconditional first and second moments and is the state of the art among these techniques. I use 10 levels for each factor, for a total of 1000 states. This choice does not seem to affect the results.
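For concreteness, here is a minimal sketch of the Rouwenhorst construction for a single AR(1) with unit-variance shocks, using the standard recursive definition of the transition matrix with p = q = (1 + ρ)/2. The function name `rouwenhorst` and the choice of the first diagonal entry of the average estimated ρ as the persistence are illustrative assumptions; this is not the author's implementation.

```python
import numpy as np

def rouwenhorst(n, rho, sigma_eps=1.0):
    """Grid and transition matrix approximating z' = rho*z + sigma_eps*eps with n states."""
    p = (1.0 + rho) / 2.0
    P = np.array([[p, 1.0 - p], [1.0 - p, p]])
    for k in range(3, n + 1):
        Pk = np.zeros((k, k))
        Pk[:-1, :-1] += p * P
        Pk[:-1, 1:] += (1.0 - p) * P
        Pk[1:, :-1] += (1.0 - p) * P
        Pk[1:, 1:] += p * P
        Pk[1:-1, :] /= 2.0          # interior rows are counted twice in the recursion
        P = Pk
    # Evenly spaced grid whose width matches the unconditional standard deviation.
    half_width = np.sqrt(n - 1) * sigma_eps / np.sqrt(1.0 - rho**2)
    grid = np.linspace(-half_width, half_width, n)
    return grid, P

rho = 0.968190                      # assumed: one diagonal entry of the estimated rho
grid, P = rouwenhorst(10, rho)
cond_mean = P @ grid
cond_var = P @ grid**2 - cond_mean**2
print(grid[[0, -1]], cond_var[:3])
```

In this sketch the conditional mean and variance of the discretized chain reproduce those of the continuous AR(1) at every grid point, which is the moment-matching property mentioned in the text.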


C Tables

Table 1: Regressions of yields on factors

                   Y1       Y2       Y3       Y4       Y5       Y6       Y7       Y8       Y9       Y10
const              3.71***  3.97***  4.21***  4.43***  4.63***  4.81***  4.97***  5.11***  5.23***  5.80***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X1,t               0.21***  0.22***  0.22***  0.21***  0.21***  0.20***  0.20***  0.19***  0.19***  0.16***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X2,t               0.52***  0.42***  0.33***  0.25***  0.19***  0.13***  0.08***  0.04***  0.01***  -0.16***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X3,t               0.25***  0.16***  0.08***  0.01***  -0.05*** -0.11*** -0.15*** -0.18*** -0.21*** 0.36***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
R-squared          1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00
No. observations   7903     7903     7903     7903     7903     7903     7903     7903     7903     7903

                   Y11      Y12      Y13      Y14      Y15      Y16      Y17      Y18      Y19      Y20
const              5.43***  5.51***  5.58***  5.64***  5.69***  5.73***  5.76***  5.79***  5.81***  5.83***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X1,t               0.18***  0.18***  0.18***  0.18***  0.18***  0.17***  0.17***  0.17***  0.17***  0.17***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X2,t               -0.04*** -0.06*** -0.07*** -0.09*** -0.10*** -0.11*** -0.12*** -0.12*** -0.13*** -0.13***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X3,t               -0.22*** -0.22*** -0.21*** -0.19*** -0.17*** -0.14*** -0.11*** -0.08*** -0.05*** -0.01***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
R-squared          1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00
No. observations   7903     7903     7903     7903     7903     7903     7903     7903     7903     7903

                   Y21      Y22      Y23      Y24      Y25      Y26      Y27      Y28      Y29      Y30
const              5.84***  5.84***  5.85***  5.85***  5.84***  5.84***  5.83***  5.82***  5.81***  5.80***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X1,t               0.17***  0.17***  0.17***  0.17***  0.17***  0.17***  0.16***  0.16***  0.16***  0.16***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X2,t               -0.14*** -0.14*** -0.14*** -0.15*** -0.15*** -0.15*** -0.15*** -0.15*** -0.16*** -0.16***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
X3,t               0.03***  0.06***  0.10***  0.14***  0.18***  0.22***  0.25***  0.29***  0.33***  0.36***
                   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)   (0.00)
R-squared          1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00
No. observations   7903     7903     7903     7903     7903     7903     7903     7903     7903     7903

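The estimated covariance matrix of the factors is:

$$ \Sigma = \begin{pmatrix} 1 & -0.333506 & -0.221877 \\ -0.333506 & 1 & 0.033376 \\ -0.221877 & 0.033376 & 1 \end{pmatrix}. $$

The Cholesky decomposition of the estimated covariance matrix of the factors is:

$$ \Sigma_C = \begin{pmatrix} 1 & 0 & 0 \\ -0.333506 & 0.94274798 & 0 \\ -0.221877 & -0.0430882 & 0.974122171 \end{pmatrix}. $$

Average estimated ρ:

$$ \rho^{\mathbb{Q}} = \begin{pmatrix} 0.968190 & 0 & 0 \\ 0 & 0.92868686 & 0 \\ 0 & 0 & 0.911450 \end{pmatrix}. $$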
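The reported Cholesky factor can be checked against the covariance matrix with a few lines of Python. The sketch below multiplies the factor by its transpose and then recomputes the factor with `numpy.linalg.cholesky`; the variable names are illustrative, and the numbers are those printed above.

```python
import numpy as np

# Cholesky factor as reported in the text (lower triangular as printed).
sigma_c = np.array([
    [ 1.0,       0.0,        0.0],
    [-0.333506,  0.94274798, 0.0],
    [-0.221877, -0.0430882,  0.974122171],
])

sigma = sigma_c @ sigma_c.T          # implied covariance matrix of the factors
print(np.round(sigma, 6))

# Recomputing the factor from the implied covariance should return the input.
print(np.allclose(np.linalg.cholesky(sigma), sigma_c))
```

The implied matrix is symmetric with a unit diagonal (up to rounding), and its off-diagonal entries are approximately −0.333506, −0.221877 and 0.033376.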

D The Recovery Theorem

Let me now show how the path independence assumption helps with the identification of beliefs. Under path independence the fundamental asset pricing equation can be written as:

$$ A^\theta_{ij} z_j = \delta z_i P^\theta_{ij}. $$

Noticing that probabilities add up to one, one gets:

$$ \sum_j A^\theta_{ij} z_j = \delta z_i. $$

This is an eigenvalue problem for the Arrow-Debreu matrix. By the Perron-Frobenius theorem18 (see Meyer (2000)), the Arrow-Debreu matrix has a unique positive eigenvector (which can be identified with z) corresponding to the largest eigenvalue (which can be identified with δ). Therefore, under a path independent SDF it is possible to identify beliefs as:

$$ P^\theta_{ij} = A^\theta_{ij} \, \frac{1}{\delta} \, \frac{z_j}{z_i}. $$
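The identification argument can be illustrated on a toy example. The sketch below builds an Arrow-Debreu matrix from hypothetical beliefs, discount factor and positive vector z through $A_{ij} z_j = \delta z_i P_{ij}$, and then recovers them from the leading eigenpair. The three-state numbers and the function name `recover_beliefs` are assumptions for illustration only.

```python
import numpy as np

def recover_beliefs(A):
    """Return (delta, z, P) from an Arrow-Debreu price matrix A via its Perron eigenpair."""
    eigval, eigvec = np.linalg.eig(A)
    k = np.argmax(eigval.real)                  # Perron root: largest eigenvalue
    delta = eigval[k].real
    z = eigvec[:, k].real
    z = z / z.sum()                             # fix sign and scale so that z > 0
    P = A * z[None, :] / (delta * z[:, None])   # P_ij = A_ij * z_j / (delta * z_i)
    return delta, z, P

# Hypothetical three-state example: beliefs, discount factor and positive vector z.
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6]])
delta_true = 0.98
z_true = np.array([1.0, 1.5, 2.5])
A = delta_true * P_true * z_true[:, None] / z_true[None, :]   # A_ij z_j = delta z_i P_ij

delta, z, P = recover_beliefs(A)
print(delta)
print(np.round(P, 6))
```

The eigenvector is identified only up to scale, so the sketch normalizes it to sum to one; the recovered transition matrix does not depend on that normalization.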

18 The Perron-Frobenius theorem assumes a positive and irreducible matrix. The Arrow-Debreu matrix is positive and irreducible under no arbitrage.

So far, I exploited only one period Arrow-Debreu prices. Do prices of multi-period Arrow-Debreu securities provide additional information? If the law of iterated expectations holds, then long term beliefs are iterations of short term beliefs:

$$ P_m = \underbrace{P \times \cdots \times P}_{m \text{ times}} = P^m, $$

where $P^m$ is the m-th power of the one period transition probability matrix P. In this case one period Arrow-Debreu prices are sufficient to identify the term structure of beliefs. However, whether the law holds or not is ultimately an empirical question; moreover, a testable implication of the law of iterated expectations is that the term structure of CG coefficients needs to be flat (2). Given a panel of bond prices $\{P_{t,m}\}_{t,m}$, Arrow-Debreu prices to maturity m can therefore be estimated by relying on the time series $\{P_{m,t}\}_t$ only. Using the recovery theorem, the econometrician can access the term structure of beliefs $P^\theta_m$, where $[P^\theta_m]_{ij}$ is the believed probability of transitioning from state i to state j in m steps, and test the horizon dependence of beliefs as discussed in (2).

E Proofs

Proposition 1

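Proof. From the last equation in Appendix A, it is straightforward to compute:

$$ y_{t,m} = -\frac{1}{m} \log \mathbb{E}^{\mathbb{Q}_\theta}_t\left[ e^{-m \cdot r_{t,m}} \right] = \delta_0 - \frac{1}{m} \log \mathbb{E}^{\mathbb{Q}}\left[ e^{-\sigma^\theta_m \varepsilon^{\mathbb{Q}}_{t,m}} \right] + b^{\mathbb{Q}}_m X_t + \psi^\theta_m b^{\mathbb{P}}_m(X_t) X_t. $$

The expectation in the cumulant generating function is independent of time t information because Q shocks are homoskedastic.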

Theorem 1 (Increasing over-reaction and excess volatility)

Proof. First, in order to compute the forecast of future yields (which are an endogenous variable) at maturity m, note that the agent first computes distorted forecasts of the factors and then plugs them into the pricing equation which relates yields to factors. Therefore, there is a compounding effect of distortion coefficients, which will appear in the following calculations. The following approximation will be helpful. Consider the term:

$$ \frac{b_m^\top \mathrm{Cov}\left( F_t[X_{t+1}], F_t[X_{t+1}] - F_{t-k}[X_{t+1}] \right) b_m}{b_m^\top \mathbb{V}\left[ F_t[X_{t+1}] - F_{t-k}[X_{t+1}] \right] b_m}, $$

where $F_t[X_{t+1}] - F_{t-k}[X_{t+1}]$ is the forecast revision when the reference distribution in the expectation model is the k-lagged one. The case k = ∞ corresponds to taking the unconditional (or historical) average as benchmark. Convex combinations of past forecasts can be easily handled as well. In this proof, $b_m := b^{\mathbb{P}}_m(X_t)$ for convenience of notation. If the DGP is a VAR(1) with diagonal persistence matrix ρ, then:

$$ \frac{b_m^\top \mathrm{Cov}\left( F_t[X_{t+1}], F_t[X_{t+1}] - F_{t-k}[X_{t+1}] \right) b_m}{b_m^\top \mathbb{V}\left[ F_t[X_{t+1}] - F_{t-k}[X_{t+1}] \right] b_m} = 1 - \frac{b_m^\top (\rho^2 - \rho^{1+k}) \mathbb{V}[X_t] b_m}{b_m^\top \mathbb{V}\left[ F_t[X_{t+1}] - F_{t-k}[X_{t+1}] \right] b_m}. \qquad (14) $$

Therefore, the term is approximately equal to one for highly persistent processes. The argument also goes through for non-linear processes, provided that the local persistence $\rho(X_t)$ is, on average, close to one. Moreover, for a fixed persistence matrix ρ, the deviation from one asymptotically vanishes for m → ∞. This is so because each entry of $b_m$ converges geometrically for large maturities. The approximation is also exact in the special case of exactly equally persistent factors or in the special case of a single factor model.

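i) Under rational expectations, $\beta_m = 0$ because the forecast error is unpredictable on the basis of past information.

Under maturity independent distortion $\psi^\theta$:

$$ \beta_m = \frac{\mathrm{Cov}\left( FE^\theta_{t+1}[y^\theta_{t+1,m}], FR^\theta_t[y^\theta_{t+1,m}] \right)}{\mathbb{V}\left[ FR^\theta_t[y^\theta_{t+1,m}] \right]} = \frac{-\psi^\theta (1 + \psi^\theta)}{(1 + \psi^\theta)^2} \cdot \frac{(b^\theta_m)^\top \mathrm{Cov}\left( F_t[X_{t+1}], FR_t[X_{t+1}] \right) b^\theta_m}{(b^\theta_m)^\top \mathbb{V}\left[ FR_t[X_{t+1}] \right] b^\theta_m}. $$

Under average reference forecast, or under a one factor model, or under approximation (14), the second fraction reduces to a positive constant. Therefore $\beta_m$ is negative $\iff \psi^\theta > 0$.

Under maturity dependent distortion $\psi^\theta_m$:

$$ \beta_m = \frac{\mathrm{Cov}\left( FE^\theta_{t+1}[y^\theta_{t+1,m}], FR^\theta_t[y^\theta_{t+1,m}] \right)}{\mathbb{V}\left[ FR^\theta_t[y^\theta_{t+1,m}] \right]} = \frac{-\psi^\theta_{m+1} (1 + \psi^\theta_m)}{(1 + \psi^\theta_m)^2} \cdot \frac{(b^\theta_m)^\top \mathrm{Cov}\left( F_t[X_{t+1}], F_t[X_{t+1}] - \frac{\psi^\theta_{m+2}}{\psi^\theta_{m+1}} F_{t-1}[X_{t+1}] \right) b^\theta_m}{(b^\theta_m)^\top \mathbb{V}\left[ F_t[X_{t+1}] - \frac{\psi^\theta_{m+2}}{\psi^\theta_{m+1}} F_{t-1}[X_{t+1}] \right] b^\theta_m}. $$

Under average reference forecast or under a one factor model the second fraction reduces to a positive constant. Under approximation (14), and assuming $\frac{\psi_{m+2}}{\psi_{m+1}} \rho_i < 1$ for each entry of the persistence matrix ρ, the second fraction reduces to a positive constant. $\beta^\theta_m$ is thus negative and increasing $\iff \psi^\theta_m > 0$ and $\psi^\theta_m$ is increasing in m.

ii) It follows from the expression in Proposition (1).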

Theorem 2

Proof. i) Calculations


ii)

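1. Under rational expectations, the misspecified CG coefficients, computed with forecasts identified by the recovery theorem and distorted by the martingale component of the SDF, are (a tilde marks misspecified quantities):

$$ \tilde{\beta}_m = \frac{\mathrm{Cov}\left( \widetilde{FE}_{t+1}[y_{t+1,m}], \widetilde{FR}_t[y_{t+1,m}] \right)}{\mathbb{V}\left[ \widetilde{FR}_t[y_{t+1,m}] \right]}. $$

First, consider the one period ahead misspecified forecast. I drop the additive constant $a^{\mathbb{Q}}_m$ in the following calculations, because it is inessential for the computation of covariances.

$$ \tilde{F}_t[y_{t+1,m}] = b_m^\top \mathbb{E}_t[h_{t+1} X_{t+1}] = b_m^\top \mathbb{E}_t[X_{t+1}] \mathbb{E}_t[h_{t+1}] + b_m^\top \mathrm{Cov}_t(h_{t+1}, X_{t+1}) = b_m^\top \mathbb{E}_t[X_{t+1}] + b_m^\top \mathrm{Cov}_t(h_{t+1}, X_{t+1}). $$

Therefore, the misspecified forecast error reads:

$$ \widetilde{FE}_{t+1}[y_{t+1,m}] = y_{t+1,m} - \tilde{\mathbb{E}}_t[y_{t+1,m}] = FE_{t+1}[y_{t+1,m}] - b_m^\top \mathrm{Cov}_t(h_{t+1}, X_{t+1}). $$

Similarly, the two period ahead misspecified forecast reads:

$$ \tilde{F}_{t-1}[y_{t+1,m}] = b_m^\top \mathbb{E}_{t-1}[h_{t+1} X_{t+1}] = b_m^\top \mathbb{E}_{t-1}[X_{t+1}] \mathbb{E}_{t-1}[h_{t+1}] + b_m^\top \mathrm{Cov}_{t-1}(h_{t+1}, X_{t+1}) = b_m^\top \mathbb{E}_{t-1}[X_{t+1}] + b_m^\top \mathrm{Cov}_{t-1}(h_{t+1}, X_{t+1}). $$

The forecast revision of yields with maturity m reads:

$$ \widetilde{FR}_t[y_{t+1,m}] = FR_t[y_{t+1,m}] + b_m^\top \left( \mathrm{Cov}_t(h_{t+1}, X_{t+1}) - \mathrm{Cov}_{t-1}(h_{t+1}, X_{t+1}) \right). $$

The covariance between forecast error and forecast revision reads:

$$ \mathrm{Cov}\left( \widetilde{FE}_{t+1}[y_{t+1,m}], \widetilde{FR}_t[y_{t+1,m}] \right) = b_m^\top B_1 b_m, $$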

where:

$$ B_1 := -\mathrm{Cov}\left( \mathrm{Cov}_t(h_{t+1}, X_{t+1}), FR_t[X_{t+1}] \right). $$

I used the fact that under rational expectations the forecast error and the forecast revision are orthogonal. Note that the term $B_1$ does not depend on the maturity m. The variance of the misspecified forecast revision reads:

$$ \mathbb{V}\left[ \widetilde{FR}_t[y_{t+1,m}] \right] = b_m^\top \left( \mathbb{V}[FR_t[X_{t+1}]] + B_2 \right) b_m, $$

where:

$$ B_2 = \mathbb{V}\left[ \mathrm{Cov}_t(h_{t+1}, X_{t+1}) - \mathrm{Cov}_{t-1}(h_{t+1}, X_{t+1}) \right] + 2\, \mathrm{Cov}\left( FR_t[X_{t+1}], \mathrm{Cov}_t(h_{t+1}, X_{t+1}) - \mathrm{Cov}_{t-1}(h_{t+1}, X_{t+1}) \right). $$

The misspecified CG coefficients therefore read:

$$ \tilde{\beta}_m = \frac{b_m^\top \left( \mathrm{Cov}(FE_{t+1}[X_{t+1}], FR_t[X_{t+1}]) + B_1 \right) b_m}{b_m^\top \left( \mathbb{V}[FR_t[X_{t+1}]] + B_2 \right) b_m} = \frac{b_m^\top B_1 b_m}{b_m^\top \left( \mathbb{V}[FR_t[X_{t+1}]] + B_2 \right) b_m}, $$

which, under approximation (14), does not depend on m. The denominator is positive, since it is a variance. The numerator is positive in the case of standard consumption based asset pricing models19. In this case, the econometrician detects a flat term structure of CG coefficients, even though there are no departures from RE. Conversely, a non-trivial term structure of CG regression coefficients reveals to the econometrician that beliefs are not rational.

19 In standard consumption based asset pricing models, the SDF is negatively correlated with consumption growth, which corresponds to a linear combination of the factors. Therefore, when positive news arrives, the first term in the unconditional covariance in B1 decreases, while the forecast revision increases. The unconditional covariance is therefore negative and thus B1 is positive.

2. Non rational expectations. First, consider the one period ahead misspecified forecast. I drop the additive constant $a^{\mathbb{Q}_\theta}_m$ in the following calculations, because it is inessential for the computation of covariances.

FEθ

t+1,m := yθt+1,m − EPθt

[yθt+1,m

]= yθt+1,m − (1 + ψθm+1)EP

t

[yθt+1,m

]yθt+1,m − EPθ

t

[yθt+1,m

]= yθt+1,m − (1 + ψθm+1)(EP

t

[yθt+1,m

]+ EP

t

[yθt+1,m

]− EP

t

[yθt+1,m

])

= FEθt+1,m +

(EPt

[yθt+1,m

]− EP

t

[yθt+1,m

])− ψθ

m+1EPt

[yθt+1,m

].

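The distorted and misspecified CG coefficients (marked with a tilde) read:

$$ \tilde{\beta}^\theta_m = \beta^\theta_m + \frac{(b^\theta_m)^\top \left( \mathrm{Cov}\left( F_{t+1}[X_{t+1}] - F_t[X_{t+1}], F_t[X_{t+1}] - \frac{\psi_{m+2}}{\psi_{m+1}} F_{t-1}[X_{t+1}] \right) + B_1 \right) b^\theta_m}{(b^\theta_m)^\top \left( \mathbb{V}[FR_t[X_{t+1}]] + B_2 \right) b^\theta_m}. $$

The second term, under a one factor model or under approximation (14), does not depend on m. The coefficients $\tilde{\beta}^\theta_m$ are negative and decreasing iff $\psi^\theta_m$ is positive and increasing, provided that

$$ \frac{(b^\theta_m)^\top \mathbb{V}[F_t[X_{t+1}]] b^\theta_m}{(b^\theta_m)^\top \mathrm{Cov}\left( F_t[X_{t+1}], F_{t-1}[X_{t+1}] \right) b^\theta_m} > \frac{\psi^\theta_2}{\psi^\theta_1}. $$

Differences of distorting coefficients remain identified.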

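Theorem 3 (P-diagnostic expectations)

Proof.

$$ r_{t,m} \overset{\mathbb{P}}{\sim} \mathcal{N}\left( \delta_0, \; \frac{1}{m^2} (\delta_1)^\top \Sigma \left( 1 - (\rho^{\mathbb{P}})^2 \right)^{-1} \delta_1 \right) $$

and

$$ r_{t+m} \mid X_t \overset{\mathbb{P}}{\sim} \mathcal{N}\left( \delta_0 + (\delta_1)^\top \left( \sum_{i=0}^{m-1} (\rho^{\mathbb{P}})^i \right) X_t, \; \frac{1}{m^2} (\delta_1)^\top \Sigma \left( \sum_{i=0}^{m-2} (\rho^{\mathbb{P}})^{2i} \right) \delta_1 \right). $$

We want to compute the distribution:

$$ f^{\mathbb{P}_\theta}(r_{t+m} \mid X_t) \propto f^{\mathbb{P}}(r_{t+m} \mid X_t) \left( \frac{f^{\mathbb{P}}(r_{t+m} \mid X_t)}{f^{\mathbb{P}}(r_{t+m})} \right)^{\theta}. \qquad (15) $$

First observe that, given $G_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $G_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$ with $\sigma_2^2 > \sigma_1^2$:

$$ \frac{1}{\int_{\mathbb{R}} f_{G_1}(x) \left( \frac{f_{G_1}(x)}{f_{G_2}(x)} \right)^{\theta} dx} \; f_{G_1}(x) \left( \frac{f_{G_1}(x)}{f_{G_2}(x)} \right)^{\theta} $$

is a Gaussian pdf with mean:

$$ \mu_1 + \frac{\theta \frac{\sigma_1^2}{\sigma_2^2}}{1 + \theta - \theta \frac{\sigma_1^2}{\sigma_2^2}} (\mu_1 - \mu_2), $$

and variance:

$$ \left( \frac{1 + \theta}{\sigma_1^2} - \frac{\theta}{\sigma_2^2} \right)^{-1}. $$

Then, apply the previous computation to $r_{t,m}$.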
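The Gaussian computation can be checked numerically. The sketch below compares the closed-form mean and variance with a brute-force integration of the density proportional to $f_{G_1}(x)\,(f_{G_1}(x)/f_{G_2}(x))^{\theta}$, that is, the form of equation (15) with the conditional density in the role of $G_1$. The parameter values, function names and integration grid are hypothetical choices made for illustration.

```python
import numpy as np

def diagnostic_moments(mu1, var1, mu2, var2, theta):
    """Closed-form mean and variance of the density proportional to f1 * (f1 / f2)**theta."""
    ratio = var1 / var2
    mean = mu1 + theta * ratio / (1.0 + theta - theta * ratio) * (mu1 - mu2)
    var = 1.0 / ((1.0 + theta) / var1 - theta / var2)
    return mean, var

def numerical_moments(mu1, var1, mu2, var2, theta, lo=-30.0, hi=30.0, n=200001):
    """Same moments by brute-force integration on a grid (Riemann sum)."""
    x = np.linspace(lo, hi, n)
    log_f1 = -0.5 * (x - mu1) ** 2 / var1
    log_f2 = -0.5 * (x - mu2) ** 2 / var2
    log_w = (1.0 + theta) * log_f1 - theta * log_f2   # constants drop out after normalizing
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    mean = float(w @ x)
    var = float(w @ (x - mean) ** 2)
    return mean, var

# Hypothetical parameter values, with var2 > var1 as required in the text.
closed = diagnostic_moments(1.0, 1.0, 0.0, 4.0, 0.7)
numeric = numerical_moments(1.0, 1.0, 0.0, 4.0, 0.7)
print(closed)
print(numeric)
```

In this example the distorted mean moves further away from $\mu_2$ than $\mu_1$ does, and the distorted variance is smaller than $\sigma_1^2$.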

Rational pricing of zero coupon bonds

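Consider the arbitrage-free price at time t of a zero coupon bond (ZCB) maturing at time t + m:

$$ P_{t,m} = \mathbb{E}^{\mathbb{Q}}_t\left[ e^{-m \cdot r_{t,m}} \right], $$

where, as usual, $r_{t,m} = \frac{\sum_{i=0}^{m-1} r_{t+i}}{m}$ and $r_t$ denotes the short rate at time t. Assume that the short rate is an affine function of a set of factors, $r_t = \delta_0 + \delta_1^\top X_t$, which evolve as a VAR(1) under the risk neutral measure:

$$ X_{t+1} = \rho^{\mathbb{Q}} X_t + \Sigma^{\mathbb{Q}}_C \varepsilon^{\mathbb{Q}}_{t+1}. $$

$\Sigma^{\mathbb{Q}}_C$ is upper triangular, $\rho^{\mathbb{Q}}$ is diagonal and $\Sigma^{\mathbb{Q}} := (\Sigma^{\mathbb{Q}}_C)^\top \Sigma^{\mathbb{Q}}_C$ is the variance-covariance matrix of the residuals. Then, $r_{t,m}$ has conditional risk neutral mean and variance given by:

$$ \mathbb{E}^{\mathbb{Q}}_t[r_{t,m}] = \delta_0 + \frac{\delta_1^\top \sum_{i=0}^{m-1} (\rho^{\mathbb{Q}})^i X_t}{m}, $$

$$ \mathbb{V}^{\mathbb{Q}}_t[r_{t,m}] = \frac{\delta_1^\top \sum_{i=0}^{m-2} \sum_{j=0}^{i} (\rho^{\mathbb{Q}})^{2j} \Sigma^{\mathbb{Q}} \delta_1}{m^2}. $$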

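Consider the following class of additive P-dynamics for the factors:

$$ X_{t+1} = f^{\mathbb{P}}(X_t) + \Sigma^{\mathbb{P}}_C \varepsilon^{\mathbb{P}}_{t+1}, $$

where $\Sigma^{\mathbb{P}}_C$ is upper triangular, $f^{\mathbb{P}}(X_t)$ is diagonal and $\Sigma^{\mathbb{P}} := (\Sigma^{\mathbb{P}}_C)^\top \Sigma^{\mathbb{P}}_C$ is the variance-covariance matrix of the residuals. The P dynamics is assumed to be Markovian, like the risk neutral one, and with additive noise. It is not assumed to be linear. The linear case is simply recovered by imposing $f^{\mathbb{P}}(X_t) = \rho^{\mathbb{P}} X_t$. Then, $r_{t,m}$ has conditional physical mean and variance given by:

$$ \mathbb{E}^{\mathbb{P}}_t[r_{t,m}] = \delta_0 + \frac{\delta_1^\top \sum_{i=0}^{m-1} f^{\mathbb{P}}_i(X_t)}{m}, $$

$$ \mathbb{V}^{\mathbb{P}}_t[r_{t,m}] = \frac{\delta_1^\top \sum_{i=0}^{m-2} \sum_{j=0}^{i} (f^{\mathbb{P}}_j)^2 \Sigma^{\mathbb{P}} \delta_1}{m^2}, $$

where $f^{\mathbb{P}}_0(X_t) = X_t$, $f^{\mathbb{P}}_1(X_t) = f^{\mathbb{P}}(X_t)$ and, for i > 1, $f^{\mathbb{P}}_i(X_t) = \underbrace{f^{\mathbb{P}}(\ldots f^{\mathbb{P}}}_{i \text{ times}}(X_t))$.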

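Thus, under this class of asset pricing models the following change of measure applies:

\[
\begin{aligned}
\Sigma^{Q}_C\,\varepsilon^{Q}_{t+1} &= \Sigma^{P}_C\,\varepsilon^{P}_{t+1} + \left(f^{P}(X_t) - \rho^{Q} X_t\right) \\
&= \Sigma^{P}_C\,\varepsilon^{P}_{t+1} + E^{P}_t[X_{t+1}] - E^{Q}_t[X_{t+1}].
\end{aligned}
\]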

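Effective shocks to interest rates with maturity $m$, $r_{t,m} - E^{k}_t[r_{t,m}]$ for $k = P, Q$, are given by:

\[
\delta_1^\top \sum_{i=0}^{m-2}\sum_{j=0}^{i} (\rho^{Q})^{j}\,\Sigma^{Q}\,\varepsilon^{Q}_{t+i} = \delta_1^\top \sum_{i=0}^{m-2}\sum_{j=0}^{i} f^{P}_j(X_t)\,\Sigma^{P}\,\varepsilon^{P}_{t+i} + m\left(\delta_1^\top \sum_{i=0}^{m-1} f^{P}_i(X_t) - \delta_1^\top \sum_{i=0}^{m-1} (\rho^{Q})^{i} X_t\right),
\]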

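or:

\[
\sqrt{V^{Q}_t[r_{t,m}]}\,\varepsilon^{Q}_{t,m} = \sqrt{V^{P}_t[r_{t,m}]}\,\varepsilon^{P}_{t,m} + E^{P}_t[r_{t,m}] - E^{Q}_t[r_{t,m}], \tag{16}
\]

where $\varepsilon^{k}_{t,m} := \frac{1}{m}\sum_{i=1}^{m-1}\sum_{j=0}^{i} \varepsilon^{k}_{t+j}$, for $k = P, Q$.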

Pricing of zero coupon bonds with over-reacting beliefs

First consider the case in which belief distortions affect only the first moment, and do so linearly: $E^{P_\theta}_t[r_{t,m}] := a_{t,m} E^{P}_t[r_{t,m}] + b_{t,m}$, where $a_{t,m}$ and $b_{t,m}$ are known at time $t$. Then, an agent who believes that $E^{P_\theta}_t[r_{t,m}]$ is the correct mean, while having the same preferences as in the rational case, will adjust beliefs as:

\[
\sqrt{V^{Q}_t[r_{t,m}]}\,\varepsilon^{Q}_{t,m} = \sqrt{V^{P}_t[r_{t,m}]}\,\varepsilon^{P}_{t,m} + a_{t,m} E^{P}_t[r_{t,m}] + b_{t,m} - E^{Q_\theta}_t[r_{t,m}].
\]

The risk neutral distorted expectation $E^{Q_\theta}_t[r_{t,m}]$ is determined from the condition that the change of measure (preference adjustment) is unchanged:

\[
E^{Q_\theta}_t[r_{t,m}] = E^{Q}_t[r_{t,m}] + (a_{t,m} - 1) E^{P}_t[r_{t,m}] + b_{t,m}.
\]
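The step leading to the distorted risk neutral mean can be spelled out. The derivation below is a sketch under the assumption stated in the text, namely that the shocks and the preference adjustment are the same in the rational and in the distorted case; it simply subtracts equation (16) from the distorted relation.

```latex
\begin{align*}
\sqrt{V^{Q}_t[r_{t,m}]}\,\varepsilon^{Q}_{t,m}
  &= \sqrt{V^{P}_t[r_{t,m}]}\,\varepsilon^{P}_{t,m}
   + E^{P}_t[r_{t,m}] - E^{Q}_t[r_{t,m}]
  && \text{(rational case, eq. 16)} \\
\sqrt{V^{Q}_t[r_{t,m}]}\,\varepsilon^{Q}_{t,m}
  &= \sqrt{V^{P}_t[r_{t,m}]}\,\varepsilon^{P}_{t,m}
   + a_{t,m} E^{P}_t[r_{t,m}] + b_{t,m} - E^{Q_\theta}_t[r_{t,m}]
  && \text{(distorted case)} \\
0 &= (a_{t,m} - 1) E^{P}_t[r_{t,m}] + b_{t,m}
   - E^{Q_\theta}_t[r_{t,m}] + E^{Q}_t[r_{t,m}]
  && \text{(difference)} \\
E^{Q_\theta}_t[r_{t,m}]
  &= E^{Q}_t[r_{t,m}] + (a_{t,m} - 1) E^{P}_t[r_{t,m}] + b_{t,m}.
\end{align*}
```

With $a_{t,m} = 1$ and $b_{t,m} = 0$ the distortion vanishes and the rational risk neutral mean is recovered.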

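Consider now the case in which the variance is also distorted; in particular, it is scaled as $V^{P_\theta}_t[r_{t,m}] = c^{2}_{t,m} V^{P}_t[r_{t,m}]$, where $c^{2}_{t,m}$ is known at time $t$. Then:

\[
\sqrt{V^{Q_\theta}_t[r_{t,m}]}\,\varepsilon^{Q}_{t,m} = c_{t,m}\sqrt{V^{P}_t[r_{t,m}]}\,\varepsilon^{P}_{t,m} + a_{t,m} E^{P}_t[r_{t,m}] + b_{t,m} - E^{Q_\theta}_t[r_{t,m}].
\]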

Then:

\[
V^{Q_\theta}_t[r_{t,m}] = V^{Q}_t[r_{t,m}]\,c_{t,m},
\]

and:

\[
E^{Q_\theta}_t[r_{t,m}] = E^{Q}_t[r_{t,m}] + (a_{t,m} - 1) E^{P}_t[r_{t,m}] + b_{t,m}.
\]

Note that if $P_\theta$ satisfies the law of iterated expectations, then:

\[
E^{P}_{t-k}\left[E^{P_\theta}_t[r_{t,m}] - E^{P}_t[r_{t,m}]\right] = (a_{t-k,m} - 1) E^{P}_{t-k}[r_{t-k,m}] + b_{t-k,m} = E^{P_\theta}_{t-k}[r_{t-k,m}] - E^{P}_{t-k}[r_{t-k,m}].
\]

This means that the conditional bias $E^{P_\theta}_t[r_{t,m}] - E^{P}_t[r_{t,m}]$ is a P-martingale and is therefore unpredictable. Similarly, temporary mispricing is not predictable. If the law is violated instead, there is, in principle, predictable mispricing (arbitrage opportunities).


Figure 11: Time mean (blue circle) and median (red triangle) of analysts' dispersion (cross-sectional standard deviation) as a function of the maturity.

F Additional Results

Figure (5) shows that distortions from recovered beliefs are a combination of the "average" distortions (i.e. the distortion from the pooled regression) and of the distortion of the "average" (i.e. the consensus forecast), at least statistically. To quantitatively predict the aggregate outcome, one would need a specific model of aggregation, which is left for future work. The aforementioned mechanism proposed by Bordalo et al. (2018a) is, however, consistent with some evidence from the cross-sectional dispersion of forecasters: Figure (11) shows that the average cross-sectional dispersion decreases with maturity. Views are more aligned for predictions at longer horizons than for predictions at shorter horizons.[20] However, at longer horizons the data are much less rich, which may weaken the claim. In fact, as shown in Figure (5), for the 20y and 30y maturities it is not possible to categorize forecasters along the mean squared error dimension; also, for the consensus forecast, confidence intervals are very wide.

[20] This may be intuitive if one considers that a sharp benchmark for long-run interest rate forecasts may be the one set by central banks.
