Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Are Cross-Sectional Predictors Good Market-Level Predictors?
JOSEPH ENGELBERG, R. DAVID MCLEAN, JEFFREY PONTIFF,
AND MATTHEW C. RINGGENBERG*
September 2019
ABSTRACT
Firm-level variables that predict cross-sectional stock returns, such as price-to-earnings and short
interest, are often averaged and used to predict market returns. We extend this literature and limit
the data-snooping bias by using a near-complete population of the literature’s cross-sectional
return predictors. We find the literature has ignored several cross-sectional variables–such as asset
turnover and Z-score–that contain strong in-sample predictability when examined in isolation.
However, after accounting for the number of predictors and their interdependence, we find little
evidence that cross-sectional predictors make good time-series predictors, especially out of
sample.
Keywords: Return predictability, data snooping, statistical bias, market risk premium.
JEL Code: G00, G14, L3, C1
* Engelberg is at the University of California, San Diego, McLean is at Georgetown University, Pontiff is at Boston
College, and Ringgenberg is at the University of Utah. The authors thank Mike Cooper, Bryan Kelly, Amit Goyal,
Owen Lamont, Allan Timmermann, and conference and seminar participants at the 2018 Society for Financial Studies
Cavalcade, the 2019 American Finance Association, MIT (Accounting), TCU, UC-Berkeley, University of Kentucky,
University of Michigan, UNLV, University of Utah, UC Riverside, University of Virginia, and Washington University
in St. Louis. All errors are our own. Comments welcome. © 2016-2019.
Electronic copy available at: https://ssrn.com/abstract=3459229
1
Is the market risk premium predictable? Financial research has strived to answer this
question going back, at least, to Dow (1920). “Data-mining” or “data-snooping” (Lo and
MacKinlay, 1990) poses a well-recognized obstacle in obtaining an answer: research is more likely
to be published in academic journals if it rejects the no-predictability null, and thus, statistically
significant predictors in the literature will have a disproportionately high Type I error rate. Harvey,
Liu, and Zhou (2015) demonstrate this using a wide range of cross-sectional predictors. Thus,
many apparent predictors featured in the literature will fail to predict outside of the original sample.
When choosing candidate right-hand-side variables to predict the market risk premium,
academics often draw from the well of cross-sectional predictors. For example, Baker and Wurgler
(2000) cite the extensive market-timing literature, which relates equity issuance to returns in the
cross-section, and suggest “that issuers try to time both their idiosyncratic return and the market
return.” They go on to find that aggregate equity issuance predicts aggregate market returns.
Similar arguments for aggregating cross-sectional predictors for the purpose of predicting the
market risk premium are found in Campbell and Shiller (1988) with P/E ratios, Seyhun (1988)
with insider trading, Chordia et al. (2002) with order imbalance, Rapach et al. (2016) with short
interest, and Wen (2019) with asset growth. In this paper, we examine cross-sectional predictors
such as these in light of the data mining bias. Are the aforementioned predictors simply well-
chosen among a much larger set of predictors? Or do cross-sectional predictors, in general,
aggregate to give market-level predictability?
To answer these questions, we use a sample of 140 cross-sectional predictors that is
essentially the population of firm-level characteristics that have demonstrated cross-sectional
predictability in the academic literature. This builds on the 97-predictor list used in McLean and
Electronic copy available at: https://ssrn.com/abstract=3459229
2
Pontiff (2016) and Engelberg, McLean, and Pontiff (2018). The benefit of using such an exhaustive
list is that it avoids selection biases, thereby reducing the data-snooping problem.
For each of our 140 predictors, we calculate cross-sectional averages of the firm-specific
values each month to get a single, monthly value. For each predictor, we construct equal-weighted
and value-weighted averages; the resulting database has 280 different predictive variables (140
equal-weighted and 140 value-weighted). To examine the market-level predictability of these
variables, we perform both in-sample and out-of-sample tests. Like other papers in the time-series
literature, our in-sample tests use the entire sample of data and estimate a single parameter estimate
from a time-series regression of the market risk premium on the predictor. Our out-of-sample tests
consist of expanding rolling window regressions that use only information available at each point
in time to examine whether a predictor is useful for forecasting the market’s risk premium.
At first glance, it appears that cross-sectional predictors are good market-level predictors
in-sample. Out of the 280 predictive variables, 268 of them are stationary and exhibit sufficient
time series variation.1 Of these 268 predictors, 54 (approximately 20%) predict 1-year market
returns in an OLS regression with coefficients that are significant at the 10% level or better and 20
(approximately 7%) are significant at the 1% level. Statistical significance is determined using the
stationary bootstrap of Politis and Romano (1994).
The strength of this result is strongly related to the horizon of predictability: when
considering 1-month market returns, only 34 of the 268 predictors (approximately 13%) are
significant at the 10% level and 8 (approximately 3%) are significant at the 1% level. We also find
that several cross-sectional predictors – such as Asset Turnover and Z-Score – are among the best
1 We test each of the 280 predictors for a unit root using an Augmented Dickey-Fuller (1979) test. If a variable is non-
stationary, we then calculate deviations from a linear trend model. If that is also non-stationary, we instead calculate
the first-difference. If that is also non-stationary, we drop the variable. We also drop variables that aggregate to form
a variable that does not exhibit time-series variation. This procedure yields 268 possible predictors.
Electronic copy available at: https://ssrn.com/abstract=3459229
3
performers for market predictability, but have not been documented in this literature previously.
For example, value-weighted asset turnover can predict the market risk premium with an R-
squared of 17.8% at the one-year horizon. Aggregate asset turnover has not been previously
proposed as a predictor of market risk premia.
Following McLean and Pontiff (2016), we also consider five sub-categories of cross-
sectional predictors: (i) Event, (ii) Fundamental, (iii) Market, (iv) Valuation, and (v) Opinion.
Valuation predictors are ratios of market prices to fundamentals. Many Valuation predictors have
received attention in the existing market risk premium literature, including the dividend-to-price
and earnings-to-price ratios. Moreover, since valuation ratios could be a function of discount rates,
they are arguably the most likely cross-sectional predictors to work in a time-series setting (Kelly
and Pruitt (2013)). Yet, when we examine the predictors broken out by the five sub-categories, we
find the weakest evidence for the Valuation and Opinion sub-categories. While the mean fraction
of significant predictors across all horizons is 16%, for the Opinion and Valuation sub-categories
this number falls to 11% and 14%, respectively. In other words, the Valuation predictors are
actually less likely to be significant. The Valuation result offers a comparison to Lewellen (2004),
who investigates whether valuation measures documented in the extant literature remain robust in
various subsamples. Lewellen is able to reject the null of predictability for dividend yield in both
subsamples, although his results for earnings-to-price and book-to-market depend on the
subsample examined. Our results show that a broader sample of Valuation measures shows only
weak in-sample evidence of return predictability.
Other subcategories fare better. Among Event predictors, which are based on corporate
events or changes in firm performance, 17% are statistically significant and the best performers
include Asset Turnover and Inventory Growth Rate. Among Market predictors, which are
Electronic copy available at: https://ssrn.com/abstract=3459229
4
predictors that only use price and trading volume data and no accounting data, 18% are statistically
significant and the best performers include Short Interest and Long-Term Reversal. Moreover, we
find similar evidence against the null of no-predictability when we examine value-weighted vs.
equal-weighted predictors; 24% of all value-weighted predictors are significant in-sample and 16%
of all equal-weighted predictors are significant in-sample.
We also find that, among the 280 predictors, the 10 variables that exhibit the best cross-
sectional predictability (from which we generate 10 equal-weighted and 10 value-weighted
aggregate predictors) also exhibit better in-sample time-series predictability: 23% are statistically
significant. Similarly, 32% of the 10 most commonly cited cross-sectional predictors are also
statistically significant. In other words, predictors that appear to be more important in the cross-
sectional predictability literature seem to make better time series predictors.
Since we examine 268 predictors, we expect that some variables will appear significant by
chance. Moreover, several cross-sectional variables are related to each other so the 268 tests we
conduct will not be independent. To address both the number of tests we conduct as well as the
dependency among the tests, we perform the Romano and Wolf (2005) resampling-based
stepdown procedure to compute adjusted p-values that control the family-wise error rate while
accounting for the number of tests and dependence. This procedure is better than either the well-
known Bonferroni (Dunn (1961)) or Holm (1979) tests because it considers the dependence
structure across multiple tests and thus has more power (Romano and Wolf (2005)).2
Using in-sample regressions, when we examine the full set of 268 predictors and perform
the Romano and Wolf (2005) stepdown procedure we are unable to reject the null of no-
2 Both the Bonferroni (Dunn (1961) and Holm (1979) tests lack power in that they tend not to reject the null enough
when the null is false.
Electronic copy available at: https://ssrn.com/abstract=3459229
5
predictability at the 1% level for any predictor at any horizon. At the 1-month and 3-month
horizons, no variable is significant even at the 10% level. At the 12-month horizon, Z-Score, Size
and Asset Turnover have Romano and Wolf p-values between 3% and 5%. In short, we find most
of the cross-sectional variables that appeared to be statistically significant when examined in
isolation are no longer when examined in the context of all cross-sectional predictors. Moreover,
there is no cross-sectional predictor that appears to be a stellar time-series predictor after
accounting for the number of cross-sectional variables that currently exist in the literature.
Things look even bleaker for cross-sectional predictors when we consider out-of-sample
forecasting regressions. Among the 263 stationary predictors we examine in out-of-sample
regressions,3 3% significantly predict market returns at the 1-month horizon and 17% predict
market returns at the 12-month horizon (compared with 12% and 20% in-sample). Moreover, once
we adjust the individual p-values using the Romano and Wolf (2005) stepdown procedure, we no
longer find evidence against the null at the 10% level for any predictor at any horizon. For example,
in our out-of-sample tests, Asset Turnover again appears to be a strong predictor. It exhibits
positive out-of-sample R-squared values at every forecasting horizon, with an impressive R-
squared of 17% at the one-year horizon. However, the corresponding Romano and Wolf (2005)
adjusted p-value is not statistically significant at any of the usual levels. Overall, we find little
evidence of any return predictability in out-of-sample regressions after accounting for the number
of cross-sectional predictors and their interdependence.
Our paper makes a number of contributions. First, our paper provides context for existing
papers which propose that a particular cross-sectional predictor should be transformed into a time
3 While there are 268 stationary predictors in total, we lose five predictors in our out-of-sample regressions because
they do not have long enough time-series to conduct out-of-sample regressions.
Electronic copy available at: https://ssrn.com/abstract=3459229
6
series one. Our paper says that existing results in the literature, which have economically large
coefficients and impressive t-stats, would obtain often enough by chance if we ran tests across our
268 predictors. Thus, absent a reason for a specific predictor’s conversion from cross-sectional
predictor to time-series predictor, our results suggest that the results for a given single predictor
should be interpreted with caution.
Second, our results provide new insight into the nature of return predictability, both in the
cross-section and the time-series. By aggregating cross-sectional anomalies into time series
variables, we are able to understand whether cross-sectional anomaly variables contain
information about both the idiosyncratic and systematic components of returns. Our results suggest
they largely contain idiosyncratic information.
Finally, we also contribute to the extensive literature on predicting the equity risk premium.
While Goyal and Welch (2008) show that 14 popular time-series variables do not significantly
predict returns in out-of-sample tests, subsequent papers have documented evidence of return
predictability using firm-level variables aggregated across stocks (e.g., Hirshleifer, Hou, and Teoh
(2009), Rapach, Ringgenberg, and Zhou (2016)). Our results extend these findings by showing
that several other cross-sectional anomalies can be aggregated to form time series predictors.
However, like Goyal and Welch (2014), we find weak evidence of predictability out-of-sample
among these new predictors.
The remainder of this paper proceeds as follows: Section I briefly describes the existing
literature and outlines the theoretical relation between cross-sectional predictive variables and time
series return predictability. Section II describes the data used in this study. Section III characterizes
our findings and Section IV concludes.
Electronic copy available at: https://ssrn.com/abstract=3459229
7
I. Background
Financial researchers have examined the predictability of stock returns for over a century
(e.g., Gibson (1906)) and a large literature has documented evidence of predictability in the cross-
section of stock returns. A separate literature has examined the predictability of the equity risk-
premium using time-series predictive variables. Yet, to date, these two literatures have evolved
relatively independently. We connect these two literatures by creating a sample of time-series
predictors based on 140 well-known cross-sectional predictors.
A. Time Series Return Predictability
A number of papers find in-sample evidence of time series return predictability, but out-of-
sample evidence is rare, suggesting that many predictors are the result of data snooping (i.e.,
overfitting). For example, Bossaerts and Hillion (1999) use model selection criteria from the
statistics literature to choose candidate predictors, which allows them to partially avoid data
snooping biases, yet they find that the resulting predictors are unable to forecast returns in out-of-
sample tests. Similarly, Goyal and Welch (2008) examine 14 popular predictors from the existing
literature and find that they fail to forecast the equity risk premium in out-of-sample tests. Cooper
and Gulen (2006) note that researchers have many different choices regarding the specification of
predictability tests, including the predictor variables, the estimation periods, and the assets being
forecasted. They perform specification searches across these parameters and find that return
predictability results are highly sensitive to these parameter choices. More recently, Bartsch,
Dichtl, Drobetz, and Neuhierl (2017) examine a wide variety of possible permutations of the
predictors in Goyal and Welch (2008) and the technical predictors in Neely, Rapach, Tu, and Zhou
Electronic copy available at: https://ssrn.com/abstract=3459229
8
(2014) and find that most out-of-sample performance for these variables appears to be due to data
snooping.
In light of the poor performance of many predictive variables in out-of-sample tests,
researchers have focused on developing methodologies that are robust to data snooping concerns.
Foster, Smith, and Whaley (1997) develop a procedure to account for data snooping biases when
evaluating the fit of predictive regressions. White (2000) develops a reality check bootstrap (RCB)
to account for data snooping biases that result from specification searches, and Sullivan,
Timmermann, and White (1999) show how to apply the RCB procedure using a set of technical
trading rules. While the White RCB procedure determines whether the best predictor among a
group is statistically significant after adjusting for data snooping biases, Romano and Wolf (2005)
show how to adjust the p-value for each individual predictor to account for data snooping biases.
To the best of our knowledge, we are the first to apply the Romano and Wolf (2005) stepdown
procedure to a large set of predictive variables derived from existing academic studies.
B. Cross-sectional Return Predictability
While the literature on time series return predictability has generally found that most
predictors fail to perform in out-of-sample tests, a large literature finds evidence of return
predictability in the cross-section of stocks. Consequently, a number of recent papers have
examined the multi-dimensionality of the cross-section of returns. McLean and Pontiff (2016)
examine 97 anomalies in the extant literature and find many anomalies exhibit lower returns
following publication in an academic journal. Green, Hand, and Zhang (2017) examine 94 firm
characteristics simultaneously and find that only 12 of them reliably contain information about
returns. Harvey, Liu and Zhu (2016) note there are more than 300 factors (many of them are not
Electronic copy available at: https://ssrn.com/abstract=3459229
9
based on cross-sectional variables) in the academic literature, and Harvey and Liu (2016) examine
these factors after accounting for potential data snooping. Finally, Hou, Xue, and Zhang (2017)
create and replicate more than 400 different cross-sectional predictors and find that many of these
variables do not predict returns in various subsamples with weighting schemes that favor the
largest stocks.4 However, they do not examine the relation between these predictors and aggregate
market returns as we do in this paper.
B. The Information in Cross-Sectional Anomaly Variables
While there are extensive literatures on both time series predictability and cross-sectional
predictability, they have largely evolved independently, with a few notable exceptions. Several
papers show that firm-level anomalies aggregate to market-wide predictors including Pontiff and
Schall (1998) with book-to-market ratios, Campbell and Shiller (1988) with P/E ratios, and
Chordia et al. (2002) with insider trading. More recently, Hirshleifer, Hou, and Teoh (2009) find
that firm-level accruals and cash-flow, when aggregated across stocks, contain information about
market returns, and Wen (2019) shows that aggregate asset growth predicts market returns. Finally,
Rapach, Ringgenberg and Zhou (2016) show that firm level short interest aggregates to form one
of the strongest known predictors of market returns.
C. The Information in Idiosyncratic Anomaly Variables
In this paper, our goal is to understand the relation between cross-sectional return
predictability and time-series return predictability. While it may seem natural that cross-sectional
4 Their set of characteristics is larger than ours since they use multiple versions of the same variable, e.g., using
quarterly and annual data and forecasting returns over 1-month and 1-year. Also, they include predictors that have
not been introduced in previous literature. In contrast, we use one version of each predictor, and the predictor needs
to be documented in a published study.
Electronic copy available at: https://ssrn.com/abstract=3459229
10
return predictors should aggregate across assets to generate time-series return predictors, it is
possible to have one without the other.5 Specifically, note that cross-sectional anomaly variables
could predict returns because they forecast either the idiosyncratic component of returns or the
systematic component of returns. As such, cross-sectional return predictors do not necessarily
aggregate to form good time series predictors. To see this, define a variable 𝑋𝑖,𝑡𝑖𝑑𝑖𝑜 as an
idiosyncratic anomaly if it forecasts the idiosyncratic portion of stock returns for asset i on date
t+1. Using the market model, we can write the return on asset i as:
𝑅𝑖,𝑡 = 𝑅𝑓 + β𝑖(𝑅𝑚,𝑡 − 𝑅𝑓) + 𝜀𝑖,𝑡, (1)
where 𝑅𝑖,𝑡 is the return on stock i on date t, 𝑅𝑓 is the risk-free rate, 𝑅𝑚,𝑡 is the market return, and
is the idiosyncratic portion of stock i's return. We define an idiosyncratic anomaly as a variable
𝑋𝑖,𝑡−1𝑖𝑑𝑖𝑜 that satisfies 𝛾1 ≠ 0 in a linear regression of the form: 6
𝜀𝑖,𝑡 = 𝛾0 + 𝛾1𝑋𝑖,𝑡−1𝑖𝑑𝑖𝑜 + 𝜔𝑖,𝑡, (2)
where 𝜀𝑖,𝑡 is the abnormal return from the market model (Sharpe (1964); Lintner (1965)). In other
words, an idiosyncratic anomaly, by definition, forecasts the portion of asset i’s return that is not
explained by aggregate market movements. However, while 𝑋𝑖,𝑡−1𝑖𝑑𝑖𝑜 contains information about
individual stock returns, it will not aggregate to generate market return predictability. To see this,
5 Indeed, several existing papers find that firm-level relations do not hold at the aggregate level. Kothari, Lewellen,
and Warner (2006) document a negative relation between returns and earnings surprise at the aggregate level, in
contrast to the positive relation documented at the firm-level. Similarly, Hirshleifer, Hou, and Teoh (2009) find that
the relation between accruals and returns changes sign between firm-level and aggregate-level analyses. 6 For simplicity, we ignore the sign of the abnormal return and define an anomaly as any variable that predicts abnormal
returns in either direction.
Electronic copy available at: https://ssrn.com/abstract=3459229
11
average equation (2) across assets all N stocks in the economy and multiply both sides by 𝑚𝑐𝑖,𝑡
∑ 𝑚𝑐𝑁𝑖 𝑖,𝑡
,
where 𝑚𝑐𝑖,𝑡 is the market capitalization of stock i on date t:
𝑚𝑐𝑖,𝑡
∑ 𝑚𝑐𝑁𝑖 𝑖,𝑡
∑ [𝑅𝑖,𝑡𝑁𝑖=1 − 𝑅𝑓 − β𝑖(𝑅𝑚,𝑡 − 𝑅𝑓)] = 𝛾0̅ + 𝛾1̅𝑋𝑡−1
𝑖𝑑𝑖𝑜̅̅ ̅̅ ̅̅ ̅, (3)
where the bar above a variable denotes the value-weighted mean. It is simple to show that the left-
hand side of equation (3) is equal to zero. Thus, the value-weighted idiosyncratic anomaly variable
𝑋𝑡−1𝑖𝑑𝑖𝑜̅̅ ̅̅ ̅̅ ̅ contains no information about aggregate market returns.
D. The Information in Systematic Anomaly Variables
While idiosyncratic anomalies contain no information about aggregate market returns, it is
possible to have a variable that predicts information in the cross-section that contains information
about the aggregate risk-premium. Define a variable 𝑋𝑖,𝑡−1𝑠𝑦𝑠𝑡
as a systematic anomaly if it forecasts
the systematic portion of stock returns for asset i on date t+1. Thus, define a systematic anomaly
as a variable 𝑋𝑖,𝑡−1𝑠𝑦𝑠𝑡
that satisfies 𝛾1 ≠ 0 in a linear regression of the form:
β𝑖(𝑅𝑚,𝑡 − 𝑅𝑓) = 𝛾0 + 𝛾1𝑋𝑖,𝑡−1𝑠𝑦𝑠𝑡
+ 𝜔𝑖,𝑡. (4)
Because the market beta is 1, it is easy to show that the left-hand side of equation (4) implies a
direct linear relation between the predictor variable and the market risk-premium. Notice also that
this relation goes in both directions: if a time-series predictor is constructed from individual assets,
Electronic copy available at: https://ssrn.com/abstract=3459229
12
it must contain information about the systematic portion of individual asset returns.7,8 This
implication provides additional economic information to test the validity of proposed predictors.
In other words, when evaluating predictors constructed from individual characteristics, we should
focus on the subsample of individual characteristics that contain information about individual asset
returns. Accordingly, in the rest of the paper, we examine the aggregate information in a subset of
140 anomalies that have been previously shown to contain information about individual asset
returns (McLean and Pontiff (2016)).
II. Data
To examine the relation between cross-sectional anomaly variables and the equity risk
premium, we combine daily data from the Center for Research in Security Prices (“CRSP”) and
Compustat over the period 1926 through 2017. We first construct aggregate time series variables
for each of the 97 cross-sectional anomalies in McLean and Pontiff (2016) and we supplement this
dataset with 43 additional variables from the extant literature to arrive at 140 candidate predictors.9
Table 1 contains summary statistics for the cross-sectional variables we use to form time-series
predictors. Of the 140 variables, 36 are classified in the Event sub-category, 36 are classified as
Fundamental, 31 are classified as Market, 15 are classified as Valuation, and the remaining 22 are
classified as Opinion. We also define two other sub-categories: Popular and Best Cross-Sectional.
Popular consists of the ten most highly cited predictors in our sample based on Google scholar
7 In the market model (Sharpe (1964); Lintner (1965)), the systematic portion of stock returns reflects compensation
for bearing systematic risk. However, outside of the market model, a common component of returns could exist that
is not related to systematic risk (for example, consumer sentiment). Similar to the systematic portion of returns in
the market model, such a variable could be related to the cross-section of returns and, since it has a common
component, it could aggregate to contain information about the equity risk premium. 8 Theoretically, it is possible that investors switch from gathering systematic information to gathering idiosyncratic
information depending on economic conditions as in Kacperczyk, Van Nieuwerburgh, and Veldkamp (2016). As a
result, some variables could contain idiosyncratic information at times, and systematic information at other times. 9 See the Appendix for an overview of the construction of these 140 variables.
Electronic copy available at: https://ssrn.com/abstract=3459229
13
citations of the original paper documenting the cross-sectional result. The mean number of
citations for the Popular category is 9,906 as of 2018. Similarly, Best Cross-Sectional consists of
the ten predictors with the greatest cross-sectional t-statistic. The Best Cross-Sectional predictors
exhibit strong cross-sectional predictive power; indeed, the mean cross-sectional t-statistic for
these predictors is approximately 9.
We construct two time-series predictors based on the equal-weighted average and value-
weighted average. We test each time-series predictor for a unit root using an Augmented Dickey-
Fuller (1979) test. Because some of the resulting time-series variables are non-stationary, we then
proceed as follows: if the raw (untransformed) variable is stationary, we use it in our tests. If not,
we calculate the deviations from a linear trend model for each variable; if that is stationary then
we use it. If not, we calculate the first-difference of each variable; if that is stationary we use it,
otherwise we drop the variable. For the linear trend, we estimate a model of the form:
xt = a + bt + ut for t = 1, . . . ,T, (5)
for each predictor variable xt and time period t. We take the fitted residual, �̂�𝑡, as our de-trended
measure. By construction, �̂�𝑡 has a mean of zero and we normalize it to have a standard deviation
of one.10
Finally, we also note that some of our cross-sectional predictors should (theoretically)
aggregate to form a variable that is constant across time (e.g., CAPM beta). Accordingly, after
constructing time-series versions of each variable, we examine each variable’s time-series standard
deviation and we drop predictors that exhibit little to no time-series variation.11
10 For our in-sample analyses, we estimate the linear trend model using all available data. For our out-of-sample
analyses, we estimate the trend model only using data available at each point in time to avoid a look-ahead bias. 11 Formally, we calculate a measure of variation for each variable as the ratio of its time-series standard deviation to
the absolute value of its time-series mean, and we drop variables with a ratio less than 0.06.
Electronic copy available at: https://ssrn.com/abstract=3459229
14
The database has a possibility of 280 different predictive variables (140 equal-weighted and
140 value-weighted variables). Of these, 268 variables survive the screening procedures discussed
above; we use these 268 variables in our predictive regressions. Finally, we combine the time
series predictors with data on the equity risk premium. We calculate the equity risk premium as
the log return on the S&P 500 index minus the log return on a one-month Treasury bill.12
III. Results
In this section, we examine whether cross-sectional anomalies, in general, contain
information about the equity risk premium. We start by examining in-sample tests that use the
entire sample of data and estimate a single parameter estimate from a time-series regression of the
market risk premium on the predictor. We then examine out-of-sample tests that use rolling
regressions to test whether a variable is useful for predicting the future equity risk premium, using
only information available at each date.
A. In-Sample Tests
As discussed in Section II, above, we start with 140 anomaly variables from the existing
literature and, using these, we form 268 candidate predictor variables. As in McLean and Pontiff
(2016), the sample length for each predictive variable depends on data availability. Some
predictors have data available as far back as 1926, while other variables have samples that start
more recently. In our in-sample tests, we allow the length of the time series examined to vary,
depending on data availability.
For each variable, we run predictive regression models of the form:
12 We download this data from Amit Goyal’s website (http://www.hec.unil.ch/agoyal/).
Electronic copy available at: https://ssrn.com/abstract=3459229
15
rt:t+h = α + βxt + εt:t+h for t = 1, . . . ,T −h, (6)
where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t
from CRSP including dividends and excess of the monthly risk-free rate from Goyal and Welch
(2008), xt is the predictor variable, and h denotes the forecast horizon. We examine four different
forecast horizons: one month ahead, one quarter ahead, one-half year ahead, and one year ahead
(i.e., h = 1, 3, 6, or 12). For each predictor at each forecast horizon, the regression is run using all
available data, leading to a single parameter estimate (β) that measures the predictive ability of the
candidate variable at that horizon.13
The results are shown in Tables 2 and 3. In Table 2, we provide a summary of the
performance of the candidate predictors, broken out by various sub-categories. We report the
fraction of predictors that are statistically significant at the 10% level or better using t-statistics
computed using a stationary block bootstrap (Politis and Romano (1994)) with 1000 draws to
account for the Stambaugh (1999) bias and the fact that the model uses overlapping observations
when h > 1 (Goetzmann and Jorion (1993); Hodrick (1992); Nelson and Kim (1993)).14 Panel A
displays the results from all 268 predictors, as well as seven sub-categories: (1) Popular, defined
as the 10 most commonly used anomalies in the cross-sectional literature based on Google Scholar
Citations; (2) Best Cross-sectional, defined as the 10 most statistically significant anomalies in the
cross-sectional literature; (3) Event, defined as predictors that are based on events within the firm
or external events that affect the firm; (4) Fundamental, defined as predictors constructed from
13 In untabulated results, we have estimated equation (6) using the weighted least squares method of Johnson (2019).
This estimator does not affect our conclusions. 14 Specifically, we resample the original data using a stationary block bootstrap with mean block size of 5. We have
also examined larger block sizes (10, 25, and 50). To avoid the overlapping dependent variable issue, we have also
tried aggregating the independent variable rather than the dependent variable as suggested by Cochrane (1991) and
Jegadeesh (1991); our general conclusions - about the lack of statistically significant predictability after accounting
for the number of tests run - do not change under any of these alternate approaches.
Electronic copy available at: https://ssrn.com/abstract=3459229
16
financial statement data; (5) Market, defined as predictors that can be constructed using only
financial data; (6) Opinion defined as predictors based on the opinions or actions of analysts,
institutions, and insiders, and (7) Valuation, defined as predictors that are ratios where one of the
numbers reflects a market value and the other reflects fundamentals.
For the entire sample of predictors, we find evidence of return predictability. At the one-
month ahead forecasting horizon, 34 of the 268 predictors (13%) are statistically significant at the
10% level or better. This number increases monotonically as the forecast horizon increases, to 43
(16%), 45 (17%), and 54 (20%) at the 3-month, 6-month, and 12-month horizon. When we
examine the seven sub-categories defined above, the results are similar. Interestingly, the results
are generally weakest for the Valuation and Opinion categories. Since valuation ratios may be a
function of discount rates, they are arguably the most likely cross-sectional predictors to work in
a time-series setting (Kelly and Pruitt (2013)). Indeed, several different Valuation predictors have
been studied in the existing market risk premium literature, notably the dividend-to-price and
earnings-to-price ratios. Our results suggest these predictors are an exception, rather than the norm.
In general, Valuation predictors are weaker than the Event, Market, and Fundamental predictors
and they are only slightly better than Opinion predictors. Across all horizons, only 14% of
Valuation predictors are statistically significant, versus 16% across all predictors.
When we examine the results for the “best” cross-sectional variables, we find evidence that
strong cross-sectional predictors are more likely to be strong time-series predictors. Specifically,
31% of Popular predictors, those in the top ten by citation count, are statistically significant at the
one-year horizon. Similarly, 20% of Best Cross-Sectional predictors, those with the top ten highest
cross-sectional t-statistics, are statistically significant at the one-year horizon. The results are
consistent with the idea that the best cross-sectional variables forecast returns because they contain
Electronic copy available at: https://ssrn.com/abstract=3459229
17
information about the systematic component of returns. As a result, these variables also aggregate
to form good time-series predictors.
We then examine the results broken out by the different methodologies used to construct the
aggregate predictor variables. Specifically, in panels B and C, we examine the results when we
calculate the aggregate predictor using a value-weighted average or an equal-weighted average,
respectively. The results are largely consistent across the two methodologies: across all predictors,
15% are statistically significant at the one-month horizon when value-weighted and 10% are
statistically significant at the one-month horizon when equal-weighted. As the horizon extends,
the value-weighted predictors appear to perform slightly better, but the results are generally similar
across the two groups. For example, across all predictors, 24% are statistically significant at the
annual horizon when value-weighted versus 16% when equal-weighted. Overall, the results in
Table 2 suggest some cross-sectional anomaly variables can be used to construct aggregate return
predictors.
A.1 Multiple Hypothesis Testing
Of course, the results in Table 2 were based on 268 different regression models; as such, the
results are subject to concerns about data snooping. Put differently, with 268 different regression
models, we are likely to find some predictors that are statistically significant simply because of
type I errors. Fortunately, a growing literature shows how to adjust p-values to account for the
number of models considered.
White (2000) develops a reality check bootstrap (RCB) to correct for data-snooping. While
a number of approaches exist to adjust p-values for the bias that results from multiple hypothesis
testing, the White approach has several desirable properties. First, it uses a bootstrap procedure to
Electronic copy available at: https://ssrn.com/abstract=3459229
18
estimate the dependence structure of the p-values across all considered models. In contrast, the
Bonferroni (Dunn (1961) and Holm (1979) approaches assume the worst-case dependence
structure. This causes them to be overly conservative in that they do not reject the null hypothesis
enough when the null is false. Because it estimates the actual dependence structure, the White
RCB has greater power than the Bonferroni and Holm methods. Second, the White procedure is
particularly suited to the application of return predictability regressions because the procedure uses
a loss function that compares the performance of a predictor to a benchmark model. Economic
theory suggests the equity risk premium should be positive; as a result, a strategy that simply
predicts positive returns all the time would frequently be correct. Accordingly, the return
predictability literature often compares the forecast accuracy of a predictive variable to the so-
called “prevailing mean” model which uses the prevailing mean return as the forecast of next
period’s equity risk premium. The White RCB also uses a loss function that compares the mean
squared forecast error for each predictor to the mean squared forecast error from a benchmark
model that uses the prevailing mean return.
However, despite these advantages, the White procedure does have a drawback: if the null
is rejected, it indicates that the best predictor examined is better than the benchmark, but it does
not reveal whether other predictors are better. Put differently, the White procedure does not test
whether the second-best predictor, or the kth best predictor, is better than the benchmark.
Accordingly, Romano and Wolf (2005) develop a step-down procedure that extends the White
procedure to calculate whether individual predictors are better than the benchmark.15 The resulting
procedure thus controls the familywise error rate and provides p-values for each individual
predictor. Romano and Wolf (2005) also argue that in most cases a studentized test statistic (like a
15 See Romano and Wolf (2016) for a detailed discussion of the procedure.
Electronic copy available at: https://ssrn.com/abstract=3459229
19
t-statistic) should be used in ranking predictors and we adopt this approach as well.16 Formally,
we proceed as follows for the case of predicting the 1-month ahead equity risk premium (the
procedures are analogous for the 3-month, 6-month, and 12-month cases):
1) For each predictor, find the OLS coefficient (�̂�) and its associated t-statistic from
Equation (6). Rank these t-statistics by their absolute value from 1 to 268. For the nth
ranked predictor, call the t-statistic |t|n.
2) For each predictor, calculate an adjusted dependent variable r*t:t+1 where r*t:t+1 = rt:t+1 -
�̂�xt. Now, for each predictor x, there is no relation between r* and x by construction (i.e.
if we regressed r* on x in the full sample the coefficient and associated t-stat for x would
be zero). We use r* (rather than r) when building bootstrapped samples so that we
operate under the null of no predictability from x.
3) Create a T × 268 matrix for the dependent r* values where rows index time (T) and
columns index the predictors. Call this matrix R*. Also create a T × 268 matrix for the
independent x values where rows index time and columns index the predictors. Call this
matrix X.
4) Bootstrap row samples from the R* and X matrices using the stationary bootstrap (Politis
and Romano (1994)).17 Bootstrapping by row preserves the cross-sectional dependence
in the data. Create 1000 bootstrapped datasets of dependent and independent variables.
16 We use studentized test statistics as recommended in Hansen (2005) and Romano and Wolf (2005), and we
calculate the studentization inside each bootstrap sample as recommended by Romano and Wolf (2005). We thank
Allan Timmermann for suggesting this, and for helpful conversations about the White (2000) and Romano and Wolf
(2005) procedures. 17 We use the bootstrap procedure in Politis and Romano (1994) which preserves the underlying time-series properties
of the data. For each predictor, we use 1000 replications with a mean block size of 5. Sullivan, Timmerman, and White
(1999) show the procedure is robust to various block size choices.
Electronic copy available at: https://ssrn.com/abstract=3459229
20
5) For each bootstrap dataset and each predictor, regress the dependent variable on the
independent variable and save the corresponding coefficient and t-statistic.
6) For the nth ranked predictor (from step 1), construct a list of the 1 through n-1 ranked
“better” predictors (also from step 1) and calculate the maximum absolute t-statistic in
each bootstrap dataset after this list of better predictors has been removed. Then the
Romano and Wolf (2005) p-value for the nth ranked predictor is the fraction of these
maximum absolute t-statistics from the bootstrapped datasets whose value is greater than
|t|n.18
We are one of the first papers to use this procedure to evaluate the predictability of the
market’s risk premium. In this area, the closest related papers are Sullivan, Timmerman, and White
(1999) and Chordia, Goyal, and Saretto (2018). Sullivan, Timmerman, and White (1999) apply the
White procedure to examine the performance of technical trading strategies in predicting the equity
risk premium. Sullivan et al. find no evidence that technical trading strategies can generate
portfolio performance that out-performs a benchmark. More recently, Chordia, Goyal, and Saretto
(2018) examine the performance of trading strategies in the cross-section and they use several
methods to correct for multiple hypothesis testing bias, including the Romano and Wolf (2005)
procedure. They find that most strategies that have been studied in the extant literature are not
significant after adjusting for multiple hypothesis testing. Our paper unifies these two literatures
by providing the first evidence on the performance of time-series predictors formed from cross-
sectional variables.
18 Following Romano and Wolf (2005) we also impose the monotonicity condition that the p-value for the nth ranked
predictor must be less than or equal to the p-value for the (n+1)th ranked predictor.
Electronic copy available at: https://ssrn.com/abstract=3459229
21
A.2 In-Sample Results Adjusted for Multiple Hypothesis Testing
In Table 3, we display detailed estimates for individual predictors that are statistically
significant before adjusting for multiple testing. The table presents coefficient estimates, R-
squared values, and both raw and Romano and Wolf (2005) adjusted p-values. Here we reach the
main conclusion of the paper: most of the cross-sectional variables that appeared to be statistically
significant when examined in isolation no longer are when examined in the context of all cross-
sectional predictors.
For example, at the 1-month and 3-month horizons, none of the predictors remain significant
at the 10% level when computing Romano and Wolf (2005) adjusted p-values compared to 34 and
43, respectively, when computing individual p-values (Table 2). At the 6-month horizon, there are
2 predictors that are significant at the 10% level (compared to 45 in Table 2) and at the 12-month
horizon there are 3 predictors significant at the 10% level (compared to 54 in Table 2). Moreover,
the predictors with remarkable statistical significance when examined in isolation, no longer
appear to be so stellar when examined among the set of 268 predictors. For example, when
computing individual p-values, 66 predictors are significant at the 1% level across all horizons;
however, when computing Romano and Wolf (2005) adjusted p-values 0 predictors are significant
at the 1% level across all horizons.
In terms of economic significance, a number of predictors are noteworthy. Z-score and Asset
Turnover, with R-squared values of 4.6% and of 4.7% shown in column (8), are among the best
predictors at the three-month horizon. Moreover, several other predictors have R-squared values
exceeding 3% including Short Interest, Percent Operating Accrual, and Target Price. Although
these R-squared values might seem small in absolute magnitude, Zhou (2010) notes that R-squared
Electronic copy available at: https://ssrn.com/abstract=3459229
22
values from predictive regressions are typically small in absolute magnitude as stock returns are
difficult to forecast. Accordingly, Zhou (2014) builds on the insights of Ross (2005) to construct
a mathematical bound on the maximum R-squared that can exist under no arbitrage conditions.
Using consumption growth rates as a state variable, Zhou’s bounds imply a maximum R-squared
at the monthly horizon of between 0.079% and 0.177%19 and Huang and Zhou (2017) show that
the quarterly R-squared is bound by at most 3.74%, depending on the specification, and in most
cases it is less than 3%. In light of this, the return predictability for some predictors documented
in Table 3 seems economically large.20
In addition, we note that many of the best predictors at the 3-month horizon are also good
predictors at the six-month and twelve-month horizons. At the twelve-month horizon, a number of
the predictors have impressive R-squared values of 10% or greater including Z-score, Short
Interest, and Percent Operating Accrual. For example, Z-score exhibits a remarkable R-squared
of 18.5% at the twelve-month horizon.
Despite some impressive results with individual predictors, the big-picture takeaway from
the Romano and Wolf (2005) exercise remains: if we began with 268 predictors that, by
construction had no ability to predict the market return, we would find a handful of predictors with
similarly impressive results often enough simply by chance.
19 The bounds developed in Zhou (2010) depend on the choice of a state variable. We do not take a stance on state
variables in this paper, we simply note that the R-squared values we document appear economically meaningful
relative to the example bounds presented in Zhou (2010). 20 We also construct predictors based on sets of cross-sectional predictors using principal component analysis
(PCA). PCA requires non-missing observations for each predictor; as a result, we are not able to utilize the entire set
of predictors until 1999. In untabulated results, the first principal component extracted from all equal-weighted
predictors and value-weighted predictors exhibits strong return predictability over the period 1999 to 2017, before
adjusting for multiple testing. However, if we estimate the PCA using the set of variables with data starting in 1980,
the evidence becomes mixed; we find weak evidence of return predictability for value-weighted predictors, but no
evidence of return predictability for equal-weighted predictors. We thank Bryan Kelly for suggesting this.
Electronic copy available at: https://ssrn.com/abstract=3459229
23
B. Out-of-Sample Tests
A number of papers note that in-sample tests may overstate predictability due to the use of
information that was not known ex-ante (e.g., Cooper, Gutierrez, Marcum (2005), Goyal and
Welch (2008)). Accordingly, in this section, we revisit each of our tests using out-of-sample
forecasting regressions. We again run predictive regressions of the form:
rt:t+h = αt + βtxt + εt:t+h for t = 1, . . . ,T −h, (7)
where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the excess return on the S&P500, and xt is the predictor
variable. However, now we estimate the model separately for each time period, using only
information that was available at each date. As such, we estimate new parameter estimates for αt
and βt at each point in time. If the relation between the predictor variable and the equity risk
premium is stable over time, then this out-of-sample approach should give the same results as the
in-sample analysis discussed in section III.A. However, if the relation between the predictor
variable and the equity risk premium is not stable, the out-of-sample tests may lead to a different
conclusion.
As previously discussed, the sample length for each predictive variable depends on data
availability. For the out-of-sample tests, we require a minimum of 10 years of data to train the
model before we make our first forecast. To ensure we have an adequate time series to train the
model and evaluate the resulting forecasts, we require that each cross-sectional anomaly variable
has at least 20 years of data to be included in the out-of-sample analyses. This additional sample
filter eliminates several of the anomaly variables from consideration; of the 280 possible
specifications, we are left with 263 that are stationary and have long enough time series.
As before, we start by summarizing the results across all specifications. Table 4 provides a
summary of the performance of the candidate predictors. To make inferences, we calculate the out-
Electronic copy available at: https://ssrn.com/abstract=3459229
24
of-sample 𝑅𝑂𝑆2 statistic defined as in Campbell and Thompson (2008).21 To calculate the out-of-
sample 𝑅𝑂𝑆2 , we use the prevailing mean equity risk premium, at each date, as our benchmark
model and we use the Clark and West (2007) statistic to assess statistical significance.22 Panel A
summarizes statistical significance for all 263 predictors, as well as the seven sub-categories
(Popular, Best Cross-sectional, Event, Fundamental, Market, Opinion, and Valuation). Recall that
in Table 2, the in-sample evidence was strongest at the annual horizons, however, even at the 1-
month horizon approximately 13% of the predictors were statistically significant at the 10% level
or better. In contrast, the out-of-sample evidence is weaker. At the one-month horizon, only 3% of
the variables are statistically significant and while this number increases monotonically as the
forecast horizon increases, it is still only 17% at the twelve-month horizon (versus 20% in Table
2). When we examine the individual sub-categories, the results do not look much better. Again,
the Valuation and Opinion sub-categories appears to be the worst predictor, despite the popularity
of Valuation predictors in the extant literature. Moreover, while the Popular and Best Cross-
Sectional categories continue to show some evidence of predictability at longer horizons, none of
the predictors in these categories are significant at the one-month horizon.
In the remaining panels of Table 4, we examine the out-of-sample results broken out by the
different methodologies used to construct the aggregate predictor variables. Specifically, in panels
B and C, we examine the results when we calculate the aggregate predictor using a value-weighted
average or an equal-weighted average, respectively. Again, the results look similar, regardless of
the methodology used to construct the predictors. Only 4 of 135 value-weighted predictors are
21 We use the unconstrained out-of-sample R-squared from Campbell and Thompson (2008) (i.e., we do not impose
any sign restrictions). 22 Formally, we test the null hypothesis that the mean square forecast error (MSFE) from the baseline model is less
than or equal to the MSFE from the predictive model versus the alternative hypothesis that the MSFE from the
benchmark model is greater than the MSFE from the predictive model (𝐻0: 𝑅𝑂𝑆2 ≤ 0 against (𝐻𝐴: 𝑅𝑂𝑆
2 > 0).
Electronic copy available at: https://ssrn.com/abstract=3459229
25
significant at the one-month horizon and only 3 of 128 equal-weighted predictors are significant.
Overall, the results show some evidence that cross-sectional variables contain systematic
information at longer horizons. Taken together, it is tempting to conclude that some cross-sectional
predictors can be used to form aggregate predictors, suggesting they contain information about the
systematic component of returns. However, our out-of-sample analyses have considered more than
260 different specifications. In the next section, we revisit our results after accounting for the
number of tests we have run.
B.1. Out-of-Sample Results Adjusted for Multiple Hypothesis Testing
As before, we ask whether our out-of-sample tests show evidence of return predictability
after accounting for possible data snooping biases. To do this, we again use the Romano and Wolf
(2005) procedure. The procedure follows a similar process as outlined in section 3.A.1, except we
run rolling regressions and compare the prediction from these regressions to a rolling average
market risk premium. Formally, we proceed as follows for the case of predicting the 1-month
ahead equity risk premium (the procedures are analogous for the 3-month, 6-month, and 12-month
cases):
1) At each point in time, forecast the equity risk premium next period using the benchmark
model (i.e., using the prevailing mean equity risk premium from the start of the sample
until the current date).
2) At each point in time, forecast the equity risk premium next period using out-of-sample
regressions for each of the 263 candidate predictor variables on each date.
3) For each date and predictor, calculate a loss function as in White (2000) by comparing
the mean square forecast error of the predictor to the mean square forecast error of the
Electronic copy available at: https://ssrn.com/abstract=3459229
26
benchmark model. Call these values 𝑓x,t+1, where x indexes the predictor variable (in our
case, x ranges from 1 to 263). Call the T × 263 matrix of 𝑓x,t+1 values matrix F.
a. For each predictor, calculate the time-series mean of 𝑓x,t+1 and call it 𝑓x̅.
4) Bootstrap rows of loss function values (𝑓x,t+1) from matrix F using a stationary block
bootstrap (Politis and Romano (1994)) to ensure that the bootstrap accurately estimates
the sampling distribution.23
5) Calculate studentized test statistics based on the actual sample: Vx = 𝑛½�̅�𝑥
�̂� , where �̂� is an
estimate of the standard deviation of 𝑓�̅�.
6) For each bootstrap sample i, calculate the maximum test statistic across the models
considered Ti = max �̅�*x,i =
𝑛½(�̅�𝑥,𝑖∗ −�̅�𝑥)
�̂�∗ , where �̂�∗ is an estimate of the standard deviation
of (𝑓�̅�,𝑖∗ − 𝑓�̅�).
a. Sort the predictors based on their realized test statistics (Vx). Starting with the
best predictor, compute the p-value for predictor x as the percent of Ti
observations that exceed Vx.24
b. If the p-value exceeds the desired confidence level, stop. Otherwise, remove the
current predictor from the sample and repeat step 6.
In Table 5, we display detailed estimates for individual predictors that are statistically
significant before adjusting for multiple testing. The table presents the time-series mean of each
23 We use the bootstrap procedure in Politis and Romano (1994) which preserves the underlying time-series properties
of the data. For each predictor, we use 1000 replications with a mean block size of 5. Sullivan, Timmerman, and White
(1999) show the procedure is robust to various block size choices. As with the in-sample calculation, we bootstrap
by row, which preserves the cross-sectional dependence in the data. 24 Formally, we calculate the p-value as 100 × (M+1)/(N+1), where N is the number of observations and M is the
number of observations that exceed the test statistic.
Electronic copy available at: https://ssrn.com/abstract=3459229
27
coefficient estimate, the Campbell and Thompson (2008) out-of-sample R-squared, and both raw
and Romano and Wolf (2005) adjusted p-values. None of the predictors is significant, at any
horizon, after adjusting for multiple testing. Formally, we fail to reject the null of no predictably
at all forecasting horizon for all predictors using the Romano and Wolf (2005) stepdown
procedure.
Interestingly, despite the lack of statistical significance, the out-of-sample R-squared values
in Table 5 highlight many of the same predictors that performed well in Table 3, in the in-sample
analyses. Analogous to Table 3, we report the time-series mean of the betas from these regressions
in addition to the out-of-sample 𝑅𝑂𝑆2 statistic. Indeed, Z-score, Asset Turnover, Short Interest, and
Percent Operating Accrual all have out-of-sample R-squared values that are positive and
economically meaningful at the three-month, six-month, and twelve-month horizons. Once again,
Asset Turnover exhibits impressive results, with an out-of-sample R-squared value of 17.5% at the
twelve-month horizon. Because the out-of-sample 𝑅𝑂𝑆2 statistic measures the proportional
reduction in mean squared forecast error that results from using the predictor (relative to the
benchmark model), its magnitude is also useful for interpreting the economic significance of these
findings. For the four variables listed above, the out-of-sample 𝑅𝑂𝑆2 statistic suggests economically
large return predictability.
The results in this subsection echo the conclusions of subsection B. Many predictors
exhibit strong out-of-sample t-statistics. However, once multiple hypothesis testing is considered,
we find no evidence of time-series predictability from cross-sectional predictors
Electronic copy available at: https://ssrn.com/abstract=3459229
28
IV. Conclusion
There is a large literature examining the cross-sectional determinants of stock returns.
Similarly, many time series variables have been proposed as predictors of the equity risk premium.
While these literatures have evolved largely independently, a handful of papers have proposed
certain cross-sectional variables as candidates for time-series predictability. We examine the link
between cross-sectional and time-series predictability while accounting for data snooping by
considering a full set of 140 cross-sectional predictors. Our analyses provide new information on
the nature of return predictability. While it may seem natural that cross-sectional variables should
contain information about aggregate market returns, we show that this does not have to be the case.
Crucially, the relation depends on the nature of cross-sectional return predictability: if a firm-level
variable contains only information about the idiosyncratic component of returns, then it will not
be possible to generate a variable that significantly predicts the equity risk premium by simply
aggregating the firm-level variable across assets.
We find plenty of evidence that, in isolation, certain cross-sectional variables make great
time series predictors. Some of these variables, like Z-Score and Asset Turnover, have never been
proposed as time-series variables before. However, these results disappear once we use the
Romano and Wolf (2005) stepdown procedure to account for the data snooping bias arising from
the plethora of predictive variables being considered. Moreover, when we examine out-of-sample
forecasting regressions, we continue to find little evidence of return predictability. After we apply
the Romano and Wolf (2005) procedure to our out-of-sample results, we find no evidence of return
predictability.
If each of our nearly 300 variables were examined by different econometricians, it is likely
that several articles would be written discussing the powerful in-sample time-series information in
Electronic copy available at: https://ssrn.com/abstract=3459229
29
cross-sectional variables. Claims of predictability in these articles would likely be bolstered by
out-of-sample Goyal-Welch (2008) tests. Indeed, several such articles exist. In this paper, we take
a different approach. Once we consider the set of all existing cross-sectional variables documented
in the extant literature, the difference in conclusions is stark. The evidence no longer suggests that
cross-sectional variables can be aggregated to form time-series predictors. Put differently, our
results show that cross-sectional variables are able to forecast in the cross-section of returns
because they contain information about the idiosyncratic component of returns, not the systematic
component of returns.
Electronic copy available at: https://ssrn.com/abstract=3459229
30
REFERENCES
Baker, M. and Wurgler, J., 2000, The Equity Share in New Issues and Aggregate Stock Returns,
The Journal of Finance 55, 2219–2257.
Bartsch, Viktoria-Sophie, Hubert Dichtl, Wolfgang Drobetz, and Andreas Neuhierl, 2017, Data
Snooping in Equity Premium Prediction, Working Paper.
Bay, Raymond, 1978, Anomalies in relationships between securities' yields and yield-
surrogates, Journal of Financial Economics 6, 103–126.
Berk, Jonathan B, 1995, A Critique of Size-Related Anomalies, Review of Financial Studies 8,
275–286.
Bossaerts, P., and Pierre Hillion, 1999, Implementing Statistical Criteria to Select Return
Forecasting Models: What Do We Learn?, Review of Financial Studies 12, 405–428.
Campbell, John Y. and Robert J. Shiller, 1988, The Dividend-Price Ratio and Expectations of
Future Dividends and Discount Factors, Review of Financial Studies 1, 195–228.
Campbell, J.Y. , Thompson, S.B., 2008. Predicting excess stock returns out of sample: can
anything beat the historical average? Review of Financial Studies 21, 1509–1531.
Chordia, Tarun, Richard Roll and Avanidhar Subrahmanyam, 2002, Order Imbalance,
Liquidity and Market Returns, Journal of Financial Economics 65, 111–130.
Chordia, Tarun, Amit Goyal and Alessio Saretto, 2018, In Defense of Market Efficiency:
Evidence from Two Million Strategies, working paper.
Clark, T.E., West, K.D., 2007. Approximately normal tests for equal predictive accuracy in
nested models, Journal of Econometrics 138, 291–311.
Cochrane, John H., 1991, Volatility Tests and Efficient Markets: A Review Essay, Journal of
Monetary Economics 27, 463–485.
Cochrane, John H., 2008, The Dog That Did Not Bark: A Defense of Return Predictability,
Review of Financial Studies 21, 1533–1575.
Cooper, Michael, and Hyseyin Gulen, Is Time-Series-Based Predictability Evident in Real
Time?, Journal of Business 79, 1263–1292.
Cowles, Alfred, 1933, Can Stock Market Forecasters Forecast?, Econometrica 1, 309–324.
Dickey, D.A., and W.A. Fuller, 1979, Distribution of the estimators for autoregressive time
series with a unit root. Journal of the American Statistical Association 74, 427–431.
Electronic copy available at: https://ssrn.com/abstract=3459229
31
Dow, C. H., 1920, Scientific Stock Speculation, The Magazine of Wall Street.
Dunn, O.J., 1961, Multiple comparisons among means, Journal of the American Statistical
Association 56, 52-64.
Foster, F. Douglas, Tom Smith, and Robert E. Whaley, 1997, Assessing Goodness-of-Fit of
Asset Pricing Models: The Distribution of the Maximal R2, Journal of Finance 52, 591–
607.
Gibson, Thomas, 1906, The Pitfalls of Speculation. The Moody Corporation, New York.
Green, J., J. R. Hand and X. F. Zhang, 2017, “The Characteristics that Provide Independent
Information about Average U.S. Monthly Stock Returns, forthcoming Review of
Financial Studies.
Goetzmann, W.N. , Jorion, P. , 1993. Testing the predictive power of dividend yields. Journal
of Finance 48, 663–679.
Goyal, Amit, and Ivo Welch, 2008, A Comprehensive Look at the Empirical Performance of
Equity Premium Prediction, Review of Financial Studies 21, 1455-1508.
Hansen, Peter Reinhard, 2005, A Test for Superior Predictive Ability, Journal of Business &
Economic Statistics 23, 365–380.
Harvey, Campbell R., Yan Liu, and Heqing Zhu, 2016, …and the Cross-Section of Expected
Returns, Review of Financial Studies 29, 5–68.
Harvey, Campbell R., and Yan Liu, 2015, Lucky Factors, Working Paper.
Hirshleifer, David, Kewei Hou, and Siew Hong Teoh, 2009, Accruals, cash flows, and
aggregate stock returns, Journal of Financial Economics 91, 389–406.
Hodrick, R.J., 1992. Dividend yields and expected stock returns: alternative procedures for
inference and measurement, Review of Financial Studies 5, 357–386.
Huang, Dashan and Guofu Zhou, 2017, Upper Bounds on Return Predictability, Journal of
Financial and Quantitative Analysis 52, 401–425.
Hou, K., C. Xue, and L. Zhang, 2017, Replicating Anomalies, Working Paper.
Jegadeesh, N., 1991, Seasonality in Stock Price Mean Reversion: Evidence from the U.S. and
the U.K, Journal of Finance 46, 1427–44.
Johnson, Travis, 2019, A fresh look at return predictability using a more efficient estimator,
Review of Asset Pricing Studies 9, 1-46.
Electronic copy available at: https://ssrn.com/abstract=3459229
32
Kacperczyk, M. S. Van Nieuwerburgh, and L. Veldkamp, 2016, A rational theory of mutual
funds' attention allocation, Econometrica 84, 571–626.
Lewellen, J., 2004, Predicting returns with financial ratios, Journal of Financial Economics 74,
209–235.
Lintner, J., 1965, The Valuation of Risky Assets and the Selection of Risky Investments in
Stock Portfolios and Capital Budgets, The Review of Economics and Statistics 47, 13–
37.
Lo, Andrew W., and A. Craig MacKinlay, Data-Snooping Bias in Tests of Financial Asset
Pricing Models, Review of Financial Studies 3, 431–467.
McLean, R. David, and Jeffrey Pontiff, 2016, Does Academic Publication Destroy Stock
Return Predictability?, Journal of Finance 71, 5–32.
Neely, C.J., D.E. Rapach, J. Tu, and G. Zhou, 2014, Forecasting the Equity Risk Premium: The
Role of Technical Indicators, Management Science 60, 1772–1791.
Nelson, C.R. , Kim, M.J. , 1993. Predictable stock returns: the role of small sample bias,
Journal of Finance 48, 641–661.
Politis, D., and J. Romano, 1994, The Stationary Bootstrap, Journal of the American Statistical
Association 89, 1303–1313.
Pontiff, Jeffrey, and Lawrence Schall, 1998, Book-to-market as a predictor of market returns,
Journal of Financial Economics 49, 141–160.
Rapach, David E., Matthew C. Ringgenberg, Guofu Zhou, 2016, Aggregate Short Interest and
Return Predictability, Journal of Financial Economics 121, 46–65.
Romano, Joseph P. and Michael Wolf, 2005, Stepwise multiple testing as formalized data
snooping, Econometrica 73, 1237–1282.
Romano, Joseph P. and Michael Wolf, 2016, Efficient computation of adjusted p-values for
resampling-based stepdown multiple testing, Statistics and Probability Letters 113, 38–
40.
Ross, S.A., 2005, Neoclassical Finance. Princeton University Press.
Schwert, G.W., 2003. Anomalies and market efficiency. In: Constantinides, G.M., Harris, M.,
Stultz, R. (Eds.), Handbook of the Economics of Finance, 1B. Elsevier, Amsterdam, pp.
939–974.
Seyhun, H. Nejat, 1988, The Information Content of Aggregate Insider Trading, The Journal
of Business 61, 1–24.
Electronic copy available at: https://ssrn.com/abstract=3459229
33
Sharpe, W., 1964, Capital Asset Prices: A theory of market equilibrium under conditions of
risk, Journal of Finance 19, 425–442.
Stambaugh, R.F., 1999, Predictive regressions, Journal of Financial Economics 54, 375–421.
Sullivan, R., A. Timmermann, and H. White, 1999, Data-snooping, technical trade rule
performance, and the bootstrap, The Journal of Finance 54, 1647–1691.
Wen, Q., 2019, Asset Growth and Stock Market Returns: A Time-Series Analysis, Review of
Finance 23, 599–628.
White, H., 2000, A Reality Check for Data Snooping, Econometrica 68, 1097–1126.
Zhou, G., 2010, How much stock return predictability can we expect from an asset pricing
model?, Economic Letters 108, 184–186.
Electronic copy available at: https://ssrn.com/abstract=3459229
34
Table 1
Summary Statistics The table displays summary statistics for the cross-sectional variables from which we construct time-series
predictors. To construct time-series predictors out of cross-sectional variables, we calculate the value-weighted
and equal-weighted mean across all stocks on each date. The first row displays statistics for all predictors, and
the remaining rows examine sub-samples formed on the ten most popular cross-sectional predictors (Popular),
the ten most statistically significant cross-sectional predictors (Best Cross-Sectional), and five different
groupings based on the categories in McLean and Pontiff (2016): (1) Event, (2) Fundamental, (3) Market, (4)
Opinion, and (5) Valuation. We examine only nine Popular variables due to ties at the tenth value. See Section
3.C for a discussion of these categories. We display the mean and median values of cross-sectional t-statics as
well as the number of citations for each predictor from Google Scholar as of 2018.
(1) (2) (3) (4) (5) (6)
Cross-Sectional t-statistic Citations
Type N Mean Median Mean Median
All 140 3.03 2.71 1,432 517
Popular 9 3.11 3.06 9,906 8,345
Best Cross-sectional 10 9.24 8.63 847 782
Event 36 3.83 2.94 869 467
Fundamental 36 2.37 1.91 973 428
Market 31 3.12 3.06 2668 878
Opinion 22 2.56 2.04 798 689
Valuation 15 3.14 2.88 2262 273
Electronic copy available at: https://ssrn.com/abstract=3459229
35
Table 2
Summary of In-Sample Anomaly Performance The table displays a count of the number of predictive variables that are statistically significant at the 10% level
or better. For each anomaly, we estimate an in-sample predictive regression of the form:
rt:t+h = α + βxt + εt:t+h for t = 1, . . . ,T −h,
where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t from CRSP
including dividends and excess of the monthly risk-free rate from Goyal and Welch (2008), h indicates the
forecast horizon in months, and xt is one of the 140 predictor variables. To construct time-series predictors out
of cross-sectional predictors, we calculate the value-weighted and equal-weighted mean across all stocks on each
date resulting in 280 possible predictors. Because some of the resulting variables are non-stationary, we also
examine the first-difference of these predictors and we examine a linearly de-trended version of these predictors.
We use the first version of each variable that is stationary; 12 predictors are non-stationary in every form, so
there are 268 predictors. The table shows the number of predictors that are statistically significant relative to the
number of variables considered. Panel A summarizes the results for all predictors, Panel B examines value-
weighted predictors, Panel C examines equal-weighted predictors. In each panel, the first row displays results
for all variables in that panel, and the remaining rows examine sub-samples formed on the ten most popular
cross-sectional predictors (Popular), the ten most statistically significant cross-sectional predictors (Best Cross-
Sectional), and five different groupings based on the categories in McLean and Pontiff (2016): (1) Event, (2)
Fundamental, (3) Market, (4) Opinion, and (5) Valuation. See Section 3.C for a discussion of these categories.
(1) (2) (3) (4) (5)
Number significant predictors / Total number predictors
Predictive Variable h=1 h=3 h=6 h=12
Panel A: All Predictors
All Predictors 34/268 43/268 45/268 54/268
Popular 4/15 4/15 5/15 6/15
Best Cross-sectional 5/20 4/20 5/20 4/20
Event 7/69 9/69 14/69 18/69
Fundamental 10/66 13/66 14/66 11/66
Market 10/60 9/60 9/60 15/60
Opinion 6/44 9/44 3/44 2/44
Valuation 1/29 3/29 5/29 8/29
Panel B: VW Predictors
All Predictors 21/136 26/136 28/136 33/136
Popular 2/8 3/8 4/8 4/8
Best Cross-sectional 3/10 3/10 4/10 3/10
Event 4/35 4/35 8/35 11/35
Fundamental 7/34 10/34 11/34 7/34
Market 6/30 7/30 6/30 10/30
Opinion 3/22 3/22 1/22 1/22
Valuation 1/15 2/15 2/15 4/15
Panel C: EW Predictors
All Predictors 13/132 17/132 17/132 21/132
Popular 2/7 1/7 1/7 2/7
Best Cross-sectional 2/10 1/10 1/10 1/10
Event 3/34 5/34 6/34 7/34
Fundamental 3/32 3/32 3/32 4/32
Market 4/30 2/30 3/30 5/30
Opinion 3/22 6/22 2/22 1/22
Valuation 0/14 1/14 3/14 4/14
Electronic copy available at: https://ssrn.com/abstract=3459229
36
Table 3
Best In-Sample Predictive Regression Results using Romano and Wolf p-values The table reports the ordinary least squares estimate of β, p-values, and the 𝑅2 statistic from in-sample predictive
regression models of the form:
rt:t+h = α + βxt + εt:t+h for t = 1, . . . ,T −h,
where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t from CRSP including
dividends and excess of the monthly risk-free rate from Goyal and Welch (2008), h indicates the forecast horizon in
months, and xt is the predictor variable shown in columns (2) and (9). Panel A displays results for the 1-month horizon,
Panel B shows the 3-month horizon, Panel C shows the 6-month horizon, and Panel D shows the 12-month horizon.
Within each panel, predictors are sorted by their Romano and Wolf p-value and then their unadjusted p-value. We
report all predictors that have unadjusted p-values less than 10% for a given horizon. Unadjusted p-values are shown
in columns (6) and (13) and Romano and Wolf (2005) adjusted p-values are shown in columns (7) and (14).
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
EW EW
or Raw RW or Raw RW
Rank Predictor VW �̂� 𝑹𝟐 P-Value Rank Predictor VW �̂� 𝑹𝟐 P-Value
Panel A: 1 Month Horizon 1 Z-score VW -0.0056 1.6% 0.00 0.82 18 Amihud's Measure EW 0.0026 0.4% 0.02 1.00
2 ST reversal VW 0.0050 0.8% 0.07 0.82 19 Spinoffs VW -0.0040 0.9% 0.03 1.00
3 Size VW -0.0048 0.8% 0.01 0.86 20 ΔRec + Accrual EW -0.0046 1.1% 0.03 1.00 4 ST reversal EW 0.0049 0.8% 0.09 0.86 21 Δ# of Analysts VW -0.0043 1.1% 0.03 1.00
5 Asset Turnover VW 0.0053 1.5% 0.00 0.88 22 Forecast Dispersion VW 0.0032 0.6% 0.03 1.00
6 Z-score EW -0.0052 1.4% 0.01 0.91 23 Inventory Growth EW 0.0033 0.6% 0.04 1.00 7 LT reversal VW -0.0046 0.7% 0.01 0.92 24 Secured / Total Debt VW 0.0040 0.9% 0.04 1.00
8 Zero Trading Days VW 0.0042 1.0% 0.00 0.96 25 Sustainable Growth VW -0.0031 0.5% 0.04 1.00
9 ΔNC Op. Assets VW -0.0036 0.7% 0.02 0.98 26 Moment LT Reverse VW 0.0025 0.2% 0.05 1.00 10 SEO VW -0.0042 0.9% 0.02 0.99 27 Spreads EW 0.0026 0.4% 0.05 1.00
11 ΔRec + Accrual VW -0.0049 1.3% 0.05 0.99 28 Gross Profitability VW -0.0030 0.5% 0.05 1.00
12 Coskewness EW 0.0036 0.7% 0.03 0.99 29 Misvalue Innovation VW -0.0035 0.7% 0.07 1.00 13 Coskewness VW 0.0037 0.8% 0.06 0.99 30 Target Price Return EW -0.0054 1.7% 0.07 1.00
14 NOA VW -0.0035 0.7% 0.02 1.00 31 Cash Flow Var. EW 0.0034 0.6% 0.08 1.00
15 ΔTax to Assets EW 0.0055 1.6% 0.00 1.00 32 Repurchases EW -0.0035 0.6% 0.09 1.00 16 Δ# of Analysts EW -0.0043 1.0% 0.01 1.00 33 Capex Growth VW -0.0032 0.5% 0.09 1.00
17 Asset Growth VW -0.0035 0.7% 0.02 1.00 34 Profit Margin EW 0.0031 0.5% 0.10 1.00
Panel B: 3 Month Horizon
1 Short Interest EW -0.0062 5.7% 0.03 0.25 23 Δ Rec + Accrual VW -0.0037 2.1% 0.03 1.00
2 Z-score VW -0.0055 4.6% 0.00 0.34 24 Cash Flow Variance EW 0.0033 1.7% 0.05 1.00
3 Asset Turnover VW 0.0056 4.7% 0.00 0.34 25 Misvalue Innovation VW -0.0032 1.6% 0.06 1.00 4 Size VW -0.0050 2.4% 0.00 0.35 26 Capex Growth VW -0.0032 1.6% 0.07 1.00
5 LT Reversal VW -0.0048 2.2% 0.01 0.45 27 Spreads VW 0.0028 1.2% 0.07 1.00
6 Zero Trading Days VW 0.0042 2.8% 0.00 0.63 28 Accruals VW 0.0029 1.3% 0.08 1.00 7 Z-score EW -0.0046 3.1% 0.00 0.69 29 Analyst Intensity EW -0.0026 1.0% 0.01 1.00
8 SEO VW -0.0045 3.1% 0.00 0.71 30 Spreads EW 0.0024 0.9% 0.04 1.00
9 NOA VW -0.0038 2.3% 0.01 0.80 31 Moment LT Reverse VW 0.0022 0.4% 0.06 1.00 10 ΔNC Op. Assets VW -0.0033 1.8% 0.03 0.85 32 Secured / Total Debt VW 0.0021 0.7% 0.06 1.00
11 Target Price Return EW -0.0062 6.0% 0.01 0.85 33 ΔBook to Assets VW -0.0013 0.2% 0.06 1.00
12 Asset Growth VW -0.0036 2.1% 0.01 0.86 34 Reverse Stock Split VW 0.0028 1.3% 0.07 1.00
13 Coskewness VW 0.0036 2.1% 0.04 0.86 35 Dividend Omission EW 0.0026 1.0% 0.08 1.00
14 Inventory Growth EW 0.0036 2.1% 0.01 0.88 36 Δ # of Analysts VW -0.0037 2.1% 0.08 1.00
15 % Operat. Accrual VW -0.0045 3.4% 0.04 0.93 37 Opportunistic Sells VW -0.0028 1.3% 0.09 1.00 16 Sustain Growth VW -0.0033 1.7% 0.03 0.95 38 Reverse Stock Split EW -0.0020 0.6% 0.09 1.00
17 Profit Margin EW 0.0034 1.7% 0.04 0.97 39 Δ Rec + Accrual EW -0.0029 1.3% 0.09 1.00
18 Agnostic Value VW 0.0040 2.5% 0.01 0.98 40 ΔNumber of Analyst EW -0.0034 1.9% 0.09 1.00 19 Gross Profitability VW -0.0029 1.3% 0.05 0.98 41 Low Recommend EW -0.0026 1.1% 0.10 1.00
20 Repurchases EW -0.0032 1.6% 0.07 0.99 42 Volume VW -0.0020 0.4% 0.10 1.00
21 Δ Tax to Assets EW 0.0044 2.9% 0.00 1.00 43 ΔInstitutional owner EW -0.0022 0.7% 0.10 1.00 22 Agnostic Value EW -0.0037 2.2% 0.02 1.00
Electronic copy available at: https://ssrn.com/abstract=3459229
37
Table 3 - Continued (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
EW EW
or Raw RW or Raw RW
Rank Predictor VW �̂� 𝑹𝟐 P-Value Rank Predictor VW �̂� 𝑹𝟐 P-Value
Panel C: 6 Month Horizon 1 Short Interest EW -0.0063 10.8% 0.02 0.08 24 Repurchases EW -0.0030 2.6% 0.04 0.94
2 Asset Turnover VW 0.0059 9.7% 0.00 0.09 25 Coskewness VW 0.0027 2.2% 0.05 0.94 3 Z-score VW -0.0057 9.1% 0.00 0.10 26 Inventory Growth VW 0.0028 2.3% 0.08 0.94
4 Size VW -0.0050 4.7% 0.00 0.11 27 Volume Trend EW 0.0032 3.0% 0.09 0.95
5 LT Reversal VW -0.0049 4.6% 0.01 0.15 28 Spreads EW 0.0026 2.0% 0.01 0.95 6 Zero Trading Days VW 0.0043 5.5% 0.00 0.34 29 Momentum-reversal VW -0.0026 1.2% 0.06 0.95
7 Z-score EW -0.0044 5.2% 0.00 0.53 30 Dividends EW 0.0026 2.0% 0.06 0.95
8 Inventory Growth EW 0.0039 4.5% 0.01 0.56 31 Reverse Stock Split VW 0.0029 2.6% 0.01 0.96 9 NOA VW -0.0037 4.0% 0.00 0.63 32 ΔSales-ΔInventory EW -0.0026 2.0% 0.05 0.97
10 Asset Growth VW -0.0037 3.9% 0.01 0.66 33 Share issues PW VW -0.0028 2.1% 0.10 0.98
11 Spreads VW 0.0035 3.7% 0.00 0.67 34 R&D/MV EW 0.0026 2.0% 0.04 0.98 12 Sustainable Growth VW -0.0036 3.8% 0.01 0.70 35 Cash Flow Variance EW 0.0029 2.6% 0.05 0.98
13 ΔNC Op. Assets VW -0.0032 3.1% 0.04 0.70 36 Misvalue Innovation VW -0.0029 2.5% 0.07 0.99
14 % Operating Accrual VW -0.0047 6.7% 0.01 0.71 37 Org. Capital EW 0.0023 1.5% 0.05 1.00 15 SEO VW -0.0038 4.0% 0.01 0.76 38 Share issues DT VW -0.0017 0.8% 0.06 1.00
16 Profit Margin EW 0.0036 3.6% 0.00 0.80 39 Share issues DT EW -0.0021 1.2% 0.06 1.00
17 Gross Profitability VW -0.0030 2.7% 0.02 0.87 40 Earn Consistency VW -0.0016 0.7% 0.10 1.00 18 Target Price Return EW -0.0052 7.0% 0.02 0.91 41 Agnostic Value VW 0.0017 0.9% 0.05 1.00
19 ΔCapex-ΔInd Capex VW -0.0029 2.5% 0.01 0.92 42 Analyst Intensity EW -0.0012 0.4% 0.04 1.00
20 Capex Growth VW -0.0031 3.0% 0.05 0.93 43 ΔBook to Assets VW -0.0010 0.3% 0.06 1.00 21 ΔTax to Assets EW 0.0048 6.0% 0.00 0.93 44 Cash to Assets VW 0.0014 0.6% 0.05 1.00
22 accruals VW 0.0029 2.4% 0.03 0.94 45 Opportunistic Sells VW -0.0026 2.1% 0.02 1.00
23 ΔCapex-ΔInd Capex EW -0.0029 2.4% 0.00 0.94
Panel D: 12 Month Horizon
1 Z-score VW -0.0058 18.5% 0.00 0.03 28 Volume Trend EW 0.0030 5.0% 0.05 0.91
2 Size VW -0.0054 10.0% 0.00 0.04 29 Share issues PW VW -0.0027 4.0% 0.05 0.92 3 Asset Turnover VW 0.0058 17.8% 0.00 0.04 30 ΔAsset Turnover VW 0.0028 4.2% 0.02 0.92
4 Short Interest EW -0.0052 14.2% 0.01 0.11 31 Dividends EW 0.0023 3.2% 0.04 0.94
5 LT Reversal VW -0.0044 7.3% 0.00 0.13 32 Target Price Return EW -0.0043 8.7% 0.03 0.95 6 Zero Trading Days VW 0.0040 9.5% 0.00 0.21 33 SEO VW -0.0026 3.6% 0.06 0.96
7 Z-score EW -0.0043 9.8% 0.00 0.30 34 Pension Funding EW -0.0028 4.4% 0.05 0.96
8 Sustainable Growth VW -0.0039 8.3% 0.00 0.33 35 Inventory Growth VW 0.0023 2.9% 0.10 0.98 9 ΔNC Op. Assets VW -0.0035 6.8% 0.01 0.34 36 ΔSales-ΔInventory EW -0.0022 2.8% 0.02 0.98
10 Asset Growth VW -0.0037 7.7% 0.00 0.37 37 Momentum-reversal VW -0.0022 1.7% 0.04 0.98
11 Spreads VW 0.0035 7.2% 0.00 0.40 38 Lagged Momentum VW -0.0021 1.6% 0.08 0.98 12 Inventory Growth EW 0.0037 7.7% 0.01 0.40 39 R&D/MV EW 0.0022 2.5% 0.04 0.99
13 Price VW -0.0036 4.1% 0.02 0.50 40 Insider Buy VW 0.0025 3.8% 0.02 0.99
14 % Operating Accrual VW -0.0046 12.2% 0.01 0.52 41 Reverse Stock Split VW 0.0022 2.8% 0.03 0.99 15 NOA VW -0.0033 6.3% 0.00 0.57 42 Coskewness VW 0.0019 2.1% 0.06 0.99
16 Capex Growth VW -0.0034 7.1% 0.01 0.66 43 Cash Flow Variance EW 0.0023 3.0% 0.06 0.99
17 Profit Margin EW 0.0033 5.9% 0.00 0.72 44 Exchange Switch VW -0.0019 2.1% 0.07 0.99 18 ΔCapex-ΔInd Capex EW -0.0030 5.2% 0.00 0.73 45 Covariance Risk VW 0.0018 2.0% 0.09 1.00
19 R&D Increases EW -0.0029 4.6% 0.05 0.82 46 M/B and Accruals VW -0.0020 2.3% 0.09 1.00 20 ΔCapex-ΔInd Capex VW -0.0028 4.6% 0.00 0.83 47 ΔTax to Assets EW 0.0029 4.0% 0.01 1.00
21 Spreads EW 0.0027 4.2% 0.00 0.83 48 Moment LT Reverse VW 0.0015 0.8% 0.02 1.00
22 Gross Profitability VW -0.0027 4.3% 0.02 0.84 49 E/P VW 0.0008 0.4% 0.04 1.00
23 Org. Capital EW 0.0030 4.9% 0.00 0.86 50 Coskewness EW 0.0010 0.6% 0.05 1.00
24 Repurchases EW -0.0031 4.9% 0.02 0.87 51 Share issues DT EW -0.0018 1.8% 0.05 1.00
25 Employee growth VW -0.0027 4.3% 0.05 0.87 52 E/P EW -0.0017 1.7% 0.05 1.00 26 Misvalue Innovation VW -0.0030 5.3% 0.03 0.91 53 Dividend Initiation VW 0.0013 1.0% 0.08 1.00
27 LT Reversal EW -0.0025 2.3% 0.05 0.91 54 CF/MV VW 0.0007 0.3% 0.08 1.00
Electronic copy available at: https://ssrn.com/abstract=3459229
38
Table 4
Summary of Out of Sample Anomaly Performance The table displays a count of the number of predictive variables that are statistically significant at the 10% level or
better. For each anomaly, we estimate an out of sample predictive regression of the form:
rt:t+h = α + βxt + εt:t+h for t = 1, . . . ,T −h,
where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t from CRSP including
dividends and excess of the monthly risk-free rate from Goyal and Welch (2008), h indicates the forecast horizon in
months, and xt is one of the 140 predictor variables. To construct aggregate predictors out of cross-sectional anomalies,
we calculate the value-weighted and equal-weighted mean across all stocks on each date. Because some of the
resulting variables are non-stationary, we also examine the first-difference of these predictors and we examine
a linearly de-trended version of these predictors. We use the first version of each variable that is stationary; 12
predictors are non-stationary in every form and we omit variables with less than 20 years of data, which yields
263 possible predictors. The table shows the number of predictors that are statistically significant relative to the
number of variables considered. Panel A summarizes the results for all predictors, Panel B examines value-
weighted predictors, Panel C examines equal-weighted predictors. In each panel, the first row displays results
for all variables in that panel, and the remaining rows examine sub-samples formed on the ten most popular
cross-sectional predictors (Popular), the ten most statistically significant cross-sectional predictors (Best Cross-
Sectional), and five different groupings based on the categories in McLean and Pontiff (2015): (1) Event, (2)
Fundamental, (3) Market, (4) Opinion, and (5) Valuation. See Section 3.C for a discussion of these categories.
(1) (2) (3) (4) (5)
Number significant predictors / Total number predictors
Predictive Variable h=1 h=3 h=6 h=12
Panel A: All Predictors
All Predictors 7/263 32/263 41/263 45/263
Popular 0/15 3/15 8/15 9/15
Best Cross-sectional 0/18 2/18 3/18 3/18
Event 1/67 5/67 9/67 10/67
Fundamental 4/65 8/65 8/65 8/65
Market 1/61 15/61 22/61 25/61
Opinion 1/41 3/41 1/41 0/41
Valuation 0/29 1/29 1/29 2/29
Panel B: VW Predictors
All Predictors 4/135 21/135 24/135 27/135
Popular 0/8 2/8 4/8 4/8
Best Cross-sectional 0/9 1/9 2/9 2/9
Event 0/34 3/34 5/34 6/34
Fundamental 3/34 5/34 6/34 7/34
Market 0/31 10/31 12/31 13/31
Opinion 1/21 2/21 0/21 0/21
Valuation 0/15 1/15 1/15 1/15
Panel C: EW Predictors
All Predictors 3/128 11/128 17/128 18/128
Popular 0/7 1/7 4/7 5/7
Best Cross-sectional 0/9 1/9 1/9 1/9
Event 1/33 2/33 4/33 4/33
Fundamental 1/31 3/31 2/31 1/31
Market 1/30 5/30 10/30 12/30
Opinion 0/20 1/20 1/20 0/20
Valuation 0/14 0/14 0/14 1/14
Electronic copy available at: https://ssrn.com/abstract=3459229
39
Table 5
Best Out-of-Sample Predictive Regression Results using Romano and Wolf p-values The table reports the mean of the ordinary least squares estimate of β, p-values, and the Campbell and Thompson
(2008) 𝑅𝑂𝑆2 statistic from out of sample predictive regression models of the form:
rt:t+h = α + βxt + εt:t+h for t = 1, . . . ,T −h,
where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t from CRSP including
dividends and excess of the monthly risk-free rate from Goyal and Welch (2008), h indicates the forecast horizon in
months, and xt is the predictor variable in the first column. �̂� (column (4)) is the time-series mean of the coefficient
estimates for each predictor. The Campbell and Thompson 𝑅𝑂𝑆2 statistic (columns (5) and (12)) is calculated as 1 minus
the proportional reduction in mean squared forecast error (MSFE) at the h-month horizon for a predictive regression
forecast of the S&P 500 log excess return based on the predictor variable in the first column vis-a-vis the prevailing
mean benchmark forecast. Panel A displays results for the 1-month horizon, Panel B shows the 3-month horizon,
Panel C shows the 6-month horizon, and Panel D shows the 12-month horizon. Within each panel, predictors are
sorted by their Romano and Wolf p-value and then their unadjusted p-value. We report all predictors that have
unadjusted p-values less than 10% for a given horizon. Unadjusted p-values are shown in columns (6) and (13) and
Romano and Wolf (2005) adjusted p-values are shown in columns (7) and (14).
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
EW EW
or Raw RW or Raw RW
Rank Predictor VW �̂� 𝑹𝟐 P-Value Rank Predictor VW �̂� 𝑹𝟐 P-Value
Panel A: 1 Month Horizon 1 Asset Turnover VW 0.0070 0.3% 0.04 1.00
2 ΔRec + Accrual VW -0.0029 1.1% 0.04 1.00
3 Z-score EW -0.0043 0.5% 0.04 1.00 4 Z-score VW -0.0049 0.7% 0.05 1.00
5 ΔTax to Assets EW 0.0065 -0.9% 0.06 1.00
6 Pension Funding VW -0.0043 -0.4% 0.08 1.00 7 Coskewness EW 0.0045 0.0% 0.09 1.00
Panel B: 3 Month Horizon
1 Misvalue Innovation VW -0.0030 1.3% 0.04 0.97 17 Short Interest VW 0.0059 0.5% 0.04 1.00 2 Short Interest EW -0.0051 6.4% 0.02 0.99 18 Exchange Switch VW -0.0034 -0.9% 0.04 1.00
3 Idio Risk VW -0.0015 1.1% 0.00 1.00 19 Repurchases EW -0.0039 3.9% 0.04 1.00 4 Asset Turnover VW 0.0073 3.8% 0.00 1.00 20 ΔInstitutional owner VW 0.0036 -13.4% 0.04 1.00
5 ST Reversal EW -0.0002 0.4% 0.00 1.00 21 Cash Flow Variance VW 0.0042 0.0% 0.05 1.00
6 ST Reversal VW -0.0007 0.3% 0.00 1.00 22 LT Reversal VW -0.0066 -0.6% 0.05 1.00 7 Z-score VW -0.0048 3.3% 0.01 1.00 23 Cash Flow Variance EW 0.0020 1.1% 0.05 1.00
8 Lagged Momentum VW -0.0013 0.8% 0.01 1.00 24 Age EW 0.0163 -106.8% 0.07 1.00
9 Max VW -0.0014 1.2% 0.01 1.00 25 Moment LT Reverse VW 0.0021 0.3% 0.08 1.00 10 Analyst Intensity EW -0.0073 -0.1% 0.02 1.00 26 Age VW 0.0019 -0.1% 0.08 1.00
11 Lagged Momentum EW -0.0010 0.6% 0.02 1.00 27 Reverse Stock Split VW 0.0048 -0.1% 0.08 1.00
12 ΔNC Op. Assets VW -0.0013 1.6% 0.02 1.00 28 ΔTax to Assets EW 0.0051 2.2% 0.08 1.00 13 Price VW -0.0040 1.0% 0.02 1.00 29 ΔRec + Accrual VW -0.0030 1.5% 0.09 1.00
14 Size EW 0.0000 0.3% 0.03 1.00 30 Capex Growth VW -0.0024 1.3% 0.09 1.00
15 Coskewness VW 0.0055 1.4% 0.03 1.00 31 Max EW -0.0001 0.4% 0.09 1.00 16 Z-score EW -0.0039 1.4% 0.04 1.00 32 Momentum-reversal VW -0.0009 0.5% 0.10 1.00
Electronic copy available at: https://ssrn.com/abstract=3459229
40
Table 5 - Continued (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
EW EW
or Raw RW or Raw RW
Rank Predictor VW �̂� 𝑹𝟐 P-Value Rank Predictor VW �̂� 𝑹𝟐 P-Value
Panel C: 6 Month Horizon 1 Misvalue Innovation VW -0.0027 2.9% 0.04 0.72 22 LT Reversal EW -0.0028 1.5% 0.03 1.00
2 Short Interest EW -0.0050 15.8% 0.01 0.87 23 Repurchases EW -0.0043 7.0% 0.03 1.00 3 Cash Flow Variance EW 0.0018 2.4% 0.03 0.90 24 Exchange Switch VW -0.0041 -1.9% 0.04 1.00
4 ST Reversal EW 0.0000 0.3% 0.00 0.95 25 Short Interest VW 0.0047 4.2% 0.04 1.00
5 Z-score VW -0.0050 7.4% 0.01 0.97 26 Reverse Stock Split VW 0.0043 -1.3% 0.04 1.00 6 Idio Risk VW -0.0020 1.6% 0.00 0.99 27 Spreads VW 0.0080 -2.3% 0.04 1.00
7 Size VW 0.0004 0.4% 0.00 1.00 28 Idio Risk EW -0.0001 1.3% 0.04 1.00
8 ST Reversal VW 0.0003 0.3% 0.00 1.00 29 Momentum-reversal EW -0.0011 1.2% 0.04 1.00 9 Asset Turnover VW 0.0075 9.6% 0.00 1.00 30 Age VW 0.0019 -0.1% 0.05 1.00
10 Moment LT Reverse VW 0.0008 1.2% 0.00 1.00 31 Coskewness VW 0.0046 1.4% 0.05 1.00
11 Momentum-reversal VW -0.0024 2.2% 0.00 1.00 32 Dividends EW 0.0000 1.0% 0.06 1.00 12 Price VW -0.0036 1.9% 0.01 1.00 33 Z-score EW -0.0041 1.9% 0.06 1.00
13 Max EW -0.0014 0.8% 0.01 1.00 34 ΔCapex-ΔInd Capex VW -0.0037 0.0% 0.06 1.00
14 Max VW -0.0024 1.5% 0.01 1.00 35 Price EW -0.0023 0.5% 0.06 1.00 15 Size EW 0.0005 0.2% 0.01 1.00 36 Dividends VW 0.0000 1.0% 0.06 1.00
16 Lagged Momentum VW -0.0015 1.9% 0.01 1.00 37 ΔCapex-ΔInd Capex EW -0.0038 -0.1% 0.06 1.00
17 LT Reversal VW -0.0064 1.2% 0.01 1.00 38 Capex Growth VW -0.0026 3.1% 0.07 1.00 18 Spreads EW 0.0120 -1.6% 0.02 1.00 39 ΔSales-ΔInventory EW -0.0042 -4.3% 0.08 1.00
19 ΔNC Op. Assets VW -0.0009 3.3% 0.02 1.00 40 Asset Growth VW 0.0003 1.0% 0.09 1.00
20 Lagged Momentum EW -0.0012 1.6% 0.02 1.00 41 Analyst Intensity EW -0.0043 -0.7% 0.09 1.00 21 Cash Flow Variance VW 0.0057 0.3% 0.03 1.00
Panel D: 12 Month Horizon
1 Misvalue Innovation VW -0.0029 6.1% 0.04 0.62 24 Idio Risk VW 0.0005 -0.8% 0.01 1.00 2 Short Interest EW -0.0049 20.6% 0.02 0.70 25 Capex Growth VW -0.0034 7.6% 0.02 1.00
3 Cash Flow Variance EW 0.0013 4.4% 0.02 0.80 26 Max EW -0.0003 -0.3% 0.03 1.00
4 Z-score VW -0.0053 15.0% 0.03 0.94 27 Dividends EW 0.0000 1.7% 0.03 1.00 5 Moment LT Reverse VW 0.0013 4.0% 0.00 0.96 28 Dividends VW 0.0000 1.7% 0.03 1.00
6 Moment LT Reverse EW 0.0009 3.9% 0.00 0.99 29 ΔNC Op. Assets VW -0.0008 7.4% 0.03 1.00
7 Short Interest VW 0.0026 5.6% 0.02 0.99 30 Age VW 0.0021 -1.2% 0.03 1.00 8 Repurchases EW -0.0044 11.2% 0.04 0.99 31 Exchange Switch VW -0.0040 -2.3% 0.04 1.00
9 Lagged Momentum VW -0.0020 4.4% 0.00 1.00 32 Momentum-reversal EW -0.0006 1.1% 0.05 1.00
10 Size VW 0.0009 -0.2% 0.00 1.00 33 Sustainable Growth VW -0.0021 2.3% 0.05 1.00 11 Reverse Stock Split VW 0.0028 -2.5% 0.00 1.00 34 ΔCapex-ΔInd Capex EW -0.0025 2.5% 0.05 1.00
12 Lagged Momentum EW -0.0011 2.9% 0.00 1.00 35 ΔCapex-ΔInd Capex VW -0.0030 0.4% 0.07 1.00
13 Price VW -0.0047 3.0% 0.00 1.00 36 Cash Flow Variance VW 0.0062 -0.8% 0.07 1.00 14 Momentum-reversal VW -0.0020 2.7% 0.00 1.00 37 Org. Capital EW 0.0027 4.6% 0.07 1.00
15 Spreads EW 0.0096 1.5% 0.00 1.00 38 Volume / MV VW -0.0012 0.0% 0.09 1.00
16 Max VW -0.0013 1.1% 0.00 1.00 39 Price EW -0.0030 -0.8% 0.09 1.00 17 Asset Turnover VW 0.0079 17.5% 0.00 1.00 40 Volume Trend EW 0.0031 2.6% 0.09 1.00
18 ST Reversal VW 0.0011 -1.0% 0.01 1.00 41 Momentum EW 0.0007 -1.6% 0.09 1.00
19 ST Reversal EW 0.0010 -1.0% 0.01 1.00 42 Asset Growth VW 0.0001 3.1% 0.09 1.00 20 Spreads VW 0.0067 1.6% 0.01 1.00 43 ΔSales-ΔInventory EW -0.0024 -4.6% 0.09 1.00
21 Size EW 0.0010 -1.3% 0.01 1.00 44 Coskewness VW 0.0039 0.2% 0.09 1.00 22 LT Reversal VW -0.0052 6.1% 0.01 1.00 45 % Operating Accrual VW -0.0046 9.8% 0.10 1.00
23 LT Reversal EW -0.0025 3.6% 0.01 1.00
Electronic copy available at: https://ssrn.com/abstract=3459229