Are Cross-Sectional Predictors Good Market-Level …...Owen Lamont, Allan Timmermann, and conference and seminar participants at the 2018 Society for Financial Studies Cavalcade, the

Are Cross-Sectional Predictors Good Market-Level Predictors?

JOSEPH ENGELBERG, R. DAVID MCLEAN, JEFFREY PONTIFF,

AND MATTHEW C. RINGGENBERG*

September 2019

ABSTRACT

Firm-level variables that predict cross-sectional stock returns, such as price-to-earnings and short

interest, are often averaged and used to predict market returns. We extend this literature and limit

the data-snooping bias by using a near-complete population of the literature’s cross-sectional

return predictors. We find the literature has ignored several cross-sectional variables–such as asset

turnover and Z-score–that contain strong in-sample predictability when examined in isolation.

However, after accounting for the number of predictors and their interdependence, we find little

evidence that cross-sectional predictors make good time-series predictors, especially out of

sample.

Keywords: Return predictability, data snooping, statistical bias, market risk premium.

JEL Code: G00, G14, L3, C1

* Engelberg is at the University of California, San Diego, McLean is at Georgetown University, Pontiff is at Boston

College, and Ringgenberg is at the University of Utah. The authors thank Mike Cooper, Bryan Kelly, Amit Goyal,

Owen Lamont, Allan Timmermann, and conference and seminar participants at the 2018 Society for Financial Studies

Cavalcade, the 2019 American Finance Association, MIT (Accounting), TCU, UC-Berkeley, University of Kentucky,

University of Michigan, UNLV, University of Utah, UC Riverside, University of Virginia, and Washington University

in St. Louis. All errors are our own. Comments welcome. © 2016-2019.

Electronic copy available at: https://ssrn.com/abstract=3459229

1

Is the market risk premium predictable? Financial research has strived to answer this

question going back, at least, to Dow (1920). “Data-mining” or “data-snooping” (Lo and

MacKinlay, 1990) poses a well-recognized obstacle in obtaining an answer: research is more likely

to be published in academic journals if it rejects the no-predictability null, and thus, statistically

significant predictors in the literature will have a disproportionately high Type I error rate. Harvey,

Liu, and Zhou (2015) demonstrate this using a wide range of cross-sectional predictors. Thus,

many apparent predictors featured in the literature will fail to predict outside of the original sample.

When choosing candidate right-hand-side variables to predict the market risk premium,

academics often draw from the well of cross-sectional predictors. For example, Baker and Wurgler

(2000) cite the extensive market-timing literature, which relates equity issuance to returns in the

cross-section, and suggest “that issuers try to time both their idiosyncratic return and the market

return.” They go on to find that aggregate equity issuance predicts aggregate market returns.

Similar arguments for aggregating cross-sectional predictors for the purpose of predicting the

market risk premium are found in Campbell and Shiller (1988) with P/E ratios, Seyhun (1988)

with insider trading, Chordia et al. (2002) with order imbalance, Rapach et al. (2016) with short

interest, and Wen (2019) with asset growth. In this paper, we examine cross-sectional predictors

such as these in light of the data mining bias. Are the aforementioned predictors simply well-

chosen among a much larger set of predictors? Or do cross-sectional predictors, in general,

aggregate to give market-level predictability?

To answer these questions, we use a sample of 140 cross-sectional predictors that is

essentially the population of firm-level characteristics that have demonstrated cross-sectional

predictability in the academic literature. This builds on the 97-predictor list used in McLean and


2

Pontiff (2016) and Engelberg, McLean, and Pontiff (2018). The benefit of using such an exhaustive

list is that it avoids selection biases, thereby reducing the data-snooping problem.

For each of our 140 predictors, we calculate cross-sectional averages of the firm-specific

values each month to get a single, monthly value. For each predictor, we construct equal-weighted

and value-weighted averages; the resulting database has 280 different predictive variables (140

equal-weighted and 140 value-weighted). To examine the market-level predictability of these

variables, we perform both in-sample and out-of-sample tests. Like other papers in the time-series

literature, our in-sample tests use the entire sample of data and estimate a single parameter estimate

from a time-series regression of the market risk premium on the predictor. Our out-of-sample tests

consist of expanding rolling window regressions that use only information available at each point

in time to examine whether a predictor is useful for forecasting the market’s risk premium.

At first glance, it appears that cross-sectional predictors are good market-level predictors

in-sample. Out of the 280 predictive variables, 268 of them are stationary and exhibit sufficient

time series variation.1 Of these 268 predictors, 54 (approximately 20%) predict 1-year market

returns in an OLS regression with coefficients that are significant at the 10% level or better and 20

(approximately 7%) are significant at the 1% level. Statistical significance is determined using the

stationary bootstrap of Politis and Romano (1994).

The strength of this result is strongly related to the horizon of predictability: when

considering 1-month market returns, only 34 of the 268 predictors (approximately 13%) are

significant at the 10% level and 8 (approximately 3%) are significant at the 1% level. We also find

that several cross-sectional predictors – such as Asset Turnover and Z-Score – are among the best

1 We test each of the 280 predictors for a unit root using an Augmented Dickey-Fuller (1979) test. If a variable is non-

stationary, we then calculate deviations from a linear trend model. If that is also non-stationary, we instead calculate

the first-difference. If that is also non-stationary, we drop the variable. We also drop variables that aggregate to form

a variable that does not exhibit time-series variation. This procedure yields 268 possible predictors.


3

performers for market predictability, but have not been documented in this literature previously.

For example, value-weighted asset turnover can predict the market risk premium with an R-

squared of 17.8% at the one-year horizon. Aggregate asset turnover has not been previously

proposed as a predictor of market risk premia.

Following McLean and Pontiff (2016), we also consider five sub-categories of cross-

sectional predictors: (i) Event, (ii) Fundamental, (iii) Market, (iv) Valuation, and (v) Opinion.

Valuation predictors are ratios of market prices to fundamentals. Many Valuation predictors have

received attention in the existing market risk premium literature, including the dividend-to-price

and earnings-to-price ratios. Moreover, since valuation ratios could be a function of discount rates,

they are arguably the most likely cross-sectional predictors to work in a time-series setting (Kelly

and Pruitt (2013)). Yet, when we examine the predictors broken out by the five sub-categories, we

find the weakest evidence for the Valuation and Opinion sub-categories. While the mean fraction

of significant predictors across all horizons is 16%, for the Opinion and Valuation sub-categories

this number falls to 11% and 14%, respectively. In other words, the Valuation predictors are

actually less likely to be significant. The Valuation result offers a comparison to Lewellen (2004),

who investigates whether valuation measures documented in the extant literature remain robust in

various subsamples. Lewellen is able to reject the null of predictability for dividend yield in both

subsamples, although his results for earnings-to-price and book-to-market depend on the

subsample examined. Our results show that a broader sample of Valuation measures shows only

weak in-sample evidence of return predictability.

Other subcategories fare better. Among Event predictors, which are based on corporate

events or changes in firm performance, 17% are statistically significant and the best performers

include Asset Turnover and Inventory Growth Rate. Among Market predictors, which are


4

predictors that only use price and trading volume data and no accounting data, 18% are statistically

significant and the best performers include Short Interest and Long-Term Reversal. Moreover, we

find similar evidence against the null of no-predictability when we examine value-weighted vs.

equal-weighted predictors; 24% of all value-weighted predictors are significant in-sample and 16%

of all equal-weighted predictors are significant in-sample.

We also find that, among the 280 predictors, the 10 variables that exhibit the best cross-

sectional predictability (from which we generate 10 equal-weighted and 10 value-weighted

aggregate predictors) also exhibit better in-sample time-series predictability: 23% are statistically

significant. Similarly, 32% of the 10 most commonly cited cross-sectional predictors are also

statistically significant. In other words, predictors that appear to be more important in the cross-

sectional predictability literature seem to make better time series predictors.

Since we examine 268 predictors, we expect that some variables will appear significant by

chance. Moreover, several cross-sectional variables are related to each other so the 268 tests we

conduct will not be independent. To address both the number of tests we conduct as well as the

dependency among the tests, we perform the Romano and Wolf (2005) resampling-based

stepdown procedure to compute adjusted p-values that control the family-wise error rate while

accounting for the number of tests and dependence. This procedure is better than either the well-

known Bonferroni (Dunn (1961)) or Holm (1979) tests because it considers the dependence

structure across multiple tests and thus has more power (Romano and Wolf (2005)).2

Using in-sample regressions, when we examine the full set of 268 predictors and perform

the Romano and Wolf (2005) stepdown procedure we are unable to reject the null of no-

2 Both the Bonferroni (Dunn (1961) and Holm (1979) tests lack power in that they tend not to reject the null enough

when the null is false.


5

predictability at the 1% level for any predictor at any horizon. At the 1-month and 3-month

horizons, no variable is significant even at the 10% level. At the 12-month horizon, Z-Score, Size

and Asset Turnover have Romano and Wolf p-values between 3% and 5%. In short, we find most

of the cross-sectional variables that appeared to be statistically significant when examined in

isolation are no longer when examined in the context of all cross-sectional predictors. Moreover,

there is no cross-sectional predictor that appears to be a stellar time-series predictor after

accounting for the number of cross-sectional variables that currently exist in the literature.

Things look even bleaker for cross-sectional predictors when we consider out-of-sample

forecasting regressions. Among the 263 stationary predictors we examine in out-of-sample

regressions,3 3% significantly predict market returns at the 1-month horizon and 17% predict

market returns at the 12-month horizon (compared with 12% and 20% in-sample). Moreover, once

we adjust the individual p-values using the Romano and Wolf (2005) stepdown procedure, we no

longer find evidence against the null at the 10% level for any predictor at any horizon. For example,

in our out-of-sample tests, Asset Turnover again appears to be a strong predictor. It exhibits

positive out-of-sample R-squared values at every forecasting horizon, with an impressive R-

squared of 17% at the one-year horizon. However, the corresponding Romano and Wolf (2005)

adjusted p-value is not statistically significant at any of the usual levels. Overall, we find little

evidence of any return predictability in out-of-sample regressions after accounting for the number

of cross-sectional predictors and their interdependence.

Our paper makes a number of contributions. First, our paper provides context for existing

papers which propose that a particular cross-sectional predictor should be transformed into a time

3 While there are 268 stationary predictors in total, we lose five predictors in our out-of-sample regressions because

they do not have long enough time-series to conduct out-of-sample regressions.


6

series one. Our paper says that existing results in the literature, which have economically large

coefficients and impressive t-stats, would obtain often enough by chance if we ran tests across our

268 predictors. Thus, absent a reason for a specific predictor’s conversion from cross-sectional

predictor to time-series predictor, our results suggest that the results for a given single predictor

should be interpreted with caution.

Second, our results provide new insight into the nature of return predictability, both in the

cross-section and the time-series. By aggregating cross-sectional anomalies into time series

variables, we are able to understand whether cross-sectional anomaly variables contain

information about both the idiosyncratic and systematic components of returns. Our results suggest

they largely contain idiosyncratic information.

Finally, we also contribute to the extensive literature on predicting the equity risk premium.

While Goyal and Welch (2008) show that 14 popular time-series variables do not significantly

predict returns in out-of-sample tests, subsequent papers have documented evidence of return

predictability using firm-level variables aggregated across stocks (e.g., Hirshleifer, Hou, and Teoh

(2009), Rapach, Ringgenberg, and Zhou (2016)). Our results extend these findings by showing

that several other cross-sectional anomalies can be aggregated to form time series predictors.

However, like Goyal and Welch (2014), we find weak evidence of predictability out-of-sample

among these new predictors.

The remainder of this paper proceeds as follows: Section I briefly describes the existing

literature and outlines the theoretical relation between cross-sectional predictive variables and time

series return predictability. Section II describes the data used in this study. Section III characterizes

our findings and Section IV concludes.


7

I. Background

Financial researchers have examined the predictability of stock returns for over a century

(e.g., Gibson (1906)) and a large literature has documented evidence of predictability in the cross-

section of stock returns. A separate literature has examined the predictability of the equity risk-

premium using time-series predictive variables. Yet, to date, these two literatures have evolved

relatively independently. We connect these two literatures by creating a sample of time-series

predictors based on 140 well-known cross-sectional predictors.

A. Time Series Return Predictability

A number of papers find in-sample evidence of time series return predictability, but out-of-

sample evidence is rare, suggesting that many predictors are the result of data snooping (i.e.,

overfitting). For example, Bossaerts and Hillion (1999) use model selection criteria from the

statistics literature to choose candidate predictors, which allows them to partially avoid data

snooping biases, yet they find that the resulting predictors are unable to forecast returns in out-of-

sample tests. Similarly, Goyal and Welch (2008) examine 14 popular predictors from the existing

literature and find that they fail to forecast the equity risk premium in out-of-sample tests. Cooper

and Gulen (2006) note that researchers have many different choices regarding the specification of

predictability tests, including the predictor variables, the estimation periods, and the assets being

forecasted. They perform specification searches across these parameters and find that return

predictability results are highly sensitive to these parameter choices. More recently, Bartsch,

Dichtl, Drobetz, and Neuhierl (2017) examine a wide variety of possible permutations of the

predictors in Goyal and Welch (2008) and the technical predictors in Neely, Rapach, Tu, and Zhou


8

(2014) and find that most out-of-sample performance for these variables appears to be due to data

snooping.

In light of the poor performance of many predictive variables in out-of-sample tests,

researchers have focused on developing methodologies that are robust to data snooping concerns.

Foster, Smith, and Whaley (1997) develop a procedure to account for data snooping biases when

evaluating the fit of predictive regressions. White (2000) develops a reality check bootstrap (RCB)

to account for data snooping biases that result from specification searches, and Sullivan,

Timmermann, and White (1999) show how to apply the RCB procedure using a set of technical

trading rules. While the White RCB procedure determines whether the best predictor among a

group is statistically significant after adjusting for data snooping biases, Romano and Wolf (2005)

show how to adjust the p-value for each individual predictor to account for data snooping biases.

To the best of our knowledge, we are the first to apply the Romano and Wolf (2005) stepdown

procedure to a large set of predictive variables derived from existing academic studies.

B. Cross-sectional Return Predictability

While the literature on time series return predictability has generally found that most

predictors fail to perform in out-of-sample tests, a large literature finds evidence of return

predictability in the cross-section of stocks. Consequently, a number of recent papers have

examined the multi-dimensionality of the cross-section of returns. McLean and Pontiff (2016)

examine 97 anomalies in the extant literature and find many anomalies exhibit lower returns

following publication in an academic journal. Green, Hand, and Zhang (2017) examine 94 firm

characteristics simultaneously and find that only 12 of them reliably contain information about

returns. Harvey, Liu and Zhu (2016) note there are more than 300 factors (many of them are not


9

based on cross-sectional variables) in the academic literature, and Harvey and Liu (2016) examine

these factors after accounting for potential data snooping. Finally, Hou, Xue, and Zhang (2017)

create and replicate more than 400 different cross-sectional predictors and find that many of these

variables do not predict returns in various subsamples with weighting schemes that favor the

largest stocks.4 However, they do not examine the relation between these predictors and aggregate

market returns as we do in this paper.

B. The Information in Cross-Sectional Anomaly Variables

While there are extensive literatures on both time series predictability and cross-sectional

predictability, they have largely evolved independently, with a few notable exceptions. Several

papers show that firm-level anomalies aggregate to market-wide predictors including Pontiff and

Schall (1998) with book-to-market ratios, Campbell and Shiller (1988) with P/E ratios, and

Chordia et al. (2002) with insider trading. More recently, Hirshleifer, Hou, and Teoh (2009) find

that firm-level accruals and cash-flow, when aggregated across stocks, contain information about

market returns, and Wen (2019) shows that aggregate asset growth predicts market returns. Finally,

Rapach, Ringgenberg and Zhou (2016) show that firm level short interest aggregates to form one

of the strongest known predictors of market returns.

C. The Information in Idiosyncratic Anomaly Variables

In this paper, our goal is to understand the relation between cross-sectional return

predictability and time-series return predictability. While it may seem natural that cross-sectional

4 Their set of characteristics is larger than ours since they use multiple versions of the same variable, e.g., using

quarterly and annual data and forecasting returns over 1-month and 1-year. Also, they include predictors that have

not been introduced in previous literature. In contrast, we use one version of each predictor, and the predictor needs

to be documented in a published study.


10

return predictors should aggregate across assets to generate time-series return predictors, it is

possible to have one without the other.5 Specifically, note that cross-sectional anomaly variables

could predict returns because they forecast either the idiosyncratic component of returns or the

systematic component of returns. As such, cross-sectional return predictors do not necessarily

aggregate to form good time series predictors. To see this, define a variable 𝑋𝑖,𝑡𝑖𝑑𝑖𝑜 as an

idiosyncratic anomaly if it forecasts the idiosyncratic portion of stock returns for asset i on date

t+1. Using the market model, we can write the return on asset i as:

𝑅𝑖,𝑡 = 𝑅𝑓 + β𝑖(𝑅𝑚,𝑡 − 𝑅𝑓) + 𝜀𝑖,𝑡, (1)

where 𝑅𝑖,𝑡 is the return on stock i on date t, 𝑅𝑓 is the risk-free rate, 𝑅𝑚,𝑡 is the market return, and

is the idiosyncratic portion of stock i's return. We define an idiosyncratic anomaly as a variable

𝑋𝑖,𝑡−1𝑖𝑑𝑖𝑜 that satisfies 𝛾1 ≠ 0 in a linear regression of the form: 6

𝜀𝑖,𝑡 = 𝛾0 + 𝛾1𝑋𝑖,𝑡−1𝑖𝑑𝑖𝑜 + 𝜔𝑖,𝑡, (2)

where 𝜀𝑖,𝑡 is the abnormal return from the market model (Sharpe (1964); Lintner (1965)). In other

words, an idiosyncratic anomaly, by definition, forecasts the portion of asset i’s return that is not

explained by aggregate market movements. However, while 𝑋𝑖,𝑡−1𝑖𝑑𝑖𝑜 contains information about

individual stock returns, it will not aggregate to generate market return predictability. To see this,

5 Indeed, several existing papers find that firm-level relations do not hold at the aggregate level. Kothari, Lewellen,

and Warner (2006) document a negative relation between returns and earnings surprise at the aggregate level, in

contrast to the positive relation documented at the firm-level. Similarly, Hirshleifer, Hou, and Teoh (2009) find that

the relation between accruals and returns changes sign between firm-level and aggregate-level analyses. 6 For simplicity, we ignore the sign of the abnormal return and define an anomaly as any variable that predicts abnormal

returns in either direction.


11

average equation (2) across assets all N stocks in the economy and multiply both sides by 𝑚𝑐𝑖,𝑡

∑ 𝑚𝑐𝑁𝑖 𝑖,𝑡

,

where 𝑚𝑐𝑖,𝑡 is the market capitalization of stock i on date t:

𝑚𝑐𝑖,𝑡

∑ 𝑚𝑐𝑁𝑖 𝑖,𝑡

∑ [𝑅𝑖,𝑡𝑁𝑖=1 − 𝑅𝑓 − β𝑖(𝑅𝑚,𝑡 − 𝑅𝑓)] = 𝛾0̅ + 𝛾1̅𝑋𝑡−1

𝑖𝑑𝑖𝑜̅̅ ̅̅ ̅̅ ̅, (3)

where the bar above a variable denotes the value-weighted mean. It is simple to show that the left-

hand side of equation (3) is equal to zero. Thus, the value-weighted idiosyncratic anomaly variable

𝑋𝑡−1𝑖𝑑𝑖𝑜̅̅ ̅̅ ̅̅ ̅ contains no information about aggregate market returns.

D. The Information in Systematic Anomaly Variables

While idiosyncratic anomalies contain no information about aggregate market returns, it is

possible to have a variable that predicts information in the cross-section that contains information

about the aggregate risk-premium. Define a variable 𝑋𝑖,𝑡−1𝑠𝑦𝑠𝑡

as a systematic anomaly if it forecasts

the systematic portion of stock returns for asset i on date t+1. Thus, define a systematic anomaly

as a variable 𝑋𝑖,𝑡−1𝑠𝑦𝑠𝑡

that satisfies 𝛾1 ≠ 0 in a linear regression of the form:

β𝑖(𝑅𝑚,𝑡 − 𝑅𝑓) = 𝛾0 + 𝛾1𝑋𝑖,𝑡−1𝑠𝑦𝑠𝑡

+ 𝜔𝑖,𝑡. (4)

Because the market beta is 1, it is easy to show that the left-hand side of equation (4) implies a

direct linear relation between the predictor variable and the market risk-premium. Notice also that

this relation goes in both directions: if a time-series predictor is constructed from individual assets,


12

it must contain information about the systematic portion of individual asset returns.7,8 This

implication provides additional economic information to test the validity of proposed predictors.

In other words, when evaluating predictors constructed from individual characteristics, we should

focus on the subsample of individual characteristics that contain information about individual asset

returns. Accordingly, in the rest of the paper, we examine the aggregate information in a subset of

140 anomalies that have been previously shown to contain information about individual asset

returns (McLean and Pontiff (2016)).

II. Data

To examine the relation between cross-sectional anomaly variables and the equity risk

premium, we combine daily data from the Center for Research in Security Prices (“CRSP”) and

Compustat over the period 1926 through 2017. We first construct aggregate time series variables

for each of the 97 cross-sectional anomalies in McLean and Pontiff (2016) and we supplement this

dataset with 43 additional variables from the extant literature to arrive at 140 candidate predictors.9

Table 1 contains summary statistics for the cross-sectional variables we use to form time-series

predictors. Of the 140 variables, 36 are classified in the Event sub-category, 36 are classified as

Fundamental, 31 are classified as Market, 15 are classified as Valuation, and the remaining 22 are

classified as Opinion. We also define two other sub-categories: Popular and Best Cross-Sectional.

Popular consists of the ten most highly cited predictors in our sample based on Google scholar

7 In the market model (Sharpe (1964); Lintner (1965)), the systematic portion of stock returns reflects compensation

for bearing systematic risk. However, outside of the market model, a common component of returns could exist that

is not related to systematic risk (for example, consumer sentiment). Similar to the systematic portion of returns in

the market model, such a variable could be related to the cross-section of returns and, since it has a common

component, it could aggregate to contain information about the equity risk premium. 8 Theoretically, it is possible that investors switch from gathering systematic information to gathering idiosyncratic

information depending on economic conditions as in Kacperczyk, Van Nieuwerburgh, and Veldkamp (2016). As a

result, some variables could contain idiosyncratic information at times, and systematic information at other times. 9 See the Appendix for an overview of the construction of these 140 variables.


13

citations of the original paper documenting the cross-sectional result. The mean number of

citations for the Popular category is 9,906 as of 2018. Similarly, Best Cross-Sectional consists of

the ten predictors with the greatest cross-sectional t-statistic. The Best Cross-Sectional predictors

exhibit strong cross-sectional predictive power; indeed, the mean cross-sectional t-statistic for

these predictors is approximately 9.

We construct two time-series predictors based on the equal-weighted average and value-

weighted average. We test each time-series predictor for a unit root using an Augmented Dickey-

Fuller (1979) test. Because some of the resulting time-series variables are non-stationary, we then

proceed as follows: if the raw (untransformed) variable is stationary, we use it in our tests. If not,

we calculate the deviations from a linear trend model for each variable; if that is stationary then

we use it. If not, we calculate the first-difference of each variable; if that is stationary we use it,

otherwise we drop the variable. For the linear trend, we estimate a model of the form:

xt = a + bt + ut for t = 1, . . . ,T, (5)

for each predictor variable xt and time period t. We take the fitted residual, �̂�𝑡, as our de-trended

measure. By construction, �̂�𝑡 has a mean of zero and we normalize it to have a standard deviation

of one.10

Finally, we also note that some of our cross-sectional predictors should (theoretically)

aggregate to form a variable that is constant across time (e.g., CAPM beta). Accordingly, after

constructing time-series versions of each variable, we examine each variable’s time-series standard

deviation and we drop predictors that exhibit little to no time-series variation.11

10 For our in-sample analyses, we estimate the linear trend model using all available data. For our out-of-sample

analyses, we estimate the trend model only using data available at each point in time to avoid a look-ahead bias. 11 Formally, we calculate a measure of variation for each variable as the ratio of its time-series standard deviation to

the absolute value of its time-series mean, and we drop variables with a ratio less than 0.06.


14

The database has a possibility of 280 different predictive variables (140 equal-weighted and

140 value-weighted variables). Of these, 268 variables survive the screening procedures discussed

above; we use these 268 variables in our predictive regressions. Finally, we combine the time

series predictors with data on the equity risk premium. We calculate the equity risk premium as

the log return on the S&P 500 index minus the log return on a one-month Treasury bill.12

III. Results

In this section, we examine whether cross-sectional anomalies, in general, contain

information about the equity risk premium. We start by examining in-sample tests that use the

entire sample of data and estimate a single parameter estimate from a time-series regression of the

market risk premium on the predictor. We then examine out-of-sample tests that use rolling

regressions to test whether a variable is useful for predicting the future equity risk premium, using

only information available at each date.

A. In-Sample Tests

As discussed in Section II, above, we start with 140 anomaly variables from the existing

literature and, using these, we form 268 candidate predictor variables. As in McLean and Pontiff

(2016), the sample length for each predictive variable depends on data availability. Some

predictors have data available as far back as 1926, while other variables have samples that start

more recently. In our in-sample tests, we allow the length of the time series examined to vary,

depending on data availability.

For each variable, we run predictive regression models of the form:

12 We download this data from Amit Goyal’s website (http://www.hec.unil.ch/agoyal/).


15

rt:t+h = α + βxt + εt:t+h for t = 1, . . . ,T −h, (6)

where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t

from CRSP including dividends and excess of the monthly risk-free rate from Goyal and Welch

(2008), xt is the predictor variable, and h denotes the forecast horizon. We examine four different

forecast horizons: one month ahead, one quarter ahead, one-half year ahead, and one year ahead

(i.e., h = 1, 3, 6, or 12). For each predictor at each forecast horizon, the regression is run using all

available data, leading to a single parameter estimate (β) that measures the predictive ability of the

candidate variable at that horizon.13

The results are shown in Tables 2 and 3. In Table 2, we provide a summary of the

performance of the candidate predictors, broken out by various sub-categories. We report the

fraction of predictors that are statistically significant at the 10% level or better using t-statistics

computed using a stationary block bootstrap (Politis and Romano (1994)) with 1000 draws to

account for the Stambaugh (1999) bias and the fact that the model uses overlapping observations

when h > 1 (Goetzmann and Jorion (1993); Hodrick (1992); Nelson and Kim (1993)).14 Panel A

displays the results from all 268 predictors, as well as seven sub-categories: (1) Popular, defined

as the 10 most commonly used anomalies in the cross-sectional literature based on Google Scholar

Citations; (2) Best Cross-sectional, defined as the 10 most statistically significant anomalies in the

cross-sectional literature; (3) Event, defined as predictors that are based on events within the firm

or external events that affect the firm; (4) Fundamental, defined as predictors constructed from

13 In untabulated results, we have estimated equation (6) using the weighted least squares method of Johnson (2019).

This estimator does not affect our conclusions. 14 Specifically, we resample the original data using a stationary block bootstrap with mean block size of 5. We have

also examined larger block sizes (10, 25, and 50). To avoid the overlapping dependent variable issue, we have also

tried aggregating the independent variable rather than the dependent variable as suggested by Cochrane (1991) and

Jegadeesh (1991); our general conclusions - about the lack of statistically significant predictability after accounting

for the number of tests run - do not change under any of these alternate approaches.


16

financial statement data; (5) Market, defined as predictors that can be constructed using only

financial data; (6) Opinion defined as predictors based on the opinions or actions of analysts,

institutions, and insiders, and (7) Valuation, defined as predictors that are ratios where one of the

numbers reflects a market value and the other reflects fundamentals.

For the entire sample of predictors, we find evidence of return predictability. At the one-

month ahead forecasting horizon, 34 of the 268 predictors (13%) are statistically significant at the

10% level or better. This number increases monotonically as the forecast horizon increases, to 43

(16%), 45 (17%), and 54 (20%) at the 3-month, 6-month, and 12-month horizon. When we

examine the seven sub-categories defined above, the results are similar. Interestingly, the results

are generally weakest for the Valuation and Opinion categories. Since valuation ratios may be a

function of discount rates, they are arguably the most likely cross-sectional predictors to work in

a time-series setting (Kelly and Pruitt (2013)). Indeed, several different Valuation predictors have

been studied in the existing market risk premium literature, notably the dividend-to-price and

earnings-to-price ratios. Our results suggest these predictors are an exception, rather than the norm.

In general, Valuation predictors are weaker than the Event, Market, and Fundamental predictors

and they are only slightly better than Opinion predictors. Across all horizons, only 14% of

Valuation predictors are statistically significant, versus 16% across all predictors.

When we examine the results for the “best” cross-sectional variables, we find evidence that

strong cross-sectional predictors are more likely to be strong time-series predictors. Specifically,

31% of Popular predictors, those in the top ten by citation count, are statistically significant at the

one-year horizon. Similarly, 20% of Best Cross-Sectional predictors, those with the top ten highest

cross-sectional t-statistics, are statistically significant at the one-year horizon. The results are

consistent with the idea that the best cross-sectional variables forecast returns because they contain


17

information about the systematic component of returns. As a result, these variables also aggregate

to form good time-series predictors.

We then examine the results broken out by the different methodologies used to construct the

aggregate predictor variables. Specifically, in panels B and C, we examine the results when we

calculate the aggregate predictor using a value-weighted average or an equal-weighted average,

respectively. The results are largely consistent across the two methodologies: across all predictors,

15% are statistically significant at the one-month horizon when value-weighted and 10% are

statistically significant at the one-month horizon when equal-weighted. As the horizon extends,

the value-weighted predictors appear to perform slightly better, but the results are generally similar

across the two groups. For example, across all predictors, 24% are statistically significant at the

annual horizon when value-weighted versus 16% when equal-weighted. Overall, the results in

Table 2 suggest some cross-sectional anomaly variables can be used to construct aggregate return

predictors.

A.1 Multiple Hypothesis Testing

Of course, the results in Table 2 were based on 268 different regression models; as such, the

results are subject to concerns about data snooping. Put differently, with 268 different regression

models, we are likely to find some predictors that are statistically significant simply because of

type I errors. Fortunately, a growing literature shows how to adjust p-values to account for the

number of models considered.

White (2000) develops a reality check bootstrap (RCB) to correct for data-snooping. While

a number of approaches exist to adjust p-values for the bias that results from multiple hypothesis

testing, the White approach has several desirable properties. First, it uses a bootstrap procedure to


18

estimate the dependence structure of the p-values across all considered models. In contrast, the

Bonferroni (Dunn (1961) and Holm (1979) approaches assume the worst-case dependence

structure. This causes them to be overly conservative in that they do not reject the null hypothesis

enough when the null is false. Because it estimates the actual dependence structure, the White

RCB has greater power than the Bonferroni and Holm methods. Second, the White procedure is

particularly suited to the application of return predictability regressions because the procedure uses

a loss function that compares the performance of a predictor to a benchmark model. Economic

theory suggests the equity risk premium should be positive; as a result, a strategy that simply

predicts positive returns all the time would frequently be correct. Accordingly, the return

predictability literature often compares the forecast accuracy of a predictive variable to the so-

called “prevailing mean” model which uses the prevailing mean return as the forecast of next

period’s equity risk premium. The White RCB also uses a loss function that compares the mean

squared forecast error for each predictor to the mean squared forecast error from a benchmark

model that uses the prevailing mean return.

However, despite these advantages, the White procedure does have a drawback: if the null

is rejected, it indicates that the best predictor examined is better than the benchmark, but it does

not reveal whether other predictors are better. Put differently, the White procedure does not test

whether the second-best predictor, or the kth best predictor, is better than the benchmark.

Accordingly, Romano and Wolf (2005) develop a step-down procedure that extends the White

procedure to calculate whether individual predictors are better than the benchmark.15 The resulting

procedure thus controls the familywise error rate and provides p-values for each individual

predictor. Romano and Wolf (2005) also argue that in most cases a studentized test statistic (like a

15 See Romano and Wolf (2016) for a detailed discussion of the procedure.


19

t-statistic) should be used in ranking predictors and we adopt this approach as well.16 Formally,

we proceed as follows for the case of predicting the 1-month ahead equity risk premium (the

procedures are analogous for the 3-month, 6-month, and 12-month cases):

1) For each predictor, find the OLS coefficient (�̂�) and its associated t-statistic from

Equation (6). Rank these t-statistics by their absolute value from 1 to 268. For the nth

ranked predictor, call the t-statistic |t|n.

2) For each predictor, calculate an adjusted dependent variable r*t:t+1 where r*t:t+1 = rt:t+1 -

�̂�xt. Now, for each predictor x, there is no relation between r* and x by construction (i.e.

if we regressed r* on x in the full sample the coefficient and associated t-stat for x would

be zero). We use r* (rather than r) when building bootstrapped samples so that we

operate under the null of no predictability from x.

3) Create a T × 268 matrix for the dependent r* values where rows index time (T) and

columns index the predictors. Call this matrix R*. Also create a T × 268 matrix for the

independent x values where rows index time and columns index the predictors. Call this

matrix X.

4) Bootstrap row samples from the R* and X matrices using the stationary bootstrap (Politis

and Romano (1994)).17 Bootstrapping by row preserves the cross-sectional dependence

in the data. Create 1000 bootstrapped datasets of dependent and independent variables.

16 We use studentized test statistics as recommended in Hansen (2005) and Romano and Wolf (2005), and we

calculate the studentization inside each bootstrap sample as recommended by Romano and Wolf (2005). We thank

Allan Timmermann for suggesting this, and for helpful conversations about the White (2000) and Romano and Wolf

(2005) procedures. 17 We use the bootstrap procedure in Politis and Romano (1994) which preserves the underlying time-series properties

of the data. For each predictor, we use 1000 replications with a mean block size of 5. Sullivan, Timmerman, and White

(1999) show the procedure is robust to various block size choices.


20

5) For each bootstrap dataset and each predictor, regress the dependent variable on the

independent variable and save the corresponding coefficient and t-statistic.

6) For the nth ranked predictor (from step 1), construct a list of the 1 through n-1 ranked

“better” predictors (also from step 1) and calculate the maximum absolute t-statistic in

each bootstrap dataset after this list of better predictors has been removed. Then the

Romano and Wolf (2005) p-value for the nth ranked predictor is the fraction of these

maximum absolute t-statistics from the bootstrapped datasets whose value is greater than

|t|n.18

We are one of the first papers to use this procedure to evaluate the predictability of the

market’s risk premium. In this area, the closest related papers are Sullivan, Timmerman, and White

(1999) and Chordia, Goyal, and Saretto (2018). Sullivan, Timmerman, and White (1999) apply the

White procedure to examine the performance of technical trading strategies in predicting the equity

risk premium. Sullivan et al. find no evidence that technical trading strategies can generate

portfolio performance that out-performs a benchmark. More recently, Chordia, Goyal, and Saretto

(2018) examine the performance of trading strategies in the cross-section and they use several

methods to correct for multiple hypothesis testing bias, including the Romano and Wolf (2005)

procedure. They find that most strategies that have been studied in the extant literature are not

significant after adjusting for multiple hypothesis testing. Our paper unifies these two literatures

by providing the first evidence on the performance of time-series predictors formed from cross-

sectional variables.

18 Following Romano and Wolf (2005) we also impose the monotonicity condition that the p-value for the nth ranked

predictor must be less than or equal to the p-value for the (n+1)th ranked predictor.


21

A.2 In-Sample Results Adjusted for Multiple Hypothesis Testing

In Table 3, we display detailed estimates for individual predictors that are statistically

significant before adjusting for multiple testing. The table presents coefficient estimates, R-

squared values, and both raw and Romano and Wolf (2005) adjusted p-values. Here we reach the

main conclusion of the paper: most of the cross-sectional variables that appeared to be statistically

significant when examined in isolation no longer are when examined in the context of all cross-

sectional predictors.

For example, at the 1-month and 3-month horizons, none of the predictors remain significant

at the 10% level when computing Romano and Wolf (2005) adjusted p-values compared to 34 and

43, respectively, when computing individual p-values (Table 2). At the 6-month horizon, there are

2 predictors that are significant at the 10% level (compared to 45 in Table 2) and at the 12-month

horizon there are 3 predictors significant at the 10% level (compared to 54 in Table 2). Moreover,

the predictors with remarkable statistical significance when examined in isolation, no longer

appear to be so stellar when examined among the set of 268 predictors. For example, when

computing individual p-values, 66 predictors are significant at the 1% level across all horizons;

however, when computing Romano and Wolf (2005) adjusted p-values 0 predictors are significant

at the 1% level across all horizons.

In terms of economic significance, a number of predictors are noteworthy. Z-score and Asset

Turnover, with R-squared values of 4.6% and of 4.7% shown in column (8), are among the best

predictors at the three-month horizon. Moreover, several other predictors have R-squared values

exceeding 3% including Short Interest, Percent Operating Accrual, and Target Price. Although

these R-squared values might seem small in absolute magnitude, Zhou (2010) notes that R-squared


22

values from predictive regressions are typically small in absolute magnitude as stock returns are

difficult to forecast. Accordingly, Zhou (2014) builds on the insights of Ross (2005) to construct

a mathematical bound on the maximum R-squared that can exist under no arbitrage conditions.

Using consumption growth rates as a state variable, Zhou’s bounds imply a maximum R-squared

at the monthly horizon of between 0.079% and 0.177%19 and Huang and Zhou (2017) show that

the quarterly R-squared is bound by at most 3.74%, depending on the specification, and in most

cases it is less than 3%. In light of this, the return predictability for some predictors documented

in Table 3 seems economically large.20

In addition, we note that many of the best predictors at the 3-month horizon are also good

predictors at the six-month and twelve-month horizons. At the twelve-month horizon, a number of

the predictors have impressive R-squared values of 10% or greater including Z-score, Short

Interest, and Percent Operating Accrual. For example, Z-score exhibits a remarkable R-squared

of 18.5% at the twelve-month horizon.

Despite some impressive results with individual predictors, the big-picture takeaway from

the Romano and Wolf (2005) exercise remains: if we began with 268 predictors that, by

construction had no ability to predict the market return, we would find a handful of predictors with

similarly impressive results often enough simply by chance.

19 The bounds developed in Zhou (2010) depend on the choice of a state variable. We do not take a stance on state

variables in this paper, we simply note that the R-squared values we document appear economically meaningful

relative to the example bounds presented in Zhou (2010). 20 We also construct predictors based on sets of cross-sectional predictors using principal component analysis

(PCA). PCA requires non-missing observations for each predictor; as a result, we are not able to utilize the entire set

of predictors until 1999. In untabulated results, the first principal component extracted from all equal-weighted

predictors and value-weighted predictors exhibits strong return predictability over the period 1999 to 2017, before

adjusting for multiple testing. However, if we estimate the PCA using the set of variables with data starting in 1980,

the evidence becomes mixed; we find weak evidence of return predictability for value-weighted predictors, but no

evidence of return predictability for equal-weighted predictors. We thank Bryan Kelly for suggesting this.


23

B. Out-of-Sample Tests

A number of papers note that in-sample tests may overstate predictability due to the use of

information that was not known ex-ante (e.g., Cooper, Gutierrez, Marcum (2005), Goyal and

Welch (2008)). Accordingly, in this section, we revisit each of our tests using out-of-sample

forecasting regressions. We again run predictive regressions of the form:

rt:t+h = αt + βtxt + εt:t+h for t = 1, . . . ,T −h, (7)

where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the excess return on the S&P500, and xt is the predictor

variable. However, now we estimate the model separately for each time period, using only

information that was available at each date. As such, we estimate new parameter estimates for αt

and βt at each point in time. If the relation between the predictor variable and the equity risk

premium is stable over time, then this out-of-sample approach should give the same results as the

in-sample analysis discussed in section III.A. However, if the relation between the predictor

variable and the equity risk premium is not stable, the out-of-sample tests may lead to a different

conclusion.

As previously discussed, the sample length for each predictive variable depends on data

availability. For the out-of-sample tests, we require a minimum of 10 years of data to train the

model before we make our first forecast. To ensure we have an adequate time series to train the

model and evaluate the resulting forecasts, we require that each cross-sectional anomaly variable

has at least 20 years of data to be included in the out-of-sample analyses. This additional sample

filter eliminates several of the anomaly variables from consideration; of the 280 possible

specifications, we are left with 263 that are stationary and have long enough time series.

As before, we start by summarizing the results across all specifications. Table 4 provides a

summary of the performance of the candidate predictors. To make inferences, we calculate the out-


24

of-sample 𝑅𝑂𝑆2 statistic defined as in Campbell and Thompson (2008).21 To calculate the out-of-

sample 𝑅𝑂𝑆2 , we use the prevailing mean equity risk premium, at each date, as our benchmark

model and we use the Clark and West (2007) statistic to assess statistical significance.22 Panel A

summarizes statistical significance for all 263 predictors, as well as the seven sub-categories

(Popular, Best Cross-sectional, Event, Fundamental, Market, Opinion, and Valuation). Recall that

in Table 2, the in-sample evidence was strongest at the annual horizons, however, even at the 1-

month horizon approximately 13% of the predictors were statistically significant at the 10% level

or better. In contrast, the out-of-sample evidence is weaker. At the one-month horizon, only 3% of

the variables are statistically significant and while this number increases monotonically as the

forecast horizon increases, it is still only 17% at the twelve-month horizon (versus 20% in Table

2). When we examine the individual sub-categories, the results do not look much better. Again,

the Valuation and Opinion sub-categories appears to be the worst predictor, despite the popularity

of Valuation predictors in the extant literature. Moreover, while the Popular and Best Cross-

Sectional categories continue to show some evidence of predictability at longer horizons, none of

the predictors in these categories are significant at the one-month horizon.

In the remaining panels of Table 4, we examine the out-of-sample results broken out by the

different methodologies used to construct the aggregate predictor variables. Specifically, in panels

B and C, we examine the results when we calculate the aggregate predictor using a value-weighted

average or an equal-weighted average, respectively. Again, the results look similar, regardless of

the methodology used to construct the predictors. Only 4 of 135 value-weighted predictors are

21 We use the unconstrained out-of-sample R-squared from Campbell and Thompson (2008) (i.e., we do not impose

any sign restrictions). 22 Formally, we test the null hypothesis that the mean square forecast error (MSFE) from the baseline model is less

than or equal to the MSFE from the predictive model versus the alternative hypothesis that the MSFE from the

benchmark model is greater than the MSFE from the predictive model (𝐻0: 𝑅𝑂𝑆2 ≤ 0 against (𝐻𝐴: 𝑅𝑂𝑆

2 > 0).


25

significant at the one-month horizon and only 3 of 128 equal-weighted predictors are significant.

Overall, the results show some evidence that cross-sectional variables contain systematic

information at longer horizons. Taken together, it is tempting to conclude that some cross-sectional

predictors can be used to form aggregate predictors, suggesting they contain information about the

systematic component of returns. However, our out-of-sample analyses have considered more than

260 different specifications. In the next section, we revisit our results after accounting for the

number of tests we have run.

B.1. Out-of-Sample Results Adjusted for Multiple Hypothesis Testing

As before, we ask whether our out-of-sample tests show evidence of return predictability

after accounting for possible data snooping biases. To do this, we again use the Romano and Wolf

(2005) procedure. The procedure follows a similar process as outlined in section 3.A.1, except we

run rolling regressions and compare the prediction from these regressions to a rolling average

market risk premium. Formally, we proceed as follows for the case of predicting the 1-month

ahead equity risk premium (the procedures are analogous for the 3-month, 6-month, and 12-month

cases):

1) At each point in time, forecast the equity risk premium next period using the benchmark

model (i.e., using the prevailing mean equity risk premium from the start of the sample

until the current date).

2) At each point in time, forecast the equity risk premium next period using out-of-sample

regressions for each of the 263 candidate predictor variables on each date.

3) For each date and predictor, calculate a loss function as in White (2000) by comparing

the mean square forecast error of the predictor to the mean square forecast error of the


26

benchmark model. Call these values 𝑓x,t+1, where x indexes the predictor variable (in our

case, x ranges from 1 to 263). Call the T × 263 matrix of 𝑓x,t+1 values matrix F.

a. For each predictor, calculate the time-series mean of 𝑓x,t+1 and call it 𝑓x̅.

4) Bootstrap rows of loss function values (𝑓x,t+1) from matrix F using a stationary block

bootstrap (Politis and Romano (1994)) to ensure that the bootstrap accurately estimates

the sampling distribution.23

5) Calculate studentized test statistics based on the actual sample: Vx = 𝑛½�̅�𝑥

�̂� , where �̂� is an

estimate of the standard deviation of 𝑓�̅�.

6) For each bootstrap sample i, calculate the maximum test statistic across the models

considered Ti = max �̅�*x,i =

𝑛½(�̅�𝑥,𝑖∗ −�̅�𝑥)

�̂�∗ , where �̂�∗ is an estimate of the standard deviation

of (𝑓�̅�,𝑖∗ − 𝑓�̅�).

a. Sort the predictors based on their realized test statistics (Vx). Starting with the

best predictor, compute the p-value for predictor x as the percent of Ti

observations that exceed Vx.24

b. If the p-value exceeds the desired confidence level, stop. Otherwise, remove the

current predictor from the sample and repeat step 6.

In Table 5, we display detailed estimates for individual predictors that are statistically

significant before adjusting for multiple testing. The table presents the time-series mean of each

23 We use the bootstrap procedure in Politis and Romano (1994) which preserves the underlying time-series properties

of the data. For each predictor, we use 1000 replications with a mean block size of 5. Sullivan, Timmerman, and White

(1999) show the procedure is robust to various block size choices. As with the in-sample calculation, we bootstrap

by row, which preserves the cross-sectional dependence in the data. 24 Formally, we calculate the p-value as 100 × (M+1)/(N+1), where N is the number of observations and M is the

number of observations that exceed the test statistic.


27

coefficient estimate, the Campbell and Thompson (2008) out-of-sample R-squared, and both raw

and Romano and Wolf (2005) adjusted p-values. None of the predictors is significant, at any

horizon, after adjusting for multiple testing. Formally, we fail to reject the null of no predictably

at all forecasting horizon for all predictors using the Romano and Wolf (2005) stepdown

procedure.

Interestingly, despite the lack of statistical significance, the out-of-sample R-squared values

in Table 5 highlight many of the same predictors that performed well in Table 3, in the in-sample

analyses. Analogous to Table 3, we report the time-series mean of the betas from these regressions

in addition to the out-of-sample 𝑅𝑂𝑆2 statistic. Indeed, Z-score, Asset Turnover, Short Interest, and

Percent Operating Accrual all have out-of-sample R-squared values that are positive and

economically meaningful at the three-month, six-month, and twelve-month horizons. Once again,

Asset Turnover exhibits impressive results, with an out-of-sample R-squared value of 17.5% at the

twelve-month horizon. Because the out-of-sample 𝑅𝑂𝑆2 statistic measures the proportional

reduction in mean squared forecast error that results from using the predictor (relative to the

benchmark model), its magnitude is also useful for interpreting the economic significance of these

findings. For the four variables listed above, the out-of-sample 𝑅𝑂𝑆2 statistic suggests economically

large return predictability.

The results in this subsection echo the conclusions of subsection B. Many predictors

exhibit strong out-of-sample t-statistics. However, once multiple hypothesis testing is considered,

we find no evidence of time-series predictability from cross-sectional predictors


28

IV. Conclusion

There is a large literature examining the cross-sectional determinants of stock returns.

Similarly, many time series variables have been proposed as predictors of the equity risk premium.

While these literatures have evolved largely independently, a handful of papers have proposed

certain cross-sectional variables as candidates for time-series predictability. We examine the link

between cross-sectional and time-series predictability while accounting for data snooping by

considering a full set of 140 cross-sectional predictors. Our analyses provide new information on

the nature of return predictability. While it may seem natural that cross-sectional variables should

contain information about aggregate market returns, we show that this does not have to be the case.

Crucially, the relation depends on the nature of cross-sectional return predictability: if a firm-level

variable contains only information about the idiosyncratic component of returns, then it will not

be possible to generate a variable that significantly predicts the equity risk premium by simply

aggregating the firm-level variable across assets.

We find plenty of evidence that, in isolation, certain cross-sectional variables make great

time series predictors. Some of these variables, like Z-Score and Asset Turnover, have never been

proposed as time-series variables before. However, these results disappear once we use the

Romano and Wolf (2005) stepdown procedure to account for the data snooping bias arising from

the plethora of predictive variables being considered. Moreover, when we examine out-of-sample

forecasting regressions, we continue to find little evidence of return predictability. After we apply

the Romano and Wolf (2005) procedure to our out-of-sample results, we find no evidence of return

predictability.

If each of our nearly 300 variables were examined by different econometricians, it is likely

that several articles would be written discussing the powerful in-sample time-series information in


29

cross-sectional variables. Claims of predictability in these articles would likely be bolstered by

out-of-sample Goyal-Welch (2008) tests. Indeed, several such articles exist. In this paper, we take

a different approach. Once we consider the set of all existing cross-sectional variables documented

in the extant literature, the difference in conclusions is stark. The evidence no longer suggests that

cross-sectional variables can be aggregated to form time-series predictors. Put differently, our

results show that cross-sectional variables are able to forecast in the cross-section of returns

because they contain information about the idiosyncratic component of returns, not the systematic

component of returns.


30

REFERENCES

Baker, M. and Wurgler, J., 2000, The Equity Share in New Issues and Aggregate Stock Returns,

The Journal of Finance 55, 2219–2257.

Bartsch, Viktoria-Sophie, Hubert Dichtl, Wolfgang Drobetz, and Andreas Neuhierl, 2017, Data

Snooping in Equity Premium Prediction, Working Paper.

Bay, Raymond, 1978, Anomalies in relationships between securities' yields and yield-

surrogates, Journal of Financial Economics 6, 103–126.

Berk, Jonathan B, 1995, A Critique of Size-Related Anomalies, Review of Financial Studies 8,

275–286.

Bossaerts, P., and Pierre Hillion, 1999, Implementing Statistical Criteria to Select Return

Forecasting Models: What Do We Learn?, Review of Financial Studies 12, 405–428.

Campbell, John Y. and Robert J. Shiller, 1988, The Dividend-Price Ratio and Expectations of

Future Dividends and Discount Factors, Review of Financial Studies 1, 195–228.

Campbell, J.Y. , Thompson, S.B., 2008. Predicting excess stock returns out of sample: can

anything beat the historical average? Review of Financial Studies 21, 1509–1531.

Chordia, Tarun, Richard Roll and Avanidhar Subrahmanyam, 2002, Order Imbalance,

Liquidity and Market Returns, Journal of Financial Economics 65, 111–130.

Chordia, Tarun, Amit Goyal and Alessio Saretto, 2018, In Defense of Market Efficiency:

Evidence from Two Million Strategies, working paper.

Clark, T.E., West, K.D., 2007. Approximately normal tests for equal predictive accuracy in

nested models, Journal of Econometrics 138, 291–311.

Cochrane, John H., 1991, Volatility Tests and Efficient Markets: A Review Essay, Journal of

Monetary Economics 27, 463–485.

Cochrane, John H., 2008, The Dog That Did Not Bark: A Defense of Return Predictability,

Review of Financial Studies 21, 1533–1575.

Cooper, Michael, and Hyseyin Gulen, Is Time-Series-Based Predictability Evident in Real

Time?, Journal of Business 79, 1263–1292.

Cowles, Alfred, 1933, Can Stock Market Forecasters Forecast?, Econometrica 1, 309–324.

Dickey, D.A., and W.A. Fuller, 1979, Distribution of the estimators for autoregressive time

series with a unit root. Journal of the American Statistical Association 74, 427–431.


https://scholar.google.com/citations?view_op=view_citation&hl=en&user=TEVgnyQAAAAJ&cstart=60&sortby=pubdate&citation_for_view=TEVgnyQAAAAJ:2osOgNQ5qMEC

https://scholar.google.com/citations?view_op=view_citation&hl=en&user=TEVgnyQAAAAJ&cstart=60&sortby=pubdate&citation_for_view=TEVgnyQAAAAJ:2osOgNQ5qMEC

31

Dow, C. H., 1920, Scientific Stock Speculation, The Magazine of Wall Street.

Dunn, O.J., 1961, Multiple comparisons among means, Journal of the American Statistical

Association 56, 52-64.

Foster, F. Douglas, Tom Smith, and Robert E. Whaley, 1997, Assessing Goodness-of-Fit of

Asset Pricing Models: The Distribution of the Maximal R2, Journal of Finance 52, 591–

607.

Gibson, Thomas, 1906, The Pitfalls of Speculation. The Moody Corporation, New York.

Green, J., J. R. Hand and X. F. Zhang, 2017, “The Characteristics that Provide Independent

Information about Average U.S. Monthly Stock Returns, forthcoming Review of

Financial Studies.

Goetzmann, W.N. , Jorion, P. , 1993. Testing the predictive power of dividend yields. Journal

of Finance 48, 663–679.

Goyal, Amit, and Ivo Welch, 2008, A Comprehensive Look at the Empirical Performance of

Equity Premium Prediction, Review of Financial Studies 21, 1455-1508.

Hansen, Peter Reinhard, 2005, A Test for Superior Predictive Ability, Journal of Business &

Economic Statistics 23, 365–380.

Harvey, Campbell R., Yan Liu, and Heqing Zhu, 2016, …and the Cross-Section of Expected

Returns, Review of Financial Studies 29, 5–68.

Harvey, Campbell R., and Yan Liu, 2015, Lucky Factors, Working Paper.

Hirshleifer, David, Kewei Hou, and Siew Hong Teoh, 2009, Accruals, cash flows, and

aggregate stock returns, Journal of Financial Economics 91, 389–406.

Hodrick, R.J., 1992. Dividend yields and expected stock returns: alternative procedures for

inference and measurement, Review of Financial Studies 5, 357–386.

Huang, Dashan and Guofu Zhou, 2017, Upper Bounds on Return Predictability, Journal of

Financial and Quantitative Analysis 52, 401–425.

Hou, K., C. Xue, and L. Zhang, 2017, Replicating Anomalies, Working Paper.

Jegadeesh, N., 1991, Seasonality in Stock Price Mean Reversion: Evidence from the U.S. and

the U.K, Journal of Finance 46, 1427–44.

Johnson, Travis, 2019, A fresh look at return predictability using a more efficient estimator,

Review of Asset Pricing Studies 9, 1-46.


32

Kacperczyk, M. S. Van Nieuwerburgh, and L. Veldkamp, 2016, A rational theory of mutual

funds' attention allocation, Econometrica 84, 571–626.

Lewellen, J., 2004, Predicting returns with financial ratios, Journal of Financial Economics 74,

209–235.

Lintner, J., 1965, The Valuation of Risky Assets and the Selection of Risky Investments in

Stock Portfolios and Capital Budgets, The Review of Economics and Statistics 47, 13–

37.

Lo, Andrew W., and A. Craig MacKinlay, Data-Snooping Bias in Tests of Financial Asset

Pricing Models, Review of Financial Studies 3, 431–467.

McLean, R. David, and Jeffrey Pontiff, 2016, Does Academic Publication Destroy Stock

Return Predictability?, Journal of Finance 71, 5–32.

Neely, C.J., D.E. Rapach, J. Tu, and G. Zhou, 2014, Forecasting the Equity Risk Premium: The

Role of Technical Indicators, Management Science 60, 1772–1791.

Nelson, C.R. , Kim, M.J. , 1993. Predictable stock returns: the role of small sample bias,

Journal of Finance 48, 641–661.

Politis, D., and J. Romano, 1994, The Stationary Bootstrap, Journal of the American Statistical

Association 89, 1303–1313.

Pontiff, Jeffrey, and Lawrence Schall, 1998, Book-to-market as a predictor of market returns,

Journal of Financial Economics 49, 141–160.

Rapach, David E., Matthew C. Ringgenberg, Guofu Zhou, 2016, Aggregate Short Interest and

Return Predictability, Journal of Financial Economics 121, 46–65.

Romano, Joseph P. and Michael Wolf, 2005, Stepwise multiple testing as formalized data

snooping, Econometrica 73, 1237–1282.

Romano, Joseph P. and Michael Wolf, 2016, Efficient computation of adjusted p-values for

resampling-based stepdown multiple testing, Statistics and Probability Letters 113, 38–

40.

Ross, S.A., 2005, Neoclassical Finance. Princeton University Press.

Schwert, G.W., 2003. Anomalies and market efficiency. In: Constantinides, G.M., Harris, M.,

Stultz, R. (Eds.), Handbook of the Economics of Finance, 1B. Elsevier, Amsterdam, pp.

939–974.

Seyhun, H. Nejat, 1988, The Information Content of Aggregate Insider Trading, The Journal

of Business 61, 1–24.


33

Sharpe, W., 1964, Capital Asset Prices: A theory of market equilibrium under conditions of

risk, Journal of Finance 19, 425–442.

Stambaugh, R.F., 1999, Predictive regressions, Journal of Financial Economics 54, 375–421.

Sullivan, R., A. Timmermann, and H. White, 1999, Data-snooping, technical trade rule

performance, and the bootstrap, The Journal of Finance 54, 1647–1691.

Wen, Q., 2019, Asset Growth and Stock Market Returns: A Time-Series Analysis, Review of

Finance 23, 599–628.

White, H., 2000, A Reality Check for Data Snooping, Econometrica 68, 1097–1126.

Zhou, G., 2010, How much stock return predictability can we expect from an asset pricing

model?, Economic Letters 108, 184–186.


34

Table 1

Summary Statistics The table displays summary statistics for the cross-sectional variables from which we construct time-series

predictors. To construct time-series predictors out of cross-sectional variables, we calculate the value-weighted

and equal-weighted mean across all stocks on each date. The first row displays statistics for all predictors, and

the remaining rows examine sub-samples formed on the ten most popular cross-sectional predictors (Popular),

the ten most statistically significant cross-sectional predictors (Best Cross-Sectional), and five different

groupings based on the categories in McLean and Pontiff (2016): (1) Event, (2) Fundamental, (3) Market, (4)

Opinion, and (5) Valuation. We examine only nine Popular variables due to ties at the tenth value. See Section

3.C for a discussion of these categories. We display the mean and median values of cross-sectional t-statics as

well as the number of citations for each predictor from Google Scholar as of 2018.

(1) (2) (3) (4) (5) (6)

Cross-Sectional t-statistic Citations

Type N Mean Median Mean Median

All 140 3.03 2.71 1,432 517

Popular 9 3.11 3.06 9,906 8,345

Best Cross-sectional 10 9.24 8.63 847 782

Event 36 3.83 2.94 869 467

Fundamental 36 2.37 1.91 973 428

Market 31 3.12 3.06 2668 878

Opinion 22 2.56 2.04 798 689

Valuation 15 3.14 2.88 2262 273


35

Table 2

Summary of In-Sample Anomaly Performance The table displays a count of the number of predictive variables that are statistically significant at the 10% level

or better. For each anomaly, we estimate an in-sample predictive regression of the form:

rt:t+h = α + βxt + εt:t+h for t = 1, . . . ,T −h,

where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t from CRSP

including dividends and excess of the monthly risk-free rate from Goyal and Welch (2008), h indicates the

forecast horizon in months, and xt is one of the 140 predictor variables. To construct time-series predictors out

of cross-sectional predictors, we calculate the value-weighted and equal-weighted mean across all stocks on each

date resulting in 280 possible predictors. Because some of the resulting variables are non-stationary, we also

examine the first-difference of these predictors and we examine a linearly de-trended version of these predictors.

We use the first version of each variable that is stationary; 12 predictors are non-stationary in every form, so

there are 268 predictors. The table shows the number of predictors that are statistically significant relative to the

number of variables considered. Panel A summarizes the results for all predictors, Panel B examines value-

weighted predictors, Panel C examines equal-weighted predictors. In each panel, the first row displays results

for all variables in that panel, and the remaining rows examine sub-samples formed on the ten most popular

cross-sectional predictors (Popular), the ten most statistically significant cross-sectional predictors (Best Cross-

Sectional), and five different groupings based on the categories in McLean and Pontiff (2016): (1) Event, (2)

Fundamental, (3) Market, (4) Opinion, and (5) Valuation. See Section 3.C for a discussion of these categories.

(1) (2) (3) (4) (5)

Number significant predictors / Total number predictors

Predictive Variable h=1 h=3 h=6 h=12

Panel A: All Predictors

All Predictors 34/268 43/268 45/268 54/268

Popular 4/15 4/15 5/15 6/15

Best Cross-sectional 5/20 4/20 5/20 4/20

Event 7/69 9/69 14/69 18/69

Fundamental 10/66 13/66 14/66 11/66

Market 10/60 9/60 9/60 15/60

Opinion 6/44 9/44 3/44 2/44

Valuation 1/29 3/29 5/29 8/29

Panel B: VW Predictors

All Predictors 21/136 26/136 28/136 33/136

Popular 2/8 3/8 4/8 4/8


Event 4/35 4/35 8/35 11/35

Fundamental 7/34 10/34 11/34 7/34

Market 6/30 7/30 6/30 10/30

Opinion 3/22 3/22 1/22 1/22

Valuation 1/15 2/15 2/15 4/15

Panel C: EW Predictors

All Predictors 13/132 17/132 17/132 21/132

Popular 2/7 1/7 1/7 2/7


Event 3/34 5/34 6/34 7/34

Fundamental 3/32 3/32 3/32 4/32

Market 4/30 2/30 3/30 5/30

Opinion 3/22 6/22 2/22 1/22

Valuation 0/14 1/14 3/14 4/14


36

Table 3

Best In-Sample Predictive Regression Results using Romano and Wolf p-values The table reports the ordinary least squares estimate of β, p-values, and the 𝑅2 statistic from in-sample predictive

regression models of the form:


where rt:t+h = (1/h)(rt+1 +···+rt+h), rt is the continuously compounded S&P 500 return for month t from CRSP including

dividends and excess of the monthly risk-free rate from Goyal and Welch (2008), h indicates the forecast horizon in

months, and xt is the predictor variable shown in columns (2) and (9). Panel A displays results for the 1-month horizon,

Panel B shows the 3-month horizon, Panel C shows the 6-month horizon, and Panel D shows the 12-month horizon.

Within each panel, predictors are sorted by their Romano and Wolf p-value and then their unadjusted p-value. We

report all predictors that have unadjusted p-values less than 10% for a given horizon. Unadjusted p-values are shown

in columns (6) and (13) and Romano and Wolf (2005) adjusted p-values are shown in columns (7) and (14).

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)

EW EW

or Raw RW or Raw RW

Rank Predictor VW �̂� 𝑹𝟐 P-Value Rank Predictor VW �̂� 𝑹𝟐 P-Value

Panel A: 1 Month Horizon 1 Z-score VW -0.0056 1.6% 0.00 0.82 18 Amihud's Measure EW 0.0026 0.4% 0.02 1.00

2 ST reversal VW 0.0050 0.8% 0.07 0.82 19 Spinoffs VW -0.0040 0.9% 0.03 1.00

3 Size VW -0.0048 0.8% 0.01 0.86 20 ΔRec + Accrual EW -0.0046 1.1% 0.03 1.00 4 ST reversal EW 0.0049 0.8% 0.09 0.86 21 Δ# of Analysts VW -0.0043 1.1% 0.03 1.00

5 Asset Turnover VW 0.0053 1.5% 0.00 0.88 22 Forecast Dispersion VW 0.0032 0.6% 0.03 1.00

6 Z-score EW -0.0052 1.4% 0.01 0.91 23 Inventory Growth EW 0.0033 0.6% 0.04 1.00 7 LT reversal VW -0.0046 0.7% 0.01 0.92 24 Secured / Total Debt VW 0.0040 0.9% 0.04 1.00

8 Zero Trading Days VW 0.0042 1.0% 0.00 0.96 25 Sustainable Growth VW -0.0031 0.5% 0.04 1.00

9 ΔNC Op. Assets VW -0.0036 0.7% 0.02 0.98 26 Moment LT Reverse VW 0.0025 0.2% 0.05 1.00 10 SEO VW -0.0042 0.9% 0.02 0.99 27 Spreads EW 0.0026 0.4% 0.05 1.00

11 ΔRec + Accrual VW -0.0049 1.3% 0.05 0.99 28 Gross Profitability VW -0.0030 0.5% 0.05 1.00

12 Coskewness EW 0.0036 0.7% 0.03 0.99 29 Misvalue Innovation VW -0.0035 0.7% 0.07 1.00 13 Coskewness VW 0.0037 0.8% 0.06 0.99 30 Target Price Return EW -0.0054 1.7% 0.07 1.00

14 NOA VW -0.0035 0.7% 0.02 1.00 31 Cash Flow Var. EW 0.0034 0.6% 0.08 1.00

15 ΔTax to Assets EW 0.0055 1.6% 0.00 1.00 32 Repurchases EW -0.0035 0.6% 0.09 1.00 16 Δ# of Analysts EW -0.0043 1.0% 0.01 1.00 33 Capex Growth VW -0.0032 0.5% 0.09 1.00

17 Asset Growth VW -0.0035 0.7% 0.02 1.00 34 Profit Margin EW 0.0031 0.5% 0.10 1.00

Panel B: 3 Month Horizon

1 Short Interest EW -0.0062 5.7% 0.03 0.25 23 Δ Rec + Accrual VW -0.0037 2.1% 0.03 1.00

2 Z-score VW -0.0055 4.6% 0.00 0.34 24 Cash Flow Variance EW 0.0033 1.7% 0.05 1.00

3 Asset Turnover VW 0.0056 4.7% 0.00 0.34 25 Misvalue Innovation VW -0.0032 1.6% 0.06 1.00 4 Size VW -0.0050 2.4% 0.00 0.35 26 Capex Growth VW -0.0032 1.6% 0.07 1.00

5 LT Reversal VW -0.0048 2.2% 0.01 0.45 27 Spreads VW 0.0028 1.2% 0.07 1.00

6 Zero Trading Days VW 0.0042 2.8% 0.00 0.63 28 Accruals VW 0.0029 1.3% 0.08 1.00 7 Z-score EW -0.0046 3.1% 0.00 0.69 29 Analyst Intensity EW -0.0026 1.0% 0.01 1.00

8 SEO VW -0.0045 3.1% 0.00 0.71 30 Spreads EW 0.0024 0.9% 0.04 1.00

9 NOA VW -0.0038 2.3% 0.01 0.80 31 Moment LT Reverse VW 0.0022 0.4% 0.06 1.00 10 ΔNC Op. Assets VW -0.0033 1.8% 0.03 0.85 32 Secured / Total Debt VW 0.0021 0.7% 0.06 1.00

11 Target Price Return EW -0.0062 6.0% 0.01 0.85 33 ΔBook to Assets VW -0.0013 0.2% 0.06 1.00

12 Asset Growth VW -0.0036 2.1% 0.01 0.86 34 Reverse Stock Split VW 0.0028 1.3% 0.07 1.00

13 Coskewness VW 0.0036 2.1% 0.04 0.86 35 Dividend Omission EW 0.0026 1.0% 0.08 1.00

14 Inventory Growth EW 0.0036 2.1% 0.01 0.88 36 Δ # of Analysts VW -0.0037 2.1% 0.08 1.00

15 % Operat. Accrual VW -0.0045 3.4% 0.04 0.93 37 Opportunistic Sells VW -0.0028 1.3% 0.09 1.00 16 Sustain Growth VW -0.0033 1.7% 0.03 0.95 38 Reverse Stock Split EW -0.0020 0.6% 0.09 1.00

17 Profit Margin EW 0.0034 1.7% 0.04 0.97 39 Δ Rec + Accrual EW -0.0029 1.3% 0.09 1.00

18 Agnostic Value VW 0.0040 2.5% 0.01 0.98 40 ΔNumber of Analyst EW -0.0034 1.9% 0.09 1.00 19 Gross Profitability VW -0.0029 1.3% 0.05 0.98 41 Low Recommend EW -0.0026 1.1% 0.10 1.00

20 Repurchases EW -0.0032 1.6% 0.07 0.99 42 Volume VW -0.0020 0.4% 0.10 1.00

21 Δ Tax to Assets EW 0.0044 2.9% 0.00 1.00 43 ΔInstitutional owner EW -0.0022 0.7% 0.10 1.00 22 Agnostic Value EW -0.0037 2.2% 0.02 1.00


37

Table 3 - Continued (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)

EW EW

or Raw RW or Raw RW


Panel C: 6 Month Horizon 1 Short Interest EW -0.0063 10.8% 0.02 0.08 24 Repurchases EW -0.0030 2.6% 0.04 0.94

2 Asset Turnover VW 0.0059 9.7% 0.00 0.09 25 Coskewness VW 0.0027 2.2% 0.05 0.94 3 Z-score VW -0.0057 9.1% 0.00 0.10 26 Inventory Growth VW 0.0028 2.3% 0.08 0.94

4 Size VW -0.0050 4.7% 0.00 0.11 27 Volume Trend EW 0.0032 3.0% 0.09 0.95

5 LT Reversal VW -0.0049 4.6% 0.01 0.15 28 Spreads EW 0.0026 2.0% 0.01 0.95 6 Zero Trading Days VW 0.0043 5.5% 0.00 0.34 29 Momentum-reversal VW -0.0026 1.2% 0.06 0.95

7 Z-score EW -0.0044 5.2% 0.00 0.53 30 Dividends EW 0.0026 2.0% 0.06 0.95

8 Inventory Growth EW 0.0039 4.5% 0.01 0.56 31 Reverse Stock Split VW 0.0029 2.6% 0.01 0.96 9 NOA VW -0.0037 4.0% 0.00 0.63 32 ΔSales-ΔInventory EW -0.0026 2.0% 0.05 0.97

10 Asset Growth VW -0.0037 3.9% 0.01 0.66 33 Share issues PW VW -0.0028 2.1% 0.10 0.98

11 Spreads VW 0.0035 3.7% 0.00 0.67 34 R&D/MV EW 0.0026 2.0% 0.04 0.98 12 Sustainable Growth VW -0.0036 3.8% 0.01 0.70 35 Cash Flow Variance EW 0.0029 2.6% 0.05 0.98

13 ΔNC Op. Assets VW -0.0032 3.1% 0.04 0.70 36 Misvalue Innovation VW -0.0029 2.5% 0.07 0.99

14 % Operating Accrual VW -0.0047 6.7% 0.01 0.71 37 Org. Capital EW 0.0023 1.5% 0.05 1.00 15 SEO VW -0.0038 4.0% 0.01 0.76 38 Share issues DT VW -0.0017 0.8% 0.06 1.00

16 Profit Margin EW 0.0036 3.6% 0.00 0.80 39 Share issues DT EW -0.0021 1.2% 0.06 1.00

17 Gross Profitability VW -0.0030 2.7% 0.02 0.87 40 Earn Consistency VW -0.0016 0.7% 0.10 1.00 18 Target Price Return EW -0.0052 7.0% 0.02 0.91 41 Agnostic Value VW 0.0017 0.9% 0.05 1.00

19 ΔCapex-ΔInd Capex VW -0.0029 2.5% 0.01 0.92 42 Analyst Intensity EW -0.0012 0.4% 0.04 1.00

20 Capex Growth VW -0.0031 3.0% 0.05 0.93 43 ΔBook to Assets VW -0.0010 0.3% 0.06 1.00 21 ΔTax to Assets EW 0.0048 6.0% 0.00 0.93 44 Cash to Assets VW 0.0014 0.6% 0.05 1.00

22 accruals VW 0.0029 2.4% 0.03 0.94 45 Opportunistic Sells VW -0.0026 2.1% 0.02 1.00

23 ΔCapex-ΔInd Capex EW -0.0029 2.4% 0.00 0.94

Panel D: 12 Month Horizon

1 Z-score VW -0.0058 18.5% 0.00 0.03 28 Volume Trend EW 0.0030 5.0% 0.05 0.91

2 Size VW -0.0054 10.0% 0.00 0.04 29 Share issues PW VW -0.0027 4.0% 0.05 0.92 3 Asset Turnover VW 0.0058 17.8% 0.00 0.04 30 ΔAsset Turnover VW 0.0028 4.2% 0.02 0.92

4 Short Interest EW -0.0052 14.2% 0.01 0.11 31 Dividends EW 0.0023 3.2% 0.04 0.94

5 LT Reversal VW -0.0044 7.3% 0.00 0.13 32 Target Price Return EW -0.0043 8.7% 0.03 0.95 6 Zero Trading Days VW 0.0040 9.5% 0.00 0.21 33 SEO VW -0.0026 3.6% 0.06 0.96

7 Z-score EW -0.0043 9.8% 0.00 0.30 34 Pension Funding EW -0.0028 4.4% 0.05 0.96

8 Sustainable Growth VW -0.0039 8.3% 0.00 0.33 35 Inventory Growth VW 0.0023 2.9% 0.10 0.98 9 ΔNC Op. Assets VW -0.0035 6.8% 0.01 0.34 36 ΔSales-ΔInventory EW -0.0022 2.8% 0.02 0.98

10 Asset Growth VW -0.0037 7.7% 0.00 0.37 37 Momentum-reversal VW -0.0022 1.7% 0.04 0.98

11 Spreads VW 0.0035 7.2% 0.00 0.40 38 Lagged Momentum VW -0.0021 1.6% 0.08 0.98 12 Inventory Growth EW 0.0037 7.7% 0.01 0.40 39 R&D/MV EW 0.0022 2.5% 0.04 0.99

13 Price VW -0.0036 4.1% 0.02 0.50 40 Insider Buy VW 0.0025 3.8% 0.02 0.99

14 % Operating Accrual VW -0.0046 12.2% 0.01 0.52 41 Reverse Stock Split VW 0.0022 2.8% 0.03 0.99 15 NOA VW -0.0033 6.3% 0.00 0.57 42 Coskewness VW 0.0019 2.1% 0.06 0.99

16 Capex Growth VW -0.0034 7.1% 0.01 0.66 43 Cash Flow Variance EW 0.0023 3.0% 0.06 0.99

17 Profit Margin EW 0.0033 5.9% 0.00 0.72 44 Exchange Switch VW -0.0019 2.1% 0.07 0.99 18 ΔCapex-ΔInd Capex EW -0.0030 5.2% 0.00 0.73 45 Covariance Risk VW 0.0018 2.0% 0.09 1.00

19 R&D Increases EW -0.0029 4.6% 0.05 0.82 46 M/B and Accruals VW -0.0020 2.3% 0.09 1.00 20 ΔCapex-ΔInd Capex VW -0.0028 4.6% 0.00 0.83 47 ΔTax to Assets EW 0.0029 4.0% 0.01 1.00

21 Spreads EW 0.0027 4.2% 0.00 0.83 48 Moment LT Reverse VW 0.0015 0.8% 0.02 1.00

22 Gross Profitability VW -0.0027 4.3% 0.02 0.84 49 E/P VW 0.0008 0.4% 0.04 1.00

23 Org. Capital EW 0.0030 4.9% 0.00 0.86 50 Coskewness EW 0.0010 0.6% 0.05 1.00

24 Repurchases EW -0.0031 4.9% 0.02 0.87 51 Share issues DT EW -0.0018 1.8% 0.05 1.00

25 Employee growth VW -0.0027 4.3% 0.05 0.87 52 E/P EW -0.0017 1.7% 0.05 1.00 26 Misvalue Innovation VW -0.0030 5.3% 0.03 0.91 53 Dividend Initiation VW 0.0013 1.0% 0.08 1.00

27 LT Reversal EW -0.0025 2.3% 0.05 0.91 54 CF/MV VW 0.0007 0.3% 0.08 1.00


38

Table 4

Summary of Out of Sample Anomaly Performance The table displays a count of the number of predictive variables that are statistically significant at the 10% level or

better. For each anomaly, we estimate an out of sample predictive regression of the form:




months, and xt is one of the 140 predictor variables. To construct aggregate predictors out of cross-sectional anomalies,

we calculate the value-weighted and equal-weighted mean across all stocks on each date. Because some of the

resulting variables are non-stationary, we also examine the first-difference of these predictors and we examine

a linearly de-trended version of these predictors. We use the first version of each variable that is stationary; 12

predictors are non-stationary in every form and we omit variables with less than 20 years of data, which yields

263 possible predictors. The table shows the number of predictors that are statistically significant relative to the

number of variables considered. Panel A summarizes the results for all predictors, Panel B examines value-

weighted predictors, Panel C examines equal-weighted predictors. In each panel, the first row displays results

for all variables in that panel, and the remaining rows examine sub-samples formed on the ten most popular

cross-sectional predictors (Popular), the ten most statistically significant cross-sectional predictors (Best Cross-

Sectional), and five different groupings based on the categories in McLean and Pontiff (2015): (1) Event, (2)

Fundamental, (3) Market, (4) Opinion, and (5) Valuation. See Section 3.C for a discussion of these categories.

(1) (2) (3) (4) (5)

Number significant predictors / Total number predictors

Predictive Variable h=1 h=3 h=6 h=12

Panel A: All Predictors

All Predictors 7/263 32/263 41/263 45/263

Popular 0/15 3/15 8/15 9/15


Event 1/67 5/67 9/67 10/67

Fundamental 4/65 8/65 8/65 8/65

Market 1/61 15/61 22/61 25/61

Opinion 1/41 3/41 1/41 0/41

Valuation 0/29 1/29 1/29 2/29

Panel B: VW Predictors

All Predictors 4/135 21/135 24/135 27/135

Popular 0/8 2/8 4/8 4/8


Event 0/34 3/34 5/34 6/34

Fundamental 3/34 5/34 6/34 7/34

Market 0/31 10/31 12/31 13/31

Opinion 1/21 2/21 0/21 0/21

Valuation 0/15 1/15 1/15 1/15

Panel C: EW Predictors

All Predictors 3/128 11/128 17/128 18/128

Popular 0/7 1/7 4/7 5/7


Event 1/33 2/33 4/33 4/33

Fundamental 1/31 3/31 2/31 1/31

Market 1/30 5/30 10/30 12/30

Opinion 0/20 1/20 1/20 0/20

Valuation 0/14 0/14 0/14 1/14


39

Table 5

Best Out-of-Sample Predictive Regression Results using Romano and Wolf p-values The table reports the mean of the ordinary least squares estimate of β, p-values, and the Campbell and Thompson

(2008) 𝑅𝑂𝑆2 statistic from out of sample predictive regression models of the form:




months, and xt is the predictor variable in the first column. �̂� (column (4)) is the time-series mean of the coefficient

estimates for each predictor. The Campbell and Thompson 𝑅𝑂𝑆2 statistic (columns (5) and (12)) is calculated as 1 minus

the proportional reduction in mean squared forecast error (MSFE) at the h-month horizon for a predictive regression

forecast of the S&P 500 log excess return based on the predictor variable in the first column vis-a-vis the prevailing

mean benchmark forecast. Panel A displays results for the 1-month horizon, Panel B shows the 3-month horizon,

Panel C shows the 6-month horizon, and Panel D shows the 12-month horizon. Within each panel, predictors are

sorted by their Romano and Wolf p-value and then their unadjusted p-value. We report all predictors that have

unadjusted p-values less than 10% for a given horizon. Unadjusted p-values are shown in columns (6) and (13) and

Romano and Wolf (2005) adjusted p-values are shown in columns (7) and (14).

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)

EW EW

or Raw RW or Raw RW


Panel A: 1 Month Horizon 1 Asset Turnover VW 0.0070 0.3% 0.04 1.00

2 ΔRec + Accrual VW -0.0029 1.1% 0.04 1.00

3 Z-score EW -0.0043 0.5% 0.04 1.00 4 Z-score VW -0.0049 0.7% 0.05 1.00

5 ΔTax to Assets EW 0.0065 -0.9% 0.06 1.00

6 Pension Funding VW -0.0043 -0.4% 0.08 1.00 7 Coskewness EW 0.0045 0.0% 0.09 1.00

Panel B: 3 Month Horizon

1 Misvalue Innovation VW -0.0030 1.3% 0.04 0.97 17 Short Interest VW 0.0059 0.5% 0.04 1.00 2 Short Interest EW -0.0051 6.4% 0.02 0.99 18 Exchange Switch VW -0.0034 -0.9% 0.04 1.00

3 Idio Risk VW -0.0015 1.1% 0.00 1.00 19 Repurchases EW -0.0039 3.9% 0.04 1.00 4 Asset Turnover VW 0.0073 3.8% 0.00 1.00 20 ΔInstitutional owner VW 0.0036 -13.4% 0.04 1.00

5 ST Reversal EW -0.0002 0.4% 0.00 1.00 21 Cash Flow Variance VW 0.0042 0.0% 0.05 1.00

6 ST Reversal VW -0.0007 0.3% 0.00 1.00 22 LT Reversal VW -0.0066 -0.6% 0.05 1.00 7 Z-score VW -0.0048 3.3% 0.01 1.00 23 Cash Flow Variance EW 0.0020 1.1% 0.05 1.00

8 Lagged Momentum VW -0.0013 0.8% 0.01 1.00 24 Age EW 0.0163 -106.8% 0.07 1.00

9 Max VW -0.0014 1.2% 0.01 1.00 25 Moment LT Reverse VW 0.0021 0.3% 0.08 1.00 10 Analyst Intensity EW -0.0073 -0.1% 0.02 1.00 26 Age VW 0.0019 -0.1% 0.08 1.00

11 Lagged Momentum EW -0.0010 0.6% 0.02 1.00 27 Reverse Stock Split VW 0.0048 -0.1% 0.08 1.00

12 ΔNC Op. Assets VW -0.0013 1.6% 0.02 1.00 28 ΔTax to Assets EW 0.0051 2.2% 0.08 1.00 13 Price VW -0.0040 1.0% 0.02 1.00 29 ΔRec + Accrual VW -0.0030 1.5% 0.09 1.00

14 Size EW 0.0000 0.3% 0.03 1.00 30 Capex Growth VW -0.0024 1.3% 0.09 1.00

15 Coskewness VW 0.0055 1.4% 0.03 1.00 31 Max EW -0.0001 0.4% 0.09 1.00 16 Z-score EW -0.0039 1.4% 0.04 1.00 32 Momentum-reversal VW -0.0009 0.5% 0.10 1.00


40

Table 5 - Continued (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)

EW EW

or Raw RW or Raw RW


Panel C: 6 Month Horizon 1 Misvalue Innovation VW -0.0027 2.9% 0.04 0.72 22 LT Reversal EW -0.0028 1.5% 0.03 1.00

2 Short Interest EW -0.0050 15.8% 0.01 0.87 23 Repurchases EW -0.0043 7.0% 0.03 1.00 3 Cash Flow Variance EW 0.0018 2.4% 0.03 0.90 24 Exchange Switch VW -0.0041 -1.9% 0.04 1.00

4 ST Reversal EW 0.0000 0.3% 0.00 0.95 25 Short Interest VW 0.0047 4.2% 0.04 1.00

5 Z-score VW -0.0050 7.4% 0.01 0.97 26 Reverse Stock Split VW 0.0043 -1.3% 0.04 1.00 6 Idio Risk VW -0.0020 1.6% 0.00 0.99 27 Spreads VW 0.0080 -2.3% 0.04 1.00

7 Size VW 0.0004 0.4% 0.00 1.00 28 Idio Risk EW -0.0001 1.3% 0.04 1.00

8 ST Reversal VW 0.0003 0.3% 0.00 1.00 29 Momentum-reversal EW -0.0011 1.2% 0.04 1.00 9 Asset Turnover VW 0.0075 9.6% 0.00 1.00 30 Age VW 0.0019 -0.1% 0.05 1.00

10 Moment LT Reverse VW 0.0008 1.2% 0.00 1.00 31 Coskewness VW 0.0046 1.4% 0.05 1.00

11 Momentum-reversal VW -0.0024 2.2% 0.00 1.00 32 Dividends EW 0.0000 1.0% 0.06 1.00 12 Price VW -0.0036 1.9% 0.01 1.00 33 Z-score EW -0.0041 1.9% 0.06 1.00

13 Max EW -0.0014 0.8% 0.01 1.00 34 ΔCapex-ΔInd Capex VW -0.0037 0.0% 0.06 1.00

14 Max VW -0.0024 1.5% 0.01 1.00 35 Price EW -0.0023 0.5% 0.06 1.00 15 Size EW 0.0005 0.2% 0.01 1.00 36 Dividends VW 0.0000 1.0% 0.06 1.00

16 Lagged Momentum VW -0.0015 1.9% 0.01 1.00 37 ΔCapex-ΔInd Capex EW -0.0038 -0.1% 0.06 1.00

17 LT Reversal VW -0.0064 1.2% 0.01 1.00 38 Capex Growth VW -0.0026 3.1% 0.07 1.00 18 Spreads EW 0.0120 -1.6% 0.02 1.00 39 ΔSales-ΔInventory EW -0.0042 -4.3% 0.08 1.00

19 ΔNC Op. Assets VW -0.0009 3.3% 0.02 1.00 40 Asset Growth VW 0.0003 1.0% 0.09 1.00

20 Lagged Momentum EW -0.0012 1.6% 0.02 1.00 41 Analyst Intensity EW -0.0043 -0.7% 0.09 1.00 21 Cash Flow Variance VW 0.0057 0.3% 0.03 1.00

Panel D: 12 Month Horizon

1 Misvalue Innovation VW -0.0029 6.1% 0.04 0.62 24 Idio Risk VW 0.0005 -0.8% 0.01 1.00 2 Short Interest EW -0.0049 20.6% 0.02 0.70 25 Capex Growth VW -0.0034 7.6% 0.02 1.00

3 Cash Flow Variance EW 0.0013 4.4% 0.02 0.80 26 Max EW -0.0003 -0.3% 0.03 1.00

4 Z-score VW -0.0053 15.0% 0.03 0.94 27 Dividends EW 0.0000 1.7% 0.03 1.00 5 Moment LT Reverse VW 0.0013 4.0% 0.00 0.96 28 Dividends VW 0.0000 1.7% 0.03 1.00

6 Moment LT Reverse EW 0.0009 3.9% 0.00 0.99 29 ΔNC Op. Assets VW -0.0008 7.4% 0.03 1.00

7 Short Interest VW 0.0026 5.6% 0.02 0.99 30 Age VW 0.0021 -1.2% 0.03 1.00 8 Repurchases EW -0.0044 11.2% 0.04 0.99 31 Exchange Switch VW -0.0040 -2.3% 0.04 1.00

9 Lagged Momentum VW -0.0020 4.4% 0.00 1.00 32 Momentum-reversal EW -0.0006 1.1% 0.05 1.00

10 Size VW 0.0009 -0.2% 0.00 1.00 33 Sustainable Growth VW -0.0021 2.3% 0.05 1.00 11 Reverse Stock Split VW 0.0028 -2.5% 0.00 1.00 34 ΔCapex-ΔInd Capex EW -0.0025 2.5% 0.05 1.00

12 Lagged Momentum EW -0.0011 2.9% 0.00 1.00 35 ΔCapex-ΔInd Capex VW -0.0030 0.4% 0.07 1.00

13 Price VW -0.0047 3.0% 0.00 1.00 36 Cash Flow Variance VW 0.0062 -0.8% 0.07 1.00 14 Momentum-reversal VW -0.0020 2.7% 0.00 1.00 37 Org. Capital EW 0.0027 4.6% 0.07 1.00

15 Spreads EW 0.0096 1.5% 0.00 1.00 38 Volume / MV VW -0.0012 0.0% 0.09 1.00

16 Max VW -0.0013 1.1% 0.00 1.00 39 Price EW -0.0030 -0.8% 0.09 1.00 17 Asset Turnover VW 0.0079 17.5% 0.00 1.00 40 Volume Trend EW 0.0031 2.6% 0.09 1.00

18 ST Reversal VW 0.0011 -1.0% 0.01 1.00 41 Momentum EW 0.0007 -1.6% 0.09 1.00

19 ST Reversal EW 0.0010 -1.0% 0.01 1.00 42 Asset Growth VW 0.0001 3.1% 0.09 1.00 20 Spreads VW 0.0067 1.6% 0.01 1.00 43 ΔSales-ΔInventory EW -0.0024 -4.6% 0.09 1.00

21 Size EW 0.0010 -1.3% 0.01 1.00 44 Coskewness VW 0.0039 0.2% 0.09 1.00 22 LT Reversal VW -0.0052 6.1% 0.01 1.00 45 % Operating Accrual VW -0.0046 9.8% 0.10 1.00

23 LT Reversal EW -0.0025 3.6% 0.01 1.00


Documents

Are Cross-Sectional Predictors Good Market-Level …...Owen Lamont, Allan Timmermann, and conference and seminar participants at the 2018 Society for Financial Studies Cavalcade, the