Not Fooled by Randomness

  • 8/12/2019 Not Fooled by Randomness

    Electronic copy available at: http://ssrn.com/abstract=2143293

    Not fooled by randomness: using random

    portfolios to analyze investment funds

    Roberto [email protected]

    Faculty of Economics and Business, University of Chile

    August 2012

    The biggest challenge in testing mutual funds for manager skill is the lack of a

    probability distribution of returns under the null hypothesis of no skill. A

    methodology based on randomly trading portfolios and non-parametric statistical

    tests is explored, and a test of skill is proposed. Simulation is used to perform an

    in-depth study of the properties of this test, and to compare its power against that

    of other tests of skill based on factor model alphas. Empirical tests performed on a

    sample of US equity mutual funds find evidence of skill in a small number of

    managers, but the value added by this skill is charged away from investors

    in the form of fund fees and expenses. Overall, random-portfolio-based measures

    are found to be more powerful and easier to interpret than tests based on traditional

    and bootstrapped factor model alphas.

    1. Introduction

    Fund performance measures, while theoretically good indicators of past overperformance [1],

    are notoriously unreliable predictors of future performance [2]. This makes them a poor

    choice for investors who wish to allocate their capital to funds that, at least in expectation,

    will overperform in the future.

    Perhaps this is why in the last few years the discussion has shifted from measuring

    performance to a more elusive factor: testing fund manager skill. The implicit argument is

    that, while skill in no way guarantees persisting overperformance, a fund manager that has

    obtained a high level of performance in the past through skill is much more likely to repeat

    such performance in the future than one who was merely lucky.

    The argument is sound, but performance is an observable variable while skill is

    not, and therefore measuring or testing for skill presents important empirical problems.

    Past attempts at measuring skill are based solely on factor model alphas, first as indicators

    of overperformance and lately interpreted as signals of manager skill. However, as

    explained in Kosowski, Timmermann, Wermers and White (2006) (henceforth KTWW),

    the cross-sectional distribution of the resulting regression alphas exhibits strong deviations

    from normality, which invalidates standard statistical significance tests and, much more

    importantly, it is not clear what the distribution of alphas should be under the null

    hypothesis of no skill. KTWW claim to solve both these problems with a bootstrap

    [1] See Kothari & Warner (2001), who critique regularly used performance measures as lacking enough power

    to detect economically large magnitudes of abnormal fund performance.

    [2] See Carhart (1997) and most of the literature regarding persistence in mutual fund performance, as well as

    Goetzmann (2007) on how, in any case, most of these measures are susceptible to fund manager

    manipulation.

    methodology, which they apply to a large sample of U.S. mutual funds, as do Cuthbertson,

    Nitzsche and O'Sullivan (2008) in the U.K.

    In the present paper I test these methodologies and find that both of them, standard and

    bootstrap alpha, are prone to be fooled by randomness: that is, they falsely detect skill

    in simulated samples of portfolio returns where overperformance is a result of luck. This is

    not surprising, since regression alphas are also performance metrics, and as such they are

    highly correlated with fund performance, thus supplying little in the way of new

    information above regularly used measurements.

    I develop a new methodology to test for skill, based on randomly trading portfolios as

    proposed in Burns (2007), which are used to derive the empirical distribution of fund

    returns under the null of no skill. This measure is superior to the alpha-based

    methodologies in that it is powerful enough to distinguish skill from luck in all except the

    most extreme cases of luck, is more intuitive in its interpretation since it relies on simple

    fund returns, and is designed to be applied to the identification of skill in individual funds,

    as opposed to the KTWW method and that of Barras, Scaillet and Wermers (2008), which

    is designed to find only the proportion of skilled funds in a given market.

    This paper is organized as follows. The next section explores previous measures of fund

    manager skill and the results obtained with these. Section 3 details the methodologies

    behind the alpha and random portfolio measures of skill. Section 4 presents the results of

    tests of the power of the measures, using simulated samples of portfolios that are constructed

    to obtain a performance above a benchmark through skill or luck. Section 5 presents results

    of the application of both types of measures to a sample of U.S. equity funds, and Section 6

    contains concluding remarks.

    2. Measuring fund manager skill: current state of the art and new proposed measure

    Factor model alphas have recently been pushed beyond the original Jensen performance

    measure and considered evidence of fund manager skill. In general, this line of research

    involves controlling portfolio returns for known risk factors, such as exposure to the

    market, firm size, market-to-book ratio and momentum. If these models yield positive and

    significant values of alpha, then this is considered evidence that the manager of the fund is

    skillful (see Silli (2006) for a review).

    However, these traditional regression alphas are of dubious value as tools to evaluate skill.

    Apart from the various criticisms inherent in regression models (see Silli (2006), Ferson

    and Schadt (1996), Christopherson (1998), Spiegel, Mamaysky and Zhang (2003, 2006),

    and others), two critical shortcomings can be identified in these models that make any

    inference gleaned from them unreliable. First, hypothesis testing of regression alphas relies

    on the assumption of normality of the alphas' distribution. KTWW find extreme deviations

    from normality in the distribution of alphas in the U.S. market, as do Cuthbertson, Nitzsche

    and O'Sullivan (2008) in the U.K. Second, in order to conduct a statistical test, we need to

    have the probability distribution of the variable of interest under the null hypothesis.

    However, we do not know what a distribution of alphas looks like if funds are managed

    with no skill. We therefore take a positive and significant alpha as evidence of skill, even

    though Kosowski et al. use basic statistical testing criteria to show that, in a relatively large

    sample, we can expect to observe a certain number of alphas that are positive and

    significant through pure random fluctuation (i.e., luck).

    KTWW propose a bootstrap approach to improve the regression alpha test of skill. First

    they select a sample of funds that operate in a certain market. For each fund in the sample,

    they then regress its returns on a factor model, and obtain factor loadings and residuals.

    Then, they repetitively resample the residuals and, together with the factor loadings,

    construct a new set of returns, where the value of alpha is set to zero. Finally, they regress

    the resulting returns on the same factor model and obtain the resulting alpha. Repeating this

    process a number of times, KTWW construct a null distribution of no skill for the alphas

    of the full market under study (a distribution as it would look if the alphas' real value were

    zero). They then compare the number of funds in the market for which regression alphas

    are positive and significant with the number of alphas that happen to be positive and

    significant in this null distribution, that is, by luck. Their conclusion is that since the

    number of real positive alphas exceeds that of the lucky alphas, then some of the real

    alphas must be the result of fund manager skill, as random fluctuation cannot explain them

    all away.
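As an illustration, the resampling step described above can be sketched in Python for a single fund. This is a minimal sketch under stated assumptions, not KTWW's implementation: it uses a one-factor model instead of the multi-factor specifications, simulated returns instead of fund data, and hypothetical function names (`ols_alpha_beta`, `bootstrap_null_alphas`). KTWW work with the full cross-section of funds, which this sketch does not attempt.

```python
import random

def ols_alpha_beta(y, x):
    """Closed-form OLS of y on a constant and a single factor x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    return alpha, beta, resid

def bootstrap_null_alphas(fund_excess, market_excess, n_boot=500, seed=0):
    """Alphas re-estimated on pseudo-returns built with alpha forced to zero."""
    rng = random.Random(seed)
    _, beta, resid = ols_alpha_beta(fund_excess, market_excess)
    null_alphas = []
    for _ in range(n_boot):
        # Resample residuals with replacement; the factor path stays fixed.
        e_star = [rng.choice(resid) for _ in resid]
        # Pseudo-returns: factor loading times factor plus noise, no alpha.
        pseudo = [beta * m + e for m, e in zip(market_excess, e_star)]
        a_star, _, _ = ols_alpha_beta(pseudo, market_excess)
        null_alphas.append(a_star)
    return null_alphas

# Illustrative data only: 72 months of simulated excess returns, true alpha = 0.
rng = random.Random(42)
market = [rng.gauss(0.005, 0.04) for _ in range(72)]
fund = [0.9 * m + rng.gauss(0.0, 0.01) for m in market]

alpha_hat, _, _ = ols_alpha_beta(fund, market)
null = bootstrap_null_alphas(fund, market)
# Share of zero-alpha bootstrap draws at least as large as the estimated alpha.
p_value = sum(a >= alpha_hat for a in null) / len(null)
```

The bootstrapped alphas are centered near zero by construction, so `p_value` measures how often luck alone produces an alpha as large as the one estimated.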

    The bootstrap methodology addresses the problem of normality, as the testing is done on

    the basis of empirical distributions and there is no assumption of a parametric one. Also, by

    forcing alpha to be zero in the iteration process, this bootstrap process is one way to obtain

    a distribution under the null of no skill.

    However, the methodology still suffers from serious drawbacks. First, the characterization

    of alpha as a measure of skill, even if correctly estimated, is still a matter of interpretation

    as opposed to simple returns, which are unambiguous in their origin and interpretation.

    Second, this measure is of an absolute nature and, as is also the case with the standard

    alpha analysis described above, there will inevitably be a high correlation between

    performance and alphas. That is, funds that overperform will tend to have positive and

    significant alphas, irrespective of manager skill. This is because factor models regress fund

    returns on a number of factors, and these factors are equal for all funds tested. In essence,

    the factors become a type of benchmark that is equally applied to each fund to correct

    returns for exposure to certain risks. The consequence is that inference is unreliable: funds

    that perform well need not be managed by skillful managers since luck plays a big role in

    performance. In fact, as is shown in Section 4, traditional and bootstrapped measures of

    alpha are likely to erroneously reject the null of no skill in the presence of lucky

    portfolios, which obtain a high level of performance due to luck. These measures are

    therefore easily fooled by randomness. On the other hand, the possibility of finding a

    skillful manager whose track record shows poor results is almost zero, and so the

    possibility of using the alpha measure as a diagnostic tool to help identify shortcomings and

    improve performance is minimal.

    Moreover, as a matter of application and interpretation of results, in order to estimate the

    bootstrap model of KTWW the alphas of all funds in the market must be estimated and

    bootstrapped, and inference is obtained by comparing real fund alphas to a ranked matrix of

    bootstrapped alphas (that is, a matrix where the resulting bootstrap alphas have been

    ordered from highest to lowest). Thus, the KTWW methodology requires data from all

    funds in the market, even if the study is focused on a single fund. While Cuthbertson et al.

    claim that they are able to study individual funds using this methodology, they still require

    the full market dataset, and their inference is based on comparing each real fund's alpha

    against a distribution of bootstrapped alphas that corresponds to the real fund's performance

    rank. That is, the best-performing fund's alpha is compared to the distribution of the highest

    bootstrapped alphas, the second best fund to the second best distribution, etc. The

    underlying assumption of this test is that the best performing fund will always obtain the

    best possible bootstrapped alphas, the second best fund will obtain the second best set of

    alphas, and so on for the rest of the funds as ranked by their performance.

    The lack of a known distribution under a certain null hypothesis is addressed in a general

    framework for investment fund analysis in Dawson and Young (2003). They argue that our

    inability to carry out experiments with control groups makes obtaining these distributions

    under the null hypothesis a complicated task, and advocate the use of samples of random

    portfolios in a Monte Carlo experiment setting to generate them. Burns (2007) notes that

    constrained random portfolios, that is, portfolios that are allowed to trade randomly but

    within the same bounds faced by real fund managers, constitute a control group for a

    measure of skill since by construction there is no skill in their trading decisions. The fund

    manager's constraints or restrictions can be imposed by the firm that offers the funds, for

    example in terms of the prospectus and investment goals, or self-imposed trading behavior

    that the manager maintains over his career. These restrictions may be in the form of a

    subset of the universe of assets in which the manager is allowed to invest (cash, fixed

    income, equity and derivatives, value vs. growth stocks, small vs. large firms, etc.),

    acceptable levels of risk (minimum and maximum; expressed as standard deviation, VaR,

    benchmark risk, etc.), turnover ratio, number of assets in the portfolio, etc.

    I will henceforth refer to the general use of constrained random portfolios as an analysis

    tool as Constrained Random Portfolio Analysis, or CRPA.

    The portfolio returns resulting from randomly trading portfolios are obtained purely by

    chance, with no value-adding (or subtracting!) intervention, and thus represent a subset of

    the state-space of feasible portfolios that could be attained by the fund manager. A large

    enough sample of CRPA portfolios will therefore generate the probability distribution of

    every level of performance potentially attainable by the fund manager, within the

    constraints she faces. Real fund returns are then compared to the distribution obtained from

    the random portfolios. Rejection of the null of no skill depends on a chosen significance

    level: for a manager to be considered skillful, her fund's returns should be better

    than a certain percentile of the random fund distribution, where that percentile corresponds

    to the desired level of significance. In other words, a manager is considered skillful if she

    is able to do better than a certain number of random portfolios.

    Correctly applied, CRPA addresses all the arguments against previous measures of fund

    manager skill. Being a non-parametric approach, it sidesteps all the theoretical and

    econometric problems associated with factor models. The analysis is strictly individual: one

    fund can be analyzed with no need of data of other funds (nor the market, macroeconomic

    variables, etc.). Thus, in this sense the amount of data required for the analysis is lower than

    for other measures of skill, and there is no peer group or relevant benchmark decision to

    make. This means as well that the measure is relative and specific to the fund being

    tested. Finally, CRPA introduces a flexible and powerful framework that can be used in

    many other applications beyond testing for manager skill.

    3. Empirical Methodology

    Factor model alphas, when positive and significant, are considered signs of fund manager

    skill or, at least, abnormal performance not attributable to known sources of risk. The three

    most widely used specifications for estimating these alphas are:

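    Jensen's alpha (Jensen 1968),

    $$R_{pt} - r_f = \alpha_p + \beta_p (R_{Mt} - r_f) + e_{pt} \qquad (1)$$

    Fama and French 3 factor model (Fama & French 1993),

    $$R_{pt} - r_f = \alpha_p + \beta_p (R_{Mt} - r_f) + \beta_p^{HML} HML_t + \beta_p^{SMB} SMB_t + e_{pt} \qquad (2)$$

    and Carhart's 4 factor model (Carhart 1997),

    $$R_{pt} - r_f = \alpha_p + \beta_p (R_{Mt} - r_f) + \beta_p^{HML} HML_t + \beta_p^{SMB} SMB_t + \beta_p^{UMD} UMD_t + e_{pt} \qquad (3)$$

    where $R_{pt}$ are the returns of portfolio $p$ at time $t$, $r_f$ is the risk-free rate, $\alpha_p$ is the

    regression alpha, $\beta_p$ is the fund's beta with respect to the market, $R_{Mt}$ is the market return at

    time $t$, $\beta_p^{HML}$ is the fund's beta with respect to Fama & French's High-Minus-Low factor,

    $HML$ is Fama & French's High-Minus-Low factor, $\beta_p^{SMB}$ is the fund's beta with respect to

    Fama & French's Small-Minus-Big factor, $SMB$ is Fama & French's Small-Minus-Big

    factor, $\beta_p^{UMD}$ is the fund's beta with respect to Carhart's momentum factor, $UMD$ is

    Carhart's momentum factor, and $e_{pt}$ is an error term.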

    The null hypothesis of no skill is rejected if a fund's regression alpha is found to be positive

    and significant.
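The decision rule for the simplest of these specifications, Jensen's alpha, can be sketched as follows. This is a hedged illustration: the data are simulated, the function name `jensen_alpha` is hypothetical, and the critical value of 1.67 is an approximate one-sided 5% threshold for about 70 degrees of freedom. The multi-factor models follow the same logic with additional regressors.

```python
import math
import random

def jensen_alpha(fund_excess, market_excess):
    """OLS intercept, slope, and t-statistic of the intercept."""
    n = len(fund_excess)
    mx = sum(market_excess) / n
    my = sum(fund_excess) / n
    sxx = sum((x - mx) ** 2 for x in market_excess)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market_excess, fund_excess))
    beta = sxy / sxx
    alpha = my - beta * mx
    sse = sum((y - alpha - beta * x) ** 2
              for x, y in zip(market_excess, fund_excess))
    s2 = sse / (n - 2)  # residual variance with two estimated coefficients
    se_alpha = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return alpha, beta, alpha / se_alpha

# Illustrative data only: 72 months, true alpha of 0.2% per month.
rng = random.Random(1)
market = [rng.gauss(0.005, 0.04) for _ in range(72)]
fund = [0.002 + 1.1 * m + rng.gauss(0.0, 0.01) for m in market]

alpha, beta, t_alpha = jensen_alpha(fund, market)
# Reject the null of no skill only for a positive and significant alpha.
reject_no_skill = alpha > 0 and t_alpha > 1.67
```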

    A criticism of these models is that the betas are unconditioned and static. Conditional

    models with time-varying betas have been developed and estimated (see Silli (2006)), in the

    hopes of obtaining more precise estimation of the regression coefficients. However, as far

    as skill testing is concerned, KTWW find that inference obtained from both conditional and

    unconditional models is virtually the same. Therefore, in the tests that follow after this

    section, only unconditional models are used.

    Standard factor model analysis generally identifies the alpha (or regression intercept) as

    evidence of abnormal return or fund manager skill (if positive and significant). However,

    standard models are incapable of differentiating between positive alphas obtained by

    skillful managers, and those that result from sheer luck (good or bad), as unlikely events

    that can nevertheless be observed at the tails of the distribution.

    The Bootstrap Alpha technique improves upon the standard analysis. Applied first in

    Kosowski, Timmermann, Wermers and White (2006), then replicated in Cuthbertson,

    Nitzsche and O'Sullivan (2007) and (with small variations) in Fama and French (2010), this

    technique seeks to obtain a distribution of factor model alphas from a bootstrap process

    where the true alpha has been set to zero. Thus, this distribution will show the probabilities

    or expected frequencies of observed positive and negative alphas under the null that there

    exist no managers with skill. This distribution is then compared to the distribution of real

    investment fund alphas and, in its simplest form, the number of funds that fall in the

    extreme quantiles in one distribution can be compared to those of the other. For example, in

    KTWW one statement of their analysis reads "Panel A indicates that nine funds should

    have an alpha estimate higher than 10% per year by chance, whereas in reality, 29 funds

    achieve this alpha." This is taken as evidence that the market must contain at least some

    funds that obtain positive alphas by dint of their managers' skill.

    CRPA: A non-parametric alternative to factor model alphas

    Using the software package PortfolioProbe [3], samples of randomly trading funds can be

    constructed which, while devoid of skill, may still be bound by user-defined constraints.

    A sufficiently large number of these random funds constitute the sample which is then used

    as the control group or distribution under the null to test fund manager skill, and can be

    used to perform other types of analyses.

    To obtain the relevant distribution under the null of no skill, Burns (2007) considers using

    the holding period return for each portfolio in a large sample of randomly trading funds.

    That is, if 1,000 portfolios are generated, the holding period return of each portfolio is calculated, and

    the distribution is based on the cross-section of the 1,000 holding period returns thus

    obtained. The skill test then consists of comparing the real fund's return with the ranked

    random portfolio returns. If the real return attains a certain percentile, for example if it is

    better than 95% of the random returns, then we reject the null of no skill.
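The cross-sectional test can be sketched in a few lines of Python. This is only an illustration under strong assumptions and does not reproduce PortfolioProbe: the asset returns are simulated, portfolios are equal-weighted, the asset-count constraint is a fixed number of holdings, and the turnover constraint is a fixed number of random swaps per period. All function names are hypothetical.

```python
import random

def random_portfolio_return(asset_returns, n_assets, n_swaps, rng):
    """Holding period return of one randomly trading portfolio.

    asset_returns[t][j] is the return of asset j in period t.
    n_assets fixes the asset count; n_swaps fixes per-period turnover.
    """
    n_universe = len(asset_returns[0])
    held = rng.sample(range(n_universe), n_assets)
    growth = 1.0
    for period in asset_returns:
        # Equal weights are assumed, so the period return is a simple average.
        growth *= 1.0 + sum(period[j] for j in held) / n_assets
        # Random trade: swap n_swaps holdings for assets not currently held.
        held_set = set(held)
        candidates = [j for j in range(n_universe) if j not in held_set]
        leaving = rng.sample(range(n_assets), n_swaps)
        arriving = rng.sample(candidates, n_swaps)
        for pos, new in zip(leaving, arriving):
            held[pos] = new
    return growth - 1.0

def cross_sectional_p_value(fund_return, random_returns):
    """Share of random portfolios that did at least as well as the fund."""
    return sum(r >= fund_return for r in random_returns) / len(random_returns)

rng = random.Random(7)
# Illustrative universe: 50 assets, 72 months of simulated returns.
asset_returns = [[rng.gauss(0.006, 0.08) for _ in range(50)] for _ in range(72)]
null_returns = [random_portfolio_return(asset_returns, 20, 2, rng)
                for _ in range(1000)]

# Return that a fund must reach to beat 95% of the random portfolios.
threshold_95 = sorted(null_returns)[int(0.95 * len(null_returns))]
p = cross_sectional_p_value(0.80, null_returns)  # 0.80 is a made-up fund return
reject_no_skill = p <= 0.05
```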

    Figure 1 shows the probability density of a sample of 1,000 random funds' holding period

    returns, with a (dashed) line depicting the 95th percentile threshold.

    [Figure 1 about here]

    The plot was made with a sample of 1,000 random funds trading S&P 500 listed securities

    for 6 years, from 2005 to 2010. The constraints imposed on these portfolios were on

    turnover and portfolio asset count (the number of different assets that could be contained in

    the portfolio). The values used for these constraints are consistent with the mean of these

    values (turnover and asset count) for real funds currently operating in the market [4].

    Therefore, a fund manager at the helm of a fund trading these securities and operating

    under constraints similar to those simulated would have to obtain a return equal to or

    [3] PortfolioProbe is in fact a library of functions written for the R language.

    [4] The data was obtained from the CRSP Mutual Fund database. The sample employed corresponds to actively

    managed funds with at least 90% of net asset value invested in stocks listed in the S&P500 index.

    better than roughly 80% during the 6 year period for the null of no skill to be rejected.

    I propose an improvement to this methodology, in which the null distribution used for

    testing is that of the time series of returns of a single random portfolio, as opposed to the

    cross-sectional approach used in Burns (2007). Using a sample of random portfolios

    ordered by a certain criterion (which could be, for example, mean return), I first set the

    critical value for a percentile, then choose the random portfolio which occupies the position

    of that percentile in the ordered sample.

    Hypothesis testing is now based on the concept of stochastic order [5]. We are interested in

    testing whether the distribution of returns of the managed fund is stochastically greater than

    that of the chosen percentile random fund.

    A random variable A can be said to be stochastically greater than another random variable

    B if

    $$\Pr(A > x) \geq \Pr(B > x) \quad \forall x \in (-\infty, +\infty) \qquad (4)$$

    By this definition we could say that A is 'bigger' than B, but the financial interpretation is

    far more interesting: the probability that fund A obtains a return higher than x is higher than

    that of fund B attaining a similar performance. While far less powerful than the concept of

    stochastic dominance, stochastic order can guide decision making in the sense that it could

    point towards a fund manager who has a higher probability of obtaining a certain level of

    return in future realizations. This is consistent with the argument in favor of skill over luck

    described in the first section of this article.
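A small empirical check of this ordering may help fix ideas. The sketch below compares the empirical probabilities Pr(A > x) and Pr(B > x) at every observed return. The function names and the two return series are made up for illustration, and a descriptive check of this kind is not a statistical test.

```python
def survival(sample, x):
    """Empirical Pr(sample > x)."""
    return sum(v > x for v in sample) / len(sample)

def stochastically_greater(a, b):
    """True if the empirical Pr(A > x) >= Pr(B > x) at every observed x.

    Empirical survival functions only change at observed values, so
    checking the pooled set of observations covers every x.
    """
    grid = sorted(set(a) | set(b))
    return all(survival(a, x) >= survival(b, x) for x in grid)

fund_a = [0.03, 0.01, 0.04, 0.02, 0.05]
fund_b = [0.02, 0.00, 0.03, 0.01, 0.04]  # fund_a shifted down by one point

print(stochastically_greater(fund_a, fund_b))  # True
print(stochastically_greater(fund_b, fund_a))  # False
```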

    In order to test stochastic order between two distributions, the null and the real fund's time

    series of returns, the non-parametric Mann-Whitney U test (also called Mann-Whitney-Wilcoxon

    and therefore referred to as MWW) is used. The MWW test is used to assess

    [5] See, for example, Shaked and Shanthikumar (1994).

    whether one of two samples of independent observations tends to have larger values than

    the other. The test involves estimation of the U statistic, which is calculated by first

    ranking the pooled values of both samples and then summing the ranks within each sample. The one-tailed test's

    alternative hypothesis can be stated as follows: the probability that an observation from sample X

    exceeds one from sample Y, counting ties as one half, is higher than 0.5, or

    $$\Pr(X > Y) + \tfrac{1}{2}\Pr(X = Y) > 0.5 \qquad (5)$$
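The calculation of the U statistic can be sketched as follows. This is a minimal standard-library illustration, assuming the usual normal approximation without a tie correction for the p-value. The function name `mann_whitney_greater` and the return series are hypothetical, and with samples this small the approximation is rough.

```python
import math

def mann_whitney_greater(x, y):
    """One-sided MWW test of H1: X tends to exceed Y (normal approximation)."""
    # Pool both samples, remembering the origin of each value (0 = x, 1 = y).
    pooled = sorted((v, src) for src, sample in ((0, x), (1, y)) for v in sample)
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the ranks i+1, ..., j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0  # pairs with x > y, ties count 1/2
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # no tie correction
    z = (u - mean_u) / sd_u
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u, p_value

fund = [0.021, 0.034, -0.012, 0.045, 0.018, 0.027]
null = [0.010, 0.022, -0.025, 0.031, 0.004, 0.015]
u, p = mann_whitney_greater(fund, null)
```

Dividing `u` by the product of the two sample sizes gives the sample estimate of the probability on the left-hand side of the alternative hypothesis. A library routine such as SciPy's `scipy.stats.mannwhitneyu` with `alternative="greater"` would be the natural choice in practice.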

    For robustness purposes, tests of location can also be used. A standard parametric t-test of the

    difference in means of both distributions is employed, as well as a non-parametric alternative based

    on permutation. This last test consists of calculating a certain statistic, for example, the difference

    between the sample means. Then, the elements of both samples are mixed together and repeatedly

    resampled. In each iteration, two vectors are obtained with the same number of observations as each

    original sample, but with elements drawn from the mixed dataset, i.e. each can contain observations from

    either sample. The relevant statistic is calculated, and the process is repeated. If the statistic of

    interest is the difference between the means, then after each iteration a difference of means is

    calculated between the resulting vectors. A large number of iterations will generate an empirical

    distribution of the difference between the means of the vectors, under the null hypothesis that both

    original samples were drawn from the same distribution. If the real difference in means is large

    enough (higher than a critical value), then the null is rejected and both samples are assumed to come

    from different distributions.
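The permutation procedure just described can be sketched as below. Two assumptions are made for illustration: the mixed dataset is reshuffled without replacement in each iteration (the usual permutation scheme, which the text does not spell out), and the test is one-sided. The function name and the data are hypothetical.

```python
import random

def permutation_test_mean_diff(x, y, n_iter=5000, seed=0):
    """One-sided permutation p-value for mean(x) - mean(y) > 0."""
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_iter):
        # Mix both samples and split them back into two vectors
        # with the original numbers of observations.
        rng.shuffle(pooled)
        new_x, new_y = pooled[:len(x)], pooled[len(x):]
        diff = sum(new_x) / len(new_x) - sum(new_y) / len(new_y)
        if diff >= observed:
            count += 1
    # Share of reshuffled differences at least as large as the real one.
    return observed, count / n_iter

rng = random.Random(3)
fund = [rng.gauss(0.02, 0.03) for _ in range(72)]      # simulated fund returns
baseline = [rng.gauss(0.00, 0.03) for _ in range(72)]  # simulated null returns
diff, p_value = permutation_test_mean_diff(fund, baseline)
```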

    This methodology improves the quality of the test, as it analyzes the managed fund's full

    distribution of returns and not just the overall result achieved over a period of time. The

    most important consequence of this analysis is that funds that achieve impressive results by

    luck are much more likely to fail the test. Indeed, as is shown in simulation tests, this

    version of the CRPA skill test is by far less likely to be fooled by randomness than all the

    previously described measures.

    4. Power of the Skill Tests

    In the previous sections we encountered three measures or tests of fund manager skill,

    standard regression alphas, bootstrap alphas and CRPA, detailed their methodologies, and

    listed some of their potential shortcomings. In this section I directly test each measure in

    terms of its power to detect skill. Moreover, and more importantly, I test each measure's

    potential to differentiate between skill and luck.

    Since, other than the standard alpha test, test statistics do not have parametric distributions,

    analytical expressions of the tests' power are not obtainable. Hence, I proceed to estimate

    the power of each test via simulation.

    Power curves for each skill test are built by applying the test of skill to simulated samples

    of portfolios which are constructed to exhibit skill or to be simply lucky. This is

    accomplished by adding an extra rate of monthly return to the time series of returns of a

    baseline vector of returns. For the tests reported, the baseline vector is obtained from a

    sample of CRPA random portfolios. These portfolios trade S&P500 stocks, and do so

    constrained to the maximum and minimum levels of turnover and number of assets in each

    portfolio observed in a sample of real U.S. mutual funds that invest primarily in S&P500

    stocks. The average monthly returns for these random portfolios are calculated, and then

    the portfolios are ranked by this variable from smaller to larger. Then, the baseline for the

    power test samples is chosen as a percentile from these ranked portfolios. For example, if

    the 95th percentile is chosen and 1,000 portfolios were generated, then the baseline is a

    vector consisting of the time series of returns of the 50th best-performing portfolio.
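The selection of the baseline can be illustrated with a short sketch. The function name `percentile_portfolio` and the index convention are assumptions made for this example; with 1,000 portfolios ranked from smallest to largest mean return, position 950 (counting from zero) is the 50th best performer.

```python
def percentile_portfolio(portfolios, percentile):
    """Time series of the portfolio at the given percentile of mean return."""
    # Rank from smallest to largest average return.
    ranked = sorted(portfolios, key=lambda r: sum(r) / len(r))
    index = min(int(len(ranked) * percentile / 100.0), len(ranked) - 1)
    return ranked[index]

# Toy sample: 1,000 flat return series with distinct means.
sample = [[i / 1000.0] * 3 for i in range(1000)]
baseline = percentile_portfolio(sample, 95)
```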

    Let the baseline portfolio be referred to as $b$, then its returns are $r_{bi}$, where $i$ is the time

    period to which this return corresponds ($i = 1, \ldots, n$; where $n$ is the total number of time

    periods under study) and let the full vector of returns be $r_b$. This vector will have a mean

    monthly return, $\mu_b$, and a standard deviation, $\sigma_b$.

    The samples used to construct the power curves are composed of 1,000 portfolios simulated

    for each, skill and luck. To illustrate how these samples are generated, let $\delta$ be a

    predetermined rate of added return, and $e$ a noise term. Thus, $e$ is a random variable,

    which distributes $N(0, \sigma^2)$. Finally, define $lp$ as the number of lucky periods to be

    simulated in the luck portfolio sample.

    Thus, the skill sample is generated by drawing vectors of $e$ and producing portfolio time

    series of returns of the form

    $$r_s = r_b + \delta_s + e \qquad (8)$$

    where $\delta_s$ is a vector of length $n$, and each element of the vector is equal to $\delta/n$. Thus, the

    resulting vectors of returns represent a single (smooth) monthly increase in return with

    respect to the baseline, plus a noise or randomizing term with zero mean.

    On the other hand, the sample of lucky portfolios is constructed as

    $$r_l = r_b + \delta_l + e \qquad (9)$$

    where $\delta_l$ is a vector of length $n$, and its elements will contain $lp$ instances of $\delta/lp$ and the

    rest will be zero, with the position of the non-zero elements chosen randomly for each

    portfolio in the sample. So, for example, if lp is equal to 1, we are simulating a lucky fund

    manager that is able to match the performance of a skilful manager with a single lucky

    break, that is, a large added return in a single month, while during the rest of the time

    period the returns of his fund are, in expectation, no different than the baseline. The

    resulting samples have properties that make them ideal for the power tests (see appendix I).
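The construction of the two samples can be sketched as follows. The names `skill_series`, `luck_series`, `delta`, `sigma` and `lucky_periods` are labels chosen for this illustration, the baseline is a made-up flat series, and the sketch follows the description above literally: the added return is spread evenly over all n periods for skill, and concentrated in a few randomly placed periods for luck.

```python
import random

def skill_series(baseline, delta, sigma, rng):
    """Baseline plus delta/n in every period plus zero-mean normal noise."""
    n = len(baseline)
    return [r + delta / n + rng.gauss(0.0, sigma) for r in baseline]

def luck_series(baseline, delta, sigma, lucky_periods, rng):
    """Baseline plus delta/lp in lp randomly placed periods plus noise."""
    n = len(baseline)
    lucky = set(rng.sample(range(n), lucky_periods))
    return [r + (delta / lucky_periods if i in lucky else 0.0)
            + rng.gauss(0.0, sigma)
            for i, r in enumerate(baseline)]

rng = random.Random(11)
baseline = [0.01] * 72  # illustrative 72-month baseline series
skill_sample = [skill_series(baseline, 0.72, 0.02, rng) for _ in range(1000)]
luck_sample = [luck_series(baseline, 0.72, 0.02, 1, rng) for _ in range(1000)]
```

Both samples add the same total return to the baseline in expectation, so a test that looks only at the overall result cannot tell them apart, while a test on the full time series can.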

    The power test is carried out for various given levels of the added return $\delta$, ranging from zero (no skill) to

    4% per month, a large added return that ensures that at that end of the range the power

    curve converges to a probability of 1. Also, the number of lucky periods is allowed to

    vary, and can take values of 1 (the full extra return added to a single month's return), 3, 5

    and 10.

    It should be noted that these samples are consistent with the previously given definition of

    skill vs. luck in investment funds: the skillful manager may have good and bad periods,

    but overall she should be able to obtain a consistent performance that is better than the

    market average. On the other hand, a lucky manager may be able to match (or surpass) the

    performance of a skillful manager, but does so because of a relatively small number of

    lucky breaks, or periods of exceptionally good returns, which have a small probability of

    being repeated going forward.

    The plots that follow show the resulting power curves for each test: standard alpha,

    bootstrap alpha, and CRPA in its two versions, the Burns cross-sectional measure and the

    95th-percentile time-series measure.

    For both regression alpha tests (standard and bootstrap) only the results based on the

    Carhart four-factor model are shown. This is done to preserve the images' clarity, as results

    stemming from other models (one and three factor) are qualitatively the same and are available

    upon request. For the same reason, percentile distribution (time series) based CRPA testing

    is done using only the MWW test, as t-test and permutation test results are very similar.

    Finally, in order to simplify the images, the number of power curves plotted is further reduced

    by introducing the concept of net power. Since detecting skill where none exists is, in fact,

    a failure of the test employed, the power associated with this type of outcome is deducted

    from the estimated power of detecting skill in samples that do have it. Thus, net power is

    defined as "power to detect skill" minus "power to detect luck". This measure of net power is

    also consistent with the aim of skill tests, which is to separate skill from luck.
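
    The definition of net power reduces to a difference of two rejection rates. A minimal sketch follows, assuming each simulated sample has already been reduced to a p-value for the null of no skill; the function names and the 5% default level are illustrative choices.

```python
def rejection_rate(p_values, level=0.05):
    """Share of simulated samples in which the null of no skill is rejected."""
    return sum(p < level for p in p_values) / len(p_values)

def net_power(p_skilled, p_lucky, level=0.05):
    """Power to detect skill minus the rate at which luck is mistaken for skill."""
    return rejection_rate(p_skilled, level) - rejection_rate(p_lucky, level)
```

    A value near 1 means the test flags skillful samples and ignores lucky ones, a value near 0 means it cannot tell them apart, and a negative value means it attributes skill to lucky samples more often than to skillful ones.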

    Figure 3 shows the power curves where the luck-derived returns are constructed with a

    single lucky month in a six-year period. This is the most extreme case of luck, and

    should be the easiest for the tests to identify as such. As can be observed, the Burns measure

    lacks power when applied to skillful and lucky samples with similar levels of return. In fact, its net

    power is close to zero for any level of added return, while the other measures exhibit similar levels

    of net power. This result is due to a disproportionately large power component estimated for the

    luck sample, which in net terms eliminates the equally large power for detecting skill in the skillful

    sample, which would otherwise trump other measures. The other measures fare better, with the

    standard alpha and Percentile CRPA test showing very low tendencies to be fooled by these

    lucky funds.

    [Figure 3 about here]

    As the number of lucky periods increases, we can see that the power curves based on

    simulated skillful samples remain virtually the same, but the likelihood that the measures of

    skill will mistakenly take a lucky fund to be skillful increases. This affects all measures by

    severely reducing their net power. However, the effect is least noticeable for the CRPA /

    MWW test, which at a distribution of luck into 3 lucky periods becomes the most

    powerful test, and remains so for all power evaluations that follow.

    Figure 4 shows a generalized deterioration of power for all tests, with the extra return factor

    for lucky portfolios now spread over 5 periods. As mentioned above, the MWW test is

    still relatively powerful, and remains the best alternative.

    [Figure 4 about here]

    Once the number of lucky periods reaches 10, out of 72 total trading periods (6 years' worth

    of data for each sample), all net power curves show marked deterioration, with the standard

    alpha, bootstrap alpha and Burns test net power essentially zero, as depicted in Figure 5.

    While the Percentile CRPA test is still the best, its net power never rises above

    approximately 30%, making its use in these situations questionable.

    [Figure 5 about here]

    This can be explained again from the point of view of our definitions of skill and luck. As

    the number of extra return time periods increases, the boundary between luck and skill

    starts to blur. A fund with a relatively large number of good returns in a time series of

    fixed length cannot be easily dismissed as lucky, as this might be evidence of a skillful

    manager at the helm.

    Finally, a point could be made that the testing framework is flawed, since by construction

    the samples of lucky funds have the same expected return, but higher volatility than those

    of skillful portfolios. Thus, one sample stochastically dominates the other, and the

    identification of skillful portfolios could be easily made by applying most measures of risk-

    adjusted returns (for example, the Sharpe ratio). However, the argument made here is that

    in detecting skill it is not the global rate of return that matters, but how that return is

    attained over a period of time. To test the robustness of the Percentile-CRPA test to the

    stochastic dominance point, I next perform the power tests using samples with no stochastic

    dominance: while the added volatilities remain the same, the return factor added to the

    lucky portfolios is larger than that added to the skillful portfolios. Tests are performed

    where the added return factor for the lucky sample is increased with respect to the

    endowment of the skillful sample by factors of 20%, 40% and 60%.

    Figure 6 shows the most extreme case simulated, with the net power curves for all tests

    where the sample of lucky portfolios has been endowed with a return factor which is 60%

    higher than that of the skillful portfolios, and spread over 5 periods of time. As can be

    observed, the Percentile-CRPA measure remains unaffected and able to separate skill from

    highly performing lucky funds, while the other tests have net power measures that fall

    below zero, indicating that the test is swayed by the extra return of the lucky funds and

    attributes skill to these portfolios more often than it does to truly skillful ones.

    [Figure 6 about here]

    5. Empirical Tests of Measures of Skill

    5.1 Sample of Investment Funds and Required Data

    While most performance measures require only portfolio returns, CRPA needs a wider

    range of data for its implementation. The goal is to obtain as complete a picture as possible

    of the constraints faced by the fund manager in her decision making process, in order to

    integrate as many of these constraints into the CRPA portfolio formation process as

    possible.

    One of the first, and most important, explicit constraints placed on any fund manager is the

    universe of securities which are eligible to be part of the fund, which is a subset of the

    securities available in the market. This constraint is clearly defined in the fund's

    prospectus, and is an integral part of the manager's mandate and investment strategy.

    While CRPA can be applied to virtually any kind of investment fund, to generate the

    random portfolios we require a dataset containing the time series of returns of all assets

    eligible to be part of the portfolio. Thus, for example, if we wished to analyze a corporate

    bond portfolio, any and all bonds that the manager might conceivably invest in must be

    included in this dataset, so that random portfolios could eventually contain these assets as

    well. While firms tend to have a single stock listed in one exchange, they can (and do) have

    various issues of bonds trading in the markets, which invariably makes the amount of data

    required far larger. The same can be said for funds which are allowed to trade derivatives

    and other assets (and even simple equity funds, which can trade stocks listed in various

    markets, worldwide). Again, while conceptually the process is the same, the practical

    aspects become more complicated. In order to simplify the data gathering and random

    portfolio generation process, I choose to analyze a sample of funds that invest primarily in

    stocks of firms listed in the S&P500 index.

    Funds are selected that consistently maintain positions in S&P500 stocks that equal or

    exceed 90% of their assets (i.e., are mostly invested in these stocks) throughout the period

    under study, which spans 6 years, from 2005 to 2010.

    The data then collected includes the monthly returns of S&P500 stocks, as well as each

    fund's monthly returns, and yearly measures of turnover and asset count (number of assets

    in the portfolio). Table I contains the sample fund names and Nasdaq tickers, as well as the

    average values observed for turnover and asset count measures (which are used as random

    portfolio generation constraints, in conjunction with the sample of S&P500 stocks) for the

    2005-2010 period. Investment fund quarterly holdings are collected as well. All data is

    obtained from CRSP6.

    [Table I about here]

    While the overall sample average turnover rate and asset count data is presented for each

    fund, the algorithm that produces the samples of random portfolios required to implement

    CRPA works better with bounds expressed as ranges of permissible values, as opposed to

    the fixed values shown above. Thus, the average minimum and maximum turnover and

    asset count for each real portfolio is calculated7, and these are then used as random

    portfolio formation restrictions, in conjunction with the eligible stocks themselves and a

    diversification restriction, expressed as a maximum capital allocation to any one stock of

    10%.
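
    To make the role of these restrictions concrete, the sketch below draws one random long-only portfolio subject to three of them: an eligible universe, a permissible range for the asset count, and a maximum weight per stock. It is a simplified illustration and not the algorithm used in this study: the tickers are placeholders, the function names are hypothetical, and the turnover restriction, which requires linking consecutive portfolios through time, is not modelled.

```python
import random

def capped_random_weights(n_assets, cap, rng):
    """Random long-only weights that sum to 1 with no weight above cap."""
    if n_assets * cap < 1.0:
        raise ValueError("infeasible: n_assets * cap must be at least 1")
    raw = [rng.expovariate(1.0) for _ in range(n_assets)]
    total = sum(raw)
    w = [x / total for x in raw]
    capped = [False] * n_assets
    while True:
        over = [i for i in range(n_assets) if not capped[i] and w[i] > cap]
        if not over:
            return w
        # Fix offending weights at the cap and rescale the remaining ones
        # so that the portfolio is still fully invested.
        for i in over:
            w[i], capped[i] = cap, True
        free = [i for i in range(n_assets) if not capped[i]]
        free_total = sum(w[i] for i in free)
        if free_total > 0.0:
            scale = (1.0 - cap * sum(capped)) / free_total
            for i in free:
                w[i] *= scale

def random_portfolio(universe, min_assets, max_assets, cap, rng):
    """Pick a random number of eligible stocks and assign capped weights."""
    n = rng.randint(min_assets, max_assets)
    names = rng.sample(universe, n)
    return dict(zip(names, capped_random_weights(n, cap, rng)))

rng = random.Random(42)
universe = [f"STOCK{i:03d}" for i in range(500)]  # placeholder tickers
portfolio = random_portfolio(universe, 26, 60, 0.10, rng)
```

    Repeating the draw many times yields a sample of portfolios that a manager without skill could have held under the same mandate.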

    Although these funds compete in the same market segment, and therefore have very similar

    mandates, we can already see that the restrictions faced (or imposed) by each manager can

    have large variations. While the average turnover rate for the sample is 1.66, the minimum

    6 The Name column has the complete registered name of each fund, while the Name (short) column

    contains an abbreviated designation, which will be used throughout the analysis.

    7 Data available upon request.

    reported is 0.13 (Jensen) while the largest is above 10 (Rydex Growth). For Asset Count,

    the average number of assets under management is 106, with a minimum of 26 (Jensen) and

    a maximum in excess of 500 (Vanguard). This last one could conceivably be hard to

    simulate with random portfolios, given that inevitably it will contain stocks not listed in the

    S&P500 index. However, funds were chosen by imposing the condition that at least 90% of

    their assets be invested in S&P500 stocks. Thus, random funds that only contain these stocks

    will still be a close approximation of the assets eligible to the fund manager, while the other

    stocks that comprise the list reported at one point must represent very minor holdings.

    Table II shows descriptive statistics of each fund's time series of returns. The market

    portfolio is included as a benchmark8.

    [Table II about here]

    For the six-year period between 2005 and 2010, the average Holding Period Return (HPR)

    for the sample is 18%, while mean monthly return is close to 0.35%. As with management

    restrictions, there is much variability in the sample, with the minimum return being 0.14%

    per month (ProFunds) and a maximum of 0.58% per month (SunAmerica). It should be

    noted that the market portfolio shows a monthly performance close to the best performing

    fund, with most other actively managed funds lagging the market. The median return is

    invariably higher than the mean, evincing skewed distributions, a fact which is confirmed

    by a relatively high level of negative skewness. Also detected in all funds is excess

    kurtosis, which explains why for all funds normality of returns is rejected at the 1% level in

    most cases, and a few at the 5% level (see last column of the table, where the statistic of a

    8 Market portfolio returns are obtained from the Fama & French dataset, which also contains their SMB and

    HML factors, all of which are used later to obtain factor model regression alphas.

    Jarque-Bera test is shown). The non-normality of returns immediately casts doubts on the

    interpretation and accuracy of later performance measures, which rely on normality.
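
    The Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4). A minimal sketch of the computation follows; the function name is illustrative, and the p-value relies on the asymptotic chi-square distribution with two degrees of freedom, which can be a rough approximation for a series of only 72 monthly observations.

```python
import math

def jarque_bera(returns):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(returns)
    mean = sum(returns) / n
    # Central moments with the 1/n normalisation used by the JB test.
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return skew, kurt, jb, p_value
```

    Negative skewness or kurtosis above 3 both push the statistic upward, so either feature on its own can lead to a rejection of normality.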

    As with average monthly return, risk taking is also highly idiosyncratic in these funds, as

    depicted by the standard deviation of the funds' returns. While the average is 5.1%, the

    values range from a minimum of 4.22% (Jensen), to a maximum of 7.95% (Rydex Value).

    The level of risk-taking will, of course, affect some performance measures, such as the

    Sharpe index. It is therefore premature to draw any insight into the funds' qualities, be it

    performance or management skill.

    Finally, in the next section most tests are carried out using gross returns, as the measure of

    skill should, in an absolute sense, be related to the overall performance that a manager can

    obtain. However, the investor does not receive the full benefit of these returns, as they are

    reduced by the fund's fees and other expenses. Thus, some tests are also performed using

    net returns, to analyze how initial results are affected by expenses. Each fund's expenses

    are shown in Table III, both the total expenses as self-reported data9, as well as the expense

    ratio obtained from CRSP.

    [Table III about here]

    5.2 Fund Performance and Tests of Skill

    In this section performance and skill tests are applied to the sample of mutual funds, and

    the results from each are analyzed and contrasted.

    When contemplating an investment in a mutual fund most investors, even those with some

    level of financial education, would consider past measures of return sufficient information to

    base their decisions on. Thus, fund salespeople will seldom present information beyond

    9 Self-reported data is obtained from each fund's publicly available information, such as prospectuses,

    brochures and web pages. These documents and web addresses are available upon request.

    holding period return and/or mean monthly return, data which was presented in the

    previous section, but is repeated in Table IV. Also included are the standard deviation, as

    some investors would also consider measures of risk, and the Sharpe index as a simple risk-

    adjusted measure of return.

    [Table IV about here]
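
    A ranking of this kind can be reproduced with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the risk-free rate defaults to zero because the exact convention behind the Sharpe index in Table IV is not restated here.

```python
from statistics import mean, stdev

def sharpe_index(returns, risk_free=0.0):
    """Mean excess return per unit of (sample) standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

def rank_by_sharpe(funds, risk_free=0.0):
    """Return fund names ordered from highest to lowest Sharpe index."""
    scores = {name: sharpe_index(r, risk_free) for name, r in funds.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

    Because the index is computed from the mean and standard deviation alone, two funds with very different return paths can receive the same score.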

    While these measures give no clue as to the managers' skills, they are by far the most

    employed by management firms in fund marketing and sales, and by investors to choose

    between investment options. In order to contrast the decision results based on these

    statistics with those of more advanced methodologies, the last column presents a ranking of

    funds by their Sharpe indexes. Although in the previous sections we saw that there is

    appreciable variability in fund risk and returns, ranking by the Sharpe ratio is similar to

    ranking by raw returns. This is perhaps because these funds operate under similar mandates

    and in the same market niche, prompting a sufficiently similar risk-taking behavior to make

    this variable have little impact when correcting returns to take it into account. Regarding

    performance itself, as can be seen in the Sharpe or Ranking columns, the best fund is

    SunAmerica, while the market portfolio is the second best performing fund in the sample.

    While this has been previously reported, this is bad news for fund management, as

    passive management is consistently cheaper (in terms of transaction costs and fees) than

    active management, so if the passive market portfolio performs better, then there would

    seem to be very little evidence in favor of active management.

    Previous studies make extensive use of factor models to estimate regression alphas. These

    alphas have been interpreted as a performance measure (as is, for example,

    Jensen's alpha), but increasingly they have come to represent fund manager skill in the

    prevalent literature. Table V shows the alphas obtained for each fund under study, using

    unconditional versions of the single factor model (as used to obtain Jensen's alpha), Fama

    & French's three factor model, and Carhart's four factor model10.

    [Table V about here]
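
    In the single factor case, alpha is the intercept of a regression of the fund's excess return on the market's excess return. A minimal ordinary least squares sketch follows; the function name is illustrative, both inputs are assumed to be already net of the risk-free rate, and significance testing is omitted. The three and four factor models add further regressors (size, value and momentum factors), which requires a multiple regression routine.

```python
def ols_alpha_beta(fund_excess, market_excess):
    """Intercept (alpha) and slope (beta) of fund excess returns on the market."""
    n = len(fund_excess)
    mx = sum(market_excess) / n
    my = sum(fund_excess) / n
    sxx = sum((x - mx) ** 2 for x in market_excess)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market_excess, fund_excess))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta
```

    A positive alpha means the fund earned more than its market exposure alone would explain over the sample period.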

    Also as reported previously, most alphas turn out to be insignificant. The two exceptions

    are WaMu and ProFunds, which exhibit alphas that are negative and significant. If

    the standard skill interpretation of regression alphas were to be employed, then we could

    say that the managers of these funds actively subtract value through their actions, as

    opposed to adding value (which would be the interpretation of a positive and significant

    alpha).

    It should also be noted that, while these two funds with negative and significant alphas have

    very low Sharpe indexes compared to the rest of the sample, the correlation between

    negative alpha and poor performance is not perfect, as WaMu ranks 18th but, for example,

    Rochdale Value and Ameristock rank 19th and 20th respectively, and their alphas are

    negative but insignificant, as are those of most other funds.

    Notwithstanding the popularity and extensively documented applications of factor models,

    as reported in the first section, factor model alphas have been criticized and new

    methodologies proposed to obtain better measures of fund manager skill, the main

    contender being Kosowski et al.'s Bootstrap Alpha. Though employed to evaluate a full

    market of funds, this methodology has since been applied by Cuthbertson et al. to test

    individual funds for manager skill. Following their methodology, I test the 20 funds in the

    sample for fund manager skill using the bootstrap alpha methodology. As in Cuthbertson et

    al., I use two separate (though complementary) hypotheses to test for significance on both

    tails of the resulting empirical distributions,

    10 The market portfolio is not included in the table as, by definition, its alpha should be zero.

    Hypothesis A: fund manager has skill or adds value,

    $H_A$: $H_0: \alpha \le 0$, $H_1: \alpha > 0$    (15)

    Hypothesis B: fund manager has negative skill or actively destroys value,

    $H_B$: $H_0: \alpha \ge 0$, $H_1: \alpha < 0$    (16)

    Table VI shows the funds' real Carhart four-factor alpha, as well as the empirical p-values

    obtained from each fund's bootstrapped distribution for both hypotheses.

    [Table VI about here]
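
    The logic of a bootstrap alpha test can be sketched with a residual bootstrap under the null of zero alpha. The code below is a simplified illustration and not the procedure used to produce Table VI: it uses a single factor instead of the Carhart four factor model, it compares raw alphas where the original papers also work with the t-statistic of alpha, and the function names and number of replications are assumptions.

```python
import random

def fit(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta

def bootstrap_alpha_pvalues(fund, market, n_boot=1000, seed=0):
    """Return (alpha_hat, p-value for H1: alpha > 0, p-value for H1: alpha < 0)."""
    rng = random.Random(seed)
    alpha_hat, beta_hat = fit(fund, market)
    resid = [y - alpha_hat - beta_hat * x for y, x in zip(fund, market)]
    boot_alphas = []
    for _ in range(n_boot):
        # Pseudo-returns with alpha forced to zero and resampled residuals.
        pseudo = [beta_hat * x + rng.choice(resid) for x in market]
        boot_alphas.append(fit(pseudo, market)[0])
    p_a = sum(a >= alpha_hat for a in boot_alphas) / n_boot
    p_b = sum(a <= alpha_hat for a in boot_alphas) / n_boot
    return alpha_hat, p_a, p_b
```

    The bootstrap alphas describe what sampling variation alone produces when the true alpha is zero, so an observed alpha far in either tail of that distribution is evidence against the corresponding null.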

    While it is not surprising that the null of no skill is not rejected for any of the funds (see

    HA pval), the last column shows that a startling number of fund alphas (19 out of the total

    20) are negative and significant at the 1% level, indicating value-destroying management.

    Shocking though these results may seem, they do seem coherent in conjunction with the

    previously studied statistics. Specifically, if we assume the market to have zero alpha, then

    if most of these funds tend to lag the market in terms of performance (both raw and risk

    adjusted), it is not surprising that their alphas should be negative. As to the statistical

    significance of these alphas, regular tests are at odds with the bootstrap analysis, but the

    general trend is clear and consistent.

    The real question here is whether we are actually measuring skill, or these are still measures

    of performance, so influenced by extraneous factors that the existence of the fund

    managers' skill cannot be ascertained. That is, these are all measurements obtained from

    factors related to market and other portfolios' performance, and as such are more akin to

    benchmarks than true measures of individual skill, which, while related to observable

    performance, would not be determined by it.

    Next, I apply CRPA measures to the sample of mutual funds. Table VII contains the

    resulting empirical p-values obtained from both the Burns CRPA measure and the three

    tests used to determine stochastic order in the Percentile CRPA test.

    [Table VII about here]
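
    The time series against which the real fund is compared in the Percentile CRPA test can be built by taking, period by period, a chosen percentile of the cross-section of random portfolio returns. The sketch below assumes a nearest-rank percentile and a list-of-lists layout for the simulated returns; both are illustrative choices, as are the function names.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile of values, for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]

def percentile_series(random_returns, q=95.0):
    """random_returns[p][t] is the return of random portfolio p in period t.

    Returns one value per period: the q-th percentile across portfolios.
    """
    n_periods = len(random_returns[0])
    return [
        percentile([path[t] for path in random_returns], q)
        for t in range(n_periods)
    ]
```

    The resulting series has the same length as the fund's own return series, so the two can be compared with any two-sample test of stochastic order, such as the MWW test, a t-test or a permutation test.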

    As can be observed, in this sample of 20 mutual funds the null of no skill is not rejected in

    most cases11. However, for the Jensen fund all variants of the CRPA measure reject the null

    at the 5% level or better, while for 6 other funds only the Burns measure rejects the null.

    The power tests in the first chapter of this dissertation show that the Burns measure can

    reject the null in the presence of a fund managed with no skill, but with a sufficiently high

    overall (holding period) return, since this measure analyzes only such returns, as opposed to

    the way in which the return is composed, that is, the distribution of partial returns (in the

    case of the referred tests, the time series of monthly returns). Thus, the recommendation

    gleaned from the power tests is to reject the null only when both CRPA measures do so12.

    Looking at the results in Table VII, the obvious inference is that only the Jensen fund is

    truly managed with skill, while the funds where only the Burns measure rejects the null

    managed to obtain a holding period return large enough to put them at the extreme of the

    random portfolio distribution.
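
    The cross-sectional (Burns) measure locates the fund's holding period return within the distribution of holding period returns of the random portfolios. A minimal sketch follows; the function names are illustrative, and the add-one form of the empirical p-value is a common convention for simulation-based tests that is assumed here, not taken from the tables above.

```python
def holding_period_return(returns):
    """Compound a series of periodic returns into a single holding period return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0

def burns_p_value(fund_returns, random_returns):
    """Empirical p-value for the null of no skill.

    random_returns is a list of return series, one per random portfolio.
    """
    fund_hpr = holding_period_return(fund_returns)
    random_hprs = [holding_period_return(path) for path in random_returns]
    at_least = sum(h >= fund_hpr for h in random_hprs)
    return (at_least + 1) / (len(random_hprs) + 1)
```

    Since only the compounded total enters the calculation, a fund that owes its result to a handful of exceptional months is treated exactly like one that earned the same total steadily.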

    Comparisons of the time series distributions of these funds' returns prove to be

    enlightening.

    11 Skill in this case refers to the ability to add value for the investors. CRPA analysis is not used here to

    test the other tail of the distributions, to ascertain if there is value destroying behavior, as previously

    reported with bootstrap alphas.

    12 The Vanguard fund might also be a candidate for a skillful manager, since the null is also rejected by the

    MWW test, the most sensitive test used to discriminate between the random portfolio percentile and the

    real fund's distribution.

    In Table II we can see that while the Jensen fund has a high holding period return (the

    statistic used in the Burns measure) compared to the rest of the sample, it is not the highest

    (which belongs to Seligman Value). However, the Jensen fund reaches the second highest

    overall return while maintaining the lowest volatility of returns, as seen in its standard

    deviation (4.22% per month versus Seligman Value's 5.72%, the highest in the sample).

    This is evidence that the Jensen fund achieves its performance through steady returns which

    are more likely attributable to superior skill, as opposed to distributions with a few periods of

    high return and more periods of low returns, which lead to higher volatility and can be

    interpreted as luck. This interpretation is bolstered by the fact that these funds operate in the

    same market, and have very similar mandates (which would not be the case if, for example,

    we were comparing equity and bond funds).

    To illustrate the above conjecture, Figure 7 displays the probability density of the funds'

    time series of returns, comparing the density of the Jensen fund (full line), ranked 4th in the

    sample, with that of the best ranked fund, SunAmerica (dashed line).

    [Figure 7 about here]

    While these distributions' centers seem to be close (mean monthly return for Jensen is

    0.4%, compared to SunAmerica's 0.6%), the higher volatility of the SunAmerica fund is

    clearly seen as a lower peak in the probability mass, and fatter tails, depicting a fund that

    may have attained single high returns in certain periods, but is less likely to obtain similar

    future performance (i.e., lower probability of obtaining a result close to its historical mean).

    A final note concerns the use of the evidence presented in a hypothetical decision making

    process. If the investor is presented with the usual performance statistics, the decision

    would inevitably be to invest in the highest ranked fund, that is, in SunAmerica13. If the

    decision is to be based on factor model alphas, then no fund appears to be superior to the

    market portfolio, whereas if bootstrapped alphas are contemplated, then all of these funds

    would definitely be discarded as potential investment alternatives. Only CRPA tests show

    any glimmer of hope for these actively managed funds. Using these results, an investor

    would consider investing in the Jensen fund, pending further analysis of costs and fees, to

    compare it with a passive strategy.

    While fund expenses should not be contemplated in a pure analysis of manager skill, they

    do impinge on an investor's decision-making process. Thus, the question is, how sensitive

    are the previously derived measures to the addition of expenses? In Table VIII the CRPA-

    based statistics presented in Table VII are reproduced, but recalculated using fund returns

    net of expenses.

    [Table VIII about here]
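
    One simple way to move from gross to net returns is to charge one twelfth of the annual expense ratio against each monthly return. The sketch below makes that assumption explicitly; the function name is hypothetical, and the actual accrual of fees by each fund may differ.

```python
def net_monthly_returns(gross_returns, annual_expense_ratio):
    """Subtract an evenly accrued share of annual expenses from each monthly return."""
    monthly_fee = annual_expense_ratio / 12.0
    return [r - monthly_fee for r in gross_returns]
```

    With a 1.2% annual expense ratio, for example, each monthly return is lowered by 0.1 percentage points, which is of the same order of magnitude as the mean monthly returns reported for these funds.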

    This analysis, made from the point of view of the investor, shows that where the Jensen

    fund was previously identified as skillfully managed by most tests, it now only registers

    significance in the Burns cross-sectional test, at first glance making inference about

    manager skill unreliable. The correct interpretation of this result is that, though the manager

    appears to be skillful, expenses lower the expected benefit to the investor to the point where

    she is equally well off investing randomly by herself (and thus avoiding the charges). In

    other words, and as has been concluded in previous articles, any overperformance of the

    fund seems to be charged away from the investor, so that the benefits of the manager's skill

    are enjoyed only by the brokerage firm and/or the manager herself.

    13 Unless expenses and fees are also considered, in which case perhaps a passive, market index fund would

    beat all alternative strategies.

    Finally, consider the analysis that could be made of this data by the fund managers

    themselves. The Jensen fund seems to be the only skillfully managed portfolio in the

    sample, but it does not obtain the best returns. While luck inevitably plays a part in all

    financial results, further analysis can be made contrasting the management constraints faced

    by the Jensen manager to those imposed on other managers of similar funds. As was seen in

    the previous section, the Jensen fund has the lowest average turnover ratio of the sample, as

    well as the lowest asset count. In the first chapter of this dissertation it is shown that for

    samples of random funds (where all other variables are controlled for), differences in

    management constraints can have an impact in the resulting return distributions. As an

    example, for relatively low levels of turnover, having a low or high number of assets under

    management is dominated as a strategy by a mid-range level. This points to potential areas

    of improvement worth investigating. Perhaps the only thing the Jensen fund must do to

    improve performance is use the already detected skill to manage a larger number of assets,

    compared to its present level.

    One further point is raised in Lisi (2011), who implements a measure of skill based on

    random portfolios, but instead of the CRPA methodology, generates simple equal-weighted

    portfolios from randomly selected stocks traded in the Italian market. Lisi then goes on to

    apply risk adjustment and other measures to the sample of random funds before using them

    to make statistical tests. However, the CRPA methodology makes these adjustments

    unnecessary. Consider the fact that portfolio risk is a function of manager decisions,

    coupled with management restrictions. Thus, adding fund manager constraints to the

    random portfolio generation algorithm eliminates the need for further processing of the

    resulting sample, be it the use of risk adjustment or factor models. That is, risk is already

    restricted to the potentially available portfolios, and applying risk adjustment measures

    should not alter the inference obtained from a CRPA analysis. I confirm this with

    further tests in which I apply the CRPA tests to the portfolios' Sharpe measures and

    regression alphas. The results, withheld for brevity14, do not change the conclusions

    described above. That is, there is no added value obtained from applying further measures,

    and the correct application of CRPA to simple portfolio returns suffices.

    14 Available upon request.

    6. Conclusions

    A new general framework for investment fund analysis using randomly trading portfolios is

    outlined and one application, a test of fund manager skill, is developed and fully studied.

    The skill test based on CRPA is found to be a powerful and appealing alternative to

    traditional methods. On one hand, the statistical properties of the resulting distributions are

    free from various assumption problems and biases long recognized in other families of

    tests, in particular, the problems of parametric regression-based measures. On the other

    hand, while fund manager skill is the focus of this paper, the implications and potential

    applications of this methodology are extremely varied.

    As an empirical application of the CRPA-based test of skill, a sample of U.S. large cap

    mutual funds is analyzed from the point of view of a prospective investor. Standard

    performance measures usually employed to make investment decisions are estimated, and

    tests of skill are applied, where the null hypothesis for all tests is that managers have no

    skill. The results obtained from standard and bootstrap regression alpha methodologies are,

    at best, inconclusive, and in the worst cases show alphas which are negative and significant,

    signifying negative skill or a reduction in portfolio value attributable to the managers'

    actions. CRPA skill tests of the sample of funds reveal that, while the null is not rejected

    for most of them, skill can be tentatively identified in a few. Results are of further interest

    because, unlike the case of regression alphas, CRPA test results are not necessarily

    correlated with performance measures. In fact, the one fund where all CRPA tests reject the

    null is not the best performing fund in the sample (in terms of returns and Sharpe ratio), but

    the fourth best. While the CRPA skill tests detect skill in some funds, once fund fees are

    deducted from their returns, the tests fail to reject the null of no skill, which is consistent

    with previous literature that shows that, while there may be some value added by a few

    money managers, this value is charged away from the investors.

    The application of the CRPA skill test described above would serve as a guide for investor

    decision making. However, the same test can be used in other applications.

    As a diagnostic tool for fund management firms, we can consider the case in which the null

    is rejected for a fund manager, and therefore we can consider this manager as possessing

    skill, but nevertheless the fund's performance lags behind a benchmark or peer group.

    Further analysis of the trading constraints could, in theory, pinpoint areas where the

    manager is over or under restricted, with respect to the competition. If these bounds can be

    changed (for example, allow the manager to take on more risk, or to increase the frequency

    of trades), then performance could be easily improved. More generally, if a group of

    managers with certain mandates lags in performance with respect to others with a different

    set of goals, then perhaps what is being uncovered is a systematic market anomaly. An

    example of a known anomaly, size, might manifest as investors in small firms obtaining

    better returns than those who invest in large firms, after controlling for manager skill.

    Perhaps CRPA could help discover other, hitherto unreported, anomalies.

    There are also implications for the fund manager job market: would hiring be based on

    track record alone and other second-hand sources of data, if skill could be measured

    reliably? Also, fund charges to investors could potentially be analyzed and based on

    manager skill, which is a more direct link to potential performance than other measures.

    CRPA provides a robust new framework for various types of investment fund analysis,

    including testing for fund manager skill. The results shown here confirm that CRPA tests

    are more sensitive in detecting skill than factor model based tests, and their results are

    also easier to interpret. Unlike other measures, no risk adjustment is required, and

    potential econometric problems such as non-normalities are not an issue, due to the use of

    non-parametric statistics.
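
    As a minimal sketch of the kind of non-parametric comparison referred to here, the Python fragment below runs a one-sided permutation test on the difference in mean returns between a fund and a benchmark series. The function name, the simulated data and the number of permutations are illustrative assumptions, not the implementation used in this paper, which compares each fund against a percentile series obtained from constrained random portfolios.

```python
import random


def permutation_pvalue(fund, benchmark, n_perm=2000, seed=0):
    """One-sided p-value for H1: mean(fund) > mean(benchmark)."""
    rng = random.Random(seed)
    n = len(fund)
    m = len(benchmark)
    observed = sum(fund) / n - sum(benchmark) / m
    pooled = list(fund) + list(benchmark)
    hits = 0
    for _ in range(n_perm):
        # Under the null the labels "fund" and "benchmark" are exchangeable.
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / m
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 72 monthly returns, not taken from the paper's sample.
rng = random.Random(42)
benchmark = [rng.gauss(0.004, 0.05) for _ in range(72)]
skilled = [r + 0.05 for r in benchmark]  # exaggerated outperformance

p_skilled = permutation_pvalue(skilled, benchmark)
p_same = permutation_pvalue(benchmark, benchmark)
```

    No distributional assumption enters the calculation: the reference distribution is built entirely by relabelling the observed returns, which is why non-normal returns pose no problem for this family of tests.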

    Further applications of CRPA include, for example, testing for sources of manager skill

    once the null of no skill is rejected. Simple and intuitive statistics can be employed in

    conjunction with random portfolio methodology to test for stock picking and asset

    allocation abilities. Beyond fund manager skill, CRPA can be employed as a non-

    parametric alternative to traditional measures which can be severely weakened in their

    applicability by specification problems. One example is measures of market herding, which

    suffer from the lack of a distribution under the null hypothesis of no herding. CRPA-

    generated markets can serve to obtain an empirical distribution in which there is no

    herding, against which to benchmark the resulting herding statistic for a more precise

    measure of statistical significance.

    In all, CRPA is an important addition to the finance analysis toolkit.

    List of References

    Burns, P., 2007, Random Portfolios for Performance Measurement, in Erricos John Kontoghiorghes & Cristian Gatu eds.: Optimization, Econometric and Financial Analysis (Springer).

    Burns Statistics (2011). PortfolioProbe: Portfolio Probe. R package version 1.03. http://www.burns-stat.com/.

    Carhart, M., 1997, On Persistence in Mutual Fund Performance, Journal of Finance, Vol. 52, No. 1, 57-82.

    Christopherson, J., Ferson, W., Glassman, D., 1998, Conditioning Manager Alphas on Economic Information: Another Look at the Persistence of Performance, Review of Financial Studies, Vol. 11, No. 1, 111-142.

    Cuthbertson, K., Nitzsche, D., O'Sullivan, N., 2008, UK mutual fund performance: skill or luck?, Journal of Empirical Finance 15, 613-634.

    Dawson, R., R. Young, 2003, Near-uniformly distributed, stochastically generated portfolios, in Stephen Satchell & Alan Scowcroft eds.: Advances in Portfolio Construction and Implementation (Butterworth-Heinemann Finance).

    Fama, E., K. French, 2010, Luck versus skill in the cross-section of mutual fund returns, Journal of Finance, Vol. LXV, No. 5, 1915-1947.

    Ferson, W., R. Schadt, 1996, Measuring fund strategy and performance in changing economic conditions, Journal of Finance, Vol. 51, No. 2, 425-461.

    Kosowski, R., Timmermann, A., Wermers, R., White, H., 2006, Can Mutual Fund Stars Really Pick Stocks? New Evidence from a Bootstrap Analysis, Journal of Finance, Vol. LXI, No. 6, 2551-2595.

    Lisi, F., 2011, Dicing with the market: randomized procedures for evaluation of mutual funds, Quantitative Finance, Vol. 11, No. 2, 163-172.

    R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

    Shaked, M. and J. G. Shanthikumar, 1994, Stochastic Orders and their Applications, Academic Press.

    Silli, B., 2006, Modern Approaches in the Evaluation of Management Skill in the Mutual Fund Industry (working paper).

    Sharpe, W., 1992, Asset Allocation: Management Style and Performance Measurement, Journal of Portfolio Management, Vol. 18, No. 2, 7-19.

    Appendix I: Properties of Random Portfolio Samples for Power Tests

    Property 1: The expected return for all funds is the same, whether skillful or lucky.

    We can see this by taking expectation in (8) and (9). For both equations we have that

    () = + 10

    Property 2: Lucky portfolios have higher variance than skillful ones.

    I calculate the variance of each type of fund. For the skillful funds we have

    2 = 2+

    2+ 2 + 2 , + 2 , + 2, 11

    But is a vector of equal numbers (zero variance), and by construction, 2 = 2,

    thus

    2 = 22 +2 , 11

    On the other hand, for the lucky funds,

    2 = 2+

    2+ 2 +2 ,+ 2 , + 2 , 12

    2 = 22 + 2+2 ,+2 , +2 , 13

    The difference between these two variances is

    2 2 = 2+2 , +2 , 14

    Now, looking at the terms on the right-hand side of this equation, we can see that the first

    term is always positive, since it is a variance, and the second term is also positive, as just

    plus a vector of zero or positive constants. The last term, the covariance between the

    added return vector and the error term, can be positive or negative. However, simulation

    shows that the probability that the complete expression (the sum of the three terms) is

    negative is very low15. Thus, the difference in variances tends to be positive, and therefore

    lucky portfolios tend to have a higher variance than skillful portfolios.

    15Full results of this Monte Carlo test are not reported in the interest of brevity, but are available upon

    request. In short, the average difference of the variances obtained from 1,000 Monte Carlo iterations is

    almost always positive. The only negative values appear in some samples when the added return factor is

    set to zero. In this case both samples are drawn from the same distribution, and thus the difference of

    variances has a 50/50 chance of being positive or negative.
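
    The Monte Carlo check described in this footnote can be illustrated with the following Python sketch. It is written under stated assumptions rather than as a reproduction of equations (8) and (9): a skillful fund is taken to be a baseline return series plus a constant extra return plus noise, and a lucky fund is the same baseline and noise with the same total extra return concentrated in a few randomly chosen periods. All parameter values and names are hypothetical.

```python
import random


def pvar(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_gap(rng, periods=72, extra=0.01, lucky_periods=5):
    """Variance of a lucky series minus variance of a skillful series."""
    base = [rng.gauss(0.004, 0.05) for _ in range(periods)]
    noise = [rng.gauss(0.0, 0.01) for _ in range(periods)]
    # Skill (assumed form): the same extra return in every period.
    skillful = [b + extra + e for b, e in zip(base, noise)]
    # Luck (assumed form): the same total extra return, but concentrated
    # in a handful of periods, so both series share the same mean.
    bump = extra * periods / lucky_periods
    lucky_idx = set(rng.sample(range(periods), lucky_periods))
    lucky = [
        b + (bump if t in lucky_idx else 0.0) + e
        for t, (b, e) in enumerate(zip(base, noise))
    ]
    return pvar(lucky) - pvar(skillful)


rng = random.Random(1)
gaps = [variance_gap(rng) for _ in range(1000)]
avg_gap = sum(gaps) / len(gaps)
share_negative = sum(g < 0 for g in gaps) / len(gaps)
```

    Under these assumptions the gap equals the variance of the lucky increments plus two covariance terms, so it is positive on average and negative only when the covariances happen to be strongly negative in a given draw.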

    Table I

    Identification data, average yearly turnover rate and asset count for funds in sample.

    The yearly turnover ratio is the dollar value of all trades occurring in each year (buy and

    sell) divided by the total value of assets at the beginning of the year. The figure shown is

    the average turnover ratio for the 6 year period studied. Similarly, asset count is the 6 year

    average of the yearly number of assets in each portfolio.

    Nasdaq Ticker  Fund Name                                                              Name (short)     Turnover Ratio  Asset Count

    CBXCX          Calamos Investment Trust: CALAMOS Blue Chip Fund                       Calamos          0.46            108

    JAMEX          Williamsburg Investment Trust: Jamestown Equity Fund                   Jamestown        0.52            65

    JENSX          Jensen Portfolio, Inc                                                  Jensen           0.13            26

    NOLVX          Northern Funds: Large Cap Value Fund                                   Northern         0.45            48

    SGRCX          Seligman Growth Fund, Inc.                                             Seligman Growth  1.57            64

    SVLCX          Seligman Value Fund Series, Inc: Seligman Large-Cap Value Fund         Seligman Value   0.23            35

    FDSTX          SunAmerica Focused Series, Inc: Focused Dividend Strategy Portfolio    SunAmerica       1.69            30

    VTGIX          Vanguard Tax-Managed Funds: Vanguard Tax-Managed Growth & Income Fund  Vanguard         0.14            525

    ACGKX          Van Kampen Growth & Income Fund: Growth & Income Fund                  Van Kampen       0.36            74

    WSHCX          Washington Mutual Investors Fund, Inc                                  WaMu             0.22            132

    AMSTX          Ameristock Mutual Fund, Inc                                            Ameristock       0.20            37

    LOMAX          Advisors Series Trust: Edgar Lomax Value Fund                          Lomax            0.52            48

    DDVCX          Delaware Group Equity Funds II: Delaware Value Fund                    Delaware         0.28            34

    HGKEX          Advisors' Inner Circle Fund: HGK Equity Value Fund                     HGK              0.53            48

    RIMGX          Rochdale Investment Trust: Rochdale Large Growth Portfolio             Rochdale Growth  0.52            77

    RIMVX          Rochdale Investment Trust: Rochdale Large Value Portfolio              Rochdale Value   0.55            90

    BHGSX          Baird Funds, Inc: Baird LargeCap Fund                                  Baird            0.46            49

    LVPIX          ProFunds: Large-Cap Value ProFund                                      ProFunds         6.64            337

    SFECX          Rydex Series Funds: Large-Cap Growth Fund                              Rydex Growth     10.08           144

    SEGIX          Rydex Series Funds: Large-Cap Value Fund                               Rydex Value      7.67            152

    Table II

    Summary statistics of fund sample returns

    HPR is the holding period return for the 6 year period under study. Mean, Median, Standard

    Deviation (St. Dev.), Skewness and Kurtosis are calculated for each fund based on its monthly

    returns. The last column shows the Jarque-Bera test of normality statistic for each fund, with

    significance being denoted with *, ** and *** for a 10%, 5% and 1% level, respectively.

    Portfolio HPR Mean Median St. Dev. Skew Kurtosis Jarque Bera

    Market 0.32 0.0052 0.0117 0.0508 -0.8903 4.7215 18.15***

    Calamos 0.21 0.0038 0.0119 0.0472 -0.8878 4.8158 19.08***

    Jamestown 0.15 0.0029 0.0095 0.0444 -1.1266 5.4389 32.62***

    Jensen 0.25 0.0041 0.0092 0.0422 -0.8768 5.215 23.61***

    Northern 0.14 0.0032 0.0119 0.0526 -0.7402 4.5334 13.44***

    Seligman Growth 0.22 0.0042 0.0066 0.0534 -0.9104 4.7767 19.15***

    Seligman Value 0.26 0.0049 0.0096 0.0572 -0.737 5.0373 18.71***

    SunAmerica 0.35 0.0058 0.01 0.0559 -0.0419 5.6083 20.15***

    Vanguard 0.20 0.0037 0.0125 0.048 -0.8513 4.4199 14.54***

    Van Kampen 0.22 0.0039 0.0082 0.0474 -0.7083 3.7443 7.58**

    WaMu 0.11 0.0025 0.0106 0.0443 -1.0576 5.0582 25.77***

    Ameristock 0.09 0.0022 0.0102 0.0451 -0.6409 4.1218 8.58**

    Lomax 0.17 0.0034 0.0135 0.0489 -0.9321 4.4941 16.89***

    Delaware 0.18 0.0034 0.011 0.0443 -1.0189 4.2503 16.91***

    HGK 0.20 0.0038 0.0134 0.049 -1.1034 5.2559 29.46***

    Rochdale Growth 0.15 0.0034 0.0092 0.0535 -0.632 4.589 12.2***

    Rochdale Value 0.08 0.0026 0.0119 0.0545 -0.9462 5.4183 27.89***

    Baird 0.14 0.0032 0.0044 0.0515 -0.7186 5.5828 25.85***

    ProFunds 0.00 0.0014 0.0115 0.0521 -0.8977 4.404 15.37***

    Rydex Growth 0.23 0.0044 0.0036 0.0547 -0.5716 4.9627 15.26***

    Rydex Value 0.02 0.0034 0.0096 0.0795 0.1027 6.6613 39.78***
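
    As a worked check of the last column, the Jarque-Bera statistic can be recomputed from the reported skewness and kurtosis with the standard formula JB = n/6 * (S^2 + (K - 3)^2 / 4). The sketch below assumes n = 71 monthly observations; this count is an inference from the reported figures and is not stated in the table. Under that assumption the Market and Calamos rows are reproduced to two decimals.

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness and (raw) kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def jb_pvalue(jb):
    """Asymptotic p-value: JB follows a chi-square with 2 degrees of
    freedom, whose survival function is exp(-x / 2)."""
    return math.exp(-jb / 2.0)


N_OBS = 71  # assumed number of monthly returns (inferred, not reported)

jb_market = jarque_bera(N_OBS, -0.8903, 4.7215)
jb_calamos = jarque_bera(N_OBS, -0.8878, 4.8158)
p_van_kampen = jb_pvalue(7.58)  # reported statistic for Van Kampen
```

    The corresponding chi-square critical values are about 4.61, 5.99 and 9.21 at the 10%, 5% and 1% levels, which agrees with the two funds (Van Kampen and Ameristock) marked ** rather than ***.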

    Table III

    Fund expenses

    Self Reported total expenses were obtained from fund publications (prospectuses, web sites, etc.). Expense

    Ratio data were obtained from CRSP. Both measures reported are yearly costs, as a percentage of assets.

    Portfolio Self Reported Expense Ratio

    Calamos 0.0235 0.0123

    Jamestown 0.0113 0.0110

    Jensen 0.0125 0.0107

    Northern 0.0110 0.0114

    Seligman Growth 0.0197 0.0163

    Seligman Value 0.0215 0.0189

    SunAmerica 0.0095 0.0104

    Vanguard 0.0155 0.0128

    Van Kampen 0.0150 0.0112

    WaMu 0.0149 0.0104

    Ameristock 0.0091 0.0057

    Lomax 0.0099 0.0108

    Delaware 0.0185 0.0137

    HGK 0.0099 0.0086

    Rochdale Growth 0.0150 0.0218

    Rochdale Value 0.0152 0.0207

    Baird 0.0100 0.0179

    ProFunds 0.0273 0.0127

    Rydex Growth 0.0218 0.0239

    Rydex Value 0.0190 0.0167

    Table IV

    Standard portfolio performance measures

    HPR are holding period returns obtained from portfolio data over a period of 6 years.

    Mean is the portfolio's average monthly return, while St. Dev. is the standard deviation of

    those returns. Sharpe is the fund's Sharpe ratio for the period under study. Rank

    corresponds to the rank each fund holds in the sample, ordered by their Sharpe ratios.

    Name HPR Mean St. Dev. Sharpe Rank

    Market 0.32 0.005 0.051 0.064 2

    Calamos 0.21 0.004 0.047 0.040 8

    Jamestown 0.15 0.003 0.044 0.022 16

    Jensen 0.25 0.004 0.042 0.050 4

    Northern 0.14 0.003 0.053 0.024 14

    Seligman Growth 0.22 0.004 0.053 0.043 6

    Seligman Value 0.26 0.005 0.057 0.052 3

    SunAmerica 0.35 0.006 0.056 0.068 1

    Vanguard 0.20 0.004 0.048 0.037 10

    Van Kampen 0.22 0.004 0.047 0.041 7

    WaMu 0.11 0.003 0.044 0.013 18

    Ameristock 0.09 0.002 0.045 0.006 20

    Lomax 0.17 0.003 0.049 0.031 12

    Delaware 0.18 0.003 0.044 0.032 11

    HGK 0.20 0.004 0.049 0.038 9

    Rochdale Growth 0.15 0.003 0.054 0.027 13

    Rochdale Value 0.08 0.003 0.055 0.012 19

    Baird 0.14 0.003 0.052 0.024 15

    ProFunds 0.00 0.001 0.052 -0.010 21

    Rydex Growth 0.23 0.004 0.055 0.046 5

    Rydex Value 0.02 0.003 0.080 0.018 17
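
    The Sharpe column can be approximately recovered from the mean and standard deviation of monthly returns. The sketch below assumes a constant monthly risk-free rate of 0.0019; that value is not reported in this table and is only what the published figures appear to imply, so small rounding differences against the table are to be expected. Function names are illustrative.

```python
from statistics import mean, stdev


def sharpe_from_moments(mean_return, st_dev, risk_free):
    """Sharpe ratio from summary statistics of periodic returns."""
    return (mean_return - risk_free) / st_dev


def sharpe_ratio(returns, risk_free):
    """Sharpe ratio computed directly from a series of periodic returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


RISK_FREE = 0.0019  # assumed monthly risk-free rate (not given in the table)

# Means and standard deviations taken from Table II (four decimals).
sharpe_market = sharpe_from_moments(0.0052, 0.0508, RISK_FREE)
sharpe_profunds = sharpe_from_moments(0.0014, 0.0521, RISK_FREE)
```

    A fund whose mean return falls below the risk-free rate, such as ProFunds here, receives a negative ratio regardless of its volatility.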

    Table V

    Factor model alphas

    Regression alphas obtained from a one factor model (Jensen's alpha), Fama & French's

    three factor model and Carhart's four factor model. Significance is denoted with *, ** and

    *** for a 10%, 5% and 1% level, respectively.

    Portfolios Jensen Fama & French Carhart

    Calamos -0.0011 -0.0007 -0.0007

    Jamestown -0.0018 -0.0016 -0.0016

    Jensen -0.0004 -0.0002 -0.0002

    Northern -0.002 -0.0019 -0.002

    Seligman Growth -0.001 -0.0006 -0.0006

    Seligman Value -0.0005 -0.0002 -0.0003

    SunAmerica 0.0005 0.0003 0.0002

    Vanguard -0.0013* -0.001* -0.001*

    Van Kampen -0.001 -0.0005 -0.0005

    WaMu -0.0022* -0.0017 -0.0017

    Ameristock -0.0025 -0.0022 -0.0022

    Lomax -0.0015 -0.001 -0.001

    Delaware -0.0012 -0.0008 -0.0008

    HGK -0.0012 -0.0008 -0.0008

    Rochdale Growth -0.0017 -0.0015 -0.0014

    Rochdale Value -0.0027 -0.0025 -0.0023

    Baird -0.002 -0.0022 -0.0023

    ProFunds -0.0038** -0.0035*** -0.0035***

    Rydex Growth -0.0009 -0.0012 -0.0012

    Rydex Value -0.0031 -0.0034 -0.0037*

    Table VI

    Bootstrap alphas

    The first column contains the value of the regression alpha obtained from Carhart's four

    factor model. The following two columns show the bootstrap p-values obtained for this

    alpha when testing two hypotheses alternative to the null that alpha is zero. HA tests the

    right tail, or a positive alpha, while HB tests the left tail, or a negative alpha. Significance is

    denoted with *, ** and *** for a 10%, 5% and 1% level, respectively.

    Portfolio Carhart Alpha HA pval HB pval

    Calamos -7.0E-04 1.00 0.00***

    Jamestown -1.6E-03 1.00 0.00***

    Jensen -2.0E-04 1.00 0.00***

    Northern -2.0E-03 1.00 0.00***

    Seligman Growth -6.0E-04 1.00 0.00***

    Seligman Value -3.0E-04 1.00 0.00***

    SunAmerica 2.0E-04 1.00 0.00***

    Vanguard -1.0E-03 1.00 0.00***

    Van Kampen -5.0E-04 1.00 0.00***

    WaMu -1.7E-03 1.00 0.00***

    Ameristock -2.2E-03 1.00 0.00***

    Lomax -1.0E-03 1.00 0.00***

    Delaware -8.0E-04 1.00 0.00***

    HGK -8.0E-04 1.00 0.00***

    Rochdale Growth -1.4E-03 1.00 0.00***

    Rochdale Value -2.3E-03 0.98 0.02**

    Baird -2.3E-03 0.99 0.01***

    ProFunds -3.5E-03 0.97 0.03**

    Rydex Growth -1.2E-03 1.00 0.00***

    Rydex Value -3.7E-03 0.87 0.13
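
    To make the mechanics of a bootstrap p-value for alpha concrete, the following Python sketch applies a residual bootstrap with the null of zero alpha imposed. It is a simplification under stated assumptions: a single market factor replaces the four Carhart factors, the data are simulated, and the p-values are based on the alpha estimate itself. None of the names or parameter values come from this paper.

```python
import random


def ols_alpha_beta(y, x):
    """Intercept and slope of a one-factor OLS regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def bootstrap_alpha_pvalues(y, x, n_boot=1000, seed=0):
    """Return (alpha_hat, right-tail p-value, left-tail p-value)."""
    rng = random.Random(seed)
    alpha_hat, beta_hat = ols_alpha_beta(y, x)
    resid = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
    boot_alphas = []
    for _ in range(n_boot):
        # Resample residuals with replacement and rebuild returns
        # with alpha set to zero, i.e. under the null of no skill.
        e_star = rng.choices(resid, k=len(resid))
        y_star = [beta_hat * xi + e for xi, e in zip(x, e_star)]
        boot_alphas.append(ols_alpha_beta(y_star, x)[0])
    p_right = sum(a >= alpha_hat for a in boot_alphas) / n_boot
    p_left = sum(a <= alpha_hat for a in boot_alphas) / n_boot
    return alpha_hat, p_right, p_left


# Hypothetical fund with a clearly negative monthly alpha of -1%.
rng = random.Random(3)
factor = [rng.gauss(0.005, 0.05) for _ in range(72)]
fund = [-0.01 + 0.9 * f + rng.gauss(0.0, 0.01) for f in factor]

alpha_hat, p_right, p_left = bootstrap_alpha_pvalues(fund, factor)
```

    The right-tail and left-tail p-values play the roles of the HA and HB columns: a fund with a strongly negative alpha yields a right-tail p-value near one and a left-tail p-value near zero.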

    Table VII

    CRPA tests of fund manager skill

    All values shown are empirically obtained p-values. Burns is the p-value from the Burns

    (2007) cross-sectional measure of skill. The other three columns contain p-values derived

    from the percentile or time-series approach to CRPA skill testing in which the distribution

    of a fund's time series of returns is compared with that of a percentile distribution obtained

    from a sample of random funds. The T- and Permutation tests measure significance of the

    difference in the distribution means. MWW is a test of stochastic order, where the null

    hypothesis is that both samples (fund and random portfolio returns) are drawn from the

    same distribution, and the one-tailed alternative is that fund returns are stochastically

    greater than random portfolio returns. Significance is denoted with *, ** and *** for a

    10%, 5% and 1% level, respectively.

    Portfolio Burns T-test MWW Test Permutation Test

    Calamos 0.221 0.767 0.2607 0.747

    Jamestown 0.134 0.7811 0.3631 0.752

    Jensen 0.003*** 0.0387** 0.0343** 0.038**

    Northern 0.129 0.6985 0.1845 0.647

    Seligman Growth 0.426 0.8439 0.3253 0.825

    Seligman Value 0.032** 0.4366 0.2854 0.449

    SunAmerica 0.238 0.8455 0.411 0.816

    Vanguard 0.092* 0.6431 0.039** 0.556

    Van Kampen 0.076* 0.675 0.5352 0.678

    WaMu 0.144 0.7566 0.4898 0.749

    Ameristock 0.003*** 0.1854 0.3212 0.201

    Lomax 0.057* 0.6124 0.6305 0.609

    Delaware 0.132 0.7398 0.4671 0.682

    HGK 0.11 0.7729 0.2758 0.777

    Rochdale Growth 0.183 0.6612 0.1298 0.566

    Rochdale Value 0.263 0.7606 0.1423 0.713

    Baird 0.046** 0.5192 0.107 0.523

    ProFunds 0.998 0.9243 0.9558 0.987

    Rydex Growth 0.819 0.8666 0.5397 0.881

    Rydex Value 0.991 0.8385 0.728 0.767

    Table VIII

    CRPA tests of fund manager skill, adjusting returns for fund expenses

    All values shown are empirically obtained p-values. Burns is the p-value from the Burns

    (2007) cross-sectional measure of skill. The other three columns contain p-values derived

    from the percentile or time-series approach to CRPA skill testing in which the distribution

    of a fund's time series of returns is compared with that of a percentile distribution obtained

    from a sample of random funds. The T- and Permutation tests measure significance of the

    difference in the distribution means. MWW is a test of stochastic order, where the null

    hypothesis is that both samples (fund and random portfolio returns) are drawn from the

    same distribution, and the one-tailed alternative is that fund returns are stochastically

    greater than random portfolio returns. Significance is denoted with *, ** and *** for a

    10%, 5% and 1% level, respectively.

    Portfolio Burns T-test MWW Test Permutation Test

    Calamos 0.794 0.8939 0.6412 0.902

    Jamestown 0.256 0.8555 0.6022 0.826

    Jensen 0.012** 0.1343 0.1423 0.135

    Northern 0.256 0.7792 0.2815 0.743

    Seligman Growth 0.718 0.8922 0.5622 0.883

    Seligman Value 0.139 0.7171 0.6726 0.709

    SunAmerica 0.311 0.8685 0.5216 0.843

    Vanguard 0.347 0.7845 0.2777 0.642

    Van Kampen 0.24 0.8336 0.7715 0.843

    WaMu 0.592 0.9073 0.7966 0.907

    Ameristock 0.004*** 0.3158 0.5261 0.324

    Lomax 0.103 0.7168 0.7801 0.702

    Delaware 0.79 0.8988 0.8215 0.888

    HGK 0.178 0.834 0.4762 0.812

    Rochdale Growth 0.483 0.7418 0.2893 0.65

    Rochdale Value 0.605 0.8423 0.3031 0.811

    Baird 0.087* 0.6127 0.1876 0.584

    ProFunds 1 0.9521 0.9973 1

    Rydex Growth 0.968 0.8938 0.7998 0.956

    Rydex Value 1 0.8656 0.8785 0.854

    Figure 1

    Random fund sample holding period return probability density

    Probability density plot for a sample of 1,000 random funds trading securities listed in the

    S&P 500 for 6 years, from 2005 to 2010, with constraints on turnover and portfolio asset

    count consistent with the mean of these values for real funds currently operating in the

    market. Density obtained from the holding period returns of each of 1,000 funds. Six year

    holding period return plotted on the x-axis, probability density on the y-axis. 95th percentile holding

    period return denoted by dotted line.
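
    The cross-sectional use of such a sample can be sketched as follows: a fund's holding period return is located within the distribution of random-fund holding period returns, and the empirical p-value is the share of random funds that did at least as well. In this illustrative Python fragment the random funds are plain Gaussian return series, which is an assumption made for brevity; the sample in this paper is generated by constrained random trading of S&P 500 constituents.

```python
import random


def holding_period_return(monthly_returns):
    """Compound a series of monthly returns into one holding period return."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth - 1.0


def empirical_pvalue(fund_hpr, random_hprs):
    """Share of random funds whose HPR is at least as high as the fund's."""
    return sum(h >= fund_hpr for h in random_hprs) / len(random_hprs)


# Hypothetical stand-in for the random fund sample: 1,000 funds, 72 months.
rng = random.Random(7)
random_hprs = [
    holding_period_return([rng.gauss(0.004, 0.05) for _ in range(72)])
    for _ in range(1000)
]

# HPR at the 95th percentile of the random sample (the dotted line).
cutoff_95 = sorted(random_hprs)[int(0.95 * len(random_hprs))]
p_at_cutoff = empirical_pvalue(cutoff_95, random_hprs)
```

    A fund whose holding period return lies to the right of the 95th percentile cutoff would be flagged at the 5% level by this cross-sectional criterion.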

    Figure 2

    Full random portfolio time series of returns probability densities

    Time series of returns probability densities plotted for each of a set of 1,000 random fund

    probability distributions, from the same sample used in Figure 1 and ordered by average monthly

    return. Separate sample portfolio return densities arrayed along the Y axis, with returns on the X

    axis and the probability densities of these returns visible on the Z axis.

    CRPA 1000 random fund sample Probability Density Surface

    Figure 3

    Net Power of Skill Test, 1 Lucky Period

    Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a

    baseline random portfolio's returns to simulate overperformance through skill or luck, ranging

    from 0% (no skill) to 4%. Net power of the test denotes the probability that the test rejects the

    null of no skill when the sample has skill minus the probability that the test rejects the null of no

    skill when the sample is just lucky (average number of times the test rejects out of 1,000

    trials, for each level of added return).
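
    The net power measure defined in this caption reduces to a difference of two rejection rates. The short Python sketch below shows the arithmetic for one level of added return; the 5% significance level and the toy p-values are assumptions for illustration only.

```python
def rejection_rate(pvalues, alpha=0.05):
    """Fraction of trials in which the null of no skill is rejected."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def net_power(p_skill, p_lucky, alpha=0.05):
    """Rejection rate on skillful samples minus rejection rate on lucky ones."""
    return rejection_rate(p_skill, alpha) - rejection_rate(p_lucky, alpha)


# Toy p-values from four trials of each kind (hypothetical numbers).
p_skill = [0.01, 0.03, 0.20, 0.04]   # three rejections out of four
p_lucky = [0.50, 0.02, 0.60, 0.70]   # one rejection out of four

example_net_power = net_power(p_skill, p_lucky)
```

    A test that rejects equally often for skillful and lucky samples has a net power of zero, however high its raw power, which is what makes the measure informative about the ability to tell skill from luck.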

    Figure 4

    Net Power of Skill Test, 5 Lucky Periods

    Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a

    baseline random portfolio's returns to simulate overperformance through skill or luck, ranging

    from 0% (no skill) to 4%. Net power of the test denotes the probability that the test rejects the

    null of no skill when the sample has skill minus the probability that the test rejects the null of no

    skill when the sample is just lucky (average number of times the test rejects out of 1,000

    trials, for each level of added return).

    Figure 5

    Net Power of Skill Test, 10 Lucky Periods

    Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a

    baseline random portfolio's returns to simulate overperformance through skill or luck, ranging

    from 0% (no skill) to 4%. Net power of the test denotes the probability that the test rejects the

    null of no skill when the sample has skill minus the probability that the test rejects the null of no

    skill when the sample is just lucky (average number of times the test rejects out of 1,000

    trials, for each level of added return).

    Figure 6

    Net Power of Skill Test, 5 lucky periods, samples with no stochastic dominance: 60%

    extra return for lucky portfolios

    Net power curves of tests of skill. Extra Return Factor is the additional monthly return added to a

    baseline random portfolio's returns to simulate overperformance through skill or luck, ranging

    from 0% (no skill) to 4%. The extra return factor for the luck sample is higher than for the skill sample. Net

    power of the test denotes the probability that the test rejects the null of no skill when the sample

    has skill minus the probability that the test rejects the null of no skill when the sample is just lucky

    (average number of times the test rejects out of 1,000 trials, for each level of added return).

    Figure 7

    Portfolio Return Probability Density

    Probability densities of the time series of returns of the Jensen fund (full line) and

    SunAmerica (dashed line). The x-axis denotes monthly portfolio return, while the y-axis

    shows probability density. Plot obtained using kernel density estimation, with a Gaussian

    kernel and automated bandwidth selection.
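
    A density curve of this kind can be sketched in a few lines of Python. The caption does not say which automated bandwidth rule was used, so the normal reference rule h = 1.06 * s * n^(-1/5) below is an assumption, as are the function names and the simulated return series.

```python
import math
import random
from statistics import stdev


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth h = 1.06 * s * n**(-1/5) (assumed rule)."""
    return 1.06 * stdev(data) * len(data) ** (-1.0 / 5.0)


def gaussian_kde(data, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    h = bandwidth or normal_reference_bandwidth(data)
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


# Hypothetical monthly returns standing in for a fund's time series.
rng = random.Random(11)
returns = [rng.gauss(0.004, 0.045) for _ in range(72)]
density = gaussian_kde(returns)

# Evaluate on a grid, as one would before plotting the curve.
grid = [-0.5 + 0.001 * i for i in range(1001)]
curve = [density(x) for x in grid]
```

    Plotting `curve` against `grid` for two funds on the same axes gives a comparison of the kind shown in this figure.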