TECHNICAL INCERTO: LECTURES NOTES ON PROBABILITY, VOL 1

SILENT RISK

Nassim Nicholas Taleb

In which is provided a mathematical parallel version of the author’s Incerto, with derivations, examples, theorems, & heuristics. (This Draft Is For Error Detection)

  • DesCartes Publishing

    The aim of DesCartes Publishing is to "überize" academic publishing by cutting the middleperson and producing books of the highest scientific quality at the most affordable price. DesCartes engages in ethical publishing; it avoids double-charging the public sector for books produced on university time. All DesCartes books and monographs are required to be freely available in PDF form.

    EDITORIAL BOARD

    This format is based on André Miede's ClassicThesis.

    Nassim Nicholas Taleb: Silent Risk, Lectures on Probability, Vol. 1 © 2015

  • Warning

    There is a saying to "never show a fool incomplete work". This is work-in-progress, full of incomplete sections and (I hope) minor mistakes. The use of the material here requires some intellectual and personal maturity.

    ABSTRACT

    "Empirical evidence that the boat is safe", or how we tend to be fooled by silent risks. Factum stultus cognoscit (The fool only understands risks after the harm is done). Risk is both precautionary (fragility based) and evidentiary (statistical based); it is too serious a business to be left to mechanistic users of probability theory. This figure encapsulates the scientific "nonsucker" approach to risk and probability. Courtesy George Nasr.

    ***

    This book provides a mathematical framework for decision making and the analysis of (consequential) hidden risks, those tail events undetected or improperly detected by statistical machinery; and substitutes fragility as a more reliable measure of exposure. Model error is mapped as risk, even tail risk.¹

    Risks are seen in tail events rather than in the variations; this necessarily links them mathematically to an asymmetric response to intensity of shocks, convex or concave.

    ¹ This is a polite way to say No-BS approach to probability.

    The difference between "models" and "the real world" ecologies lies largely in an additional layer of uncertainty that typically (because of the same asymmetric response by small probabilities to additional uncertainty) thickens the tails and invalidates all probabilistic tail risk measurements: models, by their very nature of reduction, are vulnerable to a chronic underestimation of the tails.

    So tail events are not measurable; but the good news is that exposure to tail events is. In "Fat Tail Domains" (Extremistan), tail events are rarely present in past data: their statistical presence appears too late, and time series analysis is similar to sending troops after the battle. Hence the concept of fragility is introduced: is one vulnerable (i.e., asymmetric) to model error or model perturbation (seen as an additional layer of uncertainty)?

    Part I, the introductory section of the book, presents the Black Swan problem. It provides a rigorous definition of probability as used in real-world decisions, mapping distinct classes of payoffs and their sensitivity to model or representation error. It presents a mathematical version of contract theory. It shows how errors percolate in higher dimensions and how robustness in statistics doesn't lead to robustness of payoff, and it introduces a sketch of real-world risk management.

    Part II presents the effect of the combination of fat tails and model error on the slowness of convergence of standard measurements under the law of large numbers: some claims require 104 times more data than is thought. It is shown that many of the statistical techniques used in social sciences are inconsistent with probability theory.

    Part III proposes an approach to risk measurement based on fragility as nonlinear (concave) response, and explores nonlinearities and their statistical consequences. Risk management would consist in building structures that are not negatively asymmetric, that is, "robust" to both model error and tail events. Antifragility is a convex response to perturbations of a certain class of variables. The link between nonlinearity and survival is shown.

    Part IV focuses on problems in quantitative finance (such as the effect of application of Itô's lemma under fat-tailed conditions or under real-world texture), problems that can be generalized to probability and risk in general.
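    The asymmetric (concave or convex) response to shocks described above can be illustrated with a small numerical sketch. It is not taken from the book: the helper `second_order_effect` and the two quadratic payoffs are hypothetical, chosen only so that the sign of the effect is easy to check by hand. The sketch averages the response to a symmetric shock of plus or minus `delta` and subtracts the unshocked response; the result is negative for a concave (fragile) exposure and positive for a convex (antifragile) one.

```python
def second_order_effect(f, x, delta):
    """Average response to a symmetric shock of size delta around x,
    minus the response with no shock (a discrete convexity measure)."""
    return 0.5 * (f(x + delta) + f(x - delta)) - f(x)


def concave_payoff(x):
    # Hypothetical fragile exposure: harm accelerates with the size of the shock.
    return -(x ** 2)


def convex_payoff(x):
    # Hypothetical antifragile exposure: gains accelerate with the size of the shock.
    return x ** 2


fragile = second_order_effect(concave_payoff, 1.0, 0.5)
antifragile = second_order_effect(convex_payoff, 1.0, 0.5)

print("concave payoff, effect of added volatility:", fragile)
print("convex payoff, effect of added volatility: ", antifragile)
```

    Under these assumptions the same amount of added dispersion harms the concave exposure and benefits the convex one, and a larger `delta` widens the gap. This is the sense in which exposure, unlike the tail event itself, can be assessed without knowing the probability distribution.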


  • CONTENTS

    Acknowledgments
    Notes for Reviewers

    Part I. Figuring Out Probability and What It Means

    1 So What Is the Black Swan Problem?
      1.1 The issue
      1.2 Disciplines
      1.3 Statistics and Risk: Two Different Businesses
      1.4 Robustness of Payoff or Robustness of Estimation?
        1.4.1 Some Elements of Robustness in Statistics
        1.4.2 The Black Swan and Statistical Robustness
        1.4.3 Mean Deviation vs Standard Deviation, Class Lecture Derivations
        1.4.4 Effect of Fatter Tails on the "efficiency" of STD vs MD
      1.5 Why We Should Retire Standard Deviation
      1.6 Error Propagation in Higher Dimensions (in progress)
      1.7 The Tail Sensitivity Theorems (in progress)
      1.8 Skepticism and Precaution
      1.9 How Science is Black Swan Compatible
    2 What Is Probability? What Is Exposure?
      2.1 The Conflation of Events and Exposures
        2.1.1 Contract Theory
      2.2 On the Conflation Between Tail Risks and Pascal's Wager
      2.3 Payoff Classes P1 through P4
        2.3.1 Atomic Payoff P1
        2.3.2 Binary Payoff Class P2
        2.3.3 Vanilla Payoff Class P3, building blocks for regular exposures.
        2.3.4 Composite/Sigmoidal Payoff Class P4
      2.4 Achieving Nonlinearity through P4
      2.5 Main Errors in the Literature
      2.6 The Applicability of Some Psychological Biases
      2.7 Misfitness of Prediction Markets
        2.7.1 The Black Swan is Not About Probability But Payoff
        2.7.2 Chernoff Bound
        2.7.3 Fatter tails lower the probability of remote events (the binary) and raise the value of the vanilla.
        2.7.4 The law of large numbers works better with the binary than the variable
      2.8 Finding Inconsistencies in Scholarly Treatments of Events
      2.9 Metaprobability and the Payoff Kernel
      2.10 Classification and Codification of Exposures
      2.11 Numeraire definition

      2.12 What is an Insurable Risk?
      2.13 Ruin problems
      2.14 Skepticism, Uncertainty, and Scale of a Distribution
      2.15 Why Pascal's Wager Has Nothing to Do With the Left Tail
    3 Black Swan, Part II: "Real World" Rigor
      3.1 A Course With an Absurd Title
      3.2 Problems and Inverse Problems
      3.3 Fragility, not Just Statistics, For Hidden Risks
      3.4 Solution: The Convex Heuristic
        3.4.1 Convex Heuristics, Rationality, and Revelation of Preferences
      3.5 Fragility and Model Error
        3.5.1 Why Engineering?
        3.5.2 Risk is not Variations
        3.5.3 What Do Fat Tails Have to Do With This?
        3.5.4 Fat Tails and Model Expansion
        3.5.5 Savage's Difference Between The Small and Large World
      3.6 General Classification of Problems Related To Fat Tails
      3.7 Closing the Introduction
    A What's a Charlatan in Risk and Probability?
      A.1 Charlatan
        A.1.1 Citation Rings and Cosmetic Job Market Science

    Part II. Fat Tails: The LLN Under Real World Ecologies

    4 Fat Tails and the Problem of Induction
      4.1 The Problem of (Enumerative) Induction
      4.2 Empirical Risk Estimators
      4.3 Fat Tails, the Finite Moment Case
      4.4 A Simple Heuristic to Create Mildly Fat Tails
      4.5 The Body, The Shoulders, and The Tails
      4.6 Fattening of Tails With Skewed Variance
      4.7 Fat Tails in Higher Dimension
      4.8 Scalable and Nonscalable, A Deeper View of Fat Tails
      4.9 Subexponential as a class of fat tailed distributions
      4.10 Joint Fat-Tailedness and Elliptical Distributions
      4.11 Different Approaches For Statistical Estimators
      4.12 Econometrics imagines functions in L2 Space
      4.13 Typical Manifestations of The Turkey Surprise
      4.14 Metrics for Functions Outside L2 Space
      4.15 Using the Hilbert Transform
      4.16 A Comment on Bayesian Methods in Risk Management
    B Special Cases of Fat Tails
      B.1 Multimodality and Fat Tails, or the War and Peace Model
      B.2 Transition probabilities: what can break will break
    C Quick and Robust Measure of Fat Tails
      C.1 Introduction
      C.2 First Metric, the Simple Estimator

      C.3 Second Metric, the X2 estimator
    5 Hierarchy of Distributions for Asymmetries
      5.1 Permissible Empirical Statements
      5.2 Masquerade Example
      5.3 The Probabilistic Version of Absence of Evidence
      5.4 Via Negativa and One-Sided Arbitrage of Statistical Methods
      5.5 Hierarchy of Distributions in Term of Tails
      5.6 How To Arbitrage Kolmogorov-Smirnov
      5.7 Mistaking Evidence for Anecdotes & The Reverse
    6 Effects of Higher Orders of Uncertainty
      6.1 Meta-Probability Distribution
      6.2 Metadistribution and the Calibration of Power Laws
      6.3 The Effect of Metaprobability on Fat Tails
      6.4 Fukushima, Or How Errors Compound
      6.5 The Markowitz inconsistency
      6.6 Psychological pseudo-biases under second layer of uncertainty.
        6.6.1 The pathologization fallacy
    7 Large Numbers and CLT in the Real World (in progress)
        7.0.2 The "Pinker Problem"
      7.1 The problem of Matching Errors
      7.2 Generalizing Mean Deviation as Partial Expectation
      7.3 Class of Stable Distributions
        7.3.1 Results
        7.3.2 Stochastic Alpha or Mixed Samples
      7.4 Symmetric NonStable Distributions in the Subexponential Class
        7.4.1 Symmetric Mixed Gaussians, Stochastic Mean
        7.4.2 Half cubic Student T (Lévy Stable Basin)
        7.4.3 Cubic Student T (Gaussian Basin)
      7.5 Asymmetric NonStable Distributions in the Subexponential Class
        7.5.1 One-tailed Pareto Distributions
        7.5.2 The Lognormal and Borderline Subexponential Class
      7.6 Asymmetric Distributions in the Superexponential Class
        7.6.1 Mixing Gaussian Distributions and Poisson Case
        7.6.2 Skew Normal Distribution
        7.6.3 Super-thin tailed distributions: Subgaussians
      7.7 Acknowledgement
        7.7.1 Cumulants
        7.7.2 Derivations using explicit E(|X|)
        7.7.3 Derivations using the Hilbert Transform and β = 0
    D Tedious Derivations for LLN Across Fat Tails (in progress)
      D.1 Comments on LLN and Bounds
        D.1.1 Speed of Convergence for Simple Cases
        D.1.2 Comparing N = 1 to N = 2 for a symmetric power law with 1 < α ≤ 2.
      D.2 Digression Into Inversion of Characteristic Function of NonStable Power Law
        D.2.1 Integrable Characteristic Functions

      D.3 Pulling the PDF of an n-Summed Student T
    8 Preasymptotics and Central Limit in the Real World
      8.1 Using Log Cumulants to Observe Preasymptotics
      8.2 Convergence of the Maximum of a Finite Variance Power Law
      8.3 Sources and Further Readings
      8.4 Convergence for Non-Lévy Stable Power Laws (temporarily here)
    E Making the α-Stable Distribution One-Tailed
    F Where Standard Diversification Fails
    G Fat Tails and Random Matrices
    9 Some Misuses of Statistics in Social Science
      9.1 Mechanistic Statistical Statements
      9.2 Attribute Substitution
      9.3 The Tails Sampling Property
      9.4 A discussion of the Paretan 80/20 Rule
      9.5 Survivorship Bias (Casanova) Property
      9.6 Left (Right) Tail Sample Insufficiency Under Negative (Positive) Skewness
      9.7 Why N=1 Can Be Very, Very Significant Statistically
      9.8 The Instability of Squared Variations in Regressions
      9.9 Statistical Testing of Differences Between Variables
      9.10 Studying the Statistical Properties of Binaries and Extending to Vanillas
      9.11 Why Economics Time Series Don't Replicate
        9.11.1 Performance of Standard Parametric Risk Estimators, f(x) = x^n (Norm L2)
      9.12 A General Summary of The Problem of Reliance on Past Time Series
      9.13 Conclusion
    H On the Instability of Econometric Data
    10 Fat Tails from Recursive Uncertainty
      10.1 Layering uncertainty
      10.2 Regime 1 (Explosive): Case of a constant error parameter a
      10.3 Convergence to Power Laws
      10.4 Regime 1b: Preservation of Variance
      10.5 Regime 2: Cases of decaying parameters a_n
      10.6 Conclusion and Suggested Application
    11 Parametrization and Tails
      11.1 Some Bad News Concerning power laws
      11.2 Extreme Value Theory: Not a Panacea
        11.2.1 Some Intuition: How does the Extreme Value Distribution emerge?
        11.2.2 Derivations
        11.2.3 Extreme Values for Fat-Tailed Distribution
        11.2.4 A Severe Inverse Problem for EVT
      11.3 Using Power Laws Without Being Harmed by Mistakes
    I Poisson vs. Power Law Tails
      I.1 Beware The Poisson

      I.2 Leave it to the Data
        I.2.1 Global Macroeconomic data
    12 The Fourth Quadrant "Solution"
      12.1 Two types of Decisions
    13 Risk and Precaution
        13.0.1 Stopping time equivalence
        13.0.2 Sign of exposure
        13.0.3 layering
      13.1 What is the Precautionary Principle
      13.2 Why Ruin is Serious Business
      13.3 Fallacious arguments in Risk Management
        13.3.1 Crossing the road (the paralysis fallacy)
        13.3.2 The Psychology of Risk and Thick Tailed Distributions
        13.3.3 The Loch Ness fallacy
        13.3.4 The fallacy of misusing the naturalistic fallacy
        13.3.5 The "Butterfly in China" fallacy
        13.3.6 The potato fallacy
        13.3.7 The Russian roulette fallacy (the counterexamples in the risk domain)
        13.3.8 The Carpenter Fallacy
        13.3.9 The technological salvation fallacy
        13.3.10 The pathologization fallacy
    14 Skin in the Game and Risk Taking
      14.1 Payoff Skewness and Lack of Skin-in-the-Game

    Part III. (Anti)Fragility and Nonlinear Responses to Random Variables

    15 Exposures as Transformed Random Variables
      15.1 The Conflation Problem Redux: Exposures to x Confused With Knowledge About x
        15.1.1 Limitations of knowledge
        15.1.2 Bad news
        15.1.3 The central point about what to understand
        15.1.4 Fragility and Antifragility
      15.2 Transformations of Probability Distributions
      15.3 Application 1: Happiness (f(x)) is different from wealth (x)
      15.4 The effect of convexity on the distribution of f(x)
      15.5 Estimation Methods When the Payoff is Convex
        15.5.1 Convexity and Explosive Payoffs
        15.5.2 Conclusion: The Asymmetry in Decision Making
    16 An Uncertainty Approach to Fragility
      16.1 A Review of The General Notion of "Robustness"
      16.2 Introduction
        16.2.1 Intrinsic and Inherited Fragility:
        16.2.2 Fragility As Separate Risk From Psychological Preferences
        16.2.3 Beyond Jensen's Inequality
    17 The Fragility Theorems
      17.1 Tail Sensitivity to Uncertainty

        17.1.1 Precise Expression of Fragility
      17.2 Effect of Nonlinearity on Intrinsic Fragility
      17.3 Fragility Drift
        17.3.1 Second-order Fragility
      17.4 Expressions of Robustness and Antifragility
        17.4.1 Definition of Robustness
        17.4.2 Antifragility
        17.4.3 Remarks
        17.4.4 Unconditionality of the shortfall measure ξ
    18 Applications to Model Error
        18.0.5 Example: Application to Budget Deficits
        18.0.6 Model Error and Semi-Bias as Nonlinearity from Missed Stochasticity of Variables
      18.1 Model Bias, Second Order Effects, and Fragility
    19 The Fragility Measurement Heuristics
        19.0.1 The Fragility/Model Error Detection Heuristic (detecting ω_A and ω_B when cogent)
      19.1 Example 1 (Detecting Risk Not Shown By Stress Test)
      19.2 The Heuristic applied to a stress testing
        19.2.1 Further Applications Investigated in Next Chapters
      19.3 Stress Tests
      19.4 General Methodology
    20 Fragility and Economic Models
      20.1 The Markowitz Inconsistency
      20.2 Application: Ricardian Model and Left Tail Exposure
    21 The Origin of Thin-Tails
      21.1 Properties of the Inherited Probability Distribution
      21.2 Conclusion and Remarks
    22 Small Is Beautiful: Risk, Scale and Concentration
      22.1 Introduction: The Tower of Babel
        22.1.1 Only Iatrogenics of Scale and Concentration
      22.2 Unbounded Convexity Effects
      22.3 A Richer Model: The Generalized Sigmoid
        22.3.1 Application
    23 Why Is the Fragile Nonlinear? The Arcsine Distribution of Life
      23.1 The Coffee Cup
        23.1.1 Illustration Case 1: Gaussian
        23.1.2 Illustration Case 2: Power Law
    24 How the World Will Progressively Look Weirder
      24.1 How Noise Explodes Faster than Data
      24.2 Derivations
    25 The Convexity of Wealth to Inequality
      25.1 The One Percent of the One Percent are Divorced from the Rest
    26 Nonlinearities and Risk in Medicine
      26.1 Antifragility from Uneven Distribution


    Part IV. Deeper Mathematical Problems in Finance and Option Theory

    27 Real World Rigor in Mathematical Finance
        27.0.1 The Holman Fraud
      27.1 Central Conflation
        27.1.1 Conflation of Fat Tails and Long Volatility
        27.1.2 Conflation of Mean and Standard Deviation
        27.1.3 No, We Don't Use (and Have Never Used) Black-Scholes
        27.1.4 Option strategies: tests
        27.1.5 American Options are Different
        27.1.6 The divorce between the underlying and the strategy
    28 American Options and Hidden Convexity
      28.1 This Note
      28.2 The General Mathematical Results: Pricing Series of Hidden Options "Use One Lose All"
      28.3 Misplaced Precision
      28.4 The Pricing Equation
      28.5 War Stories
      28.6 The Stopping Time Problem
      28.7 Expressing the various sub-options
      28.8 Conclusion
    29 Brownian Motion in the Real World
      29.1 Path Dependence and History as Revelation of Antifragility
      29.2 SP and path dependence (incomplete)
      29.3 Brownian Motion in the Real World
      29.4 Stochastic Processes and Nonanticipating Strategies
      29.5 Finite Variance not Necessary for Anything Ecological (incl. quant finance)
      29.6 Recovering Kernels from Option Prices
      29.7 Problems With the Recovery theorem
        29.7.1 Path dependence
      29.8 Dynamic Hedging In Discrete Time
      29.9 The operation of dynamic hedging
      29.10 The stochastic exposure problem, case of gamma
      29.11 The stochastic delta problem
        29.11.1 Lower bound on Dynamic hedging
    30 Unique Option Pricing Measure With Neither Dynamic Hedging nor Complete Markets
      30.1 Background
      30.2 Proof
        30.2.1 Case 1: Forward as risk-neutral measure
        30.2.2 Derivations
      30.3 Case where the Forward is not risk neutral
      30.4 comment

    Bibliography
    Index

  • CHAPTER SUMMARIES

    Chapter 1: Presenting the Black Swan problem and the historical traditions involved, showing its role in risk management and probability theory. We show the difference between decision and knowledge. We compare robustness of statistics to robustness in payoff space.

    Chapter 2: Probability defined, first things first, in a practically rigorous manner. Why and how we cannot dissociate probability from decision. The notion of contract theory. Fallacies coming from verbalistic descriptions of probability. Fourier Transforms of Payoffs. The difference between classes of payoffs with probabilistic consequences. Formal definition of metaprobability.

    Chapter 3: We continue with the Black Swan idea and present the project of the codification of Risk and decision theory as related to the real world (that is "no BS") in nonmathematical language (other chapters are mathematical). Introduces the main fallacies treated in the project. What can and should be mathematized. Presents the central principles of risk bearing. Introduces the idea of fragility as a response to volatility, the associated notion of convex heuristic, the problem of invisibility of the probability distribution and the spirit of the book. Explains why risk is in the tails not in the variations. Explains that the layering of random variables makes for a more ecological view, one that corresponds to the "real world", and how layering of model errors generates fat tails.

    Chapter 4: Introducing mathematical formulations of fat tails. Shows how the problem of induction gets worse. Empirical risk estimator. Introduces different heuristics to "fatten" tails. Where do the tails start? Sampling error and convex payoffs.

    Chapter 5: Using the asymptotic Radon-Nikodym derivatives of probability measures, we construct a formal methodology to avoid the "masquerade problem", namely that standard "empirical" tests are not empirical at all and can be fooled by fat tails, though not by thin tails, as a fat tailed distribution (which requires a lot more data) can masquerade as a low-risk one, but not the reverse. Remarkably this point is the statistical version of the logical asymmetry between evidence of absence and absence of evidence. We put some refinement around the notion of "failure to reject", as it may misapply in some situations. We show how such tests as Kolmogorov-Smirnov, Anderson-Darling, Jarque-Bera, Mardia Kurtosis, and others can be gamed and how our ranking rectifies the problem.

    Chapter 6: The Spectrum Between Uncertainty and Risk. There has been a bit of discussion about the distinction between "uncertainty" and "risk". We believe in gradation of uncertainty at the level of the probability distribution itself (a "meta" or higher order of uncertainty). One end of the spectrum, "Knightian risk", is not available for us mortals in the real world. We show the effect on fat tails and on the calibration of tail exponents and reveal inconsistencies in models such as Markowitz or those used for intertemporal discounting (as many violations of "rationality" aren't violations).

    Chapter 7: The Law of Large Numbers is the foundation of statistical knowledge or, even, (inductive) knowledge tout court. The behavior of the sum of random variables allows us to get to the asymptote and use handy asymptotic properties. However real life is more complicated. We cannot talk about LLN without figuring out the speed of convergence, which, when it is at $\sqrt{n}$, is only so asymptotically. Further, in some cases the LLN doesn't work at all. For very fat-tailed distributions, under the slightest parametric error, it will be more than 400 times slower than thought.

    Chapter 8: The behavior of the sum of random variables allows us to get to the asymptote and use handy asymptotic properties, that is, Platonic distributions. But the problem is that in the real world we never get to the asymptote, we just get "close". Some distributions get close quickly, others very slowly (even if they have finite variance). We examine how fat tailedness worsens the process.

    Chapter 9: We apply the results of the previous chapter on the slowness of the LLN and list misapplications of statistics in social science, almost all of them linked to misinterpretation of the effects of fat-tailedness (and often from lack of awareness of fat tails), and show how by attribute substitution researchers can substitute one measure for another. Why, for example, because of chronic small-sample effects, the 80/20 is milder in-sample (less fat-tailed) than in reality and why regression rarely works.

    Chapter 10: Error about Errors. Probabilistic representations require the inclusion of model (or representation) error (a probabilistic statement has to have an error rate), and, in the event of such treatment, one also needs to include second, third and higher order errors (about the methods used to compute the errors) and by a regress argument, to take the idea to its logical limit, one should be continuously reapplying the thinking all the way to its limit unless one has a reason to stop, as a declared a priori that escapes quantitative and statistical method. We show how power laws emerge from nested errors on errors of the standard deviation for a Gaussian distribution. We also show under which regime regressed errors lead to non-power law fat-tailed distributions.

    Chapter 11: We present case studies around the point that, simply, some models depend quite a bit on small variations in parameters. The effect on the Gaussian is easy to gauge, and expected. But many believe in power laws as panacea. Even if one believed the r.v. was power law distributed, one still would not be able to make a precise statement on tail risks. Shows weaknesses of calibration of Extreme Value Theory.

    Chapter 12: A less technical demarcation between Black Swan Domains and others.

    Chapter 13: We present the difference between ruin problems, particularly when systemic, and other risk management of a more computable nature.

    Chapter 14: Standard economic theory makes an allowance for the agency problem, but not the compounding of moral hazard in the presence of informational opacity, particularly in what concerns high-impact events in fat tailed domains (under slow convergence for the law of large numbers). Nor did it look at exposure as a filter that removes nefarious risk takers from the system so they stop harming others. (In the language of probability, skin in the game creates an absorbing state for the agent, not just the principal.) But the ancients did; so did many aspects of moral philosophy. We propose a global and morally mandatory heuristic that anyone involved in an action which can possibly generate harm for others, even probabilistically, should be required to be exposed to some damage, regardless of context. While perhaps not sufficient, the heuristic is certainly necessary, hence mandatory. It is supposed to counter voluntary and involuntary risk hiding and risk transfer in the tails.

    Chapter 15: Deeper into the conflation between a random variable and exposure to it.

    Chapter 16: We provide a mathematical approach to fragility as negative sensitivity to a semi-measure of dispersion and volatility (a variant of negative or positive "vega") and examine the link to nonlinear effects. We link to the literature on model "robustness" and show how we add nonlinearity to the conventional approaches.

    Chapter 17: Presents the fragility theorems and the transfer function between nonlinear response and the benefits and harm from increased uncertainty.

    Chapter 18: Where model error is treated as a random event.

    Chapter 19: Presents the IMF fragility heuristics, particularly in the improvement of stress testing.

    Chapter 20: We discuss the errors in economics in the light of our fragility criterion.

    Chapter 21: The literature of heavy tails starts with a random walk and finds mechanisms that lead to fat tails under aggregation. We follow the inverse route and show how starting with fat tails we get to thin-tails from the probability distribution of the response to a random variable. We introduce a general dose-response curve and show how the left and right-boundedness of the response in natural things leads to thin-tails, even when the underlying variable of the exposure is fat-tailed.

    Chapter 22: We extract the effect of size on the degradation of the expectation of a random variable, from nonlinear response. The method is general and allows us to show the "small is beautiful" or "decentralized is effective" or "a diverse ecology is safer" effect from a response to a stochastic stressor and prove stochastic diseconomies of scale and concentration (with, as examples, the Irish potato famine and GMOs). We apply the methodology to environmental harm using standard sigmoid dose-response to show the need to split sources of pollution across independent

    Chapter 23: Explains why the fragile is necessarily in the nonlinear. Shows how an ArcSine distribution emerges from the nonlinearity of the response.

    Chapter 25: The one percent of the one percent has tail properties such that the tail wealth (expectation $\int_K^\infty x \, p(x) \, dx$) depends far more on inequality than wealth.

    Chapter 26: Examines nonlinearities in medicine/iatrogenics as a risk management problem.

    Chapter 27: A list of common problems.

    Chapter 28: American Options have hidden optionalities. Using a European option as a baseline we heuristically add the difference. We also show how these hidden options are extremely significant compared to the focus of the research literature on insignificant marginal improvements of the pricing equations but in the Black Scholes World.

    Chapter 29: Much of the work concerning martingales and Brownian motion has been idealized; we look for holes and pockets of mismatch to reality, with consequences. Infinite (or undefined) higher moments are not compatible with Itô calculus outside the asymptote. Path dependence as a measure of fragility.

    Chapter 30: Proof that under simple assumptions, such as constraints of Put-Call Parity, the probability measure for the valuation of a European option has the mean derived from the forward price which can, but does not have to be the risk-neutral one, under any general probability distribution, bypassing the Black-Scholes-Merton dynamic hedging argument, and without the requirement of complete markets and other strong assumptions. We confirm that the heuristics used by traders for centuries are more robust, more consistent, and more rigorous than held in the economics literature. We also show that options can be priced using infinite variance (finite mean) distributions.
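    The point made in the Chapter 7 summary about the speed of convergence of the LLN can be made concrete under one explicit assumption, which belongs to this sketch and is not a result quoted from the text: for i.i.d. symmetric α-stable variables with 1 < α ≤ 2, the scale of the sample mean of n observations shrinks like $n^{1/\alpha - 1}$, with α = 2 giving the Gaussian $\sqrt{n}$ rate. The function name `observations_needed` is hypothetical, and the sketch does not attempt to reproduce the "400 times" figure cited above.

```python
def observations_needed(error_reduction, alpha=2.0):
    """Sample size n that divides the scale of the sample mean by
    `error_reduction`, assuming the scale shrinks like n**(1/alpha - 1).

    Assumption of this sketch: i.i.d. symmetric alpha-stable variables
    with 1 < alpha <= 2 (alpha = 2 is the Gaussian case).
    """
    if not 1.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 1 < alpha <= 2")
    # Solve n**(1/alpha - 1) = 1/error_reduction for n.
    return error_reduction ** (alpha / (alpha - 1.0))


# Observations needed to cut the dispersion of the mean by a factor of 10.
for alpha in (2.0, 1.5, 1.25):
    n = observations_needed(10.0, alpha)
    print(f"alpha = {alpha}: n = {n:,.0f}")
```

    Under this assumption, fattening the tail from α = 2 to α = 1.25 raises the required sample from the order of a hundred observations to the order of a hundred thousand for the same tenfold gain in precision. As α approaches 1 the exponent grows without bound, which is one way to read the statement that in some cases the LLN does not work at all.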


  • PREAMBLE, NOTES ON THE TEXT

    This author travelled two careers in the opposite of the usual directions:

    1) From risk taking to probability: I came to deepening my studies of probability and did doctoral work during and after trading derivatives and volatility packages and maturing a certain bottom-up organic view of probability and probability distributions. The episode lasted for 21 years, interrupted in its middle for doctoral work. Indeed, volatility and derivatives (under the condition of skin in the game) are a great stepping stone into probability: much like driving a car at a speed of 600 mph (or even 6,000 mph) is a great way to understand its vulnerabilities.

    But this book goes beyond derivatives as it addresses probability problems in general, and only those that are generalizable,

    and

    2) From practical essays (under the cover of "philosophical") to specialized work: I only started publishing technical approaches (outside specialized option related matters) after publishing nontechnical "philosophical and practical" ones, though on the very same subject.

    But the philosophical (or practical) essays and the technical derivations were written synchronously, not in sequence, largely in an idiosyncratic way, what the mathematician Marco Avellaneda called "private mathematical language", of which this is the translation; in fact the technical derivations for The Black Swan [120] and Antifragile [121] were started long before the essay form. So it took twenty years to mature the ideas and techniques of fragility and nonlinear response, the notion of probability as less rigorous than "exposure" for decision making, and the idea that "truth space" requires different types of logic than "consequence space", one built on asymmetries.

    Risk takers view the world very differently from most academic users of probability and industry risk analysts, largely because the notion of "skin in the game" imposes a certain type of rigor and skepticism about what we call in the next chapter cosmetic "job-market" science.

    Risk is a serious business and it is high time that those who learned about it via risk-taking have something not "anecdotal" to say about the subject. In fact we will try to build a new maximally rigorous approach to it, one that incorporates practice.

    acknowledgments

    The text is not entirely that of the author. Four chapters contain recycled text written with collaborators in standalone articles: the late Benoit Mandelbrot (section of slowness of LLN under power laws, even with finite variance), Elie Canetti and the stress-testing staff at the International Monetary Fund (for the heuristic to detect tail events), Phil Tetlock (binary vs variable for forecasting), Constantine Sandis (skin in the game) and Raphael Douady (mathematical mapping of fragility). But it is the latter paper that represents the biggest debt: as the central point of this book is convex response (or, more generally, nonlinear effects which subsume tail events), the latter paper is the result of 18 years of mulling that single idea, as an extension of Dynamic Hedging [118] applied outside the options domain, with 18 years of collaborative conversation with Raphael before the actual composition!

    This book is in debt to three persons who left us. In addition to Benoit Mandelbrot, this author feels deep gratitude to the late David Freedman, for his encouragements to develop a rigorous model-error based, real-world approach to statistics, grounded in classical skeptical empiricism, and one that could circumvent the problem of induction: and the method was clear, of the type "don't use statistics where you can be a sucker" or "figure out where you can be the sucker". There was this "moment" in the air, when a group composed of the then unknown John Ioannidis, Stan Young, Philip Stark, and others got together. I was then an almost unpublished and argumentative "volatility" trader, something people couldn't quite understand unless expressed as "nonlinear payoffs", even then (Dynamic Hedging was unreadable to nonspecialists), and felt that getting David Freedman's attention was more of a burden than a blessing, as it meant a severe obligation.2

    Indeed this exact book project was born from a 2012 Berkeley statistics department commencement lecture, given in his memory, with the message: "statistics is the most powerful weapon today, it comes with responsibility" (since numerical assessments increase risk taking) and the corollary, directly taken from his legacy:

    "Understand the model's errors before you understand the model".

    leading to the theme of this book, that all one needs to do is figure out the answer to the following question:

    Are you convex or concave to model errors?

    It was a very sad story to get a message from the statistical geophysicist Albert Tarantola linking to the electronic version of his book Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation [124]. He had been maturing an idea on dealing with probability with his new work taking probability ab ovo. Tarantola had been piqued by the "masquerade" problem in The Black Swan presented in Chapter 5 and the notion that most risk methods "predict the irrelevant". Tragically, he passed away before the conference he was organizing took place, and while I ended up never meeting him, I felt mentored by his approach along with the obligation to deliver technical results of the problem in its applications to risk management.

    Sections of this text were presented in many places; as I said, it took years to mature the point. Some of these chapters are adapted from lectures on hedging with Paul Wilmott and from my course "Risk Management in the Real World" at NYU which, as I state in the next chapter, is an absurd (but necessary) title. Outside of risk practitioners, in the first stage, I got invitations from statistical and mathematics departments initially to satisfy their curiosity about the exoticism of "outsider" and strange "volatility" trader or "quant" wild animal. But they soon got disappointed that the animal was not much of a wild animal but an orthodox statistician, actually overzealous about a no-bullshit approach. I thank Wolfgang Härtle for, before this book was started in the following form, a full-day seminar at Humboldt University and Pantula Sastry for providing the inaugurating lecture of the International Year of Statistics at the National Science Foundation.

    Carl Tony Fakhry has taken the thankless task of diligently rederiving every equation (at the time of writing he has just reached Chapter 3). I also thank Wenzhao Wu and Mian Wang for lists of typos.

    2 The late David Freedman was known to this author for his paper with Stark What is the chance of an earthquake? [48] but he really covered all manner of statistical mistakes, particularly with the modeler's biases in his magisterial textbook [49].

    to the reader

    The text can be read by (motivated) non-quants: everything mathematical in the text is accompanied with a "literary" commentary, so in many sections the math can be safely skipped. Its mission, to repeat, is to show a risk-taker perspective on risk management, integrated into the mathematical language, not to lecture on statistical concepts.

    On the other hand, when it comes to math, it assumes a basic "quant level" advanced or heuristic knowledge of mathematical statistics, and is written as a monograph; it is closer to a longer research paper or old fashioned treatise. As I made sure there is little overlap with other books on the subject, I calibrated this text to the wonderful textbook by the late A. Papoulis, Probability, Random Variables, and Stochastic Processes [93]: there is nothing basic discussed in this text that is not defined in Papoulis.

    For more advanced, more mathematical, or deeper matters such as convergence theorems, the text provides definitions, but the reader is recommended to use Loève's two volumes Probability Theory [77] and [78] for a measure theoretic approach, or Feller's two volumes, [43] and [42] and, for probability bounds, Petrov [95]. For extreme value theory, Embrechts et al. [34] is irreplaceable.

    status/completion of book

    This is a first draft for general discussion, not for equation-wise verification. There are still typos, errors and problems progressively discovered by readers thanks to the dissemination on the web. The bibliographical references are not uniform; they are in the process of being integrated into bibtex.

    Note that there are redundancies that will be removed at the end of the composition.


    August 2014 status: After completing most of the math sections, I started putting words and structure around the concepts, so I am progressively introducing "definitions", "remarks", and comments in plain English, inspired by both Bourbaki and legal codifications. The idea is to codify and explain all terms to clear up the usual confusions. As of August 2014, I am only 15% done.

    Below is the list of the incomplete sections.

    Incomplete Sections in Part I (mostly concerned with limitations of measurements of tail probabilities)

    i A list of symbols.

    ii Chapter 4 proposes a measure of fattailedness based on ratio of Norms for all (superexponential, subexponential, and powerlaws with tail exponent >2); it is more powerful than Kurtosis since we show the latter to be unstable in many domains. It led us to a robust heuristic derivation of fat tails. We will add an Appendix comparing it to the Hill estimator.

    iii An Appendix on the misfunctioning of maximum likelihood estimators (extension of the problem of Chapter 3).

    iv In the chapter on pathologies of stochastic processes, a longer explanation of why a stochastic integral "in the real world" requires 3 periods not 2 with examples (event information for computation of exposure $X_t \to$ order $X_{t+\Delta t} \to$ execution $X_{t+2\Delta t}$).

    v The "Weron" effect of recovered $\alpha$ from estimates higher than true values.

    vi A lengthier (and clearer) exposition of the variety of bounds: Markov, Chebyshev, Lusin, Bernshtein, Lyapunov, Berry-Esseen, Chernoff bounds with tables.

    vii A discussion of the Von Mises condition. A discussion of the Cramér condition. Connected: Why the research on large deviations remains outside fat-tailed domains.

    viii A discussion of convergence (and nonconvergence) of random matrices to the Wigner semicircle, along with its importance with respect to Big Data.

    ix A section of pitfalls when deriving slopes for power laws, with situations where we tend to overestimate the exponent.

    incomplete sections

    (mostly concerned with building exposures and convexity of payoffs: What is and What is Not "Long Volatility")


    i A discussion of gambler's ruin. The interest is the connection to tail events and fragility. "Ruin" is a better name because of the idea of survival for an aggregate, such as probability of ecocide for the planet.

    ii An exposition of the precautionary principle as a result of the fragility criterion.

    iii A discussion of the "real option" literature, connecting fragility to the negative of "real option".

    iv A link between concavity and iatrogenic risks (modeled as short volatility).

    v A concluding chapter.

    Best Regards,
    Nassim Nicholas Taleb
    January 2015


    Part I

    FIGURING OUT PROBABILITY AND WHAT IT MEANS

    1 SO WHAT IS THE BLACK SWAN PROBLEM?

    Abstract of Chapter 1

    Presenting the Black Swan problem and the historical traditions involved, showing its role in risk management and probability theory. We show the difference between decision and knowledge. We compare robustness of statistics to robustness in payoff space.

    Figure 1.1: The Black Swan Problem

    Rara Avis: The Metaphor The metaphor of the black swan is not at all a modern one, contrary to its usual attribution to Popper, Mill, Hume, and others. I selected it because it corresponds to the ancient idea of a "rare bird." The Latin poet Juvenal refers to a "bird as rare as the black swan": rara avis in terris nigroque simillima cygno.

    1.1 the issue

    As shown in Figure 1.1, the Black Swan is a problem that straddles many separate and distinct professional research traditions and problems.


    Figure 1.2: Connecting disciplines. Further, shows the "Fat Tonyism", i.e. real world rigor.


    Definition 1.1 (A Black Swan). The Black Swan is a) a highly unexpected event for a given observer, b) carries large consequences, and c) is subjected to ex-post rationalization.

    Fallacy 1.1 (Observer dependence). It is fallacious to argue whether a given event "is a black swan" without asking "for whom?" (the Black Swan for the turkey is not one for the butcher).

    For instance the events of Sept 11 2001 were highly unexpected events for people in the NYC towers, not for the terrorists.

    Definition 1.2 (Deficit of Theory of Mind). Misinterpretation of a Black Swan in particular and statistical randomness in general as objective phenomena common to all observers arises from deficits of the theory of mind, which consist in not understanding that different people can have different information sets and that the highly unexpected for one could be the highly expected for others.

    Such deficits of theory of mind are highly present in the scientific community.

    Remark 1.1 (Epistemic aspect).The observer dependence gives the Black Swan an epistemic dimension.

    In our more technical terms, the Black Swan is not about probability: its long term effect comes from the first (or higher) moments, that is, the payoff.

    This tendency to weave narratives is the reason we fail, retrospectively, to assess the true unpredictability in the environment, owing to a collection of mental biases. In addition, with a similar bias, people misestimate the accumulation of the serial risk of ruin from dynamic multi-period risk taking.

    Academic Biases It has been the experience of the author that most academics, owing to a selection bias for a certain "focus", suffer from a considerable deficit in erudition (mathematical and philosophical) to connect the point discussed in the Incerto to its context and applications, so they mistake the idea of The Black Swan for something they may have been exposed to in their own area, with corresponding antecedents, or narrow research tradition. So the idea of The Black Swan seems sort of familiar to them, but when prompted to define it, they make analogies to the wrong derivations, fields, or concepts.

    This comes from squeezing work from one discipline (say tail risk) into another, say statistics, without knowing what one should know.1

    1.2 disciplines

    The genealogical chart in Figure 1.2 makes clear the texture of the idea and earlier research, both in the philosophical and mathematical traditions: for instance, in spite of the affinity and the complementarity, there has been no connection between the philosophical idea of skepticism and the mathematics of fat tails (or even the concept of fat tails as seen in consequence space).2

    1 In [85]: "For, as Popper repeatedly stresses, one should not take disciplines too seriously. What matters are problems rather than disciplines, the latter existing largely for historical reasons and administrative convenience. The problems of natural philosophy cut across all conventional disciplinary boundaries." We do believe that there exists a discipline of risk management (necessarily tail risk management) which is a completely separate one.

    Problem of Induction The classical logical problem of how observation of elements in a set A allows generalizations that apply outside of it. Where A is a temporal series, using the notation in this book, how the enumeration of observations up to period t allows one to make statements about a given period $t + \Delta t$.

    We use "empirical risk estimators" as naive frequencies of subsets of events.

    Note that contrary to recent treatments the problem isn't Hume's, but has been treated by Sextus Empiricus, Algazel, a collection of Fideists, etc.

    One interesting extension is that a time series can fit a wide selection of extrapolating nonlinear models (the "grue paradox").

    Convergence and Law of Large Numbers At what speed the sum of n random variables reveals properties about their mean. It is a central theme, but in Chapter 7 in particular, we show that under fat tails the law operates very slowly, if at all. Also under model errors.

    We transform the LLN into a mathematical version of the problem of induction in Chapter 4 and the speed of convergence in Chapter 7.

    Another convergence problem concerns the Central Limit Theorem, and the more general Stable Distribution Basin: how sums (averages) of random variables converge in distribution to a basin, the Gaussian being a special case. We find convergence to the Gaussian to be slower in the real world than in the textbook, and totally defective as an approximation since convergence doesn't truly take place in the tails.

    Tail Events A tail event is one with small probability that takes place away from the center of the distribution. We define mathematically "where does the tail start" in section 4.5 for a unimodal distribution, with crossovers between "body" and "tails" based on convexity to the scale of the distribution.

    Fat Tailed Distributions Fattailedness for a distribution is the extent to which the properties of a sum are dominated by those of a single observation, the "tail event". We have more precise gradations mathematically defined in Chapter 4: binary, sub-gaussian, thin-tailed, mixed finite moments, subexponential, and scalable. We define Mediocristan and Extremistan at the two opposite poles of the classes of distributions.
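As a rough numerical illustration of this definition (a sketch, not taken from the text), one can compare the share of a sum contributed by its single largest observation under a thin-tailed and a fat-tailed sampler. The Pareto tail exponent of 1.1, the sample size, and the seed are assumed values chosen only for illustration.

```python
import random

def max_to_sum_ratio(samples):
    """Share of the total contributed by the single largest observation."""
    return max(samples) / sum(samples)

rng = random.Random(7)   # fixed seed, illustrative only
n = 10_000               # assumed sample size

# Thin-tailed case: absolute values of standard Gaussian draws.
thin = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]

# Fat-tailed case: Pareto draws with an assumed tail exponent of 1.1.
fat = [rng.paretovariate(1.1) for _ in range(n)]

thin_ratio = max_to_sum_ratio(thin)
fat_ratio = max_to_sum_ratio(fat)

print(f"thin-tailed max/sum: {thin_ratio:.5f}")
print(f"fat-tailed  max/sum: {fat_ratio:.5f}")
```

With samples of this size the largest Gaussian draw should account for a negligible share of the total, while the largest Pareto draw should account for a visibly larger one.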

    Tail Risks Risk of events that take place away from the center, mostly what is called "ruin" in the insurance literature.

    Risk and tail risk are indistinguishable for our purpose, owing to the fact that events in the body of the distributions are not quite the business of risk management.

    Cramér Condition Requires existence of exponential moments as a condition for the use of the Large Deviation Theorem(s), in other words, demands thintailedness. Simply, to be in the yellow part of Figure 1.3.

    2 The only treatment linking mathematical probability to skepticism and degrees of belief as expressed in the ancient traditions is James Franklin's The Science of Conjecture: Evidence and Probability Before Pascal [46].

    The Supreme Scientific Rigor of The Russian School of Probability One can believe in the rigor of mathematical statements about probability without falling into the trap of providing naive computations subjected to model error. If this author were to belong to a school of thought designated by a nationality, the

    {Nationality} school of {discipline},

    it would be the Russian school of probability.

    Members across three generations: P.L. Chebyshev, A.A. Markov, A.M. Lyapunov, S.N. Bernshtein (i.e. Bernstein), E.E. Slutskii, N.V. Smirnov, L.N. Bolshev, V.I. Romanovskii, A.N. Kolmogorov, Yu.V. Linnik, and the new generation: V. Petrov, A.N. Nagaev, A. Shyrayev, and a few more.

    They had something rather potent in the history of scientific thought: they thought in inequalities, not equalities (most famous: Markov, Chebyshev, Bernstein, Lyapunov). They used bounds, not estimates. Even their central limit version was a matter of bounds, which we exploit later by seeing what takes place outside the bounds. They were worlds apart from the new generation of users who think in terms of precise probability or, worse, mechanistic social scientists. Their method accommodates skepticism, one-sided thinking: "A is > x", or "A is O(x)" [Big-O: "of order" x], rather than "A = x".

    For those working on integrating the mathematical rigor in risk bearing they provide a great source. We always know one side, not the other. We know the lowest value we are willing to pay for insurance, not necessarily the upper bound (or vice versa).a

    a The way this connects to robustness, which we will formalize in later chapters, is as follows. Is robust what does not change across perturbation of parameters of the probability distribution; this is the core of the idea in Part II with our focus on fragility and antifragility. The point is refined with concave or convex to such perturbations.

    Large Deviation Principle and Theorems Under some conditions, the tail probabilities decay exponentially, a property that is central in risk management, both in understanding the tail behavior and as a method to figure out where they do not necessarily decay exponentially. The principle tells us why things do not blow up under thin-tailedness but, more significantly, why they could under fat tails, or where the Cramér condition is not satisfied. Where $M_N$ is the mean of a sequence of N realizations (identically distributed) of a random variable X, for large N, consider the tail probability:

    $$P(M_N > x) \approx e^{-N I(x)},$$

    where I(.) is the Cramér (or rate) function (Varadhan [133], Dembo and Zeitouni [24]). If we know the distribution of X, then, by Legendre transformation, $I(x) = \sup_{\theta > 0} \left(\theta x - \lambda(\theta)\right)$, where $\lambda(\theta) = \log E\left(e^{\theta X}\right)$ is the cumulant generating function (which explains the Cramér condition, which requires existence of exponential moments).

    A special case for Bernoulli variables is the Chernoff Bound discussed with binary payoffs in 2.2.
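For Bernoulli variables the rate function has a closed form, which makes the exponential decay easy to check numerically. The sketch below assumes i.i.d. Bernoulli(p) draws with p = 0.5 and a threshold x = 0.7 (both illustrative choices, not from the text) and compares the exact tail probability of the mean with the Chernoff-type bound exp(-N I(x)).

```python
import math

def bernoulli_rate(x, p):
    """Rate function I(x) for i.i.d. Bernoulli(p) variables, for p < x < 1.

    This is the Legendre transform of the cumulant generating function
    log(1 - p + p * exp(theta)).
    """
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def exact_tail(n, p, x):
    """Exact P(M_N >= x), with M_N the mean of n Bernoulli(p) draws."""
    k_min = math.ceil(n * x - 1e-9)  # small tolerance against float rounding
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

p, x = 0.5, 0.7  # assumed illustrative values
for n in (10, 50, 200):
    bound = math.exp(-n * bernoulli_rate(x, p))
    print(f"N={n:4d}  exact tail={exact_tail(n, p, x):.3e}  bound={bound:.3e}")
```

Both columns shrink exponentially in N, with the exact tail staying below the bound; no such rate function exists when exponential moments are infinite.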


    [Figure 1.3 (plot omitted). Labels: Degenerate; Bernoulli; Compact Support; Thin-Tailed from Convergence to Gaussian; Subexponential; Supercubic (3); Lévy-Stable]

    Inseparability Problem The idea that decisions and statistical properties should depend on one another. It is foolish to take a decision under statistical opacity when exposed to left-tail risk, though not otherwise.

    Rational and PseudoRational Decision-Making Pseudorational decision-making is when a researcher, typically a psychologist, pathologizes a decision or deems it nonrational while using the wrong probability representation.

    Contract Theory The modification of exposure by changing contractually the risk, rather than focusing on unknown distributions.

    Skeptical Empiricism Often mistaken for empiricism, it is the exact opposite of what can be called naïve empiricism. It looks for properties of data without falling for problems of induction, mostly by asymmetric inference: distinguish between evidence of absence and absence of evidence. Seeing one black swan allows us to state that the statement "there are no black swans" is false, but not seeing any doesn't allow us to state that "there are no black swans".

    More practically, we develop a version of skeptical empiricism into nonevidentiary risk management, leading to the precautionary principle.

    Figure 1.4: A Discourse to Show that Skeptical Philosophy is of Great Use in Science by François de La Mothe Le Vayer (1588-1672), apparently Bishop Huet's source. Every time I find an original thinker who figured out the skeptical solution to the Black Swan problem, it turns out that he may just be rephrasing a predecessor, not maliciously, but we forget to dig to the roots. As we insist, "Hume's problem" has little to do with Hume, who carried the heavy multi-volume Dictionary of Pierre Bayle (his predecessor) across Europe. I initially thought it was Huet who was the man; but as one digs, new predecessors crop up.

    Academic pseudo-similarities problem Because of lack of erudition (typical of social scientists), and an overriding desire to simplify, some (the nonmathematical) say about the Incerto "Hume did that" (not only wrong, but Hume was rephrasing Bayle's ideas), other artists say this is just about "fat tails" or "Mandelbrot" (not realizing fat tails originate with Pareto and Mandelbrot was "gray swans" not black ones), and not realizing the lack of connection of the mathematical fat tails tradition to the problem of induction; others attribute the idea to "Frank Knight" (who missed both fat tails and skeptical philosophy), other idiots to Popper (who rediscovered Favier's synthesis of the ancient philosophical tradition from Menodotus), others to the biases "Kahneman-Tversky" (limited, according to Daniel Kahneman, to thin tails), etc.


    Incidentally people who know little tend to make statements overestimating similarities and eradicating differentiation: "Tolstoy? Nothing special. Just a novel in Russian. Dostoevsky did that".

    1.3 statistics and risk: two different businesses

    We said (and will repeat) that risk management and tail risk are the same thing; further they cannot be separated from decision-making. Hence the risk manager is the complement of the statistician: they are in different businesses. (As a matter of fact, as we will see in the section on robust statistics, they are in diametrically opposite businesses.)

    The (conventional) statistician looks at the properties inside the confidence intervals, the risk analyst outside of them (Figures 1.5 and 1.6), which is the reason statisticians have trouble with risk statements.

    Figure 1.5: Conventional focus of statistical inference, which is why mechanistic uses cannot apply to tail risks. Scientific papers operate in that space and cannot be directly used to discuss risk. In addition, confidence discussions are binary statements and do not address payoff (see the codifications in Chapter 1). (Note that the two distributions are calibrated to deliver exactly the same probability of staying in the tunnel $[-K, K]$, here 98%, with $K$: $P(X < -K) = .01$, $P(X < K) = .99$.) Plot labels: "Domain of (Mechanistic) Statistical Techniques", "98% tunnel", "Tail Risk", "Tail Opportunity (optionality)"; horizontal axis from -15 to 15.

    1.4 robustness of payoff or robustness of estimation?

    Paradoxically, the use of "robust" statistics in some domains always makes one's exposure more fragile, and vice versa. We will attempt to specify in which domains increasing statistical robustness directly worsens exposures.


    Figure 1.6: Zoom-in of Figure 1.5, showing main difference between tail risks seen under thin (Mediocristan) and fat tails (Extremistan) and why the usual statistics textbook discussions on probabilities need to be refined for risk management. Plot labels: "Fat-tailed Distribution (payoff swamps probability)", "Thin-tailed Distribution (probability swamps payoff)"; horizontal axis from -22 to -10.

    The occurrence of a Black Swan event doesn't change the ex post probability much (it is "robust" according to the definitions we will provide below), but it can change the perceived expectation.

    The unbearable robustness of probability A practical example: take a stream of payoffs, or sample stochastic run, $X^T \equiv (x_i)_{i=1}^{T}$ examined at period T, coming from a generating process with straight properties: the benefits are bounded at +1 and the losses unbounded at $-\infty$. Let the sample run at time T be free of losses in its history, so $X^T = \{1, 1, \ldots, 1, 1\}$; $\min X^T = \max X^T = 1$. Add an observation at minus infinity (sort of, since the payoff is unbounded, or, more practically put, an arbitrarily large negative number, say $-10^{15}$) for the (T + 1)th observation. Now a probability adjustment (or revision to take into account the probability of the negative catastrophe) will be of the order $\frac{1}{T+1}$, so if T = 10000, it is truly tiny; the "Black Swan" (defined as x < 0), whose probability was deemed to be, say, 0 (or rather small), has now a known probability of around 0.0001 (one can overreact and quintuple the newly discovered probability and still it would make little difference in probability space). The change in "estimated" probability, no matter how significant the event appears to be, is tiny. But the expectation of the payoff blew up to our practical minus "infinity". For that, all it took is a single observation.

    This example illustrates the robustness of probability as statistically estimated compared to that of the payoff. Which alas, is the problem.
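The arithmetic of this example can be replayed in a few lines. The sketch below uses the values stated in the text (T = 10000 payoffs of +1, then one observation of minus 10^15); the helper names `prob_negative` and `mean` are illustrative only.

```python
T = 10_000
history = [1.0] * T       # loss-free run: every payoff is +1
catastrophe = -1e15       # practical stand-in for the "minus infinity" payoff

def prob_negative(xs):
    """Naive frequency estimate of P(x < 0)."""
    return sum(1 for x in xs if x < 0) / len(xs)

def mean(xs):
    """Sample expectation of the payoff."""
    return sum(xs) / len(xs)

before_p, before_mean = prob_negative(history), mean(history)
updated = history + [catastrophe]
after_p, after_mean = prob_negative(updated), mean(updated)

print(f"estimated P(x < 0): {before_p} -> {after_p:.6f}")
print(f"estimated mean    : {before_mean} -> {after_mean:.3e}")
```

The frequency estimate moves by 1/(T+1), while the estimated expectation moves from +1 to a number of the order of minus 10^11.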

    This is quite critical because, and it is the theme of the next chapter, risk is concerned with payoff space (and tail payoff at that), not probability space, a trivial point with nontrivial consequences. This in effect makes our discussion different from those of the logical black swan (not capitalized) since the latter is entirely in True/False space, hence probability space.

    Let us get a bit more technical in the discussion of statistical robustness.


    1.4.1 Some Elements of Robustness in Statistics

    We will discuss the Influence Function (or Empirical Influence Function), see Hampel 1974 [59], i.e., by how much a statistical functional changes in response to an arbitrary change in a single data point, that is, replace the i-th value in the sample by an arbitrary value and look at the output of the estimator (Reid [103], Huber 1981 [66]). But first we define a statistical functional.

    Definition 1.3 (Statistical Functional). A statistical functional (Huber, [66]) is an estimator $T^{(n)}(x_1, x_2, \ldots, x_n)$ based on empirical frequencies in a sample of size n that is not determined by the size of the sample, that is, is invariant to $n \pm m$ (where $n, m \in \mathbb{Z}$, $\forall m > n$, $T^{(n)} = T^{(n \pm m)}$). For $(x_1, x_2, \ldots, x_n)$ being represented by a distribution function F, we write $T^{(n)}(F)$.

    Note that in our formulation we used $\pm m$, i.e., a reduction and increase in sample size, rather than just $+m$.

    A statistical functional, under assumptions of continuity, can be written as a type of Taylor expansion for functionals, known in statistics as a Von Mises Expansion. From [103]:

    $$T(G) = T(F) + dT_F(G - F) + O(\text{something}) \tag{1.1}$$

    where $dT_F(G - F)$ is a differential of $T(F)$ operating on $(G - F)$, which is a Gâteaux directional derivative.3

    $$dT_F(G - F) = \lim_{t \to 0} \frac{1}{t}\left(T[F + t(G - F)] - T(F)\right). \tag{1.2}$$

    Our business here with the Gâteaux derivative is because it shows the change of a statistical functional of a distribution F in response to a contamination by another statistical distribution G.

    Influence Function Where $\delta(x)$ is the Dirac Delta Function at x and F the cumulative distribution (as estimated),

    $$IF(x; F, T) = \lim_{t \to 0} \frac{1}{t}\left(T\left[(1 - t)F + t\,\delta(x)\right] - T(F)\right) \tag{1.3}$$

    Rewriting 1.2:

    $$dT_F(G - F) = \int IF(x; F, T)\, d(G - F)(x)$$

    since both G and F have total mass of 1,

    $$dT_F(G - F) = \int IF(x; F, T)\, dG(x)$$

    where IF is the same influence function in 1.3.

    3 The Gâteaux derivative $dF(u; \psi)$ of F at u in the direction $\psi$ is defined as $dF(u; \psi) = \lim_{t \to 0} \frac{F(u + t\psi) - F(u)}{t}$ if the limit exists for all $\psi$ in the domain.

    In brief, IF, the Influence Function, allows us to estimate the effect of a Dirac function at x on the estimator (the probability is so small that we can just add the event without changing the weighting). The IF allows the detection of the "fragility" of the measure to an outlier, with the use of influence maps. However, unlike our measure of fragility, it does not include acceleration and does not account for nonlinear exposures.

    In discrete time, we consider:

    $$IF_i(X) = n\left(T_n(x_1, x_2, \ldots, x_n, x) - T_n(x_1, x_2, \ldots, x_n)\right)$$

    We will use the influence function in the discussion of fragility, in our attempt to remedy the fact that some influence functions are unbounded, hence the corresponding finite variation would be indirectly parametric.
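The discrete-time expression can be evaluated directly for two familiar estimators. The sketch below is a minimal illustration on a made-up five-point sample (the data and the helper name `empirical_influence` are assumptions, not from the text); it shows an unbounded influence for the mean and a bounded one for the median.

```python
import statistics

def empirical_influence(estimator, sample, x):
    """n * (T(x_1, ..., x_n, x) - T(x_1, ..., x_n)) for a list `sample`."""
    n = len(sample)
    return n * (estimator(sample + [x]) - estimator(sample))

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]   # illustrative data

for x in (10.0, 1e3, 1e6):
    if_mean = empirical_influence(statistics.mean, sample, x)
    if_median = empirical_influence(statistics.median, sample, x)
    print(f"x={x:>9}: mean IF={if_mean:.2f}, median IF={if_median:.2f}")
```

The influence on the mean grows linearly with the added point, while the influence on the median stays constant once the point lies beyond the sample. That boundedness is what makes the median "robust" in the statistical sense and blind to the size of the tail event.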

    1.4.2 The Black Swan and Statistical Robustness

    Remark 1.2. "Robustifying" the statistical measure (robustness taken in the statistical sense4) can lead to the "fragilizing" in the risk management sense, and vice versa.

    The nightmare of Robust Statistics It can be explained in this citation in a textbook, Maronna et al. [84] (and, of all things, 2006):

    The three-sigma edit rule: Observations with $|t_i| > 3$ (my note: $t_i \equiv \frac{x_i - \bar{x}}{s}$, where s is the in-sample standard deviation and $\bar{x}$ the sample mean) are traditionally deemed as suspicious (the three-sigma rule), based on the fact that they would be "very unlikely" under normality, since $P(|x| \geq 3) = 0.003$ for a random variable x with a standard normal distribution.

    Consider the difference between the mean (not robust) and the median (robust). Using the median to gauge tail risks will blow people up. Methods to "robustify" a statistical estimation:

    - clipping outliers, Winsorizing
    - use of median
    - discussions of frequencies of success/failure
    - use of VaR instead of CVaR
    - low dimensional stress testing
    - linearizing the payoff function

    We dedicate quite a bit to the point with the difference between vanilla and binary payoffs in the next chapter.

    4 Let us say for now that, with our mathematical formulation of fragility, we define (non)robustness quite differently, based on presence or absence of nonlinear response to random events.

    1.4.3 Mean Deviation vs Standard Deviation, Class Lecture Derivations

    Why The [CENSORED] did statistical science pick STD over Mean Deviation? Here is the story, with analytical derivations not seemingly available in the literature. In Huber [66]:

    There had been a dispute between Eddington and Fisher, around 1920, about the relative merits of $d_n$ (mean deviation) and $S_n$ (standard deviation). Fisher then pointed out that for exactly normal observations, $S_n$ is 12% more efficient than $d_n$, and this seemed to settle the matter. (My emphasis)

    Where N is the count of the summands:

    $$\text{Asymptotic Relative Efficiency} = \lim_{N \to \infty} \left( \frac{V(Std)}{E(Std)^2} \bigg/ \frac{V(Mad)}{E(Mad)^2} \right)$$

    Assume we are certain that $X_i$, the components of the sample, follow a Gaussian distribution, normalized to mean=0 and a standard deviation of 1.

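    Relative Standard Deviation Error. The characteristic function $\Psi_1(t)$ of the distribution of $x^2$:

    $$\Psi_1(t) = \int_{-\infty}^{\infty} \frac{e^{-\frac{x^2}{2} + i t x^2}}{\sqrt{2\pi}} \, dx = \frac{1}{\sqrt{1 - 2it}}.$$

    With the squared deviation $z = x^2$, $f$, the pdf of $N$ summands becomes:

    $$f_Z(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-itz) \left( \frac{1}{\sqrt{1 - 2it}} \right)^{N} dt = \frac{2^{-\frac{N}{2}} e^{-\frac{z}{2}} z^{\frac{N}{2} - 1}}{\Gamma\left(\frac{N}{2}\right)}, \quad z > 0.$$

    Now take $y = \sqrt{z}$:

    $$f_Y(y) = \frac{2^{1 - \frac{N}{2}} e^{-\frac{y^2}{2}} y^{N-1}}{\Gamma\left(\frac{N}{2}\right)}, \quad y > 0,$$

    which corresponds to the Chi distribution with $N$ degrees of freedom. Integrating to get the variance:

    $$V_{std}(N) = N - \frac{2\,\Gamma\left(\frac{N+1}{2}\right)^2}{\Gamma\left(\frac{N}{2}\right)^2}.$$

    And, with the mean equalling $\frac{\sqrt{2}\,\Gamma\left(\frac{N+1}{2}\right)}{\Gamma\left(\frac{N}{2}\right)}$, we get

    $$\frac{V(\mathrm{Std})}{E(\mathrm{Std})^2} = \frac{N\,\Gamma\left(\frac{N}{2}\right)^2}{2\,\Gamma\left(\frac{N+1}{2}\right)^2} - 1.$$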

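    Relative Mean Deviation Error. The characteristic function for $|x|$ is again that of a folded Normal distribution, but let us redo it:

    $$\Psi_2(t) = \int_{0}^{\infty} \sqrt{\frac{2}{\pi}} \, e^{-\frac{x^2}{2} + i t x} \, dx = e^{-\frac{t^2}{2}} \left( 1 + i \, \mathrm{erfi}\left( \frac{t}{\sqrt{2}} \right) \right),$$

    where erfi is the imaginary error function $\mathrm{erf}(iz)/i$.

    The first moment:

    $$M_1 = -i \, \frac{\partial}{\partial t} \left[ e^{-\frac{t^2}{2N^2}} \left( 1 + i \, \mathrm{erfi}\left( \frac{t}{\sqrt{2}N} \right) \right) \right]^{N} \Bigg|_{t=0} = \sqrt{\frac{2}{\pi}}.$$

    The second moment:

    $$M_2 = (-i)^2 \, \frac{\partial^2}{\partial t^2} \left[ e^{-\frac{t^2}{2N^2}} \left( 1 + i \, \mathrm{erfi}\left( \frac{t}{\sqrt{2}N} \right) \right) \right]^{N} \Bigg|_{t=0} = \frac{2N + \pi - 2}{\pi N}.$$

    Hence,

    $$\frac{V(\mathrm{Mad})}{E(\mathrm{Mad})^2} = \frac{M_2 - M_1^2}{M_1^2} = \frac{\pi - 2}{2N}.$$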

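    Finally, the Asymptotic Relative Efficiency for a Gaussian:

    $$ARE = \lim_{N \to \infty} \frac{N \left( \frac{N\,\Gamma\left(\frac{N}{2}\right)^2}{\Gamma\left(\frac{N+1}{2}\right)^2} - 2 \right)}{\pi - 2} = \frac{1}{\pi - 2} \approx .875$$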
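As a numerical sanity check on the limit above, the following sketch evaluates the two relative variances at finite N and compares their ratio with 1/(π − 2). It assumes the closed forms derived in this section; the function names are illustrative and do not come from the text, and `math.lgamma` is used only to keep the ratio of Gamma functions numerically stable.

```python
import math


def rel_var_std(n: int) -> float:
    """V(Std)/E(Std)^2 for the chi distribution with n degrees of freedom."""
    # log of Gamma(n/2) / Gamma((n+1)/2), computed in log space to avoid overflow
    log_ratio = math.lgamma(n / 2) - math.lgamma((n + 1) / 2)
    return n * math.exp(2 * log_ratio) / 2 - 1


def rel_var_mad(n: int) -> float:
    """V(Mad)/E(Mad)^2 for the mean absolute deviation of n standard Gaussians."""
    return (math.pi - 2) / (2 * n)


def relative_efficiency(n: int) -> float:
    """Finite-N version of the ratio whose limit is the ARE."""
    return rel_var_std(n) / rel_var_mad(n)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, relative_efficiency(n))
    print("limit 1/(pi - 2):", 1 / (math.pi - 2))
```

The ratio approaches the limit from above as N grows. Very large N (say above 10^5) is best avoided in this sketch, because the subtraction of 1 in `rel_var_std` loses precision.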

    which means that the standard deviation is 12.5% more "efficient" than the mean deviation conditional on the data being Gaussian, and these blokes bought the argument. Except that the slightest contamination blows up the ratio. We will discuss the problem of STD vs MD at length in Chapter 4 (characteristically on Fat Tails and "Turkey Problems") and we will show later why the L2 norm is not appropriate for about anything; but for now let us get a glimpse of how fragile the STD is.

    1.4.4 Effect of Fatter Tails on the "efficiency" of STD vs MD

    Consider a standard mixing model for volatility, allowing us to raise the kurtosis up to twice that of a Gaussian. Switch between Gaussians (keeping the mean constant and central at 0) with:

    $$V(x) = \begin{cases} \sigma^2 (1 + a) & \text{with probability } p \\ \sigma^2 (1 - a) & \text{with probability } (1 - p) \end{cases}$$

    Assume further that $\sigma = 1$. [INCOMPLETE]
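The section stops here, so the following is only a sketch of what the mixing model implies, not the author's derivation. It assumes p = 1/2 by default (the text leaves p free) and uses the standard Gaussian facts E[x^4] = 3v^2 and E|x| = sqrt(2v/π) for a zero-mean Gaussian of variance v. The function name and return layout are illustrative.

```python
import math


def mixture_stats(a: float, p: float = 0.5, sigma: float = 1.0):
    """Return (variance, kurtosis, STD/MAD ratio) of the zero-mean variance mixture."""
    v_hi = sigma ** 2 * (1 + a)  # variance in the high-volatility state
    v_lo = sigma ** 2 * (1 - a)  # variance in the low-volatility state
    m2 = p * v_hi + (1 - p) * v_lo                    # E[x^2]
    m4 = 3 * (p * v_hi ** 2 + (1 - p) * v_lo ** 2)    # E[x^4]
    mad = math.sqrt(2 / math.pi) * (p * math.sqrt(v_hi) + (1 - p) * math.sqrt(v_lo))
    return m2, m4 / m2 ** 2, math.sqrt(m2) / mad


if __name__ == "__main__":
    for a in (0.0, 0.5, 1.0):
        variance, kurtosis, std_over_mad = mixture_stats(a)
        print(a, variance, kurtosis, std_over_mad)
```

With p = 1/2 the variance stays at 1 while the kurtosis equals 3(1 + a^2), so a = 1 gives twice the Gaussian value, and the STD/MAD ratio rises from sqrt(π/2) at a = 0 to sqrt(π) at a = 1.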

    1.5 why we should retire standard deviation

    The notion of standard deviation has confused hordes of scientists; it is time to retire it from common use and replace it with the more effective one of mean deviation. Standard deviation, STD, should be left to mathematicians, physicists and mathematical statisticians deriving limit theorems. There is no scientific reason to use it in statistical investigations in the age of the computer, as it does more harm than good, particularly with the growing class of people in social science mechanistically applying statistical tools to scientific problems.

    Say someone just asked you to measure the "average daily variations" for the temperature of your town (or for the stock price of a company, or the blood pressure of your uncle) over the past five days. The five changes are: (-23, 7, -3, 20, -1). How do you do it?

    Do you take every observation: square it, average the total, then take the square root? Or do you remove the sign and calculate the average? For there are serious differences between the two methods. The first produces 15.7 when the divisor is n − 1 = 4, as in the usual sample standard deviation (14.1 with a plain average over the five observations); the second produces 10.8. The first is technically called the root mean square deviation. The second is the mean absolute deviation, MAD. It corresponds to "real life" much better than the first, and to reality. In fact, whenever people make decisions after being supplied with the standard deviation number, they act as if it were the expected mean deviation.

    It is all due to a historical accident: in 1893, the great Karl Pearson introduced the term "standard deviation" for what had been known as "root mean square error". The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation.
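The five-day example can be reproduced directly. The sketch below computes the root mean square deviation with both divisors (n and n − 1) and the mean absolute deviation; the variable names are illustrative. The changes happen to sum to zero, so centering on the mean does not alter the numbers.

```python
import math

changes = [-23, 7, -3, 20, -1]
n = len(changes)
mean = sum(changes) / n  # the five changes sum to zero

sum_sq = sum((x - mean) ** 2 for x in changes)

rms = math.sqrt(sum_sq / n)               # square, average, take the root
sample_std = math.sqrt(sum_sq / (n - 1))  # same, with the n - 1 divisor
mad = sum(abs(x - mean) for x in changes) / n  # drop the sign, then average

print("root mean square deviation (divisor n):", rms)
print("sample standard deviation (divisor n - 1):", sample_std)
print("mean absolute deviation:", mad)
```

Either version of the first method weights the two large moves (-23 and 20) far more heavily than the second method does.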


    But it is not just journalists who fall for the mistake: I recall seeing official documents from the department of commerce and the Federal Reserve partaking of the conflation, even regulators in statements on market volatility. What is worse, Goldstein and I found that a high number of data scientists (many with PhDs) also get confused in real life.

    It all comes from bad terminology for something non-intuitive. By a psychological bias Danny Kahneman calls attribute substitution, some people mistake MAD for STD because the former is easier to come to mind.

    1) MAD is more accurate in sample measurements, and less volatile than STD since it is a natural weight whereas standard deviation uses the observation itself as its own weight, imparting large weights to large observations, thus overweighting tail events.

    2) We often use STD in equations but really end up reconverting it within the process into MAD (say in finance, for option pricing). In the Gaussian world, STD is about 1.25 times MAD, that is, $\sqrt{\pi/2}$. But we adjust with stochastic volatility where STD is often as high as 1.6 times MAD.

    3) Many statistical phenomena and processes have "infinite variance" (such as the popular Pareto 80/20 rule) but have finite, and very well behaved, mean deviations. Whenever the mean exists, MAD exists. The reverse (infinite MAD and finite STD) is never true.

    4) Many economists have dismissed "infinite variance" models thinking these meant "infinite mean deviation". Sad, but true. When the great Benoit Mandelbrot proposed his infinite variance models fifty years ago, economists freaked out because of the conflation.

    It is sad that such a minor point can lead to so much confusion: our scientific tools are way too far ahead of our casual intuitions, which starts to be a problem with science. So I close with a statement by Sir Ronald A. Fisher: The statistician cannot evade the responsibility for understanding the process he applies or recommends.
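Point 3 can be illustrated with a Pareto distribution. The sketch below is not from the text: it assumes a Pareto law with minimum value `x_min` and tail exponent `alpha`, uses the standard result that the 80/20 rule corresponds to alpha = log 5 / log 4 (about 1.16), and uses the closed form E|X − E[X]| = 2 x_min alpha^(1−alpha) (alpha − 1)^(alpha−2), obtained by integrating the survival function above the mean. Function names are illustrative.

```python
import math


def pareto_mean(alpha: float, x_min: float = 1.0) -> float:
    """Mean of a Pareto law; it exists only for alpha > 1."""
    if alpha <= 1:
        return math.inf
    return alpha * x_min / (alpha - 1)


def pareto_variance(alpha: float, x_min: float = 1.0) -> float:
    """Variance of a Pareto law; infinite for alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return x_min ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))


def pareto_mad(alpha: float, x_min: float = 1.0) -> float:
    """Mean absolute deviation around the mean; finite whenever alpha > 1."""
    if alpha <= 1:
        return math.inf
    return 2 * x_min * alpha ** (1 - alpha) * (alpha - 1) ** (alpha - 2)


if __name__ == "__main__":
    alpha_8020 = math.log(5) / math.log(4)  # tail exponent matching the 80/20 rule
    print("alpha:", alpha_8020)
    print("mean:", pareto_mean(alpha_8020))
    print("variance:", pareto_variance(alpha_8020))
    print("mean absolute deviation:", pareto_mad(alpha_8020))
```

For the 80/20 exponent the variance is infinite while the mean and the mean absolute deviation are both finite.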

    1.6 error propagation in higher dimensions in progress

    Outliers get worse with dimensionality. One observation we traders made when the VaR (Value at Risk) came out: on some random days, without anything noticeable happening in prices, we would be baffled to see the joint distribution blowing up in the tails, that is, deliver an event that has lower than $10^{-3}$ probability without any of the variables exceeding one "standard deviation" (or, rather, the equivalent in mean absolute deviation). Let me repeat: without anything noticeable happening in the various underlying distributions involved.

    We had already been made familiar with such puzzles from quanto payoffs, derivatives that entail more than one underlying variable, but nothing of that magnitude.

    Of course we naturally blamed the effect on the breakdown in correlations. But (as we see in section 4.17 on joint fat-tailedness and elliptical distributions) one large segment of the effect had little to do with breakdown in correlation; it comes from parameter imprecision, a slight error rate around the expected variations.

    Remark 1.3 (Convexity of error propagation to higher dimensions).
    Parameter estimation errors at the level of the underlying probabilities propagating into a d-dimensional copula are convex in relation to the order of dimensionality d.

    Consider a vector of independent but not identically distributed random variables; each one could follow its own distribution. Now take the "tail exposure" as the probability of being outside an interval, which we arbitrarily define as "below".

    For the general proof we need some tools and definitions.

    Probability integral transform. Let $X$ have a continuous distribution for which the cumulative distribution function (CDF) is $F_X$. Then the random variable $Y$ defined as $Y = F_X(X)$ is uniform between 0 and 1, which we write $Y \sim U[0,1]$.

    The proof is as follows: For $t \in [0, 1]$,

    $$P(Y \leq t) = P(F_X(X) \leq t) = P(X \leq F_X^{-1}(t)) = F_X(F_X^{-1}(t)) = t,$$

    which is the cumulative distribution function of the $U[0,1]$.
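The transform can be checked by simulation. The sketch below is an illustration under an arbitrary choice not made in the text: X is taken to be exponential with rate 1, so that F_X(x) = 1 − exp(−x). The sample size, seed and function names are likewise assumptions.

```python
import math
import random


def pit_sample(n: int, seed: int = 0) -> list:
    """Draw X ~ Exponential(1) and return Y = F_X(X) = 1 - exp(-X)."""
    rng = random.Random(seed)
    return [1.0 - math.exp(-rng.expovariate(1.0)) for _ in range(n)]


def empirical_cdf(values, t: float) -> float:
    """Fraction of values that are <= t."""
    return sum(v <= t for v in values) / len(values)


ys = pit_sample(20000)

if __name__ == "__main__":
    # For a uniform variable, P(Y <= t) should be close to t itself.
    for t in (0.1, 0.25, 0.5, 0.9):
        print(t, empirical_cdf(ys, t))
```

Any other continuous distribution with a known CDF could be substituted for the exponential without changing the conclusion.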

    Copulas. Let us define a copula $C_d : [0, 1]^d \to [0, 1]$. Consider a $d$-dimensional random vector $(X_1, X_2, \ldots, X_d)$ with continuous marginal distribution functions $F_1(X_1), F_2(X_2), \ldots, F_d(X_d)$. The copula of $(X_1, X_2, \ldots, X_d)$ is the joint cumulative distribution function of the variables $(U_1, U_2, \ldots, U_d)$:

    $$C(u_1, u_2, \ldots, u_d) = P(U_1 \leq u_1, U_2 \leq u_2, \ldots, U_d \leq u_d)$$

    where $(U_1, U_2, \ldots, U_d) = (F_1(X_1), F_2(X_2), \ldots, F_d(X_d))$.

    Next let us set arbitrarily a level $f$: $f = f_1 = f_2 = \ldots = f_d$, where these are cumulative "left tail probabilities" as they correspond to the values on the left of whatever level $K_i$, which is set such that $f = F_i(K_i)$. Note that $f \in [0, 1)$.

    Assuming independence, we simplify with $C_d(f) = f^d$.

    Errors expressed in probability. We can easily prove things with a simplified model, where $f$ has a symmetric error $\delta$, and see how it propagates in higher dimensions: $f$ becomes $f - \delta$ with probability $p = \frac{1}{2}$ and $f + \delta$ with probability $p = \frac{1}{2}$.

    Define $M_i(d)$ as the $i$th moment for the copula of dimension $d$. The first moment is $M_1(1) = f$. The second moment becomes

    $$M_2(1) = \frac{1}{2} \left( (f - \delta)^2 + (\delta + f)^2 \right) = f^2 + \delta^2.$$

    Define the relative error for the d-copula as $RE(d) = \frac{M_2^d - M_1^{2d}}{M_1^{2d}}$; we have:

    $$RE(d) = f^{-2d} \left( \delta^2 + f^2 \right)^d - 1$$

    which increases with $d$ in an accelerated way owing to its convexity, as the second derivative becomes:

    $$\frac{\partial^2 RE(d)}{\partial d^2} = f^{-2d} \left( \delta^2 + f^2 \right)^d \left( \log\left( \delta^2 + f^2 \right) - 2 \log(f) \right)^2.$$
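A short numerical illustration of the simplified two-point model follows. It assumes independence across the d components, as in the text; the values f = 0.1 and δ = 0.02 are arbitrary choices made for the example, and the function names are illustrative.

```python
import math


def relative_error(d: float, f: float, delta: float) -> float:
    """RE(d) = f^(-2d) (delta^2 + f^2)^d - 1 for the symmetric two-point error."""
    return ((f ** 2 + delta ** 2) / f ** 2) ** d - 1


def second_derivative(d: float, f: float, delta: float) -> float:
    """Closed-form second derivative of RE with respect to d."""
    log_term = math.log(delta ** 2 + f ** 2) - 2 * math.log(f)
    return f ** (-2 * d) * (delta ** 2 + f ** 2) ** d * log_term ** 2


if __name__ == "__main__":
    f, delta = 0.1, 0.02  # illustrative tail probability and error
    for d in (1, 2, 5, 10, 20, 50):
        print(d, relative_error(d, f, delta), second_derivative(d, f, delta))
```

The second derivative is a square multiplied by positive factors, so it is positive for any δ > 0, which is the convexity claimed in Remark 1.3 for this simplified case.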

    A more involved elaboration using full distributions is as follows. The unconditional $f$ is uniformly distributed on (0, 1). So let the conditional, that is a given starting point for $f$ at $m$, be distributed with a mean of $m$ (we presume no bias) following a Beta distribution, which is a natural conditional distribution bounded on the same domain (0, 1), $B\left[ \frac{bm}{1-m}, b \right]$. Clearly the distribution of the error will have a variance, calibrated through $b$, in accordance with whatever error rate the assessors have, which could be in proportion to sample size.⁵

    The distribution of an individual cumulative frequency becomes

    $$F(f) = \frac{(1 - f)^{b-1} f^{\frac{bm}{1-m} - 1}}{B\left( \frac{bm}{1-m}, b \right)}, \quad f \in (0, 1)$$

    where $B(\cdot, \cdot)$ is the beta function.

    There are ways to derive the PDF of the product of beta distributed independent random variables in terms of Meijer functions (Tang and Gupta [123], Bhargava et al. [11], most recently Dunkl [30]). But it is not necessary at this stage as we can get the moments using the product rule and compute the relative errors.

    The first moment is $M_1(1) = m$ and the second is $M_2(1) = \frac{m((b-1)m+1)}{b-m+1}$.

    Using for the relative error for the d-copula $RE(d) = \frac{M_2^d - M_1^{2d}}{M_1^{2d}}$, we have:

    $$RE(d) = m^{-2d} \left( \left( \frac{m((b-1)m+1)}{b-m+1} \right)^d - m^{2d} \right) \tag{1.5}$$

    5 Effect of Errors on the Mean of the Copula (a Note): Where we show that the presence of an error rate lowers the tail probabilities for the copula regardless of distributions.
    Define an error rate $\delta$ in probability that affects all elements of the vector of random variables, $\delta < f$. We simulate a Bernoulli distribution of $\{+\delta, -\delta, \ldots\}$ with $\sum_{i=1}^{d} \delta(-1)^i = 0$ for $d$ even. We can write $C_d(f) = \prod_{i=1}^{d} \left( f + \delta(-1)^i \right)$, which yields:

    $$C_d(f) = \frac{f^{d+1} \left( -\frac{\delta}{f}; -1 \right)_{d+1}}{\delta + f} \tag{1.4}$$

    where $(\cdot\,; \cdot)_n$ is the q-Pochhammer symbol, the q-analog of the Pochhammer which represents the "falling factorial", $(a; q)_n = \prod_{k=0}^{n-1} (1 - aq^k) = (1 - a)(1 - aq)(1 - aq^2) \cdots (1 - aq^{n-1})$ with $(a; q)_0 = 1$. The derivative becomes:

    $$\frac{\partial C_d(f)}{\partial \delta} = -\frac{f^d \left( (\delta + f) \left( -\frac{\delta}{f}; -1 \right)^{(1,0)}_{d+1} + f \left( -\frac{\delta}{f}; -1 \right)_{d+1} \right)}{(\delta + f)^2}$$

    which is negative. We can see the effect: for $d = 2$, $(f + \delta)(f - \delta) = f^2 - \delta^2$.
    So ironically creating symmetric noise at the level of the distribution lowers tail probability, a mild effect in higher dimensions since tail probabilities by compounding become very low.
    Now, to simplify the point, define $f = F(K)$, $f^+ = F(K + \Delta K)$, $f^- = F(K - \Delta K)$. We can clearly show that $f^+ f^- \leq f^2$.

    Figure 1.7: The uniform distribution of cumulative distributions f, vs Beta with mean at m = 1/3. [Plot: Pr against f on (0, 1).]

    Figure 1.8: RE(d), the relative error for the copula (from estimation error in the components) as we increase dimensionality. Consider RE(1) as the error for a given component. [Plot: Relative Error of Copula RE(d) against d, for m = 0.1, b = 20 and for m = 0.3, b = 20.]

    Figure 1.9: Relative error for the copula as we increase dimensionality, with m = .03. [Plot: Relative Error of Copula RE(d) against d.]

    with derivative

    $$\frac{\partial RE}{\partial d} = m^{-2d} \left( \frac{m((b-1)m+1)}{b-m+1} \right)^d \left( \log\left( \frac{m((b-1)m+1)}{b-m+1} \right) - 2 \log(m) \right)$$
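The Beta version of the relative error can be evaluated directly. The sketch below assumes the parametrization Beta(bm/(1 − m), b) used above and checks the two moments against the standard Beta moment formulas before computing RE(d); the value b = 20 is borrowed from the legend of Figure 1.8 and is only an assumption when paired with m = 0.03. Function names are illustrative.

```python
import math


def beta_moments(m: float, b: float):
    """First two raw moments of Beta(alpha, b) with alpha = b m / (1 - m)."""
    alpha = b * m / (1 - m)
    first = alpha / (alpha + b)
    second = alpha * (alpha + 1) / ((alpha + b) * (alpha + b + 1))
    return first, second


def second_moment_closed_form(m: float, b: float) -> float:
    """M2(1) = m((b - 1)m + 1) / (b - m + 1), as stated in the text."""
    return m * ((b - 1) * m + 1) / (b - m + 1)


def relative_error(d: float, m: float, b: float) -> float:
    """RE(d) from equation (1.5), written as (M2 / m^2)^d - 1."""
    return (second_moment_closed_form(m, b) / m ** 2) ** d - 1


def relative_error_slope(d: float, m: float, b: float) -> float:
    """Derivative of RE with respect to d."""
    m2 = second_moment_closed_form(m, b)
    return (m2 / m ** 2) ** d * (math.log(m2) - 2 * math.log(m))


if __name__ == "__main__":
    for m in (0.1, 0.3, 0.03):
        print(m, [relative_error(d, m, 20.0) for d in (1, 5, 10, 20)])
```

The smaller the tail probability m, the faster the relative error compounds with dimension, since the per-component ratio M2/m² grows as m shrinks.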


    The main idea of The Black Swan was understood by:

    100% of Firemen
    99.9% of skin-in-the-game risk-takers and businesspersons
    85% of common readers
    80% of hard scientists (except some complexity artists)
    65% of psychologists (except Harvard psychologists)
    60% of traders
    25% of U.K. journalists
    15% of money managers who manage money of others
    1.5% of "Risk professionals"
    1% of U.S. journalists
    and
    0% of economists (or perhaps, to be fair, .5%)

    It is frequent that economists like Andrew Lo and Mueller [76] or Nicholas Barberis [7] play straw man by treating it as "popular" (to delegitimize its intellectual content) while both misunderstanding (and misrepresenting) its message and falling for the very errors it warns against, as in the confusion between binary and vanilla exposures or the notion of adapting decision to model or representational error.ᵃ

    a Lo and Mueller: "... "black swans" (Taleb, 2007). These cultural icons refer to disasters that occur so infrequently that they are virtually impossible to analyze using standard statistical inference. However, we find this perspective less than helpful because it suggests a state of hopeless ignorance in which we resign ourselves to being buffeted and battered by the unknowable." Had they read The Black Swan they would have found the message is the exact opposite of "blissful ignorance". [CENSORED] [CENSORED].

    1.7 the tail sensitivity theorems in progress

    - Why should uncertainty about climate models lead to a more conservative, more cautious, "greener" stance, even if one disbelieved the models?

    - Why do super-rich gain more from inequality than from gains in GDP, in proportion to how rich they are?

    - Why do derivatives "in the tails" depend more on changes in expected volatility than on changes in the mean (not well known by practitioners of mathematical finance (derivatives) who get periodically harmed by it)?

    - Why should one worry about GMOs even if one accepted their purported benefits?

    It is a necessarily mathematical relation that remote parts of the distribution (the tails) are less sensitive to changes in the mean, and more sensitive to other parameters controlling the dispersion and scale (which in the special case of the class of finite-variance distributions would be the variance) or those controlling the tail exponent.

    We are interested in the expectation of order $k$ for the segment above (below) a certain level $K$, $m_K = \int_{|K|}^{\mathrm{sgn}(K)\infty} x^k \, dF(x)$, $|x| \in (l, \infty)$ where $l \geq 0$. Without specifying what type of distribution we are considering, we can show that there exists a value $|x| > |x_q|$ defined as the "tail" in ?? above which "large values" of $|x|$ have the following properties.

    For Type 1 distributions:

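    Remark 1.4 (Delta-vega dominance for Type 1 distributions).
    Changes in the value of $\varphi : D \to [0, 1]$, $\varphi(x) = x^m \lambda \frac{1}{s} e^{-g\left( \frac{x - \delta}{s} \right)}$, $m \in \mathbb{Z}$, $s > 0$, where $\lambda$ is a norming constant and $g(x)$ a function in the class $C^1$ that satisfies $\int_{x \in D} x^m e^{-g\left( \frac{x - \delta}{s} \right)} dx = \frac{s}{\lambda}$, develop a greater relative dependence on changes in $s$ than on $\delta$ when $|x| > |x_q|$, with

    $$|x_q| : \left| g'\left( \frac{x_q - \delta}{s} \right) \right| \geq \left| \frac{s}{x_q - \delta} \right|,$$

    and the relative effect grows at larger values of $|x|$, that is above $|x'_q|$:

    $$|x'_q| = \left\{ x : \frac{1}{s^2} \left( \frac{2s \left( 2(\delta - x) \, g'\left( \frac{x - \delta}{s} \right) + s \right)}{g'\left( \frac{x - \delta}{s} \right)^2 - g''\left( \frac{x - \delta}{s} \right)} + (x - \delta)^2 \right) > 0 \right\}$$

    In other words, to use derivatives parlance, the vega dominates the gamma beyond a certain level $x_q$ in an increasing way.

    Taking the ratio of derivatives:

    $$\frac{\partial \varphi / \partial s}{\partial \varphi / \partial \delta} = \frac{x - \delta}{s} - \frac{1}{g'\left( \frac{x - \delta}{s} \right)}$$

    The relative acceleration becomes:

    $$\frac{\partial^2 \varphi / \partial s^2}{\partial^2 \varphi / \partial \delta^2} = \frac{1}{s^2} \left( \frac{2s \left( 2(\delta - x) \, g'\left( \frac{x - \delta}{s} \right) + s \right)}{g'\left( \frac{x - \delta}{s} \right)^2 - g''\left( \frac{x - \delta}{s} \right)} + (x - \delta)^2 \right)$$

    which has a singularity at $\left\{ x : g'(\cdot) = \sqrt{g''(\cdot)} \right\}$.

    As we see, $g(\cdot)$ can take many functional forms in the class $C^1$ that satisfy the integrability and derivatives conditions. With $\lambda$ constant, the numerator is

    $$\frac{\partial \varphi}{\partial s} = -\frac{C x^m e^{-g\left( \frac{x - \delta}{s} \right)} \left( (\delta - x) \, g'\left( \frac{x - \delta}{s} \right) + s \right)}{s^3},$$

    the denominator:

    $$\frac{\partial \varphi}{\partial \delta} = \frac{C x^m e^{-g\left( \frac{x - \delta}{s} \right)} g'\left( \frac{x - \delta}{s} \right)}{s^2},$$

    which makes the ratio invariant to $m$ and $\lambda$.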
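The ratio of first derivatives can be checked numerically. The sketch below takes the Gaussian special case with g(u) = u²/2, so that g'(u) = u and the closed form is u − 1/u with u = (x − δ)/s; the derivatives are approximated by central finite differences. The step size, test points and function names are assumptions made for the illustration.

```python
import math

NORMING = 1 / math.sqrt(2 * math.pi)  # lambda for the standard Gaussian kernel


def phi(x: float, delta: float, s: float, m: int = 0) -> float:
    """phi(x) = x^m * lambda * (1/s) * exp(-g((x - delta)/s)) with g(u) = u^2 / 2."""
    u = (x - delta) / s
    return x ** m * NORMING / s * math.exp(-0.5 * u * u)


def ratio_numeric(x: float, delta: float, s: float, m: int = 0, h: float = 1e-6) -> float:
    """Finite-difference estimate of (d phi / d s) / (d phi / d delta)."""
    d_s = (phi(x, delta, s + h, m) - phi(x, delta, s - h, m)) / (2 * h)
    d_delta = (phi(x, delta + h, s, m) - phi(x, delta - h, s, m)) / (2 * h)
    return d_s / d_delta


def ratio_closed_form(x: float, delta: float, s: float) -> float:
    """(x - delta)/s - 1/g'((x - delta)/s), with g'(u) = u for the Gaussian."""
    u = (x - delta) / s
    return u - 1 / u


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(x, ratio_numeric(x, 0.0, 1.0), ratio_closed_form(x, 0.0, 1.0))
```

In this special case the ratio changes sign at u = 1 and grows roughly linearly in x beyond it, so the sensitivity to the scale overtakes the sensitivity to the location further out in the tail. Changing the exponent m leaves the ratio unchanged, as stated above.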

    Remark 1.5.
    Type 1 includes the normal, lognormal, exponential, gamma distributions.

    For the Gaussian, $g(x) \equiv \frac{(x - \mu)^2}{2\sigma^2}$, $m = 0$, $\lambda = \frac{1}{\sqrt{2\pi}\sigma}$. The behavior of the $k$th moment is examined with $m = 1$. The condition for the dominance of $s$ over $\delta$:

    $$|x| \geq \delta + \frac{1}{2} s \left( \mu + \sqrt{\mu^2 + 4\sigma^2} \right)$$

    for the first order dominance. The second order dominance:

    $$|x| \geq \delta + \sqrt{\frac{1}{2} \left( 5 + \sqrt{17} \right)} \, s$$

    For the lognormal, $g(x) \equiv \frac{(\mu - \log(x))^2}{2\sigma^2}$, $m = -1$, $\lambda = \frac{1}{\sqrt{2\pi}\sigma}$. The behavior of the $k$th moment is examined with $k = m - 1$. The condition:

    $$x \geq \delta + s \, e^{\mu + 2\sigma^2}$$

    Crossover points

    Figure 1.10: Gaussian Case. [Plot: ratio of 2nd derivatives against x.]

    Figure 1.11: Lognormal Case. [Plot: ratio of 2nd derivatives against x.]

    For Type 2 distributions:

    Remark 1.6 (Delta-vega tradeoff for Type 2 distributions).
    Changes in the value of $\varphi(x) = L(\alpha, s) \left( \frac{x - \delta}{s} \right)^{-\alpha - 1}$, where $L$ is a slowly varying function of $x$, develop a greater relative dependence on changes in $s$ or those in $\alpha$ than on $\delta$ at greater values of $|x|$.

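    Proof. As before (from the property of the independence of the ratio of derivatives of $g(\cdot)$, $m$, and other parameters), we recover:

    $$\frac{\frac{\partial}{\partial s} \left[ L(\alpha, s) \, x^m \left( \frac{x - \delta}{s} \right)^{-\alpha - 1} \right]}{\frac{\partial}{\partial \delta} \left[ L(\alpha, s) \, x^m \left( \frac{x - \delta}{s} \right)^{-\alpha - 1} \right]} = (x - \delta) \left( \frac{L^{(0,1)}(\alpha, s)}{(\alpha + 1) L(\alpha, s)} + \frac{1}{s} \right)$$

    For the dependence on the exponent $\alpha$:

    $$\frac{\frac{\partial}{\partial \alpha} \left[ L \, x^m \left( \frac{x - \delta}{s} \right)^{-\alpha - 1} \right]}{\frac{\partial}{\partial \delta} \left[ L \, x^m \left( \frac{x - \delta}{s} \right)^{-\alpha - 1} \right]} = -\frac{(x - \delta) \left( L(\alpha, s) \log\left( \frac{x - \delta}{s} \right) - L^{(1,0)}(\alpha, s) \right)}{(\alpha + 1) L(\alpha, s)}$$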

    Remark 1.7.
    Class 2 includes power-law tailed distributions (Pareto, Lévy-Stable, Student T, Singh-Maddala).

    Figure 1.12: The more uncertain or skeptical one is of "scientific" models and projections, the higher the risk of ruin, which flies in the face of the argument of the style "skeptical of climate models". No matter how increased the probability of benefits, ruin as an absorbing barrier, i.e. causing extinction without further recovery, can more than cancel them out. This graph assumes changes in uncertainty without changes in benefits (a mean-preserving sensitivity); the next one isolates the changes in benefits. [Plot labels: Low model uncertainty, High model uncertainty, Ruin, Ruin probability.]

    1.8 skepticism and precaution

    We can see in Figure 1.2 (the map of the Incerto) the lack of explicit intellectual bridge between the skeptical tradition and the mathematical expressions of uncertainty (which includes model error). This fragmentation of traditions makes people fail to realize that the mathematics of uncertainty are the expression of skepticism and should be treated as such and remain compatible with the two millennia of philosophical and logical work on skepticism.

    Figure 1.13: The graph shows the asymmetry between benefits and harm and the effect on the ruin probabilities. Shows the effect on ruin probability of changes in the Information Ratio, that is, expected benefit divided by uncertainty (or signal divided by noise). Benefits are small compared to negative effects. Three cases are considered, two from Extremistan: extremely fat-tailed ($\alpha = 1$), and less fat-tailed ($\alpha = 2$), and one from Mediocristan.

    Mathematically, more uncertainty about the future or about a model increases the scale of the distribution, hence thickens the "left tail" (as well as the "right one" if two-tailed), which raises the potential ruin. As shown in theorems 1.4 and 1.6, the survival probability is reduced no matter the "benefits": because of the convexity effect, one needs a lot, a lot of benefits to offset tail increases.

    The introduction, in general, in any field with potential iatrogenics, of any new element without available track record (hence model uncertainty) fattens the left tail.

    Some straight applications:

    - Skepticism about climate models should lead to more precautionary policies in the presence of ruin. It is incoherent to doubt the mean while reducing the variance.

    - "Mitigating" policies aiming at reducing risks, say geoengineering, are in fact likely to increase such risk.

    - Conservatism is a dominant strategy in the tails.

    In addition, such increased uncertainty matters far more in Extremistan and has benign effects in Mediocristan. Figure 1.13 shows the asymmetries between costs and benefits as far as ruin probabilities, and why these matter more for fat-tailed domains than thin-tailed ones. In thin-tailed domains, an increase in uncertainty changes the probability of ruin by several orders of magnitude, but the effect remains small: from say $10^{-40}$ to $10^{-30}$ is not quite worrisome. In fat-tailed domains, the effect is sizeable as we start with a substantially higher probability of ruin (which is typically underestimated).

    We show in Figures 1.12 and 1.13 that an increase in uncertainty leads to an increase in the probability of ruin, hence the impact of "skepticism" on decisions should be increased, not decreased, conservatism in the presence of ruin. More skepticism about models necessarily implies more uncertainty about the tails, which necessitates more precaution about newly implemented techniques, or larger size of exposures. As we said, Nature might not be smart, but its longer track record means smaller uncertainty in following its logic.
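The contrast between thin and fat tails can be made concrete with a small calculation. The sketch below is an illustration under assumptions that are not in the text: ruin is the event of falling below −K with K = 13.3, "more uncertainty" is a 15% increase in the scale parameter, the thin-tailed case is a Gaussian, and the fat-tailed case is a Cauchy, used here as a stand-in for a tail exponent of 1. Function names are illustrative.

```python
import math


def gaussian_ruin(k: float, scale: float) -> float:
    """P(X < -k) for a zero-mean Gaussian with the given standard deviation."""
    return 0.5 * math.erfc(k / (scale * math.sqrt(2.0)))


def cauchy_ruin(k: float, scale: float) -> float:
    """P(X < -k) for a zero-centered Cauchy with the given scale, k > 0."""
    return math.atan(scale / k) / math.pi


if __name__ == "__main__":
    k = 13.3
    low, high = 1.0, 1.15  # scale before and after the increase in uncertainty
    for name, ruin in (("Gaussian", gaussian_ruin), ("Cauchy", cauchy_ruin)):
        before, after = ruin(k, low), ruin(k, high)
        print(name, before, after, after / before)
```

In the Gaussian case the ruin probability jumps by many orders of magnitude yet stays negligible in absolute terms. In the Cauchy case the multiplier is modest, but it applies to a probability of a few percent, so the absolute increase is what matters.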

    1.9 how science is black swan compatible

    [BACKGROUND ON GMO debates]

    Feynman. Read (or listen to) the great Richard Feynman express his opinions on science in his BS-busting pieces and you will notice his compatibility with the skeptical empirical tradition, as he focuses on

    1. The asymmetry between absence of evidence and evidence of absence: "Science doesn't tell you that a theory is right, it can only tell you if a theory is wrong",

    2. The avoidance of consensus of "experts": science is a formal mechanism of superseding previously established statements and not a democratic aggregation of the opinions of individual scientists. ("Science is the belief in the incompetence of experts"). Note that the skeptical empirical tradition since Sextus Empiricus focused on doubting experts, not innocuous religious practices; many, such as the fideists, doubted what is called scientific knowledge.

    Traders know intuitively that financial markets are not predicted by the aggregate consensus (in fact the opposite); journalists and failed traders-turned-salespersons focus on consensus, surveys and raw beliefs. Price formation is a function that is not linearly additive of the opinions of participants but one that depends on the upper bound, as in evolutionary systems.

    Likewise scientific knowledge isn't the product of the opinions of a council of cardinals; it can only progress by consensus-busting; it would otherwise be frozen in time. So if all "experts" say risk management by VaR is OK except for one who says and proves that it isn't OK, it doesn't mean it is OK. One single argument rigorously formulated can bust centuries of verbalistic opinions of experts.

    3. The burden of proof falls on one side not another: Someone asked Feynman if there were flying saucers. Feynman rejected the idea as unlikely. The person whom Feynman called an "antagonist" replied: "This is unscientific. Can you prove that there are no flying saucers?" To Feynman it was not unscientific at all, quite the opposite. Likewise this author had to answer "can you prove that there is risk of ruin?" with the usual reply: "You are the one who needs to prove absence of risk of ruin."

    Keep in mind that in decision making (specially under fat-tailed domains) we are not focusing on the standard scientific "likely" or "unlikely" but on impact as well (particularly in the left tail) which requires the modification of so