Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed into the Null Ritual

Page 1

Gerd Gigerenzer

Max Planck Institute for Human Development Berlin

Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed into the Null Ritual

… a meaningless ordeal of pedantic calculations. Edwin Boring

… one of the worst things ever happened to psychology. Paul Meehl

… a wrongheaded view of what constituted scientific progress. R. Duncan Luce

Page 2

A Brief Time Table

• 1763 Thomas Bayes’ treatise published posthumously.

• R. A. Fisher’s null hypothesis testing 1925, 1935, 1956.

• Jerzy Neyman & Egon Pearson’s decision theory, late 1920s.

• Creation of the “hybrid theory” by textbook writers in psychology and education in the 1940s and 50s.

• Institutionalization of the “null ritual” in psychology 1950-1960, and subsequently in the social and biomedical sciences.

• Replication crisis in the social and biomedical sciences, ca. 2010.

Gigerenzer, Swijtink, Porter, Daston, Beatty & Krüger (1989). The empire of chance. Cambridge University Press.

Page 3

The Hybrid Logic of Statistical Inference

Superego (Neyman-Pearson)

Two statistical hypotheses; alpha and beta determined before the experiment; sample size calculated; no statements about the truth of hypotheses; …

Ego (Fisher)

Null hypothesis only; significance level computed after the experiment; beta and power ignored; sample size by rule of thumb; gets papers published but is left with a feeling of guilt.

Id (Bayes)

Desire for probabilities of hypotheses
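The Neyman-Pearson entry above says that alpha, beta, and the sample size are fixed before the experiment. As a rough sketch of what such a calculation can look like, the Python snippet below uses the standard normal approximation for a two-sided comparison of two group means. The function name `n_per_group`, the default power of 0.80, and the example effect sizes are illustrative assumptions, not values taken from the slides.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means with standardized effect size d.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for alpha
    z_beta = z.inv_cdf(power)             # quantile for power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Hypothetical effect sizes chosen for illustration only.
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

An exact calculation based on the t distribution gives slightly larger numbers; the point of the sketch is only that alpha, beta, and the expected effect size together determine n before any data are collected.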

Page 4

The Null Ritual

1. Set up a null hypothesis of “no mean difference” or “zero correlation”. Do not specify the predictions of your own research hypothesis.

2. Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report p < .05, p < .01, or p < .001, whichever comes next to the obtained p-value.

3. Always perform this procedure.

Gigerenzer 2004. Mindless statistics. Journal of Socio-Economics.

Page 5

Four Features of Rituals

1. Repetition of the same sequence of actions

2. Special numbers or colors

3. Fear of sanctions for rule violations

4. Delusions and wishful thinking

Dulaney & Fiske 1994

Page 6

Textbook Writers’ Wishful Thinking

Some early examples including the Replication Fallacy

J. P. Guilford 1942. Fundamental Statistics in Psychology and Education.

“we obtained directly the probabilities that the null hypothesis was plausible”

“If the result comes out one way, the hypothesis is probably correct, if it comes out another way, the hypothesis is probably wrong.”

J. C. Nunnally 1975. Introduction to Statistics for Psychology and Education.

“the probability that an observed difference is real”

“the investigator can have 95 percent confidence that the sample mean actually differs from the population mean”

“the statistical confidence . . . with odds of 95 out of 100 that the observed difference will hold up in investigations”

“All of these are different ways to say the same thing.”

Page 7

Hybrid Logic: “It’s All the Same.”

Textbooks confuse p-values with the probability of Type I error:

“The p-value is more informative than the 5 percentage/1 percentage approach, in that it gives the exact probability of a Type I error.” (Dougherty 2007, Introduction to Econometrics, 3rd ed., p. 105).

“Notice that SPSS computes the exact probability of a Type I error (the p-value).” (Howell 2008, Fundamental statistics for the behavioral sciences, p. 299).

For a list of textbooks featuring this confusion, see R. Hubbard, 2016, Corrupt Research, pp. 209-212.

Even critics of significance testing confuse p-values with Type I errors:

[The researcher] “gleefully records the tiny probability number ‘p < .001,’ and there is a tendency to feel the extreme smallness of this probability of a Type I error” (Meehl 1967, p. 107)

“The value of p that is obtained as the result of NHST is the probability of a Type I error on the assumption that the null hypothesis is true.” (Nickerson 2000, p. 243)
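The distinction behind these quotes can be illustrated with a small simulation. The Type I error rate alpha is a long-run frequency fixed before the experiment, whereas the p-value is computed from the data and changes from one experiment to the next. The sketch below assumes a one-sample z-test with known standard deviation 1, n = 20 observations, and a true null hypothesis; the function name, sample size, and seed are illustrative choices, not part of the slides.

```python
import random
from statistics import NormalDist, fmean


def simulate_null_p_values(n_sims=10_000, n=20, seed=1):
    """Return two-sided p-values from repeated experiments in which
    the null hypothesis (mean = 0, sd = 1) is true."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    p_values = []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5          # z statistic with known sd = 1
        p_values.append(2 * (1 - phi(abs(z))))
    return p_values


alpha = 0.05                                   # fixed before the experiments
p_values = simulate_null_p_values()
type_i_rate = sum(p < alpha for p in p_values) / len(p_values)

print(f"long-run rejection rate at alpha = {alpha}: {type_i_rate:.3f}")
print(f"smallest p-value observed: {min(p_values):.5f}")
print(f"largest p-value observed:  {max(p_values):.5f}")
```

In this setup the rejection rate stays near the chosen alpha no matter which individual p-values turn up, while the p-values themselves are spread over the whole interval from 0 to 1. A single p < .001 is therefore not "the" probability of a Type I error.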

Page 8

Statistical Power

“too difficult to discuss” (Guilford 1956 Fundamental Statistics, p. 217)

1994 APA Publication Manual: the first to mention that power should be taken seriously.

Page 9

Statistical Power of Experiments Is Constantly Low: Flipping a Coin Would Be the Better Experiment

[Figure: Power estimates reported in reviews and meta-analyses from the social and behavioral sciences. Estimates are for small effect sizes (d = .2) and alpha = 5%. Smaldino & McElreath 2016.]
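To give a feel for why power for small effects (d = .2) tends to be low, the following sketch computes approximate power for a two-sided, two-sample comparison of means at alpha = 5%. It relies on the normal approximation rather than the exact t-based calculation, and the per-group sample sizes in the loop are hypothetical values chosen for illustration, not figures from Smaldino & McElreath.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a
    standardized mean difference d with n_per_group in each group."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5   # expected value of the z statistic
    # Probability that the statistic lands in either rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


# Hypothetical per-group sample sizes, for illustration only.
for n in (20, 50, 100, 200, 400):
    print(f"n per group = {n:3d}: power = {power_two_sample(0.2, n):.2f}")
```

Under these assumptions, power for d = .2 stays below 50% until each group has roughly 190 to 200 participants, which is the sense in which a coin flip would detect the effect more often.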

Page 10

Statistical Power in the Neurosciences

730 neuroscience studies (e.g. Alzheimer’s disease genetics, cancer biomarkers)

Overall median power: 21%

A few neurology studies: power > 90%

461 fMRI studies: power = 8%

Button et al. 2013 Nature Reviews Neuroscience

Page 11

Researchers’ Delusions

Page 12

Systematic delusions about the p-value

Delusions (abbreviated). Values are the percentage of each group endorsing the statement.

A = British Academic Psychologists (n=70)
B = German Statistics Teachers (n=30)
C = German Professors/Lecturers (n=39)
D = German Psychology Students (n=44)

Delusion                               A    B    C    D
1. H0 is absolutely disproved          1   10   15   34
2. Probability of H0 is found         36   17   26   32
3. H1 is absolutely proved             6   10   13   20
4. Probability of H1 is found         66   33   33   59
5. Probability of wrong decision      86   73   67   68
6. Probability of replication         60   37   49   41
Total: One or more delusions          97   80   90  100

British academic psychologists: Oakes (1986). German psychologists who teach statistics, professors of psychology, and students who successfully completed two semesters of statistics courses: Haller & Krauss (2002).

Page 15

The same table, annotated with three labels: Illusion of Certainty, Bayesian Id, Replication Fallacy.
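The replication fallacy (delusion 6) treats 1 – p as the probability that a repeated experiment will again be significant. A hedged back-of-the-envelope check: suppose, optimistically, that the true effect is exactly as large as the observed one and that the replication uses the same sample size and a two-sided test at alpha = .05. The sketch below computes the chance of a significant result in the same direction under that assumption. The function name and the z-test framing are illustrative assumptions, not part of the surveys by Oakes or Haller & Krauss.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def replication_probability(p_observed, alpha=0.05):
    """Chance that an identical replication is significant in the same
    direction, ASSUMING the true effect equals the observed effect."""
    z_observed = Z.inv_cdf(1 - p_observed / 2)   # z matching the two-sided p
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # The replication's z statistic is normal around z_observed with sd 1.
    return 1 - Z.cdf(z_crit - z_observed)


for p in (0.05, 0.01, 0.001):
    print(
        f"p = {p}: fallacy says {1 - p:.3f}, "
        f"sketch gives {replication_probability(p):.3f}"
    )
```

Even under this favorable assumption, a result that just reaches p = .05 has only an even chance of replicating at the same level, far from the 95% the fallacy suggests.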

Page 16

Bad Science to Produce Significant Results

Page 17

Bad Science Has Replaced Good Research

Self-reported questionable practices (estimates of true prevalence) by 2,155 US academic psychologists. Listed are the 7 most frequent practices.

1. Failing to report all dependent measures: 67% (78%)

2. Collecting more data after seeing whether results were significant: 58% (72%)

3. Selectively reporting studies that “worked”: 50% (67%)

4. Excluding data after looking at the impact of doing so for the results: 43% (62%)

5. Reporting an unexpected finding as having been predicted from the start: 35% (54%)

6. Failing to report all of a study’s conditions: 27% (42%)

7. Rounding down a p-value (e.g., reporting .054 as less than .05): 23% (39%)

John, Loewenstein, & Prelec 2012. Psychological Science.
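Practice 2, collecting more data after checking for significance, can be simulated to show why it produces spurious findings. The sketch below assumes a true null hypothesis, a one-sample z-test with known standard deviation 1, and a hypothetical researcher who tests at n = 20, 30, 40, and 50 and stops as soon as p < .05. All names, sample sizes, and the seed are illustrative assumptions, not details from John, Loewenstein, & Prelec.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf


def p_value(total, n):
    """Two-sided p-value of a z-test for mean = 0 with known sd = 1,
    given the running sum of n observations."""
    z = total / n ** 0.5
    return 2 * (1 - PHI(abs(z)))


def false_positive_rate(looks, n_sims=5_000, alpha=0.05, seed=7):
    """Share of null experiments declared significant when the data
    are tested at each sample size in `looks`, stopping at the first hit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for target in looks:
            while n < target:               # collect data up to the next look
                total += rng.gauss(0.0, 1.0)
                n += 1
            if p_value(total, n) < alpha:   # peek; stop if "significant"
                hits += 1
                break
    return hits / n_sims


fixed = false_positive_rate(looks=[50])
peeking = false_positive_rate(looks=[20, 30, 40, 50])
print(f"single test at n = 50:        {fixed:.3f}")
print(f"peeking at n = 20/30/40/50:   {peeking:.3f}")
```

With a single planned test the false positive rate stays near the nominal 5%, while repeated peeking pushes it clearly higher even though no effect exists.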

Page 18

Surrogate Science

1. Ban p-values? NO.

2. Replace by Bayes factors? NO. Another chimera.

3. Revolutionize teaching of statistics. Teach the statistical toolbox.

4. Change research assessment culture from quantity to quality, and from counting to reading.

5. Get philosophers and historians of science better involved in analyzing statistical rituals and the growth of surrogate science.

Gigerenzer & Marewski 2015. Surrogate science. Journal of Management.

Gigerenzer 2004. Mindless statistics. Journal of Socio-Economics.

Page 19

Surrogate Science

1. Good scientific practice (bold theories, double-blind experiments, minimizing measurement error, replication, etc.) has been reduced in the social sciences to a surrogate: statistical significance.

2. Instead of teaching a toolbox of statistical methods by Fisher, Neyman-Pearson, Bayes, and others, textbook writers created a hybrid theory with the null ritual at its core, and presented it anonymously as statistics per se.

3. The null ritual requires delusions about the meaning of the p-value. Its blind spots led to studies with power so low that flipping a coin would do better. To compensate, researchers engage in bad science to produce significant results, which are unlikely to be reproducible.

4. Researchers’ delusion that the p-value already specifies the probability of replication (1 – p) makes replication studies appear superfluous.

5. The replication crisis in the social and biomedical sciences is typically attributed to wrong incentives. But that is only half the story. Researchers tend to believe in the ritual, and the null ritual also explains why these incentives and not others were set in the first place.