The Replication Crisis and HASS

How Best Practices can Assist in Producing Reliable Research

Martin Schweinberger
The University of Queensland
m.schweinberger@uq.edu.au

Open Data Day

The Replication Crisis and HASS Background

Aims of this talk

◮ One of my core concerns: “Best Practices” (with respect to research technology and data analysis in linguistics and language studies)

◮ Raise awareness of Best Practices in HASS

◮ Start a discussion about issues related to Best Practices

◮ Introduce R as a remedy for some issues related to Best Practices ...

The Replication Crisis and HASS Introduction: Replication Crisis

What is the Replication Crisis?

Replication Crisis

... an ongoing methodological crisis, beginning in the early 2010s, that primarily affects parts of the social and life sciences.

◮ Growing awareness that the results of many scientific studies are difficult or impossible to replicate/reproduce.

◮ Reproducibility is an essential part of the scientific method.

◮ The inability to replicate the studies of others has potentially grave consequences for many fields of science in which significant theories are grounded on unreproducible work.

Replication Crisis

Nature 2016 poll of 1,500 scientists

◮ 70% had failed to reproduce at least one other scientist’s experiment

◮ 50% had failed to reproduce one of their own experiments (cf. Fanelli 2009)

2009 meta-analysis of surveys on science fraud (Fanelli 2009)

◮ 2% admitted to falsifying studies at least once

◮ 14% admitted to personally knowing someone who did

Loss of (public) trust!

What about the Humanities?

The Replication Crisis and HASS Replication Crisis in Language Studies

Problem

We just do not know how bad our science is ... (outright forgery, data manipulation, p-hacking, etc.)

because we do not (or only rarely) reproduce and replicate ...

(assuming you are studying language)

Replication Crisis in Language Studies

Good

◮ blind peer-review

◮ we are open and share if we are asked (sometimes)

◮ discussion has begun (cf. e.g. Berez-Kroeker et al. 2018)

Bad

◮ analyses are not reproducible/replicated

◮ reliance on tools rather than scripts

◮ reproduction is discouraged (if successful: journals are not interested in publishing the same analysis twice or several times; if unsuccessful: researchers do not want to threaten the face of other researchers)

The Replication Crisis and HASS Solutions

How can we solve this issue?

Solutions

Open access (FAIR data): Access to data sets to enable reproduction

Publication: Ability to reproduce/replicate should be mandatory

Scripting / Code: Scripts rather than tools

Open access (FAIR data)

◮ Access to data sets to enable replication (see Berez-Kroeker et al. 2018 for a more extensive discussion on this point)

◮ Access should be easy (not only for programmers!)

◮ (Open) public repositories: data sets, corpora, and raw data should be made available for replication (within ethical boundaries)

◮ Corpora should be treated as publications and should be cited as such (increases citations and makes it more attractive to publish data sets/corpora)

◮ Papers that rely on data that is not available should not be published in journals (pressure on publishing houses or other outlets)

Publication

If we want the HASS community to adopt Best Practices, we need to change as a community

- No publication of non-replicable research!

- Publication of null results must be encouraged (pre-registration)

- Results of all replications should be published

- Replication should be a common practice, especially during BA/MA (students learn how more advanced researchers have handled problems and conducted research)

- Installing best practices: extensive support for training programs

- “Center for Quality Assurance” or something like that, where people can voice concerns about research practices

Scripting / Code

◮ Scripts allow exact replication (total transparency)

◮ The only practical solution for true replication (replicating a tool-based analysis is too time-consuming)

◮ Data analysis is too fine-grained to be described in papers (including all steps the researcher has undertaken)

◮ Training programs for basic programming at universities/schools (obligatory for grad programs)
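To illustrate what "scripts rather than tools" can mean in practice, here is a minimal sketch of a fully scripted analysis. It is written in Python purely for illustration (the talk itself recommends R), and the two-sentence toy corpus and the function names `tokenize` and `relative_frequencies` are hypothetical, not taken from the talk.

```python
from collections import Counter
import re

# Hypothetical toy corpus; a real study would load its published data set instead.
CORPUS = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequencies(texts):
    """Return each word's share of all tokens, rounded to 3 decimals."""
    counts = Counter(token for text in texts for token in tokenize(text))
    total = sum(counts.values())
    return {word: round(n / total, 3) for word, n in counts.items()}


if __name__ == "__main__":
    freqs = relative_frequencies(CORPUS)
    # Sort by descending frequency, then alphabetically, so the output order is stable.
    for word, share in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{word}\t{share}")
```

Because every step (tokenization, counting, rounding, sorting) is written down in the script, anyone with the same data can rerun it and compare the resulting table, which is much harder to guarantee for an analysis clicked together in a graphical tool.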

Why R?

Allows full transparency and replication of research

- Open-source freeware (≠ SPSS)

- Scripts can be shared easily (easily connected to Git)

- Allows full transparency because all steps of the analysis are available

- A human/user-centered language (≠ the C family or Java)

- Fully-fledged programming environment

- Not teaching researchers to use software but to create software → flexibility, independence, employability!

- One of the fastest-growing programming environments in the world’s top 10

- Enormous support community (StackOverflow, etc.)

- Extreme flexibility of methods (thousands of packages)

- Variability in output (statistics, visualizations, text analysis, speech analysis, websites, slides, apps, notebooks, etc.)

- Compatibility with other software packages that are common in Language Studies (PRAAT, MAUS, etc.)

Why R?

For HASS (Language Studies)

- Combines the advantages of Python, Stata/MATLAB and Gephi:

Offers the same functionality as Python (NLP) but is (arguably) better at complex data analysis (Stata/MATLAB/SPSS) and data viz, including geomapping (e.g. Gephi)

- Is already widespread in the community!

- Usable for many different glyph systems (Unicode)

- Can be used to create and curate corpora

R in HASS

Every journey begins with a first step and, step by step, we can go miles on end!

◮ Packages for text analysis/NLP/data viz/statistics are readily available

◮ Complex issues can be broken down into simple chunks

◮ Very easy to learn (steep or shallow learning curve)

◮ Even very basic skills allow performing complex analyses

Solutions at UQ

◮ Training program: workshops on R ✓/✗ (for all levels of expertise; Center for Digital Scholarship / School of Languages and Cultures)

◮ Materials ✓/✗: Language Technology and Data Analysis Laboratory (LADAL) website (data and text analysis with R: https://slcladal.github.io/index.html)

◮ Study program ✗ (beginning to plan a program): Digital HASS (BA/MA program including modules on data and text analysis with R)

The Replication Crisis and HASS References

Aschwanden, C. (2018). Psychology’s replication crisis has made the field better.

Berez-Kroeker, A. L., L. Gawne, S. S. Kung, B. F. Kelly, T. Heston, G. Holton, P. Pulsifer, D. I. Beaver, S. Chelliah, S. Dubinsky, et al. (2018). Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56(1), 1–18.

Diener, E. and R. Biswas-Diener (2019). The replication crisis in psychology.

Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One 4, e5738.

McRae, M. (2018). Science’s ‘replication crisis’ has reached even the most respectable journals, report shows.

Resnick, B. (2018). More social science studies just failed to replicate. Here’s why this is good. What scientists learn from failed replications: how to do better science.

Velasco, E. (2019). Researcher discusses the science replication crisis.

Weir, K. (2015). A reproducibility crisis? The headlines were hard to miss: Psychology, they proclaimed, is in crisis. Monitor on Psychology 46, 39.

Yong, E. (2018). Psychology’s replication crisis is running out of excuses. Another big project has found that only half of studies can be repeated. And this time, the usual explanations fall flat.