The Replication Crisis and HASS
How Best Practices can Assist in
Producing Reliable Research
Martin Schweinberger
The University of Queensland
Open Data Day
The Replication Crisis and HASS Background
Aims of this talk
◮ One of my core concerns: “Best Practices” (with respect to research technology and data analysis in linguistics and language studies)
◮ Raise awareness of Best Practices in HASS
◮ Start a discussion about issues related to Best Practices
◮ Introduce R as a remedy for some issues related to best practices...
Martin Schweinberger, The University of Queensland, [email protected] 2
The Replication Crisis and HASS Introduction: Replication Crisis
What is the Replication Crisis?
Replication Crisis
... an ongoing methodological crisis, beginning in the early 2010s, that primarily affects parts of the social and life sciences.
◮ growing awareness of the problem that the results of many scientific studies are difficult or impossible to replicate/reproduce,
◮ reproducibility is an essential part of the scientific method,
◮ the inability to replicate the studies of others has potentially grave consequences for many fields of science in which significant theories are grounded on unreproducible work.
Replication Crisis
Nature 2016 poll of 1,500 scientists
◮ 70% had failed to reproduce at least one other scientist’s experiment
◮ 50% had failed to reproduce one of their own experiments (cf. Fanelli 2009)
2009 meta-analysis of surveys on science fraud (Fanelli 2009)
◮ 2% admitted to falsifying studies at least once
◮ 14% admitted to personally knowing someone who did
Loss of (public) trust!
What about the Humanities?
The Replication Crisis and HASS Replication Crisis in Language Studies
Problem
We just do not know how bad our science is... (outright forgery, data manipulation, p-hacking, etc.)
because we do not (or only rarely) reproduce and replicate...
assuming you are studying language
Replication Crisis in Language Studies
Good
◮ blind peer-review
◮ we are open and share if we are asked (sometimes)
◮ discussion has begun (cf. e.g. Berez-Kroeker et al. 2018)
Bad
◮ analyses are not reproducible/replicated
◮ reliance on tools not scripts
◮ reproduction is discouraged (if successful: journals are not interested in publishing the same analysis twice/several times; if unsuccessful: researchers do not want to threaten the face of other researchers)
The Replication Crisis and HASS Solutions
How can we solve this issue?
Solutions
Open access (FAIR data)
Access to data sets to enable reproduction
Publication
Ability to reproduce/replicate should be mandatory
Scripting / Code
Scripts rather than tools
Open access (FAIR data)
◮ Access to data sets to enable replication (see Berez-Kroeker et al. 2018 for a more extensive discussion of this point)
◮ Access should be easy (not only for programmers!)
◮ (Open) public repositories: data sets/corpora/raw data should be made available for replication (within ethical boundaries)
◮ Corpora should be treated as publications and should be cited as such (increases citations and makes it more attractive to publish data sets/corpora)
◮ Papers that rely on data that is not available should not be published in journals (pressure on publishing houses or other outlets)
Publication
If we want the HASS community to adopt Best Practices, we need to change as a community
- No publication of non-replicable research!
- Publication of null results must be encouraged (pre-registration)
- Results of all replications should be published
- Replication should be a common practice, especially during BA/MA (students learn how more advanced researchers have handled problems and conducted research)
- Installing best practices: extensive support for training programs
- “Center for Quality Assurance” or something similar where people can voice concerns about research practices
Scripting / Code
◮ Scripts allow exact replication (total transparency)
◮ The only practical solution for true replication (it is too time-consuming to replicate a tool-based analysis)
◮ Data analysis is too fine-grained to be described in papers (including all steps the researcher has undertaken)
◮ Training programs for basic programming at universities/schools (obligatory for grad programs)
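As a rough illustration of why a script documents an analysis more completely than a prose description can, here is a minimal sketch. It is written in Python purely for illustration; the talk itself recommends R, and the same idea carries over to an R script. The toy corpus, the function names, and the seed value are all hypothetical and not taken from the talk.

```python
import hashlib
import random
from collections import Counter

# Hypothetical toy corpus; a real study would load its published data set.
CORPUS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a dog sat on a log",
]


def data_fingerprint(texts):
    """Return a SHA-256 digest so others can check they hold the same data."""
    joined = "\n".join(texts).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


def word_frequencies(texts):
    """Count lower-cased, whitespace-separated tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def bootstrap_mean_length(texts, n_samples=1000, seed=42):
    """Bootstrap the mean text length in tokens.

    The seed is fixed (an arbitrary, assumed value) so that every rerun
    of the script draws the same resamples and returns the same number.
    """
    rng = random.Random(seed)
    lengths = [len(text.split()) for text in texts]
    means = []
    for _ in range(n_samples):
        resample = [rng.choice(lengths) for _ in lengths]
        means.append(sum(resample) / len(resample))
    return sum(means) / n_samples


if __name__ == "__main__":
    print("data fingerprint:", data_fingerprint(CORPUS))
    print("most frequent words:", word_frequencies(CORPUS).most_common(3))
    print("bootstrapped mean length:", bootstrap_mean_length(CORPUS))
```

Every step (tokenisation, lower-casing, resampling, the seed) is visible in the script, so anyone holding the same data can rerun it and compare results instead of reconstructing the steps from a paper.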
Why R?
Allows full transparency and replication of research
- Open-source freeware (≠ SPSS)
- Scripts can be shared easily (easily connected to Git)
- Allows full transparency because all steps of the analysis are available
- A human/user-centered language (≠ the C family or Java)
- Fully-fledged programming environment
- Not teaching researchers to use software but to create software → flexibility, independence, employability!
Why R?
Allows full transparency and replication of research
- One of the fastest-growing of the world’s top 10 programming environments
- Enormous support community (StackOverflow, etc.)
- Extreme flexibility of methods (thousands of packages)
- Variability in output (statistics, visualizations, text analysis, speech analysis, websites, slides, apps, notebooks, etc.)
- Compatibility with other software packages that are common in Language Studies (PRAAT, MAUS, etc.)
Why R?
For HASS (Language Studies)
- Combines the advantages of Python, Stata/MatLab and Gephi: offers the same functionality as Python (NLP) but is (arguably) better at complex data analysis (Stata/MatLab/SPSS) and data viz, including geomapping (e.g. Gephi)
- Is already widespread in the community!
- Usable for many different glyph systems (Unicode)
- Can be used to create and curate corpora
R in HASS
Every journey begins with a first step and, step by step, we can go miles on end!
◮ Packages for text analysis/NLP/data viz/statistics are readily available
◮ Complex issues can be broken down into simple chunks
◮ Very easy to learn (steep or shallow learning curve)
◮ Even very basic skills allow performing complex analyses
Solutions at UQ
◮ Training program ✓/✗: workshops on R (for all levels of expertise; Center for Digital Scholarship/School of Languages and Cultures)
◮ Materials ✓/✗: Language Technology and Data Analysis Laboratory (LADAL) website (data and text analysis with R: https://slcladal.github.io/index.html)
◮ Study program ✗ (beginning to plan a program): Digital HASS (BA/MA program including modules on data and text analysis with R)
The Replication Crisis and HASS References
Aschwanden, C. (2018). Psychology’s replication crisis has made the field better.
Berez-Kroeker, A. L., L. Gawne, S. S. Kung, B. F. Kelly, T. Heston, G. Holton, P. Pulsifer, D. I. Beaver, S. Chelliah, S. Dubinsky, et al. (2018). Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56(1), 1–18.
Diener, E. and R. Biswas-Diener (2019). The replication crisis in psychology.
Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One 4, e5738.
McRae, M. (2018). Science’s ‘replication crisis’ has reached even the most respectable journals, report shows.
Resnick, B. (2018). More social science studies just failed to replicate. Here’s why this is good. What scientists learn from failed replications: How to do better science.
Velasco, E. (2019). Researcher discusses the science replication crisis.
Weir, K. (2015). A reproducibility crisis? The headlines were hard to miss: Psychology, they proclaimed, is in crisis. Monitor on Psychology 46, 39.
Yong, E. (2018). Psychology’s replication crisis is running out of excuses. Another big project has found that only half of studies can be repeated. And this time, the usual explanations fall flat.