Steps toward reproducible research Karl Broman Biostatistics & Medical Informatics, UW–Madison kbroman.org github.com/kbroman @kwbroman Slides: bit.ly/UMass2016

Steps toward reproducible research - UW–Madisonkbroman/presentations/repro_researc… · Steps toward reproducible research Author: Karl Broman Created Date: 4/11/2016 10:07:00


Steps toward reproducible research

Karl Broman

Biostatistics & Medical Informatics, UW–Madison

kbroman.org

github.com/kbroman

@kwbroman

Slides: bit.ly/UMass2016


Karl -- this is very interesting, however you used an old version of the data (n=143 rather than n=226).

I'm really sorry you did all that work on the incomplete dataset.

Bruce


The results in Table 1 don’t seem to correspond to those in Figure 2.


In what order do I run these scripts?


Where did we get this data file?


Why did I omit those samples?


How did I make that figure?


“Your script is now giving an error.”


“The attached is similar to the code we used.”



Reproducible

vs.

Replicable


Reproducible

vs.

Correct


Levels of quality

▶ Are the tables and figures reproducible from the code and data?

▶ Does the code actually do what you think it does?

▶ In addition to what was done, is it clear why it was done?

(e.g., why did you omit those six subjects?)

▶ Can the code be used for other data?

▶ Can you extend the code to do other things?


Steps toward reproducible research

kbroman.org/steps2rr


1. Everything with a script

If you do something once, you’ll do it 1000 times.
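
As one illustration of doing things with a script, the sketch below drops samples through a commented exclusion table rather than by hand, so both what was omitted and why are on record. The sample IDs, the reasons, and the function name are all invented for this example.

```python
# Hypothetical exclusions: sample ID -> reason (all values invented).
EXCLUDED = {
    "S017": "duplicate of S016",
    "S042": "failed genotyping QC",
}


def drop_excluded(samples, excluded=EXCLUDED):
    """Return the samples to keep, plus a log line for each one dropped."""
    kept = [s for s in samples if s not in excluded]
    log = [f"omitted {s}: {excluded[s]}" for s in samples if s in excluded]
    return kept, log


if __name__ == "__main__":
    kept, log = drop_excluded(["S016", "S017", "S018", "S042"])
    print(kept)
    print("\n".join(log))
```

Rerunning the script on an updated dataset applies the same exclusions and prints the same explanations, which is hard to guarantee with manual edits.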


2. Organize your data & code

File organization and naming are powerful weapons against chaos.

– Jenny Bryan


Your closest collaborator is you six months ago, but you don’t reply to emails.

(paraphrasing Mark Holder)


RawData/        Notes/
DerivedData/    Refs/

Python/         ReadMe.txt
R/              ToDo.txt
Ruby/           Makefile
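
A layout like this can itself be set up with a script. The sketch below uses the directory and file names from the slide; the function name `make_skeleton` and the project name `my_project` are placeholders chosen for this example.

```python
from pathlib import Path

# Directory and file names are taken from the slide's layout.
DIRS = ["RawData", "DerivedData", "Python", "R", "Ruby", "Notes", "Refs"]
FILES = ["ReadMe.txt", "ToDo.txt", "Makefile"]


def make_skeleton(root):
    """Create the project directories and empty placeholder files under root."""
    root = Path(root)
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (root / f).touch(exist_ok=True)
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    print(make_skeleton("my_project"))  # "my_project" is a placeholder name
```

Because existing directories and files are left alone, the script is safe to rerun on a project that is already under way.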


3. Automate the process (GNU Make)

R/analysis.html: R/analysis.Rmd Data/cleandata.csv
	cd R;R -e "rmarkdown::render('analysis.Rmd')"

Data/cleandata.csv: R/prepData.R RawData/rawdata.csv
	cd R;R CMD BATCH prepData.R

RawData/rawdata.csv: Python/xls2csv.py RawData/rawdata.xls
	Python/xls2csv.py RawData/rawdata.xls > RawData/rawdata.csv
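
Each rule names a target, the files it depends on, and the command that rebuilds it. The core decision Make takes for a rule can be sketched in a few lines of Python: rebuild when the target is missing or older than any dependency. This is only an illustration of the timestamp rule, not a replacement for Make; `needs_rebuild` is a name made up for this example.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency.

    This mirrors the timestamp check Make applies to each rule; it does not
    build a dependency graph or run any commands.
    """
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


if __name__ == "__main__":
    # File names taken from the second Makefile rule.
    print(needs_rebuild("Data/cleandata.csv",
                        ["R/prepData.R", "RawData/rawdata.csv"]))
```

Make applies this check along the whole chain, so touching `RawData/rawdata.xls` causes the CSV, the cleaned data, and the report to be regenerated in order.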


4. Turn scripts into reproducible reports
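
The deck's Makefile renders an R Markdown file for this step. As a language-neutral illustration of the same idea, the Python sketch below builds a small report in which every number is computed from the data rather than typed in by hand. The function name and the example measurements are invented.

```python
from statistics import mean


def render_report(values):
    """Build a small Markdown report whose numbers are computed, not typed."""
    return (
        "# Analysis report\n\n"
        f"Number of samples: {len(values)}\n\n"
        f"Mean value: {mean(values):.2f}\n"
    )


if __name__ == "__main__":
    # Invented example measurements.
    print(render_report([1.5, 2.5, 4.0]))
```

When the data change, rerunning the script updates the text and the numbers together, so the report cannot drift away from the analysis.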


5. Turn repeated code into functions

# Python
def read_genotypes(filename):
    "Read matrix of genotype data"

# R
plot_genotypes <- function(genotypes, ...) {}
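
The slide gives only the function signatures. One way the Python stub could be filled in is sketched below; the file layout (a CSV with a header row of marker names and one row per individual, ID in the first column) is an assumption made for this example, not something the slides specify.

```python
import csv


def read_genotypes(filename):
    "Read matrix of genotype data"
    # Assumed layout: a header row of marker names, then one row per
    # individual with the individual's ID in the first column.
    with open(filename, newline="") as f:
        rows = list(csv.reader(f))
    markers = rows[0][1:]
    # Map each individual ID to a {marker: genotype} dictionary.
    return {row[0]: dict(zip(markers, row[1:])) for row in rows[1:]}
```

Every script that needs the genotypes can then call `read_genotypes("some_file.csv")` instead of repeating the parsing code, and a fix to the parser reaches all of them at once.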


6. Create a package/module

Don’t repeat yourself


7. Use version control (git/GitHub)


8. License your software

Pick a license, any license

– Jeff Atwood


Other considerations

▶ Testing
  are you getting the right answers?

▶ Software versions
  will your stuff work when dependencies change?

▶ Large-scale computations
  computation time + dependence on cluster environment

▶ Collaborations
  coordinating who does what and where things live

▶ Distribution
  where and how to distribute data and code?
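
The first two considerations lend themselves to small amounts of code. The sketch below pairs a function with a hand-checked test and records the interpreter and platform alongside the results. The function `allele_frequency`, its test case, and `session_info` are invented for illustration.

```python
import platform
import sys


def allele_frequency(genotypes, allele="A"):
    """Fraction of allele copies equal to `allele` in genotype strings like 'AB'."""
    copies = "".join(genotypes)
    return copies.count(allele) / len(copies)


def test_allele_frequency():
    # Hand-checked case: "AA", "AB", "BB" hold 3 A's among 6 allele copies.
    assert allele_frequency(["AA", "AB", "BB"]) == 0.5
    assert allele_frequency(["BB"]) == 0.0


def session_info():
    """Record interpreter and platform details to store alongside results."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}


if __name__ == "__main__":
    test_allele_frequency()
    print(session_info())
```

Saving the output of `session_info()` with each set of results makes it easier to tell later whether a changed answer came from the code or from the environment.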


The most important tool is the mindset, when starting, that the end product will be reproducible.

– Keith Baggerly


Summary

1. Everything with a script

2. Organize your data & code

3. Automate the process (GNU Make)

4. Turn scripts into reproducible reports

5. Turn repeated code into functions

6. Create a package/module

7. Use version control (git/GitHub)

8. Pick a license, any license


Slides: bit.ly/UMass2016

kbroman.org

github.com/kbroman

@kwbroman
