33
Steps toward reproducible research Karl Broman Biostatistics & Medical Informatics, UW–Madison kbroman.org github.com/kbroman @kwbroman Slides: bit.ly/CBB2016_wnotes

Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Steps toward reproducible research

Karl Broman

Biostatistics & Medical Informatics, UW–Madison

kbroman.orggithub.com/kbroman

@kwbromanSlides: bit.ly/CBB2016_wnotes

Page 2: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Karl -- this is very interesting ,however you used an old version ofthe data (n=143 rather than n=226).

I'm really sorry you did all thatwork on the incomplete dataset.

Bruce

2

Page 3: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

The results in Table 1 don’t seem tocorrespond to those in Figure 2.

3

Page 4: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

In what order do I run these scripts?

4

Page 5: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Where did we get this data file?

5

Page 6: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Why did I omit those samples?

6

Page 7: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

How did I make that figure?

7

Page 8: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

“Your script is now giving an error.”

8

Page 9: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

“The attached is similar to the code we used.”

9

Page 10: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Reproducible

vs.

(Replicable) invisible text

10

Page 11: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Reproducible

vs.

Replicable

10

Page 12: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Reproducible

vs.

Correct

10

Page 13: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Levels of quality

▶ Are the tables and figures reproducible from the codeand data?

▶ Does the code actually do what you think it does?

▶ In addition to what was done, is it clear why it wasdone?

(e.g., why did you omit those six subjects?)

▶ Can the code be used for other data?

▶ Can you extend the code to do other things?

11

Page 14: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Steps toward reproducible research

kbroman.org/steps2rr

12

Page 15: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

1. Organize your data & code

File organization and namingare powerful weapons against chaos.

– Jenny Bryan

13

Page 16: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

1. Organize your data & code

Your closest collaborator is you six months ago,but you don’t reply to emails.

(paraphrasing Mark Holder)

13

Page 17: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

1. Organize your data & code

RawData/ Notes/DerivedData/ Refs/

Python/ ReadMe.txtR/ ToDo.txtRuby/ Makefile

13

Page 18: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

2. Everything with a script

If you do something once,you’ll do it 1000 times.

14

Page 19: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

3. Automate the process (GNU Make)

R/analysis.html: R/analysis.Rmd Data/cleandata.csvcd R;R -e "rmarkdown::render('analysis.Rmd')"

Data/cleandata.csv: R/prepData.R RawData/rawdata.csvcd R;R CMD BATCH prepData.R

RawData/rawdata.csv: Python/xls2csv.py RawData/rawdata.xlsPython/xls2csv.py RawData/rawdata.xls > RawData/rawdata.csv

15

Page 20: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

4. Turn scripts into reproducible reports

16

Page 21: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

4. Turn scripts into reproducible reports

16

Page 22: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

5. Turn repeated code into functions

# Pythondef read_genotypes (filename):

"Read matrix of genotype data"

# Rplot_genotypes <-function(genotypes , ...){}

17

Page 23: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

6. Create a package/module

Don’t repeat yourself

18

Page 24: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

7. Use version control (git/GitHub)

19

Page 25: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

7. Use version control (git/GitHub)

19

Page 26: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

7. Use version control (git/GitHub)

19

Page 27: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

7. Use version control (git/GitHub)

19

Page 28: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

7. Use version control (git/GitHub)

19

Page 29: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

8. License your software

Pick a license, any license

– Jeff Atwood

20

Page 30: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Other considerations▶ Testing

are you getting the right answers?

▶ Software versionswill your stuff work when dependencies change?

▶ Large-scale computationscomputation time + dependence on cluster environment

▶ Collaborationscoordinating who does what and where things live

▶ Distributionwhere and how to distribute data and code?

21

Page 31: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

The most important tool is the mindset,when starting, that the end product

will be reproducible.

– Keith Baggerly

22

Page 32: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Summary1. Organize your data & code

2. Everything with a script

3. Automate the process (GNU Make)

4. Turn scripts into reproducible reports

5. Turn repeated code into functions

6. Create a package/module

7. Use version control (git/GitHub)

8. Pick a license, any license

23

Page 33: Steps toward reproducible researchkbroman/presentations...2. Everything with a script 3. Automate the process (GNU Make) 4. Turn scripts into reproducible reports 5. Turn repeated

Slides: bit.ly/CBB2016_wnotes

kbroman.org

github.com/kbroman

@kwbroman

24