
Reproducible Research and R - Alec Zwart


Reproducible Research and R

CSIRO MATHEMATICS, INFORMATICS, AND STATISTICS

Alec Zwart 21 November 2012

Image: Fomel, S. & Claerbout, J. F. Guest Editors’ Introduction: Reproducible Research. Computing in Science & Engineering 11, 5–7 (2009).


Reproducible Research and R | Alec Zwart

Preliminary: Markup


Donald Knuth - Literate Programming

• DE Knuth - The Computer Journal, 1984.

‘Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.’


WEB:


‘Weave’

‘Tangle’
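In Knuth's WEB, 'tangle' extracts the compilable program from the literate source, while 'weave' produces the typeset documentation. The tangle step can be illustrated with a minimal Python sketch. It assumes noweb-style chunk delimiters (`<<name>>=` to open a chunk, `@` to close it); the function name `tangle` and the sample document are hypothetical, and, unlike real tangle tools, this sketch does not reorder or expand named chunk references.

```python
import re

# Assumed noweb-style delimiters: "<<name>>=" opens a code chunk, "@" closes it.
CHUNK_START = re.compile(r"^<<(.*)>>=\s*$")
CHUNK_END = "@"


def tangle(source: str) -> str:
    """Return only the code lines of a literate source, in document order."""
    code_lines = []
    in_chunk = False
    for line in source.splitlines():
        if not in_chunk and CHUNK_START.match(line):
            in_chunk = True            # entering a code chunk
        elif in_chunk and line.strip() == CHUNK_END:
            in_chunk = False           # back to explanatory text
        elif in_chunk:
            code_lines.append(line)    # keep code, drop the prose
    return "\n".join(code_lines)


# Hypothetical literate document: prose interleaved with code chunks.
literate_source = """\
We start by computing a total.
<<total>>=
total = sum([1, 2, 3])
@
Then we report it.
<<report>>=
print(total)
@
"""

program = tangle(literate_source)
print(program)
```

The prose is discarded and only the two code chunks survive, which is the part a compiler or interpreter needs.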


Weaving (modern version)

CWEB, noweb, Sweave, knitr…

Diagram (flow):

• Input: text w/ markup + code blocks

• Code blocks → language translator (R, Python…) → code block outputs

• Text, code & output w/ markup → text markup processor (LaTeX, Web browser, Markdown processor) → formatted output
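The weaving flow on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Sweave or knitr are implemented: it assumes noweb-style chunk delimiters (`<<name>>=` and `@`), uses Python itself as the 'language translator', and emits Markdown-style indented code with outputs prefixed by `##`. The function name `weave` and the sample document are hypothetical.

```python
import contextlib
import io
import re

# Assumed noweb-style delimiters: "<<name>>=" opens a code chunk, "@" closes it.
CHUNK_START = re.compile(r"^<<(.*)>>=\s*$")


def weave(source: str) -> str:
    """Run each code chunk and interleave text, code and captured output."""
    woven = []
    chunk = None        # lines of the chunk being read; None while in text
    namespace = {}      # shared across chunks, like one interpreter session
    for line in source.splitlines():
        if chunk is None and CHUNK_START.match(line):
            chunk = []
        elif chunk is not None and line.strip() == "@":
            captured = io.StringIO()
            with contextlib.redirect_stdout(captured):
                exec("\n".join(chunk), namespace)   # the "language translator" step
            woven.append("")
            woven.extend("    " + code_line for code_line in chunk)
            woven.extend("    ## " + out for out in captured.getvalue().splitlines())
            woven.append("")
            chunk = None
        elif chunk is not None:
            chunk.append(line)
        else:
            woven.append(line)   # text with markup passes through unchanged
    return "\n".join(woven)


# Hypothetical input document: text with markup plus one code chunk.
literate_source = """\
The mean of the sample is computed below.
<<mean>>=
values = [2, 4, 6]
print(sum(values) / len(values))
@
The result appears directly under the code.
"""

woven_document = weave(literate_source)
print(woven_document)
```

The woven text would then be handed to a text markup processor (here, any Markdown processor) to produce the formatted output.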


Why?

• Knuth – a way to program

• Automatic report generation (web services)

• Reports, articles, program documentation/tutorials

• Reproducible research


Reproducible Research

• Promoted by Jon F. Claerbout, Stanford University (1990s?).

• Early publication: ‘WaveLab and Reproducible Research’, Buckheit & Donoho 1995.

• ‘When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures’.

• Special issue, Computing in Science & Engineering, vol. 11, no. 1, 2009.


Image: Fomel, S. & Claerbout, J. F. Guest Editors’ Introduction: Reproducible Research. Computing in Science & Engineering 11, 5–7 (2009).


Reproducible Research in Statistics

• Gentleman & Temple Lang 2004, ‘Statistical Analysis and Reproducible Research’.

‘It is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, and so on with the documents that describe and rely on them.’


Gentleman and Temple Lang: The Compendium


Literate programming systems in R

• CRAN: Task Views – ReproducibleResearch

• Sweave (R+LaTeX, standard for vignette production)

• Knitr (various + various)

• Other possibilities (ascii, odfWeave, brew, etc)


Knitr + Markdown – Yihui Xie


Publish on…


CSIRO Mathematics, Informatics & Statistics

Alec Zwart

t +61 2 6216 7010

e [email protected]

Thank you


Reproducible Research – again, why?

• Anil Potti - Duke University, North Carolina

  • Personalised medicine for cancer patients

  • Microarray work

• Statisticians Keith Baggerly and Kevin Coombes, intrigued by results from Potti’s research, decide to investigate:

  • Found errors, including lots of simple ones: mislabelled samples, mismatched gene names, etc.

• To date: 10 retractions, 7 corrections, 1 partial retraction. Anil Potti resigned.

• Dishonesty? Ignorance, incompetence + wishful thinking? Unclear.


B & D: Reproducible Research – why?

• Buckheit & Donoho – anecdotes:

  • Which of these printouts was the right version of the figure? – Arrgh!

  • Stolen briefcase – loss of irreplaceable figures.

  • Limitations of oral communication of software & algorithms.

  • Documentation – returning to old work.

  • Er – can’t remember what parameter values gave this result – not to worry…

• ‘An article about computational science in a scientific publication is NOT the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures’ - Buckheit & Donoho


Knitr + markdown

• Markdown – text formatting system, not nearly as powerful as LaTeX, but simple

• Knitr + markdown great for producing quick reports in HTML

• Incorporated into RStudio – See RStudio pages for docs.

• Knitr webpage: http://yihui.name/knitr

• Documentation for code chunk options: http://yihui.name/knitr/options




G & TL – the Compendium

• For RR, may need to provide:

  • Dynamic document files

  • Extra code files

  • Extra text processing files (e.g. LaTeX style files, etc.?)

  • Data files

  • Instructions/documentation

• Place all of this in a suitable container – the Compendium

  • A folder with subfolders

  • An R package!

• GolubRR package – Gentleman 2005, ‘Reproducible Research: A Bioinformatics Case Study’.
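The 'folder with subfolders' form of a compendium can be sketched as below. The folder names (`doc`, `code`, `tex`, `data`), the `README.txt` file, and the function `create_compendium` are all hypothetical choices made for illustration; they are not a layout prescribed by Gentleman & Temple Lang, and an R package would follow R's own directory conventions instead.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: one subfolder per kind of material listed on the slide.
COMPENDIUM_LAYOUT = {
    "doc": "Dynamic document files",
    "code": "Extra code files",
    "tex": "Extra text processing files (e.g. LaTeX style files)",
    "data": "Data files",
}


def create_compendium(root: Path) -> Path:
    """Create an empty compendium skeleton with a README describing it."""
    root.mkdir(parents=True, exist_ok=True)
    readme_lines = ["Compendium contents", ""]
    for folder, purpose in COMPENDIUM_LAYOUT.items():
        (root / folder).mkdir(exist_ok=True)
        readme_lines.append(f"{folder}/ : {purpose}")
    # The README plays the role of the instructions/documentation item.
    (root / "README.txt").write_text("\n".join(readme_lines) + "\n")
    return root


# Build the skeleton in a temporary directory so the example has no side effects.
workdir = Path(tempfile.mkdtemp())
compendium = create_compendium(workdir / "my_compendium")
created = sorted(p.name for p in compendium.iterdir())
print(created)
```

Keeping everything under one root folder means the whole analysis can be archived or shared as a single unit.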


WEB

CWEB, WEB

‘Weave’

‘Tangle’


Knitr

• Yihui Xie

• Scratching Sweave itches?

• Greater functionality:

  • better R output capture

  • better code formatting

  • built-in caching

  • better graphics handling

  • source R code from scripts

  • more customizable

• Multiple programming languages (R, Python, AWK), and alternative text processing systems (LaTeX, Markdown, reStructuredText & more)
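The idea behind chunk caching can be illustrated with a small Python sketch: a chunk is re-run only when its source text changes. This is an assumption-laden illustration of the general technique, not knitr's actual implementation; the names `run_chunk_cached`, `chunk_cache` and `executions` are hypothetical, and a real system would also need to track chunk options and dependencies between chunks.

```python
import contextlib
import hashlib
import io

chunk_cache = {}   # digest of chunk source -> captured output
executions = []    # digests of chunks that were actually executed


def run_chunk_cached(code: str) -> str:
    """Return the printed output of a chunk, executing it only on a cache miss."""
    key = hashlib.sha256(code.encode("utf-8")).hexdigest()
    if key not in chunk_cache:
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(code, {})                 # run the chunk in a fresh namespace
        chunk_cache[key] = captured.getvalue()
        executions.append(key)
    return chunk_cache[key]


chunk = "print(sum(range(1000)))"
first = run_chunk_cached(chunk)                         # executed
second = run_chunk_cached(chunk)                        # served from the cache
third = run_chunk_cached("print(sum(range(1001)))")     # source changed: executed again
print(first, second, third, len(executions))
```

For slow chunks, this is what makes repeated rebuilds of a report practical: unchanged computations are skipped.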
