Reproducible Research and R
CSIRO MATHEMATICS, INFORMATICS, AND STATISTICS
Alec Zwart 21 November 2012
Image: Fomel, S. & Claerbout, J. F. Guest Editors’ Introduction: Reproducible Research. Computing in Science & Engineering 11, 5–7 (2009).
Reproducible Research and R | Alec Zwart
Preliminary: Markup
Donald Knuth - Literate Programming
• D. E. Knuth, ‘Literate Programming’, The Computer Journal, 1984.
‘Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.’
WEB:
• ‘Weave’: turns the source into formatted documentation
• ‘Tangle’: turns the source into compilable program code
Weaving (modern version)
• Tools: CWEB, noweb, Sweave, knitr…
• Input: text w/ markup + code blocks
• Code blocks → language translator (R, Python…) → code block outputs
• Text, code & output w/ markup → text markup processor (LaTeX, web browser, Markdown processor) → formatted output
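The weave and tangle steps can be sketched in a few lines of Python. This is a toy illustration only, not how WEB, Sweave or knitr are implemented. It assumes noweb-style chunk delimiters (`<<name>>=` opens a code block, `@` closes it). The names `SOURCE`, `parse`, `tangle` and `weave` are hypothetical, and Python's `exec` stands in for the language translator.

```python
import contextlib
import io

# Hypothetical example source: text with one noweb-style code block.
SOURCE = """\
The mean of the sample is computed below.
<<mean>>=
values = [2, 4, 6]
print(sum(values) / len(values))
@
That is all.
"""


def parse(source):
    """Split the source into ("text", body) and ("code", body) pieces."""
    pieces, buffer, in_code = [], [], False
    for line in source.splitlines():
        if not in_code and line.startswith("<<") and line.endswith(">>="):
            if buffer:
                pieces.append(("text", "\n".join(buffer)))
            buffer, in_code = [], True
        elif in_code and line.strip() == "@":
            pieces.append(("code", "\n".join(buffer)))
            buffer, in_code = [], False
        else:
            buffer.append(line)
    if buffer:
        pieces.append(("code" if in_code else "text", "\n".join(buffer)))
    return pieces


def tangle(source):
    """Keep only the code blocks, in document order."""
    return "\n".join(body for kind, body in parse(source) if kind == "code")


def weave(source):
    """Run each code block and interleave text, code and captured output."""
    env, out = {}, []
    for kind, body in parse(source):
        if kind == "text":
            out.append(body)
            continue
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(body, env)  # toy stand-in for the language translator
        out.append("\n".join("    " + line for line in body.splitlines()))
        result = captured.getvalue().rstrip("\n")
        if result:
            out.append("\n".join("    ## " + line for line in result.splitlines()))
    return "\n".join(out)


if __name__ == "__main__":
    print(weave(SOURCE))
```

The woven text would still need to go through a markup processor to become formatted output. Real tangle tools also resolve references between named chunks, which this sketch omits.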
Why?
• Knuth – a way to program
• Automatic report generation (web services)
• Reports, articles, program documentation/tutorials
• Reproducible research
Reproducible Research
• Promoted by Jon F. Claerbout, Stanford University (1990s?).
• Early publication: ‘WaveLab and Reproducible Research’, Buckheit & Donoho 1995.
  • ‘When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures’.
• Special issue: Computing in Science & Engineering, vol. 11, no. 1, 2009.
Reproducible Research in Statistics
• Gentleman & Temple Lang 2004, ‘Statistical Analysis and Reproducible Research’.
‘It is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, and so on with the documents that describe and rely on them.’
Gentleman and Temple Lang: The Compendium
Literate programming systems in R
• CRAN Task Views: ReproducibleResearch
• Sweave (R + LaTeX, standard for vignette production)
• Knitr (various languages + various markup systems)
• Other possibilities (ascii, odfWeave, brew, etc.)
Knitr + Markdown – Yihui Xie
Publish on…
CSIRO Mathematics, Informatics & Statistics
Alec Zwart
t +61 2 6216 7010
e [email protected]
CSIRO MATHEMATICS, INFORMATICS AND STATISTICS
Thank you
Reproducible Research – again, why?
• Anil Potti - Duke University, North Carolina
  • Personalised medicine for cancer patients
  • Microarray work
• Statisticians Keith Baggerly and Kevin Coombes, intrigued by results from Potti’s research, decided to investigate:
  • Found errors, including lots of simple ones – mislabelled samples, mismatched gene names, etc.
• To date: 10 retractions, 7 corrections, 1 partial retraction. Anil Potti resigned.
• Dishonesty? Ignorance, incompetence + wishful thinking? Unclear.
B & D: Reproducible Research – why?
• Buckheit & Donoho – anecdotes:
  • Which of these printouts was the right version of the figure? – Arrgh!
  • Stolen briefcase – loss of irreplaceable figures.
  • Limitations of oral communication of software & algorithms.
  • Documentation – returning to old work.
  • Er – can’t remember what parameter values gave this result – not to worry…
• ‘An article about computational science in a scientific publication is NOT the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures’ - Buckheit & Donoho
Knitr + markdown
• Markdown – text formatting system, not nearly as powerful as LaTeX, but simple
• Knitr + markdown great for producing quick reports in HTML
• Incorporated into RStudio – See RStudio pages for docs.
• Knitr webpage: http://yihui.name/knitr
• Documentation for code chunk options: http://yihui.name/knitr/options
G & TL – the Compendium
• For RR, may need to provide:
  • Dynamic document files
  • Extra code files
  • Extra text processing files (e.g. LaTeX style files, etc.?)
  • Data files
  • Instructions/documentation
• Place all of this in a suitable container – the Compendium
  • A folder with subfolders
  • An R package!
• GolubRR package – Gentleman 2005, ‘Reproducible Research: A Bioinformatics Case Study’.
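As a rough illustration of the "folder with subfolders" option, the sketch below creates an empty compendium skeleton. The subfolder names and the `create_compendium` helper are hypothetical choices for this example. They are not a layout prescribed by Gentleman & Temple Lang or by the R package format.

```python
from pathlib import Path

# Hypothetical subfolder names, one per kind of file a compendium may need.
COMPENDIUM_LAYOUT = {
    "doc": "dynamic document files",
    "code": "extra code files",
    "style": "extra text processing files (e.g. LaTeX style files)",
    "data": "data files",
}


def create_compendium(root):
    """Create the subfolders plus a README that documents what goes where."""
    root = Path(root)
    for name in COMPENDIUM_LAYOUT:
        (root / name).mkdir(parents=True, exist_ok=True)
    lines = [f"{name}/: {purpose}" for name, purpose in COMPENDIUM_LAYOUT.items()]
    readme = root / "README.txt"
    readme.write_text("Compendium contents\n" + "\n".join(lines) + "\n")
    return root


if __name__ == "__main__":
    print(create_compendium("my_compendium"))
```

An R package covers the same ground with a standardised layout, which is presumably why the slide singles it out.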
Knitr
• Yihui Xie
• Scratching Sweave itches?
• Greater functionality
  • better R output capture,
  • better code formatting,
  • built in caching,
  • better graphics handling,
  • source R code from scripts,
  • more customizable.
• Multiple programming languages (R, Python, AWK), and alternative text processing systems (LaTeX, Markdown, reStructuredText & more)
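The idea behind "built in caching" can be sketched as follows: a code chunk is re-evaluated only when its source text changes. This is an illustration of the general technique under simplifying assumptions, not knitr's actual implementation. The `ChunkCache` class is hypothetical and keeps results in memory rather than on disk.

```python
import hashlib


class ChunkCache:
    """Cache chunk results, keyed by a hash of the chunk's source code."""

    def __init__(self):
        self._store = {}
        self.evaluations = 0  # counts how often a chunk was actually run

    def run(self, code, evaluate):
        """Return the cached result for `code`, evaluating it only if needed."""
        key = hashlib.sha256(code.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = evaluate(code)
            self.evaluations += 1
        return self._store[key]


if __name__ == "__main__":
    cache = ChunkCache()
    cache.run("sum(range(1000))", eval)  # evaluated
    cache.run("sum(range(1000))", eval)  # served from the cache
    print(cache.evaluations)
```

A real system would also need to invalidate a chunk when anything it depends on changes, such as its options or the results of earlier chunks.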