AARHUS UNIVERSITET
Faculty of Agricultural Sciences
Literate programming with SAS
- and other languages
Søren Højsgaard
Faculty of Agricultural Sciences, Aarhus University
Denmark
SASforum, May 2009, Copenhagen
Take-home message
Literate programming: Combining text, code and results in one document
Supports text formats: LaTeX / OpenOffice (OpenDocument Text)
In combination with the ’engines’ SAS, R, S-plus, Maple, Stata, …
Ensures reproducibility of analysis
Great help in ”recalling what I did 2 months ago”
StatWeave does all this – and is free…
This talk: Focus on StatWeave with OpenOffice and SAS/R …
Overview – Combining code, documentation and results
Source document:
  Writing
  SAS statements
  More writing
  R statements
  Even more writing
  More SAS statements
  More writing…
Final document:
  Writing
  SAS statements
  SAS output
  SAS graphics
  More writing
  R statements
  R output
  Even more writing
  SAS statements
  SAS output
  More writing…
Hello StatWeave World…
What is literate programming
Knuth (1979) coined the term literate programming:
Create software as works of literature:
  Embed source code into descriptive text (rather than the opposite, which is common practice)
  Software should follow the flow of thoughts and logic
  Should be designed to be readable by humans (and not only by compilers / programs).
Very useful idea in statistics…
Why literate programming?
Reproducible statistical analysis
  Research, consulting
  Document exactly what has been done
  Possible to re-run if data change
Manuals, course notes etc.
  Shown output guaranteed to be result of shown code
Some systems for literate programming
Comments inside code
WEB (Knuth 1979) and friends
Sweave (Leisch 2002)
  R code in LaTeX documents
odfWeave (Kuhn and Coulter 2007)
  R code in OpenOffice documents
SASweave (Lenth and Højsgaard 2007)
  SAS / R code in LaTeX documents
StatWeave
  SAS / R / Maple / S-plus / Stata … code in LaTeX and OpenOffice documents
StatWeave
StatWeave was created by Russ Lenth, University of Iowa, USA
Available: http://www.cs.uiowa.edu/~rlenth/StatWeave/
StatWeave is still under development, but is becoming ”mature” and stable.
StatWeave design goals
Support many languages
  R, S-plus, SAS, Stata, Maple, …
Support different word processing systems, currently
  LaTeX
  OpenDocument Text (ODT), www.openoffice.org
Portability: Usable on all platforms (written in Java)
Extensible:
  Add other languages
Under the hood of StatWeave
Source file is a regular text document, but with code chunks added (with special tags)
Two basic operations
  Weaving: Process source file into a single document with code listings, output listings, graphs…
  Tangling: Extract code from source file to run later
Weaving is useful for reproducible statistical analysis
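The two operations can be pictured with a small Python sketch. The chunk markers used here (`<<R>>` to open a chunk, `@` to close it) are hypothetical and chosen only for illustration; they are not StatWeave's actual tags, and the `run` function is a stand-in for handing code to a real engine such as SAS or R.

```python
import re

# Hypothetical chunk markers for illustration only; StatWeave's real tags differ.
CHUNK = re.compile(r"<<(\w+)>>\n(.*?)\n@", re.DOTALL)

def tangle(source):
    """Extract the code chunks, in order, as (language, code) pairs."""
    return [(m.group(1), m.group(2)) for m in CHUNK.finditer(source)]

def weave(source, run):
    """Replace each chunk with its code listing followed by its output.

    `run` is a caller-supplied function (language, code) -> output text;
    a real weaver would pass the code to SAS, R, etc.
    """
    def render(match):
        lang, code = match.group(1), match.group(2)
        return f"[{lang} code]\n{code}\n[{lang} output]\n{run(lang, code)}"
    return CHUNK.sub(render, source)

source = "Some text.\n<<R>>\n1 + 1\n@\nMore text.\n"

def fake_run(lang, code):
    # Stand-in for a real engine call; always returns the same text.
    return "2"

print(tangle(source))
print(weave(source, fake_run))
```

Tangling keeps only the code, while weaving keeps the surrounding text and places each output directly after the code that produced it.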
Running StatWeave
Command-line interface:
  statweave SAS-HelloWorld-swv.odt
  statweave --tangle SAS-HelloWorld-swv.odt
  statweave --keepall SAS-HelloWorld-swv.odt
Graphical User Interface:
Generally, source xxx-swv.odt becomes output xxx.odt
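The naming rule can be written down as a short Python helper. This is only a sketch of the convention stated above; the function name `woven_name` is made up for illustration and is not part of StatWeave.

```python
from pathlib import Path

def woven_name(source):
    """Map xxx-swv.odt to xxx.odt, keeping the directory and extension."""
    path = Path(source)
    if not path.stem.endswith("-swv"):
        raise ValueError(f"expected a name ending in -swv: {source}")
    # Drop the trailing "-swv" from the stem and re-attach the extension.
    return str(path.with_name(path.stem[:-len("-swv")] + path.suffix))

print(woven_name("SAS-HelloWorld-swv.odt"))
```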
Chicken weight data
Set global options (for SAS code)
Inline evaluation of expressions
Output can be saved for later use - and display
Code reuse and argument substitution
Save code chunks for later execution
Pass arguments to code chunks
Simplest case: Not unlike a macro…
Customize display and output (tables) with a reusable code chunk
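The macro-like idea of saving a chunk and recalling it with arguments can be sketched in Python. Everything here is an assumption made for illustration: the chunk store, the `$name` placeholder syntax, and the dataset and variable names are hypothetical and do not reproduce StatWeave's own mechanism.

```python
from string import Template

# Hypothetical store of saved chunks; the names and the $placeholder
# syntax are illustrative, not StatWeave's actual mechanism.
saved_chunks = {
    "means": Template("proc means data=$data;\n  var $var;\nrun;"),
}

def recall(name, **args):
    """Return a saved chunk with its placeholders filled in."""
    return saved_chunks[name].substitute(**args)

# The dataset and variable names below are made up for the example.
print(recall("means", data="chicken", var="weight"))
```

The same saved chunk can then be recalled with other arguments, which is what makes it behave much like a macro.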
Multi-language example: SAS, R and DOS together
Can use different engines in the same source file
  Use SAS when appropriate; use R when appropriate; use Maple when appropriate…
Weaving: SAS/R/XX chunks are assembled into separate code files.
Code files are processed in order of first appearance in the source file
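The assembly step can be illustrated with a few lines of Python. This is a sketch of the idea only, assuming chunks are available as (language, code) pairs; the function name `assemble` and the example chunks are made up, and nothing here reflects StatWeave's internal code.

```python
def assemble(chunks):
    """Group (language, code) chunks into one code file per language.

    Dicts preserve insertion order, so the languages come out in order
    of first appearance in the source.
    """
    files = {}
    for lang, code in chunks:
        files.setdefault(lang, []).append(code)
    return {lang: "\n".join(parts) for lang, parts in files.items()}

# Illustrative chunks only.
chunks = [("SAS", "data a; x = 1; run;"),
          ("R", "x <- 1"),
          ("SAS", "proc print data=a; run;")]

for lang, code in assemble(chunks).items():
    print(f"--- {lang} ---\n{code}")
```

Here both SAS chunks end up in the SAS file, which is processed before the R file because SAS appears first in the source.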
Synchronization issue: A SAS chunk depends on data from an R chunk, which depends on data from a SAS chunk…
Solution: The restart option will restart the engines
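One way to picture why a restart helps is the Python sketch below. It assumes that a restart point simply closes the current batch of chunks, so that everything before it is executed before anything after it. That is an assumption made for illustration, not a description of how StatWeave implements the option, and the `restart` flag on each chunk is hypothetical.

```python
def batches(chunks):
    """Split (language, description, restart) chunks into batches.

    A chunk flagged restart=True starts a new batch, so all chunks
    before it are executed before any chunk from it onwards.
    """
    result = [[]]
    for lang, code, restart in chunks:
        if restart and result[-1]:
            result.append([])
        result[-1].append((lang, code))
    return result

# Illustrative chunks: the last SAS chunk needs the R results first.
chunks = [("SAS", "write data for R", False),
          ("R", "read SAS data, write results", False),
          ("SAS", "read R results", True)]

for number, batch in enumerate(batches(chunks), start=1):
    print(number, [lang for lang, _ in batch])
```

Without the restart point, both SAS chunks would land in the same code file and run before the R chunk, so the second SAS chunk would look for R results that do not exist yet.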
Code chunks are processed as a whole
Code chunks are processed as a ”unit”, so in general one cannot split a call to proc xxxx over several chunks:
Thus the following is illegal
… one exception in SAS: IML
Odds and ends – Maple
Differentiate y= sin(x) xxx
Output is ugly, but it reads:
Odds and ends – calling the shell
Want to list all StatWeave / OpenOffice source files: *-swv.odt
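For comparison, the same listing can be produced outside StatWeave with a short Python sketch. It only illustrates the `*-swv.odt` pattern; the function name `statweave_sources` is made up for the example.

```python
from pathlib import Path

def statweave_sources(directory="."):
    """Return the names of all *-swv.odt files in a directory, sorted."""
    return sorted(p.name for p in Path(directory).glob("*-swv.odt"))

print(statweave_sources())
```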
Summary
Reproducible statistical analyses
Integrate text, code and results in one document
Several text formats
Several languages
This talk (and the examples) are available at http://genetics.agrsci.dk/~sorenh/misc/
All credit is due to Russ Lenth, the creator of StatWeave. Thanks!!!!