AARHUS UNIVERSITET Faculty of Agricultural Sciences Literate programming with SAS - and other languages Søren Højsgaard Faculty of Agricultural Sciences Aarhus University Denmark SASforum, May 2009, Copenhagen

AARHUS UNIVERSITET

Faculty of Agricultural Sciences

Literate programming with SAS

- and other languages

Søren Højsgaard

Faculty of Agricultural Sciences

Aarhus University

Denmark

SASforum, May 2009, Copenhagen

Take-home message

Literate programming: Combining text, code and results in one document

Supports text formats: LaTeX / OpenOffice (OpenDocument Text)

In combination with the ’engines’ SAS, R, S-plus, Maple, Stata, …

Ensures reproducibility of analysis

Great help in "recalling what I did 2 months ago"

StatWeave does all this – and is free…

This talk: Focus on StatWeave with OpenOffice and SAS/R …

Overview – Combining code, documentation and results

Source document:
  Writing
  SAS statements
  More writing
  R statements
  Even more writing
  More SAS statements
  More writing…

Final document:
  Writing
  SAS statements
  SAS output
  SAS graphics
  More writing
  R statements
  R output
  Even more writing
  SAS statements
  SAS output
  More writing…

Hello StatWeave World…

What is literate programming

Knuth (1979) coined the term literate programming: Create software as works of literature:
  Embed source code into descriptive text (rather than the opposite, which is common practice)
  Software should follow the flow of thoughts and logic
  Software should be designed to be readable by humans (and not only by compilers / programs).

Very useful idea in statistics…

Why literate programming?

Reproducible statistical analysis
  Research, consulting
  Document exactly what has been done
  Possible to re-run if data change

Manuals, course notes etc.
  Shown output is guaranteed to be the result of the shown code

Some systems for literate programming

Comments inside code
WEB (Knuth 1979) and friends
Sweave (Leisch 2002)
  R code in LaTeX documents
odfWeave (Kuhn and Coulter 2007)
  R code in OpenOffice documents
SASweave (Lenth and Højsgaard 2007)
  SAS / R code in LaTeX documents
StatWeave
  SAS / R / Maple / S-plus / Stata … code in LaTeX and OpenOffice documents

StatWeave

StatWeave created by Russ Lenth, University of Iowa, USA
Available: http://www.cs.uiowa.edu/~rlenth/StatWeave/
StatWeave is still under development, but is becoming "mature" and stable.

StatWeave design goals
  Support many languages: R, S-plus, SAS, Stata, Maple, …
  Support different word processing systems, currently:
    LaTeX
    OpenDocument Text (ODT), www.openoffice.org
  Portability: Usable on all platforms (written in Java)
  Extensible: Add other languages

Under the hood of StatWeave

Source file is a regular text document, but with code chunks added (with special tags)

Two basic operations:
  Weaving: Process source file into a single document with code listings, output listings, graphs…
  Tangling: Extract code from source file to run later

Weaving is useful for reproducible statistical analysis
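
The difference between the two operations can be illustrated with a minimal Python sketch. The chunk markers used here (`<<engine>>=` to open a chunk, `@` to close it) are hypothetical, noweb-like stand-ins; the slides do not show StatWeave's actual tag syntax, and the `run` callback is only a placeholder for handing code to an engine.

```python
import re

# Hypothetical chunk markers (noweb-like); StatWeave's real tags may differ.
CHUNK = re.compile(
    r"^<<(?P<engine>\w+)>>=\n(?P<code>.*?)^@[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def tangle(source):
    """Extract (engine, code) pairs in document order, dropping the prose."""
    return [(m["engine"], m["code"].rstrip("\n")) for m in CHUNK.finditer(source)]


def weave(source, run):
    """Replace each chunk by its code listing followed by the engine output.

    `run(engine, code)` is a placeholder for executing the code.
    """
    def render(m):
        code = m["code"].rstrip("\n")
        return f"{code}\n[output] {run(m['engine'], code)}"

    return CHUNK.sub(render, source)


source = """Some writing.
<<SAS>>=
proc print data=chicks; run;
@
More writing.
"""
```

Tangling keeps only the code, while weaving keeps the prose and puts the code listing and its output where the chunk stood.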

Running StatWeave

Command-line interface:
  statweave SAS-HelloWorld-swv.odt
  statweave --tangle SAS-HelloWorld-swv.odt
  statweave --keepall SAS-HelloWorld-swv.odt

Graphical User Interface:

Generally, source xxx-swv.odt becomes output xxx.odt
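
The naming rule and the three command lines from this slide can be expressed as a small Python sketch. The helper names `woven_name` and `statweave_command` are made up for illustration, and whether `--tangle` and `--keepall` may be combined is not stated in the talk; the sketch only builds argument lists and executes nothing.

```python
from pathlib import Path


def woven_name(source):
    """Map xxx-swv.odt to xxx.odt, following the naming rule on the slide."""
    path = Path(source)
    if not path.stem.endswith("-swv"):
        raise ValueError(f"expected a name ending in -swv{path.suffix}: {source}")
    return str(path.with_name(path.stem[: -len("-swv")] + path.suffix))


def statweave_command(source, tangle=False, keepall=False):
    """Build the argument list for a statweave call (nothing is run here)."""
    args = ["statweave"]
    if tangle:
        args.append("--tangle")
    if keepall:
        args.append("--keepall")
    return args + [source]
```

With StatWeave installed, the resulting list could be passed to `subprocess.run`.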

Chicken weight data

Set global options (for SAS code)

Inline evaluation of expressions

… chicken weight data

Output can be saved for later use - and display

Code reuse and argument substitution

Save code chunks for later execution
Pass arguments to code chunks
Simplest case: Not unlike a macro…
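
The macro-like idea of a saved chunk with arguments can be sketched in Python. This is not StatWeave syntax: the `$name` placeholders come from Python's `string.Template`, and the chunk name `summary`, the data set `chicks` and the variable `weight` are invented for the example.

```python
from string import Template

# A saved chunk: SAS code with named placeholders. The $name syntax is
# Python's string.Template, used here only to illustrate the idea.
saved_chunks = {
    "summary": Template("proc means data=$data;\n  var $var;\nrun;"),
}


def reuse(name, **args):
    """Return the saved chunk with its arguments substituted."""
    return saved_chunks[name].substitute(**args)


code = reuse("summary", data="chicks", var="weight")
```

A missing argument raises `KeyError`, so an incomplete reuse fails early instead of producing broken SAS code.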

…code reuse and argument substitution

Customize display and output (tables) by reusable code chunk

Multi-language example: SAS, R and DOS together

Can use different engines in the same source file
  Use SAS when appropriate; use R when appropriate; use Maple when appropriate…

Weaving: SAS/R/XX chunks are assembled into separate code files.
Code files are processed in order of first appearance in the source file
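
The assembly step described above can be sketched in Python as follows. This is an illustration of the stated rule (one code file per engine, ordered by first appearance), not StatWeave's implementation; the chunk contents are invented.

```python
def assemble(chunks):
    """Group (engine, code) chunks into one code file per engine.

    A dict preserves insertion order, so the engines come out in order of
    their first appearance in the source file.
    """
    files = {}
    for engine, code in chunks:
        files.setdefault(engine, []).append(code)
    return {engine: "\n".join(parts) for engine, parts in files.items()}


chunks = [
    ("SAS", "proc print data=a; run;"),
    ("R", "summary(x)"),
    ("SAS", "proc means data=a; run;"),
]
files = assemble(chunks)
```

Note that the second SAS chunk ends up in the SAS file, which is processed before the R file, even though it appears after the R chunk in the document.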

…Multi-language example: SAS, R and DOS together

Synchronization issue: SAS chunk depends on data from R chunk which depends on data from SAS chunk….

Solution: The restart option will restart the engines
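
One way to picture the effect of restarting is the Python sketch below. It rests on an assumption that the slides do not confirm: that a chunk carrying the restart option closes the current set of per-engine code files and starts a new round, so everything before it has run before anything after it. The chunk contents and the function name `assemble_rounds` are invented.

```python
def assemble_rounds(chunks):
    """Split (engine, code, restart) chunks into rounds of per-engine files.

    Assumption: a chunk flagged with restart begins a new round, and each
    round is run to completion before the next one starts.
    """
    rounds = [{}]
    for engine, code, restart in chunks:
        if restart and rounds[-1]:
            rounds.append({})
        rounds[-1].setdefault(engine, []).append(code)
    return [
        {engine: "\n".join(parts) for engine, parts in files.items()}
        for files in rounds
    ]


chunks = [
    ("SAS", "/* write data for R */", False),
    ("R", "# read SAS data, write results", False),
    ("SAS", "/* read results from R */", True),
]
rounds = assemble_rounds(chunks)
```

Without the restart flag, both SAS chunks would land in one file that runs before the R code, and the second SAS chunk would not find the R results.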

Code chunks are processed as a whole

Code chunks are processed as a "unit", so in general one cannot split a call to proc xxxx over several chunks.

Thus the following is illegal:

… one exception in SAS: IML

Odds and ends – Maple

Differentiate y= sin(x) xxx

Output is ugly, but it reads:

Odds and ends – calling the shell

Want to list all StatWeave / OpenOffice source files: *-swv.odt
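
The slide does this with a shell chunk inside StatWeave. For comparison, the same file listing can be sketched in plain Python; the function name `list_statweave_sources` is made up for the example.

```python
from pathlib import Path


def list_statweave_sources(directory="."):
    """Return the names of all *-swv.odt files in `directory`, sorted."""
    return sorted(p.name for p in Path(directory).glob("*-swv.odt"))
```

Calling `list_statweave_sources()` with no argument lists the source files in the current working directory.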

Summary

Reproducible statistical analyses
Integrate text, code and results in one document
Several text formats
Several languages

This talk and the examples are available at http://genetics.agrsci.dk/~sorenh/misc/

All credit is due to Russ Lenth, the creator of StatWeave. Thanks!!!!