An overview of the pros and cons of R, the free and open source language and environment for statistical computing and graphics.
R: The Good and The Bad
AnalyticsCamp NC, May 12, 2011
Ian Cook, Organizer, Raleigh-Durham-Chapel Hill R Users Group
The Good…
• Effectively the lingua franca of data analysis and statistical computing
• Free and open source
• As a statistical language, it’s generally considered to be very easy to code in (vs. SAS, JSL, SPSS, etc.)
The Good
• Native cross-platform and 64-bit support
• Typically easy to install and configure
• Community of millions of users; brilliant minds
• Rapidly growing number of packages (2800+ on CRAN, 950+ projects on R-Forge)
  – http://cran.r-project.org/web/packages/ and http://r-forge.r-project.org/
The Good
• Great free, open source IDEs and GUIs (e.g., StatET for Eclipse, RStudio just released in late February, Emacs Speaks Statistics, JGR, Tinn-R, lots more)
  – See “Editors and IDEs” and “Graphical User Interfaces” sections of http://en.wikipedia.org/wiki/R_(programming_language)
  – Also see http://sciviews.org/_rgui/ and http://stackoverflow.com/questions/1097367/what-ides-are-available-for-r-in-linux
The Good
• Active mailing lists, frequented by the gurus, so it is very easy to get your questions answered
  – On a humorous note: http://yihui.name/en/2010/04/rules-of-thumb-to-meet-r-gurus-in-the-help-list/
• CRAN Task Views
  – http://cran.r-project.org/web/views/
The Good
• Growing coverage on Stack Overflow, and on Cross Validated, the Stack Exchange site for statistical analysis
  – http://stackoverflow.com/questions/tagged/r and http://stats.stackexchange.com/
• #rstats hashtag on Twitter
  – http://twitter.com/search/%23rstats
• Blogger community dedicated to covering R
  – http://www.r-bloggers.com/
• Growing list of print books and ebooks
The Good
• Commercial and open source data analysis/mining/analytics/visualization software increasingly integrating with R (Spotfire, SPSS, Netezza, JMP, SAS/IML, RapidMiner)
  – http://decisionstats.com/2010/05/04/commercial-r-integration-in-software/
• Revolution Analytics (products, blog, community site)
  – http://www.revolutionanalytics.com/, http://blog.revolutionanalytics.com/, and http://www.inside-r.org/
The Bad…
• Command prompt and lack of a GUI are intimidating
• Slow (especially looping)
• Poor parallelization
• Syntactical curiosities, annoyances, design flaws; little chance of them being remedied
  – E.g., http://radfordneal.wordpress.com/2008/09/21/design-flaws-in-r-3-%E2%80%94-zero-subscripts/
• Indices start at 1!
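The indexing complaints above can be seen in a few lines of R. This is a minimal sketch, not taken from the linked post: the vectors `x` and `empty` and the counter variables are made up for illustration.

```r
# Illustrative only: x and empty are hypothetical example vectors.
x <- c(10, 20, 30)

x[1]        # first element: indices start at 1
x[0]        # zero subscript: returns a zero-length vector, with no error
x[c(0, 2)]  # zeros are dropped, so only the second element is returned

# A consequence for loops: 1:length(v) counts down when v is empty.
empty <- numeric(0)

bad_iterations <- 0
for (i in 1:length(empty)) {   # 1:0 is c(1, 0), so the body runs twice
  bad_iterations <- bad_iterations + 1
}

safe_iterations <- 0
for (i in seq_along(empty)) {  # seq_along() gives an empty sequence here
  safe_iterations <- safe_iterations + 1
}
```

Using `seq_along()` (or `seq_len()`) instead of `1:length(...)` is the usual way to sidestep the empty-vector case.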
The Bad
• Subtle problems with scoping
  – http://stackoverflow.com/questions/3840769/scoping-and-functions-in-r-2-11-1-whats-going-wrong
• Poor memory performance, difficulty handling big data
• Can be difficult to compile base R and R packages from source
  – Requires Fortran and C/C++ compilers, plus Perl and Tcl
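As a rough illustration of the kind of scoping surprise mentioned above, here is a minimal sketch of R's lexical scoping. It is a hypothetical example, not the code from the linked question; all function and variable names are invented.

```r
# Hypothetical example; not taken from the linked question.
y <- "global"

# y is a free variable: it is looked up where show_y was defined,
# not where show_y is called.
show_y <- function() y

caller <- function() {
  y <- "local to caller"  # has no effect on what show_y() sees
  show_y()
}

result <- caller()        # "global", not "local to caller"

# Assignment inside a function creates a local variable unless <<- is used.
counter <- 0
bump_local  <- function() { counter <- counter + 1; invisible(NULL) }
bump_global <- function() { counter <<- counter + 1; invisible(NULL) }

bump_local()
after_local <- counter    # still 0: only a local copy was changed

bump_global()
after_global <- counter   # now 1: <<- modified the enclosing variable
```

Users coming from languages with different scoping rules often expect `caller()` to return the local value, which is one way such problems arise.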
The Bad
• Onerous terms of the GPL
• It has been proposed that the R community start over and build something better from scratch
  – Estimated that a total rewrite could improve speed by 2 orders of magnitude
  – http://stackoverflow.com/questions/3706990/is-r-that-bad-that-it-should-be-rewritten-from-scratch
• Increasingly attractive alternatives (e.g., Python)
The Verdict
?
Join the Raleigh-Durham-Chapel Hill R Users Group at: http://www.meetup.com/Triangle-useR/