Revolution Confidential
Revolution R: 100% R and More
Presented by: David Smith, VP Marketing and Community, Revolution Analytics
Poll Question
Which stats package do you use most?
February 22, 2011: Welcome!
Thanks for coming. Slides and replay available (soon) at: http://bit.ly/z9xUG9
David Smith
VP Marketing & Community, Revolution Analytics
Editor, Revolutions blog
http://blog.revolutionanalytics.com
Twitter: @revodavid
In today’s webcast:
About Revolution Analytics and R
What Revolution R adds to R
Resources for getting more from R
Q&A
Introducing Revolution R
What is R?
• Data analysis software
    • A programming language
    • Development platform designed by and for statisticians
• An environment
    • Huge library of algorithms for data access, data manipulation, analysis and graphics
• An open-source software project
    • Free, open, and active
• A community
    • Thousands of contributors, 2 million users
    • Resources and help in every domain
Download the White Paper
R is Hot: bit.ly/r-is-hot
R is exploding in popularity and functionality
Scholarly Activity: Google Scholar hits (’05-’09 CAGR)
• R 46%
• Stata 10%
• S-Plus 0%
• SAS -11%
• SPSS -27%
[Figure: Package Growth, number of R packages listed on CRAN, 2002 to 2010]
Source: http://r4stats.com/popularity
“A key benefit of R is that it provides near-instant availability of new and
experimental methods created by its user base — without waiting for the
development/release cycle of commercial software. SAS recognizes the value of R
to our customer base…”
Product Marketing Manager SAS Institute, Inc.
“I’ve been astonished by the rate at which R has been adopted. Four years ago,
everyone in my economics department [at the University of Chicago] was using
Stata; now, as far as I can tell, R is the standard tool, and students learn it first.”
Deputy Editor for New Products at Forbes
“R is the most powerful & flexible statistical programming language in the world” [1]
Capabilities
• Sophisticated statistical analyses
• Predictive analytics
• Data visualization
Applications
• Real-time trading
• Finance
• Risk assessment
• Forecasting
• Bio-technology
• Drug development
• Social networks
• ... and more
[1] Norman Nie, multiple interviews
[Figure: MSFT price chart (last value 29.29)]
Poll Question
If you're not using R today, what would you most like to use R for?
Revolution R Enterprise is
R Productivity Environment (Windows)
• Script with type ahead and code snippets
• Solutions window for organizing code and data
• Packages installed and loaded
• Objects loaded in the R Environment
• Object details
• Sophisticated debugging with breakpoints, variable values etc.
http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
Interactive Debugging
• One-click to set a breakpoint in an R script
• Step in/out/over, inspect variables
• Eliminate the edit -> browser -> repair cycle
Performance: Multi-threaded Math
[Chart: Open Source R vs. Revolution R Enterprise]

Computation (4-core laptop)          Open Source R   Revolution R   Speedup
Linear Algebra [1]
  Matrix Multiply                    327 sec         13.4 sec       23x
  Cholesky Factorization             31.3 sec        1.8 sec        17x
  Linear Discriminant Analysis       216 sec         74.6 sec       2x
General R Benchmarks [2]
  R Benchmarks (Matrix Functions)    22 sec          3.5 sec        5x
  R Benchmarks (Program Control)     5.6 sec         5.4 sec        Not appreciable

[1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
[2] http://r.research.att.com/benchmarks/
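To see how linear-algebra timings like these behave on your own machine, a small base R sketch along the following lines can help. It is not the benchmark script behind the table: the matrix size, the seed and the variable names are assumptions chosen for illustration. R hands these operations to whichever BLAS/LAPACK library it is linked against, so the elapsed times depend on that library and on the number of cores available.

```r
# Hypothetical micro-benchmark: sizes and seed are illustrative, not the ones
# used for the figures in the table above.
set.seed(42)
m <- 1000
n <- 500
A <- matrix(rnorm(m * n), nrow = m)

# Matrix multiply: A'A is an n x n symmetric matrix (positive definite here,
# because A has more rows than columns and random entries).
t_mult <- system.time(S <- crossprod(A))["elapsed"]

# Cholesky factorization of the matrix from the previous step.
t_chol <- system.time(R <- chol(S))["elapsed"]

cat(sprintf("crossprod: %.3f sec\nchol:      %.3f sec\n", t_mult, t_chol))

# Sanity check: the Cholesky factor reproduces S (R'R = S).
max_err <- max(abs(crossprod(R) - S))
```

Running the same script under two R builds gives a rough feel for the difference a multi-threaded math library makes; program-control code (loops, function calls) would not be expected to change much, which matches the last row of the table.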
Three Paradigms for Big Data
Standard R engine is constrained by capacity and performance
Revolution R Enterprise offers three methods for big data with R:
• Off-line: high-performance file-based analytics
• Off-line, parallel & distributed analytics
• On-line, in-database analytics
    • Hadoop
    • Netezza
Revolution R Enterprise with RevoScaleR
Big Data Statistics in R
www.revolutionanalytics.com/bigdata
Every US airline departure and arrival, 1987-2008
File: AirlineData87to08.xdf
Rows: 123.5 million
Variables: 29
Size on disk: 13.2Gb
arrDelayLm2 <- rxLinMod(ArrDelay ~ DayOfWeek:F(CRSDepTime), cube = TRUE)
RevoScaleR: Big Data algorithms
• Data processing (rxDataStep)
• Descriptive statistics (rxSummary)
• Tables and cubes (rxCube, rxCrossTabs)
• Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
• Linear regressions (rxLinMod)
• Logistic regressions (rxLogit)
• K means clustering (rxKmeans)
• Predictions (scoring) (rxPredict)
• Custom distributed computing (RxExec)
RevoScaleR – Distributed Computing
[Diagram: a Master Node (RevoScaleR) connected to four Compute Nodes (RevoScaleR), each with its own Data Partition]
• Portions of the data source are made available to each compute node
• RevoScaleR on the master node assigns a task to each compute node
• Each compute node independently processes its data, and returns its intermediate results back to the master node
• The master node aggregates all of the intermediate results from each compute node and produces the final result
*Available now for Microsoft HPC Server
Video demo: http://bit.ly/ugQ9KR
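The four steps on this slide can be imitated in plain R on a single machine. The sketch below is only an illustration of the partition, compute and aggregate pattern, not RevoScaleR's implementation: the data, the column names and the `node_task` helper are made up, and `lapply` stands in for the job scheduler that would send each task to a separate compute node. It fits a linear regression by having each "node" return the small cross-product matrices for its partition, which the "master" then sums.

```r
# Illustrative data; the column names and sizes are made up for this sketch.
set.seed(1)
n <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 2 + 3 * dat$x1 - 1.5 * dat$x2 + rnorm(n, sd = 0.5)

# Step 1: make a portion of the data available to each of four "compute nodes".
partitions <- split(dat, rep(1:4, length.out = n))

# Steps 2-3: each node processes only its own partition and returns a small
# intermediate result (here, the cross-products X'X and X'y).
node_task <- function(part) {
  X <- model.matrix(~ x1 + x2, data = part)
  list(xtx = crossprod(X), xty = crossprod(X, part$y))
}
intermediate <- lapply(partitions, node_task)

# Step 4: the "master" sums the intermediate results and solves the normal
# equations for the final coefficients.
xtx <- Reduce(`+`, lapply(intermediate, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(intermediate, `[[`, "xty"))
beta <- drop(solve(xtx, xty))
print(beta)

# The chunked result should match a single-pass fit on the full data.
full_fit <- coef(lm(y ~ x1 + x2, data = dat))
```

The point of the design is that each intermediate result is tiny (a 3 x 3 matrix and a 3 x 1 vector here) regardless of how many rows a partition holds, so only summaries travel back to the master.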
Platform-agnostic Big Data Analytics
• Set “compute context” to define hardware (one line of code)
• Native job-scheduler handles distribution, monitoring, failover etc.
• Same code runs on other supported architectures
    • Just change compute context
• Supported architectures:
    • Windows: Microsoft HPC Server
    • Linux: Platform Computing LSF (coming 2012)
42 seconds instead of 6 minutes
A common analytic platform across big data architectures
[Diagram: Hadoop, File Based, In-database]
In-Database Execution with IBM Netezza
More info: http://bit.ly/R-Netezza
R and Hadoop
• Hadoop offers a scalable infrastructure for processing massive amounts of data
    • Storage: HDFS, HBASE
    • Distributed computing: MapReduce
• R is a statistical programming language for developing advanced analytic applications
• Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
• The RHadoop project makes it possible to write Big Data algorithms for Hadoop using the R language alone.
RevoConnectR for Hadoop
[Diagram: Revolution R Client; R Map or Reduce tasks on each Task Node; Job Tracker; HDFS; HBASE via Thrift; packages rmr, rhdfs, rhbase]
Write Map-Reduce analytics using only R code with these R packages:
• rhdfs - R and HDFS
• rhbase - R and HBASE
• rmr - R and MapReduce
More information at: bit.ly/r-hadoop
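To make the map, shuffle and reduce steps concrete, here is a word count written in base R only. It runs in memory on one machine and does not call rmr, rhdfs or rhbase, whose own functions are not shown here; the input lines and the helper names `map_fn` and `reduce_fn` are hypothetical. It is meant as a rough picture of the kind of R-only logic such a job expresses, not as the packages' API.

```r
# Illustrative input: each element plays the role of one input record.
lines <- c("r is hot", "hadoop is big", "r and hadoop")

# Map: emit one (key, value) pair per word, with value 1.
map_fn <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}
mapped <- do.call(rbind, lapply(lines, map_fn))

# Shuffle: group the emitted values by key.
grouped <- split(mapped$value, mapped$key)

# Reduce: combine all values that share a key.
reduce_fn <- function(values) sum(values)
counts <- vapply(grouped, reduce_fn, integer(1))
print(counts)
```

On a real cluster the map and reduce functions would run on the task nodes against data stored in HDFS, with the framework handling the shuffle between them.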
Enterprise Readiness: Revolution R Enterprise Server
• Multi-User Support
• Production Applications
• Integrate R analytics into Web-based applications
    • Data Analysis and Visualization
    • Reporting
    • Dashboards
    • Interactive applications
Revolution R Enterprise Server with RevoDeployR
Enterprise-Wide Deployment
[Diagram labels:
  Research and Development: Data Scientists / Modelers
  Production: Revolution R Enterprise Server + Hadoop + IBM Netezza + Windows HPC Server cluster; RevoDeployR Server; Web Services API; Management Console
  End-User Deployment: Excel, BI, Web App; Analysts / Corporate Users]
On-Demand Analytics with RevoDeployR
The Advanced Analytics Stack
• Deployment / Consumption
• Advanced Analytics
• ETL
• Data / Infrastructure
“Open Analytics Stack” White Paper: bit.ly/lC43Kw
• On-Call Technical Support
• Consulting: Migration | Analytics | Applications | Validation
• Training: R | Revolution R | Statistical Topics
• Systems Integration: BI | ERP | Databases | Cloud
Wrapping Up
Why R?
• Every data analysis technique at your fingertips
• Create beautiful and unique data visualizations
• Get better results faster
• Draw on the talents of data scientists worldwide
• R is hot, and growing fast
Revolution R Enterprise
• High-performance R for multiprocessor systems
• Modern Integrated Development Environment
• Statistical Analysis of Terabyte-Class Data Sets
• In-database R analytics with Hadoop and Netezza
• Deploy R Applications via Web Services
• Telephone and email technical support
• Training and consulting services
• 100% compatible with R packages
Production-Grade Statistical Analysis for the Workplace
Revolution R Enterprise: Free to Academia
• Personal use
• Research
• Teaching
• Package development
Free Academic Download: www.revolutionanalytics.com/downloads/free-academic.php
Discounted Technical Support Subscriptions Available
Thank You!
Download slides, replay http://bit.ly/z9xUG9
Learn more about Revolution R revolutionanalytics.com/products
Contact Revolution Analytics http://bit.ly/hey-revo
Feb 29: Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise
A Step-by-Step Approach for Acceleration and Innovation, presented by William Zanine (IBM Analytics Solutions).
www.revolutionanalytics.com/news-events/free-webinars
Poll Question
What interests you most about Revolution R Enterprise?
The leading commercial provider of software and support for the popular open source R statistics language.
www.revolutionanalytics.com
+1 (650) 646 9545
Twitter: @RevolutionR