Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
• The leading commercial provider of so4ware and support for the popular open source R sta9s9cs language.
• Palo Alto, Sea?le, New York. • www.revolu9onanaly9cs.com/video.php
RevoScaleR Webinar 2
Revolution Confidential “R is the most powerful & flexible statistical programming language in the world”…
0
500
1,000
1,500
2,000
2,500
3,000
3,500
1995 2000 2005 2010
R
SAS Stata
SPSS
S-Plus
600
900
1,050
2,000
4,000
SPSS
SAS
R
Stata
S-Plus
-27%
Stata
0%
SAS
R 46%
SPSS
-11%
10%
S-Plus
Internet Discussion Mean monthly traffic on email discussion list
Web Site Popularity Number of links to main web site
Scholarly Activity Google Scholar hits (’05-’09 CAGR)
3 Source: http://r4stats.com/popularity
Revolution Confidential
The coming revolution – due to disruptive technological change
4
I believe there is going to be a revolution in both statistical practice and theory over the next several years
This revolution will be driven by disruptive technological change: our ability to collect and store data is rapidly and greatly outpacing our ability to analyze that data
Revolution Confidential Huge benefits to huge data
More information, more to be learned
Variables and relationships can be visualized and analyzed in much greater detail
Can allow the data to speak for itself; can relax or eliminate assumptions
Can get better predictions and better understandings of effects
Revolution R Enterprise 5
Revolution Confidential
We are currently incapable of analyzing much of the data we have
The most commonly-used statistical software tools either fail completely or are too slow to be useful on huge data sets
In many ways we are back where we were in the ‘70s and ‘80’s in terms of ability to handle common data sizes
Revolution R Enterprise 6
Revolution Confidential Code museums and the end of an era The vast majority of the data analysis
software in use today is based on algorithms that are 30, 40, 50 or more years old
Much of the actual code dates back that far
During that period of time the rising tide of technology allowed the same code to run faster and on bigger data sets
We are at the end of that era
Revolution R Enterprise 7
Revolution Confidential To keep up with the tsunami of data
We must:
use more cores
use more hard drives
use more computers
Existing statistical software can’t do this
We need new software
Revolution R Enterprise 8
Revolution Confidential New statistical software must be
Scalable – the same code that works on 100 observations should work on a 100 billion
Fast – need results in a timely manner; good data analysis requires interactivity
Easy to use – we need to be able to use clusters and clouds as easily as we can use single workstations today
Revolution R Enterprise 9
Revolution Confidential New software should also
Be easily extendible to new fast, scalable algorithms
Leverage as much existing software as possible
Be flexible, forgiving, and familiar to lots of people
Revolution R Enterprise 10
Revolution Confidential Is this possible? Yes!
Based on R – R is not only the statistical language of the present, in my opinion it is the language of the future
Based on “parallel external memory algorithms” -- at Revolution Analytics, we have released a framework for automatically and efficiently parallelizing and distributing a wide class of statistical and data mining algorithms
Revolution R Enterprise 11
Revolution Confidential Scalability of RevoScaleR: Regression, 1 million - 1.1 billion rows, 443 betas
0
200
400
600
800
1000
1200
0 200 400 600 800 1000 1200
Time (secs)
~ 1.1 million rows/second
Revolution R Enterprise 12
Revolution Confidential Please contact me if you have questions
Lee Edlefsen [email protected]
Revolution R Enterprise 13