7/30/2019 Rise of Data Science in Age of Big Data
Revolution Confidential
The Rise of Data Science in the Age of Big Data Analytics
Why Data Distillation and Machine Learning Aren't Enough
David M Smith
VP Marketing and Community, Revolution Analytics
Today, we'll discuss:
What is Data Science?
Why machine learning isn't enough
Why Data Science works
The Data Scientist's Toolkit
The Future of Big Data Analytics
Closing thoughts and resources
Image: Dov Harrington, CC BY 2.0, http://www.flickr.com/photos/idovermani/4110546683/
Where is it safe to fish near San Francisco?
San Francisco Estuary Institute: http://www.sfei.org/tools/wqt
Hurricane Sandy
Bob Rudis: http://rud.is/b/2012/10/28/watch-sandy-in-r-including-forecast-cone/
Hurricane Sandy
Ed Chen: http://blog.echen.me/hurricane-sandy-outages/
When did Michael Jackson have his biggest hits?
New York Times, June 25 2009 (3 hours after Michael Jackson's death): http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
Three Essential Skills of Data Scientists
Drew Conway: http://www.dataists.com/2010/09/the-data-science-venn-diagram/
[Venn diagram of three overlapping skill areas]
Data Integration, Mashups, Applications
Models, Visualization, Predictions, Uncertainty
Problems, Data Sources, Credibility
Center: Effective Data Applications
Image: Abode of Chaos, CC BY 2.0, http://www.flickr.com/photos/home_of_chaos/6418989233/
Machine learning (ML) for predictions
Building the Model: Features + Responses -> ML -> scoring rules
Validating the Model: Validation set -> scoring rules -> Predictions, compared with Response -> Accuracy
Scoring new data: New Data -> scoring rules -> Predictions (scores)
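The three stages on this slide (build, validate, score) can be sketched in a few lines. This is an illustrative Python sketch, not the R tooling the talk is about. The nearest-centroid method, the toy data, and every function name are assumptions chosen to keep the example self-contained.

```python
from statistics import mean

def build_model(features, responses):
    """Building the model: learn one centroid per response label.
    The centroids play the role of the slide's 'scoring rules'."""
    rows_by_label = {}
    for x, y in zip(features, responses):
        rows_by_label.setdefault(y, []).append(x)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in rows_by_label.items()}

def score(model, features):
    """Apply the scoring rules: predict the label of the nearest centroid."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(model, key=lambda label: dist2(model[label], x)) for x in features]

def accuracy(predictions, responses):
    """Validating the model: fraction of predictions matching known responses."""
    return sum(p == r for p, r in zip(predictions, responses)) / len(responses)

# Toy data, made up for illustration only.
train_x = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
train_y = ["low", "low", "high", "high"]
valid_x = [[1.1, 0.9], [2.9, 3.3]]
valid_y = ["low", "high"]
new_x = [[0.9, 1.1], [3.1, 3.0]]

model = build_model(train_x, train_y)                     # build
valid_accuracy = accuracy(score(model, valid_x), valid_y)  # validate
new_scores = score(model, new_x)                           # score new data
```

The validation set is kept apart from the training data so that the accuracy estimates how well the scoring rules handle data they have not seen.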
Problem: A lack of perspective
Image © 2010 David M Smith. Some rights reserved, CC BY 2.0
Problem: Lack of credibility
Problem: Complexity
Data Science to the Rescue!
Answer Unasked Questions
Revolutions blog, The Uncanny Valley of Big Data: http://blog.revolutionanalytics.com/2012/02/the-uncanny-valley-of-big-data.html
Fill in knowledge gaps
"More data beats better algorithms, every time" -- Google
"Companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue." -- Tim O'Reilly
Google Research, The Unreasonable Effectiveness of Data: http://googleresearch.blogspot.com/2009/03/unreasonable-effectiveness-of-data.html
Tim O'Reilly on Google+: https://plus.google.com/107033731246200681024/posts/4Xa76AtxYwd
TechnoCalifornia: http://technocalifornia.blogspot.com/2012/07/more-data-or-better-models.html
Avoid ineffective reactions
Stupid Data Miner Tricks: http://nerdsonwallstreet.typepad.com/my_weblog/files/dataminejune_2000.pdf
S&P500
Image: Henricks Photos, CC BY-ND 2.0, http://www.flickr.com/photos/hendricksphotos/3240667626/
0. Data (Big & Messy)
1. A language for programming with data
Download the White Paper
R is Hot: bit.ly/r-is-hot
http://info.revolutionanalytics.com/R-is-Hot-Whitepaper.html
Grant awards to homeless veterans, FY09
Data: Data.gov (http://explore.data.gov/National-Security-and-Veterans-Affairs/VA-Homeless-Grant-and-Per-Diem-FY09/2uzu-vjia)
Analysis: Drew Conway (http://www.drewconway.com/zia/?p=2486)
User-defined functions
Internet API interface
XML parsing
Custom graphics
Data import and pre-processing
Iterative data processing
2. Speed. Lots and lots of speed.
[Diagram: from Data to Predictions, through the steps below]
Variable Transformation
Model Estimation
Model Refinement
Model Comparison / Benchmarking
Feature Selection
Sampling
Aggregation
Use all available computing cycles
[Diagram: Multicore Processor (4, 8, 16+ cores)]
Core 0 (Thread 0), Core 1 (Thread 1), Core 2 (Thread 2), Core n (Thread n)
Shared Memory
Disk: Data, Data, Data
3. Algorithms that don't choke on Big Data
PEMAs: Parallel External-Memory Algorithms
[Diagram: BIG DATA split into four Data Partitions, each handled by a Compute Node, with one Master Node]
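A parallel external-memory algorithm reads the data one chunk at a time, reduces each chunk to a small intermediate result, and combines those results at the end. The sketch below illustrates that idea in Python. It is not Revolution's implementation. The function names, chunk size, and toy data are assumptions, and a thread pool stands in for the compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def read_chunks(values, chunk_size):
    """Yield successive chunks; stands in for reading blocks of a
    dataset that is too large to hold in memory at once."""
    it = iter(values)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def process_chunk(chunk):
    """Compute-node step: reduce one chunk to small intermediate results."""
    return len(chunk), sum(chunk), sum(v * v for v in chunk)

def combine(partials):
    """Master-node step: merge the intermediate results into final statistics."""
    n = total = total_sq = 0
    for count, s, sq in partials:
        n += count
        total += s
        total_sq += sq
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return n, mean, variance

data = range(1, 101)  # toy stand-in for a large on-disk dataset
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, read_chunks(data, chunk_size=10)))
n, mean, variance = combine(partials)
```

This works because each chunk's contribution (count, sum, sum of squares) can be added to the others in any order. For simplicity the sketch submits every chunk up front and uses a numerically naive variance formula. A production version would limit the number of chunks in flight and use a more stable update.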
Drink less coffee!
Single-threaded, non-optimized algorithms
Optimized, parallelized algorithms
4. Move code to data (not vice versa)
Map-Reduce
RHadoop: http://bit.ly/RHadoop
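The Map-Reduce pattern can be sketched in a single process as below. This Python example is an illustration only and does not use the RHadoop API. The function names and the toy records are assumptions. On a real cluster the map and reduce functions would be shipped to the nodes that hold each partition, which is what "move code to data" refers to.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    region, amount = record
    yield region, amount

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

# Toy partitions, made up for illustration; on a cluster each list
# would live on a different node.
partitions = [
    [("west", 10), ("east", 5)],
    [("west", 7), ("south", 3)],
]

mapped = [pair
          for partition in partitions
          for record in partition
          for pair in map_phase(record)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Only the small (key, value) pairs and the per-key totals need to travel between nodes. The raw records stay where they are stored.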
Big Data Appliances
More info: http://bit.ly/R-Netezza
Play Nice with Others
Presentation Layer: Business Intelligence Tools, Web-based data apps, Reporting / Spreadsheets
Analytics Layer: R
Data Layer: Relational datastores, Unstructured datastores
What every data scientist needs
Columns: Open-Source R, Revolution R Enterprise
Interface with multiple data sources
Exploratory data analysis
Wide range of statistical methods
High-speed computation
Big Data support
Data/code locality (Hadoop, etc.)
Print-quality data visualization
Scheduled batch production
Works in a multi-tool ecosystem
Integration into Data Apps
Revolution R Enterprise: Big-Data R
Columns: Open-Source R, Revolution R Enterprise
Interface with multiple data sources
Exploratory data analysis
Wide range of statistical methods
High-speed computation
Big Data support
Data/code locality (Hadoop, etc.)
Print-quality data visualization
Scheduled batch production
Works in a multi-tool ecosystem
Integration into Data Apps
www.revolutionanalytics.com/products
Image: www.tinyplanetphotography.com
And the future?
Even more data
Cloud computing
Demand for Data Scientists
Diverging paradigms for data analytics
http://www.indeed.com/jobtrends
Diverging data paradigms
[Diagram labels]
Hadoop, NoSQL, Files, Clusters, Data Appliances
More data, better fault tolerance
Easier programming, better performance
Exploration, Modeling, Storage, Preprocessing, Production
Data Science in Production
Real-time Big Data Analytics: From Deployment to Production
Thursday, November 29, 2012
10:00AM - 11:00AM Pacific Time
www.revolutionanalytics.com/news-events/free-webinars/
Building Data Science Teams
DJ Patil in O'Reilly Radar: http://oreil.ly/I3H5fI
Statistics and Data Science graduates
Kaggle and Chorus
Revolution Analytics R Training: http://www.revolutionanalytics.com/services/training/
Closing Thoughts
The Data Science process leads to more powerful and more useful models
Data Scientists need a technology platform to think about, explore, and model data
Revolution R Enterprise is R for Big Data
Resources
Revolution R Enterprise: R for Big Data
www.revolutionanalytics.com/products
RHadoop: Connecting R and Hadoop
bit.ly/r-hadoop
Contact David Smith
@revodavid
blog.revolutionanalytics.com
Thank you.
www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR
The leading commercial provider of software and support for the popular open source R statistics language.