1
Big-data, real-time R?
Yes, you can!
David Smith, Revolution Analytics
@revodavid
2
REAL TIME
BIG DATA
PREDICTIVE ANALYTICS
Buzzword Bingo!
With R?
3
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh
4
Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0
“Big Data”
5
1. Data Distillation in Hadoop
Unstructured Data (Log Files, Sensor Streams, Language Text) → HDFS Load → Map-Reduce (RHadoop, rmr) → Structured Data → Analytics Data Mart
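The distillation step on this slide can be sketched in plain Python rather than RHadoop's rmr: a map step parses raw log lines into key/value pairs, and a reduce step aggregates them into one structured row per key. The log format, the field names, and the functions `map_line`, `reduce_pairs`, and `distill` are hypothetical, chosen only to illustrate the idea; a real job would run distributed over HDFS.

```python
from collections import defaultdict

# Hypothetical raw log lines in the form "<user_id> <action> <amount>".
RAW_LOG = [
    "u1 view 0",
    "u2 purchase 30",
    "u1 purchase 20",
    "malformed line",
    "u1 purchase 5",
]

def map_line(line):
    """Map step: emit (user_id, (purchase_count, amount)), or nothing."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return []  # drop lines that do not parse
    user, action, amount = parts
    if action != "purchase":
        return [(user, (0, 0))]  # keep the user, but count no purchase
    return [(user, (1, int(amount)))]

def reduce_pairs(pairs):
    """Reduce step: sum counts and amounts per user into structured rows."""
    totals = defaultdict(lambda: [0, 0])
    for user, (count, amount) in pairs:
        totals[user][0] += count
        totals[user][1] += amount
    return [
        {"user_id": user, "purchases": count, "total_spend": spend}
        for user, (count, spend) in sorted(totals.items())
    ]

def distill(lines):
    """Run map over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return reduce_pairs(pairs)

rows = distill(RAW_LOG)
for row in rows:
    print(row)
```

Each output row is the kind of record that could be loaded into an analytics data mart for the modeling step.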
6
2. The Model Development Cycle
• Feature Selection
• Sampling
• Aggregation
• Variable Transformation
• Model Estimation
• Model Refinement
• Model Comparison / Benchmarking
Predictive Model
R White Paper: bit.ly/r-is-hot
Structured Data
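One pass through a few of the steps in this cycle (sampling, variable transformation, model estimation, and model comparison) might look like the following minimal sketch. It uses plain Python and synthetic data rather than R or ScaleR; the data-generating process, the seeds, and every function name here are assumptions made for illustration.

```python
import math
import random

def make_data(n=200, seed=0):
    """Synthetic data (an assumption): y is linear in log(x) plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(1.0, 100.0)
        y = 3.0 * math.log(x) + 2.0 + rng.gauss(0.0, 0.5)
        data.append((x, y))
    return data

def split(data, holdout_fraction=0.25, seed=1):
    """Sampling: shuffle, then split into training and holdout sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_fraction))
    return rows[:cut], rows[cut:]

def fit_line(xs, ys):
    """Model estimation: ordinary least squares for y = a + b * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(predict, rows):
    """Benchmarking metric: root mean squared error over (x, y) rows."""
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in rows) / len(rows))

train, holdout = split(make_data())
train_y = [y for _, y in train]

# Candidate 1 uses the raw variable; candidate 2 uses its log transform.
a_raw, b_raw = fit_line([x for x, _ in train], train_y)
a_log, b_log = fit_line([math.log(x) for x, _ in train], train_y)

rmse_raw = rmse(lambda x: a_raw + b_raw * x, holdout)
rmse_log = rmse(lambda x: a_log + b_log * math.log(x), holdout)

# Model comparison: keep whichever candidate does better on the holdout set.
best = "log" if rmse_log < rmse_raw else "raw"
print(f"raw RMSE={rmse_raw:.3f}  log RMSE={rmse_log:.3f}  best={best}")
```

Comparing candidates on held-out rows rather than on the training rows is what makes the benchmark meaningful; in practice the loop repeats with new features and transformations until the comparison stops improving.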
7
Big-Data Predictive Models with ScaleR
8
3. Deployment Options
Unknown factors:
• SQL / Rules Engine
• Code (C++, Java, R, Hadoop)
• PMML Engine
Factors known in advance:
• Batch Lookup Tables
Factors
Scores
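The split between factors known in advance and unknown factors can be illustrated with a small sketch: known customers are scored once in a batch job and served from a lookup table, while anyone else is scored by running the model code on demand. The scoring rule, the customer records, and the names `score`, `LOOKUP`, and `serve` are all hypothetical.

```python
def score(visits, basket_value):
    """Hypothetical scoring rule standing in for a deployed model."""
    return round(0.1 * visits + 0.01 * basket_value, 2)

# Factors known in advance: customers already in the data mart.
KNOWN_CUSTOMERS = {"c1": (3, 40.0), "c2": (10, 5.0)}

# Batch job: score every known customer once and store the results.
LOOKUP = {cid: score(*factors) for cid, factors in KNOWN_CUSTOMERS.items()}

def serve(customer_id, visits=None, basket_value=None):
    """Real-time path: use the precomputed score when one exists,
    otherwise run the scoring code on the factors supplied by the caller."""
    if customer_id in LOOKUP:
        return LOOKUP[customer_id]
    return score(visits, basket_value)

print(serve("c1"))                                # served from the lookup table
print(serve("c9", visits=2, basket_value=100.0))  # scored on demand
```

The lookup path costs one dictionary access per request, which is why batch tables are attractive whenever the full set of factor combinations is small enough to enumerate ahead of time.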
9
4. Real-Time Scoring
Factors
Scores
Photo: “IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
Any known information:
• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data
Predictive Model:
• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model
Prediction or Selection:
• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid
Scoring Rules
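As a rough illustration of turning factors into a selection, the sketch below scores one visitor against a logistic regression model per offer and returns the offer with the highest predicted probability of a sale. The offers, the coefficients, and the factor names are invented for the example; in practice the coefficients would come from the model development step.

```python
import math

# Hypothetical per-offer coefficients, as if estimated offline.
OFFER_MODELS = {
    "blender": {"intercept": -2.0, "previous_purchases": 0.8, "is_mobile": -0.5},
    "toaster": {"intercept": -1.0, "previous_purchases": 0.1, "is_mobile": 0.4},
}

def sale_probability(coefficients, factors):
    """Logistic regression score: 1 / (1 + exp(-z)), z = b0 + sum(b_i * x_i)."""
    z = coefficients["intercept"]
    for name, value in factors.items():
        # Factors the model has no coefficient for contribute nothing.
        z += coefficients.get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))

def offer_of_most_likely_sale(factors):
    """Score every offer for this visitor and select the most likely sale."""
    scores = {
        offer: sale_probability(coefficients, factors)
        for offer, coefficients in OFFER_MODELS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

repeat_buyer = {"previous_purchases": 3, "is_mobile": 0}
new_mobile_visitor = {"previous_purchases": 0, "is_mobile": 1}

print(offer_of_most_likely_sale(repeat_buyer))
print(offer_of_most_likely_sale(new_mobile_visitor))
```

Scoring a fitted model is only a handful of multiplications and one exponential per offer, so it fits comfortably inside a millisecond-scale request even though estimating the model may have taken hours.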
10
5. Model Refresh
Factors
Scores
Actual Outcomes
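One simple way to decide when a refresh is due is to compare recent scores with the actual outcomes that followed and flag the model when its error drifts past what was seen at validation time. The sketch below uses the Brier score for that comparison; the metric, the baseline, and the tolerance are assumptions for illustration, not something the slides specify.

```python
def brier_score(scores, outcomes):
    """Mean squared gap between predicted probabilities and 0/1 outcomes."""
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)

def needs_refresh(scores, outcomes, baseline, tolerance=0.05):
    """Flag a refresh when live error exceeds the validation-time error
    (`baseline`) by more than `tolerance`. Both values are illustrative."""
    return brier_score(scores, outcomes) > baseline + tolerance

recent_scores = [0.9, 0.2, 0.8, 0.1]
outcomes_as_expected = [1, 0, 1, 0]  # behavior still matches the model
outcomes_after_drift = [0, 1, 0, 0]  # behavior has changed

print(needs_refresh(recent_scores, outcomes_as_expected, baseline=0.03))
print(needs_refresh(recent_scores, outcomes_after_drift, baseline=0.03))
```

When the flag is raised, the logged factors and actual outcomes become new training data for the next pass through the model development cycle.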
11
[Chart: Big Data and Real Time. Data volume labels: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes. Latency labels: Milliseconds, Seconds, Minutes, Hours.]
12
Real-World Examples
Revolution Analytics Case Studies
13
Why did I buy that blender?
• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog
14
UpStream: Attribution Modeling
• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data
• Exploratory data analysis
• Time-to-event models
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per retailer
UPSTREAM DATA FORMAT
CUSTOM VARIABLES (PMML)
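To put the figure of 5 billion scores per day per retailer in perspective, the arithmetic below converts it to an average per-second rate. It assumes load is spread evenly over 24 hours, which real traffic will not be, so peak rates would be higher.

```python
SCORES_PER_DAY = 5_000_000_000   # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Average throughput if scoring were spread evenly across the day.
scores_per_second = SCORES_PER_DAY / SECONDS_PER_DAY

# Time budget per score if a single scorer handled them one after another.
microseconds_per_score = 1_000_000 / scores_per_second

print(f"{scores_per_second:,.0f} scores per second")        # 57,870 scores per second
print(f"{microseconds_per_score:.2f} microseconds per score")  # 17.28 microseconds per score
```

A budget of roughly 17 microseconds per score on one serial scorer is the reason this volume calls for precomputed or parallel scoring rather than a single model instance.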
16
ACI
• Top-20 mutual fund company
• $125B assets
• Research and data-driven
• Innovative
17
• Collaboration
• Speed
• Deployment Process
• Adoption
• Results
Analytics Function Library
rACI Package (w/ RevoR)
Model Building Function Library
Data Acquisition Function Library
Portfolio Optimization and Simulation API
Market Data from Thomson Reuters (QA-Direct)
American Century Quant Proprietary Data
Additional 3rd Party Data Vendors
Live Analytics
PRODUCTION MODEL GENERATION AND TRADING PROCESSES
Data Feeds
18
PREDICTIVE ANALYTICS
BIG DATA
REAL TIME
Yes You Can!
19
www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR
The leading enterprise provider of software and services for Open Source R
Big-Data, Real-Time R?
Yes, you can!
David Smith, @revodavid