Real-Time Big Data Analytics
From Deployment to Production
David Smith, Revolution Analytics
@revodavid
Predictive Analytics Model
Factors (any known information):
• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data
Predictive Model (scoring rules):
• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model
Scores (prediction or selection):
• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid
"IO VAPOURA" by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0
"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0
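The factors-to-scores idea on the Predictive Analytics Model slide can be sketched with one of the listed model types, a logistic regression used as a scoring rule. The factor names, the intercept, and the weights below are hypothetical values chosen for illustration; a real model would be estimated from data.

```python
import math

# Hypothetical coefficients; a real model would be estimated from data.
INTERCEPT = -2.0
WEIGHTS = {
    "previous_purchases": 0.8,
    "is_mobile_browser": -0.3,
    "friends_who_bought": 0.5,
}

def score(factors):
    """Return a purchase probability in (0, 1) for one dict of factors."""
    z = INTERCEPT
    for name, weight in WEIGHTS.items():
        z += weight * factors.get(name, 0.0)  # missing factors count as 0
    return 1.0 / (1.0 + math.exp(-z))

# One hypothetical visitor described by the factors known at request time.
visitor = {"previous_purchases": 2, "is_mobile_browser": 1, "friends_who_bought": 1}
print(round(score(visitor), 3))
```

The scoring rule is only a weighted sum and one exponential, which is why scoring can be fast even when estimating the model is not.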
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh
1. Data Distillation in Hadoop
Unstructured Data:
• Log Files
• Sensor Streams
• Language Text
HDFS Load
Map-Reduce
rmr
Structured Data
Analytics Data Mart
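As a rough illustration of distilling unstructured logs into structured rows, the map and reduce steps can be imitated in plain Python. This is a single-process sketch, not Hadoop or rmr code; the log format (`timestamp user_id action`) and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw log lines in the form "timestamp user_id action".
LOG_LINES = [
    "2012-10-01T10:00:00 u1 view",
    "2012-10-01T10:00:05 u2 view",
    "2012-10-01T10:01:00 u1 purchase",
    "malformed line",
    "2012-10-01T10:02:00 u1 view",
]

def map_line(line):
    """Map step: emit ((user_id, action), 1) pairs, skipping bad lines."""
    parts = line.split()
    if len(parts) != 3:
        return []
    _, user_id, action = parts
    return [((user_id, action), 1)]

def reduce_counts(pairs):
    """Reduce step: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def distill(lines):
    """Run map then reduce, and pivot the counts into one row per user."""
    pairs = [pair for line in lines for pair in map_line(line)]
    rows = defaultdict(dict)
    for (user_id, action), count in reduce_counts(pairs).items():
        rows[user_id][action] = count
    return dict(rows)

table = distill(LOG_LINES)
print(table)
```

Each resulting row holds per-user counts, the kind of structured record a data mart could store for later modeling.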
2. The Model Development Cycle
• Feature Selection, Sampling, Aggregation
• Variable Transformation
• Model Estimation
• Model Refinement
• Model Comparison / Benchmarking
Structured Data → Predictive Model
R White Paper: bit.ly/r-is-hot
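One pass through the development cycle can be sketched with the standard library alone: sample a holdout set, transform a variable, estimate a model, and benchmark it against a baseline. The synthetic data, the log transformation, and the one-variable least-squares model are all assumptions chosen to keep the example small; they are not the models used in the talk.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical structured data: (raw_spend, outcome) pairs.
data = []
for _ in range(200):
    spend = random.uniform(1.0, 1000.0)
    outcome = 2.0 * math.log(spend) + random.gauss(0.0, 0.5)
    data.append((spend, outcome))

# Sampling: hold out a quarter of the rows for benchmarking.
random.shuffle(data)
train, holdout = data[:150], data[150:]

def transform(spend):
    """Variable transformation: put the skewed spend values on a log scale."""
    return math.log(spend)

def fit_line(rows):
    """Model estimation: least-squares slope and intercept on one feature."""
    xs = [transform(s) for s, _ in rows]
    ys = [y for _, y in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(predict, rows):
    """Mean squared error of a prediction function on a set of rows."""
    return sum((predict(s) - y) ** 2 for s, y in rows) / len(rows)

slope, intercept = fit_line(train)
mean_outcome = sum(y for _, y in train) / len(train)

# Model comparison: candidate model against a constant baseline.
candidate_mse = mse(lambda s: slope * transform(s) + intercept, holdout)
baseline_mse = mse(lambda s: mean_outcome, holdout)
print(candidate_mse < baseline_mse)
```

Refinement would repeat these steps with different features or model forms, keeping whichever candidate benchmarks best on the holdout rows.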
3. Deployment Options
Unknown factors:
• SQL / Rules Engine
• Code (C++, Java, R, Hadoop)
• PMML Engine
Factors known in advance:
• Batch Lookup Tables
Factors → Scores
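The two deployment paths can be contrasted in a few lines: when every factor combination is known in advance, scores can be precomputed in batch and served from a lookup table; otherwise the scoring rule has to run at request time. The scoring rule, segment names, and function names below are hypothetical stand-ins for a deployed model.

```python
# Hypothetical scoring rule; stands in for any deployed model.
def score(segment, hour_of_day):
    base = {"new": 0.1, "returning": 0.3, "loyal": 0.6}[segment]
    evening_boost = 0.1 if 18 <= hour_of_day <= 22 else 0.0
    return round(base + evening_boost, 2)

# Factors known in advance: precompute every combination in a batch job.
SEGMENTS = ["new", "returning", "loyal"]
LOOKUP = {(seg, hour): score(seg, hour) for seg in SEGMENTS for hour in range(24)}

def score_from_table(segment, hour_of_day):
    """Real-time path when factors are enumerable: a dictionary lookup."""
    return LOOKUP[(segment, hour_of_day)]

def score_on_the_fly(segment, hour_of_day):
    """Real-time path when factors are not known in advance: run the rule."""
    return score(segment, hour_of_day)

print(score_from_table("loyal", 20), score_on_the_fly("loyal", 20))
```

The table trades storage and batch time for a constant-time lookup, which only works while the set of factor combinations stays small enough to enumerate.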
Why did I buy that blender?
• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog
• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data
• Exploratory data analysis
• Time-to-event models
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per retailer
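To put the 5 billion scores per day figure in perspective, it can be converted to an average per-second rate. The per-score latency below is a hypothetical number used only to show how such a rate translates into concurrent work; it is not from the talk.

```python
SCORES_PER_DAY = 5_000_000_000  # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60

average_rate = SCORES_PER_DAY / SECONDS_PER_DAY  # scores per second

# Hypothetical per-score latency, used only to size the worker pool.
ASSUMED_SECONDS_PER_SCORE = 0.001
workers_needed = average_rate * ASSUMED_SECONDS_PER_SCORE

print(f"{average_rate:,.0f} scores/sec on average")
print(f"about {workers_needed:.0f} busy workers at 1 ms per score")
```

The average works out to roughly 58,000 scores per second per retailer; peak traffic would be higher than the average.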
UPSTREAM DATA FORMAT
CUSTOM VARIABLES (PMML)
4. Model Scoring
Big Data: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes
Real Time: Milliseconds, Seconds, Minutes, Hours
www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR
The leading enterprise provider of software and services for Open Source R
Real-Time Big Data Predictive Analytics: From Deployment to Production
Booth 618 / Office Hours Weds 1:30PM
David Smith, @revodavid