Real-Time Big Data Analytics: From Deployment to Production. David Smith, Revolution Analytics, @revodavid

1

Real-Time Big Data Analytics

From Deployment to Production

David Smith

Revolution Analytics

@revodavid

2

WHAT’S UP

WITH THAT?

3

REAL TIME

BIG DATA

PREDICTIVE ANALYTICS

Buzzword Bingo!

4

Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0

5

Predictive Analytics Model

Factors

Scores

“IO VAPOURA” by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0

Predictive Model

• Decision Tree

• Logistic Regression

• Neural Network

• K-means clustering

• Ensemble Model

Any known information

• User ID

• Browser

• Time/Date / Location

• Previous purchases

• Friend data

Prediction or Selection

• Product of most interest

• Offer of most likely sale

• Most relevant link

• Forecast sale value

• Optimal Bid

Scoring Rules
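The flow on this slide (factors in, scoring rules applied, score or selection out) can be sketched in a few lines. This is a minimal illustration, assuming a logistic regression as the predictive model; the coefficient values and the factor names `previous_purchases`, `is_mobile_browser`, and `friends_who_bought` are hypothetical and do not come from the talk.

```python
import math

# Hypothetical coefficients; a real model would be estimated from data.
INTERCEPT = -2.0
WEIGHTS = {
    "previous_purchases": 0.8,
    "is_mobile_browser": -0.3,
    "friends_who_bought": 0.5,
}


def score(factors):
    """Prediction: turn one set of known factors into a probability of sale."""
    linear = INTERCEPT + sum(
        weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def best_offer(factors_by_offer):
    """Selection: pick the offer whose factors give the highest score."""
    return max(factors_by_offer, key=lambda offer: score(factors_by_offer[offer]))
```

Missing factors default to zero here, which is one simple way to handle "any known information"; a production scoring rule would need an explicit policy for missing values.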

"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0

6

Real-time Deployment

1. Data distillation

2. Model development and validation

3. Model deployment

4. Real-time model scoring

5. Model refresh

7

1. Data Distillation in Hadoop

Unstructured Data

• Log Files

• Sensor Streams

• Language Text

HDFS Load

Map-Reduce

rmr

Structured Data

Analytics Data Mart
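As a rough illustration of the distillation step, the sketch below runs a map step, a group-by-key shuffle, and a reduce step in memory to turn raw log lines into structured per-user rows. It is plain Python rather than Hadoop or rmr, and the log format (`<timestamp> <user_id> <event> <amount>`) and output field names are assumptions made for the example.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: parse one raw log line into (user_id, amount) pairs."""
    # Assumed format: "<timestamp> <user_id> <event> <amount>"
    _timestamp, user_id, event, amount = line.split()
    if event == "purchase":
        yield user_id, float(amount)


def reduce_user(user_id, amounts):
    """Reduce step: collapse all values for one key into a structured row."""
    return {
        "user_id": user_id,
        "purchases": len(amounts),
        "total_spent": sum(amounts),
    }


def distill(lines):
    """Run map, shuffle (group by key), and reduce in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_log_line(line):
            grouped[key].append(value)
    return [reduce_user(key, values) for key, values in sorted(grouped.items())]
```

On a cluster the same two functions would run in parallel over HDFS blocks, with the framework handling the shuffle; the structured rows are what would land in the analytics data mart.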

8

2. The Model Development Cycle

Feature Selection

Sampling

Aggregation

Variable Transformation

Model Estimation

Model Refinement

Model Comparison / Benchmarking

Structured Data

Predictive Model

R White Paper: bit.ly/r-is-hot
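Several stages of the cycle (sampling, variable transformation, model estimation, and benchmarking) can be sketched with the standard library alone. This is an illustrative toy, not the workflow from the talk: the `spend` and `bought` fields are hypothetical, and the "model" is a single cutoff on log-transformed spend.

```python
import math
import random


def split_sample(rows, holdout_fraction=0.3, seed=42):
    """Sampling: shuffle reproducibly and hold out a fraction for benchmarking."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]


def transform(row):
    """Variable transformation: log-scale a skewed spend variable."""
    return {"log_spend": math.log1p(row["spend"]), "bought": row["bought"]}


def accuracy(predict, rows):
    """Benchmarking metric: share of rows where prediction matches outcome."""
    return sum(predict(r) == r["bought"] for r in rows) / len(rows)


def fit_threshold(train):
    """Model estimation: pick the log_spend cutoff with the best training accuracy."""
    candidates = sorted(r["log_spend"] for r in train)
    return max(
        candidates,
        key=lambda cutoff: accuracy(lambda r: r["log_spend"] >= cutoff, train),
    )
```

A typical use is to fit the cutoff on the training sample, then compare `accuracy` on the holdout sample against a naive baseline such as "always predict a purchase"; the candidate that wins on held-out data is the one carried forward to deployment.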

9

3. Deployment Options

Unknown factors

• SQL / Rules Engine

• Code (C++, Java, R, Hadoop)

• PMML Engine

Factors known in advance

• Batch Lookup Tables

Factors

Scores
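The split between the two deployment styles can be sketched as follows: when the factors are known in advance, scores are precomputed into a lookup table; when they are not, the model has to be evaluated at request time. The `model` function and its factor names are hypothetical stand-ins, not a real scoring engine.

```python
def model(factors):
    """Stand-in scoring rule (hypothetical): a weighted sum of two factors."""
    return 0.1 * factors["previous_purchases"] + 0.5 * factors["is_returning"]


def build_lookup_table(known_customers):
    """Batch deployment: precompute a score for every customer known in advance."""
    return {cid: model(factors) for cid, factors in known_customers.items()}


def get_score(customer_id, lookup_table, live_factors=None):
    """Serve from the batch table when possible, otherwise score on demand."""
    if customer_id in lookup_table:
        return lookup_table[customer_id]
    if live_factors is None:
        raise KeyError(f"no precomputed score or live factors for {customer_id!r}")
    return model(live_factors)
```

The lookup path costs one dictionary access per request, which is why it suits tight latency budgets; the on-demand path is the only option when the factors (for example, the current page or location) are not known until the request arrives.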

10

Why did I buy that blender?

• Just browsing in the mall

• TV ad / magazine ad

• Coupon in the mail

• “Just moved” promo email

• Webstore recommendation

• Browsing catalog

11

UpStream: Attribution Modeling

• ETL

• Marketing channel data

• Behavioral variables

• Promotional data

• Overlay data

• Exploratory data analysis

• Time-to-event models

• GAM survival models

• Scoring for inference

• Scoring for prediction

• 5 billion scores per day per retailer
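To put the 5 billion scores per day figure in perspective, a quick back-of-the-envelope calculation converts it to an average rate. It assumes load is spread evenly across the day, which real traffic rarely is, so peak rates would be higher.

```python
SCORES_PER_DAY = 5_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average throughput if scoring were spread evenly over 24 hours.
scores_per_second = SCORES_PER_DAY / SECONDS_PER_DAY

# Time available per score if a single worker handled everything serially.
microseconds_per_score = 1_000_000 / scores_per_second

print(f"{scores_per_second:,.0f} scores per second on average")
print(f"{microseconds_per_score:.2f} microseconds per score on one worker")
```

That works out to roughly 58,000 scores per second per retailer, or about 17 microseconds per score for a single serial worker, which suggests why scoring at this volume has to be parallelized or precomputed.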

UPSTREAM DATA FORMAT

CUSTOM VARIABLES (PMML)

4. Model Scoring

13

5. Model Refresh

Factors

Scores

Actual Outcomes
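One way to close the loop between scores and actual outcomes is to track a prediction-error metric and flag the model for re-estimation when it drifts. The sketch below uses the Brier score for this; the choice of metric and the `tolerance` value are assumptions for illustration, not something the talk specifies.

```python
def brier_score(scores, outcomes):
    """Mean squared gap between predicted probabilities and 0/1 outcomes."""
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)


def needs_refresh(scores, outcomes, baseline_brier, tolerance=0.05):
    """Flag the model for re-estimation once its error drifts past the baseline."""
    return brier_score(scores, outcomes) > baseline_brier + tolerance
```

Here `baseline_brier` would be the error measured at validation time; once recent production scores miss actual outcomes by more than the tolerance, the model is re-estimated on fresh data and redeployed.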

14

Big Data

Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes

Real Time

Milliseconds, Seconds, Minutes, Hours

15

PREDICTIVE ANALYTICS

BIG DATA

REAL TIME

WHAT’S UP WITH THAT?

16

www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR

The leading enterprise provider of software and services for Open Source R

Real-Time Big Data Predictive Analytics: From Deployment to Production

Booth 618 / Office Hours Weds 1:30PM

David Smith

@revodavid