Big-data, real-time R? Yes, you can! David Smith, Revolution Analytics, @revodavid

Big data real time R - useR! 2013 - David Smith

Page 1: Big data real time R - useR! 2013 - David Smith

Big-data, real-time R?

Yes, you can!

David Smith, Revolution Analytics

@revodavid

Page 2: Big data real time R - useR! 2013 - David Smith

REAL TIME

BIG DATA

PREDICTIVE ANALYTICS

Buzzword Bingo!

With R?

Page 3: Big data real time R - useR! 2013 - David Smith

Real-time Deployment

1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh

Page 4: Big data real time R - useR! 2013 - David Smith

Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0

“Big Data”

Page 5: Big data real time R - useR! 2013 - David Smith

1. Data Distillation in Hadoop

Unstructured Data
• Log Files
• Sensor Streams
• Language Text

HDFS Load, Map-Reduce (RHadoop rmr)

Analytics Data Mart: Structured Data
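The distillation step on this slide turns raw records into one structured row per entity. The block below is a minimal sketch of that map / shuffle / reduce shape in base R, run locally without Hadoop. The log format, field names, and function names are hypothetical, chosen only for illustration; on a cluster the same map and reduce functions would be handed to RHadoop's rmr package, whose API is not shown here.

```r
# Hypothetical raw log lines: "<user_id> <action> <amount>"
log_lines <- c(
  "u1 view 0",
  "u2 purchase 25",
  "u1 purchase 40",
  "u3 view 0",
  "u1 purchase 10"
)

# Map step: parse each line into a (key, value) pair keyed by user ID.
map_line <- function(line) {
  fields <- strsplit(line, " ", fixed = TRUE)[[1]]
  list(key = fields[1],
       value = data.frame(purchase = as.integer(fields[2] == "purchase"),
                          amount = as.numeric(fields[3])))
}
mapped <- lapply(log_lines, map_line)

# Shuffle step: group the values by key.
keys <- vapply(mapped, function(kv) kv$key, character(1))
grouped <- split(lapply(mapped, function(kv) kv$value), keys)

# Reduce step: collapse each user's events into one structured row.
reduce_user <- function(key, values) {
  v <- do.call(rbind, values)
  data.frame(user_id = key,
             events = nrow(v),
             purchases = sum(v$purchase),
             total_spend = sum(v$amount))
}
mart <- do.call(rbind, Map(reduce_user, names(grouped), grouped))
rownames(mart) <- NULL
print(mart)
```

The resulting data frame plays the role of the structured table in the analytics data mart: one row per user, ready for model development.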

Page 6: Big data real time R - useR! 2013 - David Smith

2. The Model Development Cycle

• Feature Selection, Sampling, Aggregation
• Variable Transformation
• Model Estimation
• Model Refinement
• Model Comparison / Benchmarking

Predictive Model

R White Paper: bit.ly/r-is-hot

Structured Data
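One pass through the development cycle can be sketched in base R as follows. The data are simulated, and the column names (spend, visits, bought) and the choice of logistic regression are assumptions made for illustration, not taken from the talk. The sketch samples a hold-out set, tries a variable transformation, estimates two candidate models, and benchmarks them on the held-out rows.

```r
set.seed(42)

# Simulated structured data (hypothetical columns, for illustration only).
n <- 5000
customers <- data.frame(
  spend  = rexp(n, rate = 1 / 100),   # skewed past spend
  visits = rpois(n, lambda = 3)
)
p <- plogis(-3 + 0.5 * log1p(customers$spend) + 0.2 * customers$visits)
customers$bought <- rbinom(n, size = 1, prob = p)

# Sampling: hold out 30% of the rows for benchmarking.
train_idx <- sample(n, size = round(0.7 * n))
train <- customers[train_idx, ]
test  <- customers[-train_idx, ]

# Variable transformation and model estimation: two candidate models.
m_raw <- glm(bought ~ spend + visits, data = train, family = binomial)
m_log <- glm(bought ~ log1p(spend) + visits, data = train, family = binomial)

# Model comparison: mean log-loss on the held-out rows (lower is better).
log_loss <- function(model, data) {
  p_hat <- predict(model, newdata = data, type = "response")
  -mean(data$bought * log(p_hat) + (1 - data$bought) * log(1 - p_hat))
}
benchmark <- c(raw = log_loss(m_raw, test), log = log_loss(m_log, test))
print(benchmark)
```

Refinement then repeats the loop: adjust features or transformations, re-estimate, and compare against the current best benchmark.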

Page 7: Big data real time R - useR! 2013 - David Smith

Big-Data Predictive Models with ScaleR

Page 8: Big data real time R - useR! 2013 - David Smith

3. Deployment Options

Unknown factors
• SQL / Rules Engine
• Code (C++, Java, R, Hadoop)
• PMML Engine

Factors known in advance
• Batch Lookup Tables

Factors

Scores
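When every factor combination is known in advance, scores can be computed in batch and served as a table lookup. The sketch below illustrates that option in base R. The coefficients, the two binary factors (returning, mobile), and the key format are all hypothetical.

```r
# Hypothetical fitted model, represented by its logistic coefficients.
coefs <- c(intercept = -2, returning = 1.0, mobile = -0.5)

# Batch step: enumerate every factor combination and score it once.
grid <- expand.grid(returning = c(0, 1), mobile = c(0, 1))
grid$score <- plogis(coefs[["intercept"]] +
                     coefs[["returning"]] * grid$returning +
                     coefs[["mobile"]] * grid$mobile)

# Store the scores in a named vector keyed by the factor combination.
lookup <- setNames(grid$score, paste(grid$returning, grid$mobile, sep = "|"))

# Real-time step: a score is a single keyed lookup, with no model evaluation.
score_lookup <- function(returning, mobile) {
  unname(lookup[paste(returning, mobile, sep = "|")])
}
print(score_lookup(returning = 1, mobile = 0))
```

The table grows with the product of the factor levels, which is why this option suits a small, fixed set of factors; unknown or continuous factors call for one of the engine-based options instead.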

Page 9: Big data real time R - useR! 2013 - David Smith

4. Real-Time Scoring

Factors (any known information)
• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data

Predictive Model
• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model

Scoring Rules

Scores (prediction or selection)
• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid

Photo: "IO VAPOURA" by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
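The flow on this slide (factors in, model scores, scoring rule, selection out) can be sketched for the "offer of most likely sale" case. Everything named here is hypothetical: the two offers, their coefficients, the factor names, and the minimum-score threshold. Logistic models are used only because they are easy to write out by hand.

```r
# Hypothetical per-offer logistic models, stored as coefficient vectors.
offer_models <- list(
  blender = c(intercept = -3.0, prev_purchases = 0.4, evening = 0.5),
  toaster = c(intercept = -2.0, prev_purchases = 0.1, evening = -0.2)
)

# Predictive model: probability of a sale for one offer given known factors.
score_offer <- function(coefs, factors) {
  eta <- coefs[["intercept"]] +
    coefs[["prev_purchases"]] * factors$prev_purchases +
    coefs[["evening"]] * factors$evening
  plogis(eta)
}

# Scoring rule: choose the offer with the highest predicted probability,
# falling back to no offer when even the best score is below a threshold.
select_offer <- function(factors, min_score = 0.05) {
  scores <- vapply(offer_models, score_offer, numeric(1), factors = factors)
  best <- names(scores)[which.max(scores)]
  if (scores[[best]] < min_score) best <- NA_character_
  list(offer = best, scores = scores)
}

# One incoming request carrying whatever is known about the user.
request <- list(user_id = "u1", prev_purchases = 5, evening = 1)
result <- select_offer(request)
print(result$offer)
```

Keeping the scoring rule separate from the model means the business logic (thresholds, fallbacks) can change without re-estimating anything.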

Page 10: Big data real time R - useR! 2013 - David Smith

5. Model Refresh

Factors

Scores

Actual Outcomes
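Model refresh closes the loop by comparing delivered scores with actual outcomes and re-estimating when they diverge. The sketch below simulates a deliberately exaggerated drift (the effect of x flips sign) so the difference is visible. The data, the Brier score as the monitoring metric, and the single-predictor model are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical history used to fit the deployed model.
history <- data.frame(x = rnorm(2000))
history$y <- rbinom(2000, 1, plogis(-1 + 1.0 * history$x))
model <- glm(y ~ x, data = history, family = binomial)

# Recent scored traffic with actual outcomes; the relationship has drifted.
recent <- data.frame(x = rnorm(2000))
recent$y <- rbinom(2000, 1, plogis(-1 - 1.0 * recent$x))
recent$score <- predict(model, newdata = recent, type = "response")

# Compare scores with actual outcomes (Brier score: mean squared error).
brier <- function(score, outcome) mean((score - outcome)^2)
before <- brier(recent$score, recent$y)

# Refresh: re-estimate the same model formula on the recent data.
refreshed <- update(model, data = recent)
after <- brier(predict(refreshed, newdata = recent, type = "response"),
               recent$y)
print(c(before = before, after = after))
```

Here the refreshed model is evaluated on the same rows it was fitted to, which flatters it; a production refresh would validate on held-out outcomes before redeploying.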

Page 11: Big data real time R - useR! 2013 - David Smith

[Chart: Big Data (Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes) vs. Real Time (Milliseconds, Seconds, Minutes, Hours)]

Page 12: Big data real time R - useR! 2013 - David Smith

Real-World Examples

Revolution Analytics Case Studies

Page 13: Big data real time R - useR! 2013 - David Smith

Why did I buy that blender?

• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog

Page 14: Big data real time R - useR! 2013 - David Smith

UpStream: Attribution Modeling

Page 15: Big data real time R - useR! 2013 - David Smith

• ETL

• Marketing channel data

• Behavioral variables

• Promotional data

• Overlay data

• Exploratory data analysis

• Time-to-event models

• GAM survival models

• Scoring for inference

• Scoring for prediction

• 5 billion scores per day per retailer

UPSTREAM DATA FORMAT

CUSTOM VARIABLES (PMML)
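To put "5 billion scores per day per retailer" on a real-time footing, the daily volume can be converted into a sustained rate. This back-of-envelope calculation assumes load spread evenly over 24 hours, which the slide does not state; peak rates would be higher.

```r
scores_per_day  <- 5e9
seconds_per_day <- 24 * 60 * 60            # 86,400 seconds

# Average sustained rate implied by the daily volume.
scores_per_second <- scores_per_day / seconds_per_day
print(round(scores_per_second))            # 57870

# Time budget per score if a single scorer handled them one at a time.
microseconds_per_score <- 1e6 / scores_per_second
print(round(microseconds_per_score, 1))    # 17.3
```

At roughly 58,000 scores per second on average, per-score work has to be very cheap or spread across many scorers in parallel.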

Page 16: Big data real time R - useR! 2013 - David Smith

ACI

• Top-20 mutual fund company
• $125B assets
• Research and data-driven
• Innovative

Page 17: Big data real time R - useR! 2013 - David Smith

• Collaboration
• Speed
• Deployment Process
• Adoption
• Results

Analytics Function Library

rACI Package (w/ RevoR)

Model Building Function Library

Data Acquisition Function Library

Portfolio Optimization and Simulation API

Market Data from Thomson Reuters (QA-Direct)

American Century Quant Proprietary Data

Additional 3rd Party Data Vendors

Live Analytics

PRODUCTION MODEL GENERATION AND TRADING PROCESSES

Data Feeds

Page 18: Big data real time R - useR! 2013 - David Smith

PREDICTIVE ANALYTICS

BIG DATA

REAL TIME

Yes You Can!

Page 19: Big data real time R - useR! 2013 - David Smith

www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR

The leading enterprise provider of software and services for Open Source R

Big-Data, Real-Time R? Yes, you can!

David Smith, @revodavid