Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Massive Predictive Modeling using Oracle R Technologies Mark Hornick, Director, Oracle Advanced Analytics

Massive Predictive Modeling - Oracle Cloud · Title: How to Use the PowerPoint Template Author: marcos arancibia Keywords: Oracle corporate Tagline Created Date: 7/5/2014 1:10:30

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Agenda

1. Massive Predictive Modeling

2. Use cases

3. Enabling technologies

Quick Survey: How many models have you built in your lifetime?

> 10

> 100

> 1,000

> 10,000

> 100,000

> 1,000,000

Massive Predictive Modeling

[Figure: number of models versus data size (rows), with tick labels 1, 100s, millions, and billions, and the labels “Specialized” and “Generalized”.]

[Figure: number of models versus data size (rows), with tick labels 1, 100s, millions, and billions, plus an axis for number of models per entity (1 to 1000s); labels “Broad coverage” and “Targeted”.]

Massive Predictive Modeling - Goals

• Build one or more models per entity, e.g., customer

• Understand and/or predict entity behavior

• Aggregate results across entities, e.g., to assess future demand
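
One way to write the aggregation in the last bullet, as a sketch only: assume customer c has a fitted model f_c and inputs x_c(t) at time t (both symbols are notation introduced here, not taken from the slides). Predicted aggregate demand is then the sum of the per-customer predictions:

```latex
\hat{D}(t) = \sum_{c=1}^{n} \hat{y}_c(t), \qquad \hat{y}_c(t) = f_c\bigl(x_c(t)\bigr)
```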

[Figure: one model per customer; the per-customer predictions are summed over cust = 1, …, n to give demand over time.]

Massive Predictive Modeling - Challenges

• Effectively dealing with “Big Data” – Hardware, software, network, storage

• Algorithms that scale and perform with Big Data

• Building “many” models in parallel

• Production deployment

• Storing and managing models

• Backup, recovery, and security

Use Cases

Predicting Customer Electricity Usage

Motivation: Energy Theft

Detecting patterns of meter tampering

• SA country loses US$4 billion per year due to energy theft

• Storage of information about which meters have been tampered with

• Analysis and decision making

• Forecast future behavior

Motivation: Different customers, different demands

• Each customer has different demand and consumption patterns

• Storage of information about the consumption of each customer in different periods of the day

• Creation of a demand and consumption curve for each customer

• Analysis: in which period will the company have to deliver more energy?

• Price electricity in a given period

• Customer decides when to use energy to reduce cost

• Company redirects the energy to where it is most needed at the moment, saving on generation

Sensor Data Analysis

• Model each customer’s usage to understand behavior and predict individual usage and overall aggregate demand

• Consider 200K customers, each with a utility “smart meter”

• 1 reading / meter / hour

• 200K × 8760 hours / year → 1.752B readings per year

• 3 years' worth of data → 5.256B readings

• 26,280 readings per customer

• 10 seconds to build each model → 555.6 hours (23.2 days) if built serially

• …with 128 DOP (degree of parallelism) → 4.3 hours
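
The arithmetic behind these bullets, written out as a check. The last line assumes ideal linear speedup at a degree of parallelism of 128 with no scheduling or I/O overhead, which is an assumption made here and not a measured result:

```latex
\begin{aligned}
200{,}000 \times 8760 &= 1.752 \times 10^{9} \ \text{readings per year} \\
1.752 \times 10^{9} \times 3 &= 5.256 \times 10^{9} \ \text{readings} \\
8760 \times 3 &= 26{,}280 \ \text{readings per customer} \\
200{,}000 \times 10\ \text{s} &= 2 \times 10^{6}\ \text{s} \approx 555.6\ \text{h} \approx 23.15\ \text{days} \\
555.6\ \text{h} \,/\, 128 &\approx 4.3\ \text{h}
\end{aligned}
```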

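Database-centric architecture: smart meter scenario (model build)

[Figure: inside Oracle Database, the data is partitioned by customer (c1, c2, …, ci, …, cn). The “build model” R script, a function f(dat,args,…) { } from the R Script Repository, is invoked once per partition, and the resulting models (Model c1, Model c2, …, Model ci, …, Model cn) are stored in the R Datastore.]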

Database-centric architecture: smart meter scenario (scoring)

[Figure: inside Oracle Database, the data is partitioned by customer (c1, c2, …, ci, …, cn). The “score data” R script, a function f(dat,args,…) { } from the R Script Repository, is invoked once per partition with that customer's model from the R Datastore, producing scores c1, scores c2, …, scores ci, …, scores cn.]

How many lines of code do you think it should take to implement this?

Build models and store in database, partition on CUST_ID

ore.groupApply(CUST_USAGE_DATA,
    CUST_USAGE_DATA$CUST_ID,                                  # partition the rows by customer
    function(dat, ds.name) {
        cust_id <- dat$CUST_ID[1]
        mod <- lm(Consumption ~ . - CUST_ID, dat)
        mod$effects <- mod$residuals <- mod$fitted.values <- NULL   # drop per-row components before saving
        name <- paste("mod", cust_id, sep="")
        assign(name, mod)
        ds.name1 <- paste(ds.name, ".", cust_id, sep="")      # one datastore per customer
        ore.save(list=name, name=ds.name1, overwrite=TRUE)
        TRUE
    },
    ds.name="myDatastore", ore.connect=TRUE, parallel=TRUE
)

14 lines
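
For readers without an ORE installation, the same partition-and-fit pattern can be tried locally in base R. This is a minimal sketch, not the in-database implementation: split() and lapply() stand in for ore.groupApply, a named list stands in for the datastore, and the toy data and its Temperature and Hour columns are invented for illustration.

```r
# Local, base-R analogue of the per-customer build (no database, no ORE).
# The toy data and the Temperature/Hour columns are hypothetical.
set.seed(42)
usage <- data.frame(
  CUST_ID     = rep(1:3, each = 50),
  Temperature = runif(150, 0, 35),
  Hour        = rep(0:23, length.out = 150)
)
usage$Consumption <- 2 + 0.1 * usage$Temperature + 0.05 * usage$Hour +
  rnorm(150, sd = 0.2)

# One partition per customer, the role played by the INDEX argument on the slide
parts <- split(usage, usage$CUST_ID)

# Fit one linear model per partition and drop the bulky per-row components
models <- lapply(parts, function(dat) {
  mod <- lm(Consumption ~ . - CUST_ID, data = dat)
  mod$effects <- mod$residuals <- mod$fitted.values <- NULL
  mod
})

# Name the models as on the slide: mod<CUST_ID>
names(models) <- paste0("mod", names(parts))
```

Running this serially over many customers is what the in-database version avoids by executing the function in parallel on each partition.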

Score customers in database, partition on CUST_ID

ore.groupApply(CUST_USAGE_DATA_NEW,
    CUST_USAGE_DATA_NEW$CUST_ID,                              # partition the rows by customer
    function(dat, ds.name) {
        cust_id <- dat$CUST_ID[1]
        ds.name1 <- paste(ds.name, ".", cust_id, sep="")
        ore.load(ds.name1)                                    # restore this customer's model
        name <- paste("mod", cust_id, sep="")
        mod <- get(name)
        prd <- predict(mod, newdata=dat)
        prd[as.integer(rownames(prd))] <- prd
        res <- cbind(CUST_ID=cust_id, PRED=prd)
        data.frame(res)
    },
    ds.name="myDatastore", ore.connect=TRUE, parallel=TRUE,
    FUN.VALUE=data.frame(CUST_ID=numeric(0), PRED=numeric(0)) # shape of the combined result
)

16 lines
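
The scoring step can be sketched locally in the same spirit. Assumptions made here, none of them from the slides: the toy data and its Temperature column are invented, an in-memory list replaces ore.load from a datastore, and do.call(rbind, ...) plays the role that FUN.VALUE plays in shaping the combined result. The final sum illustrates aggregating predictions across customers.

```r
# Toy stand-in for the stored per-customer models (hypothetical data and columns)
set.seed(1)
train <- data.frame(CUST_ID = rep(1:2, each = 30),
                    Temperature = runif(60, 0, 35))
train$Consumption <- 1 + 0.2 * train$Temperature + rnorm(60, sd = 0.1)
models <- lapply(split(train, train$CUST_ID),
                 function(dat) lm(Consumption ~ . - CUST_ID, data = dat))

# New rows to score
new_usage <- data.frame(CUST_ID = rep(1:2, each = 5),
                        Temperature = seq(5, 30, length.out = 10))

# Score each partition with its own model and stack the results
# into one (CUST_ID, PRED) data frame
scores <- do.call(rbind, lapply(split(new_usage, new_usage$CUST_ID), function(dat) {
  cust_id <- dat$CUST_ID[1]
  mod <- models[[as.character(cust_id)]]
  data.frame(CUST_ID = cust_id, PRED = unname(predict(mod, newdata = dat)))
}))
rownames(scores) <- NULL

# Aggregate across customers
total_pred <- sum(scores$PRED)
```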

Execution Examples (with DOP=24)

• 1000 Models

– Data: 26,280,000 rows

– Total build time: 65.2 seconds

– Total scoring time: 25.7 seconds (all data)

• 10,000 Models

– Data: 262,800,000 rows

– Total build time: 516 seconds

– Total scoring time: 217 seconds (all data)

• 50,000 Models

– Data: 1,314,000,000 rows

– Total build time: 55.85 minutes

– Total scoring time: 18 minutes (all data)
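
As a rough consistency check, build throughput can be derived from the figures above. These rates are computed here and are not stated on the slide; they suggest build time grows roughly linearly with the number of models:

```latex
\begin{aligned}
1000 \,/\, 65.2\ \text{s} &\approx 15.3\ \text{models/s} \\
10{,}000 \,/\, 516\ \text{s} &\approx 19.4\ \text{models/s} \\
50{,}000 \,/\, (55.85 \times 60\ \text{s}) = 50{,}000 \,/\, 3351\ \text{s} &\approx 14.9\ \text{models/s}
\end{aligned}
```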

[Figure: build time and score time, execution time in seconds (axis ticks 1 to 10000), versus number of rows in millions (26.3, 262.8, 1314); 1 model per customer.]

Simulation

Compute distribution of generated random normal values

simulation <- function(index, n) {
    set.seed(index)
    x <- rnorm(n)
    res <- data.frame(t(matrix(summary(x))))
    names(res) <- c("min","q1","median","mean","q3","max")
    res$id <- index
    res
}
(res <- simulation(1,1000))

Simulation with sample size 1000 over 10 trials

res <- ore.indexApply(10, simulation, n=1000, FUN.VALUE=res[1,], parallel=TRUE)
stats <- ore.pull(res)
library(reshape2)
melt.stats <- melt(stats, id.vars="id")
boxplot(value~variable, data=melt.stats, main="Distribution of Stats - sample 1000, 10 trials")
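
The index-apply pattern can also be tried without a database. In this minimal sketch, lapply() over 1:10 stands in for ore.indexApply and do.call(rbind, ...) stands in for the combined result that ore.pull retrieves; it runs serially, so it shows the shape of the computation and not its parallel performance. The simulation function is restated (with as.numeric() in place of matrix()) so the block runs on its own.

```r
# Summary statistics of n standard-normal draws, seeded by the trial index
simulation <- function(index, n) {
  set.seed(index)
  x <- rnorm(n)
  res <- data.frame(t(as.numeric(summary(x))))
  names(res) <- c("min", "q1", "median", "mean", "q3", "max")
  res$id <- index
  res
}

# Ten trials of sample size 1000, one result row per trial
stats <- do.call(rbind, lapply(seq_len(10), simulation, n = 1000))
```

Because each trial seeds the generator with its own index, the rows are reproducible regardless of the order in which the trials run.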

Simulation with sample sizes 10^(1:6) and 100 trials

num.trials <- 100
for (n in 10^(1:6)) {
    t1 <- system.time(stats <- ore.pull(ore.indexApply(num.trials, simulation, n=n,
                                                       FUN.VALUE=res[1,], parallel=TRUE)))[3]
    cat("n=",n,", time=",t1,"\n")
    melt.stats <- melt(stats, id.vars="id")
    boxplot(value~variable, data=melt.stats,
            main=paste("Distribution of Stats - sample",n,",", num.trials, "trials"))
    gc()
}

Plot Results: sample sizes 10^(1:6) and 100 trials

Scalable Performance: varying number of trials 200..5000

[Figure: performance plot; one axis is labeled (10^x).]

Enabling Technologies

Oracle R Enterprise

• Oracle Advanced Analytics Option to Oracle Database

• Eliminate memory constraint of client R engine

• Minimize or eliminate data movement latency

• Execute R scripts on the database server machine for scalability and performance

• Achieve scalability and performance by leveraging Oracle Database as an HPC environment

• Enable integration and management of R scripts through SQL

• Operationalize entire R scripts in production applications – eliminate porting R code

• Avoid reinventing code to integrate R results into existing applications

[Figure: architecture diagram with the components Client R Engine (ORE packages, Transparency Layer), Oracle Database (user tables, in-db stats) on the Database Server Machine, and SQL interfaces (SQL*Plus, SQL Developer, …).]

Oracle’s R Technologies

• Oracle R Distribution

• ROracle

• Oracle R Enterprise

• Oracle R Advanced Analytics for Hadoop

Software available to R Community for free

Come to our booth to learn more…

Resources

• Oracle R Distribution

• ROracle

• Oracle R Enterprise

• Oracle R Advanced Analytics for Hadoop

• Book: Using R to Unlock the Value of Big Data

• Blog: https://blogs.oracle.com/R/

• Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397

http://oracle.com/goto/R

FastR

• New implementation of R in Java

– Uses the new Truffle interpreter framework and Graal optimizing compiler in conjunction with the HotSpot™ JVM for high performance, scalability and portability

– Dynamically compiles, adaptively optimizes and deoptimizes at run time

– Joint effort: Oracle Labs (Germany, USA, Austria), JKU Linz (Austria), Purdue University (USA), TU Dortmund (Germany)

• Open-source project (research prototype!)

– GPLv2

– https://bitbucket.org/allr/fastr

• More info at the poster session
