10
In-Database Analytics with R Michele Chambers – Advanced Analytics Product Management, Director Brian Hess – Advanced Analytics, Director & Principal Mathematician © 2010 Netezza, Inc. All rights reserved

In-database Analytics with R v3 · Advanced Analytics –the Traditional Way Demand Forecasting SAS Analytics Grid Data Warehouse Data ETL SQL ETL Use R! 2010 -Netezza -In-database

Embed Size (px)

Citation preview

In-Database Analytics with R

Michele Chambers – Advanced Analytics Product Management, Director

Brian Hess – Advanced Analytics, Director & Principal Mathematician

© 2010 Netezza, Inc. All rights reserved

Agenda

• What are in-database analytics?

• How does in-database analytics processing help you?

• Can in-database analytics be used for data mining as well as scoring?

• How can you take advantage of a massively parallel architecture to speed up

embarrassingly parallel algorithms as well as heroic computations?

Use R! 2010 - Netezza - In-database Analytics with R

Page 2

Advanced Analytics – the Traditional Way

Demand

Forecasting

SASAnalytics

Grid

Data

Warehouse

Data

ETL

SQL

ETL

Use R! 2010 - Netezza - In-database Analytics with R

Page 3

Fraud

DetectionR, S+

C/C++, Java, Python,

Fortran, …

SQL

SQL

ETL

What are in-database analytics?

Embedding of analytics inside the database

so that the computation processing occurs as

Use R! 2010 - Netezza - In-database Analytics with R

so that the computation processing occurs as

close to the data as possible

Page 4

What’s the Big Deal with In-Database Analytics?

R ClientR Client

BI ClientBI Client

FPGA CPU

FPGA

Analytics

CPU

Host

Use R! 2010 - Netezza - In-database Analytics with R

Page 5

LoaderLoader

Visualization

Client

Visualization

Client

Applications

FPGA

Analytics

CPU

Analytics HostsHost

Disk

Enclosures S-Blades™Network

Fabric

Netezza Appliance

Moving Compute Next to Data As Data Streams By

Data Parallelism

• Scoring / Predicting

> Concurrent calculation of a prediction or score on large quantities of data

Task Parallelism

• Model Simulation / Experimentation

> Concurrent simulation on different data

> 100’s different models running against 1M‘s rows

Data Mining with Task/Data Parallelism

• Series of iterations that can be parallelized

Define problem

Use R! 2010 - Netezza - In-database Analytics with R

Page 6

Score

calculation

on Core 1

Scoring Model

Control Program

Score

calculation

on Core X

Simulation

#1

on Core 1

Monte Carlo Simulation

Control Program

Simulation

#X

on Core X

Deploy and use model

Validate model

Build model

Explore data

Prepare data

InIn--Database AnalyticsDatabase AnalyticsInIn--Database AnalyticsDatabase Analytics

Software Development KitSoftware Development KitParallel Analytic EnginesParallel Analytic Engines

Open Source AnalyticsOpen Source AnalyticsOpen Source AnalyticsOpen Source Analytics

R R

AnalyticsAnalytics

R R

AnalyticsAnalytics

Scientific Scientific

AnalyticsAnalytics

Scientific Scientific

AnalyticsAnalyticsData PrepData PrepData PrepData Prep

Data Data

MiningMining

Data Data

MiningMining

Predictive Predictive

AnalyticsAnalytics

Predictive Predictive

AnalyticsAnalytics

nzAnalyticsnzAnalyticsnzAnalyticsnzAnalytics

SpatialSpatialSpatialSpatial

Custom

Customer/

Partner

Analytics

Use R! 2010 - Netezza - In-database Analytics with R

Page 7

Software Development KitSoftware Development KitParallel Analytic EnginesParallel Analytic Engines

nzMatrix

nzEngine

for

R

nzEngine

for Hadoop

nzAdaptors

for

C, C++, Java,

Python, Fortran

nzAdaptors

for

C, C++, Java,

Python, Fortran

nzPlug-in

for

Eclipse

nzPlug-in

for

Eclipse

nzPackage

for

R GUI

nzPackage

for

R GUI

Streaming Accelerator

Netezza AMPP™ Platform

Company Confidential

What does In-Database Analytics Deliver?

Efficient process

Faster turnarounds

Ability to react to

market

Unlimited analytic

complexity

Use R! 2010 - Netezza - In-database Analytics with R

Page 8

$Reduce expenses

Unlimited/ current data

marketcomplexity

Ability to experiment

01011111010

10101010101

01010101010

10101010101

01010101010

01110101010

10101010101

01010101010

10101010101

01010101010

01010101010

10101010101

01010101010

10101010101

01010101010

So, What Should I Look For in a Database?

In-database analytics checklist

1. Data streaming

2. Flexible, easy-to-use in-database mechanisms

Use R! 2010 - Netezza - In-database Analytics with R

3. Easy, fast, extensible development environment

4. Wide availability of tools including open source tools

5. Industry accepted standards/tools

6. Easy to manage and maintain

Page 9

Michele Chambers [email protected] 508.382.8264

Thank youMichele Chambers [email protected] 508.382.8264

Brian Hess [email protected] 508.382.8471

Page 10

Use R! 2010 - Netezza - In-database Analytics with R