In-Database Analytics with R
Michele Chambers – Advanced Analytics Product Management, Director
Brian Hess – Advanced Analytics, Director & Principal Mathematician
© 2010 Netezza, Inc. All rights reserved
Agenda
• What are in-database analytics?
• How does in-database analytics processing help you?
• Can in-database analytics be used for data mining as well as scoring?
• How can you take advantage of a massively parallel architecture to speed up
embarrassingly parallel algorithms as well as heroic computations?
Use R! 2010 - Netezza - In-database Analytics with R
Advanced Analytics – the Traditional Way
[Diagram labels: Data Warehouse; ETL; SQL; Analytics Grid; SAS; R, S+; C/C++, Java, Python, Fortran, …; Demand Forecasting; Fraud Detection]
What are in-database analytics?
Embedding of analytics inside the database so that computation occurs as close to the data as possible.
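The idea of computing close to the data can be sketched with any database that accepts user-defined functions. The example below is only an illustration under stated assumptions: it uses Python's standard `sqlite3` module as a stand-in (it is not the Netezza interface, and SQLite is an embedded engine rather than an appliance), and the `claims` table, the `score` function, and the model coefficients are all hypothetical. The point it shows is that the model is invoked by the database during query execution, and only the aggregated result is returned to the client.

```python
import math
import sqlite3


def logistic_score(amount, n_prior_claims):
    """Hypothetical scoring model; the coefficients are made up for illustration."""
    z = -3.0 + 0.002 * amount + 0.8 * n_prior_claims
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (id INTEGER PRIMARY KEY, amount REAL, n_prior_claims INTEGER)"
)
conn.executemany(
    "INSERT INTO claims (amount, n_prior_claims) VALUES (?, ?)",
    [(120.0, 0), (2500.0, 3), (900.0, 1), (4000.0, 5)],
)

# Register the model as a SQL function so the engine calls it row by row.
conn.create_function("score", 2, logistic_score)

# Only the aggregate leaves the database, not the raw rows.
high_risk_count, mean_score = conn.execute(
    "SELECT SUM(score(amount, n_prior_claims) > 0.5), "
    "AVG(score(amount, n_prior_claims)) FROM claims"
).fetchone()
print(high_risk_count, mean_score)
```

The alternative, selecting every row and scoring it in the client, produces the same numbers but moves the whole table across the connection first.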
What’s the Big Deal with In-Database Analytics?
[Diagram: Netezza Appliance. Labels: R Client, BI Client, Loader, Visualization Client, Applications; Host, Analytics Hosts; Network Fabric; S-Blades™ (FPGA, CPU, Analytics); Disk Enclosures]
Moving Compute Next to Data As Data Streams By
Data Parallelism
• Scoring / Predicting
> Concurrent calculation of a prediction or score on large quantities of data
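A minimal sketch of the data-parallel scoring pattern, assuming a hypothetical linear model and synthetic rows: the same scoring function is applied concurrently to disjoint partitions of the data, and the results are reassembled in input order. Python threads stand in for the cores here; this is not the Netezza API, and for CPU-bound work in CPython one would likely swap in `ProcessPoolExecutor`, which offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def score(row):
    """Hypothetical linear scoring model; the weights are illustrative only."""
    amount, n_prior_claims = row
    return 0.001 * amount + 0.5 * n_prior_claims


def score_partition(partition):
    # Each worker scores only its own slice of the data.
    return [score(row) for row in partition]


def parallel_score(rows, n_workers=4):
    # Split the rows into n_workers round-robin partitions.
    partitions = [rows[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(score_partition, partitions))
    # Interleave the per-partition results back into input order.
    scores = [None] * len(rows)
    for i, part in enumerate(results):
        scores[i::n_workers] = part
    return scores


rows = [(100.0 * k, k % 3) for k in range(10)]
scores = parallel_score(rows)
```

Because every row is scored independently, the partitioning scheme does not affect the result, only how the work is spread across workers.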
Task Parallelism
• Model Simulation / Experimentation
> Concurrent simulation on different data
> Hundreds of different models running against millions of rows
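Task parallelism can be sketched in the same spirit: each worker runs a complete, independent simulation with its own parameters and its own random-number generator. The scenarios below (a truncated-normal demand model with made-up means and standard deviations) are hypothetical, and threads again stand in for cores rather than representing the Netezza API.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_simulation(task):
    """One independent Monte Carlo run: estimate mean demand under a scenario."""
    seed, mean, stddev, n_draws = task
    rng = random.Random(seed)  # private generator, so runs share no state
    total = sum(max(0.0, rng.gauss(mean, stddev)) for _ in range(n_draws))
    return seed, total / n_draws


# Hypothetical scenarios: (seed, mean demand, standard deviation, draws).
tasks = [(k, 100.0 + 10.0 * k, 15.0, 10_000) for k in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    estimates = dict(pool.map(run_simulation, tasks))

for seed, estimate in sorted(estimates.items()):
    print(seed, round(estimate, 2))
```

Seeding each run separately keeps the simulations reproducible regardless of which worker picks up which task.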
Data Mining with Task/Data Parallelism
• Series of iterations that can be parallelized
[Diagram: Scoring Model Control Program dispatching score calculations to Core 1 … Core X]
[Diagram: Monte Carlo Simulation Control Program dispatching Simulation #1 … Simulation #X to Core 1 … Core X]
[Diagram: data mining cycle: Define problem, Prepare data, Explore data, Build model, Validate model, Deploy and use model]
In-Database Analytics
[Diagram labels: Open Source Analytics; R Analytics; Scientific Analytics; Data Prep; Data Mining; Predictive Analytics; nzAnalytics; Spatial; Custom Customer/Partner Analytics; Software Development Kit; Parallel Analytic Engines; nzMatrix; nzEngine for R; nzEngine for Hadoop; nzAdaptors for C, C++, Java, Python, Fortran; nzPlug-in for Eclipse; nzPackage for R GUI; Streaming Accelerator; Netezza AMPP™ Platform]
What does In-Database Analytics Deliver?
• Efficient process
• Faster turnarounds
• Ability to react to market
• Unlimited analytic complexity
• Reduce expenses
• Unlimited / current data
• Ability to experiment
So, What Should I Look For in a Database?
In-database analytics checklist
1. Data streaming
2. Flexible, easy-to-use in-database mechanisms
3. Easy, fast, extensible development environment
4. Wide availability of tools including open source tools
5. Industry accepted standards/tools
6. Easy to manage and maintain
Thank you
Michele Chambers [email protected] 508.382.8264
Brian Hess [email protected] 508.382.8471