Netezza pure data


IBM PureData System for Analytics

Powered by Netezza

Hossein Sarshar

Agenda

• What is PureData and Netezza
o History
o Characteristics
o Product chain

• PureData Hardware Architecture
o Introduction
o Hardware architecture
o Paralleled structures

• Analytics with PureData
o Introduction
o In-database analytics tools

• Demo

IBM® PureData™ for Analytics 2

What is PureData and Netezza

PureSystems

PureFlex PureApplication


In 2010, IBM acquired Netezza, a maker of data warehouse and analytics appliances. Netezza was founded around 2000 and was based in Marlborough, Massachusetts. IBM later rebranded the product as PureData.

What is PureData and Netezza

PureSystems

PureFlex PureApplication PureData


PureSystems Product Family

PureFlex:
o Combines and optimizes compute, storage, networking and virtualization capabilities into an infrastructure system under a single, unified management console.

PureApplication:
o A platform system designed and tuned specifically for transactional web and database applications.

PureData:
o Based on Netezza technology, PureData packages what data experts need into a single, well-tuned appliance.


PureData

Operational Analytics

Transactions Analytics

PureSystems Characteristics

• Built-in Expertise:
o No indexing, tuning or partitioning.
o Fully parallel, optimized in-database analytics.
o No storage administration.
o No software installation.

• Integration by Design:
o Server, storage and database in one easy-to-use package.
o Automatic parallelization and resource optimization to scale economically.
o Enterprise-class security and platform management.

• Simplified Experience:
o Up and running in hours.
o Minimal ongoing administration.
o Standard interfaces to best-of-breed analytics, BI and data integration tools.
o Built-in analytics capabilities allow users to derive insight from data quickly.
o Easy connectivity to other Big Data Platform components.


Each of these comes as an appliance, comparable to a simplified yet powerful private cloud that needs minimal administration.

PureData Introduction

• It is a data warehousing and data analytics appliance that is fast enough to process terabytes of data in seconds. It is a fully parallel machine.

• Netezza's core technology is the use of FPGAs (Field-Programmable Gate Arrays) to filter out unnecessary data in parallel.

• PureData uses Netezza technology to perform deep analytics on huge amounts of data in a reasonable time.

• It is purpose-built for high-performance analytics.

• It supports the common database schema designs (3NF, star schema, denormalized tables).


PureData Architecture

• Disk storage: RAID 1 disks; high-speed data streams

• SMP Host: Red Hat Linux servers; optimizer and compiler; a gateway to the system

• Snippet Blades, or S-Blades (SPU): query accelerators using FPGAs


S-Blades

[Figure: an S-Blade, made up of an IBM BladeCenter server (Intel quad-core CPUs) and a Netezza DB Accelerator (dual-core FPGAs), with DRAM and two SAS expander modules.]

S-Blades Overview

• Each S-Blade pairs 8 Intel cores on the IBM BladeCenter server with 8 FPGA cores on the Netezza DB Accelerator.
o An FPGA is about the same size as a CPU, draws about one fifth of the power, and runs at about one fifth of the clock speed.
o More caching capability
o Low latency and high throughput

• Each S-Blade takes ownership of 6-8 disks.

• Queries are divided into subqueries (snippets) that the S-Blades process in parallel.
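
The split of a query into subqueries can be pictured with a small Python sketch. This is only an illustration, not Netezza code: the data, the region/amount columns, and the function names `run_subquery` and `run_query` are hypothetical, and threads stand in for S-Blades. Each "blade" aggregates only the rows it owns, and the host merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each "S-Blade" owns its own rows (its own disks).
BLADE_DATA = [
    [("north", 10), ("south", 5)],   # rows owned by blade 0
    [("north", 7), ("east", 3)],     # rows owned by blade 1
    [("south", 1), ("east", 4)],     # rows owned by blade 2
]


def run_subquery(rows):
    """Subquery run by one blade: partial sum per region over its own rows."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def run_query(blade_data):
    """Host side: send the same subquery to every blade, then merge results."""
    with ThreadPoolExecutor(max_workers=len(blade_data)) as pool:
        partials = list(pool.map(run_subquery, blade_data))
    total = {}
    for partial in partials:
        for region, amount in partial.items():
            total[region] = total.get(region, 0) + amount
    return total


result = run_query(BLADE_DATA)
print(result)
```

No blade ever reads another blade's rows; only the small partial results travel back to the host.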


PureData AMPP (Shared-Nothing) Architecture

[Figure: the Netezza appliance. Client applications (advanced analytics, loader, ETL, BI, applications) connect to the SMP hosts. A network fabric links the hosts to the S-Blades™, each with its own FPGA, memory and CPU, which read from the disk enclosures.]
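
In a shared-nothing design every row belongs to exactly one processing unit. The slides do not say how rows are assigned, so the sketch below assumes a simple hash on a distribution key; `NUM_SLICES`, `slice_for` and `distribute` are hypothetical names used only for illustration.

```python
import zlib

NUM_SLICES = 4  # hypothetical number of data slices


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution key to exactly one slice using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_index=0, num_slices=NUM_SLICES):
    """Place each row in the single slice that owns its key."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[key_index], num_slices)].append(row)
    return slices


# Made-up rows: (customer_id, amount)
rows = [(customer_id, customer_id * 10) for customer_id in range(1, 21)]
slices = distribute(rows)

for index, owned in enumerate(slices):
    print(f"slice {index}: {len(owned)} rows")
```

Because the hash is deterministic, rows with the same key always land on the same slice, so a slice can work on its data without coordinating with the others.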

FPGA Secret Sauce

Example query:

    select DISTRICT,
           PRODUCTGRP,
           sum(NRX)
    from MTHLY_RX_TERR_DATA
    where MONTH = '20091201'
    and MARKET = 509123
    and SPECIALTY = 'GASTRO'
    group by DISTRICT, PRODUCTGRP

Each slice of the (compressed) table MTHLY_RX_TERR_DATA passes through these stages:

• FPGA core, Uncompress: decompresses the slice as it streams off disk.
• FPGA core, Project: select DISTRICT, PRODUCTGRP, sum(NRX)
• FPGA core, Restrict, Visibility: where MONTH = '20091201' and MARKET = 509123 and SPECIALTY = 'GASTRO'
• CPU core, Complex ∑: sum(NRX), group by, …

Using the FPGA eliminates a tremendous amount of unnecessary data movement.
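
The stages above can be imitated in software to show why filtering early reduces data movement. This is a rough sketch under stated assumptions: the rows are made up, zlib and JSON stand in for the appliance's real compression format, and `fpga_stage` and `cpu_stage` are hypothetical names. For brevity the restrict and project steps are applied in one pass per row.

```python
import json
import zlib

# Made-up rows standing in for one slice of MTHLY_RX_TERR_DATA.
ROWS = [
    {"DISTRICT": "D1", "PRODUCTGRP": "P1", "NRX": 5,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D1", "PRODUCTGRP": "P1", "NRX": 3,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D2", "PRODUCTGRP": "P1", "NRX": 4,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D1", "PRODUCTGRP": "P1", "NRX": 9,
     "MONTH": "20091101", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D2", "PRODUCTGRP": "P2", "NRX": 7,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "CARDIO"},
]

# The slice as stored on disk (compressed).
compressed_slice = zlib.compress(json.dumps(ROWS).encode("utf-8"))


def fpga_stage(compressed):
    """Uncompress, restrict (where clause) and project (selected columns)."""
    rows = json.loads(zlib.decompress(compressed).decode("utf-8"))
    for row in rows:
        if (row["MONTH"] == "20091201"
                and row["MARKET"] == 509123
                and row["SPECIALTY"] == "GASTRO"):
            yield row["DISTRICT"], row["PRODUCTGRP"], row["NRX"]


def cpu_stage(stream):
    """Complex work left for the CPU: sum(NRX) grouped by the other columns."""
    totals = {}
    for district, product_group, nrx in stream:
        key = (district, product_group)
        totals[key] = totals.get(key, 0) + nrx
    return totals


filtered = list(fpga_stage(compressed_slice))
totals = cpu_stage(filtered)
print(f"{len(filtered)} of {len(ROWS)} rows reached the CPU stage")
print(totals)
```

Only the rows and columns the query needs reach `cpu_stage`; everything else is discarded before it would have to be moved.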

PureData System Configuration


                   Single Rack System               Multi Rack System
Specs              N3001-002  N3001-005  N3001-010  N3001-020  N3001-040  N3001-080
Racks              1          1          1          2          4          8
Active S-Blades    2          4          7          14         28         56
CPU Cores          40         80         140        280        560        1120
FPGA Cores         32         64         112        224        448        896
User Data (TB)     32         98         192        384        768        1536

The N3001 is the newest IBM PureData model family at the time of this presentation.
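
The configuration table can be checked with a few lines of Python. The sketch below copies the S-Blade and core counts from the table and computes cores per active S-Blade; the dictionary layout and variable names are only illustrative.

```python
# Values copied from the configuration table:
# model -> (racks, active S-Blades, CPU cores, FPGA cores)
MODELS = {
    "N3001-002": (1, 2, 40, 32),
    "N3001-005": (1, 4, 80, 64),
    "N3001-010": (1, 7, 140, 112),
    "N3001-020": (2, 14, 280, 224),
    "N3001-040": (4, 28, 560, 448),
    "N3001-080": (8, 56, 1120, 896),
}

per_blade = {}
for model, (racks, blades, cpu_cores, fpga_cores) in MODELS.items():
    # Integer division is exact here; the asserts guard against typos.
    assert cpu_cores % blades == 0 and fpga_cores % blades == 0
    per_blade[model] = (cpu_cores // blades, fpga_cores // blades)

for model, (cpu, fpga) in per_blade.items():
    print(f"{model}: {cpu} CPU cores and {fpga} FPGA cores per active S-Blade")
```

Every model works out to 20 CPU cores and 16 FPGA cores per active S-Blade, so compute capacity in the table grows in direct proportion to the number of blades.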

What is Achievable

• An agile analytics platform.

• No administration effort to install or manage.

• Scalability to the petabyte level.

• Linear speedup from adding racks.

• Big Data Meets Deep Analytics => No need to sample


High Performance Analytics Architecture


PureData Analytics Modules


Netezza In-Database Analytics Options

• Classification
• Time Series
• Clustering
• Association Rules
• Simulation and Monte Carlo Analysis
• Geospatial


Demo

• Installation

• Client Tool Exploration

• Command Execution


Summary

• A system for analytics

• Out-of-the-box solution

• It uses FPGA technology to boost query execution.

• It uses a shared-nothing approach.

• PureData uses open standards to communicate with the outside world.

• It offers many Netezza (NZ) and third-party in-database options to enrich analytics.


References

• http://www-01.ibm.com/software/data/netezza/

• http://www.ibm.com/ibm/puresystems/ca/en/
