© 2010 IBM Corporation

InfoSphere Data Replication CDC for Netezza (Pure Data for Analytics)

Information Management Software

Overview

- Netezza has different behavior than other databases

- Overall CDC Philosophy (low impact vs. low latency)

- Topology

- Requirements

- Netezza Optimizations (filtering, bulk apply, updates)

- Simplified Tuning

- Limitations

Log-Based Change Data Capture

[Diagram: a Source Engine reads the Database Logs of the source (DB2, Oracle, SQL Server, etc.) and connects over TCP/IP to a Target Engine, with Monitoring and Configuration alongside both engines. Targets shown: Database, Message Queue, Web Services, Flat files, InfoSphere Information Server, Netezza.]

Key Benefits:

• Low impact

• Flexible implementation

• Heterogeneous platform support

• Easy to use

Netezza is Different

• Optimized for bulk operations and queries

• Not optimized for small OLTP transactions

• Traditional row by row apply is sub-optimal

• Striving for very low apply latency can generate terribly inefficient workloads for the database

IIDR/CDC for Netezza

• Optimized to make best use of your Netezza appliance

• Leverages Netezza’s strengths in bulk loading to deliver high throughput apply

• Designed to minimize the impact on the appliance: apply latency is minutes, not seconds

Topology – Installation on RedHat Linux (64-bit x86)

• CDC Netezza needs connectivity to the NPS (Netezza) machine

[Diagram: CDC Source connects to CDC Netezza, which connects to the Netezza Appliance.]

Topology – Instances and Database

• Each CDC Instance connects to a NZ Database

• Consider number of DBs when sizing CDC

[Diagram: a Netezza appliance with databases SALES, MKT, ARC, INVT and CUST; CDC Netezza with instances labeled INVT, CUST, MKT and SALES.]

CDC Netezza Requirements

• OS: RedHat Linux 5.3 or later

• Architecture: x86 (64-bit) or 64-bit zLinux

• Per-subscription memory: 4 GB minimum, 8 GB default

• Netezza JDBC driver 6.0.3 or later (not supplied)

• Connectivity to Netezza Appliance
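
As a rough sizing aid, the per-subscription memory figures above can be turned into a quick estimate. This is only a sketch: it assumes memory scales linearly with the number of subscriptions, and the instance layout and the function name `estimate_memory_gb` are made up for illustration.

```python
# Rough memory estimate for a CDC Netezza host.
# Assumption (not from the slides): memory scales linearly with subscriptions.
MIN_GB_PER_SUBSCRIPTION = 4      # minimum stated in the requirements
DEFAULT_GB_PER_SUBSCRIPTION = 8  # default stated in the requirements


def estimate_memory_gb(subscriptions_per_instance,
                       gb_per_subscription=DEFAULT_GB_PER_SUBSCRIPTION):
    """Return the total GB to budget across all CDC instances."""
    if gb_per_subscription < MIN_GB_PER_SUBSCRIPTION:
        raise ValueError("below the 4 GB per-subscription minimum")
    return sum(subscriptions_per_instance.values()) * gb_per_subscription


# Hypothetical layout: one CDC instance per Netezza database.
layout = {"SALES": 2, "MKT": 1, "INVT": 1, "CUST": 1}
print(estimate_memory_gb(layout))
```

Because each instance connects to one Netezza database, the number of replicated databases drives the number of instances, and through them the number of subscriptions to budget for.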

Reliability, Recoverability

Engine Details

• Data is loaded into Netezza External Tables

• CDC uses a Linux named pipe to populate External Tables

• Data never hits the disk until it’s in Netezza

[Diagram: rows 1,’a’ 2,’b’ 3,’c’ flow through the Pipe into EXT_TBL and from there into T1 in the Netezza DB.]

INSERT/UPDATE T1 SELECT … FROM EXTERNAL TABLE EXT_TBL …;
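
The named-pipe mechanism can be illustrated with a small, self-contained sketch. It is not CDC's implementation: the reader thread merely stands in for Netezza reading the external table's data source, and the pipe name `ext_tbl.pipe` and the function `stream_rows_through_pipe` are hypothetical. It shows that rows pass from writer to reader through a FIFO without being stored in a regular file.

```python
import os
import tempfile
import threading


def stream_rows_through_pipe(rows):
    """Write rows to a named pipe while a reader consumes them concurrently."""
    workdir = tempfile.mkdtemp()
    pipe_path = os.path.join(workdir, "ext_tbl.pipe")  # hypothetical name
    os.mkfifo(pipe_path)  # creates a FIFO; Linux/Unix only
    received = []

    def reader():
        # Stand-in for the database side reading the external table's data.
        with open(pipe_path, "r") as pipe:
            for line in pipe:
                received.append(line.rstrip("\n"))

    consumer = threading.Thread(target=reader)
    consumer.start()
    try:
        # open() blocks until the reader has opened the other end.
        with open(pipe_path, "w") as pipe:
            for key, value in rows:
                pipe.write(f"{key},'{value}'\n")
    finally:
        consumer.join()       # reader sees EOF once the writer closes
        os.remove(pipe_path)
        os.rmdir(workdir)
    return received


print(stream_rows_through_pipe([(1, "a"), (2, "b"), (3, "c")]))
```

In the real product the consumer is the INSERT ... SELECT from the external table shown above; here a thread plays that role so the example runs on its own.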

Netezza Optimizations – Updates

• Updates are replicated as DELETE and INSERT

• Netezza tables should have primary keys

• Update and Delete workloads from the source require table mappings to select the unique key column set (primary keys are automatic)

• Updates marked as N/A in performance monitor
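
A minimal sketch of how a source UPDATE could be expanded into a DELETE followed by an INSERT, keyed on the unique key column set. The change-record layout (`op`, `before`, `after`) and the function name `to_netezza_operations` are assumptions made for illustration, not CDC's internal format.

```python
def to_netezza_operations(change, key_columns):
    """Expand one source change into the operations applied on the target."""

    def key_of(row):
        # Only the unique key columns are needed to locate the row to delete.
        return {column: row[column] for column in key_columns}

    kind = change["op"]
    if kind == "INSERT":
        return [("INSERT", change["after"])]
    if kind == "DELETE":
        return [("DELETE", key_of(change["before"]))]
    if kind == "UPDATE":
        # An update becomes: delete the old row by key, insert the new row.
        return [("DELETE", key_of(change["before"])),
                ("INSERT", change["after"])]
    raise ValueError(f"unknown operation: {kind}")


update = {"op": "UPDATE",
          "before": {"id": 2, "name": "b"},
          "after": {"id": 2, "name": "B"}}
print(to_netezza_operations(update, key_columns=["id"]))
```

The sketch also shows why a unique key is needed: without one, the DELETE half cannot identify which target row to remove.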

CDC Netezza Optimizations – Net changes

• CDC NZ can filter some operations that are made redundant by deletes. For example, inserting 10,000 rows then deleting them.
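
The net-change idea can be sketched as follows. This is an illustrative simplification, not CDC's actual algorithm: it assumes operations are `(op, key, row)` tuples within one buffered batch and only cancels an INSERT against a later DELETE of the same key.

```python
def filter_net_changes(operations):
    """Drop INSERT/DELETE pairs on the same key within one batch."""
    kept = []
    pending_insert = {}  # key -> position in `kept` of a live INSERT
    for op, key, row in operations:
        if op == "INSERT":
            pending_insert[key] = len(kept)
            kept.append((op, key, row))
        elif op == "DELETE" and key in pending_insert:
            # The row was inserted earlier in this batch: cancel the insert
            # and skip the delete, since neither needs to reach the target.
            kept[pending_insert.pop(key)] = None
        else:
            kept.append((op, key, row))
    return [entry for entry in kept if entry is not None]


batch = [("INSERT", i, {"id": i}) for i in range(10_000)]
batch += [("DELETE", i, None) for i in range(10_000)]
print(len(filter_net_changes(batch)))
```

A DELETE of a row that already existed on the target before the batch started is kept, because there is no pending INSERT to cancel it against.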

Simplified Tuning

• Fewer system parameters

• One tuning parameter: acceptable_latency_in_minutes

The Bulk Apply with High Volumes

• Data is buffered in memory to create large UOW

• If enough work is found, CDC will apply

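[Diagram: transactions TX1 to TX5 arrive on a timeline marked 12:00, 12:01, 12:02, are buffered, and are then applied (APPLY) to the Netezza DB, with acceptable_latency_in_minutes = 5.]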

The Bulk Apply with Low Volumes

• Data is buffered in memory to create large UOW

• If not enough work is found in a given time, it’s flushed

[Diagram: transactions TX1 and TX2 arrive on a timeline marked 12:00, 12:03, 12:04, 12:05 and are applied (APPLY) to the Netezza DB, with acceptable_latency_in_minutes = 5.]
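
The two bulk-apply cases (high and low volume) can be sketched as one buffering policy: apply as soon as enough work has accumulated, otherwise flush once the oldest buffered change has waited for the acceptable latency. The slides do not say what counts as "enough work", so the `enough_rows` threshold below is a purely hypothetical stand-in, as is the `ApplyBuffer` class itself.

```python
class ApplyBuffer:
    """In-memory buffer that decides when a bulk apply should run."""

    def __init__(self, acceptable_latency_in_minutes=5, enough_rows=100_000):
        self.max_wait_seconds = acceptable_latency_in_minutes * 60
        self.enough_rows = enough_rows  # hypothetical "enough work" threshold
        self.rows = []
        self.oldest_arrival = None

    def add(self, transaction_rows, now):
        """Buffer the rows of one committed transaction at time `now`."""
        if not self.rows:
            self.oldest_arrival = now
        self.rows.extend(transaction_rows)

    def should_apply(self, now):
        if not self.rows:
            return False
        if len(self.rows) >= self.enough_rows:
            return True  # high volume: a large unit of work is ready
        # Low volume: flush once the oldest change has waited long enough.
        return now - self.oldest_arrival >= self.max_wait_seconds

    def drain(self):
        """Return the buffered rows as one unit of work and reset."""
        batch, self.rows = self.rows, []
        self.oldest_arrival = None
        return batch


buffer = ApplyBuffer(acceptable_latency_in_minutes=5, enough_rows=3)
buffer.add(["tx1-row"], now=0)
print(buffer.should_apply(now=60), buffer.should_apply(now=300))
```

Times are plain seconds passed in by the caller, which keeps the sketch deterministic; a real loop would use a clock such as `time.monotonic()`.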

Limitations

• General rule: with the exception of apply, no access to the target DB

• User Exits, Stored Procedures cannot use CDC’s connection

• No differential refresh

• No conflict detection and resolution

• INTERVAL data type not supported

• Does not support LOBs: LOB columns must be deselected or mapped to varchar/char/nvarchar/nchar and limited in length

Other Considerations

• Since UPDATEs are replicated as DELETEs and INSERTs, full logging on the source database is required for replicated tables (logging only keys and changes isn’t enough)
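
Why full logging matters can be shown with a short sketch: the INSERT half of a replicated update needs a value for every column, so a log record holding only keys and changed columns is not enough to rebuild the row. The column names and the function `build_insert_row` are illustrative assumptions.

```python
def build_insert_row(table_columns, logged_after_image):
    """Build the full row for the INSERT half of a replicated UPDATE."""
    missing = [c for c in table_columns if c not in logged_after_image]
    if missing:
        # With keys-and-changes logging, unchanged columns are absent.
        raise ValueError(f"log record lacks columns: {missing}")
    return {column: logged_after_image[column] for column in table_columns}


columns = ["id", "name", "city"]

# Full logging: every column is present, so the row can be rebuilt.
print(build_insert_row(columns, {"id": 2, "name": "B", "city": "Oslo"}))

# Keys-and-changes logging: "city" did not change, so it was not logged.
try:
    build_insert_row(columns, {"id": 2, "name": "B"})
except ValueError as error:
    print(error)
```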