Replicate from Oracle to data warehouses and analytics


© 2015 VMware Inc. All rights reserved.

VMware Continuent Replication: Replicate from Oracle to data warehouses and analytics

MC Brown, Senior Product Line Manager
October 22nd, 2015

Agenda

1 Introduction to VMware Continuent

2 Understanding VMware Continuent Replication

3 Using Analytics and Data Warehouses

4 Wrap-up and Questions

Introducing VMware Continuent

Business continuity for business-critical MySQL database applications

Database Clustering
• Commercial-grade multi-site HA/DR
• Products: MySQL Single Site HA; MySQL Multi-Site HA and DR

Data Replication
• Flexible, high-performance replication for Oracle and MySQL
• Simple data loading into analytics and big data
• Products: Oracle / Oracle; MySQL / Oracle; MySQL / MySQL (+ MariaDB, Percona Server); Oracle / Hadoop, Redshift, Vertica; MySQL / Hadoop, Redshift, Vertica

Replication solves important problems for RDBMS users

• Real-time local copies in case the DBMS fails
• Real-time remote copies in case the site fails
• Loading data quickly into analytic systems
• Feeding edge applications from the Oracle mother ship
• Migrating from Oracle to:
  – New Oracle versions
  – Less expensive editions
  – Non-Oracle DBMS

VMware Continuent implements flexible, high-performance replication for Oracle and MySQL

[Diagram: Source (DBMS logs) to Replicator (primary) to THL (transactions + metadata), then to the THL of a second Replicator (secondary), then to the Target. Labels: download transactions via network or from file system; apply using JDBC; low latency transfer; low application impact.]

VMware Continuent captures transactions directly from Oracle REDO logs

[Diagram: on the Oracle DBMS host, raw transactions are captured from the source's REDO logs into a staging area for REDO log data, and the data dictionary is captured. On the replicator host, the primary replicator converts these to serialized row changes and DDL, stored in the THL (transactions + metadata) and sent on to the secondary.]

Low-impact, high performance

• Source Oracle DBMS requirements:
  – Supplemental logging
  – Archive logs
  – Replicator metadata stored in DBMS
  – Replicator login with access to catalogs and flashback query
  – Local process to read REDO logs
• Target Oracle DBMS requirements:
  – Replicator metadata stored in DBMS

Transaction Based Replication

Transaction Log (Row changes + Statements)

0 Create table db1.foo
1 Create table db2.foo
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
4 Insert into db2.foo values(5,…)
5 Insert into db1.foo values(3,…)
6 Delete from db2.foo where id=5

[Diagram: the transaction log flows from Source to Target.]

Parallel Apply

[Diagram: replicator pipeline. The source replicator's THL feeds a sequence of stages, each made up of Extract, Filter and Apply steps, through a parallel queue (transactions + metadata). The final stage runs several Extract/Filter/Apply channels in parallel against the Target.]

Parallel Extraction for Provisioning

[Diagram: replicator pipeline with two stages, each made up of Extract, Filter and Apply steps, writing to the THL (transactions + metadata). At the Source: multi-threaded data extraction using flashback queries.]

Topologies

[Diagram: two topologies, each drawn with three replicators. Fan-in: two replicators feed one. Fan-out: one replicator feeds two.]

Multiple Targets

[Diagram: a replicator at the Source feeds three further replicators, one per kind of target:]
• Other RDBMS versions and OS platforms
• Other RDBMS types
• Non-relational DBMS

We can even divide logs into transaction sequences on keys

Table=db1.foo, key=1
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…

Table=db2.foo, key=5
4 Insert into db2.foo values(5,…)
6 Delete from db2.foo where id=5

Table=db1.foo, key=3
5 Insert into db1.foo values(3,…)

[Diagram: the per-key sequences flow from Source to Target.]
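The split by table and key shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not the replicator's own code: the `Change` record, the `split_by_key` helper and the operation names are hypothetical, and the sample log reuses sequence numbers 2 to 6 from the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Change(NamedTuple):
    """One row change from the transaction log (hypothetical record type)."""
    seqno: int
    op: str
    table: str
    key: int


# Row changes 2..6 from the example log; the DDL entries 0 and 1 are left out.
LOG = [
    Change(2, "INSERT", "db1.foo", 1),
    Change(3, "UPDATE", "db1.foo", 1),
    Change(4, "INSERT", "db2.foo", 5),
    Change(5, "INSERT", "db1.foo", 3),
    Change(6, "DELETE", "db2.foo", 5),
]


def split_by_key(log):
    """Group changes by (table, key), keeping seqno order inside each group."""
    groups = defaultdict(list)
    for change in sorted(log, key=lambda c: c.seqno):
        groups[(change.table, change.key)].append(change)
    return dict(groups)


sequences = split_by_key(LOG)
for (table, key), changes in sequences.items():
    print(table, key, [c.seqno for c in changes])
```

Each group only ever touches one row, so groups can be handled independently of each other while the order inside a group is preserved.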

Ordering transactions around keys enables efficient data warehouse loading

[Diagram: Source DBMS to Replicator to a set of CSV files, which a load script loads in parallel into the Hadoop cluster, followed by Map/Reduce view generation.]
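As a rough sketch of the CSV step, the snippet below writes a small batch of row changes as CSV text. The column layout (operation code, sequence number, then the row values) is an assumption modelled on the Op/Seqno/ID/Msg tables later in this deck, and `write_batch` is a hypothetical helper, not part of the product.

```python
import csv
import io


def write_batch(rows):
    """Serialize (op, seqno, id, msg) tuples as CSV text, one change per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for op, seqno, row_id, msg in rows:
        # A delete carries no message, so it is written as an empty field.
        writer.writerow([op, seqno, row_id, "" if msg is None else msg])
    return buf.getvalue()


batch = write_batch([
    ("I", 1, 1, "Hello World!"),
    ("D", 3, 1, None),
])
print(batch, end="")
```

Files in this shape can be copied to the target as-is and merged there, which is what lets the load run in parallel.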

Data Warehouse Integration and Usage is Changing
• Traditional data warehouse usage was based on dumps from the transactional store that were then loaded into the data warehouse
• Data warehousing and analytics were performed on the historical data that had been loaded
• Data warehouses often use merged data from multiple sources, which was hard to handle
• Data warehouses are now frequently sources as well as targets for data, for example:
  – Export data to data warehouse
  – Analyze data
  – Feed summary data back to application to display stats to users

Modern Data Warehouse Sequences

How do we cope with that model?
• Traditional Extract-Transform-Load (ETL) methods take too long
• Data needs to be replicated into a data warehouse in real time
• Continuous stream of information
• Replicate everything
• Use the data warehouse to provide joins and analytics

Data Warehouse Choices
• Oracle
• Hadoop
  – General purpose storage platform
  – Map Reduce for data processing
  – Front-end interfaces for interaction in SQL-like (Hive, HBase, Impala) and non-SQL (Pig, native, Spark)
  – JDBC/ODBC interfaces improving
• Vertica
  – Massive cluster-based column store
  – SQL and ODBC/JDBC interface
• Amazon Redshift
  – Highly flexible column store
  – Easy to deploy

VMware Continuent for Replication/Data Warehouses
• VMware Continuent Replication (software formerly known as Tungsten Replicator) is a fast, open source database replication engine
• Designed for speed and flexibility
• Apache V2 license, 100% open source; find it on GitHub

The Data Warehouse Impedance Mismatch

[Diagram: a Transactional Store and a Data Warehouse. The Dump/Provision and Batch paths connect them; the direct "Transactions?" path is marked with an X.]

Transactional and Data Warehouse Metadata
• Replicating data is not just about the data
• Table structures must be replicated too
• ddlscan handles the translation
  – Migrates an existing MySQL or Oracle schema into the target schema
  – Template based
  – Handles underlying data type matches
  – Needs to be executed before replication starts
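To make the idea of template-based type matching concrete, here is a minimal Python sketch of translating source column types into target column types and rendering a CREATE TABLE statement. It is not ddlscan's implementation or its template format: `TYPE_MAP`, `translate_type` and `render_create_table` are hypothetical, and the type pairs are illustrative choices rather than the mappings the tool ships with.

```python
import re

# Illustrative source-to-target type pairs (assumed, not taken from ddlscan).
TYPE_MAP = {
    "INT": "INTEGER",
    "BIGINT": "BIGINT",
    "VARCHAR": "VARCHAR",
    "DATETIME": "TIMESTAMP",
}

# Types whose length argument is carried over to the target type.
KEEP_ARGS = {"VARCHAR"}


def translate_type(source_type):
    """Map a source column type such as 'int(11)' to a target type."""
    match = re.fullmatch(r"\s*(\w+)\s*(\(.*\))?\s*", source_type)
    if match is None:
        raise ValueError(f"cannot parse type: {source_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base not in TYPE_MAP:
        raise ValueError(f"no mapping for type: {base}")
    target = TYPE_MAP[base]
    return target + args if base in KEEP_ARGS else target


def render_create_table(table, columns):
    """Render a CREATE TABLE statement from (name, source_type) pairs."""
    lines = [f"  {name} {translate_type(ctype)}" for name, ctype in columns]
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


ddl = render_create_table(
    "db1.foo",
    [("id", "int(11)"), ("msg", "varchar(255)"), ("created", "datetime")],
)
print(ddl)
```

An unmapped type raises an error instead of being guessed, which fits the point that the schema has to be in place before replication starts.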

Replicating into Vertica

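[Diagram: Replicator to Replicator, which writes CSV. A JS script loads the CSV using JDBC and cpimport into staging tables, then merges staging into base tables.]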

Replicating into Redshift

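[Diagram: Replicator to Replicator, which writes CSV. A JS script uploads the CSV with s3cmd, uses COPY over JDBC to load staging tables, then merges staging into base tables.]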

Replicating into Hadoop

[Diagram: Replicator to Replicator, which writes CSV. A JS script copies the CSV into Hadoop with hadoop fs.]

Initial Materialization within Hadoop

[Diagram: load-reduce-check. Steps: migrate staging/base DDL, then Hive materialization from CSV to Staging Table to Base Table.]

Ongoing Materialization within Hadoop

[Diagram: materialize. Hive materialization from CSV to Staging Table to Base Table.]

Comparing Loading Methods for Hadoop

                         Manual via CSV             Sqoop                         Tungsten Replicator
Process                  Manual/Scripted            Manual/Scripted               Fully Automated
Incremental Loading      Possible with DDL changes  Requires DDL changes          Fully Supported
Latency                  Full-load                  Intermittent                  Real-time
Extraction Requirements  Full table scan            Full and partial table scans  Low-impact CDC/binlog scan

Sqoop and Materialization within Hadoop

[Diagram: Sqoop and Replicate both feed the same path: CSV to Staging Table to Base Table through Hive materialization.]

How the Materialization Works

Staging table:
Op  Seqno  ID  Msg
I   1      1   Hello World!
I   2      2   Meet MC
D   3      1
I   3      1   Goodbye World

Base table after materialization:
Op  Seqno  ID  Msg
I   2      2   Meet MC
I   3      1   Goodbye World
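The merge from staged changes to the base table can be sketched as follows. In the deck this step runs inside the warehouse (Hive, in the Hadoop case); the Python below only illustrates the logic, and the `materialize` function is hypothetical. It assumes that within one sequence number a delete is applied before an insert, which is how the update to ID 1 appears in the staged rows above.

```python
# Staged changes from the slide: (op, seqno, id, msg).
STAGED = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]


def materialize(staged):
    """Apply staged inserts and deletes in order; return the surviving rows."""
    base = {}
    # Order by seqno, with deletes ahead of inserts inside the same seqno.
    ordered = sorted(staged, key=lambda r: (r[1], r[0] != "D"))
    for op, seqno, row_id, msg in ordered:
        if op == "D":
            base.pop(row_id, None)
        elif op == "I":
            base[row_id] = (seqno, msg)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    # Rows as (seqno, id, msg), sorted by the seqno that last wrote them.
    return sorted((seqno, row_id, msg) for row_id, (seqno, msg) in base.items())


rows = materialize(STAGED)
for row in rows:
    print(row)
```

With the staged rows from the slide this leaves two rows, "Meet MC" for ID 2 and "Goodbye World" for ID 1, matching the resulting table.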

Data Warehouse Possibilities: Point in Time Tables

[Figure: a timeline of transactions numbered 1 to 45, with points marked Monday, Wednesday and Friday.]
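Because every staged change keeps its sequence number, a table can be rebuilt as it stood at any chosen point. The sketch below shows one way to do that in Python. The `table_as_of` helper, the sample changes and the sequence numbers assigned to Monday, Wednesday and Friday are all assumptions made for illustration; the deck does not give the actual cut-off values.

```python
# Sample changes: (op, seqno, id, msg). Illustrative data only.
CHANGES = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]

# Hypothetical cut-off sequence number for each day.
CUTOFFS = {"Monday": 1, "Wednesday": 2, "Friday": 3}


def table_as_of(changes, max_seqno):
    """Rebuild the table using only changes with seqno <= max_seqno."""
    base = {}
    # Order by seqno, with deletes ahead of inserts inside the same seqno.
    ordered = sorted(changes, key=lambda r: (r[1], r[0] != "D"))
    for op, seqno, row_id, msg in ordered:
        if seqno > max_seqno:
            break
        if op == "D":
            base.pop(row_id, None)
        else:
            base[row_id] = msg
    return base


snapshots = {day: table_as_of(CHANGES, seqno) for day, seqno in CUTOFFS.items()}
for day, table in snapshots.items():
    print(day, table)
```

The same change history yields a different table for each cut-off, with no need to go back to the source database.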

Data Warehouse Possibilities: Time Series Generation

Change history:
Op  Seqno  ID  Date    Msg
I   1      1   1/6/14  Hello World!
I   2      2   2/6/14  Meet MC
I   3      1   2/6/14  Goodbye World
I   4      1   3/6/14  Hello Tuesday
I   4      2   3/6/14  Ruby Wednesday
I   5      1   4/6/14  Final Count

Time series for ID 1:
ID  Date    Msg
1   1/6/14  Hello World!
1   2/6/14  Goodbye World
1   3/6/14  Hello Tuesday
1   4/6/14  Final Count
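Extracting the time series for one row is a filter over the change history followed by a sort on sequence number. A minimal Python sketch using the rows from the slide; the `history_for` helper is hypothetical, and in practice this would be a query inside the warehouse.

```python
# Change history from the slide: (op, seqno, id, date, msg).
HISTORY = [
    ("I", 1, 1, "1/6/14", "Hello World!"),
    ("I", 2, 2, "2/6/14", "Meet MC"),
    ("I", 3, 1, "2/6/14", "Goodbye World"),
    ("I", 4, 1, "3/6/14", "Hello Tuesday"),
    ("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    ("I", 5, 1, "4/6/14", "Final Count"),
]


def history_for(changes, row_id):
    """Return every recorded (id, date, msg) version of one row, in seqno order."""
    matching = [c for c in changes if c[2] == row_id and c[0] == "I"]
    matching.sort(key=lambda c: c[1])
    return [(rid, date, msg) for _op, _seqno, rid, date, msg in matching]


series = history_for(HISTORY, 1)
for entry in series:
    print(entry)
```

For ID 1 this returns the four versions listed in the slide's result table, from "Hello World!" through "Final Count".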

Wrap-up
• VMware Continuent Replication provides robust, flexible capabilities that have been battle-tested in demanding customer environments
• Replication features compare favorably to Oracle GoldenGate and Data Guard
• VMware Continuent handles HA/DR, data warehouse loading, and edge application use cases

For more information, contact us:

Robert Noyes, Alliance Manager, AMER & LATAM
rnoyes@vmware.com, +1 (650) 575-0958

Philippe Bernard, Alliance Manager, EMEA & APJ
pbernard@vmware.com, +41 79 347 1385

MC Brown, Senior Product Line Manager
mcb@vmware.com

Eero Teerikorpi, Sr. Director, Strategic Alliance
eteerikorpi@vmware.com, +1 (408) 431-3305

www.vmware.com/products/continuent
