© 2015 VMware Inc. All rights reserved.

VMware Continuent Replication: Replicate from Oracle to data warehouses and analytics

MC Brown, Senior Product Line Manager
October 22nd, 2015



Agenda

1 Introduction to VMware Continuent

2 Understanding VMware Continuent Replication

3 Using Analytics and Data Warehouses

4 Wrap-up and Questions

Introducing VMware Continuent

Database Clustering
• Business continuity for business-critical MySQL database applications
• Commercial-grade multi-site HA/DR
• Products: MySQL Single Site HA; MySQL Multi-Site HA and DR

Data Replication
• Flexible, high-performance replication for Oracle and MySQL
• Simple data loading into analytics and big data
• Sources: Oracle, MySQL (+ MariaDB, Percona Server)
• Targets: Oracle, MySQL (+ MariaDB, Percona Server), Hadoop, Redshift, Vertica

Replication solves important problems for RDBMS users

• Real-time local copies in case the DBMS fails
• Real-time remote copies in case the site fails
• Loading data quickly into analytic systems
• Feeding edge applications from the Oracle mothership
• Migrating from Oracle to:
– New Oracle versions
– Less expensive editions
– Non-Oracle DBMSs

CONFIDENTIAL 4

2 Understanding VMware Continuent Replication

VMware Continuent implements flexible, high-performance replication for Oracle and MySQL

Figure: On the primary (source), the replicator reads the DBMS logs (MySQL in the diagram) and writes transactions plus metadata to its THL (Transaction History Log). The secondary (target) replicator downloads those transactions via the network or from the file system into its own THL and applies them using JDBC. Callouts: low latency transfer, low application impact.

VMware Continuent captures transactions directly from Oracle REDO logs

Figure: Oracle DBMS host and replicator host. Raw transactions are captured from the REDO logs into a staging area for REDO log data, and the data dictionary is captured as well. The changes are converted to serialized row changes and DDL and written to the THL (transactions + metadata) on the primary, for transfer to the secondary.

Low-impact, high performance

• Source Oracle DBMS requirements:
– Supplemental logging
– Archive logs
– Replicator metadata stored in DBMS
– Replicator login with access to catalogs and flashback query
– Local process to read REDO logs
• Target Oracle DBMS requirements:
– Replicator metadata stored in DBMS

Transaction-Based Replication

Transaction log (row changes + statements), replicated from source to target:
0 Create table db1.foo
1 Create table db2.foo
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
4 Insert into db2.foo values(5,…)
5 Insert into db1.foo values(3,…)
6 Delete from db2.foo where id=5
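A minimal Python sketch of ordered apply, assuming a dict of dicts stands in for the target database and that each log entry carries a seqno, an operation, a table, and a key. The `Txn` type and the row values are hypothetical, since the slide elides the values. Sorting by seqno is what guarantees that the update at seqno 3 lands after the insert at seqno 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Txn:
    """One entry in a simplified, hypothetical transaction log."""
    seqno: int
    op: str                      # "CREATE", "INSERT", "UPDATE" or "DELETE"
    table: str
    key: Optional[int] = None
    row: Optional[dict] = None


def apply_log(log):
    """Apply transactions to an in-memory target strictly in seqno order."""
    target = {}                  # table name -> {key: row}
    for txn in sorted(log, key=lambda t: t.seqno):
        if txn.op == "CREATE":
            target[txn.table] = {}
        elif txn.op == "INSERT":
            target[txn.table][txn.key] = dict(txn.row)
        elif txn.op == "UPDATE":
            target[txn.table][txn.key].update(txn.row)
        elif txn.op == "DELETE":
            del target[txn.table][txn.key]
        else:
            raise ValueError(f"unknown operation: {txn.op}")
    return target


# The log from the slide; the row values are made up for illustration.
example_log = [
    Txn(0, "CREATE", "db1.foo"),
    Txn(1, "CREATE", "db2.foo"),
    Txn(2, "INSERT", "db1.foo", 1, {"v": "a"}),
    Txn(3, "UPDATE", "db1.foo", 1, {"v": "b"}),
    Txn(4, "INSERT", "db2.foo", 5, {"v": "c"}),
    Txn(5, "INSERT", "db1.foo", 3, {"v": "d"}),
    Txn(6, "DELETE", "db2.foo", 5),
]

print(apply_log(example_log))
# → {'db1.foo': {1: {'v': 'b'}, 3: {'v': 'd'}}, 'db2.foo': {}}
```

Because the function sorts by seqno, feeding it the same entries in a shuffled order gives the same final state.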

Parallel Apply

Figure: The source replicator writes transactions to the THL. On the target, the replicator pipeline (extract, filter, apply stages) reads them into a THL parallel queue (transactions + metadata), and several extract-filter-apply tasks apply transactions to the target in parallel.
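One way to picture the parallel queue is to split the transaction stream into a fixed number of channels by a shard key, so that transactions sharing a key keep their relative order while different keys are applied concurrently. The sketch below is a simplified Python model, not the replicator's implementation: it assumes the shard key is the schema name (the part of `schema.table` before the dot), uses `zlib.crc32` for a stable hash, and replaces the real apply step with a hypothetical `apply_channel` function that only records seqnos.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_key(table):
    """Hypothetical shard key: the schema part of 'schema.table'."""
    return table.split(".", 1)[0]


def partition(txns, channels):
    """Split (seqno, table) pairs into channels, preserving order within each."""
    queues = [[] for _ in range(channels)]
    for seqno, table in sorted(txns):
        # crc32 is stable across runs, unlike Python's salted str hash().
        channel = zlib.crc32(shard_key(table).encode("utf-8")) % channels
        queues[channel].append((seqno, table))
    return queues


def apply_channel(queue):
    """Stand-in for applying one channel's transactions in order."""
    return [seqno for seqno, _table in queue]


def parallel_apply(txns, channels=2):
    """Partition the stream, then drain each channel on its own worker thread."""
    queues = partition(txns, channels)
    with ThreadPoolExecutor(max_workers=channels) as pool:
        return list(pool.map(apply_channel, queues))


txns = [(2, "db1.foo"), (3, "db1.foo"), (4, "db2.foo"),
        (5, "db1.foo"), (6, "db2.foo")]
print(parallel_apply(txns))
```

The trade-off is that ordering is guaranteed only within a channel; a transaction touching two shard keys would need to be serialized across channels, which this sketch does not handle.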

Parallel Extraction for Provisioning

Figure: On the source, multi-threaded data extraction using flashback queries feeds a replicator pipeline of extract-filter-apply stages, which writes the extracted data to the THL (transactions + metadata).
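Provisioning with flashback queries can be sketched as: pick one SCN, split each table's key range into chunks, and let several threads read their chunks `AS OF SCN`, so every chunk sees the same consistent point in time. The Python below only builds the per-chunk SQL and fans it out to a thread pool; the table name, the numeric `id` key, the SCN value, the chunk size, and the `fetch` callback are all hypothetical, and a real run would execute each statement through an Oracle driver.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(min_id, max_id, chunk_size):
    """Split the inclusive key range [min_id, max_id] into contiguous chunks."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def chunk_queries(table, scn, min_id, max_id, chunk_size):
    """Build one Oracle flashback query per chunk, all pinned to the same SCN.

    String formatting is for illustration only; real code should use bind variables.
    """
    return [
        f"SELECT * FROM {table} AS OF SCN {scn} WHERE id BETWEEN {lo} AND {hi}"
        for lo, hi in chunk_ranges(min_id, max_id, chunk_size)
    ]


def extract_parallel(queries, fetch, threads=4):
    """Run fetch(query) for every chunk on a thread pool; results keep chunk order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, queries))


# Hypothetical table and SCN, for illustration.
queries = chunk_queries("HR.EMPLOYEES", 4711, 1, 2500, 1000)
for q in queries:
    print(q)
```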

Topologies

Figure: Fan-in (two replicators feeding one replicator) and fan-out (one replicator feeding two replicators).

Multiple Targets

Figure: One source replicator feeds three target replicators: other RDBMS versions and OS platforms, other RDBMS types, and non-relational DBMSs.

We can even divide logs into transaction sequences on keys

• Table db1.foo, key=1
– 2 Insert into db1.foo values(1, …
– 3 Update db1.foo where id=1…
• Table db2.foo, key=5
– 4 Insert into db2.foo values(5,…)
– 6 Delete from db2.foo where id=5
• Table db1.foo, key=3
– 5 Insert into db1.foo values(3,…)
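Grouping the log by table and key can be sketched in a few lines of Python. Each entry is a hypothetical `(seqno, table, key, statement)` tuple built from the slide, with the statements kept as the slide's elided text. Within a group, entries keep their seqno order, which is what applying one key's history correctly requires.

```python
from collections import defaultdict


def sequences_by_key(log):
    """Group (seqno, table, key, statement) entries by (table, key), in seqno order."""
    groups = defaultdict(list)
    for seqno, table, key, statement in sorted(log):
        groups[(table, key)].append((seqno, statement))
    return dict(groups)


log = [
    (2, "db1.foo", 1, "Insert into db1.foo values(1, …"),
    (3, "db1.foo", 1, "Update db1.foo where id=1…"),
    (4, "db2.foo", 5, "Insert into db2.foo values(5,…)"),
    (5, "db1.foo", 3, "Insert into db1.foo values(3,…)"),
    (6, "db2.foo", 5, "Delete from db2.foo where id=5"),
]

for (table, key), entries in sequences_by_key(log).items():
    print(table, key, [seqno for seqno, _ in entries])
# → db1.foo 1 [2, 3]
# → db2.foo 5 [4, 6]
# → db1.foo 3 [5]
```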

Ordering transactions around keys enables efficient data warehouse loading

Figure: The replicator reads changes from the source DBMS and writes them as CSV files; a load script loads the CSV files into the Hadoop cluster in parallel, where Map/Reduce jobs generate views.
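The CSV stage can be illustrated with a small Python writer that batches row changes per table into CSV text, prefixing each row with its operation code and seqno so that a downstream load script can merge them later. The column layout (`op`, `seqno`, then the row's columns) mirrors the staging tables on the materialization slides but is an assumption here, not the replicator's documented file format; the table name `db1.msg` and the batch contents are hypothetical.

```python
import csv
import io
from collections import defaultdict


def batch_to_csv(changes):
    """Turn (op, seqno, table, row) changes into one CSV string per table.

    Each output line is: op, seqno, then the row's values as given.
    sorted() is stable, so a delete and an insert that share a seqno
    keep their original order.
    """
    buffers = defaultdict(io.StringIO)
    writers = {}
    for op, seqno, table, row in sorted(changes, key=lambda c: c[1]):
        if table not in writers:
            writers[table] = csv.writer(buffers[table], lineterminator="\n")
        writers[table].writerow([op, seqno, *row])
    return {table: buf.getvalue() for table, buf in buffers.items()}


changes = [
    ("I", 1, "db1.msg", (1, "Hello World!")),
    ("I", 2, "db1.msg", (2, "Meet MC")),
    ("D", 3, "db1.msg", (1,)),           # a delete carries only its key here
    ("I", 3, "db1.msg", (1, "Goodbye World")),
]

out = batch_to_csv(changes)
print(out["db1.msg"], end="")
```

Writing one file per table keeps each load independent, which is what allows the load script to run the tables in parallel.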

3 Using Analytics and Data Warehouses

Data Warehouse Integration and Usage Are Changing
• Traditional data warehouse usage was based on a dump from the transactional store, loaded into the data warehouse
• Data warehousing and analytics were performed on loaded historical data
• Data warehouses often use merged data from multiple sources, which was hard to handle
• Data warehouses are now frequently sources as well as targets for data, for example:
– Export data to the data warehouse
– Analyze the data
– Feed summary data back to the application to display stats to users


Modern Data Warehouse Sequences

How do we cope with that model?
• Traditional Extract-Transform-Load (ETL) methods take too long
• Data needs to be replicated into a data warehouse in real time:
– Continuous stream of information
– Replicate everything
• Use the data warehouse to provide joins and analytics

Data Warehouse Choices
• Oracle
• Hadoop
– General-purpose storage platform
– MapReduce for data processing
– Front-end interfaces for interaction: SQL-like (Hive, HBase, Impala) and non-SQL (Pig, native, Spark)
– JDBC/ODBC interfaces improving
• Vertica
– Massive cluster-based column store
– SQL and ODBC/JDBC interface
• Amazon Redshift
– Highly flexible column store
– Easy to deploy

VMware Continuent for Replication/Data Warehouses

VMware Continuent Replication (software formerly known as Tungsten Replicator) is a fast, open source database replication engine
• Designed for speed and flexibility
• Apache V2 license; 100% open source; find it on GitHub

The Data Warehouse Impedance Mismatch

Figure: Data moves from the transactional store to the data warehouse by dump/provision or batch loading; transaction-by-transaction apply does not fit the warehouse (marked "Transactions? X").

Transactional and Data Warehouse Metadata
• Replicating data is not just about the data
• Table structures must be replicated too
• ddlscan handles the translation:
– Migrates an existing MySQL or Oracle schema into the target schema
– Template based
– Handles underlying data type matches
– Needs to be executed before replication starts
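ddlscan's own templates and type mappings are not shown on the slide, so the following only illustrates the template-based idea: a hypothetical Oracle-to-Hive type map and a `string.Template` that renders staging-table DDL with extra `op` and `seqno` columns ahead of the source columns. The type choices and the `stage_` naming are examples, not ddlscan's actual output.

```python
from string import Template

# Hypothetical type map for illustration; a real mapping needs precision/scale rules.
ORACLE_TO_HIVE = {
    "NUMBER": "DECIMAL(38,0)",
    "VARCHAR2": "STRING",
    "CHAR": "STRING",
    "DATE": "TIMESTAMP",
}

# Hypothetical staging-table template: op and seqno come before the source columns.
STAGING_DDL = Template(
    "CREATE TABLE $schema.stage_$table (\n"
    "  op STRING,\n"
    "  seqno BIGINT,\n"
    "$columns\n"
    ")"
)


def render_staging_ddl(schema, table, columns):
    """Render staging DDL from (column_name, oracle_type) pairs."""
    lines = []
    for name, oracle_type in columns:
        hive_type = ORACLE_TO_HIVE.get(oracle_type.upper())
        if hive_type is None:
            raise ValueError(f"no mapping for Oracle type {oracle_type}")
        lines.append(f"  {name.lower()} {hive_type}")
    return STAGING_DDL.substitute(
        schema=schema.lower(), table=table.lower(), columns=",\n".join(lines)
    )


print(render_staging_ddl("HR", "MSG", [("ID", "NUMBER"), ("MSG", "VARCHAR2")]))
```

Keeping the mapping in data and the layout in a template means a new target only needs a new template and type map, which is the flexibility the slide attributes to the template-based approach.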

Replicating into Vertica

Figure: A source replicator feeds the target replicator, which writes CSV files; a JS load script loads them into a staging table (via JDBC or cpimport) and then merges the staging table into the base table.

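Replicating into Redshift

Figure: A source replicator feeds the target replicator, which writes CSV files; a JS load script uploads them to Amazon S3 with s3cmd, loads them into a staging table with COPY, and merges the staging table into the base table over JDBC.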
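Both the Vertica and Redshift flows end by merging a staging table into the base table. One common way to express such a merge in SQL: delete every base row whose key appears in the staging batch, then insert the newest staging row for each key if that row is an insert. The sketch below runs that pattern on Python's built-in `sqlite3` as a stand-in for the warehouse, with hypothetical tables `greeting` and `stage_greeting` and the rows from the "How the Materialization Works" slide; the replicator's actual merge scripts may differ. A `pos` column records load order so that a delete and an insert sharing a seqno resolve to the later row.

```python
import sqlite3

# Hypothetical table names; sqlite3 stands in for Vertica or Redshift.
MERGE_SQL = [
    # 1. Remove every base row touched by this batch.
    "DELETE FROM greeting WHERE id IN (SELECT id FROM stage_greeting)",
    # 2. Re-insert the newest staging row per key, but only if it is an insert.
    """
    INSERT INTO greeting (id, msg)
    SELECT s.id, s.msg FROM stage_greeting s
    WHERE s.op = 'I'
      AND NOT EXISTS (
        SELECT 1 FROM stage_greeting t
        WHERE t.id = s.id
          AND (t.seqno > s.seqno OR (t.seqno = s.seqno AND t.pos > s.pos))
      )
    """,
    # 3. Empty the staging table for the next batch.
    "DELETE FROM stage_greeting",
]


def merge_batch(conn, staged_rows):
    """Load (op, seqno, id, msg) rows into staging, then merge them into the base table."""
    conn.executemany(
        "INSERT INTO stage_greeting (pos, op, seqno, id, msg) VALUES (?, ?, ?, ?, ?)",
        [(pos, *row) for pos, row in enumerate(staged_rows)],
    )
    for statement in MERGE_SQL:
        conn.execute(statement)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greeting (id INTEGER PRIMARY KEY, msg TEXT)")
conn.execute(
    "CREATE TABLE stage_greeting (pos INTEGER, op TEXT, seqno INTEGER, id INTEGER, msg TEXT)"
)
merge_batch(conn, [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
])
print(conn.execute("SELECT id, msg FROM greeting ORDER BY id").fetchall())
# → [(1, 'Goodbye World'), (2, 'Meet MC')]
```

Merging in set-oriented batches like this suits column stores, which are typically much slower at applying many single-row updates.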

Replicating into Hadoop

Figure: A source replicator feeds the target replicator, which writes CSV files; a JS load script copies them into Hadoop with hadoop fs.

Initial Materialization within Hadoop

Figure: load-reduce-check migrates the staging and base table DDL; CSV files are loaded into the staging table, and Hive materialization produces the base table.

Ongoing Materialization within Hadoop

Figure: materialize runs Hive materialization over the CSV data in the staging table to produce the base table.

Comparing Loading Methods for Hadoop

• Process
– Manual via CSV: Manual/Scripted
– Sqoop: Manual/Scripted
– Tungsten Replicator: Fully Automated
• Incremental Loading
– Manual via CSV: Possible with DDL changes
– Sqoop: Requires DDL changes
– Tungsten Replicator: Fully Supported
• Latency
– Manual via CSV: Full-load
– Sqoop: Intermittent
– Tungsten Replicator: Real-time
• Extraction Requirements
– Manual via CSV: Full table scan
– Sqoop: Full and partial table scans
– Tungsten Replicator: Low-impact CDC/binlog scan

Sqoop and Materialization within Hadoop

Figure: Sqoop and Replicate both shown delivering data into Hadoop as CSV in the staging table, from which Hive materialization produces the base table.

How the Materialization Works

Staging table:
Op | Seqno | ID | Msg
I | 1 | 1 | Hello World!
I | 2 | 2 | Meet MC
D | 3 | 1 |
I | 3 | 1 | Goodbye World

Base table:
Op | Seqno | ID | Msg
I | 2 | 2 | Meet MC
I | 3 | 1 | Goodbye World
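The reduce step shown in the two tables can be sketched in Python: for each ID keep only the newest staging row (highest seqno, and the later row when a delete and an insert share a seqno, as with seqno 3 here), then drop the key if that newest row is a delete. This illustrates the logic under those assumptions; it is not the Hive job that the replicator tools generate.

```python
from typing import NamedTuple, Optional


class StagingRow(NamedTuple):
    op: str               # "I" for insert, "D" for delete
    seqno: int
    id: int
    msg: Optional[str]


def materialize(staging):
    """Reduce staging rows to base-table rows: newest row per ID, deletes dropped."""
    newest = {}
    for pos, row in enumerate(staging):
        # Rank by seqno first, then by position so a later row wins a seqno tie.
        rank = (row.seqno, pos)
        if row.id not in newest or rank > newest[row.id][0]:
            newest[row.id] = (rank, row)
    base = [row for _rank, row in newest.values() if row.op == "I"]
    return sorted(base, key=lambda r: r.seqno)


staging = [
    StagingRow("I", 1, 1, "Hello World!"),
    StagingRow("I", 2, 2, "Meet MC"),
    StagingRow("D", 3, 1, None),
    StagingRow("I", 3, 1, "Goodbye World"),
]

for row in materialize(staging):
    print(row.op, row.seqno, row.id, row.msg)
# → I 2 2 Meet MC
# → I 3 1 Goodbye World
```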

Data Warehouse Possibilities: Point in Time Tables

Figure: Point-in-time copies of a table as of Monday, Wednesday, and Friday, taken from one stream of changes.
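Because the staging data keeps every change with its seqno, a point-in-time table can be produced by replaying the changes up to a chosen cutoff. A minimal Python sketch, assuming `(op, seqno, id, msg)` rows like those on the "How the Materialization Works" slide and using a seqno rather than a weekday as the cutoff:

```python
def table_as_of(changes, cutoff):
    """Replay (op, seqno, id, msg) changes with seqno <= cutoff into {id: msg}."""
    table = {}
    # sorted() is stable, so a delete listed before an insert with the same
    # seqno is still replayed first.
    for op, seqno, row_id, msg in sorted(changes, key=lambda c: c[1]):
        if seqno > cutoff:
            break
        if op == "I":
            table[row_id] = msg
        elif op == "D":
            table.pop(row_id, None)
        else:
            raise ValueError(f"unknown op {op!r}")
    return table


changes = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]

for cutoff in (1, 2, 3):
    print(cutoff, table_as_of(changes, cutoff))
```

Mapping a calendar day such as Monday to a cutoff would need a seqno-to-timestamp lookup, which this sketch leaves out.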

Data Warehouse Possibilities: Time Series Generation

Staging table:
Op | Seqno | ID | Date | Msg
I | 1 | 1 | 1/6/14 | Hello World!
I | 2 | 2 | 2/6/14 | Meet MC
I | 3 | 1 | 2/6/14 | Goodbye World
I | 4 | 1 | 3/6/14 | Hello Tuesday
I | 4 | 2 | 3/6/14 | Ruby Wednesday
I | 5 | 1 | 4/6/14 | Final Count

Time series for ID 1:
ID | Date | Msg
1 | 1/6/14 | Hello World!
1 | 2/6/14 | Goodbye World
1 | 3/6/14 | Hello Tuesday
1 | 4/6/14 | Final Count
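Keeping every inserted version of a row, instead of only the newest, turns the staging data into a time series. The Python sketch below filters the staging rows from the slide for one ID and orders them by seqno; the dates are kept as the slide's strings because their day/month order is not stated.

```python
from typing import NamedTuple


class Change(NamedTuple):
    op: str
    seqno: int
    id: int
    date: str
    msg: str


def time_series(staging, row_id):
    """Return (id, date, msg) for every inserted version of row_id, in seqno order."""
    versions = [c for c in staging if c.op == "I" and c.id == row_id]
    return [(c.id, c.date, c.msg) for c in sorted(versions, key=lambda c: c.seqno)]


staging = [
    Change("I", 1, 1, "1/6/14", "Hello World!"),
    Change("I", 2, 2, "2/6/14", "Meet MC"),
    Change("I", 3, 1, "2/6/14", "Goodbye World"),
    Change("I", 4, 1, "3/6/14", "Hello Tuesday"),
    Change("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    Change("I", 5, 1, "4/6/14", "Final Count"),
]

for point in time_series(staging, 1):
    print(*point)
# → 1 1/6/14 Hello World!
# → 1 2/6/14 Goodbye World
# → 1 3/6/14 Hello Tuesday
# → 1 4/6/14 Final Count
```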

4 Wrap-up and Questions

Wrap-up
• VMware Continuent Replication provides robust, flexible capabilities that have been battle-tested in demanding customer environments
• Replication features compare favorably to Oracle GoldenGate and Data Guard
• VMware Continuent handles HA/DR, data warehouse loading, and edge application use cases

For more information, contact us:
• Robert Noyes, Alliance Manager, AMER & LATAM, +1 (650) 575-0958
• Philippe Bernard, Alliance Manager, EMEA & APJ, +41 79 347 1385
• MC Brown, Senior Product Line Manager
• Eero Teerikorpi, Sr. Director, Strategic Alliance, +1 (408) 431-3305

www.vmware.com/products/continuent