Replicate from Oracle to data warehouses and analytics

VMware Continuent Replication Replicate from Oracle to data warehouses and analytics

MC Brown, Senior Product Line Manager, October 22nd, 2015

Agenda

1 Introduction to VMware Continuent

2 Understanding VMware Continuent Replication

3 Using Analytics and Data Warehouses

4 Wrap-up and Questions

Introducing VMware Continuent

Database Clustering
• Business continuity for business-critical MySQL database applications
• Commercial-grade multi-site HA/DR

Data Replication
• Flexible, high-performance replication for Oracle and MySQL
• Simple data loading into analytics and big data

Products (Data Replication):
• Oracle to Oracle
• MySQL to Oracle
• MySQL to MySQL (+ MariaDB, Percona Server)
• Oracle to Hadoop, Redshift, Vertica
• MySQL to Hadoop, Redshift, Vertica

Products (Database Clustering):
• MySQL Single Site HA
• MySQL Multi-Site HA and DR

Replication solves important problems for RDBMS users

• Real-time local copies in case the DBMS fails
• Real-time remote copies in case the site fails
• Loading data quickly into analytic systems
• Feeding edge applications from the Oracle mother ship
• Migrating from Oracle to:
– New Oracle versions
– Less expensive editions
– Non-Oracle DBMS

CONFIDENTIAL 4

Agenda

2 Understanding VMware Continuent Replication

VMware Continuent implements flexible, high-performance replication for Oracle and MySQL

[Diagram: on the Source, the Primary Replicator reads the DBMS logs and stores transactions + metadata. On the Target, the Secondary Replicator downloads transactions via network or from the file system, stores transactions + metadata, and applies them using JDBC. The DBMS shown is MySQL.]

Low latency transfer

Low application impact

VMware Continuent captures transactions directly from Oracle REDO logs

[Diagram: on the Oracle DBMS host (Source), the data dictionary and raw transactions are captured from the REDO logs into a staging area for the REDO log. On the Replicator host, the Primary Replicator converts them to serialized row changes and DDL and passes them on to the secondary.]

Low-impact, high performance

• Source Oracle DBMS requirements:
– Supplemental logging
– Archive logs
– Replicator metadata stored in DBMS
– Replicator login with access to catalogs and flashback query
– Local process to read REDO logs

• Target Oracle DBMS requirements:
– Replicator metadata stored in DBMS

Transaction Based Replication

Transaction Log (Row changes + Statements)

0 Create table db1.foo
1 Create table db2.foo
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
4 Insert into db2.foo values(5,…)
5 Insert into db1.foo values(3,…)
6 Delete from db2.foo where id=5

[Diagram: the transaction log flows from Source to Target]
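A minimal Python sketch of how a target could replay the numbered log above in strict sequence order. The Event fields, the apply_log function and the stored values ("a", "b", ...) are illustrative assumptions, since the slide elides the actual row values; this is not the replicator's own implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    seqno: int                    # position in the transaction log
    table: str                    # schema-qualified table name
    op: str                       # "create", "insert", "update" or "delete"
    key: Optional[int] = None     # primary key of the affected row
    value: Optional[str] = None   # hypothetical payload (elided on the slide)


# The seven log entries from the slide, with made-up payloads.
LOG = [
    Event(0, "db1.foo", "create"),
    Event(1, "db2.foo", "create"),
    Event(2, "db1.foo", "insert", 1, "a"),
    Event(3, "db1.foo", "update", 1, "b"),
    Event(4, "db2.foo", "insert", 5, "c"),
    Event(5, "db1.foo", "insert", 3, "d"),
    Event(6, "db2.foo", "delete", 5),
]


def apply_log(log, target, last_applied=-1):
    """Apply events in seqno order, skipping any that were already applied."""
    for ev in sorted(log, key=lambda e: e.seqno):
        if ev.seqno <= last_applied:
            continue  # already applied, e.g. before a restart
        if ev.op == "create":
            target[ev.table] = {}
        elif ev.op in ("insert", "update"):
            target[ev.table][ev.key] = ev.value
        elif ev.op == "delete":
            target[ev.table].pop(ev.key, None)
        last_applied = ev.seqno
    return last_applied


target = {}
position = apply_log(LOG, target)
print(position, target)
```

Returning the last applied sequence number lets a restarted apply pass the same log again without repeating work.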

Parallel Apply

[Diagram: a Replicator pipeline of three stages, each with Extract, Filter and Apply steps, linking the source replicator, the THL (transactions + metadata), a parallel queue and the Target]
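One way to picture a parallel queue is as a set of channels, where every change for a given shard always lands in the same channel so that its order is preserved. The sketch below assumes sharding by schema name and a CRC32 hash; both choices, and the assign_channels name, are assumptions for illustration rather than a description of the product's internals.

```python
import zlib
from collections import defaultdict

# (seqno, schema, change) tuples modelled on the transaction log slide.
EVENTS = [
    (0, "db1", "create table foo"),
    (1, "db2", "create table foo"),
    (2, "db1", "insert into foo"),
    (3, "db1", "update foo"),
    (4, "db2", "insert into foo"),
    (5, "db1", "insert into foo"),
    (6, "db2", "delete from foo"),
]


def assign_channels(events, channels):
    """Split events into per-channel queues, keeping seqno order per channel."""
    queues = defaultdict(list)
    for seqno, schema, change in sorted(events):
        # A stable hash keeps every change for one schema in one channel.
        channel = zlib.crc32(schema.encode("utf-8")) % channels
        queues[channel].append((seqno, schema, change))
    return dict(queues)


queues = assign_channels(EVENTS, channels=2)
for channel, queue in sorted(queues.items()):
    print(channel, [seqno for seqno, _, _ in queue])
```

Each channel can then be applied by its own thread, because no two channels ever touch the same shard.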

Parallel Extraction for Provisioning

[Diagram: a Replicator pipeline of two stages, each with Extract, Filter and Apply steps, reading from the Source]

Multi-threaded data extraction using flashback queries

Topologies

[Diagram: Replicator topologies]
• Fan-in
• Fan-out
• Multiple targets from one source:
– Other RDBMS versions and OS platforms
– Other RDBMS types
– Non-relational DBMS

We can even divide logs into transaction sequences on keys

Table=db1.foo, key=1
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…

Table=db2.foo, key=5
4 Insert into db2.foo values(5,…)
6 Delete from db2.foo where id=5

Table=db1.foo, key=3
5 Insert into db1.foo values(3,…)

[Diagram: the per-key sequences flow from Source to Target]
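The grouping above can be sketched in a few lines of Python: collect the sequence numbers of the row changes under their (table, key) pair. The tuple layout and the sequences_by_key name are assumptions made for this example.

```python
from collections import defaultdict

# (seqno, table, key, operation) for the row changes in the log slide.
CHANGES = [
    (2, "db1.foo", 1, "insert"),
    (3, "db1.foo", 1, "update"),
    (4, "db2.foo", 5, "insert"),
    (5, "db1.foo", 3, "insert"),
    (6, "db2.foo", 5, "delete"),
]


def sequences_by_key(changes):
    """Group seqnos by (table, key), preserving log order within each group."""
    groups = defaultdict(list)
    for seqno, table, key, _op in sorted(changes):
        groups[(table, key)].append(seqno)
    return dict(groups)


for (table, key), seqnos in sequences_by_key(CHANGES).items():
    print(table, key, seqnos)
```

Within a group the order still matters (an insert must precede its update), while separate groups are independent of each other.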

Ordering transactions around keys enables efficient data warehouse loading

[Diagram: the Replicator reads from the Source DBMS and writes CSV files; a Load Script loads them in parallel into the Hadoop cluster, followed by Map/Reduce view generation]
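As a rough illustration of the CSV step, the sketch below batches row changes into one CSV document per table, ready for a load script to pick up. The column layout (Op, Seqno, ID, Msg) follows the staging tables shown later in this deck; the function name, the in-memory buffers and the sample rows are assumptions, and a real deployment would write files rather than strings.

```python
import csv
import io
from collections import defaultdict


def to_csv_batches(changes):
    """Return {table: csv_text}, one batch per table, ordered by seqno."""
    rows_by_table = defaultdict(list)
    for table, op, seqno, row_id, msg in changes:
        rows_by_table[table].append((op, seqno, row_id, msg))

    batches = {}
    for table, rows in rows_by_table.items():
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerows(sorted(rows, key=lambda r: r[1]))  # sort by seqno
        batches[table] = buf.getvalue()
    return batches


# Hypothetical changes across two tables.
changes = [
    ("db1.foo", "I", 1, 1, "Hello World!"),
    ("db2.foo", "I", 2, 5, "Meet MC"),
    ("db1.foo", "I", 3, 1, "Goodbye World"),
]
for table, text in to_csv_batches(changes).items():
    print(table)
    print(text, end="")
```

Because each batch holds only one table's rows, the batches can be loaded in parallel.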

Data Warehouse Integration and Usage is Changing
• Traditional data warehouse usage was based on dumps from the transactional store, loaded into the data warehouse
• Data warehouse and analytics work was done on the historical data that had been loaded
• Data warehouses often use merged data from multiple sources, which was hard to handle
• Data warehouses are now frequently sources as well as targets for data, i.e.:
– Export data to data warehouse
– Analyze data
– Feed summary data back to application to display stats to users

Modern Data Warehouse Sequences

How do we cope with that model?
• Traditional Extract-Transform-Load (ETL) methods take too long
• Data needs to be replicated into a data warehouse in real-time
• Continuous stream of information
• Replicate everything
• Use data warehouse to provide join and analytics

Data Warehouse Choices
• Oracle
• Hadoop
– General purpose storage platform
– Map Reduce for data processing
– Front-end interfaces for interaction in SQL-like (Hive, HBase, Impala) and non-SQL (Pig, native, Spark)
– JDBC/ODBC interfaces improving
• Vertica
– Massive cluster-based column store
– SQL and ODBC/JDBC interface
• Amazon Redshift
– Highly flexible column store
– Easy to deploy

VMware Continuent Replication (software formerly known as Tungsten Replicator) is a fast, open source database replication engine

Designed for speed and flexibility

Apache V2 license, 100% open source, find it on GitHub

VMware Continuent for Replication/Data Warehouses

[Diagram: Transactional Store and Data Warehouse, linked by Dump/Provision; the "Transactions?" link is marked X]

The Data Warehouse Impedance Mismatch

Transactional and Data Warehouse Metadata
• Replicating data is not just about the data
• Table structures must be replicated too
• ddlscan handles the translation
– Migrates an existing MySQL or Oracle schema into the target schema
– Template based
– Handles underlying data type matches
– Needs to be executed before replication starts
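To make the idea of template-based schema translation concrete, here is a small Python sketch that rewrites column types through a lookup table and fills in a CREATE TABLE template. It does not reproduce ddlscan or its templates: the TYPE_MAP entries, the template text and the translate function are all hypothetical and chosen only for illustration.

```python
from string import Template

# Hypothetical source-to-target type mapping (not ddlscan's actual rules).
TYPE_MAP = {
    "NUMBER": "NUMERIC",
    "VARCHAR2": "VARCHAR",
    "DATE": "TIMESTAMP",
}

# Hypothetical DDL template for the target system.
TABLE_TEMPLATE = Template("CREATE TABLE $schema.$table (\n$columns\n);")


def translate(schema, table, columns):
    """Render target DDL for a list of (column_name, source_type) pairs."""
    lines = []
    for name, source_type in columns:
        # Split "VARCHAR2(80)" into the base type and its size suffix.
        base, paren, rest = source_type.partition("(")
        target_type = TYPE_MAP.get(base.upper(), base) + paren + rest
        lines.append(f"  {name} {target_type}")
    return TABLE_TEMPLATE.substitute(
        schema=schema, table=table, columns=",\n".join(lines)
    )


ddl = translate("db1", "foo", [("id", "NUMBER(10)"), ("msg", "VARCHAR2(80)")])
print(ddl)
```

Keeping the type rules in data rather than code is what makes one tool reusable across several target warehouses.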

Replicating into Vertica

[Diagram labels: Replicator, cpimport, staging]

Replicating into Redshift

[Diagram labels: Replicator, staging]

Replicating into Hadoop

[Diagram labels: Replicator, hadoop fs]

Initial Materialization within Hadoop

[Diagram labels: load-reduce-check, Migrate staging/base DDL, Hive materialization, Staging Table, Base Table]

Ongoing Materialization within Hadoop

[Diagram labels: materialize, Staging Table, Base Table]

Comparing Loading Methods for Hadoop

                         Manual via CSV             Sqoop                         Tungsten Replicator
Process                  Manual/Scripted            Manual/Scripted               Fully Automated
Incremental Loading      Possible with DDL changes  Requires DDL changes          Fully Supported
Latency                  Full-load                  Intermittent                  Real-time
Extraction Requirements  Full table scan            Full and partial table scans  Low-impact CDC/binlog scan

Sqoop and Materialization within Hadoop

[Diagram labels: Staging Table, Base Table, Replicate]

Op  Seqno  ID  Msg
I   1      1   Hello World!
I   2      2   Meet MC
I   3      1   Goodbye World

Op  Seqno  ID  Msg
I   2      2   Meet MC
I   3      1   Goodbye World
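Comparing the two tables above, the second keeps only the row with the highest Seqno for each ID. A minimal Python sketch of that reduction follows; the materialize function and the "D" delete opcode are assumptions (the slide shows only "I" rows), and the real materialization runs inside Hadoop rather than in Python.

```python
# Staging rows as (Op, Seqno, ID, Msg), taken from the first table.
STAGING = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("I", 3, 1, "Goodbye World"),
]


def materialize(staging):
    """Keep the latest row per ID; an assumed "D" opcode removes the row."""
    latest = {}
    for op, seqno, row_id, msg in sorted(staging, key=lambda r: r[1]):
        if op == "D":
            latest.pop(row_id, None)
        else:
            latest[row_id] = (op, seqno, row_id, msg)
    # Present the surviving rows in seqno order, like the second table.
    return sorted(latest.values(), key=lambda r: r[1])


for row in materialize(STAGING):
    print(row)
```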

How the Materialization Works

[Timeline: sequence numbers 1 to 10, with Monday, Wednesday and Friday marked]

Data Warehouse Possibilities: Point in Time Tables

Op  Seqno  ID  Date    Msg
I   1      1   1/6/14  Hello World!
I   2      2   2/6/14  Meet MC
I   3      1   2/6/14  Goodbye World
I   4      1   3/6/14  Hello Tuesday
I   4      2   3/6/14  Ruby Wednesday
I   5      1   4/6/14  Final Count

ID  Date    Msg
1   1/6/14  Hello World!
1   2/6/14  Goodbye World
1   3/6/14  Hello Tuesday
1   4/6/14  Final Count
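Because every change is kept with its sequence number, both views can be derived from the same rows. The Python sketch below shows the full history of one ID (the second table above) and the state of all IDs as of a chosen sequence number. The function names are assumptions, and dates are carried as plain strings to avoid guessing their format.

```python
# Change rows as (Op, Seqno, ID, Date, Msg), taken from the first table.
CHANGES = [
    ("I", 1, 1, "1/6/14", "Hello World!"),
    ("I", 2, 2, "2/6/14", "Meet MC"),
    ("I", 3, 1, "2/6/14", "Goodbye World"),
    ("I", 4, 1, "3/6/14", "Hello Tuesday"),
    ("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    ("I", 5, 1, "4/6/14", "Final Count"),
]


def history(changes, row_id):
    """Every recorded version of one ID, in seqno order."""
    ordered = sorted(changes, key=lambda r: r[1])
    return [(rid, date, msg) for _, _, rid, date, msg in ordered if rid == row_id]


def as_of(changes, max_seqno):
    """Latest message per ID, using only changes up to max_seqno."""
    state = {}
    for _, seqno, rid, _, msg in sorted(changes, key=lambda r: r[1]):
        if seqno <= max_seqno:
            state[rid] = msg
    return state


print(history(CHANGES, 1))
print(as_of(CHANGES, 3))
```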

Data Warehouse Possibilities: Time Series Generation

Wrap-up
• VMware Continuent Replication provides robust, flexible capabilities that have been battle-tested in demanding customer environments
• Replication features compare favorably to Oracle GoldenGate and Data Guard
• VMware Continuent handles HA/DR, data warehouse loading, and edge application use cases

For more information, contact us:

Robert Noyes, Alliance Manager, AMER & LATAM, rnoyes@vmware.com, +1 (650) 575-0958

Philippe Bernard, Alliance Manager, EMEA & APJ, pbernard@vmware.com, +41 79 347 1385

MC Brown, Senior Product Line Manager, mcb@vmware.com

Eero Teerikorpi, Sr. Director, Strategic Alliance, eteerikorpi@vmware.com, +1 (408) 431-3305

www.vmware.com/products/continuent
