© 2015 VMware Inc. All rights reserved.
VMware Continuent Replication: Replicate from Oracle to data warehouses and analytics
MC Brown Senior Product Line Manager October 22nd, 2015
Agenda
1 Introduction to VMware Continuent
2 Understanding VMware Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
Introducing VMware Continuent
Database Clustering
• Business continuity for business-critical MySQL database applications
• Commercial-grade multi-site HA/DR
• Products: MySQL Single Site HA; MySQL Multi-Site HA and DR
Data Replication
• Flexible, high-performance replication for Oracle and MySQL
• Simple data loading into analytics and big data
• Sources and targets: Oracle to Oracle and MySQL; MySQL (+ MariaDB, Percona Server) to Oracle and MySQL; Oracle and MySQL to Hadoop, Redshift, Vertica
Replication solves important problems for RDBMS users
• Real-time local copies in case the DBMS fails
• Real-time remote copies in case the site fails
• Loading data quickly into analytic systems
• Feeding edge applications from the Oracle mother ship
• Migrating from Oracle to:
  – New Oracle versions
  – Less expensive editions
  – Non-Oracle DBMS
CONFIDENTIAL
2 Understanding VMware Continuent Replication
VMware Continuent implements flexible, high-performance replication for Oracle and MySQL
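[Diagram: on the primary (source), the replicator reads the DBMS logs of the MySQL source and writes the transactions plus metadata to its Transaction History Log (THL). The secondary (target) replicator downloads the THL via the network or from the file system into its own THL, then applies the transactions to the target MySQL database using JDBC. Low latency transfer; low application impact.]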
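The download-and-apply flow above can be sketched as a tiny model. This is a hypothetical illustration, not the replicator's code: each transaction carries an increasing sequence number (seqno), and the secondary records the last seqno it applied, so after a restart it resumes from that position instead of reapplying everything. The `ThlEvent` and `Secondary` names are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ThlEvent:
    """One transaction in a THL-like log (hypothetical model)."""
    seqno: int
    sql: str


@dataclass
class Secondary:
    """Target-side applier that remembers its last applied seqno."""
    applied: list = field(default_factory=list)
    last_seqno: int = -1

    def apply_from(self, thl):
        # Walk the log in seqno order, skipping anything already applied.
        for event in sorted(thl, key=lambda e: e.seqno):
            if event.seqno <= self.last_seqno:
                continue
            self.applied.append(event.sql)  # stand-in for a JDBC apply
            self.last_seqno = event.seqno   # record the new restart position


thl = [ThlEvent(0, "create table db1.foo"), ThlEvent(1, "insert into db1.foo")]
secondary = Secondary()
secondary.apply_from(thl)
thl.append(ThlEvent(2, "update db1.foo"))
secondary.apply_from(thl)  # resumes at seqno 2; nothing is applied twice
print(secondary.last_seqno, len(secondary.applied))  # → 2 3
```

In a real deployment the position would have to be stored durably alongside the applied data (the deck notes that replicator metadata is stored in the DBMS); here it is just an attribute.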
VMware Continuent captures transactions directly from Oracle REDO logs
[Diagram: on the Oracle DBMS host, a local process captures raw transactions from the REDO logs into a staging area for REDO log data. On the replicator host, the primary replicator also captures the data dictionary from the source, converts the raw transactions to serialized row changes and DDL, and writes them, with metadata, to the THL for the secondary. Low impact, high performance.]
• Source Oracle DBMS requirements:
  – Supplemental logging
  – Archive logs
  – Replicator metadata stored in DBMS
  – Replicator login with access to catalogs and flashback query
  – Local process to read REDO logs
• Target Oracle DBMS requirements:
  – Replicator metadata stored in DBMS
Transaction Based Replication
Transaction Log (Row changes + Statements), read on the source and applied in order on the target:
0 Create table db1.foo
1 Create table db2.foo
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
4 Insert into db2.foo values(5,…)
5 Insert into db1.foo values(3,…)
6 Delete from db2.foo where id=5
Parallel Apply
[Diagram: transactions from the source replicator's THL enter a parallel queue (transactions + metadata) in the target replicator pipeline. The pipeline has three stages, each made of Extract, Filter, and Apply; the final stage runs several Extract/Filter/Apply channels in parallel against the target.]
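One way to apply in parallel without breaking consistency is to split the stream into shards that can be applied independently while keeping commit order inside each shard. The sketch below assumes, for illustration only, that the shard is the database (schema) name, and routes the seqnos of the sample transaction log shown earlier into per-schema queues that worker threads apply concurrently. `partition_by_schema` and `apply_queue` are hypothetical helpers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# The sample log as (seqno, schema) pairs, in source commit order.
LOG = [(0, "db1"), (1, "db2"), (2, "db1"), (3, "db1"),
       (4, "db2"), (5, "db1"), (6, "db2")]


def partition_by_schema(log):
    """Split a commit-ordered log into per-schema queues, order preserved."""
    queues = defaultdict(list)
    for seqno, schema in log:
        queues[schema].append(seqno)
    return dict(queues)


def apply_queue(seqnos):
    # Stand-in for applying each transaction of one shard, in order.
    return [f"applied {n}" for n in seqnos]


queues = partition_by_schema(LOG)
with ThreadPoolExecutor(max_workers=len(queues)) as pool:
    # One worker per shard; pool.map returns results in input order.
    results = dict(zip(queues, pool.map(apply_queue, queues.values())))

print(queues)  # → {'db1': [0, 2, 3, 5], 'db2': [1, 4, 6]}
```

Across shards the target no longer sees the exact source commit order, so this only works if no transaction touches more than one shard; the sketch does not handle that case.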
Parallel Extraction for Provisioning
[Diagram: multi-threaded data extraction using flashback queries reads the source; the rows pass through a two-stage replicator pipeline (Extract, Filter, Apply) and are written to the THL as transactions plus metadata.]
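Multi-threaded extraction with flashback queries can be pictured as query generation: pin one system change number (SCN) and give each thread its own primary-key range read `AS OF SCN` that value, so the chunks together form one consistent snapshot. Oracle's `AS OF SCN` flashback clause is real; the table, key column, chunk size, SCN value, and the `chunk_queries` helper are hypothetical, and the replicator's actual chunking may differ.

```python
def chunk_queries(table, key, min_id, max_id, chunk_size, scn):
    """Build one flashback query per key range, all pinned to the same SCN."""
    queries = []
    low = min_id
    while low <= max_id:
        high = min(low + chunk_size - 1, max_id)
        queries.append(
            f"SELECT * FROM {table} AS OF SCN {scn} "
            f"WHERE {key} BETWEEN {low} AND {high}"
        )
        low = high + 1
    return queries


# Each query would go to a separate extraction thread.
for q in chunk_queries("db1.foo", "id", 1, 250, 100, 4711):
    print(q)
```

Because every chunk reads as of the same SCN, rows changed while the extraction runs do not produce an inconsistent mix of old and new versions across chunks.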
Topologies
[Diagram: Fan-in, two replicators feeding one replicator; Fan-out, one replicator feeding two replicators.]
Multiple Targets
[Diagram: one source replicator feeding three target replicators: other RDBMS versions and OS platforms, other RDBMS types, and non-relational DBMS.]
We can even divide logs into transaction sequences on keys
Table=db1.foo, key=1
  2 Insert into db1.foo values(1, …
  3 Update db1.foo where id=1…
Table=db2.foo, key=5
  4 Insert into db2.foo values(5,…)
  6 Delete from db2.foo where id=5
Table=db1.foo, key=3
  5 Insert into db1.foo values(3,…)
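Dividing the log into sequences on keys amounts to grouping row changes by (table, key) while keeping their source order inside each group. A minimal sketch over the row changes of the sample log, assuming each change carries its table and primary-key value (the DDL at seqnos 0 and 1 is left out); `sequences_by_key` is a hypothetical helper.

```python
# Row changes from the sample log: (seqno, table, key, operation).
CHANGES = [
    (2, "db1.foo", 1, "insert"),
    (3, "db1.foo", 1, "update"),
    (4, "db2.foo", 5, "insert"),
    (5, "db1.foo", 3, "insert"),
    (6, "db2.foo", 5, "delete"),
]


def sequences_by_key(changes):
    """Group changes by (table, key), keeping source order in each group."""
    groups = {}
    for seqno, table, key, op in changes:
        groups.setdefault((table, key), []).append((seqno, op))
    return groups


groups = sequences_by_key(CHANGES)
for (table, key), seq in groups.items():
    print(f"Table={table}, key={key}: {seq}")

# Assuming full row images, only the last operation per key decides the
# final warehouse state; db2.foo key 5 ends in a delete, so it loads nothing.
net = {k: seq[-1][1] for k, seq in groups.items()}
```

This is what makes batch loading efficient: intermediate versions of a key can be collapsed before the data reaches the warehouse.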
Ordering transactions around keys enables efficient data warehouse loading
[Diagram: the replicator reads changes from the source DBMS and writes them as CSV files; a load script loads the CSV files into the Hadoop cluster in parallel, where Map/Reduce generates the views.]
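The CSV side of this diagram can be sketched as batching row changes into one CSV document per table, each row prefixed with its operation code and seqno so that a load script can place it into a staging table. The column layout (opcode, seqno, then row values) mirrors the staging tables shown later in this deck; the sample rows, the in-memory `io.StringIO` buffers standing in for files, and the `write_batches` helper are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical row changes: (table, opcode, seqno, row values).
ROWS = [
    ("db1.foo", "I", 1, [1, "Hello World!"]),
    ("db1.foo", "I", 2, [2, "Meet MC"]),
    ("db1.foo", "D", 3, [1, None]),  # a delete only needs the key
    ("db1.foo", "I", 3, [1, "Goodbye World"]),
    ("db2.foo", "I", 4, [5, "example"]),
]


def write_batches(rows):
    """Group row changes into one CSV document per table."""
    buffers = defaultdict(io.StringIO)  # stands in for one file per table
    for table, opcode, seqno, values in rows:
        csv.writer(buffers[table]).writerow([opcode, seqno, *values])
    return {table: buf.getvalue() for table, buf in buffers.items()}


batches = write_batches(ROWS)
print(batches["db1.foo"])
```

Keeping one file per table lets each table be loaded by its own loader in parallel, matching the "Parallel loading" label on the slide.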
3 Using Analytics and Data Warehouses
Data Warehouse Integration and Usage is Changing
• Traditional data warehouse usage was based on dumps from the transactional store, loaded into the data warehouse
• Data warehousing and analytics were done on historical data once it had been loaded
• Data warehouses often use merged data from multiple sources, which was hard to handle
• Data warehouses are now frequently sources as well as targets for data, i.e.:
  – Export data to the data warehouse
  – Analyze the data
  – Feed summary data back to the application to display stats to users
Modern Data Warehouse Sequences
How do we cope with that model?
• Traditional Extract-Transform-Load (ETL) methods take too long
• Data needs to be replicated into the data warehouse in real time
• Continuous stream of information
• Replicate everything
• Use the data warehouse to provide joins and analytics
Data Warehouse Choices
• Oracle
• Hadoop
  – General purpose storage platform
  – MapReduce for data processing
  – Front-end interfaces for SQL-like (Hive, HBase, Impala) and non-SQL (Pig, native, Spark) interaction
  – JDBC/ODBC interfaces improving
• Vertica
  – Massive cluster-based column store
  – SQL and ODBC/JDBC interface
• Amazon Redshift
  – Highly flexible column store
  – Easy to deploy
VMware Continuent for Replication/Data Warehouses
VMware Continuent Replication (software formerly known as Tungsten Replicator) is a fast, open source database replication engine
• Designed for speed and flexibility
• Apache V2 license, 100% open source; find it on GitHub
The Data Warehouse Impedance Mismatch
[Diagram: transactional store to data warehouse. Dump/provision and batch loading connect the two; replicating individual transactions directly is marked with an X.]
Transactional and Data Warehouse Metadata
• Replicating data is not just about the data
• Table structures must be replicated too
• ddlscan handles the translation
  – Migrates an existing MySQL or Oracle schema into the target schema
  – Template based
  – Handles underlying data type matches
  – Needs to be executed before replication starts
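What ddlscan does can be illustrated with a toy template-based translator: read the source column definitions, map each data type to a target type, and render a CREATE TABLE statement through a template. Everything here is hypothetical and much simpler than ddlscan's real templates: the Hive-style type map, the extra `opcode`/`seqno` staging columns, the `staging_` prefix, and the `render_ddl` function.

```python
from string import Template

# Hypothetical source-to-Hive type map; real templates cover far more types.
TYPE_MAP = {
    "int": "INT",
    "bigint": "BIGINT",
    "varchar": "STRING",
    "text": "STRING",
    "datetime": "TIMESTAMP",
}

TABLE_TEMPLATE = Template("CREATE TABLE $name (\n$columns\n)")


def render_ddl(table, columns):
    """Render staging-table DDL from (column name, source type) pairs."""
    lines = ["  opcode STRING", "  seqno BIGINT"]  # hypothetical change metadata
    for name, source_type in columns:
        base_type = source_type.split("(")[0].lower()  # varchar(64) -> varchar
        # An unmapped type raises KeyError rather than guessing a target type.
        lines.append(f"  {name} {TYPE_MAP[base_type]}")
    return TABLE_TEMPLATE.substitute(
        name=f"staging_{table}", columns=",\n".join(lines)
    )


print(render_ddl("foo", [("id", "int"), ("msg", "varchar(64)")]))
```

Running the translation up front is what lets replication start with matching staging and base tables already in place.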
Replicating into Vertica
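[Diagram: the source replicator feeds the target replicator, which writes CSV files; a JavaScript (JS) batch script loads them into staging tables using JDBC and cpimport, then merges the staging tables into the base tables.]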
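The staging, base, and merge steps (used for both Vertica and Redshift in these slides) can be sketched in plain SQL. Purely to stay self-contained, this sketch runs against an in-memory SQLite database; the table and column names are hypothetical, and the actual load scripts may phrase the merge differently. Each batch lands in a staging table with an opcode and seqno; the merge deletes every base row whose key appears in the batch, then inserts the newest inserted version of each key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base (id INTEGER PRIMARY KEY, msg TEXT);
    CREATE TABLE staging (opcode TEXT, seqno INTEGER, id INTEGER, msg TEXT);
    INSERT INTO base VALUES (1, 'Hello World!');
    INSERT INTO staging VALUES
        ('I', 2, 2, 'Meet MC'),
        ('D', 3, 1, NULL),
        ('I', 3, 1, 'Goodbye World');
""")

# Assumes, as in the deck's example, that within one seqno a key's delete
# precedes its insert (an update arrives as delete + insert).
MERGE = """
    DELETE FROM base WHERE id IN (SELECT id FROM staging);
    INSERT INTO base (id, msg)
    SELECT s.id, s.msg FROM staging s
    WHERE s.opcode = 'I'
      AND s.seqno = (SELECT MAX(t.seqno) FROM staging t WHERE t.id = s.id);
    DELETE FROM staging;
"""
conn.executescript(MERGE)
print(conn.execute("SELECT id, msg FROM base ORDER BY id").fetchall())
# → [(1, 'Goodbye World'), (2, 'Meet MC')]
```

Merging whole batches this way suits column stores, which handle a few large set-based statements far better than many single-row updates.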
Replicating into Redshift
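[Diagram: the source replicator feeds the target replicator, which writes CSV files; a JavaScript (JS) batch script uploads them to Amazon S3 with s3cmd, loads them into staging tables with COPY over JDBC, and merges the staging tables into the base tables.]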
Replicating into Hadoop
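[Diagram: the source replicator feeds the target replicator, which writes CSV files; a JavaScript (JS) batch script copies them into Hadoop with hadoop fs.]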
Initial Materialization within Hadoop
[Diagram: load-reduce-check migrates the staging and base table DDL; Hive materialization then builds the base table from the CSV data in the staging table.]
Ongoing Materialization within Hadoop
[Diagram: the materialize step runs Hive materialization from the CSV data in the staging table into the base table.]
Comparing Loading Methods for Hadoop

                         Manual via CSV              Sqoop                          Tungsten Replicator
Process                  Manual/Scripted             Manual/Scripted                Fully Automated
Incremental Loading      Possible with DDL changes   Requires DDL changes           Fully Supported
Latency                  Full-load                   Intermittent                   Real-time
Extraction Requirements  Full table scan             Full and partial table scans   Low-impact CDC/binlog scan
Sqoop and Materialization within Hadoop
[Diagram: data arrives via Sqoop and via replication (Replicate) as CSV in the staging table; Hive materialization builds the base table.]
How the Materialization Works
Staging table:
Op  Seqno  ID  Msg
I   1      1   Hello World!
I   2      2   Meet MC
D   3      1
I   3      1   Goodbye World
Materialized base table:
Op  Seqno  ID  Msg
I   2      2   Meet MC
I   3      1   Goodbye World
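The materialization in this example can be reproduced with a short replay: process the staging rows in seqno order, treating `D` as removing a key and `I` as writing its new row. This sketch assumes, as in the example, that an update arrives as a delete followed by an insert with the same seqno, listed in apply order; `materialize` is a hypothetical name, not the Hive job itself.

```python
# Staging rows from the example: (op, seqno, id, msg).
STAGING = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]


def materialize(staging):
    """Replay staging rows to get the current row for each ID."""
    base = {}
    # sorted() is stable, so rows sharing a seqno keep their listed order.
    for op, seqno, row_id, msg in sorted(staging, key=lambda r: r[1]):
        if op == "D":
            base.pop(row_id, None)
        else:
            base[row_id] = ("I", seqno, row_id, msg)
    return sorted(base.values(), key=lambda r: r[1])


for row in materialize(STAGING):
    print(*row)
# I 2 2 Meet MC
# I 3 1 Goodbye World
```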
Data Warehouse Possibilities: Point in Time Tables
[Figure: a timeline of sequence numbers 1 to 45, with points in time marked at Monday, Wednesday, and Friday.]
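Because the staging data keeps every change with its seqno, a point-in-time table can be seen as the same materialization stopped at a chosen seqno, for example the last seqno loaded by Monday. A minimal sketch reusing the staging rows from the materialization example; the cutoff values and the `as_of` helper are illustrative assumptions.

```python
# Staging rows from the materialization example: (op, seqno, id, msg).
STAGING = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]


def as_of(staging, max_seqno):
    """Materialize only the changes with seqno <= max_seqno."""
    base = {}
    for op, seqno, row_id, msg in sorted(staging, key=lambda r: r[1]):
        if seqno > max_seqno:
            break  # everything after the cutoff is ignored
        if op == "D":
            base.pop(row_id, None)
        else:
            base[row_id] = msg
    return dict(sorted(base.items()))


print(as_of(STAGING, 2))  # → {1: 'Hello World!', 2: 'Meet MC'}
print(as_of(STAGING, 3))  # → {1: 'Goodbye World', 2: 'Meet MC'}
```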
Data Warehouse Possibilities: Time Series Generation
Staging table:
Op  Seqno  ID  Date    Msg
I   1      1   1/6/14  Hello World!
I   2      2   2/6/14  Meet MC
I   3      1   2/6/14  Goodbye World
I   4      1   3/6/14  Hello Tuesday
I   4      2   3/6/14  Ruby Wednesday
I   5      1   4/6/14  Final Count
Time series for ID 1:
ID  Date    Msg
1   1/6/14  Hello World!
1   2/6/14  Goodbye World
1   3/6/14  Hello Tuesday
1   4/6/14  Final Count
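The time series in this example is a filter rather than a materialization: keep every inserted version of one key, ordered by seqno, instead of only the latest. A sketch using the rows from the table above; `history` is a hypothetical helper name.

```python
# Staging rows from the example: (op, seqno, id, date, msg).
STAGING = [
    ("I", 1, 1, "1/6/14", "Hello World!"),
    ("I", 2, 2, "2/6/14", "Meet MC"),
    ("I", 3, 1, "2/6/14", "Goodbye World"),
    ("I", 4, 1, "3/6/14", "Hello Tuesday"),
    ("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    ("I", 5, 1, "4/6/14", "Final Count"),
]


def history(staging, key):
    """Return (ID, Date, Msg) for every inserted version of one key."""
    rows = sorted(
        (r for r in staging if r[0] == "I" and r[2] == key),
        key=lambda r: r[1],
    )
    return [(row_id, date, msg) for _, _, row_id, date, msg in rows]


for row in history(STAGING, 1):
    print(*row)
```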
4 Wrap-up and Questions
Wrap-up
• VMware Continuent Replication provides robust, flexible capabilities that have been battle-tested in demanding customer environments
• Replication features compare favorably to Oracle GoldenGate and Data Guard
• VMware Continuent handles HA/DR, data warehouse loading, and edge application use cases
For more information, contact us:
• Robert Noyes, Alliance Manager, AMER & LATAM, [email protected], +1 (650) 575-0958
• Philippe Bernard, Alliance Manager, EMEA & APJ, [email protected], +41 79 347 1385
• MC Brown, Senior Product Line Manager, [email protected]
• Eero Teerikorpi, Sr. Director, Strategic Alliance, [email protected], +1 (408) 431-3305
www.vmware.com/products/continuent