Sharing data between different DBMS types is an inevitable need in today's diverse IT environments. Real-time data integration, seamless migration and data warehousing are the main needs driving demand for heterogeneous replication. In this talk we'll review how the open source Tungsten Replicator can replicate data in real time between databases like MySQL, PostgreSQL, Oracle, MongoDB and others. Join us for this technical and enlightening talk. We'll cover the fundamental steps behind configuring heterogeneous replication, the importance of transaction-transforming filters, and common challenges that arise when replicating across DBMS types. We'll conclude with in-line demos to show you how it looks in action.
© Continuent 2010
Linas Virbalas Continuent, Inc.
/ Definition & Motivation
/ Scoping the Challenge
/ MySQL ->
• PostgreSQL
• Oracle
• MongoDB
/ Demo 1
/ PostgreSQL ->
• MySQL
/ Demo 2
/ Q&A
Heterogeneous Replication
Replication between different types of DBMS
1. Real-time integration of data between different DBMS types
2. Seamless migration out of one DBMS type to another
3. Data warehousing (real-time) from different DBMS types
4. Leveraging specific SQL power of other DBMS types
/ Name: Linas Virbalas
/ Country: Lithuania
/ Implementing for Tungsten:
• MySQL -> PostgreSQL
• MySQL -> Greenplum
• MySQL -> Oracle
• PostgreSQL WAL
• PostgreSQL Streaming Replication
• PostgreSQL Logical Replication via Slony logs
/ Blog: http://flyingclusters.blogspot.com
1. MySQL -> …
• Replicating from MySQL to PostgreSQL/Greenplum, Oracle, MongoDB
2. PostgreSQL -> …
• Replicating from PostgreSQL to MySQL
With Tungsten Replicator
/ Open Source GPL v2
/ Java
/ Interfaces to implement new:
• Extractors
• Filters
• Appliers
/ Multiple replication services per process
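The three plug-in points listed above can be pictured as stages of one pipeline: an extractor produces events, filters transform or drop them, and an applier writes them to the target. The Python sketch below only illustrates that idea. Tungsten's real interfaces are Java, and every name here (`Event`, `run_pipeline`, `drop_schema`, `rename_schema`) is hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one replicated transaction."""
    seqno: int
    schema: str
    sql: str


# A filter returns a (possibly changed) event, or None to drop it.
Filter = Callable[[Event], Optional[Event]]


def run_pipeline(extractor: Iterable[Event],
                 filters: List[Filter],
                 applier: Callable[[Event], None]) -> int:
    """Pass each extracted event through the filters in order, then apply it.

    Returns the number of events that reached the applier.
    """
    applied = 0
    for event in extractor:
        current: Optional[Event] = event
        for flt in filters:
            current = flt(current)
            if current is None:      # a filter dropped the event
                break
        if current is not None:
            applier(current)
            applied += 1
    return applied


def drop_schema(name: str) -> Filter:
    """Build a filter that discards every event for the given schema."""
    def flt(event: Event) -> Optional[Event]:
        return None if event.schema == name else event
    return flt


def rename_schema(old: str, new: str) -> Filter:
    """Build a filter that rewrites the schema name on matching events."""
    def flt(event: Event) -> Optional[Event]:
        return replace(event, schema=new) if event.schema == old else event
    return flt


if __name__ == "__main__":
    source = [
        Event(1, "test", "INSERT INTO t VALUES (1)"),
        Event(2, "shop", "INSERT INTO t VALUES (2)"),
    ]
    target: List[Event] = []
    run_pipeline(source, [drop_schema("test"), rename_schema("shop", "store")],
                 target.append)
    print(target)
```

Here a plain list plays the extractor and `list.append` plays the applier, which keeps the sketch runnable without any database.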
Technology: Replication Pipelines
/ Statement Based Replication
/ Row Based Replication
[Diagram: Master Replicator (MySQL Extractor, Filters, Transaction History Log) -> Slave Replicator (Transaction History Log, Filters, PostgreSQL Applier)]
/ Provisioning
/ Data Type Differences
/ Database vs. Schema
/ Default (Implicitly Defined) Schema Selection
/ SQL Dialect Differences
• Statement Replication vs. Row Replication
/ Character Sets and Binary Data
/ Old Versions of MySQL
Provisioning
/ Harder way: Dump data explicitly
/ Easier way: Replicate a mysqldump backup
  MySQL                PostgreSQL
! TINYINT              SMALLINT
  SMALLINT             SMALLINT
  INTEGER              INTEGER
  BIGINT               BIGINT
! CHAR(1)              CHAR(5) = {'true', 'false'}
  CHAR(x)              CHAR(x)
  VARCHAR(x)           VARCHAR(x)
  DATE                 DATE
  TIMESTAMP            TIMESTAMP
! TEXT (diff. sizes)   TEXT
! BLOB                 BYTEA
  …
/ Note the type differences between MySQL and PG (the rows marked "!" differ)
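A type mapping like the one above is naturally expressed as a lookup table. The sketch below is a minimal illustration in Python, not Tungsten code. `MYSQL_TO_POSTGRESQL` and `map_type` are hypothetical names, and the mapping covers only the rows shown on the slide. The CHAR(1) to CHAR(5) boolean-like case is left out because it depends on what the column actually stores.

```python
# Illustrative mapping taken from the slide's table; not an exhaustive list.
MYSQL_TO_POSTGRESQL = {
    "TINYINT": "SMALLINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "TEXT": "TEXT",
    "BLOB": "BYTEA",
}


def map_type(mysql_type: str) -> str:
    """Return a PostgreSQL type name for a MySQL type name.

    CHAR(x) and VARCHAR(x) keep their length; for other types any
    parenthesised suffix such as a display width is dropped.
    """
    base, paren, rest = mysql_type.strip().partition("(")
    base = base.strip().upper()
    if base in ("CHAR", "VARCHAR"):
        return base + paren + rest
    if base not in MYSQL_TO_POSTGRESQL:
        raise ValueError(f"no mapping known for {mysql_type!r}")
    return MYSQL_TO_POSTGRESQL[base]


if __name__ == "__main__":
    for t in ("TINYINT", "blob", "VARCHAR(25)"):
        print(t, "->", map_type(t))
```

Raising on an unknown type, rather than passing it through, makes a gap in the mapping visible before the DDL reaches the target database.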
Database vs. Schema
/ In MySQL these are the same:
CREATE DATABASE foo
CREATE SCHEMA foo
/ In PostgreSQL these are very different:
CREATE DATABASE foo
CREATE SCHEMA foo
/ Tungsten uses filters to map MySQL databases to PostgreSQL schemas
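The database-to-schema mapping mentioned above could be done by a statement-rewriting filter along these lines. This is a hedged Python sketch, not the filter Tungsten ships. `database_to_schema` is a hypothetical name, and the sketch handles only a leading `CREATE DATABASE`.

```python
import re

# Matches CREATE DATABASE only at the start of a statement, so that
# identifiers such as "database_log" elsewhere are left alone.
_CREATE_DATABASE = re.compile(r"^\s*CREATE\s+DATABASE\b", re.IGNORECASE)


def database_to_schema(sql: str) -> str:
    """Rewrite a MySQL CREATE DATABASE statement as CREATE SCHEMA."""
    return _CREATE_DATABASE.sub("CREATE SCHEMA", sql, count=1)


if __name__ == "__main__":
    print(database_to_schema("CREATE DATABASE foo"))
    print(database_to_schema("CREATE TABLE database_log (i int)"))
```

Statements that do not start with `CREATE DATABASE` pass through unchanged.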
MySQL Implicit             MySQL Explicit
CREATE SCHEMA s;           CREATE SCHEMA s;
USE s;
CREATE TABLE t (i int);    CREATE TABLE s.t (i int);
INSERT INTO t VALUES (1);  INSERT INTO s.t VALUES (1);
/ MySQL: Trivial to use `USE`
/ MySQL: Going without `USE` generates different events
/ PG: Extract the default schema from the event
/ PG: Set it before applying
MySQL      PostgreSQL
USE s;  >  SET search_path TO s, "$user";
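The two PG steps above (take the default schema from the event, set it before applying) can be sketched as follows. This is an illustration in Python under stated assumptions: `search_path_statement` and `statements_for_event` are hypothetical names, and the event is reduced to a default schema plus one SQL string.

```python
from typing import List, Optional


def search_path_statement(default_schema: str) -> str:
    """Build the PostgreSQL statement that mirrors MySQL's USE <schema>.

    The schema name is emitted as a quoted identifier, with embedded
    double quotes doubled, so unusual names cannot break the statement.
    """
    quoted = '"' + default_schema.replace('"', '""') + '"'
    return f'SET search_path TO {quoted}, "$user"'


def statements_for_event(default_schema: Optional[str], sql: str) -> List[str]:
    """Return the statements a hypothetical applier would run for one event."""
    if default_schema:
        return [search_path_statement(default_schema), sql]
    return [sql]


if __name__ == "__main__":
    for stmt in statements_for_event("s", "INSERT INTO t VALUES (1)"):
        print(stmt)
```

Unlike the slide, this sketch quotes the schema name. In PostgreSQL quoted identifiers are case-sensitive, so whether to quote is a design choice that depends on how schema names were created on the target.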
MySQL:      CREATE TABLE complex (id INTEGER AUTO_INCREMENT PRIMARY KEY, i INT);
PostgreSQL: CREATE TABLE complex (id SERIAL PRIMARY KEY, i INT);
MySQL:      CREATE TABLE dt (i TINYINT);
PostgreSQL: CREATE TABLE dt (i SMALLINT);
…
/ SQL dialects differ for both DDL and DML statements
/ Row Replication resolves issues arising from differences in DML, but still leaves DDL to handle
/ Tungsten Replicator Filters come to the rescue!
• Simple to develop Java or JavaScript extensions
• Event structure IN -> Filter -> Event structure OUT
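A DDL-translating filter for the two examples above might look like the sketch below. It is written in Python purely for illustration (the slide names Java and JavaScript as the real extension languages). `translate_ddl` and its rule list are hypothetical, and plain regular expressions like these would also rewrite matches inside string literals, so this is a starting point rather than a complete translator.

```python
import re

# Ordered rewrite rules: (pattern, replacement). Only the two cases
# shown on the slide are covered.
_RULES = [
    # INTEGER AUTO_INCREMENT or INT AUTO_INCREMENT becomes SERIAL
    (re.compile(r"\b(?:INTEGER|INT)\s+AUTO_INCREMENT\b", re.IGNORECASE), "SERIAL"),
    # TINYINT, with an optional display width such as (4), becomes SMALLINT
    (re.compile(r"\bTINYINT\b(?:\s*\(\s*\d+\s*\))?", re.IGNORECASE), "SMALLINT"),
]


def translate_ddl(sql: str) -> str:
    """Apply each rewrite rule in order to one MySQL DDL statement."""
    for pattern, replacement in _RULES:
        sql = pattern.sub(replacement, sql)
    return sql


if __name__ == "__main__":
    print(translate_ddl(
        "CREATE TABLE complex (id INTEGER AUTO_INCREMENT PRIMARY KEY, i INT);"))
    print(translate_ddl("CREATE TABLE dt (i TINYINT);"))
```

Keeping the rules in a list makes the "event in, filter, event out" shape easy to extend one dialect difference at a time.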
MySQL:      INSERT INTO embedded_blob (key, data) VALUES (1, '?\0^Es\0^\0\'')
PostgreSQL: ARGH!!! (SQL statement fails)
MySQL:      create table xlate(id int, d1 varchar(25) character set latin1, d2 varchar(25) character set utf8);
PostgreSQL: ARGH!!! (no way to translate to common charset)
/ Statement replication: MySQL syntax is "permissive"
/ Embedded binary / alternate charsets
/ Different charsets for different clients
/ Row replication: database/table/column charsets may differ
/ Answer: Stick with one character set throughout; use row replication to move binary data
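The reason mixed character sets and embedded binary data cause trouble can be shown with a few bytes. The Python sketch below is only an illustration: `is_valid_in` and `transcode` are hypothetical helpers, and the sample byte strings are made up. It shows that bytes written as latin1 are not automatically valid UTF-8, and that arbitrary binary data has no text encoding to convert from at all.

```python
def is_valid_in(raw: bytes, charset: str) -> bool:
    """Report whether raw bytes decode cleanly in the given charset."""
    try:
        raw.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


def transcode(raw: bytes, source_charset: str, target_charset: str = "utf-8") -> bytes:
    """Convert text bytes between charsets; the source charset must be known."""
    return raw.decode(source_charset).encode(target_charset)


# The bytes a latin1 column would hold for the word "café".
latin1_text = "caf\u00e9".encode("latin-1")

# A made-up binary payload; it is not text in any charset.
blob = bytes([0x3F, 0x00, 0x05, 0x73, 0x00, 0xFF])

if __name__ == "__main__":
    print(is_valid_in(latin1_text, "utf-8"))   # latin1 bytes read as UTF-8
    print(transcode(latin1_text, "latin-1"))   # works once the source is known
    print(is_valid_in(blob, "utf-8"))          # binary data is not UTF-8 text
```

Transcoding works only when the replicator knows the source charset of every value, which is why a single charset throughout is the simpler rule.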
MySQL Versions
/ Problem: Data stored on hard-to-replicate MySQL versions or configurations
• Row replication not enabled (5.1)
• No row replication support (5.0, 4.1)
• Tungsten cannot read binlog (4.1)
/ Answer: MySQL blackhole replication
• (Blackhole = no store, just a binlog)
• Caveat: Check MySQL docs carefully
[Diagram: Master Replicator (MySQL Extractor, Filters, Transaction History Log) -> Slave Replicator (Transaction History Log, Filters, Oracle Applier)]
/ TEXT length limitation
• VARCHAR(4000) => CLOB
/ Primary Keys and PrimaryKeyFilter
• Goal:
UPDATE t SET c1 = x1, c2 = x2, c3 = x3 WHERE p = p1
• NOT:
UPDATE t SET c1 = x1, c2 = x2, c3 = x3 WHERE p = p1 AND c1 = x1 AND c2 = x2 AND c3 = x3 AND …
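The goal stated above, an UPDATE whose WHERE clause names only the primary key, can be sketched as follows. The slides do not show how PrimaryKeyFilter works internally, so this Python function only illustrates the intended result. `build_update` is a hypothetical name, and the `?` placeholder style is an assumption.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_update(table: str,
                 key_columns: Sequence[str],
                 old_row: Dict[str, Any],
                 new_row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Return (sql, params) for an UPDATE that filters on key columns only.

    Non-key columns go into SET with their new values; key columns go
    into WHERE with their old values.
    """
    set_columns = [c for c in new_row if c not in key_columns]
    set_clause = ", ".join(f"{c} = ?" for c in set_columns)
    where_clause = " AND ".join(f"{c} = ?" for c in key_columns)
    params = [new_row[c] for c in set_columns] + [old_row[c] for c in key_columns]
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}", params


if __name__ == "__main__":
    sql, params = build_update(
        "t", ["p"],
        old_row={"p": 1, "c1": "a", "c2": "b"},
        new_row={"p": 1, "c1": "x", "c2": "y"},
    )
    print(sql)
    print(params)
```

Filtering on the key alone keeps the statement short and avoids comparing large or type-converted column values on the target.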
> use mydb
switched to db mydb
> db.test.insert( {"test": "test value", "anumber" : 5 } )
> db.test.find()
{ "_id" : ObjectId("4dce9a4f3d6e186ffccdd4bb"), "test" : "test value", "anumber" : 5 }
> exit
/ MySQL binary log doesn't hold column names
• mysql> INSERT INTO foo (id, data) VALUES (1, 'hello from MySQL!');
• If nothing is done, this becomes:
> db.foo.find();
{ "_id" : ObjectId("4dc55e45ad90a25b9b57909d"), " " : "1", " " : "hello from MySQL!" }
• Solution: fill in column names on the master side. Then:
> db.foo.find();
{ "_id" : ObjectId("4dc55e45ad90a25b9b57909d"), "id" : "1", "data" : "hello from MySQL!" }
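Once column names travel with the row, building the target document is a matter of pairing names with values. The Python sketch below shows that pairing only. `row_to_document` is a hypothetical name, and a real applier would also have to deal with types and with the `_id` field.

```python
from typing import Any, Dict, Sequence


def row_to_document(column_names: Sequence[str], values: Sequence[Any]) -> Dict[str, Any]:
    """Pair column names with row values to form a document-style dict."""
    if len(column_names) != len(values):
        raise ValueError("column and value counts differ")
    return dict(zip(column_names, values))


if __name__ == "__main__":
    print(row_to_document(["id", "data"], [1, "hello from MySQL!"]))
```

The length check matters because a row event without names gives no other signal that names and values have drifted apart.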
MySQL -> MongoDB: The Pipeline
                                             Logical   Physical
MySQL Statement Based                           x
MySQL Row Based                                 x
MySQL Mixed                                     x
PostgreSQL WAL Shipping                                    x
PostgreSQL Streaming Replication                           x
Filters (data transformation) possible          +          -
Different data/structure on slave possible      +          -
/ A transaction is not accessible to the replicator under physical replication
/ Tungsten Replicator manages WAL/Streaming Replication
                                                  Logical   Physical
MySQL Statement Based                                x
MySQL Row Based                                      x
MySQL Mixed                                          x
PostgreSQL WAL Shipping                                         x
PostgreSQL Streaming Replication                                x
Tungsten Replicator w/ PostgreSQLSlonyExtractor      x
Filters (data transformation) possible               +          -
Different data/structure on slave possible           +          -
/ With PostgreSQLSlonyExtractor transaction goes through the Replicator pipeline
[Diagram: Master Replicator (PostgreSQL SlonyExtractor, Filters, Transaction History Log) -> Slave Replicator (Transaction History Log, Filters, MySQLApplier)]
/ We’ve reviewed an open source heterogeneous replicator (professional services available upon request)
/ Tungsten Replicator encapsulates the complexity and corner cases of the subject
/ Replicating:
• out of MySQL – now;
• out of PostgreSQL – prototype;
• out of Oracle – designs ready, awaiting sponsorship.
Open Source http://tungsten-replicator.org #tungsten @ irc.freenode.net
My Blog: http://flyingclusters.blogspot.com
Commercial [email protected]
Continuent Web Site: http://www.continuent.com