Breaking the Database Type Barrier: Replicating Across Different DBMS

DESCRIPTION

Sharing data between different DBMS types is an inevitable need in today's diverse IT environments. Real-time data integration, seamless migration, and data warehousing are the main reasons driving demand for heterogeneous replication. In this talk we'll review how the open source Tungsten Replicator can replicate data in real time between databases like MySQL, PostgreSQL, Oracle, MongoDB and others. Join us for this technical and enlightening talk. We'll cover the fundamental steps behind configuring heterogeneous replication, the importance of transaction-transforming filters, and the common challenges that arise when replicating across DBMS types. We'll conclude with in-line demos to show you how it looks in action.

Page 1: Breaking the-database-type-barrier-replicating-across-different-dbms

© Continuent 2010

Linas Virbalas Continuent, Inc.

Page 2: Breaking the-database-type-barrier-replicating-across-different-dbms

/  Definition & Motivation
/  Scoping the Challenge
/  MySQL ->
   •  PostgreSQL
   •  Oracle
   •  MongoDB
/  Demo 1
/  PostgreSQL ->
   •  MySQL
/  Demo 2
/  Q&A

Page 4: Breaking the-database-type-barrier-replicating-across-different-dbms

Heterogeneous Replication

Replication between different types of DBMS

Page 5: Breaking the-database-type-barrier-replicating-across-different-dbms

1.  Real-time integration of data between different DBMS types
2.  Seamless migration out of one DBMS type to another
3.  Data warehousing (real-time) from different DBMS types
4.  Leveraging specific SQL power of other DBMS types

Page 6: Breaking the-database-type-barrier-replicating-across-different-dbms

/  Name: Linas Virbalas
/  Country: Lithuania
/  Implementing for Tungsten:
   •  MySQL -> PostgreSQL
   •  MySQL -> Greenplum
   •  MySQL -> Oracle
   •  PostgreSQL WAL
   •  PostgreSQL Streaming Replication
   •  PostgreSQL Logical Replication via Slony logs

/  Blog: http://flyingclusters.blogspot.com

Page 8: Breaking the-database-type-barrier-replicating-across-different-dbms

1.  MySQL -> …
   •  Replicating from MySQL to PostgreSQL/Greenplum, Oracle, MongoDB

2.  PostgreSQL -> …
   •  Replicating from PostgreSQL to MySQL

Page 9: Breaking the-database-type-barrier-replicating-across-different-dbms

With Tungsten Replicator

Page 10: Breaking the-database-type-barrier-replicating-across-different-dbms

/  Open Source GPL v2
/  Java
/  Interfaces to implement new:
   •  Extractors
   •  Filters
   •  Appliers
/  Multiple replication services per process

Page 11: Breaking the-database-type-barrier-replicating-across-different-dbms

Technology: Replication Pipelines

Page 13: Breaking the-database-type-barrier-replicating-across-different-dbms

/  Statement Based Replication

/  Row Based Replication

Page 15: Breaking the-database-type-barrier-replicating-across-different-dbms

[Diagram: MySQL -> PostgreSQL replication pipeline]

Master Replicator:  MySQL Extractor -> Filters -> Transaction History Log
Slave Replicator:   Transaction History Log -> Filters -> PostgreSQL Applier
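
A minimal sketch of the pipeline on this slide, assuming events flow extractor -> filters -> transaction history log on the master, and log -> filters -> applier on the slave. The names `Event`, `run_filters`, `master_stage`, `slave_stage` and `lowercase_schema` are hypothetical and are not Tungsten Replicator's Java API; the log is modelled as a plain Python list.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Event:
    seqno: int   # position in the transaction history log
    schema: str  # default schema (MySQL database) of the transaction
    sql: str     # statement text, for simplicity

Filter = Callable[[Event], Event]

def run_filters(event: Event, filters: Iterable[Filter]) -> Event:
    """Pass one event through a chain of filters, in order."""
    for f in filters:
        event = f(event)
    return event

def master_stage(extracted: Iterable[Event], filters: List[Filter]) -> List[Event]:
    """Extractor output -> filters -> transaction history log (a list here)."""
    return [run_filters(e, filters) for e in extracted]

def slave_stage(thl: Iterable[Event], filters: List[Filter],
                apply: Callable[[Event], None]) -> None:
    """Transaction history log -> filters -> applier."""
    for e in thl:
        apply(run_filters(e, filters))

# Hypothetical filter: normalise schema names to lower case.
def lowercase_schema(event: Event) -> Event:
    return replace(event, schema=event.schema.lower())

applied: List[Event] = []
thl = master_stage([Event(1, "Sales", "INSERT INTO t VALUES (1)")],
                   [lowercase_schema])
slave_stage(thl, [], applied.append)
```

Keeping filters as plain functions from event to event mirrors the "Event structure IN -> Filter -> Event structure OUT" idea: each transformation can be tested in isolation and chained on either side of the log.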

Page 16: Breaking the-database-type-barrier-replicating-across-different-dbms

/  Provisioning
/  Data Type Differences
/  Database vs. Schema
/  Default (Implicitly Defined) Schema Selection
/  SQL Dialect Differences
   •  Statement Replication vs. Row Replication
/  Character Sets and Binary Data
/  Old Versions of MySQL

Page 17: Breaking the-database-type-barrier-replicating-across-different-dbms

Provisioning

/  Harder way: Dump data explicitly

/  Easier way: Replicate a mysqldump backup

Page 18: Breaking the-database-type-barrier-replicating-across-different-dbms

     MySQL                  PostgreSQL
  !  TINYINT                SMALLINT
     SMALLINT               SMALLINT
     INTEGER                INTEGER
     BIGINT                 BIGINT
  !  CHAR(1)                CHAR(5) = {'true', 'false'}
     CHAR(x)                CHAR(x)
     VARCHAR(x)             VARCHAR(x)
     DATE                   DATE
     TIMESTAMP              TIMESTAMP
  !  TEXT (diff. sizes)     TEXT
  !  BLOB                   BYTEA

/  Note the type differences between MySQL and PG
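
The type table on this slide can be expressed as a small lookup, sketched below. The function name `map_type` is hypothetical, and treating TINYTEXT, MEDIUMTEXT and LONGTEXT as the "diff. sizes" of TEXT is an assumption; unknown types pass through unchanged.

```python
import re

# Mapping taken from the slide's table; unlisted types pass through as-is.
MYSQL_TO_PG = {
    "TINYINT": "SMALLINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "TEXT": "TEXT",
    "TINYTEXT": "TEXT",     # assumption: one of the "diff. sizes" of TEXT
    "MEDIUMTEXT": "TEXT",   # assumption, as above
    "LONGTEXT": "TEXT",     # assumption, as above
    "BLOB": "BYTEA",
}

_CHAR = re.compile(r"(?:VAR)?CHAR\(\d+\)")

def map_type(mysql_type: str) -> str:
    """Return the PostgreSQL type name for a MySQL column type."""
    t = mysql_type.strip().upper()
    if t == "CHAR(1)":
        # Per the slide: CHAR(1) becomes CHAR(5) holding 'true' / 'false'.
        return "CHAR(5)"
    if _CHAR.fullmatch(t):
        return t  # CHAR(x) and VARCHAR(x) keep their declared length
    return MYSQL_TO_PG.get(t, t)
```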

Page 19: Breaking the-database-type-barrier-replicating-across-different-dbms

Database vs. Schema

/  In MySQL these are the same:
      CREATE DATABASE foo
      CREATE SCHEMA foo

/  In PostgreSQL these are very different:
      CREATE DATABASE foo
      CREATE SCHEMA foo

/  Tungsten uses filters to map MySQL databases to PostgreSQL schemas

Page 20: Breaking the-database-type-barrier-replicating-across-different-dbms

MySQL Implicit               MySQL Explicit
CREATE SCHEMA s;             CREATE SCHEMA s;
USE s;
CREATE TABLE t (i int);      CREATE TABLE s.t (i int);
INSERT INTO t VALUES (1);    INSERT INTO s.t VALUES (1);

/  MySQL: Trivial to use `USE`
/  MySQL: Going without `USE` generates different events
/  PG: Extract the default schema from the event
/  PG: Set it before applying

MySQL       PostgreSQL
USE s;   >  SET search_path TO s, "$user";
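
A rough sketch of the `USE` translation shown on this slide. The function names `translate_use` and `with_default_schema` are hypothetical, and the regular expression only handles a bare `USE name` statement with an optional backtick-quoted identifier.

```python
import re
from typing import List

# Matches "USE s", "use `s`;" and similar; anything else is left alone.
_USE = re.compile(r"^\s*USE\s+`?(\w+)`?\s*;?\s*$", re.IGNORECASE)

def translate_use(statement: str) -> str:
    """Rewrite a MySQL USE statement as a PostgreSQL search_path setting."""
    m = _USE.match(statement)
    if not m:
        return statement
    return f'SET search_path TO {m.group(1)}, "$user";'

def with_default_schema(default_schema: str, statement: str) -> List[str]:
    """Set the schema extracted from the event before applying its statement."""
    return [f'SET search_path TO {default_schema}, "$user";', statement]
```

The second helper covers the implicit case, where the statement itself never mentions the schema and the default schema has to come from the replicated event.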

Page 21: Breaking the-database-type-barrier-replicating-across-different-dbms

MySQL:       CREATE TABLE complex (id INTEGER AUTO_INCREMENT PRIMARY KEY, i INT);
PostgreSQL:  CREATE TABLE complex (id SERIAL PRIMARY KEY, i INT);

MySQL:       CREATE TABLE dt (i TINYINT);
PostgreSQL:  CREATE TABLE dt (i SMALLINT);
…

/  DDL and DML statements differ between SQL dialects

/  Row Replication resolves issues arising from differences in DML, but DDL still has to be handled

/  Tungsten Replicator Filters come to the rescue!
   •  Simple to develop Java or JavaScript extensions
   •  Event structure IN -> Filter -> Event structure OUT
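
To illustrate the kind of rewrite a DDL filter performs, here is a small sketch covering only the two cases on this slide. It is written in Python rather than as a real Tungsten filter (which would be a Java or JavaScript extension), and `mysql_ddl_to_pg` is a hypothetical name. The regular expressions are deliberately naive: they would also rewrite matching text inside string literals and do not handle forms such as `TINYINT(1)`.

```python
import re

def mysql_ddl_to_pg(sql: str) -> str:
    """Rewrite two MySQL DDL idioms into their PostgreSQL equivalents."""
    # INTEGER AUTO_INCREMENT / INT AUTO_INCREMENT -> SERIAL
    sql = re.sub(r"\b(?:INTEGER|INT)\s+AUTO_INCREMENT\b", "SERIAL",
                 sql, flags=re.IGNORECASE)
    # TINYINT has no PostgreSQL counterpart; widen it to SMALLINT
    sql = re.sub(r"\bTINYINT\b", "SMALLINT", sql, flags=re.IGNORECASE)
    return sql
```

A production filter would more likely work on a parsed statement than on raw text, but the event-in, event-out shape stays the same.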

Page 22: Breaking the-database-type-barrier-replicating-across-different-dbms

MySQL:       INSERT INTO embedded_blob (key, data) VALUES (1, '?\0^Es\0^\0\'')
PostgreSQL:  ARGH!!! (SQL statement fails)

MySQL:       create table xlate(id int, d1 varchar(25) character set latin1, d2 varchar(25) character set utf8);
PostgreSQL:  ARGH!!! (no way to translate to common charset)

/  Statement replication: MySQL syntax is "permissive"
/  Embedded binary / alternate charsets
/  Different charsets for different clients

/  Row replication: database/table/column charsets may differ

/  Answer: Stick with one character set throughout; use row replication to move binary data
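
The sketch below illustrates why embedded binary breaks statement text and one way a binary row value could be rendered for PostgreSQL. Both function names are hypothetical. The only PostgreSQL feature relied on is the built-in `decode(text, 'hex')` function, which returns `bytea`; a real applier would more likely bind the bytes as a statement parameter.

```python
def decodes_cleanly(raw: bytes, charset: str = "utf-8") -> bool:
    """True if the raw statement bytes are valid text in the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False

def to_bytea_expr(value: bytes) -> str:
    """Render a binary row value as a PostgreSQL expression yielding bytea."""
    return f"decode('{value.hex()}', 'hex')"
```

With row replication the binary value arrives as bytes separate from the SQL text, so it can be encoded like this instead of being spliced into a statement.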

Page 23: Breaking the-database-type-barrier-replicating-across-different-dbms

MySQL Versions

/  Problem: Data stored on hard-to-replicate MySQL versions or configurations

   •  Row replication not enabled (5.1)
   •  No row replication support (5.0, 4.1)
   •  Tungsten cannot read binlog (4.1)

/  Answer: MySQL blackhole replication
   •  (Blackhole = no store, just a binlog)
   •  Caveat: Check MySQL docs carefully

Page 25: Breaking the-database-type-barrier-replicating-across-different-dbms

[Diagram: MySQL -> Oracle replication pipeline]

Master Replicator:  MySQL Extractor -> Filters -> Transaction History Log
Slave Replicator:   Transaction History Log -> Filters -> Oracle Applier

Page 26: Breaking the-database-type-barrier-replicating-across-different-dbms

/  TEXT length limitation
   •  VARCHAR(4000) => CLOB

/  Primary Keys and PrimaryKeyFilter
   •  Goal:
      UPDATE t SET c1 = x1, c2 = x2, c3 = x3 WHERE p = p1
   •  NOT:
      UPDATE t SET c1 = x1, c2 = x2, c3 = x3 WHERE p = p1 AND c1 = x1 AND c2 = x2 AND c3 = x3 AND …
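
The effect of restricting the WHERE clause to primary key columns can be sketched as follows. This is not the PrimaryKeyFilter implementation; `build_update` is a hypothetical helper, and the `?` placeholders are illustrative rather than tied to a specific driver.

```python
from typing import Dict, List, Sequence, Tuple

def build_update(table: str,
                 new_values: Dict[str, object],
                 old_values: Dict[str, object],
                 pk_columns: Sequence[str]) -> Tuple[str, List[object]]:
    """Build an UPDATE whose WHERE clause uses only primary key columns.

    Falls back to matching on every old column value when no primary
    key is known, which is the long form the slide warns against.
    """
    set_clause = ", ".join(f"{col} = ?" for col in new_values)
    key_cols = [c for c in old_values if c in pk_columns] or list(old_values)
    where = " AND ".join(f"{col} = ?" for col in key_cols)
    params = list(new_values.values()) + [old_values[c] for c in key_cols]
    return f"UPDATE {table} SET {set_clause} WHERE {where}", params

sql, params = build_update(
    "t",
    {"c1": 1, "c2": 2},
    {"p": 7, "c1": 0, "c2": 0},
    pk_columns=["p"],
)
```

Matching on the key alone keeps the statement short and avoids comparing large or imprecise column values that may not round-trip identically between DBMS types.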

Page 28: Breaking the-database-type-barrier-replicating-across-different-dbms

> use mydb
switched to db mydb

> db.test.insert( {"test": "test value", "anumber" : 5 } )

> db.test.find()
{ "_id" : ObjectId("4dce9a4f3d6e186ffccdd4bb"), "test" : "test value", "anumber" : 5 }

> exit

Page 29: Breaking the-database-type-barrier-replicating-across-different-dbms

/  MySQL binary log doesn't hold column names

   •  mysql> INSERT INTO foo (id, data) VALUES (1, 'hello from MySQL!');

   •  If nothing is done, this becomes:

      > db.foo.find();
      { "_id" : ObjectId("4dc55e45ad90a25b9b57909d"), " " : "1", " " : "hello from MySQL!" }

   •  Solution: fill in column names on the master side. Then:

      > db.foo.find();
      { "_id" : ObjectId("4dc55e45ad90a25b9b57909d"), "id" : "1", "data" : "hello from MySQL!" }
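
A small sketch of why column names matter when turning a row change into a document. `row_to_document` is a hypothetical helper, and the positional fallback names (`col1`, `col2`, ...) are an assumption made for illustration; they are not what the replicator actually emits.

```python
from typing import Dict, List, Optional, Sequence

def row_to_document(values: Sequence[object],
                    column_names: Optional[Sequence[str]] = None) -> Dict[str, object]:
    """Pair row values with column names to form a document.

    Without names (as in a raw MySQL row event) only positions are
    known, so placeholder keys are generated instead.
    """
    if column_names is None:
        column_names = [f"col{i + 1}" for i in range(len(values))]
    if len(column_names) != len(values):
        raise ValueError("column_names and values differ in length")
    return dict(zip(column_names, values))

row: List[object] = [1, "hello from MySQL!"]
named = row_to_document(row, ["id", "data"])
unnamed = row_to_document(row)
```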

Page 30: Breaking the-database-type-barrier-replicating-across-different-dbms

MySQL -> MongoDB: The Pipeline

Page 33: Breaking the-database-type-barrier-replicating-across-different-dbms

                                              Logical   Physical
MySQL Statement Based                            x
MySQL Row Based                                  x
MySQL Mixed                                      x
PostgreSQL WAL Shipping                                    x
PostgreSQL Streaming Replication                           x
Filters (data transformation) possible           +         -
Different data/structure on slave possible       +         -

/  A transaction is not accessible to the replicator under physical replication

/  Tungsten Replicator manages WAL/Streaming Replication

Page 34: Breaking the-database-type-barrier-replicating-across-different-dbms

                                                  Logical   Physical
MySQL Statement Based                                x
MySQL Row Based                                      x
MySQL Mixed                                          x
PostgreSQL WAL Shipping                                        x
PostgreSQL Streaming Replication                               x
Tungsten Replicator w/ PostgreSQLSlonyExtractor      x
Filters (data transformation) possible               +         -
Different data/structure on slave possible           +         -

/  With PostgreSQLSlonyExtractor, each transaction goes through the Replicator pipeline

Page 35: Breaking the-database-type-barrier-replicating-across-different-dbms

[Diagram: PostgreSQL -> MySQL replication pipeline]

Master Replicator:  PostgreSQL SlonyExtractor -> Filters -> Transaction History Log
Slave Replicator:   Transaction History Log -> Filters -> MySQLApplier

Page 37: Breaking the-database-type-barrier-replicating-across-different-dbms

/  We’ve reviewed an open source heterogeneous replicator (professional services available upon request)

/  Tungsten Replicator encapsulates the complexity and corner cases of the subject

/  Replicating:
   •  out of MySQL – now;
   •  out of PostgreSQL – prototype;
   •  out of Oracle – designs ready, awaiting sponsorship.

Page 39: Breaking the-database-type-barrier-replicating-across-different-dbms

Open Source http://tungsten-replicator.org #tungsten @ irc.freenode.net

My Blog: http://flyingclusters.blogspot.com

Commercial [email protected]

Continuent Web Site: http://www.continuent.com