DESCRIPTION
Continuent Tungsten offers real-time replication from MySQL to a variety of DBMS types, including Vertica. In this Tungsten University webcast we will show you the details of setting up MySQL-to-Vertica replication, including the following topics:
• Introduction to Continuent Tungsten features for data warehouse loading
• Installation for MySQL-to-Vertica replication
• Best practices for applications: character sets, schema design, time zones, etc.
• Techniques for filtering and transforming data
• Performance tuning and troubleshooting
• Adapting batch loading for new use cases
The webinar includes technical information and a live demo to help you get your own data warehouse loading application up and running quickly.
©Continuent 2013
Tungsten University: Load a Vertica Data Warehouse with MySQL Data
Robert Hodges
CEO, Continuent
Introducing Continuent
• The leading provider of clustering and replication for open source DBMS
• Our Product: Continuent Tungsten
• Clustering - Commercial-grade HA, performance scaling and data management for MySQL
• Replication - Flexible, high-performance data movement
OLTP and Data Warehouse Fundamentals
The Contenders
• MySQL: popular open source RDBMS for transaction processing
• Vertica: popular closed source RDBMS for analytics
Storage Layout in MySQL
Sales Table
id    cust_id   prod_id   ...
1     335301    532       ...
2     2378      6235      ...
3     ...       ...       ...

Product Table
id    sku      type
532   C00135   consumer
533   S09957   specialty
...   ...      ...

Prod_ID Index (on Sales)
prod_id   id
532       1
6235      2
...       ...

• Row format makes table scans very slow
• Indexes slow OLTP
• Low/no data compression
• Limited index types
• Limited join types
Storage Layout in Vertica
Sales Table (each column stored separately)
id: 1, 2, 3, ...
cust_id: 335301, 2378, ...
prod_id: 532, 6235, ...
quantity: 1, 3, ...

Product Table (each column stored separately)
id: 532, 533, ...
sku: C00135, S09957, ...
type: consumer, specialty, ...

• Fast scans on columns
• Every column is an index
• Good compression
• Fast joins with parallel query
• Updates to single rows are hideously slow
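To make the row-versus-column trade-off concrete, here is a small simulation using the Sales rows from the two slides. It is an illustration only, not how either engine stores data: Python lists stand in for storage, and the "values touched" counts are a rough proxy for I/O. The run-length encoding helper is one example of why repetitive columns compress well; Vertica offers several encodings, and RLE is just one of them.

```python
# Illustration only: the Sales rows from the slides stored row-wise (as in
# MySQL) and column-wise (as in Vertica). Storage is simulated with Python
# lists; "values touched" is a rough proxy for I/O.
rows = [
    {"id": 1, "cust_id": 335301, "prod_id": 532, "quantity": 1},
    {"id": 2, "cust_id": 2378, "prod_id": 6235, "quantity": 3},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def scan_rows(rows, column):
    """Row store: every row is read in full to get one column."""
    touched = 0
    values = []
    for row in rows:
        touched += len(row)
        values.append(row[column])
    return values, touched


def scan_column(columns, column):
    """Column store: only the requested column is read."""
    values = list(columns[column])
    return values, len(values)


def update_row(columns, index, changes):
    """Column store: one row update writes to a separate structure per column."""
    for name, value in changes.items():
        columns[name][index] = value
    return len(changes)


def run_length_encode(values):
    """Repeated values in a column compress well, e.g. with run-length encoding."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [tuple(pair) for pair in encoded]


columns = to_columns(rows)
```

Scanning `prod_id` touches 8 values in the row layout but only 2 in the column layout, while a single-row update in the column layout has to visit one structure per changed column.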
Traditional ETL Problems
Sales Table (MySQL) → Extract → Transfer → Load → Sales Table (data warehouse)
• Date columns = intrusive
• Batch-oriented = not timely
• Scan for changes = performance hit
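The three problems above can be seen in a minimal sketch of date-column change detection. The `last_modified` column and the table contents are hypothetical: the column exists only so that the ETL job can find changes, and every run has to read the whole table to find them.

```python
from datetime import datetime

# Hypothetical source table with a last_modified column that exists only so
# batch ETL can find changes (the "intrusive" date column).
sales = {
    1: {"id": 1, "quantity": 1, "last_modified": datetime(2013, 5, 1, 9, 0)},
    2: {"id": 2, "quantity": 3, "last_modified": datetime(2013, 5, 2, 9, 0)},
}


def extract_changes(table, since):
    """Scan every row and return the ids of rows modified after `since`."""
    rows_scanned = 0
    changed = []
    for row in table.values():
        rows_scanned += 1  # the whole table is read on every run
        if row["last_modified"] > since:
            changed.append(row["id"])
    return changed, rows_scanned
```

A row removed with a plain DELETE simply disappears from `sales`, so the next scan has nothing to report and the warehouse copy stays stale unless deletes are tracked some other way.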
Questions for Real-Time Loading
• Do I need to transform data and if so how?
• Do I need to clean up bad information?
• Do I need to process UPDATE/DELETE too?
• Do I need to load from multiple sources?
• How timely do loads need to be?
• What if something fails?
Tungsten Replicator Basics
Real-Time Data Replication
Sales Table (MySQL) → DBMS Logs → Data Replication → Sales Table (data warehouse)
• Fast propagation = timely
• No SQL changes = transparent
• Automatic change capture = low impact
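By contrast, log-based capture reads changes from the DBMS log instead of the tables. The sketch below is a simplified model, not Tungsten code: a Python list stands in for the MySQL binlog, the event tuple format is hypothetical, and the replicator resumes from a saved log position. No table scan is needed, and DELETEs arrive like any other change.

```python
# Simplified model of log-based change capture. The list stands in for the
# DBMS log (e.g. the MySQL binlog); the (op, key, row) format is hypothetical.
change_log = [
    ("INSERT", 1, {"id": 1, "quantity": 1}),
    ("INSERT", 2, {"id": 2, "quantity": 3}),
    ("UPDATE", 2, {"id": 2, "quantity": 5}),
    ("DELETE", 1, None),
]


def apply_from(log, position, target):
    """Apply events from `position` onward and return the new log position."""
    for op, key, row in log[position:]:
        if op == "DELETE":
            target.pop(key, None)
        else:  # INSERT and UPDATE both carry the latest row image
            target[key] = row
    return len(log)
```

Because only the unread tail of the log is processed, the cost of each run depends on the number of changes, not on the size of the table.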
Tungsten Master/Slave in Action
Master: DBMS Logs → Replicator → THL (Transactions + Metadata)
Slave: Replicator → THL (Transactions + Metadata) → Slave DBMS
• Slave downloads transactions from the master via the network
• Slave applies transactions using JDBC
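The master/slave flow can be sketched as two transaction logs linked by sequence numbers. This is a hypothetical model under simplifying assumptions: class and method names are illustrative rather than Tungsten's API, seqno starts at 0 here, the "network download" is a method call, and "apply" just records statements instead of going through JDBC.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    seqno: int
    statements: list
    metadata: dict


@dataclass
class MasterReplicator:
    thl: list = field(default_factory=list)

    def extract(self, statements, metadata):
        """Append a transaction read from the DBMS log to the master THL."""
        txn = Transaction(len(self.thl), statements, metadata)
        self.thl.append(txn)
        return txn.seqno

    def fetch_after(self, seqno):
        """Stand-in for the network download: everything after `seqno`."""
        return [txn for txn in self.thl if txn.seqno > seqno]


@dataclass
class SlaveReplicator:
    thl: list = field(default_factory=list)
    applied: list = field(default_factory=list)

    def last_seqno(self):
        return self.thl[-1].seqno if self.thl else -1

    def sync(self, master):
        """Download missing transactions, store them, then apply them."""
        for txn in master.fetch_after(self.last_seqno()):
            self.thl.append(txn)                  # slave's own THL copy
            self.applied.extend(txn.statements)  # stand-in for JDBC apply
        return self.last_seqno()
```

Keeping the slave's own THL copy means the slave always knows where to resume: it asks the master only for seqnos above the last one it stored.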
Pipelines with Parallel Apply
Pipeline (slave replicator):
Remote Master → Stage (Extract, Filter, Apply) → Transaction History Log → Stage (Extract, Filter, Apply) → Parallel Queue (assign shard ID) → Stage (Extract, Filter, Apply in parallel, one per channel) → Slave DBMS
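The parallel queue is what allows the last stage to apply in parallel. A minimal sketch of the idea, assuming the shard ID is the transaction's database name and that shards are hashed onto a fixed number of channels (both are assumptions; the replicator's actual assignment rules may differ): transactions on the same shard stay in order, while different shards can proceed independently.

```python
import zlib


def assign_shard_id(txn):
    """Assumption: the shard ID is the database (schema) name."""
    return txn["db"]


def channel_for(shard_id, channels):
    # crc32 is stable across runs, unlike Python's salted hash() for str.
    return zlib.crc32(shard_id.encode("utf-8")) % channels


def partition(transactions, channels):
    """Route each transaction's seqno to one channel queue, preserving order."""
    queues = [[] for _ in range(channels)]
    for txn in transactions:
        queues[channel_for(assign_shard_id(txn), channels)].append(txn["seqno"])
    return queues
```

Every transaction for `db01` lands on the same channel in seqno order, which is what keeps per-shard results consistent while other shards apply concurrently.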
Real-Time Batch Loading
MySQL (binlog_format=row) → MySQL Binlog → Tungsten Master Replicator (service my2vr) → Tungsten Slave Replicator (service my2vr) → CSV Files
Master replicator: MySQLExtractor plus special filters:
• pkey -- fill in primary key info
• colnames -- fill in column names
• replicate -- ignore tables
Master side: single transactions from OLTP operations
Slave side: large transaction batches to leverage load parallelization
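Turning many small OLTP transactions into a few large per-table CSV files is the core of batch loading. The sketch below makes several assumptions: the CSV layout (opcode, seqno, row id, then data columns) is inferred from the `cut -d,` field numbers in the merge script later in this deck, NULL is written as an unquoted `null` to match the `NULL 'null'` COPY option, and `max_rows` is only loosely modeled on the `blockCommitRowCount` property, whose exact semantics the slides do not describe.

```python
from collections import defaultdict


def csv_field(value):
    """Quote a value for COPY ... ENCLOSED BY '"'; NULL becomes bare null."""
    if value is None:
        return "null"
    text = str(value)
    if '"' in text:
        raise ValueError("embedded quotes are not handled in this sketch")
    return f'"{text}"'


def batch_to_csv(transactions, max_rows):
    """Group row changes by table, ending the block at a transaction boundary
    once at least max_rows rows have been written."""
    files = defaultdict(list)
    rows = 0
    last_seqno = None
    for txn in transactions:
        for table, opcode, values in txn["changes"]:
            row_id = len(files[table]) + 1
            fields = [opcode, txn["seqno"], row_id, *values]
            files[table].append(",".join(csv_field(f) for f in fields))
            rows += 1
        last_seqno = txn["seqno"]
        if rows >= max_rows:
            break  # commit this block; later transactions go in the next one
    return dict(files), last_seqno
```

Ending blocks only at transaction boundaries means a block never contains half of a transaction, so the last committed seqno is always a safe restart point.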
Batch Loading--The Gory Details
Replicator (service my2vr): transactions from master → CSV Files, then either:
• COPY directly to base tables, or
• COPY to staging tables, then a merge script SELECTs from the staging tables into the base tables
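The merge step is where staged changes become base-table rows. A minimal sketch of what such a merge must accomplish, assuming updates arrive as a delete followed by an insert and that staged rows carry (opcode, seqno, row id, key, data columns...); this is an illustration of the semantics, not the `vertica6` load template itself.

```python
def merge(base, staged):
    """Apply staged rows (opcode, seqno, row_id, key, *columns) to `base`."""
    # Step 1: delete every base row whose key appears anywhere in the stage.
    for row in staged:
        base.pop(row[3], None)
    # Step 2: keep only the newest staged image of each key.
    latest = {}
    for row in staged:
        key, order = row[3], (row[1], row[2])
        if key not in latest or order > (latest[key][1], latest[key][2]):
            latest[key] = row
    # Step 3: re-insert keys whose newest image is an insert.
    for key, row in latest.items():
        if row[0] == "I":
            base[key] = list(row[4:])
    return base
```

Ordering by (seqno, row id) matters: a key that is inserted and then deleted within the same batch must end up absent, which a naive "load all inserts" approach would get wrong.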
Setting Up MySQL to Vertica Replication
DEMO
MySQL to Vertica replication with some bells and a whistle
MySQL: db01, db02, db03 (each loaded by sysbench) → Vertica: db01, renamed02 (db02 renamed); db03 is not replicated (X)
Get the Code
wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.1.0-285.tar.gz
tar -xf tungsten-replicator-2.1.0-285.tar.gz
cd tungsten-replicator-2.1.0-285
Installing MySQL Master
tools/tungsten-installer --master-slave -a \
  --service-name=mysql2vertica \
  --master-host=mysql1 \
  --cluster-hosts=mysql1 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --home-directory=/opt/continuent \
  --buffer-size=100 \
  --java-file-encoding=UTF8 \
  --java-user-timezone=GMT \
  --mysql-use-bytes-for-string=false \
  --svc-extractor-filters=replicate,colnames,pkey \
  --property=replicator.filter.pkey.addPkeyToInserts=true \
  --property=replicator.filter.pkey.addColumnsToDeletes=true \
  --property=replicator.filter.replicate.do=db01.*,db02.* \
  --start-and-report
Installing Vertica Slave
$ tools/tungsten-installer --master-slave -a \
  --service-name=mysql2vertica \
  --home-directory=/opt/continuent \
  --cluster-hosts=vertica1 \
  --master-host=mysql1 \
  --datasource-type=vertica \
  --datasource-user=dbadmin \
  --datasource-password=secret \
  --datasource-port=5433 \
  --batch-enabled=true \
  --batch-load-template=vertica6 \
  --vertica-dbname=bigdata \
  --java-user-timezone=GMT \
  --java-file-encoding=UTF8 \
  --svc-applier-filters=dbtransform \
  --property=replicator.filter.dbtransform.from_regex1=db02 \
  --property=replicator.filter.dbtransform.to_regex1=renamed02 \
  --property=replicator.stage.q-to-dbms.blockCommitRowCount=25000 \
  --start-and-report
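The `dbtransform` properties in the slave installer command pair a "from" regular expression with a "to" replacement for the schema name. The sketch below uses Python's `re.sub` as a stand-in; whether the replicator does substring replacement in the same way is an assumption.

```python
import re


def dbtransform(schema, from_regex, to_regex):
    """Rewrite a schema (database) name only; tables and columns are untouched."""
    return re.sub(from_regex, to_regex, schema)
```

If replacement is substring-based, as it is with `re.sub`, an unanchored pattern such as `db02` would also rewrite `db020` to `renamed020`; anchoring it as `^db02$` limits the rename to the one schema the demo intends.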
Generate Schema Using ddlscan
• Data types?
• Column lengths?
• Naming conventions?
• Staging tables?
MySQL Tables → ddlscan
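The questions on this slide are the decisions a DDL generator has to make. Here is a hypothetical sketch of those decisions; the type map, the `stage_xxx_` prefix, and the `tungsten_*` staging columns are illustrative assumptions, not the contents of `ddl-mysql-vertica.vm`. The column-length issue is real, though: MySQL `VARCHAR(n)` counts characters, while Vertica `VARCHAR(n)` counts bytes (up to 65000), so multi-byte UTF-8 text needs a wider column. The default of 3 bytes per character assumes MySQL's 3-byte `utf8`.

```python
VERTICA_MAX_VARCHAR = 65000  # Vertica VARCHAR limit, in bytes

# Hypothetical type map for a few common MySQL types.
SIMPLE_TYPES = {"int": "INT", "bigint": "INT", "datetime": "TIMESTAMP"}

# Assumed metadata columns for staging tables.
STAGE_COLUMNS = ["tungsten_opcode CHAR(1)", "tungsten_seqno INT", "tungsten_row_id INT"]


def vertica_type(mysql_type, length=None, bytes_per_char=3):
    """MySQL VARCHAR(n) counts characters; Vertica VARCHAR(n) counts bytes."""
    name = mysql_type.lower()
    if name == "varchar":
        return f"VARCHAR({min(length * bytes_per_char, VERTICA_MAX_VARCHAR)})"
    return SIMPLE_TYPES[name]


def create_table(schema, table, columns, stage_prefix=None):
    """Build a CREATE TABLE statement; a stage_prefix yields a staging table."""
    definitions = []
    if stage_prefix is not None:
        table = stage_prefix + table
        definitions.extend(STAGE_COLUMNS)
    for column_name, (mysql_type, length) in columns:
        definitions.append(f"{column_name} {vertica_type(mysql_type, length)}")
    return f"CREATE TABLE {schema}.{table} ({', '.join(definitions)});"
```

Generating base and staging tables from the same column list keeps the two definitions in step, which is the main reason to script this rather than write the DDL by hand.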
Tungsten ddlscan Utility
cd /opt/continuent/tungsten/tungsten-replicator/bin
# Base table generation
./ddlscan -template ddl-mysql-vertica.vm \
  -db db01 -user tungsten -pass secret >> ddl.sql
# Staging table generation
./ddlscan -template ddl-mysql-vertica-staging.vm \
  -db db01 -user tungsten -pass secret >> ddl.sql
# Load into Vertica
vsql -Udbadmin -wsecret < ddl.sql
Checking Status
# Checking status on master
trepctl -host logos1 heartbeat
trepctl -host logos1 status
# Checking status on slave
trepctl -host vertica1 status
# Checking detailed performance of apply task
trepctl -host vertica1 status -name tasks
Application Tips and Tricks
Application Design Practices
• Primary keys on all tables
  • Tungsten requires single-column keys
• Clean schema design *really* helps
• UTF-8 character set--or at least be consistent
• Use GMT timezone--or be very consistent about dates
• Use row replication on the MySQL master (binlog_format=row)
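The first practice is easy to check before installation. A hypothetical pre-flight check: given a map of table name to primary key columns (for example gathered from MySQL's `information_schema.KEY_COLUMN_USAGE` rows where `CONSTRAINT_NAME = 'PRIMARY'`, with tables absent from that view having no key), it reports tables that have no key or a composite key.

```python
def check_primary_keys(tables):
    """Return {table: problem} for tables without a single-column primary key."""
    problems = {}
    for name, pkey in sorted(tables.items()):
        if not pkey:
            problems[name] = "no primary key"
        elif len(pkey) > 1:
            problems[name] = "composite primary key"
    return problems
```

Running this against each replicated schema before the first load avoids discovering a keyless table halfway through a batch.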
Transforming Data -- Replicator Filters
• Tables to ignore/include?
• Schema/table/column renaming?
• Map names to upper/lower case?
• Drop data?
tungsten-installer --master-slave -a \
  --service-name=mysql2vertica \
  ... \
  --svc-extractor-filters=pkey,colnames,replicate \
  --property=replicator.filter.replicate.do=db01.*,db02.* \
  ...
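The `replicator.filter.replicate.do` property lists `schema.table` patterns to include. A minimal sketch of how such an include list could be evaluated, assuming glob-style wildcards (modeled here with `fnmatchcase`; the filter's exact matching rules are an assumption):

```python
from fnmatch import fnmatchcase


def parse_patterns(spec):
    """Split a comma-separated pattern list such as 'db01.*,db02.*'."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def should_replicate(schema, table, do_patterns):
    """Include a table only if schema.table matches one of the patterns."""
    name = f"{schema}.{table}"
    return any(fnmatchcase(name, pattern) for pattern in do_patterns)
```

With the demo's setting, tables in `db01` and `db02` pass and everything in `db03` is dropped before it reaches the THL.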
List of Commonly Used Filters
• CDC -- Transform log to record of changes
• colnames -- Add column names
• dbtransform -- Change db name only
• enumtostring -- Make MySQL enums a string
• pkey -- Add primary key metadata
• rename -- Rename db/table/column
• replicate -- Replicate/don’t replicate tables
• zerodate2null -- Convert MySQL zero dates ('0000-00-00') to NULL
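Filters run one after another on each row. A hypothetical sketch of two of the filters above chained together: MySQL row-based binlog events carry ENUM values as their 1-based index, so an `enumtostring`-style step needs the column's label list to recover the string, and a `zerodate2null`-style step maps MySQL zero dates to NULL (`None` here). The row and metadata shapes are assumptions.

```python
ZERO_DATES = {"0000-00-00", "0000-00-00 00:00:00"}


def enumtostring(row, enum_labels):
    """Replace 1-based ENUM indexes with their labels; index 0 is left as-is."""
    out = dict(row)
    for column, labels in enum_labels.items():
        index = out.get(column)
        if isinstance(index, int) and 0 < index <= len(labels):
            out[column] = labels[index - 1]
    return out


def zerodate2null(row):
    """Turn MySQL zero dates into None (NULL)."""
    return {k: (None if v in ZERO_DATES else v) for k, v in row.items()}


def apply_filters(row, filters):
    """Run a row through each filter in order, like a filter chain."""
    for f in filters:
        row = f(row)
    return row
```

Both steps return new dicts rather than modifying the input, so the order of the chain is the only thing that determines the result.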
Transforming Data -- Staging Server(s)
OLTP Servers → Staging Server with Triggers/SQL → Vertica Cluster
Transforming Data -- Merge Script Hacks
# Hacked load script for Vertica--deletes always precede inserts, so
# inserts can load directly.

# Extract deleted data keys and put in temp CSV file for deletes.
!egrep '^"D",' %%CSV_FILE%% |cut -d, -f4 > %%CSV_FILE%%.delete
COPY %%STAGE_TABLE_FQN%% FROM '%%CSV_FILE%%.delete' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'

# Delete rows using an IN clause. You could also set a column value to
# mark deleted rows.
DELETE FROM %%BASE_TABLE%% WHERE %%BASE_PKEY%% IN (SELECT %%STAGE_PKEY%% FROM %%STAGE_TABLE_FQN%%)

# Load inserts directly into base table from a separate CSV file.
!egrep '^"I",' %%CSV_FILE%% |cut -d, -f4- > %%CSV_FILE%%.insert
COPY %%BASE_TABLE%% FROM '%%CSV_FILE%%.insert' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'
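The `egrep`/`cut` split in this script assumes no data value contains a comma, because `cut -d,` ignores the quotes. A sketch of the same split in Python using the `csv` module, which does respect quoted fields; it assumes the same three leading metadata fields that `cut -f4` skips.

```python
import csv
import io


def split_batch(csv_text):
    """Split a batch CSV into delete keys and insert rows, quote-aware."""
    deletes, inserts = [], []
    for fields in csv.reader(io.StringIO(csv_text)):
        opcode, data = fields[0], fields[3:]
        if opcode == "D":
            deletes.append(data[0])  # key only, like `cut -d, -f4`
        elif opcode == "I":
            inserts.append(data)     # all data columns, like `cut -d, -f4-`
    return deletes, inserts
```

With `cut`, a value such as `"Widget, large"` would be split in two and shift every later column; the quote-aware reader keeps it intact.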
Provisioning -- Using CSV
mysql> SELECT * FROM sales INTO OUTFILE 'sales.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';
...(Fix up data if necessary, e.g. MySQL writes NULL values as \N)...
vsql> COPY sales FROM 'sales.csv' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"';
Provisioning Using a Sandbox Server
OLTP Server → Temporary Sandbox Server → Vertica Cluster
1. Restore logical backup
2. Replicate restored transactions
3. Replicate normally after restore loads
Parallel Provisioning from Sandbox
OLTP Server → Temporary Sandbox Server → Vertica Cluster
1. Restore logical backup
2. Replicate restored data in parallel
3. Replicate normally after restore loads
Complex Topologies: Fan-In
Masters logos1 and logos2 → slave services logos1 and logos2 (one per master) → Vertica Cluster
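In a fan-in topology each master gets its own slave service, and each service tracks its own position. A hypothetical model of that idea: the service names echo the slide, the THL entries are simplified `(seqno, table, key, row)` tuples, and it assumes the two masters write disjoint schemas so their rows never collide in the shared target.

```python
class SlaveService:
    """One slave service per master, each with its own applied position."""

    def __init__(self, name):
        self.name = name
        self.last_seqno = -1

    def apply(self, master_thl, target):
        """Apply entries newer than this service's position to the shared target."""
        for seqno, table, key, row in master_thl:
            if seqno > self.last_seqno:
                target.setdefault(table, {})[key] = row
                self.last_seqno = seqno
        return self.last_seqno
```

Because positions are kept per service, one master's stream can pause or restart without affecting the other, and re-running a service that is already current is a no-op.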
Wrapping Up
Tungsten University Sessions
• Load a Vertica Data Warehouse with MySQL Data (May 30, 10am PDT and June 4, 4pm CEST)
Send feedback to: [email protected]
©Continuent 2012.
Continuent Web Page: http://www.continuent.com
Tungsten Replicator 2.0: http://code.google.com/p/tungsten-replicator
Our Blogs:
http://scale-out-blog.blogspot.com
http://flyingclusters.blogspot.com
http://datacharmer.org/blog
http://www.continuent.com/news/blogs
560 S. Winchester Blvd., Suite 500, San Jose, CA 95128
Tel +1 (866) 998-3642
Fax +1 (408) 668-1009
e-mail: [email protected]