My Experience as a MySQL Consultant on Upgrading MySQL: it's quite complex...
Kenny Gryp, MySQL Practice Manager
Table of Contents
The Official Documentation
Make Your Own Documentation
Potential Risks
Establish Upgrade Method For A Single Server
Rollback Scenario Testing
Test Writes
Test Individual Reads
Workload Testing
Establish (& Test) Migration Process
Migration In Production
(Rollback)
Post-Migration Assessment
Oracle's Recommended Process
Back up your data
Read all release notes and assess: https://dev.mysql.com/doc/relnotes/mysql/5.7/en/
Read "Changes Affecting Upgrades to MySQL 5.7": https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html
Oracle's Recommended Process
Upgrade Slaves First
In-Place Upgrade:
Clean shutdown (innodb_fast_shutdown=0)
Run mysql_upgrade
Logical Upgrade:
mysqldump data
Import data again
Run mysql_upgrade to fix mysql schema
http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
Oracle's Recommended Process (cont.)
A Lot of Risk:
No guarantee queries will execute the same
No guarantee queries will be the same speed or faster
No guarantee all your queries will still work (new default, stricter sql_mode)
There is no official support for upgrading from <5.6 to 5.7
but we might actually be able to do that
Documenting The Process
PEBKAC: Human errors happen and create issues
import data using wrong character set
setting up replica using wrong binlog file/pos
...
Document every step; we need to repeat it multiple times
Optimizer Changes
Example: index_merge_intersection
Often seen during migrations to MySQL 5.6
Affects environments with sub-optimal indexing
Queries with c1='a' AND c2='b' when composite index (c1,c2) is missing
Is often slower when the selectivity of 1 of the 2 columns is bad (and it happens frequently)
Result: a lot of queries were slower in the new environment
Need SELECT performance tests between versions
https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/
New Defaults In MySQL 5.7
The new defaults in MySQL 5.7 make a lot of sense:
More use of available features and performance enhancements out of the box
More strictness with data/query validation
New Reserved words
Applications might not be ready for it.
Drupal 7 - https://www.drupal.org/node/2545480
They will/might break the application more easily:
sql_mode=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
innodb_strict_mode=1
Needs SELECT & DML query validity tests between versions
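To see which modes an application will newly face, the old and new sql_mode values can be compared as sets. A minimal Python sketch: `sql_mode_diff` is a hypothetical helper, and the 5.6 value is an assumption (the compiled-in 5.6 default; a real server may set something else in my.cnf, so check SELECT @@GLOBAL.sql_mode).

```python
def sql_mode_diff(old_mode: str, new_mode: str):
    """Return (added, removed) sql_mode flags between two comma-separated values."""
    old = {m.strip().upper() for m in old_mode.split(",") if m.strip()}
    new = {m.strip().upper() for m in new_mode.split(",") if m.strip()}
    return sorted(new - old), sorted(old - new)


# Assumption: the old server runs the compiled-in 5.6 default.
MODE_56 = "NO_ENGINE_SUBSTITUTION"
# 5.7 default as listed on the slide.
MODE_57 = (
    "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,"
    "ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
)

added, removed = sql_mode_diff(MODE_56, MODE_57)
print("newly enforced:", ", ".join(added))
print("no longer set:", ", ".join(removed) or "(none)")
```

Every flag in the "newly enforced" list is a candidate reason for a query that worked before to fail or warn after the upgrade.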
Other Changes in MySQL 5.7
Support for passwords that use the older pre-4.1 password hashing format is removed.
Minor Versions Also At Risk
CREATE TABLE date (d DATE);
INSERT INTO date VALUES ('2017-04-19');
SELECT COUNT(*) FROM date WHERE d < NOW()-INTERVAL 1 DAY;
MySQL 5.0.37
+-------+
|     0 |
+-------+
MySQL 5.0.45
+-------+
|     1 |
+-------+
Seen with DELETE FROM date WHERE d < NOW()-INTERVAL 1 DAY in binlog_format=STATEMENT environments.
Needs SELECT & DML query result tests between versions
Workload
SYNC_BINLOG=1 in MySQL 5.7
Can impact certain environments, might not be noticed when looking at a single query
InnoDB LRU Flushing changes require tuning for heavy workloads in 5.6 (innodb_lru_scan_depth)
When switching to MySQL 8.0 with the new data dictionary...
Need to do Workload Testing between versions
http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html
http://lefred.be/content/sync_binlog-1-in-5-7/
Upgrade Method For A Single Server
Follow MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
Be sure to document every command
Restore from backup
Or take a replica you can spare
Writes - Replication Consistency
pt-table-checksum: validate consistency in a replication topology
Identify problems caused by PEBKAC
Ensure events replicate properly (binlog_format=STATEMENT)
Upgrade a replica or add a replica which is using the modified version.
Do it on production; it will have no result in test/staging
https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html
Rollback Scenario Testing
Possibility to fall back in case something went wrong during migration
Can be done using replication, but has to be tested!
Rollback Scenario Testing
You might need to change some settings in your new my.cnf to be able to support replicating back.
Example:
binlog_checksum = NONE
binlog_row_image = FULL
binlog_rows_query_log_events = OFF
log_bin_use_v1_row_events = 1
gtid_mode = OFF
log_slave_updates=1
skip-slave-start
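Since these settings are easy to forget (PEBKAC again), a small check can report which of them a my.cnf lacks. A rough Python sketch under stated assumptions: `missing_rollback_settings` is a hypothetical helper, the expected values are simply the ones from the example above, and the parser is Python's generic INI parser, which does not understand `!include` directives or MySQL's dash/underscore equivalence.

```python
import configparser

# Expected values copied from the example; None means "flag without a value".
ROLLBACK_SETTINGS = {
    "binlog_checksum": "NONE",
    "binlog_row_image": "FULL",
    "binlog_rows_query_log_events": "OFF",
    "log_bin_use_v1_row_events": "1",
    "gtid_mode": "OFF",
    "log_slave_updates": "1",
    "skip-slave-start": None,
}


def missing_rollback_settings(my_cnf_text: str, section: str = "mysqld") -> dict:
    """Return {option: problem} for rollback options that are absent or differ."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(my_cnf_text)
    found = dict(parser[section]) if parser.has_section(section) else {}
    problems = {}
    for key, wanted in ROLLBACK_SETTINGS.items():
        if key not in found:
            problems[key] = "missing"
        elif wanted is not None and (found[key] or "").upper() != wanted:
            problems[key] = f"is {found[key]!r}, expected {wanted!r}"
    return problems


# Hypothetical config with GTID still on and skip-slave-start forgotten.
example_cnf = """
[mysqld]
binlog_checksum = NONE
binlog_row_image = FULL
binlog_rows_query_log_events = OFF
log_bin_use_v1_row_events = 1
gtid_mode = ON
log_slave_updates=1
"""

for option, problem in missing_rollback_settings(example_cnf).items():
    print(option, "->", problem)
```

Which options are really needed depends on the versions on both ends, so the dictionary is a starting point to adapt, not a fixed list.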
Where To Run pt-table-checksum?
GTID:
pt-table-checksum can only be run on Master (Errant Transactions)
Or scratch the pt-table-checksum host after tests
non-GTID:
pt-table-checksum can be run on intermediate master
binlog_format=ROW:
only 1 tier below can be checksummed
run on every tier that has a replica (for rollback)
pt-table-checksum can bring prod overhead when run on active master
Let replication run for a while before checksumming
pt-table-checksum results
On every replica (including rollback):
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksum
WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
+----+-----------------+------------+--------+
| db | tbl             | total_rows | chunks |
+----+-----------------+------------+--------+
| db | telephone_debit |      44342 |      1 |
| db | orderline       |      21451 |      3 |
| db | orders          |   25125215 |     12 |
+----+-----------------+------------+--------+
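The WHERE clause flags a chunk when the row counts differ, the CRCs differ, or only one side has a CRC at all. The same logic as a small Python sketch over rows already fetched as dictionaries; `chunk_differs` and `summarize` are hypothetical names, and the second sample row is made up for illustration.

```python
from collections import defaultdict


def chunk_differs(row: dict) -> bool:
    """Mirror the SQL predicate; a comparison involving NULL (None) does not match."""
    m_cnt, t_cnt = row["master_cnt"], row["this_cnt"]
    m_crc, t_crc = row["master_crc"], row["this_crc"]
    cnt_diff = m_cnt is not None and t_cnt is not None and m_cnt != t_cnt
    crc_diff = m_crc is not None and t_crc is not None and m_crc != t_crc
    null_diff = (m_crc is None) != (t_crc is None)
    return cnt_diff or crc_diff or null_diff


def summarize(rows):
    """Per (db, tbl): (SUM(this_cnt), COUNT(*)) over the differing chunks."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        if chunk_differs(row):
            entry = totals[(row["db"], row["tbl"])]
            entry[0] += row["this_cnt"] or 0
            entry[1] += 1
    return {key: tuple(value) for key, value in totals.items()}


rows = [
    # Chunk values as shown in the analysis slide.
    {"db": "db", "tbl": "telephone_debit", "this_cnt": 44342, "master_cnt": 44342,
     "this_crc": "7fd37eb9", "master_crc": "b7babd94"},
    # Hypothetical consistent chunk, not from the slides.
    {"db": "db", "tbl": "telephone_debit", "this_cnt": 1000, "master_cnt": 1000,
     "this_crc": "aaaaaaaa", "master_crc": "aaaaaaaa"},
]
print(summarize(rows))
```

Note that a chunk can have identical row counts and still differ: the CRC comparison is what catches changed column values.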
pt-table-checksum - Analysis
Which chunks failed?
db: db
tbl: telephone_debit
chunk: 100
chunk_time: 0.4956125
chunk_index: PRIMARY
lower_boundary: 5014733
upper_boundary: 5059074
this_crc: 7fd37eb9
this_cnt: 44342
master_crc: b7babd94
master_cnt: 44342
ts: 2013-02-05 01:59:48
pt-table-checksum - Analysis
SELECT * INTO outfile '/tmp/telephone_debit_mysql56'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
SELECT * INTO outfile '/tmp/telephone_debit_mysql57'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
# diff -u /tmp/telephone_debit_mysql5{6,7}
Use twindb_table_compare! https://github.com/twindb/twindb_table_compare
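The comparison of the two dumps can also be scripted. A minimal Python sketch using the standard difflib module; `diff_dumps` is a hypothetical helper and the rows are invented stand-ins for the two outfiles (real files would be read with open(path).read().splitlines()).

```python
import difflib


def diff_dumps(old_lines, new_lines, old_name="mysql56", new_name="mysql57"):
    """Return unified-diff lines between two row dumps (one row per line)."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


# Hypothetical tab-separated rows, not taken from the slides.
rows_56 = ["5014733\t10.10", "5014734\t0.30"]
rows_57 = ["5014733\t10.10", "5014734\t0.30000000000000004"]

for line in diff_dumps(rows_56, rows_57):
    print(line)
```

An empty result means the chunk is identical row for row; for the comparison to be meaningful both dumps must come out in the same order, so an ORDER BY on the primary key in the SELECT is worth considering.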
pt-table-checksum - Analysis
Wrong upgrade method
backups
wrong replication file/pos
...
binlog_format=STATEMENT using (UUID()...)
Commonly Seen Issues replicating older versions:
Floating point differences: Storing currencies in a DOUBLE
Temporal data types
Invalid dates converted to zero dates
Trailing spaces in CHAR fields
Testing Writes
Consistency Checks Process:
Checksum
Check for differences
On new environment
On rollback environment
For each inconsistency
Analyze diff
Find root cause
Fix problem
Document problem & solution
Repeat checksum again
Testing Reads - Collect Queries
Collection Techniques:
Slow Query Log
long_query_time=0
Careful when ~10000+ QPS
Percona Server: log_slow_rate_limit
tcpdump
'packets lost' in libpcap
Application/Load Balancer queries
Ensure:
Get the full workload (long enough)
Get data from Master & Replicas
Collect batchjob queries running at night
https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html
Testing Reads - Setup 2 Environments
Need 2 Test Servers:
Reuse servers from checksum + rollback
Ensure they have the same data (break replication at same time)
Same HW specifications
Similar Configurations on buffer pool, flatc...
Fast enough to more or less resemble production
Optionally can be done using 1 machine (pt-upgrade --save-results)
Testing Reads - pt-upgrade
pt-upgrade:
runs one query at a time on both test environments
compares differences:
warnings/errors
resultset (even different order)
query response time
Run pt-upgrade on third host with similar network latency
Run twice to warm up buffer pool first (need to be equal)
Can also compare writes for execution time & warnings
Filter slowlog initially to limit similar queries
pt-query-digest --no-report --output slowlog --samples 20
https://www.percona.com/doc/percona-toolkit/3.0/pt-upgrade.html
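The idea behind filtering the slow log is to group queries that differ only in their literal values and keep a limited number of samples per group. A simplified Python sketch of that idea: the `fingerprint` function is a rough approximation written for illustration, not pt-query-digest's actual algorithm, and both function names are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(query: str) -> str:
    """Rough query class: lowercase, literals replaced by '?', whitespace collapsed."""
    q = query.strip().lower()
    q = re.sub(r"'(?:[^'\\]|\\.)*'", "?", q)   # single-quoted strings
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)    # standalone numbers
    q = re.sub(r"\s+", " ", q)
    return q


def sample_queries(queries, samples=20):
    """Keep at most `samples` queries per fingerprint, preserving input order."""
    seen = defaultdict(int)
    kept = []
    for query in queries:
        fp = fingerprint(query)
        if seen[fp] < samples:
            seen[fp] += 1
            kept.append(query)
    return kept


queries = [f"SELECT * FROM t WHERE id={i}" for i in range(5)]
queries.append("SELECT 1 FROM dual WHERE name='x'")
print(sample_queries(queries, samples=2))
```

Keeping several samples per class rather than one matters because the same query shape can behave very differently for different values (for example, a skewed column).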
Testing Reads - pt-upgrade
Reporting class because there are 1000 row diffs.
Total queries 10
Unique queries 10
Discarded queries 0
select ... from ...
##
## Row diffs: 10
##
-- 1.
@ row 2
< 13178,"dim0",37,2,21,,,0,0,0,1,NULL,NULL
> 13178,"dimø",37,2,21,,,0,0,0,1,NULL,NULL
...
Testing Reads - pt-upgrade
Reporting class because it has diffs, but hasn't been reported yet.
SELECT * FROM `database`.table WHERE treeid = '' AND productid='0'
## Warning diffs: 2
Code: 1366
Level: Warning
Message: Incorrect integer value: '' for column 'treeid' at row 1
vs.
No warning 1366
Testing Reads - pt-upgrade
SELECT *
FROM `database`.client_orders
WHERE client=? AND blacklist=? LIMIT ?
## Query time diffs: 1
-- 1.
0.000513 vs. 0.036395 seconds (70.9x increase)
SELECT *
FROM `database`.client_orders
WHERE client=57450 AND blacklist=1 LIMIT 1
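The "70.9x increase" is simply the new time divided by the old time. A tiny Python sketch of that arithmetic; the 2.0 cutoff in `is_regression` is a hypothetical threshold chosen for illustration, not a pt-upgrade setting.

```python
def time_ratio(old_seconds: float, new_seconds: float) -> float:
    """How many times slower (>1) or faster (<1) the new server answered."""
    return new_seconds / old_seconds


def is_regression(old_seconds: float, new_seconds: float, threshold: float = 2.0) -> bool:
    """Flag a query whose response time grew by at least `threshold` times."""
    return time_ratio(old_seconds, new_seconds) >= threshold


ratio = time_ratio(0.000513, 0.036395)
print(f"{ratio:.1f}x increase")  # prints "70.9x increase"
```

Ratios on sub-millisecond queries are noisy, which is one reason for running pt-upgrade twice with a warm buffer pool before trusting a reported difference.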
Testing Reads Process
Collect queries
Run pt-upgrade (twice)
For each entry in report
Figure out why it is reported
Deploy fix in Prod Application
Make schema changes
Document analysis
Run pt-upgrade again
Workload Testing - Query Playback
Uses slowlog to replay queries
Needs long_query_time=0 - challenging on busy servers
Enough data during peak workload
Tries to execute workload as realistically as possible: same connections, same transactions, same delays between queries
Run against both environments, compare speed
Think about preloading the buffer pool the same way on both
Active development by Marius Wachtler (ex-Dropbox)! Thank you!
(unofficial product of Percona, no support)
Workload Testing - ProxySQL Mirroring
Mirror queries from Load Balancer to test environment
Good Blogpost: https://www.pythian.com/blog/using-proxysql-validate-mysql-updates/
Migration Process
Create Migration Plan
Different for every environment/application
Upgrade a replica first for a couple of days/weeks?
How to switch masters?
How is failover being handled nowadays? MHA, Orchestrator, Manual, GTID/mysqlrpladmin...?
Test in staging!
Rollback
What went wrong?
I did not follow the full process! (or I forgot to document it)
Do consistency checks again!
Post-Migration
Check trending for different behavior
more cpu load?
more disk IO?
higher amount of innodb_rows_* and handler_*
threads_running stability?
do some query optimization
If all looks good, scratch the 5.6 rollback & make it 5.7
Remove the rollback specific configuration options
Multi-Use
(Minor MySQL version upgrades)
Major MySQL version upgrades
Switching Hardware from Intel -> AMD architecture
Using a new kernel/libc/memory allocator
Switching storage engines
MariaDB/Percona Server/MySQL...
Do I really have to go through this?
Many success stories:
Have done several MySQL upgrades from 4.1 -> 5.5 without intermediate slaves
Upgraded environments with major schema changes in the mix (mssql-style environments using stored procedures only)
Found numerous application bugs using this process
Optimized many customers' schemas/queries in the meantime
As long as you follow this process completely, the risk of running into problems is quite small.
Do I really have to go through this?
It Depends:
Your business might be risk-averse: every change has to be thoroughly tested
Other companies just upgrade a replica in production and see how it goes
My suggestion is to do this at least for:
Major MySQL version upgrades
Switching storage engines
Summary
Test Step                        Skip?
Document Upgrade Single Server   Really? Why?
Rollback Scenarios               Not Recommended
Consistency Checks               Required, No Debate!
Read Tests                       Strongly Suggested
Workload Tests                   Possible (Early Adopter Alert)
Migration Tests                  Not Recommended To Skip