Olivier Dasini - @freshdaz
Upgrade to MySQL 5.6 without downtime
Meetup LeMug.fr @Dailymotion - Paris - Sept 17, 2015
Agenda
Me, Myself & I
Technical background
Why upgrade to 5.6?
Performance testing
Preprod upgrade
Production upgrade
Wrap-up
Me, Myself & I
Olivier DASINI - @freshdaz
● MySQL geek & data enthusiast
● Technical writer, blogger and speaker
● Insatiable hunger for learning
● Co-creator of the French MySQL User Group
Technical background 1/3
MySQL users can be split into 3 types according to the order of magnitude of their working set:
● <= tens of GBs : 20%
  ○ MySQL usage probably not (so) critical
  ○ Migration (quite) easy, could be manual
● <= tens of TBs : 75%
  ○ MySQL is critical => strong production constraints
  ○ Migration should be carefully planned
  ○ Needs automation, although some parts could be manual
● >= hundreds of TBs : 5%
  ○ MySQL is highly critical. Think twice (or more) before upgrading.
  ○ Same as above, with automation everywhere
Technical background 2/3
The company :
● Software development
● Provides a cloud-based customer service platform
○ ~ 1,000 people
○ ~ 60,000 paid customers in 150 countries
Technical background 3/3
MySQL flavour : Percona Server 5.5 on Fusion IO
Data size : ~ 30 TB | Daily growth rate : up to 40 GB
Number of MySQL replica groups (1 master / n slaves) : ~ 50
Number of MySQL instances : ~ 200
Mostly OLTP-oriented workload - InnoDB tables
Thousands of queries per second (qps), mostly reads (SELECTs)
Sensitive to replication lag
No downtime allowed!!!
Why upgrade to 5.6? 1/3
Tons of cool new stuff :
● Security improvements
● InnoDB enhancements
● Partitioning
● Performance Schema
● Replication and logging
● Optimizer enhancements
● …
Complete list : http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html
Why upgrade to 5.6? 2/3
Choose which features we'd like to have.
Team brainstorming...
● Decide which new features suit us
  ○ Schedule when we'll start using them
  ○ Avoid too many changes at one time
● Pay attention to deprecated features
  ○ They'll probably be removed in a future version
  ○ They shouldn't be used anymore
● Pay extra attention to removed features
  ○ They'll break your server
Why upgrade to 5.6? 3/3
Team brainstorming result :
● InnoDB enhancements
  ○ Persistent stats
  ○ Online DDL
  ○ New flushing algorithm
  ○ New checksum algorithm
● Performance Schema
● Replication
  ○ Smaller row images for row-based replication
  ○ Crash-safe master ⇔ crash-safe binlog
  ○ Crash-safe slave ⇔ master / slave info logged to tables
  ○ GTID (for automatic switchover / failover) : [Phase 2]
  ○ Parallel replication : [Phase 3]
● Optimizer enhancements...
Upgrade Confidence Index : 60%
Performance testing 1/13
The 5.6 upgrade will be awesome
(at least in theory)
Many articles prove it, yeah!
http://dimitrik.free.fr/blog/archives/2013/02/mysql-performance-mysql-56-vs-mysql-55-vs-mariadb-55.html
https://blogs.oracle.com/MySQL/entry/mysql_5_6_is_a
Benchmarks never lie :)… but is their truth ours?
In real life, performance depends on many factors such as workload, hardware, configuration, …
What about us?
Performance testing 2/13
● The plan is to get our own numbers
● Compare 5.5 and 5.6 performance in a production context
● Unfortunately we have customers !!! :)
● So we test out of production, but in a context as similar as possible:
  ○ Data
  ○ Queries
  ○ Workload
  ○ Hardware
  ○ Configuration...
=> Ad hoc 5.6 upgrade on 1 server
Performance testing 3/13
Build a 5.6 test server from a 5.5 slave.
Choose a "small" cluster (1.5 TB)
The ad hoc upgrade is quite straightforward:
Clone a 5.5 server -> Upgrade to 5.6 -> Set up replication
Steps
● Take a binary backup (Xtrabackup) from db5.5 (5.5 instance)
● Restore the binary backup on the new server (5.6 candidate, but still running 5.5)
● Upgrade the binaries to 5.6 + new configuration (5.6 my.cnf)
● Run mysql_upgrade
● Start replication (the master is still on 5.5)
Performance testing 4/13
Issue : Fatal replication error 1/2
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'db_master_5.5-bin-log.003440' at 974453835, the last event read from '/var/log/mysql/db_master_5.5-bin-log.003440' at 974453835, the last byte read from '/var/log/mysql/db_master_5.5-bin-log.003440' at 974453854.'
On the master binary log:
ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1852797793, event_type: 104
Could not read entry at offset 974453835: Error in log format or read error.
#150318 18:09:39 server id 174326798 end_log_pos 107 Start: binlog v 4, server v 5.5.32-31.0-log created 150318 18:09:39
Performance testing 5/13
Issue : Fatal replication error 2/2
● We never found an explanation.
● We tried to increase max_allowed_packet dynamically on both the master and the 5.6 slave… with no effect.
● Only the 5.6 slave was impacted, i.e. no issues on the 5.5 slaves
● No fix except ignoring this binlog, i.e. switching to the next one.
  ○ Meaning a risk of losing events…
  ○ And a high risk of inconsistency
So we dropped the data and reloaded a fresh 5.5 dump + mysql_upgrade.
Performance testing 6/13
The goal is to compare performance between 5.5 & 5.6
5.6 status :
  ○ Replicating data like any other 5.5 slave
  ○ Contains production data
  ○ Same hardware characteristics
Ready to start our benchmarks \o/
Performance testing 7/13
Tool
pt-upgrade : https://www.percona.com/doc/percona-toolkit/2.2/pt-upgrade.html
pt-upgrade executes queries in the given MySQL LOGS on each DSN, compares the results, and reports any significant differences. The tool can also save the results for later analyses. LOGS can be slow, general, binary, tcpdump and raw.
Best practices
● Split your (slow) logs into small chunks : 200 ~ 500 MB of data
  ○ Easier to manage
  ○ Output easier to analyse
● Choose your data samples carefully
  ○ Capture queries at different times
  ○ Reduces the risk of missing important queries
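The chunking advice above can be sketched in Python. This is only an illustration, not the tooling used in the talk: split_slow_log and is_entry_start are hypothetical helpers, and the sketch assumes that every slow log entry starts with a "# Time:" line or, failing that, a "# User@Host:" line. Check that assumption against your own log format before relying on it.

```python
from typing import Iterable, Iterator, List


def is_entry_start(line: str, previous: str) -> bool:
    """Assumed entry boundary: '# Time:' or a '# User@Host:' not preceded by it."""
    if line.startswith("# Time:"):
        return True
    return line.startswith("# User@Host:") and not previous.startswith("# Time:")


def split_slow_log(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Yield chunks of at least max_bytes, cut only on entry boundaries."""
    chunk: List[str] = []
    size = 0
    previous = ""
    for line in lines:
        # Close the current chunk only when it is big enough AND a new
        # entry begins, so that no query is split across two files.
        if chunk and size >= max_bytes and is_entry_start(line, previous):
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line.encode("utf-8"))
        previous = line
    if chunk:
        yield chunk
```

Passing an open file object as lines and writing each yielded chunk to its own numbered file would give pieces that can be fed to pt-upgrade one by one.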
Performance testing 8/13
Phase 1 - Collect Slow Logs
For each collection :
● Connect to a 5.5 slave in production
● Set long_query_time to 0
  ○ mysql> SET GLOBAL long_query_time = 0;
● Clean the slow log
  ○ $ cp /dev/null /var/log/mysql/slow-log
● Wait for X mins or until the slow log grows to ~300 MB (whichever comes first)
● Set long_query_time back to its default value
  ○ mysql> SET GLOBAL long_query_time = <DEFAULT_VALUE>;
● Copy the slow log under a dated name
  ○ $ cp /var/log/mysql/slow-log ./slow-log-$(date +"%F-%H-%M-%S")
● Clean the slow log
  ○ $ cp /dev/null /var/log/mysql/slow-log
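The "X mins or ~300 MB, whichever comes first" rule and the dated copy name could be scripted roughly as follows. This is a minimal sketch under assumptions: the function names, the polling interval and the injectable clock / sleep / getsize parameters are all hypothetical and were not part of the original procedure.

```python
import os
import time
from datetime import datetime


def should_stop_capture(elapsed_s, log_bytes, max_seconds, max_bytes):
    """True once either limit is reached (whichever comes first)."""
    return elapsed_s >= max_seconds or log_bytes >= max_bytes


def wait_for_capture(path, max_seconds, max_bytes, poll_s=5.0,
                     clock=time.monotonic, sleep=time.sleep,
                     getsize=os.path.getsize):
    """Poll the slow log until a limit is hit; return the elapsed seconds."""
    start = clock()
    while True:
        elapsed = clock() - start
        if should_stop_capture(elapsed, getsize(path), max_seconds, max_bytes):
            return elapsed
        sleep(poll_s)


def dated_copy_name(now, prefix="slow-log"):
    """Same layout as the shell's date +"%F-%H-%M-%S" used above."""
    return f"{prefix}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"
```

For instance, wait_for_capture("/var/log/mysql/slow-log", 600, 300 * 1024 ** 2) would return after 10 minutes or once the file reaches about 300 MB; the 10 minute value is only an example for X.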
Performance testing 9/13
Phase 2 - Benchmarks (cold & warm buffers) and Compare 1/2
1. Ensure both slaves - 5.5 & 5.6 - have no replication lag
2. Stop replication on db_5.5:
   a. mysql_5.5> STOP SLAVE;
3. Wait for a few seconds....
4. Stop replication on db_5.6:
   a. mysql_5.6> STOP SLAVE;
5. Note down the master log file and position from step 4.
6. Both slaves should be in perfect sync. Bring db_5.5 up to db_5.6's master log file and position, so that when pt-upgrade is run, both servers return the same result sets and the same number of rows:
   a. mysql_5.5> START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE = '<log_file>', MASTER_LOG_POS = <log_position>;
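A small Python sketch of the bookkeeping behind steps 5 and 6: compare the two slaves' master coordinates and build the catch-up statement. It assumes the coordinates are read from the Relay_Master_Log_File and Exec_Master_Log_Pos fields of SHOW SLAVE STATUS and that binlog file names end in a numeric sequence; the helper names are hypothetical.

```python
from typing import Tuple

# (master log file, executed position), assumed to come from
# Relay_Master_Log_File / Exec_Master_Log_Pos in SHOW SLAVE STATUS.
Coordinates = Tuple[str, int]


def binlog_sequence(log_file: str) -> int:
    """'db_master_5.5-bin-log.003440' -> 3440 (numeric suffix assumed)."""
    return int(log_file.rsplit(".", 1)[1])


def is_behind(a: Coordinates, b: Coordinates) -> bool:
    """True if slave a has executed less of the master binlog than slave b."""
    return (binlog_sequence(a[0]), a[1]) < (binlog_sequence(b[0]), b[1])


def start_until_statement(target: Coordinates) -> str:
    """Statement to run on the slave that is behind, as in step 6."""
    log_file, log_pos = target
    return ("START SLAVE SQL_THREAD UNTIL "
            f"MASTER_LOG_FILE = '{log_file}', MASTER_LOG_POS = {log_pos};")
```

Checking is_behind(db_5_5, db_5_6) before issuing the statement guards against the case where the slave stopped first is, unexpectedly, the one that is ahead.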
Performance testing 10/13
Phase 2 - Benchmarks (cold & warm buffers) and Compare 2/2
7. Run pt-upgrade on db_5.5 (reference results)
   a. Cold bench (after a mysql restart)
   b. Warm bench (after the first run)
8. Run pt-upgrade on db_5.6
   a. Cold bench (after a mysql restart)
   b. Warm bench (after the first run)
9. Put db_5.5 back into production
Performance testing 11/13
Our tests were interesting
Query response time was usually equal or better in 5.6
However, we found 1 big query regression
● Query time: from 0.09 sec to 16 min 40.35 sec
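To pick such a regression out of thousands of compared queries, per-query timings can be filtered with a simple rule. The sketch below is not pt-upgrade's report format: find_regressions, the query labels and both thresholds are hypothetical, and the timings are assumed to have been extracted beforehand as seconds per query.

```python
from typing import Dict, List, Tuple


def find_regressions(base: Dict[str, float], candidate: Dict[str, float],
                     min_ratio: float = 2.0,
                     min_seconds: float = 1.0) -> List[Tuple[str, float, float]]:
    """Return (query, base_s, candidate_s), worst absolute slowdown first."""
    regressions = []
    for query, base_s in base.items():
        cand_s = candidate.get(query)
        if cand_s is None:
            continue  # query not replayed on the candidate server
        # Ignore noise: the query must be slow in absolute terms AND
        # noticeably slower than on the reference server.
        if cand_s >= min_seconds and cand_s >= base_s * min_ratio:
            regressions.append((query, base_s, cand_s))
    regressions.sort(key=lambda r: r[2] - r[1], reverse=True)
    return regressions
```

With the figures from this slide, a query going from 0.09 sec to 16 min 40.35 sec (1000.35 sec) is reported, while a query that is slow on both versions is not.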
Upgrade Confidence Index : 75%
Performance testing 12/13
Issue : Query regression
● Basically, the optimizer chose the wrong index.
● Bug opened with MySQL (by Percona)
Possible fixes :
● Disable the index extensions algorithm (pre-5.6.9 behavior)
  ○ SET optimizer_switch="use_index_extensions=off";
● Use an index hint: IGNORE / FORCE INDEX
  ○ … IGNORE INDEX (bad_index) … || … FORCE INDEX (good_index) …
● Use the NULL-safe equal operator, i.e. replace "IS NULL" with "<=> NULL"
  ○ … column_id <=> NULL …
● Rewrite the query
  ○ The most sustainable choice
  ○ Many possibilities… worked out with the appropriate dev team
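Why "<=> NULL" can stand in for "IS NULL": the NULL-safe operator never returns NULL and treats two NULLs as equal, whereas "=" returns NULL as soon as one side is NULL. The Python model below only illustrates that comparison logic, with None playing the role of NULL; it says nothing about how the optimizer picks an index, and the function names are hypothetical.

```python
from typing import Optional


def sql_eq(a, b) -> Optional[bool]:
    """Model of a = b: the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_null_safe_eq(a, b) -> bool:
    """Model of a <=> b: never NULL; two NULLs compare as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b
```

So column_id <=> NULL keeps exactly the rows that column_id IS NULL keeps, while column_id = NULL never matches any row.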
Performance testing 13/13
As soon as the query was fixed and tested, we put the 5.6 server in production.
● The 5.6 server is used like the other 5.5 slaves
● Monitored closely for weeks
● Slow query log analysis shows good numbers
  ○ Fewer slow queries
  ○ Smaller total slow query time
● Lower CPU usage
So far so good…
Upgrade Confidence Index : 90%
Preprod upgrade 1/9
● Workload different from production : smaller
● Data size different from production : tinier
● Hardware also different
=> Not relevant for performance tests
But it is very important to :
● Test the upgrade process
  ○ Can't do it manually
  ○ Should be transparent for our customers
● Know how our internal tools / other apps will behave with 5.6
  ○ Databases are used in so many different ways
  ○ Can't test them all, so if something breaks someone will shout!
● Raise awareness of this migration among other MySQL consumers
  ○ We need their feedback
This step is also very important because downgrading an entire cluster (back to 5.5) is a painful operation
Preprod upgrade 2/9
Preprod technical context
Flavour : Percona Server 5.5 on VMs
Data size : ~ GBs
Number of MySQL replica groups : 4
Number of MySQL instances : 12
Mostly OLTP-oriented workload - InnoDB tables
Hundreds of qps, mostly reads (SELECTs)
Sensitive to replication lag - preferably no downtime
Preprod upgrade 3/9
Overall process - Upgrade the 1st slave
● Take one slave per cluster out of rotation (OOR)
● Upgrade the slave ⇔ [more details later]
● Put it back into rotation (as a replica)
● Checks / Tests / Monitor
● Back up the slave (binary backup w/ Xtrabackup)
  ○ Base backup for the other slaves
Similar to what we'll use in production (obviously!)
Preprod upgrade 4/9
Overall process - Upgrade the 2nd (and other) slave(s)
● Take the 5.5 slave out of rotation
● Drop the data
● Upgrade the binaries
● Restore the 5.6 binary backup on this slave
● Put it back into rotation
● Checks / Tests / Monitor
● So far, a downgrade is still quite easy:
  ○ Binary backup from the master, restored to the slave after a binaries downgrade
Preprod upgrade 5/9
Overall process - Upgrade the master
Last step, easy but very sensitive
● Master switchover / failover
  ○ Promote a 5.6 slave to become the new master
  ○ Usually less than 1 second in read-only mode
● Then upgrade the old master & restore it from the 5.6 backup
● We have our own internal tool for the master switchover / failover
  ○ but 5.6 broke it…
  ○ Whole cluster in a read-only state without a master, i.e. no writes allowed
  ○ Fortunately that happened in preprod :)
Preprod upgrade 6/9
Issue : Internal tools broken - Master switchover / failover
The tool used the deprecated statements SLAVE START and SLAVE STOP instead of START SLAVE and STOP SLAVE, and they were removed in 5.6.
From the 5.5 manual: "In old versions of MySQL (before 4.0.5), this statement was called SLAVE START. This usage is still accepted in MySQL 5.5 for backward compatibility, but is deprecated and is removed in MySQL 5.6" : https://dev.mysql.com/doc/refman/5.5/en/start-slave.html
From the 5.6 manual, among the removed features: "The SLAVE START and SLAVE STOP statements. Use The START SLAVE and STOP SLAVE statements" : http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html
Fix : Use the right statements
=> Avoid using deprecated commands / functions /...
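This kind of breakage can be caught before the upgrade by scanning tool sources for the removed statements. The Python sketch below is a rough, grep-like check and nothing more: find_removed_statements is a hypothetical helper, it only knows the two statements named above, and it can miss SQL that is assembled dynamically.

```python
import re
from typing import List, Tuple

# SLAVE START / SLAVE STOP in any case, with any whitespace in between.
# START SLAVE and STOP SLAVE (the valid forms) do not match.
REMOVED_IN_56 = re.compile(r"\bSLAVE\s+(START|STOP)\b", re.IGNORECASE)


def find_removed_statements(source: str) -> List[Tuple[int, str]]:
    """Return (line number, matched text) for each suspicious line."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = REMOVED_IN_56.search(line)
        if match:
            hits.append((number, match.group(0)))
    return hits
```

Running it over every script that talks to MySQL gives a first list of places to review; the same pattern list can be extended with other features removed in 5.6.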
Preprod upgrade 7/9
Issue : Internal tools broken - Internal usage
Because of the new configuration, new information is logged in the binlog:
From the 5.6 manual: "You can also cause the server to write checksums for the events using CRC32 checksums by setting the binlog_checksum system variable" : http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html
http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_checksum
These tools parse the binlog…
Fix : Development by the relevant team
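To illustrate why a binlog parser written for 5.5 can choke on checksummed events, here is a Python sketch that strips and verifies a checksum trailer. It rests on assumptions that must be verified against the MySQL internals documentation: that the checksum is a 4-byte little-endian CRC32 appended to each event and computed over the preceding event bytes. The helper names are hypothetical and the sketch is exercised on synthetic bytes only, not on a real binlog.

```python
import struct
import zlib

CHECKSUM_LEN = 4  # assumed size of the CRC32 trailer


def split_checksum(event: bytes):
    """Return (event bytes without trailer, stored checksum)."""
    body, trailer = event[:-CHECKSUM_LEN], event[-CHECKSUM_LEN:]
    (stored,) = struct.unpack("<I", trailer)  # assumed little-endian
    return body, stored


def checksum_matches(event: bytes) -> bool:
    """Recompute the CRC32 of the body and compare it with the trailer."""
    body, stored = split_checksum(event)
    return zlib.crc32(body) & 0xFFFFFFFF == stored
```

A parser that ignores the trailer reads 4 extra bytes as event data, which is one plausible way for such tools to break once binlog_checksum is enabled.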
Preprod upgrade 8/9
Upgrade workflow 1/2
1. Extract schema and data + Pre-upgrade checks
2. Drop MySQL directories (datadir, logdir)
[ OPS tasks : binaries upgraded to 5.6 + disk encryption ]
3. Load schema + Post-upgrade checks
4. Load data + Post-upgrade checks 2 & compare differences between the "before" and "after" checks
Checks: object count, charset,...
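The "before" / "after" comparison can be as simple as a dictionary diff. In the Python sketch below, diff_checks and the check names (tables, views, charset) are hypothetical; the values are assumed to have been collected beforehand, for instance with counts taken from information_schema.

```python
from typing import Dict, Tuple


def diff_checks(before: Dict[str, object],
                after: Dict[str, object]) -> Dict[str, Tuple[object, object]]:
    """Return {check name: (before value, after value)} for every mismatch."""
    diffs = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            diffs[key] = (old, new)
    return diffs
```

An empty result means the reloaded server matches the pre-upgrade snapshot for every recorded check; anything else is a reason to stop before putting the server back into rotation.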
Preprod upgrade 9/9
Upgrade workflow 2/2
● The upgrade process was split into a dozen scripts
● These scripts were called by 4 main wrapper scripts for convenience
● 2 levels of granularity provide more flexibility
  ○ In case of an issue, DBAs can resume the process "manually" at any step
  ○ An extra step can easily be added (e.g. a schema modification)
● Automation is important
  ○ Tasks are pretty straightforward but time consuming
  ○ Lowers the risk of error
  ○ Hundreds of servers
● DBAs need to be aware of the status
● The script sends emails to DBAs when
  ○ A task is completed
  ○ An error occurs
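The wrapper idea (small steps, resumable at any point, a notification on completion or error) can be sketched in Python as follows. This is not the actual set of scripts from the talk: run_steps, the step names and the notify callback, which stands in for sending an email, are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]  # (step name, action to run)


def run_steps(steps: List[Step], notify: Callable[[str], None],
              resume_from: Optional[str] = None) -> bool:
    """Run steps in order, optionally resuming at a named step."""
    names = [name for name, _ in steps]
    start = names.index(resume_from) if resume_from else 0
    for name, action in steps[start:]:
        try:
            action()
        except Exception as exc:
            # Stop at the first failure and tell the DBAs where it happened.
            notify(f"FAILED at step '{name}': {exc}")
            return False
    notify("Upgrade task completed")
    return True
```

After a failure, a DBA fixes the problem and calls run_steps again with resume_from set to the failed step, which mirrors resuming the process "manually" at any step.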
Upgrade Confidence Index : 95%
Prod upgrade
Final step(s), final tests
● Preprod is similar but not identical to prod.
● To be more comfortable we
  ○ Added extra slaves on our smaller clusters
  ○ Ran the full process on them
● Not possible to test the master switchover / failover
● But we were confident enough to start, so we started
  ○ In progress...
Upgrade Confidence Index : 99%
Wrap-up
● Identify what's relevant for you in the new release
  ○ Understand the changes : added / removed features
  ○ Don't be an early adopter (if you don't have a proper support team) : let others clear the way
● Run your own tests
  ○ Performance : related to your workload / data set
  ○ Functional : do your apps depend on a removed / changed feature?
● Split the work into batches
  ○ Easier to manage / debug /...
● Automation
  ○ Manual operations are error prone
  ○ Write it once, use it at will
● Communication
  ○ Explain / describe what you are going to do
  ○ Involve consumers and ask for their feedback
Questions?
Thank you!
Olivier DASINI
Twitter : @freshdaz
Mail : [email protected]
Skype : olivier.dasini