Group Replication: A Journey to the Group Communication Core
Alfranio Correia ([email protected]), Principal Software Engineer
4th of February, Oracle / FOSDEM 2017
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
1. Background
2. Group Communication Interface
3. Group Communication Engine
4. Performance
5. Conclusion
Background
MySQL InnoDB Cluster
[Figure: MySQL InnoDB Cluster architecture. Components shown: MySQL Connector / Application, MySQL Router, MySQL Shell, and three HA replica sets (ReplicaSet 1, ReplicaSet 2, ReplicaSet 3), each with servers S1, S2, S3, S4, ... and members (M).]
MySQL Group Replication
• What is MySQL Group Replication?
  – “Multi-master update everywhere replication plugin for MySQL with built-in automatic distributed recovery, conflict detection and group membership.”
• What does the MySQL Group Replication plugin do for the user?
  – Automates server failover in Single Primary
  – Provides fault tolerance
  – Enables update everywhere setups
  – Automates group reconfiguration (handling of crashes, failures, re-connects)
  – Provides a highly available replicated database
Major Building Blocks
[Figure: layered stack for each member (M). Layers shown: MySQL Server, API, Replication Plugin, Group Com. API, and Group Com. Engine, drawn together with Group Comm. System (Corosync).]
The Complete Stack
[Figure: the complete stack. MySQL Server: MySQL, InnoDB, Replication Protocol, Performance Schema Tables (Monitoring). APIs: Lifecycle / Capture / Applier. Replication Plugin: Capture, Applier, Conflicts Handler, Recovery. Below the plugin: Group Com. API, Group Com. Binding, Group Com. Engine (drawn together with Group Comm. System (Corosync)), Network.]
Group Communication Interface
Design
• Abstract interface to support different solutions
  – Reconfigure the group and get membership information
  – Send and receive messages
• Uses the observer pattern
  – MySQL Group Replication listens to events
• Different implementations per communication system
• Made the transition from Corosync easy
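The design bullets above can be illustrated with a minimal sketch. All class and method names here (`GroupCommunication`, `GroupEventListener`, `InMemoryGroup`) are hypothetical and chosen for illustration only; they are not the plugin's actual API. The toy in-memory binding exists just to exercise the interface.

```python
from abc import ABC, abstractmethod
from typing import List


class GroupEventListener(ABC):
    """Observer side: receives events from the communication layer (hypothetical)."""

    @abstractmethod
    def on_message(self, sender: str, payload: bytes) -> None: ...

    @abstractmethod
    def on_view_changed(self, members: List[str]) -> None: ...


class GroupCommunication(ABC):
    """Abstract interface; one implementation per communication system (hypothetical)."""

    def __init__(self) -> None:
        self._listeners: List[GroupEventListener] = []

    def add_listener(self, listener: GroupEventListener) -> None:
        self._listeners.append(listener)

    @abstractmethod
    def join(self, member: str) -> None: ...

    @abstractmethod
    def leave(self, member: str) -> None: ...

    @abstractmethod
    def send(self, sender: str, payload: bytes) -> None: ...

    def _notify_message(self, sender: str, payload: bytes) -> None:
        for listener in self._listeners:
            listener.on_message(sender, payload)

    def _notify_view(self, members: List[str]) -> None:
        for listener in self._listeners:
            listener.on_view_changed(list(members))


class InMemoryGroup(GroupCommunication):
    """Toy single-process implementation, used only to exercise the interface."""

    def __init__(self) -> None:
        super().__init__()
        self.members: List[str] = []

    def join(self, member: str) -> None:
        self.members.append(member)
        self._notify_view(self.members)

    def leave(self, member: str) -> None:
        self.members.remove(member)
        self._notify_view(self.members)

    def send(self, sender: str, payload: bytes) -> None:
        self._notify_message(sender, payload)


class RecordingListener(GroupEventListener):
    """Stands in for the replication plugin: it only records what it observes."""

    def __init__(self) -> None:
        self.events: list = []

    def on_message(self, sender: str, payload: bytes) -> None:
        self.events.append(("message", sender, payload))

    def on_view_changed(self, members: List[str]) -> None:
        self.events.append(("view", tuple(members)))
```

Because the listener depends only on the abstract interface, swapping one communication system for another would not require changes on the listening side, which is the property the last bullet points at.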
Semantics
• Closed Group
  – Only group members can send and receive messages
• Total Order
  – Messages are totally ordered with respect to each other
• Safe Delivery
  – A member cannot deliver a message unless a majority can deliver it too
• View Synchrony
  – Changes to membership are totally ordered with messages
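Two of these guarantees reduce to simple checks. The sketch below is an illustration under assumed names (`can_send`, `safe_to_deliver`), not code from the plugin: it models the closed-group rule as a membership test and safe delivery as a majority count over acknowledgements.

```python
from typing import Set


def majority(group_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return group_size // 2 + 1


def can_send(sender: str, members: Set[str]) -> bool:
    """Closed group: only current members may send."""
    return sender in members


def safe_to_deliver(acks: Set[str], members: Set[str]) -> bool:
    """Safe delivery: a majority of current members must hold the message."""
    return len(acks & members) >= majority(len(members))
```

Acknowledgements from non-members are ignored by intersecting with the current membership, so a stale node cannot help a message reach its majority.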
Group Communication Engine
Built-in Communication Engine
• Based on proven distributed systems algorithms (Paxos)
  – Compression, multi-platform, dynamic membership, SSL, IP whitelisting
• No third-party software required
• No network multicast support required
  – MySQL Group Replication can operate on cloud-based installations where multicast is unsupported
Paxos Family and Friends
• Multi-Paxos
• Fast Paxos
• Disk Paxos
• Cheap Paxos
• Vertical Paxos
• Generalized Paxos
• Raft
• Mencius
• Flexible Paxos
• Egalitarian Paxos
• Byzantine Paxos
Basic Paxos
• Get agreement on a value:
  – Next message/transaction to be delivered
• Members may have different roles:
  – Usually all members are proposers, acceptors and learners
• Need a quorum to make progress
  – Usually a majority
[Figure: three members (M0, M1, M2) going through (1) the Prepare/Election Phase, (2) the Accept Phase and (3) the Learn Phase.]
Prepare Phase
• The proposer sends a prepare request with number “n” to the members (i.e. acceptors)
• If an acceptor has not received a request with a number greater than “n”, it will respond
• It will promise not to accept a request numbered less than “n”
• If any reply carries a non-empty value, the leader will use the value with the highest number
[Figure: 1.1 Prepare: the proposer sends (n) to the other members; 1.2 Promise: they reply with (x, value) and (y, value).]
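The prepare-phase rules can be sketched as follows. This is a textbook-style illustration, not XCOM's code; the `Acceptor` class, its field names and `choose_value` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

Promise = Tuple[int, Optional[Any]]  # (number of accepted proposal, accepted value)


@dataclass
class Acceptor:
    promised: int = -1                    # highest prepare number seen so far
    accepted_n: int = -1                  # number of the last accepted proposal
    accepted_value: Optional[Any] = None  # value of the last accepted proposal

    def on_prepare(self, n: int) -> Optional[Promise]:
        """Answer a prepare request; None means the request is ignored."""
        if self.promised > n:
            return None                   # already saw a higher number
        self.promised = n                 # promise: nothing below n is accepted
        return (self.accepted_n, self.accepted_value)


def choose_value(promises: List[Promise], own_value: Any) -> Any:
    """Use the non-empty value with the highest number, else the proposer's own."""
    non_empty = [p for p in promises if p[1] is not None]
    if not non_empty:
        return own_value
    return max(non_empty, key=lambda p: p[0])[1]
```

For example, `choose_value([(-1, None), (2, "x"), (4, "y")], "mine")` picks `"y"`, because a value accepted under number 4 may already have been chosen and must not be overwritten.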
Accept Phase
• If the leader finds out that a non-empty value has been previously proposed, it will use it
• Otherwise, it will propose a new value
• Requires a network round-trip to get agreement
[Figure: 2.1 Accept: the leader sends (n, value) to the other members; 2.2 Accepted: they reply with (ack).]
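A minimal sketch of the accept round trip, again with assumed names (`Acceptor`, `run_accept_phase`) and in-process calls standing in for network messages: the value is agreed once a majority of acceptors acknowledge it.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Acceptor:
    promised: int = -1                    # highest number promised or accepted
    accepted_n: int = -1
    accepted_value: Optional[Any] = None

    def on_accept(self, n: int, value: Any) -> bool:
        """Accept (n, value) unless a higher number was promised; True means ack."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_value = value
        return True


def run_accept_phase(acceptors: List[Acceptor], n: int, value: Any) -> bool:
    """Send accept(n, value) to every acceptor; succeed on a majority of acks."""
    acks = sum(1 for acceptor in acceptors if acceptor.on_accept(n, value))
    return acks >= len(acceptors) // 2 + 1
```

An acceptor that promised a higher number to another proposer simply refuses, so two competing proposers cannot both gather a majority for the same number.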
Learn Phase
• The leader informs the other members about the decision
• Only one learner is required to make progress
• If the member already has the value, an ack is enough
[Figure: 3 Learn: (value) is sent to the other members.]
Multi-Paxos
[Figure: replicated log with slot 0, slot 1, slot 2, slot 3, ...; members 0, 1 and 2 run Accept/Learn rounds for the slots, interleaved with Election rounds.]
• Consensus round to decide on each slot’s content
• Replicated Log Stream
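One way to picture the replicated log stream: each slot is decided by its own consensus round, possibly out of order, but values are handed to the application strictly in slot order. The `ReplicatedLog` class below is a hypothetical sketch of that delivery rule, not the engine's data structure.

```python
from typing import Any, Dict, List


class ReplicatedLog:
    """Buffers decided slots and delivers them in slot order without gaps."""

    def __init__(self) -> None:
        self._decided: Dict[int, Any] = {}
        self._next_slot = 0  # lowest slot not yet delivered

    def learn(self, slot: int, value: Any) -> List[Any]:
        """Record a decided slot; return the values that became deliverable."""
        if slot < self._next_slot:
            return []                      # already delivered, ignore duplicate
        self._decided.setdefault(slot, value)
        delivered = []
        while self._next_slot in self._decided:
            delivered.append(self._decided.pop(self._next_slot))
            self._next_slot += 1
        return delivered
```

A gap in the log blocks everything behind it, which is why an undecided slot matters so much for throughput.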
So what?
• A single leader can easily become a bottleneck
• Multiple leaders: eXtended COMmunications (XCOM)
How does XCOM work?
[Figure: replicated log with slot 0, slot 1, slot 2, slot 3, slot 4, slot 5, ...; for every slot, members 0, 1 and 2 run only an Accept/Learn round.]
• Every member is a leader, so there is no leader election
• Every member owns an in-memory replicated log
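If every member is a leader, slots have to be divided among members so that two leaders never compete for the same slot. The slides do not spell out the assignment rule; the sketch below assumes a simple round-robin mapping, which is an assumption made purely for illustration.

```python
from typing import List


def slot_owner(slot: int, members: List[str]) -> str:
    """Assumed rule: slots are handed out to members in round-robin order."""
    return members[slot % len(members)]


def owned_slots(member: str, members: List[str], count: int) -> List[int]:
    """First `count` slots that `member` leads under the round-robin assumption."""
    index = members.index(member)
    return [index + i * len(members) for i in range(count)]
```

With members `["m0", "m1", "m2"]`, slots 0 to 5 map to m0, m1, m2, m0, m1, m2, so each owner can go straight to the accept phase for its own slots.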
Nothing to Propose
[Figure: replicated log with slot 0, nop, slot 2, slot 3, nop, slot 5, ...; regular slots are decided with Accept/Learn, nop slots with a single Learn.]
• Only a learn message with a “nop” is enough
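A small sketch of how idle owners keep the log moving: when a slot's owner has nothing queued, the slot is filled with a “nop” instead of being left undecided. Round-robin slot ownership and the `fill_idle_slots` helper are assumptions for this example.

```python
from typing import Any, Dict, List

NOP = "nop"


def fill_idle_slots(pending: Dict[str, List[Any]], members: List[str],
                    last_slot: int) -> List[Any]:
    """Return the log contents for slots 0..last_slot.

    `pending` maps each member to its queue of values waiting to be proposed.
    Ownership is assumed to be round-robin over `members`.
    """
    queues = {member: list(values) for member, values in pending.items()}
    log = []
    for slot in range(last_slot + 1):
        owner = members[slot % len(members)]
        queue = queues.get(owner, [])
        log.append(queue.pop(0) if queue else NOP)  # idle owner: fill with nop
    return log
```

With only m0 and m2 writing, m1's slots become nops, matching the pattern in the figure where nops are interleaved with regular slots.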
How is the optimization possible?
• Member “1” sends a learn message “(0,nop)” to member “4” and dies
• Non-leaders can only propose “nop”(s) on behalf of others
• They must go through all Paxos phases
[Figure: five members (0 to 4). First a Learn message (0,nop) is sent; then a full round follows: Prepare (1), Promise (0,-), Accept (1,nop), Accepted (ack), Learn (nop).]
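The rule on this slide can be restated as a function from (proposer, slot owner, value) to the message types exchanged. The names and the exact message lists below are an illustrative reading of the slides, not a trace of XCOM.

```python
from typing import Any, List

NOP = "nop"
FULL_ROUND = ["Prepare", "Promise", "Accept", "Accepted", "Learn"]


def messages_for_proposal(proposer: int, slot_owner: int, value: Any) -> List[str]:
    """Message types needed to decide one slot, as read from the slides."""
    if proposer == slot_owner:
        if value == NOP:
            return ["Learn"]                     # owner skipping its own slot
        return ["Accept", "Accepted", "Learn"]   # owner needs no prepare phase
    if value != NOP:
        raise ValueError("a non-owner may only propose nop on behalf of the owner")
    return list(FULL_ROUND)                      # must go through all phases
```

Since anyone else can only ever propose a nop for that slot, and only through a full round, a nop learned directly from the owner cannot conflict with a different decided value.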
Handling Failures/Suspicions
[Figure: replicated log with slot 0, slot 1, slot 2, nop, slot 4, slot 5, ...; regular slots are decided with Accept/Learn, while the nop slot is decided with a Prep./Accept/Learn round.]
Implemented Optimizations in XCOM
• Pipeline
  – Proposes several “transactions” in parallel
  – Improves performance in high-latency networks
  – Current value is “10”
• Batch
  – Improves CPU usage
  – Improves performance in high-latency/low-bandwidth networks
  – Current value is “5”
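A rough sketch of how the two settings interact, assuming that “5” is the number of messages packed into one proposal and “10” the number of proposals in flight at once (the slide gives the values but not their units). Real pipelining is a sliding window; grouping proposals into waves is a simplification used here for clarity.

```python
from typing import Any, List, Sequence

BATCH_SIZE = 5       # slide value; assumed to mean messages per proposal
PIPELINE_DEPTH = 10  # slide value; assumed to mean proposals in flight


def make_batches(messages: Sequence[Any], batch_size: int = BATCH_SIZE) -> List[List[Any]]:
    """Pack queued messages into proposals of at most `batch_size` messages."""
    return [list(messages[i:i + batch_size])
            for i in range(0, len(messages), batch_size)]


def schedule(messages: Sequence[Any], batch_size: int = BATCH_SIZE,
             depth: int = PIPELINE_DEPTH) -> List[List[List[Any]]]:
    """Group proposals into waves of at most `depth` concurrent proposals."""
    batches = make_batches(messages, batch_size)
    return [batches[i:i + depth] for i in range(0, len(batches), depth)]
```

Under these assumptions 120 queued messages become 24 proposals sent in three waves of 10, 10 and 4, so the sender waits for roughly three round trips instead of 120.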
Implemented Optimizations in Binding
• Compression
  – Reduces bandwidth consumption
• Automatically reconfigure a group
  – Faulty members are expelled
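Payload compression in a binding layer can be sketched as below. This uses Python's standard `zlib` as a stand-in codec, and the one-byte flag, the `threshold` parameter and the function names are all hypothetical; nothing here describes the plugin's actual wire format.

```python
import zlib

RAW = b"\x00"         # flag: body is the payload as-is
COMPRESSED = b"\x01"  # flag: body is zlib-compressed


def encode_payload(payload: bytes, threshold: int = 1024) -> bytes:
    """Compress payloads of at least `threshold` bytes when that saves space."""
    if len(payload) >= threshold:
        compressed = zlib.compress(payload)
        if len(compressed) < len(payload):
            return COMPRESSED + compressed
    return RAW + payload


def decode_payload(frame: bytes) -> bytes:
    """Inverse of encode_payload: inspect the flag byte and restore the payload."""
    flag, body = frame[:1], frame[1:]
    return zlib.decompress(body) if flag == COMPRESSED else body
```

Small or incompressible payloads are sent raw, so compression costs CPU only where it can actually reduce bandwidth.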
Performance
Configuration
• Multiple writers: one per server
• Single writer: just one client
• Oracle Server X5-2L with two Intel Xeon E5-2660 v3 processors
  – 20 cores
  – 40 hardware threads
• Oracle Enterprise Linux 7, kernel 3.8.13-118.13.3
• 10 Gbps Ethernet
• Used “tc” to throttle the network
Multiple writers (256 Bytes)
[Figure: bar chart of messages per second sent (y-axis, 0 to 160,000) for 3, 5 and 7 members, with uncompressed and compressed 256-byte payloads. Series: 10 Gbps network with 0.1 ms latency; 200 Mbps network with 7 ms latency.]
• Compression improves performance in metropolitan-area networks
• Headers (~200 bytes) are not compressed, though
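The effect of the uncompressed header is easy to quantify. In the sketch below the ~200-byte header comes from the slide, while the compression ratio of 0.5 used in the example is purely hypothetical and not a measured figure.

```python
HEADER_BYTES = 200  # approximate header size from the slide; never compressed


def wire_size(payload_bytes: int, payload_ratio: float = 1.0,
              header_bytes: int = HEADER_BYTES) -> int:
    """Bytes on the wire; payload_ratio = compressed size / original size."""
    return header_bytes + round(payload_bytes * payload_ratio)


def overall_saving(payload_bytes: int, payload_ratio: float) -> float:
    """Fraction of the whole message saved by compressing only the payload."""
    before = wire_size(payload_bytes)
    after = wire_size(payload_bytes, payload_ratio)
    return 1 - after / before
```

With a hypothetical ratio of 0.5, a 256-byte payload shrinks the message from 456 to 328 bytes (about 28% saved), while a 1024-byte payload goes from 1224 to 712 bytes (about 42% saved): the smaller the payload, the more the fixed header dilutes the gain.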
Multiple writers (1 KBytes)
[Figure: bar chart of messages per second sent (y-axis, 0 to 120,000) for 3, 5 and 7 members, with uncompressed and compressed 1K payloads. Series: 10 Gbps network with 0.1 ms latency; 200 Mbps network with 7 ms latency.]
• Check whether compression may help or not
• Usually helps when bandwidth is a problem
Single Writer (1 KBytes)
[Figure: bar chart of messages per second sent (y-axis, 0 to 120,000) for 3, 5 and 7 members, with uncompressed and compressed 1K payloads. Series: 10 Gbps network with 0.1 ms latency; 200 Mbps network with 7 ms latency.]
• The scale-out effect with multiple writers is small
• Compression does not help here
Conclusion
Current Status
• Made it into the MySQL 5.7.17 release
• GA in December 2016
Future
• Configurable Paxos role(s)
  – Leader/Acceptor/Learner or Acceptor/Learner or Learner
• Multiple leaders only if needed:
  – Avoids the skip message
  – Improves CPU and network usage
• Not all members need to make messages network durable
  – Reduces resilience but improves performance
Future
• Expose some configuration options:
  – Batch
  – Pipeline
• Compression at low-level layers as well
• Write to network in parallel
• Overlay networks
Where to go from here?
• Packages
  – http://www.mysql.com/downloads/
• Documentation
  – http://dev.mysql.com/doc/refman/5.7/en/group-replication.html
• Blogs from the Engineers (news, technical information, and much more)
  – http://mysqlhighavailability.com