Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
Raghavendra Prabhu, Percona
randomsurfer, wnohang.net, rdprabhu, ronin13
Introduction
Seed quotes..
“’Network is reliable’ - a fallacy of the distributed system.”
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport
“Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor
“Never attribute to Byzantine failure which can be explained by an ill node(s)” - Me
Raghavendra Prabhu (Percona) Corpus collapsum 20 February, 2015 5 / 68
Actors
▶ Database - WSREP/PXC
▶ Plugin - Galera
▶ Traffic control
  ♦ Traffic Control - tc
  ♦ NetEm
Actors
▶ Containers - Docker
▶ Load
  ♦ Generators - Sysbench, RQG
▶ Network
  ♦ Dnsmasq
  ♦ nsenter
Actors
▶ Jenkins
  ♦ Build flow and CI
▶ Storage
  ♦ Why
Details
Rationale
▶ The ‘P’ in CAP
▶ WAN scalability
▶ Real Reason - fun!
▶ Tolerance to latency variance
Rationale
▶ Failures in warehouses.
▶ Not quorum, but consensus.
▶ Real world networks and synchronous replication
  - Delay
  - Partition
  - Non-graceful exits
Galera
▶ Data-centric approach
▶ Extended Virtual Synchrony
▶ Causality and Synchronous
▶ Flow control and temporal Synchrony
Galera
▶ Latency
  - Global ordering
  - Certification and not apply
  - Communication overhead
▶ Layers
  - Replication
  - Certification
  - Group communication
▶ Isolation
  - REPEATABLE-READ
  - SNAPSHOT-ISOLATION
Where did it start
▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192
▶ Loss of PC
▶ Crash
▶ HAT
Tests
▶ Chaos testing
▶ Flow control with sysbench
▶ Network Loss
▶ Future
NetEm
▶ Initial setup
  - Bridge
  - Egress only
  - IFB
  - Present state
▶ NetEm
  - tc qdisc buckets
  - packet loss, delay, corruption, duplication, reordering
  - nsenter
▶ Future
  - Docker exec
  - Rocket ACI
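As a rough illustration of the nsenter plus NetEm combination listed above, the sketch below only builds the command lines; it does not run them. The function names, the `eth0` device name and the example values are assumptions made for this sketch and are not taken from the talk's code.

```python
def netem_command(pid, dev="eth0", loss_pct=None, delay_ms=None):
    """Build an nsenter + tc command that attaches a netem qdisc inside
    the network namespace of the process with the given PID.

    The device name and impairment values are illustrative assumptions.
    """
    if loss_pct is None and delay_ms is None:
        raise ValueError("specify at least one impairment")
    cmd = ["nsenter", "--target", str(pid), "--net",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def detach_command(pid, dev="eth0"):
    """Build the command that removes the root qdisc again."""
    return ["nsenter", "--target", str(pid), "--net",
            "tc", "qdisc", "del", "dev", dev, "root"]
```

The resulting list can be handed to `subprocess.run` on a host where `nsenter` and `tc` are available and the caller has sufficient privileges.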
Tests: Chaos testing
▶ Nodes killed at random around sysbench
▶ Less than half of nodes are chosen
▶ docker inspect && SIGKILL
▶ Configurable sleep && retry
  ♦ Snapshot/Incremental State Transfer
    - Composability of transactional databases
▶ docker restart && repeat
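The victim selection described above (fewer than half of the nodes, chosen at random) might look roughly like the following. This is a minimal sketch: the helper names are hypothetical, and the actual kill step is left as a comment.

```python
import random


def pick_victims(nodes, rng=random):
    """Pick a random, non-empty strict minority of nodes to kill.

    Returns an empty list when the cluster is too small to lose a node
    while keeping a majority alive.
    """
    nodes = list(nodes)
    max_victims = (len(nodes) - 1) // 2   # strictly less than half
    if max_victims < 1:
        return []
    count = rng.randint(1, max_victims)
    return rng.sample(nodes, count)


def pid_lookup_command(container):
    """Command that prints the host PID of a container's main process."""
    return ["docker", "inspect", "--format", "{{.State.Pid}}", container]

# The PID printed by the command above could then be killed with
# os.kill(pid, signal.SIGKILL), followed by a sleep and `docker restart`.
```

Passing a seeded `random.Random` instance as `rng` makes a chaos run repeatable.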
Tests: Network Loss
▶ Loss nodes
▶ Detach/Keep qdisc
▶ Reconciliation
▶ Sanity checks
▶ Formation of PC || time to recover
Basic Flow
Jenkins, Build images, Start Dnsmasq, Bootstrap
Load/Sysbench, SST/Others, Pre-sanity, nsenter/netem
Basic Flow
RR sysbench
Detach/Keep
Sanity check, Reconciliation
Post sanity, Core trace
Cleanup, Collect logs
Parameters
▶ Sysbench
▶ Segment
▶ Reconciliation period
▶ Loss nodes
Parameters
▶ NetEm
▶ Qdisc detach
▶ fsync
▶ Shutdown
Docker
▶ Why not virtualize
  ♦ Occam
  ♦ Namespaces
▶ Simplicity
  ♦ Network
▶ Logical scalability
  ♦ One application per node
Docker
▶ Portability
  - Qualitative behavior.
▶ Reproducibility
  - Makes it deterministic
▶ Configurable and CI
  - Byproducts
Docker
▶ QEMU vis-à-vis Docker
▶ Scalability
  ♦ Performance
  ♦ Feature
▶ Abstraction of channels
Container Networking
▶ Linking didn’t help
▶ Dnsmasq to rescue!
  ♦ Hosts file and volumes
  ♦ SIGHUP and refresh
▶ Potential issues
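The hosts-file-plus-SIGHUP approach above can be sketched as follows. dnsmasq re-reads its hosts files when it receives SIGHUP; the function names, the file path argument and the way the dnsmasq PID is obtained are assumptions for illustration only.

```python
import os
import signal


def render_hosts(addresses):
    """Render a {hostname: ip} mapping in hosts-file format."""
    lines = [f"{ip}\t{name}" for name, ip in sorted(addresses.items())]
    return "\n".join(lines) + "\n"


def publish(addresses, hosts_path, dnsmasq_pid):
    """Rewrite the shared hosts file and ask dnsmasq to reload it.

    hosts_path is assumed to be a file dnsmasq was told to read (for
    example via --addn-hosts); dnsmasq_pid is assumed to be known.
    """
    with open(hosts_path, "w") as fh:
        fh.write(render_hosts(addresses))
    os.kill(dnsmasq_pid, signal.SIGHUP)   # dnsmasq reloads hosts on SIGHUP
```

Calling `publish` each time a node container starts keeps name resolution current without restarting any container.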
Overview
▶ Transient noise
▶ Lasting ’sickness’
▶ Sick nodes
▶ Dead members
Method I
▶ Qdisc is detached after load
▶ Objective
  - Time for the full cluster to recover
▶ Done with a larger subset
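Measuring the time to recover can be reduced to polling a health probe until it succeeds. The sketch below is one possible shape for that loop; the probe is a hypothetical callable (it could, for instance, compare the `wsrep_cluster_size` status variable on each node with the expected node count), and the timeout and interval defaults are arbitrary.

```python
import time


def time_to_recover(is_recovered, timeout=300.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_recovered() until it returns True.

    Returns the elapsed seconds, or None if the timeout expires first.
    clock and sleep are injectable so the loop can be tested without
    waiting in real time.
    """
    start = clock()
    while True:
        if is_recovered():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Starting the timer immediately after the qdisc is detached gives the recovery time for that run.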
Method II
▶ Qdisc is kept till the end
▶ Objective
  - Formation of primary component
▶ Comparatively smaller set
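Whether a primary component can form while the lossy nodes stay impaired comes down to a majority check. The sketch below is a simplification that assumes every node has equal weight; Galera's real quorum calculation also takes node weights and graceful departures into account.

```python
def primary_component(partitions, previous_size):
    """Return the partition holding a strict majority of the previous
    membership, or None if no partition has quorum.

    Simplified: all nodes are assumed to have equal weight.
    """
    for part in partitions:
        if 2 * len(part) > previous_size:
            return part
    return None
```

With five nodes split three and two, the group of three stays primary; an even two and two split of four nodes leaves no primary component.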
Observations
▶ Post sanity types
  - Why
▶ Which method is more pertinent
▶ State transfer issues
  - Beginning
  - During re-emergence
Observations
▶ Direct load to affected nodes
▶ Partition external to system
▶ Logs
  - journalctl
  - Streaming?
Other noises
▶ Aim
▶ Fsync
  - libeatmydata
  - Variance
▶ Correlation with network
▶ How with Docker
  - LD_PRELOAD
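One way to get libeatmydata into a container is to set `LD_PRELOAD` in its environment at start. The sketch below only builds the `docker run` command; the image name and library path are placeholders, and the library is assumed to already exist at that path inside the image.

```python
def preload_run_command(image, library, name=None):
    """Build a `docker run` command that starts a detached container
    with LD_PRELOAD pointing at the given shared library.

    image and library are hypothetical values supplied by the caller.
    """
    cmd = ["docker", "run", "-d", "-e", f"LD_PRELOAD={library}"]
    if name:
        cmd += ["--name", name]
    cmd.append(image)
    return cmd
```

Starting some nodes with the preload and some without is one way to introduce fsync variance across the cluster.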
Load generation
▶ Sysbench
  - Generation
  - Reconnect on partition
▶ Sockets chosen
  - Load on affected nodes
▶ Distribution of Load
  - RR with socat
  - Native sysbench support
  - HAProxy?
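Round-robin (RR) distribution of client connections over node sockets is simple to express. The sketch below shows the assignment only and stands in for what socat or a proxy would do; the function name and endpoint labels are made up for illustration.

```python
def assign_round_robin(n_clients, endpoints):
    """Assign n_clients connections to endpoints in round-robin order."""
    endpoints = list(endpoints)
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return [endpoints[i % len(endpoints)] for i in range(n_clients)]
```

Including or excluding the impaired nodes from `endpoints` controls whether load is sent directly to the affected nodes.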
Load generation
▶ Nature of data/load
  - DDL
▶ RQG in future
  - Fuzz testing
Eviction
▶ STONITH
▶ Permanent eviction
▶ ’N’ strikes & out!
  - Timers - evs parameters
  - wsrep_evs_delayed and wsrep_evs_evict_list
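The "N strikes and out" idea can be illustrated with a small counter. This is a toy model and not Galera's implementation, which is driven by evs timers and the delayed list; the class and method names are invented for this sketch.

```python
from collections import Counter


class StrikeTracker:
    """Toy model: a node is evicted permanently after max_strikes
    'delayed' reports. Not Galera's actual algorithm."""

    def __init__(self, max_strikes):
        self.max_strikes = max_strikes
        self.strikes = Counter()
        self.evicted = set()

    def report_delayed(self, node):
        """Record one strike; return True if the node is now evicted."""
        if node in self.evicted:
            return True
        self.strikes[node] += 1
        if self.strikes[node] >= self.max_strikes:
            self.evicted.add(node)
        return node in self.evicted
```

In a real cluster the equivalent state is what the `wsrep_evs_delayed` and `wsrep_evs_evict_list` status variables expose.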
Eviction
▶ Aim
▶ Quorum required
  - Why? - Not shoot each other
  - Non-PC nodes also.
Coredumps with Docker
▶ Breakdown of abstraction
▶ Lack of isolation
▶ What was done
  - Volumes
  - core_pattern & sysctl
  - suid and ulimit
WAN Segments
▶ How they work
▶ Simulates data center
▶ Random allocation - latency multiplier
▶ Joiner starvation
▶ Donor selection
Epilogue
The code
▶ Github:
  - https://github.com/percona/pxc-docker
  - https://github.com/percona/percona-xtradb-cluster/
  - https://github.com/percona/galera
▶ Jenkins:
  - http://jenkins.percona.com/job/PXC-5.6-netem/
  - http://jenkins.percona.com/job/PXC-5.6-bench/
  - http://jenkins.percona.com/job/PXC-5.6-chaos/
▶ Contributions/testing/bugs welcome!
Code: todo
▶ Docker automated builds
▶ Orchestration
▶ Docker
  ♦ Injection
  ♦ Signal proxying
Code: todo
▶ => Proof of concept to a framework =>
▶ Run it bare - CoreOS, Atomic
▶ Overlay with etcd/fleet/libswarm
Future work
▶ Fault injection
  ♦ Memory
    - Poisoned memory
  ♦ Disk
    - libeatmydata
    - Opposite
    - ENOSPC
Fault injection
▶ CPU
  - NUMA?
  - Hotplug
▶ More network
  - corruption, duplication, reordering, rate-limit
  - Better distribution
  - Other shaping
Future work
▶ Disturb cluster more!
  - Membership changes
    * Manual eviction
    * Pull the cord!
  - Corrupt nodes
▶ Introduce inconsistencies
  - Consistency voting
  - Silent corruptions
Eventual consistency
▶ CAP
▶ Latency factor
▶ Is Galera EC? No!
  - ACIDs only, No BASE
▶ Bounded Staleness
  - PBS
▶ ACID and CAP
▶ Instrumentation
▶ Lambda architecture
Further Reading
▶ Resources
▶ Byzantine fault tolerance
  - Reaching agreement in presence of faults
▶ The Network is Reliable
▶ NetEm
▶ Latency: The New Web Performance Bottleneck
▶ Galera Cluster Documentation
▶ Auto eviction code
▶ Don’t Settle for Eventual Consistency
▶ Extended Virtual Synchrony
▶ Galera Flow Control
Further Reading
▶ Worst-Case Distributed Systems Design
▶ HAT, not CAP: Introducing Highly Available Transactions
▶ Bridging the Gap: Opportunities in Coordination-Avoiding Databases
▶ Linearizability versus Serializability
We are Hiring Too!
▶ Looking for build engineer - Packaging and Jenkins/CI are your strengths and you are a Linux geek. Bonus points if you are a Linux distro user/contributor/maintainer.
▶ Senior C/C++ developer - if Linux userspace development and databases (and distributed systems) is your thing.
▶ Apply here: http://percona.theresumator.com/.
Conference for Database geeks!
My Talk: Securing databases with systemd for containers and services
About/Contact - HA compliant
▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona.
▶ Slides will be at slideshare.net/slidunder.
▶ About.me: raghavendra.prabhu
▶ Keybase.io: rdprabhu
▶ Presentation under CC BY-SA 4.0
Image Credits
▶ http://galeracluster.com/documentation-webpages/
▶ https://en.wikipedia.org/wiki/Network_theory
▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png
▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354
▶ https://flic.kr/p/9J6GNu
▶ http://schauerte.me/data.html
▶ https://secure.flickr.com/photos/brewbooks/7780990192
▶ https://www.flickr.com/photos/kwerfeldein/2649294869
▶ https://secure.flickr.com/photos/mindmob/51951632
▶ https://secure.flickr.com/photos/arenamontanus/2227769907
▶ https://www.flickr.com/photos/markop/477199204
▶ https://www.flickr.com/photos/gcwest/281385801
▶ https://www.flickr.com/photos/29233640@N07/13466208953
▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/
▶ http://ok-panic.net/art/jeff/dennis.jpg
▶ https://www.facebook.com/sciencedump/photos/a.296290153732762.90161.111815475513565/985102638184840/?type=1
▶ http://upload.wikimedia.org/wikipedia/commons/0/05/Sna_large.png
▶ http://background-kid.com/background-images-light-blue-color.html