92
Corpus collapsum Partition tolerance of Galera in a noisy high load environment Highload++ 2014 Raghavendra Prabhu [email protected] Percona [email protected] randomsurfer wnohang.net rdprabhu ronin13

Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Embed Size (px)

DESCRIPTION

This is the talk given at Highload++ 2014 in Moscow, Russia. The topic was partition tolerance testing of Galera in a noisy high load environment with NetEm and Docker.

Citation preview

Page 1: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Corpus collapsumPartition tolerance of Galera in a noisy high load

environmentHighload++ 2014

Raghavendra Prabhu [email protected]

Percona [email protected] randomsurfer wnohang.net rdprabhu ronin13

Page 2: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

The Title?

Page 3: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Our Cluster

Page 4: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Split brain

Page 5: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ” - PeterDeutsch

“ A distributed system is one in which the failure of a computer you didn’teven know existed can render your own computer unusable. ” - Leslie Lamport

“ Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor

“ Never attribute to Byzantine failure which can be explained by an illnode(s) ” - Me

5 / 92

Page 6: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ” - PeterDeutsch

“ A distributed system is one in which the failure of a computer you didn’teven know existed can render your own computer unusable. ” - Leslie Lamport

“ Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor

“ Never attribute to Byzantine failure which can be explained by an illnode(s) ” - Me

6 / 92

Page 7: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ” - PeterDeutsch

“ A distributed system is one in which the failure of a computer you didn’teven know existed can render your own computer unusable. ” - Leslie Lamport

“ Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor

“ Never attribute to Byzantine failure which can be explained by an illnode(s) ” - Me

7 / 92

Page 8: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Seed quotes..

“ ’Network is reliable’ - a fallacy of the distributed system. ” - PeterDeutsch

“ A distributed system is one in which the failure of a computer you didn’teven know existed can render your own computer unusable. ” - Leslie Lamport

“ Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor

“ Never attribute to Byzantine failure which can be explained by an illnode(s) ” - Me

8 / 92

Page 9: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

The fallacies

▶ The network is reliable▶ Latency is zero▶ Bandwidth is infinite▶ The network is secure

9 / 92

Page 10: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

The fallacies

▶ Topology doesn’t change▶ There is one administrator▶ Transport cost is zero▶ The network is homogeneous

10 / 92

Page 11: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

20000 feet view

Page 12: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Actors

▶ Database - WSREP/PXC▶ Plugin - Galera▶ Traffic control

♦ Traffic Control - tc♦ NetEm

12 / 92

Page 13: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Actors

▶ Database - WSREP/PXC▶ Plugin - Galera▶ Traffic control

♦ Traffic Control - tc♦ NetEm

13 / 92

Page 14: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Actors

▶ Database - WSREP/PXC▶ Plugin - Galera▶ Traffic control

♦ Traffic Control - tc♦ NetEm

14 / 92

Page 15: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Actors

▶ Containers - Docker▶ Load

♦ Generators - Sysbench, RQG▶ Network

♦ Dnsmasq♦ nsenter

15 / 92

Page 16: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Actors

▶ Containers - Docker▶ Load

♦ Generators - Sysbench, RQG▶ Network

♦ Dnsmasq♦ nsenter

16 / 92

Page 17: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Actors

▶ Jenkins♦ Build flow and CI

▶ Storage♦ Why

17 / 92

Page 18: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

But why

▶ The ‘P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

18 / 92

Page 19: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

But why

▶ The ‘P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

19 / 92

Page 20: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

But why

▶ The ‘P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

20 / 92

Page 21: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

But why

▶ The ‘P’ in CAP▶ WAN scalability▶ Real Reason - fun!▶ Tolerance to latency variance

21 / 92

Page 22: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

But why

▶ Failures in warehouses.▶ Not quorum, but consensus.▶ Real world networks and synchronous replication

- Delay- Partition

22 / 92

Page 23: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Galera

Page 24: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Galera

▶ Data-centric approach▶ EVS▶ Causality and Synchronous▶ Latency

24 / 92

Page 25: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Page 26: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Page 27: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Page 28: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Where did it start

Page 29: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Where did it start

▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192▶ Loss of PC▶ Crash▶ HA goal

29 / 92

Page 30: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

One can bring the whole down

Page 31: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

The Flow

Page 32: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

32 / 92

Page 33: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

33 / 92

Page 34: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

34 / 92

Page 35: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

35 / 92

Page 36: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

36 / 92

Page 37: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

37 / 92

Page 38: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

38 / 92

Page 39: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

Jenkins Build image(s?) Start Dnsmasq Bootstrap

Load/SysbenchSST/OthersPre-sanitynsenter/netem

39 / 92

Page 40: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

40 / 92

Page 41: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

41 / 92

Page 42: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

42 / 92

Page 43: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

43 / 92

Page 44: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

44 / 92

Page 45: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

45 / 92

Page 46: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

46 / 92

Page 47: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Basic Flow

RR sysbench

Detach/Keep

Sanity check Reconciliation

Post sanity Core trace

Cleanup Collect logs

Examine!

47 / 92

Page 48: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Cluster Resilience

48 / 92

Page 49: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ Sysbench▶ Segments▶ Reconciliation period▶ Number of nodes with loss

49 / 92

Page 50: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ Sysbench▶ Segments▶ Reconciliation period▶ Number of nodes with loss

50 / 92

Page 51: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ Sysbench▶ Segments▶ Reconciliation period▶ Number of nodes with loss

51 / 92

Page 52: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ Sysbench▶ Segments▶ Reconciliation period▶ Number of nodes with loss

52 / 92

Page 53: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ NetEm▶ Detach qdisc or keep it?▶ Fsync▶ Shutdown

53 / 92

Page 54: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ NetEm▶ Detach qdisc or keep it?▶ Fsync▶ Shutdown

54 / 92

Page 55: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ NetEm▶ Detach qdisc or keep it?▶ Fsync▶ Shutdown

55 / 92

Page 56: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

▶ NetEm▶ Detach qdisc or keep it?▶ Fsync▶ Shutdown

56 / 92

Page 57: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Parameters

Page 58: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Containers!

Page 59: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Docker

▶ Why not virtualize♦ Occam♦ Namespaces

▶ Simplicity♦ Network♦ One application per node

59 / 92

Page 60: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Docker

▶ Portability- See same qualitative behavior that I do.

▶ Reproducibility- Makes it determinstic

▶ Configurable and CI- Byproducts

60 / 92

Page 61: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Docker

▶ QEMU and Docker▶ Scalability

♦ Performance♦ Feature

▶ Abstraction of channels

61 / 92

Page 62: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Container Networking

▶ Linking didn’t help▶ Dnsmasq to rescue!

♦ Hosts file and volumes♦ SIGHUP and refresh

▶ More elegant methods- Swarm

62 / 92

Page 63: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Noise

▶ Initial setup- Bridge- Egress only- IFB

▶ Present state▶ NetEm

- tc qdisc buckets- packet loss, delay, corruption, duplication, reordering- nsenter

▶ Future- Docker exec

63 / 92

Page 64: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Testing methods

Page 65: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Method I

▶ Qdisc is detached after load▶ Objective

- Time to recover of full cluster▶ Done with a larger subset

65 / 92

Page 66: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Method II

▶ Qdisc is kept till the end▶ Objective

- Formation of primary component▶ Comparatively smaller subset

66 / 92

Page 67: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Observations

▶ Post sanity types- Why- Whom

▶ Which method is more pertinent▶ State transfer issues

- Beginning- During re-emergence

67 / 92

Page 68: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Observations

▶ Direct load to affected nodes▶ Logs

- journalctl- Streaming?

68 / 92

Page 69: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Other noises

▶ Aim▶ Fsync

- libeatmydata- Variance

▶ Correlation with network▶ How with Docker

- LD_PRELOAD

69 / 92

Page 70: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

System Load

Page 71: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Load generation

▶ Sysbench- Generation- Reconnect on partition

▶ Sockets chosen- Load on affected nodes

▶ Distribution of Load- RR with socat- Native sysbench support- HAProxy?

71 / 92

Page 72: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Load generation

▶ Nature of data/load- DDL

▶ RQG in future- Fuzz testing

72 / 92

Page 73: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

The Fix

Page 74: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Strike Out!

Page 75: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Eviction

▶ STONITH▶ Permanent eviction▶ ’N’ strikes & out!

- Timers - evs parameters- wsrep_evs_delayed and wsrep_evs_evict_list

75 / 92

Page 76: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Eviction

▶ Aim▶ Quorum required

- Keep majority of the group live or to avoid eviction.- Why? - Not shoot each other- Non-PC nodes also.

76 / 92

Page 77: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Eviction

▶ Aim▶ Quorum required

- Keep majority of the group live or to avoid eviction.- Why? - Not shoot each other- Non-PC nodes also.

77 / 92

Page 78: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Eviction

▶ EVS version and upgrade▶ TODO!

- Ingress only- Follow here.

▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from codership.

78 / 92

Page 79: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Coredumps with Docker

▶ Breakdown of abstraction▶ Lack of isolation▶ What was done

- Volumes- core_pattern & sysctl- suid and ulimit

79 / 92

Page 80: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

WAN Segments

▶ How they work▶ Random allocation▶ Joiner starvation▶ Simulates data center▶ Donor selection

80 / 92

Page 81: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

The code

▶ Github: https://github.com/percona/pxc-docker▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/▶ Contributions/testing welcome!▶ Dependencies

- Sysbench

81 / 92

Page 82: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Code: todo

▶ Docker automated builds▶ Orchestration

- Kubernetes, Mesos et.al.▶ Docker

♦ Injection♦ Signal proxying

82 / 92

Page 83: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Code: todo

▶ Poll and exit during reconciliation- Instrument v/s number of nodes

▶ Use actual channels▶ Run it bare - CoreOS etc.▶ Overlay with etcd/fleet/libswarm

83 / 92

Page 84: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Future work

Page 85: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Future work

▶ Fault injection♦ Memory

- Poisoned memory/madvise♦ Disk

- libeatmydata- Opposite- ENOSPC

85 / 92

Page 86: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Fault injection

▶ CPU- NUMA?- Hotplug

▶ More network- corruption, duplication, reordering, rate-limit- Better distribution- Other shaping

86 / 92

Page 87: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

More Chaos

Page 88: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Future work

▶ Disturb cluster more!- Membership changes* Manual eviction* Pull the cord!- Corrupt nodes

▶ Consistency voting

88 / 92

Page 90: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

About

▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona.▶ Slides will be at slideshare.▶ About.me: raghavendra.prabhu▶ Keybase.io: rdprabhu▶ Presentation under CC BY-SA 4.0

90 / 92

Page 91: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Image Credits▶ http://galeracluster.com/documentation-webpages/▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354▶ https://flic.kr/p/9J6GNu▶ https://secure.flickr.com/photos/brewbooks/7780990192▶ https://www.flickr.com/photos/kwerfeldein/2649294869▶ https://secure.flickr.com/photos/mindmob/51951632▶ https://secure.flickr.com/photos/arenamontanus/2227769907▶ https://www.flickr.com/photos/markop/477199204▶ http://galeracluster.com/wp-content/uploads/2013/10/galera_replication1.png▶ https://www.flickr.com/photos/gcwest/281385801▶ https://www.flickr.com/photos/opethdamna/360934079▶ http://digital-amphetamine.deviantart.com/art/Sky-82555664▶ http://highload.co/i/logo.png▶ https://flic.kr/p/xTT8n▶ https://www.flickr.com/photos/29233640@N07/13466208953▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/

91 / 92

Page 92: Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

Spaseeba & Da sveedaneeya!