distributed-matters
Spilo, the high-available PostgreSQL cluster
Feike Steenbergen, Zalando SE
Zalando
• 15 EU countries
• 3 fulfilment centers
• 15+ million active customers
• 2.2 billion € revenue 2014
• 150 000+ products
We are growing!
Our databases
• > 150 production PostgreSQL databases
• > 13.5 TB data
• > 5 TB biggest DB
• 400-1000+ write tps
• > 2 DB failures/month
Zalando never sleeps
Infrastructure bottleneck
ACID Team: create, alter, deploy, migrate, failover, upgrade
80+ teams
Radical Agility
Purpose
Autonomy
Mastery
Cloud
• 2013: ZCloud
• 2014: project Pequod
• 2015: Let’s just use AWS…
Amazon abbreviations
• AWS - Amazon Web Services
• EC2 - Elastic Compute Cloud
• ELB - Elastic Load Balancer
• RDS - Relational Database Service
• CF - CloudFormation
• ASG - Auto Scaling Group
AWS
• One account per team
• Microservices
• REST/OAuth2
• Deployment with Docker
Autonomous teams on AWS
[Diagram: team accounts connected via REST over the internet]
Autonomous teams
• Team decides which product to build
• … and which technologies to use
• REST/OAuth2 mandatory
• Team is responsible for its infrastructure
• Developers should take care of infrastructure
• … including production databases
• On AWS!
Databases?
Isn’t it dangerous?
DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958
ACID team provides
PostgreSQL trainings
What about failover?
Autofailover tasks
• Detect the master failure
• Elect a new master
• Redirect clients
Autofailover issues
• Discarded writes
• Split-brain
• False positives
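One common way to reduce false positives is to declare the master failed only after several consecutive failed health checks. The sketch below illustrates that idea in general terms; the class name `FailureDetector` and the threshold of 3 are assumptions made for illustration and are not taken from Spilo or Patroni.

```python
class FailureDetector:
    """Declare failure only after `threshold` consecutive failed checks."""

    def __init__(self, threshold=3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_ok):
        """Record one health-check result.

        Returns True once the master should be considered failed.
        """
        if check_ok:
            # Any successful check resets the counter.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


detector = FailureDetector(threshold=3)
# One isolated failure, a recovery, then three failures in a row.
results = [detector.record(ok) for ok in [False, True, False, False, False]]
print(results)
```

A higher threshold lowers the chance of an unnecessary failover but lengthens the time to detect a real outage, so the value is a trade-off rather than a fixed rule.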
RDS?
• Support for PostgreSQL
• Automatic failover
• Most extensions
• Automatic backups
RDS?
• Vendor lock-in
• No superuser
• No untrusted languages
• No logical decoding plugins
• Costs
Spilo (სპილო)
Spilo does
• Rapid deployment of PostgreSQL on AWS EC2 instances
• Streaming replication with auto-failover
Spilo on AWS
[Diagram: one Spilo master and two Spilo replicas. Labels: master connection, application DB request, etcd cluster status update]
Failover
[Diagram sequence: the master is gone and two replicas remain; one replica becomes the new master; a new Spilo node starts and joins as a replica, restoring one master and two replicas]
What is Spilo?
[Diagram: three nodes, each running Patroni (one master, two replicas), placed in auto-scaling groups]
Demo: Deploying Spilo
• We use stups
• First we define a template
• We create a CloudFormation stack from this template
Patroni (პატრონი)
• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open-source
• Runs everywhere
Compose Governor idea
● Use etcd for failover decision
● Run etcd on every node
● Run 1 node with HAProxy + etcd
Distributed configuration systems
• Fault tolerant
• Reliably store small amounts of strongly-consistent data between distributed nodes
• Good for storing the PostgreSQL cluster state
Distributed consensus
[Diagram: three clients send write requests to the elected leader]
Cluster state in etcd
$ export ETCD=172.17.0.2:4001
$ etcdctl -C $ETCD ls /service/ --recursive
/service/dm
/service/dm/optime
/service/dm/optime/leader
/service/dm/members
/service/dm/members/postgresql_172_17_0_3
/service/dm/members/postgresql_172_17_0_4
/service/dm/members/postgresql_172_17_0_5
/service/dm/initialize
/service/dm/leader
Leader key
• Points to the member key
• Has a TTL, autoexpires
• Acts as an exclusive lock
• Only the leader can become the master
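The leader-key properties above (expiring TTL, exclusive lock) can be illustrated with a toy in-memory store. This is only a sketch: `TTLStore` and its methods are hypothetical stand-ins, not etcd or Patroni code, and time is passed in explicitly so the behaviour is easy to follow. A real DCS such as etcd provides the same guarantees through conditional writes (compare-and-swap).

```python
class TTLStore:
    """Toy in-memory key store with expiring keys (a stand-in for a DCS)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key, now):
        """Return the value, or None if the key is missing or expired."""
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            self._data.pop(key, None)  # expired keys disappear
            return None
        return entry[0]

    def create(self, key, value, ttl, now):
        """Set the key only if it does not exist; return True on success."""
        if self.get(key, now) is not None:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def refresh(self, key, value, ttl, now):
        """Extend the TTL only if the key still holds `value`."""
        if self.get(key, now) != value:
            return False
        self._data[key] = (value, now + ttl)
        return True


LEADER = "/service/dm/leader"
store = TTLStore()
got_a = store.create(LEADER, "postgresql_172_17_0_3", ttl=30, now=0)
got_b = store.create(LEADER, "postgresql_172_17_0_4", ttl=30, now=10)
kept_a = store.refresh(LEADER, "postgresql_172_17_0_3", ttl=30, now=20)
# The leader stops refreshing, so the key expires at t=50.
took_b = store.create(LEADER, "postgresql_172_17_0_4", ttl=30, now=51)
print(got_a, got_b, kept_a, took_b, store.get(LEADER, now=52))
```

The second node cannot take the key while the first keeps refreshing it; once refreshes stop, the key expires and another node can acquire it.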
Leader TTL
$ http http://$ETCD/v2/keys/service/dm/leader
...
{
    "action": "get",
    "node": {
        "createdIndex": 8,
        "expiration": "2015-11-20T09:56:43.59367038Z",
        "key": "/service/dm/leader",
        "modifiedIndex": 85,
        "ttl": 22,
        "value": "postgresql_172_17_0_3"
    }
}
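As a small illustration, the response shown above can be read with Python's standard `json` module to find the current leader and the remaining TTL. The helper name `leader_info` is hypothetical; the response body is copied from the slide.

```python
import json

RESPONSE = """
{
  "action": "get",
  "node": {
    "createdIndex": 8,
    "expiration": "2015-11-20T09:56:43.59367038Z",
    "key": "/service/dm/leader",
    "modifiedIndex": 85,
    "ttl": 22,
    "value": "postgresql_172_17_0_3"
  }
}
"""


def leader_info(body):
    """Return (leader_name, seconds_left) from the JSON body of a key lookup."""
    node = json.loads(body)["node"]
    # "ttl" is absent for keys that never expire, hence .get().
    return node["value"], node.get("ttl")


name, ttl = leader_info(RESPONSE)
print(f"leader={name} ttl={ttl}s")
```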
Member key
$ etcdctl -C $ETCD get /service/dm/members/postgresql_172_17_0_5
{
    "conn_url": "postgres://un:[email protected]:5432/postgres",
    "api_url": "http://172.17.0.5:8008/patroni",
    "tags": {},
    "state": "running",
    "role": "replica",
    "xlog_location": 67109176
}
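The fields of a member key are enough to sketch how a failover candidate might be chosen: prefer the running replica with the highest `xlog_location`, skipping members tagged `nofailover`. This is a simplified assumption for illustration, not Patroni's actual election logic, which the slides do not show. The function name `best_candidate` is hypothetical, and every member value below other than 67109176 is made up.

```python
def best_candidate(members):
    """Pick the running replica with the highest xlog_location.

    Members tagged nofailover are skipped. Returns None if nobody qualifies.
    """
    eligible = [
        (name, info)
        for name, info in members.items()
        if info.get("role") == "replica"
        and info.get("state") == "running"
        and not info.get("tags", {}).get("nofailover", False)
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda item: item[1]["xlog_location"])[0]


# Illustrative member data; only the 172_17_0_5 position comes from the slide.
members = {
    "postgresql_172_17_0_4": {
        "role": "replica", "state": "running",
        "tags": {}, "xlog_location": 67109000,
    },
    "postgresql_172_17_0_5": {
        "role": "replica", "state": "running",
        "tags": {}, "xlog_location": 67109176,
    },
    "postgresql_172_17_0_6": {
        "role": "replica", "state": "running",
        "tags": {"nofailover": True}, "xlog_location": 67200000,
    },
}
print(best_candidate(members))
```

Choosing the most advanced replica limits the number of discarded writes after a failover.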
[Diagram, repeated across several slides: each Patroni node (master and replica) publishes its connection and API URL; the master load balancer routes PostgreSQL connections to the node that passes the API health check (master); a replica load balancer does the same using the API health check (slave)]
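A load balancer health check of this kind can be sketched as an HTTP endpoint that answers 200 when the node holds the requested role and 503 otherwise. The paths `/master` and `/replica`, the status codes, and the names `health_status` and `HealthHandler` are assumptions chosen for illustration; the slides only state that the load balancers use an API health check.

```python
from http.server import BaseHTTPRequestHandler


def health_status(node_role, path):
    """Return the HTTP status code for a health-check request."""
    wanted = {"/master": "master", "/replica": "replica"}.get(path)
    if wanted is None:
        return 404
    # 200 keeps the node in the load balancer pool, 503 takes it out.
    return 200 if node_role == wanted else 503


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical: a real agent would update this from its HA loop.
    node_role = "replica"

    def do_GET(self):
        self.send_response(health_status(self.node_role, self.path))
        self.end_headers()


print(health_status("master", "/master"), health_status("replica", "/master"))
```

To try the handler locally, it could be served with `HTTPServer(("", 8008), HealthHandler).serve_forever()` from the same `http.server` module. Because only one node answers 200 on the master path, the master load balancer always points at a single node.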
Initialize key
$ etcdctl -C $ETCD get /service/dm/initialize
6219169399948550171
• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with different system ID are not allowed to join
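The join rule above can be written as a small check. This is a sketch under the assumption that the joining node already has a data directory with its own system ID; the function name `may_join` is hypothetical. In a real deployment, writing the initialize key would have to be an atomic create in the DCS so that two first nodes cannot both win.

```python
def may_join(initialize_key, local_system_id):
    """Decide whether a node may join, given the cluster's initialize key.

    Returns (allowed, value_to_store). value_to_store is set only when this
    node is the first to join and should record its system ID.
    """
    if initialize_key is None:
        # No cluster yet: the first node defines the system ID.
        return True, local_system_id
    return initialize_key == local_system_id, None


SYSTEM_ID = "6219169399948550171"
print(may_join(None, SYSTEM_ID))
print(may_join(SYSTEM_ID, SYSTEM_ID))
print(may_join(SYSTEM_ID, "1234567890"))
```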
Patroni modules
• Abstract DCS (etcd, ZooKeeper)
• PostgreSQL
• REST API
• High availability
• Asynchronous executor
• Callbacks
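The value of an abstract DCS layer is that the high-availability logic depends only on an interface, so etcd and ZooKeeper backends are interchangeable. The sketch below shows that structure; the class and method names (`AbstractDCS`, `get_leader`, `acquire_leader`, `InMemoryDCS`, `decide_role`) are hypothetical and do not reproduce Patroni's actual classes.

```python
from abc import ABC, abstractmethod


class AbstractDCS(ABC):
    """Interface the HA logic depends on; backends implement it."""

    @abstractmethod
    def get_leader(self):
        """Return the current leader name, or None."""

    @abstractmethod
    def acquire_leader(self, name):
        """Try to take the leader key; return True on success."""


class InMemoryDCS(AbstractDCS):
    """Toy backend used here in place of a real etcd or ZooKeeper client."""

    def __init__(self):
        self._leader = None

    def get_leader(self):
        return self._leader

    def acquire_leader(self, name):
        if self._leader is None:
            self._leader = name
        return self._leader == name


def decide_role(dcs, name):
    """One HA decision: the holder of the leader key runs as master."""
    return "master" if dcs.acquire_leader(name) else "replica"


dcs = InMemoryDCS()
roles = [decide_role(dcs, n) for n in ("node_a", "node_b", "node_a")]
print(roles)
```

`decide_role` never mentions a concrete backend, which is what lets one HA implementation work with more than one DCS.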
Demo time!
https://asciinema.org/a/2ttvu50yehjo2712s1w43udio
Patroni improvements
• Robust exception handling
• Run long-running tasks (e.g. base backup) in a separate thread
• etcd + ZooKeeper
• REST API
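Running a long task such as a base backup in a separate thread keeps the main loop free to go on refreshing the leader key. A minimal sketch of such an asynchronous executor follows; the class `AsyncExecutor` and the function `fake_base_backup` are hypothetical, and the "backup" is simulated with an event instead of a real `pg_basebackup` run.

```python
import threading


class AsyncExecutor:
    """Run at most one long task at a time in a background thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._thread = None

    def run(self, func, *args):
        """Start func in a thread; return False if a task is still running."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._thread = threading.Thread(target=func, args=args, daemon=True)
            self._thread.start()
            return True

    def wait(self):
        """Block until the current task, if any, has finished."""
        thread = self._thread
        if thread is not None:
            thread.join()


release = threading.Event()
done = []


def fake_base_backup(label):
    release.wait()  # stands in for a slow base backup
    done.append(label)


executor = AsyncExecutor()
first = executor.run(fake_base_backup, "backup-1")
second = executor.run(fake_base_backup, "backup-2")  # rejected: still busy
release.set()
executor.wait()
print(first, second, done)
```

Rejecting a second task while one is running avoids two conflicting operations on the same data directory.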
Patroni improvements
• Configurable replica imaging
• Support for pg_rewind
• patronictl
• Packaged: pip install patroni
Patroni improvements
• Manual failover
• Initialize from external cluster
• Attach to already running PostgreSQL nodes
• Tags (e.g. nofailover)
• Spilo: github.com/zalando/spilo, spilo.readthedocs.org
• Patroni: github.com/zalando/patroni, patroni.readthedocs.org
• Stups: github.com/zalando-stups/, stups.io
• Feedback: @ekief
Thank you!