Spilo the high-available PostgreSQL cluster Feike Steenbergen Zalando SE

Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen

Spilo
the high-available PostgreSQL cluster

Feike Steenbergen
Zalando SE

Zalando

• 15 EU countries
• 3 fulfilment centers
• 15+ million active customers
• 2.2 billion € revenue 2014

150 000+ products

We are growing!

Our databases

• > 150 production PostgreSQL databases
• > 13.5 TB data
• > 5 TB biggest DB
• 400-1000+ write tps
• > 2 DB failures/month

Zalando never sleeps

Infrastructure bottleneck

ACID Team: create, alter, deploy, migrate, failover, upgrade

80+ teams

Radical Agility

Purpose

Autonomy

Mastery

Cloud

• 2013: ZCloud

• 2014: project Pequod

• 2015: Let’s just use AWS…

Amazon abbreviations

• AWS - Amazon Web Services
• EC2 - Elastic Compute Cloud
• ELB - Elastic Load Balancer
• RDS - Relational Database Service
• CF - CloudFormation
• ASG - Auto Scaling Group

AWS

• One account per team

• Microservices

• REST/OAuth2

• Deployment with Docker

Autonomous teams on AWS

REST

INTERNET

Autonomous teams

• Team decides which product to build

• … and which technologies to use

• REST/OAuth2 mandatory

• Team is responsible for its infrastructure

Databases?

• Developers should take care of infrastructure

• ...including production databases

• On AWS!

Isn’t it dangerous?

DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958

ACID team provides

PostgreSQL trainings

What about failover?

Autofailover tasks

• Detect the master failure

• Elect a new master

• Redirect clients

Autofailover issues

• Discarded writes

• Split-brain

• False positives

RDS?

• Support for PostgreSQL

• Automatic failover

• Most extensions

• Automatic backups

RDS?

• Vendor lock-in

• No superuser

• No untrusted languages

• No logical decoding plugins

• Costs

Spilo (სპილო)

Spilo does

• Rapid deployment of PostgreSQL on AWS EC2 instances

• Streaming replication with auto-failover

Spilo on AWS

[Diagram: one Spilo MASTER and two Spilo REPLICAs. Labels: master connection, application DB request, etcd cluster status update.]

Failover

[Diagram: two Spilo REPLICAs, no master. Labels: master connection, application DB request, etcd cluster status update.]

Failover

[Diagram: one Spilo MASTER and one Spilo REPLICA, with the note "New Spilo starts…". Labels: master connection, application DB request, etcd cluster status update.]

Failover

[Diagram: one Spilo MASTER and two Spilo REPLICAs again. Labels: master connection, application DB request, etcd cluster status update.]

What is Spilo?

[Diagram: three nodes, each running Patroni: one MASTER and two REPLICAs, placed in auto-scaling groups.]

Demo: Deploying Spilo

• We use STUPS
• First we define a template
• We create a CloudFormation stack from this template

Patroni (პატრონი)

• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open-source
• Runs everywhere

Compose Governor idea

● Use etcd for failover decision
● Run etcd on every node
● Run 1 node with HAProxy + etcd

Distributed configuration systems

• Fault tolerant

• Reliably store small amounts of strongly-consistent data between distributed nodes

• Good for storing the PostgreSQL cluster state

Distributed consensus

[Diagram: a LEADER, three CLIENTs, and a write request.]

Distributed consensus

[Diagram: two LEADER labels, three CLIENTs, and a write request.]

Cluster state in etcd

    $ export ETCD=172.17.0.2:4001
    $ etcdctl -C $ETCD ls /service/ --recursive
    /service/dm
    /service/dm/optime
    /service/dm/optime/leader
    /service/dm/members
    /service/dm/members/postgresql_172_17_0_3
    /service/dm/members/postgresql_172_17_0_4
    /service/dm/members/postgresql_172_17_0_5
    /service/dm/initialize
    /service/dm/leader
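As a small illustration of this key layout, the sketch below extracts the member names from a listing like the one on the slide. The helper `member_names` is a hypothetical name introduced here, and reading `dm` as the cluster name is an assumption based on the paths shown.

```python
# Key listing copied from the slide above.
LISTING = """\
/service/dm
/service/dm/optime
/service/dm/optime/leader
/service/dm/members
/service/dm/members/postgresql_172_17_0_3
/service/dm/members/postgresql_172_17_0_4
/service/dm/members/postgresql_172_17_0_5
/service/dm/initialize
/service/dm/leader
"""


def member_names(listing, prefix="/service/dm/members/"):
    """Return the member names found under the members directory (hypothetical helper)."""
    return sorted(
        key[len(prefix):]
        for key in listing.split()
        if key.startswith(prefix)
    )


names = member_names(LISTING)
print(names)
```

Each member has its own key, while `leader` and `initialize` are single cluster-wide keys.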

Leader key

• Points to the member key
• Has a TTL, autoexpires
• Acts as an exclusive lock
• Only the leader can become the master
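The leader key properties listed on this slide (exclusive, expiring, kept alive by its holder) can be sketched with a toy in-memory store. This is neither Patroni's code nor etcd's API: `TTLStore`, `acquire`, `renew`, and `holder` are hypothetical names, and time is passed in explicitly so the behaviour is easy to trace.

```python
class TTLStore:
    """Toy in-memory stand-in for one DCS key with a TTL (hypothetical, not etcd's API)."""

    def __init__(self):
        self._value = None       # name of the current lock holder
        self._expires_at = 0.0   # time at which the key auto-expires

    def _expire(self, now):
        if self._value is not None and now >= self._expires_at:
            self._value = None

    def acquire(self, member, ttl, now):
        """Create the key only if it does not exist (an atomic create in a real DCS)."""
        self._expire(now)
        if self._value is None:
            self._value, self._expires_at = member, now + ttl
            return True
        return False

    def renew(self, member, ttl, now):
        """Refresh the TTL only if `member` still holds the key."""
        self._expire(now)
        if self._value == member:
            self._expires_at = now + ttl
            return True
        return False

    def holder(self, now):
        self._expire(now)
        return self._value


store = TTLStore()
got_a = store.acquire("postgresql_172_17_0_3", ttl=30, now=0)    # first node wins
got_b = store.acquire("postgresql_172_17_0_4", ttl=30, now=10)   # lock is already taken
kept_a = store.renew("postgresql_172_17_0_3", ttl=30, now=20)    # leader keeps it alive
# The leader stops renewing (for example, it crashed); the key expires at t=50
# and another member can take the lock.
took_over = store.acquire("postgresql_172_17_0_4", ttl=30, now=51)
```

The point of the TTL is that a crashed leader does not need to release the lock: the key disappears by itself and a replica can then take it.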

Leader TTL

    $ http http://$ETCD/v2/keys/service/dm/leader
    ...
    {
        "action": "get",
        "node": {
            "createdIndex": 8,
            "expiration": "2015-11-20T09:56:43.59367038Z",
            "key": "/service/dm/leader",
            "modifiedIndex": 85,
            "ttl": 22,
            "value": "postgresql_172_17_0_3"
        }
    }
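A minimal sketch of reading the leader name and remaining TTL from a response shaped like the one on this slide. The JSON body is copied from the slide; `leader_info` is a hypothetical helper, and no network call is made.

```python
import json

# Response body copied from the slide; in practice it would come from an
# HTTP GET against the etcd v2 keys endpoint shown there.
RESPONSE = """
{"action": "get",
 "node": {"createdIndex": 8,
          "expiration": "2015-11-20T09:56:43.59367038Z",
          "key": "/service/dm/leader",
          "modifiedIndex": 85,
          "ttl": 22,
          "value": "postgresql_172_17_0_3"}}
"""


def leader_info(body):
    """Return (leader member name, seconds until the key expires)."""
    node = json.loads(body)["node"]
    return node["value"], node["ttl"]


leader, ttl = leader_info(RESPONSE)
print(f"{leader} holds the leader key for another {ttl}s unless it is renewed")
```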

Member key

    $ etcdctl -C $ETCD get /service/dm/members/postgresql_172_17_0_5
    {
        "conn_url": "postgres://un:[email protected]:5432/postgres",
        "api_url": "http://172.17.0.5:8008/patroni",
        "tags": {},
        "state": "running",
        "role": "replica",
        "xlog_location": 67109176
    }
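The fields in a member key are enough to reason about which replica would be a sensible failover target. The sketch below shows one plausible rule (the running replica with the highest `xlog_location`, skipping members tagged `nofailover`). It is an illustration only, not Patroni's actual election logic; `best_candidate` is a hypothetical helper, and the second member's values are made up.

```python
import json

# Member values keyed by member name. Only the 172_17_0_5 entry is taken from
# the slide (connection URLs omitted); the other entry is invented for illustration.
MEMBERS = {
    "postgresql_172_17_0_4": json.dumps(
        {"tags": {"nofailover": True}, "state": "running",
         "role": "replica", "xlog_location": 67109500}),
    "postgresql_172_17_0_5": json.dumps(
        {"tags": {}, "state": "running",
         "role": "replica", "xlog_location": 67109176}),
}


def best_candidate(members):
    """Pick the running replica with the highest xlog_location, skipping nofailover."""
    eligible = []
    for name, raw in members.items():
        info = json.loads(raw)
        if info["state"] != "running" or info["role"] != "replica":
            continue
        if info["tags"].get("nofailover"):
            continue
        eligible.append((info["xlog_location"], name))
    return max(eligible)[1] if eligible else None


candidate = best_candidate(MEMBERS)
print(candidate)
```

Preferring the most advanced replica limits the number of writes discarded on failover, which is one of the autofailover issues named earlier in the talk.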

[Diagram: two Patroni nodes, MASTER and REPLICA, behind a MASTER LB. Labels: PostgreSQL connection, API HealthCheck (master), Connection and API URL.]

[Diagram: two Patroni nodes, MASTER and REPLICA, behind a MASTER LB and a REPLICA LB. Labels: PostgreSQL connection, API HealthCheck (master), API HealthCheck (slave), Connection and API URL.]
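The diagram labels suggest that each load balancer decides where to route by calling a health check on every node's API. A minimal sketch of such an endpoint follows, using only the Python standard library. The 200/503 status codes, the `ROLE` dictionary, and the `probe` helper are assumptions of this sketch rather than details from the slides.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ROLE = {"current": "replica"}  # hypothetical shared state, updated on promotion


class HealthCheck(BaseHTTPRequestHandler):
    def do_GET(self):
        # Healthy (200) only while this node is the master, 503 otherwise,
        # so a master load balancer routes traffic to exactly one node.
        is_master = ROLE["current"] == "master"
        self.send_response(200 if is_master else 503)
        self.end_headers()
        self.wfile.write(ROLE["current"].encode())

    def log_message(self, *args):
        pass  # keep the example quiet


def probe(port):
    """Return the HTTP status a load balancer would see."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


server = HTTPServer(("127.0.0.1", 0), HealthCheck)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status_as_replica = probe(port)
ROLE["current"] = "master"      # simulate a promotion after failover
status_as_master = probe(port)

server.shutdown()
server.server_close()
```

A replica load balancer would use the mirror-image check, reporting healthy only while the node is a replica.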

Initialize key

    $ etcdctl -C $ETCD get /service/dm/initialize
    6219169399948550171

• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with a different system ID are not allowed to join
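The join rule on this slide can be sketched as a small check. A plain dict stands in for the DCS, `may_join` is a hypothetical helper, and the second system ID is made up. In a real DCS the "create if absent" step would have to be atomic.

```python
INIT_KEY = "/service/dm/initialize"


def may_join(dcs, local_system_id):
    """Decide whether a node may join, based on the cluster-wide initialize key."""
    stored = dcs.get(INIT_KEY)
    if stored is None:
        # First node to arrive: record its system ID for everyone else.
        dcs[INIT_KEY] = local_system_id
        return True
    return stored == local_system_id


dcs = {}  # plain dict standing in for the DCS
first = may_join(dcs, "6219169399948550171")
same_cluster = may_join(dcs, "6219169399948550171")
foreign = may_join(dcs, "6000000000000000000")  # made-up system ID
```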

Patroni modules

[Diagram of module boxes: ETCD and ZOOKEEPER (under ABSTRACT DCS), PostgreSQL, REST API, High availability, Asynchronous executor, Callbacks.]

Demo time!

https://asciinema.org/a/2ttvu50yehjo2712s1w43udio

Patroni improvements

• Robust exception handling
• Run long-running tasks (e.g. base backup) in a separate thread
• etcd + ZooKeeper
• REST API

Patroni improvements

• Configurable replica imaging
• Support for pg_rewind
• patronictl
• Packaged: pip install patroni

Patroni improvements

• Manual failover
• Initialize from external cluster
• Attach to already running PostgreSQL nodes
• Tags (e.g. nofailover)