Postgres Cluster and Multimaster Ivan Panchenko Postgres Pro postgrespro.ru

Postgres Cluster and Multimaster

Ivan Panchenko

Postgres Pro

postgrespro.ru

Cluster definition: several DBs working as one

Redundancy

Sharding

Parallel query processing

Failover

Dynamic reconfiguration

Cluster-wide data consistency (distributed transactions)

Read scalability

Write scalability

Reliability

First attempts (~2006)

Trigger-based (Slony/Londiste) or application-level replication

No real consistency: needs verification

No real failover (data loss possible)

Read scalability

Only application level sharding for write scalability

Mainstream evolution

WAL (write ahead logs)

WAL shipping

WAL streaming: online asynchronous replication

Logical replication: more flexible

Synchronous replication: needed to rule out data loss

HA provided by external tools or performed manually

Asymmetric (Single master) clusters

[Diagram: three nodes, MASTER, SYNC REP and ASYNC REP. Labels: "Read", "Write", "Read Only", "In case of master failure".]

Reliable cluster: main challenge

Split-brain problem:

[Diagram: MASTER and NEW MASTER both accept reads and writes; Client 1 and Client 2 are connected to different masters.]

Temporary or long-lasting internal connection failure:

• Sync replica promotion

• Some clients connect to Old master

• Some clients connect to New master

• Chaos grows
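
One common guard against this scenario is a majority (quorum) rule: a node keeps or takes the master role only while it can see more than half of the cluster. The sketch below illustrates that rule in isolation. It is not the mechanism of any particular tool named in this deck, and the function name `may_act_as_master` and its parameters are hypothetical.

```python
def may_act_as_master(cluster_size: int, reachable_peers: int) -> bool:
    """Return True if this node plus the peers it can reach form a strict majority.

    Hypothetical helper for illustration only.
    """
    if cluster_size < 1 or not 0 <= reachable_peers < cluster_size:
        raise ValueError("reachable_peers must be between 0 and cluster_size - 1")
    visible = reachable_peers + 1  # count this node itself
    return visible * 2 > cluster_size


# Three-node cluster split into {old master} and {two replicas}.
old_master_ok = may_act_as_master(cluster_size=3, reachable_peers=0)
promoted_replica_ok = may_act_as_master(cluster_size=3, reachable_peers=1)
print(old_master_ok, promoted_replica_ok)  # False True
```

With an even split of a two-node cluster neither side has a majority, which is one reason such rules are usually applied to an odd number of voters.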

Reliable cluster architecture

[Diagram: three PostgreSQL nodes linked by replication, with a distributed configuration database, fencing, and a client switch agent (DNS, IP, proxy).]

Reliable cluster: present solutions

Solution                         | Origin       | License    | Basis                     | Split-brain protection
Patroni (engine, not a solution) | Zalando      | MIT        | etcd, ZooKeeper or Consul | Maybe
PAF                              | Dalibo       | Postgres   | Corosync/Pacemaker        | Yes
Repmgr                           | 2nd Quadrant | GPLv3      | -                         | Should be
Postgres Pro                     | Postgres Pro | Commercial | Corosync/Pacemaker        | Yes

Corosync/Pacemaker

Developed by Red Hat.

Resource Agent: an interface utility to manage a resource. It must implement the following commands:

1. start

2. stop

3. status

4. monitor

5. promote

6. demote
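
The command set above can be sketched as a small dispatcher. This is a toy illustration, not the agent shipped with any product mentioned here: `ToyResource` and `dispatch` are hypothetical names, the resource is an in-memory flag rather than a PostgreSQL instance, and the numeric return values follow the OCF resource agent exit-code convention.

```python
# Exit codes from the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8


class ToyResource:
    """Hypothetical in-memory resource; a real agent would control a database."""

    def __init__(self) -> None:
        self.running = False
        self.master = False

    def start(self) -> int:
        self.running = True
        return OCF_SUCCESS

    def stop(self) -> int:
        self.running = False
        self.master = False
        return OCF_SUCCESS

    def monitor(self) -> int:
        if not self.running:
            return OCF_NOT_RUNNING
        return OCF_RUNNING_MASTER if self.master else OCF_SUCCESS

    def status(self) -> int:
        # In this toy, status reports the same thing as monitor.
        return self.monitor()

    def promote(self) -> int:
        if not self.running:
            return OCF_ERR_GENERIC  # toy choice: cannot promote a stopped resource
        self.master = True
        return OCF_SUCCESS

    def demote(self) -> int:
        self.master = False
        return OCF_SUCCESS


def dispatch(resource: ToyResource, action: str) -> int:
    """Map a command name to its handler; unknown commands are unimplemented."""
    handlers = {
        "start": resource.start,
        "stop": resource.stop,
        "status": resource.status,
        "monitor": resource.monitor,
        "promote": resource.promote,
        "demote": resource.demote,
    }
    handler = handlers.get(action)
    return handler() if handler else OCF_ERR_UNIMPLEMENTED


res = ToyResource()
for action in ("start", "monitor", "promote", "monitor", "demote", "stop", "monitor"):
    print(action, dispatch(res, action))
```

A real agent is normally an executable that receives the command as its first argument and reports the result through its exit status.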

Cluster practice

Diagnostics

Failover

Synchronous replica switchover

Asynchronous replica switchover

Recovery

Split brain

Deleting node

Adding node

Cluster status transitions

[Diagram: transitions between node states Master, Sync, Async and Off.]

A minimal cluster

Failover visualised

OLAP clusters

Citus DB

Greenplum

Technology: Postgres fork

License: Commercial; moving to open source

Transaction consistency: None

Scalability: good

Read scalability in vanilla Postgres

Single host (9.6): parallel query execution

Read scalability in distributed database with sharding

Table partitioning

FDW: remote partitions

No transaction integrity
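
The routing idea behind partitions that live on different servers can be illustrated outside the database. The sketch below assumes hash partitioning by key; the shard names, `shard_for` and `plan_reads` are hypothetical, and no FDW or PostgreSQL call is involved.

```python
import hashlib
from typing import Dict, List, Sequence

# Hypothetical names standing in for remote partitions.
SHARDS = ("shard0", "shard1", "shard2")


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Pick a shard deterministically from a hash of the partitioning key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def plan_reads(keys: Sequence[str]) -> Dict[str, List[str]]:
    """Group keys by shard so that each shard can be queried independently."""
    plan: Dict[str, List[str]] = {}
    for key in keys:
        plan.setdefault(shard_for(key), []).append(key)
    return plan


plan = plan_reads(["user:1", "user:2", "user:3", "user:4"])
for shard, keys in sorted(plan.items()):
    print(shard, keys)
```

Reads fan out and scale with the number of shards, but nothing in this routing coordinates commits across shards, which is the "no transaction integrity" limitation noted above.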

Write-scalable clusters

Postgres XC (in development since 2010)

Postgres XL (2014)

Postgres X2 (2016)

Technology: Postgres fork

Write scalability: some

Parallel processing: yes

Failover: yes

Transaction consistency: insufficient

BDR (Bidirectional replication)

Logical-replication based

Post-commit replication

Each transaction replicated to each node

Technology: Postgres fork; moving to PostgreSQL

License: Commercial; moving to open source

Transaction consistency: None

Read scalability: good

Postgres Pro Multimaster

Logical-replication based

Each transaction replicated to each node

Distributed transaction manager

Internal failover engine

Technology: Postgres extension

License: Commercial; some parts open source

Transaction consistency: Yes

Read scalability: good

Write scalability: planned

Part of Postgres Pro Enterprise commercial distribution

Easy-to-use cluster

No performance penalty for reads.

Transactions can be issued to any node.

No special actions required in case of failure (except client reconnect)

Design goals

Identical data on all nodes

Possibility to have local tables

Maximum Postgres compatibility

Writes to any node

Fault tolerance

Next step: add sharding for write scalability

Transaction manager requirements

No single point of failure

+: Spanner, Cockroach, Clock-SI

- : Pg-XL

Read-only transactions from a single node without communication between nodes

+: SAP HANA, Spanner, Cockroach, Clock-SI

- : Pg-XL

Why logical replication?

Already existing open source solution by 2nd Quadrant

Very flexible, e.g.:

o Can skip tables

o Replicates between different versions

Transaction implementation

BE – backend,

WS – Walsender,

Arb – Arbiter,

WR – Walreceiver

Normal work

Internal network split

Internal network split: recovery

Normal work after recovery

Failures tested

Node stop-start

Node kill-start

Simple network split

Asymmetric network split

Shift time

Change clock speed on nodes (work in progress)

Performance

Read-only performance is the same as on a single instance

Commit takes more time (two network roundtrips).

Logical decoding slows down big transactions (to be fixed soon)
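
A back-of-the-envelope model of the commit cost: if each of the two round trips has to wait for the slowest peer, commit latency is roughly the local commit time plus twice the largest round-trip time. The function name, its parameters and the numbers below are illustrative assumptions, and the model ignores decoding and apply time on the peers.

```python
from typing import Sequence


def estimated_commit_ms(
    local_commit_ms: float,
    peer_rtts_ms: Sequence[float],
    round_trips: int = 2,
) -> float:
    """Rough commit latency, assuming every round trip waits for the slowest peer."""
    if not peer_rtts_ms:
        return local_commit_ms  # single node: no network cost
    return local_commit_ms + round_trips * max(peer_rtts_ms)


# Hypothetical figures: 1 ms local commit, peers at 0.5 ms and 2 ms RTT.
print(estimated_commit_ms(1.0, [0.5, 2.0]))  # 5.0
```

Under this model the network term dominates as soon as one peer is a few milliseconds away, while read-only transactions are unaffected.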

Information

Postgres Pro documentation: http://postgrespro.com/docs

PgConf.RU: international conference in Moscow, March 15-17.

o http://pgconf.ru/

o Russian and English with simultaneous translation

o 7 Tutorials; > 50 talks

postgrespro.ru

http://postgrespro.co.il/

+972 54 305 7642

[email protected]

Postgres Miktzoanim