Best Practices of HA and Replication of PostgreSQL in Virtualized Environments

© 2010 VMware Inc. All rights reserved

March 2013
Jignesh Shah & vPostgres Team @ VMware


Agenda

§ Enterprise needs
§ Technologies readily available
  • Virtualization Technologies for HA
  • Replication modes of PostgreSQL (in core)
§ Datacenter Deployment Blueprints
  • HA within Datacenter
  • Read-Scaling within Datacenter
  • DR/Read-Scaling across Datacenters

Enterprise Needs for Mission Critical Databases


Causes of Downtime

§ Planned Downtime
  • Software upgrade (OS patches, SQL Server cumulative updates)
  • Hardware/BIOS upgrade
§ Unplanned Downtime
  • Datacenter failure (natural disasters, fire)
  • Server failure (failed CPU, bad network card)
  • I/O subsystem failure (disk failure, controller failure)
  • Software/Data corruption (application bugs, OS binary corruptions)
  • User Error (shut down a SQL service, dropped a table)

Enterprises need HA

§ HA - High Availability of the Database Service
  • Sustain Database Service failure if it goes down
  • Sustain Physical Hardware Failures
  • Sustain Data/Storage Failures
  • 100% Data Guarantee
§ Goal
  • Reduce Mean Time To Recover (MTTR) or Recovery Time Objective (RTO)
  • Typically driven by SLAs

Planning a High Availability Strategy

§ Requirements
  • Recovery Time Objective (RTO)
    • What does 99.99% availability really mean?
  • Recovery Point Objective (RPO)
    • Zero data lost?
  • HA vs. DR requirements
§ Evaluating a technology
  • What’s the cost for implementing the technology?
  • What’s the complexity of implementing and managing the technology?
  • What’s the downtime potential?
  • What’s the data loss exposure?

Availability %         | Downtime / Year | Downtime / Month * | Downtime / Week
-----------------------+-----------------+--------------------+----------------
"Two Nines" - 99%      | 3.65 Days       | 7.2 Hours          | 1.69 Hours
"Three Nines" - 99.9%  | 8.76 Hours      | 43.2 Minutes       | 10.1 Minutes
"Four Nines" - 99.99%  | 52.56 Minutes   | 4.32 Minutes       | 1.01 Minutes
"Five Nines" - 99.999% | 5.26 Minutes    | 25.9 Seconds       | 6.06 Seconds

* Using a 30 day month
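
The figures in the table follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the period. A minimal Python sketch, assuming a 365-day year and the 30-day month noted above; the function name `downtime_minutes` is made up for illustration.

```python
def downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Allowed downtime, in minutes, for a given availability over a period."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * 24 * 60


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        per_year = downtime_minutes(pct, 365)   # assumes a 365-day year
        per_month = downtime_minutes(pct, 30)   # assumes a 30-day month
        print(f"{pct}%: {per_year:.2f} min/year, {per_month:.2f} min/month")
```

Weekly values depend on how a week is derived from the year, so small rounding differences against the table are possible.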


Enterprises need DR

§ DR – Disaster Recovery for your site
  • Overcome Complete Site Failure
  • Data guarantee expected to be as close to 100% as possible
  • Some data loss may be acceptable
  • Key Metrics
    • RTO – Recovery Time Objective: time to recover the service
    • RPO – Recovery Point Objective

Enterprises also need Scale UP

§  Scale UP – Throughput increases with more resources given in the same VM

§  Though in reality limited by Amdahl’s law
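
Amdahl's law caps the speedup from added resources by the share of work that cannot be parallelized. A minimal Python sketch of the formula; the function name and the 95% parallel fraction in the usage loop are illustrative assumptions, not measurements of any database workload.

```python
def amdahl_speedup(parallel_fraction: float, n_resources: int) -> float:
    """Theoretical speedup when only part of the work scales with resources."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_resources < 1:
        raise ValueError("n_resources must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_resources)


if __name__ == "__main__":
    # Hypothetical workload: 95% of the work benefits from more vCPUs.
    for vcpus in (1, 2, 4, 8, 16, 32):
        print(vcpus, round(amdahl_speedup(0.95, vcpus), 2))
```

With a 5% serial share the speedup can never exceed 20x, however many resources the VM is given.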


Enterprises also need Scale Out

§  Scale Out – Throughput increases with more resources given via more nodes (VMs)

§ Typically Shared Nothing architecture (few Shared ‘something’)
§ Often results in “partitions” or “shards”

Scale Out - For Reads

§ Scale Out or Multi-Node Scaling for Reads
§ Online retailer use case
§ 99% reads and 1% actual write transactions
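
With a read-dominated workload, most statements can be served by replicas while writes go to the master. A minimal Python sketch of that routing idea; the class name, the node names, and the rule "a statement starting with SELECT is a read" are illustrative assumptions, not the behaviour of any PostgreSQL driver or pooler.

```python
import itertools


class ReadWriteRouter:
    """Send reads to replicas in round-robin order and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas available, reads fall back to the master.
        self._read_cycle = itertools.cycle(list(replicas) or [master])

    def route(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        # Naive classification: SELECT ... FOR UPDATE or a SELECT that calls
        # a writing function would be misrouted by this rule.
        if first_word == "SELECT":
            return next(self._read_cycle)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["replica1", "replica2"])
    for statement in ("SELECT 1", "SELECT 2", "UPDATE t SET x = 1"):
        print(statement, "->", router.route(statement))
```

Because replicas may lag behind the master, an application that must read its own writes would need to route those reads to the master as well.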


Scale Out - For Writes

§ Scale Out or Multi-nodes for Writes
§ Example use case: 24/7 booking system
§ Constant booking/changes/updates happening

CAP Theorem

§ Consistency
  • All nodes see the same data at the same time
§ Availability
  • Guarantee that every request receives a response about whether it was successful or failed
§ Partition Tolerance
  • The system continues to operate despite arbitrary message loss or failure of part of the system

Virtualization Technologies for HA


Virtualization Availability Features


VM Mobility

§ Server Maintenance
  • VMware vSphere® vMotion® and VMware vSphere Distributed Resource Scheduler (DRS) Maintenance Mode
  • Migrate running VMs to other servers in the pool
  • Automatically distribute workloads for optimal performance
§ Storage Maintenance
  • VMware vSphere® Storage vMotion
  • Migrate VM disks to other storage targets without disruption

Key Benefits
  • Eliminate downtime for common maintenance
  • No application or end user impact
  • Freedom to perform maintenance whenever desired

VMware vSphere High Availability (HA)

§ Protection against host or operating system failure
  • Automatic restart of virtual machines on any available host in cluster
  • Provides simple and reliable first line of defense for all databases
  • Minutes to restart
  • OS and application independent, does not require complex configuration or expensive licenses

App-Aware HA Through Health Monitoring APIs

§ Leverage third-party solutions that integrate with VMware HA (for example, Symantec ApplicationHA)

1. Database Health Monitoring
  • Detect database service failures inside VM
2. Database Service Restart Inside VM
  • App start / stop / restart inside VM
  • Automatic restart when app problem detected
3. Integration with VMware HA
  • VMware HA automatically initiated when
    • App restart fails inside VM
    • Heartbeat from VM fails

[Figure: two VMs (APP on OS) running on VMware HA, annotated with steps 1 to 3 and "App Restart"]

Simple, Reliable DR with VMware SRM

§ Provide the simplest and most reliable disaster protection and site migration for all applications
§ Provide cost-efficient replication of applications to failover site
§ Simplify management of recovery and migration plans
§ Replace manual run books with centralized recovery plans
§ From weeks to minutes to set up new plan
§ Automate failover and migration processes for reliable recovery
§ Provide for fast, automated failover
§ Enable non-disruptive testing
§ Automate failback processes

High Availability Options through Virtualization Technologies

[Figure: Hardware Failure Tolerance (Unprotected, Automated Restart, Continuous) versus Application Coverage (0%, 10%, 100%) for VMware FT, VMware HA, VMotion (Planned Downtime), PostgreSQL Streaming Replication, and RedHat/OS Cluster]

§ Clustering too complex and expensive for most applications
§ VMware HA and FT provide simple, cost-effective availability
§ VMotion provides continuous availability against planned downtime

PostgreSQL Replication Modes


PostgreSQL Replication

§ Single master, multi-slave
§ Cascading slave possible with vFabric Postgres 9.2
§ Mechanism based on WAL (Write-Ahead Logs)
§ Multiple modes and multiple recovery ways
  • Warm standby
  • Asynchronous hot standby
  • Synchronous hot standby
§ Slaves can perform read operations optionally
  • Good for read scale
§ Node failover, reconnection possible

File-based replication

§ File-based recovery method using WAL archives
§ Master node sends WAL files to archive once completed
§ Slave node recovers those files automatically
§ Some delay for the information recovered on slave
  • Usable if application can lose some data
  • Good performance, everything is scp/rsync/cp-based
  • Timing when WAL file is sent can be controlled

[Figure: vPG master, WAL Archive disk, vPG slave 1, and vPG slave 2, connected by WAL file transfers]

Asynchronous replication

§ WAL record-based replication
§ Good balance between performance and data loss
  • Some delay possible for write-heavy applications
  • Data loss possible if slaves are not in complete sync due to delay
§ Possible to connect a slave to a master or a slave (cascading mode)

[Figure: vPG master WAL shipping to Slave 1 and Slave 2, with Slave 1-1 and Slave 1-2 as cascaded slaves]

Synchronous mode

§ COMMIT-based replication
  • Only one slave in sync with master
  • Master waits until the transaction COMMIT is confirmed on the sync slave, then commits
§ No data loss based on transaction commit
  • Performance impact
  • Good for critical applications
§ Cascading slaves are async

[Figure: vPG master WAL shipping to Slave 1 and Slave 2, with cascaded slaves Slave 1-1 and Slave 1-2 marked async]
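
The "only one slave in sync" rule can be illustrated with a short sketch. It assumes the priority-list behaviour of PostgreSQL's `synchronous_standby_names` setting (not named in these slides), under which the first listed standby that is currently connected acts as the sync slave. The function name and standby names below are hypothetical.

```python
def pick_sync_standby(standby_names, connected):
    """Return the first configured standby that is connected, or None.

    standby_names: names in priority order (assumed to mirror the
                   synchronous_standby_names list on the master).
    connected:     application_name values of standbys currently streaming.
    """
    for name in standby_names:
        if name in connected:
            return name
    return None


if __name__ == "__main__":
    priority = ["slave1", "slave2"]           # hypothetical standby names
    print(pick_sync_standby(priority, {"slave1", "slave2"}))
    print(pick_sync_standby(priority, {"slave2"}))
```

When the current sync slave disconnects, the next connected name in the list takes over, and the remaining standbys stay asynchronous.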


HA operations: failover and node reconnection


Node failover (1)

§ Same procedure for all the replication modes
§ Failover procedure
  • Connect to slave VM
  • Promote the slave
  • recovery.conf renamed to recovery.done in $PGDATA
  • Former slave able to run write queries

[Figure: vPG master and Slave, with the Slave being promoted]

ssh postgres@$SLAVE_IP

pg_ctl promote

Node failover (2)

§ Relocate the archive disk to a new slave node
  • Recreate new virtual disk on new node
  • Update restore_command in recovery.conf of the remaining slaves
  • Update archive_command in postgresql.conf of promoted slave
  • Copy WAL files from remaining archive disk to prevent SPOF after loss of master

Node reconnection

§ In case a previously failed node is up again
§ Reconnection procedure
  • Connect to old master VM
  • Create recovery.conf depending on recovery mode wanted
  • Start node
  • Important! Retrieving WAL is necessary for timeline switch

[Figure: old master reconnecting to the promoted slave]

ssh postgres@$MASTER_IP

recovery_target_timeline = 'latest'
standby_mode = on
restore_command = 'scp $SLAVE_IP:/archive/%f %p'
primary_conninfo = 'host=$SLAVE_IP application_name=$OLD_NAME'

service postgresql start
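
The recovery.conf for a reconnecting node can be generated rather than typed by hand, which avoids quoting mistakes. A minimal Python sketch that renders the four settings shown above; the function name, the example host and node name, and the default `/archive` directory (taken from the slide's restore_command) are illustrative assumptions.

```python
def render_recovery_conf(promoted_host, old_name, archive_dir="/archive"):
    """Render recovery.conf text for an old master rejoining as a slave."""
    lines = [
        "recovery_target_timeline = 'latest'",
        "standby_mode = on",
        # %f and %p are placeholders expanded by PostgreSQL, not by Python.
        f"restore_command = 'scp {promoted_host}:{archive_dir}/%f %p'",
        f"primary_conninfo = 'host={promoted_host} application_name={old_name}'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical address of the promoted slave and name of the old master.
    print(render_recovery_conf("192.0.2.10", "old_master"), end="")
```

The rendered text would be written to recovery.conf in $PGDATA on the old master before the node is started.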


Additional tips

§ DB and server UI
  • Usable as normal, cannot create objects on slave of course
§ wal_level
  • 'archive' for archive only recovery
  • 'hot_standby' for read-queries on slaves
§ pg_stat_replication to get status of connected slaves

postgres=# SELECT pg_current_xlog_location(), application_name, sync_state, flush_location FROM pg_stat_replication;

 pg_current_xlog_location | application_name | sync_state | flush_location
--------------------------+------------------+------------+----------------
 0/5000000                | slave2           | async      | 0/5000000
 0/5000000                | slave1           | sync       | 0/5000000
(2 rows)
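
Comparing the master's current location with a slave's flush_location gives a rough measure of lag. A minimal Python sketch, assuming a location of the form `X/Y` can be read as two hexadecimal halves of one 64-bit byte position; that assumption may not hold exactly across the high-half boundary on older releases, where the server-side function pg_xlog_location_diff() is the safer choice. The helper names are made up for illustration.

```python
def xlog_location_to_bytes(location: str) -> int:
    """Convert an 'X/Y' WAL location into a single integer byte position."""
    high, low = location.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(master_location: str, flush_location: str) -> int:
    """Bytes of WAL the slave has not yet flushed, under the layout assumption above."""
    return xlog_location_to_bytes(master_location) - xlog_location_to_bytes(flush_location)


if __name__ == "__main__":
    # Values taken from the pg_stat_replication output above.
    print(replication_lag_bytes("0/5000000", "0/5000000"))
```

A lag of zero, as in the sample output, means the slave has flushed everything the master has written so far.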


Virtualized PostgreSQL Datacenter Deployment Blueprints

Single Data Center Deployment

Highly Available PostgreSQL database server with HA from virtualization environment

§ Easy to set up with one-click HA
§ Handles CPU/Memory hardware issues
§ Requires at least RAID 1 storage for storage protection
§ RTO of a couple of minutes

[Figure: Applications reach the database at Site 1 through a DNS name]

vSphere HA with PostgreSQL 9.2 Streaming Replication

§ Protection against HW/SW failures and DB corruption
§ Storage flexibility (FC, iSCSI, NFS)
§ Compatible with vMotion, DRS, HA
§ RTO of a few seconds
§ vSphere HA + Streaming Replication
  • Master generally restarted with vSphere HA
  • When master is unable to recover, the replica can be promoted to master
  • Reduces synchronization time after VM recovery

Single Data Center Deployment

Highly Available PostgreSQL database server with synchronous replication

§ Synchronous replication within data center
§ Low downtime (lower than HA)
§ Automated failover for hardware issues including storage

[Figure: Applications reach Site 1 through a virtual IP, DNS, pgPool, or pgBouncer]

Multi-site Data Center Deployment

Replication across Data Centers with PostgreSQL for Read Scaling/DR


§ Asynchronous replication across data centers
§ Read Scaling (Application Driver)

[Figure: Applications reach Site 1, Site 2, and Site 3 through a virtual IP, pgPool, or pgBouncer]

Multi-site Data Center Deployment

Replication across Data Centers with Write Scaling (requires sharding)

§ Each site has its own shard, a synchronous replica of that shard, and asynchronous replicas of the other sites' shards
§ Asynchronous replication across data centers
§ HA/DR built-in
§ Sharding is application driven
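
Application-driven sharding means the application itself decides which site owns a given row. A minimal Python sketch of one common approach, hashing a key to pick a site; the site names, the choice of SHA-256, and the function name are illustrative assumptions, and the slides do not prescribe any particular sharding scheme.

```python
import hashlib

SITES = ["site1", "site2", "site3"]  # hypothetical shard owners


def shard_for_key(key: str, sites=SITES) -> str:
    """Map a key to one site using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # The first 8 bytes of the digest are enough to pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(sites)
    return sites[bucket]


if __name__ == "__main__":
    for customer in ("customer-1", "customer-2", "customer-3"):
        print(customer, "->", shard_for_key(customer))
```

Plain modulo hashing moves most keys when the number of sites changes, so a deployment that expects to add sites may prefer a lookup table or consistent hashing.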


[Figure: Applications connected to Site 1, Site 2, and Site 3]

Hybrid Cloud

Hybrid Cloud Scaling for Fluctuating Read peaks

§ Reads often rise to 99% of the workload
  • Example: a sensational story that everyone wants to read
§ Asynchronous replica slaves within data center and on hybrid clouds
§ More replicas are spun up when load increases and discarded when it decreases

[Figure: Applications reading from Cascaded Read Replicas of Site 1]

Summary

Virtualization Platform
  Planned downtime avoidance
    • vMotion & Storage vMotion
  Unplanned downtime recovery
    • vSphere HA + AppAware HA
    • vSphere FT
  Disaster recovery
    • Site Recovery Manager

PostgreSQL 9.2
  • Database Replication
    • Synchronous
    • Asynchronous
    • Log Shipping

Future / Others
  • BiDirectional Replication
  • Slony/Londiste, etc.

Your Feedback is Important!

If interested, §  Drop your at end of session §  Email: [email protected]


Thanks. Questions?

Follow us on twitter: @vPostgres

vFabric Blog: http://blogs.vmware.com/vfabric/postgres
