Deploying Maximum HA Architecture with PostgreSQL
Denish Patel, Database Architect
Who am I?
• Denish Patel
• Database Architect with OmniTI for more than 5 years
• Expertise in PostgreSQL, Oracle, MySQL, MS SQL Server
• Contact: [email protected]
• Blog: http://denishjpatel.blogspot.com/
• Providing solutions for business problems to deliver
  • Scalability
  • Reliability
  • High Availability
  • Consistency
  • Security
We are hiring!! Apply @ l42.org/lg
Agenda
• Why do you need HA architecture?
• Why PostgreSQL?
• Traditional HA Architecture
• Goals for Maximum HA
• Maximum HA Solution
Assumptions
• Consistency and Availability matter (CAP theorem)
• It is good to increase MTTF (mean time to failure), but you have “real” control over MTTR (mean time to recovery).
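The MTTF/MTTR point can be made concrete with the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR). The sketch below is illustrative only; the function name and the sample numbers are assumptions, not figures from the talk.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system that fails once every 1000 hours on average.
slow_recovery = availability(1000, 4)    # 4-hour recovery
fast_recovery = availability(1000, 0.1)  # 6-minute recovery

print(f"4 h MTTR:   {slow_recovery:.5f}")
print(f"0.1 h MTTR: {fast_recovery:.5f}")
```

With the failure rate held constant, cutting recovery time alone raises availability, which is why the rest of the deck concentrates on MTTR.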
Why do you need HA architecture?
Loss of productivity
Loss of Revenue
Dissatisfied Customers
Application Downtime
Unavailability of Data
Why do you need HA architecture?
• Unplanned Outages
  • System Failures
  • Data Failures
• Planned Outages
  • System Changes
  • Data Changes
• Prevent, Tolerate, Recover Fast
Why PostgreSQL?
• Best protection at lowest cost
• No additional software costs for providing maximum availability compared to closed source databases
• Provides free feature sets to prevent outages, tolerate them, and recover fast.
Traditional HA Architecture
[Diagram, PostgreSQL 8: the Master Database copies WAL files (Copy WAL files) to a Standby Database.]
Traditional HA Architecture
[Diagram, PostgreSQL 9: the Master Database feeds a Hot Standby Database through Streaming Replication, in addition to copying WAL files (Copy WAL files).]
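As a rough illustration of the PostgreSQL 9 setup, the sketch below renders the configuration settings involved in streaming replication with a hot standby. The setting names (wal_level, max_wal_senders, archive_mode, archive_command, hot_standby, standby_mode, primary_conninfo, restore_command) are the ones used by PostgreSQL 9.0-era releases; later versions renamed the hot_standby WAL level and dropped recovery.conf. The host, user, and archive path are hypothetical, and the helper functions are not part of any tool named in the talk.

```python
def render_settings(settings: dict) -> str:
    """Render key/value pairs in postgresql.conf style."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


def master_conf(archive_dir: str, max_wal_senders: int = 3) -> str:
    """Settings for postgresql.conf on the master."""
    return render_settings({
        "wal_level": "hot_standby",
        "max_wal_senders": max_wal_senders,
        "archive_mode": "on",
        # Plain cp is a placeholder for a real WAL copy script.
        "archive_command": f"'cp %p {archive_dir}/%f'",
    })


def standby_conf() -> str:
    """Settings for postgresql.conf on the standby (allows read-only queries)."""
    return render_settings({"hot_standby": "on"})


def standby_recovery_conf(master_host: str, archive_dir: str,
                          user: str = "replication") -> str:
    """Settings for recovery.conf on the standby."""
    return render_settings({
        "standby_mode": "'on'",
        "primary_conninfo": f"'host={master_host} port=5432 user={user}'",
        "restore_command": f"'cp {archive_dir}/%f %p'",
    })


# Hypothetical host and archive location.
print(master_conf("/mnt/nas/wal_archive"))
print(standby_conf())
print(standby_recovery_conf("10.0.0.1", "/mnt/nas/wal_archive"))
```

Keeping restore_command alongside primary_conninfo mirrors the diagram: the standby streams from the master and can still fall back to the copied WAL files.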
Goals for Maximum HA Architecture
• 99.99% uptime of application
• Reduce MTTR
  • Planned outages
  • Unplanned outages
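To see what the 99.99% goal allows in practice, the uptime target can be converted into a downtime budget. This is a small arithmetic sketch; the function name and the 365-day year are assumptions made for the example.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float = 365.0) -> float:
    """Minutes of downtime allowed in a period for a given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


per_year = downtime_budget_minutes(99.99)       # about 52.56 minutes
per_month = downtime_budget_minutes(99.99, 30)  # about 4.32 minutes

print(f"Yearly budget:  {per_year:.2f} minutes")
print(f"30-day budget:  {per_month:.2f} minutes")
```

The budget covers planned and unplanned outages together, so a single slow recovery or a long maintenance window can consume most of a year's allowance.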
Plan to reduce MTTR
• How do you manage failover?
  • Is it transparent to your application?
• Hot backups / dumps
  • Are you running them on the production server?
• Schema backups
  • How often? Are they under revision control?
• WAL file copy scripts
  • Do all of your prod servers use the same copy of the script?
• Where are your reporting queries pointing?
  • Production DB?
System Failures
• Unplanned Outages
  • System Failures
    • Server Node Fails
    • Storage Fails
    • Site Fails
Handle System Failures
[Diagram: AppServer connects over the network (inet) to a Floating IP / VIP in front of the Master and Failover databases.]
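One way to picture failover behind a floating IP is a monitor that moves the VIP only after several consecutive failed health checks, so a single dropped probe does not trigger a switch. The sketch below is a simplified assumption: the class, the threshold of three, and the "master"/"failover" labels are hypothetical, and the real work of promoting the standby and reassigning the address is left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks health checks and decides which node should hold the VIP."""
    max_failures: int = 3          # hypothetical threshold
    consecutive_failures: int = 0
    vip_owner: str = "master"

    def record_check(self, master_healthy: bool) -> str:
        """Record one health check result and return the current VIP owner."""
        if self.vip_owner != "master":
            # Already failed over; failback is a manual, planned step.
            return self.vip_owner
        if master_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                self.vip_owner = "failover"
        return self.vip_owner


monitor = FailoverMonitor(max_failures=3)
for healthy in [False, False, True, False, False, False]:
    owner = monitor.record_check(healthy)
print(owner)
```

Because applications connect to the VIP and not to a host name, moving the address is what makes the failover transparent to them.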
Site Failures
Handle Site Failures
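[Diagram: the system-failure setup (AppServer, inet, Floating IP / VIP, Master, Failover) with the Failover fed via SRHS, plus an Offsite Bkp server that receives shipped WAL files (Ship WAL Files) and applies them (WAL apply).]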
Data Failures
• Unplanned Outages
  • Data Failures
    • Human Error
    • Data Corruption
Handle Data Failures
• PITR slave lag using OMNIpitr
  • 1 hour lag on WAL apply
• Periodic pg_dump of tables from slave
• Run pg_extractor
  • https://github.com/omniti-labs/pg_extractor
  • Track schema changes in subversion/git
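The idea behind a lagged PITR slave is that WAL is applied only once it is older than the chosen delay, which leaves a window to stop the replay before a human error reaches the slave. The sketch below illustrates that selection rule only; it is not how OMNIpitr is implemented, and the function name, segment names, and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def segments_ready_to_apply(archived, now, lag=timedelta(hours=1)):
    """Return WAL segment names that are at least `lag` old.

    `archived` is a list of (segment_name, archived_at) tuples.
    """
    cutoff = now - lag
    return [name for name, archived_at in archived if archived_at <= cutoff]


# Hypothetical archive contents.
now = datetime(2012, 4, 2, 12, 0)
archived = [
    ("000000010000000000000001", datetime(2012, 4, 2, 10, 30)),
    ("000000010000000000000002", datetime(2012, 4, 2, 11, 0)),
    ("000000010000000000000003", datetime(2012, 4, 2, 11, 30)),
]

print(segments_ready_to_apply(archived, now))
```

If a bad DELETE ran at 11:15 in this example, the segment containing it is still held back, so the slave can be stopped and used to recover the data.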
Data Corruption
Handle Data Corruption
• File system level backups
  • Backups on slave database using OMNIpitr
  • Regular recovery testing
• Snapshot backups for faster recovery
  • Solaris ZFS is recommended!
• Monthly pg_dump backups
  • Backups on slave
System Changes
• Planned Outages
  • System Changes
    • OS Upgrade
    • Database Upgrade
    • Network Changes
Handle OS Upgrades
[Diagram, step 1: the Master holds the Floating IP, feeds the Failover and Read Slave 1 via SRHS, and copies WAL files to NAS (WAL Copy).]
Handle OS Upgrades
[Diagram, step 2: labels Master, Read Slave 1, New Master, SRHS, NAS, Floating IP, WAL Copy, and Upgrade OS.]
Handle OS Upgrades
[Diagram, step 3: labels Floating IP, Master, WAL Copy, New Failover, Read Slave 1, New Master, SRHS, and NAS.]
System Changes
Handle Database Upgrade
[Flowchart: decision nodes "PG 8.3+?", "Outage acceptable?" (twice), and "pg_upgrade --check pass?"; outcomes pg_dump / pg_restore, third-party replication (i.e. Slony), pg_upgrade, and "Drop incompatible tables before upgrade and restore after". The Yes/No edge labels could not be matched to their branches.]
* Only showing recommended options
Handle Data Changes
• Planned Outages
  • Data Changes
    • Alter Schemas
    • Data Growth
Handle Alter schemas
• Transactional DDL
• CREATE OR REPLACE views
• NOT VALID
  • Checks
  • FKs
• Add column without scanning entire table
  • NULLABLE
  • No default
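The NOT VALID and add-column points can be sketched as the SQL statements they imply. The helpers below only build statement strings; the function names and the orders/customers tables are hypothetical, and identifiers are interpolated without quoting, so this assumes trusted input.

```python
def add_fk_not_valid(table, constraint, column, ref_table, ref_column):
    """Two-step foreign key: add without checking existing rows, validate later."""
    return [
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"FOREIGN KEY ({column}) REFERENCES {ref_table} ({ref_column}) NOT VALID;",
        f"ALTER TABLE {table} VALIDATE CONSTRAINT {constraint};",
    ]


def add_nullable_column(table, column, col_type):
    """Add a column that is nullable and has no default."""
    return f"ALTER TABLE {table} ADD COLUMN {column} {col_type};"


# Hypothetical schema objects.
for statement in add_fk_not_valid("orders", "orders_customer_fk",
                                  "customer_id", "customers", "id"):
    print(statement)
print(add_nullable_column("orders", "notes", "text"))
```

The first ALTER enforces the constraint for new rows immediately, while the validation of existing rows is deferred to a separate statement that can run at a quieter time.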
Handle Data Changes
Handle Data Growth
PostgreSQL bloat removal
• Offline
  • VACUUM FULL
  • CLUSTER
• Online
  • Rebuild index CONCURRENTLY
  • Rebuild table online using pg_reorg
http://denishjpatel.blogspot.com/2011/03/extreme-training-session-at-pgeast-p90x.html
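An online index rebuild is commonly done by building a replacement with CREATE INDEX CONCURRENTLY and then swapping it in. The sketch below generates that statement sequence as one possible approach; the function name, the "_new" suffix, and the example index are assumptions, and the pattern as written does not apply to indexes that back a primary key or unique constraint.

```python
def rebuild_index_concurrently(index, table, columns):
    """Statements to replace a bloated index without blocking writes during the build."""
    new_index = f"{index}_new"  # hypothetical temporary name
    column_list = ", ".join(columns)
    return [
        # Must run outside a transaction block.
        f"CREATE INDEX CONCURRENTLY {new_index} ON {table} ({column_list});",
        # The drop and rename are quick but still take a brief lock.
        f"DROP INDEX {index};",
        f"ALTER INDEX {new_index} RENAME TO {index};",
    ]


# Hypothetical index on a hypothetical table.
for statement in rebuild_index_concurrently("orders_created_idx", "orders", ["created_at"]):
    print(statement)
```

The concurrent build is slower than a plain CREATE INDEX, which is the price for not blocking writes on the table while it runs.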
Now we have…
• PostgreSQL 9
• PITR
• pg_extractor
• Floating IP
• pg_reorg
Maximum HA Architecture
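[Diagram: App connects through LB and the Floating IP to the Master; Failover, Read Slave 1, and Read Slave 2 are fed via SRHS; WAL files go to NAS and to backup (Bkp) servers, one of which applies them (WAL apply).]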
References
• PostgreSQL Documentation
  • http://www.postgresql.org/docs/
• OmniTI Labs
  • https://labs.omniti.com/
  • OMNIpitr
  • pg_extractor
Thanks
• PG Day NYC Conference Committee
• OmniTI
• You!!
Questions?