50
Postgres-XC: Write-Scalable PostgreSQL Cluster Mason Sharp August 7th, 2012 CC License: Attribution-NonCommercial-ShareAlike

Postgres-XC Write Scalable PostgreSQL Cluster

Embed Size (px)

DESCRIPTION

An overview of Postgres-XC is provided. Postgres-XC is a free, open source, PostgreSQL based write scalable cluster. It runs on multiple servers, is fully ACID and consistent. Postgres-XC is a traditional relational alternative to cases where NoSQL solutions are being considered. This presentation was given in San Francisco August 7th by Mason Sharp, one of the original architects of Postgres-XC and co-founder of StormDB (http://www.stormdb.com).

Citation preview

Page 1: Postgres-XC Write Scalable PostgreSQL Cluster

Postgres-XC: Write-Scalable PostgreSQL Cluster

Mason Sharp

August 7th, 2012

CC License: Attribution-NonCommercial-ShareAlike

Page 2: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 2

Content Attribution

• Koichi Suzuki• Michael Paquier• Ashutosh Bapat• Pavan Deolasee• Mason Sharp• ...?

Page 3: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 3

Who am I

● Mason Sharp

● Co-organizer of NYC PUG

● Co-founder of StormDB

● Previously worked at EnterpriseDB

● Original architect of Stado (GridSQL)

● One of the original architects of Postgres-XC

Page 4: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 4

PostgreSQL User Groups

San Francisco New York616 Members 502 Members

Tokyo2000? Members

New:PhiladelphiaLos Angeles

Page 5: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 5

NYC PUG Meetup Membership

Page 6: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 6

NYC PUG Speakers

● Recent speakers include● Bruce Momjian● Greg Smith● Greg Stark● Joe Conway● Joachim Wieland

Page 7: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 7

NYC PUG Speakers

We want you!

Page 8: Postgres-XC Write Scalable PostgreSQL Cluster

8

Postges-XC Talk

● Background● Postgres-XC Introduction & Usage● Postgres-XC Components● Postgres-XC Details

Page 9: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 9

Background

Page 10: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 10

Data Tier Scaling

● Up versus Out

● More memory, more cores● Read-only Replicated Slaves

● Caching

● Memcached● Sharding

● NoSQL

● NewSQL

Page 11: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 11

XC Origins

Koichi Suzuki, NTT Data Mason Sharp

Page 12: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 12

PostgreSQL-Related Clustering Projects

● pgpool-II

● Read replicated slaves● PL/Proxy

● Used by Skype, meetme (myYearbook)● All access is over a stored function

● Postgres-R, PostgresForest

● Stado (GridSQL)

● Parallel Query● Not write-scalable

Can we make it write scalable?

Page 13: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 13

Postgres-XC Introduction

Page 14: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 14

Overview

● PostgreSQL-based database cluster● Same API to Apps as PostgreSQL

– Same drivers● Currently based upon PG 9.1. Soon: 9.2.

● Symmetric Multi-headed Cluster

● No master, no slave– Not just PostgreSQL replication.

– Application can read/write to any coordinator server● Consistent database view to all the transactions

– Complete ACID property to all the transactions in the cluster

● Scales both for Write and Read

Page 15: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 15

Postgres-XC Cluster

Coordinator

Data Node

PG-XC Server

Coordinator

Data Node

Coordinator

Data Node

Coordinator

Data Node

・・・・・

Communication among PG-XC servers

Add PG-XC servers as needed

Global Transaction Manager

Application can connect to any server to have the same database view and service.

GTM

PG-XC Server PG-XC Server PG-XC Server

Page 16: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 16

Read/Write Scalability

DBT-1 throughput scalability

Page 17: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 17

I Consistency

Page 18: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 18

Is XC right for you?

● I need write scalability● I like ACID● I like SQL● I don't want to rewrite my existing SQL

applications● I want to leverage the PostgreSQL community

for all of their contrib modules

Page 19: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 19

Why XC may not be right for you

● I need MPP parallel query capability● Parallel Query in XC Limited● Try Stado: www.stado.us

● I need a solution with built-in HA● I need massive scale and have loose

consistency requirements● I would rather use a NoSQL solution so I can

put it on my resume

Page 20: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 20

Postgres-XC Components

Page 21: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 21

Page 22: Postgres-XC Write Scalable PostgreSQL Cluster

Coordinator Overview● Based on PostgreSQL 9.1 (9.2 soon)● Accepts connections from clients● Parses and plans requests● Interacts with Global Transaction Manager● Uses pooler for Data Node connections● Sends down XIDs and snapshots to Data

Nodes● Collects results and returns to client● Uses two phase commit if necessary

22

Page 23: Postgres-XC Write Scalable PostgreSQL Cluster

Data Node Overview● Based on PostgreSQL 9.1 (9.2 soon)● Where user created data is actually

stored● Coordinators (not clients) connects to

Data Nodes● Accepts XID and snapshots from

Coordinator● The rest is fairly similar to vanilla

PostgreSQL

23

Page 24: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 24

Global Transaction Manager

Cluster nodesGTM

XIDSnapshotTimestampSequence values

Page 25: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 25

Summary● Coordinator

● Visible to apps

● SQL analysis, planning, execution

● Connection pooling

● Datanode (or simply “NODE”)

● Actual database store

● Local SQL execution

● GTM (Global Transaction Manager)

● Provides consistent database view to transactions

– GXID (Global Transaction ID)

– Snapshot (List of active transactions)

– Other global values such as SEQUENCE

● GTM Proxy, integrates server-local transaction requirement for performance

Postgres-XC core, based uponvanilla PostgreSQL

Share same binary

May want to colocate

Different binaries

Page 26: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 26

Data Distribution

Distribution Strategies

Page 27: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 27

Distributing the data

● Replicated table● Each row in the table is replicated to the datanodes● Statement based replication

● Distributed table● Each row of the table is stored on one datanode,

decided by one of following strategies– Hash– Round Robin– Modulo– Range and user defined function (future)

Page 28: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 28

Table Distribution and Replication

● Each table can be distributed or replicated● Strategy based on usage

– Transaction tables → Distributed– Static lookup tables → Replicate– Distribute parent-children together

● Join pushdown when possible● Where clause pushdown● Simple parallel aggregates

Page 29: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 29

Defining Tables

● Table Distribution/Replication● CREATE TABLE tab (…) DISTRIBUTE BY

HASH(col) | MODULO(col) | ROUND ROBIN | REPLICATION

Page 30: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 30

Replicated Tables

Writes

write write write

val val2

1 2

2 10

3 4

val val2

1 2

2 10

3 4

val val2

1 2

2 10

3 4

Reads

read

val val2

1 2

2 10

3 4

val val2

1 2

2 10

3 4

val val2

1 2

2 10

3 4

Page 31: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 31

Distributed Tables

Combiner

Read

read read read

val val2

1 2

2 10

3 4

val val2

11 21

21 101

31 41

val val2

10 20

20 100

30 40

Write

write

val val2

1 2

2 10

3 4

val val2

11 21

21 101

31 41

val val2

10 20

20 100

30 40

Page 32: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 32

Join PushdownHash/Module distributed

Round Robin Replicated

Hash/Modulo distributed

Inner join with equality condition on the distribution column with same data type and same distribution strategy

NO Inner join if replicated table's distribution list is superset of distributed table's distribution list

Round Robin No No Inner join if replicated table's distribution list is superset of distributed table's distribution list

Replicated Inner join if replicated table's distribution list is superset of distributed table's distribution list

Inner join if replicated table's distribution list is superset of distributed table's distribution list

All kinds of joins

Page 33: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 33

Constraints● XC does not support Global constraints – i.e.

constraints across datanodes● Constraints within a datanode are supportedDistribution strategy Unique, primary key

constraintsForeign key constraints

Replicated Supported Supported if the referenced table is also replicated on the same nodes

Hash/Modulo distributed Supported if primary OR unique key is distribution key

Supported if the referenced table is replicated on same nodes OR it's distributed by primary key in the same manner and same nodes

Round Robin Not supported Supported if the referenced table is replicated on same nodes

Page 34: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 34

Demo

Page 35: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 35

Transaction Management

Why MVCC is Important for ConsistencyGlobal Transaction Manger

Page 36: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 36

Multi-version Concurrency Control (MVCC) (quick overview)

● Readers do not block writers

● Writers do not block readers

● Transaction Ids (XIDs)

● Every transaction gets an ID● Snapshots contain a list of running XIDs

Page 37: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 37

Multi-version Concurrency Control (MVCC) (quickly discussed)

Example:

T1 Begin...

T2 Begin; INSERT...; Commit

T3 Begin...

T4 Begin; SELECT

● T4's snapshot contains T1 and T3

● T2 already committed● It can see T2's commits, but not T1's nor T3's

Page 38: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 38

Multi-version Concurrency Control (MVCC) on 2 Independent Nodes

Example:T1 Begin...

T2 Begin; INSERT..; Commit;

T3 Begin...

T4 Begin; SELECT

● Node 1: T2 Commit, T4 SELECT

● Node 2: T4 SELECT, T2 Commit

● T4's SELECT statement returns inconsistent data

● Includes data from Node1, but not Node2. ● C in ACID Fails

Page 39: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 39

Global Transaction Manager (GTM)

● Provides Global Transaction Consistency

Cluster nodesGTM

XIDSnapshotTimestampSequence values

Page 40: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 40

Transaction Management

● 2PC is used to guarantee transactional consistency across nodes

● When there are more than one nodes involved OR● When there are explicit 2PC transactions

● Only those nodes where write activity has happened, participate in 2PC

● In PostgreSQL 2PC can not be applied if temporary tables are involved. Same restriction applies in Postgres-XC

● When single coordinator command needs multiple datanode commands, we encase those in transaction block

Page 41: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 41

Postgres-XC Considerations

Page 42: Postgres-XC Write Scalable PostgreSQL Cluster

• Depending on implementation– Current Implementation

– Large snapshot size and number– Too many interaction between GTM and Coordinators

Can GTM be a Performance Bottleneck?

July 12th, 2012 42

Applicable up to five PG-XC servers (DBT-1)

GTM Threads

GTM

Sna

psh

ot

Da

ta

GTM Main Thread

Create Terminate

Coordinators

Coordinator Backend

Clie

nt

Libr

ary

Co

ord

ina

tor

Ca

llInte

rne

t D

om

ain

So

cke

t

Lock

Page 43: Postgres-XC Write Scalable PostgreSQL Cluster

Can GTM be a Performance Bottleneck?

July 12th, 2012 43

Proxy Implementation

•Request/Response grouping•Single representative snapshot applied to multiple transactions

GTM Worker Threads

GTM

Sna

psh

ot

Da

ta

GTM Main Thread

Create Terminate

Coordinators

Coordinator Backend

Clie

nt

Libr

ary

Co

ord

ina

tor

Ca

llInte

rne

t D

om

ain

S

ocke

t

Lock

GT

M S

nap

shot

Han

dler

GT

M S

erv

er

Sca

nne

r

Ca

ll

Ser

ver

Pro

toco

l Han

dler

Proxy Main Thread

Ba

cken

d C

omm

and

Han

dler

Ba

cken

d R

esp

ons

e H

andl

er

Un

ix

Do

ma

in

Soc

ket

Connection

CreateTerminate

ConnectionAssignment

GTM Proxy Thread

Page 44: Postgres-XC Write Scalable PostgreSQL Cluster

Can GTM be a SPOF?

July 12th, 2012 44

• Implement GTM Standby

GTM Master GTM Standby

Checkpoint next starting point (GXID and Sequence)

Standby can failover the master without referring to GTM master information.

Page 45: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 45

Parallel Query

● OK for simple queries● Also when all joins can be pushed down

– Star schema with replicated dimensions

● Even aggregates● SELECT SUM(col1) FROM tab1

● If cross-node join needed performs poorly● Data on one node needs to join with another● Ships all data to coordinator for joining

Page 46: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 46

High Availability

● GTM-standby provides basic HA● No native HA for nodes

● Use HA middleware such as Pacemaker

● Each data node should be configured with synchronous replication

Page 47: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 47

Status

Settings and options

Page 48: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 48

Present Status

● Project/Developer site● http://postgres-xc.sourceforge.net/

● http://sourceforge.net/projects/postgres-xc/

● Version 1.0 available● Base PostgreSQL version: 9.1● Soon, PostgreSQL 9.2!

– Group commit: even more write scalability– “Index-only Scans”

● Get Involved● Even as just a tester

Page 49: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 49

Easy way of trying it out?

● www.stormdb.com● Not Postgres-XC, but similar● Nothing to install, cloud hosted● Free beta

Page 50: Postgres-XC Write Scalable PostgreSQL Cluster

Aug 7, 2012 Postgres-XC 50

Thank You

[email protected]: mason_db