Scaling & Sharding PostgreSQL: Principles and Practice
Jason Petersen
Software Developer, Citus Data
Copyright © 2015 Citus Data, Inc. 1
This talk
What we talk about when we talk about sharding:
Horizontal partitioning
Horizontal partitioning […] involves putting different rows into different tables.
— Wikipedia, “Shard (database architecture)”
Sharding goes beyond this: […] it does this across potentially multiple instances of the schema.
— Also Wikipedia
Putting our foot down
A form of horizontal partitioning which distributes database rows across totally separate physical database servers.
— Citus Data
What is Citus Data?
(Pronounced like “Midas”)
(We make CitusDB)
What is CitusDB?
— Scalable analytics DB
— Extends PostgreSQL
— Brings distributed query logic
— Supports all types, extensions
— Does it all using sharding
You may be thinking…
[Diagram: click_events_2012 on Node #1 (PostgreSQL), click_events_2013 on Node #2, click_events_2014 on Node #3]
How does that scale?
Not very well…
[Diagram: 4 TB yearly tables on Nodes #1–#3; an empty Node #4 is added]
Not very well…
[Diagram: rebalancing onto Node #4 means moving 1 TB from each 4 TB node]
What about load characteristics?
Not great, either…
[Diagram: click_events_2012–2014 spread across Nodes #1–#6, one yearly table per node]
So what to do?
… when initially implementing sharding you’ll want to create an arbitrary number of logical shards.
— Craig Kerstiens, “Sharding Your Database”
“Logical”?
[The] system consists of several thousand ‘logical’ shards that are mapped in code to far fewer physical shards…
— “Sharding & IDs at Instagram”
… we can start with just a few database servers, and eventually move to many more…
— “Sharding & IDs at Instagram”
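The logical-to-physical mapping quoted above can be sketched in a few lines. This is an illustrative sketch, not Instagram’s or pg_shard’s code; the shard count, node names, and use of crc32 as the hash are assumptions for the example.

```python
import zlib

LOGICAL_SHARDS = 4096            # fixed up front; never changes
nodes = ["node-a", "node-b"]     # few physical servers; the list can grow later

def shard_for_key(key):
    """A row's key deterministically picks one of many logical shards."""
    return zlib.crc32(str(key).encode()) % LOGICAL_SHARDS

def node_for_shard(shard_id):
    """Logical shards are mapped in code to far fewer physical nodes."""
    return nodes[shard_id % len(nodes)]

# routing a row consults only the mapping, never the data itself
print(node_for_shard(shard_for_key("user:42")))
```

Real systems keep this mapping in a lookup table rather than a modulo, so adding a physical node moves only a few shard placements; rows never change their logical shard.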
A better approach
[Diagram: logical shards 1–9 spread across Nodes #1–#3, each shard placed on two nodes]
Easier growth…
[Diagram: same layout plus an empty Node #4; rebalancing now moves a few 512 MB shards instead of terabytes]
Graceful failure…
[Diagram: logical shards 1–9 spread across Nodes #1–#6, two placements each]
Graceful failure…
[Diagram: same cluster with one node down; every shard it held remains available on another node]
Logical!
Logical shard benefits
— Enables rebalancing
— Better failure modes
— More granular migrations
— Performance benefits
But wait!
Sharding concerns
— Operations burden
— Network resiliency
— ACID tradeoffs?
— No return
(Sharding should be your last resort)
Pyramids!
[Maslow’s hierarchy, bottom to top: Physiological, Safety, Love/Belonging, Esteem, Self-Actualization]
[Scaling hierarchy, bottom to top: Application Code, Database Schema, Hardware and Tuning, Split Load, Sharding!]
Always use SCIENCE
Getting to Scale: Principles
1. Generate realistic load
2. Measure, measure, measure…
3. Change one thing
4. Determine the impact
5. GOTO the first step
Getting to Scale: Determining Workload
— pg_stat_statements
— pgBadger
— PoWA
— New Relic
Getting to Scale: Generating Load
— pgbench
— apachebench
— jmeter
— Fill up your queue!
Getting to Scale: Measuring
— Long runtimes
— Eliminate hidden unknowns
— pgbench-tools
— time
Change and Compare
[Scaling hierarchy revisited: Application Code, Database Schema, Hardware and Tuning, Split Load, Sharding!]
Getting to sharding…
Optimize application logic
Add caching. Use connection pools. Bundle writes and issue them in batches. Use JOINs judiciously. Dig beneath your ORM.
Getting to sharding…
✓ Optimize application logic
Tweak schemas
Denormalize where necessary. Add indexes to all commonly used columns. Locally partition tables if it makes sense. Move hot columns to separate tables.
Getting to sharding…
✓ Optimize application logic
✓ Tweak schemas
Upgrade and tune
Benchmark your system. Determine resource bottlenecks. Upgrade. Tune postgresql.conf to within an inch of its life. Do the same1 for your OS.
1 Check out Brendan Gregg’s USE Method
Getting to sharding…
✓ Optimize application logic
✓ Tweak schemas
✓ Upgrade and tune
Try replication
Use a read replica. Use read replicas for every distinct workload (to avoid background jobs evicting your app’s working set from the DB cache).
Getting to sharding…
✓ Optimize application logic
✓ Tweak schemas
✓ Upgrade and tune
✓ Try replication
Split writes
Modularize concerns within your app to isolate write-heavy tables to their own database.
You’ve already…
✓ Optimized application logic
✓ Tweaked schemas
✓ Upgraded and tuned
✓ Tried replication
✓ Split writes
When you’re on the best hardware with a tuned OS, optimized queries, and servers devoted to each workload, and you’re still worried about scaling?
You’re ready to shard.
Our dream extension
— Creates and manages shards
— Uses regular SQL commands
— Supports replicas/failover
— Integrated with CitusDB
pg_shard
pg_shard
Motivation
— Real-time ingest for CitusDB
— Customers building their own
— Could be NoSQL alternative
pg_shard
User needs
— Dynamic rebalancing/scaling
— “Automagic” failure handling
— Transactions not so important
Good News, Everyone!
Upcoming Developments
— Streamlining offerings
— CitusDB soon open-source
— Extension, not standalone
— Real-time modifications
— Contact us for early access
Sharding principles
Principles of sharding
— Need to know where to put rows
— And where to find stored ones
— Designate a dimension of data as key
— In relational databases: a column
— Logical shard covers range of values
Visualized
MongoDB uses logical shards, but calls them “chunks”. Weird, but they made a decent diagram2 of the concept:
2 From the MongoDB Manual, “Shard Keys”
Shard key refinements
— Pass into hash function (smooths out distribution)
— Use contiguous range
— Specify a list of columns
— Generalize to any expression
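The first refinement above is easy to see in miniature. Illustrative only: crc32 stands in for a real hash function such as PostgreSQL’s hashint4, and the bucket and key counts are arbitrary choices for the example.

```python
import zlib

def bucket(key, num_buckets=16):
    """Hash the shard key, then take a bucket in [0, num_buckets)."""
    return zlib.crc32(str(key).encode()) % num_buckets

counts = [0] * 16
for user_id in range(10_000):    # sequential, highly skewed key values
    counts[bucket(user_id)] += 1

# unhashed, sequential IDs would pile into contiguous ranges on one shard;
# hashed, every bucket receives a comparable share of the rows
print(min(counts), max(counts))
```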
Choosing a key
The field you choose as your hashed shard key should have a good cardinality.
— MongoDB Manual, “Shard Keys”
… the correct shard key can have a great impact on […] performance [and] capability…
— ibid., “Considerations for Selecting Shard Keys”
Choosing a key
— What is most important to your application?
— Spreading incoming writes
— Targeting reads to reduce latency
— Consider key frequently in WHERE clauses
— Use a hybrid approach when it makes sense (shard on customer, partition on time)
— Mind the “hot spots”
Costs of poor choice
— Cross-shard scans hurt performance
— Low cardinality limits ultimate scalability
— Switching keys after distribution is burdensome
So how does this thing work?
pg_shard
Installation
— Build from GitHub source
— pgxnclient install pg_shard
— sudo yum install pg_shard_94
— CloudFormation templates3
3 Available on the Citus Data blog
pg_shard
[Diagram: a Master Node (PostgreSQL + pg_shard) holds shard and shard placement metadata; logical shards 1–9 are placed across Worker Nodes #1–#3]
pg_shard
Master node
— Holds authoritative shard state
— One metadata row per:
— Sharded table
— Shard
— Placement
— Just regular tables
pg_shard
Master failure
Increasing by acceptable downtime…
1. Use streaming replication and failover
2. Use EBS volume for data directory
3. Restore from pg_dump, etc.
4. Reconstruct from workers
pg_shard
Metadata structure
postgres=# SELECT * FROM pgs_distribution_metadata.shard;
  id   | relation_id | storage |  min_value  |  max_value
-------+-------------+---------+-------------+-------------
 10004 |      177880 | t       | -2147483648 | -1879048194
 10005 |      177880 | t       | -1879048193 | -1610612739
 10006 |      177880 | t       | -1610612738 | -1342177284
 10007 |      177880 | t       | -1342177283 | -1073741829
 10008 |      177880 | t       | -1073741828 |  -805306374
 10009 |      177880 | t       |  -805306373 |  -536870919
  ...  |        ...  | ...     |        ...  |        ...
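The min_value/max_value columns carve the signed 32-bit hash space into equal, contiguous ranges. A sketch of how such rows could be derived (an illustration consistent with the metadata shown, not pg_shard source code):

```python
def hash_ranges(shard_count):
    """Split the signed 32-bit hash space into contiguous per-shard ranges."""
    hash_min, hash_max = -2**31, 2**31 - 1
    span = (hash_max - hash_min) // shard_count   # 268435455 for 16 shards
    ranges = []
    for i in range(shard_count):
        lo = hash_min + i * span
        # last shard absorbs the remainder up to the top of the hash space
        hi = hash_max if i == shard_count - 1 else lo + span - 1
        ranges.append((lo, hi))
    return ranges

# matches the first rows of the metadata shown above
print(hash_ranges(16)[0])   # → (-2147483648, -1879048194)
print(hash_ranges(16)[1])   # → (-1879048193, -1610612739)
```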
pg_shard
Worker nodes
— Logical shards are placed on nodes
— Each placement is one PostgreSQL table
— Object names extended by shard identifier (e.g. click_events_1001 for shard 1001)
— Indexes, constraints propagated at creation
pg_shard
Worker failure
— Unreachable nodes marked as inactive
— Repair with master_copy_shard_placement
1. Replay DDL commands for table, objects
2. Copy data from healthy node
3. Update master metadata
pg_shard
First steps…
pg_shard
Distributing a table
-- create regular table and some indexes
CREATE TABLE users (
  id integer NOT NULL,
  name text NOT NULL,
  birthday date NOT NULL,
  CONSTRAINT name_present CHECK (btrim(name) != '')
);

CREATE INDEX id_idx ON users (id);
CREATE INDEX bday_idx ON users (birthday);
CREATE INDEX name_idx ON users (name);
CREATE INDEX pfx_idx ON users (lower(name) text_pattern_ops);
pg_shard
Distributing a table
CREATE EXTENSION IF NOT EXISTS pg_shard;
-- designate table as distributed; specify key
SELECT master_create_distributed_table('users', 'id');

-- create sixteen shards, each with two copies
SELECT master_create_worker_shards('users', 16, 2);
pg_shard
Just use SQL!
INSERT INTO users VALUES (1, 'Jason Petersen', '2015-03-23');
INSERT INTO users VALUES (2, 'Ozgun Erdogan', '2013-02-11');
INSERT INTO users VALUES (3, 'Ageless', NULL);
INSERT INTO users VALUES (4, ' ', '2010-08-17');
DELETE FROM users WHERE id = 2;
UPDATE users SET birthday = '1900-06-01' WHERE id = 1;
SELECT name FROM users WHERE id = 1;
SELECT max(birthday) FROM users;
Under the hood
pg_shard
PostgreSQL hooks
— Full control over command lifecycle
— Specific hooks for specific needs:
— Planning
— Execution (Start, Run, Finish, End)
— Utility
pg_shard
Planning phase
— Determine whether distributed
— Fall through to PostgreSQL if not (enables regular tables on master!)
— Find involved shards based on shard key
— Deparse query to shard-specific SQL
pg_shard
Planning example
Starting with the input SQL…
INSERT INTO users VALUES (5, 'Tom Lane', '2005-07-08');
pg_shard
Planning example
… determine the partition key clauses…
(id = 5)
pg_shard
Planning example
… use them to find the proper shard…
SELECT id FROM pgs_distribution_metadata.shard
WHERE hashint4(5) BETWEEN min_value::integer AND max_value::integer
  AND relation_id = 'users'::regclass;

  id
-------
 10003
pg_shard
Planning example
… generate shard-specific SQL…
INSERT INTO users_10003 VALUES (5, 'Tom Lane', '2005-07-08');
… and send it to the shard’s placements.
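The three planning steps above can be sketched end to end. The shard metadata here is a made-up toy, and crc32 stands in for hashint4; real pg_shard consults its metadata tables and deparses the full query tree rather than building a string.

```python
import zlib

# toy shard metadata: shard id -> (min_value, max_value) of its hash range
shards = {10003: (-2**31, -1), 10004: (0, 2**31 - 1)}

def plan_insert(table, key_value):
    # 1. hash the partition column value into the signed 32-bit space
    h = zlib.crc32(str(key_value).encode()) - 2**31
    # 2. find the shard whose range covers the hash
    shard_id = next(s for s, (lo, hi) in sorted(shards.items())
                    if lo <= h <= hi)
    # 3. deparse to shard-specific SQL by extending the table name
    return f"INSERT INTO {table}_{shard_id} VALUES (...);"

print(plan_insert("users", 5))
```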
pg_shard
Execution
Now we know what the SQL is and where it should be routed. Execution logic differs depending on whether the query is a SELECT or a modification.
pg_shard
Distributed modification
— Locks enforce safe commutation
— Replicas visited in predictable order
— Per-session libpq connection pool
— If replica errors out, mark as inactive
[Diagram: single-shard INSERT, replication factor 2; the master sends INSERT INTO customer_reviews … to both placements of the target shard]
[Diagram: single-shard INSERT where one replica fails to respond]
[Diagram: the master sets shard 6 on node 3 to inactive status; the INSERT succeeds on the remaining replica]
pg_shard
Modification semantics
— Consistent (read your own writes)
— Safety comes from commutativity rules
— Can reorder SELECTs and INSERTs
— Not so for UPDATEs and DELETEs
— Constraints require predictable order
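A toy illustration of the commutativity rules above (made-up operations on a dict, not pg_shard internals): INSERTs of distinct rows yield the same table in either order, while two UPDATEs of the same row do not, which is why the latter need a predictable order across replicas.

```python
def apply_ops(table, ops):
    """Apply a sequence of (op, key, value) operations to a dict 'table'."""
    t = dict(table)
    for op, key, value in ops:
        if op == "insert":
            t[key] = value
        elif op == "update" and key in t:
            t[key] = value
    return t

inserts = [("insert", 1, "a"), ("insert", 2, "b")]
# INSERTs of different rows commute: same result either way
assert apply_ops({}, inserts) == apply_ops({}, list(reversed(inserts)))

updates = [("update", 1, "x"), ("update", 1, "y")]
# two UPDATEs of the same row do not: last writer wins
assert apply_ops({1: "a"}, updates) != apply_ops({1: "a"}, list(reversed(updates)))
```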
pg_shard
Targeted SELECT
— Fetch entire result from single shard
— Failover to another replica on error
— Do not modify state if failure occurs
— Common key-value access pattern
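The failover behavior above can be sketched as a loop over a shard’s placements. The placement names and the fetch callback are assumptions for the example; pg_shard does this over libpq connections.

```python
def targeted_select(placements, fetch):
    """Try each placement in order; fail only if every replica errors."""
    errors = []
    for node in placements:
        try:
            return fetch(node)            # entire result comes from one shard
        except ConnectionError as exc:    # unreachable: try the next replica
            errors.append((node, exc))    # no state modified on failure
    raise RuntimeError(f"all placements failed: {errors}")

def fetch(node):
    """Hypothetical fetch: worker-1 is down, worker-2 answers."""
    if node == "worker-1":
        raise ConnectionError("connection refused")
    return [("HN892", "review text")]

print(targeted_select(["worker-1", "worker-2"], fetch))
```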
[Diagram: targeted SELECT … WHERE customer_id = 'HN892'; the master tries the shard’s first placement]
[Diagram: the first placement returns an error]
[Diagram: the master retries the SELECT on the shard’s next placement]
pg_shard
Limitations
— Transactions cannot…
— involve multiple shards
— span multiple statements
— Cross-shard constraints unenforced
What are people building?
pg_shard’s capabilities and limitations are similar to those of many popular NoSQL solutions.
What are people building?
pg_shard in Production
— Clickstream event data
— HyperLogLog4 for scalable UNIQUEs
— 30,000 INSERTs/second ingest
— Around 200GB data already
— CitusDB SELECTs: 100x faster
4 “HyperLogLog data structures as a native [PostgreSQL] data type”
Upcoming features?
— More SQL coverage
— Rebalancing
— Multi-master
— Auto-recovery
— INSERT streaming/pipelining
— Suggestions welcome
Scaling summary
— Explore these avenues first!
— Many little experiments
— Cross-cutting; whole-stack
— Get out every ounce
Sharding summary
— Shard once you rule out all else
— Use many small “logical shards”
— Think carefully when picking key
— pg_shard/CitusDB merging!
pg_shard summary
— Open source sharding for PostgreSQL
— First-class PostgreSQL extension
— LOAD, CREATE TABLE, distribute
— https://github.com/citusdata/pg_shard
Contact
— Jason: [email protected]
— General: [email protected]
Questions