PostgreSQL 9.4, 9.5 and Beyond
JSON, Analytics, and More
Uptime Technologies
Satoshi Nagayasu (@snaga)
COSCUP 2015
Who I Am
• Satoshi Nagayasu
– Database enthusiast. DBA and Data Steward.
– Traveling in Asia: Hong Kong, Shenzhen, Beijing, Singapore, Taipei
• Uptime Technologies
– Co-founder
– Consulting services around database and platform technologies.
• PostgreSQL
– pgstatindex, pageinspect, xlogdump
– PostgresForest, Postgres-XC (cluster technology)
– Organizing the Japanese Users Group.
Thanks to...
• Magnus Hagander
• Michael Paquier
• Toshi Harada
• Noriyoshi Shinoda
• ... and many pg guys!
Agenda
• 9.4 & 9.5 Overview
• NoSQL (JSON and GIN Index)
• Analytics (Aggregation, Mat.View & BRIN)
• SQL (UPSERT)
• Security (Row Level Security)
• Replication and Beyond (Logical Decoding)
• Administration (ALTER SYSTEM)
• Infrastructure (For Parallelization)
• Beyond 9.5
9.4 Status
• The first official release
– 9.4 released on December 18th, 2014.
• The latest stable release
– 9.4.4 released on June 12th, 2015.
9.5 Status
• Current status
– 9.5 Alpha 2 released on August 5th, 2015.
• The first Beta (or another Alpha) is expected in September.
• The final release is expected late this year.
http://www.postgresql.org/about/news/1604/
Statistics
• 9.4.4 - compared to 9.3.9
– 3,910 files changed.
– 51,724 insertions (+) in *.c, *.h files
– 16,387 deletions (-)
• 9.5alpha2 - compared to 9.4.4
– 4,246 files changed.
– 91,138 insertions (+)
– 22,393 deletions (-)
Categories of Enhancements
• NoSQL (JSON and GIN Index)
• Analytics (Aggregation, Mat.View & BRIN)
• SQL (UPSERT)
• Security (Row Level Security)
• Replication+ (Logical Decoding)
• Administration (ALTER SYSTEM)
• Basic Infrastructure (Parallelization)
NoSQL - JSONB
• “Binary JSON”
– Different from JSON, which is stored as a text representation
– Faster for searching
• With JSONB...
– No duplicated keys allowed. The last one wins.
– Key order is not preserved.
– Can take advantage of GIN indexes.
NoSQL - JSONB
Operator   Description
(9.4)
->         Get an element by key as a JSON object
->>        Get an element by key as text
#>         Get an element by path as a JSON object
#>>        Get an element by path as text
<@, @>     Evaluate whether a JSON object contains a key/value pair
?          Evaluate whether a string exists as a top-level key (or array element)
?|         Evaluate whether ANY of the strings exist as top-level keys (or array elements)
?&         Evaluate whether ALL of the strings exist as top-level keys (or array elements)
(9.5)
||         Insert or update an element in a JSON object (concatenation)
-          Delete an element by key from a JSON object
#-         Delete an element by path from a JSON object
http://www.postgresql.org/docs/9.5/static/functions-json.html
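The JSONB behaviour and operators listed above can be modeled roughly on plain Python dicts. This is only a sketch for building intuition, not PostgreSQL code: all helper names (`parse_jsonb`, `get_path`, `contains`, `has_any`, `has_all`, `concat`, `delete_key`) are made up here, `contains` handles only objects and scalars (not arrays), and the key sorting merely stands in for "input order is not kept".

```python
import json


def parse_jsonb(text):
    """Parse JSON text roughly the way JSONB normalizes it.

    json.loads keeps the last value of a duplicated key ("last wins").
    Sorting the keys models that input order is not preserved; the real
    internal ordering used by PostgreSQL is different from this.
    """
    return dict(sorted(json.loads(text).items()))


def get_path(doc, path):
    """Model of #> : follow a list of keys / array indexes, None if missing."""
    for step in path:
        if not isinstance(doc, (dict, list)):
            return None
        try:
            doc = doc[step]
        except (KeyError, IndexError, TypeError):
            return None
    return doc


def contains(doc, part):
    """Model of @> for nested objects and scalars (arrays are not handled)."""
    if isinstance(doc, dict) and isinstance(part, dict):
        return all(k in doc and contains(doc[k], v) for k, v in part.items())
    return doc == part


def has_any(doc, keys):
    """Model of ?| : does any of the strings exist as a top-level key?"""
    return any(k in doc for k in keys)


def has_all(doc, keys):
    """Model of ?& : do all of the strings exist as top-level keys?"""
    return all(k in doc for k in keys)


def concat(left, right):
    """Model of || (9.5): keys from the right side are inserted or overwrite."""
    return {**left, **right}


def delete_key(doc, key):
    """Model of - (9.5): drop a top-level key if it is present."""
    return {k: v for k, v in doc.items() if k != key}


doc = parse_jsonb('{"name": "a", "tags": {"db": "pg"}, "name": "b"}')
print(doc)                                  # duplicated "name": last one wins
print(get_path(doc, ["tags", "db"]))        # like doc #> '{tags,db}'
print(contains(doc, {"tags": {"db": "pg"}}))  # like doc @> '{"tags": {"db": "pg"}}'
```

In PostgreSQL itself these are operators on `jsonb` values; the dict model only mirrors their top-level semantics.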
NoSQL - GIN Index
• JSON+btree vs. JSONB+GIN
– Btree indexes vs. GIN index
http://www.slideshare.net/toshiharada/jpug-studyjsonbdatatype20141011-40103981
[Figure: Table Index Size Comparison]
Analytics - Aggregation
• New Aggregate Functions (New in 9.4), used with WITHIN GROUP (ORDER BY ...)
– percentile_cont()
– percentile_disc()
– mode()
– rank()
– dense_rank()
– percent_rank()
– cume_dist()
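The difference between percentile_cont() and percentile_disc() is easy to miss, so here is a small Python model of the two, plus mode(). It is a sketch of the semantics only, not how PostgreSQL implements them; the sample data is invented, and floating-point rounding at range boundaries is ignored.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile: interpolate linearly between neighbouring values.

    Models: SELECT percentile_cont(f) WITHIN GROUP (ORDER BY x) FROM t;
    """
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def percentile_disc(values, fraction):
    """Discrete percentile: the first real input value whose position in the
    ordering reaches the requested fraction (never an interpolated value).

    Models: SELECT percentile_disc(f) WITHIN GROUP (ORDER BY x) FROM t;
    """
    ordered = sorted(values)
    index = max(math.ceil(fraction * len(ordered)) - 1, 0)
    return ordered[index]


def mode(values):
    """Most frequent value; on a tie, the first one in sort order."""
    counts = {}
    for value in sorted(values):
        counts[value] = counts.get(value, 0) + 1
    return max(counts, key=counts.get)


sample = [10, 20, 30, 40]  # invented sample data
print(percentile_cont(sample, 0.5))  # interpolated between 20 and 30
print(percentile_disc(sample, 0.5))  # an actual input value
print(mode([3, 1, 3, 2, 1]))
```

With an even number of rows the median from percentile_cont() falls between two inputs, while percentile_disc() always returns one of the stored values.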
Analytics – Materialized Views
• REFRESH MATERIALIZED VIEW CONCURRENTLY myview
• Refreshes a materialized view without an exclusive lock, so concurrent SELECTs on it are not blocked.
• Usability and availability are improved.
Analytics - BRIN Index
• Block Range INdex (New in 9.5)
– Holds "summary" data, instead of raw data.
– Reduces index size tremendously.
– Also reduces creation/maintenance cost.
– Needs an extra tuple fetch to get the exact record.
[Figure: Btree vs. BRIN, three bar charts. Index Creation (elapsed time, ms); Index Size (number of blocks); Select 1 record (elapsed time, ms)]
https://gist.github.com/snaga/82173bd49749ccf0fa6c
Analytics - BRIN Index
• Structure of BRIN Index
– The table file is divided into block ranges: Block Range 1 (128 blocks), Block Range 2, Block Range 3, ...
– The index holds min/max values for each block range, 128 blocks each (by default).

Block Range   Min. Value   Max. Value
1             1992-01-02   1992-01-28
2             1992-01-27   1992-02-08
3             1992-02-08   1992-02-16
…             …            …
(in the case of a date column)
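The min/max structure above can be sketched as a toy model in Python. This is an illustration under assumptions, not the actual index code: blocks are plain lists of ISO date strings, the names `RangeSummary`, `build_brin`, `candidate_ranges` and `lookup` are invented, and the range size is shrunk to 2 blocks so the example stays small (`pages_per_range` defaults to 128, as on the slide).

```python
from dataclasses import dataclass


@dataclass
class RangeSummary:
    first_block: int   # block number where this range starts
    min_value: str     # smallest value stored anywhere in the range
    max_value: str     # largest value stored anywhere in the range


def build_brin(blocks, pages_per_range=128):
    """Summarize each run of `pages_per_range` blocks by its min and max."""
    summaries = []
    for start in range(0, len(blocks), pages_per_range):
        chunk = blocks[start:start + pages_per_range]
        values = [value for block in chunk for value in block]
        if values:
            summaries.append(RangeSummary(start, min(values), max(values)))
    return summaries


def candidate_ranges(summaries, value):
    """Ranges whose [min, max] interval could contain the value."""
    return [s.first_block for s in summaries
            if s.min_value <= value <= s.max_value]


def lookup(blocks, summaries, value, pages_per_range=128):
    """Scan only the candidate ranges and recheck every row in them.

    The recheck is the "extra tuple fetch": the summary can only say that a
    range might hold the value, not where it is.
    """
    hits = []
    for start in candidate_ranges(summaries, value):
        stop = min(start + pages_per_range, len(blocks))
        for block_no in range(start, stop):
            hits += [(block_no, v) for v in blocks[block_no] if v == value]
    return hits


# Six tiny blocks of dates, two blocks per range (invented data that
# reproduces the min/max table above).
blocks = [
    ["1992-01-02", "1992-01-15"], ["1992-01-20", "1992-01-28"],
    ["1992-01-27", "1992-02-01"], ["1992-02-03", "1992-02-08"],
    ["1992-02-08", "1992-02-10"], ["1992-02-12", "1992-02-16"],
]
index = build_brin(blocks, pages_per_range=2)
print(index)
print(candidate_ranges(index, "1992-01-27"))   # overlapping ranges both match
print(lookup(blocks, index, "1992-01-27", pages_per_range=2))
```

Because only three summary rows describe six blocks, the index stays tiny, but overlapping ranges (1 and 2 here) both have to be scanned.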
Analytics - TABLESAMPLE
• Allows the user to specify random (row-level) BERNOULLI sampling or block-level SYSTEM sampling
– Can improve SELECT query performance when a sample is enough
• Need to specify
– Sampling method (SYSTEM | BERNOULLI)
– Fraction of the table (as a percentage)
• Limitation
– Currently accepted only on regular tables and materialized views
http://www.postgresql.org/docs/9.5/static/sql-select.html
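The two sampling methods differ in what they pick at random, which a short Python model can make concrete. This is a sketch, not the server's algorithm: the table is an invented list of blocks, the function names are hypothetical, and "blocks read" is only a rough stand-in for I/O cost.

```python
import random


def bernoulli_sample(blocks, percent, rng):
    """Row-level sampling: every block is read, and each row is kept
    independently with probability percent/100.

    Models: SELECT ... FROM t TABLESAMPLE BERNOULLI (percent)
    """
    p = percent / 100.0
    rows = [row for block in blocks for row in block if rng.random() < p]
    return rows, len(blocks)


def system_sample(blocks, percent, rng):
    """Block-level sampling: each block is picked with probability
    percent/100, and all rows of a picked block are returned.

    Models: SELECT ... FROM t TABLESAMPLE SYSTEM (percent)
    """
    p = percent / 100.0
    picked = [block for block in blocks if rng.random() < p]
    rows = [row for block in picked for row in block]
    return rows, len(picked)


# Invented table: 1,000 blocks with 10 rows each.
table = [[(block_no, i) for i in range(10)] for block_no in range(1000)]
rng = random.Random(42)

b_rows, b_blocks = bernoulli_sample(table, 10, rng)
s_rows, s_blocks = system_sample(table, 10, rng)
print(len(b_rows), "rows from", b_blocks, "blocks read (BERNOULLI)")
print(len(s_rows), "rows from", s_blocks, "blocks read (SYSTEM)")
```

SYSTEM touches far fewer blocks, at the price of a clustered sample in which rows from the same block always appear together.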
Analytics - TABLESAMPLE
Without TABLESAMPLE: Cost: 44076
With SYSTEM sampling: Cost: 1199
With BERNOULLI sampling: Cost: 25513
INSERT, or UPDATE?
• “duplicate key violation” is one of the bothersome things in database programming
INSERT … ON CONFLICT …
INSERT INTO nation VALUES (
    12,       -- n_nationkey
    'JAPAN',  -- n_name
    2,        -- n_regionkey
    'Japan (Japanese: 日本 … in East Asia.'  -- n_comment
)
ON CONFLICT (n_nationkey)
DO UPDATE SET n_comment = EXCLUDED.n_comment;
http://www.postgresql.org/docs/9.5/static/sql-insert.html
• This query updates the n_comment column when the INSERT conflicts on the n_nationkey column.
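To try the insert-or-update behaviour without a PostgreSQL 9.5 server, the same statement shape can be run against SQLite from Python. This is a stand-in, not PostgreSQL: SQLite (3.24 or newer) accepts a very similar ON CONFLICT ... DO UPDATE clause, the table definition is a guess at the columns named on the slide, and the comment texts are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nation ("
    " n_nationkey INTEGER PRIMARY KEY,"
    " n_name TEXT,"
    " n_regionkey INTEGER,"
    " n_comment TEXT)"
)

# Same shape as the statement on the slide, with bind parameters.
UPSERT = """
INSERT INTO nation VALUES (?, ?, ?, ?)
ON CONFLICT (n_nationkey)
DO UPDATE SET n_comment = excluded.n_comment
"""

# First call inserts a new row for key 12.
conn.execute(UPSERT, (12, "JAPAN", 2, "first comment"))
# Second call conflicts on n_nationkey, so only n_comment is updated;
# the proposed n_name is discarded because the SET list does not mention it.
conn.execute(UPSERT, (12, "JAPAN (ignored)", 2, "second comment"))

rows = conn.execute(
    "SELECT n_nationkey, n_name, n_comment FROM nation"
).fetchall()
print(rows)  # [(12, 'JAPAN', 'second comment')]
```

EXCLUDED refers to the row that was proposed for insertion, which is why the second comment ends up in the existing row.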
Row Level Security
• Row Level Security (RLS)
– Allows users to define access policies that determine which rows in a table are returned.
– Disabled by default.
– CREATE POLICY, ALTER POLICY, DROP POLICY
• Limitation
– Not applicable to the system catalogs
http://www.postgresql.org/docs/9.5/static/ddl-rowsecurity.html
Row Level Security
• Multiple rows (user records) in the table.
Define a policy to filter rows by user name
Row Level Security
• Each user can see only their own record.
“user01” can see only the “user01” record
“user02” can see only the “user02” record
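The effect of such a policy can be pictured as a per-row predicate that is applied to every query on the table. The Python below is a conceptual sketch only; the table contents, the `user_name` column and the function names are assumptions, and real policies are evaluated inside the server.

```python
def own_rows_policy(row, current_user):
    """Rough equivalent of a policy such as:
    CREATE POLICY own_rows ON accounts USING (user_name = current_user);
    (policy, table and column names are hypothetical)
    """
    return row["user_name"] == current_user


def select_all(table, current_user, policy, rls_enabled=True):
    """Model of SELECT * FROM accounts as seen by `current_user`.

    RLS is disabled by default, so without it every row is returned;
    it is switched on per table with
    ALTER TABLE accounts ENABLE ROW LEVEL SECURITY.
    """
    if not rls_enabled:
        return list(table)
    return [row for row in table if policy(row, current_user)]


accounts = [
    {"user_name": "user01", "note": "record of user01"},
    {"user_name": "user02", "note": "record of user02"},
]

print(select_all(accounts, "user01", own_rows_policy))
print(select_all(accounts, "user02", own_rows_policy))
print(len(select_all(accounts, "user01", own_rows_policy, rls_enabled=False)))
```

The query text stays the same for every user; only the set of visible rows changes.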
Replication and Beyond – Logical Decoding
• “Logical” representation of the replication stream
– INSERT/UPDATE/DELETE operations
– Can be replayed on a different version/platform
• pg_recvlogical command
– Shows how it works
• Replication can be more flexible
– BDR (Bi-Directional Rep.), Slony, and more ...
– Continuous backup as well
Administration - ALTER SYSTEM
• ALTER SYSTEM SET
– Puts the new value in postgresql.auto.conf.
– pg_reload_conf() reloads the settings (some parameters still need a server restart).
– postgresql.auto.conf takes priority over postgresql.conf.
• ALTER SYSTEM RESET
– Removes values from postgresql.auto.conf.
Dynamic Background Workers
• In 9.3, background workers had to be started at postmaster (listener process) startup.
• From 9.4 on, they can be launched on demand.
• From the parallelization point of view...
– It allows launching multiple background processes to execute child queries in parallel.
Dynamic Shared Memory
• Shared memory can be allocated on demand
– e.g., by background workers
• The main segment (e.g., shared_buffers) is still fixed at startup
• Also supports a lightweight message queue
• From the parallelization point of view...
– It allows sharing data and communicating among several bgworker processes.
pl/pgsql stacktrace
http://h50146.www5.hp.com/services/ci/opensource/pdfs/PostgreSQL_9_4%20_Ver_1_0.pdf
Commitfest 2015-7~
CommitFest is a process to review, fix, and commit the submitted patches.
• Parallel Seq Scan
• Waits monitoring
• Support for multiple synchronous standby servers
• and others...
Still work in progress...
commitfest.postgresql.org
Wrap-up
• One of the most developer-friendly RDBMSes in the world.
• Analytics features and performance are improving.
• Things are moving toward parallel execution.
Resources
• www.postgresql.org
• www.planetpostgresql.org
• www.pgcon.org
• wiki.postgresql.org
• www.postgresql.org/docs/9.5
• news.ycombinator.com/item?id=10039527
Postgres Toolkit
• A script collection to manage PostgreSQL
– Helps DBAs perform complicated tasks
– Consists of 13 scripts as of v0.2.2.
• A "Victorinox" for PostgreSQL DBAs
• uptime.jp/go/pt
pgDay Asia 2016
• A joint event with FOSSASIA 2016
– 1 day, 2 tracks (not fixed yet)
• FOSSASIA 2016
– March 18th-20th in Singapore
• Still “Work In Progress”, but mark your calendar NOW!
Photo by Michael Cannon https://flic.kr/p/rieAXe
Any Questions?
• E-mail: [email protected]
• Twitter, GitHub: @snaga
• WeChat: satoshinagayasu