PostgreSQL 9.4, 9.5 and Beyond JSON, Analytics, and More Uptime Technologies Satoshi Nagayasu @snaga COSCUP 2015

PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei

PostgreSQL 9.4, 9.5 and Beyond
JSON, Analytics, and More

Uptime Technologies

Satoshi Nagayasu
@snaga

COSCUP 2015

Who I Am
• Satoshi Nagayasu
  – Database enthusiast. DBA and Data Steward.
  – Traveling in Asia: Hong Kong, Shenzhen, Beijing, Singapore, Taipei
• Uptime Technologies
  – Co-founder
  – Consulting services around Database and Platform Technologies.
• PostgreSQL
  – pgstatindex, pageinspect, xlogdump
  – PostgresForest, Postgres-XC (cluster technology)
  – Organizing Japanese Users Group.

Thanks to...
• Magnus Hagander
• Michael Paquier
• Toshi Harada
• Noriyoshi Shinoda
• ... and many pg guys!

Agenda
• 9.4 & 9.5 Overview
• NoSQL (JSON and GIN Index)
• Analytics (Aggregation, Mat.View & BRIN)
• SQL (UPSERT)
• Security (Row Level Security)
• Replication and Beyond (Logical Decoding)
• Administration (ALTER SYSTEM)
• Infrastructure (For Parallelization)
• Beyond 9.5

9.4 & 9.5 Overview

9.4 Status
• The first official release
  – 9.4 released on December 18th, 2014.
• The latest stable release
  – 9.4.4 released on June 12th, 2015.

9.5 Status
• Current Status
  – 9.5 Alpha 2 released on August 5th, 2015
• The first Beta (or another Alpha) will be coming in September.
• The final release is expected late this year.

http://www.postgresql.org/about/news/1604/

Statistics
• 9.4.4 - compared to 9.3.9
  – 3,910 files changed.
  – 51,724 insertions (+) in *.c, *.h files
  – 16,387 deletions (-)
• 9.5alpha2 - compared to 9.4.4
  – 4,246 files changed.
  – 91,138 insertions (+)
  – 22,393 deletions (-)

9.4 Overview - Changes

Lots of changes!!

Categories of Enhancements
• NoSQL (JSON and GIN Index)
• Analytics (Aggregation, Mat.View & BRIN)
• SQL (UPSERT)
• Security (Row Level Security)
• Replication+ (Logical Decoding)
• Administration (ALTER SYSTEM)
• Basic Infrastructure (Parallelization)

NoSQL (JSON and GIN Index)

NoSQL - JSONB
• JSON (9.3) vs. JSONB (9.4)

NoSQL - JSONB
• “Binary JSON”
  – Different from JSON, which is a text representation
  – Faster for searching
• With JSONB...
  – No duplicated keys allowed. Last wins.
  – Key order not preserved.
  – Can take advantage of GIN Index.
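The two JSONB rules above (the last duplicate key wins, key order is not preserved) can be mimicked in a few lines of Python. This is only a toy model and not PostgreSQL code: `normalize_like_jsonb` is a hypothetical helper, and the length-then-bytewise key ordering is an assumption about how JSONB stores object keys.

```python
import json

def normalize_like_jsonb(text):
    """Parse JSON text and re-serialize it roughly the way JSONB would show it."""
    # json.loads keeps the last value when a key is repeated ("last wins").
    obj = json.loads(text)
    if not isinstance(obj, dict):
        return json.dumps(obj)
    # Assumption: JSONB orders object keys by length, then bytewise.
    # Only top-level keys are reordered in this sketch.
    keys = sorted(obj, key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")))
    return json.dumps({k: obj[k] for k in keys})

raw = '{"name": "first", "id": 1, "name": "second"}'
print(normalize_like_jsonb(raw))
```

The input order and the first "name" value are both lost, which is the trade-off JSONB makes in exchange for faster searching.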

NoSQL - JSONB

Operator   Description
9.4
->         Get an element by key as a JSON object
->>        Get an element by key as a text object
#>         Get an element by path as a JSON object
#>>        Get an element by path as a text object
<@, @>     Evaluate whether a JSON object contains a key/value pair
?          Evaluate whether a JSON object contains a key or a value
?|         Evaluate whether a JSON object contains ANY of keys or values
?&         Evaluate whether a JSON object contains ALL of keys or values
9.5
||         Insert or Update an element to a JSON object
-          Delete an element by key from a JSON object
#-         Delete an element by path from a JSON object

http://www.postgresql.org/docs/9.5/static/functions-json.html
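To make a few of these operators concrete, here is a rough Python analogue of `#>`, `||` and `#-` on plain dicts and lists. The function names are hypothetical, paths are Python lists instead of PostgreSQL text arrays, and edge cases will differ from the real operators.

```python
import copy

def get_path(doc, path):
    """Rough analogue of `#>`: walk a list of keys/indexes, None if missing."""
    cur = doc
    for step in path:
        try:
            cur = cur[step]
        except (KeyError, IndexError, TypeError):
            return None
    return cur

def concat(left, right):
    """Rough analogue of `||` on two objects: top-level merge, right side wins."""
    return {**left, **right}

def delete_path(doc, path):
    """Rough analogue of `#-`: return a copy with the element at `path` removed."""
    result = copy.deepcopy(doc)
    parent = get_path(result, path[:-1])
    try:
        del parent[path[-1]]
    except (KeyError, IndexError, TypeError):
        pass  # missing path: leave the document unchanged
    return result

doc = {"name": "JAPAN", "tags": ["asia", "island"], "geo": {"region": 2}}
print(get_path(doc, ["geo", "region"]))
print(concat(doc, {"name": "Japan"})["name"])
print(delete_path(doc, ["tags", 0]))
```

Each helper returns a new value and leaves `doc` untouched, in the same way the SQL operators return a new JSONB value rather than modifying the stored one.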

NoSQL - GIN Index
• JSON+btree vs. JSONB+GIN
  – Btree indexes vs. GIN index

http://www.slideshare.net/toshiharada/jpug-studyjsonbdatatype20141011-40103981

Table Index Size Comparison
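GIN is an inverted index, so one index can answer lookups on any key of a document, where the btree approach needs one index per extracted key. The sketch below is a toy inverted index over flat documents. It only illustrates the idea and does not reflect PostgreSQL's on-disk structure; the function names and sample rows are made up.

```python
from collections import defaultdict

def build_inverted_index(rows):
    """Map each (key, value) pair to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, doc in rows.items():
        for key, value in doc.items():
            index[(key, value)].add(row_id)
    return index

def find_containing(index, rows, wanted):
    """Return sorted row ids whose document contains every pair in `wanted`."""
    candidates = set(rows)
    for pair in wanted.items():
        candidates &= index.get(pair, set())
    return sorted(candidates)

rows = {
    1: {"city": "Taipei", "lang": "zh"},
    2: {"city": "Tokyo", "lang": "ja"},
    3: {"city": "Taipei", "lang": "en"},
}
index = build_inverted_index(rows)
print(find_containing(index, rows, {"city": "Taipei"}))
```

A containment query is answered by intersecting the row-id sets of the wanted pairs, without scanning the documents themselves.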

Analytics (Aggregation & Materialized View)

Analytics - Aggregation
• FILTER replaces CASE WHEN.
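The difference between the two styles can be shown outside SQL as well. In this Python sketch (the `orders` rows and column names are invented for illustration), the CASE WHEN style feeds every row into the aggregate with a dummy value, while the FILTER style attaches the condition to the aggregate itself.

```python
orders = [
    {"status": "F", "price": 100.0},
    {"status": "O", "price": 250.0},
    {"status": "F", "price": 50.0},
]

# CASE WHEN style: every row is visited and non-matching rows contribute 0.
total_case = sum(o["price"] if o["status"] == "F" else 0 for o in orders)

# FILTER style: the condition is attached to the aggregate itself.
total_filter = sum(o["price"] for o in orders if o["status"] == "F")
count_filter = sum(1 for o in orders if o["status"] == "F")

print(total_case, total_filter, count_filter)
```

Both forms give the same total here; the FILTER form reads closer to the intent and works for aggregates where a dummy value such as 0 would not be neutral.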

Analytics - Aggregation
• New Aggregate Functions (New in 9.4)
  – percentile_cont()
  – percentile_disc()
  – mode()
  – rank()
  – dense_rank()
  – percent_rank()
  – cume_dist()

Analytics - Aggregation
• Ordered-set aggregates
  – mode(), most common value in a subset

Analytics - Aggregation
• Ordered-set aggregates
  – rank(), rank of a value in a subset
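What the ordered-set aggregates compute can be sketched in plain Python. The functions below follow the documented definitions as I understand them (interpolation for `percentile_cont`, first value reaching the fraction for `percentile_disc`, rank of a hypothetical row for `rank`); treat them as an approximation, not as PostgreSQL's implementation, and note that tie handling in `mode` is a simplification.

```python
import math
from collections import Counter

def percentile_disc(fraction, values):
    """First sorted value whose position in the ordering reaches `fraction`."""
    ordered = sorted(values)
    k = max(1, math.ceil(fraction * len(ordered)))  # 1-based row number
    return ordered[k - 1]

def percentile_cont(fraction, values):
    """Linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def mode(values):
    """Most common value; on ties, the first one in sort order."""
    return Counter(sorted(values)).most_common(1)[0][0]

def hypothetical_rank(value, values):
    """Rank `value` would get if it were added to `values` (1 = smallest)."""
    return 1 + sum(1 for v in values if v < value)

prices = [10, 20, 40, 40]
print(percentile_disc(0.5, prices), percentile_cont(0.5, prices))
print(mode(prices), hypothetical_rank(30, prices))
```

`percentile_disc` always returns a value that exists in the input, while `percentile_cont` may return a value between two inputs.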

Analytics - Aggregation
• New in 9.5
  – ROLLUP()
  – CUBE()
  – GROUPING SETS()

Analytics - ROLLUP
• Calculates total/subtotal values

Analytics - CUBE
• Calculates for all combinations of the specified columns

Analytics – GROUPING SETS
• Runs multiple GROUP BY queries at once
  – Two GROUP BYs at once.
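ROLLUP and CUBE are shorthand for particular lists of grouping sets, and GROUPING SETS runs one GROUP BY per set. The Python sketch below expands both shorthands and aggregates once per set; the function names and the sample rows are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE(a, b) expands to the grouping sets (a, b), (a), (b), ()."""
    return [c for n in range(len(cols), -1, -1) for c in combinations(cols, n)]

def aggregate(rows, grouping_sets, measure):
    """Sum `measure` per group, producing one output row per group in each set."""
    out = []
    for gset in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            totals[tuple(row[c] for c in gset)] += row[measure]
        for key, total in sorted(totals.items()):
            out.append((dict(zip(gset, key)), total))
    return out

rows = [
    {"region": "ASIA", "nation": "JAPAN", "price": 10},
    {"region": "ASIA", "nation": "TAIWAN", "price": 20},
    {"region": "EUROPE", "nation": "FRANCE", "price": 5},
]
for group, total in aggregate(rows, rollup_sets(["region", "nation"]), "price"):
    print(group, total)
```

The empty grouping set `()` is the grand total, and the `(region,)` set gives the subtotals that ROLLUP adds on top of the plain GROUP BY result.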

Analytics – Materialized Views

• REFRESH MATERIALIZED VIEW CONCURRENTLY myview

• Refreshing a MV concurrently (in background) without exclusive lock.

• Usability and availability improved.

Analytics - BRIN Index
• Block Range INdex (New in 9.5)
  – Holds "summary" data, instead of raw data.
  – Reduces index size tremendously.
  – Also reduces creation/maintenance cost.
  – Needs extra tuple fetch to get the exact record.

[Chart] Index Creation: elapsed time (ms), Btree vs. BRIN
[Chart] Index Size: number of blocks, Btree vs. BRIN
[Chart] Select 1 record: elapsed time (ms), Btree vs. BRIN

https://gist.github.com/snaga/82173bd49749ccf0fa6c

Analytics - BRIN Index
• Structure of BRIN Index
  – The table file is divided into block ranges: Block Range 1 (128 Blocks), Block Range 2, Block Range 3, ...
  – Holds min/max values for “Block Ranges”, 128 blocks each (by default).

Block Range   Min. Value   Max. Value
1             1992-01-02   1992-01-28
2             1992-01-27   1992-02-08
3             1992-02-08   1992-02-16
…             …            …

(in the case of a date column)
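The min/max summary idea can be sketched in Python as follows. A "table" here is just a list of blocks, each a list of values; `build_summary` and `lookup` are hypothetical names, and the example uses 2 blocks per range instead of the default 128 to keep the data small.

```python
def build_summary(blocks, pages_per_range=128):
    """Return one (first_block, min, max) entry per block range."""
    summary = []
    for start in range(0, len(blocks), pages_per_range):
        values = [v for block in blocks[start:start + pages_per_range] for v in block]
        summary.append((start, min(values), max(values)))
    return summary

def lookup(blocks, summary, target, pages_per_range=128):
    """Scan only the ranges whose [min, max] interval may contain `target`."""
    hits, blocks_read = [], 0
    for start, lo, hi in summary:
        if lo <= target <= hi:
            for block in blocks[start:start + pages_per_range]:
                blocks_read += 1
                # The summary is lossy, so every tuple has to be rechecked.
                hits.extend(v for v in block if v == target)
    return hits, blocks_read

blocks = [[i * 2, i * 2 + 1] for i in range(8)]  # 8 blocks holding 0..15
summary = build_summary(blocks, pages_per_range=2)
print(summary)
print(lookup(blocks, summary, 9, pages_per_range=2))
```

The index stores only one small entry per range, which is why it is tiny and cheap to build, and the recheck loop corresponds to the extra tuple fetch mentioned earlier. Overlapping ranges, as in rows 1 and 2 of the table above, simply mean more than one range gets scanned.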

Analytics - TABLESAMPLE
• Allows the user to specify random BERNOULLI sampling or block-level SYSTEM sampling
  – Would improve SELECT query performance
• Need to specify
  – Sampling method (SYSTEM | BERNOULLI)
  – Fraction of the table (as a percentage)
• Limitation
  – Currently accepted only on regular tables and materialized views

http://www.postgresql.org/docs/9.5/static/sql-select.html
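The two sampling methods differ in what they skip. In the toy Python model below (function names and the 50-block table are invented), BERNOULLI decides row by row and therefore still visits every block, while SYSTEM decides block by block and never touches the blocks it skips.

```python
import random

def bernoulli_sample(blocks, percent, rng):
    """Row-level sampling: every row is visited and kept with probability percent/100."""
    return [row for block in blocks for row in block if rng.random() * 100 < percent]

def system_sample(blocks, percent, rng):
    """Block-level sampling: whole blocks are kept or skipped."""
    return [row for block in blocks if rng.random() * 100 < percent for row in block]

rng = random.Random(2015)
blocks = [list(range(b * 100, (b + 1) * 100)) for b in range(50)]  # 50 blocks x 100 rows
b_rows = bernoulli_sample(blocks, 10, rng)
s_rows = system_sample(blocks, 10, rng)
print(len(b_rows), len(s_rows))
```

This is one plausible reading of why SYSTEM sampling tends to be much cheaper than BERNOULLI: it reads fewer blocks, at the price of a more clustered and less uniform sample.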

Analytics - TABLESAMPLE
• Calculating the average of total price.
  – With/without TABLESAMPLE

Analytics - TABLESAMPLE
Without TABLESAMPLE: Cost: 44076
With SYSTEM sampling: Cost: 1199
With BERNOULLI sampling: Cost: 25513

UPSERT

INSERT, or UPDATE?
• “duplicate key violation” is one of the bothersome things in database programming

INSERT … ON CONFLICT …
• Now, you can INSERT or UPDATE in one statement with “ON CONFLICT”.

INSERT … ON CONFLICT …

INSERT INTO nation VALUES (
  12,       -- n_nationkey
  'JAPAN',  -- n_name
  2,        -- n_regionkey
  'Japan (Japanese: 日本 … in East Asia.'  -- n_comment
)
ON CONFLICT (n_nationkey)
DO UPDATE SET n_comment = EXCLUDED.n_comment;

http://www.postgresql.org/docs/9.5/static/sql-insert.html

• This query updates the n_comment column when the INSERT conflicts on the n_nationkey column.
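The control flow of the statement can be modelled with a dict keyed on the conflict column. This is only a sketch of the semantics: `upsert` is a hypothetical helper, the comment values are placeholders, and real PostgreSQL additionally handles concurrency, which this model ignores.

```python
def upsert(table, row, conflict_key, update_cols):
    """Insert `row`; if the key already exists, update only `update_cols`."""
    key = row[conflict_key]
    if key in table:
        excluded = row  # the row proposed for insertion, like EXCLUDED
        for col in update_cols:
            table[key][col] = excluded[col]
        return "UPDATE"
    table[key] = dict(row)
    return "INSERT"

nation = {}
first = {"n_nationkey": 12, "n_name": "JAPAN", "n_regionkey": 2, "n_comment": "old"}
second = {"n_nationkey": 12, "n_name": "NIPPON", "n_regionkey": 2, "n_comment": "new"}
print(upsert(nation, first, "n_nationkey", ["n_comment"]))
print(upsert(nation, second, "n_nationkey", ["n_comment"]))
print(nation[12]["n_name"], nation[12]["n_comment"])
```

On the second call only `n_comment` changes; `n_name` keeps its stored value because it is not listed in the update, matching the `DO UPDATE SET n_comment = EXCLUDED.n_comment` clause.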

Row Level Security

Row Level Security
• Row Level Security (RLS)
  – Allows users to define access policies that determine which rows in the table should be returned.
  – Disabled by default.
  – CREATE POLICY, ALTER POLICY, DROP POLICY
• Limitation
  – Not applicable to the system catalog

http://www.postgresql.org/docs/9.5/static/ddl-rowsecurity.html

Row Level Security
• Multiple rows (user records) in the table.
  – Define a policy to filter rows by user name.

Row Level Security
• Each user can see only their own record.
  – “user01” can see only the “user01” record.
  – “user02” can see only the “user02” record.
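The filtering behaviour described above can be modelled as a predicate applied to every row before it is returned. In this Python sketch the table, the `user_name` column and both function names are made up; the default-deny behaviour when RLS is enabled without any policy follows my reading of the 9.5 documentation.

```python
def own_record_only(row, current_user):
    """Policy predicate: a row is visible only to the user it belongs to."""
    return row["user_name"] == current_user

def visible_rows(table, current_user, policies, rls_enabled=True):
    """Return the rows `current_user` may see."""
    if not rls_enabled:
        return list(table)  # RLS is disabled by default: no filtering
    # A row is visible if any policy allows it; with no policies nothing is visible.
    return [row for row in table if any(p(row, current_user) for p in policies)]

users = [
    {"user_name": "user01", "email": "a@example.com"},
    {"user_name": "user02", "email": "b@example.com"},
]
print(visible_rows(users, "user01", [own_record_only]))
print(visible_rows(users, "user02", [own_record_only]))
```

The query issued by each user is the same; only the policy and the current user name decide which rows come back.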

Replication and Beyond (Logical Decoding)

Replication and Beyond – Logical Decoding
• “Logical” representation from replication stream
  – INSERT/UPDATE/DELETE operations
  – Can be replayed on different version/platform
• pg_recvlogical command
  – Shows how it works
• Replication can be more flexible
  – BDR (Bi-Directional Rep.), Slony, and more ...
  – Continuous Backup as well
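Because the decoded stream is a sequence of row-level INSERT/UPDATE/DELETE operations and not physical page changes, any consumer that understands those three operations can replay it. The Python sketch below applies such a stream to a dict standing in for the target table; the tuple format is invented for illustration and is not the output format of any real output plugin.

```python
def apply_changes(replica, changes):
    """Replay (operation, key, row) tuples, in order, onto `replica`."""
    for op, key, row in changes:
        if op in ("INSERT", "UPDATE"):
            replica[key] = row
        elif op == "DELETE":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica

changes = [
    ("INSERT", 1, {"name": "JAPAN"}),
    ("INSERT", 2, {"name": "TAIWAN"}),
    ("UPDATE", 1, {"name": "Japan"}),
    ("DELETE", 2, None),
]
print(apply_changes({}, changes))
```

Nothing in the stream depends on the source server's storage format, which is what makes replay on a different version or platform possible.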

pg_recvlogical

Administration (ALTER SYSTEM)

Administration - ALTER SYSTEM
• ALTER SYSTEM SET
  – Puts the new value in postgresql.auto.conf.
  – pg_reload_conf() reloads the settings.
  – postgresql.auto.conf takes priority over postgresql.conf.
• ALTER SYSTEM RESET
  – Removes values from postgresql.auto.conf.
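The precedence rule between the two files can be illustrated with a small Python sketch. The parser is deliberately simplistic (it would mishandle a `#` inside a quoted value), the file contents are invented, and both function names are hypothetical.

```python
def parse_conf(text):
    """Parse 'name = value' lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip("'")
    return settings

def effective_settings(postgresql_conf, auto_conf):
    """Entries from postgresql.auto.conf override those from postgresql.conf."""
    merged = parse_conf(postgresql_conf)
    merged.update(parse_conf(auto_conf))
    return merged

postgresql_conf = "work_mem = '4MB'  # hand-edited\nmax_connections = 100\n"
auto_conf = "# written by ALTER SYSTEM\nwork_mem = '64MB'\n"
print(effective_settings(postgresql_conf, auto_conf))
```

In this model, ALTER SYSTEM RESET corresponds to deleting the `work_mem` line from `auto_conf`, after which the value from postgresql.conf applies again.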

Infrastructure (For Parallelization)

Dynamic Background Workers

• In 9.3, background workers must start at postmaster (listener process) startup.

• From 9.4, they can be launched on an “on-demand” basis.

• From a parallelization point of view...
  – It allows launching multiple background processes to execute child queries in parallel.

Dynamic Shared Memory
• Shared memory can be allocated on an “on-demand” basis
  – e.g., by background workers
• Main segment (ex. shared_buffers) still fixed at startup
• Also supports a lightweight message queue
• From a parallelization point of view...
  – It allows several bgworker processes to share data and communicate.
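How these two building blocks combine for parallel query can be sketched with Python threads. This is an analogy only: threads stand in for dynamically launched background worker processes, `queue.Queue` stands in for the shared-memory message queue, and `parallel_scan` is a made-up name.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

def parallel_scan(blocks, predicate, workers=4):
    """Each worker scans a slice of the blocks and reports matches through a queue."""
    results = Queue()  # stands in for the shared-memory message queue

    def scan(chunk):
        for block in chunk:
            for row in block:
                if predicate(row):
                    results.put(row)

    chunks = [blocks[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(scan, chunks))  # wait for all workers, surface exceptions
    matches = []
    while not results.empty():
        matches.append(results.get())
    return sorted(matches)

blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(parallel_scan(blocks, lambda row: row % 2 == 0))
```

The leader splits the work, launches workers on demand, and collects their results from the queue, which is the general shape a parallel sequential scan needs.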

My Tiny Favorite (pl/pgsql stacktrace)

pl/pgsql stacktrace

http://h50146.www5.hp.com/services/ci/opensource/pdfs/PostgreSQL_9_4%20_Ver_1_0.pdf

There are many other enhancements, so please try them ASAP.

Beyond 9.5

Commitfest 2015-7~
CommitFest is a process to review, fix and commit the submitted patches.
• Parallel Seq scan
• Waits monitoring
• Support multiple synchronous standby servers
• and others..

Still work in progress...

commitfest.postgresql.org

Wrap-up
• One of the most developer-friendly RDBMSes in the world.
• Analytics features and performance are improving.
• Things are going parallel.

Resources
• www.postgresql.org

• www.planetpostgresql.org

• www.pgcon.org

• wiki.postgresql.org

• www.postgresql.org/docs/9.5

• news.ycombinator.com/item?id=10039527

Postgres Toolkit
• A script collection to manage PostgreSQL
  – Helps DBAs perform complicated tasks
  – Consists of 13 scripts as of v0.2.2.
• A "Victorinox" for PostgreSQL DBAs
• uptime.jp/go/pt

SQL Firewall for Postgres

SQL Injection prevented!

pgDay Asia 2016
• As a joint event with FOSSASIA 2016
  – 1 Day, 2 Tracks (not fixed yet)
• FOSSASIA 2016
  – March 18th-20th in Singapore
• Still “Work In Progress”, but mark your calendar NOW!

Photo by Michael Cannon https://flic.kr/p/rieAXe

Any Questions?
• E-mail: [email protected]
• Twitter, Github: @snaga
• WeChat: satoshinagayasu

Thank You!