Sharing Code and Experiences @fabriziomello

GSoC2014 - Uniritter Presentation May, 2015

Just to get all of us tuned in!!!

Does everybody know what a database is?

Is this a database?

Of course not!

But what does Wikipedia tell us?

A database is an integrated and organized collection of logically related records or files or data that are stored in a computer system which consolidates records previously stored in a separate files into a common pool of data records that provides data for many applications.

Source: http://en.wikipedia.org/wiki/Database

WTF???

Ok, something simpler, from the pt-br Wikipedia!!

“Databases are organized collections of data that relate to each other so as to create some meaning (information) and to make a search or study more efficient.” (Wikipedia)

Source: http://pt.wikipedia.org/wiki/Banco_de_dados

A database is a concept; Relational Database Management Systems are an implementation of this concept

About me

● IT experience since 1993
  ○ Programming languages (Basic, C, Clipper, Pascal, PHP, JavaScript, …)
  ○ Operating systems (Windows “argh”, Unix and Linux)
  ○ PostgreSQL, Firebird, MySQL, Oracle
  ○ Agile methodologies (XP, Lean, Scrum, …)
  ○ …

Fabrízio de Royes Mello

● Bachelor's degree in Information Systems (2002)

● Entrepreneur at http://timbira.com

● Student in an Agile Methodologies specialization course (2014/2015)

● PostgreSQL collaborator since 2008 (in the Brazilian community, and now in the international one too)

… and nowadays

● PostgreSQL contributor (more than 27 patches as developer and/or reviewer)

● Brazilian Community
  ○ http://postgresql.org.br
  ○ http://listas.postgresql.org.br

● PostgreSQL Consultant at Timbira
  ○ http://timbira.com.br

● Judo practitioner

About PostgreSQL

PostgreSQL (http://postgresql.org)
● The world’s most advanced open source database
● Runs on all major operating systems: Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows

● Fully ACID compliant (Atomicity, Consistency, Isolation and Durability)

● Full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages)

● Native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others.

PostgreSQL (http://postgresql.org)
● Before: born from INGRES
● 1986: Project start (Berkeley)
● 1987: First Postgres version
● 1991: (v 3) with most of the current features
● 1993: (v 4.2) last release by Berkeley
● 1994: Andrew Yu and Jolly Chen release Postgres95 with support for the SQL language
● 1997: (v 6) Name changes to PostgreSQL
● 2000: (v 7) Support for foreign keys

PostgreSQL (http://postgresql.org)
● 2005: (v 8) Native port to Windows, Tablespaces, Savepoints, Point-In-Time-Recovery
● 2005: (v 8.1) Two-phase Commit, Roles
● 2006: (v 8.2) [Insert, Update, Delete] Returning, improved OLTP and BI performance
● 2008: (v 8.3) Debug PL/pgSQL, Tsearch2 and XML incorporated into the core, performance improvements
● 2009: (v 8.4) Windowing Functions, Common Table Expressions and Recursive Queries, Parallel Restore, “pg_upgrade”

PostgreSQL (http://postgresql.org)
● 2010: (v 9.0) Hot Standby and Streaming Replication
● 2011: (v 9.1) Synchronous Replication, FDW (SQL/MED), CREATE EXTENSION, Unlogged Tables
● 2012: (v 9.2) Index-only Scans, Cascading Replication, JSON, Range Types
● 2013: (v 9.3) Materialized Views, Lateral Join, writable FDW, Event Triggers, Background Workers
● 2014: (v 9.4) JSONB, Logical Decoding, Dynamic Background Workers

PostgreSQL (http://postgresql.org)
● 2015: (v 9.5) INSERT … ON CONFLICT DO UPDATE (upsert), IMPORT FOREIGN SCHEMA, ALTER TABLE .. SET LOGGED, Parallel Infrastructure
● 2016: (v 9.6) Parallel Query??? BDR (Bi-directional Replication)???

About FOSS and Google

FOSS (free and open source software) and me

● My first contact was using Linux in 1997
● I have been in love with this culture ever since
● In 1999 I met PostgreSQL, and since then I have known it would be part of my life
● Because of this decision I had a lot of troubles, including financial ones…
● But here I am :-)

Google Summer of Code (GSoC) is a global program that offers students stipends to write code for open source projects.

We have worked with the open source community to identify and fund exciting projects for the upcoming summer.

Connect students to open source communities

GSoC and PostgreSQL

● Since 2006
● Cool projects
  ○ Fast GiST index build
  ○ New phpPgAdmin Plugin Architecture (Brazilian)
  ○ pgAdmin database designer
  ○ Better indexing for ranges
  ○ Document collection Foreign-data Wrapper

And now my project ...

PostgreSQL 9.1 introduced a new kind of table: Unlogged Tables

What does “Unlogged” mean?

First we need to know what “WAL” means

PostgreSQL is fully ACID compliant and, to guarantee data integrity, uses a standard method called WAL (Write-Ahead Logging)

WAL (Write-Ahead Logging)

“In computer science, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.

In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.”

http://en.wikipedia.org/wiki/Write-ahead_logging
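To make the definition concrete, here is a toy Python model of the write-ahead rule: every change is appended to a log before it is applied, and after a simulated crash the data is rebuilt by replaying (redoing) the log. This is a minimal sketch under simplifying assumptions: an in-memory list stands in for the durable log, the class name `ToyWALDatabase` is hypothetical, and only redo information is kept. Real PostgreSQL WAL is binary, page-oriented and flushed to disk.

```python
# A toy write-ahead log (hypothetical names, in-memory only): it illustrates
# the write-ahead rule, not PostgreSQL's real WAL format or recovery code.
from dataclasses import dataclass, field


@dataclass
class ToyWALDatabase:
    log: list = field(default_factory=list)   # assumed durable: survives crash()
    data: dict = field(default_factory=dict)  # assumed volatile: lost on crash()

    def put(self, key, value):
        # Write-ahead rule: record the change in the log *before* applying it.
        self.log.append(("put", key, value))
        self.data[key] = value

    def delete(self, key):
        self.log.append(("delete", key, None))
        self.data.pop(key, None)

    def crash(self):
        # Simulate a crash: only what reached the log survives.
        self.data = {}

    def recover(self):
        # Redo-only recovery: replay every log record in order.
        self.data = {}
        for op, key, value in self.log:
            if op == "put":
                self.data[key] = value
            else:
                self.data.pop(key, None)


db = ToyWALDatabase()
db.put("a", 1)
db.put("b", 2)
db.delete("a")
db.crash()
db.recover()
print(db.data)  # {'b': 2}
```

The state after recovery depends only on the log, which is exactly why the log has to be written first.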

Ok, and what does “Unlogged” mean?

● Unlogged means that the data written in these tables is not written to WAL.

● So writing to them is really, really fast compared to writing to regular tables.

So I’ll use it for all of my tables...

● However, you won’t want to do that, because

● They are not crash-safe (an unlogged table is automatically truncated after a crash or unclean shutdown)

● And they are not replicated using SR (Streaming Replication)
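A toy model in the same spirit can show why an unlogged table comes back empty: its inserts skip the log, so crash recovery has nothing to replay for it. It also hints at why it is not replicated, since a streaming standby only receives what is in the WAL. All names here (`ToyCluster`, `crash_and_recover`) are hypothetical; this is a sketch of the documented behavior, not PostgreSQL internals.

```python
# Toy model (hypothetical names): logged vs. unlogged tables across a crash.
# It mirrors the documented behavior only: unlogged tables skip WAL and are
# truncated after a crash or unclean shutdown.


class ToyCluster:
    def __init__(self):
        self.wal = []          # assumed durable log, replayed after a crash
        self.tables = {}       # table name -> list of rows (volatile)
        self.unlogged = set()  # names of unlogged tables

    def create_table(self, name, unlogged=False):
        self.tables[name] = []
        if unlogged:
            self.unlogged.add(name)

    def insert(self, name, row):
        if name not in self.unlogged:
            self.wal.append((name, row))  # logged tables pay the WAL cost
        self.tables[name].append(row)     # unlogged tables only do this

    def crash_and_recover(self):
        # Nothing volatile survives: start again from empty tables...
        self.tables = {name: [] for name in self.tables}
        # ...and redo every WAL record. Unlogged tables have no records,
        # so they come back empty, i.e. truncated.
        for name, row in self.wal:
            self.tables[name].append(row)


cluster = ToyCluster()
cluster.create_table("orders")
cluster.create_table("sessions", unlogged=True)
cluster.insert("orders", {"id": 1})
cluster.insert("sessions", {"sid": "abc"})
cluster.crash_and_recover()
print(cluster.tables)  # {'orders': [{'id': 1}], 'sessions': []}
```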

But there are some cool use cases

● Speeding up ETL jobs
● Cache
● Session state
● Queues?!
● ...

And now we have the power to ...

● Change from UNLOGGED to LOGGED
  ○ ALTER TABLE name SET LOGGED;

● Change from LOGGED to UNLOGGED
  ○ ALTER TABLE name SET UNLOGGED;

Already committed

commit: f41872d0c1239d36ab03393c39ec0b70e9ee2a3c
author: Alvaro Herrera
date: Fri, 22 Aug 2014 14:27:00 -0400

Implement ALTER TABLE .. SET LOGGED / UNLOGGED

This enables changing permanent (logged) tables to unlogged and vice-versa.

(Docs for ALTER TABLE / SET TABLESPACE got shuffled in an order that hopefully makes more sense than the original.)

Author: Fabrízio de Royes Mello
Reviewed by: Christoph Berg, Andres Freund, Thom Brown
Some tweaking by Álvaro Herrera

How it works

1. Acquire AccessExclusiveLock
2. Check dependencies
   a. Cannot change temp tables
   b. Check foreign keys
3. Change the indexes’ “relpersistence”
4. Create new heap/toast with the new relpersistence
5. Rewrite heap/toast
6. Rewrite indexes
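Step 2 can be sketched as a small Python check. It assumes hypothetical `Table` and `ForeignKey` types (not the patch's C code) and encodes these rules: temporary tables cannot change, a table becoming LOGGED must not reference an unlogged table, and a table becoming UNLOGGED must not be referenced by a logged table. In both forbidden cases a logged table would point at rows that may vanish after a crash.

```python
# Sketch of "Check dependencies" (hypothetical types and messages).
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    persistence: str  # 'p' = permanent (logged), 'u' = unlogged, 't' = temporary


@dataclass(frozen=True)
class ForeignKey:
    referencing: Table  # table that owns the constraint
    referenced: Table   # table the constraint points to


def check_persistence_change(table, to_logged, foreign_keys):
    """Raise ValueError if `table` may not switch to the requested persistence."""
    if table.persistence == "t":
        raise ValueError(f"cannot change logged status of temporary table {table.name}")
    for fk in foreign_keys:
        if fk.referencing == fk.referenced:
            continue  # a self-reference changes together with the table
        if to_logged and fk.referencing == table and fk.referenced.persistence == "u":
            # A logged table must not point to data that is truncated on crash.
            raise ValueError(f"{table.name} references unlogged table {fk.referenced.name}")
        if not to_logged and fk.referenced == table and fk.referencing.persistence == "p":
            # Same rule seen from the other side: a logged table points here.
            raise ValueError(f"{table.name} is referenced by logged table {fk.referencing.name}")


users = Table("users", "p")
orders = Table("orders", "p")
cache = Table("cache", "u")
fks = [ForeignKey(orders, users), ForeignKey(cache, users)]

check_persistence_change(cache, to_logged=True, foreign_keys=fks)  # allowed
try:
    check_persistence_change(users, to_logged=False, foreign_keys=fks)
except ValueError as err:
    print(err)  # users is referenced by logged table orders
```

Note that the other directions are fine in this model: an unlogged table may reference a logged one, so `cache` can stay unlogged or become logged.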

New patch with refactoring

1. Acquire AccessExclusiveLock
2. Check dependencies
   a. Cannot change temp tables
   b. Check foreign keys
3. Create new heap/toast with the new relpersistence (pass relpersistence down to reindex_index)
4. Rewrite heap/toast
5. Rewrite indexes
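The refactored order can be sketched in the same toy style: build a new heap with the target persistence, copy the rows, and rebuild each index with that persistence passed down as a parameter, instead of changing the indexes’ relpersistence as a separate earlier step. All names are hypothetical, a `threading.Lock` stands in for the AccessExclusiveLock, only the temp-table check from step 2 is included, and TOAST, relfilenodes and WAL are not modeled.

```python
# Toy sketch of the refactored SET LOGGED/UNLOGGED flow (hypothetical names).
import threading
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str
    persistence: str                             # 'p' logged, 'u' unlogged, 't' temp
    rows: list = field(default_factory=list)


@dataclass
class Index:
    name: str
    column: str
    persistence: str
    entries: dict = field(default_factory=dict)  # key -> list of row positions


def rebuild_index(index, heap, persistence):
    # The target persistence is passed down as a parameter, so no separate,
    # earlier step has to flip the index's own setting first.
    new_index = Index(index.name, index.column, persistence)
    for pos, row in enumerate(heap.rows):
        new_index.entries.setdefault(row[index.column], []).append(pos)
    return new_index


def set_persistence(table, indexes, to_logged, lock):
    with lock:                                   # 1. stands in for AccessExclusiveLock
        if table.persistence == "t":             # 2. only the temp-table check here
            raise ValueError("cannot change a temporary table")
        target = "p" if to_logged else "u"
        if table.persistence == target:
            return table, indexes                # this toy treats it as a no-op
        # 3./4. create a new heap with the new persistence and copy the rows
        new_heap = Relation(table.name, target, list(table.rows))
        # 5. rebuild every index against the new heap, passing `target` down
        new_indexes = [rebuild_index(ix, new_heap, target) for ix in indexes]
        return new_heap, new_indexes


sessions = Relation("sessions", "u", [{"sid": "a"}, {"sid": "b"}])
sid_idx = Index("sessions_sid_idx", "sid", "u")
logged, (logged_idx,) = set_persistence(sessions, [sid_idx], True, threading.Lock())
print(logged.persistence, logged_idx.persistence, logged_idx.entries)  # p p {'a': [0], 'b': [1]}
```

In this toy, passing the persistence down means the rebuilt indexes match the new heap by construction, and the original `sessions` objects are left untouched until the caller swaps them in.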

Current Caveats

● Requires an AccessExclusiveLock
● Rewrites the datafiles

Future work

● Don’t rewrite datafiles when wal_level = minimal

● Unlogged Indexes on Regular Tables
● Unlogged Materialized Views (was reverted by Tom Lane because of its bad design)

Questions?

Special thanks to

● Stephen Frost (mentor)
● Josh Berkus and Thom Brown (organizers)
● Christoph Berg (patch review)
● Álvaro Herrera (patch review and commit)
● Maristela Kohlrausch de Andrade (my English teacher)

My contacts

● http://fabriziomello.github.io
● https://www.linkedin.com/in/fabriziomello
● @fabriziomello