Prof. Stefan Keller, IFS / Geometa Lab HSR
(Slides © CC-BY)
PostgreSQL as GPU Database
for Real-Time Analytics
Talk, Swiss PUG, Zürich, 9 November 2017
About Scalability
Scale-up
Vertical
Add more HW-components (homo- or heterogeneous)
Expensive(?)
No open source, platform lock-in(?)
Scale-out
Horizontal
Cheap commodity HW as "nodes"
Flexibly add more nodes
Open source
Need to relax constraints, even ACID (BASE)?
2 Stefan Keller, "PostgreSQL as GPU Database..."
GPU Databases
More and more GPUs and memory bandwidth…
Use Cases:
Analytical - not transactional
OLTP + OLAP = Hybrid transactional/analytical processing
(HTAP) => No need to move data to warehouse
Setting:
Single-node => much simpler to maintain
Discrete GPU (rather than FPGA, speciality chips)
CPU vs. GPU:
CPU is suited for low latency, complex data + ops
GPU is suited for throughput of homogeneous ops
GPU Database Reference Architecture
Master of Science in Engineering (MSE)
Paper
Paper by Breß, Heimel, Siegmund, Bellatreche, Saake (Universities of Magdeburg, Berlin, Passau, Futuroscope/France):
"GPU-accelerated database systems: Survey and open challenges", in Transactions on Large-Scale Data- and Knowledge-Centered Systems XV, Springer Berlin Heidelberg, 2014, pages 1-35. Weblink: http://bit.ly/1rMOuZC (pdf)
Contents:
Design Choices
Evaluation of 8 GDBMS
Reference architecture
Insights for all co-processors
Overview
Exemplary architecture of a system with a graphics card:
Architecture of GPU-aware DBMSs
Design choices/space of GPU-aware DBMSs
PG-Strom / PostgreSQL
PostgreSQL - www.postgresql.org
“The world's most advanced open source database”.
Open source (PostgreSQL License, similar to BSD/MIT)
PostgreSQL 10 Released October 2017 (since 2002)
Fully ACID compliant object-relational database system
Reputation for reliability, data integrity, and correctness
Broad community
Runs on all major operating systems
Broad support of SQL and data types
Scalable in quantity of data and concurrent users
Extensible: Modules (EXTENSION, Network), Foreign Data Wrappers (SQL/MED), Language APIs
PG-Strom
PG-Strom - http://strom.kaigai.gr.jp/ - Version 1.0
“Limit breaker of PostgreSQL”
Extension module that accelerates SQL workloads using the thousands of cores and the high-bandwidth memory of a GPU. Open source (GPLv2).
Requirements: PostgreSQL 9.5, CUDA
Main use cases
In-database analytics: real-time statistics
Rapid batch processing: ETL/ELT
Main SW architecture design decisions:
Heterogeneous scale-up
On-the-fly native GPU code generation
Asynchronous pipeline execution mode
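The idea behind an asynchronous pipeline can be illustrated with a minimal Python sketch. It is not PG-Strom's implementation: the functions `load_chunks` and `process_chunks` are hypothetical stand-ins for "read and transfer data" and "run a GPU kernel", and plain threads with a bounded queue take the place of the real device interaction. The point is only that loading the next chunk overlaps with processing the current one.

```python
import queue
import threading

def load_chunks(chunks, out_q):
    """Producer: hand chunks to the pipeline (stand-in for reading/transferring data)."""
    for chunk in chunks:
        out_q.put(chunk)   # blocks when the buffer is full
    out_q.put(None)        # sentinel: no more chunks

def process_chunks(in_q, results):
    """Consumer: process each chunk as soon as it arrives (stand-in for a GPU kernel)."""
    while True:
        chunk = in_q.get()
        if chunk is None:
            break
        results.append(sum(x * x for x in chunk))

chunks = [[1, 2, 3], [4, 5], [6]]
q = queue.Queue(maxsize=2)  # bounded buffer limits how far the loader runs ahead
results = []
producer = threading.Thread(target=load_chunks, args=(chunks, q))
consumer = threading.Thread(target=process_chunks, args=(q, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)
```

With a single consumer the results keep the order of the input chunks; the bounded queue keeps memory use predictable when loading is faster than processing.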
PG-Strom: SW architecture
PG-Strom: Overview
Source: http://strom.kaigai.gr.jp/
PG-Strom: Overview ff.
PG-Strom: Features - Data types
Data Types:
Numeric: …; Date/Time: …; Others: bool, money
Text: …
Limits on text and varchar(x)
=> "GPU cannot process compressed or TOAST'ed data"
=> "ALTER TABLE ... SET STORAGE PLAIN" or MAIN
Not supported:
geometry, geography (PostGIS)
See Reference:Data Types for details
Internals:
Custom Scan Provider, see
www.postgresql.org/docs/current/static/custom-scan.html
PG-Strom: Features – SQL workloads
Full Table Scan
With scan qualifiers, the GPU evaluates the qualifiers and filters out invisible rows…
Tables Join
A parallel version of the hash-join algorithm and a simple (non-parameterized) nested-loop algorithm are supported…
Group By/Aggregation
The GPU pre-processes aggregate operations to reduce the number of rows the CPU has to process…
Projection
When a SQL query contains complicated mathematical formulas, the GPU evaluates these expressions on the device; the CPU then just references the calculated results
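The split between scan/pre-aggregation on the GPU and the final step on the CPU can be sketched as a two-phase aggregation. The following Python sketch only illustrates the general technique, under the assumption that the query is an average per group with a simple filter; `partial_aggregate` and `final_aggregate` are hypothetical names, and ordinary Python functions stand in for GPU kernels.

```python
from collections import defaultdict

def partial_aggregate(chunk, predicate):
    """Phase 1 (stand-in for the GPU): filter rows and pre-aggregate one chunk."""
    partial = defaultdict(lambda: [0, 0.0])  # key -> [count, sum]
    for key, value in chunk:
        if predicate(value):                 # scan qualifier
            state = partial[key]
            state[0] += 1
            state[1] += value
    return dict(partial)

def final_aggregate(partials):
    """Phase 2 (CPU): merge the few partial states and compute the averages."""
    merged = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: total / count for key, (count, total) in merged.items()}

# Roughly: SELECT key, avg(value) FROM t WHERE value > 0 GROUP BY key
rows = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", -2.0), ("a", 5.0), ("c", 6.0)]
chunks = [rows[0:3], rows[3:6]]
partials = [partial_aggregate(c, lambda v: v > 0) for c in chunks]
result = final_aggregate(partials)
print(result)
```

The CPU only sees one small state per group and chunk instead of every row, which is where the reduction in rows comes from.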
PG-Strom: Limits
Latency
0.2-0.3 sec to initialize GPU device
Max. concurrent sessions
up to 3-5
Database size:
10 GB = data in the shared buffers of PostgreSQL or in the disk cache of the operating system
Tip: Use pg_prewarm
See http://strom.kaigai.gr.jp/install.html
PG-Strom: Performance
Estimations:
RDBMS + GPU => factor 3
Columnar In-Memory => factor 10
Pure GPU => factor 100
Benchmarks
See next slides
See seminar on 22 January 2018, 14:00-16:00, HSR Rapperswil
PG-Strom: Further development
Version 1.x
More concurrent sessions
Data size: SSD collaboration feature at v2.0
PostGIS?
Where does it stand compared to the reference architecture?
…
GPU Databases - Public Presentations in the Database Systems Seminar at HSR
Master of Science in Engineering (MSE)
Seminar
SW:
PG-Strom 1.0 / PostgreSQL 9.5
MapD Open Source Edition
PostgreSQL 10, Tuned
HW:
Commodity server ("pizza box")
IBM Power8 server ("pizza box")
Data, Benchmarks, Docker-Files
See https://wiki.hsr.ch/Datenbanken/wiki.cgi?SeminarDatenbanksystemeHS1718
Seminar
Benchmarks:
Cold start PG-Strom, MapD, PostgreSQL (= 3x)
Warm start PG-Strom, MapD, PostgreSQL (= 3x)
Presentations:
4 students
Presentations in German, report in English
Final (public) presentations:
22 January 2018, 14:00-16:00
HSR Rapperswil, Room 8.125
Registration: http://techup.ch/tag/htap
Discussion
Credits
Kohei KaiGai
Stefan Keller
Geometa Lab at Institute for Software
HSR Hochschule für Technik Rapperswil
www.hsr.ch/geometalab
@sfkeller