Upload
valentine-gogichashvili
View
276
Download
3
Embed Size (px)
Citation preview
PostgreSQL at ZalandoSQL in Fashion
About me
Valentine GogichashviliHead of Data Engineering @ZalandoTechtwitter: @valgoggoogle+: +valgogemail: [email protected]
15 countries4 fulfillment centers15+ million active customers2.9 billion € revenue 2015150,000+ products9,000+ employees
One of Europe's largest online fashion retailers
Zalando Technology
BERLINDORTMUNDDUBLIN
HELSINKI
ERFURT
MÖNCHENGLADBACH
HAMBURG
Zalando Technology
900+ TECHNOLOGISTS
Rapidly growing international team
http://tech.zalando.de
THE HISTORY
Once upon a time...
Started as a tiny online shop
Prototyped on Magento (PHP)
Used MySQL as a database
Web Application
Backend
Database
REBOOT
REBOOT
5½ years ago
● Java○ macro service architecture with SOAP as RPC layer
● PostgreSQL ○ Heavy usage of Stored Procedures○ 4 databases + 1 sharded database on 2 shards
● Python for tooling (i.e code deploy automation)
REBOOT
Java Web Frontend
Java Backend
PostgreSQL
Java Backend
PostgreSQL
Java Backend
PostgreSQL 9.0 RC1PostgreSQL 9.0
RC1PostgreSQL 9.0 RC1PostgreSQL
"macro" services
SQL
SQL
● human readable query language
● standed the test of time
● allows automatic optimization of access to data
SQL
● window functions for moving averages etc.
● recursive SQL for searching trees and networks
● non-blocking locking
● DDL locking (schema guarantees)
PostgreSQL
PostgreSQL
The world's
most advanced
open-source
database
PostgreSQL
● Minimal DDL locks (easy schema changes)
● Nearest neighbour searching using an index
● Block-range indexes
● Serializability (true theoretical serializability)
● JSONB - compressed JSON, fully indexable
● Data sampling with guarantees on execution time
PostgreSQL
● Read scalability: "Hot standby" replicas
● Synchronous Replication
● Cascading Replication
● Logical Decoding to allow Change Data Capture
● BDR for Multi-Master Scalability
● Write scalability/sharding (is on the way)
PostgreSQL
● Multiple data models (tables, JSON, ROWs, arrays)
● ACID
● Eventual Consistency (on-demand)
Stored Procedures
Stored Procedures
● clean transaction scope
● very clean data
● processing close to data
● no need in classical ORM mappers
Stored Procedures
● no debugger in Eclipse or IntelliJ
● difficult for projects heavy on CRUD
● versioning automation is needed
Stored Procedures
Java Sproc Wrapper
● very easy to use
● proxies stored procedures as Java method calls
● supports complex type mapping
● supports transparent sharding
Stored Procedures
CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS intAS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id$$LANGUAGE 'sql' SECURITY DEFINER;
SQL
Stored Procedures
@SProcServicepublic interface CustomerSProcService { @SProcCall int registerCustomer(@SProcParam String email, @SProcParam Gender gender);}
JAVA
CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS intAS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id$$LANGUAGE 'sql' SECURITY DEFINER;
SQL
Stored Procedures
@SProcServicepublic interface CustomerSProcService { @SProcCall int registerCustomer(@SProcParam String email, @SProcParam Gender gender);}
CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS intAS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id$$LANGUAGE 'sql' SECURITY DEFINER;
SQL
JAVA
Stored Procedures
@SProcCallList<Order> findOrders(@SProcParam String email);
JAVA
CREATE FUNCTION find_orders(p_email text, OUT order_id int, OUT order_created timestamptz, OUT shipping_address order_address) RETURNS SETOF recordAS $$ SELECT o_id, o_created, ROW(oa_street, oa_city, oa_country)::order_address FROM z_data."order" JOIN z_data.order_address ON oa_order_id = o_id JOIN z_data.customer ON c_id = o_customer_id WHERE c_email = p_email$$LANGUAGE 'sql' SECURITY DEFINER;
SQL
Stored Procedures
CREATE FUNCTION find_orders(p_email text, OUT order_id int, OUT order_created timestamptz, OUT shipping_address order_address) RETURNS SETOF recordAS $$ SELECT o_id, o_created, ROW(oa_street, oa_city, oa_country)::order_address FROM z_data."order" JOIN z_data.order_address ON oa_order_id = o_id JOIN z_data.customer ON c_id = o_customer_id WHERE c_email = p_email$$LANGUAGE 'sql' SECURITY DEFINER;
@SProcCallList<Order> findOrders(@SProcParam String email);
JAVA
SQL
Stored Procedures
@SProcCall
int registerCustomer(@SProcParam @ShardKey CustomerNumber customerNumber,
@SProcParam String email,
@SProcParam Gender gender);
@SProcCall
Article getArticle(@SProcParam @ShardKey Sku sku);
@SProcCall(runOnAllShards = true, parallel = true)
List<Order> findOrders(@SProcParam String email);
JAVA
JAVA
JAVA
Stored Procedures
Virtual Shard IDs (pre-sharding)
0 1 0 1 1 0 0 1
7 6 4 3 2 1 05
md5(partitioning_key)
SprocWrapper
PostgreSQL 9.0 RC1PostgreSQL 9.0
RC1PostgreSQL 9.0 RC1PostgreSQL
Java Application
Stored Procedures
Schema based stored procedure versioning
● uses search_path on the clients
● new schema for every application version
● automated deployments
Stored Procedures
Database Tables
api_v16_01
Stored Procedures
Database Tables
api_v16_01
search_path = api_v16_01, public;
Stored Procedures
Database Tables
api_v16_01api_v16_02
search_path = api_v16_01, public;
Stored Procedures
Database Tables
api_v16_01api_v16_02
search_path = api_v16_02, public;
search_path = api_v16_01, public;
Stored Procedures
Database Tables
api_v16_01api_v16_02
search_path = api_v16_02, public;
Stored Procedures
Database Tables
api_v16_01api_v16_02
search_path = api_v16_02, public;
Stored Procedures
Database Tables
api_v16_02
search_path = api_v16_02, public;
https://github.com/zalando/PGObserver
https://github.com/zalando/PGObserver
Schema Management
Schema management
DBDIFF database schema management
● schema changes must be documented
● atomic changes per feature
● locks should be minimal
Schema management
BEGIN; SELECT _v.register_patch('ZEOS-15430.order');
CREATE TABLE z_data.order_address ( oa_id int SERIAL, oa_country z_data.country, oa_city varchar(64), oa_street varchar(128), ... );
ALTER TABLE z_data."order" ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);COMMIT;
DBDIFF SQL
Schema management
BEGIN; SELECT _v.register_patch('ZEOS-15430.order');
\i order/database/order/10_tables/10_order_address.sql ALTER TABLE z_data."order" ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);COMMIT;
DBDIFF SQL
Schema management
BEGIN; SELECT _v.register_patch('ZEOS-15430.order');
\i order/database/order/10_tables/10_order_address.sql SET statement_timeout TO '3s';
ALTER TABLE z_data."order" ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);COMMIT;
DBDIFF SQL
Schema management
pg_view (https://github.com/zalando/pg_view)
● helps to monitor locks and load in real-time
● used during all DB schema change rollouts
nice_updater (https://github.com/zalando/acid-tools)
● runs big migrations controlling database/system load
● used during automatic data migrations
https://github.com/zalando/pg_view
High Availability
High Availability
Patroni - High Availability Runner● etcd or ZooKeeper for master electionSpilo - PostgreSQL AWS appliance● Zalando Patroni
for High Availability● Docker
for packaging● Zalando STUPS
for audit compliance
Monitoring: ZMON
Open Source at Zalando Technology
● Tech Blog of Zalando Technology● https://zalando.github.io/ - Open Source Projects
○ Java Sproc Wrapper○ PGObserver○ pg_view○ Patroni○ Spilo○ ...
Questions?
Where to Find Us:
Tech Blog: tech.zalando.comGitHub: github.com/zalando
Twitter: @ZalandoTechInstagram: zalandotech
Jobs: http://tech.zalando.com/jobs