Skypicker.com Travel anywhere, anytime.

Rubyslava + PyVo #48

Page 1: Rubyslava + PyVo #48

Skypicker.com Travel anywhere, anytime.

Page 2: Rubyslava + PyVo #48

Skypicker

• flight ticket search & booking engine

• covering markets in Europe, Russia and China

• hundreds of TBs of airline data processed monthly

• selling thousands of tickets daily

• covering low-cost carriers (LCCs) only

• legacy carriers (LCs) in progress!

Page 3: Rubyslava + PyVo #48

API

• millions of daily searches

• average response time <1s (we are slowing down some queries)

• Built on top of PostgreSQL

• used worldwide =)

• a huge home-grown data processing framework running under the API

Page 4: Rubyslava + PyVo #48

SP databases

• 5 db clusters, >0.5TB memory each

• the main one handling 20M updates / hour (this is caused by the airline ticket price changes)

• main table has 1 billion+ rows

• basically unlimited read scaling with the replication feature

• managed mostly by Ansible and custom bash stuff for semi-auto failover

• there are tools like repmgr though

Page 5: Rubyslava + PyVo #48

PostgreSQL

• our silver bullet for everything

• for low-cost Big data, there is no better combo than PG+Redshift, yet.

• don't even try with Hadoop/Cassandra, they will make your wallet cry

Page 6: Rubyslava + PyVo #48

HW tips

• running on bare metal, because of the SSD RAID 10 on Intel 3700 series

• series 3500 will kill you. After a few weeks of 24/7 load, there will be a huge performance drop, always

• (it's a feature!)

Page 7: Rubyslava + PyVo #48

Why not cloud

• AWS is not the best fit for a high-performance PG cluster, the I/O is unstable and unpredictable

• joyent.com cloud is fine ($8k/month/instance)

• bare metal from Rackspace also works well ($1,800/month). The traffic can get expensive here…

Page 8: Rubyslava + PyVo #48

How we found out these things

• …randomly…by fucking things up…

• But! We multiplied our master db performance 5 times in the past 6 months

• From 15M to nearly 75M

Page 9: Rubyslava + PyVo #48

Prove it!

• October 2014, 15M

• April 2015, 75M

Page 10: Rubyslava + PyVo #48

Replication for dummies

• simple master-slave

• a good way to die

Page 11: Rubyslava + PyVo #48

Cascaded replication

Page 12: Rubyslava + PyVo #48

Pros & cons

• adds some replication delay (-)

• nobody cares (+)

• because it scales! (++)

Page 13: Rubyslava + PyVo #48

How the data flows

1. new price for the flight pushed to a queue

2. picked up by a worker

3. inserted to db

4. (magic happens here, shitload of updates)

5. copied to slave servers for select statements

6. search query on slave server

7. …booking made…?

8. profit!

Page 14: Rubyslava + PyVo #48

Queue over the db

• our data processing framework is pushing the data to a Redis queue

• workers are picking up the data and inserting it into the DB

• load can be easily balanced here

• you won't lose any data if you need to restart your db (this can also be achieved with pgbouncer)

• monitor the size of the queue and keep it near 0 =)
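A minimal sketch of the queue-in-front-of-the-db idea above. It is not the Skypicker worker: `queue.Queue` stands in for the Redis queue and an in-memory SQLite database stands in for PostgreSQL so that it runs anywhere, and the `prices` table, `make_db` and `drain` are hypothetical names. The point it illustrates is that a failed write puts the batch back on the queue, so a db restart delays data instead of losing it.

```python
import queue
import sqlite3

# Assumptions: queue.Queue replaces the Redis queue, SQLite replaces
# PostgreSQL, and the "prices" table is made up for this example.

def make_db():
    """Create an in-memory database with a hypothetical prices table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (flight_id TEXT PRIMARY KEY, price REAL)")
    return conn

def drain(q, conn, batch_size=100):
    """Write up to batch_size queued price updates in one transaction.

    Returns the number of items written. If the write fails, the whole
    batch is put back on the queue before the error is re-raised.
    """
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    if not batch:
        return 0
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR REPLACE INTO prices (flight_id, price) VALUES (?, ?)",
                batch,
            )
    except sqlite3.Error:
        for item in batch:
            q.put(item)  # nothing is lost, the queue just grows
        raise
    return len(batch)

if __name__ == "__main__":
    q = queue.Queue()
    q.put(("PRG-LON", 49.0))
    q.put(("PRG-LON", 45.0))  # a newer price for the same flight
    conn = make_db()
    print(drain(q, conn), "items written, queue size now", q.qsize())
```

Watching `q.qsize()` (or the length of the real queue) is the "keep it near 0" check: a growing queue means the workers or the db cannot keep up.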

Page 15: Rubyslava + PyVo #48

HAProxy

• probably the most stable piece of software ever made. TCP balancer

• has a health check for PG

• if your slave goes down, nobody will notice

• (just don't forget to have an alert for it)
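To make the "nobody will notice" behaviour concrete, here is a small sketch of round-robin balancing that skips backends failing a health check. It is an illustration of the idea only, not how HAProxy is implemented or configured; `Balancer` and `tcp_check` are hypothetical names, and `tcp_check` is a plain TCP connect, which is weaker than a PostgreSQL-aware check.

```python
import itertools
import socket

def tcp_check(backend, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    host, port = backend
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class Balancer:
    """Round-robin over backends, skipping those that fail the health check."""

    def __init__(self, backends, is_healthy=tcp_check):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callable(backend) -> bool
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        """Return the next healthy backend, trying each one at most once."""
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.is_healthy(backend):
                return backend
        # Every replica is down: this is the case the alert is for.
        raise RuntimeError("no healthy backend available")

if __name__ == "__main__":
    down = {("replica2", 5432)}  # pretend this replica died
    balancer = Balancer(
        [("replica1", 5432), ("replica2", 5432), ("replica3", 5432)],
        is_healthy=lambda backend: backend not in down,
    )
    print([balancer.pick() for _ in range(4)])
```

Clients keep getting answers from the remaining replicas, which is exactly why a separate alert is needed: the failure is invisible from the outside.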

Page 16: Rubyslava + PyVo #48

PgBouncer

• small shit

• useful when you are doing thousands of connections to your db

• cut the server load in half

• boosted the writes by 30%

Page 17: Rubyslava + PyVo #48

Optimization steps (for dummies)

1. optimize your queries with EXPLAIN

2. do some pg config changes

3. buy better hw

4. goto 2

Page 18: Rubyslava + PyVo #48

A little bit advanced!

• table partitioning (this is the game changer)

• partial indexes

• turn off vacuum, use pg_repack for rebuilding the tables

• run ANALYZE often

• turn on the genetic query optimizer (GEQO)
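As an illustration of the first two bullets, the sketch below generates DDL for one monthly partition and a partial index on it. The talk does not say how the partitions are built, so several things here are assumptions: the inheritance-plus-CHECK style (PostgreSQL 9.4 has no declarative partitioning), the monthly split, and the `prices`, `departure`, `route` and `seats_left` names, which are all hypothetical.

```python
from datetime import date

def month_bounds(day):
    """Return (first day of day's month, first day of the next month)."""
    start = day.replace(day=1)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end

def partition_name(parent, day):
    """Name of the monthly child table that holds rows for `day`."""
    return "{}_{:04d}_{:02d}".format(parent, day.year, day.month)

def partition_ddl(parent, day):
    """DDL for one monthly child table plus a partial index on it."""
    start, end = month_bounds(day)
    child = partition_name(parent, day)
    # The CHECK constraint lets the planner skip this child table
    # for queries outside its month (constraint exclusion).
    create_child = (
        "CREATE TABLE {child} ("
        "CHECK (departure >= DATE '{start}' AND departure < DATE '{end}')"
        ") INHERITS ({parent});"
    ).format(child=child, parent=parent, start=start, end=end)
    # Partial index: only rows that can still be sold are indexed.
    create_index = (
        "CREATE INDEX {child}_open_idx ON {child} (route) WHERE seats_left > 0;"
    ).format(child=child)
    return [create_child, create_index]

if __name__ == "__main__":
    for statement in partition_ddl("prices", date(2015, 4, 17)):
        print(statement)
```

Dropping an old month then becomes a cheap `DROP TABLE` on one child instead of a huge `DELETE` on the main table, which is one plausible reason partitioning is called the game changer here.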

Page 19: Rubyslava + PyVo #48

Redshift

• Dummy PG database from Amazon, made for science & some SQL

• it's costly to download the data from AWS after it has been processed

• you should also try Snowflake, Vertica

Page 20: Rubyslava + PyVo #48

Redshift flow

• using the PG fdw feature to connect RS remotely to our slave database

• download data

• process it

• push it to master db

Page 21: Rubyslava + PyVo #48

PostgreSQL replication

• no battle tested master-master solution, yet (9.4)

• it's async, so don't forget to monitor the delay between your master and slaves

• cascading replication for unlimited scaling
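One way to monitor that delay, sketched under assumptions: `LAG_SQL` uses the standard `pg_last_xact_replay_timestamp()` function and is meant to be run on each standby, while `lagging_replicas`, the injected `fetch_lag` callable and the 30-second threshold are hypothetical. In a real setup `fetch_lag` would run `LAG_SQL` through a PostgreSQL driver; here it is faked so the example runs without a database.

```python
# Run on a standby: seconds since the last replayed transaction.
# The result is NULL (None in Python) if nothing has been replayed.
LAG_SQL = "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"

def lagging_replicas(replicas, fetch_lag, max_lag_seconds=30.0):
    """Return {replica: lag} for replicas that need attention.

    fetch_lag(replica) must return the lag in seconds, or None when it
    is unknown. Unknown lag is reported as well, since it usually means
    the replica is misconfigured or not replaying at all.
    """
    problems = {}
    for replica in replicas:
        lag = fetch_lag(replica)
        if lag is None or lag > max_lag_seconds:
            problems[replica] = lag
    return problems

if __name__ == "__main__":
    # Fake measurements instead of real queries (assumption for the demo).
    measured = {"replica1": 0.4, "replica2": 95.0, "replica3": None}
    print(lagging_replicas(measured, measured.get))
```

Note that this measure also grows when the master is simply idle, so on a quiet system a high value is not necessarily a replication problem. With cascading replication the check is worth running on every level, because delay adds up down the chain.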

Page 22: Rubyslava + PyVo #48

PostgreSQL config tuning

• 12-Step Program for Scaling Web Applications on PostgreSQL from Wanelo.com

• they cover every aspect of config optimization and we don't want to copy it here =)

Page 23: Rubyslava + PyVo #48

Our pains, and why we are here

• our data will grow 10 times by adding legacy carriers in the next 2 months

• we need DB masters and developers who will help us to manage this growth

Page 24: Rubyslava + PyVo #48

We are hiring!

• We offer

• many money

• skills

Get in touch at [email protected]